GPU in the Lunatics' apps

Message boards : Number crunching : GPU in the Lunatics' apps
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164473 - Posted: 22 Oct 2011, 12:39:40 UTC

After installing Lunatics 0.38, in addition to CUDA 6.08 I have CUDA 6.09 and CUDA 6.10_fermi. Is there any reason to reschedule WUs from 6.08 to 6.09 or 6.10, and if so, which of them would be better?
ID: 1164473
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1164474 - Posted: 22 Oct 2011, 12:47:04 UTC - in response to Message 1164473.  
Last modified: 22 Oct 2011, 12:47:42 UTC

After installing Lunatics 0.38, in addition to CUDA 6.08 I have CUDA 6.09 and CUDA 6.10_fermi. Is there any reason to reschedule WUs from 6.08 to 6.09 or 6.10, and if so, which of them would be better?


It doesn't matter which plan class/app version you reschedule to, as all your Cuda WUs will use the x38g Cuda app anyway.

Claggy
ID: 1164474
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164475 - Posted: 22 Oct 2011, 12:53:55 UTC
Last modified: 22 Oct 2011, 12:55:20 UTC

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...
ID: 1164475
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1164476 - Posted: 22 Oct 2011, 13:06:50 UTC - in response to Message 1164475.  
Last modified: 22 Oct 2011, 13:07:28 UTC

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

Which host did you install the Cuda app on? The only Cuda host you have doesn't meet the requirements for running the x38g app.

The 0.38 Installer says: (x38g using Cuda32: requires driver 263.06+)

Claggy
ID: 1164476
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164477 - Posted: 22 Oct 2011, 13:10:58 UTC

I don't know what Acer did, but I can't install any drivers from the Nvidia site except ver. 186.
So shall I uninstall Lunatics?
ID: 1164477
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1164478 - Posted: 22 Oct 2011, 13:16:31 UTC - in response to Message 1164477.  
Last modified: 22 Oct 2011, 13:18:04 UTC

I don't know what Acer did, but I can't install any drivers from the Nvidia site except ver. 186.
So shall I uninstall Lunatics?

The x38g Cuda app will be running in CPU fallback mode, hugely slower than even the Stock CPU app.

I was going to suggest using the 0.37 installer and the x32f app, but that requires 197.13+ (Cuda 3.0) drivers.

Best to modify your app_info to use the Stock app instead; PM me your app_info and I'll modify it for you.
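For anyone curious what that change involves, an anonymous-platform app_info.xml section pointing at a stock CPU application generally has the shape sketched below. This is only an illustrative outline, not the exact Lunatics layout: the application name shown is the usual one for Multibeam, but the executable file name and version number are placeholders and must match the files actually present in your project directory.

<app_info>
  <app>
    <name>setiathome_enhanced</name>
  </app>
  <file_info>
    <name>stock_cpu_app.exe</name>   <!-- placeholder: the stock executable on your host -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>   <!-- placeholder: match the stock app's version -->
    <file_ref>
      <file_name>stock_cpu_app.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>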

Claggy
ID: 1164478
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164501 - Posted: 22 Oct 2011, 15:20:44 UTC - in response to Message 1164475.  

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

When it was running stock, the servers were supplying a <flops> value of about 2.4e10 for the 6.08 GPU application. For anonymous platform, the core client will be supplying a <flops> of about 2.1e09 based on the CPU Whetstone benchmark unless you put the <flops> values in app_info.xml.

As the ratio isn't much more than 10, the system will be able to adapt fairly well even without being told a reasonable <flops>. But there may be some interesting effects until it does adapt, particularly around the time there are 10 validated tasks which were sent after the transition to anonymous platform.
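For those who do want to supply it, the <flops> element sits inside the matching <app_version> block of app_info.xml. A minimal sketch, using the roughly 2.4e10 figure quoted above for the 6.08 GPU application; the file name is a placeholder, and the other elements must mirror what is already in your app_info:

<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>608</version_num>
  <plan_class>cuda</plan_class>
  <flops>2.4e10</flops>   <!-- projected speed of this app on this host -->
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
  <file_ref>
    <file_name>cuda_app.exe</file_name>   <!-- placeholder -->
    <main_program/>
  </file_ref>
</app_version>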
                                                                  Joe
ID: 1164501
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164512 - Posted: 22 Oct 2011, 15:49:04 UTC

At any rate, it was a dubious adventure; I lost all my stock of WUs and now I have about 50 ghosts. When they expire, will it be a catastrophe for my RAC?
ID: 1164512
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1164513 - Posted: 22 Oct 2011, 15:52:38 UTC - in response to Message 1164512.  

At any rate, it was a dubious adventure; I lost all my stock of WUs and now I have about 50 ghosts. When they expire, will it be a catastrophe for my RAC?

They won't affect your RAC; only work returned without errors and validated adds to your RAC. Errored or aborted work will not reduce it, other than by not adding anything to your totals.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1164513
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1164521 - Posted: 22 Oct 2011, 16:16:07 UTC - in response to Message 1164501.  

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

When it was running stock, the servers were supplying a <flops> value of about 2.4e10 for the 6.08 GPU application. For anonymous platform, the core client will be supplying a <flops> of about 2.1e09 based on the CPU Whetstone benchmark unless you put the <flops> values in app_info.xml.

As the ratio isn't much more than 10, the system will be able to adapt fairly well even without being told a reasonable <flops>. But there may be some interesting effects until it does adapt, particularly around the time there are 10 validated tasks which were sent after the transition to anonymous platform.
                                                                  Joe


That doesn't work with ATI cards.
After 4 weeks of crunching without flops values, the estimated times are still 6 times higher than the real ones.



With each crime and every kindness we birth our future.
ID: 1164521
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164532 - Posted: 22 Oct 2011, 17:00:38 UTC - in response to Message 1164527.  

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

When it was running stock, the servers were supplying a <flops> value of about 2.4e10 for the 6.08 GPU application. For anonymous platform, the core client will be supplying a <flops> of about 2.1e09 based on the CPU Whetstone benchmark unless you put the <flops> values in app_info.xml.

As the ratio isn't much more than 10, the system will be able to adapt fairly well even without being told a reasonable <flops>. But there may be some interesting effects until it does adapt, particularly around the time there are 10 validated tasks which were sent after the transition to anonymous platform.
                                                                  Joe

That doesn't work with ATI cards.
After 4 weeks of crunching without flops values, the estimated times are still 6 times higher than the real ones.

Yes, same here with my ATI 4850. Before the latest f'up with the completion times/DCF/APR, my ATI running 2 APs at a time was spot on with its completion estimates. It does 2 APs in about 8 hours. Now, though, the estimate is between 24 and 31 hours (depending on what the CPU APs complete in), and it does not seem to adjust down much at all over the weeks. The CPU APs, though, have settled at reasonable estimates.

Edit: However, as long as I can keep the ATI card fed without having to reschedule, I refuse to fiddle with any flops.

It's not a matter of ATI or NVIDIA; any GPU which is much more than 10 times as fast as the CPU Whetstone benchmark will suffer the same.

Quoting some extracts from Dr. Anderson's check-in note for changeset [trac]changeset:21164[/trac]:

projected FLOPS: the scheduler's best guess as to what will satisfy
X * elapsed_time = wu.rsc_fpops_est;
this is used to make server-side runtime estimates,
...
the <flops> field in app_info.xml files should be projected FLOPS.

Those of you who refuse to "fiddle with" the value are condemning your hosts to sending a flops value usually far less than the GPU is actually producing. That's your choice; the system will probably supply enough GPU work to keep them busy unless there's an unexpected outage. My preference is to give a reasonable approximation of what an application will do so the system works as intended; I never expect BOINC to be able to rescind the old GIGO principle.
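Put concretely, with made-up numbers just to show the arithmetic: projected FLOPS is roughly the task's rsc_fpops_est divided by the elapsed time the application typically needs for it.

<!-- hypothetical example: a task with rsc_fpops_est of 3.0e13 that the GPU app
     usually finishes in about 1200 seconds gives 3.0e13 / 1200 = 2.5e10 -->
<flops>2.5e10</flops>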
                                                                 Joe
ID: 1164532
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1164618 - Posted: 22 Oct 2011, 21:48:10 UTC

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.



With each crime and every kindness we birth our future.
ID: 1164618
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1164631 - Posted: 22 Oct 2011, 22:32:02 UTC - in response to Message 1164618.  
Last modified: 22 Oct 2011, 23:00:00 UTC

Well, I see that on the i7-2600 + 2 ATI 5870 GPUs, times on both MB and AstroPulse aren't that far off. And I don't use FLOPS entries; I have tried them, using the speed indicated by BOINC (2270 GFLOPS), and that made the estimated times decrease too much, though it was perhaps temporarily good for work fetch?
They adjusted fairly quickly after I stopped changing command-line settings and ran 1 AstroPulse per GPU and 2 MB WUs — rev.177 on the GPU, and rev.365* (SSSE3x) on the CPU and SSE3+HD5 on the ATI (AMD) GPU.

*(Beta MB app rev.365, (SSSE3x) for CPU and (SSE3) for GPU.)

I run 2 MBs with period_iteration (for pulsefind?) set to 1, which stresses the GPU to 80% load. I lowered the core voltage and core speed, and also the memory speed. I got home and saw the SmartDoctors flashing while minimized. The temperature was 92C; soon after, they throttled down and I lowered the settings. The rev.365 app running 2 at a time looks OK and a candidate for an installer?!
I never noticed this before, but period_iteration 1 is the fastest setting on ATI 5870 GPU(s).
ID: 1164631
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164671 - Posted: 23 Oct 2011, 1:29:13 UTC - in response to Message 1164618.  

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.

Your host with the ATI card has a Whetstone benchmark around 3e09, and you're running 2 MB tasks at a time on the GPU. That means the core client will tell the Scheduler the GPU MB application has a <flops> of about 1.5e09. It's your choice whether to allow that kind of terrible estimate to be sent or not. My choice would be to correct it. But I like things to be stable rather than exciting...
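For anyone who does want to correct it while still running two tasks per GPU, the relevant pieces of the app_version block would look something like the sketch below. The flops figure is only a placeholder for whatever the GPU application actually delivers, and the 0.5 count is what tells BOINC that two tasks share one GPU; everything else should stay as it already is in your app_info.xml.

<app_version>
  <app_name>setiathome_enhanced</app_name>
  <flops>3.0e10</flops>   <!-- placeholder: the GPU app's real throughput, not the Whetstone value -->
  <coproc>
    <type>ATI</type>
    <count>0.5</count>    <!-- two tasks sharing one GPU -->
  </coproc>
</app_version>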
                                                                   Joe
ID: 1164671
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1164700 - Posted: 23 Oct 2011, 6:28:43 UTC - in response to Message 1164671.  
Last modified: 23 Oct 2011, 6:32:45 UTC

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.

Your host with the ATI card has a Whetstone benchmark around 3e09, and you're running 2 MB tasks at a time on the GPU. That means the core client will tell the Scheduler the GPU MB application has a <flops> of about 1.5e09. It's your choice whether to allow that kind of terrible estimate to be sent or not. My choice would be to correct it. But I like things to be stable rather than exciting...
                                                                   Joe


I wasn't talking about my host, Joe.

Last week my estimated times were 2:20 whilst running VLARs, which take 3000 - 4000 seconds.
This week I'm running mid-range tasks which take 1600 - 1800 seconds each, and the estimated times are 3:35.
That's after a couple of thousand units finished.
Not what I would call stable.

Maybe I'm not the patient kind of guy.


With each crime and every kindness we birth our future.
ID: 1164700
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1164705 - Posted: 23 Oct 2011, 7:22:54 UTC

Since this patch http://boinc.berkeley.edu/trac/changeset/24128 there has been no stable estimate adjustment on app_info (anonymous platform) hosts running a CPU+GPU setup (without flops entries).

The estimation worked well before, but since that patch I have had to set up a 1+10 cache to get an actual cache of 2 days :-/.
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1164705
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1164748 - Posted: 23 Oct 2011, 12:06:29 UTC - in response to Message 1164705.  
Last modified: 23 Oct 2011, 12:35:36 UTC

---[SNIPPED]---

Those of you who refuse to "fiddle with" the value are condemning your hosts to sending a flops value usually far less than the GPU is actually producing. That's your choice; the system will probably supply enough GPU work to keep them busy unless there's an unexpected outage. My preference is to give a reasonable approximation of what an application will do so the system works as intended; I never expect BOINC to be able to rescind the old GIGO principle.


Joe is right, and I certainly will change this. Stability is also more important than excitement.
This (ATI) host is already in constant short supply of work, which can be a result of the lower FLOPS estimate, if I understand this correctly.
If I were to set a FLOPS entry of 3.0e+9, would the server/scheduler respond more realistically?
Or I'll change it to 1 MB per GPU; I've already switched to 1 AstroPulse task per GPU, because too often I see 1 WU running on 0.5 GPU because there aren't enough.
ID: 1164748
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164797 - Posted: 23 Oct 2011, 18:40:00 UTC - in response to Message 1164700.  

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.

Your host with the ATI card has a Whetstone benchmark around 3e09, and you're running 2 MB tasks at a time on the GPU. That means the core client will tell the Scheduler the GPU MB application has a <flops> of about 1.5e09. It's your choice whether to allow that kind of terrible estimate to be sent or not. My choice would be to correct it. But I like things to be stable rather than exciting...
                                                                   Joe

I wasn't talking about my host, Joe.

Last week my estimated times were 2:20 whilst running VLARs, which take 3000 - 4000 seconds.
This week I'm running mid-range tasks which take 1600 - 1800 seconds each, and the estimated times are 3:35.
That's after a couple of thousand units finished.
Not what I would call stable.

Maybe I'm not the patient kind of guy.

The rsc_fpops_est values produced by the splitters were calibrated to approximately match a typical CPU a few years ago based on the Estimates and Deadlines revisited thread. For OpenCL ATI work it is not a good match at VLAR, and I hope before S@H v7 comes live here enough data can be gathered to revise those estimates to work better across the range of different kinds of processing. There's too much variation from system to system to expect it to ever be consistently accurate, but improvement is certainly possible.

The dominant AR range for work here does tend to vary from week to week or month to month. The server-side average for APR can track that kind of change and compensate, though it takes about 300 validations to adapt. There are delays which affect that, of course, both your own host's cache size and how soon the wingmate reports. But if APR is allowed to function, it should be able to keep estimates reasonable most of the time.

APR does not function if a host is saying its flops are less than 1/10 of APR, based on changeset [trac]changeset:24217[/trac], which replaced changeset [trac]changeset:24128[/trac]. For most anonymous platform hosts with GPUs, the current situation is that the APR for MB CPU work is less than 10 times the Whetstone benchmark, but the APR for MB GPU work is much more than 10 times the benchmark. So without <flops> in app_info.xml, each completed CPU task puts DCF near 1.0, then subsequent GPU tasks gradually lower it toward a small fractional value. That sawtooth pattern of course makes what is actually happening unclear.
                                                               Joe
ID: 1164797
JohnDK Crowdfunding Project Donor, Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1164801 - Posted: 23 Oct 2011, 19:07:54 UTC

I've kept flops in all along; it only needs adjusting a few times once you have it right the first time, and my DCF is around 1.0 most of the time on all my PCs. I tried without flops on my latest PC and it just doesn't work; either the CPU or the GPU is correct but not both, so at last I put flops in.
ID: 1164801
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1164803 - Posted: 23 Oct 2011, 19:14:47 UTC - in response to Message 1164797.  
Last modified: 23 Oct 2011, 19:35:06 UTC

I've set a 1.0e+11 FLOPS value on the ATI 5870 GPUs, which is quite realistic too;
BOINC reports 2.72e+12 (2720 GFLOPS) for each 5870.

I'll watch this host and see if it gets more work, too.

And it downloaded work right after I changed this to the above value.
ID: 1164803


 