Message boards : Number crunching : GPU in the Lunatics' apps
Belthazor Joined: 6 Apr 00 Posts: 219 Credit: 10,373,795 RAC: 13

After installing Lunatics 0.38, in addition to CUDA 6.08 I now have CUDA 6.09 and CUDA 6.10_fermi. Is there any reason to reschedule WUs from 6.08 to 6.09 or 6.10, and if so, which of them would be better?
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4

> After installing Lunatics 0.38, in addition to CUDA 6.08 I now have CUDA 6.09 and CUDA 6.10_fermi. Is there any reason to reschedule WUs from 6.08 to 6.09 or 6.10, and if so, which of them would be better?

It doesn't matter which plan class/app version you reschedule to, as all your CUDA WUs will use the x38g CUDA app anyway.

Claggy
Belthazor Joined: 6 Apr 00 Posts: 219 Credit: 10,373,795 RAC: 13

OK, and a second question: why has the "remaining" time suddenly jumped from about two hours to 25 hours? I just hope the actual run time won't change as well...
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4

> OK, and a second question: why has the "remaining" time suddenly jumped from about two hours to 25 hours?

Which host did you install the CUDA app on? The only CUDA host you have doesn't meet the requirements for running the x38g app. The 0.38 installer says: (x38g using Cuda32: requires driver 263.06+)

Claggy
Belthazor Joined: 6 Apr 00 Posts: 219 Credit: 10,373,795 RAC: 13

I don't know what Acer did, but I can't install any drivers from the Nvidia site except v186. So should I uninstall Lunatics?
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4

> I don't know what Acer did, but I can't install any drivers from the Nvidia site except v186.

The x38g CUDA app will be running in CPU fallback mode, hugely slower than even the stock CPU app. I was going to suggest using the 0.37 installer and the x32f app, but that requires 197.13+ (Cuda 3.0) drivers. Best to modify your app_info to use the stock app instead; PM me your app_info and I'll modify it for you.

Claggy
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0

> OK, and a second question: why has the "remaining" time suddenly jumped from about two hours to 25 hours?

When it was running stock, the servers were supplying a <flops> value of about 2.4e10 for the 6.08 GPU application. For anonymous platform, the core client will supply a <flops> of about 2.1e09 based on the CPU Whetstone benchmark, unless you put <flops> values in app_info.xml. As the ratio isn't much more than 10, the system will be able to adapt fairly well even without being told a reasonable <flops>. But there may be some interesting effects until it does adapt, particularly around the time there are 10 validated tasks which were sent after the transition to anonymous platform.

Joe
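For anyone wanting to try it, a <flops> entry sits inside the matching <app_version> block of app_info.xml. A minimal sketch using the ~2.4e10 figure quoted above for the 6.08 GPU app (the value, app name, and version shown are illustrative; substitute the figures for your own setup, and note the file_ref entries of a real app_info.xml are omitted here):

```xml
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>608</version_num>
    <!-- Claimed speed of this app version, in FLOPS. Without this line
         the client falls back to the CPU Whetstone benchmark (~2.1e9 on
         this host), which makes GPU estimates roughly 10x too long. -->
    <flops>2.4e10</flops>
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>
</app_version>
```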
Belthazor Joined: 6 Apr 00 Posts: 219 Credit: 10,373,795 RAC: 13

At any rate, it was a dubious adventure: I lost my whole stock of WUs and now have about 50 ghosts. When they expire, will it be a catastrophe for my RAC?
kittyman Joined: 9 Jul 00 Posts: 51469 Credit: 1,018,363,574 RAC: 1,004

> At any rate, it was a dubious adventure: I lost my whole stock of WUs and now have about 50 ghosts. When they expire, will it be a catastrophe for my RAC?

They won't affect your RAC; only work returned without errors and validated adds to your RAC. Errored or aborted work will not reduce it, other than not adding anything to your totals.

"Freedom is just Chaos, with better lighting." Alan Dean Foster
Mike Joined: 17 Feb 01 Posts: 34264 Credit: 79,922,639 RAC: 80

> OK, and a second question: why has the "remaining" time suddenly jumped from about two hours to 25 hours?

That doesn't work with ATI cards. After 4 weeks of crunching without flops values, estimated times are still 6 times higher than the real times.

With each crime and every kindness we birth our future.
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0

> OK, and a second question: why has the "remaining" time suddenly jumped from about two hours to 25 hours?

It's not a matter of ATI or NVIDIA; any GPU which is much more than 10 times as fast as the CPU Whetstone benchmark will suffer the same. Quoting an extract from Dr. Anderson's checkin note for changeset [trac]changeset:21164[/trac]:

> projected FLOPS: the scheduler's best guess as to what will satisfy

Those of you who refuse to "fiddle with" the value are condemning your hosts to sending a flops value usually far less than the GPU is actually producing. That's your choice; the system will probably supply enough GPU work to keep them busy unless there's an unexpected outage. My preference is to give a reasonable approximation of what an application will do so the system works as intended. I never expect BOINC to be able to rescind the old GIGO principle.

Joe
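The effect being discussed follows from the basic estimate: remaining time is roughly rsc_fpops_est divided by the projected FLOPS. A toy sketch (the task size is a made-up round number, not a real rsc_fpops_est, and DCF and other corrections are ignored):

```python
def estimated_runtime_hours(rsc_fpops_est, projected_flops):
    """Rough BOINC-style estimate: task size / claimed speed, in hours."""
    return rsc_fpops_est / projected_flops / 3600.0

# Illustrative figures: a hypothetical task size, with the server-supplied
# speed vs. the CPU Whetstone fallback quoted earlier in the thread.
task_size = 2.0e14          # hypothetical rsc_fpops_est
server_flops = 2.4e10       # value the servers supplied for the 6.08 GPU app
whetstone_flops = 2.1e9     # fallback from the CPU benchmark

print(estimated_runtime_hours(task_size, server_flops))     # ~2.3 hours
print(estimated_runtime_hours(task_size, whetstone_flops))  # ~26 hours
```

The roughly tenfold drop in claimed speed is exactly the "two hours to 25 hours" jump reported above.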
Mike Joined: 17 Feb 01 Posts: 34264 Credit: 79,922,639 RAC: 80

Sorry, but I disagree on that, Joe. I say the code simply doesn't work on GPUs. An ATI 4850 isn't twice as fast and doesn't adjust like it should.

With each crime and every kindness we birth our future.
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0

Well, on the i7-2600 + 2 ATI 5870 GPUs, I see that times on both MB and AstroPulse aren't that far off. And I don't use FLOPS entries; I have tried them: the speed indicated by BOINC (2270 GFLOPS) made the estimated time decrease too much, though temporarily good for work fetch, maybe? They adjusted fairly quickly after I stopped changing cmd-line settings and ran 1 AstroPulse per GPU and 2 MB WUs: rev.177 on the GPU and rev.365* (SSSE3x) on the CPU, and SSE3+HD5 on the ATI (AMD) GPU. *(Beta MB app rev.365: SSSE3x for CPU and SSE3 for GPU.) I run 2 MBs with period_iteration (for pulsefind?) set to 1, which stresses the GPU: 80% load. I lowered the core voltage and core speed, also the memory speed. I got home and saw the SmartDoctors, minimized, flashing: temp was 92C. Soon after they throttled down, and I lowered the settings. The rev.365 app, 2 at a time, looks OK and a candidate for an installer?! Never noticed this before, but period_iteration 1 is the fastest setting on ATI 5870 GPU(s).
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0

> Sorry, but I disagree on that, Joe.

Your host with the ATI card has a Whetstone benchmark around 3e09, and you're running 2 MB tasks at a time on the GPU. That means the core client will tell the scheduler the GPU MB application has a <flops> of about 1.5e09. It's your choice whether to allow that kind of terrible estimate to be sent or not. My choice would be to correct it. But I like things to be stable rather than exciting...

Joe
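The arithmetic behind that 1.5e09 figure, as a sketch (this illustrates the effect described, not the actual client code): with no <flops> entry, the client only has the Whetstone benchmark, scaled by the fraction of a GPU each task uses.

```python
def fallback_flops(whetstone_flops, gpu_fraction_per_task):
    # With no <flops> in app_info.xml, the per-task speed estimate is
    # derived from the CPU benchmark; running 2 tasks per GPU (0.5 GPU
    # each) halves that figure again.
    return whetstone_flops * gpu_fraction_per_task

print(fallback_flops(3.0e9, 0.5))  # 1500000000.0, i.e. 1.5e9 claimed
```

A terrible claim for a GPU that is actually well over ten times faster than the CPU, which is why the estimates never settle.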
Mike Joined: 17 Feb 01 Posts: 34264 Credit: 79,922,639 RAC: 80

> Sorry, but I disagree on that, Joe.

I wasn't talking about my host, Joe. Last week my estimated times were 2:20 while running VLARs which take 3000-4000 seconds. This week I'm running mid-range tasks which take 1600-1800 seconds each, and estimated times are 3:35. That's after a couple of thousand units finished. Not what I would call stable. Maybe I'm not the patient kind of guy.

With each crime and every kindness we birth our future.
Highlander Joined: 5 Oct 99 Posts: 167 Credit: 37,987,668 RAC: 16

Since this patch http://boinc.berkeley.edu/trac/changeset/24128 there has been no stable estimate adjustment on app_info (anonymous) platforms when running a CPU+GPU setup without flops entries. Estimation worked well before, but since that patch I've had to set up a 1+10 cache to get a real-time cache of 2 days :-/

- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0

---[SNIPPED]---

Joe is right, and I certainly will change this. Stability is also more important than excitement. This (ATI) host is already in constant short supply of work, which can be a result of the lower FLOPS estimate, if I understand this correctly. If I set a FLOPS entry of 3.0e+9, should the server/scheduler respond more realistically? Or I'll change it to 1 MB per GPU; I already switched to 1 AstroPulse task per GPU, because too many times I see 1 WU running on 0.5 GPU because there aren't enough.
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0

> Sorry, but I disagree on that, Joe.

The rsc_fpops_est values produced by the splitters were calibrated to approximately match a typical CPU a few years ago, based on the Estimates and Deadlines revisited thread. For OpenCL ATI work it is not a good match at VLAR, and I hope that before S@H v7 goes live here, enough data can be gathered to revise those estimates to work better across the range of different kinds of processing. There's too much variation from system to system to expect it ever to be consistently accurate, but improvement is certainly possible.

The dominant AR range for work here does tend to vary from week to week or month to month. The server-side average for APR can track that kind of change and compensate, though it takes about 300 validations to adapt. There are delays which affect that, of course: both your own host's cache size and how soon the wingmate reports. But if APR is allowed to function, it should be able to keep estimates reasonable most of the time.

APR does not function if a host is saying its flops are less than 1/10 of APR, per changeset [trac]changeset:24217[/trac], which replaced changeset [trac]changeset:24128[/trac]. For most anonymous platform hosts with GPUs, the current situation is that the APR for MB CPU work is less than 10 times the Whetstone benchmark, but the APR for MB GPU work is much more than 10 times the benchmark. So without <flops> in app_info.xml, each completed CPU task puts DCF near 1.0, then subsequent GPU tasks gradually lower it toward a small fractional value. That sawtooth pattern of course makes what is actually happening unclear.

Joe
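The sawtooth can be seen with a toy model (a simplified stand-in for the client's duration-correction behaviour, not BOINC's exact formula): DCF jumps up immediately when a task overruns its estimate, and drifts down slowly when a task finishes early.

```python
def update_dcf(dcf, runtime, estimate):
    """Toy DCF update: rise immediately on overruns, fall slowly on underruns.
    (Simplified illustration, not the actual BOINC client code.)"""
    ratio = runtime / estimate
    if ratio > dcf:
        return ratio                      # overruns correct at once
    return dcf + 0.1 * (ratio - dcf)      # underruns close only 10% of the gap

# A CPU task with an accurate estimate, then five GPU tasks whose
# estimates are 10x too long because flops fell back to Whetstone,
# then another CPU task: DCF drifts down, then snaps back to 1.0.
dcf = 1.0
for runtime, estimate in [(3600, 3600)] + [(360, 3600)] * 5 + [(3600, 3600)]:
    dcf = update_dcf(dcf, runtime, estimate)
    print(round(dcf, 3))
```

Each repetition of that cycle produces one tooth of the sawtooth, so the displayed remaining times swing back and forth instead of settling.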
JohnDK Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127

I've kept flops in all along; it only needs adjusting a few times once you have it about right, and my DCF is around 1.0 most of the time on all my PCs. I tried without flops on my latest PC and it just doesn't work: either CPU or GPU is correct but not both, so in the end I put flops in.
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0

I've set a 1.0e+11 FLOPS value on the ATI 5870 GPUs, which is quite realistic too; BOINC reports 2.7e+12 (2720 GFLOPS) for each 5870. I'll watch this host and see if it gets more work too. It did download work right after I changed this to the above value.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.