GPU in the Lunatics' apps

Message boards : Number crunching : GPU in the Lunatics' apps
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164473 - Posted: 22 Oct 2011, 12:39:40 UTC

After installing Lunatics 0.38, in addition to CUDA 6.08 I have CUDA 6.09 and CUDA 6.10_fermi. Is there any reason to reschedule WUs from 6.08 to 6.09 or 6.10, and if so, which of them would be better?
ID: 1164473
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1164474 - Posted: 22 Oct 2011, 12:47:04 UTC - in response to Message 1164473.  
Last modified: 22 Oct 2011, 12:47:42 UTC

After installing Lunatics 0.38, in addition to CUDA 6.08 I have CUDA 6.09 and CUDA 6.10_fermi. Is there any reason to reschedule WUs from 6.08 to 6.09 or 6.10, and if so, which of them would be better?


It doesn't matter which plan class/app version you reschedule to, as all your Cuda WUs will use the x38g Cuda app anyway.

Claggy
ID: 1164474
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164475 - Posted: 22 Oct 2011, 12:53:55 UTC
Last modified: 22 Oct 2011, 12:55:20 UTC

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...
ID: 1164475
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1164476 - Posted: 22 Oct 2011, 13:06:50 UTC - in response to Message 1164475.  
Last modified: 22 Oct 2011, 13:07:28 UTC

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

Which host did you install the Cuda app on? The only Cuda host you have doesn't meet the requirements for running the x38g app.

The 0.38 Installer says: (x38g using Cuda32: requires driver 263.06+)

Claggy
ID: 1164476
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164477 - Posted: 22 Oct 2011, 13:10:58 UTC

I don't know what Acer did, but I can't install any drivers from the Nvidia site except ver. 186.
So shall I uninstall Lunatics?
ID: 1164477
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1164478 - Posted: 22 Oct 2011, 13:16:31 UTC - in response to Message 1164477.  
Last modified: 22 Oct 2011, 13:18:04 UTC

I don't know what Acer did, but I can't install any drivers from the Nvidia site except ver. 186.
So shall I uninstall Lunatics?

The x38g Cuda app will be running in CPU fallback mode, hugely slower than even the Stock CPU app.

I was going to suggest using the 0.37 installer and the x32f app, but that requires 197.13+ (Cuda 3.0) drivers.

Best to modify your app_info to use the Stock app instead; PM me your app_info and I'll modify it for you.
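For anyone curious what that change involves, an anonymous-platform app_info.xml section pointing at a stock CPU application generally has the shape sketched below. This is only an illustrative outline, not the exact Lunatics layout: the application name shown is the usual one for Multibeam, but the executable file name and version number are placeholders and must match the files actually present in your project directory.

<app_info>
  <app>
    <name>setiathome_enhanced</name>
  </app>
  <file_info>
    <name>stock_cpu_app.exe</name>   <!-- placeholder: the stock executable on your host -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>   <!-- placeholder: match the stock app's version -->
    <file_ref>
      <file_name>stock_cpu_app.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>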

Claggy
ID: 1164478
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164501 - Posted: 22 Oct 2011, 15:20:44 UTC - in response to Message 1164475.  

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

When it was running stock, the servers were supplying a <flops> value of about 2.4e10 for the 6.08 GPU application. For anonymous platform, the core client will be supplying a <flops> of about 2.1e09 based on the CPU Whetstone benchmark unless you put the <flops> values in app_info.xml.

As the ratio isn't much more than 10, the system will be able to adapt fairly well even without being told a reasonable <flops>. But there may be some interesting effects until it does adapt, particularly around the time there are 10 validated tasks which were sent after the transition to anonymous platform.
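For those who do want to supply it, the <flops> element sits inside the matching <app_version> block of app_info.xml. A minimal sketch, using the roughly 2.4e10 figure quoted above for the 6.08 GPU application; the file name is a placeholder, and the other elements must mirror what is already in your app_info:

<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>608</version_num>
  <plan_class>cuda</plan_class>
  <flops>2.4e10</flops>   <!-- projected speed of this app on this host -->
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
  <file_ref>
    <file_name>cuda_app.exe</file_name>   <!-- placeholder -->
    <main_program/>
  </file_ref>
</app_version>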
                                                                  Joe
ID: 1164501
Profile Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 219
Credit: 10,373,795
RAC: 13
Russia
Message 1164512 - Posted: 22 Oct 2011, 15:49:04 UTC

At any rate, it was a dubious adventure; I lost all my stock of WUs and now I have about 50 ghosts. When they expire, will it be a catastrophe for my RAC?
ID: 1164512
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1164513 - Posted: 22 Oct 2011, 15:52:38 UTC - in response to Message 1164512.  

At any rate, it was a dubious adventure; I lost all my stock of WUs and now I have about 50 ghosts. When they expire, will it be a catastrophe for my RAC?

They won't affect your RAC; only work returned without errors and validated adds to your RAC. Errored or aborted work will not reduce it, other than by not adding anything to your totals.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1164513
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1164521 - Posted: 22 Oct 2011, 16:16:07 UTC - in response to Message 1164501.  

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

When it was running stock, the servers were supplying a <flops> value of about 2.4e10 for the 6.08 GPU application. For anonymous platform, the core client will be supplying a <flops> of about 2.1e09 based on the CPU Whetstone benchmark unless you put the <flops> values in app_info.xml.

As the ratio isn't much more than 10, the system will be able to adapt fairly well even without being told a reasonable <flops>. But there may be some interesting effects until it does adapt, particularly around the time there are 10 validated tasks which were sent after the transition to anonymous platform.
                                                                  Joe


That doesn't work with ATI cards.
After 4 weeks of crunching without flops values, the estimated times are still 6 times higher than the real ones.



With each crime and every kindness we birth our future.
ID: 1164521
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164532 - Posted: 22 Oct 2011, 17:00:38 UTC - in response to Message 1164527.  

Ok. And the second question: why has the "remaining" time changed sharply from about two hours to 25 hours? I only hope that the real time, at least, will not change...

When it was running stock, the servers were supplying a <flops> value of about 2.4e10 for the 6.08 GPU application. For anonymous platform, the core client will be supplying a <flops> of about 2.1e09 based on the CPU Whetstone benchmark unless you put the <flops> values in app_info.xml.

As the ratio isn't much more than 10, the system will be able to adapt fairly well even without being told a reasonable <flops>. But there may be some interesting effects until it does adapt, particularly around the time there are 10 validated tasks which were sent after the transition to anonymous platform.
                                                                  Joe

That doesn't work with ATI cards.
After 4 weeks of crunching without flops values, the estimated times are still 6 times higher than the real ones.

Yes, same here with my ATI 4850. Before the latest f'up with the completion times/DCF/APR, my ATI running 2 APs at a time was spot on with its completion estimates. It does 2 APs in about 8 hours. Now, though, the estimate is between 24 and 31 hours (depending on what the CPU APs complete in), and it does not seem to adjust down much at all over the weeks. The CPU APs, though, have settled at reasonable estimates.

Edit: However, as long as I can keep the ATI card fed without having to reschedule, I refuse to fiddle with any flops.

It's not a matter of ATI or NVIDIA; any GPU which is much more than 10 times as fast as the CPU Whetstone benchmark will suffer the same.

Quoting some extracts from Dr. Anderson's check-in note for changeset [trac]changeset:21164[/trac]:

projected FLOPS: the scheduler's best guess as to what will satisfy
X * elapsed_time = wu.rsc_fpops_est;
this is used to make server-side runtime estimates,
...
the <flops> field in app_info.xml files should be projected FLOPS.

Those of you who refuse to "fiddle with" the value are condemning your hosts to sending a flops value usually far less than the GPU is actually producing. That's your choice; the system will probably supply enough GPU work to keep them busy unless there's an unexpected outage. My preference is to give a reasonable approximation of what an application will do so the system works as intended; I never expect BOINC to be able to rescind the old GIGO principle.
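Put concretely, with made-up numbers just to show the arithmetic: projected FLOPS is roughly the task's rsc_fpops_est divided by the elapsed time the application typically needs for it.

<!-- hypothetical example: a task with rsc_fpops_est of 3.0e13 that the GPU app
     usually finishes in about 1200 seconds gives 3.0e13 / 1200 = 2.5e10 -->
<flops>2.5e10</flops>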
                                                                 Joe
ID: 1164532
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1164618 - Posted: 22 Oct 2011, 21:48:10 UTC

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.



With each crime and every kindness we birth our future.
ID: 1164618
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1164631 - Posted: 22 Oct 2011, 22:32:02 UTC - in response to Message 1164618.  
Last modified: 22 Oct 2011, 23:00:00 UTC

Well, I see that on the i7-2600 + 2 ATI 5870 GPUs, times on both MB and AstroPulse aren't that far off. And I don't use FLOPS entries; I have tried them, using the speed indicated by BOINC (2270 GFLOPS), and that made the estimated times decrease too much, though it was perhaps temporarily good for work fetch?
They adjusted fairly quickly after I stopped changing command-line settings and ran 1 AstroPulse per GPU and 2 MB WUs — rev.177 on the GPU, and rev.365* (SSSE3x) on the CPU and SSE3+HD5 on the ATI (AMD) GPU.

*(Beta MB app rev.365, (SSSE3x) for CPU and (SSE3) for GPU.)

I run 2 MBs with period_iteration (for pulsefind?) set to 1, which stresses the GPU to 80% load. I lowered the core voltage and core speed, and also the memory speed. I got home and saw the SmartDoctors flashing while minimized. The temperature was 92C; soon after, they throttled down and I lowered the settings. The rev.365 app running 2 at a time looks OK and a candidate for an installer?!
I never noticed this before, but period_iteration 1 is the fastest setting on ATI 5870 GPU(s).
ID: 1164631
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164671 - Posted: 23 Oct 2011, 1:29:13 UTC - in response to Message 1164618.  

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.

Your host with the ATI card has a Whetstone benchmark around 3e09, and you're running 2 MB tasks at a time on the GPU. That means the core client will tell the Scheduler the GPU MB application has a <flops> of about 1.5e09. It's your choice whether to allow that kind of terrible estimate to be sent or not. My choice would be to correct it. But I like things to be stable rather than exciting...
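For anyone who does want to correct it while still running two tasks per GPU, the relevant pieces of the app_version block would look something like the sketch below. The flops figure is only a placeholder for whatever the GPU application actually delivers, and the 0.5 count is what tells BOINC that two tasks share one GPU; everything else should stay as it already is in your app_info.xml.

<app_version>
  <app_name>setiathome_enhanced</app_name>
  <flops>3.0e10</flops>   <!-- placeholder: the GPU app's real throughput, not the Whetstone value -->
  <coproc>
    <type>ATI</type>
    <count>0.5</count>    <!-- two tasks sharing one GPU -->
  </coproc>
</app_version>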
                                                                   Joe
ID: 1164671
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1164700 - Posted: 23 Oct 2011, 6:28:43 UTC - in response to Message 1164671.  
Last modified: 23 Oct 2011, 6:32:45 UTC

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.

Your host with the ATI card has a Whetstone benchmark around 3e09, and you're running 2 MB tasks at a time on the GPU. That means the core client will tell the Scheduler the GPU MB application has a <flops> of about 1.5e09. It's your choice whether to allow that kind of terrible estimate to be sent or not. My choice would be to correct it. But I like things to be stable rather than exciting...
                                                                   Joe


I wasn't talking about my host, Joe.

Last week my estimated times were 2:20 whilst running VLARs, which take 3000 - 4000 seconds.
This week I'm running mid-range tasks which take 1600 - 1800 seconds each, and the estimated times are 3:35.
That's after a couple of thousand units finished.
Not what I would call stable.

Maybe I'm not the patient kind of guy.


With each crime and every kindness we birth our future.
ID: 1164700
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1164705 - Posted: 23 Oct 2011, 7:22:54 UTC

Since this patch http://boinc.berkeley.edu/trac/changeset/24128 there has been no stable estimate adjustment on app_info (anonymous platform) hosts running a CPU+GPU setup (without flops entries).

The estimation worked well before, but since that patch I have had to set up a 1+10 cache to get an actual cache of 2 days :-/.
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1164705
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1164748 - Posted: 23 Oct 2011, 12:06:29 UTC - in response to Message 1164705.  
Last modified: 23 Oct 2011, 12:35:36 UTC

---[SNIPPED]---

Those of you who refuse to "fiddle with" the value are condemning your hosts to sending a flops value usually far less than the GPU is actually producing. That's your choice; the system will probably supply enough GPU work to keep them busy unless there's an unexpected outage. My preference is to give a reasonable approximation of what an application will do so the system works as intended; I never expect BOINC to be able to rescind the old GIGO principle.


Joe is right, and I certainly will change this. Stability is also more important than excitement.
This (ATI) host is already in constant short supply of work, which can be a result of the lower FLOPS estimate, if I understand this correctly.
If I were to set a FLOPS entry of 3.0e+9, would the server/scheduler respond more realistically?
Or I'll change it to 1 MB per GPU; I've already switched to 1 AstroPulse task per GPU, because too often I see 1 WU running on 0.5 GPU because there aren't enough.
ID: 1164748
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1164797 - Posted: 23 Oct 2011, 18:40:00 UTC - in response to Message 1164700.  

Sorry, but I disagree on that, Joe.
I say the code simply doesn't work on GPUs.
An ATI 4850 isn't twice as fast, and it doesn't adjust like it should.

Your host with the ATI card has a Whetstone benchmark around 3e09, and you're running 2 MB tasks at a time on the GPU. That means the core client will tell the Scheduler the GPU MB application has a <flops> of about 1.5e09. It's your choice whether to allow that kind of terrible estimate to be sent or not. My choice would be to correct it. But I like things to be stable rather than exciting...
                                                                   Joe

I wasn't talking about my host, Joe.

Last week my estimated times were 2:20 whilst running VLARs, which take 3000 - 4000 seconds.
This week I'm running mid-range tasks which take 1600 - 1800 seconds each, and the estimated times are 3:35.
That's after a couple of thousand units finished.
Not what I would call stable.

Maybe I'm not the patient kind of guy.

The rsc_fpops_est values produced by the splitters were calibrated to approximately match a typical CPU a few years ago based on the Estimates and Deadlines revisited thread. For OpenCL ATI work it is not a good match at VLAR, and I hope before S@H v7 comes live here enough data can be gathered to revise those estimates to work better across the range of different kinds of processing. There's too much variation from system to system to expect it to ever be consistently accurate, but improvement is certainly possible.

The dominant AR range for work here does tend to vary from week to week or month to month. The server-side average for APR can track that kind of change and compensate, though it takes about 300 validations to adapt. There are delays which affect that, of course, both your own host's cache size and how soon the wingmate reports. But if APR is allowed to function, it should be able to keep estimates reasonable most of the time.

APR does not function if a host is saying its flops are less than 1/10 of APR, based on changeset [trac]changeset:24217[/trac], which replaced changeset [trac]changeset:24128[/trac]. For most anonymous platform hosts with GPUs, the current situation is that the APR for MB CPU work is less than 10 times the Whetstone benchmark, but the APR for MB GPU work is much more than 10 times the benchmark. So without <flops> in app_info.xml, each completed CPU task puts DCF near 1.0, then subsequent GPU tasks gradually lower it toward a small fractional value. That sawtooth pattern of course makes what is actually happening unclear.
                                                               Joe
ID: 1164797
JohnDK Crowdfunding Project Donor, Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1164801 - Posted: 23 Oct 2011, 19:07:54 UTC

I've kept flops in all along; it only needs adjusting a few times once you have it right the first time, and my DCF is around 1.0 most of the time on all my PCs. I tried without flops on my latest PC and it just doesn't work; either the CPU or the GPU is correct but not both, so at last I put flops in.
ID: 1164801
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1164803 - Posted: 23 Oct 2011, 19:14:47 UTC - in response to Message 1164797.  
Last modified: 23 Oct 2011, 19:35:06 UTC

I've set a 1.0e+11 FLOPS value on the ATI 5870 GPUs, which is quite realistic too;
BOINC reports 2.72e+12 (2720 GFLOPS) for each 5870.

I'll watch this host and see if it gets more work, too.

And it downloaded work right after I changed this to the above value.
ID: 1164803


 