BOINC 6.6.20 and CUDA and AP

SmartWombat
Joined: 9 Jan 04
Posts: 64
Credit: 6,577,011
RAC: 0
United Kingdom
Message 889291 - Posted: 29 Apr 2009, 0:24:57 UTC

Just upgraded to 6.6.20 and immediately got a lot of problems.
File has wrong size, calculation error, only two CPUs in use on a 4-core system, remaining work time in the thousands of hours, all tasks running as high priority, only AP work units downloaded, and CUDA reporting 0.15 CPUs.

So far I have:
Upgraded the NVIDIA GeForce 9600 GT drivers to the latest WHQL release:
CUDA device: GeForce 9600 GT (driver version 18250, CUDA version 1.1, 512MB, est. 39GFLOPS)
Why does BOINC report CUDA 1.1 when I downloaded and installed what were linked to as CUDA 2.0 drivers?

With previous drivers the GPU speed was double:
CUDA device: GeForce 9600 GT (driver version 17116, CUDA version 1.1, 512MB, est. 78GFLOPS)
Any idea why the speed has dropped?


Created cc_config.xml in the data directory C:\Documents and Settings\All Users\Application Data\BOINC,
set <zero_debts>1</zero_debts>, started BOINC, stopped it, set <zero_debts>0</zero_debts> and restarted.
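For anyone who hasn't made one of these files before, here is a minimal sketch of what that step produces. It assumes the usual cc_config.xml layout, where flags such as <zero_debts> sit inside an <options> element under a <cc_config> root. The helper name build_cc_config is made up for illustration, and it only builds the XML text; it does not write anything into the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(zero_debts: bool) -> str:
    """Return cc_config.xml text with the <zero_debts> flag set to 1 or 0."""
    root = ET.Element("cc_config")
    # Assumption: option flags live under <options> inside <cc_config>.
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "zero_debts")
    flag.text = "1" if zero_debts else "0"
    return ET.tostring(root, encoding="unicode")


# First start: zero the debts. Second start: put the flag back to 0.
print(build_cc_config(True))
print(build_cc_config(False))
```

The text from the first call would be saved as cc_config.xml for one start of the client, then replaced with the text from the second call before restarting.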

Looks like I'm at 10% progress in 6h17m, so the estimated 2191 hours remaining is a bit over the top.
Will that resolve itself with time?
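As a sanity check on that figure, here is a rough sketch that assumes progress is linear in time. That is a plain straight-line extrapolation, not how BOINC itself arrives at its estimate, and the function name remaining_hours is just for illustration. With 10% done in 6h17m it gives roughly 56 and a half hours remaining, nowhere near 2191.

```python
def remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Straight-line extrapolation: the rest of the task runs at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


elapsed = 6 + 17 / 60  # 6h17m, as reported above
print(f"{remaining_hours(elapsed, 0.10):.2f} hours remaining")
```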


PAul

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 889369 - Posted: 29 Apr 2009, 7:13:27 UTC - in response to Message 889291.  

Why does BOINC report CUDA 1.1 when I downloaded and installed what were linked to as CUDA 2.0 drivers?

It doesn't report the driver version there, but the CUDA compute capability of your GPU as detected by the NVIDIA API. Compare it to DirectX: if you have a DirectX 7 compliant video card and install DirectX 9, the card will only use the DX7 portion of DX9, not all of DX9, as it isn't capable of using that.

If you want it to show Capability 2.0 you'd need a GTX 280 and higher.

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 889383 - Posted: 29 Apr 2009, 8:11:45 UTC - in response to Message 889291.  

Just upgraded to 6.6.20 and immediately got a lot of problems.

I did not see any of the 6.6.20 specific problems I am aware of in your list, though I am mostly familiar with them as they present on GPU Grid.

Be advised that there are quite a few issues with the 6.6.x series, several of which we have now traced back to as early as 6.6.15.

*MY* experience (YMMV) is that the only "safe" and reliable version of BOINC to use with CUDA has been 6.5.0; all others have one problem or another ...

For example, 6.6.20 introduced a problem with some GPU Grid tasks taking up to 4 times as long to run. That was seemingly fixed in 6.6.23, which unfortunately seemed to make the work fetch imbalance bug worse (still not addressed). Then 6.6.24 introduced a new problem, which can be worked around with 6.6.25 and a new setting ...

We may have identified the issue with 6.6.24 and later not liking the second GPU in multi-GPU settings (memory size) though to this point the diagnostic print to log has still not been accepted ...

ANYWAY, if 6.6.20 works, cool ... but, so far, experience at GPU Grid is that the last "reliable" and low maintenance version is 6.5.0 ... as always YMMV ...

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 889403 - Posted: 29 Apr 2009, 10:07:20 UTC

On the other hand, v6.5.0 has the problem that you can't run the same application (setiathome_enhanced - aka MB) on both the CPU and the GPU without complicated workarounds.

Paul's experience is predominantly with computers connected to a large and diverse range of different projects - v6.5.0 may well be the best so far in that scenario.

I prefer to crunch a smaller range of projects, and I've found v6.6 - once the initial setting up hiccups, which can be considerable (especially with optimised applications) have been overcome - to be far more satisfactory. There are still bugs to be ironed out - v6.6.23 is better than v6.6.20 - but for SETI enthusiasts I would recommend persevering with the upgrade.

Horses for courses - YMMV.

MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 889414 - Posted: 29 Apr 2009, 11:08:58 UTC - in response to Message 889291.  

As Jord has mentioned, the "CUDA version" isn't in fact the CUDA version but what is known as the compute capability. The message has been changed in 6.6.24 (onwards) to reflect this. As you have noticed, it's misleading.

The latest "release" drivers (182.50) are CUDA 2, but I'm not sure if they are 2.0 or 2.1.

As Richard has mentioned, 6.6.23 has a couple of fixes for CUDA stuff to do with freeing video memory, etc., and seems to be a bit more reliable than 6.6.20, but it is still a development version. As he says, "horses for courses".

As for the estimated GFLOPS, that would most likely be down to the drivers, with the earlier ones probably misreporting it. From memory, you should have 180.48 as the minimum driver version to run S@H.

Estimates, well, they should improve in time. It sounds like you may have some of the VLAR work units, which the CUDA app is really bad at. They take a much longer time to process than normal work units. There are a few workarounds for them, none of which I use. I just leave them to work their way through. We have made a suggestion to the project admin staff of a way to work around them too, but haven't heard anything back. They crunch a lot quicker on the CPU than on CUDA.

As for only using 2 CPUs, check your preferences. It should say "use at most 100%" of the CPUs. It sounds like yours may be set to 50% for some reason.

CUDA using 0.15 CPUs: that's good. It's meant to use only 10-15% of a single processor, which means the remaining 85% is available for CPU-based crunching. You will notice it says "(.15 cpus, 1 cuda)" under the status column. That means it is using a minimal amount of CPU, but using one CUDA device (video card).
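To put numbers on the last two points, here is a back-of-the-envelope sketch. It assumes the "use at most N%" preference simply scales the core count and rounds down (with a floor of one CPU); that rounding rule is my assumption, not something confirmed from the BOINC source, and both function names are made up for illustration.

```python
import math


def usable_cpus(n_cpus: int, max_percent: float) -> int:
    """CPUs left under a 'use at most N%' preference (assumed: round down, at least 1)."""
    return max(1, math.floor(n_cpus * max_percent / 100))


def cpu_share_left(cuda_cpu_fraction: float) -> float:
    """Share of one core still free while a CUDA task reserves part of it."""
    return 1.0 - cuda_cpu_fraction


print(usable_cpus(4, 50))    # a 50% setting on a 4-core box
print(usable_cpus(4, 100))   # the setting it should have
print(f"{cpu_share_left(0.15):.0%} of one core left for CPU crunching")
```

Under that assumption, a 50% setting on a 4-core machine gives exactly the two-CPU symptom described, and a task shown as "(.15 cpus, 1 cuda)" leaves 85% of one core free.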

SmartWombat
Joined: 9 Jan 04
Posts: 64
Credit: 6,577,011
RAC: 0
United Kingdom
Message 889482 - Posted: 29 Apr 2009, 16:02:52 UTC

Thanks everyone.
I'll keep updating versions then, and hope that the estimates sort themselves out.
Currently I'm running with the GPU turned off in cc_config, so I'll re-enable CUDA once I've got the basic CPU compute working smoothly.
There are so many layers of preferences, I'll go back and check they're all set for my new system.
I replaced a 2 core with a 4 core (and it will go to 8 cores with a second CPU, probably next month), so I suspect the settings at one level of preference are from my old machine.

PAul

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 889588 - Posted: 29 Apr 2009, 20:29:19 UTC - in response to Message 889403.  

I prefer to crunch a smaller range of projects, and I've found v6.6 - once the initial setting up hiccups, which can be considerable (especially with optimised applications) have been overcome - to be far more satisfactory. There are still bugs to be ironed out - v6.6.23 is better than v6.6.20 - but for SETI enthusiasts I would recommend persevering with the upgrade.


Your other comments are true ... but I also have contacts with people who are running more restricted sets of projects, AND they also have problems with 6.6.20 ... including a new one for me today where a person running only SaH AP also ran into the 6.6.20 run time bug where their run times were 2-4 times the normal ... they changed back to 6.4.5 (no CUDA) and the run times started dropping.

SO, I am even more leery about 6.6.20 ...

But if it works for you ... I'm happy ... I am just cautioning ... note that the 6.6.20 bug is NOT consistent and may allow you to run for some time before striking. I do not know what causes the issue to arise.

mimo
Volunteer tester
Joined: 7 Feb 03
Posts: 92
Credit: 14,957,404
RAC: 0
Slovakia
Message 889609 - Posted: 29 Apr 2009, 21:49:07 UTC - in response to Message 889369.  

Why does BOINC report CUDA 1.1 when I downloaded and installed what were linked to as CUDA 2.0 drivers?

It doesn't report the driver version there, but the CUDA compute capability of your GPU as detected by the NVIDIA API. Compare it to DirectX: if you have a DirectX 7 compliant video card and install DirectX 9, the card will only use the DX7 portion of DX9, not all of DX9, as it isn't capable of using that.

If you want it to show Capability 2.0 you'd need a GTX 280 and higher.


Compute capability 2.0 doesn't exist for now; GTX 2xx cards have compute capability 1.3.

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 889621 - Posted: 29 Apr 2009, 22:30:41 UTC - in response to Message 889588.  

... including a new one for me today where a person running only SaH AP also ran into the 6.6.20 run time bug where their run times were 2-4 times the normal ...

Don't know if this is related, or even if it is new, but while I was running 6.6.20 (until a few days ago) I noticed that some non-VLAR CUDA WUs seemed to be taking longer than expected. I happened to catch a pair in the act on my GTX295: following a run of WUs from the same batch that all crunched in about 8 minutes each, this pair were at about 50% after 11 minutes. On consulting Speedfan I saw a decline in temps of the upper 2 traces, which are the GPU cores. The right-hand side of the plot from Speedfan shows what happened when I stopped and restarted the BOINC client. Those 2 WUs finished in a shade over 16 minutes rather than the 20+ minutes that I had noted for other "slackers" before, and the following WUs (still from the same batch) reverted to 8 minute crunch times.
I had already contemplated moving to 6.6.23 to get away from the frustration of trying to locate particular WUs in my Task List because of the permanent EDF mode for CUDA tasks on 6.6.20. This kicked me into action and I have not (so far) noticed this effect on 6.6.23.

F.

SmartWombat
Joined: 9 Jan 04
Posts: 64
Credit: 6,577,011
RAC: 0
United Kingdom
Message 889635 - Posted: 29 Apr 2009, 23:01:03 UTC

Things are looking good.

Re-enabled the GPU and I now have four AP tasks running, one on each core.
Plus the GPU is ripping through S@H workunits in about 24 minutes.
Seems the estimation is getting better, as one of the AP WUs is now NOT running as high priority - indicating that one at least is out of panic mode.

I'm not trying to watch the F1 race from BBC iPlayer while running BOINC :)
I can browse the web and read email while BOINC is using all 4 cores and the GPU which is what I expected.

So it looks like 6.6.20 has recovered from its initial issues and is running smoothly. Now if I said that out loud it would HCF immediately ...
PAul

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 889637 - Posted: 29 Apr 2009, 23:08:12 UTC - in response to Message 889635.  

I'm not trying to watch the F1 race from BBC iPlayer while running BOINC :)

Ahh - that's the advantage of the GTX295 - I can watch BBC iPlayer (in its "high res" mode) whilst crunching on both GPU cores and all 4 CPU cores ;)

Good to hear you got it working properly on your rig.

F.