Bug in server affecting older BOINC clients with NVIDIA GPUs.


Message boards : News : Bug in server affecting older BOINC clients with NVIDIA GPUs.

Eric Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1075
Credit: 7,762,805
RAC: 7,323
United States
Message 1271871 - Posted: 17 Aug 2012, 5:37:11 UTC
Last modified: 17 Aug 2012, 5:38:33 UTC

We've identified a bug in the current BOINC server that is online at SETI@home. With older BOINC clients this bug results in running multiple SETI@home GPU applications simultaneously on a single GPU.

While we debug and fix the problem we've suspended distribution of NVIDIA work. We hope that everything will be back to normal some time tomorrow.
____________

robertmiles
Joined: 16 Jan 12
Posts: 29
Credit: 580,684
RAC: 527
United States
Message 1271877 - Posted: 17 Aug 2012, 6:34:52 UTC - in response to Message 1271871.
Last modified: 17 Aug 2012, 6:35:21 UTC

Do the multiple workunits use some of the same GPU resources, or interfere in some other way? Some BOINC projects seem to let two or more OpenCL GPU workunits, but not multiple CUDA GPU workunits, share a single GPU after checking that they are not assigned any of the same resources within the GPU.

Francesco Forti
Joined: 24 May 00
Posts: 280
Credit: 131,274,885
RAC: 46,259
Switzerland
Message 1271902 - Posted: 17 Aug 2012, 8:13:59 UTC

Last week I had the same problem, but with BOINC 7.0.28 and Lunatics v0.40 32-bit.
As soon as I tried to run two instances of a GPU task on a new GT 640 (driver 301.42) using count 0.5, I got 80 GPU instances running and had to uninstall and reinstall everything. Other hosts, with older GTX cards, are able to run two instances.


____________

Shakir
Joined: 14 Aug 99
Posts: 3
Credit: 22,451,391
RAC: 23,629
Germany
Message 1271920 - Posted: 17 Aug 2012, 9:22:23 UTC - in response to Message 1271902.
Last modified: 17 Aug 2012, 9:24:46 UTC

I have a notebook where this problem occurs.
Starting more threads on the GPU isn't the problem. The biggest problem in my case is that the 10 SETI tasks on the GPU use around 2 GB of RAM, and the notebook is 100% in swap mode. I can't deactivate the GPU calculation either. There are also 8 regular CPU SETI tasks running, so the system hits the 4 GB RAM wall.
I have to deactivate calculation until this issue is fixed.

Thanks for working on it.
____________

Eric Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1075
Credit: 7,762,805
RAC: 7,323
United States
Message 1272118 - Posted: 17 Aug 2012, 15:59:37 UTC - in response to Message 1271877.
Last modified: 17 Aug 2012, 16:00:04 UTC

Do the multiple workunits use some of the same GPU resources, or interfere in some other way?


The main problem is that the BOINC client, depending upon the relative speed of your GPU and CPU, could decide to run as many as 10 GPU apps per CPU core simultaneously. If you've got 4 CPU cores, that's 40 GPU apps running at once. So no, we're not talking about running 2 or even 4 apps simultaneously on the GPU.
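As a rough illustration of the arithmetic above, here is a small sketch. The 10-apps-per-core figure is the worst case quoted in this post; the 1024 MB of GPU memory is a hypothetical example value, not a measurement from any affected host.

```python
def worst_case_gpu_apps(cpu_cores, apps_per_core=10):
    """Worst-case number of GPU apps the client might start at once."""
    return cpu_cores * apps_per_core


def memory_per_app_mb(gpu_memory_mb, running_apps):
    """Average GPU memory available to each app if they all share one card."""
    return gpu_memory_mb / running_apps


apps = worst_case_gpu_apps(cpu_cores=4)   # 4 cores * 10 apps per core
share = memory_per_app_mb(1024, apps)     # hypothetical 1 GB card (assumption)
print(f"{apps} GPU apps, about {share:.1f} MB of GPU memory each")
```

With that little memory per app, running out of GPU memory would plausibly be the first thing to go wrong, though the actual footprint of each app is not stated here.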

The possible results, in order of severity, could be: 1) The apps error out when the GPU runs out of memory. 2) Your GPU driver freezes, causing a reboot every time BOINC tries to run the apps. 3) Your GPU overheats and causes a reboot every time BOINC tries to run the apps.
____________

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 186
Credit: 25,422,366
RAC: 25,306
Canada
Message 1272124 - Posted: 17 Aug 2012, 16:09:10 UTC

I have BOINC client 7.0.31 installed on Windows 7 x64

I am also running x41x multibeam for the GPU app

Since the outage my 460 gtx is now taking 1.5 hours to complete two tasks

It usually takes 15 minutes

Now I was wondering what was happening as I hope I did not blow my card.
It does not heat up to its normal temps when crunching

My task manager is showing the six cpu apps running and the two x41x tasks running but that is all.

I am hoping it is the server doing this and not poor 460 gtx which has been my main crunching unit

Michael Miles

Eric Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1075
Credit: 7,762,805
RAC: 7,323
United States
Message 1272161 - Posted: 17 Aug 2012, 16:51:49 UTC - in response to Message 1272124.

We've installed a fix from David Anderson that we hope will solve the problem. If you have a BOINC version 7 client, the problem never affected you, and you can stop reading this now.

If you use BOINC version 6, you are probably affected. The fix we installed will not fix workunits that have already been downloaded. For that, you've got four options:
1) Abort all your CUDA tasks.
2) Upgrade to BOINC v7.
3) Exit BOINC and edit your client_state.xml to replace all occurrences of "<type>NVIDIA</type>" with "<type>CUDA</type>".
4) Just let it run and deal with a few reboots.
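For anyone who would rather script option 3 than do it by hand, a minimal sketch follows. It is an illustration only, not an official tool: the function names and the .bak backup are assumptions added for safety, and only the two tag strings come from the post above. It works on raw bytes so it does not have to guess the file's encoding. Exit BOINC before running it.

```python
import shutil
from pathlib import Path
from typing import Tuple

OLD_TAG = b"<type>NVIDIA</type>"  # tag named in the post
NEW_TAG = b"<type>CUDA</type>"    # replacement named in the post


def retag(data: bytes) -> Tuple[bytes, int]:
    """Return the data with every OLD_TAG replaced, plus the number replaced."""
    return data.replace(OLD_TAG, NEW_TAG), data.count(OLD_TAG)


def patch_client_state(path: Path) -> int:
    """Patch the file in place, keeping a .bak copy; returns replacements made."""
    original = path.read_bytes()
    patched, count = retag(original)
    if count:
        # Hypothetical safety step: keep a copy such as client_state.xml.bak
        backup = path.with_name(path.name + ".bak")
        shutil.copy2(path, backup)
        path.write_bytes(patched)
    return count
```

To use it, call patch_client_state(Path("client_state.xml")) from the BOINC data directory (its location depends on your installation). A return value of 0 means nothing needed changing and the file was left untouched.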

____________

Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1272179 - Posted: 17 Aug 2012, 17:11:53 UTC - in response to Message 1272124.

there was a first-time poster named Cathy who posted a question about her GPU problems in this thread earlier, but her post has since mysteriously vanished. her post may have been slightly out of place in this thread, as it's not a troubleshooting thread, but rather a thread dedicated to the status of the BOINC server bug. regardless, i'm hoping that her post was not deleted altogether, and was at the very least moved to the appropriate sub-forum or thread so that her question can get answered...


I have BOINC client 7.0.31 installed on Windows 7 x64

I am also running x41x multibeam for the GPU app

Since the outage my 460 gtx is now taking 1.5 hours to complete two tasks

It usually takes 15 minutes


Now I was wondering what was happening as I hope I did not blow my card.
It does not heat up to its normal temps when crunching

My task manager is showing the six cpu apps running and the two x41x tasks running but that is all.

I am hoping it is the server doing this and not poor 460 gtx which has been my main crunching unit

Michael Miles

doesn't sound like a server-side issue to me, despite all the issues the server has right now. it sounds more like your video driver crashed and reset itself, leaving the GPU in limp/safe mode. are your GPU's core and memory clocks underclocked to approx. half of what they should be? you'll need to open some 3rd party utility like MSI Afterburner to confirm this. if so, you'll need to suspend all BOINC work and reboot the entire system to bring the GPU out of safe mode. but even if this isn't the solution, i highly doubt physical damage is responsible for the way your GPU is currently acting.
____________

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 186
Credit: 25,422,366
RAC: 25,306
Canada
Message 1272188 - Posted: 17 Aug 2012, 17:24:13 UTC - in response to Message 1272179.

I did think it was a driver crash but a reboot will clear this out.
All OC programs say it is running okay except for temp on my gpu which is lower than usual.
I am going to try a driver reinstall and see what happens.
It started to do this right after the last outage, which makes me suspicious

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,366,360
RAC: 1,317
United States
Message 1272191 - Posted: 17 Aug 2012, 17:26:12 UTC - in response to Message 1272188.

I did think it was a driver crash but a reboot will clear this out.
All OC programs say it is running okay except for temp on my gpu which is lower than usual.
I am going to try a driver reinstall and see what happens.
It started to do this right after the last outage, which makes me suspicious



The units that are taking 1.5 hours aren't VLARs by chance, are they?
____________

rob smith
Volunteer moderator
Joined: 7 Mar 03
Posts: 7652
Credit: 44,597,646
RAC: 74,553
United Kingdom
Message 1272192 - Posted: 17 Aug 2012, 17:27:13 UTC

Or Astropulses?
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 186
Credit: 25,422,366
RAC: 25,306
Canada
Message 1272229 - Posted: 17 Aug 2012, 18:07:17 UTC - in response to Message 1272192.


They are vlars, I just noticed that.
I am going to abort the vlars and see what happens

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 186
Credit: 25,422,366
RAC: 25,306
Canada
Message 1272231 - Posted: 17 Aug 2012, 18:12:59 UTC - in response to Message 1272229.

That was it.
Thank you all. Man, what a relief.
I thought I cooked my card running it in this weather.

Vlars ran up the time by a huge margin.

Thank you, thank you, thank you GOD almighty.

One thing I don't have right now is 200 dollars for a new card

Michael Miles

alan soden
Joined: 12 Feb 12
Posts: 2
Credit: 75,480
RAC: 49
Ireland
Message 1272318 - Posted: 17 Aug 2012, 22:15:30 UTC

ha ha maybe this bug is extraterrestrial???
good luck with the fix.

Eric Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1075
Credit: 7,762,805
RAC: 7,323
United States
Message 1272335 - Posted: 17 Aug 2012, 23:02:05 UTC - in response to Message 1272318.

Damn, now the vlars are broke again?
____________

Peter
Joined: 4 May 12
Posts: 22
Credit: 26,746
RAC: 0
United States
Message 1272351 - Posted: 17 Aug 2012, 23:37:07 UTC - in response to Message 1271871.

Hello
Thank you for finding the problem. I ran into it, and when I reported it I was told it was impossible. So you proved that I was right and am not crazy.
All I can say to the ones who told me I was wrong: I told you so!
____________
THEY SEE YOU!! LOOK UP!!

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 59,834,039
RAC: 87,893
Argentina
Message 1272360 - Posted: 18 Aug 2012, 0:02:03 UTC - in response to Message 1272335.

Damn, now the vlars are broke again?


I don't think so... the VLARs he aborted were sent on Aug 11 (I guess before the fixes)...
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,910,549
RAC: 13,462
United Kingdom
Message 1272375 - Posted: 18 Aug 2012, 0:34:56 UTC - in response to Message 1272360.

Damn, now the vlars are broke again?

I don't think so... the VLARs he aborted were sent on Aug 11 (I guess before the fixes)...

Agreed. No new ones seen here, either.

BilBg
Joined: 27 May 07
Posts: 2452
Credit: 5,393,105
RAC: 7,576
Bulgaria
Message 1272377 - Posted: 18 Aug 2012, 0:53:26 UTC - in response to Message 1272161.

If you use BOINC version 6, you are probably affected.

The fix we installed will not fix workunits that have already been downloaded.

For that, you've got four options.
1) Abort all your CUDA tasks.
2) Upgrade to BOINC v7 or
3) Exit BOINC, edit your client_state.xml to replace all the occurrences of "<type>NVIDIA</type>" with "<type>CUDA</type>" or
4) Just let it run and deal with a few reboots.

I like the option '3)' most (the 'Replace All' is easy using just Notepad)

I can think of another 'fix' (for those uncomfortable with any of 1) ... 4) above) but it involves 'hand' work:
- Temporarily Disable getting NVIDIA/CUDA tasks ('Use NVIDIA GPU' here: http://setiathome.berkeley.edu/prefs.php?subset=project)
- Suspend all your current CUDA tasks (sort by Application column, click ... Shift+click to select all CUDA tasks)
- Resume them one at a time (or 2, 3 at a time if your GPU is good enough (Fermi++))
- When all 'old' CUDA tasks are done - Enable again getting NVIDIA/CUDA tasks ('Use NVIDIA GPU' yes)


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 5
United Kingdom
Message 1272397 - Posted: 18 Aug 2012, 2:21:34 UTC - in response to Message 1272335.

Damn, now the vlars are broke again?

I run an ATI GPU and have not seen a VLAR in days.
If there are any out there, I cannot find them :(


Copyright © 2014 University of California