Bug in server affecting older BOINC clients with NVIDIA GPUs.

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1271871 - Posted: 17 Aug 2012, 5:37:11 UTC
Last modified: 17 Aug 2012, 5:38:33 UTC

We've identified a bug in the current BOINC server that is online at SETI@home. With older BOINC clients this bug results in running multiple SETI@home GPU applications simultaneously on a single GPU.

While we debug and fix the problem we've suspended distribution of NVIDIA work. We hope that everything will be back to normal some time tomorrow.
@SETIEric@qoto.org (Mastodon)

robertmiles
Volunteer tester

Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1271877 - Posted: 17 Aug 2012, 6:34:52 UTC - in response to Message 1271871.  
Last modified: 17 Aug 2012, 6:35:21 UTC

Do the multiple workunits use some of the same GPU resources or interfere in some other way? Some BOINC projects seem to let two or more OpenCL GPU workunits, but not multiple CUDA GPU workunits, share a single GPU after checking that they are not assigned any of the same resources within the GPU.
Francesco Forti
Joined: 24 May 00
Posts: 334
Credit: 204,421,005
RAC: 15
Switzerland
Message 1271902 - Posted: 17 Aug 2012, 8:13:59 UTC

Last week I had the same problem, but with BOINC 7.0.28 and Lunatics v0.40 32-bit.
As soon as I tried to run two instances of a GPU task on a new GT 640 (driver 301.42) using count 0.5, I got 80 GPU instances running and had to uninstall and reinstall everything. Other hosts, with older GTX cards, are able to run two instances.


Shakir

Joined: 14 Aug 99
Posts: 3
Credit: 90,452,595
RAC: 93
Germany
Message 1271920 - Posted: 17 Aug 2012, 9:22:23 UTC - in response to Message 1271902.  
Last modified: 17 Aug 2012, 9:24:46 UTC

I have a notebook where this problem occurs.
Starting more threads on the GPU isn't the problem. The biggest problem in my case is that the 10 SETI tasks on the GPU use around 2 GB of RAM and the notebook is 100% in swap mode. I can't deactivate the GPU calculation either. There are also 8 regular CPU SETI tasks running, so the system hits the 4 GB RAM wall.
I have to deactivate calculation until this issue is fixed.

Thanks for working on it.
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1272118 - Posted: 17 Aug 2012, 15:59:37 UTC - in response to Message 1271877.  
Last modified: 17 Aug 2012, 16:00:04 UTC

Do the multiple workunits use some of the same GPU resources or interfere in some other way?


The main problem is that the BOINC client, depending upon the relative speed of your GPU and CPU, could decide to run as many as 10 GPU apps per CPU core simultaneously. If you've got 4 CPU cores, that's 40 GPU apps running at once. So no, we're not talking about running 2 or even 4 apps simultaneously on the GPU.
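
As a rough illustration of that arithmetic (a sketch only: the function name is hypothetical, and the factor of 10 is the upper bound quoted above, not a value read from the BOINC client):

```python
def worst_case_gpu_apps(cpu_cores: int, apps_per_core: int = 10) -> int:
    """Upper bound on simultaneous GPU apps, using the per-core figure quoted above."""
    if cpu_cores < 1 or apps_per_core < 1:
        raise ValueError("cpu_cores and apps_per_core must be positive")
    return cpu_cores * apps_per_core


# Print the worst case for a few common core counts.
for cores in (1, 2, 4, 8):
    print(f"{cores} CPU core(s): up to {worst_case_gpu_apps(cores)} GPU apps at once")
```

The actual number on a given host depends on the relative speed of its GPU and CPU, so this is only the ceiling, not a prediction.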

The possible results, in order of severity, could be: 1) The apps error out when the GPU runs out of memory. 2) Your GPU driver freezes, causing a reboot every time BOINC tries to run the apps. 3) Your GPU overheats and causes a reboot every time BOINC tries to run the apps.
@SETIEric@qoto.org (Mastodon)

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 268
Credit: 34,410,870
RAC: 0
Canada
Message 1272124 - Posted: 17 Aug 2012, 16:09:10 UTC

I have BOINC client 7.0.31 installed on Windows 7 x64.

I am also running x41x multibeam as the GPU app.

Since the outage my GTX 460 is now taking 1.5 hours to complete two tasks.

It usually takes 15 minutes.

Now I was wondering what was happening, as I hope I did not blow my card.
It does not heat up to its normal temps when crunching.

My task manager is showing the six CPU apps running and the two x41x tasks running, but that is all.

I am hoping it is the server doing this and not my poor GTX 460, which has been my main crunching unit.

Michael Miles
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1272161 - Posted: 17 Aug 2012, 16:51:49 UTC - in response to Message 1272124.  

We've installed a fix from David Anderson that we hope will solve the problem. If you have a BOINC version 7 client, the problem never affected you, and you can stop reading this now.

If you use BOINC version 6, you are probably affected. The fix we installed will not fix workunits that have already been downloaded. For that, you've got four options: 1) Abort all your CUDA tasks, 2) Upgrade to BOINC v7, 3) Exit BOINC and edit your client_state.xml to replace all the occurrences of "NVIDIA" with "CUDA", or 4) Just let it run and deal with a few reboots.

@SETIEric@qoto.org (Mastodon)

Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1272179 - Posted: 17 Aug 2012, 17:11:53 UTC - in response to Message 1272124.  

there was a first-time poster named Cathy who posted a question about her GPU problems in this thread earlier, but her post has since mysteriously vanished. her post may have been slightly out of place in this thread, as it's not a troubleshooting thread, but rather a thread dedicated to the status of the BOINC server bug. regardless, i'm hoping that her post was not deleted altogether, but at the very least moved to the appropriate sub-forum or thread so that her question can get answered...


I have BOINC client 7.0.31 installed on Windows 7 x64.

I am also running x41x multibeam as the GPU app.

Since the outage my GTX 460 is now taking 1.5 hours to complete two tasks.

It usually takes 15 minutes.

Now I was wondering what was happening, as I hope I did not blow my card.
It does not heat up to its normal temps when crunching.

My task manager is showing the six CPU apps running and the two x41x tasks running, but that is all.

I am hoping it is the server doing this and not my poor GTX 460, which has been my main crunching unit.

Michael Miles

doesn't sound like a server-side issue to me, despite all the issues the server has right now. it sounds more like your video driver crashed and reset itself, leaving the GPU in limp/safe mode. are your GPU's core and memory clocks underclocked to approx. half of what they should be? you'll need to open a utility like MSI Afterburner or GPU-Z to confirm this (Catalyst Control Center only applies to ATI/AMD cards). if so, you'll need to suspend all BOINC work and reboot the entire system to bring the GPU out of safe mode. but even if this isn't the solution, i highly doubt physical damage is responsible for the way your GPU is currently acting.
Michael W.F. Miles
Joined: 24 Mar 07
Posts: 268
Credit: 34,410,870
RAC: 0
Canada
Message 1272188 - Posted: 17 Aug 2012, 17:24:13 UTC - in response to Message 1272179.  

I did think it was a driver crash but a reboot will clear this out.
All OC programs say it is running okay except for temp on my gpu which is lower than usual.
I am going to try a driver reinstall and see what happens.
It started to do this right after the last outage, which makes me suspicious.

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1272191 - Posted: 17 Aug 2012, 17:26:12 UTC - in response to Message 1272188.  

I did think it was a driver crash but a reboot will clear this out.
All OC programs say it is running okay except for temp on my gpu which is lower than usual.
I am going to try a driver reinstall and see what happens.
It started to do this right after the last outage, which makes me suspicious.



The units that are taking 1.5 hours aren't VLARs by chance, are they?
rob smith Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1272192 - Posted: 17 Aug 2012, 17:27:13 UTC

Or Astropulses?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Michael W.F. Miles
Joined: 24 Mar 07
Posts: 268
Credit: 34,410,870
RAC: 0
Canada
Message 1272229 - Posted: 17 Aug 2012, 18:07:17 UTC - in response to Message 1272192.  


They are VLARs, I just noticed that.
I am going to abort the VLARs and see what happens.

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 268
Credit: 34,410,870
RAC: 0
Canada
Message 1272231 - Posted: 17 Aug 2012, 18:12:59 UTC - in response to Message 1272229.  

That was it.
Thank you all. Man, what a relief.
I thought I cooked my card running it in this weather.

The VLARs ran up the time by a huge margin.

Thank you, thank you, thank you GOD almighty.

One thing I don't have right now is 200 dollars for a new card

Michael Miles
alan soden

Joined: 12 Feb 12
Posts: 2
Credit: 105,617
RAC: 0
Ireland
Message 1272318 - Posted: 17 Aug 2012, 22:15:30 UTC

Ha ha, maybe this bug is extraterrestrial?
Good luck with the fix.
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1272335 - Posted: 17 Aug 2012, 23:02:05 UTC - in response to Message 1272318.  

Damn, now the vlars are broke again?
@SETIEric@qoto.org (Mastodon)

Peter
Joined: 4 May 12
Posts: 22
Credit: 26,746
RAC: 0
United States
Message 1272351 - Posted: 17 Aug 2012, 23:37:07 UTC - in response to Message 1271871.  

Hello
Thank you for finding the problem. I ran into it, and when I reported it I was told it was impossible. So you proved that I was right and am not crazy.
All I can say to the ones who told me I was wrong: I told you so!
THEY SEE YOU!! LOOK UP!!
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1272360 - Posted: 18 Aug 2012, 0:02:03 UTC - in response to Message 1272335.  

Damn, now the vlars are broke again?


I don't think so... the VLARs he aborted were sent on Aug 11 (I guess before the fixes)...
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1272375 - Posted: 18 Aug 2012, 0:34:56 UTC - in response to Message 1272360.  

Damn, now the vlars are broke again?

I don't think so... the VLARs he aborted were sent on Aug 11 (I guess before the fixes)...

Agreed. No new ones seen here, either.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1272377 - Posted: 18 Aug 2012, 0:53:26 UTC - in response to Message 1272161.  

If you use BOINC version 6, you are probably affected.

The fix we installed will not fix workunits that have already been downloaded.

For that, you've got four options.
1) Abort all your CUDA tasks.
2) Upgrade to BOINC v7 or
3) Exit BOINC, edit your client_state.xml to replace all the occurrences of "<type>NVIDIA</type>" with "<type>CUDA</type>" or
4) Just let it run and deal with a few reboots.

I like the option '3)' most (the 'Replace All' is easy using just Notepad)
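
For anyone who would rather script the 'Replace All' step than do it by hand, here is a minimal Python sketch. It assumes BOINC has been exited first, that the tags appear in client_state.xml exactly as quoted above, and that you pass in the path to your own client_state.xml (the location varies by installation, so none is hard-coded here). The function name is hypothetical, and a .bak copy is written before anything is changed.

```python
import shutil
from pathlib import Path

# Tag text as quoted in the post above; treated here as an assumption about the file.
OLD = b"<type>NVIDIA</type>"
NEW = b"<type>CUDA</type>"


def patch_client_state(path: Path) -> int:
    """Replace every OLD tag with NEW in the given file; return the number replaced."""
    data = path.read_bytes()
    count = data.count(OLD)
    if count:
        # Keep a backup next to the original before overwriting it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(data.replace(OLD, NEW))
    return count
```

Calling patch_client_state with the path to your file returns the number of replacements made; 0 means nothing matched and the file was left untouched. Working on raw bytes avoids changing the file's line endings or encoding.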

I can think of another 'fix' (for those uncomfortable with any of 1) ... 4) above) but it involves 'hand' work:
- Temporarily Disable getting NVIDIA/CUDA tasks ('Use NVIDIA GPU' here: http://setiathome.berkeley.edu/prefs.php?subset=project)
- Suspend all your current CUDA tasks (sort by Application column, click ... Shift+click to select all CUDA tasks)
- Resume them one at a time (or 2, 3 at a time if your GPU is good enough (Fermi++))
- When all 'old' CUDA tasks are done - Enable again getting NVIDIA/CUDA tasks ('Use NVIDIA GPU' yes)


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1272397 - Posted: 18 Aug 2012, 2:21:34 UTC - in response to Message 1272335.  

Damn, now the vlars are broke again?

I run an ATI GPU and have not seen a VLAR in days.
If there are any out there I cannot find them :(


 