A growing problem with GT/GTX 6xx series cards.

Message boards : Number crunching : A growing problem with GT/GTX 6xx series cards.

Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1370770 - Posted: 23 May 2013, 0:08:41 UTC

One thing that I have noticed over the last few months is the ever-growing number of rigs with GT/GTX 6xx series cards that have not had the environment variable CUDA_GRID_SIZE_COMPAT set to 1, and are just trashing work at an ever-increasing and alarming rate.
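
For anyone wanting to check their own rig, here is a minimal Python sketch of such a check. It is only an illustration and not part of BOINC or the SETI@home applications; the helper name `compat_flag_is_set` is hypothetical. It only sees the environment of the process that runs it, so a BOINC client started before the variable was set will not have picked it up.

```python
import os
from typing import Mapping, Optional

# Variable name and value as given in the post above.
VAR_NAME = "CUDA_GRID_SIZE_COMPAT"
EXPECTED_VALUE = "1"


def compat_flag_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if CUDA_GRID_SIZE_COMPAT is set to 1 in the given environment.

    Hypothetical helper: `env` defaults to this process's own environment.
    """
    if env is None:
        env = os.environ
    # Tolerate stray whitespace around the value.
    return env.get(VAR_NAME, "").strip() == EXPECTED_VALUE


if __name__ == "__main__":
    if compat_flag_is_set():
        print(f"{VAR_NAME}={EXPECTED_VALUE} is visible to this process")
    else:
        print(f"{VAR_NAME} is not set to {EXPECTED_VALUE} for this process")
```

Passing a dictionary instead of relying on `os.environ` makes the check easy to try out without touching the real system settings.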

Some notice that they only produce errors, but instead of finding out why, they just stop crunching (likely removing BOINC) without aborting the work they still have on board. The combination of the two means that a lot of work has to hang around in the database for much longer than it really needs to.

I don't mind taking the time on occasion to PM people with rigs that are throwing invalid work (and errors, when I stumble over them), but there is absolutely no way that I'd have the time to PM all those GT/GTX 6xx owners (there are just far too many of them).

Yes, I got bored one day and started looking for them (while waiting for a job to turn up), but I gave up halfway through the finished tasks on just one of my rigs.

Cheers.
ID: 1370770 · Report as offensive
spitfire_mk_2
Joined: 14 Apr 00
Posts: 563
Credit: 27,306,885
RAC: 0
United States
Message 1370805 - Posted: 23 May 2013, 4:54:17 UTC

More work for the rest of us.
ID: 1370805 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1370806 - Posted: 23 May 2013, 4:58:42 UTC - in response to Message 1370805.  


Maybe so, but also unnecessary...
ID: 1370806 · Report as offensive
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1371072 - Posted: 23 May 2013, 21:23:57 UTC

Wouldn't it help to issue a notice through BOINC messages?
ID: 1371072 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1371077 - Posted: 23 May 2013, 21:43:57 UTC
Last modified: 23 May 2013, 21:44:56 UTC

In about a week (give or take), the problematic 6.10 (Cuda 3.0) application will probably be supplanted by x41zc as stock on main, for V7 Cuda multibeam processing. These applications are not susceptible to the Cuda 3.0 library faults concerned.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1371077 · Report as offensive
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1371098 - Posted: 23 May 2013, 22:33:30 UTC - in response to Message 1371077.  

So not a problem with the card itself, a problem with their users who won't RTFM.

Honestly are we surprised?
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1371098 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1371102 - Posted: 23 May 2013, 22:54:00 UTC - in response to Message 1371098.  

So not a problem with the card itself, a problem with their users who won't RTFM.

Honestly are we surprised?

Actually, a problem with NVidia (the company) - which supplied the project with a CUDA 3.0 stock application compatible with Fermis, but didn't come back and supply a CUDA 3.2 (or later) stock application to support their own Kepler release.
ID: 1371102 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1371124 - Posted: 24 May 2013, 0:33:17 UTC - in response to Message 1371102.  

So not a problem with the card itself, a problem with their users who won't RTFM.

Honestly are we surprised?

Actually, a problem with NVidia (the company) - which supplied the project with a CUDA 3.0 stock application compatible with Fermis, but didn't come back and supply a CUDA 3.2 (or later) stock application to support their own Kepler release.

Nvidia: I'm beginning to suspect that their support of CUDA isn't too serious anymore, what with the downclocks happening. I'm experimenting with the 306.97 driver and a dummy plug on the second DVI port. When I have the desktop extended to it, 772MHz becomes the minimum clock that my GTX580 will run at. I found this out by accident: I expected the GPU clock to drop from 772MHz to 51MHz once BOINC was done for the day, but that didn't happen. To turn off this effect, I just restrict the desktop to the first monitor in 'Screen resolution', so that it doesn't extend to the second monitor/dummy plug on the card.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1371124 · Report as offensive
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1371374 - Posted: 24 May 2013, 16:03:01 UTC - in response to Message 1370770.  

GT/GTX 6xx series cards that have not had the environment variable setting


Thank you for this hint!
I am not producing that many invalids with my 650Ti (the errors you see are 98% due to optimisation tests and failures during the card switch), but I had overlooked this issue, even though I read this forum constantly.
Maybe it should be made more prominent.
Now I have set the environment variable, and I will see how everything behaves and whether this brings the problems down to zero.

Sleepy
ID: 1371374 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1371390 - Posted: 24 May 2013, 16:38:41 UTC
Last modified: 24 May 2013, 16:40:30 UTC

Thank you for this hint!
I am not producing that many invalids with my 650Ti (the errors you see are 98% due to optimisation tests and failures during the card switch), but I had overlooked this issue, even though I read this forum constantly.
Maybe it should be made more prominent.

If you see Jason's post, hopefully soon it will be a thing of the past!

In about a week (give or take), the problematic 6.10 (Cuda 3.0) application will probably be supplanted by x41zc as stock on main, for V7 Cuda multibeam processing. These applications are not susceptible to the Cuda 3.0 library faults concerned.

Although as you are using x41zc, you don't actually need the environment variable setting anyway.

Only one of these is needed, not all three:

Solution/workaround:
a) Use an optimised Cuda application, where available.
b) Downgrade to the 301.42 driver - only possible on GTX 670/680/690 cards, not on newer releases like the 650/660 or their Ti variants.
c) Set an environment variable.
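
The list above can be sketched as a small Python helper that reports which options apply to a given card. This is only an illustration of the rules as stated in this post; the function name `applicable_workarounds` and its parameters are hypothetical, and the card list is limited to the three models named in option b).

```python
from typing import List

# Cards on which, per the list above, the 301.42 driver downgrade is possible.
DOWNGRADE_CAPABLE = {"GTX 670", "GTX 680", "GTX 690"}


def applicable_workarounds(card: str, optimised_app_available: bool) -> List[str]:
    """Return the workarounds from the list above that apply to `card`.

    Hypothetical helper: `card` is a model string such as "GTX 670".
    """
    options = []
    if optimised_app_available:
        options.append("a) use an optimised Cuda application")
    if card.strip().upper() in DOWNGRADE_CAPABLE:
        options.append("b) downgrade to the 301.42 driver")
    # Setting the environment variable is listed without restrictions.
    options.append("c) set the environment variable")
    return options
```

Any single entry in the returned list is enough on its own.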
ID: 1371390 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1371604 - Posted: 25 May 2013, 1:05:38 UTC - in response to Message 1371077.  

In about a week (give or take), the problematic 6.10 (Cuda 3.0) application will probably be supplanted by x41zc as stock on main, for V7 Cuda multibeam processing. These applications are not susceptible to the Cuda 3.0 library faults concerned.

That's good to hear.

Now to see if my 2500K can break into the top 100 by RAC before all those cards stop producing errors and both my rigs get pushed well down the list.

Cheers.
ID: 1371604 · Report as offensive
FLAT ERIC

Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1371812 - Posted: 25 May 2013, 15:47:47 UTC

I have been out of the loop for a few years and decided to come back with a new computer I just finished building. I am generating about 100-200 of these errors a day even after changing the environment variable.

I just updated my driver from 314 to 320, and I have a GeForce GTX 670.

What exactly does it mean to prevent the computer from going to a blank screen? The BOINC screen saver has a "blank screen" option that I set to never. My power options are set to turn the monitor off after an hour. Is that what it is talking about?

Is there anything else I should do to fix this?

I'm running version 7.0.64 of BOINC.
ID: 1371812 · Report as offensive
Daddiogrif

Joined: 9 Apr 13
Posts: 5
Credit: 52,648
RAC: 0
United States
Message 1371821 - Posted: 25 May 2013, 16:07:10 UTC

Noob here.

Just to be safe (and lazy, I suppose, sorry):

Any problems with GTX 260 GPUs?
The thing takes up so much room that I can no longer fit my old GTX 8600 as a PhysX card, lol.

Thanks
K
ID: 1371821 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1371853 - Posted: 25 May 2013, 17:07:36 UTC - in response to Message 1371812.  
Last modified: 25 May 2013, 17:08:52 UTC

I have been out of the loop for a few years and decided to come back with a new computer I just finished building. I am generating about 100-200 of these errors a day even after changing the environment variable.

I just updated my driver from 314 to 320, and I have a GeForce GTX 670.

What exactly does it mean to prevent the computer from going to a blank screen? The BOINC screen saver has a "blank screen" option that I set to never. My power options are set to turn the monitor off after an hour. Is that what it is talking about?

Is there anything else I should do to fix this?

I'm running version 7.0.64 of BOINC.

I use Windows 7 Pro x64 (a 64-bit operating system). You could go to Control Panel, click on Power Options, and then set the monitor to never turn off; all such options on this PC are set to Never.

The screen saver should also be turned off: right-click on the Desktop, click on Personalize, then look for the icon on the lower right that says Screen Saver and double-click on it; from there you can turn the screen saver off. This only lasts until you upgrade BOINC, though, at which point the screen saver is activated again.

And no, this won't hurt anything; you can always turn the monitor off manually, thereby allowing BOINC to keep crunching. I don't know whether this will resolve any errors, of course, but it can't hurt.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1371853 · Report as offensive
trader
Volunteer tester
Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1371855 - Posted: 25 May 2013, 17:12:40 UTC - in response to Message 1371853.  

I've seen several posts on this issue. I use Win7 x64 Ultimate, I have a screen saver running, and power management for the monitor is turned on, yet I have never experienced this issue myself. Could it be caused by low system memory, with swapping between the page file and RAM?
ID: 1371855 · Report as offensive
FLAT ERIC

Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1371878 - Posted: 25 May 2013, 17:57:32 UTC - in response to Message 1371855.  

I've seen several posts on this issue. I use Win7 x64 Ultimate, I have a screen saver running, and power management for the monitor is turned on, yet I have never experienced this issue myself. Could it be caused by low system memory, with swapping between the page file and RAM?


I hope that isn't my issue. I am running 16GB of RAM, with 2GB on the video card.

I use Windows 7 Pro x64 (a 64-bit operating system). You could go to Control Panel, click on Power Options, and then set the monitor to never turn off; all such options on this PC are set to Never.

The screen saver should also be turned off: right-click on the Desktop, click on Personalize, then look for the icon on the lower right that says Screen Saver and double-click on it; from there you can turn the screen saver off. This only lasts until you upgrade BOINC, though, at which point the screen saver is activated again.


Man... I really enjoyed the screen saver. I will give it a try. Thanks!

ID: 1371878 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1371897 - Posted: 25 May 2013, 18:47:34 UTC - in response to Message 1371812.  

I have been out of the loop for a few years and decided to come back with a new computer I just finished building. I am generating about 100-200 of these errors a day even after changing the environment variable.

I just updated my driver from 314 to 320, and I have a GeForce GTX 670.

What exactly does it mean to prevent the computer from going to a blank screen? The BOINC screen saver has a "blank screen" option that I set to never. My power options are set to turn the monitor off after an hour. Is that what it is talking about?

Is there anything else I should do to fix this?

I'm running version 7.0.64 of BOINC.

Did you restart your PC after changing the environment variable? Your errors still show as if you haven't.

Cheers.
ID: 1371897 · Report as offensive
FLAT ERIC

Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1371957 - Posted: 25 May 2013, 21:08:39 UTC - in response to Message 1371897.  


Did you restart your PC after changing the environment variable? Your errors still show as if you haven't.


Thanks!
I may not have restarted. I just restarted right now. How do you check that?


ID: 1371957 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1371964 - Posted: 25 May 2013, 21:25:52 UTC - in response to Message 1371957.  


Did you restart your PC after changing the environment variable? Your errors still show as if you haven't.


Thanks!
I may not have restarted. I just restarted right now. How do you check that?



The reason I asked is that the Stderr outputs for your errored CUDA work still showed "CUFFT error in file 'd:/Projects/SETI/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 62", which is the error produced when that environment variable is not set while running the current stock CUDA app.

You'll just have to wait now until you get some more CUDA work to see if it's fixed.
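
For those who would rather check a saved Stderr output than read it by eye, here is a minimal Python sketch that looks for the error line quoted above. It is an illustration only; the function name `has_grid_size_error` is hypothetical, and matching just the file name and line number (instead of the full build path) is an assumption of this sketch.

```python
import re

# Signature taken from the Stderr line quoted above. The directory part of the
# path is left open, on the assumption that it may differ between builds.
CUFFT_SIGNATURE = re.compile(
    r"CUFFT error in file '[^']*cudaAcc_fft\.cu' in line 62"
)


def has_grid_size_error(stderr_text: str) -> bool:
    """Return True if the Stderr text contains the CUFFT error quoted above."""
    return CUFFT_SIGNATURE.search(stderr_text) is not None
```

A task whose Stderr output still matches after a restart would suggest that the running app has not picked up the variable.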

Cheers.
ID: 1371964 · Report as offensive
FLAT ERIC

Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1372841 - Posted: 28 May 2013, 21:54:29 UTC

seems to be fixed :D
ID: 1372841 · Report as offensive


 