A growing problem with GT/GTX 6xx series cards.


Wiggo
Joined: 24 Jan 00
Posts: 7100
Credit: 95,243,888
RAC: 73,864
Australia
Message 1370770 - Posted: 23 May 2013, 0:08:41 UTC

One thing that I have noticed over the last few months is the ever-growing number of rigs with GT/GTX 6xx series cards that have not had the environment variable CUDA_GRID_SIZE_COMPAT set to 1 and are just trashing work at an ever-increasing and alarming rate.

Some notice that they only produce errors, but instead of finding out why, they just stop crunching (likely removing BOINC) without aborting the work they still have on board. The combination of the two means that a lot of work now has to hang around the database for much longer than it really needs to.

I don't mind taking the time on occasion to PM people with rigs that are throwing invalid work (and errors when I stumble over them), but there is absolutely no way that I'd have the time to PM all those GT/GTX 6xx owners (there are just far too many of them).

Yes, I got bored one day and started looking for them (while waiting for a job to turn up), but I gave up halfway through the finished tasks on just one of my rigs.

Cheers.

spitfire_mk_2
Joined: 14 Apr 00
Posts: 455
Credit: 12,601,810
RAC: 9,483
United States
Message 1370805 - Posted: 23 May 2013, 4:54:17 UTC

More work for the rest of us.
____________

Lionel
Joined: 25 Mar 00
Posts: 545
Credit: 230,901,492
RAC: 244,021
Australia
Message 1370806 - Posted: 23 May 2013, 4:58:42 UTC - in response to Message 1370805.


Maybe so, but also unnecessary...
____________

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 74,127,145
RAC: 67,392
Argentina
Message 1371072 - Posted: 23 May 2013, 21:23:57 UTC

Wouldn't it help to issue a notice through BOINC messages?
____________

jason_gee
Project donor
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 5021
Credit: 73,552,037
RAC: 15,105
Australia
Message 1371077 - Posted: 23 May 2013, 21:43:57 UTC
Last modified: 23 May 2013, 21:44:56 UTC

In about a week (give or take), the problematic 6.10 (Cuda 3.0) application will probably be supplanted by x41zc as stock on main, for V7 Cuda multibeam processing. These applications are not susceptible to the Cuda 3.0 library faults concerned.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,837,003
RAC: 2,189
United States
Message 1371098 - Posted: 23 May 2013, 22:33:30 UTC - in response to Message 1371077.

So not a problem with the card itself, a problem with their users who won't RTFM.

Honestly are we surprised?
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8550
Credit: 50,380,753
RAC: 50,632
United Kingdom
Message 1371102 - Posted: 23 May 2013, 22:54:00 UTC - in response to Message 1371098.

So not a problem with the card itself, a problem with their users who won't RTFM.

Honestly are we surprised?

Actually, a problem with NVidia (the company) - which supplied the project with a CUDA 3.0 stock application compatible with Fermis, but didn't come back and supply a CUDA 3.2 (or later) stock application to support their own Kepler release.

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46393
Credit: 36,739,633
RAC: 4,847
United States
Message 1371124 - Posted: 24 May 2013, 0:33:17 UTC - in response to Message 1371102.

So not a problem with the card itself, a problem with their users who won't RTFM.

Honestly are we surprised?

Actually, a problem with NVidia (the company) - which supplied the project with a CUDA 3.0 stock application compatible with Fermis, but didn't come back and supply a CUDA 3.2 (or later) stock application to support their own Kepler release.

Nvidia: I'm beginning to suspect that their support of CUDA isn't too serious anymore, what with the downclocks happening. I'm experimenting with the 306.97 driver and a dummy plug on the 2nd DVI port. When I have the desktop extended to it, 772MHz becomes the minimum clock that my GTX580 will run at. I found this out by accident: I expected that when BOINC was done for the day the GPU clock would drop from 772MHz to 51MHz, only that didn't happen. To turn off this effect, I just restrict the desktop to the 1st monitor in 'screen resolution', instead of extending it to the second monitor/dummy plug on the card.
____________
My Facebook, War Commander, 2015

sleepy
Joined: 21 May 99
Posts: 78
Credit: 22,818,307
RAC: 19,054
Italy
Message 1371374 - Posted: 24 May 2013, 16:03:01 UTC - in response to Message 1370770.

GT/GTX 6xx series cards that have not had the environment variable setting


Thank you for this hint!
I am not producing that many invalids with my 650Ti (the errors you see are 98% due to optimisation tests and failures during the card switch), but I had overlooked this issue, even though I read this forum constantly.
Maybe it should be made more prominent.
Now I have set the environment variable, and I will see how everything behaves and whether this brings the problems down to zero.

Sleepy
____________

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7002
Credit: 26,704,626
RAC: 32,687
United Kingdom
Message 1371390 - Posted: 24 May 2013, 16:38:41 UTC
Last modified: 24 May 2013, 16:40:30 UTC

Thank you for this hint!
I am not producing that many invalids with my 650Ti (the errors you see are 98% due to optimisation tests and failures during the card switch), but I had overlooked this issue, even though I read this forum constantly.
Maybe it should be made more prominent.

If you see Jason's post, hopefully soon it will be a thing of the past!

In about a week (give or take), the problematic 6.10 (Cuda 3.0) application will probably be supplanted by x41zc as stock on main, for V7 Cuda multibeam processing. These applications are not susceptible to the Cuda 3.0 library faults concerned.

Although as you are using x41zc, you don't actually need the environment variable setting anyway.

Only one of these is needed, not all three:

Solution/workaround:
a) Use an optimised Cuda application, where available.
b) Downgrade to the 301.42 driver - only possible on GTX 670/680/690 cards, not on newer releases like the 650/660 or their Ti variants.
c) Set an environment variable.
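
For option (c), the variable named at the top of this thread is CUDA_GRID_SIZE_COMPAT with the value 1. A process reads its environment when it starts, so a setting made while BOINC is already running is not seen until BOINC (or the PC) is restarted. The following is only a minimal sketch in Python for checking what a freshly started process sees; the function name `grid_compat_enabled` is made up for illustration and is not part of BOINC or the SETI@home applications.

```python
import os


def grid_compat_enabled(env=None):
    """Return True if CUDA_GRID_SIZE_COMPAT is set to "1" in the given environment.

    `env` is any mapping of variable names to values; it defaults to the
    environment of the current process.
    """
    if env is None:
        env = os.environ
    return env.get("CUDA_GRID_SIZE_COMPAT", "").strip() == "1"


# Report what this (newly started) process can see.
if grid_compat_enabled():
    print("CUDA_GRID_SIZE_COMPAT=1 is visible to this process")
else:
    print("CUDA_GRID_SIZE_COMPAT is not set to 1 for this process")
```

If the script reports that the variable is not set even though it was entered in the system settings, the likely cause is that the process was launched from a session that predates the change.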
____________


Today is life, the only life we're sure of. Make the most of today.

Wiggo
Joined: 24 Jan 00
Posts: 7100
Credit: 95,243,888
RAC: 73,864
Australia
Message 1371604 - Posted: 25 May 2013, 1:05:38 UTC - in response to Message 1371077.

In about a week (give or take), the problematic 6.10 (Cuda 3.0) application will probably be supplanted by x41zc as stock on main, for V7 Cuda multibeam processing. These applications are not susceptible to the Cuda 3.0 library faults concerned.

That's good to hear.

Now to see if my 2500K can break into the top 100 by RAC before all those cards stop producing errors and both my rigs get pushed well down the list.

Cheers.

FLAT ERIC
Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1371812 - Posted: 25 May 2013, 15:47:47 UTC

I have been out of the loop for a few years and decided to come back with a new computer I just finished building. I am generating about 100-200 of these errors a day even after changing the environment variable.

I just updated my driver from 314 to 320 and I have a GeForce GTX670.

What exactly does it mean to prevent the computer from going to a blank screen? The BOINC screen saver has a "blank screen" option that I set to never. My power options are set to turn the monitor off after an hour. Is that what it is talking about?

Is there anything else I should do to fix this?

I'm running version 7.0.64 of BOINC.

Daddiogrif
Joined: 9 Apr 13
Posts: 5
Credit: 52,648
RAC: 0
United States
Message 1371821 - Posted: 25 May 2013, 16:07:10 UTC

Noob here

Just to be safe... and lazy, I suppose, sorry.

Any problems with a GTX 260 GPU?
The thing takes up so much room that I can no longer fit my old GTX 8600 as a PhysX card, lol.

Thanks
K

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46393
Credit: 36,739,633
RAC: 4,847
United States
Message 1371853 - Posted: 25 May 2013, 17:07:36 UTC - in response to Message 1371812.
Last modified: 25 May 2013, 17:08:52 UTC

I have been out of the loop for a few years and decided to come back with a new computer I just finished building. I am generating about 100-200 of these errors a day even after changing the environment variable.

I just updated my driver from 314 to 320 and I have a GeForce GTX670.

What exactly does it mean to prevent the computer from going to a blank screen? The BOINC screen saver has a "blank screen" option that I set to never. My power options are set to turn the monitor off after an hour. Is that what it is talking about?

Is there anything else I should do to fix this?

I'm running version 7.0.64 of BOINC.

I use Windows 7 Pro x64 (a 64-bit operating system). You could go to Control Panel/Power Options and set the monitor to never turn off; all such options on this PC are set to Never.

The screen saver should also be turned off: right-click on the Desktop, click Personalize, then look for the icon on the lower right that says screen saver and double-click it. There you can turn off the screen saver, but this only lasts until one upgrades BOINC, at which point the screensaver is activated again.

And no, this won't hurt anything; you can always turn the monitor off manually, thereby allowing BOINC to crunch. I don't know whether this will resolve any errors, of course, but it can't hurt.
____________
My Facebook, War Commander, 2015

trader
Volunteer tester
Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1371855 - Posted: 25 May 2013, 17:12:40 UTC - in response to Message 1371853.

I've seen several posts on this issue. I use Win7 x64 Ultimate, I have a screen saver running, and power management for the monitor is turned on, yet I have never experienced this issue myself. Could it be caused by low system memory, i.e. swapping between the page file and RAM?

FLAT ERIC
Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1371878 - Posted: 25 May 2013, 17:57:32 UTC - in response to Message 1371855.

I've seen several posts on this issue. I use Win7 x64 Ultimate, I have a screen saver running, and power management for the monitor is turned on, yet I have never experienced this issue myself. Could it be caused by low system memory, i.e. swapping between the page file and RAM?


I hope that isn't my issue. I am running 16 GB of RAM and 2 GB on the video card.

I use Windows 7 Pro x64 (a 64-bit operating system). You could go to Control Panel/Power Options and set the monitor to never turn off; all such options on this PC are set to Never.

The screen saver should also be turned off: right-click on the Desktop, click Personalize, then look for the icon on the lower right that says screen saver and double-click it. There you can turn off the screen saver, but this only lasts until one upgrades BOINC, at which point the screensaver is activated again.


Man... I really enjoyed the screen saver. I will give it a try. Thanks!

Wiggo
Joined: 24 Jan 00
Posts: 7100
Credit: 95,243,888
RAC: 73,864
Australia
Message 1371897 - Posted: 25 May 2013, 18:47:34 UTC - in response to Message 1371812.

I have been out of the loop for a few years and decided to come back with a new computer I just finished building. I am generating about 100-200 of these errors a day even after changing the environment variable.

I just updated my driver from 314 to 320 and I have a GeForce GTX670.

What exactly does it mean to prevent the computer from going to a blank screen? The BOINC screen saver has a "blank screen" option that I set to never. My power options are set to turn the monitor off after an hour. Is that what it is talking about?

Is there anything else I should do to fix this?

I'm running version 7.0.64 of BOINC.

Did you restart your PC after changing the environment variable? Your errors still look as if you haven't.

Cheers.

FLAT ERIC
Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1371957 - Posted: 25 May 2013, 21:08:39 UTC - in response to Message 1371897.


Did you restart your PC after changing the environment variable? Your errors still look as if you haven't.


Thanks!
I may not have restarted. I just restarted right now. How do you check that?


Wiggo
Joined: 24 Jan 00
Posts: 7100
Credit: 95,243,888
RAC: 73,864
Australia
Message 1371964 - Posted: 25 May 2013, 21:25:52 UTC - in response to Message 1371957.


Did you restart your PC after changing the environment variable? Your errors still look as if you haven't.


Thanks!
I may not have restarted. I just restarted right now. How do you check that?



The reason that I asked that was because your Stderr outputs for your errored CUDA work still showed, "CUFFT error in file 'd:/Projects/SETI/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 62", which is the error produced by not having that environment variable set when running the current stock CUDA app.

You'll just have to wait now until you get some more CUDA work to see if it's fixed.
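
For anyone who wants to look for that signature in a saved copy of a task's Stderr output, here is a rough Python sketch. It assumes the output has been copied into a string or text file by hand; the function name `find_cufft_errors` is hypothetical, and the pattern matches the file path and line number loosely on the assumption that they could differ between builds.

```python
import re

# Matches the error line quoted above. The path and line number are captured
# rather than hard-coded, since they may vary between builds (assumption).
CUFFT_ERROR = re.compile(r"CUFFT error in file '([^']*)' in line (\d+)")


def find_cufft_errors(stderr_text):
    """Return a list of (source_file, line_number) for each CUFFT error found."""
    return [(m.group(1), int(m.group(2))) for m in CUFFT_ERROR.finditer(stderr_text)]


# Example with the error text quoted in this thread.
sample = (
    "CUFFT error in file "
    "'d:/Projects/SETI/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 62"
)
for source_file, line_number in find_cufft_errors(sample):
    print(f"CUFFT error at {source_file}:{line_number}")
```

An empty result for tasks that finished after the restart would be consistent with the variable having taken effect, though only validated results on the task pages confirm it.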

Cheers.

FLAT ERIC
Joined: 8 Dec 07
Posts: 4
Credit: 1,028,735
RAC: 0
United States
Message 1372841 - Posted: 28 May 2013, 21:54:29 UTC

seems to be fixed :D


Copyright © 2014 University of California