Short estimated runtimes - don't panic

Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1216850 - Posted: 11 Apr 2012, 14:25:25 UTC - in response to Message 1216775.  
Last modified: 11 Apr 2012, 15:15:36 UTC

Just out of curiosity: if flops are used, then there should be no changes in the estimates? Does that mean they are playing with the APR?
(I'm using flops and I've not seen any change, but neither have I seen any change in my usually very high APR values.)

No, not 'playing with' - just a step towards using the whole APR properly.

For a while, we allowed APR to grow to unbelievable figures, and then stopped believing it. Now, we're starting to keep the value sane - and so we can start believing it again, as we should have been able to all along.


LOL... taking those steps to be able to trust the APR was the intended meaning of "playing with"...

AP V6 APR is working very well on my hosts: just a little above the perfect value for keeping the DCF at 1 +/-10%, which is very acceptable anyway. I guess that the filter on the splitters, plus the threshold on the % blanked for a result to be used in the APR, did a very good job.

But, APR for MB is more than 30 times higher than it should be... good to know that is going to be fixed!
ID: 1216850
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1216860 - Posted: 11 Apr 2012, 14:42:49 UTC

So I was thinking that once one of my already-cached APs finished, the times for the newly-acquired tasks would be adjusted, but they haven't. It appears that I have to complete one of those tasks for it to work. So as I burn through older tasks, the cache is being replenished with more and more low-estimate tasks.

I could.. and probably should go with a smaller cache or at least NNT before I start getting close to deadline issues. However, I'm churning through 6 per day, which means with a 25-day deadline, I shouldn't go much past about 140 of them cached. But I think my 1GB disk limit will end up being hit somewhere around 127 or so, so it should work out just fine.
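
A quick sanity check of those numbers, as a minimal Python sketch. The function names are made up for illustration, and treating the 1GB limit as 1024 MB is an assumption; the 6 tasks per day, the 25-day deadline and the 127-task figure come from the post above.

```python
def deadline_ceiling(tasks_per_day: float, deadline_days: float) -> float:
    """Most tasks that can be cached and still all finish before the deadline."""
    return tasks_per_day * deadline_days


def implied_task_size_mb(disk_limit_mb: float, tasks_at_limit: int) -> float:
    """Average disk footprint per task implied by hitting the limit at N tasks."""
    return disk_limit_mb / tasks_at_limit


ceiling = deadline_ceiling(6, 25)
print(f"deadline ceiling: {ceiling:.0f} tasks")
print(f"implied size per task: {implied_task_size_mb(1024, 127):.2f} MB")
print("disk limit reached first:", 127 < ceiling)
```

Under these assumptions the disk limit (about 127 tasks) is reached before the 150-task ceiling, so the cache cannot grow far enough to threaten the deadline.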
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1216860
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51474
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1216878 - Posted: 11 Apr 2012, 15:40:19 UTC

Great to finally see some movement towards setting things right again!

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1216878
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1216880 - Posted: 11 Apr 2012, 15:43:17 UTC - in response to Message 1216850.  

Horacio wrote:
...
AP V6 APR is working very well on my hosts: just a little above the perfect value for keeping the DCF at 1 +/-10%, which is very acceptable anyway. I guess that the filter on the splitters, plus the threshold on the % blanked for a result to be used in the APR, did a very good job.

But, APR for MB is more than 30 times higher than it should be... good to know that is going to be fixed!

None of your hosts are showing high APRs for MB. In any case, the change to ignore runtimes of result_overflow MB tasks was made last year so there's nothing new to affect those APRs.
                                                                  Joe
ID: 1216880
MikeN
Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1216892 - Posted: 11 Apr 2012, 16:17:56 UTC - in response to Message 1216878.  

Great to finally see some movement towards setting things right again!


Agreed, but for my main cruncher the effects today have been rather dramatic. The first GPU WUs it downloaded after the weekly maintenance must have been shorties which (like all shorties on my system) immediately ran in HP mode. As the system corrected their short predicted run times, it also adjusted all the older WUs. As a result I now have around 800 MB WUs, each with a predicted run time of 10 hours, when they will actually take 20 minutes each on my GTX460! The net result has been that the cruncher has been starting and stopping WUs in HP mode all day. At one stage the list of WUs started and suspended because another one was higher priority covered a whole page of the screen.

To calm it down I have had to select NNT and suspend all non-started WUs. I will release them a few at a time. Has needed constant manual intervention all day.
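
The size of the mismatch can be put into numbers with a small Python sketch. It assumes both GPUs finish a task in 20 minutes (the post gives that figure only for the GTX460), and `queue_hours` is a made-up helper, not part of BOINC.

```python
def queue_hours(n_tasks: int, minutes_per_task: float, n_gpus: int) -> float:
    """Wall-clock hours needed to clear the queue, splitting tasks evenly over the GPUs."""
    return n_tasks * minutes_per_task / 60 / n_gpus


N_TASKS = 800
N_GPUS = 2
ESTIMATED_MIN = 10 * 60   # predicted run time per WU: 10 hours
ACTUAL_MIN = 20           # observed run time per WU on the GTX460

predicted = queue_hours(N_TASKS, ESTIMATED_MIN, N_GPUS)
actual = queue_hours(N_TASKS, ACTUAL_MIN, N_GPUS)

print(f"predicted: {predicted:.0f} h (~{predicted / 24:.0f} days)")
print(f"actual:    {actual:.0f} h (~{actual / 24:.1f} days)")
print(f"overestimate factor: {ESTIMATED_MIN / ACTUAL_MIN:.0f}x")
```

With the predicted backlog running to months rather than days, the client presumably sees the whole cache as being at risk of missing its deadlines, which would explain the constant switching in HP mode.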
ID: 1216892
Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1216897 - Posted: 11 Apr 2012, 16:25:09 UTC - in response to Message 1216880.  

Horacio wrote:
...
AP V6 APR is working very well on my hosts: just a little above the perfect value for keeping the DCF at 1 +/-10%, which is very acceptable anyway. I guess that the filter on the splitters, plus the threshold on the % blanked for a result to be used in the APR, did a very good job.

But, APR for MB is more than 30 times higher than it should be... good to know that is going to be fixed!

None of your hosts are showing high APRs for MB. In any case, the change to ignore runtimes of result_overflow MB tasks was made last year so there's nothing new to affect those APRs.
                                                                  Joe


This one, http://setiathome.berkeley.edu/host_app_versions.php?hostid=6187288
The S@H Enhanced GPU APR is now around 133, while the flops value that works is 13 (and some WUs still give a DCF raw_ratio above 2)... Some time ago (the GPUs were added in October last year) this value was above 300.

And in this one, http://setiathome.berkeley.edu/host_app_versions.php?hostid=6569691, the APR is 46.64 while the working flops value is 10; this host was built this year.

OK, right now it's not 30 times greater, but it's still not accurate enough to be able to stop using flops.
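
For reference, the ratios behind that statement, as a small Python sketch. It assumes the APR and the flops value are expressed in the same unit (GFLOPS); `apr_to_flops_ratio` is an illustrative helper only.

```python
def apr_to_flops_ratio(apr: float, working_flops: float) -> float:
    """How many times larger the server's APR is than the flops value that works."""
    return apr / working_flops


hosts = {
    "hostid=6187288": (133.0, 13.0),
    "hostid=6569691": (46.64, 10.0),
}
for host, (apr, flops) in hosts.items():
    print(f"{host}: APR is {apr_to_flops_ratio(apr, flops):.1f}x the working flops value")
```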

Anyway, the APR is just one side of this matter. The estimated task sizes of the MB WUs seem to be very accurate (and consistent with the Angle Range) for the GT430, but they fail badly on the 560Ti, on which some tasks estimated as shorties take much more time than expected and vice versa... (But I guess this is somehow related to the different optimizations that are used with each kind of hardware, and I don't see any way in which the project can help from the server side.)
ID: 1216897
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13822
Credit: 208,696,464
RAC: 304
Australia
Message 1216900 - Posted: 11 Apr 2012, 16:31:16 UTC - in response to Message 1216892.  

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?

Grant
Darwin NT
ID: 1216900
MikeN
Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1216903 - Posted: 11 Apr 2012, 16:37:48 UTC - in response to Message 1216900.  

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?


It had so many suspended GPU WUs that it was running out of GPU memory. I was getting error messages saying WUs were suspended waiting for GPU memory. By manually suspending them and releasing them a few at a time I have managed to get both GPUs in the system running properly again.
ID: 1216903
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51474
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1216905 - Posted: 11 Apr 2012, 16:42:44 UTC - in response to Message 1216903.  

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?


It had so many suspended GPU WUs that it was running out of GPU memory. I was getting error messages saying WUs were suspended waiting for GPU memory. By manually suspending them and releasing them a few at a time I have managed to get both GPUs in the system running properly again.

You could try temporarily unchecking the option to leave suspended WUs in memory in either your preferences or in Boinc.
I am not positive this works for GPU tasks, but if it does it might allow you to leave Boinc on autopilot.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1216905
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1216907 - Posted: 11 Apr 2012, 16:55:42 UTC - in response to Message 1216905.  

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?


It had so many suspended GPU WUs that it was running out of GPU memory. I was getting error messages saying WUs were suspended waiting for GPU memory. By manually suspending them and releasing them a few at a time I have managed to get both GPUs in the system running properly again.

You could try temporarily unchecking the option to leave suspended WUs in memory in either your preferences or in Boinc.
I am not positive this works for GPU tasks, but if it does it might allow you to leave Boinc on autopilot.

It does work for both CPU & GPU applications. I have had to use that set to disabled as my GT8500 only has 256MB. So no room to run one and hold onto another.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
ID: 1216907
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14666
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1216911 - Posted: 11 Apr 2012, 17:13:22 UTC - in response to Message 1216907.  
Last modified: 11 Apr 2012, 17:17:09 UTC

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?

It had so many suspended GPU WUs that it was running out of GPU memory. I was getting error messages saying WUs were suspended waiting for GPU memory. By manually suspending them and releasing them a few at a time I have managed to get both GPUs in the system running properly again.

You could try temporarily unchecking the option to leave suspended WUs in memory in either your preferences or in Boinc.
I am not positive this works for GPU tasks, but if it does it might allow you to leave Boinc on autopilot.

It does work for both CPU & GPU applications. I have had to use that set to disabled as my GT8500 only has 256MB. So no room to run one and hold onto another.

Really? Which version of BOINC? I thought they'd dropped the (very brief) experiment of keeping GPU apps in video memory when suspended a long time ago. Wait while I look...

Edit ... that was easy.

The policy for GPU jobs:

* jobs are always removed from memory, regardless of checkpoint (GPU memory is not paged, so it's bad to leave an idle app in memory)

That's from the changelog for v6.6.12, 4 March 2009.
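
The quoted policy can be written as a tiny decision function in Python. This is a sketch only: the function name is invented, and the CPU branch (keep the job in memory only when the "leave in memory" preference is set) is a simplifying assumption, not something the changelog entry states.

```python
def remove_from_memory_on_suspend(uses_gpu: bool, leave_in_memory_pref: bool) -> bool:
    """Return True if a suspended job should be removed from memory."""
    if uses_gpu:
        # Changelog policy: GPU memory is not paged, so GPU jobs are always removed.
        return True
    # Assumption: CPU jobs simply follow the user preference.
    return not leave_in_memory_pref


print(remove_from_memory_on_suspend(uses_gpu=True, leave_in_memory_pref=True))
print(remove_from_memory_on_suspend(uses_gpu=False, leave_in_memory_pref=True))
```

Under this model the preference has no effect on GPU tasks, which is the point being made above.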
ID: 1216911
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1216916 - Posted: 11 Apr 2012, 17:21:22 UTC - in response to Message 1216911.  
Last modified: 11 Apr 2012, 17:25:48 UTC

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?

It had so many suspended GPU WUs that it was running out of GPU memory. I was getting error messages saying WUs were suspended waiting for GPU memory. By manually suspending them and releasing them a few at a time I have managed to get both GPUs in the system running properly again.

You could try temporarily unchecking the option to leave suspended WUs in memory in either your preferences or in Boinc.
I am not positive this works for GPU tasks, but if it does it might allow you to leave Boinc on autopilot.

It does work for both CPU & GPU applications. I have had to use that set to disabled as my GT8500 only has 256MB. So no room to run one and hold onto another.

Really? Which version of BOINC? I thought they'd dropped the (very brief) experiment of keeping GPU apps in video memory when suspended a long time ago. Wait while I look...

Edit ... that was easy.

The policy for GPU jobs:

* jobs are always removed from memory, regardless of checkpoint (GPU memory is not paged, so it's bad to leave an idle app in memory)

That's from the changelog for v6.6.12, 4 March 2009.

And Boinc 6.6.37 had (6 July 2009):

- client: when suspending a GPU job, always remove it from memory, even if it hasn't checkpointed. Otherwise we'll typically run another GPU job right away, and it will bomb out or revert to CPU mode because it can't allocate video RAM


Claggy
ID: 1216916
MikeN
Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1216918 - Posted: 11 Apr 2012, 17:23:46 UTC - in response to Message 1216905.  

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?


It had so many suspended GPU WUs that it was running out of GPU memory. I was getting error messages saying WUs were suspended waiting for GPU memory. By manually suspending them and releasing them a few at a time I have managed to get both GPUs in the system running properly again.

You could try temporarily unchecking the option to leave suspended WUs in memory in either your preferences or in Boinc.
I am not positive this works for GPU tasks, but if it does it might allow you to leave Boinc on autopilot.


I had considered that but, like you, was not sure if it applied to GPUs or not, and the response from Richard suggests not. I am a bit of a control freak, so the NNT and suspend-most-tasks option worked OK. The system has just finished all the partially started tasks and I have unsuspended another 50 or so GPU tasks. I will leave it on NNT overnight (with enough non-suspended tasks to keep it going) and see what the estimated completion times look like tomorrow. They should return to normal, especially as the CPU is busy crunching APs and so will not interfere.
ID: 1216918
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1216957 - Posted: 11 Apr 2012, 18:55:44 UTC - in response to Message 1216911.  

Has needed constant manual intervention all day.

I can't see why. It's not going to miss any deadlines so why not just let it go?

It had so many suspended GPU WUs that it was running out of GPU memory. I was getting error messages saying WUs were suspended waiting for GPU memory. By manually suspending them and releasing them a few at a time I have managed to get both GPUs in the system running properly again.

You could try temporarily unchecking the option to leave suspended WUs in memory in either your preferences or in Boinc.
I am not positive this works for GPU tasks, but if it does it might allow you to leave Boinc on autopilot.

It does work for both CPU & GPU applications. I have had to use that set to disabled as my GT8500 only has 256MB. So no room to run one and hold onto another.

Really? Which version of BOINC? I thought they'd dropped the (very brief) experiment of keeping GPU apps in video memory when suspended a long time ago. Wait while I look...

Edit ... that was easy.

The policy for GPU jobs:

* jobs are always removed from memory, regardless of checkpoint (GPU memory is not paged, so it's bad to leave an idle app in memory)

That's from the changelog for v6.6.12, 4 March 2009.

You are in fact correct. I just checked that with 6.12.33 I am using now and the 6.10.48 I was using before. I must have been remembering how it worked in a much older version or had some other issue where there were multiple GPU exes running on my 8500.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
ID: 1216957
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1217043 - Posted: 11 Apr 2012, 22:30:52 UTC - in response to Message 1216897.  

Horacio wrote:
...
AP V6 APR is working very well on my hosts: just a little above the perfect value for keeping the DCF at 1 +/-10%, which is very acceptable anyway. I guess that the filter on the splitters, plus the threshold on the % blanked for a result to be used in the APR, did a very good job.

But, APR for MB is more than 30 times higher than it should be... good to know that is going to be fixed!

None of your hosts are showing high APRs for MB. In any case, the change to ignore runtimes of result_overflow MB tasks was made last year so there's nothing new to affect those APRs.
                                                                  Joe


This one, http://setiathome.berkeley.edu/host_app_versions.php?hostid=6187288
The S@H Enhanced GPU APR is now around 133, while the flops value that works is 13 (and some WUs still give a DCF raw_ratio above 2)... Some time ago (the GPUs were added in October last year) this value was above 300.

133 GFLOPS corresponds to run time around 20 minutes for midrange, 6 minutes for VHAR, your 560ti GPUs are doing tasks at about that speed. With flops at ~1/10 APR, the servers have been able to compensate effectively by scaling rsc_fpops_est down, and the change in allowed ratio to 1/50 doesn't make a difference there. No need to change anything for S@H Enhanced since you're well past the crazy things that happen before the averages are established. But when S@H v7 is released here, getting the <flops> right as early as possible would make it possible to avoid those difficulties. OTOH, those difficulties are typically just uncomfortable rather than damaging, a good excuse for discussions here.
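
The compensation can be illustrated with a short Python sketch. It assumes the client estimates run time as rsc_fpops_est / flops times DCF, that the flops value of 13 means 13 GFLOPS, and that the server scales rsc_fpops_est by flops/APR, limited by the allowed ratio. The exact server-side formula is not given here, so `server_scale` is a hypothetical stand-in.

```python
def estimated_runtime_s(rsc_fpops_est: float, flops: float, dcf: float = 1.0) -> float:
    """Client-side estimate (assumed form): work estimate / speed, times DCF."""
    return rsc_fpops_est / flops * dcf


def server_scale(flops: float, apr: float, min_ratio: float = 1 / 50) -> float:
    """Hypothetical scaling factor applied to rsc_fpops_est, clamped at min_ratio."""
    return max(flops / apr, min_ratio)


APR = 133e9                  # measured speed from the host page, in FLOPS
CLIENT_FLOPS = 13e9          # <flops> value in app_info.xml, assumed to be in FLOPS
TRUE_FPOPS = APR * 20 * 60   # work implied by a 20-minute midrange task

unscaled = estimated_runtime_s(TRUE_FPOPS, CLIENT_FLOPS)
scaled = estimated_runtime_s(TRUE_FPOPS * server_scale(CLIENT_FLOPS, APR), CLIENT_FLOPS)

print(f"without scaling: {unscaled / 60:.0f} min")
print(f"with scaling:    {scaled / 60:.0f} min")
```

Because 13/133 is roughly 1/10, well above the 1/50 floor, the clamp never engages in this model, which is consistent with the new limit making no difference for this host.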

And in this one, http://setiathome.berkeley.edu/host_app_versions.php?hostid=6569691, the APR is 46.64 while the working flops value is 10; this host was built this year.

OK, right now it's not 30 times greater, but it's still not accurate enough to be able to stop using flops.

Anyway, the APR is just one side of this matter. The estimated task sizes of the MB WUs seem to be very accurate (and consistent with the Angle Range) for the GT430, but they fail badly on the 560Ti, on which some tasks estimated as shorties take much more time than expected and vice versa... (But I guess this is somehow related to the different optimizations that are used with each kind of hardware, and I don't see any way in which the project can help from the server side.)

The main problem with a few tasks taking much longer than estimated is that DCF jumps up to predict that all cached tasks are also going to run slowly, and only comes back down slowly.
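
That asymmetry can be sketched in Python. The update rule below is a simplified model of how the client adjusts DCF; the 0.9/0.1 and 0.99/0.01 weights and the function name are assumptions, not taken from this thread.

```python
def update_dcf(dcf: float, raw_ratio: float) -> float:
    """One DCF update after a task finishes; raw_ratio = actual time / estimated time."""
    if raw_ratio > dcf:
        return raw_ratio                      # jump up at once
    if raw_ratio < 0.1 * dcf:
        return 0.99 * dcf + 0.01 * raw_ratio  # very cautious decrease
    return 0.9 * dcf + 0.1 * raw_ratio        # gradual decrease


dcf = update_dcf(1.0, 5.0)   # one task runs 5x longer than estimated
tasks_needed = 0
while dcf > 1.1:             # count well-estimated tasks until DCF is within 10% of 1
    dcf = update_dcf(dcf, 1.0)
    tasks_needed += 1

print(f"tasks needed to recover: {tasks_needed}")
```

In this model a single slow task raises DCF in one step, while it takes a few dozen well-estimated tasks to bring it back near 1.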

I can just barely understand why "being able to stop using flops" seems like a good thing to some users. My take is that the BOINC core client provides such an extremely conservative flops that it simply makes sense to provide something better. Those doing only CPU work can get by without <flops> in app_info.xml, but those with CUDA or OpenCL capable GPUs which outperform their CPUs by a factor of 10 or so should be aware that the core client is telling the servers the GPUs are slower than the CPUs if <flops> are not set. After APR is established the servers will compensate, of course, but that period just after release of a new app (or after a host has accidentally been assigned a new hostID) when the estimates are terrible need not be so.
                                                                  Joe
ID: 1217043
Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1217058 - Posted: 11 Apr 2012, 23:24:36 UTC - in response to Message 1217043.  

The main problem with a few tasks taking much longer than estimated is that DCF jumps up to predict that all cached tasks are also going to run slowly, and only comes back down slowly.

I was not talking about the estimated times (I don't really pay attention to those numbers, as they change with the DCF variations); I was talking about Estimated Task Sizes. The real crunching time on the GT430 is almost always proportional to the size and also consistent with the Angle Range, but on the 560Ti there are cases in which the real time is so far from what is expected for that size that the raw_ratio (from DCF-debug) boldly goes where no WU has gone before!

I can just barely understand why "being able to stop using flops" seems like a good thing to some users.

Just because I'm lazy: updating the optimized apps requires some extra work editing the app_info, but also a lot of work to find the right new value when there are new optimizations and different performance.
Of course "being able" doesn't mean "take the option out"... (I like to know that there are lifeboats available, but I prefer not to have to use them... LOL)

For hosts running only one project, any error in these estimates has almost no consequences: at most an over- or under-fetched cache, and just for a while.
The "worst" issues come when there are several projects and one of them changes all the estimated times by a factor of 5 (due to a DCF raised by just one weird WU), making all the other projects with short deadlines enter panic mode (which prevents the offending project from running, which in turn delays the tuning down of the wrong DCF), and then, as in any "panic" situation, everything is messed up until the cops arrive...

Anyway, I agree that all this is just an excuse to talk about something here.


ID: 1217058
MikeN
Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1217199 - Posted: 12 Apr 2012, 8:26:43 UTC - in response to Message 1216892.  
Last modified: 12 Apr 2012, 8:34:15 UTC

Great to finally see some movement towards setting things right again!


...The net result has been that the cruncher has been starting and stopping WUs in HP mode all day. At one stage the list of WUs started and suspended because another one was higher priority covered a whole page of the screen.

To calm it down I have had to select NNT and suspend all non-started WUs. I will release them a few at a time. Has needed constant manual intervention all day.


It now seems that this problem is not related to the predicted short run times. Yesterday my main cruncher downloaded a whole load of updates from Microsoft and has been running strangely ever since. I leave the computer (which is based at work) with the monitor manually turned off but the PC set never to go to sleep and this has not changed. However, when left like this both graphics cards (as of yesterday) hang. They start one WU and after 30 seconds in which no processing is done try a different WU and so on. Overnight last night, they managed to crunch just 4 WUs between them! However, when I remotely monitor the PC using LogMeIn (as I did all day yesterday) then everything works OK. The CPU is not affected by this problem and has been happily crunching APs throughout. One of the Microsoft updates was a new video driver (295.73).

I am about to go into work (supposed to be on holiday!) to do a clean reinstall of the NVIDIA driver that has worked fine for the last 3 months (270.61). However, if anyone has any other suggestions as to what could be causing this, and solutions, I would like to hear them.
ID: 1217199
Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1217203 - Posted: 12 Apr 2012, 8:46:29 UTC - in response to Message 1217199.  

Great to finally see some movement towards setting things right again!


...The net result has been that the cruncher has been starting and stopping WUs in HP mode all day. At one stage the list of WUs started and suspended because another one was higher priority covered a whole page of the screen.

To calm it down I have had to select NNT and suspend all non-started WUs. I will release them a few at a time. Has needed constant manual intervention all day.


It now seems that this problem is not related to the predicted short run times. Yesterday my main cruncher downloaded a whole load of updates from Microsoft and has been running strangely ever since. I leave the computer (which is based at work) with the monitor manually turned off but the PC set never to go to sleep and this has not changed. However, when left like this both graphics cards (as of yesterday) hang. They start one WU and after 30 seconds in which no processing is done try a different WU and so on. Overnight last night, they managed to crunch just 4 WUs between them! However, when I remotely monitor the PC using LogMeIn (as I did all day yesterday) then everything works OK. The CPU is not affected by this problem and has been happily crunching APs throughout. One of the Microsoft updates was a new video driver (295.73).

I am about to go into work (supposed to be on holiday!) to do a clean reinstall of the NVIDIA driver that has worked fine for the last 3 months (270.61). However, if anyone has any other suggestions as to what could be causing this, and solutions, I would like to hear them.


If I'm not wrong, the installed drivers have a bug which disables the GPUs when the monitor goes to sleep (from the OS). If you set the monitor to never go to sleep, you won't need to reinstall drivers (but you will need to turn it off manually to avoid wasting energy).
ID: 1217203
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1217205 - Posted: 12 Apr 2012, 8:57:29 UTC - in response to Message 1217199.  
Last modified: 12 Apr 2012, 8:59:31 UTC

Great to finally see some movement towards setting things right again!


...The net result has been that the cruncher has been starting and stopping WUs in HP mode all day. At one stage the list of WUs started and suspended because another one was higher priority covered a whole page of the screen.

To calm it down I have had to select NNT and suspend all non-started WUs. I will release them a few at a time. Has needed constant manual intervention all day.


It now seems that this problem is not related to the predicted short run times. Yesterday my main cruncher downloaded a whole load of updates from Microsoft and has been running strangely ever since. I leave the computer (which is based at work) with the monitor manually turned off but the PC set never to go to sleep and this has not changed. However, when left like this both graphics cards (as of yesterday) hang. They start one WU and after 30 seconds in which no processing is done try a different WU and so on. Overnight last night, they managed to crunch just 4 WUs between them! However, when I remotely monitor the PC using LogMeIn (as I did all day yesterday) then everything works OK. The CPU is not affected by this problem and has been happily crunching APs throughout. One of the Microsoft updates was a new video driver (295.73).

I am about to go into work (supposed to be on holiday!) to do a clean reinstall of the NVIDIA driver that has worked fine for the last 3 months (270.61). However, if anyone has any other suggestions as to what could be causing this, and solutions, I would like to hear them.


setiathome_CUDA: cudaGetDeviceCount() call failed.
setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device cannot be used
  Cuda device initialisation retry 1 of 6, waiting 5 secs...
Cuda error 'Couldn't get cuda device count
' in file 'c:/[Projects]/X_CudaMB/client/cuda/cudaAcceleration.cu' in line 146 : no CUDA-capable device is detected.


You are on a 295 driver, which has a known bug causing CUDA devices to disappear when the display goes to sleep on DVI connections, IIRC.

Yes, that driver Microsoft installed is the culprit - either downgrade to 290.x or upgrade to 301.x

edit: and as Horacio says, using the display settings to keep them active and turning off the monitor (which you do anyway, if I understood correctly) is a workaround.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1217205
MikeN
Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1217260 - Posted: 12 Apr 2012, 13:21:59 UTC - in response to Message 1217205.  

You are on a 295 driver, which has a known bug causing CUDA devices to disappear when the display goes to sleep on DVI connections, IIRC.

Yes, that driver Microsoft installed is the culprit - either downgrade to 290.x or upgrade to 301.x

edit: and as Horacio says, using the display settings to keep them active and turning off the monitor (which you do anyway, if I understood correctly) is a workaround.


Thanks, all, for the advice. After a lot of messing around this morning I managed to get the cruncher back to driver 270.61, and now all seems OK again. Just another example of Bill Gates trying to stop us finding ET :)) Now if it will just behave for a few days (I've been fighting weekend power cuts and overnight GPU crashes for the last few weeks) I should finally make it to 7 million credits, which BoincStats was expecting me to reach yesterday!

I have another question. My motherboard has a x16 and a x4 slot for the GPUs. Whilst messing around this morning I noticed that my faster GPU (GTX460) is currently in the x4 slot and the slower GPU (GT430) is in the x16 slot. Would I see a (significant) performance enhancement if I swapped the GPUs over?
ID: 1217260

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.