Questions and Answers :
GPU applications :
GPU runs 6 CUDA WUs, switching among them.
Message board moderation
Author | Message |
---|---|
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
My GPU is running 6 WUs, switching among them every 5 to 30 minutes. The problem is that it seems to take a good while to resume a WU after each switch, thus wasting a lot of time. Which is the fastest way? If one-at-a-time is the fastest way, how do I get it? This problem started after updating from 6.6.20 to 6.6.31. Is a "standard tuning recipe" for CPU and GPU available? I started 6.6.20 with CUDA on May 13th and saw a steadily rising curve up to Jun 1st, when I turned the PC off for a cooler replacement. I booted it again on Jun 6th, updated to 6.6.31, and after a week there is no rising tendency at all. Without CUDA this PC was at about 600/700 RAC, and I'm still around there now. I run a Pentium D 3 GHz, 2 GB RAM, 2+2 MB cache, 8600GT with the 6.14.11.7813 driver, XP SP3, and 100% SETI. Thanks for any help, Eduardo |
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
First thing I would do is head here http://www.nvidia.com/Download/index.aspx?lang=en-us to get the latest Cuda driver. PROUD MEMBER OF Team Starfire World BOINC |
Basshopper Send message Joined: 5 Aug 99 Posts: 6 Credit: 20,615,691 RAC: 0 |
Get the next update, 6.6.36; that fixed this issue. |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
Basshopper and perryjay: I did both things and I'll report the effect tomorrow. Thanks, Eduardo |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
Hey, guys, I updated to 6.6.36 and 12 hours later it is still running 5 WUs simultaneously, switching among them every 5 to 30 minutes. Is there another way to fix it? Thanks, Eduardo |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
I suspect that those are in deadline trouble. Please check the deadlines of the tasks, both those showing this behaviour and those still in the queue. If some are within their last 24 hours before deadline, they may run with this behaviour. It is best to abort tasks nearing their deadline if you can't run them all before that time. Next, adjust your additional work request to 2 to 3 days. |
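Jord's rule of thumb (abort tasks that can't finish before their deadlines) can be sketched as a simple feasibility check. This is an illustrative model only, not BOINC's actual scheduler; the function, task names, and runtimes below are all hypothetical.

```python
# Illustrative model of "abort what can't finish before its deadline".
# NOT BOINC's real scheduler; names and numbers are hypothetical.

def tasks_to_abort(tasks, gpu_count=1):
    """tasks: list of (name, remaining_hours, hours_until_deadline).
    Assumes tasks run one at a time per GPU in deadline order and
    flags any task that would finish after its own deadline."""
    ordered = sorted(tasks, key=lambda t: t[2])  # earliest deadline first
    elapsed = 0.0
    doomed = []
    for name, remaining, until_deadline in ordered:
        elapsed += remaining / gpu_count
        if elapsed > until_deadline:  # would complete past its deadline
            doomed.append(name)
    return doomed

queue = [
    ("wu_a", 1.1, 24.0),  # ~1 hour left, a day until deadline: fine
    ("wu_b", 1.1, 2.0),   # tight but feasible
    ("wu_c", 5.0, 3.0),   # 5 hours of work, 3 hours of time: abort
]
print(tasks_to_abort(queue))  # ['wu_c']
```

The point of the sketch is simply that only genuinely infeasible tasks need aborting; a task with a near deadline that still fits (like `wu_b`) should be left alone.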
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
Jord, even before I read your last post, I had already aborted 3 tasks that I suspected were the problem (and they weren't in deadline trouble yet), AND THIS INDEED SOLVED THE PROBLEM! NOW IT IS RUNNING JUST 1 CUDA TASK AT A TIME. And the numbers started to grow again. Thanks to all for the assistance, Eduardo P.S.: I reviewed the work done recently and found that it was costing up to 100% more time to crunch each CUDA unit. |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
I suspect that those are in deadline trouble. Please check the deadline of the tasks, both those in question with this behaviour and those still in queue. If some are within their last 24 hours before deadline, they could go run with this behaviour. I have the same problem: they just pile up (about 50), stopping all the time even after only a few minutes. And deadline... The closest one is 7/5/2009; as it is now 7/17/2009, it is well within the deadline. I got one from 6/29 waiting (2 SECONDS before the end) to run even while others from 7/5 are running. Is there any way to prevent the waiting? It just makes no sense at all; the longest units take about 10 minutes. |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
The problem returned yesterday, when the computer started running 3 simultaneously. Tonight there were 13 units in process, and the times are more than 3 hours each instead of the typical 1:07/1:10: it is consuming a lot of extra time to crunch them, an incredible waste of computing power. I ask for help again. I aborted 2 of the units with more than 3:30 hours (there were 4 in this condition) that seemed to be the cause, kept just one running, and suspended all the other 10. I'll see if it works. Thanks for any help, Eduardo |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
It didn't work: 5 hours later, BOINC is running 4 other CUDA units simultaneously. Is a fix available for this problem? I'll keep it running while waiting for a solution, because it is more useful to run low-efficiency CUDA than none. 6.6.36 w/ 2.8.7 widgets. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
How big is your cache set to? It might be a bit too large; try setting it to about 2 days max. And how many CUDA tasks do you have? Claggy |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
My cache is set to 10+2 days. It was loaded a couple of days ago, and I have about 140 CUDA units ready to start and 13 running. Right now I'm out of CPU WUs and the manager does not ask for more work. I know this is another problem being discussed in another thread, but my computing power is available to SETI and it's a shame it's not being used at its maximum. Please help. Thanks, Eduardo |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
My cache is set to 10+2 days. How so? Is that "Connect about every" set to 10 and "Additional work" set to 2? That only caches 2 days of work, while telling BOINC that it won't have an internet connection until those 10 days are gone. You set the cache through the "Additional work" value, with a maximum of 10 days. With short work (less than a 2-week deadline) you may not get work when the cache is set to 10 days, as BOINC may then think you won't be able to finish it in time. |
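Jord's point about the two settings can be reduced to some simple arithmetic. The function below is a hypothetical simplification of the BOINC 6.x work-fetch idea, not the real client code; the 7-day deadline used in the example is also just an assumption for illustration.

```python
# Sketch of why a big "connect about every X days" value can block work
# fetch. Hypothetical simplification, NOT BOINC's actual work-fetch code.

def would_fetch_work(connect_days, extra_days, deadline_days):
    """BOINC roughly assumes the host may be offline for connect_days,
    so a task fetched now might not be reported until the whole buffer
    (connect_days + extra_days) has drained. If that exceeds the task's
    deadline, the client can decline to fetch it."""
    buffer_days = connect_days + extra_days
    return buffer_days <= deadline_days

# A "10 + 2 days" setup against a hypothetical 7-day-deadline task:
print(would_fetch_work(10, 2, 7))  # False -> no new work requested
# A "0 + 2 days" setup against the same task:
print(would_fetch_work(0, 2, 7))   # True -> fetch proceeds
```

This is why swapping the two values ("backwards" settings, in Claggy's words) starves the host even though the total looks reasonable: the connect interval counts against every task's deadline.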
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
When BOINC 6.6.33 was released I found that, if you suspend the running CUDA task at the top of the cache, BOINC closes the CUDA app running that WU (as seen in Windows Task Manager), freeing up the GPU memory. That's OK. But if you do that while the CUDA WU is in its first 10 seconds or so (i.e. before it has checkpointed), then BOINC doesn't shut that CUDA WU down (and so doesn't free up the GPU memory). When the next CUDA WU starts there isn't enough GPU memory left, and that WU falls into CPU fallback mode. The WU then takes a huge amount of CPU time to complete (while sharing CPU time with normal CPU apps), and when it completes, DCF (duration correction factor) gets driven upwards sharply, making all the completion estimates a lot longer and more likely to push the client into EDF.

I think your PC probably ran a CUDA VLAR task (Very Low Angle Range), which takes a long time with the CUDA app. The DCF was driven up, causing BOINC to switch to EDF (Earliest Deadline First) for CUDA (because of your excessive and backwards cache setting). The first EDF CUDA WU started, then immediately switched to a second (using up all the GPU memory); then every CUDA WU that starts runs in CPU fallback mode, and every one that finishes just drives DCF higher still.

The way around it:
1. Make your cache size in your preferences smaller; try 0+10 first to see if EDF stops.
2. Suspend a large enough block of CUDA WUs so BOINC returns the GPU to FIFO order (First In, First Out).
3. Let time pass while watching your DCF; it should now get driven downwards (to under 1), or you could just edit it down to 1.
4. Unsuspend some CUDA WUs as the cache size comes down, until the CUDA WUs always run in FIFO order.

Claggy

Edit: Just tried suspending CUDA tasks in quick succession on BOINC 6.6.36; the CUDA app still goes into fallback mode. |
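The DCF feedback loop Claggy describes can be modelled in a few lines. This is a loose sketch of the mechanism only: the update rule, constants, and runtimes below are illustrative assumptions, not BOINC's actual DCF formula.

```python
# Loose model of the feedback loop: one CPU-fallback CUDA result takes
# far longer than estimated, DCF jumps, every queued WU's completion
# estimate is multiplied up, and the client panics into EDF.
# Update rule and all numbers are illustrative, NOT BOINC's real ones.

def update_dcf(dcf, estimated_h, actual_h):
    ratio = actual_h / estimated_h
    if ratio > dcf:
        return ratio                      # rise immediately on a slow result
    return dcf + 0.1 * (ratio - dcf)      # drift back down only gradually

dcf = 1.0
# A ~1.1h WU falls into CPU fallback and takes 5 hours:
dcf = update_dcf(dcf, estimated_h=1.1, actual_h=5.0)
print(round(dcf, 2))  # ~4.55: every estimate is now ~4.5x longer

# 140 queued WUs at ~1.1h raw estimate, corrected by the new DCF:
queued_hours = 140 * 1.1 * dcf
print(queued_hours > 7 * 24)  # estimated queue now "exceeds" a week
```

Once the corrected queue estimate overflows the deadlines, the client goes into deadline-first mode, which is exactly the constant task-switching Eduardo is seeing; and because the rule only lets DCF fall slowly, each further fallback result keeps the loop going.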
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
Now I see how BOINC understands the settings and how I've misled it. I changed the cache settings and I'm following the suggested steps. I'll report their effectiveness. Thanks very much, Eduardo |
S@NL - eFMer - efmer.com/boinc Send message Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
I got a VLAR killer, so that's not it. Got 120 waiting, piled up. The CUDA tasks stay in CPU memory, up to 8 of them. The CUDA tasks stop at impossible times like 0%, 0.00% or 100%. Because of this I get system crashes, which is not what I want. I will suspend the rest and put the buffer to 2 days, which leaves me with less than 0.5 days of CUDA buffer. Not what I like... |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
6 hours after step one, with no behavior change, I went on to step 2, and as soon as I started to suspend a "large enough" amount of CUDA units, the manager instantly went crazy, starting units and leaving them waiting after 2 to 5 seconds of processing: now I've got 49 waiting to run. So I suspended everything else except 15 and went on to step 3, "letting time pass", to see if it cleans up. The truth is that I'm going to manage the amount of available units each day until this ends. And the BOINC manager still does not request new tasks, and the computer has been out of regular CPU work for 12 hours now. Shall I reset the project and start all over again with these new preference settings? Eduardo |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
If you end up with 50 CUDA tasks started, make sure you restart BOINC (after suspending some tasks, so it'll run in FIFO order); that will free up the GPU memory. BOINC won't ask for work from a project if any of its WUs are suspended, so you'll have to unsuspend them to get more CPU work. Or you could always get the ReSchedule program from here: Re: CPU <-> GPU rebranding and rebrand some of the GPU tasks to the CPU. You'll need 7zip to decompress it; put the program and config files in your BOINC Data folder, and a shortcut to the program on your desktop. Shut down BOINC, run the ReSchedule program (you'll have to untick a box in the config tab to get the slider to work), click test, then, when that's finished, click run, then shut down ReSchedule and restart BOINC. Claggy |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
One CUDA unit from that group of 15 I left running took more than 5 (five!) hours to claim 40 credits, so I aborted 3 others that were going the same way. And the tips from Claggy explain a lot about how the manager works: I unsuspended all units and the manager started to request new work immediately. But maybe the real question is: why, after many days running right and without any interference from my side, whether hardware, software or settings, does the manager begin to jump from task to task all the time? This reminds me of that problem with old releases that were incapable of finishing some units AND incapable of asking a bad WU to exit, I guess 5.8 or something. I don't know what was done to solve it, but it doesn't happen anymore (I have the same computer). Maybe the same kind of adjustment must be done regarding CUDA management. While I wrote this, the manager got new work and now I'm crunching regular units again. I'd like a definitive solution, but meanwhile let's follow Claggy's advice, let time pass, and see what happens. Thanks for your advice, Eduardo |
Eduardo Bicudo Dreyfuss Send message Joined: 20 Jun 05 Posts: 58 Credit: 1,386,154 RAC: 0 |
I still have the same behaviour, now crunching 45 (!) CUDA WUs at the same time, and some of them are taking up to 7 hours to crunch compared to the typical 1:07/1:10 (see (*) below). I checked the BOINC Q&A and this is being tracked there too, under the name "Stop switching between WU". My PC was at 920 RAC, on a rising curve expected to reach at least 1400, and now it's at 740 and going down, a big waste of computing power. I ask again for a solution. Thanks, Eduardo (*) 30mr.09ab.10053.19704.3.8.74_1, reported Jun 25, 02:00 GMT; I couldn't tell how many credits it got, since right after reporting it wasn't in my tasks list anymore. Is this one a VLAR that would justify this huge crunch time? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.