GPU runs 6 CUDA WUs, switching among them.

Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 906688 - Posted: 12 Jun 2009, 15:14:07 UTC

My GPU is running 6 WUs, switching among them every 5 to 30 minutes. The problem is that it seems to take a long time to resume a WU each time it switches, and thus it is wasting a lot of time.

Which is the fastest way?
If one at a time is the fastest way, how do I make it do that?

This problem started after updating from 6.6.20 to 6.6.31.

Is an "standard tune recipe" for cpu and gpu available?

I started 6.6.20 with CUDA on May 13th and saw a steadily rising credit curve up to Jun 1st, when I turned the PC off for a cooler replacement. I brought it back up again on Jun 6th, updated to 6.6.31, and after a week there is no rising tendency at all. Without CUDA this PC was at about 600/700 RAC, and I'm still stuck there.

I run a Pentium D 3 GHz, 2 GB RAM, 2+2 MB cache, an 8600GT with the 6.14.11.7813 driver, XP SP3, and 100% SETI.

Thanks for any help,

Eduardo
ID: 906688
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 906692 - Posted: 12 Jun 2009, 15:32:13 UTC - in response to Message 906688.  

First thing I would do is head here http://www.nvidia.com/Download/index.aspx?lang=en-us to get the latest Cuda driver.


PROUD MEMBER OF Team Starfire World BOINC
ID: 906692
Basshopper

Joined: 5 Aug 99
Posts: 6
Credit: 20,615,691
RAC: 0
United States
Message 906989 - Posted: 12 Jun 2009, 22:40:57 UTC - in response to Message 906688.  
Last modified: 12 Jun 2009, 23:17:19 UTC

Get the next update, 6.6.36, which fixed that issue.
ID: 906989
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 907146 - Posted: 13 Jun 2009, 5:28:48 UTC

Basshopper and perryjay: I did both things and I'll report the effect tomorrow.

Thanks,

Eduardo
ID: 907146
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 907268 - Posted: 13 Jun 2009, 13:57:47 UTC

Hey, guys, I updated to 6.6.36 and 12 hours later it is still running 5 WUs simultaneously, switching among them every 5 to 30 minutes.

Is there another way to fix it?

Thanks,

Eduardo


ID: 907268
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 907273 - Posted: 13 Jun 2009, 14:07:21 UTC - in response to Message 907268.  
Last modified: 13 Jun 2009, 14:08:13 UTC

I suspect those tasks are in deadline trouble. Please check the deadlines, both of the tasks showing this behaviour and of those still in the queue. If some are within the last 24 hours before their deadline, they could run like this.

It's best to abort tasks nearing their deadline if you can't run them all before that time.

Next, adjust your additional work buffer setting to 2 or 3 days.
ID: 907273
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 907317 - Posted: 13 Jun 2009, 16:48:45 UTC

Jord, even before I'd read your last post, I'd already aborted 3 tasks that I suspected were the problem (and they weren't in deadline trouble yet) AND THIS INDEED SOLVED THE PROBLEM!

NOW IT IS RUNNING JUST 1 CUDA AT A TIME.

And the numbers have started to grow again.

Thanks to all for the assistance,

Eduardo

P.S.: I went through the recently completed work and found that it was taking up to 100% more time to crunch each CUDA unit.
ID: 907317
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 908336 - Posted: 17 Jun 2009, 7:56:21 UTC - in response to Message 907273.  

I suspect those tasks are in deadline trouble. Please check the deadlines, both of the tasks showing this behaviour and of those still in the queue. If some are within the last 24 hours before their deadline, they could run like this.

It's best to abort tasks nearing their deadline if you can't run them all before that time.

Next, adjust your additional work buffer setting to 2 or 3 days.

I have the same problem: they just pile up (about 50), stopping all the time, even after only a few minutes. And deadlines... The closest one is 7/5/2009, and since it is now 6/17/2009, that is well within the deadline.
I've got one due 6/29 waiting to run (stopped 2 SECONDS before the end), even while others due 7/5 are running.
Is there any way to prevent the waiting? It just makes no sense at all; the longest units take about 10 minutes.
ID: 908336
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 909314 - Posted: 19 Jun 2009, 22:53:30 UTC

The problem returned yesterday when the computer started running 3 units simultaneously. Tonight there were 13 units in progress, and the times are more than 3 hours each instead of the typical 1:07/1:10: it is consuming a lot of extra time to crunch them, an incredible waste of computing power.

I ask for help again.

I aborted 2 of the units with more than 3:30 hours (there were 4 in this condition), which seem to be the cause, kept just one running, and suspended all the other 10. I'll see if it works.

Thanks for any help,

Eduardo
ID: 909314
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 909394 - Posted: 20 Jun 2009, 4:31:14 UTC - in response to Message 909314.  

It didn't work: 5 hours later, BOINC is running 4 other CUDA units simultaneously.

Is a fix available for this problem?

I'll keep it running while waiting for a solution, because running CUDA at low efficiency is more useful than not running it at all.

6.6.36 w/ 2.8.7 widgets.
ID: 909394
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 909536 - Posted: 20 Jun 2009, 15:18:24 UTC - in response to Message 909394.  
Last modified: 20 Jun 2009, 15:19:12 UTC

How big is your cache set to? It might be a bit too large; try setting it to about 2 days max.

And how many CUDA tasks do you have?

Claggy
ID: 909536
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 909706 - Posted: 20 Jun 2009, 22:46:52 UTC - in response to Message 909536.  

My cache is set to 10+2 days. It was loaded a couple of days ago, and I have about 140 CUDA units ready to start and 13 running.

Right now I'm out of CPU WUs and the manager does not ask for more work. I know this is another problem being discussed in another thread, but my computing power is available to SETI and it's a shame it's not being used to its maximum.

Please, help.

Thanks,

Eduardo
ID: 909706
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 909716 - Posted: 20 Jun 2009, 23:25:15 UTC - in response to Message 909706.  

My cache is set to 10+2 days.

How so? Is that the "Connect to" interval set to 10 and the "Additional work" set to 2?
That only caches 2 days of work, while telling BOINC that it won't have an internet connection again until those 10 days have passed.

You set the cache through the "Additional work" value, with a maximum of 10 days.
With short work (deadlines of less than 2 weeks) you may not get any when the cache is set to 10 days, as BOINC may then think you won't be able to finish it in time.
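For reference, these two values map onto the work-buffer settings in BOINC's global preferences. A minimal sketch of a global_prefs_override.xml (placed in the BOINC data directory) that keeps the connect interval small and the extra buffer at 2 days might look like this; the field names are the standard BOINC ones, the values here are only an example:

<global_preferences>
   <!-- "Computer is connected to the Internet about every X days" -->
   <work_buf_min_days>0.1</work_buf_min_days>
   <!-- "Maintain enough work for an additional X days" -->
   <work_buf_additional_days>2.0</work_buf_additional_days>
</global_preferences>

Tell the client to re-read local preferences (or restart it) and these values will be used in place of the web preferences.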
ID: 909716
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 909721 - Posted: 20 Jun 2009, 23:43:30 UTC - in response to Message 909706.  
Last modified: 21 Jun 2009, 0:11:54 UTC

When BOINC 6.6.33 was released, I found that if you suspend the running CUDA task at the top of the cache, BOINC closes the CUDA app running that WU (watching Windows Task Manager confirms it), and so frees up the GPU memory. That's OK.

But if you do that while the CUDA WU is in its first 10 seconds or so (i.e. before it has checkpointed), BOINC doesn't shut that CUDA WU down (and so doesn't free up the GPU memory). When the next CUDA WU starts, there isn't enough GPU memory left and it falls into CPU fallback mode, taking a huge amount of CPU time to complete (while sharing CPU time with the normal CPU apps). When that WU completes, the DCF (duration correction factor) gets driven sharply upwards, making all the estimated completion times a lot longer and making a jump into EDF more likely.

I think your PC probably did a CUDA VLAR task (Very Low Angle Range), which takes a long time with the CUDA app. The DCF was driven up, causing BOINC to switch to EDF (Earliest Deadline First) for CUDA (because of your excessive and backwards cache setting). The first EDF CUDA WU started, then BOINC immediately switched to a second (using up all the GPU memory), and from then on every CUDA WU that starts runs in CPU fallback mode; if any finish, they just drive the DCF higher still.

The way around it:
1. Make your cache size in your preferences smaller; try 0+10 first to see if EDF stops.
2. Suspend a large enough block of CUDA WUs so that BOINC returns the GPU to FIFO order (First In, First Out).
3. Let time pass while watching your DCF; it should now get driven downwards (to under 1), or you could just edit it down to 1 yourself (see the sketch below).
4. Unsuspend some CUDA WUs as the cache size comes down, until the CUDA WUs always run in FIFO order.
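For step 3, if you do decide to edit the DCF by hand: it is stored per project in client_state.xml in the BOINC data directory. A rough sketch of the element to look for, assuming the usual layout (exit BOINC completely before touching the file; the value shown is just the target of 1.0, the other lines are placeholders):

<project>
    <master_url>http://setiathome.berkeley.edu/</master_url>
    ...
    <duration_correction_factor>1.000000</duration_correction_factor>
    ...
</project>

Save the file and restart BOINC; the estimated completion times should then drop back to something sensible while the DCF settles on its own.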

Claggy

Edit: Just tried suspending CUDA tasks in quick succession on BOINC 6.6.36; the CUDA app still goes into fallback mode.
ID: 909721
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 909737 - Posted: 21 Jun 2009, 1:23:34 UTC - in response to Message 909721.  

Now I see how BOINC understands the settings and how I've misled it.

I changed the cache settings and I'm following the suggested steps.

I'll report on its effectiveness.

Thanks very much,

Eduardo
ID: 909737
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 909783 - Posted: 21 Jun 2009, 5:31:53 UTC - in response to Message 909737.  

I've got a VLAR killer, so that's not it.
I've got 120 piled up waiting.
The CUDA tasks stay in CPU memory, up to 8 of them.
The CUDA tasks stop at impossible points like 0%, 0.00% or 100%.
Because of this I get system crashes, which is not what I want.
I will suspend the rest and put the buffer at 2 days, which leaves me with less than 0.5 days of CUDA buffer. Not what I like...
ID: 909783
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 909800 - Posted: 21 Jun 2009, 8:12:49 UTC - in response to Message 909737.  

Six hours after step one, with no change in behavior, I went on to step 2, and as soon as I started to suspend a "large enough" number of CUDA units, the manager instantly went crazy, starting units and leaving them waiting after 2 to 5 seconds of processing: I now have 49 waiting to run. So I suspended everything else except 15 and went on to step 3, "letting time pass", to see if it cleans itself up. The truth is that I'm going to manage the number of available units each day until this is over.

And the BOINC manager still does not request new tasks; the computer has been out of regular CPU work for 12 hours now.

Shall I reset the project and start all over again with these new preference settings?

Eduardo
ID: 909800
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 909830 - Posted: 21 Jun 2009, 12:21:26 UTC - in response to Message 909800.  

If you end up with 50 CUDA tasks started, make sure you restart BOINC (after suspending some tasks, so it'll run in FIFO order); that will free up the GPU memory.

BOINC won't ask for work from a project if any of its WUs are suspended, so you'll have to unsuspend them to get more CPU work.

Or you could always get the ReSchedule program from here: Re: CPU <-> GPU rebranding, and rebrand some of the GPU tasks to the CPU. You'll need 7-Zip to decompress it. Put the program and config files in your BOINC data folder, and a shortcut to the program on your desktop. Shut down BOINC and run the ReSchedule program; you'll have to untick a box (in the config tab) to get the slider to work. Click test, then when that's finished click run, then shut down ReSchedule and restart BOINC.

Claggy

ID: 909830
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 909977 - Posted: 22 Jun 2009, 1:29:02 UTC - in response to Message 909830.  

One CUDA unit from that group of 15 I left running took more than 5 (five!) hours to claim 40 credits, so I aborted 3 others that were going the same way. And the tips from Claggy explain a lot about how the manager works: I unsuspended all units and the manager started to request new work immediately.

But maybe the real question is: why, after many days of running right and without any interference on my side in hardware, software or settings, does the manager begin to jump from task to task all the time?

This reminds me of that problem with old releases that were incapable of finishing some units AND incapable of requesting an exit for a bad WU, version 5.8 or something, I think. I don't know what was done to solve it, but it doesn't happen anymore (I have the same computer). Maybe the same kind of adjustment needs to be made to the CUDA management.

While I was writing this, the manager got new work, and now I'm crunching regular units again.

I'd like to have a definitive solution, but meanwhile let's follow Claggy's advice, let time pass, and see what happens.

Thanks for your advice,

Eduardo
ID: 909977
Profile Eduardo Bicudo Dreyfuss

Joined: 20 Jun 05
Posts: 58
Credit: 1,386,154
RAC: 0
Brazil
Message 911056 - Posted: 25 Jun 2009, 3:07:26 UTC - in response to Message 909977.  

I still have the same behaviour, now crunching 45 (!) CUDA WUs at the same time, and some of them are taking up to 7 hours to crunch compared to the typical 1:07/1:10 (see (*) below).

I checked the BOINC Q&A and this is being tracked there also, under the name "Stop switching between WU".

My PC was at 920 RAC, on a rising curve expected to reach at least 1400, and now it's at 740 and going down: a big waste of computing power.

I ask for a solution, please.

Thanks,

Eduardo

(*) 30mr.09ab.10053.19704.3.8.74_1, reported Jun 25, 02:00 GMT; I couldn't tell how many credits it got, since right after reporting it was no longer in my tasks list. Is this one a VLAR, which would justify the huge time it took to crunch?
ID: 911056