Exceeded elapsed time limit... yikes!

Message boards : Number crunching : Exceeded elapsed time limit... yikes!
Profile woodyrox
Volunteer tester

Joined: 7 Apr 01
Posts: 34
Credit: 16,069,169
RAC: 0
United States
Message 1131230 - Posted: 24 Jul 2011, 4:54:05 UTC

I posted this in the BOINC GPU forum. They suggested I come here...


So I installed a decent GPU in a not-so-hot box. The GPU is a PNY GT 430 installed in a Celeron D 356 Vista box. I installed BOINC 6.10.60 and got one CUDA-fermi task. The GPU started crunching, and right off the bat I thought something was fishy. Both of the "To completion" timers were running forward, i.e., getting bigger. Also, the CUDA GPU task was showing a very high CPU usage: 0.52 CPUs + 1.00 NVIDIA GPUs. I don't see CPU numbers this big on my other BOINC computers. For example, I've got a Pentium E2220 with a lousy 8400 GS GPU and it says 0.09 CPU + 1.00 NVIDIA GPU. So I was surprised to see such a high CPU demand on the low-end box.

I watched the "Elapsed time" and "To completion" timers of the CUDA task and figured that as soon as "Elapsed time" equaled "To completion", the timer would start counting down. Sure enough, that's what happened. That's for the CUDA task timer. The CPU task's timer was still getting bigger because the CPU task wasn't getting as much of the CPU as the 1.00 it was expecting. That all made sense.

So then, all of a sudden, the task was finished. I looked at the messages and saw:

Sat 23 Jul 2011 10:36:24 PM EDT SETI@home Aborting task 09mr11ag.20908.17409.16.10.203_1: exceeded elapsed time limit 8658.846851


Just when everything seemed to be going gangbusters, the task went belly up. OK, so then my box begged for some more GPU tasks and got them. It's crunching one right now, but I can see the exact same thing is gonna happen. The counter is running in the wrong direction and its expected completion time won't be met. Mark my words.

So my question... what to do?

thanks

PS: The initial "To completion" time estimate is crazy low -- just a few minutes. Ain't no how, no way this box will crunch these data sets that fast. I need more time. 3 hours, not 10 minutes!
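
For anyone puzzling over the same abort message, here is a rough sketch of how the time limit and the initial estimate could be related. It assumes the client derives both numbers by dividing a per-task operation count by an assumed device speed, and that the limit uses a count some fixed multiple of the estimate. The function names and the ratio of 10 are illustrative assumptions, not values confirmed anywhere in this thread.

```python
# Simplified model only: names and the ratio below are assumptions,
# not BOINC's actual identifiers or settings.

def estimated_runtime(fpops_est, assumed_flops):
    """Seconds the client expects a task to take."""
    return fpops_est / assumed_flops


def elapsed_time_limit(fpops_bound, assumed_flops):
    """Seconds after which the client would abort the task."""
    return fpops_bound / assumed_flops


def implied_estimate(limit_seconds, bound_to_est_ratio):
    """Work backwards from a reported limit to the estimate it implies."""
    return limit_seconds / bound_to_est_ratio


if __name__ == "__main__":
    limit = 8658.846851   # taken from the abort message above
    ratio = 10.0          # ASSUMPTION: bound is ten times the estimate
    estimate = implied_estimate(limit, ratio)
    print(f"limit: {limit / 60:.1f} min, implied estimate: {estimate / 60:.1f} min")
```

The point of the sketch is that both numbers shrink together: if the assumed device speed is too high for what is actually doing the work, the estimate is crazy low and the abort limit arrives long before the task can finish.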
ID: 1131230 · Report as offensive
Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1131264 - Posted: 24 Jul 2011, 6:20:52 UTC
Last modified: 24 Jul 2011, 6:22:07 UTC

Right off, the SETI CUDA process has a problem with your GPU. From the stderr output for one of the WUs:

setiathome_CUDA: No CUDA devices found
setiathome_CUDA: Found 0 CUDA device(s):
setiathome_CUDA: CUDA Device 1 specified, checking...
Device cannot be used
SETI@home NOT using CUDA, falling back on host CPU processing

One of the more technically inclined denizens here might offer more details.
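
If you want to check a pile of results for the same symptom, a small script along these lines might help. It is only a sketch: the marker phrases are copied from the stderr excerpt above, other app versions may word them differently, and the function names are made up for this example.

```python
# Phrases copied from the stderr excerpt above. Treat the list as an
# assumption: other application versions may print different wording.
FALLBACK_MARKERS = (
    "No CUDA devices found",
    "Device cannot be used",
    "falling back on host CPU processing",
)


def find_fallback_markers(stderr_text):
    """Return the marker phrases that appear in the given stderr text."""
    lowered = stderr_text.lower()
    return [m for m in FALLBACK_MARKERS if m.lower() in lowered]


def fell_back_to_cpu(stderr_text):
    """True if the stderr text suggests the task ran on the CPU instead."""
    return bool(find_fallback_markers(stderr_text))


if __name__ == "__main__":
    sample = (
        "setiathome_CUDA: No CUDA devices found\n"
        "setiathome_CUDA: Found 0 CUDA device(s):\n"
        "setiathome_CUDA: CUDA Device 1 specified, checking...\n"
        "Device cannot be used\n"
        "SETI@home NOT using CUDA, falling back on host CPU processing\n"
    )
    print(find_fallback_markers(sample))
```

Paste the stderr text of a task into `fell_back_to_cpu` to see whether a "GPU" task was really crunched by the CPU.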
ID: 1131264 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1131267 - Posted: 24 Jul 2011, 6:30:55 UTC

Yeah, something's wrong with that GT430 rig...
It does not appear to have turned in any valid GPU tasks.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1131267 · Report as offensive
Profile Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1131273 - Posted: 24 Jul 2011, 6:49:53 UTC

CPU fallback is a known issue with the 275.xx drivers.

You can go back to the 266 drivers or run the Lunatics installer.
The new Lunatics apps use the new BOINC API and should prevent CPU fallback.



With each crime and every kindness we birth our future.
ID: 1131273 · Report as offensive
Profile Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1131284 - Posted: 24 Jul 2011, 7:29:35 UTC - in response to Message 1131273.  

I'd also recommend testing your GPU to see whether it's borked. If it is, fret not, as the prices for things like 480s are rather low these days.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1131284 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1131310 - Posted: 24 Jul 2011, 9:14:43 UTC - in response to Message 1131230.  
Last modified: 24 Jul 2011, 9:28:01 UTC

Have you by any chance switched users or used Remote Desktop? Doing either of those things will cause the GPU to become unusable.

I'd reboot and see if the GPU is detected when BOINC starts, and if GPU computation runs normally.

Claggy
ID: 1131310 · Report as offensive
Profile woodyrox
Volunteer tester

Joined: 7 Apr 01
Posts: 34
Credit: 16,069,169
RAC: 0
United States
Message 1131331 - Posted: 24 Jul 2011, 10:07:08 UTC

Thanks to everyone for your replies.

I don't know if my GPU is bad. BOINC loads it fine and says this:

Sat 23 Jul 2011 08:04:32 PM EDT		NVIDIA GPU 0: GeForce GT 430 (driver version 27533, CUDA version 4000, compute capability 2.1, 962MB, 179 GFLOPS peak)
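
For reference, that startup line can be picked apart with a short script. This is a sketch that assumes the line always has the layout shown above and that the driver number is the usual dotted version with the dot removed (so 27533 would be 275.33). Both are assumptions drawn from this one log line, and the function names are hypothetical.

```python
import re

# Pattern matched against the single log line quoted above. The layout is
# an assumption; other BOINC versions may format the line differently.
GPU_LINE = re.compile(
    r"NVIDIA GPU (?P<index>\d+): (?P<model>.+?) "
    r"\(driver version (?P<driver>\d+), CUDA version (?P<cuda>\d+), "
    r"compute capability (?P<cc>[\d.]+), (?P<mem_mb>\d+)MB, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


def parse_gpu_line(line):
    """Return a dict of GPU details, or None if the line does not match."""
    match = GPU_LINE.search(line)
    if match is None:
        return None
    raw_driver = int(match.group("driver"))
    return {
        "index": int(match.group("index")),
        "model": match.group("model"),
        # ASSUMPTION: the last two digits are the minor version.
        "driver": f"{raw_driver // 100}.{raw_driver % 100:02d}",
        "driver_major": raw_driver // 100,
        "compute_capability": match.group("cc"),
        "memory_mb": int(match.group("mem_mb")),
        "peak_gflops": int(match.group("gflops")),
    }


def uses_275_series(info):
    """True if the parsed driver belongs to the 275 series."""
    return info["driver_major"] == 275
```

Run against the line above, this would flag the box as being on a 275-series driver, which is worth knowing before blaming the card itself.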


CPU-Z says I've got a CUDA GPU, but I don't think it tests performance. I ran clinfo from Lunatics; its output log doesn't seem to show any problems.

Two of my now 3 error runs show a debug dump. One simply says time limit exceeded. I don't know what that means.

I have not switched user and I have not run a remote desktop. I use VNCultra for that but have not used it while running these GPU tasks.

I'll update BOINC to 6.12, though I doubt it's a BOINC problem. I'll also install the 266 drivers and tell you what happens.

If it is a GPU problem, how would I go about confirming that? I ran several video benchmarks and they all seem to think my GPU is ok as they report good video performance.
ID: 1131331 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1131332 - Posted: 24 Jul 2011, 10:13:16 UTC - in response to Message 1131331.  
Last modified: 24 Jul 2011, 10:15:41 UTC

If you get GPU-Z you can see what the GPU usage is like, and see if it's reported as CUDA and OpenCL capable.

Claggy
ID: 1131332 · Report as offensive
Profile woodyrox
Volunteer tester

Joined: 7 Apr 01
Posts: 34
Credit: 16,069,169
RAC: 0
United States
Message 1131342 - Posted: 24 Jul 2011, 10:53:54 UTC

Hey all, I loaded the 266 drivers and it's a different world! All of the cuda_fermi estimated times to completion are now much more reasonable. They're over an hour now where before they were under 5 minutes. The run status still says 0.52 CPUs + 1.00 NVIDIA GPUs. But the CPU is not being used 50% by the cuda_fermi task. There's plenty of CPU left to decrement the "To completion" time of the CPU task.

The GT 430 reported two cuda_fermi tasks that are awaiting validation. I think one was mostly completed by the CPU, but the second one was completed by the GPU, and fast. The GPU is grinding through another task right now and won't be long.

Bottom line is -- it's working like it's supposed to. I did not update to BOINC 6.12 and don't plan to.

Thanks for your help.
ID: 1131342 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1131383 - Posted: 24 Jul 2011, 14:23:03 UTC - in response to Message 1131342.  

...
The run status still says 0.52 CPUs + 1.00 NVIDIA GPUs. But the CPU is not being used 50% by the cuda_fermi task.
...

That 0.52 is just a rough estimate done server-side, based on nVidia's formula for how fast the card is and your CPU's Whetstone benchmark. The BOINC core client uses it in deciding whether a CPU should be left idle to feed the GPU, but with any estimate less than 1.0 it only matters for those who are crunching with two or more GPUs.
Joe
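
A toy model of that last point, for anyone who likes to see it in numbers. It assumes the client simply adds up the CPU fractions of the running GPU tasks and leaves one CPU idle for each whole unit in the total. That is a simplification for illustration, not the actual core client logic, and the function name is made up.

```python
import math


def cpus_reserved_for_gpus(cpu_fractions):
    """Whole CPUs left idle for GPU tasks in this simplified model.

    cpu_fractions holds one entry per running GPU task, such as the
    0.52 shown in the run status.
    """
    return math.floor(sum(cpu_fractions))


if __name__ == "__main__":
    print(cpus_reserved_for_gpus([0.52]))        # one GPU task
    print(cpus_reserved_for_gpus([0.52, 0.52]))  # two GPU tasks
```

In this model a single task at 0.52 never reaches a whole CPU, so nothing is held back. Only with two such tasks running does the total pass 1.0.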
ID: 1131383 · Report as offensive



 