Message boards :
Number crunching :
Exceeded elapsed time limit... yikes!
Author | Message |
---|---|
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
I posted this in the BOINC GPU forum. They suggested I come here... So I installed a decent GPU in a not-so-hot box. The GPU is a PNY GT 430 installed in a Celeron D 356 Vista box. Installed BOINC 6.10.60 and got one CUDA-fermi task. GPU started crunching, and right off the bat I thought something was fishy. Both of the "To completion" timers were running forward -- i.e., getting bigger. Also, the CUDA GPU task was showing a very high CPU usage, like 0.52 CPUs + 1.00 NVIDIA GPUs. I don't see CPU numbers this big on my other BOINC computers. Like I've got a Pentium E2220 with a lousy 8400 GS GPU and it says 0.09 CPU + 1.00 NVIDIA GPU. So I was surprised to see such a high CPU demand on the low-end box. I watched the elapsed time & to completion time of the CUDA task and realized that as soon as "Elapsed time" equaled "To completion", the timer would start counting down. Sure enough that's what happened. That's for the CUDA task timer. The CPU task's timer was still getting bigger because the CPU task wasn't getting as much CPU as the 1.00 it was expecting. That all made sense. So then, all of a sudden, the task was finished. I looked at the messages and saw: Sat 23 Jul 2011 10:36:24 PM EDT SETI@home Aborting task 09mr11ag.20908.17409.16.10.203_1: exceeded elapsed time limit 8658.846851 Just when everything seemed to be going gangbusters, the task went belly up. OK, so then my box begged for some more GPU tasks and got them. It's crunching one right now, but I can see the exact same thing is gonna happen. The counter is running in the wrong direction and its expected completion time won't be met. Mark my words. So my question... what to do? thanks PS: The initial "To completion" time estimate is crazy low -- just a few minutes. Ain't no how, no way this box will crunch these data sets that fast. I need more time. 3 hours not 10 minutes! |
Gatekeeper Send message Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0 |
Right off, the SETI CUDA process has a problem with your GPU. From the stderr output for one of the WUs: setiathome_CUDA: No CUDA devices found setiathome_CUDA: Found 0 CUDA device(s): setiathome_CUDA: CUDA Device 1 specified, checking... Device cannot be used SETI@home NOT using CUDA, falling back on host CPU processing One of the more technically inclined denizens here might offer more details. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Yeah, something's wrong with that GT430 rig... It does not appear to have turned in any valid GPU tasks. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
CPU fallback is a known issue with 275x drivers. You can go back to the 266 drivers or run the Lunatics installer. The new Lunatics apps use the new BOINC API and should prevent CPU fallback. With each crime and every kindness we birth our future. |
Slavac Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0 |
|
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Have you by any chance switched user or used Remote Desktop? Doing those things will cause the GPU to become unusable. I'd reboot and see if the GPU is detected on BOINC start, and if GPU computation runs normally. Claggy |
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
Thanks to everyone for your replies. I don't know if my GPU is bad. BOINC loads it fine and says this: Sat 23 Jul 2011 08:04:32 PM EDT NVIDIA GPU 0: GeForce GT 430 (driver version 27533, CUDA version 4000, compute capability 2.1, 962MB, 179 GFLOPS peak) CPU-Z says I've got a CUDA GPU, but I don't think it tests performance. I ran clinfo from Lunatics. Its output log doesn't seem to show any problems. Two of my now 3 error runs show a debug dump. One simply says time limit exceeded. I don't know what that means. I have not switched user and I have not run a remote desktop. I use VNCultra for that but have not used it while running these GPU tasks. I'll update BOINC to 6.12, though I doubt it's a BOINC problem. I'll also install the 266 drivers and tell you what happens. If it is a GPU problem, how would I go about confirming that? I ran several video benchmarks and they all seem to think my GPU is OK, as they report good video performance. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
If you get GPU-Z you can see what the GPU usage is like, and see if it's reported as CUDA and OpenCL capable. Claggy |
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
Hey all, I loaded the 266 drivers and it's a different world! All of the cuda_fermi estimated times to completion are now much more reasonable. They're over an hour now where before they were under 5 minutes. The run status still says 0.52 CPUs + 1.00 NVIDIA GPUs. But the CPU is not being used 50% by the cuda_fermi task. There's plenty of CPU left to decrement the "To completion" time of the CPU task. The GT 430 reported two cuda_fermi tasks that are awaiting validation. I think one was mostly completed by the CPU, but the second one was completed by the GPU, and fast. The GPU is grinding through another task right now and won't be long. Bottom line is -- it's working like it's supposed to. I did not update to boinc 6.12 and don't plan to. Thanks for your help. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
That 0.52 is just a rough estimate done server-side, based on nVidia's formula for how fast the card is and your CPU's Whetstone benchmark. The BOINC core client uses it in deciding whether a CPU should be left idle to feed the GPU, but with any estimate less than 1.0 it only matters for those who are crunching with two or more GPUs. Joe |
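
One way to picture Joe's point about the 0.52 figure is the small sketch below. It is only an illustration, not the BOINC client's actual scheduling code: the function name `cpus_reserved_for_gpu` is hypothetical, and rounding the summed CPU estimates down to whole CPUs is an assumption chosen to match his remark that an estimate under 1.0 only matters with two or more GPUs.

```python
import math


def cpus_reserved_for_gpu(cpu_fraction_per_task: float, running_gpu_tasks: int) -> int:
    """Hypothetical model: whole CPUs left idle to feed running GPU tasks.

    Assumption (not taken from BOINC source): the per-task CPU estimates
    are summed and rounded down to a whole number of CPUs.
    """
    return math.floor(cpu_fraction_per_task * running_gpu_tasks)


# One GPU task at 0.52 CPUs: the sum stays below 1.0, so no CPU is held back.
print(cpus_reserved_for_gpu(0.52, 1))

# Two GPU tasks at 0.52 CPUs each: the sum passes 1.0, so one CPU is held back.
print(cpus_reserved_for_gpu(0.52, 2))

# The 8400 GS box from the first post (0.09 CPUs, one GPU task).
print(cpus_reserved_for_gpu(0.09, 1))
```

Under this reading, the single GT 430 showing 0.52 CPUs would not take a CPU away from the CPU task, which fits what woodyrox saw after switching to the 266 drivers.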
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.