Message boards :
Number crunching :
cuda23 never showed progress on BM
Joseph Stateson Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3 |
I aborted this task because it never showed any progress after 13 hours of runtime. BM (BOINC Manager) showed 0.00% complete, while BoincTasks' percent complete varied around 1400.00%. It was the 1400 that caught my attention. This happens very rarely, but when it does it consumes a lot of run time before I notice the problem. I would think that any project using a GPU for co-processing should implement some type of timeout for "no progress". I do not know why BoincTasks reported a non-zero progress of 1400% for this task while BM showed 0. This indicates that BoincTasks spotted something that BM didn't. My 2c. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
I would think that any project using a GPU for co-processing should implement some type of timeout for "no progress". There is a "timeout" for any task (not only GPU): it is 10 times the initial estimated time. E.g. if the initial estimated time to completion is 1 hour, the task will be aborted after it has run for 10 hours (even if it is at 99%). - ALF - "Find out what you don't do well ..... then don't do it!" :) |
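BilBg's rule of thumb can be sketched in a few lines. A minimal illustration, assuming a flat factor of 10 as described above (the real client derives the cutoff from the workunit's rsc_fpops_bound relative to rsc_fpops_est rather than a fixed constant):

```python
def abort_deadline(estimated_seconds, bound_factor=10):
    # Runaway-task cutoff as described above: a task is killed once its
    # elapsed time exceeds bound_factor x the initial estimate, no matter
    # what progress it reports.  bound_factor=10 is the rule of thumb here;
    # the actual client computes the bound per-workunit.
    return bound_factor * estimated_seconds

def should_abort(elapsed_seconds, estimated_seconds):
    return elapsed_seconds > abort_deadline(estimated_seconds)

# A task estimated at 1 hour survives 9 hours of runtime but not 11:
print(should_abort(9 * 3600, 3600))   # False
print(should_abort(11 * 3600, 3600))  # True
```

As Josef notes below the quote, this cutoff is only as good as the initial estimate, which is why a badly mis-estimated task can run far longer before tripping it.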
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I would think that any project using a GPU for co-processing should implement some type of timeout for "no progress". That's only reasonably accurate after a system has done enough work that BOINC is scaling the estimate and bound for each task and the Duration Correction Factor for the project is near 1.0. Beemer Biker's Computer 5796714 has no validated tasks under the new system, so the cutoff time could have been much longer than he let it run. The "Device 1: GeForce 9800 GTX/9800 GTX+ is okay" line is output after a quick check of card capability, but before any real attempt is made to initialize processing. It just says the card has CUDA capability and at least 128 MB of memory. After that there's a call to a function in cudart.dll which should get the basic setup started; apparently that call never returned. Joe |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I haven't had any stuck tasks in a while. Sometimes they would be at 0%, or at any percent below 100%. When I would find one it would normally have 80-100 hours of run time when the estimate was 3-4 hours. At first I was editing client_state.xml to restart it & try again. However, most of the time they would just get stuck again, so I switched to aborting them. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
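The "stuck at some percent for hours" pattern is exactly what a small no-progress watchdog could catch. A hypothetical sketch, not part of BOINC or BoincTasks, with an arbitrary stall limit:

```python
import time

def make_progress_watchdog(stall_limit_seconds):
    # Returns a checker that flags a task as stuck when its reported
    # fraction done has not changed for stall_limit_seconds.
    state = {"fraction": None, "last_change": None}

    def check(fraction_done, now=None):
        now = time.monotonic() if now is None else now
        if fraction_done != state["fraction"]:
            state["fraction"] = fraction_done
            state["last_change"] = now
            return False          # progress moved (or first sample)
        return (now - state["last_change"]) > stall_limit_seconds

    return check

check = make_progress_watchdog(stall_limit_seconds=4 * 3600)
check(0.0, now=0)                  # first sample, not stuck
print(check(0.0, now=5 * 3600))    # still 0% five hours later -> True
```

Unlike the 10x-estimate cutoff, this would trip on a genuinely wedged task long before 80-100 hours, at the cost of false positives on tasks that checkpoint progress very coarsely.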
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Does your next CUDA WU compute normally? I'd be tempted to downgrade the driver from 270.61 to 266.58 and see if that works. Claggy |
Joseph Stateson Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3 |
This message board was down while I had the same problem with a Milkyway job. The Milkyway jobs were OpenCL, as opposed to cuda_fermi (whatever that is) or cuda23. I posted the same problem over at Milkyway, where the suggestion was to upgrade to the beta driver 275.whatever. Anyway, it seems that if I stop BOINC and restart, then the job either completes immediately with success, or restarts at 0.0% and runs the normal 15 or so minutes before completing successfully. Since Milkyway deletes results almost as fast as they are posted, I can only guess that it was validated. I had a 3 hour 30 minute job (3:25 CPU time) that completed within 20 minutes from scratch (0.0%) after restarting BOINC. Discussion here. I think the job that had the problem I posted about seems to have problems at all the wingmen. However, I have observed jobs completing successfully after restarting BOINC, both SETI and Milkyway. I have upgraded all my systems to that 275.latest&greatest and will see what happens. |
Joseph Stateson Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3 |
Here is another one that hung (never progressed) until I restarted BOINC. From BoincTasks (note the 847% complete as shown by BoincTasks) and a runtime of 1 hour, 22 minutes. This is the same image, but from Boincmgr 6.12.26. I restarted BOINC and immediately progress was shown under BOINC; the task completed in 8 minutes, as shown here. I updated the project and the result was posted here. This job ran AFTER I upgraded to that beta NVIDIA driver 275.27. From looking at the work I see no indication of the over 1 hour run time that unaccountably occurred. Something is amiss. This system has only a single GTX 280 and a Q6700 CPU and is used exclusively for BOINC (no games). |
Joseph Stateson Send message Joined: 27 May 99 Posts: 309 Credit: 70,759,933 RAC: 3 |
Different system (dual Opteron), same problem: no progress unless the task is restarted. This system has two cards, a GTX 460 (d0) and a 9800 GTX (d1), and is also running that 275.27 and 6.12.26. The above is from BoincTasks; otherwise the progress would have shown 0.0%. [EDIT] I suspect a driver problem. I just logged onto that system (Vista 64) and had to dismiss two NVIDIA driver reset warnings that popped up. A driver reset during crunching is possibly causing this to happen. |
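Rather than logging in to spot these by eye, one could scan the client's task list for long-running zero-progress tasks. Below is a sketch that parses `boinccmd --get_tasks`-style text; the field labels ("name:", "current CPU time:", "fraction done:") are assumptions about the output format, and the task names are made up, so check both against your own client:

```python
def stuck_task_names(get_tasks_output, min_elapsed=3600):
    # Return names of tasks reporting 0% progress despite at least
    # min_elapsed seconds of CPU time.  Field labels below are assumed,
    # not verified against any particular boinccmd version.
    stuck, name, elapsed = [], None, 0.0
    for line in get_tasks_output.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            name, elapsed = line.split(":", 1)[1].strip(), 0.0
        elif line.startswith("current CPU time:"):
            elapsed = float(line.split(":", 1)[1])
        elif line.startswith("fraction done:"):
            if float(line.split(":", 1)[1]) == 0.0 and elapsed >= min_elapsed and name:
                stuck.append(name)
    return stuck

sample = """\
name: 03my11ab.12345.example_1
current CPU time: 46800.0
fraction done: 0.000000
name: 03my11ab.67890.example_0
current CPU time: 500.0
fraction done: 0.350000
"""
print(stuck_task_names(sample))  # ['03my11ab.12345.example_1']
```

A cron/Task Scheduler job running something like this could at least alert you before a wedged task burns 13 hours, even if it can't fix the underlying driver reset.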
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
And I would never want to put quite different GPUs in one system. I would throw out the 9800GTX+; mine gave too many -9 overflow results with "Find triplets" in a row.... You can run 2 SETI MB WUs on your Fermi (optimized, if you'd like). It also depends which Compute Capability is needed: 1.1, 1.2, 1.3, 2.0, or 2.1; the last two are Fermi-compatible. |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Do you overclock your GPUs? Do you check temperatures? Is the PSU powerful enough? - ALF - "Find out what you don't do well ..... then don't do it!" :) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.