197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

Profile Bernie Vine
Volunteer moderator
Volunteer tester

Send message
Joined: 26 May 99
Posts: 9144
Credit: 52,280,773
RAC: 15,972
United Kingdom
Message 1907487 - Posted: 16 Dec 2017, 12:59:16 UTC
Last modified: 16 Dec 2017, 12:59:39 UTC

I have tried a search for this but the results don't seem to describe what I am seeing.

I have two identical machines:

6906753

and

8275447

Both are having the same problem, with a number of tasks failing with the above error.

When I look at the BOINC task list I see the same thing on both machines: elapsed times of several hours and a time remaining of over 20 hours, which keeps increasing as I watch.

If I stop and restart BOINC, all is OK again for a while, then it happens again. It is not every task, but often enough to be annoying.
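
For anyone arriving here from a search: as far as I understand the BOINC client, error 197 is raised once a task's elapsed time passes a limit derived from the workunit's operation bound divided by the speed estimate for the app version. The sketch below only illustrates that arithmetic. The function names and the numbers in the usage note are made up for illustration and are not taken from the two machines above.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, app_flops: float) -> float:
    """Approximate time limit: operation bound divided by estimated speed.

    Assumption: this mirrors how the client derives its limit; treat it
    as an illustration, not as the client's actual code.
    """
    if app_flops <= 0:
        raise ValueError("app_flops must be positive")
    return rsc_fpops_bound / app_flops


def exceeds_time_limit(elapsed_seconds: float,
                       rsc_fpops_bound: float,
                       app_flops: float) -> bool:
    """True if a task has run longer than the approximate limit."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, app_flops)
```

With a hypothetical bound of 1e15 operations and an estimated speed of 1e11 FLOPS, the limit would be 10,000 seconds, so a task whose elapsed time keeps climbing for seven hours without progress would be well past it.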

This year has been a bit hectic so I haven't kept up to date with things and have obviously done something wrong.

Any advice appreciated.
ID: 1907487 · Report as offensive     Reply Quote
Profile MikeSpecial Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 30804
Credit: 59,284,243
RAC: 24,771
Germany
Message 1907490 - Posted: 16 Dec 2017, 13:40:07 UTC

How many instances are you running on the CPU?
Looks like a lack of resources.
With each crime and every kindness we birth our future.
ID: 1907490 · Report as offensive     Reply Quote
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Send message
Joined: 26 May 99
Posts: 9144
Credit: 52,280,773
RAC: 15,972
United Kingdom
Message 1907491 - Posted: 16 Dec 2017, 13:50:40 UTC - in response to Message 1907490.  

How many instances are you running on the CPU?
Looks like a lack of resources.

Two CPU tasks one GPU.

I currently have both machines set to NNT tasks and it happened this morning when there were no CPU tasks running.
ID: 1907491 · Report as offensive     Reply Quote
Profile MikeSpecial Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 30804
Credit: 59,284,243
RAC: 24,771
Germany
Message 1907493 - Posted: 16 Dec 2017, 14:07:04 UTC
Last modified: 16 Dec 2017, 14:08:35 UTC

That's weird.

Go to your project folder and change the contents of the mb_cmdline*txt file to:


-sbs 512 -period_iterations_num 40 -use_sleep -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32
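
If you would rather script that edit than do it by hand, here is a minimal sketch. It assumes the project folder path is passed in by you (the path in the usage note is only the usual Windows default and may differ on your machine), and the helper name `write_cmdline` is made up for this example. It keeps a `.bak` copy of each file before overwriting it.

```python
from pathlib import Path

# Command line quoted from the post above, kept as one line.
CMDLINE = ("-sbs 512 -period_iterations_num 40 -use_sleep -spike_fft_thresh 2048 "
           "-tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 "
           "-oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 "
           "-oclfft_tune_cw 32")


def write_cmdline(project_dir, cmdline=CMDLINE):
    """Overwrite every mb_cmdline*txt file in project_dir.

    Returns the list of files that were changed. The previous content of
    each file is saved next to it with a .bak suffix first.
    """
    changed = []
    for path in sorted(Path(project_dir).glob("mb_cmdline*txt")):
        backup = path.with_name(path.name + ".bak")
        backup.write_text(path.read_text())  # save the old command line
        path.write_text(cmdline)             # write the new one
        changed.append(path)
    return changed
```

Example call, with an assumed path: `write_cmdline(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")`. Restart BOINC afterwards so that newly started tasks pick up the file.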
With each crime and every kindness we birth our future.
ID: 1907493 · Report as offensive     Reply Quote
Profile Brent NormanSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 1963
Credit: 131,544,347
RAC: 344,080
Canada
Message 1907494 - Posted: 16 Dec 2017, 14:07:50 UTC - in response to Message 1907487.  
Last modified: 16 Dec 2017, 14:17:57 UTC

Are you rescheduling? That is the most likely cause.

EDIT: Hmm, tasks are running 6-8h before hitting the limit, so that shouldn't be it. Did you reboot? I don't see a driver change on either computer from working to not. 388.59 and 388.13 I think they were. Maybe driver reload time ...
ID: 1907494 · Report as offensive     Reply Quote
Profile Jeff BuckSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1439
Credit: 147,558,810
RAC: 189,490
United States
Message 1907532 - Posted: 16 Dec 2017, 17:14:49 UTC

Have you been having any video driver crashes on those machines? I have one machine that gets those quite frequently. The driver recovers, but the side effect is that GPU tasks that are running when the driver crashes sometimes seem to hang, though not always. The elapsed time increases but the progress does not. The only way I manage to avoid getting the timeouts that you're experiencing is to check on that machine a couple times a day to see if any tasks are hung. If they are, I just suspend those individual tasks briefly, then resume. When they start running again, the elapsed time drops back to a more normal figure and everything runs along just fine (unless there's another driver crash).
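
The manual check described above (elapsed time going up while progress stands still) could be automated along these lines. This is only a sketch: the `Snapshot` type, the `looks_hung` helper and the 10-minute threshold are all invented for illustration, and it says nothing about how you collect the readings from BOINC.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    elapsed_seconds: float  # elapsed time reported for the task
    fraction_done: float    # progress, 0.0 to 1.0


def looks_hung(snapshots, min_stall_seconds=600.0):
    """True if progress has not changed while elapsed time grew by
    at least min_stall_seconds. Snapshots must be in time order."""
    if len(snapshots) < 2:
        return False
    latest = snapshots[-1]
    # Walk back to the earliest reading with the same progress as the latest.
    stall_start = latest
    for snap in reversed(snapshots[:-1]):
        if snap.fraction_done != latest.fraction_done:
            break
        stall_start = snap
    return latest.elapsed_seconds - stall_start.elapsed_seconds >= min_stall_seconds
```

A task flagged this way is a candidate for the brief suspend and resume; the threshold needs to be longer than the slowest normal gap between progress updates, or healthy tasks will be flagged too.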

Interestingly, I never used to have that problem when I was running a 550Ti in that machine. It only started when I switched to a 750Ti a year or so ago, and continues today with a 960. At the time I installed the 750Ti, I was running Cuda, so the first thing I did when the driver crashes started was to switch to SoG. No change. After that, I tried newer drivers, older drivers, TDR delay increases, etc., etc., with no discernible effect, so I've just learned to live with it. I generally manage to avoid the errors, but I often end up with several hours of lost processing time before I discover that a task is hung.
ID: 1907532 · Report as offensive     Reply Quote
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Send message
Joined: 26 May 99
Posts: 9144
Credit: 52,280,773
RAC: 15,972
United Kingdom
Message 1907543 - Posted: 16 Dec 2017, 17:49:32 UTC

Are you rescheduling? That is the most likely cause.

No, never have.

Have you been having any video driver crashes on those machines?

Recently, yes, often on both machines.

I tried newer drivers, older drivers, TDR delay increases, etc., etc.

Yes, I have tried that to no effect.

I was running one on the newer driver, but to be fair that has made it worse!

I will try Mike's command line on one machine and see what happens.
ID: 1907543 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 9190
Credit: 118,780,000
RAC: 49,590
Australia
Message 1907572 - Posted: 16 Dec 2017, 21:19:15 UTC - in response to Message 1907543.  
Last modified: 16 Dec 2017, 21:20:14 UTC

Have you been having any video driver crashes on those machines?

Recently, yes often on both machines.

On my Win 10 systems, I'm running 372.54 without issues.
I can't remember whether I had to do the TDR registry edit or not. I did have to do it on my C2D with 32-bit Vista to stop driver restarts (and I had to stick with a much older driver version, otherwise I would get restarts even running CUDA50). Basically it just increases the period of time before the OS decides the video driver isn't responding and re-initializes it. You can disable this function, but that means a full system crash if a driver problem actually occurs.
Graphics driver stopped responding and has recovered....TDR fix (see solution 3).
I think you might be running into issues similar to the ones I had: a 32-bit OS and a low CPU clock speed result in (relatively) long delays in handling system interrupts, and the TDR kicks in.


Grant
Darwin NT
ID: 1907572 · Report as offensive     Reply Quote
Profile Jeff BuckSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1439
Credit: 147,558,810
RAC: 189,490
United States
Message 1907583 - Posted: 16 Dec 2017, 22:23:37 UTC - in response to Message 1907572.  

For me, even bumping the TdrDelay all the way up to 16 (0x00000010) didn't help, or at least didn't help much. I just figure that once I moved from the 550Ti to the 750Ti, the speed and efficiency of the newer GPU just became too much for the old C2D and/or the MB bus to handle consistently. I just looked at that system and found that it's had 3 driver crashes in the last 17+ hours, though none of them caused any tasks to hang. I've never tried to figure out if there's a pattern of any kind. It just didn't seem to be worth the effort.
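
For reference, TdrDelay is a DWORD value, in seconds, under the GraphicsDrivers registry key (the Windows default is 2). A small sketch that produces the text of a .reg file for a given delay, rather than touching the registry directly; the helper name `tdr_delay_reg` is made up for this example, and importing the resulting file and rebooting is still up to you.

```python
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_reg(seconds: int) -> str:
    """Return .reg file text that sets TdrDelay to the given number of seconds."""
    if not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("seconds must fit in a positive 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{TDR_KEY}]\n"
        f'"TdrDelay"=dword:{seconds:08x}\n'  # .reg files store DWORDs as 8 hex digits
    )
```

`tdr_delay_reg(16)` yields the `dword:00000010` setting mentioned above. As this thread shows, raising the delay does not necessarily cure the driver restarts.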
ID: 1907583 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 9190
Credit: 118,780,000
RAC: 49,590
Australia
Message 1907584 - Posted: 16 Dec 2017, 22:31:23 UTC - in response to Message 1907583.  
Last modified: 16 Dec 2017, 22:36:33 UTC

For me, even bumping the TdrDelay all the way up to 16 (0x00000010) didn't help, or at least didn't help much. I just figure that once I moved from the 550Ti to the 750Ti, the speed and efficiency of the newer GPU just became too much for the old C2D and/or the MB bus to handle consistently. I just looked at that system and found that it's had 3 driver crashes in the last 17+ hours, though none of them caused any tasks to hang. I've never tried to figure out if there's a pattern of any kind. It just didn't seem to be worth the effort.

32bit OS?
I'd always thought of putting a C2Quad in mine thinking that would help (I've got 2 GTX 750Tis in there), but it's looking like it boils down to the 32bit OS, and the slow clock speed. A 64bit OS would probably allow the system to use more of the RAM, with less resource contention with the larger available address space.
On mine, the harder the GPUs work, the less time the CPU has to process WUs, and the higher the system resources taken up with Interrupts & DPCs (up to 15% when I tried to run SoG on it).
Grant
Darwin NT
ID: 1907584 · Report as offensive     Reply Quote
Profile Jeff BuckSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1439
Credit: 147,558,810
RAC: 189,490
United States
Message 1907589 - Posted: 16 Dec 2017, 22:50:46 UTC - in response to Message 1907584.  

32bit OS?
I'd always thought of putting a C2Quad in mine thinking that would help (I've got 2 GTX 750Tis in there), but it's looking like it boils down to the 32bit OS, and the slow clock speed. A 64bit OS would probably allow the system to use more of the RAM, with less resource contention with the larger available address space.
On mine, the harder the GPUs work, the less time the CPU has to process WUs, and the higher the system resources taken up with Interrupts & DPCs (up to 15% when I tried to run SoG on it).
Yeah, 32-bit Vista on an old HP dc7700 with a C2D E7500. I did switch over to SoG (from Cuda50) to see if that would help, but there really wasn't any discernible change in the driver crashes or task hangs. I stuck with SoG, though, because I still get better overall production out of it. It can sometimes go several days without any hangs, then get a couple in the same day. Just no obvious pattern. That machine is mainly a crunch-only box. Once in a great while I use it for streaming (through an HDMI connection to my plasma TV), but on those occasions I shut down BOINC first.
ID: 1907589 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 9190
Credit: 118,780,000
RAC: 49,590
Australia
Message 1907599 - Posted: 16 Dec 2017, 23:18:31 UTC - in response to Message 1907589.  

32-bit Vista on an old HP dc7700 with a C2D E7500. I did switch over to SoG (from Cuda50) to see if that would help, but there really wasn't any discernible change in the driver crashes or task hangs. I stuck with SoG, though, because I still get better overall production out of it.

On my 32-bit Vista system I'm running 344.11. Anything higher, and even with CUDA50 I got driver restarts. I even tried just 1 GTX 750Ti, reserved a core for it and just used the defaults for SoG, but still got too many driver restarts.
With CUDA50 I'm running 2 WUs at a time (although it makes the Arecibo WUs take 3 times as long if run with a GBT WU), but at least there are no restarts. I've lost entire caches to restarts.
:-/
Grant
Darwin NT
ID: 1907599 · Report as offensive     Reply Quote
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Send message
Joined: 26 May 99
Posts: 9144
Credit: 52,280,773
RAC: 15,972
United Kingdom
Message 1907600 - Posted: 16 Dec 2017, 23:22:11 UTC

I have tried the TDR previously and noticed no real improvement, however that was just to prevent the driver restarts.

This problem with tasks hanging is different and new. If I don't notice the task hang then they error. I suspect it will get worse.

I tried Mike's command line and it has made no difference, as the machine I put it on has just had a task hang.

So as both machines are getting a bit long in the tooth now I think I will let them retire gracefully.

The annoying thing is that they have both had new Corsair PSUs in the last year!
ID: 1907600 · Report as offensive     Reply Quote
Profile Jeff BuckSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1439
Credit: 147,558,810
RAC: 189,490
United States
Message 1907608 - Posted: 16 Dec 2017, 23:42:36 UTC - in response to Message 1907599.  

Grant (SSSF) wrote:
On my 32-bit Vista system I'm running 344.11. Anything higher, and even with CUDA50 I got driver restarts. I even tried just 1 GTX 750Ti, reserved a core for it and just used the defaults for SoG, but still got too many driver restarts.
With CUDA50 I'm running 2 WUs at a time (although it makes the Arecibo WUs take 3 times as long if run with a GBT WU), but at least there are no restarts. I've lost entire caches to restarts.
:-/
I'm running the 353.62 driver, which I think is as low as I can go with SoG. I'm pretty sure that's about what I was running with Cuda50, also, but I don't remember if I tried to go lower or not. Probably not.

Bernie Vine wrote:
So as both machines are getting a bit long in the tooth now I think I will let them retire gracefully.
Maybe it would be a good time to try experimenting with Linux. Since I've been running Linux on 3 of my other boxes, I've also considered switching over my problem child but, since I'd want to keep the Windows partition to make it dual-boot, I'd have to put a larger HD in it first. Just not enough room on the current drive for another 30GB partition.
ID: 1907608 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 9190
Credit: 118,780,000
RAC: 49,590
Australia
Message 1907609 - Posted: 16 Dec 2017, 23:42:36 UTC - in response to Message 1907600.  
Last modified: 17 Dec 2017, 0:02:18 UTC

This problem with tasks hanging is different and new. If I don't notice the task hang then they error. I suspect it will get worse.

Have you tried an older driver?
Have you got Process Explorer (or similar) to see if something else is using the CPU time and stopping the GPUs from crunching?

Have you got an app_config.xml file (or similar) to reserve a core for the GPU WUs?
e.g.
<app_config>
 <app>
  <name>setiathome_v8</name>
  <gpu_versions>
   <gpu_usage>1.00</gpu_usage>
   <cpu_usage>1.00</cpu_usage>
  </gpu_versions>
 </app>
</app_config>

Reserves a CPU core for each GPU WU.
If the GPU runs out of work, the CPU core will then process CPU WUs.
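
If you want to generate that file rather than type it, something like the following would do. It is a sketch only: the function name `build_app_config` is invented here, the defaults simply repeat the values from the example above, and you still have to save the result as app_config.xml in the project folder and have BOINC re-read its config files.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name="setiathome_v8", gpu_usage=1.0, cpu_usage=1.0):
    """Return app_config.xml text reserving cpu_usage CPUs per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Fraction of a GPU and of a CPU that each GPU task is budgeted.
    ET.SubElement(gpu, "gpu_usage").text = f"{gpu_usage:.2f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:.2f}"
    return ET.tostring(root, encoding="unicode")
```

Calling `build_app_config()` reproduces the settings shown above; passing `gpu_usage=0.5` would instead budget half a GPU per task, i.e. two WUs per GPU at a time.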
Grant
Darwin NT
ID: 1907609 · Report as offensive     Reply Quote
bluestar

Send message
Joined: 5 Sep 12
Posts: 2412
Credit: 1,932,899
RAC: 15
Message 1907640 - Posted: 17 Dec 2017, 2:40:20 UTC

But also a gaussian score being returned here, so except for that of a graphics card which could be used, apparently something "hogging" the CPU here,
and if so, I would always choose to pull the floppy drive when it is happening.
ID: 1907640 · Report as offensive     Reply Quote
Profile Bernie Vine
Volunteer moderator
Volunteer tester

Send message
Joined: 26 May 99
Posts: 9144
Credit: 52,280,773
RAC: 15,972
United Kingdom
Message 1907683 - Posted: 17 Dec 2017, 12:03:08 UTC
Last modified: 17 Dec 2017, 12:06:56 UTC

Well this morning both machines had stuck tasks. So I suspended then un-suspended and what then happens is that it pauses for a minute, then the task that was hung goes into "waiting to run" and a new task starts.

On both machines I have set NNT, and one has now reached its last GPU task, one that has been "waiting to run". What then happens is this:

SETI@home 8.20 setiathome_v8 (opencl_nvidia_SoG) blc04_2bit_guppi_57903_49285_HIP9020_0009.13163.409.23.46.244.vlar_1 00:03:53 (00:00:42) 18.10 18.08 00:14:38 2/5/2018 4:05:20 AM Waiting to run,Suspended: Waiting to acquire lock [0] 00:00:00

Which required a restart of BOINC to get it to run.


As to Linux, I have to admit I have tried it in the past, but I am too "Windows" orientated. I admit that if it can't be done with a GUI then I am not interested. Looking at some of the Linux threads in NC, I can see I would again get frustrated and give up once more.

I do have 3 other old machines currently doing nothing, and I will fire one of them up, see if I can get it running, then perhaps put one of the 750s in it to see if the problem persists.


P.S
I would always choose to pull the floppy drive when it is happening.


Er, I assume you mean DVD drive. These machines may be old, but as they are Dells they don't have "floppy drives". ;-)
ID: 1907683 · Report as offensive     Reply Quote
juan BFPSpecial Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 6448
Credit: 359,763,304
RAC: 152,306
Panama
Message 1907688 - Posted: 17 Dec 2017, 12:24:48 UTC

Did you try increasing the Windows swap file? Or using a less aggressive SoG setting for memory usage?

I ask because some of us (me included) had several problems a few weeks ago, after some Windows updates, related to available memory.
Yes, I agree that sounds weird, but changing the memory setting fixed the problem.

I'm not saying this is your problem, since the message we received was "task postponed", not "Waiting to acquire lock", and our GPUs were 1060/1070s while yours is a 750. But you lose nothing by checking.
ID: 1907688 · Report as offensive     Reply Quote
Profile Chris SCrowdfunding Project Donor*Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 19 Nov 00
Posts: 40355
Credit: 38,397,516
RAC: 57,113
United Kingdom
Message 1907690 - Posted: 17 Dec 2017, 12:28:26 UTC

Bernie,

Just looked at your top machine.

1. You are running build R3330. If you use Lunatics you should be on build R3557.

2. Your CPU time is the same as your GPU time; you should use the -use_sleep option.

Sounds like lack of CPU resources. Maybe no help.
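
To put a number on point 2: when a GPU task's CPU time is nearly equal to its run time, the app is keeping a core busy the whole time it waits on the GPU. A tiny sketch of that comparison follows; the function names and the 0.9 threshold are arbitrary choices for illustration, not anything defined by BOINC or the SoG app.

```python
def cpu_to_run_ratio(cpu_seconds: float, run_seconds: float) -> float:
    """Share of the task's run time during which it also used the CPU."""
    if run_seconds <= 0:
        raise ValueError("run_seconds must be positive")
    return cpu_seconds / run_seconds


def suggests_use_sleep(cpu_seconds: float, run_seconds: float,
                       threshold: float = 0.9) -> bool:
    """True if the task kept a CPU core busy for most of its run time."""
    return cpu_to_run_ratio(cpu_seconds, run_seconds) >= threshold
```

Both times are shown on each task's result page, so the check is easy to do by eye as well.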
ID: 1907690 · Report as offensive     Reply Quote
bluestar

Send message
Joined: 5 Sep 12
Posts: 2412
Credit: 1,932,899
RAC: 15
Message 1907695 - Posted: 17 Dec 2017, 12:45:16 UTC - in response to Message 1907683.  
Last modified: 17 Dec 2017, 12:51:03 UTC

Perhaps you are right there, Bernie, but a DVD drive still has a driver interfacing with the operating system,
and my guess is that a stuck driver should not affect the CPU itself, because that should rather be the responsibility of the operating system.

Therefore perhaps making it the needle rather than the soft pillow here, because except for that of speed, always that of a couple of questions around for
that of running tasks by means of a graphics card.

Here it is the opposite, however, so I looked in the usual way at possible hardware which could be affecting the CPU,
and from my own experience I can tell that a stuck floppy could bring a whole system to a halt.

https://en.wikipedia.org/wiki/Poll

https://en.wikipedia.org/wiki/Polling_(computer_science)

Not for that of an election here, since I made a similar reference to that of Deadlock earlier on, except for perhaps not an external event being the possible reason,
but rather an internal one being part of the current scheme.

Becomes a mentioning to that of I/O here (or input/output), but rather I was thinking about that of IRQ here being perhaps a more common or typical thing.

https://en.wikipedia.org/wiki/Interrupt

So why not error here for these tasks, if perhaps not already so?

Again, it validated here for the first error task in that list, so if perhaps stuck, it becomes that of a stuck task, and next with perhaps respect to that of the CPU itself.

Therefore my thought here about this perhaps being a hardware issue, as mentioned.
ID: 1907695 · Report as offensive     Reply Quote