Want: EXIT_TIME_LIMIT_EXCEEDED to happen faster

Message boards : Number crunching : Want: EXIT_TIME_LIMIT_EXCEEDED to happen faster

Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 1604518 - Posted: 23 Nov 2014, 14:21:25 UTC
Last modified: 23 Nov 2014, 14:48:40 UTC

There is a discussion here but I cannot make sense of it.

My problem: I have tasks on my pair of 5850s that should take 20-40 seconds. Occasionally one hangs, and almost 2 hours later that exit timeout error occurs and the GPU finally starts crunching again. I want to somehow set the expected time to about 2 minutes, and it seems there may be a way to do it by estimating the flops usage, but I don't understand Claggy's instructions.

There is a discussion here on how to set a timeout in the project job file, but a client restart erases the XML file and I am back to babysitting the project either way.
ID: 1604518
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1604524 - Posted: 23 Nov 2014, 14:45:50 UTC - in response to Message 1604518.  

Joseph Stateson wrote: "There is a discussion here but I cannot make sense of it."

That url doesn't go anywhere.

Joseph Stateson wrote: "My problem: I have tasks on my pair of 5850s that should take 20-40 seconds. Occasionally one hangs, and almost 2 hours later that exit timeout error occurs and the GPU finally starts crunching again. I want to somehow set the expected time to about 2 minutes, and it seems there may be a way to do it by estimating the flops usage, but I don't understand Claggy's instructions."

No normal Seti v7 task should take only 20 to 40 seconds; I don't think there are any GPUs fast enough to do Seti v7 WUs that fast.
You've also got no HD5850s attached to Seti. If you're talking about another project's apps, wouldn't the correct course of action be to stop those apps hanging?

What were my alleged instructions? I don't recall giving you any instructions.

Joseph Stateson wrote: "There is a discussion here on how to set a timeout in the project job file, but a client restart erases the XML file and I am back to babysitting the project either way."


Claggy
ID: 1604524
Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 1604526 - Posted: 23 Nov 2014, 14:50:57 UTC - in response to Message 1604524.  

Sorry, I was busy looking for the correct URL, as I failed to bookmark the thread:
http://setiathome.berkeley.edu/forum_thread.php?id=71148&postid=1347580#1347580

Anyway, you are correct: the problem should be addressed by the project, if they would get around to it.

I probably should have posted this at the BOINC client forum.
ID: 1604526
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1604528 - Posted: 23 Nov 2014, 15:07:30 UTC - in response to Message 1604526.  

Joseph Stateson wrote:
"Sorry, I was busy looking for the correct URL, as I failed to bookmark the thread:
http://setiathome.berkeley.edu/forum_thread.php?id=71148&postid=1347580#1347580

Anyway, you are correct: the problem should be addressed by the project, if they would get around to it.

I probably should have posted this at the BOINC client forum."

No need to bookmark that thread - the (very specific) BOINC upgrade problem being discussed there now has its own 'sticky' thread at the top of this forum:

Seti, BOINC 7.0.28 going to 7.0.6x and 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED error on GPU work

Unless you were making that specific transition across the BOINC v7.0.30 boundary, both threads are irrelevant to you.

Most projects (including this one) won't have a 'job file': that's specific to the 'wrapper' application which is only used by projects which can't, or won't, write their own native-to-BOINC science application.

The only other control you have available is <rsc_fpops_bound>, but that would be even more wasteful of your babysitting time: it's sent in the <workunit> specification of each individual task you're allocated. If your tasks are indeed completing validly in 20-40 seconds (unlikely here), you'd need to do a big global search/replace and restart BOINC every few minutes.
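To put rough numbers on that: as far as I understand the client, a task is aborted with EXIT_TIME_LIMIT_EXCEEDED once its elapsed time passes approximately <rsc_fpops_bound> divided by the speed (flops) the client assumes for the app version. The Python sketch below shows only that arithmetic and the search/replace on a piece of text. It assumes the values appear as <rsc_fpops_bound> elements in client_state.xml; the function names and the sample figures are made up for illustration and are not part of BOINC.

```python
import re

# Matches one <rsc_fpops_bound>...</rsc_fpops_bound> element on a single line.
BOUND_RE = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def time_limit_seconds(rsc_fpops_bound: float, app_flops: float) -> float:
    """Approximate elapsed-time limit implied by a bound and an assumed speed."""
    return rsc_fpops_bound / app_flops


def bound_for_limit(limit_seconds: float, app_flops: float) -> float:
    """Bound that would give roughly the wanted limit at the assumed speed."""
    return limit_seconds * app_flops


def rewrite_bounds(xml_text: str, new_bound: float) -> str:
    """Replace every <rsc_fpops_bound> value in the text with new_bound."""
    return BOUND_RE.sub(
        lambda m: f"{m.group(1)}{new_bound:.6f}{m.group(3)}", xml_text
    )


# Hypothetical figures: a bound of 7.2e14 at an assumed 1e11 flops is a
# two-hour limit; a two-minute limit would need a bound of 1.2e13.
sample = (
    "<workunit>\n"
    "    <rsc_fpops_est>4000000000000.000000</rsc_fpops_est>\n"
    "    <rsc_fpops_bound>720000000000000.000000</rsc_fpops_bound>\n"
    "</workunit>\n"
)
patched = rewrite_bounds(sample, bound_for_limit(120, 1e11))
```

Applying this to a real client_state.xml would mean stopping BOINC first, rewriting the file, and restarting, and doing so again for every new batch of tasks, which is exactly the babysitting described above.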
ID: 1604528
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1604569 - Posted: 23 Nov 2014, 19:05:34 UTC

The easiest way to unstick GPU tasks is to bounce the GPU at an interval.
In the early days of doing MB work on my ATI GPUs I would run this every few hours.
boinccmd --set_gpu_mode never 10
That way BOINC would suspend GPU processing for 10 seconds & then resume.

For Bitcoin Utopia you may want to run that every 10-15 min.
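If you would rather not rely on a scheduler, here is a minimal Python sketch of running that command on a timer. It assumes boinccmd is on the PATH and can talk to the local client; the 15-minute default, the function names, and the injectable run/sleep parameters are illustrative choices, not anything BOINC provides.

```python
import subprocess
import time


def bounce_command(pause_seconds=10, boinccmd="boinccmd"):
    """Build the command that suspends GPU work for pause_seconds."""
    return [boinccmd, "--set_gpu_mode", "never", str(pause_seconds)]


def bounce_once(pause_seconds=10, run=subprocess.run):
    """Ask BOINC to pause GPU processing; it resumes by itself afterwards."""
    # check=False: a failed call (e.g. client not running) should not
    # stop the loop, we simply try again at the next interval.
    run(bounce_command(pause_seconds), check=False)


def bounce_every(interval_seconds=15 * 60, pause_seconds=10, rounds=None,
                 run=subprocess.run, sleep=time.sleep):
    """Bounce the GPU, wait, and repeat; rounds=None means run until stopped."""
    done = 0
    while rounds is None or done < rounds:
        bounce_once(pause_seconds, run=run)
        sleep(interval_seconds)
        done += 1
```

Calling bounce_every() with no arguments would bounce the GPU every 15 minutes until the script is stopped. Keep in mind that a task which does not checkpoint will start over each time it is suspended.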
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1604569
Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 1605465 - Posted: 26 Nov 2014, 0:52:22 UTC - in response to Message 1604569.  

Thanks! This worked. I am setting up the win7 task scheduler to perform this function. Two tasks that were hung actually completed successfully after restarting. One was hung for 5 hours. I noticed that the run time reported by the project did NOT show 5 hours, only the normal 40 or so seconds. Apparently it did not exit properly after successfully completing.
ID: 1605465
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1605470 - Posted: 26 Nov 2014, 1:22:48 UTC - in response to Message 1605465.  

Joseph Stateson wrote: "Thanks! This worked. I am setting up the win7 task scheduler to perform this function. Two tasks that were hung actually completed successfully after restarting. One was hung for 5 hours. I noticed that the run time reported by the project did NOT show 5 hours, only the normal 40 or so seconds. Apparently it did not exit properly after successfully completing."

If you suspend a BCU task, it starts over. I don't think they checkpoint, because a running task is doing its work over the internet.
ID: 1605470



 