work units in error state

Questions and Answers : Getting started : work units in error state
genfoch01
Joined: 23 Jun 99
Posts: 4
Credit: 1,626,500
RAC: 0
United States
Message 1116267 - Posted: 12 Jun 2011, 14:43:37 UTC

I am running BOINC on a Windows machine using the GPU. I have been getting work units (WUs) in an error state, with this stderr output:

<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
]]>

Does anyone know what this means? It cannot refer to the WU deadline, since that is over two weeks away.

I now have 7 of these, so I am curious what this is about and whether there is any way to eliminate it.
Angels can fly because they take themselves lightly

- G.K.Chesterton
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1116322 - Posted: 12 Jun 2011, 16:15:25 UTC - in response to Message 1116267.  

When did you last reboot that machine?

It looks like the CUDA tasks are getting stuck on your GPU. Do they show any progress in the Tasks tab of BOINC Manager while running?

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1116409 - Posted: 12 Jun 2011, 21:20:22 UTC - in response to Message 1116267.  

<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
]]>

anyone know what this means?

What it means is this:
Each task comes with a 'runtime estimate', which is measured in FLOPs (FLoating point OPerations). It is the <rsc_fpops_est>64788201036261.297000</rsc_fpops_est> value inside the <workunit> block of the client_state.xml file (my example is from an Einstein task).

BOINC checks whether the application runs far beyond this estimated amount (the cut-off itself comes from the accompanying <rsc_fpops_bound> value); if it does, the application is stopped and that error is reported.

A similar error occurs when <rsc_disk_bound>100000000.000000</rsc_disk_bound> is exceeded; the message is then "Maximum disk space exceeded", meaning the task tried to use more disk space than it is allowed.
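
To make those two numbers more tangible, here is a minimal Python sketch that reads them from a <workunit> fragment and converts them into rough human-readable figures. The fragment itself, the name example_wu, and the ASSUMED_FLOPS device speed are illustrative assumptions; only the two numeric values come from the post above. This is not how BOINC itself computes anything, just arithmetic on the quoted values.

```python
import xml.etree.ElementTree as ET

# Hypothetical <workunit> fragment. Only the two numeric values are taken
# from the post; the name and the layout are made up for illustration.
WORKUNIT_XML = """
<workunit>
    <name>example_wu</name>
    <rsc_fpops_est>64788201036261.297000</rsc_fpops_est>
    <rsc_disk_bound>100000000.000000</rsc_disk_bound>
</workunit>
"""

# Assumed device speed in floating point operations per second.
# A real host will report a different figure.
ASSUMED_FLOPS = 2.0e9


def read_limits(xml_text):
    """Return (estimated FLOPs, disk bound in bytes) from a <workunit> element."""
    wu = ET.fromstring(xml_text)
    fpops_est = float(wu.findtext("rsc_fpops_est"))
    disk_bound = float(wu.findtext("rsc_disk_bound"))
    return fpops_est, disk_bound


fpops_est, disk_bound = read_limits(WORKUNIT_XML)

# Estimated runtime is the amount of work divided by the assumed speed.
est_seconds = fpops_est / ASSUMED_FLOPS

print(f"estimated runtime: {est_seconds / 3600:.1f} hours")
print(f"disk bound: {disk_bound / 1e6:.0f} MB")
```

With a faster assumed device speed the same task gets a proportionally shorter estimate, which is why the same work unit can show very different "To completion" times on a CPU and on a GPU.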
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1116464 - Posted: 13 Jun 2011, 0:13:48 UTC - in response to Message 1116409.  
Last modified: 13 Jun 2011, 0:43:03 UTC


Simply put:
If the task, for whatever reason, runs 10 times longer than the initial estimate (column "To completion"),
BOINC aborts the task automatically because it decides that the task is hung/stuck.
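
The rule of thumb above can be sketched in a few lines of Python. This is a simplified illustration, not BOINC's actual code: the function name exceeds_elapsed_limit is hypothetical, and the factor of 10 is taken from the description in this post rather than from the client source.

```python
# Factor taken from the post: "10 times longer than the initial estimate".
# Treat it as an assumption; the real limit is set per work unit.
ASSUMED_ABORT_FACTOR = 10.0


def exceeds_elapsed_limit(elapsed_s, estimated_s, factor=ASSUMED_ABORT_FACTOR):
    """Return True when a task has run longer than factor times its estimate."""
    return elapsed_s > factor * estimated_s


# A task initially estimated at 20 minutes gets a limit of 200 minutes.
estimate = 20 * 60

print(exceeds_elapsed_limit(3 * 3600, estimate))  # 180 min is under the limit: False
print(exceeds_elapsed_limit(4 * 3600, estimate))  # 240 min is over the limit: True
```

Note that the limit scales with the estimate, so a task whose initial estimate is far too low will hit the limit much sooner than its real running time would suggest.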

(But I don't know why it happened to tasks with "Run time: 0.00":
http://setiathome.berkeley.edu/results.php?hostid=6014929&offset=0&show_names=0&state=5&appid=

Maybe because you just reached:
Consecutive valid tasks 11
http://setiathome.berkeley.edu/host_app_versions.php?hostid=6014929
)

More information in older threads:
http://setiathome.berkeley.edu/forum_thread.php?id=60413&nowrap=true
http://setiathome.berkeley.edu/forum_thread.php?id=62226


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 



 