Is there a script to auto-abort a task if...

Message boards : Number crunching : Is there a script to auto-abort a task if...

Ozymandias
Volunteer tester
Joined: 15 May 99
Posts: 15
Credit: 88,034,550
RAC: 46
United States
Message 1854976 - Posted: 12 Mar 2017, 6:48:06 UTC

I have a laptop that I use to crunch, and every so often a GPU task will hang indefinitely (I just aborted one at 20+ hours).
If I kill the offending process in task manager, the progress starts over and hangs at the same percentage.
The only way past it is to abort the task.
Is there a way to script this in Windows (or in a bat file) so that a task is aborted if it runs longer than X hours?
Most tasks take a few hours, so when one hits 5+ hours I know it has hung.
Any help would be appreciated!
ID: 1854976
Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1855016 - Posted: 12 Mar 2017, 10:35:15 UTC

Hi, I found this computer with the bad GPU tasks: https://setiathome.berkeley.edu/show_host_detail.php?hostid=7752795

Looking at your results, your GPU isn't much faster than your CPU, for example here:
https://setiathome.berkeley.edu/results.php?hostid=7752795&offset=40&show_names=0&state=4&appid=

Have you applied some optimisation to your GPU app? (BOINC 7.6.33)

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<stderr_txt>
Maximum single buffer size set to:128MB
Number of period iterations for PulseFind set to:100

Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
1 slot of 64 used for this instance
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 2



Do you keep one CPU core free for your GPU, even with the R3330 app? In your computing preferences, is "Use at most % of the CPUs" set to 100%? Try 75% for testing.
The CPU app will then use 3 of your 4 cores and leave one free for GPU computation, which should help show whether that changes the GPU hang.

You seem to have another computer with the same graphics card that works perfectly:
https://setiathome.berkeley.edu/results.php?hostid=7967141

When I look at its results, there is no optimisation in the stderr header (BOINC 7.6.22):
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 0
0 slot of 64 used for this instance
Info: BOINC provided OpenCL device ID used
Info: CPU affinity mask used: 1

ID: 1855016
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1855089 - Posted: 12 Mar 2017, 17:11:14 UTC
Last modified: 12 Mar 2017, 17:12:24 UTC

I tossed this together really quickly and there are probably better ways to do it.
This will get the job done if you want to abort tasks that run over a specific limit.

@ECHO OFF
REM Abort tasks from one project whose run time has passed runlimit.
REM boinccmd must be on the PATH or in the same folder as this file.
pushd "%~dp0"
cls
set outfile1=_task-name.txt
set outfile2=_task-project.txt
set outfile3=_task-checkpoint.txt
set skipcount=0
set bnccmd=boinccmd --get_tasks
set killproject=http://setiathome.berkeley.edu/
set runlimit=18000

REM Dump task names, project URLs and run times into three files between
REM SOF/EOF marker lines. This relies on each task producing exactly one
REM line in each file, so that line N of every file is the same task.
@ECHO SOF SOF SOF SOF > %outfile1%
@ECHO SOF SOF SOF SOF > %outfile2%
@ECHO SOF SOF SOF SOF > %outfile3%
%bnccmd% | find "name:" | find /V "WU name:">> %outfile1%
%bnccmd% | find "URL:" >> %outfile2%
%bnccmd% | find "time:" | find /V "checkpoint CPU time:" | find /V "final CPU time:" >> %outfile3%
@ECHO EOF EOF EOF EOF >> %outfile1%
@ECHO EOF EOF EOF EOF >> %outfile2%
@ECHO EOF EOF EOF EOF >> %outfile3%

@ECHO Starting task run time check %date% %time%
REM Each pass reads the next line (skipcount) of the three files.
:miniloop
set /a skipcount=%skipcount%+1
FOR /F "skip=%skipcount% tokens=4* delims= " %%a in (%outfile3%) do set runtime=%%a & goto checktime
:checktime
if %runtime% equ EOF goto end
REM Tasks that have not started yet report 0.000000.
if %runtime% equ 0.000000 goto miniloop
REM Meant to strip the decimal part (set /a only handles integers) before the numeric LSS test.
set /a runtime/=1
if %runtime% LSS %runlimit% goto miniloop
FOR /F "skip=%skipcount% tokens=3* delims= " %%a in (%outfile2%) do set projectname=%%a & goto checkproject
:checkproject
if %projectname% NEQ %killproject% goto miniloop
FOR /F "skip=%skipcount% tokens=2* delims= " %%a in (%outfile1%) do set taskname=%%a & goto killtask
:killtask
@ECHO Run time limit reached %runtime%/%runlimit% seconds. Killing task %taskname%
@ECHO %date% %time%, %runtime%/%runlimit%, %taskname% >> _kill_log.txt
REM For a remote host, add the same --host/--passwd options used in bnccmd.
boinccmd --task %projectname% %taskname% abort
goto miniloop
:end
@ECHO Stopping task run time check %date% %time%
del %outfile1% & del %outfile2% & del %outfile3%
popd


bnccmd - The boinccmd command line. Can be modified with --host --passwd values to run on remote hosts. See the BOINC wiki for full details on using boinccmd and configuring remote hosts.
killproject - The URL of the project whose tasks you want to check.
runlimit - The run time, in seconds, at which a task is aborted (18000 = 5 hours).
_kill_log.txt - The log of tasks that have been aborted.
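For anyone who would rather use Python, here is a rough sketch of the same check (not HAL9000's script), assuming Python 3.7+ and boinccmd on the PATH. Instead of three parallel temp files, it groups the `boinccmd --get_tasks` output into one record per task, starting a new record at each `name:` line. The label `current CPU time` used for the run time is an assumption (the batch file just takes whichever `time:` line is left after filtering), so check it against your own output. `BOINCCMD` can carry `--host`/`--passwd` so that both the query and the abort go to the same host; the host name and password shown in the comment are placeholders.

```python
"""Abort BOINC tasks from one project that have run past a time limit."""
import subprocess
import sys
from datetime import datetime

# Prefix for every boinccmd call. For a remote host use something like
# ["boinccmd", "--host", "otherpc", "--passwd", "secret"] (placeholders).
BOINCCMD = ["boinccmd"]
KILL_PROJECT = "http://setiathome.berkeley.edu/"
RUN_LIMIT = 18000  # seconds (5 hours)
# Assumption: label of the per-task run-time line in --get_tasks output.
TIME_LABEL = "current CPU time"


def parse_tasks(text):
    """Split `boinccmd --get_tasks` output into one dict per task.

    Each "label: value" line is stored under its label. A line labelled
    exactly "name" starts a new task ("WU name" is a different label).
    """
    tasks = []
    for line in text.splitlines():
        label, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip lines without a label (headers, separators)
        label = label.strip()
        if label == "name":
            tasks.append({})
        if tasks:
            tasks[-1][label] = value.strip()
    return tasks


def tasks_to_abort(tasks, project, limit, time_label=TIME_LABEL):
    """Yield (project URL, task name, seconds) for tasks at or over limit."""
    for task in tasks:
        if task.get("project URL") != project:
            continue
        try:
            seconds = float(task.get(time_label, "0"))
        except ValueError:
            continue  # unexpected value: leave the task alone
        if seconds >= limit:
            yield task["project URL"], task["name"], seconds


def main():
    try:
        result = subprocess.run(BOINCCMD + ["--get_tasks"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"Could not run boinccmd --get_tasks: {err}", file=sys.stderr)
        return
    print(f"Starting task run time check {datetime.now()}")
    for url, name, seconds in tasks_to_abort(parse_tasks(result.stdout),
                                             KILL_PROJECT, RUN_LIMIT):
        print(f"Run time limit reached {seconds:.0f}/{RUN_LIMIT} seconds. "
              f"Killing task {name}")
        with open("_kill_log.txt", "a") as log:
            log.write(f"{datetime.now()}, {seconds:.0f}/{RUN_LIMIT}, {name}\n")
        subprocess.run(BOINCCMD + ["--task", url, name, "abort"], check=False)
    print(f"Stopping task run time check {datetime.now()}")


if __name__ == "__main__":
    main()
```

Like the batch file, it checks once and exits, so you would run it by hand or from Windows Task Scheduler every hour or so.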
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1855089
Ozymandias
Volunteer tester
Joined: 15 May 99
Posts: 15
Credit: 88,034,550
RAC: 46
United States
Message 1856227 - Posted: 18 Mar 2017, 5:23:47 UTC - in response to Message 1855089.  

Great replies! I will test the BAT file tomorrow and let you know what I see.
Thank you!!!!
ID: 1856227
Ozymandias
Volunteer tester
Joined: 15 May 99
Posts: 15
Credit: 88,034,550
RAC: 46
United States
Message 1856229 - Posted: 18 Mar 2017, 5:40:09 UTC - in response to Message 1855016.  
Last modified: 18 Mar 2017, 5:40:49 UTC

I had been testing command line parameters on the GPU executable and did not get any better results.
That is why you see the SBS (maximum single buffer size) and PulseFind period iteration settings on one computer and not the other.
I had forgotten they were set, so I took them off.

And yes, I have 2 laptops with exactly the same hardware and drivers and yet GPU tasks will stall on one but not the other.
I have wiped the offending laptop multiple times and tried numerous different driver combinations.
The only explanation I can come up with is a hardware flaw that the OS can compensate for but that hangs the GPU executable.
It is quite frustrating to say the least.

As for the speed of the GPU, for a little while I ran without GPU processing on one of the laptops.
The RAC was lower by a fair amount so I turned the GPU back on and started my frustrating journey...
ID: 1856229
Ozymandias
Volunteer tester
Joined: 15 May 99
Posts: 15
Credit: 88,034,550
RAC: 46
United States
Message 1857162 - Posted: 23 Mar 2017, 3:44:30 UTC - in response to Message 1855089.  

Just to close this topic.
The BAT file works as advertised, but it still bothered me that I was aborting tasks.
Per HAL9000's suggestion, I found updated optimized GPU apps at Mike's World and took them for a spin.
The new apps have been working flawlessly.

Again, thank you all for the replies.
You have made my crunching world whole again :-)

jkh
ID: 1857162

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.