SETI@home | Backing off 2 hr 45 min 50 sec on download

Message boards : Number crunching : SETI@home | Backing off 2 hr 45 min 50 sec on download

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1195973 - Posted: 16 Feb 2012, 12:43:28 UTC
Last modified: 16 Feb 2012, 13:29:00 UTC

When will the BOINC client get fixed to use sensible backoffs? I have enough work to keep my CPUs and GPUs busy for 8 minutes at most, so backing off this long seems totally inappropriate to me. I know I could revert to the old versions, but I feel the current and future versions should get fixed.
Wow, the BOINC scheduler must read these posts! I just got:
16/02/2012 12:43:22 | SETI@home | Scheduler request completed: got 46 new tasks
But given the following I suspect not.
16/02/2012 13:06:09 | SETI@home | Backing off 7 hr 15 min 35 sec on download of 09jn11ac.2831.9888.3.10.211
I wonder, is there an API to allow a program to effectively "Press [Retry Now]"?

Wiggo
Joined: 24 Jan 00
Posts: 34766
Credit: 261,360,520
RAC: 489
Australia
Message 1195978 - Posted: 16 Feb 2012, 12:58:47 UTC - in response to Message 1195973.  

The long backoffs are a feature of the 6.12.xx line that you must put up with if you want to use those versions. If you don't like those backoffs, then revert to a later version of the 6.10.xx line. I use versions 6.10.56 to 6.10.60 myself, and if you check out the Top Hosts page you'll see that most of them use the 6.10.xx line as well.

Cheers.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1195990 - Posted: 16 Feb 2012, 14:26:20 UTC - in response to Message 1195973.  

I wonder, is there an API to allow a program to effectively "Press [Retry Now]"?

[eesridr:BOINC] > crontab -l
* * * * * source retryfiles
[eesridr:BOINC] > cat ~/retryfiles
cd ~/BOINC/
./boinccmd --get_file_transfers | gawk -f retry.awk
cd

[eesridr:BOINC] > cat retry.awk
/name/ { n = $2;}
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}

[eesridr:BOINC] > cat cc_config.xml
<cc_config>
<options>
<max_file_xfers>50</max_file_xfers>
<max_file_xfers_per_project>50</max_file_xfers_per_project>
</options>
</cc_config>

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1195991 - Posted: 16 Feb 2012, 14:27:35 UTC - in response to Message 1195990.  
Last modified: 16 Feb 2012, 14:34:44 UTC

That looks like a complicated script.
Modifying the source and (re)compiling boinc was much easier for me.

Specifically, increasing the value of #define FILE_XFER_FAILURE_LIMIT in client/client_types.h, to prevent project backoffs (default was 3 which is ridiculous).
Also, a slight adjustment in calculate_exponential_backoff() in client/client_state.cpp to cap it to 15 mins max
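
For illustration, a capped exponential backoff of the kind described above might look like the sketch below. This is not BOINC's calculate_exponential_backoff(); the function name capped_backoff, the 60-second base, the doubling rule and the jitter are all assumptions made for the example. Only the 15-minute cap comes from the post.

```python
import random

MAX_BACKOFF_SECONDS = 15 * 60  # cap from the post: 15 minutes


def capped_backoff(failures, base_seconds=60.0,
                   cap_seconds=MAX_BACKOFF_SECONDS, rng=random.random):
    """Return a retry delay that doubles per failure but never exceeds the cap.

    Hypothetical helper for illustration; not taken from the BOINC source.
    """
    if failures < 1:
        return 0.0
    # Bound the exponent so the intermediate value stays small.
    exponent = min(failures - 1, 30)
    delay = min(base_seconds * (2 ** exponent), cap_seconds)
    # Random jitter in [delay / 2, delay] so many clients do not retry in step.
    return delay * (0.5 + 0.5 * rng())
```

With these assumed defaults the delay reaches the cap after five consecutive failures and stays there, instead of growing into hours.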

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1195992 - Posted: 16 Feb 2012, 14:36:49 UTC - in response to Message 1195991.  
Last modified: 16 Feb 2012, 14:39:35 UTC

That looks like a complicated script.

Not really. cron sources the retryfiles script every minute.
retryfiles gets the file transfer list and pipes it to the awk script.
That picks up each filename as it goes past, and whenever it sees an inactive transfer it issues a retry command for the last filename seen.

Modifying the source and (re)compiling boinc was much easier for me.

Depending on how fast your computer is, it probably took less time to write than your running make. ;-)
[Plus I can use it unmodified on all my machines, Linux and Windows, and not recompile a thing.]
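
For anyone who would rather not use awk, the same idea can be sketched in Python. It assumes the listing format that boinccmd --get_file_transfers prints elsewhere in this thread ("name:" followed later by "xfer active: yes/no"); the function names are hypothetical. Unlike the awk pattern, it matches only the plain "xfer active" field and ignores "pers xfer active".

```python
def inactive_transfer_names(listing):
    """Return file names whose 'xfer active' field is 'no'."""
    names = []
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            # Remember the most recent file name seen.
            current = line.split(":", 1)[1].strip()
        elif line == "xfer active: no" and current is not None:
            names.append(current)
    return names


def retry_commands(listing, project_url, boinccmd="boinccmd"):
    """Build one 'boinccmd --file_transfer URL NAME retry' command per stalled file."""
    return [[boinccmd, "--file_transfer", project_url, name, "retry"]
            for name in inactive_transfer_names(listing)]
```

Each returned list can be handed to subprocess.run() as it stands.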

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1196014 - Posted: 16 Feb 2012, 15:59:57 UTC - in response to Message 1195990.  
Last modified: 16 Feb 2012, 16:07:57 UTC

Thank you, having max_file_xfers 50 helps a lot. I was hoping for a DLL I could call from C/C++ though. I could execute boinccmd.exe and grab the output, but would prefer to call functions.

16/02/2012 16:05:49 | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and NVIDIA GPU
16/02/2012 16:06:00 | SETI@home | Scheduler request completed: got 0 new tasks
16/02/2012 16:06:00 | SETI@home | Not sending work - last request too recent: 25 sec

Is there a way to get round the "last request too recent" limit, please?

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1196019 - Posted: 16 Feb 2012, 16:12:04 UTC - in response to Message 1196014.  

Thank you, having max_file_xfers 50 helps a lot. I was hoping for a DLL I could call from C/C++ though. I could execute boinccmd.exe and grab the output, but would prefer to call functions.

16/02/2012 16:05:49 | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and NVIDIA GPU
16/02/2012 16:06:00 | SETI@home | Scheduler request completed: got 0 new tasks
16/02/2012 16:06:00 | SETI@home | Not sending work - last request too recent: 25 sec

Is there a way to get round the "last request too recent" limit, please?


No, that is a server side value.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1196037 - Posted: 16 Feb 2012, 16:40:57 UTC - in response to Message 1195991.  

That looks like a complicated script.
Modifying the source and (re)compiling boinc was much easier for me.

Specifically, increasing the value of #define FILE_XFER_FAILURE_LIMIT in client/client_types.h, to prevent project backoffs (default was 3 which is ridiculous).
Also, a slight adjustment in calculate_exponential_backoff() in client/client_state.cpp to cap it to 15 mins max

This might be easier than that script. It is what I have been using for my systems.

One script with a delay for how often I want them to retry. This one is 1 hour, which is about as low as I think it should be. I do it this way as I have a large number of machines.

_xfer_retry_01hr.bat
@ECHO OFF
Set ctime=0
Set dtime=3600
:start
@ECHO Retrying everyone %time%
start /min _xfer_retry_FooBar001.bat %ctime%
start /min _xfer_retry_FooBar002.bat %ctime%
start /min _xfer_retry_FooBar003.bat %ctime%
timeout %dtime% /nobreak
goto start

As I have a lot of systems, I made it easier on myself by having the things to configure in one spot at the top of the file.
_xfer_retry_FooBar001.bat
@ECHO OFF
set CMDPATH=D:\BOINC
set project=http://setiathome.berkeley.edu/
set password=password
Set hName=FooBar001
Set hadd=%hName%
Set DirPath=\\%hadd%\d$\Boinc\projects\setiathome.berkeley.edu
@ECHO %hName% MB Xfer retry in progress %time%
For /F %%a in ('dir "%DirPath%\??????a?.*" /b /OS') Do %CMDPATH%\boinccmd --host %hadd% --passwd %password% --file_transfer %project% %%a retry
@ECHO %hName% AP Xfer retry in progress %time%
For /F %%a in ('dir "%DirPath%\ap_*.*" /b /OS') Do %CMDPATH%\boinccmd --host %hadd% --passwd %password% --file_transfer %project% %%a retry
@ECHO %hName% Xfer retry complete    %time%
TIMEOUT %1
exit


If you just have 1 system you don't need to worry about several parts of the command and could just put this in your BOINC program folder.
_xfer_retry.bat
@ECHO OFF
set project=http://setiathome.berkeley.edu/
Set DirPath=C:\ProgramData\BOINC\projects\setiathome.berkeley.edu
@ECHO MB Xfer retry in progress %time%
For /F %%a in ('dir "%DirPath%\??????a?.*" /b /OS') Do boinccmd --file_transfer %project% %%a retry
@ECHO AP Xfer retry in progress %time%
For /F %%a in ('dir "%DirPath%\ap_*.*" /b /OS') Do boinccmd --file_transfer %project% %%a retry
@ECHO Xfer retry complete    %time%
TIMEOUT 10


Since the grunt work of this is done in a for loop, you could probably write this into a small app, since you said you preferred something in the C environment.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]

skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1196057 - Posted: 16 Feb 2012, 17:08:00 UTC - in response to Message 1196037.  

Yes, newer builds have that nasty message "last update too soon" because so many of us love to hit that retry button to force things to move along. Luckily, you only have to wait 1 minute to click again.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1196069 - Posted: 16 Feb 2012, 18:12:11 UTC - in response to Message 1196057.  

Yes, newer builds have that nasty message "last update too soon" because so many of us love to hit that retry button to force things to move along. Luckily, you only have to wait 1 minute to click again.

No, it's nothing to do with the "newer builds" - it's a message, and time interval, set by the project server.

Here at SETI, you have to wait 5 minutes 03 seconds before you click the button again. Other projects have different limits - anything from 7 seconds to 4 hours.

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1196096 - Posted: 16 Feb 2012, 19:33:54 UTC - in response to Message 1196069.  

What I have in mind is to use the "Reserved" check box on the SIV [BOINC Status] panel for Auto Retry.

The output I get is as follows. Is there a way to automatically get the project URL?

1) -----------
name: ap_09jn11ac_B6_P0_00012_20120216_06419.wu
generated locally: no
uploaded: no
upload when present: no
sticky: no
generated locally: no
pers xfer active: yes
xfer active: yes
time_so_far: 1986.045093
bytes_xferred: 6672034.000000
xfer_speed: 17802.299492

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1196099 - Posted: 16 Feb 2012, 19:47:27 UTC - in response to Message 1196096.  
Last modified: 16 Feb 2012, 19:49:23 UTC

What I have in mind is to use the "Reserved" check box on the SIV [BOINC Status] panel for Auto Retry.
[img]http://rh-software.com/siv_bon.png[/img]
The output I get is as follows. Is there a way to automatically get the project URL?


You can do boinccmd --file_transfer for completed files. BOINC seems to be smart enough to know it doesn't need to download already completed files. I am guessing that a check to see if the file is complete or an active transfer is done first.

Edit: Oh right my point! You can do --get_tasks, which gives you the project info, instead of --get_file_transfers.

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1196103 - Posted: 16 Feb 2012, 20:13:35 UTC - in response to Message 1196099.  

It looks like I will need to run them both, as --get_tasks does not return "xfer active:".

189) -----------
name: 04my11ab.27986.21744.14.10.219_1
WU name: 04my11ab.27986.21744.14.10.219
project URL: http://setiathome.berkeley.edu/
report deadline: Thu Mar 22 07:07:08 2012
ready to report: no
got server ack: no
final CPU time: 0.000000
state: 2
scheduler state: 0
exit_status: 0
signal: 0
suspended via GUI: no
active_task_state: 0
app version num: 0
checkpoint CPU time: 0.000000
current CPU time: 0.000000
fraction done: 0.000000
swap size: 0.000000
working set size: 0.000000
estimated CPU time remaining: 3365.796945
supports graphics: no
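
One way to combine the two listings, sketched in Python: build a WU name to project URL map from --get_tasks, then look each transfer's file name up in it. The function names are hypothetical. The matching rule (file name equals the WU name, or the WU name plus a ".wu" extension) is an assumption inferred only from the two sample names in this thread and may not hold for other files or projects.

```python
def wu_project_urls(tasks_listing):
    """Map each 'WU name' in --get_tasks output to its 'project URL'."""
    urls = {}
    wu = None
    for raw in tasks_listing.splitlines():
        line = raw.strip()
        if line.startswith("WU name:"):
            wu = line.split(":", 1)[1].strip()
        elif line.startswith("project URL:") and wu is not None:
            # Split on the first colon only, so the URL's own colons survive.
            urls[wu] = line.split(":", 1)[1].strip()
            wu = None
    return urls


def project_url_for_file(file_name, urls):
    """Return the project URL for a transfer's file name, or None if unknown."""
    if file_name in urls:
        return urls[file_name]
    # Assumption: Astropulse files are '<WU name>.wu'.
    if file_name.endswith(".wu"):
        return urls.get(file_name[:-3])
    return None
```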

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1196171 - Posted: 17 Feb 2012, 1:35:08 UTC - in response to Message 1196103.  

It looks like I will need to run them both, as --get_tasks does not return "xfer active:".


Like I said, I don't think you have to worry about telling BOINC to retry files that are not transferring. That is one of the reasons I just went with "dir ??????a?.* /b /OS" to get the task names. At first I was doing all of the files, but then I would get an error message when I passed app_info.xml or the exes to BOINC for transfer.

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1196294 - Posted: 17 Feb 2012, 9:28:35 UTC - in response to Message 1196171.  

I have it working quite well and now have 724MB in my cache which is more than ever before. In the end I used --get_file_transfers to check for stalled transfers and then a GetFileAttributes() of \projects\setiathome.berkeley.edu\ to check it's a SETI@home WU.
I am now pondering if I should include this in the standard SIV release given the extra load it could put on the SETI@home servers.
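
The existence check described above can be sketched portably in Python, with pathlib standing in for GetFileAttributes(). The function name and the project_dir parameter are hypothetical; the caller supplies the full path to the project's folder.

```python
from pathlib import Path


def names_in_project_dir(names, project_dir):
    """Keep only the transfer names that exist as files in the project folder."""
    root = Path(project_dir)
    return [name for name in names if (root / name).exists()]
```

Only the names that survive this filter would then be passed on for a retry.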

K L ANG
Joined: 21 Apr 11
Posts: 3
Credit: 2,575,521
RAC: 0
United Kingdom
Message 1196343 - Posted: 17 Feb 2012, 14:48:05 UTC

I've found the 6.12.xx client better at task scheduling when crunching on multiple projects at the same time. I've just 'downgraded' to 6.10.xx to get round the lengthy backoff times for S@H tasks.

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 268
Credit: 34,410,870
RAC: 0
Canada
Message 1199519 - Posted: 25 Feb 2012, 2:43:20 UTC

I have been looking at SIV and am very impressed with the program.
However, I can't get the auto retry stalled function to work.
It will not let me check the check box.
I have tried starting as Admin, but no go.
Any ideas?

Michael Miles
The Assimilators

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1199619 - Posted: 25 Feb 2012, 8:41:01 UTC - in response to Message 1199519.  

At the moment Auto Retry is not generally enabled as I am concerned it may overload the servers. I plan to make it generally available once I have feedback from Beta testers.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1199683 - Posted: 25 Feb 2012, 16:15:06 UTC - in response to Message 1199619.  

At the moment Auto Retry is not generally enabled as I am concerned it may overload the servers. I plan to make it generally available once I have feedback from Beta testers.

I think in general the servers are, and have been, overloaded already. Are you using a hard-coded retry interval or making it a user-defined setting? I would think retrying any more often than every 60-90 minutes would be too often. I think the guys in the lab think anything under a few hours is too often.

One thing I have found is that I sometimes have to suspend network traffic, as some transfers show something, like 2.5k, in the speed field when nothing is actually going on. If you are not already doing so, you may want to add a network suspend and then go back to the user's setting if possible. I actually do a network suspend, send the retry, and then let the network go again.
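
The suspend, retry, resume sequence could be sketched in Python roughly as follows. It assumes boinccmd's --set_network_mode option with the modes "never" and "auto", where "auto" is taken to mean "follow the user's preferences". The function name and the injectable run parameter are hypothetical and exist only to keep the sketch testable.

```python
import subprocess


def retry_with_network_bounce(project_url, file_names, run=None, boinccmd="boinccmd"):
    """Suspend networking, request a retry for each file, then restore networking."""
    if run is None:
        # Default: actually execute each command.
        run = lambda cmd: subprocess.run(cmd, check=False)
    run([boinccmd, "--set_network_mode", "never"])
    try:
        for name in file_names:
            run([boinccmd, "--file_transfer", project_url, name, "retry"])
    finally:
        # Always hand network control back, even if a retry command fails.
        run([boinccmd, "--set_network_mode", "auto"])
```

The try/finally is there so a failed retry cannot leave the client with networking switched off.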

Digital1
Joined: 5 Aug 11
Posts: 11
Credit: 14,879,656
RAC: 0
United States
Message 1199699 - Posted: 25 Feb 2012, 17:08:03 UTC

In the BOINC manager, under tools --> computing preferences, on the network usage tab, set the additional work buffer to a couple of days or so. That should allow you to keep enough jobs in the queue and not worry so much about the download backoff times.
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.