SETI@home | Backing off 2 hr 45 min 50 sec on download
Author | Message |
---|---|
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
When will the BOINC client get fixed to use sensible backoffs? I have enough work to keep my CPUs and GPUs busy for 8 minutes at most, so backing off this long seems totally inappropriate to me. I know I could revert to the old versions, but I feel the current and future versions should get fixed. Wow, the BOINC scheduler must read these posts! I just got:
    16/02/2012 12:43:22 | SETI@home | Scheduler request completed: got 46 new tasks
But given the following I suspect not.
    16/02/2012 13:06:09 | SETI@home | Backing off 7 hr 15 min 35 sec on download of 09jn11ac.2831.9888.3.10.211
I wonder, is there an API to allow a program to effectively "Press [Retry Now]"? |
Wiggo Send message Joined: 24 Jan 00 Posts: 34887 Credit: 261,360,520 RAC: 489 |
The long backoffs are a feature of the 6.12.xx line that you must put up with if you want to use those versions. If you don't like those backoffs, then revert to a later version of the 6.10.xx line. I use versions 6.10.56 to 6.10.60 myself, and if you check the Top Hosts page you'll see that most of them use the 6.10.xx line as well. Cheers. |
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
"I wonder, is there an API to allow a program to effectively 'Press [Retry Now]'?"
    [eesridr:BOINC] > crontab -l
    * * * * * source retryfiles
    [eesridr:BOINC] > cat ~/retryfiles
    cd ~/BOINC/
    ./boinccmd --get_file_transfers | gawk -f retry.awk
    cd
    [eesridr:BOINC] > cat retry.awk
    /name/ { n = $2;}
    / xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}
    [eesridr:BOINC] > cat cc_config.xml
    <cc_config>
      <options>
        <max_file_xfers>50</max_file_xfers>
        <max_file_xfers_per_project>50</max_file_xfers_per_project>
      </options>
    </cc_config> |
Khangollo Send message Joined: 1 Aug 00 Posts: 245 Credit: 36,410,524 RAC: 0 |
That looks like a complicated script. Modifying the source and (re)compiling BOINC was much easier for me. Specifically, I increased the value of #define FILE_XFER_FAILURE_LIMIT in client/client_types.h to prevent project backoffs (the default was 3, which is ridiculous). I also made a slight adjustment to calculate_exponential_backoff() in client/client_state.cpp to cap the backoff at 15 minutes. |
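The idea of capping an exponential backoff can be sketched in a few lines. This is not the BOINC source: the function name, the 60-second base, and the jitter range are assumptions chosen for illustration, and only the 15-minute cap comes from the post above.

```python
import random


def capped_backoff(failures, base=60.0, cap=15 * 60.0, rng=random.random):
    """Return a retry delay in seconds after `failures` consecutive failures.

    Hypothetical helper, not BOINC's calculate_exponential_backoff().
    The delay doubles with each failure and is clamped to `cap`.
    """
    if failures < 1:
        return 0.0
    # Clamp the exponent so very long failure streaks cannot overflow.
    delay = base * (2 ** min(failures - 1, 30))
    delay = min(delay, cap)
    # Jitter between 50% and 100% of the delay so hosts do not retry in lockstep.
    return delay * (0.5 + 0.5 * rng())


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, round(capped_backoff(n), 1))
```

With these assumed defaults the delay stops growing after the fifth failure, since 60 s doubled four times is 960 s, already above the 900 s cap.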
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
"That looks like a complicated script." Not really. cron sources the retryfiles script every minute. retryfiles gets the file transfer list and pipes it to the awk script. That picks up each filename as it goes past, and whenever it sees an inactive transfer it issues a retry command for the last filename seen. "Modifying the source and (re)compiling boinc was much easier for me." Depending on how fast your computer is, it probably took less time to write than your running make. ;-) [Plus I can use it unmodified on all my machines, Linux and Windows, and not recompile a thing.] |
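The same "remember the last name, retry on an inactive transfer" logic can be sketched in Python. The only details taken from the thread are the `name` field, the `xfer active: no` line, and the `--file_transfer <URL> <file> retry` command. The sample listing below is a simplified, assumed layout, and `ap_example_file` is a made-up file name.

```python
def stalled_transfers(listing):
    """Return the file names whose transfer is reported as inactive."""
    stalled = []
    name = None
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "name:":
            name = fields[1]          # remember the most recent file name
        elif line.strip() == "xfer active: no" and name is not None:
            stalled.append(name)      # inactive transfer: retry the last name seen
    return stalled


def retry_commands(listing, project_url, boinccmd="./boinccmd"):
    """Build one boinccmd argument list per stalled transfer (nothing is executed)."""
    return [[boinccmd, "--file_transfer", project_url, n, "retry"]
            for n in stalled_transfers(listing)]


# Assumed, simplified sample of `boinccmd --get_file_transfers` output.
SAMPLE = """\
1) -----------
   name: 09jn11ac.2831.9888.3.10.211
   xfer active: no
2) -----------
   name: ap_example_file
   xfer active: yes
"""

if __name__ == "__main__":
    for cmd in retry_commands(SAMPLE, "http://setiathome.berkeley.edu/"):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run`. Keeping command building separate from execution makes the parsing easy to check against real output first.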
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
Thank you, having max_file_xfers 50 helps a lot. I was hoping for a DLL I could call from C/C++ though. I could execute boinccmd.exe and grab the output, but would prefer to call functions.
    16/02/2012 16:05:49 | SETI@home | Reporting 1 completed tasks, requesting new tasks for CPU and NVIDIA GPU
    16/02/2012 16:06:00 | SETI@home | Scheduler request completed: got 0 new tasks
    16/02/2012 16:06:00 | SETI@home | Not sending work - last request too recent: 25 sec
Is there a way to get round the "last request too recent" limit, please? |
LadyL Send message Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0 |
"Thank you, having max_file_xfers 50 helps a lot. I was hoping for a DLL I could call from C/C++ though. I could execute boinccmd.exe and grab the output, but would prefer to call functions." No, that is a server-side value. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"That looks like a complicated script." This might be easier than that script. It is what I have been using for my systems: one script with a delay for how often I want them to retry. This one is 1 hour, which is about as low as I think it should be. I do it this way because I have a large number of machines.
_xfer_retry_01hr.bat
    @ECHO OFF
    Set ctime=0
    Set dtime=3600
    :start
    @ECHO Retrying everyone %time%
    start /min _xfer_retry_FooBar001.bat %ctime%
    start /min _xfer_retry_FooBar002.bat %ctime%
    start /min _xfer_retry_FooBar003.bat %ctime%
    timeout %dtime% /nobreak
    goto start
As I have a lot of systems, I made it easier on myself by having the things to configure in one spot at the top of the file.
_xfer_retry_FooBar001.bat
    @ECHO OFF
    set CMDPATH=D:\BOINC
    set project=http://setiathome.berkeley.edu/
    set password=password
    Set hName=FooBar001
    Set hadd=%hName%
    Set DirPath=\\%hadd%\d$\Boinc\projects\setiathome.berkeley.edu
    @ECHO %hName% MB Xfer retry in progress %time%
    For /F %%a in ('dir "%DirPath%\??????a?.*" /b /OS') Do %CMDPATH%\boinccmd --host %hadd% --passwd %password% --file_transfer %project% %%a retry
    @ECHO %hName% AP Xfer retry in progress %time%
    For /F %%a in ('dir "%DirPath%\ap_*.*" /b /OS') Do %CMDPATH%\boinccmd --host %hadd% --passwd %password% --file_transfer %project% %%a retry
    @ECHO %hName% Xfer retry complete %time%
    TIMEOUT %1
    exit
If you just have 1 system, you don't need to worry about several parts of the command and could just put this in your BOINC program folder.
_xfer_retry.bat
    @ECHO OFF
    set project=http://setiathome.berkeley.edu/
    Set DirPath=C:\ProgramData\BOINC\projects\setiathome.berkeley.edu
    @ECHO MB Xfer retry in progress %time%
    For /F %%a in ('dir "%DirPath%\??????a?.*" /b /OS') Do boinccmd --file_transfer %project% %%a retry
    @ECHO AP Xfer retry in progress %time%
    For /F %%a in ('dir "%DirPath%\ap_*.*" /b /OS') Do boinccmd --file_transfer %project% %%a retry
    @ECHO Xfer retry complete %time%
    TIMEOUT 10
Since the grunt work of this is done in a for loop, you could probably write this into a small app, since you said you preferred something in the C environment. SETI@home classic workunits: 93,865 CPU time: 863,447 hours. Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Yes, newer builds have that nasty message "last update too soon" because so many of us love to hit that retry button to force things to move along. Luckily, you only have to wait 1 minute to click again. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
"Yes, newer builds have that nasty message 'last update too soon' because so many of us love to hit that retry button to force things to move along. Luckily, you only have to wait 1 minute to click again." No, it's nothing to do with the "newer builds": it's a message, and a time interval, set by the project server. Here at SETI, you have to wait 5 minutes 03 seconds before you click the button again. Other projects have different limits, anything from 7 seconds to 4 hours. |
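Because the minimum interval is enforced by the project server, an automated tool can only avoid wasted requests by tracking the interval locally. A minimal sketch of such a guard follows. The class and method names are hypothetical, and the 303-second value is simply the 5 minutes 03 seconds quoted above for SETI@home.

```python
import time


class UpdateThrottle:
    """Hypothetical client-side guard that refuses requests sent too soon."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval  # seconds required between requests
        self.clock = clock                # injectable clock, handy for testing
        self.last = None                  # time of the last allowed request

    def seconds_to_wait(self):
        """Seconds remaining before another request would be allowed."""
        if self.last is None:
            return 0.0
        return max(0.0, self.min_interval - (self.clock() - self.last))

    def try_request(self):
        """Record and allow a request, or return False if it is too early."""
        if self.seconds_to_wait() > 0:
            return False
        self.last = self.clock()
        return True


if __name__ == "__main__":
    throttle = UpdateThrottle(303)    # 5 min 03 s, as quoted for SETI@home
    print(throttle.try_request())     # first request is allowed
    print(throttle.try_request())     # immediate second request is refused
    print(throttle.seconds_to_wait())
```

This does not get round the server limit. It only stops a script from sending requests the server would reject anyway.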
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
What I have in mind is to use the "Reserved" check box on the SIV [BOINC Status] panel for Auto Retry. Looking at the output I get as follows. Is there a way to automatically get the project URL? 1) ----------- |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"What I have in mind is to use the 'Reserved' check box on the SIV [BOINC Status] panel for Auto Retry." You can do boinccmd --file_transfer for completed files. BOINC seems to be smart enough to know it doesn't need to download already completed files. I am guessing that a check to see if the file is complete or an active transfer is done first. Edit: Oh right, my point! You can do --get_tasks, which gives you the project info, instead of --get_file_transfers. |
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
It looks like I will need to do them both as --get_tasks does not return xfer active:. 189) ----------- |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"It looks like I will need to do them both as --get_tasks does not return xfer active:." Like I said, I don't think you have to worry about telling BOINC to retry files that are not transferring. That is one of the reasons I just went with "dir ??????a?.* /b /OS" to get the task names. At first I was doing all of the files, but then I would get an error message when I passed app_info.xml or the exes to BOINC for transfer. |
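The effect of those two wildcard patterns can be checked with Python's `fnmatch` module. This is only an approximation of the Windows `dir` matching rules, which differ in some edge cases. Apart from `09jn11ac.2831.9888.3.10.211` and `app_info.xml`, which appear in the thread, the file names are made up.

```python
from fnmatch import fnmatchcase

# The two patterns used in the batch files above (multibeam and Astropulse).
TASK_PATTERNS = ("??????a?.*", "ap_*.*")


def task_files(names, patterns=TASK_PATTERNS):
    """Keep only the names that match one of the task-file patterns.

    Names are lower-cased first to mimic case-insensitive Windows matching.
    """
    return [n for n in names
            if any(fnmatchcase(n.lower(), p) for p in patterns)]


if __name__ == "__main__":
    listing = [
        "09jn11ac.2831.9888.3.10.211",  # task name from the thread
        "app_info.xml",                 # must be skipped
        "ap_01.wu",                     # hypothetical Astropulse-style name
        "worker.exe",                   # hypothetical executable, must be skipped
    ]
    print(task_files(listing))
```

`app_info.xml` is rejected by both patterns. Its seventh character is not `a`, and it begins with `app`, not `ap_`.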
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
I have it working quite well and now have 724MB in my cache which is more than ever before. In the end I used --get_file_transfers to check for stalled transfers and then a GetFileAttributes() of \projects\setiathome.berkeley.edu\ to check it's a SETI@home WU. I am now pondering if I should include this in the standard SIV release given the extra load it could put on the SETI@home servers. |
K L ANG Send message Joined: 21 Apr 11 Posts: 3 Credit: 2,575,521 RAC: 0 |
I've found the 6.12.xx client better at task scheduling when crunching on multiple projects at the same time. I've just 'downgraded' to 6.10.xx to get round the lengthy backoff times for S@H tasks. |
Michael W.F. Miles Send message Joined: 24 Mar 07 Posts: 268 Credit: 34,410,870 RAC: 0 |
I have been looking at SIV and am very impressed with the program. However, I can't get the auto retry stalled function to work: it will not let me check the check box. I have tried starting as Admin, but no go. Any ideas? Michael Miles, The Assimilators |
red-ray Send message Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0 |
At the moment Auto Retry is not generally enabled as I am concerned it may overload the servers. I plan to make it generally available once I have feedback from Beta testers. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"At the moment Auto Retry is not generally enabled as I am concerned it may overload the servers. I plan to make it generally available once I have feedback from Beta testers." I think in general the servers are, and have been, overloaded already. Are you using a hard-coded retry interval or making it a user-defined setting? I would think retrying any more frequently than every 60-90 minutes would be too often. I think the guys in the lab consider anything under a few hours too often. One thing I have found is that I sometimes have to suspend network traffic, as some transfers show something, like 2.5k, in the speed field but nothing is actually going on. If you are not already doing so, you may want to add a network suspend and then go back to the user's setting if possible. I actually do a network suspend, send the retry, and then let the network go again. |
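The suspend, retry, resume sequence can be sketched as a list of boinccmd invocations. This assumes that boinccmd accepts `--set_network_mode` with the values `never` and `auto`, which is worth confirming with `boinccmd --help` on the client version in use. The function name is hypothetical, and resuming with `auto` (follow the user's preferences) is an assumption about what "back to the user's setting" should mean.

```python
def suspend_retry_resume(project_url, filenames,
                         boinccmd="boinccmd", resume_mode="auto"):
    """Build the command sequence: suspend network, retry files, resume.

    Nothing is executed here; each entry is an argument list that could be
    handed to subprocess.run(). `--set_network_mode` is assumed to be
    available in the installed boinccmd.
    """
    cmds = [[boinccmd, "--set_network_mode", "never"]]
    cmds += [[boinccmd, "--file_transfer", project_url, name, "retry"]
             for name in filenames]
    cmds.append([boinccmd, "--set_network_mode", resume_mode])
    return cmds


if __name__ == "__main__":
    plan = suspend_retry_resume("http://setiathome.berkeley.edu/",
                                ["09jn11ac.2831.9888.3.10.211"])
    for cmd in plan:
        print(" ".join(cmd))
```

When running the sequence for real, the final resume command should be issued even if a retry fails (for example from a `try`/`finally`), so that a failed retry does not leave networking suspended.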
Digital1 Send message Joined: 5 Aug 11 Posts: 11 Credit: 14,879,656 RAC: 0 |
In the BOINC manager, under tools --> computing preferences, on the network usage tab, set the additional work buffer to a couple of days or so. That should allow you to keep enough jobs in the queue and not worry so much on the download backoff times. |