I know I shouldn't look a gift horse in the mouth, but...

mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1170060 - Posted: 11 Nov 2011, 12:19:41 UTC

Anyone know what might have changed yesterday?

After a frustrating week of watching my RAC crash, yesterday afternoon I suddenly started getting task downloads reporting over 100 Kbs, and I am finally starting to build up a cache of tasks. I have 150 machines that, for a week, could not get many tasks to download successfully. I have the resource share for SETI set to 10000 and Einstein set to 0, but I was running over 90 percent Einstein. Spending hours clicking retry on downloads was not yielding many completed task downloads. All of my machines now have almost a full day's worth of cache. Yeah!!! I lose most of the machines in December, but if this good fortune continues I might have a chance to move up a notch in the standings before I lose time on those machines.

My sincerest thanks to whoever cleaned out the pipes. My mouse buttons will really appreciate the rest.
Thank you very much, and let the good times roll!!!!
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1170533 - Posted: 12 Nov 2011, 16:41:34 UTC - in response to Message 1170060.  

Unfortunately it was a short-lived situation. Later that morning I was back to the CPUs completing tasks faster than new downloads would complete, and the cache is already dwindling. I guess I need to find a different offering for the network gods.
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1170538 - Posted: 12 Nov 2011, 17:02:49 UTC - in response to Message 1170533.  

Were you waving a live chicken or a dead one?


BOINC WIKI
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1170542 - Posted: 12 Nov 2011, 17:13:00 UTC - in response to Message 1170538.  

It was a rubber one. Too many PETA members in my town....
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1170552 - Posted: 12 Nov 2011, 18:05:46 UTC - in response to Message 1170542.  

Is that People Eating Tasty Animals, or People for the Ethical Treatment of Animals?*

I noticed a few of my machines are at the same point, processing things faster than they can download them. Such is the will of the S@H gods.

*Disclaimer: I don't eat animals nor am I a member of either group.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1170555 - Posted: 12 Nov 2011, 18:19:25 UTC - in response to Message 1170552.  

I think right now it's 'People Eating Tasty Astropulse'.....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

speedbump
Joined: 19 May 01
Posts: 247
Credit: 192,906,380
RAC: 0
United States
Message 1170578 - Posted: 12 Nov 2011, 19:56:17 UTC

I noticed your RAC dropping and was wondering why. Your machines rely on CPU work units, which I don't have a problem getting. Most of my RAC is from GPU work units, which are hard to come by in sufficient quantities at the moment. I think we have all had problems getting them to download. I keep having to force them, which is hard with several machines located all over. I can only imagine what it is like with the number you have, and I would think they are spread over a much greater distance.
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1170647 - Posted: 13 Nov 2011, 1:32:19 UTC - in response to Message 1170578.  

Most of these servers are actually in the same building. 80 percent of them are blade servers, and we can get 64 of them per 6-foot cabinet (4 chassis with 16 blades per chassis).

I use one of the excellent tools from eFMer, BoincTasks, http://www.efmer.eu/boinc/boinc_tasks/download.html (thanks Fred). There is a "retry all" selection that can be used to trigger download retries on multiple machines. This tool is a must for managing a large number of machines.

I can spend a lot of time clicking retry to get the server caches full. (I guess SETI benefits from my OCD.) For the last few weeks I have not been able to keep a cache on most of the servers because downloads can take up to 50 retries before completing. Last week I was not able to get many tasks at all, and I had to let the machines crunch my backup project, Einstein@Home.


HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1170661 - Posted: 13 Nov 2011, 3:17:51 UTC - in response to Message 1170647.  

I use BOINCcmd, which is part of BOINC, to have my machines automatically retry their transfers every 6 hours. It has a lot of useful commands when you need to control a plethora of remote hosts.
On Tuesdays, once the project goes down for maintenance, I normally suspend network communication on all of my machines for 6 hours with one click of a bat file.

This is the one I use for my machines at home.
REM Folder that contains boinccmd
set cmdpath=S:\BOINC
REM GUI RPC password of the remote BOINC clients (placeholder value)
set pass=password
REM Suspend network activity for 21600 seconds (6 hours)
set cmd=--set_network_mode never 21600
REM One line per host; lines starting with REM are hosts currently skipped
%cmdpath%\boinccmd --host HAL9000II --passwd %pass% %cmd%
REM %cmdpath%\boinccmd --host HAL9000 --passwd %pass% %cmd%
%cmdpath%\boinccmd --host HTPCII --passwd %pass% %cmd%
%cmdpath%\boinccmd --host SAL9000II --passwd %pass% %cmd%
REM %cmdpath%\boinccmd --host SAL9000 --passwd %pass% %cmd%
%cmdpath%\boinccmd --host AE35 --passwd %pass% %cmd%
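
For anyone managing many more hosts than this, the same fan-out can be generated from a list instead of one hand-written line per machine. The following is only a rough Python sketch of that idea, not part of the original batch file: the function name `build_commands` is made up for illustration, the host list is copied from the active lines above, and the sketch only builds the argument lists rather than running anything.

```python
from typing import List


def build_commands(hosts: List[str], password: str, action: List[str],
                   boinccmd: str = "boinccmd") -> List[List[str]]:
    """Return one boinccmd argument list per host (hypothetical helper)."""
    return [[boinccmd, "--host", host, "--passwd", password, *action]
            for host in hosts]


# Host names copied from the active lines of the batch file above.
HOSTS = ["HAL9000II", "HTPCII", "SAL9000II", "AE35"]

# 6 hours expressed in seconds, as boinccmd expects a duration in seconds.
SNOOZE_6H = ["--set_network_mode", "never", str(6 * 60 * 60)]

if __name__ == "__main__":
    # "password" is the same placeholder used in the batch file.
    for argv in build_commands(HOSTS, "password", SNOOZE_6H):
        print(" ".join(argv))
```

Each argument list could then be handed to `subprocess.run` to actually contact the host. Keeping the hosts in a list makes it easy to skip a machine by removing it rather than commenting out a line.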

mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1170671 - Posted: 13 Nov 2011, 4:55:59 UTC - in response to Message 1170661.  


Thank you for the code snippet. I will try to hack this up to stop all the machines from hitting my HTTP proxy server during the weekly scheduled outages.

MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1170677 - Posted: 13 Nov 2011, 6:41:48 UTC

In fact, once you've got the command or BAT file set up, you can just have it run as a scheduled task via the Windows Task Scheduler.

All you need to do is work out when the scheduled outage starts in your time zone and have one task run to disable comms and another to turn comms back on after the outage.
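
Working out the local start time is just a time zone conversion. Here is a small Python sketch of that arithmetic. The 11:00 AM EDT start and the 6 hour length are taken from figures mentioned elsewhere in this thread and are assumptions about the outage window, the date is an arbitrary Tuesday picked for illustration, and the fixed UTC offsets ignore daylight-saving changes.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets keep the sketch dependency-free; they do not follow
# daylight-saving changes, so adjust them for the date you care about.
EDT = timezone(timedelta(hours=-4), "EDT")


def to_local(start: datetime, utc_offset_hours: float) -> datetime:
    """Convert an aware datetime to a zone at the given fixed UTC offset."""
    return start.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# Assumed outage start: a Tuesday at 11:00 EDT (the date is illustrative).
outage_start = datetime(2011, 8, 2, 11, 0, tzinfo=EDT)
# Assumed outage length of 6 hours.
outage_end = outage_start + timedelta(hours=6)

if __name__ == "__main__":
    # Example: a host whose clock runs at UTC+10.
    print(to_local(outage_start, 10).strftime("%A %H:%M"))
    print(to_local(outage_end, 10).strftime("%A %H:%M"))
```

With these assumptions, a host at UTC+10 would schedule the "disable comms" task for Wednesday 01:00 local time and the "enable comms" task for Wednesday 07:00.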
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1170724 - Posted: 13 Nov 2011, 14:45:44 UTC - in response to Message 1170677.  

There is a duration field in the example HAL9000 gave. I thought it would automatically come back on after the duration had expired, would it not?
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1170745 - Posted: 13 Nov 2011, 15:47:01 UTC - in response to Message 1170724.  

In the manual that I posted a link to it states: "If duration is zero or absent, this mode is permanent. Otherwise, after 'duration' seconds elapse, revert to last permanent mode." So '--set_network_mode never 21600' is effectively a 6 hour snooze of the network if the last setting was '--set_network_mode auto' or '--set_network_mode always'. However, if the last setting was '--set_network_mode never', nothing would change. Before I was using duration I had 2 timed events. One ran on Tuesday at 11:00AM EDT for '--set_network_mode never' and the other at 5:00PM for '--set_network_mode auto'. However, I found duration to be better suited to the job. On my work network sometimes a machine or two will miss a command, so having duration turn the network back on automatically is much better than sending the auto command a few times.
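
The duration rule quoted from the manual can be pictured with a toy model. The Python below is only an illustration of the quoted sentence, not BOINC's actual code: the class `NetworkMode` and its method names are invented for this sketch, and time is passed in as plain seconds so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkMode:
    """Toy model of a permanent mode with an optional timed override."""
    permanent: str = "auto"
    temporary: Optional[str] = None
    expires_at: float = 0.0

    def set_mode(self, mode: str, now: float, duration: float = 0.0) -> None:
        if duration > 0:
            # Timed override: the permanent mode is left untouched.
            self.temporary = mode
            self.expires_at = now + duration
        else:
            # Duration zero or absent: the mode becomes permanent.
            self.permanent = mode
            self.temporary = None

    def current(self, now: float) -> str:
        # Once the duration has elapsed, revert to the last permanent mode.
        if self.temporary is not None and now < self.expires_at:
            return self.temporary
        return self.permanent


if __name__ == "__main__":
    net = NetworkMode(permanent="auto")
    net.set_mode("never", now=0, duration=21600)
    print(net.current(now=3600))   # during the snooze
    print(net.current(now=21600))  # after the snooze has elapsed
```

In this model, a host whose permanent mode is already "never" stays on "never" after the timer runs out, which matches the "nothing would change" case described above.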

In my other bat files that need to repeat every so often I use the timeout command at the end.
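@ECHO OFF
REM Seconds to wait between runs (43200 = 12 hours)
Set dtime=43200
:start
@ECHO Updating everyone %time%
REM Launch the update script minimized, without waiting for it to finish
start /min Update_All.bat
REM Wait dtime seconds; /nobreak ignores key presses
timeout %dtime% /nobreak
goto start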
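
The same "run, wait, repeat" pattern can be sketched in Python for anyone not on Windows. This is an assumption-laden equivalent rather than a translation: `run_every` is a made-up helper, the task is any callable you supply, and unlike `start /min` the task runs in the foreground, so the wait begins only after it returns.

```python
import time
from typing import Callable, Optional


def run_every(task: Callable[[], None], interval_seconds: float,
              repeats: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run task, wait interval_seconds, and repeat.

    repeats=None loops forever, like the goto in the batch file.
    The sleep function is injectable so the loop can be tried without waiting.
    """
    done = 0
    while repeats is None or done < repeats:
        task()
        done += 1
        sleep(interval_seconds)
    return done


if __name__ == "__main__":
    # Dry run: three quick iterations with a no-op sleep instead of 12 hours.
    count = run_every(lambda: print("Updating everyone"), 43200,
                      repeats=3, sleep=lambda seconds: None)
    print(count)
```

Passing `repeats=None` and leaving `sleep` at its default gives the endless 12 hour loop of the batch file.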

I guess there were some advantages of growing up in the DOS era.

I found BOINCcmd to be the easiest way to manage my ~30 machines. At home I only have 1 machine connected to the UPS data connection, so I have an event set up for 'when the computer is on batteries', or whatever it is exactly, that runs the command to suspend computation on all of my other machines. Then, once it is off the battery, it resumes them. If I had a UPS with a network connection I probably wouldn't need to do that, but it is what I have to work with.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.