I know I shouldn't look a gift horse in the mouth, but...



Message boards : Number crunching : I know I shouldn't look a gift horse in the mouth, but...

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 789,627,285
RAC: 57,864
United States
Message 1170060 - Posted: 11 Nov 2011, 12:19:41 UTC

Anyone know what might have changed yesterday?

After a frustrating week of watching my RAC crash, starting yesterday afternoon my task downloads suddenly began reporting over 100 Kbs, and I am finally starting to build up a cache of tasks. I have 150 machines that for a week could not get many tasks to download successfully. I have the resource share for SETI set to 10000 and Einstein set to 0, but I was running over 90 percent Einstein. Spending hours clicking retry on downloads was not yielding many completed downloads. All of my machines now have almost a full day's worth of cache. Yeah!!! I lose most of the machines in December, but if this good fortune continues I might have a chance to move up a notch in the standings before I lose time on those machines.

My sincerest thanks to whoever cleaned out the pipes. My mouse buttons will really appreciate the rest.
Thank you very much, and let the good times roll!!!!
____________

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 789,627,285
RAC: 57,864
United States
Message 1170533 - Posted: 12 Nov 2011, 16:41:34 UTC - in response to Message 1170060.

Unfortunately it was a short-lived situation. Later that morning I was back to the CPUs completing tasks faster than new downloads would complete, and the cache is already dwindling. I guess I need to find a different offering for the network gods.
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24387
Credit: 519,750
RAC: 26
United States
Message 1170538 - Posted: 12 Nov 2011, 17:02:49 UTC - in response to Message 1170533.

Unfortunately it was a short-lived situation. Later that morning I was back to the CPUs completing tasks faster than new downloads would complete, and the cache is already dwindling. I guess I need to find a different offering for the network gods.

Were you waving a live chicken or a dead one?
____________


BOINC WIKI

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 789,627,285
RAC: 57,864
United States
Message 1170542 - Posted: 12 Nov 2011, 17:13:00 UTC - in response to Message 1170538.

It was a rubber one. Too many PETA members in my town....
____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4086
Credit: 111,836,125
RAC: 147,782
United States
Message 1170552 - Posted: 12 Nov 2011, 18:05:46 UTC - in response to Message 1170542.

It was a rubber one. Too many PETA members in my town....

Is that People Eating Tasty Animals, or People for the Ethical Treatment of Animals?*

I noticed a few of my machines are at the same point, processing things faster than they can download. Such is the will of the S@H gods.

*Disclaimer: I don't eat animals nor am I a member of either group.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38925
Credit: 579,117,752
RAC: 512,085
United States
Message 1170555 - Posted: 12 Nov 2011, 18:19:25 UTC - in response to Message 1170552.

It was a rubber one. Too many PETA members in my town....

Is that People Eating Tasty Animals, or People for the Ethical Treatment of Animals?*

I noticed a few of my machines are at the same point, processing things faster than they can download. Such is the will of the S@H gods.

*Disclaimer: I don't eat animals nor am I a member of either group.

I think right now it's 'People Eating Tasty Astropulse'.....
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

speedbump
Joined: 19 May 01
Posts: 247
Credit: 192,906,380
RAC: 0
United States
Message 1170578 - Posted: 12 Nov 2011, 19:56:17 UTC

I noticed your RAC dropping and was wondering why. Your machines rely on CPU work units, which I don't have a problem getting. Most of my RAC is from GPU work units, which are hard to come by in sufficient quantities at the moment. I think we have all had problems getting them to download. I keep having to force them, which is hard with several machines located all over. I can only imagine what it is like with the number you have, and I would think they are spread over a much greater distance.
____________

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 789,627,285
RAC: 57,864
United States
Message 1170647 - Posted: 13 Nov 2011, 1:32:19 UTC - in response to Message 1170578.

Most of these servers are actually in the same building. 80 percent of them are blade servers and we can get 64 of them per 6-foot cabinet (4 chassis with 16 blades per chassis).

I use one of the excellent tools from eFMer, BoincTasks, http://www.efmer.eu/boinc/boinc_tasks/download.html. (thanks Fred) There is a "retry all" selection that can be used to trigger download retries on multiple machines. This tool is "a must" to be able to manage a large number of machines.

I can spend a lot of time clicking retry to get the server caches full. (I guess SETI benefits from my OCD.) For the last few weeks I have not been able to keep a cache on most of the servers because downloads can take up to 50 retries before completing. Last week I was not able to get many tasks at all and I had to let the machines crunch my backup project, Einstein@Home.


____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4086
Credit: 111,836,125
RAC: 147,782
United States
Message 1170661 - Posted: 13 Nov 2011, 3:17:51 UTC - in response to Message 1170647.

Most of these servers are actually in the same building. 80 percent of them are blade servers and we can get 64 of them per 6-foot cabinet (4 chassis with 16 blades per chassis).

I use one of the excellent tools from eFMer, BoincTasks, http://www.efmer.eu/boinc/boinc_tasks/download.html. (thanks Fred) There is a "retry all" selection that can be used to trigger download retries on multiple machines. This tool is "a must" to be able to manage a large number of machines.

I can spend a lot of time clicking retry to get the server caches full. (I guess SETI benefits from my OCD.) For the last few weeks I have not been able to keep a cache on most of the servers because downloads can take up to 50 retries before completing. Last week I was not able to get many tasks at all and I had to let the machines crunch my backup project, Einstein@Home.


I use BOINCcmd, which is part of BOINC, to have my machines automatically retry their transfers every 6 hours. It has a lot of useful commands when you need to control a plethora of remote hosts.
On Tuesdays once we go down for maintenance I normally suspend network communication for all of my machines for 6 hours with 1 click of a bat file.

This is one I use for my machines at home.
set cmdpath=S:\BOINC
set pass=password
set cmd=--set_network_mode never 21600
%cmdpath%\boinccmd --host HAL9000II --passwd %pass% %cmd%
REM %cmdpath%\boinccmd --host HAL9000 --passwd %pass% %cmd%
%cmdpath%\boinccmd --host HTPCII --passwd %pass% %cmd%
%cmdpath%\boinccmd --host SAL9000II --passwd %pass% %cmd%
REM %cmdpath%\boinccmd --host SAL9000 --passwd %pass% %cmd%
%cmdpath%\boinccmd --host AE35 --passwd %pass% %cmd%
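
For anyone not on Windows, roughly the same thing can be sketched in Python. Treat this as an illustration only: the host names are the active ones from the BAT file, `BOINCCMD`, `build_commands` and `run_all` are made-up names, and it assumes boinccmd is on the PATH and that every host accepts the same RPC password.

```python
import subprocess

BOINCCMD = "boinccmd"  # assumption: boinccmd is on the PATH
HOSTS = ["HAL9000II", "HTPCII", "SAL9000II", "AE35"]  # active hosts from the BAT file


def build_commands(hosts, password, mode="never", duration=21600):
    """Return one boinccmd argument list per host (hypothetical helper)."""
    return [
        [BOINCCMD, "--host", host, "--passwd", password,
         "--set_network_mode", mode, str(duration)]
        for host in hosts
    ]


def run_all(commands):
    """Run each command and return the hosts whose command failed."""
    failed = []
    for cmd in commands:
        host = cmd[2]  # the value that follows --host
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except OSError:
            # boinccmd itself could not be started
            failed.append(host)
            continue
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling `run_all(build_commands(HOSTS, "password"))` would send the 6 hour network snooze to each host and hand back the ones that missed it, which makes it easy to resend to just those.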

____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 789,627,285
RAC: 57,864
United States
Message 1170671 - Posted: 13 Nov 2011, 4:55:59 UTC - in response to Message 1170661.


I use BOINCcmd, which is part of BOINC, to have my machines automatically retry their transfers every 6 hours. It has a lot of useful commands when you need to control a plethora of remote hosts.
On Tuesdays once we go down for maintenance I normally suspend network communication for all of my machines for 6 hours with 1 click of a bat file.

This is one I use for my machines at home.
set cmdpath=S:\BOINC
set pass=password
set cmd=--set_network_mode never 21600
%cmdpath%\boinccmd --host HAL9000II --passwd %pass% %cmd%
REM %cmdpath%\boinccmd --host HAL9000 --passwd %pass% %cmd%
%cmdpath%\boinccmd --host HTPCII --passwd %pass% %cmd%
%cmdpath%\boinccmd --host SAL9000II --passwd %pass% %cmd%
REM %cmdpath%\boinccmd --host SAL9000 --passwd %pass% %cmd%
%cmdpath%\boinccmd --host AE35 --passwd %pass% %cmd%


Thank you for the code snippet. I will try to hack this up to stop all the machines from hitting my HTTP proxy server during the weekly scheduled outages.

____________

MarkJ
Project donor
Volunteer tester
Joined: 17 Feb 08
Posts: 937
Credit: 22,616,889
RAC: 89,286
Australia
Message 1170677 - Posted: 13 Nov 2011, 6:41:48 UTC

In fact, once you've got the command or BAT file set up, you can just have it run as a scheduled task via the Windows Task Scheduler.

All you need to do is work out when the scheduled outage starts in your time zone, then have one task run to disable comms and another to turn comms back on after the outage.
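
Working out the local start time is simple arithmetic, but it is easy to get the day wrong when the outage crosses midnight for you. A minimal Python sketch, assuming fixed UTC offsets; the 11:00 start at UTC-4 and the UTC+11 site are placeholders, and `outage_local_time` is a made-up helper:

```python
from datetime import datetime, timedelta, timezone


def outage_local_time(start, local_offset_hours):
    """Convert a timezone-aware outage start to a fixed local UTC offset."""
    local_tz = timezone(timedelta(hours=local_offset_hours))
    return start.astimezone(local_tz)


# Placeholder values: an outage starting on a Tuesday at 11:00, UTC-4,
# seen from a site running at UTC+11.
outage_start = datetime(2011, 11, 15, 11, 0,
                        tzinfo=timezone(timedelta(hours=-4)))
local_start = outage_local_time(outage_start, 11)
print(local_start.strftime("%Y-%m-%d %H:%M"))
```

Fixed offsets ignore daylight saving changes, so the scheduled task times need checking again whenever either end changes its clocks.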

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 789,627,285
RAC: 57,864
United States
Message 1170724 - Posted: 13 Nov 2011, 14:45:44 UTC - in response to Message 1170677.

In fact, once you've got the command or BAT file set up, you can just have it run as a scheduled task via the Windows Task Scheduler.

All you need to do is work out when the scheduled outage starts in your time zone, then have one task run to disable comms and another to turn comms back on after the outage.

There is a duration field in the example HAL9000 gave. I thought it would automatically come back on after the duration had expired, would it not?
____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4086
Credit: 111,836,125
RAC: 147,782
United States
Message 1170745 - Posted: 13 Nov 2011, 15:47:01 UTC - in response to Message 1170724.

In fact, once you've got the command or BAT file set up, you can just have it run as a scheduled task via the Windows Task Scheduler.

All you need to do is work out when the scheduled outage starts in your time zone, then have one task run to disable comms and another to turn comms back on after the outage.

There is a duration field in the example HAL9000 gave. I thought it would automatically come back on after the duration had expired, would it not?

In the manual that I posted a link to it states: "If duration is zero or absent, this mode is permanent. Otherwise, after 'duration' seconds elapse, revert to last permanent mode." So '--set_network_mode never 21600' is effectively a 6 hour snooze of the network if the last permanent setting was '--set_network_mode auto' or '--set_network_mode always'. However, if the last permanent setting was '--set_network_mode never', nothing would change. Before I was using duration I was using 2 timed events. One ran on Tuesday at 11:00AM EDT for '--set_network_mode never' and the other at 5:00PM for '--set_network_mode auto'. However, I found duration to be better suited to the job. On my work network sometimes a machine or two will miss a command, so having duration turn the network back on automatically is much better than sending the auto command a few times.
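
To make the quoted rule concrete, here is a toy Python model of it. It only mirrors the sentence from the manual; it is not how the BOINC client is actually written, and `NetworkMode` and its methods are invented for the illustration.

```python
class NetworkMode:
    """Toy model of the quoted rule, not BOINC's actual implementation."""

    def __init__(self, permanent="auto"):
        self.permanent = permanent  # last permanent mode
        self.temporary = None       # temporary override, if any
        self.expires_at = None      # time in seconds when the override ends

    def set_mode(self, mode, duration=0, now=0.0):
        if duration > 0:
            # Temporary: remember when to fall back to the permanent mode.
            self.temporary = mode
            self.expires_at = now + duration
        else:
            # Zero or absent duration: the mode is permanent.
            self.permanent = mode
            self.temporary = None
            self.expires_at = None

    def current(self, now):
        """Return the mode in effect at time `now` (seconds)."""
        if self.temporary is not None and now < self.expires_at:
            return self.temporary
        return self.permanent


snooze = NetworkMode("auto")
snooze.set_mode("never", duration=21600, now=0)
print(snooze.current(3600), snooze.current(21600))
```

Starting the same model from a permanent "never" shows the other case: the snooze expires and the mode is still "never", so nothing changes.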

In my other bat files that need to repeat every so often I use the timeout command at the end.
@ECHO OFF
Set dtime=43200
:start
@ECHO Updating everyone %time%
start /min Update_All.bat
timeout %dtime% /nobreak
goto start
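
The same repeat-forever pattern in Python, as a rough sketch. `repeat` is a made-up helper, `task` stands in for whatever Update_All.bat does, and the `rounds` and `sleep` parameters exist only so the loop can be tried out without waiting 12 hours.

```python
import time


def repeat(task, interval_seconds, rounds=None, sleep=time.sleep):
    """Call task(), wait interval_seconds, and repeat.

    rounds=None loops forever like the BAT file; a number stops
    after that many calls.
    """
    done = 0
    while rounds is None or done < rounds:
        print("Updating everyone", time.strftime("%H:%M:%S"))
        task()
        done += 1
        sleep(interval_seconds)
    return done
```

For example, `repeat(update_all, 43200)` would match the 12 hour interval above, where `update_all` is whatever function you write to contact your hosts.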

I guess there were some advantages of growing up in the DOS era.

I found using BOINCcmd to be the easiest way to manage my ~30 machines. At home I only have 1 machine connected to the UPS data connection. So I have an event set up for 'when the computer is on batteries', or whatever it is called exactly, that runs the command to suspend computation on all of my other machines. Then once it is off the battery, it resumes them. If I had a UPS with a network connection I probably wouldn't need to do that, but it is what I have to work with.
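
The UPS trick can be sketched in Python as a function that turns a power event into boinccmd calls. This is a guess at the shape of it, not the actual event script: `power_event_commands` is a made-up name, it uses boinccmd's `--set_run_mode`, and it assumes "resume" means going back to `auto` rather than `always`.

```python
def power_event_commands(on_battery, hosts, password, boinccmd="boinccmd"):
    """Build one boinccmd argument list per host for a UPS power event."""
    # Assumption: suspend computation on battery, return to auto on mains.
    mode = "never" if on_battery else "auto"
    return [
        [boinccmd, "--host", host, "--passwd", password,
         "--set_run_mode", mode]
        for host in hosts
    ]
```

Each list can be handed to `subprocess.run` from whatever the 'on batteries' event launches, once for the suspend and once for the resume.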
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!


Copyright © 2014 University of California