Panic Mode On (62) Server problems?

Message boards : Number crunching : Panic Mode On (62) Server problems?
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1175702 - Posted: 4 Dec 2011, 20:35:32 UTC - in response to Message 1175643.  

Well, things here are business as usual, with my 3 rigs still bouncing on and off the limits.

Cheers.
ID: 1175702 · Report as offensive
musicplayer

Joined: 17 May 10
Posts: 2430
Credit: 926,046
RAC: 0
Message 1175703 - Posted: 4 Dec 2011, 20:36:45 UTC

Umm, it is up and running again. Great!
ID: 1175703 · Report as offensive
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1175706 - Posted: 4 Dec 2011, 20:55:43 UTC - in response to Message 1175621.  

I did just get one work request successfully through for my top rig by spending some quality time with my mouse and the retry button.

Meow meow meow!


I tried that, but mine was too far gone: I only had shorties left and could not press the button fast enough to keep up with them.




To save having to monitor the Retry button I made up a little cron job and a wee awk script:

crontab entry:
* * * * * source /home/Compaq_Owner/retryfiles

retryfiles:

cd c:
cd 'Program Files/BOINC'
./boinccmd.exe --get_file_transfers | gawk -f retry.awk

Program Files\BOINC\retry.awk:

/name/ { n = $2;}
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}

In other words, every minute cron runs retryfiles; retryfiles lists the files awaiting transfer and hands the results to retry.awk. The awk script stores the name of each file as the data passes through it, then if it sees that that file has an inactive transfer it spawns a system command to tell boinccmd to retry the transfer... The nice thing about doing it this way is that I only bother the Berkeley servers if I find a transfer that's in a wait-for-retry state.
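
The same logic can be sketched as a pair of shell functions, which makes it easy to do a dry run before letting it loose. This is only a sketch under assumptions: the function names and the BOINCCMD, PROJECT_URL and DRY_RUN variables are made up for illustration, and the line shapes it matches are taken from the awk patterns above rather than from checked boinccmd output.

```shell
# List the files whose transfers are idle, one name per line.
# Reads the output of `boinccmd --get_file_transfers` on stdin.
idle_transfers() {
    awk '
        /name/              { name = $2 }   # remember the most recent file name
        / xfer active: no/  { print name }  # that file is waiting for a retry
    '
}

# Retry every idle transfer found on stdin.
# BOINCCMD, PROJECT_URL and DRY_RUN are hypothetical knobs, not boinccmd features.
retry_idle_transfers() {
    boinccmd_bin=${BOINCCMD:-./boinccmd}
    url=${PROJECT_URL:-http://setiathome.berkeley.edu/}
    idle_transfers | while IFS= read -r file; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Dry run: show the command instead of bothering the servers.
            echo "$boinccmd_bin --file_transfer $url $file retry"
        else
            # </dev/null keeps boinccmd from eating the rest of the list.
            "$boinccmd_bin" --file_transfer "$url" "$file" retry </dev/null
        fi
    done
}
```

With these defined, something like `./boinccmd --get_file_transfers | DRY_RUN=1 retry_idle_transfers` would print the retry commands without running them.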

This is working well on my two main Windows/NVIDIA boxes; their tasks in progress are slowly rising whereas this morning they both had empty caches. The commands as given should work out-of-the-box with most Linux installations; for Windows you need to install cygwin and its cron service -- or other equivalent software.

To run the script less often than every minute use */<n> as the first entry in the crontab line, where <n> is however many minutes you want to delay between instances, e.g. */5 for every fifth minute.
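
As a small illustration of the */<n> form, here is a hedged sketch of a helper that prints the crontab line for a given interval. The function name cron_every is hypothetical, and it keeps the `source` form from the entry above. Note that */<n> fires on minutes divisible by <n>, so an interval that does not divide 60 leaves a shorter gap at the top of each hour.

```shell
# Print a crontab line that runs a script every N minutes (1 <= N <= 59).
# Hypothetical helper for illustration; it only prints, it installs nothing.
cron_every() {
    n=$1
    script=$2
    case $n in
        ''|*[!0-9]*)
            echo "interval must be a whole number of minutes" >&2
            return 1 ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt 59 ]; then
        echo "interval must be between 1 and 59" >&2
        return 1
    fi
    if [ "$n" -eq 1 ]; then
        # Every minute: a plain * in the minute field.
        echo "* * * * * source $script"
    else
        # Every Nth minute: the step form in the minute field.
        echo "*/$n * * * * source $script"
    fi
}
```

For example, `cron_every 5 /home/Compaq_Owner/retryfiles` prints a line that can be pasted into `crontab -e`.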
ID: 1175706 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1175710 - Posted: 4 Dec 2011, 21:09:02 UTC - in response to Message 1175706.  


I tried that, but mine was too far gone: I only had shorties left and could not press the button fast enough to keep up with them.




To save having to monitor the Retry button I made up a little cron job and a wee awk script:



Snip.

This may be useful to some, a little basic to others, but it's way above my head; mouse clicking is about my limit :-)


Kevin


ID: 1175710 · Report as offensive
Profile ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1175720 - Posted: 4 Dec 2011, 22:19:30 UTC - in response to Message 1175710.  


To save having to monitor the Retry button I made up a little cron job and a wee awk script:



Snip.

This may be useful to some, a little basic to others, but it's way above my head; mouse clicking is about my limit :-)



Each to his own, Kev. You know I need to understand this sort of thing for my job; the beauty of computers is that they can relieve us of button-clicking duty, they never get tired. So ultimately a little bit of time spent learning something like awk (or python, or perl if you want even more capability) _can_ pay you back in added flexibility. But it's not compulsory...

Cheers, mate!
ID: 1175720 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1175734 - Posted: 4 Dec 2011, 22:44:00 UTC - in response to Message 1175720.  


Each to his own, Kev. You know I need to understand this sort of thing for my job; the beauty of computers is that they can relieve us of button-clicking duty, they never get tired. So ultimately a little bit of time spent learning something like awk (or python, or perl if you want even more capability) _can_ pay you back in added flexibility. But it's not compulsory...

Cheers, mate!


I am just a lorry driver by trade. Yes, they are sneaking into the cabs; there is even a bunch of electronics between my right foot and the engine now, but the thing I have to worry about is what's on the dash (i.e. call out the tow truck).

I only dabble with computers for pleasure, and due to other commitments time is in very short supply. If or when I can find the time, increasing my capabilities with a computer is on my to-do list.



Kevin


ID: 1175734 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1175767 - Posted: 5 Dec 2011, 2:37:03 UTC

My APs upload first try every time one finishes. Scheduler requests go through every time, but about 99.5% of the time respond with "no tasks available" or "your app_info.xml file doesn't have a usable version of Seti@Home Enhanced." On the rare occasion that I do get issued an AP, it instant-fails 1-10 times and then finally goes through. I haven't hit any buttons in at least a month.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1175767 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1175798 - Posted: 5 Dec 2011, 7:33:32 UTC - in response to Message 1175192.  


Take a look at Scarecrow's graphs.
A surge as the backlog of uploads goes through is to be expected, but for it to be sustained at over 110,000 for several hours? Talk about a hammering.
Grant
Darwin NT
ID: 1175798 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1175805 - Posted: 5 Dec 2011, 8:53:30 UTC - in response to Message 1175798.  
Last modified: 5 Dec 2011, 9:12:56 UTC

And it would appear it was too much for too long: the uploads are backing up yet again.


EDIT: they have finally cleared, but the inbound traffic is looking jagged again. Not a good sign.
Grant
Darwin NT
ID: 1175805 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1175917 - Posted: 5 Dec 2011, 19:07:02 UTC

One, and a third of an astropulse left to go. I'm starting to panic a bit.

Member of the People Encouraging Niceness In Society club.

ID: 1175917 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1175976 - Posted: 6 Dec 2011, 1:19:53 UTC
Last modified: 6 Dec 2011, 1:49:32 UTC

Oops, looks like uploads have gone MGD again.

All my rigs are now in "project backoff" for uploads and all have been getting "No Tasks Available" when asking for work for some hours.

And it's now just after knock-off time in Berkeley, so there will be no one there to apply the rubber hammer. :P

EDIT: Looks like I was wrong about the rubber hammer: about 15 minutes after I posted, all uploads cleared at good speed.

T.A.
ID: 1175976 · Report as offensive
Amauri
Volunteer tester

Joined: 18 May 08
Posts: 26
Credit: 1,107,140
RAC: 0
Brazil
Message 1176024 - Posted: 6 Dec 2011, 4:14:00 UTC - in response to Message 1175706.  

To save having to monitor the Retry button I made up a little cron job and a wee awk script:

crontab entry:
* * * * * source /home/Compaq_Owner/retryfiles

retryfiles:

cd c:
cd 'Program Files/BOINC'
./boinccmd.exe --get_file_transfers | gawk -f retry.awk

Program Files\BOINC\retry.awk:

/name/ { n = $2;}
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}



Great job, Ivan, thank you!
ID: 1176024 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1176044 - Posted: 6 Dec 2011, 8:27:24 UTC - in response to Message 1176024.  


Once again, uploads accumulate.
Grant
Darwin NT
ID: 1176044 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1176067 - Posted: 6 Dec 2011, 13:14:28 UTC

The Cricket graph has just baselined for uploads and downloads. That's probably that until after the weekly outage.
ID: 1176067 · Report as offensive
Profile SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6651
Credit: 121,090,076
RAC: 0
United States
Message 1176070 - Posted: 6 Dec 2011, 13:34:55 UTC

I'm still dreaming of the day when my GPUs don't run dry 2-3 times a week.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1176070 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1176071 - Posted: 6 Dec 2011, 13:37:01 UTC

And as we speak the Crickets are coming back to life - well maybe....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1176071 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1176075 - Posted: 6 Dec 2011, 13:54:09 UTC - in response to Message 1175706.  
Last modified: 6 Dec 2011, 13:54:18 UTC


To save having to monitor the Retry button I made up a little cron job and a wee awk script:

crontab entry:
* * * * * source /home/Compaq_Owner/retryfiles

retryfiles:

cd c:
cd 'Program Files/BOINC'
./boinccmd.exe --get_file_transfers | gawk -f retry.awk

Program Files\BOINC\retry.awk:

/name/ { n = $2;}
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}

As I have so many machines, I have been doing something similar, except I am using a for loop and I have 1 machine do this for all of my machines over the network.
I do see an occasional 'authorization failure' when it is retrying files, but it doesn't seem to affect anything. Then on the next hourly pass I may not see any.
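
For anyone wanting to try the one-machine-drives-all approach, a minimal sketch might look like the following. It is not the actual script described above: the function name retry_on_hosts, the BOINCCMD and PROJECT_URL variables, and the host names in the usage line are all assumptions. It relies on boinccmd's --host and --passwd options, and assumes each target machine has been set up to accept remote GUI RPC connections.

```shell
# Retry idle transfers on several BOINC hosts from one machine.
# Usage: retry_on_hosts RPC_PASSWORD HOST [HOST...]
# Hypothetical sketch; BOINCCMD and PROJECT_URL are illustrative overrides.
retry_on_hosts() {
    rpc_passwd=$1
    shift
    cmd=${BOINCCMD:-boinccmd}
    url=${PROJECT_URL:-http://setiathome.berkeley.edu/}
    for host in "$@"; do
        # Ask the remote client for its transfers, keep the idle ones.
        "$cmd" --host "$host" --passwd "$rpc_passwd" --get_file_transfers |
        awk '/name/ { n = $2 } / xfer active: no/ { print n }' |
        while IFS= read -r file; do
            # A failed retry is logged and the loop carries on.
            "$cmd" --host "$host" --passwd "$rpc_passwd" \
                --file_transfer "$url" "$file" retry </dev/null ||
                echo "retry failed on $host: $file" >&2
        done
    done
}
```

A call such as `retry_on_hosts mypassword rig1 rig2 rig3` (placeholder names) would walk the three machines in turn. Bear in mind that a password passed on the command line is visible in the process list while the command runs.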
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1176075 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1176076 - Posted: 6 Dec 2011, 13:56:27 UTC - in response to Message 1176071.  

And as we speak the Crickets are coming back to life - well maybe....


Those crickets are a law unto themselves (not that I am complaining about another hour or so to fill up before the outage).
ID: 1176076 · Report as offensive
Profile john3760
Joined: 9 Feb 11
Posts: 334
Credit: 3,400,979
RAC: 0
United Kingdom
Message 1176082 - Posted: 6 Dec 2011, 14:36:00 UTC

I went out for a bit and had 750 WUs dumped on me. I only have 1 computer, so I will apologise to my wingmen beforehand and try to get through them as quickly as possible.

john3760
ID: 1176082 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1176091 - Posted: 6 Dec 2011, 15:33:02 UTC - in response to Message 1176082.  
Last modified: 6 Dec 2011, 15:34:52 UTC

I went out for a bit and had 750 WUs dumped on me. I only have 1 computer, so I will apologise to my wingmen beforehand and try to get through them as quickly as possible.

john3760

Wish I had such luck....
The way it's been working here, my top rig runs out of Seti due to repeated 'no work' responses and just a handful of issued tasks. It has a 0% share on Einstein, so when the GPU runs dry on Seti, it picks up several hours of Einstein work and persistently keeps trying to get work from Seti while that is being crunched. It manages to get something built up and goes back to it when the Einstein is done and then repeats the cycle.

Of course, with today's outage coming up, it's gonna be doing Einstein for the next 12 hours or more.

Only the slower hosts on the project could stay supplied with Seti work the way things are going right now.

The servers have actually held up reasonably well considering the shorty pounding...if we could get a day or two with some datasets split that did not contain 95% VHAR we might be able to get a leg up on things.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1176091 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.