Panic Mode On (62) Server problems?


rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8732
Credit: 61,623,914
RAC: 55,047
United Kingdom
Message 1176071 - Posted: 6 Dec 2011, 13:37:01 UTC

And as we speak the Crickets are coming back to life - well maybe....
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4589
Credit: 121,541,944
RAC: 54,105
United States
Message 1176075 - Posted: 6 Dec 2011, 13:54:09 UTC - in response to Message 1175706.
Last modified: 6 Dec 2011, 13:54:18 UTC


To save having to monitor the Retry button I made up a little cron job and a wee awk script:

crontab entry:
* * * * * source /home/Compaq_Owner/retryfiles

retryfiles:

cd c:
cd 'Program Files/BOINC'
./boinccmd.exe --get_file_transfers | gawk -f retry.awk

Program Files\BOINC\retry.awk:

/name/ { n = $2;}
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}

As I have so many machines, I have been doing something similar, except that I am using a for loop and have one machine do this for all of my machines over the network.
I do see an occasional 'authorization failure' when it is retrying files, but it doesn't seem to affect anything. Then on the next hourly pass I may not see any.
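
For anyone wanting to try the same thing, here is a minimal sketch of a one-machine-for-all loop. It is an illustration only, not the script described above. The HOSTS list and PASSWD variable are hypothetical placeholders, it assumes each remote client is configured to accept GUI RPC connections from this machine, and it assumes `boinccmd --get_file_transfers` prints `name:` and `xfer active:` lines in the form the quoted awk script implies.

```shell
#!/bin/sh
# Sketch only: retry stalled BOINC transfers on several hosts from one machine.
# HOSTS and PASSWD are hypothetical placeholders; set them for your own network.
PROJECT_URL="http://setiathome.berkeley.edu/"

# Reads `boinccmd --get_file_transfers` output on stdin and prints the name of
# every transfer reported as "xfer active: no". The line format is assumed
# from the awk script quoted earlier in this post.
stalled_names() {
    awk '$1 == "name:" { n = $2 }
         $1 == "xfer" && $2 == "active:" && $3 == "no" { print n }'
}

# Asks one host for its transfers and retries each stalled one.
retry_host() {
    host=$1
    boinccmd --host "$host" --passwd "$PASSWD" --get_file_transfers |
        stalled_names |
        while read -r name; do
            boinccmd --host "$host" --passwd "$PASSWD" \
                --file_transfer "$PROJECT_URL" "$name" retry
        done
}

# Does nothing unless HOSTS is set, e.g. HOSTS="pc1 pc2 pc3".
for h in ${HOSTS:-}; do
    retry_host "$h"
done
```

Saved under a name of your choosing, this could be run from an hourly cron entry with HOSTS and PASSWD set in the environment. Matching on the first field being exactly `name:` is a bit stricter than the `/name/` pattern in the quoted script, so other lines that merely contain the word "name" are ignored.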
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

MikeN
Joined: 24 Jan 11
Posts: 302
Credit: 32,822,153
RAC: 7,163
United Kingdom
Message 1176076 - Posted: 6 Dec 2011, 13:56:27 UTC - in response to Message 1176071.

And as we speak the Crickets are coming back to life - well maybe....


Those crickets are a law unto themselves (not that I am complaining about another hour or so to fill up before the outage).
____________

john3760
Joined: 9 Feb 11
Posts: 334
Credit: 3,400,979
RAC: 0
United Kingdom
Message 1176082 - Posted: 6 Dec 2011, 14:36:00 UTC

I went out for a bit and had 750 WUs dumped on me. I only have one computer, so I will apologise to my wingmen beforehand and try to get through them as quickly as possible.

john3760

Zapped Sparky (Project donor)
Volunteer moderator
Volunteer tester
Joined: 30 Aug 08
Posts: 8896
Credit: 1,320,100
RAC: 709
United Kingdom
Message 1176099 - Posted: 6 Dec 2011, 16:17:45 UTC

Managed to pick up an Astropulse task last night; otherwise, in about an hour it would've been backup project time.
____________
In an alternate universe, it was a ZX81 that asked for clothes, boots and motorcycle.

Client error 418: I'm a teapot

Tropical Goldfish Fish 15: Squeaky bras 'R us

Illusions of normality sufferer

AndrewM
Volunteer tester
Joined: 5 Jan 08
Posts: 361
Credit: 33,872,609
RAC: 0
Australia
Message 1176149 - Posted: 6 Dec 2011, 23:41:21 UTC - in response to Message 1176070.

I'm still dreaming of the day when my GPUs don't run dry 2-3 times a week.

Steve


I'm still dreaming of the week when my GPUs don't run dry 2-3 times a day.


____________
AndrewM

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3721
Credit: 48,768,260
RAC: 1,737
United States
Message 1176166 - Posted: 7 Dec 2011, 0:51:14 UTC

From empty yesterday on both machines, I currently have about 400 units on each after I re-enabled the proxy server.
____________

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2897
Credit: 218,381,374
RAC: 62,793
United States
Message 1176179 - Posted: 7 Dec 2011, 1:48:08 UTC - in response to Message 1176166.

From empty yesterday on both machines, I currently have about 400 units on each after I re-enabled the proxy server.


I really, really believe that if we knew WHY, we'd know something.

Maybe we'd know WHY.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4589
Credit: 121,541,944
RAC: 54,105
United States
Message 1176181 - Posted: 7 Dec 2011, 1:59:43 UTC

During the maintenance outage I noticed all of my "stuck" downloads completed at great speed. I was seeing 800k on several.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1758
Credit: 206,458,599
RAC: 16,722
Australia
Message 1176195 - Posted: 7 Dec 2011, 3:37:00 UTC

I've noticed that when connected direct, if you can get units, the download speed is quite good, up to 25KBps or more. Even though the proxy is still faster, these speeds are the fastest I've ever had direct from the project in 5 years.

I wonder if this means that despite what the Cricket graphs tell us, the actual network loading is not "super saturated" like it is when normally coming back from an outage, just "busy".

This, combined with the great difficulty getting work allocated (usually only one or two units at a time) means we could be looking at a Scheduler problem rather than a network overload.

All the scheduling processes are located on "bane". I wonder if this server is really being the "bane" of our lives ?

T.A.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4589
Credit: 121,541,944
RAC: 54,105
United States
Message 1176204 - Posted: 7 Dec 2011, 4:26:58 UTC - in response to Message 1176195.

I've noticed that when connected direct, if you can get units, the download speed is quite good, up to 25KBps or more. Even though the proxy is still faster, these speeds are the fastest I've ever had direct from the project in 5 years.

I wonder if this means that despite what the Cricket graphs tell us, the actual network loading is not "super saturated" like it is when normally coming back from an outage, just "busy".

This, combined with the great difficulty getting work allocated (usually only one or two units at a time) means we could be looking at a Scheduler problem rather than a network overload.

All the scheduling processes are located on "bane". I wonder if this server is really being the "bane" of our lives ?

T.A.

IIRC, we are subject to the C10K problem. Maybe with the updates Matt was talking about this will no longer be an issue, as some software is not subject to this problem.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Starman
Joined: 15 May 99
Posts: 134
Credit: 39,122,045
RAC: 17,339
Canada
Message 1176377 - Posted: 8 Dec 2011, 1:56:15 UTC

Is there anybody Home?

Looks like something is broken again! Can't report what work units I have completed. Not that it is slowing my decline in RAC by much.
____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 16,209,426
RAC: 7,278
United States
Message 1176378 - Posted: 8 Dec 2011, 2:03:55 UTC - in response to Message 1176377.

I just managed to report a few (28), but it took almost three minutes to complete. It also looks like the Cricket graphs are way down. Not completely dead but struggling.
____________


PROUD MEMBER OF Team Starfire World BOINC

Lint trap (Project donor)
Joined: 30 May 03
Posts: 871
Credit: 27,835,874
RAC: 11,625
United States
Message 1176385 - Posted: 8 Dec 2011, 2:44:18 UTC


I was getting only the front page for a while there. All other pages were reporting the project as down for maintenance. That was soon after 00:00 GMT, IIRC.

Just now I was able to report 90 completed tasks. No problems.

Lt

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8732
Credit: 61,623,914
RAC: 55,047
United Kingdom
Message 1176444 - Posted: 8 Dec 2011, 8:26:12 UTC

It looks as though the upload server has just gone off for a break. Also, the backup server is reporting that it's about 8 hours behind the master, so things aren't too happy in the server room.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,695,266
RAC: 29,028
Australia
Message 1176447 - Posted: 8 Dec 2011, 8:44:17 UTC - in response to Message 1176444.

It looks as though the upload server has just gone off for a break.

Yep.
Once again I'm buried under uploads that won't.
____________
Grant
Darwin NT.

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 174
Credit: 17,393,079
RAC: 9,028
Netherlands
Message 1176468 - Posted: 8 Dec 2011, 10:25:51 UTC

Let's not moan too hard all at once... Things have been picking up for the good for a while now.

The boys in the lab will probably just jumpstart the rigs again when they get in in the morning, and all will be well.

If they would just find a way to stop the "shortie"-storm I'd be a very happy cruncher ;-)
____________

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2897
Credit: 218,381,374
RAC: 62,793
United States
Message 1176469 - Posted: 8 Dec 2011, 10:26:18 UTC - in response to Message 1176447.

It looks as though the upload server has just gone off for a break.

Yep.
Once again I'm buried under uploads that won't.


It's 38 degrees F in Berkeley. Someone open a window.

No, I didn't mean they should jump.

Anthony Arbuzoff
Volunteer tester
Joined: 6 Apr 00
Posts: 204
Credit: 2,598,253
RAC: 1,799
Russia
Message 1176471 - Posted: 8 Dec 2011, 10:45:55 UTC - in response to Message 1176469.


It's 38 degrees F in Berkeley.


Is it about 4 C? So cold in CA? I'm shocked!

____________


Copyright © 2014 University of California