Panic Mode On (62) Server problems?

rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 13318
Credit: 154,456,973
RAC: 116,421
United Kingdom
Message 1176071 - Posted: 6 Dec 2011, 13:37:01 UTC

And as we speak the Crickets are coming back to life - well maybe....


Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

ID: 1176071

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6089
Credit: 155,113,771
RAC: 48,986
United States
Message 1176075 - Posted: 6 Dec 2011, 13:54:09 UTC - in response to Message 1175706.
Last modified: 6 Dec 2011, 13:54:18 UTC


To save having to monitor the Retry button I made up a little cron job and a wee awk script:

crontab entry:
* * * * * source /home/Compaq_Owner/retryfiles

retryfiles:

cd c:
cd 'Program Files/BOINC'
./boinccmd.exe --get_file_transfers | gawk -f retry.awk

Program Files\BOINC\retry.awk:

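# Remember the file name from each "name: <file>" line.
$1 == "name:" { n = $2 }
# When that transfer is not active, ask the client to retry it.
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry") }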

As I have so many machines, I have been doing something similar, except that I use a for loop and have one machine do this for all of my machines over the network.
I do see an occasional 'authorization failure' when it is retrying files, but it doesn't seem to affect anything. Then on the next hourly pass I may not see any.
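
For anyone wanting to try the multi-machine version, here is a minimal sketch of what such a loop might look like. It is not the exact script used here. It assumes a POSIX shell with awk, that boinccmd is on the PATH, that each remote client is set up to accept GUI RPC connections from this machine, and that all hosts share one RPC password. The host names, the BOINC_PASSWD variable and the stalled_transfers helper are made up for illustration, and the parsing assumes the same "name:" and "xfer active:" lines that the awk script above matches.

```shell
#!/bin/sh
# Sketch only: retry stalled BOINC file transfers on several hosts.
# Usage (host names are placeholders):
#   BOINC_PASSWD=secret ./retry_all.sh host1 host2

PROJECT_URL="http://setiathome.berkeley.edu/"

# Read `boinccmd --get_file_transfers` output on stdin and print the name
# of every transfer whose "xfer active" field is "no".
stalled_transfers() {
    awk '
        $1 == "name:"                   { name = $2 }
        /xfer active: no/ && name != "" { print name; name = "" }
    '
}

# With no host arguments the loop body never runs.
for host in "$@"; do
    boinccmd --host "$host" --passwd "$BOINC_PASSWD" --get_file_transfers |
        stalled_transfers |
        while read -r name; do
            boinccmd --host "$host" --passwd "$BOINC_PASSWD" \
                --file_transfer "$PROJECT_URL" "$name" retry
        done
done
```

Keeping the parsing in its own function means it can be checked against saved boinccmd output before it is pointed at real machines.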
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

ID: 1176075

MikeN
Joined: 24 Jan 11
Posts: 314
Credit: 44,947,476
RAC: 11,368
United Kingdom
Message 1176076 - Posted: 6 Dec 2011, 13:56:27 UTC - in response to Message 1176071.

And as we speak the Crickets are coming back to life - well maybe....


Those crickets are a law unto themselves (not that I am complaining about another hour or so to fill up before the outage).

ID: 1176076

john3760
Joined: 9 Feb 11
Posts: 334
Credit: 3,400,979
RAC: 0
United Kingdom
Message 1176082 - Posted: 6 Dec 2011, 14:36:00 UTC

I went out for a bit and had 750 WUs dumped on me. I only have 1 computer, so I will apologise to my wingmen beforehand and try to get through them as quickly as possible.

john3760

ID: 1176082

kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45906
Credit: 814,920,889
RAC: 124,170
United States
Message 1176091 - Posted: 6 Dec 2011, 15:33:02 UTC - in response to Message 1176082.
Last modified: 6 Dec 2011, 15:34:52 UTC

I went out for a bit and had 750 WUs dumped on me. I only have 1 computer, so I will apologise to my wingmen beforehand and try to get through them as quickly as possible.

john3760

Wish I had such luck....
The way it's been working here, my top rig runs out of Seti due to repeated 'no work' responses and just a handful of issued tasks. It has a 0% share on Einstein, so when the GPU runs dry on Seti, it picks up several hours of Einstein work and persistently keeps trying to get work from Seti while that is being crunched. It manages to get something built up and goes back to it when the Einstein is done and then repeats the cycle.

Of course, with today's outage coming up, it's gonna be doing Einstein for the next 12 hours or more.

Only the slower hosts on the project could stay supplied with Seti work the way things are going right now.

The servers have actually held up reasonably well considering the shorty pounding...if we could get a day or two with some datasets split that did not contain 95% VHAR we might be able to get a leg up on things.
Cats.....what more does one need?

Have made friends in this life.
Most were cats.

ID: 1176091

Dimly Lit Lightbulb 😀
Project Donor
Volunteer tester
Joined: 30 Aug 08
Posts: 14363
Credit: 2,923,749
RAC: 4,806
United Kingdom
Message 1176099 - Posted: 6 Dec 2011, 16:17:45 UTC

Managed to pick up an Astropulse last night; otherwise, in about an hour it would've been backup project time.


ID: 1176099

AndrewM
Volunteer tester
Joined: 5 Jan 08
Posts: 369
Credit: 34,275,196
RAC: 0
Australia
Message 1176149 - Posted: 6 Dec 2011, 23:41:21 UTC - in response to Message 1176070.

I'm still dreaming of the day when my GPUs don't run dry 2-3 times a week.

Steve


I'm still dreaming of the week when my GPUs don't run dry 2-3 times a day.


AndrewM

ID: 1176149

arkayn
Project Donor
Volunteer tester
Joined: 14 May 99
Posts: 4097
Credit: 51,576,090
RAC: 1,593
United States
Message 1176166 - Posted: 7 Dec 2011, 0:51:14 UTC

From empty yesterday on both machines, I currently have about 400 units on each after I re-enabled the proxy server.



ID: 1176166

tbret
Volunteer tester
Joined: 28 May 99
Posts: 3373
Credit: 248,429,844
RAC: 19,362
United States
Message 1176179 - Posted: 7 Dec 2011, 1:48:08 UTC - in response to Message 1176166.

From empty yesterday on both machines, I currently have about 400 units on each after I re-enabled the proxy server.


I really, really believe that if we knew WHY, we'd know something.

Maybe we'd know WHY.

ID: 1176179

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6089
Credit: 155,113,771
RAC: 48,986
United States
Message 1176181 - Posted: 7 Dec 2011, 1:59:43 UTC

During the maintenance outage I noticed all of my "stuck" downloads completed at great speed. I was seeing 800k on several.


SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

ID: 1176181

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1790
Credit: 225,291,086
RAC: 10,243
Australia
Message 1176195 - Posted: 7 Dec 2011, 3:37:00 UTC

I've noticed that when connected direct, if you can get units, the download speed is quite good: up to 25 KBps or more. Even though the proxy is still faster, these speeds are the fastest I've ever had direct from the project in 5 years.

I wonder if this means that, despite what the Cricket graphs tell us, the actual network loading is not "super saturated" as it normally is when coming back from an outage, just "busy".

This, combined with the great difficulty getting work allocated (usually only one or two units at a time), means we could be looking at a Scheduler problem rather than a network overload.

All the scheduling processes are located on "bane". I wonder if this server is really being the "bane" of our lives?

T.A.

ID: 1176195

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6089
Credit: 155,113,771
RAC: 48,986
United States
Message 1176204 - Posted: 7 Dec 2011, 4:26:58 UTC - in response to Message 1176195.

I've noticed that when connected direct, if you can get units, the download speed is quite good: up to 25 KBps or more. Even though the proxy is still faster, these speeds are the fastest I've ever had direct from the project in 5 years.

I wonder if this means that, despite what the Cricket graphs tell us, the actual network loading is not "super saturated" as it normally is when coming back from an outage, just "busy".

This, combined with the great difficulty getting work allocated (usually only one or two units at a time), means we could be looking at a Scheduler problem rather than a network overload.

All the scheduling processes are located on "bane". I wonder if this server is really being the "bane" of our lives?

T.A.

IIRC, we are subject to the C10K problem. Maybe with the updates Matt was talking about this will no longer be an issue, as some software is not subject to this problem.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

ID: 1176204

Starman
Joined: 15 May 99
Posts: 186
Credit: 64,936,397
RAC: 21,846
Canada
Message 1176377 - Posted: 8 Dec 2011, 1:56:15 UTC

Is there anybody Home?

Looks like something is broken again! Can't report what work units I have completed. Not that it is slowing my decline in RAC by much.


Gigabyte Z170X-UD5
i7-6700K
32 GB Corsair Vengeance LPX 2400 MHz
Samsung 850 Pro SSD 512GB
WD Caviar Black 2.0TB
Corsair HX850i
Corsair H80iGT
MSI R9-380 Gaming 4G
Visiontek HD7870 2G
Corsair Obsidian 450D

ID: 1176377

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1176378 - Posted: 8 Dec 2011, 2:03:55 UTC - in response to Message 1176377.

I just managed to report a few (28), but it took almost three minutes to complete. It also looks like the Cricket graphs are way down. Not completely dead, but struggling.




PROUD MEMBER OF Team Starfire World BOINC

ID: 1176378

Lint trap
Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 1176385 - Posted: 8 Dec 2011, 2:44:18 UTC


I was getting only the front page for a while there. All other pages were reporting the project as down for maintenance. That was soon after 00:00 GMT, IIRC.

Just now I was able to report 90 completed tasks. No problems.

Lt

ID: 1176385

rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 13318
Credit: 154,456,973
RAC: 116,421
United Kingdom
Message 1176444 - Posted: 8 Dec 2011, 8:26:12 UTC

It looks as though the upload server has just gone off for a break. Also, the backup server is reporting that it's about 8 hours behind the master, so things aren't too happy in the server room.


Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

ID: 1176444

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7476
Credit: 90,985,153
RAC: 46,013
Australia
Message 1176447 - Posted: 8 Dec 2011, 8:44:17 UTC - in response to Message 1176444.

It looks as though the upload server has just gone off for a break.

Yep.
Once again I'm buried under uploads that won't.
Grant
Darwin NT

ID: 1176447

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 29,949,041
RAC: 19,905
Netherlands
Message 1176468 - Posted: 8 Dec 2011, 10:25:51 UTC

Let's not moan too hard all at once... Things have been picking up for the good for a while now.

The boys in the lab will probably just jumpstart the rigs again when they get in in the morning, and all will be well.

If they would just find a way to stop the "shortie"-storm I'd be a very happy cruncher ;-)


ID: 1176468

tbret
Volunteer tester
Joined: 28 May 99
Posts: 3373
Credit: 248,429,844
RAC: 19,362
United States
Message 1176469 - Posted: 8 Dec 2011, 10:26:18 UTC - in response to Message 1176447.

It looks as though the upload server has just gone off for a break.

Yep.
Once again I'm buried under uploads that won't.


It's 38 degrees F in Berkeley. Someone open a window.

No, I didn't mean they should jump.

ID: 1176469

Belthazor
Volunteer tester
Joined: 6 Apr 00
Posts: 218
Credit: 3,991,774
RAC: 4,314
Russia
Message 1176471 - Posted: 8 Dec 2011, 10:45:55 UTC - in response to Message 1176469.


It's 38 degrees F in Berkeley.


Is it about 4 C? So cold in CA? I'm shocked!

ID: 1176471


 