Huh (Jun 08 2007)

Message boards : Technical News : Huh (Jun 08 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 584120 - Posted: 9 Jun 2007, 0:04:08 UTC
Last modified: 9 Jun 2007, 0:04:34 UTC

Around 10am this morning gowron's nfsds were all in disk wait. Not sure why, but that pretty much hosed the whole download part of our system. Jeff's been fighting with it all day. I've been at home, chiming in with my two cents every so often. Hopefully he'll get beyond it before too long.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 584120
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 584128 - Posted: 9 Jun 2007, 0:25:34 UTC - in response to Message 584120.  

Hopefully he'll get beyond it before too long.

May have just done it.
Network traffic shows the flood gates have opened.

Grant
Darwin NT
ID: 584128
keeper97
Joined: 16 Sep 02
Posts: 2
Credit: 271,318
RAC: 0
Canada
Message 584220 - Posted: 9 Jun 2007, 3:25:41 UTC

Well, everyone seems to be in a holding pattern. I'm not reporting, but I'm continuing to crunch away. Good luck guys ;-D
ID: 584220
Martin Johnson
Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 584245 - Posted: 9 Jun 2007, 4:03:08 UTC

I think it is interesting and highly creditable that you have been able to identify the ghost units, and re-send them. My one and only ghost was re-issued to me today when I asked for work. Well Done!
ID: 584245
Jim Geuin
Joined: 17 May 99
Posts: 6
Credit: 5,538,490
RAC: 32
United States
Message 585201 - Posted: 10 Jun 2007, 20:12:55 UTC

What is the problem today? I looked at the server status and it looks like essentially everything is down. I hope it is scheduled downtime and not another catastrophe.
ID: 585201
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 585236 - Posted: 10 Jun 2007, 21:32:25 UTC - in response to Message 585201.  

What is the problem today? I looked at the server status and it looks like essentially everything is down. I hope it is scheduled downtime and not another catastrophe.



I have no clue; however, Beta is affected by this "whatever-it-is" too... Also, accessing the Websites for SETI and Beta is very problematic for me. At first I thought I was having the usual Time Warner problems; however, I can get into Rosetta, Einstein, and NanoHive just fine - only SETI and Beta Websites and BOINC connections seem to be affected.


TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 585236
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 585241 - Posted: 10 Jun 2007, 21:43:21 UTC - in response to Message 585236.  

What is the problem today? I looked at the server status and it looks like essentially everything is down. I hope it is scheduled downtime and not another catastrophe.



I have no clue; however, Beta is affected by this "whatever-it-is" too... Also, accessing the Websites for SETI and Beta is very problematic for me. At first I thought I was having the usual Time Warner problems; however, I can get into Rosetta, Einstein, and NanoHive just fine - only SETI and Beta Websites and BOINC connections seem to be affected.



It's definitely Seti server problems of some sort. The Cricket graph tells the tale. Right now the server status page shows everything up but the splitters, but something is definitely borking comms in and out, even for the forums.
Uploads are working fine, but no reporting or downloading new work. May have to wait until tomorrow when everyone gets into work to get it sorted again, unless they are still there working on it.

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 585241
Dominic
Joined: 25 Oct 99
Posts: 2
Credit: 7,660
RAC: 0
United States
Message 585303 - Posted: 11 Jun 2007, 0:58:46 UTC - in response to Message 585241.  

Looks like I got one of the last work units to get out before the big slowdown. 134190219

12 hours later copies of the work unit have not been sent to additional users.

I have a very slow machine (about 1 hour of CPU time per unit of credit), so I limit my queue to reduce the latency in getting results back. I'm currently set to get the next work unit about 12 hours before completing the last. Looks like I should increase that some.
ID: 585303
KWSN - MajorKong
Volunteer tester
Joined: 5 Jan 00
Posts: 2892
Credit: 1,499,890
RAC: 0
United States
Message 585304 - Posted: 11 Jun 2007, 1:03:55 UTC - in response to Message 585241.  


It's definitely Seti server problems of some sort. The Cricket graph tells the tale. Right now the server status page shows everything up but the splitters, but something is definitely borking comms in and out, even for the forums.
Uploads are working fine, but no reporting or downloading new work. May have to wait until tomorrow when everyone gets into work to get it sorted again, unless they are still there working on it.


On the server status page, notice the large number of workunits (260318) waiting on assimilation. Also notice the 5 hour backlog on the transitioners. Add the symptoms of slow forum response, and trouble reporting completed work and requesting new work but no trouble uploading completed work.

My guess is that sidious (the BOINC database machine) is overloaded due to a large number of things it is being asked to do at the moment. Furthermore, I guess that the ghost workunit aborts over the last two days are slamming the system. Maybe part of the bottleneck is in the assimilators' access to the master science database as well, I dunno. But this theory seems to fit. Everything points to a bottleneck on sidious. Slow forums (but requests are eventually handled), trouble reporting (some are working, though... just had some of mine report a while ago), trouble getting new work downloaded... all depend on sidious. Uploading completed results doesn't really hit sidious that much, so they are somewhat unaffected. The staff likely has the splitters turned off to ease the load on sidious until the backlog clears.

That's my guess.
ID: 585304
Haos.PL
Volunteer tester
Joined: 18 Mar 04
Posts: 63
Credit: 3,268,546
RAC: 0
Poland
Message 585307 - Posted: 11 Jun 2007, 1:22:35 UTC

Again I have issues with no work available on the chicken app. I'm switching temporarily to E@H... in 50h, when the Einstein unit's done, all problems should be sorted out, I hope :)
ID: 585307
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 585312 - Posted: 11 Jun 2007, 7:04:25 UTC - in response to Message 585307.  

Again I have issues with no work available on the chicken app.

There is no work available for anyone.
The Server Status page shows things are up again, but there is still very little coming down the pipes (Cricket shows it's less than 1/4 of what it should be after such an outage).
I think it requires people's presence in the Lab; the remote work doesn't appear to have sorted it out.
Grant
Darwin NT
ID: 585312
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 585372 - Posted: 11 Jun 2007, 14:52:33 UTC
Last modified: 11 Jun 2007, 14:54:43 UTC

I'm getting the dreaded
"6/11/2007 7:49:33 AM|SETI@home|Scheduler request failed: HTTP service unavailable"
message in my Messages tab whenever the client contacts the scheduler, both when requesting new work and when just trying to report.

Hello, from Albany, CA!...
ID: 585372
JPP
Joined: 31 May 99
Posts: 18
Credit: 59,436,360
RAC: 47
France
Message 585400 - Posted: 11 Jun 2007, 16:08:42 UTC


Well,
I can just confirm it is broken (again): HTTP service unavailable.
The pain I see here is that after many failures, the app needs to download the master file again. I wonder why.
Cheers,
jeanpierrejpp

ID: 585400
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 585428 - Posted: 11 Jun 2007, 17:46:20 UTC - in response to Message 585401.  


Well,
I can just confirm it is broken (again): HTTP service unavailable.
The pain I see here is that after many failures, the app needs to download the master file again. I wonder why.
Cheers,
jeanpierrejpp

The master file gives the URL of the Scheduler. So loading the master file is just checking to make sure the failures aren't because the URL has changed.
Joe
ID: 585428
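
As a rough illustration of the master-file check Joe describes above, here is a minimal Python sketch. It is not the actual BOINC client code: the class name `SchedulerClient`, the callbacks `fetch_master` and `send_request`, and the `max_failures` threshold are all hypothetical, made up only to show the idea that repeated scheduler failures trigger a re-read of the master file in case the scheduler URL has changed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SchedulerClient:
    """Toy model of a client that re-reads the master file after repeated failures."""

    master_url: str
    # Hypothetical callback: downloads the master file and returns the scheduler URL.
    fetch_master: Callable[[str], str]
    # Hypothetical callback: sends a scheduler request, returns True on success.
    send_request: Callable[[str], bool]
    # Assumed threshold; the real client's value is not given in this thread.
    max_failures: int = 10
    scheduler_url: str = ""
    failures: int = 0

    def contact(self) -> bool:
        # First contact: learn the scheduler URL from the master file.
        if not self.scheduler_url:
            self.scheduler_url = self.fetch_master(self.master_url)

        if self.send_request(self.scheduler_url):
            self.failures = 0
            return True

        self.failures += 1
        if self.failures >= self.max_failures:
            # Too many failures in a row: the URL may have moved, so check
            # the master file again and start counting from zero.
            self.scheduler_url = self.fetch_master(self.master_url)
            self.failures = 0
        return False
```

In this sketch the re-download is cheap insurance: if the scheduler is merely overloaded, the master file returns the same URL and nothing changes.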
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 586102 - Posted: 13 Jun 2007, 2:42:56 UTC - in response to Message 585307.  

Again I have issues with no work available on the chicken app. I'm switching temporarily to E@H... in 50h, when the Einstein unit's done, all problems should be sorted out, I hope :)

Why switch? Why not just assign resource shares and let BOINC do what it can get?
ID: 586102
Odysseus
Volunteer tester
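
To make the resource-share suggestion above concrete, here is a small Python sketch of a simplified model, not BOINC's real scheduling logic. It assumes each attached project gets CPU time in proportion to its share, and that a project with no work available simply drops out of the split. The function name `cpu_fractions` and the example share values are hypothetical.

```python
def cpu_fractions(shares: dict[str, float], has_work: set[str]) -> dict[str, float]:
    """Split CPU time among projects in proportion to their resource shares.

    Simplified model: projects with no work available get nothing, and their
    share is redistributed among the projects that do have work.
    """
    active = {p: s for p, s in shares.items() if p in has_work and s > 0}
    total = sum(active.values())
    if total == 0:
        # Nobody has work: the machine sits idle in this model.
        return {p: 0.0 for p in shares}
    return {p: active.get(p, 0.0) / total for p in shares}


# Example shares (made-up numbers): mostly SETI@home, some Einstein@Home.
shares = {"SETI@home": 75.0, "Einstein@Home": 25.0}

# Normal operation: both projects have work.
normal = cpu_fractions(shares, {"SETI@home", "Einstein@Home"})

# During an outage like this one: only Einstein@Home has work.
outage = cpu_fractions(shares, {"Einstein@Home"})
```

Under this model no manual switching is needed: while SETI@home has no work, Einstein@Home picks up the whole machine, and the split returns to the configured shares once work flows again.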
Joined: 26 Jul 99
Posts: 1808
Credit: 6,701,347
RAC: 6
Canada
Message 586124 - Posted: 13 Jun 2007, 3:26:00 UTC - in response to Message 586102.  

Why switch? Why not just assign resource shares and let BOINC do what it can get?

Not as many opportunities for button-ab^H^Hpushing that way. ;)

ID: 586124

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.