Panic Mode On (17) Server problems

Jack Zhang
Volunteer tester
Joined: 2 Jul 06
Posts: 206
Credit: 6,142,449
RAC: 0
Canada
Message 908600 - Posted: 18 Jun 2009, 1:04:48 UTC - in response to Message 908595.  

Does this mean all of our results are invalid? OH NOES!

[sarcasm self-check complete] -Quote from Portal
What if Fiction was Fact and Fact was Fiction and vice versa?
ID: 908600
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 908608 - Posted: 18 Jun 2009, 1:38:19 UTC

I did notice the number of tapes left online is a reasonable number now. It is shrinking fast and we still haven't heard anything about new data being recorded/shipped, so unless there's a few more tapes at off-site storage, we could actually run out.. ::gasp::

One positive side effect is that long-term pending WUs should clear up and the load on the database would decrease, giving it time to recover on its own, though it would leave a lot of idle CPUs.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 908608
Chelski
Joined: 3 Jan 00
Posts: 121
Credit: 8,979,050
RAC: 0
Malaysia
Message 908631 - Posted: 18 Jun 2009, 3:50:45 UTC - in response to Message 908608.  

Cosmic_Ocean wrote:
I did notice the number of tapes left online is a reasonable number now. It is shrinking fast and we still haven't heard anything about new data being recorded/shipped, so unless there's a few more tapes at off-site storage, we could actually run out.. ::gasp::

Well, if the current situation persists (WU upload problems and Validate errors for those unlucky WUs that got reported in), there would be many many reissues.
ID: 908631
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 908636 - Posted: 18 Jun 2009, 4:21:33 UTC - in response to Message 908631.  


Network downloads have tapered off to normal levels (with no more traffic spikes), but there is still a huge amount of upload traffic. Looking at Scarecrow's graphs, though, it seems that nothing is actually getting through.
I've been unable to upload anything myself for over 12 hours now.
Grant
Darwin NT
ID: 908636
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 908643 - Posted: 18 Jun 2009, 5:26:53 UTC - in response to Message 908591.  

I still think the math behind the quota needs to be revised. Sure, one bad task can still be -1, but a good task needs to be something like +1 or +2 instead of x2. Takes 100+ bad tasks to get down to a quota of 1, but it only takes 8 good ones to get back to 100. Sounds like it defeats the purpose of the quota altogether.

It really depends on what you're trying to achieve.

If the purpose is to keep a machine that returns consistently bad results from chewing through work, it's good enough. It'll take a while to throttle it down, so an occasional validator error or math bug won't hurt machines that are not broken.

It also means a broken machine (or maybe a bug-fix to the application) can recover quite quickly once it's fixed.

If you did away with the -1/x2 mechanism entirely, things would probably still be okay, there would just be a lot more reissues.



Wouldn't it cause less?

Letting every broken host have 100 work units per CPU every day would mean more "broken" results returned every day, and more reissues.
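
To put rough numbers on the -1/x2 rule discussed above, here is a minimal Python sketch. It assumes a daily quota capped at 100 with a floor of 1, a drop of 1 per bad result and a doubling per good result, as described in this thread. The function names and the cap/floor constants are illustrative only and are not taken from the BOINC source. The +2 variant is the alternative suggested in the quoted post.

```python
MAX_QUOTA = 100  # assumed daily cap per CPU, as quoted in the thread
MIN_QUOTA = 1    # assumed floor


def after_bad(quota: int) -> int:
    """One bad result: quota drops by 1, never below the floor."""
    return max(MIN_QUOTA, quota - 1)


def after_good_double(quota: int) -> int:
    """One good result under the x2 rule: quota doubles, never above the cap."""
    return min(MAX_QUOTA, quota * 2)


def after_good_plus2(quota: int) -> int:
    """One good result under the suggested +2 rule, never above the cap."""
    return min(MAX_QUOTA, quota + 2)


def steps_until(quota: int, target: int, step) -> int:
    """Count how many applications of `step` it takes to reach `target`."""
    count = 0
    while quota != target:
        quota = step(quota)
        count += 1
    return count


bad_needed = steps_until(MAX_QUOTA, MIN_QUOTA, after_bad)
double_needed = steps_until(MIN_QUOTA, MAX_QUOTA, after_good_double)
plus2_needed = steps_until(MIN_QUOTA, MAX_QUOTA, after_good_plus2)
print(bad_needed, double_needed, plus2_needed)
```

Under these assumptions a host needs 99 bad results to fall from 100 to 1, 7 good results to climb back with doubling, and 50 good results with the +2 rule, which is roughly in line with the "100+" and "8" figures quoted above.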
ID: 908643
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 908653 - Posted: 18 Jun 2009, 6:34:11 UTC

I now have about 300 6.08 CUDA tasks trying to upload and about 30 mins left to crunch, then this i7 is out of SETI work.
Everything is ticked in my preferences to accept any work.
I did have about a 4 day cache but nothing now.
I am running on pending now, with 121k left in there.
I have reverted to my second string projects for the time being.
Hope to be back here when there is something to crunch.
If I abort the stuck uploads, that is about 12000 credits lost,
so I have suspended network and will try tomorrow.
Dave
ID: 908653
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 908676 - Posted: 18 Jun 2009, 8:36:10 UTC - in response to Message 908648.  

Can anybody tell me what is going on with SETI?
No WUs are uploading and no new WUs are downloading, and my account is not changing :(

There's still some sort of problem with the servers. I suspect it's a download problem: download traffic has dropped to lower than normal levels, even though there is plenty of work available to download. But there is a huge amount of inbound traffic, yet it's impossible to actually return any results because of that traffic.
Grant
Darwin NT
ID: 908676
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 908698 - Posted: 18 Jun 2009, 11:07:46 UTC
Last modified: 18 Jun 2009, 11:08:00 UTC

Luckily I have only one to upload; unluckily it has been trying since yesterday, and now I'm getting HTTP errors, which is usual after an outage. I have to wait another few hours before I can see whether I get any work; yesterday my communication was deferred for 24 hours, which is OK after it has been checking for a while. I'll let you know if I get any, around 18:30 BST.
ID: 908698
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 908701 - Posted: 18 Jun 2009, 11:45:13 UTC

Well, I just shut the old P4 down because I can't upload, but that's OK. Now is the time to blow the dust bunnies out of it.
The Mac still has quite a bit of MB WU. I'll let them run out, and if they haven't uploaded I'll suspend SETI and let MilkyWay have free run.

Old James
ID: 908701
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 908702 - Posted: 18 Jun 2009, 11:55:09 UTC - in response to Message 908701.  


Still plenty of work Ready to Send, but outbound network traffic has dropped even further, and inbound has increased slightly. Looks like they'll have their work cut out for them tomorrow getting things unclogged.
Grant
Darwin NT
ID: 908702
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 908707 - Posted: 18 Jun 2009, 12:26:58 UTC

I tried a few times to get work for the P4, but no go, so I just shut 'er down.

Old James
ID: 908707
MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 908717 - Posted: 18 Jun 2009, 13:02:25 UTC

I know there are problems which are leading to an inability to upload or download work. That's fine, these things happen. My query is about the status pages; the ones I check are

http://tnmshouse.com/xml_trans.php
http://setiathome.berkeley.edu/sah_status.html

These give no indication of anything being wrong. Am I misreading them, or am I looking at the wrong pages and should be checking somewhere else?


ID: 908717
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 908740 - Posted: 18 Jun 2009, 13:48:14 UTC - in response to Message 908737.  

. . . *My* 0.02 cents . . .

Wow, that's little! Where do they have hundredths of cents? ;-)
ID: 908740
[AF>france>pas-de-calais]symaski62
Volunteer tester
Joined: 12 Aug 05
Posts: 258
Credit: 100,548
RAC: 0
France
Message 908741 - Posted: 18 Jun 2009, 13:52:04 UTC

http://bluenorthernsoftware.com/scarecrow/sahstats/graphs.php?t=48

11:00 => 51 and 52 results/second, a record?!

BOOM! CRAZY!
SETI@Home Informational message -9 result_overflow
I have an overall disability of 80% and make a great effort for the community and to express myself; thank you for being understanding.
ID: 908741
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 908743 - Posted: 18 Jun 2009, 13:56:31 UTC - in response to Message 908740.  


. . . *My* 0.02 cents . . .

Wow, that's little! Where do they have hundredths of cents? ;-)


eh Gundolf ;)) maybe THIS: $00.02 [cents]


BOINC Wiki . . .

Science Status Page . . .
ID: 908743
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 908754 - Posted: 18 Jun 2009, 14:27:23 UTC - in response to Message 908342.  

You can switch it on right now, if you attach to another project as it has been suggested you do since the day BOINC went live.


No thanks.

If you look at my account data/profile, I'm 100% SETIan: no other BOINC project.

So?

Consider crunching for SETI BETA. They have a slightly different mix of servers, and they aren't necessarily down when SETI "main" is down.


So?

Hmm.. if the servers were separated.. maybe it would be a good idea..

But now (maybe always) SETI@home Beta Test has the same problems as SETI@home.

ID: 908754
Vipin Palazhi
Joined: 29 Feb 08
Posts: 286
Credit: 167,386,578
RAC: 0
India
Message 908755 - Posted: 18 Jun 2009, 14:33:38 UTC

I just added a new cruncher to my account. It sent out work requests twice for the 2 day cache that I have set and successfully downloaded 17 WUs without any trouble. So downloads don't seem to be an issue. However, my other crunchers have hundreds of completed WUs waiting for upload, and unless these get reported, no new work will be downloaded. I have suspended network activity on all of them for the time being.
______________

ID: 908755
MJS
Joined: 3 Apr 99
Posts: 27
Credit: 1,029,606,029
RAC: 234
United States
Message 908774 - Posted: 18 Jun 2009, 15:18:53 UTC - in response to Message 908762.  

FYI

Host 4947578 has been offline all night. No work! 4 pages of work waiting to upload. Additional work has always been set at 2 days. It is lucky to keep one day of work.

Mike
ID: 908774
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 908786 - Posted: 18 Jun 2009, 15:53:52 UTC - in response to Message 908737.  

The folks at Climate make this suggestion periodically as well. It makes sense for those doing single-project work (which itself doesn't make all that much sense, as a primary focus of the BOINC project was multi-project support so that project-specific outages would be less bothersome).

Now if the BOINC client included a feature for 'suspend network activity for specific projects', that would be nice. Then again, if the BOINC client included support for ATI GPUs, that also would be nice.




and, a Suggestion - for the Time being: Suspend network activity - user request UNTIL the ISSUES are corrected [including VALIDATE ERRORS]




ID: 908786
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 908787 - Posted: 18 Jun 2009, 16:05:41 UTC - in response to Message 908754.  

Hmm.. if the servers were separated.. maybe it would be a good idea..

The problem is that a lot of the storage is through the NAS box, or through mounts on various drives -- and those work best on a LAN. The interdependencies mean that you have more than one critical server for some functions.

The best idea would be for SETI to find a great big pile of money (or a great big pile of brand new servers, with the right OS preloaded and driver issues all resolved).

Then they could reduce the interdependencies.


ID: 908787
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.