Panic Mode On (77) Server Problems?


msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38327
Credit: 560,955,321
RAC: 654,506
United States
Message 1295644 - Posted: 15 Oct 2012, 16:47:25 UTC - in response to Message 1295642.

Batten the hatches folks. Smooth sailing is over. AP's are being split again. Expect lots of complaints about slow downloads, no new tasks, can't report, slow uploads, no more coffee, bad times in general. :-)

What's odd is that the Cricket graph has not maxed back out yet. Normally see that as soon as AP cranks back up. Maybe things were going good enough over the weekend that many MB caches are full. I know the kitties got about everything they needed.


I guess there are lots of AP-only machines which have been asking for APs for days, and BOINC has gone into long pending/deferred hours. They'll come back asking for APs, and when they do, all h*** will break loose.

Dunno...if few were asking for AP work, I would expect the server status page to show some AP ready to send being built up and I don't see that happening yet either.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3336
Credit: 19,065,571
RAC: 19,917
Sweden
Message 1295646 - Posted: 15 Oct 2012, 16:51:52 UTC - in response to Message 1295644.
Last modified: 15 Oct 2012, 16:52:50 UTC

Hmm, AP's disappeared from splitting again. Better just wait and see what happens.
____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38327
Credit: 560,955,321
RAC: 654,506
United States
Message 1295649 - Posted: 15 Oct 2012, 16:54:38 UTC - in response to Message 1295646.

Something odd there. Usually AP datasets take a long time to split and send. I suspect little AP work was generated from these few.


ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 579
Credit: 131,569,471
RAC: 114,074
United Kingdom
Message 1295675 - Posted: 15 Oct 2012, 17:34:09 UTC - in response to Message 1295556.

I reckon if anyone doesn't have enough work then the problem is on their end.

Cheers.

Wish I could work out what -- I'm only getting two tasks/request on my fastest machine and not much more on the others, in general.

Gah! Finally realised that my disk usage was 1.5 GB, hitting the limit; upped it to 10 GB and jobs started flowing... Turns out that when I was playing with BOINC versions in August I'd screwed up and "lost" 3000 WUs, which were reported as errors but not deleted from my project area!
____________

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 0
United Kingdom
Message 1296340 - Posted: 17 Oct 2012, 21:16:17 UTC

I just spotted that all the multibeam splitters are disabled
and RTS iz kinda low
PANIC

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7026
Credit: 59,289,366
RAC: 20,714
Germany
Message 1296351 - Posted: 17 Oct 2012, 21:29:41 UTC
Last modified: 17 Oct 2012, 21:30:19 UTC

Still ~ 19,000 MB WUs available for D/L .. ;-)


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
____________
BR



>Das Deutsche Cafe. The German Cafe.<

Phil
Joined: 12 Mar 04
Posts: 14
Credit: 4,841,296
RAC: 0
United Kingdom
Message 1296360 - Posted: 17 Oct 2012, 22:06:40 UTC

Zero WU RTS

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,768,880
RAC: 22,871
United Kingdom
Message 1296371 - Posted: 17 Oct 2012, 22:39:03 UTC

Eric was dealing with the problem that led to new WU numbering scheme

It was a compiler bug, messing up the splitter code.

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3336
Credit: 19,065,571
RAC: 19,917
Sweden
Message 1296381 - Posted: 17 Oct 2012, 23:00:05 UTC - in response to Message 1296371.

Does that mean that we have to redo 2 weeks of WU's? If that's the case I'll consider the first run to be just a test run. Either that, or I'll have to start crying :-)
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2238
Credit: 8,453,935
RAC: 4,049
United States
Message 1296423 - Posted: 18 Oct 2012, 0:48:54 UTC

Not complaining or panicking, but my RAC has dropped by half due to no APs. Might be a good thing though. I knew when I stole my PSU from another machine that the squealing/chirping that was intermittently coming from the capacitors meant that it wouldn't have too terribly long to live. It is still alive, but my board has been beeping some error alert for 30 seconds every 6-8 hours and I'm not sure why, but it probably has something to do with some voltage being below a threshold.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Cherokee150
Joined: 11 Nov 99
Posts: 103
Credit: 23,284,405
RAC: 31,727
United States
Message 1296448 - Posted: 18 Oct 2012, 3:15:07 UTC - in response to Message 1296371.

Excerpts from another thread:

...I noticed a number of new units I am receiving have unusually long names, as in:
21ap11af.25749.2934.140733193388044.10.40_2
versus a more typical:
09au12ag.28990.14407.13.10.216_0
Alternatively, does anyone know what that new 15 digit number in the fourth position means?

...I hope we get an official explanation soon, and I hope these "longnames" aren't wreaking havoc with the SETI database.

...I was also thinking along the line of, if these names are in error, as was suggested in the thread referenced by Wiggo, then it might cause serious problems later when the SETI staff attempt to identify and work with the returned results. If so, then I hope it turns out to be a non-issue, or at least a minor "ho-hum...". :)
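
For anyone who wants to pick these out of a task list, here is a rough sketch. It assumes task names are dot-separated fields followed by a `_N` result suffix, which is only what the two examples above suggest. The function names and the 4-digit threshold are made up for illustration and are not anything from the project.

```python
def split_wu_name(name):
    """Split a task name into its dot-separated fields and the result suffix."""
    base, _, suffix = name.rpartition("_")
    return base.split("."), suffix


def is_longname(name, max_digits=4):
    """Guess whether the fourth field is one of the oversized values.

    max_digits is an arbitrary cut-off chosen for this sketch.
    """
    fields, _ = split_wu_name(name)
    return len(fields) > 3 and len(fields[3]) > max_digits


for task in ("21ap11af.25749.2934.140733193388044.10.40_2",
             "09au12ag.28990.14407.13.10.216_0"):
    print(task, is_longname(task))
```

The first name is flagged and the second is not, purely on the length of the fourth field.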


Now Eric is "dealing with the problem that led to new WU numbering scheme" that came from a "compiler bug, messing up the splitter code".

Based on all the above, I wonder if it would help Eric to send a BOINC Notice asking those of us who feel comfortable with manual manipulation of our caches to suspend until further notice all cached WUs with the apparently erroneous "longnames"?

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,768,880
RAC: 22,871
United Kingdom
Message 1296471 - Posted: 18 Oct 2012, 6:29:39 UTC - in response to Message 1296448.

No, no need to do anything like that. Eric dealt with the long names in passing, while working on testing for the new v7 application at SETI Beta. Here's what he said:

That huge number is supposed to be the ID of the receiver we used, which is obtained from a library routine. I'm guessing that at some point the library was compiled with a version of g++ that doesn't have the same storage format for the data type used.

Fortunately the correct value is used in the workunit header, so we don't have to do all this work over.

(my emphasis)

The technically minded can read the gory details from beta message 44104 onwards.
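
A quick sanity check that seems to fit Eric's description, though it is only one reading of it: if a 32-bit ID were picked up as a 64-bit value, the low 32 bits would still hold the ID and the high bits would be leftover junk. Taking the long field from the task name quoted earlier in the thread (the variable names here are just for illustration):

```python
# Fourth field of 21ap11af.25749.2934.140733193388044.10.40_2
LONG_FIELD = 140733193388044

low32 = LONG_FIELD & 0xFFFFFFFF   # what a 32-bit reader would have seen
high32 = LONG_FIELD >> 32         # the extra bits a 64-bit reader picked up

print(hex(LONG_FIELD))     # 0x7fff0000000c
print(low32, hex(high32))  # 12 0x7fff
```

The low half comes out as 12, the same sort of size as the 13 in the older-style name. Whether 12 really is the receiver ID is not confirmed anywhere in this thread.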

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5697
Credit: 56,444,616
RAC: 49,012
Australia
Message 1296480 - Posted: 18 Oct 2012, 7:13:12 UTC - in response to Message 1296471.


Once again the splitters are unable to keep up with work demand.
____________
Grant
Darwin NT.

N9JFE David S
Volunteer tester
Joined: 4 Oct 99
Posts: 10760
Credit: 13,484,402
RAC: 14,719
United States
Message 1296526 - Posted: 18 Oct 2012, 14:28:18 UTC

Why is it there are 6 AP splitters and only 5 MB splitters? AP gets split faster than MB anyway, so it seems to me it would make more sense to have more MB splitters than APs. Could lando run a couple more MBs, especially since he's not running his APs right now?

Okay, now I'm wondering... isn't the SSP supposed to update every 10 minutes? It's now 1427 UTC and the page is still showing 1410.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38327
Credit: 560,955,321
RAC: 654,506
United States
Message 1296543 - Posted: 18 Oct 2012, 15:14:44 UTC - in response to Message 1296526.

Sometimes it's every 10 minutes and sometimes every 20...
Last update 15:00.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5697
Credit: 56,444,616
RAC: 49,012
Australia
Message 1296997 - Posted: 19 Oct 2012, 21:25:53 UTC


Bit of a backlog of MB validations building up.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5697
Credit: 56,444,616
RAC: 49,012
Australia
Message 1297343 - Posted: 20 Oct 2012, 21:56:39 UTC - in response to Message 1296997.


MB validators appear to be clearing their backlog, now the assimilators are falling behind.

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 0
United Kingdom
Message 1298419 - Posted: 24 Oct 2012, 21:17:21 UTC
Last modified: 24 Oct 2012, 21:20:01 UTC

Hay man what happend.
Sumfing is broke
The cricket is dead, gon all white :(

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7026
Credit: 59,289,366
RAC: 20,714
Germany
Message 1298424 - Posted: 24 Oct 2012, 22:30:59 UTC

Project is temporarily shut down for maintenance


.. and then ..

Scheduler request failed: Couldn't connect to server
Scheduler request failed: HTTP internal server error
Scheduler request failed: Timeout was reached




Wiggo
Joined: 24 Jan 00
Posts: 6528
Credit: 90,629,106
RAC: 75,090
Australia
Message 1298435 - Posted: 24 Oct 2012, 22:56:24 UTC - in response to Message 1298424.

Things have been working well here since the servers came back online (approx. 90 mins ago now).

Cheers.
____________


Copyright © 2014 University of California