Panic Mode On (87) Server Problems?

Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1490630 - Posted: 18 Mar 2014, 13:41:28 UTC - in response to Message 1490620.  
Last modified: 18 Mar 2014, 13:46:51 UTC

No Beta site traffic either.

Well, I just got work from Beta and reporting to main worked too.

Actually the stuck SSP shows more than 300k to send - we do know that a stuck SSP can lead to runaway splitters, when stuck below the highwater mark. I guess if it got stuck _above_ the highwater mark now, splitters are probably not getting the signal to fire up or at least not working flat out.

Either way, I doubt it will get sorted before maintenance.

edit: small correction, my beta UPload seems to be stuck too.
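
The high-water-mark behaviour described above can be pictured with a toy model. This is only a sketch: the function names, the 300,000 threshold and the per-tick rates are all assumptions made for illustration, not the project's actual splitter logic. It shows what happens when the controller keeps reading a frozen "ready to send" figure while the real queue moves on.

```python
def splitters_should_run(reported_ready_to_send, high_water_mark):
    """Return True when the *reported* queue is below the high-water mark."""
    return reported_ready_to_send < high_water_mark


def simulate(reported, actual, high_water_mark, split_per_tick, sent_per_tick, ticks):
    """Advance the real queue while the controller only sees a frozen report.

    All parameters are hypothetical; `reported` never changes, which models
    a stuck server status page feeding the splitters stale numbers.
    """
    for _ in range(ticks):
        if splitters_should_run(reported, high_water_mark):
            actual += split_per_tick           # splitters keep adding work
        actual = max(0, actual - sent_per_tick)  # hosts keep drawing it down
    return actual


# Stuck below the mark: splitters never get told to stop, queue keeps growing.
print(simulate(100_000, 100_000, 300_000, 15, 5, 1000))
# Stuck above the mark: splitters never start, queue only drains.
print(simulate(350_000, 350_000, 300_000, 15, 5, 1000))
```

In the first call the real queue grows by the net rate every tick even though the report never changes; in the second it only shrinks.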
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1490630 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1490641 - Posted: 18 Mar 2014, 13:59:36 UTC
Last modified: 18 Mar 2014, 14:00:33 UTC

Must be close to 7 AM in CA. DL/UL are working fine from here. SSP stuck.
ID: 1490641 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1490643 - Posted: 18 Mar 2014, 14:04:06 UTC - in response to Message 1490641.  

We're in for a huge spike in the stats graphs once this lot gets assimilated!

ID: 1490643 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490644 - Posted: 18 Mar 2014, 14:10:33 UTC - in response to Message 1490630.  

No Beta site traffic either.

Well, I just got work from Beta and reporting to main worked too.

Actually the stuck SSP shows more than 300k to send - we do know that a stuck SSP can lead to runaway splitters, when stuck below the highwater mark. I guess if it got stuck _above_ the highwater mark now, splitters are probably not getting the signal to fire up or at least not working flat out.

Either way, I doubt it will get sorted before maintenance.

edit: small correction, my beta UPload seems to be stuck too.

Well, there's a clue - they were splitting at 15/sec when the page locked. That means they were not at high water mark - so they probably went on, and on, and on...

15 * 3600 * 20 (hours) is over a million tasks split since then. We've been drawing them down, of course, but not so many shorties as in recent weeks. I reckon we'll be bloated.

(and of course they had lots of raw material to work on)
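
The estimate above works out as follows. This is just the arithmetic from the post, assuming the splitters held a constant 15 tasks/sec for the full 20 hours and ignoring whatever was drawn down in the meantime; the constant names are made up for readability.

```python
SPLIT_RATE_PER_SEC = 15   # rate shown when the status page locked (from the post)
SECONDS_PER_HOUR = 3600
HOURS_STUCK = 20          # approximate duration quoted in the post

tasks_split = SPLIT_RATE_PER_SEC * SECONDS_PER_HOUR * HOURS_STUCK
print(f"{tasks_split:,}")  # 1,080,000
```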
ID: 1490644 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1490647 - Posted: 18 Mar 2014, 14:20:28 UTC - in response to Message 1490644.  
Last modified: 18 Mar 2014, 14:38:08 UTC

I've had a stuck Beta upload since last night. Main works, Beta doesn't.
Tue Mar 18 02:00:51 2014 | SETI@home Beta Test | Temporarily failed upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0: transient HTTP error
Tue Mar 18 02:00:51 2014 | SETI@home Beta Test | Backing off 03:48:21 on upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0
Tue Mar 18 10:05:55 2014 | SETI@home Beta Test | Temporarily failed upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0: transient HTTP error
Tue Mar 18 10:05:55 2014 | SETI@home Beta Test | Backing off 05:41:24 on upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0

I restarted before I went to bed to see if that would help, it didn't.
...all mine are stuck
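
For anyone wanting to total up how long the client has been backing off, here is a minimal sketch that pulls the delays out of log lines like the ones above. The regular expression is an assumption based only on the two "Backing off" lines shown in this post, so other client versions may word the message differently.

```python
import re
from datetime import timedelta

# Matches e.g. "Backing off 03:48:21 on upload of <file name>"
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def parse_backoffs(log_text):
    """Return (file_name, timedelta) pairs for every 'Backing off' line."""
    found = []
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds, name = match.groups()
            delay = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
            found.append((name, delay))
    return found


SAMPLE = """\
Tue Mar 18 02:00:51 2014 | SETI@home Beta Test | Backing off 03:48:21 on upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0
Tue Mar 18 10:05:55 2014 | SETI@home Beta Test | Backing off 05:41:24 on upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0
"""

for name, delay in parse_backoffs(SAMPLE):
    print(name, int(delay.total_seconds()), "seconds")
```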
ID: 1490647 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490654 - Posted: 18 Mar 2014, 14:39:25 UTC - in response to Message 1490647.  

I've had a stuck Beta upload since last night. Main works, Beta doesn't.
Tue Mar 18 02:00:51 2014 | SETI@home Beta Test | Temporarily failed upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0: transient HTTP error
Tue Mar 18 02:00:51 2014 | SETI@home Beta Test | Backing off 03:48:21 on upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0
Tue Mar 18 10:05:55 2014 | SETI@home Beta Test | Temporarily failed upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0: transient HTTP error
Tue Mar 18 10:05:55 2014 | SETI@home Beta Test | Backing off 05:41:24 on upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0

I restarted before I went to bed to see if that would help, it didn't.
...all mine are stuck

Does <http_debug> give you any idea *which* transient http error it is?
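
For reference, <http_debug> is a log flag that goes inside <log_flags> in the client's cc_config.xml. A minimal sketch that builds such a file's contents is below; the helper name is made up, and where the file lives and how the client is told to re-read it depend on the installation, so treat those parts as something to check against the BOINC documentation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag switched on."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(["http_debug"]))
```

With the flag on, the event log should include the HTTP exchange for each transfer, which is where a specific status line would show up.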
ID: 1490654 · Report as offensive
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1490657 - Posted: 18 Mar 2014, 14:45:26 UTC

I'm getting lots more APs than normal for my GPUs...is that a bribe to keep me quiet?
ID: 1490657 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34041
Credit: 18,883,157
RAC: 18
Belgium
Message 1490661 - Posted: 18 Mar 2014, 14:54:47 UTC - in response to Message 1490657.  

I'm getting lots more APs than normal for my GPUs...is that a bribe to keep me quiet?



Have more of them than usual myself but I'm not sad about it:)
rOZZ
Music
Pictures
ID: 1490661 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490667 - Posted: 18 Mar 2014, 15:07:09 UTC - in response to Message 1490654.  

Does <http_debug> give you any idea *which* transient http error it is?

Got a PM with the answer for Beta:

HTTP/1.1 503 Service Temporarily Unavailable
ID: 1490667 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1490685 - Posted: 18 Mar 2014, 16:17:18 UTC

My BOINC has managed to confuse me.

18/03/2014 17:09:37 | SETI@home | [file_xfer] http op done; retval -113 (can't resolve hostname)
18/03/2014 17:09:37 | SETI@home | [file_xfer] http op done; retval -113 (can't resolve hostname)
18/03/2014 17:09:37 | SETI@home | [file_xfer] file transfer status -113 (can't resolve hostname)
18/03/2014 17:09:37 | SETI@home | Temporarily failed upload of 01au13ab.15508.12337.438086664204.12.111_0_0: can't resolve hostname
18/03/2014 17:09:37 | SETI@home | [file_xfer] project-wide xfer delay for 9886.473586 sec
18/03/2014 17:09:37 | SETI@home | Backing off 01:58:29 on upload of 01au13ab.15508.12337.438086664204.12.111_0_0
18/03/2014 17:09:37 | SETI@home | [file_xfer] file transfer status -113 (can't resolve hostname)
18/03/2014 17:09:37 | SETI@home | Temporarily failed upload of 01jn13aa.13749.123934.438086664202.12.20_0_0: can't resolve hostname
18/03/2014 17:09:37 | SETI@home | [file_xfer] project-wide xfer delay for 17024.164556 sec
18/03/2014 17:09:37 | SETI@home | Backing off 00:05:27 on upload of 01jn13aa.13749.123934.438086664202.12.20_0_0
18/03/2014 17:10:04 |  | Project communication failed: attempting access to reference site
18/03/2014 17:10:07 |  | Internet access OK - project servers may be temporarily down.

..

18/03/2014 17:13:42 | SETI@home | Started upload of 01jn13aa.13749.123934.438086664202.12.20_0_0
18/03/2014 17:13:46 | SETI@home | Finished upload of 01jn13aa.13749.123934.438086664202.12.20_0_0
18/03/2014 17:13:49 | SETI@home | Started upload of 01au13ab.15508.12337.438086664204.12.111_0_0
18/03/2014 17:13:53 | SETI@home | Finished upload of 01au13ab.15508.12337.438086664204.12.111_0_0


ID: 1490685 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490691 - Posted: 18 Mar 2014, 16:34:15 UTC - in response to Message 1490685.  
Last modified: 18 Mar 2014, 16:35:55 UTC

(can't resolve hostname) is usually a DNS failure closer to home.

But now I'm getting "connect() failed" - that's the 21 second timeout - on uploads. I wondered if Eric (according to Beta, he's the only tech in town) was going to try turning all the servers off, and rebooting them in the right order, but if I can post this (and you can read it) the web and database servers are still alive.

(Edit - but the scheduler is down for maintenance)
ID: 1490691 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1490694 - Posted: 18 Mar 2014, 16:50:30 UTC
Last modified: 18 Mar 2014, 16:54:56 UTC

Uploads are dead here on main now and downloads have dropped off per Cricket. Scheduler is still up.

Edit: Server status page updated [As of 18 Mar 2014, 16:50:06 UTC].
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1490694 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1490699 - Posted: 18 Mar 2014, 16:53:55 UTC - in response to Message 1490694.  

Uploads are dead here on main now and downloads have dropped off per Cricket. Scheduler is still up.

Uploads are working again and my Beta Upload just cleared;
Tue Mar 18 12:51:38 2014 | SETI@home Beta Test | Started upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0
Tue Mar 18 12:51:51 2014 | SETI@home Beta Test | Finished upload of ap_25jn13ac_B0_P1_00384_20140308_21884.wu_2_0
Tue Mar 18 12:51:51 2014 | SETI@home Beta Test | Sending scheduler request: To report completed tasks.
Tue Mar 18 12:51:51 2014 | SETI@home Beta Test | Reporting 1 completed tasks
Tue Mar 18 12:51:51 2014 | SETI@home Beta Test | Not requesting tasks: "no new tasks" requested via Manager
Tue Mar 18 12:51:52 2014 | SETI@home Beta Test | Scheduler request completed

;-)
ID: 1490699 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1490705 - Posted: 18 Mar 2014, 17:00:31 UTC - in response to Message 1490691.  

(can't resolve hostname) is usually a DNS failure closer to home.

My system (& thus BOINC) had just revived from hibernation, before BOINC tried that upload. So whatever DNS trouble had managed to manifest itself, must then have happened during the time the system was in hibernation.
ID: 1490705 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490711 - Posted: 18 Mar 2014, 17:05:03 UTC - in response to Message 1490705.  

(can't resolve hostname) is usually a DNS failure closer to home.

My system (& thus BOINC) had just revived from hibernation, before BOINC tried that upload. So whatever DNS trouble had managed to manifest itself, must then have happened during the time the system was in hibernation.

Can happen that a PC is quicker to recover to the desktop, than all the network connections are to re-establish themselves - especially if you use a wireless network connection (or a modem). If you have a hard-wired router, and leave that powered up, it's usually quicker.
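
One way to see whether the network really is lagging behind the desktop after a resume is to poll name resolution until it succeeds. The sketch below is only an illustration: the function name, attempt count and delay are assumptions, and it is not something BOINC itself does. The resolver and sleep function are passed in so the logic can be exercised without touching the network.

```python
import socket
import time


def wait_for_dns(hostname, attempts=5, delay_seconds=2.0,
                 resolver=socket.getaddrinfo, sleep=time.sleep):
    """Return True once hostname resolves, or False after all attempts fail."""
    for attempt in range(attempts):
        try:
            resolver(hostname, None)   # raises socket.gaierror on DNS failure
            return True
        except socket.gaierror:
            if attempt < attempts - 1:
                sleep(delay_seconds)   # give the network time to come back
    return False
```

Calling wait_for_dns with the project's upload server name right after waking the PC would show roughly how long name resolution takes to come back.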
ID: 1490711 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1490715 - Posted: 18 Mar 2014, 17:08:34 UTC

Here's the big bulge in ready to send that Richard predicted earlier in this thread:

[As of 18 Mar 2014, 17:00:05 UTC]
Data Distribution State | SETI@home # | Astropulse # | As of*
Results ready to send | 602,189 | 26,288 | 9m
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1490715 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490731 - Posted: 18 Mar 2014, 17:22:23 UTC - in response to Message 1490715.  

Here's the big bulge in ready to send that Richard predicted earlier in this thread:

[As of 18 Mar 2014, 17:00:05 UTC]
Data Distribution State | SETI@home # | Astropulse # | As of*
Results ready to send | 602,189 | 26,288 | 9m

And did you see "result creation rate 298.6622/sec"

We haven't reached the top yet...
ID: 1490731 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1490747 - Posted: 18 Mar 2014, 17:47:42 UTC

And did you see "result creation rate 298.6622/sec"

Yes, and it's even higher now, which botched up my theory of how it is calculated. How about the S@H V7 pending validation? Hope they fix that.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1490747 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1490750 - Posted: 18 Mar 2014, 17:52:02 UTC - in response to Message 1490711.  

Can happen that a PC is quicker to recover to the desktop, than all the network connections are to re-establish themselves - especially if you use a wireless network connection (or a modem). If you have a hard-wired router, and leave that powered up, it's usually quicker.

All my PCs and NAS are hardwired to the router. I prefer to have a constantly available 1 gigabit LAN, rather than a sometimes available 135 Mbit LAN (300 Mbit is impossible, even if you're right next to the router).

The router and NAS are always on.
ID: 1490750 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1490752 - Posted: 18 Mar 2014, 17:55:23 UTC - in response to Message 1490747.  

And did you see "result creation rate 298.6622/sec"

Yes, and it's even higher now, which botched up my theory of how it is calculated. How about the S@H V7 pending validation? Hope they fix that.

I think the validation number is genuine, now the transitioners have found the results reported overnight.

Joe had an explanation for the 'result creation rate' a few days ago. The splitters make WUs, and put them in the database: but transitioners are needed to turn WUs into results (tasks). Making a WU is a slow, steady process that involves mathematics and data shuffling from disk to disk: making a result from a WU is just a bit of database record-keeping, and takes barely any time. Did you see the database queries/second go over 4,500?
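
To picture why the second step is so cheap, here is a toy version of the WU-to-result bookkeeping. Everything in it is an assumption made for illustration: the class and function names, the replication count of 2, and the "_0", "_1" naming, which is inferred from the upload file names seen earlier in this thread rather than taken from the server code.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    name: str
    results: list = field(default_factory=list)


def transition(workunit, replication=2):
    """Create any missing result records for one workunit (pure bookkeeping)."""
    while len(workunit.results) < replication:
        workunit.results.append(f"{workunit.name}_{len(workunit.results)}")
    return workunit.results


wu = Workunit("01au13ab.15508.12337.438086664204.12.111")
print(transition(wu))
```

No signal data is touched here, only names appended to a list, which is the point: once the WUs exist, a backlog of them can be turned into results very quickly.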
ID: 1490752 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.