Panic Mode On (80) Server Problems?

Message boards : Number crunching : Panic Mode On (80) Server Problems?
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1321335 - Posted: 29 Dec 2012, 16:12:44 UTC

Almost nothing is working currently, most of us are on backup projects.

ID: 1321335
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65689
Credit: 55,293,173
RAC: 49
United States
Message 1321343 - Posted: 29 Dec 2012, 16:24:33 UTC - in response to Message 1321335.  

Aye the plumbing is all plugged up, ach someone stuffed too many Klingons in there...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1321343
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1321391 - Posted: 29 Dec 2012, 17:26:48 UTC

I'm seeing the same thing you guys are - Scheduler time-outs, then come back 5 minutes later and get through, and pick up the ghosts from the earlier contact.

First tries at downloads time out, second or third time they come down like greased lightning. Must depend on which download server I hit.
Donald
Infernal Optimist / Submariner, retired
ID: 1321391
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1321430 - Posted: 29 Dec 2012, 17:50:54 UTC
Last modified: 29 Dec 2012, 17:52:21 UTC

A couple days ago, I set the host with the new GT630 to NNT for Einstein because I wanted it to actually get some Seti done so I could see how long it was taking 2 at a time. So I check on it today and find it doing nothing because all the Seti downloads are stuck. Grrrrr.

Seeing that even babysitting it wasn't going to get me Seti work as fast as it could be finished, I turned Einstein loose again.

Now I want to know why Einsteins that are due on 1/12/13 and estimated to run in 5:32 are running HP ahead of the one Seti I managed to get that is due on 1/11/13 and estimated to run in 21:32.

Meanwhile, my i7 suddenly had a surge of successful downloads. It got everything and asked for more, and was resent 20 ghosts. Then another 20 ghosts. Then I discovered a typo in its hosts file and fixed it. Now it's moving downloads in little bits at a time (no networking pun intended). Maybe I should take out that line in hosts. [edit] Or change it from georgem to vader.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1321430
Brother Frank
Joined: 10 Dec 11
Posts: 26
Credit: 15,142,410
RAC: 0
United States
Message 1321491 - Posted: 29 Dec 2012, 18:13:56 UTC - in response to Message 1321335.  

My upload timeouts began about Wednesday and got worse day by day. I noticed that my RAC rose steadily from Dec 5th until Dec 27th. I am getting "work unit limit reached" even on my little i3 notebook without dedicated graphics. Nothing is working here anymore. I am switching to the alternate projects GPU Grid and Rosetta@home today. I have tried proxy servers and the No New Tasks setting, both of which I used during that very bad run of scheduler/server problems from November to early December. These last two-plus days feel as bad as the last meltdown.
Frank Elliott, member of Carepages.com, a chronic illness support site. Was FrankLivingFully there. Free user name & pw needed. My Google+ Profile is:
https://profiles.google.com/u/0/10871372137584 Science, SF, Space, Astronomy, Medicine, Psyc Topics.
ID: 1321491
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1321498 - Posted: 29 Dec 2012, 18:16:28 UTC - in response to Message 1321391.  

I'm seeing the same thing you guys are - Scheduler time-outs, then come back 5 minutes later and get through, and pick up the ghosts from the earlier contact.

First tries at downloads time out, second or third time they come down like greased lightning. Must depend on which download server I hit.

Yes, I synchronised my home machine with NTP (I presume the Labs are too; my lab servers are) and observed that after my PC sent a request for more work both the time-stamp on my "Last contact" and the "Sent" time for (a) recent file(s) updated to 3-5 seconds after my request time. Sometimes the scheduler request would time out, other times I got notification in 5-20 seconds that there was more work, often in the form of ghosts if the file entry existed before my request. Seems the database is updating quickly, but somewhere along the chain the return of information about the new file(s) is disappearing.
ID: 1321498
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1321552 - Posted: 29 Dec 2012, 18:58:57 UTC
Last modified: 29 Dec 2012, 19:12:37 UTC

It's still sometimes taking me around 5-10 minutes to complete a simple file upload. Once that is accomplished, the rest seems to work fine. If you are working on 20-40 minute long tasks, as I am, it's not that much of a problem. The first step is to fix the Upload Server.
ID: 1321552
Antjest
Volunteer tester

Joined: 27 Oct 99
Posts: 27
Credit: 19,796,139
RAC: 0
Slovenia
Message 1321613 - Posted: 29 Dec 2012, 20:40:51 UTC

All the problems started when the 4th and 5th AP splitters went online. That was the only visible change after the maintenance outage.
Whether this is a server overload problem or network congestion is unclear.

They should stop those two, and my bet is everything will return to normal, as it was for a few days before.

ID: 1321613
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1321681 - Posted: 29 Dec 2012, 21:54:03 UTC - in response to Message 1321613.  
Last modified: 29 Dec 2012, 22:05:50 UTC

The splitters are working better than they have for months, the assimilators are finally keeping up.
And it appears the Scheduler is no longer giving errors, it's just a shame it's not possible to upload and almost all Scheduler requests result in a timeout.


EDIT- and once again, for some unfathomable reason, setting No New Tasks increases the chance of not getting a Scheduler timeout. It still happens, just not as regularly as when trying to get work.
Grant
Darwin NT
ID: 1321681
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1321717 - Posted: 29 Dec 2012, 22:14:01 UTC

Panic mode? Again?

I know about the funding issue, I have made my donation, but....

Good luck for next year, I'm officially off-line again. Power bill vs S@H....

Set all my crunchers to NNT, leaving only the Beta big units to crunch for research.

See You All next Year and HOPE for Best!
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1321717
Rolf

Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 0
Switzerland
Message 1321733 - Posted: 29 Dec 2012, 22:20:50 UTC - in response to Message 1321613.  
Last modified: 29 Dec 2012, 22:22:40 UTC

All the problems started when the 4th and 5th AP splitters went online. That was the only visible change after the maintenance outage.

This is more or less the same I wanted to say in my post:
http://setiathome.berkeley.edu/forum_thread.php?id=70070&postid=1321332
Easy to test, isn't too complicated! I think it's worth testing it!
Edit: And it will not increase anybody's power bill.
ID: 1321733
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65689
Credit: 55,293,173
RAC: 49
United States
Message 1321749 - Posted: 29 Dec 2012, 22:29:21 UTC - in response to Message 1321733.  

All the problems started when the 4th and 5th AP splitters went online. That was the only visible change after the maintenance outage.

This is more or less the same I wanted to say in my post:
http://setiathome.berkeley.edu/forum_thread.php?id=70070&postid=1321332
Easy to test, isn't too complicated! I think it's worth testing it!
Edit: And it will not increase anybody's power bill.

So that's what torpedoed the system... sigh.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1321749
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1321803 - Posted: 29 Dec 2012, 23:09:25 UTC

Ya know.........the kitties really don't care anymore.
They have given this project EVERYTHING for many years now.


Yes, it pisses me off when things turn to sh/t.
I want my caches back.

It really does piss me off right now.. But, when the smoke clears, as it always does sooner or later.

You know who the first one back on track is?

Me.

Me. So, nobody best be calling me any names at this point.


"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1321803
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1321833 - Posted: 29 Dec 2012, 23:56:29 UTC

FWIW, uploads are uploading with an occasional Retry to help them along, downloads are downloading, and the GPUs are staying busy with SETI, with Rosetta and Einstein on the CPUs. I'm only running 1.0/.01 for Cache.

Maybe a smaller Cache works better in the present circumstances.
ID: 1321833
Chris Oliver (Project Donor)
Joined: 4 Jul 99
Posts: 72
Credit: 134,288,250
RAC: 15
United Kingdom
Message 1321855 - Posted: 30 Dec 2012, 0:26:17 UTC - in response to Message 1321803.  

Best to wean the kitties off the weed in the new year.....



Ya know.........the kitties really don't care anymore.
They have given this project EVERYTHING for many years now.


Yes, it pisses me off when things turn to sh/t.
I want my caches back.

It really does piss me off right now.. But, when the smoke clears, as it always does sooner or later.

You know who the first one back on track is?

Me.

Me. So, nobody best be calling me any names at this point.



ID: 1321855
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1321856 - Posted: 30 Dec 2012, 0:27:22 UTC

As I recall, during the other 'Psychotic Server Episodes', the Uploads continued without any problems. It never took longer than a second or two to complete the upload. This is what has been going on since yesterday, around the time the Multibeam Shortie Storm began:
12/29/2012 6:58:31 PM | SETI@home | Computation for task ap_25no12ad_B5_P1_00295_20121227_30240.wu_1 finished
12/29/2012 6:58:31 PM | SETI@home | Starting task ap_25no12ad_B6_P0_00240_20121227_31377.wu_1 using astropulse_v6 version 604 (ati_opencl_100) in slot 2
12/29/2012 6:58:33 PM | SETI@home | Started upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 6:58:56 PM |  | Project communication failed: attempting access to reference site
12/29/2012 6:58:56 PM | SETI@home | Temporarily failed upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0: connect() failed
12/29/2012 6:58:56 PM | SETI@home | Backing off 3 min 47 sec on upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 6:58:57 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 6:59:28 PM | SETI@home | Started upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:00:13 PM |  | Project communication failed: attempting access to reference site
12/29/2012 7:00:13 PM | SETI@home | Temporarily failed upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0: connect() failed
12/29/2012 7:00:13 PM | SETI@home | Backing off 5 min 12 sec on upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:00:15 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 7:00:54 PM | SETI@home | Started upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:04:04 PM |  | Project communication failed: attempting access to reference site
12/29/2012 7:04:04 PM | SETI@home | Temporarily failed upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0: transient HTTP error
12/29/2012 7:04:04 PM | SETI@home | Backing off 10 min 45 sec on upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:04:05 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 7:04:06 PM | SETI@home | Started upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:05:02 PM |  | Project communication failed: attempting access to reference site
12/29/2012 7:05:02 PM | SETI@home | Temporarily failed upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0: connect() failed
12/29/2012 7:05:02 PM | SETI@home | Backing off 20 min 59 sec on upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:05:03 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 7:05:04 PM | SETI@home | Started upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:05:37 PM | SETI@home | Finished upload of ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0
12/29/2012 7:05:40 PM | SETI@home | Sending scheduler request: To fetch work.
12/29/2012 7:05:40 PM | SETI@home | Reporting 1 completed tasks, requesting new tasks for NVIDIA and ATI
12/29/2012 7:05:44 PM | SETI@home | Scheduler request completed: got 1 new tasks
12/29/2012 7:05:47 PM | SETI@home | Started download of 08oc12aa.31216.126844.12.10.140
12/29/2012 7:06:01 PM | SETI@home | Finished download of 08oc12aa.31216.126844.12.10.140
....

That's 7 minutes to upload one file. If that had been last night during the height of the Shortie Storm (SS), I would have had another completed task before that one uploaded. In fact, I had around 8 stalled uploads at one point last night. If I got them all to remain active long enough, the report went through and I received downloaded files. If one was stalled, nothing worked. Things calmed down when the SS passed and the 4 minute MBs were replaced with 20+ minute MBs. Right now, I'm completing the Upload before the next task is finished.
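
For anyone who wants to check that figure against the log, here is a minimal Python sketch. It assumes the event-log layout shown above (timestamp | project | message) and uses only the upload lines copied from that excerpt. The names `parse_line` and `upload_summary` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Upload-related lines copied from the event log above.
UPLOAD_NAME = "ap_25no12ad_B5_P1_00295_20121227_30240.wu_1_0"
LOG = f"""\
12/29/2012 6:58:33 PM | SETI@home | Started upload of {UPLOAD_NAME}
12/29/2012 6:58:56 PM | SETI@home | Temporarily failed upload of {UPLOAD_NAME}: connect() failed
12/29/2012 6:59:28 PM | SETI@home | Started upload of {UPLOAD_NAME}
12/29/2012 7:00:13 PM | SETI@home | Temporarily failed upload of {UPLOAD_NAME}: connect() failed
12/29/2012 7:00:54 PM | SETI@home | Started upload of {UPLOAD_NAME}
12/29/2012 7:04:04 PM | SETI@home | Temporarily failed upload of {UPLOAD_NAME}: transient HTTP error
12/29/2012 7:04:06 PM | SETI@home | Started upload of {UPLOAD_NAME}
12/29/2012 7:05:02 PM | SETI@home | Temporarily failed upload of {UPLOAD_NAME}: connect() failed
12/29/2012 7:05:04 PM | SETI@home | Started upload of {UPLOAD_NAME}
12/29/2012 7:05:37 PM | SETI@home | Finished upload of {UPLOAD_NAME}
"""


def parse_line(line):
    """Split one 'timestamp | project | message' line into (datetime, message)."""
    stamp, _project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), message


def upload_summary(log_text, file_name):
    """Return (time from first start to finish, number of failed attempts)."""
    first_start = None
    failures = 0
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # skip anything that is not a log line
        when, message = parse_line(line)
        if message == f"Started upload of {file_name}":
            if first_start is None:
                first_start = when
        elif message.startswith(f"Temporarily failed upload of {file_name}"):
            failures += 1
        elif message == f"Finished upload of {file_name}" and first_start is not None:
            return when - first_start, failures
    return None, failures


elapsed, failed_attempts = upload_summary(LOG, UPLOAD_NAME)
print(elapsed, failed_attempts)
```

On the lines above this works out to 7 minutes 4 seconds from the first "Started upload" to "Finished upload", with four failed attempts in between.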

The way I see it, the latest problem began when someone decided to release a Shortie Storm on a malfunctioning Upload Server. The Upload Server is still Borked.
ID: 1321856
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 1321887 - Posted: 30 Dec 2012, 1:37:12 UTC - in response to Message 1321856.  

The way I see it, the latest problem began when someone decided to release a Shortie Storm on a malfunctioning Upload Server. The Upload Server is still Borked.

Even before the shorties, uploads & the Scheduler were both stuffed.
At least since the shorties have stopped arriving things aren't as bad as they were. I wouldn't say they've improved, as things are still seriously screwed, but they're not as bad as they were (uploads still take ages, I'm still getting them backing up, and Scheduler requests still time out, although the number of errors has dropped).
Grant
Darwin NT
ID: 1321887
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1321910 - Posted: 30 Dec 2012, 2:47:50 UTC - in response to Message 1321855.  
Last modified: 30 Dec 2012, 2:49:28 UTC

Best to wean the kitties off the weed in the new year.....




And YOUR point is exactly.........WHAT?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1321910
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1321918 - Posted: 30 Dec 2012, 3:12:54 UTC - in response to Message 1321887.  

The way I see it, the latest problem began when someone decided to release a Shortie Storm on a malfunctioning Upload Server. The Upload Server is still Borked.

Even before the shorties, uploads & the Scheduler were both stuffed.
At least since the shorties have stopped arriving things aren't as bad as they were. I wouldn't say they've improved, as things are still seriously screwed, but they're not as bad as they were (uploads still take ages, I'm still getting them backing up, and Scheduler requests still time out, although the number of errors has dropped).

I really haven't had any trouble with downloads, other than the scheduler continually trying to give me all ATI MBs and nothing for the nVidia card. I just made the transition from all MBs to half APs & half MBs. Things are working well except for the Upload Server problem. Here's the latest one: 11 minutes, and note how the scheduler didn't make a peep during that time. As soon as the upload completed, the scheduler kicked in and topped me off at 200 again.
12/29/2012 9:32:33 PM | SETI@home | Computation for task 08oc12ab.6744.8247.6.10.239_0 finished
12/29/2012 9:32:33 PM | SETI@home | Starting task 08oc12ab.6744.8247.6.10.208_1 using setiathome_enhanced version 609 (cuda23) in slot 3
12/29/2012 9:32:35 PM | SETI@home | Started upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:33:38 PM | SETI@home | Started download of 08oc12aa.13907.128071.13.10.99
12/29/2012 9:37:22 PM |  | Project communication failed: attempting access to reference site
12/29/2012 9:37:22 PM | SETI@home | Temporarily failed upload of 08oc12ab.6744.8247.6.10.239_0_0: transient HTTP error
12/29/2012 9:37:22 PM | SETI@home | Backing off 3 min 56 sec on upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:37:24 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 9:37:27 PM | SETI@home | Started upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:38:38 PM |  | Project communication failed: attempting access to reference site
12/29/2012 9:38:38 PM | SETI@home | Temporarily failed upload of 08oc12ab.6744.8247.6.10.239_0_0: transient HTTP error
12/29/2012 9:38:38 PM | SETI@home | Backing off 4 min 15 sec on upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:38:39 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 9:38:43 PM | SETI@home | Started upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:39:06 PM |  | Project communication failed: attempting access to reference site
12/29/2012 9:39:06 PM | SETI@home | Temporarily failed upload of 08oc12ab.6744.8247.6.10.239_0_0: connect() failed
12/29/2012 9:39:06 PM | SETI@home | Backing off 9 min 57 sec on upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:39:07 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 9:39:11 PM | SETI@home | Started upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:39:13 PM |  | Project communication failed: attempting access to reference site
12/29/2012 9:39:13 PM | SETI@home | Temporarily failed download of 08oc12aa.13907.128071.13.10.99: transient HTTP error
12/29/2012 9:39:13 PM | SETI@home | Backing off 6 min 0 sec on download of 08oc12aa.13907.128071.13.10.99
12/29/2012 9:39:14 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 9:39:15 PM | SETI@home | Started download of 08oc12aa.13907.128071.13.10.99
12/29/2012 9:39:51 PM | SETI@home | Finished download of 08oc12aa.13907.128071.13.10.99
12/29/2012 9:41:37 PM |  | Project communication failed: attempting access to reference site
12/29/2012 9:41:37 PM | SETI@home | Temporarily failed upload of 08oc12ab.6744.8247.6.10.239_0_0: transient HTTP error
12/29/2012 9:41:37 PM | SETI@home | Backing off 22 min 6 sec on upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:41:38 PM |  | Internet access OK - project servers may be temporarily down.
12/29/2012 9:41:40 PM | SETI@home | Started upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:43:47 PM | SETI@home | Finished upload of 08oc12ab.6744.8247.6.10.239_0_0
12/29/2012 9:43:48 PM | SETI@home | Sending scheduler request: To fetch work.
12/29/2012 9:43:48 PM | SETI@home | Reporting 1 completed tasks, requesting new tasks for NVIDIA and ATI
12/29/2012 9:43:54 PM | SETI@home | Scheduler request completed: got 1 new tasks
12/29/2012 9:43:56 PM | SETI@home | Started download of 08oc12ab.13879.15200.11.10.53
12/29/2012 9:44:10 PM | SETI@home | Finished download of 08oc12ab.13879.15200.11.10.53
....
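
The "Backing off" lines in the log above can be pulled out to see how the announced retry delays grow after each failure. The following is only a sketch in Python, assuming the message wording stays exactly as it appears in this excerpt. The regular expression and the function name `backoff_seconds` are illustrative and not part of BOINC, and the sketch says nothing about how the client actually chooses its delays.

```python
import re

# "Backing off" lines copied from the event log above.
UPLOAD_NAME = "08oc12ab.6744.8247.6.10.239_0_0"
LOG_LINES = [
    f"12/29/2012 9:37:22 PM | SETI@home | Backing off 3 min 56 sec on upload of {UPLOAD_NAME}",
    f"12/29/2012 9:38:38 PM | SETI@home | Backing off 4 min 15 sec on upload of {UPLOAD_NAME}",
    f"12/29/2012 9:39:06 PM | SETI@home | Backing off 9 min 57 sec on upload of {UPLOAD_NAME}",
    "12/29/2012 9:39:13 PM | SETI@home | Backing off 6 min 0 sec on download of 08oc12aa.13907.128071.13.10.99",
    f"12/29/2012 9:41:37 PM | SETI@home | Backing off 22 min 6 sec on upload of {UPLOAD_NAME}",
]

# Matches e.g. "Backing off 3 min 56 sec on upload of <file>".
BACKOFF = re.compile(r"Backing off (\d+) min (\d+) sec on (upload|download) of (\S+)")


def backoff_seconds(lines, kind="upload"):
    """Return the announced back-off delays, in seconds, for one transfer kind."""
    delays = []
    for line in lines:
        match = BACKOFF.search(line)
        if match and match.group(3) == kind:
            minutes, seconds = int(match.group(1)), int(match.group(2))
            delays.append(minutes * 60 + seconds)
    return delays


upload_delays = backoff_seconds(LOG_LINES, "upload")
download_delays = backoff_seconds(LOG_LINES, "download")
print(upload_delays)    # [236, 255, 597, 1326]
print(download_delays)  # [360]
```

For this one upload, each announced delay is longer than the one before it, going from just under 4 minutes to just over 22 minutes across four failures.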
ID: 1321918
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1321923 - Posted: 30 Dec 2012, 3:35:11 UTC

I know this latest mess seemed to start about the same time as the AP splitters got `up to speed`.
So, instead of switching some of them off to see if that fixes anything
(it's been done, so why flog a dead horse again),
let's try something crazy and turn off the multibeam splitters instead.
I know it sounds insane, but sometimes you just got to take a walk on the wild side to get a look at the problem from a different angle.
Fault finding in a complex system is a git to do,
and mostly all you can hope to do is make the problem react to something you did. Even if it cannot be seen directly where the problem is, at least it did something different than the last ten times you tried to poke it with a sharp stick and missed.
The project is kind of stuffed anyway, so what have we got to lose?

OK, so I may be mad, or so far from the truth or the real problem that I may be nearer to finding ET, cos I am so far out I end up being closer to them.
Whatever,
just a bit of a frustrated, head-scratching kind of idea.
ID: 1321923
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.