Panic Mode On (5) Server Problems!

Message boards : Number crunching : Panic Mode On (5) Server Problems!

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65762
Credit: 55,293,173
RAC: 49
United States
Message 650859 - Posted: 29 Sep 2007, 14:06:52 UTC
Last modified: 29 Sep 2007, 14:11:18 UTC

Right now three of my 4 PCs can't upload. I don't know about reporting, of course. The following is from PC1.

9/29/2007 7:02:48 AM|SETI@home|Computation for task 07mr07ab.20893.9480.9.6.212_2 finished
9/29/2007 7:02:48 AM||Starting 07mr07ad.4802.9888.4.6.241_0
9/29/2007 7:02:48 AM|SETI@home|Starting task 07mr07ad.4802.9888.4.6.241_0 using setiathome_enhanced version 527
9/29/2007 7:02:50 AM|SETI@home|[file_xfer] Started upload of file 07mr07ab.20893.9480.9.6.212_2_0
9/29/2007 7:03:33 AM||Project communication failed: attempting access to reference site
9/29/2007 7:03:33 AM|SETI@home|[file_xfer] Temporarily failed upload of 07mr07ab.20893.9480.9.6.212_2_0: system connect
9/29/2007 7:03:33 AM|SETI@home|Backing off 1 min 0 sec on upload of file 07mr07ab.20893.9480.9.6.212_2_0
9/29/2007 7:03:34 AM||Access to reference site succeeded - project servers may be temporarily down.
9/29/2007 7:04:33 AM|SETI@home|[file_xfer] Started upload of file 07mr07ab.20893.9480.9.6.212_2_0
9/29/2007 7:09:38 AM||Project communication failed: attempting access to reference site
9/29/2007 7:09:38 AM|SETI@home|[file_xfer] Temporarily failed upload of 07mr07ab.20893.9480.9.6.212_2_0: http error
9/29/2007 7:09:38 AM|SETI@home|Backing off 1 min 0 sec on upload of file 07mr07ab.20893.9480.9.6.212_2_0
9/29/2007 7:09:39 AM||Access to reference site succeeded - project servers may be temporarily down.
9/29/2007 7:10:39 AM|SETI@home|[file_xfer] Started upload of file 07mr07ab.20893.9480.9.6.212_2_0
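The client log above has one event per line, with three pipe-separated fields: timestamp, project (empty for client-wide messages), and message. A minimal Python sketch, assuming exactly the layout shown in this thread (this is not an official BOINC log specification), pulls out each failed upload with its reason and the back-off the client chose. The sample lines are copied from the log above. The function name failed_uploads is hypothetical.

```python
import re
from datetime import datetime

# Sample lines copied from the PC1 log above (assumed layout: "timestamp|project|message").
LOG = """\
9/29/2007 7:02:50 AM|SETI@home|[file_xfer] Started upload of file 07mr07ab.20893.9480.9.6.212_2_0
9/29/2007 7:03:33 AM||Project communication failed: attempting access to reference site
9/29/2007 7:03:33 AM|SETI@home|[file_xfer] Temporarily failed upload of 07mr07ab.20893.9480.9.6.212_2_0: system connect
9/29/2007 7:03:33 AM|SETI@home|Backing off 1 min 0 sec on upload of file 07mr07ab.20893.9480.9.6.212_2_0
9/29/2007 7:09:38 AM|SETI@home|[file_xfer] Temporarily failed upload of 07mr07ab.20893.9480.9.6.212_2_0: http error
9/29/2007 7:09:38 AM|SETI@home|Backing off 1 min 0 sec on upload of file 07mr07ab.20893.9480.9.6.212_2_0
"""

FAILED = re.compile(r"Temporarily failed upload of (\S+): (.+)$")
BACKOFF = re.compile(r"Backing off (\d+) min (\d+) sec on upload of file (\S+)$")


def failed_uploads(log_text):
    """Return (time, file, reason, backoff_seconds) for each failed upload.

    The back-off comes from the "Backing off" line that follows a failure
    for the same file; it stays None if no such line is found.
    """
    results = []
    pending = {}  # file name -> index in results still waiting for its back-off line
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        stamp, _project, message = parts
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
        m = FAILED.search(message)
        if m:
            results.append((when, m.group(1), m.group(2), None))
            pending[m.group(1)] = len(results) - 1
            continue
        m = BACKOFF.search(message)
        if m and m.group(3) in pending:
            i = pending.pop(m.group(3))
            seconds = int(m.group(1)) * 60 + int(m.group(2))
            results[i] = results[i][:3] + (seconds,)
    return results


for when, name, reason, backoff in failed_uploads(LOG):
    print(when.time(), name, reason, backoff)
```

The same function can be pointed at a longer paste of the Messages tab to see how often, and why, a given result file failed to upload.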
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 650860 - Posted: 29 Sep 2007, 14:10:52 UTC
Last modified: 29 Sep 2007, 14:12:34 UTC

I've got one that won't upload and another that won't report. You're not alone Batman.

[EDIT] And of course, they won't download new work either [/EDIT]


Calm Chaos Forum...Join Calm Chaos Now
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 650861 - Posted: 29 Sep 2007, 14:13:55 UTC
Last modified: 29 Sep 2007, 14:20:18 UTC

Yeah, network traffic died about an hour ago. As far as reporting goes, the scheduler seems to be responding, so anything which had been uploaded more than an hour ago should go through on the report, but everything later than that will be stuck.

I could also reach the fanout directory, so theoretically you should still be able to get new work, but the traffic graph indicates not much is going out the door right now either.

<edit> @ Labbie: Interesting, I had one host which appears to have reported after the traffic graph and the one upload I have pending. Maybe I just got lucky.

<edit2> OOPS, on second look, I haven't had any apparent scheduler contact since about 12:00 UTC, which was an hour before the traffic graph crashed.

Alinator
Elphidieus
Joined: 1 Nov 02
Posts: 67
Credit: 3,140,607
RAC: 0
Malaysia
Message 650862 - Posted: 29 Sep 2007, 14:14:58 UTC
Last modified: 29 Sep 2007, 14:15:07 UTC

This crap of being unable to download or upload anything from the SETI server is really getting on my nerves.... I don't really get how some of you guys with green stars next to your ID can cope with all those $$$ going to waste, only to be greeted by this MB plague....
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19074
Credit: 40,757,560
RAC: 67
United Kingdom
Message 650864 - Posted: 29 Sep 2007, 14:21:54 UTC - in response to Message 650862.  

This crap of being unable to download or upload anything from the SETI server is really getting on my nerves.... I don't really get how some of you guys with green stars next to your ID can cope with all those $$$ going to waste, only to be greeted by this MB plague....

So what are you doing wrong? This is the graph for the last ten days.
John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 650879 - Posted: 29 Sep 2007, 15:21:18 UTC
Last modified: 29 Sep 2007, 15:24:14 UTC

The Cricket graph shows the bandwidth completely free for the last few hours.

No wonder the completed WU results cannot be uploaded and new work cannot be downloaded.

The server status page looks OK, but it was last updated at noon and hasn't changed since.
It's good to be back amongst friends and colleagues



Swibby Bear
Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 650883 - Posted: 29 Sep 2007, 15:33:52 UTC
Last modified: 29 Sep 2007, 15:36:26 UTC

Eric posted this message a day or two ago, and I assume he is making this change (reconfiguring the RAID) right now.
Whit

* * * * * * * * * * * * * * * * * * * * *

I think we've figured out what the real issue with gowron (the NAS box that holds the work units) is. It's that those who don't learn from history are doomed to repeat it. In other words we had this problem once before.... about 7 years ago. Yesterday, David reminded me how much smarter I was back then.

We have gowron configured into RAID 5 arrays, which is positively stupid for the job it's doing. RAID 5 works great for reading large files, since you are reading from (in our case) 5 drives at once. For reading small files like work units it sucks, because every time you read one you are moving the heads of all 6 drives and then reading a small amount of data. In other words, we're spending all of our time seeking and very little actually reading.

A much better way would be to have 6 drives set up as a concatenation of three RAID 1 mirrors. That way each pair of drives would seek individually, so we'd reduce our relative time spent seeking by a factor of 3 at the expense of slowing down our data reads by a factor of 2.5. It's worth the trade, so I'm moving workunits off of gowron so we can reconfigure.

Eric
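Eric's trade-off can be put into rough numbers. This toy model assumes that every small RAID 5 read ties up the whole array for one seek (as he describes), while each mirrored pair seeks on its own. It gives the per-read latency and the aggregate reads per second under a steady stream of concurrent requests. The seek time, per-drive bandwidth, and work unit size are illustrative assumptions, not measurements of gowron.

```python
# Toy model of Eric's RAID 5 vs. concatenated RAID 1 trade-off on a 6-drive box.
# All three constants are illustrative assumptions, not measurements of gowron.
SEEK_S = 0.008        # assumed average seek + rotational delay per request, seconds
DRIVE_BPS = 50e6      # assumed sequential read bandwidth of one drive, bytes/second
WU_BYTES = 375e3      # assumed size of one work unit file, bytes


def raid5(size):
    """Whole array acts as one actuator: one seek, then data off 5 drives."""
    latency = SEEK_S + size / (5 * DRIVE_BPS)
    return latency, 1 / latency            # (seconds per read, reads per second)


def concat_mirrors(size, pairs=3):
    """Each mirrored pair seeks independently and reads at 2 drives' speed."""
    latency = SEEK_S + size / (2 * DRIVE_BPS)
    return latency, pairs / latency        # pairs serve separate reads in parallel


for label, size in [("work unit", WU_BYTES), ("100 MB file", 100e6)]:
    r5_lat, r5_rate = raid5(size)
    m_lat, m_rate = concat_mirrors(size)
    print(f"{label}: RAID 5 {r5_lat * 1e3:.1f} ms/read, {r5_rate:.1f} reads/s | "
          f"3 x RAID 1 {m_lat * 1e3:.1f} ms/read, {m_rate:.1f} reads/s")
```

Under these assumed numbers, one work unit read takes slightly longer on the mirrors (about 11.75 ms versus 9.5 ms, the 2.5x transfer penalty). However, with three pairs seeking in parallel the array serves roughly 2.4 times as many work unit reads per second. For a single 100 MB read, RAID 5 finishes about 2.5 times sooner, which is the large-file case Eric says it handles well.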
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 650975 - Posted: 29 Sep 2007, 18:40:31 UTC

Not getting any work and caches are emptying fast. The server status page looks Greek to me. I suppose it'll be four more days of Einstein again and a further reduction of RAC. I should have reset the cache back to two days but it's too much goddamn trouble guessing when/when not to do so.
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65762
Credit: 55,293,173
RAC: 49
United States
Message 650986 - Posted: 29 Sep 2007, 19:04:27 UTC - in response to Message 650971.  

... I lost a crunched WU; I've never seen this error before.

9/29/2007 8:05:28 PM|SETI@home|[file_xfer] Finished upload of file 07mr07ad.11708.3753.16.6.233_1_0
9/29/2007 8:05:28 PM|SETI@home|[file_xfer] Throughput 21207 bytes/sec
9/29/2007 8:05:34 PM|SETI@home|Sending scheduler request: Requested by user
9/29/2007 8:05:34 PM|SETI@home|Requesting 9103 seconds of new work, and reporting 1 completed tasks
9/29/2007 8:05:39 PM|SETI@home|Scheduler RPC succeeded
9/29/2007 8:05:39 PM|SETI@home|Message from server: Server error: can't attach shared memory

The WUs show as not crunched on the Berkeley server! 161796635

I'm getting the same error on my PCs, so it's not on our end; Berkeley has an amnesiac on its hands, I think. :D Hopefully somebody can fix this, as I can upload, just not report, and I have 15 WUs (and rising) to report.

9/29/2007 11:57:29 AM|SETI@home|Starting task 07mr07ad.4828.22567.7.6.180_1 using setiathome_enhanced version 527
9/29/2007 11:57:31 AM|SETI@home|[file_xfer] Started upload of file 07mr07ad.4789.7434.3.6.169_2_0
9/29/2007 11:57:35 AM|SETI@home|[file_xfer] Finished upload of file 07mr07ad.4789.7434.3.6.169_2_0
9/29/2007 11:57:35 AM|SETI@home|[file_xfer] Throughput 45378 bytes/sec
9/29/2007 12:01:14 PM|SETI@home|Sending scheduler request: Requested by user
9/29/2007 12:01:14 PM|SETI@home|Reporting 6 tasks
9/29/2007 12:01:19 PM|SETI@home|Scheduler RPC succeeded
9/29/2007 12:01:19 PM|SETI@home|Message from server: Server error: can't attach shared memory
9/29/2007 12:01:19 PM|SETI@home|Deferring communication 1 hr 0 min 0 sec, because project is down
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 650988 - Posted: 29 Sep 2007, 19:07:41 UTC
Last modified: 29 Sep 2007, 19:17:12 UTC

At least a 3-day cache here. The closest deadline is 5 Oct, and that one is complete and waiting to upload (Beta).

I've just suspended network activity and will check the Cricket graphs every few hours.

I think about 3 days is probably the optimum cache size.

If you've run out of SETI work, try another project as backup e.g. Rosetta, Leiden, Einstein etc. More projects here.

[EDIT]
I have SETI main on 50% resource share with all my other projects (including SETI Beta) sharing the other 50%
[/EDIT]
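BOINC treats resource shares as relative weights. Over the long run, each project gets roughly its share divided by the total of all attached projects' shares. A quick sketch with made-up share values (chosen to reproduce the 50/50 split above, not Keith's actual settings) shows the arithmetic:

```python
# Hypothetical resource shares: SETI main gets half, the other four split the rest.
shares = {
    "SETI@home": 400,
    "SETI Beta": 100,
    "Rosetta": 100,
    "Einstein": 100,
    "Leiden": 100,
}


def share_fractions(shares):
    """Each project's long-run fraction is its share over the sum of all shares."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


for name, fraction in share_fractions(shares).items():
    print(f"{name}: {fraction:.1%}")
```

Because the shares are relative, multiplying every value by the same factor changes nothing; only the ratios matter.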
Sir Arthur C Clarke 1917-2008
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 650991 - Posted: 29 Sep 2007, 19:14:33 UTC - in response to Message 650986.  

OK, just to make sure I'm reading this situation correctly:

The results you and Crystallize are referring to have dropped off the CC task list?

Alinator
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65762
Credit: 55,293,173
RAC: 49
United States
Message 651005 - Posted: 29 Sep 2007, 19:29:55 UTC - in response to Message 650991.  
Last modified: 29 Sep 2007, 19:32:46 UTC

OK, just to make sure I'm reading this situation correctly:

The results you and Crystallize are referring to have dropped off the CC task list?

Alinator

I have no idea. All I know is I can't report. I've since stopped all network traffic to SETI until this is fixed, as it can't be on our end: it's happening on all my PCs, so unless someone decided to ban us (which I doubt), the server has a problem that needs fixing, that's all. The other part is Crystallize's idea. None of my PCs have any such thing as shared memory; virtual and real memory I have, and that's it.
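The "can't attach shared memory" message refers to memory on the project's server, not on our PCs. As I understand BOINC's server design, a feeder daemon keeps a cache of jobs in a shared-memory segment, and the scheduler that answers each client request attaches to that segment. If the feeder is not running (for example while the back end is being reworked), the attach fails and clients get this error. The sketch below imitates that attach-or-fail step with Python's multiprocessing.shared_memory module. The segment name and scheduler_attach are hypothetical, and BOINC's own C++ server code does not use this module.

```python
import os
from multiprocessing import shared_memory

# Hypothetical segment name; the PID suffix keeps repeated runs from colliding.
SEGMENT = f"feeder_demo_{os.getpid()}"


def scheduler_attach(name):
    """Attach to an existing segment, as a reader process would, or report failure."""
    try:
        seg = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        # Nothing has created the segment yet (the "feeder" isn't running).
        return "Server error: can't attach shared memory"
    try:
        return f"attached to {seg.size}-byte job cache"
    finally:
        seg.close()  # detach; the segment remains until its creator unlinks it


# 1. "Feeder" not running: the attach fails.
print(scheduler_attach(SEGMENT))

# 2. The "feeder" creates the segment; the same attach now succeeds.
feeder = shared_memory.SharedMemory(name=SEGMENT, create=True, size=4096)
try:
    print(scheduler_attach(SEGMENT))
finally:
    feeder.close()
    feeder.unlink()  # remove the segment so nothing is left behind
```

Nothing on the client side changes between the two calls, which fits the reports here of the same error from every host until the server side was fixed.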
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 651009 - Posted: 29 Sep 2007, 19:38:40 UTC
Last modified: 29 Sep 2007, 20:10:12 UTC

It is on Berk's end. I'm getting the message too. No need to worry, nothing is lost; communication is just deferred for a while until they get things working again. Matt said he would be moving things around, and I would guess he had to shut things down while he fixed them.


Oops, my mistake, it was Eric that said that.


PROUD MEMBER OF Team Starfire World BOINC
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65762
Credit: 55,293,173
RAC: 49
United States
Message 651012 - Posted: 29 Sep 2007, 19:45:30 UTC

This has happened before, and Eric had to look into the logs and such to fix whatever happened. I'm in the USA, on the Pacific Coast, in the time zone of the same name. Shared memory or some DIMM on the server must be flaky or something.

1999-Classic
2004
2006
http://www.google.com/search?q=seti+server+can%27t+attach+shared+memory&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a

I'm not sure resetting BOINC/SETI on each PC will help, as I'd lose all the work I'd crunched and not yet reported.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65762
Credit: 55,293,173
RAC: 49
United States
Message 651013 - Posted: 29 Sep 2007, 19:46:20 UTC - in response to Message 651009.  

It is on Berk's end. I'm getting the message too. No need to worry, nothing is lost; communication is just deferred for a while until they get things working again. Matt said he would be moving things around, and I would guess he had to shut things down while he fixed them.

Thanks.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 651019 - Posted: 29 Sep 2007, 19:59:12 UTC
Last modified: 29 Sep 2007, 19:59:48 UTC

What I meant was: if the number of results shown as in progress for the host on the web site is the same as the number of WUs on the CC task list for SAH, then the report didn't really happen, even though the Messages tab log makes it look like they went poof due to the failure message.

So, as was mentioned, when the backend gets straightened out they should go through OK, with no lost credit.

Alinator
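Alinator's check boils down to comparing two lists. The minimal sketch below assumes you have copied, by hand, the host's "in progress" results from the web site and the task names and states from the client's task list; no BOINC API call is implied. It flags finished tasks that the server still considers outstanding. The function names and the sample data are hypothetical; the task names are borrowed from the logs earlier in the thread.

```python
def report_really_happened(server_in_progress, client_tasks):
    """Alinator's rule of thumb: equal counts mean the server still lists
    every task the client holds, so the last report did not go through."""
    return len(server_in_progress) != len(client_tasks)


def stuck_reports(server_in_progress, client_tasks):
    """Finished tasks that the server still shows as in progress.

    server_in_progress: task names listed as in progress for the host on the web site
    client_tasks: mapping of task name -> state shown on the client's task list
    """
    finished = {name for name, state in client_tasks.items() if state == "Ready to report"}
    return sorted(finished & set(server_in_progress))


# Hypothetical snapshot taken while the scheduler was returning the shared-memory error.
server = ["07mr07ad.4789.7434.3.6.169_2", "07mr07ad.4828.22567.7.6.180_1"]
client = {
    "07mr07ad.4789.7434.3.6.169_2": "Ready to report",
    "07mr07ad.4828.22567.7.6.180_1": "Running",
}

print(report_really_happened(server, client))  # False: the counts match
print(stuck_reports(server, client))           # ['07mr07ad.4789.7434.3.6.169_2']
```

Any task that stuck_reports returns is still held by the client and should be reported normally once the scheduler is back, which is why no credit is lost.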
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65762
Credit: 55,293,173
RAC: 49
United States
Message 651023 - Posted: 29 Sep 2007, 20:06:23 UTC - in response to Message 651019.  

What I meant was: if the number of results shown as in progress for the host on the web site is the same as the number of WUs on the CC task list for SAH, then the report didn't really happen, even though the Messages tab log makes it look like they went poof due to the failure message.

So, as was mentioned, when the backend gets straightened out they should go through OK, with no lost credit.

Alinator

Then suspending SETI network activity can't hurt, as I'll be able to crunch uninterrupted for a bit.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 651050 - Posted: 29 Sep 2007, 21:01:56 UTC

I noticed my PIII had managed to report its one and only Beta WU, while my C2D machine still said 'Server error: can't attach shared memory'. So I restarted BOINC on the C2D, and now it has reported all its completed tasks, both on Main & Beta. But now it's showing that there's no work available, so I've set them to NNT for now.

Claggy
John Clark
Volunteer tester
Joined: 29 Sep 99
Posts: 16515
Credit: 4,418,829
RAC: 0
United Kingdom
Message 651065 - Posted: 29 Sep 2007, 21:28:10 UTC
Last modified: 29 Sep 2007, 21:49:51 UTC

My PIIIs did the same, but shutting BOINC down on my C2Q and restarting did not send the 24 results ready to report. It keeps delaying communication by 1 hour!

Looks like something is starting to move!

The 26 WU results which were ready to report have all done so, and a shed load of new WUs are waiting to be downloaded (contact has been made).

It's good to be back amongst friends and colleagues



Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 651122 - Posted: 29 Sep 2007, 23:00:53 UTC


Yep, looks like things have been kick started again. Network traffic is way up, and the result creation rate has hit 18/s (don't think that'll last too long though, likely to gum up other processes if it stays that way).
Grant
Darwin NT


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.