Panic Mode On (5) Server Problems!
Author | Message |
---|---|
zoom3+1=4 Joined: 30 Nov 03 Posts: 65762 Credit: 55,293,173 RAC: 49 |
Right now three of My 4 PCs can't upload, I don't know about reporting of course, The following is from PC1. 9/29/2007 7:02:48 AM|SETI@home|Computation for task 07mr07ab.20893.9480.9.6.212_2 finished 9/29/2007 7:02:48 AM||Starting 07mr07ad.4802.9888.4.6.241_0 9/29/2007 7:02:48 AM|SETI@home|Starting task 07mr07ad.4802.9888.4.6.241_0 using setiathome_enhanced version 527 9/29/2007 7:02:50 AM|SETI@home|[file_xfer] Started upload of file 07mr07ab.20893.9480.9.6.212_2_0 9/29/2007 7:03:33 AM||Project communication failed: attempting access to reference site 9/29/2007 7:03:33 AM|SETI@home|[file_xfer] Temporarily failed upload of 07mr07ab.20893.9480.9.6.212_2_0: system connect 9/29/2007 7:03:33 AM|SETI@home|Backing off 1 min 0 sec on upload of file 07mr07ab.20893.9480.9.6.212_2_0 9/29/2007 7:03:34 AM||Access to reference site succeeded - project servers may be temporarily down. 9/29/2007 7:04:33 AM|SETI@home|[file_xfer] Started upload of file 07mr07ab.20893.9480.9.6.212_2_0 9/29/2007 7:09:38 AM||Project communication failed: attempting access to reference site 9/29/2007 7:09:38 AM|SETI@home|[file_xfer] Temporarily failed upload of 07mr07ab.20893.9480.9.6.212_2_0: http error 9/29/2007 7:09:38 AM|SETI@home|Backing off 1 min 0 sec on upload of file 07mr07ab.20893.9480.9.6.212_2_0 9/29/2007 7:09:39 AM||Access to reference site succeeded - project servers may be temporarily down. 9/29/2007 7:10:39 AM|SETI@home|[file_xfer] Started upload of file 07mr07ab.20893.9480.9.6.212_2_0 The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Labbie Joined: 19 Jun 06 Posts: 4083 Credit: 5,930,102 RAC: 0 |
I've got one that won't upload and another that won't report. You're not alone Batman. [EDIT] And of course, they won't download new work either [/EDIT] Calm Chaos Forum...Join Calm Chaos Now |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Yeah, network traffic died about an hour ago. As far as reporting goes, the scheduler seems to be responding, so anything which had been uploaded more than an hour ago should go through on the report, but everything later than that will be stuck. I also could hit the fanout directory, so theoretically you should still be able to get new work, but the traffic graph indicates not much is going out the door right now either. <edit> @ Labbie: Interesting, I had one host which appears to have reported after the traffic graph and the one upload I have pending. Maybe I just got lucky. <edit2> OOPS, on second look, I haven't had any apparent scheduler contact since about 12:00 UTC, which was an hour before the traffic graph crashed. Alinator |
Elphidieus Joined: 1 Nov 02 Posts: 67 Credit: 3,140,607 RAC: 0 |
This crap of being unable to download or upload anything from the SETI server is really getting on my nerves.... I don't really get how some of you guys with green stars next to your ID can cope with all those $$$ going to waste, only to be greeted by this MB plague.... |
W-K 666 Joined: 18 May 99 Posts: 19074 Credit: 40,757,560 RAC: 67 |
This crap of being unable to download or upload anything from the SETI server is really getting on my nerves.... I don't really get how some of you guys with green stars next to your ID can cope with all those $$$ going to waste, only to be greeted by this MB plague.... So what are you doing wrong? This is the graph for the last ten days. |
John Clark Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
The Cricket graph shows the bandwidth completely free for the last few hours. No wonder the completed WU results cannot be u/l'ed and new work d/l'ed. The server status looks OK, but this was at noon with no updates since. It's good to be back amongst friends and colleagues |
Swibby Bear Joined: 1 Aug 01 Posts: 246 Credit: 7,945,093 RAC: 0 |
Eric posted this message a day or two ago, and I assume that he is implementing this database change (reconfiguring RAID) at this time. Whit * * * * * * * * * * * * * * * * * * * * * I think we've figured out what the real issue with gowron (the NAS box that holds the work units) is. It's that those who don't learn from history are doomed to repeat it. In other words we had this problem once before.... about 7 years ago. Yesterday, David reminded me how much smarter I was back then. We have gowron configured into RAID 5 arrays, which is positively stupid for the job it's doing. RAID 5 works great for reading large files since you are reading (in our case) from 5 drives at once. For reading small files like work units it sucks, because every time you read one you are moving the heads of all 6 drives and then reading a small amount of data. In other words we're spending all of our time seeking and very little actually reading. A much better way would be to have 6 drives set up as a concatenation of three RAID 1 mirrors. That way each pair of drives would seek individually, so we'd reduce our relative time spent seeking by a factor of 3 at the expense of slowing down our data reads by a factor of 2.5. It's worth the trade, so I'm moving workunits off of gowron so we can reconfigure. Eric |
Clyde C. Phillips, III Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
Not getting any work and caches are emptying fast. The server status page looks Greek to me. I suppose it'll be four more days of Einstein again and a further reduction of RAC. I should have reset the cache back to two days but it's too much goddamn trouble guessing when/when not to do so. |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65762 Credit: 55,293,173 RAC: 49 |
... I lost a crunched WU, I'm getting the same error on My PCs, So It's not on our End, Berkeley has an amnesiac on Its hands I think. :D Hopefully somebody can fix this as I can upload, Just not report and I have 15 WU's and rising to report. 9/29/2007 11:57:29 AM|SETI@home|Starting task 07mr07ad.4828.22567.7.6.180_1 using setiathome_enhanced version 527 9/29/2007 11:57:31 AM|SETI@home|[file_xfer] Started upload of file 07mr07ad.4789.7434.3.6.169_2_0 9/29/2007 11:57:35 AM|SETI@home|[file_xfer] Finished upload of file 07mr07ad.4789.7434.3.6.169_2_0 9/29/2007 11:57:35 AM|SETI@home|[file_xfer] Throughput 45378 bytes/sec 9/29/2007 12:01:14 PM|SETI@home|Sending scheduler request: Requested by user 9/29/2007 12:01:14 PM|SETI@home|Reporting 6 tasks 9/29/2007 12:01:19 PM|SETI@home|Scheduler RPC succeeded 9/29/2007 12:01:19 PM|SETI@home|Message from server: Server error: can't attach shared memory 9/29/2007 12:01:19 PM|SETI@home|Deferring communication 1 hr 0 min 0 sec, because project is down The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Keith T. Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
At least 3 days cache here. Closest deadline is 5 Oct, and that one is complete waiting to upload (Beta). I've just suspended Network Activity and will check the Cricket graphs every few hours. I think about 3 days is probably optimum cache size. If you've run out of SETI work, try another project as backup e.g. Rosetta, Leiden, Einstein etc. More projects here. [EDIT] I have SETI main on 50% resource share with all my other projects (including SETI Beta) sharing the other 50% [/EDIT] Sir Arthur C Clarke 1917-2008 |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
... I lost a crunched WU, OK, just to make sure I'm reading this situation correctly: the results you and Crystallize are referring to have dropped off the CC task list? Alinator |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65762 Credit: 55,293,173 RAC: 49 |
... I lost a crunched WU, I have no idea, All I know is I can't report, I've since stopped any network traffic to Seti till this is fixed as It can't be on Our end as It's happening on all My PCs and So unless Someone decided to ban us (Which I doubt), Then the Server has a problem that needs fixing, that's all. The other part is Crystallize's idea. None of My PCs have any such thing as Shared memory. Virtual and real memory I have and that's It. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
It is on Berk's end. I'm getting the message too. No need to worry, nothing is lost, communication is just deferred for a while until they get things working again. Matt said he would be moving things around and I would guess he had to shut things down while he fixed them. Oops, my mistake, it was Eric that said that. PROUD MEMBER OF Team Starfire World BOINC |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65762 Credit: 55,293,173 RAC: 49 |
This has happened before and Eric had to look into the Logs and such to fix whatever happened, So I'm in the USA on the Pacific Coast and the Time zone of the same name. Shared memory or some dimm must be flaky or something on the Server. 1999-Classic 2004 2006 http://www.google.com/search?q=seti+server+can%27t+attach+shared+memory&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a I'm not sure Resetting each PC's Boinc/Seti program will help, As I'd lose all the data I'd crunched and not reported. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65762 Credit: 55,293,173 RAC: 49 |
It is on Berk's end. I'm getting the message too. No need to worry, nothing is lost, communication is just deferred for a while until they get things working again. Matt said he would be moving things around and I would guess he had to shut things down while he fixed them. Thanks. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
What I meant was if the number of results shown in progress for the host on the web site is the same as the number of WU's on the CC task list for SAH, then the report didn't really happen, even though the message tab logs make it look like it went poof due to the failure message. So as was mentioned, when the backend gets straightened out they should go through then OK, with no lost credit. Alinator |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65762 Credit: 55,293,173 RAC: 49 |
What I meant was if the number of results shown in progress for the host on the web site is the same as the number of WU's on the CC task list for SAH, then the report didn't really happen, even though the message tab logs make it look like it went poof due to the failure message. Then suspending Seti network activity can't hurt, As then I'll be able to crunch uninterrupted for a bit. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I noticed my PIII had managed to report its one and only Beta WU, while my C2D machine still said: 'Server error: can't attach shared memory'. So I restarted Boinc on the C2D and now it reported all its completed tasks, both on Main & Beta. But now it's showing that there's no work available. So I've set them to NNT for now. Claggy |
John Clark Joined: 29 Sep 99 Posts: 16515 Credit: 4,418,829 RAC: 0 |
My PIIIs did the same, but shutting BOINC down on my C2Q and restarting did not send the 24 results ready to report. Keeps delaying communications by 1 hour! Looks like something is starting to move! The 26 WU results which were ready to report have all done so, and a shed load of new WUs are waiting to be downloaded (contact has been made). It's good to be back amongst friends and colleagues |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Yep, looks like things have been kick started again. Network traffic is way up, and the result creation rate has hit 18/s (don't think that'll last too long though, likely to gum up other processes if it stays that way). Grant Darwin NT |
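
For anyone who wants to check the arithmetic in Eric's RAID note quoted earlier in the thread, here is a rough back-of-envelope sketch in Python. It assumes the factor of 3 comes from three mirror pairs seeking independently instead of all 6 heads moving together, and the factor of 2.5 from a file streaming off 2 drives instead of 5. That is my reading of the note, not something Eric spelled out. The workload numbers (370 KB files, 8 ms seek, 60 MB/s per drive) and all function names are made up for illustration and are not measurements of gowron.

```python
"""Back-of-envelope model of the RAID trade-off described in Eric's note.

All drive parameters are illustrative assumptions, not measurements of gowron.
"""

DRIVES = 6  # drive count taken from the note


def raid5_layout(drives):
    """RAID 5: every read moves all heads; data streams from drives - 1 spindles."""
    return {"independent_seeks": 1, "streaming_drives": drives - 1}


def mirror_concat_layout(drives):
    """Concatenated RAID 1 pairs: each pair seeks on its own.

    Assumption: a file lives on one pair and both halves can serve its reads.
    """
    return {"independent_seeks": drives // 2, "streaming_drives": 2}


def files_per_second(layout, file_kb, seek_ms, drive_mb_per_s):
    """Aggregate read rate under a simple 'one seek plus transfer' model."""
    stream_mb_per_s = drive_mb_per_s * layout["streaming_drives"]
    transfer_ms = (file_kb / 1024) / stream_mb_per_s * 1000
    service_ms = seek_ms + transfer_ms
    return layout["independent_seeks"] * 1000 / service_ms


raid5 = raid5_layout(DRIVES)
mirrors = mirror_concat_layout(DRIVES)

# Ratios that correspond to the two factors quoted in the note.
seek_gain = mirrors["independent_seeks"] / raid5["independent_seeks"]
read_penalty = raid5["streaming_drives"] / mirrors["streaming_drives"]
print(f"seek gain: {seek_gain}x, streaming-read penalty: {read_penalty}x")

# Hypothetical small-file workload (all three numbers are assumptions).
for name, layout in (("RAID 5", raid5), ("3 x RAID 1", mirrors)):
    rate = files_per_second(layout, file_kb=370, seek_ms=8.0, drive_mb_per_s=60.0)
    print(f"{name}: about {rate:.0f} files/s")
```

Under these assumptions the two ratios come out to 3 and 2.5, matching the note, and the mirrored layout serves more small files per second because seek time dominates the transfer time. Different drive numbers will shift the exact rates.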