Panic Mode On (6) Server Problems!

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 695824 - Posted: 30 Dec 2007, 4:57:14 UTC

I think Jim's point was that he was getting corrupted downloads, not no downloads.
I would suggest a reboot of modem/router/computer in that order if not tried already.
I have not seen any downloads that had checksum or other data errors.
Just dog-slow transfer rates, dropped connections, and HTTP or connect errors when trying to start the downloads.

On the other hand....the Cricket Graph shows inbound traffic to the servers slowing a bit, and outbound traffic very high. I have seen more downloading success in the last hour or so. Hopefully this means there is a lull in the storm at last...if the datasets being split contain some WUs with a bit of meat on them and the downloads can get 'er done and quit pestering the servers to complete, we may actually gain some ground.

Hope Matt has some success figuring out why the project only seems to be getting 60Mb/sec of bandwidth when it should be getting 100Mb/sec.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 695824 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 695827 - Posted: 30 Dec 2007, 5:17:26 UTC - in response to Message 695810.  

My total is now up to 40 workunits failed on download.
Is anyone getting good downloads?

I think there would be a lot of complaining if many were getting files that BOINC immediately discards. You're not alone, though, see the Tasks for user thread.
                                                              Joe
ID: 695827 · Report as offensive
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 695841 - Posted: 30 Dec 2007, 7:15:28 UTC - in response to Message 695827.  

My total is now up to 40 workunits failed on download.
Is anyone getting good downloads?

I think there would be a lot of complaining if many were getting files that BOINC immediately discards. You're not alone, though, see the Tasks for user thread.
                                                              Joe


Consider this a repost of my complaint where I said that it was the last straw...

2007-12-27 19:56:36 [SETI@home] [file_xfer] Started download of file 10no06aa.1460.8661.9.6.55
2007-12-27 19:56:38 [---] Access to reference site succeeded - project servers may be temporarily down.
2007-12-27 19:58:53 [SETI@home] [file_xfer] Finished download of file 10no06aa.1460.8661.9.6.55
2007-12-27 19:58:53 [SETI@home] [file_xfer] Throughput 2680 bytes/sec
2007-12-27 19:58:53 [SETI@home] [file_xfer] Started download of file 10no06aa.1460.8661.9.6.58
2007-12-27 19:58:53 [SETI@home] [error] MD5 check failed for 10no06aa.1460.8661.9.6.55
2007-12-27 19:58:53 [SETI@home] [error] expected 5940ec487388adf48e5b9896d0c835fe, got 904723d22ec8b6b13cab2b5b6d05ed64
2007-12-27 19:58:53 [SETI@home] [error] Checksum or signature error for 10no06aa.1460.8661.9.6.55
2007-12-27 19:58:54 [SETI@home] Deferring communication for 5 min 17 sec
2007-12-27 19:58:54 [SETI@home] Reason: Unrecoverable error for result 10no06aa.1460.8661.9.6.55_1 (WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>10no06aa.1460.8661.9.6.55</file_name>
  <error_code>-119</error_code>
  <error_message>MD5 check failed</error_message>
</file_xfer_error>
)
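For anyone who wants to run the same kind of check by hand on a file that is still on disk, here is a minimal sketch. It assumes you have the expected hash from the log (the "expected ..." value above); the function names `md5_of_file` and `download_is_intact` are made up for illustration and are not part of BOINC, which does its own verification internally.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_md5):
    """Compare the file's MD5 with the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If `download_is_intact(path, "5940ec487388adf48e5b9896d0c835fe")` comes back False for the file in the log above, the bytes on disk are not the bytes the server advertised, which is the condition the client reports as error -119.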
ID: 695841 · Report as offensive
peristalsis

Joined: 23 Jul 99
Posts: 154
Credit: 28,610,163
RAC: 51
United States
Message 695852 - Posted: 30 Dec 2007, 9:43:11 UTC

I had a whole batch of "checksum errors" and "access denied".
My machine/Berkeley..I dunno. Good crunches before and after the batch of problems. Gave me an excuse to take the Windows box down to parade rest, take it out to the garage and give it its yearly 'blow out the crud' meeting with the air compressor...j
ID: 695852 · Report as offensive
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 695857 - Posted: 30 Dec 2007, 10:35:27 UTC

Is there a parameter that limits the number of pending outgoing downloads that SETI provides? If so, would reducing that number help with server issues like the ones we've all had recently due to the short WUs?
ID: 695857 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 695864 - Posted: 30 Dec 2007, 12:28:09 UTC

It's now almost exactly eight days since the start of the mass attack of the AR=2.47s. Anyone who failed to complete their download will shortly be coming up to deadline and re-issue.

1) Expect a download blip again as the resends start to go out.

2) If there's no bottleneck, the resends will probably download, crunch, return and validate in double-quick time. So watch out for Work fetch bug in BOINC v5.10.13 (and many other versions as well). Although the trac ticket is still open, I think it's been fixed (and other users' reporting of the same bug has been closed - c'est la vie), but I don't know which version of BOINC the fix is included in.

Basically, if you see "file not found" in the message log for a transfer that's still trying and failing to download, just put it out of its misery with an 'abort transfer'. It ain't going nowhere.

Just another little panic for the New Year holiday......
ID: 695864 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 695959 - Posted: 30 Dec 2007, 18:45:10 UTC

Ah, bliss......

Back to broadband and a decent sized monitor.

That last bug I posted about is in every version of BOINC, v5.10.20 and below. Changelog.
ID: 695959 · Report as offensive
Jim Volfan

Joined: 22 May 99
Posts: 52
Credit: 24,239,706
RAC: 90
United States
Message 695972 - Posted: 30 Dec 2007, 19:08:38 UTC - in response to Message 695959.  

I must have posted right at the end of the flurry of bad downloads. I ended up with 40 failed workunits. I have downloaded several since then that are awaiting CPU time. Thanks for the responses.
ID: 695972 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 696154 - Posted: 31 Dec 2007, 3:32:47 UTC


Well, things were looking good for a while there, but now the traffic's gone back up to 58Mb/s again.
Hopefully it's just a short spurt as some of the shorties that have reached their report deadline are being re-issued & not a whole new batch of them.
Grant
Darwin NT
ID: 696154 · Report as offensive
Andre Howard
Volunteer tester
Joined: 16 May 99
Posts: 124
Credit: 217,463,217
RAC: 0
United States
Message 696159 - Posted: 31 Dec 2007, 3:42:38 UTC - in response to Message 696154.  
Last modified: 31 Dec 2007, 3:43:30 UTC


Well, things were looking good for a while there, but now the traffic's gone back up to 58Mb/s again.
Hopefully it's just a short spurt as some of the shorties that have reached their report deadline are being re-issued & not a whole new batch of them.


No, these are brand new ones. I recently hit 'allow new work' and got a boatload dated Sept 28 06.

ID: 696159 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 697741 - Posted: 5 Jan 2008, 23:35:58 UTC


I just noticed a couple of my uploads weren't going through, so I checked the network graphs & the traffic had been dropping off drastically for the last couple of hours with the odd spike. Checked Scarecrow's graphs & it looks like there's a blockage somewhere- results waiting for assimilation are building at about 500/hr.
Grant
Darwin NT
ID: 697741 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 697748 - Posted: 6 Jan 2008, 0:21:07 UTC - in response to Message 697741.  
Last modified: 6 Jan 2008, 0:23:55 UTC


I just noticed a couple of my uploads weren't going through, so I checked the network graphs & the traffic had been dropping off drastically for the last couple of hours with the odd spike. Checked Scarecrow's graphs & it looks like there's a blockage somewhere- results waiting for assimilation are building at about 500/hr.

Have a look at Scarecrow's Server / Daemon History page - Assimilator 1 has been on strike for three of the last four hours.

[Edit - mind you, don't read too much into performance at this time of night. I think the daily stats export is quite a hit on the database.]
ID: 697748 · Report as offensive
P . P . L .
Volunteer tester

Joined: 7 Jun 03
Posts: 86
Credit: 161,216
RAC: 0
Australia
Message 697970 - Posted: 6 Jan 2008, 20:45:12 UTC

I'm getting this now. Does something need a kick?

1/7/2008 7:32:30 AM|SETI@home|Sending scheduler request: To fetch work. Requesting 24264 seconds of work, reporting 0 completed tasks

1/7/2008 7:32:35 AM|SETI@home|Scheduler request failed: HTTP service unavailable
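If you save the message log to a text file, lines like the two above are easy to sift for scheduler failures. A rough sketch, assuming the pipe-separated timestamp|project|message layout seen in this post (other BOINC versions print the log differently, as the bracketed lines earlier in the thread show); `LogEntry`, `parse_log_line` and `is_scheduler_failure` are names invented for this example.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str  # kept as text, since the date format depends on locale
    project: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if it does not fit."""
    parts = line.strip().split("|", 2)  # at most 3 fields, so '|' in the message survives
    if len(parts) != 3:
        return None
    return LogEntry(*parts)


def is_scheduler_failure(entry: LogEntry) -> bool:
    """True for messages that begin with 'Scheduler request failed'."""
    return entry.message.startswith("Scheduler request failed")
```

Running every line of a saved log through `parse_log_line` and counting the entries where `is_scheduler_failure` is true gives a quick idea of how often the scheduler was unreachable.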

pete.


ID: 697970 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19062
Credit: 40,757,560
RAC: 67
United Kingdom
Message 698005 - Posted: 6 Jan 2008, 22:37:32 UTC

Cricket graphs have taken a tumble.

ID: 698005 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 698655 - Posted: 9 Jan 2008, 17:48:07 UTC


Looks like another batch of shorties going through- forums are sluggish & a dozen downloads are stuck.
Grant
Darwin NT
ID: 698655 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 698660 - Posted: 9 Jan 2008, 17:57:22 UTC - in response to Message 698655.  


Looks like another batch of shorties going through- forums are sluggish & a dozen downloads are stuck.

Same here.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 698660 · Report as offensive
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 698669 - Posted: 9 Jan 2008, 18:28:02 UTC - in response to Message 698660.  


Looks like another batch of shorties going through- forums are sluggish & a dozen downloads are stuck.

Same here.


Who knows here, but based on what you all are reporting and the whining / complaining that Mr. Anderson is doing about credit ratios after doing all this needless tinkering with social features that weigh down the db I/O, I have knocked another 5% off of my Intel system's resource allocation to SETI. I'm now down to only 15%...
ID: 698669 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 698863 - Posted: 10 Jan 2008, 8:52:07 UTC


Just had a look at Scarecrow's graphs & it looks like there's a problem with the assimilators again. 225,000 results waiting for assimilation & climbing rapidly.
Grant
Darwin NT
ID: 698863 · Report as offensive
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 698865 - Posted: 10 Jan 2008, 9:01:13 UTC - in response to Message 698863.  


Just had a look at Scarecrow's graphs & it looks like there's a problem with the assimilators again. 225,000 results waiting for assimilation & climbing rapidly.


I suspect it was the wave of fast overflows from 22fe07ah that I posted about and/or something similar... Of course, there could also be a stuck result like what Matt mentioned...

:really heads toward bed now:
ID: 698865 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 700036 - Posted: 14 Jan 2008, 17:52:19 UTC


Anyone else getting this message?
15/01/2008 3:04:39|SETI@home|Scheduler request failed: couldn't resolve host name
Forums & web site are nice & responsive, all other web sites are nice & quick. But when BOINC tries to contact the Scheduler I'm getting that error (been happening for just on 15 minutes).
Grant
Darwin NT
ID: 700036 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.