Message boards :
Number crunching :
Panic Mode On (5) Server Problems!
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Yep, looks as though someone's been able to kick start things. I'll give it a few more hours to settle down & then enable network access again. Grant Darwin NT |
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
I still have a couple stuck but most are getting through. Guess the two stuck are just being stubborn. Guess I'll worry about it when I sober.....uhh wake up. PROUD MEMBER OF Team Starfire World BOINC |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Just re-enabled network access; even though traffic is still extremely high (up & down) all results returned first try, no errors & same for new work downloading. Grant Darwin NT |
Clyde C. Phillips, III Send message Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
It looks like Richard Haselgrove's picture was taken just before the end of the workday - at ten-'til-four, PM. It wasn't ten-twenty AM because the hour hand would've been positioned incorrectly. It would've been dark had the PMs and AMs been reversed. I conclude that the cops would have had a happy day unless all there had drunk in moderation or had taken a bus, taxi or train home. |
Dingo Send message Joined: 28 Jun 99 Posts: 104 Credit: 16,364,896 RAC: 1 |
It looks like Richard Haselgrove's picture was taken just before the end of the workday - at ten-'til-four, PM. It wasn't ten-twenty AM because the hourhand would've been positioned incorrectly. It would've been dark had the PMs and AMs been reversed. I conclude that the cops would have had a happy day unless all there had drunken in moderation or had taken a bus, taxi or train home. Seti usually comes back up quickly and it is very stable. I think that the admins do a great job under pressure from all of us jumping in at the first sign of work not coming to us. Proud Founder and member of Have a look at my WebCam |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Just had a look at the network graphs- things aren't looking healthy at all. Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
And the splitters all went splat before the server status page last updated itself, three hours ago.... |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
And the splitters all went splat before the server status page last updated itself, three hours ago.... Good thing it's Monday over there, another 4-5 hours & they can inspect the patient physically. Grant Darwin NT |
Astro Send message Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
Please....Mister.... How's about just a few "good ole" stuck wus. Ah...the stuck wu.....those were the "good days"..... LOL |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49 |
And the splitters all went splat before the server status page last updated itself, three hours ago.... As long as the Patient doesn't turn into a Munster named Herman, Couldn't resist :D, Hopefully what is off can be turned back on. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
And the splitters all went splat before the server status page last updated itself, three hours ago.... Maybe the patient should be named Abby Normal. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
And the splitters all went splat before the server status page last updated itself, three hours ago.... ... or Hans Delbrück. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
What is it about the weekends? Are people making their caches even bigger still? That should increase the moaning about pending credit even further. Network traffic has had a case of the stutters for about 19 hours, the Ready to Send buffer has been steadily dropping, the Result Creation Rate has increased in response, but still the In Progress number continues to climb (almost up to 2.2 million). I've been getting quite a few short Work Units lately, but I wouldn't have thought that alone would cause such a large increase in the demand for work. Grant Darwin NT |
archae86 Send message Joined: 31 Aug 99 Posts: 909 Credit: 1,582,816 RAC: 0 |
I've been getting quite a few short Work Units lately, but i wouldn't have thought that alone would cause such a large increase in the demand for work. It would cause an absolutely huge swing in demand if everyone were running short queues. I infer from the fairly rapid swing that an appreciable fraction of the compute resource is short-queued. Since we spend a lot of time here decrying the "selfish" long-queue folks, perhaps this is a good moment to recognize that they are a moderating influence on this particular aspect of the problem. |
zoom3+1=4 Send message Joined: 30 Nov 03 Posts: 65746 Credit: 55,293,173 RAC: 49 |
I've been getting quite a few short Work Units lately, but i wouldn't have thought that alone would cause such a large increase in the demand for work. The bandwidth does seem a bit limited during the last download, Most of the time It's system connect, It's not really affecting My quads, But My dual cores lose some processing ability and PC5(E4300) is trying to get 5 WU's, Where as PC1(QX6700) only wants one WU right now. Below is an example from PC1, I put PC5 online to help increase My crunching and so far It hasn't done much besides try and climb a slippery slope very slowly before sliding back some. :(

9/21/2007 7:42:14 PM|SETI@home|[file_xfer] Started download of file 04mr07ab.20521.4162.15.6.51
9/21/2007 7:42:18 PM|SETI@home|Sending scheduler request: To report completed tasks
9/21/2007 7:42:18 PM|SETI@home|Reporting 1 tasks
9/21/2007 7:42:23 PM|SETI@home|Scheduler RPC succeeded [server version 511]
9/21/2007 7:42:23 PM|SETI@home|Deferring communication 11 sec, because requested by project
9/21/2007 7:43:01 PM|SETI@home|[file_xfer] Finished download of file 04mr07ab.20521.4162.15.6.51
9/21/2007 7:43:01 PM|SETI@home|[file_xfer] Throughput 8273 bytes/sec

I almost feel like going to No New Tasks and aborting the potential downloads for a while. :( The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
The bandwidth does seem a bit limited during the last download, Most of the time It's system connect, Yep, starting to get some "system connect" errors here as well, although so far they are all downloading OK after a couple of attempts. Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I've been getting quite a few short Work Units lately, but i wouldn't have thought that alone would cause such a large increase in the demand for work. I'm one of the ones who has been implicitly criticising excessively large cache sizes, but I don't think that cache size, per se, affects work demand in the way you imply.

Whether your cache is set at 0.1 day or 10 days, BOINC will request new work when the threshold is crossed - when the work on hand drops below 0.099 or 9.999 days respectively. It's the size of the WU that's issued that determines how far above threshold the queue becomes, and hence how long your computer will 'go quiet' in terms of work requests. If the scheduler is issuing 3-hour WUs (at your estimated crunch speed), then your work requests will be inhibited for 3 hours. If the scheduler is issuing 30-minute WUs, you'll be back for more in half an hour.

There's a secondary effect because of the variable accuracy of the 'crunchability' estimates for different WUs, reflected in RDCF. When you complete a particularly indigestible WU, RDCF jumps up. If you have a large cache, that makes a big (absolute) difference to the estimated amount of work on hand, and work requests will be inhibited for a long time - several hours or even days. Then, as you crunch on 'sweeter' WUs, RDCF will fall, and you will start to request top-up work at a slightly greater rate than your actual crunching speed. So the work requests of large-cache hosts will be slightly more erratic, but the variation depends on what is being crunched, not on what is being issued. Demand is decoupled from supply, and I don't think the RDCF effect has a significant impact on server performance.

I'm concerned about the (very) large caches because of the stress they put on the Berkeley servers. Results "in progress" peaked at 2,155,052 overnight. That must put a tremendous strain on every component of the work allocation and download sub-system: everything from feeder queries to file system directory lookups. I would suggest that, even with the outages we've seen recently, a 2 or 3 day cache would be a more rational choice. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I would suggest that, even with the outages we've seen recently, a 2 or 3 day cache would be a more rational choice. My cache is set to 4 days, which, with the RDCF moving about due to different Work Units, gives me around a 3.5 day turnaround time. I've only run out of work twice in the last 3 years or so. Grant Darwin NT |
archae86 Send message Joined: 31 Aug 99 Posts: 909 Credit: 1,582,816 RAC: 0 |
Since we spend a lot of time here decrying the "selfish" long-queue folks, perhaps this is a good moment to recognize that they are a moderating influence on this particular aspect of the problem. Thanks Richard. I think I had that wrong. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Since we spend a lot of time here decrying the "selfish" long-queue folks, perhaps this is a good moment to recognize that they are a moderating influence on this particular aspect of the problem.

Consider a machine which does work of AR (angle range) 0.41 in about 3 hours and has a DCF (Duration Correction Factor) which happens to be right for that AR. Then suppose the splitters start producing AR 1.5 work. The estimates say they will take 1/6 the time of AR 0.41, or 1/2 hour, so a request for 3 hours of work will get 6 WUs. Because the actual crunch time is about 1/4 the time of AR 0.41, completion of one of the AR 1.5 WUs will increase DCF by about 50%. If the queue looked like 1 day just before completion of that WU, it looks like 1.5 days after. That inhibits work requests for about half a day if the desired queue was 1 day, and larger queue settings magnify that effect.

However, a host with a queue of 2 days may download a lot of the AR 1.5 WUs before actually crunching one. A host with maximized queue settings may go into EDF (Earliest Deadline First) and get the DCF adjustment fairly soon. Joe |
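Joe's arithmetic can be checked in a few lines of Python. The 3-hour AR 0.41 baseline and the 1/6-estimated vs 1/4-actual ratios are his figures; the instant DCF jump is a simplification for illustration (the real client's adjustment rules are an assumption not modelled here):

```python
base_hours = 3.0          # crunch time for AR 0.41 work on this host
est_ratio = 1 / 6         # estimate: AR 1.5 takes 1/6 as long
actual_ratio = 1 / 4      # reality: AR 1.5 takes about 1/4 as long

estimated = base_hours * est_ratio    # 0.5 h estimated per AR 1.5 WU
actual = base_hours * actual_ratio    # 0.75 h actual per AR 1.5 WU

# A request for 3 hours of work at the optimistic estimate gets 6 WUs.
wus_fetched = 3.0 / estimated

# Completing one raises DCF by the actual/estimated ratio: +50%.
dcf_jump = actual / estimated

# A queue that looked like 1 day now looks like 1.5 days, inhibiting
# work requests for about half a day on a 1-day queue setting.
queue_after_days = 1.0 * dcf_jump
```

The same 50% jump scales with the queue setting, which is why larger queues see their work requests inhibited for proportionally longer after one slow WU completes.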
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. Astropulse is funded in part by the NSF through grant AST-0307956.