Panic Mode On (79) Server Problems?
Mark Fiske Joined: 15 Aug 11 Posts: 713 Credit: 7,392,921 RAC: 0 |
Well, I wasn't expecting this, but I just got 94 CPU WUs out of the blue. Happy Camper! Mark |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Timeouts, Timeouts, Timeouts!!!!!!! Back to Timeouts, Timeouts, Timeouts!!!!!!! again. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Timeouts, Timeouts, Timeouts!!!!!!! I think it's now just throwing random errors. Timeouts, couldn't connect & failure when receiving data from the peer, depending on the mood it's in. I've even received a No tasks sent, but that was on some GPU requests- I got a whole bunch of VLARs on one CPU request, so at least that particular response makes sense. Grant Darwin NT |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Use a proxy; with them the communications errors are minimized, and you could easily rebuild your caches. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Use a proxy; with them the communications errors are minimized, and you could easily rebuild your caches. Then I've got the hassle of finding a working proxy, then finding a new one every few days when the working one no longer does. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Just noticed that the AP Science Database & Assimilators haven't been running for a few days- lots of work to be assimilated is backing up. Grant Darwin NT |
Keith White Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
Just posting what I'm currently getting in my event log, since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.

11/25/2012 1:01:03 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:01:03 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:01:25 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:01:25 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:01:26 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:03:06 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:03:06 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:03:29 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:03:29 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:03:30 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:06:11 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:06:11 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:06:34 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:06:34 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:06:35 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.

"Life is just nature's way of keeping meat fresh." - The Doctor |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it. I've been getting a lot of can't connect errors here too. Not on your end. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Keith White Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it. See what I mean... VOODOO!

11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:27:50 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:27:50 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:29:03 PM | SETI@home | Scheduler request completed: got 11 new tasks

"Life is just nature's way of keeping meat fresh." - The Doctor |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Use a proxy; with them the communications errors are minimized, and you could easily rebuild your caches. As Grant has basically said, it doesn't always work... they are waking up to the traffic that SETI puts through, and soon this avenue will be closed for many of us... what they need to do is increase the bandwidth beyond 100Mbps... |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it. Getting a lot of that here as well... suspect that many of us will be... |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Use a proxy; with them the communications errors are minimized, and you could easily rebuild your caches. That would help (massively- till the next bottleneck is hit), but what doesn't make sense is why using a proxy gives better connections & speeds than not using one. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Even with all the weirdness going on, my systems have managed to stay busy while at work. And while the inbound network traffic has been rather odd (little peaks here & there, gradually increasing overall) since coming back up after the multiple Scheduler breakdowns, there have been a couple of significant dips while I was away. And they also affected the download traffic. Grant Darwin NT |
Gary Charpentier Joined: 25 Dec 00 Posts: 30651 Credit: 53,134,872 RAC: 32 |
Use a proxy; with them the communications errors are minimized, and you could easily rebuild your caches. Because, as Eric stated, there is a problem upstream from SSL, possibly in the Campus tunnel. It has nothing to do with pipe size. Oh, and can you imagine how much worse the scheduler ghost woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts? IIRC Eric was able to get a test in, and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question: do you think the hardware can take 5X additional load 24/7, or what will break next? |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Use a proxy, with them the comunications errors are minimized, and you could easely rebuild your caches. maximum throughput (down to us) is governed by the maximum rate at which work can be created and sent...that is the natural ceiling...given a maximum down rate you can approximate a maximum up rate based on average returned work unit size...the pipe should be wider than these to allow for other things such as overhead/management traffic... |
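[Editor's note] Lionel's ceiling argument can be made concrete with a rough back-of-the-envelope calculation. The workunit and result sizes below are illustrative assumptions, not measured project values:

```python
# Rough throughput ceiling for a 100 Mbps download pipe.
# WU_SIZE_MB and RESULT_SIZE_KB are assumed averages for illustration only.

LINK_MBPS = 100        # assumed download link capacity (Mbps)
WU_SIZE_MB = 0.366     # assumed average workunit download size (MB)
RESULT_SIZE_KB = 30    # assumed average result upload size (KB)

link_mb_per_s = LINK_MBPS / 8                   # Mbps -> MB/s
wus_per_second = link_mb_per_s / WU_SIZE_MB     # max workunits sent per second
upload_mb_per_s = wus_per_second * RESULT_SIZE_KB / 1024  # implied upload rate

print(f"max ~{wus_per_second:.0f} workunits/s down")
print(f"implied ~{upload_mb_per_s:.2f} MB/s of result uploads")
```

With these assumed sizes the download link saturates at roughly 34 workunits per second, implying about 1 MB/s of result uploads; as the post says, the pipe needs headroom beyond that for scheduler and management traffic.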
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Oh and can you imagine how much worse the scheduler ghosts woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts? Maybe, maybe not. When the Scheduler was using the campus network, it was responding in less than 7 seconds, often within 2-4. So it would appear the network congestion is a factor- remove it & no more ghosts at all. IIRC Eric was able to get a test in and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question, do you think the hardware can take 5X additional 24/7 or what will break next? Keep in mind, if there were a 5-fold increase in available bandwidth, the load on the servers would drop 5 times faster. The load would probably be less than it is now because there wouldn't be all the re-tries going on, or the accumulation of ghosts. I have no doubt we'd find some new major problem sooner rather than later, but it would completely erase several existing ones. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Inbound & outbound traffic is plummeting. Hopefully it'll recover again like it's done twice previously. *fingers crossed* Grant Darwin NT |
musicplayer Joined: 17 May 10 Posts: 2430 Credit: 926,046 RAC: 0 |
And I got a new batch of jobs coming my way. Thanks! |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
It would be fun to play some other game for a while. Let's try it and see! |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I've been speculating for at least two years now that if we were able to increase the pipe to even just 200mbit, I'm not sure the scheduler/feeder would handle it. Even two years ago, when GPUs were much slower and less common, the database was having trouble keeping up. Of course, we have better hardware now, but I think it's significantly more likely we'll run into a software limitation, on top of the disk I/O limitation. A bigger pipe will likely cause more issues without some sort of restraint (per-host limits are a simple way to do it, but there are better ways... like server-side cache size based on DCF).

It was a good idea to run the scheduler on a different link, as long as that can be reliable. It will at least allow a high rate of successful contacts to report work and be assigned new work, and then you just have to fight for bandwidth on the download link, which in the grand scheme of things isn't that huge of an issue. You wouldn't end up with ghosts, you'd just end up with 10+ hour back-offs, but you can overcome those with some manual intervention, or with some less draconian exponential back-off calculations in the client itself.

Maybe once the scheduler reliability issues get sorted out, we can test the database's capability to keep up for a 24-hour period by updating some DNS records and ramping the bandwidth up? Maybe pick a Saturday, or we have the winter holidays coming up, when the campus will be empty except for a select few faculty members. Could do a 1-3 day test on 200+ mbit then, assuming the red tape can be removed temporarily for such a thing.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
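[Editor's note] The "less draconian" back-off Cosmic_Ocean mentions is easy to sketch. This is not BOINC's actual retry code, just a generic capped exponential back-off with jitter of the kind clients commonly use to avoid both 10+ hour waits and synchronized retry storms:

```python
import random

def retry_delay(attempt, base=60, cap=3600):
    """Capped exponential back-off with full jitter (illustrative sketch).

    attempt: number of consecutive failed scheduler requests (0-based)
    base:    delay after the first failure, in seconds
    cap:     hard ceiling, so delays never grow into the 10+ hour range
    """
    delay = min(cap, base * (2 ** attempt))
    # Full jitter spreads retries out so thousands of hosts don't hammer
    # the scheduler in synchronized waves after an outage ends.
    return random.uniform(0, delay)

# Deterministic upper bounds: 60, 120, 240, ... seconds, capped at 1 hour.
print([min(3600, 60 * 2 ** n) for n in range(8)])
```

Capping the delay and adding jitter addresses both complaints in the thread: no host backs off for half a day, and recovery traffic ramps up smoothly instead of in spikes.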
©2024 University of California
SETI@home and AstroPulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.