Panic Mode On (28) Server problems
Author | Message |
---|---|
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Maybe I did not state my case clearly....... You made your case perfectly clear, and I understood it perfectly. You want all the available bandwidth, and if doing so slows everyone down including you then that's what you really want. The technical term for what you want is a "Denial of Service Attack." |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
I want it to plonk the servers anytime it needs to get work or report it. The reason it takes a couple of days to clear is because of all the attempts to get work or report it. If that didn't happen, what takes 2 days to recover from would only take half a day. It's the continuous retries that cause the problem. Grant Darwin NT |
FiveHamlet Send message Joined: 5 Oct 99 Posts: 783 Credit: 32,638,578 RAC: 0 |
Looks to me like another shorties storm. Got over 400 on Rig on a Bench and 200 plus on my other main cruncher. Dave |
Dave Send message Joined: 29 Mar 02 Posts: 778 Credit: 25,001,396 RAC: 0 |
What about the client "plinking" the server after a random time interval, e.g. it could be 1 min, could be 30, could be 3 hours? |
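The random-interval idea above could look something like the following. This is only a minimal sketch, not what the BOINC client actually does: the function name `random_retry_delay` and the 1 minute to 3 hour bounds are assumptions taken from the numbers in the post.

```python
import random


def random_retry_delay(min_seconds=60.0, max_seconds=3 * 3600.0, rng=random):
    """Pick a retry delay uniformly between min_seconds and max_seconds.

    Hypothetical helper: the bounds come from the "1 min ... 3 hours"
    suggestion in the post, not from any real BOINC setting.
    """
    return rng.uniform(min_seconds, max_seconds)


if __name__ == "__main__":
    # If many clients fail at the same moment, random delays spread their
    # next attempts over the whole window instead of one synchronized burst.
    delays = sorted(random_retry_delay() for _ in range(10))
    print([round(d / 60) for d in delays])  # minutes until each client retries
```

The point of the randomness is only to desynchronize clients; it does nothing to reduce the total number of retries, which is what the backoff discussion further down is about.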
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
What about the client "plinking" the server after a random time interval e.g it could be 1 min, could be 30, could be 3 hours? For anyone who is really interested in this, I'd recommend reading RFC-2821 (you can Google for it), because internet E-Mail has all of the same issues. The section on "sending strategies" (section 4.5.4.1, under "Retry Strategies") goes straight to what the BOINC client is trying to do with the BOINC servers. As a goal, you want the minimum number of connections per second to the server that are required to fully use the available resources. Double that number and everything takes twice as long, but the same number of "things" happens per minute, so staying on the low edge gives some room for bursts, etc. The project-wide backoff idea comes right out of RFC-2821. It says: Retries continue until the message is transmitted or the sender gives up; the give-up time generally needs to be at least 4-5 days. The parameters to the retry algorithm MUST be configurable. A client SHOULD keep a list of hosts it cannot reach and corresponding connection timeouts, rather than just retrying queued mail items. Experience suggests that failures are typically transient (the target system or its connection has crashed), favoring a policy of two connection attempts in the first hour the message is in the queue, and then backing off to one every two or three hours. The second paragraph is the interesting one. The idea is that if the receiving mail server can't accept mail right this second, the next message to them in the queue isn't likely to succeed if we send it right now. (I would not recommend a 4-5 day timeout for BOINC; it does not share that requirement with SMTP.) If the client backoff was fairly extreme (or took due-date into account so that uploads for work due in two weeks were on a more leisurely schedule) the load(s) on the SETI@Home servers would not have the peaks and valleys. 
But it would leave the average cruncher shocked and concerned because they've never seen how their E-Mail is handled, but they can see what the BOINC client is doing to try to send work. ... and it's the same issue, especially at busy sites. Implement some BIG backoffs, and watch everything go through on the second try. |
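The "keep a list of hosts it cannot reach" idea quoted from the RFC can be sketched roughly as below. This is an illustration under stated assumptions, not the BOINC client's real scheduler: the class name `HostBackoff`, the 30 minute starting delay, the 3 hour cap, and the jitter range are all made up for the example.

```python
import random


class HostBackoff:
    """Per-host backoff: one failure delays every queued item for that host."""

    def __init__(self, base_delay=1800.0, max_delay=3 * 3600.0, rng=None):
        self.base_delay = base_delay  # assumed first delay (30 min)
        self.max_delay = max_delay    # assumed cap (3 hours)
        self.rng = rng if rng is not None else random.Random()
        self._failures = {}           # host -> consecutive failure count
        self._retry_at = {}           # host -> earliest time of next attempt

    def may_try(self, host, now):
        """True if the host is not currently backed off."""
        return now >= self._retry_at.get(host, 0.0)

    def record_failure(self, host, now):
        """Double the delay on each consecutive failure, up to the cap."""
        n = self._failures.get(host, 0) + 1
        self._failures[host] = n
        delay = min(self.base_delay * 2 ** (n - 1), self.max_delay)
        delay *= self.rng.uniform(0.5, 1.0)  # jitter so clients do not sync up
        self._retry_at[host] = now + delay
        return delay

    def record_success(self, host):
        """Clear the backoff state once the host answers again."""
        self._failures.pop(host, None)
        self._retry_at.pop(host, None)
```

A client using something like this would check `may_try(host, now)` once before touching any of its queued uploads or reports for that host, so a single failed connection postpones the whole queue rather than triggering one retry per task.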
kittyman Send message Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Not really a panic, but the Cricket graphs seem to have gone wonky. "Time is simply the mechanism that keeps everything from happening all at once." |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Not really a panic, but the Cricket graphs seem to have gone wonky. Someone mentioned that in the tech news post earlier. I was thinking it looked like the update job was still going, but just not collecting data. I've realized that when that normally happens the graph will stay at whatever level it was when it last updated. The current graph shows mostly nothing, literally "Cur: nan bits/sec". Tho that could just be how cricket responds to not getting new data. I use MRTG instead of cricket. So I don't really know its ins and outs. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Keith T. Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
It's not just SETI's cricket graph that is down, I tried looking at a few other Berkeley routers and they are all missing data for the same period. |
52 Aces Send message Joined: 7 Jan 02 Posts: 497 Credit: 14,261,068 RAC: 67 |
2/3/2010 1:57:12 PM SETI@home Requesting new tasks for CPU Something just sucked away the spare inventory. It dropped fast. Maybe Higley school district is back online :-) Good news is lots of 'tapes' still being split. |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
Good news is lots of 'tapes' still being split. I wouldn't be sure about it. Current result creation rate: 3.3043/sec. Those are probably just resends. EDIT: It may have something to do with "Workunits waiting for assimilation: 321,084". If they are not getting assimilated, they cannot be deleted -> no disc space. AFAIR we had that at least once. |
W-K 666 Send message Joined: 18 May 99 Posts: 19314 Credit: 40,757,560 RAC: 67 |
Good news is lots of 'tapes' still being split. I think you are correct. For the last few hours, since 05:45 UTC, the only message I get when requesting work is "no work from project". The server status page is not reporting problems, except the "Workunits waiting for assimilation" numbers. And the cricket graph is all over the place. |
Fred W Send message Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
Oh dear - the server status page hasn't updated since 09:20 UTC and the cricket graph has taken a dive... F. |
Matthew S. McCleary Send message Joined: 9 Sep 99 Posts: 121 Credit: 2,288,242 RAC: 0 |
"Message from server: (Project has no jobs available)" on several of my crunchers. Server status page hasn't been updated in six hours, but reports 51,000 multibeam available. |
kittyman Send message Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Oh dear - the server status page hasn't updated since 09:20 UTC and the cricket graph has taken a dive... The cricket is not chirping much......meow. "Time is simply the mechanism that keeps everything from happening all at once." |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I'd guess it's related to the issue Matt posted in the Tech News last night/this morning. Seems the science DB is being a bit fussy. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
"Message from server: (Project has no jobs available)" on several of my crunchers. At least there are some generous people out there. Somebody detached a 32-core SUN SPARC-Enterprise just as the WUs ran out, and donated 400 tasks to the common good. |
Matthew S. McCleary Send message Joined: 9 Sep 99 Posts: 121 Credit: 2,288,242 RAC: 0 |
"Message from server: (Project has no jobs available)" on several of my crunchers. Man, that guy has two 64-CPU, two 32-CPU, and two 8-CPU Suns. Wish I had that kind of hardware to monkey with. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
"Message from server: (Project has no jobs available)" on several of my crunchers. They may not be out of work, or tapes. Could just be the process feeding the feeder has stopped. Without the pages displaying the server status updating it's unknown what might be going on. Uploads & reporting are going on w/o any problems tho. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Luke Send message Joined: 31 Dec 06 Posts: 2546 Credit: 817,560 RAC: 0 |
"Message from server: (Project has no jobs available)" on several of my crunchers. Server status page hasn't been updated in six hours, but reports 51,000 multibeam available. Same problem here. Uploading & reporting are fine though. Good time to give my cache a purge. I've set NNT on all machines. Perhaps I'll run my laptop on PrimeGrid for a few days, once I'm out of S@H tasks. - Luke. |
FiveHamlet Send message Joined: 5 Oct 99 Posts: 783 Credit: 32,638,578 RAC: 0 |
Secondary science database has been disabled and the Server Status page has just got up to date. Things might start to recover soon. With some luck and a fair wind. Now getting the "Project is temp shut down for maintenance" message. Thanks to the team. Dave |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.