Daily midnight traffic peaks



Message boards : Number crunching : Daily midnight traffic peaks

archae86
Send message
Joined: 31 Aug 99
Posts: 888
Credit: 1,572,688
RAC: 42
United States
Message 871354 - Posted: 2 Mar 2009, 15:53:20 UTC
Last modified: 2 Mar 2009, 15:54:37 UTC

While I recall sometimes in the past seeing clear network traffic peaks after midnight, I think there has been little of that for some time. But on four of the most recent five days, the network activity graphs have shown a strong peak: traffic quickly rises to near the bandwidth limit, holds for perhaps an hour, then begins a slower but still rather rapid decay back down to the 50 to 70 Mbits/sec range which seems to be the short-term norm lately.

Can it be that enough hosts are still chewing up their 100/CPU daily limit to create this demand spike as they go on to their next day of discards?

Clues on the SETI server status page are also interesting. It shows an AP result creation rate well over half that of MB, though the AP results-received rate is barely a twentieth of it. Taken at face value, this suggests that the very large majority of distributed AP results are erroring out or otherwise not being successfully returned.

Another possibility might be that some of the AP numbers include both 5.00 and 5.03, while others cover only 5.03. My anecdotal observation is that a newly issued 5.00 result is usually for a WU which already has one valid return, so credit is granted immediately on the next return, while a new 5.03 typically has a multi-week wait for its quorum partner.

[edit: bolded the links]
____________

Cosmic_Ocean
Avatar
Send message
Joined: 23 Dec 00
Posts: 2245
Credit: 8,586,711
RAC: 4,199
United States
Message 871359 - Posted: 2 Mar 2009, 16:02:04 UTC

It might not be errors. It is possible that some of the faster hosts (Q9xxx quads, or fast Xeons, with one or more GPUs) are simply crunching more than the quota allows per day. At midnight the quota resets, and those systems fill their caches until they hit the quota again.

I think this has been around for quite a while, even before the first AP release. Of course, with AP's occasional problems and its file size being ~24x larger, the midnight spike becomes amplified.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Josef W. Segur (Project donor)
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4227
Credit: 1,042,553
RAC: 346
United States
Message 871448 - Posted: 2 Mar 2009, 21:37:09 UTC - in response to Message 871354.
Last modified: 2 Mar 2009, 21:39:12 UTC

...
Can it be that enough hosts are still chewing up their 100/CPU daily limit to create this demand spike as they go on to their next day of discards?

Maybe, or maybe it's just hosts with huge queues they're trying to fill. With the current WUs almost all having deadlines of 3 weeks or more, there are few which the Scheduler will consider infeasible for such hosts, and the host work fetch requests are not inhibited due to work being done at high priority.

Clues on the SETI server status page are also interesting. It shows an AP result creation rate well over half that of MB, though the AP results-received rate is barely a twentieth of it. Taken at face value, this suggests that the very large majority of distributed AP results are erroring out or otherwise not being successfully returned.
...

Scarecrow's graphs for AP and Enhanced are showing the average AP result creation rate as 0.625 and the average S@H Enhanced rate as 8.782. Both are operating in the good mode where they build the "Results ready to send" up to a high water mark, then rest for a while [1]. That makes the most recent sample on the Server status page fairly meaningless in isolation.

[1] Even when the splitters are resting, there's some Result creation from reissues.
Joe

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 871483 - Posted: 2 Mar 2009, 22:43:41 UTC - in response to Message 871354.

Can it be that enough hosts are still chewing up their 100/CPU daily limit to create this demand spike as they go on to their next day of discards?

"Midnight" for the purposes of daily quotas is Midnight, UTC. That's around 4:00pm PST (Berkeley time).

The Cricket graphs look like they're on local time, which is basically UTC -8 hours.

So the peak is roughly 0800 UTC -- which does not line up with the "fresh" quotas at midnight.
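The timezone arithmetic here can be checked directly; a minimal sketch (assuming Berkeley is on PST, UTC-8, with no DST in early March 2009):

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: fixed UTC-8 offset (PST); US DST did not
# start until 8 Mar 2009, so it holds for the dates discussed here.
PST = timezone(timedelta(hours=-8))

# Midnight on the Cricket graphs, if they run on local (PST) time:
local_midnight = datetime(2009, 3, 3, 0, 0, tzinfo=PST)
print(local_midnight.astimezone(timezone.utc))  # 2009-03-03 08:00:00+00:00

# Midnight UTC (the quota-reset time, if quotas reset on UTC):
utc_midnight = datetime(2009, 3, 3, 0, 0, tzinfo=timezone.utc)
print(utc_midnight.astimezone(PST))  # 2009-03-02 16:00:00-08:00
```

So a peak at local (PST) midnight shows up as 0800 UTC, while a UTC-midnight quota reset would land at 4:00pm the previous afternoon, Berkeley time.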

____________

archae86
Send message
Joined: 31 Aug 99
Posts: 888
Credit: 1,572,688
RAC: 42
United States
Message 871485 - Posted: 2 Mar 2009, 22:46:08 UTC - in response to Message 871448.

Scarecrow's graphs for AP and Enhanced are showing the average AP result creation rate as 0.625 and the average S@H Enhanced rate as 8.782. Both are operating in the good mode where they build the "Results ready to send" up to a high water mark, then rest for a while [1]. That makes the most recent sample on the Server status page fairly meaningless in isolation.
Joe
Ah, I missed that completely.

On the other hand, clicking through a small sample of actual newly downloaded AP 5.03 work does seem to show that many of the quite common _2, _3, _4 results are resends of work sent today which drew a prompt client error/compute error with 0.00 CPU time, on hosts which are doing this repeatedly.

In one case, clicking down to the task level gets this:

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
CreateProcess() failed - A required privilege is not held by the client. (0x522)
</message>
]]>


in others this:
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
]]>


in others this:
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
CreateProcess() failed - Access is denied. (0x5)
</message>
]]>


in others (actually these logged client error/downloading repeatedly, and obviously the specific file reference varies):
<core_client_version>6.2.15</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>ap_06dc08ah_B1_P0_00065_20090213_06338.wu</file_name>
<error_code>-119</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>

</message>
]]>


All of these came from failed quorum partners of just one host, for downloads in the last day, and in most if not all cases the offending host had failed at least a whole page of tasks.

Certainly some of this goes on all the time, but, despite my error on the creation-rate evidence, I think it is rather more prevalent on AP 5.03 at the moment than the historic norm.

Many of them, however, were not managing anywhere near 100*ncpu failures a day, more like 2, most likely because they had failed enough in the past to get down to the 1/day/CPU limit.
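The "MD5 check failed" error quoted above is the BOINC client comparing the downloaded file's digest against the one the scheduler advertised; a minimal sketch of that kind of check (the function name is mine, and the file name and digest in the comment are purely illustrative):

```python
import hashlib

def md5_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's MD5 digest equals expected_hex (case-insensitive)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large WU files don't have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()

# Hypothetical usage; real WU names and expected digests come from the
# scheduler reply, e.g.:
# md5_matches("ap_06dc08ah_B1_P0_00065_20090213_06338.wu", "d41d8cd9...")
```

A mismatch (error -119 above) usually means the file was corrupted in transit or on disk, so the client discards it and reports a download error.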
____________

Richard Haselgrove (Project donor)
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8461
Credit: 48,823,310
RAC: 81,103
United Kingdom
Message 871487 - Posted: 2 Mar 2009, 22:54:33 UTC - in response to Message 871483.

Can it be that enough hosts are still chewing up their 100/CPU daily limit to create this demand spike as they go on to their next day of discards?

"Midnight" for the purposes of daily quotas is Midnight, UTC. That's around 4:00pm PST (Berkeley time).

The Cricket graphs look like they're on local time, which is basically UTC -8 hours.

So the peak is roughly 0800 UTC -- which does not line up with the "fresh" quotas at midnight.

I'm pretty sure that 'quota' resets at midnight, local time (@ server) - which in SETI's case means PST, in line with the Cricket graphs.

Not that I'm about to trash 99 WUs to test that theory....

Geek@Play (Project donor)
Volunteer tester
Avatar
Send message
Joined: 31 Jul 01
Posts: 2466
Credit: 85,674,440
RAC: 26,101
United States
Message 871499 - Posted: 2 Mar 2009, 23:04:51 UTC

I agree, quota resets are the most likely cause, due to all the trashed work caches and people trying to get that app_info file correct.
____________
Boinc....Boinc....Boinc....Boinc....

msattler (Project donor)
Volunteer tester
Avatar
Send message
Joined: 9 Jul 00
Posts: 38870
Credit: 577,618,846
RAC: 524,347
United States
Message 871660 - Posted: 3 Mar 2009, 8:15:41 UTC - in response to Message 871487.

Can it be that enough hosts are still chewing up their 100/CPU daily limit to create this demand spike as they go on to their next day of discards?

"Midnight" for the purposes of daily quotas is Midnight, UTC. That's around 4:00pm PST (Berkeley time).

The Cricket graphs look like they're on local time, which is basically UTC -8 hours.

So the peak is roughly 0800 UTC -- which does not line up with the "fresh" quotas at midnight.

I'm pretty sure that 'quota' resets at midnight, local time (@ server) - which in SETI's case means PST, in line with the Cricket graphs.

Not that I'm about to trash 99 WUs to test that theory....

I am pretty certain that the Cricket Graphs are running on Berkeley time.....so the spikes occur at server midnight, concurrent with the reset of the daily quotas.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Rudy
Volunteer tester
Send message
Joined: 23 Jun 99
Posts: 189
Credit: 565,196
RAC: 51
Canada
Message 871755 - Posted: 3 Mar 2009, 15:59:51 UTC
Last modified: 3 Mar 2009, 16:00:16 UTC

Looking at the Scarecrow graphs for results returned (7-day trends), it looks like most of the traffic spikes are coming from AP. At midnight, the AP results-returned rate spikes up and the crunch time spikes down.

Could there be a large number of crunchers with corrupt AP applications or app_info files still out there?

PhonAcq
Send message
Joined: 14 Apr 01
Posts: 1622
Credit: 22,099,274
RAC: 4,120
United States
Message 871770 - Posted: 3 Mar 2009, 16:31:58 UTC - in response to Message 871448.

...
Can it be that enough hosts are still chewing up their 100/CPU daily limit to create this demand spike as they go on to their next day of discards?

Maybe, or maybe it's just hosts with huge queues they're trying to fill. With the current WUs almost all having deadlines of 3 weeks or more, there are few which the Scheduler will consider infeasible for such hosts, and the host work fetch requests are not inhibited due to work being done at high priority.

Joe


Where can I find a description of how the deadlines are computed? There seems to be a lot of noise on the message boards (not surprising!). Thx.

Richard Haselgrove (Project donor)
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8461
Credit: 48,823,310
RAC: 81,103
United Kingdom
Message 871774 - Posted: 3 Mar 2009, 16:41:55 UTC - in response to Message 871770.

...
Can it be that enough hosts are still chewing up their 100/CPU daily limit to create this demand spike as they go on to their next day of discards?

Maybe, or maybe it's just hosts with huge queues they're trying to fill. With the current WUs almost all having deadlines of 3 weeks or more, there are few which the Scheduler will consider infeasible for such hosts, and the host work fetch requests are not inhibited due to work being done at high priority.

Joe


Where can I find a description of how the deadlines are computed? There seems to be a lot of noise on the message boards (not surprising!). Thx.

There's a table here, in the middle of the thread Estimates and Deadlines revisited, which documents the research Joe initiated in December 2007.

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 871866 - Posted: 4 Mar 2009, 1:10:52 UTC - in response to Message 871660.
Last modified: 4 Mar 2009, 1:11:28 UTC


I am pretty certain that the Cricket Graphs are running on Berkeley time.....so the spikes occur at server midnight, concurrent with the reset of the daily quotas.

I'm in the same time zone; it's just after 5:00pm and the Cricket Graph page says "Last updated at Tue Mar 3 17:08:43 2009".

If they were on GMT, it'd be more like Wed Mar 4 01:08:43 2009.

I don't use Cricket, I use MRTG -- same author. MRTG updates every five minutes.
____________

Cosmic_Ocean
Avatar
Send message
Joined: 23 Dec 00
Posts: 2245
Credit: 8,586,711
RAC: 4,199
United States
Message 871903 - Posted: 4 Mar 2009, 2:35:47 UTC

I just noticed something... Mark, I think you must be looking at the wrong router interface for the Cricket graph; it's supposed to be 2_3, not 5_1.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Josef W. Segur (Project donor)
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4227
Credit: 1,042,553
RAC: 346
United States
Message 871944 - Posted: 4 Mar 2009, 5:47:05 UTC - in response to Message 871903.

I just noticed something... Mark, I think you must be looking at the wrong router interface for the Cricket graph; it's supposed to be 2_3, not 5_1.

Either works for the big picture, but 5_1 seems to carry some percentage of traffic other than ours; for instance, it currently shows a 123 Mbits/sec peak reading, which would be a neat trick through the 100 Mbits/sec link.
Joe

msattler (Project donor)
Volunteer tester
Avatar
Send message
Joined: 9 Jul 00
Posts: 38870
Credit: 577,618,846
RAC: 524,347
United States
Message 871959 - Posted: 4 Mar 2009, 8:04:24 UTC - in response to Message 871944.
Last modified: 4 Mar 2009, 8:10:00 UTC

I just noticed something... Mark, I think you must be looking at the wrong router interface for the Cricket graph; it's supposed to be 2_3, not 5_1.

Either works for the big picture, but 5_1 seems to carry some percentage of traffic other than ours; for instance, it currently shows a 123 Mbits/sec peak reading, which would be a neat trick through the 100 Mbits/sec link.
Joe

If I recall correctly, the other difference is that the 2_3 graph is inside out....in other words, what it shows as 'in' is actually coming in from Seti (and maybe some other Berkeley traffic) and then sent out, and what it shows as 'out' is actually inbound from us and out to the Seti servers.
The 5_1 graph actually shows outbound from Seti and inbound from us.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Odan
Send message
Joined: 8 May 03
Posts: 91
Credit: 15,331,177
RAC: 0
United Kingdom
Message 872026 - Posted: 4 Mar 2009, 13:59:24 UTC - in response to Message 871959.

If I recall correctly, the other difference is that the 2_3 graph is inside out....in other words, what it shows as 'in' is actually coming in from Seti (and maybe some other Berkeley traffic) and then sent out, and what it shows as 'out' is actually inbound from us and out to the Seti servers.
The 5_1 graph actually shows outbound from Seti and inbound from us.



Yes, you are right; 2_3 is inside out. You will notice that a few days ago when SETI was maxed out trying to send out AP V5 the green "in" graph was maxed out.

Virtual Boss*
Volunteer tester
Avatar
Send message
Joined: 4 May 08
Posts: 417
Credit: 6,178,184
RAC: 140
Australia
Message 872036 - Posted: 4 Mar 2009, 14:38:25 UTC

Could there be a large number of crunchers with corrupt AP applications or app info still out there?


My answer would be YES.

2 out of 6 WUs I got have hosts that are trashing every WU they get.

Hosts 4077778 and 4069312.

Raistmer
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 16 Jun 01
Posts: 3396
Credit: 46,339,857
RAC: 9,834
Russia
Message 872038 - Posted: 4 Mar 2009, 14:46:05 UTC - in response to Message 872036.
Last modified: 4 Mar 2009, 14:46:20 UTC

Task 1170594009 (WU 417760101) · sent 21 Feb 2009 20:15:02 UTC · reported 1 Mar 2009 20:51:41 UTC · Over, Client error, Compute error · CPU time 464,726.30 s · claimed credit 614.09 · granted ---

So much time wasted...
It seems it would be better if AP were opt-in rather than opt-out, that is, disabled by default... It doesn't work well for set-and-forget hosts... Some user attention is required, at least to [re]install the optimized app. The stock app's crashing looks too devastating indeed.

Rudy
Volunteer tester
Send message
Joined: 23 Jun 99
Posts: 189
Credit: 565,196
RAC: 51
Canada
Message 872041 - Posted: 4 Mar 2009, 14:50:10 UTC - in response to Message 872036.



My answer would be YES

2 out of 6 WUs I got have hosts that are trashing every WU they get.


Very interesting.

Looking at when they are trashing, it seems that the quota reset is every 25 hours from the last contact and not at midnight.

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8461
Credit: 48,823,310
RAC: 81,103
United Kingdom
Message 872044 - Posted: 4 Mar 2009, 14:56:33 UTC - in response to Message 872041.

My answer would be YES

2 out of 6 WUs I got have hosts that are trashing every WU they get.

very interesting

looking at when they are trashing, it seems that the quota reset is every 25 hours from the last contact and not at midnight

Not quite.

The quota is reset at midnight: but if BOINC is backed off because of quota restrictions, it backs off to a random time in the first hour after the reset. If everyone's client called in for a new quota at exactly 00:00:01, we'd see a much sharper spike than the one we're discussing - BOINC is trying to flatten it out a bit.
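The flattening behaviour described above can be sketched roughly as follows (all names are illustrative; this is not actual BOINC client code):

```python
import random
from datetime import datetime, timedelta

def next_fetch_time(now: datetime) -> datetime:
    """For a host out of quota: defer the next scheduler request to a
    random moment within the first hour after the next midnight reset,
    so all the backed-off hosts don't call in at 00:00:01 at once."""
    midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(seconds=random.uniform(0, 3600))
```

Spreading requests over an hour still produces a visible ramp on the traffic graphs, just not a vertical wall, which matches the spike-then-decay shape described at the top of the thread.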



Copyright © 2014 University of California