Daily midnight traffic peaks |
Message boards : Number crunching : Daily midnight traffic peaks
| Author | Message |
|---|---|
|
While I recall sometimes in the past seeing clear network traffic peaks after midnight, I think there has been little of that for some time. But for four out of the most recent five days, the network activity graphs have shown a strong peak, quickly rising to near the bandwidth limit, then, perhaps an hour later, starting a slower but still rather rapid decay back down to the 50 to 70 range which seems the short-term norm lately. | |
| ID: 871354 · | |
|
It might not be errors. It is possible that some of the faster hosts (Q9xxx), or the fast Xeons with or without a GPU, or multiple GPUs are just crunching more than the quota allows per day. At midnight, the quota starts over and those systems fill the cache up until they get the quota again. | |
| ID: 871359 · | |
... Maybe, or maybe it's just hosts with huge queues they're trying to fill. With the current WUs almost all having deadlines of 3 weeks or more, there are few which the Scheduler will consider infeasible for such hosts, and the host work fetch requests are not inhibited due to work being done at high priority. Quote: "Clues on the SETI server status page also are interesting. It shows AP result creation rate as well over half that of MP, though the results received rate is barely over a 20th. Taken at face value, this suggests that the very large majority of distributed AP results are erroring out, or otherwise not successfully returned." Scarecrow's graphs for AP and Enhanced are showing the average AP result creation rate as 0.625 and the average S@H Enhanced rate as 8.782. Both are operating in the good mode where they build the "Results ready to send" up to a high water mark, then rest for awhile[1]. That makes the most recent sample on the Server status page fairly meaningless in isolation. [1] Even when the splitters are resting, there's some Result creation from reissues. Joe | |
| ID: 871448 · | |
Quote: "Can it be that enough hosts are still chewing up their daily 100/cpu limit daily to create this demand spike as they go on to their next day of discards?" For the purposes of daily quotas, "midnight" is midnight UTC. That's around 4:00pm PST (Berkeley time). The Cricket graphs look like they're on local time, which is basically UTC -8 hours. So the peak is roughly 0800 UTC -- which does not line up with the "fresh" quotas at midnight. ____________ | |
| ID: 871483 · | |
Quote: "Scarecrow's graphs for AP and Enhanced are showing the average AP result creation rate as 0.625 and the average S@H Enhanced rate as 8.782. Both are operating in the good mode where they build the "Results ready to send" up to a high water mark, then rest for awhile[1]. That makes the most recent sample on the Server status page fairly meaningless in isolation." Ah, I missed that completely. On the other hand, hand-clicking through a small sample of actual newly downloaded AP 5.03 work does seem to show that many of the quite common _2, _3, _4 results are resends for work sent today which got a prompt client error/compute error with 0.00 CPU time on hosts which are doing this repeatedly. In one case, clicking on down to the task level gets this: <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> CreateProcess() failed - A required privilege is not held by the client. (0x522) </message> ]]> In others, this: <core_client_version>6.4.5</core_client_version> <![CDATA[ <message> too many exit(0)s </message> ]]> In others, this: <core_client_version>6.4.5</core_client_version> <![CDATA[ <message> CreateProcess() failed - Access is denied. (0x5) </message> ]]> In others (these actually logged client error/downloading repeatedly, and obviously the specific file reference varies): <core_client_version>6.2.15</core_client_version> <![CDATA[ <message> WU download error: couldn't get input files: <file_xfer_error> <file_name>ap_06dc08ah_B1_P0_00065_20090213_06338.wu</file_name> <error_code>-119</error_code> <error_message>MD5 check failed</error_message> </file_xfer_error> </message> ]]> All of these come from failed quorum partners of just one host, for downloads in the last day, and in most if not all cases the offending host had failed a whole page of results at minimum. Certainly some of this goes on all the time, but, despite my error on the creation rate evidence, I think it is rather more prevalent on AP 5.03 at the moment than the historic norm. 
Many of them, however, were not managing anywhere near 100*ncpu failures a day--more like 2, most likely because they had failed enough in the past to get down to the 1/day/CPU limit. ____________ | |
| ID: 871485 · | |
Quote: "Can it be that enough hosts are still chewing up their daily 100/cpu limit daily to create this demand spike as they go on to their next day of discards?" I'm pretty sure that 'quota' resets at midnight, local time (@ server) - which in SETI's case means PST, in line with the Cricket graphs. Not that I'm about to trash 99 WUs to test that theory.... | |
| ID: 871487 · | |
|
I agree, quota resets are the most likely cause, given all the trashed work caches and people trying to get that app_info file correct. | |
| ID: 871499 · | |
Quote: "Can it be that enough hosts are still chewing up their daily 100/cpu limit daily to create this demand spike as they go on to their next day of discards?" I am pretty certain that the Cricket graphs are running on Berkeley time, so the spikes occur at server midnight, concurrent with the reset of the daily quotas. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 871660 · | |
|
Looking at Scarecrow's graphs for results returned (7 day trends), it looks like most of the traffic spikes are coming from AP. At midnight, the AP results-returned rate spikes up and the crunch time spikes down. | |
| ID: 871755 · | |
... Where can I find a description of how the deadlines are computed? There seems to be a lot of noise on the message boards (not surprising!). Thx. | |
| ID: 871770 · | |
... There's a table here, in the middle of the thread Estimates and Deadlines revisited which documents the research which Joe initiated in December 2007. | |
| ID: 871774 · | |
I'm in the same time zone, it's just after 5:00pm and the Cricket Graph page says Last updated at Tue Mar 3 17:08:43 2009. If they were on GMT, it'd be more like Wed Mar 4 01:08:43 2009. I don't use Cricket, I use MRTG -- same author. MRTG updates every five minutes. ____________ | |
| ID: 871866 · | |
|
I just noticed something... Mark, I think you must be looking at the wrong router interface for the cricket graph.. supposed to be 2_3, not 5_1. | |
| ID: 871903 · | |
Quote: "I just noticed something... Mark, I think you must be looking at the wrong router interface for the cricket graph.. supposed to be 2_3, not 5_1." Either works for the big picture, but 5_1 seems to carry some percentage of traffic other than ours; for instance, it currently has a 123 MBits/sec peak reading, which would be a neat trick through the 100 MBits/sec link. Joe | |
| ID: 871944 · | |
Quote: "I just noticed something... Mark, I think you must be looking at the wrong router interface for the cricket graph.. supposed to be 2_3, not 5_1." If I recall correctly, the other difference is that the 2_3 graph is inside out. In other words, what it shows as 'in' is actually coming in from Seti (and maybe some other Berkeley traffic) and then sent out, and what it shows as 'out' is actually inbound from us and out to the Seti servers. The 5_1 graph actually shows outbound from Seti and inbound from us. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 871959 · | |
Quote: "If I recall correctly, the other difference is that the 2_3 graph is inside out....in other words, what it shows as 'in' is actually coming in from Seti (and maybe some other Berkeley traffic) and then sent out, and what it shows as 'out' is actually inbound from us and out to the Seti servers." Yes, you are right; 2_3 is inside out. You will notice that a few days ago, when SETI was maxed out trying to send out AP V5, the green "in" graph was maxed out. | |
| ID: 872026 · | |
Quote: "Could there be a large number of crunchers with corrupt AP applications or app info still out there?" My answer would be YES: 2 out of the 6 WUs I got have hosts that are trashing every WU they get, hosts 4077778 and 4069312. | |
| ID: 872036 · | |
|
1170594009 · 417760101 · 21 Feb 2009 20:15:02 UTC · 1 Mar 2009 20:51:41 UTC · Over · Client error · Compute error · 464,726.30 · 614.09 · --- | |
| ID: 872038 · | |
Very interesting: looking at when they are trashing, it seems that the quota reset happens every 25 hours from the last contact, and not at midnight. | |
| ID: 872041 · | |
Quote: "My answer would be YES" Not quite. The quota is reset at midnight, but if BOINC is backed off because of quota restrictions, it backs off to a random time sometime in the first hour after the reset. If everyone's client called in for a new quota at exactly 00:00:01, we'd see a much sharper spike than the one we're discussing - BOINC is trying to flatten it out a bit. | |
| ID: 872044 · | |
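To make the last point concrete, here is a rough sketch of how a random backoff flattens the post-reset spike. It is not BOINC client code: the host count, the uniform distribution over the first hour, and the 5-minute sampling bins are all assumptions picked for illustration, and `simulate` and `requests_per_bin` are hypothetical helper names.

```python
import random
from collections import Counter


def requests_per_bin(contact_times, bin_seconds=300):
    """Count scheduler contacts per time bin (times are seconds after the reset)."""
    counts = Counter(int(t // bin_seconds) for t in contact_times)
    # Fill in empty bins so the list index is the bin number.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def simulate(num_hosts=12000, window_seconds=3600, seed=0):
    """Compare 'everyone calls at 00:00:01' with a random backoff in the first hour.

    num_hosts and the uniform spread are assumptions, not measured values.
    """
    rng = random.Random(seed)
    # Case 1: every quota-blocked host contacts the server one second after reset.
    simultaneous = [1.0] * num_hosts
    # Case 2: each host picks its own random moment within the first hour.
    spread = [rng.random() * window_seconds for _ in range(num_hosts)]
    return requests_per_bin(simultaneous), requests_per_bin(spread)


if __name__ == "__main__":
    sharp, flat = simulate()
    print("peak contacts per 5 min, all at once:   ", max(sharp))
    print("peak contacts per 5 min, random backoff:", max(flat))
```

With everyone calling at once, all contacts land in the first bin; with the random backoff the same number of contacts is spread across roughly twelve 5-minute bins, which would look more like the hour-wide peak described at the top of the thread than a single sharp spike.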
| Copyright © 2013 University of California |