Panic Mode On (80) Server Problems?
Rolf Send message Joined: 16 Jun 09 Posts: 114 Credit: 7,817,146 RAC: 0 |
Inbound traffic Which traffic? |
Tom* Send message Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9 |
Thank goodness green traffic (outbound) has diminished. Finally got through after 8 hours of trying, to get a measly little 12 jobs of short shorties :-( all 110 seconds, followed 5 minutes later by 83 more short shorties. I still think these shorties should be converted into CPU jobs so they do not clog the throughput as much. |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
Just noticed one thing in the log of one of my computers: Yeah... and my cc_config doesn't have that tag (I checked after I saw that), that's why I was surprised. Well, now the scheduler request got through, and the downloads seem to hang without any progress long enough that they time out. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Thank goodness green traffic (outbound) has diminished I began having problems connecting to the server late last night. When I finally connected early this morning, all I received were shorties. I went all day not being able to connect until just a few minutes ago. All I got were shorties. It's another Shortie Storm;
1/24/2013 2:22:59 PM | SETI@home | update requested by user
1/24/2013 2:23:04 PM | SETI@home | Sending scheduler request: Requested by user.
1/24/2013 2:23:04 PM | SETI@home | Reporting 50 completed tasks
1/24/2013 2:23:04 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and ATI
1/24/2013 2:23:24 PM | SETI@home | Computation for task 23jn12ad.19766.4975.14.10.59_0 finished
1/24/2013 2:23:24 PM | SETI@home | Starting task 25my12aa.26065.6202.6.10.85_1 using setiathome_enhanced version 609 (cuda23) in slot 4
1/24/2013 2:23:26 PM | SETI@home | Started upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:30 PM | SETI@home | Finished upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:50 PM | SETI@home | Scheduler request completed: got 38 new tasks
....
I have since received about another dozen or so other Shorties. My other machine is basically the same story... |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Let's do some math... again. When MB is working it uses ~70% or more of the total bandwidth available. When AP starts, it uses another 70% to do its work. So it's a question of simple math; even in the best case: 70 + 70 = 140% > 100% of the available 100 Mbps bandwidth, then... you all know the answer. It's crystal clear that the current structure can't handle both projects running at the same time. Either they get more bandwidth on the current link, 200 Mbps or more at least (simply changing the 320k WU to whatever new size they choose could well make things worse for those of us who currently can't even download a 320k file), or they split the downloads into two separate pipes, one for each project, each with at least 100 Mbps (that would work for a few months). Only those who don't want to see it can't see it... Politics, always politics in action. |
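juan's back-of-the-envelope figures can be checked with a short sketch. The 70% per-project shares are the poster's rough estimates, not measured values:

```python
# Sketch of the bandwidth argument above; the 70% shares are the
# poster's estimates of each project's standalone link usage.
LINK_MBPS = 100.0   # current download link capacity
MB_SHARE = 0.70     # fraction of link MultiBeam uses on its own
AP_SHARE = 0.70     # fraction of link Astropulse uses on its own

demand_mbps = (MB_SHARE + AP_SHARE) * LINK_MBPS
print(f"combined demand: {demand_mbps:.0f} Mbps on a {LINK_MBPS:.0f} Mbps link")
print("link saturated" if demand_mbps > LINK_MBPS else "link OK")
```

With these numbers the combined demand is 140 Mbps against a 100 Mbps link, which is the whole argument in two lines.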
rob smith Send message Joined: 7 Mar 03 Posts: 22199 Credit: 416,307,556 RAC: 380 |
There is a fairly simple way of reducing the impact of APs: restrict the download buffer by size, not by number; currently this buffer is set at 100 WU of either type. Thus for every AP WU, the number of MB WUs in the queue is reduced by 8/0.37 ≈ 21. In reality you could probably get away with reducing the number of MB WUs by a smaller number, somewhere between 15 and 20. This would however only address one of the two problems afflicting the download system. The other is the maximum number of tasks delivered in a single hit. Given that the buffer is only 100 WU, it is grossly unfair that a single cruncher can get the whole buffer in one hit. My new cruncher has done that several times recently, and I'd be quite happy getting only 50 per hit and being able to go back in another five minutes for another 50 WU. It would of course be far better to report and collect a smaller number each time, rather than the stupid attempts at reporting vast numbers that I'm seeing just now due to the way the connections are dropping. A situation that is NOT helped by stupidly long back-offs, which actually make the situation worse, not better. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
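The 8/0.37 figure above, spelled out. The workunit sizes are the ones quoted in the thread (~8 MB per AP workunit, ~0.37 MB per MB workunit):

```python
# By-size equivalence between the two workunit types, using the
# sizes quoted in the thread (~8 MB per AP WU, ~0.37 MB per MB WU).
AP_WU_MB = 8.0
MB_WU_MB = 0.37

mb_per_ap = int(AP_WU_MB / MB_WU_MB)  # truncated, as in the post
print(f"one AP WU displaces about {mb_per_ap} MB WUs by download size")
```

So a size-based buffer treats one 8 MB AP download as roughly 21 MultiBeam downloads, which is the trade-off the post describes.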
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's some real simple math. The longer CUDA MB tasks are 367 KB and take around 25 minutes on my 5-year-old card. The shorties are also 367 KB and take around 4 minutes on the same card. When running shorties I am using well over six times the bandwidth, due to all the associated traffic with each transfer. It's not rocket science. Now imagine the newer cards that complete the same 367 KB shortie in less than a minute... |
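The "over six times" figure follows directly from the run times quoted above: same-size download, fetched 25/4 ≈ 6 times as often (scheduler and upload overhead per task would push the real ratio higher):

```python
# Same-size workunits completed more often => proportionally more bandwidth.
WU_KB = 367          # both long tasks and shorties are 367 KB
LONG_MIN = 25.0      # runtime of a longer CUDA MB task on the quoted card
SHORT_MIN = 4.0      # runtime of a shortie on the same card

long_rate = WU_KB / LONG_MIN     # KB of new work fetched per minute
short_rate = WU_KB / SHORT_MIN
print(f"shorties draw {short_rate / long_rate:.2f}x the download bandwidth")
```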
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
There is a fairly simple way of reducing the impact of APs, that is to restrict the download buffer by size, not by number, currently this buffer is set at 100WU of either type. They increased the buffer size to 200 sometime last August; I think this is why we have a lot of problems when we get a shortie storm: too many small WUs going out too frequently, Claggy |
rob smith Send message Joined: 7 Mar 03 Posts: 22199 Credit: 416,307,556 RAC: 380 |
And do several of them at the same time. The shortie turn-around on my new cruncher is about 1 per minute, and I haven't ramped it up yet; once going at full tilt it will probably be nearer 3 per minute, without overclocking. Add to that the 8 at a time the CPU will do once it gets through the load that MalariaControl deposited the other day, and you get some idea of what can be done. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
musicplayer Send message Joined: 17 May 10 Posts: 2430 Credit: 926,046 RAC: 0 |
And if you happen to run the tasks that carry out the Gaussian search, those tasks are selected from the best of the ones that were shorties before that. So, what are the numbers needed when it comes to spike and pulse scores and possible triplets? Apparently there is no need for any re-observation when it comes to the resends of these tasks; only a resend of earlier tasks (to users) with the correct parameter set, so the given task does the same thing. Same goes for VLARs, I guess. Definitely not necessary to run these tasks on every part of the sky. But for these tasks too, a Gaussian score may be needed in order for a possible signal to be detected. Are we back to running the "ordinary" Gaussian-search tasks for the "best" VLARs as well? |
bill Send message Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
Why not just offer APs for downloads on, say, Wednesdays and Saturdays only, then offer only MBs the rest of the week? I'm sure there's a reason for that to be hard to do, but I don't see it. Can anybody point out why that would be hard to do? Or not make sense? |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
One day SAH will release the SETI@home v7 large-workunit apps (the GPU apps will follow soon), currently being tested at SAH-BETA. I don't know how much longer our PCs will need to calculate these WUs, but I guess ~4 times longer than the current SAH Enhanced WUs... Maybe someone who has already tested these apps/this new kind of WU can say... Then, correspondingly fewer SAH WU downloads. * Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * |
Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Why not just offer APs for downloads on, say, Because there are limits in place. With only a max of 100 WUs, for example, my GPUs would chew through that amount in roughly 3 hours (that is assuming I can get 100 WUs which at the moment I can't) and then they would be sitting there idle for the rest of the day. So for 2 days a week they would not be working. However, I would tend to support the general thrust of what you are saying if the limits were removed or increased to circa 800 per GPU and 200 per CPU core. This might help to provide a buffer to assist in overcoming the firestorm on the other side when MB comes back on. rgds |
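Lionel's numbers also show where a limit of "circa 800 per GPU" comes from, assuming (as a sketch) that the 3-hour drain rate holds all day:

```python
# If a 100-WU cache lasts ~3 hours, a full day of GPU work needs:
CACHE_WUS = 100       # current per-host limit quoted in the post
DRAIN_HOURS = 3.0     # time the poster's GPUs take to empty it

refills_per_day = 24 / DRAIN_HOURS          # 8 refills per day
wus_per_day = refills_per_day * CACHE_WUS   # daily demand per GPU
print(f"{wus_per_day:.0f} WUs per GPU per day")
```

That works out to 800 WUs per day, matching the suggested raised limit.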
bill Send message Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
OTOH, you're not getting any work units when the servers are overloaded anyway. 100 is better than none to my way of thinking and if it works better maybe the limits can be raised, as you say. |
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
It's ironic: no sooner had I posted my message below than all tasks reported. Hey, glad to see I'm not the only one it happens to. "Life is just nature's way of keeping meat fresh." - The Doctor |
Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
about 100 shorties just came in on one box and the other is getting them as well now ... looks like things are about to get turbulent ... |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that. Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot. Or even like it was for a while there early last year.. AP would not be issued out by the scheduler except for during certain 4-hour blocks. You would go 4 hours with absolutely no APs going out at all, and then in the next 4 hours, only MB-resends would go out, but no new ones.. it was just all AP for those 4 hours. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
One system out of GPU work, the other one running out of GPU & CPU work. Apart from the odd aberration, scheduler requests just result in "Couldn't connect to server" messages. EDIT- if only I had gotten home from work & posted about the issues sooner. Inbound network traffic has surged, and I'm now downloading work... The Scheduler would appear to be alive again. It takes at least a couple of minutes, but it's possible to connect. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
...and now that the Scheduler is working again, the network pipes are fully clogged & downloads have gone from almost 10 kB/s to less than 1 kB/s. Grant Darwin NT |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that. The number of splitters seems to be OK; MB is being split at about the same speed as AP, and better than that you won't get. That avoids the situation we had before: a few days of intensive AP splitting with lots of download problems, followed by a few days with bandwidth usage of about 70%. Now, with a more or less constant ratio of available MB and AP WUs, all they need to do is slow down the feeder a bit, so it refills the scheduler queue less often than it does now. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.