Panic Mode On (80) Server Problems?
Rolf Send message Joined: 16 Jun 09 Posts: 114 Credit: 7,817,146 RAC: 0 |
Inbound traffic Which traffic? |
Tom* Send message Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9 |
Thank goodness green traffic (outbound) has diminished. Finally got through after 8 hours of trying, to get a measly little 12 jobs of short shorties :-( all 110 seconds, followed 5 minutes later by 83 more short shorties. I still think these shorties should be converted into CPU jobs so they do not clog the throughput as much. |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
Just noticed one thing in the log of one of my computers: Yeah... and my cc_config doesn't have that tag (I checked after I saw that), that's why I was surprised. Well, now the scheduler request got through, and the downloads seem to hang without any progress long enough that they time out. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Thank goodness green traffic (outbound) has diminished I began having problems connecting to the server late last night. When I finally connected early this morning, all I received were shorties. I went all day not being able to connect until just a few minutes ago. All I got were shorties. It's another Shortie Storm;
1/24/2013 2:22:59 PM | SETI@home | update requested by user
1/24/2013 2:23:04 PM | SETI@home | Sending scheduler request: Requested by user.
1/24/2013 2:23:04 PM | SETI@home | Reporting 50 completed tasks
1/24/2013 2:23:04 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and ATI
1/24/2013 2:23:24 PM | SETI@home | Computation for task 23jn12ad.19766.4975.14.10.59_0 finished
1/24/2013 2:23:24 PM | SETI@home | Starting task 25my12aa.26065.6202.6.10.85_1 using setiathome_enhanced version 609 (cuda23) in slot 4
1/24/2013 2:23:26 PM | SETI@home | Started upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:30 PM | SETI@home | Finished upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:50 PM | SETI@home | Scheduler request completed: got 38 new tasks
....
I have since received about another dozen or so other Shorties. My other machine is basically the same story... |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Let's do some math... again. When MB is working it uses ~70% or more of the total bandwidth available. When AP starts, it uses another 70% to do its work. So it's a question of simple math; even in the best case: 70 + 70 = 140% > 100% of the available 100 Mbps bandwidth, then... you all know the answer. It's crystal clear that the current structure can't handle both projects running at the same time. Either they get more bandwidth on the current link, 200 Mbps or more at least (simply changing the 320k WU to whatever new size they choose could well make things worse for those of us who currently can't even download a 320k file), or they split the downloads into two separate pipes, one for each project, each with at least 100 Mbps (that would work for a few months). Only those who don't want to see it can't see it... Politics, always politics in action. |
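juan's back-of-the-envelope figures can be checked with a short sketch. The 70% per-project shares are the poster's rough estimates, not measured values:

```python
# Sketch of the bandwidth argument above; the 70% shares are the
# poster's estimates of each project's standalone link usage.
LINK_MBPS = 100.0   # current download link capacity
MB_SHARE = 0.70     # fraction of link MultiBeam uses on its own
AP_SHARE = 0.70     # fraction of link Astropulse uses on its own

demand_mbps = (MB_SHARE + AP_SHARE) * LINK_MBPS
print(f"combined demand: {demand_mbps:.0f} Mbps on a {LINK_MBPS:.0f} Mbps link")
print("link saturated" if demand_mbps > LINK_MBPS else "link OK")
```

With these numbers the combined demand is 140 Mbps against a 100 Mbps link, which is the whole argument in two lines.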
rob smith Send message Joined: 7 Mar 03 Posts: 22199 Credit: 416,307,556 RAC: 380 |
There is a fairly simple way of reducing the impact of APs: restrict the download buffer by size, not by number; currently this buffer is set at 100 WU of either type. Thus for every AP WU, the number of MB WUs in the queue is reduced by 8/0.37 ≈ 21. In reality you could probably get away with reducing the number of MB WUs by a smaller number, somewhere between 15 and 20. This would however only address one of the two problems afflicting the download system. The other is the maximum number of tasks delivered in a single hit. Given that the buffer is only 100 WU, it is grossly unfair that a single cruncher can get the whole buffer in one hit. My new cruncher has done that several times recently, and I'd be quite happy getting only 50 per hit and being able to go back in another five minutes for another 50 WU. It would of course be far better to report and collect a smaller number each time, rather than the stupid attempts at reporting vast numbers that I'm seeing just now due to the way the connections are dropping. A situation that is NOT helped by stupidly long back-offs, which actually make the situation worse, not better. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
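The 8/0.37 figure above, spelled out. The workunit sizes are the ones quoted in the thread (~8 MB per AP workunit, ~0.37 MB per MB workunit):

```python
# By-size equivalence between the two workunit types, using the
# sizes quoted in the thread (~8 MB per AP WU, ~0.37 MB per MB WU).
AP_WU_MB = 8.0
MB_WU_MB = 0.37

mb_per_ap = int(AP_WU_MB / MB_WU_MB)  # truncated, as in the post
print(f"one AP WU displaces about {mb_per_ap} MB WUs by download size")
```

So a size-based buffer treats one 8 MB AP download as roughly 21 MultiBeam downloads, which is the trade-off the post describes.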
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's some real simple math. The longer CUDA MB tasks are 367 KB and take around 25 minutes on my 5-year-old card. The shorties are also 367 KB and take around 4 minutes on the same card. When running shorties I am using well over six times the bandwidth, due to all the associated traffic with each transfer. It's not rocket science. Now imagine the newer cards that complete the same 367 KB shortie in less than a minute... |
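The "over six times" figure follows directly from the run times quoted above: same-size download, fetched 25/4 ≈ 6 times as often (scheduler and upload overhead per task would push the real ratio higher):

```python
# Same-size workunits completed more often => proportionally more bandwidth.
WU_KB = 367          # both long tasks and shorties are 367 KB
LONG_MIN = 25.0      # runtime of a longer CUDA MB task on the quoted card
SHORT_MIN = 4.0      # runtime of a shortie on the same card

long_rate = WU_KB / LONG_MIN     # KB of new work fetched per minute
short_rate = WU_KB / SHORT_MIN
print(f"shorties draw {short_rate / long_rate:.2f}x the download bandwidth")
```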
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
There is a fairly simple way of reducing the impact of APs, that is to restrict the download buffer by size, not by number, currently this buffer is set at 100WU of either type. They increased the buffer size to 200 sometime last August; I think this is why we have a lot of problems when we get a shortie storm: too many small WUs going out too frequently, Claggy |
rob smith Send message Joined: 7 Mar 03 Posts: 22199 Credit: 416,307,556 RAC: 380 |
And do several of them at the same time. The shortie turn-around on my new cruncher is about 1 per minute, and I haven't ramped it up yet; once going at full tilt it will probably be nearer 3 per minute, without overclocking. Add to that the 8 at a time the CPU will do once it gets through the load that MalariaControl deposited the other day, and you get some idea of what can be done. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
musicplayer Send message Joined: 17 May 10 Posts: 2430 Credit: 926,046 RAC: 0 |
And if you happen to run the tasks that carry out the Gaussian search, those tasks are selected from the best of the ones that were shorties before that. So, what are the numbers needed when it comes to spike and pulse scores and possible triplets? Apparently there is no need for any re-observation when it comes to the resends of these tasks; only a resend of earlier tasks (to users) with the correct parameter set, so the given task does the same thing. Same goes for VLARs, I guess. Definitely not necessary to run these tasks on every part of the sky. But for these tasks too, a Gaussian score may be needed in order for a possible signal to be detected. Are we back to running the "ordinary" Gaussian-search tasks for the "best" VLARs as well? |
bill Send message Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
Why not just offer APs for downloads on, say, Wednesdays and Saturdays only, then offer only MBs the rest of the week? I'm sure there's a reason for that to be hard to do, but I don't see it. Can anybody point out why that would be hard to do? Or not make sense? |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
One day SAH will release the SETI@home v7 large-workunit apps (the GPU apps will follow soon), currently being tested at SAH-BETA. I don't know how much longer our PCs will need to calculate these WUs, but I guess ~4 times longer than the current SAH Enhanced WUs... Maybe someone who has already tested these apps/this new kind of WU can say... Then, correspondingly fewer SAH WU downloads. * Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * |
Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Why not just offer APs for downloads on, say, Because there are limits in place. With only a max of 100 WUs, for example, my GPUs would chew through that amount in roughly 3 hours (that is assuming I can get 100 WUs which at the moment I can't) and then they would be sitting there idle for the rest of the day. So for 2 days a week they would not be working. However, I would tend to support the general thrust of what you are saying if the limits were removed or increased to circa 800 per GPU and 200 per CPU core. This might help to provide a buffer to assist in overcoming the firestorm on the other side when MB comes back on. rgds |
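Lionel's numbers also show where a limit of "circa 800 per GPU" comes from, assuming (as a sketch) that the 3-hour drain rate holds all day:

```python
# If a 100-WU cache lasts ~3 hours, a full day of GPU work needs:
CACHE_WUS = 100       # current per-host limit quoted in the post
DRAIN_HOURS = 3.0     # time the poster's GPUs take to empty it

refills_per_day = 24 / DRAIN_HOURS          # 8 refills per day
wus_per_day = refills_per_day * CACHE_WUS   # daily demand per GPU
print(f"{wus_per_day:.0f} WUs per GPU per day")
```

That works out to 800 WUs per day, matching the suggested raised limit.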
bill Send message Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
OTOH, you're not getting any work units when the servers are overloaded anyway. 100 is better than none to my way of thinking and if it works better maybe the limits can be raised, as you say. |
Keith White Send message Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
It's ironic: no sooner had I posted my message below than all tasks reported. Hey, glad to see I'm not the only one it happens to. "Life is just nature's way of keeping meat fresh." - The Doctor |
Lionel Send message Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
about 100 shorties just came in on one box and the other is getting them as well now ... looks like things are about to get turbulent ... |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that. Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot. Or even like it was for a while there early last year.. AP would not be issued out by the scheduler except for during certain 4-hour blocks. You would go 4 hours with absolutely no APs going out at all, and then in the next 4 hours, only MB-resends would go out, but no new ones.. it was just all AP for those 4 hours. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
One system out of GPU work, the other one running out of GPU & CPU work. Apart from the odd aberration, scheduler requests just result in "Couldn't connect to server" messages. EDIT- if only I had gotten home from work & posted about the issues sooner. Inbound network traffic has surged, and I'm now downloading work... The Scheduler would appear to be alive again. It takes at least a couple of minutes, but it's possible to connect. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
...and now that the Scheduler is working again, the network pipes are fully clogged & downloads have gone from almost 10 kB/s to less than 1 kB/s. Grant Darwin NT |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that. The number of splitters seems to be OK; MB is being split at about the same speed as AP, and better than that you won't get. That avoids the situation we had before: a few days of intensive AP splitting with lots of download problems, followed by a few days with bandwidth usage of about 70%. Now, with a more or less constant ratio of available MB and AP WUs, all they need to do is slow down the feeder a bit, so it refills the scheduler queue less often than it does now. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.