it's the AP Splitter processes killing the Scheduler
Author | Message |
---|---|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
and when i put the proxy back in: 14/11/2012 23:06:30 | | Using proxy info from GUI 14/11/2012 23:06:30 | | Using HTTP proxy 8.21.6.225:80 14/11/2012 23:06:31 | SETI@home Beta Test | [sched_op] Starting scheduler request 14/11/2012 23:06:31 | SETI@home Beta Test | Sending scheduler request: To fetch work. 14/11/2012 23:06:31 | SETI@home Beta Test | Reporting 1 completed tasks 14/11/2012 23:06:31 | SETI@home Beta Test | Requesting new tasks for CPU and NVIDIA 14/11/2012 23:06:31 | SETI@home Beta Test | [sched_op] CPU work request: 100188.82 seconds; 0.00 devices 14/11/2012 23:06:31 | SETI@home Beta Test | [sched_op] NVIDIA work request: 60562.04 seconds; 0.00 devices 14/11/2012 23:06:47 | SETI@home Beta Test | Scheduler request completed: got 10 new tasks 14/11/2012 23:06:47 | SETI@home Beta Test | [sched_op] Server version 701 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.29784.11115.140733193388042.14.195_1 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.29784.11115.140733193388042.14.227_1 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.3278.18477.140733193388041.14.12_1 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.29784.11115.140733193388042.14.252_1 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.3278.18477.140733193388041.14.33_0 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.3278.18477.140733193388041.14.34_0 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.3278.18477.140733193388041.14.35_0 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.3278.18477.140733193388041.14.36_0 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.3278.18477.140733193388041.14.62_1 14/11/2012 23:06:47 | SETI@home Beta Test | Resent lost task 05ap10al.3278.18477.140733193388041.14.63_0 14/11/2012 23:06:47 | SETI@home Beta Test | Project requested delay of 7 seconds 14/11/2012 23:06:47 | 
SETI@home Beta Test | [sched_op] estimated total CPU task duration: 0 seconds 14/11/2012 23:06:47 | SETI@home Beta Test | [sched_op] estimated total NVIDIA task duration: 48103 seconds 14/11/2012 23:06:47 | SETI@home Beta Test | [sched_op] handle_scheduler_reply(): got ack for task 05ap10al.8345.16023.9.14.190_0 14/11/2012 23:06:47 | SETI@home Beta Test | [sched_op] Deferring communication for 7 sec 14/11/2012 23:06:47 | SETI@home Beta Test | [sched_op] Reason: requested by project Claggy |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's my recent experience. I was able to stir up some action, but had to switch back for the download. The upload worked fine with the proxy. That was another AP 604 I just got... 14-Nov-2012 18:04:05 [---] Project communication failed: attempting access to reference site 14-Nov-2012 18:04:05 [SETI@home] Scheduler request failed: Timeout was reached 14-Nov-2012 18:04:08 [---] Internet access OK - project servers may be temporarily down. 14-Nov-2012 18:05:15 [---] Using proxy info from GUI 14-Nov-2012 18:05:15 [---] Using HTTP proxy 8.21.6.225:80 14-Nov-2012 18:05:24 [SETI@home] update requested by user 14-Nov-2012 18:05:26 [SETI@home] Sending scheduler request: Requested by user. 14-Nov-2012 18:05:26 [SETI@home] Reporting 14 completed tasks 14-Nov-2012 18:05:26 [SETI@home] Requesting new tasks for ATI 14-Nov-2012 18:05:40 [SETI@home] Scheduler request completed: got 1 new tasks 14-Nov-2012 18:05:40 [SETI@home] Resent lost task ap_03se12ac_B6_P1_00146_20121114_13217.wu_1 14-Nov-2012 18:05:42 [SETI@home] Started download of ap_03se12ac_B6_P1_00146_20121114_13217.wu 14-Nov-2012 18:07:37 [SETI@home] Computation for task 29au12ab.29554.3339.140733193388043.10.38_0 finished 14-Nov-2012 18:07:37 [SETI@home] Starting task 29au12ab.29505.3339.140733193388042.10.6_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 1 14-Nov-2012 18:07:39 [SETI@home] Started upload of 29au12ab.29554.3339.140733193388043.10.38_0_0 14-Nov-2012 18:07:52 [SETI@home] Computation for task 29au12ab.29505.3339.140733193388042.10.6_0 finished 14-Nov-2012 18:07:52 [SETI@home] Starting task 29au12ab.29554.3339.140733193388043.10.2_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 1 14-Nov-2012 18:07:54 [SETI@home] Started upload of 29au12ab.29505.3339.140733193388042.10.6_0_0 14-Nov-2012 18:07:55 [SETI@home] Finished upload of 29au12ab.29554.3339.140733193388043.10.38_0_0 14-Nov-2012 18:08:18 [SETI@home] Finished upload of 29au12ab.29505.3339.140733193388042.10.6_0_0 14-Nov-2012 
18:08:57 [---] Using proxy info from GUI 14-Nov-2012 18:08:57 [---] Not using a proxy 14-Nov-2012 18:09:29 [---] Project communication failed: attempting access to reference site 14-Nov-2012 18:09:29 [SETI@home] Temporarily failed download of ap_03se12ac_B6_P1_00146_20121114_13217.wu: transient HTTP error 14-Nov-2012 18:09:29 [SETI@home] Backing off 3 min 17 sec on download of ap_03se12ac_B6_P1_00146_20121114_13217.wu 14-Nov-2012 18:09:31 [---] Internet access OK - project servers may be temporarily down. 14-Nov-2012 18:09:39 [SETI@home] Started download of ap_03se12ac_B6_P1_00146_20121114_13217.wu 14-Nov-2012 18:10:45 [SETI@home] Sending scheduler request: To fetch work. 14-Nov-2012 18:10:45 [SETI@home] Reporting 2 completed tasks 14-Nov-2012 18:10:45 [SETI@home] Requesting new tasks for CPU 14-Nov-2012 18:11:57 [SETI@home] Computation for task 29au12ab.29554.3339.140733193388043.10.2_0 finished 14-Nov-2012 18:11:57 [SETI@home] Starting task 29au12ab.29554.3339.140733193388043.10.36_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 1 14-Nov-2012 18:11:59 [SETI@home] Started upload of 29au12ab.29554.3339.140733193388043.10.2_0_0 14-Nov-2012 18:12:04 [SETI@home] Finished upload of 29au12ab.29554.3339.140733193388043.10.2_0_0 14-Nov-2012 18:14:42 [SETI@home] Finished download of ap_03se12ac_B6_P1_00146_20121114_13217.wu |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's another. All of those are CPU tasks. 14-Nov-2012 18:30:33 [---] Project communication failed: attempting access to reference site 14-Nov-2012 18:30:33 [SETI@home] Scheduler request failed: Timeout was reached 14-Nov-2012 18:30:35 [---] Internet access OK - project servers may be temporarily down. 14-Nov-2012 18:32:40 [SETI@home] Computation for task 29au12ab.20898.20926.140733193388046.10.12_0 finished 14-Nov-2012 18:32:40 [SETI@home] Starting task 29au12ab.20898.20926.140733193388046.10.17_1 using setiathome_enhanced version 610 (cuda_fermi) in slot 1 14-Nov-2012 18:32:42 [SETI@home] Started upload of 29au12ab.20898.20926.140733193388046.10.12_0_0 14-Nov-2012 18:33:17 [SETI@home] Finished upload of 29au12ab.20898.20926.140733193388046.10.12_0_0 14-Nov-2012 18:35:15 [---] Using proxy info from GUI 14-Nov-2012 18:35:15 [---] Using HTTP proxy 8.21.6.225:80 14-Nov-2012 18:35:19 [SETI@home] update requested by user 14-Nov-2012 18:35:23 [SETI@home] Sending scheduler request: Requested by user. 
14-Nov-2012 18:35:23 [SETI@home] Reporting 8 completed tasks 14-Nov-2012 18:35:23 [SETI@home] Requesting new tasks for CPU 14-Nov-2012 18:35:40 [SETI@home] Scheduler request completed: got 20 new tasks 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.97_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.103_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.161_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.163_0 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.161_0 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.138_0 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.135_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.182_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.164_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.165_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.168_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.171_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.213_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.198_0 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.209_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25213.22734.140733193388039.10.201_0 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.216_0 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.222_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 
31au12aa.25244.22734.140733193388040.10.215_1 14-Nov-2012 18:35:40 [SETI@home] Resent lost task 31au12aa.25244.22734.140733193388040.10.208_1 14-Nov-2012 18:35:43 [SETI@home] Started download of 31au12aa.25244.22734.140733193388040.10.97 14-Nov-2012 18:35:43 [SETI@home] Started download of 31au12aa.25213.22734.140733193388039.10.103 14-Nov-2012 18:36:51 [SETI@home] Computation for task 29au12ab.20898.20926.140733193388046.10.17_1 finished 14-Nov-2012 18:36:51 [SETI@home] Starting task 29au12ab.20898.20926.140733193388046.10.20_0 using setiathome_enhanced version 610 (cuda_fermi) in slot 1 14-Nov-2012 18:36:53 [SETI@home] Started upload of 29au12ab.20898.20926.140733193388046.10.17_1_0 14-Nov-2012 18:37:04 [SETI@home] Finished upload of 29au12ab.20898.20926.140733193388046.10.17_1_0 14-Nov-2012 18:37:45 [---] Using proxy info from GUI 14-Nov-2012 18:37:45 [---] Not using a proxy 14-Nov-2012 18:38:21 [---] Suspending network activity - user request 14-Nov-2012 18:38:27 [---] Resuming network activity 14-Nov-2012 18:38:27 [SETI@home] Started download of 31au12aa.25244.22734.140733193388040.10.97 14-Nov-2012 18:38:27 [SETI@home] Started download of 31au12aa.25213.22734.140733193388039.10.103 14-Nov-2012 18:38:39 [SETI@home] Finished download of 31au12aa.25213.22734.140733193388039.10.103 14-Nov-2012 18:38:39 [SETI@home] Started download of 31au12aa.25244.22734.140733193388040.10.161 14-Nov-2012 18:38:53 [SETI@home] Finished download of 31au12aa.25244.22734.140733193388040.10.161 14-Nov-2012 18:38:53 [SETI@home] Started download of 31au12aa.25213.22734.140733193388039.10.163.... |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Claggy, that's what I have been trying to say for weeks: the AP splitters just trigger the problem. Maybe, just maybe, Synergy can't handle all its tasks for some "mystery reason" (for most users it really makes no difference what the reason is: memory, disk I/O, etc.). When the AP splitters are running (obviously, because AP WUs are being produced) nothing works, but when you use a proxy everything works (at least until the proxy kicks us off for using too much bandwidth), so the problem is not only the bandwidth, it is something else. That is why I suggested stopping all the AP splitters and then starting them one at a time, so the problem would appear and then be easy to pinpoint and fix, but nobody listened to me. Sometimes trial and error works and easily fixes a major problem. I believe nothing would be lost if they tried... FYI, downloads through the proxy I posted are slow now because a lot of users have started using it after I sent the info to a few heavy crunchers on our team, but it was very fast (DL >100kbps at my end) over the past days. I think the admins of that proxy will kick us off soon. (edit) One last thing: I don't crunch AP, so I can't tell whether this proxy works with AP work; I only know it works OK for MB. Another data point: I have 3 different ISP connections (2 cable and 1 ADSL, all 10 Mbps nominal). On one of them (ADSL) downloads through this proxy are still at >100kbps; on the other 2 lines they are at 5kbps. Why? I have no idea. |
Tom* Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9 |
Gee, it sure would be nice if they could set up a proxy server we could try at the other end of the lab link. We know proxy servers and changes to TCP optimization seem to help. Smoothing packet flow over the 100Mbit link may be all that is needed. Wishful thinking? Or too much trouble to implement? PS - Proxy works fine (up to the same point as MB) for AP processing |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Claggy You're not listening, I don't think the problem is anything to do with Synergy, or the AP splitters, more a general Networking problem maybe 5+ miles from the Lab, scheduler contacts have been slow for some time, with AP being downloaded it's a lot worse, If one moment you can't get anything more than one or two tasks sent at a time, then you switch to a proxy, and you can get ~80 tasks sent at once, it just proves Synergy is handling everything fine: 15/11/2012 00:25:58 SETI@home [sched_op_debug] Starting scheduler request 15/11/2012 00:25:58 SETI@home Sending scheduler request: Requested by user. 15/11/2012 00:25:58 SETI@home Reporting 5 completed tasks, requesting new tasks for CPU and GPU 15/11/2012 00:25:58 SETI@home [sched_op_debug] CPU work request: 1136294.24 seconds; 0.00 CPUs 15/11/2012 00:25:58 SETI@home [sched_op_debug] NVIDIA GPU work request: 259879.94 seconds; 0.00 GPUs 15/11/2012 00:25:58 SETI@home [sched_op_debug] ATI GPU work request: 0.00 seconds; 0.00 GPUs 15/11/2012 00:26:35 SETI@home Scheduler request completed: got 79 new tasks 15/11/2012 00:26:35 SETI@home [sched_op_debug] Server version 701 15/11/2012 00:26:35 SETI@home Message from server: No tasks are available for the applications you have selected 15/11/2012 00:26:35 SETI@home Message from server: No tasks are available for AstroPulse v6 15/11/2012 00:26:35 SETI@home Message from server: Your preferences allow tasks from applications other than those selected 15/11/2012 00:26:35 SETI@home Message from server: Sending tasks from other applications 15/11/2012 00:26:35 SETI@home Project requested delay of 303 seconds 15/11/2012 00:26:35 SETI@home [sched_op_debug] estimated total CPU job duration: 24083 seconds 15/11/2012 00:26:35 SETI@home [sched_op_debug] estimated total NVIDIA GPU job duration: 11662 seconds 15/11/2012 00:26:35 SETI@home [sched_op_debug] estimated total ATI GPU job duration: 0 seconds 15/11/2012 00:26:35 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack 
for result 28se12ab.10111.8656.140733193388040.10.203_1 15/11/2012 00:26:35 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 28se12ab.10111.11928.140733193388040.10.199_1 15/11/2012 00:26:35 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 28se12ab.10111.11928.140733193388040.10.187_1 15/11/2012 00:26:35 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 28se12ab.10111.11928.140733193388040.10.181_1 15/11/2012 00:26:35 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 27au12ab.30805.803588.140733193388042.10.213_1 15/11/2012 00:26:35 SETI@home [sched_op_debug] Deferring communication for 5 min 3 sec 15/11/2012 00:26:35 SETI@home [sched_op_debug] Reason: requested by project Claggy |
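The comparison in the log above comes down to how long the scheduler took to answer and how many tasks came back. Here is a minimal sketch for pulling those two numbers out of a pasted log. It assumes the message format seen in the excerpts in this thread (day/month/year timestamps, and the "Sending scheduler request" and "Scheduler request completed: got N new tasks" messages). The function name `scheduler_round_trips` is made up for illustration and is not part of BOINC.

```python
import re
from datetime import datetime

# Timestamp format assumed from the excerpt above: "15/11/2012 00:25:58".
STAMP = re.compile(r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})")
SENT = "Sending scheduler request"
DONE = "Scheduler request completed: got"


def scheduler_round_trips(log_text):
    """Return a list of (seconds_waited, tasks_received) pairs."""
    # The forum flattened the log onto one line, so split on timestamps
    # rather than newlines: parts = [prefix, stamp1, msg1, stamp2, msg2, ...]
    parts = STAMP.split(log_text)
    results, sent_at = [], None
    for stamp, message in zip(parts[1::2], parts[2::2]):
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        if SENT in message:
            sent_at = when
        elif DONE in message and sent_at is not None:
            tasks = int(re.search(r"got (\d+) new tasks", message).group(1))
            results.append(((when - sent_at).total_seconds(), tasks))
            sent_at = None
    return results


# Three messages copied from the log above.
excerpt = (
    "15/11/2012 00:25:58 SETI@home Sending scheduler request: Requested by user. "
    "15/11/2012 00:25:58 SETI@home Reporting 5 completed tasks, requesting new tasks for CPU and GPU "
    "15/11/2012 00:26:35 SETI@home Scheduler request completed: got 79 new tasks"
)
print(scheduler_round_trips(excerpt))  # [(37.0, 79)]
```

For the request above that is a 37 second wait for 79 tasks through the proxy. Running the same function over a log captured without the proxy would show the difference directly, as long as the request completes at all instead of timing out.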
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, I just had a major problem. I don't know if the proxy thing had anything to do with it or not. I wasn't connected to the proxy when it happened. After finding all those lost files, it downloaded an even larger number. While it was downloading, AVG2013 launched a 'scheduled scan'. After the last file downloaded, I got a notice that the State file couldn't be written and BOINC crashed. It left the ATI app running, I had to kill that. BOINC wouldn't connect to the client, then Explorer crashed... I had to restart and CCC hung. I finally got it restarted and everything seems fine. What's up with that? Strange... |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"You're not listening, I don't think the problem is anything to do with Synergy, or the AP splitters, more a general Networking problem maybe 5+ miles from the Lab, scheduler contacts have been slow for some time, with AP being downloaded it's a lot worse," Looking at it from that point of view I must agree with you: the source of the problem must be somewhere between the Synergy server and the HE network, and with the use of a proxy it simply stops. With that info, the source of the problem could easily be pinpointed and fixed by a network technician, don't you agree? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"You're not listening, I don't think the problem is anything to do with Synergy, or the AP splitters, more a general Networking problem 5+ miles from the Lab, scheduler contacts have been slow for some time, with AP being downloaded it's a lot worse, [...] it just proves Synergy is handling everything fine" Well, everything except handling its own communications in a timely fashion when placed under heavy load by running the AP splitter processes and heaven knows what else. As I'm sure everybody reading this thread knows, computer-to-computer communications are handled in 'packets' - quite small, under 1500 bytes at a time. Think of a postcard. The sending computer writes its postcards (several hundreds or even thousands of them, for the sort of files we deal with here), and gives each one a unique serial number. That means that the receiving computer can shuffle the pack into the right order, no matter what sort of a mess the postcards arrive in. The receiving computer also sends a quick "OK, got it" reply back, quoting the serial number. If the sending computer doesn't get that ACKnowledgement that the packet got through, it tries (or is supposed to try) again. From my very quick and non-expert session with Wireshark this evening, it seems to me that, just possibly, the sequence is: We send a request to Synergy, saying what we're reporting and what we're requesting. That seems to get through fairly well, and Synergy processes our request. Then, Synergy starts to send out the reply. My computers seemed to get the first one or two postcards OK, and duly sent their 'ACK' messages back. But Synergy didn't seem to know that the first messages had got through, and re-sent the same ones. And my computers sent back 'I know, I've got that one already'. And after a few exchanges like that, the entire conversation ground to a halt. So, the weak point in the system seems to be those 'ACK' messages returned from our computers to Synergy, meaning "we're listening, do go on". 
If Claggy's analysis is right, then maybe - just maybe (I said this was a non-expert reading) - the proxy servers are geared up to receive the packets and send the critical 'ACK' replies more quickly: they arrive while Synergy is still listening, whereas our own 'ACK's from the far corners of the globe take longer to arrive, and by then Synergy has stopped looking out for them, distracted by the next flurry of incoming requests. It's just a theory, and I don't have the slightest idea how to fine-tune a heavily loaded server to avoid missing those ACKs - but it's the only explanation I can think of which comes close to bridging the gap between the "it's the splitters" and the "it's all comms" camps. |
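The postcard picture and the missed-ACK theory above can be put into a few lines of code. This is only a toy model of the theory as stated in the post, not real TCP (real stacks adapt their retransmission timers and do not simply stop listening). The function names, the retry limit and the timing numbers are all invented for illustration.

```python
def reorder(postcards):
    """Receiver side: put (serial, payload) postcards back in order."""
    return b"".join(payload for _, payload in sorted(postcards))


def send_reply(num_packets, ack_delay, listen_window, max_retries=3):
    """Toy model of the missed-ACK theory, NOT real TCP.

    ack_delay     -- seconds for the client's ACK to reach the server (assumed)
    listen_window -- seconds the busy server keeps listening for it (assumed)
    Returns (packets_acknowledged, resends, stalled).
    """
    acknowledged = 0
    resends = 0
    for _serial in range(num_packets):
        attempts = 1  # first transmission of this postcard
        # An ACK that arrives after the window has closed is never heard.
        while ack_delay > listen_window:
            if attempts > max_retries:
                return acknowledged, resends, True  # conversation grinds to a halt
            attempts += 1
            resends += 1  # server re-sends; client says "got that one already"
        acknowledged += 1
    return acknowledged, resends, False


# Postcards arriving out of order are shuffled back by serial number.
print(reorder([(2, b"c"), (0, b"a"), (1, b"b")]))         # b'abc'

# A nearby proxy whose ACKs get back inside the window: no resends.
print(send_reply(4, ack_delay=0.02, listen_window=0.05))  # (4, 0, False)

# A distant host whose ACKs arrive too late: resends, then a stall.
print(send_reply(4, ack_delay=0.30, listen_window=0.05))  # (0, 3, True)
```

In this model the only difference between the two runs is how quickly the ACK gets back, which is the point of the theory: the same server and the same reply either complete cleanly or stall on the first packet.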
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Richard, an excellent point. That could explain everything, and it is another path to follow. I believe it is easy to test your theory and finally fix the problem, if that really is the source. That is the best explanation I have seen: it shows why the problem could happen and why the proxy works. Congrats on the idea. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"You're not listening, I don't think the problem is anything to do with Synergy, or the AP splitters, more a general Networking problem maybe 5+ miles from the Lab, scheduler contacts have been slow for some time, with AP being downloaded it's a lot worse," I also question whether the proxy is using the Hurricane link at all: I'm getting downloads of up to 75KB/s at the moment from the proxy; switch back to normal and I'm lucky to get 5KB/s. Claggy |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"You're not listening, I don't think the problem is anything to do with Synergy, or the AP splitters, more a general Networking problem maybe 5+ miles from the Lab, scheduler contacts have been slow for some time, with AP being downloaded it's a lot worse," Richard's hypothesis easily explains that. Remember the old DOS days? If you had too many interrupts, your system simply could not handle them all. In these days of high-end servers and CPUs with highly optimized multitasking OSes that normally doesn't happen, but Synergy could be overloaded with all the work it handles. It is easy to test the theory: run the AP splitters only on Lando, and if everything then works, all is explained. |
Cruncher-American Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Hey - another thought on this whole mess - what the f@#k do we do with this giant shorty storm going on now - all my resends are shorties. THAT means for each WU sent and processed and returned I'm using about 4-5 times the bandwidth I would with normal-sized WUs. Why is the data being split that way, and what good is this horrid mess doing the science? |
Horacio Joined: 14 Jan 00 Posts: 536 Credit: 75,967,266 RAC: 0 |
"So, the weak point in the system seems to be those 'ACK' messages returned from our computers to Synergy, meaning 'we're listening, do go on'." Good thinking... It happened that once my hosts had retrieved all the ghosts and reached the limits, contacts with the scheduler started to work "normally" (you know, normal in relative SETI terms) without using the proxy... It seems that when the scheduler has nothing to send (I guess it is a shorter response due to an almost empty list of tasks) the connection works much better... which supports the theory about Synergy dropping/losing ACKs... Anyway, there is something that I don't get... Why did the scheduler start assigning new work to hosts that had ghosts? Is it something that has been happening unnoticed until now? Was it the awful ratio of unsuccessful RPCs that blew the number of ghosts out of proportion, or is there something else to look for? About fine-tuning the scheduler... If it is about Synergy (or the scheduler process) being too busy, isn't it possible to have two (or more) schedulers? I mean something like scheduler 1 assigning workunits from the subset with odd IDs and scheduler 2 the others, or something similar that would allow the schedulers to be a bit more patient with each connection... (But from my little knowledge of TCP connections, it could be losing ACKs due to a wide range of things, from a trivial setting for how many concurrent connections the OS can handle (or is configured to handle), up to some weird routing loop on a faulty router anywhere in the world...) |
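The odd/even split suggested above amounts to partitioning workunits by ID so that no two schedulers ever hand out the same one. Here is a minimal sketch of that idea only. It says nothing about how the BOINC scheduler is actually built, and `assign_scheduler` and `partition` are hypothetical names.

```python
def assign_scheduler(workunit_id, num_schedulers=2):
    """Index of the scheduler that owns this workunit (hypothetical rule)."""
    # With two schedulers this is the odd/even rule: even IDs -> 0, odd IDs -> 1.
    return workunit_id % num_schedulers


def partition(workunit_ids, num_schedulers=2):
    """Split workunit IDs into one disjoint list per scheduler."""
    buckets = [[] for _ in range(num_schedulers)]
    for wu in workunit_ids:
        buckets[assign_scheduler(wu, num_schedulers)].append(wu)
    return buckets


print(partition([101, 102, 103, 104, 105]))  # [[102, 104], [101, 103, 105]]
```

Because the rule depends only on the ID, the schedulers would need no coordination with each other about who owns which workunit. Whether that would relieve the load described in this thread is a separate question.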
W-K 666 Joined: 18 May 99 Posts: 19062 Credit: 40,757,560 RAC: 67 |
"Hey - another thought on this whole mess - what the f@#k do we do with this giant shorty storm going on now - all my resends are shorties. THAT means for each WU sent and processed and returned I'm using about 4-5 times the bandwidth I would with normal-sized WUs." The data isn't being deliberately split into shorties (VHARs). The data comes from the telescope as shorties, and SETI has no control over the telescope. The receivers we use are just piggybacked onto the telescope and look at the bit of sky it happens to be pointed at. VLARs: when the telescope is tracking one bit of sky. Lots of data on the subject. Normal mid-range: when the telescope is parked and the tracking is a result of the Earth's rotation. Good for Gaussian processing. VHARs: obtained when the telescope is ordered to scan large areas of sky quickly. Only picks up the very strongest pulse signals. |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
Ok... so where are these things kept track-of on Synergy? Perhaps Synergy is hearing the reply but doesn't know what to do with it. Is that hardware (cache) or system RAM? |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
What sort of NIC is in Synergy? Anybody remember? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Could you do the same test with the AP-splitters stoped? and/or with the use of a proxie... that could be very interesting... I made that suggestion a while back in the wish list section. Apparently the campus won't allow it. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Just to add to the data, i used the proxy suggested earlier in the thread. After nothing but Scheduler timeouts, i got work. From request to Scheduler response- 20 seconds. Same again on the second request for work- a response within 20 seconds. Tried it on my other system, 15 seconds to get a response from the Scheduler request, after nothing but timeouts. 2nd request for work- response within 15 seconds. Download speed around 50kB/s or better. Disabled the proxy on the first system, waited for it to try & get work again. Scheduler timeout. Seti has always been odd in regards to using a Proxy- even when network traffic is maxed out & downloads are almost impossible when not using a proxy (and even with the hosts file set to use the good download server) using a proxy has always resulted in good download speeds. I just stopped using them because usually after a few days, the proxy gets taken down/blocked & you have to find another one. Grant Darwin NT |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
Right you are, sir. And when the AP SPLITTER quits, but there is still AP work being distributed, all of your Scheduler attempts won't time-out if you aren't using a proxy. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.