Message boards : Number crunching : Panic Mode On (80) Server Problems?
Grant (SSSF) Joined: 19 Aug 99 Posts: 13947 Credit: 208,696,464 RAC: 304
Grant ? Grant Darwin NT
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13
Not entirely sure what that is about, either. I do remember they tried changing the IP, and also the ISP, that the scheduler listened on. The server was still in the closet and still hooked up to the same internal network as all the other servers, but it had an IP on the campus ISP, and DNS was of course updated to reflect that. My memory is a little fuzzy, but I think that made things worse somehow, though I don't recall just how. There may also have been some kind of issue with remote login, since the scheduler was now on a different subnet, which would require someone to actually go into the lab.

If we could possibly get our soft limit of 100 Mbit increased to 150, that would probably fix just about everything regarding communications. That won't fix the database having I/O performance issues, or getting fragmented and bloated, so limits may still be required, but maybe the limits could be increased a little: say 50% to start with and see what happens after a week, then add another 50%, and so on.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
rob smith Joined: 7 Mar 03 Posts: 22799 Credit: 416,307,556 RAC: 380
A quick sum:

Number of MB tasks produced per second: ~60 (based on an average production rate of 30 WU/s, with two tasks per workunit)
Amount of MB data to be transferred per second: 60 * 366 KB = ~22,000 KB

Now that's only 22 MB per second, which leaves a fair bit of change from the 100Mb pipe. So what is gobbling up the other 78MB??? My sums ignore overheads; even if those run at 100% of the "real" data, there is something having a fair old feast at the expense of S@H's link between the lab and the outside world....

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
> A quick sum

Check your bits and bytes. 22 MegaBytes (the normal unit for file sizes and storage) is a lot more than 100 Megabits (the normal unit for communications channels).
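Putting both posts' figures into consistent units makes the mismatch clear. Here is a minimal back-of-the-envelope sketch, assuming the ~30 WU/s production rate, ~366 KB per MB task and 100 Mbit link quoted above; the two-tasks-per-workunit replication is an assumption, not an official project figure:

```python
# Rough bandwidth sanity check. All figures are approximations taken from
# the posts above; two tasks per workunit is an assumption.
wu_per_sec = 30        # average MB workunit production rate
tasks_per_wu = 2       # assumed initial replication: each workunit sent to two hosts
task_size_kb = 366     # approximate size of one MB task download

demand_mb_s = wu_per_sec * tasks_per_wu * task_size_kb / 1024
print(f"Demand from MB downloads: ~{demand_mb_s:.1f} MB/s")   # ~21.4 MB/s

link_mbit_s = 100                      # 100 megabit/s lab link
link_mb_s = link_mbit_s / 8            # convert megabits to megabytes
print(f"Link capacity: ~{link_mb_s:.1f} MB/s")                # 12.5 MB/s
```

So even before overheads, the raw demand (~21 MB/s) already exceeds what a 100 Mbit pipe can carry (~12.5 MB/s); there is no spare 78 MB to account for.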
kittyman Joined: 9 Jul 00 Posts: 51540 Credit: 1,018,363,574 RAC: 1,004
> A quick sum

He would also be ignoring AP WUs........

"Time is simply the mechanism that keeps everything from happening all at once."
rob smith Joined: 7 Mar 03 Posts: 22799 Credit: 416,307,556 RAC: 380
In that case, why do we get reasonable download rates at some times when the splitters are going all out, yet at other times (like now) the performance is very poor?

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
kittyman Joined: 9 Jul 00 Posts: 51540 Credit: 1,018,363,574 RAC: 1,004
> In that case, why do we get reasonable download rates at some times when the splitters are going all out, yet at other times (like now) the performance is very poor?

It seems to be usually when the larger AP WUs are added to the download mix that things get rather tied up. I have noticed at times that AP downloads, although still slow, seem less likely to stall or hang, and so they tie up the download link for longer.

"Time is simply the mechanism that keeps everything from happening all at once."
rob smith Joined: 7 Mar 03 Posts: 22799 Credit: 416,307,556 RAC: 380
An AP is about 22 times the size of an MB. It could be that the presence of a feed of APs just trips things over the line. Likewise a high demand, such as a shortie storm, has the same effect. A small perturbation is just enough to upset the scheduler, which causes a higher number of "rejects" than normal, and so the snowball of delays and retries grows.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57
> In that case, why do we get reasonable download rates at some times when the splitters are going all out, yet at other times (like now) the performance is very poor?

APs are ~20 times larger than MBs, but only take about 6 times as long to process. The 100Mb pipe is often sufficient for standard MB tasks when there isn't a large volume of shorties. Add in AP, or batches of shorties, and then it does get choked. Hopefully the work towards larger MB tasks will help take some of the load off the line.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
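To put a rough number on how much harder APs load the link, compare download bytes per unit of crunch time using the ratios quoted above (~20x the size, ~6x the processing time). This is an illustrative sketch based on those forum figures, not official project numbers:

```python
# Download bandwidth needed per unit of crunch time, AP relative to MB.
# Ratios are the rough figures quoted in the post above.
mb_size, mb_runtime = 1.0, 1.0     # normalise an MB task to 1 unit of data, 1 unit of time
ap_size, ap_runtime = 20.0, 6.0    # AP: ~20x the data, ~6x the processing time

mb_rate = mb_size / mb_runtime     # data per unit crunch time for MB
ap_rate = ap_size / ap_runtime     # data per unit crunch time for AP
print(f"AP needs ~{ap_rate / mb_rate:.1f}x the download bandwidth of MB "
      "for the same amount of crunch time")    # ~3.3x
```

So a host crunching AP pulls roughly three times the data over the link for the same amount of compute, which fits the observation that adding APs to the mix chokes the downloads.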
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156
Some random thoughts in the evening:

.. Are AP and MB work units generated on some machine and then copied over the network to a distribution server? If so, are they loaded onto the download server using the same network card/interface that users use to download work units to their machines? If so, could it be that the generator/copier saturates the channel?

.. If not, how about the disk read/write speed of the download machine? Simultaneous read/write operations could hurt RAID performance, and the writes alone are quite costly.

But I guess you have ruled these out already.

To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
fscheel Joined: 13 Apr 12 Posts: 73 Credit: 11,135,641 RAC: 0
Can someone recommend a good, reliable source for a paid proxy that would work with SETI?
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57
IIRC most, if not all, of the servers use a Fibre Channel interconnect to the storage array. They have seen the FC network become saturated before, but that was from some changes they were trying, I believe. Most of that kind of stuff gets posted in Technical News.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57
> Can someone recommend a good, reliable source for a paid proxy that would work with SETI?

I am sure you could find a private paid proxy to use, but you might want to hit up the free ones first.

http://www.xroxy.com/proxylist.php?port=&type=&ssl=&country=US&latency=&reliability=&sort=port#table

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
For a few weeks now my main host has been almost constantly out of SETI work. BOINC's long download backoffs make it impossible to fill the cache. Only when I have time to constantly press "Retry now" can I fill the cache for a day or two, and usually only for the GPU; the CPU stays empty or on a backup project.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
I haven't been watching closely over the traditional Australia Day long weekend chaos, and my machines were crunching when I looked occasionally. If I had stuck transfers I just put this retryMainTransfers.cmd in my scheduled tasks, running every 20 mins or so:

@ECHO OFF
REM Dump the current list of file transfers to a text file.
boinccmd --get_file_transfers > mainxfers.txt
REM For every "name:" line, print the file name and ask BOINC to retry that transfer.
FOR /F "tokens=1,2" %%i IN (mainxfers.txt) DO (
    IF "%%i" EQU "name:" echo %%j
    IF "%%i" EQU "name:" boinccmd --file_transfer http://setiathome.berkeley.edu/ %%j retry
)

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223
> I haven't been watching closely over the traditional Australia Day long weekend chaos, and my machines were crunching when I looked occasionally. If I had stuck transfers I just put this retryMainTransfers.cmd in my scheduled tasks, running every 20 mins or so:

Similarly, I have this as a crontab entry on my Linux boxes, and on Windows boxes running Cygwin:

[eesridr:~] > cat retryfiles
pgrep boinc > /dev/null
if [ $? -eq 0 ]   # Test exit status of the "pgrep" command: only retry if BOINC is running.
then
    cd ~/BOINC/
    ./boinccmd --get_file_transfers | gawk -f retry.awk
fi

[eesridr:~] > cat BOINC/retry.awk
/name/ { n = $2; }
/ xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry"); }
ExchangeMan Joined: 9 Jan 00 Posts: 115 Credit: 157,719,104 RAC: 0
I have something very similar to this for the same purpose. Gotta love DOS programming.
Joined: 25 May 99 Posts: 944 Credit: 52,956,491 RAC: 67
> Dip in traffic towards Seti detected?

Yes, definitely a downturn. Expect failing reports after a good day of rapid access.

[edit] The thin blue line has hit the bottom - no more reporting until later, I fear. [end edit]
W-K 666 Joined: 18 May 99 Posts: 19691 Credit: 40,757,560 RAC: 67
> Dip in traffic towards Seti detected?

That is not the bottom; it is the 10Mb horizontal line. The weekly graph shows there is still a bit to go. But yes, there is a problem.
Grant (SSSF) Joined: 19 Aug 99 Posts: 13947 Credit: 208,696,464 RAC: 304
> But yes, there is a problem.

Yep, Scheduler borked again. "Couldn't connect to server" is once again the standard response.

Grant Darwin NT