Panic Mode On (80) Server Problems?

Author	Message
rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1332304 - Posted: 28 Jan 2013, 19:24:17 UTC An AP is about 22 times the size of an MB. It could be that the presence of a feed of APs just trips things over the line. Likewise a high demand, such as a shortie storm has the same effect. A small perturbation is just enough to upset the scheduler, which causes a higher number of "rejects" than normal, and so the snowball of delays and retries grows. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1332304 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1332309 - Posted: 28 Jan 2013, 19:30:29 UTC - in response to Message 1332294. In that case why do we get reasonable download rates sometimes when the splitters are going all out, and yet others (like now) the performance is very poor? It seems to be usually when the larger AP WUs are added to the download mix that things get rather tied up. I have noticed at times that it appears that AP downloads, although still slow, seem to be less likely to stall or hang, thus tying up the download link longer. APs are ~20 times larger than MB, but only take about 6 times the amount of time to process. The 100Mb pipe is often sufficient for standard MB tasks when there isn't a large volume of shorties. Add in AP or batches of shorties and then it does get choked. Hopefully the work towards larger MB tasks will help take some of the load off of the line. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu ID: 1332309 ·
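
A rough worked version of the size-versus-time point in the post above, as a minimal Python sketch. The two ratios are the approximate figures quoted in the post (~20 times the size, ~6 times the processing time); the function and constant names are made up for illustration, and nothing here is measured from the servers.

```python
# Approximate figures quoted in the post above (assumptions, not measurements).
AP_SIZE_RATIO = 20.0  # an AP task is roughly 20x the size of an MB task
AP_TIME_RATIO = 6.0   # but takes only roughly 6x as long to process


def relative_download_demand(size_ratio: float, time_ratio: float) -> float:
    """Data downloaded per unit of crunching time, relative to MB (MB = 1.0)."""
    return size_ratio / time_ratio


demand = relative_download_demand(AP_SIZE_RATIO, AP_TIME_RATIO)
print(f"AP needs about {demand:.1f}x the download volume of MB per hour crunched")
```

On those figures, a host kept busy with AP work pulls a little over three times as much data per hour of crunching as one kept busy with MB, which fits the observation that the line chokes once AP joins the download mix.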

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1332341 - Posted: 28 Jan 2013, 20:55:15 UTC - in response to Message 1332294. Some random thoughts in the evening... Are AP and MB work units generated on some machine and then copied over the network to a distribution server? If so, are they loaded to the download server using the same network card/interface that is used by users to download work units to their machines? If so, could it be that the generator/copier saturates the channel? If not, how about the disk read/write speed of the download machine? Simultaneous read/write operations could hurt RAID performance. The writes alone are quite costly. But I guess that you have ruled these out already. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1332341 ·

fscheel Send message Joined: 13 Apr 12 Posts: 73 Credit: 11,135,641 RAC: 0	Message 1332347 - Posted: 28 Jan 2013, 21:14:54 UTC - in response to Message 1331980. Can someone recommend a good reliable source to get a paid proxy that would work with SETI? ID: 1332347 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1332348 - Posted: 28 Jan 2013, 21:27:26 UTC - in response to Message 1332341. Some random thoughts in the evening... Are AP and MB work units generated on some machine and then copied over the network to a distribution server? If so, are they loaded to the download server using the same network card/interface that is used by users to download work units to their machines? If so, could it be that the generator/copier saturates the channel? If not, how about the disk read/write speed of the download machine? Simultaneous read/write operations could hurt RAID performance. The writes alone are quite costly. But I guess that you have ruled these out already. IIRC most of, if not all, the servers use a Fibre Channel interconnect to the storage array. They have seen the FC network become saturated before, but that was from some changes they were trying, I believe. Most of that kind of stuff gets posted in Technical News. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu ID: 1332348 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1332349 - Posted: 28 Jan 2013, 21:29:19 UTC - in response to Message 1332347. Can someone recommend a good reliable source to get a paid proxy that would work with SETI? I am sure you could find a private paid proxy to use, but you might want to hit up the free ones first. http://www.xroxy.com/proxylist.php?port=&type=&ssl=&country=US&latency=&reliability=&sort=port#table SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu ID: 1332349 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1332358 - Posted: 28 Jan 2013, 21:57:29 UTC For a few weeks now my main host has been almost constantly out of work from SETI. BOINC's long download backoffs make it impossible to fill the cache. Only when I have time to keep pressing "retry now" can I fill the cache for a day or two, and usually only for the GPU; the CPU stays empty or on a backup project. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1332358 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1332359 - Posted: 28 Jan 2013, 22:06:35 UTC - in response to Message 1332358. Last modified: 28 Jan 2013, 22:07:06 UTC Been not watching closely over the traditional Australia Day long weekend chaos, and my machines were crunching when I looked occasionally. If I had stuck transfers I just put this retryMainTransfers.cmd in my scheduled tasks for every 20 mins or so:
    @ECHO OFF
    boinccmd --get_file_transfers > mainxfers.txt
    FOR /F "tokens=1,2" %%i IN (mainxfers.txt) DO (
        IF "%%i" EQU "name:" echo %%j
        IF "%%i" EQU "name:" boinccmd --file_transfer http://setiathome.berkeley.edu/ %%j retry
    )
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1332359 ·

ivan Volunteer tester Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223	Message 1332362 - Posted: 28 Jan 2013, 22:21:32 UTC - in response to Message 1332359. Been not watching closely over the traditional Australia Day long weekend chaos, and my machines were crunching when I looked occasionally. If I had stuck transfers I just put this retryMainTransfers.cmd in my scheduled tasks for every 20 mins or so:
    @ECHO OFF
    boinccmd --get_file_transfers > mainxfers.txt
    FOR /F "tokens=1,2" %%i IN (mainxfers.txt) DO (
        IF "%%i" EQU "name:" echo %%j
        IF "%%i" EQU "name:" boinccmd --file_transfer http://setiathome.berkeley.edu/ %%j retry
    )
Similarly, I have this as a crontab entry on my Linux boxes, and on Windows running cygwin:
    [eesridr:~] > cat retryfiles
    pgrep boinc > /dev/null
    if [ $? -eq 0 ] # Test exit status of "pgrep" command.
    then
        cd ~/BOINC/
        ./boinccmd --get_file_transfers | gawk -f retry.awk
    fi
    [eesridr:~] > cat BOINC/retry.awk
    /name/ { n = $2;}
    / xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry");}
ID: 1332362 ·
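
For anyone who would rather not depend on batch or gawk, the same idea as the two scripts above can be sketched in Python. This is a hedged sketch, not a tested replacement: the function names and the `SAMPLE` listing are hypothetical, and the listing layout is an assumption modelled only on the patterns the scripts above match (`name:` as the first field, and a line reading `xfer active: no`). The block only builds and prints the retry commands; it does not call boinccmd itself.

```python
from typing import List

# Project URL taken from the two scripts above.
SETI_URL = "http://setiathome.berkeley.edu/"


def stalled_transfers(listing: str) -> List[str]:
    """Return the names of transfers whose entry contains 'xfer active: no'."""
    stalled = []
    current = None  # most recent file name seen, like the awk variable n
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "name:":
            current = fields[1]
        elif line.strip() == "xfer active: no" and current is not None:
            stalled.append(current)
    return stalled


def retry_command(name: str, url: str = SETI_URL) -> List[str]:
    """Build the argument list for retrying one transfer via boinccmd."""
    return ["boinccmd", "--file_transfer", url, name, "retry"]


# Hypothetical listing for illustration only; real output has more fields.
SAMPLE = """\
1) -----------
   name: example_task_a.wu
   xfer active: no
2) -----------
   name: example_task_b.wu
   xfer active: yes
"""

for name in stalled_transfers(SAMPLE):
    print(" ".join(retry_command(name)))
```

To use it for real, the listing would come from running `boinccmd --get_file_transfers` and each command list would be passed to something like `subprocess.run`. Like the awk version, and unlike the batch file, it retries only transfers that are not currently active.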

ExchangeMan Volunteer tester Send message Joined: 9 Jan 00 Posts: 115 Credit: 157,719,104 RAC: 0	Message 1332384 - Posted: 29 Jan 2013, 0:49:34 UTC - in response to Message 1332359. I have something very similar to this for the same purpose. Gotta love DOS programming. ID: 1332384 ·

KWSN Ekky Ekky Ekky Send message Joined: 25 May 99 Posts: 944 Credit: 52,956,491 RAC: 67	Message 1332457 - Posted: 29 Jan 2013, 8:44:14 UTC Last modified: 29 Jan 2013, 9:13:57 UTC Dip in traffic towards Seti detected? Yes, definitely a downturn. Expect failing reports after a good day of rapid access. [edit] The thin blue line has hit the bottom - no more reporting until later, I fear. [end edit] ID: 1332457 ·

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19062 Credit: 40,757,560 RAC: 67	Message 1332464 - Posted: 29 Jan 2013, 9:47:02 UTC - in response to Message 1332457. Dip in traffic towards Seti detected? Yes, definitely a downturn. Expect failing reports after a good day of rapid access. [edit] The thin blue line has hit the bottom - no more reporting until later, I fear. [end edit] That is not the bottom, it is the 10Mb horizontal. The weekly graph shows there is still a bit to go. But yes, there is a problem. ID: 1332464 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1332467 - Posted: 29 Jan 2013, 10:04:11 UTC - in response to Message 1332464. But yes, there is a problem. Yep, Scheduler borked again. "Couldn't connect to server" once again the standard response. Grant Darwin NT ID: 1332467 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1332468 - Posted: 29 Jan 2013, 10:10:20 UTC - in response to Message 1332467. But yes, there is a problem. Yep, Scheduler borked again. "Couldn't connect to server" once again the standard response. The server status page froze at 08:30 UTC - once that happens, there's usually no scheduler service until the staff get to the lab and restart things. Which, since it's Tuesday, means not until after maintenance. And since 'ready to send' was below high water mark when the page froze, and the splitters were running, we'll probably have a big bloat of tasks to work off when things are working again. ID: 1332468 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1332469 - Posted: 29 Jan 2013, 10:17:30 UTC - in response to Message 1332467. But yes, there is a problem. Yep, Scheduler borked again. "Couldn't connect to server" once again the standard response. Make that the only response. The last few times the Scheduler was playing up, hitting retry a few hundred times would eventually report the work done & get a bit more, but not this time. Dead as a dodo. Grant Darwin NT ID: 1332469 ·

Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1332470 - Posted: 29 Jan 2013, 10:18:02 UTC Well without a proxy, downloads are still questionable and fail often.. but I picked a proxy from the list and the 3 APs I had in my download queue were screaming in at 75-100KB/sec each. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1332470 ·

juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1332474 - Posted: 29 Jan 2013, 11:09:28 UTC
    29/01/2013 09:05:19 | SETI@home | Scheduler request failed: Couldn't connect to server
    29/01/2013 09:05:21 | | Internet access OK - project servers may be temporarily down.
Again? I'm tired... ID: 1332474 ·

MikeN Send message Joined: 24 Jan 11 Posts: 319 Credit: 64,719,409 RAC: 85	Message 1332497 - Posted: 29 Jan 2013, 13:26:27 UTC Just to add insult to injury, SETI decided to declare all 180 tasks on my main cruncher 'abandoned' at 2AM this morning (UK time). After I rebooted and reset the project I have not been able to connect to SETI to get any new tasks, so it is now eating its way through Einstein and Cosmology and will probably stay that way until after the weekly outage, probably about another 8-9 hours:(( ID: 1332497 ·

Ex: "Socialist" Volunteer tester Send message Joined: 12 Mar 12 Posts: 3433 Credit: 2,616,158 RAC: 2	Message 1332518 - Posted: 29 Jan 2013, 15:23:53 UTC Last modified: 29 Jan 2013, 15:24:37 UTC I don't seem to be able to upload tasks or get any at the moment. I know it's Tuesday AM over in Cali, but isn't it too early for the server to be down? I guess it's good I just bumped up my caches yesterday. #resist ID: 1332518 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1332520 - Posted: 29 Jan 2013, 15:27:27 UTC - in response to Message 1332518. I don't seem to be able to upload tasks or get any at the moment. I know it's Tuesday AM over in Cali, but isn't it too early for the server to be down? I guess it's good I just bumped up my caches yesterday. Servers crashed last night. Bookmark the Cricket graph for future reference. Hopefully they'll be back up later today after the usual maintenance outage. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1332520 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.