Panic Mode On (80) Server Problems?

Message boards : Number crunching : Panic Mode On (80) Server Problems?

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1332304 - Posted: 28 Jan 2013, 19:24:17 UTC

An AP is about 22 times the size of an MB.
It could be that the presence of a feed of APs just trips things over the line. Likewise, high demand, such as a shortie storm, has the same effect.
A small perturbation is just enough to upset the scheduler, which causes a higher number of "rejects" than normal, and so the snowball of delays and retries grows.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1332304 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1332309 - Posted: 28 Jan 2013, 19:30:29 UTC - in response to Message 1332294.  

In that case why do we get reasonable download rates sometimes when the splitters are going all out, and yet at other times (like now) the performance is very poor?

It seems to be usually when the larger AP WUs are added to the download mix that things get rather tied up. I have noticed at times that it appears that AP downloads, although still slow, seem to be less likely to stall or hang, thus tying up the download link longer.

APs are ~20 times larger than MB tasks, but only take about 6 times as long to process. The 100Mb pipe is often sufficient for standard MB tasks when there isn't a large volume of shorties. Add in AP or batches of shorties and then it does get choked. Hopefully the work towards larger MB tasks will help take some of the load off the line.
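
A rough back-of-the-envelope check of those ratios, as a sketch only: it takes the ~20x size and ~6x runtime figures above at face value, and the function name is made up for illustration. Dividing one by the other gives the download volume a host needs per hour of crunching, relative to ordinary MB work.

```python
def relative_bandwidth_per_compute_hour(size_ratio: float, runtime_ratio: float) -> float:
    """Download volume per hour of processing, relative to an MB task.

    size_ratio:    how many times larger the task file is than an MB task
    runtime_ratio: how many times longer it takes to process than an MB task
    """
    if runtime_ratio <= 0:
        raise ValueError("runtime_ratio must be positive")
    return size_ratio / runtime_ratio


# Approximate ratios quoted in the post: AP is ~20x the size, ~6x the runtime.
ap_vs_mb = relative_bandwidth_per_compute_hour(20, 6)
print(f"AP needs about {ap_vs_mb:.1f}x the download volume of MB per hour of work")
```

By the same reasoning, a shortie (assumed here to be the same download size as a normal MB task but quicker to finish) also pulls more data per hour of work, which fits the observation that shortie storms choke the line too.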
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1332309 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1332341 - Posted: 28 Jan 2013, 20:55:15 UTC - in response to Message 1332294.  


Some random thoughts in the evening...

... are AP and MB work units generated on some machine and then copied over the network to a distribution server?

If so, are they loaded onto the download server using the same network card/interface that users use to download work units to their machines?

If so, could it be that the generator/copier saturates the channel?

If not, how about the disk read/write speed of the download machine? Simultaneous read/write operations could hurt RAID performance. The writes alone are quite costly.

But I guess that you have ruled these out already.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1332341 · Report as offensive
fscheel

Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1332347 - Posted: 28 Jan 2013, 21:14:54 UTC - in response to Message 1331980.  

Can someone recommend a good reliable source to get a paid proxy that would work with SETI?
ID: 1332347 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1332348 - Posted: 28 Jan 2013, 21:27:26 UTC - in response to Message 1332341.  


Some random thoughts in the evening...

... are AP and MB work units generated on some machine and then copied over the network to a distribution server?

If so, are they loaded onto the download server using the same network card/interface that users use to download work units to their machines?

If so, could it be that the generator/copier saturates the channel?

If not, how about the disk read/write speed of the download machine? Simultaneous read/write operations could hurt RAID performance. The writes alone are quite costly.

But I guess that you have ruled these out already.

IIRC most of, if not all, the servers use a Fibre Channel interconnect to the storage array.

They have seen the FC network become saturated before, but that was from some changes they were trying I believe. Most of that kind of stuff gets posted in Technical News.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1332348 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1332349 - Posted: 28 Jan 2013, 21:29:19 UTC - in response to Message 1332347.  

Can someone recommend a good reliable source to get a paid proxy that would work with SETI?

I am sure you could find a private paid proxy to use, but you might want to hit up the free ones first.
http://www.xroxy.com/proxylist.php?port=&type=&ssl=&country=US&latency=&reliability=&sort=port#table
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1332349 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1332358 - Posted: 28 Jan 2013, 21:57:29 UTC

For a few weeks now my main host has been almost constantly out of work from SETI.
BOINC's long download backoffs make it impossible to fill the cache.
Only when I have time to constantly press "retry now" can I fill the cache for a day or two, and usually only for the GPU; the CPU remains empty or on a backup project.
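
To illustrate why the backoffs hurt so much, here is a generic exponential backoff sketch. It is not BOINC's actual algorithm: the one-minute starting delay, the doubling rule, and the four-hour cap are all assumed values picked for illustration.

```python
from typing import List


def backoff_delays(failures: int, base_s: float = 60.0, cap_s: float = 4 * 3600.0) -> List[float]:
    """Wait before each retry if the delay doubles after every failure.

    base_s and cap_s are illustrative assumptions, not BOINC's real settings.
    """
    return [min(base_s * 2 ** i, cap_s) for i in range(failures)]


delays = backoff_delays(10)
total_hours = sum(delays) / 3600
print(f"10 failed attempts in a row: {total_hours:.2f} hours spent waiting")
```

With numbers like these, a run of failed downloads quickly pushes the next automatic attempt hours away, which is why pressing "retry now" by hand (or scripting it) makes a difference.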
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1332358 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1332359 - Posted: 28 Jan 2013, 22:06:35 UTC - in response to Message 1332358.  
Last modified: 28 Jan 2013, 22:07:06 UTC

I haven't been watching closely over the traditional Australia Day long weekend chaos, and my machines were crunching whenever I looked. When I had stuck transfers I just put this retryMainTransfers.cmd in my scheduled tasks to run every 20 mins or so:

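@ECHO OFF
REM Save the current transfer list, then retry every file named in it.
boinccmd --get_file_transfers > mainxfers.txt
REM tokens=1,2 puts the first word of each line in %%i and the second in %%j.
FOR /F "tokens=1,2" %%i IN (mainxfers.txt) DO (
 IF "%%i" EQU "name:" echo %%j
 IF "%%i" EQU "name:" boinccmd --file_transfer http://setiathome.berkeley.edu/ %%j retry
)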

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1332359 · Report as offensive
Profile ivan
Volunteer tester
Avatar

Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1332362 - Posted: 28 Jan 2013, 22:21:32 UTC - in response to Message 1332359.  

I haven't been watching closely over the traditional Australia Day long weekend chaos, and my machines were crunching whenever I looked. When I had stuck transfers I just put this retryMainTransfers.cmd in my scheduled tasks to run every 20 mins or so:

@ECHO OFF
boinccmd --get_file_transfers > mainxfers.txt
FOR /F "tokens=1,2" %%i IN (mainxfers.txt) DO (
 IF "%%i" EQU "name:" echo %%j
 IF "%%i" EQU "name:" boinccmd --file_transfer http://setiathome.berkeley.edu/ %%j retry
)


Similarly, I have this as a crontab entry on my Linux boxes, and Windows running cygwin:

[eesridr:~] > cat retryfiles
pgrep boinc > /dev/null
if [ $? -eq 0 ]         # Test exit status of "pgrep" command.
then
cd ~/BOINC/
./boinccmd --get_file_transfers | gawk -f retry.awk
fi

[eesridr:~] > cat BOINC/retry.awk 
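# Remember the most recent file name.
/name/ { n = $2 }
# Retry that file only when its transfer is reported as not active.
/   xfer active: no/ { system("./boinccmd --file_transfer http://setiathome.berkeley.edu/ " n " retry") }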
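
For anyone who would rather not depend on gawk, the same idea can be sketched in Python. This is a minimal sketch, not a tested replacement: it assumes the `boinccmd --get_file_transfers` output contains `name: <file>` and `xfer active: no` lines, as the awk script above implies, and the helper names `stalled_transfers` and `retry_stalled` are made up for illustration.

```python
import subprocess
from typing import List

PROJECT_URL = "http://setiathome.berkeley.edu/"


def stalled_transfers(listing: str) -> List[str]:
    """Return the file names whose transfer is reported as not active.

    Assumes each transfer is listed with a 'name: <file>' line followed
    later by an 'xfer active: yes|no' line (an assumption about the
    boinccmd output format, inferred from the awk script).
    """
    stalled = []
    current = None
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            current = line.split(":", 1)[1].strip()
        elif line == "xfer active: no" and current is not None:
            stalled.append(current)
            current = None
    return stalled


def retry_stalled(boinccmd: str = "./boinccmd") -> None:
    """Ask the BOINC client to retry every stalled transfer."""
    listing = subprocess.run(
        [boinccmd, "--get_file_transfers"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name in stalled_transfers(listing):
        # Same command the batch and awk versions issue, one file at a time.
        subprocess.run(
            [boinccmd, "--file_transfer", PROJECT_URL, name, "retry"],
            check=False,
        )
```

Calling `retry_stalled()` from a cron job or a scheduled task every 20 minutes or so would mirror the scripts above. Keeping the parsing in its own function means it can be checked against a saved copy of the output without touching the running client.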

ID: 1332362 · Report as offensive
ExchangeMan
Volunteer tester

Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1332384 - Posted: 29 Jan 2013, 0:49:34 UTC - in response to Message 1332359.  

I have something very similar to this for the same purpose. Gotta love DOS programming.

ID: 1332384 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Avatar

Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1332457 - Posted: 29 Jan 2013, 8:44:14 UTC
Last modified: 29 Jan 2013, 9:13:57 UTC

Dip in traffic towards Seti detected?
Yes, definitely a downturn. Expect failing reports after a good day of rapid access.

[edit] The thin blue line has hit the bottom - no more reporting until later, I fear. [end edit]

ID: 1332457 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1332464 - Posted: 29 Jan 2013, 9:47:02 UTC - in response to Message 1332457.  

Dip in traffic towards Seti detected?
Yes, definitely a downturn. Expect failing reports after a good day of rapid access.

[edit] The thin blue line has hit the bottom - no more reporting until later, I fear. [end edit]

That is not the bottom, it is the 10Mb horizontal. The weekly graph shows there is still a bit to go.

But yes, there is a problem.
ID: 1332464 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1332467 - Posted: 29 Jan 2013, 10:04:11 UTC - in response to Message 1332464.  

But yes, there is a problem.

Yep, Scheduler borked again.
"Couldn't connect to server" once again the standard response.
Grant
Darwin NT
ID: 1332467 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1332468 - Posted: 29 Jan 2013, 10:10:20 UTC - in response to Message 1332467.  

But yes, there is a problem.

Yep, Scheduler borked again.
"Couldn't connect to server" once again the standard response.

The server status page froze at 08:30 UTC - once that happens, there's usually no scheduler service until the staff get to the lab and restart things.

Which, since it's Tuesday, means not until after maintenance.

And since 'ready to send' was below high water mark when the page froze, and the splitters were running, we'll probably have a big bloat of tasks to work off when things are working again.
ID: 1332468 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1332469 - Posted: 29 Jan 2013, 10:17:30 UTC - in response to Message 1332467.  

But yes, there is a problem.

Yep, Scheduler borked again.
"Couldn't connect to server" once again the standard response.


Make that the only response.
The last few times the Scheduler was playing up, hitting retry a few hundred times would eventually report the work done & get a bit more, but not this time. Dead as a dodo.
Grant
Darwin NT
ID: 1332469 · Report as offensive
Cosmic_Ocean
Avatar

Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1332470 - Posted: 29 Jan 2013, 10:18:02 UTC

Well, without a proxy, downloads are still questionable and fail often, but I picked a proxy from the list and the 3 APs I had in my download queue were screaming in at 75-100KB/sec each.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1332470 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1332474 - Posted: 29 Jan 2013, 11:09:28 UTC

29/01/2013 09:05:19 | SETI@home | Scheduler request failed: Couldn't connect to server
29/01/2013 09:05:21 | | Internet access OK - project servers may be temporarily down.

Again? I'm tired...
ID: 1332474 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1332497 - Posted: 29 Jan 2013, 13:26:27 UTC

Just to add insult to injury, SETI decided to declare all 180 tasks on my main cruncher 'abandoned' at 2AM this morning (UK time). After I rebooted and reset the project I have not been able to connect to SETI to get any new tasks, so it is now eating its way through Einstein and Cosmology and will probably stay that way until after the weekly outage, probably about another 8-9 hours:((
ID: 1332497 · Report as offensive
Profile Ex: "Socialist"
Volunteer tester
Avatar

Joined: 12 Mar 12
Posts: 3433
Credit: 2,616,158
RAC: 2
United States
Message 1332518 - Posted: 29 Jan 2013, 15:23:53 UTC
Last modified: 29 Jan 2013, 15:24:37 UTC

I don't seem to be able to upload tasks or get any at the moment.

I know it's Tuesday AM over in Cali, but isn't it too early for the server to be down?

I guess it's good I just bumped up my caches yesterday.
#resist
ID: 1332518 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1332520 - Posted: 29 Jan 2013, 15:27:27 UTC - in response to Message 1332518.  

I don't seem to be able to upload tasks or get any at the moment.

I know it's Tuesday AM over in Cali, but isn't it too early for the server to be down?

I guess it's good I just bumped up my caches yesterday.

Servers crashed last night. Bookmark the Cricket graph for future reference.
Hopefully they'll be back up later today after the usual maintenance outage.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1332520 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.