Panic Mode On (80) Server Problems?

Message boards : Number crunching : Panic Mode On (80) Server Problems?

Previous · 1 . . . 16 · 17 · 18 · 19 · 20 · 21 · 22 . . . 25 · Next

Rolf

Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 0
Switzerland
Message 1330874 - Posted: 24 Jan 2013, 19:42:22 UTC - in response to Message 1330873.  

Inbound traffic

Which traffic?
ID: 1330874 · Report as offensive
Tom*

Joined: 12 Aug 11
Posts: 127
Credit: 20,769,223
RAC: 9
United States
Message 1330876 - Posted: 24 Jan 2013, 19:57:43 UTC
Last modified: 24 Jan 2013, 20:01:45 UTC

Thank goodness the green traffic (outbound) has diminished.

Finally got through after 8 hours of trying, only to get a measly little 12 jobs of short shorties :-( all 110 seconds, followed 5 minutes later by 83 more short shorties.

I still think these shorties should be converted into CPU jobs so they don't clog the throughput as much.
ID: 1330876 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1330891 - Posted: 24 Jan 2013, 20:39:07 UTC - in response to Message 1330858.  

Just noticed one thing in the log of one of my computers:

24/01/2013 19:21:16 SETI@home Sending scheduler request: To fetch work.
24/01/2013 19:21:16 SETI@home Reporting 4 completed tasks, requesting new tasks
24/01/2013 19:23:17 Project communication failed: attempting access to reference site
24/01/2013 19:23:21 Internet access OK - project servers may be temporarily down.
24/01/2013 19:23:21 SETI@home Scheduler request failed: Timeout was reached

Isn't the timeout supposed to be 5 minutes, not just 2?

Timeout is whatever you have configured locally in place of

<http_transfer_timeout>seconds</http_transfer_timeout>
abort HTTP transfers if idle for this many seconds; default 300

Yeah... and my cc_config doesn't have that tag (I checked after I saw that), which is why I was surprised. Well, now the scheduler request got through, and the downloads do seem to hang without any progress for a good while before they time out.
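For reference, that tag goes in the `<options>` section of cc_config.xml in the BOINC data directory; a minimal sketch with the default value quoted above:

```xml
<!-- cc_config.xml -- the client re-reads this on restart or "Read config files" -->
<cc_config>
  <options>
    <!-- abort HTTP transfers that are idle for this many seconds (default 300) -->
    <http_transfer_timeout>300</http_transfer_timeout>
  </options>
</cc_config>
```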
ID: 1330891 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1330893 - Posted: 24 Jan 2013, 20:41:36 UTC - in response to Message 1330876.  
Last modified: 24 Jan 2013, 20:48:49 UTC

Thank goodness the green traffic (outbound) has diminished.

Finally got through after 8 hours of trying, only to get a measly little 12 jobs of short shorties :-( all 110 seconds, followed 5 minutes later by 83 more short shorties.

I still think these shorties should be converted into CPU jobs so they don't clog the throughput as much.

I began having problems connecting to the server late last night. When I finally connected early this morning, all I received were shorties. I went all day not being able to connect until just a few minutes ago. All I got were shorties. It's another Shortie Storm:
1/24/2013 2:22:59 PM | SETI@home | update requested by user
1/24/2013 2:23:04 PM | SETI@home | Sending scheduler request: Requested by user.
1/24/2013 2:23:04 PM | SETI@home | Reporting 50 completed tasks
1/24/2013 2:23:04 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and ATI
1/24/2013 2:23:24 PM | SETI@home | Computation for task 23jn12ad.19766.4975.14.10.59_0 finished
1/24/2013 2:23:24 PM | SETI@home | Starting task 25my12aa.26065.6202.6.10.85_1 using setiathome_enhanced version 609 (cuda23) in slot 4
1/24/2013 2:23:26 PM | SETI@home | Started upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:30 PM | SETI@home | Finished upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:50 PM | SETI@home | Scheduler request completed: got 38 new tasks
....

I have since received about another dozen or so other Shorties. My other machine is basically the same story...
ID: 1330893 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1330899 - Posted: 24 Jan 2013, 21:02:16 UTC
Last modified: 24 Jan 2013, 21:03:20 UTC

Let's do some math... again.

When MB is working, it uses roughly 70% or more of the total bandwidth available.

When AP starts, it uses another 70% to do its work.

So in the best case it's a question of simple math: 70 + 70 = 140% > 100% of the available 100 Mbps... you all know the answer.

It's crystal clear that the current structure can't handle both projects running at the same time. Either they get more bandwidth on the link, 200 Mbps or more at least (a simple change of the 320 KB WU to whatever new size they choose could well make things worse for those of us who currently can't even download a 320 KB file), or they split the downloads into two separate pipes, one for each project, each with at least 100 Mbps (that would work for a few months).

There are none so blind as those who will not see...

Politics, always politics in action.
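The saturation argument above can be sketched in a couple of lines (the 70% shares and the 100 Mbps link are the thread's estimates, not measured values):

```python
# Rough saturation check for a shared download link, using the figures from the post above.
link_mbps = 100                  # estimated total download link capacity
mb_share = 0.70 * link_mbps      # Multibeam demand when MB splitters are running
ap_share = 0.70 * link_mbps      # Astropulse demand when AP splitters are running

demand = mb_share + ap_share
print(f"combined demand: {demand:.0f} Mbps on a {link_mbps} Mbps link")
print("saturated" if demand > link_mbps else "ok")
```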
ID: 1330899 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1330907 - Posted: 24 Jan 2013, 21:23:20 UTC

There is a fairly simple way of reducing the impact of APs: restrict the download buffer by size, not by number. Currently this buffer is set at 100 WUs of either type. Since an AP WU is about 8 MB and an MB WU about 0.37 MB, every AP WU in a size-limited queue would displace 8/0.37 ≈ 21 MB WUs. In reality you could probably get away with reducing the number of MBs by a smaller figure, somewhere between 15 and 20.
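The size-based cap can be sketched like this (the ~8 MB and ~0.37 MB task sizes are the figures used in the 8/0.37 calculation above; the function and names are illustrative, not actual server code):

```python
# Fill a download buffer up to a size cap rather than a task-count cap.
# Sizes in KB, using the rough figures from the post above.
AP_KB = 8000   # one Astropulse workunit, ~8 MB
MB_KB = 370    # one Multibeam workunit, ~0.37 MB

def mb_tasks_that_fit(cap_kb, ap_count):
    """How many MB tasks still fit once ap_count AP tasks are buffered."""
    remaining = cap_kb - ap_count * AP_KB
    return max(0, remaining // MB_KB)

cap = 100 * MB_KB                  # cap equivalent to 100 MB-only tasks
print(mb_tasks_that_fit(cap, 0))   # all 100 MB tasks fit
print(mb_tasks_that_fit(cap, 1))   # each buffered AP displaces ~21 of them
```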

This would however only affect one of the two problems afflicting the download system.

The other is the maximum number of tasks delivered in a single hit. Given that the buffer is only 100 WUs, it is grossly unfair that a single cruncher can get the whole buffer in one hit. My new cruncher has done that several times recently, and I'd be quite happy with only getting 50 per hit and being able to go back five minutes later for another 50.

It would of course be far better to report and collect a smaller number each time, rather than the stupid attempts at reporting vast numbers that I am seeing just now due to the way the connections are dropping. A situation that is NOT helped by stupidly long back-offs, which actually make things worse, not better.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1330907 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1330908 - Posted: 24 Jan 2013, 21:26:16 UTC

Here's real simple math. The longer CUDA MB tasks are 367 KB and take around 25 minutes on my 5-year-old card. The Shorties are also 367 KB and take around 4 minutes on the same card. When running Shorties I am using well over six times the bandwidth, due to all the associated traffic with each transfer. It's not rocket science.

Now imagine the newer cards that complete the same 367 KB Shortie in less than a minute...
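That ratio can be checked directly (the runtimes are the figures above; real per-transfer overhead would only widen the gap):

```python
# Downloads per hour per GPU: long MB tasks vs shorties, same 367 KB file size.
long_min, short_min = 25, 4      # minutes per task on the same card, per the post above

long_rate = 60 / long_min        # tasks (and hence downloads) per hour on long tasks
short_rate = 60 / short_min      # tasks per hour on shorties

print(f"shorties generate {short_rate / long_rate:.2f}x the request traffic")
```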
ID: 1330908 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1330910 - Posted: 24 Jan 2013, 21:32:08 UTC - in response to Message 1330907.  
Last modified: 24 Jan 2013, 21:45:03 UTC

There is a fairly simple way of reducing the impact of APs: restrict the download buffer by size, not by number. Currently this buffer is set at 100 WUs of either type.

They increased the buffer size to 200 sometime last August. I think this is why we have a lot of problems when we get a Shortie storm: too many small WUs going out too frequently.

Claggy
ID: 1330910 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1330912 - Posted: 24 Jan 2013, 21:34:29 UTC

And do several of them at the same time.

The shortie turn-around on my new cruncher is about 1 per minute, and I haven't ramped it up yet; once going at full tilt it will probably be nearer 3 per minute, without overclocking.
Add to that the 8 at a time the CPU will do once it gets through the load that MalariaControl deposited the other day, and you get some idea of what can be done.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1330912 · Report as offensive
musicplayer

Joined: 17 May 10
Posts: 2430
Credit: 926,046
RAC: 0
Message 1330944 - Posted: 24 Jan 2013, 22:09:31 UTC
Last modified: 24 Jan 2013, 22:20:40 UTC

And if you happen to run the tasks that carry out the Gaussian search, those tasks are selected from the best of the ones that were shorties before that.

So, what are the numbers needed when it comes to spike and pulse scores and possible triplets? Apparently there is no need for any re-observation when it comes to the resends of these tasks.

Only a resend of earlier tasks (to users) where the correct parameter is set, so that the given task in question does the same thing.

Same goes for VLARs, I guess. It's definitely not necessary to run these tasks on every part of the sky.

But for these tasks too, a Gaussian score may be needed for a possible signal to be detected. Are we back to running the "ordinary" tasks that carry out the Gaussian search on the "best" VLARs as well?
ID: 1330944 · Report as offensive
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1330956 - Posted: 24 Jan 2013, 22:28:34 UTC - in response to Message 1330912.  

Why not just offer APs for downloads on, say, Wednesdays and Saturdays only, then offer only MBs the rest of the week?

I'm sure there's a reason for that to be hard to do, but I don't see it. Can anybody point out why that would be hard to do, or not make sense?
ID: 1330956 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1330957 - Posted: 24 Jan 2013, 22:36:34 UTC
Last modified: 24 Jan 2013, 22:40:39 UTC

One day SAH will release the SETI@home v7 large-workunit apps (the GPU apps will follow soon), currently being tested at SAH Beta.

I don't know how much longer our PCs will need to crunch these WUs, but I guess ~4 times longer than the current SAH Enhanced WUs.
Maybe someone who has already tested these apps and the new kind of WUs can say.

That would mean correspondingly fewer SAH WU downloads.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
ID: 1330957 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1330958 - Posted: 24 Jan 2013, 22:40:15 UTC - in response to Message 1330956.  
Last modified: 24 Jan 2013, 22:41:44 UTC

Why not just offer APs for downloads on, say, Wednesdays and Saturdays only, then offer only MBs the rest of the week?

I'm sure there's a reason for that to be hard to do, but I don't see it. Can anybody point out why that would be hard to do, or not make sense?


Because there are limits in place. With a max of only 100 WUs, for example, my GPUs would chew through that amount in roughly 3 hours (assuming I could even get 100 WUs, which at the moment I can't) and then sit idle for the rest of the day. So for 2 days a week they would not be working.

However, I would tend to support the general thrust of what you are saying if the limits were removed, or increased to circa 800 per GPU and 200 per CPU core. That might help provide a buffer to overcome the firestorm on the other side when MB comes back on.

rgds
ID: 1330958 · Report as offensive
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1330964 - Posted: 24 Jan 2013, 23:05:07 UTC - in response to Message 1330958.  

OTOH, you're not getting any work units when the servers are overloaded anyway. 100 is better than none to my way of thinking, and if it works better maybe the limits can be raised, as you say.
ID: 1330964 · Report as offensive
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1330969 - Posted: 24 Jan 2013, 23:30:43 UTC - in response to Message 1330824.  

It's ironic: no sooner had I posted my message below than all tasks reported.

Hey, glad to see I'm not the only one that happens to.
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1330969 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1330982 - Posted: 25 Jan 2013, 0:42:56 UTC - in response to Message 1330969.  

about 100 shorties just came in on one box and the other is getting them as well now ... looks like things are about to get turbulent ...
ID: 1330982 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1331024 - Posted: 25 Jan 2013, 3:50:06 UTC

About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that.

Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot.

Or even like it was for a while there early last year.. AP would not be issued out by the scheduler except for during certain 4-hour blocks. You would go 4 hours with absolutely no APs going out at all, and then in the next 4 hours, only MB-resends would go out, but no new ones.. it was just all AP for those 4 hours.
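The feeder-slot-ratio idea can be sketched as a tiny allocator (the 97/3 split is the historical figure mentioned above; the function and names are illustrative, not the actual feeder code):

```python
# Sketch of a fixed-ratio feeder: split N scheduler slots among apps by integer weight.
def allocate_slots(total, weights):
    """Split `total` slots proportionally to integer weights."""
    weight_sum = sum(weights.values())
    alloc = {app: total * w // weight_sum for app, w in weights.items()}
    # integer division can leave a few slots over; hand them to the heaviest app
    leftover = total - sum(alloc.values())
    heaviest = max(weights, key=weights.get)
    alloc[heaviest] += leftover
    return alloc

print(allocate_slots(100, {"MB": 97, "AP": 3}))   # the old 97/3 split
print(allocate_slots(200, {"MB": 190, "AP": 10})) # one of the suggested new splits
```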
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1331024 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1331066 - Posted: 25 Jan 2013, 6:33:02 UTC - in response to Message 1331024.  
Last modified: 25 Jan 2013, 7:12:50 UTC

One system is out of GPU work; the other is running out of GPU & CPU work.
Apart from the odd aberration, Scheduler requests just result in "Couldn't connect to server" messages.

EDIT: if only I had gotten home from work & posted about the issues sooner. Inbound network traffic has surged, and I'm now downloading work...
The Scheduler would appear to be alive again. It takes at least a couple of minutes, but it's possible to connect.
Grant
Darwin NT
ID: 1331066 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1331076 - Posted: 25 Jan 2013, 7:36:41 UTC - in response to Message 1331066.  


...and now that the Scheduler is working again, the network pipes are fully clogged & downloads have gone from almost 10 kB/s to less than 1.
Grant
Darwin NT
ID: 1331076 · Report as offensive
Profile Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1331092 - Posted: 25 Jan 2013, 8:38:34 UTC - in response to Message 1331024.  

About the buffer.. See, the way it used to be when AP came along in the first place, the feeder only had 100 slots available. The ratio was 97/3 for MB/AP. Then when we went from v5 to v505, it was adjusted to 96/3/1 until v5 was completely gone, and then I don't know what it became after that.

Maybe we could try going back to something like that. Maybe 190/10, or 195/5? Or even just cut back to maybe one AP splitter? Just need to thin the population a bit and that may help things a lot.

The number of splitters seems to be OK; MB is split at about the same speed as AP, and you won't get better than that. It avoids the situation we had before: a few days of intensive AP splitting with lots of download problems, and then after a few days bandwidth usage of about 70%. Now, with a more or less constant ratio of available MB and AP WUs, all they need to do is slow the feeder down a bit so it doesn't refill the scheduler queue as often as it does now.
ID: 1331092 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.