Panic Mode On (80) Server Problems?




Message boards : Number crunching : Panic Mode On (80) Server Problems?

Previous · 1 . . . 16 · 17 · 18 · 19 · 20 · 21 · 22 . . . 25 · Next
Author Message
Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,570,557
RAC: 254
Germany
Message 1330805 - Posted: 24 Jan 2013, 18:33:13 UTC

Just noticed one thing in the log of one of my computers:

24/01/2013 19:21:16 SETI@home Sending scheduler request: To fetch work.
24/01/2013 19:21:16 SETI@home Reporting 4 completed tasks, requesting new tasks
24/01/2013 19:23:17 Project communication failed: attempting access to reference site
24/01/2013 19:23:21 Internet access OK - project servers may be temporarily down.
24/01/2013 19:23:21 SETI@home Scheduler request failed: Timeout was reached

Isn't the timeout supposed to be 5 minutes, not just 2?
____________
.

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1330818 - Posted: 24 Jan 2013, 18:43:39 UTC

I can't get anything to report here either.

Swordfish
Joined: 5 Aug 06
Posts: 72
Credit: 3,012,670
RAC: 0
United Kingdom
Message 1330824 - Posted: 24 Jan 2013, 18:47:14 UTC

Ironically, no sooner had I posted my message below than all my tasks reported.

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,268,196
RAC: 4,153
United States
Message 1330836 - Posted: 24 Jan 2013, 19:04:09 UTC - in response to Message 1330824.

Swordfish, Not to worry:

1/24/2013 11:59:06 AM SETI@home Reporting 3 completed tasks, not requesting new tasks
1/24/2013 12:00:40 PM Project communication failed: attempting access to reference site
1/24/2013 12:00:40 PM SETI@home Scheduler request failed: Server returned nothing (no headers, no data)
1/24/2013 12:00:42 PM Internet access OK - project servers may be temporarily down.

____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8629
Credit: 51,360,340
RAC: 50,208
United Kingdom
Message 1330858 - Posted: 24 Jan 2013, 19:23:08 UTC - in response to Message 1330805.

Just noticed one thing in the log of one of my computers:

24/01/2013 19:21:16 SETI@home Sending scheduler request: To fetch work.
24/01/2013 19:21:16 SETI@home Reporting 4 completed tasks, requesting new tasks
24/01/2013 19:23:17 Project communication failed: attempting access to reference site
24/01/2013 19:23:21 Internet access OK - project servers may be temporarily down.
24/01/2013 19:23:21 SETI@home Scheduler request failed: Timeout was reached

Isn't the timeout supposed to be 5 minutes, not just 2?

The timeout is whatever you have configured locally with

<http_transfer_timeout>seconds</http_transfer_timeout>
abort HTTP transfers if idle for this many seconds; default 300
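For anyone who wants to change it, that tag goes in the `<options>` section of the BOINC client's cc_config.xml, something like this (a minimal sketch; the 300 shown is just the default made explicit):

```xml
<cc_config>
  <options>
    <!-- abort HTTP transfers if idle for this many seconds; default 300 -->
    <http_transfer_timeout>300</http_transfer_timeout>
  </options>
</cc_config>
```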

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5861
Credit: 60,359,843
RAC: 48,458
Australia
Message 1330873 - Posted: 24 Jan 2013, 19:36:53 UTC - in response to Message 1330858.


Inbound traffic has dropped even further.
____________
Grant
Darwin NT.

Rolf
Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 2
Switzerland
Message 1330874 - Posted: 24 Jan 2013, 19:42:22 UTC - in response to Message 1330873.

Inbound traffic

Which traffic?

Tom*
Joined: 12 Aug 11
Posts: 114
Credit: 4,814,441
RAC: 244
United States
Message 1330876 - Posted: 24 Jan 2013, 19:57:43 UTC
Last modified: 24 Jan 2013, 20:01:45 UTC

Thank goodness green traffic (outbound) has diminished.

Finally got through after 8 hours of trying, only to get a measly little 12 jobs of short shorties :-( all 110 seconds, followed 5 minutes later by 83 more short shorties.

I still think these shorties should be converted into CPU jobs so they don't clog the throughput as much.

Link
Joined: 18 Sep 03
Posts: 828
Credit: 1,570,557
RAC: 254
Germany
Message 1330891 - Posted: 24 Jan 2013, 20:39:07 UTC - in response to Message 1330858.

Just noticed one thing in the log of one of my computers:

24/01/2013 19:21:16 SETI@home Sending scheduler request: To fetch work.
24/01/2013 19:21:16 SETI@home Reporting 4 completed tasks, requesting new tasks
24/01/2013 19:23:17 Project communication failed: attempting access to reference site
24/01/2013 19:23:21 Internet access OK - project servers may be temporarily down.
24/01/2013 19:23:21 SETI@home Scheduler request failed: Timeout was reached

Isn't the timeout supposed to be 5 minutes, not just 2?

The timeout is whatever you have configured locally with

<http_transfer_timeout>seconds</http_transfer_timeout>
abort HTTP transfers if idle for this many seconds; default 300

Yeah... and my cc_config doesn't have that tag (I checked after I saw that), which is why I was surprised. Well, now the scheduler request got through, and the downloads seem to hang long enough without any progress before they time out.
____________
.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1356
Credit: 50,447,756
RAC: 101,043
United States
Message 1330893 - Posted: 24 Jan 2013, 20:41:36 UTC - in response to Message 1330876.
Last modified: 24 Jan 2013, 20:48:49 UTC

Thank goodness green traffic (outbound) has diminished.

Finally got through after 8 hours of trying, only to get a measly little 12 jobs of short shorties :-( all 110 seconds, followed 5 minutes later by 83 more short shorties.

I still think these shorties should be converted into CPU jobs so they don't clog the throughput as much.

I began having problems connecting to the server late last night. When I finally connected early this morning, all I received were shorties. I went all day not being able to connect until just a few minutes ago. All I got were shorties. It's another Shortie Storm:

1/24/2013 2:22:59 PM | SETI@home | update requested by user
1/24/2013 2:23:04 PM | SETI@home | Sending scheduler request: Requested by user.
1/24/2013 2:23:04 PM | SETI@home | Reporting 50 completed tasks
1/24/2013 2:23:04 PM | SETI@home | Requesting new tasks for CPU and NVIDIA and ATI
1/24/2013 2:23:24 PM | SETI@home | Computation for task 23jn12ad.19766.4975.14.10.59_0 finished
1/24/2013 2:23:24 PM | SETI@home | Starting task 25my12aa.26065.6202.6.10.85_1 using setiathome_enhanced version 609 (cuda23) in slot 4
1/24/2013 2:23:26 PM | SETI@home | Started upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:30 PM | SETI@home | Finished upload of 23jn12ad.19766.4975.14.10.59_0_0
1/24/2013 2:23:50 PM | SETI@home | Scheduler request completed: got 38 new tasks
....

I have since received about another dozen or so other Shorties. My other machine is basically the same story...

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5380
Credit: 304,902,789
RAC: 340,026
Brazil
Message 1330899 - Posted: 24 Jan 2013, 21:02:16 UTC
Last modified: 24 Jan 2013, 21:03:20 UTC

Let's do some math... again.

When MB is working, it uses ~70% or more of the total available bandwidth.

When AP starts, it uses another 70% to do its work.

So it's a question of simple math: even in the best case, 70 + 70 = 140% > 100% of the available 100 Mbps link, and then... you all know the answer.

It's crystal clear that the current infrastructure can't handle both projects running at the same time. Either they get more bandwidth on the existing link, at least 200 Mbps (simply changing the 320 KB WU to whatever new size they choose may well make things worse for those of us who currently can't even download a 320 KB file), or they split the downloads into two separate pipes, one per project, each with at least 100 Mbps (which would work for a few months).

Only those who don't want to see can fail to see it...

Politics, always politics in action.
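The back-of-the-envelope figures above can be sketched out explicitly (a toy calculation; the 70% shares and the 100 Mbps link are the poster's estimates, not measured values):

```python
# Rough model of the download-link contention described above.
link_mbps = 100.0              # stated capacity of the download link
mb_share = 0.70 * link_mbps    # MB downloads: ~70% of the link when active
ap_share = 0.70 * link_mbps    # AP downloads: another ~70% when AP starts

demand = mb_share + ap_share
print(f"demand {demand:.0f} Mbps vs capacity {link_mbps:.0f} Mbps")
# prints: demand 140 Mbps vs capacity 100 Mbps
```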
____________

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8525
Credit: 58,949,434
RAC: 78,963
United Kingdom
Message 1330907 - Posted: 24 Jan 2013, 21:23:20 UTC

There is a fairly simple way of reducing the impact of APs: restrict the download buffer by size, not by number. Currently this buffer is set at 100 WUs of either type, so for every AP WU in the queue, the number of MBs would be reduced by 8/0.37 ≈ 21. In reality you could probably get away with reducing the number of MBs by a smaller amount, somewhere between 15 and 20.
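The displacement figure works out like this (a quick check, assuming ~8 MB per AP workunit and ~0.37 MB per MB workunit, the sizes implied by the 8/0.37 ratio above):

```python
# How many MB workunits one AP workunit displaces in a size-limited buffer.
ap_wu_mb = 8.0    # assumed Astropulse workunit size in MB
mb_wu_mb = 0.37   # assumed Multibeam workunit size in MB (~370 KB)

displaced = int(ap_wu_mb / mb_wu_mb)  # whole MB workunits per AP workunit
print(displaced)  # 21
```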

This would however only affect one of the two problems afflicting the download system.

The other is the maximum number of tasks delivered in a single hit. Given that the buffer is only 100 WUs, it is grossly unfair that a single cruncher can grab the whole buffer in one hit. My new cruncher has done that several times recently, and I'd be quite happy getting only 50 per hit and being able to go back in another five minutes for another 50.

It would of course be far better to report and collect a smaller number each time, rather than the stupid attempts at reporting vast numbers that I am seeing just now due to the way the connections are dropping. A situation that is NOT helped by stupidly long back-offs, which actually make the situation worse, not better.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1356
Credit: 50,447,756
RAC: 101,043
United States
Message 1330908 - Posted: 24 Jan 2013, 21:26:16 UTC

Here's some really simple math. The longer CUDA MB tasks are 367 KB and take around 25 minutes on my 5-year-old card. The Shorties are also 367 KB and take around 4 minutes on the same card. When running Shorties I am using well over six times the bandwidth, due to all the associated traffic with each transfer. It's not rocket science.

Now imagine the newer cards that complete the same 367 KB Shortie in less than a minute...
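That ratio is easy to check (a toy estimate using the download size and runtimes quoted above):

```python
# Same download size, very different runtimes: shorties hit the link harder.
wu_kb = 367.0     # both long tasks and shorties are ~367 KB
long_min = 25.0   # long CUDA MB task runtime on a 5-year-old card
short_min = 4.0   # shortie runtime on the same card

ratio = (wu_kb / short_min) / (wu_kb / long_min)  # relative download demand
print(f"{ratio:.2f}x")  # 6.25x, before counting per-request overhead
```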

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4139
Credit: 33,404,245
RAC: 18,476
United Kingdom
Message 1330910 - Posted: 24 Jan 2013, 21:32:08 UTC - in response to Message 1330907.
Last modified: 24 Jan 2013, 21:45:03 UTC

There is a fairly simple way of reducing the impact of APs: restrict the download buffer by size, not by number. Currently this buffer is set at 100 WUs of either type.

They increased the buffer size to 200 sometime last August. I think this is why we have a lot of problems when we get a Shortie storm: too many small WUs going out too frequently.

Claggy

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8525
Credit: 58,949,434
RAC: 78,963
United Kingdom
Message 1330912 - Posted: 24 Jan 2013, 21:34:29 UTC

And do several of them at the same time.

The shortie turn-around on my new cruncher is about 1 per minute, and I haven't ramped it up yet; once going at full tilt it will probably be nearer 3 per minute, without overclocking.
Add to that the 8 at a time the CPU will do once it gets through the load that MalariaControl deposited the other day, and you get some idea of what can be done.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

musicplayer
Joined: 17 May 10
Posts: 1456
Credit: 705,172
RAC: 602
Message 1330944 - Posted: 24 Jan 2013, 22:09:31 UTC
Last modified: 24 Jan 2013, 22:20:40 UTC

And if you happen to run the tasks that carry out the Gaussian search, those tasks are selected from the best ones that were shorties before that.

So, what are the numbers needed when it comes to spike and pulse scores and possible triplets? Apparently there is no need for any re-observation when it comes to the resends of these tasks.

Only a resend of earlier tasks (to users) where the correct parameter is set, in order for the given task to do the same thing.

Same goes for VLARs, I guess. Definitely not necessary to run these tasks on every part of the sky.

But for these tasks too, a Gaussian score may be needed for a possible signal to be detected. Are we back to running the "ordinary" tasks that carry out the Gaussian search on the "best" VLARs as well?

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,920,394
RAC: 14,025
United States
Message 1330956 - Posted: 24 Jan 2013, 22:28:34 UTC - in response to Message 1330912.

Why not just offer APs for downloads on, say,
Wednesdays and Saturdays only, then offer only
MBs the rest of the week?

I'm sure there's a reason for that to be hard
to do, but I don't see it. Can anybody point
out why that would be hard to do? Or not make
sense?

[seti.international] Dirk Sadowski (Project donor)
Volunteer tester
Joined: 6 Apr 07
Posts: 7100
Credit: 60,841,640
RAC: 16,976
Germany
Message 1330957 - Posted: 24 Jan 2013, 22:36:34 UTC
Last modified: 24 Jan 2013, 22:40:39 UTC

One day SAH will release the SETI@home v7 large-workunit apps (the GPU apps will follow soon), currently being tested at SAH-BETA.

I don't know how much longer our PCs will need to calculate these WUs, but I guess ~4 times longer than the current SAH Enhanced WUs.
Maybe someone who has already tested these apps and this new kind of WU can say.

That would mean correspondingly fewer SAH WU downloads.


* Best regards! :-) * Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Lionel
Joined: 25 Mar 00
Posts: 576
Credit: 234,736,019
RAC: 218,239
Australia
Message 1330958 - Posted: 24 Jan 2013, 22:40:15 UTC - in response to Message 1330956.
Last modified: 24 Jan 2013, 22:41:44 UTC

Why not just offer APs for downloads on, say,
Wednesdays and Saturdays only, then offer only
MBs the rest of the week?

I'm sure there's a reason for that to be hard
to do, but I don't see it. Can anybody point
out why that would be hard to do? Or not make
sense?


Because there are limits in place. With a max of only 100 WUs, for example, my GPUs would chew through that amount in roughly 3 hours (and that is assuming I can get 100 WUs, which at the moment I can't), and then they would sit idle for the rest of the day. So for 2 days a week they would not be working.

However, I would tend to support the general thrust of what you are saying if the limits were removed, or increased to circa 800 per GPU and 200 per CPU core. That might help provide a buffer to overcome the firestorm on the other side when MB comes back on.
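The idle-time concern is easy to quantify (a toy calculation using the figures quoted above: a 100-WU cap emptied in about 3 hours):

```python
# Hours per day a fast GPU host would sit idle under a 100-WU cap
# if work were only issued on designated days.
cap_wus = 100           # assumed fetch limit
hours_to_drain = 3.0    # the GPUs empty a 100-WU cache in ~3 hours
idle_hours = 24.0 - hours_to_drain
print(idle_hours)  # 21.0
```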

rgds
____________

bill
Joined: 16 Jun 99
Posts: 861
Credit: 23,920,394
RAC: 14,025
United States
Message 1330964 - Posted: 24 Jan 2013, 23:05:07 UTC - in response to Message 1330958.

OTOH, you're not getting any work units when the servers are overloaded anyway. 100 is better than none to my way of thinking, and if it works better maybe the limits can be raised, as you say.



Copyright © 2014 University of California