Panic Mode On (80) Server Problems?


Grant (SSSF)
Joined: 19 Aug 99
Posts: 5811
Credit: 58,788,301
RAC: 48,478
Australia
Message 1329346 - Posted: 20 Jan 2013, 6:15:59 UTC - in response to Message 1329323.

I can't see buying two new servers if the pipe keeps getting plugged up.

The pipe has been plugged up for years, so the present issues are either due to the download servers, or some other internal network issues IMHO.
The fact the network is saturated doesn't help, of course.
____________
Grant
Darwin NT.

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8374
Credit: 56,486,196
RAC: 78,091
United Kingdom
Message 1329373 - Posted: 20 Jan 2013, 9:25:58 UTC

I find it very strange that, with the pipeline at or near saturation, one server manages to shuffle work out at an acceptable rate while the other crawls (at best). The number of re-tries generated by the crawling server cannot help the overall throughput, and so adds to the general pipeline clog, which in part will be down to the re-try requests and associated traffic.
Wasn't there a short period a few months back when only the "speedy" server was in action, and if I recall there were far fewer re-try events at that time?
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1248
Credit: 47,223,499
RAC: 112,920
United States
Message 1329459 - Posted: 20 Jan 2013, 16:19:38 UTC

My MB Cuda caches contain mostly Shorties.

They're back...

hbomber
Volunteer tester
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 2
Bulgaria
Message 1329466 - Posted: 20 Jan 2013, 16:55:25 UTC
Last modified: 20 Jan 2013, 17:00:13 UTC

132 out of 200 tasks here are shorties (AR > 2.2). The bad part is that most of them are for the CPU, which is less efficient with them than the GPU: 1.5 minutes on GPU vs 12-15 minutes on CPU. In contrast, WUs with middle AR take 7-8 minutes on GPU and 45-55 minutes on CPU, so there is a huge difference between the two groups of units.
I used to reschedule them, but someone wouldn't be happy with the credits given, so I stopped doing so.
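
To put numbers on that difference, here is a rough sketch that turns the run times quoted above into CPU/GPU slowdown ratios. The dictionary layout and the function name `slowdown_range` are only illustrative; the minutes are the ones from this post.

```python
# Run times in minutes as quoted in the post, each given as a (low, high) range.
runtimes = {
    "shortie (AR > 2.2)": {"gpu": (1.5, 1.5), "cpu": (12, 15)},
    "mid AR": {"gpu": (7, 8), "cpu": (45, 55)},
}


def slowdown_range(gpu, cpu):
    """Return the (smallest, largest) CPU/GPU run-time ratio for two ranges."""
    # Smallest ratio: fastest CPU time against slowest GPU time, and vice versa.
    return cpu[0] / gpu[1], cpu[1] / gpu[0]


for kind, times in runtimes.items():
    low, high = slowdown_range(times["gpu"], times["cpu"])
    print(f"{kind}: CPU takes {low:.1f}x to {high:.1f}x as long as the GPU")
```

With these figures the CPU is about 8x to 10x slower than the GPU on shorties, against roughly 5.6x to 7.9x on mid-AR units, which is why shorties landing on the CPU hurt more.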

As for AP downloads, only proxies helped me fill my cache. I even lowered it, because at one point I had 100+ APs. My puny 5770 can do about 12-13 APs a day (running 16-17 hours).
____________

James Sotherden (Project donor)
Joined: 16 May 99
Posts: 8791
Credit: 34,109,593
RAC: 59,305
United States
Message 1329499 - Posted: 20 Jan 2013, 18:11:05 UTC
Last modified: 20 Jan 2013, 18:11:35 UTC

I have APs that have been choking and gagging for the last 8 hours. They are 96% downloaded, so maybe only 3 more hours :)
____________

Old James

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2264
Credit: 8,663,385
RAC: 4,309
United States
Message 1329529 - Posted: 20 Jan 2013, 19:06:07 UTC

I use a proxy and my APs take about 12 minutes to download from start to finish with no time-outs/stalls.

My single-core MB-only machine is just on its own and manages to get MBs just fine with only 2-3 re-tries. Main cruncher is getting 6-8 APs/day and they have to be nudged along and assisted the whole way.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,807,818
RAC: 2,219
United States
Message 1329530 - Posted: 20 Jan 2013, 19:08:51 UTC

The cricket graph for packet count shows it's getting ... wonky again.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1248
Credit: 47,223,499
RAC: 112,920
United States
Message 1329539 - Posted: 20 Jan 2013, 19:33:11 UTC
Last modified: 20 Jan 2013, 19:56:05 UTC

So, what's a "permanent HTTP error"? First time I've seen that error when trying to use a proxy. Now if you choose the wrong proxy your download is terminated and listed as an Error? Is this new with 7.0.44? It's always something...

1/20/2013 2:15:45 PM | | Using proxy info from GUI
1/20/2013 2:15:45 PM | | Using HTTP proxy 165.24.5.219:8080
1/20/2013 2:16:15 PM | | Suspending network activity - user request
1/20/2013 2:16:20 PM | | Resuming network activity
1/20/2013 2:16:20 PM | SETI@home | Started download of ap_25jn12ad_B4_P0_00126_20130120_04307.wu
1/20/2013 2:16:20 PM | SETI@home | Started download of ap_25jn12ad_B4_P1_00162_20130120_04699.wu
1/20/2013 2:16:20 PM | SETI@home | Started download of ap_25jn12ad_B4_P0_00160_20130120_04307.wu
1/20/2013 2:16:20 PM | SETI@home | Started download of ap_25jn12ad_B4_P0_00235_20130120_04307.wu
1/20/2013 2:16:21 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P0_00126_20130120_04307.wu: permanent HTTP error
1/20/2013 2:16:21 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P1_00162_20130120_04699.wu: permanent HTTP error
1/20/2013 2:16:21 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P0_00160_20130120_04307.wu: permanent HTTP error
1/20/2013 2:16:21 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P0_00235_20130120_04307.wu: permanent HTTP error
1/20/2013 2:16:21 PM | SETI@home | Started download of ap_25jn12ad_B3_P1_00315_20130120_30148.wu
1/20/2013 2:16:21 PM | SETI@home | Started download of ap_29no12ae_B0_P1_00146_20130120_12722.wu
1/20/2013 2:16:23 PM | SETI@home | Giving up on download of ap_25jn12ad_B3_P1_00315_20130120_30148.wu: permanent HTTP error
1/20/2013 2:16:23 PM | SETI@home | Giving up on download of ap_29no12ae_B0_P1_00146_20130120_12722.wu: permanent HTTP error
...


1/20/2013 2:21:11 PM | | Using proxy info from GUI
1/20/2013 2:21:11 PM | | Using HTTP proxy 178.18.17.250:3128
1/20/2013 2:21:22 PM | | Suspending network activity - user request
1/20/2013 2:21:28 PM | | Resuming network activity
1/20/2013 2:21:28 PM | SETI@home | Started download of ap_25jn12ad_B4_P0_00278_20130120_04307.wu
1/20/2013 2:21:28 PM | SETI@home | Started download of ap_25jn12ad_B4_P1_00282_20130120_04699.wu
1/20/2013 2:21:28 PM | SETI@home | Started download of ap_24my12ab_B4_P0_00202_20130119_05992.wu
1/20/2013 2:21:30 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P0_00278_20130120_04307.wu: permanent HTTP error
1/20/2013 2:21:30 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P1_00282_20130120_04699.wu: permanent HTTP error
1/20/2013 2:21:30 PM | SETI@home | Giving up on download of ap_24my12ab_B4_P0_00202_20130119_05992.wu: permanent HTTP error
...


And...I was just about to say how great things were working with 7.0.44. You can rack up a lot of Errors quickly this way.
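
For anyone who wants to see how many downloads each proxy cost them, a minimal sketch along these lines should do it. It assumes the event log has been saved as plain text in the same "time | project | message" layout as the excerpt above; the function name `failures_per_proxy` is made up for this example, and the sample lines are copied from the log in this post.

```python
import re
from collections import Counter

# Patterns matching the two kinds of log message shown in the excerpt.
PROXY_RE = re.compile(r"Using HTTP proxy (\S+)")
GIVE_UP_RE = re.compile(r"Giving up on download of (\S+): permanent HTTP error")


def failures_per_proxy(log_lines):
    """Count 'permanent HTTP error' give-ups under the most recent proxy line."""
    counts = Counter()
    proxy = "(no proxy)"  # used until the first "Using HTTP proxy" line is seen
    for line in log_lines:
        match = PROXY_RE.search(line)
        if match:
            proxy = match.group(1)
            continue
        if GIVE_UP_RE.search(line):
            counts[proxy] += 1
    return counts


sample = [
    "1/20/2013 2:15:45 PM | | Using HTTP proxy 165.24.5.219:8080",
    "1/20/2013 2:16:21 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P0_00126_20130120_04307.wu: permanent HTTP error",
    "1/20/2013 2:16:21 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P1_00162_20130120_04699.wu: permanent HTTP error",
    "1/20/2013 2:21:11 PM | | Using HTTP proxy 178.18.17.250:3128",
    "1/20/2013 2:21:30 PM | SETI@home | Giving up on download of ap_25jn12ad_B4_P0_00278_20130120_04307.wu: permanent HTTP error",
]

for proxy, count in failures_per_proxy(sample).items():
    print(f"{proxy}: {count} download(s) given up")
```

It only attributes each failure to the last proxy line seen before it, so it will not notice a proxy being switched off in between.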

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,996,037
RAC: 5,766
United Kingdom
Message 1329553 - Posted: 20 Jan 2013, 19:54:42 UTC - in response to Message 1329539.
Last modified: 20 Jan 2013, 19:59:35 UTC

Boinc couldn't get the files because they didn't exist on that network connection, so it gave up on them completely. It isn't because of Boinc 7.0.44. The project has had Workunits get lost before, but this time I'd blame the proxy.

I've had this before when I've connected via a wireless hotspot: once the login times out, the downloads all fail.
I get round it by manually downloading the WUs, overwriting the failed downloads, then editing my client_state.xml so that the failed downloads never happened.

Claggy

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1248
Credit: 47,223,499
RAC: 112,920
United States
Message 1329563 - Posted: 20 Jan 2013, 20:16:41 UTC - in response to Message 1329553.

Can you point to a link that explains how to resume those downloads? I'd like to get them back. I'm not concerned about erasing the errors, download errors are almost a badge of honor at this point.

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,996,037
RAC: 5,766
United Kingdom
Message 1329575 - Posted: 20 Jan 2013, 21:00:25 UTC - in response to Message 1329563.

You can't resume the download, but what you can do, if you haven't already reported the errored downloads, is get the URLs from your client_state.xml,
download the WUs with a download manager, overwrite the failed downloads with your good downloads, and edit your client_state.xml (very carefully) like so:

http://setiathome.berkeley.edu/forum_thread.php?id=68768&postid=1260895

If you've already reported them, then there's nothing you can do.
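
For the "get the URLs" step, something like the sketch below could save some scrolling. It is only a sketch under assumptions: that client_state.xml parses as XML, and that each file entry has a `name` child plus a `url` or `download_url` child (those tag names are an assumption, so check them against your own file). The sample XML, the host `example.invalid`, and the function name `download_urls` are made up for illustration. The script only reads; it does not edit anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; tag names and the URL are assumptions, not real data.
SAMPLE = """
<client_state>
  <file_info>
    <name>ap_example_0001.wu</name>
    <url>http://example.invalid/download/ap_example_0001.wu</url>
  </file_info>
  <workunit>
    <name>ap_example_0001</name>
  </workunit>
</client_state>
"""


def download_urls(xml_text, suffix=".wu"):
    """Map each file name ending in `suffix` to the URLs listed alongside it."""
    root = ET.fromstring(xml_text)
    urls = {}
    for elem in root.iter():
        name = (elem.findtext("name") or "").strip()
        if not name.endswith(suffix):
            continue
        # Accept either assumed tag name for the download location.
        found = [
            child.text.strip()
            for tag in ("url", "download_url")
            for child in elem.findall(tag)
            if child.text
        ]
        if found:
            urls[name] = found
    return urls


for name, links in download_urls(SAMPLE).items():
    print(name, links)
```

To try it on the real thing, read a copy of client_state.xml into a string and pass that in instead of `SAMPLE`.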

Claggy

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1248
Credit: 47,223,499
RAC: 112,920
United States
Message 1329583 - Posted: 20 Jan 2013, 21:18:41 UTC - in response to Message 1329575.

They were reported right after I discontinued the proxy. I seem to be having trouble receiving AP work for the ATI card, and while I wasn't paying attention, over half my ATI AP cache was replaced by Cuda Shorties. I'm down to about 10 hours work for the 6850.

Chris
Joined: 11 Apr 12
Posts: 9
Credit: 354,608
RAC: 0
United States
Message 1329689 - Posted: 21 Jan 2013, 2:04:16 UTC

Burning through the short work units before the APs download if I don't babysit.

Looks like the server won't send more MB if there are APs in the queue, but Boinc won't go to a backup project (zero work share) if downloads are in progress. I can suspend the APs so it goes to the backup at least.

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 73,314,600
RAC: 90,478
Argentina
Message 1329760 - Posted: 21 Jan 2013, 4:34:22 UTC - in response to Message 1329689.

That is exactly what is happening on my hosts...
If I don't look after them, they stop working altogether: the stalled downloads prevent the project from getting any additional work, and as there is work "pending" for the main project, the backup projects are not asked for work...

But if I set any other project to a resource share higher than zero, then those projects become the only ones crunched because, no matter what the cache size is, as SETI is not able to give any work, BOINC fills the cache with work for the other projects...

In the end, the only workaround I've found was to give up on SETI because, with the current limits, there is no way to keep my hosts fed, not even with me fully dedicated to manually hitting the retry button and/or the suspend/restart network menu options... So right now, my "SETI Crunchers" are set to 99% for SETI and 1% for Einstein, and even with those settings they are crunching only for Einstein...
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5274
Credit: 292,528,830
RAC: 471,862
Brazil
Message 1329807 - Posted: 21 Jan 2013, 12:31:15 UTC - in response to Message 1329760.
Last modified: 21 Jan 2013, 12:32:39 UTC

The rule is still in place: the "farther" you are from the lab (and we are far, in internet terms), the worse the communication problem.

For my part, I can't even feed the slowest hosts, never mind the 3x690 host (I stopped using it and moved the GPUs to 3 different hosts)... it's simply a hopeless task! It takes more time to download the WUs than to process them.

The only way to keep our hungry hosts fed is when AP splitting stops and we can get a small part of the bandwidth without communication errors.
____________

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 11418
Credit: 14,175,341
RAC: 14,039
United States
Message 1329908 - Posted: 21 Jan 2013, 19:03:21 UTC

I just remoted into one of my hosts and found it in the middle of a download. It took a total of 4m59s. It's at the limit, so I'm satisfied with speed.

I'm not satisfied with the limit, though. I recently passed 3 million for Einstein and in a lot less time than it took to get from 1 million to 2. My resource shares remain at 110 Seti, 30 Einstein.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 11,565,100
RAC: 12,734
United Kingdom
Message 1329909 - Posted: 21 Jan 2013, 19:09:36 UTC
Last modified: 21 Jan 2013, 19:15:10 UTC

I should love to have some new work downloading, even if it takes all day for one. All I have had on my machine for the last 24 hours is:

"21/01/2013 19:03:15 | SETI@home | Reporting 2 completed tasks, not requesting new tasks"

I am now not even remotely near my customary cache but it still is "not requesting new tasks".

Is it me or is it something else?

[Edit]
Yes it was me. Somehow or other I had managed to set just one task to "Task suspended by user". Fixed. Doh.
____________

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 11418
Credit: 14,175,341
RAC: 14,039
United States
Message 1329911 - Posted: 21 Jan 2013, 19:15:38 UTC - in response to Message 1329909.
Last modified: 21 Jan 2013, 19:17:05 UTC

If your machine is not requesting new tasks, the issue is on it and not the servers. What is the total estimated time of all the work you have on hand? Do you have any work for other projects?

[edit]Saw your edit after I posted. Glad you figured it out.
____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4087
Credit: 32,996,037
RAC: 5,766
United Kingdom
Message 1329912 - Posted: 21 Jan 2013, 19:16:26 UTC - in response to Message 1329909.
Last modified: 21 Jan 2013, 19:17:59 UTC

Make sure none of your downloads are Backed off, otherwise Boinc won't ask for work.

Edit: That'll stop Boinc asking for work too, glad you've worked it out.

Claggy

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1248
Credit: 47,223,499
RAC: 112,920
United States
Message 1329979 - Posted: 21 Jan 2013, 23:26:38 UTC

It looks like the Shortie Storm is over, and my downloads are working much better now. I just downloaded 7 Cuda MBs and it took less than 2 minutes.


Copyright © 2014 University of California