Panic Mode On (52) Server problems?

Message boards : Number crunching : Panic Mode On (52) Server problems?


Profile arkayn
Volunteer tester
Avatar

Send message
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1138681 - Posted: 11 Aug 2011, 0:15:59 UTC

And on to #52!!

ID: 1138681 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1138943 - Posted: 11 Aug 2011, 18:46:01 UTC - in response to Message 1138681.  


I've been getting more & more "No work sent" messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out.
Odd.
Grant
Darwin NT
ID: 1138943 · Report as offensive
Dave Stegner
Volunteer tester
Avatar

Send message
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1138952 - Posted: 11 Aug 2011, 19:01:28 UTC

Same here.

Dave

ID: 1138952 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 25 Dec 00
Posts: 30674
Credit: 53,134,872
RAC: 32
United States
Message 1139006 - Posted: 11 Aug 2011, 21:21:24 UTC - in response to Message 1138943.  

Of my rigs, three get all the work they want; the other one, the fastest, doesn't get its fill.

ID: 1139006 · Report as offensive
Profile Donald L. Johnson
Avatar

Send message
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1139299 - Posted: 12 Aug 2011, 6:25:41 UTC - in response to Message 1138943.  

I've been getting more & more "No work sent" messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out.
Odd.

Remember that no matter how many "Results Ready to Send" the Status page shows, there are only 100 available in the Download Feeder during any given 6-second feeder cycle. If your Scheduler Request hits the server when the feeder is empty, or the Feeder has no Tasks of the type you are asking for, you get none, and have to wait for the backoff to clear before asking again.
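The feeder behaviour described above (a small fixed-size slice of the Ready-to-Send buffer, refreshed every few seconds) can be sketched with a toy simulation. This is purely illustrative: the slot count and cycle length are the figures from the post, the work-type mix and request rate are made-up parameters, and the real BOINC feeder logic is more involved.

```python
import random

FEEDER_SLOTS = 100      # tasks exposed per feeder cycle (per the post)
CYCLE_SECONDS = 6       # feeder refill interval (per the post)

def simulate_requests(n_requests, requests_per_cycle, wanted_type="MB",
                      type_mix=(("MB", 0.5), ("AP", 0.5)), seed=1):
    """Estimate how often a scheduler request finds a task of the wanted
    type, when only FEEDER_SLOTS tasks are visible per cycle."""
    rng = random.Random(seed)
    served = 0
    feeder = []
    for i in range(n_requests):
        if i % requests_per_cycle == 0:
            # Refill: draw a fresh 100-task slice from the Ready-to-Send mix.
            feeder = [rng.choices([t for t, _ in type_mix],
                                  [w for _, w in type_mix])[0]
                      for _ in range(FEEDER_SLOTS)]
        if wanted_type in feeder:
            feeder.remove(wanted_type)
            served += 1
        # else: "No tasks sent" -- the client backs off before retrying
    return served / n_requests

# With far more requests per cycle than feeder slots, most requests come
# away empty even though the Ready-to-Send buffer itself is huge.
print(simulate_requests(10_000, requests_per_cycle=500))
```

The point of the sketch is that "Results Ready to Send" being large says nothing about your odds on any single request; only the 100-task slice matters.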

And with the Download pipe maxed out, even with the HE router problems blocking access for some crunchers, it could take MANY tries to get new work.

Patience, etc......
Donald
Infernal Optimist / Submariner, retired
ID: 1139299 · Report as offensive
Profile soft^spirit
Avatar

Send message
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1139303 - Posted: 12 Aug 2011, 6:35:05 UTC

My fastest just filled this afternoon, for the first time since 8/4. And it did take some massaging. But at least it is out of the way for the most part.

It does sound like they need some spare memory, of what size/shape/configuration remains to be heard. Whether that is top priority or not is also a question.

Do they need router memory, or just a new router? I have no idea. But we will be listening for when they let us know.
Janice
ID: 1139303 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1139348 - Posted: 12 Aug 2011, 7:33:08 UTC - in response to Message 1139299.  

I've been getting more & more "No work sent" messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out.
Odd.

Remember that no matter how many "Results Ready to Send" the Status page shows, there are only 100 available in the Download Feeder during any given 6-second feeder cycle. If your Scheduler Request hits the server when the feeder is empty, or the Feeder has no Tasks of the type you are asking for, you get none, and have to wait for the backoff to clear before asking again.

That may be the case, but the fact is the Work in Progress should be climbing as people's caches are filled. And the mix of work isn't that different from what it was a few days ago, so people will need as many WUs now as then in order to fill their caches.
Grant
Darwin NT
ID: 1139348 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1139636 - Posted: 12 Aug 2011, 23:47:12 UTC - in response to Message 1139348.  


Hmmm.
The last 5 requests for work have resulted in "No work sent" messages. Network traffic is still maxed out, there's plenty of work in the Ready to Send buffer & usually you get a "No work available" message if the feeder is empty at the time of a request.
I hope this isn't a repeat of a few weeks ago when it took 10-20 requests before any work would be allocated. Then another 10-20 requests. And so on. I had been hoping my caches might finally be filled this weekend.
Grant
Darwin NT
ID: 1139636 · Report as offensive
Profile Slavac
Volunteer tester
Avatar

Send message
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1139661 - Posted: 13 Aug 2011, 1:07:55 UTC - in response to Message 1139636.  


Hmmm.
The last 5 requests for work have resulted in "No work sent" messages. Network traffic is still maxed out, there's plenty of work in the Ready to Send buffer & usually you get a "No work available" message if the feeder is empty at the time of a request.
I hope this isn't a repeat of a few weeks ago when it took 10-20 requests before any work would be allocated. Then another 10-20 requests. And so on. I had been hoping my caches might finally be filled this weekend.


8/12/2011 7:41:13 PM | SETI@home | No tasks sent
8/12/2011 7:46:26 PM | SETI@home | No tasks sent
8/12/2011 7:51:36 PM | SETI@home | No tasks sent
8/12/2011 7:56:49 PM | SETI@home | No tasks sent
8/12/2011 8:08:04 PM | SETI@home | No tasks sent

;)


Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1139661 · Report as offensive
Profile Donald L. Johnson
Avatar

Send message
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1139968 - Posted: 13 Aug 2011, 6:37:44 UTC - in response to Message 1139348.  

I've been getting more & more "No work sent" messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out.
Odd.

Remember that no matter how many "Results Ready to Send" the Status page shows, there are only 100 available in the Download Feeder during any given 6-second feeder cycle. If your Scheduler Request hits the server when the feeder is empty, or the Feeder has no Tasks of the type you are asking for, you get none, and have to wait for the backoff to clear before asking again.

That may be the case, but the fact is the Work in Progress should be climbing as people's caches are filled. And the mix of work isn't that different from what it was a few days ago, so people will need as many WUs now as then in order to fill their caches.

I don't think that that is necessarily so. When we had the "shorty storms" recently, folks were burning through them faster than they could upload the completed results or download new work. In fact, IIRC, some folks were having so much trouble uploading completed results that they got caught by the 2x CPU(GPU) Upload limit [If #uploads pending >= 2x processor cores, No New Work].
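The bracketed rule quoted above can be written out as a one-line check. This is a sketch of the rule exactly as the post states it; the real BOINC client's max-transfers logic involves per-resource counts and client versions, so treat the function name and signature as hypothetical.

```python
def allow_new_work(pending_uploads: int, processor_cores: int) -> bool:
    """Sketch of the '2x' upload limit quoted above: once the number of
    completed results stuck in upload reaches twice the core count, the
    client stops requesting new work. Illustrative only."""
    return pending_uploads < 2 * processor_cores

# A 4-core host with 8 results stuck uploading gets no new work:
print(allow_new_work(8, 4))   # False
print(allow_new_work(7, 4))   # True
```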

Grant, I don't know what else to suggest. Maybe the servers just don't like you? (8{)

Donald
Infernal Optimist / Submariner, retired
ID: 1139968 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1139998 - Posted: 13 Aug 2011, 8:00:54 UTC - in response to Message 1139968.  

Grant, I don't know what else to suggest. Maybe the servers just don't like you? (8{)

That's the problem- it's a server issue.
At least this time it's not as bad as last time. Last time it was 10-20 requests before you'd get work; at the moment it's 5-10. Last time my cache was getting smaller; this time it's growing (just barely).
Grant
Darwin NT
ID: 1139998 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22220
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1140003 - Posted: 13 Aug 2011, 8:15:34 UTC

Some folks just don't get the idea of a cache - it's there to provide work when work can't be obtained, for whatever reason.
Cache management is about having enough work available so your system is barely affected by problems obtaining work. To that end, caches do not need to be full all the time.
So what does that mean?
In S@H terms you need enough in your cache to carry you over a typical outage - at the start of that outage. There is a published outage of up to three days, but in recent weeks we've seen some unplanned outages of a couple of days, with a day or so of recovery between the planned and unplanned ones.
So you could really do with a cache of about 5 days, maybe 6 - not the boasted-about 20 days that I saw somewhere (I think that was achieved by setting a long delay between access times on top of a "real" large cache). Indeed, having 20 days' worth of work might be counterproductive if you start to get a lot of failures, have a big power outage, lose your internet connection, or have a major hardware or software crash.

As far as I can see, BOINC attempts to fill the cache as fast as possible and keep it as full as possible. While that sounds good, it's not necessarily the best way of managing a cache when you have limited feed bandwidth. It is far better to allow the cache to float down to a sensible fraction (typically 60-70%) before starting to fill, then fill at a rate that gets the cache back above 90% within 50% of the cache duration, the final 10% being topped off in another 50% of the cache duration. There is a downside, forced by the S@H behaviour of recent weeks: you may need a bigger cache, by about 10%, which works out to something like half a day to a day.
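The float-down-then-refill policy suggested above is a classic hysteresis (two-threshold) controller. Here is a minimal sketch using the post's rough numbers (60-70% low-water mark, 90% high-water mark); nothing like this exists in BOINC itself, and the function name and thresholds are assumptions for illustration.

```python
def refill_decision(level: float, filling: bool = False,
                    low_water: float = 0.65, high_water: float = 0.90) -> bool:
    """Hysteresis sketch of the suggested cache policy: start fetching only
    once the cache floats down past the low-water mark, then keep fetching
    until it is back above the high-water mark. Returns True if the client
    should be requesting work. Thresholds are the post's rough figures."""
    if filling:
        return level < high_water   # keep topping up until ~90%
    return level < low_water        # otherwise let the cache float down

print(refill_decision(0.80))                 # False: still above low water
print(refill_decision(0.60))                 # True: start refilling
print(refill_decision(0.85, filling=True))   # True: keep going toward 90%
print(refill_decision(0.92, filling=True))   # False: topped up, stop
```

The two thresholds are what prevents the "always topping up" behaviour the post complains about: a host at 85% that is not already filling stays quiet instead of hammering the scheduler.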

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1140003 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1140013 - Posted: 13 Aug 2011, 8:42:38 UTC

I have to disagree on that.

It doesn´t really matter what the cache size is.

My host will download 150+ tasks a day no matter what's already in my cache.
You only fill up the cache once.



With each crime and every kindness we birth our future.
ID: 1140013 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1140019 - Posted: 13 Aug 2011, 9:34:23 UTC - in response to Message 1140003.  

Some folks just don't get the idea of a cache - it's there to provide work when work can't be obtained, for whatever reason.
Cache management is about having enough work available so your system is barely affected by problems obtaining work. To that end, caches do not need to be full all the time.

Nope- it doesn't have to be full all the time. It just has to have enough work to get you through an outage.
And if you can't re-fill your depleted cache when work is available, you will end up running out when it's not available.

Lately, 2 days is about the longest outage we've had, so a 2 day cache should be enough. But the fact is that even though I have a 5 day cache setting on my systems, the difficulty in getting work lately means that if I hadn't started with a full 5 day cache before these issues, I would have run out of work several times over the last couple of months, because I've been returning work faster than I can replace it.


One of my systems runs an older client which basically asks for work every 5 minutes when it requires more. Its average turnaround time is 5 days for GPU & 4 days for CPU.
My other system runs the current client, which has the various backoffs - the most annoying being the extremely long project backoffs - and has turnaround times of 4 days for GPU & only 2.3 days for CPU.

Both systems started off with a 5 day cache of work, but the one with the current client, due to its backoffs & the server issues, is unable to build the cache up to 5 days. It's actually struggling to maintain even a couple of days' work for the GPU.

The faster machines having smaller caches would reduce the server load considerably. The problem is that it's difficult to maintain even just a small percentage of a cache's setting, and if you can't refill the cache before the next outage you will run out of work. So people want even more work in order to deal with the problem of not being able to get enough work when it's available.
Grant
Darwin NT
ID: 1140019 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1140025 - Posted: 13 Aug 2011, 10:05:19 UTC

The faster machines having smaller caches would reduce the server load considerably


No, they will ask for the same amount of work, no matter how big the cache is.



With each crime and every kindness we birth our future.
ID: 1140025 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1140031 - Posted: 13 Aug 2011, 10:40:02 UTC - in response to Message 1140025.  

The faster machines having smaller caches would reduce the server load considerably

No, they will ask for the same amount of work, no matter how big the cache is.

But each scheduler request contains a report of all tasks currently cached on the host making the request. That has two consequences:

1) The file itself is large for hosts with a large cache. That adds to upload traffic.

2) The host's reported task list is matched against the server's 'work in progress' list (which is what enables 'resent lost task'). That's a heavy server load in itself - it simply wasn't possible here at SETI, until the new servers arrived last summer, because of the numbers involved.

Separately from the 'server' load in processing the requests, there's also a 'storage' load, which increases as average turnaround time increases.
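The growth described above (every cached task adds an entry to each scheduler request) is easy to put numbers on with a back-of-the-envelope sketch. The per-entry and overhead byte counts below are assumed figures for illustration, not measured from BOINC's actual scheduler-request XML.

```python
def request_size_bytes(n_cached_tasks: int, bytes_per_task: int = 300,
                       base_overhead: int = 2_000) -> int:
    """Rough estimate of scheduler-request size: a fixed overhead (host
    info, preferences) plus one reported-task entry per cached task.
    Both constants are assumptions, not BOINC measurements."""
    return base_overhead + n_cached_tasks * bytes_per_task

# A host with a 5-day cache of 2,000 tasks sends ~600 KB per request,
# versus ~30 KB for a 100-task cache - and the server must match every
# reported task against its own work-in-progress list.
print(request_size_bytes(2_000))   # 602000
print(request_size_bytes(100))     # 32000
```

Whatever the true constants, the linear term dominates: request traffic and server-side matching cost both scale with cache size, which is the point being made.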
ID: 1140031 · Report as offensive
SupeRNovA
Volunteer tester
Avatar

Send message
Joined: 25 Oct 04
Posts: 131
Credit: 12,741,814
RAC: 0
Bulgaria
Message 1140033 - Posted: 13 Aug 2011, 11:20:12 UTC

I have 2x 295 GPUs and a quad-core processor, and I'm running out of work units; I can't imagine what it will be like when I install another 2x 295 GPUs. There just aren't enough work units. I can't fill up the cache for even 1 day, let alone 10 days...
ID: 1140033 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22220
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1140037 - Posted: 13 Aug 2011, 12:06:56 UTC

One has to ask why you are struggling while others are achieving a much higher RAC than you.
One thing to look at is your error rate: are you getting more than the odd unexpected error? If the rate is too high, S@H will reduce the number of WUs sent out to you. (I know, as I suffered a cruncher problem which caused my error rate to become unacceptable.) Others will provide the trigger points and the reduced rates.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1140037 · Report as offensive
SupeRNovA
Volunteer tester
Avatar

Send message
Joined: 25 Oct 04
Posts: 131
Credit: 12,741,814
RAC: 0
Bulgaria
Message 1140068 - Posted: 13 Aug 2011, 15:35:38 UTC - in response to Message 1140037.  

Nope, I haven't got a lot of errors. I installed the 2x GTX 295s two days ago and my cache is gone; that is why my RAC is so low while I wait for new work units. I receive work units, but not enough to keep working 24/7.
ID: 1140068 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22220
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1140089 - Posted: 13 Aug 2011, 16:10:39 UTC

Someone else might like to comment, but your cruncher with the two GPUs has had more than 10 errors in the last 24 hours, compared to others with similar setups that are producing less than one error per day.

Anyway, the blockage appears to have been cleared, as there are now quite a number of WUs (mostly AP for your GPUs).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1140089 · Report as offensive



©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.