Message boards :
Number crunching :
Panic Mode On (52) Server problems?
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
I've been getting more & more "No work sent" messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out. Odd. Grant Darwin NT |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Same here. Dave |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30674 Credit: 53,134,872 RAC: 32 |
Of my rigs, three get all the work they want; the other, the faster one, doesn't get its fill. |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Grant wrote: "I've been getting more & more 'No work sent' messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out." Remember that no matter how many "Results Ready to Send" the Status page shows, there are only 100 available in the Download Feeder during any given 6-second feeder cycle. If your Scheduler Request hits the server when the Feeder is empty, or the Feeder has no tasks of the type you are asking for, you get none, and have to wait for the backoff to clear before asking again. And with the download pipe maxed out, even with the HE router problems blocking access for some crunchers, it could take MANY tries to get new work. Patience, etc...... Donald Infernal Optimist / Submariner, retired |
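Donald's description of the Feeder can be sketched as a toy simulation. This is a sketch only: the 100-slot and 6-second figures come from his post, while `fill_probability` and the function names are illustrative assumptions of mine.

```python
import random

FEEDER_SLOTS = 100   # tasks exposed per feeder cycle (figure from the post)
CYCLE_SECONDS = 6    # feeder refill interval (figure from the post)

def simulate_requests(n_requests, fill_probability):
    """Count how many scheduler requests find work, assuming the feeder
    happens to hold tasks of the requested type with some probability
    when each request lands (fill_probability is a made-up parameter)."""
    served = 0
    for _ in range(n_requests):
        slots = FEEDER_SLOTS if random.random() < fill_probability else 0
        if slots > 0:
            served += 1   # "work sent"
        # otherwise: "No work sent" - the client backs off and retries
    return served
```

With a low `fill_probability`, most requests come back empty even though the Ready to Send buffer is huge, which matches the behaviour Grant reports.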
soft^spirit Send message Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
My fastest just filled this afternoon, since 8/4. And it did take some massaging. But at least it is out of the way for the most part. It does sound like they need some spare memory, of what size/shape/configuration remains to be heard. Whether that is top priority or not is also a question. Do they need router memory, or just a new router? I have no idea. But we will be listening for when they let us know. Janice |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Quoting my earlier post: "I've been getting more & more 'No work sent' messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out." That may be the case, but the fact is the Work in Progress should be climbing as people's caches are being filled. And the mix of work isn't that different from what it was a few days ago, so people are going to need as many WUs now as then in order to fill their caches. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Hmmm. The last 5 requests for work have resulted in "No work sent" messages. Network traffic is still maxed out, there's plenty of work in the Ready to Send buffer & usually you get a "No work available" message if the feeder is empty at the time of a request. I hope this isn't a repeat of a few weeks ago when it took 10-20 requests before any work would be allocated. Then another 10-20 requests. And so on. I had been hoping my caches might finally be filled this weekend. Grant Darwin NT |
Slavac Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0 |
8/12/2011 7:41:13 PM | SETI@home | No tasks sent
8/12/2011 7:46:26 PM | SETI@home | No tasks sent
8/12/2011 7:51:36 PM | SETI@home | No tasks sent
8/12/2011 7:56:49 PM | SETI@home | No tasks sent
8/12/2011 8:08:04 PM | SETI@home | No tasks sent
;) Executive Director GPU Users Group Inc. - brad@gpuug.org |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Grant wrote: "I've been getting more & more 'No work sent' messages from the Scheduler, even though there's plenty in the Ready to Send buffer. AP Work in Progress continues to grow, yet the MB Work in Progress is declining. All while the network traffic is still maxed out." I don't think that that is necessarily so. When we had the "shorty storms" recently, folks were burning through them faster than they could upload the completed results or download new work. In fact, IIRC, some folks were having so much trouble uploading completed results that they got caught by the 2x CPU(GPU) Upload limit [If #uploads pending >= 2x processor cores, No New Work]. Grant, I don't know what else to suggest. Maybe the servers just don't like you? (8{) Donald Infernal Optimist / Submariner, retired |
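The bracketed rule Donald cites can be written as a one-line check. A sketch only, assuming "processors" means CPU cores plus GPUs; the function name is mine.

```python
def allow_new_work(pending_uploads, cpu_cores, gpus=0):
    """The '2x processors' upload limit from the post: once the number
    of completed results waiting to upload reaches twice the number of
    processors, the client gets no new work until some uploads clear."""
    processors = cpu_cores + gpus
    return pending_uploads < 2 * processors
```

Under this reading, a quad-core host with one GPU would stop fetching work at 10 pending uploads.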
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
Donald wrote: "Grant, I don't know what else to suggest. Maybe the servers just don't like you? (8{)" That's the problem - it's a server issue. At least this time it's not as bad as last time. Last time it was 10-20 requests before you'd get work; at the moment it's 5-10. Last time my cache was getting smaller. This time it's growing (just barely). Grant Darwin NT |
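If each scheduler request independently succeeds with some probability p, the expected number of attempts before work arrives is 1/p (a geometric distribution). This framing is mine, not something stated in the thread.

```python
def expected_attempts(success_probability):
    """Mean number of scheduler requests needed before one succeeds,
    modelling each request as an independent trial with probability p."""
    return 1.0 / success_probability
```

On that model, Grant's 5-10 attempts would correspond to a per-request success rate of roughly 10-20%, and the earlier 10-20 attempts to 5-10%.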
rob smith Send message Joined: 7 Mar 03 Posts: 22220 Credit: 416,307,556 RAC: 380 |
Some folks just don't get the idea of a cache - it's there to provide work when work can't be obtained for whatever reason. Cache management is about having enough work available so that your system is barely affected by problems obtaining work. To that end caches do not need to be full all the time. So what does that mean? In S@H terms you need enough in your cache to carry you over a typical outage - at the start of that outage. There is a published outage of up to three days, but in recent weeks we've seen some unplanned outages of a couple of days, with a day or so of recovery between planned and unplanned. So you could really do with a cache of about 5 days, maybe six, not the boasted-about 20 days that I saw somewhere (I think this was achieved by setting a long delay between access times and a "real" large cache). Indeed having 20 days worth of work might be counterproductive if you start to get a lot of fails, have a big power outage, lose your internet connection, or have a major hardware or software crash. As far as I can see BOINC attempts to fill the cache as fast as possible, and keep it as full as possible. While that sounds good, it's not necessarily the best way of managing a cache when you have limited feed bandwidth. It is far better to allow the cache to float down to a sensible fraction (typically 60-70%) before starting to fill, then fill at a rate that will get the cache back to over 90% within 50% of the cache duration, the final 10% being topped off in another 50% of the cache duration. There is a downside forced by the S@H behaviour in recent weeks - you may need a bigger cache, by about 10%, which works out to something like half a day to one day. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
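Rob's refill policy - let the cache float down before topping it up - can be sketched as a simple hysteresis check. The names and the default threshold are illustrative; only the 60-70% band comes from his post.

```python
def should_request_work(cache_days, cache_size_days, refill_threshold=0.65):
    """Only start refilling once the cache has floated down into the
    60-70% band suggested in the post (0.65 used here as the midpoint),
    instead of topping it off on every scheduler contact."""
    return cache_days <= refill_threshold * cache_size_days
```

A 5-day cache would then only trigger work requests once it drops to about 3.25 days of work, spreading requests out instead of hammering the scheduler to stay at 100%.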
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
I have to disagree on that. It doesn't really matter what the cache size is. My host will download 150+ tasks a day no matter what's already in my cache. You only fill up the cache once. With each crime and every kindness we birth our future. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
rob smith wrote: "Some folks just don't get the idea of a cache - it's there to provide work when work can't be obtained for whatever reason." Nope - it doesn't have to be full all the time. It just has to have enough work to get you through an outage. And if you can't re-fill your depleted cache when work is available, you will end up running out when it's not available. Lately, 2 days is about the longest outage we've had, so a 2 day cache should be enough. But the fact is that even though I have a 5 day cache setting for my systems, the difficulty in getting work lately means that if I hadn't started with a full 5 day cache before these issues, I would have run out of work several times over the last couple of months, because I've been returning work faster than I can replace it. One of my systems runs an older client which basically asks for work every 5 min when it requires more. Its average turnaround time is 5 days for GPU & 4 days for CPU. My other system with the current client, which has the various backoffs - the most annoying being the extremely long project backoffs - has turnaround times of 4 days for GPU & only 2.3 days for the CPU. Both systems started off with a 5 day cache of work, but the one with the current client, due to its backoffs & the server issues, is unable to build the cache up to 5 days. It's actually struggling to maintain even a couple of days work for the GPU. The faster machines having smaller caches would reduce the server load considerably. The problem is that it's difficult to maintain even just a small percentage of a cache's setting, and if you can't refill the cache before the next outage you will run out of work. So people want even more work in order to deal with the problem of not being able to get enough work when it's available. Grant Darwin NT |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Grant wrote: "The faster machines having smaller caches would reduce the server load considerably." No, they will ask for the same amount of work, no matter how big the cache is. With each crime and every kindness we birth our future. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874 |
Grant wrote: "The faster machines having smaller caches would reduce the server load considerably." But each scheduler request contains a report of all tasks currently cached on the host making the request. That has two consequences: 1) The file itself is large for hosts with a large cache. That adds to upload traffic. 2) The host's reported task list is matched against the server's 'work in progress' list (which is what enables 'resend lost tasks'). That's a heavy server load in itself - it simply wasn't possible here at SETI, until the new servers arrived last summer, because of the numbers involved. Separately from the 'server' load in processing the requests, there's also a 'storage' load which increases as average turnaround time increases. |
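The matching step Richard describes in point 2) can be sketched as a set difference. The names are illustrative and the real scheduler logic is more involved.

```python
def find_lost_tasks(server_in_progress, client_reported):
    """Tasks the server believes are out on a host but which the host
    did not mention in its scheduler request: candidates for the
    'resend lost tasks' mechanism."""
    return sorted(set(server_in_progress) - set(client_reported))
```

Doing this comparison for every request is the per-request server load he mentions, and holding the in-progress lists at all is the storage load that grows with average turnaround time.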
SupeRNovA Send message Joined: 25 Oct 04 Posts: 131 Credit: 12,741,814 RAC: 0 |
I have 2x GTX 295 GPUs and a quad-core processor, and I'm running out of work units. I can't imagine what it will be like when I install the other 2x GTX 295s. There just aren't enough work units. I can't fill up the cache for even 1 day, let alone 10 days... |
rob smith Send message Joined: 7 Mar 03 Posts: 22220 Credit: 416,307,556 RAC: 380 |
One has to ask why you are struggling while others are achieving a much higher RAC than you. One thing to look at is your error rate - are you getting more than the odd unexpected error? If the rate is too high then S@H will reduce the number of WUs sent out to you. (I know, as I suffered a cruncher problem which caused my error rate to become unacceptable.) Others will provide the trigger points, and the reduced rates. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
SupeRNovA Send message Joined: 25 Oct 04 Posts: 131 Credit: 12,741,814 RAC: 0 |
Nope, I haven't got a lot of errors. I installed the 2x GTX 295s two days ago and my cache has gone. That is why my RAC is so low, and I'm waiting for new work units. I receive work units, but they are not enough to keep working 24/7. |
rob smith Send message Joined: 7 Mar 03 Posts: 22220 Credit: 416,307,556 RAC: 380 |
Someone else might like to comment, but your cruncher with the two GPUs has had more than 10 errors in the last 24 hours, compared to others with similar setups that are giving less than one error per day. Anyway, the blockage appears to have been cleared, as there are quite a number of WUs (mostly AP for your GPUs). Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.