Panic Mode On (54) Server problems?

Message boards : Number crunching : Panic Mode On (54) Server problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1153409 - Posted: 18 Sep 2011, 0:59:06 UTC - in response to Message 1152990.  


I'm still scratching my head over the Number of results returned per hour.
Even with a shorty storm from hell, I wouldn't have thought it'd be this high for this long. And the fact is most of my CPU work is VLARs, and only 1 in 5 requests for GPU work results in any being sent.
Most of the time it's a "No tasks sent" message. Very occasionally I might get a "Project has no tasks available" message (feeder empty at that time).
Grant
Darwin NT
ID: 1153409
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153420 - Posted: 18 Sep 2011, 1:34:21 UTC - in response to Message 1153409.  
Last modified: 18 Sep 2011, 1:35:00 UTC


> I'm still scratching my head over the Number of results returned per hour.
> Even with a shorty storm from hell, I wouldn't have thought it'd be this high for this long. And the fact is most of my CPU work is VLARs, and only 1 in 5 requests for GPU work results in any being sent.
> Most of the time it's a "No tasks sent" message. Very occasionally I might get a "Project has no tasks available" message (feeder empty at that time).

Take a look at the curves for AP; they are going in the opposite direction, which sort of confirms that AP d/loads have been postponed for now.

So with no AP tasks to fill in their time, computers would need about 8 MB tasks on average. And the turnaround time will probably be down to the estimates being so high: instead of d/loading 50 tasks/request, they are getting just a few.

[edit] beware of splinters LOL
ID: 1153420
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1153424 - Posted: 18 Sep 2011, 1:49:42 UTC - in response to Message 1153420.  

> Take a look at the curves for AP; they are going in the opposite direction, which sort of confirms that AP d/loads have been postponed for now.

Postponed, or just blocked by the off-the-chart runtime estimates? Even if people are stopping new AP work, I wouldn't have expected the effect to be as large as it has been, nor to last as long as it has.
Grant
Darwin NT
ID: 1153424
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1153445 - Posted: 18 Sep 2011, 3:55:33 UTC - in response to Message 1153424.  
Last modified: 18 Sep 2011, 3:58:02 UTC

>> Take a look at the curves for AP; they are going in the opposite direction, which sort of confirms that AP d/loads have been postponed for now.
>
> Postponed, or just blocked by the off-the-chart runtime estimates? Even if people are stopping new AP work, I wouldn't have expected the effect to be as large as it has been, nor to last as long as it has.

I have to agree with WinterKnight. My 980 rig has been running through a cache of over 100 APs for the past week. It's now down to its last three in queue. I think BOINC is somehow bypassing AP units in its processes. When BOINC asks for new work, the return message has been "no work available" for AP since last Tuesday. The same holds true for my twin 580 rig, which also can only score 4 or 5 GPU WUs every third or fourth attempt.
ID: 1153445
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153448 - Posted: 18 Sep 2011, 4:47:52 UTC

Another pointer that indicates AP tasks are not being sent is that for the last 2 or 3 days the pipeline has been relatively free of blockages. I've seen relatively few tasks that have backed off and have not seen the dreaded Project backoff at all in that time.
ID: 1153448
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1153450 - Posted: 18 Sep 2011, 4:56:20 UTC - in response to Message 1153445.  
Last modified: 18 Sep 2011, 4:56:49 UTC

> My 980 rig has been running through a cache of over 100 APs for the past week. It's now down to its last three in queue.

What are its estimated completion times?
Has anyone got an AP WU recently? What was its estimated completion time?

> When BOINC asks for new work, the return message has been "no work available" for AP since last Tuesday.

Either the estimated completion times for new AP work are so huge you can only get a couple at a time, or something got borked with the Scheduler when they put in the patch. The AP Ready to Send buffer just continues to grow, as the Work in Progress gets less & less.

> The same holds true for my twin 580 rig, which also can only score 4 or 5 GPU WUs every third or fourth attempt.

I've got that problem too.
There just isn't enough GPU work about; it generally takes at least 5 attempts before any gets downloaded, and then it gets done in no time at all. Every now & then you get a couple of requests in a row that result in work.
The end result is that my GPU cache is pretty much stagnant. A couple of download bursts & it tops up slightly, then several non-results & it shrinks down again.
Grant
Darwin NT
ID: 1153450
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153454 - Posted: 18 Sep 2011, 5:22:41 UTC

The last AP tasks I received were at "13 Sep 2011 | 8:52:02 UTC"; that's early Tuesday, about 8 hours before the maintenance period.
ID: 1153454
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 1153457 - Posted: 18 Sep 2011, 6:08:03 UTC

I have 12 AP-only machines. None has received a new task since the Tuesday outage.

Dave

ID: 1153457
Dad
Volunteer tester
Joined: 21 May 99
Posts: 44
Credit: 35,266,844
RAC: 10
United States
Message 1153461 - Posted: 18 Sep 2011, 6:59:22 UTC

YAY, finally got some GPU WUs!
ID: 1153461
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1153462 - Posted: 18 Sep 2011, 7:04:43 UTC - in response to Message 1153450.  

>> My 980 rig has been running through a cache of over 100 APs for the past week. It's now down to its last three in queue.
>
> What are its estimated completion times?
> Has anyone got an AP WU recently? What was its estimated completion time?

The last three, all sent on 9/13, are showing completion estimates of 16:27:31 before they start. Since I'm running 12 at once, actual times tend to be closer to 17:30/18:00.

This machine hasn't had any problems getting GPU work, though. Right now, my cache is 2700 GPU units, 2100 of which are VHAR. That's up from 2000 total last night at this time, and includes all my processing (3 295s, 24/7).

The twin 580 rig is still FUBAR workwise, and is still lurching into EDF mode every time an AP unit finishes. With no GPU work of consequence to balance things out, I don't see a solution until DA fixes things.

FWIW, the difference in work fetch performance between these two rigs is, IMO, down to the fact that I never took the flops entries out of the 980 rig, but did take them out when I upgraded to the twin 580s on the other box.
ID: 1153462
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1153467 - Posted: 18 Sep 2011, 7:34:48 UTC - in response to Message 1153462.  

>>> My 980 rig has been running through a cache of over 100 APs for the past week. It's now down to its last three in queue.
>>
>> What are its estimated completion times?
>> Has anyone got an AP WU recently? What was its estimated completion time?
>
> The last three, all sent on 9/13, are showing completion estimates of 16:27:31 before they start. Since I'm running 12 at once, actual times tend to be closer to 17:30/18:00.

So you've got nothing new since the outage either.
Grant
Darwin NT
ID: 1153467
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1153469 - Posted: 18 Sep 2011, 7:44:37 UTC
Last modified: 18 Sep 2011, 7:45:39 UTC

Don't know if it's intentional or not, but AP work in the field has been decreasing since the outage. See the Scarecrow graphs.
And MB in the field has been steadily increasing, which is amazing given the shorty storm and the number of results being returned per hour. I have never seen such a high level of returned work sustained this long.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1153469
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1153472 - Posted: 18 Sep 2011, 7:50:44 UTC - in response to Message 1153469.  

> And MB in the field has been steadily increasing, which is amazing given the shorty storm and the number of results being returned per hour.

It's also amazing in that I'm unable to build up the caches of GPU work. Most requests for work result in none, then I get a couple of good requests that bump the cache (as piddling as it is) back to where it was. I just can't get it to grow.
Grant
Darwin NT
ID: 1153472
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1153475 - Posted: 18 Sep 2011, 7:58:39 UTC - in response to Message 1153472.  

>> And MB in the field has been steadily increasing, which is amazing given the shorty storm and the number of results being returned per hour.
>
> It's also amazing in that I'm unable to build up the caches of GPU work. Most requests for work result in none, then I get a couple of good requests that bump the cache (as piddling as it is) back to where it was. I just can't get it to grow.

Same here...most rigs are just maintaining about where they are.
The problem has been compounded by the BOINC code revision that has totally skewed DCFs and work requests because of bloated time-to-completion estimates.
That might get fixed with a code update during the next outage.

But, unless the work mix coming from Arecibo changes, this is not gonna get better anytime soon. And, in fact, it may get worse if AP starts being sent out again, monopolizing the available bandwidth.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1153475
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1153546 - Posted: 18 Sep 2011, 11:46:38 UTC

GPU Still sucking air. Get a handful, 0 available repeatedly, finish 'em up, a while later another handful. Filling not even an issue. Getting ANY is.


Janice
ID: 1153546
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153547 - Posted: 18 Sep 2011, 11:52:57 UTC - in response to Message 1153546.  

> GPU Still sucking air. Get a handful, 0 available repeatedly, finish 'em up, a while later another handful. Filling not even an issue. Getting ANY is.


What's your DCF?
ID: 1153547
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1153551 - Posted: 18 Sep 2011, 12:00:15 UTC - in response to Message 1153547.  

It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0 or 1. Occasionally 30-40, after it has gone completely dry again.
Janice
ID: 1153551
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153553 - Posted: 18 Sep 2011, 12:09:06 UTC - in response to Message 1153551.  

> It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0 or 1. Occasionally 30-40, after it has gone completely dry again.

I wasn't suggesting you had fiddled with it, I was just asking what it is.

It could be that stopping BOINC, editing DCF to a realistic value, then re-starting BOINC might fix it.

I'm not having a problem d/loading; not as many as normal, agreed, but getting more than you. But that is due to the now-publicised problem I am having with the AP APR, because as soon as an AP task completes it punches my DCF up to 1.5 min.
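
For anyone wondering why a bloated DCF starves the cache, here is a rough sketch of the arithmetic. It assumes the runtime estimate for a task is simply the raw estimate multiplied by DCF, and that a work request is filled until the requested seconds are covered. The function names and all the numbers are made up for illustration; this is not BOINC's actual code.

```python
import math


def estimated_runtime(raw_estimate_s: float, dcf: float) -> float:
    """Corrected runtime estimate: the raw estimate scaled by DCF (assumed model)."""
    return raw_estimate_s * dcf


def tasks_to_fill(request_s: float, raw_estimate_s: float, dcf: float) -> int:
    """How many tasks it takes to cover a request of request_s seconds."""
    per_task_s = estimated_runtime(raw_estimate_s, dcf)
    return math.ceil(request_s / per_task_s)


if __name__ == "__main__":
    request_s = 24 * 3600.0   # hypothetical: asking for one day of work
    raw_estimate_s = 600.0    # hypothetical: 10-minute raw estimate per task

    # The same request is covered by fewer and fewer tasks as DCF inflates.
    for dcf in (1.0, 1.5, 20.0):
        print(f"DCF {dcf}: {tasks_to_fill(request_s, raw_estimate_s, dcf)} tasks")
```

Under these assumptions the host believes a handful of tasks is a full day of work, finishes them in a fraction of that time, and then has to ask again.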
ID: 1153553
IFRS
Volunteer tester
Joined: 21 May 99
Posts: 1736
Credit: 259,180,282
RAC: 0
Brazil
Message 1153589 - Posted: 18 Sep 2011, 14:06:44 UTC

It's not just the client problem. Even if you request work and it's available, it's not being assigned. It keeps just sending 0 or 1 even if you are drained.
ID: 1153589
Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1153592 - Posted: 18 Sep 2011, 14:21:48 UTC
Last modified: 18 Sep 2011, 14:22:58 UTC

I wouldn't be surprised if a lot of feeder slots are permanently occupied by APs no one wants to have (because no doubt AP task duration is badly F-ed up, too).
ID: 1153592

 