Panic Mode On (74) Server problems?

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1230092 - Posted: 10 May 2012, 17:57:42 UTC - in response to Message 1230081.  
Last modified: 10 May 2012, 18:00:21 UTC

If you stop BOINC and set a biggish duration_correction_factor (DCF), you will just get CPU work for a while. The reason my 980X gets CPU WUs is that the DCF jumps to 6 when a slow GPU finishes a task, and the system just asks for CPU WUs until it drops.

Wow, the 980X has just hit 4,000 WUs cached.
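A toy model of why a large DCF can skew work requests toward one resource. This is a sketch under simplifying assumptions, not BOINC's actual work-fetch code: it assumes the client multiplies every task's runtime estimate by a single per-project DCF and asks for a resource only while that resource's scaled queue is below the cache target. The names `Task`, `queued_seconds` and `resources_to_request`, and all task counts and runtimes, are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400

@dataclass
class Task:
    resource: str          # "cpu" or "gpu" (hypothetical labels)
    raw_estimate_s: float  # runtime estimate before the DCF is applied

def queued_seconds(tasks, resource, dcf, instances):
    """Queued work per instance of `resource`, with every estimate scaled by the DCF."""
    total = sum(t.raw_estimate_s * dcf for t in tasks if t.resource == resource)
    return total / instances

def resources_to_request(tasks, dcf, cache_days, instances):
    """Resources whose DCF-scaled queue is still below the cache target.
    Simplified assumption: not BOINC's real work-fetch policy."""
    target_s = cache_days * SECONDS_PER_DAY
    return sorted(r for r, n in instances.items()
                  if queued_seconds(tasks, r, dcf, n) < target_s)

# Hypothetical host: 4 CPU cores holding 8 two-hour tasks, 1 GPU holding 300 short tasks.
tasks = [Task("cpu", 7200.0)] * 8 + [Task("gpu", 400.0)] * 300
instances = {"cpu": 4, "gpu": 1}

print(resources_to_request(tasks, dcf=1.0, cache_days=2, instances=instances))  # ['cpu', 'gpu']
print(resources_to_request(tasks, dcf=6.0, cache_days=2, instances=instances))  # ['cpu']
```

With these made-up numbers, a DCF of 6 makes the long GPU queue look six times bigger than the 2-day target allows, so only the short CPU queue still triggers a request, which is the pattern described in the post.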
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1230096 - Posted: 10 May 2012, 18:05:48 UTC - in response to Message 1230092.  

> If you stop BOINC and set a biggish duration_correction_factor (DCF), you will just get CPU work for a while. The reason my 980X gets CPU WUs is that the DCF jumps to 6 when a slow GPU finishes a task, and the system just asks for CPU WUs until it drops.
>
> Wow, the 980X has just hit 4,000 WUs cached.

I have enough GPU work to last a bit, so I am going to do the 'uncheck use nvidia GPU' trick to get some CPU work flowing.

But, that is a workaround, and should not be necessary.


"Freedom is just Chaos, with better lighting." Alan Dean Foster

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1230097 - Posted: 10 May 2012, 18:06:25 UTC - in response to Message 1230072.  

> I would like to see a bigger FIFO so fewer requests are needed to replenish the cache.


It's called the feeder.

The usual workaround is to disable, in the project preferences, the resource that is getting all the tasks, until the slower one has built up some sort of cache.

The other option would be to reduce the cache, allow the slower resource to catch up, and then gradually increase the cache again.

It will eventually sort itself out, but if you have a large cache to fill, it may take quite a while before you get single-resource requests again instead of double ones.
I'm not the Pope. I don't speak Ex Cathedra!
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1230235 - Posted: 10 May 2012, 21:43:44 UTC - in response to Message 1230136.  

> Who stole all APs? Or, who stole the AP splitters?

It may be that since AP_v505 is now down to 8 still out in the field, new tapes will be held until the last of those 505's comes in and they can do that DB kick thing (which may have already been done since "awaiting validation" isn't 10k+ anymore).

Or... it could just be that there were so many tapes loaded up in the first place that we're now at the point where we have to sit around and wait for the MB splitters to catch up to get some new tapes loaded for AP to split.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1230303 - Posted: 10 May 2012, 23:27:22 UTC - in response to Message 1230235.  

>> Who stole all APs? Or, who stole the AP splitters?
>
> It may be that since AP_v505 is now down to 8 still out in the field, new tapes will be held until the last of those 505's comes in and they can do that DB kick thing (which may have already been done since "awaiting validation" isn't 10k+ anymore).
>
> Or... it could just be that there were so many tapes loaded up in the first place that we're now at the point where we have to sit around and wait for the MB splitters to catch up to get some new tapes loaded for AP to split.


I am down to 18 from 65 a couple of days ago.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1230426 - Posted: 11 May 2012, 6:05:02 UTC

Well, this is just starting to be almost slightly irritating. Because of the adjustments to the estimates, I ended up with roughly a 22-day AP-only cache, and therefore my average turnaround time was in the high teens. As a result, most of my wingmates were waiting for me, so nearly every result I reported was validated immediately.

But since there hasn't been new work going out and my cache is now down in the ~8-day range, I'm starting to pick up more and more pendings when I report. Oh well. That's the way it goes.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 1230437 - Posted: 11 May 2012, 6:27:44 UTC


I still reckon something's not quite right.
I'm not getting as many "Project has no tasks available" messages as I was, but I'm still getting more than I usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been, I would expect to get hardly any such messages, if any, when requesting work.
Grant
Darwin NT
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1230442 - Posted: 11 May 2012, 6:33:22 UTC - in response to Message 1230437.  


> I still reckon something's not quite right.
> I'm not getting as many "Project has no tasks available" messages as I was, but I'm still getting more than I usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been, I would expect to get hardly any such messages, if any, when requesting work.

Well, the tasks are going out, because we're now over 5.5 million out in the field. I don't know how big that figure can be before the database starts slowing down...
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1230458 - Posted: 11 May 2012, 8:24:49 UTC - in response to Message 1230437.  
Last modified: 11 May 2012, 8:51:52 UTC

> I still reckon something's not quite right.
> I'm not getting as many "Project has no tasks available" messages as I was, but I'm still getting more than I usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been, I would expect to get hardly any such messages, if any, when requesting work.

Now that there are no limits, I expect many hosts are asking for, and getting, the entire feeder buffer. Getting WUs is going to be a problem until all the caches are full. I feel it would help a lot if the feeder had a bigger buffer.

I am puzzled as to why the Result average turnaround is dropping though.
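red-ray's point about hosts draining the feeder can be illustrated with a toy slot buffer. This is a sketch, assuming the feeder holds a fixed number of slots that are refilled between scheduler requests and handed out first come, first served. `FeederBuffer`, `serve_round`, the 100-slot size and the per-request cap are all hypothetical; none of this is the SETI@home server's code or configuration.

```python
from collections import deque

class FeederBuffer:
    """Toy model (hypothetical): a fixed number of slots refilled from a larger backlog."""

    def __init__(self, slots):
        self.slots = slots
        self.queue = deque()

    def refill(self, backlog):
        # Top up empty slots from the backlog (a deque of task IDs).
        while len(self.queue) < self.slots and backlog:
            self.queue.append(backlog.popleft())

    def request(self, wanted, per_request_limit=None):
        # Hand out tasks in slot order, optionally capped per scheduler request.
        n = wanted if per_request_limit is None else min(wanted, per_request_limit)
        granted = []
        while self.queue and len(granted) < n:
            granted.append(self.queue.popleft())
        return granted

def serve_round(feeder, backlog, wants, per_request_limit=None):
    """Refill once, then serve requests in arrival order; return how many tasks each got."""
    feeder.refill(backlog)
    return [len(feeder.request(w, per_request_limit)) for w in wants]

backlog = deque(range(10_000))
print(serve_round(FeederBuffer(100), backlog, [250, 20, 20]))      # [100, 0, 0]
print(serve_round(FeederBuffer(100), backlog, [250, 20, 20], 50))  # [50, 20, 20]
```

Without a cap, the first large request empties every slot, so the hosts behind it get "Project has no tasks available" until the next refill, even though the backlog is huge. A per-request cap or a larger `slots` value spreads the same refill across more hosts.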
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1230538 - Posted: 11 May 2012, 14:42:06 UTC - in response to Message 1230458.  

> I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 14:30 local time yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more!?). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold on to so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1230557 - Posted: 11 May 2012, 15:47:14 UTC - in response to Message 1230538.  
Last modified: 11 May 2012, 15:54:30 UTC

>> I am puzzled as to why the Result average turnaround is dropping though.
>
> Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.

That's why Ray finds it puzzling that the turnaround time is DEcreasing ;)

> Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 14:30 local time yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more!?). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold on to so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.


Immediate reporting is a feature of BOINC V7 when NNT (No New Tasks) is set.

In normal running, results are reported when the client asks for new work or when the user hits Update; in V7, also when a scheduled 'no network' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something.]

Edit: For a complete list, see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)

If those 24 hours hadn't passed yet, and your client didn't think it needed new work (it didn't ask for any on the update), then it didn't see a reason to report yet.
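The reporting triggers in this post can be written as a small decision function. This is a sketch of the post's own (admittedly incomplete) list, assuming a V7 client; it is not the BOINC client's actual logic. `ClientState`, `should_report` and the one-hour `SOON_HOURS` threshold for "soon" are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SOON_HOURS = 1.0  # hypothetical threshold for "a scheduled no-network period is soon"

@dataclass
class ClientState:
    requesting_work: bool                # client is about to ask for new tasks
    user_clicked_update: bool
    no_new_tasks: bool                   # NNT set for the project
    hours_until_no_net: Optional[float]  # None if no network blackout is scheduled
    hours_since_upload: float            # age of the oldest uploaded, unreported result

def should_report(s: ClientState) -> bool:
    """Triggers as listed in the post (V7 client assumed); deliberately incomplete."""
    if s.requesting_work or s.user_clicked_update or s.no_new_tasks:
        return True
    if s.hours_until_no_net is not None and s.hours_until_no_net <= SOON_HOURS:
        return True
    return s.hours_since_upload >= 24

# A client that is not asking for work and has uploaded results 20 hours ago:
print(should_report(ClientState(False, False, False, None, 20.0)))  # False
```

Under this reading, a client that believes its cache is full never triggers the first rule, so completed results can sit unreported until the user clicks Update or the 24 hours run out.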

Why it wasn't requesting: lots of possible reasons. Debt, a full cache, stuck uploads, stuck downloads and suspended tasks can all block requests.
I can see an AP task in the list; that's bound to be grossly overestimated, so the CPU cache probably looks full to BOINC.
GPU: on a hunch, it could be a messed-up DCF from lifting the APR cap, making task estimates far too long and the cache look bigger than it actually is.
I'm not the Pope. I don't speak Ex Cathedra!
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1230559 - Posted: 11 May 2012, 15:52:35 UTC

I suspect the turnaround time is dropping because many hosts are going into EDF (earliest deadline first) mode as their caches grow and BOINC tries to adjust, which moves those tasks to the front of the cache so they are returned sooner than they would be otherwise.
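kittyman's EDF explanation can be checked on a toy single-processor model. This is a sketch, assuming tasks run back to back and the statistic averages turnaround (finish time minus send time) only over tasks reported within a time window; the `QueuedTask` type, the times and the window are made up and are not taken from the server's calculation of "Result average turnaround".

```python
from dataclasses import dataclass

@dataclass
class QueuedTask:
    sent: float      # time the server sent the task
    deadline: float  # report deadline
    runtime: float   # crunch time on this (single) processor

def mean_turnaround(tasks, start, window_end, edf):
    """Crunch tasks back to back from `start`, in FIFO (by send time) or EDF
    (by deadline) order, and average finish - sent over tasks done by `window_end`."""
    key = (lambda t: t.deadline) if edf else (lambda t: t.sent)
    clock, turnarounds = start, []
    for t in sorted(tasks, key=key):
        clock += t.runtime
        if clock <= window_end:
            turnarounds.append(clock - t.sent)
    return sum(turnarounds) / len(turnarounds) if turnarounds else None

# Hypothetical cache: four old tasks with a loose deadline, two new ones with a tight one.
cache = [QueuedTask(0, 100, 10)] * 4 + [QueuedTask(5, 30, 10)] * 2
print(mean_turnaround(cache, start=5, window_end=30, edf=False))  # 20.0
print(mean_turnaround(cache, start=5, window_end=30, edf=True))   # 15.0
```

Over a window long enough to finish everything, both orders give the same mean in this example (230/6 time units), so in this toy setup the lower figure comes only from which tasks get reported first.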

"Freedom is just Chaos, with better lighting." Alan Dean Foster

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1230574 - Posted: 11 May 2012, 16:25:17 UTC - in response to Message 1230557.  

>>> I am puzzled as to why the Result average turnaround is dropping though.
>>
>> Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.
>
> That's why Ray finds it puzzling that the turnaround time is DEcreasing ;)

Duh! I just read it wrong, I guess.

>> Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 14:30 local time yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more!?). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold on to so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

> Immediate reporting is a feature of BOINC V7 when NNT (No New Tasks) is set.
>
> In normal running, results are reported when the client asks for new work or when the user hits Update; in V7, also when a scheduled 'no network' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something.]
>
> Edit: For a complete list, see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)
>
> If those 24 hours hadn't passed yet, and your client didn't think it needed new work (it didn't ask for any on the update), then it didn't see a reason to report yet.
>
> Why it wasn't requesting: lots of possible reasons. Debt, a full cache, stuck uploads, stuck downloads and suspended tasks can all block requests.
> I can see an AP task in the list; that's bound to be grossly overestimated, so the CPU cache probably looks full to BOINC.
> GPU: on a hunch, it could be a messed-up DCF from lifting the APR cap, making task estimates far too long and the cache look bigger than it actually is.

Thanks.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 1230691 - Posted: 11 May 2012, 22:51:36 UTC
Last modified: 11 May 2012, 22:51:54 UTC

So what's the worst that could happen? What could possibly go wrong?
me@rescam.org
Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1230696 - Posted: 11 May 2012, 23:05:04 UTC - in response to Message 1230691.  

Hehehe. Things go well for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time, which takes seconds.

Everything's green on my end.


Executive Director GPU Users Group Inc. -
brad@gpuug.org
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1230697 - Posted: 11 May 2012, 23:07:43 UTC - in response to Message 1230696.  

> Hehehe. Things go well for a while and it has people nervous.
>
> I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time, which takes seconds.
>
> Everything's green on my end.


I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>
.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1230763 - Posted: 12 May 2012, 2:30:20 UTC - in response to Message 1230557.  

> Immediate reporting is a feature of BOINC V7 when NNT (No New Tasks) is set.

I am sure that started with 6.12.33 or somewhere near that version; of all the changes that have been made, it is one that I noticed and like...
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1230770 - Posted: 12 May 2012, 3:27:27 UTC - in response to Message 1230697.  

>> Hehehe. Things go well for a while and it has people nervous.
>>
>> I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time, which takes seconds.
>>
>> Everything's green on my end.
>
> I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>

I've got 1067 tasks on the 590 here waiting to be worked on, and 1227 pending of course, but then I'm testing out some modified 590 firmware. So far so good: the 1598 seems to like it, with no driver crashes seen today since the fan went to 100% and the voltage is now at 0.950 V (since about 7am today). I'm using the 2.20 final version of MSI Afterburner; I like it better than EVGA's version. The EVGA one has the looks, but the functionality is in the MSI. Oh well, the software's free at least.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1230806 - Posted: 12 May 2012, 4:55:50 UTC - in response to Message 1230770.  

I'm pretty happy here now, as my 3 rigs aren't hitting up the servers every 5 minutes or so and I haven't seen that annoying limit message for days. That alone should take a fair bit of strain off the connection (at least until next month, when I'm planning a few little upgrades).

Cheers.
Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1230942 - Posted: 12 May 2012, 12:39:49 UTC

Personally I think the system works better with the limits in place.

Units are turned around quickly and can be cleared from the database in good time. The system load should actually decrease, because even though it has to handle more contacts, each contact only involves uploading and downloading 10 or so units at a time. The excessive number of units out in the field just slows the system down. My RAC has dropped nearly 15,000 in 24 hours; having it drop that quickly is the equivalent of turning 2 machines off cold.

With FIVE new servers in place in the past 15 months, most of the hardware issues have been cured, so does anyone really need to keep a 10-day cache except for bragging rights?
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.