Panic Mode On (74) Server problems?


N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 12493
Credit: 14,825,275
RAC: 3,716
United States
Message 1230538 - Posted: 11 May 2012, 14:42:06 UTC - in response to Message 1230458.

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.
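
To put rough numbers on the link between cache size and turnaround: the sketch below assumes a host works through its cache in order at a steady rate, so a new task waits behind everything already queued. The function name and the figures are made up for illustration; this is not how the server computes the stat.

```python
def average_turnaround_days(tasks_in_cache: float, tasks_completed_per_day: float) -> float:
    """Estimate how long a newly sent task waits before it is returned.

    Assumes a first-in, first-out cache drained at a constant rate,
    so the wait is simply queue length divided by throughput.
    """
    if tasks_completed_per_day <= 0:
        raise ValueError("tasks_completed_per_day must be positive")
    return tasks_in_cache / tasks_completed_per_day


# Hypothetical host that finishes 50 tasks a day.
small_cache = average_turnaround_days(100, 50)
large_cache = average_turnaround_days(500, 50)
print(small_cache, large_cache)
```

Under that assumption, a five times bigger cache means a five times longer turnaround for the same host.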

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1230557 - Posted: 11 May 2012, 15:47:14 UTC - in response to Message 1230538.
Last modified: 11 May 2012, 15:54:30 UTC

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.


that's why Ray finds it puzzling that turnaround time is DEcreasing ;)

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.


Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

In standard running, reporting is done when the client asks for new work, when the user hits Update, in V7 when a scheduled 'no net' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something]

Edit: For a complete list see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)

If that 24-hour limit hadn't passed yet and your client didn't think it needed new work (it didn't ask on the update), then it didn't see a reason to report yet.
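
The triggers listed above can be written down as a small decision function. This is only a sketch of the rules as described in this post, not BOINC's actual code: the class, the field names and the one-hour meaning of "soon" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

DAY_SECONDS = 24 * 3600


@dataclass
class ClientState:
    has_unreported_tasks: bool
    wants_new_work: bool
    user_pressed_update: bool
    no_new_tasks_set: bool                  # NNT, immediate report in V7
    seconds_until_no_net: Optional[float]   # None if no 'no net' period is scheduled
    seconds_since_upload: float


def should_report(state: ClientState, no_net_soon_seconds: float = 3600.0) -> bool:
    """Return True if any of the reporting triggers from the post applies."""
    if not state.has_unreported_tasks:
        return False
    if state.wants_new_work or state.user_pressed_update:
        return True
    if state.no_new_tasks_set:
        return True
    if (state.seconds_until_no_net is not None
            and state.seconds_until_no_net <= no_net_soon_seconds):
        return True
    # Last resort: report at the latest 24 hours after the upload.
    return state.seconds_since_upload >= DAY_SECONDS


# A host that uploaded 19 hours ago and does not think it needs work stays quiet.
idle_host = ClientState(True, False, False, False, None, 19 * 3600)
print(should_report(idle_host))
```

With these made-up numbers the host holds on to its finished tasks until someone hits Update or the 24 hours run out.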

Why it wasn't requesting - lots of reasons. Debt, cache, stuck uploads, stuck downloads, suspended task(s) can all block requesting.
I can see an AP task in the list - that's bound to be grossly overestimated, so CPU cache is probably looking full to BOINC.
GPU - on a hunch, it could be a messed-up DCF from the APR capping lift, so task estimates are far too long, which also makes it look like the cache is bigger than it actually is.
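
A toy model of why an inflated DCF (duration correction factor) can stop work requests. It assumes the client multiplies each raw estimate by DCF and compares the total with the cache setting, which is a simplification and not BOINC's real work-fetch logic. All names and numbers are hypothetical.

```python
from typing import Sequence

SECONDS_PER_DAY = 86400


def buffered_days(raw_estimates_seconds: Sequence[float], dcf: float) -> float:
    """Days of work the client believes it holds after scaling estimates by DCF."""
    return sum(estimate * dcf for estimate in raw_estimates_seconds) / SECONDS_PER_DAY


def wants_more_work(raw_estimates_seconds: Sequence[float], dcf: float,
                    cache_target_days: float) -> bool:
    """Ask for work only while the believed buffer is below the cache target."""
    return buffered_days(raw_estimates_seconds, dcf) < cache_target_days


# 100 tasks estimated at 30 minutes each, with a 5 day cache setting.
tasks = [1800.0] * 100
print(wants_more_work(tasks, dcf=1.0, cache_target_days=5.0))
print(wants_more_work(tasks, dcf=3.0, cache_target_days=5.0))
```

The same 100 tasks look like about 2 days of work at DCF 1.0 but over 6 days at DCF 3.0, so in this model the second host stops asking.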
____________
I'm not the Pope. I don't speak Ex Cathedra!

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 12493
Credit: 14,825,275
RAC: 3,716
United States
Message 1230574 - Posted: 11 May 2012, 16:25:17 UTC - in response to Message 1230557.

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.

that's why Ray finds it puzzling that turnaround time is DEcreasing ;)

Duh! I just read it wrong, I guess.

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

In standard running, reporting is done when the client asks for new work, when the user hits Update, in V7 when a scheduled 'no net' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something]

Edit: For a complete list see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)

If that 24-hour limit hadn't passed yet and your client didn't think it needed new work (it didn't ask on the update), then it didn't see a reason to report yet.

Why it wasn't requesting - lots of reasons. Debt, cache, stuck uploads, stuck downloads, suspended task(s) can all block requesting.
I can see an AP task in the list - that's bound to be grossly overestimated, so CPU cache is probably looking full to BOINC.
GPU - on a hunch, it could be a messed-up DCF from the APR capping lift, so task estimates are far too long, which also makes it look like the cache is bigger than it actually is.

Thanks.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21790
Credit: 2,510,901
RAC: 0
United States
Message 1230691 - Posted: 11 May 2012, 22:51:36 UTC
Last modified: 11 May 2012, 22:51:54 UTC

So what's the worst that could happen? What could possibly go wrong?
____________

Join BOINC Synergy!

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1230696 - Posted: 11 May 2012, 23:05:04 UTC - in response to Message 1230691.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1230697 - Posted: 11 May 2012, 23:07:43 UTC - in response to Message 1230696.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.


I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>
____________

.clair.
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,080,534
RAC: 585
United Kingdom
Message 1230763 - Posted: 12 May 2012, 2:30:20 UTC - in response to Message 1230557.

Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

I am sure that started with 6.12.33 or somewhere near that version.
Of all the changes that have been made, that is one that I noticed and like . . .

zoom314
Project donor
Joined: 30 Nov 03
Posts: 46785
Credit: 36,999,907
RAC: 3,059
United States
Message 1230770 - Posted: 12 May 2012, 3:27:27 UTC - in response to Message 1230697.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.


I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>

I've got 1067 on the 590 here waiting to be worked on, and I have 1227 pending of course, but then I'm testing out some modified 590 firmware. So far so good: the 1598 seems to like it, with no driver crashes seen today since the fan went to 100% and the voltage is now at 0.950 V (since about 7am today). I'm using the 2.20 final version of MSI Afterburner. I like it better than EVGA's version; the EVGA has the looks, but the functionality is in the MSI. Oh well, the software's free at least.
____________
My Facebook, War Commander, 2015

Wiggo
Joined: 24 Jan 00
Posts: 7896
Credit: 98,324,122
RAC: 30,204
Australia
Message 1230806 - Posted: 12 May 2012, 4:55:50 UTC - in response to Message 1230770.

I'm pretty happy here now as my 3 rigs aren't hitting up the servers every 5 mins or so and I haven't seen that annoying limit message for days. That alone should take a fair bit of strain off the connection (until next month at least, when I'm planning a few little upgrades).

Cheers.
____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1759
Credit: 206,462,713
RAC: 13,082
Australia
Message 1230942 - Posted: 12 May 2012, 12:39:49 UTC

Personally I think the system works better with the limits in place.

Units are turned around quickly and can be cleared from the database in good time. The system load should actually decrease because, even though it has to handle more contacts, each contact only involves the uploading and downloading of 10 or so units at a time. The excessive number of units out in the field just slows the system down. My RAC has dropped nearly 15,000 in 24 hours. Having it drop that quickly is the equivalent of turning 2 machines off cold.

With FIVE new servers in place in the past 15 months, most of the hardware issues have been cured, so does anyone really need to keep a 10 day cache except for bragging rights?

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3682
Credit: 21,180,717
RAC: 6,889
Sweden
Message 1231013 - Posted: 12 May 2012, 14:19:40 UTC

Oh Oh....AP is being split again. Grab them while you can. I only crunch AP 6 on my Celeron Laptop for now, here on main, and that one doesn't compete for any award for the fastest cruncher ever built.

LOL
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 639
Credit: 146,906,381
RAC: 71,571
United Kingdom
Message 1231074 - Posted: 12 May 2012, 17:17:43 UTC - in response to Message 1231013.

Oh Oh....AP is being split again. Grab them while you can. I only crunch AP 6 on my Celeron Laptop for now, here on main, and that one doesn't compete for any award for the fastest cruncher ever built.

LOL

Ah, that explains the uptick in the cricket download graph. Oh, well, this latest free-for-all has left me with a rather large cache on most machines, so I'm not overly concerned about download speeds. Also the UK weather finally cleared up enough for me to take one of my quad-cores outside to blow the dust out of it and replace the PSU that failed three weeks ago. My RAC on that machine is down to 500 from its normal 3600, so I expect my overall RAC to improve even further now -- some of my newer machines still haven't reached a plateau.

____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8760
Credit: 52,712,719
RAC: 21,695
United Kingdom
Message 1232395 - Posted: 15 May 2012, 8:22:45 UTC

OK, been too long since we had anything to panic about.

There's a 'tape' on the splitter list called 01ap10hv. They start with aa, ab, ac ... and count up.

So, did they really record 203 tapes that day? Over 10 TB? Or is ET pulling our leg?
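
Counting the two-letter suffixes as base-26 numbers is one way to arrive at that figure. The sketch below only assumes the aa, ab, ac ... ordering described above; the function name is made up, and the roughly 50 GB per tape used in the note afterwards is an assumption that is not stated in this thread.

```python
import string


def suffix_index(suffix: str) -> int:
    """Zero-based position of a two-letter suffix in the order aa, ab, ..., az, ba, ..."""
    letters = string.ascii_lowercase
    if len(suffix) != 2 or any(ch not in letters for ch in suffix):
        raise ValueError("expected exactly two lowercase letters")
    first, second = suffix
    return letters.index(first) * 26 + letters.index(second)


# Number of suffixes that come before 'hv' when counting up from 'aa'.
print(suffix_index("hv"))
```

This gives 203 suffixes ahead of hv. At an assumed 50 GB or so per tape, 203 of them would come to about 10,150 GB, which is just over 10 TB.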

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4598
Credit: 121,599,744
RAC: 46,234
United States
Message 1232478 - Posted: 15 May 2012, 13:46:26 UTC - in response to Message 1232395.

OK, been too long since we had anything to panic about.

There's a 'tape' on the splitter list called 01ap10hv. They start with aa, ab, ac ... and count up.

So, did they really record 203 tapes that day? Over 10 TB? Or is ET pulling our leg?

I didn't know they had that much on site storage. Perhaps a hiccup in the recording software?
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1759
Credit: 206,462,713
RAC: 13,082
Australia
Message 1232485 - Posted: 15 May 2012, 14:04:53 UTC - in response to Message 1230944.

EDIT 2......
And you might find better time crunching than to pick a bit with the #6 cruncher in the place......
There are reasons I am where I am. Cache is one of them.
It got me crunching through times when the servers were down and out.
I kept going.
Peace.

Shields down Scottie.

AFAIK there is more than one person on this project running a 10 day cache. I was talking generally and using a figure of speech, not having a go at you or anyone else in particular.

If I had wanted to ping you personally I'd have said something much more pointed.

Peace

Amen

T.A.

SciManStev
Project donor
Volunteer tester
Joined: 20 Jun 99
Posts: 4896
Credit: 83,864,569
RAC: 14,817
United States
Message 1232622 - Posted: 15 May 2012, 22:43:05 UTC

Things seem to be at an upload/download standstill at the moment, but I don't mind a bit. It is so nice to carry a bit of a cache, and not have to worry if my rig is about to run out of work. I can survive an outage, or a shorty storm. I am quite pleased that the limits were lifted. If they were still in place, I would be panicking a bit about now.

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1232644 - Posted: 15 May 2012, 23:10:13 UTC
Last modified: 15 May 2012, 23:21:23 UTC

All downloads that were stuck on my hosts now went "through":

[error] File 22ap11ad.25473.17253.14.10.23 has wrong size: expected 375365, got 0
[error] Checksum or signature error for 22ap11ad.25473.17253.14.10.23

That's a new one :)
Now I have more errors than some doofus with a broken and forgotten GPU.

Edit: never mind. Fresh WUs are downloading fine again.
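
For anyone wondering what those two messages amount to, here is a minimal sketch of a size-then-checksum check on a downloaded file. It is not the client's real code: the function name is made up, and MD5 is used only as an example digest.

```python
import hashlib
import os
from typing import List


def verify_download(path: str, expected_size: int, expected_md5: str) -> List[str]:
    """Return a list of problems found with a downloaded file (empty if it looks fine)."""
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"wrong size: expected {expected_size}, got {actual_size}")
    with open(path, "rb") as handle:
        digest = hashlib.md5(handle.read()).hexdigest()
    if digest != expected_md5.lower():
        problems.append("checksum mismatch")
    return problems
```

An empty file fails both checks, which matches seeing a size error and a checksum error for the same task.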
____________

arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3725
Credit: 48,768,260
RAC: 1,737
United States
Message 1232663 - Posted: 15 May 2012, 23:48:57 UTC - in response to Message 1232644.

All downloads that were stuck on my hosts now went "through":
[error] File 22ap11ad.25473.17253.14.10.23 has wrong size: expected 375365, got 0
[error] Checksum or signature error for 22ap11ad.25473.17253.14.10.23

That's a new one :)
Now I have more errors than some doofus with a broken and forgotten GPU.

Edit: never mind. Fresh WUs are downloading fine again.


I have 8 download-error WUs as well.
____________


Copyright © 2014 University of California