Panic Mode On (74) Server problems?


N9JFE
Volunteer tester
Joined: 4 Oct 99
Posts: 9306
Credit: 11,901,493
RAC: 15,043
United States
Message 1230538 - Posted: 11 May 2012, 14:42:06 UTC - in response to Message 1230458.

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1230557 - Posted: 11 May 2012, 15:47:14 UTC - in response to Message 1230538.
Last modified: 11 May 2012, 15:54:30 UTC

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.


that's why Ray finds it puzzling that turnaround time is DEcreasing ;)

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.


Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

In standard running, reporting is done when the client asks for new work, when the user hits Update, in V7 when a scheduled 'no net' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something]

Edit: For a complete list see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)

If the last hadn't passed yet, but your client didn't think it needed new work (it didn't ask on the update) then it didn't see a reason to report yet.

Why it wasn't requesting - lots of reasons. Debt, cache, stuck uploads, stuck downloads, suspended task(s) can all block requesting.
I can see an AP task in the list - that's bound to be grossly overestimated, so CPU cache is probably looking full to BOINC.
GPU - on a hunch, could be messed up DCF from the APR capping lift, so task estimates are far too long also making it look like cache is bigger than it actually is.
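
To illustrate how an inflated DCF can make the cache look full, here is a rough sketch. It is not BOINC's actual work-fetch code: it simply assumes the client scales each raw task estimate by DCF and only asks for work when the scaled total is below the requested cache size. The function names and all numbers are made up for the example.

```python
SECONDS_PER_DAY = 86400.0

def estimated_queue_days(raw_estimates_s, dcf):
    """Total estimated runtime of the queued tasks, in days, after DCF scaling."""
    return sum(t * dcf for t in raw_estimates_s) / SECONDS_PER_DAY

def wants_more_work(raw_estimates_s, dcf, cache_days):
    """Simplified rule (an assumption): request work only if the queue looks short."""
    return estimated_queue_days(raw_estimates_s, dcf) < cache_days

# Hypothetical host: 200 queued tasks, each really about 30 minutes long.
queue = [1800.0] * 200

print(wants_more_work(queue, dcf=1.0, cache_days=5.0))  # about 4.2 days queued
print(wants_more_work(queue, dcf=3.0, cache_days=5.0))  # same queue looks like 12.5 days
```

With a sane DCF the host would still ask for work; with DCF three times too high the same queue looks more than twice as big as the requested cache, so no request goes out.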
____________
I'm not the Pope. I don't speak Ex Cathedra!

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37287
Credit: 498,052,389
RAC: 493,577
United States
Message 1230559 - Posted: 11 May 2012, 15:52:35 UTC

I suspect the turnaround time is dropping because many hosts are going into EDF (earliest deadline first) mode as their caches increase in size and BOINC tries to adjust, which brings tasks to the front of the cache and returns them sooner than they would be otherwise.

____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

N9JFE
Volunteer tester
Joined: 4 Oct 99
Posts: 9306
Credit: 11,901,493
RAC: 15,043
United States
Message 1230574 - Posted: 11 May 2012, 16:25:17 UTC - in response to Message 1230557.

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.

that's why Ray finds it puzzling that turnaround time is DEcreasing ;)

Duh! I just read it wrong, I guess.

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

In standard running, reporting is done when the client asks for new work, when the user hits Update, in V7 when a scheduled 'no net' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something]

Edit: For a complete list see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)

If the last hadn't passed yet, but your client didn't think it needed new work (it didn't ask on the update) then it didn't see a reason to report yet.

Why it wasn't requesting - lots of reasons. Debt, cache, stuck uploads, stuck downloads, suspended task(s) can all block requesting.
I can see an AP task in the list - that's bound to be grossly overestimated, so CPU cache is probably looking full to BOINC.
GPU - on a hunch, could be messed up DCF from the APR capping lift, so task estimates are far too long also making it look like cache is bigger than it actually is.

Thanks.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21779
Credit: 2,506,465
RAC: 3,199
United States
Message 1230691 - Posted: 11 May 2012, 22:51:36 UTC
Last modified: 11 May 2012, 22:51:54 UTC

So what's the worst that could happen? What could possibly go wrong?
____________

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1230696 - Posted: 11 May 2012, 23:05:04 UTC - in response to Message 1230691.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,366,360
RAC: 1,317
United States
Message 1230697 - Posted: 11 May 2012, 23:07:43 UTC - in response to Message 1230696.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.


I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>
____________

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 5
United Kingdom
Message 1230763 - Posted: 12 May 2012, 2:30:20 UTC - in response to Message 1230557.

Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

I am sure that started with 6.12.33 or somewhere near that version.
Of all the changes that have been made, that is one that I noticed and like . . .

zoom314
Joined: 30 Nov 03
Posts: 44522
Credit: 35,396,243
RAC: 9,232
Message 1230770 - Posted: 12 May 2012, 3:27:27 UTC - in response to Message 1230697.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.


I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>

I've got 1067 on the 590 here waiting to be worked on, and 1227 pending of course, but then I'm testing out some modified 590 firmware. So far so good: the 1598 seems to like it, with no driver crashes seen today since the fan went to 100% and the voltage went to 0.950 V (since about 7am today). I'm using the 2.20 final version of MSI Afterburner. I like it better than EVGA's version: the EVGA has the looks, but the functionality is in the MSI. Oh well, the software's free at least.
____________

Wiggo
Joined: 24 Jan 00
Posts: 5132
Credit: 82,861,892
RAC: 71,987
Australia
Message 1230806 - Posted: 12 May 2012, 4:55:50 UTC - in response to Message 1230770.

I'm pretty happy here now as my 3 rigs aren't hitting up the servers every 5 mins or so and I haven't seen that annoying limit message for days. That alone should take a fair bit of strain off the connection (until next month at least, when I'm planning a few little upgrades).

Cheers.
____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1626
Credit: 200,722,962
RAC: 70,087
Australia
Message 1230942 - Posted: 12 May 2012, 12:39:49 UTC

Personally I think the system works better with the limits in place.

Units are turned around quickly and are able to be cleared from the database in good time. The system load should actually decrease because even though it has to handle more contacts, each contact only involves the uploading and downloading of 10 or so units at a time. The excessive number of units out in the field just slows the system down. My RAC has dropped nearly 15,000 in 24 hours. Having it drop that quickly is the equivalent of turning 2 machines off cold.

With FIVE new servers in place in the past 15 months, most of the hardware issues have been cured, so does anyone really need to keep a 10 day cache except for bragging rights?

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37287
Credit: 498,052,389
RAC: 493,577
United States
Message 1230944 - Posted: 12 May 2012, 12:46:06 UTC - in response to Message 1230942.
Last modified: 12 May 2012, 12:59:16 UTC

Personally I think the system works better with the limits in place.

Units are turned around quickly and are able to be cleared from the database in good time. The system load should actually decrease because even though it has to handle more contacts, each contact only involves the uploading and downloading of 10 or so units at a time. The excessive number of units out in the field just slows the system down. My RAC has dropped nearly 15,000 in 24 hours. Having it drop that quickly is the equivalent of turning 2 machines off cold.

With FIVE new servers in place in the past 15 months, most of the hardware issues have been cured, so does anyone really need to keep a 10 day cache except for bragging rights?

Personally.......
The kitties have not always been about bragging rights.......
But they have them.

I know about dragging RAC.

I don't give a dang about it.
My pendings have gone up a thousand fold since the change.
And I welcome it.

I brag sometimes....I have the kibbles to back it, my friend.
If that is not your piece of dirt, don't fight it.


EDIT...
And I love the current server situ.
If it continues for another 6 months, even the kitties may consider leaving fewer kibbles in their bowls.

EDIT 2......
And you might find better time crunching than to pick a bit with the #6 cruncher in the place......
There are reasons I am where I am. Cache is one of them.
It got me crunching through times when the servers were down and out.
I kept going.
Peace.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3307
Credit: 16,253,929
RAC: 11,044
Sweden
Message 1231013 - Posted: 12 May 2012, 14:19:40 UTC

Oh Oh....AP is being split again. Grab them while you can. I only crunch AP 6 on my Celeron Laptop for now, here on main, and that one doesn't compete for any award for the fastest cruncher ever built.

LOL
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 552
Credit: 120,017,023
RAC: 84,937
United Kingdom
Message 1231074 - Posted: 12 May 2012, 17:17:43 UTC - in response to Message 1231013.

Oh Oh....AP is being split again. Grab them while you can. I only crunch AP 6 on my Celeron Laptop for now, here on main, and that one doesn't compete for any award for the fastest cruncher ever built.

LOL

Ah, that explains the uptick in the cricket download graph. Oh, well, this latest free-for-all has left me with a rather large cache on most machines, so I'm not overly concerned about download speeds. Also the UK weather finally cleared up enough for me to take one of my quad-cores outside to blow the dust out of it and replace the PSU that failed three weeks ago. My RAC on that machine is down to 500 from its normal 3600, so I expect my overall RAC to improve even further now -- some of my newer machines still haven't reached a plateau.

____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,917,517
RAC: 13,490
United Kingdom
Message 1232395 - Posted: 15 May 2012, 8:22:45 UTC

OK, been too long since we had anything to panic about.

There's a 'tape' on the splitter list called 01ap10hv. They start with aa, ab, ac ... and count up.

So, did they really record 203 tapes that day? Over 10 TB? Or is ET pulling our leg?
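
For anyone who wants to check the arithmetic, here is a small sketch. It assumes the last two letters of the name are a base-26 counter with aa counted as 0, and that a tape holds roughly 50 GB; the 50 GB figure is an assumption picked only because it is consistent with the "over 10 TB" estimate. The function name is made up for the example.

```python
import string

def suffix_index(suffix):
    """Zero-based position of a lowercase letter suffix, with 'aa' -> 0."""
    index = 0
    for ch in suffix:
        index = index * 26 + string.ascii_lowercase.index(ch)
    return index

GB_PER_TAPE = 50  # assumed size per tape, not a confirmed figure

name = "01ap10hv"
idx = suffix_index(name[-2:])
print(idx)                       # 203
print(idx * GB_PER_TAPE / 1000)  # 10.15 (TB)
```

Counting aa as 0 gives 203 for hv; counting aa as the first tape would make hv the 204th.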

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3567
Credit: 97,829,316
RAC: 78,292
United States
Message 1232478 - Posted: 15 May 2012, 13:46:26 UTC - in response to Message 1232395.

OK, been too long since we had anything to panic about.

There's a 'tape' on the splitter list called 01ap10hv. They start with aa, ab, ac ... and count up.

So, did they really record 203 tapes that day? Over 10 TB? Or is ET pulling our leg?

I didn't know they had that much on site storage. Perhaps a hiccup in the recording software?
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1626
Credit: 200,722,962
RAC: 70,087
Australia
Message 1232485 - Posted: 15 May 2012, 14:04:53 UTC - in response to Message 1230944.

EDIT 2......
And you might find better time crunching than to pick a bit with the #6 cruncher in the place......
There are reasons I am where I am. Cache is one of them.
It got me crunching through times when the servers were down and out.
I kept going.
Peace.

Shields down Scottie.

AFAIK there is more than one person on this project running a 10 day cache. I was talking generally and using a figure of speech, not having a go at you or anyone else in particular.

If I had wanted to ping you personally I'd have said something much more pointed.

Peace

Amen

T.A.

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 4662
Credit: 77,141,722
RAC: 42,198
United States
Message 1232622 - Posted: 15 May 2012, 22:43:05 UTC

Things seem to be at an upload/download standstill at the moment, but I don't mind a bit. It is so nice to carry a bit of a cache, and not have to worry if my rig is about to run out of work. I can survive an outage, or a shorty storm. I am quite pleased that the limits were lifted. If they were still in place, I would be panicking a bit about now.

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1232644 - Posted: 15 May 2012, 23:10:13 UTC
Last modified: 15 May 2012, 23:21:23 UTC

All downloads that were stuck on my hosts now went "through":

[error] File 22ap11ad.25473.17253.14.10.23 has wrong size: expected 375365, got 0
[error] Checksum or signature error for 22ap11ad.25473.17253.14.10.23

That's a new one :)
Now I have more errors than some doofus with a broken and forgotten GPU.

Edit: never mind, fresh WUs are downloading fine again.
____________

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3543
Credit: 46,087,472
RAC: 30,496
United States
Message 1232663 - Posted: 15 May 2012, 23:48:57 UTC - in response to Message 1232644.

All downloads that were stuck on my hosts now went "through":
[error] File 22ap11ad.25473.17253.14.10.23 has wrong size: expected 375365, got 0
[error] Checksum or signature error for 22ap11ad.25473.17253.14.10.23

That's a new one :)
Now I have more errors than some doofus with a broken and forgotten GPU.

Edit: never mind, fresh WUs are downloading fine again.


I have 8 WUs with download errors as well.
____________


Copyright © 2014 University of California