Panic Mode On (74) Server problems?

David S
Project Donor
Volunteer tester
Joined: 4 Oct 99
Posts: 17034
Credit: 20,915,189
RAC: 5,932
United States
Message 1230538 - Posted: 11 May 2012, 14:42:06 UTC - in response to Message 1230458.

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


ID: 1230538 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1230557 - Posted: 11 May 2012, 15:47:14 UTC - in response to Message 1230538.
Last modified: 11 May 2012, 15:54:30 UTC

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.


that's why Ray finds it puzzling that turnaround time is DEcreasing ;)

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.


Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

In standard running, reporting is done when the client asks for new work, when the user hits Update, in V7 when a scheduled 'no net' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something]

Edit: For a complete list see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)

If that last limit hadn't passed yet, and your client didn't think it needed new work (it didn't ask on the update), then it didn't see a reason to report yet.

Why it wasn't requesting: lots of possible reasons. Debt, cache, stuck uploads, stuck downloads, and suspended task(s) can all block requesting.
I can see an AP task in the list. Its runtime is bound to be grossly overestimated, so the CPU cache probably looks full to BOINC.
GPU: on a hunch, it could be a messed-up DCF from the lifting of the APR cap, so task estimates are far too long, which also makes the cache look bigger than it actually is.
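
To make the list of report triggers above a bit more concrete, here is a rough Python sketch of the decision as described in this post. It is not BOINC's actual code: the `ClientState` fields and the `should_report` function are names invented for illustration, and the only number taken from the thread is the 24 hour limit.

```python
from dataclasses import dataclass

REPORT_LIMIT_HOURS = 24  # "at the latest 24 hours after an upload", per the post


@dataclass
class ClientState:
    """Hypothetical snapshot of a client; all field names are made up for this sketch."""
    completed_unreported: int         # finished tasks not yet reported
    wants_new_work: bool              # client thinks its cache needs topping up
    user_pressed_update: bool         # manual Update from the manager
    no_new_tasks: bool                # NNT is set (immediate report in V7)
    no_net_period_soon: bool          # scheduled network-off window coming up
    hours_since_oldest_upload: float  # age of the oldest uploaded, unreported result


def should_report(s: ClientState) -> bool:
    """Return True if any of the triggers named in the post applies."""
    if s.completed_unreported == 0:
        return False  # nothing to report
    return (
        s.wants_new_work
        or s.user_pressed_update
        or s.no_new_tasks
        or s.no_net_period_soon
        or s.hours_since_oldest_upload >= REPORT_LIMIT_HOURS
    )


# A host like the i7 above: 140 tasks done, cache looks full, under 24 hours.
quiet_host = ClientState(140, False, False, False, False, 19.0)
after_update = ClientState(140, False, True, False, False, 19.0)
```

With these assumed inputs, `should_report(quiet_host)` is `False` and `should_report(after_update)` is `True`, which matches the behaviour described: nothing is reported until something (here the manual Update) gives the client a reason.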
I'm not the Pope. I don't speak Ex Cathedra!

ID: 1230557 · Report as offensive
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45861
Credit: 814,586,863
RAC: 121,756
United States
Message 1230559 - Posted: 11 May 2012, 15:52:35 UTC

I suspect the turnaround time is dropping because many hosts are going into EDF (earliest deadline first) mode as their caches grow and BOINC tries to adjust. That brings the tasks with the nearest deadlines to the front of the cache, so they are returned sooner than they would be otherwise.
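
A toy illustration of that effect, as a sketch only: one core, three made-up tasks, run once in the order they arrived and once sorted by deadline. The task names, deadlines and runtimes are invented, and real BOINC scheduling is far more involved than a single sort.

```python
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    deadline_h: float  # hours from now until the deadline
    runtime_h: float   # estimated hours of crunching


def finish_times(tasks):
    """Run tasks back to back on one core in the given order; return {name: finish hour}."""
    clock = 0.0
    done = {}
    for t in tasks:
        clock += t.runtime_h
        done[t.name] = clock
    return done


# Hypothetical cache, listed in arrival order.
cache = [
    Task("long_a", 500, 10),
    Task("long_b", 480, 10),
    Task("shorty", 15, 2),
]

fifo = finish_times(cache)                                    # arrival order
edf = finish_times(sorted(cache, key=lambda t: t.deadline_h))  # earliest deadline first

print("arrival order:", fifo)
print("deadline order:", edf)
```

In arrival order the short task finishes at hour 22, past its 15 hour deadline. In deadline order it finishes at hour 2. In this toy cache the near-deadline work comes back much sooner, which is the kind of shift that could pull an average turnaround figure down.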


Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

ID: 1230559 · Report as offensive
David S
Project Donor
Volunteer tester
Joined: 4 Oct 99
Posts: 17034
Credit: 20,915,189
RAC: 5,932
United States
Message 1230574 - Posted: 11 May 2012, 16:25:17 UTC - in response to Message 1230557.

I am puzzled as to why the Result average turnaround is dropping though.

Seems simple to me. Hosts are building up bigger caches, which means the turnaround from when a task is sent to when it is returned is getting longer.

that's why Ray finds it puzzling that turnaround time is DEcreasing ;)

Duh! I just read it wrong, I guess.

Something else, though: a little while ago, I noticed my i7 hadn't contacted the scheduler since 1430 local yesterday. I remoted in and told it to Update, and it reported 140 completed tasks (and didn't request any more !!??). I just read the new thread about reporting immediately, but I don't think I've ever seen it hold onto so many before unless there was a server problem. If this is happening to everybody for some strange reason, it would show up in that stat.

Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

In standard running, reporting is done when the client asks for new work, when the user hits Update, in V7 when a scheduled 'no net' period is coming up soon, and at the latest 24 hours after an upload. [I'm probably forgetting something]

Edit: For a complete list see this post. Thanks to Highlander for digging that out and to Ageless for posting it :)

If that last limit hadn't passed yet, and your client didn't think it needed new work (it didn't ask on the update), then it didn't see a reason to report yet.

Why it wasn't requesting: lots of possible reasons. Debt, cache, stuck uploads, stuck downloads, and suspended task(s) can all block requesting.
I can see an AP task in the list. Its runtime is bound to be grossly overestimated, so the CPU cache probably looks full to BOINC.
GPU: on a hunch, it could be a messed-up DCF from the lifting of the APR cap, so task estimates are far too long, which also makes the cache look bigger than it actually is.

Thanks.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


ID: 1230574 · Report as offensive
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21790
Credit: 2,510,901
RAC: 0
United States
Message 1230691 - Posted: 11 May 2012, 22:51:36 UTC
Last modified: 11 May 2012, 22:51:54 UTC

So what's the worst that could happen? What could possibly go wrong?



Join BOINC Synergy!

ID: 1230691 · Report as offensive
Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1230696 - Posted: 11 May 2012, 23:05:04 UTC - in response to Message 1230691.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.




Executive Director GPU Users Group Inc. -
brad@gpuug.org

ID: 1230696 · Report as offensive
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1230697 - Posted: 11 May 2012, 23:07:43 UTC - in response to Message 1230696.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.


I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>

ID: 1230697 · Report as offensive
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 39,653,995
RAC: 17,800
United Kingdom
Message 1230763 - Posted: 12 May 2012, 2:30:20 UTC - in response to Message 1230557.

Immediate report is a feature of V7 Boinc, when NNT (No New Tasks) is set.

I am sure that started with 6.12.33 or somewhere near that version.
Of all the changes that have been made, that is one that I noticed and like . . .

ID: 1230763 · Report as offensive
zoom314
Volunteer tester
Joined: 30 Nov 03
Posts: 56730
Credit: 40,720,609
RAC: 4,902
United States
Message 1230770 - Posted: 12 May 2012, 3:27:27 UTC - in response to Message 1230697.

Hehehe. Things go good for a while and it has people nervous.

I'm up to a cache of ~4300 and get new tasks about every 2-3 requests. Most of the requests end up with 50ish tasks being downloaded at a time which takes seconds.

Everything's green on my end.


I'm just about at 10k in progress on 3 rigs, including 4200 on the twin 590. Looking good to me.<g>

I've got 1067 on the 590 here waiting to be worked on, and 1227 pending of course. But then I'm testing out some modified 590 firmware. So far so good: the 1598 seems to like it, and no driver crashes have been seen today since the fan went to 100% and the voltage is now at 0.950 V (since about 7am today). I'm using the 2.20 final version of MSI Afterburner. I like it better than EVGA's version; the EVGA has the looks, but the functionality is in the MSI. Oh well, the software's free at least.
Pluto is still a planet.

Beep! Beep!

ID: 1230770 · Report as offensive
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 10499
Credit: 135,173,618
RAC: 37,320
Australia
Message 1230806 - Posted: 12 May 2012, 4:55:50 UTC - in response to Message 1230770.

I'm pretty happy here now as my 3 rigs aren't hitting up the servers every 5 mins or so and I haven't seen that annoying limit message for days. That alone should take a fair bit of strain off the connection (until next month at least, when I'm planning a few little upgrades).

Cheers.


ID: 1230806 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1790
Credit: 225,264,087
RAC: 10,125
Australia
Message 1230942 - Posted: 12 May 2012, 12:39:49 UTC

Personally I think the system works better with the limits in place.

Units are turned around quickly and are able to be cleared from the database in good time. The system load should actually decrease because, even though it has to handle more contacts, each contact only involves the uploading and downloading of 10 or so units at a time. The excessive number of units out in the field just slows the system down. My RAC has dropped nearly 15,000 in 24 hours. Having it drop that quickly is the equivalent of turning 2 machines off cold.

With FIVE new servers in place in the past 15 months, most of the hardware issues have been cured, so does anyone really need to keep a 10 day cache except for bragging rights?

ID: 1230942 · Report as offensive
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45861
Credit: 814,586,863
RAC: 121,756
United States
Message 1230944 - Posted: 12 May 2012, 12:46:06 UTC - in response to Message 1230942.
Last modified: 12 May 2012, 12:59:16 UTC

Personally I think the system works better with the limits in place.

Units are turned around quickly and are able to be cleared from the database in good time. The system load should actually decrease because, even though it has to handle more contacts, each contact only involves the uploading and downloading of 10 or so units at a time. The excessive number of units out in the field just slows the system down. My RAC has dropped nearly 15,000 in 24 hours. Having it drop that quickly is the equivalent of turning 2 machines off cold.

With FIVE new servers in place in the past 15 months, most of the hardware issues have been cured, so does anyone really need to keep a 10 day cache except for bragging rights?

Personally.......
The kitties have not always been about bragging rights.......
But they have them.

I know about dragging RAC.

I don't give a dang about it.
My pendings have gone up a thousand fold since the change.
And I welcome it.

I brag sometimes....I have the kibbles to back it, my friend.
If that is not your piece of dirt, don't fight it.


EDIT...
And I love the current server situ.
If it continues for another 6 months, even the kitties may consider leaving fewer kibbles in their bowls.

EDIT 2......
And you might find better time crunching than to pick a bit with the #6 cruncher in the place......
There are reasons I am where I am. Cache is one of them.
It got me crunching through times when the servers were down and out.
I kept going.
Peace.
Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

ID: 1230944 · Report as offensive
Tutankhamon "Communist"
Volunteer tester
Joined: 1 Nov 08
Posts: 6081
Credit: 37,598,075
RAC: 15,029
Sweden
Message 1231013 - Posted: 12 May 2012, 14:19:40 UTC

Oh Oh....AP is being split again. Grab them while you can. I only crunch AP 6 on my Celeron Laptop for now, here on main, and that one doesn't compete for any award for the fastest cruncher ever built.

LOL


This is a test of the Emergency Moron System. Had there been a real moron in the room, there would've been a small mushroom cloud in the place where the idiot had been standing.

ID: 1231013 · Report as offensive
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 780
Credit: 232,243,824
RAC: 84,056
United Kingdom
Message 1231074 - Posted: 12 May 2012, 17:17:43 UTC - in response to Message 1231013.

Oh Oh....AP is being split again. Grab them while you can. I only crunch AP 6 on my Celeron Laptop for now, here on main, and that one doesn't compete for any award for the fastest cruncher ever built.

LOL

Ah, that explains the uptick in the cricket download graph. Oh, well, this latest free-for-all has left me with a rather large cache on most machines, so I'm not overly concerned about download speeds. Also the UK weather finally cleared up enough for me to take one of my quad-cores outside to blow the dust out of it and replace the PSU that failed three weeks ago. My RAC on that machine is down to 500 from its normal 3600, so I expect my overall RAC to improve even further now -- some of my newer machines still haven't reached a plateau.

ID: 1231074 · Report as offensive
Richard HaselgroveProject Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 11136
Credit: 83,497,343
RAC: 41,074
United Kingdom
Message 1232395 - Posted: 15 May 2012, 8:22:45 UTC

OK, been too long since we had anything to panic about.

There's a 'tape' on the splitter list called 01ap10hv. They start with aa, ab, ac ... and count up.

So, did they really record 203 tapes that day? Over 10 TB? Or is ET pulling our leg?
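
For anyone wanting to check the count, here is a quick sketch of the arithmetic. It assumes the suffixes run aa, ab, ... az, ba, ... with no gaps, and it assumes roughly 50 GB per 'tape' (a figure not stated in this thread, picked because it lines up with the "over 10 TB" estimate). The function name is made up for the example.

```python
import string


def suffix_index(suffix: str) -> int:
    """Zero-based position of a two-letter suffix in the sequence aa, ab, ..., az, ba, ..."""
    first, second = suffix
    letters = string.ascii_lowercase
    return letters.index(first) * 26 + letters.index(second)


TAPE_SIZE_GB = 50  # assumption, not taken from the thread

tapes_before_hv = suffix_index("hv")                # suffixes aa .. hu come before hv
total_tb = tapes_before_hv * TAPE_SIZE_GB / 1000    # decimal terabytes

print(tapes_before_hv, "tapes before hv, about", total_tb, "TB")
```

Under those assumptions, 203 suffixes come before hv (so hv itself would be number 204), and 203 tapes at 50 GB each is a little over 10 TB.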

ID: 1232395 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6085
Credit: 154,971,448
RAC: 46,945
United States
Message 1232478 - Posted: 15 May 2012, 13:46:26 UTC - in response to Message 1232395.

OK, been too long since we had anything to panic about.

There's a 'tape' on the splitter list called 01ap10hv. They start with aa, ab, ac ... and count up.

So, did they really record 203 tapes that day? Over 10 TB? Or is ET pulling our leg?

I didn't know they had that much on site storage. Perhaps a hiccup in the recording software?
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

ID: 1232478 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1790
Credit: 225,264,087
RAC: 10,125
Australia
Message 1232485 - Posted: 15 May 2012, 14:04:53 UTC - in response to Message 1230944.

EDIT 2......
And you might find better time crunching than to pick a bit with the #6 cruncher in the place......
There are reasons I am where I am. Cache is one of them.
It got me crunching through times when the servers were down and out.
I kept going.
Peace.

Shields down Scottie.

AFAIK there is more than one person on this project running a 10 day cache. I was talking generally and using a figure of speech, not having a go at you or anyone else in particular.

If I had wanted to ping you personally I'd have said something much more pointed.

Peace

Amen

T.A.

ID: 1232485 · Report as offensive
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 5840
Credit: 105,982,848
RAC: 3,042
United States
Message 1232622 - Posted: 15 May 2012, 22:43:05 UTC

Things seem to be at an upload/download standstill at the moment, but I don't mind a bit. It is so nice to carry a bit of a cache, and not have to worry if my rig is about to run out of work. I can survive an outage, or a shorty storm. I am quite pleased that the limits were lifted. If they were still in place, I would be panicking a bit about now.

Steve


Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

ID: 1232622 · Report as offensive
Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1232644 - Posted: 15 May 2012, 23:10:13 UTC
Last modified: 15 May 2012, 23:21:23 UTC

All downloads that were stuck on my hosts now went "through":

[error] File 22ap11ad.25473.17253.14.10.23 has wrong size: expected 375365, got 0
[error] Checksum or signature error for 22ap11ad.25473.17253.14.10.23

That's a new one :)
Now I have more errors than some doofus with a broken and forgotten GPU.

Edit: Never mind. Fresh WUs are downloading fine again.
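
The two error lines above describe a size check followed by a checksum check. Here is a minimal sketch of that kind of check in Python. It is not the BOINC client's code: `check_download` is a made-up helper, the message wording only imitates the log, and the use of MD5 is an assumption for the example.

```python
import hashlib
import os
import tempfile


def check_download(path, expected_size, expected_md5=None):
    """Return a list of problems with a downloaded file (an empty list means it looks fine)."""
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"wrong size: expected {expected_size}, got {actual_size}")
    if expected_md5 is not None:
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        if digest != expected_md5:
            problems.append("checksum mismatch")
    return problems


# Simulate a download that arrived as a zero-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    empty_path = tmp.name
result = check_download(empty_path, 375365)
os.remove(empty_path)

print(result)
```

A zero-byte file fails the size test straight away, and any checksum computed over it cannot match the expected one either, which would explain seeing both errors for the same file.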

ID: 1232644 · Report as offensive
arkayn
Project Donor
Volunteer tester
Joined: 14 May 99
Posts: 4097
Credit: 51,576,090
RAC: 1,593
United States
Message 1232663 - Posted: 15 May 2012, 23:48:57 UTC - in response to Message 1232644.

All downloads that were stuck on my hosts now went "through":
[error] File 22ap11ad.25473.17253.14.10.23 has wrong size: expected 375365, got 0
[error] Checksum or signature error for 22ap11ad.25473.17253.14.10.23

That's a new one :)
Now I have more errors than some doofus with a broken and forgotten GPU.

Edit: Never mind. Fresh WUs are downloading fine again.


I have 8 download-error WUs as well.

ID: 1232663 · Report as offensive

 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.