Panic Mode On (30) Server problems

Message boards : Number crunching : Panic Mode On (30) Server problems
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 978694 - Posted: 14 Mar 2010, 17:18:44 UTC

I'm getting downloads now so that isn't a problem for me, yet.

Looking at the Cricket graphs, it appears that uploads went wonky about 30 hours ago. Same thing with Scarecrow's results received graphs. Currently at 10-20% of "normal" uploads per hour.

The system may have started to degrade a day or so before it collapsed to current levels. Hopefully it won't be another extended case of "she canna take it Capt'n".
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 978694 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 978706 - Posted: 14 Mar 2010, 18:22:31 UTC
Last modified: 14 Mar 2010, 18:25:18 UTC

My 920 rig just ran out of Cuda work, and as there are 2300 or so VfingLAR's on the CPU, nothing left to reschedule here.

The Frozen One has a couple of days' worth yet, and the 3rd Cuda has one or two.

Hope this doesn't take days to fix like the last outage.

Another hack attack?
Or did some clown run another rogue script again?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 978706 · Report as offensive
Profile 52 Aces
Joined: 7 Jan 02
Posts: 497
Credit: 14,261,068
RAC: 67
United States
Message 978731 - Posted: 14 Mar 2010, 20:16:45 UTC - in response to Message 978706.  

My 920 rig just ran out of Cuda work, and as there are 2300 or so VfingLAR's on the CPU, nothing left to reschedule here.

Thought you said you had 10 days worth for all the machines --- it's not really 10 days worth if they all don't have the right ratio ;-) Even on good days, the potential to have nothing but "V-to-the-f-Lars" exists, so I keep my ReScheduler slider over at 75% to compensate for the feast/famine nature of the tapes being split.

That the upload path is busted just amplifies things, but that could be weathered if there wasn't an artificial block on downloads. If this drags again, I'll take the half day hit to figure out how to set up a private boinc build absent that one line of code (greedy I know, but history seems to be repeating itself awfully quickly).
ID: 978731 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 978733 - Posted: 14 Mar 2010, 20:18:11 UTC

Since about 9 UTC, the number of work units waiting for assimilation has decreased by 111335.
This comes as a bit of a surprise to me, since there has been a low turnover of results returned. I have five tasks waiting to upload and another 17 in my cache.
ID: 978733 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 978735 - Posted: 14 Mar 2010, 20:26:34 UTC - in response to Message 978731.  

My 920 rig just ran out of Cuda work, and as there are 2300 or so VfingLAR's on the CPU, nothing left to reschedule here.

Thought you said you had 10 days worth for all the machines --- it's not really 10 days worth if they all don't have the right ratio ;-) Even on good days, the potential to have nothing but "V-to-the-f-Lars" exists, so I keep my ReScheduler slider over at 75% to compensate for the feast/famine nature of the tapes being split.

That the upload path is busted just amplifies things, but that could be weathered if there wasn't an artificial block on downloads. If this drags again, I'll take the half day hit to figure out how to set up a private boinc build absent that one line of code (greedy I know, but history seems to be repeating itself awfully quickly).

10 days' worth includes the CPU....
The ratio of VLAR work being issued over the last few weeks has been rather high, so you can't reschedule what you can't get.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 978735 · Report as offensive
Profile 52 Aces
Joined: 7 Jan 02
Posts: 497
Credit: 14,261,068
RAC: 67
United States
Message 978747 - Posted: 14 Mar 2010, 21:09:49 UTC - in response to Message 978735.  

The ratio of VLAR work being issued over the last few weeks has been rather high, so you can't reschedule what you can't get.

I totally hear ya. It's for that reason that a few weeks back I would occasionally post a ratio of REG/VLAR/VHAR/AP. No one much noticed, but it was a valuable data point for me after I had gotten burned with no cuda work left, so I tried to share. I'd put my toe in the water to sample the ratio with about 200 WU's, and if it was favorable, I'd open the floodgates, and if not, I'd try again a day later assuming I saw new tapes go live (while keeping my fingers crossed that the project wouldn't pick that day to keel over completely).

Given that approach above, my ratio is still about 60/40 this afternoon (and I can always rebalance back to CPU if necessary), hence I'm not feeling the same pain (yes, I know you eat way more than I do, but I crunch enough work that I believe the same method would scale). The downside is I have to remember to run re-scheduler about every other day (to balance a 3 day CPU queue and an artificially loaded 12 day queue for the GPU). Yes, this only works for me because we seldom get 7 days of nothing but VLARs, but all these techniques are really little more than increasing personal breathing room for times when the project itself is off balance somehow, and pretty much all you can buy yourself is about half the distance of the shortest issued WU's (which is 2 weeks, thus 1 week of air).
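For anyone who wants to try the same "toe in the water" check, here is a rough sketch in Python of the arithmetic involved. It is only an illustration under stated assumptions: the function names, the type labels passed in, and the 40% VLAR cutoff are all made up for the example, it does not talk to BOINC or ReScheduler at all, and it reads "half the distance" as half the shortest deadline. You would have to tally the sample by hand (or from your own tooling) and feed the labels in.

```python
from collections import Counter


def sample_mix(labels):
    """Return the fraction of each work-unit type in a sample of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


def should_fill_cache(labels, max_vlar_fraction=0.4):
    """True when the VLAR share of the sample is at or below the cutoff.

    max_vlar_fraction is a hypothetical threshold chosen for illustration,
    not a value taken from BOINC or ReScheduler.
    """
    mix = sample_mix(labels)
    return bool(mix) and mix.get("VLAR", 0.0) <= max_vlar_fraction


def breathing_room_days(shortest_deadline_days):
    """Rule of thumb from the post: roughly half the shortest deadline."""
    return shortest_deadline_days / 2


if __name__ == "__main__":
    # A hand-tallied sample of about 200 fresh downloads (made-up numbers).
    sample = ["REG"] * 120 + ["VLAR"] * 80
    print(sample_mix(sample))
    print("fill the cache?", should_fill_cache(sample))
    print("days of air:", breathing_room_days(14))
```

If the sample comes back VLAR-heavy, the idea is simply to hold off and sample again once new tapes go live, rather than filling the cache with work the GPU cannot take.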

Would be a ton easier and far more timely if the Server Status page just gave the % as part of the 'Ready to send' figure (and it could be argued if the cuda code were revisited, maybe there'd be no need for rescheduler in the first place).
ID: 978747 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 978755 - Posted: 14 Mar 2010, 21:34:29 UTC - in response to Message 978747.  


Would be a ton easier and far more timely if the Server Status page just gave the % as part of the 'Ready to send' figure (and it could be argued if the cuda code were revisited, maybe there'd be no need for rescheduler in the first place).

Well, we all know that isn't going to happen, eh?

For now, I'd be happy if I could just report what work I do have done.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 978755 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 978794 - Posted: 14 Mar 2010, 23:25:11 UTC - in response to Message 978747.  

Would be a ton easier and far more timely if the Server Status page just gave the % as part of the 'Ready to send' figure (and it could be argued if the cuda code were revisited, maybe there'd be no need for rescheduler in the first place).

But there is only one queue RTS. At that point there is no knowledge of whether they will be crunched by a CPU or a GPU, they are the same WU's.

F.
ID: 978794 · Report as offensive
Profile ccappel
Joined: 27 Jan 00
Posts: 362
Credit: 1,516,412
RAC: 0
United States
Message 978958 - Posted: 15 Mar 2010, 13:08:08 UTC

Was there any definitive reason given for the last time the uploads were backlogged a couple weeks ago?
"Life is a tragedy for those who feel, and a comedy for those who think."

"I never get into an argument that I cannot win."
ID: 978958 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 978970 - Posted: 15 Mar 2010, 14:06:07 UTC - in response to Message 978958.  

Was there any definitive reason given for the last time the uploads were backlogged a couple weeks ago?

No.
ID: 978970 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 978971 - Posted: 15 Mar 2010, 14:18:02 UTC

Welcome to Monday boys and girls.

It's 7:15AM in Berkeley right now. The SETI crew should be in in about an hour or so. Hope this is something they can get fixed quickly. I almost made it but my stash ran dry sometime last night. The poor servers are gonna get hammered when the floodgates do open today. Be patient, it may take awhile for them to get work out to us. It won't be long.


PROUD MEMBER OF Team Starfire World BOINC
ID: 978971 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 978975 - Posted: 15 Mar 2010, 14:41:08 UTC - in response to Message 978971.  

Welcome to Monday boys and girls.

It's 7:15AM in Berkeley right now. The SETI crew should be in in about an hour or so. Hope this is something they can get fixed quickly. I almost made it but my stash ran dry sometime last night. The poor servers are gonna get hammered when the floodgates do open today. Be patient, it may take awhile for them to get work out to us. It won't be long.

I think they'd do better concentrating on getting the existing work back in, validated, assimilated and scrubbed.

Not much point releasing yet more WUs until Dyno-Rod's attended and got all the pipelines flushed and running clear.
ID: 978975 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 978977 - Posted: 15 Mar 2010, 14:49:03 UTC - in response to Message 978975.  

Welcome to Monday boys and girls.

It's 7:15AM in Berkeley right now. The SETI crew should be in in about an hour or so. Hope this is something they can get fixed quickly. I almost made it but my stash ran dry sometime last night. The poor servers are gonna get hammered when the floodgates do open today. Be patient, it may take awhile for them to get work out to us. It won't be long.

I think they'd do better concentrating on getting the existing work back in, validated, assimilated and scrubbed.

Not much point releasing yet more WUs until Dyno-Rod's attended and got all the pipelines flushed and running clear.

Might have been fixed already. I just had a WU finish, it got uploaded at the first attempt, Boinc asked for work once, and got a task.
I've clicked update on Seti Main and Beta, and they both got updated at the first attempt.
Just the small matter of lots of uploads next, then the fallout.

Claggy
ID: 978977 · Report as offensive
Profile ccappel
Joined: 27 Jan 00
Posts: 362
Credit: 1,516,412
RAC: 0
United States
Message 978979 - Posted: 15 Mar 2010, 14:52:43 UTC - in response to Message 978977.  

Might have been fixed already. I just had a WU finish, it got uploaded at the first attempt, Boinc asked for work once, and got a task.
I've clicked update on Seti Main and Beta, and they both got updated at the first attempt.
Just the small matter of lots of uploads next, then the fallout.

Claggy

I just abused my Retry button on the transfers tab and nothing went thru after multiple attempts. So still not fixed.
"Life is a tragedy for those who feel, and a comedy for those who think."

"I never get into an argument that I cannot win."
ID: 978979 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 978980 - Posted: 15 Mar 2010, 14:55:45 UTC - in response to Message 978975.  


Not much point releasing yet more WUs until Dyno-Rod's attended and got all the pipelines flushed and running clear.


Of course you're right Richard. I was just making a quick post welcoming Monday. When I said the servers were gonna get hammered I meant by all the work trying to go back in to SETI. Once that gets back under some semblance of control then the WUs can flow.

Claggy, I did have two WUs report sometime last night but I'm still sitting on a bunch of uploads. I'm resisting the urge to abuse my buttons! :-)


PROUD MEMBER OF Team Starfire World BOINC
ID: 978980 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 978983 - Posted: 15 Mar 2010, 15:03:01 UTC - in response to Message 978731.  

My 920 rig just ran out of Cuda work, and as there are 2300 or so VfingLAR's on the CPU, nothing left to reschedule here.

Thought you said you had 10 days worth for all the machines --- it's not really 10 days worth if they all don't have the right ratio ;-) Even on good days, the potential to have nothing but "V-to-the-f-Lars" exists, so I keep my ReScheduler slider over at 75% to compensate for the feast/famine nature of the tapes being split.

That the upload path is busted just amplifies things, but that could be weathered if there wasn't an artificial block on downloads. If this drags again, I'll take the half day hit to figure out how to set up a private boinc build absent that one line of code (greedy I know, but history seems to be repeating itself awfully quickly).

Even with a 10 day cache you are screwed if VLars are all that is going out. Same thing happened to me, kept getting more and more units and they were all mostly VLars. So I ran out. Mine was on a 4 day, now I will set for 10!
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 978983 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 978985 - Posted: 15 Mar 2010, 15:04:46 UTC - in response to Message 978747.  

The ratio of VLAR work being issued over the last few weeks has been rather high, so you can't reschedule what you can't get.

I totally hear ya. It's for that reason that a few weeks back I would occasionally post a ratio of REG/VLAR/VHAR/AP. No one much noticed, but it was a valuable data point for me after I had gotten burned with no cuda work left, so I tried to share. I'd put my toe in the water to sample the ratio with about 200 WU's, and if it was favorable, I'd open the floodgates, and if not, I'd try again a day later assuming I saw new tapes go live (while keeping my fingers crossed that the project wouldn't pick that day to keel over completely).

Given that approach above, my ratio is still about 60/40 this afternoon (and I can always rebalance back to CPU if necessary), hence I'm not feeling the same pain (yes, I know you eat way more than I do, but I crunch enough work that I believe the same method would scale). The downside is I have to remember to run re-scheduler about every other day (to balance a 3 day CPU queue and an artificially loaded 12 day queue for the GPU). Yes, this only works for me because we seldom get 7 days of nothing but VLARs, but all these techniques are really little more than increasing personal breathing room for times when the project itself is off balance somehow, and pretty much all you can buy yourself is about half the distance of the shortest issued WU's (which is 2 weeks, thus 1 week of air).

Would be a ton easier and far more timely if the Server Status page just gave the % as part of the 'Ready to send' figure (and it could be argued if the cuda code were revisited, maybe there'd be no need for rescheduler in the first place).

Actually, I was watching and setting my network on or off when the units were right. Start the thread over and I will watch.
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 978985 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 978988 - Posted: 15 Mar 2010, 15:07:54 UTC - in response to Message 978977.  

Welcome to Monday boys and girls.

It's 7:15AM in Berkeley right now. The SETI crew should be in in about an hour or so. Hope this is something they can get fixed quickly. I almost made it but my stash ran dry sometime last night. The poor servers are gonna get hammered when the floodgates do open today. Be patient, it may take awhile for them to get work out to us. It won't be long.

I think they'd do better concentrating on getting the existing work back in, validated, assimilated and scrubbed.

Not much point releasing yet more WUs until Dyno-Rod's attended and got all the pipelines flushed and running clear.

Might have been fixed already. I just had a WU finish, it got uploaded at the first attempt, Boinc asked for work once, and got a task.
I've clicked update on Seti Main and Beta, and they both got updated at the first attempt.
Just the small matter of lots of uploads next, then the fallout.

Claggy

I abused my buttons and got a few to go thru, hmmm can someone send me an auto button abuser program?
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 978988 · Report as offensive
Profile stephen Goodyer

Joined: 8 Oct 06
Posts: 37
Credit: 1,263,530
RAC: 3
United Kingdom
Message 978996 - Posted: 15 Mar 2010, 15:21:53 UTC - in response to Message 978988.  

The upload/download may be fixed, I just had 1 upload and 1 download.
With the upload I was getting an HTTP error, did you get the same?
ID: 978996 · Report as offensive
Profile 52 Aces
Joined: 7 Jan 02
Posts: 497
Credit: 14,261,068
RAC: 67
United States
Message 978998 - Posted: 15 Mar 2010, 15:24:47 UTC - in response to Message 978983.  

kept getting more and more units and they were all mostly VLars. So I ran out. Mine was on a 4 day, now I will set for 10!

Would seem the solution isn't a bigger wheelbarrow --- you'll just end up with 10 days of nothing but VLARs rather quickly. Assuming Uploads aren't blocking you (like they are today), a viable albeit manual work-around is to do a "before & after" test via Re-Scheduler on a small number of fresh downloads. You'll know rather immediately if you should be filling up the wheelbarrow or deferring to another time when Regs might be more plentiful (and generally within a tape or two, the mix will change significantly).

... or at least it's worked very well for me, but as they say, to each his own level of manual intervention ;-)
ID: 978998 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.