Monolith (Jun 14 2011)


Message boards : Technical News : Monolith (Jun 14 2011)

Author Message
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1384
Credit: 74,079
RAC: 0
United States
Message 1117102 - Posted: 14 Jun 2011, 23:06:13 UTC

Usual outage day. Project goes down, we squeeze and copy databases, project comes back up. It seems the MySQL replica is oddly unable to keep up anymore. I think the cause is our ridiculously consistent heavy load lately, which keeps the databases busier than normal. Anybody have any theories about what is causing the ridiculously consistent heavy load? What's also a little strange is that the CPU/IO load on jocelyn is low... so what's the bottleneck? I'd have to guess network, but it's copying the logs from the master faster than it's executing the SQL within those logs. So...?
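
One way to narrow down the "so what's the bottleneck?" question is to compare three binary-log positions: what the master has written, what the replica has fetched, and what the replica has applied. The sketch below is illustrative only and is not the project's tooling. It assumes the rows from `SHOW MASTER STATUS` and `SHOW SLAVE STATUS` have already been read into plain dicts, and that binlog files follow the usual `name.NNNNNN` naming; the function names and the sample values are hypothetical.

```python
def binlog_coords(log_file, log_pos):
    """Turn ('mysql-bin.000123', 4567) into a sortable (123, 4567) tuple.

    Assumes the file name ends in a numeric sequence after the last dot.
    """
    return int(log_file.rsplit(".", 1)[1]), int(log_pos)


def diagnose_replica(replica, master):
    """Classify replica lag from status rows passed in as dicts.

    `replica` uses SHOW SLAVE STATUS column names, `master` uses
    SHOW MASTER STATUS column names. The labels returned are made up
    for this sketch.
    """
    if replica["Slave_IO_Running"] != "Yes":
        return "io-thread-stopped"
    if replica["Slave_SQL_Running"] != "Yes":
        return "sql-thread-stopped"

    written = binlog_coords(master["File"], master["Position"])
    fetched = binlog_coords(replica["Master_Log_File"],
                            replica["Read_Master_Log_Pos"])
    applied = binlog_coords(replica["Relay_Master_Log_File"],
                            replica["Exec_Master_Log_Pos"])

    # Strict comparison for simplicity; on a live system the fetch position
    # is often a few bytes behind at the instant of sampling, so a real
    # check would allow a small tolerance here.
    if fetched < written:
        return "fetch-bound"   # logs are not arriving fast enough
    if applied < fetched:
        return "apply-bound"   # logs arrive, but the SQL lags behind
    return "caught-up"


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the project's servers.
    master = {"File": "mysql-bin.000412", "Position": 88000}
    replica = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Master_Log_File": "mysql-bin.000412",
        "Read_Master_Log_Pos": 88000,
        "Relay_Master_Log_File": "mysql-bin.000409",
        "Exec_Master_Log_Pos": 1200,
    }
    print(diagnose_replica(replica, master))
```

With the sample values, where the replica has fetched everything but applied little of it, the function returns "apply-bound", which matches the symptom of logs being copied faster than the SQL in them is executed.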

And speaking of high production loads I also just noticed we're low on work to split. Prepare for tonight to be a little rocky as files are slow to transfer up from the archives and get radar blanked before being splittable.

By the way, the Astropulse assimilators are off because the database table containing the signals had one of its fragments run out of extents. In layman's terms it reached an arbitrary limit that we'll now have to work around. We'll sort this out shortly.

Kepler data is here in a big ol' box and being archived down to HPSS. It sure is nice seeing the network graph for the whole lab going from a baseline of ~50 Mbits/sec to ~250 Mbits/sec when we started that procedure. Too bad we're still currently stuck using the HE connection for our uploads/downloads. Maybe someday that'll change.

Sorry my posts continue to be intermittent. I apologize, but expect things to get worse as the music career will temporarily consume me. You may see rather significant periods of silence from me for the next... I dunno... 6 to 12 months? I'm sure the others will chime in as needed if I'm not around.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8278
Credit: 45,027,417
RAC: 13,623
United Kingdom
Message 1117103 - Posted: 14 Jun 2011, 23:13:20 UTC - in response to Message 1117102.
Last modified: 14 Jun 2011, 23:20:44 UTC

Anybody have any theories about what is causing the ridiculously consistent heavy load?

Yes, you've been splitting practically nothing but "shorties" - very high angle range tasks, from a basketweave survey at Arecibo. Hang on, I'll get you the reference.

Edit - try my message 1112964. That covers most of it.

eaglescouter
Joined: 28 Dec 02
Posts: 162
Credit: 42,012,553
RAC: 106
United States
Message 1117115 - Posted: 14 Jun 2011, 23:54:26 UTC

Unable to upload results,

first box reports: project servers may be temporarily down
second box reports: Internal HTTP server error

help!

____________
It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :)

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 3966
Credit: 31,950,849
RAC: 13,492
United Kingdom
Message 1117120 - Posted: 14 Jun 2011, 23:59:36 UTC - in response to Message 1117102.

Thanks for the update Matt,

Claggy

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 13,760,141
RAC: 12,225
United States
Message 1117121 - Posted: 15 Jun 2011, 0:01:16 UTC - in response to Message 1117115.

Bear with it Eaglescouter, mine tried and got a "can't connect to server", then turned around a minute later and got right through. It's catch as catch can right now as everybody fills up after the outage.
____________


PROUD MEMBER OF Team Starfire World BOINC

Jeff Mercer
Joined: 14 Aug 08
Posts: 90
Credit: 156,677
RAC: 690
United States
Message 1117129 - Posted: 15 Jun 2011, 0:14:42 UTC


Thanks for the update Matt. So far, I'm running pretty well. I'm getting plenty of work, and so far, no problem uploading or downloading. Enjoy the music!! I play a little Hendrix at times. Takes my mind off of a lot of problems!! Thanks for sticking with the project. I appreciate your hard work.

[seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 6972
Credit: 57,214,519
RAC: 22,240
Germany
Message 1117154 - Posted: 15 Jun 2011, 1:23:24 UTC - in response to Message 1117102.

Matt, thanks for the news!


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
____________
BR



>Das Deutsche Cafe. The German Cafe.<

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 11732
Credit: 5,969,877
RAC: 0
United States
Message 1117156 - Posted: 15 Jun 2011, 1:42:39 UTC
Last modified: 15 Jun 2011, 1:43:39 UTC

Break a leg! Or is that only for actors?

Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 826
Credit: 46,574,872
RAC: 13,715
United States
Message 1117170 - Posted: 15 Jun 2011, 2:44:46 UTC - in response to Message 1117102.

It sure is nice seeing the network graph for the whole lab going from a baseline of ~50 Mbits/sec to ~250 Mbits/sec when we started that procedure. Too bad we're still currently stuck using the HE connection for our uploads/downloads. Maybe someday that'll change.


Thanks for the update Matt, keep up the good work. I'm glad someone/something opened the flood gates, even though it may not last long; downloads that usually crawl at 3.67Kb - 15Kb and take hours just shot up to 88Kb - 347Kb and take minutes.

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2460
Credit: 83,985,890
RAC: 29,951
United States
Message 1117185 - Posted: 15 Jun 2011, 3:17:59 UTC

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!
____________
Boinc....Boinc....Boinc....Boinc....

rob smith
Volunteer moderator
Joined: 7 Mar 03
Posts: 7706
Credit: 45,300,119
RAC: 80,172
United Kingdom
Message 1117233 - Posted: 15 Jun 2011, 7:53:48 UTC

My pet theory: retry times are too short for the current "shorty storm". In previous existences I've found that the retry rate can be very sensitive to the time-out; small changes in it can produce very substantial changes in the overall throughput of a system.
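
That sensitivity can be put in rough numbers. What follows is a back-of-envelope sketch, assuming every waiting client retries once per fixed interval and the server can accept a fixed number of requests per second. The client count, intervals, and capacity are made-up illustration values, not measurements from the project's servers, and real clients would not all share one fixed interval.

```python
def offered_rate(waiting_clients, retry_interval_s):
    """Requests per second generated when each waiting client retries
    once every `retry_interval_s` seconds."""
    return waiting_clients / retry_interval_s


def rejected_fraction(waiting_clients, retry_interval_s, capacity_rps):
    """Share of incoming requests the server has to turn away at steady
    state, given it can accept `capacity_rps` requests per second."""
    offered = offered_rate(waiting_clients, retry_interval_s)
    if offered <= capacity_rps:
        return 0.0
    return 1.0 - capacity_rps / offered


if __name__ == "__main__":
    clients = 30000      # hypothetical number of hosts waiting to connect
    capacity = 200       # hypothetical requests/second the server can accept
    for interval in (60, 120, 300, 600):
        rate = offered_rate(clients, interval)
        lost = rejected_fraction(clients, interval, capacity)
        print(f"retry every {interval:>3}s: {rate:7.1f} req/s offered, "
              f"{lost:.0%} rejected")
```

In this toy model the request rate is inversely proportional to the retry interval, so moving the interval from one side of the capacity threshold to the other is the difference between most connections being refused and none being refused.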


On a more human note, enjoy your music career, and when are you touring the UK?
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1384
Credit: 74,079
RAC: 0
United States
Message 1117406 - Posted: 15 Jun 2011, 17:43:55 UTC - in response to Message 1117185.

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

- Matt


____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37489
Credit: 503,015,803
RAC: 567,428
United States
Message 1117410 - Posted: 15 Jun 2011, 17:45:58 UTC - in response to Message 1117406.

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

- Matt


No chance some viral meanie has crept into the works?
____________
******************
Just a kittyman kinda guy.

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8278
Credit: 45,027,417
RAC: 13,623
United Kingdom
Message 1117425 - Posted: 15 Jun 2011, 18:22:58 UTC - in response to Message 1117406.

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

What numbers would those be, Matt? Maybe we can help, looking at it from this end?

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13326
Credit: 27,954,421
RAC: 16,143
United States
Message 1117426 - Posted: 15 Jun 2011, 18:23:54 UTC - in response to Message 1117410.
Last modified: 15 Jun 2011, 18:24:22 UTC

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

- Matt


No chance some viral meanie has crept into the works?


But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling.

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 13,760,141
RAC: 12,225
United States
Message 1117441 - Posted: 15 Jun 2011, 19:18:18 UTC

Did you find the bottleneck? I just got a herd of downloads and they are coming at me fast and furious!

Whatever it was, great job guys!
____________


PROUD MEMBER OF Team Starfire World BOINC

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1833
Credit: 7,617,636
RAC: 18,654
United States
Message 1117498 - Posted: 15 Jun 2011, 21:46:32 UTC - in response to Message 1117426.
Last modified: 15 Jun 2011, 21:49:08 UTC

[snip]

But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling.


*nix is not immune to virii, but few people write viruses for *nix as the damage would be limited - and not as many people are P----d off at Linix or Unix due to them being almost free of cost, as opposed to M$ Windoze... but we're getting off topic...
____________
.

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 13,760,141
RAC: 12,225
United States
Message 1117504 - Posted: 15 Jun 2011, 21:56:10 UTC

Something wrong with this batch of work units. I'm getting a ton of -9s. Was afraid it might be me but they are starting to validate against all types of other machines.
____________


PROUD MEMBER OF Team Starfire World BOINC

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13326
Credit: 27,954,421
RAC: 16,143
United States
Message 1117548 - Posted: 15 Jun 2011, 23:20:35 UTC - in response to Message 1117498.

[snip]
But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling.


*nix is not immune to virii, but few people write viruses for *nix as the damage would be limited - and not as many people are P----d off at Linix or Unix due to them being almost free of cost, as opposed to M$ Windoze... but we're getting off topic...


Very far off topic. And the plural of virus is viruses. And Windows is spelled with an "ows", much like Linux isn't spelled with an "s" as in Linsux. And I don't think that people are pissed off with Windows because it's not free. People don't write viruses for Linux because its small user base isn't worth the amount of effort it takes to break it.

eaglescouter
Joined: 28 Dec 02
Posts: 162
Credit: 42,012,553
RAC: 106
United States
Message 1117567 - Posted: 16 Jun 2011, 0:05:16 UTC - in response to Message 1117121.

Bear with it Eaglescouter, mine tried and got a "can't connect to server", then turned around a minute later and got right through. It's catch as catch can right now as everybody fills up after the outage.


I'm still here. Today my machines are unable to upload completed work.
"Project servers may be temporarily down"

____________
It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :)


Copyright © 2014 University of California