Monolith (Jun 14 2011)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1117102 - Posted: 14 Jun 2011, 23:06:13 UTC

Usual outage day. Project goes down, we squeeze and copy databases, project comes back up. Oddly, the MySQL replica can no longer keep up. I think the cause is our ridiculously consistent heavy load lately, which keeps the databases busier than normal. Anybody have any theories about what is causing the ridiculously consistent heavy load? What's also a little strange is that the CPU/IO load on jocelyn is low... so what's the bottleneck? I'd have to guess network, but it's copying the logs from the master faster than it's executing the SQL within those logs. So...?
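
The last observation (logs arrive from the master faster than the replica executes them) matches a replica whose I/O thread keeps up while its SQL thread lags. Below is a minimal Python sketch of that comparison, assuming the binary log positions come from MySQL's SHOW MASTER STATUS and SHOW SLAVE STATUS output. The function name and the sample values are hypothetical and are not readings from jocelyn.

```python
def replica_lag_report(master_file, master_pos, slave_status):
    """Report which replication thread is behind.

    master_file / master_pos: File and Position from SHOW MASTER STATUS.
    slave_status: dict of fields from SHOW SLAVE STATUS on the replica.
    Assumes binlog file names are zero-padded (e.g. mysql-bin.000042),
    so plain string comparison orders them correctly.
    """
    # How far the master has written its binary log.
    written = (master_file, master_pos)
    # How far the replica's I/O thread has copied that log.
    fetched = (slave_status["Master_Log_File"],
               slave_status["Read_Master_Log_Pos"])
    # How far the replica's SQL thread has executed it.
    executed = (slave_status["Relay_Master_Log_File"],
                slave_status["Exec_Master_Log_Pos"])
    return {
        "io_thread_behind": fetched < written,    # copying is the bottleneck
        "sql_thread_behind": executed < fetched,  # executing is the bottleneck
    }


# Hypothetical numbers: the log is fully copied but execution trails.
example = {
    "Master_Log_File": "mysql-bin.000042",
    "Read_Master_Log_Pos": 9000,
    "Relay_Master_Log_File": "mysql-bin.000040",
    "Exec_Master_Log_Pos": 120,
}
print(replica_lag_report("mysql-bin.000042", 9000, example))
```

If only sql_thread_behind comes back true, the network copy is keeping pace and the delay is in applying the statements, which would point away from the network as the bottleneck.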

And speaking of high production loads, I also just noticed we're low on work to split. Prepare for tonight to be a little rocky, as files are slow to transfer up from the archives and get radar blanked before being splittable.

By the way, the Astropulse assimilators are off because the database table containing the signals had one of its fragments run out of extents. In layman's terms it reached an arbitrary limit that we'll now have to work around. We'll sort this out shortly.

Kepler data is here in a big ol' box and being archived down to HPSS. It sure is nice seeing the network graph for the whole lab going from a baseline of ~50 Mbits/sec to ~250 Mbits/sec when we started that procedure. Too bad we're still currently stuck using the HE connection for our uploads/downloads. Maybe someday that'll change.

Sorry my posts continue to be intermittent. I apologize, but expect things to get worse as the music career will temporarily consume me. You may see rather significant periods of silence from me for the next... I dunno... 6 to 12 months? I'm sure the others will chime in as needed if I'm not around.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1117102
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1117103 - Posted: 14 Jun 2011, 23:13:20 UTC - in response to Message 1117102.  
Last modified: 14 Jun 2011, 23:20:44 UTC

Anybody have any theories about what is causing the ridiculously consistent heavy load?

Yes, you've been splitting practically nothing but "shorties" - very high angle range tasks, from a basketweave survey at Arecibo. Hang on, I'll get you the reference.

Edit - try my message 1112964. That covers most of it.
ID: 1117103
eaglescouter
Joined: 28 Dec 02
Posts: 162
Credit: 42,012,553
RAC: 0
United States
Message 1117115 - Posted: 14 Jun 2011, 23:54:26 UTC

Unable to upload results,

first box reports: project servers may be temporarily down
second box reports: Internal HTTP server error

help!

It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :)
ID: 1117115
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1117120 - Posted: 14 Jun 2011, 23:59:36 UTC - in response to Message 1117102.  

Thanks for the update Matt,

Claggy
ID: 1117120
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1117121 - Posted: 15 Jun 2011, 0:01:16 UTC - in response to Message 1117115.  

Bear with it Eaglescouter, mine tried and got a "can't connect to server", then turned around a minute later and got right through. It's catch-as-catch-can right now as everybody fills up after the outage.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1117121
Jeff Mercer
Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1117129 - Posted: 15 Jun 2011, 0:14:42 UTC


Thanks for the update Matt. So far, I'm running pretty well. I'm getting plenty of work and, so far, no problems uploading or downloading. Enjoy the music!! I play a little Hendrix at times. Takes my mind off a lot of problems!! Thanks for sticking with the project. I appreciate your hard work.
ID: 1117129
Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1117154 - Posted: 15 Jun 2011, 1:23:24 UTC - in response to Message 1117102.  

Matt, thanks for the news!


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
ID: 1117154
Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 31015
Credit: 53,134,872
RAC: 32
United States
Message 1117156 - Posted: 15 Jun 2011, 1:42:39 UTC
Last modified: 15 Jun 2011, 1:43:39 UTC

Break a leg! Or is that only for actors?
ID: 1117156
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1117170 - Posted: 15 Jun 2011, 2:44:46 UTC - in response to Message 1117102.  

It sure is nice seeing the network graph for the whole lab going from a baseline of ~50 Mbits/sec to ~250 Mbits/sec when we started that procedure. Too bad we're still currently stuck using the HE connection for our uploads/downloads. Maybe someday that'll change.


Thanks for the update Matt, keep up the good work. I'm glad someone/something opened the floodgates, even though it may not last long: downloads that usually crawl along at 3.67Kb - 15Kb and take hours just shot up to 88Kb - 347Kb and take minutes.
ID: 1117170
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1117185 - Posted: 15 Jun 2011, 3:17:59 UTC

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!
Boinc....Boinc....Boinc....Boinc....
ID: 1117185
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22540
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1117233 - Posted: 15 Jun 2011, 7:53:48 UTC

My pet theory - re-try times are too short for the current "shorty storm". In previous existences I've found that the re-try rate can be very sensitive to the time-out time; small changes in it can produce very substantial changes in the overall throughput of a system.
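
To make that sensitivity concrete, here is a rough Python sketch of the arithmetic: the average request rate produced by clients that retry on a fixed interval, plus a simple exponential backoff for comparison. Both function names and all the numbers are illustrative assumptions; this is not how the BOINC client or the SETI@home servers actually schedule retries.

```python
def retry_request_rate(stalled_clients, retry_interval_s):
    """Average requests per second from clients retrying on a fixed interval."""
    if retry_interval_s <= 0:
        raise ValueError("retry_interval_s must be positive")
    return stalled_clients / retry_interval_s


def backoff_interval(attempt, base_s=60.0, cap_s=3600.0):
    """Exponential backoff: double the wait after each failed attempt, up to cap_s."""
    return min(cap_s, base_s * (2 ** attempt))


# Hypothetical figures: 6000 stalled clients.
print(retry_request_rate(6000, 60))  # retry every 60 s
print(retry_request_rate(6000, 30))  # halving the interval doubles the load
print([backoff_interval(n) for n in range(5)])
```

In this simple model the load on the server scales inversely with the retry interval, so a small reduction in the interval translates directly into a proportional increase in requests hitting an already busy server.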


On a more human note, enjoy your music career, and when are you touring the UK?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1117233
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1117406 - Posted: 15 Jun 2011, 17:43:55 UTC - in response to Message 1117185.  

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

- Matt


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1117406
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1117410 - Posted: 15 Jun 2011, 17:45:58 UTC - in response to Message 1117406.  

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

- Matt


No chance some viral meanie has crept into the works?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1117410
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1117425 - Posted: 15 Jun 2011, 18:22:58 UTC - in response to Message 1117406.  

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

What numbers would those be, Matt? Maybe we can help, looking at it from this end?
ID: 1117425
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1117426 - Posted: 15 Jun 2011, 18:23:54 UTC - in response to Message 1117410.  
Last modified: 15 Jun 2011, 18:24:22 UTC

I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!

Of course we know shorties are a major problem, but some other numbers just aren't adding up...

- Matt


No chance some viral meanie has crept into the works?


But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling.
ID: 1117426
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1117441 - Posted: 15 Jun 2011, 19:18:18 UTC

Did you find the bottleneck? I just got a herd of downloads and they are coming at me fast and furious!

Whatever it was, great job guys!


PROUD MEMBER OF Team Starfire World BOINC
ID: 1117441
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1117498 - Posted: 15 Jun 2011, 21:46:32 UTC - in response to Message 1117426.  
Last modified: 15 Jun 2011, 21:49:08 UTC

[snip]
But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling.


*nix is not immune to virii, but few people write viruses for *nix as the damage would be limited - and not as many people are P----d off at Linux or Unix, since they are almost free of cost, as opposed to M$ Windoze... but we're getting off topic...
.

Hello, from Albany, CA!...
ID: 1117498
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1117504 - Posted: 15 Jun 2011, 21:56:10 UTC

Something wrong with this batch of work units. I'm getting a ton of -9s. Was afraid it might be me but they are starting to validate against all types of other machines.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1117504
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1117548 - Posted: 15 Jun 2011, 23:20:35 UTC - in response to Message 1117498.  

[snip]
But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling.


*nix is not immune to virii, but few people write viruses for *nix as the damage would be limited - and not as many people are P----d off at Linux or Unix, since they are almost free of cost, as opposed to M$ Windoze... but we're getting off topic...


Very far off topic. And the plural of virus is viruses. And Windows is spelled with an "ows", much like Linux isn't spelled with an "s" as in Linsux. And I don't think that people are pissed off with Windows because it's not free. People don't write viruses for Linux because the small user base isn't worth the effort of breaking it.
ID: 1117548
eaglescouter
Joined: 28 Dec 02
Posts: 162
Credit: 42,012,553
RAC: 0
United States
Message 1117567 - Posted: 16 Jun 2011, 0:05:16 UTC - in response to Message 1117121.  

Bear with it Eaglescouter, mine tried and got a "can't connect to server", then turned around a minute later and got right through. It's catch-as-catch-can right now as everybody fills up after the outage.


I'm still here. Today my machines are unable to upload completed work.
"Project servers may be temporarily down"

It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :)
ID: 1117567


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.