Monolith (Jun 14 2011)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 | Usual outage day. Project goes down, we squeeze and copy databases, project comes back up. It seems the MySQL replica is oddly unable to keep up anymore. I think the cause is our ridiculously consistent heavy load lately, which keeps the databases busier than normal. Anybody have any theories about what is causing the ridiculously consistent heavy load? What's also a little strange is the CPU/IO load on jocelyn is low... so what's the bottleneck? I'd have to guess network, but it's copying the logs from the master faster than it's executing the SQL within those logs. So...? And speaking of high production loads, I also just noticed we're low on work to split. Prepare for tonight to be a little rocky as files are slow to transfer up from the archives and get radar blanked before being splittable. By the way, the Astropulse assimilators are off because the database table containing the signals had one of its fragments run out of extents. In layman's terms it reached an arbitrary limit that we'll now have to work around. We'll sort this out shortly. Kepler data is here in a big ol' box and being archived down to HPSS. It sure is nice seeing the network graph for the whole lab going from a baseline of ~50 Mbits/sec to ~250 Mbits/sec when we started that procedure. Too bad we're still currently stuck using the HE connection for our uploads/downloads. Maybe someday that'll change. Sorry my posts continue to be intermittent. I apologize but expect things to get worse as the music career will temporarily consume me. You may see rather significant periods of silence from me for the next... I dunno... 6 to 12 months? I'm sure the others will chime in as needed if I'm not around. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 | "Anybody have any theories about what is causing the ridiculously consistent heavy load?" Yes, you've been splitting practically nothing but "shorties" - very high angle range tasks, from a basketweave survey at Arecibo. Hang on, I'll get you the reference. Edit - try my message 1112964. That covers most of it. |
eaglescouter Joined: 28 Dec 02 Posts: 162 Credit: 42,012,553 RAC: 0 | Unable to upload results. First box reports: "project servers may be temporarily down". Second box reports: "Internal HTTP server error". Help! It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :) |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 | Thanks for the update Matt, Claggy |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 | Bear with it Eaglescouter, mine tried and got a "can't connect to server", then turned around a minute later and got right through. It's catch as catch can right now as everybody fills up after the outage. PROUD MEMBER OF Team Starfire World BOINC |
Jeff Mercer Joined: 14 Aug 08 Posts: 90 Credit: 162,139 RAC: 0 | Thanks for the update Matt. So far, I'm running pretty good. I'm getting plenty of work, and so far, no problem uploading or downloading. Enjoy the music!! I play a little Hendrix at times. Takes my mind off of a lot of problems!! Thanks for sticking with the project. I appreciate your hard work. |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 | Matt, thanks for the news! - Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. - |
Gary Charpentier Joined: 25 Dec 00 Posts: 30925 Credit: 53,134,872 RAC: 32 | Break a leg! Or is that only for actors? |
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 | "It sure is nice seeing the network graph for the whole lab going from a baseline of ~50 Mbits/sec to ~250 Mbits/sec when we started that procedure. Too bad we're still currently stuck using the HE connection for our uploads/downloads. Maybe someday that'll change." Thanks for the update Matt, keep up the good work. I'm glad someone/something opened the flood gates, even though it may not last long. Downloads that were usually moving at 3.67Kb - 15Kb and taking hours just shot up to 88Kb - 347Kb and take minutes. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 | I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not! Boinc....Boinc....Boinc....Boinc.... |
rob smith Joined: 7 Mar 03 Posts: 22444 Credit: 416,307,556 RAC: 380 | My pet theory - re-try times are too short for the current "shorty storm". In previous existences I've found that the re-try rate can be very sensitive to the time-out time; small changes in it can cause very substantial changes in the overall throughput of a system. On a more human note, enjoy your music career, and when are you touring the UK? Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 | "I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!" Of course we know shorties are a major problem, but some other numbers just aren't adding up... - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 | "I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!" No chance some viral meanie has crept into the works? "Time is simply the mechanism that keeps everything from happening all at once." |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 | "I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!" What numbers would those be, Matt? Maybe we can help, looking at it from this end? |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 | "I would have thought that Matt and the rest of the project staff KNEW they were sending out nothing but shorties. Guess not!" But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 | Did you find the bottleneck? I just got a herd of downloads and they are coming at me fast and furious! Whatever it was, great job guys! PROUD MEMBER OF Team Starfire World BOINC |
KWSN THE Holy Hand Grenade! Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 | [snip] "But of course not! They're running *nix which eradicated viruses long ago, when the Earth was still cooling." *nix is not immune to virii, but few people write viruses for *nix as the damage would be limited - and not as many people are P----d off at Linix or Unix due to them being almost free of cost, as opposed to M$ Windoze... but we're getting off topic... Hello, from Albany, CA!... |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 | Something wrong with this batch of work units. I'm getting a ton of -9s. Was afraid it might be me but they are starting to validate against all types of other machines. PROUD MEMBER OF Team Starfire World BOINC |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 | [snip] Very far off topic. And the plural of virus is viruses. And Windows is spelled with an "ows" much like Linux isn't spelled with an "s" as in Linsux. And I don't think that people are pissed off with Windows because it's not free. People don't write viruses for Linux because it's not worth the small user base to put the amount of effort into breaking it. |
eaglescouter Joined: 28 Dec 02 Posts: 162 Credit: 42,012,553 RAC: 0 | "Bear with it Eaglescouter, mine tried and got a can't connect to server then turned around a minute later and got right through. It's catch as catch can right now as everybody fills up after the outage." I'm still here. Today my machines are unable to upload completed work: "Project servers may be temporarily down". It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.