Panic Mode On (96) Server Problems?

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1649190 - Posted: 4 Mar 2015, 18:00:38 UTC

Hi all - just want to post here real quick that the latest round of headaches (unresponsive schedulers, web site, etc.) has been due to our MySQL database (the user/web database) getting into an overly busy state and thus becoming unresponsive. We're still not exactly sure what brought this over the edge, as it hasn't really grown in size, but we are working through every new bottleneck to help peel this apart. A few days ago I found one new-ish query coming from the web page that wasn't particularly efficient and was gumming things up. Dave fixed that. Now I'm just finding the daily stats dump slowing things down. Why? I'm not sure, but I am moving that to the replica for now.

Sorry for the inconvenience and lack of responses about this. I should get back in the habit of regular tech news updates. I've fallen out of the habit due to my erratic schedule the past few years (though I am currently back to working full time, at least the first half of 2015).

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 1649190
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1649191 - Posted: 4 Mar 2015, 18:00:46 UTC - in response to Message 1649184.  
Last modified: 4 Mar 2015, 18:02:34 UTC

I have long held the opinion that the project lacks effective communication with the crunchers, and I am told by people here in the forum that it's a time issue. Would it really take more than a minute to place an item on the home page saying "We are aware of upload/download issues, we are working on it"?

There would probably be even more complaints on these boards if the message read "We are aware of upload/download issues, but we are *not* working on it, because we're working on the jobs our day work contracts pay us to do."

Edit - oops, cross-post with Matt (hi, Matt) - but I'll let it stand as a general, non-specific, comment.
ID: 1649191
Werecow
Joined: 13 Mar 05
Posts: 56
Credit: 4,917,657
RAC: 3
United States
Message 1649195 - Posted: 4 Mar 2015, 18:13:13 UTC

Thanks for the update, Matt.


W'cow
ID: 1649195
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1649196 - Posted: 4 Mar 2015, 18:15:08 UTC

Thanks Matt.
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1649196
Herb Smith
Volunteer tester

Joined: 28 Jan 07
Posts: 76
Credit: 31,615,205
RAC: 0
United States
Message 1649199 - Posted: 4 Mar 2015, 18:17:00 UTC - in response to Message 1649191.  

If they at least posted that they were NOT working on things, I could just turn off my machines. As much as I hated my boss standing over me micromanaging when I was troubleshooting outages, I knew we had to take a few moments to communicate to everyone else what was happening. Common courtesy.

For a project like this an acknowledgement within the first hour. A more complete description of the problem within a day. And progress reports every few days. Certainly a lot better than the 10 minute acknowledgement and hourly updates I lived with. These updates are just simple notes. Takes maybe a minute or two.

And maybe somebody out here will recognize the issue and be able to help solve it. I am sure all the MySQL experts are thinking of possibilities right now.

Herb
ID: 1649199
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1649203 - Posted: 4 Mar 2015, 18:22:08 UTC

And "Wham bam thank you ma'am" My host reported 217 tasks and got 108 new ones :D

Thank You Matt :D
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1649203
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1649205 - Posted: 4 Mar 2015, 18:26:39 UTC - in response to Message 1649203.  

Finally able to report.
Now if only I could get some new work before it's all gone again & the splitters can't meet demand. (Yes, there are presently 1.42 million results ready-to-send; however, work in progress is down more than a million. Given the poor splitting rate, and people returning work as they receive & crunch it, it won't be that long before we're out of ready-to-send work & have to rely on what the splitters are putting out, i.e. not much.)
Grant
Darwin NT
ID: 1649205
Profile ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1649207 - Posted: 4 Mar 2015, 18:29:13 UTC
Last modified: 4 Mar 2015, 18:31:35 UTC

Thank you Matt.
Your update means a lot to me.
Now I will work on my hardware a bit and get ready for more WU!!
ID: 1649207
Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34053
Credit: 18,883,157
RAC: 18
Belgium
Message 1649220 - Posted: 4 Mar 2015, 19:00:15 UTC

Thanx for the update Matt :)

Forums are running smooth now.
rOZZ
Music
Pictures
ID: 1649220
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1649256 - Posted: 4 Mar 2015, 19:48:59 UTC - in response to Message 1649190.  

Thanks for the update Matt,

Claggy
ID: 1649256
Chris Oliver Project Donor
Joined: 4 Jul 99
Posts: 72
Credit: 134,288,250
RAC: 15
United Kingdom
Message 1649257 - Posted: 4 Mar 2015, 19:56:10 UTC - in response to Message 1649191.  

At least I would know to switch off !!!

Richard Haselgrove wrote:
There would probably be even more complaints on these boards if the message read "We are aware of upload/download issues, but we are *not* working on it, because we're working on the jobs our day work contracts pay us to do."
ID: 1649257
Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1649272 - Posted: 4 Mar 2015, 20:21:39 UTC

Thanks for the update Matt, it's greatly appreciated ! Keep up the good work, we'll stick around...
ID: 1649272
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1649289 - Posted: 4 Mar 2015, 21:01:23 UTC

Web pages are properly loading and tasks seem to be reporting, now we just have to get new work and AP up. It is likely I will be crunching more Einstein.
ID: 1649289
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1649325 - Posted: 4 Mar 2015, 22:24:21 UTC - in response to Message 1649273.  

2,315,978 Results ready to send. Not one of them getting to my main computer. The two tasks my Laptop got, must have been a freak incident.

Validation of completed tasks is so slow, that it will take a year to validate my completed tasks.

Icicles forming on my GPU now, not good at all....

Tuesday outages are turning into something more like Tuesday-to-Friday outages.

I'm not receiving many either. There was a brief spurt of around 25 to 2 machines but 1 machine only received 1 task. Now the 2 machines are about out again while the 1 machine finally did receive 37. Likewise the 'Results received in last hour' peaked a little above 100k but is now back down to 41,893. Seems not many of those 2,229,483 are making it to working machines. I wonder where all that data is going. I'm also not seeing many validations. I keep looking for my old CPU tasks to validate but not many have.
ID: 1649325
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1649327 - Posted: 4 Mar 2015, 22:31:20 UTC
Last modified: 4 Mar 2015, 22:31:53 UTC

Give the servers some time to catch up.
Lots of hosts requesting work right now.

Thanks Matt for the update.


With each crime and every kindness we birth our future.
ID: 1649327
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1649335 - Posted: 4 Mar 2015, 22:37:54 UTC

Thanks for the update Matt. :-)

Well my main rig did do a dip into CPU backup work, probably about 5mins before things started to move here (as usual), but it's still very hard to get much work.

Cheers.
ID: 1649335
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1649346 - Posted: 4 Mar 2015, 22:53:50 UTC

Very much luck of the draw, my two identical rigs were both out of GPU work, one got 87 new tasks about half an hour ago, the other got 20 a couple of minutes ago!
ID: 1649346
The_bestest

Joined: 7 Oct 06
Posts: 36
Credit: 82,706,887
RAC: 79
United States
Message 1649356 - Posted: 4 Mar 2015, 23:37:00 UTC - in response to Message 1649191.  

Matt's update was fine, and that's really all we can hope for. The issue is finding the update. It is extremely difficult to find a specific post in a specific thread on a specific subject. If Matt's update were on the home page, then people would learn to go to that specific location for the most current update. The exact same amount of time would be required; only the location of the update changes.

To say that would create more complaints is in my opinion silly. It is the uncertainty that I believe creates more issues. If the updates are to be kept in the discussion forums, then create a category that ONLY the admins can post to. No comments, no nuthin posted there by anyone else.
ID: 1649356
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1649357 - Posted: 4 Mar 2015, 23:50:13 UTC - in response to Message 1649356.  

Matt's update was fine, and that's really all we can hope for. The issue is finding the update. It is extremely difficult to find a specific post in a specific thread on a specific subject. If Matt's update were on the home page, then people would learn to go to that specific location for the most current update. The exact same amount of time would be required; only the location of the update changes.

To say that would create more complaints is in my opinion silly. It is the uncertainty that I believe creates more issues. If the updates are to be kept in the discussion forums, then create a category that ONLY the admins can post to. No comments, no nuthin posted there by anyone else.

Which is close to what the Technical News forum is designed for, and I for one would welcome Matt's resumption of a narrative there. TN allows a degree of interactivity, which I think is useful - there have been times when the Q & A dialog following an initial 'news' post has been even more informative than the opening item. But I must confess I'm less enthusiastic about the incontinent outpourings of 'thank you' that follow any new thread. And as for posters that hit the top link on any random page for their outpourings - yet again, 'read before you post', guys.
ID: 1649357
Profile xpozd
Joined: 26 Jan 15
Posts: 88
Credit: 280,183
RAC: 1
Canada
Message 1649358 - Posted: 4 Mar 2015, 23:51:47 UTC - in response to Message 1649349.  

I'm still getting the huge amount of "NO TASKS AT ALL".


Same thing here; it's all I've been seeing since Monday's maintenance.
ID: 1649358


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.