Panic Mode On (80) Server Problems?

Message boards : Number crunching : Panic Mode On (80) Server Problems?
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1325670 - Posted: 8 Jan 2013, 3:40:15 UTC - in response to Message 1325661.  

- the root cause is a very poorly thought through and implemented download scheduling algorithm that can do nothing apart from cause problems.

Iz that inside info ?
If they have found another gremlin with its neck stuck in a bottle,
the hIT men can have a smashing time with it,
unless the Trainman says otherwize :(


I don't know what's broken, but I have stopped the scheduler from sending me any more AP tasks until this gets cleared up. I have 10 on one machine, 7 of which have been sitting here since yesterday, and the scheduler won't send any more because the tasks are stuck. There are 23 stuck on the other one that are holding up 99 other tasks from d/l'ing. Since everything here is being held up by the APs, I'm seriously considering aborting them just to get some work. The retry buttons on both machines are in severe pain because I am abusing them so much.


I don't buy computers, I build them!!
ID: 1325670 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1325673 - Posted: 8 Jan 2013, 4:30:25 UTC - in response to Message 1325623.  

The current round of sick download performance was happening before the APs started to move, so you can forget them as the culprit - the root cause is a very poorly thought through and implemented download scheduling algorithm that can do nothing apart from cause problems.

?
My hosts file is set to use the one server that behaves OK. Even that has been struggling since the network traffic maxed out. Before the network traffic maxed out, downloads were coming down quickly.
With the severe WU limits in place, and the reduced availability of AP work, it was possible that the network traffic would back off. For whatever reason, that hasn't happened.
Grant
Darwin NT
ID: 1325673 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1325715 - Posted: 8 Jan 2013, 6:32:41 UTC

You were the lucky one. Before the servers maxed out I was lucky to get any work, and when I did, an MB was taking about 30 minutes to download, with numerous re-tries. Now that the servers are maxed out, MBs are taking either 1 minute or, in the majority of cases, 30 minutes with many re-tries.
In "normal" times an MB takes about 10 seconds, and an AP a couple of minutes, with only the odd re-try - even when the servers are maxed those times only double.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1325715 · Report as offensive
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1325819 - Posted: 8 Jan 2013, 16:20:24 UTC
Last modified: 8 Jan 2013, 16:20:45 UTC

Some weeks here at Seti the work units just fly from the downloader, AP included. Then we get weeks where they crawl at snail speed, if you're lucky.
I see the GPU group has another drive for two new servers. I'm just wondering if that will help. After all, if you're trying to push 10" diameter of water thru a 1" diameter pipe, how effective will that be?

Is it hardware or software? or a combination of both.

Old James
ID: 1325819 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1325823 - Posted: 8 Jan 2013, 16:36:16 UTC - in response to Message 1325819.  
Last modified: 8 Jan 2013, 16:36:45 UTC



Is it hardware or software? or a combination of both.



My personal belief is that nobody knows.

My fear is that nobody cares.

My annoyance is that nobody seems to be trying to find-out (just for fun).

My response is to "abandon all hope."
ID: 1325823 · Report as offensive
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1325937 - Posted: 9 Jan 2013, 3:28:14 UTC - in response to Message 1325823.  


Is it hardware or software? or a combination of both.

My personal belief is that nobody knows.
My fear is that nobody cares.
My annoyance is that nobody seems to be trying to find-out (just for fun).
My response is to "abandon all hope."


Every now and then it sure does feel like it. Then eventually lucidity gets the better of me and I realize that (as far as my little cruncher is concerned) Seti enjoys 100% uptime. Well, at least for the past 2 years it has...

Now obviously to make a statement like that I must be bending the rules a little bit. And by that I mean I'm not counting a couple weeks of server replacements or a multi day power outage or any other scenario the Lab has no control over (though I can't remember any others in the 2yr timeframe mentioned).

Now I know this is no consolation to anyone with a PC quick enough to burn through their cache in a matter of hours. But I'd be lying if I said my little netbook hasn't been crunching pretty much non-stop for a fraction over 24 months now.

All that means is, a lot of people must be doing something right. And it's not just the people in the lab that care. What about Richard who's always looking out for us? What about Brad who just posted a wall-of-text of accomplishments for the lab? That's just a couple of the people going above and beyond so things run smoothly for us. But not a definitive list of people who care. What about the Raistmers and the Jasons? The Joes, Claggys and Mikes? I'm sure that's just the tip of the iceberg. In fact I think the reality is that everybody here cares for the project.
ID: 1325937 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1326112 - Posted: 9 Jan 2013, 17:36:10 UTC - in response to Message 1325937.  


Is it hardware or software? or a combination of both.

My personal belief is that nobody knows.
My fear is that nobody cares.
My annoyance is that nobody seems to be trying to find-out (just for fun).
My response is to "abandon all hope."


Every now and then it sure does feel like it. Then eventually lucidity gets the better of me and I realize that (as far as my little cruncher is concerned) Seti enjoys 100% uptime. Well, at least for the past 2 years it has...

Now obviously to make a statement like that I must be bending the rules a little bit. And by that I mean I'm not counting a couple weeks of server replacements or a multi day power outage or any other scenario the Lab has no control over (though I can't remember any others in the 2yr timeframe mentioned).

Now I know this is no consolation to anyone with a PC quick enough to burn through their cache in a matter of hours. But I'd be lying if I said my little netbook hasn't been crunching pretty much non-stop for a fraction over 24 months now.

All that means is, a lot of people must be doing something right. And it's not just the people in the lab that care. What about Richard who's always looking out for us? What about Brad who just posted a wall-of-text of accomplishments for the lab? That's just a couple of the people going above and beyond so things run smoothly for us. But not a definitive list of people who care. What about the Raistmers and the Jasons? The Joes, Claggys and Mikes? I'm sure that's just the tip of the iceberg. In fact I think the reality is that everybody here cares for the project.


So that my comment isn't misunderstood as being snarky: I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues. I didn't even go so far as to say, my "observation" is that nobody cares.

It is plain that a great number of people care a lot about this project.

My slowest cruncher has only run out of work once in many months. Your point is well taken.

ID: 1326112 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1326191 - Posted: 9 Jan 2013, 21:38:27 UTC

I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues.


I do not believe that Matt, Jeff and Eric "don't care". Sorry, that is going too far.
I don't believe they have the time and resources to do what they want, but to say the guys in the lab "don't care" is to me an insult.
ID: 1326191 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1639
Credit: 12,921,799
RAC: 89
New Zealand
Message 1326222 - Posted: 9 Jan 2013, 22:19:58 UTC - in response to Message 1326191.  

but to say the guys in the lab "don't care" is to me an insult.

+1
ID: 1326222 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1326371 - Posted: 10 Jan 2013, 15:37:30 UTC - in response to Message 1326191.  

I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues.


I do not believe that Matt, Jeff and Eric "don't care". Sorry, that is going too far.
I don't believe they have the time and resources to do what they want, but to say the guys in the lab "don't care" is to me an insult.

I think you're still not understanding him. He didn't say they don't care, he said his fear is that they don't care. There's a difference between a fear and a belief.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1326371 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1326412 - Posted: 10 Jan 2013, 17:59:21 UTC - in response to Message 1326371.  

I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues.


I do not believe that Matt, Jeff and Eric "don't care". Sorry, that is going too far.
I don't believe they have the time and resources to do what they want, but to say the guys in the lab "don't care" is to me an insult.

I think you're still not understanding him. He didn't say they don't care, he said his fear is that they don't care. There's a difference between a fear and a belief.

I understand the words, but why even hint that perhaps no one cares? Because that is what that statement suggests. Anyone who has been around these boards should realise that "a lack of caring" is NOT the problem, and it is disingenuous to even suggest it. Also, coming from someone with a total of 95,733,064 and running 13 machines, it just strikes the wrong note with me.
ID: 1326412 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1326414 - Posted: 10 Jan 2013, 18:14:03 UTC - in response to Message 1326371.  
Last modified: 10 Jan 2013, 18:14:37 UTC

I think you're still not understanding him. He didn't say they don't care, he said his fear is that they don't care. There's a difference between a fear and a belief.

Belief or fear makes no difference - he is implying that they don't care.
Of course, if he were to read the donation thread he'd see that there are plans to address a couple of the major problems the project has. That they've gone to the effort to inform the fund raisers of their requirements would indicate to most people that they do care.
Grant
Darwin NT
ID: 1326414 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1326415 - Posted: 10 Jan 2013, 18:21:52 UTC

I think I'm going to assume that the fact that you've got nothing better to do than argue semantic points from a post eight messages, two days and a maintenance outage ago is a good thing.

It means that the project has (very quietly) started running so smoothly that you've all got nothing better to panic about :P
ID: 1326415 · Report as offensive
QuietDad
Joined: 2 Oct 99
Posts: 83
Credit: 28,926,603
RAC: 59
United States
Message 1326437 - Posted: 10 Jan 2013, 19:30:29 UTC

What some forget is that the original idea for distributed computing was that we donate our spare computer cycles for scientists to use for research purposes. This was NEVER intended to be a "game" for points.
ID: 1326437 · Report as offensive
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1326444 - Posted: 10 Jan 2013, 19:49:01 UTC - in response to Message 1326415.  

I think I'm going to assume that the fact that you've got nothing better to do than argue semantic points from a post eight messages, two days and a maintenance outage ago is a good thing.

It means that the project has (very quietly) started running so smoothly that you've all got nothing better to panic about :P

It's at times like this when I think we should have a "Preparing to Panic" thread.
ID: 1326444 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1326447 - Posted: 10 Jan 2013, 20:13:53 UTC - in response to Message 1326415.  

I think I'm going to assume that the fact that you've got nothing better to do than argue semantic points from a post eight messages, two days and a maintenance outage ago is a good thing.

It means that the project has (very quietly) started running so smoothly that you've all got nothing better to panic about :P

Quite correct, I haven't been panicking for ages.
ID: 1326447 · Report as offensive
Rolf
Joined: 16 Jun 09
Posts: 114
Credit: 7,817,146
RAC: 0
Switzerland
Message 1326450 - Posted: 10 Jan 2013, 20:24:45 UTC

Please, no panic!
Panic is when people start losing connection to reality - but we are living in reality.

If "he" doesn't care (or "they" don't care), there are still two possibilities: either he doesn't want to take care (his own decision) or he does not have the time to take care (somebody else decided).
As an example take Matt's blog: http://setiathome.berkeley.edu/forum_thread.php?id=59157
Although it's nearly 3 years old, some parts could still be valid!

I prefer the last Q-A-block:
Q: What do you mean "you hit the limit?" Are you mad at us?
A: No. No. No. Nothing personal, but I'd really like other project staff to chime in more often, and maybe I need to step away in order for that to happen. Plus being in the spotlight I sometimes find myself having to argue on behalf of policies or practices around here which I don't know much about, or I don't exactly agree with. Meh. I do wish SETI@home had better avenues and resources for public relations. In any case I'll still keep working towards improving that in the future whenever I can, whether or not that's direct contact from me or otherwise. The daily notes were fun, but honestly probably not the best use of my time. I may just report about bigger projects as they happen. We'll see.

So I'll be around - just a lot quieter.

ID: 1326450 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1326452 - Posted: 10 Jan 2013, 20:30:31 UTC

I've been receiving Short CPU tasks that insist they run Immediately. Plus, all my recently downloaded CUDA 23 tasks are...Shorties. I see an approaching Shortie Storm.

Batten down the hatches, no need to panic, it's just a little Short thing. Unless...
ID: 1326452 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1326513 - Posted: 11 Jan 2013, 0:15:39 UTC

Matt has posted an update that I believe all serious Setizens should read:

Matt's Post Here
ID: 1326513 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1326670 - Posted: 11 Jan 2013, 14:21:55 UTC

Something's gone away....
Everything was still going fine about 5 hours ago when I went to bed. This morning I have a bunch of rigs that have dropped back to Einstein because they can't connect to the server to report and get new work.

Whazzup?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1326670 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.