Message boards :
Number crunching :
Panic Mode On (80) Server Problems?
Author | Message |
---|---|
MikeN Joined: 24 Jan 11 Posts: 314 Credit: 53,478,845 RAC: 16,440 |
I noticed this morning that my pc 6738288 has marked around 100 tasks as abandoned. What caused this and is there anything I need to do? It hasn't abandoned any more in the past 6-7 hours and appears to be working. Boinc tasks is reporting about 200 tasks on the pc but the web site reports about 120. I think you need to abandon all the tasks on your PC by resetting the project. If BOINC has marked them as abandoned you will not get credit for them, nor will any results you return count. I would also reboot the PC as there may be some sort of problem with it. This happened to me three times in two days over Christmas, always with the same PC. Every time it got 100 CPU tasks, a couple of hours later they would all be marked abandoned. In the end I suspended CPU processing until I could get back to work to reboot the system and check there was nothing wrong with the PC. It has been fine since. |
Volunteer tester Joined: 18 Aug 99 Posts: 1324 Credit: 86,116,568 RAC: 26,508 |
Looks like things are grinding to a halt again. Can't get any GPU d/ls because I have 2 AP tasks that have been stuck for several hours, and retries don't help much as the transfer quickly drops to under 1.0 kb then goes into retry status. I don't buy computers, I build them!! |
Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 9594 Credit: 123,422,866 RAC: 76,350 |
Whatever was causing the MB validators to get bogged down appears to have cleared; the backlog is now declining instead of growing. Also the AP work appears to have pretty much cleared, so hopefully the download traffic will settle down shortly. And whatever they did to the splitters after that last outage appears to have worked. I can't remember a time when the splitters were able to pump out 60 WU/s time after time, but now that the Ready to send buffer is back down to normal levels, that's what they're doing. 50/s is the worst so far; for quite a while there they were struggling to do 20/s. Grant Darwin NT |
rob smith Volunteer tester Joined: 7 Mar 03 Posts: 15870 Credit: 294,503,822 RAC: 314,825 |
The current round of sick download performance was happening before the APs started to move, so you can forget them as the culprit. The root cause is a very poorly thought through and implemented download scheduling algorithm that can do nothing apart from cause problems. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
.clair. Joined: 4 Nov 04 Posts: 1300 Credit: 42,153,030 RAC: 1,281 |
- the root cause is a very poorly thought through and implemented download scheduling algorithm that can do nothing apart from cause problems. Iz that inside info? If they have found another gremlin with its neck stuck in a bottle, the hIT men can have a smashing time with it, unless the Trainman says otherwize :( |
Volunteer tester Joined: 18 Aug 99 Posts: 1324 Credit: 86,116,568 RAC: 26,508 |
- the root cause is a very poorly thought through and implemented download scheduling algorithm that can do nothing apart from cause problems. I don't know what's broken, but I have stopped the scheduler from sending me any more AP tasks until this gets cleared up. I have 10 on one machine, 7 of which have been sitting here since yesterday, and the scheduler won't send any more because the tasks are stuck. There are 23 stuck on the other one that are holding up 99 other tasks from d/l'ing. Since everything here is being held up by the APs, I'm seriously considering aborting them just to get some work. The retry buttons on both machines are in severe pain because I am abusing them so much. I don't buy computers, I build them!! |
Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 9594 Credit: 123,422,866 RAC: 76,350 |
The current round of sick download performance was happening before the APs started to move, so you can forget them as the culprit - the root cause is a very poorly thought through and implemented download scheduling algorithm that can do nothing apart from cause problems. ? My hosts file is set to use the one server that behaves OK. Even that has been struggling since the network traffic maxed out. Before the network traffic maxed out, downloads were coming down quickly. With the severe WU limits in place, and the reduced availability of AP work, it was possible that the network traffic would back off. For whatever reason, that hasn't happened. Grant Darwin NT |
rob smith Volunteer tester Joined: 7 Mar 03 Posts: 15870 Credit: 294,503,822 RAC: 314,825 |
You were the lucky one. Before the servers maxed out I was lucky to get any work, and when I did, an MB was taking about 30 minutes to download, with numerous re-tries. Now that the servers are maxed out, MBs are taking either 1 minute or, in the majority of cases, 30 minutes with many re-tries. In "normal" times an MB takes about 10 seconds, and an AP a couple of minutes, with only the odd re-try; even when the servers are maxed those times only double. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Joined: 16 May 99 Posts: 10363 Credit: 93,311,901 RAC: 30,631 |
Some weeks here at Seti the work units just fly from the downloader, AP included. Then we get weeks where they crawl at snail speed, if you're lucky. I see the GPU group has another drive for two new servers. I'm just wondering if that will help. After all, if you're trying to push a 10" diameter of water thru a 1" diameter pipe, how effective will that be? Is it hardware or software, or a combination of both? Old James |
tbret Volunteer tester Joined: 28 May 99 Posts: 3377 Credit: 262,328,898 RAC: 107,218 |
My personal belief is that nobody knows. My fear is that nobody cares. My annoyance is that nobody seems to be trying to find out (just for fun). My response is to "abandon all hope." |
Volunteer tester Joined: 14 Jun 04 Posts: 1130 Credit: 1,959,816 RAC: 72 |
Every now and then it sure does feel like it. Then eventually lucidity gets the better of me and I realize that (as far as my little cruncher is concerned) Seti enjoys 100% uptime. Well, at least for the past 2 years it has... Now obviously to make a statement like that I must be bending the rules a little bit. And by that I mean I'm not counting a couple weeks of server replacements or a multi day power outage or any other scenario the Lab has no control over (though I can't remember any others in the 2yr timeframe mentioned). Now I know this is no consolation to anyone with a PC quick enough to burn through their cache in a matter of hours. But I'd be lying if I said my little netbook hasn't been crunching pretty much non-stop for a fraction over 24 months now. All that means is, a lot of people must be doing something right. And it's not just the people in the lab that care. What about Richard who's always looking out for us? What about Brad who just posted a wall-of-text of accomplishments for the lab? That's just a couple of the people going above and beyond so things run smoothly for us. But not a definitive list of people who care. What about the Raistmers and the Jasons? The Joes, Claggys and Mikes? I'm sure that's just the tip of the iceberg. In fact I think the reality is that everybody here cares for the project. |
tbret Volunteer tester Joined: 28 May 99 Posts: 3377 Credit: 262,328,898 RAC: 107,218 |
So that my comment isn't misunderstood as being snarky: I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues. I didn't even go so far as to say, my "observation" is that nobody cares. It is plain that a great number of people care a lot about this project. My slowest cruncher has only run out of work once in many months. Your point is well taken. |
Volunteer moderator Volunteer tester Joined: 26 May 99 Posts: 9235 Credit: 54,942,872 RAC: 48,070 |
I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues. I do not believe that Matt, Jeff and Eric "don't care". Sorry, that is going too far. I don't believe they have the time and resources to do what they want, but to say the guys in the lab "don't care" is to me an insult. "Proud to be born and bred in Croydon" |
Speedy Volunteer tester Joined: 26 Jun 04 Posts: 995 Credit: 8,847,415 RAC: 2,895 |
but to say the guys in the lab "don't care" is to me an insult. +1 |
David S Volunteer tester Joined: 4 Oct 99 Posts: 18352 Credit: 23,694,275 RAC: 5,728 |
I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues. I think you're still not understanding him. He didn't say they don't care, he said his fear is that they don't care. There's a difference between a fear and a belief. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
Volunteer moderator Volunteer tester Joined: 26 May 99 Posts: 9235 Credit: 54,942,872 RAC: 48,070 |
I did say my "fear" is that nobody cares, and by "nobody" I meant anyone who could do anything about server issues. I understand the words but why even hint that perhaps no one cares, because that is what that statement suggests. Anyone who has been around these boards should realise that "a lack of caring" is NOT the problem and it is disingenuous to even suggest it. Also coming from someone with a total of 95,733,064 and running 13 machines. It just strikes the wrong note with me. "Proud to be born and bred in Croydon" |
Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 9594 Credit: 123,422,866 RAC: 76,350 |
I think you're still not understanding him. He didn't say they don't care, he said his fear is that they don't care. There's a difference between a fear and a belief. Belief or fear makes no difference- he is implying that they don't care. Of course if he was to read the donation thread he'd see that there are plans to address a couple of the major problems the project has. That they've gone to the effort to inform the fund raisers of their requirements would indicate to most people that they do care. Grant Darwin NT |
Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 11891 Credit: 115,347,560 RAC: 69,746 |
I think I'm going to assume that the fact that you've got nothing better to do than argue semantic points from a post eight messages, two days and a maintenance outage ago is a good thing. It means that the project has (very quietly) started running so smoothly that you've all got nothing better to panic about :P |
Joined: 2 Oct 99 Posts: 81 Credit: 13,809,664 RAC: 15,430 |
What some forget is that the original idea for distributed computing was that we donate our spare computer cycles for scientists to use for research purposes. This was NEVER intended to be a "game" for points. |
Nick: ID 666 Volunteer tester Joined: 18 May 99 Posts: 11698 Credit: 31,887,026 RAC: 2,526 |
I think I'm going to assume that the fact that you've got nothing better to do than argue semantic points from a post eight messages, two days and a maintenance outage ago is a good thing. It's at times like this when I think we should have a "Preparing to Panic" thread. |
©2018 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.