Out of the fire and into the pit of sulfuric acid. (Feb 19, 2010)
Author | Message |
---|---|
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Gargh! The science database on thumper went down at 2am due to a filled root partition. One of the RAID arrays on thumper lost a drive at about the same time, and uploads are still too slow. I've fixed the first problem, a hot spare automatically fixed number 2, and I'll be working on number 3 now. Happy Friday! Eric @SETIEric@qoto.org (Mastodon) |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Thanks Eric, good to know. Hope everything turns out ok. PROUD MEMBER OF Team Starfire World BOINC |
ront Joined: 25 Aug 01 Posts: 77 Credit: 386,336 RAC: 0 |
Thanks for the update. Hope it works out. Really appreciate the hard work and dedication you all display each and every day. ront |
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
Beta is still down. |
Galadriel Joined: 24 Jan 09 Posts: 42 Credit: 8,422,996 RAC: 0 |
thx Eric for your undying effort. sigh |
Dave Joined: 29 Mar 02 Posts: 778 Credit: 25,001,396 RAC: 0 |
Wehay I have work!!!!!!!!!!!!!!!! HAHA :)! |
eaglescouter Joined: 28 Dec 02 Posts: 162 Credit: 42,012,553 RAC: 0 |
Still unable to connect to server :( It's not too many computers, it's a lack of circuit breakers for this room. But we can fix it :) |
Ford Prefect Joined: 16 Nov 08 Posts: 2 Credit: 579,432 RAC: 0 |
|
Eric Korpela Joined: 3 Apr 99 Posts: 1382 Credit: 54,506,847 RAC: 60 |
Unfortunately Bob and Jeff brought the splitters and assimilators down to allow the RAID array to rebuild at full speed. That'll delay more work by a couple hours. Hopefully uploads are back to full speed by then. Eric @SETIEric@qoto.org (Mastodon) |
Dave Cummings Joined: 16 May 09 Posts: 219 Credit: 1,193,729 RAC: 0 |
Mine came on for a short while; one machine got some more tasks at 19:13 GMT, but the main machine didn't even try at the same time. Hope the repairs are going well! |
Rick Joined: 3 Dec 99 Posts: 79 Credit: 11,486,227 RAC: 0 |
Thanks for the update and many thanks for all the efforts of the SETI@Home staff. |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66361 Credit: 55,293,173 RAC: 49 |
"Unfortunately Bob and Jeff brought the splitters and assimilators down to allow the RAID array to rebuild at full speed. That'll delay more work by a couple hours. Hopefully uploads are back to full speed by then." Nice to hear, but I and others have been trying to upload for the last 7 days and all we get is project backoff. Sure, yesterday I was able to upload 2 WUs, and only 2, as only those were ever sent acks. Somewhere in the next 15.5 hours I'll run out of work and the PC here will still be trying to upload, but can't, thanks to the project backoff. People are not happy about the backoff at all, as there are threads about it, yet people seem to think, "Oh, you're only unable to upload because of the outage." Richard Haselgrove and I agree that this was happening before that and is not outage related; as he says, the traffic is just not reaching SETI. It's like we're getting bouncebacks saying, "Sorry, said server doesn't exist there, go away." Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
Dave Joined: 29 Mar 02 Posts: 778 Credit: 25,001,396 RAC: 0 |
Superjoker: as explained elsewhere the backoffs are our friend as without them the servers would be flooded with requests & no-one would get anywhere. The longer the backoffs the better as it spreads the load more. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Superjoker: as explained elsewhere the backoffs are our friend as without them the servers would be flooded with requests & no-one would get anywhere. The longer the backoffs the better as it spreads the load more." Backoffs are a perfect technique for spreading the load when the complete system is capable of handling (with an adequate safety margin) the aggregate anticipated demand averaged over an extended period of time. That's how SETI normally runs, and a few backoffs to shave the peaks and fill the troughs are exactly what's needed. Backoffs do not help if the aggregate load exceeds, over an extended period, the system's capacity to absorb work. Then you have to take more drastic action, to reduce demand or increase supply. For the last 4.5 days (only), SETI's capacity to absorb work has been below demand. I see no sign that demand has increased: instead, it seems to me that capacity has decreased (hopefully, temporarily). No amount of smoothing (backoffs) will solve this. What is needed is to restore the status quo ante on the capacity side. |
Berserker Joined: 2 Jun 99 Posts: 105 Credit: 5,440,087 RAC: 0 |
"Gargh! The science database on thumper went down at 2am due to a filled root partition." I know this pain very well. Our main source code server (at my employer) died for exactly the same reason. Not quite so many users depending on it, but still painful, especially as I was sat not very far away from it but had no access to the server room (at the time). Stats site - http://www.teamocuk.co.uk - still alive and (just about) kicking. |
Mike O Joined: 1 Sep 07 Posts: 428 Credit: 6,670,998 RAC: 0 |
This will take some time to recover from. I have dozens and dozens of WUs to upload and I'm a small-time farmer. Think of the ones that have 4 295s on an i7, and not just one Rig/Machine/System/MoneyPit/ElectricEater/ObjectOfAffection. Say "NO!" to button abuse. Pressing the panic buttons won't help and will actually slow the recovery down. Sit back, grab some popcorn and enjoy the show. Last person to upload their WUs wins! Eric! DUDE! You always amaze me. Dedication to this project is an understatement! Glad to hear no major things melted from the AC going on a hiatus (other thread). Not Ready Reading BRAIN. Abort/Retry/Fail? |
Keith White Joined: 29 May 99 Posts: 392 Credit: 13,035,233 RAC: 22 |
"Superjoker: as explained elsewhere the backoffs are our friend as without them the servers would be flooded with requests & no-one would get anywhere. The longer the backoffs the better as it spreads the load more." That's all fine and good, but some of us have been having upload/report problems for the last week. Scarecrow graphs show that returned units are still half of what they are normally. I'm still getting "project servers may be down" when it tries to connect. Done units keep failing to upload. The point is this isn't normal behavior, and last time it was a faulty switch. It's aggravating that some people simply aren't seeing this problem while others have been seeing this even before the AC crash, and the first group is saying "all is well". "Life is just nature's way of keeping meat fresh." - The Doctor |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"It's aggravating ...." Well, it was aggravating while the problem was unacknowledged. Now that Eric is onsite, and has uttered the magic words "uploads are still too slow", my aggravation levels have dropped considerably. Hi Eric! Sorry you've copped for a miserable Friday, but thanks for the post. If there's anything useful we can do by way of remote logging/diagnostics/testing, please ask. And please note Keith's point that scheduler updates are slow-to-nonexistent as well. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"It's aggravating that some people simply aren't seeing this problem while others have been seeing this even before the AC crash and the first group is saying "all is well"." There is a big difference between saying "all is well" and saying "I think it's fixed, let's see if it keeps getting better." I've been tracking a memory leak in one of my projects (not related to SETI or BOINC). It took about 20 minutes to fix the leak, but it took nearly a day for the results to show. [edit] My biggest worry for the SETI gang is that something stressed during the overheat hasn't decided to fail enough to show. It may be running flawlessly at least until the last staff member to leave is 10 minutes down the road. Knock on wood. |
Rick Joined: 7 Aug 01 Posts: 3 Credit: 12,870 RAC: 0 |
There is always "MONDAY" for somebody. Thanks for the efforts and dedication of the staff. |
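
Richard Haselgrove's argument about backoffs can be illustrated with a toy queue model. The Python sketch below is not BOINC code: the function name `backlog_over_time` and all the numbers are assumptions chosen purely for illustration. It treats a backoff as "anything the server cannot absorb this step is retried in a later step" and compares the same bursty demand against two different capacities.

```python
def backlog_over_time(demand, capacity):
    """Return the pending-upload backlog after each time step.

    demand:   new uploads arriving in each step (hypothetical numbers)
    capacity: uploads the server can absorb per step (hypothetical)
    Anything not absorbed is deferred (backed off) to a later step.
    """
    backlog = 0
    history = []
    for arriving in demand:
        pending = backlog + arriving
        backlog = max(0, pending - capacity)
        history.append(backlog)
    return history


# Bursty demand averaging 8 uploads per step.
bursty = [16, 0, 16, 0, 16, 0]

# Capacity 10 exceeds average demand: deferred uploads fill the
# troughs and the backlog clears after every burst.
healthy = backlog_over_time(bursty, capacity=10)

# Capacity 6 is below average demand: deferring uploads only
# postpones them, and the backlog trends upward.
degraded = backlog_over_time(bursty, capacity=6)

print(healthy)   # [6, 0, 6, 0, 6, 0]
print(degraded)  # [10, 4, 14, 8, 18, 12]
```

In the first case the smoothing works as intended. In the second, no choice of retry timing changes the outcome, because the total work arriving exceeds what can be absorbed; only restoring capacity (or reducing demand) brings the backlog back down.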