Extended Outage August 3 2010 Problems
Author | Message |
---|---|
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Pappa, Allen, tests are happening which do not show on the server status page. For myself, if I left something on the table before I go to bed I would expect it to be the same as I left it in the morning. To get things accomplished, the Seti Staff needs that space. Regards Please consider a Donation to the Seti Project. |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
I am a C++ developer and I have experience with Windows and Linux if you guys are stuck. Ian, welcome. If you are really interested in helping, I suggest that you look at Boinc Dev and Boinc Alpha. http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_alpha There you can sign up for an account and review what has transpired. Then you can join the repository to download the code. It gives you a starting point. More than that I cannot say. Regards Please consider a Donation to the Seti Project. |
soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
I'm a recently returned, long-time, pre-BOINC SAH cruncher. Hmm.. since I believe all of the failure to connect messages come from the boinc client.. On failure, it would need to check and read the content of a 1 line project specific file... something like pull up a MOTD, or GREP the project line of a boinc wide status list... The trick would be where to keep it so that it would be accessible during ANY project's outage. Janice |
Ian Green Joined: 25 Jul 10 Posts: 24 Credit: 102,337 RAC: 0 |
I am a C++ developer and I have experience with Windows and Linux if you guys are stuck. I signed up to the mailing list. |
rob smith Joined: 7 Mar 03 Posts: 22534 Credit: 416,307,556 RAC: 380 |
Wonder what Pappa means by 'Here is the next version'. Different to worse. Stacks of jobs uploading, most taking several "real" attempts. Downloads, a few, and they are all in instant retry. Some of this might be down to the fact that the world started to move late this afternoon (UK time), so there must be hundreds of thousands of jobs to upload and the bit of damp string is getting dried out by the heat generated (dry string doesn't conduct as well as wet string......). But I don't think that's the only issue as it wasn't this bad last weekend. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Wonder what Pappa means by 'Here is the next version'. In the past Matt stated there were over a million Results uploaded daily. Now pause that for 3 days... The silly part: as I just got back to check mail, I see two of 3 machines have cleared themselves without human intervention. Regards Please consider a Donation to the Seti Project. |
Ian Green Joined: 25 Jul 10 Posts: 24 Credit: 102,337 RAC: 0 |
Well I suspect that this weekly outage is becoming a nuisance. Might be an idea to think from the point of view of a large data center and adopt their strategies. |
Bill Walker Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0 |
Might be an idea to think from the point of view of a large data center and adopt their strategies. Well, the first strategy to adopt is their funding model. How much are you willing to pay for S@H? |
Blurf Joined: 2 Sep 06 Posts: 8964 Credit: 12,678,685 RAC: 0 |
Well I suspect that this weekly outage is becoming a nuisance. Ian--large data centers have appropriate funding and appropriate-size staffing. Not to be rude--how do you propose to apply this to the Seti lab? |
Blurf Joined: 2 Sep 06 Posts: 8964 Credit: 12,678,685 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. Any specific reason the staff can't turn on the servers before they leave on Thursday night? |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Perhaps Near Time Persistency Checker runs through to Friday morning? Boinc....Boinc....Boinc....Boinc.... |
Ian Green Joined: 25 Jul 10 Posts: 24 Credit: 102,337 RAC: 0 |
Storage is cheap, Linux is free. Server computers are not expensive anymore either. |
The Gas Giant Joined: 22 Nov 01 Posts: 1904 Credit: 2,646,654 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. So they can get some sleep Thursday night? |
Blurf Joined: 2 Sep 06 Posts: 8964 Credit: 12,678,685 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. TGG-you missed the point...I said before they leave |
Bill Walker Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. In the past, many of the Berkeley gang have come in after hours, or at least spent time on line after hours, when server problems arise. I suspect that part of the new 3 day outrage is giving them some predictable time off. |
The Gas Giant Joined: 22 Nov 01 Posts: 1904 Credit: 2,646,654 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. Yup. Server gets turned on before they leave...server goes kaput 2hrs later...evening check in means work to be done. Server gets turned on Friday morning when they get in (hopefully a little earlier than usual), server goes kaput 2hrs later, already there to fix it - just another day in paradise. ps. No need to make things bold - I got you the first time. |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Storage is cheap, Linux is free. Server computers are not expensive anymore either. You are correct. In 2000 an EMC2 for 1 Terabyte was over 2 million. Today what is needed for the Master Science server is roughly a $6000 16 Terabyte DAS. So generally for Storage you have: NAS - Network Attached Storage. SAN - Storage Area Network. DAS - Direct Attached Storage. In the case of a Data Center, you have several larger, more powerful SANs that get beat up by several servers or Clusters. Generally those are interconnected by 3 Gigabit Fiber Channel. A Good NAS is a Host computer with Very good Network capabilities and the OS is stripped down to handle File system only. A Good SAN has smaller processing power and once again is designed to optimize file system capabilities. It probably is interconnected via Fiber Channel and maybe a Gigabit or higher interconnect. DAS is designed to hook directly to the monster Server that you just built (ordered), normally connected via a Raid controller (or multiple controllers). Each of these pieces of hardware has a Raid controller. The Administrator has the problem of determining the Median/average file size to set the Stripe size and the cluster size to maximize the throughput, and the Raid type. So each drive has the CRC value of what is being written, and the Parity word plus the actual Data. That gets very complicated. Plus in a Win Server NTFS or a Nix Server iNodes to cover the amount of possible files to be written. So without writing about 3+ pages of the basic knowledge to do all this: are you offering to purchase the DAS that is needed to replace what "Bambi" currently holds? My understanding is they need at least 12 terabytes. Of course most of this Should be Enterprise class hardware. Regards Please consider a Donation to the Seti Project. |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. Going back to the original post. If a daemon grabs a chunk of data and starts processing and is set to suspend (not get more data), that will take x or x.xx plus hours, and it is running on Thursday. Meaning it will complete at 6:30pm or maybe 9:30pm or maybe not until 3:30am while everyone is gone. It allows them to ensure the last process completed successfully before turning on servers Friday morning. It also allows them to make final adjustments to server processes and reboot any machine that might need it before unleashing the ungodly amount of traffic that is about to happen. So everyone is well rested and has reasonable confidence that when everything is brought back up, there should be no problems. Most people here (there are a few exceptions) do not have to deal with more than one or two computers. They do not have to deal with authentication issues where servers have to authenticate to other servers for services (then Users have to authenticate). We will not talk about having to use Radius to handle authentication across the Internet (pick your server OS). Pick your OS, Nix or Win, the administrators recover as quickly as possible. Seti is an Enterprise Class operation (~200000 users with more than one computer) that is being run on barely adequate hardware/connectivity. You all have been Demanding Science Too. Regards Please consider a Donation to the Seti Project. |
The Gas Giant Joined: 22 Nov 01 Posts: 1904 Credit: 2,646,654 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. I'm pretty sure that's what I said... :p |
hiamps Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0 |
Pappa-this question was raised before and I don't remember seeing an answer (my bad if I missed it)...think it's a good one. That is ridiculous. If they turned them on Thursday and they went down it would be no different than if they didn't switch them on. They have gone many times with the servers down and no one racing in to fix them. Official Abuser of Boinc Buttons... And no good credit hound! |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.