Panic Mode On (47) Server problems?

Author	Message
arkayn Volunteer tester Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 1110061 - Posted: 26 May 2011, 17:54:46 UTC Last modified: 26 May 2011, 17:56:02 UTC We know that one of the servers is having difficulties http://setiathome.berkeley.edu/forum_thread.php?id=64259 Please continue the venting. ID: 1110061 ·

Iona Send message Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0	Message 1110116 - Posted: 26 May 2011, 20:36:49 UTC - in response to Message 1110114. Good for you. My cache will run out in a day or so (if I keep the PCs running, doing nothing else but S@H) and therefore, in addition to your requests, I will also demand a refund! Don't take life too seriously, as you'll never come out of it alive! ID: 1110116 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30812 Credit: 53,134,872 RAC: 32	Message 1110127 - Posted: 26 May 2011, 21:03:43 UTC - in response to Message 1110119. I understand Bruno decided to play nice. ID: 1110127 ·

perryjay Volunteer tester Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 1110150 - Posted: 26 May 2011, 22:14:20 UTC Posted by Jeff Cobb over in Tech News..... The folks at Overland really came through! They give us amazing support. One of their engineers logged into the server and worked his magic. The RAID and filesystem have come back to life. We'll let the RAID resync and do a final reboot to clear some flags and then restart work generation. PROUD MEMBER OF Team Starfire World BOINC ID: 1110150 ·

KB7RZF Volunteer tester Send message Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2	Message 1110177 - Posted: 27 May 2011, 0:35:21 UTC I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe ID: 1110177 ·

Miep Volunteer moderator Send message Joined: 23 Jul 99 Posts: 2412 Credit: 351,996 RAC: 0	Message 1110302 - Posted: 27 May 2011, 7:44:34 UTC - in response to Message 1110177. I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe Yes, we can see that :D - is there something you DON'T crunch? I seem to have another day or so, before I can test how well the backup project mechanism works in 6.12.28. Carola ------- I'm multilingual - I can misunderstand people in several languages! ID: 1110302 ·

Miep Volunteer moderator Send message Joined: 23 Jul 99 Posts: 2412 Credit: 351,996 RAC: 0	Message 1110405 - Posted: 27 May 2011, 15:04:09 UTC - in response to Message 1110401. This is totally unacceptable. My caches will run out in 10 days, and I tell you people that if this problem isn't solved in the coming 25 years, I will leave this project forever. Well well, the project is still not functioning as it should. I'm counting down now to when I will leave this project. Now it's only 24 years and 364 days left.... I wouldn't do that, if I was you, the counting down bit. What with the trees in the forest, I'd be afraid what exactly I was counting down to... Carola ------- I'm multilingual - I can misunderstand people in several languages! ID: 1110405 ·

KB7RZF Volunteer tester Send message Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2	Message 1110410 - Posted: 27 May 2011, 15:12:10 UTC - in response to Message 1110302. I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe Yes, we can see that :D - is there something you DON'T crunch? I seem to have another day or so, before I can test how well the backup project mechanism works in 6.12.28. LOL There's a few newer projects that have come out that I haven't attached to. All of these in my sig have been ones I've crunched for during a team's project of the month, or just because they looked interesting. SETI will always be home, but I figured it's always nice to share. So, I shared. LOL ID: 1110410 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30812 Credit: 53,134,872 RAC: 32	Message 1110420 - Posted: 27 May 2011, 15:25:19 UTC - in response to Message 1110401. This is totally unacceptable. My caches will run out in 10 days, and I tell you people that if this problem isn't solved in the coming 25 years, I will leave this project forever. Well well, the project is still not functioning as it should. I'm counting down now to when I will leave this project. Now it's only 24 years and 364 days left.... When it is working again, do you suspend your count, or does it reset? ID: 1110420 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1110424 - Posted: 27 May 2011, 15:31:59 UTC One positive thing that's come out of this is that the file deleters and db purge are getting some catch-up time. Results and Work Units waiting for db purge are both under 100K and dropping. Donald Infernal Optimist / Submariner, retired ID: 1110424 ·

Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20	Message 1110426 - Posted: 27 May 2011, 15:34:47 UTC - in response to Message 1110423. This is totally unacceptable. My caches will run out in 10 days, and I tell you people that if this problem isn't solved in the coming 25 years, I will leave this project forever. Well well, the project is still not functioning as it should. I'm counting down now to when I will leave this project. Now it's only 24 years and 364 days left.... I wouldn't do that, if I was you, the counting down bit. What with the trees in the forest, I'd be afraid what exactly I was counting down to... Nah, that's not a problem. I come from a family where the men live until they're close to 100 years old. So, if the project keeps going, we will have to deal with you for another 40+ years? Works for me. Donald Infernal Optimist / Submariner, retired ID: 1110426 ·

Jason Safoutin Volunteer tester Send message Joined: 8 Sep 05 Posts: 1386 Credit: 200,389 RAC: 0	Message 1110433 - Posted: 27 May 2011, 16:13:34 UTC - in response to Message 1110410. I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe Yes, we can see that :D - is there something you DON'T crunch? I seem to have another day or so, before I can test how well the backup project mechanism works in 6.12.28. LOL There's a few newer projects that have come out that I haven't attached to. All of these in my sig have been ones I've crunched for during a team's project of the month, or just because they looked interesting. SETI will always be home, but I figured it's always nice to share. So, I shared. LOL Agreed. SETI is my favorite project and will always be my top one. I used to participate in other projects, but never was too interested in any of them. The problems happen often, but that won't stop me from crunching here ever. "By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3 ID: 1110433 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30812 Credit: 53,134,872 RAC: 32	Message 1110485 - Posted: 27 May 2011, 19:04:51 UTC - in response to Message 1110440. So, if the project keeps going, we will have to deal with you for another 40+ years? Aaaarggh! :-))) Eric almost has that Fountain of Youth formula ET sent decoded ... ID: 1110485 ·

Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1110666 - Posted: 28 May 2011, 5:41:14 UTC There's life on the cricket graph starting around 0500utc. Upload server is still disabled though. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1110666 ·

tbret Volunteer tester Send message Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40	Message 1110680 - Posted: 28 May 2011, 7:48:07 UTC Instead of first asking a question that would expose how ignorant I am about this, let me get ahead of that and proclaim and admit that I am really ignorant about a lot of things. One of the many is a RAID array. Now, it's hard to be as ignorant as I am, especially since I once configured a RAID using an Adaptec controller that I seem to remember paying more for than the laptop I'm using to type this post. But that was back in the days when expensive motherboards would have two VESA Local Bus slots on them. Okay, so now that everyone's up-to-date on how out-of-date I am --- Just how big are these RAID arrays SETI is using (in GBs)? That question would probably answer my next question which is "Why are they using them?" I can't imagine that our upload / download activity, confined by the pipe into the lab, would need nearly 900MB/s and if it does then I can't imagine how many physical drives there would have to be in the array to handle it for more than half of a day at a time. Can someone give me a clue as to why you'd want to run a "striped" array (I understand redundancy) on this project in big, bold, conceptual strokes that even I can understand? I was just transferring an ISO file via wireless at 11.5MB/s across my den (I know that's one big file as opposed to 5,000 22k files). I'm thinking RAM makes more sense and so I want to know why I'm wrong; just as a sort-of "welcome to reality in the 21st century" lesson for me. ID: 1110680 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13797 Credit: 208,696,464 RAC: 304	Message 1110694 - Posted: 28 May 2011, 8:44:28 UTC - in response to Message 1110680. Just how big are these RAID arrays SETI is using (in GBs)? That question would probably answer my next question which is "Why are they using them?" RAID stands for Redundant Array of Inexpensive Disks. Note the redundant. RAID 0 is just for speed, there is no redundancy. 1 disk dies, all data is lost. RAID 1 is mirroring, one disk maintains a copy of another disk. RAID 5 and 6 are the ones that really matter: data is spread across multiple disks. With RAID 5, if one disk dies no data is lost. It can be rebuilt from the redundant data stored on the other disks in the array. With RAID 6, 2 disks in the array can die & still no data is lost. Grant Darwin NT ID: 1110694 ·
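To make the "rebuilt from the redundant data" part concrete, here is a minimal sketch of single-parity recovery, the idea behind RAID 5. It is an illustration only: the three data blocks, the one-stripe layout, and the `xor_blocks` helper are assumptions made up for this example, not a description of how the project's actual storage hardware lays out data.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three hypothetical data disks (made-up contents).
data_blocks = [b"SETI", b"HOME", b"DATA"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk: only the other blocks and parity survive.
survivors = [data_blocks[0], data_blocks[2], parity]

# XOR-ing everything that survived yields the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

Because XOR cancels itself out, any single missing block can be recovered this way, but every surviving block in the stripe has to be read to do it. That is why a rebuild touches the whole array. RAID 6 adds a second, differently computed parity block so that two simultaneous losses can be survived; that calculation is not shown here.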

Tim Volunteer tester Send message Joined: 19 May 99 Posts: 211 Credit: 278,575,259 RAC: 0	Message 1110699 - Posted: 28 May 2011, 9:09:00 UTC Last modified: 28 May 2011, 9:09:49 UTC Imagine a hard disk failure now…… at our computers with so many tasks to upload :-) ID: 1110699 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 1110711 - Posted: 28 May 2011, 9:53:46 UTC - in response to Message 1110680. Just how big are these RAID arrays SETI is using (in GBs)? Looking at the Server Status Page, the line for 'Results out in the field' right now shows 5,857,809 for SETI (Multibeam), and 159,686 (Astropulse). Those are 'tasks' in modern terminology. Sometimes two people will be working on the same workunit, but often (more often for MB), one task will be complete and have been returned for validation, but the other will still be active. For the sake of argument, let's say that represents 5 million MB workunits, and 100 thousand AP workunits. That data has to be held at SETI, on those RAID arrays, until the results are validated - that's so a replacement copy can be sent out if validation is inconclusive or a worker times out. MB data files are 367 KiB in storage requirements, and AP are 8 MiB. Multiply that lot out, and I get the answer to be... Two thousand five hundred gigabytes When a RAID has to be rebuilt (as is going on at the moment), every single one of those bytes has to be read, and where appropriate written back to make the new redundant copy. If the RAID arrays held less data, the process would be quicker, and we could get back to work sooner. That's why I ask people not to hold so many tasks in their caches. ID: 1110711 ·
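The "multiply that lot out" step can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate in the post: the workunit counts are the rounded figures quoted there (5 million Multibeam, 100 thousand Astropulse), the file sizes are the quoted 367 KiB and 8 MiB, and the variable names are invented for this example.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

# Rounded workunit counts from the post ("for the sake of argument").
mb_workunits = 5_000_000
ap_workunits = 100_000

# Per-workunit storage sizes quoted in the post.
mb_size_bytes = 367 * KIB
ap_size_bytes = 8 * MIB

mb_total_gib = mb_workunits * mb_size_bytes / GIB
ap_total_gib = ap_workunits * ap_size_bytes / GIB
total_gib = mb_total_gib + ap_total_gib

print(f"Multibeam:  {mb_total_gib:,.0f} GiB")
print(f"Astropulse: {ap_total_gib:,.0f} GiB")
print(f"Total:      {total_gib:,.0f} GiB")
```

The total comes out a little over 2,500 GiB, in line with the figure in the post. Most of it is Multibeam data, so the number of Multibeam tasks held in caches is what mostly drives how much has to be read during a rebuild.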

Jason Safoutin Volunteer tester Send message Joined: 8 Sep 05 Posts: 1386 Credit: 200,389 RAC: 0	Message 1110723 - Posted: 28 May 2011, 11:09:51 UTC I managed to get a cache of 59 WU's just now. This was the first time I tried to download work since the issues started. I noticed one took just a few seconds to crunch. I wonder if there will be many of those. Still not able to upload anything though. "By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3 ID: 1110723 ·

Jason Safoutin Volunteer tester Send message Joined: 8 Sep 05 Posts: 1386 Credit: 200,389 RAC: 0	Message 1110737 - Posted: 28 May 2011, 12:10:12 UTC - in response to Message 1110727. As I understand it, the reason for using RAID configurations is because there is redundancy built in, i.e. a backup. RAID disks are hot swappable, so if a hard drive fails you simply take it out and throw it away and plug a new one in. Then the RAID array will copy whatever it needs to the new disk. I don't have enough technical knowledge to know whether or not the Project is manipulating their data in the most efficient way, but I can appreciate, as Richard has shown, that the amount of data involved is significantly high. Also, SETI staff are supposed to be project scientists first and foremost, and spend their time analysing the data we provide for them. But of necessity, they have to also undertake a secondary role of being Server admins. I think they do rather well all things considered. Agree with you 100%. "By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3 ID: 1110737 ·


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.