Godzilla Meets Bambi (Aug 29 2007)

Author	Message
Matt Lebofsky Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0	Message 629095 - Posted: 29 Aug 2007, 20:40:26 UTC As far as the public data pipeline is concerned, it's been relatively smooth sailing since recovering from the weekly outage yesterday. Queues are draining or filling in the right directions, work is being created and sent out at an even pace, etc. However, bambi was a bit of a time consuming headache this morning. It finally resynced from the spurious RAID failure yesterday. I tested the supposed failed drives and got enough confusing outputs that I thought the disk controller went nuts. Playing around with the 3ware BIOS showed this was more or less the case: every time we rescanned the drives a different small random subset would disappear from the list. This isn't a good thing. We popped the system open and found nothing loose or unseated. So we did a true power cycle - unplugging it from the wall, etc. Since then the disks have all returned and remain intact after several rescans and reboots. So perhaps an ugly bit got jammed in the 3ware card and needed to be neutralized. Meanwhile I moved splitting to lando so I could work on bambi without dangerously running low on work to send. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude ID: 629095 ·

Peter Send message Joined: 27 Jun 99 Posts: 26 Credit: 10,645,591 RAC: 0	Message 629098 - Posted: 29 Aug 2007, 20:47:35 UTC - in response to Message 629095. Nice to see the project is keeping you busy. But as an aside, why do I keep getting "system connect" errors? New Vista Prem build, about 21:40 BST. ID: 629098 ·

Dr. C.E.T.I. Send message Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0	Message 629174 - Posted: 29 Aug 2007, 22:24:11 UTC Thank You Berkeley AND Matt ;) . . . Keep Posting - It is Appreciated . . . ID: 629174 ·

DJStarfox Send message Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2	Message 629236 - Posted: 30 Aug 2007, 0:21:32 UTC - in response to Message 629095. I just had a Compaq RAID controller go bad upon reboot last week. Just between you and me, I would acquire a spare to have on hand, just in case. What kind of 3ware RAID controller is it? ID: 629236 ·

Astro-AL Send message Joined: 31 Mar 00 Posts: 18 Credit: 95,868,034 RAC: 80	Message 629328 - Posted: 30 Aug 2007, 2:18:41 UTC - in response to Message 629095. I wish someone would explain to me why others seem to be getting work and I just keep waiting. It has been days since I have had work on my machines. I have only been able to get a few WU's on one machine. I have been trying to get answers, but can't get any responses. Below are the messages I have been receiving. SETI@home 8/28/2007 4:39:48 PM [file_xfer] Started download of file libffw3f-3-1-1a_upx.dll 8/28/2007 4:40:11 PM Project communication failed: attempting access to reference site 8/28/2007 4:40:11 PM [file_xfer] Temporarily failed download of file libffw3f-3-1-1a_upx.dll: system connect 8/28/2007 4:40:11 PM Backing off 1 hr 56 min 41 sec on download of file libffw3f-3-1-1a_upx.dll 8/28/2007 4:40:11 PM Access to reference site succeeded - project servers may be temporarily down. SETI@home 8/28/2007 5:22:35 PM [file_xfer] Started download of file setiathome_5.27_windows_intelx86.exe 8/28/2007 5:22:35 PM Project communication failed: attempting access to reference site 8/28/2007 5:22:35 PM [file_xfer] Temporarily failed download of file setiathome_5.27_windows_intelx86.exe: system connect 8/28/2007 5:22:35 PM Backing off 1 hr 56 min 41 sec on download of file setiathome_5.27_windows_intelx86.exe 8/28/2007 5:22:35 PM Access to reference site succeeded - project servers may be temporarily down. [quote]As far as the public data pipeline is concerned, it's been relatively smooth sailing since recovering from the weekly outage yesterday. Queues are draining or filling in the right directions, work is being created and sent out at an even pace, etc. However, bambi was a bit of a time consuming headache this morning. It finally resynced from the spurious RAID failure yesterday. I tested the supposed failed drives and got enough confusing outputs that I thought the disk controller went nuts. Playing around with the 3ware BIOS showed this was more or less the case: every time we rescanned the drives a different small random subset would disappear from the list. This isn't a good thing. We popped the system open and found nothing loose or unseated. So we did a true power cycle - unplugging it from the wall, etc. Since then the disks have all returned and remain intact after several rescans and reboots. So perhaps an ugly bit got jammed in the 3ware card and needed to be neutralized. Meanwhile I moved splitting to lando so I could work on bambi without dangerously running low on work to send. - Matt[/quote] ID: 629328 ·

JLDun Volunteer tester Send message Joined: 21 Apr 06 Posts: 573 Credit: 196,101 RAC: 0	Message 629405 - Posted: 30 Aug 2007, 4:06:15 UTC - in response to Message 629328. [quote]I wish someone would explain to me why others seem to be getting work and I just keep waiting. It has been days since I have had work on my machines. I have only been able to get a few WU's on one machine. I have been trying to get answers, but can't get any responses. Below are the messages I have been receiving. SETI@home 8/28/2007 4:39:48 PM [file_xfer] Started download of file libffw3f-3-1-1a_upx.dll 8/28/2007 4:40:11 PM Project communication failed: attempting access to reference site 8/28/2007 4:40:11 PM [file_xfer] Temporarily failed download of file libffw3f-3-1-1a_upx.dll: system connect 8/28/2007 4:40:11 PM Backing off 1 hr 56 min 41 sec on download of file libffw3f-3-1-1a_upx.dll 8/28/2007 4:40:11 PM Access to reference site succeeded - project servers may be temporarily down. SETI@home 8/28/2007 5:22:35 PM [file_xfer] Started download of file setiathome_5.27_windows_intelx86.exe 8/28/2007 5:22:35 PM Project communication failed: attempting access to reference site 8/28/2007 5:22:35 PM [file_xfer] Temporarily failed download of file setiathome_5.27_windows_intelx86.exe: system connect 8/28/2007 5:22:35 PM Backing off 1 hr 56 min 41 sec on download of file setiathome_5.27_windows_intelx86.exe 8/28/2007 5:22:35 PM Access to reference site succeeded - project servers may be temporarily down.[/quote] This may not apply, but I've seen it in recent posts: Have you tried running ipconfig /flushdns from the command line? ID: 629405 ·

Gary Charpentier Volunteer tester Send message Joined: 25 Dec 00 Posts: 30651 Credit: 53,134,872 RAC: 32	Message 629459 - Posted: 30 Aug 2007, 6:53:46 UTC - in response to Message 629095. [quote]As far as the public data pipeline is concerned, it's been relatively smooth sailing since recovering from the weekly outage yesterday. Queues are draining or filling in the right directions, work is being created and sent out at an even pace, etc. However, bambi was a bit of a time consuming headache this morning. It finally resynced from the spurious RAID failure yesterday. I tested the supposed failed drives and got enough confusing outputs that I thought the disk controller went nuts. Playing around with the 3ware BIOS showed this was more or less the case: every time we rescanned the drives a different small random subset would disappear from the list. This isn't a good thing. We popped the system open and found nothing loose or unseated. So we did a true power cycle - unplugging it from the wall, etc. Since then the disks have all returned and remain intact after several rescans and reboots. So perhaps an ugly bit got jammed in the 3ware card and needed to be neutralized. Meanwhile I moved splitting to lando so I could work on bambi without dangerously running low on work to send. - Matt[/quote] Matt, A long time ago I had a drive with that problem: different random blocks being tagged bad. I finally decided to run a couple of scans without sparing the blocks. As each scan came up with different random blocks, none the same, I realized the platters were fine and it was the drive's on-board electronics that had failed. I'm assuming you pulled the drive free of the RAID controller to run the tests, just to be sure it isn't the RAID. I suspect you are going to have more problems with this drive, and you most likely spared good blocks. Gary ID: 629459 ·
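
Editor's sketch of the reasoning in Gary's post: if repeated scans flag a different random set of blocks each time, with nothing in common, the media is probably fine and something upstream (drive electronics, or a controller) is suspect. The same comparison works for the drives that vanished on each 3ware rescan. This is a minimal illustration only; the function names and the block numbers are hypothetical and do not come from any tool mentioned in the thread.

```python
from typing import Iterable, List, Set


def persistent_failures(scans: Iterable[Set[int]]) -> Set[int]:
    """Return the items (block numbers, drive slots, ...) flagged in every scan."""
    scans = list(scans)
    if not scans:
        return set()
    return set.intersection(*scans)


def looks_random(scans: List[Set[int]]) -> bool:
    """True when at least two scans were run and no item is flagged in all of them."""
    return len(scans) >= 2 and not persistent_failures(scans)


# Hypothetical block numbers flagged by three successive surface scans.
scans = [{1042, 77310, 90211}, {512, 48800}, {3301, 61777, 120004}]
print(persistent_failures(scans))  # empty set: no block repeats across scans
print(looks_random(scans))         # suggests the flagged blocks are not real media defects
```

A non-empty result from `persistent_failures` would point the other way: the same blocks failing on every pass is what a genuine media defect would be expected to look like.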

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 629472 - Posted: 30 Aug 2007, 8:02:07 UTC - in response to Message 629459. [quote]I suspect you are going to have more problems with this drive and you most likely spared good blocks.[/quote] The problem wasn't with a single drive; it was multiple drives, and different drives on each occasion. Grant Darwin NT ID: 629472 ·

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 20289 Credit: 7,508,002 RAC: 20	Message 629480 - Posted: 30 Aug 2007, 9:05:59 UTC Last modified: 30 Aug 2007, 9:06:25 UTC Matt, Just the usual thanks for the updates, and this is also all very useful insight into the admin for big server systems! As for the RAID problems: PSU marginal? High temperatures? Vibration? Or have you really got a failing controller card or a batch of dubious disks? Or some weird config problem? Good luck, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 629480 ·

OzzFan Volunteer tester Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28	Message 629525 - Posted: 30 Aug 2007, 12:16:10 UTC - in response to Message 629328. [quote]I wish someone would explain to me why others seem to be getting work and I just keep waiting. It has been days since I have had work on my machines. I have only been able to get a few WU's on one machine. I have been trying to get answers, but can't get any responses. Below are the messages I have been receiving.[/quote] According to your account, your last post was 536 days ago. I'm not sure who you've been trying to get answers from, but we're here to help you now! 8-) ID: 629525 ·

Sterling_Aug Send message Joined: 27 Sep 02 Posts: 54 Credit: 14,105,725 RAC: 0	Message 629663 - Posted: 30 Aug 2007, 17:21:47 UTC - in response to Message 629525. Has anyone else been getting tons of computation errors in Vista lately? I upgraded to BOINC 5.10.20 and I am still getting dozens of errors each day. ID: 629663 ·

Henk Haneveld Volunteer tester Send message Joined: 16 May 99 Posts: 154 Credit: 1,577,293 RAC: 1	Message 629698 - Posted: 30 Aug 2007, 18:11:46 UTC - in response to Message 629663. Last modified: 30 Aug 2007, 18:12:46 UTC [quote]Has anyone else been getting tons of computation errors in Vista lately? I upgraded to BOINC 5.10.20 and I am still getting dozens of errors each day.[/quote] Do you have the right Chicken 2.4 version? The first release had problems with Vista. There has been a special version made for it. ID: 629698 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 629701 - Posted: 30 Aug 2007, 18:14:32 UTC - in response to Message 629663. [quote]Has anyone else been getting tons of computation errors in Vista lately? I upgraded to BOINC 5.10.20 and I am still getting dozens of errors each day.[/quote] Yes. They are nothing to do with your version of BOINC, and nothing to do with the technical staff at Berkeley. They are, however, probably to do with the fact that you've installed an optimised application. Please come over to the Number Crunching forum, and read this thread and this post - both of them may apply to you. ID: 629701 ·

Cameron Send message Joined: 27 Nov 02 Posts: 110 Credit: 5,082,471 RAC: 17	Message 630975 - Posted: 1 Sep 2007, 12:38:38 UTC It's interesting what Matt and the Others get up to in the routine of getting SETI work to us. Keep up the Great Work ID: 630975 ·

Scarecrow Send message Joined: 15 Jul 00 Posts: 4520 Credit: 486,601 RAC: 0	Message 633264 - Posted: 4 Sep 2007, 8:06:23 UTC The thread subject was just too tempting..... ---------- *** Lord, I apologize... and be with the starving pygmies in new guinea........ ID: 633264 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.