| Author |
Message |
Matt Lebofsky
Volunteer moderator, Project administrator, Project developer, Project scientist
Joined: 1 Mar 99 Posts: 1375 Credit: 74,079 RAC: 0

|
|
As far as the public data pipeline is concerned, it's been relatively smooth sailing since recovering from the weekly outage yesterday. Queues are draining or filling in the right directions, work is being created and sent out at an even pace, etc.
However, bambi was a bit of a time-consuming headache this morning. It finally resynced from the spurious RAID failure yesterday. I tested the supposedly failed drives and got enough confusing output that I thought the disk controller had gone nuts. Playing around with the 3ware BIOS showed this was more or less the case: every time we rescanned the drives, a different small random subset would disappear from the list. This isn't a good thing.
We popped the system open and found nothing loose or unseated. So we did a true power cycle - unplugging it from the wall, etc. Since then the disks have all returned and remain intact after several rescans and reboots. So perhaps an ugly bit got jammed in the 3ware card and needed to be neutralized. Meanwhile I moved splitting to lando so I could work on bambi without dangerously running low on work to send.
- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
|
|
|
|
|
Nice to see the project is keeping you busy. But as an aside, why do I keep getting "system connect" errors? New Vista Prem build, about 21:40 BST.
|
|
|
|
|
|
Thank You Berkeley AND Matt ;) . . . Keep Posting - It is Appreciated . . .
|
|
|
|
|
|
I just had a Compaq RAID controller go bad upon reboot last week. Just between you and me, I would acquire a spare to have on hand, just in case. What kind of 3ware RAID controller is it?
|
|
|
|
|
|
I wish someone would explain to me why others seem to be getting work and I just keep waiting. It has been days since I have had work on my machines. I have only been able to get a few WUs on one machine. I have been trying to get answers, but can't get any responses. Below are the messages I have been receiving.
SETI@home 8/28/2007 4:39:48 PM [file_xfer] Started download of file libffw3f-3-1-1a_upx.dll
8/28/2007 4:40:11 PM Project communication failed: attempting access to reference site
8/28/2007 4:40:11 PM [file_xfer] Temporarily failed download of file libffw3f-3-1-1a_upx.dll: system connect
8/28/2007 4:40:11 PM Backing off 1 hr 56 min 41 sec on download of file libffw3f-3-1-1a_upx.dll
8/28/2007 4:40:11 PM Access to reference site succeeded - project servers may be temporarily down.
SETI@home 8/28/2007 5:22:35 PM [file_xfer] Started download of file setiathome_5.27_windows_intelx86.exe
8/28/2007 5:22:35 PM Project communication failed: attempting access to reference site
8/28/2007 5:22:35 PM [file_xfer] Temporarily failed download of file setiathome_5.27_windows_intelx86.exe: system connect
8/28/2007 5:22:35 PM Backing off 1 hr 56 min 41 sec on download of file setiathome_5.27_windows_intelx86.exe
8/28/2007 5:22:35 PM Access to reference site succeeded - project servers may be temporarily down.
____________
|
|
|
|
|
|
This may not apply, but I've seen it in recent posts:
Have you tried running ipconfig /flushdns from the command line?
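If it helps to see at a glance which downloads are failing and with what reason, the message lines from the BOINC log can be tallied with a short script. This is only a rough sketch: the line shape is assumed from the messages posted in this thread, and the function name tally_failed_downloads is made up for illustration, not anything that ships with BOINC.

```python
import re
from collections import Counter

# Assumed line shape, taken from the messages posted in this thread:
#   "... failed download of file <name>: <reason>"
FAILED = re.compile(r"failed download of file (\S+): (.+)$")


def tally_failed_downloads(lines):
    """Count (file name, reason) pairs across failed-download messages."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "8/28/2007 4:40:11 PM [file_xfer] Temporarily failed download of file libffw3f-3-1-1a_upx.dll: system connect",
        "8/28/2007 4:40:11 PM Access to reference site succeeded - project servers may be temporarily down.",
        "8/28/2007 5:22:35 PM [file_xfer] Temporarily failed download of file setiathome_5.27_windows_intelx86.exe: system connect",
    ]
    for (name, reason), n in tally_failed_downloads(sample).items():
        print(f"{name}: {reason} (seen {n} time(s))")
```

If every failure shows the same reason (here "system connect") while the reference site is reachable, that points at reaching the project's download servers rather than at a particular file.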
____________
|
|
|
|
|
Matt,
A long time ago I had a drive with that problem: different random blocks being tagged bad. I finally decided to run a couple of scans and not spare the blocks. As each scan came up with different random blocks, none the same, I finally realized the platters were fine, but the on-disk electronics board was the item that had failed. I'm assuming you pulled the drive free of the RAID controller to run the tests, just to be sure it isn't the RAID. I suspect you are going to have more problems with this drive and you most likely spared good blocks.
Gary
____________
|
|
|
|
|
I suspect you are going to have more problems with this drive and you most likely spared good blocks.
The problem wasn't with a single drive; it was multiple drives, and different drives on each occasion.
____________
Grant
Darwin NT. |
|
|
ML1
Volunteer tester
Joined: 25 Nov 01 Posts: 7109 Credit: 3,681,240 RAC: 940

|
|
Matt,
Just the usual thanks for the updates,
and this is also all very useful insight into the admin for big server systems!
As for the RAID problems:
PSU marginal?
High temperatures?
Vibration?
Or have you really got a failing controller card or a batch of dubious disks??
Or some weird config problem?...
Good luck,
Martin
____________
Mandriva Linux A user friendly OS!
See new freedom Mageia2
The Future is what We make IT (GPLv3) |
|
|
Volunteer tester
Joined: 9 Apr 02 Posts: 11987 Credit: 17,879,868 RAC: 59,539

|
I wish someone would explain to me why others seem to be getting work and I just keep waiting. It has been days since I have had work on my machines. I have only been able to get a few WU's on one machine. I have been trying to get answers, but can't get any responses. Below is the messages I have been receiving.
According to your account, your last post was 536 days ago. I'm not sure who you've been trying to get answers from, but we're here to help you now! 8-)
____________
|
|
|
|
|
|
Has anyone else been getting tons of computation errors in Vista lately?
I upgraded to BOINC 5.10.20 and I am still getting dozens of errors each day.
____________
|
|
|
|
|
Has anyone else been getting tons of computation errors in Vista lately?
I upgraded to BOINC 5.10.20 and I am still getting dozens of errors each day.
Do you have the right Chicken 2.4 version?
The first release had problems with Vista.
There has been a special version made for it.
____________
|
|
|
|
|
Has anyone else been getting tons of computation errors in Vista lately?
I upgraded to BOINC 5.10.20 and I am still getting dozens of errors each day.
Yes. They are nothing to do with your version of BOINC, and nothing to do with the technical staff at Berkeley. They are, however, probably to do with the fact that you've installed an optimised application.
Please come over to the Number Crunching forum, and read this thread and this post - both of them may apply to you. |
|
|
|
|
|
It's interesting what Matt and the Others get up to in the routine of getting SETI work to us.
Keep up the Great Work |
|
|
|
|
|
The thread subject was just too tempting.....
----------
*** Lord, I apologize... and be with the starving pygmies in New Guinea........ |
|
|