What "spare parts" would you keep to allow a rapid recovery from failure?

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2018653 - Posted: 12 Nov 2019, 13:16:51 UTC

I have reduced my commitment to Seti@Home crunching to one dedicated cruncher and one "daily driver"/cruncher. I have been selling my surplus hardware.

I want to keep enough extra parts that I can get my dedicated cruncher back up promptly, probably without waiting for an order to ship and arrive.

What would you keep? What has failed on you? I have had two MBs go south, an overworked PSU go south, and one or two GTX 1060s go south.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2018653 · Report as offensive
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 2018658 - Posted: 12 Nov 2019, 14:12:19 UTC - in response to Message 2018653.  

Just keep PSUs and disks as spares. All other part failures have been of the order of once a decade!

Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 2018658 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2018666 - Posted: 12 Nov 2019, 14:58:24 UTC

PSU most certainly.
Disc drive - well, that's a debate. But I do have a spare disk which is a clone of my Windows 7 PC as that is my daily driver.
Anything else - I'm only a couple of hours away from three different component shifters, so it's no big deal.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2018666 · Report as offensive
Holdolin

Joined: 10 Apr 19
Posts: 68
Credit: 88,777,750
RAC: 30
United States
Message 2018673 - Posted: 12 Nov 2019, 15:52:40 UTC

As others have suggested, most definitely a PSU and perhaps an HDD/SSD. Those are the most common points of failure in my experience. I can't say much though, as my basement looks like a parts depot and I could easily suggest keeping much more, but if you have no need or desire for that, then just the mentioned stuff should work for ya.
ID: 2018673 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018676 - Posted: 12 Nov 2019, 16:00:37 UTC

I have a lot of spare parts, but mainly as a result of parts collection; I haven't really bought anything specifically to use as backups.

I have a couple of PSUs.
I have tons of memory. Most of my systems use DDR3 ECC RDIMMs and have more memory installed than they need, so I can redistribute if necessary.
I have some spare small SSDs lying around for OS disks.
I have lots of old HDDs that are really too old/small to be useful for anything; these are last-resort backups for OS drives.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018676 · Report as offensive
Kissagogo27 · Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2018708 - Posted: 12 Nov 2019, 21:51:13 UTC

PSU first for sure, but my last one decided to kill my RAM too ...

Some good caps to repair the MB can sometimes help too ^^
ID: 2018708 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 2018768 - Posted: 13 Nov 2019, 8:17:44 UTC
Last modified: 13 Nov 2019, 9:09:26 UTC

And, whatever you do, don't forget Murphy's Law. Whatever you decide to keep as spares, it'll be something else that fails ;-)

P.

ps, Never had anything fail in a pc, only hard drives in a NAS box.
ID: 2018768 · Report as offensive
Ianab
Volunteer tester

Joined: 11 Jun 08
Posts: 732
Credit: 20,635,586
RAC: 5
New Zealand
Message 2018772 - Posted: 13 Nov 2019, 10:04:26 UTC
Last modified: 13 Nov 2019, 10:07:10 UTC

Just keep a whole "warm spare".

Some random dumpster dived i3 (depends on the quality of your local dumpsters), that's all set up and ready to go.

If one of your real machines dies, you either have all the parts to fix it or, if it has lost a system board, you have a working spare PC.

Power isn't an issue, because it's not plugged in. Space isn't an issue because a PC case doesn't take up any more space than a box of PC parts.

Wife puts up with my "spares" because she knows that if her (dumpster dived) PC dies, I can have a serviceable backup under her desk in 5 mins.

The 10 "warm spares" in the corner may suggest it's time to have a scrap metal session though :-D There are some perfectly good C2D machines in there with valid Win10s, and if that's all you had, they would run and "work". It's just that they are likely worth more as $2 of scrap metal.

But just pick your best "old" machine and stick it in the corner, rather than scrapping or selling it. Then no matter what fails, you have good parts to fix it.
ID: 2018772 · Report as offensive
Ianab
Volunteer tester

Joined: 11 Jun 08
Posts: 732
Credit: 20,635,586
RAC: 5
New Zealand
Message 2018773 - Posted: 13 Nov 2019, 10:15:07 UTC - in response to Message 2018768.  



ps, Never had anything fail in a pc, only hard drives in a NAS box.


I've seen pretty much EVERY part of a PC fail, but that's over 100s of machines and several decades. Heck, throw in a good thunderstorm and I've seen PCs where EVERY part was toasted. When the modem cable is welded into the socket on both the PC and wall end, it's likely there was some current and voltage slightly over spec.... By maybe 100,000 volts?

But hey, a "warm spare" in the cupboard would still be good, once you got power and internet back on again.
ID: 2018773 · Report as offensive
Kissagogo27 · Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2018783 - Posted: 13 Nov 2019, 11:40:43 UTC

and don't forget to have a backup copy of your BOINC & BOINC_DATA folders ^^
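
A minimal sketch of one way to take such a copy, assuming Python is available and the BOINC client is stopped first so the files are not changing mid-copy. The function name `archive_folder` and any paths you pass to it are hypothetical; point it at wherever your own BOINC and BOINC_DATA folders actually live.

```python
import shutil
import time
from pathlib import Path


def archive_folder(source: Path, dest_dir: Path) -> Path:
    """Zip the contents of `source` into `dest_dir` and return the archive path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp the archive name so repeated backups do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = dest_dir / f"{source.name}-{stamp}"
    # make_archive appends ".zip" itself and returns the final file name.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(source))
    return Path(archive)
```

Calling it once per folder (one call for BOINC, one for BOINC_DATA) and keeping the archives on a different disk is the point; a backup on the drive that dies is no backup.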
ID: 2018783 · Report as offensive
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2018784 - Posted: 13 Nov 2019, 11:58:01 UTC - in response to Message 2018783.  

and don't forget to have a backup copy of your BOINC & BOINC_DATA folders ^^

+1
A proud member of the OFA (Old Farts Association).
ID: 2018784 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2018801 - Posted: 13 Nov 2019, 15:25:03 UTC

In reality the only files you need to back up are configuration files, executables and libraries. Data is another matter: unless you back up continuously, a copy is soon stale because the data is continuously changing, and in the event of a really bad crash you just wave goodbye to a load of tasks. That is why it is a bad idea to have over-inflated caches: having a couple of hundred tasks waiting to time out is one thing, but having several thousand is even worse.
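
A rough sketch of the "configuration files only" idea, in Python. The file-name patterns below are assumptions for illustration (examples of names commonly seen in a BOINC data directory), not a complete list, and the function name `copy_config_files` is hypothetical. Executables and libraries would need their own patterns; check your own installation before relying on it.

```python
import shutil
from pathlib import Path

# Assumption: illustrative patterns only; adjust to the files you actually use.
CONFIG_PATTERNS = (
    "cc_config.xml",
    "app_config.xml",
    "global_prefs_override.xml",
    "account_*.xml",
)


def copy_config_files(data_dir: Path, backup_dir: Path, patterns=CONFIG_PATTERNS):
    """Copy files matching `patterns` from `data_dir` to `backup_dir`.

    The folder layout is preserved, and everything that does not match
    (task data, results, and so on) is left behind.
    """
    copied = []
    for pattern in patterns:
        for src in data_dir.rglob(pattern):
            if not src.is_file():
                continue
            dest = backup_dir / src.relative_to(data_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also keeps timestamps
            copied.append(dest)
    return copied
```

Because these files rarely change, a copy like this stays useful for a long time, which is the contrast with the task data.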
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2018801 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018805 - Posted: 13 Nov 2019, 15:45:56 UTC - in response to Message 2018801.  

In reality the only files you need to back up are configuration files, executables and libraries. Data, unless you backup continuously as it is continuously changing, and, in the event of a really bad crash you just wave goodbye to a load of tasks - which is why it is a bad idea to have over-inflated caches as having a couple of hundred tasks waiting to time-out is one thing, but to have several thousand is even worse.


you can detach the system from the project, which abandons all the tasks immediately and they get redistributed to users. no waiting for timeout.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018805 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22202
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2018807 - Posted: 13 Nov 2019, 15:51:04 UTC

But does that work if the disk on which they are is "a smoldering lump of rubble"?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2018807 · Report as offensive
Mr. Kevvy · Crowdfunding Project Donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2018808 - Posted: 13 Nov 2019, 15:54:38 UTC - in response to Message 2018801.  

....which is why it is a bad idea to have over-inflated caches as having a couple of hundred tasks waiting to time-out is one thing, but to have several thousand is even worse.


This assumes that the person is just going to abandon them and let them time out. When I lost a hard drive full of tasks, I recovered all of them. By doing so I was able to improve the process and, more importantly, get the resend limit increased from 20 to 80 per request. So, it was actually a net positive for everyone.
ID: 2018808 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2018809 - Posted: 13 Nov 2019, 15:56:34 UTC - in response to Message 2018807.  

But does that work if the disk on which they are is "a smoldering lump of rubble"?


I don't see why not.

reinstall OS/BOINC to new disk
set new host name to be the same as before
grab the "Number of times client has contacted server" from the host details page
increment that number by one and add to your client_state.xml file
new system looks like the "old" system
detach.
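
The "increment that number by one and add to your client_state.xml" step could be sketched roughly like this in Python. This is only an illustration under assumptions: it assumes the counter is stored in an element named `rpc_seqno` in client_state.xml (verify this against your own file), that there is exactly one such element, and that BOINC is stopped while the file is edited. The function name `set_contact_count` is hypothetical.

```python
import re

# Assumption: the contact counter lives in <rpc_seqno>...</rpc_seqno>.
SEQNO_PATTERN = re.compile(r"(<rpc_seqno>)\s*\d+\s*(</rpc_seqno>)")


def set_contact_count(state_xml: str, count_from_host_page: int) -> str:
    """Return the XML text with the counter set to the host-page value plus one."""
    new_value = count_from_host_page + 1
    updated, hits = SEQNO_PATTERN.subn(
        lambda m: f"{m.group(1)}{new_value}{m.group(2)}", state_xml
    )
    if hits != 1:
        # Refuse to guess if the element is missing or appears more than once
        # (for example, when the client is attached to several projects).
        raise ValueError(f"expected exactly one rpc_seqno element, found {hits}")
    return updated
```

Read the file into a string, pass it through the function with the number from the host details page, and write the result back, keeping a copy of the original file in case something goes wrong.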
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2018809 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65747
Credit: 55,293,173
RAC: 49
United States
Message 2018850 - Posted: 13 Nov 2019, 22:45:32 UTC

Video cards and PSUs are what I'd have a few extras of, just in case.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 2018850 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34746
Credit: 261,360,520
RAC: 489
Australia
Message 2018851 - Posted: 13 Nov 2019, 22:50:39 UTC

Here it's 2 "warm spares", a PSU, 2 monitors and a 250GB SSD.

Cheers.
ID: 2018851 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.