To checkpoint or not -- the wear and tear of SSD drives
Author | Message |
---|---|
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that? To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that? If I am not mistaken, checkpointing is mostly controlled by BOINC, not the application. There is a setting in your computing preferences ("Request tasks to checkpoint at most every ... seconds"), with the default being 60 seconds. I am not sure whether the app also has something hardwired to do checkpointing or to override the preference setting. The end result is that it limits the amount of work lost if the task is interrupted for any reason. I am also not certain whether the checkpointing operation has anything to do with updating the progress info reported by BOINC, but I don't think so. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
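The behaviour described above, where tasks checkpoint no more often than a user-set minimum interval, comes down to a simple time gate. Below is a minimal sketch of that logic in Python; it is illustrative only and is not the BOINC API (in the real C API the application polls boinc_time_to_checkpoint() and calls boinc_checkpoint_completed() after writing state):

```python
import time


class CheckpointGate:
    """Write state no more often than the user's minimum interval."""

    def __init__(self, min_interval_s=60.0):  # BOINC's default preference is 60 s
        self.min_interval_s = min_interval_s
        self.last_checkpoint = 0.0

    def maybe_checkpoint(self, save_state, now=None):
        """Call save_state() only if the minimum interval has elapsed."""
        now = time.monotonic() if now is None else now
        if now - self.last_checkpoint >= self.min_interval_s:
            save_state()                 # persist progress to disk
            self.last_checkpoint = now
            return True
        return False                     # too soon; skip the write
```

Under a gate like this, an app that could checkpoint every internal iteration still touches the disk at most once per interval, which is why checkpoint frequency, not app speed, drives the write load discussed later in the thread.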
Sleepy Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
In any case I also keep a normal legacy hard drive for backups and low-speed tasks, and I keep the BOINC data partition on it, so that I do not stress the SSD. SETI surely does not get slowed down by not using the SSD. Looking at the HDD light blinking regularly, and thinking of that happening to an SSD, made me take this decision as soon as I got my first SSD! Sleepy |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that? Wear and tear of SSD drives isn't an issue. Tech Report ran a "test until they die" series on several consumer SSDs, and they lasted far, far longer than their rated limits. USB thumb drives, particularly small-capacity ones, are the devices at risk of early death from write limits. Reporting progress is necessary: how many people have reset or powered down their system because they thought nothing was happening, even though it was? Likewise with processing work: something that takes 10+ minutes with no visible indication that it is actually being processed is likely to be aborted by the user. And processing a WU for 14 minutes, only to have something happen and have to start all over again, would be somewhat annoying. Ideally the application would honor BOINC's settings for checkpointing. Give the user the option to increase the frequency, or to disable it entirely, through configuration settings, but don't remove the function completely. Grant Darwin NT |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
OK. So it is a non-issue. And I looked at the code: the app writes to disk only at specified intervals. Case closed. |
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I too at first worried about the constant writes to my SSDs. I too saw and read that article about the tested life expectancy of consumer SSDs, and then never worried about it again. I will replace any SSD for newer technology or more capacity long before any of my current SSDs die. My first SSD used for BOINC accumulated 570 GB of writes; that model didn't die in the Tech Report tests until it hit the PB range. One of my current SSDs used for BOINC has accumulated only 23 GB of writes after 3 years. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
I had this argument with my fellow computer geek here at work about his SSD. He also cited the article and said he wasn't worried about it. Then it died just over a year after he got it... The good thing is that the price of SSDs is coming down, so it's easier to replace them when they do fail (because, evidently, they DO) lol |
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Well, the old rule of 30-day infant mortality in consumer products basically covered manufacturing defects. With modern manufacturing quality-control systems, that rule needs to be thrown out or heavily modified; maybe it should be a 1-year infant-mortality rule. Also, with the shift to totally solid-state electronics, you get rid of the electrolytic-capacitor handicap. If you can escape power or thermal environment problems, then the likely cause of failure in modern solid-state electronics is cosmic ray bombardment. Or operator error. |
Mike Joined: 17 Feb 01 Posts: 34253 Credit: 79,922,639 RAC: 80 |
I too at first worried about the constant writes to my SSDs. I too saw and read that article about tested life expectancy of consumer SSD and then never worried about it again. I will replace any SSD just because of newer technology or more capacity needed before any of my current SSDs die. My first SSD used on BOINC accumulated 570 GB of writes. It didn't die in the Tech Report tests until it hit the PB range. One of my current SSD used for BOINC has accumulated only 23 GB of writes so far after 3 years. I guess you mean TB, not GB. Mine has 21 TB written in 15 months but should be able to handle 70 TB. With each crime and every kindness we birth our future. |
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
OK, you made me check the figures again. The SMART data in SIV shows 23 GB of Total Host Sector Writes, but I just checked again with the Crucial Storage Executive app and it shows 11.95 TB written. Big difference. The SMART IDs differ between the Kingston and Crucial drives, so the naming conventions are different and the exposed values are not the same either. The 11.95 TB value makes more sense, and I guess the app designed for the drive is the one to trust. |
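A likely source of the GB-vs-TB confusion above is the unit of the raw SMART counter: "host writes" attributes are reported in vendor-specific units (512-byte sectors on some drives, larger chunks on others), so two tools decoding the same counter with different unit assumptions can disagree by orders of magnitude. A sketch of the conversion; the raw value and the 32 MiB chunk size are made-up illustrations, not Kingston's or Crucial's actual encoding:

```python
def host_writes_tb(raw_count, unit_bytes):
    """Convert a raw SMART 'host writes' counter to decimal terabytes.

    unit_bytes is vendor-specific and is the whole trick: 512 for
    sector-based counters, larger for drives that count in chunks.
    """
    return raw_count * unit_bytes / 1e12


raw = 45_000_000                                  # hypothetical raw counter value
as_sectors = host_writes_tb(raw, 512)             # ~0.023 TB, i.e. ~23 GB
as_chunks = host_writes_tb(raw, 32 * 1024 ** 2)   # ~1510 TB with 32 MiB units
```

The same raw number reads as tens of gigabytes or as petabyte-class writes depending on the assumed unit, which is why the vendor's own tool is the safer decoder.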
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I too at first worried about the constant writes to my SSDs. I too saw and read that article about tested life expectancy of consumer SSD and then never worried about it again. I will replace any SSD just because of newer technology or more capacity needed before any of my current SSDs die. My first SSD used on BOINC accumulated 570 GB of writes. It didn't die in the Tech Report tests until it hit the PB range. One of my current SSD used for BOINC has accumulated only 23 GB of writes so far after 3 years. It seems like most SSD manufacturers are using TB written (TBW) as part of the warranty these days, with a limit of about 80-100 TBW per 256 GB of drive capacity. Looks like my old 240 GB SSD is at 47 TB of NAND writes and 94% life. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) |
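The rule of thumb above (roughly 80-100 TBW of warranty per 256 GB of capacity) scales linearly with drive size, so a quick estimate for any capacity is straightforward. The 90 TBW midpoint used here is an assumption for illustration, not any vendor's published figure:

```python
def warranty_tbw_estimate(capacity_gb, tbw_per_256gb=90):
    """Estimate warranty TBW by scaling the ~90 TBW per 256 GB rule of thumb."""
    return capacity_gb / 256 * tbw_per_256gb


# The 240 GB drive above, at 47 TB of NAND writes, against its estimated allowance:
allowance = warranty_tbw_estimate(240)   # ~84 TBW
used_fraction = 47 / allowance           # a bit over half
```

By this estimate the drive has used over half of its warranty writes even though its SMART life indicator still reads 94%; the two metrics track different things (host TBW versus NAND P/E wear), so they need not agree.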
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
A few years old now, but most of the drives Tech Report tested were able to write a petabyte before they started having serious problems. Maybe the first generations of SSDs were a bit more fragile, but current drives have smart controllers that do automatic wear-leveling to keep any one area from being written to more than the rest. The general rule of thumb: SLC is the most durable, MLC is pretty reliable, and TLC seems the most fragile but is still fairly robust; TLC drives can usually do 200+ TB just fine before developing bad cells. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
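The automatic wear-leveling mentioned above can be illustrated with a toy allocator that always directs the next erase/write to the least-worn block, keeping erase counts nearly uniform across the flash. This is a deliberate simplification; real controllers also handle static data migration, over-provisioning, and garbage collection:

```python
class ToyWearLeveler:
    """Toy wear-leveling model: always write to the least-erased block."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks

    def write(self):
        # Pick the block with the lowest erase count so far.
        victim = min(range(len(self.erase_counts)),
                     key=self.erase_counts.__getitem__)
        self.erase_counts[victim] += 1
        return victim

    def spread(self):
        """Difference between the most- and least-worn block."""
        return max(self.erase_counts) - min(self.erase_counts)
```

After any number of writes, no block in this model is ever more than one erase ahead of another, which is the property that stops one hot file from burning out a single region of flash.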
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Yes, I figure my first SSD, the Kingston HyperX 120 GB 5K, should be able to go past 1 PB, since the cheaper 3K sibling made it to 800 TB in the Tech Report test. It is an MLC drive, as are my Crucial MX100 and MX200 drives. My biggest concern is the new Samsung 960 EVO M.2 drive, which is a 3D TLC drive. I don't think TLC drives are as robust as the older MLC drives, even with the larger cell size of 3D NAND. |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
Even though my 2 main rigs have SSDs, BOINC resides only on their mechanical drives. Cheers. |
MarkJ Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that? I'm running the optimised app on GTX 1060s (3 GB) and they rip through a work unit in 5 minutes or less. The machines all have HDDs with a 2-minute checkpoint interval. I wouldn't bother with the complexity of checkpoints, but a percentage progress indicator is always good to see. Thanks for the great app. By the way, if you're interested, the Asteroids@home CUDA app needs optimising :-) BOINC blog |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that? Thanks, I'll join Asteroids when the next major (days-long) outage happens. Then I'll take a look at the source code. But my guess is that there is not much to optimize, since the world is full of good programmers. Petri |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Yes, I figure my first SSD the Kingston HyperX 120GB 5K should be able to go past 1 PB since the cheaper 3K sibling made it to 800 TB in the Tech Report test. It is a MLC drive and the same for my Crucial MX100 and MX200 drives. I have the most concern over the new Samsung 960 EVO M.2 drive which is a 3D TLC drive. I don't think TLC drives are as robust as the older MLC drives even with the larger cell size of 3D NAND. . . Hi Keith, . . OK, what are MLC and TLC drives? I am not familiar with the terminology. And I wish you had not written that about your new SSD; the drive in my new rig is a 250GB Samsung 960 EVO M.2 ... but it is the system drive and does not house the S@H files. Stephen ? :( ? |
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Yes, I figure my first SSD the Kingston HyperX 120GB 5K should be able to go past 1 PB since the cheaper 3K sibling made it to 800 TB in the Tech Report test. It is a MLC drive and the same for my Crucial MX100 and MX200 drives. I have the most concern over the new Samsung 960 EVO M.2 drive which is a 3D TLC drive. I don't think TLC drives are as robust as the older MLC drives even with the larger cell size of 3D NAND. MLC and TLC refer to the number of bits stored per NAND cell. SLC is one bit per cell, distinguished by two voltage levels; MLC is 2 bits per cell, requiring four voltage levels; and TLC is 3 bits per cell, requiring eight distinct voltage levels. The ranking is SLC, then MLC, then TLC with regard to data-integrity robustness and the number of P/E (program/erase) cycles a cell can endure before being taken out of service by the flash controller. SLC is also faster than MLC and TLC and usually goes into data-center product lines. The rule of thumb got upset a bit when 3D cell structures came into play, because their larger cell features can handle more erases than even planar MLC at smaller feature sizes; they are too new in the marketplace to see where they fall on the lifespan curve, I believe. My Kingston HyperX 5K SSD has 25 nm MLC cells, and my Crucial MX100 and MX200 drives have 16 nm MLC cells. I still have 100% life on the HyperX drives, but the Crucial drives are already down to 86%, even though the MX100 drives are 2 years newer than the HyperX drives, which are going on 5 years now. I put the Samsung 960 EVO M.2 into the new Ryzen build simply because I was curious about the technology and form factor, and the new motherboard supported it. I can't say that I perceive much difference in day-to-day operations between the Ryzen system and my older FX systems, even though the M.2 drive has a 5x benchmark performance advantage. For how I use the computers, mostly as BOINC crunchers, a simple SATA SSD is all I need. 
Whether the new Samsung drive with its 3D NAND holds up to the constant writes under BOINC is the unknown. I have BOINC installed on the boot drives of all my systems. I figure I will replace or upsize the drives before I run into capacity degradation or drive failure. Crossing my fingers, I guess. |
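The cell types discussed above follow a simple relationship: a cell storing n bits must resolve 2**n distinct charge levels, and the shrinking margin between levels is why endurance drops as bits per cell rise. The P/E-cycle numbers below are rough, commonly quoted ballparks for planar NAND, not measurements:

```python
NAND_TYPES = {
    # name: (bits per cell, rough consumer-grade P/E cycle ballpark)
    "SLC": (1, 100_000),
    "MLC": (2, 10_000),
    "TLC": (3, 3_000),
}


def voltage_levels(bits_per_cell):
    """A cell holding n bits must resolve 2**n distinct charge levels."""
    return 2 ** bits_per_cell


for name, (bits, _) in NAND_TYPES.items():
    print(name, bits, "bits/cell ->", voltage_levels(bits), "levels")
```

Doubling the level count per extra bit is also why TLC needs stronger error correction and more careful programming than SLC, independent of process node or 3D stacking.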
Karsten Vinding Joined: 18 May 99 Posts: 239 Credit: 25,201,931 RAC: 11 |
I have been running SSDs for some (many) years now. My very first was a 60 GB OCZ Agility, bought when the Intel 64/128/256 GB drives were the best performing. I remember being stunned by the speedup it gave my system; nowadays I feel stunned when I work on a normal HDD computer, but not in a good way... Since then I upgraded to a Samsung 840 EVO, and all my other PCs are running various SSDs. At first, when I only had the 60 GB Agility, I was very cautious about not writing to it; it contained more or less only the OS, and the swap file was moved to an HDD. Later I moved the swap file back to it, and it didn't show any problems. Later still it became a secondary drive as the EVO moved in, holding the BOINC data and some of the Steam games. Wear did go up, but not at an alarming rate. It lived this way until about a year ago, when remaining lifetime showed 15%. It also seemed to have started using its spare blocks, as the reallocation count was going up, so it was getting worn out. I decided it should live out its last write cycles in my PS3. That killed it in a matter of 8 months, mainly from using the PS3 to stream movies via Netflix/Plex; as the PS3 does not have much memory, it caches streams to disk, so the drive probably saw a lot of writes during that time. One day, without warning, the PS3 wouldn't boot: the Agility was dead. All in all the Agility lasted me more than 7 (close to 8) years; I have had many hard drives that didn't last as long. And this was a small 60 GB drive, with less capacity for wear leveling; a 120 or 240 GB drive would have lasted much longer under the same conditions. I, for one, am not worried about wear. None of my current drives are below 95% remaining life, despite being used without any special settings besides the ones Windows/CentOS/Ubuntu set themselves. SSDs can fail prematurely, as can any piece of electronics, but I consider the technology mature and reliable. 
We will soon pass the 10-year mark for normal consumer availability of these drives. My latest hardware failure was a good old-fashioned HDD, only 4 months old. |
tullio Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I installed two SSDs in my HP laptop running BOINC on Linux: the first was an OCZ, the second a Samsung, and they both failed. Then I installed a 1 TB Seagate hybrid disk and it has had no problems. Tullio |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.