To checkpoint or not -- the wear and tear of SSD drives

petri33 (Project Donor)
Volunteer tester
Joined: 6 Jun 02
Posts: 1465
Credit: 270,690,202
RAC: 296,080
Finland
Message 1877081 - Posted: 6 Jul 2017, 9:12:01 UTC

Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that?
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1877081

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49070
Credit: 879,025,185
RAC: 201,002
United States
Message 1877082 - Posted: 6 Jul 2017, 9:22:43 UTC - in response to Message 1877081.  

Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that?

If I am not mistaken, checkpointing is mostly controlled by BOINC, not the application.
There is a setting in your computing preferences, "Request tasks to checkpoint at most every ... seconds", with the default being 60 seconds. I am not sure whether the app also has something hardwired to do checkpointing or to override the preference setting.
The end result is that it limits the amount of work lost if the task is interrupted for any reason.
I am also not certain whether the checkpointing operation has anything to do with updating the progress info reported by BOINC, but I don't think so.
A kitty keeps loneliness away.
More meowing, less hissing. I speak meow, do you?

Have made friends in this life.
Most were cats.
ID: 1877082

Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 132
Credit: 45,453,268
RAC: 34,642
Italy
Message 1877083 - Posted: 6 Jul 2017, 9:33:06 UTC - in response to Message 1877081.  

In any case, I also keep a normal legacy hard drive for backups and low-speed tasks, and I keep the BOINC data partition on that so that I do not stress the SSD. SETI surely does not get slowed down by not using the SSD.

Looking at the HDD blinking regularly and thinking that this happens to the SSD made me take this decision as soon as I got my first SSD!

Sleepy
ID: 1877083

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8896
Credit: 115,398,846
RAC: 70,493
Australia
Message 1877084 - Posted: 6 Jul 2017, 9:34:43 UTC - in response to Message 1877081.  

Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that?

Wear and tear of SSD drives isn't an issue.
Tech Report did a "Test till they die" article on several consumer SSDs. They lasted way, way, way, way, way longer than their rated limits.
USB thumb drives, particularly small-capacity ones, are what will be at risk of early death due to write limitations.

Reporting progress is necessary: how many people have reset or powered down their system because they thought nothing was happening, even though it was?
Likewise with processing work: something that takes 10+ minutes to process with no indication that it is actually being processed is likely to be aborted by the user if they see it sitting there with no (visible) progress.

And to process a WU for 14 min, only to have something happen and have to start all over again, would be somewhat annoying.
Ideally the application would honor BOINC's settings for checkpointing. Give the user the option to increase the frequency, or to disable it entirely if they want to, through configuration settings. But don't remove the function completely.
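
A rough sketch of what "honor the interval, allow an override, keep reporting progress" could look like. This is plain Python for illustration only, not the optimized app's code; CheckpointThrottle, run, save_state and report_progress are made-up names. As far as I know the real BOINC API covers the same ground with boinc_time_to_checkpoint(), boinc_checkpoint_completed() and boinc_fraction_done().

```python
import time


class CheckpointThrottle:
    """Decides when a checkpoint is due, given a minimum interval in seconds."""

    def __init__(self, min_interval_s=60.0, clock=time.monotonic):
        # 60 s mirrors the default preference mentioned in this thread.
        # min_interval_s=None is the hypothetical "disable checkpoints" override.
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last = clock()

    def due(self):
        if self.min_interval_s is None:
            return False
        return self.clock() - self.last >= self.min_interval_s

    def done(self):
        # Call after the state has been written, to restart the interval.
        self.last = self.clock()


def run(total_iters, throttle, save_state, report_progress):
    """Work loop: progress is reported every iteration, disk is touched rarely."""
    for i in range(1, total_iters + 1):
        # ... one unit of real work would go here ...
        report_progress(i / total_iters)  # cheap, no disk write involved
        if throttle.due():
            save_state(i)                 # the only place that writes to disk
            throttle.done()
```

The point of the split is that progress reporting stays frequent while disk writes are bounded by time, no matter how fast the iterations run.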
Grant
Darwin NT
ID: 1877084

petri33 (Project Donor)
Volunteer tester
Joined: 6 Jun 02
Posts: 1465
Credit: 270,690,202
RAC: 296,080
Finland
Message 1877087 - Posted: 6 Jul 2017, 10:23:51 UTC

OK. So it is a non issue.
And I looked at the code. The app writes to disk only at specified intervals.
Case closed.
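
For a feel of the numbers, here is a back-of-the-envelope estimate of how much interval-based checkpointing writes per year. The 1 MB checkpoint size is purely an assumption for illustration (the thread does not give the real size), and the function name is made up. Write amplification inside the SSD is ignored.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_checkpoint_writes_gb(interval_s, checkpoint_bytes, tasks_in_parallel=1):
    """Host writes per year (decimal GB) if every task checkpoints each interval."""
    writes_per_year = SECONDS_PER_YEAR / interval_s * tasks_in_parallel
    return writes_per_year * checkpoint_bytes / 1e9


# Assumed: one task, a 60 s interval, a hypothetical 1 MB checkpoint file.
print(f"{yearly_checkpoint_writes_gb(60, 1_000_000):.1f} GB per year")
```

Under those assumptions the result is roughly half a terabyte per year, which is small next to the endurance figures quoted in this thread.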
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1877087

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2452
Credit: 185,962,151
RAC: 368,347
United States
Message 1877105 - Posted: 6 Jul 2017, 15:41:07 UTC - in response to Message 1877084.  

I too at first worried about the constant writes to my SSDs. I too saw and read that article about the tested life expectancy of consumer SSDs and then never worried about it again. I will replace any SSD just because of newer technology or more capacity needed before any of my current SSDs die. My first SSD used on BOINC accumulated 570 GB of writes. It didn't die in the Tech Report tests until it hit the PB range. One of my current SSDs used for BOINC has accumulated only 23 GB of writes so far after 3 years.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1877105

Zalster (Project Donor)
Volunteer tester
Joined: 27 May 99
Posts: 3993
Credit: 208,969,221
RAC: 34,758
United States
Message 1877113 - Posted: 6 Jul 2017, 16:00:11 UTC - in response to Message 1877105.  

I had this argument with my fellow computer geek here at work about his SSD. He also cited the article and said he wasn't worried about it. Then it died just over 1 year after he got it...

Good thing is the price on SSD is coming down so it's easier to replace them when they do fail (because, evidently they DO) lol
ID: 1877113

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2452
Credit: 185,962,151
RAC: 368,347
United States
Message 1877146 - Posted: 6 Jul 2017, 18:53:32 UTC - in response to Message 1877113.  
Last modified: 6 Jul 2017, 18:54:24 UTC

Well the old rule of 30-day infant mortality in consumer products basically handled manufacturing defects. With modern manufacturing quality control systems, that rule basically needs to get thrown out or heavily modified. Maybe it should be the 1-year infant mortality rule. Also with the shift to total solid state electronics, you get rid of the electrolytic capacitor handicap. If you can escape power or thermal environment problems, then the likely cause of failure in modern solid state electronics is cosmic ray bombardment. Or operator error.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1877146

Mike (Project Donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 30625
Credit: 57,735,459
RAC: 30,214
Germany
Message 1877154 - Posted: 6 Jul 2017, 20:35:10 UTC - in response to Message 1877105.  

I too at first worried about the constant writes to my SSDs. I too saw and read that article about tested life expectancy of consumer SSD and then never worried about it again. I will replace any SSD just because of newer technology or more capacity needed before any of my current SSDs die. My first SSD used on BOINC accumulated 570 GB of writes. It didn't die in the Tech Report tests until it hit the PB range. One of my current SSD used for BOINC has accumulated only 23 GB of writes so far after 3 years.


I guess you mean TB not GB.
Mine has 21 TB written in 15 months but should be able to handle 70 TB.
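
Worked out as a quick projection (a sketch that assumes the write rate stays constant, which real usage may not; the function name is made up):

```python
def months_until_rating(written_tb, months_elapsed, rated_tbw):
    """Months left until the rated TB-written figure is reached at the current rate."""
    rate_tb_per_month = written_tb / months_elapsed
    return (rated_tbw - written_tb) / rate_tb_per_month


# Figures from this post: 21 TB written in 15 months, 70 TB rating.
remaining = months_until_rating(21, 15, 70)
print(f"about {remaining:.0f} months to go at 1.4 TB per month")
```

That works out to roughly 35 more months before the rated figure is reached, and the rating itself is a warranty number rather than a point of guaranteed failure.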
With each crime and every kindness we birth our future.
ID: 1877154

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2452
Credit: 185,962,151
RAC: 368,347
United States
Message 1877160 - Posted: 6 Jul 2017, 20:52:23 UTC - in response to Message 1877154.  

OK, you made me check the figures again. The SMART data in SIV shows 23 GB of Total Host Sector Writes, but I just checked again, this time with the Crucial Storage Executive app, and it shows 11.95 TB written. Big difference. The SMART IDs are different between the Kingston and Crucial drives, so the naming conventions are different and the exposed values are not the same either. The 11.95 TB value makes more sense, as does using the app designed for the drive, I guess.
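
One possible explanation for the gap, offered only as a guess and not confirmed anywhere in this thread: a SMART counter named "sector writes" may count sectors rather than bytes. The sketch below assumes 512-byte sectors (an assumption, drives differ) and uses made-up helper names to show how much the unit matters.

```python
def counter_to_tb(raw_count, bytes_per_unit):
    """Convert a raw SMART write counter to decimal terabytes."""
    return raw_count * bytes_per_unit / 1e12


def tb_to_counter(tb, bytes_per_unit):
    """Inverse: how many counter units a given number of terabytes represents."""
    return tb * 1e12 / bytes_per_unit


SECTOR_BYTES = 512  # assumption: the counter is in 512-byte sectors

# How many 512-byte sectors would 11.95 TB of writes be?
sectors = tb_to_counter(11.95, SECTOR_BYTES)
print(f"{sectors / 1e9:.1f} billion sectors")
```

If the raw value is really a sector count, reading it as gigabytes would understate the total by a factor of about 500, which is in the same ballpark as the difference between the two tools.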
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1877160

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6468
Credit: 176,095,653
RAC: 56,058
United States
Message 1877219 - Posted: 7 Jul 2017, 4:56:53 UTC - in response to Message 1877154.  

I too at first worried about the constant writes to my SSDs. I too saw and read that article about tested life expectancy of consumer SSD and then never worried about it again. I will replace any SSD just because of newer technology or more capacity needed before any of my current SSDs die. My first SSD used on BOINC accumulated 570 GB of writes. It didn't die in the Tech Report tests until it hit the PB range. One of my current SSD used for BOINC has accumulated only 23 GB of writes so far after 3 years.


I guess you mean TB not GB.
Mine has 21 TB written in 15 month but should be able to handle 70 TB.

It seems like most SSD manufacturers are using TB written as part of the warranty these days, with a limit of about 80-100 TBW per 256 GB of drive capacity.
Looks like my old 240 GB SSD is at 47 TB of NAND writes and 94% life.
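
The arithmetic behind those two figures, as a small sketch. It assumes the reported life percentage falls linearly with writes, which a given drive's firmware may or may not do, and the function names are made up.

```python
def implied_total_endurance_tb(written_tb, life_remaining_pct):
    """Total TB the drive would absorb if life drops linearly with writes."""
    used_pct = 100 - life_remaining_pct
    if used_pct <= 0:
        raise ValueError("no wear reported yet, cannot extrapolate")
    return written_tb * 100 / used_pct


def full_drive_writes(tbw, capacity_gb):
    """How many times the whole drive can be rewritten within a TBW rating."""
    return tbw * 1000 / capacity_gb


# 47 TB written at 94% life remaining, from this post.
print(f"implied endurance: {implied_total_endurance_tb(47, 94):.0f} TB")
# Warranty range of 80-100 TBW per 256 GB, from this post.
print(f"{full_drive_writes(80, 256):.0f} to {full_drive_writes(100, 256):.0f} full-drive writes")
```

On those assumptions the drive's own wear counter implies several times more endurance than the warranty figure, which fits with the Tech Report results mentioned earlier in the thread.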
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1877219

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2913
Credit: 10,887,293
RAC: 479
United States
Message 1877220 - Posted: 7 Jul 2017, 5:28:20 UTC

The article is a few years old, but most of the drives that Tech Report tested were able to write a petabyte before they started having serious problems.

Maybe the first versions of SSDs were a bit more fragile, but now they have smart controllers that do automatic wear leveling to keep any one area from being written to more than the others.

General rule of thumb is:
-SLCs are the most durable.
-MLCs are pretty reliable.
-TLCs seem to be the most fragile, but still pretty robust: they can usually do 200+ TB just fine before having bad cells.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1877220

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2452
Credit: 185,962,151
RAC: 368,347
United States
Message 1877233 - Posted: 7 Jul 2017, 7:37:18 UTC

Yes, I figure my first SSD, the Kingston HyperX 120GB 5K, should be able to go past 1 PB since the cheaper 3K sibling made it to 800 TB in the Tech Report test. It is an MLC drive, and the same goes for my Crucial MX100 and MX200 drives. I have the most concern over the new Samsung 960 EVO M.2 drive, which is a 3D TLC drive. I don't think TLC drives are as robust as the older MLC drives, even with the larger cell size of 3D NAND.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1877233

Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 12615
Credit: 169,690,740
RAC: 87,171
Australia
Message 1877236 - Posted: 7 Jul 2017, 7:48:58 UTC

Even though my 2 main rigs have SSDs, BOINC resides only on their mechanical drives.

Cheers.
ID: 1877236

MarkJ (Project Donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1044
Credit: 50,384,397
RAC: 2,298
Australia
Message 1877387 - Posted: 8 Jul 2017, 1:48:33 UTC - in response to Message 1877081.  
Last modified: 8 Jul 2017, 1:51:23 UTC

Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that?

I'm running the optimised app on GTX1060's (3GB) and they rip through a work unit in 5 mins or less. The machines all have HDD with a 2 minute checkpoint interval. I wouldn't bother with the complexity of checkpoints, but a percentage progress is always good to see.

Thanks for the great app. By the way if you're interested the Asteroids at home CUDA app needs optimising :-)
BOINC blog
ID: 1877387

petri33 (Project Donor)
Volunteer tester
Joined: 6 Jun 02
Posts: 1465
Credit: 270,690,202
RAC: 296,080
Finland
Message 1877429 - Posted: 8 Jul 2017, 8:51:31 UTC - in response to Message 1877387.  
Last modified: 8 Jul 2017, 8:51:46 UTC

Q: Should the optimized app make checkpoints and report progress? How often? Based on time or number of internal iterations? Does it need a command line option for that?

I'm running the optimised app on GTX1060's (3GB) and they rip through a work unit in 5 mins or less. The machines all have HDD with a 2 minute checkpoint interval. I wouldn't bother with the complexity of checkpoints, but a percentage progress is always good to see.

Thanks for the great app. By the way if you're interested the Asteroids at home CUDA app needs optimising :-)


Thanks,
I'll join Asteroids when the next major outage (days long) happens.
Then I'll take a look at the source code.
But my guess is that there is not much to optimize since the world is full of good programmers.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1877429

Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2638
Credit: 48,883,130
RAC: 138,116
Australia
Message 1877879 - Posted: 11 Jul 2017, 23:40:17 UTC - in response to Message 1877233.  

Yes, I figure my first SSD the Kingston HyperX 120GB 5K should be able to go past 1 PB since the cheaper 3K sibling made it to 800 TB in the Tech Report test. It is a MLC drive and the same for my Crucial MX100 and MX200 drives. I have the most concern over the new Samsung 960 EVO M.2 drive which is a 3D TLC drive. I don't think TLC drives are as robust as the older MLC drives even with the larger cell size of 3D NAND.


. . Hi Keith,

. . OK, what are MLC and TLC drives? I am not familiar with the terminology. And I wish you had not written that about your new SSD, the drive in my new rig is a 250GB Samsung 960 EVO M.2 ... but it is the system drive and does not house the S@H files.

Stephen

? :( ?
ID: 1877879

Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 2452
Credit: 185,962,151
RAC: 368,347
United States
Message 1877895 - Posted: 12 Jul 2017, 1:05:44 UTC - in response to Message 1877879.  

Yes, I figure my first SSD the Kingston HyperX 120GB 5K should be able to go past 1 PB since the cheaper 3K sibling made it to 800 TB in the Tech Report test. It is a MLC drive and the same for my Crucial MX100 and MX200 drives. I have the most concern over the new Samsung 960 EVO M.2 drive which is a 3D TLC drive. I don't think TLC drives are as robust as the older MLC drives even with the larger cell size of 3D NAND.


. . Hi Keith,

. . OK what are MLC and TLC drives? I am not familiar with the terminology? And I wish you had not written that about your new SSD, the drive in my new rig is a 250GB Samsung 960 EVO M.2 ... but it is the system drive and does not house the S@H files.

Stephen

? :( ?

MLC and TLC refer to how many bits each NAND cell stores, which sets how many voltage levels the cell has to distinguish. SLC is one bit per cell (2 voltage levels), MLC is 2 bits per cell (4 voltage levels) and TLC is 3 bits per cell (8 distinct voltage levels). The scale runs SLC, MLC and then TLC with regard to data integrity robustness and the number of P/E (program/erase) cycles a cell can endure before being taken out of service by the flash controller. SLC is also faster than MLC and TLC and is usually put in data center product lines. The rule of thumb got upset a bit when the 3D cell structures came into play, because they have larger cell features and can handle more erases than even MLC at smaller feature size. They are too new in the marketplace to see where they fall on the lifespan curve, I believe. My Kingston HyperX 5K SSD has 25 nm MLC cells and my Crucial MX100 and MX200 drives have 16 nm MLC cells. I still have 100% capacity on the HyperX drives but the Crucial drives are already down to 86%. The Crucial MX100 drives are 2 years newer than the Kingston HyperX drives, which are going on 5 years now.
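
The bits-per-cell relationship in a few lines, for anyone who prefers to see it spelled out: a cell that stores n bits has to tell 2**n charge levels apart, which is why the margin between levels shrinks as more bits are packed in. The names below are just for illustration.

```python
# Bits stored per NAND cell for each cell type.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def voltage_levels(bits):
    """Number of distinct charge levels a cell must distinguish to hold `bits` bits."""
    return 2 ** bits


for name, bits in BITS_PER_CELL.items():
    print(f"{name}: {bits} bit(s) per cell, {voltage_levels(bits)} levels")
```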

I put the Samsung 960 EVO M.2 into the new Ryzen build simply because I was curious about the technology and form factor and the new motherboard supported it. I can't say that I can really perceive much of a difference in day-to-day operations between the Ryzen system and my older FX systems, even though the M.2 drive has 5 times the benchmark performance. For how I use the computers, mostly as BOINC crunchers, a simple SATA SSD is all that I need. Whether the new Samsung drive with its 3D NAND holds up to the constant writes under BOINC is the unknown. I have BOINC installed on the boot drives in all my systems. I figure I will replace or upsize the drives in the future before I run into capacity degradation or the drives fail. Crossing my fingers, I guess.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1877895

Karsten Vinding
Volunteer tester
Joined: 18 May 99
Posts: 182
Credit: 19,544,912
RAC: 23,568
Denmark
Message 1877947 - Posted: 12 Jul 2017, 12:21:43 UTC - in response to Message 1877895.  
Last modified: 12 Jul 2017, 12:42:42 UTC

I have been running SSDs for some (many) years now.

My very first was a 60 GB OCZ Agility, bought at the time when the Intel 64/128/256 GB drives were the best performing ones. I remember being stunned by the speedup it gave my system.
Nowadays I feel stunned when I work on a computer with a normal HDD, but not in a good way...

Since then I upgraded to a Samsung 840 EVO, and all my other PCs are running with various SSDs.

At first, when I only had the 64 GB Agility, I was very careful not to write to it. It more or less contained only the OS. Swap was moved to a HDD.
Later I moved the swap to the SSD. It didn't show any problems.

Later on it became a secondary drive as the EVO moved in. Now it maintained the BOINC drive and some of the Steam games. Wear did go up, but not at an alarming level.

It lived this way until about a year ago, when remaining lifetime showed 15%. It also seemed that it had started using its spare blocks, as the reallocation count was going up, so it was getting worn out.

I decided it should live its last write cycles in my PS3. This killed it in a matter of 8 months, with the PS3 mainly used for streaming movies via Netflix / Plex.
As the PS3 does not have much memory it caches streams to disk, so the drive probably saw a lot of writes during this last period. One day, without warning, the PS3 wouldn't boot; the Agility was dead.

All in all the Agility lasted me more than 7 (close to 8) years; I have had many hard drives that didn't last as long. And this is for a small 60 GB drive, with less capacity to do its wear leveling. A 120 or 240 GB drive would have lasted much longer under the same conditions.

I for one am not worried about wear. None of my current drives are below 95% wear leveling, despite being used without any special settings besides the ones Windows / CentOS / Ubuntu set themselves.

SSDs can fail prematurely, as can any piece of electronics, but I consider the technology mature and reliable. We will soon pass the 10-year mark for normal consumer availability of these drives.
My latest hardware failure was a good old-fashioned HDD, only 4 months old.
ID: 1877947

tullio (Project Donor)
Volunteer moderator
Volunteer tester
Joined: 9 Apr 04
Posts: 6301
Credit: 1,688,218
RAC: 1,600
Italy
Message 1877964 - Posted: 12 Jul 2017, 14:39:00 UTC

I have installed two SSDs in my HP laptop running BOINC on Linux. The first was an OCZ, the second a Samsung, and they both failed. Then I installed a 1 TB Seagate hybrid disk and it has had no problems.
Tullio
ID: 1877964


 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.