From 2497631b59d597dd8a7be9d0d5d476a1fe189a2e Mon Sep 17 00:00:00 2001
From: Ariejan de Vroom
Date: Wed, 24 Jan 2024 21:14:12 +0100
Subject: [PATCH] add post prepare new hard disk for zfs nas

---
 ...01-24-prepare-new-hard-disk-for-zfs-nas.md | 101 ++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 content/posts/2024-01-24-prepare-new-hard-disk-for-zfs-nas.md

diff --git a/content/posts/2024-01-24-prepare-new-hard-disk-for-zfs-nas.md b/content/posts/2024-01-24-prepare-new-hard-disk-for-zfs-nas.md
new file mode 100644
index 0000000..fd50e3f
--- /dev/null
+++ b/content/posts/2024-01-24-prepare-new-hard-disk-for-zfs-nas.md
@@ -0,0 +1,101 @@
++++
+date = 2024-01-24
+title = "Prepare a new hard disk for ZFS/NAS"
+tags = ["homelab", "storage", "nas", "zfs"]
++++
+
+You can read more about my homelab and data-hoarding problem [here](https://www.devroom.io/2020/02/28/building-a-diy-home-server-with-freenas/) and [here](https://www.devroom.io/2020/11/12/the-big-diy-nas-update/).
+
+Today I scored two _recertified_ 10TB HGST drives for very little. Normally I'd go for the brand new stuff, but this deal was too good to pass up.
+
+My main goal is to check whether these recertified disks are worth the money and effort. Next up, I want to experiment with some new ZFS pool setups. (You still cannot remove a raidz vdev from your pool in 2024.[^1])
+
+## Current state of affairs
+
+Right now my main ZFS storage pool looks like this:
+
+ * `tank`
+   * `raidz1` 4x 3TB WD Red
+   * `raidz1` 4x 8TB WD White
+   * `raidz1` 4x 14TB WD White
+
+All in all that's good for 100TB of raw storage space and roughly 75TB of usable storage. And yet, it's getting full. Linux ISOs take up a lot of space. I also have a 2TB pool (2x 1TB SSD mirrors) for VM storage and a single 3TB drive as an intermediate backup disk (i.e. backups are copied there first, then uploaded elsewhere).
+
+As stated, I cannot _remove_ any of those `raidz1` vdevs. My only options are to build a new pool with new disks or to replace disks in this pool.
+
+## It's all about trust
+
+So, new drives I normally just trust. If they don't work, they don't work. But if they spin up, I've always assumed they'd be okay. Yeah, I know.
+
+But since I now have a pair of 10TB _recertified_ drives, I'd like to be sure they're good to go. Recertified in this context probably means they were retired from a data center somewhere. Their power-on time is a little over 5 years, with a production date of December 2017.
+
+To make sure these disks are up to the job, I'm going to run a bunch of tests against them to see if they hold up:
+
+ 1. SMART conveyance test
+ 1. SMART extended test
+ 1. `badblocks`
+
+## S.M.A.R.T.
+
+If you don't know about the different SMART self-tests, here's a refresher. I'm skipping the short test, because I'm running the long test anyway.
+
+> **Short** Checks the electrical and mechanical performance as well as the read performance of the disk. Electrical tests might include a test of buffer RAM, a read/write circuitry test, or a test of the read/write head elements. Mechanical test includes seeking and servo on data tracks. Scans small parts of the drive's surface (area is vendor-specific and there is a time limit on the test). Checks the list of pending sectors that may have read errors, and it usually takes under two minutes.
+
+> **Long/extended** A longer and more thorough version of the short self-test, scanning the entire disk surface with no time limit. This test usually takes several hours, depending on the read/write speed of the drive and its size.
+
+> **Conveyance** Intended as a quick test to identify damage incurred during transporting of the device from the drive manufacturer to the computer manufacturer. Only available on ATA drives, and it usually takes several minutes.
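+
+Before kicking off any self-tests, it's worth pulling the drive's identity and current SMART attributes first; that's also where numbers like the power-on hours I mentioned earlier come from. A quick sketch of that check, assuming the new disk shows up as `/dev/sda`:
+
+```
+$ sudo smartctl -i /dev/sda   # model, serial, firmware, capacity
+$ sudo smartctl -A /dev/sda   # attribute table, incl. Power_On_Hours
+$ sudo smartctl -H /dev/sda   # overall health self-assessment
+```
+
+If `-H` reports anything other than PASSED, the longer tests are mostly a formality.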
+
+Running these tests is as easy as:
+
+```
+$ sudo smartctl -t conveyance /dev/sda
+$ sudo smartctl -t long /dev/sda
+```
+
+If you want to know how long these tests are going to take:
+
+```
+$ sudo smartctl -c /dev/sda
+...
+Short self-test routine
+recommended polling time: ( 2) minutes.
+Extended self-test routine
+recommended polling time: (1144) minutes.
+...
+```
+
+So it'll take a little over 19 hours to do a full extended SMART test.
+
+## Badblocks
+
+Badblocks is a utility that, well, searches a disk for bad blocks. In write mode it writes patterns to the entire disk and then reads everything back to verify that each block is good.
+
+Naively, I ran the following to do a full write test with a progress indicator and verbose output:
+
+```
+$ sudo badblocks -wsv /dev/sda
+badblocks: Value too large for defined data type invalid end block (9766436864): must be 32-bit value
+```
+
+As it turns out, `badblocks` uses a default block size of 1024 bytes, which means the block count of a 10TB disk no longer fits in the 32-bit value it expects. Let's figure out what block size the disk actually uses and plug that into `badblocks`.
+
+```
+$ sudo blockdev --getbsz /dev/sda
+4096
+
+$ sudo badblocks -t random -w -v -s -b 4096 /dev/sda
+Checking for bad blocks in read-write mode
+From block 0 to 2441609215
+Testing with random pattern: 5.97% done, 39:45 elapsed. (0/0/0 errors)
+```
+
+And we're in business. Now we wait a few hours (or days) for `badblocks` to complete. I might even do a second pass, just for the fun of it.
+
+## What's next?
+
+After I've run at least two badblocks passes and both the conveyance and extended SMART tests on this disk, I'm going to repeat the whole thing on the other one. If that all goes well, I'll probably put them in a new pool as a ZFS mirror pair and do some testing there.
+
+[^1]: There are good technical reasons for that, I know. Wish I'd known about it before I built my pool, though.
+
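+As for that mirror pair: here's a rough sketch of what creating that new pool could look like. The pool name and device IDs below are placeholders (use your own `/dev/disk/by-id/` paths), and `ashift=12` is my assumption for 4K-sector drives:
+
+```
+# pool name and device IDs are placeholders
+$ sudo zpool create -o ashift=12 testpool mirror \
+    /dev/disk/by-id/ata-HGST_10TB_SERIAL_A \
+    /dev/disk/by-id/ata-HGST_10TB_SERIAL_B
+$ sudo zpool status testpool
+```
+
+Keeping them in their own pool means a dud disk only takes the experiment down with it, not `tank`.
+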