Unraid installation experience report (2021)

An experience report from a first-time NAS operator, along with some useful tips for getting started and checking that the array works as expected. The goals are to stop worrying about running out of disk space, and to consolidate homelab services onto a single machine.

UNRAID Version 6.9.2 is being evaluated in this post.

Hardware overview

| Item                                                                            | Cost (£) |
| Intel(R) Core(TM) i7-3770S CPU @ 3.10GHz 8M L3 cache                            | 0        |
| Gigabyte Technology Co., Ltd. B75M-D3H, Version x.x                             | 0        |
| 32GB DDR3                                                                       | 0        |
| 1000 Mbps NIC                                                                   | 0        |
| Fractal Arc Mini R2 mATX case                                                   | 0        |
| 1x Samsung SSD 860 EVO 500GB (UserBenchmark)                                    | 0        |
| 6x Barracuda ST2000DM008-2FR102 2TB (UserBenchmark, Datasheet) (2 recycled, 5)  | 174.96   |
| Broadcom / LSI 9207-8i SAS2.1 HBA (Datasheet)                                   | 105      |
| 3x YIWENTEC Mini SAS 36P SFF-8087 to 4 SATA 7Pin                                | 10.27    |
| BeQuiet LM-CM530W PSU                                                           | 0        |
| Total                                                                           | 290.23   |

Items with a cost of 0 were recycled from hardware I had in the cupboard. The 860 EVO will sit alone in the cache pool for now; the Barracudas will be in the array.

In addition to hardware, there's the electricity cost of self-hosting. In Nov. 2021, during a supposed energy crisis, my utility provider offers me 20.02p/kWh. After measuring the consumption with a power meter, the server was found to draw between 50W idling and 100W while performing disk benchmark tests for my use cases. If your use cases involve constant media transcoding, or running game servers, you'll likely draw more power. At a constant 100W, that works out to 0.1*20.02*24*365 = 17537.52 UK pence per year, or in less obscure fiat units, roughly $230/year. That is not an insignificant cost, nearly the total cost of the build per annum! I expect my use case to be on the lower end of the power usage curve; for pure-NAS operations, I'd expect about half of that yearly figure.
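As a quick sanity check on that arithmetic, a small shell one-liner (using the 100W draw and 20.02p/kWh tariff from above):

# Annual electricity cost: kW * pence-per-kWh * hours * days
awk 'BEGIN { watts = 100; pence_per_kwh = 20.02;
             pence = watts/1000 * pence_per_kwh * 24 * 365;
             printf "%.2f pence/year (£%.2f/year)\n", pence, pence/100 }'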

Installation

The installer is a bit weird in how it expects to be flashed onto a disk, from Linux at least. I wonder why Unraid doesn't provide an ISO you dd to a USB drive; instead some scripts must be run manually. Not hard, just unusual to me. I was slightly worried that Unraid uses the flash drive's GUID as the ID for the registration token, but found there is a way to move your key to another flash drive if needed.

Locking down the OS is a shame. I suppose they did it for better control over their system, but it means you cannot apt or pacman your way to anything; you have to go through the UI as far as I can see for now. It looks like UNRAID is building a community app store concept around Docker containers, which is a nice idea. I found some great pre-packaged components, however some of them are very out of date. Curation may be a problem, unlike the Debian/Arch repos. It's a shame to have more separation, but I can see they have a different vision to Linux packages, basing things more around containers instead. Somewhat relevant meta.

The UI wasn't a great first experience. It was surprising that the "Start the array" button, to me the most obvious first thing I wanted to push, was buried in the middle of a page called "Main", which imho should be called "Array". Anyway, with experience, the UI has become "obvious", but it was jarring at first.

I'd expect the UI to be more polished than it is for a commercial product, but it's very usable once you get used to it, so it's a minor issue for me.

The first boot, the first drive failure

The ST2000DM008s attached to my LSI 9207-8i SAS2.1 card were giving read errors for reasons unclear. Cabling good, LSI card good… so I experienced UNRAID's disk error handling right out of the gate. The errors were reported well, although the need to acknowledge an error before it stops being reported wasn't clear at first. I must have rebooted in various configurations for some time, trying to find a pattern, before realising the errors are "sticky", so that testing was a waste of time. The notifications UI is another area that is idiosyncratic.

The first remediation attempted was to plug in the drives directly to the motherboard SATA ports. In this configuration, the array is OK and free of errors, so that leaves either the SAS card, the SAS<->SATA cabling, or the Linux driver for the SAS card at potential fault.

To check this, I installed an old WD Blue 250GB (UserBenchmark) in lieu of the ST2000DM008; this time there were no read errors. As a fun aside, HD manufacturers have been busy,

unraid-2021-wd-blue-compared.png

Disks 1 and 2 are the 2TB ST2000DM008s, and sdb is the WD Blue.

So, the problem is specific to the ST2000DM008s in combination with the SAS<->SATA cables and the HBA card. All other components have been shown to work.

Attempting to spot differences between the drives, the datasheets for the WD Blue and the Barracuda were compared. Both drives use the SATA 6Gb/s interface; the other differences do not seem pertinent, apart from perhaps the size of the drive.

Three sets of 1x SAS -> 4x SATA cables were purchased, unfortunately all the same brand, and testing all three sets of cables produced S.M.A.R.T. errors on the ST2000DM008s. Are SAS->SATA cables versioned? Why does the old drive work, but the new ones do not?

The error notification looks like this,

Unraid device dev1 SMART health [199]: 27-11-2021 19:21
Warning [TOWER] - udma crc error count is 428
ST2000DM008-2FR102_ZK307NLW (dev1)
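The same counter can be read directly from the shell with smartmontools; a hedged example (substitute the right /dev node for the affected drive):

# SMART attribute 199 (UDMA_CRC_Error_Count) usually implicates the link/cabling
smartctl -A /dev/sdc | grep -i crc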

The SAS card has at least been detected by Linux, per lspci -vv,

01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 11
        IOMMU group: 1
        Region 0: I/O ports at e000
        Region 1: Memory at f7d40000 (64-bit, non-prefetchable)
        Region 3: Memory at f7d00000 (64-bit, non-prefetchable)
        Expansion ROM at f7c00000 [disabled]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x8 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
                Not readable
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 04000001 0000000f 08010000 7e58334c
        Capabilities: [1e0 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [1c0 v1] Power Budgeting <?>
        Capabilities: [190 v1] Dynamic Power Allocation <?>
        Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas

The errors seen so far are all read errors (since reading is the first step!). They all report like this in Linux,

[  282.271396] blk_update_request: I/O error, dev sdb, sector 3907028544 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 0
[  282.509822] mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
[  282.509828] mpt2sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
[  282.509847] sd 1:0:1:0: [sdc] tag#677 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00 cmd_age=0s
[  282.509852] sd 1:0:1:0: [sdc] tag#677 CDB: opcode=0x28 28 00 e8 e0 87 b0 00 00 10 00
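To pull just these events out of a noisy log, a grep along these lines does the job (patterns taken from the messages above):

# Block-layer I/O errors plus anything from the mpt2sas/mpt3sas driver
dmesg | grep -E 'blk_update_request|mpt[23]sas'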

I started to look at the SAS card. Research indicated that cards shipped from Amazon often have outdated firmware which is known to be buggy. I was on firmware version 20.00.00.00; the fixed version was reported to be 20.00.07.00.

It wasn't easy to figure out from Broadcom's website how to flash the HBA card. It was going to involve creating a boot disk, and potentially booting DOS. Fun times! Luckily, that was all avoided thanks to an excellent thread on ServerBuilds.net. I followed the instructions, flashed my card to the 20.00.07.00 version and rebooted with the Barracuda 2TB drive attached, verified like so,

# dmesg | grep FWVersion
[   18.586915] mpt2sas_cm0: LSISAS2308: FWVersion(20.00.07.00), ChipRevision(0x05), BiosVersion(07.39.02.00)

No more read errors! Moving on at long last!

A slow transfer

Transfers from the array to my laptop were going at 40MB/sec over the Samba share that GNOME discovered automatically. That seems very slow, time to measure.

A 1G test file was created with fallocate -l 1G test.dat [1] and then transferred with rsync from the array mount to my laptop, which has a very fast drive,

rsync -P -v Tower.lan:/mnt/disk1/test.dat .

~113MB/sec transfer speed was measured. Therefore it's transferring at 113*8/1000 ~= 90% of the gigabit link. I'm willing to accept that as the best case, taking into account protocol and processing overheads I'm not familiar with. I've also seen it transfer at 50MB/sec, but that hasn't been reproduced. I considered the disks being in a spun-down state, however measurements show that only affects latency, not throughput. shrug.

Getting into the array with SSH requires copying a public key to the root account. Telnet is enabled by default as well (telnet root@tower.lan); there is no password by default.
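Copying the key is the usual one-liner, something like:

# Append the local public key to root's authorized_keys on the server
ssh-copy-id root@tower.lan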

Useful disk information is found in /var/local/emhttp/disks.ini. First checking the read speed with hdparm,

hdparm -t /dev/md1 /dev/md1

Quartiles were [205.5, 209.0, 209.0] MB/sec, more than the network can handle anyway…

/dev/md2 was similar enough.

Each of the array disks was also tested: /dev/sdc1, etc. They all showed ballpark 200MB/sec read speeds as well, as expected. That is significantly better than the speeds reported on the UserBenchmark site. As can be seen in the benchmark graphic above, this simply means the hdparm benchmark is not using a small transfer size to report its numbers. Performance similar to what UserBenchmark reports can be seen starting at about 75% of the way into the disk's capacity.

The cache drive is an SSD; its device name is /dev/sdb1. It is a Samsung_SSD_860_EVO_500GB_S4XBNF0MA43116V.

hdparm measured 482.47 MB/sec read speeds. SSDs are much better than HDDs at maintaining read speeds no matter the size of the random range in which seek tests are performed.

/dev/sda1 (flash drive) was measuring at ~35 MB/s. Useful as another sanity check. It makes sense given that flash memory is relatively slow to read.

The DiskSpeed Unraid app was then tested. The app ecosystem in Unraid looks promising. It's basically a neat wrapper over the Docker CLI, but well made and convenient. Install the container, browse to tower.lan:18888, and there's the disk app. This further confirmed the above measurements with its own benchmarking method.

dd was then considered, and here complications with measurement methodology started. Since it is a higher-level tool, there are more conveniences (caching, buffering) that can interfere with good data collection. To ease the use of this tool, a small BASH function was scripted,

benchmount() {
    # $1 = directory to benchmark, $2 = read|write, $3 = blocksize, $4 = count
    pushd "$1" >/dev/null 2>&1 || return 1

    local blksz=${3:-100M}
    local count=${4:-1}

    # Flush dirty pages and drop the page cache (requires root)
    sync
    echo 3 > /proc/sys/vm/drop_caches

    case "$2" in
       read)
           # Create the test file first if it doesn't already exist
           [ -e test.dat ] || dd if=/dev/zero of=test.dat bs=$blksz count=$count oflag=sync,nocache status=none
           sync
           echo 3 > /proc/sys/vm/drop_caches
           dd if=test.dat of=/dev/null bs=$blksz count=$count oflag=dsync,nocache iflag=fullblock
          ;;
       write)
           dd if=/dev/zero of=test.dat bs=$blksz count=$count oflag=dsync,nocache iflag=fullblock
          ;;
    esac
    rm -vf test.dat
    popd >/dev/null 2>&1
}

There are more choices to be made for the blocksize. A complicated surface is generated by varying blocksize and count, which depends on the details of the computer system as a whole. Local testing suggested 32M was a good blocksize; it saturates the channels well enough.
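For reference, the figures below came from invocations along these lines (hypothetical arguments; run as root on the server so the drop_caches writes work, and assuming the usual /mnt/cache and /mnt/disk1 mount points):

benchmount /mnt/cache write 32M 16   # cache pool, 16 x 32M = 512M written
benchmount /mnt/cache read  32M 16
benchmount /mnt/disk1 write 32M 16   # one of the array disks
benchmount /mnt/disk1 read  32M 16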

cache write = 347 MB/sec
cache read  = 512 MB/sec

array write = 50  MB/sec
array read  = 215 MB/sec

iozone was used to double-check things, but it was reporting read and write speeds in GB/sec, clearly testing filesystem-level performance more than disk performance. It's unclear whether it's designed for such tests.

Example,

docker run --rm -v /mnt/disk1:/data -w /data -it --name iozone pstauffer/iozone:latest /bin/sh -c 'iozone -i 0 -i 1 -+u -f ./iozone.dat -s 300m -r 16m'

A read speed of 7778291 kB/sec ≈ 7.8 GB/sec was measured. It is assumed this is measuring memory, not disk.

If write errors are experienced, ensure the correct share is mounted on /data. A common error is testing the memory-backed filesystem the Unraid USB stick prepares for the userspace. Not only is this small (20GB here), it is much faster than any disk.

Power usage was measured during the iozone run: the draw climbed from the 50W idle figure to 80W during this test.

So, after that enormous excursion, it's clear that the initial measurement of 40MB/sec over Samba is decidedly unexpected. rsync transfers at about 90% efficiency (roughly 113MB/sec), so I'd expect something in the same ballpark for Samba and NFS, knowing nothing about their implementation details or features, it must be admitted.

The network filesystems have transparent client-side caching which can give false measurements. It is advisable to unmount the network filesystem, and then remount it for each measurement. This ensures the caches have been invalidated.

NFS is considered first; a helper function mounts the share and times a dd transfer,

benchnfs() {
    # $1 = file on the share, $2 = dd blocksize, $3 = read|write,
    # $4 = local mountpoint, $5 = remote export path
    local filename=${1:-win11.iso}
    local bs=${2:-1k}
    local action=${3:-read}
    local mnt=${4:-./test}
    local remote_share=${5:-/mnt/user/Media}

    [ -e "$mnt" ] || mkdir "$mnt"

    # Remount and drop caches on both ends so each run starts cold
    sudo umount "$mnt" 2>/dev/null
    ssh Tower.lan 'sync && echo 3 > /proc/sys/vm/drop_caches'
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
    sudo mount -t nfs tower.lan:"$remote_share" "$mnt"
    case "$action" in
        "read")
            # Remove any stale local copy from a previous run
            [ ! -e "$filename" ] || rm -f "$filename"
            dd if="$mnt/$filename" of="$filename" bs="$bs" oflag=dsync,nocache iflag=fullblock status=progress
            ;;
        "write")
            dd if="$filename" of="$mnt/$filename" bs="$bs" oflag=dsync,nocache iflag=fullblock status=progress
            ;;
    esac
}

Measurements were then taken varying the bs parameter. There's a lot of variance in the measurements, indicating the benchmark is missing opportunities to reset state.
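The sweep itself was just a loop over the bs argument, something like this (hypothetical; assumes win11.iso exists on the exported share):

for bs in 64k 256k 2M 4M 8M 16M 32M 64M 128M 250M 500M 1G; do
    echo "bs=$bs"
    benchnfs win11.iso "$bs" read
done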

For NFS,

| Transfer buffer size | Measurements (MB/sec) | Notes                         |
| 64k                  | 20                    |                               |
| 256k                 | 40                    |                               |
| 2M                   | 60                    |                               |
| 2M                   | 70                    |                               |
| 4M                   | 75 80                 |                               |
| 8M                   | 84                    |                               |
| 16M                  | 77 81                 |                               |
| 32M                  | 85 79                 |                               |
| 64M                  | 96 82                 |                               |
| 128M                 | 96 93                 | sweet spot                    |
| 250M                 | 92                    |                               |
| 500M                 | 90                    |                               |
| 1G                   | 95 92                 | latency gets longer down here |

So, potentially around 96MB/sec if the blocksize is right. Unsure what this size is dependent on. 128/64 MB don't remind me of anything important…

To test Samba, I relied on GNOME's virtual filesystem discovering and mounting the Samba share from UNRAID for me; then another helper was crafted,

benchsamba() {
    # $1 = file on the share, $2 = dd blocksize, $3 = read|write,
    # $4 = the gvfs mountpoint GNOME created for the share
    local filename=${1:-win11.iso}
    local bs=${2:-64M}
    local action=${3:-read}
    local mnt=${4:-/run/user/1000/gvfs/smb-share:server=tower.local,share=media/}

    # Drop caches on both ends so each run starts cold
    ssh Tower.lan 'sync && echo 3 > /proc/sys/vm/drop_caches'
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null

    case "$action" in
        "read")
            # Remove any stale local copy from a previous run
            [ ! -e "$filename" ] || rm -f "$filename"
            dd if="$mnt/$filename" of="$filename" bs="$bs" oflag=dsync,nocache iflag=fullblock status=progress
            ;;
        "write")
            dd if="$filename" of="$mnt/$filename" bs="$bs" oflag=dsync,nocache iflag=fullblock status=progress
            ;;
    esac
}
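Invocation mirrors the NFS helper; a hedged example:

benchsamba win11.iso 64M read    # pull the file through the gvfs mount
benchsamba win11.iso 64M write   # push the local copy back over Samba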

For Samba, the measurements were close enough to NFS that roughly the same expectations can be held for both file systems. Read and write speeds are essentially the same, since the network is the bottleneck in this setup, not the array disks.

After performing these tests, it has not been possible to reproduce the 40MB/sec transfer speed first observed through the point-and-click interface in GNOME Files, which is what prompted this performance analysis. There are clearly more variables not being taken into account, but this has been a long enough excursion for me. The conclusion is that ~90MB/sec over the network filesystems is broadly what is being seen, versus 113MB/sec using rsync, so about 90/113*100 ~= 80% of potential capacity. IOW, 20% overhead versus rsync.

A paucity of power

Seven 2TB drives are available, but the careful observer will see only six in the array. That's because there were not enough SATA power connectors on the BeQuiet PSU. A bag of BeQuiet PSU cables was purchased from eBay, but upon adding more SATA power connectors with these cables, the machine shorted on boot: a fraction of a second and the power was out. I'm unsure whether it's just this SATA power cable that is bad, or whether the wattage of the PSU is too low. 12TB is enough for me, for now anyway.

Getting busy

The next thing I'd like is to expose UNRAID to the Internet; perhaps I could host a router in a VM on the server at some point in the future, which would be neat. First, set the root password to something sane. The docs suggested using the "Dynamix Password Validator"; installing that downloads a minified JS file called zxcvbn.js. Not auditable, which is a shame, but at least a hash check is performed during install… It would be a great place to slip malware in though. The tradeoff here being usability: the app works first time and the process was mindless. It just works, password entry now shows a strength meter, let's hope it's not also harvesting my passwords for my beloved leader.

The first set of apps I found referenced are from Dynamix. I get an uneasy feeling, since the plugins really do just write stuff wherever they want; there's no structured packaging going on here, and it's a bit of a mess in my opinion. I don't look forward to maintaining this.

The SSD trim package was the second choice. Again a surprise: a PHP web app for adding a cron job to run fstrim on a schedule? What!? It would seem the reason is that the developers of UNRAID thought the ease of UI programming in PHP would be a winner. I can see some merit to the idea, but it feels very messy and ad hoc to me, which can be a working strategy tbf. Manually performing this job requires knowledge of cron syntax and the fstrim command (which, like most commands in Linux nowadays, is far from doing one thing well…). If you're an experienced UNIX admin, you might find some of these techniques bizarre and wasteful, but it clearly serves the purpose of getting a less technical audience up and running with a very powerful home server.

After using the Dynamix web UI to create my trim schedule, I found that the packages define their own cron strategy as well (rather than using a distribution-designed system, which would at least be consistent),

# cat /boot/config/plugins/dynamix.ssd.trim/ssd-trim.cron
# Generated ssd trim schedule:
0 0 * * 0 /sbin/fstrim -a -v | logger &> /dev/null

Then the plugin, through some BASH and PHP spaghetti, ends up cat-ing these files directly into crontab (ouch!) like this,

#!/bin/bash

# Concatenate the set of installed plugin cron files
# into a single system crontab.

cron_files() {
  cat /boot/config/plugins/dynamix/*.cron 2>/dev/null
  for plugin in /var/log/plugins/*.plg; do
    plugin=${plugin##*/}
    cat /boot/config/plugins/${plugin%.*}/*.cron 2>/dev/null
  done
}

ENTRIES=$(cron_files)
if [[ "$ENTRIES" ]]; then
  echo "$ENTRIES"|crontab -c /etc/cron.d -
else
  crontab -c /etc/cron.d -d
fi

A lot of these details should be more structured for long-term maintainability and auditability.

Conclusions

This being my first foray into building a home server, a lot of lessons were learned, notably,

  1. IT projects require tons of planning if stress isn't something you like. A lot of the issues I blundered through above could have been avoided with more solid planning. For example, had I spent some time researching the SAS card I bought, and read others' experiences, I wouldn't have spent so much time isolating where the issue was. I would have flashed it as soon as I bought the card and been done in 30 minutes rather than a couple of days…
  2. Measure everything three times, with different tools if possible. There's so much trickery involved in the path from the disk to the CPU that it's very easy to take measurements that are completely misleading. By measuring several times, you can build some confidence in this landscape of statistical machinery. By using different tools, you can be more sure a pathology is not being hit, and explanations become more confident.
  3. It's not that easy setting up a home NAS; it would be if an expert sent you an inventory of known working components. This goes back to point 1, but it's easy to miss the nuances of PCI lane configurations on your motherboard not being compatible with an HBA card, or not realising you don't have enough SATA power connectors on the PSU, or that your PSU mysteriously shorts with certain SATA power cables you acquired, and so on. Getting to the bottom of all these issues can take a very long time. I started questioning why I was torturing myself with all this rather than just spending money on a pre-built NAS. If money isn't a big issue for you, in hindsight I'd recommend paying for a pre-built NAS.

Footnotes:

[1]

fallocate is slightly magical. In the SSH case it doesn't matter, but optimizations can occur on read operations when the system detects long runs of NULs. Better to use dd from a random source IME.
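For example, something like this produces a test file that can't be optimised away:

# 1 GiB of pseudo-random data: incompressible, and free of long NUL runs
dd if=/dev/urandom of=test.dat bs=1M count=1024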

Created: 2021-11-28 Sun 18:46
