Oldskooler Ramblings

the unlikely child born of the home computer wars

Archive for May 4th, 2018

You cannot violate the laws of physics

Posted by Trixter on May 4, 2018

It’s technology refresh time at casa del Trixter.  I’m dabbling in 4K videography, and upgrading my 9-year-old i7-980X system to an i7-8700K to keep up.  Another activity to support this is upgrading the drives in my home-built ZFS-based NAS, where I back up my data before it is additionally backed up to cloud storage.  The NAS’ 4x2TB drives were replaced with 2x8TB and 2x3TB (cost reasons) in a RAID-10 config, and it mostly went well until I started to see disconnection errors during periods of heavy activity (such as a zpool scrub):

Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc] Device not ready
Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc] 
Apr 30 19:32:07 FORTKNOX kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc] 
Apr 30 19:32:07 FORTKNOX kernel: Sense Key : Not Ready [current] 
Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc] 
Apr 30 19:32:07 FORTKNOX kernel: Add. Sense: Logical unit not ready, cause not reportable
Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc] CDB: 
Apr 30 19:32:07 FORTKNOX kernel: Read(16): 88 00 00 00 00 00 08 32 11 70 00 00 01 00 00 00
Apr 30 19:32:07 FORTKNOX kernel: end_request: I/O error, dev sdc, sector 137498992
Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc] Device not ready
Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc]
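
When a log like this runs to hundreds of lines, a quick tally per device makes it obvious whether one drive or several are affected. The following is only a minimal sketch in Python: it assumes the "I/O error, dev ..., sector ..." wording shown in the excerpt above, and the function name and sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the excerpt above, e.g.
#   "... kernel: end_request: I/O error, dev sdc, sector 137498992"
# Assumption: other kernel versions may word this message differently.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")

def tally_io_errors(lines):
    """Count I/O error lines per block device name."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Two lines copied from the excerpt above; only the second is an I/O error.
sample = [
    "Apr 30 19:32:07 FORTKNOX kernel: sd 0:0:2:0: [sdc] Device not ready",
    "Apr 30 19:32:07 FORTKNOX kernel: end_request: I/O error, dev sdc, sector 137498992",
]

for device, count in tally_io_errors(sample).items():
    print(device, count)
```

Feeding it the full log (for example, the lines of a saved syslog file) instead of the two sample lines would show at a glance whether the errors are confined to a single device.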

At first I thought the drive was bad, so I replaced it.  I then saw exactly the same types of errors on the replacement drive, so to make sure I wasn’t sent a bad replacement, I tested the drive in another system and it passed with flying colors.  So now the troubleshooting began:  Switch SATA ports on the motherboard:  No change.  Switch SATA cables: No change.  Switch SATA power cables: No change.  Switch SATA cables and ports with one of the drives that was working:  No change; that specific drive kept reporting “Device not ready”.  I even moved the drive to a different bay to see if the case was crimping the cables to the drive when I put the lid back on:  No change.

It was really starting to confuse me as to why this drive wouldn’t work installed as the 4th drive in my NAS.  I started to doubt the aging Xeon NAS motherboard, so I bought a SAS controller and a SAS-to-SATA forward breakout cable so that the card could handle all of the traffic.  This seemed to work at first, but eventually the errors came back.  I then started swapping SATA breakout ports, then entire SAS cables, then eventually a replacement SAS controller.  In all instances, the errors eventually came back on just that single drive, a drive that worked perfectly in any other system!

The solution didn’t present itself until I started building my replacement desktop system based on the i7-8700K.  In that system, I opted for a modular power supply to keep the cable mess at a minimum (highly recommended; I’ll never go back to non-modular PSUs).  When I was putting my video editing RAID5 drives into the new desktop, I noticed with irritation that each of the modular SATA power cables only had three headers on them instead of four.  This sucked because I was hoping to use one SATA power breakout cable for all four drives, and now I’d have to use two cables which added to the cable clutter inside the case.  This power supply was Gold rated and high wattage, so why only put three SATA power headers on a breakout cable?  In thinking about the problem, I came to the conclusion that the makers of the power supply were likely being conservative, to avoid exceeding the limits of what that rail was designed to provide.

And that’s when I remembered that I was putting four drives on a single rail back on the NAS, and not three like the new power supply was enforcing.  When I moved the misbehaving NAS drive to a SATA power header on another rail, all of the drive disconnection problems went away.  Whoops.

How did this work before?  The newer drives draw more power than the older drives did.  The combined draw of the 2x8TB + 2x3TB drives on a single rail was just high enough to be dodgy under heavy load, whereas the draw of the previous 4x2TB configuration was not.
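
To make the arithmetic concrete, here is a rough sketch in Python of the kind of budget check I should have done up front. Every number in it is a made-up placeholder, not a measurement and not a datasheet value; the per-drive peak currents and the cable limit are assumptions chosen only so the example mirrors what happened. Real figures would have to come from the drive datasheets and the power supply documentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Drive:
    label: str
    peak_12v_ma: int  # HYPOTHETICAL peak 12 V draw in milliamps, not a datasheet value

def cable_load_ma(drives):
    """Total worst-case 12 V draw if every drive on the cable peaks at once."""
    return sum(d.peak_12v_ma for d in drives)

def over_budget(drives, limit_ma):
    """True if the combined peak draw exceeds what the cable/rail is assumed to supply."""
    return cable_load_ma(drives) > limit_ma

# Placeholder limit for one SATA power cable; purely illustrative.
CABLE_LIMIT_MA = 6000

old_config = [Drive("2TB", 1400)] * 4
new_config = [Drive("8TB", 1800)] * 2 + [Drive("3TB", 1500)] * 2

print("old 4x2TB on one cable over budget:", over_budget(old_config, CABLE_LIMIT_MA))
print("new 2x8TB + 2x3TB on one cable over budget:", over_budget(new_config, CABLE_LIMIT_MA))
print("new config, three drives on the cable:", over_budget(new_config[:3], CABLE_LIMIT_MA))
```

With these placeholder numbers the old four-drive set squeaks in under the limit, the new four-drive set goes over, and moving one drive to another cable brings it back under, which is the same pattern I saw in practice.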

Lesson learned, and now I have spare controllers and cables in case there’s a real failure.
