It is the late 1990s and the computer server world is dominated by enterprise UNIX operating systems – all competing with each other. Windows 2000 is not out yet, and Windows NT 4 is essentially a toy that lesser mortals run on their Intel PCs, which they laughingly call ‘servers’. Your company has a commercial UNIX and it’s called Solaris. Your UNIX is very popular and is a leading platform. It does, however, have some major deficiencies when it comes to storage.
IRIX – a competing proprietary UNIX – has the fantastic XFS file system, which vastly outperforms your own file system, which is still UFS (‘Unix File System’, originally developed in the early 1980s) and doesn’t even have journalling – not until Solaris 7, at least (in November 1998). IRIX had XFS baked in from 1994. IRIX also had a great volume manager, whereas Solaris’ ‘SVM’ was generally regarded as terrible and was an add-on product that didn’t ship as part of Solaris itself until Solaris 8 in 2000.
It wasn’t just IRIX that was extraordinarily better in this area. AIX was ahead of both – JFS was released in 1990 and had file system features that Microsoft has only recently introduced with ReFS. JFS was a journalled file system – the first journalled file system ever included in an OS – and as I mentioned above, it took until November 1998 for Sun to catch up. AIX had a ‘Logical Volume Manager’ (LVM) implementation as well, which again was much better than Sun’s SVM.
This disparity between Solaris and the other commercial UNIX platforms did not, however, hold Solaris’ market share back as perhaps it should have. This was because customers using Solaris on big high-end servers would simply not use UFS, especially not between 1998 and 2005. Customers used VxFS instead – a third-party file system, but one originally developed at AT&T’s UNIX labs, one that was the core file system in another proprietary UNIX – HP-UX – and one that had journalling, was modern, and could actually compete with XFS and JFS. Of course customers had to buy this from Veritas, but that was a small price to pay for a decent file system and volume manager (yes, it came with an alternative volume manager too – Veritas Volume Manager).
So eventually Sun realised that storage was exploding in capacity and UFS just wasn’t up to the task. They also realised that VxFS wasn’t likely to be up to the task either, and with the growing threat of Linux and Windows Server a different solution was needed – a file system to fix all the problems with UFS and leap-frog the competition. As a young man I was fortunate enough to work at Sun Microsystems while this was happening; I worked in the same building as the core ZFS developers, got to meet them, and even worked in the ZFS ‘development lab’.
Sun had a problem though – they didn’t just need a new file system. Their RAID implementation (entirely in software – Sun servers never had hardware RAID) and their volume manager needed to be replaced too. So ZFS sought to replace all three of these components at once. Sadly it would take until 2006 for ZFS to reach production use on Solaris, and by then the battle for the enterprise operating system was already over. Linux and Windows had won; the commercial UNIXes had lost. Intel had won the hardware war; the commercial UNIX vendors had lost. Perhaps file systems weren’t as important as Sun had thought.
ZFS is a child of the 1990s commercial UNIX systems. It is an incredibly complicated file system that manages the entire storage stack, from the disk spindles all the way up to the file system exposed to applications. It can manage vast quantities of disk spindles and scale up to 16 exabytes of storage. It is, however, still very much a product of 1990s Sun thinking – a file system for large, beefy, all-in-one servers running hundreds of applications. The world, however, had moved on whilst Sun wasn’t watching.
By 2006 the dominant server platform was the 1U or 2U Intel server running Linux or Windows Server 2003 – servers that almost universally shipped with hardware RAID controllers. High-end SAN storage arrays were king in the enterprise and ZFS wasn’t built for them at all – ZFS was designed to manage the disks directly, which would make it a great platform for building a SAN storage array itself! Except it wasn’t, because ZFS was still designed with a 1990s mindset: it has no clustering support; it’s a local file system designed for a single Solaris host.
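To be fair, that directness is elegant when you do own the spindles. A minimal sketch of the model ZFS was built for (pool and device names here are hypothetical):

```
# One command builds the pool: software redundancy, no RAID controller
zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0

# File systems are carved out of the pool and mounted automatically
zfs create tank/data
zfs set mountpoint=/export/data tank/data
```

But hand ZFS a single LUN from a RAID controller or a SAN and most of that machinery is wasted.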
The goal of ZFS was to allow Solaris to compete and to address vast swathes of storage that UFS and the competing file systems could not. However, by 2006, when ZFS was finally released, the other file systems had caught up. They had evolved to scale to the available storage. For a short while everybody talked about how Linux needed ZFS, how Mac OS X needed ZFS, and how ZFS could even turn up in Windows. Ten years after ZFS was launched, none of those things has turned out to be true.
Even more frustrating for ZFS fans is that today the dominant computing model is virtual machines and containers: lots of relatively small operating system instances, each with relatively small data sets, working together. ZFS makes very little sense in this environment.
Proponents of ZFS on Linux and elsewhere said that ZFS was required because it was revolutionary and much better than what Linux had. In some cases this was true, but in the important cases it was not. Linux was then, and still is now, mostly run on hardware RAID systems; it had fantastic, simple, reliable software RAID, a performant and simple volume manager (LVM), and a range of file system choices that scaled to the storage available both then and now. Linux was gifted both XFS and JFS from Solaris’ rivals – and both continued to develop, XFS particularly so.
Linux did lack some features of ZFS – namely efficient snapshots and data checksumming – that were important. Ten years later we can clearly see that these gaps did not prevent the adoption of Linux, and ZFS did not in any way save Solaris – Solaris is dying slowly in private, away from the eyes of the press. Linux won, despite not having ZFS (or DTrace).
So what about today – does Linux need ZFS? Canonical thinks it does, and thinks ZFS is exciting technology – more exciting than we’ve seen in ‘a long time’. Except it really isn’t. These are the same arguments we heard ten years ago, and yet ZFS is even less relevant today than it was a decade ago. Canonical tried to justify ZFS with a series of ‘killer’ features:
- ‘copy-on-write snapshots’
Linux already has copy-on-write snapshots via LVM thin-provisioned pools; it’s in production and supported in RHEL. What’s more, it supports most Linux file systems – you can choose whichever you like. If you prefer, you can dump LVM and use btrfs, which supports snapshots in the same way. So no, sorry Canonical, this is not a killer feature of ZFS.
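For instance, here is roughly what that looks like with LVM thin pools (the volume group and LV names are made up for illustration):

```
# Create a thin pool inside an existing volume group 'vg0' (name assumed)
lvcreate --type thin-pool -L 100G -n pool0 vg0

# Create a 50G thin volume, then put any file system you like on it
lvcreate --thin -V 50G -n data vg0/pool0
mkfs.xfs /dev/vg0/data

# Take a copy-on-write snapshot -- instant, no size to pre-allocate
lvcreate --snapshot --name data_snap vg0/data
```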
- ‘copy-on-write cloning’
ZFS clones are just writable snapshots: ZFS takes a snapshot and then, via COW, presents it as a writable clone. Well, shucks – Linux’s LVM supports this as well, and has done for years; it is also a COW-based system. Oh, and btrfs does this too. This isn’t a killer feature of ZFS either.
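A minimal sketch, reusing the hypothetical names from above – a thin snapshot is already writable, you just activate and mount it:

```
# Thin snapshots skip auto-activation by default; -K overrides that
lvchange -ay -K vg0/data_snap
mount /dev/vg0/data_snap /mnt/clone   # a writable, CoW-backed clone

# btrfs equivalent: subvolume snapshots are writable by default
btrfs subvolume snapshot /srv/vol /srv/vol_clone
```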
- ‘continuous integrity checking against data corruption’
XFS has metadata integrity checking too (metadata only, not data). Btrfs has full data integrity checking against data corruption. So no, ZFS can’t claim this as a killer feature that others don’t have. It doesn’t matter much anyway – this continuous integrity checking means nothing if you’re using ZFS on a hardware RAID controller or against an enterprise (or non-enterprise) storage array. It only works, and only guarantees anything, if you let ZFS manage the spindles directly. That was a product of 1990s thinking about how storage would be attached to, or baked into, Sun’s servers. Besides, when was the last time you got data corruption? What problem is this trying to solve? I’ve never felt that Linux needs this feature – have you? This isn’t a killer feature.
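If you do want checksumming on Linux, it’s already there – a sketch with made-up device names:

```
# XFS v5 format: metadata CRCs (the default in recent xfsprogs)
mkfs.xfs -m crc=1 /dev/vg0/data

# btrfs: data *and* metadata are checksummed by default
mkfs.btrfs /dev/sdb
mount /dev/sdb /mnt/btrfs
btrfs scrub start /mnt/btrfs    # verify every checksum in the background
btrfs scrub status /mnt/btrfs
```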
- ‘automatic repair’
Whilst it is true that ZFS does not have to run a horrible NTFS-style chkdsk process, or a horrible ext3-style fsck process either, other file systems have progressed in this regard too. XFS has a similar automatic repair function, never runs fsck at boot (there is no XFS fsck!), and has an xfs_repair tool that nobody ever has to use. It’s also worth pointing out that ZFS sometimes needs non-automatic repairs too – in fact, I’ve had to do them a lot when running ZFS in production. ZFS scrubs are… not fun: they are ridiculously slow, and ZFS can lose files just like any file system can – I saw this in production multiple times. Oh, and btrfs supports ‘automatic repair’ too. This isn’t a killer feature.
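For comparison, the repair story on each side looks something like this (pool and device names are illustrative):

```
# XFS: fsck.xfs is a no-op; the offline tool exists if you ever need it
xfs_repair -n /dev/vg0/data    # -n: dry run, report problems only

# ZFS: a scrub walks every block and repairs from redundancy -- slowly
zpool scrub tank
zpool status tank              # scrub progress and any errors found
```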
- ‘efficient data compression’
I think this is the only feature on Canonical’s list with any merit, but I cannot call it a killer feature. Work is ongoing on adding compression to ext4, though nobody seems to care much about finishing it. If you really want compression, it’s baked into btrfs on Linux. So no, Canonical, this is not a ‘killer’ feature.
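Either way it’s a one-liner on both sides – hardly ‘killer’ (dataset and device names are made up):

```
zfs set compression=lz4 tank/data           # ZFS: per-dataset compression
mount -o compress=lzo /dev/sdb /mnt/btrfs   # btrfs: transparent compression
```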
ZFS – and sadly btrfs – are both rooted in a 1990s monolithic model of servers and storage. btrfs hasn’t caught on in Linux for a variety of reasons, but mostly because it simply isn’t needed. XFS runs rings around both in terms of performance and scales to massive volume sizes. LVM complements XFS by adding COW snapshots and clones, and even clustering if you want it. I believe the interesting direction in file systems is actually things like Gluster and Ceph – file systems designed with the future in mind, rather than for a server model we’re not running any more.
Canonical are targeting ZFS support at containers, saying it’s the perfect fit. The irony is that containers don’t need ZFS. Red Hat uses an LVM/devicemapper CoW-based approach. CoreOS switched away from btrfs (ZFS-style snapshots!) to overlayfs and ext4 – and apparently performance was much better. Docker can use OverlayFS as well, and recommends against using ZFS.
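Selecting overlayfs in Docker is trivial – a sketch, assuming Docker 1.12+ on a systemd host:

```
# Tell the Docker daemon to use the overlay2 storage driver
cat > /etc/docker/daemon.json <<'EOF'
{ "storage-driver": "overlay2" }
EOF
systemctl restart docker
docker info | grep 'Storage Driver'   # should report: overlay2
```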
Of course, Ubuntu suffers from NIH syndrome, and so isn’t using Docker/Atomic/rkt etc. – it has created its own container technology, LXD. Perhaps it doesn’t matter then that it uses OpenZFS: if you’re planning on using LXD (and thus ZFS), you’re ignoring the mature, established container technologies and picking a platform that is almost certainly going to be poorly supported going forward.