Dear canonical: we don’t want or need ZFS

It is the late 1990s and the computer server world is dominated by enterprise UNIX operating systems – all competing with each other. Windows 2000 is not out yet and Windows NT 4 is essentially a toy that lesser mortals run on their Intel PCs which they laughingly call ‘servers’. Your company has a commercial UNIX and its called Solaris. Your UNIX is very popular and is a leading platform. Your UNIX however has some major deficiencies when it comes to storage.

IRIX – a competing proprietary UNIX – has the fantastic XFS file system which vastly out performs your own file system which is still UFS (“Unix File System” – originally developed in the early 1980s) and doesn’t even have journalling – until Solaris 7 at least (in November 1998). IRIX had XFS baked into it from 1994. IRIX also had a great volume manager – where as Solaris’ ‘SVM’ was generally regarded as terrible and was an add-on product that didn’t appear as part of Solaris itself until Solaris 8 in 2000.

It wasn’t just IRIX that was extraordinarily better in this area. AIX was ahead of both – JFS was released in 1990 and had file system features that were only just recently introduced by Microsoft with ReFS. JFS was a journaled file system – the first ever journalled file system included in an OS – as I mentioned above it took until November 1998 for Sun to catch up. AIX had a “Logical Volume Manager” (LVM) implementation as well, which again was much better than Sun’s SVM.

This disparity between Solaris and other commercial UNIX platforms did not however hold Solaris’s market share back as it perhaps should have. This was because customers using Solaris on big high-end servers would simply not use UFS, especially not between 1998 and 2005. Customers used VxFS instead – a third party file system, but one that was developed originally at AT&T’s Unix labs, one that was the core file system in another proprietary unix – HP-UX – and one that had journalling, was modern, and could actually compete against XFS and JFS. Of course customers had to buy this from Veritas, but this was a small price to pay for a decent file system and volume system (yes, it came with an alternative volume manager too – Veritas Volume Manager).

So eventually Sun realised that storage was exploding in capacity and UFS just wasn’t up to the task. They also realise that VxFS wasn’t likely to be up to the task either, and with the growing threat of Linux and Windows Server a different solution was needed – a file system to fix all the problems with UFS and leap-frog the competition. As a young man I was fortunate to work at Sun Microsystems when this was happening and I got to meet the core ZFS developers and even work in the ZFS ‘development lab’ – I worked in the same building.

Sun had a problem though – they didn’t just need a new file system. Their RAID implementation (entirely in software, Sun servers never had hardware RAID), and volume management implementations also needed to be replaced. So ZFS sought to replace all three of these components at once. Sadly it would take until 2006 for ZFS to be released into production usage on Solaris, and by then the battle for the enterprise operating system was already over. Linux and Windows had won, the commercial UNIXes had lost. Intel had won the hardware war – the commercial UNIX vendors had lost. Perhaps file systems weren’t as important as Sun had thought.

ZFS is a child of the 1990s commercial UNIX systems. It is an incredibly complicated file system that manages the entire storage stack from the disk spindles all the way up to the file system exposed to application. It can manage vast quantities of disk spindles and scale up to 16 exabytes of storage. It is however still very much a product of the 1990s Sun thinking – a file system for large, beefy all-in-one servers running hundreds of applications. The world had however moved on whilst Sun wasn’t watching.

By 2006 the dominant server platform was 1 or 2U Intel server running Linux or Windows 2003 – servers that almost universally shipped with hardware RAID controllers. High-end SAN storage arrays were king in the enterprise and ZFS wasn’t built for them at all – ZFS was designed to manage the disks directly, making it a great platform for building a SAN storage array itself! Except it wasn’t, because ZFS was still designed with a 1990s mindset. It has no clustering support, its a local file system designed for just Solaris to utilise.

The goal of ZFS to allow Solaris to compete and address vast swathes of storage that UFS and the other competing file systems could not. However by 2006 when ZFS was finally released the other file systems had caught up. They had evolved to scale to the available storage. For a short while everybody talked about how Linux needed ZFS, how Mac OS X needed ZFS, and how ZFS could even turn up in Windows. Ten years after ZFS was launched none of those things have turned out to be true.

Even more frustrating for ZFS fans is that today the dominant computing model is virtual machines and containers: lots of relatively small operating system instances utilising relatively small data sets working together. ZFS makes very little sense in this environment.

Proponents of ZFS on Linux and elsewhere said that ZFS was required because it was revolutionary and much better than what Linux had. In some cases this was true, but in the important cases it was not. Linux was then, and still now, mostly run on hardware RAID systems, had fantastic simple and reliable software RAID, a performant and simple volume manager (LVM) and a range of file system choices that scaled to what was available then and now. Linux was gifted both XFS and JFS from Solaris’ rivals – and both of which continued to develop, XFS particularity so.

Linux did lack some features of ZFS – namely efficient snapshots and data checksumming – that were important. Ten years later we can clearly see that these issues did not prevent the adoption of Linux and ZFS did not in any way save Solaris – Solaris is dying, slowly in private, away from the eyes of the press. Linux won, despite not having ZFS (or Dtrace).

So what about today, does Linux needs ZFS? Canonical thinks it does, and thinks ZFS is exciting technology – more exciting than we’ve seen in “a long time”[1]. Except it really isn’t. These are the same arguments we heard 10 years ago and yet ZFS is even less relevant today than it was a decade ago. Canonical tried to justify ZFS with a series of ‘killer’ features:

  • ‘snapshots’
    Linux already has copy on write snapshots via LVM thin provisioned pools, its in production and supported in RHEL. What’s more it supports most Linux file systems – you and choose whichever you like. If you prefer you can dump LVM and use btrfs which supports snapshots in the same way. So no, sorry canonical, this is not a killer feature of ZFS.
  • ‘copy-on-write cloning’
    ZFS clones are just writeable snapshots, it snapshots ZFS and then copies this (via COW) to create a writable clone. Well, shucks, Linux’s LVM supports this as well and has done for years. It also is a COW based system. Oh and btrfs does this too. This isn’t a killer feature of ZFS either.
  • ‘continuous integrity checking against data corruption’
    XFS has metadata-only (non-data) integrity checking too. Btrfs has full data integrity checking against data corruption. So, no, ZFS can’t claim this is a killer feature that others don’t have. It doesn’t matter anyway – this continuous integrity checking means nothing if you’re using ZFS on a hardware RAID controller or against an enterprise (or non-enterprise) storage array. It only works and guarantees anything if you’re letting ZFS manage the spindles directly. This was a product of 1990s thinking about how storage would be attached or baked into Sun’s servers. Besides, when was the last time you got data corruption? What problem is this trying to solve? I’ve never felt that Linux needs this feature, have you? This isn’t a killer feature.
  • ‘automatic repair’
    Whilst it is true that ZFS does not have to run a horrible NTFS-style chkdsk process, or a horrible ext3-style fsck process either, other file systems have progressed in this regard too. XFS has a similar automatic repair function, doesn’t ever run fsck at boot (there is no XFS fsck!), and does have an xfs_repair tool that nobody ever has to use. Its also worth pointing out that ZFS does have to have non-automatic repairs sometimes, in fact, I’ve had to do it a lot when running ZFS in production. ZFS scrub’s are…not fun, ridiculously slow and can lose files just like any file system does. I found this in production multiple times. Oh, and btrfs supports ‘automatic repair’ too. This isn’t a killer feature.
  • ‘efficient data compression’
    I think this is the only feature that has any merit in Canonical’s list, but I cannot call it a killer feature. Work is ongoing on adding compression into ext4, but nobody seems to care much about doing it. If you really want it its baked into btrfs on Linux. So no, canonical, this is not a ‘killer’ feature.

ZFS – and sadly btrfs – are both rooted in a 1990s monolithic model of servers and storage. btrfs hasn’t caught on in Linux for a variety of reasons, but most of all its because it simply isn’t needed. XFS runs rings around both in terms of performance, scales to massive volume sizes. LVM supports XFS by adding COW snapshots and clones, and even clustering if you so wanted. I believe the interesting direction in file systems is actually things like Gluster and Ceph – file systems designed with the future in mind, rather than for a server model we’re not running any more.

Canonical are targeting ZFS support for containers, saying that its the perfect fit for that. The irony is containers don’t need ZFS. Red Hat is using a LVM/devicemapper CoW based approach. CoreOS has switched away from btrfs (ZFS-style snapshots!) to overlayfs and ext4 – and apparently performance was much better. Docker can use OverlayFS as well and recommends against using ZFS.

Of course, Ubuntu suffers from NIH syndrome, and so isn’t using Docker/Atomic/rkt etc – it has created its own container technology – LXD. Perhaps it doesn’t matter then that you’re using OpenZFS – if you’re planning on using LXD (and thus ZFS) you’re ignoring the mature, established container technologies and picking a platform that is almost certainly going to be poorly supported going forward.


Life is not fair

I loathe the phrase. As a statement of fact it doesn’t really make that much sense. Life can’t be fair or unfair. Its just what it is. People rarely utter this phrase to support somebody and is not supportive anyway. More often people use this phrase as justification.  “Well, life isn’t fair”. Its a common phrase for parents to utter to their children. “Life is awful so don’t complain” seems to be the implication.

If you were to disagree with my assertion above, then logically we use the phrase “life isn’t fair” to explain that the world is not a fair place and thus logically we must expect in our lives to be treated unfairly, to be treated badly, to expect misfortune, bad luck, and for other people to not treat us justly.

Are we really saying that though? We also get told we must follow the rule of law, we must do what our teachers say, we must do what our managers say, we should treat others as we expect to be treated, we should be fair to others. Aren’t these two statements in conflict with each other? The phrase “life isn’t fair” as a justification only works if we thus teach or ‘allow’ people to be unfair to others.

I don’t believe in a “just world”. I don’t believe karma is real or that a divine entity will make adjustments to make life fair. I do believe that actions have consequences but not to a grand plan of fairness, actions and reactions are just what they are, and just happen, in many cases randomly.

Despite all of this I go about each day with a genuine, deeply felt sense that I have been treated badly, undeservedly. I do not deserve to be treated as I am. Given how much time I spend caring about everybody else, given how much effort I put into things that benefit others, its not fair, right?

Well, I believe this because I was taught that we must all be kind to each other, we must honour each other, be nice, and not be selfish. Quite frankly this is utter bullshit. I have nothing but anger and contempt for the actions of people who taught me this tripe. Its not true. The world isn’t a fair place, as they blindly would tell me when I was mistreated, but would enforce their moral views on me anyway. I’m required to care about others, not be selfish, but when others do the same to me, all that is left is an empty statement of agreement and a useless retort – life isn’t fair.

One of the most important lessons I’ve learnt through counselling is that our life is ours alone and we should not let others control us, instead, we must seek to achieve what we want and in most cases put our needs before others. All of this is in super stark contrast to what most of us are told when children, when we’re taught that we should put others first.

Life isn’t fair, so why should I act fairly all the time? Why exactly must I accept that life is shit, and yet, feel guilty if I do what I want rather than what others want? Why exactly am I expected to not do things others don’t like, or act the way others want me to act, when nobody ever does that for me?

What really bothers me is that in life other people act unfairly, and I have to accept this, because life isn’t fair, and I’m not allowed to act unfairly myself, but on top of this, I’m not even allowed to express my frustration about other peoples actions. No. I must not do this. I must be quiet, not cause a problem, and censor myself, because, after all, life isn’t fair.

In researching this blog post I read a lot of articles about fairness and the reality of the world, one of my favourites was your broken idea of fairness. I don’t agree with all of it, and I don’t have a broken idea of fairness because, as you have just read, I know the world isn’t fair. What I think is important is his Rule number one. Life is a competition. It isn’t meant to be fair, and if you believed all that crap growing up about sharing, fairness, etc, then you were gullible. Life is about getting what you want over others.

I think the most important part of the article is talking about how other peoples morality is forced onto us as children:

People like to invent moral authority. It’s why we have referees in sports games and judges in courtrooms: we have an innate sense of right and wrong, and we expect the world to comply. Our parents tell us this. Our teachers teach us this. Be a good boy, and have some candy.

But reality is indifferent. You studied hard, but you failed the exam. You worked hard, but you didn’t get promoted. You love her, but she won’t return your calls.

Life isn’t fair. But don’t let others tell you to act fairly when simultaneously justifying the world by saying it isn’t fair. Do what you want. Be you. Say it like it is, and realise that others are competing with you. Its very unlikely they will place you before them, so don’t place them before you.

Bargate, Web Development

Bargate security overhaul

Bargate is a web application that lets users access their files on SMB/CIFS servers within the corporate network. It thus connects to SMB/CIFS servers on behalf of the user and authenticates on their behalf as well. To do this it needs their password each time the user loads the page and thus connects to the back end SMB server.

The existing design

The existing security design of Bargate is predicated on the belief that the server should not be trusted to store the user’s password. If it stores the user’s password then any break in to the server / web application could obtain the list of stored user’s passwords. Encrypting them, whilst making an attack slightly more difficult, doesn’t solve the underlying problem since the application will need to have the decryption key stored on it in order to use them. An attacker could steal both in nearly all circumstances.

Bargate thus stores the passport in the user’s session which is client side (stored in cookies). It is encrypted first using AES 256-bit CFB, then put in the session, and the session is signed by itsdangerous before being put into a cookie for the user. The encryption/decryption key for the AES 256-bit step is stored in the Bargate configuration.

The danger in this design is:

  • The encrypted password is sent across the network on every request (even if it is over SSL)
  • The encrypted password is stored in the cookie and thus on a myriad of end user devices, for perhaps up to 30 days (depending on session lifetime)
  • If an attacker gains access to the ENCRYPT_KEY (stored on the server) it can decrypt any password stored on any end user device, and gets the user’s actual password

This design was chosen of course because storing the password on the server, with or without encryption, is even worse. It would also mean any flaws in Bargate to allow attackers to steal a user’s session would work without without first having to first compromise the end user’s device as is the case today. Today if there are any flaws like that in the code they are innocuous as the attacker won’t have the encrypted password, and thus won’t be able to access any systems.

The new design

What we want to achieve is quite simple:

  • The bargate server, if attacked, can’t be used to steal user passwords (i.e. don’t store users passwords in plain text and don’t store them encrypted if the encryption key is known by the application)
  • The end user device, if attacked, can’t be used to get the user password or even the encrypted password
  • The user’s password or encrypted password should not be sent over the wire on every request, only at log on time

The password of course has to be stored somewhere, but it does not have to be stored in plain text, and the place where it is stored does not have to have the encryption key. That is how it works today – its stored on the client which doesn’t have the encryption key – but this has several downsides. Instead the new Bargate authentication system will store the password encrypted on the server, but encrypted with a key stored in the user’s session, thus reversing the design.

This means:

  • Passwords are no longer encrypted using the same encryption key for every user, each session has a unique encryption key.
  • The end user device does not store the password in any form, which allows the deploying company/group/user to focus on server security rather than end user device security (especially important in the age of BYOD).
  • Attacking the end user’s device gives the attacker no useful information. If you get access to the per-user/session encryption key stored on the client this key only decrypts an encrypted password the client never has and never will have.
  • The encrypted password is not sent over the network on each request
  • The decryption key sent over the network on each request is itself encrypted by a key known only by the server, so it is useless to an attacker eavesdropping on the connection (if they had broken TLS).

The new design in detail

  1. The end user logs into Bargate by sending their username and password over TLS
  2. Bargate checks the username and password via LDAP, Kerberos or “SMB”
  3. Bargate generates a 32-byte (256-bit) session encryption key for the user
  4. Bargate encrypts the user’s password using the session encryption key and stores it on the server (most likely in Redis with an expiration)
  5. Bargate encrypts the session encryption key using ENCRYPT_KEY (a bargate config option) and stores it in the user’s session. Bargate does not store the session encryption key any longer.
  6. The user’s browser saves the encrypted decryption key in the browser’s cookie storage
  7. The user’s browser is redirected to view a file server
  8. The user’s browser presents the encrypted decryption key to the server as a cookie over TLS
  9. Bargate decrypts the decryption key using ENCRYPT_KEY
  10. Bargate uses the resulting decryption key to decrypt the password stored in Redis
  11. Bargate uses the decrypted password to authenticate to the SMB server on the user’s behalf

Remaining attack vectors

There are two remaining attack vectors.

  • Session hijacking
    • An attacker can still take session cookies off a client and then use them. This threat is reduced with TLS and http only cookies, but an attacker could still get to them. This is a generic problem with web applications however. Adding restrictions to lock sessions to an IP address is an option, but can be disruptive and is of limited benefit.
  • Attacker with access to both the server and client
    • If the attacker has compromised both ends, well, you know, game’s over anyway.

A replacement for nss_updatedb: nis2db

In 2011 the glibc project released version 2.15 which dropped support for using Berkley DB based database files as a source of user/group information (well, any name service switch information). Instead the “db” backend of name service switch (nss) is now provided by a simple glibc-specific db file format.

This means the nss_updatedb tool, which I have used for years to provide network-free user/group information on Linux machines, no longer works on modern Linux systems. The tool generated BDB files that glibc’s nss db module simply does not support anymore. All Linux systems using glibc 2.15 or later are affected by this change.

To restore the functionality I need, which is to pull user and group information out of NIS and place them in db files that glibc can read, I have written “nis2db” which is a really simple python script which reads from NIS and uses the “makedb” command shipped with glibc.

The tool is available now and is open source:


Britain: For the Love of God, Please Stop David Cameron

Benjamin Studebaker

On May 7 (this Thursday), Britain has a general election. I care deeply about British politics–I did my BA over there and will return to do my PhD there this fall. But more importantly, David Cameron’s government has managed the country’s economy with stunning fecklessness, and I couldn’t live with myself if I didn’t do my part to point this out.

View original post 1,605 more words


Kerberos protected NFS with Active Directory and the PAC

For years I’ve been trying to use Active Directory’s Kerberos implementation for setting up secure NFS4. This is where NFS4 is configured to require kerberos tokens to make sure only the user with a valid kerberos token (i.e. they authenticated to Active Directory) can access their relevant files. This is stark contrast to NFS with AUTH_SYS – where certain IP addresses are essentially given full access.

The advantage of using NFS4/krb5 is that it can be used to share out a protected NFS4 file share to whatever IP address you like safe in the knowledge that only authenticated users can access their files. They have to authenticate with Kerberos first (i.e. Active Directory) before they can access their files – and only their files. It also solves the ‘root squash’ problem – root cannot access everybody else’s files.

However, in the past we were using Windows Server 2003 as our domain controllers – we only upgraded to Server 2012R2 a few months ago. Since upgrading we could finally mount (connect to) NFS4 shares protected by Kerberos. Once mounted however users could not use their kerberos ticket to access their files – permission denied was returned. The logs showed no errors. It was a head banging against brick wall moment.

Everything should have been working, until I discovered an obscure article suggesting that our users are in too many groups. Sure enough thanks to some odd internal practices relating to software groups and Sharepoint our users were in literally hundreds of groups – but why would this break NFS4? Its because, as ever, Active Directory isn’t a what I’d call standard Kerberos implementation. Active Directory uses an optional RFC4120 field called ‘AuthorizationData’. It fills this with a Microsoft-only ‘thing’ called the Privilege Attribute Certificate or ‘PAC’. It contains all sorts of additional information such as groups, SIDs, password caches, etc. Its essential to Microsoft servers – but NFS4 doesn’t need it. NFS4 doesn’t send group information.

The good news is you can instruct AD not to send PAC information for your NFS4 server. The procedure is very simple:

In the Active Directory Users and Computers tool, select View -> Advanced Features.

  • Open the computer object properties of the NFS4 server (i.e. find the computer object for your NFS4 server)

  • Select the Attribute Editor tab

  • Edit the “userAccountControl” attribute

  • The original value will probably be 4096 and be displayed as “0x1000 = (WORKSTATION_TRUST_ACCOUNT)”. Don’t worry if it isn’t that however.

  • Add 33554432 to the value field.

  • Click OK

  • Ensure the stored value shows “0x2000000 = (NO_AUTH_DATA_REQUIRED)”

Once this is done the PAC won’t be added to the Kerberos ticket. This should then allow users to access NFS4 mounts and manage their files – and prevent anybody else managing their files!

In RHEL7 you should not need to do this as the ‘svcgssd’ service has been replaced with a new daemon – the gss-proxy. This software has been written specifically to cope with the huge size of Active Directory Kerrberos tickets. Sadly I don’t have a RHEL7 system (yet) to prove this. I will update this blog post when I do!

Bargate, Linux, Personal, Python, Web Development

Filestore Web Access – or how I fell in love with programming again

When I was 16 I wrote a little ‘CMS’ or website content management system called IonPanel. It was pretty awful – it was written in PHP and MySQL, was probably terribly insecure and I mostly programmed it on Windows using IIS. It was however terribly exciting to write, and rather popular for a little while. Searching for the right string on google would find hundreds upon hundreds of websites running the software, and it was open source! Lots of people contributed to it. Several of my friends wrote little CMS packages, but none were as popular as IonPanel, and none as fast and feature packed. I was very proud of it. Sadly it died of the second-system effect when I attempted to re-write it for version ‘2.0’. A beta was launched, but then I went to University, I started realising how terrible PHP was, and I gave up. IonPanel slowly died. As time passed I longed for that time again – when I was writing code daily on an open source project that lots of people were using.

Since then I’ve written lots of code for lots of people but nothing has captivated me like IonPanel did – until now – twelve years later. A year or so ago I got the idea of writing a web interface to the University’s file storage platform. I’d recently got into Python and wanted to find a CIFS/SMB library I could use from Python. I found one – albeit badly documented and barely used – and wrote an application around it. Today that application has grown into something I’m extremely proud of. Enter ‘Filestore Web Access’.

Filestore Web Access allows all university students and staff to access their personal and shared files from a modern web browser anywhere in the world. Until I created FWA getting access to files away from the University’s standard desktops was quite difficult, unless you knew how to use SSH!

At the time of writing, it’s looking really rather good, here it is in two different themes:

Screen Shot 2014-04-21 at 19.30.55           fwa-flatly

The responsive design (thanks to Twitter Bootstrap, and a lot of extra code) causes it to work great on mobile:

Screen Shot 2014-04-21 at 19.36.28 fwa-mobile-1

And the new login screen with changing backgrounds I’m especially proud of:

Screen Shot 2014-04-21 at 19.33.35 Screen Shot 2014-04-21 at 19.33.59 Screen Shot 2014-04-21 at 19.33.47


I intend to write more about FWA in the next few days and weeks. Until then you can look take a look at even more screenshots!

You can also view the project page on GitHub: