Why I now prefer Arch Linux

Yesterday I installed Linux onto a fresh laptop and there was no debate in my mind about which Linux distribution I would use – Arch. A year ago that would not have been the case; I might even have considered installing RHEL instead. So why Arch?

When I was younger – and still today – I noticed that most people I spoke to in the IT profession who had reached their fourth or fifth decade of life had given up on new ideas and new technologies and had settled into what they knew best. Although I respected this, I didn’t enjoy the thought that maybe one day I’d become the same way, since I felt that my openness to new ideas and technologies was one of my strengths, and something I deeply enjoyed.

Well, I’m in my third decade of life now, and I have definitely begun to notice how easy it can be to settle for what is comfortable and stick with it. Thankfully I haven’t lost my passion for new ideas and trying new things. Last year at Southampton we hired an intern – this in and of itself is a tribute to my manager for changing the norm and doing something new, because nobody had ever done that before. It was an eye-opening experience, not just because it was enjoyable to teach, but because the internship taught us far more than we realised it would. It challenged us. It made us think differently. The whole team improved. What I didn’t expect was to change my preferred desktop operating system – but it did. The intern had an established liking for Arch Linux, advocated for it when asked why, and left it at that. He didn’t push me to try it – but it made me want to try it. And so I did.

For a typical end user, Arch Linux is next to impossible to install. It doesn’t have an installer at all. The bootable image you download starts Linux and then drops you at a zsh prompt. From there it’s entirely up to you what you do next. The installation documentation is clear and concise, but not detailed – you need to know what you’re doing to get anywhere – and this, for me, is perfect. It gets out of the way, gives me the tools to install, and then lets me get on with it. I loved it.
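
For the curious, here is a rough sketch of the sort of commands the installation guide walks you through. The disk device, partition and package names are illustrative, and the exact package set changes over time, so treat this as the flavour of the process rather than a recipe:

    # partition and format the target disk (device names are examples)
    cfdisk /dev/sda
    mkfs.ext4 /dev/sda2
    mount /dev/sda2 /mnt

    # install a base system into /mnt and generate an fstab for it
    pacstrap /mnt base linux linux-firmware
    genfstab -U /mnt >> /mnt/etc/fstab

    # chroot in and carry on: timezone, locale, users, bootloader...
    arch-chroot /mnt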

The initial experience made me feel like a young teenager installing Linux for the first time. It literally made me feel young again. I felt like I was learning Linux again. Although I had used all these tools before, I had become used to an installer doing it all for me, and I enjoyed learning things I didn’t know about tools I’d used for years.

Arch Linux is minimalist and simple. It gets out of your way. You have total control over what your system looks like. It doesn’t hide details from you, so you learn why things are done the way they are. At each stage the documentation lets you decide how you want to build your system and gives you information on why you might pick any particular solution.

It is easy to build a desktop Linux installation. If you want GNOME, on Wayland, with all the bells and whistles, it’s easy. If you want Xorg and Fluxbox, that is very easy too. If you want to use GRUB as your bootloader, you can, but if you want to use systemd-boot (as I do), you can do that too! Each option is easy to achieve, and isn’t hidden away. On Debian I wanted to switch to systemd-boot, but it was far more difficult than expected, and I gave up and just left it with GRUB. On Arch there is support for many different options, and they’re all easy to set up. You don’t have to accept what the distribution wants you to use.
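
As an illustration of how little is hidden from you, switching to systemd-boot is roughly this (the root device in the loader entry is an example):

    # install systemd-boot into the EFI system partition
    bootctl install

    # then describe the kernel in a loader entry, e.g. /boot/loader/entries/arch.conf:
    #   title    Arch Linux
    #   linux    /vmlinuz-linux
    #   initrd   /initramfs-linux.img
    #   options  root=/dev/sda2 rw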

The other fantastic feature of Arch Linux, though, is that it seems to solve the classic divide of Linux distributions: new software versus stable software. If I were to pick RHEL as my distribution it would be stable, sure, but the software is old, and most modern apps can’t be easily installed. Just getting Flatpak on RHEL is next to impossible. I could alternatively pick Fedora, but then I have to go through a huge upgrade every six months, and often things break.

Arch is a hybrid of these approaches, in that it is a rolling release distribution. I have access to the latest stable versions of all the software I want, but there is no ‘big bang’ forced upgrade every six months. I update when I want to, and Arch updates individual packages as they reach a stable point based on the application itself, rather than an arbitrary time deadline. With RHEL, software is considered stable simply because it is of a certain age – with Arch, software is considered stable when its developers have declared it to be so, based on their thorough understanding of their own releases. You might think that Arch would be somewhat like Debian unstable – frequent bugs and breakages from updates – but in fact Arch seems to hit the sweet spot of stability and modernity.
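
In practice, ‘I update when I want to’ is a single command run as root (or via sudo):

    pacman -Syu    # refresh the package databases and upgrade everything with a newer release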

Pretty much my only criticism of Arch is the lack of a standard AUR installation tool in the base platform. AUR refers to the Arch User Repository – add-on software packages that aren’t maintained by the core team. It is heavily used by end users, though, and there are many front-end tools that make it easy to install AUR packages. Without one of these tools users have to clone a git repository and build packages themselves. With a front end such as pacaur (which is what I used), it’s as simple as using the standard package manager.
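
To illustrate the difference, here is the manual route versus a helper, using a made-up package name:

    # by hand: clone the package's AUR repository and build it
    git clone https://aur.archlinux.org/some-aur-package.git
    cd some-aur-package
    makepkg -si          # build, resolve dependencies, and install

    # with a helper such as pacaur it feels just like pacman
    pacaur -S some-aur-package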

So, if you like Linux, and use it on the desktop, and want a better experience: go give Arch a try. You might just like it 🙂

bargate 1.6

bargate, an open source web interface to SMB file servers, hasn’t had many new features recently. It is now very stable and reliable and there is little need for new user-visible features. I do, however, plan a number of changes for v1.6:

Add pysmb support for better SMB2 performance

bargate uses the pysmbc library to talk to SMB file servers. This is a very thin wrapper around Samba’s libsmbclient – the de facto standard open source SMB client. Sadly, when using SMB2 or later, performance is terrible. The trouble seems to be in the stat() call; here is a comparison:

When using SMB2, pysmbc/libsmbclient performance is so bad that it is unusable. Until now this hasn’t really been a problem – everything still supported SMB1, and Samba still doesn’t enable SMB2 by default. Since the Windows WannaCry vulnerability, though, many folks are turning SMB1 off altogether (although there really isn’t a good reason for doing so).
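
For reference, ‘turning SMB1 off’ on a Samba server is usually a one-line change in smb.conf; this fragment is illustrative, and the default protocol range depends on your Samba version:

    [global]
        server min protocol = SMB2    # refuse SMB1/CIFS connections entirely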

I haven’t made any progress with the Samba team in identifying why SMB2+ is so much slower, so I’ve decided to make the ‘backend’ of bargate modular. You will be able to choose which library to use – either pysmbc (libsmbclient) or pysmb, an alternative pure-Python SMB1/SMB2 implementation. Unlike pysmbc, the pysmb library does not suffer performance issues when using SMB2.

Switch to Bootstrap 4

When Bootstrap 4 is released I’ll switch over to it from the current Bootstrap 3, and take the opportunity to rewrite much of the HTML to make the pages more efficient.

Add text editing in browser

The final feature I’m planning to add in bargate 1.6 is support for editing files ‘in-browser’ via the CodeMirror JavaScript-based text editor. This would allow users to edit text documents from within bargate, such as plain text, HTML, or source code.

Attacking racism does not stop racism

This year – 2016 – has to be one of the most politically divisive years in recent history. In the UK voters opted to leave the European Union, many clearly hoping to end immigration, and in the USA voters opted to elect Trump, again many hoping to end immigration. In both cases polls before the elections proved to be largely incorrect, and what’s more, the principle of the silent majority seems to be the cause.

I’m not going to spend long arguing this point, but it seems clear to me that a lot of people in both countries didn’t admit how they truly felt and so also didn’t admit what they were going to vote for. After the election seemingly legitimises how they voted, they pretty much always say they hate being labelled racists and feel that their fears and arguments are shut down as racist. They usually end up by saying this vote means they are the majority and they are in fact not racist at all.

It is clear a majority of these voters feel ignored by the ‘establishment’ and feel that they are not represented. They are correct. The establishment has ignored them. Mainstream political parties don’t know what to do with them.

In both elections the winning slogans have been succinct and obvious: “Take back control” and “Make America great again”. They are anti-progressive rallying calls, and the left-wing and centre-ground response is to call people who agree with them ‘a basket of deplorables’, ‘racists’, ‘sexists’, ‘homophobes’ and ‘transphobes’. They are correct. These people really are motivated by these emotions.

It is thus both true that these people are ignored and that they hold racist views. The mistake that we make is to demonise these people, to attack them and to call them racist – even if it is in fact true. Unless we’re planning on disenfranchising these people because they are racist, elections will continue to be lost and the “new right” will continue to grow – condemning them makes them angrier and more likely to vote for people like Trump who legitimise actions which they think will make them feel better.

Recently I fell out with a close friend because his reaction to racism in America was to attack people who happened to be white – calling them racists (“all white people are racist”). Rather than engage with these people and hold civilised debate, he called white people “saltine crackers”, he accused them of having no culture, and he accused them of not having friends who weren’t white. Such a reaction is obviously completely bonkers because it does nothing to actually end racism and discrimination. It only angers people further, and emboldens them to spread their unacceptable views. It’s like fighting a house fire by throwing bombs at it.

What we need to do is accept that racism is natural and very human. Instead what happens more often than not is that we assume racism exists only because white people are ‘inherently’ racist, which is in and of itself racist! The irony should be obvious. We should seek to understand racism, accept that it’s a perfectly normal thing for people to feel, and educate ourselves on how not to take racist actions.

It is not hard to see why we’re all capable of having racist views. Humans evolved to survive – just like every other creature on earth. As such, when we interact with people who look or act significantly differently to ourselves we react in fear – because that is what kept us alive for the past few million years. It’s a perfectly logical response. I will call upon Star Trek to better illustrate my point:

QUARK: You never pulled a stunt like that. You’re smart enough to know that people don’t want to be reminded that you’re different. Who wants to see somebody turn into goo? I hope you don’t do that around Kira.

ODO: Why shouldn’t I?

QUARK: If she’s anything like me, she’d rather you didn’t. Don’t you get it, Odo? We humanoids are a product of millions of years of evolution. Our ancestors learned the hard way that what you don’t know might kill you. They wouldn’t have survived if they hadn’t have jumped back when they encountered a snake coiled in the muck. And now millions of years later, that instinct is still there. It’s genetic. Our tolerance to other lifeforms doesn’t extend beyond the two arm, two leg variety. I hate to break this to you, but when you’re in your natural state, you’re more than our poor old genes can handle.

ODO: So what are you saying, Quark? That the Klingons couldn’t help what they did because of their genes?

QUARK: I’m not trying to excuse what they did. I’m only telling you why it happened.

from Star Trek: Deep Space Nine Season 7 Episode 14 “Chimera”

Now, some people might think Quark and I are justifying racism. What we’re doing is justifying racist feelings, not actions. Feeling fear and disliking people that are different from you is normal. What matters is the actions you take based upon those feelings and logical thought. We don’t teach this though. We simply condemn racism, and in the process confuse and anger most human beings, because we’re not making it clear that it is perfectly normal and acceptable to feel fear and dislike towards people who look or act differently. What is NOT acceptable is intentionally acting on those feelings.

 

This is the difference between childhood and adulthood – learning to accept emotions and not just act directly on them. What we need to do is encourage people to express how they feel and take positive action, rather than voting for a narcissistic sexual predator who has no problem saying we should act on our fears because… “we have no choice”.

Dear Canonical: we don’t want or need ZFS

It is the late 1990s and the computer server world is dominated by enterprise UNIX operating systems – all competing with each other. Windows 2000 is not out yet and Windows NT 4 is essentially a toy that lesser mortals run on their Intel PCs, which they laughingly call ‘servers’. Your company has a commercial UNIX and it’s called Solaris. Your UNIX is very popular and is a leading platform. Your UNIX, however, has some major deficiencies when it comes to storage.

IRIX – a competing proprietary UNIX – has the fantastic XFS file system, which vastly outperforms your own file system, which is still UFS (“Unix File System”, originally developed in the early 1980s) and doesn’t even have journalling – until Solaris 7 at least (in November 1998). IRIX had XFS baked into it from 1994. IRIX also had a great volume manager – whereas Solaris’s ‘SVM’ was generally regarded as terrible and was an add-on product that didn’t appear as part of Solaris itself until Solaris 8 in 2000.

It wasn’t just IRIX that was extraordinarily better in this area. AIX was ahead of both – JFS was released in 1990 and had file system features that were only recently introduced by Microsoft with ReFS. JFS was a journalled file system – the first ever journalled file system included in an OS – and as I mentioned above it took until November 1998 for Sun to catch up. AIX had a “Logical Volume Manager” (LVM) implementation as well, which again was much better than Sun’s SVM.

This disparity between Solaris and the other commercial UNIX platforms did not, however, hold Solaris’s market share back as it perhaps should have. This was because customers using Solaris on big high-end servers would simply not use UFS, especially not between 1998 and 2005. Customers used VxFS instead – a third-party file system, but one that was developed originally at AT&T’s Unix labs, one that was the core file system in another proprietary UNIX – HP-UX – and one that had journalling, was modern, and could actually compete against XFS and JFS. Of course customers had to buy this from Veritas, but this was a small price to pay for a decent file system and volume manager (yes, it came with an alternative volume manager too – Veritas Volume Manager).

So eventually Sun realised that storage was exploding in capacity and UFS just wasn’t up to the task. They also realised that VxFS wasn’t likely to be up to the task either, and with the growing threat of Linux and Windows Server a different solution was needed – a file system to fix all the problems with UFS and leap-frog the competition. As a young man I was fortunate to work at Sun Microsystems when this was happening, and I got to meet the core ZFS developers and even work in the ZFS ‘development lab’ – I worked in the same building.

Sun had a problem though – they didn’t just need a new file system. Their RAID implementation (entirely in software – Sun servers never had hardware RAID) and their volume manager also needed to be replaced. So ZFS sought to replace all three of these components at once. Sadly it would take until 2006 for ZFS to be released into production use on Solaris, and by then the battle for the enterprise operating system was already over. Linux and Windows had won; the commercial UNIXes had lost. Intel had won the hardware war; the commercial UNIX vendors had lost. Perhaps file systems weren’t as important as Sun had thought.

ZFS is a child of the 1990s commercial UNIX systems. It is an incredibly complicated file system that manages the entire storage stack, from the disk spindles all the way up to the file system exposed to applications. It can manage vast quantities of disk spindles and scale up to 16 exabytes of storage. It is, however, still very much a product of 1990s Sun thinking – a file system for large, beefy all-in-one servers running hundreds of applications. The world had moved on whilst Sun wasn’t watching.

By 2006 the dominant server platform was the 1U or 2U Intel server running Linux or Windows Server 2003 – servers that almost universally shipped with hardware RAID controllers. High-end SAN storage arrays were king in the enterprise and ZFS wasn’t built for them at all – ZFS was designed to manage the disks directly, making it a great platform for building a SAN storage array itself! Except it wasn’t, because ZFS was still designed with a 1990s mindset. It has no clustering support; it’s a local file system designed for just Solaris to use.

The goal of ZFS was to allow Solaris to compete and to address vast swathes of storage that UFS and the other competing file systems could not. However, by 2006, when ZFS was finally released, the other file systems had caught up. They had evolved to scale to the available storage. For a short while everybody talked about how Linux needed ZFS, how Mac OS X needed ZFS, and how ZFS could even turn up in Windows. Ten years after ZFS was launched, none of those things have turned out to be true.

Even more frustrating for ZFS fans is that today the dominant computing model is virtual machines and containers: lots of relatively small operating system instances utilising relatively small data sets working together. ZFS makes very little sense in this environment.

Proponents of ZFS on Linux and elsewhere said that ZFS was required because it was revolutionary and much better than what Linux had. In some cases this was true, but in the important cases it was not. Linux was then, and still is now, mostly run on hardware RAID systems; it had fantastic, simple and reliable software RAID, a performant and simple volume manager (LVM), and a range of file system choices that scaled to the storage available then and now. Linux was gifted both XFS and JFS from Solaris’s rivals – both of which continued to develop, XFS particularly so.

Linux did lack some features of ZFS that were important – namely efficient snapshots and data checksumming. Ten years later we can clearly see that these gaps did not prevent the adoption of Linux, and ZFS did not in any way save Solaris – Solaris is dying, slowly, in private, away from the eyes of the press. Linux won, despite not having ZFS (or DTrace).

So what about today: does Linux need ZFS? Canonical thinks it does, and thinks ZFS is exciting technology – more exciting than we’ve seen in “a long time”[1]. Except it really isn’t. These are the same arguments we heard ten years ago, and yet ZFS is even less relevant today than it was a decade ago. Canonical tried to justify ZFS with a series of ‘killer’ features:

  • ‘snapshots’
    Linux already has copy-on-write snapshots via LVM thin-provisioned pools; it’s in production and supported in RHEL. What’s more, it supports most Linux file systems – you can choose whichever you like (see the LVM sketch after this list). If you prefer, you can dump LVM and use btrfs, which supports snapshots in the same way. So no, sorry Canonical, this is not a killer feature of ZFS.
  • ‘copy-on-write cloning’
    ZFS clones are just writable snapshots: ZFS takes a snapshot and then uses copy-on-write to present it as a writable clone. Well, shucks, Linux’s LVM supports this as well and has done for years – it is also a CoW-based system. Oh, and btrfs does this too. This isn’t a killer feature of ZFS either.
  • ‘continuous integrity checking against data corruption’
    XFS has metadata-only (not data) integrity checking too. Btrfs has full data integrity checking against data corruption. So, no, ZFS can’t claim this as a killer feature that others don’t have. It doesn’t matter anyway – this continuous integrity checking means nothing if you’re using ZFS on a hardware RAID controller or against an enterprise (or non-enterprise) storage array. It only guarantees anything if you’re letting ZFS manage the spindles directly. This was a product of 1990s thinking about how storage would be attached to or baked into Sun’s servers. Besides, when was the last time you got data corruption? What problem is this trying to solve? I’ve never felt that Linux needs this feature – have you? This isn’t a killer feature.
  • ‘automatic repair’
    Whilst it is true that ZFS does not have to run a horrible NTFS-style chkdsk process, or a horrible ext3-style fsck process either, other file systems have progressed in this regard too. XFS has a similar automatic repair function, never runs fsck at boot (there is no XFS fsck!), and has an xfs_repair tool that you should almost never need to use. It’s also worth pointing out that ZFS does sometimes need non-automatic repairs; in fact, I’ve had to do it a lot when running ZFS in production. ZFS scrubs are… not fun, ridiculously slow, and can lose files just like any file system can. I found this in production multiple times. Oh, and btrfs supports ‘automatic repair’ too. This isn’t a killer feature.
  • ‘efficient data compression’
    I think this is the only feature in Canonical’s list that has any merit, but I cannot call it a killer feature. Work is ongoing to add compression to ext4, but nobody seems to care much about doing it. If you really want it, it’s baked into btrfs on Linux. So no, Canonical, this is not a ‘killer’ feature.
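
To back up the snapshot and clone points above, here is a minimal sketch of copy-on-write snapshots and writable clones using LVM thin provisioning – the volume group, names and sizes are made up:

    lvcreate -L 100G -T vg0/pool0            # create a thin pool
    lvcreate -V 50G -T vg0/pool0 -n data     # create a thin volume inside it
    mkfs.xfs /dev/vg0/data

    lvcreate -s -n data-snap vg0/data        # instant copy-on-write snapshot
    # a thin snapshot is itself a writable thin volume, i.e. a clone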

ZFS – and sadly btrfs – are both rooted in a 1990s monolithic model of servers and storage. btrfs hasn’t caught on in Linux for a variety of reasons, but most of all it’s because it simply isn’t needed. XFS runs rings around both in terms of performance and scales to massive volume sizes. LVM complements XFS by adding CoW snapshots and clones, and even clustering if you want it. I believe the interesting direction in file systems is actually things like Gluster and Ceph – file systems designed with the future in mind, rather than for a server model we’re not running any more.

Canonical are targeting ZFS support at containers, saying that it’s the perfect fit. The irony is that containers don’t need ZFS. Red Hat is using an LVM/device-mapper CoW-based approach. CoreOS has switched away from btrfs (ZFS-style snapshots!) to overlayfs and ext4 – and apparently performance was much better. Docker can use overlayfs as well and recommends against using ZFS.

Of course, Ubuntu suffers from NIH syndrome, and so isn’t using Docker/Atomic/rkt etc. – it has created its own container technology, LXD. Perhaps it doesn’t matter, then, that you’re using OpenZFS – if you’re planning on using LXD (and thus ZFS) you’re ignoring the mature, established container technologies and picking a platform that is almost certainly going to be poorly supported going forward.

A replacement for nss_updatedb: nis2db

In 2011 the glibc project released version 2.15, which dropped support for using Berkeley DB based database files as a source of user/group information (well, any name service switch information). Instead, the “db” backend of the name service switch (NSS) is now provided by a simple glibc-specific db file format.

This means the nss_updatedb tool, which I have used for years to provide network-free user/group information on Linux machines, no longer works on modern Linux systems. The tool generated BDB files that glibc’s nss db module simply does not support anymore. All Linux systems using glibc 2.15 or later are affected by this change.

To restore the functionality I need – pulling user and group information out of NIS and placing it in db files that glibc can read – I have written “nis2db”, a really simple Python script which reads from NIS and uses the “makedb” command shipped with glibc.
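
The idea is simple enough to sketch in a couple of shell commands. This is roughly what nis2db automates for the passwd map – the key format here follows the Makefile glibc ships for its nss_db backend (entries keyed by ‘.name’ and ‘=uid’), and the output path is illustrative:

    ypcat passwd \
        | awk -F: '{ printf ".%s %s\n=%s %s\n", $1, $0, $3, $0 }' \
        | makedb -o /var/db/passwd.db -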

The tool is available now and is open source: https://github.com/divad/nis2db

Britain: For the Love of God, Please Stop David Cameron

Benjamin Studebaker

On May 7 (this Thursday), Britain has a general election. I care deeply about British politics–I did my BA over there and will return to do my PhD there this fall. But more importantly, David Cameron’s government has managed the country’s economy with stunning fecklessness, and I couldn’t live with myself if I didn’t do my part to point this out.


Kerberos protected NFS with Active Directory and the PAC

For years I’ve been trying to use Active Directory’s Kerberos implementation to set up secure NFS4. This is where NFS4 is configured to require Kerberos tickets, so that only a user with a valid Kerberos ticket (i.e. one who has authenticated to Active Directory) can access their files. This is in stark contrast to NFS with AUTH_SYS – where certain IP addresses are essentially given full access.

The advantage of using NFS4/krb5 is that you can export a protected NFS4 share to whatever IP addresses you like, safe in the knowledge that only authenticated users can access their files. They have to authenticate with Kerberos (i.e. against Active Directory) before they can access their files – and only their files. It also solves the ‘root squash’ problem – root cannot access everybody else’s files.
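
For context, the visible moving parts are small (there is also keytab and GSS daemon setup on both ends, which is the fiddly bit). Hostnames and paths below are made up:

    # server side – the export demands Kerberos security, /etc/exports:
    #   /export/home   *(rw,sec=krb5)

    # client side – ask for Kerberos at mount time
    mount -t nfs4 -o sec=krb5 filer.example.com:/export/home /mnt/home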

However, in the past we were using Windows Server 2003 as our domain controllers – we only upgraded to Server 2012 R2 a few months ago. Since upgrading we could finally mount (connect to) NFS4 shares protected by Kerberos. Once mounted, however, users could not use their Kerberos ticket to access their files – permission denied was returned. The logs showed no errors. It was a head-against-brick-wall moment.

Everything should have been working, until I discovered an obscure article suggesting that our users were in too many groups. Sure enough, thanks to some odd internal practices relating to software groups and SharePoint, our users were in literally hundreds of groups – but why would this break NFS4? It’s because, as ever, Active Directory isn’t what I’d call a standard Kerberos implementation. Active Directory uses an optional RFC 4120 field called ‘AuthorizationData’. It fills this with a Microsoft-only ‘thing’ called the Privilege Attribute Certificate, or ‘PAC’. It contains all sorts of additional information such as groups, SIDs, password caches, etc. It’s essential to Microsoft servers – but NFS4 doesn’t need it. NFS4 doesn’t send group information.

The good news is you can instruct AD not to send PAC information for your NFS4 server. The procedure is very simple:

  • In the Active Directory Users and Computers tool, select View -> Advanced Features.

  • Open the computer object properties of the NFS4 server (i.e. find the computer object for your NFS4 server)

  • Select the Attribute Editor tab

  • Edit the “userAccountControl” attribute

  • The original value will probably be 4096, displayed as “0x1000 = (WORKSTATION_TRUST_ACCOUNT)”. Don’t worry if it isn’t, however.

  • Add 33554432 to the value field (see the quick arithmetic check after this list).

  • Click OK

  • Ensure the stored value now includes “0x2000000 = (NO_AUTH_DATA_REQUIRED)”
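
As a quick sanity check of that arithmetic – 33554432 is just the NO_AUTH_DATA_REQUIRED bit (0x2000000) being added on top of the existing flags:

    printf '%d\n' $(( 0x1000 + 0x2000000 ))    # 4096 + 33554432 = 33558528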

Once this is done, the PAC won’t be added to Kerberos service tickets issued for the NFS server. This should then allow users to access NFS4 mounts and manage their files – and prevent anybody else from managing their files!

In RHEL7 you should not need to do this, as the ‘svcgssd’ service has been replaced with a new daemon – gss-proxy. This software has been written specifically to cope with the huge size of Active Directory Kerberos tickets. Sadly I don’t have a RHEL7 system (yet) to prove this. I will update this blog post when I do!

Filestore Web Access – or how I fell in love with programming again

When I was 16 I wrote a little ‘CMS’, or website content management system, called IonPanel. It was pretty awful – it was written in PHP and MySQL, was probably terribly insecure, and I mostly programmed it on Windows using IIS. It was, however, terribly exciting to write, and rather popular for a little while. Searching for the right string on Google would find hundreds upon hundreds of websites running the software, and it was open source! Lots of people contributed to it. Several of my friends wrote little CMS packages, but none were as popular as IonPanel, and none as fast and feature-packed. I was very proud of it. Sadly it died of the second-system effect when I attempted to rewrite it for version ‘2.0’. A beta was launched, but then I went to university, I started realising how terrible PHP was, and I gave up. IonPanel slowly died. As time passed I longed for that time again – when I was writing code daily on an open source project that lots of people were using.

Since then I’ve written lots of code for lots of people but nothing has captivated me like IonPanel did – until now – twelve years later. A year or so ago I got the idea of writing a web interface to the University’s file storage platform. I’d recently got into Python and wanted to find a CIFS/SMB library I could use from Python. I found one – albeit badly documented and barely used – and wrote an application around it. Today that application has grown into something I’m extremely proud of. Enter ‘Filestore Web Access’.

Filestore Web Access allows all university students and staff to access their personal and shared files from a modern web browser, anywhere in the world. Until I created FWA, getting access to files away from the University’s standard desktops was quite difficult, unless you knew how to use SSH!

At the time of writing it’s looking really rather good; here it is in two different themes:

(screenshots: FWA in two different themes)

The responsive design (thanks to Twitter Bootstrap, and a lot of extra code) means it works great on mobile:

(screenshots: FWA on a mobile device)

And I’m especially proud of the new login screen with its changing backgrounds:

(screenshots: the login screen with its changing backgrounds)

 

I intend to write more about FWA in the next few days and weeks. Until then you can take a look at even more screenshots!

You can also view the project page on GitHub: https://divad.github.io/bargate/

 

Docker is a whale which carries containers on its back

(the Docker logo)

See, it’s a whale! With containers! On its back! Like Discworld, but with a whale instead of a turtle.

Ever since I first played with User Mode Linux (UML) back in the days of Linux 2.4 I’ve been working with virtualisation, normally being involved in server virtualisation activities wherever I’ve worked. The project I’m leading right now at Southampton is the conversion of our entire physical server estate to virtual on VMware.

Despite living and breathing these technologies I’ve never actually liked x86 virtualisation. It is a terrible waste of code and processor time. It virtualises the entire hardware platform as if the guest OS were actually running on real physical hardware – but why? And even this isn’t entirely true anymore – in all modern virtualisation products the guest OS is fully aware it’s being virtualised, and there are tonnes of ‘tools’ and ‘drivers’ running to facilitate communication between guest and hypervisor. It’s thus a hybrid – a mess of different approaches and compromises.

I entirely blame Microsoft for the growth of this odd x86 virtualisation market. Outside of the x86 world, IBM and Sun created hardware-level virtualisation and OS-level virtualisation, but in x86 land, because of the proprietary and slow-moving nature of Windows, vendors sprang up creating the x86 hybrid virtualisation model – part hardware, part software. It meant you could run Windows inside a virtualised container and make better use of hardware – at the cost of enormous overheads and massive duplication of data. One of the most ridiculous things from an architecture perspective is every x86 VM solution emulating a PC BIOS or UEFI firmware instance for every guest. Whatever for!

So for a long time I’ve been hoping that “OS-level” virtualisation would eventually assert itself and become the dominant form of virtualisation. I think it hasn’t because Microsoft joined the x86 virtualisation party with Hyper-V and rushed off to compete with VMware, and so the market has carried on down this odd virtualisation path. Architecturally there will always be a place for this type of virtualisation, but the vast majority of servers and virtual desktops don’t need it. They don’t need to pretend to be running on real hardware. They don’t need to talk to a fake BIOS. Clearly the x86 virtualisation vendors think this too, as each new generation of product has mixed more ‘paravirtualised’ components into the product – to improve performance and cut down on duplication.

So what’s the alternative? Real OS-level virtualisation! There is plenty to choose from, too. Solaris has Zones/Containers. FreeBSD has jails. AIX has WPARs. HP-UX has HP-UX Containers. Linux predictably has lots to choose from: OpenVZ, VServer, lmctfy and LXC to name a few (and predictably, until recently, none were in the upstream kernel). LXC is the one everybody was talking about. The idea was to put acceptable OS-level virtualisation components into the kernel, rather than just taking OpenVZ and shoving it in, which would have ended badly and never been accepted. Because of this LXC has taken a long time to write, and it has somewhat lost its ‘new! exciting!’ sheen.

LXC remains, however, the right architectural way to do virtualisation. In LXC, and all the other OS-level technologies, the host’s kernel is shared and is used by the guest container. No hardware is virtualised. No kernel is virtualised – only the userland components are. The host’s kernel is still doing all the work, and that’s what the guest operating system uses as its kernel. This eliminates all the useless overheads and allows for easy sharing of userland components too – so you don’t have to install the same operating system N times for N virtualised guests.
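
To make that concrete, working with LXC 1.0 looks roughly like this – the container name and the distribution/release passed to the download template are just examples:

    # create a container from a pre-built userland image (no kernel inside it)
    lxc-create -n demo -t download -- -d ubuntu -r trusty -a amd64

    lxc-start -n demo -d     # start the container's userland on the host kernel
    lxc-attach -n demo       # get a shell inside the container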

Sadly everybody’s experience with LXC for the past few years was along the lines of “oooh, that sounds awesome! is it ready yet?”, and usually the answer was “not yet… nearly!”. All that changed last month, though, as LXC 1.0 was released and became ‘production ready’. Yay! All we needed now, I thought, was for all the Linux shops to switch away from bulky, full-fat x86 hypervisors and start moving to LXC. Instead, by the time LXC 1.0 was released, something else had come along and stolen the show.

Enter Docker. Now, Docker is built on LXC. Without LXC, Docker wouldn’t exist. But Docker extends LXC. It’s the pudding on top which turns it into a platform literally everybody is talking about. Docker is not about virtualising servers, it’s about containerising applications – but it uses LXC underneath. The Docker project says that the aim is to “easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, OpenStack clusters, public clouds and more.”

So when I realised Docker was getting massive traction I was displeased, because I wanted LXC to get this traction, and Docker was stealing the show. However, I had missed the point. Docker is revolutionary. I wanted LXC to kill all the waste between the hardware and the server operating system’s userland components – the parts that are my day job. Docker wants to kill that waste, and all the waste in the userland of the operating system as well – the parts I hadn’t considered a problem.

For years vendors and open source projects have produced applications, released them and asked an IT department to install and maintain operating systems, install and maintain prerequisite software, and then install the application and configure it. Then, usually, another team in the organisation actually runs and maintains the application. Docker has the potential to kill all of that waste. In the new world order the vendor writes the code, creates a container with all the prerequisite OS and userland components (except for the Linux kernel itself) and then releases the container. The customer only has to load the container and then use the application.
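
In concrete terms, ‘loading the container’ is a one-liner for the customer – the image name and port mapping here are only an example:

    # fetch the vendor's image and run it, publishing the app's port on the host
    docker pull example/vendor-app
    docker run -d -p 8080:80 example/vendor-app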

It is, in effect, the fairly well-established ‘virtual appliance’ model seen in VMware/KVM/Hyper-V land, but with all the x86 hypervisor waste removed.

This has many benefits:

  • The software vendor doesn’t have to maintain a full operating system that is expected to work on any number of virtualisation solutions and different fake hardware models. They only have to target LXC, with the host kernel doing all the difficult work.
  • The software vendor can pick and choose whatever userland components they need and properly and fully integrate the application with the userland OS.
  • The software vendor takes care of patching the userland OS and the application. The patching process is integrated. No more OS patches breaking the app. No more OS patching for the IT department to do.
  • The customer IT department’s work is radically and significantly reduced. They only have to deploy the container image – a very easy procedure – and within seconds have a fully set up and ready to use application.
  • An end to dependencies, prerequisites, compatibility issues, lengthy installations, and incorrectly configured operating systems and applications.
  • And all the benefits of LXC – low overheads, high performance, and an end to the duplication of the same operating system.
  • An end to having to upgrade and move applications because the guest server operating system is now end-of-life – even if the application isn’t.

So, today’s IT platforms probably consist of:

  • A farm of physical servers running a hypervisor platform like VMware or KVM
  • Hundreds if not thousands of virtual machines running only two or three different operating system flavours (e.g. RHEL 5/6 or Windows Server 2008/2012), with a small number of VMs (<10%) running more exotic things
  • Teams of infrastructure people maintaining the guest operating systems, using OS-level management systems such as RHN, Landscape, Puppet, Chef, CFEngine, Runit, etc., and spending a lot of time patching and maintaining operating systems.
  • Teams of application people, usually without root, or even worse with root, having an uneasy relationship with infrastructure teams, installing applications and patching them (or probably not patching them) and maintaining them.

If Docker catches on the way I’d like it to (beyond what even the Docker project envisaged) then I think we’d see:

  • A farm of physical servers running an LXC hypervisor Linux OS
  • Hundreds if not thousands of Docker containers containing whatever the vendor application needs.
  • Teams of application people using the vendor-supplied web interfaces to manage the applications, patching them using vendor patching systems which integrate all the components fully, or just upgrading stateless Docker instances to the latest version.

It seems that this vision is already a reality: https://coreos.com/. CoreOS envisages applications packaged as Docker containers, with CoreOS as the minimalist platform hypervisor underneath. The IT department’s sole job would be to install CoreOS onto hardware and then load Docker containers as needed from vendors, open source projects, and internal software development teams.

This is all very new and cutting-edge. Docker 0.9 was only released a few weeks ago. CoreOS’s latest version is a major change. Other exciting areas of development with Docker are plans to let you swap out LXC and use OpenVZ, Solaris Zones or FreeBSD jails instead, thus opening Docker up to Solaris and BSD too. This is a very exciting new frontier which, if successful, will totally rewrite how the IT world works. I can’t wait to see what happens next.