How Kubernetes Came to Dominate Large-Scale Computing, Part 1

Dozens of vendors are wooing organizations of all sizes to adopt cloud services. But the hot item that all the experts are recommending to developers and system administrators is Kubernetes. Master Kubernetes, and you can run programs on any cloud service, and you have a good bid for jobs at enterprises around the world.

This series tries to explain why Kubernetes achieved this central role so quickly since its first release in 2014. We’ll compare the container model with two other popular forms of computer resource sharing: virtual machines, also known as Infrastructure as a Service (IaaS), and Functions as a Service (FaaS). We’ll also show how Kubernetes entered a rich environment of containerization, particularly Docker, and capitalized on what was there. (I won’t use the term “serverless computing” in this series, partly because it’s vague and has been applied to both PaaS and FaaS, but mostly because it’s a misleading term: there’s always a server somewhere.)

Kubernetes is covered in the LPI DevOps Tools Engineer certification.

Three Popular Forms of Resource Sharing

Readers of this series probably understand the difference between IaaS, Platform as a Service (PaaS, where containerization falls), and FaaS. I’ll describe the history of each briefly, showing what each is good for and where it caught on. But first some basics for those who can benefit from them:

IaaS (virtual machines)

Here an emulation of a complete computer system, including its operating system and drivers, runs on top of another. Different VMs share hardware and the host software, known as the hypervisor, but are unable to see each other or know of other VMs’ existence.

PaaS (containerization)

An application and its environment (such as libraries, and sometimes ancillary services such as logging) run in isolation on top of a shared operating system. Docker and Kubernetes are just components within a full PaaS offering.

FaaS

Isolated, stateless functions run on a platform provided by the host system with a full environment to support the functions.

At each of the three stages just listed, the host system and its vendor take on more software and more responsibility, freeing the clients’ developers and system administrators to concentrate on a smaller set of software components.

Although IaaS and FaaS are important, Kubernetes has far outstripped them, as well as dominating the PaaS space. We’ll look first at the history of virtual machines and FaaS, then at containerization.

Virtual machines

The idea of emulating a full computer system, and of offering virtual computer systems to clients at a far lower cost than they would have to pay for running their own hardware, goes far back. Starting in 1967, IBM supported virtual machines on its System/360 (the computer that came to mind when everybody thought of computers during the 1960s and 1970s). VMware launched in 1998, and Amazon Web Services (AWS) in 2006.

There were also plenty of emulators that allowed one to run multiple computer environments on one desktop system, each isolated. One well-known example is QEMU, which goes back to 2003 and is widely used on top of GNU/Linux.

One great advantage of virtual machines is that the host and guest operating systems can differ. Most cloud vendors have adopted Linux for the host, but a lot of their clients want to run Windows. Microsoft Azure, as one would expect, runs on Windows hosts. But in any case, clients have their choice of operating systems to run on top.

Virtual machines were considered leading-edge for years up until 2013, when the creation of Docker suddenly made containerization feasible. Virtualization suffered from a lack of standardization, a gap that several organizations tried to fill.

Two notable open source hypervisors are Xen and Kernel-based Virtual Machine (KVM), both of them still in use. Xen was released as open source in 2003 and demonstrated that Linux could host virtual machines; both Amazon EC2 and Google used Xen for a while in their cloud offerings. KVM was first released in 2006.

(QEMU and Xen are covered in the LPIC-3 Virtualization and Containerization certification.)

But the overarching standard in virtual machines is OpenStack, which was based on work by Rackspace, a major hosting service, and was first announced in 2010.

OpenStack had audacious goals and quickly built up components to match them. It now offers a stack of more than 30 distinct pieces of software: networking, identity (access control), telemetry (administrative tracking), three types of storage, etc.

I suggest that OpenStack appeals to companies with massive data management needs. For instance, it can run cloud services for millions of clients with support for accounting, observability, and other key business requirements.

But it’s a world unto itself. OpenStack reminds me of two other gargantuan standardization attempts, neither of which succeeded to the extent that OpenStack did.

One failed standard was the Common Object Request Broker Architecture (CORBA), which was meant to tie together independently running programs. It tried to anticipate and fulfill all the needs of programs making remote procedure calls, and kept adding layers and complexity while never catching up to industry needs. Ultimately, programmers solved most of their problems through the web-based REST model, which I have described elsewhere.

The other failed standard was the Distributed Computing Environment (DCE). It was a lash-up of products from several companies, some of them long defunct, such as Apollo Computer, Digital Equipment Corporation, and Hewlett-Packard. DCE aimed to make a local network appear like one computer, as NFS and CIFS do but on a much grander scale. DCE failed for several reasons, but the main problem was that (like CORBA) it tried to accomplish too much, and the vendors took shortcuts by tying together incompatible (and extremely buggy) products instead of producing an integrated system from the ground up.

One observer found more than 5,000 companies using OpenStack currently. That’s a robust customer base, but it pales before Kubernetes, which is estimated to be used in 60% of enterprises.

Why have virtual machines faded in importance in the age of containers? Containers are much more lightweight than virtual machines. So they start up more quickly (making it easier to respond to spikes in user load) and cost less to run. More and more developers prefer Linux as their application platform, so they have no objection to running on a Linux host; the great advantage of virtual machines no longer makes a difference.

The job of populating a virtual machine is a significant burden. To keep down the instances’ footprints (and hence the latency and cost of running the virtual machines), administrators must rigorously search their system and eliminate every daemon, every utility, every library that isn’t needed by their particular application. This is an important practice for containers too, but because the developer isn’t including an operating system, the task is much easier.

Many observers claim that containerization is more likely than virtual machines to allow breaches (where one client can read or even write another client’s data). Vulnerabilities have been found both in VM hypervisors and in operating system support for containers. I have never been able to figure out why some people assume that containerization is less secure: their reasoning seems to be that operating systems that support containers are bigger than hypervisors and therefore have a larger attack surface, but that assumption seems simplistic to me. Google, which created Kubernetes and naturally has a lot invested in containerization, published an article whose point (to perhaps be simplistic) is that the important thing in security is not whether you use virtual machines or containers, but whether you build your software using secure practices.

FaaS

IaaS, which I’ve just expounded on, and PaaS, which I’ll describe last, can be self-hosted, although third-party vendor offerings of both are extremely useful. By contrast, FaaS is entirely designed for a third-party vendor. If you run functions on your own hardware, you’re just running programs as usual. The attraction of FaaS is that you outsource almost all administrative tasks to a third party. In consequence, porting your functions to another vendor could involve a lot of work.

FaaS can be ideal for certain use cases. For instance, most web servers are already highly modularized and handle each user request individually. Thus, ephemeral function invocations that rely on an external database to store state can implement a web server very efficiently. Clearly, that’s a major market in computing.
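To make that pattern concrete, here is a minimal Python sketch of a stateless function that keeps everything it needs to remember in an external store. The handler name, the shape of the event, and the dictionary standing in for the database are all assumptions made for illustration; they are not taken from any particular FaaS vendor.

```python
# Hypothetical sketch: a stateless request handler in the FaaS style.
# EXTERNAL_DB is a placeholder for a real external database; the handler
# itself keeps no state between invocations.

EXTERNAL_DB = {}  # assumption: stands in for an external data store


def handle_request(event, db=EXTERNAL_DB):
    """Handle one request; anything that must persist goes into db."""
    user = event["user"]
    # Read the previous state from the external store, not from memory.
    visits = db.get(user, 0) + 1
    # Write the new state back before this function instance goes away.
    db[user] = visits
    return {"status": 200, "body": f"Hello {user}, visit number {visits}"}
```

Because the function holds nothing in memory between calls, any instance can serve any request, which is what lets the platform start and discard instances freely.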

Functions in FaaS are event-driven. You tell the vendor what external activity you want to trigger each function, and the server invokes the function as needed, scaling up as much as necessary by running more instances of the function. So FaaS is a reasonable option for any short-term task that takes input from a user and returns a result, then goes dormant.
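The event-driven model can be sketched as a toy dispatcher in Python. Everything here is hypothetical: the `on` decorator, the `image.uploaded` event name, and the thread pool that stands in for the vendor running more instances are illustrative choices, and real FaaS platforms work differently under the hood.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy platform: maps event types to functions and runs one
# function invocation per incoming event.
TRIGGERS = {}


def on(event_type):
    """Register a function to be triggered by the given event type."""
    def register(func):
        TRIGGERS[event_type] = func
        return func
    return register


@on("image.uploaded")
def make_thumbnail(payload):
    # Placeholder work: a real function would resize the image.
    return f"thumbnail for {payload['name']}"


def dispatch(events, max_instances=4):
    """Invoke the registered function once per event, in parallel."""
    with ThreadPoolExecutor(max_workers=max_instances) as pool:
        futures = [
            pool.submit(TRIGGERS[e["type"]], e["payload"]) for e in events
        ]
        # Collect results in the same order the events arrived.
        return [f.result() for f in futures]
```

The developer writes only `make_thumbnail` and declares its trigger; deciding when to call it and how many copies to run is left to the platform, played here by `dispatch`.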

But most applications have multiple layers of functions, along with supporting services and a need to preserve state in addition to any database they use. FaaS leaves all this up to the developer to implement separately.

So if virtual machines offered more than most developers wanted, FaaS offered less. Containerization was the sweet spot. There was only one problem during the early growth of distributed programming: no convenient and highly functional software existed for containers. There were PaaS services, but they involved vendor lock-in and didn’t provide all the tools developers wanted.

That changed in 2013—the year of the container. We’ll pick up the story of containers in the second article of the series.

Author

  • Andrew Oram

    Andy is a writer and editor in the computer field. His editorial projects at O'Reilly Media ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. Andy also writes often on health IT, on policy issues related to the Internet, and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM (Brussels), DebConf, and LibrePlanet. Andy participates in the Association for Computing Machinery's policy organization, USTPC.
