Code That Built the Internet: The Impact of BSD, Part 1

Like the Spanish Inquisition, nobody expected the internet. Its earliest appearance (as ARPANET) took place in 1969, the same year as the debut of Monty Python, the comedy troupe behind the Spanish Inquisition quip. But during its first decade, the internet was treated as a convenience for file-sharing and a couple of other applications; few observers anticipated how it would alter modern life until decades later. The major institution that brought computers into the internet age was the University of California at Berkeley, which created a version of Unix that it called the Berkeley Software Distribution and that is now known as BSD.

Linux Professional Institute honors and promotes BSD through its BSD Specialist certification. This two-part series talks about the overlapping histories of BSD and the internet, and their mutual interdependence. For the internet was just as critical to the development of BSD as BSD was to the internet.

The U.S. Government Decides to Ramp Up the Information Highway

Officially, the internet is a creation of the U.S. Department of Defense, and can be traced to a project at the department’s Advanced Research Projects Agency (ARPA). ARPANET started in 1969 and gradually became the internet we know today during the 1970s and 1980s, as Vint Cerf and others developed TCP/IP. But although the protocols were standardized (so in theory, internet hosts could exchange data), programming interfaces were awkward and unstandardized.

ARPA, renamed DARPA, decided in the late 1970s to put its formidable resources into creating an internet-capable operating system. Some form of Unix would be the natural choice, because it was a portable operating system that ran on different types of hardware. It was also quite popular among the scientists that the Department of Defense dealt with. The surprise might be that DARPA didn’t latch onto the original Unix being licensed by AT&T, one of the world’s biggest and most stable companies. Instead, they showed a notably innovative streak by choosing a scrappy bunch of graduate students at the University of California, Berkeley campus.

Admittedly, although many major advances in BSD were developed by students at the university, the system enjoyed many outside contributions. Marshall Kirk McKusick, one of BSD’s key leaders, who made early and long-running contributions to both its code and its organization, provided me with a list of contributors from the years 1979 to 1993. It includes some 60 individuals and organizations who contributed a “large subsystem,” along with hundreds of other contributors.

I think that several things about Berkeley impressed DARPA. They had already made major contributions to Unix, showing both creativity and hard work. Their license was perhaps most important of all: They gave everybody the source code for free.

In other words, BSD was an early instance of free and open source software. The steps in that direction were complicated and not worth recapitulating here in detail. Essentially, Berkeley’s code complemented AT&T’s proprietary code (which was also available when a customer licensed Unix from AT&T). Berkeley’s networking code was the first to be released independently.

In response to DARPA backing, the University of California at Berkeley took three critical steps that made BSD a system of note.

  • It set up a formal organization, the Computer Systems Research Group, to manage BSD development. The CSRG steered the project in a productive direction for the next dozen years.
  • It created the now-famous BSD license, ensuring that the operating system is free and open source software.
  • It produced (over many years, and after an AT&T lawsuit) a complete, independent, stand-alone operating system.

This is why I say that the internet was crucial to the success of BSD. The work they carried out under the DARPA contract made them even more important. And their results, first released in BSD versions 4.1 and 4.2, created an internet everybody could plug into. A slightly later version, 4.3, was labeled the “single Greatest Piece of Software Ever, with the broadest impact on the world” in a 2006 InformationWeek article.

Before we look at BSD itself, take note that the genius of these tools was that they could work on many other operating systems, both Unix and non-Unix. That’s what made the internet a universal communication platform that could legitimately be considered “world-wide” by 1989, when Tim Berners-Lee adroitly crafted the World Wide Web from common internet conventions and other ideas in current use, such as structured markup languages. The Web took the internet to heights of popularity beyond anyone’s expectations, and made possible further advances such as internet commerce and programmed interactions through APIs.

Groundwork: Plugging into the Internet

The starting point for networking, in operating systems, consists of the system calls that send and receive data over network links. Every operating system needs an interface to the networking layer, and modern interfaces are almost universally based on Berkeley sockets.

Sockets are simple data structures that make network programming as easy as file operations. (That’s not just a random comparison. On Unix, everything is a file.) The programmer specifies what protocol they want to use (TCP, UDP, etc.) plus a couple other arguments delineating the type of communication.

Like a call creating a file, a call that creates a socket returns a simple integer that represents the network connection. Sending, receiving, waiting for input, etc. are straightforward from then on.
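To make the file analogy concrete, here is a minimal sketch using Python's standard socket module, which closely mirrors the Berkeley sockets calls. It is an illustration, not code from BSD: the helper names echo_once and roundtrip are hypothetical, and the example talks only to the loopback address so it needs no outside network.

```python
import socket
import threading

def echo_once(server_sock):
    """Accept one connection and send back whatever arrives (hypothetical helper)."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

def roundtrip(message):
    """Send message to a local echo server; return (descriptor, reply)."""
    # socket(family, type): AF_INET selects IPv4, SOCK_STREAM selects TCP.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0 asks the OS for any free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fd = client.fileno()               # the small integer that identifies the socket
    client.connect(("127.0.0.1", port))
    client.sendall(message)            # sending and receiving look much like file I/O
    reply = client.recv(1024)
    client.close()

    worker.join()
    server.close()
    return fd, reply

if __name__ == "__main__":
    fd, reply = roundtrip(b"hello")
    print("descriptor:", fd, "reply:", reply)
```

The integer returned by fileno() plays the same role as the descriptor returned when a file is opened, which is why the rest of the code reads so much like ordinary file handling.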

If you dig down into the programming behind the Web, streaming media, remote logins, or any other use you choose to make of the internet, some modern form of Berkeley sockets lies at a low level.

But How Do You Find the Computer You Want to Send To?

The BSD project also made the Domain Name System a reality that people could populate with hostnames and query.

DNS is complicated: a distributed system of nodes that exist in multiple hierarchies, all with the purpose of translating names such as lpi.org into IP addresses such as 65.39.134.140. Each program using the internet has to figure out whom to ask in order to learn how to reach the system they want to contact.
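From a program's point of view, all of that machinery usually hides behind a single resolver call. The sketch below uses Python's socket.getaddrinfo, which hands the question to the system resolver; the function name resolve_ipv4 is a hypothetical helper, and the addresses returned for any real hostname depend on the network you run it on.

```python
import socket

def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver reports."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    # Needs network access; the result varies with time and location.
    print(resolve_ipv4("lpi.org"))
```

Passing a literal address such as "127.0.0.1" returns that address unchanged without any lookup, which is a convenient way to try the function offline.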

On the early internet, which was like a small town where everybody knew everybody else, administrators entered names manually into HOSTS.TXT files on each host. A descendant of that file still exists today on most computers (as /etc/hosts on Unix-like systems), but it typically contains only local IP addresses. Remote ones are handled by DNS.
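The table-lookup approach is easy to picture in code. The following sketch parses text in the format of the modern /etc/hosts file on Unix-like systems (an address followed by one or more names, with # starting a comment), which is not identical to the original HOSTS.TXT format. The function name parse_hosts and the sample entries are made up for illustration.

```python
def parse_hosts(text):
    """Map each hostname or alias to its address from hosts-file-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and surrounding space
        if not line:
            continue                           # skip blank or comment-only lines
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)   # first entry for a name wins
    return table

# Hypothetical sample data; 192.0.2.0/24 is reserved for documentation.
SAMPLE = """\
127.0.0.1   localhost
192.0.2.10  build.example.test build   # a made-up internal machine
"""

if __name__ == "__main__":
    print(parse_hosts(SAMPLE))
```

A flat table like this works while every administrator can keep every copy up to date, which is exactly what stopped being possible as the network grew.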

It’s hard to design a system with multiple authoritative hosts, delegation, built-in redundancy, and other characteristics of a robust distributed information system, as specified in the 1983 DNS specifications (RFC 882 and RFC 883) and follow-up RFCs. (An RFC is a typical document released by the Internet Engineering Task Force, and includes its specifications for internet standards.)

Over the past few decades, a lot of research has gone into designing architectures for distributed systems that balance consistency, availability, and partition tolerance in feasible ways. Massively partitioned databases such as Cassandra store and serve up large amounts of data. Decision-making can be distributed through Paxos, ZooKeeper, Raft, and other consensus protocols. Content Distribution Networks carry most of the world’s internet traffic. But none of those existed when BSD developers had to implement DNS.

The BSD team’s Berkeley Internet Name Domain (BIND) service was not the first DNS server, but it came very early in the history of DNS and turned out to be the software used nearly everywhere for decades. Even as late as 2023, when BIND was old enough to be considered legacy technology and many other alternatives were available, BIND was estimated to run 60% of authoritative DNS servers (that is, servers that provided information rather than just passing through queries).

The second and final article in this series covers more of the important contributions in BSD.

Author

  • Andrew Oram

    Andy is a writer and editor in the computer field. His editorial projects at O'Reilly Media ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. Andy also writes often on health IT, on policy issues related to the Internet, and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM (Brussels), DebConf, and LibrePlanet. Andy participates in the Association for Computing Machinery's policy organization, USTPC.
