Open, Simple, Generative: Why the Web is the Dominant Internet Application

August 17, 2021 - by Andrew Oram

Everything in the 2021 Open Anniversary celebration comes together in the Open Web, the subject of this month's article. In fact, I have taken the words Open Web as almost a prompt for free association. While everyone appreciates the wealth of free information on the Web, readers of this article may be surprised at where the idea of the Open Web takes us.

Hypertext, the Internet Way

Let's start at the dawn of the World Wide Web. The two standards on which it originally rested were HTML and HTTP.

The “HT” that those abbreviations share stands for hypertext, a term famously invented by visionary philosopher Ted Nelson around 1965. As a somewhat heady and grandiose term that matches Nelson's expansive ambition and personality, hypertext recognizes that thoughts are never confined to individual documents. Each document inevitably refers to and depends on an understanding of other sources of information. This has always been true, but the web makes the relationships explicit through links.

A zeal for representing relationships was and remains the ultimate goal of Tim Berners-Lee, inventor of the web. Just glance through his original proposal to his management, the European Organization for Nuclear Research (CERN in its French abbreviation). The proposal is bound together throughout by a concern for exposing and organizing the relationships between thoughts and between things. The same obsession has remained through the decades in Berners-Lee's proposals for a Semantic Web (2000) and Linked Data (2006).

Grandiosity on the scale of Nelson and Berners-Lee is not entirely abjured by CERN, either, which presents the first-time visitor to its web site with the question, "What is the nature of our universe?"

So the idea of hypertext has been around for a while, but neither Nelson's grand vision (the stillborn Xanadu) nor later experiments such as Apple Computer's HyperCard caught on. I will say a bit about each of these projects in order to contrast them with the traits that took the World Wide Web on such a different path.

First, both Xanadu and HyperCard were proprietary. This limited chances for people outside the organizations developing each technology to add to it and build their own visions on it. The web, in contrast, because it was open, shared the amazing ability of certain computer technologies to spawn enormous innovation. In the term used by law professor Jonathan Zittrain, the web is generative.

Apple's HyperCard was starved for resources, admittedly, but I found little value in it in the first place. The design offered limited capabilities, probably because of the tiny computer memories and slow processors of the 1980s. Each piece of content had to fit on a small card. The biggest limitation was fundamental: each set of cards was self-contained and couldn't link to outside resources. It was left to Berners-Lee to make this major leap in hypertext power through one of his greatest inventions: the URL. These unassuming strings take advantage of the internet and the Domain Name System. Berners-Lee also cooked into the URL ways to connect to content outside the web, a valuable gambit to gain a foothold in a sea of other protocols.
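To make the anatomy of a URL concrete, here is a minimal sketch using Python's standard urllib.parse module. It splits a few URLs into the protocol to speak, the host to look up through the Domain Name System, and the resource to fetch. The addresses are placeholders chosen for illustration and do not come from Berners-Lee's proposal.

```python
from urllib.parse import urlsplit

def describe(url):
    """Return the scheme, host, and path that make up a URL."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # which protocol to speak
        "host": parts.hostname,   # name resolved through DNS; None if absent
        "path": parts.path,       # which resource to ask for
    }

# Hypothetical example addresses (assumptions for illustration only).
examples = [
    "http://www.example.org/docs/proposal.html",  # a web page
    "ftp://ftp.example.org/pub/paper.txt",        # a file outside the web
    "mailto:someone@example.org",                 # a mailbox, no host part
]

for url in examples:
    print(describe(url))
```

The same syntax names a web page, a file on an FTP server, and a mailbox, which is one way to picture how the URL reaches content outside the web.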

Xanadu was complex. This complexity stemmed from the best of intentions: Nelson insisted on creating bidirectional links, which could lead to all kinds of benefits. With bidirectional links, you could follow a site that is linking to you. Payment systems could be enabled, something many internet users dearly wish for now. There would be ways to preserve sites whose hosting computers go down, another serious problem always hovering over the internet, as I point out in my article “Open knowledge, the Internet Archive, and the history of everything.”

As Nelson said in a talk I attended in 2008, "Berners-Lee took the easy way out!" The web has one-directional links, and therefore suffers from all the aforementioned lapses that Nelson claimed would not have plagued Xanadu. To drive home how conscious this choice was, let's go back to Berners-Lee's proposal, mentioned earlier:

"Discussions on Hypertext have sometimes tackled the problem of copyright enforcement and data security. These are of secondary importance at CERN, where information exchange is still more important than secrecy. Authorisation and accounting systems for hypertext could conceivably be designed which are very sophisticated, but they are not proposed here."

Reams have been written, virtual and real, about the ramifications of Berners-Lee's prioritization. But history's verdict is pretty definitive: the easy way out is the right way. Like the larger internet, the web does not try to track down lost resources or assign value to traffic. Its job is just to get information from source to destination.

Open, simple, generative: these traits allowed the web to succeed where other systems had tried and failed. Web standards are debated, endorsed, and maintained by the nonprofit World Wide Web Consortium (W3C).

Berners-Lee also happened to come along at a good moment in computer history. He invented the web in 1989 and it picked up steam a couple of years later. This was the very time that the general public was discovering the internet, that odd research project used only by defense sites, scientific researchers, and a few related companies. (Most of the internet access points were run by the U.S. government until 1994.)

People had long been using non-internet discussion forums and clunky ways of transferring files. Much of the traffic now moved onto the web. And this leads to the next stage of the web's generativity.

Port 80 Finds New Uses

We need a bit of technical background to understand what happened next on the web.

Most computers run multiple programs. When a computer receives network traffic, it figures out which program to send it to, thanks to an arbitrary number called a port that is attached to each packet. A person may be talking to friends using internet relay chat (TCP port 6667), while receiving mail (TCP port 25), and so on. The web was awarded TCP port 80.
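The idea of a port can be sketched in a few lines of Python. This is a simulation, not real networking code: the port numbers are the ones mentioned above, while the handler functions and the LISTENERS table are hypothetical names invented for the example.

```python
# Hypothetical handlers standing in for the programs on one computer.
def handle_chat(payload):
    return f"chat program got: {payload}"

def handle_mail(payload):
    return f"mail program got: {payload}"

def handle_web(payload):
    return f"web server got: {payload}"

# Which program is listening on which port.
LISTENERS = {
    6667: handle_chat,  # internet relay chat
    25: handle_mail,    # mail
    80: handle_web,     # the web
}

def deliver(port, payload):
    """Hand a payload to whichever program is listening on the port."""
    handler = LISTENERS.get(port)
    if handler is None:
        return f"no program listening on port {port}"
    return handler(payload)

print(deliver(80, "GET /index.html"))
print(deliver(6667, "hello, friends"))
```

The port number itself carries no meaning. It is only a key that the receiving computer uses to pick a program.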

Meanwhile, to prevent malicious attacks, network and system administrators usually place firm restrictions on the ports where they accept traffic. In the mid-1990s, with the growth of the internet and the creation of high-speed, always-on connections, restrictions on ports hampered the introduction of new services. People would hear of some wonderful service that could enhance their productivity, but the network administrator either didn't trust the service or was too busy to reconfigure the network so it could send and receive traffic on the new port. The internet was experiencing one of the biggest booms known to any industry in history, and innovators were stymied by this odd technological straitjacket.

More and more, innovators gazed yearningly at the one port that was always guaranteed to be open: port 80, thanks to the universal adoration for the web. And so the developers made a momentous decision: they would violate the tradition of providing a unique port number for their application, and send everything over the web. The user's web browser would receive and process the traffic. (I put this move into a broader context in a memoir, near the end of the section "Peer-to-peer (early 2000s) and the emergence of user-generated content.")
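A toy model shows why the move to port 80 worked. The allow-list, the service name, and port 5555 are all assumptions made up for this sketch. A real firewall inspects packets rather than calling a Python function.

```python
# Hypothetical policy set by a cautious administrator: mail and web only.
ALLOWED_PORTS = {25, 80}

def firewall_accepts(port):
    """Accept traffic only on ports the administrator has opened."""
    return port in ALLOWED_PORTS

def send(service, port, payload):
    """Try to deliver a payload for a service through the firewall."""
    if not firewall_accepts(port):
        return f"{service}: blocked on port {port}"
    return f"{service}: delivered on port {port} ({payload})"

# The same new service, first on its own port, then riding on the web's port.
print(send("new-service", 5555, "useful data"))
print(send("new-service", 80, "useful data"))
```

Nothing about the service changes between the two calls. Only the port does, and that alone decides whether the traffic gets through.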

Although slapped with the censorious term "port 80 pollution" by network administrators, the movement toward web services succeeded beyond its wildest dreams and brought along Software as a Service. Many people spend all day in their browser, moving between Facebook, Google, Salesforce, etc., with all the traffic moving through port 80 or, for encrypted HTTPS connections, port 443.

Berners-Lee's HTTP protocol has now gone far beyond the web. It's the communications protocol of choice for the loosely coupled architecture known as microservices. This story is covered in my article "How the RESTful API Became Gateway to the World." That's the generative web in motion.
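To give a flavor of HTTP used between programs rather than between a browser and a web page, here is a minimal sketch built only on Python's standard library. The StatusHandler class, the call_service function, and the /orders/42 path are hypothetical names chosen for the example. A production microservice would normally use a full framework and would not start and stop a server for every call.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusHandler(BaseHTTPRequestHandler):
    """A tiny service that answers every GET with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def call_service(path):
    """Start the service on a free local port, call it once, shut it down."""
    server = HTTPServer(("127.0.0.1", 0), StatusHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        reply = json.loads(connection.getresponse().read().decode("utf-8"))
        connection.close()
        return reply
    finally:
        server.shutdown()
        server.server_close()

print(call_service("/orders/42"))
```

No browser and no HTML are involved. One program asks another for data using the same request and response messages that fetch web pages.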

Web Hosting and the Democratization of the Internet

The simplicity of the web drove early adoption. Berners-Lee based HTTP on other standard protocols, making it recognizable to administrators. Meanwhile, he based the language for creating web pages, HTML, on an older standard called SGML, but made it rock-bottom easy to learn.

Furthermore, new HTML users could learn how to do cool new things with it just by viewing the source code to the web page in their browser. (I am indebted to Tim O'Reilly, who was my manager for many years, for pointing this out.) This transparency also applied to the languages for rich formatting (CSS) and dynamic web pages (JavaScript). Eventually, CSS and JavaScript were moved into separate files, and developers started shrinking or "minifying" code to save download time. Still, users could look into the files to study how to make web pages. People quit jotting their ideas into journals they shoved in desk drawers, and put their ideas up on the web.

As long as the internet ran on corporate servers, the professional administrators who managed the hardware and software could set up a web server like any other internet service. They could host a few web pages as well as the database that undergirded dynamic content. (See again my article "How the RESTful API Became Gateway to the World" for a history of dynamic content.) Everybody went through the administrator to put up a new web page.

The requirements changed in the early 2000s when millions of individuals started blogging, posting comments, and eventually uploading photos and videos. Tim O'Reilly coined the term "Web 2.0" for this explosion of individual contributions. Content generation was splitting off from web server management. The need was filled by content management systems (CMSes) and web hosting. Thousands of services now help people create their own web pages, providing CMS tools, databases, backups, and guaranteed up-time. Two of the most popular CMSes, WordPress and Drupal, are open source.

The open web depends on hosting. But you do give up some control when using a hosting service. A lot of sophisticated web operations use parts of the HTTP protocol that require control over the web server. A hosting service can also take down sites that it finds objectionable. (On the other hand, a take-down is less painful than hosting the site yourself and being sued or prosecuted.) The software that makes it so easy to build an attractive web site can also be limited or buggy.

The irony of Web 2.0 is that people can easily generate and disseminate content (sometimes racking up earnings in the hundreds of thousands of dollars) because of the technologies' simplicity—but at the same time cede control to social media platforms and other sites.

Many visionaries are trying to decentralize internet services, to make them more like the early days when most internet sites hosted their own servers. Various alternatives to centralized services exist, such as Jabber for chat (standardized now as the Extensible Messaging and Presence Protocol or XMPP) and Diaspora for social media. Proposals for decentralized services based on blockchains and cryptocurrencies revive Ted Nelson's goal of an internet where individuals can charge micropayments.

Accessibility Remains a Problem for Many

The resources of the web should be available to everyone, but many factors hold access back: lack of internet connections, censorship, and non-inclusive web design. The article ends by discussing these issues.

Lack of Internet Connections

For years, the computer industry and the mainstream media have taken always-on, high-speed internet access for granted. The people working in those fields have internet access, and all their friends and neighbors do too. (Ironically, I am writing right now during a rainstorm that has cut my internet access, helping me to remember how privileged I usually am.)

The people who usually lacked access had far greater worries—lack of food, jobs, health care, or physical safety—and did not make universal access to the internet a major rallying cry. After the COVID-19 lockdowns revealed that children were being denied an education and adults were cut off from critical information because of limited internet access, some governments—although reeling from the pandemic—did start to look at solutions.

Earlier, some governments and NGOs had found ways to provide information through other media. The previous article in this Open Anniversary series mentioned Endless OS, which distributes computers loaded up with resources such as Wikipedia pages. Although internet access is richer, print-outs and computers can still provide desperately needed educational resources.

Censorship

Censorship is a more selective denial of internet access. There is no doubt that dangers lurk on the internet. Child pornography, terrorist recruitment, trade in illegal substances and stolen information—it all goes on. Censors target these problems, but also crack down on content that they consider politically or socially unacceptable. Because censorship requires central control over the gateways through which all internet content flows, censorship is usually found in highly centralized societies with strong central governments.

Because all of us know of some internet content we wish wasn't there, I will not argue the moral or political issues behind censorship. The topic of this section is what people do to get around it. The main remedy that has emerged is called an onion routing network. Tor, which was originally partly funded by the U.S. Navy, is the best-known of those networks today.

In an onion routing network, people who oppose censorship volunteer to host access points. If I want to reach a human rights researcher or (less sympathetically) want to buy ammunition online, I download a list of access points. I then send my message to one of the access points.

The access point has a list of other nodes in the onion routing network, and forwards my request to one chosen at random. The second node then routes my request to a third node, and so on. Like an onion, the anti-censorship network has many layers in order to make it hard to trace who sent a message to whom. The final node in the network routes my request to my recipient.
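The layering can be sketched as a toy in Python. Several simplifications are assumptions of this sketch and not a description of Tor or any real network: the sender picks the whole random route up front, each node's key is derived from its name, and the "cipher" is a simple XOR keystream that offers no real security. All node and function names are hypothetical.

```python
import hashlib
import json
import random

def keystream_xor(key, data):
    """Toy cipher: XOR data with a SHA-256-derived keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(message, recipient, route, keys):
    """Build the onion from the inside out, one sealed layer per node."""
    packet = {"next": recipient, "data": message}
    for node in reversed(route):
        sealed = keystream_xor(keys[node], json.dumps(packet).encode("utf-8"))
        packet = {"next": node, "data": sealed.hex()}
    return packet

def relay(node, packet, keys):
    """A node peels its own layer and learns only where to forward the rest."""
    plain = keystream_xor(keys[node], bytes.fromhex(packet["data"]))
    return json.loads(plain.decode("utf-8"))

def send_through_network(message, recipient, nodes, hops=3, seed=None):
    """Route a message through randomly chosen nodes; return the trail and payload."""
    rng = random.Random(seed)
    # Assumption: keys derived from names, purely for demonstration.
    keys = {node: f"key-for-{node}".encode("utf-8") for node in nodes}
    route = rng.sample(nodes, hops)
    packet = wrap(message, recipient, route, keys)
    trail = []
    while packet["next"] != recipient:
        node = packet["next"]
        trail.append(node)
        packet = relay(node, packet, keys)
    return trail, packet["data"]

trail, delivered = send_through_network(
    "request for a report", "recipient.example",
    ["node-a", "node-b", "node-c", "node-d"],
)
print(trail, delivered)
```

Each node sees only the previous and next hop. No single node in the middle learns both who sent the message and where it ends up.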

Because the lists of access points are public, censors know them too. It would be possible to block them all, and censors sometimes try. But the access points are numerous, change regularly, and often serve other purposes besides routing through the network. Sophisticated nodes introduce random delays between receiving a message and passing it on, to make it harder for a snooping observer to realize that the two messages are related.

Back to my successfully delivered request. Some information must be stored somewhere to allow the response to come back to me, and that's the feature of onion routing networks that is most vulnerable to attack. As in other areas of cybersecurity, designers of onion routing networks are in constant competition with attackers.

Non-Inclusive Web Design

The final barrier I'll discuss in this article is web page designs that require a visitor to have good eyesight, good hearing, a steady hand, or some other trait that parts of the population lack. When advocates for the differently abled talk about "accessibility," they refer to designs that present no difficulties to anyone, or (because that's hard to achieve) offer workarounds for people with difficulties. Examples of accessibility features include:

  • Supplementing different colors with other visual or textual cues to the differences in a web page
  • Allowing text to be enlarged by the viewer
  • Offering a textual description for each image, so that a person using a screen reader gets the most important information
  • Adding closed-caption text to videos
  • Allowing visitors to select elements from a screen without having to point and click
  • Using familiar or standard design elements, so that visitors can apply knowledge they have learned from other sites

Many online tools exist to help designers check accessibility. In the United States, web sites should do whatever they need to conform to the Americans with Disabilities Act. Many companies also try to require accessibility on all their web sites. But most designers don't understand where their designs can exclude visitors, and guidelines often go unheeded.
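One of the features listed above, a textual description for each image, lends itself to a simple automated check. The following minimal sketch uses Python's standard html.parser module to list images that have no alt attribute at all. The class and function names and the sample page are hypothetical, and real accessibility checkers examine far more than this.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # An empty alt="" is left alone: it is the usual way to mark an
        # image as decorative so that screen readers skip it.
        if "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))

def images_missing_alt(html):
    """Return the sources of images that lack a textual description."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing

# Hypothetical page fragment used only for illustration.
page = (
    '<p><img src="logo.png" alt="Company logo">'
    '<img src="chart.png">'
    '<img src="spacer.gif" alt=""></p>'
)
print(images_missing_alt(page))
```

A check like this catches only omissions. Whether a description actually conveys the most important information still takes a human reader.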

Conclusion

The web, linking the world and teeming with information, has created one of the most open environments humanity has ever known. Its lapses and weaknesses are painfully visible, certainly. Simon St. Laurent, a content expert and web programmer who reviewed this article, complains, "The Web largely failed to realize the read/write vision that Tim Berners-Lee started with, and instead we have endless silos with various degrees of openness." But although we should address the web's problems, this critique underscores the web's amazing achievements. Indeed, some science fiction authors have suggested that the internet will survive civilization itself. Let's use the web to make sure we keep both.
 


About Andrew Oram:

Andy is a writer and editor in the computer field. His editorial projects at O'Reilly Media ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. Andy also writes often on health IT, on policy issues related to the Internet, and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM (Brussels), DebConf, and LibrePlanet. Andy participates in the Association for Computing Machinery's policy organization, USTPC.