Never was so much owed by so many to so few - a look at the unheralded heroes of the open source world

A hand writing the words Open Source.

In 1940 British Prime Minister Winston Churchill delivered a rousing wartime speech, giving heart to the Royal Air Force pilots and the Allies fighting the Battle of Britain.

Churchill's words still ring through the ages, especially in any situation where the hopes and dreams of the majority hang on the effort of a few heroes - and this is perhaps particularly relevant in the tech landscape, which relies heavily on open source software.

These are computer programs and components whose source code is freely available - to be examined, modified and even redistributed.

By making the decision to open source a project, developers run the risk that they won't make any profit from it. Still, thousands of coders around the world regularly give up their free time and pass up better-paid work in order to develop and maintain open source code.

Open source is everywhere. In 2023 Synopsys released a comprehensive report based on its KnowledgeBase, which contains information on nearly 200 million versions of over 5.1 million open source components, drawing on data from more than 26,000 unique sources.

The report found that open source is so pervasive that most developers these days don't even know all the open source components in their own software.

Of 1,700 codebases across 17 industries, 76% of the code was open source and almost all codebases (96%) contained at least some open source software.

If developers at major organizations like NASA rely so heavily on open source components (sometimes unknowingly), it raises the question: what would the impact be on critical initiatives like the International Space Station without such software?

Victory at all costs

As much as we rely on open source, there's little chance of the ISS falling from the sky or the entire internet crashing due to a dearth of developers.

This is due to the guiding principles of open source, which foster sharing knowledge and collaborating on code.

This hasn't always been the case. In the USA, the 1976 Copyright Act led most manufacturers to stop releasing their products' source code, so their programs couldn't be run on competitors' devices.

As Richard Stallman, then a programmer at the MIT Artificial Intelligence Laboratory, discovered, this had a number of unintended consequences:

"When I started working at the MIT Artificial Intelligence Lab in 1971, I became part of a software-sharing community that had existed for many years. Sharing of software was not limited to our particular community; it is as old as computers, just as sharing of recipes is as old as cooking. But we did it more than most."

Writing on gnu.org, Stallman explains that at that time the AI Lab used a timesharing operating system called ITS, which the lab's staff "hackers" had designed and written in assembler language for the Digital PDP-10, one of the large computers of the era.

Stallman's role as an AI Lab staff system hacker was to improve this system and he noted, "If you saw someone using an unfamiliar and interesting program, you could always ask to see the source code, so that you could read it, change it, or cannibalize parts of it to make a new program."

At the time these new programs weren't known as "free software" or "open source" as such terms didn't exist.

The introduction of the Copyright Act changed the tech landscape for the worse. By the early Eighties the PDP-10 (and therefore all programs associated with it) had become obsolete.

The more modern computers of the era, such as the VAX series or the Motorola 68020, had proprietary operating systems, which required team members to sign a nondisclosure agreement even to obtain an executable copy.

Things came to a head in 1980 when Stallman was refused a copy of the source code for the lab's newly installed Xerox 9700 laser printer.

The 9700 was Xerox's first commercial laser printer and could take up an entire room.

As he'd had access to the program for the previous XGP printer, Stallman had been able to make life easier for the team by modifying the code to generate special alerts when printing was completed or when there was a paper jam. This was much more convenient, given that in those days printers could take up an entire room.

Stallman was faced with what he called a "stark moral choice" between shutting up and signing the NDA or taking a stand:

"The answer was clear: what was needed first was an operating system. With a free operating system, we could again have a community of cooperating hackers—and invite anyone to join. And anyone would be able to use a computer without starting out by conspiring to deprive his or her friends."

Freedom, Justice, Honor

In January 1984 Stallman quit his job at MIT and began writing GNU software. GNU is a recursive acronym for "GNU's Not Unix", reflecting his goal of creating an operating system and associated programs that were compatible with Unix but built on free software principles.

Soon after, he started a non-profit corporation called the Free Software Foundation to employ programmers and provide a legal infrastructure for the free software movement.

Besides working on a number of open source tools such as the text editor GNU Emacs, Stallman championed the principle of "Copyleft". In 1989 the first GPL (GNU General Public License) was released, which details a user's rights and obligations for programs released under the GPL. These include the right to use and modify the program, provided any derivatives and their source code are made available to others under the same license.

Crucially, this prevents bad actors from simply turning open source code into closed, proprietary software. The GPL also does not prevent developers from making money from open source programs.

For instance, though the source code for Emacs was freely available, Stallman was able to make money by having interested users mail him a self-addressed envelope and a cheque for $150 to obtain a physical copy of the program.

A kernel of truth

Throughout the 1980s the FSF had great success in creating open source equivalents of proprietary Unix applications. However, it had considerably less success with Stallman's original goal of creating a fully fledged operating system.

In order to run, applications require a 'kernel', which sits at the core of the operating system and allocates resources to them.

The kernel remained elusive. This was mainly because as new GNU components were released that could run on existing Unix systems, they became popular, being used, modified and redistributed by keen coders.

This made for much more powerful programs and even attracted greater funding for GNU but the project essentially became a victim of its own success. GNU developers' time was put into maintaining ports and adding features to existing components, rather than moving on to write new components and a fully fledged kernel.

By 1990 all the GNU applications were complete, but the proposed kernel (codenamed 'Hurd') wasn't suitable for production use. Today, over 30 years later, Hurd nominally remains in active development.

Fortunately, in 1991 Finnish computer science student Linus Torvalds used GNU development tools to develop a Unix-compatible kernel. Although its original license restricted commercial redistribution, in 1992 Linus committed to making the kernel open source, releasing it under the GNU General Public License.

Linux and GNU developers worked together to integrate GNU components with Linux, fulfilling their goal of a fully functional and free operating system.

To this day Linux still makes use of many GNU tools, which is why GNU project members insist on the more technically correct name 'GNU/Linux'.

Image of Linus Torvalds

The Cathedral and the Bazaar

Both GNU and Linux were committed to "Copyleft", whereby source code would be freely distributed with programs for modification/redistribution under the same terms.

Still, after working on the Linux kernel, software engineer Eric Raymond observed that the two projects adopted very different development approaches.

He published his findings in a landmark 1997 essay, 'The Cathedral and the Bazaar', describing the key differences between 'top-down' and 'bottom-up' software design.

At the time, programs like GNU Emacs followed a 'Cathedral' model, whereby source code was only made available with each major software release. Between releases the code was restricted to an exclusive group of software developers.

Linus Torvalds, however, favored the 'Bazaar' model, where all code is developed over the internet in full view of the public. Raymond's maxim, which he later dubbed Linus' law, stated that "given enough eyeballs, all bugs are shallow."

In other words, under the 'Bazaar' model, the more widely source code is made available for testing, the faster errors will be found.

This contrasts with the 'Cathedral' model, which relies on a small team of developers spending huge amounts of time and resources hunting bugs.

The Cathedral and the Bazaar book cover

Victory in our time

Today, desktop versions of open source Linux operating systems like Ubuntu enjoy a small but growing market share of around 3%.

Given the popularity of open source server software like Apache and Nginx, it's also safe to say the modern internet couldn't exist without Linux. Of the top 25 websites in the world, only two don't run their servers on Linux.

96.3 percent of the top one million web servers run Linux. Much of the internet's infrastructure also runs on Linux-based switch operating systems. According to the Linux Foundation's Annual Report, throughout 2022 over 52 million lines of code were generated every week by project communities, a 13% year-on-year increase.

Organizations like the FSF actively recruit members and funding to further the expansion of open source projects. The Linux Foundation pays Linus Torvalds to work on the Linux kernel full time.

Popular open source software like Apache HTTP server is maintained by both paid programmers and volunteers from around the world. But this isn’t quite the fairy tale ending open source fans might have hoped for.

Blood, sweat and tears

Anyone who's spent time in the world of open source development, though, will tell you that the image of a worldwide commune where thousands of programmers line up to freely give their time and coding skills is overly optimistic.

Eric Raymond penned his essay partly based on his own efforts to implement the Bazaar model while working on the open source utility Fetchmail.

Some of this met with great success. Raymond released new versions early and often. Anyone who contacted him about the fetchmail tool was added to a mailing list and asked to be a beta tester. He listened and polled them about new features and sent them chatty messages in order to improve the tool further.

Raymond's strong advocacy for this approach to open source development is widely considered to be the key factor in Netscape's decision to open source their code. This led to the foundation of the Mozilla project, without which we'd have neither the hugely popular Firefox browser nor Thunderbird Mail.

Eaten bread is soon forgotten

Eric Raymond may have coined the notion that "many eyes can make bugs shallow", but there's a far older expression: too many cooks spoil the broth. This can be true even for debugging code.

In 2006 a lone coder proved this after posting on the mailing list for Debian, an extremely popular open source Linux distribution widely used on servers.

His question related to an error message in 'OpenSSL' - an open source library often used for securing communications between users and websites. The user in question simply wanted to remove what he saw as a redundant error message.

Two experienced coders told him they saw no harm in doing this and the programmer duly submitted his changes as a 'patch'. However, in so doing he unknowingly crippled OpenSSL's pseudo-random number generator. This meant that it went from being able to produce millions of varied secure encryption keys to just a handful of extremely weak ones.
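To see why this was catastrophic, consider what happens when a pseudo-random number generator is seeded with almost no entropy. The sketch below is purely illustrative (it uses Python's `random` module, not OpenSSL's actual code), but it captures the flaw: the patched Debian OpenSSL ended up seeding its generator with little more than the process ID, and Linux PIDs historically topped out at 32,768.

```python
import random

# Illustrative sketch only (NOT OpenSSL code): a PRNG seeded with
# nothing but the process ID can only ever produce as many distinct
# "secrets" as there are possible PIDs.
def weak_key(pid: int) -> int:
    rng = random.Random(pid)       # seed derived solely from the PID
    return rng.getrandbits(128)    # a would-be 128-bit secret key

# Linux PIDs traditionally ranged up to 32768, so an attacker can
# precompute the entire key space in seconds and try each candidate.
key_space = {weak_key(pid) for pid in range(1, 32769)}
print(len(key_space))  # at most 32768 distinct keys
```

Instead of an astronomically large key space, an attacker only has to enumerate a few tens of thousands of candidates, which is exactly what happened to keys generated on affected Debian systems.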

The implications of this were very serious. Traffic to and from websites using Debian and its derivatives like Ubuntu could easily be decrypted by bad actors like hackers and shadowy government agencies.

In 2008 the vulnerability was reported by software engineer Luciano Bello. The Debian community was quick to patch it, but the incident sparked questions in the open source community.

How had their "many eyes" failed to make this bug shallow? Once the changes were accepted, how the hell had it taken almost two years to spot the vulnerability?

The war of the unknown warriors

Debian's OpenSSL debacle was a good illustration of how a lone programmer could throw a digital wrench into the works.

On the other hand, it was also the actions of one man that led to this major vulnerability in the open source tool being fixed.

One major lesson from what happened is that when multiple people are working on open source code, there's no one person who understands it fully.

Still, there are some exceptions to this rule. Even today, there are individuals who develop and maintain critical open source components and tools with little or no help. Many of these "unknown warriors", to borrow once again from Churchill, are vital to the success of major open source projects.

Some entrants in this hall of fame, in no particular order, include:

If you've ever used a computer, iPhone or modern games console like the PS4 or Xbox One, then you've used zlib, an open source software library dedicated to compressing and formatting data.

It's become the de facto standard for data compression and is used by many programs, including the Linux kernel and the Apache HTTP server, meaning it's vital to the smooth running of millions of internet servers.
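You may well have used zlib today without realizing it: Python, for instance, ships a binding to the library in its standard `zlib` module. A minimal round-trip looks like this:

```python
import zlib

# Compress some repetitive text with zlib's DEFLATE algorithm,
# then decompress it and check that nothing was lost.
data = b"Open source is everywhere. " * 100
compressed = zlib.compress(data)
restored = zlib.decompress(compressed)

assert restored == data
print(f"{len(data)} bytes compressed to {len(compressed)} bytes")
```

Repetitive input like this compresses dramatically, which is why the same library underpins everything from gzip files to PNG images and HTTP responses.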

The utility was originally written by Jean-Loup Gailly and Mark Adler, who handled the compression and decompression features respectively. Today it's primarily maintained by Mark Adler.

SQLite is an open source database engine and is embedded into apps every day by developers. It can be found in every major operating system, as well as all iOS and Android devices. Most televisions and set-top boxes also use some implementation of SQLite.
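"Embedded" is the key word: SQLite is a library linked into the application itself, not a separate server process. Python bundles it in the standard `sqlite3` module, so a complete, working database takes only a few lines:

```python
import sqlite3

# An in-memory database: no server to install or connect to,
# because SQLite runs inside the application's own process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, core_devs INTEGER)")
conn.execute("INSERT INTO projects VALUES ('SQLite', 3)")

row = conn.execute(
    "SELECT core_devs FROM projects WHERE name = 'SQLite'"
).fetchone()
print(row[0])  # prints 3
conn.close()
```

Swap `":memory:"` for a filename and the same code persists its data to a single ordinary file, which is why SQLite turns up in phone apps, browsers and set-top boxes alike.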

Despite being deployed on virtually every device on the planet that requires storing and retrieving data, since its founding in 2000 the SQLite project's core development has been managed by D. Richard Hipp and just two other developers.

GnuPG (GNU Privacy Guard) is an open source encryption program, designed to be fully compatible with the proprietary program PGP (Pretty Good Privacy).

GnuPG can not only be used to encrypt files but also generates verifiable digital signatures. These are often used to sign package repositories in Linux distributions, to make sure they're authentic and not malware.

It can also be used to generate secure SSH keys to allow admins to securely connect to servers. This makes GnuPG vital to the secure running of critical infrastructure like websites.

It was originally developed by Werner Koch, who released the software in 1999 and maintains it to this day.

Like zlib and SQLite, FFmpeg is likely something you use every day without knowing it. The open source suite for handling video, audio and other forms of multimedia streams is used in many popular programs like the animation tool Blender and the VLC Media player. It's also deployed on popular websites like YouTube and handles all audio/video playback in the popular Chrome browser.

The project was led by Michael Niedermayer from 2004 to 2015, who held a key role in maintaining the core software, including the ffmpeg command line tool.

Though not the sole developer, he received praise from Debian maintainers amongst others for his careful and methodical approach in examining and applying patches.

A dark and deadly valley

Those who follow the Bazaar model of open source development tread a fine line between democracy and anarchy.

While it can be helpful to have a lead developer to marshal other coders, it's not the best way to create software from scratch. There's also no way for more senior coders to force their will on others.

This became painfully apparent to the FFmpeg project in 2011, when a number of developers forked the software under the name 'libav' due to issues with the project leadership.

Naturally these problems are moot when a programmer is maintaining open source code single handedly. But this raises the question of what to do when that person is unable to continue working on a project.

In The Cathedral and the Bazaar, Eric Raymond listed a number of 'Lessons' based on his experience of open source development. Lesson Number 5 addressed situations like this by suggesting:

"When you lose interest in a program, it's your duty to hand it off to a competent successor."

There's a certain Darwinism to this maxim: if the original developer abandons an open source project, it's likely the most efficient continuation or fork will be adopted by the community.

Raymond also credited Linus Torvalds with the creation of Lesson 6:

"Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging."

This was due to his admiration of the Linux development model, which emphasized a fluid approach whereby ideas and prototypes are written multiple times before a stable release.

Never give in

People like Stallman, Torvalds, Raymond and the Unknown Warriors described above expose the paradox at the very heart of open source.

On one hand it thrives on egalitarianism and collaboration - multiple skilled teams can work together on maintaining code and with enough eyes, all bugs can indeed be shallow. When one programmer falls, others can pick up the cudgels and continue development.

Still, the impetus for many popular open source projects seems to come from the individual. This can be a frustrated lone coder keen to solve a particular technical problem, or a visionary who seeks to empower others to use software more freely.

While projects like GNU and Linux have since snowballed into a global collaboration of developers, there are still so many open source tools and components that are being created and maintained by just a handful of people. Often this work is incidental to their real jobs.

Giving Tuesday

If it weren't for some benevolent individuals, new features, bug fixes, and security patches for critical open source projects might never happen.

This is why it's incumbent on all of us to emphasize the importance of contributing to open source development.

Tuesday, November 28, 2023 was designated "Giving Tuesday", when individuals are invited to donate to many good causes.

If you are feeling generous, consider donating to NumFOCUS, whose mission is to promote open practices in research, data, and scientific computing by serving as a fiscal sponsor for open source projects.

For anyone else feeling generous, developer Josh Sherman also maintains a comprehensive list of open source projects that accept donations.