Digital Economy Dispatch #102 -- The Secrets of Sustainable Software

23rd October 2022

I remember the first computer program I ever wrote. I was 16 years old, and the school had taken a small group of us to visit the local university. I’m pretty sure it was the only place in the region that had a computer. Certainly, none of us had ever seen one before. After talking with us all for a couple of hours, we were each handed a piece of paper they called a “Coding Sheet” and invited to copy a dozen lines of text onto the sheets, taking care to replace the blanks with the 2 integers we wanted to add together. Apparently, these special instructions told the computer to add the 2 numbers and print out the result. We handed in our coding sheets and were told we would get the results returned to us in 10 days.

Eventually, after a long wait, we each received an envelope containing a printout from the computer. Mine was very brief and contained just the following:

Line 3: Syntax error

Somewhat disappointed (but not downhearted!) I fixed the spelling error that I’d made on line 3 of the coding sheet and resubmitted it. I eagerly waited another 10 days, ripped open the envelope, and read the results:

Line 6: Syntax error

And that’s how my education in the joy of software engineering began. The rest, as they say, is history.

That was Then, This is Now

Much has changed in the more than 40 years since I wrote that first failed program. Since those early experiences in computer programming, we have seen an explosion in the development and adoption of digital technology, at such a pace that it has been argued that the world now runs on software. There are estimated to be more than 2 billion computers in use around the world. The core of a computer, the CPU, is now capable of running trillions of instructions per second. The price-to-performance ratio of computers has improved so greatly that they are embedded in almost every conceivable device. Everyday artifacts such as a modern car may contain hundreds of computer-controlled devices.

One obvious consequence of these developments is a growing recognition that software is now a key resource powering major parts of business, industry sectors, national infrastructures, economies, and society. When it works well, it smooths the way to delivering products and services with quality and efficiency. When it fails there can be dire consequences, including loss of life, huge financial costs, and disruption to our way of life.

Unfortunately, there are significant concerns that too much of the software in use is not very good. Errors, misunderstandings, and poor execution have led to frustration on the part of organizations sponsoring development efforts, and complaints from end users struggling to carry out their tasks. The danger is that advances in hardware capacity are squandered by the software’s ineffectiveness, a point captured in a well-known joke often told in various forms:

Software Engineer: If the automobile industry had developed like the software industry, we would all be driving $25 cars that get 1,000 miles to the gallon.

Automotive Engineer: And if cars were like software, they would crash twice a day for no reason, and when you called for service, they’d tell you to reinstall the engine.

In its defence, writing software is a tough job. Software is complex to build, requiring a combination of skills that are difficult to develop and maintain. As a result, top programmers must bring together the computer science knowledge to design effective algorithms with the engineering practices necessary to create programs that are correct, efficient, and appropriate. All of this while gaining insights into the application domain sufficient to ensure solutions meet the needs of potential users. Perhaps it is no surprise that mistakes are made.

Time and Tide Wait for No-one

Yet these challenges alone do not fully account for the poor state of software. Much of the fault lies neither in the skills of individuals nor in the process by which software is conceived and created. Rather, it lies in the longer-term effects of how software is managed, manipulated, and maintained. Over time, large amounts of software are created, updated, amended, and reworked. Without a lot of effort, this patchwork of pieces spins out of control.

This collection of programs, referred to as a software codebase, must be managed as a major asset. Organizations that rely on software for their success (and today that means almost all of them) require a disciplined approach to managing their software codebase. Unfortunately, that discipline is often lacking.

The key to improving the state of today’s software, according to a recent book, is to see your software codebase as a dynamic, evolving artifact that is built to be changed and adapted to the world around it. Written by experts at Google, the book describes the software engineering practices in use by thousands of Googlers maintaining many millions of lines of software over the past 20 years. The authors’ assertion is that there is one aspect that is central to improving the quality and effectiveness of the codebases at the core of today’s software-driven systems: Sustainability.

Their perspective is grounded in an important distinction between “programming” and “software engineering”. Seen through their experiences, they take a very pragmatic position: software engineering is “programming integrated over time”.

In programming, we learn the basics of how to write a computer program that “works” because it is syntactically correct and performs a useful function. On the other hand, software engineering demands we understand that the construction, deployment, and maintenance of computer programs require collaboration across teams, negotiation between stakeholders, choices amongst alternative approaches, updates to correct errors, addition of new features, and so much more. The goal of software engineering, therefore, should be to make software sustainable. That is, ensuring that it is able to react to necessary change over its lifetime from initial design through to its deployment, maintenance, and eventual disposal.

“Your organization’s codebase is sustainable when you are able to change all of the things that you ought to change, safely, and can do so for the life of your codebase.”

To build sustainable software, the book identifies three fundamental principles as the basis for Google’s sustainable approach to managing its codebase:

  • Time and Change. By understanding the expected lifetime and usage contexts for the software, appropriate mechanisms can be provided to support its adaptation over time.

  • Scale and Growth. In line with the software, the organization itself will need to constantly adjust to manage the adoption and evolution of the code.

  • Trade-offs and Costs. With appropriate metrics and measures, the decision-making processes are improved to surface key risks and make appropriate judgements on the best way to proceed.

In the case of Google, many of the lessons they learned over the past few decades have been a result of the speed at which they have grown. Unsurprisingly, this has required significant technical advances in how they solve problems and deliver new capability to users. However, they also recognize that organizational adjustments have been essential to overcome the scaling problems. This has led to a combination of technical and organizational shifts that has allowed Google to turn scale to its advantage.

As a result, bringing flexibility to the Google codebase has required a focus on 5 characteristics of this technical-organizational relationship:

  • Investing in cross-organizational expertise in change management in the Google codebase. For example, building a shared understanding of how to manage the impact of upgrades to compilers and other core software engineering infrastructure.

  • Creating a regular software release rhythm to build confidence that stable software can be released regularly and often. This has required sophisticated support for devops and continuous integration practices.

  • Processing all software regularly to upgrade it to conform to the latest standards and tools. Specialist engineers are identified to manage this complex and error-prone process.

  • Building familiarity with all software to look for optimizations that can be made to improve its quality and performance. No code is considered static or completed.

  • Defining a common set of policies and guidelines enforced across all teams to ensure consistency of the codebase. Continuous communication is essential to socialize and share appropriate behaviours regarding these policies.

In this way, Google moves away from a focus on producing code and builds a more robust set of practices and tools to manage its codebase so its teams can collaborate on its evolution for as long as it needs to be used.

Of course, as always, the devil is in the details with any effective software engineering approach. Much of the book provides details and examples of how Google implements these principles of software sustainability in practice. Many of these provide useful insights into how a focus on increasing sustainability provides opportunities for Google to deliver more and better solutions for its customers.

The Sting in the Tail

As we come to realize how much we rely on software, questions about its quality and effectiveness must be addressed. Software must not only work, but it also must be sustainable: changeable over time, supported by scalable practices as its use grows, and managed in a disciplined way to allow trade-offs to be made. These principles underlie Google’s approach to software engineering. There are many useful lessons we can all learn from the new book describing Google’s approach to software sustainability.

Digital Economy Tidbits

The State of AI 2022

Just released, the latest State of AI Report 2022. This is a very useful summary of trends and a goldmine of stats and charts about the current state of AI.

Now in its fifth year, the State of AI Report 2022 is reviewed by leading AI practitioners in industry and research. It considers the following key dimensions, including a new Safety section:

  • Research: Technology breakthroughs and their capabilities.

  • Industry: Areas of commercial application for AI and its business impact.

  • Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.

  • Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.

  • Predictions: What we believe will happen and a performance review to keep us honest.

Key themes in the 2022 Report include:

  • New independent research labs are rapidly open sourcing the closed source output of major labs. Despite the dogma that AI research would be increasingly centralised among a few large players, the lowered cost of and access to compute has led to state-of-the-art research coming out of much smaller, previously unknown labs. Meanwhile, AI hardware remains strongly consolidated to NVIDIA.

  • Safety is gaining awareness among major AI research entities, with an estimated 300 safety researchers working at large AI labs, compared to under 100 in last year’s report, and the increased recognition of major AI safety academics is a promising sign when it comes to AI safety becoming a mainstream discipline.

  • The China-US AI research gap has continued to widen, with Chinese institutions producing 4.5 times as many papers as American institutions since 2010, and significantly more than the US, India, UK, and Germany combined. Moreover, China is significantly leading in areas with implications for security and geopolitics, such as surveillance, autonomy, scene understanding, and object detection.

  • AI-driven scientific research continues to lead to breakthroughs, but major methodological errors like data leakage need to be interrogated further. Even though AI breakthroughs in science continue, researchers warn that methodological errors in AI can leak to these disciplines, leading to a growing reproducibility crisis in AI-based science driven in part by data leakage.