CFEngine’s Star Trek and AI origins
A 30-year anniversary tribute
CFEngine is 30 years old this year. For a piece of software, that’s been quite a long life. A lot has happened on its journey to arrive at where we are today, but most users have now forgotten how it emerged from a young lad’s dreams of artificial intelligence. I’ve written at length about the science in my book In Search of Certainty, along with some biographical notes, but grab a coffee and gather around, and I’ll tell you a bit more.
Today, CFEngine is widely known as the configuration management tool that was replaced by Puppet, Chef, and then Ansible. “I know engineers, they love to change things!” (said Dr McCoy in Star Trek: The Motion Picture). True, we love new tools, but it’s not accurate to say that CFEngine has been replaced. It’s alive and well. More importantly, calling CFEngine only configuration management isn’t quite accurate either. CFEngine was imagined to be much more than what we’ve now come to understand as configuration management. It was designed in the age before virtual machines and cloud computing, and yet its principles are still sound and applicable to much of the IT world today. To do everything that CFEngine has done over the years (configuration, monitoring, network routing, text processing, network orchestration, and so on) in a modern IT system, you would need half a dozen different pieces of software, with some challenging integration on top.
I wrote CFEngine at the University of Oslo in 1993, after a deep discussion with the system administrator at the department of physics about the complexity of the shell and Perl scripts we were then using to automate the fifty or so servers we had. The physics department was one of the most demanding IT environments at the university for Unix-like operating systems. There were many flavours of Unix then: SunOS 4, SunOS 5 (which became Solaris), HP-UX with its Sinclair Spectrum-like rubber keyboards, Apollo workstations, IBM’s AIX in multiple versions, OSF/1 on the DEC Alphas, Ultrix, and more. Linux was still a pipe dream at that time.
These flavours of Unix were all quite different from one another–some based on BSD Unix, some on System V. On top of this, everyone at the university had special needs! Today we try to make systems as similar as possible in order to manage scale, but CFEngine was designed to handle diversity and variability without breaking the bounds of human effort. It put user needs ahead of limitations imposed for managerial convenience. It was as much about knowledge management as configuration management. Biology can handle diversity; why not technology?
A colleague at USIT, the university’s computing service, had written some impressive daily and weekly shell scripts to look after the service’s interests on the machines. The idea of running the same software regularly, like an error-correction loop, was intriguing: most software was installed once and then left in a hands-off monitoring mode. I was fascinated by these maintenance scripts–not least by the huge complexity that went just into dealing with the different syntax between systems. Half the scripts were if-then-else tests to figure out the precise version. One could hardly see what they were actually doing for all the checking.
What we were talking about was a software robot that lived not in the physical world, but whose environment was the abstract state space of operating systems. A declarative language could be used to express desired intent, but every machine had to have the robotic chops to be responsible for its own state. Back then, you never knew whether the network would be up or down. The robot wouldn’t manipulate blocks or chess pieces, as one imagined in the 80s, but rather system files and processes.
One of the major challenges was the configuration of difficult subsystems like Berkeley Sendmail, which involved complex text editing as well as process management. One couldn’t simply install standard files, because they wouldn’t work across all machines, and they wouldn’t respect the special needs of local users. It would be best if people kept their hands off the hosts, but you could never guarantee that, so it was bad form simply to overwrite things in order to impose change. From this experience, the concepts of autonomy and of promises versus impositions eventually evolved in Promise Theory. CFEngine’s text editing language remains one of the most sophisticated models of automated file editing, operating convergently while working around what others might have done (what we now tend to call idempotence, which is not quite accurate).
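To give a flavour of convergent editing, here is a minimal sketch in today’s CFEngine 3 notation (assuming the standard library’s append_if_no_line bundle is loaded; the file path and content are purely illustrative):

    bundle agent mail_config
    {
      files:
          # Converge the file toward a desired state without
          # overwriting it wholesale or disturbing local edits:
          # if the line is already present, nothing happens
          "/etc/mail/local-config"
            create    => "true",
            edit_line => append_if_no_line("define(`SMART_HOST', `mailhub.example.org')");
    }

Run once or a hundred times, the outcome is the same desired state, regardless of what anyone else has done to the file in between.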
Just running shell commands wasn’t going to cut it either, as some software does today for relatively homogeneous Linux systems. For one thing, the options to shell commands were all different across the Unices. A software agent that could manage a computer as well as a human would need complex manipulative skills, while causing as little disruption and downtime as possible.
The answer was obvious: one should separate the cognitive and sensory concerns about the environment from the language of intent, hiding all that checking so as to reveal the purpose. We discussed the idea of a Domain Specific Language to make everything crystal clear, and over the Christmas break I wrote one. It was an ad hoc affair to begin with, and it went through several revisions, but it did the job of exposing the intent rather than the technicalities. Simplicity was what system administrators wanted then (the story is different now, in an age of developers).
The endless if-then-else statements of scripts were replaced by a set-algebra evaluator for policy rule relevance, which was likened to Prolog. I didn’t know much about AI research then, except for what I’d read in Douglas Hofstadter’s Gödel, Escher, Bach. I started intuitively, and only later came to put the ideas on a proper academic footing. Even now, I’m surprised at how often I rediscover that the right way to solve modern problems is to do what I did intuitively in CFEngine.
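In the later CFEngine 3 notation, the idea looks something like this (a hedged sketch: solaris and hpux are built-in hard classes, while mailhub stands for a class a site would define itself; the file content is illustrative). A single class expression does the work of pages of version tests:

    bundle agent name_service
    {
      files:
        # Set algebra instead of if-then-else: this promise
        # applies only on Solaris or HP-UX hosts that are
        # not also the mail hub
        (solaris|hpux).!mailhub::
          "/etc/resolv.conf"
            edit_line => append_if_no_line("nameserver 192.0.2.53");
    }

Each host evaluates which classes it belongs to, and simply ignores the rules that don’t concern it.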
As a physicist with an interest in computers, I saw several dimensions to the challenge of regulating a system. I didn’t just want to run commands and forget them, like a Markov process; I wanted to understand the dynamic stability of the machines and measure their behaviour. I was intrigued by the idea of artificial life, and I imagined a system like the one in Star Trek, where you could ask the computer for a diagnostic and everything would heal itself. In an early online manual, I even quoted with some self-irony from the original Star Trek episode The Ultimate Computer, about Dr Daystrom’s M5 “multitronic” computer.
Kirk: “I’m curious, Doctor, why is it called the M5?”
Daystrom: “Well you see, M1 to M4 were not entirely successful. This one is. M5 is ready to take control of your ship.”
Kirk: “Total control?”
Daystrom: “That is what it is designed for.”
Kirk: “There are some things that Men have to do to remain Men. Your computer takes that away.”
Daystrom: “The computer can do your job … One machine can do all those things that Men do now. Men can go on to do greater things…”
I was no expert on AI, but I’d just written a paper with an old colleague, Allan McLachlan, who had left physics and gone into AI research. AI was fresh in my mind.
I realised that CFEngine needed extensive sensory skills, to understand what kind of Unix system it was working on, and powerful manipulators, to be able to change system configurations, especially when it came to editing the text files that were ubiquitous on Unix. In the mid-90s I was inspired by immunology and swarm intelligence. Polly Matzinger’s Danger Model of the human immune system was featured in a BBC documentary, and I began to develop CFEngine’s self-healing capabilities with reference to the ingenious methods of environmental regulation employed by our immune systems.
Eventually, I released what I then called cfengine 1.0 internally at the university, and later cfengine 2.0, in the spring of 1993, for use at CERN, where I gave a talk to the CERN Unix group. I published a couple of papers on it, and hoped it might be useful to others, without any real expectations. Little did I know how quickly it would find a home. By then, Richard Stallman’s Free Software Foundation had started its GNU project for Free Software. I thought it was a great idea to make the code available and learn from others. I donated the code of cfengine 2.x, and the project became the public version 1.0.0 of GNU cfengine. The restyled name CFEngine was only adopted around 2009, for CFEngine 3.0, when the company was formed.
Security was becoming a hot topic, with recent memories of the Internet Worm still burned into people’s minds. I realised that the only way to make a system of distributed agents realistically secure was to build security into the design itself. CFEngine was designed never to accept commands or instructions from any source other than its local policy. Some found the idea of not being allowed to do whatever they wanted infuriating; others saw the wisdom of having safe limits.
As far as I know, no CFEngine installation has ever been compromised, or even compromisable. Like all software, it had occasional buffer overflow problems (ironically, these were usually in the OpenSSL encryption layer, the presumed centre of security), but they couldn’t be exploited because of the way everything was isolated. Richard Stallman encouraged me to make powerful features harder to invoke, to prevent accidents. These principles made sense to me, and although we parted ways later over politics, I learned lessons from his experience.
In 1997, I went to the USENIX LISA conference to give a talk about CFEngine’s stability mechanisms to a thousand or more attendees. Afterwards, I gave an informal Birds of a Feather session about the software and was overwhelmed when hundreds of people crammed into the small room to hear about it. People were sitting on the floor around my feet, and I was pinned to a few square feet of standing room. The software was now in use on a scale I had not expected, yet lots of things were still missing from my original vision of a system of smart autonomous agents. I wrote GNU cfengine 2 after giving a follow-up talk, in 1998, about CFEngine and a vision of artificial immune systems. This version included machine learning of behavioural patterns, with the capability to defend itself against Denial of Service attacks. In the audience was a researcher from the University of New Mexico, who introduced me to the work of Stephanie Forrest; our groups developed different concepts of artificial immune systems in parallel.
I tried both symbolic AI and Bayesian machine learning methods, in order to make CFEngine as self-contained as possible, but I was highly suspicious of anything that smelled of complex logical reasoning. I knew from physics that stability is the prime concern for any automatic system, and logical reasoning is highly fragile. I wanted to make sure that nothing could prevent CFEngine from self-healing a host that got into trouble, so I built CFEngine around “fixed point” thinking: every desired state of a system should be an achievable fixed point, otherwise it should be discouraged.
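In CFEngine terms, this means a promise describes a destination rather than a journey. A minimal sketch (the file and mode are illustrative, and the m body is assumed from the standard library):

    bundle agent fixed_point
    {
      files:
        # The desired state is a fixed point of repeated runs:
        # if the permissions drift, the agent repairs them;
        # if they already hold, the agent does nothing at all
        "/etc/shadow"
          perms => m("0600");
    }

However the host drifts, every run pulls it back toward the same state, and at the fixed point the agent is quiescent.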
Though it took many years to understand all the details, my choices turned out to be surprisingly fortunate. Symbolic methods were certain and powerful, and while machine learning had limited value, it enabled the first continuous monitoring studies of computers for anomaly detection. Machine learning quickly becomes expensive, and it suffers from a basic flaw: what happens normally isn’t necessarily what you want your system to do in the future. Autopilots are good at flying in a straight line, or landing at certain airports, but they can’t tell you what to do if there’s bad weather or an engine falls off. As I tried to understand the issues in depth, I stumbled into Promise Theory and operator algebras (involving structures such as semi-lattices). Many of these methods were forgotten and then rediscovered by a new generation of researchers (often at Google) in the 21st century.
CFEngine’s strict model of autonomous agents meant that there could be no client-server protocols that pushed data. What we now call publish-subscribe methods were the basis of all communication, as a matter of policy. A version of RPC based on voluntary cooperation was designed on top of this pub-sub messaging. The effect was to create something like the Named Data Networking (NDN) concept for Internet services. Hosts would make a kind of collect call home to check for messages, without committing to anything or exposing themselves to vulnerability.
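In policy terms, the pattern looks something like this (a hedged sketch; the network, server name, and paths are illustrative, and remote_dcp and recurse are assumed from the standard library):

    # On the hub: promise only to make files available; nothing is pushed
    bundle server access_rules
    {
      access:
        "/var/cfengine/masterfiles"
          admit => { "192.0.2.0/24" };
    }

    # On each client: the collect call home, pulling policy voluntarily
    bundle agent update_policy
    {
      files:
        "/var/cfengine/inputs"
          copy_from    => remote_dcp("/var/cfengine/masterfiles", "hub.example.org"),
          depth_search => recurse("inf");
    }

The hub can only offer, and the client can only fetch; neither side can impose anything on the other.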
The ideas were there for a universal platform, but everything ran on far too slow a timescale for modern applications. I used CFEngine as a generic distributed platform, and we discussed its use for wide-ranging network methods with Claudio Bartolini and John Wilkes at Hewlett-Packard, as part of a European academic Network of Excellence called EMANICS. I knew that some of those ideas belonged elsewhere, but I never quite had time to rebuild them in a modernized architecture. The way cloud computing was initially implemented violated many of the precepts I’d studied and approved of, and it took another ten years for the industry to find its way back to basic safety principles, with rediscoveries like Kubernetes.
I tried some silly ideas too, like using entropy as a way of measuring configuration (as most academics do at some point), but eliminated these as systematically as I could through research. I wrote a research text, Analytic Network and System Administration (J. Wiley & Sons, 2002), started a Master’s programme at the university, and was awarded the first professorship in the area of system administration.
By the mid 2000s, GNU cfengine 2 was ubiquitous. You could turn over almost any stone in an IT datacenter and you’d find CFEngine working underneath, like a secret colony of ants. Most users exploited only a tiny fraction of its capabilities, and many were unaware of the careful thinking that had gone into the design for robustness.
Around this time, Luke Kanies, once a CFEngine user, started a company of his own around a tool called Puppet. We had met at LISA 2001, where I was conference chair; he was wandering around alone, looking lost, while we were setting up a dinner for the organisers, so I asked him to join us at the “high table”. Later, when he was out of work and Stanford University asked me to consult for them, I passed the job to him; he migrated them to Puppet and went on to found his company, Reductive Labs. Luke was outspoken and good at rallying the online community in a way that I wasn’t. I’m a pathologically antisocial introvert, with too many conflicting interests; I had a day job as a professor of computer science, and I was exhausted from maintaining CFEngine essentially alone. I started a company in 2008 only after seeing what Puppet was doing, partly to escape what I felt had become a dead-end job at the university, and partly to be able to hand over the CFEngine support burden to someone else. Just before CFEngine incorporated, Chef was also started, by Adam Jacob (a former Puppet user) and Jesse Robbins.
Many users went over to Puppet and Chef instead of CFEngine, as they were the latest thing. Those companies had venture backing much earlier and poured money into community building. There was a bizarre hostility amongst some of their users, and I started to receive abuse online from Puppet users. It was my first taste of what software communities can be like, and it revealed a tribalism that made me pull back from engaging too much in community. Over the years, something like 30% of the mail I received from Puppet and Chef users was hate mail. Sometimes it was simply because I was unable to be one of the tribe.
Commercial interests had the effect of making the three tools competitors. Neither of the newer tools was as powerful as CFEngine in features or efficiency, but they were more popular for their approach to usability, partly because the web generation was encouraged to contribute to the Ruby code they were based on. I embarked on CFEngine 3, to solve some of the deep-seated limitations in the code base. I experimented with C++ for six months, eventually abandoned it in exasperation, and went back to C. I rewrote the entire code base in three months, with most of the limitations removed.
By the time CFEngine 3 was ready, there was a new generation of IT people who had grown up in web commerce and didn’t have the deep system knowledge that system administrators had had before. I confess that, at the time, I didn’t think we could sell configuration management: after all, it was a solved problem and already free and open–but that was only my inexperience in business talking. I watched as the competitive market analysts dragged cutting-edge research back to the lowliest commodity basics.
My idea for the company had been to employ knowledge management methods, including semantic networks and machine learning, to understand systems. After all, configuring systems is fairly easy, but understanding the monster you’ve created is hard. I put a lot of effort into this, collaborating with my friend Alva Couch from Tufts University. We didn’t believe the approaches of the Semantic Web (and its RDF language) were correct, so we came up with a simpler alternative to support causal inference. It appeared in the first CFEngine commercial product, called CFEngine Nova. By this time, I’d already developed Promise Theory somewhat, and the promise model was used strictly in CFEngine 3 to ensure the reliability, safety, and above all certainty of every aspect of operations.
Our grand plan for CFEngine’s commercial product line was to start with CFEngine Nova (for starburst networks), then join these into clusters with CFEngine Constellation, and finally crown the whole with CFEngine Galaxy! However, after the company took on venture funding, the smart heads of the newly imposed management removed the knowledge-related and AI-related parts of CFEngine — partly because the developer team wasn’t ready to understand them, and partly due to the lack of a current market for them. Today, 15 years later, those ideas are coming of age.
After a rough ride with internal struggles and venture capital skullduggery, I eventually decided, in 2014, to move on from the company I’d started. I continued to develop some of the machine learning and semantic features, and developed Semantic Spacetime as the Promise Theory formalisation of virtual process representations. It took another ten years to make my version of the AI features work as successfully as I’d wanted, working between other jobs when I could and developing a small graph database to support it. Then, working with CAIDA and a more powerful independent graph database, ArangoDB, I mapped the whole Internet in Semantic Spacetime. Today we are in a new age of AI, and everything old is new again.
The commercial environment brought about by the arrival of Puppet and Chef pulled the narrative away from my own technological aspirations, back to basics more aligned with a web generation who had mainly known Linux and the LAMP stack. Only a few advanced customers could really see the possibilities of CFEngine’s unbridled capabilities–J.P. Morgan Chase inducted CFEngine into their prestigious Hall of Innovation just as I had decided to leave.
I never called the knowledge-related features of CFEngine “AI”, but in my mind there was always that vision of the starship Enterprise flying itself and healing itself like an artificial organism. Today’s tools are more manual than before, and more developer-centric than self-governing. Some of the design principles of CFEngine were carried over into Kubernetes, and Open Policy Agent does basically what CFEngine’s classification system did automatically, but I like to think that CFEngine still holds its own in this space.
I recall reading an article 20 years ago in which the authors summed up the good news and the bad news about speech recognition: the bad news is that we have a long way to go to reach the level of understanding we see on Star Trek; the good news is that at least we have until the 23rd century to figure it out. In just 20 years, we’ve come a long way in IT hardware, but we haven’t come far at all when it comes to system automation. In some ways, we’ve gone backwards. Each generation has new tools to build, because the infrastructure changes, but each also has to relearn the lessons of the past for itself. We are not good at principles in IT.
Remarkably, both GNU cfengine 2 and CFEngine 3 are still out there in the wild, running some of the largest corporations–out of sight, out of mind, keeping the swarms of systems flying in a straight line. My friend Nick Anderson, who still works for the company and has been a stalwart CFEngine supporter for as long as I can remember, tells me that the largest single deployment is currently around 3 million nodes. Few companies are happy to reveal publicly what they use, but it gives me some comfort to know that the years of effort were worth it.
Happy birthday, CFEngine. Middle age is coming for you.