The Sorrow of the Cloud Messengers
Swallowing the spatio-temporal vortex of high volume transactions in the 21st century
In 1913, British composer Gustav Holst (whose name clearly betrays a mixed Swedish, Latvian, and German ancestry) premiered a somewhat ambitious piece of choral music called The Cloud Messenger, based on a famous Sanskrit poem about an exiled spirit who persuades a passing cloud to carry a message of love back to the Himalayas, where his loyal wife awaits his return. The Cloud Messenger received lukewarm reviews at the time and drove Holst into a depressive retreat with Clifford Bax (brother of composer Arnold Bax), whose obsession with astrology prompted Holst to work on his most famous and successful piece, The Planets. The Cloud Messenger has enjoyed a revival of interest recently, though surely not because of the gathering storm of cloud computing.
Supernatural forces notwithstanding, the idea that something as diffuse and intangible as a cloud could be trusted to carry messages (of love or otherwise) reliably is surely a lovely romantic notion, but perhaps not an obvious one, at least until we reach the 21st century, when a whole generation of Internet users might genuinely be unaware that clouds are anything other than message-passing platforms on the end of a wireless router.
Joking aside, the infrastructure we now call the cloud has grown in size and stature, and some nagging issues are beginning to surface that could make one wonder whether the Internet is ruled by supernatural spirits after all. This is true not least when it comes to everyone's ability to share a common view of data.
Is everyone seeing the same message at the same time? What does “at the same time” even mean?
These are old issues, falling under the umbrella of the larger so-called consistency problem, yet remarkably they are still not solved for the masses, except perhaps in a few cases with some hazardous workarounds. The specific technicalities are demanding, meaning that very few people have the time or inclination to look into them in any depth. It's also probably true that most of those who do find themselves looking at the problem have backgrounds in physics, because the central issues are questions that have been addressed previously, and at length, in the context of space and time.
So what does this mean in practice? Well, let's take a look at it in terms of one of the most critical problems faced in IT systems: high volume online transaction processing (OLTP), where people are likely to run into the more pressing issues first. I've worked with some of these questions over the years, and I've pointed out that, as the scale of virtual computing operations grows, many of the phenomena begin to look a lot like quantum mechanics. There is no magic in that idea: it's purely a scaling and information-access issue. However, it's one that's still barely acknowledged and even less well understood.
Transactions, why and where?
Data transactions might not qualify as poetry, in the strictest sense, but they are certainly small stanzas carrying the corpuscular lifeblood of a good many industries, including finance and travel, not to mention critical security applications that rule a good deal of the planet. Transactions may carry reads and writes, sales and purchases, chunks of streaming content, and much more.
Aggregating and handling transactions accurately and in large quantities is such a central problem that one might think it would have been solved long ago. "Should have" is perhaps right, but we know technology is more cultural than rational, and that trends are driven by the competitive business landscape of big companies, which are poorly suited to overturning existing ideas with innovation. So technology gets stuck in its ruts, telling itself the same self-fulfilling stories, and the world remains held together somewhat precariously with string and sticky tape, far more than anyone would care to admit.
I've written about the data consistency problem before from a Promise Theory viewpoint, and even described a technical solution to virtualized wide-area transaction processing with András Gerlits, who has implemented it beautifully for the world of infrastructure. Others have written plenty too. But let's recap.
There are two main ways we can look at it: technically or pragmatically:
- Technically, the problem of data consistency is the age-old problem of relativity, i.e. what happens when different receivers observe processes from their own local perspectives and then make judgements that may differ because of those perspectives. Since I've written about this many times before, we can consign that to the history pile.
- Pragmatically, the problem is somehow more interesting, because the Downstream Principle in Promise Theory tells us that all the action and confusion occurs at the receiver's end of a stream of updates. In other words, the problem lies in aligning how we use data in order to be consistent. Even with the best technology, it's ultimately everyone's own choice whether they are consistent or not.
The basic conundrum is not hard to understand. If you look at the light collected from a scene in front of you through different lenses, straight buildings may seem to be bent or falling over; people may seem to be fat or thin, green or blue, and so on. No matter what signal is emitted by a source, it's the receiver who has the final responsibility to interpret it and get it right. Likewise with data collected from sensors and sources. How can we trust what is reported to us?
Ideally, we’d all like to know the “correct” version sent by the authoritative source, but in practice, just agreeing about what we see is already a good thing.
The Time Vortex from Edge To Centre
Wherever data get aggregated from multiple sources into an authoritative stream at the user interface, we have trouble defining and managing consistency. Today, two eminently practical standout examples come to mind in the IT industry: microservices and digital twins.
Microservices are a design concept in which one breaks up a software system into a number of small, human-manageable parts, which often leaves the management of the technological whole in a state of some ambiguity. The sum of microservices in a project needs to work together as a loosely coupled whole that behaves like a single monolithic service from the user's perspective, and this means that the quasi-independent parts have to be changed in step each time vital changes occur, e.g. when protocols, code, configurations, and authorisations for the whole are altered. This leads to a distributed consistency issue.
Similarly, when building a "digital twin" model, i.e. a virtual image of some physical phenomenon assembled into a consistent picture from, say, a number of cameras and sensors in smart environments, we need to know which data belong together to create a snapshot that can be aligned semantically according to a possibly changing context. Clocks can sometimes be used for this when one is piecing together data forensically from static sources after the fact, but clocks are not a good guide for dynamic realtime updates, because they don't take into account latencies and races in First Come, First Served queue processing, to mention just two issues.
The problem of perceiving or inferring alignment of multiple parts within a whole is exacerbated by high volume parallelism. If data take entirely different paths from edge to centre, how shall we compare them fairly? After all, the receiver's view is always their truth. There are different ways of looking at high volume transactions (where “high” is always to be understood relative to the capacity to process). Moreover, any interpretation has to be seen through the lens of the technology one is using to transmit and store the data. In fact, all these issues can be addressed in terms of policy and causality, but this is not yet how current software works.
A pragmatic engineer might (and indeed, one did) see consistency as a simple problem of synchronising partial shared state between databases in SQL. This view is certainly encouraged by the rise of the microservices paradigm, which tends to use SQL backends. I once called microservices "a temporary aberration in the history of computing" because they are really a human workaround for a problem that has typically been solved using compilers in other cases, e.g. Remote Procedure Calls (RPC). What's missing to make a complete compiler technology for distributed applications is an underlying distributed-linker bus with standard semantics. An "enterprise bus" or smart data pipeline would enable an RPC-style compiler to build distributed applications just like any other large software project, but such a concept is only now being imagined on top of the basic bus. Such an enterprise bus is (of course) precisely a Cloud Messenger.
The need for this kind of bus was sort of foreseen by LinkedIn when they built Kafka many years ago, as well as by a number of message queues. Combined with an Actor Model style of programming (very compatible with Promise Theory), this has led to a number of developer frameworks for working on distributed systems. Surprisingly, the consistency or alignment compatibility of data changes is rarely addressed, or is handled only clumsily with distributed consensus algorithms like Raft, which actually solve a rather different problem that only approximates data consistency over relatively slow timescales.
But most folks use simple databases. Surely we have fixed all these issues for databases, right? There's Paxos by a famous person, Raft by the friendly open source spirit in the sky, Google's brute force approach in Spanner, and so on. Every database has some consistency mechanism built in, and by now we've largely become bored of the ACID versus BASE (eventual consistency) arguments, because, let's face it, all consistency is eventual: it's just a matter of how long. And who really cares about it anyway until something goes wrong? Well, it turns out that we can solve the issue in cloud infrastructure. It requires only the intellectual buy-in to do so.
From SQL to KV
An early form of enlightened cloud messenger speaking consistently was implemented some years ago by András Gerlits for SQL-only silos. It allowed independent databases to share a common table, so that no matter who wrote to it or when, all databases would only be able to read new data after synchronisation was completed. This can be done in a way compatible with the CAP conjecture, to bust through some of its obvious myths. Even this early appetite-whetter already addressed a few common use cases, but it was more of a proof of concept implemented as an application patch than a reliable replacement infrastructure for Next Generation services. For one thing, it was tied to SQL. Whatever the dominance of SQL, it is by no means the only show in town anymore. One would prefer a solution that could be applied to any kind of data store.
Then we joined forces to define a more fundamental solution, based on the idea of a causality clock. Ultimately all data have to live in key-value stores, so the fundamental issue is one of creating a safe, scalable, and consistent data store (or distributed ACID store, in the appropriate lingo).
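To make the idea concrete, here is a minimal sketch in Python of how a logical causality clock could gate reads in a key-value store so that new values only become visible once synchronisation is complete. The names (CausalKV, advance_watermark) and the single counter are invented simplifications for illustration; this is not the actual implementation described above.

```python
# A minimal sketch (not the authors' implementation) of a key-value store whose
# reads are gated by a logical "causality clock". Writes are stamped with a
# monotonically increasing counter; readers only see writes whose stamp lies at
# or below a shared watermark that advances once synchronisation has completed.

import itertools

class CausalKV:
    def __init__(self):
        self._clock = itertools.count(1)   # logical causality clock (Lamport-like)
        self._history = {}                 # key -> list of (stamp, value)
        self._watermark = 0                # highest stamp agreed visible everywhere

    def write(self, key, value):
        """Record a new version, stamped by the logical clock."""
        stamp = next(self._clock)
        self._history.setdefault(key, []).append((stamp, value))
        return stamp                       # returned so a synchroniser can track it

    def advance_watermark(self, stamp):
        """Called once all replicas confirm they hold every write up to `stamp`."""
        self._watermark = max(self._watermark, stamp)

    def read(self, key):
        """Return the newest value that is already visible to everyone."""
        versions = self._history.get(key, [])
        visible = [v for (s, v) in versions if s <= self._watermark]
        return visible[-1] if visible else None


store = CausalKV()
s1 = store.write("balance", 100)
print(store.read("balance"))      # None: not yet synchronised everywhere
store.advance_watermark(s1)
print(store.read("balance"))      # 100: now globally visible
```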
It's perhaps true that the importance of having absolute instantaneous consistency has been exaggerated in some cases. This may be why many feel they can get away with fudges, using the Raft protocol and data replication as a proxy for data safety. Synchronisation of partial tables between standalone databases is nice to have for many applications, but it doesn't cover the full spectrum of issues. However, there is one case where we critically need global consistency to be updated in a precisely atomic manner: platform configuration changes, or behavioural policy changes.
Changing the definition of a database schema, for instance, could be catastrophic if mishandled, and would at best lead to non-reconcilable data vectors. Imagine if relativity alone were allowed to introduce skewed data semantics, uncontrollably, at the level of individual data dimensions: it would amount to corruption of the data. Moreover, suppose we change the security access controls or the sharing set on some data, and for a split second the policy differs across hosts; then the security perimeter of the data has been compromised. Data could, in principle, be tainted, especially under high volume traffic, where the density of requests is high enough to probe the small inconsistency. It might sound like a minor issue, but if this were a firewall breach, you might consider it differently.
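As a thought experiment, here is a minimal sketch of one way to make a policy change atomic with respect to the request stream rather than to wall-clock time: requests are stamped with the policy epoch in force at a single ordering point, and every host evaluates each request under exactly that epoch. The names (PolicyStore, Sequencer) and the single central sequencer are illustrative assumptions, not a description of any particular product.

```python
# A minimal sketch: a policy change is atomic with respect to the request
# stream because each request is bound to exactly one policy epoch, even if
# hosts learn of the change at slightly different wall-clock moments.

class PolicyStore:
    """Versioned access policy on one host, keyed by epoch."""
    def __init__(self):
        self.epochs = {0: {"partner_access": False}}

    def add_epoch(self, epoch, policy):
        self.epochs[epoch] = policy

    def check(self, epoch, request):
        policy = self.epochs[epoch]            # evaluate under the stamped epoch
        return policy["partner_access"] or request["audience"] == "internal"


class Sequencer:
    """The single ordering point that stamps requests and policy changes."""
    def __init__(self):
        self.current_epoch = 0

    def stamp(self, request):
        request["epoch"] = self.current_epoch  # request bound to one policy version
        return request

    def change_policy(self, stores, epoch, policy):
        for s in stores:
            s.add_epoch(epoch, policy)         # distribute the new version first
        self.current_epoch = epoch             # then switch the stamp atomically


stores = [PolicyStore(), PolicyStore()]
seq = Sequencer()

r1 = seq.stamp({"audience": "partner"})
seq.change_policy(stores, 1, {"partner_access": True})
r2 = seq.stamp({"audience": "partner"})

print([s.check(r1["epoch"], r1) for s in stores])   # [False, False]: old policy
print([s.check(r2["epoch"], r2) for s in stores])   # [True, True]:   new policy
```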
Management of smart spacetime in the dataverse
Nearly all issues in IT come back to some version of configuration management or the management of policy. When cloud computing began to dominate operations in the 21st century, the management of policy and configuration reverted from the carefully engineered agent models of the 90s to more traditional, old-fashioned "push based distribution" models, in which software was packaged (now in containers or virtual machine images) in ready-to-go, off-the-shelf immutable configurations. A reconfiguration could then only be done by a redeployment, and any differing configuration requirements had to be provided as different pre-packaged versions. The CFEngine-like model of post hoc realtime customization of hosts was rejected by the cloud generation, which reverted to the push-upload models of software like Ansible and Terraform.
Configuration space (the imaginary space of all the configurable settings on all computers) used to be a purely abstract idea: something within computers, at point locations. This is no longer true: the packing of configuration into packages, or encapsulated regions ("containers" and so on), means that configuration space is now basically mapped onto actual spacetime locations by virtualization.
What all this means is that configuration is now literally laid out extensively over physical locations. Spacetime itself is the new computer configuration, just as phase space externalizes dynamics in classical physics (but quantum physics reminds us that this is naive). Crucially, those locations could now also be remapped at any moment by cloud management. What is the location of a job in the cloud? It depends when you ask.
This kind of mapping of interior state into real physical space is also what happens in quantum mechanics, and it leads to "spooky" effects like tunnelling through barriers and uncertainties about boundaries. That might sound like something theoretical, but it ultimately implies a potential breach of container security for hosted processes.
For now, no one particularly cares about such details. There's a pervasive belief that as long as we encrypt everything, any encapsulation, consistency, or sovereignty issue is fixed. The simple answer is: not in all cases, only if the interaction between processes is purely data based. Starvation of shared resources, like CPU and memory, can still occur as contained processes interfere with one another through their underlying dependencies. Encryption doesn't prevent Denial of Service attacks, for instance. Dependencies are not all data-based (no pun intended). The platform itself is the fundamental channel for influence.
Accepted truths finish the race last
There's a second spacetime issue that jumps out at us too, which is more a failing of modelling than of technology. We tend to view the world as a series of snapshots, as though every frozen moment of a dynamic process were a new, somewhat fickle truth, rather than what it really is: merely a step in the causal evolution of the system. For this reason, databases almost exclusively operate with a write policy of "latest value wins", as if each moment brought us closer to a final truth. But transactional update streams are typically random arrival processes, especially when distributed over wide areas, so the values a database holds at any given moment are effectively random, which not only makes it hard to debug potential errors, but also means we have accepted unpredictability at a fundamental level (without modelling for it).
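A minimal sketch, assuming two writers racing over a wide area, shows why "latest value wins" turns concurrent updates into a lottery, while a history-keeping store retains every version and makes interpretation an explicit read-time policy. The class names here are invented for illustration.

```python
# Illustrative only: "latest value wins" makes the stored value depend on the
# random arrival order, whereas a history-keeping store keeps the causal record
# and leaves interpretation to an explicit policy at read time.

import random

class LastWriteWins:
    def __init__(self):
        self.value = None
    def write(self, value):
        self.value = value                      # overwrite: arrival order decides truth

class HistoryStore:
    def __init__(self):
        self.versions = []
    def write(self, source, value):
        self.versions.append((source, value))   # keep every version
    def read(self, policy=max):
        # interpretation is an explicit receiver-side policy, not an accident
        return policy(v for (_, v) in self.versions)

updates = [("london", 100), ("tokyo", 150)]     # concurrent, unordered writes
random.shuffle(updates)                         # the network shuffles arrivals

lww, hist = LastWriteWins(), HistoryStore()
for src, val in updates:
    lww.write(val)
    hist.write(src, val)

print(lww.value)          # 100 or 150, depending on the shuffle
print(hist.read())        # 150 every time, under an explicit read policy
print(hist.versions)      # the full record survives for later reconciliation
```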
Consistency doesn’t protect you from contention: if users want to fight each other by committing conflicting values, there’s nothing in any database technology to stop them.
What does this mean for data semantics? It certainly makes the semantics of data hard to manage. Versioning filesystems, with serial and parallel versioning, have been with us for over a decade, and version control is already ubiquitous in programming process management. Why not the same historical integrity/security for databases? This is now being addressed in projects like Omniledger and XTDB, with their “immutable” timeline stores and interpretational policy semantics. Perhaps this is too little, too late, but we always have to win the attention of the crowds before technologies become available to all.
Part of the reason for data inconsistency is precisely the ambiguity of order in random parallel processing. There are usually no clear semantics for data because there is no clear process at the receiver to curate any. This can be defined more clearly using the data vortex ledger model. Indeed, one could eventually imagine Omniledger and XTDB merging to provide full semantic control: the Time Lords of IT could prevail!
What emerges from just a tiny rethinking of the physics of processes in the cloud is that managing them, handling streams and transactions, becomes simply a matter of managing spacetime: like tending a virtual garden rather than dumping data arbitrarily into an unmanaged landfill for later processing and recycling. Data management and knowledge management are now about managing space and time in a smart way.
NDN, the CDN of dynamic data sharing
Let us not forget the future. Nothing is ever truly new, of course, but the desire for something better will always try to shine through the long shadow cast by ubiquity. Everything has happened before and will surely be reinvented again (except the actors in Battlestar Galactica). After all, most things happen by accident — and what happens on purpose is usually fought with all the might of a sore loser.
We made a simple oversight in the development of networking. Separating the issues of data management from data communications is really a nonsense that has persisted for decades. It persists largely because analysts and investors arbitrarily insist upon certain simplistic categories for their market analysis, boldly "misunderstimating" the public markets. For decades now there have been proposals for "Next Generation" Internet technologies. What little progress has been made has had to be sneaked in through the back door.
One such concept for resolving data management is related to the idea of named data resources. Naming of resources (e.g. using URIs) can encapsulate ideas of versioning and membership as well as simple unique identity. By allowing names to be unique in both space and time, and by using a robust and scalable publish-subscribe architecture, one can engineer a hierarchy of data communications that respects semantics. Such an approach could truly make a next generation Internet that respected data integrity on a semantic level.
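A minimal sketch of what such naming might look like: the spatial part of a name is a hierarchical path and the temporal part an explicit version, so a fully qualified name never refers to two different values, while a toy publish-subscribe layer delivers each new version to subscribers by prefix. The "/path@version" scheme and the class names are invented for illustration, not taken from any standard.

```python
# Names unique in space (hierarchical path) and time (explicit version),
# with a toy publish-subscribe layer on top. Purely illustrative.

from collections import defaultdict

class NamedDataBus:
    def __init__(self):
        self.store = {}                        # "/path@version" -> immutable value
        self.latest = defaultdict(int)         # "/path" -> newest version number
        self.subscribers = defaultdict(list)   # prefix -> callbacks

    def publish(self, path, value):
        version = self.latest[path] + 1
        self.latest[path] = version
        name = f"{path}@{version}"             # unique in both space and time
        self.store[name] = value
        for prefix, callbacks in self.subscribers.items():
            if path.startswith(prefix):
                for cb in callbacks:
                    cb(name, value)
        return name

    def subscribe(self, prefix, callback):
        self.subscribers[prefix].append(callback)

    def fetch(self, path, version=None):
        version = version or self.latest[path]
        return self.store[f"{path}@{version}"]


bus = NamedDataBus()
bus.subscribe("/sensors/", lambda name, v: print("update:", name, v))
bus.publish("/sensors/door/1", "open")
bus.publish("/sensors/door/1", "closed")
print(bus.fetch("/sensors/door/1"))            # latest: "closed"
print(bus.fetch("/sensors/door/1", version=1)) # history preserved: "open"
```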
Try to imagine a world in which all basic Internet technologies routed information like smart data pipelines, from source to destination, in the first place. Then, attach the entry points for change to the data time-vortex ledger! What one has then is all the tools needed to manage consistent data transparently and automatically, through policy-defined subscription channels. The Promise Theoretic Downstream Principle could then be implemented at the flick of a configuration option. This is the goal of the Omniledger.
The designers of cloud services weren't thinking at all about sustainable scaling when cloud technology was developed; they were selling the familiarity of old physical networking architectures, and all their bad habits to boot. Load balancers were a hack, born when networks were purely physical, immature, and isolated. Placing a point load-balancer in front of a service is a traffic jam by any other name. No one did this because it was the best solution; it was merely a necessary workaround. But that workaround became deified by current cloud offerings.
Content Delivery Networking (CDN), on the other hand, belonged to the next generation of thought. CDN load balancing works in a more sustainable way than ordinary load balancing, and also offers data replication from an apparently single service point. The same ideas can be used to improve all data services, with smart data flows. The rational approach is to look to version control, i.e. to adapt the way we use names, and to use directory services (as indirection lookups) to point to multiplexed copies that are replicated for fast access. One looks for contextually appropriate versions from the timeline.
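A minimal sketch of that indirection, assuming a simple region-based notion of context: a directory maps a versioned name to a set of replicas and resolves each lookup to the copy nearest the caller. The region labels and URLs are invented placeholders.

```python
# Illustrative CDN-style indirection for data: versioned name -> replica set,
# resolved to a contextually appropriate copy (here, simply the same region).

class Directory:
    def __init__(self):
        self.replicas = {}                  # versioned name -> {region: location}

    def register(self, name, region, location):
        self.replicas.setdefault(name, {})[region] = location

    def resolve(self, name, caller_region):
        copies = self.replicas[name]
        # prefer a replica in the caller's region, otherwise any available copy
        return copies.get(caller_region) or next(iter(copies.values()))


d = Directory()
d.register("catalogue@v7", "eu-west", "https://eu.cache.example/catalogue/v7")
d.register("catalogue@v7", "ap-east", "https://ap.cache.example/catalogue/v7")

print(d.resolve("catalogue@v7", "ap-east"))   # local copy, no central bottleneck
print(d.resolve("catalogue@v7", "us-east"))   # falls back to some replica
```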
Basic CDN still follows the old model of “last version wins”, but an upgrade to it was proposed many years ago with Named Data Networking or NDN, which adds a measure of version control to the mix, making every version a winner in its own epoch. NDN didn't come to full fruition either, but some of its ideas will seep into the creeping incrementalism we call innovation in the future.
In NDN, data are basically immutable (presenting a separate garbage collection issue, but it’s a clean separation) so that semantically named objects (something like the semantic web idea of URIs) can be located just with a simple distributed hash table. One trades a consistency problem for a naming problem — but naming is a problem we understand better than spacetime dynamics.
Coda
Technology has come so far and yet, in some ways, nowhere at all compared to the scope of what we expect from it. Data technology has penetrated society so quickly that we somehow expect these issues would already have been solved, but they are only just beginning to surface. It will be sticky tape and chewing gum for years yet. We know that each generation tries to turn new ideas back into the old ones they had when they were younger, but true innovators do still exist and can eventually bring about change, even for the masses, given time. It might take an invention being reinvented two or three times before it eventually settles into general adoption. Even early CFEngine could do a lot of what early cloud systems could do, and was the template for Kubernetes self-healing, but it wasn't the right tool for the commercialization of the job. NDN pointed out some practical issues, but it was ignored too. The present cloud might be a patchwork of clunky bits and pieces, but this is the nature of the human experience. High minded spirits are frequently exiled from the world of commerce.
"Rows and floes of angel hair, and ice cream castles in the air?" Joni Mitchell once looked at clouds that way, with a flair for the creatively possible. Then she pulled herself together, noting:
“But now they only block the sun
They rain and they snow on everyone
So many things I would have done
But clouds got in my way…”
Personally, like a hopeful exiled spirit, I'm praying for sunshine.
Author's Note: In an attempt to balance the timeline and calm the great cloud spirit, this story is to be released at the moment of balance between light and dark, good and evil, ice cream and hot chocolate: the Autumnal Equinox, Sun, Sep 22, 2024, 2:43 PM. :)