Universal Data Analytics as Semantic Spacetime

Part 11: In search of answers, beyond this horizon

Mark Burgess
19 min read · Oct 2, 2021

In this series, I’ve discussed at length — and with explicit examples — what we can do with the principles of Semantic Spacetime to make sense of generalized data analytics, using just a couple of tools on a personal computer. For some applications, indeed for exploring and learning, this combination will take you quite far. It’s not necessary to jump into the deep end of High Performance Computing, Big Data, or Deep Learning to find answers. Some problems are certainly larger though: the long tail of the data processing power-law has plenty of hurdles and challenges, and I’ll return to these at a later date. In this final installment I want to summarize what we’ve accomplished using these basics.

Graphs want to be “alive”

In the series, I’ve tried to show how graphs are the natural data representation for active processes. Graphs are built by processes, they lay out the circuitry of flow processes, and interacting with graphs is an ongoing process, not a random-access story. Graphs are not merely frozen archives of passive data: graph circuitry remains an active spatio-temporal mirror of the world, with embedded directionality and unresolved inline choices that capture complex causality. Every graph is a computer as well as a model of state. Thus graphs are samples of spacetime — with semantics.

Today, we have a battery of methods for calculating certain properties of unlabelled graphs, at least when their nodes are homogeneous and memoryless. We sum up weighted contributions, e.g. in Artificial Neural Networks, or apply whole-graph algorithms such as eigenvector centrality (PageRank), to expose certain structures from the topology alone. But the most fundamental resource for machine learning and causal computation lies in the variable data held within the graph: its vertices or nodes, and their labelled edges or links. Advanced machine learning is accomplished by memory processes with strong semantics, traversing both symbolic and quantitative information.
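
To make the contrast concrete, here is a minimal sketch of one such whole-graph algorithm, PageRank by power iteration, over a plain adjacency list. The toy graph, damping factor, and iteration count are illustrative assumptions, not code from this series; the point is that the computation sees only topology, with no semantics attached to nodes or links.

```go
package main

import "fmt"

// pageRank runs power iteration over an adjacency list of out-links.
// Nodes are indexed 0..n-1; d is the damping factor (0.85 by convention).
func pageRank(out [][]int, d float64, iters int) []float64 {
	n := len(out)
	rank := make([]float64, n)
	for i := range rank {
		rank[i] = 1.0 / float64(n) // uniform starting distribution
	}
	for it := 0; it < iters; it++ {
		next := make([]float64, n)
		for i := range next {
			next[i] = (1 - d) / float64(n) // "teleportation" term
		}
		for v, links := range out {
			if len(links) == 0 {
				continue // dangling node: a simplification, its mass is dropped here
			}
			share := d * rank[v] / float64(len(links))
			for _, w := range links {
				next[w] += share // each out-link passes on an equal share
			}
		}
		rank = next
	}
	return rank
}

func main() {
	// Toy directed graph: 0->1, 0->2, 1->2, 2->0, 3->2
	out := [][]int{{1, 2}, {2}, {0}, {2}}
	fmt.Println(pageRank(out, 0.85, 50))
}
```

Nothing in this computation knows what the nodes mean; that information has to live elsewhere, which is exactly the role of the labelled graphs discussed below.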

Graphs resist all attempts to be frozen into static snapshots or dead archives. Standardized (one might even say “authoritarian”) hierarchies, like taxonomies, ontologies, tables of contents, etc., try to define spaces, like a fixed coordinate system in Euclidean space. These are common, but they fail to capture the nuances of real-world complexity, because these coordinate systems are only ad hoc spanning trees, i.e. incidental route maps overlaid onto a snapshot of an evolving system. As we learned from Einstein, relativity of viewpoint and circumstance forces us to change perspective. Every spanning tree yields a unique “view” or partitioning of data, but usually the deeper invariant semantics within nodes don’t fall into these ad hoc treelike hierarchies. This is why hierarchical filesystems still need symbolic links to patch them up into a usable state.

The “liveness” of network data makes graphs a key part of the animated storytelling of the world around us — from embedded edge computing, within local ecosystems, to the routing of signals and resources for control purposes. This characteristic also makes graph databases quite different (i.e. highly non-linear) compared to their static SQL cousins. Querying and working with graph data, for the real-time evolution of a system, requires a very different language and more active patterns of analysis than mere retrieval from a tabular archive.

Don’t forget the Internet

Internet routing protocols were the first large scale machine learning graph databases. They enabled the “living” Internet of changing pathways and services we enjoy today to grow and evolve. Today, the data structures are highly distributed and the Internet graph spans the world, but everything began in closets in a few universities before multiplying and spreading virally, as part of their own growth process.

The Internet was designed to be a hierarchical mesh network — an “ecology” of connections, resilient to failure. If you like, it was by observing spacetime principles that we ended up with an emergent ecosystem of technologies and regional human organizations to bring about the information sharing and social network of our times. The sum of all those small patches appears incoherent and noisy, even though the reality of it spans the globe according to a highly ordered set of principles.

There’s a lesson there about how processes form networks, and how networks perpetuate processes to explore them. I occasionally wonder whether we have forgotten these lessons in our contemporary obsession with Artificial Neural Networks, Deep Learning, and large scale graph based computation such as Pregel and Apache Giraph.

A graph in semantic spacetime is a representation of a process by an inhomogeneous graph, in which context and specialization are encoded by hard-coded labels and other data discriminators. A Deep Learning Neural Network is a simulation of a process over a relatively homogeneous surrogate network, in which inhomogeneity has to be imprinted to capture different instances — by learning the relative weights of software links, in a neutral manner, to generate an interference pattern.

Smart use of spacetime

We can all do data science with just a couple of tools. Don’t be ashamed of your laptop — not every problem needs to be crushed by brute force mega-clusters. Google envy — the dream of commanding an army of computers that can crush computations at the push of a RETURN key — might excite us on an adolescent level, but science and engineering are more subtle adversaries, which don’t necessarily respond to the threat of force. There are ethical concerns too in the use of brute force: flaunting brute force calculations in massive datacentres flouts global warming concerns. The IT industry is every bit as noxious as the airline or car industry, make no mistake, when you trace its dependency graph to the energy source! Cloud operational expenditures are actually quite high too.

Of course, there remain a few large scale problems, e.g. protein folding and other scale-dependent phenomena, where brute force may be the only option for the time being. We don’t know enough of the underlying principles to reduce these problems yet — but that will change in the future. GPUs and other specialized chips that parallelize data processing pipelines can temporarily exploit the spacetime of the computation for greater efficiency, but these still require a deep understanding of the structure of spacetime to mimic the processes. The key is always to exploit the structure of space and time, in each of its localized meanings, for best results.

In the previous post, I showed how two competing models of graph computation have emerged:

  • The direct use of graphs with weighted nodes and links to memorize process state, attaching semantics to nodes and links through precise encoding. This is what you might store in your graph database. Learning is accumulated, like the well trodden path across virgin territory, by updating scalar weights. One can reason by direct graph relationships (FOLLOWS, CONTAINS, EXPRESSES), as well as cache similarity measures (effective distance) on a pairwise basis (NEAR). This can be computed efficiently on a need-to-know basis for inference, but not necessarily for extrapolation. This is the routing algorithm model of graph analytics (see the sketch after this list).
  • Then there’s the “Stanford” approach of transforming an entire graph into a multi-dimensional Euclidean embedding, a speculative vector space built using Deep Learning, with the resulting Pythagorean distance as a similarity measure. You hand the whole thing over to a cloud provider, because the process is on an industrial scale. The results are computed up front, in their entirety, by brute force — and they remain somewhat magical and inscrutable. This embedding of processes as an over-complete Euclidean space makes a certain kind of speculative prediction and extrapolation possible. The algorithms are made one-by-one to work on specially curated kinds of graph data, and require multi-stage processing in one gigantic effort. The results are “out of control”, but could be impressive, even amazing, because they’re unexpected.
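
Here is a hedged sketch of the first, routing-algorithm style of learning, using hypothetical types rather than this series’ actual library code: semantics live in link labels, and learning accumulates as scalar weights through idempotent reinforcement.

```go
package main

import "fmt"

// Link is a labelled, weighted edge: the semantics live in Label,
// and the learning accumulates in Weight.
type Link struct {
	From, To string
	Label    string // e.g. FOLLOWS, CONTAINS, EXPRESSES, NEAR
	Weight   float64
}

// Graph keys links by (from, label, to) so that repeated observations
// reinforce one edge rather than duplicating it.
type Graph map[string]*Link

// Reinforce bumps a link's weight each time its relationship is
// re-observed: the "well trodden path" style of learning.
func (g Graph) Reinforce(from, label, to string, dw float64) {
	key := from + "|" + label + "|" + to
	if l, ok := g[key]; ok {
		l.Weight += dw
		return
	}
	g[key] = &Link{From: from, To: to, Label: label, Weight: dw}
}

func main() {
	g := Graph{}
	for i := 0; i < 3; i++ {
		g.Reinforce("eventA", "FOLLOWS", "eventB", 1.0) // repeated observation
	}
	g.Reinforce("eventB", "EXPRESSES", "alarm", 0.5)
	for _, l := range g {
		fmt.Printf("%s -[%s %.1f]-> %s\n", l.From, l.Label, l.Weight, l.To)
	}
}
```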

Computation is expensive. I think we’re also fast approaching an age in which we’ll have no other choice than to think much more about utilizing the ubiquitous edge devices we’ve spent decades acquiring — letting the edge processes themselves compute their answers in real time. Those origin graphs can be viewed as the accumulation of many small steps, distributed over space and time. Perhaps we can look forward to a new age in which small is once again considered beautiful, and the common ecosystem of virtual information circuitry surrounding us is what will define our human experiences.

The problem with brute force computation, marshalled as a bulk operation, is that it’s really expensive unless one exploits the graph of process interrelationships directly. Forget about simulating everything later: do it in real time. Part of this pain is self-inflicted. The culture of simulation doesn’t lead us to approach problems with an eye to making them efficient. Computing culture is to throw brute force at problems.

Instead of using multidimensional vector spaces as a paradigm, could we instead use bit operations to optimize similarity and discrimination with AND and OR? Interferometry (see part 8) is a powerful tool for merging parallel processes — analogous to methods of quantum computing. If a calculation is interrupted, could we pick it up where it left off, or do we have to start the whole thing again? Calculations can also be treated as “living evolving state” rather than snapshot batch jobs. These are all problems for carefully engineered data pipelines, and an extended caching hierarchy.
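
As a minimal sketch of the bitwise idea, assuming features are hashed into 64-bit fingerprints (that encoding is an assumption for illustration): similarity then reduces to AND, OR, and a population count, which is a single instruction on modern CPUs.

```go
package main

import (
	"fmt"
	"math/bits"
)

// jaccard computes |A AND B| / |A OR B| over bit-set fingerprints:
// similarity and discrimination from pure bit operations, no vector space.
func jaccard(a, b uint64) float64 {
	union := bits.OnesCount64(a | b)
	if union == 0 {
		return 0
	}
	return float64(bits.OnesCount64(a&b)) / float64(union)
}

func main() {
	a := uint64(0b10110111) // hypothetical feature fingerprints
	b := uint64(0b10010101)
	fmt.Printf("similarity = %.2f\n", jaccard(a, b)) // 0.67
}
```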

It’s really pipelines all the way down.

Digital twins and data centralization

As an example, consider the digital twin. Now, if ever, is surely the age of the digital twin. Digital twins are supposed to represent facsimiles, shadows, or cybernetic ghosts of physical devices — connecting a real world to a virtual world. They live somewhere in “the cloud”, while the source devices haunt the edge of reality. Good old-fashioned monitoring dashboards are, in a sense, poor man’s examples of digital twins. They collect the numbers, but we’ve still a long way to go to make good use of the data semantics.

One can easily kill off the remnants of semantics, gleaned from edge sources, by transforming data into a dead knowledge representation — like fixed schema databases, designed for random access. Processes are not random access structures: they have causal topology.

For example, if one has a process driving changes, the events are typically captured event-by-event as log files, at the point of origin, as timeseries. Later, these may be merged into a single muddled thread, in which origin is lost. Today, logs are typically uploaded into relational SQL databases, or thrown at Elasticsearch, for post hoc brute force searching. Such random access stores can’t describe the process timeline without relying on artificial timestamps or auto-incrementing keys, which have no invariant meaning.

A graph database would encode such a proper-time series using FOLLOWS edges, and separate annotations about events with EXPRESSES, CONTAINS, and NEAR links. Auto-incrementing numerical keys are okay, until a process bifurcates into parallel threads, or merges from several parallel threads, at which point all bets are off. The original causality then gets eliminated by projecting the original causal graph into “universal continuum time” (this is the problem with Euclidean spacetime — it has only a single average timeline, and thus gets muddled by generating entropy). A graph can easily bifurcate to avoid this. A series of timestamps may also pass through several timezones, and derive from independent clocks that are potentially unsynchronized, as processes migrate from host to host — thus “Euclidean” exterior time ceases to have any meaning. Semantic spacetime explains how to deal with these issues. Graphs are the key.

A graph database can retain the original causal relationships (of proper interior time) for each process, as well as fork and abstract these in order to relate them to invariants for context and semantic inference (see parts 7 and 8) — to model events.
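
To make this concrete, here is a hedged sketch using the official ArangoDB Go driver (github.com/arangodb/go-driver). The database name, collection names, document keys, and the edge-direction convention are all assumptions for illustration; the series’ own code differs in detail.

```go
package main

import (
	"context"
	"fmt"
	"log"

	driver "github.com/arangodb/go-driver"
	"github.com/arangodb/go-driver/http"
)

func main() {
	ctx := context.Background()
	conn, err := http.NewConnection(http.ConnectionConfig{
		Endpoints: []string{"http://localhost:8529"}, // assumed local server
	})
	if err != nil {
		log.Fatal(err)
	}
	client, err := driver.NewClient(driver.ClientConfig{
		Connection:     conn,
		Authentication: driver.BasicAuthentication("root", ""), // assumed credentials
	})
	if err != nil {
		log.Fatal(err)
	}
	db, err := client.Database(ctx, "SST")
	if err != nil {
		log.Fatal(err)
	}
	events, _ := db.Collection(ctx, "Events")   // document collection (errors elided)
	follows, _ := db.Collection(ctx, "Follows") // edge collection

	// Two events in proper (interior) time on one host...
	e1, _ := events.CreateDocument(ctx, map[string]interface{}{"_key": "boot", "host": "edge-1"})
	e2, _ := events.CreateDocument(ctx, map[string]interface{}{"_key": "ready", "host": "edge-1"})

	// ...linked by a FOLLOWS edge, so causal order survives without timestamps.
	_, err = follows.CreateDocument(ctx, map[string]interface{}{
		"_from": string(e2.ID), "_to": string(e1.ID), // "ready" FOLLOWS "boot"
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("linked", e2.Key, "FOLLOWS", e1.Key)
}
```

A fork in the process is then just a second FOLLOWS edge sharing the same predecessor, so bifurcation costs nothing and loses no causality.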

Ultimately, there’s the issue of where to keep the data for digital twins. It seems like an odd idea — to manage a thing by making a copy of it somewhere else, a bit like honouring celebrities with weird Madame Tussaud’s waxworks. You either feel proud or embarrassed about the likeness — but they are dead things, not living representations. Yet we try to climb the ladder of representation and comprehension, from mere data capture, via perspectives and interpretations, to an eventual understanding (see figure 1), and twins are the contemporary expression of that.

Figure 1: Layers of processing take us up the DIKW ladder from Data coming out of the blue, twinned in storage, to interpreted Information, then experiential Knowledge, and hopefully later Wisdom. But what is the penalty for climbing this ladder in time and resources?

These are the well-known problems with trying to move data around and rule the world by remote control:

  • Synchronization of data over long distances (equilibrium) involves latency, and clocks can’t be trusted to reflect real processes. Determinism relies on speed, and timing issues expose the fundamental flaws of technologies based on instantaneous responses.
  • Collecting data from multiple components doesn’t automatically provide a representation of the whole story. If we lose the context of the measurement, and the interrelationships, there’s no way to reconstruct them later. We need a scaled approach that blends dynamics with semantics (see figure 4 below).
  • Finally, the relevance of the answers, inferred from past data, for the “here and now” depends on a chain of invariances linked by causation, each of which needs to be questioned. There is uncertainty in every story we try to tell with data.

From a spacetime perspective, it’s completely clear why there is no single concept of time in a distributed system (see Smart Spacetime), and thus no precise deterministic control on any timescale. Sending data over a network incurs delays, loses context, and perhaps loses the chain of causality, or even the data themselves, unless special measures are taken to capture them all. So why then would we centralize in cloud datacentres?

We can centralize just as much as we have to in order to calibrate information to a common standard, so that semantics are the same for all. Then, we preferentially keep data as close to the point of application as possible, in order to minimize delays and distortions. It may sound simple, but it’s fraught with subtleties. The bottom line is:

The first rule of distributed computing: don’t send anything over a network unless you have to.

Knowledge representations as cybernetic control systems

As I mentioned at the start of the series, today we’re using data as much for cybernetic feedback and control as for scientific discovery. Consider then a data pipeline, pumping information through space and time. Think of any computational process or digital conversation passing through a mobile communications network between endpoints that could be in motion. Maintaining the flow of data, at a certain rate, is one problem, especially when spatial relationships might be changing — this is a cybernetic challenge of interest in 5G or 6G services and beyond. Maintaining the meaning of the results is a bigger issue, especially at scale.

A lot has been made of the revolution in edge computing: the Internet of Things, RFID chips, 5G network coverage, and smart “embedded” services. The promise of rich contextual edge-services is still a work in progress, but advancing by the day. With data sources in relative motion, the systems we rely on are constantly seeking paths through “Internet spacetime”, negotiating handovers from cell tower to cell tower in a “telco spacetime” graph, and coordinating physical locations with available services at the “edge”. By separating events (logs) from invariants, using the principles I’ve described, we can manage and interact with active processes as a chain of simple coincidences (part 7). These are similar to RDF’s triples, but have spacetime semantics.

PersonLocation(roomID,personID)
UserService(userID,serviceID)
RoomIPprefix(roomID,IPprefix)
Train(engine,list carriageIDs)
TrainStation(trainID,stationID)
PassengerTicket(name,ticketID)
TicketTrain(ticketID,trainID)
PassengerTrain(name,trainID)
PassengerStation(name,stationID)
etc..

These coincidence function annotations quickly generate a rich and easily searchable graph, from easily understandable spacetime events — in an idempotent or invariant way. From a database viewpoint, details that are interior to agents in this description can be modelled as document details that don’t need to be exposed as graph structure. This is why I chose ArangoDB as my tool of choice.
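
In ArangoDB terms, one such coincidence function can be a single idempotent AQL UPSERT. This is a sketch under assumed collection names and a hypothetical weight convention, not the series’ exact code:

```go
package sst

import (
	"context"

	driver "github.com/arangodb/go-driver"
)

// PassengerTrain records the coincidence "this passenger was on this train".
// Repeated observations reinforce one edge's weight rather than duplicating
// the link, which is what makes the annotation idempotent/invariant.
const upsertLink = `
UPSERT { _from: @from, _to: @to }
INSERT { _from: @from, _to: @to, weight: 1 }
UPDATE { weight: OLD.weight + 1 }
IN PassengerTrain`

func PassengerTrain(ctx context.Context, db driver.Database, name, trainID string) error {
	cur, err := db.Query(ctx, upsertLink, map[string]interface{}{
		"from": "Passengers/" + name,
		"to":   "Trains/" + trainID,
	})
	if err != nil {
		return err
	}
	return cur.Close()
}
```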

Notice how there are several possible spaces in play in a semantic spacetime model: coordinates over the Earth surface, building addresses in a city map, train stations on a railway network, etc.. For spatial reference, IP addresses or street addresses are meaningless to geo-space, and geospatial coordinates are meaningless in mobile communications of the Internet space or to postal delivery. They may need to be mapped to one another by virtual processes of discovery (say a postman’s travels). These mapping searches are themselves non-static! Processes involve types of virtual motion on many incompatible levels.

Edge computing isn’t just about smart devices in the home either: it’s about wiring together these completely different models of space and time, through data pipelines, with levels of understanding and sophistication to complete a puzzle of staggering complexity. Well chosen data representations are needed to process and curate these into well chosen knowledge representations at the right speed to anticipate the answers to future questions. It’s data pipelines all the way down.

Figure 2 : Meaning isn’t instantaneous. It’s inferred by processes that proceed at rates less than or equal to the rate of data arrival, mixing past and present in the causality cone. Rich answers rely on elapsed time to add value to data, e.g. in Machine Learning. The ability to access answers quickly from a rich model depends on the sophistication of the knowledge representation after processing.

Data processing pipelines (figure 2) are becoming the circuitry of the cybernetic world, every bit as important as our utility networks. Asimov got this part wrong: the cybernetic future doesn’t begin with isolated humanoid robots, but rather with a giant silicon ecosystem, a digital rainforest of diversity to explore the intrinsic complexity of human civilization. Today, it’s still held together with coffee, VPNs, and bug fixes. If we’re going to make progress in the future, it needs to become as robust and commoditized as electronic circuitry is today. There’s still a long way to go, because the problem of data circuitry spans many layers of virtualization, each treated as distinct — but waiting to be unified by underlying spacetime principles.

Intrinsic scales and choosing a data store

To realize this vision, we need data representations and services that are relatively fast, local, and separated by their intrinsic timescales — e.g. by long-term and short-term effects (see figure 3). Physics tells us that every process in spacetime can be characterized by the intrinsic scales that govern its workings in space and time (a practice called dimensional analysis). This leads to the principle of locality — an important scaling principle.

One of our chief goals in analyzing and interpreting data is to discover those intrinsic scales and express changes relative to them. The combinatorics of processes adds semantics (figure 3). In this series, we’ve looked at how we can get the most out of scenarios by recognizing the role of space and time within them, on a fundamental level.

Figure 3: A real data pipeline is composed of multiple node types that interact over a hierarchy of timescales, with complex causal boundary conditions. These could easily be represented by, say, ArangoDB, but not by less flexible graph representations that assume homogeneous nodes.

The notion of putting a multi-model database at the edge, as a smarter cache for scaled data circuitry, is a compelling idea — no more controversial (but infinitely more structured and usable) than the embedded Linux boards, with their hierarchical filesystems, that run there today.

Tempus incognito

Underneath it all, at the bottom of the ladder of meaning, is time — the change of state that makes everything tick. The rate at which we sample data (which must exceed the Nyquist rate if changes are to be captured faithfully) plays an underestimated role as a foundation for time-series data collection. Graphs embed timeseries everywhere, through directed links. From there, processes delve into multiple spaces, by exploring memory and outcome. Without space there is no memory, no state, and no process.
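
In sampling terms, this is the familiar Nyquist criterion: to resolve changes up to a frequency f_max, the sampling rate f_s, and hence the end-to-end sampling interval of a pipeline, must satisfy

```latex
f_s \ge 2 f_{\max}
\quad\Longleftrightarrow\quad
\Delta t = \frac{1}{f_s} \le \frac{1}{2 f_{\max}}
```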

Databases exist to engineer semantic models representing data in space. Graph databases exist to capture processes. By encoding semantics in types and collections, we try to capture some aspects of an original context and build a durable understanding. We separate batch processing from in-band computation to avoid brute force repetition. Then we cache that learning in long-term databases, to build on past work, separating short term and long term learning. Memory allows us to play with time.

We can dump data into sequential logs for speed, but we can’t search, analyze, and summarize them on the same timescale. We can carefully parse data and try to reason about events directly from logs and timeseries, but not very quickly. And we can’t cache that reasoning without a process graph. Logs, key-value stores, tables, documents, and graphs are not alternatives; they are different cases. Each representation must be seen as a step in a larger process, not a competing choice. If you’re going to have one tool, a flexible database is a pretty good place to start.

Naming of concepts

Semantics begin with naming. Personally, I find that one of the hardest choices I have to make, in any modelling project, is deciding how to name things, variables, and processes, so that I’ll understand them later. If we choose names wisely, a system will tell its own story in human readable terms. If we choose poorly, we might end up searching endlessly through numbers without seeing the wood for the trees.

We can attach names to unique or distinguishable instances, to groups, collectives, whole time-series, and to reasoned inferences too. Whenever there are “boundary conditions” that inject information or interpretation from the real world into a model, we have to name something. Naming is especially important in graph modelling, because the point of graph modelling is to illuminate narrative in a particular way. Nodes are names with other attributes. These are the “eigenstates” of our system.

The way we choose to model a scenario depends on all the different interacting processes that contribute to an observation — how data are obtained, how they are read and used by other processes. If you need to pre-process data (or process in multiple stages) then you have a data pipeline problem. Make sure you can get what you need within the Nyquist sampling time to answer the questions you’re posing!

Mixed data models for deep knowledge

The Semantic Spacetime Project provides a model for representing processes, not as movements in a Euclidean space (as Newton might have imagined) but as a network of interlinked events that move from node to node, carrying named attributes (documents that convey deeper contextualized meaning). Sometimes nodes represent places or things (as we understand these concepts), because they are basically invariant over the timescale of interest. Other nodes may represent ephemeral events, forming sequences that tell stories, with links to represent transactions or pathways (real or virtual). These networks may all be represented as point-to-point graphs, with a little discipline.

Graphs (in the sense of Graph Theory or networking) are a relatively recent but powerful addition to data modelling. The history of semantic graphs began in the 1800s with a fascination for taxonomies, especially in biology (beware the Platypus). Then in the technological age, it was adopted perhaps first with Topic Maps, which were a form of electronic index for librarians. Later, with the web, came RDF, OWL, and other machine languages, based on first order logics. I’ll go out on a limb and proclaim that these early technologies have been miserable failures, even though academics continue to write papers about them long after their death. The reason, as I see it, is that relationships do not express a recognizable logic. Computer scientists tend to think in terms of logic, but logic doesn’t discover and tell interesting stories: it’s too strongly constrained to enable the kind of emergence that we seek in seeing beyond the obvious. It can’t tell us something we didn’t already know. But, it turns out that spacetime concepts do underpin deeper truths and may surprise us.

Today, of course, there’s a lot more written about Artificial Neural Networks (ANN), but those staged computational fabrics often have to go to extreme lengths to post-process ideas that can more simply be represented by a weighted semantic graph.

I haven’t made up my mind yet about what the proper role of ANN is alongside other tools. As a physicist, I think of them as taking on the job that Effective Field Theories perform in quantum physics — smart renormalizations of processes for contextual prediction.

To C, or not to C

Science is all about strict constraints, and “AI” is all about relaxing them. So, there will always be two ways to use data:

  • Classical experimentation, where we’re basically doing high fidelity signal processing — the output is a faithful function of the input.
  • Composite data sculpture, mixing sources to create a result which isn’t any single reality, but rather an impressionistic merger of results from multiple sources.

The former is how we do science and art; the latter is more like the experience produced by human cognition. We need to keep this distinction in mind when making sense of developments in machine learning.

Good answers from good questions

If you’ve followed my research work, you’ll know that I’ve made heavy use of semantic networks (graphs), from search algorithms to knowledge representations, since about 2003, when I first became interested in the role of semantics. I abandoned the logical approaches early on and developed Promise Theory, which has shaped much of my understanding ever since. That experience culminated in a model of relationships called Semantic Spacetime, which was based on some early work in collaboration with others, and which can be applied to basically anything. I’ve applied it to knowledge representation, diagnostics, dependencies and other supply chains, the Internet of Things, 5G Edge Computing, and even Machine Learning — for a variety of companies large and small.

Over the past decade, implementing semantic spacetime concepts in code several times, each time with different tools, I’ve helped different companies frame their questions in a way that is easier to solve. The tools I chose for this rendition (Go plus ArangoDB) have proven to be the best choices so far — which shows that technology does improve gradually in matching our needs.

The nice thing about Go is that it makes ideas easy to express, even if it wins no prizes for beauty. The nice thing about ArangoDB is that it’s not so much a database as a data processing platform, which can do advanced processing over distributed locations. This is what we need for the edge challenge. The lack of strong opinionation in either of these allows us to easily work around the limitations of more rigid models.

Figure 4: We trade speed and simplicity for richness of interpretation at a slower pace when storing, then time invested in processing semantics pays off when retrieving answers.

A good data representation enables a good knowledge representation, which in turn will be able to separate and illuminate phenomena by scale, by relevance, rather than having to search every atom in a data set by brute force.

Postscript

In the series, I’ve concentrated on these core issues:

  • How to use distributed compound state in memory processes to represent data with context. Context is the true route to discriminating and thus naming data.
  • How to use graphs or networks to chart relationships, and to solve search problems quickly, without expensive brute force computation.
  • How to separate the onion layers of a system so that we don’t have to confront all the working parts of everything around us at once (coarse graining).

The better we understand the smart use of spacetime resources, the easier it will be to advance our lives, and still win back all the lost efficiency buried under archaeological layers of software bloat — to help play our own small part in saving the planet from the choking hand of the human energy footprint.

If you’ve made it this far, thank you for following this series of posts. Do comment and let me know your thoughts, and share with friends, relatives, and family pets. Until next time…

Video on SST: https://www.arangodb.com/dev-days-2021/day-4/


Mark Burgess

@markburgess_osl on Twitter and Instagram. Science, research, technology advisor and author - see http://markburgess.org and https://chitek-i.org