# Universal Data Analytics as Semantic Spacetime

Part 9: Graph structure, process computation, and dimensional reduction

*There’s surely an uncountable number of atoms in a beam holding up your roof, and it’s clear that we don’t need to refer to each atom to talk about the role of the beam. The beam is a **superagent**; it has its own story, on a much larger scale than the atoms. It relies compositionally on the atoms within it, but they don’t play a direct role in the function of the beam. You could replace the specific atoms with something else, and it would still work. There is a **separation of phenomena** across scales — effective descriptions dominate the stories we tell. When we shift perspective, from one scale to another, eliminating unwanted degrees of freedom, we both reduce the dimensionality of the problem and reorganize computation to achieve a sufficient answer. In the process, we go from something “uncountable” to “a manageable few”. In this post, I want to discuss the foundations for such **effective pictures**, coarse-graining, and renormalization of models, when computing answers from graphs.*

# Answers

In studying graphs, we’re not just looking for shortest paths, or how to get from one point to another in the quickest time: we may actually want to know where the weakest points in a network are, or find alternative (even hidden) routes, backdoors, and covert channels. We might be looking for the optimum place to establish a new node, in order to serve others, or to depend on a critical source. We might want to turn a timeline of events into a map of knowledge, turning *happenings* (“X reads a book”) into *invariant state* (“X knows the book”). Finally, we may look for co-activation clusters to find correlated events, items, or relationships!

Graphs are maps, processes, and calculators all in one. The first machine learning graph was the Internet routing infrastructure: **RIP**, **OSPF**, **BGP**, etc. Today, we have begun to associate graphs with machine learning, particularly for **“AI”** (Artificial Intelligence), in a number of ways: the strengths of causal connections can be used to record history and predict priorities; the clusters that form large scale structure can identify patterns of similarity for inference. Graphs and networks are now basic tools of “AI”. In all these cases, how we identify **process scales** is key to finding the answers we seek.

However, when we look at a huge graph, we are looking at more than just a database — it’s filled with both the remnants of old and harbingers of new *processes*. We might be interested in querying its large scale patterns, single paths, or topographical features — but this requires iterative interaction with the many alternatives through it. A one-off query statement, such as we’re used to from archival (e.g. SQL) databases, usually won’t cut it. Few other kinds of database can easily show patterns in data in the same way that a graph database can, but that doesn’t mean it’s easy. This whole topic makes graphs part of an unfolding story about **knowledge representation** that probably begins and ends with **neuroscience** (a story for another time).

# Learning discriminants

An atom inside a structural beam has no individual significance — it merely plays a role in a larger structure, thus we look mainly to that larger structure for answers. Unless the single atom is a critical part of another process (say holding together the mouth of a crack), we don’t want to deal with it explicitly. We work with the beam on the scale of the role *it* plays.

The concept of being *inside* and *outside* of some place is awkward in the Euclidean model of space because there are no natural boundaries, but it plays a major role in graph semantics. In a network, boundaries are easily drawn, and influence from outside a region can only pass along channels that are links or edges in the graph.

An outside influence may imprint changes throughout the interior of the region it touches, not just at the edge or point of contact, by *propagation* or iterative flow. These external influences can thus be *learned* and distributed throughout the *interior configuration* of links and nodes. This is how a network can act as **memory**. In society, it’s often the habitual knowledge, built into ongoing process relationships between people and groups, that keeps social practices alive, not so much the passive knowledge documented inside books or agents. For example, we sometimes call this distributed process memory “muscle memory” when it refers to ourselves.

So, the impression left by exterior “influences” is recorded on some level by the *strengths* and *types* of bonds between things or steps in a process. This is how cognitive systems learn. A graph is a **learning network**. The links between atoms in the metal beam, like the bonds in living tissue (e.g. bone, muscle, and skin), update their configurations *dynamically* in response to exterior stress information too. They learn the pattern of stress by adapting to it. Holograms and paths worn across grassy lawns do the same. Learning is a cognitive process — not unique to humans. It’s everywhere!

To make use of cognitive processes, wherever they may be, we have to see salient features at the right scale. A useful scale for learning is usually much bigger than the scale of the smallest change in a system, because small details are changing too quickly — atoms and beams. We need **persistence** to make sense of ideas. Thus statistical averages play a key role in processes of interest, because they separate persistent signal from fluctuating noise. Averaging is a way of finding **effective invariants**, i.e. the stable message within a body of data. It does so by summing over the “random” variations, which change faster than the persistent scale we’re interested in, hoping for something more persistent to emerge.

# Renormalizing graphs and “summarization”

The technique of integrating out unwanted details (known as **degrees of freedom**) goes by different names in different fields of science and engineering: *dimensional reduction*, *coarse-graining*, *pooling*, *renormalization*, etc. It’s the basis for averaging, statistical analysis, the classical measurement of quantum processes, and also for **Artificial Neural Networks** and now **Graph Neural Networks**.

In network routing, summarization is the process by which IP addresses are replaced by their common prefixes to reduce the dimension of information about the network for signposting.

When we want to eliminate detail, we look for **patterns** that express **equivalences**, and try to collect them under a common umbrella. Then we replace the collection with a single identifier — something simpler. To do this without losing too much of their meaning, we have to look to the local semantics of agents and nodes, i.e. their specific functional roles in the processes of interest, and try to preserve causality on a composite level. This is just the **coarse graining** described in the previous post (part 8), where we looked for “scattering regions” with common inputs and outputs. It always has to happen relative to a process. Recognition of a pattern is always based on some underlying concept of *process semantics*, because it’s processes that carry meaning, not things. The devil, of course, is in the details — how we decide to encode as much or as little of that pattern as we need.

In figure 1, for instance, we start out with a graph of distinguishable atomic components, which **EXPRESS** properties with names like H and O. They are bound together into super agents that **CONTAIN** the parts H and O. The superagents can be labelled with a new name H₂O, which contains less information than the original graph, but **EXPRESS**es new semantics on the scale of the molecule. All we did was to integrate out some local detail in the semantic neighbourhood of the molecular parts. If we now coarse grain the whole system graph for this pattern, we can identify a single component superagent which we rename as “water”.

In each case there is a dimensional reduction associated with the semantic scale of the data representation. How should we go about doing this? There are many possible approaches. With an underlying basis for semantics, using the four spacetime types, we could calculate this simply as we did in the last episode.

But not everyone models graphs with clear semantics. Then we need to infer such a discipline indirectly. This is what graph embeddings try to do. I’ll not go into that topic here, because it’s orthogonal to the semantic spacetime approach and requires significant processing to stand a chance of recognizing roles. When that’s done, one still has to assign new names to the clusterings and divine their meanings. I’ll be returning to this point in the next post too, to show how we can use this as a practical technique.

# Flow gives us a way to generate calculations

In physics, the generator of the **proper time** is a rule for transforming the state of a physical system into a new state — a flow of “**energy**” (associated with the Hamiltonian of the system). This is a graph process, as physicist Richard Feynman showed by representing processes as **Feynman Diagrams**. Analytical applications — in which graph relationships represent on-going dynamical processes — are effectively *flow* systems too. The effective **adjacency matrix** of the graph, at different scales, has the role of a *propagator*, and the flow becomes a way to compute outcomes.

Everything is a process. Even when we think something is static, there’s a process associated with *observing* it that takes time and effort! Whether a **flow** is data passing through a pipeline, money passed by transactional exchange, histories over citations in journal publications, or just conversations exchanged between employees in an organization, in each case there is a kind of “**vote**” that gets passed from node to node in a network. Directed graphs are thus the circuitry that expresses this **voting flow**. The lucky “winners” (or perhaps losers) in this view are those nodes which receive most “votes” during a process’s history.

# Frobenius-Perron eigenvectors

A simple way to calculate and summarize this score in a steady state graph is by using the **Frobenius-Perron** result for the principal eigenvector of a graph’s adjacency matrix! It allows us to compute **eigenvector centrality**. Or, in plain language, the effective score for votes, summed over the whole graph. If we base current characteristics on statistics, it’s related to the technique of **Principal Component Analysis**. Some examples, where this is useful for finding **bottlenecks** and **Single Points of Failure**:

- **Data pipelines**: dependencies and interoperability.
- Human resources and **social influencer networks**: voting or reliance.
- Production/collaboration **delivery networks**: service delivery and dependency.

**Voting** is only a numerical way of counting **importance**, but it’s an easy one to understand. Semantics on graph nodes and links allow us to supplement this with qualitative criteria.

# Example calculation: node importance ranking

The network centrality or importance story begins with the concept of flow and equilibrium in a network. There are several measures of how “central” or “important” a node is. The one I find most useful is called eigenvector centrality. The most important nodes are the ones with the most important neighbours. If we take the “weight” of a node and carry it along the links it connects, then it flows mainly towards certain centres. The graph computes the value automatically just by following the flow. That leads to a circular definition, which can be solved by an eigenvector equation:

Av = kIv,

where A is the adjacency matrix for the graph, I is the identity matrix, and k is a constant (eigenvalue). That leaves v as a vector whose solution components are the relative ranks of **centrality** or **importance** for each node (see figure 2).

A node’s centrality score also simply indicates the points of fragility for a system, because the impact of a failure will be larger closer to a very central node. This is why military strategy targets key infrastructure to disable an enemy. Semantics become more important when network weightings refer to:

- Citation networks (assumed fair and representative)
- Dependency networks (“supply chain”) etc of components
- Smart building systems

Centrality is a numerical ranking, not automatically designed to account for semantics. It’s also not designed to work for directed graphs. These are issues that can be dealt with (and which I’ve written a lot about over the years). When applied to directed graphs, eigenvector centrality is a subtle matter. In a directed graph, a flow never reaches an equilibrium (i.e. a common “sea level” of votes), it just follows the arrows and runs down the drain. There are various “sink remedies” for this problem — some of which will alas still be patented for a few more years. The simplest remedy is to remove the arrow direction from all links; then the flow closes, and the eigenvectors become stable under the **Perron-Frobenius theorem**.

# Node degree and eigenvector centrality

The simplest kind of importance rank would be an entirely local one, found just by counting the number of neighbours each node has. The idea is that a node with many friends or neighbours is more important, in a kind of sociological sense, than a node with none. It sounds like a trite observation, couched in these terms, but if we think about the work done by the nodes as being proportional to the “stuff” they process, then this can be used to calculate a kind of measure of work as a result of exchange, which is useful in flow-based systems.

This brings us back to the importance of hubs again. The hub of a small graph cluster is thus the most important member in a group, since it has the most direct connections. Managers might like this interpretation of importance — it fits the classical hierarchy of management (see figure 2). But why should we only consider a local voting horizon? Each neighbour may be more important because of its neighbours! What about cumulative importance within an entire organizational network?

Consider this example from my book **A Treatise on Systems** Volume 1 (originally from the Archipelago paper mentioned below). Figure 3 below shows a work process that forms a human-computer network. The nodes form a graph, and the labels tell us something about the semantics.

Back in 2003, some research students, colleagues and I developed a graph importance analysis tool called **Archipelago**, which used a centrality mechanism to find the local maxima for graph centrality, and organize the visualization around these “different islands” of importance, like a planetary system with major bodies and satellites. The idea can be seen in figure 3. If we take the simplest assumptions, i.e. that all the links are of equal weight and all the nodes are the same, then we can form the matrix of connections, symmetrize it, and calculate the importance. The result shows which nodes are the most important (at the centre of the regions) and which are close to them. When the graph is directed, there are additional subtleties to deal with. The method was similar in concept to a **Principal Component Analysis** clustering method, but with qualitatively different outcomes based on importance. Neural *auto-encoder* methods and *convolutional neural nets* can also be used for dimensional reductions, at higher cost and with meanings that are possibly less clear in a deterministic sense.

So, let’s calculate centrality, as an example graph computation, and find a way to compute it for some fraction of the nodes and links in a graph database. It assumes that graphs won’t grow too large, but that’s a different issue that we’ll cover in the final installment.

# Matrix (tensor) multiplication from edges

The essence of graph computation is matrix multiplication. Using **ArangoDB**, this is easy to do because an **Edge Collection** is basically the adjacency matrix for links. This is one of the *superpowers* of **ArangoDB** compared to other databases I’ve tried. We can copy, modify, derive, and operate on document collections as an **operator algebra**. Eigenvector centrality is calculated as the solution to an eigenvalue equation:

*(Graph adjacency matrix) x vector = (scale factor) x vector*

The distribution of values in the vector represents the effect of neighbours voting by pointing at each other, which is effected by multiplying by the matrix of arrows. If we can do the matrix multiplication, we can do a lot of powerful analytical things. Graphs are basically matrices and matrices are graphs. The nodes of a graph form the basis (a list of names that label) for a vector. Suppose each node in a graph contains a number (called “weight”), then we can form a vector of all the nodes in the graph.

*Weight[node] = weight*

Similarly, if we assume that each link between node_i and node_j has a numerical strength associated with it, we could write something like

*Strength[i][j] = link(node_i,node_j)*

Strength becomes a matrix called the **weighted adjacency matrix**. If all the strengths are set to 1, then it’s just an array with 1 where there’s a link and 0 where there isn’t (this is sometimes called **one hot encoding**). This matrix representation is the basis for many methods relating to graphs, and one of those is importance-ranking using “eigenvector centrality”.

Matrix multiplication is potentially both CPU and memory intensive for large graphs. That’s one reason why deep learning algorithms require huge resources to perform even “simple” operations. We can perform matrix multiplication directly in a graph database. Row-column and graph databases make this straightforward (see **multiply.go**). If you know exactly what you’re doing and can invest in production scale services, with developer support, algorithmic graph processing frameworks like **Pregel** can also offer some efficient runtime solutions to certain well defined problems, but I won’t discuss this here.

The **Frobenius-Perron theorem** tells us that we can compute the principal eigenvector (the one with the largest eigenvalue, which gives the centrality score) of a non-negative matrix just by multiplying some initial vector by the adjacency matrix a few times. In practice, if we start with a vector of uniform weight *(1,1,1,1,1,1,1,1,…)* and multiply it by the adjacency matrix, we get the unnormalized eigenvector after two or three multiplications. To get the adjacency matrix, we just need the links. Here, the Arango model is nice because we only need the link collections to do the multiplication of adjacency — see the function **GetAdjacencyMatrixByInt()** in the Go SST library, then **GetPrincipalEigenvector()**, which just does the multiplication.

# Use Go or Arango to calculate?

Although databases promote their query languages, working with a query language is something to be avoided in most databases. Like all Domain Specific Languages, query languages are quirky and limited by the deliberately narrow model they encourage — and yet people will always try to push them beyond their intended mandate. In that sense, query languages are the nemesis of every database user. The syntactic acrobatics needed to apply transactional operations to complex symbolic calculations end up stretching even the most elastic of brains beyond their limit. Fortunately, ArangoDB has an intuitive query language, well suited to programmatic thinking. For more imaginative applications, we always need the full power of a general programming language, and database drivers, to move data around and combine data in meaningful ways. This is where a language like Go comes in. But there’s a trade-off — it’s not a simple choice, because client languages require us to move data around, which is expensive.

Using a client program (written say in Go) to retrieve data and send it back and forth across a network may incur significant delays due to the latencies of the data path. They may also test the memory limits of the client machine. A database server, on the other hand, which may have more memory, has first hand access to the data, though it might be quite busy handling requests from many clients. Should we impose on its CPU time to perform calculations, or push large volumes of data over a network?

By way of illustration, let’s do both. In the repository, the programs (**cgraph1.go** and **cgraph2.go**) show how to perform the calculation using Go and using ArangoDB, using copies of data rather than altering an original source graph. For the latter, we use ArangoDB’s multi-model ability to keep different collections (different directories) to make copies of graph data that we can operate on directly, without altering the source data. In other words, we use collections as scratch-pads for computation as discussed in parts 4 and 5. You can verify that they produce the same answer in each case.

Figure 5 shows the example graph used in the code, and the histogram representation on top of the graph, indicating the central node. The adjacency matrix for this graph is multiplied by a uniform starting distribution:

```
( 0 1 0 1 1 )   ( 1 )
( 1 0 1 0 1 )   ( 1 )
( 0 1 0 1 1 ) x ( 1 )
( 1 0 1 0 1 )   ( 1 )
( 1 1 1 1 0 )   ( 1 )
```

If we label the nodes by vector row 1–4 from top left, clockwise, and make node 5 the central node, then multiplying once gives (3,3,3,3,4). Multiplying twice gives (10,10,10,10,12), and so on. Normalizing after a few iterations gives (0.19,0.19,0.19,0.19,0.23), showing the greater importance of node 5.

Using the Go client to read from the database and then perform the calculation in memory, we have a structure like this (**cgraph1.go**):

```go
adjacency_matrix, dim, keys := S.GetAdjacencyMatrixByInt(g, "CONNECTED", false)

ev := S.GetPrincipalEigenvector(adjacency_matrix, dim)

S.PrintVector(ev, dim, keys)
```

We use the **CONNECTED** collection as the subgraph adjacency matrix (a form of **NEAR** relation). This is efficient in ArangoDB, because of the way we can separate data by collection. Finding the principal eigenvector is a simple matter of multiplying this matrix a few times. It converges quite quickly, without the need for complicated convergence tests:

```go
ev = MatrixMultiplyVector(adjacency_matrix, ev, dim)
ev = MatrixMultiplyVector(adjacency_matrix, ev, dim)
ev = MatrixMultiplyVector(adjacency_matrix, ev, dim)
```

The alternative approach is to use ArangoDB’s query language directly (see **cgraph2.go**). The ability to form isolated collections once again helps us here. We can form a separate Node collection as a vector to operate on. We ping pong data from one collection to another, using the query language’s ability to refer directly to data by fully qualified URI:

```
FOR link IN AdjacencyMatrixCollection
    LET row = PARSE_IDENTIFIER(link._from)
    LET col = PARSE_IDENTIFIER(link._to)
    LET vec = DOCUMENT(CONCAT("FromVector/", col.key))
    UPSERT { _key: row.key }
    INSERT { _key: row.key, weight: link.weight * vec.weight }
    UPDATE { weight: OLD.weight + link.weight * vec.weight }
    INTO ToVector
```

This performs a single multiplication. It looks much more complicated, but that’s partly an artifact of combining what would be several layers of Go function calls into a single declaration. The use of **UPSERT** ensures idempotent updating, which would be handled by the **AddNode()** and **AddLink()** functions we made in part 4. The **PARSE_IDENTIFIER()** functions are used to strip off the old fully-qualified collection name so that we can **CONCAT()** a new one, thus moving data from one “subdirectory” to another.

The multiplication can be repeated, swapping out the **ToVector** and **FromVector** names with different collections. Notice how the ability to refer to the previous value in **OLD** is key to enabling this — a nice feature of the ArangoDB language.

In this example, the centre of the graph is the obvious “important” or central node: it has more connections than all the others. By changing the initial weights of the links, we can shift weight to remove the obvious symmetry.

The calculation is just for illustration. You can alter the programs to work out more interesting cases, such as figure 2 or figure 3.

# Arangolangs (or just Arangos) can help

With large computations, at some point you “reach the top and have to stop”, but the question remains: should we compute in **Go** or in **ArangoDB**? There is no simple answer to this question. What seems to be true, however, is that serialization of bulk data (whether over a network or a PCI bus) is to be avoided, especially over “cloud distances”. If we can compute results at the database’s location, operating from collection to collection, and minimize communication with a client, much time can be saved, albeit at the cost of contending for resources with shared access.

As long as we’re exploring methods on small datasets, say running on a laptop or local desktop server, with an SSD and a local ArangoDB process, the average time to run either approach is quite similar, so it doesn’t matter. But, for data even in the 10,000s of nodes, one may see a factor of two or three difference, even locally on a laptop, due to the copying and context switching in a data intensive process. In the future, a proliferation of methods involving GPUs and other specialized hardware will enable us to process all data in place. So it seems like a good investment to build with in-database computation in mind.

One of ArangoDB’s superpowers seems to be its model of collections, which can be used as temporary storage during computation. One can build a kind of operator algebra using collections. Edge collections are matrices, and node collections are vectors.

Try running the test programs (cgraph1.go and cgraph2.go). At 2000 nodes, it takes about twice as long to run the Go computation program cgraph1 as to run the calculation inside the database in cgraph2. By the time we’ve added 20,000 nodes, working inside ArangoDB is about 3x faster than pulling data into Go, which suggests roughly logarithmic scaling. That might come as a surprise, as one normally expects database lookups to add an overhead. Traditionally, that overhead would have come from rotating disk search times. In the SSD age, it is comparable to ordinary RAM.

For very large data problems with clustered representations, there’s Google’s Pregel message passing architecture, which works for a few special case algorithms, but not everyone has the tenacity or resources to run a Pregel installation, which relies on a synchronized version of the Actor model for computation. There are still many problems for which there is no need to invest in such a heavy-handed approach. It’s interesting to see how ArangoDB is integrating Pregel and trying to make it accessible without burdening the user with details.

It seems clear that the future of data processing lies inside **active datastores**, not in treating a datastore as a static central archive. Realizing this for distributed systems, e.g. in the world of 5G, is another challenge for semantic spacetime modelling.

# Next…

The computation of centrality is a simple illustration of how a graph naturally segments itself into regions, as a result of dynamical processes in the graph itself. In the next post, I’ll show how to develop this idea to radically reduce a large graph to a smaller one with equivalent properties, i.e. use renormalization to show how coarse-graining can give us certain answers more cheaply.

The examples used in this series installment reveal how spacetime processes can be used to calculate, inside and outside a system of measurement. If we use a graph as a computer, we can build computations gradually and adapt through learning. It’s not just where we capture data that we have to understand the meaning of process: as soon as we couple the world to our measuring apparatus, spacetime inside and outside are connected.

*I’m grateful to Arthur Keen for discussing this topic, especially for telling me about one-hot encoding.*