Universal Data Analytics as Semantic Spacetime

Types are not enough…

I recall an episode from many years ago, during the early development of Promise Theory, in which a PhD graduate from the University of Oslo came to discuss my approach and tell me about his problem. The student had worked on Object Oriented Type-Analysis for his thesis and had been unable to resolve a final issue. His advisor had identified the case as a paradoxical problem in OO modelling, and they had spent some years working on it. After hearing about the problem, I sketched it on the blackboard using Promise Theory and showed him how I would have done it. His mouth fell open as he saw how obvious the answer was: a simple shift in thought, from the imposed obligations of a rigid type model to autonomous promises, made the issue unambiguous. Freed of a restraining doctrine, we solved the problem immediately. It was one of many incidents that pushed me towards network modelling without the artificial logical structures that are traditional in mathematics, and towards ideas like Semantic Spacetime.

Graph spacetime

Suppose you want to model relationships between a number of individuals. You can’t do that with a one-dimensional histogram. With a one-dimensional key-value table, the best you could do is count the number of interactions each individual experiences (with everyone else) and store that count as a key-value pair. It would be natural to store that count in the node, because it’s information only about the node itself. Interactions, on the other hand, need two-dimensional tables: matrices or 2-tensors. If we make a table whose rows and columns are labelled by the individuals’ names, we get a so-called adjacency matrix.

           A  B  C  D  E  F  G  H   (columns)
(rows)  A  0  -  -  -  -  -  -  -
        B  -  0  -  3  -  -  -  -
        C  -  -  -  -  -  -  -  -
        D  -  1  -  0  -  -  -  -
        E  -  -  -  -  0  -  -  -
        F  -  -  -  -  -  0  -  -
        G  -  -  -  -  -  -  0  -
        H  -  -  -  -  -  -  -  0

matrix(r,c) = value, for row r and column c (A = 1, B = 2, ..., H = 8)
matrix(2,4) = 3   (row B, column D)
matrix(4,2) = 1   (row D, column B)

Graphs: associations, links, or “edges” vs types

Let’s begin by thinking about the kinds of stories we want to tell with data. Numbers alone don’t usually give us a complete picture; we still need data types of some kind. Processes involve intentional steps, e.g. in financial transactions or delivery logistics. Networks describe the transitions (links) and their possible constraints, not just the resting places (nodes) along the journey. Graph modelling allows us to move on from rigid typological models to relational assertions like:

  1. A emailed B
  2. A is the child of B
  3. A employs B
  4. A is an alias for B
  5. A belongs to B
Figure 1: A directed, labelled “edge” or link in a graph between objects A and B.

Nodes, “vertices”, agents — and their types

In the examples (1–5) above, the nodes A and B also need an interpretation, i.e. we need to describe their intended semantics too. In the first two cases, A and B are people. In the third case, A could be a person or some kind of organization. In the fourth case, A and B seem to be people, but in fact they are names of people. In the fifth, A is some kind of object and B is a person. Does this matter? The simple answer is yes, if we want to be able to decipher meaning on a larger scale than a single link. Indeed, that’s what we mean by inference or reasoning. For example, a rule about contact between A and B has to anticipate every verb that might label such a link:

IF A (emailed OR spoke OR shouted OR texted OR tweeted OR *) TO B
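
To see why link labels matter for inference, here is a sketch of the IF rule above in Go, assuming links are stored as (From, Label, To) triples. The Edge type, the communicates set and the contacted function are hypothetical illustrations.

```go
package main

import "fmt"

// Edge is a directed, labelled link: From --Label--> To.
type Edge struct {
	From, Label, To string
}

// communicates is a hypothetical set of verbs treated as one kind of link,
// taken from the IF rule in the text.
var communicates = map[string]bool{
	"emailed": true, "spoke": true, "shouted": true, "texted": true, "tweeted": true,
}

// contacted reports whether any edge from a to b carries a communication label.
func contacted(edges []Edge, a, b string) bool {
	for _, e := range edges {
		if e.From == a && e.To == b && communicates[e.Label] {
			return true
		}
	}
	return false
}

func main() {
	edges := []Edge{
		{"A", "emailed", "B"},
		{"A", "is the child of", "C"},
		{"C", "employs", "A"},
	}
	fmt.Println(contacted(edges, "A", "B")) // true: "emailed" is in the set
	fmt.Println(contacted(edges, "C", "A")) // false: "employs" is not a communication verb
}
```

The wildcard * in the rule has no direct equivalent here: every new verb has to be added to the set by hand. That bookkeeping grows with each new label, which is why it pays to group links into a small number of kinds.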

Rethinking associative link semantics

These examples show some of the naming issues that arise when we try to express link semantics, and they make clear that we need to choose modelling relationships carefully if we don’t want to get tangled in a spaghetti of inferential complexity. Thankfully, semantic spacetime principles show us how to make this straightforward, by reducing all interactions to four meta-types that describe only the causal structure of processes. We can be smarter by pre-classifying relationships into broadly actionable “kinds”, based on their spacetime process semantics. The key is to recognize that every relationship in a process is one of four spacetime constructs:

  • A “CONTAINS” B (a kind of spatial relationship),
  • A “FOLLOWS” B (a kind of temporal relationship),
  • A “EXPRESSES” B (a local property), and
  • A “is NEAR to” B (a kind of approximate similarity).
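
Pre-classifying labels can be sketched as a lookup from each label to one of the four meta-types. The constant values match those defined later in this post (Figure 4); the particular labels, their classification, and the stTypeOf and sameKind names are illustrative assumptions, not a canonical list.

```go
package main

import "fmt"

// The four spacetime meta-types (same values as in Figure 4).
const (
	GR_NEAR      = 1 // approx like
	GR_FOLLOWS   = 2 // i.e. influenced by
	GR_CONTAINS  = 3 // inside/outside
	GR_EXPRESSES = 4 // represents, etc
)

// stTypeOf is a hypothetical classification of a few link labels.
var stTypeOf = map[string]int{
	"contains":   GR_CONTAINS,
	"is part of": GR_CONTAINS,
	"follows":    GR_FOLLOWS,
	"precedes":   GR_FOLLOWS,
	"expresses":  GR_EXPRESSES,
	"has name":   GR_EXPRESSES,
	"is like":    GR_NEAR,
}

// sameKind reports whether two known labels share a meta-type,
// so that a rule written for one kind covers both.
func sameKind(a, b string) bool {
	ta, okA := stTypeOf[a]
	tb, okB := stTypeOf[b]
	return okA && okB && ta == tb
}

func main() {
	fmt.Println(sameKind("contains", "is part of")) // true
	fmt.Println(sameKind("follows", "is like"))     // false
}
```

With such a table, a rule can be written once per meta-type (for example, everything of type CONTAINS) instead of once per verb.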

Modelling sort-of graphs in Go

In Go, we can’t easily make explicit graphs in a natural way. However, we can form matrices from the associative arrays called maps (see part 4 of this series) in a couple of ways. A simple edge in an interior graph may be represented as a map entry associating one item with another, or as a map keyed by (from, to) pairs. Go’s type discipline forces the key and value types to be consistent for all entries.

// One edge per key: "A is the child of B"
var child_of = map[string]string{"A": "B"}

// An edge keyed by its (From, To) pair: "A employs B"
type VectorPair struct {
	From string
	To   string
}

var employs = map[VectorPair]bool{{From: "A", To: "B"}: true}
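
The two map encodings behave differently when a node has more than one link. This sketch uses only the standard library (the targets helper is hypothetical) and shows that a map[string]string keeps only one target per key, whereas a map keyed by VectorPair can hold many edges, like the non-empty cells of an adjacency matrix.

```go
package main

import (
	"fmt"
	"sort"
)

// VectorPair is a directed edge key.
type VectorPair struct {
	From string
	To   string
}

// targets returns, in sorted order, every node that from links to.
func targets(edges map[VectorPair]bool, from string) []string {
	var out []string
	for p, ok := range edges {
		if ok && p.From == from {
			out = append(out, p.To)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// map[string]string: one target per key, so a second edge from "A"
	// overwrites the first (A is the child of both B and C).
	child_of := make(map[string]string)
	child_of["A"] = "B"
	child_of["A"] = "C"
	fmt.Println(len(child_of), child_of["A"]) // 1 C

	// map[VectorPair]bool: many edges per node.
	employs := make(map[VectorPair]bool)
	employs[VectorPair{From: "A", To: "B"}] = true
	employs[VectorPair{From: "A", To: "C"}] = true
	fmt.Println(targets(employs, "A"))                   // [B C]
	fmt.Println(employs[VectorPair{From: "B", To: "A"}]) // false: a missing key reads as "no edge"
}
```

The pair-keyed map is effectively a sparse adjacency matrix: only the cells that hold a value take up space.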

Associations in Arango

To code associations in ArangoDB, we can go beyond the key-value linkage discussed in the previous post and use an explicit graph model. A graph is built, as a network, by defining collections of nodes (vertices) and links (edges) as separate data sets. Edge collections give a direct representation of adjacency matrices: each edge document is one compound matrix value, linking a row node to a column node.

Figure 2: A graph representation of some “entities”. It’s a natural way to think about relationships in pictorial terms — but the difference between salvation and damnation lies in the proper labelling of the arrows.

Associations with causal semantics

The inverse problem for describing links, mentioned at the start of this piece, shows us that associations form natural groups. They can be directional, so we want to define both forward and reverse interpretations. It might also be useful to define negative semantics for each link, to exclude nodes explicitly, so it’s worth planning ahead. For example, if we want to say that “A contains B”, there are four variants:

  • Forward: “contains”
  • Backward: “is part of”
  • Negative-forward: “does not contain”
  • Negative-backward: “is not part of”
Figure 3: separating link semantics from the link using a look-up table (map).
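
The four variants mean that one stored link can be read from either end, with or without negation. A minimal sketch, assuming the texts of the “contains” example above; the variants type and read function are hypothetical:

```go
package main

import "fmt"

// variants holds the four readings of one link group.
type variants struct {
	Fwd, Bwd, NFwd, NBwd string
}

var contains = variants{
	Fwd:  "contains",
	Bwd:  "is part of",
	NFwd: "does not contain",
	NBwd: "is not part of",
}

// read renders the link a -> b. Forward readings start at a, backward
// readings start at b; negative selects the negated text.
func read(v variants, a, b string, backward, negative bool) string {
	switch {
	case !backward && !negative:
		return a + " " + v.Fwd + " " + b
	case backward && !negative:
		return b + " " + v.Bwd + " " + a
	case !backward && negative:
		return a + " " + v.NFwd + " " + b
	default:
		return b + " " + v.NBwd + " " + a
	}
}

func main() {
	for _, back := range []bool{false, true} {
		for _, neg := range []bool{false, true} {
			fmt.Println(read(contains, "box", "ball", back, neg))
		}
	}
}
```

Only one edge is stored; the direction of reading and the sign choose which text to show.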

Making an SST toolkit

An edge group type could be represented as a global Go map variable, as follows (the map could itself be stored in the database as a key-value table, as shown in part 4):

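ASSOCIATIONS["groupalias"] -> structured edge type values:
  1. The forward explanation text.
  2. The reverse direction text.
  3. The forward negative.
  4. The reverse negative.
A link is then written as its two nodes, its association, and a sign selecting the positive ("+") or negative ("-") readings:
(A, B, ASSOCIATIONS["alias"], "+")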
type Association struct {
	Key    string `json:"_key"`   // the group alias, also the document key
	STtype int    `json:"STType"` // one of the four spacetime types below
	Fwd    string `json:"Fwd"`    // forward explanation text
	Bwd    string `json:"Bwd"`    // reverse direction text
	NFwd   string `json:"NFwd"`   // forward negative
	NBwd   string `json:"NBwd"`   // reverse negative
}

var ASSOCIATIONS = map[string]Association{
	"CONTAINS": {"CONTAINS", GR_CONTAINS, "contains", "belongs to or is part of", "does not contain", "is not part of"},
}

var STTYPES []IntKeyValue

const (
	GR_NEAR      int = 1 // approx like
	GR_FOLLOWS   int = 2 // i.e. influenced by
	GR_CONTAINS  int = 3 // inside/outside
	GR_EXPRESSES int = 4 // represents, etc
)
Figure 4: the four elementary spacetime relationship types.

Associating with edges in Arango

Finally, we want to be able to define edges or links between things/nodes in an Arango database. Because we’ve separated the semantics, all we need to do is attach the group-alias of the link type to each edge (figure 2). We want to be able to code something like:

CreateLink(node1, "group-alias", node2, weight)
CreateLink("Norwegian", "IS_LIKE", "Swedish", 60.0/100)
Figure 5: browsing the links collection in Arango, we see the document structure of links.
Figure 6: a fully formed semantic spacetime model may have several node and edge collections
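
The CreateLink call can be sketched without a database, assuming an in-memory table. Here CreateLink is a hypothetical stand-in that stores only the group alias on the edge and rejects aliases missing from ASSOCIATIONS; the IS_LIKE texts are invented for the example.

```go
package main

import "fmt"

const GR_NEAR int = 1 // approx like

// Association is trimmed from the struct in Figure 4 (json tags omitted).
type Association struct {
	Key    string
	STtype int
	Fwd    string
	Bwd    string
	NFwd   string
	NBwd   string
}

// ASSOCIATIONS: the IS_LIKE texts are hypothetical.
var ASSOCIATIONS = map[string]Association{
	"IS_LIKE": {"IS_LIKE", GR_NEAR, "is like", "is like", "is not like", "is not like"},
}

// Link stores only the group alias, not the explanatory texts.
type Link struct {
	From, To, Alias string
	Weight          float64
}

// CreateLink is a hypothetical in-memory stand-in for the database call.
func CreateLink(from, alias, to string, weight float64) (Link, error) {
	if _, ok := ASSOCIATIONS[alias]; !ok {
		return Link{}, fmt.Errorf("unknown association %q", alias)
	}
	return Link{From: from, To: to, Alias: alias, Weight: weight}, nil
}

func main() {
	l, err := CreateLink("Norwegian", "IS_LIKE", "Swedish", 60.0/100)
	if err != nil {
		panic(err)
	}
	a := ASSOCIATIONS[l.Alias] // semantics are looked up, not stored on the edge
	fmt.Printf("%s %s %s (weight %.2f)\n", l.From, a.Fwd, l.To, l.Weight)
	// Norwegian is like Swedish (weight 0.60)

	_, err = CreateLink("Norwegian", "IS_LIEK", "Swedish", 0.6)
	fmt.Println(err) // unknown association "IS_LIEK"
}
```

Note the weight 60.0/100: in Go, 60/100 is a constant integer division and evaluates to 0, even when passed as a float64 argument.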

Summary: building on the SST package

In the last few posts, I showed how to put together a few structures and wrapper functions in Go. Here, we added a lookup table with a data model to manage the subtleties of graph relationships from the beginning. These are simple tricks that I’ve learned from bitter-sweet experience, and I recommend some version of this advice to everyone working with semantically labelled data, whether for edge computing, big data, or machine learning. We’re now starting to build up methods that simplify working with semantic data in all the relevant formats: sometimes tables, sometimes documents, sometimes graphs. This is where a multi-model database is a friend. To encourage everyone to adopt better tools, we also need to develop natural idioms for using them across a variety of data representations.

Mark Burgess

@markburgess_osl on Twitter and Instagram. Science, research, technology advisor and author - see Http://markburgess.org and Https://chitek-i.org