Universal Data Analytics as Semantic Spacetime

Eventful coincidence (x-x’)!

Coincidence is what happens when several things or processes meet at the same location: their timelines or trajectories join or cross — they are “incidentally” together. A related term co-activation is also used in biology for coincident proteins that activate processes in a kind of “logical AND” handshake, with proposal AND confirmation both required to switch on a process. It’s the same idea used in forensic investigations: if A and B are observed together, then there is some kind of connection between them — perhaps to be discovered or elaborated upon later. The role of coincidence is not always easy to discern, but in spacetime it’s simply a matter of expressing how events are composed from their coincident parts.
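As a rough illustration of the forensic idea, coincidence can be found by grouping observations that share the same location and time. This is only a minimal sketch: the `Observation` and `Event` types and the `Coincidences` function are hypothetical names invented here, and time is reduced to an integer tick for simplicity.

```go
package main

import (
	"fmt"
	"sort"
)

// Observation is a hypothetical record: an agent seen at a place and time.
type Observation struct {
	Agent, Location string
	Time            int
}

// Event identifies a point in (location, time).
type Event struct {
	Location string
	Time     int
}

// Coincidences returns, for every event where two or more agents were
// observed together, the sorted list of those agents.
func Coincidences(obs []Observation) map[Event][]string {
	byEvent := make(map[Event][]string)
	for _, o := range obs {
		e := Event{o.Location, o.Time}
		byEvent[e] = append(byEvent[e], o.Agent)
	}
	for e, agents := range byEvent {
		if len(agents) < 2 {
			delete(byEvent, e) // a lone agent is not a coincidence
			continue
		}
		sort.Strings(agents)
	}
	return byEvent
}

func main() {
	obs := []Observation{
		{"A", "lobby", 1},
		{"B", "lobby", 1},
		{"A", "lab", 2},
		{"B", "cafe", 2},
	}
	for e, agents := range Coincidences(obs) {
		fmt.Println(e.Location, e.Time, agents)
	}
}
```

Whether a shared location counts as a coincidence depends on the scale chosen for "location" and "time", which is exactly the choice discussed in the next section.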

Figure 1: An Internet traceroute graphed end-to-end, illustrating the complexity of process paths due to parallelism. Pink nodes are indistinguishable, as in a QM multi-slit experiment, and each arrow defines a proper time clock, so that the map forms a kind of wavefunction for Internet service.
Figure 2: Adding a second destination and merging the graphs, to build up a map of the whole network, adds a fork in the path while keeping some common path, as seen from the same starting node.
Figure 3: Temporary coincidence is like scattering of processes, or ships passing in the night. As a causal diagram (Feynman diagram) it looks like this.
Figure 4: Sometimes two separate processes join and stay joined, forming a new “molecule” with different semantics. This is an interaction that transforms spacetime properties.

Property or coincidence? Document or graph?

We have a choice about what happens in spacetime: we choose a scale for what encompasses a location (fine or coarse grained), and thus we have a choice about whether to consider properties as being interior to a location, or exterior between locations — as intrinsic properties or as coincidences of interaction. We can perceive each version of a story if we can observe it over such a characteristic scale, in the hope that this allows us to understand it.

Location[userID] = roomID
Room_Occupancy[roomID]++
Total_Room_Usage[roomID]++
AccessRights[userID,roomID] -> {userID, roomID, Accessrights: rwxa}
Room_Occupancy[roomID]--
List services and devices registered to HUB(roomID)
Mean_Room_Occupancy[roomID,"Wed:Hr09:Min15_20"] = NewMean(roomID,"Wed:Hr09:Min15_20")
Variance_Room_Occupancy[roomID,"Wed:Hr09:Min15_20"] = NewVariance(roomID,"Wed:Hr09:Min15_20")
Room_Occupancy[roomID]
FOR EACH service IN ROOM(roomID),
   GRAPH: device  --- is contained in --->  roomID
   GRAPH: device  --- subscribes to --->  service
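The enter and leave updates in the pseudocode above could be sketched in Go with ordinary maps. This is a minimal sketch rather than the series' own code: the `SmartRooms` type and its `Enter` and `Leave` methods are hypothetical names, and the access-rights and statistics maps are left out.

```go
package main

import "fmt"

// SmartRooms is a hypothetical container for the key-value maps named in
// the pseudocode; it is an illustration only.
type SmartRooms struct {
	Location       map[string]string // userID -> roomID
	RoomOccupancy  map[string]int    // roomID -> people currently inside
	TotalRoomUsage map[string]int    // roomID -> cumulative number of entries
}

// NewSmartRooms returns an empty set of maps.
func NewSmartRooms() *SmartRooms {
	return &SmartRooms{
		Location:       make(map[string]string),
		RoomOccupancy:  make(map[string]int),
		TotalRoomUsage: make(map[string]int),
	}
}

// Enter records a user arriving in a room.
func (s *SmartRooms) Enter(userID, roomID string) {
	s.Location[userID] = roomID
	s.RoomOccupancy[roomID]++
	s.TotalRoomUsage[roomID]++
}

// Leave records a user leaving the room they were last seen in.
func (s *SmartRooms) Leave(userID string) {
	roomID, ok := s.Location[userID]
	if !ok {
		return // the user was never seen entering
	}
	s.RoomOccupancy[roomID]--
	delete(s.Location, userID)
}

func main() {
	s := NewSmartRooms()
	s.Enter("alice", "room101")
	s.Enter("bob", "room101")
	s.Leave("alice")
	fmt.Println(s.RoomOccupancy["room101"], s.TotalRoomUsage["room101"])
}
```

Each map is a property interior to a location (the room), which is the document-style view; the graph view of the same data follows.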

Graphs connect local key-value stores

As a graph database, the same virtual representation of this scenario looks like a cluster of attributes. It will always look something like a hub and spoke diagram when some property unites a number of related things. See figure 5 below.

Figure 5: Data (users) may be clustered around different physical or virtual locations in a hub pattern. Some hubs are fixed, others may be mobile (on planes or trains). In cyberspace, location is not the simple coordinate concept we learn about in Euclidean space; it’s a graph relation. When extended to cover a sequence of different locations, the pattern could form a sequence of hubs that describes a journey (see figure 2).

Riding the four horses of semantics

I’ve alluded repeatedly, throughout this series, to the four types of connection in a Semantic Spacetime interpretation of the world. Now, having used these in practice, we can describe them more systematically.

EXPRESS: context attributes

Although the smart room data can be expressed as key value pairs, or as document data, a graphical interpretation can also be used to show how properties are connected to locations in a more intuitive light (e.g. figure 1). If the properties aren’t shared or used by others, then EXPRESSed relations are the perfect use-case for a document database format. There’s usually no need to complicate the graph by exposing irrelevant noise, but sometimes it may be expedient to do so. As a relational graph, a key value store has the topology of a hub, represented by the name of the map, with keys expressing one or more properties, surrounded by a cluster of values. Although all the keys are different, they all EXPRESS some kind of property of the same meta-type:

  • Users clustered around a room ID.
  • Users clustered around a common service (local WiFi)
  • Rooms visited by a single user during a day.
  • Devices or services accessed by a particular user.
  • Devices or services available within a given room.
  • etc.
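The first case in the list, users clustered around a room ID, can be obtained by inverting a key-value map so that each value becomes a hub and its keys become the spokes. The following is a minimal sketch under that assumption; `ClusterByHub` is a hypothetical helper name, not something from the series' library.

```go
package main

import (
	"fmt"
	"sort"
)

// ClusterByHub inverts a map of spoke -> hub (e.g. userID -> roomID) into
// hub -> sorted list of spokes, giving the hub-and-spoke view of the data.
func ClusterByHub(location map[string]string) map[string][]string {
	hubs := make(map[string][]string)
	for spoke, hub := range location {
		hubs[hub] = append(hubs[hub], spoke)
	}
	for hub := range hubs {
		sort.Strings(hubs[hub]) // map iteration order is random, so sort
	}
	return hubs
}

func main() {
	location := map[string]string{
		"alice": "room101",
		"bob":   "room101",
		"carol": "room202",
	}
	for room, users := range ClusterByHub(location) {
		fmt.Println(room, users)
	}
}
```

The same inversion applies to the other cases in the list by swapping what plays the role of hub: a service, a user, or a device.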

FOLLOWS: change and motion

A narrative is more than a list of qualities expressed by a fixed location or agent. It’s a sequence of events, each of which can express attributes. When one item or event FOLLOWS another, the pair plays the role of a causal transition, and a chain of such transitions gives the narrative its order.

Figure 6: time interpreted as a succession of observations by a fixed sensor. Recall the passport example in part 6 of the series.
Figure 7: Time and location moving together as a sequence of observations by an observer that interprets itself to be moving. This is the relativistic complement of the scenario in figure 2.
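One way to picture how FOLLOWS links order a narrative is to walk them from a starting event. This is a minimal sketch, assuming each event has at most one successor; the `Narrative` function and the `followedBy` map are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Narrative walks FOLLOWS links from a starting event and returns the
// events in causal order. followedBy maps an event to the event that
// FOLLOWS it. The walk stops at the end of the chain or on a repeat.
func Narrative(start string, followedBy map[string]string) []string {
	story := []string{start}
	seen := map[string]bool{start: true}
	cur := start
	for {
		next, ok := followedBy[cur]
		if !ok || seen[next] {
			break // no successor, or a loop back to an earlier event
		}
		story = append(story, next)
		seen[next] = true
		cur = next
	}
	return story
}

func main() {
	followedBy := map[string]string{
		"enter room": "join wifi",
		"join wifi":  "print file",
		"print file": "leave room",
	}
	fmt.Println(Narrative("enter room", followedBy))
}
```

A real process graph may branch, in which case each event would need a list of successors and the walk would yield several possible stories.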

CONTAINS: inside or outside?

A concept that we take for granted in Euclidean space, but which can’t be directly represented by a graph, is CONTAINment, i.e. what it means for one node to be inside another. However, if we allow clusters of nodes to behave effectively like a single “supernode”, then that is possible, albeit with ambiguous edges. This includes the concept of semantic generalization too: being part of a related cluster is being a member of a weak or strong generalization. Think, for a moment, of a bank as a central location that unifies customer accounts. The customers and their money are not really inside the bank, but are associated with it, yet we see this as belonging. So a CONTAINS relation concerns both the scaling of things into larger things (see figure 8) and a sense of carrying or ownership in different circumstances.

Figure 8: By imagining a boundary around clusters connected by different relationships, such as CONTAINS or EXPRESS, we can imagine a hierarchy of hubs within hubs, something like atoms and nuclei. This is how we represent scaled regions in a graph.
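The hubs-within-hubs picture can be sketched by recording, for each node, the supernode that CONTAINS it, and then walking outwards. This is a minimal sketch: `IsInside` and the `containedIn` map are hypothetical names, and the sketch assumes that containment means the same thing at every level, which is an assumption that does not always hold.

```go
package main

import "fmt"

// IsInside reports whether node lies inside region, directly or through
// nested supernodes. containedIn maps each node to the supernode that
// CONTAINS it. The same containment semantics is assumed at every level.
func IsInside(node, region string, containedIn map[string]string) bool {
	seen := make(map[string]bool)
	cur := node
	for !seen[cur] {
		seen[cur] = true
		parent, ok := containedIn[cur]
		if !ok {
			return false // reached the outermost region
		}
		if parent == region {
			return true
		}
		cur = parent
	}
	return false // containment loop in the data
}

func main() {
	containedIn := map[string]string{
		"printer": "room101",
		"room101": "building7",
	}
	fmt.Println(IsInside("printer", "building7", containedIn))
	fmt.Println(IsInside("building7", "printer", containedIn))
}
```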

Mixed representations

The four types give us a kind of graph coordinate basis by which to describe generic semantics. When processes interact non-trivially, we end up with nodes linked by several different kinds of relationship. According to the semantic spacetime model, we can always classify the relationships as one of the four types, but that won’t be sufficient.

Figure 9: two different ST styles (contains, expresses) used to model narrative relationships. We can sometimes choose to encapsulate expression as containment, e.g. in struct data types, tables, or document formats.
pairs := S.GetNeighboursOf(g,start,S.GR_CONTAINS,"+")
adjacency := S.GetAdjacencyMatrixByKey(g,"CONNECTED",false)
FOR link IN Contains FILTER link.semantics == "INSIDE" RETURN link
FOR link IN Near FILTER link.semantics == "CONNECTED" RETURN link

NEAR semantics

It’s tempting to think of closeness or proximity, the quality of being “near” something, as having to do with physical distance (which locally means hop or edge count in a graph), but as we’ve already seen, the semantics of distance depend on the semantics of the space you happen to be thinking of at a given moment. There are many possible interpretations. Agent properties can be close in shape, in location, be connected or tethered, be close in value, in time, etc. There are many formal definitions of distance that have been designed for data and for graphs, but they’re all basically ad hoc ways of embedding a graph in some Euclidean metric space, like a scatter plot. It’s understandable that we look for this kind of mathematical relation to automate similarity, but that’s also not how we decide similarity in practice. That kind of distance can change, so there’s no sense in encoding it.

  • If nodes are (approximately) similar in their expressed properties, e.g. word spellings (color and colour, or inbound and in-bound), we might consider them to be close. This interpretation is context dependent.
  • If they are directly connected by a small number of FOLLOWS links, we might consider nodes to be close (depending on the specific follow interpretation).
  • If they can be CONTAINED within a certain type, we might consider nodes to be close (depending on the specific contain interpretation).
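For the first case in the list, similarity of spellings, one possible and admittedly ad hoc measure is the Levenshtein edit distance. The sketch below swaps in that standard technique purely as an illustration; the `Near` function and its threshold are hypothetical, and the threshold is a context-dependent choice rather than anything the model prescribes.

```go
package main

import "fmt"

// EditDistance returns the Levenshtein distance between two strings:
// the minimum number of single-character insertions, deletions, or
// substitutions needed to turn a into b.
func EditDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			best := prev[j] + 1 // deletion
			if v := cur[j-1] + 1; v < best {
				best = v // insertion
			}
			if v := prev[j-1] + cost; v < best {
				best = v // substitution or match
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(rb)]
}

// Near reports whether two spellings are within the chosen threshold.
func Near(a, b string, threshold int) bool {
	return EditDistance(a, b) <= threshold
}

func main() {
	fmt.Println(Near("color", "colour", 1))
	fmt.Println(Near("inbound", "in-bound", 1))
	fmt.Println(Near("color", "room", 1))
}
```

Because the distance is computed on demand here, nothing about it has to be stored in the graph.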

Precise choosing and naming of semantics

The purpose of a name or any other kind of relationship is to explain something to a reader. For all the sometimes strained mathematical justification presented in literature as if it were unquestionable truth, we basically engineer things to produce the answer we want. The reader of your story will thank you for being clear without excessive nitpicking.

  • Transitive relationships: if A is related to B and B is related to C then A is related to C, e.g. “is the same as”. Equivalence relations are transitive. If A is greater than B and B is greater than C, then A is greater than C. So ordering relations like “greater than” are also transitive. That’s basically because A, B, and C are all the same type of object or node.
  • Intransitive relationships: Suppose we start to model different kinds of things as nodes in the same graph. A belongs to B and B belongs to C. Does A belong to C? What could this mean? Try substituting some different things or persons for A, B, and C. Can we interpret the first “belongs to” in the same way as the second?
  • The book belongs to Mark.
  • The concept belongs to the book.
  • The concept belongs to Mark?
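One way to avoid the trap in the book example is to name each relationship precisely and only chain links that carry exactly the same semantics. This is a minimal sketch of that idea; the `Link` type, the `Related` function, and the relationship names "is described in" and "is owned by" are hypothetical choices made here to split the vague "belongs to" into two meanings.

```go
package main

import "fmt"

// Link is a hypothetical directed edge with precisely named semantics.
type Link struct {
	From, Semantics, To string
}

// Related reports whether `to` can be reached from `from` by chaining
// links that all carry exactly the given semantics (breadth-first search).
func Related(links []Link, from, to, semantics string) bool {
	seen := map[string]bool{from: true}
	queue := []string{from}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, l := range links {
			if l.From != cur || l.Semantics != semantics || seen[l.To] {
				continue
			}
			if l.To == to {
				return true
			}
			seen[l.To] = true
			queue = append(queue, l.To)
		}
	}
	return false
}

func main() {
	links := []Link{
		{"A", "is greater than", "B"},
		{"B", "is greater than", "C"},
		{"concept", "is described in", "book"},
		{"book", "is owned by", "Mark"},
	}
	fmt.Println(Related(links, "A", "C", "is greater than"))
	fmt.Println(Related(links, "concept", "Mark", "is owned by"))
}
```

With a single vague name such as "belongs to" on both links, the same search would wrongly conclude that the concept belongs to Mark.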

Internet/BGP example

BGP is my go-to example for networking, because it covers so many concepts, from scaling to causality, and because it forms the central core of the Internet, which few know about and fewer still understand. It’s everything from edge computing to centralized scalability, and today it’s being re-used as a redundant spacetime switching fabric in datacentres too. We can sketch out a simple model of BGP issues in terms of the semantic horsemen.

  • The smallest addressable entity on the Internet is a Network Interface Card (NIC), which EXPRESSES an IPv4 and/or an IPv6 address, but in terms of routing, an IP prefix is the smallest fragment. The prefix CONTAINS the IP addresses assigned in that space: they are interior to the prefix. It also EXPRESSES these to the outside world so that they can be reached.
  • End to end paths can be traced using the “ping” or ICMP Echo protocol, using a tool such as traceroute (see example above). In a traceroute, each journey is a sequence of events called hops, and each hop FOLLOWS the last, for a single journey. The next time we try it, the journey from end to end could be partly or completely different. Thus there is a difference between the map of all possible paths (the BGP “wavefunction”) and the actual path taken in a trial observation.
Figure 10: The approximate hierarchy of addressing in Internet management, with IP addresses at the bottom and assignment of prefixes by “AS” policy regions at the top.
  • One or more IP addresses map to a DNS/BIND domain, e.g. example.com. The DNS domain is a coarse-grained overlay of virtual nodes which CONTAIN disjoint IP addresses and prefixes.
  • One or more prefixes map to a routing domain, which attaches to an IP organization. These are typically hosted by large “Telco” providers. The old model of class A, B, C networks is replaced by CIDR prefixing.
  • An autonomous organization may be associated with a BGP domain. A BGP policy domain is called an Autonomous System (AS). On the interior of a BGP domain, routing is performed by some protocol (OSPF, IS-IS, RIP, etc). Between AS domains, routing is performed by border gateway routers using the eBGP protocol. eBGP routers are NEAR each other, by definition.
  • eBGP has no hierarchy; it is a “peer to peer” network. Each peer shares information about its routes and neighbours with all its neighbours. If there are several gateway routers in a non-singleton domain, BGP information is equilibrated by the separate iBGP protocol.
Figure 11: ASes form bubbles inside which local routing takes place. Between ASes, routes are exchanged voluntarily and cooperatively by “Border Gateways”, which represent directions using eBGP. BGP learns and maintains a local node view of which prefixes can be reached in which direction, by sharing data with peers.
Figure 12: An end to end path is routed by a number of protocols, passing through a number of autonomous regions, all of which have to learn about each other’s presence by “machine learning” of prefixes. This enables a ranking or prioritization of possible pathways. Algorithms on all levels determine final routes for forwarding end to end packets.

Summary

  • FOLLOWS (direction) explains the causal order of a process, i.e. dependency.
  • CONTAINS (hierarchical order) explains generalizations and group memberships.
  • EXPRESSES (intrinsic attribute) describes a node and its interior properties. The node acts as a hub for its attributes.
  • NEAR (comparison without order) expresses similarity or approximation to find related items during search. It can also express mutual connectivity.

Mark Burgess

@markburgess_osl on Twitter and Instagram. Science, research, technology advisor and author - see Http://markburgess.org and Https://chitek-i.org