Trust, Language, and Cognition

How Promise Theory and Semantic Spacetime help to uncover weaknesses from monitoring to comprehension

Mark Burgess
Nov 1, 2022

(This essay is a summary of the author's research papers)

Most of us make the tacit assumption that the purpose of language is to enable communication between individuals: either by speech or by writing. Moreover, humans are typically upheld as the sole possessors of a language capability. Like the Victorians, and their ladder of species, we've constructed an anthropomorphic mythology around language, modelling it in our own image, and placing humans at the apex of its evolution. Species that apparently express themselves by simple alarms and signals are not usually considered to possess language, though even their squawks and grunts would properly be viewed as rudimentary language in the sense of the Chomsky hierarchy.

The cognitive agent is connected to its sensory "edge" and forms a language from the spacetime properties of the sensory channel.

Technology fares poorly with language too. For commanding and programming, we have invented structured languages to span the Chomsky processing hierarchy, but — for observation, monitoring, reflection, and alerting — current methods have failed to develop adequate languages that capture concepts to report on the states of information technology. We are stuck with a primitive notion of alarms and kernel measurements originating from the 1970s; yet these signals don't say much that correlates unambiguously with the observable behaviours of software.

The purpose of monitoring, like language, is thought to be about communicating: in this case, the state of a process or an activity. Certainly monitoring plays a role in coordination between system and administrator: system owners aim to respond to observed states based on an assessment of their meanings–but we don’t seriously try to learn the intended meanings; rather we try to decipher the ups and downs of signals as if every change were a new language, or perhaps no language at all. Is system behaviour more like the weather? Even the weather is based on cause and effect. But, if there is no language in a monitored signal, then what meaning are we actually hoping to find in it?

How should we decide what actually can be said, or what needs to be said, in a language whose purpose it is to explain the state of a process? I've written about these issues in a series of papers that are also summarized in my book Smart Spacetime. In this essay, I want to sketch out a summary of some of the points and apply it more directly to the monitoring of technology.

Language is for navigating change

Human language is too new a phenomenon, in evolutionary terms, to have evolved for the rigours we apply it to, like writing books or holding after-dinner speeches. Its structures probably evolved for something older and more related to the environmental pressures of survival, such as path navigation. Indeed, there are good reasons to expect that language is all about representations of space and time. Bees, for instance, communicate flying directions to one another through “dance” protocols.

From a computational and cognitive perspective, the view that language is mainly about encoding space and time makes more sense–and it renders exterior language for communication decidedly less special. Exterior communication appears to be only a secondary phenomenon compared to the use of linguistic processes to decompose a stream of observations internally, and to model what can be said about it in the first place.

An agent with input and output channels is typically “interested in” (i.e. responds to) sensory information within its interior processing, in order to decode the meaning of what it observes outside. This is based on how its perceived environment influences it. Here we can take “meaning” to represent what we intend to do with a state or a signal. If the signal is vague, we should intend little to avoid being driven into spurious action. If it is clear and significant, we can attribute greater intentionality to it — invest in it.

An exterior influence is, of course, itself a kind of process that involves signalling and communicating at the sensory "edge" or "interface" between agent and exterior, but the agent can only discern a meaning from such experiences if it has learned from what is inherent in past episodes. Moreover, an agent recognizing input needn't encode it with the same language for understanding it as it may later use to share its knowledge in a compressed form with another agent. It would, however, need to have analogous structures in all cases to cope with the conceptual composition. Formally, both these interior and exterior processes are linguistic in nature. Language emerges as a network of spacetime relationships between changes of state, whether inside or outside, in order to build an addressable memory of the events and derive concepts about them.

So, we conclude that the primary function of language is not communication, but rather the sorting, encoding, and scaling of data making up the signal. It encodes and expresses distinctions and selections to be discovered, stored, and retrieved for future use. It seeks to measure and compare both short term and long term changes concerning an agent, in its various structures at different scales. The medium for this memorization isn't important. When ants lay down stigmergic trails to follow, they lay out their meaning in the exterior space. When a brain lays down memories to retrieve, it uses the medium of neurons in the brain. Memory is space, and space is memory.

Chomsky demonstrated the close link between language and process by pointing out that the structural rigours of language form a computational hierarchy expressing the level of resource sophistication needed to complete the process of storing and retrieving information. The implication of his hierarchy is stark for those who hold human language to be a unique occurrence. Irrespective of whether other animals express themselves outwardly or not, beyond simple signalling, they must undoubtedly use the structures of language internally, in order to be able to serialize, store, and retrieve the contextual knowledge they encounter about the world within a consistent timeline. Without language, data are not addressable, not retrievable. Language cannot be unique to humans.

Language, then, is a linearized representation of general spacetime processes. It's needed for the normal functioning of a decision-making processor. Without language, no being would be able to think or memorize concepts on any level. Language comes from within, based on patterns from within and without. Signals have both perceived semantics (qualitative content about how to respond to them) and dynamics (quantitative content about their magnitude), so why don’t we ever try to ask what languages machines are trying to express themselves in when we insist on monitoring them so intently?

One reason is that we are stuck on the simplistic idea that computational states are only either true or false.

Reasoning is illogical!

Some years ago, as I was developing the notions of Promise Theory, I realized that the standard story of representing knowledge and reasoning about it, used in Computer Science, was fundamentally incapable of making inferences; indeed, it was probably just wrong as a model of reasoning. This is partly because computer science has no formal notion of data or behaviour on more than one scale — data are often caricatured as being just 1s and 0s.

Computer Science has long assumed that logic was the basis of all reasoning. First order logic is, after all, one of the most impressive calculi within mathematics, with wide-ranging arithmetic and semantic applications, and roots going back to Aristotle. But logics, and in particular modal logics (so often touted as the answer to sophisticated reasoning), attempt to use rigid mathematical rules to determine truthful outcomes. As such, they are an inappropriately overconstrained model of reasoning, almost useless for adaptation or prediction. Logical rules are the propagating glue between axioms, in the way that differential equations propagate spacetime boundary conditions in calculus. Such systems are not meant to discover novelty by composition; they are meant to rigidly relate pre-known facts within a framework of constraints.

Human reasoning is very different from logic. What we seem to do, as humans, is to tell ourselves stories about the world (part of our navigation, you might say), by stringing together ideas and episodic memories into partially ordered sequences; then we select from these alternatives the one(s) that give us a contextually satisfying emotional response.

In science fiction, emotion is the dreaded opposite of reason, to be purged by all evolved rational entities. Far from it: emotion is the ultimate arbiter of our rationalizations–the very criterion by which we decide when to stop asking for further explanation. Are we there yet? Is it finished yet? No? No? Oh fine, just stop! Good enough! I believe you.

Once we are satisfied, or perhaps simply exhausted from the never ending series of Zeno-like incremental questions, we tag the question as being resolved by that story path. Only then are we willing to claim that we understand the issue. That final emotional arbitration truncates the otherwise infinite regress of questioning. It's related to what we call trust. Trust, after all, is the sense of satisfaction we feel when foregoing the validation of something (not, as computer scientists believe, a token or certificate that confirms a validation). Trust and understanding are the points at which we stop asking for further justifications. Trust is the "ultraviolet cutoff" that stops us from dwelling endlessly with infinite regress on every issue. We seek a renormalized picture of the world, in which emotion shields us from divergence. Trust is what makes the world of semantics bounded and discrete.

Language, process, and reason go hand in hand

The process by which we string together stories from components is what we represent in language. The stories themselves are (in my view) what we mean by reasoning. [A new book by Mercier and Sperber, The Enigma of Reason, takes a similar view.] Stories are essentially a form of encoded causality capture, with discrete elements to capture semantics in symbolic (compressed) form. The discreteness comes from insisting on clear boundary conditions around the phenomena in the causal process. From there, reasoning is what we mean by causation, and vice versa.

The concepts of language are symbolic representations of change in space and time. They are based on discrete and isolatable symbols or patterns. The patterns are discrete because we distinguish events as separable entities, using cyclic sampling processes as their measure. That's not to say that higher derived concepts cannot overlap with respect to different contexts. Composition leads to new scales, and new meanings. Indeed, any composite idea can always be broken down into atoms, just as unique organisms can be broken down into the same spanning set of genes. Logics try to avoid such overlaps of notion and scale, and so they fail to be useful tools for conceptual semantics: they need all the axioms and rules in place from the start. Yet these are precisely what we need to derive!

How then might we learn a language from a stream of data, without knowing the language in the first place? The answer is a bit like code-breaking, and a lot like forensic analysis.

The DNA fractionation method

In a number of papers and blogs exploring the spacetime approach to knowledge and natural language, I showed that a natural approach to extracting meaning from sensory information is to employ the Millikan technique, i.e. a technique similar to that used in bioinformatics for DNA sequencing. If one breaks up strings of language, composed of recognizable patterns, into fragments of short lengths, one quickly identifies the alphabet of symbolic patterns as the smallest parts. One also finds longer strings that are its basic composites. The longer the sequences, the less likely they are to repeat and the more significant they could be. Indeed, after a small number of words (five or six) in English, uniqueness becomes so likely that repetition couldn't happen and the composite is once again meaningless. The most repeated patterns tend to be meaningless whitespace or punctuation. The infrequently but still repeated patterns are those of maximum significance–related to the proper names of partial concepts (see references 3 and 4). Things that are never repeated at all may be spurious noise.

We start by looking for whitespace to distinguish words from characters. Then, making the analogy between text and DNA again, we could say that:

  • Codons or letters are the alphabetic characters.
  • Words composed of these encode definite process fragments, like amino acids.
  • Sentences composed of words form events, like gene bindings.
  • Stories composed from sentences are episodes, like protein processes.
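
To make the fractionation step just described concrete, here is a minimal sketch in Python of the counting stage only, assuming whitespace-delimited text and invented parameter choices; it is not the implementation from references 3 and 4.

```python
from collections import Counter

def fractionate(text, max_n=6):
    """Break a text into overlapping word n-grams (fragments) of
    length 1..max_n and count how often each fragment recurs."""
    words = text.split()          # crude whitespace delimiting
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

# Fragments that repeat, but are not ubiquitous, are candidates for the
# "proper names" of partial concepts; unrepeated fragments may be noise.
sample = "the cat sat on the mat and the cat sat on the chair"
counts = fractionate(sample)
for frag, c in sorted(counts.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    if c > 1:
        print(" ".join(frag), c)
```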

Meaning isn’t just one kind of thing on one unique scale. Nature can’t be grasped by only looking at things the same size as an observer. Things are made of smaller things and lead to bigger things. Compositions in space and time lead to stories or histories: i.e. language processes.

Stories have a different kind of meaning than sentences or single letters or words. This idea applies to any sequence of patterns, not just to text–which is why sequential pattern discriminators can be made to recognize any kind of input as long as it can be demarcated and serialized as a spacetime process. This, in turn, is why impartial fabrics such as Artificial Neural Networks can recognize patterns in different kinds of input signal and render them as pseudo-symbols (dimensionally reduced output channels).

All this is old news, even though the full details are far from fully proven. More interesting is how the words and notions of linguistic representation relate to trust–something we all know about. We don't typically relate reasoning to trust. Instead we might talk about probabilities. Probabilities are a way of obtaining a quantitative representation of symbolic classes. Many scientists feel more comfortable with numbers, so they prefer probabilities to discrete classes. This is true in spite of probabilities being very limited in their ability to represent information. The involvement of statistics in average information transmission in Shannon's work has further confused the matter, by giving only average information a meaning. Physicists frequently wade into this soup of statistical inference, due to its similarity and connection to thermodynamics, instead of looking at the alphabet of information itself as the building block of meaning.

In modern times, probability has sometimes become an excuse for foregoing explanation rather than being a tool for explaining. People seem to trust probabilities because they appear to be a way to make vague issues quantitative, yet we should be acutely clear that quantitativeness is not a promise of meaningfulness. Meaning tends to come from scale, but probabilities hide scales. Quantity (or quantitativeness) itself is simply a way to order things, assuming there is an appropriate scale of magnitudes to grab hold of. The assumptions underpinning scales are all too often suppressed when arguing about probabilities. A major problem with probabilities is that, as dimensionless ratios, they are scale invariant and as aggregates they have no relative order. Only the coarse aggregate classes (like the bars in a histogram) reveal any residual scale. But semantics are both scale dependent and ordered. We need to retain these aspects of process dynamics when comprehending state. For that we need both language and a concept of work.

Finding meaning by work (mining)

Let's quickly summarize the approach to extracting meaning from a data stream used in references 3 and 4. There, meaning or intent is measured as something like work divided by frequency of occurrence (a quantity with the dimensions of action):

Intent = length of pattern string / frequency of occurrence

Intent is, after all, a measure of how invested we are in an outcome or an interpretation. The more effort we put into expressing intent, by length, the more meaningful it is to us. Notice that frequency is a scale-dependent quantity (how often over how long?).

This work can be used to rank the relative importance of language fragments over the scale of a coherent experience. When we use this as the measure of persistent meaning, it leads to surprisingly intuitive outcomes that we can verify using Natural Language Processing as a test case.
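
A naive version of this ranking might look like the following sketch. The score is only the raw length-over-frequency ratio from the formula above, the example counts are hand-made, and treating unrepeated fragments as noise is an assumption carried over from the earlier discussion rather than the exact weighting used in the papers.

```python
from collections import Counter

def rank_by_intent(counts):
    """Naive ranking: intent ~ length of pattern string / frequency of
    occurrence, discarding fragments that never repeat (spurious noise)."""
    scored = []
    for frag, freq in counts.items():
        if freq < 2:
            continue                 # unrepeated fragments: likely noise
        work = len(frag)             # effort ~ length of the pattern string
        scored.append((work / freq, " ".join(frag)))
    return sorted(scored, reverse=True)

# A tiny hand-made example: long but still-repeated fragments rank highest.
counts = Counter({
    ("the",): 12,                                  # whitespace-like filler
    ("disk", "latency"): 3,
    ("disk", "latency", "rising", "on", "node7"): 2,
    ("spurious", "blip"): 1,                       # never repeats: dropped
})
for score, frag in rank_by_intent(counts):
    print(f"{score:.2f}  {frag}")
```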

Fragmenting input patterns into n-grams allows us to annotate the original, unadulterated sequence with its fragments in memory. Those fragments will recur in other experiences, and thus mutual associations bind together larger ideas through the "genetic" composition. They come together in a network of associations that forms a Semantic Spacetime. If we follow these principles, we end up with a connected, ordered memory space, addressed by a non-ordered contextual soup of such semantic fragments. It can be further enhanced with dynamical information–whether the observed changes are sudden or slow, etc. (rates of change). What we end up with is a language of contextual change, analogous to a semantic phase space, all represented as a network-graph. The graph is a structured and addressable memory.
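
As a toy illustration only (a plain dictionary index standing in for a real graph store), such an annotated memory might be organized like this: events keep their observed order, and each event is indexed by the fragments occurring in it, so that fragments shared between events become the associative links.

```python
from collections import defaultdict

class EpisodicMemory:
    """Toy annotated memory: an ordered timeline of events, indexed by the
    word n-grams (fragments) that occur in them."""

    def __init__(self, max_n=3):
        self.max_n = max_n
        self.events = []                   # ordered episodic timeline
        self.index = defaultdict(set)      # fragment -> event positions

    def observe(self, event_text):
        pos = len(self.events)
        self.events.append(event_text)
        words = event_text.split()
        for n in range(1, self.max_n + 1):
            for i in range(len(words) - n + 1):
                self.index[tuple(words[i:i + n])].add(pos)

    def recall(self, fragment):
        """Retrieve, in observed order, the events a fragment is bound to."""
        positions = self.index.get(tuple(fragment.split()), set())
        return [self.events[p] for p in sorted(positions)]

memory = EpisodicMemory()
memory.observe("disk latency rising on node7")
memory.observe("disk latency rising on node3")
memory.observe("backup completed on node7")
print(memory.recall("disk latency rising"))   # the fragment addresses both episodes
```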

Memory is not a structureless space, like a Euclidean manifold. It can be represented as a network of Russian doll-like agents, with interior processes that interact with other agents in a protocol of “computation”.

Meaning, like evolving namespaces

Meanings are not true immutable invariants either: they change over long times. As we update our contextual understandings, concepts must evolve from one meaning to another. Just as we find it difficult to read Beowulf, Chaucer, or even Shakespeare today, so we might need to adjust our thinking when looking back at past events in a data system. Snapshots of a database from a year ago might mean something very different from an equivalent snapshot taken today, yet probably no one builds systems to recognize this. We don't even know at what rate such semantic changes evolve, because we can't measure changes of this kind.

As data move from the transitory edge to the more invariant centre of a distributed superagent system (like going from nerve endings to the brain's cortex), the rate of change must necessarily slow down, in order to scale a consistent understanding of the different languages at the edge into a single common representation for all at the centre. The scaling works by summarization of semantics, as in Internet routing.

A spacetime causality cone formed from independent, interleaving sampling loops at the sensory edge.

The figure above (from reference 5) shows the basic structure of a system used to sample data from edge to centre, structuring events into a consistent hierarchy of namespaces, analogous to the process of concept formation in a linguistic analysis. The process of sharing a properly categorized picture of shared state is a linguistic process, even at the most basic level of data collection! Language is not just for telling stories, it’s for building them from experiences too. It’s about transmuting those edge fluctuations into invariant qualities, like turning acting into a movie, or sentences into a story. Language adds to raw data the ability to classify and route information intentionally: to discriminate cases by intent or elimination.

The diagram also looks a lot like so-called publish-subscribe content networks. Indeed, when does a state reporting network in fact become a Content Delivery Network? The answer is, when the contextual doubts about the processes concerned have passed, and we can commit facts to the annals of history. At this point, a separate process can replicate data as a temporal snapshot of immutable state. “Process” becomes part of fixed archival features of a data landscape which can then be relied upon without risk of invalidation.

Work is intent

To measure intended meaning from a data stream, we first need a directional reference. We need to define a meta-goal for making use of data. How shall we define good from bad, progress from disruption? Given such a direction, the more work we do on something in pursuit of that goal, the more invested we are in it, the more likely we are to need the data in the long term and the greater its meaning or significance. Signals that don't line up with this direction are spurious and meaningless, telling us nothing.

These are the economics of data. They translate into statistical properties like entropy over time, but they rely on the alignment of promises or spanning goals.

The interplay between what is represented in the environment (around an agent) and what is memorized (within an agent) provides a crucial comparison. It enables a coupling together of processes on different timescales, so as to read information from the effective boundary conditions we've stored as long term memory. Remembering everything equally in short term memory would be to remember nothing at all. Hierarchy and scale offer priorities, just as quantitative scales permit ordering. Value lies in the separation of scales, as I wrote in my book In Search of Certainty.

Machine monitoring and “observability”

So, let's return to the issue of machine monitoring and its missing language. Suppose we think about an agent responsible for the monitoring of machinery: how would we proceed to find a representation of what it sees, find meaning in it, and even learn the language of its signalling? We first need to establish the alphabet of basic symbols, i.e. the smallest invariants of the signal. In particular, we need to find out what delimits one word (at the next scale) from another. Is there a whitespace or punctuation signal? Is there a fixed framing?

Not all human languages handle whitespace efficiently either. For instance, in Chinese, proper names and phrases formed from strings of phonetically repurposed characters are not obviously distinguishable from the characters' original meanings, so it can take intelligent parsing on multiple scales to decode the intended meaning, just as implicit parentheses at the sentence level are hard to see without punctuation.

To gather agent state from a single source is easy. To aggregate state from multiple sources is harder, because it involves a more complex spacetime language representation. I've argued, together with Andras Gerlits in our paper (reference 5 below), that a generic platform for consistent shared cognitive state is the way to get started with automating this communication. We designed such a platform in reference 5, in which we write the input language directly onto a spacetime causality cone, like an extended episodic memory.

Why write directly to the causality cone? The more we buffer in local copies, the less causality we retain. Copying into general-access memory can be decoupled, as in a brain: retrieving data may take longer when it is scattered into silos, but by passing memorization through a consistent cyclic process (as in the hippocampus), episodic integrity can be maintained on a fast timescale at the edge, while long-term dissemination and integration for large-scale shared recall can be worked out on a slower schedule (e.g. overnight, as dreaming) that shifts memories to long-term shared storage.
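
A highly simplified sketch of that decoupling (not the platform of reference 5; the names and scheduling are invented for illustration): each edge agent appends observations to its own fast local buffer, and a slower consolidation pass later merges those buffers into a shared, time-ordered store.

```python
import time
from dataclasses import dataclass, field

@dataclass
class EdgeAgent:
    """Fast path: append observations locally, preserving episodic order."""
    name: str
    buffer: list = field(default_factory=list)

    def observe(self, event):
        self.buffer.append((time.time(), self.name, event))

def consolidate(agents, central_store):
    """Slow path: merge edge buffers into long-term shared memory,
    interleaved by timestamp, then clear the fast buffers."""
    for agent in agents:
        central_store.extend(agent.buffer)
        agent.buffer.clear()
    central_store.sort(key=lambda record: record[0])

edges = [EdgeAgent("sensor-a"), EdgeAgent("sensor-b")]
edges[0].observe("cpu load spike")
edges[1].observe("queue length growing")
shared_memory = []
consolidate(edges, shared_memory)   # run on a slower schedule, e.g. "overnight"
```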

For monitoring too, we first have to discover what is white space and punctuation between intentional events. Then,

  • Characters are the alphabet of transaction encodings in the transfer protocol.
  • Words are monitoring types and descriptions, names for network locations, and times, encoded as definite promise fragments.
  • Sentences composed of these words are composite events, such as contextualized observed events, data samples, alarms and signals, etc.
  • Stories composed from sentences are episodes of such process histories. These are partially ordered.
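
Purely as a hypothetical illustration (the field names and the record itself are invented, not a real monitoring protocol), that hierarchy could be represented along these lines:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    """A definite promise fragment: a type, a location, or a time."""
    kind: str        # e.g. "metric", "host", "time"
    value: str

@dataclass
class Sentence:
    """A composite, contextualized event made of words."""
    words: List[Word]

@dataclass
class Story:
    """A partially ordered episode of related events."""
    sentences: List[Sentence]

# One observed event ("sentence") built from promise fragments ("words")
event = Sentence([
    Word("metric", "disk_latency_high"),
    Word("host", "node7"),
    Word("time", "2022-11-01T10:15:00Z"),
])

# An episode ("story") strings such events into a process history
episode = Story([event])
```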

This separation is not what monitoring systems typically do. Instead, they collect a raw numerical or text signal, without knowledge of their embedded patterns, and hope to infer meaning from the values, as we might attempt to infer the mood of a family pet from a record of its purring and barking but without context. Because monitoring systems don’t even know what parts of a signal are white space or punctuation, they wait in endless uncertainty on the off chance that some algorithm might one day find meaning in the signal. If only we could talk to the animals!

Our current approach to monitoring hasn't changed much since the 1970s. We allow machines to express themselves in a crude language of grunts and average sizes, like farm animals, and we let system administrators try to learn to interpret their conditions and symptoms by attaching the analogue of bells and thermometers. The studies we did in the 1990s (reference 1) showed that there is little to extract from these barking noises machines produce–indeed, even less now that the machines are mostly virtual. That situation can be improved only by making machinery speak a more intentional language. If we bothered to carry the edifice of intent that we build into machinery over into the way we observe and describe its behaviour, this could be a more straightforward exercise. However, today, the average monitoring system has no scale, no alphabet, only a quantitative trace or a counter that goes up and perhaps down. The observer is left to guess what the system is expressing from these grunts.

Trusted stories, worth keeping

Trust in observations is a shorthand way of measuring our individual assessments of the work required to develop that essential emotional satisfaction about them. Shared language helps to build trust.

If we believe logicians, trust can only come about by verifying the truth of facts, but this is not really how human reasoning works. We mainly want to feel good about assessments. If checking the facts makes that happen, then all is logically consistent. But we are also quick to replace facts with intuitions that reach beyond the immediate evidence–because episodic memories may suggest possible outcomes that extrapolate from what facts are immediately on the table (so to speak). That which is not trusted is perceived as risk, so risk and trust are opposites in the semantics of process assessment.

In summary, a story is a process that builds trust by composing a sequence of contextual inferences (made on a smaller scale) into a larger one, with a sense of time. This is what language is used for. The process involves cycles of sampling (reading samples frame by frame and providing the clock for the process); the application of recognition processes at each scale (as per the Shannon-Nyquist theorem) then allows us to commit our best inferences as the trusted "symbols" of a language–a matter of efficient compression. Each independent process scale typically has its own process and its own language rules. Just as short and long term memory management are decoupled in brain activity, so they should be decoupled in technology too.

Reasoning is thus based on the trusted cognitive landmarks that we extract and symbolize for long term storage, rather than on threshold values and probabilities. Trust replaces the mathematical notion of probability with a learned amplitude and a work principle. For decoding phenomena whose languages we may not know, as in system monitoring, it makes sense to build up a knowledge over time, rather than starting from scratch each time.

If this suppression of logic seems to indicate that we wouldn’t be able to make valid deductions from data, that would be false. When reasoning about alternative “root causes” (contributing factors), the Promise model illustrates how source and receiver play individual causal roles in interpreting meaning. If we know that either A, B, or C caused X, and we can eliminate B and C somehow, then we are left only with a promise to accept A. It looks like the figure below.

It couldn’t be simpler. The proposals are donor promises in the process, offering the same promise of candidature, while the receptors are decisions to accept these proposals. The receiver can eliminate these based on its own internal criteria.
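
A minimal sketch of that elimination, with purely invented names and criteria: each candidate offers the same promise of candidature, and the receiver's own acceptance criteria do the eliminating.

```python
# Donor side: A, B, and C each offer the same promise of candidature for X.
candidates = ["A", "B", "C"]

# Receiver side: acceptance is decided by the receiver's internal criteria.
# (Here, hypothetical evidence is assumed to rule out B and C.)
def accept(candidate):
    eliminated = {"B", "C"}      # e.g. ruled out by independent observations
    return candidate not in eliminated

remaining = [c for c in candidates if accept(c)]
print(remaining)                 # ['A'] -- the only promise left to accept
```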

Coda: Money and Finance

Before ending, let’s note another communication network of interest: money. I discussed this with co-author Jan Bergstra in our book Money, Ownership, and Agency. As a language, money is elementary, but its interior processes may be significantly more complex.

The language of money is also somewhat incomplete compared to monitoring, especially if we view it purely in terms of traditional currencies. The semantics of money get completed with additional information, like purchase orders and receipts, that partly documents the exchange. In modern times, electronic platforms offer effective encapsulations of money with added purchase semantics, leading to many specialized semantic micro-currencies. Like energy, money is only a quantitative measure of activity; it has no specificity or capacity for expression, except by its currency type. An economic transaction, in full, is composed from analogous parts:

  • Characters or exchange symbols are the alphabet of transaction encodings.
  • Words express orders and descriptions of network locations, goods and services, as well as times, encoded as definite promise fragments.
  • Sentences composed of these words are composite events, such as contextualized exchanges, purchase orders, invoices, etc.
  • Stories composed from sentences are episodes such as account histories.

As we create modern monetary systems, clearly intentional stories about transactions, which incorporate semantic state, will be the measure of our trust — not encrypted tokens. We already see a revolution in this area in transportation and hospitality, where loyalty schemes track our histories. For some, this might be an intrusion of privacy; for others, it's adaptive contextualization. Loss of privacy is also a feature of cognition. We can't stop others from thinking their own thoughts or remembering what they see, but when they act on those observations, we might not be aligned with their intentions.

Summary

Language is a method for decomposing processes into actionable graphs (reference 2), from which we can identify regions of similarity that take us from mere patterns to concepts. It's more about representation than communication. It's tied to cognition from edge to centre.

The state of the art for capturing monitoring and commerce data is far from this vision. It relies purely on ad hoc patterns of activity, treated as grunts and whistles, or at best as landmark "proper names" taken on trust. The significance of what is communicated isn't clear because it has no well defined language, and therefore no inherent meaning. The basis for monitoring distributed systems at scale would be to build a platform supporting spacetime causality for data, collected within the constraints of edge processes. The longer trends from a wider area, interleaved across all the sensory channels, form a different kind of story, with different concepts. All this could be managed in realtime (see reference 5) with less time wasted than today. For now, let's simply underline the link between trust and the interpretation of data, in order to encourage the development of better systems that report their measures in proper languages: languages that converse in the goals of the system and reflect our ultimate assessments of satisfaction.

References

  1. Measuring System Normality, ACM Transactions on Computer Systems 20, p.125–160 (2002)
  2. A Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale Learning (2017)
  3. Testing the Quantitative Spacetime Hypothesis using Artificial Narrative Comprehension (I) : Bootstrapping Meaning from Episodic Narrative viewed as a Feature Landscape (2020)
  4. Testing the Quantitative Spacetime Hypothesis using Artificial Narrative Comprehension (II) : Establishing the Geometry of Invariant Concepts, Themes, and Namespaces (2020)
  5. Continuous Integration of Data Histories into Consistent Namespaces (2022)
