From cognition to understanding

The hard problem of context, and the meaning of tidiness in knowledge representation

Mark Burgess

By way of introducing this year’s SSTorytime software tool project, this article revisits some theoretical material I discussed in 2017, concerning the somewhat evasive role of context in knowledge representation. (It nearly always takes 10 years for an idea to catch on, and to find a little money to take the work further, as an Open Source project.)

The idea behind the Semantic Spacetime project is that storytelling lies at the heart of all the difficult issues in intentionality and cognitive behaviours — even, ultimately, computation too, because there is a semantic mapping between reasoning and the most basic processes that unfold in space and time.

Consciousness may or may not be "the hard problem" of knowledge, but context is surely one of the hard problems! If that sounds glib, I could add that the two are perhaps not as far from one another as you may think. But we shouldn’t get ahead of ourselves! This is not an article about consciousness. It’s about a software project to represent knowledge in relational form for the benefit of humans.

The main purpose of this article is to discuss how an agent can address and retrieve memories (technically) for certain incidents or topics, without having an idealized and perfect theoretical index. Ironically, this is something I find very difficult to do myself. I am not good at random access retrieval: people’s names, disconnected facts, etc. On the other hand, a visual or musical cue will trigger instant recognition for me. I can spot a book or an LP record from a thin sliver of colour amongst a hundred others, and therefore I have never had any need to alphabetize books or records. Why is it so hard to remember simple names, when I can sing entire songs I learned as a three year old? The answer would seem to be a lack of context.

Promise Theory has something to say about trust and reasoning. I’ll start with a quick review of what I’ve already written in the past about Semantic SpaceTime (SST) and knowledge representations; then, I’ll try to say a few things about how “context” may work as an indexing mechanism. I say “may” because this is still an unsolved problem (the devil will be in the details), but this is the approach we take.

What is knowledge anyway?

Knowledge, as I’ve said many times before, isn’t quite as we’ve come to describe it in everyday life. It’s not about encyclopædias; it’s not a transmissible commodity. It’s not something you find in books or on computers. It’s not lines from Shakespeare, or verses of ABBA that you’ve only memorized. True knowledge is something that can only exist in your head when it becomes something you know like a friend.

We don’t expect to know someone after finding their name in the phone book, but we sometimes pretend that we can know something by quickly scanning the information on Wikipedia!

If this reminds you of the social brain hypothesis, it should — indeed, I worked on that last year with Robin Dunbar. Being able to repeat information as gossip isn’t knowledge either: you need to be able to tell a convincing story about it yourself to make it yours. On a hard disk or on the web, information isn’t your knowledge; it’s a modern cave painting. In a book, it’s merely the offer of a story about someone else’s knowledge.

The key to knowledge lies in “work” or effort.

At university, we used to say that photocopying a journal paper was as good as having read it. That’s sort of how we’ve come to treat knowledge in IT too. As long as data are stored somewhere for future access, we think we “own” them and therefore we “know” them. In truth, the process of learning is more like this:

  • Scouring dictionaries and directories of facts for something relevant.
  • Asking someone else who knows where to find information.
  • Investing in articles and longer expositions of related facts and ideas.
  • Reading books of curated information, telling stories as particular overviews.
  • Dipping into databases and webs of knowledge for random access lookup.

Knowledge, when it comes down to it, is something you learn (over time) to know like a friend. It’s a familiarity deep enough to be able to tell someone about habits, flaws, when a thing would typically manifest or not, how to use or avoid it. How things behave and therefore how we can know them are deeply contextual. If you don’t put in the effort to know something, it will remain a stranger to you.

The IT (Information Technology) industry, ever eager to standardize the “good enough” by committee, has officially done little to distinguish between data and knowledge. The closest it comes is to define knowledge graphs via the W3C standards for the Resource Description Framework (RDF) and the Web Ontology Language (OWL). These half-cocked standards of the XML era (children of the last century) are typically superficial in their desire to standardize quickly on saleable tools, overcompensated by a hopelessly technical schematic bureaucracy for data classification that users are expected to follow to little effect. If you enjoy bookkeeping or tax returns, you’ll love the standards.

What this means is that most efforts to represent knowledge as networks of information are very expensive in terms of time and effort, and the end result is often almost worthless. Of course, we bureaucratize the joy out of it too, with type systems and schemas in a variety of languages mostly devised before anyone knew too much about the challenges of scaling. There’s nothing like a bit of multi-entry bookkeeping to keep people busy, but all that is missing the point. One way or the other, the IT industry simply wants to see knowledge as a shorthand for “database” — because that’s, after all, what you can sell. Doing better may still be possible, thanks to our SSTorytime project.

Where does this leave us? Most people find it glib to call knowledge storytelling, because we think of stories condescendingly as something we only tell children, but narrative exposition is at the heart of reasoning–even in formal logic. We should think again.

How we curate knowledge: it’s a garden not a warehouse

I’ve argued that factual statements and their interrelationships can be represented more intuitively, taking advantage of their “spacetime underpinnings”. What on Earth could that mean? It refers to the idea that our abilities to i) sense, and ii) comprehend what goes on in the space and time around us are surely at the root of all our cognitive and learning abilities. Every advanced concept is ultimately built by metaphor upon metaphor upon spacetime description. This isn’t just my own idea; others have written about it in linguistics before (see the writings of Guy Deutscher for instance).

There were no web pages or books to read when life developed. Eyes, ears, and cognitive brains could only have evolved to respond to the things around and about organisms. Our modern skills must have been co-opted from the outgrowths that evolution served us to navigate through oceans and forests, as well as to remember where we put our thumping stick, which kind of barnacles or mammoth had the best coat for a winter collection, and so on. In other words, we should think of all reasoning as a navigation through some real or imaginary space and time. This is certainly reflected in the way languages have evolved (as I’ve written at length in Smart Spacetime and In Search of Certainty).

Promise Theory identifies four basic types of spacetime relationship to build on. To begin with, we should stop thinking about “things”, and rather think about “processes”. In that light, everything we encounter is a passing event. Some agent, with cognitive capabilities, can then decide between four basic ways to think about events:

  • SIMILARITY — roughly the same kind of event
  • LEADS TO — one event after another
  • CONTAINS — one event is a part of another (at the same time)
  • PROPERTY — an attribute or property of something

In spacetime language these correspond to equivalence/distinction, temporal structure or composition, spatial composition, and spatial expression of recognisable attributes. The origin of these may be found in the work on Semantic Spacetime.
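As a minimal sketch of the idea (the names and data layout here are my own illustration, not the project’s actual schema), the four arrow types can be modelled as an enumeration over typed links between events:

```python
from enum import Enum

# The four Semantic Spacetime arrow types (illustrative naming only).
class STType(Enum):
    SIMILARITY = "similarity"   # equivalence / distinction
    LEADS_TO = "leads to"       # temporal order or causation
    CONTAINS = "contains"       # spatial composition
    PROPERTY = "property"       # expressed attribute

# A note links two events with one of the four types and a human label.
def link(frm, sttype, to, label):
    return (frm, sttype, to, label)

notes = [
    link("neuron", STType.LEADS_TO, "signal", "fires"),
    link("brain", STType.CONTAINS, "neocortex", "has part"),
    link("neocortex", STType.PROPERTY, "layered", "is"),
    link("neuron", STType.SIMILARITY, "nerve cell", "also known as"),
]
```

Whatever vocabulary of arrow names one invents, each label must reduce to one of these four categories.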

“AI” doesn’t make all of this redundant!

Recently, knowledge related issues have been dominated by so-called “AI” (artificial intelligence, artificial distillation of intelligence, etc, call it what you will). AI tools are about taking the mental processing away from humans and replacing that effort with a generated response. They do this with elaborate spacetime models called artificial neural networks that are pipeline processes. They generate beautifully fluid language output that hopefully resonates with the reader. However, in doing so they take the work out of the hands of the reader. Again, even if the information is correct, this generated output is not knowledge, because it doesn’t come from you. In a court of law, it might be called hearsay, because you are not a witness to this account, nor can you attest to it with a hand on your heart.

As a counterpoint to this end-run around human knowing, the Semantic Spacetime SSTorytime project takes the view that we can and should better understand knowledge processes by helping humans to manage them with machine assistance: to curate, remember, and bolster fading memory abilities by representing knowledge in a way that is both meaningful and automatable. A typical process for learning might be something like this:

  • We take notes, directly from observations, or indirectly from hearsay.
  • We look over the notes and try to make sense of them by putting things in order according to some kind of intuitive model. Tidying is an important way of simplifying and generalizing–sometimes confused with ontology.
  • We revisit our notes and change our minds as time goes by, editing and modifying–usually just in our heads, but with more discipline on paper or computer too.
  • Finally, we review, revise, and rehearse with various audiences to turn these scraps into an integrated knowledge that we can tell stories about. This is the “muscle memory” that engages intentional effort to remember.

It’s crucial to understand that nothing becomes knowledge without doing your homework! It’s like a garden: it won’t grow unless you tend the information. Archives are graveyards where knowledge goes to die. Keeping knowledge alive is an ongoing process, and the quality of the harvest depends on the care you put into it. Moreover, the garden is principally in your head–your notes only give you some secretarial reminders. The sense-making and recall of it is up to you. You can’t hire an AI or a robot helper to tend and water your brain for you, any more than a robot can help you to learn the piano, or high jumping. It might be able to do those things better, but that's not your hard-won understanding. If you try to cheat, you just write yourself out of the story.

The bottleneck of knowledge assimilation is the brain and the speed of our perception, so it doesn’t help that “AI” can feed information to us faster if we can’t absorb it faster. If the result is meant for us, it takes as long as it takes. This should tell us that the useful role for AI in a human world is to handle coarse grained mass-impact services where human attention is unnecessary, rather than circus tricks to imitate human creativity.

The final step in knowledge curation is perhaps to understand something. We say we understand something when we can tell an emotionally satisfying story about it: a sufficiently satisfying narrative to join the dots between the patchwork of assertions we’ve collected. When our story is watertight (to our own satisfaction) and emotionally satisfying (to ourselves), we feel we understand matters. Everyone has their own threshold for making this determination. Often, the story has to form a tidy chain. Why should this be our criterion? Probably, the answer is because when you need to walk through the forest, there’s no magic transporter: you have to walk every step yourself. At least not yet–a transporter can only be built later through social knowledge and cooperation on a scale larger than an individual. But, in the end, it’s about the telling of stories.

But why?! Explanations!

“Wait!”, you say, “shouldn’t a story actually be true to be called knowledge? Let’s have none of this ‘emotionally satisfying’ nonsense!”

Ideally, of course, the answer is yes! If only we can make that determination. But, we’re rarely in a position to be able to determine truth in any watertight way. In the end, we settle for trusted hypotheses. Our emotions act as shortcuts that prevent us from going into endless recursive depths of asking “but why??!”, allowing us to say: enough! They provide the acceptable end state for a story.

Why??!! Tell me why!

Even our revered mathematical logic can only try to connect dots in terms of things assumed. Indeed, objective truth is probably a phantom aspiration in most cases–something that doesn’t actually exist in a reliable way. In the end, it’s about a version of things that we choose to trust or believe in. The process of understanding, or getting our story straight, is this:

  • We create a storyline between the facts as we see them.
  • When someone asks, but why…(dammit) ? Then we extend the story to add a satisfying answer.
  • Eventually we are exhausted by “why” and we choose to trust that some arbitrary starting point is believed to be true. We call that an axiom in mathematics. This is even true in the most rigorous science.
  • Our “understanding” is then what we call the pathway or pathways we have found through the forest of information. It is precisely the stories we are able to tell others about what we now know. Like it says on the exam: “in your own words”.

Of course, the first stories humans told might have been (like bees and wolves) about where to find food, or how to get home, how to recognize dinner and avoid becoming it. Being able to recognise patterns and storylines in nature is the likely origin of language and reasoning. And what’s interesting is that all these things have their origins in descriptions of what happens in space and time.

Strategy: from notes to stories

The upshot of this view is that, with a simple-minded computer representation of notes and relationships (the N4L language), we can turn notes into processes for explanation and recovery. The language builds on sketching out partial fragments and joining the dots. From those dots, one can assemble stories in different ways algorithmically–thanks to what we know about principles of spacetime structure. For example, suppose we were trying to learn about brains. We start by reading or asking around, and we jot down some notes that seem important.

A rough, off-the-cuff set of notes from reading a book might start like this…

And so on. In this form, the language is easily turned into a searchable structure, because each parenthesized relation has to map to one of the four types of spacetime semantic distinctions mentioned above. The difficult part lies i) in making the notes accurate and expressive, and ii) in deciding which of the four types is which. It takes a few hours to learn, but these guiding principles make the apparently difficult matter of modelling quite easy. That’s why no off-the-cuff attempt will be useful in the long run. It takes time and effort to get semantics right. Finding the correct relations and understanding which of the four types they actually belong to is the very work that will help us to understand.
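To make the mapping concrete, here is a hedged sketch of how parenthesized relation labels might be classified into the four types. The vocabulary and function names below are my own illustration, not N4L’s actual keyword set:

```python
# Map free-text relation labels onto the four spacetime types.
# This vocabulary is a hypothetical illustration, not N4L's real one.
RELATION_TYPE = {
    "leads to": "LEADS_TO", "causes": "LEADS_TO", "then": "LEADS_TO",
    "contains": "CONTAINS", "part of": "CONTAINS", "inside": "CONTAINS",
    "similar to": "SIMILARITY", "also called": "SIMILARITY",
    "expresses": "PROPERTY", "has colour": "PROPERTY",
}

def classify(relation):
    """Return the spacetime category of a parenthesized relation label."""
    return RELATION_TYPE.get(relation.strip().lower(), "UNKNOWN")

print(classify("causes"))    # LEADS_TO
print(classify("part of"))   # CONTAINS
```

Anything that fails to classify is a signal that the note needs rewording: exactly the kind of reworking that constitutes learning.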

The spacetime structure of events starts with the here and now, and expands causally into a future cone of related events. In physics, we call it the "light cone".

Example

In computer programming, we used to love flow diagrams. Well, here’s what that could look like.

Good old-fashioned flow charts are examples of knowledge graphs.

A flow chart is a rather simplistic graph, just as logical algebras are rather simplistic ways of thinking about transition systems. The diagram above would look like this in N4L:

The N4L representation of the simple flow chart above.

This sort of distillation is just what teachers and writers do, in practice. It’s not a job that everyone will find easy, but that’s why software can be an assistant in the process. We could imagine the “AI” tools helping us to learn instead of taking that learning away from us.

The role of space and time in cognition

To summarize, everything is an event and events are joined into stories by one of four types of arrow. The arrows that connect the dots therefore have many names, but they all fall into one of the four categories mentioned:

  • SIMILARITY — a degree of equivalence
  • LEADS TO — a causally ordered relationship
  • CONTAINS — is one thing a part of another?
  • PROPERTY — a descriptive or expressive property of a node

Let’s take causality (leads-to-ness!) first, because this is the foundation of all description about process and change. The figure below shows some examples of causal arrows and their interpretations.

Causal linkage has many names. Here's just a few.

Causal arrows have roughly transitive semantics. If A is before B and B is before C then A is before C. So they have consistent directionality. A similar approach can be taken with spatial encapsulation. Rather than “next to” we think from an observer viewpoint about “within or without”. Containment is also a transitive notion.
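Transitivity is what makes these two arrow types computationally useful: implied links can be derived mechanically. A small sketch (my own illustration, not the project’s code) of computing the transitive closure of a set of LEADS TO or CONTAINS arrows:

```python
def transitive_closure(pairs):
    """Expand (a, b) arrows to all implied (a, c) links.
    Valid for transitive relations such as LEADS TO and CONTAINS."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

arrows = {("A", "B"), ("B", "C")}
print(transitive_closure(arrows))  # the implied ("A", "C") appears
```

The same routine would not be valid for PROPERTY arrows, as the next sections explain.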

Containment by any other name…

Taking any region of space and time as an event, we can describe what that event expresses or does through its properties. This is not transitive.

Attributes or properties of some agent, something or someone.

Finally, assessments of similarity and difference come under the semantics of similarity. This is generally the first thing natural science rushes to quantify, metrically and prematurely. Even today, most mathematical systems from statistics to AI measure likeness using Pythagorean geometric distance in some artificial space rather than deciding on a criterion for separation. We liberally transform semantic distinction into continuum measurement as a way of computing it. This is what “AI” does, for instance. It leads to threshold ambiguities.
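The threshold ambiguity is easy to demonstrate. In this sketch (the numbers are chosen purely for illustration), the same pair of vectors counts as “similar” or “different” depending only on an arbitrary cutoff:

```python
import math

def cosine(u, v):
    """Cosine similarity: a continuum measure of geometric likeness."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

u, v = [1.0, 0.2], [0.8, 0.6]
s = cosine(u, v)   # roughly 0.90
# The judgment flips with the cutoff, though nothing about u or v changed:
print(s > 0.85)    # True  -> "similar"
print(s > 0.95)    # False -> "different"
```

A semantic criterion for separation would make the distinction once; a metric leaves it forever hostage to the threshold.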

The issue of similarity or equivalence is a subtle one that we overuse in science.

So this is the spacetime approach, in brief. Hopefully, you get the idea… How do we know how to label the arrows? Perhaps we don’t, but that determination is also part of the journey. With a little experience, it’s not hard to learn.

Example: Belonging

Some relationships can be tricky to fathom. The semantics of ownership, for example, are not completely unambiguous. Suppose you want to say:

The bracelet “belongs to” Martin

Is the bracelet a property of Martin or a part of him? As an object, we might choose to make this a part of the “extended space of Martin”. There is no right answer. You can choose what works for you in a given context. The pragmatic difference between the two is how they are eventually searched, as part of a process. Again, we are taught to think in terms of “things”, when we should in fact be thinking about processes.

If we interpret the bracelet as “a part of” Martin then we can also say that the bracelet contains a diamond and thus the diamond is also a part of Martin, because “part of” is a transitive relationship. But if we say that the bracelet is just something that characterizes him, it’s not clear that that is transitive because a bracelet may be characterized by being golden but this does not imply that Martin is golden!

You might make the wrong choices about things initially, but it’s easy to change your decision because the definition of the relationship is made independently of all the data where you use it. You’ll figure out the bugs in your wordings as you go, and it’s precisely this reworking that is learning.

The usefulness of a language interface becomes clear now. It’s much easier to edit your notes than to maintain a database.

The 80/20 split? Context!

So far, this is straightforward enough, if not exactly easy. But we still haven’t addressed the hard question of determining the relevance of the information we’ve collected. We can easily turn it into a giant web, but where does one start and where does it end? Apart from a brute force text search, how can we find things related to other things that respect what we’ve noted down?

The trend in modern computing has been to go for “big data” — pushing for more and more evidence to find what’s right. But this is the opposite of what our emotional brains do. Rather than seeking an early end to toil by coarse grained approximation (less is more), big data enthusiasts argue that more is more. No expense spared!

Here we seek the opposite of big data. We want data to be cheap, small, and relevant. Ironically, this implies that the context for determining relevance of knowledge has to be larger than the key knowledge information itself. One of the findings of the project, predicted by Promise Theory, is that knowledge seems to be dominated by the search for its relevance. This helps to explain why modern “AI” methods get so expensive so quickly too. Most of the data and computational expense are actually to do with contextualization. And therein lies a lesson about the nature of reasoning, glossed over in the narratives inherited about logic: science seeks “suitably idealized simplifications” in its strategy for distilling principles. That eventually stokes a tension with the extensive nature of knowledge itself, but it summarizes the human approach. We are looking for emotionally resonant summaries, trusting that details can be expanded and validated on a social level, as part of a larger process. This is the scaling of knowledge.

We use the term “context” in several ways.

  • Strategic indexing context: When taking notes, we use headers and headlines to describe a context of “aboutness”: keywords, perhaps in a phrase, that describe what we’ll find in the passage. This is a strategic use of context. It says: my strategy was to put this here so you would find it, if you looked up these keywords. Or: what index item would I look to when searching for this later?
  • Scene description: thanks to our senses, the context in which we think of something may depend on a complex web of happenings (of belonging and causality) that’s ultimately summarized by a sort of snapshot of the state of the scene that we think of as context. The fullest possible description of context is thus a background story for the moment. Think of forensic investigators solving a mystery. They assemble context as factual descriptors, causal motives and how all of the above come together.
  • As our current state of mind: the short term cache of memory of what is going on around us, keeping track of relevance.
  • Our current intent.

We might imagine a kind of Scene Description Language to be the ultimate goal of describing context. But, pondering this for a while leads to the realization that this imaginary Scene Description Language is just a form of the semantic spacetime that we are developing under N4L. It’s not a different thing, just a different part of the same model. One region connected to another. What distinguishes context is the way it’s used. For instance:

Martin was there. He knew Sally and came to help her carry a heavy box of wine. Because it was heavy, he left it on the table, which collapsed and broke a bottle of ammonia. The ammonia gave several people breathing difficulties, and one of them had a heart attack…

This kind of description, leading up to some key event, requires all the spacetime categories of representation: what followed from what, what was inside where, what properties did items have, who was close together, etc? From the perspective of an ontology about death or murder, all this is very far removed from anything one might write under that heading. In an ontology, one might write: stabbing, gunshot, etc, nothing to do with carrying boxes of wine. So in practice, the details that form the context of a murder could be entirely unrelated appendages that seem far away in a knowledge graph. Nevertheless, when we hear them, they will trigger an association with the murder. Just as the suggestion of pizza might trigger murder by some association with an episode of the Sopranos. Our minds are not tidy or logical, they are associative.
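One way to picture associative rather than taxonomic recall is to tag events with sets of context markers and retrieve by overlap with the current context. This is only a toy sketch (the event names and tags are invented for the wine-box story above), not the project’s actual mechanism:

```python
# Events tagged with context sets, following the wine-box story.
events = {
    "carrying wine box": {"Martin", "Sally", "party", "heavy"},
    "table collapse":    {"party", "table", "accident"},
    "ammonia spill":     {"accident", "chemical", "breathing"},
    "heart attack":      {"breathing", "medical", "death"},
}

def recall(current_context, events, min_overlap=1):
    """Return events whose context overlaps the current one,
    regardless of where they sit in any ontology."""
    return [name for name, tags in events.items()
            if len(tags & current_context) >= min_overlap]

# A "death" context pulls in seemingly unrelated domestic events:
print(recall({"accident", "death"}, events))
```

No heading like “murder” appears anywhere, yet the relevant events surface through their shared circumstances.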

A large scale knowledge structure for a crime scene may involve many separate regions that originally came from different contexts. How does our new context find its way to wire these up in the present?

But context is more about scene description. For example, think of any forensic investigation from your favourite TV show (CSI, Silent Witness, Agatha Christie, etc).

In IT knowledge systems, the concept of a formal “ontology” has found unmitigated popularity amongst researchers and was turned into a simple-minded logic as OWL (perhaps, cynically, because it’s easy to write papers about). This is the idea that there is a kind of spanning tree of correct categorical meanings for knowledge, which is part of the simplistic thinking in IT. Unfortunately, this idea has been shown to be flawed, if not actually false, many times. Taxonomies and ontologies are merely spanning sets for naming regions of a network. Yes, they are attempts to put things into boxes to keep them tidy. But the boxes are not mutually exclusive (is a duckbilled platypus a mammal or an egg layer?). The imposition of a complete logic onto incomplete knowledge is a poor strategy at best for organizing.

There are always many possible spanning trees, or interpretations that classify meaning. When we tell a story we might start at the beginning and follow chronology, or we might start with the outcome and work backwards. We might hop back and forth atemporally in order to discuss the relevance of the parts to the whole. The context is not a spanning tree.

By following trails of thought, we are assembling a trail of prerequisites that our brains, evolved for navigation in a landscape, can understand. If we fall prey to the conceits of perfect logic we will tend to over-constrain information so that it becomes impossible to find without the precise criteria used to store it. This is probably the most common mistake in using tools like RDF with OWL (the web ontology language) as these are based on first order logic. Logic ends up working against us, because we need to reproduce the precise lookup key to find what we’re looking for — and we might not even be clear about what we’re looking for! The main benefit of memory is creative composition of ideas, by “mixing it up”.

Emergent reasoning: no quick fix

The set of relations we want to express in a context feels hard to imagine in the moment of trying to write it down. Think of trying to write a police statement after something has happened. You might be trying to direct the account of the details towards some leading text, instead of simply describing what you know, because you don’t see how those details are related to the outcome (yet). In retrospect, things are different. By working through our notes, and reworking them (refactoring the code, as programmers say), we gradually refine the scraps into symbolic actions and events that tell a pleasing story.

Let’s be clear: a pleasing story is an emotionally pleasing story, not an algebraically correct story. The language one finally ends up with becomes a kind of specialised language of similar concerns: motive, method, culprit, victim, etc. One extracts patterns by extreme processing of the information to distill its essence over time. This helps to compress the information and make it easy to pass on as language.

Other processes include evidential reasoning in a court of law, with a final emotional interpretation by a trusted judge, or the telling of many different stories about a phenomenon in research journals (called theories or models) which are eventually selected by peer reception. Evolution works by statistical persistence over time in a messy environment. These are all valid process models, but if we try to shoehorn a particular scenario into an inappropriate process (as computer programmers sometimes do with their data models), one can easily end up in a dead-end scenario.

An N4L approach to the murder scene
Knowledge has many elements across different scales that we try to sew together. From sensory information about the world, we abstract data and embed it into stories.

Wrapping up the story

The uncomfortable part of knowledge representation is that it won’t be clean and tidy. That tidiness we seek requires a lot of work to distill down to a specialized language. Thus language (not logic) is key to the process of summarizing and reworking knowledge–only by tidying, reworking, forgetting, and purposely deleting facts can we emerge with a coherent story.

Whether it’s spoken or written or not, the distillation of events into a symbolic stream by coarse graining and pruning of references is a kind of algorithm for memory management. Those of us who write to understand learn to do this as a process (a job even!). Teachers and pedagogues too are adept at writing stories or threads to communicate ideas in a resonant way. There is thus a social level to knowledge. We tend to think that knowledge is only within us, but in fact its traces and cues are all around us. Most of the memory we rely on is outside our heads, as we’re guided by places, roads, things and processes.

The search for meaning, in this sense, is a classic search for root cause: from symptoms at the bottom to some high level matroidal event that encapsulates everything subsequent to it.

We learn from the bottom up, but we communicate typically from the top down. We speak in generalities first as an economic strategy, hoping that we can avoid having to explain the details, but trusting that they can be filled in if we need to. Language, after all, is a one-dimensional stream of symbols that summarises a timeline section through a complex history. Yet we are adept at building a picture based simply on narrative. It’s not as easy as we imagine, and the truth of things gets mangled without careful analysis, yet language is powerful at triggering sympathetic response and understanding in others.

The difficulty comes when we want to remember. Where do we start? Where in the network do we jump into the story? Most likely there is a parallel activation of many possibilities until we settle on one. Our consciousness has an attention mechanism that feeds us one story at a time (except perhaps in those who have multiple personality disorders?). Perhaps this is the origin of conscious experience–as a necessary timeline formulator to make sense of memory and experience.

To summarize, perhaps the deepest and hardest part of knowing something is this role played by context: the web of historical circumstance that illuminates the many changes that happen around us and which classifies each moment in our memory in such a way that we can make use of that information in future. Knowledge is fundamentally actionable. This shouldn’t surprise us: after all, the most likely reason for our advanced mental capabilities surely lies in an outgrowth of the skills needed in the animal kingdom, to navigate routes through terrain and recognize friend, foe, and food.

Please stay tuned to this project, and do play around with the software. I'm looking for volunteers to use and abuse it! https://github.com/markburgess/SSTorytime/tree/main

