Universal Data Analytics as Semantic Spacetime

Mark Burgess
Aug 16, 2021

Part 2: Setting up the Tools

I confess to having a love and hate relationship with tech. In an ideal world, one could skip this post, because software would be easy to get started with, and the answers to simple questions would be easy to find — but that’s not always the case. As a researcher first and technologist second, my patience for geeking out over someone else’s tech is limited. I want to explore ideas by playing around on a laptop, free of the need to pay for cloud servers, or install container runtime environments with all the associated overhead. There are two things we need, in particular, to be able to play with Semantic Spacetime models: a programming language and a flexible data store. In this post I’ll show how to set up those.

Feel free to skip this episode if you already have ArangoDB and Go up and running. Also, please note that code formatting in Medium is rather stupid, so the rendering might need some interpretation!

Programming language

Although I had imagined starting with popular Python to explain the concepts of semantic spacetime, I’ve instead chosen the ugly-mug language “Go” (or Golang), both for its speed and simplicity, and for its mature language driver. Having a stable and mature driver is a big issue in database usage, because otherwise databases are a pain to work with. Go could hardly be called pretty, but it has several advantages over Python, and its ad hoc and opinionated rules, which will infuriate from time to time, are tolerable. Go really has what you need, at your fingertips, so it’s a good investment for speed and efficiency.

Data store

Next we need a convenient data store, or some kind of database, one that doesn’t require a three year training certificate to operate. After I had played with numerous implementations of Semantic Spacetime over the past decade, using MySQL, MongoDB, Postgres, Neo4j, and eventually plain filesystem implementations, a former research student stumbled across ArangoDB, which suddenly made difficult things easy. ArangoDB has therefore become my database of choice for a wide range of things. It’s a multi-model database system, which works as a key-value, document, and graph data store, giving flexible storage so that we can choose what’s best without having to learn multiple technologies. It also has quite good language support. In fact, Arango seems poised to become much more than a database system: it’s being extended into a pretty powerful integrated data processing platform, with nascent support for user extensions, processing algorithms, and “Pregel” graph flow computation models, executed across clusters. It feels like a good investment, even for someone of limited tolerance for tech features and details like me.

In what follows, I’ll assume you’re using Linux or a Mac with a shell environment. I don’t expect Windows to be a problem, but that’s not my world. Linux is by far the simplest environment in which to do research programming.

Installation

So we have two getting started challenges:

  • Installing Go(lang) on localhost
  • Installing ArangoDB on localhost

Both of these can be downloaded as easy-to-install packages. Just follow the “destructions”! We just start by searching the Internet for those packages (they don’t seem to be available through the Linux distributions). You can find the Go language here with steps to install:

https://golang.org/dl/

After installing a package for your operating system, you need to set up some things in your environment so that you can forget about golang for the rest of your tortured life. One less thing to fret over.

You’ll need a command window (shell). Then create some directories for the Golang workspace. These are used to simplify the importing of packages.

% mkdir -p ~/go/bin
% mkdir -p ~/go/src
% git clone https://github.com/markburgess/SemanticSpaceTime
% ln -s ~/clonedirectory/pkg/SST ~/go/src/SST

The last step links the directory where you will keep the Smart Spacetime code library to the list of libraries that Go knows about. You’ll also need to set a GOPATH environment variable and add the installation directory to your execution path. For Linux (using the default bash shell), edit the file “~/.bashrc” in your home directory using your favourite text editor. It should contain these lines, as per the golang destructions:

export PATH=$PATH:/usr/local/go/bin
export GOPATH=~/go
# Set a short prompt
export PS1="mark% "

Don’t forget to restart your shell or command window after editing this.
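To check that the new settings have taken effect, a few lines of Go will do. This is only a sketch of my own, not part of the SST library: the file name (say, checkenv.go) and the helper checkEnv are made up for illustration. It reads GOPATH and looks for the go command on the execution path.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// checkEnv reports the GOPATH value and where (if anywhere) the named
// tool is found on the execution path. (Hypothetical helper.)
func checkEnv(tool string) (gopath string, toolPath string, err error) {
	gopath = os.Getenv("GOPATH")
	toolPath, err = exec.LookPath(tool)
	return gopath, toolPath, err
}

func main() {
	gopath, goBinary, err := checkEnv("go")
	if gopath == "" {
		fmt.Println("GOPATH is not set; check ~/.bashrc and restart the shell")
	} else {
		fmt.Println("GOPATH =", gopath)
	}
	if err != nil {
		fmt.Println("go was not found on PATH:", err)
		return
	}
	fmt.Println("go found at", goBinary)
}
```

Run it with “go run checkenv.go” from a fresh shell. If GOPATH comes back empty, the edit to ~/.bashrc has probably not been picked up yet.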

Since version 1.13 of Go, big changes have been made (and are expected to continue going forwards, sigh) concerning “modules” design. Unless you know what you’re doing, disable modules by running:

% go env -w GO111MODULE=off

To use the Go Driver, download it

% go get github.com/arangodb/go-driver

Testing Go

Try writing some simple programs in golang to learn its quirks. The most annoying of these is the forced placement of curly braces and indentations.
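As a first taste, here is a minimal sketch of a complete program (my own illustration, with a made-up greeting function). The comments point out where Go insists on brace placement. The standard gofmt tool will fix the indentation for you.

```go
package main

import "fmt"

// greeting builds the message separately so that it can be reused.
func greeting(name string) string {
	return "Hello, " + name + "!"
}

func main() { // the opening brace must stay on this line
	for i := 1; i <= 3; i++ {
		if i%2 == 0 { // no parentheses around the condition, but braces are required
			fmt.Println(i, "is even")
		} else { // "else" must sit on the same line as the closing brace
			fmt.Println(i, "is odd")
		}
	}
	fmt.Println(greeting("spacetime"))
}
```

Save it as, say, hello.go and run it with “go run hello.go”. Try moving a brace onto its own line to see how the compiler complains.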

Notice how functions (like fmt.Println()) in Go are called through a package or an object, using a dot notation:

variable = object.function(ctx, …)

There are also two ways to define new variables. In Go, assignments to new variables can be made without spelling out their types by using the “:=” operator (which defines a new variable of a compatible type), instead of “=” (where the variable has to have been declared already). It’s a nice shorthand once you understand the types, but it’s a barrier to learning if you don’t, and you’ll need the types when passing values as parameters. The online documentation for the driver uses this shorthand a lot, making it hard for a Go/Arango novice to see what types are being referred to. So, in the examples, I’ve tried to make these explicit.
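The difference between the two forms is easiest to see side by side. The following is a small sketch with invented names (describe, name, count, weight), not taken from the driver. The %T verb in fmt.Printf prints the type that the compiler has inferred, which is a handy way to find out what “:=” actually gave you.

```go
package main

import "fmt"

// describe takes explicitly typed parameters, which is why the types
// behind ":=" matter as soon as values are passed to a function.
func describe(name string, count int, weight float64) string {
	return fmt.Sprintf("%s: count=%d weight=%.1f", name, count, weight)
}

func main() {
	// Long form: declare the variable with its type, then assign with "=".
	var name string
	name = "node"

	// Short form: ":=" declares and assigns, inferring the types.
	count := 3    // inferred as int
	weight := 1.5 // inferred as float64

	// %T prints the type of each value.
	fmt.Printf("%T %T %T\n", name, count, weight)
	fmt.Println(describe(name, count, weight))
}
```

Using “:=” on a variable that already exists in the same scope is a compile error, as is using “=” on one that has never been declared.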

Go also has functions that can return lists of values (separated by comma). If we don’t want to receive a value, on the left hand side of an assignment, we use “_” to reject it:

var links A.Collection
// EdgeCollection returns three values; only the collection is kept here
links, _, _ = graph.EdgeCollection(nil, "Near")
// var exists bool omitted
exists, _ := links.DocumentExists(nil, key)
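The fragment above needs a running database, so here is a self-contained sketch of the same idea that you can run straight away. The function documentExists is a hypothetical stand-in that I’ve made up to mimic the shape of a driver call (a result plus an error). It is not part of the Arango driver.

```go
package main

import (
	"errors"
	"fmt"
)

// documentExists is a hypothetical stand-in for a driver call: it
// returns a result and an error, like many Go library functions.
func documentExists(store map[string]bool, key string) (bool, error) {
	if key == "" {
		return false, errors.New("empty key")
	}
	return store[key], nil
}

func main() {
	store := map[string]bool{"a": true}

	// Receive both values and check the error.
	exists, err := documentExists(store, "a")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("a exists:", exists)

	// Discard the error with "_" when only the answer is wanted.
	missing, _ := documentExists(store, "b")
	fmt.Println("b exists:", missing)
}
```

Throwing away errors with “_” is fine for playing around, but it hides failures, so it’s worth checking them once the code starts to matter.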

ArangoDB

Next, the database. In case you were wondering, I didn’t choose Arango to match Go, though the two work quite well together (forming AranGolangs, which are apparently king of the programming jungle).

You can run Arango in the cloud, but having it on your laptop or standalone machine is a lot faster, and by far the easiest and quickest way to become familiar with it and to develop techniques. Unless you have very large datasets, or you’re starting a global service business, there’s no reason to go to the cloud right away. That’s in keeping with my general advice to “never send anything over the network if you don’t have to”.

Download the community edition of ArangoDB to learn the ropes.

https://www.arangodb.com/download-major/

After installing, there’s a somewhat hidden script to run, called arango-secure-installation, which sets the root password for the database. Then there’s the startup script for Linux and other Unix-like systems, called arangodb3.initd, which could be added to system startup.

Following the instructions on the download/installation page, for me (using my preferred OpenSuse Linux) this looked like this.

Sudo will ask you for a password the first time you use it — you give the root password to allow installation. No drama, except to locate the startup files, which fall outside the regular command path (as they require root/sudo access). To start the database simply run:

% sudo /usr/share/arangodb3/arangodb3.initd start

If you’re upgrading to the latest from an existing installation, you also need to run

% sudo /usr/share/arangodb3/arangodb3.initd upgrade

You should now be able to connect to the running webserver for Arango, which provides a nice management and visualization interface. Now type this URL into your browser:

http://127.0.0.1:8529

or simply

http://localhost:8529

into your favourite web browser (localhost, 127.0.0.1, and ::1 are the local, non-network addresses of your computer, which allow communication without venturing onto the Internet, and 8529 is the default service or port number for Arango, which distinguishes it from other services like the web on port 80 or SSH on port 22, etc).

To access databases on other hosts, you would use a network address just by replacing the URL. For example, ArangoDB offers rentable cloud instances through its Oasis programme. Then you might end up with a URL looking like this:

https://5a8db345269f.arangodb.cloud:8529/

The Arango browser interface

When you connect to the Arango user interface, there are some quirks to finding your way around. In the login window, choose username “root” and the password you set in the secure script above. Then you are dropped into the system database.

Like most databases, Arango uses itself as a management system, so there is a master database called “_system” (named with an initial underscore in typical developer style). This is where you go to see an overview of all databases registered, and delete unwanted databases. Deleting is something you can expect to do a lot in data analytics, because we use many databases as part of a computation rather than as an archive of record. You can also use the arangosh shell, which is useful for testing too.

Once inside the system database, note the red circled items in the image below. Arango’s opening screen shows a number of monitoring traces to show that the database is alive. Click on the left panel database item to show the currently registered databases. This is where you can delete databases too, by clicking on the “settings cog-wheel” in the top right of each panel item — shown as a red circle in the centre of the image below.

Figure 1: screen shot of the Arango web interface, system database

For some reason, clicking on the database boxes themselves, in the list, doesn’t enter and change the default view to select them. Perhaps that will change in a later version. To go to the database, go to the top right (red circle) and click to select a database from the popup menu. A bit quirky, but once learned, simple enough.

With the database and language installed, you can try clicking here to create a database, or running a simple program to create a small graph database.

Go and Arango together — pretty powerful

We haven’t explained how to write golang code to access the database yet. That’s coming in future posts. You can still try this simple test program to create a database, even without understanding it. As you see, it looks much too complicated and messy — a lot of work for very little! — so our goal (early on) will be to simplify it with a layer of tooling that’s more suited to research programmers than software developers.

The full code for this example is here so you don’t have to paste.

This example code (above) shows how the raw API is a developer-centric interface. As you can see, it wasn’t designed for casual users: there’s a lot of “fuss” to manage, which developers need in order to access a web-based service interface. As casual researchers, we can do a lot better to simplify the process of working with Arango. I’ll come back to that in part 4.

Meanwhile, just copy the example into a file called, say, graphAPI-1.go and then run the code by typing

% go run graphAPI-1.go

Run it once, run it twice, run it thrice, and see what happens. The first time, it should create a database called “my_test_graph_database”. You can verify this by going to the database list on the web. The second time, it will fail because the database already exists. This is the kind of behaviour that developers design. It probably feels a bit brutal to a researcher, but these are the issues I want to cover in this series. Building up a comfortable and easy interface to these tools is straightforward.
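To see the same create-then-fail behaviour without the driver, here is a sketch that needs only the Go standard library. Note that this is not the linked graphAPI-1.go example, which uses the Go driver. Instead, it talks directly to Arango’s HTTP interface (a POST request to /_api/database, which answers 201 when the database is created and 409 when the name is already taken). The function names and the password “mypassword” are placeholders of my own.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// createDatabase asks the server at baseURL to create a database via
// Arango's HTTP interface and returns the HTTP status code of the reply.
func createDatabase(baseURL, user, password, name string) (int, error) {
	body, err := json.Marshal(map[string]string{"name": name})
	if err != nil {
		return 0, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/_api/database", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	req.SetBasicAuth(user, password)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// explain translates the status codes of interest into plain words.
func explain(status int) string {
	switch status {
	case http.StatusCreated:
		return "created"
	case http.StatusConflict:
		return "already exists"
	case http.StatusUnauthorized:
		return "wrong username or password"
	default:
		return fmt.Sprintf("unexpected HTTP status %d", status)
	}
}

func main() {
	// The password is a placeholder: use the one set during installation.
	status, err := createDatabase("http://localhost:8529", "root", "mypassword", "my_test_graph_database")
	if err != nil {
		fmt.Println("could not reach the server:", err)
		return
	}
	fmt.Println("my_test_graph_database:", explain(status))
}
```

Here the second run reports that the database already exists rather than stopping with an error. Whether that is friendlier is a matter of taste, but it shows the kind of small wrapper that makes these tools more comfortable.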

Try deleting the database manually in the browser and running the script again.

Then, switch to the database you’ve created (my_test_graph_database) and click on the following items in the panel on the left hand side of the screen (see figures):

  • Collections
  • Links (edges)
  • Nodes (vertices)
  • Graphs

There! These are the main action items you’ll use in daily life. The screen shots below show what the interface looks like for simple management tasks.

Figure 2: Change to the graph database and see the nodes and links. Under Collections, you find nodes and links (vertices and edges). Click to view.
Figure 3: Under Graphs, you find the graph model derived from the node and link collections.
Figure 4: Clicking on the model reveals a simple visualization of a few nodes. To get the full set, you have to click on the circled top right.

Note that the default graph view only loads a few nodes (a default number that can be configured, as rendering is time consuming). Click on the icons top right to see the whole thing.

That’s it! With these preparations, you shouldn’t have to think about these details again. At worst, you might have to restart the database by rerunning the start script if you reboot your computer.

In the next part, I’ll discuss multi-model data with graphs and documents in a familiar setting.


Mark Burgess

@markburgess_osl on Twitter and Instagram. Science, research, technology advisor and author - see Http://markburgess.org and Https://chitek-i.org