Wikis as Multigraphs of Text

Where I explore designing a type definition of a Wiki, without code this time

I have a lot of wacky computer science ideas, and more of them will show up as blog posts in the future. This is the most recent one, and therefore the least pickled.

What is a Multigraph, anyway?

This is a concept in discrete mathematics, and it is a more general form of a graph. So, what is a graph?

I think that the best way to explain these kinds of graphs is to use a different term: network. Most people have heard of a network. A social network includes you and all of your friends and family, and all of their friends and family, and so on and so on. I’m going to borrow a picture from Wikipedia.

Imagine that each circle with a number is a person, and each line between them is a friendship. That means that 1 is friends with 2 and 5, while 6 is only friends with 4.
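Here is a small Python sketch of that friendship network as a list of edges. Only the friendships of 1 and 6 come from the description above; the remaining pairs are assumed, just to fill out the example.

```python
from collections import defaultdict

# Undirected friendships as pairs of people. Only 1-2, 1-5 and 4-6 come from
# the description above; the other pairs are assumed for illustration.
FRIENDSHIPS = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]


def neighbors(edges):
    """Build a lookup from each person to the set of their friends."""
    lookup = defaultdict(set)
    for a, b in edges:
        lookup[a].add(b)  # friendship goes both ways,
        lookup[b].add(a)  # so record it in both directions
    return dict(lookup)


friends = neighbors(FRIENDSHIPS)
print(sorted(friends[1]))  # [2, 5]
print(sorted(friends[6]))  # [4]
```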

However, since graphs are more abstract than groups of friends, we can use them to represent many different things. And there are different kinds of graphs too! A multigraph is a graph where the same two circles can be joined by more than one edge (those lines between the circles), which means the edges can be of different kinds. In the picture below, they are different colors.

A multigraph with three kinds of edges: grey, red, and blue
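One way to sketch this in Python is to store each edge with its kind attached, so that two vertices can be joined by more than one edge. The colors come from the caption; the vertex numbers and the particular edges are made up.

```python
# Each edge is (vertex, vertex, kind). The same pair of vertices may appear
# more than once, which a plain graph does not allow.
# The specific edges below are hypothetical.
EDGES = [
    (1, 2, "grey"),
    (1, 2, "red"),   # a second edge between 1 and 2, of a different kind
    (2, 3, "blue"),
    (3, 1, "grey"),
]


def edges_of_kind(edges, kind):
    """Return only the edges of one kind, i.e. one layer of the multigraph."""
    return [(a, b) for a, b, k in edges if k == kind]


print(edges_of_kind(EDGES, "grey"))  # [(1, 2), (3, 1)]
print(edges_of_kind(EDGES, "red"))   # [(1, 2)]
```

Filtering by kind gives back an ordinary graph, which is the property the rest of this post leans on.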

What I am theorizing is that a multigraph can represent wikis themselves.

How are Wikis Multigraphs?

Graph of Hyperlinks

Wikis have hyperlinks between documents. Here is a hyperlink to the Wikipedia page on hyperlinks. We can represent this kind of link between one bit of text with another in a graph. Let’s take a look at that first picture again:

Now imagine that each of those circles represents a Wikipedia page, and that each line between them represents a hyperlink.

The Outline is a Directed Acyclic Graph

A Directed Acyclic Graph (DAG) is a type of graph where all the edges have a sense of direction (directed) and you can never follow the arrows around and end up back where you started (acyclic). A good, real-life example of this is a family tree (from Wikipedia again).

This DAG can’t have loops in it because people cannot be a parent and a child to themselves, nor can you be your own grandpa (biologically anyway). The directed nature of this graph can go either way: you can say that the arrow points to parents, or to children. Typically, we want our DAGs to point to one particular thing, so we will say that the arrows point to parents.

Why am I bringing this up? Because I realized that the outline structure of a document is also a DAG. Each Heading has several blocks of text under it, like the children in the family tree. And each Heading itself is a subheading of a bigger Heading. The concept of the “document” itself can be seen as the Lucas Grey of our family tree. Each document in a wiki is a DAG.
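To make that concrete, here is a minimal Python sketch of an outline where every block points at its parent heading, the same direction as the family tree arrows. The block names are invented. Following the arrows from any block eventually reaches the document, and a loop would show up as a block being visited twice.

```python
# Each block points to its parent heading; the document itself has no parent.
# All of these names are hypothetical.
PARENT = {
    "document": None,
    "heading: Intro": "document",
    "heading: Details": "document",
    "paragraph 1": "heading: Intro",
    "paragraph 2": "heading: Details",
    "subheading: Edge cases": "heading: Details",
    "paragraph 3": "subheading: Edge cases",
}


def path_to_root(block, parent):
    """Follow parent arrows upward; raise if the arrows ever loop back."""
    path = [block]
    seen = {block}
    while parent[block] is not None:
        block = parent[block]
        if block in seen:
            raise ValueError(f"cycle found at {block!r}, so this is not a DAG")
        seen.add(block)
        path.append(block)
    return path


print(path_to_root("paragraph 3", PARENT))
# ['paragraph 3', 'subheading: Edge cases', 'heading: Details', 'document']
```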

Therefore Multigraph?

So, you have a graph of text blocks that form DAGs of documents, but each of those text blocks can also connect up to other text blocks in other DAGs with hyperlinks. So, what if we had a graph with two kinds of edges, the Outline kind, and the Hyperlink kind? An entire wiki could be theoretically stored in a single multigraph of text blocks, with Outline DAGs encoding documents, and Hyperlink graphs representing the connections between all the documents.
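Here is one possible shape for that type, sketched in Python. The names `Wiki`, `Edge`, and `EdgeKind` are placeholders, and this is only one way to write it down, not a finished design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class EdgeKind(Enum):
    OUTLINE = "outline"      # child block -> parent block, within a document
    HYPERLINK = "hyperlink"  # linking block -> linked block, maybe across documents


@dataclass(frozen=True)
class Edge:
    source: str  # id of the block the edge starts from
    target: str  # id of the block the edge points to
    kind: EdgeKind


@dataclass
class Wiki:
    blocks: Dict[str, str] = field(default_factory=dict)  # block id -> text
    edges: List[Edge] = field(default_factory=list)

    def targets(self, source: str, kind: EdgeKind) -> List[str]:
        """Blocks reachable from `source` by one edge of the given kind."""
        return [e.target for e in self.edges
                if e.source == source and e.kind == kind]


# A made-up wiki: one paragraph that sits under one heading and links to another.
wiki = Wiki(
    blocks={"graphs": "Graphs", "graphs/p1": "See also DAGs.", "dags": "DAGs"},
    edges=[
        Edge("graphs/p1", "graphs", EdgeKind.OUTLINE),
        Edge("graphs/p1", "dags", EdgeKind.HYPERLINK),
    ],
)
print(wiki.targets("graphs/p1", EdgeKind.OUTLINE))    # ['graphs']
print(wiki.targets("graphs/p1", EdgeKind.HYPERLINK))  # ['dags']
```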

Conclusion

I believe that the type definition of a Wiki would be a multigraph of text. I think that you could store an entire wiki inside of a graph database and have a really interesting architecture for a wiki engine. But that’s for next time 🙂

Source Code in a Wiki?

This is an add-on to my previous post, Wikis as Multigraphs of Text.

As of right now, most source code is stored as text files in a file system. And this has worked out fine for the industry for decades. But there are other ways of storing programs. What if we stored source code in a wiki?

As a reminder, in my previous blog post I stated that a Wiki data type could be encoded as a multigraph, that is, a graph with at least two kinds of edges. The Outline kind encodes the order of text blocks for output, and the Hyperlink kind tracks hyperlink references between blocks.

Why should we put source code in wikis?

For the purposes of this blog post, I’d like to avoid the question of the usefulness of this paradigm, and just make this statement: I think this is a fun thought experiment, not a call for change.

If/when I do implement some kind of system based on these ideas, I will report that in another post.

I think that a wiki could be an excellent way for a group of programmers to maintain a codebase over time. Utility scripts could be stored with their documentation, comments, and examples alongside. If the wiki were version controlled, then all of that information would be as well, right next to your code instead of in a different place. And if that wiki were a multigraph, then you could use graph theory to structure your code instead of a bytestream.

Here’s a made-up example: let’s say that there is a license/copyright comment block at the top of every file in your codebase. In real-world codebases, there is often a script or automation of some kind that makes sure all of these comment blocks are up to date. If the source code were encoded in a graph, you could have a single vertex that holds the license text, and any number of references to it. If you have the system present the code as a file system (as a particular view into your graph data), then all the files would have that same comment block at the top.
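A tiny Python sketch of that example, assuming a hypothetical layout where each file is an ordered list of vertex ids and the file view is produced by joining the referenced vertices:

```python
# Vertices hold text; the license text is stored exactly once.
# All ids, paths, and contents here are made up for illustration.
VERTICES = {
    "license": "# Copyright (c) Example Authors. MIT License.",
    "util_body": "def add(a, b):\n    return a + b",
    "main_body": "print('hello')",
}

# Each file is an ordered list of references to vertices.
FILES = {
    "util.py": ["license", "util_body"],
    "main.py": ["license", "main_body"],
}


def render(path, files, vertices):
    """Present one file of the graph as plain text."""
    return "\n".join(vertices[v] for v in files[path])


# Updating the single license vertex changes every rendered file.
VERTICES["license"] = "# Copyright (c) Example Authors. Apache-2.0."
print(render("main.py", FILES, VERTICES))
```

No script has to touch the individual files, because the files only exist as a view.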

How does this relate to Literate Programming?

This is an evolution of Literate Programming, and as such, the tangle and weave functions from that paradigm are required. The weave function creates a website that allows users to view the graph data as prose. The tangle function creates that view into the graph data of a filesystem with the source code in plaintext.

What types do we need for the multigraph?

So, the types needed to encode this change would have to expand to include different kinds of text blocks, and include a new type of edge.

In the last post, the Wiki only had one type of block, which was just Text. To tell source code apart from everything else, we need at least two kinds of blocks: Prose and Code. If we wanted our wiki to be a polyglot environment, the Code block could also record what language the code is in, as Text or possibly as a sum type.
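In Python, those two kinds of blocks could be sketched as a pair of dataclasses, with a small enum standing in for the sum type of languages. The names and the list of languages are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Language(Enum):
    # A closed set of languages; plain text would work too.
    PYTHON = "python"
    HASKELL = "haskell"


@dataclass(frozen=True)
class Prose:
    text: str


@dataclass(frozen=True)
class Code:
    text: str
    language: Language


# A block is either Prose or Code.
Block = Union[Prose, Code]


def describe(block: Block) -> str:
    """Tell the two kinds of block apart."""
    if isinstance(block, Code):
        return f"code ({block.language.value})"
    return "prose"


print(describe(Prose("Adds two numbers.")))        # prose
print(describe(Code("x = 1", Language.PYTHON)))    # code (python)
```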

The Outline edge type can also be divided into two kinds, which I’m naming after the two Literate Programming functions: Tangle and Weave. Both the Tangle and Weave types could include a filepath as a property. So, during the tangle operation, the system would output the files as they should be for compilation, and the weave operation would generate the pages for the wiki for user viewing.
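Here is a rough Python sketch of how the two operations might walk those edges. Everything in it is an assumption: blocks are `(kind, text)` pairs, each edge carries an edge kind, a file path, and a block id, and blocks are emitted in the order the edges are listed.

```python
from collections import defaultdict

# Hypothetical data: block id -> (kind, text)
BLOCKS = {
    "intro": ("prose", "This script greets the user."),
    "greet": ("code", "print('hello')"),
}

# Hypothetical edges: (edge kind, file path, block id), listed in output order.
EDGES = [
    ("weave", "greet.html", "intro"),
    ("weave", "greet.html", "greet"),
    ("tangle", "greet.py", "greet"),
]


def collect(edge_kind, edges, blocks):
    """Group block texts by file path, following only edges of one kind."""
    files = defaultdict(list)
    for kind, path, block_id in edges:
        if kind == edge_kind:
            files[path].append(blocks[block_id][1])
    return {path: "\n".join(texts) for path, texts in files.items()}


def tangle(edges, blocks):
    """Source files as they should be for compilation."""
    return collect("tangle", edges, blocks)


def weave(edges, blocks):
    """Pages for reading; a real weave would also render markup."""
    return collect("weave", edges, blocks)


print(tangle(EDGES, BLOCKS))  # {'greet.py': "print('hello')"}
```

The same Code block shows up in both outputs, once surrounded by prose and once on its own.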

The End

This is a fairly new idea for me, and I don’t know a lot of the details yet. I’m going to read up on TinkerPop and Goblin and see if I can’t test out some of these ideas on my own.

Thanks for reading!