March 02, 2022

The graph-relational database, defined

On the heels of EdgeDB’s recent 1.0 launch, we (rather predictably) received one question more than any other: what is a graph-relational database? This post is intended to be the internet’s canonical answer to that question.

The graph-relational model is a new conceptual model for representing data. Under this model, data is represented as strongly typed objects that contain set-valued scalar properties and links to other objects.

type Person {
  required property name -> str;
}

type Movie {
  required property title -> str;
  multi link actors -> Person;
}

Keep in mind that “graph-relational database” is not synonymous with “EdgeDB”. EdgeDB is just the first production-ready database that implements the graph-relational model. Similarly, EdgeQL is not a definitional part of the paradigm; it’s simply our proposal for an open, implementation-independent, graph-relational query language.

select Movie {
  title,
  actors: { id, name }
} filter .title = "The Avengers"

In the future, though, other graph-relational databases may exist with different type systems, schema syntax, and query languages.

Graph-relational is best understood as a descendant of the relational paradigm. The table below provides a terminology map between the shared concepts.

| Relational | Graph-relational |
| --- | --- |
| Table (“relation”) | Object type |
| Column (“attribute”) | Property or link |
| Row (“tuple”) | Object |

However, the graph-relational model extends the relational paradigm in three major ways: object identity, links, and cardinality. We’ll discuss each in detail below.

All objects have a globally unique, immutable identifier. There’s no need to explicitly declare this identifier in your schema; it is assumed. In EdgeDB it’s represented as a required, readonly property called id that has an exclusive constraint, is auto-assigned a UUID upon insertion, and will never be reused. In SDL, this would be represented as follows:

required property id -> uuid {
  constraint exclusive;
  readonly := true;
  default := uuid_generate_v1mc();
}

In the future, other graph-relational databases can represent identity differently; all that matters is that there is some concept of identity.

Relational databases don’t do this; tracking object identity requires adding an appropriately typed column, marking it as a primary key, and specifying a uniqueness constraint. This column can then be used as a target of foreign key constraints in other tables.
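To make the contrast concrete, here is a minimal sketch of the manual bookkeeping described above, using Python's built-in sqlite3 module. The table and column names (person, movie, movie_actors) are hypothetical and chosen only to mirror the sample schema; this is one possible relational layout, not a prescribed one.

```python
import sqlite3
import uuid

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
    CREATE TABLE person (
        id   TEXT PRIMARY KEY NOT NULL,  -- identity column declared by hand
        name TEXT NOT NULL
    );
    CREATE TABLE movie (
        id    TEXT PRIMARY KEY NOT NULL,
        title TEXT NOT NULL
    );
    -- The many-to-many relationship needs its own join table whose
    -- columns reference the identity columns above.
    CREATE TABLE movie_actors (
        movie_id  TEXT NOT NULL REFERENCES movie(id),
        person_id TEXT NOT NULL REFERENCES person(id),
        PRIMARY KEY (movie_id, person_id)
    );
""")

# Identifiers are generated by the application in this sketch.
person_id = str(uuid.uuid4())
movie_id = str(uuid.uuid4())
conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, "Example Actor"))
conn.execute("INSERT INTO movie VALUES (?, ?)", (movie_id, "The Avengers"))
conn.execute("INSERT INTO movie_actors VALUES (?, ?)", (movie_id, person_id))


def link_is_rejected(movie: str, person: str) -> bool:
    """Return True if the database refuses to link the given pair of ids."""
    try:
        conn.execute("INSERT INTO movie_actors VALUES (?, ?)", (movie, person))
    except sqlite3.IntegrityError:
        return True
    return False
```

Everything related to identity here (the id columns, the primary keys, the join table) is schema the author has to write and maintain; under the graph-relational model the identifier is assumed.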

It’s common for graph databases (e.g. Neo4j) to internally assign identifiers to nodes, since a first-class concept of identity is a prerequisite for a non-leaky concept of links/edges. Speaking of which:

In the relational model, an attribute consists of a name (“email”) and a type (“text”). Under the graph-relational model, there is a third component: the cardinality.

The cardinality specifies the number of values that can be assigned to the attribute. In EdgeDB, cardinality is represented internally as a five-valued enum consisting of Empty, One, AtMostOne, AtLeastOne, and Many. In SDL, these cardinalities are represented with combinations of more familiar terms: required vs optional and single vs multi. Consider the following object type.

type Movie {
  property description -> str;
  required property title -> str;
  multi property alt_titles -> str;
  required multi link actors -> Person;
}

This Movie type demonstrates all possible attribute cardinalities expressible in EdgeDB. The description property uses the defaults (optional and single), the title property is required (cannot be empty), the alt_titles property is multi (can contain several str values), and actors is both required and multi (points to one or more Person objects). Here are the types and cardinalities of each attribute as EdgeDB sees them.

| Key | Type | Cardinality |
| --- | --- | --- |
| description | str | AtMostOne |
| title | str | One |
| alt_titles | str | Many |
| actors | Person | AtLeastOne |

Multi links are necessary to represent many-to-many relationships between object types. Multi properties are less common, but occasionally useful when storing an unordered set of values, such as alt_titles in the sample schema.
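The mapping from the SDL keywords to the cardinalities listed for the Movie type can be sketched in a few lines of Python. This is an illustration only, not EdgeDB's internal code: the Cardinality class, the declared_cardinality function, and the BOUNDS table are hypothetical names, and the (lower, upper) bounds are inferred from the cardinality names themselves.

```python
from enum import Enum


class Cardinality(Enum):
    """The five cardinalities named in the text (hypothetical Python mirror)."""
    EMPTY = "Empty"
    ONE = "One"
    AT_MOST_ONE = "AtMostOne"
    AT_LEAST_ONE = "AtLeastOne"
    MANY = "Many"


# (lower bound, upper bound) on the number of values; None means unbounded.
BOUNDS = {
    Cardinality.EMPTY: (0, 0),
    Cardinality.ONE: (1, 1),
    Cardinality.AT_MOST_ONE: (0, 1),
    Cardinality.AT_LEAST_ONE: (1, None),
    Cardinality.MANY: (0, None),
}


def declared_cardinality(required: bool = False, multi: bool = False) -> Cardinality:
    """Map the required/optional and single/multi keywords to a cardinality.

    Empty is never returned: none of the four keyword combinations
    in the sample schema produces it.
    """
    if multi:
        return Cardinality.AT_LEAST_ONE if required else Cardinality.MANY
    return Cardinality.ONE if required else Cardinality.AT_MOST_ONE


# The four attributes of the sample Movie type.
movie_attributes = {
    "description": declared_cardinality(),
    "title": declared_cardinality(required=True),
    "alt_titles": declared_cardinality(multi=True),
    "actors": declared_cardinality(required=True, multi=True),
}
```

Read this way, required raises the lower bound from zero to one, and multi removes the upper bound.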

Technically, the relational model provides one mechanism for constraining cardinality: the NOT NULL constraint. Using EdgeDB terminology, this changes the cardinality from AtMostOne to One. There is no affordance for cardinalities greater than one.

When we use the word “set”, we mean it in the mathematical sense. This principle permeates everything in EdgeDB. Unlike in SQL, there is no distinction between scalar-valued and table-valued expressions. Everything is a set with a known type and cardinality (even plain literal values) and can be manipulated with set-theoretic operators like union.

edgedb> select "hi";
{'hi'}
edgedb> select {"hi"};
{'hi'}
edgedb> select {"hi", "there"};
{'hi', 'there'}
edgedb> select "hi" union "there";
{'hi', 'there'}

Nothing is also a set. Like, literally nothing. As a happy consequence of the graph-relational model’s set-theoretic core, NULL is no more. Instead, the absence of data is simply an empty set.

When executing EdgeQL queries with one of our client libraries, empty sets are decoded into idiomatic values. If the set in question has no upper bound (cardinality of Many or AtLeastOne), it is represented as an empty array. Other cardinalities result in null/nil/None (per the client library language).
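The decoding rule above can be sketched as a small Python function. This is not the code of any actual client library; decode_result and UNBOUNDED are hypothetical names, and cardinalities are passed as plain strings to keep the example self-contained.

```python
from typing import Any, List, Optional, Union

# Cardinalities with no upper bound on the number of values.
UNBOUNDED = {"Many", "AtLeastOne"}


def decode_result(values: List[Any], cardinality: str) -> Union[List[Any], Optional[Any]]:
    """Turn a result set into an idiomatic Python value.

    Unbounded cardinalities always decode to a list (possibly empty).
    Every other cardinality decodes to the single value, or to None
    when the set is empty.
    """
    if cardinality in UNBOUNDED:
        return list(values)
    if not values:
        return None
    if len(values) > 1:
        raise ValueError(f"{cardinality} allows at most one value, got {len(values)}")
    return values[0]
```

For example, an empty alt_titles set (Many) would come back as an empty list, while an empty description (AtMostOne) would come back as None.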

We think this set of principles, taken together, defines a new kind of database abstraction that deserves its own term and represents a spiritual successor to the relational paradigm. Moreover, we think EdgeDB, which recently had its first stable release, is extremely awesome and you should try it.

Head to our GitHub repo for a collection of useful links, or jump into the Quickstart.