Data Model

Parameters
Concepts
Further Reading

Status: Final

alj: TODO: introductory textHere is an example Path macro usage: blogidea1. And an example image: A (one-dimensional) list containing the two paths "blog/idea/1" and "blog/idea/2", with a stylised file next to each path. Idea 1 shows a lightbulb, idea 2 shows a deeply smug expression.

Parameters

alj: TODO: introductory text

We give precise semantics to these parameters in the spec proper, the list need not fully make sense on the first read-through.An instantiation of Willow must define concrete choices of the following parameters:

A type NamespaceId for identifying namespaces.
A type SubspaceId for identifying subspaces.
A natural number max_component_length for limiting the length of path components.
A natural number max_component_count for limiting the number of path components in a single path.
A natural number max_path_length for limiting the overall size of paths.
A totally ordered type PayloadDigest for content-addressing the data that Willow stores.
Since this function provides the only way in which Willow tracks payloads, you probably want to use a secure hash function.A function hash_payload that maps bytestrings (of length at most $2^{64} - 1$ ) into PayloadDigest.
A type AuthorisationToken for proving write permission.
Meadowcap is our bespoke capability system for handling authorisation. But any system works, as long as it defines a type of AuthorisationToken and an is_authorised_write function.A function is_authorised_write that maps an Entry (defined later) and an AuthorisationToken to a Bool, indicating whether the AuthorisationToken does prove write permission for the Entry.

Concepts

Willow can store arbitrary bytestrings of at most $2^{64} - 1$ bytes. We call such a bytestring a Payload.

A Path is a sequence of at most max_component_count many bytestrings, each of at most max_component_length bytes, and whose total number of bytes is at most max_path_length. The bytestrings that make up a Path are called its Components.

A Timestamp is a U64, that is, a natural number between zero (inclusive) and $2^{64} - 1$ (exclusive). Timestamps are to be interpreted as a time in microseconds since the Unix epoch.

The metadata associated with each Payload is called an Entry:

1
The metadata for identifying a Payload.
2
struct Entry {
3
 
The identifier of the namespace to which the Entry belongs.
4
 
namespace_id: NamespaceId,
5
 
The identifier of the subspace to which the Entry belongs.
6
 
subspace_id: SubspaceId,
7
 
The Path to which the Entry was written.
8
 
path: Path,
9
 
The claimed creation time of the Entry.alj: TODO fix renderingWall-clock timestamps may come as a surprise. We are cognisant of their limitations, and use them anyway. To learn why, please see invalid-reference
10
 
timestamp: Timestamp,
11
 
The length of the Payload in bytes.
12
 
payload_length: U64,
13
 
The result of applying hash_payload to the Payload.
14
 
payload_digest: PayloadDigest,
15
}

A PossiblyAuthorisedEntry is a pair of an Entry and an AuthorisationToken. An AuthorisedEntry is a PossiblyAuthorisedEntry for which is_authorised_write returns true.alj: TODO better inline code tag styling

a is a prefix of a and of ab, but not of ab.A Path s is a prefix of a Path t if the first Components of t are exactly the Components of s.

We can now formally define which Entries overwrite each other and which can coexist.

An Entry e1 is newer than another Entry e2 if

e2.timestamp < e2.timestamp, or
e2.timestamp == e2.timestamp and This comparison is why we require PayloadDigest to be totally ordered.e2.payload_digest < e2.payload_digest, or
e2.timestamp == e2.timestamp and e2.payload_digest == e2.payload_digest and e2.payload_length < e2.payload_length.

A store is a set of AuthorisedEntries such that

all its Entries have the same namespace_id, and
there are no two of its Entries old and newalj: TODO proper fonts for defined values (and other kinds of defined entities as well). such that
- old.subspace_id == new.subspace_id, and
- new.path is a prefixThis is the point where we formally define invalid-reference. of old.path, and
- new is newer than old.

When two peers connect and wish to update each other, they compute the joins of all their store with equal NamespaceIds. Doing so efficiently and in a privacy-preserving way can be quite challenging, we recommend our invalid-reference protocol.Formally, adding a new Entry to a store consists of computing the join of the original store and a singleton store containing only the new Entry.The join of two store that store Entries of the same namespace_id is the store obtained as follows:

Start with the union of the two store.
Then, remove all Entries with a path p whose timestamp is strictly less than the timestamp of any other Entry of the same subspace_id whose path is a prefix of p.
Then, for each subset of Entries with equal entry_subspace_id, equal entry_path, and equal entry_timestamp, remove all but those with the greatest payload_digest.
Then, for each subset of Entries with equal entry_subspace_id, equal entry_path, equal entry_timestamp, and equal entry_payload_digest, remove all but those with the greatest payload_length.

A namespace is the join over all11 store in existence of a given NamespaceId. This concept only makes sense as an abstract notion, since no participant in a distributed system can ever be certain that it has (up-to-date) information about all existing store. A subspace is teh set of all Entries of a given SubspaceId in a given namespace. This again is more of a conceptual notion than a computer-friendly one.

Data Model

Parameters

Concepts

Further Reading