Structuring data with JSON-LD

The purpose of a web API is to expose structured data on the Internet so that other software applications can interact with it. Structured data is made of data types and properties.

For example, the Spotify API has an Artist data type with the properties name, genres, images, etc. Furthermore, every resource in the Spotify API has a type property that defines its data type: type is set to artist on artist objects.

But when we look at another API, such as the Stripe API, we notice that the type property is replaced with an object property that has the same meaning.

[Figure: example Stripe API response]
[Figure: example Spotify API response]
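To make the difference concrete, here is a minimal sketch of what an integrator ends up writing. The payloads are hypothetical and heavily trimmed, and the `TYPE_FIELD_BY_API` table and `resource_type` helper are names invented for this illustration; only the `type` and `object` field names come from the two APIs.

```python
# Hypothetical, trimmed-down payloads; real responses carry many more fields.
spotify_artist = {"id": "abc123", "type": "artist", "name": "Some Artist"}
stripe_customer = {"id": "cus_123", "object": "customer", "name": "Some Customer"}

# Per-API knowledge that the integrator has to maintain by hand.
TYPE_FIELD_BY_API = {"spotify": "type", "stripe": "object"}


def resource_type(api: str, payload: dict) -> str:
    """Return the data type of a payload using the API-specific field name."""
    return payload[TYPE_FIELD_BY_API[api]]


print(resource_type("spotify", spotify_artist))   # artist
print(resource_type("stripe", stripe_customer))   # customer
```

Every new API adds another entry to that table, which is the kind of out-of-band knowledge a self-describing format tries to remove.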

Because every API has its own specifics, integrating one with an application requires knowing the data types and the meaning of every property. Where is the name stored? What are the format constraints: is the value a string, a date? Is the property mandatory? Sometimes you find this information in an API schema, either built manually or auto-generated from the API code.

At Datablist, we are building a flexible database to store any kind of data. To achieve this flexibility, we have moved the data schemas outside of the database. The data types, the properties and the data constraints are defined in JSON files. Open and hackable JSON files.

Introducing JSON-LD

JSON-LD is a standard to encode linked data using JSON. It was created in 2010, when the semantic web was a thing, and it has continued to evolve since. JSON-LD is maintained by the W3C, which published the latest version, JSON-LD 1.1, in July 2020.

You may already use it to add structured data to web pages for Google or when calling the Google Knowledge Graph API.

Recently, it has made a comeback with new projects in Linked Data (such as Solid) and in data interoperability, where it is used to define common data models and connect enterprise products. The European GAIA-X project also seems to implement a semantic vocabulary.

JSON-LD Concepts

JSON-LD is a standard for self-describing data resources. This means you can work out an object's data type and the structure of its properties directly from its JSON-LD representation.

To do so, JSON-LD comes with a set of keywords starting with the @ character. The two most important are:

  • @type to define the data type
  • @id for the object identifier

Every data type and property in a JSON-LD object must be unambiguous. To achieve this, properties and types are identified by their URI.

For example, I can define a Person object with:

{
"@type": "http://schema.org/Person",
"@id": "http://data.somewhere.com/person/john-malkovich",
"http://schema.org/name": "John"
}

Here, I use property and type definitions from http://schema.org/. Directly from the JSON-LD representation, I know the object's data type, and I can look up http://schema.org/name to learn how to interpret the value of the name property.
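As a small illustration of what "self-describing" buys us, the sketch below reads the Person object with nothing but Python's standard json module. The `describe` helper is a name made up for this example; it only separates the @ keywords from the regular properties.

```python
import json

doc = json.loads("""
{
  "@type": "http://schema.org/Person",
  "@id": "http://data.somewhere.com/person/john-malkovich",
  "http://schema.org/name": "John"
}
""")


def describe(node: dict) -> dict:
    """Split a JSON-LD node into its identifier, its type and its properties."""
    # Keys starting with "@" are JSON-LD keywords, everything else is a property.
    properties = {k: v for k, v in node.items() if not k.startswith("@")}
    return {"id": node.get("@id"), "type": node.get("@type"), "properties": properties}


summary = describe(doc)
print(summary["type"])               # http://schema.org/Person
print(list(summary["properties"]))   # ['http://schema.org/name']
```

No API-specific knowledge is needed: the type and every property name are already full URIs.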

Vocabulary

In the JSON-LD snippet above, we use a Person type and a name property from Schema.org. Schema.org is what JSON-LD calls a vocabulary.

A vocabulary defines entity types, properties and sometimes relationships between types. More on Datablist Vocabulary.

@context and @base

Using URIs everywhere makes everything unambiguous, but it creates redundancy and increases the object size. To avoid repeating the base URI, we can use the @context keyword to abstract it.

In our example, the name property and the Person type are defined using the Schema.org vocabulary, so we can remove http://schema.org/ from them and add it to the @context.

Using @context
{
"@context": {
"@vocab": "http://schema.org/"
},
"@id": "http://data.somewhere.com/person/john-malkovich",
"@type": "Person",
"name": "John"
}

We have shortened our property names, but the @id still uses a long URI. To abstract both the vocabulary and the @id URI, we need to add them both to the context.

Using @vocab and @base
{
"@context": {
"@vocab": "http://schema.org/",
"@base": "http://data.somewhere.com/person/"
},
"@id": "john-malkovich",
"@type": "Person",
"name": "John Malkovich"
}

@base is used to resolve the @id URI. It proves useful when generating listings, where it avoids repeating the base URI for every item.
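The resolution rules described so far can be sketched in a few lines of Python. This is a deliberately naive illustration, not the JSON-LD expansion algorithm: the `expand_node` function is a hypothetical helper that only understands @vocab and @base. It writes @vocab with a trailing slash because the term is appended to it as-is.

```python
from urllib.parse import urljoin


def expand_node(node: dict) -> dict:
    """Naive expansion: handles only @vocab and @base, nothing else."""
    context = node.get("@context", {})
    vocab = context.get("@vocab", "")
    base = context.get("@base", "")
    expanded = {}
    for key, value in node.items():
        if key == "@context":
            continue
        if key == "@id":
            expanded[key] = urljoin(base, value)   # @id is resolved against @base
        elif key == "@type":
            expanded[key] = vocab + value          # type names are prefixed with @vocab
        elif key.startswith("@"):
            expanded[key] = value                  # other keywords are left untouched
        else:
            expanded[vocab + key] = value          # property names are prefixed too
    return expanded


person = {
    "@context": {
        "@vocab": "http://schema.org/",
        "@base": "http://data.somewhere.com/person/",
    },
    "@id": "john-malkovich",
    "@type": "Person",
    "name": "John Malkovich",
}

expanded = expand_node(person)
print(expanded["@id"])    # http://data.somewhere.com/person/john-malkovich
print(expanded["@type"])  # http://schema.org/Person
```

For real data, a JSON-LD processor should do this work; the sketch is only meant to show where each part of the final URI comes from.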

In JSON-LD, public vocabularies cover basic properties and data types to create shared knowledge. For example, the DCMI Metadata Terms vocabulary, http://purl.org/dc/terms/, exposes a created property that takes a date value.

Rather than creating all the properties from scratch, we will add some from public vocabularies.

{
"@context": {
"@vocab": "http://schema.org/",
"@base": "http://data.somewhere.com/person/"
},
"@id": "john-malkovich",
"http://purl.org/dc/terms/created": "2021-02-28",
"@type": "Person",
"name": "John Malkovich"
}

created comes from another vocabulary, so the @vocab keyword in our context doesn't apply to it.

Fortunately, we can create an alias in the context to map a property to its full URI.

{
"@context": {
"@vocab": "http://schema.org/",
"@base": "http://data.somewhere.com/person/",
"created": "http://purl.org/dc/terms/created"
},
"@id": "john-malkovich",
"created": "2021-02-28",
"@type": "Person",
"name": "John Malkovich"
}

That's better! Because @context holds so much information, it is good practice to define it once and reference it from the JSON-LD object as an external resource.

{
"@context": {
"@vocab": "http://schema.org/",
"@base": "http://data.somewhere.com/person/",
"@import": "http://a-remote-context-definition.com"
},
"@id": "john-malkovich",
"created": "2021-02-28",
"@type": "Person",
"name": "John Malkovich"
}
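Here is a rough sketch of how an alias and an imported context might be resolved together. Everything in it is an assumption made for illustration: `REMOTE_CONTEXTS` stands in for an HTTP fetch, the content of the remote context is invented, and `resolve_context` and `expand_term` are hypothetical helpers. Real term definitions can also be objects rather than plain strings, which this sketch ignores.

```python
# Stand-in for an HTTP fetch: maps a context URL to its parsed content.
# The content of this remote context is an assumption for the example.
REMOTE_CONTEXTS = {
    "http://a-remote-context-definition.com": {
        "created": "http://purl.org/dc/terms/created",
    },
}


def resolve_context(context: dict) -> dict:
    """Merge an imported context with the local entries; local entries win."""
    imported = REMOTE_CONTEXTS.get(context.get("@import"), {})
    merged = {**imported, **context}
    merged.pop("@import", None)
    return merged


def expand_term(term: str, context: dict) -> str:
    """Use an explicit alias when there is one, otherwise fall back to @vocab."""
    if term in context:
        return context[term]
    return context.get("@vocab", "") + term


context = resolve_context({
    "@vocab": "http://schema.org/",
    "@base": "http://data.somewhere.com/person/",
    "@import": "http://a-remote-context-definition.com",
})

print(expand_term("created", context))  # http://purl.org/dc/terms/created
print(expand_term("name", context))     # http://schema.org/name
```

The alias takes precedence over @vocab, which is why created can point to a different vocabulary than name.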

To learn more about JSON-LD, please visit the W3C specification.

JSON-LD at Datablist

We use JSON-LD in our data store, both to define collection data types and properties in a vocabulary and in the data API when interacting with resources.

See our vocabulary details here.

A developer-friendly JSON-LD

JSON-LD is complex. The specification contains several keywords and conventions that are hard to understand for newcomers.

We want our APIs to be JSON-LD compliant while being developer-friendly. To achieve this, we structure our API responses to look like a regular JSON API, while adding a @context definition to them.

Example for an Item object
{
"@context": {
"@base": "https://data.datablist.com/39c39b23d17144a/collections/790022bc60cfbc835/items/",
"@vocab": "http://vocab.datablist.com/",
"@import": "http://vocab.datablist.com/terms.jsonld"
},
"@id": "0d41642eb82c445",
"@type": "Contact",
"firstName": "John",
"lastName": "Malkovich"
}

We have all the information to decipher this object:

  • The full URI built using @id and @base is: https://data.datablist.com/39c39b23d17144a/collections/790022bc60cfbc835/items/0d41642eb82c445
  • The type is http://vocab.datablist.com/Contact
  • The properties are http://vocab.datablist.com/firstName and http://vocab.datablist.com/lastName
  • All the information about the properties' data constraints can be found in the http://vocab.datablist.com/terms.jsonld vocabulary definition.
Note: if you open http://vocab.datablist.com or a term such as http://vocab.datablist.com/Contact in your browser, you will get a 404 error. These URIs must be seen as unique identifiers for types and properties; their definitions are found in the vocabulary definition.
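The URIs listed above can be checked with a few lines of Python. This is only a sanity check under the same simplifying assumptions as the rest of the article (plain prefixing with @vocab, standard URL joining with @base); it does not fetch the imported context.

```python
from urllib.parse import urljoin

item = {
    "@context": {
        "@base": "https://data.datablist.com/39c39b23d17144a/collections/790022bc60cfbc835/items/",
        "@vocab": "http://vocab.datablist.com/",
        "@import": "http://vocab.datablist.com/terms.jsonld",
    },
    "@id": "0d41642eb82c445",
    "@type": "Contact",
    "firstName": "John",
    "lastName": "Malkovich",
}

context = item["@context"]
full_id = urljoin(context["@base"], item["@id"])
full_type = context["@vocab"] + item["@type"]
# Every key that is not a keyword is a property name relative to @vocab.
property_uris = [context["@vocab"] + key for key in item if not key.startswith("@")]

print(full_id)
print(full_type)
print(property_uris)
```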

Property data constraints

Because JSON-LD is self-describing, we know what kind of data any object property expects, be it a string, a number, a boolean, a date or something else. When interacting with the API, validation is performed on the incoming data. This validation is not based on information held in our data store but on the data constraints defined in our external JSON-LD vocabulary.
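To give an idea of what vocabulary-driven validation can look like, here is a minimal sketch. It is not how the Datablist API is implemented: the `CONSTRAINTS` table, its format, the birthDate property and the `validate` helper are all hypothetical, and the real vocabulary file may be structured very differently.

```python
from datetime import date


def is_iso_date(value) -> bool:
    """True when value is a string holding a valid ISO 8601 calendar date."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "date": is_iso_date,
}

VOCAB = "http://vocab.datablist.com/"

# Hypothetical constraints, keyed by full property URI.
CONSTRAINTS = {
    VOCAB + "firstName": {"type": "string", "required": True},
    VOCAB + "lastName": {"type": "string", "required": False},
    VOCAB + "birthDate": {"type": "date", "required": False},
}


def validate(payload: dict, vocab: str, constraints: dict) -> list:
    """Return a list of error messages; an empty list means the payload is valid."""
    # Expand property names to full URIs and skip JSON-LD keywords.
    values = {vocab + k: v for k, v in payload.items() if not k.startswith("@")}
    errors = []
    for uri, rule in constraints.items():
        if uri not in values:
            if rule["required"]:
                errors.append(f"{uri}: missing required property")
        elif not CHECKS[rule["type"]](values[uri]):
            errors.append(f"{uri}: expected {rule['type']}")
    # Properties absent from the constraints are ignored in this sketch.
    return errors


print(validate({"firstName": "John", "lastName": "Malkovich"}, VOCAB, CONSTRAINTS))
print(validate({"firstName": "John", "birthDate": "2021-02-30"}, VOCAB, CONSTRAINTS))
```

The second call reports an error because February 30 is not a real date. The point of the design is that the rules live in the vocabulary, so they can change without touching the data store.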