8. Data Set Metadata
Each Data Provider maintains a set of one or more metadata files, each of which can describe one or more distinct data sets. These descriptions serve several purposes:
They drive discovery: descriptions are ingested into our search system and made available to a Data Consumer searching for particular kinds of data.
They inform consumption of that data, providing information on:
The API required to access the data set
Any access constraints which may need to be satisfied
Licenses for any accessed data
Representation and internal semantics of expressions of the data
8.1. Metadata File Structure
Note
The examples below use YAML format for compactness and increased readability. Data providers may present this information either in YAML or in JSON form.
The overall structure of the metadata file is a list of objects, each of which has the following structure:
- content:
    # Discovery information
  access:
    # Access control and licensing information
  transport:
    # API information
  representation:
    # Data format information
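Once a metadata file has been parsed, its overall shape can be checked mechanically. The following is a minimal Python sketch that uses the standard json module; a YAML file would need a third-party parser such as PyYAML. The function name load_metadata is hypothetical, and treating all four blocks as mandatory in every entry is an assumption made for illustration.

```python
import json

# Top-level blocks from the structure above. Requiring all four in every
# entry is an assumption made for this sketch.
EXPECTED_BLOCKS = ("content", "access", "transport", "representation")


def load_metadata(text: str) -> list:
    """Parse a JSON metadata file and check that it is a list of entries."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("metadata file must be a list of data set descriptions")
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not an object")
        missing = [key for key in EXPECTED_BLOCKS if key not in entry]
        if missing:
            raise ValueError(f"entry {index} is missing blocks: {missing}")
    return entries


if __name__ == "__main__":
    sample = '[{"content": {}, "access": [], "transport": {}, "representation": {}}]'
    print(len(load_metadata(sample)))  # prints 1
```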
8.2. Content Block
The content key contains a block of JSON-LD compatible information describing the conceptual content of the data set.
A simple example is shown below:
- content:
    "@type": "dcat:Dataset"
    "@context":
      dcat: http://www.w3.org/ns/dcat#
      dct: http://purl.org/dc/terms/
      oe: http://energydata.org.uk/oe/terms/
    dct:title: My amazing data set
    dct:description: This is a free text description of the data set
    dcat:version: 0.1.2
    dcat:versionNotes: This is a note on this particular version of the dataset
    oe:sensitivityClass: OE-SA
    oe:dataSetStableIdentifier: myData
These are the minimum properties every data set must define. They include terms from the Dublin Core (dct) and Data Catalog (dcat) vocabularies, as well as from the Open Energy core ontology (oe). Prefixes are defined in the JSON-LD @context object, as in the example above.
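The prefixes work by concatenation. A compact IRI such as dct:title expands to a full IRI by appending the local name ("title") to the IRI declared for its prefix. The simplified Python sketch below shows this. The helper expand_term is hypothetical, and the sketch ignores the finer points of the full JSON-LD expansion algorithm, such as term definitions and @vocab.

```python
# Prefix declarations copied from the @context in the example above.
EXAMPLE_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "oe": "http://energydata.org.uk/oe/terms/",
}


def expand_term(term: str, context: dict) -> str:
    """Expand a compact IRI such as "dct:title" using the context's prefixes.

    Keywords such as "@type", and terms whose prefix is not declared, are
    returned unchanged.
    """
    if term.startswith("@") or ":" not in term:
        return term
    prefix, suffix = term.split(":", 1)
    base = context.get(prefix)
    if base is None:
        return term
    return base + suffix


if __name__ == "__main__":
    print(expand_term("dct:title", EXAMPLE_CONTEXT))  # http://purl.org/dc/terms/title
```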
Key | Value |
---|---|
dct:title | Short title for this data set |
dct:description | Longer form description of this data set |
dcat:version | Version number of the data set. This should follow semantic versioning where possible. Versioning should indicate changes in delivery mechanism or in representation, not changes in the underlying data. For example, it should not be used to distinguish between data sets from different years; rather, it indicates whether a potential data consumer might need to alter how it processes any returned data. |
dcat:versionNotes | Notes explaining any changes in this version |
oe:sensitivityClass | The data sensitivity class of this data set. In the current Open Energy system this must be one of OE-O, OE-SA or OE-SB; no other classes are permitted. The value of this property also determines the level of API security imposed: OE-O data sets are open data with no additional security, while the two shared data classes mandate FAPI security using the Open Energy trust services. |
oe:dataSetStableIdentifier | An identifier, unique to this Data Provider, which will not change and which will be combined with the Data Provider's own ID to create a unique identifier for this data set within the Open Energy search system. |
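A provider could run a check like the one below before publishing, to confirm that a content block meets the minimum requirements. This Python sketch checks that the six keys from the table are present and that the sensitivity class is one of the three permitted values. Because semantic versioning is only preferred, a non-conforming version is reported as an advisory message rather than treated as an error. The name check_content and the loose version pattern are assumptions for illustration.

```python
import re

REQUIRED_CONTENT_KEYS = (
    "dct:title",
    "dct:description",
    "dcat:version",
    "dcat:versionNotes",
    "oe:sensitivityClass",
    "oe:dataSetStableIdentifier",
)
SENSITIVITY_CLASSES = {"OE-O", "OE-SA", "OE-SB"}
# Loose MAJOR.MINOR.PATCH pattern with an optional pre-release/build suffix.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(?:[-+].*)?$")


def check_content(content: dict) -> list:
    """Return a list of problems found in a content block (empty if none)."""
    problems = [f"missing {key}" for key in REQUIRED_CONTENT_KEYS if key not in content]
    sensitivity = content.get("oe:sensitivityClass")
    if sensitivity is not None and sensitivity not in SENSITIVITY_CLASSES:
        problems.append(f"unknown sensitivity class {sensitivity!r}")
    version = content.get("dcat:version")
    if version is not None and not SEMVER.match(str(version)):
        # Advisory only: semantic versioning is preferred, not required.
        problems.append(f"version {version!r} does not look like semantic versioning")
    return problems
```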
8.2.1. Additional metadata
The information above is the minimum needed to ensure that a data set is visible in the Open Energy search system. There are, however, other properties of a data set which may be useful to potential data consumers. Where such information can be provided, it should be provided in as standard a form as possible. In practice, this means using existing ontologies such as DCAT and Dublin Core by preference, then shared industry-specific ontologies, and only using internal or custom representations when absolutely necessary.
Of particular note, and something we would like ultimately to expose in our search interface, is information about the geospatial and temporal ranges of entries within a data set. This is a complex subject, but one that DCAT already handles. If you need to express this kind of information, please do so according to the DCAT standard.
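As one possible illustration, temporal and spatial coverage could be attached to a content block using DCAT 2 terms. In DCAT 2, dct:temporal points to a dct:PeriodOfTime, which can carry dcat:startDate and dcat:endDate. Likewise, dct:spatial points to a dct:Location, which can carry a dcat:bbox. The Python sketch below is a hypothetical helper, not an Open Energy requirement; the dates and bounding box are illustrative placeholders.

```python
import json


def with_coverage(content: dict, start: str, end: str, bbox_wkt: str) -> dict:
    """Return a copy of a content block with DCAT temporal and spatial coverage."""
    extended = dict(content)  # leave the caller's dict untouched
    extended["dct:temporal"] = {
        "@type": "dct:PeriodOfTime",
        # A full description would type these as xsd:date literals.
        "dcat:startDate": start,
        "dcat:endDate": end,
    }
    extended["dct:spatial"] = {
        "@type": "dct:Location",
        # A full description would type this as a GeoSPARQL WKT literal.
        "dcat:bbox": bbox_wkt,
    }
    return extended


if __name__ == "__main__":
    base = {"@type": "dcat:Dataset", "dct:title": "My amazing data set"}
    box = "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"  # illustrative only
    print(json.dumps(with_coverage(base, "2020-01-01", "2020-12-31", box), indent=2))
```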
8.3. Access Block
This section describes the kinds of licensing, expressed as sets of capabilities, and what, if any, conditions must be satisfied before a data consumer can acquire these data.
Each item within this section contains:
1. rule: a statement describing a set of conditions which must be satisfied to grant access, and the set of capabilities granted when access is provided under those conditions. The exact specification for these statements can be found at Access Control and Capability Grant Language.
2. sufficient: a boolean property indicating whether the conditions in the rule are sufficient (true) or simply indicative (false). In the former case, a data consumer which satisfies all the conditions will be granted access. In the latter, it may be granted access, but there may be additional requirements not fully described here.
3. appliesFrom and appliesTo: a pair of dates indicating the time range for which this access condition is valid. Data providers are encouraged to commit to access and license conditions over a reasonable timeframe, to allow potential consumers to plan their own activities.
access:
  # Access constraint to licensing predicates
  - rule: oe:verified, oe:last_update max_age_days 60 grants oe:use_any
    sufficient: true
    appliesFrom: 2021-04-22
    appliesTo: 2022-04-22
  - rule: group:some_group grants oe:use_any, oe:adapt_any
    sufficient: false
    appliesFrom: 2021-04-22
    appliesTo: 2022-04-22
8.4. Transport Block
This section describes the on-the-wire transport protocol. This is normally HTTP, but there is scope to describe out-of-band transports that begin with an HTTP negotiation process. The section contains at least a single http key, the value of which must be a valid OpenAPI specification.
For example:
transport:
  http:
    # This block is mandatory, and contains the OpenAPI spec for the secured or open
    # HTTP endpoints (depending on data class)
    openapi: 3.0.0
    info:
      title: Sample API
      description: CSV format data
      version: 0.1.0
    servers:
      - url: http://data-provider-example.com
        description: Describe this particular server if needed
    paths:
      "/data":
        get:
          summary: Returns a CSV containing all the data
          description: If we had any more to describe, we'd do it here
          responses:
            '200':
              description: CSV data stream
Note
Because API security is defined in relation to the data sensitivity class of the data set, it is not necessary to define the security of any presented API in this section. Data sets in class OE-O must expose an API with no extra security measures, and those in OE-SA and OE-SB must be secured by FAPI using the Open Energy trust services.
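Because security follows directly from the sensitivity class, a client can derive both where to call and what security to expect from the metadata alone. The Python sketch below pairs each OpenAPI path with the security regime implied by the class. The names endpoints and SECURITY_BY_CLASS are hypothetical. The sketch also assumes at least one server is listed and uses only the first one.

```python
# Mapping taken from the note above: OE-O is open data, while OE-SA and OE-SB
# must be secured by FAPI using the Open Energy trust services.
SECURITY_BY_CLASS = {"OE-O": "none", "OE-SA": "FAPI", "OE-SB": "FAPI"}


def endpoints(entry: dict) -> list:
    """List (url, security) pairs for every path in an entry's OpenAPI spec."""
    spec = entry["transport"]["http"]
    security = SECURITY_BY_CLASS[entry["content"]["oe:sensitivityClass"]]
    # Simplification: only the first server entry is used.
    base = spec["servers"][0]["url"].rstrip("/")
    return [(base + path, security) for path in spec.get("paths", {})]
```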
8.4.1. Heartbeat URL
Data providers SHOULD create a secured endpoint to act as a heartbeat. If one is specified, the OEGS will periodically call it to ascertain liveness and, optionally, to gather metrics as described in Heartbeat and monitoring endpoint.
A heartbeat URL can be specified as a single key, heartbeat_url, whose value is the fully qualified URL at which the heartbeat response is exposed.
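Because the value must be fully qualified, a relative path or a bare host name will not work. A small Python check using only urllib.parse from the standard library is sketched below. The function name is hypothetical, restricting the scheme to http or https is an assumption, and the example URL is a placeholder.

```python
from urllib.parse import urlsplit


def is_fully_qualified(url: str) -> bool:
    """True if `url` has an http(s) scheme and a host, as heartbeat_url requires."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    # Hypothetical heartbeat location, for illustration only.
    print(is_fully_qualified("https://data-provider-example.com/heartbeat"))  # True
    print(is_fully_qualified("/heartbeat"))  # False
```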
8.5. Representation Block
This section describes the format of any data received by a data consumer from this data set. Open Energy does not mandate particular formats, so this section is guidance rather than specification.
The only required element in this section is a key mime, which should contain the media type of the returned data. At a bare minimum, this allows a client to load data into some kind of tooling. Depending on this value, other objects may be present.
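A client can use the mime value to choose how to load a response. The following Python sketch handles CSV and JSON and passes any other type through as raw bytes. The function load_payload is hypothetical, and assuming UTF-8 for CSV is a simplification: a real client should honour any charset parameter.

```python
import csv
import io
import json


def load_payload(body: bytes, mime: str):
    """Turn a response body into something usable, based on the declared mime."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = mime.split(";", 1)[0].strip().lower()
    if media_type == "text/csv":
        # UTF-8 is assumed here for simplicity.
        return list(csv.reader(io.StringIO(body.decode("utf-8"))))
    if media_type == "application/json":
        return json.loads(body)
    return body  # unknown type: hand the raw bytes to other tooling
```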
8.5.1. text/csv
This type indicates that data is presented in CSV format. In this case, an optional key csvw may be defined, which should contain valid JSON-LD following the W3C CSV on the Web (CSVW) guidelines:
representation:
  mime: text/csv
  csvw:
    # This is only applicable if the mime type is text/csv
    "@context": http://www.w3.org/ns/csvw
    tableSchema:
      columns:
        - titles: country
        - titles: country group
        - titles: name (en)
        - titles: name (fr)
        - titles: name (de)
        - titles: latitude
        - titles: longitude
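When a csvw block is present, a consumer can compare the header row of the returned CSV with the declared column titles to catch representation changes early. The Python sketch below does this in a simplified way. The function names are hypothetical. The sketch handles only titles given as a string or as a list of alternatives (taking the first), not CSVW natural-language objects, and it is not a full CSVW validator.

```python
import csv
import io


def column_titles(csvw: dict) -> list:
    """Extract the column titles declared in a CSVW tableSchema."""
    titles = []
    for column in csvw["tableSchema"]["columns"]:
        value = column["titles"]
        # CSVW allows a string or a list of alternatives; take the first.
        titles.append(value if isinstance(value, str) else value[0])
    return titles


def header_matches(csv_text: str, csvw: dict) -> bool:
    """Check that the CSV header row matches the declared titles, in order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return header == column_titles(csvw)
```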
8.5.2. Other types
This is currently open for consultation. We would like to be able to guide data providers towards particular representation types for particular kinds of information, and to make use of existing ontologies or standards, such as the Common Information Model, where they will aid interoperability between Open Energy participants and the wider community.
8.6. Full Example
Putting together all the fragments from the previous sections produces the following. It is written as a one-element list describing a single data set; a full metadata file can contain several such entries. In YAML form:
- content:
    "@type": "dcat:Dataset"
    "@context":
      dcat: http://www.w3.org/ns/dcat#
      dct: http://purl.org/dc/terms/
      oe: http://energydata.org.uk/oe/terms/
    dct:title: My amazing data set
    dct:description: This is a free text description of the data set
    dcat:version: 0.1.2
    dcat:versionNotes: This is a note on this particular version of the dataset
    oe:sensitivityClass: OE-SA
    oe:dataSetStableIdentifier: myData
  access:
    # Access constraint to licensing predicates
    - rule: oe:verified, oe:last_update max_age_days 60 grants oe:use_any
      sufficient: true
      appliesFrom: 2021-04-22
      appliesTo: 2022-04-22
    - rule: group:some_group grants oe:use_any, oe:adapt_any
      sufficient: false
      appliesFrom: 2021-04-22
      appliesTo: 2022-04-22
  transport:
    http:
      # This block is mandatory, and contains the OpenAPI spec for the secured or open
      # HTTP endpoints (depending on data class)
      openapi: 3.0.0
      info:
        title: Sample API
        description: CSV format data
        version: 0.1.0
      servers:
        - url: http://data-provider-example.com
          description: Describe this particular server if needed
      paths:
        "/data":
          get:
            summary: Returns a CSV containing all the data
            description: If we had any more to describe, we'd do it here
            responses:
              '200':
                description: CSV data stream
  representation:
    mime: text/csv
    csvw:
      # This is only applicable if the mime type is text/csv
      "@context": http://www.w3.org/ns/csvw
      tableSchema:
        columns:
          - titles: country
          - titles: country group
          - titles: name (en)
          - titles: name (fr)
          - titles: name (de)
          - titles: latitude
          - titles: longitude
Or, in JSON form:
[
  {
    "content": {
      "@type": "dcat:Dataset",
      "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "oe": "http://energydata.org.uk/oe/terms/"
      },
      "dct:title": "My amazing data set",
      "dct:description": "This is a free text description of the data set",
      "dcat:version": "0.1.2",
      "dcat:versionNotes": "This is a note on this particular version of the dataset",
      "oe:sensitivityClass": "OE-SA",
      "oe:dataSetStableIdentifier": "myData"
    },
    "access": [
      {
        "rule": "oe:verified, oe:last_update max_age_days 60 grants oe:use_any",
        "sufficient": true,
        "appliesFrom": "2021-04-22T00:00:00.000Z",
        "appliesTo": "2022-04-22T00:00:00.000Z"
      },
      {
        "rule": "group:some_group grants oe:use_any, oe:adapt_any",
        "sufficient": false,
        "appliesFrom": "2021-04-22T00:00:00.000Z",
        "appliesTo": "2022-04-22T00:00:00.000Z"
      }
    ],
    "transport": {
      "http": {
        "openapi": "3.0.0",
        "info": {
          "title": "Sample API",
          "description": "CSV format data",
          "version": "0.1.0"
        },
        "servers": [
          {
            "url": "http://data-provider-example.com",
            "description": "Describe this particular server if needed"
          }
        ],
        "paths": {
          "/data": {
            "get": {
              "summary": "Returns a CSV containing all the data",
              "description": "If we had any more to describe, we'd do it here",
              "responses": {
                "200": {
                  "description": "CSV data stream"
                }
              }
            }
          }
        }
      }
    },
    "representation": {
      "mime": "text/csv",
      "csvw": {
        "@context": "http://www.w3.org/ns/csvw",
        "tableSchema": {
          "columns": [
            { "titles": "country" },
            { "titles": "country group" },
            { "titles": "name (en)" },
            { "titles": "name (fr)" },
            { "titles": "name (de)" },
            { "titles": "latitude" },
            { "titles": "longitude" }
          ]
        }
      }
    }
  }
]
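When writing a transport block by hand, it is easy to place responses beside an operation instead of inside it. OpenAPI 3.0 requires every Operation Object to carry its own responses. The Python sketch below, applied to a parsed entry such as the one above (for example via json.loads), flags any operation that lacks them. The function name is hypothetical, and the check covers only this one rule, not full OpenAPI validation.

```python
# Operation keys allowed in an OpenAPI 3.0 Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations_missing_responses(entry: dict) -> list:
    """Return "METHOD /path" strings for operations without a responses object."""
    problems = []
    paths = entry["transport"]["http"].get("paths", {})
    for path, item in paths.items():
        for key, operation in item.items():
            # Non-operation keys such as "parameters" or "summary" are skipped.
            if key in HTTP_METHODS and "responses" not in operation:
                problems.append(f"{key.upper()} {path}")
    return problems
```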