emer2gent-data

Emer2gent Data Resources

This repository is an attempt to bring together the resources and data being used by the Emer2gent Data Alliance.

Many organisations are involved, each with different technologies available. The aim is to distribute the effort as much as possible: each organisation maintains its own list of metadata about its resources (an organisation index) in a common format. An organisation index should live at a fixed URL on the web. That might be in a GitHub repository, on the organisation’s own site, or somewhere else, as long as the URL doesn’t change and is accessible. An organisation only needs to update the central index once, with basic details about its organisation index.

Note: this is a work in progress and the format may change.

Central index

The index.json file in this repository is the central place to register organisation indexes. It contains an object in the format:

{
    "unique-id": {
        "author": "A.N. Other",
        "description": "A description of the author",
        "url": "https://an.other/",
        "index": "https://an.other/covid-19/index.json"
    }
}
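
As a rough illustration of how a consumer might read the central index, the Python sketch below parses the example above and collects the organisation index URL for each entry. The function name `organisation_index_urls` is hypothetical and not part of this repository, and the index is embedded as a string so that the sketch runs without network access.

```python
import json

# The example central index from above, embedded as text for illustration.
CENTRAL_INDEX_TEXT = """
{
    "unique-id": {
        "author": "A.N. Other",
        "description": "A description of the author",
        "url": "https://an.other/",
        "index": "https://an.other/covid-19/index.json"
    }
}
"""


def organisation_index_urls(central_index):
    """Map each unique ID to the URL of that organisation's index.

    Hypothetical helper: assumes every entry has an "index" property,
    as in the example above.
    """
    return {org_id: entry["index"] for org_id, entry in central_index.items()}


central_index = json.loads(CENTRAL_INDEX_TEXT)
urls = organisation_index_urls(central_index)
for org_id, index_url in urls.items():
    print(org_id, index_url)
```

In practice the text would be fetched from wherever index.json is hosted, and each returned URL would then be fetched in turn to read that organisation’s datasets.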

The properties are defined as follows:

Organisation index

These are JSON files containing a list of datasets. Here is an example index file for ODI Leeds’ COVID-19 resources. Each dataset can contain multiple resources (URLs of files, visualisations, etc.). The organisation index file is of the form:

We have created an initial JSON Schema to provide a full specification; examples are documented below.

Here is a basic example:

{
	"version": "1.0",
	"datasets": []
}
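
A minimal sketch of a sanity check on an organisation index, written in Python. It assumes that `version` and `datasets` are both expected at the top level, which is inferred from the basic example rather than taken from the JSON Schema; the schema remains the authoritative specification. The function name `load_organisation_index` is hypothetical.

```python
import json


def load_organisation_index(text):
    """Parse an organisation index and check its top-level shape.

    Assumption (from the basic example, not the schema): the file is an
    object with a "version" and a "datasets" list.
    """
    index = json.loads(text)
    if not isinstance(index, dict):
        raise ValueError("organisation index must be a JSON object")
    missing = [key for key in ("version", "datasets") if key not in index]
    if missing:
        raise ValueError("missing top-level keys: " + ", ".join(missing))
    if not isinstance(index["datasets"], list):
        raise ValueError('"datasets" must be a list')
    return index


org_index = load_organisation_index('{"version": "1.0", "datasets": []}')
print(org_index["version"], len(org_index["datasets"]))
```

A check like this only catches gross structural problems; validating against the JSON Schema would be the more thorough option.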

Dataset

An individual dataset’s metadata has the following required fields:

The following fields are optional:

Let’s look at a bare-bones example with the minimum of metadata. This might be useful as a placeholder but wouldn’t actually define any resources.

{
	"id": "1",
	"createdAt": "2020-05-01T00:00Z",
	"title": "An example"
}
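
If placeholders like this are generated by a script, a sketch along the following lines could be used. The helper `placeholder_dataset` is hypothetical, and the timestamp layout simply copies the `createdAt` value in the example above (UTC, minute precision); the schema may allow other layouts.

```python
import json
from datetime import datetime, timezone


def placeholder_dataset(dataset_id, title, created_at=None):
    """Build a bare-bones dataset entry with only id, createdAt and title.

    created_at should be a timezone-aware datetime; it defaults to now.
    The output format mirrors the example above and is an assumption.
    """
    if created_at is None:
        created_at = datetime.now(timezone.utc)
    stamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    return {"id": dataset_id, "createdAt": stamp, "title": title}


example = placeholder_dataset(
    "1", "An example", datetime(2020, 5, 1, tzinfo=timezone.utc)
)
print(json.dumps(example, indent=4))
```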

Now let’s look at a more complete example with three CSV files and a web-based visualisation as resources:

{
    "id": "jh-dashboard",
    "sharing": "public",
    "topics": ["health", "timeline"],
    "tags": ["#COVID-19","data"],
    "licence": "CC0",
    "createdAt": "2016-06-02T11:27:08",
    "updatedAt": "2020-04-28T11:10:21.637Z",
    "update_frequency": "Annually",
    "title": "Johns Hopkins COVID-19 Dataset",
    "author": "Johns Hopkins University",
    "author_email": "fakeemail@coronavirus.jhu.edu",
    "url": "https://coronavirus.jhu.edu/",
    "maintainer": "A. Person",
    "maintainer_email": "fakeemail@coronavirus.jhu.edu",
    "description": "2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE",
    "resources": [{
            "type": "vis",
            "format": "html",
            "title": "COVID-19 United States Cases by County",
            "description": "Data is updated once per day after 8 p.m. Eastern to allow the system to pull county-level data. For the most up-to-date confirmed cases and deaths, please see the COVID-19 Global Map. New York City borough deaths data does not include Probable COVID-19 deaths, as this data is not reported.",
            "url": "https://coronavirus.jhu.edu/us-map"
    },{
            "type": "data",
            "format": "csv",
            "check_size": 3482,
            "temporal_coverage_from": "2020-01-22",
            "temporal_coverage_to": "2020-05-01",
            "title": "Global number of confirmed cases",
            "description": "",
            "url": "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
    },{
            "type": "data",
            "format": "csv",
            "check_size": 3482,
            "temporal_coverage_from": "2020-01-22",
            "temporal_coverage_to": "2020-05-01",
            "title": "Global number of deaths",
            "description": "",
            "url": "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv"
    },{
            "type": "data",
            "format": "csv",
            "check_size": 3482,
            "temporal_coverage_from": "2020-01-22",
            "temporal_coverage_to": "2020-05-01",
            "title": "Global number of recovered patients",
            "description": "",
            "url": "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv"
    }],
    "references": [{
            "url": "https://raw.githubusercontent.com/rolls-royce/EMER2GENT/master/data/sun/geo/country_name_mapping.csv",
            "description": "Country names to ISO code mapping. Required as data uses non-standard country names."
    }]
}
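
To show how a consumer might pick resources out of a dataset like the one above, here is a small Python sketch that filters the `resources` array by `type` and `format`. The helper `resource_urls` is hypothetical, and the dataset is an abbreviated copy of the example with only two of its resources.

```python
def resource_urls(dataset, resource_type=None, resource_format=None):
    """Return the URLs of a dataset's resources, optionally filtered.

    Hypothetical helper: a filter left as None matches every resource.
    """
    urls = []
    for resource in dataset.get("resources", []):
        if resource_type is not None and resource.get("type") != resource_type:
            continue
        if resource_format is not None and resource.get("format") != resource_format:
            continue
        urls.append(resource["url"])
    return urls


# Abbreviated copy of the example dataset above.
dataset = {
    "id": "jh-dashboard",
    "title": "Johns Hopkins COVID-19 Dataset",
    "resources": [
        {
            "type": "vis",
            "format": "html",
            "title": "COVID-19 United States Cases by County",
            "url": "https://coronavirus.jhu.edu/us-map",
        },
        {
            "type": "data",
            "format": "csv",
            "title": "Global number of confirmed cases",
            "url": "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv",
        },
    ],
}

csv_urls = resource_urls(dataset, resource_type="data", resource_format="csv")
print(csv_urls)
```

Calling `resource_urls(dataset)` with no filters returns every resource URL, which may be handy for link checking.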

Resource

Your dataset may contain multiple resources, such as a visualisation, CSV, JSON, PDF or API. The resources array lets you add them. The fields are:

Here is an example of a visualisation resource:

{
	"type": "vis",
	"format": "html",
	"title": "COVID-19 United States Cases by County",
	"description": "Data is updated once per day after 8 p.m. Eastern to allow the system to pull county-level data. For the most up-to-date confirmed cases and deaths, please see the COVID-19 Global Map. New York City borough deaths data does not include Probable COVID-19 deaths, as this data is not reported.",
	"url": "https://coronavirus.jhu.edu/us-map"
}

Reference

This section can be used to specify other datasets that are linked to this one. Some resources need other datasets or resources before they can be used fully. For example, the Johns Hopkins COVID-19 Dataset uses non-standard country names, whereas ISO country codes are much better from an analysis point of view.

The team at Rolls-Royce created an excellent lookup CSV that maps country names to ISO 3166 codes. Here’s that lookup written as a reference:

{
	"url": "https://raw.githubusercontent.com/rolls-royce/EMER2GENT/master/data/sun/geo/country_name_mapping.csv",
	"description": "Country name to ISO3166 code mapping, as dataset uses non-standard country names"
}
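
A consumer could surface these references next to a dataset so that users know what else they need. The Python sketch below is one possible way to do that; the helper `reference_lines` is hypothetical, and it assumes each reference has a `url` and, usually, a `description`, as in the example above.

```python
def reference_lines(dataset):
    """Return one 'url: description' line per reference in a dataset.

    Hypothetical helper: a missing description becomes an empty string.
    """
    return [
        f'{ref["url"]}: {ref.get("description", "")}'
        for ref in dataset.get("references", [])
    ]


dataset_with_reference = {
    "id": "jh-dashboard",
    "references": [
        {
            "url": "https://raw.githubusercontent.com/rolls-royce/EMER2GENT/master/data/sun/geo/country_name_mapping.csv",
            "description": "Country name to ISO3166 code mapping, as dataset uses non-standard country names",
        }
    ],
}

for line in reference_lines(dataset_with_reference):
    print(line)
```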