How‑to: open license data

Overview

Applying an open license to datasets, databases, and other forms of numerical information can be both simple and involved.

The discussion here is restricted to non‑personal data which can be or has been legitimately made public. That stipulation then removes issues of personal privacy and commercial privacy from further consideration.

The short answer is that you select a suitable open license, obtain the relevant license notice, assess who should be attributed where relevant, and add this information as a header or as a standalone file to your data file or files.
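The two placement options just described, a header or a standalone file, can be sketched as follows. This is a minimal illustration and not a prescribed method. It assumes that the data is a text file such as CSV, that "# " is an acceptable comment prefix for the reader in use, and that the notice is written with the SPDX-License-Identifier and SPDX-FileCopyrightText tags and the ".license" sidecar naming borrowed from the REUSE conventions. The function names are hypothetical.

```python
from pathlib import Path
from typing import List


def build_notice(spdx_id: str, holders: List[str], year: int) -> List[str]:
    """Return notice lines: one license tag plus one attribution line per holder."""
    lines = [f"SPDX-License-Identifier: {spdx_id}"]
    for holder in holders:
        lines.append(f"SPDX-FileCopyrightText: {year} {holder}")
    return lines


def add_header(data_file: Path, notice: List[str], prefix: str = "# ") -> None:
    """Prepend the notice to a text data file as comment lines (assumed prefix)."""
    original = data_file.read_text(encoding="utf-8")
    header = "".join(f"{prefix}{line}\n" for line in notice)
    data_file.write_text(header + original, encoding="utf-8")


def write_standalone(data_file: Path, notice: List[str]) -> Path:
    """Write the notice to a sidecar file next to the data file and return its path."""
    sidecar = data_file.with_name(data_file.name + ".license")
    sidecar.write_text("\n".join(notice) + "\n", encoding="utf-8")
    return sidecar
```

For example, build_notice("CC-BY-4.0", ["Example Author"], 2020) followed by add_header(Path("data.csv"), notice) would place two comment lines above the column names. The sidecar variant leaves the data file untouched, which may be preferable for binary formats or for readers that do not skip comment lines.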

The long answer is that if you are importing and exporting data and handling metadata repeatedly, then your data management tooling will need to recognize and process this licensing information as well.
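As a rough sketch of what recognizing licensing information on import might involve, the function below scans the leading comment lines of a text data file for a license tag. It assumes a "#" comment prefix and an SPDX-License-Identifier tag in the header. Both are illustrative conventions rather than anything mandated here, and read_license is a hypothetical name.

```python
import re
from typing import Optional

# Matches a tag such as "SPDX-License-Identifier: CC-BY-4.0"
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def read_license(text: str, prefix: str = "#") -> Optional[str]:
    """Return the license identifier found in the leading comment lines, if any."""
    for line in text.splitlines():
        if not line.startswith(prefix):
            break  # the header ends at the first non-comment line
        match = SPDX_TAG.search(line)
        if match:
            return match.group(1)
    return None
```

Tooling that re-exports the data would then need to carry the returned identifier, together with any attribution lines, into whatever it writes out.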

The choice of license is straightforward. Just two legal instruments are recommended, and both originate from the Creative Commons organization:

Instrument   Target                Comment
CC‑BY‑4.0    datasets              any associated software needs open source licensing
CC0‑1.0      associated metadata   public domain waiver to facilitate cataloging

Theoretical considerations

A good definition of open data is found in European Directive (EU) 2019/1024 on public sector information (European Commission 2019, recital 16):

open data as a concept is generally understood to denote data in an open format that can be freely used, re‑used and shared by anyone for any purpose

That legal definition makes it clear that free use and reuse are key attributes. That said, some open licenses impose an additional obligation to attribute when republishing, irrespective of whether the data in question is in its original state or has been modified.

The next issue to consider is avoiding the data silos that arise from inappropriate license choice. This problem is not really resolved as such. Rather, the most common attribution license is selected in order to create the largest possible common silo. That license is the Creative Commons CC‑BY‑4.0 license. Additional background and analysis are provided on this blog.

Any associated metadata needs to be licensed for use and reuse too, and the Creative Commons CC0‑1.0 public domain waiver is recommended so that other parties can catalog it with minimal friction.

Practical considerations

Practical questions concerning license notices and their inclusion are covered in Ball (2014). One approach is the Frictionless Data framework, in which the legal information is contained in a JSON file and the entire bundle is then zipped (Frictionless Data ongoing, Karev and Winfree 2020).
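The bundle approach can be sketched with the Python standard library alone, without the frictionless package itself. The descriptor below uses the "licenses", "contributors", and "resources" properties as the author of this sketch understands them from the Data Package specification, so the property names should be checked against the current specification before use. The package name, contributor, and file names are hypothetical placeholders.

```python
import json
import zipfile
from pathlib import Path


def write_package(folder: Path, csv_name: str, archive: Path) -> None:
    """Write a datapackage.json beside the data file, then zip both together."""
    descriptor = {
        "name": "example-dataset",  # hypothetical package name
        "licenses": [
            {
                "name": "CC-BY-4.0",
                "path": "https://creativecommons.org/licenses/by/4.0/",
                "title": "Creative Commons Attribution 4.0 International",
            }
        ],
        # Hypothetical party to be attributed
        "contributors": [{"title": "Example Author", "role": "author"}],
        "resources": [{"name": "data", "path": csv_name}],
    }
    descriptor_path = folder / "datapackage.json"
    descriptor_path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")

    # Bundle the descriptor and the data file into a single archive
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(descriptor_path, "datapackage.json")
        bundle.write(folder / csv_name, csv_name)
```

Keeping the legal information in a separate JSON descriptor means the data file itself needs no header, and cataloging tools can read the license without parsing the data.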

Handling attribution tracking within broader data management tooling is beyond the scope of this topic.

References

Ball, Alex (17 July 2014). How to license research data. Edinburgh, United Kingdom: Digital Curation Centre (DCC).

Ball, Alex and Monica Duke (2015). How to cite datasets and link to publications. Edinburgh, United Kingdom: Digital Curation Centre (DCC).

European Commission (26 June 2019). “Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information — PE/28/2019/REV/1”. Official Journal of the European Union. L 172: 56–83.

Frictionless Data (ongoing). Applying licenses, waivers or public domain marks. Frictionless data. Cambridge, United Kingdom.

Karev, Evgeny and Lilly Winfree (8 October 2020). Announcing the new frictionless framework. Open Knowledge Foundation blog. Creative Commons CC‑BY‑4.0 license.

Lombardi, Francesco (22 June 2020). Flomb/Calliope-Italy: v0.2.2 — Direct and indirect electrification options. Europe: Zenodo. doi:10.5281/zenodo.3903089. Zipped dataset. DOI resolves to the latest version. Creative Commons CC‑BY‑4.0 license.