# metrics_v0.3.yaml
# LIST OF FAIRSFAIR METRICS AND THEIR RESPONSE OUTPUT FORMATS
metrics:
## ---------------- FINDABILITY ---------------- ##
- metric_identifier: FsF-F1-01D
metric_short_name: Globally Unique Identifier
metric_name: Data is assigned a globally unique identifier.
description: A data object may be assigned a globally unique identifier such that it can be referenced unambiguously by humans or machines. Globally unique means an identifier should be associated with only one resource at any time. Examples of unique identifiers of data are Internationalized Resource Identifier (IRI), Uniform Resource Identifier (URI) such as URL and URN, Digital Object Identifier (DOI), the Handle System, identifiers.org, w3id.org and Archival Resource Key (ARK). A data repository may assign a globally unique identifier to your data or metadata when you publish and make it available through its services.
fair_principle: F1
evaluation_mechanism: Identifier is considered unique if it is successfully validated through https://pythonhosted.org/IDUtils/. Supported schemes are ISBN10, ISBN13, ISSN, ISTC, DOI, Handle, EAN8, EAN13, ISNI, ORCID, ARK, PURL, LSID, URN, Bibcode, arXiv, PubMed ID, PubMed Central ID, GND.
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-F1-02D
metric_short_name: Persistent Identifier
metric_name: Data is assigned a persistent identifier.
description: We make a distinction between the uniqueness and persistence of an identifier. An HTTP URL (the address of a given unique resource on the web) is globally unique, but may not be persistent, as the URL of data may become inaccessible (link rot problem) or the data available under the original URL may be changed (content drift problem). Identifiers based on the Handle System, DOI, and ARK are both globally unique and persistent. They are maintained and governed such that they remain stable and resolvable for the long term. The persistent identifier (PID) of a data object may resolve (point) to a landing page with metadata containing further information on how to access the data content, in some cases to a downloadable artefact, or to nothing if the data or repository is no longer maintained. Therefore, ensuring persistence is a shared responsibility between a PID service provider (e.g., DataCite) and its clients (e.g., data repositories). For example, the DOI system guarantees the persistence of its identifiers through its social (e.g., policy) and technical infrastructures, whereas a data provider ensures the availability of the resource (e.g., landing page, downloadable artefact) associated with the identifier.
fair_principle: F1
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-F2-01M
metric_short_name: Descriptive Core Metadata
metric_name: Metadata includes descriptive core elements (creator, title, data identifier, publisher, publication date, summary and keywords) to support data findability.
description: Metadata is descriptive information about a data object. Since the metadata required differs depending on the users and their applications, this metric focuses on core metadata. The core metadata is the minimum descriptive information required to enable data finding, including citation which makes it easier to find data. We determine the required metadata based on common data citation guidelines (e.g., DataCite, ESIP, and IASSIST), and metadata recommendations for data discovery (e.g., EOSC Datasets Minimum Information (EDMI), DataCite Metadata Schema, W3C Recommendation Data on the Web Best Practices and Data Catalog Vocabulary). This metric focuses on domain-agnostic core metadata. Domain or discipline-specific metadata specifications are covered under metric FsF-R1.3-01M. A repository should adopt a schema that includes properties of core metadata, whereas data authors should take the responsibility of providing core metadata.
fair_principle: F2
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 2
passed: false
- metric_identifier: FsF-F3-01M
metric_short_name: Inclusion of Data Identifier in Metadata
metric_name: Metadata includes the identifier of the data it describes.
description: The metadata should explicitly specify the identifier of the data such that users can discover and access the data through the metadata. If the identifier specified is persistent and points to a landing page, the data identifier and links to download the data content should be taken into account in the assessment.
fair_principle: F3
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-F4-01M
metric_short_name: Searchable Metadata
metric_name: Metadata is offered in such a way that it can be retrieved programmatically.
description: This metric refers to ways through which the metadata of data is exposed or provided in a standard and machine-readable format. Assessing this metric will require an understanding of the capabilities offered by the data repository used to host the data. Metadata may be available through multiple endpoints. For example, if data is hosted by a repository, the repository may disseminate its metadata through a metadata harvesting protocol (e.g., via OAI-PMH) and/or a web service. Metadata may also be embedded as structured data on a data page for use by web search engines such as Google and Bing or be available as linked (open) data.
fair_principle: F4
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 2
- metric_identifier: FsF-A1-01M
metric_short_name: Data Access Information
metric_name: Metadata contains access level and access conditions of the data.
description: This metric determines if the metadata includes the level of access to the data, such as public, embargoed, restricted, or metadata-only access, and its access conditions. Both access level and conditions are necessary information to potentially gain access to the data. It is recommended that data should be as open as possible and as closed as necessary. There are no access conditions for public data. Datasets should be released into the public domain (e.g., with an appropriate public-domain-equivalent license such as the Creative Commons CC0 license) and openly accessible without restrictions when possible. Embargoed access refers to data that will be made publicly accessible at a specific date, which should be specified in the metadata. For example, a data author may release their data after having published their findings from the data. Therefore, access conditions such as the date the data will be released publicly are essential. Restricted access refers to data that can be accessed under certain conditions (e.g., because of commercial, sensitive, or other confidentiality reasons, or because the data is only accessible via a subscription or a fee). Restricted data may be available to a particular group of users or after permission is granted. For restricted data, the metadata should include the conditions of access to the data, such as a point of contact or instructions to access the data. Metadata-only access refers to data that is not made publicly available and for which only metadata is publicly available.
fair_principle: A1
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-I1-01M
metric_short_name: Formal Representation of Metadata
metric_name: Metadata is represented using a formal knowledge representation language.
description: Knowledge representation is vital for machine-processing of the knowledge of a domain. Expressing the metadata of a data object using a formal knowledge representation will enable machines to process it in a meaningful way and enable more data exchange possibilities. Examples of knowledge representation languages are RDF, RDFS, and OWL. These languages may be serialized (written) in different formats. For instance, RDF/XML, RDFa, Notation3, Turtle, N-Triples and N-Quads, and JSON-LD are RDF serialization formats.
fair_principle: I1
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 2
- metric_identifier: FsF-I1-02M
metric_short_name: Metadata with Semantic Resources
metric_name: Metadata uses common semantic resources.
description: A metadata document or selected parts of the document may incorporate additional terms from semantic resources (also referred to as semantic artefacts) so that the contents are unambiguous and can be processed automatically by machines. This enrichment facilitates enhanced data search and interoperability of data from different sources. Ontologies, thesauri, and taxonomies are kinds of semantic resources, and they come with varying degrees of expressiveness and computational complexity. Knowledge organization schemes such as thesauri and taxonomies are semantically less formal than ontologies.
fair_principle: I1
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-I3-01M
metric_short_name: Links to Related Entities
metric_name: Metadata includes links between the data and its related entities.
description: Linking data to its related entities will increase its potential for reuse. The linking information should be captured as part of the metadata. A dataset may be linked to its prior version, related datasets or resources (e.g., publication, physical sample, funder, repository, platform, site, or observing network registries). Links between data and its related entities should be expressed through relation types (e.g., DataCite Metadata Schema specifies relation types between research objects through the fields ‘RelatedIdentifier’ and ‘RelationType’), and preferably use persistent identifiers for related entities (e.g., ORCID for contributors, DOI for publications, and ROR for institutions).
fair_principle: I3
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-R1-01MD
metric_short_name: Metadata of Data Content
metric_name: Metadata includes the descriptions of the content of the data.
description: This metric evaluates if a description (properties) of the content of the data is specified in the metadata. The description should be an accurate reflection of the actual data deposited. Data content descriptors include but are not limited to resource type (e.g., data or a collection of data), variable(s) measured or observed, method, data format and size. Ideally, ontological vocabularies should be used to describe data content to support interdisciplinary reuse.
fair_principle: R1
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 4
- metric_identifier: FsF-R1.1-01M
metric_short_name: Data Usage License
metric_name: Metadata includes license information under which the data can be reused.
description: This metric evaluates if data is associated with a license because otherwise users cannot reuse it in a clear legal context. We encourage the application of licenses for all kinds of data whether public, restricted or for specific users. Without an explicit license, users do not have a clear idea of what can be done with your data. Licenses can be of standard type (Creative Commons, Open Data Commons Open Database License) or bespoke licenses, and rights statements which indicate the conditions under which data can be reused. It is highly recommended to use a standard, machine-readable license such that it can be interpreted by machines and humans. In order to inform users about what rights they have to use a dataset, the license information should be specified as part of the dataset’s metadata.
fair_principle: R1.1
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-R1.2-01M
metric_short_name: Data Provenance
metric_name: Metadata includes provenance information about data collection or generation.
description: >-
Data provenance (also known as lineage) represents a dataset’s history, including the people, entities, and processes involved in its creation, management and longer-term curation. It is essential to provide provenance information about your data to provide valuable context and to enable informed use and reuse. The levels of provenance information needed can vary depending on the data type (e.g., measurement, observation, derived data, or data product) and research domains. For that reason, it is difficult to define a set of finite provenance properties that will be adequate for all domains. Based on existing work, we suggest that the following provenance properties of data generation or collection are included in the metadata record as a minimum.
(a) Sources of data, e.g., datasets the data is derived from and instruments
(b) Data creation or collection date
(c) Contributors involved in data creation and their roles
(d) Data publication, modification and versioning information
There are various ways through which provenance information may be included in a metadata record. Some of the provenance properties (e.g., instrument, contributor) may be best represented using PIDs (such as DOIs for data, ORCIDs for researchers).
This way, humans and systems can retrieve more information about each of the properties by resolving the PIDs. Alternatively, the provenance information can be given in a linked provenance record expressed explicitly in e.g., PROV-O or PAV or Vocabulary of Interlinked Datasets (VoID).
fair_principle: R1.2
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 2
- metric_identifier: FsF-R1.3-01M
metric_short_name: Community-Endorsed Metadata Standard
metric_name: Metadata follows a standard endorsed by a given research community.
description: In addition to core metadata required to support data discovery (covered under metric FsF-F2-01M), metadata to support data reusability should be made available following community-endorsed metadata standards. Some communities have well-established metadata standards (e.g., geospatial [ISO19115], biodiversity [DarwinCore, ABCD, EML], social science [DDI], astronomy [International Virtual Observatory Alliance Technical Specifications]) while others have limited standards or standards that are under development (e.g., engineering and linguistics). The use of community-endorsed metadata standards is usually encouraged and supported by domain and discipline-specific repositories.
fair_principle: R1.3
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1
- metric_identifier: FsF-R1.3-02D
metric_short_name: Data File Format
metric_name: Data is available in a file format backed by the research community.
description: File formats refer to methods for encoding digital information. For example, CSV for tabular data, NetCDF for multidimensional data and GeoTIFF for raster imagery. Data should be made available in a file format that is backed by the research community to enable data sharing and reuse. Consider, for example, file formats that are widely used and supported by the most commonly used software and tools. These formats should also be suitable for long-term storage and archiving, and are usually the ones recommended by a data repository. Such formats not only give a higher certainty that your data can be read in the future, but also help to increase reusability and interoperability. Using community-endorsed formats enables data to be loaded directly into the software and tools used for data analysis. It makes it possible to easily integrate your data with other data using the same preferred format. The use of preferred formats will also help to transform the format to a newer one, in case a preferred format becomes outdated.
fair_principle: R1.3
evaluation_mechanism: to-do
created_by: FAIRsFAIR
date_created: 2020-07-08
date_updated: 2020-07-08
version: 0.3
total_score: 1