Metadata

 

"Scientific metadata provide the information
necessary for investigators separated by time,
space, institution or disciplinary norm
to establish common ground" [1]

 

At ingest (the inclusion of a dataset in a data archive), the description of the dataset is checked. Both the creator and the data librarian can add metadata to the dataset. Metadata are structured, standardised units of information that describe, explain and locate an original source (such as a dataset). Adding metadata makes it easier to find, recognise or (re)use an information source, or to link it to other information sources. Metadata are often called ‘data about data’ or ‘information about information’. Some metadata describe the contents (for example, an abstract); others describe the context (date of creation, instruments used, etc.).

The important question when adding metadata is: “Does a (future) user have sufficient information to understand what the dataset comprises?” Because of the diversity of datasets, there are no standard answers: One size fits no one.

                     

Below you see a list of the fields included in the 3TU.Datacentrum metadata form. The fields with an asterisk (*) are obligatory. The structure used by 3TU.Datacentrum is based on the Dublin Core Metadata Initiative (DCMI) standard. Dublin Core is easy to use and is applied all over the world.

  • Creator*
    Main researcher(s) involved in producing the data
  • Contributor
    Institution where the data was created or collected. A person or organization responsible for making contributions to the dataset.
  • Publisher*
    Institution which submitted the work
  • Title*
    Name or title by which a resource is known
  • Publication year*
    The year when the data was or will be made publicly available
  • Date created
    Date the resource itself was put together; this could be a date range or a single date
  • Description*
    Concise description of the contents of the dataset. Describe the research objective, type of research, method of data collection and type of data.
  • Subject
    Subject, keyword, classification code, or key phrase describing the resource
  • Coverage temporal
    Indicate the dates to which the data refer. Enter the year, or beginning and end dates
  • Coverage spatial
    Describe the geographic area to which the data refer (e.g. municipality, town/city, region, country). The geographic coordinates of the area may be included, if desired
  • Identifier
    3TU.Datacentrum automatically assigns a persistent identifier to a dataset once the entire deposit procedure has been completed. In some cases, a dataset may be known by one or more other (persistent) identifiers
  • URL to publication
    Include the web addresses for any publication, important internal reports or other datasets that are related to your dataset.
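
The field list above can be read as a simple record layout with obligatory and optional entries. The following is a minimal sketch in Python of how a deposit could be checked against it. The field names are copied from the list; the function names `is_empty` and `check_record` and the example values are hypothetical, and this is not the 3TU.Datacentrum implementation.

```python
# Field names are copied from the metadata form; obligatory ones carry an asterisk (*) there.
REQUIRED_FIELDS = ("Creator", "Publisher", "Title", "Publication year", "Description")
OPTIONAL_FIELDS = (
    "Contributor", "Date created", "Subject", "Coverage temporal",
    "Coverage spatial", "Identifier", "URL to publication",
)


def is_empty(value) -> bool:
    """Treat None, blank strings and empty collections as 'not filled in'."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def check_record(record: dict) -> dict:
    """Report obligatory fields that are missing and field names not on the form."""
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    missing = [name for name in REQUIRED_FIELDS if is_empty(record.get(name))]
    unknown = sorted(name for name in record if name not in known)
    return {"missing": missing, "unknown": unknown}


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    example = {
        "Creator": ["Example Researcher"],
        "Title": "Example rainfall measurements",
        "Publication year": 2012,
        "Keywords": ["rain"],  # not a field on the form
    }
    print(check_record(example))
    # {'missing': ['Publisher', 'Description'], 'unknown': ['Keywords']}
```

A check like this only confirms that the obligatory fields are filled in. Whether a future user can actually understand the dataset from them remains a matter of judgement.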

3TU.Datacentrum also conforms to the DataCite Metadata Scheme [2]. When a DOI is assigned to a dataset, the associated metadata are entered in the so-called metadata store.

Based on the dataset, it is determined whether, and in which fields, extra metadata should be added. The list is then sent to the creator for approval. Keep in mind that metadata can sometimes be gathered from the data themselves, because some data formats embed metadata in the file. Digital photos are an example: saving a photo automatically also saves data about the circumstances in which it was taken (aperture, exposure, etc.). Another example: in the IDRA weather measurement dataset the description includes: “Radar range(s): standard, near, far. Max rain level: strong rain”. These metadata were gathered from the dataset itself.

If the data have no publication date, or if the publication date does not matter, the date included in the metadata is the date on which the data were uploaded. If an observation was made on a particular date, that date is entered in the field Date created.
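
The date rule above can be sketched as a small Python function. This is only an illustration: the function name `date_fields` and its parameters are hypothetical, and it assumes that the date "included in the metadata" is the one that feeds the Publication year field.

```python
from datetime import date
from typing import Optional


def date_fields(uploaded: date,
                published: Optional[date] = None,
                observed: Optional[date] = None) -> dict:
    """Fill the date-related fields of a record.

    Assumption: without a publication date, the upload date is used instead.
    """
    fields = {"Publication year": (published or uploaded).year}
    if observed is not None:
        # An observation made on a particular date goes into 'Date created'.
        fields["Date created"] = observed.isoformat()
    return fields
```

For example, `date_fields(date(2012, 12, 8))` falls back to the upload date for the year, while passing `observed=date(2011, 5, 3)` additionally fills in Date created.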

Curation boundaries

At the moment of data creation, the research data are managed by the researcher. Over time, the research data will (ideally) move in steps from the private to the public domain. These steps are called curation boundaries [3, 4]. They are moments when data are transferred between people, organisations, machines, laboratories and disciplines, or from one data format to another, and when decisions have to be made about the data. At these steps, “data friction” [1] can occur: “points of resistance where data can be garbled, misinterpreted, lost”. Paul Edwards et al. suggest that you should not expect metadata to guarantee unambiguity [1]. Even within a discipline, people use different vocabularies and regularly ask each other “what do you mean exactly?”. A curation boundary is sometimes compared to the fit of two metal parts: on the one hand, you strive for precision (metadata), but on the other hand, some lubrication is required to overcome imperfections. In data curation, direct communication with the researcher (the data producer) is the lubricant, or the repair when ambiguities arise.

 

1. Edwards, P. N., et al. (2011). Science Friction: Data, Metadata, and Collaboration. Social Studies of Science 41(5), 667-690. doi: 10.1177/0306312711413314
2. DataCite. (2011). DataCite MetaData Scheme for the Publication and Citation of Research Data. Retrieved 8-12-2012 from http://datacite.org/schema/DataCite-MetadataKernel_v2.0.pdf
3. Sieverts, E. (2011). De cirkel van onderzoeksdata. Retrieved 8-12-2012 from http://www.library.uu.nl/medew/it/eric/data.pdf
4. ANDS. (2011). Data Curation Continuum. Retrieved 8-12-2012 from http://ands.org.au/guides/curation.continuum.pdf
