WP2: Data access to marine biological data | Emodnet Biology

WP2: Data access to marine biological data

Lead: VLIZ

+ (MBA, Deltares, SAHFOS, MARIS, SMHI, IEO, IMR, IFREMER, OGS, Aarhus Univ, ILVO, ICES, NIOZ, SYKE, NIMRD, Cefas, IPMA, IOF)

 

Objectives

 

The objective of Work Package 2 is to provide data and metadata on observations of marine species (phytoplankton, zooplankton, macroalgae, angiosperms, benthos, birds, mammals, reptiles and fish) and to address the primary task of the tender: the development of a common method of access to biological data held in repositories by the organizations that collect them, making the data interoperable so that all data of a particular type collected within a defined time and space window can be found, visualised and downloaded, and data from different sources can be assembled without further processing. The marine biological data and databases that will contribute to the project are listed in detail in the spreadsheet attached to this proposal (datasets.xls), and access to the data is ensured by the data supply statements. The standards and data formats used within this project to integrate the scattered marine biological datasets are based on the World Register of Marine Species (WoRMS), the authoritative and comprehensive list of names of marine organisms worldwide, and the Darwin Core Archive, an internationally recognised biodiversity informatics data standard that simplifies the publication of biodiversity data. Through the implementation of the European Ocean Biogeographic Information System (EurOBIS) as its marine biological data infrastructure, this project collaborates closely with OBIS, an evolving global strategic alliance of people and organizations sharing a vision to make marine biogeographic data freely available over the World Wide Web. The specific objectives of this work package are:

 

  • analyse and assess in depth the usability and fitness for purpose of the different data and databases that will contribute to the project, including analysis of trait information
  • decide on the optimal mechanisms for linkage with the EMODnet portal, making maximal use of existing systems
  • format the data and perform taxonomic and data standardizations to allow interoperability with the EMODnet biological portal
  • determine the suitability of the data for the creation of the data products and validate the produced data products

Methodology & activities

1. Analyse and assess in depth the usability and fitness for purpose of the different data

In a first phase, we will undertake an in-depth analysis and assessment of the usability and fitness for purpose of the different data and databases that will contribute to the project. The analysis will look at the data types available (for example, whether abundance, absence or biomass data are available), the taxonomic, spatial and temporal coverage of each dataset, and whether the dataset is suited to contribute to specific envisaged data products. We will also assess in depth whether supporting functional trait information can be made available. The priority traits to be assessed follow the prioritization made during the EMODnet Biology II project and include taxonomy, geography, body size, environment, habitat, depth, reproduction, mobility, skeleton and diet. Depending on the specificities of a dataset, a particular model of linking with the EMODnet data portal might also be preferred. For example, detailed metadata descriptions, species observation data, species abundance data and aggregated species data will contribute to the EMODnet portal using different modalities and can be served through different technologies. Based on this first assessment, we can then decide which datasets are best suited for the creation of the data products and what the optimal mechanism is for linkage with the EMODnet portal, making use of existing systems as much as possible. The partners of this work package are responsible for different databases, for example large-scale thematic databases or long-term national marine biological monitoring data series. These databases cover the largest European marine biological monitoring series and include all trophic levels, or species groups, of the marine ecosystem.
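The first-pass assessment described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Observation` record and the `assess_dataset` function are hypothetical names introduced here, not part of any EMODnet, EurOBIS or WoRMS tooling, and a real assessment would cover many more aspects (traits, sampling methods, licences).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Observation:
    """Hypothetical, simplified occurrence record (not a project schema)."""
    taxon: str
    latitude: float
    longitude: float
    observed_on: date
    abundance: Optional[float] = None
    biomass: Optional[float] = None
    absence: bool = False


def assess_dataset(observations):
    """Summarise available data types and taxonomic, spatial and temporal coverage."""
    if not observations:
        raise ValueError("dataset contains no observations")
    lats = [o.latitude for o in observations]
    lons = [o.longitude for o in observations]
    dates = [o.observed_on for o in observations]
    return {
        "n_records": len(observations),
        "n_taxa": len({o.taxon for o in observations}),
        "has_abundance": any(o.abundance is not None for o in observations),
        "has_biomass": any(o.biomass is not None for o in observations),
        "has_absence": any(o.absence for o in observations),
        # Bounding box as (min lon, min lat, max lon, max lat)
        "bounding_box": (min(lons), min(lats), max(lons), max(lats)),
        "period": (min(dates), max(dates)),
    }
```

A summary of this kind, produced per dataset, would make it easier to compare datasets against the requirements of a given data product before any formatting work starts.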

2. Format the data and perform data standardizations

In a second step, the data will be formatted and the taxonomic standardization will be performed for all biological data, to allow interoperability with the EMODnet biological portal (WP6) and to integrate the data from different sources for the creation of data products (WP4). One of the main difficulties with biological data is that every species is a variable, and there are over 33,000 species in European seas. The taxonomic standardization will be done using the World Register of Marine Species (WoRMS). All taxon names will be matched with WoRMS to trace and rule out spelling variations and to resolve frequently used synonyms. Valid taxon names that are not yet present in WoRMS are passed on to the responsible taxonomic editors of WoRMS, who will check them, resolve the taxonomy and may decide to add them to the Register. Further complications in data integration and analysis are that different sampling methods significantly affect which species are collected (and the reported abundance), and that the same species may be reported from very different habitats. We will therefore need to document and classify where and how the species were sampled, which sampling gear was used and how the data were processed. Other quality control checks include:

  • checking that the required data fields are present and that their values are possible
  • checking that all data fields contain the appropriate data
  • ensuring relational integrity of the database for datasets that have Measurements or Facts values
  • checking that abundances are provided for the datasets for which they were promised
  • checking that, when biomasses are provided, it is clear whether they are wet weight or dry weight
  • checking that, when codes are provided for certain data (e.g. sex, life stage, sampling gear), they are explained
  • checking for duplicate records

These checks have proved valuable not only to limit data duplication, but also to assess whether relevant sampling descriptors (different subsamples) or biotic measurements (e.g. life stages, size measurements) were omitted. All of this metadata for the contributing datasets will be made available using ISO- and INSPIRE-compliant standards.
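As a rough illustration of the name matching and a few of the quality control checks described above, consider the sketch below. It makes several assumptions: the small `ACCEPTED` and `SYNONYMS` tables are a toy, in-memory stand-in for a WoRMS lookup (the project would match against WoRMS itself), the function names `match_name` and `qc` are hypothetical, and fuzzy matching with Python's `difflib` is a swapped-in technique chosen for brevity, not the matching method used by WoRMS. The field names follow Darwin Core terms.

```python
import difflib

# Toy reference tables standing in for a WoRMS lookup (illustrative only).
ACCEPTED = {"Gadus morhua", "Mytilus edulis"}
SYNONYMS = {"Gadus callarias": "Gadus morhua"}

# Darwin Core terms treated as required in this sketch (an assumption).
REQUIRED = ("scientificName", "decimalLatitude", "decimalLongitude", "eventDate")


def match_name(raw):
    """Return (accepted name or None, match status) for a reported taxon name."""
    name = " ".join(raw.split())                 # collapse stray whitespace
    name = name[:1].upper() + name[1:].lower()   # normalise capitalisation
    if name in ACCEPTED:
        return name, "exact"
    if name in SYNONYMS:
        return SYNONYMS[name], "synonym"
    candidates = sorted(ACCEPTED | set(SYNONYMS))
    close = difflib.get_close_matches(name, candidates, n=1, cutoff=0.85)
    if close:
        hit = close[0]
        return SYNONYMS.get(hit, hit), "fuzzy"
    # Unmatched names would be passed on to the taxonomic editors.
    return None, "unmatched"


def qc(records):
    """Return a list of (record index, issue) pairs for a list of dict records."""
    issues = []
    seen = set()
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED if rec.get(f) in (None, "")]
        if missing:
            issues.append((i, "missing: " + ", ".join(missing)))
            continue
        lat, lon = rec["decimalLatitude"], rec["decimalLongitude"]
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append((i, "coordinates out of range"))
        key = tuple(rec[f] for f in REQUIRED)
        if key in seen:
            issues.append((i, "duplicate record"))
        seen.add(key)
    return issues
```

In this sketch a fuzzy match is only a suggestion: a spelling variant resolved this way would still need confirmation before the record is standardized, and the similarity cutoff of 0.85 is an arbitrary choice.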

 

Output (Deliverables)

D2.1: Assessment of data and databases, including list of datasets that will be used for creation of products (M3)

D2.2: Data standardization and formatting of a subset of the data that is needed for the data products (M12)

D2.3: Data standardization and formatting of all datasets mentioned under data coverage section of proposal for linking with EMODnet biology (M24)