Digital Collectomics Example SIP Jena applied ML

This dataset contains a selection of 100 low-resolution herbarium scans from the Herbarium Haussknecht, provided by the Senckenberg Institute for Plant Form and Function (SIP), Jena. The images were processed with the plant organ detection tool (a convolutional neural network) by Younis et al. (2020). The dataset consists of two files structured according to the RO-Crate specification:

  • A full RO-Crate containing both the detected plant organ annotations, in machine-readable form within ro-crate-metadata.json, and low-resolution images of the herbarium scans with the detected bounding boxes drawn on them
  • annotations-highres.json: a standalone RO-Crate metadata file (JSON-LD) containing the detected annotations in reference to the original high-resolution scans, which are linked via their web URLs
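Both files are RO-Crate metadata documents, so they can be read with a plain JSON parser: every entity sits in a flat "@graph" array, and the metadata descriptor points to the root dataset via "about". The sketch below uses only the Python standard library and generic RO-Crate conventions; it does not know how the plant organ annotations are modelled inside these particular files, and SAMPLE_CRATE is a hypothetical minimal crate for illustration, not content from this dataset.

```python
from __future__ import annotations

import json
from typing import Any


def load_crate(path: str) -> dict[str, Any]:
    """Read an RO-Crate metadata file (JSON-LD) from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def types_of(entity: dict[str, Any]) -> list[str]:
    """@type may be a single string or a list; normalise it to a list."""
    t = entity.get("@type", [])
    return [t] if isinstance(t, str) else list(t)


def _conforms_to_ro_crate(entity: dict[str, Any]) -> bool:
    """True if conformsTo references an RO-Crate specification URI."""
    refs = entity.get("conformsTo", [])
    refs = [refs] if isinstance(refs, dict) else refs
    return any("w3id.org/ro/crate" in r.get("@id", "") for r in refs)


def root_entity(crate: dict[str, Any]) -> dict[str, Any]:
    """Follow the metadata descriptor's 'about' link to the root data entity."""
    index = {e["@id"]: e for e in crate.get("@graph", [])}
    for entity in index.values():
        if "about" in entity and _conforms_to_ro_crate(entity):
            return index[entity["about"]["@id"]]
    raise ValueError("no RO-Crate metadata descriptor found")


def entities_of_type(crate: dict[str, Any], type_name: str) -> list[dict[str, Any]]:
    """Return all entities whose @type includes type_name."""
    return [e for e in crate.get("@graph", []) if type_name in types_of(e)]


# Hypothetical minimal crate, for illustration only (not from this dataset).
SAMPLE_CRATE = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": "Dataset", "name": "example crate"},
        {"@id": "scan-001.jpg", "@type": "File"},
    ],
}
```

With the real files, one would call load_crate("annotations-highres.json") and then inspect entities_of_type(...) for whatever types the annotations actually use.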

The annotations cover the following six classes, each mapped to a term from a controlled vocabulary:

  1. leaf ->
  2. flower ->
  3. fruit ->
  4. seed ->
  5. stem ->
  6. root ->
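When working with the detections, it can help to check that every label belongs to the six classes above and to tally how often each organ occurs. The sketch below assumes a hypothetical list of detection records with a "label" field; the actual property names used in the RO-Crate files are not documented here and may differ.

```python
from collections import Counter

# The six organ classes listed in this dataset description.
ORGAN_CLASSES = ("leaf", "flower", "fruit", "seed", "stem", "root")


def count_by_class(detections):
    """Tally detections per organ class, rejecting labels outside the six classes.

    `detections` is a hypothetical list of dicts with a "label" key;
    classes with no detections are reported with a count of 0.
    """
    counts = Counter({c: 0 for c in ORGAN_CLASSES})
    for det in detections:
        label = det["label"]
        if label not in ORGAN_CLASSES:
            raise ValueError(f"unexpected class label: {label!r}")
        counts[label] += 1
    return dict(counts)
```

Raising on unknown labels, rather than silently counting them, surfaces any mismatch between the data and the documented class list early.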


Dataset DOI: doi:10.12761/w2c1-x551


Additional Info

Last updated: February 21, 2024, 14:29 (UTC)
Created: February 16, 2024, 16:21 (UTC)

Responsible parties

Creator and point of contact
Name: Jonas Grieb

Name: Claus Weiland

Associated party
Name: Solveig Franziska Bucher
Role: Content provider

Associated party
Name: Kristin Victor
Role: Content provider

Research data management planning

Estimated volume of created data: cannot be estimated
Long-term archive location: SGN
