DS2STAC:#

A Python package for harvesting and ingesting (meta)data into STAC-based catalog infrastructures

DOI: 10.5281/zenodo.8086566

Overview:#

The DS2STAC (Data Servers/Services to STAC metadata catalog) package comprises three specialized sub-packages that scan and harvest datasets and generate STAC-Catalogs, STAC-Collections, and STAC-Items. A fourth package in the DS2STAC suite ingests the generated STAC metadata into a STAC database such as pgSTAC. Each harvesting package retrieves the geospatial and temporal information required to create SpatioTemporal Asset Catalog (STAC) Catalogs, Collections, and Items. At present, DS2STAC can extract data from THREDDS servers, Intake catalogs, and SensorThings APIs. In each of the three cases, it creates and manages uniform STAC Items, Catalogs, and Collections. These resources are then made publicly accessible through the pgSTAC database and the STAC API, giving users convenient access to the environmental research data held on these data servers and services.
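To illustrate how geospatial and temporal information feeds into STAC Items and Collections, here is a minimal sketch that uses plain dictionaries and only the Python standard library. It is not the DS2STAC implementation: the helper names `make_item` and `make_collection`, the example identifiers, and the `license` placeholder are assumptions made for this illustration.

```python
from datetime import datetime, timezone


def make_item(item_id, bbox, when):
    """Build a minimal STAC Item as a plain dict (hypothetical helper)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        # Polygon footprint derived from the bounding box.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # Assumes `when` is already expressed in UTC.
        "properties": {"datetime": when.strftime("%Y-%m-%dT%H:%M:%SZ")},
        "links": [],
        "assets": {},
    }


def make_collection(collection_id, description, items):
    """Build a minimal STAC Collection whose extent covers all items."""
    boxes = [item["bbox"] for item in items]
    # Timestamps share one fixed UTC format, so string order is time order.
    times = sorted(item["properties"]["datetime"] for item in items)
    return {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": collection_id,
        "description": description,
        "license": "proprietary",  # placeholder value for the sketch
        "extent": {
            "spatial": {"bbox": [[
                min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes),
            ]]},
            "temporal": {"interval": [[times[0], times[-1]]]},
        },
        "links": [],
    }


items = [
    make_item("a", (8.0, 48.5, 9.0, 49.5), datetime(2020, 1, 1, tzinfo=timezone.utc)),
    make_item("b", (8.5, 48.0, 9.5, 49.0), datetime(2021, 6, 1, tzinfo=timezone.utc)),
]
collection = make_collection("example", "Illustrative collection", items)
```

The Collection extent is derived from its Items, which is why each harvester needs a bounding box and a time stamp (or time range) for every dataset it scans.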

The DS2STAC initiative is a component of the CAT4KIT (Catalog service for Karlsruhe Institute of Technology) project. Cat4KIT has received funding from the Exzellenzuniversitäts-Vorhaben Research Data Management of the Karlsruhe Institute of Technology.

As mentioned above, DS2STAC has four submodules. Three of them harvest data from a specific data server or service and create STAC-Catalogs, -Collections, and -Items; the fourth ingests the created metadata into a STAC database. The submodules are named as follows:

TDS2STAC

This Python package harvests and generates STAC metadata from a THREDDS Data Server (TDS). It can extract detailed dataset information from three distinct web services: ISO, WMS, and ncML.
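As a rough illustration of the three services, the sketch below builds the metadata URLs for one dataset on a THREDDS server. It does not use the TDS2STAC API. The function name `thredds_service_urls`, the host `thredds.example.org`, and the dataset path are hypothetical, and the `iso`, `wms`, and `ncml` path segments follow common THREDDS defaults that an individual server may configure differently.

```python
from urllib.parse import quote

# Path segments follow common THREDDS defaults (assumption); servers may differ.
SERVICE_PATHS = {"iso": "iso", "wms": "wms", "ncml": "ncml"}


def thredds_service_urls(base_url, dataset_path):
    """Return one metadata URL per service for a dataset (hypothetical helper)."""
    base = base_url.rstrip("/")
    # quote() keeps "/" by default, so the directory structure is preserved.
    path = quote(dataset_path.lstrip("/"))
    urls = {
        name: f"{base}/{segment}/{path}"
        for name, segment in SERVICE_PATHS.items()
    }
    # WMS needs an explicit request to return its capabilities document.
    urls["wms"] += "?service=WMS&version=1.3.0&request=GetCapabilities"
    return urls


urls = thredds_service_urls(
    "https://thredds.example.org/thredds",  # hypothetical server
    "data/sample.nc",                       # hypothetical dataset path
)
```

Each URL would return an XML document (ISO 19115 metadata, WMS capabilities, or an ncML description) from which spatial and temporal extents can be read.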

Tip

For usage instructions, see the TDS2STAC documentation:
https://tds2stac.readthedocs.io.
To contribute to the development of this open-source package, use the repository at:
https://codebase.helmholtz.cloud/cat4kit/ds2stac/tds2stac/.

STA2STAC

The STA2STAC package serves as a bridge between the SensorThings API (STA) and the SpatioTemporal Asset Catalog (STAC) standard, making STA time-series data more Findable, Accessible, Interoperable, and Reusable (FAIR).
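One way to picture this bridge is to map a single SensorThings Datastream onto a STAC Item. The sketch below is not the STA2STAC implementation. It assumes the Datastream carries the standard `observedArea` (a GeoJSON Polygon) and `phenomenonTime` (an ISO 8601 `start/end` interval) properties; the function name `datastream_to_item` and the sample values are made up for illustration.

```python
def datastream_to_item(datastream):
    """Map a SensorThings Datastream dict to a minimal STAC Item (sketch)."""
    # Assumes observedArea is a GeoJSON Polygon; use its outer ring for the bbox.
    ring = datastream["observedArea"]["coordinates"][0]
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    # Assumes phenomenonTime is an ISO 8601 interval of the form "start/end".
    start, end = datastream["phenomenonTime"].split("/")
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": f"datastream-{datastream['@iot.id']}",
        "geometry": datastream["observedArea"],
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "properties": {
            "title": datastream["name"],
            # STAC allows a null datetime when a start/end range is given.
            "datetime": None,
            "start_datetime": start,
            "end_datetime": end,
        },
        "links": [],
        "assets": {},
    }


# Illustrative Datastream; all values are invented for the example.
sample = {
    "@iot.id": 7,
    "name": "Air temperature",
    "observedArea": {
        "type": "Polygon",
        "coordinates": [[[8.0, 49.0], [8.5, 49.0], [8.5, 49.2], [8.0, 49.2], [8.0, 49.0]]],
    },
    "phenomenonTime": "2022-01-01T00:00:00Z/2022-12-31T23:00:00Z",
}
item = datastream_to_item(sample)
```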

Tip

For usage instructions, see the STA2STAC documentation:
https://sta2stac.readthedocs.io.
To contribute to the development of this open-source package, use the repository at:
https://codebase.helmholtz.cloud/cat4kit/ds2stac/sta2stac/.

INTAKE2STAC

INTAKE2STAC is a Python package that streamlines retrieving dataset details from Amazon S3 via an Intake catalog and creating STAC-Catalogs, -Collections, and -Items. This improves the FAIRness of environmental data on Amazon S3: the data become easier to discover (Findable), easier to access (Accessible), more compatible with other datasets (Interoperable), and easier to repurpose (Reusable).
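As a loose sketch of the idea, the snippet below turns the `sources` section of an already-parsed Intake catalog into STAC asset entries. It does not use the INTAKE2STAC or Intake APIs. The function name `sources_to_assets`, the bucket name, and the driver-to-media-type table are assumptions for illustration, and the catalog is given as a plain dictionary so that the example needs no YAML parser.

```python
# Driver-to-media-type mapping is an assumption made for this sketch.
MEDIA_TYPES = {
    "netcdf": "application/x-netcdf",
    "zarr": "application/vnd+zarr",
}


def sources_to_assets(catalog):
    """Convert parsed Intake catalog sources into STAC asset dicts (sketch)."""
    assets = {}
    for name, source in catalog.get("sources", {}).items():
        asset = {
            "href": source["args"]["urlpath"],
            "title": name,
            "roles": ["data"],
        }
        if "description" in source:
            asset["description"] = source["description"]
        # Only set a media type when the driver is one we recognise.
        media_type = MEDIA_TYPES.get(source.get("driver"))
        if media_type:
            asset["type"] = media_type
        assets[name] = asset
    return assets


# Parsed form of a small Intake catalog; bucket and names are hypothetical.
catalog = {
    "sources": {
        "precipitation": {
            "driver": "zarr",
            "description": "Daily precipitation",
            "args": {"urlpath": "s3://example-bucket/precipitation.zarr"},
        },
        "stations": {
            "driver": "csv",
            "args": {"urlpath": "s3://example-bucket/stations.csv"},
        },
    }
}
assets = sources_to_assets(catalog)
```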

Tip

For usage instructions, see the INTAKE2STAC documentation:
https://intake2stac.readthedocs.io.
To contribute to the development of this open-source package, use the repository at:
https://codebase.helmholtz.cloud/cat4kit/ds2stac/intake2stac/.

INSUPDEL4STAC

The STAC specification is a way of exposing spatial and temporal data collections in a standardized manner. Specifically, it describes and catalogs spatiotemporal assets using a common structure. This package is designed to manage the insertion, updating, and deletion of STAC metadata in either pgSTAC (a PostgreSQL schema and set of functions for STAC) or STAC-FastAPI (a STAC-compliant FastAPI application).
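For the STAC-FastAPI route, insert, update, and delete operations can be pictured as HTTP requests in the style of the STAC API Transaction extension. The sketch below only constructs the requests and does not send them, and it is not the INSUPDEL4STAC API. The function name `item_request` and the host `stac.example.org` are hypothetical, and it assumes the target service has the Transaction extension enabled.

```python
import json
from urllib.request import Request


def item_request(api_url, collection_id, item, action):
    """Build (but do not send) a Transaction-style request for one Item."""
    base = f"{api_url.rstrip('/')}/collections/{collection_id}/items"
    if action == "insert":
        method, url = "POST", base
    elif action == "update":
        method, url = "PUT", f"{base}/{item['id']}"
    elif action == "delete":
        method, url = "DELETE", f"{base}/{item['id']}"
    else:
        raise ValueError(f"unknown action: {action!r}")
    # DELETE carries no body; insert and update send the Item as JSON.
    body = None if action == "delete" else json.dumps(item).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(url, data=body, headers=headers, method=method)


item = {"type": "Feature", "stac_version": "1.0.0", "id": "item-1"}
api = "https://stac.example.org"  # hypothetical STAC API endpoint
insert_req = item_request(api, "example", item, "insert")
update_req = item_request(api, "example", item, "update")
delete_req = item_request(api, "example", item, "delete")
```

Passing one of these objects to `urllib.request.urlopen` would perform the call; authentication and error handling are left out of the sketch.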

Tip

For usage instructions, see the INSUPDEL4STAC documentation:
https://insupdel4stac.readthedocs.io.
To contribute to the development of this open-source package, use the repository at:
https://codebase.helmholtz.cloud/cat4kit/ds2stac/insupdel4stac/.

Data Servers / Services#

A guide to the data servers and services used within this project.

About STAC#

This document serves as a reference for learning about the STAC metadata standard.

Architecture#

Contribution#