Metadata from various source platforms arrives in a range of data models (ISO 19115, DCAT, DataCite) and serialisations (XML, JSON, TTL). This module harmonizes the metadata into a common relational database model, from which it is processed further.
This process runs at intervals on newly arrived records:
- get updated versions (if hash-id-source is not processed yet)
- deduplicate using identifier-alias
- harmonize metadata to common model
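The three steps above can be sketched in plain Python. This is a minimal illustration, not the module's actual implementation: the record layout (`identifier`, `source`, `raw`), the function names, the reading of "hash-id-source" as a (hash, identifier, source) key, and the "first record wins" rule for duplicates are all assumptions made for the example.

```python
import hashlib
import json


def content_hash(raw: str) -> str:
    """Hash the raw record so changed content yields a new key."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def select_unprocessed(records, processed):
    """Step 1: keep records whose (hash, identifier, source) key is new."""
    fresh = []
    for rec in records:
        key = (content_hash(rec["raw"]), rec["identifier"], rec["source"])
        if key not in processed:
            fresh.append(rec)
    return fresh


def deduplicate(records, aliases):
    """Step 2: collapse records whose identifiers map to one canonical id."""
    by_canonical = {}
    for rec in records:
        canonical = aliases.get(rec["identifier"], rec["identifier"])
        # Assumption: the first record seen for a canonical id is kept.
        by_canonical.setdefault(canonical, rec)
    return by_canonical


def harmonize(canonical_id, rec):
    """Step 3: map a source record onto a (hypothetical) common model."""
    doc = json.loads(rec["raw"])
    return {
        "identifier": canonical_id,
        "title": doc.get("title") or doc.get("name"),
        "source": rec["source"],
    }


def run(records, processed, aliases):
    """Chain the three steps for one batch of harvested records."""
    fresh = select_unprocessed(records, processed)
    unique = deduplicate(fresh, aliases)
    return [harmonize(cid, rec) for cid, rec in unique.items()]
```

In the real module the raw records are parsed from their native schemas and the processed keys and aliases live in the database; the sketch uses JSON strings and in-memory collections only to stay self-contained.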
The module uses the same database as the harvester component, as described in db-migrate.
Clone the repository
git clone https://github.com/soilwise-he/md-harmonization
cd md-harmonization
Rename .env-template to .env and update it with the PostgreSQL database connection details. Then, in a virtual environment:
pip install -r requirements.txt
python src/process.py
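As an illustration of how connection details from a .env file might be turned into a database URL, here is a small sketch. The variable names (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`) and the helper `build_postgres_url` are hypothetical; check .env-template for the names this project actually uses.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ):
    """Assemble a PostgreSQL URL from environment-style settings.

    The keys below are assumed names, not taken from the repository.
    """
    # Percent-encode credentials so characters like '@' do not break the URL.
    user = quote_plus(env.get("POSTGRES_USER", "postgres"))
    password = quote_plus(env.get("POSTGRES_PASSWORD", ""))
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

A URL of this form can be passed to `sqlalchemy.create_engine` to open a connection.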
Alternatively, run the process in a container. Set the database connection details in the src/.env file:
docker run -it --env-file src/.env ghcr.io/soilwise-he/md-harmonization:latest python src/process.py
Python module using SQLAlchemy for database (PostgreSQL) administration
Based on pygeometa to parse ISO 19139, schema.org, DataCite and DCAT
This work has been initiated as part of the Soilwise-he project. The project receives funding from the European Union’s HORIZON Innovation Actions 2022 under grant agreement No. 101112838. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.