32 changes: 32 additions & 0 deletions .github/workflows/paper_draft.yml
@@ -0,0 +1,32 @@
name: Build JOSS paper draft PDF

on:
  push:
    paths:
      - paper/**
      - .github/workflows/paper_draft.yml
  pull_request:
    paths:
      - paper/**
      - .github/workflows/paper_draft.yml
jobs:
  paper:
    runs-on: ubuntu-latest
    name: JOSS Paper Draft
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          # This should be the path to the paper within your repo.
          paper-path: paper/paper.md
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          # This is the output path where Pandoc will write the compiled
          # PDF. Note, this should be the same directory as the input
          # paper.md
          path: paper/paper.pdf
61 changes: 61 additions & 0 deletions paper/paper.bib
@@ -0,0 +1,61 @@
@article{Hoyer:2017,
title = {xarray: {N-D} labeled arrays and datasets in {Python}},
author = {Hoyer, S. and Hamman, J.},
year = 2017,
journal = {Journal of Open Research Software},
volume = 5,
number = 1,
publisher = {Ubiquity Press},
doi = {10.5334/jors.148},
url = {https://doi.org/10.5334/jors.148}
}
@software{NetCDF:2026,
author = {Unidata},
title = {Network Common Data Form (NetCDF)},
year = 2026,
publisher = {UCAR/Unidata Program Center},
doi = {10.5065/D6H70CW6},
url = {http://doi.org/10.5065/D6H70CW6}
}
@article{CFConventions:2017,
title = {A data model of the {Climate} and {Forecast} metadata conventions ({CF}-1.6) with a software implementation (cf-python v2.1)},
volume = {10},
issn = {1991-9603},
url = {https://gmd.copernicus.org/articles/10/4619/2017/},
doi = {10.5194/gmd-10-4619-2017},
number = {12},
journal = {Geosci. Model Dev.},
author = {Hassell, D. and Gregory, J. and Blower, J. and Lawrence, B. N. and Taylor, K. E.},
month = {dec},
year = {2017},
note = {Publisher: Copernicus Publications},
pages = {4619--4646},
}
@misc{ACDD:2023,
author = {{ESIP Documentation Cluster}},
title = {Attribute Conventions for Data Discovery},
date = {2023-09-05},
url = {https://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery},
urldate = {2024-01-12}
}
@software{SeaSenseLib:2026,
title = {SeaSenseLib},
author = {Sorge, Yves and Frajka-Williams, Eleanor},
month = feb,
year = 2026,
publisher = {Zenodo},
version = {v0.5.0},
doi = {10.5281/zenodo.18623283},
url = {https://doi.org/10.5281/zenodo.18623283},
}
@article{TODO:2026,
title = {TODO},
author = {TODO, Todo},
year = 2026,
journal = {TODO},
volume = 2026,
number = null,
publisher = {TODO},
doi = null,
url = null
}
98 changes: 98 additions & 0 deletions paper/paper.md
@@ -0,0 +1,98 @@
---
title: 'SeaSenseLib: An Extensible Library for Processing Oceanographic Sensor Data'
tags:
- python
- oceanography
- marine science
- sensor data
- xarray
- netcdf
- cf-conventions
- acdd
- fair
authors:
- name: Eleanor Frajka-Williams
orcid: 0000-0001-8773-7838
equal-contrib: true
affiliation: 1
- name: Yves Sorge
orcid: 0009-0007-0043-9207
equal-contrib: true
affiliation: 1
affiliations:
- name: Institute of Oceanography, University of Hamburg, Germany
index: 1
ror: 00g30e956
date: 26 April 2026
bibliography: paper.bib
---

# Summary

`SeaSenseLib` is a Python library for reading, standardizing, and exporting data from heterogeneous raw oceanographic sensor formats. It converts format-specific inputs into CF/ACDD-compliant netCDF files with canonical variable names, normalized units, and preserved raw metadata. Processing is deterministic and deliberately avoids scientific interpretation and quality control, so that downstream analysis remains reproducible and under the researcher’s control. `SeaSenseLib` provides a unified I/O layer, a configurable pipeline model for data standardization, optional plotting functions, and an extensible plugin system for adding new readers, writers, and processing components without modifying the core library.


# Statement of Need

Oceanographic research relies heavily on in situ observations from conductivity-temperature-depth (CTD) instruments, moored platforms, and other systems. These instruments often record data in manufacturer- or instrument-specific formats (e.g., Sea-Bird `.cnv`, RBR `.rsk`) with inconsistent variable names, partially standardized units, and heterogeneous metadata quality. As a result, researchers frequently maintain self-developed scripts tailored to individual datasets, which increases the maintenance burden, reduces reproducibility, and complicates sharing and long-term reuse.

General-purpose libraries such as `xarray` [@Hoyer:2017] and `netCDF4` [@NetCDF:2026] offer powerful tools for working with multidimensional data, but they do not address the conversion of heterogeneous raw sensor formats or the metadata standardization needed for interoperable Level-1 data products. In this context, “Level-1” refers to standardized, metadata-enriched sensor datasets that have not yet undergone scientific interpretation or advanced quality control.

`SeaSenseLib` fills this gap with a general-purpose, extensible solution for processing data from a range of oceanographic instruments through a single, consistent interface. The library reads multiple file formats and converts them into standardized `xarray` datasets, so that processing, preparation, and visualization are independent of the original instrument, with built-in support for the Climate and Forecast (CF) Conventions [@CFConventions:2017] and the Attribute Convention for Data Discovery (ACDD) [@ACDD:2023]. The result is an analysis-ready representation of raw observations that facilitates reproducible processing, interoperability, and long-term archiving, and supports the creation of FAIR-compliant data products. A plugin-based architecture allows the community to add new routines without modifying the library’s code.


# State of the Field

Various open-source tools already support specific components of the oceanographic data workflow, but they address only parts of the overall problem. The R package `oce` [@TODO:2026], for example, offers extensive analysis functions and readers for some common formats, but focuses on data analysis rather than on generic, cross-format harmonization of variables, units, and metadata. `OceanDataTools.jl` [@TODO:2026] provides readers and tools in Julia for accessing selected data sources, but does not provide a conceptual model for transforming heterogeneous raw data structures into a standardized, declarative data model. Both tools primarily address analysis or format-specific data access rather than the reproducible transformation and standardization of heterogeneous input data.

`stglib` [@TODO:2026] is a widely used Python toolkit with broad instrument support and established processing scripts for oceanographic time-series products, including QA/QC routines and wave-related outputs. Architecturally, `stglib` integrates raw-format conversion with substantial downstream processing in a single toolkit; its strengths lie in operational breadth and instrument-oriented processing pathways.

In addition to these general-purpose tools, there are project-specific parsers such as `ocean-data-parser` [@TODO:2026] from the Canadian Integrated Ocean Observing System (CIOOS) ecosystem. These parsers are technically valuable, but each was developed for specific formats or data sources and does not offer a generally extensible metadata layer, a configurable standard workflow, or a plugin-based extension model.

`SeaSenseLib` differs from these approaches primarily in design focus: it is engineered as a modular standardization component at the start of the processing chain that deterministically converts heterogeneous raw sensor files into metadata-harmonized Level-1 netCDF datasets (CF/ACDD-oriented), with canonical variable mapping, unit normalization, and preserved provenance. With support for multiple formats and plugin-based extensibility, `SeaSenseLib` is designed as a reusable building block that can be embedded into project-specific processing pipelines and workflow engines across diverse institutional contexts, rather than prescribing a single end-to-end processing stack.


# Design and Approach

`SeaSenseLib` follows a multi-stage processing architecture. First, instrument-specific readers convert raw data into an `xarray` data structure, using existing libraries such as `pycnv` [@TODO:2026] and `pyrsktools` [@TODO:2026] where available. The data then passes through a configurable pipeline that maps it onto a canonical data model by harmonizing variable names. To achieve this, user-defined rules, format-specific mappings, and more general fallback mappings are combined. This step is necessary because identical physical quantities are often recorded under different names in different formats, and it allows very different input data to be converted into a common internal schema.
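A minimal sketch of this name harmonization follows. It assumes that user-defined rules take precedence over format-specific mappings, which in turn take precedence over generic fallback patterns; the text above only states that the three sources are combined. The mapping tables, `canonical_name`, and `harmonize` are hypothetical and not part of the `SeaSenseLib` API; the keys `t090C`, `c0S/m`, and `prDM` are common Sea-Bird `.cnv` column names.

```python
from __future__ import annotations

import re
from typing import Mapping, Optional

# Hypothetical format-specific mappings (illustrative, not SeaSenseLib's tables).
# The keys are common Sea-Bird .cnv column names.
FORMAT_MAPPINGS: dict[str, dict[str, str]] = {
    "sbe_cnv": {"t090C": "temperature", "c0S/m": "conductivity", "prDM": "pressure"},
}

# Generic fallback patterns, tried in order when no explicit mapping applies.
FALLBACK_PATTERNS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"^temp", re.IGNORECASE), "temperature"),
    (re.compile(r"^cond", re.IGNORECASE), "conductivity"),
    (re.compile(r"^pres", re.IGNORECASE), "pressure"),
]


def canonical_name(raw: str, fmt: str, user_rules: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a raw name: user rules, then format mapping, then fallbacks."""
    if user_rules and raw in user_rules:
        return user_rules[raw]
    format_map = FORMAT_MAPPINGS.get(fmt, {})
    if raw in format_map:
        return format_map[raw]
    for pattern, name in FALLBACK_PATTERNS:
        if pattern.match(raw):
            return name
    return raw  # unknown names are kept unchanged rather than dropped


def harmonize(names: list[str], fmt: str, user_rules: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build a raw -> canonical rename map and refuse ambiguous results."""
    mapping = {n: canonical_name(n, fmt, user_rules) for n in names}
    targets = list(mapping.values())
    duplicates = sorted({t for t in targets if targets.count(t) > 1})
    if duplicates:
        raise ValueError(f"several raw variables map to {duplicates}")
    return mapping
```

Refusing ambiguous mappings, instead of keeping whichever variable happens to come last, keeps the conversion deterministic. For an `xarray.Dataset` named `ds`, the resulting dictionary could then be passed to `ds.rename(...)`.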

Instrument-specific readers extract metadata from file headers, variable attributes, and other instrument-specific information; modular pipeline components then adapt it to the CF and ACDD conventions. Derivable physical quantities are subsequently calculated when the required input parameters are available. A validation step checks structure, units, and metadata before the data is exported as standardized netCDF files. A key design goal of the pipeline is transparent and reproducible processing without embedded scientific decisions, which remain under the researcher’s control. Provenance of the entire transformation can be recorded, so that each output can be traced back to its inputs and processing steps.
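Under the simplifying assumption that a dataset is a plain dictionary rather than an `xarray.Dataset`, the derive-if-available, validation, and provenance steps might be sketched as follows. `derive`, `validate`, `add_history`, and the `DERIVATIONS` registry are hypothetical names, and the Medwin (1975) sound-speed approximation is only a stand-in chosen because it needs several inputs; which quantities `SeaSenseLib` actually derives is not stated here. The `history` attribute follows CF practice, and the four required global attributes are the ACDD 1.3 "highly recommended" set.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Tuple

# Simplified stand-in for a dataset (SeaSenseLib itself works on xarray objects):
# {"vars": {name: {"values": [...], "attrs": {...}}}, "attrs": {...}}
Dataset = Dict[str, Any]


def add_history(ds: Dataset, message: str) -> None:
    """Append a timestamped line to the CF 'history' attribute for provenance."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{stamp} {message}"
    previous = ds["attrs"].get("history")
    ds["attrs"]["history"] = f"{previous}\n{line}" if previous else line


def sound_speed_medwin(t: float, s: float, z: float) -> float:
    """Medwin (1975) sound-speed approximation in m/s, used only as an example."""
    return (1449.2 + 4.6 * t - 0.055 * t**2 + 0.00029 * t**3
            + (1.34 - 0.010 * t) * (s - 35.0) + 0.016 * z)


# Hypothetical derive registry: output -> (required inputs, function, units).
DERIVATIONS: Dict[str, Tuple[Tuple[str, ...], Callable[..., float], str]] = {
    "sound_speed": (("temperature", "salinity", "depth"), sound_speed_medwin, "m s-1"),
}

# ACDD 1.3 "highly recommended" global attributes.
REQUIRED_GLOBAL_ATTRS = ("title", "summary", "keywords", "Conventions")


def derive(ds: Dataset) -> List[str]:
    """Add each derived variable only when all of its inputs are present."""
    created = []
    for name, (inputs, func, units) in DERIVATIONS.items():
        if name in ds["vars"] or not all(i in ds["vars"] for i in inputs):
            continue  # no guessing: missing inputs simply mean no derivation
        columns = [ds["vars"][i]["values"] for i in inputs]
        values = [func(*row) for row in zip(*columns)]
        ds["vars"][name] = {"values": values, "attrs": {"units": units}}
        add_history(ds, f"derived {name} from {', '.join(inputs)}")
        created.append(name)
    return created


def validate(ds: Dataset) -> List[str]:
    """Report problems instead of silently fixing them."""
    problems = [f"missing global attribute '{a}'"
                for a in REQUIRED_GLOBAL_ATTRS if a not in ds["attrs"]]
    problems += [f"variable '{n}' has no units"
                 for n, v in ds["vars"].items() if "units" not in v["attrs"]]
    return problems
```

Returning a list of problems, rather than altering the data, matches the stated goal of leaving decisions to the researcher.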

Extensibility is provided through Python entry points. External packages can register new readers, writers, convention handlers, derivation functions, or plotters, which are discovered automatically at runtime. This design allows the library to adapt as new instruments, formats, and conventions emerge. In summary, `SeaSenseLib` covers the import of heterogeneous sensor data formats, their transformation into a standardized internal data model, convention-based metadata enrichment, and export to standardized output formats.
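The entry-point mechanism can be sketched with the standard library’s `importlib.metadata`. The group names below (`seasenselib.readers` and so on) are assumptions for illustration, not necessarily the names `SeaSenseLib` uses. A third-party package would register a plugin in its `pyproject.toml` under a table such as `[project.entry-points."seasenselib.readers"]`, for example with the line `mycsv = "my_package.readers:read_my_csv"` (all of these names are hypothetical).

```python
from __future__ import annotations

import warnings
from importlib.metadata import entry_points
from typing import Any, Callable, Dict

# Hypothetical entry-point group names; SeaSenseLib's real group names may differ.
PLUGIN_GROUPS = (
    "seasenselib.readers",
    "seasenselib.writers",
    "seasenselib.derive",
)


def discover_plugins(group: str) -> Dict[str, Callable[..., Any]]:
    """Load all objects registered under an entry-point group, keyed by name."""
    try:
        eps = entry_points(group=group)  # selection API, Python >= 3.10
    except TypeError:
        eps = entry_points().get(group, [])  # Python 3.8/3.9 returned a dict
    plugins: Dict[str, Callable[..., Any]] = {}
    for ep in eps:
        try:
            plugins[ep.name] = ep.load()
        except Exception as exc:  # a broken plugin must not break the core library
            warnings.warn(f"skipping plugin {ep.name!r} in {group!r}: {exc}")
    return plugins


def discover_all() -> Dict[str, Dict[str, Callable[..., Any]]]:
    """Run discovery for every known group at runtime."""
    return {group: discover_plugins(group) for group in PLUGIN_GROUPS}
```

Catching load errors per plugin is a design choice of this sketch: a faulty third-party package then produces a warning instead of preventing the core library from working.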

`SeaSenseLib` is intentionally designed as a modular component rather than an all-in-one processing suite. Its primary responsibility is deterministic conversion and standardization of heterogeneous raw sensor formats into harmonized Level-1 netCDF datasets. Workflow orchestration is deliberately out of scope and can be handled by specialized pipeline/workflow engines, enabling clean separation of responsibilities and easier integration into diverse research infrastructures.
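To illustrate how a configurable, deterministic conversion component can be called from an external workflow engine, the sketch below treats each processing step as a pure function on a simplified, dictionary-based dataset and selects the steps from a plain configuration list. The step names, the unit table, and `run_pipeline` are hypothetical; only the conversion factors (1 mS/cm = 0.1 S/m, 1 kPa = 0.1 dbar) and the comma-separated `Conventions` value follow established practice.

```python
from __future__ import annotations

import copy
from typing import Any, Callable, Dict, List

Dataset = Dict[str, Any]  # simplified stand-in: {"vars": {...}, "attrs": {...}}
Step = Callable[[Dataset], Dataset]

# Hypothetical canonical units and conversion factors (illustrative only).
CANONICAL_UNITS = {"conductivity": "S m-1", "pressure": "dbar"}
UNIT_FACTORS = {("mS/cm", "S m-1"): 0.1, ("kPa", "dbar"): 0.1}


def normalize_units(ds: Dataset) -> Dataset:
    """Scale variables to their canonical unit; unknown unit pairs fail loudly."""
    out = copy.deepcopy(ds)  # never mutate the caller's data
    for name, var in out["vars"].items():
        target = CANONICAL_UNITS.get(name)
        source = var["attrs"].get("units")
        if target is None or source == target:
            continue
        factor = UNIT_FACTORS[(source, target)]  # KeyError instead of a silent guess
        var["values"] = [v * factor for v in var["values"]]
        var["attrs"]["units"] = target
    return out


def add_conventions(ds: Dataset) -> Dataset:
    """Declare the metadata conventions the output is meant to follow."""
    out = copy.deepcopy(ds)
    out["attrs"].setdefault("Conventions", "CF-1.8, ACDD-1.3")
    return out


STEPS: Dict[str, Step] = {
    "normalize_units": normalize_units,
    "add_conventions": add_conventions,
}


def run_pipeline(ds: Dataset, config: List[str]) -> Dataset:
    """Apply the configured steps in order; same input and config give same output."""
    for step_name in config:
        ds = STEPS[step_name](ds)
    return ds
```

A workflow engine would only need to call `run_pipeline` with a fixed configuration, for instance one read from a YAML file, so that orchestration stays outside the conversion component.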


# Research Impact Statement

`SeaSenseLib` has been adopted in academic research workflows for processing oceanographic sensor data. For example, it has been used within the Experimental Oceanography research group at the University of Hamburg for datasets collected in the Denmark Strait as part of the EPOC project [@TODO:2026]. Users report reduced preprocessing effort and improved consistency of derived Level-1 datasets.

The combination of an extensible design, a focus on CF and ACDD conventions, and integration into the Python ecosystem makes `SeaSenseLib` well suited to data-intensive projects involving moored observations, CTD profiling, and data analysis workflows. The software thus provides a foundation for robust, reproducible, and FAIR-oriented data products suitable for both scientific analysis and long-term archiving.

Version 0.5.0 of the software, on which this article is based, has been archived and released for referencing [@SeaSenseLib:2026].


# Development History

`SeaSenseLib` originated in June 2023 as a grassroots project under the name `ctd-tools`, developed after an oceanographic field campaign in the North Sea to support visualization and standardized conversion of heterogeneous sensor data. Early versions focused on Sea-Bird and RBR formats and were iteratively extended in response to practical research needs.

In 2024, the library proved useful in a data recovery effort within the Experimental Oceanography group at the University of Hamburg, where it was used to harmonize historical mooring datasets (2006–2018), demonstrating its value for consistent long-term data processing.

In 2025, supported by dedicated funding, the project was expanded beyond CTD-focused workflows to support a broader range of sensor types, which led to its renaming as `SeaSenseLib`. A subsequent refactoring introduced a modular, plugin-based architecture, enabling extensibility across formats and research workflows.


# Acknowledgements

We acknowledge contributions from Isabelle Schmitz during the genesis of this project.
Parts of this work were supported by the European Union’s Horizon 2020 research
and innovation programme under grant agreement No. 803140 (TERIFIC – Targeted Experiment
to Reconcile Increased Freshwater with Increased Convection).


# AI Usage Disclosure

AI-assisted tools (GitHub Copilot and Claude Sonnet 4.6) were used during code refactoring to modernize the implementation without altering the existing functionality. All outputs were reviewed, validated, and corrected by the authors.


# References
