
Commit 2470dd8

doc: Add an outline for applying automatic QC flagging (#8)
1 parent 4665cee commit 2470dd8

3 files changed

Lines changed: 143 additions & 1 deletion


docs/source/index.rst

Lines changed: 1 addition & 0 deletions
.. code-block:: diff

   @@ -25,6 +25,7 @@ Contents
       Data Acquisition <methods/acquisition>
       Standardisation <methods/standardisation>
       Trim to Deployment <methods/trimming>
   +   Automatic QC <methods/auto_qc>
       Apply Calibration <methods/calibration>
       Convert to OceanSites <methods/conversion>

docs/source/methods/auto_qc.rst

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
3. Automatic QC flagging
==============================

This stage applies automatic quality control (QC) flags based on the U.S. Integrated Ocean Observing System (IOOS) Quality Assurance/Quality Control of Real-Time Oceanographic Data (QARTOD; https://ioos.noaa.gov/project/qartod/).

The outcome is a quality flag for every data sample, using the following flag values:

.. list-table:: Quality Control Flag Values and Meanings
   :header-rows: 1
   :widths: 6 22 22 40

   * - Flag
     - OceanSITES Meaning
     - IOC Meaning
     - Notes
   * - 0
     - **unknown**
     - not defined
     - Used in OceanSITES, not IOC
   * - 1
     - **good_data**
     - **Data point passed the test**
     - Passed documented required QC tests
   * - 2
     - **probably_good_data**
     - **Test was not evaluated**
     - OceanSITES: quality assumed good; IOC: no test performed or result unknown
   * - 3
     - **potentially_correctable_bad_data**
     - **Data point is interesting/unusual or suspect**
     - OceanSITES implies fixable; IOC flags as suspect (non-critical or subjective failure)
   * - 4
     - **bad_data**
     - **Data point fails the test**
     - Failed critical QC tests or flagged by data provider
   * - 7
     - **nominal_value**
     - not defined
     - Constant value, e.g., for reference or nominal settings; not used by IOC
   * - 8
     - **interpolated_value**
     - not defined
     - Estimated or gap-filled data; not used by IOC
   * - 9
     - **missing_value**
     - **Data point is missing**
     - Placeholder when data are absent

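To keep the `flag_values` and `flag_meanings` attributes consistent with the table above, both can be generated from a single mapping. This is a minimal sketch assuming Python 3.7+ (insertion-ordered dictionaries); the names `OCEANSITES_FLAGS` and `flag_attrs` are hypothetical and not part of `oceanarray`.

.. code-block:: python

   import numpy as np

   # OceanSITES flag values and meanings from the table above. Keeping them in
   # one mapping guarantees flag_values and flag_meanings share the same order.
   OCEANSITES_FLAGS = {
       0: "unknown",
       1: "good_data",
       2: "probably_good_data",
       3: "potentially_correctable_bad_data",
       4: "bad_data",
       7: "nominal_value",
       8: "interpolated_value",
       9: "missing_value",
   }


   def flag_attrs(param):
       """Return CF-style attributes for a hypothetical ``<param>_QC`` variable."""
       return {
           "long_name": f"quality flag for {param}",
           # CF expects flag_values to have the same type as the flag variable.
           "flag_values": np.array(list(OCEANSITES_FLAGS), dtype="int8"),
           # CF expects a blank-separated list of meanings in the same order.
           "flag_meanings": " ".join(OCEANSITES_FLAGS.values()),
       }


   attrs = flag_attrs("TEMP")
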
**Including QC Flags in an xarray Dataset**

To add a QC flag variable to an xarray Dataset, define a new variable (e.g., `TEMP_QC`) with the same dimensions as the data variable, and assign the appropriate attributes:

.. code-block:: python

   import numpy as np
   import xarray as xr

   # ``ds`` is an existing xarray.Dataset containing a ``TEMP`` data variable.
   # Every sample starts as 1 (good_data); the QC tests then change the flags
   # of samples that do not pass.
   ds["TEMP_QC"] = xr.DataArray(
       np.ones(ds["TEMP"].shape, dtype="int8"),
       dims=ds["TEMP"].dims,
       attrs={
           "long_name": "quality flag for TEMP",
           # flag_values should have the same type as the flag variable (int8)
           "flag_values": np.array([0, 1, 2, 3, 4, 7, 8, 9], dtype="int8"),
           # blank-separated meanings, in the same order as flag_values
           "flag_meanings": (
               "unknown good_data probably_good_data "
               "potentially_correctable_bad_data bad_data "
               "nominal_value interpolated_value missing_value"
           ),
       },
   )

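Missing samples can be flagged directly from the data. The sketch below builds a small stand-in dataset (illustrative values only), sets flag 9 (`missing_value`) wherever `TEMP` is NaN, and links the flag variable to the data through the CF `ancillary_variables` attribute. Treating NaN as the only marker of missing data is an assumption.

.. code-block:: python

   import numpy as np
   import xarray as xr

   # Small stand-in for the standardised dataset (illustrative values only).
   ds = xr.Dataset(
       {"TEMP": ("TIME", np.array([10.1, np.nan, 10.3, 10.2], dtype="float32"))},
       coords={"TIME": np.arange(4)},
   )

   # Start from 1 (good_data) everywhere, then mark NaN samples as 9 (missing_value).
   flags = np.ones(ds["TEMP"].shape, dtype="int8")
   flags[ds["TEMP"].isnull().values] = 9

   ds["TEMP_QC"] = xr.DataArray(flags, dims=ds["TEMP"].dims)
   # CF convention: point the data variable at its flag variable.
   ds["TEMP"].attrs["ancillary_variables"] = "TEMP_QC"
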
1. Overview
-----------

Raw mooring records often contain extraneous data before deployment or after recovery (e.g., deck recording, values during ascent/descent, post-recovery handling). These segments are removed in the trimming stage (:doc:`trimming`), which retains only the interval when the instrument was collecting valid in-situ measurements at its nominal depth. Besides these segments, the trimmed record can still contain individual suspect or bad samples (e.g., spikes or out-of-range values), which are flagged rather than removed at this stage. In this stage, the data are:

- Visualised to identify data issues (e.g., deployment start/end spikes only)
- Optionally low-pass filtered (e.g., 2-day Butterworth)
- Inspected manually
- Optionally adjusted:

  - Revised trimming bounds

- Prepared for further processing (e.g., gridding)

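The optional 2-day Butterworth low-pass filter mentioned above could be sketched with SciPy as follows. This is not the `oceanarray` implementation: the function name `lowpass_butterworth`, the 4th-order default and the zero-phase (forward-backward) filtering are choices made here for illustration, and the series is assumed to be evenly sampled without gaps.

.. code-block:: python

   import numpy as np
   from scipy.signal import butter, sosfiltfilt


   def lowpass_butterworth(x, dt_seconds, cutoff_days=2.0, order=4):
       """Zero-phase Butterworth low-pass filter for an evenly sampled series."""
       x = np.asarray(x, dtype=float)
       if np.isnan(x).any():
           # NaNs would spread through the whole filtered record.
           raise ValueError("fill or remove gaps before filtering")
       fs = 1.0 / dt_seconds                      # sampling frequency [Hz]
       cutoff_hz = 1.0 / (cutoff_days * 86400.0)  # cutoff frequency [Hz]
       # Second-order sections are numerically safer than (b, a) at low cutoffs.
       sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
       # Filtering forwards and backwards removes the phase shift.
       return sosfiltfilt(sos, x)


   dt = 600.0                                          # 10-minute sampling
   t = np.arange(0, 60 * 86400, dt)                    # 60 days
   temp = 10.0 + np.sin(2 * np.pi * t / (6 * 3600))    # 6-hour oscillation
   temp_lp = lowpass_butterworth(temp, dt)

Filtering forwards and backwards doubles the effective order and avoids shifting features in time, which matters when the filtered series is compared against the raw samples.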
2. Purpose
----------

- Flag data quality per sample
- Generate summary plots and statistics

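The summary statistics could start from a count of samples per flag value. A minimal sketch, assuming the flags are held in a NumPy array (e.g., the values of `TEMP_QC`); `flag_summary` is a hypothetical helper and the flag array is made up for illustration.

.. code-block:: python

   import numpy as np

   OCEANSITES_FLAG_VALUES = (0, 1, 2, 3, 4, 7, 8, 9)


   def flag_summary(flags, flag_values=OCEANSITES_FLAG_VALUES):
       """Return {flag: (count, percentage of samples)} for each flag value."""
       flags = np.asarray(flags)
       n = flags.size
       summary = {}
       for value in flag_values:
           count = int(np.count_nonzero(flags == value))
           summary[value] = (count, 100.0 * count / n if n else 0.0)
       return summary


   # Made-up flags for ten samples.
   temp_qc = np.array([1, 1, 1, 3, 4, 1, 9, 1, 1, 2], dtype="int8")
   for value, (count, percent) in flag_summary(temp_qc).items():
       print(f"flag {value}: {count} samples ({percent:.1f}%)")
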
3. Input
--------

- Standardised (and trimmed) `xarray.Dataset` containing the raw time series (`TIME`, `TEMP`, etc.)
- Configuration information for the automatic QC tests to be applied (e.g., QARTOD gross range test, spike test)

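As an illustration of the configuration input, the sketch below stores test thresholds in a plain dictionary loosely modelled on `ioos_qc`-style configurations (an assumption; the exact schema should be checked against the `ioos_qc` documentation) and applies a NumPy re-implementation of the QARTOD gross range logic: 4 (fail) outside `fail_span`, 3 (suspect) outside `suspect_span`, 9 for missing values and 1 otherwise. In practice `ioos_qc.qartod.gross_range_test` would be used; `gross_range_flags`, `TEMP_QC_CONFIG` and the threshold values are hypothetical.

.. code-block:: python

   import numpy as np

   # Hypothetical thresholds for TEMP (degrees C), illustrative values only.
   TEMP_QC_CONFIG = {
       "qartod": {
           "gross_range_test": {"suspect_span": [-2.0, 30.0], "fail_span": [-2.5, 40.0]},
           "spike_test": {"suspect_threshold": 2.0, "fail_threshold": 3.0},
       }
   }


   def gross_range_flags(values, fail_span, suspect_span=None):
       """Sketch of the QARTOD gross range test logic (not the ioos_qc code)."""
       x = np.asarray(values, dtype=float)
       flags = np.ones(x.shape, dtype="int8")         # 1: pass
       if suspect_span is not None:
           low, high = suspect_span
           flags[(x < low) | (x > high)] = 3          # 3: suspect
       low, high = fail_span
       flags[(x < low) | (x > high)] = 4              # 4: fail
       flags[np.isnan(x)] = 9                         # 9: missing
       return flags


   temp = np.array([10.0, 31.0, 45.0, np.nan, -2.2])
   temp_flags = gross_range_flags(temp, **TEMP_QC_CONFIG["qartod"]["gross_range_test"])
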
4. Output
---------

- Additional QC flag variables on the `xarray.Dataset`, named `<PARAM>_QC` (e.g., `TEMP_QC`)
- Configuration information for the automatic QC applied

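Each `<PARAM>_QC` variable holds a single flag per sample, while several tests may each return their own flags. One possible way to combine them is to keep the most severe flag per sample. The precedence used in this sketch (1 overrides 2, 3 overrides 1, 4 overrides 3, and 9 overrides everything) is an assumption made here, not a documented `oceanarray` or `ioos_qc` rule; `combine_flags` is a hypothetical helper that only accepts the QARTOD flag values 1, 2, 3, 4 and 9.

.. code-block:: python

   import numpy as np

   # Precedence from lowest to highest (an assumption made for this sketch):
   # 1 (pass) overrides 2 (not evaluated), 3 (suspect) overrides 1,
   # 4 (fail) overrides 3, and 9 (missing) overrides everything.
   PRECEDENCE = (2, 1, 3, 4, 9)


   def combine_flags(*per_test_flags):
       """Combine per-test QARTOD flag arrays into one flag per sample."""
       stacked = np.stack([np.asarray(f, dtype=np.int64) for f in per_test_flags])
       if stacked.min() < 0 or stacked.max() > 9:
           raise ValueError("flag values must be between 0 and 9")
       # Look-up table from flag value to rank; -1 marks values not in PRECEDENCE.
       rank_of = np.full(10, -1)
       rank_of[list(PRECEDENCE)] = np.arange(len(PRECEDENCE))
       ranks = rank_of[stacked]
       if (ranks < 0).any():
           raise ValueError("only QARTOD flags 1, 2, 3, 4 and 9 are supported")
       # Keep the highest-ranked flag for each sample.
       return np.asarray(PRECEDENCE, dtype="int8")[ranks.max(axis=0)]


   gross_range = [1, 3, 1, 9, 1]
   spike = [2, 1, 4, 9, 2]
   temp_qc = combine_flags(gross_range, spike)
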
5. Example
----------

.. code-block:: python

   from oceanarray.methods import auto_qc

   # Placeholder: the auto_qc function name and arguments are not yet defined.
   ds_trimmed = newname_here(ds_std, start="2021-01-05T20:00", end="2023-02-25T17:00")

.. code-block:: text

   <xarray.Dataset>
   Dimensions:      (TIME: 104576)
   Coordinates:
     * TIME         (TIME) datetime64[ns] ...
   Data variables:
       TEMPERATURE  (TIME) float32 ...
       PRESSURE     (TIME) float32 ...
   Attributes:
       start_time:  2021-01-05T20:00
       end_time:    2023-02-25T17:00
       trimmed:     True

6. Implementation Notes
-----------------------

- Relies heavily on the `ioos_qc` Python package, whose `ioos_qc.qartod` module implements the QARTOD tests (e.g., gross range and spike tests)

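`ioos_qc.qartod` also provides a spike test. The NumPy sketch below only illustrates the QARTOD spike logic, comparing each interior sample with the mean of its two neighbours. `spike_flags` is a hypothetical helper, and flagging the end points and samples next to a gap as 2 ("not evaluated" in the IOC/QARTOD sense) is a choice made here. A large spike also shifts the reference of its neighbours by about half its size, so the thresholds need some care.

.. code-block:: python

   import numpy as np


   def spike_flags(values, suspect_threshold, fail_threshold):
       """Sketch of the QARTOD spike test logic (not the ioos_qc code)."""
       x = np.asarray(values, dtype=float)
       # 2: not evaluated (end points and samples whose neighbours are missing).
       flags = np.full(x.shape, 2, dtype="int8")
       ref = 0.5 * (x[:-2] + x[2:])      # mean of the two neighbours
       spike = np.abs(x[1:-1] - ref)     # deviation of each interior sample
       inner = flags[1:-1]               # view: writes go into ``flags``
       ok = ~np.isnan(spike)
       inner[ok] = 1                                    # 1: pass
       inner[ok & (spike > suspect_threshold)] = 3      # 3: suspect
       inner[ok & (spike > fail_threshold)] = 4         # 4: fail
       flags[np.isnan(x)] = 9                           # 9: missing
       return flags


   temp = np.array([10.0, 10.1, 10.2, 14.0, 10.3, 10.2, 12.8, 10.1, 10.0])
   temp_flags = spike_flags(temp, suspect_threshold=2.0, fail_threshold=3.0)
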
7. FAIR Considerations
----------------------

- Do not change the data; only add flags.
- Retain the configuration used for the automatic flagging, i.e., which tests were run and with what thresholds.
- **Note:** Since the output uses the OceanSITES data format, OceanSITES flag meanings should be used. However, the meaning of flag 2 conflicts: OceanSITES uses it for "probably good data", whereas IOC uses it for "test was not evaluated". It might be wiser not to use flag 2 at all, and to use flag 3 only when a sample is not flag 1?

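One way to retain the configuration alongside the flags is to serialise it into an attribute of the flag variable, since netCDF attributes cannot hold nested dictionaries. In this sketch, the attribute name `qc_config` and the threshold values are hypothetical; neither is prescribed by OceanSITES or CF.

.. code-block:: python

   import json

   import numpy as np
   import xarray as xr

   # Hypothetical thresholds actually used for TEMP (illustrative values only).
   qc_config = {
       "qartod": {
           "gross_range_test": {"suspect_span": [-2.0, 30.0], "fail_span": [-2.5, 40.0]},
           "spike_test": {"suspect_threshold": 2.0, "fail_threshold": 3.0},
       }
   }

   temp_qc = xr.DataArray(np.ones(5, dtype="int8"), dims="TIME", name="TEMP_QC")

   # Serialise to a JSON string so the attribute can be written to netCDF.
   temp_qc.attrs["qc_config"] = json.dumps(qc_config, sort_keys=True)

   # The configuration can later be recovered exactly from the attribute.
   restored = json.loads(temp_qc.attrs["qc_config"])
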
See also: :doc:`calibration`

docs/source/processing_framework.rst

Lines changed: 3 additions & 1 deletion
.. code-block:: diff

   @@ -35,7 +35,8 @@ The instrument-level processing carries out 2 main steps:
    - **Stage 0:** Downloading raw instrument files (e.g., `.cnv`, `.asc`).
    - **Stage 1:** Converting data files to a consistent (internal) format.
    - **Stage 2:** Trimming the record to the deployment period (i.e., removing the launch and recovery periods) and applying clock corrections.
   -- **Stage 3:** Applying calibrations to the moored instrument and create a traceable log of the calibration process.
   +- **Stage 3:** Applying automatic QC, e.g., the QARTOD gross range and spike tests implemented in `ioos_qc`.
   +- **Stage 3.5:** Applying calibrations to the moored instrument and creating a traceable log of the calibration process.
    - **Stage 4:** Convert data to a common format for onward use with rich metadata.

    The first step is downloading data from instruments. This typically uses manufacturers' software, and some of the downloaded files may be in proprietary formats (e.g., SeaBird `.cnv` format). Formats can also change over time or depending on settings used when downloading the data, hence the need for the "standardisation" step. Stage 0 is typically performed at sea as soon as a mooring is recovered and instruments are available.
   @@ -61,6 +62,7 @@ The first step step is downloading data from instruments. This is typically usi
    - :doc:`methods/acquisition` describes downloading raw instrument files.
    - :doc:`methods/standardisation` describes how to convert the raw instrument files to an internally-consistent format (e.g., RBD or netCDF).
    - :doc:`methods/trimming` describes how to trim the data to the deployment period and apply clock corrections.
   +- :doc:`methods/auto_qc` describes the approach for automatically generating and adding quality control flags to the data parameters (geophysical parameters, e.g., `TEMP`, `CNDC`, `PRES`).
    - :doc:`methods/calibration` describes how to apply calibration corrections to the instrument data and create a traceable log of the calibration process.
    - :doc:`methods/conversion` describes how to convert the data to a common format (e.g., CF-netCDF) with rich metadata.
