Add MTG datasource #818

Open
albertocarpentieri wants to merge 3 commits into NVIDIA:main from albertocarpentieri:acarpentieri/mtg_datasource
@albertocarpentieri
Contributor

Summary

  • Adds MTG — a DataSource for EUMETSAT MTG-I FCI Level-1C Full Disk (FDHSI) calibrated effective radiance from the Meteosat Third Generation geostationary satellite
  • Downloads on demand from the EUMETSAT Data Store via eumdac, extracts only the BODY segment files needed for the requested spatial extent
  • Returns calibrated effective radiance at the channel's native resolution — no spatial resampling applied
  • Adds MTGLexicon mapping the 12 mtg_* variable names to native FCI channel identifiers
  • Supports an optional region-of-interest (roi) that limits both the downloaded segments and the spatial extent of the output array

Data source details

| Property | Value |
| --- | --- |
| Source type | DataSource |
| Remote store | EUMETSAT Data Store (collection EO:EUM:DAT:0662) |
| Format | NetCDF4 (40 BODY + 1 TRAIL segments per product zip) |
| Spatial resolution | 1 km (VIS/NIR channels, 11136×11136) and 2 km (WV/IR channels, 5568×5568) |
| Spectral coverage | 12 non-HR channels, 0.4–13.3 µm |
| Scan frequency | 10 minutes (full disk) |
| Date range | 2024-01-16 (MTG-I1 operational start) to present |
| Region | Full disk, geostationary at 0°E |
| Authentication | OAuth2 via EUMETSAT_CONSUMER_KEY + EUMETSAT_CONSUMER_SECRET |

Data licensing

License: EUMETSAT Data Policy — free and open access

Open data, freely available for commercial and non-commercial use. Registration required for API access.

Key implementation details

  • Native resolution per channel: all variables in a single __call__ must share the same native resolution (1 km or 2 km); mixed-resolution requests raise ValueError — make separate calls for VIS/NIR and WV/IR channels
  • Segment-level extraction: the product zip contains 40 BODY .nc files; only segments overlapping the requested row range (with ±1 margin) are extracted, reducing I/O significantly for ROI fetches
  • BODY file sort order: segments are sorted by chunk number (the _NNNN suffix before .nc), not by filename — filenames contain per-segment timestamps that can sort out of lexicographic order, which would cause row misalignment when open_mfdataset concatenates along y
  • Lazy grid computation: full-disk lat/lon grids (925 MB at 1 km, 230 MB at 2 km) are computed once per resolution on first use and cached on the instance
  • Download resilience: per-attempt SIGALRM timeout (1200 s) with up to 3 retries and exponential backoff for transient network failures; previous alarm state restored after each attempt to avoid interfering with external signal handlers
  • grid() static method: returns (lat, lon) for the full disk at "1km" or "2km" resolution independently of any datasource instance
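
The download-resilience behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `call_with_timeout` and `retry_with_backoff` are hypothetical names, "up to 3 retries" is read here as three attempts in total, and the base backoff delay is an arbitrary choice. It only works on POSIX systems and in the main thread.

```python
import signal
import time


def _raise_timeout(signum, frame):
    raise TimeoutError("download timed out")


def call_with_timeout(fn, timeout_s):
    """Run fn() under a SIGALRM timer (POSIX only, main thread only)."""
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    # setitimer returns the previously armed (delay, interval), if any.
    old_delay, old_interval = signal.setitimer(signal.ITIMER_REAL, timeout_s)
    try:
        return fn()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel our own timer
        # old_handler is None when the previous handler was not set from Python;
        # it cannot be reinstalled in that case.
        if old_handler is not None:
            signal.signal(signal.SIGALRM, old_handler)
        if old_delay > 0:
            # Re-arm the caller's timer. Simplification: the time spent in
            # fn() is not subtracted from the caller's remaining delay.
            signal.setitimer(signal.ITIMER_REAL, old_delay, old_interval)


def retry_with_backoff(fn, timeout_s=1200.0, max_attempts=3,
                       base_delay_s=1.0, sleep=time.sleep):
    """Call fn() with a per-attempt timeout, backing off exponentially."""
    for attempt in range(max_attempts):
        try:
            return call_with_timeout(fn, timeout_s)
        except OSError:  # covers TimeoutError and ConnectionError
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

Restoring the previous handler and timer in the `finally` block is what keeps the helper from interfering with a caller that has its own alarm installed.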

Test plan

  • Fast unit tests (no credentials): grid() shape for both resolutions, invalid resolution rejection, available() date checks, all 12 channels present in MTGLexicon, mixed-resolution ValueError
  • Slow integration tests (@pytest.mark.slow): single-channel 2 km fetch, multi-channel 2 km fetch, 1 km channel fetch, two-timestep fetch, ROI fetch (Europe, 2 km and 1 km), cache on/off

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

  • eumdac

root and others added 2 commits April 15, 2026 10:14
Signed-off-by: root <root@cpu-dm-0002.cm.cluster>
Signed-off-by: Alberto Carpentieri <acarpentieri@oci-hsg-cs-001-login-01.cm.cluster>
@greptile-apps
Contributor

greptile-apps bot commented Apr 16, 2026

Greptile Summary

Adds MTG, a DataSource for EUMETSAT MTG-I FCI Level-1C Full Disk calibrated radiance, along with MTGLexicon covering all 12 non-HR channels. The implementation is well-structured with lazy grid caching, SIGALRM-backed download retries, segment-level ROI extraction, and thorough unit/integration tests.

  • _OPERATIONAL_START date mismatch (mtg.py:193): constant is 2024-10-01 but the available() docstring and test_mtg_available both expect 2024-01-16 as the boundary — the fast unit test will fail without a fix.
  • Segment sort inconsistency (mtg.py:421): _download_product selects which zip segments to extract using a lexicographic sort of body filenames, while _read_channel sorts by chunk number. The PR description explicitly calls out that these orderings can differ, so ROI fetches can silently extract the wrong row-range segments.

Confidence Score: 3/5

Not safe to merge as-is: the existing fast unit test will fail, and ROI data fetches can silently return misaligned pixel data.

Two P1 findings: (1) _OPERATIONAL_START = 2024-10-01 makes test_mtg_available fail for the 2024-01-16 parametrize case; (2) lexicographic sort in _download_product can extract wrong segments for ROI requests, producing silently wrong data. Both are straightforward fixes but block correctness.

earth2studio/data/mtg.py — lines 193 (_OPERATIONAL_START) and 421 (body_names sort key).

Important Files Changed

| Filename | Overview |
| --- | --- |
| earth2studio/data/mtg.py | New MTG FCI data source with two P1 bugs: _OPERATIONAL_START set to 2024-10-01 breaks the available() unit test (expects 2024-01-16), and body_names lexicographic sort in _download_product disagrees with the chunk-number sort in _read_channel, silently extracting wrong segments for ROI fetches. |
| earth2studio/data/__init__.py | Adds from .mtg import MTG, placed before hrrr rather than after, breaking alphabetical module ordering (minor style issue). |
| earth2studio/lexicon/__init__.py | Adds MTGLexicon export; same alphabetical ordering issue as data/__init__.py (mtg placed before hrrr). |
| earth2studio/lexicon/mtg.py | New MTGLexicon mapping all 12 non-HR FCI channels to their NetCDF group-path names; well-structured, follows LexiconType metaclass pattern correctly. |
| earth2studio/lexicon/base.py | Adds 12 MTG vocabulary entries to E2STUDIO_VOCAB; entries are well-described and consistent with existing satellite channel patterns. |
| test/data/test_mtg.py | Good coverage of unit and integration tests; test_mtg_available will fail because the fixture datetime(2024, 1, 16) is earlier than the hard-coded _OPERATIONAL_START (2024-10-01). |

Reviews (1): Last reviewed commit: "fix tests and add retry"

Comment thread earth2studio/data/mtg.py
X_SCALE_1KM = X_SCALE_2KM / 2 # -2.794357630158035e-05
Y_SCALE_1KM = Y_SCALE_2KM / 2 # +2.794357630158035e-05

_OPERATIONAL_START = datetime(2024, 10, 1, tzinfo=timezone.utc)

P1 _OPERATIONAL_START date contradicts tests and docstring

_OPERATIONAL_START is set to 2024-10-01, but available()'s docstring says "≥ 2024-01-16" and test_mtg_available explicitly asserts available(datetime(2024, 1, 16, tzinfo=timezone.utc)) == True. Since 2024-01-16 < 2024-10-01, the implementation returns False — the unit test will fail as-is.

The PR description also lists "2024-01-16 (MTG-I1 operational start)" as the date range start. If that is the intended boundary, fix the constant:

Suggested change
- _OPERATIONAL_START = datetime(2024, 10, 1, tzinfo=timezone.utc)
+ _OPERATIONAL_START = datetime(2024, 1, 16, tzinfo=timezone.utc)
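
To make the boundary behaviour concrete, here is a minimal sketch of an inclusive availability check. The names `is_available`, `START_AS_SUBMITTED` and `START_AS_DOCUMENTED` are hypothetical and only illustrate the comparison; the real available() method in mtg.py may do more than this, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone

# The two candidate constants discussed in this comment.
START_AS_SUBMITTED = datetime(2024, 10, 1, tzinfo=timezone.utc)
START_AS_DOCUMENTED = datetime(2024, 1, 16, tzinfo=timezone.utc)


def is_available(time: datetime, operational_start: datetime) -> bool:
    """Return True when `time` is on or after the operational start."""
    if time.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        time = time.replace(tzinfo=timezone.utc)
    return time >= operational_start


probe = datetime(2024, 1, 16, tzinfo=timezone.utc)
print(is_available(probe, START_AS_SUBMITTED))   # False: the unit test would fail
print(is_available(probe, START_AS_DOCUMENTED))  # True: matches docstring and test
```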

Comment thread earth2studio/data/mtg.py
first_seg, last_seg = 0, 0
to_extract = all_names

for name in to_extract:

P1 Segment sort key inconsistency causes wrong ROI extraction

body_names is sorted lexicographically here, but the PR description explicitly notes that filenames contain per-segment timestamps that can sort out of lexicographic order. _read_channel correctly sorts the extracted files by the chunk-number suffix (rsplit("_", 1)[-1]), but this method uses the lexicographic position to decide which files to extract (body_names[first_seg : last_seg + 1]). When the two orderings differ, the wrong row-range segment files are pulled from the zip, and ROI fetches return silently misaligned data.

The fix is to use the same sort key that _read_channel already applies — sorting body_names by the integer chunk number parsed from the last underscore-delimited token in each filename (before .nc).
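
A minimal sketch of that shared sort key, assuming filenames end in `_NNNN.nc` as described in the PR summary. The helper names `chunk_number` and `select_body_segments` and the shortened filenames are hypothetical; real FCI filenames are much longer, and the actual variable handling in _download_product may differ.

```python
def chunk_number(name: str) -> int:
    """Parse the integer chunk number from the `_NNNN` suffix before `.nc`."""
    stem = name[: -len(".nc")] if name.endswith(".nc") else name
    return int(stem.rsplit("_", 1)[-1])


def select_body_segments(body_names, first_seg, last_seg):
    """Pick segments by 0-based position in chunk-number order."""
    ordered = sorted(body_names, key=chunk_number)
    return ordered[first_seg : last_seg + 1]


# Hypothetical, shortened filenames: <prefix>_<timestamp>_<chunk>.nc
names = [
    "BODY_20240116120130_0003.nc",
    "BODY_20240116120115_0001.nc",
    "BODY_20240116120110_0002.nc",
]

# A lexicographic sort follows the timestamps and puts chunk 0002 first.
print(sorted(names)[0])
# Sorting by chunk number puts chunk 0001 first.
print(select_body_segments(names, 0, 0))
```

Using one key function in both the extraction step and the read step means the slice indices refer to the same segments in both places.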

Comment on lines +29 to 30
from .mtg import MTG
from .hrrr import HRRR, HRRR_FX

P2 Import ordering: mtg should follow hrrr, not precede it

The file keeps imports ordered alphabetically by module name. hrrr sorts before mtg (h < m), so the new import is placed one line too early. The same issue appears in earth2studio/lexicon/__init__.py (line 27).

Suggested change
- from .mtg import MTG
- from .hrrr import HRRR, HRRR_FX
+ from .hrrr import HRRR, HRRR_FX
+ from .mtg import MTG

Comment thread earth2studio/data/mtg.py
Comment on lines +446 to +448
old_handler = signal.signal(
signal.SIGALRM,
lambda *_: (_ for _ in ()).throw(TimeoutError("MTG download timed out")),

P2 signal.SIGALRM is unavailable on Windows

signal.SIGALRM is a POSIX-only signal. Any attempt to use this on Windows raises AttributeError. Since Earth2Studio targets Linux/macOS HPC nodes this is likely acceptable in practice, but adding a platform guard or a note in the docstring would prevent confusing errors if the data source is ever used on Windows.
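
One possible shape for such a platform guard is sketched below. The function name `require_sigalrm` and the choice of NotImplementedError are assumptions for illustration; the data source could equally fall back to downloading without a timeout.

```python
import signal
import sys


def require_sigalrm(signal_module=signal):
    """Raise a clear error when SIGALRM-based timeouts cannot work.

    `signal_module` is injectable only so the check can be exercised in tests.
    """
    if not hasattr(signal_module, "SIGALRM"):
        raise NotImplementedError(
            f"SIGALRM-based download timeouts are unavailable on {sys.platform!r}"
        )
```

Calling this once before installing the handler turns the bare AttributeError into a message that names the cause.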

@albertocarpentieri changed the title from "Acarpentieri/mtg datasource" to "Add MTG datasource" on Apr 16, 2026