Conversation
Signed-off-by: root <root@cpu-dm-0002.cm.cluster>
Signed-off-by: Alberto Carpentieri <acarpentieri@oci-hsg-cs-001-login-01.cm.cluster>
Greptile SummaryAdds
|
| Filename | Overview |
|---|---|
| earth2studio/data/mtg.py | New MTG FCI data source — two P1 bugs: _OPERATIONAL_START set to 2024-10-01 breaks the available() unit test (expects 2024-01-16), and body_names lexicographic sort in _download_product disagrees with the chunk-number sort in _read_channel, silently extracting wrong segments for ROI fetches. |
| earth2studio/data/init.py | Adds from .mtg import MTG — placed before hrrr rather than after, breaking alphabetical module ordering (minor style issue). |
| earth2studio/lexicon/init.py | Adds MTGLexicon export — same alphabetical ordering issue as data/__init__.py (mtg placed before hrrr). |
| earth2studio/lexicon/mtg.py | New MTGLexicon mapping all 12 non-HR FCI channels to their NetCDF group-path names; well-structured, follows LexiconType metaclass pattern correctly. |
| earth2studio/lexicon/base.py | Adds 12 MTG vocabulary entries to E2STUDIO_VOCAB; entries are well-described and consistent with existing satellite channel patterns. |
| test/data/test_mtg.py | Good coverage of unit and integration tests; test_mtg_available will fail because the fixture datetime(2024, 1, 16) is earlier than the hard-coded _OPERATIONAL_START (2024-10-01). |
Reviews (1): Last reviewed commit: "fix tests and add retry" | Re-trigger Greptile
| X_SCALE_1KM = X_SCALE_2KM / 2 # -2.794357630158035e-05 | ||
| Y_SCALE_1KM = Y_SCALE_2KM / 2 # +2.794357630158035e-05 | ||
|
|
||
| _OPERATIONAL_START = datetime(2024, 10, 1, tzinfo=timezone.utc) |
There was a problem hiding this comment.
_OPERATIONAL_START date contradicts tests and docstring
_OPERATIONAL_START is set to 2024-10-01, but available()'s docstring says "≥ 2024-01-16" and test_mtg_available explicitly asserts available(datetime(2024, 1, 16, tzinfo=timezone.utc)) == True. Since 2024-01-16 < 2024-10-01, the implementation returns False — the unit test will fail as-is.
The PR description also lists "2024-01-16 (MTG-I1 operational start)" as the date range start. If that is the intended boundary, fix the constant:
| _OPERATIONAL_START = datetime(2024, 10, 1, tzinfo=timezone.utc) | |
| _OPERATIONAL_START = datetime(2024, 1, 16, tzinfo=timezone.utc) |
| first_seg, last_seg = 0, 0 | ||
| to_extract = all_names | ||
|
|
||
| for name in to_extract: |
There was a problem hiding this comment.
Segment sort key inconsistency causes wrong ROI extraction
body_names is sorted lexicographically here, but the PR description explicitly notes that filenames contain per-segment timestamps that can sort out of lexicographic order. _read_channel correctly sorts the extracted files by the chunk-number suffix (rsplit("_", 1)[-1]), but this method uses the lexicographic position to decide which files to extract (body_names[first_seg : last_seg + 1]). When the two orderings differ, the wrong row-range segment files are pulled from the zip, and ROI fetches return silently misaligned data.
The fix is to use the same sort key that _read_channel already applies — sorting body_names by the integer chunk number parsed from the last underscore-delimited token in each filename (before .nc).
| from .mtg import MTG | ||
| from .hrrr import HRRR, HRRR_FX |
There was a problem hiding this comment.
Import ordering:
mtg should follow hrrr, not precede it
The file keeps imports ordered alphabetically by module name. hrrr sorts before mtg (h < m), so the new import is placed one line too early. The same issue appears in earth2studio/lexicon/__init__.py (line 27).
| from .mtg import MTG | |
| from .hrrr import HRRR, HRRR_FX | |
| from .hrrr import HRRR, HRRR_FX | |
| from .mtg import MTG |
| old_handler = signal.signal( | ||
| signal.SIGALRM, | ||
| lambda *_: (_ for _ in ()).throw(TimeoutError("MTG download timed out")), |
There was a problem hiding this comment.
signal.SIGALRM is unavailable on Windows
signal.SIGALRM is a POSIX-only signal. Any attempt to use this on Windows raises AttributeError. Since Earth2Studio targets Linux/macOS HPC nodes this is likely acceptable in practice, but adding a platform guard or a note in the docstring would prevent confusing errors if the data source is ever used on Windows.
Summary
MTG— aDataSourcefor EUMETSAT MTG-I FCI Level-1C Full Disk (FDHSI) calibrated effective radiance from the Meteosat Third Generation geostationary satelliteeumdac, extracts only the BODY segment files needed for the requested spatial extentMTGLexiconmapping the 12mtg_*variable names to native FCI channel identifiersroi) that limits both the downloaded segments and the spatial extent of the output arrayData source details
DataSourceEO:EUM:DAT:0662)EUMETSAT_CONSUMER_KEY+EUMETSAT_CONSUMER_SECRETData licensing
License: EUMETSAT Data Policy — free and open access
Open data, freely available for commercial and non-commercial use. Registration required for API access.
Key implementation details
__call__must share the same native resolution (1 km or 2 km); mixed-resolution requests raiseValueError— make separate calls for VIS/NIR and WV/IR channels.ncfiles; only segments overlapping the requested row range (with ±1 margin) are extracted, reducing I/O significantly for ROI fetches_NNNNsuffix before.nc), not by filename — filenames contain per-segment timestamps that can sort out of lexicographic order, which would cause row misalignment whenopen_mfdatasetconcatenates alongySIGALRMtimeout (1200 s) with up to 3 retries and exponential backoff for transient network failures; previous alarm state restored after each attempt to avoid interfering with external signal handlersgrid()static method: returns(lat, lon)for the full disk at"1km"or"2km"resolution independently of any datasource instanceTest plan
grid()shape for both resolutions, invalid resolution rejection,available()date checks, all 12 channels present inMTGLexicon, mixed-resolutionValueError@pytest.mark.slow): single-channel 2 km fetch, multi-channel 2 km fetch, 1 km channel fetch, two-timestep fetch, ROI fetch (Europe, 2 km and 1 km), cache on/offChecklist
Dependencies