Summary
veda-docs states that monthly data can be ingested with filenames in the format YYYY-MM_monthly.tif or YYYYMM.tif; the important part is that the date digits can be picked up by the regex parser.
I tried to ingest the file 202409_Hurricane_Helene_finalBMHD_VNP46A3_MonthlyComposite_2024-08_monthly.tif, but the resulting start and end datetimes were for January 2024 instead of August 2024. ISSUE: YYYY-MM is not an acceptable format.
Debugging in veda-data-airflow showed that the YYYY-MM pattern is not supported, even though veda-docs says it is allowed. See the veda-data-airflow regex patterns below.
Additional investigation info from veda-data-airflow
Root cause: filename date format doesn't match any multi-component pattern
Per-item datetimes come from parsing the filename, not from the collection's temporal_extent (that only sets the collection-level extent). The parser is in regex.py:37-91 and is invoked from stac.py:84-86.
Tracing your filename
202409_Hurricane_Helene_finalBMHD_VNP46A3_MonthlyComposite_2024-08_monthly.tif
extract_dates tries 7 regex strategies in order, breaks on first match. All require a [_.-] separator immediately before the digits:
| # | Pattern | Format | Match? |
|---|---|---|---|
| 1 | (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) | ISO | no |
| 2 | (\d{8}T\d{6}) | yyyymmddThhmmss | no |
| 3 | (\d{4}_\d{2}_\d{2}) | yyyy_mm_dd | no |
| 4 | (\d{4}-\d{2}-\d{2}) | yyyy-mm-dd | no: your 2024-08 is only yyyy-mm |
| 5 | (\d{8}) | yyyymmdd | no: there is no _<8digits> anywhere |
| 6 | (\d{6}) | yyyymm | no: 202409 is at filename start with no preceding _/-/. |
| 7 | (\d{4}) | yyyy | matches _2024 inside _2024-08 |
So single_datetime = datetime(2024, 1, 1). Then regex.py:18-21 applies datetime_range: "month" to Jan 1 → start=2024-01-01, end=2024-01-31.
Your items are getting January 2024, not August 2024.
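The trace above can be reproduced with a minimal sketch. This is a reconstruction from the table, not the actual code in regex.py: the names `STRATEGIES`, `first_match`, and `month_range` are hypothetical, and the 23:59:59 end time is an assumption.

```python
import calendar
import re
from datetime import datetime

FILENAME = (
    "202409_Hurricane_Helene_finalBMHD_VNP46A3_"
    "MonthlyComposite_2024-08_monthly.tif"
)

# Assumed reconstruction of the seven strategies from the table:
# each one requires a [_.-] separator immediately before the digits.
SEP = r"[_.\-]"
STRATEGIES = [
    (SEP + r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})", "%Y-%m-%dT%H:%M:%S"),
    (SEP + r"(\d{8}T\d{6})", "%Y%m%dT%H%M%S"),
    (SEP + r"(\d{4}_\d{2}_\d{2})", "%Y_%m_%d"),
    (SEP + r"(\d{4}-\d{2}-\d{2})", "%Y-%m-%d"),
    (SEP + r"(\d{8})", "%Y%m%d"),
    (SEP + r"(\d{6})", "%Y%m"),
    (SEP + r"(\d{4})", "%Y"),
]


def first_match(filename, strategies=STRATEGIES):
    """Return (format, datetime) for the first strategy that matches."""
    for pattern, fmt in strategies:
        match = re.search(pattern, filename)
        if match:
            return fmt, datetime.strptime(match.group(1), fmt)
    return None


def month_range(dt):
    """Expand a datetime to the first and last second of its month."""
    last_day = calendar.monthrange(dt.year, dt.month)[1]
    start = dt.replace(day=1, hour=0, minute=0, second=0)
    end = dt.replace(day=last_day, hour=23, minute=59, second=59)
    return start, end


fmt, dt = first_match(FILENAME)
print(fmt, *month_range(dt))
# %Y 2024-01-01 00:00:00 2024-01-31 23:59:59
```

Only the bare-year strategy fires, and `strptime` with `%Y` fills in month 1 and day 1, which is where January comes from.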
Fixes (pick one)
- Override in the discovery config: set explicit dates in the discovery_items entry; these short-circuit filename parsing per stac.py:72-78:
    "start_datetime": "2024-08-01T00:00:00Z",
    "end_datetime": "2024-08-31T23:59:59Z"
  (Drop datetime_range if using explicit start/end.) This only works if all discovered files share the same date; it won't work if you want per-item dates derived from each file.
- Add a yyyy-mm strategy to DATE_REGEX_STRATEGIES in regex.py:43-51, e.g. (r"[_\.\-](\d{4}-\d{2})(?!-\d)", "%Y-%m"), placed above the bare-year fallback. This is the most general fix if you have many files with _yyyy-mm_ naming.
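A quick way to sanity-check the second fix before touching regex.py is to try the proposed pattern in isolation. The sketch below assumes the strategies are (pattern, strptime format) pairs tried in order; `parse_first` is a hypothetical stand-in for the real parser, and only the yyyy-mm and bare-year entries are modelled.

```python
import re
from datetime import datetime

FILENAME = (
    "202409_Hurricane_Helene_finalBMHD_VNP46A3_"
    "MonthlyComposite_2024-08_monthly.tif"
)

# Proposed strategy from this issue; the negative lookahead stops it
# from claiming the yyyy-mm prefix of a full yyyy-mm-dd date.
YYYY_MM = (r"[_\.\-](\d{4}-\d{2})(?!-\d)", "%Y-%m")
# Existing bare-year fallback, as listed in the table.
YYYY = (r"[_\.\-](\d{4})", "%Y")


def parse_first(filename, strategies):
    """Hypothetical helper: parse with the first matching strategy."""
    for pattern, fmt in strategies:
        match = re.search(pattern, filename)
        if match:
            return datetime.strptime(match.group(1), fmt)
    return None


# Current behaviour: only the bare-year fallback matches.
print(parse_first(FILENAME, [YYYY]))           # 2024-01-01 00:00:00
# With yyyy-mm placed above the fallback:
print(parse_first(FILENAME, [YYYY_MM, YYYY]))  # 2024-08-01 00:00:00
# The lookahead leaves full dates alone:
print(re.search(YYYY_MM[0], "tile_2024-08-15.tif"))  # None
```

Ordering matters here: if the new entry sits below the bare-year fallback, the fallback still wins and the items stay in January.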
Note
The leading 202409 (event YYYYMM) in the filename isn't reachable by any strategy because it sits at position 0 with no preceding separator — so even if you wanted that date, the regex couldn't grab it.
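The separator requirement is easy to confirm in isolation. In the sketch below, `STRICT` mirrors the yyyymm pattern as described in the table, while `RELAXED` is a hypothetical variant (not in regex.py) that also accepts start-of-string. It is shown only to illustrate the behaviour, not as a recommendation, since it would pick the event month (September) rather than the composite month (August) for this file.

```python
import re

FILENAME = (
    "202409_Hurricane_Helene_finalBMHD_VNP46A3_"
    "MonthlyComposite_2024-08_monthly.tif"
)

# yyyymm strategy as described: a separator must precede the digits.
STRICT = r"[_.\-](\d{6})"
# Hypothetical variant that also allows a match at position 0.
RELAXED = r"(?:^|[_.\-])(\d{6})"

print(re.search(STRICT, FILENAME))             # None
print(re.search(RELAXED, FILENAME).group(1))   # 202409
```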