Create cloud-free composite images from temporal mosaics of HLS granules using the HLS STAC geoparquet archive and lazycogs.
Because the CMR STAC API imposes rate limits on the HLS collections, this algorithm queries HLS STAC records directly from parquet files in S3 and then generates cloud-free composites by mosaicking multiple observations over time.
This DPS algorithm uses rustac to query an archive of HLS STAC records stored as STAC geoparquet, then uses lazycogs to read the HLS COG assets and build cloud-free temporal composites. The workflow:
- Queries HLS STAC items for a given bounding box and time range.
- Opens the requested spectral bands and the Fmask layer separately with lazycogs.
- Masks out cloudy pixels using HLS Fmask bits 1, 2, and 3.
- Computes the median value for each pixel across the time series.
- Exports the result as Cloud Optimized GeoTIFFs with STAC metadata.
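The masking and median steps can be sketched per pixel in plain Python. This is not the algorithm's implementation, which works on lazycogs arrays. It is a minimal, dependency-free illustration that assumes the HLS v2.0 Fmask bit layout, in which bit 1 flags cloud, bit 2 flags pixels adjacent to cloud or shadow, and bit 3 flags cloud shadow. The function names and the flattened list-of-lists layout are hypothetical.

```python
import statistics
from typing import Optional, Sequence

# Assumed HLS v2.0 Fmask layout: bit 1 = cloud, bit 2 = adjacent to
# cloud/shadow, bit 3 = cloud shadow. Any of these marks an observation unusable.
CLOUD_MASK_BITS = (1 << 1) | (1 << 2) | (1 << 3)  # 0b1110 == 14


def is_clear(fmask_value: int) -> bool:
    """Return True when none of Fmask bits 1, 2, or 3 are set."""
    return (fmask_value & CLOUD_MASK_BITS) == 0


def median_composite(
    reflectance: Sequence[Sequence[float]],
    fmask: Sequence[Sequence[int]],
    nodata: Optional[float] = None,
) -> list:
    """Per-pixel median over time, ignoring observations flagged by Fmask.

    reflectance[t][i] and fmask[t][i] hold pixel i at time step t
    (hypothetical flattened layout used only for illustration).
    """
    n_times = len(reflectance)
    n_pixels = len(reflectance[0])
    composite = []
    for i in range(n_pixels):
        clear_values = [
            reflectance[t][i] for t in range(n_times) if is_clear(fmask[t][i])
        ]
        # A pixel that is cloudy on every date has no valid observation.
        composite.append(statistics.median(clear_values) if clear_values else nodata)
    return composite


if __name__ == "__main__":
    # Three dates, four pixels.
    reflectance = [
        [100, 200, 300, 400],
        [110, 9999, 310, 410],
        [120, 220, 5000, 420],
    ]
    fmask = [
        [0, 0, 0, 2],
        [0, 2, 64, 4],  # 2 = cloud; 64 sets only bit 6 (aerosol level), so still clear
        [0, 0, 8, 8],   # 8 = cloud shadow
    ]
    print(median_composite(reflectance, fmask))
```

Running it prints `[110, 210.0, 305.0, None]`. Dropping flagged observations before taking the median, rather than filling them with zeros, keeps cloudy dates from biasing the composite, and a pixel with no clear date falls back to `nodata`.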
The export path writes the affine transform from lazycogs metadata explicitly before calling rioxarray/GDAL so the output COGs stay on the intended rectilinear grid.
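As an illustration of what an explicit north-up affine transform contains, the sketch below builds one for the example bounding box used elsewhere in this README. It assumes a 30 m pixel size (the native HLS resolution) and a grid anchored at the bounding box's upper-left corner; in the real workflow these values come from lazycogs metadata, and all helper names here are hypothetical. It also converts to GDAL's geotransform ordering, which differs from the affine/rasterio ordering.

```python
import math
from typing import NamedTuple, Tuple

Transform = Tuple[float, float, float, float, float, float]


class Grid(NamedTuple):
    transform: Transform  # affine order: (a, b, c, d, e, f)
    width: int
    height: int


def north_up_grid(xmin, ymin, xmax, ymax, resolution=30.0) -> Grid:
    """Build a rectilinear, north-up grid covering the bbox (hypothetical helper).

    The 30 m default is an assumption, not a value read from lazycogs.
    """
    width = math.ceil((xmax - xmin) / resolution)
    height = math.ceil((ymax - ymin) / resolution)
    # a = pixel width, b = 0 (no rotation), c = x of the upper-left corner,
    # d = 0 (no rotation), e = -pixel height (rows run south), f = y of the upper-left corner.
    transform = (resolution, 0.0, xmin, 0.0, -resolution, ymax)
    return Grid(transform, width, height)


def pixel_center(transform: Transform, row: int, col: int) -> Tuple[float, float]:
    """Map a (row, col) index to the projected coordinates of the pixel center."""
    a, b, c, d, e, f = transform
    x = a * (col + 0.5) + b * (row + 0.5) + c
    y = d * (col + 0.5) + e * (row + 0.5) + f
    return x, y


def to_gdal_geotransform(transform: Transform) -> Transform:
    """Reorder affine (a, b, c, d, e, f) into GDAL's (c, a, b, f, d, e)."""
    a, b, c, d, e, f = transform
    return (c, a, b, f, d, e)


if __name__ == "__main__":
    grid = north_up_grid(500000, 5000000, 600000, 5100000)
    print(grid.width, grid.height)
    print(to_gdal_geotransform(grid.transform))
    print(pixel_center(grid.transform, 0, 0))
```

Because 100 km is not a multiple of 30 m, rounding up makes this grid 20 m wider and taller than the bbox. Whatever rule lazycogs actually applies, writing the transform explicitly fixes these numbers instead of leaving them to be re-derived from coordinate arrays during export.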
Warning: This archive of HLS STAC records is experimental and is a few days behind the latest records in CMR. See the hls-stac-geoparquet-archive repo for details.
Regardless of which asset access mode is used, the parquet query reads the MAAP-hosted geoparquet archive through DuckDB's AWS credential chain.
There are two HLS asset access modes. Direct S3 is the production path; HTTPS exists only for local testing:
- Production DPS and OGC Application Package runs use direct S3 mode (direct_bucket_access=True). run.sh enables this by default and reads LP DAAC assets from s3://lp-prod-protected/... through an authenticated S3Store.
- Local smoke tests use HTTPS mode (direct_bucket_access=False). This reads LP DAAC asset URLs over HTTPS through an authenticated HTTPStore and avoids the us-west-2 direct-S3 constraint.
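The two modes differ mainly in which URL form gets opened. A minimal sketch of switching an asset href between them follows. It assumes LP DAAC's HTTPS distribution prefix is https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ and that an HTTPS href and its s3://lp-prod-protected/ counterpart share the same object key. The helper names and the example key are hypothetical, and the repo may resolve hrefs differently.

```python
from typing import Tuple

# Assumed URL prefixes that point at the same LP DAAC protected objects.
LPDAAC_HTTPS_PREFIX = "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/"
LPDAAC_S3_PREFIX = "s3://lp-prod-protected/"


def resolve_asset_href(href: str, direct_bucket_access: bool) -> str:
    """Return the href form that matches the selected access mode (hypothetical)."""
    if direct_bucket_access and href.startswith(LPDAAC_HTTPS_PREFIX):
        return LPDAAC_S3_PREFIX + href[len(LPDAAC_HTTPS_PREFIX):]
    if not direct_bucket_access and href.startswith(LPDAAC_S3_PREFIX):
        return LPDAAC_HTTPS_PREFIX + href[len(LPDAAC_S3_PREFIX):]
    # Already in the requested form, or not an LP DAAC protected asset.
    return href


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key), e.g. for a bucket-scoped store."""
    if not url.startswith("s3://"):
        raise ValueError(f"not an s3:// URL: {url}")
    bucket, _, key = url[len("s3://"):].partition("/")
    return bucket, key


if __name__ == "__main__":
    # Hypothetical object key, used only to show the round trip.
    https_href = LPDAAC_HTTPS_PREFIX + "HLSL30.020/EXAMPLE/EXAMPLE.B04.tif"
    s3_href = resolve_asset_href(https_href, direct_bucket_access=True)
    print(s3_href)
    print(split_s3_url(s3_href))
```

Under this assumption only the URL prefix and the store type change between modes, so the same STAC records can drive both the DPS path and the local HTTPS path.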
Do not add direct_bucket_access as a MAAP DPS application input. The deployed package should always use the run.sh default. The only supported reason to set DIRECT_BUCKET_ACCESS=false is local testing, such as smoketest.sh.
Direct S3 mode uses obstore.auth.earthdata.NasaEarthdataCredentialProvider to fetch short-lived LP DAAC S3 credentials. This mode is intended for DPS compute running in us-west-2.
Set these environment variables before running the local HTTPS smoke test:
export EARTHDATA_USERNAME="your-earthdata-username"
export EARTHDATA_PASSWORD="your-earthdata-password"
run.sh is the MAAP DPS entry point. It keeps the four positional inputs from algorithm-config.yml for backwards compatibility with current MAAP DPS and enables direct S3 bucket access by default:
./run.sh "2025-05-01T00:00:00Z" "2025-05-31T23:59:59Z" "500000 5000000 600000 5100000" "EPSG:32615"
run.sh also accepts named arguments, both for direct development use and for the in-development OGC Application Package invocation style:
./run.sh \
--start_datetime "2025-05-01T00:00:00Z" \
--end_datetime "2025-05-31T23:59:59Z" \
--bbox 500000 5000000 600000 5100000 \
--crs "EPSG:32615"
For local-only testing, disable direct bucket access with DIRECT_BUCKET_ACCESS=false:
DIRECT_BUCKET_ACCESS=false ./run.sh "2025-05-01T00:00:00Z" "2025-05-31T23:59:59Z" "500000 5000000 600000 5100000" "EPSG:32615"
Direct Python invocation is useful for development. Unlike run.sh, main.py defaults to HTTPS mode unless --direct_bucket_access is passed.
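run.sh is a shell script; the Python sketch below only mirrors the dispatch behavior described above. Four positional inputs (with the bbox as one space-separated string) are expanded into main.py's named arguments, named arguments pass through unchanged, and --direct_bucket_access is appended unless DIRECT_BUCKET_ACCESS=false. The function name is hypothetical, and the sketch omits the output directory and any other flags the real script may set.

```python
import os
from typing import List, Mapping, Optional, Sequence


def build_main_args(
    argv: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Translate run.sh-style inputs into main.py arguments (hypothetical sketch)."""
    env = os.environ if env is None else env
    if argv and not argv[0].startswith("--"):
        # Positional form from algorithm-config.yml: start, end, "xmin ymin xmax ymax", crs.
        if len(argv) != 4:
            raise ValueError("expected 4 positional inputs: start end bbox crs")
        start, end, bbox, crs = argv
        bbox_values = bbox.split()
        if len(bbox_values) != 4:
            raise ValueError("bbox must contain 4 space-separated numbers")
        args = ["--start_datetime", start, "--end_datetime", end,
                "--bbox", *bbox_values, "--crs", crs]
    else:
        # Named form: pass through unchanged.
        args = list(argv)
    # Direct S3 access is the default; only an explicit "false" turns it off.
    if env.get("DIRECT_BUCKET_ACCESS", "true").lower() != "false":
        args.append("--direct_bucket_access")
    return args


if __name__ == "__main__":
    positional = ["2025-05-01T00:00:00Z", "2025-05-31T23:59:59Z",
                  "500000 5000000 600000 5100000", "EPSG:32615"]
    print(build_main_args(positional, env={}))
    print(build_main_args(positional, env={"DIRECT_BUCKET_ACCESS": "false"}))
```

Treating any value other than "false" as enabled keeps direct S3 as the safe default for deployed runs, which matches the rule that only local testing should turn it off.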
HTTPS-backed local path:
uv run main.py \
--start_datetime "2025-05-01T00:00:00Z" \
--end_datetime "2025-05-31T23:59:59Z" \
--bbox 500000 5000000 600000 5100000 \
--crs "EPSG:32615" \
--output_dir "/tmp/output"
Direct S3 path matching DPS behavior:
uv run main.py \
--start_datetime "2025-05-01T00:00:00Z" \
--end_datetime "2025-05-31T23:59:59Z" \
--bbox 500000 5000000 600000 5100000 \
--crs "EPSG:32615" \
--output_dir "/tmp/output" \
--direct_bucket_access
The repo includes a bounded smoke test that exercises the DPS wrapper through the HTTPS path:
./smoketest.sh
That script is equivalent to running run.sh with DIRECT_BUCKET_ACCESS=false. This is the intended place to set direct bucket access to false.
main.py accepts the following arguments:
- --start_datetime: Start datetime in ISO 8601 format (for example, 2025-05-01T00:00:00Z).
- --end_datetime: End datetime in ISO 8601 format.
- --bbox: Bounding box coordinates (xmin ymin xmax ymax).
- --crs: CRS definition for the bounding box coordinates. Must use meter units.
- --output_dir: Directory where output files will be saved.
- --direct_bucket_access: Developer-level main.py flag for direct S3 reads. run.sh passes this automatically for DPS; local smoke tests omit it via DIRECT_BUCKET_ACCESS=false.
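A hypothetical argparse definition consistent with the argument list above follows; it is not main.py's actual parser. It assumes --output_dir defaults to output (a guess based on the output/ paths listed for a successful run), accepts ISO 8601 strings ending in Z, and adds basic ordering checks that this README does not document.

```python
import argparse
from datetime import datetime
from typing import List, Optional


def iso_datetime(value: str) -> datetime:
    """Parse ISO 8601; datetime.fromisoformat accepts a trailing 'Z' only on Python 3.11+."""
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"not an ISO 8601 datetime: {value!r}") from exc


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="HLS cloud-free median composite (sketch)")
    parser.add_argument("--start_datetime", required=True, type=iso_datetime)
    parser.add_argument("--end_datetime", required=True, type=iso_datetime)
    parser.add_argument("--bbox", required=True, nargs=4, type=float,
                        metavar=("XMIN", "YMIN", "XMAX", "YMAX"))
    parser.add_argument("--crs", required=True,
                        help="CRS of --bbox; must use meter units, e.g. EPSG:32615")
    # Assumed default; the real main.py may differ.
    parser.add_argument("--output_dir", default="output")
    # Off by default: main.py uses HTTPS unless this flag is passed.
    parser.add_argument("--direct_bucket_access", action="store_true")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    xmin, ymin, xmax, ymax = args.bbox
    if xmin >= xmax or ymin >= ymax:
        parser.error("--bbox must be xmin ymin xmax ymax with xmin < xmax and ymin < ymax")
    if args.start_datetime > args.end_datetime:
        parser.error("--start_datetime must not be later than --end_datetime")
    return args


if __name__ == "__main__":
    example = parse_args([
        "--start_datetime", "2025-05-01T00:00:00Z",
        "--end_datetime", "2025-05-31T23:59:59Z",
        "--bbox", "500000", "5000000", "600000", "5100000",
        "--crs", "EPSG:32615",
    ])
    print(example.bbox, example.output_dir, example.direct_bucket_access)
```

Using action="store_true" is what gives --direct_bucket_access the HTTPS-by-default behavior described for main.py.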
A successful run writes:
- output/<band>.tif for each requested band
- output/item.json
The default output bands are red, green, blue, nir_narrow, swir_1, and swir_2.
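A quick post-run check can confirm that these files exist. The sketch below is a hypothetical helper, not part of the repo. It assumes the default band list and the output/<band>.tif naming above, and checks only that item.json parses and has "type": "Feature", as any STAC Item must; it does not assume anything about the item's asset keys.

```python
import json
from pathlib import Path
from typing import Iterable, List

# Default bands as listed in this README.
DEFAULT_BANDS = ("red", "green", "blue", "nir_narrow", "swir_1", "swir_2")


def missing_outputs(output_dir, bands: Iterable[str] = DEFAULT_BANDS) -> List[str]:
    """List expected output files that are absent from output_dir."""
    out = Path(output_dir)
    expected = [out / f"{band}.tif" for band in bands] + [out / "item.json"]
    return [path.name for path in expected if not path.is_file()]


def load_item(output_dir) -> dict:
    """Load item.json and check that it at least looks like a STAC Item."""
    item = json.loads((Path(output_dir) / "item.json").read_text())
    if item.get("type") != "Feature":
        raise ValueError("item.json is not a STAC Item (expected type 'Feature')")
    return item


if __name__ == "__main__":
    problems = missing_outputs("output")
    print("missing:", problems or "none")
```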