…cal_source is Path or str
…add opt to wipe all previous attrs/encoding; create doc string for set_var_attrs; add some validation to set_var_attrs; add flag_values, flag_meanings, extra_attrs params
…for Raster CAVM classification data
…provided or find it in attrs/encoding
…nt to map; mapping CAVM is a pain without it
I really need to update the dataset validator... It's pretty useless. The failed CI check here doesn't matter.
Pull request overview
Adds a new gridded land-cover categorical dataset (CAVM-2-0 landCoverCat) to the ILAMB data registry and introduces utility enhancements to better support categorical/flag metadata and compressed NetCDF output.
Changes:
- Register a new NetCDF artifact for CAVM-2-0 landCoverCat in `registry/data.txt`.
- Add a new conversion script to download, reproject, and write the landCoverCat product with CF flag metadata (plus extra palette/description attrs).
- Extend `ilamb3_data` utilities: HTML downloads now optionally unzip, coordinates/variables support compression encoding, and `set_var_attrs` adds validation + flag handling.
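The optional unzip step for HTML downloads could look roughly like the sketch below. This is not the `ilamb3_data` implementation: the helper name `extract_if_zip` and its return convention are hypothetical, and it only covers the post-download step, using the standard-library `zipfile` module.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def extract_if_zip(path: str | Path, out_dir: str | Path | None = None) -> list[Path]:
    """Extract ``path`` if it is a zip archive; otherwise return it unchanged.

    Hypothetical helper: returns the extracted file paths, or ``[path]`` when the
    downloaded file is not a zip archive.
    """
    path = Path(path)
    if not zipfile.is_zipfile(path):
        return [path]
    out_dir = Path(out_dir) if out_dir is not None else path.parent
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(path) as zf:
        zf.extractall(out_dir)  # extract every member under out_dir
        # Skip directory entries so only files are returned.
        members = [name for name in zf.namelist() if not name.endswith("/")]
    return [out_dir / name for name in members]
```

Checking the file's content with `zipfile.is_zipfile`, rather than trusting an extension, is a deliberate choice here: a URL such as the Mendeley `file_downloaded` endpoint used for this dataset gives no hint of the file type.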
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| registry/data.txt | Registers the new CAVM-2-0 landCoverCat NetCDF file and checksum. |
| `ilamb3_data/__init__.py` | Enhances download and attribute-setting utilities (zip extraction, compression, flag metadata, validations). |
| data/CAVM-2-0/convert.py | New pipeline to fetch/extract CAVM, reproject to lat/lon, set CF/ODS attrs, and write the output NetCDF (with plotting verification). |
Comments suppressed due to low confidence (1)
`ilamb3_data/__init__.py:736`
`_FILL_VALUES` attempts to support Unicode strings with `np.dtype('U')`, but `np.dtype('U')` (length 0) won't match typical dtypes like `'<U32'`, and the kind-based fallback doesn't handle `final_dt.kind == 'U'` (or object strings). This can lead to `ValueError: No CF _FillValue defined...` for valid string variables. Consider adding a `final_dt.kind == 'U'` fallback (and/or handling object/variable-length strings explicitly).
```python
import numpy as np

# default CF _FillValue options (see https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html#classic_format_spec)
_FILL_VALUES = {
    np.dtype("S1"): np.bytes_(b"\x00"),  # char (fixed-length)
    np.dtype("U"): "",  # string (variable-length)
    np.int8: np.int8(-127),  # byte
    np.int16: np.int16(-32767),  # short
    np.int32: np.int32(2147483647),  # int
    np.float32: np.float32(1.0e20),  # float
    np.float64: np.float64(1.0e20),  # double
}
```
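One way to act on this suggestion is to look fill values up by `dtype.kind` (and `itemsize` for numbers) instead of by exact dtype, so `'<U3'` and `'<U50'` resolve to the same entry as any other Unicode string. A minimal sketch, assuming NumPy: the function name `fill_value_for` is hypothetical, the numeric values are copied from the snippet above, extending the `'S1'` entry to byte strings of any width is a choice rather than what the snippet does, and treating every object (`'O'`) dtype as a variable-length string is an assumption.

```python
import numpy as np

# Numeric fill values keyed by (dtype.kind, dtype.itemsize); the numbers are the
# ones listed in the _FILL_VALUES snippet in this comment.
_NUMERIC_FILLS = {
    ("i", 1): np.int8(-127),  # byte
    ("i", 2): np.int16(-32767),  # short
    ("i", 4): np.int32(2147483647),  # int
    ("f", 4): np.float32(1.0e20),  # float
    ("f", 8): np.float64(1.0e20),  # double
}


def fill_value_for(dtype):
    """Return a _FillValue for ``dtype`` (hypothetical helper, not the PR's code)."""
    dt = np.dtype(dtype)
    if dt.kind == "S":
        # Fixed-length byte strings of any width ('S1', 'S8', ...).
        return np.bytes_(b"\x00")
    if dt.kind in ("U", "O"):
        # Unicode of any length ('<U3', '<U50'); object dtype is *assumed* to hold
        # variable-length strings.
        return ""
    try:
        return _NUMERIC_FILLS[(dt.kind, dt.itemsize)]
    except KeyError:
        raise ValueError(f"No CF _FillValue defined for dtype {dt}") from None
```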
…tained plot (don't use df, use what is stored in netcdf)
I don't have strong feelings about this, but I wonder why we don't do something more like this:
```
Dimensions:            (lat: 4271, lon: 30556, bnds: 2, flag_values: 20)
Coordinates:
  * lat                (lat) float64 34kB 89.99 89.98 89.97 ... 39.71 39.7 39.69
  * lon                (lon) float64 244kB -180.0 -180.0 -180.0 ... 180.0 180.0
  * flag_values        (flag_values) int8 20B 1 2 3 4 5 21 ... 42 43 91 92 93 99
    flag_meanings      (flag_values) <U3 240B 'B1' 'B2a' 'B3' ... 'SW' 'GL' 'NA'
    flag_descriptions  (flag_values) <U50 4kB 'cryptogam_herb_barren' ... 'no...
Dimensions without coordinates: bnds
Data variables:
    landCoverCat       (lat, lon) float32 522MB 92.0 92.0 92.0 ... nan nan nan
    lat_bnds           (lat, bnds) float64 68kB 89.99 90.0 89.98 ... 39.68 39.69
    lon_bnds           (lon, bnds) float64 489kB -180.0 -180.0 ... 180.0 180.0
Attributes: (12/38)
    Conventions:       CF-1.12 ODS-2.6
    activity_id:       ILAMB
    ...
```
Your way may be closest to the standard, and if you are more comfortable keeping it that way, that is fine with me. But it irks me that I cannot use the flag meanings/descriptions without post-processing them: I have to read each attribute and split it on spaces before it is useful at all. That seems silly when we could just store the arrays. We can chat about this.
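For reference, the post-processing described here amounts to a few lines of plain Python. A minimal sketch, assuming the attributes have already been read into Python objects (for example from a variable's `.attrs`): the function name `decode_flags` and its keyword arguments are hypothetical, and the sample values are the first three CAVM classes from the landCoverCat header posted in this thread.

```python
def decode_flags(flag_values, flag_meanings, **per_flag):
    """Turn CF-style blank-separated flag attributes into one record per flag value.

    ``per_flag`` holds any extra attributes laid out like flag_meanings
    (e.g. flag_descriptions, flag_colors); their keyword names are arbitrary.
    """
    columns = {"meaning": flag_meanings.split()}
    columns.update({name: text.split() for name, text in per_flag.items()})
    for name, items in columns.items():
        if len(items) != len(flag_values):
            raise ValueError(
                f"{name}: {len(items)} entries for {len(flag_values)} flag values"
            )
    return {
        int(value): {name: items[i] for name, items in columns.items()}
        for i, value in enumerate(flag_values)
    }


# First three CAVM classes from the landCoverCat header in this PR.
flags = decode_flags(
    [1, 2, 3],
    "B1 B2a B3",
    description="cryptogam_herb_barren cryptogam_barren_complex non-carbonate_mountain_complex",
    color="#d7d7b3 #a8a802 #a68282",
)
print(flags[2])
# {'meaning': 'B2a', 'description': 'cryptogam_barren_complex', 'color': '#a8a802'}
```

The split only works because every entry is a single blank-free token, which is why the descriptions in this file use underscores instead of spaces.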
Decided to stick with the conventions as closely as possible.
This is a gridded land cover classification product, which is new to ILAMB, so how we format it may change over time.
The landCoverCat data are stored as integers (NetCDF `byte`), and landCoverCat carries the attributes `flag_values` and `flag_meanings`, which are the CF-standard way to map numeric values to strings. However, we add our own additional attribute, `flag_descriptions`, because CAVM has an integer, a string code, and a string description for each landCoverCat class.

I went a little crazy and also added `flag_colors`, which are hex color codes (stored as strings) to use for mapping. Mapping CAVM without its color palette is a nightmare, so it is better to have it handy.

```
netcdf ILAMB_UAF_CAVM-2-0_fx_landCoverCat_gr_v20260331 {
dimensions:
        lon = 30556 ;
        lat = 4271 ;
        bnds = 2 ;
variables:
        double lon(lon) ;
                lon:axis = "X" ;
                lon:units = "degrees_east" ;
                lon:standard_name = "longitude" ;
                lon:long_name = "Longitude" ;
                lon:bounds = "lon_bnds" ;
        double lat(lat) ;
                lat:axis = "Y" ;
                lat:units = "degrees_north" ;
                lat:standard_name = "latitude" ;
                lat:long_name = "Latitude" ;
                lat:bounds = "lat_bnds" ;
        byte landCoverCat(lat, lon) ;
                landCoverCat:_FillValue = -127b ;
                landCoverCat:units = "" ;
                landCoverCat:standard_name = "cover_category" ;
                landCoverCat:long_name = "Vegetation or Land-Cover Category" ;
                landCoverCat:flag_values = 1b, 2b, 3b, 4b, 5b, 21b, 22b, 23b, 24b, 31b, 32b, 33b, 34b, 41b, 42b, 43b, 91b, 92b, 93b, 99b ;
                landCoverCat:flag_meanings = "B1 B2a B3 B4 B2b G1 G2 G3 G4 P1 P2 S1 S2 W1 W2 W3 FW SW GL NA" ;
                landCoverCat:flag_descriptions = "cryptogam_herb_barren cryptogam_barren_complex non-carbonate_mountain_complex carbonate_mountain_complex cryptogam_barren_dwarf-shrub_complex graminoid_forb_cryptogam_tundra graminoid_prostrate_dwarf-shrub_forb_moss_tundra non-tussock_sedge_dwarf-shrub_moss_tundra tussock-sedge_dwarf-shrub_moss_tundra prostrate_dwarf-shrub_herb_lichen_tundra prostrate-hemi-prostrate_dwarf-shrub_lichen_tundra erect_dwarf-shrub_moss_tundra low-shrub_moss_tundra sedge-grass_moss_wetland_complex sedge_moss_dwarf-shrub_wetland_complex sedge_moss_low-shrub_wetland_complex fresh_water salt_water glacier non-arctic" ;
                landCoverCat:flag_colors = "#d7d7b3 #a8a802 #a68282 #8282a0 #cdcd66 #ffebaf #ffd37f #e6e600 #ffff00 #dfb0b0 #db949e #97e602 #38a802 #9eedbd #73ffdf #04e6a9 #0070ff #e0f2ff #ffffff #cccccc" ;
        double lat_bnds(lat, bnds) ;
        double lon_bnds(lon, bnds) ;

// global attributes:
                :Conventions = "CF-1.12 ODS-2.6" ;
                :activity_id = "ILAMB" ;
                :aux_uncertainty_id = "N/A" ;
                :contact = "Martha Raynolds (mkraynolds@alaska.edu)" ;
                :creation_date = "20260331" ;
                :data_specs_version = "2.6" ;
                :dataset_contributor = "Morgan Steckler" ;
                :doi = "https://doi.org/10.17632/c4xj5rv6kv.2" ;
                :frequency = "fx" ;
                :grid = "1x1 km Lambert Azimuthal Equal Area reprojected to latitude x longitude" ;
                :grid_label = "gr" ;
                :has_aux_unc = "FALSE" ;
                :history = "\n20260331: \"CMORized\" data from Raster CAVM 2.0 (downloaded from Mendeley Data)\n\n20260331: Reprojected to lat/lon and added metadata using ILAMB utilities\n" ;
                :institution = "University of Alaska, Fairbanks, USA" ;
                :institution_id = "UAF" ;
                :license = "https://creativecommons.org/licenses/by-nc/3.0/deed.en" ;
                :nominal_resolution = "1 km" ;
                :processing_code_location = "https://github.com/rubisco-sfa/ilamb3-data/tree/main/data/CAVM-2-0/convert.py" ;
                :product = "derived" ;
                :realm = "land" ;
                :references = "Raynolds, M.K., et al. (2019). A raster version of the Circumpolar Arctic Vegetation Map (CAVM). Remote Sensing of Environment, 232, 111297. https://doi.org/10.1016/j.rse.2019.111297" ;
                :region = "panarctic" ;
                :site_id = "N/A" ;
                :site_location = "N/A" ;
                :source = "Unsupervised classifications of seventeen geographic/floristic sub-sections of the Arctic, using AVHRR and MODIS data (reflectance and NDVI) and elevation data" ;
                :source_data_retrieval_date = "2026-03-30T12:06:45Z" ;
                :source_data_url = "https://data.mendeley.com/public-files/datasets/c4xj5rv6kv/files/5223c414-234a-498c-ae08-3100cb38510f/file_downloaded" ;
                :source_id = "CAVM-2-0" ;
                :source_label = "CAVM" ;
                :source_type = "satellite_retrieval" ;
                :source_version_number = "2.0" ;
                :table_id = "N/A" ;
                :title = "Raster Circumpolar Arctic Vegetation Map" ;
                :tracking_id = "hdl:21.14102/60f0d31b-3ce2-4d24-875a-e89614a04723" ;
                :variable_id = "landCoverCat" ;
                :variant_info = "CMORized product prepared by ILAMB" ;
                :variant_label = "ILAMB" ;
                :version = "v20260331" ;
}
```
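The flag validation that `set_var_attrs` now performs is not shown in this thread. The sketch below illustrates the kind of consistency checks that would catch a malformed header like the one above; it assumes plain Python inputs, `check_flag_attrs` is a hypothetical name, and the specific rules (unique values, values within the signed `byte` range, no collision with `_FillValue`, one token per value in every per-flag attribute, `#rrggbb` colors) are assumptions drawn from this file, not the utility's actual behavior.

```python
import re

# Assumed color format, based on the flag_colors entries in this file.
_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def check_flag_attrs(
    attrs,
    per_flag=("flag_meanings", "flag_descriptions", "flag_colors"),
    value_range=(-128, 127),  # signed byte, matching the landCoverCat storage type
):
    """Return a list of problems found in CF-style flag attributes (empty if none)."""
    values = [int(v) for v in attrs.get("flag_values", [])]
    if not values:
        return ["flag_values is missing or empty"]
    problems = []
    if len(set(values)) != len(values):
        problems.append("flag_values contains duplicates")
    lo, hi = value_range
    out_of_range = [v for v in values if not lo <= v <= hi]
    if out_of_range:
        problems.append(f"flag_values outside [{lo}, {hi}]: {out_of_range}")
    fill = attrs.get("_FillValue")
    if fill is not None and int(fill) in values:
        problems.append(f"_FillValue {fill} collides with a flag value")
    for name in per_flag:
        if name not in attrs:
            continue  # optional attribute not present
        tokens = attrs[name].split()
        if len(tokens) != len(values):
            problems.append(f"{name} has {len(tokens)} entries for {len(values)} flag values")
        if name == "flag_colors":
            bad = [t for t in tokens if not _HEX_COLOR.fullmatch(t)]
            if bad:
                problems.append(f"flag_colors has invalid hex codes: {bad}")
    return problems
```

Returning a list of problems instead of raising on the first one is a design choice that lets a converter report every issue at once.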