Skip to content

Fix typos and ambiguities#2

Open
minnerbe wants to merge 4 commits intokevinyamauchi:add-tables-pointsfrom
minnerbe:add-tables-points
Open

Fix typos and ambiguities#2
minnerbe wants to merge 4 commits intokevinyamauchi:add-tables-pointsfrom
minnerbe:add-tables-points

Conversation

@minnerbe
Copy link
Copy Markdown

Hi Kevin, thanks for putting together this addition to the OME-NGFF spec!

I've been reading the table spec very carefully lately and came across some minor typos and (perceived) ambiguities. Therefore, I suggest a few changes that hopefully contribute to the spec being more precise and complete:

  • Nullable integers seemed to be at the wrong hierarchy level within the var group, so I put them one level higher.
  • The wording of the chunking recommendations for columns in the obs and var dataframes seemed ambiguous to me. I tried to make it as clear as possible and consistent with obsm and varm.
  • I added some recommendations for chunking of arrays within non-basic dataframe columns (nullable integers and categoricals) as well as for sparse arrays. The rationale behind my changes is that arrays that are likely to be read together should ideally share chunk sizes (which was your original intention, I believe). In particular:
    • the codes, mask and values array should be chunked the same way as the corresponding dimension of X;
    • the data and indices arrays of sparse matrices should be chunked in the same way.

Furthermore, I am confused by the absence of nullable integers and categoricals in the obs group. In my understanding, obs and var should be qualitatively the same, but associated with different dimensions of X, is this correct?
If this absence is unintentional, I suggest to add nullable integers and categoricals also to obs and I'd be happy to do so.

minnerbe added 4 commits May 19, 2023 10:13
* Slightly reformulate chunking of obs and var to eliminate ambiguities
* Chunking for obs and var is now consistent with osbm and varm
* Data in tables SHOULD now be consistently chunked the same as the
  corresponding dimension of X. This now includes ALL datatypes that can
  be used as columns.
* Data and indices of sparse matrices SHOULD be chunked in the same way.
@github-actions
Copy link
Copy Markdown

Automated Review URLs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant