Skip to content

Improve Documentation for Non-CellProfiler Datasets in Pycytominer#430

Merged
gwaybio merged 33 commits intocytomining:mainfrom
axiomcura:update-docs
Sep 26, 2024
Merged

Improve Documentation for Non-CellProfiler Datasets in Pycytominer#430
gwaybio merged 33 commits intocytomining:mainfrom
axiomcura:update-docs

Conversation

@axiomcura
Copy link
Copy Markdown
Member

Description

Thank you for your contribution to pycytominer!
Please succinctly summarize your proposed change.
What motivated you to make this change?

This PR updates the README with instructions on how to handle morphological features extracted from software other than CellProfiler. These changes were conducted due to a reviewer’s comment/suggestion.

"When I used the function pycytominer.normalize() on a dataset originating from my own work, I encountered an error message saying
“No CP features found. Are you sure this dataframe is from CellProfiler?”.
I could easily fix the problem because I know the properties of a CellProfiler dataset, and the error message was hinting at some “Metadata_” missing.
The fix was very easy, I just needed to add the prefix “Metadata_” to my metadata columns. My dataset comes from another commonly used proprietary software called Harmony. I understand that the authors are in favor of using open-source resources, but I think that the pycytominer tool could be useful in a much larger context, and take as input different types of datasets. In the readme file, that I followed to perform the normalization, I couldn’t find anywhere that the dataset has to come from CellProfiler. The author should state clearly, that only cellprofiler datasets can be used. Instead, currently, the sentence in the readme files of the pycytominer github repository says “Image data flow from a microscope to cell segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler)”. And in the code chunck the sentence says “Each function takes as input a pandas DataFrame or file path”, but in reality, it has to be a specifically formatted pandas dataframe (at least regarding the column names). I think the authors should fix this misunderstanding in the documentation. Alternatively, the better option would be to support the use of different datasets, and explain how the column names should be formatted to be easily processed by the different functions. I can’t exclude that this is described somewhere in the tutorials, but I didn’t encounter it when following the instructions on the readmefile. I would also make the error message more self-explanatory because for now, it implies that I know what CellProfiler is and how its output is formatted. If I don’t have this knowledge, as a user, I might give up on using this tool. This would be unfortunate, as I believe that this tool is a fundamental step in image-based profiling. Considering how easy was to fix this issue for my own dataset, which allowed me to immediately see the outputs of your tool, I recommend reconsidering how to best document the formatting steps necessary to be able to use the different functions of the pycytominer too"

Please also link to any relevant issues that your code is associated with.

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@axiomcura axiomcura changed the title Update docs WIP: Improve Documentation for Non-CellProfiler Datasets in Pycytominer Sep 11, 2024
axiomcura and others added 5 commits September 21, 2024 18:42
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Co-authored-by: Gregory Way <gregory.way@gmail.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Sep 22, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.72%. Comparing base (409d2aa) to head (6caab11).
⚠️ Report is 153 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##           main     #430       +/-   ##
=========================================
+ Coverage      0   94.72%   +94.72%     
=========================================
  Files         0       57       +57     
  Lines         0     3145     +3145     
=========================================
+ Hits          0     2979     +2979     
- Misses        0      166      +166     
Flag Coverage Δ
unittests 94.72% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@axiomcura axiomcura marked this pull request as ready for review September 25, 2024 17:04
@axiomcura axiomcura changed the title WIP: Improve Documentation for Non-CellProfiler Datasets in Pycytominer Improve Documentation for Non-CellProfiler Datasets in Pycytominer Sep 25, 2024
@gwaybio
Copy link
Copy Markdown
Member

gwaybio commented Sep 26, 2024

@axiomcura - please take a look at my changes. They aim to simplify and clarify. I also removed the non-CellProfiler example that actually used CellProfiler data. This is too confusing and I think the updated function docstrings and error messages in #448 are sufficient. Lastly, I removed the table of contents which was in a strange place and I felt added more confusion than help

@gwaybio gwaybio requested review from d33bs and kenibrewer September 26, 2024 12:24
Copy link
Copy Markdown
Member

@kenibrewer kenibrewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good set of changes! I like adding a description for using these other feature sets front and center.

@axiomcura
Copy link
Copy Markdown
Member Author

I like the changes @gwaybio! I agree that having the table of contents in the middle of the README was a bit confusin. It's much cleaner and more readable now.

@d33bs, waiting on your approval if the updated documentation is ready to be merged! c:

Copy link
Copy Markdown
Member

@d33bs d33bs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! I left a few comments + suggestions and overall thought this LGTM when ready. Additionally, had these thoughts:

  • I noticed that the Citing pycytominer header might need to be capitalized to stay consistent with other uses in the document.
  • I noticed the image of the workflows could use maybe another block alongside CellProfiler (or some other method) to indicate the freedom to use other tools as desired (at the moment it visually communicates you can only use two tools to reach the workflows). Maybe there could be a mention of "Single-cell Profiling Software (e.g. CellProfiler, DeepProfiler, etc.)" (this is a bit too long but I think you can see what I mean!). See below:

image

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
Co-authored-by: Dave Bunten <ekgto445@gmail.com>
@gwaybio
Copy link
Copy Markdown
Member

gwaybio commented Sep 26, 2024

It looks like this is a duplicate of #448 ? @axiomcura - is this true? How are these different?

Taking a look at what's changed here, it looks like #448 is the correct PR and we should close this one.

@gwaybio
Copy link
Copy Markdown
Member

gwaybio commented Sep 26, 2024

(I'll plan on going through @d33bs comments and adding them to 448)

@gwaybio
Copy link
Copy Markdown
Member

gwaybio commented Sep 26, 2024

oh wait! I am the one confused. One sec

axiomcura and others added 4 commits September 26, 2024 13:17
Co-authored-by: Dave Bunten <ekgto445@gmail.com>
Co-authored-by: Dave Bunten <ekgto445@gmail.com>
Co-authored-by: Dave Bunten <ekgto445@gmail.com>
@gwaybio
Copy link
Copy Markdown
Member

gwaybio commented Sep 26, 2024

Thanks for the review @d33bs !

One thing I left for @axiomcura to address is this comment:

I noticed the image of the workflows could use maybe another block alongside CellProfiler (or some other method) to indicate the freedom to use other tools as desired (at the moment it visually communicates you can only use two tools to reach the workflows). Maybe there could be a mention of "Single-cell Profiling Software (e.g. CellProfiler, DeepProfiler, etc.)" (this is a bit too long but I think you can see what I mean!). See below:

@axiomcura - what do you think about dropping this figure in favor of your new figure 1? I propose that you rename the existing file (maybe to something like pipeline_legacy.png) and add a high res version of figure one (naming it pipeline.png)

axiomcura and others added 4 commits September 26, 2024 15:11
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Copy link
Copy Markdown
Member

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉 - I'll merge this in once tests pass

@gwaybio
Copy link
Copy Markdown
Member

gwaybio commented Sep 26, 2024

update, looks like macos runners aren't picking up the jobs. Since ubuntu and other integration tests passed and this PR doesn't touch code, I will merge to reduce delays

@gwaybio gwaybio merged commit e0ab9aa into cytomining:main Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants