docs: add pydocstyle linting and switch to mkdocs #396
base: main
Changes from all commits
c1d2138
eb8a886
059267f
51a9f8a
54730fc
14f78a2
8aae6dd
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,6 +28,23 @@ env: | |
| TARGET_PYTHON_VERSION: "3.9" | ||
|
|
||
| jobs: | ||
| docs-check: | ||
| # This job is used to check the documentation build with strict mode enabled | ||
| name: Docs check | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: Checkout repo | ||
| uses: actions/checkout@v4 | ||
| - name: Setup python, and load cache | ||
| uses: ./.github/actions/setup-env | ||
| with: | ||
| python-version: ${{ env.TARGET_PYTHON_VERSION }} | ||
| cache-pre-commit: false | ||
| cache-venv: true | ||
| setup-poetry: true | ||
| install-deps: true | ||
| - name: Build documentation | ||
| run: poetry run mkdocs build --strict | ||
|
Member
Is there a way to build this check into a pre-GH-Actions check so developers can avoid surprising results from possible failures here? Mostly I wonder here if there's a pre-commit check which could be included that runs a check that things are prepared in advance.
||
| quality-test: | ||
| # This job is used to run pre-commit checks to ensure that all files are | ||
| # formatted correctly. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,21 +4,27 @@ build: | |
| os: ubuntu-22.04 | ||
| tools: | ||
| python: "3.11" | ||
| jobs: | ||
| post_checkout: | ||
| # Full history is required for dunamai to calculate the version | ||
| - git fetch --unshallow || true | ||
| post_create_environment: | ||
| # Install poetry | ||
| # https://python-poetry.org/docs/#installing-manually | ||
| - pip install poetry | ||
| # Tell poetry to not use a virtual environment | ||
| - poetry config virtualenvs.create false | ||
| post_install: | ||
| # Install dependencies with 'docs' dependency group | ||
| # https://python-poetry.org/docs/managing-dependencies/#dependency-groups | ||
| - poetry install --with dev,docs --all-extras | ||
| commands: | ||
| # Full history is required for dunamai to calculate the version | ||
| - git fetch --unshallow || true | ||
| # Install poetry | ||
| # https://python-poetry.org/docs/#installing-manually | ||
| - pip install poetry | ||
|
Member
This might be an external issue: reading through this line and recognizing
||
| # Install poetry-dynamic-versioning plugin | ||
| - poetry self add "poetry-dynamic-versioning[plugin]" | ||
| # Build the project | ||
| - poetry build --format sdist | ||
| # Extract the built sdist | ||
| - mkdir -p dist/sdist && tar -xzf dist/*.tar.gz -C dist/sdist/ | ||
| # Replace the files from the repo with the built sdist | ||
| - mv dist/sdist/*/pyproject.toml . | ||
| # Tell poetry to not use a virtual environment | ||
| - poetry config virtualenvs.create false | ||
| # Install dependencies with 'docs' dependency group | ||
| # https://python-poetry.org/docs/managing-dependencies/#dependency-groups | ||
| - poetry install --with dev,docs --all-extras | ||
| # Build the docs | ||
| - poetry run mkdocs build --clean --site-dir $READTHEDOCS_OUTPUT/html --config-file mkdocs.yml | ||
|
|
||
| sphinx: | ||
| builder: html | ||
| configuration: docs/conf.py | ||
| mkdocs: | ||
| configuration: mkdocs.yml | ||
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,8 +18,8 @@ test: ## Test the code with pytest | |
|
|
||
| .PHONY: docs | ||
| docs: ## Build the documentation | ||
| @echo "📚 Building documentation" | ||
| @poetry run sphinx-build docs build | ||
| @echo "📚 Serving documentation" | ||
| @mkdocs serve | ||
|
Member
Would it still make sense to run the build or serve through
||
|
|
||
| .PHONY: build | ||
| build: clean-build ## Build wheel file using poetry | ||
|
|
||
|
Member
Would this become a duplicate of the content found at the root of the project? If so, is there a way we could dynamically reference the root readme without having to create a duplicate here? I can see how differentiating here might be a good idea, but if the content is a clone, it could be difficult to keep up to date over time (even minor skew here could be confusing to readers).
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,204 @@ | ||
| <img height="200" src="https://raw.githubusercontent.com/cytomining/pycytominer/main/logo/with-text-for-light-bg.png?raw=true"> | ||
|
|
||
| # Data processing for image-based profiling | ||
|
|
||
| [](https://github.com/cytomining/pycytominer/actions/workflows/integration-test.yml?query=branch%3Amain) | ||
| [](https://codecov.io/github/cytomining/pycytominer?branch=main) | ||
| [](https://github.com/astral-sh/ruff) | ||
| [](https://pycytominer.readthedocs.io/) | ||
| [](https://doi.org/10.48550/arXiv.2311.13417) | ||
|
|
||
| Pycytominer is a suite of common functions used to process high dimensional readouts from high-throughput cell experiments. | ||
| The tool is most often used for processing data through the following pipeline: | ||
|
|
||
| <img height="325" alt="Description of the pycytominer pipeline. Images flow from feature extraction and are processed with a series of steps" src="https://github.com/cytomining/pycytominer/blob/main/media/pipeline.png?raw=true"> | ||
|
|
||
| [Click here for high resolution pipeline image](https://github.com/cytomining/pycytominer/blob/main/media/pipeline.png) | ||
|
|
||
| Image data flow from a microscope to cell segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler). | ||
| From here, additional single cell processing tools curate the single cell readouts into a form manageable for pycytominer input. | ||
| For CellProfiler, we use [cytominer-database](https://github.com/cytomining/cytominer-database) or [CytoTable](https://github.com/cytomining/CytoTable). | ||
| For DeepProfiler, we include single cell processing tools in [pycytominer.cyto_utils](cyto_utils.md). | ||
|
|
||
| From the single cell output, pycytominer performs five steps using a simple API (described below), before passing along data to [cytominer-eval](https://github.com/cytomining/cytominer-eval) for quality and perturbation strength evaluation. | ||
|
|
||
| ## Installation | ||
|
|
||
| You can install pycytominer via pip: | ||
|
|
||
| ```bash | ||
| pip install pycytominer | ||
| ``` | ||
|
|
||
| or conda: | ||
|
|
||
| ```bash | ||
| conda install -c conda-forge pycytominer | ||
| ``` | ||
|
|
||
| ## Frameworks | ||
|
|
||
| Pycytominer is primarily built on top of [pandas](https://pandas.pydata.org/docs/index.html), also using aspects of SQLAlchemy, sklearn, and pyarrow. | ||
|
|
||
| Pycytominer currently supports [parquet](https://parquet.apache.org/) and compressed text file (e.g. `.csv.gz`) i/o. | ||
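As a rough illustration of what compressed text i/o involves, the sketch below round-trips a tiny table through a `.csv.gz` file using only the Python standard library. It is not pycytominer's own reader or writer; the helper names (`write_csv_gz`, `read_csv_gz`) and the two example rows are hypothetical and exist only for this example.

```python
import csv
import gzip
import os
import tempfile


def write_csv_gz(path, rows, fieldnames):
    """Write a list of dict rows to a gzip-compressed CSV file."""
    # "wt" opens the gzip stream in text mode; newline="" lets csv manage line endings.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv_gz(path):
    """Read a gzip-compressed CSV file back into a list of dict rows."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical two-well profile table used only for this illustration.
example_rows = [
    {"Metadata_Well": "A01", "Cells_AreaShape_Area": "101.5"},
    {"Metadata_Well": "A02", "Cells_AreaShape_Area": "98.2"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "profiles.csv.gz")
    write_csv_gz(example_path, example_rows, ["Metadata_Well", "Cells_AreaShape_Area"])
    round_trip = read_csv_gz(example_path)
```

Values come back as strings because the `csv` module does not infer types; a DataFrame-based reader would normally handle that conversion.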
|
|
||
| ## API | ||
|
|
||
| Pycytominer has five major processing functions: | ||
|
|
||
| 1. Aggregate - Average single-cell profiles based on metadata information (most often "well"). | ||
| 2. Annotate - Append metadata (most often from the platemap file) to the feature profile | ||
| 3. Normalize - Transform input feature data into consistent distributions | ||
| 4. Feature select - Exclude non-informative or redundant features | ||
| 5. Consensus - Average aggregated profiles by replicates to form a "consensus signature" | ||
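To make the aggregation idea in the list above concrete, here is a minimal pure-Python sketch that averages single-cell rows sharing the same well. It only illustrates the concept and is not pycytominer's implementation; the function name `aggregate_by`, the feature names, and the use of the arithmetic mean are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by(rows, strata, features):
    """Average feature values over all rows that share the same strata values."""
    groups = defaultdict(list)
    for row in rows:
        # The grouping key is the tuple of metadata values, e.g. ("A01",).
        key = tuple(row[column] for column in strata)
        groups[key].append(row)

    aggregated = []
    for key, members in sorted(groups.items()):
        profile = dict(zip(strata, key))
        for feature in features:
            profile[feature] = mean(member[feature] for member in members)
        aggregated.append(profile)
    return aggregated


# Hypothetical single-cell measurements: two cells in well A01, one in A02.
single_cells = [
    {"Metadata_Well": "A01", "Cells_Area": 100.0, "Nuclei_Area": 40.0},
    {"Metadata_Well": "A01", "Cells_Area": 120.0, "Nuclei_Area": 44.0},
    {"Metadata_Well": "A02", "Cells_Area": 90.0, "Nuclei_Area": 30.0},
]

well_profiles = aggregate_by(
    single_cells, ["Metadata_Well"], ["Cells_Area", "Nuclei_Area"]
)
```

The same grouping pattern would apply to the consensus step, with replicate-identifying metadata as the strata instead of the well.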
|
|
||
| The API is consistent for each of these functions: | ||
|
|
||
| ```python | ||
| # Each function takes as input a pandas DataFrame or file path | ||
| # and transforms the input data based on the provided options and methods | ||
| df = function( | ||
| profiles_or_path, | ||
| features, | ||
| samples, | ||
| method, | ||
| output_file, | ||
| additional_options... | ||
| ) | ||
| ``` | ||
|
|
||
| Each processing function has unique arguments, see our [documentation](https://pycytominer.readthedocs.io/) for more details. | ||
|
|
||
| ## Usage | ||
|
|
||
| The default way to use pycytominer is within python scripts, and using pycytominer is simple and fun. | ||
|
|
||
| ```python | ||
| # Real world example | ||
| import pandas as pd | ||
| import pycytominer | ||
|
|
||
| commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98" | ||
| url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/SQ00014812/SQ00014812_augmented.csv.gz" | ||
|
|
||
| df = pd.read_csv(url) | ||
|
|
||
| normalized_df = pycytominer.normalize( | ||
| profiles=df, | ||
| method="standardize", | ||
| samples="Metadata_broad_sample == 'DMSO'" | ||
| ) | ||
| ``` | ||
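The `samples` argument above selects the rows that define the reference distribution. The sketch below shows that idea in plain Python: the center and scale are computed from control rows only and then applied to every row. It is an illustration under stated assumptions, not pycytominer's code; the function name `standardize_to_controls`, the example values, and the choice of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def standardize_to_controls(rows, features, is_control):
    """Z-score each feature using the mean and standard deviation of control rows."""
    controls = [row for row in rows if is_control(row)]
    if not controls:
        raise ValueError("no control rows matched the selection")

    # Copy the rows so the caller's data is left untouched.
    normalized = [dict(row) for row in rows]
    for feature in features:
        values = [row[feature] for row in controls]
        center = mean(values)
        scale = pstdev(values)  # population standard deviation (an assumption)
        if scale == 0:
            raise ValueError(f"feature {feature!r} is constant in the controls")
        for row in normalized:
            row[feature] = (row[feature] - center) / scale
    return normalized


# Hypothetical profiles: two DMSO control wells and one treated well.
profiles = [
    {"Metadata_broad_sample": "DMSO", "Cells_Area": 8.0},
    {"Metadata_broad_sample": "DMSO", "Cells_Area": 12.0},
    {"Metadata_broad_sample": "BRD-hypothetical", "Cells_Area": 16.0},
]

normalized_profiles = standardize_to_controls(
    profiles,
    features=["Cells_Area"],
    is_control=lambda row: row["Metadata_broad_sample"] == "DMSO",
)
```

With these numbers the controls have mean 10 and standard deviation 2, so the treated well ends up three control standard deviations above the control mean.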
|
|
||
| ### Pipeline orchestration | ||
|
|
||
| Pycytominer is a collection of different functions with no explicit link between steps. | ||
| However, some options exist to use pycytominer within a pipeline framework. | ||
|
|
||
| | Project | Format | Environment | pycytominer usage | | ||
| | :------------------------------------------------------------------------------- | :-------- | :------------------- | :---------------------- | | ||
| | [Profiling-recipe](https://github.com/cytomining/profiling-recipe) | yaml | agnostic | full pipeline support | | ||
| | [CellProfiler-on-Terra](https://github.com/broadinstitute/cellprofiler-on-Terra) | WDL | google cloud / Terra | single-cell aggregation | | ||
| | [CytoSnake](https://github.com/WayScience/CytoSnake) | snakemake | agnostic | full pipeline support | | ||
|
|
||
| A separate project called [AuSPICES](https://github.com/broadinstitute/AuSPICEs) offers pipeline support up to image feature extraction. | ||
|
|
||
| ## Other functionality | ||
|
|
||
| Pycytominer was written with a goal of processing any high-throughput image-based profiling data. | ||
| However, the initial use case was developed for processing image-based profiling experiments specifically. | ||
| And, more specifically than that, image-based profiling readouts from [CellProfiler](https://github.com/CellProfiler) measurements from [Cell Painting](https://www.nature.com/articles/nprot.2016.105) data. | ||
|
|
||
| Therefore, we have included some custom tools in `pycytominer/cyto_utils` that provide other functionality: | ||
|
|
||
| Note, [`pycytominer.cyto_utils.cells.SingleCells()`](cyto_utils.md#pycytominer.cyto_utils.cells) contains code to interact with single-cell SQLite files, which are output from CellProfiler. | ||
| Processing capabilities for SQLite files depend on SQLite file size and your available computational resources (e.g., memory and cores). | ||
|
|
||
| ### CellProfiler CSV collation | ||
|
|
||
| If you process your images on a cluster and do not have a MySQL or similar large database set up, you will likely end up with many folders from the different cluster runs (often one per well or one per site), each containing an `Image.csv`, `Nuclei.csv`, etc. | ||
| To look at full plates, we therefore first need to collate all of these CSVs into a single file (currently SQLite) per plate. | ||
| We currently do this with a library called [cytominer-database](https://github.com/cytomining/cytominer-database). | ||
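The collation step described above can be pictured with a small standard-library sketch: walk the per-site folders, find each `Image.csv` or `Nuclei.csv`, and append its rows to a table of the same name in one SQLite file. This is not how cytominer-database or `collate` is implemented; the function name `collate_csvs`, the folder layout, and the assumption that all CSVs for a table share one header are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def collate_csvs(root_dir, sqlite_path, table_names=("Image", "Nuclei")):
    """Append every <table>.csv found under root_dir into one SQLite file."""
    connection = sqlite3.connect(str(sqlite_path))
    try:
        for table in table_names:
            for csv_path in sorted(Path(root_dir).rglob(f"{table}.csv")):
                with open(csv_path, newline="", encoding="utf-8") as handle:
                    reader = csv.reader(handle)
                    header = next(reader, None)
                    if header is None:
                        continue  # skip empty files
                    # Assumes every CSV for a table has the same header.
                    columns = ", ".join(f'"{name}"' for name in header)
                    placeholders = ", ".join("?" for _ in header)
                    connection.execute(
                        f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})'
                    )
                    connection.executemany(
                        f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                        reader,
                    )
        connection.commit()
    finally:
        connection.close()


# Hypothetical layout: one folder per site, each holding a Nuclei.csv.
with tempfile.TemporaryDirectory() as tmp_dir:
    for site, area in (("site_1", "10.5"), ("site_2", "11.0")):
        folder = Path(tmp_dir) / site
        folder.mkdir()
        (folder / "Nuclei.csv").write_text(
            f"ImageNumber,AreaShape_Area\n1,{area}\n", encoding="utf-8"
        )
    plate_db = Path(tmp_dir) / "plate.sqlite"
    collate_csvs(tmp_dir, plate_db)

    connection = sqlite3.connect(str(plate_db))
    collated_rows = connection.execute('SELECT COUNT(*) FROM "Nuclei"').fetchone()[0]
    connection.close()
```

Values are stored as text here because no column types are declared; a real collation tool would also need to handle schema differences and duplicate image numbers across sites.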
|
|
||
| If you want to perform this data collation inside pycytominer using the `cyto_utils` function `collate` (and/or you want to be able to run the tests and have them all pass!), you will need `cytominer-database==0.3.4`; this will change your installation commands slightly: | ||
|
|
||
| ```bash | ||
| # Example for the general case: | ||
| pip install "pycytominer[collate]" | ||
|
|
||
| # Example for specific commit: | ||
| pip install "pycytominer[collate] @ git+https://github.com/cytomining/pycytominer@77d93a3a551a438799a97ba57d49b19de0a293ab" | ||
| ``` | ||
|
|
||
| If using `pycytominer` in a conda environment, in order to run `collate.py`, you will also want to make sure to add `cytominer-database=0.3.4` to your list of dependencies. | ||
|
|
||
| ### Creating a cell locations lookup table | ||
|
|
||
| The `CellLocation` class offers a convenient way to augment a [LoadData](https://cellprofiler-manual.s3.amazonaws.com/CPmanual/LoadData.html) file with the X,Y locations of cells in each image. | ||
| The location information is obtained from a single cell SQLite file. | ||
|
|
||
| To use this functionality, you will need to modify your installation command, similar to above: | ||
|
|
||
| ```bash | ||
| # Example for the general case: | ||
| pip install "pycytominer[cell_locations]" | ||
| ``` | ||
|
|
||
| Example using this functionality: | ||
|
|
||
| ```bash | ||
| metadata_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/load_data_csv/2021_08_23_Batch12/BR00126114/test_BR00126114_load_data_with_illum.parquet" | ||
| single_single_cell_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126114/test_BR00126114.sqlite" | ||
| augmented_metadata_output="~/Desktop/load_data_with_illum_and_cell_location_subset.parquet" | ||
|
|
||
| python \ | ||
| -m pycytominer.cyto_utils.cell_locations_cmd \ | ||
| --metadata_input ${metadata_input} \ | ||
| --single_cell_input ${single_single_cell_input} \ | ||
| --augmented_metadata_output ${augmented_metadata_output} \ | ||
| add_cell_location | ||
|
|
||
| # Check the output | ||
|
|
||
| python -c "import pandas as pd; print(pd.read_parquet('${augmented_metadata_output}').head())" | ||
|
|
||
| # It should look something like this (depends on the width of your terminal): | ||
|
|
||
| # Metadata_Plate Metadata_Well Metadata_Site ... PathName_OrigRNA ImageNumber CellCenters | ||
| # 0 BR00126114 A01 1 ... s3://cellpainting-gallery/cpg0016-jump/source_... 1 [{'Nuclei_Location_Center_X': 943.512129380054... | ||
| # 1 BR00126114 A01 2 ... s3://cellpainting-gallery/cpg0016-jump/source_... 2 [{'Nuclei_Location_Center_X': 29.9516027655562... | ||
| ``` | ||
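The `CellCenters` column in the output above holds one list of coordinate records per image. The following sketch shows one way such a column could be assembled from per-cell rows in plain Python. It is only a conceptual illustration, not the `CellLocation` implementation; the function name `add_cell_centers`, the `Nuclei_Location_Center_Y` key, and joining on `ImageNumber` alone are assumptions.

```python
from collections import defaultdict


def add_cell_centers(image_rows, cell_rows, image_key="ImageNumber"):
    """Attach a list of nucleus center coordinates to each image-level row."""
    centers = defaultdict(list)
    for cell in cell_rows:
        centers[cell[image_key]].append(
            {
                "Nuclei_Location_Center_X": cell["Nuclei_Location_Center_X"],
                "Nuclei_Location_Center_Y": cell["Nuclei_Location_Center_Y"],
            }
        )
    # Images without any cells receive an empty list.
    return [
        dict(row, CellCenters=centers.get(row[image_key], []))
        for row in image_rows
    ]


# Hypothetical image-level metadata and per-cell measurements.
image_rows = [
    {"Metadata_Well": "A01", "Metadata_Site": 1, "ImageNumber": 1},
    {"Metadata_Well": "A01", "Metadata_Site": 2, "ImageNumber": 2},
]
cell_rows = [
    {"ImageNumber": 1, "Nuclei_Location_Center_X": 943.5, "Nuclei_Location_Center_Y": 182.1},
    {"ImageNumber": 1, "Nuclei_Location_Center_X": 29.9, "Nuclei_Location_Center_Y": 77.0},
]

augmented_rows = add_cell_centers(image_rows, cell_rows)
```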
|
|
||
| ### Generating a GCT file for morpheus | ||
|
|
||
| The software [morpheus](https://software.broadinstitute.org/morpheus/) enables profile visualization in the form of interactive heatmaps. | ||
| Pycytominer can convert profiles into a `.gct` file for drag-and-drop input into morpheus. | ||
|
|
||
| ```python | ||
|
Member
I noticed in the Read the Docs preview that these Python code blocks don't appear to include syntax highlighting. Is it possible to enable this somehow?
||
| # Real world example | ||
| import pandas as pd | ||
| import pycytominer | ||
|
|
||
| commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98" | ||
| plate = "SQ00014812" | ||
| url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/{plate}/{plate}_normalized_feature_select.csv.gz" | ||
|
|
||
| df = pd.read_csv(url) | ||
| output_file = f"{plate}.gct" | ||
|
|
||
| pycytominer.cyto_utils.write_gct( | ||
| profiles=df, | ||
| output_file=output_file | ||
| ) | ||
| ``` | ||
|
|
||
| ## Citing pycytominer | ||
|
|
||
| If you have used `pycytominer` in your project, please use the citation below. | ||
| You can also find the citation in the 'cite this repository' link at the top right under the `about` section. | ||
|
|
||
| APA: | ||
|
|
||
| ```text | ||
| Serrano, E., Chandrasekaran, N., Bunten, D., Brewer, K., Tomkinson, J., Kern, R., Bornholdt, M., Fleming, S., Pei, R., Arevalo, J., Tsang, H., Rubinetti, V., Tromans-Coia, C., Becker, T., Weisbart, E., Bunne, C., Kalinin, A. A., Senft, R., Taylor, S. J., Jamali, N., Adeboye, A., Abbasi, H. S., Goodman, A., Caicedo, J., Carpenter, A. E., Cimini, B. A., Singh, S., & Way, G. P. Reproducible image-based profiling with Pycytominer. https://doi.org/10.48550/arXiv.2311.13417 | ||
| ``` | ||
|
Member
Is this file still required in order to use
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| # Cyto utilities | ||
|
|
||
| Functions enabling smooth interaction with CellProfiler and DeepProfiler output formats. | ||
|
|
||
| ::: pycytominer.cyto_utils |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| # Main Functions | ||
|
|
||
| <!-- prettier-ignore-start --> | ||
|
Member
Why is a
||
| <!-- mkdocs block --> | ||
|
|
||
| ::: pycytominer | ||
| options: | ||
| members: | ||
| - aggregate | ||
| - annotate | ||
| - consensus | ||
| - feature_select | ||
| - normalize | ||
|
|
||
| <!-- mkdocs block END --> | ||
| <!-- prettier-ignore-end --> | ||
This file was deleted.