Thank you for your interest in contributing! We're eager to see your ideas and look forward to working with you.
This document describes the technical procedures we follow in this project. It should also be stressed that as members of the Scikit-HEP community, we are all obliged to maintain a welcoming, harassment-free environment. See the Code of Conduct for details.
The front page for the Awkward Array project is its GitHub README. This leads directly to tutorials and reference documentation that you may have already seen. It also includes instructions for compiling for development.
The first thing you should do if you want to fix something is to submit an issue through GitHub. That way, we can all see it and maybe one of us or a member of the community knows of a solution that could save you the time spent fixing it. If you "assign yourself" to the issue (top of right side-bar), you can signal your intent to fix it in the issue report.
Feel free to open pull requests in GitHub from your forked repo when you start working on the problem. We recommend opening the pull request early so that we can see your progress and communicate about it. (Note that you can git commit --allow-empty to make an empty commit and start a pull request before you even have new code.)
Please make the pull request a draft to indicate that it is in an incomplete state and shouldn't be merged until you click "ready for review."
We welcome the use of AI tools as part of the development process. They can be valuable aids for drafting, refactoring, documentation, and exploration. However, contributions to Awkward Array require human judgment, contextual understanding, and familiarity with the project’s structure, goals, and standards.
When using AI tools:
- You remain the author of the contribution. Review, understand, and test all AI-assisted code or documentation before submitting it under your name. You should be able to explain and defend the changes on request.
- Avoid fully automated submissions. Issues or pull requests generated end-to-end by automated tools, without meaningful human review or intent, are not appropriate.
- Be respectful of reviewers’ time. Ensure that both the content of the PR and its description reflect your own understanding. Reviewers should not be expected to infer authorship or unknowingly interact with an AI during review.
- Disclose significant AI assistance. If AI tools were used for a substantial portion of the contribution, please note this in the PR description. (This guide is an example of that: we used ChatGPT to help with the writing, and in this comment, we acknowledge that fact.)
Contributors are responsible for the correctness, maintainability, and long-term impact of all submitted changes, regardless of whether AI tools were used.
Pull requests that do not meet these expectations may be closed without review.
Currently, we have four regular reviewers of pull requests:
- Ianna Osborne (ianna)
- Peter Fackeldey (pfackeldey)
- Andres Rios Tascon (ariostas)
- Iason Krommydas (ikrommyd)
You can request a review from one of us or just comment in GitHub that you want a review and we'll see it. Only one review is required to be allowed to merge a pull request. We'll work with you to get it into shape.
If you're waiting for a response and haven't heard back in a few days, we may have forgotten, been distracted, thought someone else was reviewing it, or thought we were waiting on you rather than you waiting on us. Just write another comment to remind us.
If you want to contribute frequently, we'll grant you write access to the scikit-hep/awkward repo itself. This is more convenient than pull requests from forked repos.
Unless you ask us not to, we might commit directly to your pull request as a way of communicating what needs to be changed. That said, most of the commits on a pull request are from a single author: corrections and suggestions are exceptions.
Therefore, we prefer git branches to be named with your GitHub userid, such as ianna/write-contributing-md.
The titles of pull requests (and therefore the merge commit messages) should follow these conventions. Mostly, this means prefixing the title with one of these words and a colon:
- feat: new feature
- fix: bug-fix
- perf: code change that improves performance
- refactor: code change that neither fixes a bug nor adds a feature
- style: changes that do not affect the meaning of the code
- test: adding missing tests or correcting existing tests
- build: changes that affect the build system or external dependencies
- docs: documentation only changes
- ci: changes to our CI configuration files and scripts
- chore: other changes that don't modify src or test files
- revert: reverts a previous commit
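To check a title against the prefix list above before opening a pull request, something like the following could be used. This is only a sketch: the function name `parse_pr_title` and the regular expression are illustrative assumptions, not part of the project's tooling, and optional conventional-commit extras such as scopes are deliberately not handled.

```python
import re

# Prefixes copied from the list above.
ALLOWED_TYPES = (
    "feat", "fix", "perf", "refactor", "style", "test",
    "build", "docs", "ci", "chore", "revert",
)

# Assumed shape: "<type>: <non-empty description>".
TITLE_PATTERN = re.compile(r"^(?P<type>[a-z]+): (?P<description>\S.*)$")


def parse_pr_title(title):
    """Return (type, description) if the title follows the convention, else None."""
    match = TITLE_PATTERN.match(title)
    if match is None or match.group("type") not in ALLOWED_TYPES:
        return None
    return match.group("type"), match.group("description")


if __name__ == "__main__":
    # Both titles are made-up examples.
    for title in ("fix: handle empty arrays", "Fixed a bug"):
        print(f"{title!r} -> {parse_pr_title(title)}")
```

A title that returns `None` here is missing a recognized prefix, the colon, or the description.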
Almost all pull requests are merged with the "squash and merge" feature, so details about commit history within a pull request are hidden from the main branch's history. Feel free, therefore, to commit with any frequency you're comfortable with.
It is unnecessary to manually edit (rebase) commit history within a pull request.
The installation procedure for developers is described in brief on the front page, and in more detail here.
Awkward Array is shipped as two packages: awkward and awkward-cpp. The awkward-cpp package contains the compiled C++ components required for performance, and awkward is only Python code. If you do not need to modify any C++ (the usual case), then awkward-cpp can simply be installed using pip or conda.
Subsequent steps require the generation of code and datafiles (kernel specification, header-only includes). This can be done with the prepare nox session:
nox -s prepare
The prepare session accepts flags to specify exact generation targets, e.g.
nox -s prepare -- --tests --docs
This can reduce the time taken to perform the preparation step in the event that only the package-building step is needed.
nox also lets us reuse the virtualenvs that it creates for each session with the -R flag, eliminating the dependency reinstall time:
nox -R -s prepare
The C++ components can be installed by building the awkward-cpp package:
python -m pip install ./awkward-cpp
If you are working on the C++ components of Awkward Array, it might be more convenient to skip the build isolation step, which involves creating an isolated build environment. First, you must install the build requirements:
python -m pip install "scikit-build-core[pyproject,color]" pybind11 ninja cmake
Then the installation can be performed without build isolation:
python -m pip install --no-build-isolation --check-build-dependencies ./awkward-cpp
With awkward-cpp installed, an editable installation of the pure-Python awkward package can be performed with
python -m pip install -e .
Finally, let's run the integration test suite to ensure that everything's working as expected:
python -m pytest -n auto tests
For more fine-grained testing, we also have tests of the low-level kernels, which can be invoked with
python -m pytest -n auto awkward-cpp/tests-spec
python -m pytest -n auto awkward-cpp/tests-cpu-kernels
This assumes that the nox -s prepare session ran the --tests target.
Furthermore, if you have an Nvidia GPU and CuPy installed, you can run the CUDA tests with
python -m pytest tests-cuda-kernels
python -m pytest tests-cuda
You can also run additional unit tests that provide more coverage of the low-level kernels, for even finer-grained testing.
For Python Kernels:
python -m pytest -n auto awkward-cpp/tests-spec-explicit
For CPU Kernels:
python -m pytest -n auto awkward-cpp/tests-cpu-kernels-explicit
For CUDA Kernels:
python -m pytest tests-cuda-kernels-explicit
Sometimes it's convenient to build a wheel for the awkward-cpp package, so that subsequent re-installs do not require the package to be rebuilt. The build package can be used to do this, though care must be taken to specify the current Python interpreter in pipx:
pipx run --python=$(which python) build --wheel awkward-cpp
The built wheel will then be available in awkward-cpp/dist.
The Awkward Array project uses pre-commit to handle formatters and linters. This automatically checks (and may push corrections to) your pull request's git branch.
To respond more quickly to pre-commit's feedback, it can help to install it and run it locally. Once it is installed, run
pre-commit run -a
to test all of your files. If you leave off the -a, it will run only on currently staged changes.
As stated above, we use pytest to verify the correctness of the code, and GitHub will reject a pull request if either pre-commit or pytest fails (red "X"). All tests must pass for a pull request to be accepted.
Note that if a pull request doesn't modify code, only the documentation tests will run. That's okay: documentation-only pull requests only need the documentation tests to pass.
Unless you're refactoring code, such that your changes are fully tested by the existing test suite, new code should be accompanied by new tests. Our testing suite is organized by GitHub issue or pull request number: that is, test file names are
tests/test_XXXX-yyyy.py
where XXXX is either the number of the issue your pull request fixes or the number of the pull request and yyyy is descriptive text, often the same as the git branch. This makes it easier to run your test in isolation:
python -m pytest tests/test_XXXX-yyyy.py
and it makes it easier to figure out why a particular test was added. The easiest way to make a new testing file is to copy an existing one and replace its test_zzzz functions with your own. The previous tests should also give you a sense of the way we test things and the kinds of things that are constrained in tests.
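As a small illustration of the naming scheme, the sketch below builds a test file path from an issue or pull request number and some descriptive text. The helper `make_test_path` is hypothetical, and two details are assumptions rather than project rules: zero-padding the number to four digits (to match the XXXX placeholder) and turning the description into a lowercase, hyphen-separated slug.

```python
import re
from pathlib import PurePosixPath


def make_test_path(number, description):
    """Build tests/test_XXXX-yyyy.py from an issue/PR number and descriptive text."""
    # Assumption: lowercase the text and join words with hyphens, like a branch name.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    # Assumption: zero-pad to four digits to match the XXXX placeholder.
    return PurePosixPath("tests") / f"test_{number:04d}-{slug}.py"


if __name__ == "__main__":
    # The number and description are made up for illustration.
    path = make_test_path(123, "Fix empty slices")
    print(f"python -m pytest {path}")
```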
Documentation is automatically built for each pull request. You usually won't need to build the documentation locally, but if you do, this section describes how.
We use Sphinx to generate documentation. You may need to install some additional packages:
To build documentation locally, first prepare the generated data files with
nox -s prepare
Only the --headers and --docs flags are actually required at the time of writing. These can be passed with:
nox -s prepare -- --docs --headers
Then, use nox to run the various documentation build steps
nox -s docs
This command executes multiple custom Python scripts (some require a working internet connection), in addition to using Sphinx and Doxygen to generate the required browser-viewable documentation.
To view the built documentation, open
docs/_build/html/index.html
from the root directory of the project in your preferred web browser, e.g.
python -m http.server 8080 --directory docs/_build/html/
Before re-building documentation, you might want to delete the files that were generated to create viewable documentation. A simple command to remove all of them is
rm -rf docs/reference/generated docs/_build docs/_static/doxygen
There is also a cache in the docs/_build/.jupyter_cache directory for Jupyter Book, which can be removed.
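On platforms without rm, the same cleanup can be approximated in Python. The sketch below is not a project script: the function name `clean_generated_docs` is an assumption, and the directory list is copied from the rm -rf command above.

```python
import shutil
from pathlib import Path

# Paths taken from the rm -rf command above, relative to the project root.
GENERATED_DOC_DIRS = (
    "docs/reference/generated",
    "docs/_build",
    "docs/_static/doxygen",
)


def clean_generated_docs(project_root="."):
    """Delete generated documentation directories; return the ones that existed."""
    removed = []
    for relative in GENERATED_DOC_DIRS:
        target = Path(project_root) / relative
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(relative)
    return removed
```

Calling `clean_generated_docs()` from the root directory of the project removes whichever of the three directories exist and leaves everything else under docs/ untouched.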
The Awkward Array main branch must be kept in an unbroken state. There are two reasons for this: so that developers can work independently on known-to-be-working states and so that users can test the latest changes (usually to see if the bug they've discovered is fixed by a potential correction).
The main branch is also never far from the latest released version. We usually deploy patch releases (z in a version number like x.y.z) within days of a bug-fix.
Committing directly to main is not allowed except for
- updating the pyproject.toml file to increase the version number, which should be independent of pull requests
- updating documentation or non-code files
- unprecedented emergencies
and only by the reviewing team.
The main-v1 branch was split from main just before Awkward 1.x code was removed, so it exists to make 1.10.x bug-fix releases. These commits must be drawn from main-v1, not main, and pull requests must target main-v1 (not the GitHub default). A single commit cannot be applied to both main and main-v1 because they have diverged too much. If a bug-fix needs to be applied to both (unlikely), it will have to be reimplemented on both.
Currently, only one person can deploy releases:
- Ianna Osborne (ianna)
There are two kinds of releases: (1) awkward-cpp updates, which only occur when the C++ is updated (rare) and involve compilation on many platforms (takes hours), and (2) awkward updates, which can happen with any bug-fix. The releases listed in GitHub are awkward releases, not awkward-cpp.
If you need your merged pull request to be deployed in a release, just ask!
To make an awkward-cpp release:
- A commit to main should increase the version number in awkward-cpp/pyproject.toml and the corresponding dependency in pyproject.toml. This ensures that awkward-cpp and awkward remain in sync.
- The Deploy C++ GitHub Actions workflow should be manually triggered.
- A git tag awkward-cpp-{version} should be created for the new version epoch.
To make an awkward release:
- A commit to main should increase the version number in pyproject.toml
- A new GitHub release must be published. "Generate release notes" produces categorized notes automatically: a workflow labels each PR type/<type> from its conventional-commit title, and .github/release.yml groups those labels into sections.
- A docs/switcher.json entry must be added for new minor/major versions.
Pushes that modify docs/switcher.json on main will automatically be synchronised with AWS.
Nightly wheels of awkward-cpp and awkward are built and published to the Scientific Python Nightly Wheels Anaconda Cloud organization.
As the awkward-cpp and awkward nightly wheels do not include version control system information, they will have the same version numbers as the last released versions on the public PyPI. To avoid resolution conflicts when installing the nightly wheels, it is recommended to first install awkward-cpp and awkward from PyPI to get all of their dependencies, then uninstall awkward-cpp and awkward and install the nightly wheels from the Scientific Python nightly index.
python -m pip install --upgrade awkward
python -m pip uninstall --yes awkward awkward-cpp
python -m pip install --upgrade --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple awkward
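If the three commands above need to be run from a script (for example, in a test environment setup), a sketch like the following could be used. The function names are hypothetical; the pip arguments and index URL are copied from the commands above. By default it only prints the commands, and with `dry_run=False` it stops at the first failing step.

```python
import subprocess
import sys

NIGHTLY_INDEX = "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"


def nightly_install_commands(python=sys.executable):
    """Return the three pip invocations from the text above as argument lists."""
    pip = [python, "-m", "pip"]
    return [
        pip + ["install", "--upgrade", "awkward"],
        pip + ["uninstall", "--yes", "awkward", "awkward-cpp"],
        pip + ["install", "--upgrade", "--extra-index-url", NIGHTLY_INDEX, "awkward"],
    ]


def install_nightly(dry_run=True):
    """Print each command; run it as well when dry_run is False."""
    for command in nightly_install_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(command, check=True)
```

Using `sys.executable` keeps all three steps in the same Python environment, which matters because the uninstall step must remove the copies that the first step installed.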
To ensure stability and reproducibility across the Awkward ecosystem:
- Collaborative Review Required
  - Pull requests that affect core dependencies (e.g., Numba, Pandas, PyArrow, NumPy) or Python version support must be discussed with maintainers before merging.
  - This avoids unilateral decisions that could break critical workflows.
- Compatibility Transparency
  - If a dependency is not yet available for a new Python release, note this clearly in the PR description and documentation.
  - Example: “Supports Python 3.14, but Numba-dependent features are disabled until upstream support is released.”
- Testing Expectations
  - CI should run against all supported Python versions.
  - Tests that depend on unsupported features (e.g., Numba on Python 3.14) must be marked with xfail so contributors see the limitation.
- Version Pinning
  - Pin or constrain versions of critical dependencies to prevent silent breakages from upstream changes.
  - Floating versions may be allowed for secondary utilities, but core scientific packages should be locked.
- Communication
  - Use GitHub Discussions or Issues to propose changes before opening a PR that alters compatibility.
  - This ensures consensus and avoids surprises for downstream users.
- Have I discussed this change with maintainers?
- Did I document limitations (e.g., missing Numba support)?
- Are CI tests updated to reflect compatibility status?
- Did I pin or constrain critical dependencies?