MolCrysKit is a Python toolkit for molecular crystals. It provides utilities for parsing crystallographic data, identifying the molecules within a crystal, and analysing them using graph theory and the Atomic Simulation Environment (ASE).
- Robust Molecule Identification: Identify individual molecules within a crystal structure using graph-based algorithms
- Disorder Handling: Process disordered structures with graph algorithms
- Topological Surface Generation: Create surface slabs while preserving molecular topology
- Hydrogen Addition: Automatically add hydrogen atoms based on geometric rules
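As a rough illustration of the graph-based identification idea in the feature list, the sketch below links atoms whose separation is within a scaled sum of covalent radii and reports the connected components as molecules. It is a standalone toy, not MolCrysKit's implementation: the radii table, the `tolerance` factor, and the `find_molecules` helper are assumptions made for this example, and periodic boundaries are ignored.

```python
import math
from collections import defaultdict

# Approximate covalent radii in angstroms (illustrative values only,
# not the table MolCrysKit uses).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def find_molecules(symbols, positions, tolerance=1.2):
    """Group atom indices into molecules (connected components of a bond graph).

    Hypothetical helper: periodic images are NOT considered here.
    """
    n = len(symbols)
    parent = list(range(n))  # union-find forest, one tree per molecule

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            cutoff = tolerance * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if math.dist(positions[i], positions[j]) <= cutoff:
                parent[find(i)] = find(j)  # bonded: merge the two components

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Same toy coordinates as the two-water example in this README.
symbols = ["O", "H", "H", "O", "H", "H"]
positions = [
    (1.0, 1.0, 1.0), (1.8, 1.0, 1.0), (0.7, 1.6, 1.0),
    (5.0, 5.0, 5.0), (5.8, 5.0, 5.0), (4.7, 5.6, 5.0),
]
molecules = find_molecules(symbols, positions)
print(molecules)  # [[0, 1, 2], [3, 4, 5]]
```

A real crystal also needs bonds that cross cell boundaries to be followed, which this sketch leaves out for brevity.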
To install MolCrysKit, you can use pip:
pip install .
Or for development purposes, install in editable mode:
pip install -e .
Here's a simple example of how to use MolCrysKit:
from ase import Atoms
from molcrys_kit.structures.crystal import MolecularCrystal
# 1. Create a toy system (e.g., 2 Water molecules in a unit cell)
# In practice, you would typically load this from a file with ase.io.read: atoms = read('cif_file.cif')
atoms = Atoms(
    symbols=['O', 'H', 'H', 'O', 'H', 'H'],
    positions=[
        [1.0, 1.0, 1.0], [1.8, 1.0, 1.0], [0.7, 1.6, 1.0],  # Molecule 1
        [5.0, 5.0, 5.0], [5.8, 5.0, 5.0], [4.7, 5.6, 5.0],  # Molecule 2
    ],
    cell=[10.0, 10.0, 10.0],
    pbc=True,
)
# 2. Initialize MolecularCrystal (Automatically identifies molecules via graph logic)
crystal = MolecularCrystal.from_ase(atoms)
# 3. Access Crystal & Molecular Properties
print(f"Lattice Parameters: {crystal.get_lattice_parameters()}")
print(f"Identified Molecules: {len(crystal.molecules)}")
mol = crystal.molecules[0]
print(f"Molecule 1 Formula: {mol.get_chemical_formula()}")
print(f"Molecule 1 Center of Mass: {mol.get_center_of_mass()}")
Two Dockerfiles are provided for different environments:
| File | Base image | Use case |
|---|---|---|
| Dockerfile | python:3.10-slim | Local use, reviewers, CI |
| Dockerfile.bohrium | registry.dp.tech/dptech/ubuntu:ubuntu24.04-py3.12 | Bohrium cloud platform |
Both install MolCrysKit directly from the GitHub archive. A local Docker build context is still used to start the build, but the package, notebook assets, and helper scripts are fetched from the selected GitHub ref inside the image build.
- Docker Desktop (macOS / Windows) or the Docker Engine (Linux).
# 1. Clone the repository and enter the package directory
git clone https://github.com/SchrodingersCattt/MolCrysKit.git
cd MolCrysKit
# 2. Build the image (≈ 5–10 min on first run; subsequent builds use the cache)
docker build -t molcryskit:latest .
# 3. Run the smoke test to confirm everything works
docker run --rm molcryskit:latest python /opt/molcryskit/scripts/docker_smoke_test.py
# 4. Start the Jupyter notebook server
docker run -it --rm -p 8888:8888 molcryskit:latest
# Then open http://localhost:8888 in your browser.
# Example CIF files are available at /workspace/notebook/example/ inside the container.
# From the MolCrysKit/ directory:
bash scripts/docker-test.sh
This script builds the image and runs the smoke test automatically, reporting
ALL CHECKS PASSED on success.
# Build with the Bohrium-specific Dockerfile
docker build -f Dockerfile.bohrium -t molcryskit-bohrium:latest .
# Pin to an immutable Git tag instead of the moving main branch
# (recommended for archival/reviewer reproducibility)
docker build -f Dockerfile.bohrium \
  --build-arg MOLCRYSKIT_REF=refs/tags/v0.2.0 \
  -t molcryskit-bohrium:v0.2.0 .
The Bohrium image uses pip install from the GitHub archive zip (no git clone
required) and does not include Jupyter, because Bohrium provides its own notebook
environment.
The Bohrium registry is convenient for cloud execution, but it should not be the only archival location because image retention is controlled by the platform and project namespace. For a stable public anchor, publish immutable release images to GitHub Container Registry (GHCR).
The workflow publish-ghcr.yml pushes images built from Dockerfile to ghcr.io/<owner>/molcryskit:
- pushing a Git tag such as v0.2.0 publishes ghcr.io/<owner>/molcryskit:v0.2.0
- stable release tags also receive latest
- manual dispatch can publish a development snapshot from a chosen Git ref
Recommended archival pattern:
# 1. Create and push an immutable release tag
git tag v0.2.0
git push origin v0.2.0
# 2. GitHub Actions publishes the image automatically to GHCR
#    ghcr.io/<owner>/molcryskit:v0.2.0
For Bohrium, keep using Dockerfile.bohrium as the
platform-specific runtime image, but cite the GitHub repository and GHCR image
as the permanent public anchor.
docker run -it --rm \
-p 8888:8888 \
-v /path/to/your/cif/files:/workspace/my_data \
  molcryskit:latest
Your files will be accessible at /workspace/my_data/ inside the container.
For detailed architecture, tutorials, and API reference, please see the docs/ directory.
MolCrysKit resolves crystallographic disorder through two complementary paths that are designed to be fully compatible:
Explicit path (_add_explicit_conflicts + _add_conformer_conflicts):
Processes atoms tagged with _atom_site_disorder_assembly / _atom_site_disorder_group
in the CIF (e.g. SHELXL PART groups). Mutual-exclusion edges are added for
atoms in different groups of the same assembly. Hydrogen atoms bonded to an
explicit-disorder centre inherit the centre's group tag and are resolved together
with it.
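The explicit path can be pictured with a small standalone sketch: atoms that share a disorder assembly but sit in different groups get a mutual-exclusion edge, and a maximum-weight independent set then picks a conflict-free subset. Everything here is an assumption for illustration: the record layout, the use of occupancy as the weight, and the brute-force solver are not taken from MolCrysKit's source, and the functions are hypothetical stand-ins rather than `_add_explicit_conflicts` itself.

```python
from itertools import combinations

# Hypothetical records: (label, disorder_assembly, disorder_group, occupancy)
atoms = [
    ("C1A", "A", 1, 0.6),
    ("C2A", "A", 1, 0.6),
    ("C1B", "A", 2, 0.4),
    ("C2B", "A", 2, 0.4),
    ("N1", None, 0, 1.0),  # ordered atom without disorder tags
]


def explicit_conflicts(atoms):
    """Return index pairs that share an assembly but belong to different groups."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        same_assembly = a[1] is not None and a[1] == b[1]
        if same_assembly and a[2] != b[2]:
            edges.add((i, j))
    return edges


def max_weight_independent_set(atoms, edges):
    """Brute-force search; exponential, so only suitable for tiny examples.

    Assumption: the weight of a subset is the sum of its occupancies.
    """
    best, best_weight = (), -1.0
    n = len(atoms)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            chosen = set(subset)
            if any(i in chosen and j in chosen for i, j in edges):
                continue  # subset contains two mutually exclusive atoms
            weight = sum(atoms[i][3] for i in subset)
            if weight > best_weight:
                best, best_weight = subset, weight
    return [atoms[i][0] for i in best]


edges = explicit_conflicts(atoms)
kept = max_weight_independent_set(atoms, edges)
print(sorted(edges))  # [(0, 2), (0, 3), (1, 2), (1, 3)]
print(kept)           # ['C1A', 'C2A', 'N1']
```

The major component (group 1) survives together with the ordered atom, which is the behaviour one would expect when occupancy drives the weights.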
Implicit SP path (_add_implicit_sp_conflicts + _resolve_valence_conflicts):
Processes partial-occupancy atoms on crystallographic special positions that carry
no disorder tags (common in SHELX riding-H refinements). The algorithm
clusters copies of each asymmetric-unit site by proximity and adds mutual-exclusion
edges within each cluster. For heavy atoms (N, P, S) with sufficient copies
around them, a tetrahedral/trigonal decomposition (_sp_tetrahedral_single) finds
geometrically valid orientation combinations and adds cross-cluster compatibility
constraints.
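The proximity clustering step of the implicit path might look roughly like the sketch below: symmetry copies of one asymmetric-unit site that lie close together are treated as alternatives, and every pair inside a cluster receives a mutual-exclusion edge. The data layout, the 0.8 Å threshold, and the simple greedy clustering rule are assumptions for this example. The tetrahedral/trigonal decomposition is not shown.

```python
import math
from itertools import combinations

# Hypothetical symmetry-expanded copies: (index, asym_id, position in angstroms)
copies = [
    (0, "H1", (0.00, 0.00, 0.90)),
    (1, "H1", (0.30, 0.00, 0.85)),
    (2, "H1", (0.00, 0.00, -0.90)),
    (3, "H1", (0.30, 0.00, -0.85)),
]


def cluster_by_proximity(copies, threshold=0.8):
    """Greedy clustering: a copy joins the first cluster that already holds
    a copy of the same site closer than `threshold`; otherwise it starts a new one."""
    clusters = []
    for idx, asym_id, pos in copies:
        for cluster in clusters:
            if any(a == asym_id and math.dist(pos, p) < threshold for _, a, p in cluster):
                cluster.append((idx, asym_id, pos))
                break
        else:
            clusters.append([(idx, asym_id, pos)])
    return clusters


def implicit_conflicts(clusters):
    """Every pair of copies inside one cluster is mutually exclusive."""
    edges = []
    for cluster in clusters:
        indices = [idx for idx, _, _ in cluster]
        edges.extend(combinations(indices, 2))
    return edges


clusters = cluster_by_proximity(copies)
cluster_indices = [[idx for idx, _, _ in c] for c in clusters]
sp_edges = implicit_conflicts(clusters)
print(cluster_indices)  # [[0, 1], [2, 3]]
print(sp_edges)         # [(0, 1), (2, 3)]
```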
Motif merge post-pass (_merge_chemical_motifs):
After the Maximum-Weight Independent Set is solved, isolated XH_n centres
(e.g. NH4+, H2O) are reconstructed by a greedy distance- and angle-sorted H
selection. Soft conflict edges (valence_geometry, implicit_sp, geometric)
are ignored here so that the strongly disordered SP-position motifs are not
excluded wholesale. A key guard enforces one H per crystallographic site
(asym_id) when at least max_H distinct asym_ids are present: this prevents
the greedy selection from picking multiple copies of the same SHELX H position
(which point in nearly identical directions) before exhausting the other sites.
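The one-H-per-site guard is easiest to see on a toy case. The sketch below sorts candidate hydrogens by distance only (the angle criterion is omitted) and applies the guard when at least `max_h` distinct `asym_id` values are available. The function name, tuple layout, and distances are hypothetical and chosen for illustration. This is not the code of `_merge_chemical_motifs`.

```python
def select_hydrogens(candidates, max_h):
    """Greedily pick up to max_h hydrogens, nearest first.

    candidates: list of (label, asym_id, distance_to_centre).
    Assumption: sorting uses distance only; the real selection also uses angles.
    """
    ordered = sorted(candidates, key=lambda c: c[2])
    distinct_sites = {asym_id for _, asym_id, _ in ordered}
    one_per_site = len(distinct_sites) >= max_h  # guard is active only in this case

    chosen, used_sites = [], set()
    for label, asym_id, _ in ordered:
        if len(chosen) == max_h:
            break
        if one_per_site and asym_id in used_sites:
            continue  # skip a second copy of a site that is already represented
        chosen.append(label)
        used_sites.add(asym_id)
    return chosen


# Hypothetical NH4+ centre: four H sites, the first present as two near-identical copies.
candidates = [
    ("H1a", "H1", 0.88),
    ("H1b", "H1", 0.89),
    ("H2", "H2", 0.90),
    ("H3", "H3", 0.91),
    ("H4", "H4", 0.92),
]
selected = select_hydrogens(candidates, max_h=4)
print(selected)  # ['H1a', 'H2', 'H3', 'H4']
```

Without the guard, the two nearest candidates (`H1a` and `H1b`) would both be taken and `H4` would be dropped, leaving two hydrogens pointing in almost the same direction.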
Known-compatible CIF styles for NH4+:
| Style | Example | Tags | Resolution path |
|---|---|---|---|
| Explicit PART groups (dg=-1) | DAI-4 N4 | disorder_assembly=A/B/C/D, disorder_group=-1 | Explicit path |
| Implicit SHELX riding-H (dg=0) | DAI-4 N1 | No tags, occ=1/sso, sso>1 | Implicit SP + motif merge |
| High-multiplicity SP (24 orientations) | PAP-4 | occ=1/24, no tags | Implicit SP + motif merge |
All three styles produce correct NH4+ (4 H per nitrogen) after resolution.
Valence-completeness diagnostics:
DisorderSolver.solve() automatically calls
molcrys_kit.analysis.disorder.diagnostics.check_valence_completeness on
each resolved structure. If an isolated N or O centre has an H count outside
the expected range (N: 3–4; O: 0–2), a WARNING is emitted via the standard
logging system. This catch-all does not modify the structure; it only makes
potential resolution artefacts visible.
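A check of this kind can be sketched with the standard `logging` module. The example below uses the ranges quoted above (N: 3 to 4, O: 0 to 2) and only reports, never modifies, its input. The function name, its argument format, and the logger name are assumptions for this illustration and do not reflect the actual signature of `check_valence_completeness`.

```python
import logging

logger = logging.getLogger("valence_check_sketch")  # hypothetical logger name

# Expected hydrogen counts for isolated centres, as stated in this README.
EXPECTED_H = {"N": (3, 4), "O": (0, 2)}


def check_valence(centres):
    """Warn about isolated N/O centres whose H count is out of range.

    centres: list of (label, element, h_count). Returns the flagged labels
    and leaves the input untouched.
    """
    flagged = []
    for label, element, h_count in centres:
        if element not in EXPECTED_H:
            continue  # only N and O centres are checked
        low, high = EXPECTED_H[element]
        if not low <= h_count <= high:
            logger.warning(
                "%s (%s) has %d H, expected %d-%d", label, element, h_count, low, high
            )
            flagged.append(label)
    return flagged


logging.basicConfig(level=logging.WARNING)
centres = [("N1", "N", 4), ("N2", "N", 6), ("O1", "O", 2), ("O2", "O", 3)]
flagged = check_valence(centres)
print(flagged)  # ['N2', 'O2']
```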
See the molcrys_kit/ directory for source code and the scripts/ directory for utility scripts (e.g. disorder diagnostics, molecule identification, CIF processing).
Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.