
HPC-AutoResearch

HPC-AutoResearch targets HPC environments with a Singularity-based, split-phase execution path. It orchestrates: idea loading/generation → BFTS (best-first tree search) experiments → plot aggregation → LaTeX writeup → optional PDF review.

At a glance

  • Split-phase execution with explicit install/coding/compile/run steps inside Singularity.
  • Per-run isolation: every experiment gets its own config, logs, workspace, and artifacts.
  • Tree-search agent manager with parallel workers (GPU-aware, CPU fallback).
  • Optional MemGPT-style memory with LLM-based compression across branches for longer context.
  • Resource files to mount datasets and inject templates/docs into prompts.
  • Configurable persona system for role-specific prompt customization.
  • Token tracking for monitoring LLM usage and costs.
  • Multi-seed evaluation for robust result validation.

Repository layout

  • launch_scientist_bfts.py: main launcher (ideas → experiments → plots/writeup/review).
  • generate_paper.py: plots/writeup/review for an existing run directory.
  • ai_scientist/: core agent logic, tree search, prompts, memory, and utilities.
    • memory/: MemGPT-style hierarchical memory implementation.
    • treesearch/: BFTS agent manager and parallel workers.
    • utils/: token tracking and model parameter utilities.
    • persona.py: configurable persona system for prompt customization.
  • prompt/: split-phase and stage prompts, response schemas, and writeup templates.
    • common/: base system prompts and domain-neutral instructions.
    • phases/: Phase 0 planning and Phase 1 installer prompts.
    • memory/: memory compression prompt templates.
    • schemas/: structured response schemas for split execution.
  • template/: base Singularity image instructions (template/README.md).
  • docs/: expanded guides for requirements, configuration, resources, outputs, and troubleshooting.
  • tests/: unit tests for memory, resources, compression, and parallelism.
  • experiments/: run outputs (generated at runtime).

Typical workflows

  1. Run end-to-end: generate ideas (optional) → run launch_scientist_bfts.py → review experiments/<run>/.
  2. Reuse a run: skip the experiment phase and run generate_paper.py to regenerate plots, writeup, and review.
  3. Iterate locally: use --phase_mode single for quick iteration without Singularity.
  4. Custom persona: set agent.role_description in config to customize the agent's role (e.g., "HPC Researcher").
  5. Enable memory: use --enable_memgpt or set memory.enabled=true for hierarchical context management.
    • Note: Without MemGPT, there is no context budget management. Idea/task descriptions are injected as full text, which may exceed LLM context limits for complex experiments. See docs/memory/memory.md for details.
  6. Parallel experiments: adjust --num_workers to scale across available GPUs.
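The workflows above can be sketched as shell commands. Only the flags named in this README (--phase_mode, --enable_memgpt, --num_workers) are taken from the source; the positional run-directory argument to generate_paper.py is an illustrative assumption, so check docs/getting-started/quickstart.md and each script's --help for the exact invocations:

```shell
# End-to-end run: ideas -> BFTS experiments -> plots/writeup/review,
# with 4 parallel workers and MemGPT-style memory enabled.
python launch_scientist_bfts.py --num_workers 4 --enable_memgpt

# Quick local iteration without Singularity.
python launch_scientist_bfts.py --phase_mode single

# Reuse an existing run directory for plots/writeup/review only
# (argument shape is an assumption; see generate_paper.py --help).
python generate_paper.py experiments/<run>
```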

Where to start

  • If you are new: start with docs/getting-started/requirements.md, docs/getting-started/installation.md, and docs/getting-started/quickstart.md.
  • If you are operating on HPC: read docs/configuration/execution-modes.md, docs/configuration/configuration.md, and docs/configuration/outputs.md.
  • If you are extending prompts/resources: read docs/architecture/llm-context.md and docs/architecture/resource-files.md.

Documentation

Detailed guides live in docs/. Start with docs/README.md for the full index.

Acknowledgement

The tree search component is built on top of the AIDE project. This project extends the original AI-Scientist-v2 with split-phase execution, MemGPT-style context engineering, and Singularity container support.
