8 changes: 4 additions & 4 deletions README.md
@@ -23,8 +23,8 @@ The agent was used to produce the trajectories for the [BixBench benchmark](http

```bash
# Clone the repository
-git clone https://github.com/Future-House/data-analysis-crow.git
-cd data-analysis-crow
+git clone https://github.com/Future-House/finch.git
+cd finch

# Install dependencies
pip install -e .
@@ -46,7 +46,7 @@ ANTHROPIC_API_KEY = "your-anthropic-api-key"

## Using the Agent

-The agent works by taking a dataset and a prompt, then iteratively building a Jupyter notebook to answer the question. Visit the [tutorial](https://github.com/Future-House/data-analysis-crow/blob/main/tutorial/example.ipynb) for a simple step-by-step guide on how to use the agent.
+The agent works by taking a dataset and a prompt, then iteratively building a Jupyter notebook to answer the question. Visit the [tutorial](https://github.com/Future-House/finch/blob/main/tutorial/example.ipynb) for a simple step-by-step guide on how to use the agent.

## Advanced Usage
For advanced evaluations, you can configure `server.yaml` and `runner.yaml` in the `src/scripts/bixbench_evaluation` directory and then run the evaluation script:
@@ -62,7 +62,7 @@ This will:

Results are saved in the output directory specified in your configuration file.

-Note that the dataset and environment configuration must be updated appropriately. For an example, see [dataset.py](https://github.com/Future-House/data-analysis-crow/blob/main/src/fhda/dataset.py) which includes the capsule dataset configuration used for the BixBench benchmark.
+Note that the dataset and environment configuration must be updated appropriately. For an example, see [dataset.py](https://github.com/Future-House/finch/blob/main/src/fhda/dataset.py) which includes the capsule dataset configuration used for the BixBench benchmark.

We also recommend visiting the BixBench repository where we share a full evaluation harness for the agent.

12 changes: 6 additions & 6 deletions tutorial/example.ipynb
@@ -75,7 +75,6 @@
" language=language,\n",
" system_prompt=prompts.CAPSULE_SYSTEM_PROMPT_QUERY,\n",
" use_tmp_work_dir=False,\n",
-" run_notebook_on_edit=True if cfg.USE_DOCKER else False,\n",
" )\n",
" return dae"
]
@@ -94,11 +93,11 @@
"# If using docker, be sure to pull the image from docker hub first\n",
"# docker pull futurehouse/bixbench:aviary-notebook-env\n",
"# This image includes many bioinformatics and data science packages\n",
-"cfg.USE_DOCKER = False\n",
+"cfg.USE_DOCKER = True\n",
"# This can be R or PYTHON in Docker or with a local kernel if you have R installed\n",
"LANGUAGE = NBLanguage.PYTHON\n",
-"MAX_STEPS = 3\n",
-"MODEL_NAME = \"claude-3-7-sonnet-latest\""
+"MAX_STEPS = 20\n",
+"MODEL_NAME = \"claude-sonnet-4-5-20250929\""
]
},
{
@@ -173,7 +172,7 @@
"outputs": [],
"source": [
"# VANILLA ROLLOUT - this is a simple version of the what the rollout Manager does\n",
-"dataset_folder = Path(\"dataset\")\n",
+"dataset_folder = Path(\"datasets/brain_size_data.csv\")\n",
"query = \"Analyze the dataset and give me an in depth analysis using pretty plots. I am particularly interested in crows.\"\n",
"environment = setup_data_analysis_env(query, dataset_folder)\n",
"\n",
Expand Down Expand Up @@ -222,7 +221,8 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
"pygments_lexer": "ipython3",
"version": "3.13.2"
}
},
"nbformat": 4,