AI4Chem@LUAS: A Framework for Chemical Experiment Automation

Now Published

Can reasoning power significantly improve the knowledge of large language models for chemistry? --Based on conversations with LLMs

Dong-Xu Cui^⊥, Shi-Yu Long^⊥, Yi-Xuan Tang, Yue Zhao, and Qiao Li^*

Journal of Chemical Information and Modeling (IF=5.3, JCR Q1)

Published: August 25, 2025

1. Introduction

AI4Chem@LUAS is a framework for automating experiments using Large Language Models (LLMs). It allows users to define experiments and prompts in JSON configuration files and run them against various language models via the command line.

The project is structured to separate concerns into distinct modules:

configs/: Holds all user-facing configurations for tasks and prompts.
src/: Contains the core logic for API interaction, experiment execution, and file I/O.
main.py: Parses command-line arguments and orchestrates the experiment run.

This modular design allows for easier extension and maintenance.

2. Project Structure

AI4Chem@LUAS/
├── main.py                 # Main execution entry point
├── config.py               #   Stores all API keys
├── requirements.txt        #   Project dependencies
|
├── configs/                  # Contains all project JSON configurations
│   ├── prompts.json          #   All Prompt templates
│   └── tasks.json            #   All task definitions (paths, column names, etc.)
|
├── src/                      # Core source code directory
│   ├── __init__.py
│   ├── config_loader.py      #   Configuration loader
│   ├── experiment_runner.py  #   Experiment runner (core logic)
│   ├── llm_api.py            #   Encapsulates all LLM API calls
│   └── utils/                #   Common utility packages
│       ├── __init__.py
│       └── file_io.py        #     Encapsulates all file I/O operations
|
├── data/                     # Stores raw data CSV files
│   ├── task1_data.csv
│   ├── task2_data.csv
│   └── ...
|
├── results/                  # (Auto-generated) Experiment results output directory
├── logs/                     # (Auto-generated) Detailed log output directory
└── repair_results/           # (Auto-generated) Repair results directory

3. Getting Started

3.1. Prerequisites

Python 3.9 or higher
An API key from an LLM provider (e.g., OpenAI)

3.2. Installation

Clone the repository
```
git clone <repository-url>
cd LLM_Chem
```
Create and activate a virtual environment (recommended) :
```
python3 -m venv venv
source venv/bin/activate
```

Install the required dependencies:

pip install -r requirements.txt

pip install -r requirements.txt

3.3. Configuration

API Keys: Open the config.py file and replace the placeholder values with your actual API keys and base URLs.

# config.py
DASHSCOPE_API_KEY = "YOUR_DASHSCOPE_API_KEY"
DASHSCOPE_BASE_URL = "YOUR_DASHSCOPE_BASE_URL"

Data: Place your data files (in CSV format) in the data/ directory.
Tasks: Define your tasks in configs/tasks.json. Each task should have a unique ID and specify the data file, column mapping, and any learning materials.
Prompts: Define your prompts in configs/prompts.json. Each prompt should have a unique ID and specify the prompt template and the task type it belongs to.
Learning Materials: The framework supports two types of learning materials that are used as system prompts for the LLM:
- General Learning Materials: These are general context files for a whole task (e.g., 1HNMR_spectroscopy.md for all of Task 6). Their paths are specified in configs/tasks.json under the context_filepath key.
- Specific Learning Materials: These are materials for a specific experiment (e.g., for experiment 6-4 only). To enable this, you must set "requires_learning_material": true for the prompt in configs/prompts.json. The framework will then automatically look for a file named task_{exp_id}_specific_material.md in the learning_materials/ directory (e.g., task_6-4_specific_material.md).

4. How to Run Experiments

This project is designed to be run from the command line. You can specify the model and the experiments you want to run using command-line arguments.

4.1. Command-Line Arguments

--model: (Required) The name of the model to use.
--experiments: (Required) One or more experiment IDs to run.

4.2. Examples

1. Running a single experiment

python main.py --model qwen-max-latest --experiments 6-1

2. Running multiple experiments

python main.py --model deepseek-reasoner --experiments 6-1 6-2 6-3

3. Getting help

If you provide invalid arguments or need to see the available options, you can use the --help flag.

python main.py --help

5. Output

The results of each experiment are saved in the results/ directory, with a subdirectory for each experiment. The output includes:

CSV file: A CSV file containing the input and the model's output for each data point.
Text file: A text file containing a summary of the experiment, including the system prompt and the full model output for each data point.
JSONL file: A JSONL file in the logs/ directory containing detailed logs of all interactions with the LLM.

6. How to Extend the Framework

6.1. Adding a New Tas

Add a new entry to configs/tasks.json with a unique ID.
Specify the data_filepath, column_map, and any other relevant options.
Add your data file to the data/ directory.

6.2. Adding a New Prompt

Add a new entry to configs/prompts.json with a unique ID.
Specify the description, type (which should match a task type in tasks.json), and template.

6.3. Adding a New LLM Provider

Add a new function to src/llm_api.py to handle the API calls for the new provider.
Update the get_llm_response function to call the new function based on the client_type in the MODEL_CONFIG.
Add the new API key and base URL to config.py.

7. Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue if you have any suggestions or find any bugs.

8. License

This project is licensed under the MIT License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.idea		.idea
configs		configs
src		src
.gitignore		.gitignore
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI4Chem@LUAS: A Framework for Chemical Experiment Automation

Now Published

Can reasoning power significantly improve the knowledge of large language models for chemistry? --Based on conversations with LLMs

1. Introduction

2. Project Structure

3. Getting Started

3.1. Prerequisites

3.2. Installation

3.3. Configuration

4. How to Run Experiments

4.1. Command-Line Arguments

4.2. Examples

5. Output

6. How to Extend the Framework

6.1. Adding a New Tas

6.2. Adding a New Prompt

6.3. Adding a New LLM Provider

7. Contributing

8. License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AI4Chem@LUAS: A Framework for Chemical Experiment Automation

Now Published

Can reasoning power significantly improve the knowledge of large language models for chemistry? --Based on conversations with LLMs

1. Introduction

2. Project Structure

3. Getting Started

3.1. Prerequisites

3.2. Installation

3.3. Configuration

4. How to Run Experiments

4.1. Command-Line Arguments

4.2. Examples

5. Output

6. How to Extend the Framework

6.1. Adding a New Tas

6.2. Adding a New Prompt

6.3. Adding a New LLM Provider

7. Contributing

8. License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages