edgaremy/arthropod-detection-dataset


French Arthropod Detection Dataset 🐞

A detection dataset containing labelled images of French Terrestrial Arthropods.

Associated Paper: Currently being reviewed
Model Page: huggingface.co/edgaremy/arthropod-detector

What's in this repository

Dataset

This repository provides the scripts and metadata (in the src/ folder) needed to recreate ArthroNat, the French terrestrial arthropod detection dataset. The data is extracted from iNaturalist and is designed to cover a wide variety of arthropod families. The data collection and annotation process is documented in the associated paper (to be published).

Code

We provide several scripts arranged in multiple folders: data download and processing scripts in src/, inference utilities in inference/, and training/validation workflows in training/ and validation/. Most figures shown in the paper associated with this repository are produced in the validation/ folder or the stats/ folder (statistical analysis of the datasets used).

Note that you don't need to download the dataset to experiment with the figure plots, since the results of most tests are stored in intermediate .csv files. If you prefer, you can start from scratch: download the dataset, retrain the models, and regenerate those result .csv files yourself.

Each of those folders has its own README.md file detailing the scripts it contains and what they do.
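To see which result files are available before plotting, a small helper can list them. The sketch below is not part of the repository: it assumes the .csv files live somewhere under validation/ and stats/, are comma-separated and UTF-8 encoded, and start with a header row. The function name `list_result_csvs` is hypothetical.

```python
import csv
from pathlib import Path


def list_result_csvs(repo_root=".", folders=("validation", "stats")):
    """Map each .csv file found under the given folders to its header row."""
    headers = {}
    for folder in folders:
        base = Path(repo_root, folder)
        if not base.is_dir():
            continue  # folder absent (e.g. script run from another directory)
        for path in sorted(base.rglob("*.csv")):
            with path.open(newline="", encoding="utf-8") as handle:
                # An empty file yields an empty header list.
                headers[path] = next(csv.reader(handle), [])
    return headers


if __name__ == "__main__":
    for path, columns in list_result_csvs().items():
        print(path, "->", ", ".join(columns))
```

Run it from the repository root; the printed column names show what each file contains before you open the matching plotting script.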

Requirements

To run most scripts in this repository, you will need working Python and R environments. The following sections detail the setup process.


Set up Python

Clone repository

git clone https://github.com/edgaremy/arthropod-detection-dataset.git
cd arthropod-detection-dataset

Option #1: Quick setup

If you already know how to use Python environments, here are the libraries you need to install:

pip install ultralytics wget huggingface_hub seaborn pingouin pypalettes

This should suffice. If you encounter dependency issues, or want to reproduce the exact setup that was used to obtain our results, please use option #2 below.

Option #2: For reproducibility - Set up a Python environment using Conda

# You can replace "arthropod" with any name you like
conda create --name arthropod python=3.12.11

  • You can now activate the environment and install the requirements with pip (if you already have another non-conda environment, you can do this directly):

conda activate arthropod
pip install -r requirements.txt

The environment is now ready!

Note: you will need to activate the environment whenever you want to use it (see the conda documentation for more details).
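To confirm that the active environment has what the scripts need, you can run a quick check like the one below. It is a minimal sketch, not a script shipped with the repository, and it assumes that each package from the quick-setup pip command is imported under the same name it is installed with.

```python
from importlib.util import find_spec

# Assumed import names for the packages in the quick-setup pip command.
REQUIRED_MODULES = [
    "ultralytics",
    "wget",
    "huggingface_hub",
    "seaborn",
    "pingouin",
    "pypalettes",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks the modules up without importing them, so it stays fast even for heavy packages.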


Set up R

The validation scripts were checked with R 4.5.2. Install the packages used by the plotting and analysis scripts:

install.packages(c(
    "ggplot2",
    "dplyr",
    "tidyr",
    "broom",
    "paletteer",
    "sandwich",
    "lmtest",
    "gridExtra",
    "car",
    "betareg",
    "Hmisc",
    "mgcv"
))

Download the dataset

The images on iNaturalist are published under various licenses. As such, we cannot redistribute them directly, but you can download them using our script.

python src/download_dataset.py

The src/download_dataset.py script downloads images from iNaturalist and builds the YOLO dataset structure from the labels stored in resources/dataset_labels.zip. Please note that, due to API constraints, iNaturalist's server may block the download after a while. If that's the case, you should still be able to run the script again later, and it should resume where it stopped.

Note: You can also download additional validation data that was used to assess how well the detection model generalizes to new taxa. The generalization datasets are stored in the datasets(others)/ folder. For publicly available datasets (OOD and flatbug), we provide setup instructions and conversion scripts in their respective README files.
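The usual way to make a download resumable is to skip files that are already on disk. The sketch below illustrates that general pattern only; it is not the code of src/download_dataset.py, whose internals are not described here. The names `fetch_bytes` and `download_missing`, and the (filename, url) input format, are hypothetical.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_missing(items, out_dir, fetch=fetch_bytes):
    """Download every (filename, url) pair whose file is not already on disk.

    Returns the list of filenames downloaded during this call.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for filename, url in items:
        target = out_dir / filename
        if target.exists():
            continue  # fetched by an earlier run, nothing to do
        data = fetch(url)
        # Write under a temporary name first, so an interrupted run never
        # leaves a truncated file that a later run would treat as complete.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(data)
        partial.replace(target)
        downloaded.append(filename)
    return downloaded
```

With this pattern, rerunning after a block only requests the images that are still missing, which keeps the load on the server low.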


Use the detection model

If you just want to try the detection model directly, you don't need to download the dataset. The models are automatically downloaded from Hugging Face Hub.

Quick start - Command line

# Process a single image (uses YOLO11n PyTorch model by default)
python inference/run_model_with_cmd.py path/to/image.jpg

# Use YOLO11l model with ONNX format and save all outputs
python inference/run_model_with_cmd.py images/ \
    --format onnx --model-size l \
    --save-crops --save-labels --save-bbox-view

# Use GPU and custom confidence threshold
python inference/run_model_with_cmd.py images/ --device cuda --conf 0.5

For complete documentation, see inference/README.md

Use as Python module

from inference.run_model_with_cmd import load_model_from_huggingface, run_inference

# Load model (automatically downloads from Hugging Face Hub)
model = load_model_from_huggingface(
    repo_id="edgaremy/arthropod-detector",
    filename="yolo11l_ArthroNat+flatbug.onnx"  # or .pt for PyTorch
)

# Run inference with all output options
summary = run_inference(
    model=model,
    input_path="images/",
    results_folder="results",
    save_crops=True,
    save_labels=True,
    save_bbox_view=True,
    conf_threshold=0.5,
    device="cuda"
)

Available models:

  • yolo11n_ArthroNat+flatbug.pt / .onnx - Nano (fastest)
  • yolo11l_ArthroNat+flatbug.pt / .onnx - Large (most accurate)
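If you would rather call the underlying libraries directly, the filenames above follow a regular pattern that can be built from the model size and format. The sketch below assumes that the weights load with the standard `huggingface_hub.hf_hub_download` and `ultralytics.YOLO` calls; whether the repository's own helpers work the same way internally is not stated here. The names `model_filename` and `detect` are hypothetical.

```python
MODEL_REPO = "edgaremy/arthropod-detector"


def model_filename(size="n", fmt="pt"):
    """Build a weight filename following the 'Available models' list."""
    if size not in ("n", "l"):
        raise ValueError("size must be 'n' (nano) or 'l' (large)")
    if fmt not in ("pt", "onnx"):
        raise ValueError("fmt must be 'pt' or 'onnx'")
    return f"yolo11{size}_ArthroNat+flatbug.{fmt}"


def detect(image_path, size="n", fmt="pt", conf=0.5):
    """Download the weights if needed and return (box, confidence) pairs."""
    # Imported lazily so the filename helper works without these packages.
    from huggingface_hub import hf_hub_download
    from ultralytics import YOLO

    weights = hf_hub_download(repo_id=MODEL_REPO, filename=model_filename(size, fmt))
    model = YOLO(weights)
    result = model.predict(image_path, conf=conf)[0]
    boxes = result.boxes.xyxy.tolist()  # [x1, y1, x2, y2] in pixels
    scores = result.boxes.conf.tolist()
    return list(zip(boxes, scores))
```

For example, `detect("path/to/image.jpg", size="l", fmt="onnx")` would use the large ONNX model. Downloaded weights are cached by huggingface_hub, so later calls reuse the local copy.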

For more details, check out:


Going further

🐞 🐜 🦋 🦗 🐝 🕷️ 🐛 🪰 🪲
