Associated Paper: Currently being reviewed
Model Page: huggingface.co/edgaremy/arthropod-detector
This repository provides scripts and metadata (in the src/ folder) to recreate ArthroNat, a detection dataset of French terrestrial arthropods.
The data is extracted from iNaturalist, and is designed to cover a wide variety of arthropod families. The data collection and annotation process is documented in this paper: (to be published).
We provide several scripts arranged in multiple folders: data download and processing scripts in src/, inference utilities in inference/, and training/validation workflows in training/ and validation/. Most figures shown in the paper associated with this repo are produced in the validation/ folder or the stats/ folder (statistical analysis of the datasets used).
Note that you don't necessarily need to have the dataset downloaded in order to play around with the figure plots, as results of most tests have been stored in intermediary .csv files. If you wish to do so, you can however start from scratch, download the dataset, train the models again and re-generate those result .csv files by yourself.
Each of those folders has its own README.md file detailing what scripts they contain and their function.
To run most scripts in this repository, you will need properly configured Python and R environments. The following section details the required setup.
git clone https://github.com/edgaremy/arthropod-detection-dataset.git
cd arthropod-detection-dataset
If you already know how to use Python environments, here are the libraries you need to install:
pip install ultralytics wget huggingface_hub seaborn pingouin pypalettes
This should suffice. However, if you encounter dependency issues, or want to reproduce the exact setup that was used to get our results, please favor option #2 below (the conda environment).
- Make sure you first have conda installed
- Create a new conda virtual env:
# You can replace "arthropod" by any name you like
conda create --name arthropod python=3.12.11
- You can now activate the venv and install the requirements with pip (if you already have another non-conda environment, you can run the pip command in it directly):
conda activate arthropod
pip install -r requirements.txt
The environment is now ready!
Note: you will need to activate the venv whenever you want to use it (see conda documentation for more details).
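Once the environment is activated, a quick sanity check can confirm that the Python dependencies are visible. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes each import name matches the pip package name listed above.

```python
from importlib.util import find_spec

# Assumption: import names are identical to the pip package names above.
REQUIRED = [
    "ultralytics",
    "wget",
    "huggingface_hub",
    "seaborn",
    "pingouin",
    "pypalettes",
]


def missing_packages(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If the check reports missing packages, the most likely cause is that the environment was not activated in the current shell.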
The validation scripts were checked with R 4.5.2. Install the packages used by the plotting and analysis scripts:
install.packages(c(
  "ggplot2",
  "dplyr",
  "tidyr",
  "broom",
  "paletteer",
  "sandwich",
  "lmtest",
  "gridExtra",
  "car",
  "betareg",
  "Hmisc",
  "mgcv"
))
The images on iNaturalist are published under various copyrights. As such, they cannot be provided directly, but they can be downloaded using our script.
python src/download_dataset.py
The src/download_dataset.py script downloads images from iNaturalist and creates the structure of the YOLO dataset according to the labels stored in resources/dataset_labels.zip. Please note that, due to API constraints, iNaturalist's server may block the download after a while. If that's the case, you should still be able to run the script again later, and it should resume where it stopped.
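To illustrate what "resuming where it stopped" can look like, here is a minimal sketch of a skip-existing download loop with retries. It is not the implementation used in src/download_dataset.py: the function name `download_missing`, the `(filename, url)` item layout, and the injected `fetch` callable are all assumptions made for illustration.

```python
import time
from pathlib import Path


def download_missing(items, out_dir, fetch, max_retries=3, wait_s=1.0):
    """Download only the files not already on disk, so a rerun resumes.

    items: iterable of (filename, url) pairs (hypothetical layout).
    fetch: callable taking a url and returning bytes (hypothetical hook).
    Returns the filenames that still failed after all retries.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for filename, url in items:
        target = out_dir / filename
        if target.exists():
            continue  # already downloaded during a previous run
        for attempt in range(max_retries):
            try:
                data = fetch(url)
                # Write under a temporary name first, so an interrupted
                # write never leaves a partial file that looks complete.
                tmp = target.with_name(target.name + ".part")
                tmp.write_bytes(data)
                tmp.replace(target)
                break
            except OSError:
                # Network errors from urllib are OSError subclasses.
                time.sleep(wait_s * (2 ** attempt))  # exponential backoff
        else:
            failed.append(filename)
    return failed
```

Because files already on disk are skipped, calling the function a second time with the same arguments only retries what is still missing.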
Note: You can also download additional validation data that was used to assess how well the detection model generalizes to new taxa. The generalization datasets are stored in the datasets(others)/ folder. For publicly available datasets (OOD and flatbug), we provide setup instructions and conversion scripts in their respective README files.
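For readers unfamiliar with YOLO labels, the sketch below shows how one label line maps to a pixel-space bounding box. It assumes the label files follow the standard Ultralytics detection format (`class x_center y_center width height`, all coordinates normalized to the image size); the function name `yolo_line_to_pixels` is hypothetical and not part of this repository.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, (x_min, y_min, x_max, y_max)) in pixels.

    Assumes the standard normalized format: class x_center y_center width height.
    """
    cls, xc, yc, w, h = line.split()
    # Scale the normalized values back to pixel units.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Convert from center/size to corner coordinates.
    box = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
    return int(cls), box


if __name__ == "__main__":
    # A box centered in a 200x100 image, half as wide and a quarter as tall.
    print(yolo_line_to_pixels("0 0.5 0.5 0.5 0.25", 200, 100))
```

Each line of a label file describes one object, so a file is parsed by applying the function to every non-empty line.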
If you just want to try the detection model directly, you don't need to download the dataset. The models are automatically downloaded from Hugging Face Hub.
# Process a single image (uses YOLO11n PyTorch model by default)
python inference/run_model_with_cmd.py path/to/image.jpg
# Use YOLO11l model with ONNX format and save all outputs
python inference/run_model_with_cmd.py images/ \
--format onnx --model-size l \
--save-crops --save-labels --save-bbox-view
# Use GPU and custom confidence threshold
python inference/run_model_with_cmd.py images/ --device cuda --conf 0.5
For complete documentation, see inference/README.md
from inference.run_model_with_cmd import load_model_from_huggingface, run_inference
# Load model (automatically downloads from Hugging Face Hub)
model = load_model_from_huggingface(
    repo_id="edgaremy/arthropod-detector",
    filename="yolo11l_ArthroNat+flatbug.onnx"  # or .pt for PyTorch
)
# Run inference with all output options
summary = run_inference(
    model=model,
    input_path="images/",
    results_folder="results",
    save_crops=True,
    save_labels=True,
    save_bbox_view=True,
    conf_threshold=0.5,
    device="cuda"
)
Available models:
- yolo11n_ArthroNat+flatbug.pt/.onnx - Nano (fastest)
- yolo11l_ArthroNat+flatbug.pt/.onnx - Large (most accurate)
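The model files listed above share one naming pattern, so the filename can be built from a size and a format. The helper below is a small sketch under that assumption; `model_filename` is a hypothetical name and does not exist in the repository.

```python
def model_filename(size="n", fmt="pt"):
    """Build a model filename following the pattern of the files listed above.

    size: "n" (Nano) or "l" (Large); fmt: "pt" (PyTorch) or "onnx".
    """
    if size not in ("n", "l"):
        raise ValueError(f"unknown model size: {size!r}")
    if fmt not in ("pt", "onnx"):
        raise ValueError(f"unknown model format: {fmt!r}")
    return f"yolo11{size}_ArthroNat+flatbug.{fmt}"


if __name__ == "__main__":
    # The result can be passed as the `filename` argument
    # of load_model_from_huggingface.
    print(model_filename("l", "onnx"))
```

Validating the arguments up front gives a clear error message instead of a failed download for a file that does not exist on the Hub.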
For more details, check out:
- inference/README.md - Comprehensive usage guide
- inference/example_module_usage.py - Python module examples
- The dedicated Hugging Face Model Repo 🤗
- Ultralytics Documentation