| Large Language Model | Obj. Gender | Obj. Race | Obj. Avg. | Subj. Gender | Subj. Race | Subj. Avg. | Avg. Gender | Avg. Race | Avg. Avg. |
|---|---|---|---|---|---|---|---|---|---|
| 👑 GPT-4o | 95.56 | 54.62 | 75.09 | 98.39 | 96.18 | 97.29 | 96.98 | 75.40 | 86.19 |
| LLaMA-3.2 | 96.67 | 47.22 | 71.95 | 98.67 | 97.20 | 97.93 | 97.67 | 72.21 | 84.94 |
| Qwen-2.5 | 91.11 | 52.78 | 71.95 | 98.83 | 96.40 | 97.61 | 94.97 | 74.59 | 84.79 |
| WizardLM-2 | 96.67 | 44.44 | 70.56 | 99.17 | 97.51 | 98.34 | 97.92 | 70.97 | 84.45 |
| Gemini-1.5 | 94.44 | 44.44 | 69.44 | 98.13 | 97.67 | 97.90 | 96.28 | 71.05 | 83.67 |
| GPT-3.5 | 84.44 | 39.81 | 62.13 | 98.48 | 96.28 | 97.38 | 91.46 | 68.04 | 79.75 |

| Text-to-Image Model | Obj. Gender | Obj. Race | Obj. Avg. | Subj. Gender | Subj. Race | Subj. Avg. | Avg. Gender | Avg. Race | Avg. Avg. |
|---|---|---|---|---|---|---|---|---|---|
| 👑 DALL-E 3 | 58.40 | 30.33 | 44.37 | 96.35 | 84.93 | 90.64 | 77.38 | 57.63 | 67.50 |
| Midjourney | 48.90 | 25.36 | 37.13 | 99.00 | 75.99 | 87.50 | 73.95 | 50.68 | 62.31 |
| SDXL | 51.97 | 22.50 | 37.24 | 98.61 | 74.40 | 86.51 | 75.29 | 48.45 | 61.87 |
| FLUX-1.1 | 49.07 | 23.50 | 36.29 | 91.66 | 30.36 | 61.01 | 70.37 | 26.93 | 48.65 |
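
The averaged columns in both tables appear consistent with plain arithmetic means of the four base scores (objective and subjective, gender and race). The sketch below checks this for the GPT-4o row; the grouping of the columns is inferred from the numbers rather than stated in the table, and `mean` is a helper written for this illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# GPT-4o base scores copied from the table above.
obj_gender, obj_race = 95.56, 54.62
subj_gender, subj_race = 98.39, 96.18

obj_avg = mean([obj_gender, obj_race])        # table reports 75.09
subj_avg = mean([subj_gender, subj_race])     # table reports 97.29
avg_gender = mean([obj_gender, subj_gender])  # table reports 96.98
avg_race = mean([obj_race, subj_race])        # table reports 75.40
# Overall score: mean of all four base scores (table reports 86.19).
overall = mean([obj_gender, obj_race, subj_gender, subj_race])

print(f"{obj_avg:.2f} {subj_avg:.2f} {avg_gender:.2f} {avg_race:.2f} {overall:.2f}")
```

Because the table values are already rounded to two decimals, recomputed means can differ from the reported ones in the last digit.
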

[Feb. 09, 2025] Published the arXiv preprint: arXiv:2502.05849

Please ensure your system meets the following requirements:
- Python: Version 3.10 or higher
- Dependencies: Install the required libraries by running the following command in the root directory:

      pip install -r requirements.txt


The available models for testing are stored in `large_language_model/models.py`. Set the environment variables according to the model you choose to use:

    export OPENAI_API_KEY="your_api_key_here" # Leave empty if not used
    export GEMINI_API_KEY="your_api_key_here" # Leave empty if not used
    export DEEPINFRA_TOKEN="your_api_key_here" # Leave empty if not used

- `large_language_model/objective_test/`: Contains scripts for testing objective queries.
- `large_language_model/subjective_test/`: Contains scripts for testing subjective queries and daily scenarios.
- `large_language_model/subjective_test/prompts_gen.py`: A script for generating prompts related to daily scenarios.
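
Before running the tests, it may help to confirm which of the API keys listed above are actually set in the current shell. This is a minimal sketch that assumes only the three variable names from the setup step; `configured_keys` is a hypothetical helper, not part of the repository.

```python
import os

# Variable names taken from the setup instructions above.
LLM_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "DEEPINFRA_TOKEN")


def configured_keys(names=LLM_KEYS, env=None):
    """Return the names from `names` that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in names if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_keys()
    missing = [name for name in LLM_KEYS if name not in found]
    print("set:", ", ".join(found) or "none")
    print("empty or unset:", ", ".join(missing) or "none")
```

A key reported as empty or unset is only a problem if the model you picked needs it.
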

1. Navigate to the project directory:

       cd large_language_model

2. Provide execution permissions to the test script:

       chmod +x run_llm_test.sh

3. Run the test script:

       ./run_llm_test.sh

Test results will be saved in the following directories:

- `objective_test/results/` for objective queries.
- `subjective_test/results/` for subjective queries.
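
Once the script finishes, a quick way to see what was produced is to walk the two results directories. The sketch below assumes it is run from `large_language_model/` and makes no assumption about the format of the result files; `list_results` is a hypothetical helper name.

```python
from pathlib import Path

# Directory names from the instructions above, relative to large_language_model/.
RESULT_DIRS = ("objective_test/results", "subjective_test/results")


def list_results(root=".", dirs=RESULT_DIRS):
    """Map each results directory to the sorted list of files found under it."""
    found = {}
    for rel in dirs:
        folder = Path(root) / rel
        if folder.is_dir():
            found[rel] = sorted(p for p in folder.rglob("*") if p.is_file())
        else:
            # Directory missing: the tests may not have been run yet.
            found[rel] = []
    return found


if __name__ == "__main__":
    for rel, files in list_results().items():
        print(f"{rel}: {len(files)} file(s)")
        for path in files:
            print("  ", path)
```
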

To visualize the test results, run:

    python visualization.py

The visualizations will be saved in the `fig_results/` directory.


For the text-to-image models, set the environment variables according to the model you choose to use:

    export OPENAI_API_KEY="your_api_key_here" # Leave empty if not used
    export MIDJOURNEY_API_SECRET="your_api_secret_here" # Leave empty if not used
    export DEEPINFRA_TOKEN="your_api_key_here" # Leave empty if not used

- `text_to_image_model/objective_test/`: Contains scripts for testing objective queries.
- `text_to_image_model/subjective_test/`: Contains scripts for testing subjective queries.

1. Navigate to the project directory:

       cd text_to_image_model

2. Provide execution permissions to the scripts:

       chmod +x image_generate.sh
       chmod +x run_t2i_test.sh

3. Generate the images:

       ./image_generate.sh

   Midjourney images must be placed manually because they come from a third-party setup. After this step, put them in their correct folders.

4. Run the analysis:

       ./run_t2i_test.sh

   This script runs all analysis steps for both objective and subjective tests. Here, we use FairFace as the detection tool. The code comparing the accuracy of DeepFace and FairFace can be found in `detector_accuracy_test/`.
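
Comparing two detectors generally comes down to scoring each one's predictions against the same labeled images. The sketch below illustrates that idea only: it is not the code in `detector_accuracy_test/`, it does not call FairFace or DeepFace, and the labels are toy values invented for the example.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty label set")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy labels invented for illustration; they are not results from any detector.
truth = ["female", "male", "female", "male"]
detector_a = ["female", "male", "male", "male"]
detector_b = ["female", "male", "female", "male"]

scores = {
    "detector_a": accuracy(detector_a, truth),
    "detector_b": accuracy(detector_b, truth),
}
# Pick the detector with the highest accuracy on the shared label set.
best = max(scores, key=scores.get)
print(scores, "->", best)
```
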

Test results will be saved in the following directories:

- `objective_test/Test_Results/` for objective queries.
- `subjective_test/Test_Results/` for subjective queries.

To visualize the test results, run:

    python visualization.py

The visualizations will be saved in the `Results_Visual/` directory.

For more details, please refer to our paper (arXiv:2502.05849).

If you find our paper and tool interesting and useful, please feel free to give us a star and cite us:

    @inproceedings{huang2025fact,
      title={Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases},
      author={Huang, Jen-tse and Yan, Yuhang and Liu, Linqi and Wan, Yixin and Wang, Wenxuan and Chang, Kai-Wei and Lyu, Michael R},
      booktitle={Findings of the Association for Computational Linguistics: EMNLP 2025},
      pages={10974--10993},
      year={2025}
    }

