This directory contains the data tables and Python code used to build the figures and correlation tables reported in the paper "Subjectify: Mining Human Preference Data for Perceptual Media Quality". All plotting commands write PNG files.
Install the dependencies:
python -m pip install -r requirements.txt
Directory layout:
data/scores/ Reference quality scores used for correlations
data/experiments/ Experiment tables, grouped by dataset and condition
data/user_info/ Assessor device, display, viewing-condition, and demographic table
figures/ PNG figures generated by the scripts
scripts/ Command-line entry points
src/ Loading, scoring, sampling, and plotting code
Pairwise experiment tables use this schema:
answer,left_method,participant,right_method,test_case
answer is either the selected method or empty for a tie/no preference.
participant is the session identifier. test_case stores the evaluated
sequence and condition.
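As an illustration of the schema above, the following minimal sketch tallies wins and ties from a pairwise table. It is not code from `src/`; the sample rows, the method names `x264` and `x265`, and the helper `count_votes` are hypothetical and exist only for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the pairwise schema; not taken from the dataset.
SAMPLE = """answer,left_method,participant,right_method,test_case
x264,x264,p01,x265,clip_a
x265,x264,p02,x265,clip_a
,x264,p03,x265,clip_a
x265,x264,p01,x265,clip_b
"""


def count_votes(csv_file):
    """Return (wins per method, number of ties) for one pairwise table."""
    wins = Counter()
    ties = 0
    for row in csv.DictReader(csv_file):
        if row["answer"] == "":
            ties += 1  # empty answer: tie / no preference
        else:
            wins[row["answer"]] += 1
    return wins, ties


wins, ties = count_votes(io.StringIO(SAMPLE))
print(dict(wins), ties)
```

To run the same tally on a real table, pass an open file handle for one of the CSV files under data/experiments/ instead of the in-memory sample.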
Training-question tables are stored in:
data/experiments/live_vqa_training/training_questions.csv
data/experiments/vqeg_training/training_questions.csv
They use this schema:
participant,test_case,passed_first_try,replay_count
The Maxwell multiaspect table is stored in:
data/experiments/maxwell_multiaspect/multiaspect.csv
It contains one row per assessed method/video/participant side:
method,test_case,participant,fluency,exposure,contrast,color,sharpness,noise,compression_artifacts,aesthetic
Aspect values are encoded as -1, 0, and 1.
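A minimal sketch of how the -1/0/1 aspect values could be averaged per method, for example as input to a radar chart. This is an assumption about one reasonable aggregation, not necessarily what `scripts/plot_multiaspect_radar.py` does; the sample rows and the helper `mean_aspects` are hypothetical.

```python
import csv
import io
from collections import defaultdict

ASPECTS = ["fluency", "exposure", "contrast", "color", "sharpness",
           "noise", "compression_artifacts", "aesthetic"]

# Hypothetical sample rows in the multiaspect schema; not taken from the dataset.
SAMPLE = """method,test_case,participant,fluency,exposure,contrast,color,sharpness,noise,compression_artifacts,aesthetic
m1,v1,p01,1,0,0,1,1,0,-1,1
m1,v1,p02,1,0,1,1,0,0,-1,0
m2,v1,p01,-1,0,0,-1,-1,0,1,-1
"""


def mean_aspects(csv_file):
    """Return {method: {aspect: mean value}} over all rows of the table."""
    sums = defaultdict(lambda: dict.fromkeys(ASPECTS, 0))
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        counts[row["method"]] += 1
        for aspect in ASPECTS:
            sums[row["method"]][aspect] += int(row[aspect])
    return {
        method: {aspect: sums[method][aspect] / counts[method] for aspect in ASPECTS}
        for method in sums
    }


means = mean_aspects(io.StringIO(SAMPLE))
print(means["m1"]["contrast"])
```

The sketch only averages the encoded values; it makes no claim about what each of -1, 0, and 1 means for a given aspect.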
The assessor-info table is stored in:
data/user_info/user_info.csv
It contains one row per participant session code, with columns for
screen metadata, viewing conditions, visual checks, demographics, display color
gamut, and codec capability flags.
| Data | Paper reference and URL if listed in the article |
|---|---|
| LIVE VQA | K. Seshadrinathan et al., “Study of subjective and objective quality assessment of video,” IEEE TIP, 2010. |
| Netflix Public Dataset | Netflix, “Netflix public dataset,” https://github.com/Netflix/vmaf/blob/master/resource/doc/datasets.md |
| VQEGHD3 | Video Quality Experts Group, HDTV validation report: https://www.vqeg.org/umbraco/surface/FolderList/GetFile?directory=2010%2005%20AGH%20U%20Poland&filename=VQEG%20HDTV%20Final%20Report%20version%202.0.pdf&m=0&pageId=1669 |
| ETRI-LIVE STSVQ | D. Y. Lee et al., “A subjective and objective study of space-time subsampled video quality,” IEEE TIP, 2022. |
| MCML 4K | M. Cheon and J.-S. Lee, “Subjective and objective quality assessment of compressed 4K UHD videos for immersive experience,” IEEE TCSVT, 2018. |
| SJTU 4K | L. Song et al., “The SJTU 4K video sequence dataset,” QoMEX, 2013. |
| AVT-VQDB-UHD-1 | R. R. R. Rao et al., “AVT-VQDB-UHD-1: A large scale video quality database for UHD-1,” IEEE ISM, 2019. |
| YouTube UGC | Y. Wang et al., “YouTube UGC dataset for video compression research,” MMSP, 2019; J. G. Yim et al., “Subjective quality assessment for YouTube UGC dataset,” ICIP, 2020. |
| MaxwellDB | H. Wu et al., “Towards explainable in-the-wild video quality assessment: A database and a language-prompted approach,” ACM MM, 2023. |
Each convergence figure is generated by a separate command:
python scripts/plot_convergence.py data/experiments/netflix_view_modes --sample-count 500 --workers 20
python scripts/plot_convergence.py data/experiments/vqeg_view_modes --sample-count 500 --workers 20
python scripts/plot_convergence.py data/experiments/live_vqa_view_modes --sample-count 500 --workers 20
python scripts/plot_convergence.py data/experiments/live_vqa_training --sample-count 500 --workers 20
The output files are:
figures/netflix_bradley_terry_convergence_errorbars.png
figures/vqeg_bradley_terry_convergence_errorbars.png
figures/live_vqa_bradley_terry_convergence_errorbars.png
figures/training_convergence_errorbars.png
Build the Maxwell multiaspect radar chart:
python scripts/plot_multiaspect_radar.py data/experiments/maxwell_multiaspect/multiaspect.csv --output figures/multiaspect_radar_Maxwell.png --max-methods 7
Build assessor-info figures:
python scripts/plot_user_info.py data/user_info/user_info.csv --output-dir figures/user_info
Correlation commands read one experiment directory and print a table. By default, the command reports SROCC at 100% of the vote table using Bradley-Terry scores.
python scripts/compute_correlations.py data/experiments/youtube_ugc_pairwise --metrics srocc plcc
python scripts/compute_correlations.py data/experiments/maxwell_pairwise --metrics srocc plcc
Available ranking models:
bradley-terry
thurstone
elo
copeland
trueskill
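For readers unfamiliar with the default model, the following is a minimal sketch of Bradley-Terry strength estimation using the standard minorization-maximization update. It is an illustration only and is not claimed to match the implementation in `src/`. The function name `bradley_terry`, the `(winner, loser)` input format, and the choice to leave ties out of the input are assumptions made for this example.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict of strengths normalized to sum to 1.
    Assumption: ties are dropped before calling this function.
    """
    wins = defaultdict(float)         # total wins per method
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    methods = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        methods.update((winner, loser))
    methods = sorted(methods)

    strength = {m: 1.0 / len(methods) for m in methods}
    for _ in range(iterations):
        updated = {}
        for i in methods:
            # MM update: wins_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in methods
                if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {m: value / total for m, value in updated.items()}
    return strength


# Hypothetical votes: each pair compared four times.
votes = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
scores = bradley_terry(votes)
print(sorted(scores, key=scores.get, reverse=True))
```

A method that never wins ends up with strength 0 under this update, so a practical implementation would typically add regularization or a prior; that detail is omitted here.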
Select a model with --model:
python scripts/compute_correlations.py data/experiments/live_vqa_view_modes --model elo --metrics srocc plcc
The same model option is available for convergence plots:
python scripts/plot_convergence.py data/experiments/live_vqa_view_modes --model thurstone --sample-count 500 --workers 20
The 4K experiments use one directory per dataset. Each directory contains downscale, center crop, saliency crop, and split frame variants for the available presentation sizes.
python scripts/compute_correlations.py data/experiments/4k_avt_vqdb_uhd_1 --metrics srocc plcc
python scripts/compute_correlations.py data/experiments/4k_etri_live_stsvq --metrics srocc plcc
python scripts/compute_correlations.py data/experiments/4k_mcml_4k --metrics srocc plcc
python scripts/compute_correlations.py data/experiments/4k_sjtu_4k --metrics srocc plcc
Print per-question statistics for the training stage:
python scripts/training_question_stats.py data/experiments/live_vqa_training/training_questions.csv data/experiments/vqeg_training/training_questions.csv
The output includes the number of rows, unique participants, first-try pass
rate, mean replay count, and maximum replay count for each test_case.
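The statistics listed above can be sketched as follows. This is an illustrative reimplementation, not the code of `scripts/training_question_stats.py`; the sample rows and the helper `question_stats` are hypothetical, and the sketch assumes that `passed_first_try` is stored as 1/0 or true/false, which the schema does not specify.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the training-question schema.
SAMPLE = """participant,test_case,passed_first_try,replay_count
p01,q1,1,0
p02,q1,0,2
p03,q1,1,1
p01,q2,1,0
"""


def question_stats(csv_file):
    """Return per-test_case summary statistics for a training-question table."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["test_case"]].append(row)

    stats = {}
    for test_case, rows in groups.items():
        replays = [int(r["replay_count"]) for r in rows]
        # Assumption: passed_first_try is encoded as 1/0 or true/false.
        passed = [r["passed_first_try"].strip().lower() in ("1", "true") for r in rows]
        stats[test_case] = {
            "rows": len(rows),
            "participants": len({r["participant"] for r in rows}),
            "first_try_pass_rate": sum(passed) / len(rows),
            "mean_replay_count": sum(replays) / len(replays),
            "max_replay_count": max(replays),
        }
    return stats


stats = question_stats(io.StringIO(SAMPLE))
for test_case, values in stats.items():
    print(test_case, values)
```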