
Turtilla/ud-treebank-iaa


UD Treebank Inter-Annotator Agreement

This code is intended for calculating inter-annotator agreement when annotating (parallel learner) treebanks in the UD format.

The implementation of Krippendorff's alpha for dependency trees (syn_agreement.py) comes from Cora Haiber's adaptation (https://gitlab.ruhr-uni-bochum.de/comphist/lrec-coling_2024_hgb/-/tree/main/scripts/alpha_agreement) of Arne Skjærholt's code (https://github.com/arnsholt/syn-agreement/). This version also includes changes that make the code importable as a module. See also LICENCES below.

Example usage:

To run the code, you need one or more annotation files per annotator, as well as the following helper .json files:

  • batches.json must contain a list of dictionaries, each of which represents one annotation batch (timestep). Each key names the annotators for that batch, either as a pair of (code)names separated by a hyphen (e.g. 'a1-a2') or as 'everyone' if everyone annotated the given sentences; the value of that key is the list of sentence IDs annotated in that batch.
  • paths.json is a list of paths to all of the .conllu files that are to be included in the analysis.
  • usernames.json is a dictionary in which each key is the (code)name of an annotator as used in batches.json and each value is the annotator name included in the name of the .conllu file(s). These should ideally be different.
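
The three helper files described above could be produced along the following lines. This is only a sketch based on the descriptions in this README: the sentence IDs, file paths, (code)names, and the function names `build_helper_files`, `check_batch_keys`, and `write_helper_files` are all made up for illustration and are not part of iaa.py. It also assumes that (code)names themselves contain no hyphens.

```python
import json
from pathlib import Path


def build_helper_files():
    """Return example contents for the three helper files (all values hypothetical)."""
    # One dictionary per annotation batch (timestep); the key is either a
    # hyphen-separated pair of (code)names or 'everyone'.
    batches = [
        {"everyone": ["sent-001", "sent-002"]},
        {"a1-a2": ["sent-003", "sent-004"]},
    ]
    # Paths to every .conllu file that should be part of the analysis.
    paths = [
        "annotations/anna_batch1.conllu",
        "annotations/ben_batch1.conllu",
    ]
    # (code)name used in batches.json -> annotator name used in the file names.
    usernames = {"a1": "anna", "a2": "ben"}
    return {
        "batches.json": batches,
        "paths.json": paths,
        "usernames.json": usernames,
    }


def check_batch_keys(batches, usernames):
    """Return the (code)names used in batches that are missing from usernames."""
    unknown = []
    for batch in batches:
        for key in batch:
            if key == "everyone":
                continue
            # Assumes (code)names contain no hyphens themselves.
            for name in key.split("-"):
                if name not in usernames and name not in unknown:
                    unknown.append(name)
    return unknown


def write_helper_files(files, target_dir="."):
    """Write each helper file as JSON into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for filename, content in files.items():
        text = json.dumps(content, indent=2, ensure_ascii=False)
        (target / filename).write_text(text, encoding="utf-8")
```

Calling `write_helper_files(build_helper_files())` would write the three files into the current directory; `check_batch_keys` is a quick sanity check that every (code)name in batches.json has an entry in usernames.json.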

Calculating agreement per sentence:

  - python3 iaa.py --annotators 'usernames.json' --paths 'paths.json' --batches 'batches.json' --truncated_id --parallel --output 'ud_swell_per_sent_agr' --per_sent 

Calculating dependency relation agreement per annotation batch:

  - python3 iaa.py --annotators 'usernames.json' --paths 'paths.json' --batches 'batches.json' --truncated_id --ann_type 'deprel' --parallel --output 'deprel-batches.json' 

Calculating lemma agreement relative to sentence length:

  - python3 iaa.py --annotators 'usernames.json' --paths 'paths.json' --bins --truncated_id --ann_type 'lemma' --parallel --output 'lemma-bins.json' 

LICENCES:

The files syn-agreement.py and conll.py are (c) 2014 Arne Skjærholt and released under the GNU GPL version 2 or later: http://gnu.org/licenses/gpl.html, with the Python 3 version made possible by Cora Haiber and adaptations from Maria Irena Szawerna.

The code in alpha.py is (c) 2011-2014 Thomas Grill and released under the Creative Commons Attribution-ShareAlike licence: http://creativecommons.org/licenses/by-sa/3.0/

The code in iaa.py is (c) 2026 Maria Irena Szawerna and is released under GNU GPL v.3 or later: https://www.gnu.org/licenses/gpl-3.0.html
