docs/api/benchmarks.rst
pythainlp.benchmarks
====================

Introduction
------------

The `pythainlp.benchmarks` module is a collection of utility functions designed for benchmarking tasks related to Thai Natural Language Processing (NLP). The module includes tools for word tokenization benchmarking, as well as evaluation metrics for text generation and recognition tasks (BLEU, ROUGE, WER, and CER).

Tokenization
------------
Preprocessing is a crucial step in NLP tasks. The `preprocessing` function assists in preparing text data for tokenization, which is essential for accurate and consistent benchmarking.

Evaluation Metrics
------------------

The module provides pure Python implementations of common evaluation metrics (BLEU and ROUGE) that automatically handle Thai text tokenization. These metrics are essential for evaluating machine translation, text summarization, and other text generation tasks.

BLEU Score
^^^^^^^^^^

BLEU (Bilingual Evaluation Understudy) is a metric for evaluating the quality of machine-translated text. It compares the generated text against one or more reference translations by measuring n-gram precision with a brevity penalty.

.. autofunction:: pythainlp.benchmarks.bleu_score

**Example:**

.. code-block:: python

   from pythainlp.benchmarks import bleu_score

   # Single reference
   references = ["สวัสดีครับ วันนี้อากาศดีมาก"]
   hypotheses = ["สวัสดีค่ะ วันนี้อากาศดี"]
   score = bleu_score(references, hypotheses)
   print(f"BLEU: {score['bleu']:.2f}")

   # Multiple references per hypothesis
   references = [
       ["สวัสดีครับ", "สวัสดีค่ะ"],
       ["ลาก่อนครับ", "ลาก่อนค่ะ"],
   ]
   hypotheses = ["สวัสดี", "ลาก่อน"]
   score = bleu_score(references, hypotheses)
   print(f"BLEU: {score['bleu']:.2f}")
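To make the metric itself concrete, here is a rough pure-Python sketch of single-reference BLEU, combining clipped n-gram precisions with a brevity penalty. This is an illustration only, not PyThaiNLP's actual implementation, and it uses whitespace tokenization rather than Thai word segmentation:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(reference, hypothesis, max_n=2):
    """Single-reference BLEU sketch: geometric mean of clipped
    n-gram precisions, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(reference, n))
        hyp_counts = Counter(ngrams(hypothesis, n))
        # Clip each hypothesis n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes hypotheses shorter than the reference
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * geo_mean


ref = "the cat sat on the mat".split()
hyp = "the cat sat on mat".split()
# p1 = 1.0, p2 = 0.75, brevity penalty = exp(-0.2), so roughly 0.709
print(round(simple_bleu(ref, hyp), 4))
```

Real implementations add smoothing and support for multiple references per hypothesis, which this sketch omits for clarity.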
ROUGE Score
^^^^^^^^^^^

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic summarization and machine translation. It measures the overlap between the generated text and reference text(s).
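The core overlap computation can be sketched in a few lines of plain Python. The following illustrative ROUGE-1 (unigram overlap) function is a simplification for clarity, not PyThaiNLP's actual implementation, and it assumes already-tokenized input:

```python
from collections import Counter


def rouge_1(reference, hypothesis):
    """ROUGE-1 sketch: unigram recall, precision, and F1 between a
    reference token list and a hypothesis token list."""
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # Each shared unigram counts at most min(reference, hypothesis) times
    overlap = sum(min(c, hyp_counts[t]) for t, c in ref_counts.items())
    recall = overlap / max(len(reference), 1)
    precision = overlap / max(len(hypothesis), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"r": recall, "p": precision, "f": f1}


scores = rouge_1("the cat sat on the mat".split(),
                 "the cat on the mat".split())
print(scores)  # recall 5/6, precision 1.0
```

ROUGE-2 and ROUGE-L extend the same idea to bigrams and to the longest common subsequence, respectively.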
Word Error Rate
^^^^^^^^^^^^^^^

Word Error Rate (WER) is a common metric for evaluating speech recognition and machine translation systems. It measures the minimum number of word-level edits (insertions, deletions, substitutions) needed to transform the hypothesis into the reference.
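The minimum edit count behind WER is a standard Levenshtein dynamic program over words. The version below is an illustrative pure-Python sketch, not PyThaiNLP's actual implementation; it splits on whitespace, whereas Thai text would first need word segmentation:

```python
def word_error_rate(reference, hypothesis):
    """WER sketch: word-level Levenshtein distance divided by the
    number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# 2 edits ("sat" -> "sit", delete "the") over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Running the same dynamic program over characters instead of words yields the character-level variant described next.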
Character Error Rate
^^^^^^^^^^^^^^^^^^^^

Character Error Rate (CER) is a metric for evaluating speech recognition and optical character recognition (OCR) systems. It measures the minimum number of character-level edits (insertions, deletions, substitutions) needed to transform the hypothesis into the reference.
**Example:**

.. code-block:: python

   from pythainlp.benchmarks import character_error_rate

   reference = "สวัสดีครับ"
   hypothesis = "สวัสดีค่ะ"
   cer = character_error_rate(reference, hypothesis)
   print(f"CER: {cer:.4f}")

Usage
-----

To make use of these benchmarking functions, you can follow the provided examples and guidelines in the official PyThaiNLP documentation. These tools are invaluable for researchers, developers, and anyone interested in improving and evaluating Thai word tokenization methods and text generation systems.