Bible Verse Data

This dataset splits the content of the Bible into its individual verses¹. The source text is downloaded from Project Gutenberg².

This repository contains the dataset bible_verses.csv and the source code tokenize_bible.py used to produce it.
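As a rough illustration of verse splitting, the sketch below cuts a book's text at verse markers. It is not the code in tokenize_bible.py. It assumes that every verse starts on its own line with a `chapter:verse.` prefix such as `1:1.`; the names `VERSE_START` and `split_verses` are hypothetical, and the real source file may need different handling.

```python
import re

# Assumption: a verse begins at the start of a line with "chapter:verse. ",
# for example "1:1. In the beginning ...". Adjust the pattern if the source differs.
VERSE_START = re.compile(r"^(\d+):(\d+)\.\s+", re.MULTILINE)


def split_verses(book_text):
    """Return a list of (chapter, verse, text) tuples found in book_text."""
    matches = list(VERSE_START.finditer(book_text))
    verses = []
    for i, match in enumerate(matches):
        # A verse runs from the end of its marker to the start of the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(book_text)
        # Collapse line breaks and repeated spaces inside the verse.
        text = " ".join(book_text[match.end():end].split())
        verses.append((int(match.group(1)), int(match.group(2)), text))
    return verses


sample = (
    "1:1. In the beginning God created heaven, and earth.\n"
    "1:2. And the earth was void and empty,\n"
    "and darkness was upon the face of the deep."
)
for chapter, verse, text in split_verses(sample):
    print(chapter, verse, text)
```

Text that precedes the first marker is dropped by this sketch, while any text that follows the last verse of a book would be attached to that verse, so headings and commentary would need to be filtered out separately.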

Alternatively, the source code offers an option to split the text into individual sentences; however, further preprocessing may be necessary to produce satisfactory results.

Data domain

testament_title := title of the testament [old_testament, new_testament]
book_id := unique book identifier
verse_id := unique verse identifier
bible_verse := unique verse identifier per book
text := full text of a verse
#chars := number of characters in a verse
#words := number of words in a verse
lexical_richness := Type-Token Ratio of a verse
lexical_novelty := Lexical Novelty³ of a verse in a book
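The per-verse measures listed above can be sketched as follows. This is a minimal illustration, not the code in tokenize_bible.py. The tokenizer (lowercase runs of letters and apostrophes) is an assumption, and `lexical_novelty` uses one plausible reading of the measure, namely the share of a verse's tokens that have not yet appeared earlier in the same book; the dataset may define it differently. All function names are hypothetical.

```python
import re


def tokenize(text):
    # Assumption: a word is a lowercase run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def verse_stats(text):
    """Character count, word count, and Type-Token Ratio of one verse."""
    tokens = tokenize(text)
    n_words = len(tokens)
    # Type-Token Ratio: distinct words divided by total words.
    ttr = len(set(tokens)) / n_words if n_words else 0.0
    return {"#chars": len(text), "#words": n_words, "lexical_richness": ttr}


def lexical_novelty(verses):
    """For each verse of a book, in order, return the fraction of its tokens
    that have not been seen earlier in that book (assumed definition)."""
    seen = set()
    scores = []
    for text in verses:
        tokens = tokenize(text)
        new = 0
        for token in tokens:
            if token not in seen:
                new += 1
                seen.add(token)
        scores.append(new / len(tokens) if tokens else 0.0)
    return scores


print(verse_stats("the cat and the dog"))
print(lexical_novelty(["the cat", "the dog"]))
```

Under this reading, novelty is computed per book, so `seen` would be reset whenever book_id changes.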

¹ Sentences that are not part of a verse are not included in the dataset.
² Challoner's revised Douay-Rheims Version (Old Testament 1609 & 1610, New Testament 1582). The Whole Revised and Diligently Compared with the Latin Vulgate by Bishop Richard Challoner A.D. 1749-1752, EBook-No. 8300.
³ Hurst, M. (2011). Visualizing Lexical Novelty in Literature. URL https://datamining.typepad.com/data_mining/2011/09/visualizing-lexical-novelty-in-literature.html.
