A few scripts would help with one of this repo's main goals: sharing the eBible corpus as a Hugging Face dataset.
prep_data_for_huggingface.py will create two Parquet files: one for the data and one for the metadata.
Parquet is an efficient columnar storage format, and it is the format Hugging Face prefers for storing datasets.
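As a rough illustration of what prep_data_for_huggingface.py might do, here is a minimal sketch using pandas. The column names (translation_id, verse_ref, text, language_code, license), the output file names, and both function names are assumptions made for this example, not the script's actual schema. Writing Parquet with pandas also assumes that pyarrow or fastparquet is installed.

```python
from pathlib import Path

import pandas as pd

# Hypothetical column layout; the real script may use different fields.
DATA_COLUMNS = ["translation_id", "verse_ref", "text"]
META_COLUMNS = ["translation_id", "language_code", "license"]


def build_frames(verses, metadata):
    """Build one DataFrame for the verse text and one for the metadata.

    Both arguments are lists of dicts keyed by the column names above.
    """
    data_df = pd.DataFrame(verses, columns=DATA_COLUMNS)
    meta_df = pd.DataFrame(metadata, columns=META_COLUMNS)
    return data_df, meta_df


def write_parquet(data_df, meta_df, out_dir):
    """Write both frames as Parquet files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / "data.parquet"          # assumed file name
    meta_path = out_dir / "metadata.parquet"      # assumed file name
    # index=False keeps the pandas row index out of the stored files.
    data_df.to_parquet(data_path, index=False)
    meta_df.to_parquet(meta_path, index=False)
    return data_path, meta_path
```

A caller would pass the parsed corpus rows to build_frames and then hand the two frames to write_parquet together with an output directory. Keeping the metadata in its own file means it can be read without loading the full verse text.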
If feasible, it would also be good to add a script that uploads the required data to our Hugging Face account.
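One possible shape for such an upload script is sketched below, using the huggingface_hub client library. The function names, the "data" folder inside the dataset repo, and the idea of passing an access token explicitly are assumptions for this example; the repo id must be supplied by the caller, since the actual account and dataset name are not specified here.

```python
from pathlib import Path


def repo_path_for(local_path, repo_folder="data"):
    """Map a local file to its destination path inside the dataset repo.

    The "data" folder is an assumed layout, not a requirement.
    """
    return f"{repo_folder}/{Path(local_path).name}"


def upload_parquet_files(local_paths, repo_id, token=None):
    """Upload each local Parquet file to a Hugging Face dataset repo."""
    # Imported here so repo_path_for stays usable without the dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # exist_ok=True makes the call a no-op if the dataset repo already exists.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
    for local_path in local_paths:
        api.upload_file(
            path_or_fileobj=str(local_path),
            path_in_repo=repo_path_for(local_path),
            repo_id=repo_id,
            repo_type="dataset",
        )
```

The token would normally come from an environment variable or a prior login rather than being hard-coded, so that credentials never end up in the repository.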