Summary
Currently, we use WAL mode for our SQLite data access because it "is significantly faster in most scenarios" and allows concurrent, non-blocking reads.
However, generating DINO embeddings during dataset generation appears to produce very large files. We assume this is because dataset generation accesses the data in no particular order, so the same entries end up in the WAL multiple times.
We should investigate whether WAL is still the best option in this scenario.
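
As a starting point for that investigation, the following is a minimal sketch (not our actual code) that reproduces the suspected effect in isolation: it rewrites the same row many times in WAL mode, measures the `-wal` file, and then runs a `TRUNCATE` checkpoint. The table name `embeddings` and the helpers `wal_size` and `demo` are hypothetical, and automatic checkpointing is disabled here only to make the growth visible.

```python
import os
import sqlite3
import tempfile


def wal_size(db_path: str) -> int:
    """Return the size in bytes of the -wal file next to db_path (0 if absent)."""
    wal_path = db_path + "-wal"
    return os.path.getsize(wal_path) if os.path.exists(wal_path) else 0


def demo(rounds: int = 200) -> tuple:
    """Rewrite one row `rounds` times; return WAL size before and after a checkpoint."""
    with tempfile.TemporaryDirectory() as tmpdir:
        db_path = os.path.join(tmpdir, "demo.sqlite")
        # isolation_level=None: every statement commits as its own transaction.
        conn = sqlite3.connect(db_path, isolation_level=None)
        try:
            conn.execute("PRAGMA journal_mode=WAL").fetchall()
            # Assumption for the demo only: turn off automatic checkpoints.
            conn.execute("PRAGMA wal_autocheckpoint=0").fetchall()
            # Hypothetical table standing in for the real embedding storage.
            conn.execute("CREATE TABLE embeddings (id INTEGER PRIMARY KEY, vec BLOB)")
            conn.execute("INSERT INTO embeddings VALUES (1, ?)", (bytes(1024),))
            for i in range(rounds):
                # Each commit appends the modified page to the WAL again.
                payload = bytes([i % 256]) * 1024
                conn.execute("UPDATE embeddings SET vec = ? WHERE id = 1", (payload,))
            before = wal_size(db_path)
            # Move WAL content into the main database file and truncate the WAL.
            conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
            after = wal_size(db_path)
        finally:
            conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"WAL before checkpoint: {before} bytes, after: {after} bytes")
```

If our real workload shows the same pattern, options worth comparing before abandoning WAL include periodic `PRAGMA wal_checkpoint(TRUNCATE)` calls during generation and setting `PRAGMA journal_size_limit`. Note that a checkpoint cannot complete while another connection holds a read transaction open, which may also be a factor in our case.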