Hi, thanks for releasing this great work!
I’m currently exploring the FAISS index at:
faiss_index/SwissProt/ProTrek_650M_UniRef50/
Inside this directory, I noticed that:
- sequence/ids.tsv contains UniProt IDs, and each line corresponds to a protein sequence.
- Similarly, structure/ids.tsv also contains UniProt IDs for protein structures.
- There’s also a text/ folder, which seems to contain textual annotations.
My question is:
How are these three parts (sequence, structure, and text) aligned with each other?
Is the matching done through a pointer (e.g., ids.tsv.pointer.npy)?
I tried checking the correspondence by comparing line indices (for example, line 0 in sequence/ids.tsv vs. line 0 in text/ids.tsv), but the entries don’t seem to match.
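A minimal sketch of this kind of line-index check, assuming each ids.tsv stores one ID per line in its first tab-separated column (the actual file layout may differ). The helper names read_ids and compare_alignment are illustrative and not part of the ProTrek code:

```python
from pathlib import Path


def read_ids(path):
    """Return the first tab-separated field of every non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def compare_alignment(ids_a, ids_b):
    """Summarize how two ID lists relate, by position and by content."""
    # Count positions where both files hold the same ID (row-wise alignment).
    same_position = sum(a == b for a, b in zip(ids_a, ids_b))
    # Count IDs present in both files regardless of order.
    shared = set(ids_a) & set(ids_b)
    return {
        "len_a": len(ids_a),
        "len_b": len(ids_b),
        "same_position": same_position,
        "shared_ids": len(shared),
    }


# Paths taken from the index directory described above; the check is
# skipped if the files are not present locally.
root = Path("faiss_index/SwissProt/ProTrek_650M_UniRef50")
seq_path = root / "sequence" / "ids.tsv"
text_path = root / "text" / "ids.tsv"
if seq_path.exists() and text_path.exists():
    print(compare_alignment(read_ids(seq_path), read_ids(text_path)))
```

If shared_ids is high while same_position is close to zero, the two files probably hold overlapping IDs in different orders, which would point to a separate mapping rather than row-wise alignment.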
Could you please clarify:
- How do I correctly align entries between sequence, structure, and text?
- If a mapping file or pointer is used, where can I find it?
Thanks a lot for your help!