Hi,
I'm testing spacedust to find the conserved gene clusters between the two example genomes.
- Creating databases
spacedust createsetdb listOfFastaFiles.tsv setDB tmpFolder --gff-dir examples/gff.txt --gff-type CDS
The listOfFastaFiles.tsv is:
examples/uvig_120081.fna
examples/uvig_255655.fna
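With more genomes, this list could be generated rather than typed by hand. A minimal sketch, assuming all FASTA files sit in one directory and end in .fna; the function name make_genome_list is hypothetical and not part of spacedust:

```shell
# make_genome_list DIR OUT
# Write the path of every *.fna file in DIR to OUT, one path per line.
# (Hypothetical helper; assumes genomes are stored as DIR/*.fna.)
make_genome_list() {
    genome_dir="$1"
    list_file="$2"

    # Start from an empty list file.
    : > "$list_file"

    # Shell globs expand in sorted order, so the list is reproducible.
    for fasta in "$genome_dir"/*.fna; do
        # Skip the unexpanded pattern when no .fna file exists.
        [ -f "$fasta" ] || continue
        printf '%s\n' "$fasta" >> "$list_file"
    done
}
```

Calling `make_genome_list examples listOfFastaFiles.tsv` would then write the two paths shown above, and the resulting file can be passed to createsetdb as before.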
- Convert to structure sequence DB (the reference Foldseek DB Alphafold/UniProt has been downloaded to ~/database/FoldSeek/UniProt/ and named afdb)
spacedust aa2foldseek setDB ~/database/FoldSeek/UniProt/afdb tmpFolder
Here I got two databases, setDB_foldseek and setDB_unmapped.
Q: I will analyze some virus genomes later, so is a full Foldseek structure search against precomputed structures probably a better choice than ProstT5?
- Search querySetDB against targetSetDB (using Foldseek and MMseqs)
spacedust clustersearch setDB setDB result.tsv tmpFolder --search-mode 1 --num-iterations 2
I got the result.tsv file here:
result.tsv
I am not sure whether I have run the tool correctly. I am also confused by the results, as I would expect to observe some conserved gene clusters between the two example genomes.
Q: Also, what if I have many genomes and want to identify the conserved gene clusters shared between any of them?
Thanks!
Best wishes!