What do you want to know?
Hello, I'm wondering which database components I need for classifying TEs with RepeatClassifier. Do I need the entire database, or should I use only the curated consensus sequences? Should I include curated consensus sequences from all taxa or just Actinopterygii? Would I use FamDB to pull these sequences from the online Dfam database, or can I pull them from the copy of the Dfam database that is stored locally? Sorry if these are noob questions; I have looked online and haven't been able to figure them out.
Helpful context
I'm trying to run RepeatModeler to identify TEs within the stickleback genome assembly found here https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_964276395.1/ using Dfam's Singularity image from here https://github.com/Dfam-consortium/TETools (the conda version ran, but RepeatClassifier did not run properly, as seen in recent issues opened here such as #303). I'm running this on the Unity compute cluster, and the Dfam database is located at /datasets/bio/dfam, which I have read access to. I'm also aware that I need to run a reconfigure script before running the analysis in the container.