Running RepeatModeler with Dfam TETools - What Database Files to Use for Classification?

**What do you want to know?**
Hello, I'm wondering which database components I need for classifying TEs with RepeatClassifier. Do I need the entire database, or should I just use the curated consensus sequences? Should I include curated consensus sequences from all taxa or just Actinopterygii? Would I use FamDB to pull these sequences from the online Dfam database, or can I pull them from the Dfam database that is stored locally? Sorry if these are noob questions; I have looked online and haven't been able to figure them out.

**Helpful context**
I'm trying to run RepeatModeler to identify TEs within the stickleback genome assembly found here https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_964276395.1/ using the Singularity image included with Dfam TETools, from here https://github.com/Dfam-consortium/TETools (the conda version ran, but RepeatClassifier did not run properly, as seen in recently opened issues such as https://github.com/Dfam-consortium/RepeatModeler/issues/303). I'm running this on the Unity compute cluster, and the Dfam database is located at `/datasets/bio/dfam`, which I have read access to. I'm also aware that I need to run a reconfigure script before running the analysis using the container.


Running RepeatModeler with Dfam TETools - What Database Files to Use for Classification? #315
