Running RepeatModeler with Dfam TETools - What Database Files to Use for Classification? #315

Description

@Mtsrn0

What do you want to know?
Hello, I'm wondering which database components I need in order to classify TEs with RepeatClassifier. Do I need the entire database, or just the curated consensus sequences? Should I include curated consensus sequences from all taxa, or only from Actinopterygii? And should I use FamDB to pull these sequences from the online Dfam database, or can I pull them from a locally stored copy of Dfam? Sorry if these are basic questions; I have searched online but haven't been able to figure them out.

Helpful context
I'm trying to run RepeatModeler to identify TEs in the stickleback genome assembly found here https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_964276395.1/ using the Singularity image provided by Dfam here https://github.com/Dfam-consortium/TETools. (The conda version ran, but RepeatClassifier did not run properly, as seen in recent issues such as #303.) I'm running this on the Unity compute cluster, and the Dfam database is located at /datasets/bio/dfam, which I have read access to. I'm also aware that I need to run a configure script before running the analysis in the container.
