We are interested in generating a repeat library that can be used to mask any species in a genus (of non-model fishes). Ideally, elements in this library would have common family names that could be used for comparative purposes. We have >10 high-quality assemblies that could be used as input.
Some options could be to 1) combine the assemblies and run RepeatModeler to generate a single library, or 2) run RepeatModeler for each species and then combine the libraries and remove duplicate sequences.
Both approaches seem to have pros and cons. Building a library from a concatenated FASTA would mean that even non-repeated (single-copy) elements are present ~10 times, once per assembly. But combining libraries from each species means that similar repeat families wouldn't share common names. Also, deciding how to remove duplicates seems like an issue in itself.
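To make option 2 more concrete, here is a minimal sketch of the simplest form of duplicate removal we could imagine: merging per-species libraries and dropping sequences that are identical on either strand. The function names, the species prefixes, and the example headers are all hypothetical. It only catches exact duplicates, so it does not address the harder question of collapsing similar (non-identical) families or giving them common names.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first token of the header line.
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(libraries: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Merge per-species libraries, keeping the first copy of each exact duplicate.

    `libraries` maps a species label (hypothetical) to the lines of its library.
    Sequences identical on either strand are treated as duplicates.
    """
    seen = set()
    merged = []
    for species, lines in libraries.items():
        for header, seq in parse_fasta(lines):
            seq = seq.upper()
            # Strand-independent key: the smaller of the sequence and its
            # reverse complement.
            key = min(seq, reverse_complement(seq))
            if key in seen:
                continue
            seen.add(key)
            # Prefix with the species label so names stay unique after merging.
            merged.append((f"{species}_{header}", seq))
    return merged


if __name__ == "__main__":
    # Toy input; real libraries would be read from per-species FASTA files.
    libs = {
        "spA": [">fam1#DNA/hAT", "ACGTTT", ">fam2#Unknown", "GGGCCC"],
        "spB": [">fam1#LINE/L2", "AAACGT", ">fam9#Unknown", "TTTT"],
    }
    for name, seq in merge_libraries(libs):
        print(f">{name}\n{seq}")
```

In the toy input, the spB `fam1` sequence is the reverse complement of the spA `fam1` sequence, so only the spA copy is kept. Note that the surviving entries still carry species-specific names, which is exactly the naming problem described above.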
We would very much appreciate any thoughts or advice from the experts. Thank you!