
Is vclust also suitable for plant viruses assembled from transcriptomic data? #30


@zhangwenda0518

Hi, I’ve carefully read your article and noticed that vclust has been tested on metagenomic and phage virus datasets. I’d like to ask whether vclust is also suitable for plant viruses assembled from transcriptomic data.

I’m currently mining plant viruses from public plant transcriptome datasets. For the species I’m focusing on, I’ve collected more than 300 samples. After assembling the reads and identifying viral contigs, I performed taxonomic classification and predicted potential plant viruses based on their taxonomic ranks and host relationships. Although I dereplicated the assembled sequences with MMseqs2, a large number of fragments remain, and I’m stuck at this step.
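
One possible reason fragments survive dereplication: assuming MMseqs2 was run with its default coverage settings (`-c 0.8` with `--cov-mode 0`, which requires the alignment to cover both query and target), a short fragment contained in a longer contig fails the threshold on the longer sequence. The minimal Python sketch below compares that bidirectional rule with a rule that only requires coverage of the shorter sequence. The helper functions are hypothetical (not part of MMseqs2 or vclust) and treat alignment length as the number of covered bases.

```python
from typing import Tuple


def coverages(aln_len: int, query_len: int, target_len: int) -> Tuple[float, float]:
    """Return (query coverage, target coverage), treating aln_len as covered bases."""
    return aln_len / query_len, aln_len / target_len


def passes_both(aln_len: int, query_len: int, target_len: int, min_cov: float = 0.8) -> bool:
    # Bidirectional rule: the alignment must cover min_cov of BOTH sequences
    qcov, tcov = coverages(aln_len, query_len, target_len)
    return qcov >= min_cov and tcov >= min_cov


def passes_shorter(aln_len: int, query_len: int, target_len: int, min_cov: float = 0.8) -> bool:
    # Containment rule: only the shorter of the two sequences must be covered
    return aln_len / min(query_len, target_len) >= min_cov


# Hypothetical case: a 500 bp fragment aligned end to end within a 6,000 bp contig
print(passes_both(500, 500, 6000))     # False (target coverage is only ~0.08)
print(passes_shorter(500, 500, 6000))  # True
```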

I’m considering using vclust to generate vOTUs as the next step. Would that be appropriate? Because the data are transcriptomic, the assembled contigs are relatively short: their lengths range from 500 bp to 20 kb, but the vast majority fall between 500 and 3,000 bp, so they are quite fragmented. Could this fragmentation affect the results?
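
To make the fragmentation question concrete, here is a minimal Python sketch of greedy, centroid-based vOTU clustering. It assumes the commonly used MIUViG-style thresholds (at least 95% ANI over at least 85% aligned fraction of the shorter sequence) and precomputed pairwise values. It illustrates the general technique only and is not vclust's implementation; the function name, inputs, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def greedy_votus(lengths: Dict[str, int],
                 ani: Dict[Pair, float],
                 af_shorter: Dict[Pair, float],
                 min_ani: float = 0.95,
                 min_af: float = 0.85) -> Dict[str, str]:
    """Map each contig to a centroid, visiting contigs from longest to shortest."""
    def key(a: str, b: str) -> Pair:
        # Pairwise values are stored once per unordered pair
        return (a, b) if a < b else (b, a)

    centroids: List[str] = []
    assignment: Dict[str, str] = {}
    for contig in sorted(lengths, key=lambda c: (-lengths[c], c)):
        for centroid in centroids:
            k = key(contig, centroid)
            # Missing pairs (no alignment) count as 0 and never pass
            if ani.get(k, 0.0) >= min_ani and af_shorter.get(k, 0.0) >= min_af:
                assignment[contig] = centroid
                break
        else:
            # No existing centroid is similar enough: start a new vOTU
            centroids.append(contig)
            assignment[contig] = contig
    return assignment


# Hypothetical toy data: ANI and aligned fraction of the shorter contig
lengths = {"c1": 6000, "c2": 800, "c3": 650}
ani = {("c1", "c2"): 0.97, ("c1", "c3"): 0.80}
af = {("c1", "c2"): 1.0, ("c1", "c3"): 0.9}
print(greedy_votus(lengths, ani, af))
# {'c1': 'c1', 'c2': 'c1', 'c3': 'c3'}
```

Because the aligned fraction is measured on the shorter sequence, a short fragment can join the vOTU of a longer contig it overlaps. Two fragments from non-overlapping regions of the same genome share no alignment, however, so unless both are assigned to a common longer centroid they end up as separate vOTUs.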

Below are some intermediate results from my pipeline (assembly results, virus identification results, MMseqs2 clustering results, and the plant virus candidates).

I would really appreciate any advice. Thank you very much!

Genome                                                  Size     Contigs    Max_len    N50      N90      >500bp_Num   >500bp_Ratio   >1000bp_Num  >1000bp_Ratio
megahit.mix.fasta                                       1.2G     1666981    40290      789      355      786973       71.10%         266714       40.39%
mix-cobra.merged.fasta                                  945M     1349084    40534      894      303      560222       71.73%         217827       46.26%
mix-mmseqs.cluster_rep_seq.fasta                        308M     473343     40534      741      286      183270       67.64%         59849        39.53%
Plant.classified.fasta                                  49M      49576      13672      1006     558      49371        99.79%         13800        50.26%
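
For reference, the N50/N90 and >500 bp columns in the table above can be recomputed from contig lengths with a short Python sketch like the one below. The table does not state whether the `_Ratio` columns are fractions of contigs or of total bases, so the sketch reports both. The function names and toy lengths are illustrative.

```python
from typing import Dict, Iterable, List


def nx(lengths: Iterable[int], fraction: float) -> int:
    """Nx: the length L such that contigs of length >= L hold at least `fraction` of all bases."""
    ordered: List[int] = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return 0  # only reached for empty input or fraction > 1


def length_summary(lengths: Iterable[int], min_len: int = 500) -> Dict[str, float]:
    lens = list(lengths)
    if not lens:
        raise ValueError("no contig lengths given")
    # Strictly longer than min_len, matching the ">500bp" column header
    long_lens = [n for n in lens if n > min_len]
    return {
        "contigs": len(lens),
        "N50": nx(lens, 0.5),
        "N90": nx(lens, 0.9),
        f">{min_len}bp_num": len(long_lens),
        f">{min_len}bp_ratio_by_count": len(long_lens) / len(lens),
        f">{min_len}bp_ratio_by_bases": sum(long_lens) / sum(lens),
    }


print(length_summary([100, 200, 300, 400, 1000]))
```

In practice the lengths would be read from each FASTA file (for example with Biopython's `SeqIO.parse`); real assembly-statistics tools may differ in details such as whether the cutoff is `>` or `>=`.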

The most abundant families and genera in the classification results are as follows:

─ Top-15 Family ─
  Alphaflexiviridae                                      62,249
  Betaflexiviridae                                       12,708
  Rhabdoviridae                                           3,815
  Caulimoviridae                                            530
  Pospiviroidae                                             191
  Bromoviridae                                               84
  Closteroviridae                                            70
  Fimoviridae                                                42
  Potyviridae                                                36
  Tombusviridae                                              25
  Secoviridae                                                23
  Atkinsviridae                                              15
  Tymoviridae                                                11
  Benyviridae                                                10
  Virgaviridae                                                9

─ Top-15 Genus ─
  Potexvirus                                             62,213
  Foveavirus                                              7,056
  Carlavirus                                              4,684
  Betacytorhabdovirus                                     3,784
  Vitivirus                                                 855
  Soymovirus                                                451
  Pospiviroid                                               191
  Ilarvirus                                                  83
  Banmivirus                                                 70
  Caulimovirus                                               51
  Emaravirus                                                 42
  Lolavirus                                                  28
  Crinivirus                                                 25
  Ipomovirus                                                 22
  Closterovirus                                              21
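
A Top-15 summary like the ones above can be tallied from a per-contig classification table with `collections.Counter`. This is a minimal sketch: the tab-separated layout and the `contig`/`family`/`genus` column names are assumptions, and the rows are toy data.

```python
import csv
from collections import Counter
from io import StringIO
from typing import Dict, List, Tuple


def top_taxa(rows: List[Dict[str, str]], rank: str, n: int = 15) -> List[Tuple[str, int]]:
    # Count non-empty labels at the given rank; unclassified (empty) cells are skipped
    counts = Counter(row[rank] for row in rows if row.get(rank))
    return counts.most_common(n)


# Hypothetical per-contig classification table (tab-separated)
table = (
    "contig\tfamily\tgenus\n"
    "c1\tAlphaflexiviridae\tPotexvirus\n"
    "c2\tAlphaflexiviridae\tPotexvirus\n"
    "c3\tBetaflexiviridae\tFoveavirus\n"
    "c4\tRhabdoviridae\t\n"
)
rows = list(csv.DictReader(StringIO(table), delimiter="\t"))

print("─ Top-15 Family ─")
for name, count in top_taxa(rows, "family"):
    # Left-align the name and right-align the count with thousands separators
    print(f"  {name:<50}{count:>10,}")
```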
