Pipeline for viral metagenomics

Hello guys,
I should say up front that I am new to QIIME and its plugins. I have already done the pre-processing step of my pipeline, so I have a FASTA file of my data that is trimmed, quality checked, cleaned of rRNA, and stripped of contaminants. Now I would like to run a taxonomic analysis on this file, which contains viral sequences for which I don't have a reference genome. At this point I have some questions:

  1. Is there a specific tutorial on using QIIME 2 on viral sequences (not amplicon data)?
  2. To build the reference database as a .qza, I tried both of the following:
  • The qiime tools import --type "FeatureData[ProteinSequence]" approach (using my local RefSeq file), which fails with the following error: Invalid character 'U' at position 43 on line 134862 (does not match IUPAC characters for this sequence type);

  • The qiime rescript get-ncbi-data --p-query mini.nonredundant_protein.faa approach, which ALWAYS fails with connection errors from NCBI, even though the network connection on my HPC works perfectly.

I am using qiime2-2021.8 and rescript-2021.8.0 on Ubuntu 20.04.
I don't know how to proceed, and I hope someone can help me.
Thank you in advance.

Hi @emiliomastriani,
Welcome to the forum!

If you could give us more information on your dataset, we could help more!
What platform did you use to get the sequences?
Are they paired-end or single-end, and how long are they (if Illumina)?
Do you still have the raw reads, or FASTQ versions of the processed reads?
I am asking because it is easier to import FASTQ files into QIIME 2 than FASTA files; moreover, without the quality information the reads may be less useful.

On the database: are you trying to import a protein database? What kind of sequences do you have? Fragmented viral genomes?
If so, I would probably look at the following plugin, which implements MetaPhlAn2 for the QIIME 2 pipeline: QIIME 2 Library

There is also a newer version of MetaPhlAn, but it is not yet in QIIME 2: MetaPhlAn3 – The Huttenhower Lab

In both cases, you need quality-trimmed reads, and the tool will profile the species in your sample.
However, I am not sure which viruses are in the built-in database, so you will probably need to check first whether what you need is in there!
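
On the import error from your first attempt: 'U' is the one-letter code for selenocysteine, which does occur in some protein records, so it may simply be a residue the importer does not accept. Below is a minimal sketch for finding such characters in a protein FASTA before importing it. The function names and the ALLOWED alphabet are my own assumptions for illustration, not the set QIIME 2 actually validates against, so adjust them as needed.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: 20 standard amino acids plus the ambiguity codes B, Z, X
# and the stop symbol '*'. The alphabet checked by the importer may differ.
ALLOWED = frozenset("ACDEFGHIKLMNPQRSTVWYBZX*")


def find_nonstandard_residues(
    lines: Iterable[str], allowed: frozenset = ALLOWED
) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, position, character) for each residue not in `allowed`.

    Line numbers and positions are 1-based. Header lines (starting with '>')
    and empty lines are skipped. Comparison is case-insensitive.
    """
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith(">"):
            continue
        for position, char in enumerate(text, start=1):
            if char.upper() not in allowed:
                yield line_number, position, char


def scan_file(path: str) -> None:
    """Print every non-standard residue found in the FASTA file at `path`."""
    with open(path) as handle:
        for line_number, position, char in find_nonstandard_residues(handle):
            print(f"line {line_number}, position {position}: {char!r}")
```

Calling scan_file on your local RefSeq file would list every offending residue. What to do with those sequences (drop them, or replace the residue with X) depends on your analysis, so I would look at how many there are first.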
Hope it helps,
