Low taxon allocation

Hello, I am writing because when I analyze a soil metagenome with QIIME 2, the taxonomic assignment is poor. A large share of the sequences come out unclassified or assigned only to the kingdom Bacteria. When I export the FASTA and submit the sequences manually to BLAST, it identifies them better. What could be going on? Perhaps having only 251 base pairs of the 16S gene does not give QIIME 2 enough information to assign a taxon? Do you know of any alternative that gives better results? I have 5,000 sequences in the metagenome, so doing it manually with BLAST is impossible.

Perhaps you should try using the script plugin to access the NCBI BLAST database.

Hello Vanesa,

Can you tell us more about the region you sequenced?

The 16S V4 region is used in the tutorials and is about this long, so this should work well. But if you are using a different region or a full untargeted metagenome, that would explain the lack of taxonomic classification.

Can you also share more details about the Qiime2 plugins you used? Posting the commands you used for taxonomy classification helps us look for clues :mag:

I already found the problem: the assignment was poor because I was not using assembled sequences. I have now run the R1 and R2 files through metaSPAdes and it gave me a .fasta file, but I am lost. Is there a tutorial on how to start from two Illumina R1 and R2 files and pass them through the different software (Fastq, Trimmomatic, metaSPAdes, QIIME 2)?

I am trying to import the .fasta file into QIIME 2, even though I have not run it through Trimmomatic, to see whether the taxonomic assignment improves, but the commands do not work. The commands I used to import R1 and R2 no longer work for this file.

Wait a moment!
metaSPAdes is designed for untargeted / shotgun metagenomics.
The DADA2 pipeline is designed for targeted / amplicon sequences.
These are very different data types!

What kind of data do you have? If you have amplicon sequences, please find what primers were used to amplify them and report back.


Here is all the information about my files, along with the primers that were used:


Microbial DNA was extracted using the QIAsymphony PowerFecal Pro DNA Kit (Qiagen).

DNA was amplified following the 16S Metagenomic Sequencing Library Illumina 15044223 B protocol (ILLUMINA). In order to study bacterial communities, region V3-V4 of the 16S rRNA gene was amplified using 341F-805R primers (Klindworth et al. 2013). In the first amplification step, primers were designed containing: 1) a universal linker sequence that allows indexes and sequencing primers to be incorporated into the amplicons with the Nextera XT Index kit (ILLUMINA); and 2) the corresponding primers of the specific region of the 16S rRNA gene.

In the second and final amplification step, indexes were added. 16S-based libraries were quantified
by fluorimetry using the Quant-iT™ PicoGreen™ dsDNA Assay Kit (Thermofisher).

Libraries were pooled before sequencing on the MiSeq platform (Illumina), 250 cycles paired reads
configuration. The size and quantity of the pool were assessed on the Bioanalyzer 2100 (Agilent) and
with the Library Quantification Kit for Illumina (Kapa Biosciences), respectively. PhiX Control library
(v3)(Illumina) was combined with the amplicon library (expected at 20 %).

Sequencing data were available within approximately 56 hours. Image analysis, base calling and
data quality assessment were performed on the MiSeq instrument (MiSeq Control Software (MCS

I don't know whether the process followed for my samples included an assembly of the fragments, which is why I introduced metaSPAdes. Now I think QIIME 2 may have its own assembler, so I would only need to add one more step to my commands for it to identify the sequences correctly, but I don't know which step to add. I ran the following commands:

  1. qiime tools import
    --type 'SampleData[PairedEndSequencesWithQuality]'
    --input-path casava-12-paired-end-demultiplexed
    --input-format CasavaOneEightSingleLanePerSampleDirFmt
    --output-path demux-paired-end.qza

  2. qiime dada2 denoise-single
    --i-demultiplexed-seqs demux.qza
    --p-trim-left 0
    --p-trunc-len 120
    --o-representative-sequences rep-seqs-dada2.qza
    --o-table table-dada2.qza
    --o-denoising-stats stats-dada2.qza

  3. qiime quality-filter q-score
    --i-demux demux.qza
    --o-filtered-sequences demux-filtered.qza
    --o-filter-stats demux-filter-stats.qza

  4. qiime deblur denoise-16S
    --i-demultiplexed-seqs demux-filtered.qza
    --p-trim-length 120
    --o-representative-sequences rep-seqs-deblur.qza
    --o-table table-deblur.qza
    --o-stats deblur-stats.qza

  5. qiime feature-table summarize
    --i-table table-deblur.qza
    --o-visualization table-deblur.qzv
    --m-sample-metadata-file table-deblur.tsv
    qiime feature-table tabulate-seqs
    --i-data rep-seqs.qza
    --o-visualization rep-seqs.qzv

  6. qiime phylogeny align-to-tree-mafft-fasttree
    --i-sequences rep-seqs.qza
    --o-alignment aligned-rep-seqs.qza
    --o-masked-alignment masked-aligned-rep-seqs.qza
    --o-tree unrooted-tree.qza
    --o-rooted-tree rooted-tree.qza

  7. I skipped the diversity analysis because it used to give me an error.

  8. qiime feature-classifier classify-sklearn
    --i-classifier gg-13-8-99-515-806-nb-classifier.qza
    --i-reads rep-seqs.qza
    --o-classification taxonomy.qza

  9. qiime taxa barplot
    --i-table table-deblur.qza
    --i-taxonomy taxonomy.qza
    --m-metadata-file metadata.tsv
    --o-visualization taxa-bar-plots.qzv


Thank you for the detailed description of your pipeline! This is very helpful, and I believe we can answer a few questions (and clear up a few misunderstandings).

In paired-end Illumina sequencing, the ends of the reads can overlap, and this is kind of like assembly. It's usually called 'merging' or 'pairing' because it's much simpler than shotgun assembly.
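To make the overlap idea concrete, here is a rough back-of-the-envelope check in shell. It assumes the 341F-805R amplicon is about 460 bp including primers (the approximate figure given in the Illumina 16S protocol; real amplicons vary in length) and that merging needs at least 12 overlapping bases, which to my knowledge is the DADA2 default. The helper name `expected_overlap` is made up for this illustration.

```shell
#!/usr/bin/env bash
# Assumption: ~460 bp amplicon (primers included) for 341F-805R.
AMPLICON_LEN=460
# Assumption: minimum overlap needed to merge a read pair.
MIN_OVERLAP=12

# Hypothetical helper: number of bases shared by the forward and reverse
# reads once they are truncated at the given positions.
# A negative result means a gap between the reads instead of an overlap.
expected_overlap() {
  local trunc_f=$1 trunc_r=$2 amplicon=${3:-$AMPLICON_LEN}
  echo $(( trunc_f + trunc_r - amplicon ))
}

for pair in "120 120" "250 250"; do
  read -r f r <<< "$pair"
  ov=$(expected_overlap "$f" "$r")
  if [ "$ov" -ge "$MIN_OVERLAP" ]; then
    echo "trunc $f/$r: overlap $ov bp, pairs can merge"
  else
    echo "trunc $f/$r: overlap $ov bp, too short to merge"
  fi
done
```

Under these assumptions, truncating both reads at 120 leaves a gap of about 220 bp, while untruncated 250 bp reads overlap by roughly 40 bp. Treat the numbers as a guide only and check them against your own amplicon length.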

Yes! Because you have PairedEndSequencesWithQuality, you can use the qiime dada2 denoise-paired command. This pipeline includes the read merging step mentioned above.

The dada2 and deblur plugins both make feature tables. You can try them both and see what works best for your data. (They are not usually used in the same pipeline.)

When using the deblur pipeline, you can merge your reads with vsearch merge-pairs (see "merge-pairs: Merge paired-end reads" in the QIIME 2 2023.2.0 documentation).
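As a minimal sketch of the DADA2 route for paired reads, something like the following could be a starting point. The file names and the truncation values (250 and 230) are placeholders chosen for illustration, not recommendations; pick your own from the quality plots of your demultiplexed reads. The script only prints the command unless `RUN=1` is set, so it can be inspected without a QIIME 2 environment.

```shell
#!/usr/bin/env bash
# Hypothetical file name and truncation values; replace with your own.
DEMUX=demux-paired-end.qza
TRUNC_F=250
TRUNC_R=230

# Build the command as an array so it can be printed before running.
cmd=(qiime dada2 denoise-paired
  --i-demultiplexed-seqs "$DEMUX"
  --p-trim-left-f 0
  --p-trim-left-r 0
  --p-trunc-len-f "$TRUNC_F"
  --p-trunc-len-r "$TRUNC_R"
  --o-representative-sequences rep-seqs-dada2.qza
  --o-table table-dada2.qza
  --o-denoising-stats stats-dada2.qza)

# Print by default; set RUN=1 to execute (needs QIIME 2 installed).
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

After a real run, the denoising stats artifact is worth inspecting, since it reports how many reads survived filtering and merging.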


In my case, I have two types of sequences: one in which V3-V4 was sequenced with 250 bp per fragment, and another in which the complete 16S was sequenced with 250 bp per fragment. Considering that my essential goal is taxonomic assignment, which would you recommend: qiime dada2 denoise-paired or qiime vsearch merge-pairs?

The taxonomic assignment was so poor with V3-V4 that the entire 16S was commissioned for sequencing in the next trial, but it went just as badly, with almost 80% assigned only to Bacteria. Now I don't know whether the error is mine in the bioinformatics analysis or the problem is in the sequencing.

I am sending you the results of the analysis of my complete 16S gene sequences so that you can help me judge where to truncate and where to trim. When using the command qiime dada2 denoise-paired
--i-demultiplexed-seqs demux.qza
--p-trim-left-f 0
--p-trim-left-r 0
--p-trunc-len-f 120
--p-trunc-len-r 120
--o-representative-sequences rep-seqs-dada2.qza
--o-table table-dada2.qza
--o-denoising-stats stats-dada2.qza
I think it has improved a lot. I will also attach the data from the sequencing of the V3-V4 region below, because I think the results also depend on the quality of the sequencing. I think the problem in both cases was twofold: I was not using the command correctly, and the sequencing was not very good. If you can confirm this, I would appreciate it. First, I attach the results of the complete 16S sequencing.

In the case of the V3-V4 region, the frequencies make me suspicious. I am attaching the results; in this case it was clearer to me where to truncate the sequences. I used this: qiime dada2 denoise-paired
--i-demultiplexed-seqs demux.qza
--p-trim-left-f 0
--p-trim-left-r 0
--p-trunc-len-f 280
--p-trunc-len-r 260
--o-representative-sequences rep-seqs-dada2.qza
--o-table table-dada2.qza
--o-denoising-stats stats-dada2.qza

I am attaching test results.

Finally, for the complete 16S gene, I used these values and they gave me these frequencies, which are much more uniform.
qiime dada2 denoise-paired
--i-demultiplexed-seqs demux.qza
--p-trim-left-f 17
--p-trim-left-r 21
--p-trunc-len-f 251
--p-trunc-len-r 251
--o-representative-sequences rep-seqs-dada2.qza
--o-table table-dada2.qza
--o-denoising-stats stats-dada2.qza
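One quick sanity check on the truncation values in the commands above: as I read the q2-dada2 documentation, reads shorter than the truncation position are discarded, so a value larger than the read length would leave nothing to denoise. The sketch below assumes every read is 251 bases long, as mentioned in the first post; the helper `check_trunc_len` is hypothetical and only does the comparison.

```shell
#!/usr/bin/env bash
# Assumption: every read is 251 bases long (250-cycle paired MiSeq run).
READ_LEN=251

# Hypothetical helper: report whether reads of the given length can reach
# a truncation position. A value of 0 means "do not truncate".
check_trunc_len() {
  local trunc=$1 read_len=${2:-$READ_LEN}
  if [ "$trunc" -eq 0 ] || [ "$trunc" -le "$read_len" ]; then
    echo "ok"
  else
    echo "too long"
  fi
}

for t in 120 251 260 280; do
  echo "trunc-len $t: $(check_trunc_len "$t")"
done
```

If the actual read lengths differ from 251, pass the real value as the second argument, for example `check_trunc_len 280 301`.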

Hi @Vanesa_Fernandez,

I just want to pop in here with a friendly reminder regarding this section in our Code of Conduct (Your Work is Your Work):

While it is perfectly acceptable to ask questions regarding your data/analysis and any general questions or recommendations, it's important to remember that ultimately you are responsible for making the specific decisions on your analysis.

With that being said, if you are looking for professional consulting regarding your analysis, @colinbrislawn, @ebolyen and @gregcaporaso do offer these types of services. You are welcome to reach out to them directly for more information.

Cheers :lizard:


I really appreciate this information because it's just what I'm looking for: any professional advice that can help me solve this problem. I am writing to you right now.