Some taxonomic units, such as class, order, and species, cannot be classified

Hi, thank you for your attention!
My data are paired-end 16S samples using 341F (5′-CCTACGGGNGGCWGCAG-3′) and 806R (5′-GGACTACHVGGGTATCTAAT-3′) primers. To get the species annotation results, I went through the following steps.

  1. USE FASTP TO CONTROL THE QUALITY AND TRIM THE ADAPTER
    For all samples, I used fastp to trim the adapters and quality-filter my paired-end FASTQ files.
  2. IMPORT 16S FASTQ FILES INTO QIIME2
    I imported all the FASTQ files processed by fastp into QIIME 2 with the following command:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ../manifeat_paired.tsv --output-path paired-demux.qza  --input-format PairedEndFastqManifestPhred33V2
  3. CUT THE PRIMERS
    I used cutadapt to remove the primers.
qiime cutadapt trim-paired --p-cores 50 \
--i-demultiplexed-sequences paired-demux.qza \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GGACTACHVGGGTATCTAAT \
--o-trimmed-sequences paired-end-demux.qza \
--verbose
  4. GENERATE FEATURE TABLE
    I ran DADA2 to generate a feature table.
qiime dada2 denoise-paired  --i-demultiplexed-seqs paired-end-demux.qza  \
--p-n-threads 0  --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 0 --p-trunc-len-r 0 \
--o-table dada2-table_fastp.qza \
--o-representative-sequences dada2-rep-seqs_fastp.qza \
--o-denoising-stats denoising-stats_fastp.qza

I then used this feature table to calculate relative frequencies.

qiime feature-table relative-frequency --i-table dada2-table_fastp.qza \
--o-relative-frequency-table table-relative.qza 
  5. TAXONOMY ANNOTATION
    I downloaded the pre-trained Silva 138 99% OTUs full-length sequences classifier and used it to annotate my feature sequences.
time qiime feature-classifier classify-sklearn --i-classifier silva-138-99-nb-classifier.qza \
--i-reads dada2-rep-seqs_fastp.qza \
--o-classification taxonomy.qza

However, I have some questions about the taxonomy step and its results before I go further.

  1. First, my primers are 341F (5′-CCTACGGGNGGCWGCAG-3′) and 806R (5′-GGACTACHVGGGTATCTAAT-3′). In this case, can I simply use a full-length classifier, such as the Silva 138 99% OTUs full-length sequences classifier, for the annotation?

  2. The QIIME 2 taxonomy result is attached here.
    taxonomy.tsv (271.3 KB)
    Part of the result is shown below.
    |Feature ID | Taxon | Confidence|
    |---|---|---|
    |8b0b4c610291eee36e775915f5e9878b | Unassigned | 0.383114441|
    |d8db14e3e3e38951ae655e5e3c85febe | d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Mitochondria;g__Mitochondria;s__ | 0.932254952|
    |26b462bfa3606f96053e2b5aebf4aa36 | Unassigned | 0.426107767|
    |de8c5368822ad183cf1704db09a55821 | Unassigned | 0.465871704|
    |b4bd749754a135e16a76c8bd7aaeafe4 | Unassigned | 0.504385444|
    |e2c4ae1d2d224598588bd31b839fe557 | d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Mitochondria;g__Mitochondria;s__ | 0.758858058|
    |f6bcbbb7acf068b61bd6f605e55999e6 | d__Eukaryota | 0.720185498|

First, the taxonomy contains Eukaryota, but my samples are all 16S rRNA samples, so I expected only bacterial taxa and no Eukaryota. How should I deal with this Eukaryota classification? Should I delete it?
Second, some protocols suggest deleting taxonomy assignments with a confidence below 0.7. Why is 0.7 chosen as the threshold? If I delete them, should I also remove those feature IDs from the feature table and recalculate the relative abundance?
Third, many of the classifications cannot be assigned to a specific species or genus. After I merge the taxonomy table with the relative-frequency feature table in R, I will sum the abundance of each genus, family, and so on across sample groups (such as tumor and healthy). When a classification like 'd__Eukaryota;p__Retaria;c__Foraminifera;o__Rotaliida' occurs, which genus should I count it under, or should I just ignore it? It is truly confusing!

Thank you for your patience and for taking the time to read my post! Your assistance is greatly appreciated.


Hi @zcw15774723795,
I suggest reading the forum FAQs, which answer several common questions about taxonomy classification.

On the one hand, incomplete classification is typical for 16S data.

On the other hand, your reads are probably in mixed orientations and this is why you get one hit to Eukaryota. You can use the action qiime rescript orient-seqs to harmonize the orientation of your sequences and see if this improves classification.
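If you want a quick look at whether the raw reads really are in mixed orientations, one rough check is to see which primer each untrimmed R1 read starts with. The sketch below is only an illustration of that idea in plain Python. It is not what qiime rescript orient-seqs does (that action compares sequences against a reference), and the helper names and the three example reads are made up for the demonstration; only the two primer sequences come from the post.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FWD_PRIMER = "CCTACGGGNGGCWGCAG"     # 341F, from the post
REV_PRIMER = "GGACTACHVGGGTATCTAAT"  # 806R, from the post


def starts_with_primer(read, primer):
    """Return True if the read begins with the primer, honoring IUPAC codes."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read.upper(), primer))


def r1_orientation(read):
    """Label an untrimmed R1 read by the primer it starts with (hypothetical helper)."""
    if starts_with_primer(read, FWD_PRIMER):
        return "forward"
    if starts_with_primer(read, REV_PRIMER):
        return "reverse"
    return "unknown"


# Made-up reads for illustration only; real reads would come from the R1 FASTQ files.
reads = [
    "CCTACGGGAGGCAGCAGTGGGGAATATTG",  # starts with 341F
    "GGACTACACGGGTATCTAATCCTGTTTGC",  # starts with 806R
    "ACGTACGTACGTACGTACGTACGTACGT",   # starts with neither primer
]
print([r1_orientation(r) for r in reads])
```

A noticeable share of "reverse" labels among R1 reads would be consistent with a mixed-orientation library. Note that this only makes sense on reads that still carry their primers, so it would have to be run before the cutadapt step.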

You can use the action qiime taxa collapse to collapse your feature table in QIIME 2 before exporting to R. This will solve this issue by binning your taxonomic groups appropriately. If you collapse at genus level, you would get one feature like this: d__Eukaryota;p__Retaria;c__Foraminifera;o__Rotaliida;__;__
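To make the binning concrete, here is a minimal Python sketch of the idea: truncate each taxonomy string to the chosen level, pad missing ranks with `__`, and sum the features that end up with the same label. It only mimics the padding behaviour described above and is not the plugin's actual implementation; the function names, feature IDs, and counts are invented for the example.

```python
from collections import defaultdict


def collapse_label(taxon, level):
    """Truncate a taxonomy string to `level` ranks, padding missing ranks with '__'."""
    parts = [p.strip() for p in taxon.split(";")][:level]
    parts += ["__"] * (level - len(parts))
    return ";".join(parts)


def collapse_table(counts, taxonomy, level):
    """Sum per-feature counts that share the same collapsed label."""
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        collapsed[collapse_label(taxonomy[feature_id], level)] += count
    return dict(collapsed)


MITO = ("d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
        "o__Rickettsiales;f__Mitochondria;g__Mitochondria;s__")

# Hypothetical feature IDs and counts; the taxonomy strings are taken from the post.
taxonomy = {
    "feat_a": "d__Eukaryota;p__Retaria;c__Foraminifera;o__Rotaliida",
    "feat_b": MITO,
    "feat_c": MITO,
    "feat_d": "Unassigned",
}
counts = {"feat_a": 10, "feat_b": 20, "feat_c": 30, "feat_d": 5}

collapsed = collapse_table(counts, taxonomy, level=6)  # level 6 = genus
for label, total in collapsed.items():
    print(total, label)
```

In this sketch the Rotaliida feature keeps its own genus-level bin with the two unresolved ranks padded, so it is neither dropped nor merged into some other genus, and the two Mitochondria features are summed into a single bin.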

Good luck!


Hi, I wanted to express my thanks for the time and effort you took to answer my question. Your detailed response not only solved my issue but also enhanced my understanding of the topic.
I will check the FAQs in detail.
Thank you once again for your kindness and support.
