Hi, thank you for your attention!
My data are paired-end 16S samples using 341F (5′-CCTACGGGNGGCWGCAG-3′) and 806R (5′-GGACTACHVGGGTATCTAAT-3′) primers. To get the species annotation results, I went through the following steps.
- USE FASTP TO CONTROL THE QUALITY AND TRIM THE ADAPTER
For all samples, I used fastp to trim the adapters and control the quality of my paired-end fastq files.
- IMPORT 16S FASTQ FILES INTO QIIME2
I imported all fastq files processed by fastp into QIIME2 with the following command:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ../manifeat_paired.tsv --output-path paired-demux.qza --input-format PairedEndFastqManifestPhred33V2
- CUT THE PRIMERS
cutadapt was used to cut the primers:
qiime cutadapt trim-paired --p-cores 50 \
--i-demultiplexed-sequences paired-demux.qza \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GGACTACHVGGGTATCTAAT \
--o-trimmed-sequences paired-end-demux.qza \
--verbose
- GENERATE FEATURE TABLE
I performed dada2 to generate a feature table:
qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end-demux.qza \
--p-n-threads 0 --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 0 --p-trunc-len-r 0 \
--o-table dada2-table_fastp.qza \
--o-representative-sequences dada2-rep-seqs_fastp.qza \
--o-denoising-stats denoising-stats_fastp.qza
I then used the feature table generated above to calculate relative frequencies:
qiime feature-table relative-frequency --i-table dada2-table_fastp.qza \
--o-relative-frequency-table table-relative.qza
- TAXONOMY ANNOTATION
I downloaded the Silva 138 99% OTUs full-length sequences classifier from here.
This classifier was then used to annotate my feature sequences:
time qiime feature-classifier classify-sklearn --i-classifier silva-138-99-nb-classifier.qza \
--i-reads dada2-rep-seqs_fastp.qza \
--o-classification taxonomy.qza
However, I have some questions about the taxonomy step and its results that I would like to resolve before I go further.
- Firstly, my primers are 341F (5′-CCTACGGGNGGCWGCAG-3′) and 806R (5′-GGACTACHVGGGTATCTAAT-3′). In this case, can I simply use a full-length classifier, such as the Silva 138 99% OTUs full-length sequences classifier, for the annotation?
- The QIIME2 taxonomy result is attached here:
taxonomy.tsv (271.3 KB)
Part of the result is shown below:
| Feature ID | Taxon | Confidence |
|---|---|---|
|8b0b4c610291eee36e775915f5e9878b | Unassigned | 0.383114441|
|d8db14e3e3e38951ae655e5e3c85febe | d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Mitochondria;g__Mitochondria;s__ | 0.932254952|
|26b462bfa3606f96053e2b5aebf4aa36 | Unassigned | 0.426107767|
|de8c5368822ad183cf1704db09a55821 | Unassigned | 0.465871704|
|b4bd749754a135e16a76c8bd7aaeafe4 | Unassigned | 0.504385444|
|e2c4ae1d2d224598588bd31b839fe557 | d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rickettsiales;f__Mitochondria;g__Mitochondria;s__ | 0.758858058|
|f6bcbbb7acf068b61bd6f605e55999e6 | d__Eukaryota | 0.720185498|
First of all, as we can see, there are Eukaryota in the taxonomy, but my samples are all 16S rRNA samples, so I expected only bacterial taxa and no Eukaryota. How should I deal with these Eukaryota classifications? Should I delete them?
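To make this first question concrete, here is a minimal Python sketch of what I mean by "deleting" them. It assumes the taxonomy has been read into a dictionary mapping feature ID to taxon string. The helper name `drop_unwanted_taxa` and the list of excluded labels are my own illustrative choices, not part of QIIME2, and whether mitochondrial hits should be excluded as well is part of what I am asking.

```python
# Hypothetical helper (not a QIIME2 API): keep only features whose taxon
# string does not contain any of the excluded labels (case-insensitive).
EXCLUDED = ("d__eukaryota", "mitochondria", "chloroplast")  # assumed list


def drop_unwanted_taxa(taxonomy, excluded=EXCLUDED):
    """Return a new {feature_id: taxon} dict without the excluded taxa."""
    kept = {}
    for feature_id, taxon in taxonomy.items():
        lowered = taxon.lower()
        if any(label in lowered for label in excluded):
            continue  # skip Eukaryota, mitochondrial and chloroplast hits
        kept[feature_id] = taxon
    return kept


# Three rows copied from the table above.
taxonomy = {
    "8b0b4c610291eee36e775915f5e9878b": "Unassigned",
    "d8db14e3e3e38951ae655e5e3c85febe": (
        "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
        "o__Rickettsiales;f__Mitochondria;g__Mitochondria;s__"
    ),
    "f6bcbbb7acf068b61bd6f605e55999e6": "d__Eukaryota",
}

kept = drop_unwanted_taxa(taxonomy)
print(sorted(kept))  # only the "Unassigned" feature remains
```

In this sketch the "Unassigned" feature is kept on purpose, because I am not sure whether it should be removed at this stage.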
Secondly, some protocols suggest deleting taxonomy assignments with a confidence below 0.7. Why is 0.7 chosen as the threshold? If I delete these, should I also remove the corresponding feature IDs from the feature table and recalculate the relative abundance?
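As a sketch of this second question, the filtering I have in mind looks roughly like the following. It is plain Python; the names `filter_by_confidence` and `relative_frequency` are made up for illustration, the feature IDs and confidences come from my table, but the read counts are invented, and 0.7 is simply the threshold those protocols mention.

```python
# Feature IDs and confidences are taken from the table above.
confidence = {
    "d8db14e3e3e38951ae655e5e3c85febe": 0.932254952,
    "e2c4ae1d2d224598588bd31b839fe557": 0.758858058,
    "b4bd749754a135e16a76c8bd7aaeafe4": 0.504385444,
}

# Hypothetical read counts for one sample (invented for this example).
counts = {
    "d8db14e3e3e38951ae655e5e3c85febe": 60,
    "e2c4ae1d2d224598588bd31b839fe557": 20,
    "b4bd749754a135e16a76c8bd7aaeafe4": 20,
}


def filter_by_confidence(counts, confidence, threshold=0.7):
    """Keep only features whose classification confidence >= threshold."""
    return {fid: n for fid, n in counts.items() if confidence[fid] >= threshold}


def relative_frequency(counts):
    """Divide each count by the sample total so the values sum to 1."""
    total = sum(counts.values())
    return {fid: n / total for fid, n in counts.items()}


before = relative_frequency(counts)
after = relative_frequency(filter_by_confidence(counts, confidence))

print(before["d8db14e3e3e38951ae655e5e3c85febe"])  # 0.6
print(after["d8db14e3e3e38951ae655e5e3c85febe"])   # 0.75
```

With these invented counts, removing the low-confidence feature changes the relative frequency of the remaining features from 0.6 to 0.75, which is why I wonder whether the recalculation is expected.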
Thirdly, as we can see here, many of the classifications cannot be assigned to a specific species or genus. After I merge the taxonomy table with the relative frequency feature table in R, I will sum the relative frequencies for each genus, family, and so on across sample groups (such as tumor and healthy). When a classification like 'd__Eukaryota;p__Retaria;c__Foraminifera;o__Rotaliida' occurs, which genus should I count it under, or should I just ignore it? It is truly confusing!
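For this third question, the sketch below shows one option I am considering: summing relative frequencies at a chosen rank and placing features that stop above that rank into an "unclassified" bucket named after their deepest known rank, instead of forcing them into a genus. The function names, the `Unclassified_` prefix, the feature ID `feature_x`, and the frequencies are all my own assumptions for illustration; only the taxon strings come from my results.

```python
from collections import defaultdict


def label_at_rank(taxon, depth):
    """Label a taxon string at `depth` ranks (6 = genus in these strings).

    If the string stops early or a rank has no name (e.g. 's__'), fall back
    to 'Unclassified_<deepest named rank>'.
    """
    if taxon == "Unassigned":
        return "Unassigned"
    named = []
    for rank in taxon.split(";")[:depth]:
        rank = rank.strip()
        if rank.split("__")[-1] == "":  # prefix only, no name
            break
        named.append(rank)
    if len(named) == depth:
        return named[-1]
    if not named:
        return "Unclassified"
    return "Unclassified_" + named[-1]


def collapse(rel_freq, taxonomy, depth=6):
    """Sum relative frequencies of features sharing a label at `depth`."""
    totals = defaultdict(float)
    for feature_id, freq in rel_freq.items():
        totals[label_at_rank(taxonomy[feature_id], depth)] += freq
    return dict(totals)


mito = (
    "d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
    "o__Rickettsiales;f__Mitochondria;g__Mitochondria;s__"
)
taxonomy = {
    "d8db14e3e3e38951ae655e5e3c85febe": mito,
    "e2c4ae1d2d224598588bd31b839fe557": mito,
    # Hypothetical ID: the real feature ID for this taxon is not shown above.
    "feature_x": "d__Eukaryota;p__Retaria;c__Foraminifera;o__Rotaliida",
}
# Hypothetical relative frequencies for one sample.
rel_freq = {
    "d8db14e3e3e38951ae655e5e3c85febe": 0.5,
    "e2c4ae1d2d224598588bd31b839fe557": 0.25,
    "feature_x": 0.25,
}

collapsed = collapse(rel_freq, taxonomy, depth=6)
print(collapsed)
```

Here the Rotaliida feature ends up under `Unclassified_o__Rotaliida` rather than under any genus, but I do not know whether that is the accepted way to handle it.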
Thank you for your patience and kindness in taking the time to read my post! Your assistance is greatly appreciated.