Low Features and Unassigned Taxonomy

Hi! I have run a few analyses and have not run into this issue before. At first I thought it might be because I was running 2024.5, but I no longer think that is the cause.

I am getting very few features (47) from 122 samples, with a total frequency of 149. Previous runs gave about 2,072 features with a total frequency of 4,631,486. When I run through to classification, many features are unassigned, and some samples are blank. I tried both the pre-trained classifier provided and a classifier I trained myself on reference reads extracted with my primers. I have never run into this issue before, so I appreciate any input.

Below is the code that I ran in the terminal (conda) with qiime2-amplicon-2024.5:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path SFB_manifest.tsv --output-path reads.qza --input-format PairedEndFastqManifestPhred33V2

qiime demux summarize --i-data reads.qza --o-visualization reads-QA.qzv

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences reads.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-match-adapter-wildcards \
  --p-match-read-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences paired-end-demux-trimmed.qza

qiime demux summarize --i-data paired-end-demux-trimmed.qza --o-visualization demux_reads-QA.qzv

Below was the visualization:

[Image: Reads]

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux-trimmed.qza \
  --p-trunc-len-f 285 \
  --p-trunc-len-r 200 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

qiime feature-table summarize --i-table table.qza --o-visualization table.qzv --m-sample-metadata-file SFB_manifest.tsv

Table summary / frequency:
[Image: TableSummary]

From a different run, for comparison (run on 2024.2):
[Image: FABFrequency]

The number of reads per sample seemed OK (except for 2-3 samples). It is not as high as in previous runs but still seems adequate.
per-sample-fastq-counts_after taking out primers.txt (3.1 KB)

Classifier trained on reference reads extracted with my primers:

qiime feature-classifier extract-reads \
  --i-sequences silva-138-99-seqs.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy silva-138-99-tax.qza --o-classifier classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file SFB_manifest.tsv \
  --o-visualization taxa-bar-plots.qzv

[Image: Trainedclassifier]

With the pre-trained classifier:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy2.qza

qiime metadata tabulate \
  --m-input-file taxonomy2.qza \
  --o-visualization taxonomy2.qzv

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy2.qza \
  --m-metadata-file SFB_manifest.tsv \
  --o-visualization taxa-bar-plots2.qzv

[Image: Pretrainedclassifier]

I think the classification problem stems from the low feature count, but I have never run into this before. Happy to provide any other information!

Hi @rottaiano,

Given your quality plots, I would suggest trying different truncation lengths. This may increase the number of reads you retain and improve the taxonomic assignment. Read quality is poor toward the 3' end, which can lead to failed merges. Check the denoising stats output to see where you are losing data; usually the loss comes from failed merging or de novo chimera detection.
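As a rough way to see where reads drop out, the sketch below totals each step of the DADA2 stats table. It assumes you have already exported the stats with `qiime tools export --input-path denoising-stats.qza --output-path exported-stats`, and that the resulting `stats.tsv` uses the usual paired-end column names (`input`, `filtered`, `denoised`, `merged`, `non-chimeric`). The function name `summarize_stats` and the `exported-stats` directory are just placeholders for illustration.

```shell
#!/usr/bin/env bash
# Sketch: total the per-step read counts from an exported DADA2 stats table.
# Assumption: stats.tsv is tab-separated, with a header row followed by a
# "#q2:types" row, and uses q2-dada2's paired-end column names.

summarize_stats() {
  # $1: path to the exported stats.tsv
  awk -F'\t' '
    # Print one step: name, total reads, and percent of the input total.
    function row(name, n) {
      printf "%s\t%d\t%.1f%%\n", name, n, (input > 0 ? 100 * n / input : 0)
    }
    NR == 1 {
      # Map each header name to its column index.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    /^#/ { next }  # skip the "#q2:types" directive row
    {
      input    += $(col["input"])
      filtered += $(col["filtered"])
      denoised += $(col["denoised"])
      merged   += $(col["merged"])
      nonchim  += $(col["non-chimeric"])
    }
    END {
      row("input", input)
      row("filtered", filtered)
      row("denoised", denoised)
      row("merged", merged)
      row("non-chimeric", nonchim)
    }
  ' "$1"
}

# Example (hypothetical path):
#   summarize_stats exported-stats/stats.tsv
```

A large drop between `input` and `filtered` points at the truncation and quality filter, while a drop between `denoised` and `merged` points at insufficient overlap between the forward and reverse reads.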

You can find more on these topics by searching the forum for "unassigned taxonomy" and "truncation settings".
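When picking truncation values, two quick arithmetic checks can help, sketched below. DADA2 discards any read shorter than its truncation length, so a value longer than the primer-trimmed reads removes nearly everything; and the truncated forward and reverse reads still need to overlap (at least 12 nt by default) to merge. The function names are hypothetical, and the read and amplicon lengths you pass in are your own estimates, since the thread does not state them.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check DADA2 truncation lengths with simple arithmetic.
# All lengths are supplied by the caller; none come from the thread.

# Succeeds if a read of raw length $1, after a 5' primer of length $2 is
# trimmed off, is still at least as long as the truncation length $3.
survives_truncation() {
  local read_len=$1 primer_len=$2 trunc_len=$3
  [ $(( read_len - primer_len )) -ge "$trunc_len" ]
}

# Prints the expected overlap (in nt) between truncated forward ($1) and
# reverse ($2) reads for an amplicon of length $3 with primers removed.
expected_overlap() {
  local trunc_f=$1 trunc_r=$2 amplicon_len=$3
  echo $(( trunc_f + trunc_r - amplicon_len ))
}

# Example, assuming 300 nt raw reads and the 17 nt forward primer above:
#   survives_truncation 300 17 285 || echo "forward reads too short for 285"
```

The read length summary in the trimmed `demux summarize` visualization shows the actual lengths after cutadapt, which is a safer input to these checks than the nominal run length.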

-Mike

Thank you, will do that!