Low Features and Unassigned Taxonomy

rottaiano · September 26, 2024, 6:24pm

Hi! I have run a few analyses and have not run into this issue before. I thought it might be because I was running 2024.5, but I no longer think that is the cause.

I am getting very few features (47) from 122 samples, with a total frequency of 149; previous runs produced about 2,072 features with a total frequency of 4,631,486. When I run through to classification, many features are unassigned or even blank. I tried both the pre-trained classifier provided and training my own on reference reads extracted with my primers. I have never run into this issue before, so I appreciate any input.

Below is the code that I ran in the terminal (conda) with qiime2-amplicon-2024.5:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path SFB_manifest.tsv --output-path reads.qza --input-format PairedEndFastqManifestPhred33V2

qiime demux summarize --i-data reads.qza --o-visualization reads-QA.qzv

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences reads.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-match-adapter-wildcards \
  --p-match-read-wildcards \
  --p-discard-untrimmed \
  --o-trimmed-sequences paired-end-demux-trimmed.qza

qiime demux summarize --i-data paired-end-demux-trimmed.qza --o-visualization demux_reads-QA.qzv

Below was the visualization:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux-trimmed.qza \
  --p-trunc-len-f 285 \
  --p-trunc-len-r 200 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

qiime feature-table summarize --i-table table.qza --o-visualization table.qzv --m-sample-metadata-file SFB_manifest.tsv

Table summary/ Frequency

From a different run/ data for comparison (run on 2024.2)

Number of reads seemed OK (except for 2-3). Not as high as previous runs but still seems adequate.
per-sample-fastq-counts_after taking out primers.txt (3.1 KB)

Classifier trained on reference reads extracted with my primers:

qiime feature-classifier extract-reads \
  --i-sequences silva-138-99-seqs.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ref-seqs.qza --i-reference-taxonomy silva-138-99-tax.qza --o-classifier classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file SFB_manifest.tsv \
  --o-visualization taxa-bar-plots.qzv

With the pre-trained classifier:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy2.qza

qiime metadata tabulate \
  --m-input-file taxonomy2.qza \
  --o-visualization taxonomy2.qzv

qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy2.qza \
  --m-metadata-file SFB_manifest.tsv \
  --o-visualization taxa-bar-plots2.qzv

I think the classification issue stems from the low feature count, but I have never run into this before. Happy to provide any other information!

SoilRotifer · September 26, 2024, 6:44pm

Hi @rottaiano,

Given your quality plots, I would suggest trying different values for your truncation length settings. This might help increase the data you retain as well as improve the taxonomic assignment. The quality of the reads is poor towards the 3' end, which can lead to failed merges. Look at the denoising stats output file for more details on where you are losing data; usually this is related to failed merges and de novo chimera detection.
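As a rough sketch of that check: the stats can be viewed in the browser with `qiime metadata tabulate --m-input-file denoising-stats.qza --o-visualization denoising-stats.qzv`, or exported and totalled in the shell. The snippet below assumes the exported table is a tab-separated file with header columns named `input`, `filtered`, `denoised`, `merged`, and `non-chimeric`, and that the export writes it as `stats.tsv`. The function name `summarize_denoising` and the export directory name are made up for illustration.

```shell
# Hypothetical helper: total the reads that survive each DADA2 step.
# Assumes a tab-separated file whose header row contains the columns
# input, filtered, denoised, merged, non-chimeric (looked up by name).
summarize_denoising() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header name -> column index
    /^#/    { next }                                         # skip the "#q2:types" directive row
    {
      input    += $(col["input"])
      filtered += $(col["filtered"])
      denoised += $(col["denoised"])
      merged   += $(col["merged"])
      nonchim  += $(col["non-chimeric"])
    }
    END {
      printf "input\t%d\n", input
      printf "filtered\t%d\n", filtered
      printf "denoised\t%d\n", denoised
      printf "merged\t%d\n", merged
      printf "non-chimeric\t%d\n", nonchim
    }
  ' "$1"
}

# Example usage (commented out so the block has no side effects without QIIME 2;
# the output directory and stats.tsv file name are assumptions):
# qiime tools export --input-path denoising-stats.qza --output-path denoising-stats-export
# summarize_denoising denoising-stats-export/stats.tsv
```

The step with the largest drop between consecutive rows is where to look first: a big loss at `filtered` points at the truncation and quality filtering settings, a big loss at `merged` points at insufficient overlap.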

You can find more on these topics if you search the forum for unassigned taxonomy and truncation settings.
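On truncation settings specifically, two constraints pull in opposite directions: `denoise-paired` discards any read shorter than its truncation length, and the truncated forward and reverse reads together still have to span the amplicon with enough overlap to merge (DADA2's default minimum is 12 bases). A minimal sketch of that arithmetic follows. The function name `check_trunc` is hypothetical, and the read and amplicon lengths in the usage line are placeholders, not values from this dataset; read the real ones from the demux summary and from the expected amplicon size for your primers.

```shell
# Hypothetical helper: sanity-check DADA2 truncation lengths.
# Usage: check_trunc READ_LEN_F READ_LEN_R TRUNC_F TRUNC_R AMPLICON_LEN [MIN_OVERLAP]
# All lengths are in bases and refer to reads/amplicon AFTER primer removal.
check_trunc() {
  local read_f=$1 read_r=$2 trunc_f=$3 trunc_r=$4 amplicon=$5 min_overlap=${6:-12}
  local rc=0

  # Reads shorter than the truncation length are discarded, not padded.
  if [ "$trunc_f" -gt "$read_f" ]; then
    echo "forward: trunc-len $trunc_f exceeds read length $read_f; reads would be discarded"
    rc=1
  fi
  if [ "$trunc_r" -gt "$read_r" ]; then
    echo "reverse: trunc-len $trunc_r exceeds read length $read_r; reads would be discarded"
    rc=1
  fi

  # Bases shared by the two truncated reads once laid over the amplicon.
  local overlap=$(( trunc_f + trunc_r - amplicon ))
  if [ "$overlap" -lt "$min_overlap" ]; then
    echo "overlap: $overlap is below the minimum of $min_overlap; merging would likely fail"
    rc=1
  fi

  if [ "$rc" -eq 0 ]; then
    echo "ok: overlap $overlap"
  fi
  return "$rc"
}

# Example usage with placeholder lengths (commented out; substitute your own):
# check_trunc 283 279 260 200 420
```

Amplicon length varies between taxa, so it is worth leaving some margin above the minimum overlap rather than aiming for exactly 12 bases.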

-Mike

rottaiano · September 26, 2024, 7:19pm

Thank you, will do that!
