Almost all reads assigned to 1 sample?

I’ve been loosely following the Moving Pictures tutorial for both an ITS and a 16S library (MiSeq, multiplexed, 300 bp single-end), with a Keemei-checked mapping file containing the correct barcodes. We previously ran these same libraries through a different pipeline successfully, so we know roughly what the data should look like (and that the data is good).

However, in our Qiime2 results for both libraries, most of the sequences appear to be in only one sample, with a few sequences appearing exactly once in a few other samples. Based on our older analysis, there are 96 samples in total, and they should have a relatively even spread of reads from hundreds of OTUs.

We didn’t trim primers/adapters, since the MiSeq data doesn’t include them in the sequences. (Although I did see a post where someone's MiSeq data had both forward and reverse primers included?)

I'm wondering if I'm misunderstanding a step, or if there are actually primers/adapters in our sequences that are causing DADA2 to put almost all sequences in one sample. Thank you in advance for any insights!

Here are the Qiime2 commands I used:

qiime tools import \
  --type MultiplexedSingleEndBarcodeInSequence \
  --input-path L1P1.fastq.gz \
  --output-path multiplexed-seqs.qza

qiime cutadapt demux-single \
  --i-seqs multiplexed-seqs.qza \
  --m-barcodes-file map_k.tsv \
  --m-barcodes-column BarcodeSequence \
  --p-error-rate 0 \
  --o-per-sample-sequences demultiplexed-seqs.qza \
  --o-untrimmed-sequences untrimmed.qza

qiime demux summarize \
  --i-data demultiplexed-seqs.qza \
  --o-visualization demultiplexed-seqs.qzv

# The quality was good across the full read length, so I set the truncation length to the maximum it would allow: 298 bp.
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demultiplexed-seqs.qza \
  --p-trim-left 0 \
  --p-trunc-len 298 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza

qiime feature-table tabulate-seqs \
  --i-data rep-seqs-dada2.qza \
  --o-visualization rep-seqs.qzv

qiime tools export \
  --input-path table-dada2.qza \
  --output-path exported-feature-table

# Summarize the feature table as a .qzv visualization. This gives per-sample and per-feature counts.
qiime feature-table summarize \
  --i-table table-dada2.qza \
  --o-visualization table.qzv \
  --m-sample-metadata-file map.tsv

Hello @ahale004,

This is neither a dada2 nor a primer/adapter issue, but an importing issue. The importing step is what recognizes samples and assigns sequences to them. I would recommend reviewing the importing docs.

Unless you mean that dada2 is filtering features from most samples, but that's another issue.

Thanks! I'm having trouble describing the issue!

I think it could be that dada2 is filtering features from most samples. I'm not sure how to describe it accurately, so I took screenshots of both the Qiime2 output and data from the same library (but different pipeline).

It seems like one sample has about two orders of magnitude more reads than expected (~10^6 instead of ~10^4 for the most abundant taxa), while most of the other samples have no reads. A few samples have fewer than a thousand reads for the most abundant sequences.

Hello @ahale004,

From which step(s) in your analysis pipeline are the above feature tables?

The non-Qiime screenshot is the final output of the UPARSE pipeline, after assigning taxonomy and cleaning up PhiX, chimeras etc.

The qiime2 feature table screenshot is from after DADA2, created by converting the table-dada2.qza to biom to tsv:

qiime tools export --input-path table-dada2.qza --output-path table
biom convert -i table.biom -o table.from_biom.txt --to-tsv
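
To compare the two pipelines on per-sample read depth rather than by eye, the converted table can be summed column by column. This is only a sketch: `per_sample_totals` is a made-up helper name, and it assumes the usual `biom convert --to-tsv` layout, where a `#OTU ID` header line lists the sample IDs and every other `#` line is a comment.

```shell
# Sum the counts in each sample column of a biom-derived TSV table.
# Assumed layout: a "#OTU ID" header row with sample IDs, then one row per feature.
per_sample_totals() {
    awk -F'\t' '
        /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
        /^#/       { next }                       # skip other comment lines
                   { for (i = 2; i <= NF; i++) total[i] += $i }
        END        { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}
```

Running `per_sample_totals table.from_biom.txt` prints one line per sample with its total read count, which makes it easy to see how many samples ended up with zero or near-zero reads.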

Hello @ahale004,

Can you share your dada2 stats archive and your demux visualizer?
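
For anyone following along, the same per-sample stats can be checked locally. A minimal sketch, assuming the archive was exported with `qiime tools export --input-path stats-dada2.qza --output-path exported-stats` and that the resulting tab-separated file has `sample-id`, `input`, and `filtered` as its first three columns. The function name and the `exported-stats` path are made up for illustration.

```shell
# Print sample-id, input reads, reads passing the filter, and percent retained
# for each sample in an exported DADA2 stats table (tab-separated).
# Assumed columns: 1 = sample-id, 2 = input, 3 = filtered.
dada2_filter_summary() {
    awk -F'\t' '
        NR == 1 || /^#/ { next }     # skip the header row and the "#q2:types" row
        {
            pct = ($2 > 0) ? 100 * $3 / $2 : 0
            printf "%s\t%d\t%d\t%.1f\n", $1, $2, $3, pct
        }
    ' "$1"
}
```

Calling `dada2_filter_summary exported-stats/stats.tsv` would then show at a glance whether reads are being lost at the filtering step or only later, during denoising or chimera removal.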

Hi, yes!
dada2 stats archive:
stats-dada2.qza (19.3 KB)

demux visualizer:
demultiplexed-seqs.qzv (302.8 KB)

Hello @ahale004,

It looks like your dada2 run filtered out most of your reads. You'll need to go back to that step and re-evaluate the parameters you chose.
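
One parameter worth checking first is `--p-trunc-len`: DADA2 discards any read shorter than the truncation length, so a value close to the full read length can drop most reads if the demultiplexed reads are even slightly shorter (for example, after barcodes have been trimmed off). The sketch below is a hypothetical helper, not part of QIIME 2, that counts how many reads in a FASTQ stream are at least a given length. It assumes standard four-line FASTQ records on standard input.

```shell
# Count reads that are at least MIN bases long versus shorter ones.
# Usage: count_reads_by_length MIN < reads.fastq
count_reads_by_length() {
    awk -v min="$1" '
        NR % 4 == 2 {                # the sequence line of each 4-line record
            if (length($0) >= min) keep++; else drop++
        }
        END { printf "kept\t%d\ndiscarded\t%d\n", keep, drop }
    '
}
```

For a per-sample file obtained by exporting `demultiplexed-seqs.qza` with `qiime tools export`, something like `gzip -cd exported-demux/SAMPLE.fastq.gz | count_reads_by_length 298` (the path is a placeholder) shows how many reads would survive truncation at 298. If most reads land in `discarded`, a shorter truncation length is the first thing to try.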


Thank you Colin!! I will go troubleshoot that next.
