Hello
@Afaf and I are also having an issue with demultiplexing (or possibly denoising) paired-end reads. We used qiime cutadapt demux-paired in qiime2-2022.8, and we are looking for some guidance.
Briefly, we received multiplexed forward and reverse reads plus metadata files (.txt) with barcodes for the ITS and 16S amplicons. We started with the 16S, which was amplified with the EMP primers 515f and 806r. Because we do not have a fastq file for the barcodes, we imported the data as paired-end multiplexed data with barcodes in the sequences and demultiplexed via cutadapt. The adapters had already been trimmed off by the sequencing facility, so no adapter trimming was performed.
Import code:
qiime tools import \
--type MultiplexedPairedEndBarcodeInSequence \
--input-path muxed-pe-barcode-in-seq \
--output-path multiplexed-seqs.qza
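For reference, the MultiplexedPairedEndBarcodeInSequence import expects the --input-path directory to contain exactly two files named forward.fastq.gz and reverse.fastq.gz. A minimal sketch of the layout we used (empty placeholder files here, just to show the structure):

```shell
# Layout expected by --type MultiplexedPairedEndBarcodeInSequence:
# one directory holding forward.fastq.gz and reverse.fastq.gz.
mkdir -p muxed-pe-barcode-in-seq
touch muxed-pe-barcode-in-seq/forward.fastq.gz \
      muxed-pe-barcode-in-seq/reverse.fastq.gz
ls muxed-pe-barcode-in-seq
```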
Demux code:
qiime cutadapt demux-paired \
--i-seqs multiplexed-seqs.qza \
--m-forward-barcodes-file barcodes.txt \
--m-forward-barcodes-column BarcodeSequence \
--o-per-sample-sequences 16S_per_sample_sequences.qza \
--o-untrimmed-sequences untrimmed.qza  # output names reconstructed; untrimmed.qza is a placeholder
https://view.qiime2.org/visualization/?type=html&src=09c6d6cd-b1d7-4553-8da4-d6c4e568d125
Although there is some variability in the number of reads from sample to sample (~15,000 down to ~600 across 96 samples), we proceeded with denoising in DADA2 with limited truncation.
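As a quick sanity check on that variability, per-sample read counts can be computed directly from the demuxed fastq.gz files (each record is four lines). A rough sketch using a made-up two-read sample.fastq.gz as a stand-in for one extracted per-sample file:

```shell
# Sketch: count reads in one demuxed fastq.gz (4 lines per record).
# sample.fastq.gz is a stand-in for an extracted per-sample file.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n' | gzip > sample.fastq.gz
echo $(( $(gzip -dc sample.fastq.gz | wc -l) / 4 ))  # prints 2
```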
Denoising code:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs 16S_per_sample_sequences.qza \
--p-n-threads 28 \
--p-trim-left-f 0 \
--p-trim-left-r 0 \
--p-trunc-len-f 248 \
--p-trunc-len-r 220 \
--o-representative-sequences 16S_rep-seqs-dada2_f.qza \
--o-table 16S_table-dada2_f.qza \
--o-denoising-stats 16S_stats-dada2_f.qza \
--verbose
But it looks like lots of reads were lost at the filtering step.
Example:
sample-id | input | filtered | denoised | merged | non-chimeric |
---|---|---|---|---|---|
#q2:types | numeric | numeric | numeric | numeric | numeric |
1.0 | 26123 | 298 | 298 | 119 | 119 |
1.1 | 26844 | 17 | 17 | 0 | 0 |
1.2 | 9459 | 0 | 0 | 0 | 0 |
1.3 | 7119 | 0 | 0 | 0 | 0 |
10.0 | 49600 | 37 | 37 | 0 | 0 |
10.1 | 2664 | 0 | 0 | 0 | 0 |
10.2 | 19477 | 132 | 132 | 37 | 37 |
10.3 | 14230 | 0 | 0 | 0 | 0 |
11.0 | 20846 | 2742 | 2742 | 2007 | 2007 |
11.1 | 23153 | 24 | 24 | 0 | 0 |
11.2 | 5627 | 7 | 7 | 0 | 0 |
11.3 | 2755 | 0 | 0 | 0 | 0 |
Our concerns:

1) It does not seem that the reads are all that bad, so we're not sure why the filtering step is discarding so many of them (all of the reads, in some cases). We are re-running the denoising step with relaxed --p-max-ee values to test this, and again with no truncation in either direction.

2) This makes us wonder whether something is wrong with our demultiplexing or importing steps. The primers used were from the Earth Microbiome Project (suggesting that we should have imported using --type EMPPairedEndSequences), but no barcodes.fastq file was provided, which is why we went with --type MultiplexedPairedEndBarcodeInSequence instead.

3) There seems to be high and unexpected read-count variability across the samples. This makes me think that something in the demux or import step was incorrect and reads are being assigned to the wrong samples. Could this happen?
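One diagnostic that might separate these possibilities: after barcode removal, reads in each per-sample file should still begin with the 515f primer (GTGYCAGCMGCCGCGGTAA); misassigned or frame-shifted reads would fail this check. A rough sketch with a made-up one-read demuxed.fastq standing in for an extracted per-sample file (the degenerate bases Y and M are matched with [CT] and [AC]):

```shell
# Sketch: count reads whose sequence line starts with the 515f primer.
# demuxed.fastq is a stand-in for one extracted per-sample file.
printf '@r1\nGTGTCAGCAGCCGCGGTAATTCC\n+\nIIIIIIIIIIIIIIIIIIIIIII\n' > demuxed.fastq
awk 'NR % 4 == 2' demuxed.fastq | grep -cE '^GTG[CT]CAGC[AC]GCCGCGGTAA'  # prints 1
```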
So, to sum it up: is this a read-quality issue or an import/demultiplexing issue?
Thanks for bearing with me on this super long query, apologies if this is covered elsewhere in the forum, and thanks in advance for any insight!
Laura