Demultiplex question

I’m trying one of the tutorials (Atacama soil microbiome) and found something strange during the demultiplex step.

In the metadata file, I picked a sample at random, BAQ3473.3, whose barcode is AACCTCGGATAA, and took its reverse complement (TTATCCGAGGTT).

I then searched the barcodes.fastq file for the reverse-complement barcode (TTATCCGAGGTT) and got 10965 hits.
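
For reference, the manual check described above amounts to something like the following Python sketch. The function names are made up for illustration. It assumes barcodes.fastq is an uncompressed FASTQ file with four lines per record, and it counts exact matches only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_exact_barcode_hits(fastq_lines, barcode):
    """Count records whose sequence line equals `barcode` exactly.

    `fastq_lines` is any iterable of lines, e.g. an open file handle.
    Assumes the standard 4-line FASTQ layout, so the sequence is the
    second line of every record.
    """
    hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1 and line.strip() == barcode:
            hits += 1
    return hits


# Barcode for BAQ3473.3 as listed in the metadata file.
query = reverse_complement("AACCTCGGATAA")
print(query)  # TTATCCGAGGTT

# Hypothetical usage against the tutorial file:
# with open("barcodes.fastq") as handle:
#     print(count_exact_barcode_hits(handle, query))
```

Because only exact matches are counted, any barcode read containing a sequencing error is missed by this kind of search.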

However, in the demux-full.qza (demux-full.qzv), the number of sequences for sample BAQ3473.3 is 12991.

Why don't the two numbers match?

I tried another sample, BAQ1370.1.2. Searching barcodes.fastq for its reverse-complement barcode returns no hits, yet demux-full.qzv reports 16 sequences for it.

Can anyone explain it? Thanks.

I’m going to go out on a limb and say it is a function of the error correction of the barcodes.


Hi Jordan, can you explain a little more? It doesn't make sense to me that the number of sequences for a given sample after demultiplexing is larger than the number of barcode hits in the raw file.

If there is some kind of error correction, the number of per sample sequences should be smaller than the number of hits in the raw barcodes file.

Thank you.

Dear Eric,

We made the same observation using qiime cutadapt demux-single with IonTorrent data. Like you, we checked several runs against QIIME 1, another demultiplexing tool, and the raw file, and we always found more hits with the QIIME 2 tool. See: Problem with cutadapt demultiplexing of IonTorrent data (sorry, I made the text too long for a forum discussion).

The logic actually runs the other way. The EMP protocol uses carefully designed Golay barcodes, which allow a certain amount of error correction and so rescue reads that would otherwise go unassigned. That is why the demultiplexed count can be higher than the number of exact matches. I suspect that if you reran the command with error correction disabled (--p-no-golay-error-correction), your results would match your manual search.

You can also dig inside the error-correction details artifact, where you will find something like this, which confirms it:
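
To make the idea concrete, here is a rough Python sketch of how a barcode read with a few errors can still be assigned to a sample. It is not the decoder QIIME 2 uses: real Golay decoders work on the code structure, while this is a brute-force nearest-barcode search. The 2-bit nucleotide encoding and the function names are assumptions made for illustration. The limit of 3 reflects the fact that the extended Golay (24,12) code can correct up to 3 bit errors.

```python
# Assumed 2-bit encoding of nucleotides (an assumption for this sketch).
NT_TO_BITS = {"A": "11", "C": "00", "T": "10", "G": "01"}


def to_bits(barcode):
    """Encode a 12-nt barcode as a 24-character bit string."""
    return "".join(NT_TO_BITS[nt] for nt in barcode)


def bit_errors(observed, expected):
    """Number of differing bits between two equal-length barcodes."""
    if len(observed) != len(expected):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(to_bits(observed), to_bits(expected)))


def assign_barcode(observed, known_barcodes, max_bit_errors=3):
    """Brute-force stand-in for Golay correction.

    Returns (barcode, errors) for the closest known barcode if it is
    within `max_bit_errors` bits, otherwise (None, None).
    """
    best = min(known_barcodes, key=lambda known: bit_errors(observed, known))
    errors = bit_errors(observed, best)
    if errors <= max_bit_errors:
        return best, errors
    return None, None


# One known barcode (BAQ3473.3, reverse complemented) and two reads:
known = ["TTATCCGAGGTT"]
print(assign_barcode("TTATCCGAGGTT", known))  # exact match, 0 errors
print(assign_barcode("TTATCCGAGGGT", known))  # one nucleotide off
```

An exact-match search would count only the first read, while an error-tolerant assignment counts both, which is the direction of the discrepancy reported above.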

library(tidyverse)
# details.tsv comes from the unzipped demux-details.qza artifact
dat <- read_tsv("~/Downloads/769bdf89-6264-4d59-9b5f-a4d7479784d4/data/details.tsv")

> dat %>% filter(sample=="BAQ3473.3")
# A tibble: 12,991 x 6
   id          sample   `barcode-sequence-id`                   `barcode-uncorrect… `barcode-correct… `barcode-errors`
   <chr>       <chr>    <chr>                                   <chr>               <chr>             <chr>           
 1 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1432… TTATCCGCAGTT        TTATCCGAGGTT      3               
 2 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1484… TTATCCGAGGTT        TTATCCGAGGTT      0               
 3 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1249… TTATATGAGGTT        TTATCCGAGGTT      3               
 4 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1157… TTATCCGAGGGT        TTATCCGAGGTT      2               
 5 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1425… TTATCCGAGGTT        TTATCCGAGGTT      0               
 6 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1103… TTATCCGAGGTT        TTATCCGAGGTT      0               
 7 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:2041… TTATCCGAGGTT        TTATCCGAGGTT      0               
 8 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1797… TTATCCGAGGTT        TTATCCGAGGTT      0               
 9 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1003… TTATCCGAGTTT        TTATCCGAGGTT      2               
10 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1831… TTATCCGAGGTT        TTATCCGAGGTT      0               
# … with 12,981 more rows

Now, if we keep only the reads with 0 barcode errors, we are left with 10,965 rows, the same number as your manual search:

> dat %>% filter(sample=="BAQ3473.3") %>% filter(`barcode-errors`==0)
# A tibble: 10,965 x 6
   id          sample   `barcode-sequence-id`                   `barcode-uncorrect… `barcode-correct… `barcode-errors`
   <chr>       <chr>    <chr>                                   <chr>               <chr>             <chr>           
 1 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1484… TTATCCGAGGTT        TTATCCGAGGTT      0               
 2 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1425… TTATCCGAGGTT        TTATCCGAGGTT      0               
 3 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1103… TTATCCGAGGTT        TTATCCGAGGTT      0               
 4 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:2041… TTATCCGAGGTT        TTATCCGAGGTT      0               
 5 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1797… TTATCCGAGGTT        TTATCCGAGGTT      0               
 6 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1831… TTATCCGAGGTT        TTATCCGAGGTT      0               
 7 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1196… TTATCCGAGGTT        TTATCCGAGGTT      0               
 8 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1334… TTATCCGAGGTT        TTATCCGAGGTT      0               
 9 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1713… TTATCCGAGGTT        TTATCCGAGGTT      0               
10 record-000… BAQ3473… @M00176:65:000000000-A41FR:1:1101:1917… TTATCCGAGGTT        TTATCCGAGGTT      0               
# … with 10,955 more rows
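
If you prefer Python to R, the same tally can be sketched with the standard library alone. This is only an illustration: the function name is made up, the toy table below is invented data rather than rows from the tutorial, and the only column names it relies on are `sample` and `barcode-errors`, as they appear in the output above.

```python
import csv
import io
from collections import Counter


def tally_barcode_errors(tsv_handle, sample_id):
    """Count reads for one sample, grouped by the `barcode-errors` value.

    Values are kept as strings because the column is read as text.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["sample"] == sample_id:
            counts[row["barcode-errors"]] += 1
    return counts


# Toy stand-in for details.tsv (invented rows, two columns only).
toy_tsv = io.StringIO(
    "sample\tbarcode-errors\n"
    "S1\t0\n"
    "S1\t0\n"
    "S1\t2\n"
    "S2\t0\n"
    "S1\t3\n"
)
counts = tally_barcode_errors(toy_tsv, "S1")
print("exact matches:", counts["0"])
print("all assigned reads:", sum(counts.values()))

# Hypothetical usage on the real file:
# with open("details.tsv", newline="") as handle:
#     print(tally_barcode_errors(handle, "BAQ3473.3"))
```

The reads with a non-zero error count are the ones that an exact-match search of barcodes.fastq would not find.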