Duplicate barcodes on different sequencing runs

Hi Colin, Greg, Matthew and Jai,

I saw your reply under the post "Duplicate barcodes an issue?". I have a similar issue.

I have a project in which samples were sequenced on multiple runs (e.g., 20 runs), which means some samples share the same barcode sequences across runs.

I am wondering whether I can generate one big mapping file with all samples included, import everything into QIIME 2 at once, and continue the standard QIIME 2 pipeline, rather than importing and running DADA2 20 times (once per sequencing run) and combining the results later. I heard that this approach worked because every sample in the big mapping file has a unique sample ID in front of the barcode sequence column.

My concern is that if barcodes are duplicated, reads might be assigned to the wrong samples during demultiplexing. At the same time, I would prefer to process these samples consistently (the same quality-trimming cut-off, the same sampling depth), since they are all from the same project.

Since you are all really experienced, could I get your opinions on this?

Thank you so much in advance!
Best Regards,
Godric Wang


Hello!

If you are going to use DADA2, it is highly recommended to denoise each sequencing run/lane separately. Otherwise, mixing runs/lanes will confound the error-learning step and bias your data.
So the better approach is to process each run/lane separately with DADA2, using the same (!) parameters, and then merge the outputs (feature tables and rep-seqs). Regarding duplicated barcodes: if you still need to demultiplex your reads, you have no option other than processing each run separately; if they are already demultiplexed, the duplicates do not matter at this step.
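For the merging step, a minimal sketch with the QIIME 2 CLI could look like the following (the per-run artifact names such as run1-table.qza are placeholders for your own outputs; extend the repeated options out to all 20 runs):

```bash
# Merge the per-run feature tables into a single table
qiime feature-table merge \
  --i-tables run1-table.qza \
  --i-tables run2-table.qza \
  --i-tables run3-table.qza \
  --o-merged-table merged-table.qza

# Merge the per-run representative sequences
qiime feature-table merge-seqs \
  --i-data run1-rep-seqs.qza \
  --i-data run2-rep-seqs.qza \
  --i-data run3-rep-seqs.qza \
  --o-merged-data merged-rep-seqs.qza
```

The merged table and rep-seqs can then go into your downstream steps (taxonomy, diversity) with one consistent sampling depth for the whole project.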

P.S. If you are familiar with bash or Python scripting, you can write a script that denoises each run in a loop, so you don't need to run it manually 20 times.
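For example, a minimal bash sketch, assuming each run's demultiplexed paired-end reads have already been imported as run<N>-demux.qza and that the trim/truncation values shown are placeholders you would replace with your own choices after inspecting the quality plots:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Denoise every run with the SAME parameters, writing per-run outputs
for i in $(seq 1 20); do
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "run${i}-demux.qza" \
    --p-trim-left-f 0 \
    --p-trim-left-r 0 \
    --p-trunc-len-f 240 \
    --p-trunc-len-r 200 \
    --o-table "run${i}-table.qza" \
    --o-representative-sequences "run${i}-rep-seqs.qza" \
    --o-denoising-stats "run${i}-stats.qza"
done
```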

