Same Sample Different Barcodes: How are barcodes used aside from demultiplexing?

Joseph_Sevigny · February 15, 2017, 5:06pm

For much of the data we are generating (18S paired-end Illumina), we have some concerns regarding the barcodes.

1.) We generated two libraries for sequencing which included the same samples, but the barcodes are not necessarily the same from sample to sample. We therefore have cases where the same sample has different barcodes, and in some cases different samples have the same barcode.

The reads are already demultiplexed, so I do not think the barcodes are necessary for anything downstream in QIIME; however, I understand that the mapping file requires unique barcodes for each sample. Aside from linking the mapping file with the name of the fastqs (M36_ATACCTTCGGTA_L001_R2_001.fastq.gz) are the barcodes used for anything? I.e. if I simply ensure the names of the reads have unique barcodes that match the mapping file samples will that be enough?

I was thinking I would just concatenate the reads for the multiple runs that belong to the same sample. Will the fact that the headers for these concatenated reads have different barcodes throughout the fastq files mess anything up? In addition, the barcode at the end of the fastq headers may not match the barcode in the fastq file name.

Thank you for your time (I apologize if that wasn't clear).

gregcaporaso · February 15, 2017, 11:02pm

Hi @Joseph_Sevigny,
Replies are inline below:

Aside from linking the mapping file with the name of the fastqs (M36_ATACCTTCGGTA_L001_R2_001.fastq.gz) are the barcodes used for anything? I.e. if I simply ensure the names of the reads have unique barcodes that match the mapping file samples will that be enough?

The barcodes are not used for anything other than demultiplexing. In QIIME 2, you can leave the BarcodeSequence column out of the sample metadata mapping file, and you will not experience any issues.
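To make the role of the file name concrete, here is a minimal sketch that splits a demultiplexed fastq name such as the one quoted above into its fields. It assumes the names follow the pattern sample ID, barcode, lane, read direction, set number; `parse_fastq_name` is a hypothetical helper written for illustration and is not part of QIIME.

```python
import re

# Assumed layout: <sample-id>_<barcode>_L<lane>_R<direction>_<set>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample_id>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d+)"
    r"_R(?P<direction>[12])_(?P<set_number>\d+)\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split a demultiplexed fastq file name into its named fields.

    Hypothetical helper for illustration only; raises ValueError when
    the name does not follow the assumed layout.
    """
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected fastq file name: {filename!r}")
    return match.groupdict()


fields = parse_fastq_name("M36_ATACCTTCGGTA_L001_R2_001.fastq.gz")
# The sample ID is what ties the file to the metadata; the barcode
# field is only a label once the reads are already demultiplexed.
print(fields["sample_id"], fields["barcode"], fields["direction"])
```

In this sketch the barcode is carried along purely as a string in the name, which matches the point above that it plays no further role after demultiplexing.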

I was thinking I would just concatenate the reads for the multiple runs that belong to the same sample. Will the fact that the headers for these concatenated reads have different barcodes throughout the fastq files mess anything up? In addition, the barcode at the end of the fastq headers may not match the barcode in the fastq file name.

It sounds like these data were generated in different sequencing runs, or in different lanes of the same sequencing run (let me know if that's not right). In this case, you need to process these independently, through separate runs of dada2 denoise. My understanding is that if you were to do what you're proposing, the error model generated by DADA2 would not be correct.

We don't currently have support in QIIME 2 for combining samples that are generated in different runs/lanes. We illustrate how to combine data from different lanes/runs in the FMT tutorial, but this does assume that each sample is present in only one of the lanes/runs. So, for the moment, you'd either have to (1) generate your feature table in QIIME 1 and then import it if you'd like to use QIIME 2, or (2) analyze the samples that were sequenced in multiple lanes independently (e.g., as replicated samples). I've created an issue to port the sample collapsing functionality from QIIME 1 to QIIME 2. You can track progress on this here. Sorry to not have a better answer for you on this right now.
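For readers wondering what sample collapsing amounts to, the following is a rough sketch in plain Python, not the QIIME 1 or QIIME 2 implementation. It assumes each run has already been processed on its own, that per-run sample IDs such as `M36.run1` are made-up names, and that summing counts is the desired way to combine them (other choices are possible). `collapse_samples` here is a hypothetical function.

```python
from collections import Counter, defaultdict


def collapse_samples(table, sample_to_group):
    """Sum feature counts across per-run samples of the same sample.

    table: {per_run_sample_id: {feature_id: count}}
    sample_to_group: {per_run_sample_id: collapsed_sample_id}
    Hypothetical illustration; summing is only one way to collapse.
    """
    collapsed = defaultdict(Counter)
    for sample_id, counts in table.items():
        group = sample_to_group[sample_id]
        # Counter.update adds counts rather than replacing them.
        collapsed[group].update(counts)
    return {group: dict(counts) for group, counts in collapsed.items()}


# Made-up counts: sample M36 was sequenced in two runs, M40 in one.
table = {
    "M36.run1": {"featA": 10, "featB": 2},
    "M36.run2": {"featA": 5, "featC": 1},
    "M40.run1": {"featB": 7},
}
sample_to_group = {"M36.run1": "M36", "M36.run2": "M36", "M40.run1": "M40"}

print(collapse_samples(table, sample_to_group))
```

Giving each run's copy of a sample its own ID keeps the runs separable up to this step, so they can still be compared as replicates before deciding whether to collapse them.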