Analysis code/pipeline for Barcoded Paired-End Reads

DNA · February 28, 2020, 1:11am

I am trying to extract the first ~25 bases from every read in R1 (those 25 bases are the barcode) along with the read name, keeping the output in FASTQ format. This should generate N FASTQ files, where N is the number of barcodes. Then I need to find the mate of each of these reads in R2 and split R2 accordingly.

Finally, I want to parse the R1 and R2 files and check whether the paired reads match (meaning they have the same barcode).

Is there a tool for this? If not, how would I best code it?
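
If no ready-made tool fits, the splitting step could be sketched in plain Python roughly as follows. This is only a minimal sketch under some assumptions that are not stated in the thread: both files are uncompressed four-line FASTQ, R1 and R2 list their reads in the same order (as Illumina output normally does), and the barcode is exactly the first `barcode_len` bases of R1. The function names `read_fastq`, `read_name`, and `split_pairs_by_barcode` are made up for illustration.

```python
import os
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from an open FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def read_name(header):
    """Return the read name without the leading '@' or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs_by_barcode(r1_path, r2_path, out_dir, barcode_len=25):
    """Write each read pair to <barcode>_R1.fastq and <barcode>_R2.fastq.

    Assumes R1 and R2 are in the same order; raises ValueError if the
    read names at the same position disagree.
    Returns a dict mapping barcode -> number of pairs written.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    with open(r1_path) as r1, open(r2_path) as r2:
        for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
            if read_name(rec1[0]) != read_name(rec2[0]):
                raise ValueError(f"R1/R2 out of sync at {rec1[0]}")
            barcode = rec1[1][:barcode_len]  # barcode stays in the read, untrimmed
            counts[barcode] = counts.get(barcode, 0) + 1
            for tag, rec in (("R1", rec1), ("R2", rec2)):
                path = os.path.join(out_dir, f"{barcode}_{tag}.fastq")
                # Append mode keeps the number of simultaneously open files small.
                with open(path, "a") as out:
                    out.write("\n".join(rec) + "\n")
    return counts
```

Calling `split_pairs_by_barcode("R1.fastq", "R2.fastq", "split")` would create one R1 file and one R2 file per distinct barcode. Because the files are opened in append mode, the output directory should be empty before a rerun. Reopening a file for every read is slow on large runs, and sequencing errors will make every mismatched barcode its own file, so a real pipeline would probably want a known barcode list and some error tolerance.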

colinbrislawn · February 28, 2020, 1:16am

Good evening @DNA,

Welcome to the forums! :qiime2:

What paper is this barcoding format based on? Did they provide guidance on demultiplexing?

I'm not sure there is a perfect way to do this within QIIME 2, but the cutadapt plugin should be a good place to start. It works with paired-end data.

Colin

P.S. I've removed your second post on the other thread. It's probably best to start a new thread, and I can help you over here!

DNA · February 28, 2020, 3:26am

Thanks for the help! The paper has not been submitted yet, so I have no information about the nature of the barcodes.

All I know is that the barcodes are inline (in the read sequence, not in the header or sequence name). The barcode does not need to be trimmed or cut; it is only used to sort and identify R1 and R2 reads that have the same barcode.
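
To make the "same barcode" check concrete, here is a minimal Python sketch that counts how many read pairs carry an identical barcode in R1 and R2. It rests on an assumption the thread does not confirm: that R2 also starts with the barcode, in the same orientation as R1. It further assumes uncompressed four-line FASTQ with both files in the same read order. The names `read_fastq` and `count_barcode_agreement` are hypothetical.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from an open FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield tuple(record)


def count_barcode_agreement(r1_path, r2_path, barcode_len=25):
    """Return (matched, mismatched) pair counts.

    A pair counts as matched when the first barcode_len bases of R1 and R2
    are identical. ASSUMPTION: R2 carries the barcode at its 5' end in the
    same orientation as R1; adjust the slicing if the real layout differs.
    """
    matched = mismatched = 0
    with open(r1_path) as r1, open(r2_path) as r2:
        for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
            if rec1[1][:barcode_len] == rec2[1][:barcode_len]:
                matched += 1
            else:
                mismatched += 1
    return matched, mismatched
```

The comparison is exact, so a single sequencing error in either barcode counts as a mismatch. If R2 does not contain the barcode at all, this check does not apply, and pairing by read name is the only way to link mates.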
