Importing multiplexed paired-end data with separated barcode sequence files

I’m trying to import multiplexed 16S paired-end MiSeq data, but the barcode sequence files are not in the form I typically see: the forward and reverse barcodes are in separate files rather than combined as in the Q2 tutorials.
I have:
R1.fastq <- Forward reads
R2.fastq <- Forward barcode sequences
R3.fastq <- Reverse barcode sequences
R4.fastq <- Reverse reads

Is it possible to import as is or do I need to combine the barcode files elsewhere before importing them to Q2? Or perhaps is it possible to import R1+R2 and R3+R4 separately and combine them after?

Hey @Mehrbod_Estaki!

Are the contents of R2 and R3 the same by chance? If so, you should be able to get away with just using one of those arbitrarily.

It’s also possible to import already demultiplexed data. Does your sequencing center have any tools or recommendations to that end, or are you on your own here?

Hi Evan,

Alas… the files are equal in size and structure (which got my hopes up) but differ in content:

$ wc _R3.fastq
  9573556  11966945 173463109 _R3.fastq
$ wc _R2.fastq
  9573556  11966945 173463109 _R2.fastq
$ diff _R2.fastq _R3.fastq -q
Files _R2.fastq and _R3.fastq differ
   
$ head _R2.fastq
@M01380:40:000000000-AU9B2:1:1101:19721:1873 2:Y:0:
TTGACCCT
+
CCCCCCCC
@M01380:40:000000000-AU9B2:1:1101:18776:1884 2:Y:0:
TTGACCCT
+
CCCCCCCC
@M01380:40:000000000-AU9B2:1:1101:18648:1925 2:Y:0:
TCACTCCT

$ head _R3.fastq
@M01380:40:000000000-AU9B2:1:1101:19721:1873 3:Y:0:
TGAACCTT
+
CCCCCCCC
@M01380:40:000000000-AU9B2:1:1101:18776:1884 3:Y:0:
TGTTCTCT
+
CCCCCCCC
@M01380:40:000000000-AU9B2:1:1101:18648:1925 3:Y:0:
CTCTCTAT
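
One caveat about the check above: `diff -q` compares whole lines, and the headers in the two files carry different read numbers (`2:Y:0:` versus `3:Y:0:`), so it reports a difference even when every barcode is identical. The `head` output already shows that the barcodes themselves differ here, but a sequence-only comparison makes that explicit. The following is a minimal sketch, assuming both files are plain 4-line FASTQ with reads in the same order; the function name `count_seq_mismatches` is hypothetical.

```shell
# Count the records whose sequence line differs between two FASTQ files.
# Assumes 4-line records, the same read order in both files, and no tab
# characters inside any line (paste joins the two files with a tab).
count_seq_mismatches() {
  paste "$1" "$2" |
    awk -F '\t' 'NR % 4 == 2 && $1 != $2 { n++ } END { print n + 0 }'
}
```

Calling `count_seq_mismatches _R2.fastq _R3.fastq` prints the number of records whose barcodes differ. A result of 0 would mean either file could be used on its own.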

In this case I wasn’t directly in touch with the sequencing facility and was just given these files. I’m looking into asking for either a demultiplexed version or a combined barcode file, but I wanted a backup plan in case that isn’t an option.

Hey @Mehrbod_Estaki,

That's a bummer.

If you can get a hold of the demultiplexed data (or any further information about how these were sequenced), that would be ideal. But if not, we might be able to treat both forward and reverse as independent single-end runs and then manually merge them together (I don't know if it'll work, but we can try).

Just an update on this. The most basic solution was to combine the barcode files in Qiime1, following the "Two index/barcode reads and two fastq reads" instructions, and then import them into Q2 using the EMP-PairedEnd protocol after gzipping and renaming the fastq files.

In case this helps anyone else reading this, the structure of the files was:
_R1 <- forward reads + quality scores (barcodes and adapters removed)
_R2 <- forward 8 nt barcodes, in the same 4-line format and order as _R1
_R3 <- reverse barcodes, as above
_R4 <- reverse reads, same as _R1
Using Qiime1 v1.8:
extract_barcodes.py --input_type barcode_paired_end -f _R2.fastq -r _R3.fastq --bc1_len 8 --bc2_len 8 -o parsed_barcodes/
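
The gzip-and-rename step before the Q2 import can be sketched as follows. The EMP paired-end import expects a directory holding `forward.fastq.gz`, `reverse.fastq.gz`, and `barcodes.fastq.gz`. The function name `stage_emp_paired_end`, the directory name, and the assumption that extract_barcodes.py wrote its result to `parsed_barcodes/barcodes.fastq` are illustrative, so check them against your own output.

```shell
# Gzip the reads and the combined barcodes under the file names the
# EMP paired-end import looks for.
# Arguments: forward reads, reverse reads, combined barcodes, output directory.
stage_emp_paired_end() {
  mkdir -p "$4" &&
    gzip -c "$1" > "$4/forward.fastq.gz" &&
    gzip -c "$2" > "$4/reverse.fastq.gz" &&
    gzip -c "$3" > "$4/barcodes.fastq.gz"
}

# Example invocation (paths are assumptions based on this thread):
# stage_emp_paired_end _R1.fastq _R4.fastq parsed_barcodes/barcodes.fastq \
#   emp-paired-end-sequences
# qiime tools import \
#   --type EMPPairedEndSequences \
#   --input-path emp-paired-end-sequences \
#   --output-path emp-paired-end-sequences.qza
```

The original files are left untouched because `gzip -c` writes to standard output, so the staging step can be repeated if a path turns out to be wrong.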

In talking with the sequencing facility, they think this should be a compatible format in Qiime2, since it's a very common way of receiving data from various facilities, especially as it was previously compatible in Q1.

Thanks for the update!

Yeah, we haven't reached full feature parity with extract_barcodes.py (it can do so many things!), but it's on our list. I've made an issue here to track this.
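
For readers without a QIIME 1 install, the concatenation step alone can be approximated with standard tools. This is only a sketch of the idea, not a replacement for extract_barcodes.py: it assumes 4-line FASTQ records in the same order in both files, keeps the header from the first file, and joins the barcodes first-file-then-second-file. Whether that order and orientation match the barcodes in your sample metadata is an assumption to verify. The function name `combine_barcodes` is hypothetical.

```shell
# Join the barcode reads of two FASTQ files record by record.
# Assumes no tab characters inside any line (paste joins with a tab).
combine_barcodes() {
  paste "$1" "$2" | awk -F '\t' '
    NR % 4 == 1 { print $1 }      # header taken from the first file
    NR % 4 == 2 { print $1 $2 }   # concatenated barcode sequence
    NR % 4 == 3 { print "+" }     # separator line
    NR % 4 == 0 { print $1 $2 }   # concatenated quality string
  '
}

# Example invocation (file names from this thread):
# combine_barcodes _R2.fastq _R3.fastq > barcodes.fastq
```

With the first record shown earlier in the thread, the forward barcode `TTGACCCT` and the reverse barcode `TGAACCTT` become the single 16 nt barcode `TTGACCCTTGAACCTT`.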
