Importing a fastq file with duplicate barcodes

bensarawichitr · August 2, 2018, 1:33pm

Hello,

I am rerunning a project through QIIME2 that was originally run using QIIME1 about two years ago.
The sequence files are all combined, and a couple of samples share barcodes. The person who ran the project in QIIME1 was somehow able to deal with the duplicate barcodes, but did not leave any documentation on how they split the files, nor the split files themselves. I now have to figure out how to split the files and import them into QIIME2.

I have tried splitting the files using QIIME1 and then importing and demultiplexing them, only to get three samples with a sequence count of 1, according to the qzv file.

Here are the commands I ran on the first set of samples that I split from the original fastq file.
split_libraries_fastq.py \
  -i processed_seqs/reads.fastq \
  -b processed_seqs/barcodes.fastq \
  -m 051116JS515F-mapping2-set-1.txt \
  --barcode_type 8 \
  -o split_libraries_fastq_processedseqs_set_1/ \
  --phred_offset 33 \
  --store_demultiplexed_fastq

extract_barcodes.py \
  -f split_libraries_fastq_processedseqs_set_1/seqs.fastq \
  -c barcode_single_end \
  --bc1_len 8 \
  -o processed_seqs_set_1/

gzip processed_seqs_set_1/barcodes.fastq
gzip processed_seqs_set_1/reads.fastq

qiime tools import \
  --type EMPSingleEndSequences \
  --input-path processed_seqs_set_1/ \
  --output-path qiime2/sequences-set-1.qza

qiime demux emp-single \
  --i-seqs qiime2/sequences-set-1.qza \
  --m-barcodes-file 051116JS515F-mapping2-set-1.txt \
  --m-barcodes-column BarcodeSequence \
  --o-per-sample-sequences qiime2/demultiplexed-seqs-set-1.qza

qiime demux summarize \
   --i-data qiime2/demultiplexed-seqs-set-1.qza \
   --o-visualization qiime2/demultiplexed-seqs-set-1.qzv

Thanks in advance

thermokarst · August 2, 2018, 1:38pm

I have no clue how you could possibly disambiguate these samples based on barcode alone. The barcode is the unique identifier here, so reusing one across different samples makes things very tricky. Some thoughts: are these samples using a dual-index barcoding scheme? Reusing barcodes makes sense in that case, since you get a combinatorial effect. Another question: are the samples all of the same amplicon? If not, maybe it is possible to disambiguate based on read length, but there is no way to do that in QIIME 2.
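To see exactly which samples collide, it can help to list the barcodes that appear on more than one row of the mapping file. This is a minimal sketch, assuming a tab-separated QIIME 1 style mapping file whose first line is the header and which has a BarcodeSequence column. The function name find_duplicate_barcodes is made up for illustration and is not part of QIIME.

```shell
# Print each barcode that is assigned to more than one sample, followed by
# the sample IDs that share it. Hypothetical helper, not a QIIME command.
# Usage: find_duplicate_barcodes MAPPING_FILE [COLUMN_NAME]
find_duplicate_barcodes() {
  awk -F'\t' -v col="${2:-BarcodeSequence}" '
    NR == 1 {
      # Locate the barcode column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    /^#/ { next }   # skip comment lines that follow the header
    {
      count[$c]++
      samples[$c] = samples[$c] (samples[$c] ? "," : "") $1
    }
    END {
      for (b in count) if (count[b] > 1) print b "\t" samples[b]
    }
  ' "$1" | sort
}
```

Running it as `find_duplicate_barcodes 051116JS515F-mapping2-set-1.txt` prints one line per reused barcode. Empty output means every barcode in that file maps to a single sample.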

bensarawichitr · August 3, 2018, 1:15am

I'm not entirely sure what barcoding scheme was used, but I think it may have been dual indexing. I believe it was run on an Illumina MiSeq.

After talking with my supervisor, we think it may just be an error in labeling the barcodes, because it doesn't make sense for a project that was sequenced in a single run to have duplicate barcodes.
We are contacting the people who do our sequencing to see if they still have the original files.
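One way to probe the labeling hypothesis in the meantime might be to tally which barcode sequences actually occur in the gzipped barcodes file and compare them against the mapping file. The sketch below assumes a gzip-compressed FASTQ with exactly four lines per record (no wrapped sequence lines). The function name count_barcode_reads is hypothetical.

```shell
# Count how many reads carry each barcode sequence in a gzipped FASTQ file.
# Output is "count<TAB>barcode", most frequent first.
# Hypothetical helper; assumes four-line FASTQ records.
count_barcode_reads() {
  gzip -cd "$1" |
    awk 'NR % 4 == 2 { n[$0]++ } END { for (b in n) print n[b] "\t" b }' |
    sort -k1,1nr -k2,2
}
```

For example, `count_barcode_reads processed_seqs_set_1/barcodes.fastq.gz | head` shows the most abundant barcodes. Abundant barcodes that are missing from the mapping file would be consistent with a labeling mistake, though they would not prove it.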

Thank you for your help!
