How to import data from MiSeq paired-end reads (2 x 251)

Pseudomonas84 · December 13, 2017, 4:03pm

Hello Everyone,
I'm new to qiime and bioinformatics, trying my best to learn qiime. I wanted to import my illumina raw data using the EMP paired end sequences tutorial. I have a problem, I was only provided with a .txt barcode file. What do I do now if my barcode is in txt format?
I'm looking forward to your reply.
Thank you very much!

Nicholas_Bokulich · December 13, 2017, 4:14pm

Hi @Pseudomonas84,

Welcome! You came to the right place.

How many lines does this file have? It sounds like your "barcode file" may actually be the list of barcodes corresponding to each sample (if the number of lines = the number of samples, plus a header line), which should instead be added to your sample metadata file to be used for demultiplexing (figuring out which sequences belong to each sample). Check out the sample metadata section of this tutorial (and the rest of that tutorial) to familiarize yourself with the sample metadata format and how it is used — and see if that is what your "barcode file" looks like.

So then (if I'm correct) it sounds like you really don't have a barcode file — and I will need a little more information from you to figure out where your barcodes are and how to import them.

What protocol are you using for sequencing?
Are your barcodes contained in the sequence reads themselves (e.g., the first few bases in each read) or in the fastq header line (some old Illumina formats did this)? If you are not sure, you may want to talk to your sequencing center or others in your lab who have worked with this protocol previously.

Thanks!

Pseudomonas84 · December 13, 2017, 5:24pm

Hello Nicholas, thank you very much for your response. I appreciate it very much.
I have the metadata file, so that means I do not have the barcode/index file. I have the paired end raw sequences from illumina miseq in fastq format, the barcodes are in the header lines that looked like this:

@D00420:153:HMG5HBCXY:2:1101:1970:2201 1:N:0:TGACAA
TATCCATCTGCTTATGGAAGCCAAGCATTGGGGATTGAGAAAGAGTAGAAATGCCACAAGCCTCAATAGCAGGTTTAAGAGCCTCGATACGCTCAAAGTCAAAATAATCAGCGTGACATTCAGAAGGGTAATAAGAACGAACCATAAAAAAGCCTCCAAGATTTGGAGGCATGAAAACATACAATTGGGAGGGTGTCAATCCTGACGGTTATTTCCTAGACAAATTAGAGCCAATACCATCAGCTTTACCG
+
GGGGGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIGIIIIIIIIIIIIGIIIIIIIIIIIIGGGGIIGGGIIIGIIGIIIIIIIIIIIIIIIIGIIGIGGGIIIIIGIAGGI.AGGIGAGIIGIGGGIGGGIG..AAGGGGAA.77AGG.AGAGGA.AGGGG.
@D00420:153:HMG5HBCXY:2:1101:3731:2160 1:N:0:TGACCA
GAGAGTGTGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGCAAGACAGATGTGAAACCCCCGGGCTTAACCTGGGAACTGCATTCGAAACTGGCAGGCTTGAGTCTTGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAAGACTGACGCT

The sequencing vendor also gave me separate .fasta and .qual files, which contain the raw sequence data and still have primers and barcodes. The vendor used the forward reads found in both the R1 and R2 files from Illumina BaseSpace.

I hope I have described this clearly enough. Please let me know if you need more info.

Thank you very much for your time.

Nicholas_Bokulich · December 13, 2017, 5:42pm

Thanks for the info, @Pseudomonas84!

So, your reads are in an older format with the barcodes in the sequence header/label lines; easy as pie.

We do not currently have a method in QIIME2 to handle this directly, but for now you can use this qiime1 command to split your fastqs into a new file (without barcodes in the header lines) and generate a new fastq containing just the barcodes. These can then be imported into QIIME2.
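To make that splitting step concrete, here is a minimal Python sketch of the idea. It is not the qiime1 script itself, only an illustration: the function name `split_header_barcodes` and the placeholder quality character are assumptions, and it assumes 4-line FASTQ records whose header ends in `:<barcode>`, like the `1:N:0:TGACAA` headers posted above.

```python
import io


def split_header_barcodes(fastq_in, reads_out, barcodes_out, placeholder_qual="I"):
    """Split a FASTQ with barcodes in its headers into reads and barcodes.

    Assumes 4-line records and headers like '@<id> 1:N:0:<barcode>'.
    The barcode has no per-base quality in the header, so a placeholder
    quality character is written instead (an assumption of this sketch).
    Returns the number of records processed.
    """
    n_records = 0
    while True:
        header = fastq_in.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = fastq_in.readline().rstrip("\n")
        fastq_in.readline()  # skip the '+' separator line
        qual = fastq_in.readline().rstrip("\n")

        # '@id 1:N:0:TGACAA' -> read id '@id', barcode 'TGACAA'
        read_id, _, comment = header.partition(" ")
        barcode = comment.rsplit(":", 1)[-1]

        reads_out.write(f"{read_id}\n{seq}\n+\n{qual}\n")
        barcodes_out.write(
            f"{read_id}\n{barcode}\n+\n{placeholder_qual * len(barcode)}\n"
        )
        n_records += 1
    return n_records


# Small in-memory demonstration with a made-up record.
demo_in = io.StringIO("@read1 1:N:0:TGACAA\nACGTACGT\n+\nIIIIIIII\n")
demo_reads, demo_barcodes = io.StringIO(), io.StringIO()
split_header_barcodes(demo_in, demo_reads, demo_barcodes)
```

Because both outputs are written record by record in the same loop, the reads and barcodes files end up with the same IDs in the same order, which is the property the demultiplexing step relies on.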

In a future release of QIIME2 we will probably support a more streamlined approach to simplify importing/using fastqs in this format. I've created this issue to track this and we will post back here to let you know if/when that changes!

I hope that helps!

Pseudomonas84 · December 24, 2017, 6:31pm

Hi, thank you very much for your response. I now have a reads.fastq file and a barcode.fastq file, and I have already imported them into qiime2 using the EMPSingleEndSequences type. For the demultiplexing step, I used the mapping.txt that I have as the sample metadata file. The tutorial says to use *.tsv, but the *.txt that I used did not give errors, so I suppose that is okay? However, when I viewed demux.qzv in the qiime2 GUI to decide on the trim and truncation lengths, the graph looked weird. I must have done something wrong here?

My apologies if this question appears stupid; I am new to qiime/bioinformatics.

Please let me know how to proceed and thank you for your patience.

Nicholas_Bokulich · December 29, 2017, 12:08am

Did you join the forward/reverse reads and then import? You may want to import as EMPPairedEndSequences and follow the steps in this tutorial for processing paired-end reads.

Yes, using your mapping.txt file as the sample metadata is fine!

That plot does look strange for a single-end read, but aren't yours joined paired-end reads? That plot looks similar to the joined read quality plots in the tutorial I've linked to above. My guess is that the read joiner that you are using may be assigning an arbitrary PHRED quality score to the overlapping bases?

And no, your questions are never stupid!

I hope that helps!

Pseudomonas84 · January 8, 2018, 12:07am

Hello Nicholas,
I appreciate your help very much.
I imported using the paired end, but when I did the demux:

qiime demux emp-paired \
--m-barcodes-file *mapping2_corrected.txt \
--m-barcodes-category BarcodeSequence \
--i-seqs Juneilluminabatch.qza \
--o-per-sample-sequences demux

I got this error message:

Plugin error from demux:

Mismatched sequence ids: D00420:153:HMG5HBCXY:2:1201:17202:51568, D00420:153:HMG5HBCXY:2:1101:1970:2201, and D00420:153:HMG5HBCXY:2:1101:1970:2201

Debug info has been saved to /tmp/qiime2-q2cli-err-lhkpr5ah.log

What should I do now?
Thanks a lot!

Nicholas_Bokulich · January 8, 2018, 5:16pm

Hi @Pseudomonas84,
It looks like either the barcodes or the reads for those IDs are missing (most likely the barcodes). It seems like something probably went wrong when extracting barcodes and that step should probably be re-run. The good news is that there are only 3 missing! You should check on this with the following:

1. Check the number of sequences in the original files (before importing). Use this command:
wc -l barcodes.fastq sequences.fastq
The line counts should match between the barcodes and sequences files (but they probably don't, based on your error). If the line counts do match, there may be a different issue, e.g., the ID is not missing but is corrupted somehow.
2. Take a look at the lines with those IDs in the original file and in the new files, e.g.:
grep -A 3 'D00420:153:HMG5HBCXY:2:1201:17202:51568' original_file.fastq
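If the counts differ and you want to see exactly which IDs are affected, the two checks above can be combined in a short Python sketch. This is only an illustration, not part of QIIME: the function names `fastq_ids` and `compare_ids` are hypothetical, and it assumes plain 4-line FASTQ records with the read ID as the first whitespace-separated token of each header.

```python
import io


def fastq_ids(handle):
    """Return the read ID of every record in a 4-line-per-record FASTQ.

    The ID is the text after '@' up to the first whitespace, so any
    trailing comment such as '1:N:0:TGACAA' is ignored.
    """
    ids = []
    for line_number, line in enumerate(handle):
        if line_number % 4 == 0 and line.strip():
            ids.append(line[1:].split()[0])
    return ids


def compare_ids(barcodes_handle, reads_handle):
    """Return (IDs only in the reads, IDs only in the barcodes), both sorted."""
    barcode_ids = set(fastq_ids(barcodes_handle))
    read_ids = set(fastq_ids(reads_handle))
    return sorted(read_ids - barcode_ids), sorted(barcode_ids - read_ids)


# Usage with the file names from the wc command above:
# with open("barcodes.fastq") as b, open("sequences.fastq") as s:
#     missing_barcodes, missing_reads = compare_ids(b, s)
#     print(len(missing_barcodes), "reads have no barcode record")
#     print(len(missing_reads), "barcodes have no read record")
```

Comparing sets only shows which IDs are missing; it does not check that the two files list their records in the same order.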

Please run those and let us know what the outputs are. Thanks!

Pseudomonas84 · January 8, 2018, 7:25pm

Hello Nicholas,
Thank you so much for your reply.
I ran the check, and the line counts of the barcodes and sequences files did not match:
343076 barcodes.fastq
816232 Sam1_6a_S26_L002_R2_001.fastq
816232 Sam1_6a_S26_L002_R1_001.fastq
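For reference, `wc -l` counts lines, and a FASTQ record normally spans four lines, so these counts can be converted to numbers of reads. A minimal Python sketch of that arithmetic, using the counts above (the helper name `records_per_file` is hypothetical, and it assumes unwrapped 4-line records):

```python
# Line counts as reported by `wc -l` above.
line_counts = {
    "barcodes.fastq": 343076,
    "Sam1_6a_S26_L002_R2_001.fastq": 816232,
    "Sam1_6a_S26_L002_R1_001.fastq": 816232,
}


def records_per_file(counts):
    """Convert FASTQ line counts to record counts (4 lines per record)."""
    result = {}
    for name, n_lines in counts.items():
        if n_lines % 4 != 0:
            # A count that is not a multiple of 4 suggests a truncated file.
            raise ValueError(f"{name}: {n_lines} lines is not a multiple of 4")
        result[name] = n_lines // 4
    return result


records = records_per_file(line_counts)
for name, n_records in records.items():
    print(f"{name}: {n_records} records")
```

That works out to 85769 barcode records against 204058 reads in each of the R1 and R2 files.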

This barcodes.fastq was extracted from the sequence file provided by the vendor, which I suppose contains the forward reads found in both the Illumina R1 and R2 read files. I did not use the R1 and R2 files listed above.

I tried extracting the barcode on macqiime using the above sequences and using this command:
extract_barcodes.py -f Sam1_6a_S26_L002_R1_001.fastq -r Sam1_6a_S26_L002_R2_001.fastq --attempt_read_reorientation --input_type barcode_paired_end --bc1_len 8 --bc2_len 0 --mapping_fp 060717RL515Fmapping2.txt

It returned this error:
Traceback (most recent call last):
  File "/macqiime/anaconda/bin/extract_barcodes.py", line 175, in <module>
    main()
  File "/macqiime/anaconda/bin/extract_barcodes.py", line 171, in main
    opts.attempt_read_reorientation, disable_header_match)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/extract_barcodes.py", line 79, in extract_barcodes
    forward_primers, reverse_primers = get_primers(header, mapping_data)
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/extract_barcodes.py", line 520, in get_primers
    raise IndexError(("Mapping file is missing ReversePrimer field."))
IndexError: Mapping file is missing ReversePrimer field.

I guess I will have to add a reverse primer field then and see if that would work.

Thank you so much for your help and for your patience.

Best regards,
Raya

Nicholas_Bokulich · January 8, 2018, 8:00pm

Hi @Pseudomonas84,

Yes, it looks like this is a requirement for that qiime1 script. Let us know if you are still having issues with demux after you extract the barcodes from the raw reads!
