MiSeq paired-end data with no barcodes

ksn · February 21, 2018, 6:10pm

Hi,
I have paired-end MiSeq data in FASTQ format:
@M00558:134:000000000-BF3W3:1:2119:18615:25104 2:N:0:1
GCGGTGTGTGCAAGGCCCGGGAACGTATTCACCGCACCGTGCTGATGTGCGATTACTAGCGATTCCAACTTCAAGGAGTCGGGTTTCAGACTCCTATCCGGACTGAGGCAGGCTTTCTGCGTTTTGCTCCACATCACTGCTTCGCTTCACTCTGTACCTGCCATTGTAGCACGTGTGTAGCCCTGGACATAAGGGCCATGATGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTTCCGCCAGAGTCCTCAGCATTACCTGCTAGTAACTGGCCGTCGGGCTTGCGCTCC
+
CCCCCGGGGGGGGGGGGGGDEGGGGGGGGGGGGGGGGGGGFGGGGGGFFFGGGGGGGGGGGGFGGGGGGGGGCFGGGGGFGGGGGGGGGFGGGGGGGGGE7=FFCGGGGCGFGGGGGGGGEFFGGGEFGGCFFFGGGGGGDFGGGGGGGGFGGFGGGGFFGFFFGGGEGFGGGFGG+DFCGFGGGGGGFGG=E9CCGE,9?FCFGGDC?4CEGCFGGGEFGGCAFG:FF57AGG3AFFGFC>C3);+/79BB13.28?F7F??BF<CBGF)905-4)6<48>@)38)((-(,0(,,(-4(
@M00558:134:000000000-BF3W3:1:2119:11618:25114 2:N:0:1
GCGGTGTGTGCAAGGCCCGGGAACGTATTCACCGCGGCATGCTGATCCGCGATTACTAGCAACTCCGACTTCATGTGGGCGGGTTGCAGCCCACAATCCGAACTGGGACCACTTTTTTGAGATCCGCTCCCCCTTGCGGGTTCGCCCCCCTCTGTGGTCGCCATTTTAGCACGTGTGAAGCCCAGGTCATCAGGGGCCTGCTGCTTCGCCGTCACCCCCCCATTCCTCCTAGTCGCCCCCGGCAGCCCCTTCAGCGTCCTCATCTCCGCCTCGAAGTATCACCACATACGGGCTGCGTTCC
+
CCCCCGGGGGGGGGGDCFEGECGGGGGGGGGGGGDG@@CFGFGGGG9FFGGGGGGGGF9ECFCFGGCFCFGGGGGEFGDGGGDGG:FGGGGG,FGGGGGGGGGGC,BFBF<BDGGG@FGGD7F@@,@+@@FFC7F,<@:F5DF6=C=@B+,7,4=<4<;F,2+22+;<3;+<9CC5C5<8+30<@FGC/0+0+<+19/2:)>)09:./52))1.977>C@F).)775:4*(0.816:9A447B()-.-(-).)48(0).0((.,((,(-(

The first 9 bases in the sequence seem to be barcodes (see the image below ...) Do I need to extract those barcodes before proceeding? I am a newbie and I would appreciate your guidance. Thanks!
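One way to look at this question before deciding anything is to tally the leading bases across all reads in a file. The sketch below is only an illustration: `count_prefixes` is a made-up helper name (not a QIIME command), and it assumes standard 4-line FASTQ records with unwrapped sequence lines.

```shell
# Tally the first N bases (default 9) of every read in a FASTQ file.
# count_prefixes is a hypothetical helper, not part of QIIME.
# Assumes 4-line FASTQ records; handles plain or gzipped input.
count_prefixes() {
    file=$1
    n=${2:-9}
    case "$file" in
        *.gz) gzip -dc "$file" ;;
        *)    cat "$file" ;;
    esac |
        # Line 2 of each 4-line record is the sequence.
        awk -v n="$n" 'NR % 4 == 2 { print substr($0, 1, n) }' |
        sort | uniq -c | sort -rn
}
```

Running something like `count_prefixes sample_R2.fastq.gz 9 | head` (the file name is a placeholder) lists the most frequent 9-base prefixes with their counts, which shows how variable that region is within a single file.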

jairideout · February 21, 2018, 7:00pm

Hi @ksn! Yes, you'll want to demultiplex the sequences, which will also remove the barcodes during the process. Check out the q2-cutadapt Community Tutorial for details. The tutorial's example data are single-end sequences, so the commands you'll use will differ slightly. Check out this forum post for details about how to modify the tutorial commands for paired-end data (you can ignore the first steps describing UBAM files, since your data are already in FASTQ format).

ksn · February 22, 2018, 6:45pm

Thanks! So should I extract the barcodes with extract_barcodes.py by telling it to take the first 9 bases from both the forward and reverse files?

jairideout · February 23, 2018, 12:47am

extract_barcodes.py is a QIIME 1 script. You can use it to extract the barcodes into a separate FASTQ file, and then use qiime demux emp-single or qiime demux emp-paired to demultiplex the sequences. However, you can skip using QIIME 1 and those QIIME 2 commands altogether if you follow the steps in the q2-cutadapt tutorial I linked to above -- that will likely be an easier route for you.

ksn · February 26, 2018, 2:27pm

I have a problem with the second step of the q2-cutadapt tutorial, which requires a metadata file containing the barcodes.

--m-forward-barcodes-file MULTIPLE PATH
    Metadata file or artifact viewable as metadata. This option may be
    supplied multiple times to merge metadata. [required]

But I do not have a list of barcodes. How should I proceed? Thanks.

ebolyen · February 27, 2018, 10:12pm

Hi @ksn,

Starting back at the beginning, are these reads already demultiplexed? From your screenshot I agree that you seem to still have the barcodes attached, but since you do not have any barcode information, I suspect your sequencing center provided you with 2 fastq files per sample?

Since the barcodes appear at the beginning of your reads and appear to be a fixed length, you should be able to use trim-left in the DADA2 plugin. This will cut off however many nucleotides you specify from the beginning of each read, which solves the issue.

ksn · February 28, 2018, 12:36pm

Hi @ebolyen Yes, the sequencing center provided two fastq files per sample.

ebolyen · February 28, 2018, 4:19pm

Excellent! Then you should be able to import the data (if you are lucky, you can use the Casava format, otherwise the FASTQ Manifest is a good catch-all).

Once you've done that, you should be able to denoise however you choose. Just make sure that the first 9 bases are trimmed off (for example, with --p-trim-left-f 9 and --p-trim-left-r 9 for DADA2).
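To make the effect of trimming concrete, here is a small sketch of what removing the first 9 positions does to a FASTQ record: both the sequence line and the quality line lose the same leading characters. This is an illustration only. DADA2 does its own trimming when given the options above, so there is no need to pre-trim files this way. `trim_left` is a hypothetical helper name, and 4-line FASTQ records are assumed.

```shell
# Drop the first N characters (default 9) from the sequence and quality
# lines of each 4-line FASTQ record read on standard input.
# trim_left is a hypothetical helper, shown only to illustrate trimming.
trim_left() {
    awk -v n="${1:-9}" '
        # Line 2 is the sequence, line 4 the quality string.
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }
        { print }
    '
}
```

For example, `gzip -dc sample_R1.fastq.gz | head -n 4 | trim_left 9` (placeholder file name) prints the first record with its leading 9 bases and quality scores removed.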

ksn · March 1, 2018, 1:18pm

I had renamed my FASTQ files, so I tried the FASTQ Manifest format.

My manifest file looks like this:

sample-id,absolute-filepath,direction
11B-ac,/media/ejo129/somics/Data/ARC-ARK/meta-rum/fastq_raw/ren/11B-ac_R1.fastq.gz,forward
11B-ac,/media/ejo129/somics/Data/ARC-ARK/meta-rum/fastq_raw/ren/11B-ac_R2.fastq.gz,reverse
11B-bt,/media/ejo129/somics/Data/ARC-ARK/meta-rum/fastq_raw/ren/11B-bt_R1.fastq.gz,forward
11B-bt,/media/ejo129/somics/Data/ARC-ARK/meta-rum/fastq_raw/ren/11B-bt_R2.fastq.gz,reverse
11B-cp,/media/ejo129/somics/Data/ARC-ARK/meta-rum/fastq_raw/ren/11B-cp_R1.fastq.gz,forward
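A manifest in this layout can also be generated rather than typed by hand. The following is a rough sketch under some assumptions: files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` as in the listing above, sample names contain no commas, and `make_manifest` is a made-up helper name rather than a QIIME command.

```shell
# Print a paired-end FASTQ manifest for every <sample>_R1.fastq.gz /
# <sample>_R2.fastq.gz pair found in the given directory.
# make_manifest is a hypothetical helper, not part of QIIME.
make_manifest() {
    # Resolve the directory to an absolute path, as the manifest requires.
    dir=$(cd "$1" && pwd) || return 1
    echo "sample-id,absolute-filepath,direction"
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # no matching files at all
        sample=$(basename "$r1" _R1.fastq.gz)
        r2="$dir/${sample}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            # Skip unpaired samples and report them on stderr.
            echo "warning: no reverse read file for $sample" >&2
            continue
        fi
        echo "$sample,$r1,forward"
        echo "$sample,$r2,reverse"
    done
}
```

Usage would look like `make_manifest /path/to/fastq_dir > meta-rum_manifest`. Samples without a matching reverse file are skipped with a warning, so the output always lists complete pairs.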

and the command I used is:

#!/bin/bash
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path meta-rum_manifest \
  --source-format PairedEndFastqManifestPhred64 \
  --output-path paired-end-demux.qza

but it threw the following error message:

An unexpected error has occurred:
Decoded Phred score is out of range [0, 62].

How can I solve this issue? Many thanks !

Mehrbod_Estaki · March 1, 2018, 4:37pm

Hi @ksn,

Are you sure your sequences use Phred64 encoding? That is uncommon these days, as it is typically found only on older systems. The error message you got certainly sounds related to that.
You can confirm this with your sequencing facility, or simply try changing your source format to:
--source-format PairedEndFastqManifestPhred33
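If the sequencing facility is hard to reach, the quality lines themselves give a strong hint. Phred64-style encodings start at ASCII 59 or higher, so any quality character below that can only be Phred33. The quality strings posted earlier in this thread contain characters such as `(` and `,`, which fall well below 59. The sketch below automates that check. `guess_phred` is a made-up helper name, the result is a heuristic rather than proof, and 4-line FASTQ records are assumed.

```shell
# Guess the quality encoding of a FASTQ file from its lowest quality
# character. guess_phred is a hypothetical helper, not part of QIIME.
guess_phred() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac |
        awk '
            BEGIN {
                # Lookup table from printable character to ASCII code.
                for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
                min = 127
            }
            NR % 4 == 0 {                    # line 4 of each record: qualities
                for (i = 1; i <= length($0); i++) {
                    c = substr($0, i, 1)
                    if (c in ord && ord[c] < min) min = ord[c]
                }
            }
            END {
                if (min == 127)     print "no quality characters found"
                else if (min < 59)  print "Phred33"
                else if (min >= 64) print "likely Phred64"
                else                print "ambiguous"
            }
        '
}
```

A "likely Phred64" answer deserves a second look, because a Phred33 file in which every base has very high quality would produce the same result. A "Phred33" answer is the safer conclusion.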

ksn · March 6, 2018, 2:42pm

I have imported and denoised the data using the following commands:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path meta-rum_manifest \
  --source-format PairedEndFastqManifestPhred33 \
  --output-path paired-end-demux.qza

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-trim-left-f 9 \
  --p-trim-left-r 9 \
  --p-n-threads 4 \
  --output-dir /media/ejo129/somics/Data/ARC-ARK/meta-rum/2-denoise/

Now I have two files: representative_sequences.qza and table.qza. What next steps should I follow? The data contain samples from different species, locations, and conditions, but I first want to make some sort of graphical overview (e.g., PCA). Many of the tutorials begin with metadata, but I do not have any. Which tutorial would be the closest one to follow?

ChristianEdwardson · March 6, 2018, 6:25pm

The species, locations, and conditions you mention are your metadata. You'll need to make a table with this information and import it.

Here's the QIIME 2 tutorial on metadata, which I think would be a good starting point.
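As a starting point for that table, the sample IDs can be pulled from the manifest into a tab-separated template. This is only a sketch: the column names `species`, `location`, and `condition` are placeholders taken from the description above, `FILL_IN` marks values you must supply yourself, `metadata_template` is a made-up helper name, and it assumes your QIIME 2 version accepts `sample-id` as the ID column header (older releases used `#SampleID`).

```shell
# Write a tab-separated metadata template with one row per unique
# sample ID found in a FASTQ manifest (CSV with a header line).
# metadata_template is a hypothetical helper; column names are placeholders.
metadata_template() {
    printf 'sample-id\tspecies\tlocation\tcondition\n'
    # Skip the manifest header and emit each sample ID only once.
    awk -F, 'NR > 1 && !seen[$1]++ {
        printf "%s\tFILL_IN\tFILL_IN\tFILL_IN\n", $1
    }' "$1"
}
```

For example, `metadata_template meta-rum_manifest > sample-metadata.tsv` (the output file name is a placeholder) gives a file to open in a spreadsheet or text editor and fill in before following the metadata tutorial.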
