(Help) Beginner Advice/Guidance for Importing Paired-End Reads

Chang_Ega · November 7, 2019, 4:44pm

Hi everyone, I am very new to QIIME 2 (with no prior experience in microbiome bioinformatics) and I would like advice on how to correctly import the paired-end reads described below.

I have gone through the Atacama and importing tutorials.
I tried the manifest format, only to conclude that it works only for demultiplexed reads, because the error output was:

There was a problem importing /Users/ega/Augmentin/pe-64-manifest:

/Users/ega/Augmentin/pe-64-manifest is not a(n) PairedEndFastqManifestPhred64V2 file:

Does this also validate that my reads are multiplexed? (pardon me if this sounds like a redundant question)

I also tried the EMP multiplexed paired-end protocol, which requires three files: forward, reverse and barcode sequences. I am unsure how to 'merge' all the forward reads from different samples into a single file, as the tutorial comes with ready-to-import forward, reverse and barcode sequences.

The sequencing files are in the respective folders shown here:

These are the files shown in S1D0:

Barcode sequence is also provided by the sequencing company in a .xls file:

These are the notes provided by the sequencing company:

Hence, from the above notes, the _1.fq.gz and _2.fq.gz files seem to have had their barcodes and primer sequences removed.

Do I have to 'merge' all the forward and reverse reads from different samples into a single forward file and a single reverse file (like the Atacama tutorial does)?
How do I create a barcodes.fastq.gz from the .xls file (if it is even required at all)?
If merging is required, how do I do it? If not, what are some alternatives for importing the sequences given my specific circumstances?

I would appreciate any advice, and feel free to ask clarifying questions for a better understanding of my current situation.

jwdebelius · November 7, 2019, 4:47pm

Hi @Chang_Ega,

Welcome to the QIIME 2 forum!

It looks like your reads are already demultiplexed! So, a manifest format would be appropriate for you. Your error message said you didn't have a paired end manifest - that doesn't mean that your data isn't already demultiplexed, it means that your manifest may not be formatted correctly, so let's work through that problem!
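As a rough illustration of what a correctly formatted paired-end manifest could look like, here is a minimal Python sketch that writes one from the folder layout. It assumes each sample has its own sub-folder containing exactly one `_1.fq.gz` and one `_2.fq.gz` file, and that the folder name is the sample ID. The function name `build_manifest` is hypothetical and not part of QIIME 2; the three column headers are the ones the V2 paired-end manifest format expects.

```python
import csv
from pathlib import Path


def build_manifest(raw_dir, manifest_path):
    """Write a tab-separated paired-end manifest (hypothetical helper).

    Assumes one sub-folder per sample holding one *_1.fq.gz (forward)
    and one *_2.fq.gz (reverse) file; the folder name becomes the sample ID.
    """
    raw_dir = Path(raw_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for sample_dir in sorted(p for p in raw_dir.iterdir() if p.is_dir()):
        forward = sorted(sample_dir.glob("*_1.fq.gz"))
        reverse = sorted(sample_dir.glob("*_2.fq.gz"))
        if len(forward) != 1 or len(reverse) != 1:
            raise ValueError(f"expected exactly one read pair in {sample_dir}")
        rows.append([sample_dir.name, str(forward[0]), str(reverse[0])])

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        writer.writerows(rows)
    return rows


# Example call (paths taken from this thread, adjust to your own layout):
# build_manifest("/Users/ega/Augmentin/SequencingData/00.RawData", "pe-manifest")
```

Generating the file this way avoids hand-typed paths, which is where manifest errors tend to creep in.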

It also (potentially) looks like some quality control has been done to your reads. That may limit how you're able to process them, so double check with your sequencing provider to see whether that's true or not.

Best,
Justine

Chang_Ega · November 8, 2019, 4:02pm

Evening @jwdebelius,
Thank you for addressing my questions. I am still facing the same issue, with the following error output:

There was a problem importing pe-64-manifest:

pe-64-manifest is not a(n) PairedEndFastqManifestPhred64V2 file:

Filepath on line 9 and column "forward-absolute-filepath" could not be found (/Users/ega/Augmentin/SequencingData/00.RawData/S2D2FDMP19H002103-1a_L1_S2D2_1.fq.gz) for sample "S2D2".

This was the code that resulted in the error above:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /Users/ega/Augmentin/SequencingData/pe-64-manifest \
  --output-path /Users/ega/Augmentin/SequencingData/paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred64V2

The output for Phred33V2 was the same as the one above.
I created the manifest file in Google Sheets and downloaded it as a .tsv file. Here is how it looked before I downloaded it and moved it into the SequencingData directory:

For clarity, this is how the files are arranged in my home directory:

And yes, you are right that quality control has been done. I have already asked the sequencing provider exactly which QC measures were performed. I hope this gives you a better understanding of the problem I am struggling with.

thermokarst · November 8, 2019, 4:05pm

It looks to me like you might have a typo here. Shouldn't the filepath be (based on your screenshots):

/Users/ega/Augmentin/SequencingData/00.RawData/S2D2/FDMP19H002103-1a_L1_S2D2_1.fq.gz

? Note the S2D2/FDMP... vs S2D2FDMP....
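A typo like this can be caught before running the import by checking every path in the manifest. The following is only a sketch: `missing_manifest_paths` is a hypothetical helper, not a QIIME 2 function, and it assumes a tab-separated manifest with a header row, no comment lines, and filepath columns whose names end in `-absolute-filepath`.

```python
import csv
from pathlib import Path


def missing_manifest_paths(manifest_path):
    """Return (line number, column, path) for each filepath that does not exist.

    Hypothetical helper; assumes a tab-separated manifest whose first line
    is the header and which contains no comment lines.
    """
    missing = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Line 1 is the header, so the first data row is line 2.
        for line_number, row in enumerate(reader, start=2):
            for column, value in row.items():
                if not column or not column.endswith("-absolute-filepath"):
                    continue
                # An empty cell counts as missing too.
                if not value or not Path(value).is_file():
                    missing.append((line_number, column, value))
    return missing


# Example call (manifest path taken from this thread):
# for problem in missing_manifest_paths(
#     "/Users/ega/Augmentin/SequencingData/pe-64-manifest"
# ):
#     print(problem)
```

An empty result means every listed file was found; it says nothing about whether the files are valid FASTQ.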

Chang_Ega · November 9, 2019, 10:15am

Afternoon @thermokarst,
Yes, you were right about the typo. (Pardon my carelessness.)
I managed to import the data successfully using the PairedEndFastqManifestPhred33V2 format.

On a side note, I am curious as to why this output was generated when I tried importing with the Phred64V2 format:

An unexpected error has occurred:

Decoded Phred score is out of range [0, 62].

See above for debug info.

Does this mean that the quality scores of my reads are not within the acceptable range?
Many thanks for getting back to my queries and pinpointing the typo. I appreciate the guidance!

ben · November 9, 2019, 3:53pm

Change phred score to 33 and re-run! Ben
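For context on the error: a FASTQ quality character is decoded by subtracting the offset from its ASCII code, so a character such as `#` (ASCII 35) decodes to 2 with offset 33 but to -29 with offset 64, which would fall outside the reported range. The sketch below is a heuristic for guessing the offset from a gzipped FASTQ file, assuming standard four-line records. `guess_phred_offset` is a hypothetical helper and not part of QIIME 2, and a result of 64 is only a guess, since uniformly high-quality Phred33 data can look the same.

```python
import gzip
from itertools import islice


def guess_phred_offset(fastq_gz_path, max_reads=10000):
    """Guess the Phred offset (33 or 64) from the lowest quality character.

    Hypothetical helper; assumes four-line FASTQ records and inspects at
    most max_reads reads.
    """
    lowest = None
    with gzip.open(fastq_gz_path, "rt") as handle:
        # The quality string is the fourth line of every record.
        for quality in islice(handle, 3, max_reads * 4, 4):
            quality = quality.rstrip("\n")
            if quality:
                low = min(map(ord, quality))
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        raise ValueError("no quality lines found")
    # Characters below '@' (ASCII 64) would decode to negative scores
    # under offset 64, so the data must use offset 33.
    return 33 if lowest < 64 else 64


# Example call (file path taken from this thread):
# guess_phred_offset(
#     "/Users/ega/Augmentin/SequencingData/00.RawData/S2D2/"
#     "FDMP19H002103-1a_L1_S2D2_1.fq.gz"
# )
```

If this returns 33, the Phred33V2 manifest format is the one to use, which matches the successful import reported above.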

system · December 10, 2019, 9:53pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.