Problems importing paired end sequences

Good morning community,
Over the past few days I've been struggling with QIIME 2 (v2018.6, installed in a conda environment).
As I've done in the past, I requested paired-end sequencing (on an Illumina MiSeq) of the 16S V4 region with the F515/R809 primers for several DNA samples extracted from saliva.
Some of these sequences need to be analyzed together with older ones (sequenced last summer and already analyzed with v2018.6).
I prepared the pe-33-manifest file and started the analysis as usual with the following command:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path pe-33-manifest \
  --output-path paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

I then joined the reads, since I want to denoise with deblur (as I've done in the past without any problems):

qiime vsearch join-pairs \
  --i-demultiplexed-seqs demux.qza \
  --o-joined-sequences demux-joined.qza

When I opened demux-joined.qzv, I realized that, for some reason, all the new sequences lack coverage (the counts match neither the counts I received from the sequencing company nor the number of sequences present in the fastq.gz files).

Yesterday a colleague of mine (an experienced computer engineer) and I tried to understand what might be happening. We realized that, unlike with the older sequences and for a reason we have not been able to figure out yet, only the forward reads of the new sequences are being considered when paired-end-demux.qza is produced.
I've already checked the manifest and it's okay, and all the sequences (new and old) have the phred33 tag.
At this link (https://meocloud.pt/link/1f642518-09e8-4576-8748-1b155570e5fe/Qiime2/) I have put all the files produced, plus a pair of sequences from this batch (P101_FU1) and a pair of sequences from the older batch (P10_M8).
Has this already happened to anyone?
Can you help me?
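
A quick way to double-check a paired-end manifest is to confirm that every sample has exactly one forward and one reverse entry. The following is only a minimal sketch: `check_manifest` is a hypothetical helper name, and it assumes the comma-separated layout with the `sample-id,absolute-filepath,direction` header used in this thread.

```python
import csv
from collections import defaultdict


def check_manifest(lines):
    """Return {sample-id: [directions]} for samples that do not have
    exactly one 'forward' and one 'reverse' entry."""
    # Skip blank lines and comment lines before handing rows to the CSV parser.
    rows = csv.DictReader(
        line for line in lines if line.strip() and not line.startswith("#")
    )
    directions = defaultdict(list)
    for row in rows:
        directions[row["sample-id"]].append(row["direction"].strip())
    return {
        sample: dirs
        for sample, dirs in directions.items()
        if sorted(dirs) != ["forward", "reverse"]
    }
```

For example, `check_manifest(open("pe-33-manifest"))` returns an empty dict when every sample is listed once in each direction. It does not look inside the FASTQ files themselves.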

Hi @smd,

Can you send an example for R1 and R2?
If I understand you correctly, and you get the same read in the R1 and R2 files, you should contact your provider. They may be able to clarify the point for you.

Luca

Hi Luca,
I think the read is not the same in R1 and R2, but I'm leaving you an example here. (I'm also waiting for answers from my provider.)
What is happening is that when the paired-end-demux.qza file is constructed, the program is only using the R1 file, and I can't understand why.
P101_FU1_phred33_R1_001.fastq.gz (4.6 MB) P101_FU1_phred33_R2_001.fastq.gz (5.4 MB)

Hi,
I did a quick test joining your sequences with vsearch in qiime2-2019.10.
The result is attached. demux-joined.qzv (279.5 KB)

I used:
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path man2.txt --input-format PairedEndFastqManifestPhred33 --output-path demux-paired-end.qza

qiime vsearch join-pairs --i-demultiplexed-seqs demux-paired-end.qza --o-joined-sequences demux-joined.qza

qiime demux summarize --i-data demux-joined.qza --o-visualization demux-joined.qzv

The manifest file is as follows:
sample-id,absolute-filepath,direction
s1,$PWD/P101_FU1_phred33_R1_001.fastq.gz,forward
s1,$PWD/P101_FU1_phred33_R2_001.fastq.gz,reverse
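
For more than a couple of samples, a manifest like the one above can be generated rather than typed by hand. This is just a sketch under assumptions: `build_manifest` is a hypothetical helper, and it assumes every sample is named `<sample>_phred33_R1_001.fastq.gz` / `<sample>_phred33_R2_001.fastq.gz` as in the attached files. It writes resolved absolute paths instead of `$PWD`.

```python
from pathlib import Path


def build_manifest(directory,
                   r1_suffix="_phred33_R1_001.fastq.gz",
                   r2_suffix="_phred33_R2_001.fastq.gz"):
    """Return the text of a paired-end manifest for all R1/R2 pairs
    found in `directory` (file naming scheme is an assumption)."""
    directory = Path(directory).resolve()
    lines = ["sample-id,absolute-filepath,direction"]
    for r1 in sorted(directory.glob(f"*{r1_suffix}")):
        # The sample ID is whatever precedes the R1 suffix in the file name.
        sample = r1.name[: -len(r1_suffix)]
        r2 = directory / f"{sample}{r2_suffix}"
        if not r2.exists():
            raise FileNotFoundError(f"no reverse file for {sample}: {r2}")
        lines.append(f"{sample},{r1},forward")
        lines.append(f"{sample},{r2},reverse")
    return "\n".join(lines) + "\n"
```

Something like `Path("man2.txt").write_text(build_manifest("."))` would then produce a file that can be passed to `--input-path`.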

Did the joining work for you using these files?
Did you try the latest qiime2?

Luca

Sorry, here are R1 and R2: 3R443_M3_phred33_R1_001.fastq.gz (5.5 MB) 3R443_M3_phred33_R2_001.fastq.gz (5.5 MB)

I haven't tried the latest QIIME 2 yet (I'm still using v2018.6),
but you obtained exactly the same result as I did: 99 reads. According to my provider's counts, this pair should have 28,710 reads.
I've now uploaded R1 and R2 from a sample sequenced last summer. If you run the same commands on it, you will get 38,143 reads after joining the pairs.
3R443_M3_phred33_R1_001.fastq.gz (5.5 MB) 3R443_M3_phred33_R2_001.fastq.gz (5.5 MB)

Following your previous answer, I opened the R1 and R2 .fastq files to check whether they were the same, and they are not. Then I opened the files of the older sequences that I'm sending you now and compared the two. I noticed that the new sequences (P101_FU1) include characters like "(", ")" and "*" and do not contain "H". I'm a biomedical scientist without much experience in bioinformatics. I believe these differences mean that the sequencing protocol was different, but I can't figure out which protocol I should follow to import the sequences and produce the .qza file for QIIME 2 analysis. Do you have any clue?
Thank you in advance,
Sara
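
To compare the quality lines of the new and old files more systematically than by eye, the set of quality characters in each file can be tabulated. The sketch below is not part of QIIME 2: `quality_summary` is a hypothetical helper, and it assumes plain four-line FASTQ records and Phred+33 encoding (score = ASCII code minus 33).

```python
import gzip
from collections import Counter


def quality_summary(path, max_records=None):
    """Summarize the quality characters used in a gzipped FASTQ file.

    Assumes four lines per record (no wrapped sequences) and Phred+33."""
    counts = Counter()
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if max_records is not None and i // 4 >= max_records:
                break
            if i % 4 == 3:  # fourth line of each record holds the qualities
                counts.update(line.rstrip("\r\n"))
    if not counts:
        return {"chars": "", "min_q": None, "max_q": None}
    chars = "".join(sorted(counts))
    return {
        "chars": chars,                  # every distinct quality character
        "min_q": ord(chars[0]) - 33,     # lowest Phred score seen
        "max_q": ord(chars[-1]) - 33,    # highest Phred score seen
    }
```

Running it on one file from each batch (for example `quality_summary("P101_FU1_phred33_R1_001.fastq.gz")`) shows which characters appear in one batch and not the other. Under Phred+33, "(", ")" and "*" correspond to scores 7, 8 and 9, and "H" to 39.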

Hi Sara,
I’m confused now…
The P101_R1 and P101_R2 files contain 32,481 sequences, and they seem to be raw reads (no adapters or low-quality tails trimmed). True, I got only 99 sequences paired (sorry, I did not check earlier…).
Interestingly, when I tried to merge them with pear I got 32,339 sequences merged. I wonder if there is some weird character in the fastq…
:face_with_monocle::thinking:

I’ll try tomorrow with the second 3R443.
Best wishes,
Luca
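
Read counts like the ones above, and whether R1 and R2 list the same reads in the same order, can be checked with a few lines of Python. This is only a sketch: `read_ids` and `compare_pairs` are hypothetical helper names, and four-line FASTQ records are assumed.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read identifier of each record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header line: '@' + id, then optional comment
                fields = line[1:].split()
                ident = fields[0] if fields else ""
                # Older headers mark the mate with a trailing /1 or /2.
                if ident.endswith(("/1", "/2")):
                    ident = ident[:-2]
                yield ident


def compare_pairs(r1_path, r2_path):
    """Return (records in R1, records in R2, positions where IDs differ)."""
    n1 = n2 = mismatched = 0
    for id1, id2 in zip_longest(read_ids(r1_path), read_ids(r2_path)):
        n1 += id1 is not None
        n2 += id2 is not None
        if id1 is not None and id2 is not None and id1 != id2:
            mismatched += 1
    return n1, n2, mismatched
```

If the first two numbers are equal and the third is 0, the two files are at least consistently paired, which narrows the problem down to the content of the reads or their quality lines.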

Hi again Luca,
Sorry, it is too much information at the same time xD
As I said before, I believe that these characters “(” “)” “*” are the guilty ones!!!
Thank you so much for the help Luca, I’m going to talk with my provider!!!

Hi, sorry, I missed that.
It makes sense to me to wait for their response. However, if you need a quick alternative, you could join the reads outside QIIME 2 (I like pear, but many other tools are available), then import them as single-end reads with quality and denoise with deblur as you are planning to do.

Luca

Thank you so much for the help Luca
