q2-sidle demultiplexed by region with cutadapt + deblur denoise

elsamdea · June 23, 2021, 10:35am

Hello!

I am posting this here in case someone else has the same question as me.

I am working with the amazing q2-sidle, which lets you work with short-fragment sequences. (The tutorial is: Read Preparation — q2-sidle 2020.8 documentation.)

My samples are paired-end and demultiplexed by sample but not by region. The sequencing technology is MiSeq (Illumina).

The tutorial suggests that if the user has more than one region to study, the sequences (already demultiplexed by sample) should be demultiplexed again by region. To do that, it recommends the command qiime cutadapt trim-paired, with which you can remove the adapters at the same time as you select the region with the chosen primers.

My first question is: where should I specify the primers? Together with the adapters (adapters + primers)?

The documentation I have followed is:

the q2-sidle documentation (Read Preparation — q2-sidle 2020.8 documentation).
qiime cutadapt documentation (User guide — Cutadapt 5.0 documentation).

But I cannot find anything related to primers in the cutadapt documentation. I tried with the adapter sequences and with the primer sequences (separately). However, I have problems with the denoising when I use the primers in cutadapt.

Also, after that, the sequences should be denoised using Deblur (Illumina sequences) or DADA2 (Ion Torrent or 454 pyrosequencing), right?

The denoising protocol I follow has the following steps (Alternative methods of read-joining in QIIME 2 — QIIME 2 2021.2.0 documentation):

Join the sequences (I will use Deblur, and this step is required). Command: qiime vsearch join-pairs.
Quality-control the sequences with qiime quality-filter q-score.
Denoise with qiime deblur denoise-16S.
View the feature table obtained with Deblur.
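
For anyone following along, the four steps above can be sketched roughly as below. This is only a sketch, assuming the QIIME 2 2021.2 command names used in the linked tutorial (later releases may name some actions differently). The function name, all file names, and the trim length are placeholders, not values from this thread.

```shell
# Sketch of the join -> quality filter -> Deblur -> summarize steps.
# Assumes QIIME 2 2021.2; file names and trim length are placeholders.
denoise_joined() {
  demux="$1"        # paired-end reads for one region (.qza)
  trim_length="$2"  # length to which Deblur trims the joined reads

  # 1. Join the forward and reverse reads.
  qiime vsearch join-pairs \
    --i-demultiplexed-seqs "$demux" \
    --o-joined-sequences joined.qza || return 1

  # 2. Quality-filter the joined reads by Q score.
  qiime quality-filter q-score \
    --i-demux joined.qza \
    --o-filtered-sequences joined-filtered.qza \
    --o-filter-stats filter-stats.qza || return 1

  # 3. Denoise with Deblur.
  qiime deblur denoise-16S \
    --i-demultiplexed-seqs joined-filtered.qza \
    --p-trim-length "$trim_length" \
    --p-sample-stats \
    --o-representative-sequences rep-seqs.qza \
    --o-table table.qza \
    --o-stats deblur-stats.qza || return 1

  # 4. Summarize the resulting feature table.
  qiime feature-table summarize \
    --i-table table.qza \
    --o-visualization table.qzv
}
```

It would be called as, for example, denoise_joined region1-demux.qza 250, with the trim length chosen from the quality plot of the joined reads.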

When I follow this order (cutadapt + Deblur steps), I cannot get past step 2. When I look at the plots, the boxplot obtained after the joining step is:

But, after the quality control:

I think I have included all the relevant information. I am new to QIIME and still a bit lost.

Thank you for your help!

Elsa

jwdebelius · June 23, 2021, 2:34pm

Hi @elsamdea,

Let me see if I can help!

You should indicate the primers with the --p-front-f and --p-front-r flags. (Sorry, I just discovered the mistake in the docs and opened an issue to correct it.) So, the command should be

qiime cutadapt trim-paired \
 --i-demultiplexed-sequences [demultiplexed multi region sequences] \
 --p-front-f [forward primer] \
 --p-front-r [reverse primer] \
 --p-error-rate 0.1 \
 --p-indels \
 --p-discard-untrimmed \
 --o-trimmed-sequences [demultiplexed regional file]

Where the [thing in brackets] represents your file or primer.
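
If there are several regions, one way to repeat this is to wrap the command in a small shell function and call it once per primer pair. This is just a sketch built from the flags above; the function name, the region label, and the input and output file names are hypothetical.

```shell
# Sketch: run the trim-paired command above once per amplified region.
# Region labels, primers, and file names are placeholders.
trim_region() {
  region="$1"  # label used in the output file name
  fwd="$2"     # forward primer sequence for this region
  rev="$3"     # reverse primer sequence for this region

  qiime cutadapt trim-paired \
    --i-demultiplexed-sequences multi-region-demux.qza \
    --p-front-f "$fwd" \
    --p-front-r "$rev" \
    --p-error-rate 0.1 \
    --p-indels \
    --p-discard-untrimmed \
    --o-trimmed-sequences "${region}-demux.qza"
}
```

It would then be called once per region, for example trim_region region1 [forward primer 1] [reverse primer 1], producing one regional file per primer pair.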

You're welcome to use either Deblur or DADA2 with Illumina data; there are pros and cons to both. Your bigger issue looks like the quality plots, though. Your reads are generally low quality and are not passing the filter. Would you be willing to share the plots for the data from all regions?

PS. I moved this to the community plugin support column

Best,
Justine

elsamdea · June 25, 2021, 8:16am

Hi @jwdebelius!

Thank you for your help!! It was really useful!

I did not know that I could use either Deblur or DADA2. I will try DADA2; maybe the results will be a bit better.

The following boxplot shows the forward and reverse sequences, imported with:
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-format CasavaOneEightSingleLanePerSampleDirFmt
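
For completeness, a full import plus the summary that produces this kind of quality plot might look like the sketch below. The function name, the input directory, and the output file names are placeholders; only the --type and --input-format values come from this post.

```shell
# Sketch: import Casava 1.8 paired-end reads and build the quality plots.
# The reads directory and output names are placeholders.
import_and_summarize() {
  reads_dir="$1"  # directory holding the per-sample fastq.gz files

  qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path "$reads_dir" \
    --input-format CasavaOneEightSingleLanePerSampleDirFmt \
    --output-path demux.qza || return 1

  # The resulting .qzv contains the per-position quality boxplots.
  qiime demux summarize \
    --i-data demux.qza \
    --o-visualization demux.qzv
}
```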

This boxplot shows a read length of 149-150 nucleotides and a total of 114281 reads.

And here is the boxplot for the joined sequences (which shows 79581 reads):

And the Forward Reads Frequency Histogram, in case it helps. The first one corresponds to the unjoined paired-end sequences and the second one to the joined sequences:

Also, I think it is relevant that the sequences come from DNA extracted from paraffin-embedded tissue.

Greetings,
Elsa

jwdebelius · June 25, 2021, 2:01pm

Hi @elsamdea,

Thank you for showing that error profile. With one sample, it will be best to stick with Deblur. You might also consider trying only your forward reads to see if those make it through quality filtering more easily.
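
A rough sketch of the forward-read-only route is below. It rests on an assumption worth checking with --help on your install: that quality-filter q-score and deblur denoise-16S both accept the paired-end artifact and then use only the forward reads. The function name, file names, and trim length are placeholders.

```shell
# Sketch: quality-filter and Deblur using the forward reads only.
# Assumption: both actions accept the unjoined paired-end artifact and use
# just the forward reads; confirm with --help on your QIIME 2 version.
denoise_forward_only() {
  demux="$1"        # paired-end reads, not joined (.qza)
  trim_length="$2"  # placeholder; must not exceed the forward read length

  qiime quality-filter q-score \
    --i-demux "$demux" \
    --o-filtered-sequences forward-filtered.qza \
    --o-filter-stats forward-filter-stats.qza || return 1

  qiime deblur denoise-16S \
    --i-demultiplexed-seqs forward-filtered.qza \
    --p-trim-length "$trim_length" \
    --p-sample-stats \
    --o-representative-sequences forward-rep-seqs.qza \
    --o-table forward-table.qza \
    --o-stats forward-deblur-stats.qza
}
```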

However, there are some quality issues with your sequences to begin with, and the low-quality reads show up in some weird places. I haven't worked with enough paraffin-embedded samples to know if this is a problem with some kind of inhibitor in the DNA (although it would surprise me if inhibitors made it into the sequencer instead of being tackled in PCR) or if it is the result of bad sequencing. But this is far from a normal Illumina profile, which may be the source of some of the problems.

Best,
Justine

elsamdea · July 1, 2021, 8:30am

Hi @jwdebelius,

You are right, the quality of the samples drops in weird places. But I am not sure how to proceed now.

I have read that maybe I can use only the forward reads (as you suggested), but is there anything else I can do?

Could anyone who has worked with FFPE samples help?

In any case, if I find out how to fix this quality problem, I will post it here.

Thank you!

Elsa

jwdebelius · July 1, 2021, 4:10pm

Hi @elsamdea,

I know the other thread (linked for posterity) has ended up on this topic. We try to avoid cross posting on the forum - something I should have paid better attention to - so I'm going to try and answer here.

I think your next step is to check your denoising stats and see where the issue is. (I'm guessing quality filtering, but it's always good to check!)

I would also strongly consider contacting your sequencing provider and asking about the full run to understand whether this is specific to your samples or has shown up in other places.
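
As a sketch, assuming the quality-filter stats and the Deblur stats were saved under the placeholder file names passed in below, both can be turned into visualizations like this:

```shell
# Sketch: make the quality-filter and Deblur stats viewable.
# The function name and the .qza file names are placeholders.
view_denoising_stats() {
  filter_stats="$1"  # from the --o-filter-stats output of quality-filter q-score
  deblur_stats="$2"  # from the --o-stats output of deblur denoise-16S

  # Per-sample counts of reads kept and dropped by quality filtering.
  qiime metadata tabulate \
    --m-input-file "$filter_stats" \
    --o-visualization filter-stats.qzv || return 1

  # Per-sample counts of reads lost at each Deblur step.
  qiime deblur visualize-stats \
    --i-deblur-stats "$deblur_stats" \
    --o-visualization deblur-stats.qzv
}
```

Calling view_denoising_stats filter-stats.qza deblur-stats.qza and opening the two .qzv files should show at which step most reads are being lost.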

Best,
Justine
