Re-analyzing different datasets with single-end or paired-end reads

Dear all,
I'm really having trouble dealing with public 16S amplicon datasets. I recently downloaded several relevant datasets and plan to re-analyze them.
However, the datasets used different sequencing methods and targeted different variable regions: some are 454 single-end sequencing files, others are Illumina PE250/PE300. In addition, where the V3-V6 region was amplified, the forward and reverse PE250/PE300 reads have almost no overlapping bases, and the percentage of merged sequences was very low (<5%) when I processed them with vsearch.
So I'd like to ask:

  1. Can I just analyze each dataset separately to obtain table.qza, rep-seqs.qza, etc., and then use qiime feature-table merge to generate the final feature table and representative-sequences .qza files?
  2. If all sequencing datasets must be imported and analyzed together at the same time, can I use only forward.fastq or only reverse.fastq for sequences that cannot be merged?

I sincerely look forward to your answers and help!

Hi @1111

Yes, you can! It's actually probably preferred: DADA2 learns its error model on a per-sequencing-run basis, so running quality control on each run separately makes sense.
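As a rough sketch of that per-run workflow, the bash functions below print (rather than run) the QIIME 2 commands that denoise each run on its own and then combine the outputs with qiime feature-table merge and qiime feature-table merge-seqs. The run names, the file-naming pattern, the use of denoise-single, and the trimming values you pass in are all hypothetical placeholders; printing the commands first lets you check them before anything is executed.

```bash
#!/usr/bin/env bash
# Sketch: print QIIME 2 commands that denoise each sequencing run on its own
# and then merge the per-run outputs. Run names, file names, and the trimming
# values passed in are hypothetical placeholders.

# One "qiime dada2 denoise-single" command per run. Every run gets the same
# trim/trunc values so all runs end up covering the same gene region.
denoise_commands() {
  local trim_left=$1 trunc_len=$2 r
  shift 2
  for r in "$@"; do
    echo "qiime dada2 denoise-single --i-demultiplexed-seqs ${r}-demux.qza" \
      "--p-trim-left ${trim_left} --p-trunc-len ${trunc_len}" \
      "--o-table ${r}-table.qza --o-representative-sequences ${r}-rep-seqs.qza" \
      "--o-denoising-stats ${r}-stats.qza"
  done
}

# The two merge commands: one for the feature tables, one for the
# representative sequences.
merge_commands() {
  local tables="" seqs="" r
  for r in "$@"; do
    tables+=" --i-tables ${r}-table.qza"
    seqs+=" --i-data ${r}-rep-seqs.qza"
  done
  echo "qiime feature-table merge${tables} --o-merged-table merged-table.qza"
  echo "qiime feature-table merge-seqs${seqs} --o-merged-data merged-rep-seqs.qza"
}
```

For example, running denoise_commands 12 250 run1 run2 followed by merge_commands run1 run2 and redirecting both into a file gives a script you can review and then run with bash (12 and 250 are illustrative values only).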

If you are not able to merge the reads, running this with single-end reads makes sense!

Hope that helps!
:turtle:


Thanks a million!!!
By the way, I wonder whether reviewers would accept this approach of analyzing each dataset individually.


They should!

Denoising each sequencing run separately and then merging the ASV tables is recommended by the DADA2 devs in the Big Data tutorial.

A DADA2 developer also recommended analyzing each dataset individually in a GitHub issue:

should I combine datasets from different sequencing runs before or after running dada2?

After. Just make sure to trim each run to the same gene region (i.e. same trimLeft for merged paired end data, and same trimLeft and truncLen for single-end data) to allow merging later.


Wow!!!
Thanks for such detailed responses. Your answers have been very helpful!
I would like to confirm with you that it is OK for me to handle paired-end and single-end data like this:
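# Output file names (--o-*) are placeholders; rename them as needed.
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-demux.qza \
  --p-n-threads 16 \
  --p-trim-left-f 10 --p-trim-left-r 10 \
  --p-trunc-len-f 240 --p-trunc-len-r 230 \
  --o-table paired-table.qza \
  --o-representative-sequences paired-rep-seqs.qza \
  --o-denoising-stats paired-stats.qza

qiime dada2 denoise-single \
  --i-demultiplexed-seqs single-end-demux.qza \
  --p-n-threads 16 \
  --p-trim-left 12 \
  --p-trunc-len 600 \
  --o-table single-table.qza \
  --o-representative-sequences single-rep-seqs.qza \
  --o-denoising-stats single-stats.qza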

For the paired-end sequencing data, trimLeft stays the same for every sequencing dataset, but truncLen can differ; for the single-end data, both trimLeft and truncLen are the same for every sequencing dataset.
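Why trimLeft must match but truncLen may differ for merged paired-end data can be checked with a little arithmetic. The sketch below assumes both reads start at opposite ends of the same amplicon and that the reads still overlap; the amplicon length you pass in is a hypothetical value, not something taken from this thread. Under those assumptions truncLen only changes how much the reads overlap, while the merged length depends only on the amplicon length and the two trimLeft values.

```bash
#!/usr/bin/env bash
# Arithmetic sketch for merged paired-end reads. amplicon_len is the length
# of the sequenced fragment (a hypothetical input); the reads must overlap,
# i.e. trunc_f + trunc_r must exceed amplicon_len.

# Merged read: everything between the two trimmed 5' ends.
merged_len() {
  local amplicon_len=$1 trim_f=$2 trim_r=$3
  echo $(( amplicon_len - trim_f - trim_r ))
}

# Overlap between the truncated forward and reverse reads.
# The forward read ends at trunc_f; the reverse read starts at
# amplicon_len - trunc_r, measured from the forward end.
overlap_len() {
  local amplicon_len=$1 trunc_f=$2 trunc_r=$3
  echo $(( trunc_f + trunc_r - amplicon_len ))
}
```

With a hypothetical 440-bp amplicon and trimLeft 10/10, truncLen 240/230 gives a 30-base overlap and truncLen 250/210 gives a 20-base overlap, but merged_len 440 10 10 is 420 in both cases, so the two runs produce ASVs of the same region.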

Here is the key detail:

Just make sure to trim each run to the same gene region

After processing each sequencing run separately with DADA2, the next step is to merge them together.

But merging only works if both runs cover exactly the same region.
Same start. Same end. Same length.

Let me take a look at your settings and see whether merging after DADA2 is possible.

--p-trim-left-f 10 --p-trim-left-r 10
--p-trunc-len-f 240 --p-trunc-len-r 230

This results in a forward read length of 230 and a reverse read length of 220,
for a total of 450 bases across the forward and reverse reads.
(After the reads are overlapped / joined / merged, this number becomes smaller.)

For the single-end data, --p-trim-left 12 and --p-trunc-len 600 result in a read length of 588.

These two lengths are different, so they cannot be merged.

Different regions were sequenced. Merging is not possible.
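The numbers above can be recomputed from the DADA2 parameters with a few lines of bash. This is just a sketch of the arithmetic in this reply (kept length = truncLen minus trimLeft); the function names are made up for illustration.

```bash
#!/usr/bin/env bash
# Recompute the lengths discussed above from the DADA2 parameters.

# Bases kept from one read: truncLen minus the trimmed 5' bases.
trimmed_len() {
  local trim_left=$1 trunc_len=$2
  echo $(( trunc_len - trim_left ))
}

# Forward plus reverse bases before merging; the merged read is shorter
# than this by the length of the overlap.
paired_total() {
  local trim_f=$1 trunc_f=$2 trim_r=$3 trunc_r=$4
  echo $(( $(trimmed_len "$trim_f" "$trunc_f") + $(trimmed_len "$trim_r" "$trunc_r") ))
}

paired=$(paired_total 10 240 10 230)   # 230 + 220
single=$(trimmed_len 12 600)           # 600 - 12
echo "paired-end total before merging: ${paired}; single-end read: ${single}"
if [ "$paired" -lt "$single" ]; then
  echo "the single-end reads are longer than even the unmerged paired-end total"
fi
```

Because the merged paired-end reads can only be shorter than 450, they can never reach 588, which is why the two sets of ASVs cannot line up as configured.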


Sorry, how can this be done if the prerequisite for merging is that reads have the same length? Even if all the files were paired-end, the merged reads would not all be the same length. I'm not sure what you mean by the “exact same region”.

Hello!

I struggled to find a figure that shows both the modern EMP primers and what Illumina paired-end sequencing looks like. How about this?


In panel 3, you can see the blue area where the forward and reverse reads overlap on the amplicon.

These overlapping ends are merged by DADA2, vsearch, and other programs.
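To illustrate what "merging the overlapping ends" means, here is a toy bash sketch. It is not how DADA2 or vsearch actually merge reads (they align the reads and apply their own mismatch and quality rules); it only reverse-complements the reverse read, looks for the longest exact match between the end of the forward read and the start of the reverse-complemented reverse read, and joins the reads there. The minimum overlap of 12 is an arbitrary default for this sketch. Pairs with too little overlap, like the V3-V6 PE250/PE300 reads described earlier in the thread, simply fail to merge.

```bash
#!/usr/bin/env bash
# Toy read merger: exact-match overlap only, no quality scores or mismatches.

# Reverse complement of an A/C/G/T sequence.
revcomp() {
  local s=$1 out="" i
  for (( i = ${#s} - 1; i >= 0; i-- )); do
    out+="${s:i:1}"
  done
  echo "$out" | tr 'ACGT' 'TGCA'
}

# merge_pair FORWARD REVERSE [MIN_OVERLAP]
# Prints the merged sequence, or returns 1 if no overlap >= MIN_OVERLAP exists.
merge_pair() {
  local f=$1 rc k min=${3:-12}
  rc=$(revcomp "$2")
  # Try the longest possible overlap first, then shrink it.
  for (( k = ${#f} < ${#rc} ? ${#f} : ${#rc}; k >= min; k-- )); do
    if [ "${f: -k}" = "${rc:0:k}" ]; then
      echo "${f}${rc:k}"
      return 0
    fi
  done
  return 1
}
```

For example, merge_pair ACGTACGTAA GGCCTTACGT 4 prints ACGTACGTAAGGCC: the two 10-base reads share a 6-base overlap, so the merged read is 10 + 10 - 6 = 14 bases long.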

The important part is the last line of the image, Sequence variant calling: the start, end, and length of that DNA fragment must match.

Does that help answer your question?
I had a hard time finding figures to show this, so maybe I need to make my own!


EDIT:

How can this be done if the prerequisite for merging is that reads stay the same length?

Reads need to be the same length after merging.

If the paired-end reads are longer, merging can bring them down to a matching length.

In your example, the merged paired-end reads are shorter (less than 450 bp).
You will have to trim and truncate the single-end reads more to make them match.
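A minimal sketch of that calculation follows. It assumes, hypothetically, that the single-end reads start at the same forward primer as the paired-end forward reads, so using the same trimLeft puts both at the same start position; the overlap and read-length values in the example are illustrative, not taken from your data. The single-end truncLen then has to be trimLeft plus the merged paired-end length.

```bash
#!/usr/bin/env bash
# Sketch: choose a single-end truncLen so single-end ASVs match merged
# paired-end ASVs. Assumes (hypothetically) both read sets start at the same
# forward primer and use the same trimLeft, so they share a start position.

# Merged paired-end length from the trimmed read lengths and their overlap.
merged_from_overlap() {
  local fwd_len=$1 rev_len=$2 overlap=$3
  echo $(( fwd_len + rev_len - overlap ))
}

# truncLen for the single-end reads: trimLeft plus the target length.
# Fails if the single-end reads are too short to reach that truncLen.
single_trunc_for() {
  local trim_left=$1 target_len=$2 read_len=$3
  local trunc=$(( trim_left + target_len ))
  if (( trunc > read_len )); then
    echo "single-end reads (${read_len} bp) are too short for truncLen ${trunc}" >&2
    return 1
  fi
  echo "$trunc"
}
```

For instance, with the 230- and 220-base trimmed reads from this thread and a hypothetical 20-base overlap, merged_from_overlap 230 220 20 gives 430, and single_trunc_for 10 430 600 gives a single-end truncLen of 440 (trimLeft 10, matching the forward reads).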
