Dear everyone,
I am having real trouble working with public 16S amplicon datasets. I recently downloaded several such datasets and plan to re-analyze them.
However, I found that the datasets used different sequencing methods and targeted different variable regions: some are 454 single-end files, and some are Illumina PE250/PE300. In addition, when the V3-V6 region was amplified, the forward and reverse PE250/PE300 reads share almost no overlapping bases, and the percentage of merged sequences from vsearch was very low (<5%).
So, I want to ask:
Can I just analyze each data set separately to obtain table.qza, rep-seqs.qza, etc., and then use qiime feature-table merge to generate the final feature table and feature sequence .qza file?
If all sequencing datasets must be imported and analyzed together, can I use only forward.fastq or only reverse.fastq for the datasets whose reads cannot be merged?
I sincerely look forward to your answers and help.
Should I combine datasets from different sequencing runs before or after running dada2?
After. Just make sure to trim each run to the same gene region (i.e. same trimLeft for merged paired end data, and same trimLeft and truncLen for single-end data) to allow merging later.
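The "denoise each run separately, then merge" workflow could look roughly like the sketch below. It is only an illustration under several assumptions: the run names and the `<run>-demux.qza` file names are hypothetical, all runs are single-end and cover the same gene region, and the trim/truncation values are placeholders. By default the script only prints the commands (dry run), so nothing is executed until you opt in.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the QIIME 2 commands instead of running them.
# Set RUN="" to execute for real (assumes QIIME 2 is installed and activated).
RUN="${RUN-echo}"

# Hypothetical run names; each needs its own <run>-demux.qza artifact.
runs="run1 run2"

# Placeholder values. Use the same trim/truncation for every single-end run
# so that all resulting sequences cover the same stretch of the gene.
TRIM_LEFT=10
TRUNC_LEN=240

merge_tables=""
merge_seqs=""
for run in $runs; do
  # Denoise each sequencing run on its own.
  $RUN qiime dada2 denoise-single \
    --i-demultiplexed-seqs "${run}-demux.qza" \
    --p-trim-left "$TRIM_LEFT" \
    --p-trunc-len "$TRUNC_LEN" \
    --o-table "${run}-table.qza" \
    --o-representative-sequences "${run}-rep-seqs.qza" \
    --o-denoising-stats "${run}-stats.qza"
  # Collect the per-run outputs for the merge step.
  merge_tables="$merge_tables --i-tables ${run}-table.qza"
  merge_seqs="$merge_seqs --i-data ${run}-rep-seqs.qza"
done

# Merge the per-run feature tables and representative sequences afterwards.
# The variables are left unquoted on purpose so they split into arguments.
$RUN qiime feature-table merge $merge_tables --o-merged-table table.qza
$RUN qiime feature-table merge-seqs $merge_seqs --o-merged-data rep-seqs.qza
```

Running it as-is prints three commands per pass through the workflow; running it with `RUN=""` would execute them. For paired-end runs the same pattern should apply with `denoise-paired` and the per-direction trim and truncation options.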
Wow!!!
Thanks for such detailed responses. Your answer has been very helpful!!!
I would like to confirm with you that it is OK for me to handle paired-end and single-end data like this:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-demux.qza \
  --p-n-threads 16 \
  --p-trim-left-f 10 --p-trim-left-r 10 \
  --p-trunc-len-f 240 --p-trunc-len-r 230
For the paired-end data, trimLeft is kept the same across all sequencing datasets, but truncLen can differ; for the single-end data, both trimLeft and truncLen are the same across all sequencing datasets.
This results in a forward length of 230 and a reverse length of 220.
The total length of the forward and reverse reads is 450.
(After read overlap / joining / merging, this number can become smaller.)
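The arithmetic behind those numbers can be written out as a small sketch. The trim and truncation values are the ones from the command above; the overlap value is purely hypothetical, since the real overlap depends on the amplicon length.

```shell
#!/usr/bin/env bash
# Values taken from the denoise-paired command in this thread.
trim_left_f=10; trunc_len_f=240
trim_left_r=10; trunc_len_r=230

# Each read keeps the bases between trim-left and trunc-len.
forward_len=$((trunc_len_f - trim_left_f))
reverse_len=$((trunc_len_r - trim_left_r))
total_len=$((forward_len + reverse_len))

# Hypothetical overlap between forward and reverse reads, for illustration only.
overlap=20
merged_len=$((total_len - overlap))

echo "forward=$forward_len reverse=$reverse_len total=$total_len merged=$merged_len"
```

With a larger overlap the merged length shrinks further, which is why the merged length is an upper bound of 450 here rather than a fixed value.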
This results in a combined length of 588.
These two lengths are different, so they cannot be merged.
Different regions were sequenced. Merging is not possible.
Sorry, how can this be done if merging requires the reads to be the same length? Even if all the files were paired-end, the merged reads would not all have identical lengths. I'm not sure what you mean by the “exact same region”.