DADA2 merging: impact on merged read length

dasqiime22 · July 29, 2019, 7:34pm

My data are paired-end sequences (151 bp for both forward and reverse reads). In DADA2 the results look fine, and almost 90% of the sequences are retained. The problem is that after merging I am getting lengths that vary from 160 to 252 bp. Is that fine for subsequent analysis?

Mehrbod_Estaki · July 29, 2019, 8:51pm

Hi @dasqiime22,
What is your expected amplicon size? What region are you targeting? It's possible that these are all real reads; it's also possible that some of them are contaminants, chimeras, etc. Paired-end reads often produce amplicons of variable size, though they should generally not differ that much, i.e. 160 vs 252. This is of course assuming you are looking at 16S; fungal data can be much more variable.
What do you see when you blast some of those reads that are outside of your expected target size?

dasqiime22 · July 29, 2019, 9:04pm

Expected amplicon size is around 253 bp.
Target region is V4.
Yes, 16S.
Can you tell me whether trimming or truncating is mandatory if the quality score is good (>30)? What I see is that if I trim a few bases (around 12), the merged sequence length I get is mostly around 230 bp.
Without trimming I am getting mostly 253 bp.
Any suggestions?

Mehrbod_Estaki · July 30, 2019, 1:12am

Hi @dasqiime22,
Did you blast those reads to see what they were? That's really going to help us troubleshoot this.

Trimming or truncating is probably not necessary if the quality scores are good all the way through, but I don't think this is an issue in your case.

The ~230 bp length makes sense, since the trim parameter cuts bases from the 5' end: if you usually had 253 bp merged reads, then removing 2 x 12 bp would give you reads about 229 bp long. The trunc parameter is the one that removes bases from the 3' end.
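The arithmetic above can be written out as a small sketch. The function names here are hypothetical helpers for illustration, not part of DADA2 or QIIME 2, and the sketch assumes the forward and reverse reads together span the whole amplicon.

```python
def merged_length(amplicon_len: int, trim_left_f: int = 0, trim_left_r: int = 0) -> int:
    """Merged read length after cutting bases from the 5' end of each read.

    Each 5' trim removes bases from one outer end of the amplicon,
    so the merged read gets shorter by the sum of the two trims.
    """
    return amplicon_len - trim_left_f - trim_left_r


def overlap(len_f: int, len_r: int, amplicon_len: int) -> int:
    """Bases shared by the two reads, given their lengths after any 3' truncation.

    amplicon_len is the untrimmed amplicon length. Truncation shortens the
    overlap but does not change the merged length while the reads still overlap.
    """
    return len_f + len_r - amplicon_len


# Numbers from this thread: 2 x 151 bp reads, ~253 bp V4 amplicon, 12 bp trimmed per read.
print(merged_length(253))          # no trimming
print(merged_length(253, 12, 12))  # 12 bp trimmed from each 5' end
print(overlap(151, 151, 253))      # overlap available without truncation
```

With these numbers the merged length drops from 253 to 229 bp, which matches the "around 230 bp" reported above, and the untruncated reads overlap by 49 bp.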

dasqiime22 · July 31, 2019, 4:19pm

Yes, I checked the FASTA file. There is only one sequence with a length of 169 bp; it appears to be a correct one, and I got 100% similarity with others.

But I am surprised: I am doing the analysis with QIIME 2 and also checking it in R, up to the diversity step. I found 540 ASVs in QIIME 2 and 535 ASVs in R. Is this common?

The alpha diversity measures also differ. For example, for Simpson the index values vary: in R I am getting a range of 1.5-4.2, and in QIIME 2 a range of 2.2-5.3. Which approach (QIIME 2 or R) should I consider?

Mehrbod_Estaki · July 31, 2019, 7:57pm

Hi @dasqiime22,
Once again, please report the results of your blast search on those short reads (say the one that is 169 bp); you can click on the sequence hyperlink in the visualizer you posted originally. Alternatively, can you share with us that original .qzv file so we can look through it properly? I ask because if the reads outside of our expected range are not hitting anything resembling bacteria, we can simply filter them out and consider them chimeras/contaminants.
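To pick out which features are worth blasting, a quick length check on the representative sequences can help. Here is a minimal sketch, assuming the sequences are available as FASTA text; `read_fasta`, `off_target_lengths`, and the 250-256 bp window are illustrative assumptions, not QIIME 2 functions or recommended cutoffs.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {feature_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def off_target_lengths(records: Dict[str, str], min_len: int = 250, max_len: int = 256) -> Dict[str, int]:
    """Return {feature_id: length} for sequences outside the assumed length window."""
    return {
        fid: len(seq)
        for fid, seq in records.items()
        if not (min_len <= len(seq) <= max_len)
    }


# Toy input: one sequence at the expected V4 length and one short outlier.
example = [">asv_expected", "A" * 253, ">asv_short", "A" * 169]
print(off_target_lengths(read_fasta(example)))  # {'asv_short': 169}
```

Anything this flags is only a candidate for a closer look; whether it is a contaminant still has to be decided from the blast hit.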

The difference in ASV counts seems to be a completely separate question/concern. Could you please start a new thread and clarify exactly what steps you have performed in DADA2 vs QIIME 2's DADA2 plugin? A difference of 5 ASVs between these runs is really not a big issue; it is possible that different DADA2 versions are being compared and/or that the random seed differed between the two runs.

Same as above for the alpha diversity question: could you please start a new thread with much more detail on what exactly you are comparing? For example, did you rarefy in QIIME 2 but not in R? Different versions of the Simpson index might be used by these tools, but we would need much more info to sort this out properly.
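To illustrate how "different versions of the Simpson index" can disagree on identical counts, here is a minimal sketch of three common variants. Which variant a given tool reports is an assumption you would need to check in that tool's documentation; `simpson_variants` is a hypothetical helper, not a function from QIIME 2 or any R package.

```python
from typing import Dict, Sequence


def simpson_variants(counts: Sequence[float]) -> Dict[str, float]:
    """Compute three Simpson-type indices from one sample's feature counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    proportions = [c / total for c in counts if c > 0]
    dominance = sum(p * p for p in proportions)  # D = sum(p_i^2)
    return {
        "dominance": dominance,              # D, between 0 and 1
        "gini_simpson": 1.0 - dominance,     # 1 - D, between 0 and 1
        "inverse_simpson": 1.0 / dominance,  # 1 / D, at least 1
    }


# Four equally abundant features: D = 0.25, 1 - D = 0.75, 1 / D = 4.0
print(simpson_variants([10, 10, 10, 10]))
```

Note that D and 1 - D can never exceed 1, so index values in the 1.5-5.3 range would be consistent with an inverse-Simpson-style index or a different metric altogether. That is one more reason to confirm exactly which metric each tool computed before comparing numbers.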

dasqiime22 · July 31, 2019, 8:18pm

This is the qzv file: demux.qzv (292.2 KB)

The results of Blast:

I have one question: how can I get a file with the ASV counts per sample from table.qzv? Most statistical software needs a fasta/tsv/txt file.

Mehrbod_Estaki · July 31, 2019, 8:56pm

Hi @dasqiime22,
This is your demux.qzv file. The file we need is rep-seqs.qzv, the one your original post had a screenshot of.

See this section of the exporting tutorial for how to get your feature-table in biom format for other tools.
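Once the table has been exported and converted to tab-separated text (for example with `biom convert --to-tsv`), it can be read by almost any tool. Here is a minimal sketch of loading such a file, assuming the usual layout of a `# Constructed from biom file` comment line followed by a header row starting with `#OTU ID`; `parse_feature_table` is a hypothetical helper and the sample and feature names are made up.

```python
import csv
import io
from typing import Dict


def parse_feature_table(tsv_text: str) -> Dict[str, Dict[str, float]]:
    """Parse a TSV feature table into {feature_id: {sample_id: count}}."""
    sample_ids = None
    table: Dict[str, Dict[str, float]] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        if row[0].startswith("#"):
            # The header row has sample columns; plain comment lines have none.
            if len(row) > 1:
                sample_ids = row[1:]
            continue
        if sample_ids is None:
            raise ValueError("no header row found before the data rows")
        table[row[0]] = {s: float(v) for s, v in zip(sample_ids, row[1:])}
    return table


# Made-up example in the assumed layout.
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tsampleA\tsampleB\n"
    "asv1\t10.0\t0.0\n"
    "asv2\t3.0\t7.0\n"
)
table = parse_feature_table(example)
print(table["asv2"]["sampleB"])  # 7.0
```

For a real file, pass the contents of the converted TSV instead of the `example` string.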

dasqiime22 · July 31, 2019, 9:30pm

Thanks for the second reply.

Please find rep-seqs.qzv:
rep-seqs.qzv (269.0 KB)

Mehrbod_Estaki · August 1, 2019, 2:06am

Hi @dasqiime22,
Your reads are actually perfect and there is no need for concern. They are all at the expected length of 252/253 bp, with the exception of a single read that is 169 bp. Blasting that single feature shows that it is human mitochondrial sequence. You can easily just filter this feature out and carry on with your downstream analysis without any worries.

dasqiime22 · August 2, 2019, 11:57pm

Many many thanks for your help!
