My data is paired-end sequence data (151 bp for both F and R). In DADA2 the results look fine, retaining almost 90% of the sequences. The problem is that after merging I get lengths ranging from 160 to 252 bp. Is that fine for subsequent analysis?
What is your expected amplicon size? What region are you targeting? It’s possible that these are all real reads; it’s also possible that some of them are contaminants, chimeras, etc. Paired-end reads often produce amplicons of variable size, though they should generally not differ that much, i.e. 160 vs 252 bp. This of course assumes you are looking at 16S; fungal data can be much more variable.
What do you see when you blast some of those reads that are outside of your expected target size?
Expected amplicon size is around 253 bp.
Target region is V4.
Can you tell me whether trimming or truncating is mandatory if the quality scores are above 30? What I see is that if I trim a few bases (around 12), the merged sequences are mostly around 230 bp long.
Without trimming, I get mostly 253 bp.
Did you BLAST those reads to see what they are? That is really going to help us troubleshoot this.
It probably is not necessary if they are good all the way through, but I don’t think this is an issue in your case.
This makes sense, since the trim parameter cuts bases from the 5’ end, meaning that if you usually had 253 bp reads, removing 2 x 12 bp would give you reads about 229 bp long. The trunc parameter is the one that removes bases from the 3’ end.
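The arithmetic above can be written out as a small Python sketch. The function and argument names here are made up for illustration and are not DADA2 or QIIME 2 parameters; the only assumption is that 5’ trimming shortens the merged sequence by the number of bases removed from each read, while the overlap depends on the read lengths left after any 3’ truncation.

```python
def merged_length(amplicon_len: int, trim_left_f: int = 0, trim_left_r: int = 0) -> int:
    """Expected merged length after removing bases from the 5' end of each read."""
    return amplicon_len - trim_left_f - trim_left_r


def overlap_length(amplicon_len: int, read_len_f: int, read_len_r: int) -> int:
    """Bases shared by the forward and reverse reads.

    read_len_f / read_len_r are the read lengths after any 3' truncation.
    Trimming at the 5' end does not change this value.
    """
    return read_len_f + read_len_r - amplicon_len


# Numbers from this thread: 253 bp amplicon, 151 bp reads, 12 bases trimmed per read.
print(merged_length(253, 12, 12))     # 229
print(overlap_length(253, 151, 151))  # 49
```

With untruncated 151 bp reads there are about 49 overlapping bases to merge on; truncating at the 3’ end is what eats into that overlap.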
Yes, I checked the FASTA file. There is only one sequence with a length of 169 bp; it looks correct, and I got 100% similarity with others.
But I am surprised: I am doing the analysis with QIIME 2 and also checking in R up to the diversity step. I found 540 ASVs in QIIME 2 and 535 ASVs in R. Is this a common occurrence?
For alpha diversity measures, for example Simpson, the index values differ. In R I get a range of 1.5-4.2, and in QIIME 2 a range of 2.2-5.3. Which approach (QIIME 2 or R) should I use?
Once again, please report the results of your BLAST search on those short reads (say the one that is 169 bp). You can click on the sequence hyperlink in the visualizer you posted originally. Alternatively, can you share with us that original .qzv file so we can look through it properly? I ask because if those reads outside of our expected range are not hitting anything resembling bacteria, we can simply filter them out and consider them chimeras/contaminants.
This seems to be a completely separate question/concern. Could you please start a new thread and clarify exactly what steps you have performed in DADA2 vs QIIME 2 DADA2? A difference of 5 ASVs between these runs is really not a big issue; it is possible that different DADA2 versions are being compared and/or that the random seed differed between the two runs.
Same as above: could you please start a new thread with much more detail about exactly what you are comparing? For example, did you rarefy in QIIME 2 but not in R? Different versions of the Simpson index might be used by these tools, but we would need much more information to sort this out properly.
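To illustrate what "different versions of the Simpson index" can mean, here is a minimal Python sketch of three commonly used forms computed from the same counts. The function name and the example counts are made up, and nothing in this thread establishes which form either tool actually reports. Note that the dominance and Gini-Simpson forms are bounded between 0 and 1, while the inverse form can exceed 1.

```python
from __future__ import annotations

from collections.abc import Sequence


def simpson_variants(counts: Sequence[int]) -> dict[str, float]:
    """Return three common Simpson-type indices for one sample's feature counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Sum of squared relative abundances (Simpson's dominance, D).
    dominance = sum((c / total) ** 2 for c in counts)
    return {
        "dominance": dominance,              # D
        "gini_simpson": 1.0 - dominance,     # 1 - D
        "inverse_simpson": 1.0 / dominance,  # 1 / D
    }


# Hypothetical sample with four equally abundant features.
print(simpson_variants([10, 10, 10, 10]))
```

Comparing values across tools only makes sense once both are known to use the same form and the same input table (for example, both rarefied or both not).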
This is the qzv file: demux.qzv (292.2 KB)
The results of Blast:
I have one question: how can I get the per-sample ASV counts from table.qzv as a file? Most statistical software needs a fasta/tsv/txt file.
This is your demux.qzv file. The file we need is rep-seqs.qzv, the one shown in the screenshot in your original post.
See this section of the exporting tutorial for how to get your feature-table in biom format for other tools.
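Once the table has been exported and converted to tab-separated text as that tutorial describes, it can be read with a few lines of Python. This is only a sketch under assumptions: the function name is made up, and the layout it expects (an optional comment line, then a header row starting with "#OTU ID" followed by sample IDs) should be checked against your own exported file.

```python
from __future__ import annotations

import csv
from io import StringIO
from typing import TextIO


def read_feature_table(handle: TextIO) -> dict[str, dict[str, float]]:
    """Parse a tab-separated feature table into {feature_id: {sample_id: count}}."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    # Assumed layout: the header row is the one whose first cell starts with "#OTU ID".
    header_idx = next(i for i, row in enumerate(rows) if row[0].startswith("#OTU ID"))
    samples = rows[header_idx][1:]
    table = {}
    for row in rows[header_idx + 1:]:
        table[row[0]] = dict(zip(samples, (float(v) for v in row[1:])))
    return table


# Made-up example content standing in for an exported file.
demo = StringIO(
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "asv1\t10.0\t0.0\n"
    "asv2\t3.0\t7.0\n"
)
counts = read_feature_table(demo)
print(counts["asv2"]["S2"])  # 7.0
```

For a real file, replace the StringIO object with an open file handle.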
Thanks for the 2nd reply.
Please find rep-seqs.qzv:
rep-seqs.qzv (269.0 KB)
Your reads are actually perfect and there is no need for concern. They are all at the expected length of 252/253 bp, with the exception of a single read that is 169 bp. Blasting that single feature shows that it is human mitochondrial sequence. You can simply filter out this feature and carry on with your downstream analysis without any worries.
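If you ever need to spot such off-length features without scrolling through the visualizer, a small Python sketch like the one below can list them from an exported FASTA of representative sequences. The function names and the 250-256 bp window are assumptions chosen for illustration around the ~253 bp V4 target. Length alone does not prove a feature is a contaminant; here it was the BLAST hit that identified it.

```python
from __future__ import annotations

from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (id, sequence) pairs from a FASTA file, joining wrapped lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def off_target_ids(handle: TextIO, min_len: int = 250, max_len: int = 256) -> list[str]:
    """Return IDs of sequences whose length falls outside [min_len, max_len]."""
    return [name for name, seq in read_fasta(handle) if not min_len <= len(seq) <= max_len]


# Made-up records: one at the expected length, one short like the 169 bp feature.
demo = StringIO(">feature_a\n" + "A" * 253 + "\n>feature_b\n" + "C" * 169 + "\n")
print(off_target_ids(demo))  # ['feature_b']
```

The resulting IDs are candidates to BLAST and, if they turn out to be off-target, to filter from the table before downstream analysis.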
Many many thanks for your help!