My rarefaction plot looks strange.

Negin · October 21, 2019, 3:43pm

Hello,

Can someone help me figure out what that big box is on top of read depth of 0 for the species with the orange line?

Thanks!

Nicholas_Bokulich · October 21, 2019, 4:17pm

Hi @Negin,
The boxplots are showing the quartiles at each sequencing depth, so the simple explanation is that a single sample has a much higher faith_pd score than the other samples at the minimum rarefaction depth, leading to very high variance. The min depth is usually 1, but perhaps you set it to a higher min depth?

Negin · October 22, 2019, 6:36pm

Hi Nicholas,

Thanks for your reply. This is the code I used. I set maximum depth based on the sampling depth I had used for my diversity analysis:

qiime diversity alpha-rarefaction \
  --i-table qza/v3v4-20190304-table-dada2-p-Phylum-only.qza \
  --i-phylogeny qza/rooted-tree-v3v4-20190304-rep-seqs-dada2-p-Phylum-only.qza \
  --p-max-depth 4333 \
  --m-metadata-file metadata/20190304-sample-metadata-v3v4.txt \
  --o-visualization alpha-rarefaction.qzv

Negin · October 22, 2019, 6:38pm

Doesn’t this boxplot seem to be on the zero?

ben · October 22, 2019, 6:56pm

Yeah, I would also look into how Faith PD is calculated - maybe there’s a denominator or numerator in the Faith PD calculation that doesn’t take into account # sequences @ zero and instead is returning a value for some reason. Ben

Nicholas_Bokulich · October 22, 2019, 7:07pm

It will be at the min-depth, which is 1 by default (not zero).

Check out your phylogeny: Faith PD is measuring branch length, so having a result like this suggests to me that you may have a tree with some really distorted branch lengths, and one sample in the orange group happens to hit that branch. Maybe you are making a de novo alignment with mafft and this is a sequence that really does not align with your other seqs (and hence may be something like host DNA?). I am largely speculating here... examine that tree closely...
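
To see why one long branch can dominate the plot at depth 1, here is a minimal Python sketch of Faith PD on a toy rooted tree. The tree, tip names, and branch lengths are all invented for illustration, and the function is a simplified stand-in, not the implementation QIIME 2 uses.

```python
# Toy rooted tree stored as child -> (parent, branch_length).
# All names and lengths are made up for illustration.
TREE = {
    "A": ("n1", 0.05),
    "B": ("n1", 0.04),
    "n1": ("root", 0.10),
    "C": ("root", 0.12),
    "X": ("root", 3.00),  # hypothetical non-target sequence on a very long branch
}


def faith_pd(observed_tips, tree=TREE):
    """Sum the lengths of all branches on the paths from the observed tips to the root."""
    branches = set()
    for tip in observed_tips:
        node = tip
        while node in tree:       # the root has no entry, so the walk stops there
            branches.add(node)    # a branch is identified by its child node
            node = tree[node][0]  # move up to the parent
    return sum(tree[node][1] for node in branches)


# At a rarefaction depth of 1, each sample holds exactly one feature.
pd_typical = faith_pd({"A"})  # branch A plus branch n1
pd_outlier = faith_pd({"X"})  # the single long branch
print(round(pd_typical, 2), round(pd_outlier, 2))
```

With only one feature per sample at depth 1, a sample's Faith PD is just the root-to-tip distance of whichever feature it drew, so in this toy tree a single draw of X gives a value about twenty times larger than a draw of A.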

ben · October 22, 2019, 7:12pm

I was also going to suggest looking at other alpha diversity metrics to see if she gets that weird spike in the beginning. Also taking a look @ your taxonomy as well. Ben

Negin · October 22, 2019, 7:15pm

I have a large relative abundance of archaeplastida, which seems to be plant/fungi from the lemurs’ food, but I didn’t remove it since it made up a large portion of all lemur samples (orange line) and I can’t figure out how that much contamination could get into the blood samples, so I left it there. But you are saying this should affect only one sample, right? I see this phylum in all of my samples. I will try looking at my tree.

Nicholas_Bokulich · October 22, 2019, 7:18pm

Oh wow interesting. Yeah, I recommend removing this unless you want it there and want it to distort your diversity results. This is almost certainly the non-target outgroup if that orange group is lemur (feces?), since it would need to be both non-target (==poorly aligning) AND abundant enough that you are likely to grab it when subsampling to depth=1. If this is causing distortion here, just imagine what it will do to your other results...
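
The "abundant enough" part can be illustrated with a small simulation. This is a sketch with invented counts, not real data: when a sample is subsampled to a single read, the chance of drawing a given feature is simply that feature's relative abundance.

```python
import random


def expand(counts):
    """Turn {feature: read_count} into a flat list with one entry per read."""
    return [feature for feature, n in counts.items() for _ in range(n)]


def rarefy(reads, depth, rng):
    """Subsample `depth` reads without replacement."""
    return rng.sample(reads, depth)


# Hypothetical sample in which 40% of the reads come from one non-target feature "X".
sample = {"A": 300, "B": 200, "C": 100, "X": 400}
reads = expand(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
hits = sum(rarefy(reads, 1, rng)[0] == "X" for _ in range(trials))
fraction_x = hits / trials
print(f"fraction of depth-1 draws that hit X: {fraction_x:.2f}")
```

The printed fraction should land close to 0.40, the relative abundance of X in the made-up sample. A rare contaminant would almost never be drawn at depth 1, which is why the feature has to be abundant as well as poorly aligning to produce the effect.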

Indeed, that would be a quick and easy way to make sure there aren't unclassified features — if the features aren't classifying it could indicate non-target DNA that would lead to this distorted phylogeny (if you are doing a de novo alignment).

ben · October 22, 2019, 7:19pm

I think that's an inherent issue with DADA2: by design, if you have eukaryotic DNA, it tends to be picked up. You can add a quality control step that keeps only the 16S sequences of interest (this will remove the sequence attributed to archaeplastida), which may help w/ the alpha diversity plots.

Moreover, lemurs ... tell us more.

Negin · October 22, 2019, 7:32pm

I removed all unclassified reads because they were host DNA. I have blood samples.

ben · October 22, 2019, 7:35pm

In my experience, blood has poor 16S carriage. We did early experiments years ago in humans, and because there are so many circulating PMNs that scavenge bacterial DNA (double-stranded free DNA doesn’t occur naturally in organisms), we did not get a very good 16S signal unless the host was severely ill. Maybe it’s different in lemurs, but I believe mice are very similar to humans too, though we haven’t run many mouse/human blood samples recently. Ben

Negin · October 22, 2019, 7:36pm

I can imagine plants getting into fecal samples but blood? There are many blood microbiome papers now that were able to characterize blood microbes using 16S. V3V4 should be used. Other primers are not good.

Negin · October 22, 2019, 7:38pm

Here is the tree.

Nicholas_Bokulich · October 22, 2019, 8:09pm

Whether it is host DNA or plant DNA, there are many things that could be happening. It could be non-target amplification. It could be cross-contamination. It could be index hopping... a whole host of issues.

It happens, and often it is not your fault; you just need to fix it and move on... in this case it is simple: filter out those reads prior to any analyses and hope you have enough reads left over.

dada2 will not filter out non-target DNA by default (another denoiser in QIIME 2, deblur, does), but there are many ways to catch this:

1. Use q2-fragment-insertion for tree building if you can, instead of building a de novo alignment + tree with mafft/fasttree. Anything that does not splice into the reference tree is likely junk.
2. Use q2-quality-control exclude-seqs to filter anything that does not resemble the reference sequences.
3. Assign taxonomy and filter out anything that does not classify to at least phylum level.

Options 1 and 3 lead to pretty similar results. In practice, I usually do option 1 (since I am building a tree anyway), then do option 3 on top of that (the order could be reversed and may be faster).
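
The logic behind option 3 can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the feature IDs, counts, and lineages are invented, and the lineage format (semicolon-delimited ranks with empty placeholders such as "p__") is only one of several that classifiers produce. In QIIME 2 itself this filtering is normally done on artifacts (for example with the q2-taxa plugin), not by hand.

```python
def classified_to_phylum(lineage, min_ranks=2):
    """Return True if a semicolon-delimited lineage names at least `min_ranks` ranks.

    Assumes rank 1 is kingdom/domain and rank 2 is phylum, and that empty
    placeholders such as "p__" mean "not assigned at this rank".
    """
    ranks = [rank.strip() for rank in lineage.split(";")]
    named = [
        rank for rank in ranks
        if rank and rank != "Unassigned" and not rank.endswith("__")
    ]
    return len(named) >= min_ranks


# Hypothetical taxonomy and feature counts; IDs and lineages are invented.
taxonomy = {
    "f1": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "f2": "k__Bacteria; p__",
    "f3": "Unassigned",
    "f4": "k__Bacteria; p__Firmicutes",
}
table = {"f1": 120, "f2": 900, "f3": 45, "f4": 60}

# Keep only features whose lineage reaches phylum level.
filtered = {fid: n for fid, n in table.items() if classified_to_phylum(taxonomy[fid])}
print(filtered)                # features kept
print(sum(filtered.values()))  # reads left after filtering
```

Checking the number of reads left after filtering matters because it determines which rarefaction depth is still feasible for the remaining samples. A filter like this only catches features that fail to classify; a non-target sequence that does receive a phylum label would need to be excluded by name instead.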
