I'm starting my alpha and beta diversity analyses, but I'm a little perplexed by sampling depth: my rarefaction curve never plateaus, no matter which sampling depth I choose. This is a smaller sample set, but I plan to build rarefaction curves for 3 more sample sets, then combine all of my sample sets (~100 samples) and analyze their alpha/beta diversity together as well.
I have one sample that is very much an outlier (a frequency of over 2 million!) and some that are somewhat lower in frequency count as well. Should I drop the samples whose frequency is too low AND the one with the highest frequency count, since it's so dissimilar?
I've taken a look at the "Moving Pictures" tutorial and I'm having trouble getting my rarefaction curve to look anything like theirs.
If anyone can point me in a direction of what I'm doing wrong (or missing), that would be greatly appreciated!
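For readers following along, here is a minimal Python sketch of what an alpha-rarefaction curve measures: reads are subsampled without replacement at each depth and the observed features are counted. It is not the QIIME 2 implementation; the function name `rarefaction_curve` and both toy samples are hypothetical. A sample dominated by a few abundant features flattens quickly, while a sample made almost entirely of singletons keeps climbing at every depth.

```python
import random

def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean number of observed features at each depth for one sample.

    counts: dict mapping feature ID -> read count in this sample.
    Depths larger than the sample's total read count are skipped.
    """
    rng = random.Random(seed)
    # Expand counts to one entry per read so reads can be drawn without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    curve = {}
    for depth in depths:
        if depth > len(reads):
            continue
        observed = [len(set(rng.sample(reads, depth))) for _ in range(iterations)]
        curve[depth] = sum(observed) / iterations
    return curve

# Hypothetical sample A: 5 abundant features (200 reads each).
sample_a = {f"a{i}": 200 for i in range(5)}
# Hypothetical sample B: 1000 features seen once each (all singletons).
sample_b = {f"b{i}": 1 for i in range(1000)}

depths = [10, 100, 500, 1000]
curve_a = rarefaction_curve(sample_a, depths)
curve_b = rarefaction_curve(sample_b, depths)
print("A:", curve_a)  # levels off at 5 features
print("B:", curve_b)  # grows with depth, never plateaus
```

If a real table behaves like sample B, the lack of a plateau probably reflects a very large number of rare features rather than a poorly chosen sampling depth.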
Some of these samples are super diverse. If you are able to share, what kind of environment did they come from?
How many features are shared between samples?
(I ask because if there is a barcode sneaking into the features themselves, this will inflate diversity. And this would also cause features to be found in only one sample each.)
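One rough way to check this is to count how many samples each feature appears in. The sketch below assumes the feature table has already been exported into a plain Python dict; the helper `feature_prevalence` and the toy table are hypothetical, not data from this thread.

```python
from collections import Counter

def feature_prevalence(table):
    """Count, for each feature, how many samples contain it (count > 0).

    table: dict mapping sample ID -> {feature ID: read count}.
    """
    prevalence = Counter()
    for sample_counts in table.values():
        for feature, n in sample_counts.items():
            if n > 0:
                prevalence[feature] += 1
    return prevalence

# Hypothetical toy table for illustration only.
table = {
    "leaf1": {"otu1": 12, "otu2": 3, "otu3": 1},
    "leaf2": {"otu1": 8, "otu4": 5},
    "leaf3": {"otu1": 2, "otu2": 7, "otu5": 0},
}

prevalence = feature_prevalence(table)
shared = sorted(f for f, k in prevalence.items() if k > 1)
single_sample = sorted(f for f, k in prevalence.items() if k == 1)
print("shared:", shared)
print("single-sample:", single_sample)
```

If nearly every feature lands in the single-sample list, that pattern is consistent with barcodes or indexes being left in the sequences.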
These are environmental leaf samples that have been surface sterilized. The environment is quite diverse/dusty chaparral. These are sequences of the fungal ITS1 gene.
I also did quite a bit of filtering on these samples beforehand. Specifically, I ran this after dereplication / OTU clustering:
Well, that means that each feature is in >1 sample, so it may not be barcode contamination like I originally suspected.
Yes. Because the barcodes/indexes are different for every sample, if they end up in the feature sequence they will make the same feature appear unique to each sample. So instead of having 2k features across 100 samples, you will see 2k features x 100 indexes = 200k features.
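The arithmetic can be illustrated with a small simulation. This is only a sketch with made-up placeholder strings (not real sequences or barcodes), scaled down to 20 features and 10 samples; `count_features` is a hypothetical helper.

```python
def count_features(reads_by_sample):
    """Number of distinct sequences across all samples."""
    return len({seq for reads in reads_by_sample.values() for seq in reads})

# Hypothetical placeholders: 20 true sequences, 10 samples with one barcode each.
true_features = [f"SEQ{i:04d}" for i in range(20)]
barcodes = {f"sample{j}": f"BC{j:02d}" for j in range(10)}

# Every sample contains the same 20 sequences.
clean = {sample: list(true_features) for sample in barcodes}
# Same reads, but each sample's barcode is still attached to the front.
contaminated = {
    sample: [barcode + seq for seq in true_features]
    for sample, barcode in barcodes.items()
}

n_clean = count_features(clean)
n_contaminated = count_features(contaminated)
print(n_clean)         # 20 features, each shared by all 10 samples
print(n_contaminated)  # 20 x 10 = 200 features, each unique to one sample
```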
These are sequences of the fungal ITS1 gene.
Thank you for mentioning this.
What paper introduced these primers?
How many features did they see when they tested the primers?
But to answer your question about the high number of features, having over 20k features isn't uncommon in endophyte community research, at least from what I've found, although this sample set with the high frequencies is definitely making me scratch my head to say the least.
Here is the list of observed features for this dataset:
I've checked my pipeline and it doesn't seem I've made a mistake during dereplication either:
After this step I further filtered the OTU table as outlined above and removed chimeras using the vsearch plugin. I even went as far as filtering out some irrelevant taxa after assigning taxonomy with vsearch and the UNITE eukaryote database.
Any input is helpful and your time is greatly appreciated!
Hi everyone,
I apologize for posting so much on this forum. I'm a newbie at bioinformatics and I'm having issues with my MiSeq paired-end data.
My alpha-rarefaction curve still isn't plateauing, despite trying different sampling depths. I've since backtracked, found some adapter contamination, and removed it with Trim Galore, but I'm still getting the same rarefaction issues.
Right now, I'm experimenting with UCHIME de novo chimera removal and keeping the borderline chimeras. If I don't include the borderline chimeric sequences, I'm left with very few OTUs.
Can anyone point me in any kind of direction here? The DADA2 filter ends up filtering out all of my sequences, so we're trying not to use it (we want both R1 and R2 files).
So far my pipeline looks like this (after importing and merging via vsearch and dereplicating):
I basically scrapped my whole process and started using the DADA2 filter on my R1 reads only. I'm getting much better results. Turns out, I had some PhiX contamination from my Illumina run! Thanks for all your help in my previous posts!