In the last discussion, it was suggested to cluster ASVs into OTUs.
Do I only need to use this command?
qiime vsearch cluster-features-de-novo \
  --i-table table.qza \
  --i-sequences rep-seqs.qza \
  --p-perc-identity 0.99 \
  --o-clustered-table table-dn-99.qza \
  --o-clustered-sequences rep-seqs-dn-99.qza
Do I need to change my alpha and beta analysis after obtaining OTUs, or just the taxa barplot?
So, I think your issue is similar, but slightly different from the issue in the last post. But, thank you for searching!
One of the major questions is what your analytical goal is. Do you want to
Make a taxonomic bar plot?
Perform alpha and beta diversity analysis?
Compare the level of resolution needed to identify differentially abundant taxa?
Compare classifiers across taxonomic levels?
Use something like phylofactor for differential abundance?
Just have a name you can talk about when you write your paper?
Something else?
In the post you linked, they were interested in how different levels performed in a classifier, and in comparing different resolutions. One of the issues in that post was that the database from that post wasn't phylogenetically coherent. OTU clustering will give you a full taxonomic string as assigned in the reference database, as long as the sequence sticks to the reference database. But you lose any novel diversity that doesn't stick to the reference.
So, because this is microbiome analysis and our unofficial motto is It Depends, what do you want to do?
I'm evaluating the difference in the bacterial community of the insect gut between the treated and control diets.
The important thing for me is to make a taxonomic bar plot, perform alpha and beta diversity analysis, compare levels, and see the abundance.
I already saw the alpha and beta diversity and taxonomic bar plots. I'm using Greengenes 13_8 after trying Silva, Greengenes 2, and RDP (with Greengenes 13_8, I obtained more classifications).
My doubt is how to unify something like this:
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;__;__;__
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium;__
Because in my taxa bar plot these make different bars.
So, three thoughts here. I think I'm going to organize them as expansions, because it will be easier.
Organism Names
First, those are different organisms. One is the genus Corynebacterium. The other is a member of o. Actinomycetales that can't be assigned to a family or a genus. If they meet your criteria for display (see below), they should be separate. There's no issue here.
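To make the difference between the two labels concrete, here is a minimal Python sketch that reports the deepest rank actually assigned in a Greengenes-style string. The function name `deepest_assigned_rank` is made up for illustration, and it assumes that an empty rank appears as a bare prefix such as `g__` or `__`.

```python
def deepest_assigned_rank(taxonomy: str) -> str:
    """Return the most specific label that carries a name, not just a prefix."""
    labels = [label.strip() for label in taxonomy.split(";")]
    # "g__Corynebacterium" has a name after the prefix; "g__", "__" and "" do not.
    assigned = [label for label in labels if label.split("__", 1)[-1]]
    return assigned[-1] if assigned else "Unassigned"


order_only = (
    "k__Bacteria;p__Actinobacteria;c__Actinobacteria;"
    "o__Actinomycetales;__;__;__"
)
genus_level = (
    "k__Bacteria;p__Actinobacteria;c__Actinobacteria;"
    "o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium;__"
)

order_label = deepest_assigned_rank(order_only)
genus_label = deepest_assigned_rank(genus_level)
print(order_label)
print(genus_label)
```

The first string stops at the order and the second reaches the genus, which is why they end up as separate bars.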
Barplots as displays and their limitations
A bar plot is a display for your data. You're trying to give a high-level visual overview. The barplot is not necessarily going to show differentially abundant taxa; it's going to show the diversity in your ecosystem at a high level. IMO, they're a great sanity check to make sure you didn't screw up your environment, and a nice way to illustrate that things are very different. They're less good for things like highlighting differentially abundant taxa. But that's a different soapbox.
Their key utility is that they're a visual component, and that visual aspect creates limitations. You need your bar segments to be big enough to read and you need the colors to be distinct. My rule of thumb is nothing below 1% relative abundance across all my samples, and no more than 12 groups with a categorical color map.
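As a rough illustration of that rule of thumb, here is a pandas sketch. It assumes an exported table with samples as rows and taxa as columns, reads "1% across all samples" as a 1% mean relative abundance, and lumps everything else into an "Other" group. The function name and the toy counts are hypothetical.

```python
import pandas as pd


def summarize_for_barplot(counts: pd.DataFrame,
                          min_mean_rel: float = 0.01,
                          max_groups: int = 12) -> pd.DataFrame:
    """Keep readable taxa and pool the rest (samples as rows, taxa as columns)."""
    # Convert counts to per-sample relative abundance.
    rel = counts.div(counts.sum(axis=1), axis=0)
    # Rank taxa by mean relative abundance across samples.
    mean_rel = rel.mean(axis=0).sort_values(ascending=False)
    # Keep taxa above the threshold, leaving one slot for "Other".
    keep = mean_rel[mean_rel >= min_mean_rel].index[: max_groups - 1]
    out = rel[keep].copy()
    out["Other"] = (1.0 - out.sum(axis=1)).clip(lower=0)
    return out


# Toy counts, invented for this example.
counts = pd.DataFrame(
    {"f__A": [500, 300], "f__B": [495, 690], "f__C": [5, 10]},
    index=["s1", "s2"],
)
plot_table = summarize_for_barplot(counts)
print(plot_table)
```

The thresholds are arguments on purpose, since both numbers are a personal rule of thumb and not a standard.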
Why those limits?
I limit to 1% because IMO anything below 1% becomes an unreadable smear. You might play with this and say less is unreadable to you, but that's my personal rule of thumb. No one needs to see an unreadable smear. If I'm interested in lower abundance taxa, I'll make a boxplot of the CLR of those specific organisms of interest. Again, the barplot is a high level overview, not a definitive statistical picture.
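For reference, the CLR mentioned above can be sketched in a few lines of NumPy. This assumes samples as rows and adds a pseudocount of 1 to handle zeros; the pseudocount is my assumption here, and other zero-handling strategies exist.

```python
import numpy as np


def clr(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Centered log-ratio transform of a samples-by-taxa count matrix."""
    logged = np.log(counts + pseudocount)
    # Subtracting the per-sample mean of the logs is the same as dividing
    # each value by the geometric mean of that sample.
    return logged - logged.mean(axis=1, keepdims=True)


# Toy counts, invented for this example.
toy_counts = np.array([[9, 9, 9], [0, 9, 99]])
clr_values = clr(toy_counts)
print(clr_values)
```

The columns for the organisms of interest can then go straight into a boxplot, grouped by treatment.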
The 12 groups or fewer reflects what most humans can distinguish. This number is going to vary based on the quality of people's color vision. I can personally see about 16 shades; my research assistant can only tell the difference between about 8. A well-designed categorical colormap tends to adhere to these principles, and only gives you 8-12 options. So, if you have more groups than the number of acceptable colors, the colors loop back and you're left trying to figure out which color you're looking at.
I work with human data. In my complex communities (oral, fecal), I tend to display data at the family level or higher. In my nasal or vaginal data, I can usually get away with genus.
Greengenes 13_8 DB
Don't get me wrong, I have a huge fondness for Greengenes 13_8, but it's 10 years out of date. The fact that you're getting more taxonomic assignments may simply mean the classifier is overfitting. The RESCRIPt paper has more details.
IMO = In my opinion. Sorry, this has trickled over from some of my other internet spaces. So, in my opinion, you shouldn't show things with less than 1% average relative abundance over your samples because it's hard to read relative abundances below that threshold.
Not directly, unfortunately. You could hide them in the QIIME 2 visualization.
Or, you could use qiime feature-table norm to convert from counts to relative abundance and then qiime taxa collapse to collapse to whatever taxonomic level makes you happy. You can filter features with qiime feature-table filter-features, although I'm not sure if there's a mean relative abundance filter.
You can then export the filtered data to your favorite plotting program, whether it be Excel, ggplot, or something else.
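If you would rather do the collapse and the mean relative abundance filter outside QIIME 2, a pandas sketch could look like this. It assumes an exported table with taxonomy strings as the row index and samples as columns. The function name `collapse_to_level` and the toy counts are made up for illustration.

```python
import pandas as pd


def collapse_to_level(counts: pd.DataFrame, level: int) -> pd.DataFrame:
    """Sum rows that share the first `level` ranks of their taxonomy string."""
    truncated = counts.index.map(
        lambda tax: ";".join(part.strip() for part in tax.split(";")[:level])
    )
    return counts.groupby(truncated).sum()


# Toy counts, invented for this example (taxa as rows, samples as columns).
prefix = "k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;"
counts = pd.DataFrame(
    {"s1": [40, 10, 50], "s2": [60, 20, 20]},
    index=[
        prefix + "f__Corynebacteriaceae;g__Corynebacterium",
        prefix + "f__Corynebacteriaceae;g__",
        "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
        "f__Lactobacillaceae;g__Lactobacillus",
    ],
)

family = collapse_to_level(counts, level=5)        # rank 5 = family
rel = family.div(family.sum(axis=0), axis=1)       # per-sample relative abundance
kept = rel[rel.mean(axis=1) >= 0.01]               # mean relative abundance filter
csv_text = kept.to_csv()                           # or kept.to_csv("family.csv")
print(csv_text)
```

The resulting CSV opens directly in Excel or reads into R for ggplot.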
I think it's a good reference DB. I use Silva a lot in my work and I know it's common for environmental samples. However, Silva doesn't curate their species, so I wouldn't trust their species labels. But I also don't think a species label is necessary or even beneficial for 16S analysis, so the fact that you don't have a species label wouldn't be my concern.
I split the ANCOM discussion off into a new topic.
Yes, but I'm going to do to you what I do to my students, and recommend you use the documentation. It's there to empower you in your analysis.
So, to check the available normalization functions and determine what you should use, rather than relying on my memory, you'd run
qiime feature-table --help
in the command line. This will list the available commands. You can then apply that --help flag to any of them, and you'll get all the parameters, options, and in some cases, examples, for the command.
Every second or third command on my terminal is a call to help. There's no exam, and nothing wrong with using the documentation available!
If you don't like reading the help output, there's also a plugin description in the documentation, available for a subset of the QIIME 2 plugins, although anything you installed from the library won't be available there.
I'm thinking of filtering with p abundance 0.01, but what p prevalence do you think would be right?
Then I can redo the taxa barplot and ANCOM-BC with this new abundance-filtered table, right?
Do you think I also have to run this command?
qiime feature-table filter-features \
  --i-table table-no-clor-mit.qza \
  --p-min-frequency 50 \
  --p-min-samples 4 \
  --o-filtered-table table-filter-abund.qza
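To see what those two thresholds do, here is a small pandas sketch of the same idea. It assumes that min-frequency refers to a feature's total count across all samples and that min-samples is the number of samples in which the feature is observed; check the --help text to confirm. The function name and toy counts are hypothetical.

```python
import pandas as pd


def filter_features(counts: pd.DataFrame,
                    min_frequency: int = 0,
                    min_samples: int = 0) -> pd.DataFrame:
    """Filter a features-by-samples count table on total count and occurrence."""
    total = counts.sum(axis=1)                # total count of each feature
    n_samples = (counts > 0).sum(axis=1)      # samples where the feature appears
    return counts[(total >= min_frequency) & (n_samples >= min_samples)]


# Toy counts, invented for this example (features as rows, five samples).
counts = pd.DataFrame(
    [[20, 15, 10, 5, 0], [100, 0, 0, 0, 0], [5, 5, 5, 5, 5]],
    index=["feat_a", "feat_b", "feat_c"],
    columns=["s1", "s2", "s3", "s4", "s5"],
)
filtered = filter_features(counts, min_frequency=50, min_samples=4)
print(filtered)
```

Here feat_b is dropped for appearing in a single sample and feat_c for having a total count below 50, so both thresholds are based on counts, not relative abundance.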
Just popping in here with a reminder from our Code of Conduct to please be patient. Each moderator you see responding on the QIIME 2 User forum is doing so on a volunteer basis, and everyone is very busy with their primary jobs/duties/etc. Additionally, please remember that posts which fall under the 'General Discussion' category are not guaranteed a response by a QIIME 2 moderator. Thanks for your understanding!
What would filtering by prevalence do?
Why do you want to filter by prevalence?
Can you find a paper that filters by prevalence? Why does that paper filter by prevalence?
Our moderator team cannot tell you what parameters to use; you have to decide what is best for your analysis. If you tell us what your goals are for the analysis, and what you think is a reasonable parameter, moderators might be able to help you check that it's reasonable. Moderators cannot choose your parameters for you; that's on you to decide.
Yes, you will be able to re-run any analysis with the new filtered table.
This again "Depends"
What are you trying to accomplish, and how do you imagine using this command to achieve your goal?
You are right, but I’m a little bit confused, and these analyses are new to me, so sometimes I don’t understand the commands very well.
Unfortunately no one can help me, so I ask my doubts here.
I’m sorry for my inadequacy!
I read that it’s suggested to remove all features that are below 1% abundance.
I would like to remove only these, but I think that in this command the “prevalence” is mandatory. Probably I’m wrong!
I already filtered my data with min frequency 50 and min samples 4. But now I would like to have my taxa barplot and ANCOM-BC only with taxa above 1% abundance.
I am gonna repeat some of my questions because I think working through them would benefit you.
What would filtering by prevalence do?
Why do you want to filter by prevalence?
What does setting a prevalence of .34 do?
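One way to work through these questions is to try the idea on a toy table. The sketch below assumes a combined filter that keeps a feature when its relative abundance is at least `abundance` in at least a `prevalence` fraction of samples. That definition is my assumption for illustration, so compare it against the --help text of whichever command you actually use.

```python
import pandas as pd


def filter_by_abundance_and_prevalence(counts: pd.DataFrame,
                                       abundance: float,
                                       prevalence: float) -> pd.DataFrame:
    """Filter a features-by-samples table on abundance and prevalence together."""
    rel = counts.div(counts.sum(axis=0), axis=1)        # per-sample relative abundance
    fraction_passing = (rel >= abundance).mean(axis=1)  # share of samples at or above it
    return counts[fraction_passing >= prevalence]


# Toy counts, invented for this example (features as rows, three samples).
counts = pd.DataFrame(
    [[50, 60, 70], [49, 0, 0], [2, 40, 30]],
    index=["feat_a", "feat_b", "feat_c"],
    columns=["s1", "s2", "s3"],
)
strict = filter_by_abundance_and_prevalence(counts, abundance=0.01, prevalence=0.34)
loose = filter_by_abundance_and_prevalence(counts, abundance=0.01, prevalence=0.33)
print(list(strict.index))
print(list(loose.index))
```

With three samples, one sample is a fraction of 0.333, so a prevalence of 0.34 needs the feature in at least two samples, while 0.33 lets a single-sample feature through.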
We can't tell you a method is 'correct' because
We disagree with each other all the time. (For example, we use many different pipelines.)
Reviewer 3 is going to criticize any pipeline you choose.
Making decisions while uncertain is a big part of being a research scientist and I think you are doing a great job! Don't let the doubt stop you!
Arguing with reviewer 3 is a big part of being a research scientist!