Hello,
I have been working through qiime2, and with the help of this forum, I think Ive got most of it down, so thank you! I do have some specific questions about the gneiss tutorial.
what Im trying to do now is 1) limit the OTUs in a specific way, for example, OTUs that are only found in 50% of the samples, I was unable to find a tutorial that said how to do this.
Once I have that done, I would like to perform correlation analysis, and identify species that are found together, with stat significance. I have gone through gneiss, but was unable to generate this type of data, speciifcally, is this possible??.. does that make sense?
I’m working on performing a co-occurrence analysis myself, I have not reached the definitive answers to your questions.
On your question 1, I’d like to add one consideration. How many features you have in total? Currently I’m playing with two different ways of filtering data: a) using a Python script to set the filtering criteria, it outputs a otu table; b) Using Bokulich filtering approach, aka discard any features less than 0.005% of total frequency, I use qiime1 script:
filter_otus_from_otu_table.py -i otu_table.biom -o otu_table.flt.biom --min_count_fraction 0.00005
(There is a filtering plug in as well included if you want to use their filtering settings).
Now, I’d like to add my concern on the filtering, with the hope that @Nicholas_Bokulich or @michael.shaffer my help us. When you filter the dataset, on the assumption you use a method considering the compositional nature of the data for the analysis as SpiecEasi or SCNIC above, you change the total sum of count per sample. My understanding is that this therefore put a bias in the dataset. Is this correct? Do any of you can suggest a method to filter the data with your own criteria but keeping the total of count per sample unchanged?
Check out filter-features, which has a min-samples parameter. Just what you need!
Try out the q2-SCNIC plugin that @llenzi linked to. Sounds like just what you are after.
filter-samples will replicate this functionality in QIIME 2, but you must know the total frequency in your feature table (e.g., use qiime feature-table summarize to easily figure that out). Currently there is not a min_count_fraction parameter.
That's a very interesting question. I do not know the answer, but will say that filtering out features is fairly standard practice for any kind of co-occurrence analysis. This includes all sorts of statistical methods for compositional analysis (e.g., ANCOM), so I am guessing that this is statistically admissible and does not bias the results since the counts of other features are not disturbed (we are working with raw abundances here, not relative abundances). @mortonjt@michael.shaffer do you have any thoughts on this?
Do I need to have the q2-SCNIC folder somewhere specific? sorry for this, I realise its super basic stuff, but I just dont have this skillset, thank you!!
Thanks for trying out q2-SCNIC! The issue here is that you don’t have bioconda in your channel list and SCNIC is available through the bioconda channel. You can add bioconda to your channel list using:
and when I checked the install of bioconda package, that is I tried to install again as I thought maybe it didnt install correctly, it says… ‘bioconda’ already in ‘channels’ list, moving to the bottom… this suggests to me that its been added to my list pf packages…right??
For the effects of filtering on compositionality. I agree it is an interesting question. For SCNIC I am using the SparCC correlation metric by default and so the filtering I provide is naively using the filtering parameters that were used in the original SparCC manuscript. I know that SparCC is using a Dirichlet multinomial distribution in order to estimate true relative abundances and I think this process at least tries to account for this. (I’d recommend digging into the Estimation of component fractions section of the original SparCC paper, it’s pretty interesting (https://doi.org/10.1371/journal.pcbi.1002687)). Ideally the features that are being removed should be so lowly abundant that they do not affect this estimation step.
however when I generate a bar plot from this, there is no data there. I have tried many iterations of this, with out a max, with minimum samples as 1, just to see, and no matter wgat I do, I get a bar plot with nothing ...for example...
When I look at my unfiltered table.qzv, it says that features are only observed in 1 or 2 samples, so clearly if I was filtering things not in 5 everything would go away, and it did. However, when I filter everything observed in at least 1, just to check how the filtering command is working, everything also goes away, which confuses me.
I suppose Im also confused about how a feature is determined. I know from my bar plots that there are many species found in all samples, or in 10 samples, for example, are these species not features? Im very confused by this, sorry for the basic question.
And lastly, how do I convert the output of q2-SCNIC, the corr_net.qza or the membership.qza, fpr cytoscape.
Sorry for all the questions, and thank you so much for the help!
If you filter out any feature observed in at least one sample, that would remove everything! Because if a feature is observed in 0 samples, well, it would not exist in your feature table (and current default behavior for the filtering commands is to drop samples/features with 0 frequency after filtering).
Great question — see this post. Feature can mean any kind of observation in a feature table... in your case I assume feature = sequence variant, but it can also mean taxa if you are using a feature table where taxa are the observations.
No clue — you should consult the cytoscape docs for more details, but maybe @michael.shaffer knows?
The bar plots collapse feature variants based on taxonomic affiliation — so the bar plots have no relation to the total # of SVs.
A feature table of SVs (or OTUs, depending on how you processed your data) is what will be produced by default by dada2, deblur, or OTU clustering methods. You can use qiime taxa collapse to collapse these features based on taxonomic affiliation, producing a new feature table where the features are unique taxonomic groups at whatever taxonomic rank (level) you select.