Hi QIIME community,
I am preparing to run alpha and beta diversity metrics on my 16S V4 dataset and have a few questions about how to set up the analysis to answer my study questions. My study samples bacterial populations in lake water from 3 lakes, with 3 size fractions per lake (whole water, >20 µm, and <20 µm). Each lake was sampled once a week for several months.
First, I would like to look at how bacterial diversity varies between lakes and between size fractions. For this I was planning to run the diversity core-metrics pipeline and compare groups using the categorical columns LakeName and SizeFraction. I would then like to look at differences between size fractions within each lake, and between lakes within each size fraction, which is where I run into my first question: would it be more appropriate to filter the feature table by lake and by size fraction (giving 3 separate lake datasets and 3 separate size fraction datasets) and then rerun the core-metrics pipeline on each sub-dataset, or to add a third metadata column combining LakeName and SizeFraction to the original core-metrics analysis and then look at the results per group (for example, the size fraction comparisons for each lake)? From what I have read in previous discussions, I believe filtering may be the better option. If so, I wanted to check that when rerunning the core-metrics analysis I should provide a new sampling depth for rarefaction specific to that group of samples. Additionally, do I also have to filter the metadata file to each sub-group of samples?
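To make the combined-column option concrete, here is a minimal sketch of what I have in mind, using only the Python standard library. Only LakeName and SizeFraction come from my metadata. The new column name LakeSize, the sample IDs, and the lake and fraction labels are hypothetical placeholders, and I am assuming a tab-separated metadata file whose first column holds the sample IDs.

```python
import csv
import io


def add_combined_column(tsv_text, col_a="LakeName", col_b="SizeFraction",
                        new_col="LakeSize"):
    """Return the metadata TSV text with an extra column joining col_a and col_b."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_col = reader.fieldnames[0]  # assumed: first column holds the sample IDs
    fieldnames = list(reader.fieldnames) + [new_col]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[id_col].startswith("#q2:"):
            # Directive row (e.g. "#q2:types"): mark the new column as categorical.
            row[new_col] = "categorical"
        else:
            row[new_col] = f"{row[col_a]}_{row[col_b]}"
        writer.writerow(row)
    return out.getvalue()


# Hypothetical three-sample metadata, for illustration only.
example = (
    "sample-id\tLakeName\tSizeFraction\n"
    "S1\tLakeA\twhole\n"
    "S2\tLakeA\tgt20um\n"
    "S3\tLakeB\tlt20um\n"
)
combined = add_combined_column(example)
print(combined)
```

With this column, each lake and size fraction combination becomes its own group in the original analysis, but all samples would still be rarefied to one shared sampling depth, which is part of why I suspect filtering may be preferable.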
My second question regards correlating diversity with continuous variables. I have both time-dependent variables that vary with season (e.g. water temperature) and time-independent continuous variables (e.g. plankton abundance). Should the qiime diversity alpha-correlation / qiime diversity bioenv commands be used for all time-independent variables and the longitudinal analyses for time-dependent data, or, since all the data are part of a time series, should the longitudinal analyses be used for all continuous variables?
Further, I am still waiting on results for some of the continuous variables. Is it possible to run the core-metrics pipeline now with the variables I already have, and later use the rarefied table output together with an updated metadata file containing the new continuous data in the alpha and beta diversity commands? Or would I have to rerun the core-metrics pipeline with a metadata file containing all the variables? I am concerned about the samples being rarefied differently between diversity calculations.
Finally, my last question is how to work with continuous variables for which not every sample being analyzed has a value. For example, for one lake I have no temperature data, and for another lake I have temperature data for some dates but not all. In the case of the lake with no temperature data, will it simply not appear in the comparison? For the lake where temperature is missing on some dates, will those dates just be ignored? Or is it better to filter the table to samples that have temperature data and rerun the core-metrics pipeline?
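In case filtering turns out to be the answer, this is roughly how I would pick out the samples that have a temperature value. It is only a sketch using the Python standard library. The column name Temperature, the sample IDs, and the set of strings treated as missing are hypothetical, and I am again assuming a tab-separated metadata file with sample IDs in the first column.

```python
import csv
import io


def samples_with_value(tsv_text, column, missing=("", "NA", "nan")):
    """Return the IDs of samples whose `column` entry is not a missing marker."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_col = reader.fieldnames[0]  # assumed: first column holds the sample IDs
    keep = []
    for row in reader:
        if row[id_col].startswith("#q2:"):
            continue  # skip directive rows such as "#q2:types"
        value = (row[column] or "").strip()
        if value not in missing:
            keep.append(row[id_col])
    return keep


# Hypothetical metadata: S2 and S3 have no temperature recorded.
example_temp = (
    "sample-id\tLakeName\tTemperature\n"
    "S1\tLakeA\t12.5\n"
    "S2\tLakeA\t\n"
    "S3\tLakeB\tNA\n"
    "S4\tLakeB\t9.0\n"
)
kept_ids = samples_with_value(example_temp, "Temperature")
print(kept_ids)
```

My assumption is that the kept IDs could then be written to a small metadata file and passed to qiime feature-table filter-samples before rerunning core-metrics, but I would appreciate confirmation that this is the right approach.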
I apologize for the long post. Any help on any or all of these steps would be much appreciated!