Merging 2 MiSeq runs - pipeline doubt

Hello,

I have already looked at similar forum posts and the FMT tutorial, but I really need somebody to clarify whether I’m doing this correctly or totally wrong.
For financial reasons, I first sent my winter and spring samples (water and sediment) for sequencing, and then my summer and autumn samples (also water and sediment), so these are 2 separate runs. I analyzed the 1st and 2nd runs separately.

Then I saw the option of merging in the FMT tutorial:
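# Merge the feature tables from both runs
qiime feature-table merge \
  --i-tables table-1.qza \
  --i-tables table-2.qza \
  --o-merged-table table.qza

# Merge the representative sequences from both runs
qiime feature-table merge-seqs \
  --i-data rep-seqs-1.qza \
  --i-data rep-seqs-2.qza \
  --o-merged-data rep-seqs.qza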
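Before classifying, it can help to summarize the merged artifacts and confirm that samples from both runs are present with sensible depths. This is a minimal sketch wrapped in a hypothetical helper, check_merge, assuming the merged outputs are table.qza and rep-seqs.qza as above and that the merged metadata file is called metadata.tsv (a hypothetical name). The resulting .qzv files can be opened at view.qiime2.org.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: summarize the merged artifacts for a quick sanity check.
# metadata.tsv is an assumed name for the merged metadata file.
check_merge() {
  local table=$1 seqs=$2 metadata=$3

  # Loads (and thereby validates) the merged metadata, rendered as a table.
  qiime metadata tabulate \
    --m-input-file "$metadata" \
    --o-visualization metadata.qzv

  # Per-sample frequencies; samples from both runs should be listed here.
  qiime feature-table summarize \
    --i-table "$table" \
    --o-visualization table.qzv \
    --m-sample-metadata-file "$metadata"

  # Lists every representative sequence in the merged set.
  qiime feature-table tabulate-seqs \
    --i-data "$seqs" \
    --o-visualization rep-seqs.qzv
}

# Example call:
# check_merge table.qza rep-seqs.qza metadata.tsv
```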

It was a chance to merge the whole year of sampling. So what I did was: I copied my table.qza and rep-seqs.qza files from the 1st and 2nd QIIME 2 analyses into a new folder and ran the commands above. I also merged my 2 metadata tables and checked the result with the Keemei add-on.
At the moment I’m running qiime feature-classifier with the SILVA database on this new (merged) rep-seqs.qza. Essentially, after I get my taxonomy_16S.qza and taxa_barplot.qzv, I’m going through the Moving Pictures tutorial (diversity analysis), but with the merged data. Is that the correct way to analyse merged MiSeq runs?
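The post-merge steps described here can be sketched as follows. This assumes a pre-trained SILVA classifier matching the primer region (silva-classifier.qza is a hypothetical file name) and uses a hypothetical helper, classify_and_build_tree. The tree step is included because, for the phylogenetic metrics in Moving Pictures, the tree has to be built from the merged rep-seqs so that it contains every feature of the merged table.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: taxonomy, barplot and tree, all from the merged data.
# silva-classifier.qza is an assumed name for a pre-trained SILVA classifier.
classify_and_build_tree() {
  local table=$1 seqs=$2 metadata=$3 classifier=$4

  # Assign taxonomy to every merged representative sequence.
  qiime feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$seqs" \
    --o-classification taxonomy_16S.qza

  qiime taxa barplot \
    --i-table "$table" \
    --i-taxonomy taxonomy_16S.qza \
    --m-metadata-file "$metadata" \
    --o-visualization taxa_barplot.qzv

  # Build the tree from the merged rep-seqs so it covers features from both runs.
  qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "$seqs" \
    --o-alignment aligned-rep-seqs.qza \
    --o-masked-alignment masked-aligned-rep-seqs.qza \
    --o-tree unrooted-tree.qza \
    --o-rooted-tree rooted-tree.qza
}

# Example call:
# classify_and_build_tree table.qza rep-seqs.qza metadata.tsv silva-classifier.qza
```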

Also, I saw the option “Group samples or features by a metadata category”. Its input is FeatureTable[Frequency]. Just to be absolutely clear, is that my merged “table.qza”? If so, I could group the water samples and the sediment samples from the whole year separately and go through the same procedure (diversity analysis and taxonomy).

Sorry for the long post and thanks for any information and advice regarding my doubts…

Hi @anamarija,

You have the correct workflow! One thing to keep in mind is that your features need to be comparable between runs. If you used DADA2, for example, you want to make sure the ASVs are defined the same way in both runs, so that the same sequence produces an identical ASV. If you have single-end data, this means identical trim/trunc parameters (--p-trim-left and --p-trunc-len), so that the sequences start and end at the same positions (as defined by your primer). If your data is paired-end, then only the trim parameters (--p-trim-left-f and --p-trim-left-r) need to match: because the forward and reverse reads are merged, you get the same sequence lengths as long as both runs start from the same relative positions after the forward and reverse primers.
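For paired-end data, the rule above can be sketched as one shared pair of trim values applied to both runs, with truncation lengths chosen per run. This is a minimal sketch using a hypothetical helper, denoise_run; the demux-1.qza/demux-2.qza input names and all numeric values are assumptions to replace with your own. Truncation still has to leave enough overlap for the reads to merge.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical trim values: set them for your primers, but keep them
# identical for both runs so merged reads start at the same positions.
TRIM_LEFT_F=0
TRIM_LEFT_R=0

# Hypothetical helper: denoise one run. Only the trim-left values are shared;
# truncation lengths are passed per run (they must still allow read merging).
denoise_run() {
  local run=$1 trunc_f=$2 trunc_r=$3
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "demux-${run}.qza" \
    --p-trim-left-f "$TRIM_LEFT_F" \
    --p-trim-left-r "$TRIM_LEFT_R" \
    --p-trunc-len-f "$trunc_f" \
    --p-trunc-len-r "$trunc_r" \
    --o-table "table-${run}.qza" \
    --o-representative-sequences "rep-seqs-${run}.qza" \
    --o-denoising-stats "stats-${run}.qza"
}

# Example calls (hypothetical truncation lengths):
# denoise_run 1 240 200
# denoise_run 2 250 210
```

The outputs are named table-1.qza, rep-seqs-1.qza, and so on, so they feed directly into the merge commands from the first post.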


Something you can do to verify that your ASVs are comparable is to add a metadata column indicating the run, and then look at a Jaccard PCoA plot colored by that column. If you see extreme clustering by run, then either:

  1. you have a technical batch effect (super common)
  2. time of year is a huge effect (probably not the case)

Since your runs are split by when the samples were taken, and there is likely some seasonal effect, some separation by run is expected; what you want to see is that the batch effect is small. When Jaccard goes wrong, it is terribly obvious (I’ve seen flat planes separated at orthogonal angles), and that means the ASVs from the two runs have nothing in common (which is terribly unlikely to be true in reality).
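The Jaccard check could look like the sketch below, using a hypothetical helper, jaccard_by_run. It assumes the merged metadata has a column (hypothetically named run) recording which MiSeq run each sample came from; after opening the .qzv, color the samples by that column. qiime diversity core-metrics-phylogenetic also writes a jaccard_emperor.qzv, which serves the same purpose if you run it anyway.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: Jaccard PCoA of the merged table, to be colored by run.
# Assumes the metadata has a column (e.g. "run") naming each sample's MiSeq run.
jaccard_by_run() {
  local table=$1 metadata=$2 depth=$3

  # Rarefy first so differences in depth between runs don't drive Jaccard.
  qiime feature-table rarefy \
    --i-table "$table" \
    --p-sampling-depth "$depth" \
    --o-rarefied-table rarefied-table.qza

  # Presence/absence distances between all samples from both runs.
  qiime diversity beta \
    --i-table rarefied-table.qza \
    --p-metric jaccard \
    --o-distance-matrix jaccard-dm.qza

  qiime diversity pcoa \
    --i-distance-matrix jaccard-dm.qza \
    --o-pcoa jaccard-pcoa.qza

  # In the Emperor view, color samples by the run column.
  qiime emperor plot \
    --i-pcoa jaccard-pcoa.qza \
    --m-metadata-file "$metadata" \
    --o-visualization jaccard-emperor-by-run.qzv
}

# Example call (hypothetical depth):
# jaccard_by_run table.qza metadata.tsv 14000
```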

Alternatively, you can look at your provenance and verify that your DADA2 parameters are creating comparable ASVs according to the rules above.
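Besides viewing provenance at view.qiime2.org, the parameters can be read on the command line, because a .qza file is a zip archive and the action that produced it is recorded under provenance/action/action.yaml. A minimal sketch with a hypothetical helper, dada2_params, assuming unzip is installed and that table-1.qza and table-2.qza came straight out of qiime dada2 (if they were filtered afterwards, the DADA2 step sits deeper in the provenance and this pattern will not reach it).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the trim/trunc parameters recorded in the
# artifact's own (top-level) provenance entry, <uuid>/provenance/action/action.yaml.
# Only useful when the artifact was produced directly by qiime dada2.
dada2_params() {
  local artifact=$1
  unzip -p "$artifact" '*/provenance/action/action.yaml' \
    | grep -E 'trim_left|trunc_len'
}

# Example: show what differs between the runs. For paired-end data only the
# trim_left_* lines need to agree.
# diff <(dada2_params table-1.qza) <(dada2_params table-2.qza)
```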


For feature-table group: yes, any table of type FeatureTable[Frequency] is fine (a rarefied table is probably not ideal, however), so your merged table is completely fine to group, if that is useful for your downstream analysis (it is certainly not required).
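A sketch of grouping the merged table by a metadata column, using a hypothetical helper, group_table, and a hypothetical column name such as sample_type. Note what grouping does: with --p-mode sum, all samples that share a value in that column are summed into a single sample, and the grouped table's sample IDs become the column's values.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: sum the samples of a table by one metadata column.
# Every sample sharing a value (e.g. "water") becomes one sample in the output.
group_table() {
  local table=$1 metadata=$2 column=$3
  qiime feature-table group \
    --i-table "$table" \
    --p-axis sample \
    --m-metadata-file "$metadata" \
    --m-metadata-column "$column" \
    --p-mode sum \
    --o-grouped-table "table-grouped-by-${column}.qza"
}

# Example call (hypothetical column name):
# group_table table.qza metadata.tsv sample_type
```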

Hopefully that’s helpful!


Hi @ebolyen,

thanks for the super fast and helpful answer! My data is paired-end and the trimming is the same, so it should be fine.
I’ll try to follow everything you recommended. If something strange pops up, I’ll post here.

Thanks again, have a nice day.
🙂


Good morning 🙂,

Just to check this part with you: in the qiime diversity core-metrics-phylogenetic command, would it be preferable to omit --p-sampling-depth (in my case that would be around 14,000)?

Thank you!

Hi @anamarija,

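Sorry, that was a little unclear on my part. What I mean is that you should probably group before running the diversity metrics, not after: if you group rarefied data, the grouped table is no longer rarefied, because summing samples that were each rarefied to 14,000 reads gives groups whose depth depends on how many samples fell into each group (e.g. 3 samples give 42,000 reads, 2 samples give 28,000). So --p-sampling-depth is still needed; just apply it to the grouped table.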
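That order (group the unrarefied table, then let core-metrics-phylogenetic rarefy the grouped table) can be sketched as below, with a hypothetical helper, group_then_core_metrics. Assumptions: a hypothetical grouping column such as season, a rooted-tree.qza built from the merged rep-seqs, and a second metadata file (grouped-metadata.tsv, hypothetical) whose sample IDs are the group names, since those become the sample IDs of the grouped table. The depth is left as an argument because it should be picked from a summary of the grouped table, not from the original per-sample 14,000.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: group first, then run core metrics on the grouped table.
# rooted-tree.qza is assumed to be built from the merged rep-seqs.
# grouped_metadata must use the group names (the column's values) as sample IDs.
group_then_core_metrics() {
  local table=$1 metadata=$2 column=$3 grouped_metadata=$4 depth=$5

  # Sum the unrarefied samples by the chosen metadata column.
  qiime feature-table group \
    --i-table "$table" \
    --p-axis sample \
    --m-metadata-file "$metadata" \
    --m-metadata-column "$column" \
    --p-mode sum \
    --o-grouped-table grouped-table.qza

  # Rarefaction happens here, on the grouped table, via --p-sampling-depth.
  qiime diversity core-metrics-phylogenetic \
    --i-phylogeny rooted-tree.qza \
    --i-table grouped-table.qza \
    --p-sampling-depth "$depth" \
    --m-metadata-file "$grouped_metadata" \
    --output-dir core-metrics-grouped
}

# Example call (choose DEPTH from a summary of grouped-table.qza):
# group_then_core_metrics table.qza metadata.tsv season grouped-metadata.tsv DEPTH
```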