Losing studies during closed reference OTU picking


I am trying to do a meta-analysis of 9 different studies. However, when I run closed-reference OTU picking with SILVA and Greengenes, I lose 4 of the studies by the time I get to core-metrics, using a sampling depth of 1000 sequences per sample.

I think none of the reads from the samples in those 4 studies are matched by the closed-reference sequences, so every sample from the 4 missing studies ends up with no reads.
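To illustrate the mechanism described above: rarefying to a sampling depth drops every sample with fewer reads than that depth, so a study whose samples all end up with zero reads after closed-reference picking vanishes completely. This is a minimal Python sketch, not QIIME 2 code; the sample names, study labels, counts, and the helpers `samples_retained` and `lost_studies` are all hypothetical.

```python
# Hypothetical per-sample read counts (study, reads) after closed-reference
# OTU picking; all names and numbers are invented for illustration.
SAMPLING_DEPTH = 1000

sample_counts = {
    "s1": ("studyA", 5400),
    "s2": ("studyA", 1200),
    "s3": ("studyB", 0),     # no reads matched the reference
    "s4": ("studyB", 0),
    "s5": ("studyC", 870),   # below the sampling depth
    "s6": ("studyC", 3100),
}


def samples_retained(counts, depth):
    """Return {study: [samples]} keeping only samples with >= depth reads.

    Rarefying to `depth` discards every sample with fewer reads, so a study
    whose samples all fall short ends up with an empty list.
    """
    kept = {}
    for sample, (study, n_reads) in counts.items():
        kept.setdefault(study, [])
        if n_reads >= depth:
            kept[study].append(sample)
    return kept


def lost_studies(counts, depth):
    """Studies left with no samples at all after rarefying to `depth`."""
    retained = samples_retained(counts, depth)
    return sorted(study for study, samples in retained.items() if not samples)


print(samples_retained(sample_counts, SAMPLING_DEPTH))
print(lost_studies(sample_counts, SAMPLING_DEPTH))  # → ['studyB']
```

Note that studyC survives with one sample, while studyB is lost at any positive depth because its samples have no reads at all.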

Is this normal? Is there another way around this problem?

Thank you,

Hi @cbippert,

Have you tried clustering the individual studies separately so you can see what each one looks like? It might help you figure out where you're losing the data.
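One way to see where the data goes missing is to compare each study's read totals before and after closed-reference picking. A minimal Python sketch, assuming you have already pulled per-sample totals (for example, from feature-table summaries) into plain dictionaries; the sample names, numbers, and the `retention_by_study` helper are hypothetical.

```python
# Hypothetical per-sample read totals before and after closed-reference
# picking, plus a sample -> study mapping; all values are invented.
reads_before = {"s1": 6000, "s2": 1500, "s3": 4200, "s4": 3900}
reads_after = {"s1": 5400, "s2": 1200}  # s3 and s4 are absent: nothing matched
study_of = {"s1": "studyA", "s2": "studyA", "s3": "studyB", "s4": "studyB"}


def retention_by_study(before, after, study_of):
    """Fraction of each study's input reads that survive closed-reference picking.

    Samples missing from `after` are counted as zero, since a sample whose
    reads all fail to match may not appear in the output table at all.
    """
    totals = {}
    for sample, n_in in before.items():
        study = study_of[sample]
        in_sum, out_sum = totals.get(study, (0, 0))
        totals[study] = (in_sum + n_in, out_sum + after.get(sample, 0))
    return {
        study: (out_sum / in_sum if in_sum else 0.0)
        for study, (in_sum, out_sum) in totals.items()
    }


for study, frac in retention_by_study(reads_before, reads_after, study_of).items():
    print(f"{study}: {frac:.1%} of reads retained")
```

A study sitting at or near 0% points to a mismatch between its reads and the reference rather than to a problem further downstream.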


Hi Justine,

I don't quite understand what you mean by clustering the individual studies. Do you mean running each one through closed-reference OTU picking separately? No, I have not done that.

However, all of the data came from previously published papers whose authors were able to cluster their samples together.

I ended up just running the standard tree-building command:

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

I expected it to fail because the studies used different hypervariable regions, but it ran without errors. I don't know whether what I did is correct, though.


Hi @cbippert,

When you do closed-reference OTU picking, each sequence is compared only against the reference, so you can run it in parallel: it makes no difference whether you pick one sample or 1M at the same time. If you're having an issue with some of the studies, troubleshoot those individually, because doing so won't affect the others.
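A toy Python sketch of why this works: each query is scored only against the reference, so picking two studies together gives exactly the same assignments as picking them one at a time and merging. The percent-identity function here is a deliberate simplification (equal-length, pre-aligned strings) and is not how VSEARCH or any QIIME 2 plugin aligns sequences; the reference IDs, sequences, and 0.9 threshold are made up.

```python
# Toy reference database (OTU ID -> sequence); IDs and sequences are made up.
REFERENCE = {
    "otu1": "ACGTACGTAC",
    "otu2": "TTGCATTGCA",
}


def identity(a, b):
    """Fraction of matching positions. Toy simplification: assumes the two
    sequences are already aligned and equal in length."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def closed_reference_pick(seqs, reference, threshold):
    """Assign each query to its best reference hit with identity >= threshold,
    or None if nothing qualifies (discarded in closed-reference picking).

    Each query is compared only with the reference, never with the other
    queries, so its assignment does not depend on what else is in the batch.
    """
    hits = {}
    for seq_id, seq in seqs.items():
        best_id, best_score = None, threshold
        for ref_id, ref_seq in reference.items():
            score = identity(seq, ref_seq)
            if score >= best_score:
                best_id, best_score = ref_id, score
        hits[seq_id] = best_id
    return hits


study_a = {"a1": "ACGTACGTAC", "a2": "ACGTACGTAA"}
study_b = {"b1": "GGGGGGGGGG"}  # resembles nothing in the reference

pooled = closed_reference_pick({**study_a, **study_b}, REFERENCE, 0.9)
per_study = {
    **closed_reference_pick(study_a, REFERENCE, 0.9),
    **closed_reference_pick(study_b, REFERENCE, 0.9),
}
print(pooled)                # → {'a1': 'otu1', 'a2': 'otu1', 'b1': None}
print(pooled == per_study)   # → True
```

Because of this independence, a study whose reads all come back unmatched can be investigated on its own without rerunning the others.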

I'm a little confused by this step. If you're doing closed-reference OTU picking, you should just use the tree associated with the closed-reference OTUs: import that phylogeny and work from there. So this seems like an odd step in your pipeline to me. Could you explain your full workflow?
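With closed-reference picking the feature IDs are reference sequence IDs, so a reference tree built for the same database release should contain every one of them as a tip. A minimal Python sketch of that sanity check, assuming a simplified Newick string with unquoted, space-free labels; the tree, the feature IDs, and the helpers `newick_tip_names` and `missing_from_tree` are hypothetical.

```python
import re


def newick_tip_names(newick):
    """Return the set of tip labels in a Newick string.

    Simplified parser: assumes unquoted labels without spaces. Tip labels are
    the names directly following '(' or ','; internal-node labels (which
    follow ')') and branch lengths (which follow ':') are ignored.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def missing_from_tree(feature_ids, newick):
    """Feature IDs that have no matching tip in the tree."""
    return sorted(set(feature_ids) - newick_tip_names(newick))


# Hypothetical reference tree and feature IDs from a closed-reference table.
reference_tree = "((otu1:0.1,otu2:0.2)0.95:0.05,(otu3:0.3,otu4:0.1):0.2);"
feature_ids = ["otu1", "otu3", "otu4"]

print(missing_from_tree(feature_ids, reference_tree))  # → []
```

An empty result means the reference tree can be used directly; any IDs it reports suggest the tree and the reference sequences come from different releases or clustering levels.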