Comparing datasets with non-overlapping amplicons

PatoUru · March 30, 2019, 9:34am

Hi all,
I’m doing a meta-analysis that includes datasets amplified with different primer sets (for example, V1-V2 and V3-V4). In QIIME 1, I analyzed them using the open-reference method, but now I’m a bit lost. In addition, instead of OTU clustering, I want to use DADA2 to generate ASVs.
So far I have run the following 2 steps:
qiime tools import
qiime cutadapt demux-single

And here is my question: what would be the next step to generate ASVs from sequences coming from different datasets?
Should I continue with “qiime dada2 denoise-single” for each dataset and then concatenate the generated files?
Thanks for your help!!
Patricia

jwdebelius · March 30, 2019, 9:38am

Hi @PatoUru,

Your next step is to use closed-reference clustering with vsearch against the reference database of your choice. Because you’re dealing with non-overlapping regions, a de novo approach won’t work: the OTUs will all be different, and a traditional MAFFT tree won’t cluster them together, so you’ll lose out in both phylogenetic and non-phylogenetic measures of community structure because there’s no point of comparison. Closed-reference clustering at 97% or 99% isn’t perfect; there are things you’ll miss, but it puts your samples on a level playing field and means that you can make comparisons.
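
As a rough sketch of what that could look like, assuming you already have a feature table and representative sequences for each region as QIIME 2 artifacts, and a reference imported as FeatureData[Sequence] (all file names below are placeholders, not files from this thread):

```shell
# Closed-reference clustering, run once per amplicon region.
# Every file name here is a hypothetical placeholder; substitute your own artifacts.
REF_SEQS="reference-seqs.qza"   # reference sequences, FeatureData[Sequence]
PERC_IDENTITY="0.97"            # 0.97 or 0.99, ideally matching the reference set

for region in V1V2 V3V4; do
  table="${region}-table.qza"       # FeatureTable[Frequency] for this region
  seqs="${region}-rep-seqs.qza"     # FeatureData[Sequence] for this region
  # Only call qiime when it is installed and the inputs exist.
  if command -v qiime >/dev/null 2>&1 && [ -f "$table" ] && [ -f "$seqs" ]; then
    qiime vsearch cluster-features-closed-reference \
      --i-table "$table" \
      --i-sequences "$seqs" \
      --i-reference-sequences "$REF_SEQS" \
      --p-perc-identity "$PERC_IDENTITY" \
      --o-clustered-table "${region}-table-cr.qza" \
      --o-clustered-sequences "${region}-rep-seqs-cr.qza" \
      --o-unmatched-sequences "${region}-unmatched.qza"
  else
    echo "Skipping ${region}: qiime or input artifacts not found"
  fi
done
```

Because the clustered tables are keyed by reference sequence IDs rather than region-specific sequences, they should then be mergeable, for example with qiime feature-table merge.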

Best,
Justine

PatoUru · March 31, 2019, 3:14am

Hi Justine,
I am very grateful for your quick response. I followed your suggestion, and found that for my case it would be good to use the open-reference method.
But now I have a new problem: I would like to use SILVA as the database, in particular silva_132_97_16S.fna

From what I read in the forums, the database should not be aligned, and I have to import it to be able to use it for open-reference clustering, but when I import it, I get the following error:

$ qiime tools import --type SampleData[Sequences] --input-path /home/patricia/Documentos/Silva_132_release/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/97/silva_132_97_16S.fna --output-path /home/patricia/Data_base/silva_132_97_16S.qza

There was a problem importing /home/patricia/Documentos/Silva_132_release/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/97/silva_132_97_16S.fna:
/home/patricia/Documentos/Silva_132_release/SILVA_132_QIIME_release/rep_set/rep_set_16S_only/97/silva_132_97_16S.fna is not a(n) QIIME1DemuxFormat file

I probably chose the wrong --type option.
What would be the correct way to import this reference database?
Many thanks

jwdebelius · March 31, 2019, 7:41am

Hi @PatoUru,

Open-reference clustering may be better, but it’s still not best practice, because it means that your OTUs are going to be driven by region and so aren’t comparable, like I explained above.

In terms of importing your reference dataset, have you checked out the Training Classifiers tutorial? I think your issue is a semantic type one, and I know the tutorials always help me figure out which semantic types I should be using for imports.
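
For what it’s worth, that tutorial imports unaligned reference sequences as FeatureData[Sequence] rather than SampleData[Sequences], so something along these lines is probably what you need. This is only a sketch: the paths are shortened placeholders, and the type is quoted so the shell doesn’t try to interpret the square brackets.

```shell
# Import an unaligned reference FASTA as a QIIME 2 artifact.
# Paths are hypothetical placeholders; point REF_FASTA at your SILVA rep set.
REF_FASTA="silva_132_97_16S.fna"
REF_QZA="silva_132_97_16S.qza"
REF_TYPE='FeatureData[Sequence]'   # semantic type for reference sequences

# Only call qiime when it is installed and the FASTA file exists.
if command -v qiime >/dev/null 2>&1 && [ -f "$REF_FASTA" ]; then
  qiime tools import \
    --type "$REF_TYPE" \
    --input-path "$REF_FASTA" \
    --output-path "$REF_QZA"
else
  echo "Skipping import: qiime or $REF_FASTA not found"
fi
```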

Best,
Justine

system · May 2, 2019, 5:21am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.