Will qiime2 be a good option for a large microbiome dataset?

Hi everyone, I am a beginner with qiime2.

My goal is to find out whether a specific fungus is present in a large oral microbiome dataset (approximately 7.3 GB) and, if it is, to extract that species' genome.

Is qiime2 a good choice for that? If so, how much computational power would I need to analyze 7.3 GB of data within a day? Any suggestions are welcome. Thank you.

Hello @Arif_Zaman,

Welcome to the forums! :qiime2:

Yes, QIIME 2 scales quite well, and many of the upstream data processing steps are parallelized.

Estimating runtime is hard because it depends on the complexity of your samples, even when you know the size and have already optimized all the parameters for your data set. Do you already have some test data you could use for a benchmark?
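If it helps, here is a minimal sketch of how such a benchmark could be set up: pull a random fraction of reads out of a FASTQ file, time your pipeline on that subset, then scale the timing up. It assumes uncompressed FASTQ with exactly four lines per record. The function names `subsample_fastq` and `extrapolate_runtime` are made up for this example, and the linear scaling is only a rough assumption, since runtime also depends on sample complexity.

```python
import itertools
import random


def subsample_fastq(in_path, out_path, fraction, seed=0):
    """Copy a random fraction of FASTQ records; return (kept, total).

    Assumes uncompressed FASTQ with four lines per record.
    """
    rng = random.Random(seed)
    kept = total = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            record = list(itertools.islice(src, 4))
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            total += 1
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept, total


def extrapolate_runtime(subset_seconds, kept, total):
    """Rough full-dataset estimate, assuming runtime grows linearly with reads."""
    return subset_seconds * total / kept
```

For paired-end data, calling `subsample_fastq` on the forward and reverse files with the same `seed` and `fraction` should keep the same records from each, as long as both files list the reads in the same order.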

Are you working with amplicon data (like the ITS region) or untargeted 'shotgun' reads (like a metatranscriptome)?

Hi @colinbrislawn ,

Thank you so much for your reply.

These are Illumina HiSeq shotgun reads.

I hadn't thought about running a benchmark, but thanks for the idea. I will try it and get back to you.

Ah OK! Check out q2-shogun, which is designed for shotgun reads. I'm not sure if that's a perfect fit for extracting a specific species' genome, but it will give you a taxonomic breakdown so you know if related taxa are there.

While the tutorials often feature amplicon data, they also offer a good overview of QIIME 2 concepts like artifacts and plugins.

Let us know if you have more questions!