Some questions about switching from QIIME 1.9X to QIIME 2.0

sdpapet · September 15, 2018, 2:57am

Hello, I plan to switch from QIIME 1.9 to QIIME 2.0 and am now learning QIIME 2.0 on my own. Some concepts are still unclear to me, so I have a few questions.

1> In QIIME 1.9, we normally use the concept of OTUs, which is what most publications use. However, QIIME 2.0 introduces amplicon sequence variants (ASVs), which I have seen in the DADA2 paper. Are these two concepts (OTUs and ASVs) interchangeable? If not, what is the difference between them? Why does QIIME 2.0 use ASVs instead of OTUs? I suppose ASVs have some kind of advantage?

2> In QIIME 1.9, we have three different OTU picking methods: closed-reference, open-reference, and de novo. By contrast, I didn't see the QIIME 2.0 tutorials mention any of these methods. There are several tutorials online (moving pictures, fecal, etc.), and all of them use BLASTn, which seems similar to pick_closed_reference_otus.py. Am I right? Does QIIME 2 support other OTU picking methods such as de novo and open-reference? I saw a section called "Clustering sequences into OTUs using q2-vsearch", which seems like a solution, but I am confused about why QIIME 2.0 doesn't use clustering as the default method.

3> In QIIME 1.9, the default OTU clustering threshold is 97% similarity, which is the classical choice. However, I barely see the QIIME 2.0 tutorials mention it. Most of the tutorials use the DADA2 workflow and quickly generate a feature table (OTU table) and the corresponding feature sequences. So we don't do clustering in QIIME 2.0? If QIIME 2.0 still does it, what is the default cut-off?

4> I am also very confused about the databases. All of the tutorials use BLAST against the NCBI nt database. As you know, most of us just sequence the 16S rRNA gene. In QIIME 1, we have RDP, SILVA, and Greengenes, all of which are specific to 16S rRNA. As far as I know, the NCBI nt database is updated almost every week (and it is very large), so I am not sure how QIIME 2 could keep up with the latest version. What is the default version of the nt database in QIIME 2? Also, BLAST is not very fast; I suppose it would be really slow on a very large sequencing data set. I remember there were several methods to annotate reads in QIIME 1.x (e.g., RDP, uclust, mothur). Do you still support these methods?

Also, if I want to use Greengenes or other databases, how can I do it?

5> After I generate a feature table, I can export it to BIOM format. However, there are no taxonomic names in this table. I can only see the taxa information by building a plot. I am not sure if I did something wrong. If I did it correctly, why does QIIME 2 remove the taxa information from the feature table?

Thanks,
~Ben

Nicholas_Bokulich · September 17, 2018, 2:18pm

Hi @sdpapet,
Great questions! I will answer specifics below, but in general I recommend that you see the overview tutorial, which will clarify many of these questions.

No. See the overview tutorial but, better yet, the dada2 paper and this paper to answer all of your questions in question 1. There are many more papers out there on this topic (both benchmarks and reviews) so I recommend a literature search for more information.

All of those methods are available in QIIME 2 and covered in this tutorial.

There is no "default" in QIIME 2... QIIME 2 is a platform containing many different methods. A sort of "choose your own adventure". We do recommend and prefer denoising methods over OTU clustering, and the resources I recommend above will explain why.

Because denoised sequences do not require clustering. See the resources above for more details.

None of the tutorials use BLAST against NCBI for actual taxonomy classification. This is a feature of qiime feature-table summarize, which allows quick lookup of specific features, and is provided for user convenience, NOT for actual taxonomic classification.

Read further down in the tutorials, e.g., here, to see how taxonomy classification is performed. Several classification methods are supported in QIIME 2, and any of those reference databases that you described are usable — we even have pre-trained classifiers and links to compatible reference databases on the data resources page.

Taxonomy information is not stored in feature tables, because doing so makes it ambiguous whether any given feature table is annotated or not with feature metadata. Any method in QIIME 2 that requires taxonomy information (or other feature metadata) requires that information as a separate input. It makes things much more explicit and clear compared to, e.g., qiime1.

See this tutorial to learn how to append this information if you are exporting your feature table for use with other software.
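To illustrate the idea of keeping taxonomy as a separate input, here is a minimal Python sketch of joining a feature table to a taxonomy mapping by feature ID. It is not the QIIME 2 or BIOM API; the feature IDs, sample names, taxonomy strings, and the `annotate` helper are all made up for illustration.

```python
# Toy data only: IDs, sample names, and taxonomy strings are hypothetical.
feature_table = {
    "feat_a": {"sample1": 12, "sample2": 0},
    "feat_b": {"sample1": 3, "sample2": 7},
}
taxonomy = {
    "feat_a": "k__Bacteria; p__Firmicutes",
    "feat_b": "k__Bacteria; p__Bacteroidetes",
}


def annotate(table, taxa, missing="Unassigned"):
    """Return (feature_id, counts, taxon) rows, joined on feature ID."""
    return [
        (fid, counts, taxa.get(fid, missing))
        for fid, counts in sorted(table.items())
    ]


annotated = annotate(feature_table, taxonomy)
for fid, counts, taxon in annotated:
    print(fid, counts, taxon)
```

Because the join is explicit, it is always clear whether a given table has been annotated, and features without a taxonomy entry are labeled rather than silently dropped.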

Good luck! I hope that helps.

sdpapet · September 23, 2018, 10:45pm

Hi Nick,

Thanks for your reply. I spent last week following your suggestions. That resolved most of my questions, but I still have some follow-ups.

1> If I choose the traditional OTU clustering method using VSEARCH in QIIME 2.0, can I still use a 97% cut-off? Is 97% still the default, or the cut-off everyone uses?

2> I still do not quite understand this:

"None of the tutorials use BLAST against NCBI for actual taxonomy classification. This is a feature of qiime feature-table summarize, which allows quick lookup of specific features, and is provided for user convenience, NOT for actual taxonomic classification.

Read further down in the tutorials, e.g., here, to see how taxonomy classification is performed. Several classification methods are supported in QIIME 2, and any of those reference databases that you described are usable — we even have pre-trained classifiers and links to compatible reference databases on the data resources page."

The "Moving Pictures" tutorial says, "The feature-table tabulate-seqs command will provide a mapping of feature IDs to sequences, and provide links to easily BLAST each sequence against the NCBI nt database."

I have seen the feature IDs, which look like random combinations of letters and numbers. I also read the paper you recommended (Exact sequence variants should replace operational taxonomic units in marker-gene data analysis).

Regarding the feature ID of an ASV: does this mean each ASV will have a unique feature ID? In the future, even if you and I work on different projects and microbiomes, can our datasets contain ASVs with the same feature IDs?

Also, I check the tutorial about taxa assignment. " In the next sections we’ll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our FeatureData[Sequence] QIIME 2 artifact. We’ll do that using a pre-trained Naive Bayes classifier and the q2-feature-classifier plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We’ll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy."

Here is what I understand: the default database for taxonomic assignment is still Greengenes. Am I right?
Here are my questions:

A> Why do you use 99% OTUs instead of 97% OTUs? If I use the ASV method, do I have to use 99%?
If I use traditional VSEARCH OTU clustering (97%), should I use the Greengenes 13_8 97% OTUs here?

B> Can I use other databases, such as RDP or SILVA? If so, where can I download them? As you know, a database downloaded from its official website might not be usable in QIIME as-is. Should I also use 99% for other databases?

C> Overall, I am confused about why QIIME 2 gives up the classical 97% threshold.

3> I am not sure whether you use phyloseq (R package) for downstream analyses. I am also a phyloseq beginner, but I can import QIIME 1.x BIOM files into phyloseq without any problems. I know I can convert QIIME 2.0 tables into BIOM format and import them into phyloseq. Is it possible for phyloseq to read QIIME 2 tables, trees, etc. directly? Anyone here who also uses phyloseq is welcome to join the discussion.

4> ANCOM method. I am reading the latest QIIME 2.0 tutorial and working on the 08.2018 virtual box. As far as I can tell, the latest version no longer supports ANCOM analysis. When did you remove this method (in which version of QIIME)? I still want to use it, so I would like to find the old version.

Thanks,
Ben

Nicholas_Bokulich · September 24, 2018, 4:28pm

@sdpapet
I should have mentioned in response to your previous post: your post is asking many topically distinct questions. It is fine to ask many questions, but I kindly ask you to ask these in separate topics; that way it is easier for other readers to track questions that may be related to their own questions. Thank you.

Yes, you can still use 97%. I believe --p-perc-identity is the name of the parameter that you will use to adjust this.
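To make the meaning of a 97% threshold concrete, here is a toy Python sketch of greedy centroid clustering by percent identity. It is only an illustration under simplifying assumptions (equal-length sequences compared position by position, no alignment) and is not how VSEARCH or q2-vsearch is implemented; the function names and sequences are hypothetical.

```python
def percent_identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold."""
    clusters = {}  # centroid sequence -> list of member sequences
    for seq in seqs:
        for centroid in clusters:
            if percent_identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No centroid is close enough, so this sequence starts a new OTU.
            clusters[seq] = [seq]
    return clusters


ref = "A" * 100               # made-up 100 bp sequence
near = "A" * 99 + "C"         # 1 mismatch  -> 99% identical to ref
far = "A" * 95 + "C" * 5      # 5 mismatches -> 95% identical to ref
clusters = greedy_cluster([ref, near, far])
print(len(clusters))
```

In this toy example, `near` joins the OTU of `ref` while `far` becomes its own OTU; raising or lowering `threshold` changes where that boundary falls.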

Again, that is for convenience. We do not recommend using that for actually classifying taxonomy; taxonomy classification is described further down in the tutorials.

Yes, each unique sequence will have its own unique, replicable feature ID; the same sequence observed in another study would have the same ID. There can theoretically be clashes, but these would be so exceedingly rare that they are not worth worrying about.
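A short Python sketch of why such IDs are replicable, assuming the feature ID is the MD5 hex digest of the sequence itself (to my knowledge this is how hashed feature IDs are produced by default in the DADA2 plugin, but treat it as an assumption). The `feature_id` helper and the sequence are made up for illustration.

```python
import hashlib


def feature_id(sequence):
    """Hypothetical helper: derive an ID from the sequence content alone."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


seq = "TACGTAGGGTGCGAGCGTTAATCGGAATAACTGGGCGTAAAG"  # made-up sequence

# Two independent studies that observe the same sequence get the same ID,
# because the ID depends only on the sequence and not on the dataset.
id_study_a = feature_id(seq)
id_study_b = feature_id(seq)
print(id_study_a == id_study_b)

# A single-base difference yields a different ID.
id_variant = feature_id(seq[:-1] + "C")
print(id_variant == id_study_a)
```

A clash would require two different sequences to produce the same digest, which is why it is possible in theory but not a practical concern.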

No. There is no default database. We provide pre-trained classifiers for Greengenes and SILVA, and others can be formatted for use with QIIME 2 as well.

It is much more specific and informative, which is ideal for taxonomy classification.

No

No, the % identity does not need to match. Always use higher clustering % for your taxonomy database as it will always be more informative.

You can use any database you like, as long as it's in the correct format (and we provide pre-trained SILVA classifiers). See the data resources page for pre-trained classifiers and links to some compatible databases.

You have read the ASV denoising papers; you can see that these methods achieve more accurate results. If you want to do OTU picking, 97% is a fine threshold for clustering.

No, I do not use phyloseq. Contact the phyloseq developers for support with phyloseq.

I do not know what you are talking about. ANCOM is available in the current release of QIIME 2 (in the q2-composition plugin). See here for usage details.

Good luck!
