Introducing Greengenes2 2022.10

Hi @liangyong19491001,

Most 515F V4 ASVs will start with TAC.... If you are following the EMP protocol, the forward primer will not be present. If you are following a variant of the EMP protocol, the forward primer may still be present. If your ASVs do not tend to start with TAC, then the left trim may be incorrect. The filter-features command performs an exact match, so ASVs that are off by even a single nucleotide will not be found.

Best,
Daniel
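
One way to check the left trim described above is to count how many ASVs begin with TAC. The following is a minimal plain-Python sketch, not part of the plugin: the function names and the toy sequences are hypothetical, and it assumes the ASV sequences have been exported to FASTA (for example with `qiime tools export`).

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            # Sequences may be wrapped over several lines; join them later.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def fraction_with_prefix(seqs: Iterable[str], prefix: str = "TAC") -> float:
    """Return the fraction of sequences that start with `prefix`."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if s.upper().startswith(prefix))
    return hits / len(seqs)


# Toy input for illustration only; these are not real ASVs.
TOY_FASTA = """\
>asv1
TACGGAGGGTGCA
>asv2
TACGTAGG
GTGCAAGC
>asv3
GTGCCAGCAGCC
"""

records = dict(read_fasta(TOY_FASTA.splitlines()))
print(fraction_with_prefix(records.values()))
```

If only a small fraction of your ASVs start with TAC, that suggests revisiting the trimming before expecting exact matches.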

Hi @liangyong19491001,

In practice for EMP V4 data, we've seen only a small number of ASVs not hit and they tend to be singletons. The ASV set represented here spans > 300,000 public and private microbiome samples from a large number of environments.

We are working on adapting DEPP so users can place their own fragments, but that is not available yet.

All the best,
Daniel

Many thanks for your detailed answer! In my previous test, the ASVs started with the 515F primer, which starts with GTG, because I had read your answer in "Greengenes2 taxonomy-from-table error - #9 by cannon.320" as saying that the 515F primer should be kept. Thanks again; I will test it tomorrow.

Hi @liangyong19491001,

You do not want to keep the 515F primer. I apologize if that prior thread suggested that, but I don't immediately see where. The 515F primer does start with GTG: GTGYCAGCMGCCGCGGTAA

Best,
Daniel
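
Because the primer quoted above contains the degenerate bases Y and M, a plain string comparison will miss some copies of it. The sketch below is a hypothetical diagnostic rather than anything from the plugin: it expands the IUPAC codes into a regular expression and reports whether a sequence still starts with the primer. Primer removal is normally done on the reads before denoising, so treat `strip_primer` here as an illustration only.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "H": "[ACT]", "B": "[CGT]",
    "V": "[ACG]", "D": "[AGT]", "N": "[ACGT]",
}

# 515F primer as quoted in the post above.
PRIMER_515F = "GTGYCAGCMGCCGCGGTAA"


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a regex that matches `primer` at the start of a sequence."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def has_primer(seq: str, primer: str = PRIMER_515F) -> bool:
    """Return True if `seq` begins with an exact copy of the primer."""
    return primer_regex(primer).match(seq.upper()) is not None


def strip_primer(seq: str, primer: str = PRIMER_515F) -> str:
    """Remove a leading primer if present; otherwise return `seq` unchanged."""
    match = primer_regex(primer).match(seq.upper())
    return seq[match.end():] if match else seq


# Toy sequence for illustration only: primer followed by a short tail.
toy = "GTGCCAGCAGCCGCGGTAATACGGAGG"
print(has_primer(toy), strip_primer(toy))
```

This only detects exact primer copies at the very start of the sequence; it does not tolerate sequencing errors or offsets.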

Thank you @wasade. The prior thread says not to trim the 5’ end, and I misunderstood that as keeping the 515F primer :grin:

Hi!
Thank you for the tutorial. I have long nanopore reads (>1200), so I used these commands to classify the reads:

$ qiime greengenes2 non-v4-16s \
>     --i-table icu.biom.qza \
>     --i-sequences icu.fna.qza \
>     --i-backbone 2022.10.backbone.full-length.fna.qza \
>     --o-mapped-table icu.gg2.biom.qza \
>     --o-representatives icu.gg2.fna.qza

$ qiime greengenes2 taxonomy-from-table \
>     --i-reference-taxonomy 2022.10.taxonomy.asv.nwk.qza \
>     --i-table icu.gg2.biom.qza \
>     --o-classification icu.gg2.taxonomy.qza

With my own input files (backbone is the same).

By default, --p-perc-identity is set to 0.99. I ran the same command with adjusted values of 0.85, 0.9 and 0.97 to account for the high error rate in nanopore data. Of course, I get the most "beautiful" results with 0.85, but I am afraid that they contain too many false positives. Could you advise which threshold is best for working with nanopore data? Unfortunately, I don't have standards in the dataset to compare against.

Hi @timanix,

I'm not sure what type of data may exist to inform what a reasonable similarity threshold is for this type of data. If you have positive controls, it may be feasible to estimate a similarity threshold by aligning against the known 16S. If you do not have positive controls, then I wonder if it could be estimated by examining divergence relative to invariant positions in 16S, although we don't have that type of detail readily available within Greengenes2 right now.

If you vary the similarity threshold, say from 0.85 to 0.99, do the biological conclusions derived or sample-sample relationships change?

Best,
Daniel
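
One way to set up the comparison suggested above is to run the same non-v4-16s command once per threshold and carry each output through identical downstream steps. The sketch below only builds the command lines, reusing the file names and the --p-perc-identity option from the command posted earlier in this thread. The helper names and the output naming scheme are hypothetical, and nothing is executed unless `run=True` is passed.

```python
import subprocess
from typing import Iterable, List


def non_v4_16s_command(perc_identity: float, prefix: str = "icu") -> List[str]:
    """Build one `qiime greengenes2 non-v4-16s` call for a given threshold."""
    tag = f"{perc_identity:.2f}"  # e.g. 0.9 -> "0.90", used in output names
    return [
        "qiime", "greengenes2", "non-v4-16s",
        "--i-table", f"{prefix}.biom.qza",
        "--i-sequences", f"{prefix}.fna.qza",
        "--i-backbone", "2022.10.backbone.full-length.fna.qza",
        "--p-perc-identity", str(perc_identity),
        "--o-mapped-table", f"{prefix}.gg2.{tag}.biom.qza",
        "--o-representatives", f"{prefix}.gg2.{tag}.fna.qza",
    ]


def sweep(thresholds: Iterable[float], run: bool = False) -> List[List[str]]:
    """Return the commands for each threshold; execute them only if `run`."""
    commands = [non_v4_16s_command(t) for t in thresholds]
    if run:
        for cmd in commands:
            # Requires a QIIME 2 environment with the greengenes2 plugin.
            subprocess.run(cmd, check=True)
    return commands


for cmd in sweep([0.85, 0.90, 0.97, 0.99]):
    print(" ".join(cmd))
```

Each resulting table can then go through the same diversity analysis, so that any change in the conclusions can be attributed to the threshold alone.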

Hi @wasade
Thank you for the reply!
Unfortunately, the dataset I am working with has no positive controls or standards sequenced. I saw in the literature that 85% identity is used with VSEARCH for taxonomy annotation of nanopore data, but I decided to check here in case there are any recommendations.

I have results for 85%, 90% and 97%. There are differences in the number of sequences retained/annotated after running the gg2 plugin. I can share them if you are interested and would like a better overview of the differences.

Hi @timanix,

Sorry for the delay in replying. With the results, what I'm curious about specifically is whether, e.g., PERMANOVA statistics for variables of interest differ depending on the threshold used. If they do not, then it suggests the biological signal being tested is robust to this threshold. Does that make sense?

Best,
Daniel

Hi @wasade
There were almost no differences between 85% and 90% in the PCoA plots and the PERMANOVA of important variables, while 97% was quite different (not surprising, with less than 10% of sequences retained). I tested this with features collapsed to species level, since the sequences are not the originals but have been replaced with sequences from the database. So we collapsed to species for core-metrics, and for the rest we will go no higher than genus level. We will proceed with 90% for now, but in the future we will test it with standards.

Thank you for the follow up!

Best,
Daniel

If a genus has multiple taxonomic labels, such as g__Blautia_A_141781 and g__Blautia_A_141770, should I treat them as separate genera or species when conducting differential analysis? Alternatively, should I remove the nodes like _141770 from the taxonomic labels and merge them into the same genus for the differential analysis?

Hi @bylam,

The taxonomy reflects the phylogeny, and collapsing as proposed would disrupt that relationship. These types of decisions ultimately depend on what question you are asking with a particular analysis, and importantly, whether they matter to the type of interpretations you can draw from the result.

Best,
Daniel
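
For readers who decide, with the caveat above in mind, that merging is appropriate for their question, the mechanics might look like the sketch below. It is a hypothetical illustration, not a recommendation: it assumes the node identifier is always a trailing underscore followed by digits on a genus label, and the counts are toy values.

```python
import re
from collections import defaultdict
from typing import Dict

# Assumption: the node identifier is a trailing "_<digits>" on the label.
_NUMERIC_SUFFIX = re.compile(r"_\d+$")


def strip_numeric_suffix(label: str) -> str:
    """Drop a trailing numeric suffix, e.g. '_141770', from a label."""
    return _NUMERIC_SUFFIX.sub("", label)


def merge_counts(counts: Dict[str, int]) -> Dict[str, int]:
    """Sum counts of labels that become identical once the suffix is removed."""
    merged: Dict[str, int] = defaultdict(int)
    for label, n in counts.items():
        merged[strip_numeric_suffix(label)] += n
    return dict(merged)


# Toy counts for illustration only.
toy_counts = {
    "g__Blautia_A_141781": 3,
    "g__Blautia_A_141770": 5,
    "g__Roseburia": 2,
}
print(merge_counts(toy_counts))
```

Note that merged labels no longer correspond to single clades in the phylogeny, which is the disruption described in the reply above.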

I have a simple question!

Should I use sequences that are mapped to the Greengenes2 backbone (using the non-v4-16s function) as the input sequences for a Naive Bayes classifier? This classifier is trained on the V3-V4 regions of Greengenes2.

Though I've been able to acquire taxonomic information using unmapped sequences, it seems that there's no method to obtain corresponding (with matching ASV names) phylogenetic information.

Hi @Uni,

You could just use the taxonomy of the backbone records. We didn't place V3-V4 ASVs in the phylogeny so there won't be existing coordinates for them. We are working on a way to do the placement for arbitrary fragments but it isn't available just yet.

Best,
Daniel

Thank you for your reply.

So, if I use the V3-V4 ASVs, should I not use the provided phylogeny file (2022.10.phylogeny.asv.nwk.qza) for phylogenetic analysis?

Are there alternative methods for obtaining phylogeny [roots] when using V3-V4 ASVs with the Greengenes 2 database?

Hi @uni,

For ASVs not based on 515f-806r, I would recommend, for now, using the non-v4-16s action, which performs closed-reference OTU picking against the backbone. This would allow use of the phylogeny.

Best,
Daniel
