A question about meta-analysis using qiime2-DADA2 or qiime1-closed OTU

Moon · November 25, 2020, 7:29am

Dear technical support,

I am doing a meta-analysis using several 16S data sets from our own work and other studies.

They certainly differ in methodology, for example in the DNA extraction method and in the region of the 16S gene that was sequenced.

In the past, I would use the closed-reference OTU strategy of QIIME 1 to reduce batch effects as much as possible. But now I have moved all my analysis pipelines to QIIME 2. I notice that DADA2 in QIIME 2 is really different from the closed-reference OTU strategy in QIIME 1: DADA2 distinguishes features down to single-base differences.

Question 1: Is it suitable to use the merged feature table and merged rep-seqs produced by DADA2 for the meta-analysis? Or should we use another QIIME 2 pipeline (a Deblur plus closed-reference OTU strategy), which might be more similar to QIIME 1?

In fact, I have already obtained results using DADA2, and they show evident batch effects (unweighted UniFrac).

Question 2: I guess this is because the different sequencing regions produce a lot of redundant ASVs in the DADA2 pipeline, so the unweighted distances become really large. However, all the features (ASVs) are assigned taxonomy in the downstream analysis. At that step, ASVs that differ only because of the sequencing region would still be assigned to the same taxa. So, if we analyze them at a higher taxonomic level (genus...), would the batch effects be diminished?

Question 3: In summary, I really want to know the best pipeline for meta-analysis. Can any experienced expert help me? Many thanks!
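
The reasoning in Question 2 can be sketched with a toy example. This is only an illustration with made-up data: the ASV identifiers, counts, and genus labels are hypothetical, and presence/absence Jaccard distance is used as a simple stand-in for unweighted UniFrac (it ignores the tree entirely).

```python
def jaccard_distance(a, b):
    """Presence/absence (unweighted) Jaccard distance between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def collapse(sample, taxonomy):
    """Sum ASV counts per genus label."""
    collapsed = {}
    for asv, count in sample.items():
        genus = taxonomy[asv]
        collapsed[genus] = collapsed.get(genus, 0) + count
    return collapsed


# Hypothetical data: the same two genera profiled from two different regions.
# ASVs are keyed by sequence, so the two regions never share an ASV identifier.
taxonomy = {
    "asv_v3v4_1": "g__Bacteroides",
    "asv_v3v4_2": "g__Prevotella",
    "asv_v4v5_1": "g__Bacteroides",
    "asv_v4v5_2": "g__Prevotella",
}
sample_v3v4 = {"asv_v3v4_1": 120, "asv_v3v4_2": 30}
sample_v4v5 = {"asv_v4v5_1": 95, "asv_v4v5_2": 40}

asv_distance = jaccard_distance(sample_v3v4, sample_v4v5)
genus_distance = jaccard_distance(
    collapse(sample_v3v4, taxonomy), collapse(sample_v4v5, taxonomy)
)
print(asv_distance, genus_distance)
```

In this toy case the two samples share no ASVs but share every genus. It only shows why ASV identifiers stop matching across regions; primer and region biases in which taxa are detected can still remain after collapsing.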

jwdebelius · November 25, 2020, 4:15pm

Hi @Moon,

To combine regions, in general, you need a good reference and a consistent approach. Because ASVs are sequence-specific, if you're combining the same region, you need to use the same parameters, and if you're combining different regions, you still need a scaffold. Typically, people scaffold their ASVs using an insertion tree. So, if you used an alignment tree, you may have a larger signal. (See the fragment insertion paper for an example.)

I would probably just do closed reference clustering of my ASVs here - you get a scaffold and the tree and taxonomy are already calculated. However, I will say that even with OTUs, you'll probably still have a big signal from the hypervariable region.

Best,
Justine
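
As a rough sketch of why closed-reference clustering gives a shared scaffold: ASVs from different regions are re-keyed by the reference sequence they match, so they can end up under the same feature ID. Everything below is hypothetical (sequences, IDs, counts), and exact substring matching is used as a simplified stand-in for the percent-identity search a real clustering tool performs.

```python
def closed_reference(table, asv_seqs, references):
    """Re-key ASV counts by the ID of the first reference containing the ASV.

    Exact substring matching is a simplification of identity-threshold search.
    Returns (clustered counts by reference ID, unmatched counts by ASV ID).
    """
    clustered, unmatched = {}, {}
    for asv, count in table.items():
        seq = asv_seqs[asv]
        hit = next(
            (ref_id for ref_id, ref_seq in references.items() if seq in ref_seq),
            None,
        )
        if hit is None:
            unmatched[asv] = count
        else:
            clustered[hit] = clustered.get(hit, 0) + count
    return clustered, unmatched


# Hypothetical full-length references and short ASVs from two "regions".
references = {"ref_A": "ACGTACGTTTGGCCAATT", "ref_B": "TTGACCGGTAAACCGGTT"}
asv_seqs = {
    "asv1": "ACGTACGT",  # start of ref_A (one region)
    "asv2": "GGCCAATT",  # end of ref_A (another region)
    "asv3": "CCCCCCCC",  # matches nothing in the reference
}

batch1_clustered, batch1_unmatched = closed_reference({"asv1": 10}, asv_seqs, references)
batch2_clustered, batch2_unmatched = closed_reference(
    {"asv2": 7, "asv3": 2}, asv_seqs, references
)
print(batch1_clustered, batch2_clustered, batch2_unmatched)
```

Both batches end up with counts under `ref_A`, while anything without a reference hit is set aside. That loss of unmatched features is the usual trade-off of a closed-reference approach.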

Moon · November 26, 2020, 5:49am

Many thanks for your answers!

By the way, do "the same parameters" include the cut length in DADA2?

jwdebelius · November 26, 2020, 2:03pm

Hi @Moon,

Yes, it means processing the data the same way all the way through: same trimming parameters in cutadapt and same dada2 parameters if you're matching the same region.

Best,
Justine
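
One way to check the "same parameters for the same region" rule before merging is to group batches by region and look for disagreeing settings. This is a minimal sketch: the batch names and truncation values are made up, and the field names `trunc_len_f` and `trunc_len_r` are just illustrative labels for the forward and reverse truncation lengths.

```python
from collections import defaultdict


def inconsistent_regions(batches):
    """Return {region: sorted distinct parameter tuples} where batches disagree."""
    by_region = defaultdict(set)
    for batch in batches:
        by_region[batch["region"]].add((batch["trunc_len_f"], batch["trunc_len_r"]))
    return {
        region: sorted(params)
        for region, params in by_region.items()
        if len(params) > 1
    }


# Hypothetical batches; the truncation lengths are illustrative only.
batches = [
    {"name": "batch1", "region": "V3-V4", "trunc_len_f": 270, "trunc_len_r": 210},
    {"name": "batch2", "region": "V3-V4", "trunc_len_f": 250, "trunc_len_r": 200},
    {"name": "batch3", "region": "V4-V5", "trunc_len_f": 240, "trunc_len_r": 200},
]

problems = inconsistent_regions(batches)
print(problems)
```

Any region listed in `problems` has batches that would produce ASVs of different lengths, so their ASVs would not match exactly even for identical organisms.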

Moon · December 1, 2020, 12:00pm

Dear Justine,

My data sets contain five batches.

Batch 3: V4-V5

The other four batches: V3-V4

The DADA2 parameter "--p-min-fold-parent-over-abundance" is set to 4 for all batches,
while the cut (truncation) length differs among the data sets.

Then I ran fragment insertion:

qiime fragment-insertion sepp --i-representative-sequences merged_rep-seqs.qza --i-reference-database sepp-refs-gg-13-8.qza --p-threads 20 --o-tree insertion-tree.qza --o-placements insertion-placements.qza

qiime fragment-insertion filter-features --i-table merged_table.qza --i-tree insertion-tree.qza --o-filtered-table filtered_table.qza --o-removed-table removed_table.qza

Q1: Is this right? Will this step diminish the batch effects caused by differences in DADA2 cut length and sequencing region?

The merged-table.qzv:

The filtered-table.qzv:

The removed-table.qzv:

Q2: Why was no feature filtered?

Moon · December 1, 2020, 12:15pm

Oh! I found a similar question you have already answered about why no features were filtered, and it seems that is the expected behavior. But I am still confused about Q1. Many thanks!

Moon · December 1, 2020, 12:30pm

Q3: I found that the downstream taxonomic annotation does not use any of the results from q2-fragment-insertion... Or is there a difference between gg-13-8-99-515-806-nb-classifier.qza and sepp-refs-gg-13-8.qza?

qiime feature-classifier classify-sklearn --i-classifier gg-13-8-99-515-806-nb-classifier.qza --i-reads merged_rep-seqs.qza --o-classification taxonomy.qza

qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv

##barplot
qiime taxa barplot --i-table filtered_table.qza --i-taxonomy taxonomy.qza --m-metadata-file metadata.tsv --o-visualization taxa-bar-plots.qzv

###taxonomy_table
mkdir taxonomy_table
qiime taxa collapse --i-table filtered_table.qza --i-taxonomy taxonomy.qza --p-level 6 --o-collapsed-table ./taxonomy_table/merged_table_l6.qza

Moon · December 1, 2020, 1:14pm

God...

Fragment insertion does not seem to have diminished the batch effects...

jwdebelius · December 1, 2020, 4:37pm

Hi @Moon,

Let me try to answer your questions systematically. I've only just had my first cup, so my brain is still kinda kicking in and someone else may need to help me out. I'm going to try to work my way backwards and see where it gets me.

Have you checked the underlying distances and/or Adonis? It's possible that batch is still the largest effect in the data, but explains a smaller portion of the variation. As I said above, the hypervariable region is still a complicated problem that hasn't been solved.

There is a difference between sepp-refs-gg-13-8.qza and gg-13-8-99-515-806-nb-classifier.qza. The most salient one here is that you're not working with the 515-806 region. So, you want the full-length classifier; I think it's available on the resources page.

Best,
Justine
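
The point about batch being "still the largest effect but explaining a smaller portion of the variation" can be made concrete with the R² that Adonis-style tests report. The sketch below only computes the sum-of-squares partition for a single grouping factor (no permutation test), and the distance values are made up for illustration.

```python
from itertools import combinations


def permanova_r2(dist, groups):
    """R^2 = 1 - SS_within / SS_total, computed from pairwise distances.

    dist maps frozenset({a, b}) -> distance; groups maps sample -> label.
    """
    samples = sorted(groups)
    n = len(samples)
    ss_total = sum(dist[frozenset(p)] ** 2 for p in combinations(samples, 2)) / n
    ss_within = 0.0
    for label in set(groups.values()):
        members = [s for s in samples if groups[s] == label]
        ss_within += sum(
            dist[frozenset(p)] ** 2 for p in combinations(members, 2)
        ) / len(members)
    return 1.0 - ss_within / ss_total


def make_dist(groups, within, between):
    """Toy distance matrix: one value within a batch, another between batches."""
    return {
        frozenset((a, b)): (within if groups[a] == groups[b] else between)
        for a, b in combinations(sorted(groups), 2)
    }


# Hypothetical samples from two batches.
groups = {"s1": "batch_A", "s2": "batch_A", "s3": "batch_B", "s4": "batch_B"}

# Made-up distances: between-batch distances shrink after changing the pipeline.
r2_before = permanova_r2(make_dist(groups, within=0.3, between=1.0), groups)
r2_after = permanova_r2(make_dist(groups, within=0.3, between=0.5), groups)
print(round(r2_before, 3), round(r2_after, 3))
```

In this toy case batch still separates the samples after the change, but its R² drops, which is the kind of improvement an ordination plot alone may not show.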

Moon · December 2, 2020, 1:11am

Oh! Dear Justine, your answers really help me a lot!

jwdebelius · December 2, 2020, 5:31pm

A post was split to a new topic: Table Filtering with Phylogeny
