q2-dada2 --p-pooling-method TRUE for a large number of samples, then moving to R


I have many samples (~650) sequenced on the PacBio platform. In RStudio, the dada() function cannot run because it exceeds the available memory: the samples total 30 GB, and I need to use the pool=TRUE parameter. So I thought about splitting the workflow between QIIME 2 and RStudio. The first part, up to qiime dada2 denoise-ccs --i-demultiplexed-seqs ./samples.qza, would run in QIIME 2; I would then export the output .qza files and convert them to .rds to complete the remaining steps, up to the phyloseq object, in RStudio.
My questions are:
1- Can I use pool = TRUE in QIIME 2 as follows:

qiime dada2 denoise-ccs --i-demultiplexed-seqs ./samples.qza \
 --o-table dada2-ccs_table.qza \
 --o-representative-sequences dada2-ccs_rep.qza \
 --o-denoising-stats dada2-ccs_stats.qza \
 --p-min-len 1300 --p-max-len 1600 \
 --p-pooling-method TRUE \
 --p-n-threads 8

2- How can I convert the ASV table and representative-sequence .qza files into .rds, so that I can assign taxonomy, build the tree, and eventually create a phyloseq object in R (I already have the R code but need to convert the files)? Should I export the files in BIOM format, convert them to CSV, and then to .rds to complete the DADA2 R script?
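One way to get the files out of QIIME 2 without going through CSV is sketched below, assuming the artifact names from the command above. qiime tools export writes a FeatureTable[Frequency] artifact as feature-table.biom and a FeatureData[Sequence] artifact as dna-sequences.fasta, and biom convert --to-tsv turns the BIOM file into a tab-separated table. The export_for_r function name and the output folder layout are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# export_for_r (hypothetical helper): export the DADA2 outputs to plain files.
#   $1 feature table .qza, $2 representative sequences .qza, $3 output folder
export_for_r() {
  local table_qza=$1 rep_qza=$2 outdir=$3
  # FeatureTable[Frequency] -> <outdir>/table/feature-table.biom
  qiime tools export --input-path "$table_qza" --output-path "$outdir/table"
  # FeatureData[Sequence] -> <outdir>/rep-seqs/dna-sequences.fasta
  qiime tools export --input-path "$rep_qza" --output-path "$outdir/rep-seqs"
  # BIOM -> tab-separated text that R can read with read.delim()
  biom convert -i "$outdir/table/feature-table.biom" \
    -o "$outdir/table/feature-table.tsv" --to-tsv
}
```

Usage: export_for_r dada2-ccs_table.qza dada2-ccs_rep.qza exported. In R, the TSV can then be read with read.delim() (skipping the first "# Constructed from biom file" line, with row.names = 1 and check.names = FALSE), the FASTA with Biostrings::readDNAStringSet(), and each object saved with saveRDS(). Two caveats: QIIME 2 names features by the MD5 hash of the sequence by default, so the FASTA is needed to map IDs back to sequences; and the exported table is features by samples, the transpose of the samples-by-sequences seqtab that the dada2 R functions produce. Alternatively, the third-party qiime2R package provides read_qza() to load .qza files directly in R.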


Hi @Eman,

No, that is not quite correct. There are two choices for --p-pooling-method: independent and pseudo. Below are the docs for the --p-pooling-method parameter. Can you explain why you need pooling? Does pseudo-pooling sound like what you need?

The method used to pool samples for denoising.
"independent": Samples are denoised independently.
"pseudo": The pseudo-pooling method is used to
approximate pooling of samples. In short, samples are
denoised independently once, ASVs detected in at
least 2 samples are recorded, and samples are
denoised independently a second time, but this time
with prior knowledge of the recorded ASVs and thus
higher sensitivity to those ASVs.
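Applied to the question's command, this means passing pseudo (or independent) rather than TRUE, since those are the only two documented values. A minimal sketch under stated assumptions: the command is wrapped in a hypothetical run_denoise_ccs function that rejects any other value, and its primer argument feeds --p-front, which denoise-ccs requires but the question's snippet omits. The length and thread settings are copied from the question.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_denoise_ccs (hypothetical wrapper):
#   $1 demultiplexed .qza, $2 5' primer for --p-front,
#   $3 pooling method, defaulting to "pseudo"
run_denoise_ccs() {
  local seqs=$1 front=$2 pooling=${3:-pseudo}
  # Only the two documented choices are accepted; TRUE is not one of them.
  case "$pooling" in
    independent|pseudo) ;;
    *) echo "pooling must be 'independent' or 'pseudo', got '$pooling'" >&2
       return 2 ;;
  esac
  qiime dada2 denoise-ccs \
    --i-demultiplexed-seqs "$seqs" \
    --p-front "$front" \
    --p-min-len 1300 --p-max-len 1600 \
    --p-pooling-method "$pooling" \
    --p-n-threads 8 \
    --o-table dada2-ccs_table.qza \
    --o-representative-sequences dada2-ccs_rep.qza \
    --o-denoising-stats dada2-ccs_stats.qza
}
```

Usage: run_denoise_ccs ./samples.qza "$FWD_PRIMER", where FWD_PRIMER is a placeholder for the 5' primer actually used for the amplicons.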

This seems like a reasonable approach to me!

Can you explain why you need to use the pooling-method?
I tested both pool = TRUE and pseudo-pooling on different subsets of samples and found small differences in the resulting taxonomy. These samples were dissected from different tissues and genotypes, and tracking vertical microbiome transmission is the main goal of the study, so any additional sequences shared across tissues would be meaningful for the analyses.
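To put a number on the differences between a pooled and a pseudo-pooled run, one option is to compare the ASV sequences each run produced. The sketch below assumes both runs' representative sequences are available as FASTA files (for example dna-sequences.fasta from qiime tools export, or a FASTA written out from R). It compares sequences rather than feature IDs, because QIIME 2 uses MD5 hashes as IDs while dada2 in R uses the sequences themselves as names. fasta_seqs and compare_asvs are hypothetical helper names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# fasta_seqs (hypothetical): print each FASTA record's sequence on one line,
# uppercased, then sort and deduplicate (comm needs sorted input).
fasta_seqs() {
  awk '/^>/ { if (seq != "") print seq; seq = ""; next }
       { seq = seq toupper($0) }
       END { if (seq != "") print seq }' "$1" | LC_ALL=C sort -u
}

# compare_asvs (hypothetical): count ASV sequences shared by two FASTA files
# and those found only in the first or only in the second.
compare_asvs() {
  local a b
  a=$(mktemp)
  b=$(mktemp)
  fasta_seqs "$1" > "$a"
  fasta_seqs "$2" > "$b"
  printf 'shared\t%s\n'      "$(LC_ALL=C comm -12 "$a" "$b" | awk 'END { print NR }')"
  printf 'only_first\t%s\n'  "$(LC_ALL=C comm -23 "$a" "$b" | awk 'END { print NR }')"
  printf 'only_second\t%s\n' "$(LC_ALL=C comm -13 "$a" "$b" | awk 'END { print NR }')"
  rm -f "$a" "$b"
}
```

Usage: compare_asvs pooled.fasta pseudo.fasta (hypothetical file names) prints three tab-separated counts; the only_first and only_second rows are the ASVs that one pooling strategy recovered and the other did not.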

