How to denoise, merge technical replicates, then remove chimeras

I have two MiSeq runs of 240 samples each that act as technical replicates. For the analysis, I want to denoise the runs separately, compare them, merge them, and finally remove chimeras. I'm using the dada2 plugin in QIIME 2 2019.4 from bash.

The only way I can think to do this involves using denoise-paired twice:

 > ## Denoising
 > # denoise run1
 > qiime dada2 denoise-paired \
 >   --i-demultiplexed-seqs $Outputs/paired_end_demux_run1.qza \
 >   --p-trim-left-f 32 \
 >   --p-trunc-len-f 260 \
 >   --p-trim-left-r 31 \
 >   --p-trunc-len-r 260 \
 >   --p-chimera-method none \
 >   --p-n-threads 0 \
 >   --o-representative-sequences $Outputs/rep_seqs_dada2_run1.qza \
 >   --o-table $Outputs/table_dada2_run1.qza \
 >   --o-denoising-stats $Outputs/stats-dada2_run1.qza \
 >   --verbose
 > 
 > # denoise run2
 > qiime dada2 denoise-paired \
 >   --i-demultiplexed-seqs $Outputs/paired_end_demux_run2.qza \
 >   --p-trim-left-f 32 \
 >   --p-trunc-len-f 260 \
 >   --p-trim-left-r 31 \
 >   --p-trunc-len-r 260 \
 >   --p-chimera-method none \
 >   --p-n-threads 0 \
 >   --o-representative-sequences $Outputs/rep_seqs_dada2_run2.qza \
 >   --o-table $Outputs/table_dada2_run2.qza \
 >   --o-denoising-stats $Outputs/stats-dada2_run2.qza \
 >   --verbose
 > 
 > # Summarising and comparing
 > qiime feature-table summarize \
 >   --i-table $Outputs/table_dada2_run1.qza \
 >   --o-visualization $Outputs/table_dada2_run1.qzv \
 >   --m-sample-metadata-file $Metadata_file
 > qiime feature-table summarize \
 >   --i-table $Outputs/table_dada2_run2.qza \
 >   --o-visualization $Outputs/table_dada2_run2.qzv \
 >   --m-sample-metadata-file $Metadata_file
 > 
 > qiime feature-table tabulate-seqs \
 >   --i-data $Outputs/rep_seqs_dada2_run1.qza \
 >   --o-visualization $Outputs/rep_seqs_dada2_run1.qzv
 > qiime feature-table tabulate-seqs \
 >   --i-data $Outputs/rep_seqs_dada2_run2.qza \
 >   --o-visualization $Outputs/rep_seqs_dada2_run2.qzv
 > 
 > qiime metadata tabulate \
 >   --m-input-file $Outputs/stats-dada2_run1.qza \
 >   --o-visualization $Outputs/stats-dada2_run1.qzv
 > qiime metadata tabulate \
 >   --m-input-file $Outputs/stats-dada2_run2.qza \
 >   --o-visualization $Outputs/stats-dada2_run2.qzv
 > 
 > qiime quality-control evaluate-composition \
 >   --i-expected-features table_dada2_run1.qza \
 >   --i-observed-features table_dada2_run2.qza \
 >   --p-depth 1 \
 >   --o-visualization $Outputs/run_comparison.qzv
 > 
 > ## Merging runs
 > qiime feature-table merge \
 >   --i-tables $Outputs/table_dada2_run1.qza \
 >   --i-tables $Outputs/table_dada2_run2.qza \
 >   --o-merged-table $Outputs/table_dada2_both.qza
 > 
 > qiime feature-table merge-seqs \
 >   --i-data $Outputs/rep_seqs_dada2_run1.qza \
 >   --i-data $Outputs/rep_seqs_dada2_run2.qza \
 >   --o-merged-data $Outputs/rep_seqs_dada2_both.qza

But then my merged data is of type FeatureData[Sequence] rather than the SampleData[PairedEndSequencesWithQuality] that denoise-paired needs as input, so I can't put it back into denoise-paired. Is there another way I can do this?

Thanks!

Hello Holly!

This sounds like a great idea. It's also the recommended way to run dada2!

From "A DADA2 workflow for Big Data":

DADA2 breaks this quadratic scaling by processing samples independently. This is possible because DADA2 infers exact sequence variants, and exact sequences are consistent labels that can be directly compared across separately processed samples.
(Emphasis mine)
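
To illustrate why exact sequences work as consistent labels: as far as I know, the dada2 plugin by default names each feature with an MD5 hash of its sequence (the hashed feature IDs option), so the same sequence should get the same ID regardless of which run it was inferred in. Here is a minimal sketch of that idea. The sequences are made up and `feature_id` is a hypothetical helper, not part of QIIME 2:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: MD5 of the bare sequence string.
# printf '%s' avoids the trailing newline that echo would add to the hash input.
feature_id() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Made-up sequences standing in for variants inferred in each run.
seq_run1="TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAG"
seq_run2="TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAG"   # same variant, other run
seq_other="TACGTAGGGTGCGAGCGTTAATCGGAATAACTGGGCGTAAAGGGCACGCAG"  # a different variant

id_run1=$(feature_id "$seq_run1")
id_run2=$(feature_id "$seq_run2")
id_other=$(feature_id "$seq_other")

echo "run1 variant:  $id_run1"
echo "run2 variant:  $id_run2"
echo "other variant: $id_other"

if [ "$id_run1" = "$id_run2" ]; then
  echo "Same sequence, same ID: the two runs can be merged on this label."
fi
```

This is also why feature IDs in the outputs look like 32-character strings of letters and numbers rather than anything resembling a sample name.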


But it sounds like you would like to do something different:

"compare them, merge them, and finally remove chimeras"

I'm not sure if there is an elegant way to do just chimera checking with the dada2 plugin... but you can do reference-based chimera checking using the vsearch plugin! You should be able to run that with the table_dada2_both.qza and rep_seqs_dada2_both.qza from your pipeline above.
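
Something along these lines might work as a starting point. Treat it as a sketch, not a tested recipe: the reference artifact path and all output file names are hypothetical, and the `run` wrapper is only there so the script prints the commands and skips execution on a machine without qiime installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

Outputs="${Outputs:-.}"   # same variable as in the pipeline above
# ASSUMPTION: a FeatureData[Sequence] reference artifact; this path is hypothetical.
Reference="${Reference:-$Outputs/reference_seqs.qza}"

cmd_log=""
# Record and print each command; execute it only if qiime is on the PATH.
run() {
  cmd_log+="$*"$'\n'
  printf '+ %s\n' "$*"
  if command -v qiime >/dev/null 2>&1; then
    "$@"
  fi
}

# 1. Flag chimeric features in the merged data against the reference.
run qiime vsearch uchime-ref \
  --i-sequences "$Outputs/rep_seqs_dada2_both.qza" \
  --i-table "$Outputs/table_dada2_both.qza" \
  --i-reference-sequences "$Reference" \
  --o-chimeras "$Outputs/chimeras.qza" \
  --o-nonchimeras "$Outputs/nonchimeras.qza" \
  --o-stats "$Outputs/uchime_stats.qza"

# 2. Keep only the non-chimeric features in the merged table...
run qiime feature-table filter-features \
  --i-table "$Outputs/table_dada2_both.qza" \
  --m-metadata-file "$Outputs/nonchimeras.qza" \
  --o-filtered-table "$Outputs/table_dada2_both_nonchimeric.qza"

# 3. ...and in the merged representative sequences.
run qiime feature-table filter-seqs \
  --i-data "$Outputs/rep_seqs_dada2_both.qza" \
  --m-metadata-file "$Outputs/nonchimeras.qza" \
  --o-filtered-data "$Outputs/rep_seqs_dada2_both_nonchimeric.qza"
```

Note that uchime-ref takes the table and the sequences together, so the two merged artifacts need to describe the same set of features.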

Let us know if that works for you!

Colin

Thanks Colin. I really need to use DADA2, as this work will be published with comparisons to papers that also use DADA2. The older vsearch plugin won't be as useful, and my supervisor wasn't keen when I spoke to them.

(I did try it regardless and got an error about IDs:

Some feature ids are present in table, but not in sequences. The set of features in sequences must be identical to the set of features in table. Feature ids present in table but not sequences are:

followed by a list of strings of letters and numbers that don't appear in my sample IDs, but possibly relate to the raw output from the MiSeq runs?)

Hello Holly,

I think you are on the right track here. Which command did you run that gave you the "Some feature ids are present" error? Maybe the qiime feature-table merge or the qiime feature-table merge-seqs command?

I know dada2 can do exactly what you are describing; we just need to figure out this error.
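
While tracking it down, it may help to see exactly which IDs differ. The error is a comparison between two sets of feature IDs, and here is a minimal sketch of that comparison. It assumes the feature IDs from the table and from the sequences have already been written to two plain-text files, one ID per line; the file names and toy IDs below are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # keep sort and comm on the same collation order

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Toy ID lists (made up). Real ones would be the feature IDs of the
# merged table and of the merged representative sequences.
printf '%s\n' feat_a feat_b feat_c feat_d > "$workdir/table_ids.txt"
printf '%s\n' feat_a feat_c feat_e        > "$workdir/seq_ids.txt"

# comm needs sorted input; -u also drops duplicate IDs.
sort -u "$workdir/table_ids.txt" > "$workdir/table_ids.sorted"
sort -u "$workdir/seq_ids.txt"   > "$workdir/seq_ids.sorted"

# IDs only in the table (what the error message lists).
missing=$(comm -23 "$workdir/table_ids.sorted" "$workdir/seq_ids.sorted")
# IDs only in the sequences.
extra=$(comm -13 "$workdir/table_ids.sorted" "$workdir/seq_ids.sorted")

echo "In table but not in sequences:"
echo "$missing"
echo "In sequences but not in table:"
echo "$extra"
```

If the first list is non-empty, the table and the sequences passed to the command did not come from the same set of features.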

Colin
