Preparing data for analysis with dada2 denoising


I am trying to optimize my protocol for microbiome analysis using QIIME 2 2018.8.
I have data from Illumina MiSeq PE 2x250 sequencing, in FASTQ format.
I imported the reads in Casava format. Next, I denoised them with the DADA2 pipeline. I also wanted to filter the resulting feature table using:

qiime feature-table filter-features \
--i-table table-dada2-LB17_16.qza \
--p-min-frequency 100 \
--p-min-samples 10 \
--o-filtered-table table-dada2-LB17_16-filtered.qza

Is that correct? Can I do so? What are the optimal rules for min frequency and min samples? I guess it all depends on samples but was my choice here correct?

table-dada2-LB17_16.qza (143.0 KB)

I then used the resulting feature table and feature data in a differential abundance analysis with gneiss. My aim was to generate a proper heatmap.

I have several questions concerning the pipeline:

  1. The error:
    qiime gneiss ols-regression \
    --p-formula "Group+Sex+Description" \
    --i-table balanses-LB17_16-filtered.qza \
    --i-tree hierarchy-LB17_16-filtered.qza \
    --m-metadata-file metadata-LB17_16.tsv \
    --o-visualization regression_summary-LB17_16-filtered.qzv
    Plugin error from gneiss:

cannot convert float NaN to integer

Debug info has been saved to /tmp/qiime2-q2cli-err-2gou3btt.log

Metadata is here:

metadata-LB17_16.tsv (3.3 KB)

Yet when I reduce --p-formula to a single term, the command runs, but the outcome is strange.
Is my metadata the problem? I tried to format the file according to the rules in the tutorial.

  2. The heatmap

I can generate it with this command:
qiime gneiss dendrogram-heatmap \
--i-table composition-LB17-16-filtered.qza \
--i-tree hierarchy-LB17_16-filtered.qza \
--m-metadata-file metadata-LB17_16.tsv \
--m-metadata-column Description \
--p-color-map seismic \
--o-visualization heatmap-LB17_16.qzv

I would like to modify it slightly, e.g. elongate it or correct the legend. How can I do that? Can I do it in QIIME 2, or should I use external software? Can you suggest something?

I would be very grateful for your support.


It looks like every sample has a unique Description. That by itself is probably causing the error, since you won’t be able to calculate variance at all in the regression. So I’d remove that column from the formula.
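
One way to check this before running the regression is to count how many distinct values each metadata column has. The sketch below is only an illustration: `column_levels` is a hypothetical helper (not part of QIIME 2), and the toy file stands in for metadata-LB17_16.tsv, whose real contents are not shown here. It assumes a tab-separated file whose first line is the header and whose other `#` lines are comments or directives.

```shell
# Hypothetical helper: count rows and distinct values for one named column
# of a tab-separated metadata file. Usage: column_levels FILE COLUMN
column_levels() {
  awk -F'\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }  # locate column in header
    /^#/ || c == 0 || $c == "" { next }   # skip comment rows, unknown column, blank cells
    { rows++; if (!($c in seen)) { seen[$c] = 1; levels++ } }
    END { printf "%s: %d rows, %d distinct values\n", col, rows, levels }
  ' "$1"
}

# Toy metadata (made up for this example): 4 samples, 2 groups, 2 sexes,
# and a Description that is different for every sample.
demo=$(mktemp)
printf '#SampleID\tGroup\tSex\tDescription\n' > "$demo"
printf 's1\tA\tF\td1\ns2\tA\tM\td2\ns3\tB\tF\td3\ns4\tB\tM\td4\n' >> "$demo"

column_levels "$demo" Description   # Description: 4 rows, 4 distinct values
column_levels "$demo" Group         # Group: 4 rows, 2 distinct values
```

If a column reports as many distinct values as rows, as Description does in the toy file, it is a candidate to drop from --p-formula.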

You have 2 categories in Group and 2 categories in Sex. Ideally, you would only keep microbes that appear 3 times in each of these groups. I chose 3 because that is the bare minimum number of points you need to fit a reliable line. A quick and dirty solution is to filter out microbes that appear in fewer than 12 samples (3 samples x 2 Group categories x 2 Sex categories).
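
The arithmetic behind that threshold, and how it would feed into the filter command from the question, can be sketched as follows. This is the quick total-count filter only; it does not verify that a microbe really appears 3 times within each Group/Sex combination. The variable names are made up for illustration, the file names are the ones from the question, and the `echo` prefix makes it a dry run that just prints the command.

```shell
min_per_cell=3    # minimum observations wanted per combination
group_levels=2    # categories in Group
sex_levels=2      # categories in Sex

min_samples=$(( min_per_cell * group_levels * sex_levels ))
echo "min samples: $min_samples"   # min samples: 12

# Dry run: prints the command instead of executing it.
# Remove the leading "echo" to run it in an activated QIIME 2 environment.
echo qiime feature-table filter-features \
  --i-table table-dada2-LB17_16.qza \
  --p-min-samples "$min_samples" \
  --o-filtered-table table-dada2-LB17_16-filtered.qza
```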

While the filtering criteria may seem arbitrary, there is a well-defined bare minimum threshold that you can set, since you will not have the resolution to analyze microbes observed less often than that.

Regarding samples, I typically default to a minimum of roughly 1000 reads per sample, because samples below that likely failed during sequencing.
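
A per-sample read cutoff like this could be applied with `qiime feature-table filter-samples`, where `--p-min-frequency` is the minimum total frequency a sample needs in order to be kept. The sketch below is a dry run under assumptions: the input is the table from the question, the output file name is hypothetical, and 1000 is only the rough default mentioned above.

```shell
min_reads=1000   # per-sample cutoff; adjust to your own sequencing depth

# Dry run: prints the command instead of executing it.
# Remove the leading "echo" to run it in an activated QIIME 2 environment.
# The output file name below is a placeholder chosen for this example.
echo qiime feature-table filter-samples \
  --i-table table-dada2-LB17_16.qza \
  --p-min-frequency "$min_reads" \
  --o-filtered-table table-dada2-LB17_16-min1000.qza
```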
