Report for total ASVs after denoising and mitochondria and chloroplast filtering

lilycrook · November 15, 2020, 11:24am

I have been analysing my 16S rRNA paired-end sequences from four different datasets of cucumber microbiome, and I am now writing my report. I want to give a summary of the total raw paired reads, and the sequences after the denoising step and after filtering of chloroplasts and mitochondria. Is there a way I can obtain these numbers?

Would you mind telling me what the "number of features" and "total frequency" (from the figure above) mean?

Also, how can I produce a table like the one below that also shows the counts after filtering out mitochondria and chloroplasts, please?

Cheers,

Lily

vheidrich · November 16, 2020, 12:52am

Hi,

Yes, just use the command qiime feature-table summarize with your final filtered dataset.

The "total frequency" is the total number of counts/reads of your dataset. The "number of features" is the number of different ASVs that were found in your dataset. In other words, your dataset has 691,800 reads spanning 645 different sequences.
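To make the two terms concrete, here is a minimal sketch on a toy table. The ASV IDs, sample IDs, and counts are invented for illustration, and the nested-dict layout is only an assumption for the example, not how QIIME 2 stores a feature table.

```python
# Toy feature table: rows are features (ASVs), columns are samples.
# All IDs and counts are invented for illustration.
table = {
    "ASV_1": {"S1": 10, "S2": 0, "S3": 5},
    "ASV_2": {"S1": 3, "S2": 7, "S3": 0},
    "ASV_3": {"S1": 0, "S2": 2, "S3": 1},
}


def summarize(table):
    """Return (number of features, total frequency) for a nested-dict table."""
    # A feature is counted only if it was observed at least once.
    n_features = sum(1 for counts in table.values() if sum(counts.values()) > 0)
    # Total frequency is the sum of every count in the table.
    total_frequency = sum(sum(counts.values()) for counts in table.values())
    return n_features, total_frequency


n_features, total_frequency = summarize(table)
print(n_features, total_frequency)
```

Here the table has 3 features and a total frequency of 28, in the same sense that the real dataset has 645 features and a total frequency of 691,800.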

Maybe there is a clever way to do it, but I would just manually (in Excel or R) take the data from the "Interactive sample detail" tab of the qiime feature-table summarize output mentioned above and append it to the table you are showing.

Cheers,

ChrisKeefe · November 16, 2020, 6:58pm

Congratulations on finishing your analysis, @lilycrook!
@vheidrich's answer covers your questions pretty well - I just have a couple more breadcrumbs for you. Because most people only use the DADA2 denoising stats for diagnostic work (rather than publications), there aren't links at this time to export most of the tables in that visualization.

To get the denoising stats data without a lot of copy-pasting, you can use qiime tools export. This will, by default, export a .tsv. E.g.

qiime tools export \
  --input-path my_dada2_stats_file.qza \
  --output-path some_directory_name

There are a couple ways you could tackle getting the per-sample frequencies from your table. If you need a programmatic solution, there are directions in other forum posts on how to export your feature table and make it into a .tsv. You'll still have to do some work with the table to sum the frequencies across features. Python, R, whatever will do this for you in a reproducible way.
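As a rough sketch of the "sum the frequencies across features" step in Python: the code below assumes the feature table has already been exported and converted to a TSV with one row per feature and one column per sample, with a leading comment line and a header starting with `#OTU ID` (the layout I would expect from `biom convert --to-tsv`, but check your own file). The function name and the example data are made up.

```python
import csv
import io


def per_sample_totals(tsv_text):
    """Sum counts over all features for each sample column."""
    # Skip comment lines such as "# Constructed from biom file", but keep
    # the header line, which is assumed to start with "#OTU ID".
    lines = [
        line
        for line in tsv_text.splitlines()
        if line.strip()
        and (not line.startswith("#") or line.startswith("#OTU ID"))
    ]
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # first column holds the feature IDs
    totals = {sample_id: 0.0 for sample_id in sample_ids}
    for row in reader:
        for sample_id, value in zip(sample_ids, row[1:]):
            totals[sample_id] += float(value)
    return totals


# Invented example data in the assumed layout.
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tS1\tS2\n"
    "ASV_1\t10.0\t0.0\n"
    "ASV_2\t3.0\t7.0\n"
)
print(per_sample_totals(example))
```

For a real file you would read its contents with `open(path).read()` and pass the text in. Running the same function on the unfiltered and the filtered table gives the two per-sample columns needed for the report.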

If you don't need a programmatic approach, you can just copy-paste the Sample and Feature Count columns from the interactive sample detail page into a spreadsheet - that could be your already-exported dada2-stats, or a new .tsv. If you like Excel, you can use an IF formula to match sample ids. If you go this route, make sure your sample-ids are always stored as plain text. If you paste them into a number-formatted column in Excel, leading zeros will be dropped and your sample-ids may no longer match each other or the rest of your data.
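If you want a quick check that the IDs in two tables still line up after pasting, something like this sketch would do it. The helper name and the example IDs are hypothetical; the point is only that IDs are compared as text, so "001" and 1 are reported as different.

```python
def unmatched_ids(ids_a, ids_b):
    """Return the IDs found in only one of the two tables, compared as text."""
    a = {str(i).strip() for i in ids_a}
    b = {str(i).strip() for i in ids_b}
    return sorted(a - b), sorted(b - a)


# Invented example: the second list lost its leading zeros in a spreadsheet.
stats_ids = ["001", "002", "010"]
pasted_ids = [1, 2, 10]
print(unmatched_ids(stats_ids, pasted_ids))
```

Two empty lists mean every sample ID appears in both tables; anything else is worth fixing before merging.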

Good luck!
Chris

thermokarst · November 16, 2020, 7:51pm

This can be simplified by downloading a CSV of the raw counts from the "Overview" tab of that same visualization:

After filtering out chloroplasts and mitochondria, you can generate a feature-table summarize viz (as @timanix mentioned above), and extract the "frequency per sample detail" CSV, as shown above in the screenshot I shared.

So, stitching it all together:

1. Export your DADA2 denoising stats (as @ChrisKeefe showed you, above)
2. Export your feature table "frequency per sample detail" CSV on the unfiltered table (as I showed above)
3. Filter chloroplasts and mitochondria (see link above)
4. Export your feature table "frequency per sample detail" CSV on the filtered table (as I showed above)
5. Merge
   a. Merge all of these tables manually (using a spreadsheet tool, for example)
   b. If you want to generate a visualization of these results, format as TSVs and run metadata tabulate, specifying all of the TSV files (the merging will automatically handle matching the IDs)
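If you would rather script the manual merge than do it in a spreadsheet, the steps above can be sketched in Python roughly as follows. This assumes each exported file has already been read into a dict keyed by sample ID; the function name, the output column names, and all numbers are invented, and the `input` and `non-chimeric` keys are assumed to match the column names in your DADA2 stats file.

```python
def merge_counts(dada2_stats, unfiltered, filtered):
    """Join three per-sample mappings into report rows keyed by sample ID."""
    rows = []
    for sample_id in sorted(dada2_stats):
        stats = dada2_stats[sample_id]
        rows.append({
            "sample-id": sample_id,
            "input": stats["input"],
            "non-chimeric": stats["non-chimeric"],
            # .get() leaves None for samples missing from a table,
            # e.g. samples that lost all reads at some step.
            "feature-table": unfiltered.get(sample_id),
            "after-organelle-filter": filtered.get(sample_id),
        })
    return rows


# Invented example data; sample IDs are kept as text.
dada2_stats = {
    "S1": {"input": 1000, "non-chimeric": 800},
    "S2": {"input": 500, "non-chimeric": 300},
}
unfiltered = {"S1": 800, "S2": 300}
filtered = {"S1": 650}

for row in merge_counts(dada2_stats, unfiltered, filtered):
    print(row)
```

The resulting rows can be written out with `csv.DictWriter` to get a single report table with one line per sample.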
