Creating a file with the same sampling depth from DADA2 output

Hi all,

Bit of a unique query. I am analysing functional gene sequences. However, I want to analyse the amino acid sequences, not the nucleotide sequences produced by QIIME 2. Therefore, I will export the representative sequences and frequency table after filtering through the DADA2 pipeline. Because of this, the sampling depth is not consistent across samples. Is there a way to impose a sampling depth on the frequency table and export the file from QIIME 2 with the same sampling depth for all samples? I hope I have articulated my question clearly.

Thank you

Hi @chizel,

If I understand correctly, you're using a marker gene that isn't 16S rRNA, and then you'll take the rep-seqs and translate the nucleic acid sequences into amino acids. (Presumably with plans for awesome phylogeny, etc. downstream.) If you wanted to analyze the amino acid sequences in QIIME 2, there are totally ways to solve that problem. But let's work through this issue first.

Yes, there are a lot of ways to solve this problem, if it's the problem you need to solve.

  1. Rarefaction (q2-feature-table rarefy). Subsample the data without replacement to a common depth. Generally considered appropriate for alpha diversity, inappropriate for differential abundance, and there are mixed reviews for beta diversity, mostly depending on whether your approach is compositional or not. (There's a fairly comprehensive literature on the issue, which people have been arguing about for about a decade.) Rarefaction tends to increase the sparsity in the data, which is a problem for compositional work with a pseudocount. (Since you're working with sequencing data, I'm going to assume it's compositional and you don't have a way to weight by biomass.)

  2. Relative abundance. (Everything sums to 1! You don't lose rare features!)

  3. Total Sum Scaling. Take the relative abundance and scale it to a common depth. This isn't implemented in QIIME 2, but could be done relatively easily in a program like Excel or a programming language that works with arrays.
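
To make the three options concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the table layout (a dict of per-sample dicts), the function names, the feature IDs, and the counts. It is not how QIIME 2 implements rarefaction; it just shows what each transformation does to a small table.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample one sample's counts without replacement to `depth` reads.

    Returns None when the sample has fewer than `depth` reads, mirroring
    the usual practice of dropping samples that are too shallow.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand to one entry per read, then draw `depth` reads without replacement.
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    drawn = Counter(random.Random(seed).sample(reads, depth))
    # Keep every original feature, including those that dropped to zero.
    return {feature: drawn.get(feature, 0) for feature in counts}


def relative_abundance(counts):
    """Divide each count by the sample total so the sample sums to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalise a sample with zero reads")
    return {feature: n / total for feature, n in counts.items()}


def total_sum_scale(counts, depth):
    """Scale relative abundances to a common depth (values may be fractional)."""
    return {f: frac * depth for f, frac in relative_abundance(counts).items()}


# Hypothetical feature table: sample -> feature -> read count.
table = {
    "sample_a": {"asv1": 50, "asv2": 30, "asv3": 20},
    "sample_b": {"asv1": 900, "asv2": 90, "asv3": 10},
}
depth = 100

rarefied = {s: rarefy(c, depth) for s, c in table.items()}
relative = {s: relative_abundance(c) for s, c in table.items()}
scaled = {s: total_sum_scale(c, depth) for s, c in table.items()}

for sample in table:
    print(sample, rarefied[sample], scaled[sample])
```

Note the trade-off this shows: rarefaction returns integer counts but is random and can zero out rare features, while total sum scaling keeps every feature but produces non-integer values.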

I think depending on the work you plan to do, you might be sacrificing information this way. I think of sequencing depth as telling me about the precision of my estimates and maybe whether or not you expect to see rare organisms.
