How to find most abundant phyla and family in different groups of samples?

danielsebas · October 22, 2019, 3:04pm

Dear Friends,

I have 4 sets of water samples. Each set contains 10 water samples (one per water treatment stage), so 40 samples in total.

I have performed Qiime analysis on each set to find out the microbial composition.
Now, what I am trying to do is:

For every stage sample across the 4 sets, find the most abundant phyla and family. Which means:

Sample set 1    Sample set 2    Sample set 3        Sample set 4
stage 1         stage 1         same format here    same format here
stage 2         stage 2
stage 3         stage 3
.               .
.               .
.               .
stage 10        stage 10

I am looking to find the most abundant phyla and families in each stage across the sample sets, for instance the most abundant phyla and families in "stage 1" across all sample sets. Can you please let me know how I can do this, and what statistical analysis I can perform here? Thanks!

colinbrislawn · October 22, 2019, 5:30pm

Hi!

I'm not sure there is a really clean way to do this using Qiime2, but I would look at the qiime feature-table group command. This is another command in the feature-table plugin.

If you grouped by the SetNumber column (replace this with your real column name), you would end up with 4 samples, which you could then summarize at the phylum level to get the most common phyla.
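
As a rough illustration of that group-then-summarize idea outside of Qiime 2, here is a minimal Python sketch. It is not a Qiime 2 command: the sample IDs, counts, and group labels are made up, the helper names `group_counts` and `top_taxa` are hypothetical, and it assumes the phylum-level counts have already been exported into plain dictionaries.

```python
from collections import defaultdict

# Hypothetical phylum-level counts per sample (made-up numbers).
counts = {
    "set1_stage1": {"Proteobacteria": 120, "Firmicutes": 30, "Bacteroidetes": 50},
    "set1_stage2": {"Proteobacteria": 80, "Firmicutes": 60, "Bacteroidetes": 10},
    "set2_stage1": {"Proteobacteria": 40, "Firmicutes": 90, "Bacteroidetes": 20},
    "set2_stage2": {"Proteobacteria": 30, "Firmicutes": 70, "Bacteroidetes": 5},
}

# Hypothetical metadata column (like SetNumber): the group of each sample.
set_number = {
    "set1_stage1": "set1", "set1_stage2": "set1",
    "set2_stage1": "set2", "set2_stage2": "set2",
}


def group_counts(counts, groups):
    """Sum taxon counts over all samples that share a group label."""
    grouped = defaultdict(lambda: defaultdict(int))
    for sample, taxa in counts.items():
        for taxon, n in taxa.items():
            grouped[groups[sample]][taxon] += n
    return {group: dict(taxa) for group, taxa in grouped.items()}


def top_taxa(taxa, k=2):
    """Return the k most abundant taxa as (name, relative abundance) pairs."""
    total = sum(taxa.values())
    ranked = sorted(taxa.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, n / total) for name, n in ranked[:k]]


grouped = group_counts(counts, set_number)
for group in sorted(grouped):
    print(group, top_taxa(grouped[group]))
```

Swapping the group labels for a stage column (one label per stage) would rank phyla per stage across the sets instead of per set.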

I've noticed you have been asking questions about late stage analysis and stats in the forums lately, and I think that's really cool. When I get to late stage analysis, I do most of my work in R or Python, because they are more flexible than pipelines like Qiime 2. I'm not sure which scripting languages you have used before, but I think now is a great time to learn!

Colin

danielsebas · October 23, 2019, 3:38pm

Thanks @colinbrislawn. I am a little unclear, so I would really appreciate your further suggestions.

I have performed QIIME analysis on these 4 sample sets separately. The sample sets are:

Sample set 1: DNA single end (10 samples)
Sample set 2: DNA paired end (10 samples)
Sample set 3: cDNA single end (10 samples)
Sample set 4: cDNA paired end (10 samples)

Since I have analyzed these separately, could you please let me know how I can analyze them together using Qiime to find the most abundant phyla/families across the stages in each sample set? Thanks much for your response!

colinbrislawn · October 23, 2019, 4:27pm

Ah OK!

I didn't understand your study design, so I was getting confused. It sounds like these 4 sample sets represent a 2x2 block with single vs paired and DNA vs cDNA. Is that correct?

Are the 10 samples from each sample set all the same?

Are the 10 samples 10 different stages?

Colin

danielsebas · October 24, 2019, 9:13am

Hi @colinbrislawn.

> sounds like these 4 sample sets represent a 2x2 block with single vs paired and DNA vs cDNA. Is that correct?

Yes

> Are the 10 samples from each sample set all the same?

These are 10 stages in each sample set, so the sequenced data is different (DNA and cDNA), but the stages are the same.

> Are the 10 samples 10 different stages?

Yes

Please let me know how you think this analysis can be achieved. Thanks much!

colinbrislawn · October 24, 2019, 1:49pm

Good morning Daniel,

OK! Thanks for answering all my questions! Let's dive in!

Great! Are these 10 independent categories (like 10 drug types) or in order in a continuous series (like 10 timepoints or 10 pH levels)?
There are different stat tests you would use depending on whether these are categories or continuous.
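
To make that distinction concrete: if the stages are ordered, one possible choice (an assumption for illustration, not the only valid test) is a rank correlation between stage number and a taxon's relative abundance. The pure-Python sketch below computes Spearman's rho on made-up abundances; `average_ranks`, `pearson`, and `spearman_rho` are hypothetical helper names, and no p-value is computed. For unordered categories a test that ignores ordering, such as Kruskal-Wallis, would be the more natural starting point.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


stages = list(range(1, 11))
# Made-up relative abundance of one phylum at each of the 10 stages.
abundance = [0.42, 0.40, 0.35, 0.36, 0.30, 0.22, 0.20, 0.15, 0.11, 0.08]

rho = spearman_rho(stages, abundance)
print("Spearman rho:", round(rho, 3))
```

A rho near -1 here would mean the phylum steadily declines through the treatment stages; a value near 0 would mean no monotonic trend.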

Do you know if you want to use DNA or RNA for this analysis? Do you plan on picking one or using both to show contrasting stories?

Colin

danielsebas · October 24, 2019, 3:50pm

Thank you @colinbrislawn:

> Are these 10 independent categories (like 10 drug types) or in order in a continuous series (like 10 timepoints or 10 pH levels)?

These are 10 continuous stages. These are ten wastewater samples from a treatment plant, i.e. wastewater going through 10 stages of disinfection treatment. Please let me know if this answers your question.

> Do you know if you want to use DNA or RNA for this analysis? Do you plan on picking one or using both to show contrasting stories?

Picking both DNA and cDNA to show contrasting stories in terms of different microbial phyla and family abundance in each stage of each sample set (as explained in my previous message). Thanks much

colinbrislawn · October 24, 2019, 4:00pm

OK Great! This sounds like a very interesting study!

If you analyzed your data using DADA2, you can merge your feature abundance tables using this plugin:
https://docs.qiime2.org/2019.7/plugins/available/feature-table/merge/

Then you can make graphs with all your samples on a single graph!

One important question is whether the samples from these 4 analyses have the same names or different names. If the names are the same, they will merge during the feature-table merge command. If they are different but you do want to combine them for graphing, you can use feature-table group to merge your samples.
https://docs.qiime2.org/2019.7/plugins/available/feature-table/group/

Let me know what you find,
Colin

P.S. Sometimes paired-end reads and single-end reads don't mix too well. You might want to try just working with these data sets separately, then choose one to present.

danielsebas · October 29, 2019, 9:32am

Thanks @colinbrislawn. I am having difficulty in understanding how to input multiple .qza feature table files and metadata files for the features to be merged.

My feature tables and metadata files are as below:

DNA paired end:
ww-DNA_1to11_paired-end-demux-trimmed-dada2-rep-seqs-table2.qza (125.3 KB)
DNA_1to11_metadat-paired.tsv (1.2 KB)

DNA single end (V2 and V3 region only):
Ionexpress_1to11-dada2-rep-seqs-V2-table4-DNA-single.qza (101.1 KB)
Ionexpress_1to11-dada2-rep-seqs-V3-table4-DNA-single.qza (101.1 KB)
Ionexpress_1to11_metadat-singleDNA.tsv (1.3 KB)

cDNA paired end:
ww-cDNA_1to11_paired-end-demux-trimmed-dada2-rep-seqs-table.qza (105.5 KB)
cDNA_1to11_metadat-paired.tsv (1.2 KB)

cDNA single end:
Ionexpress_1to11-dada2-rep-seqs-table2-cDNA-single.qza (626.2 KB)
cDNA_1to11_metadat.tsv (1.2 KB)

I am looking to run the merge command like this:

qiime --i-table ww-cDNA_1to11_paired-end-demux-trimmed-dada2-rep-seqs-table.qza ww-DNA_1to11_paired-end-demux-trimmed-dada2-rep-seqs-table2.qza Ionexpress_1to11-dada2-rep-seqs-table2-cDNA-single.qza Ionexpress_1to11-dada2-rep-seqs-V2-table4-DNA-single.qza Ionexpress_1to11-dada2-rep-seqs-V3-table4-DNA-single.qza --p-axis feature --m-metadata-file cDNA_1to11_metadat-paired.tsv DNA_1to11_metadat-paired.tsv cDNA_1to11_metadat-single.tsv Ionexpress_1to11_metadat-singleDNA.tsv --m-metadata-column SampleID --p-mode sum --o-grouped-table grouped.qza --verbose

Could you please let me know if this is how the feature tables for DNA and cDNA can be merged together? Thank you for your time.

colinbrislawn · October 29, 2019, 1:35pm

Hello @danielsebas,

I'm glad you were able to import these data sets and are ready to merge.

I would try merging pairs of data sets first, instead of trying to merge all four at once. This will let you see how well different combinations work. Maybe cDNA single + cDNA paired works great, but cDNA single + DNA single does not. I'm not sure what will work best and I'm excited to see what works well for you.

If you get any errors while merging, I'm happy to help fix them.

Colin

danielsebas · October 29, 2019, 2:17pm

Thanks @colinbrislawn.
When merging cDNA paired and cDNA single, can I concatenate the two metadata files uploaded in my previous message and use the single concatenated metadata file for merging? Thanks much!

colinbrislawn · October 29, 2019, 2:36pm

Hello Daniel,

When I look at the docs for feature-table merge, I don't see the metadata getting passed to the plugin at all.

Once you have merged your tables, I would absolutely combine those sample metadata tables into one file so that Qiime2 knows they are from paired and single reads, respectively.
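
A minimal sketch of combining two metadata tables with Python's standard library is below. The file contents are made-up stand-ins held in strings, the `ReadType` column and the sample IDs are hypothetical, and the helpers `read_metadata` and `combine_metadata` are names invented for this example. It refuses duplicate sample IDs, because each ID should appear only once in a metadata file.

```python
import csv
import io

# Made-up stand-ins for two metadata files; real files would be read from disk.
paired_tsv = "SampleID\tStage\nww1cDNA_paired\t1\nww2cDNA_paired\t2\n"
single_tsv = "SampleID\tStage\nww1cDNA_single\t1\nww2cDNA_single\t2\n"


def read_metadata(text, read_type):
    """Parse a TSV and tag every row with a (hypothetical) ReadType column."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        row["ReadType"] = read_type
    return rows


def combine_metadata(*tables):
    """Concatenate metadata rows, refusing duplicate sample IDs."""
    combined, seen = [], set()
    for rows in tables:
        for row in rows:
            if row["SampleID"] in seen:
                raise ValueError("duplicate sample ID: " + row["SampleID"])
            seen.add(row["SampleID"])
            combined.append(row)
    return combined


combined = combine_metadata(
    read_metadata(paired_tsv, "paired"),
    read_metadata(single_tsv, "single"),
)

# Write the combined table back out as tab-separated text.
out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["SampleID", "Stage", "ReadType"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(combined)
print(out.getvalue())
```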

Colin

danielsebas · October 29, 2019, 2:58pm

Thanks @colinbrislawn. I did this to merge cDNA paired and single qza files:

qiime feature-table merge --i-tables ww-cDNA_1to11_paired-end-demux-trimmed-dada2-rep-seqs-table.qza Ionexpress_1to11-dada2-rep-seqs-table2-cDNA-single.qza --o-merged-table cDNA-paired-single-merged.qza --verbose

but got this error:

Traceback (most recent call last):
  File "/root/miniconda2/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_table/_merge.py", line 33, in merge
    return tables[0].concat(tables[1:], 'sample')
  File "/root/miniconda2/envs/qiime2-2019.4/lib/python3.6/site-packages/biom/table.py", line 3361, in concat
    raise DisjointIDError("IDs are not disjoint")
biom.exception.DisjointIDError: IDs are not disjoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda2/envs/qiime2-2019.4/lib/python3.6/site-packages/q2cli/commands.py", line 311, in __call__
    results = action(**arguments)
  File "</root/miniconda2/envs/qiime2-2019.4/lib/python3.6/site-packages/decorator.py:decorator-gen-317>", line 2, in merge
  File "/root/miniconda2/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/root/miniconda2/envs/qiime2-2019.4/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
    output_views = self._callable(**view_args)
  File "/root/miniconda2/envs/qiime2-2019.4/lib/python3.6/site-packages/q2_feature_table/_merge.py", line 37, in merge
    'provided tables: %s' % ', '.join(overlapping))
ValueError: Same samples are present in some of the provided tables: ww9cDNA, ww8cDNA, ww1cDNA, ww3cDNA, ww7cDNA, ww4cDNA, ww10cDNA, ww2cDNA, ww5cDNA, ww6cDNA

Plugin error from feature-table:

Same samples are present in some of the provided tables: ww9cDNA, ww8cDNA, ww1cDNA, ww3cDNA, ww7cDNA, ww4cDNA, ww10cDNA, ww2cDNA, ww5cDNA, ww6cDNA

See above for debug info.

It says the sample names are the same. Can you please let me know what to do in this scenario? Thanks much!

colinbrislawn · October 29, 2019, 4:30pm

I think I made a mistake!

Earlier I said that samples with the same names would merge during the feature-table merge command. Looks like this is not true! It looks like this command requires that all your samples in both tables have separate names, so that you can choose to merge them as an optional second step. (It also keeps someone from merging their samples by accident.)

Import your samples again with separate names and try merging again.

Or try merging the cDNA with DNA as it looks like those names are different.
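
For the first option, giving the samples separate names could look roughly like the Python sketch below. This is only an illustration on plain dictionaries, not a Qiime 2 command: the sample IDs come from the error message above, while the feature names, the counts, the `_paired` and `_single` suffixes, and the `add_suffix` helper are all made up.

```python
def add_suffix(table, suffix):
    """Return a copy of a {sample_id: {feature_id: count}} table with renamed samples."""
    return {sample + suffix: dict(features) for sample, features in table.items()}


# Made-up counts for two tables that share the same sample IDs.
paired = {"ww1cDNA": {"featA": 10, "featB": 0}, "ww2cDNA": {"featA": 3, "featB": 7}}
single = {"ww1cDNA": {"featA": 4, "featC": 1}, "ww2cDNA": {"featA": 0, "featC": 9}}

paired_renamed = add_suffix(paired, "_paired")
single_renamed = add_suffix(single, "_single")

# After renaming, no sample ID appears in both tables.
overlap = set(paired_renamed) & set(single_renamed)
if overlap:
    raise ValueError("sample IDs still overlap: " + ", ".join(sorted(overlap)))

# With disjoint IDs, merging is a plain concatenation of samples.
merged = {**paired_renamed, **single_renamed}
print(sorted(merged))
```

The sample IDs in the metadata file would need the same renaming so that they still match the table.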

Colin

thermokarst · October 29, 2019, 5:16pm

Only by default. This option can be changed via the --p-overlap-method parameter.
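
For reference, the documented choices for --p-overlap-method are error_on_overlapping_sample (the default, which produced the error above), error_on_overlapping_feature, sum, and average. The pure-Python sketch below only illustrates what the default and sum mean for a sample ID that appears in two tables. It is not the plugin's implementation; `merge_tables` is a hypothetical helper and the counts are made up.

```python
def merge_tables(a, b, overlap_method="error_on_overlapping_sample"):
    """Merge two {sample_id: {feature_id: count}} tables (illustrative only)."""
    shared = sorted(set(a) & set(b))
    if overlap_method == "error_on_overlapping_sample":
        if shared:
            raise ValueError(
                "Same samples are present in some of the provided tables: "
                + ", ".join(shared)
            )
    elif overlap_method != "sum":
        raise ValueError("this sketch only covers the default and 'sum'")

    merged = {}
    for table in (a, b):
        for sample, features in table.items():
            target = merged.setdefault(sample, {})
            for feature, count in features.items():
                # A feature missing from one table simply contributes 0.
                target[feature] = target.get(feature, 0) + count
    return merged


# Made-up counts; ww1cDNA appears in both tables.
paired = {"ww1cDNA": {"featA": 10, "featB": 2}}
single = {"ww1cDNA": {"featA": 4, "featC": 1}, "ww2cDNA": {"featA": 0, "featC": 9}}

try:
    merge_tables(paired, single)
except ValueError as err:
    print("default:", err)

summed = merge_tables(paired, single, overlap_method="sum")
print("sum:", summed["ww1cDNA"])
```

Whether summing counts from paired-end and single-end runs of the same sample is meaningful is a separate question, as the P.S. earlier in the thread points out.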
