How do I get a table of frequencies of each feature per sample?

kmkalanetra · October 2, 2017, 8:46pm

Hello,

I am trying to get a relative abundance table similar to the output of summarize_taxa.py in QIIME 1, where each sample (with its metadata included) has the relative abundance of each taxon. The answers I have seen on the forum get me part of the way there, but not all the way, or maybe I am missing something?

Any help is much appreciated,
Karen

ebolyen · October 2, 2017, 10:27pm

Hi @kmkalanetra,

I think you're looking for qiime taxa collapse. But I'm not sure what you mean by this:

Specifically the metadata part. Could you elaborate a little bit on your goals? What are you looking to do with your data (or what you'd use summarize_taxa.py for)? I suspect you could use qiime feature-table group to get your sample metadata involved in the situation, but I don't know enough about what you are trying to do to say for sure.

Thanks!

kmkalanetra · October 3, 2017, 6:53pm

Hi Evan,

Thank you for your response. I have used qiime taxa collapse, but it does not compile the information in quite the same way as summarize_taxa.py. If I want a spreadsheet that I can use with something like LEfSe, how would I make that in QIIME 2? What I'm talking about is a spreadsheet with the relative abundance of each taxon per sample. I could add the metadata myself, so I'm not that concerned about that part. Something like this:

Thank you!
Karen

ebolyen · October 3, 2017, 8:37pm

Hey @kmkalanetra,

I think this post should accomplish what you need! Although you'll probably need to transpose it in your favourite spreadsheet program to match your screenshot. Also if it's important that the data be relative abundance, then you can use qiime feature-table relative-frequency to convert from plain frequency before following the above post.
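If you would rather script the transpose than do it in a spreadsheet, here is a minimal Python sketch. It assumes the exported table is a tab-separated file with features as rows and samples as columns, possibly preceded by a "# Constructed from biom file" comment line. The function name and the example values are hypothetical and only illustrate the layout.

```python
import csv
import io


def transpose_tsv(text):
    """Flip a features-by-samples TSV so that each sample becomes a row."""
    # Drop blank lines and the leading comment line that a biom-to-TSV
    # conversion typically writes (assumed format, adjust if yours differs).
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.startswith("# Constructed")
    ]
    rows = list(csv.reader(lines, delimiter="\t"))
    # zip(*rows) turns columns into rows; all rows must have the same length.
    flipped = zip(*rows)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(flipped)
    return out.getvalue()


# Hypothetical two-feature, two-sample table of relative frequencies.
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tsample-a\tsample-b\n"
    "feature-1\t0.25\t0.5\n"
    "feature-2\t0.75\t0.5\n"
)
print(transpose_tsv(example))
```

After the flip, the first column holds the sample IDs and each remaining column is one feature, which is the orientation most spreadsheet-style tools expect.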

Let me know if that works for you!

kmkalanetra · October 3, 2017, 10:05pm

Hi,

That worked, but the result includes OTU IDs rather than taxonomic IDs. For me, part of the strength of the summarize_taxa.py results was the way the taxonomy was broken down. Is there a way to get this information in combination with the feature-table?

Thank you!
Karen

thermokarst · October 4, 2017, 3:41am

Hi @kmkalanetra --- I might be misunderstanding your question, but have you had a chance to look at taxa barplot --- specifically the CSV export?

The CSV looks like this:

index,Unassigned,k__Archaea,k__Bacteria,BarcodeSequence,LinkerPrimerSequence,BodySite,Year,Month,Day,Subject,ReportedAntibioticUsage,DaysSinceExperimentStart,Description
L1S105,0.0,0.0,7865.0,AGTGCGATGCGT,GTGCCAGCMGCCGCGGTAA,gut,2009,3,17,subject-1,No,140,subject-1.gut.2009-3-17
L1S140,0.0,0.0,7245.0,ATGGCAGCTCTA,GTGCCAGCMGCCGCGGTAA,gut,2008,10,28,subject-2,Yes,0,subject-2.gut.2008-10-28
L1S208,0.0,0.0,8270.0,CTGAGATACGCG,GTGCCAGCMGCCGCGGTAA,gut,2009,1,20,subject-2,No,84,subject-2.gut.2009-1-20
L1S257,0.0,0.0,6486.0,CCGACTGAGATG,GTGCCAGCMGCCGCGGTAA,gut,2009,3,17,subject-2,No,140,subject-2.gut.2009-3-17
L1S281,0.0,0.0,6755.0,CCTCTCGTGATC,GTGCCAGCMGCCGCGGTAA,gut,2009,4,14,subject-2,No,168,subject-2.gut.2009-4-14
L1S57,0.0,0.0,8756.0,ACACACTATGGC,GTGCCAGCMGCCGCGGTAA,gut,2009,1,20,subject-1,No,84,subject-1.gut.2009-1-20
L1S76,0.0,0.0,7922.0,ACTACGTGTGGT,GTGCCAGCMGCCGCGGTAA,gut,2009,2,17,subject-1,No,112,subject-1.gut.2009-2-17
L1S8,0.0,0.0,7068.0,AGCTGACTAGTC,GTGCCAGCMGCCGCGGTAA,gut,2008,10,28,subject-1,Yes,0,subject-1.gut.2008-10-28
L2S155,16.0,0.0,4096.0,ACGATGCGACCA,GTGCCAGCMGCCGCGGTAA,left palm,2009,1,20,subject-1,No,84,subject-1.left-palm.2009-1-20
...
...

This looks pretty similar to your first screenshot, except that the counts aren't relative abundances, but you could calculate those in your spreadsheet software (I'm just looking for a way to get you moving right away on your problem).

It is worth noting that the CSV is different at each taxonomic level (the barplot viz runs collapse behind the scenes).
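If you would rather compute the relative abundances outside of a spreadsheet, here is a rough Python sketch that works on a CSV shaped like the one above. It assumes the taxon columns can be recognized by name, either "Unassigned" or starting with the "k__" prefix seen in the header; every other column is treated as metadata and left untouched. The function name and the `is_taxon` parameter are hypothetical, so pass your own test if your taxonomy labels look different.

```python
import csv
import io


def to_relative_abundance(csv_text, is_taxon=None):
    """Divide each taxon count by the per-sample total of all taxon counts."""
    if is_taxon is None:
        # Assumption: taxon columns are "Unassigned" or start with "k__".
        def is_taxon(name):
            return name == "Unassigned" or name.startswith("k__")

    reader = csv.DictReader(io.StringIO(csv_text))
    taxa = [col for col in reader.fieldnames if is_taxon(col)]
    rows = []
    for row in reader:
        total = sum(float(row[t]) for t in taxa)
        for t in taxa:
            # Guard against samples with no counts at all.
            row[t] = float(row[t]) / total if total else 0.0
        rows.append(row)
    return taxa, rows


# Two samples from the CSV above, with most metadata columns trimmed.
sample = (
    "index,Unassigned,k__Archaea,k__Bacteria,BodySite\n"
    "L1S105,0.0,0.0,7865.0,gut\n"
    "L2S155,16.0,0.0,4096.0,left palm\n"
)
taxa, rows = to_relative_abundance(sample)
for row in rows:
    print(row["index"], [row[t] for t in taxa], row["BodySite"])
```

Because the CSV is different at each taxonomic level, this would need to be run once per level you care about.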

Apologies if I missed the mark here. Thanks!

kmkalanetra · October 4, 2017, 4:49am

No, that's exactly what I was looking for. Thank you!

kmkalanetra · October 4, 2017, 4:52am

I do have to say that I still like the taxa format of the summarize_taxa.py better than this way. Just saying........

Thanks for everyone's help,
Karen

jairideout · October 30, 2017, 9:03pm

An off-topic reply has been split into a new topic: Normalization necessary when converting to relative frequencies?

Please keep replies on-topic in the future.
