Hello,
You helped me with this same question back in 2017 with the older QIIME, so apologies in advance for asking again, but I cannot seem to do it with qiime2-2020.2.
How can I get the following: a matrix of sample x feature frequencies, like the one shown in the attached photo?
So just to recap, my dream file would look something like the picture below (with the actual sequence instead of the feature ID, if I can get it).
Then I used the biom convert function to convert it to a TSV file and opened it in Excel…
So now, my Excel file gives me the frequency of each feature ID in each biological sample. However, instead of the feature ID, I would like to get the actual sequence. How can I do that?
I haven’t tested this yet (but it’s on my list of things to do today), but basically, I think you need to convert the representative sequences into a table and then add that as metadata. Let me play around with it and I’ll get back to you soon.
The thing I thought was going to work didn’t work. So, I think I have a solution, but it’s somewhat inelegant and requires you to do some work in Python. There are three parameters in this script you need to change:
table_fp should be the path to your actual table
seq_fp should be the path to your actual sequences
out_fp should be the place you want to save the table with the sequences. This will be a tab-separated file for Excel.
import pandas as pd
from qiime2 import Artifact
table_fp = 'table.qza' # The actual path to your table
seq_fp = 'seq.qza' # The actual path to your sequences
out_fp = 'new_table.tsv' # The actual place you want to save the table with the sequences
table = Artifact.load(table_fp).view(pd.DataFrame).T
repseq = Artifact.load(seq_fp).view(pd.Series).apply(str)
combined = pd.concat(axis=1, sort=False, objs=[table, repseq.loc[table.index]])
combined.rename(columns={0: "representative_sequence"}, inplace=True)
combined.to_csv(out_fp, sep='\t')
You can run it by opening an ipython interpreter in your terminal (type ipython) and then running each line of code, updating the paths for the three fp values.
Thanks Justine.
So I tried to do what you said: I ran ipython in my terminal window and ran the first command, "import pandas as pd", and got an error message (please see attached).
I was thinking my other option would be to get the rep-seqs.qza file into Excel, and then match the sequences/feature IDs from that file with the frequencies in the other file (e.g. in R). Is there a way I can do that, i.e. convert my rep-seqs.qza file? Please see the attached picture of the part of the rep-seqs.qza file I would like to extract.
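For the matching approach described here, a minimal sketch in plain Python might look like the following. It assumes the representative sequences have already been exported to a FASTA file (for example with `qiime tools export`), and it uses a small in-memory FASTA string as a stand-in for that file. The helper names `fasta_to_rows` and `write_tsv` and the column names are made up for illustration, not part of QIIME 2.

```python
import csv
import io


def fasta_to_rows(handle):
    """Yield (feature_id, sequence) pairs from an open FASTA file handle."""
    feature_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if feature_id is not None:
                yield feature_id, "".join(chunks)
            # Keep only the ID; drop any description after whitespace.
            feature_id, chunks = line[1:].split()[0], []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if feature_id is not None:
        yield feature_id, "".join(chunks)


def write_tsv(rows, out_handle):
    """Write the pairs as a two-column, tab-separated table."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature_id", "sequence"])  # hypothetical column names
    writer.writerows(rows)


# Toy FASTA content standing in for an exported sequence file.
example = io.StringIO(">feat1\nACGT\nTTGA\n>feat2 some description\nGGCC\n")
rows = list(fasta_to_rows(example))

out = io.StringIO()
write_tsv(rows, out)
tsv_text = out.getvalue()
```

With a real file, the two `io.StringIO` objects would be replaced by `open(...)` calls on your own paths. The resulting table opens in Excel and can be matched to the frequency table on the feature ID column.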
So, it ran correctly, but this adds the sequence at the end as metadata. You could take the last sequence column in Excel and move it over, or add one line to the end of the python after
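As a rough illustration of moving the sequence column to the front in pandas (not necessarily the exact line meant above), here is a sketch using a toy table in place of `combined`. The sample names and counts are invented, and the column name `representative_sequence` is taken from the rename step in the script.

```python
import pandas as pd

# Toy stand-in for the `combined` table built by the script:
# features as rows, samples as columns, sequence as the last column.
combined = pd.DataFrame(
    {
        "sample_A": [10, 0],
        "sample_B": [3, 7],
        "representative_sequence": ["ACGT", "GGCC"],
    },
    index=["feat1", "feat2"],
)

seq_col = "representative_sequence"

# Put the sequence column first and keep the other columns in their original order.
reordered = combined[[seq_col] + [c for c in combined.columns if c != seq_col]]
```

Writing `reordered` out with `to_csv(out_fp, sep='\t')` instead of `combined` would then give a file with the sequence next to the feature ID.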
@Nicholas_Bokulich also suggested this thread, because he thinks more qiime-o-matically than I do. So, I may have sent you on a wild goose chase (sorry).