Feature-table group taking too long

Hello!!

I am trying to change my sample IDs after denoising, as is described in this post. However, I cannot get the job to complete, even after running for 24 hours on a remote server. I have a lot of samples (~4,100), but this still seems like an excessive amount of time. Do you have any suggestions for what I might be doing wrong? Or an alternative approach to change my sample IDs? Thanks in advance!

My command:

qiime feature-table group \
  --i-table 04_artifacts-and-visualizations/combined-16S-table-oldIDs.qza \
  --p-axis sample \
  --m-metadata-file 03_metadata/combined-metadata-qiita-simple-16S.txt \
  --m-metadata-column new_sampleID \
  --p-mode sum \
  --o-grouped-table 04_artifacts-and-visualizations/combined-16S-table.qza
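For reference, the metadata file pairs each current sample ID with the new ID I want in the new_sampleID column, roughly along these lines (illustrative layout and IDs only, and my first column header may differ):

sample-id	new_sampleID
old.sample.1	new.sample.1
old.sample.2	new.sample.2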


Hi @aeriel.belk,

So, there’s part of me that would just suggest you go with what you’ve got. I say this from experience: I’ve had sample names with information encoded in them that didn’t actually match the encoded data. (We went with the encoded data in the mapping file and just swapped the sample names.)

However, my cheap-and-cheerful, semi non-QIIME approach would be to use the Python API, with the major caveat that doing this breaks your provenance, so you absolutely need to keep the script that runs it. I would do something like this in a Jupyter notebook:

import pandas as pd
from qiime2 import Artifact

### Loads your mapping file so you can get the new names you want
### to be used in rename
map_ = pd.read_csv('03_metadata/combined-metadata-qiita-simple-16S.txt', dtype=str, sep='\t')
# Use the actual name of your sample name column 
map_.set_index('sample_name', inplace=True)
new_names = map_['new_sampleID'].to_dict()

### Loads your table and extracts as a dataframe
table = Artifact.load('04_artifacts-and-visualizations/combined-16S-table-oldIDs.qza')
table2 = table.view(pd.DataFrame)

# Please double check with head whether the samples are columns. If they are,
# use columns=new_names. Otherwise, use index=new_names.
# I'm going to assume that they're rows (the index).
table2.head()

table2.rename(index=new_names, inplace=True)
# Check your renaming with head
table2.head()

# Turn table2 back into an artifact. Check your semantic type to be sure... 
# you might have a relative frequency table
renamed_table = Artifact.import_data('FeatureTable[Frequency]', table2, pd.DataFrame)
renamed_table.save('04_artifacts-and-visualizations/combined-16S-table.qza')
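If you want to be sure you re-import with the right semantic type, one option (a minimal sketch, relying on the loaded artifact's .type attribute) is to reuse whatever type the original table reports rather than hard-coding it:

# Sketch: reuse the original artifact's semantic type so a relative
# frequency table doesn't get re-imported as FeatureTable[Frequency]
print(table.type)
renamed_table = Artifact.import_data(str(table.type), table2)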

This should be faster than summing, but will absolutely break your provenance.

Best,
Justine


Wow, thanks!!! I’ll give that a shot. Unfortunately, I do need to change my names because at some point I’ll need to merge with 18S data, and I need all the sample IDs to match for that to work correctly.

I really appreciate your help!


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.