Formatting a QIIME2 feature table for the R version of DADA2?

Sure there is: a secret QIIME 2 method. Use qiime feature-table group to relabel.

We have a few examples floating around the forum of using this as a sneaky way to relabel sample IDs.

But group can operate on feature IDs as well, and the same trick applies. Additionally, we can view a FeatureData[Sequence] artifact as metadata, so that each sequence gets read in as the new label for its feature. Proof (using the moving pics tutorial data and the Artifact API):

>>> import qiime2 as q2, pandas as pd
>>> from qiime2.plugins import feature_table as ft
>>> tab = q2.Artifact.load('table.qza')
>>> seqs = q2.Artifact.load('rep-seqs.qza')
>>> t2, = ft.actions.group(tab, 'feature', seqs.view(q2.Metadata).get_column('Sequence'), 'sum')
>>> tab.view(pd.DataFrame).columns
Index(['4b5eeb300368260019c1fbc7a3c718fc', 'fe30ff0f71a38a39cf1717ec2be3a2fc',
       'd29fe3c70564fc0f69f2c03e0d1e5561', '868528ca947bc57b69ffdf83e6b73bae',
       '154709e160e8cada6bfb21115acc80f5', '1d2e5f3444ca750c85302ceee2473331',
       '0305a4993ecf2d8ef4149fdfc7592603', 'cb2fe0146e2fbcb101050edb996a0ee2',
       '997056ba80681bbbdd5d09aa591eadc0', '3c9c437f27aca05f8db167cd080ff1ec',
       ...
       'ad492bcae03f566b36a19e31f04d659a', 'eb8ef4756ed538fe480d979e740a04d8',
       '5db2cf37007f874e25eb2c901917e15a', 'fa3729663b98de0c0af7913e9f30c19e',
       '504572e3afd673db749ee5e8e3e57b97', 'a6b6f29a1196cacfc392e3d71f55e2a2',
       '0e5df3d01cc073e3c9674c2534169f03', '06845c67bc4203081a981200f33e87eb',
       '98d250a339a635f20e26397dafc6ced3', '1830c14ead81ad012f1db0e12f8ab6a4'],
      dtype='object', length=770)
>>> t2.view(pd.DataFrame).columns
Index(['TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTT',
       'TACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGAGCGCAGACGGTTACTTAAGCAGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCGTTCTGAACTGGGTGACTA',
       'TACGTAGGTCCCGAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTAGATAAGTCTGAAGTTAAAGGCTGTGGCTTAACCATAGTACGCTTTGGAAACTGTTTAACTTG',
       'TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATGTCTT',
       'TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGTGGATTGTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGAAACTGGCAGTCTT',
       'TACGGAGGGTGCGAGCGTTAATCGGAATAACTGGGCGTAAAGGGCACGCAGGCGGTGACTTAAGTGAGGTGTGAAAGCCCCGGGCTTAACCTGGGAATTGCATTTCATACTGGGTCGCTA',
       'TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCGGACGCTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGGTGTCTT',
       'TACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTTTTAAGTCTGATGTGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTT',
       'TACGTATGTCACAAGCGTTATCCGGATTTATTGGGCGTAAAGCGCGTCTAGGTGGTTATGTAAGTCTGATGTGAAAATGCAGGGCTCAACTCTGTATTGCGTTGGAAACTGCATGACTAG',
       'TACGGAAGGTCCAGGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGGCTGGAGATTAAGTGTGTTGTGAAATGTAGACGCTCAACGTCTGAATTGCAGCGCATACTGGTTTCCTT',
       ...
       'TACGTAGAAGACTAGTGTTATTCATCTTTAATAGGTTTAAAGGGTACCTAGACGGTAAATTTAATCTTTAACAGGATATGTTTTTACTAGAGTTTTATATGAGGAGGGGAGTATTTATGG',
       'CACGGAAGGGGCAAGCGTTGCTCGTAAGTATTGGGCGTAAAGAGTTTGTAGGCGGTTTTTCAAAAAACTTGGTTTTCCATCCGGCTACGACATGGTTAACCTTGCTTGAGTTCAGTCTTT',
       'TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGAAGATCAAGTCAGCGGTAAAATTGAGAGGCTCAACCTCTTCGAGCCGTTGAAACTGGTTTTCTTG',
       'GACGGAGGGTGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGCCCTGTCAGTCGGGTGTGAAAGCCCGGGGCTCAACCCCGGAACGGCACCCGAGACGGCAGGGCTG',
       'GACGGAGGATGCAAGCGTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGATTGTAAAGTCCTCTGTTAAAGATCTGGGCTTAACCCAGTTCAAGCAGTGGAAACTTATAATCTA',
       'TACGAAGGGTGCAAGCGTTATTCGGAATCATTGGGCGTAAAGCGCGCGCAGGCGGATCAGCAAGTCAGATGTGAAATCTCAGGGCTCAACCCTGAAACTGCGTCTGAAACTGCTAGTCTA',
       'TACGTAGGGTGCAAGCGTTATCCGGAATTACTGGGCGTAAAGGGTGCGTAGGCGGCATGGCAAGTCAGAAGTGAAAGGCAATAGCTTAACTATTGTTAGCTTTTGAAACTGCTAAGCTTG',
       'TACGGAGGGTGCGAGCGTTATTCGGATTCACTGGGCGTAAAGCGCATGTAGGCGGTTTCGTAAGTCTGATGTGAAAGCCCTCGACTTAATCGAGGAAGTGCATTGGATACTGCGAGGCTA',
       'TACGTAGGTGGCGAGCGTTATCCGGAATTACTGGGTGTAAAGGGCGTGTAGGCGGCACTGTAAGTCAGATGTGAAATCTCCCGGCTCAACCGGGAGCGTGCATCTGATACTGCAGTACTT',
       'TACGTAGGGTGCAAGCATTATCCGGAGTGACTGGGCGTAAAGAGTTGCGTAGGCGGTTTAATAAGTGAATAGTGAAACCTGGTGGCTCAACCATACAGACTATTATTCAAACTGTTAAAC'],
      dtype='object', length=770)

So the CLI equivalent would be something like:

qiime feature-table group \
    --i-table table.qza \
    --p-axis feature \
    --m-metadata-file rep-seqs.qza \
    --m-metadata-column Sequence \
    --p-mode sum \
    --o-grouped-table new-table.qza
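
For intuition on why grouping doubles as relabeling, here is a minimal pure-Python sketch of the idea. It is not the plugin's implementation: relabel_features and the toy IDs and sequences are made up for illustration. Each feature ID is mapped to its label, and the counts of features that share a label are summed. If every ID maps to a distinct sequence (which I'd expect for DADA2 output, but is an assumption here), nothing is merged and sum simply renames the features.

```python
from collections import defaultdict


def relabel_features(table, id_to_label):
    """Relabel features by grouping (hypothetical helper, illustration only).

    table: dict of sample ID -> dict of feature ID -> count
    id_to_label: dict of feature ID -> new label (e.g. the sequence)
    Counts of features that share a label are summed within each sample.
    """
    grouped = {}
    for sample, counts in table.items():
        sums = defaultdict(int)
        for feature_id, count in counts.items():
            sums[id_to_label[feature_id]] += count
        grouped[sample] = dict(sums)
    return grouped


# Toy data: made-up IDs and sequences, not taken from the tutorial.
table = {
    "sampleA": {"id1": 5, "id2": 0},
    "sampleB": {"id1": 2, "id2": 7},
}
id_to_seq = {"id1": "ACGT", "id2": "TTGA"}

relabeled = relabel_features(table, id_to_seq)
print(relabeled)
```

With a one-to-one mapping like this, the counts are untouched and only the labels change. If two IDs pointed at the same sequence, their counts would be added together in the output.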

Enjoy!
