FeatureData filtering

I have two sequences that I'd like to remove. It seems like I can filter feature tables in many different ways, but I wasn't sure whether there's a way to do this when the input isn't a FeatureTable. Is it possible to filter a FeatureData[Taxonomy] artifact?

Thanks!

Hey there @devonorourke!

We don't have a method for filtering this type of data, but generally speaking you don't need to, since the methods that consume it take the intersection of IDs across their inputs.
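To make "ID intersection" concrete, here is a minimal Python sketch. It is not QIIME 2's implementation, only an illustration of the idea: a taxonomy entry with no matching sequence is simply ignored. The function name `shared_ids` and the short placeholder sequences are made up for the example; the IDs and lineage prefixes come from the Greengenes excerpt later in this thread.

```python
def shared_ids(taxonomy, sequences):
    """Return the IDs present in both mappings, sorted for stable output."""
    return sorted(set(taxonomy) & set(sequences))


# Three taxonomy records, but only two of them have a sequence.
taxonomy = {
    "229854": "k__Bacteria; p__Proteobacteria",
    "367523": "k__Bacteria; p__Bacteroidetes",
    "239330": "k__Bacteria; p__Proteobacteria",
}
# Placeholder sequences, for illustration only.
sequences = {
    "229854": "ACGT",
    "367523": "GGCC",
}

# Only the IDs found in both inputs survive.
print(shared_ids(taxonomy, sequences))  # prints ['229854', '367523']
```

So if a sequence is removed, its taxonomy record can usually stay where it is, as long as the downstream method intersects IDs in this way.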

If you need to filter out individual sequences, you can use filter-seqs. Hope that helps!

Ok. The reason is that I was training a classifier and the error indicated an ambiguous character. Apparently, among my 2 million sequences there are two separate seqs containing an "I". I know the names of the sequences, but I wanted to remove them from both the sequence file and the taxonomy file.
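For anyone who hits the same error without knowing which records are to blame, a rough Python sketch like the following can scan a FASTA file before import. The allowed alphabet (A, C, G, T plus the IUPAC degenerate codes) is an assumption about what the classifier accepts, so adjust it as needed; `parse_fasta` and `find_invalid` are hypothetical helper names.

```python
# Assumed alphabet: the four DNA bases plus IUPAC degenerate codes.
IUPAC_DNA = set("ACGT" + "RYSWKMBDHVN")


def parse_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            record_id, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def find_invalid(lines, alphabet=IUPAC_DNA):
    """Map each offending record ID to the characters outside the alphabet."""
    bad = {}
    for record_id, seq in parse_fasta(lines):
        extra = set(seq.upper()) - alphabet
        if extra:
            bad[record_id] = extra
    return bad
```

With a real file this could be called as `find_invalid(open("ref-seqs.fasta"))`, where the file name is only a placeholder.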

Might need to go back to the source I imported into QIIME and correct it I suppose.

Why do you have nucleotides in the taxonomy file? Are they in the feature ID? They shouldn't be processed as nts anyway, right?

From the feature classifier tutorial (Greengenes):

# FeatureData[Taxonomy]
229854	k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Legionellales; f__Legionellaceae; g__Legionella; s__
367523	k__Bacteria; p__Bacteroidetes; c__Flavobacteriia; o__Flavobacteriales; f__Flavobacteriaceae; g__Flavobacterium; s__
239330	k__Bacteria; p__Proteobacteria; c__Deltaproteobacteria; o__Desulfuromonadales; f__Geobacteraceae; g__Geobacter; s__
203525	k__Bacteria; p__OP11; c__OP11-1; o__; f__; g__; s__
148318	k__Bacteria; p__ZB3; c__; o__; f__; g__; s__
340276	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
289157	k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__; g__; s__
# FeatureData[Sequence]
>1111561
AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCACGCCTAACACATGCAAGTCGAACGGCAGCGGGGGAAAGCTTGCTTTCCTGCCGGCGAGTGGCGGACGGGTGAGTAATGCGTAGGAATTTGCCATTAAGAGGGGGACAACTCGGGGAAACTCGAGCTAATACCACATAATCTCTTCGGAGCAAAGAAGGGGATTCTTCGGAACCTTTCGCTTAATGAGAAGCCTACGTTGGATTAGCTTGTTGGTGGGGTAAAGGCTCACCAAGGCGATGATCTATAGCTGGTCTGAGAGGATGATCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGAGGAATTTTGGACAATGGGGGAAACCCTGATCCAGCGATGCCGCGTGTGTGAAGAAGGCCTAAGGGTTGTAAAGCACTTTTAGTGAGGAAGAGAGTAAGTCGGTTAATACCCGGCTTGCAAGACGTTACTCACAGAAAAAGCGCCGGCTAACTCTGTGCCAGCAGCCGCGGTAATACAGAGGGTGCAAGCGTTAATCGGATTGACTGGGCGTAAAGGGCGCGTAGGCGGTAAGATAAGTCAGATGTTAAAAACCCGAGCTCAACTTGGGGACTGCATTTGAAACTATCTCACTAGAGTACAGTAGAGGAGAGCGGAATTTCCGGTGTAGCGGTGAAATGCGTAGATATCGGAAGGAACACCAGTGGCGAAGGCGGCTCTCTGGACTGACACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATGAGAACTAGCTGTTGGTACGTTTAGTATCAGTAGCGCAGCTAACGCGTTAAGTTCTCCGCCTGGGGATTACGGTCGCAAGACTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGCGGTTTAATTCGATGCAACCCGAAAAACCTTACCTACCCTTGACATCCCGCGAAGCCTGTAGAGATACGGGCGTGCTCGAAAGAGAACGCGGTGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTGCCATCTACATTAGTAGGGAACTCTAAGGAGACTGCCGGCGATAAGTCGGAGGAAGGTGGGGACGATGTCAAGTCATCATGGCCTTTATGGGTAGGGCTACACGCGTGCTACAATGGGCAGTACAAAGGGAAGCGAAGCTGTGAAGTGGAGCAAACCTCAGAAAGCTGCTCGTAATCCGGATTGAAGTCTGCAACTCGACTTCATGAGGTTGGAATCGCTAGTAATCGCAGATCAGCATGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGAAGTGGGTTGTACCAGAAGTAGGAGAGCTAACCTTCGGGAGGCATCTTACCACGGTATGATTCATGACTGGGGTGAAGTCGTAACAAGGTA
>1111421
GAGTAACGCGTAGGAACCAACCTTAGAGAGTGGAATAACCTTGGGAAACTAAGGCTAATACCGCATATACCTCGAGAGGGAAAGGAGAGTAATCTCTGCTCTAGGACGGGCCTGCGCCCGATTAGCTTGTTGGTAAGGTAATGGCTTACCAAGGCATCGATCGGTAGCTGGTCTGAGAGGACGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGCAAGCTTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCACACGCGACGATGATGACGGTAGCGTGAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGTCTAATTTGTCGGGGGTGAAATCCCAGGGCTTAACCTTGGAAGTGCCTTCGGGACAATTAGGCTTGAGACCGGGAGAGGATGGCGGAATTCCCAGTGTAGAGGTGAAATTCGTAGATATTGGGAAGAACACCGGTGGCGAAAGCGGCCATCTGGTCCGGTTCTGACGCTAAAGCGCGAAAGCGTGGGGGAGCGAACAGGATTAGATACCCTGGTAGCCACGCCGTAAACGATGTGTGCTGGATGTCGGGGGGCATGCTCTTCGGTGTCGTAGCTAACGCGTGAAGCACACCGTCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGCAGAACCTTACCAGCCTTTGACATGCCCTTTATATCCTAAAGAGACTTGGGAGTCGGTTCGGCCGGAAGGGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTATTCTCAGTTGCCATCGGGTCATGCCGGGCACTCTGAGGGGACTGCCGGTGACAAGCCGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTACAGGCTGGGCTACACACGTGCTACAATGGCGGTGACAATGGGTTATCAGGCGACTCTGCGAAGAGGAGCGAATCCTAAAAGACCGTCTTAGTTCGGATTGCACTCTGCAACCCGGGTGCATGAAGTTGGAATCGCTAGTAATCGCGGATCAGCACGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTTGTCTTTACTCGAAGACAGTGTGCCAACCTTAA
>1111090
AGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGGGTTTAACACATGCAAGTCGAGGGACGAAACCATCTTCGGATGGCTGAAACCGGCGAACGGGTGAGTAACACGTGACCAACCTGCCCTTCACTCAGGGATAACAGCGGGAAACCGTTGCTAACACCTGATACCGCGAGTCGAGCGCATGCTCTTCTCGTGAAAACTCCGGTGGTGAAGGAGGGGGTCGCGGCCTATCAGGTAGTTGGTGCGGTAACGGCGCACCAAGCCGACGACGGGCAGCTGGAGTGAGAGCTCGAGCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGAGAATCTTGCACAATGGGCGAAAGCCTGATGCAGCCACGCCGCGTGGAGGAAGAAGGCCTTCGGGTTGTAAACTCCTTTCAGCAGGGAAGAAGCGAAAGTGACGGTACCTGCAGAAGAAGCCCCGGCCAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGGGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGTGTAGGCGGACCATTAAGTCGGCTGTGAAATCTCTGGGCTCAACCCAGAAACTGCAGTCGATACTGGTGGTCTTGAGGTAGCTAGAGGAGAGTGGAATTCCCAGTGTAGCGGTGGAATGCGCAGATATTGGGAGGAACACCAATGGCGAAGGCAGCTCTCTGGAGCTCACCTGACGCTGAGACGCGAAAGCATGGGTAGCAAACAGGATTAGATACCCTGGTAGTCCATGCCGTAAACGATGGGTGCTAGATGTGGGGACCAGTTCACGGTCTCCGTGTCGAAGCTAACGCGTTAAGCACCCCGCCTGGGGACTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGTGGCTTAATTCGAAGCAACGCGAAGAACCTTACCTAGTCTTGACATACACCGTTCAACTACCGAAATGTAGTGGGTTCGTCCGAGGTGTACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTGTCCTATGTTGCCAACACGTAATGGTGGGGACTCATGGGAGACTGCCGGTGTCAAACCGGAGGAAGGTGGGGACGACGTCAAGTCATCATGCCCCTTATGACTAGGGCTGCACACATGCTACAATGGCAGGTACAGAGGGCTGCGATCCCGCGAGGGGGAGCGAATCCCACAAAGCCTGTCTCAGTTCGGATTGCAGTCTGCAACTCGACTGCATGAAGCCGGAGTCGCTAGTAATCGCAGATCAGCAACGCTGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACGTCATGAAAGTCGGTAACACCCGAAGCCGGTGGCCCAACCCTTCATTGGGGGGGAGCCGTCTAAGGTGGGATCGGTAATTGGGACGAAGTCGTAACAAGGTAGCCGTA
>1110893
TCCTGGCTCAGATGAACGCTAGCGCGGAGGCTTAATACATGCAAGTCGAACGGTAACAGGTAATTTATTATGCTGACGAGTGGCGCACGGGTGAGTAACGCGTACATACCTACCTCTAAGAAAGGAATAGCCCTGGGAAACTGGGATTAATACCTTATGTGCTGGTGACAGTAAAGCTACGGCGCTTAGAGATGGATGTGCGTTCTATTAGCTAGTTGGTGAGGTAACTGCTCACCAAGGCGACGATAGATAGGGGGCGTGAGAGCGTGATCCCCCACACGGGTACTGAGACACGGACCCGACTCCTACGGGAGGCAGCAGTAAGGAATATTGGACAATGGGCGGAAGCCTGATCCAGCCATCCCGCGTGTAGGAAGACTGCCCTATGGGTTGTAAACTACTTTTAGACAGGAAGAAACGCCTTTATTTATGAGGGTTTGACGGTACTGTCAGAATAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGATACGGAGGGTGCAAGCGTTATCCGGAATCACTGGGTTTAAAGGGTGAGTAGGCGGGTTATTAAGTCAGAGGTGAAAGGTTTCAGCTTAACTGCAAAATTGCCTTTGATACTGATAGTCTAGAATTATGTTGAGGTTAGCGGAATGAGTCATGTAGCGGTGAAATGCATAGATATGACTTAGAACACCAATTGCGAAGGCAGCTAACTGGGCATATATTGACGCTGAGTCACGAAAGCGTGGGGAGCGAACAAGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCTAACTCGATGTTTGTTTAAATATGAGCATCCAAGGGAAACCGTTTAGTTAGCCACCTGGGGAGTACGTTCGCAAGGATGAAACTCAAAGGAATTGACGGGGGTCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGATACGCGAGGTACCTTACCTGGGCTCTAATGCGCGTGACCGCTTCTGAAAGGAAAGCTTTCCTTCGGGACAAAGCAAGGTGCTGCATGGTTGTCGTCAGCTTCGCGCCGTGAGGTGTTGGGTTAAGTCCTGCAACGAGCGCAACCCCTATTGTTAGTTACCAGCAAGTAAGTTGGGGACTCTAGCAAGACTGCCGGCGTAAGCCGCGAGGAAGGTGGGGATGACGTCAAATCATCATGGCCTTACGTCCTGGGCTACACACGTGCTACAATGGTAGGTACAGAGAGCAGCCACTACGCGAGTAGGAGCGAATCTATAAAACCTATCACAGTTCGGATCGGAGTCTGGAACTCGACTCCGTGAAGGTGGAATCGCTAGTAATCGCGCATCAGCCATGGCGCGGTGAATACGTTCCCGGACCTTGTACACACCGCCCGTCAAGCCATGGGAGCTGGTGGTGCCTGAAGATGGTGACTTAACGTGGAGCTATTTAGGGTAAAACTAGTAACTGGGGCTAAGTCGTAACAAGGCTGCCGTACCGGAAGCGTGCGGCTGGATCACCTCCTT
>1110814
TTAGAGTTTGATCATGGCTCAGAACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGCGAAAGGCTCCTTCGGGGGCCGAGTAGAGTGGCGAACGGGTGAGTAACACGTGGGTAACCTGCCCAAGAGCGGGGGACAACGTCGGGAAACCGGCGCTAATACCGCATACGCTTGTTCGGTTTTCGGATCGGACAAGGAAAGCCTTCGGGCGCTCCTGGATGGGCCCGCGTCGCATTAGCTTGTTGGTGGGATAACAGCCCACCAAGGCGACGATGCGTAGCCGAGCTGAGAGGCTGATCGGCCACACTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCGCAATGGGGGCAACCCTGACGCAGCAACGCCGCGTGGGTGACGAAGGCCTTCGGGTCGTAAAGCCCTGTCGTGAGGGACGAAGTTCTGACGGTACCTCACAAGAAAGCCACGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGTGGCGAGCGTTGTTCGGAATCATTGGGCGTAAAGGGCGCGCAGGCGGCCCAGCAAGTCCGGGGTGAAAGCCCTCGACTCAATCGAGGAACGGCCTCGGAAACTGCTGGGCTTGAGTACGGGAGAGGTGAGCGGAATTCCCAGTGTAGCGGTGAAATGCGTAGATATTGGGAAGAACACCGGTGGCGAAGGCGGCTCACTGGACCGATACTGACGCTGAGGCGCGAAAGCCGGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCCGGCCGTAAACGTTGTGTACTAGGTGGTGGGGGTATCGACCCCTCCGCTGCCGCAGGTAACCCATTAAGTACACCGCCTGGGGAGTACGGTCGCAAGGCTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGCAGAACCTTACCCAGGCTTGACATCCCGCGCCATTCGGTGAAAGCCGGAGTTTCCTTCGGGAACGCGGTGACAGGTGGTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTACCTGCTGTTGCTACCAGTTCGGCTGGGCACTCTGCAGGGACTGCCGGCGATAAGCTGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGCCCTTTATGTCTGGGGCTACACGCGTGCTACAATGGCCGATACAAAGGGTTGCCAACCCACGAGGGGGAGCCAATCCCAAAAAGTCGGCCTCAGTTCGGACTGGAGTCTGCAACTCGACTCCACGAAGGTGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCACCCGAATTGGCTGCACCCGAAGTCGTGTGCCCAACCGCAAGGAGGGTAGCGCCGAAGGTGTGGTTGGTAAGGGGGGTGAAGTCGTAACAAGGTAACCGTA
>1110088
AGAGTTTGATCATGGCTCACATTGCCCGCTGGCGTCATGCCTAACACATGCAAGTCCAACGGTAACGGGCCCTTCGGGGTGCTGACGAGTGGCGGACGGGTGAGTAATGCGACGGAATCTGCCTTACGGTGGGGGATAACCCGGGGAAACCCGGGCTAATACCGCATACGTCCCAAGGGAGAAAGCGGGGGATCTTCGGACCTCGCGCCGAAAGATGAGCCTACGTCCGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCGGTAGCTGGTCTGAGGGGATGATCAACCTCACTGGGACTGAGACTCCGTCCAGACTCCTACGGGAGGCTTCAGTGGGGAATTTTGGACAATGGGCTAAAGCCTGATCCTGCGATGCCGCGTGGGTGAAGAACGCCTTCGGGTTGTTCAGCCCTTTCGGCGGGGACGAAAATGCCGAACCTGATACCTTCGGGTCTTTGACGTTACCTGCGATAAGAAGCACCGGCTAACTCCGTGCCAGGCAGCCGCGGTAATACGGAGGGTGCTTGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGCGGTTTGTAAGTCATATGTGAAATCCCCGGGCTCATCCTGGGAACTGCATACTATACTGCTAGGTCTAGAGTGTGATAGAGGGAAGCGGAATTCACGGTCGTATCGGCGATTTGCGTATATATCCGGAGGATCTTCAGTGCCGTATGCTGCCTCACACGTACCAACACTGACGCTGAGGCGCGAACGCGTGAGGCAGCAAATCAGGATTAGGATATCCCTGGGTAGTCCACGCCGATAAACGACTGAGAAACTAGCCGATCTGGAAGTCAACTGGCTTTCTGGCTGGCGCAGCATAACGCGTTAAGTTGCTCCGCCTGGGGAGTACGGCCGCAACGGCATAAAACTCAAATGAATATGACGGGGGCCCGCACAAGCGGTGGAGTCATGTGGTTTAATTCGATGCAACGTCGAAAAACCTTACCTGCCCTTGACATCCTCGGAACTTGTCAGAGATTGACTTGGTGCCTTCGGGAACCGAGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCGGTTCGGCCGGGAACTCTAGGGAGACTGCCGGTGATAAACCGGAGGAAGGTGGGATGTCGTCAAGTCATCATGGCCTTTATGTGCAGGGCTACACTCGTGCTACAATGGTCAGTTCAGAGGGAAGCCAACCCGCGAGGGGGAGGTAATCCCAGAAAATTGATCCTAGTCCGGATTGCAGTGTGCAAGTCGAATGCGTGAAGTCGGAATCGTTAGTAATCGGGAATCAGAATGTCGCGGTGAATACGTTCCCGGGCTTTGTACACGCCGCCCGTCACACCATGGGAGTGGGCTGTACCAGAAGCAGGTAGCCTAACGGTAAGGATGGCGCCTCCCACGGTGTGGTTCATGACTGGGGTGAATTTGTAACAAGGTAGCCGTA
>1109993
TTAGAGTTTGATCCTGGCTCAGGGTGAACGCTGGCAGCGTGCCTAATGCATGCAAGTCGAGCGGGGAGGGGAAACCCTCCCAGCGGCGAACGGCTGAGTAATATATAGCTGACCTACCCACCGGTGGGGGATAACCTCGGGAAACTGGGGCTAATACCGCATAATATAAGCTGGGGTGGTGCCTGGCTTATTAAAGCCCGCAAGGGCGCCGGTGGAGGGGGCTATATCTCAACAGGTAGTTGGTAGGGTAATGGCCTACCAAGCCTATGACGGGTAGCTGGTCTGAGAGGATGGCCAGCCAGATGGGGACTGAGACACGGCCCCAACTCCTACGGGAGGCAGCAGCAGGGGATCTTGGGCAATGCCCGAAAGGGTGACCCAGCGACGCTGCGTGGGGGAAGAAGGCCTTCGGGTTGTAAACCCCTTTTGCCGGAGAAGAAGCTCTGACGGTATCCGGCGAATAAGCCTCAGCCAACTACGTGCCAGCAGCTGCGGTAAGACGTAGGAGGCGAGCGTTACCCGGAATTACTGGGCGTAAAAGGGATGTAGGTGGCCGATCAAGTCCGGGGTGAAATTTTCCGGCTCAACCGGGAAGCTGCTCCGGATACTGATTGGCTAGAGGGCATCAGGGGGAGACGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAGGAACGCCGATGGCGAAGGCGGTCTCCTGGGGTGCCCCTGACACTGAGATCCGAAAGCGTGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCACGCCCTAAACGATGGGCACTAGTTCTGGGGGGCACTGACCCCTCCCGGGACGAAGCTAACGCTTTAAGTGCCCTGCCTGGGGAGTATAGCCGCAAGGCTGAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGCGGAGCGTGTGGTTTAATTCGATGCAAAGCGTAGAACCTTACCAGGGCTTGACATGCCGGTAGTAGGAACCCGAAAGGGGGACGACTGGTATCCAGCCCAGAGCCGGCACAGGTGCTGCATGGCTGTCGTCAGCTCGTGCCGTGAGGTGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTGCCCTTAGTTGCATATCTAAGGGGACTGCCTCGCAAAACGGGGAGGAAGGTGGGGATGACGTCAAGTCAGCATGGCCCTTATGCCCTGGGCTACACACACGCTACAATGGACGCTACAGCGGGAAGCGACCGGGCGACCGGAAGCTGATCCCTTAAAAGCGTCCCCAGTTCAGATTGCAGGCTGAAACCCGCCTGCATGAAGCCGGAGTCGCTAGTAACCGCAGGTCAGCATACTGCGGTGAATACGTTCCCGGGTCCTGTACACACCGCCCGTCACGGCATGGGAGCCGACAACACCTAAAAGCGCCAAGCTAACTCCACCGGAGAGGCAGGCGTCGAGGGTGAGGTCGGTGACTGGGCCGAAGTCGTAACAAGGTAACC

Sorry - trying to write a coherent post and cook dinner is beyond my capabilities.

There are two files, a sequence file and a taxonomy file. The ambiguous bases are in the sequence file. Pretty sure I can delete those on a per-reference basis.

I just wanted to also delete the matching records in the taxonomy too.
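If the fix is made on the source files before importing, dropping the same IDs from both files could look roughly like this Python sketch. It assumes a plain FASTA file and a tab-separated taxonomy file with the ID in the first column, as in the excerpt above; `filter_fasta` and `filter_taxonomy` are hypothetical helper names, not part of QIIME 2.

```python
def filter_fasta(lines, drop_ids):
    """Return FASTA lines with every record whose ID is in drop_ids removed."""
    kept, keep = [], True
    for line in lines:
        if line.startswith(">"):
            # A header line decides whether this record's lines are kept.
            tokens = line[1:].split()
            keep = (tokens[0] if tokens else "") not in drop_ids
        if keep:
            kept.append(line)
    return kept


def filter_taxonomy(lines, drop_ids):
    """Return taxonomy lines whose first tab-separated column is not in drop_ids."""
    return [line for line in lines if line.split("\t", 1)[0] not in drop_ids]


# Tiny made-up example: record "2" carries the invalid character.
fasta = [">1", "ACGT", ">2", "ACIT", ">3", "GG"]
taxonomy = ["1\tk__Bacteria", "2\tk__Bacteria", "3\tk__Bacteria"]
drop = {"2"}

clean_fasta = filter_fasta(fasta, drop)
clean_taxonomy = filter_taxonomy(taxonomy, drop)
```

The cleaned files can then be imported again, which keeps the sequences and the taxonomy in step with each other.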

