Plug-in error with feature-table filter-seqs & QIIME2 version variability?

HinakoT · November 17, 2021, 11:00pm

Recently I (finally) got around to finishing the analysis of some old(er) 16S data, and discovered an error in my bash script that skipped one sample when exporting .fasta files after cleanup and chimera filtering in QIIME2.

So I decided to re-run the .fasta export steps using QIIME2 version 2019.1, which I had used before, including for cleaning up the raw sequences up to that point. This is where filter-seqs gave me an error: Plugin error from feature-table: All features were filtered out of the data.

I looked through a few posts on the same error (this, and this, and also this), but they don't quite help me find whatever it is that I'm missing or doing wrong. The main suggestion I saw was to check the input types, which I did for filter-seqs: they were FeatureTable[Frequency] and FeatureData[Sequence], so I know the input types aren't the problem there.

To get per-sample .fasta files from a filtered rep-seqs.qza (post-chimera filtering), I have some little scripts I used before (last in 2019) that do the following:
(where my_path/ is the relevant path for each file and sampleID# is the ID of each individual sample)

1. Import all the individual sample names.
2. Use qiime feature-table filter-samples to get a filtered table per sample (names.tsv holds the individual sample names):
qiime feature-table filter-samples \
  --i-table my_path/table-nonchimeric-wo-borderline.qza \
  --m-metadata-file names.tsv \
  --p-where "SampleID='sampleID#'" \
  --o-filtered-table sampleID#.qza
3. Use qiime feature-table filter-seqs on the tables from step 2:
qiime feature-table filter-seqs \
  --i-data my_path/rep-seqs-nonchimeric-wo-borderline.qza \
  --i-table my_path/sampleID#.qza \
  --o-filtered-data my_path/seq-sampleID#.qza
4. Run qiime tools export on the outputs from step 3 to get .fasta files:
qiime tools export \
  --input-path my_path/seq-sampleID#.qza \
  --output-path my_path/sampleID#.biom
The dna-sequences.fasta file written into each export directory is then renamed and moved into a final directory.
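The per-sample loop described in the steps above can be sketched in Python. This is a minimal sketch, not the original bash script: the helper names read_sample_ids and build_commands are hypothetical, names.tsv is assumed to have one header row followed by one sample ID per line in its first column, and the output is written under my_path/ for every step (the original step 2 wrote sampleID#.qza without a path). It only builds argument lists, reusing the flags shown in the steps.

```python
import csv
import io


def read_sample_ids(tsv_text):
    """Return sample IDs from the first column of a metadata TSV, skipping the header row."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows, None)  # skip the header row (e.g. "SampleID")
    # skip blank lines and comment/directive lines that start with '#'
    return [row[0] for row in rows
            if row and row[0].strip() and not row[0].startswith("#")]


def build_commands(sample_id, path="my_path"):
    """Build the three qiime calls for one sample as argument lists (hypothetical helper)."""
    table = f"{path}/{sample_id}.qza"
    seqs = f"{path}/seq-{sample_id}.qza"
    return [
        ["qiime", "feature-table", "filter-samples",
         "--i-table", f"{path}/table-nonchimeric-wo-borderline.qza",
         "--m-metadata-file", "names.tsv",
         "--p-where", f"SampleID='{sample_id}'",
         "--o-filtered-table", table],
        ["qiime", "feature-table", "filter-seqs",
         "--i-data", f"{path}/rep-seqs-nonchimeric-wo-borderline.qza",
         "--i-table", table,
         "--o-filtered-data", seqs],
        # qiime tools export writes into a directory, named here after the sample
        ["qiime", "tools", "export",
         "--input-path", seqs,
         "--output-path", f"{path}/{sample_id}"],
    ]
```

Each list could be passed to subprocess.run(cmd, check=True), so a failing step stops the loop instead of a sample being silently skipped.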

Looking back, the scripts are not the prettiest, but at some point they did actually give me the outputs I needed.

At this point, I feel like I am either making the silliest mistake somewhere, or the version I'm using no longer works because of its age or newer updates. If it is the latter, how okay is it to use a newer version just for the export steps? I know versions can make enough of a difference in the data, and I really don't want to introduce any inconsistencies or inaccuracies.

Sorry for such a long post, and thank you in advance!

thermokarst · November 17, 2021, 11:17pm

Is it possible that your Feature IDs in your FeatureTable[Frequency] don't match the Feature IDs in your FeatureData[Sequence]? You can check this by running qiime feature-table summarize on the FeatureTable[Frequency] and qiime feature-table tabulate-seqs on the FeatureData[Sequence], then comparing the Feature IDs listed in the two outputs.

Alternatively, you can perform this check using the QIIME 2 Artifact API in Python:

import qiime2
import pandas as pd

table_fp = 'my_path/table-nonchimeric-wo-borderline.qza'
seqs_fp = 'my_path/rep-seqs-nonchimeric-wo-borderline.qza'

# load both artifacts
table_artifact = qiime2.Artifact.load(table_fp)
seqs_artifact = qiime2.Artifact.load(seqs_fp)

# FeatureTable[Frequency] as a DataFrame (rows = samples, columns = features);
# FeatureData[Sequence] as a Series indexed by feature ID
table_df = table_artifact.view(pd.DataFrame)
seqs_srs = seqs_artifact.view(pd.Series)

table_ids = set(table_df.columns)
seqs_ids = set(seqs_srs.index)

# number of feature IDs present in both artifacts
print(len(table_ids & seqs_ids))

The final statement prints how many feature IDs the two artifacts share, so it should be greater than zero. If it prints 0, now you know where the problem is.

EDIT - I forgot to mention: you'll also want to double-check this on the filtered table as well as the unfiltered one.
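To see why an empty intersection leads to "All features were filtered out of the data", here is a plain-Python sketch of the idea behind filter-seqs: keep only the sequences whose feature IDs also appear as features in the table. This is an illustration, not QIIME 2's implementation; the function name filter_seqs_by_table and the dict stand-ins for the two artifacts are hypothetical, and the error text simply mirrors the message reported at the start of this thread.

```python
def filter_seqs_by_table(seqs, table):
    """Keep sequences whose IDs are features in `table` (hypothetical sketch).

    seqs:  dict mapping feature ID -> sequence string
    table: dict mapping sample ID -> {feature ID: count}
    """
    # every feature ID that appears in any sample of the table
    table_features = {fid for counts in table.values() for fid in counts}
    kept = {fid: seq for fid, seq in seqs.items() if fid in table_features}
    if not kept:
        # an empty table, or one whose IDs match none of the sequences, ends here
        raise ValueError("All features were filtered out of the data.")
    return kept
```

In this model, a per-sample table that came out of the sample filtering step with no samples (and so no features) fails in exactly the same way as a table whose feature IDs are all different from the sequence IDs.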

HinakoT · November 18, 2021, 11:12pm

Thank you @thermokarst for such a quick reply!

I checked as you suggested, and it was the filtered tables that did not contain the IDs they should.
Since both the original table and rep-seqs did have the IDs, and all the filtered tables from step 2 in my original post lacked IDs, I'm now assuming something went wrong in feature-table filter-samples.
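To see exactly which IDs a filtered table is missing compared with the unfiltered one (or compared with the rep-seqs), a small set comparison goes one step beyond the intersection count. A minimal sketch: compare_ids is a hypothetical helper, not part of the QIIME 2 API, and it assumes you already have the two ID collections, for example table_ids and seqs_ids as built with the Artifact API earlier in this thread.

```python
def compare_ids(a_ids, b_ids, show=5):
    """Summarize how two collections of IDs overlap (hypothetical helper)."""
    a, b = set(a_ids), set(b_ids)
    report = {
        "shared": len(a & b),
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }
    # print only a short preview so long ID lists don't flood the terminal
    print(f"shared: {report['shared']}")
    print(f"only in first ({len(report['only_in_first'])}):",
          report["only_in_first"][:show])
    print(f"only in second ({len(report['only_in_second'])}):",
          report["only_in_second"][:show])
    return report
```

A filtered table whose "shared" count is 0 while the unfiltered table's count is not points at the filtering step rather than at the input artifacts.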

To be completely honest, I am not sure how to proceed from this point. I've checked the surrounding little scripts, and the qiime2 command for each sample seems to be complete. When I check the job log, for each sample I get the message Saved FeatureTable[Frequency] to: my_path/sampleID#.qza, without any errors or warnings.

What do you suggest I could do to troubleshoot further so I can resolve this?

Thank you again!

thermokarst · November 18, 2021, 11:29pm

The most common reason for this kind of filtering to behave this way is a problem with the --p-where statement. I noticed yours is:

--p-where "SampleID='sampleID#'"

This means that the ID column of your sample metadata (names.tsv) must be named SampleID. Can you confirm what you named that column? If it isn't SampleID, you should either rename it to that or update your where clause to use the actual ID column name.
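QIIME 2 documents --p-where as an SQLite WHERE clause evaluated against the sample metadata, so the ID in the clause has to match character for character. The sketch below mimics that with Python's built-in sqlite3 module on a toy in-memory table; the function ids_matching, the table name metadata and the example IDs are illustrative assumptions, and this is not QIIME 2's own code.

```python
import sqlite3


def ids_matching(where, sample_ids, id_column="SampleID"):
    """Return the IDs selected by an SQLite WHERE clause over a one-column table."""
    con = sqlite3.connect(":memory:")
    try:
        # quote the column name so headers containing spaces or '#' also work
        con.execute(f'CREATE TABLE metadata ("{id_column}" TEXT)')
        con.executemany("INSERT INTO metadata VALUES (?)",
                        [(sid,) for sid in sample_ids])
        # the clause is pasted into the SQL text, as with a where clause
        rows = con.execute(
            f'SELECT "{id_column}" FROM metadata WHERE {where}').fetchall()
        return [r[0] for r in rows]
    finally:
        con.close()
```

With metadata containing only '0021-0111-0000-B110', the clause SampleID='0021-0111-0000-B110' selects that ID, while SampleID='0021-0111-0000-B1' selects nothing. Whatever IDs the clause does select must then also exist, exactly as written, in the feature table; otherwise the filtered table ends up empty.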

HinakoT · November 19, 2021, 5:50pm

Yes, names.tsv has SampleID as its first column header. I set it that way when the metadata file is created.

thermokarst · November 22, 2021, 3:11pm

To reiterate:

Have you confirmed that there is actually a Sample ID in your table that matches the Sample ID you're filtering to?

If you're still running into issues, please send me download links to your sample metadata and feature table, either here or in a DM. Thanks!

HinakoT · November 25, 2021, 11:39pm

I sent you the metadata and the table!
Thank you!!

thermokarst · November 29, 2021, 3:24pm

Thanks for sharing your data @HinakoT! Your sample IDs in your metadata don't match the sample IDs in your feature table:

import qiime2
import pandas as pd

table_fp = 'table-nonchimeric-wo-borderline.qza'
md_fp = 'names.tsv'

table_artifact = qiime2.Artifact.load(table_fp)
md = qiime2.Metadata.load(md_fp)

# rows of the table DataFrame are sample IDs
table = table_artifact.view(pd.DataFrame)

table_ids = set(table.index)
md_ids = set(md.ids)

# number of sample IDs present in both the table and the metadata
print(len(table_ids & md_ids))

It looks like all of the sample IDs in your table are shorter than the sample IDs in your metadata - they appear to be missing the last 2 or 3 characters: '0021-0111-0000-B1' vs '0021-0111-0000-B110', for example.
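Because the mismatch here looks like a consistent truncation, one quick way to confirm it is to pair each table ID with the metadata IDs that start with it. A minimal sketch: pair_by_prefix is a hypothetical helper, and the prefix rule is an assumption that fits this particular error, not a general ID-matching strategy.

```python
def pair_by_prefix(short_ids, long_ids):
    """Map each short ID to the longer IDs that start with it (hypothetical helper)."""
    pairs = {}
    for short in short_ids:
        # exact matches are not a truncation, so skip them
        hits = sorted(lid for lid in long_ids
                      if lid != short and lid.startswith(short))
        if hits:
            pairs[short] = hits
    return pairs
```

For example, pair_by_prefix(table_ids, md_ids) would pair '0021-0111-0000-B1' with '0021-0111-0000-B110'. A short ID can be a prefix of several metadata IDs, so any entry with more than one hit needs a closer look before renaming anything.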

HinakoT · November 30, 2021, 8:21pm

Thank you for checking them for me!

I looked back at the metadata, and it turned out to be a bash regex error that wasn't fully cutting off the ends of the names. I fixed it, tried again, and it worked seamlessly!

I'm so sorry I took up so much of your time with a silly error like this! I really do appreciate all the help!!
I definitely have a lot more I need to learn about the qiime2 features and plug-ins and data formats!

system · January 1, 2022, 2:22am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.