Recently I (finally) got around to finishing the analysis of some old(er) 16S data, and discovered an error in my bash script that caused one sample to be skipped when exporting .fasta files after the cleanup/chimera-filtering steps in QIIME2.
So I decided to re-run the .fasta export steps using QIIME2 version 2019.1, the same version I had used before, including for cleaning up the raw sequences up to that point. This is where filter-seqs gave me the following error: Plugin error from feature-table: All features were filtered out of the data.
I looked through a few posts on the same error (this, and this, and also this), but they don't quite seem to help with whatever it is that I'm missing or doing wrong. The main suggestion I saw was to check the input file formats, which I did for filter-seqs: the inputs were FeatureTable[Frequency] and FeatureData[Sequence], so I know the file formats are not the problem.
To get .fasta files from a filtered rep-seqs.qza (post-chimera filtering) per sample, I had some little scripts I used before (last in 2019) that do the following:
(where my_path/ is whatever relevant path for each file & sampleID# is the individual sample ID for each sample)
1. Import the individual names of all the sequence samples.
2. Use qiime feature-table filter-samples to get a filtered-table.qza per sample (names.tsv has the individual sample names):
   qiime feature-table filter-samples \
     --i-table my_path/table-nonchimeric-wo-borderline.qza \
     --m-metadata-file names.tsv \
     --p-where "SampleID='sampleID#'" \
     --o-filtered-table sampleID#.qza
3. Use qiime feature-table filter-seqs on the tables from step 2:
   qiime feature-table filter-seqs \
     --i-data my_path/rep-seqs-nonchimeric-wo-borderline.qza \
     --i-table my_path/sampleID#.qza \
     --o-filtered-data my_path/seq-sampleID#.qza
4. Run qiime tools export on the outputs from step 3 to get the .fasta:
   qiime tools export \
     --input-path my_path/seq-sampleID#.qza \
     --output-path my_path/sampleID#.biom
5. The attached(?) dna-sequences.fasta files are then renamed and moved into a final directory.
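For reference, the three QIIME 2 calls per sample can be sketched as a small command builder. This is only an illustration of the steps above, not the original scripts: the function name build_export_commands and the base argument are made up, and it assumes every intermediate file lives under the same my_path/ directory.

```python
def build_export_commands(sample_id, base="my_path"):
    """Build the three QIIME 2 commands for one sample as argument lists.

    Hypothetical helper: the paths mirror the placeholders used in the post.
    """
    table = f"{base}/{sample_id}.qza"
    seqs = f"{base}/seq-{sample_id}.qza"
    return [
        # Step 2: keep only this sample in the feature table.
        ["qiime", "feature-table", "filter-samples",
         "--i-table", f"{base}/table-nonchimeric-wo-borderline.qza",
         "--m-metadata-file", "names.tsv",
         "--p-where", f"SampleID='{sample_id}'",
         "--o-filtered-table", table],
        # Step 3: keep only the sequences still present in that table.
        ["qiime", "feature-table", "filter-seqs",
         "--i-data", f"{base}/rep-seqs-nonchimeric-wo-borderline.qza",
         "--i-table", table,
         "--o-filtered-data", seqs],
        # Step 4: export the per-sample sequences.
        ["qiime", "tools", "export",
         "--input-path", seqs,
         "--output-path", f"{base}/{sample_id}.biom"],
    ]
```

Each list could be handed to subprocess.run(cmd, check=True), which passes the where clause as a single argument (no shell quoting involved) and stops at the first command that fails.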
Looking back, the scripts are not the prettiest, but at some point they actually did give me the outputs I needed.
At this point, I feel like I am either making the silliest mistake somewhere, or the version I'm using no longer works due to time/newer updates. If it is the latter, how okay is it to use a newer version for the export steps? I know versions can make a real difference in the data, and I really don't want to introduce any inconsistencies or inaccuracies.
Sorry for such a long post! & Thank you in advance!
Is it possible that your Feature IDs in your FeatureTable[Frequency] don't match the Feature IDs in your FeatureData[Sequence]? You can check this by running summarize on the FeatureTable[Frequency] and tabulate-seqs on the FeatureData[Sequence] and comparing the Feature IDs listed in the two locations.
Alternatively, you can perform this check using the QIIME 2 Artifact API in python:
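A minimal sketch of such a check, assuming a QIIME 2 environment where pandas is available. The helper names compare_feature_ids and load_feature_ids are illustrative, and the two paths are whatever .qza files you want to compare.

```python
def compare_feature_ids(table_ids, seq_ids):
    """Return (shared, only_in_table, only_in_seqs) as sorted lists."""
    table_ids, seq_ids = set(table_ids), set(seq_ids)
    return (sorted(table_ids & seq_ids),
            sorted(table_ids - seq_ids),
            sorted(seq_ids - table_ids))


def load_feature_ids(table_path, seqs_path):
    """Load the feature IDs from two artifacts (needs QIIME 2 installed)."""
    import pandas as pd
    from qiime2 import Artifact

    # A FeatureTable[Frequency] viewed as a DataFrame has samples as rows
    # and features as columns.
    table = Artifact.load(table_path).view(pd.DataFrame)
    # FeatureData[Sequence] viewed as a Series is indexed by feature ID.
    seqs = Artifact.load(seqs_path).view(pd.Series)
    return list(table.columns), list(seqs.index)
```

Calling compare_feature_ids(*load_feature_ids(table_path, seqs_path)) and finding an empty shared list would be consistent with the "All features were filtered out" error; an empty table_ids list would point at the table itself rather than at an ID mismatch.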
I checked as you suggested, and it was the filtered tables that did not contain the IDs they should.
Since both the original table and rep-seqs did have the IDs, and all the filtered tables from step 2 in my original post now lack IDs, I'm assuming this means that something went wrong in feature-table filter-samples.
To be completely honest, I am not sure how to proceed from this point. I've checked the surrounding little scripts, but the qiime2 command for each sample seems to be complete. When I check the job log, for each sample I am getting a message saying: Saved FeatureTable[Frequency] to: my_path/sampleID#.qza, without any mention of errors or warnings.
What do you suggest I could do to troubleshoot further so I can resolve this?
The most common cause of this kind of filtering behavior is some kind of problem with the --p-where statement. I noticed yours is: --p-where "SampleID='sampleID#'"
This assumes that the ID column of your sample metadata (names.tsv) is named SampleID. Can you confirm what you named that column? If it isn't SampleID, then you should either rename it to that or update your where clause to use the actual ID column name.
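One quick way to check this outside of QIIME 2 is to compare the first header field of the metadata file with the column named in the where clause. The sketch below is a rough heuristic, not part of QIIME 2: both function names are made up, and it assumes a tab-separated file whose first non-blank line is the header and a where clause of the simple column='value' form.

```python
import csv
import io


def id_column_header(metadata_text):
    """Return the first field of the first non-blank line of TSV text."""
    for row in csv.reader(io.StringIO(metadata_text), delimiter="\t"):
        if row and any(cell.strip() for cell in row):
            return row[0].strip()
    raise ValueError("no header row found")


def where_matches_header(where, header):
    """True if a simple column='value' clause names the given column."""
    # Take everything left of the first '=' and drop optional [ ] or " quoting.
    column = where.split("=", 1)[0].strip().strip('[]"')
    return column == header
```

For example, where_matches_header("SampleID='sampleID#'", id_column_header(open("names.tsv").read())) would return False if the header in names.tsv were spelled differently.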
It looks like all of the sample IDs in your table are shorter than the sample IDs in your metadata - they appear to be missing the last 2 or 3 characters: '0021-0111-0000-B1' vs '0021-0111-0000-B110', for example.
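A mismatch like this can be spotted programmatically by looking for table IDs that are missing from the metadata but are a prefix of some metadata ID. A minimal sketch, assuming both sets of IDs have already been read into plain lists of strings (find_truncated_ids is a made-up name):

```python
def find_truncated_ids(table_ids, metadata_ids):
    """Map each table ID absent from the metadata to the metadata IDs it prefixes."""
    metadata_ids = set(metadata_ids)
    report = {}
    for table_id in table_ids:
        if table_id in metadata_ids:
            continue  # exact match, nothing to report
        # An unmatched ID gets an empty list if nothing in the metadata starts with it.
        report[table_id] = sorted(
            m for m in metadata_ids if m.startswith(table_id)
        )
    return report
```

An empty dictionary means every table ID has an exact match in the metadata; any entry means the where clause for that sample cannot select anything.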
I looked back at the metadata, and it turned out to be a bash regex error that wasn't fully cutting off the ends of the IDs. I fixed it, tried again, and it worked seamlessly!
I'm so sorry I took up so much of your time with a silly error like this! I really do appreciate all the help!!
I definitely have a lot more I need to learn about the qiime2 features and plug-ins and data formats!