The framebot step produces these output files:
_framebot.txt - the alignment to the nearest match satisfying the minimum length and protein identity cutoff.
_nucl_corr.fasta and all_seqs_derep_prot_corr.fasta - the frameshift-corrected nucleotide and protein sequences satisfying the minimum length and protein identity cutoff.
_failed_framebot.txt - the alignment to the nearest match that failed the minimum length and protein identity cutoff.
_nucl_failed.fasta - fasta file containing the nucleotide sequences that failed the minimum length and protein identity cutoff.
Is it reasonable to use the output to (1) filter out low-abundance features and (2) classify with vsearch? If anybody has done this before, please share your experience; that would be immensely helpful!
Could you elaborate a little more on what you mean here? Do you want to use the frameshift-corrected sequences downstream in your analysis? Or are you trying to remove the sequences which couldn't be corrected?
I think the larger challenge will be creating a FeatureTable[Frequency] which uses the same features from Framebot, but it should be possible.
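To make the idea concrete, here is a rough sketch of how a frequency table could be built from the corrected nucleotide FASTA outside of QIIME 2. It is only an illustration under some loud assumptions: that each read ID looks like `sampleID_readNumber` (so the sample can be recovered from the header), that identical corrected sequences should collapse into one feature, and that an MD5 hash of the sequence is an acceptable feature ID. The function names are hypothetical and none of this comes from Framebot or QIIME 2 itself.

```python
import hashlib
from collections import defaultdict


def parse_fasta(lines):
    """Yield (read_id, sequence) pairs from an iterable of FASTA lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            read_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def build_feature_table(records):
    """Count identical sequences per sample.

    ASSUMPTION: read IDs have the form 'sampleID_readNumber', so the sample
    is everything before the last underscore.
    Returns (table, rep_seqs): table[feature_id][sample] -> count and
    rep_seqs[feature_id] -> sequence.
    """
    table = defaultdict(lambda: defaultdict(int))
    rep_seqs = {}
    for read_id, seq in records:
        seq = seq.upper()
        sample = read_id.rsplit("_", 1)[0]
        # ASSUMPTION: an MD5 of the sequence is used as the feature ID.
        feature_id = hashlib.md5(seq.encode()).hexdigest()
        rep_seqs[feature_id] = seq
        table[feature_id][sample] += 1
    return table, rep_seqs


def table_to_tsv(table):
    """Render the table as tab-separated text, features as rows."""
    samples = sorted({s for counts in table.values() for s in counts})
    rows = ["#OTU ID\t" + "\t".join(samples)]
    for fid in sorted(table):
        counts = "\t".join(str(table[fid].get(s, 0)) for s in samples)
        rows.append(fid + "\t" + counts)
    return "\n".join(rows) + "\n"
```

A tab-separated table like this could then be converted to BIOM with the biom-format tools and imported as a FeatureTable[Frequency], with the matching sequences imported as FeatureData[Sequence] under the same feature IDs.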
Not in QIIME 2 at least. We currently expect all of the data to be DNA. (Not to say that we couldn't support protein analysis in the future, we just don't have anything like that right now.)
I would like to use the frameshift-corrected sequences for classification. I noticed that I will need to cluster the output and, based on your comment, that I may need to look into another tool to do this. Then, the data could just be converted back to an artifact for vsearch classification. Unfortunately I have not come up with a solution yet to obtain a FeatureTable[Frequency] based on the amino acid sequence clustering...
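For the low-abundance filtering part, something like the following sketch might be a starting point before converting anything back to an artifact. It assumes a table held as a plain nested dictionary (`feature_id -> sample -> count`) and a dictionary of representative sequences keyed by the same feature IDs; both structures and the function names are hypothetical, and the threshold is arbitrary. Once a table is inside QIIME 2, `qiime feature-table filter-features` with `--p-min-frequency` does the same job.

```python
def filter_low_abundance(table, min_total=2):
    """Keep features whose summed count across all samples is >= min_total.

    `table` maps feature_id -> {sample: count}; a new dict is returned and
    the input is left unchanged.
    """
    return {
        fid: dict(counts)
        for fid, counts in table.items()
        if sum(counts.values()) >= min_total
    }


def to_fasta(rep_seqs, keep_ids):
    """Return FASTA text for the retained features, sorted by feature ID."""
    return "".join(
        f">{fid}\n{rep_seqs[fid]}\n"
        for fid in sorted(keep_ids)
        if fid in rep_seqs
    )


# Tiny made-up example: feature "b" is a singleton and gets dropped.
example_table = {"a": {"s1": 3, "s2": 1}, "b": {"s1": 1}}
example_seqs = {"a": "ACGT", "b": "TTTT"}
kept = filter_low_abundance(example_table, min_total=2)
kept_fasta = to_fasta(example_seqs, kept)
```

The filtered FASTA text is what would be imported for vsearch classification, so the sequences and the table keep the same set of feature IDs.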