RDP's Framebot integration to Qiime 2

steff1088 · December 14, 2017, 11:40pm

Hi all,

I was looking into possibly integrating the RDP program Framebot into my QIIME 2 workflow. Here is some info about Framebot: GitHub - rdpstaff/Framebot at 32cac097904e7c650ca73612b448d016cd543e6b

The framebot step produces these output files:
_framebot.txt - the alignment to the nearest match satisfying the minimum length and protein identity cutoff.
_nucl_corr.fasta and all_seqs_derep_prot_corr.fasta - the frameshift-corrected nucleotide and protein sequences satisfying the minimum length and protein identity cutoff.
_failed_framebot.txt - the alignment to the nearest match that failed the minimum length and protein identity cutoff.
_nucl_failed.fasta - fasta file containing the nucleotide sequences that failed the minimum length and protein identity cutoff.

Is it reasonable to use this output to (1) filter out low-abundance features and (2) classify with vsearch? If anybody has done this before, please share your experience; that would be immensely helpful!

cheers,
steffen

steff1088 · December 15, 2017, 5:53pm

One question to add here: Is there a way to cluster amino acid sequences with vsearch?

ebolyen · December 15, 2017, 9:03pm

Hey @steff1088,

Could you elaborate on what you mean here a little bit more? Do you want to use the frameshift corrected sequences downstream in your analysis? Or are you trying to remove the sequences which couldn't be corrected?

For (2), if you import your nucleotide fasta as FeatureData[Sequence], you should be able to use feature-classifier classify-consensus-vsearch, unless I'm missing something.
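As a rough sketch of those two steps, the Python snippet below only builds and prints the QIIME 2 command lines; it does not run them. All file names (framebot_nucl_corr.fasta, ref-seqs.qza, ref-taxonomy.qza, and so on) are placeholders assumed for illustration, and the reference reads and taxonomy artifacts are assumed to exist already. Exact options can differ between QIIME 2 releases, so check `--help` for your version.

```python
import shlex


def import_command(fasta_path, qza_path):
    """Build a `qiime tools import` call for a nucleotide FASTA file."""
    return [
        "qiime", "tools", "import",
        "--type", "FeatureData[Sequence]",
        "--input-path", fasta_path,
        "--output-path", qza_path,
    ]


def classify_command(query_qza, ref_reads_qza, ref_taxonomy_qza, output_dir):
    """Build a `qiime feature-classifier classify-consensus-vsearch` call."""
    return [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", query_qza,
        "--i-reference-reads", ref_reads_qza,
        "--i-reference-taxonomy", ref_taxonomy_qza,
        "--output-dir", output_dir,
    ]


# Placeholder file names, assumed for illustration only.
commands = [
    import_command("framebot_nucl_corr.fasta", "framebot-seqs.qza"),
    classify_command("framebot-seqs.qza", "ref-seqs.qza",
                     "ref-taxonomy.qza", "vsearch-classification"),
]

for cmd in commands:
    # Print a shell-safe version of each command for review.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually execute a command, each list could be passed to `subprocess.run(cmd, check=True)` in an environment where QIIME 2 is installed.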

I think the larger challenge will be creating a FeatureTable[Frequency] which uses the same features from Framebot, but it should be possible.

As for clustering amino acid sequences with vsearch: not in QIIME 2, at least. We currently expect all of the data to be DNA. (Not to say that we couldn't support protein analysis in the future; we just don't have anything like that right now.)

steff1088 · December 15, 2017, 9:36pm

Thanks @ebolyen for your response.

I would like to use the frameshift-corrected sequences for classification. I noticed that I will need to cluster the output and, based on your comment, that I may need to look into another tool to do this. The data could then just be converted back to an artifact for vsearch classification. Unfortunately, I have not yet come up with a way to obtain a FeatureTable[Frequency] based on the amino acid sequence clustering...
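One possible starting point for the FeatureTable[Frequency] part, sketched here in Python, is to build a count table outside QIIME 2. Note that this uses exact-match dereplication, not real clustering, and it rests on assumptions that are not from this thread: the FASTA has one record per read (not already dereplicated), and read IDs look like `<sample>_<read number>`. The function names and the MD5-based feature IDs are choices made for this sketch.

```python
import csv
import hashlib
import io
from collections import Counter, defaultdict


def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def build_table(records):
    """Dereplicate reads by exact sequence and count them per sample.

    Assumes read IDs of the form <sample>_<read number> (hypothetical).
    """
    features = {}                   # feature ID -> sequence
    counts = defaultdict(Counter)   # feature ID -> {sample: read count}
    for record_id, seq in records:
        sample = record_id.rsplit("_", 1)[0]
        seq = seq.upper()
        feature_id = hashlib.md5(seq.encode("ascii")).hexdigest()
        features[feature_id] = seq
        counts[feature_id][sample] += 1
    return features, counts


def drop_rare(counts, min_total):
    """Keep only features whose total count across samples is >= min_total."""
    return {fid: c for fid, c in counts.items() if sum(c.values()) >= min_total}


def write_table(counts, handle):
    """Write a tab-separated feature-by-sample count table."""
    samples = sorted({s for per_sample in counts.values() for s in per_sample})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#OTU ID"] + samples)
    for feature_id in sorted(counts):
        writer.writerow([feature_id] + [counts[feature_id][s] for s in samples])


# Tiny made-up example: four reads from two samples.
example = """>sampleA_1
ATGGCT
>sampleA_2
ATGGCT
>sampleB_1
ATGGCT
>sampleB_2
ATGAAA
""".splitlines()

features, counts = build_table(read_fasta(example))
table_tsv = io.StringIO()
write_table(drop_rare(counts, min_total=2), table_tsv)
print(table_tsv.getvalue())
```

A table in this layout should be convertible to BIOM with the `biom convert` tool and then importable as FeatureTable[Frequency], with the matching sequences written to a FASTA whose record IDs are the same feature IDs. If the Framebot input was already dereplicated, a separate read-to-representative mapping would be needed instead of the ID convention assumed here.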

system · January 16, 2018, 3:36am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.