I normally use Vsearch and a custom database consisting of the gene we have amplified and sequenced. I use “qiime vsearch cluster-features-open-reference” and representative sequences outputted from DADA2. This works perfectly.
Instead of using this custom database I tried the same procedure but instead of the extracted genes I have the entire genomes of the bacterial strains as database. I made sure that the extracted genes I normally look for are present in the new database. However, when running Vsearch it does not find the bacterial strains any longer (The clusters are identical but it outputs random names instead of the species names from the database). If I add a “> TEST” with just the amplified gene to the database it finds this one, so the procedure seems to work.
I am not really sure how to troubleshoot this. Anyone got an idea?
We need a little more information in order to help you:
- What version of QIIME 2 are you using?
- What are the commands you are running? Copy and paste please.
- What are the results or errors you are seeing? Copy and paste any relevant terminal output, and provide any relevant QZAs or QZVs (or screenshots).
Thank you for the quick reply and sorry for not providing much information.
I am using qiime2 2019.7.
Database is loaded in by:
qiime tools import
Denoised etc with DADA2
qiime dada2 denoise-paired
OTU/taxonomy with Vsearch
qiime vsearch cluster-features-open-reference
I don't get any errors as such but when I use full genomes Vsearch only assigns a few OTUS. In the below I added a >test gene to the database
When I don't have that added it doesn't find it in the full genome database
See next post for screendumps of the database
Hope this clarifies the question. Let me know if you need more information
Screendump of the ">test" added to the database file
Screendump of the exact same sequence present in one of the full genomes in the same database file:
I am sorry my response has been so slow. Your question is on my radar and I am doing research to get you an answer.
Thank you so much for your patience.
Hey Chloe. Thank you for the update. There is no need to spend too much time on - I kinda hoped there was something obvious I’d missed. I can just extract the alleles from the full genome database and use that. A bit time-consuming but it will provide the answers I need in this particular case, I think
cluster-features-open-reference has a “strand” parameter, which is set to “plus” by default (i.e., only searches alignments in the forward direction). This is often a cause of failed searches, if the reference is in the reverse orientation. Try setting this parameter to “both” and let us know what happens.
Hey @Nicholas_Bokulich ,
Thank you for the reply. I tried your suggestion and unfortunately it had no effect. Since I know just having the allele in the database works, while having the whole genome didn't, I decided to brute force my way through to find the problem by iterative removing bases from the 5' end of the full genome till it works. In the below, "test2" works, however, when there is a bit more 5' it does not work anymore.
I did not find the exact place (I could continue but it got tiresome), however, based on what I found it seems like VSEARCH stops working when the database entry gets too long (?). Let me know if the below images don't make sense.
Sounds like perhaps an issue with VSEARCH itself. Increasing reference length will of course impact the time and search results. You could adjust the
max-rejects parameters as well.
Otherwise, I think that trimming the reference to the gene of interest may be the solution.
Hi again @Nicholas_Bokulich,
Yes… that is also what I fear. Strange though - I thought it was pretty common to look for reference sequences among full genomes. Anyways, I will just extract the GOI and use that in a smaller database. Thank you all for you time!
Maybe not with VSEARCH (or at least not for OTU clustering as you are doing). VSEARCH performs an initial search via kmer matching prior to alignment, so including the full genome may skew the kmer profile too much.
My recommendation not only because it will fix your problem; also it will be faster and more efficient downstream!
This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.