Since I had unassigned taxa and features classified only as Bacteria at the domain level (D_0__), I tried filtering these out using the command:
qiime taxa filter-table
When I used "Unassigned;;;;" I got an error message saying: "All features were filtered, resulting in an empty table." Prior to this, I also had other contaminants such as D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast;;, but I removed those unwanted sequences using qiime taxa filter-table
--p-exclude "Unassigned;;;;,D_0__Bacteria;;;;,D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast;__,D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast;D_4__Vicia faba (fava bean)"
It seems that trying to remove:
Unassigned gives me an error saying the feature table is empty.
D_0__Bacteria does not actually remove it, as seen in the taxa barplot.
(However, from the taxa barplot I can see that removing the unassigned features should not give an empty feature table.)
What else can I do? Could you please let me know?
I have already trained the classifier on the V4 region using the SILVA 132 database and my forward primers (single-end reads).
Also, please note: I have looked through the forum for similar problems, especially this, and my commands above were based on that forum post, yet the problem remains.
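A small sketch may help show why the choice of search term matters here. It is a toy approximation in plain Python, not QIIME 2's implementation: `kept_features` and the `TAXONOMY` entries are hypothetical, the case-insensitive substring behavior is an assumption, and it assumes the stored taxonomy strings do not carry the trailing ";" or ";__" padding that appears in barplot labels. It does not attempt to reproduce the empty-table error.

```python
def kept_features(taxonomy, exclude, mode="contains"):
    """Return the sorted feature IDs that survive an exclusion filter.

    Toy approximation only (not QIIME 2's implementation):
    - "contains": a feature is dropped if any term is a case-insensitive
      substring of its taxonomy string (case handling is an assumption).
    - "exact": a feature is dropped only if a term equals the whole string.
    """
    if mode not in ("contains", "exact"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Terms are comma-separated, as in the --p-exclude argument above.
    terms = [term.strip() for term in exclude.split(",") if term.strip()]
    kept = []
    for feature_id, taxon in taxonomy.items():
        if mode == "exact":
            dropped = taxon in terms
        else:
            dropped = any(term.lower() in taxon.lower() for term in terms)
        if not dropped:
            kept.append(feature_id)
    return sorted(kept)


# Hypothetical taxonomy strings; ASSUMED to be stored without the trailing
# ";" or ";__" padding that barplot labels show.
TAXONOMY = {
    "f1": "Unassigned",
    "f2": "D_0__Bacteria",
    "f3": "D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyphotobacteria;D_3__Chloroplast",
    "f4": "D_0__Bacteria;D_1__Proteobacteria;D_2__Gammaproteobacteria",
}

# Terms copied from barplot labels, with trailing semicolons.
padded = kept_features(TAXONOMY, "Unassigned;;;;,D_0__Bacteria;;;;")
# Bare labels matched against the whole taxonomy string.
exact = kept_features(TAXONOMY, "Unassigned,D_0__Bacteria", mode="exact")
# Bare domain label matched as a substring.
broad = kept_features(TAXONOMY, "D_0__Bacteria")

print(padded)
print(exact)
print(broad)
```

In this toy data the padded terms match nothing, the exact terms remove only the two unwanted features, and the bare domain term in substring mode removes every bacterial feature. If the real taxonomy strings behave the same way, passing the bare labels together with `--p-mode exact` to `qiime taxa filter-table` may be worth trying.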
Yes, I checked a few of my sequences using BLAST, and there was plenty of host DNA. But I assumed that I would be able to remove the host DNA using qiime taxa filter-table, and it seems this didn’t work. I used SILVA at 97% homology.
“What I did was a strict filter with a qiime quality control step with 99% homology to Greengenes”
I was not using Greengenes because it is not updated regularly. But I think I might give it a try, since I don’t know how else to proceed.
Sorry, see the previous post: they did a quality control step with 99% homology, and I’m sure SILVA will work as well. You can likely run a filter that produces hits.qza and misses.qza, then filter your table/rep-seqs with the sequences that pass. Ben
Just as a suggestion: in the linked post I used the 99% database because I was much more naive back then. As mentioned in that link, you can achieve basically the same quality in much less time by using the 88% Greengenes OTUs (88_gg) and reducing your identity threshold to something like 65-85%. The idea behind the positive filter is just to discard really foreign-looking sequences. I doubt the choice between GG and SILVA would really matter in this step. I could still be naive, though!
Yeah, I would say that this quality control step probably takes the longest. My pipeline without the step would take 3-4 hours from import to taxonomy bar plot creation on an HPC, but the quality control step with 99% homology adds 4-5 hours to that run. Ben
This is because BLAST is not particularly fast, and that is all this command is doing: using BLAST to align your query sequences against the reference sequences.
Indeed, this is the reason to use a small database, e.g., the 88% OTUs as @Mehrbod_Estaki proposes. The goal of this step is just to do a rough filter if you are trying to separate host DNA from bacterial DNA.
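The positive filter described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the actual command: the real step aligns queries with BLAST, whereas this toy uses position-wise identity on equal-length strings as a stand-in. All names (`identity`, `split_hits_and_misses`, `filter_table`) and all sequences and counts are hypothetical.

```python
def identity(query, reference):
    """Fraction of matching positions between two equal-length toy sequences.

    Stand-in for a BLAST percent identity; real alignments handle gaps
    and unequal lengths, which this toy does not.
    """
    if len(query) != len(reference):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(q == r for q, r in zip(query, reference)) / len(query)


def split_hits_and_misses(queries, references, min_identity):
    """Split queries into hits (resemble some reference) and misses."""
    hits, misses = {}, {}
    for feature_id, seq in queries.items():
        best = max(identity(seq, ref) for ref in references)
        target = hits if best >= min_identity else misses
        target[feature_id] = seq
    return hits, misses


def filter_table(table, keep_ids):
    """Keep only the features whose IDs are in keep_ids, per sample."""
    return {
        sample: {fid: n for fid, n in counts.items() if fid in keep_ids}
        for sample, counts in table.items()
    }


# Hypothetical data: one small reference, one bacterial-like query,
# and one foreign-looking (host-like) query.
references = ["ACGTACGTAC"]
queries = {
    "feat_bacterial": "ACGTACGTAA",  # 9 of 10 positions match
    "feat_host": "TTTTGGGGCC",       # 3 of 10 positions match
}
table = {
    "sample1": {"feat_bacterial": 120, "feat_host": 900},
    "sample2": {"feat_bacterial": 80, "feat_host": 15},
}

# A loose threshold in the 65-85% range suggested above.
hits, misses = split_hits_and_misses(queries, references, min_identity=0.65)
filtered = filter_table(table, set(hits))
print(sorted(hits), sorted(misses))
print(filtered)
```

The threshold is deliberately loose because the goal is only to drop sequences that look nothing like the reference, which is also why a small reference set can be enough. In QIIME 2 the corresponding steps would presumably be `qiime quality-control exclude-seqs` to produce the hits and misses, followed by `qiime feature-table filter-features` with the hits as metadata.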
Update and edit: [For the moderators: Please don't queue this. I was trying to edit this text but found that I had to either reply or delete the entire message. Of the two, I thought it was better to post the edited text as a reply and delete the previous post. I did this because I recently realized that I had included a few of my user ID details in the deleted post, so I have removed that information. The commands remain the same as in the deleted post, with only the personal IDs removed. Apologies for the queueing notification.]
This time I used just a subset of my samples, instead of all of them, to try to get rid of the features labeled Unassigned and D_0__Bacteria.
My initial plot was testtaxa-bar-plots.qzv (331.5 KB)
Nope! Not with SILVA. Following a discussion with my colleague, I have been using RDP with an identity of 0.50, and this problem seems to go away. I must mention, though, that I now get Archaea, and there is no longer any issue with D_0__ or unclassified features; I then removed them using qiime taxa filter-table.
I am still trying to figure out the rest; in case I discover more, I will post here.
Oh I see, so perhaps the RDP database you had been using was bacteria only, or at least missing the Archaea that you have in your samples. Please let us know when you have final resolution, or run into any more issues.