Hello,
I have a dataset with only 24 samples, 16S Illumina MiSeq 2x250 paired-end. I am getting this error, but not a lot of info on why; nothing comes out when I use verbose.
Thank you for getting back so quickly. I realized this after fishing around some more: while my computer has plenty of RAM available, it looks like the max the VM will allow me to use is 16,384 MB, which is actually in the "red" zone of the RAM slider; the green stops at about 11,000 MB. This seems woefully low; from what I read in other threads, you need over 4 GB for the SILVA file to work correctly. Could this be something I can alter by re-installing the QIIME 2 virtual machine and changing a default?
I'm using 16S V4 2x250 MiSeq reads. The source is plant material, so I know I have mitochondria/plastid sequences I need to remove, and I read that SILVA is better for that.
Having 11,000 MB is actually very good. But were you using that much when you got the out of memory error?
I say shut down the VM, open up the settings and move that slider up to 11 or 12 GB, then try running this plugin again. You only have to shut down the VM to change memory (no need to reinstall).
I should also add that I've been keeping all input/output files in the shared folder on my host computer, perhaps mistakenly believing that this way my fastq.gz files and such would not use up VM RAM.
Using a shared folder to store your data should be fine -- that's using hard disk space, which is different than RAM (memory). Did @colinbrislawn's suggestions resolve the issue for you?
The slider only goes up to 16,384 MB, and I can't figure out how to change the upper limit, even with re-installing. There is an option to change the default when you re-install, but when I go to settings, the upper limit of RAM doesn't change. I wonder if Windows 10 is thwarting me; it wouldn't be the first time. The PC host has over 300 GB free at the moment; we got it specifically for this kind of data analysis.
I was using max RAM, and it still errors out. I found this thread
that suggested up to 30 GB is needed for the SILVA classifier. I have that and more free on our computer, but I can't seem to increase the amount of RAM allocated to the VM. Makes me wish we had bought a Mac.
Ah, retraining could take more memory than classifying, but I'm still surprised that it takes 30 GB. Oh well
Can you use the pre-trained silva database? Or have you considered using a different taxonomy assignment that does not have these massive requirements? I'm a big fan of search + LCA methods like classify-consensus-vsearch.
It is the pre-trained one downloaded from the website (silva-119-99-515-806-nb-classifier.qza), and I believe it is the same one that was used in the thread I linked. The thread title was mistaken: they were actually using the pre-trained SILVA file in the same way I am attempting to.
My samples are endophytic bacteria extracted from plant leaves, so I know I will need to filter out plastid/mitochondrial sequences from plant DNA co-amplification. I read that the SILVA classifier has those sequences but Greengenes does not.
It is disappointing because I have plenty of RAM on my computer, but it doesn't seem that I can increase the VirtualBox maximum allocation, unless there is a setting on my host PC somewhere that can be changed. All I seem to find online is directions to the settings bar in the VM, which I have set at max (about 11 GB). I can see in the task manager that a few minutes into the command the memory goes up to 95%, and then it goes kaput.
I think SILVA + classify-consensus-vsearch is probably a good bet because it's included with qiime and should 'just work,' but I'll leave this decision up to you.
Hi @ADL! I second @colinbrislawn's suggestion to try out classify-consensus-vsearch or classify-consensus-blast with either Greengenes, SILVA, or RDP reference sequences.
You're mistaking hard disk space (i.e. storage) for RAM (memory). I think you have 300GB of storage space, but only ~16GB of RAM, which is why you are only able to use ~11GB RAM for the virtual machine.
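To see the two numbers side by side, here is a minimal sketch in Python using only the standard library. The function names are made up for illustration. The RAM lookup relies on `os.sysconf`, which is available on Linux (for example inside the VM) but not on a Windows host, where this sketch simply returns None.

```python
import os
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def disk_free_gib(path="."):
    """Free storage (hard disk) space on the drive holding `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def ram_total_gib():
    """Total physical RAM in GiB, or None if the platform cannot report it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows, and some platforms lack these keys.
        return None
    return page_size * page_count / GIB


if __name__ == "__main__":
    print(f"Free disk space: {disk_free_gib():.1f} GiB")
    ram = ram_total_gib()
    if ram is None:
        print("Total RAM: not available on this platform")
    else:
        print(f"Total RAM: {ram:.1f} GiB")
```

Run inside the VM, the second number should roughly match what the memory slider is set to, while the first reflects storage and has no bearing on out-of-memory errors.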
If you want to try out the SILVA pre-trained classifier (or train your own classifier), you could try using the QIIME 2 Amazon EC2 image with an instance type that has more than 30GB RAM. After you're done with this memory-intensive step, you can download your data and continue analyses locally.
Oh, I should have mentioned that you will have to import the reference sequences and taxonomy into QIIME 2 artifacts, specifically a FeatureData[Sequence] and a FeatureData[Taxonomy] artifact.
You were reading bad advice: Greengenes does contain plastid and mitochondrial sequences. (Just to be clear, I'm not partial to any of these databases, but I used Greengenes in the past with plant samples in which I had the very same problem with non-target DNA, so I know that it works.) Let's just take a look at the databases to be sure:
That command is counting the number of entries for 'mitochondria' and 'Chloroplast' in the reference taxonomy file. As you can see, there are many (and possibly more that do not match my search terms exactly).
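For anyone who wants to repeat this kind of check, here is a rough Python sketch of the same idea. It is not the command from the post: the function name is hypothetical, and the demo lines are made up in the style of a tab-separated Greengenes taxonomy file (ID, then lineage). The match is case-sensitive, so variants such as "Mitochondria" would be missed, just as noted above.

```python
def count_taxonomy_matches(lines, terms=("mitochondria", "Chloroplast")):
    """Count how many taxonomy lines contain each search term (case-sensitive)."""
    counts = {term: 0 for term in terms}
    for line in lines:
        for term in terms:
            if term in line:
                counts[term] += 1
    return counts


if __name__ == "__main__":
    # Made-up example lines; a real file would be read with
    #   with open("reference_taxonomy.txt") as handle:   # hypothetical path
    #       counts = count_taxonomy_matches(handle)
    demo_lines = [
        "1\tk__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
        "o__Rickettsiales; f__mitochondria; g__; s__\n",
        "2\tk__Bacteria; p__Cyanobacteria; c__Chloroplast; "
        "o__Streptophyta; f__; g__; s__\n",
        "3\tk__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__\n",
    ]
    for term, n in count_taxonomy_matches(demo_lines).items():
        print(f"{term}: {n}")
```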
Perhaps the report you read suggested that SILVA has more plastid seqs or sequences specific to your host organism (I don't know these specifics), but that probably doesn't matter too much here: chances are the query plastid seqs will assign to some plastid reference sequence, and since you are removing them it doesn't matter which one.
So Greengenes should work for your needs (and has much, much lower memory requirements than SILVA since it's around 1/4 the size).
But if you want to go with SILVA and just can't get past these memory issues, I agree with @colinbrislawn and @jairideout: use classify-consensus-blast or classify-consensus-vsearch. These methods perform quite well (not quite as good as classify-sklearn, but in the same ballpark) and can be a lot easier to work with for users who are familiar with the underlying alignment algorithms.
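To give a feel for what the "consensus" part of these methods means, here is a toy Python sketch. It is not QIIME 2's implementation: the function name, the `min_consensus` parameter, and the example lineages are all assumptions for illustration. Given the lineages of the top database hits for one query sequence, it keeps each rank only while at least `min_consensus` of the hits agree on it.

```python
from collections import Counter


def consensus_taxonomy(hit_lineages, min_consensus=0.51):
    """Deepest lineage shared by at least `min_consensus` of the hits.

    `hit_lineages` is a list of semicolon-separated lineage strings, one per
    database hit. Returns "Unassigned" if there are no hits or no agreement.
    """
    if not 0.5 < min_consensus <= 1.0:
        # Above 0.5 there can be only one majority lineage at each rank.
        raise ValueError("min_consensus must be in (0.5, 1.0]")
    if not hit_lineages:
        return "Unassigned"

    split = [[rank.strip() for rank in lineage.split(";")] for lineage in hit_lineages]
    n_hits = len(split)
    consensus = []
    for depth in range(1, max(len(ranks) for ranks in split) + 1):
        # Count lineage prefixes of this depth across all hits.
        prefixes = Counter(tuple(ranks[:depth]) for ranks in split if len(ranks) >= depth)
        prefix, count = prefixes.most_common(1)[0]
        if count / n_hits < min_consensus:
            break  # not enough agreement at this rank, stop here
        consensus = list(prefix)
    return "; ".join(consensus) if consensus else "Unassigned"


if __name__ == "__main__":
    hits = [
        "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
        "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
        "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria",
    ]
    print(consensus_taxonomy(hits))                     # two of three hits agree at every rank
    print(consensus_taxonomy(hits, min_consensus=1.0))  # only the kingdom is unanimous
```

Because the work is a database search plus this kind of vote, there is no large trained model to hold in memory, which is why these methods are suggested above as the lower-memory route.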