Hello,
I have a dataset with only 24 samples, 16S Illumina MiSeq 2x250 paired-end. I am getting this error, but there is not a lot of info on why, and nothing comes out when I use the verbose option.
Thank you for getting back so quickly. I realized this after fishing around some more, and while my computer has plenty of RAM available, it looks like the max the VM will allow me to use is 16,384 MB, which is actually in the "red" zone of the bar that you can adjust the RAM on; the green stops at about 11,000 MB. This seems woefully low, since from what I read in other threads you need over 4 GB for the SILVA file to work correctly. Could this be something I can alter by re-installing the QIIME 2 virtual machine and changing a default?
I'm using 16S V4 2x250 MiSeq reads. The source is plant material, so I know I have mitochondria/plastid sequences I need to remove, and I read that SILVA is better for that.
Having 11,000 MB is actually very good. But were you using that much when you got the out of memory error?
I say shut down the VM, open up the settings and move that slider up to 11 or 12 GB, then try running this plugin again. You only have to shut down the VM to change memory (no need to reinstall).
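If the Settings slider fights you, you can also set the memory allocation from the host's command line with VBoxManage while the VM is powered off. A minimal sketch, assuming a VM named "QIIME 2 Core" (use the first command to find your VM's actual name):

```
# Run these on the host (not inside the VM), with the VM shut down.
VBoxManage list vms                                # find the exact name of your VM
VBoxManage modifyvm "QIIME 2 Core" --memory 12288  # allocation in MB (12288 MB = 12 GB)
```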
I also should add that I've been keeping all input/output files in the shared folder on my host computer, perhaps mistakenly believing that I would not use up VM RAM with my fastq.gz files and such.
Using a shared folder to store your data should be fine; that's using hard disk space, which is different than RAM (memory). Did @colinbrislawn's suggestions resolve the issue for you?
The slider only goes up to 16,384 MB, and I can't figure out how to change the upper limit, even with re-installing. There is an option to change the default when you re-install, but when I go to settings, the upper limit of RAM doesn't change. I wonder if Windows 10 is thwarting me; it wouldn't be the first time. The PC host has over 300 GB free at the moment; we got it specifically for this kind of data analysis.
I was using max RAM, and it still errors out. I found this thread
that suggested up to 30 GB is needed for the SILVA classifier. I have that and more free on our computer, but I can't seem to increase the amount of RAM allocated to the VM. Makes me wish we had bought a Mac.
Ah, retraining could take more memory than classifying, but I'm still surprised that it takes 30 GB. Oh well.
Can you use the pre-trained SILVA database? Or have you considered using a different taxonomy assignment method that does not have these massive memory requirements? I'm a big fan of search + LCA methods like classify-consensus-vsearch.
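Roughly, that looks like this; just a sketch, and the artifact names (rep-seqs.qza, the SILVA reference reads/taxonomy, taxonomy.qza) are placeholders for your own files:

```
# Sketch only -- substitute your own query and reference artifacts.
qiime feature-classifier classify-consensus-vsearch \
  --i-query rep-seqs.qza \
  --i-reference-reads silva-ref-seqs.qza \
  --i-reference-taxonomy silva-ref-taxonomy.qza \
  --o-classification taxonomy.qza
```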
It is the pre-trained one downloaded from the website (silva-119-99-515-806-nb-classifier.qza), and I believe it is the same one that was used in the thread I linked. They were mistaken in the thread title; they were actually using the pre-trained SILVA file in the same way I am attempting to.
My samples are endophytic bacteria extracted from plant leaves, so I know I will need to filter out plastid/mitochondrial sequences from plant DNA co-amplification. I read that the SILVA classifier has those sequences but Greengenes does not.
It is disappointing because I have plenty of RAM on my computer, but it doesn't seem that I can increase the VirtualBox maximum allocation unless there is a setting on my host PC somewhere that can be changed; all I seem to find online are directions to the settings bar in the VM, which I have set at max (about 11 GB). I can see in Task Manager that a few minutes into the command the memory goes up to 95%, and then it goes kaput.
I think SILVA + classify-consensus-vsearch is probably a good bet because it's included with QIIME 2 and should "just work," but I'll leave this decision up to you.
Hi @ADL! I second @colinbrislawn's suggestion to try out classify-consensus-vsearch or classify-consensus-blast with either Greengenes, SILVA, or RDP reference sequences.
You're mistaking hard disk space (i.e. storage) for RAM (memory). I think you have 300 GB of storage space, but only ~16 GB of RAM, which is why you are only able to use ~11 GB of RAM for the virtual machine.
If you want to try out the SILVA pre-trained classifier (or train your own classifier), you could try using the QIIME 2 Amazon EC2 image with an instance type that has more than 30 GB of RAM. After you're done with this memory-intensive step, you can download your data and continue analyses locally.
Oh, I should have mentioned that you will have to import them into QIIME 2 artifacts, specifically FeatureData[Sequence] and FeatureData[Taxonomy] artifacts.
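Something along these lines should do it; the file names are placeholders, and note that older QIIME 2 releases used --source-format where newer ones use --input-format:

```
# Placeholder paths -- point these at the reference files you downloaded.
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path ref-seqs.fasta \
  --output-path ref-seqs.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path ref-taxonomy.txt \
  --output-path ref-taxonomy.qza
```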
You were reading bad advice: Greengenes does contain plastid and mitochondrial sequences (just to be clear, I'm not partial to any of these databases, but I used Greengenes in the past with plant samples in which I had the very same problem with non-target DNA, so I know that it works). Let's just take a look at the databases to be sure:
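For example, against the Greengenes reference taxonomy (the path below assumes the gg_13_8_otus download and is just an example):

```
# Count taxonomy entries containing "mitochondria" or "Chloroplast" in the
# Greengenes reference taxonomy (example path from the gg_13_8_otus release).
grep -c "mitochondria" gg_13_8_otus/taxonomy/99_otu_taxonomy.txt
grep -c "Chloroplast" gg_13_8_otus/taxonomy/99_otu_taxonomy.txt
```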
That command is counting the number of entries for "mitochondria" and "Chloroplast" in the reference taxonomy file. As you can see, there are many (and possibly more that do not match my search terms exactly).
Perhaps the report you read suggested that SILVA has more plastid seqs, or sequences specific to your host organism (I don't know these specifics), but that probably doesn't matter too much here: chances are the query plastid seqs will assign to some plastid reference sequence, and since you are removing them it doesn't matter which one.
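Once you have a taxonomy artifact, the removal step itself is straightforward. In recent QIIME 2 releases it looks roughly like this (artifact names are placeholders):

```
# Sketch: drop any feature whose assigned taxonomy string contains
# "mitochondria" or "chloroplast".
qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude mitochondria,chloroplast \
  --o-filtered-table table-no-plastid-mito.qza
```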
So Greengenes should work for your needs (and it has much, much lower memory requirements than SILVA, since it's around 1/4 the size).
But if you want to go with SILVA and just can't get past these memory issues, I agree with @colinbrislawn and @jairideout: use classify-consensus-blast or classify-consensus-vsearch. These methods perform quite well (not quite as well as classify-sklearn, but in the same ballpark) and can be a lot easier to work with for users who are familiar with the underlying alignment algorithms.