Over the last couple of weeks I've been using the great q2-sidle to combine information from six amplicons spanning all nine hypervariable regions (V1V2, V2V3, V3V4, V4V5, V5V7 and V7V9). All the steps of the recommended pipeline went smoothly up until qiime sidle reconstruct-counts.
Here is the command:
qiime sidle reconstruct-counts \
  --p-region V1V2 \
  --i-kmer-map Results/Sidle/db_V1V2_300nt-map.qza \
  --i-regional-alignment Results/Sidle/V1V2_align-map.qza \
  --i-regional-table Results/Sidle/V1V2_table-300nt.qza \
  --p-region V2V3 \
  --i-kmer-map Results/Sidle/db_V2V3_300nt-map.qza \
  --i-regional-alignment Results/Sidle/V2V3_align-map.qza \
  --i-regional-table Results/Sidle/V2V3_table-300nt.qza \
  --p-region V3V4 \
  --i-kmer-map Results/Sidle/db_V3V4_300nt-map.qza \
  --i-regional-alignment Results/Sidle/V3V4_align-map.qza \
  --i-regional-table Results/Sidle/V3V4_table-300nt.qza \
  --p-region V4V5 \
  --i-kmer-map Results/Sidle/db_V4V5_300nt-map.qza \
  --i-regional-alignment Results/Sidle/V4V5_align-map.qza \
  --i-regional-table Results/Sidle/V4V5_table-300nt.qza \
  --p-region V5V7 \
  --i-kmer-map Results/Sidle/db_V5V7_300nt-map.qza \
  --i-regional-alignment Results/Sidle/V5V7_align-map.qza \
  --i-regional-table Results/Sidle/V5V7_table-300nt.qza \
  --p-region V7V9 \
  --i-kmer-map Results/Sidle/db_V7V9_300nt-map.qza \
  --i-regional-alignment Results/Sidle/V7V9_align-map.qza \
  --i-regional-table Results/Sidle/V7V9_table-300nt.qza \
  --p-min-counts 0 \
  --p-block-size 10000 \
  --o-reconstructed-table Results/Sidle/full_table.qza \
  --o-reconstruction-summary Results/Sidle/full_summary.qza \
  --o-reconstruction-map Results/Sidle/full_map.qza
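Since the command repeats the same four-flag group for every region, it can be convenient to generate it from a list of regions, for example to try different subsets of regions. Below is a minimal Python sketch, assuming the file naming pattern used in the command above; the helper `reconstruct_counts_args` and its `prefix` parameter are hypothetical and not part of q2-sidle. It only builds and prints the argument list; it does not run QIIME 2.

```python
import shlex

# Region names and file naming pattern copied from the command above.
REGIONS = ["V1V2", "V2V3", "V3V4", "V4V5", "V5V7", "V7V9"]


def reconstruct_counts_args(regions, outdir="Results/Sidle", prefix="full",
                            min_counts=0, block_size=10000):
    """Return the argv list for `qiime sidle reconstruct-counts`.

    Each region adds one --p-region / --i-kmer-map / --i-regional-alignment /
    --i-regional-table group, in the same order as the command above.
    `prefix` (hypothetical) names the three output artifacts.
    """
    args = ["qiime", "sidle", "reconstruct-counts"]
    for region in regions:
        args += [
            "--p-region", region,
            "--i-kmer-map", f"{outdir}/db_{region}_300nt-map.qza",
            "--i-regional-alignment", f"{outdir}/{region}_align-map.qza",
            "--i-regional-table", f"{outdir}/{region}_table-300nt.qza",
        ]
    args += [
        "--p-min-counts", str(min_counts),
        "--p-block-size", str(block_size),
        "--o-reconstructed-table", f"{outdir}/{prefix}_table.qza",
        "--o-reconstruction-summary", f"{outdir}/{prefix}_summary.qza",
        "--o-reconstruction-map", f"{outdir}/{prefix}_map.qza",
    ]
    return args


if __name__ == "__main__":
    # Print the full six-region command and a two-region test command.
    print(shlex.join(reconstruct_counts_args(REGIONS)))
    print(shlex.join(reconstruct_counts_args(["V1V2", "V5V7"], prefix="V1V2_V5V7")))
```

Giving subset runs their own output prefix keeps them from overwriting the full_* artifacts.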
In every attempt, the process is killed after 15-20 h. I am running this on a machine with 512 GB of RAM available, yet the problem is still insufficient memory, as shown by the out-of-memory message:
Out of memory: Killed process 45808 (qiime) total-vm:522386016kB, anon-rss:513560532kB, file-rss:0kB, shmem-rss:8kB
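To put those numbers in perspective, the sizes can be converted from kB to GiB. This small Python sketch assumes the kernel's "kB" means 1024 bytes; it shows that the resident anonymous memory (anon-rss) was roughly 490 GiB when the process was killed, i.e. essentially all of the machine's RAM.

```python
import re

# OOM-killer line copied from the message above.
OOM_LINE = ("Out of memory: Killed process 45808 (qiime) total-vm:522386016kB, "
            "anon-rss:513560532kB, file-rss:0kB, shmem-rss:8kB")


def oom_usage_gib(line):
    """Return {field: size in GiB} for every '<field>:<n>kB' entry in the line.

    Assumes kB = 1024 bytes, so GiB = kB / 1024**2.
    """
    return {name: int(kb) / 1024 ** 2
            for name, kb in re.findall(r"([\w-]+):(\d+)kB", line)}


if __name__ == "__main__":
    for name, gib in oom_usage_gib(OOM_LINE).items():
        print(f"{name:>10}: {gib:8.2f} GiB")
```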
This seems to be a common issue when working with the SILVA database, and the usual fix is to reduce the number of reads processed per batch, which here means lowering --p-block-size from the default of 10000. I tried 2000, 1000 and 100, and nothing changed.
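In general terms, a block-size parameter works by processing rows in fixed-size slices, so that intermediate results only ever cover one slice at a time. The Python sketch below illustrates that generic idea only; `blocks` is a hypothetical helper and is not q2-sidle's implementation.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def blocks(rows: Sequence[T], block_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `block_size` rows."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


if __name__ == "__main__":
    features = list(range(25))  # stand-in for table rows
    for block in blocks(features, 10):
        # Only this slice is worked on at once, so the per-block working
        # memory scales with block_size (data loaded earlier is unaffected).
        print(len(block))
```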
Looking at the log file (pasted below) and at the script available on GitHub, the reason --p-block-size doesn't help seems clear: the script runs out of memory well before --p-block-size even comes into play (the failure happens at line 127, within …, whereas --p-block-size is only used at line 151). Since the problem appears to come from a part of the code I can't really modify, and I don't have access to a machine with more than 512 GB of RAM, I don't know what to try next.
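A toy model of why shrinking the block size cannot help when memory runs out before batching starts: whatever is loaded up front stays resident, and the block size only scales the extra working set on top of it. The function and all numbers below are made up for illustration; they are not measurements from this run or a description of q2-sidle's internals.

```python
def peak_memory_gb(preload_gb, per_row_mb, block_size):
    """Toy model: data loaded before batching, plus one block's working set."""
    return preload_gb + block_size * per_row_mb / 1024


if __name__ == "__main__":
    # Hypothetical numbers: if the up-front load alone exceeds 512 GB,
    # no block size brings the peak under the limit.
    for bs in (10000, 1000, 100):
        print(bs, round(peak_memory_gb(preload_gb=600, per_row_mb=1, block_size=bs), 1))
```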
Regional Alignments Loaded
Regional Kmers Loaded
UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Any suggestions? Am I missing something obvious? Any hint will be greatly appreciated.
P.S.: when running this command with fewer regions (for instance, V1V2 and V5V7 only), everything works great and the process ends within 5h.
Thank you very much,