Hi everyone,
In the last couple of weeks I've been playing with the great q2-sidle to try to amalgamate information regarding six amplicons spanning all nine hypervariable regions (V1V2, V2V3, V3V4, V4V5, V5V7 and V7V9). I went smoothly through all the steps of the recommended pipeline up until qiime sidle reconstruct-counts.
Here is the command:
qiime sidle reconstruct-counts \
--p-region V1V2 \
--i-kmer-map Results/Sidle/db_V1V2_300nt-map.qza \
--i-regional-alignment Results/Sidle/V1V2_align-map.qza \
--i-regional-table Results/Sidle/V1V2_table-300nt.qza \
--p-region V2V3 \
--i-kmer-map Results/Sidle/db_V2V3_300nt-map.qza \
--i-regional-alignment Results/Sidle/V2V3_align-map.qza \
--i-regional-table Results/Sidle/V2V3_table-300nt.qza \
--p-region V3V4 \
--i-kmer-map Results/Sidle/db_V3V4_300nt-map.qza \
--i-regional-alignment Results/Sidle/V3V4_align-map.qza \
--i-regional-table Results/Sidle/V3V4_table-300nt.qza \
--p-region V4V5 \
--i-kmer-map Results/Sidle/db_V4V5_300nt-map.qza \
--i-regional-alignment Results/Sidle/V4V5_align-map.qza \
--i-regional-table Results/Sidle/V4V5_table-300nt.qza \
--p-region V5V7 \
--i-kmer-map Results/Sidle/db_V5V7_300nt-map.qza \
--i-regional-alignment Results/Sidle/V5V7_align-map.qza \
--i-regional-table Results/Sidle/V5V7_table-300nt.qza \
--p-region V7V9 \
--i-kmer-map Results/Sidle/db_V7V9_300nt-map.qza \
--i-regional-alignment Results/Sidle/V7V9_align-map.qza \
--i-regional-table Results/Sidle/V7V9_table-300nt.qza \
--p-min-counts 0 \
--p-block-size 10000 \
--o-reconstructed-table Results/Sidle/full_table.qza \
--o-reconstruction-summary Results/Sidle/full_summary.qza \
--o-reconstruction-map Results/Sidle/full_map.qza
In every attempt, the process is Killed after 15-20 h. I am running this on a machine with 512 GB of RAM available. Even so, the problem is insufficient memory, as shown by the output of dmesg:
Out of memory: Killed process 45808 (qiime) total-vm:522386016kB, anon-rss:513560532kB, file-rss:0kB, shmem-rss:8kB
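For scale, the figures in that dmesg line can be converted to GiB with a few lines of Python. This is only a minimal sketch: the function name parse_oom_line is made up for illustration, the regular expression assumes the kernel message format shown above, and it assumes the kernel's "kB" means units of 1024 bytes.

```python
import re

# The kernel message copied verbatim from the dmesg output above.
OOM_LINE = (
    "Out of memory: Killed process 45808 (qiime) total-vm:522386016kB, "
    "anon-rss:513560532kB, file-rss:0kB, shmem-rss:8kB"
)


def parse_oom_line(line):
    """Return a dict mapping each 'name:<n>kB' field to its size in GiB.

    Hypothetical helper for illustration; assumes kB means 1024 bytes.
    """
    fields = re.findall(r"([\w-]+):(\d+)kB", line)
    return {name: int(kb) / 1024**2 for name, kb in fields}


sizes = parse_oom_line(OOM_LINE)
for name, gib in sizes.items():
    print(f"{name}: {gib:.1f} GiB")
```

The anon-rss figure works out to roughly 490 GiB, i.e. the qiime process alone was holding close to all of the machine's physical memory when it was killed.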
I see this is a very common issue for everyone working with the SILVA database, and a good fix for this type of issue is to reduce the number of reads per batch, which is done here by lowering --p-block-size from its default of 10000. I tried 2000, 1000 and 100, and nothing really changed.
When examining the log file (pasted below) and the script available on GitHub, the reason --p-block-size does not help becomes clear: the script runs out of memory well before --p-block-size is even considered (at line 127, within _untangle_database_ids(), while --p-block-size is used at line 151). Since the issue seems to come from a chunk of code I can't really modify, and I don't have access to a machine with more than 512 GB of RAM, I don't know what to try next.
Regional Alignments Loaded
Regional Kmers Loaded
UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Any suggestions? Am I missing something obvious? Any hint will be greatly appreciated.
P.S.: when running this command with fewer regions (for instance, V1V2 and V5V7 only), everything works great and the process ends within 5h.
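To get a feel for how memory grows as regions are added, one option is to wrap the smaller runs that do finish and record their peak memory. The sketch below is only an illustration, not part of q2-sidle: run_and_measure is a made-up helper, it relies on the standard-library resource module (Unix only), and it assumes Linux, where ru_maxrss is reported in kilobytes. The demo command is a stand-in; the real qiime command would be passed as a list of arguments instead.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd to completion and return (exit code, peak child RSS).

    Hypothetical helper. ru_maxrss for RUSAGE_CHILDREN is the largest
    peak RSS of any single child process waited for so far, so use a
    fresh Python process per measurement. Units are kB on Linux.
    """
    completed = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


# Stand-in workload: a child Python process that allocates about 50 MB.
# Replace with e.g. ["qiime", "sidle", "reconstruct-counts", ...].
code, peak_kb = run_and_measure(
    [sys.executable, "-c", "x = bytearray(50_000_000)"]
)
print(f"exit code {code}, peak RSS about {peak_kb / 1024**2:.2f} GiB")
```

Comparing the peak for two, three and four regions would at least show whether memory use is growing gradually or blowing up with each region added.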
Thank you very much,
Vitor