QIIME 2 2019.4.0 has faulty OpenBLAS results on certain CPUs

ebolyen · May 8, 2019, 10:11pm

We're sorry to report that on certain CPUs¹ matrix multiplication is performed incorrectly by OpenBLAS. We had initially thought the issue was a mismatch between numpy bindings and OpenBLAS; however, further research has revealed the issue to be fundamental to OpenBLAS.

¹ Those with AVX 512 extensions, e.g. Skylake-X.

Am I affected?

Probably not, unless you have a new (and relatively expensive) computer. In particular, only Intel CPUs with AVX 512 have this issue. If your CPU does not have AVX 512, there is nothing to worry about; OpenBLAS will do the correct thing.

If you don't know (or don't care) what CPU you have, we recommend just updating to the latest patch.

If you would like to learn if your CPU has AVX 512, you can run the following:

OS X

sysctl -a | grep -E 'machdep\.cpu\.(leaf7_)?features' # Look for AVX512 entries (AVX 2 or 1 is fine); they are listed under leaf7_features

Linux

cat /proc/cpuinfo | grep flags | uniq # Look for flags starting with avx512 (avx or avx2 alone is fine)
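
If scanning the flags by eye is error-prone, the check can be scripted. This is only a minimal sketch: `has_avx512` is a made-up helper name, and it assumes that every AVX 512 extension shows up as a flag containing `avx512` (for example `avx512f`), while plain `avx` and `avx2` do not.

```shell
# Sketch: report whether CPU flag text mentions any AVX 512 extension.
# has_avx512 is a hypothetical helper; it reads the flag text on stdin.
has_avx512() {
    # Case-insensitive match on "avx512"; "avx" and "avx2" alone do not match.
    if grep -qi 'avx512'; then
        echo "AVX 512 found"
    else
        echo "AVX 512 not found"
    fi
}

# Usage on Linux (assumes /proc/cpuinfo exists):
#   grep flags /proc/cpuinfo | uniq | has_avx512
```

"AVX 512 not found" means the CPU is not in the impacted group described above.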

What is OpenBLAS?

OpenBLAS is a widely used part of the scientific computing stack that provides fast and efficient linear algebra routines. It is linked by everything from libraries like fastspar, numpy, and scipy to full programming languages like R and Julia. Much of what we do would not be possible without high-quality, free libraries like this one.

What is being done?

Later today or tomorrow morning we will have a patch ready (2019.4.1). We strongly recommend you update your version of QIIME 2 (ensuring that qiime info indicates that you have q2-types 2019.4.1 installed, not 2019.4.0). This patch will pin OpenBLAS to 0.3.3, which does not use AVX 512 instructions. To make it easy to identify which 2019.4 version was used, we're bumping the patch number of q2-types so that it is easy to spot in provenance.

Extra Details

We believe that only OpenBLAS versions 0.3.5 and 0.3.6 are impacted (specifically the DGEMM routine). To the best of our knowledge, this issue has not been fully solved yet, as we were able to reproduce it with both of those versions (even though 0.3.6 should have disabled the problematic code). OpenBLAS 0.3.3 was unaffected by the issue on the same hardware.
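
To see which OpenBLAS build an existing environment carries, the package list can be filtered for the two versions named above. This is a rough sketch, not an official check: `openblas_status` is a hypothetical helper, and it assumes `conda list`-style input with the package name in the first column, the version in the second, and a package name that contains "openblas".

```shell
# Sketch: flag the OpenBLAS versions this post names as impacted (0.3.5, 0.3.6).
# openblas_status is a hypothetical helper; it reads package-list text on stdin
# and assumes columns of the form "name version ...".
openblas_status() {
    awk '
        /^#/ { next }                      # skip header comment lines
        tolower($1) ~ /openblas/ {
            found = 1
            if ($2 == "0.3.5" || $2 == "0.3.6")
                print $1, $2, "impacted"
            else
                print $1, $2, "not listed as impacted"
        }
        END { if (!found) print "no openblas package found" }
    '
}

# Usage (the env name is whatever you called yours):
#   conda list -n qiime2-2019.4 | openblas_status
```

Reinstalling from the patched environment file remains the recommended fix; a listing like this only helps confirm what is currently installed.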

ebolyen · May 8, 2019, 10:12pm

We will follow up here when the new environment is ready. Sorry for the inconvenience!

ebolyen · May 9, 2019, 2:44am

New environment files are now available. Download and reinstall normally.

danielsebas · May 9, 2019, 9:06am

Thanks @ebolyen, I have this result from "cat /proc/cpuinfo":

processor : 19
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
stepping : 4
microcode : 1068
cpu MHz : 2199.987
cache size : 25600 KB
physical id : 0
siblings : 20
core id : 12
cpu cores : 10
apicid : 25
initial apicid : 25
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat xsaveopt pln pts dtherm spec_ctrl ibpb_support tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 4399.97
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

It seems I have AVX. Am I safe?

Thanks!

timanix · May 9, 2019, 9:21am

Hi! Thank you for the update.
Currently I am rerunning my analysis in Qiime2-2019.1 since I have AVX 512. Just to be sure: am I safe with 2019.1, or am I affected on this version as well?

Clara · May 9, 2019, 9:38am

Hi, may I know how to download and reinstall?
Thanks.

thermokarst · May 9, 2019, 1:20pm

Hi @danielsebas!

Yep! AVX is not the same as AVX 512, so you're fine. If you really want to be sure, remove your 2019.4 env:

conda env remove -n qiime2-2019.4  # or whatever you named your env

Then follow the official installation guide to get set up with the latest 2019.4. Once installed, run qiime info and check that q2-types is at 2019.4.1. That is all!

Keep us posted!

thermokarst · May 9, 2019, 1:22pm

Hi @timanix.

2019.1 does not use the impacted versions of openblas (I don't think it uses openblas at all, actually). To be clear though, 2019.4 has been patched to use an unimpacted version of openblas, so you could just update your 2019.4 deployment by first removing the old env:

conda env remove -n qiime2-2019.4  # or whatever you named your env

Then follow the official installation guide to get set up with the latest 2019.4. Once installed, run qiime info and check that q2-types is at 2019.4.1.

Thanks!

thermokarst · May 9, 2019, 1:23pm

Hi there @Clara!

First, remove your 2019.4 env (if you installed it before the fix was released):

conda env remove -n qiime2-2019.4  # or whatever you named your env

Then follow the official installation guide to get set up with the latest 2019.4. Once installed, run qiime info and check that q2-types is at 2019.4.1. That is all!
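
The final check can also be scripted rather than read off the screen. This is a small sketch under assumptions: `types_version` is a hypothetical helper, and it expects the plugin list printed by qiime info to contain a line of the form "types: 2019.4.1", as in the output pasted further down this thread.

```shell
# Sketch: read `qiime info` output on stdin and print the q2-types version.
# types_version is a hypothetical helper; it assumes a line like "types: 2019.4.1".
types_version() {
    awk '$1 == "types:" { print $2; found = 1 }
         END { if (!found) exit 1 }'   # non-zero exit if no types line was seen
}

# Usage inside the activated environment:
#   qiime info | types_version
```

A patched install should print 2019.4.1; 2019.4.0 means the old environment is still in use.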

Please keep us posted with any more questions - thanks!

timanix · May 9, 2019, 1:28pm

After removing 2019.4 I searched my conda for openblas and found nothing. Thank you for the explanation, now it is clear.
This is already the third time I am rerunning my analysis, so I'll wait; maybe there will be some extra updates.
Thx for the fixes!!!

Eman · May 9, 2019, 1:28pm

Hi @thermokarst
I got this from cat /proc/cpuinfo

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
stepping	: 3
microcode	: 0x25
cpu MHz		: 997.350
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 6385.43
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
stepping	: 3
microcode	: 0x25
cpu MHz		: 999.506
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 6385.43
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
stepping	: 3
microcode	: 0x25
cpu MHz		: 1009.409
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 6385.43
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
stepping	: 3
microcode	: 0x25
cpu MHz		: 981.318
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 6385.43
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

I don't know what to do.
Should I repeat everything I did yesterday on qiime2-2019.4?
Thanks

thermokarst · May 9, 2019, 1:30pm

I don't see AVX 512 in any of your flags, so your processor is not impacted. Please see @ebolyen's note above about updating to the latest patch.

You can do that by following my post above.

Clara · May 9, 2019, 9:38pm

Thanks @thermokarst, I followed what you said:

1) Remove the 2019.4 version:
"conda env remove -n qiime2-2019.4"
(I tried "source activate qiime2-2019.4" again to make sure it was removed, and yes, I can't activate qiime at this point)
2) Reinstall using this code:
wget https://data.qiime2.org/distro/core/qiime2-2019.4-py36-linux-conda.yml
conda env create -n qiime2-2019.4 --file qiime2-2019.4-py36-linux-conda.yml
# OPTIONAL CLEANUP
rm qiime2-2019.4-py36-linux-conda.yml
3) source activate qiime2-2019.4
4) Check with "qiime info"; this is what I got:
System versions
Python version: 3.6.7
QIIME 2 release: 2019.4
QIIME 2 version: 2019.4.0
q2cli version: 2019.4.0

Installed plugins
alignment: 2019.4.0
composition: 2019.4.0
cutadapt: 2019.4.0
dada2: 2019.4.0
deblur: 2019.4.0
demux: 2019.4.1
diversity: 2019.4.0
emperor: 2019.4.0
feature-classifier: 2019.4.0
feature-table: 2019.4.0
fragment-insertion: 2019.4.0
gneiss: 2019.4.0
longitudinal: 2019.4.0
metadata: 2019.4.0
phylogeny: 2019.4.0
quality-control: 2019.4.0
quality-filter: 2019.4.0
sample-classifier: 2019.4.0
taxa: 2019.4.0
types: 2019.4.1
vsearch: 2019.4.0

Application config directory
/home/ubuntu/Myvolume_1/miniconda3/envs/qiime2-2019.4/var/q2cli

Getting help
To get help with QIIME 2, visit https://qiime2.org

It seems like only "types" and "demux" are showing 2019.4.1. Does this mean it's OK?

ebolyen · May 10, 2019, 1:58am

Yes, that is exactly right!

thermokarst · May 11, 2019, 6:22pm

An off-topic reply has been split into a new topic: Trouble with filepaths and q2-dada2

Please keep replies on-topic in the future.