nvidia_failure_mwe-1.tar.gz (885.07 kB)

Nvidia OpenCL failure on K40m GPU using driver version 375.26

Posted on 2018-06-14, 18:00; authored by Nick Curtis

Purpose

This is a simple test that demonstrates a failure of NVIDIA's OpenCL runtime, as packaged
with the RHEL-6 Linux x64 GPU driver version 375.26, on a Tesla K40m GPU.

See our [GitHub repository](https://github.com/arghdos/nvidia_failure_mwe) for a full description.

Methodology

We have packaged a stripped-down version of the OpenCL Jacobian code
generated by pyJac for a [361-species isopentanol model](https://github.com/Niemeyer-Research-Group/pyJac-paper/blob/master/data/IC5H11OH/Sarathy_ic5_mech.cti).

The user supplies the paths to two different OpenCL implementations, and the test
checks each of them for the correct output.

The code checks a simple summation of the forward and reverse stoichiometric coefficients
in the model, as seen in lines 80--101 of `jacobian_kernel.ocl`:

```
int nu_rev = -1;
int nu_fwd = -1;
i_0 = simple_map[i];
offset_next = net_reac_to_spec_offsets[i_0 + 1];
offset = net_reac_to_spec_offsets[i_0];
i_2 = thd_mask[i_0];
i_1 = rev_mask[i_0];
for (int net_ind = offset; net_ind <= -1 + offset_next; ++net_ind)
{
nu_fwd = nu_fwd + reac_to_spec_nu[1 + 2 * net_ind];
nu_rev = nu_rev + reac_to_spec_nu[2 * net_ind];
#ifdef PRINT
if (i_0 == 1988 && 64 * gid(0) + lid(0) == 867)
{
printf("%d\t%d\n", reac_to_spec_nu[1 + 2 * net_ind], reac_to_spec_nu[2 * net_ind]);
}
#endif
}
if(64 * gid(0) + lid(0) == 867 && (i_0 == 1975 || i_0 == 1988))
{
printf("rxn:%d, nu_fwd_sum:%d, nu_rev_sum:%d\n", i_0, nu_fwd, nu_rev);
}
```

Here we see that `nu_fwd` and `nu_rev` are summed over the species in each reaction.
For the 867th condition, and reactions 1975 and 1988 (0-based) in the mechanism, we print
the forward and reverse nu sums. These reactions are:

```
# Reaction 1976 (note: this is 1-based indexing from cantera)
reaction('ic5h9oh-2ooh-4o2 <=> ic5ohket2-4 + oh', [1.250000e+10, 0.0, 19450.0])
# 6s beta, +2kcal
# Reaction 1989
reaction('ic5ohket2-4 => oh + ch3chco + ch2oh + ch2o', [1.000000e+16, 0.0, 39000.0])
# rev / 0.000e+00 0.00 0.000e+00 /
```

As `nu_fwd` and `nu_rev` are both initialized to -1, we would expect, for reaction 1975:

```
nu_fwd = -1 + (1 + 0 + 0) = 0
nu_rev = -1 + (0 + 1 + 1) = 1
```

and indeed, the correct output is:

```
rxn:1975, nu_fwd_sum:0, nu_rev_sum:1
```

Similarly, for reaction 1988:

```
nu_fwd = -1 + (1 + 0 + 0 + 0 + 0) = 0
nu_rev = -1 + (0 + 1 + 1 + 1 + 1) = 3
```

and the correct output is:

```
rxn:1988, nu_fwd_sum:0, nu_rev_sum:3
```
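
The expected sums above can be reproduced on the host with a few lines of Python. This is only a minimal sketch: the function name `nu_sums` and the small arrays are hypothetical, built by hand from the two reaction equations (one reactant and two products for reaction 1975, one reactant and four products for reaction 1988) rather than read from the actual mechanism data. The interleaved layout (product coefficient at index `2 * k`, reactant coefficient at index `2 * k + 1`) is inferred from the indexing in the kernel loop.

```python
# Host-side reference for the kernel's summation loop (illustrative data only).
# Assumed layout, inferred from the kernel: entry 2*k holds the product-side
# coefficient and entry 2*k + 1 the reactant-side coefficient of species k.

def nu_sums(reac_to_spec_nu, offset, offset_next):
    """Mirror the kernel loop: start both sums at -1, then accumulate."""
    nu_fwd = -1
    nu_rev = -1
    for net_ind in range(offset, offset_next):
        nu_fwd += reac_to_spec_nu[1 + 2 * net_ind]
        nu_rev += reac_to_spec_nu[2 * net_ind]
    return nu_fwd, nu_rev

# Hand-built stand-ins for the two reactions (not the real mechanism arrays).
rxn_1975 = [0, 1, 1, 0, 1, 0]              # 1 reactant, 2 products
rxn_1988 = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]  # 4 products, 1 reactant
reac_to_spec_nu = rxn_1975 + rxn_1988
offsets = [0, 3, 8]  # species offsets per reaction, as in net_reac_to_spec_offsets

print(nu_sums(reac_to_spec_nu, offsets[0], offsets[1]))  # (0, 1)
print(nu_sums(reac_to_spec_nu, offsets[1], offsets[2]))  # (0, 3)
```

Any OpenCL runtime that executes the kernel loop correctly should report the same pairs for these two reactions.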

Testing

The test is run on NVIDIA's OpenCL runtime both with and without the `PRINT` macro defined.
When `PRINT` is defined we should get the correct output; otherwise we get the incorrect output:

```
rxn:1975, nu_fwd_sum:0, nu_rev_sum:0
rxn:1988, nu_fwd_sum:-1, nu_rev_sum:3
```
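
Checking a run by eye is easy to get wrong, so the comparison can be automated. The following is a minimal sketch, assuming the program output has been captured as a string; `check_output`, `LINE_RE`, and `EXPECTED` are hypothetical names and are not part of the packaged test. The expected values are the ones derived in the Methodology section.

```python
import re

# Matches the kernel's summary line, e.g. "rxn:1975, nu_fwd_sum:0, nu_rev_sum:1".
LINE_RE = re.compile(r"rxn:(-?\d+), nu_fwd_sum:(-?\d+), nu_rev_sum:(-?\d+)")

# Expected (nu_fwd, nu_rev) sums for the two reactions that are printed.
EXPECTED = {1975: (0, 1), 1988: (0, 3)}

def check_output(text):
    """Return {rxn: (nu_fwd, nu_rev)} for each known reaction with wrong sums."""
    wrong = {}
    for match in LINE_RE.finditer(text):
        rxn, nu_fwd, nu_rev = (int(group) for group in match.groups())
        if rxn in EXPECTED and EXPECTED[rxn] != (nu_fwd, nu_rev):
            wrong[rxn] = (nu_fwd, nu_rev)
    return wrong

faulty = "rxn:1975, nu_fwd_sum:0, nu_rev_sum:0\nrxn:1988, nu_fwd_sum:-1, nu_rev_sum:3\n"
correct = "rxn:1975, nu_fwd_sum:0, nu_rev_sum:1\nrxn:1988, nu_fwd_sum:0, nu_rev_sum:3\n"

print(check_output(faulty))   # {1975: (0, 0), 1988: (-1, 3)}
print(check_output(correct))  # {}
```

An empty dictionary means both reactions reported the expected sums.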

A complete test run, using the Intel runtime as an alternate, gives the following output:

```
python nvidia_test.py -nv /usr/lib64/ -hp /apps2/cuda/8.0.61/include/ -on Intel -op /apps2/opencl_runtime/16.1.1/intel/opencl/lib64/
gcc -fPIC -O3 -std=c99 -xc jacobian_kernel_main.ocl jacobian_kernel_compiler.ocl timer.ocl read_initial_conditions.ocl ocl_errorcheck.ocl -I/gpfs/gpfs1/apps2/cuda/8.0.61/include -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lOpenCL -o test.out

./test.out NVIDIA 896 1 1



rxn:1975, nu_fwd_sum:0, nu_rev_sum:0
rxn:1988, nu_fwd_sum:-1, nu_rev_sum:3
896,4.357185000000000e+03,1.999950000000000e+02,1.410000000000000e+01

gcc -fPIC -O3 -std=c99 -xc -DPRINT jacobian_kernel_main.ocl jacobian_kernel_compiler.ocl timer.ocl read_initial_conditions.ocl ocl_errorcheck.ocl -I/gpfs/gpfs1/apps2/cuda/8.0.61/include -Wl,-rpath,/usr/lib64 -L/usr/lib64 -lOpenCL -o test.out

./test.out NVIDIA 896 1 1



rxn:1975, nu_fwd_sum:0, nu_rev_sum:1
0 1
0 1
0 1
0 1
1 0
rxn:1988, nu_fwd_sum:0, nu_rev_sum:3
896,3.998452000000000e+03,2.075790000000000e+02,1.631400000000000e+01

gcc -fPIC -O3 -std=c99 -xc jacobian_kernel_main.ocl jacobian_kernel_compiler.ocl timer.ocl read_initial_conditions.ocl ocl_errorcheck.ocl -I/gpfs/gpfs1/apps2/cuda/8.0.61/include -Wl,-rpath,/gpfs/gpfs1/apps2/opencl_runtime/16.1.1/intel/opencl-1.2-6.4.0.25/lib64 -L/gpfs/gpfs1/apps2/opencl_runtime/16.1.1/intel/opencl-1.2-6.4.0.25/lib64 -lOpenCL -o test.out


./test.out Intel 896 1 1
Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Kernel was successfully vectorized (4)
Kernel was successfully vectorized (4)
Done.

rxn:1975, nu_fwd_sum:0, nu_rev_sum:1
rxn:1988, nu_fwd_sum:0, nu_rev_sum:3
896,5.648410000000000e+02,9.851000000000001e+00,1.066080000000000e+02
```
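
The three `gcc` invocations in the log differ only in the OpenCL library path and in whether `-DPRINT` is passed. As a rough sketch of that pattern (the helper `build_compile_command` is hypothetical and is not necessarily how `nvidia_test.py` assembles its command), the argument list could be built like this:

```python
# Source files as they appear in the compile lines of the log.
SOURCES = [
    "jacobian_kernel_main.ocl",
    "jacobian_kernel_compiler.ocl",
    "timer.ocl",
    "read_initial_conditions.ocl",
    "ocl_errorcheck.ocl",
]

def build_compile_command(include_dir, lib_dir, define_print=False):
    """Assemble a gcc argument list shaped like the ones in the log."""
    cmd = ["gcc", "-fPIC", "-O3", "-std=c99", "-xc"]
    if define_print:
        cmd.append("-DPRINT")  # enables the extra per-species printf in the kernel
    cmd += SOURCES
    cmd += [
        "-I" + include_dir,
        "-Wl,-rpath," + lib_dir,
        "-L" + lib_dir,
        "-lOpenCL",
        "-o",
        "test.out",
    ]
    return cmd

# Paths copied from the log; adjust them for your own system.
print(" ".join(build_compile_command(
    "/gpfs/gpfs1/apps2/cuda/8.0.61/include", "/usr/lib64", define_print=True)))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting concerns.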



Requirements

- The OpenCL implementations must be capable of using `printf` in OpenCL code.
- The OpenCL library paths may point either to an OpenCL ICD loader (e.g., [ocl-icd](https://github.com/OCL-dev/ocl-icd)) or to the vendor libraries directly.
- Only v375.26 of the Tesla driver has been tested. If you can test any other drivers and find the same bug, please feel free to file an issue so that we can update the list of faulty drivers.
