Timing results for the hybrid algorithm (Xeon Phi 7120+CPU) illustrated in Fig 6.
The wall time (W), the time required to assemble the system (A), the time required for the linear solves (L) and the overhead due to offloading to the Phi (O) is shown. Note that the overhead is defined such that W = L + O. The number of slices that yield the optimal run time are highlighted in bold in the table. All measured times are in units of seconds. In addition, the standard deviation determined from 20 repetitions of the simulation is shown next to the wall time.