Orthogonal Matching Pursuit

Speed benchmarks for the JAX implementation of Orthogonal Matching Pursuit (OMP).

Each row of the following table describes:

  • the problem configuration (M × N is the dictionary size, K is the sparsity level)

  • the average time taken in each CPU/GPU configuration

  • the speed-up ratios between configurations

System used

  • All benchmarks have been generated on Google Colab.

  • Both the CPU and GPU runtime configurations of Google Colab have been used (see the timing sketch below).
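
The timings in the table below can be collected with a harness along the following lines. This is a minimal sketch, not the library's actual benchmark code: `solve_omp` is a deliberately simplified stand-in for the solver under test (a plain least-squares re-fit rather than the Cholesky update discussed under deficiencies below), and the warm-up call ensures that JIT compilation time is excluded from the measurement.

```python
import timeit

import jax
import jax.numpy as jnp


def solve_omp(Phi, y, K):
    """A deliberately simple OMP sketch (illustrative only): K greedy atom
    selections, each followed by a least-squares re-fit on the selected atoms."""
    N = Phi.shape[1]
    mask = jnp.zeros(N)
    x = jnp.zeros(N)
    for _ in range(K):  # unrolled under jit, since K is static
        r = y - Phi @ x                      # current residual
        corr = jnp.abs(Phi.T @ r)            # correlation with each atom
        idx = jnp.argmax(corr * (1 - mask))  # strongest atom not yet selected
        mask = mask.at[idx].set(1.0)
        # Minimum-norm least squares over the masked dictionary: columns
        # that are zeroed out receive zero coefficients.
        x = jnp.linalg.lstsq(Phi * mask, y)[0]
    return x


# Problem setup matching the row in the table: M x N dictionary, K-sparse signal.
M, N, K = 256, 1024, 16
key = jax.random.PRNGKey(0)
Phi = jax.random.normal(key, (M, N)) / jnp.sqrt(M)  # random dictionary
x_true = jnp.zeros(N).at[:K].set(1.0)               # a K-sparse representation
y = Phi @ x_true                                    # measurements

# K controls the loop length, so it must be a static argument for jit.
solve_omp_jit = jax.jit(solve_omp, static_argnums=2)

# The first call triggers compilation; run it once so that compile time
# is excluded from the measurement.
solve_omp_jit(Phi, y, K).block_until_ready()

# block_until_ready() forces JAX's asynchronous dispatch to finish, so the
# timing reflects actual execution rather than just kernel launch.
t = timeit.timeit(lambda: solve_omp_jit(Phi, y, K).block_until_ready(), number=100)
print(f"average time: {1000 * t / 100:.2f} ms")
```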

| M   | N    | K  | CPU    | CPU + JIT | CPU / CPU + JIT | GPU    | GPU + JIT | GPU / GPU + JIT | CPU + JIT / GPU + JIT |
|-----|------|----|--------|-----------|-----------------|--------|-----------|-----------------|-----------------------|
| 256 | 1024 | 16 | 148 ms | 8.27 ms   | 17.9x           | 139 ms | 1.28 ms   | 108x            | 6.46x                 |

Observations

  • JIT (Just-In-Time) compilation gives significant performance improvements on both CPU and GPU: 17.9x on the CPU and 108x on the GPU for the configuration above.

  • Without JIT, the current implementation on the GPU (139 ms) is slower than even the JIT-compiled CPU version (8.27 ms).

  • The GPU speed gain over the CPU (with JIT enabled) is a relatively meager 6.46x. For comparison, speed-ups of around 30x from CPU to GPU are regularly reported for neural networks implemented with TensorFlow/Keras (see the device-placement sketch below).
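
For the CPU-vs-GPU comparison, the same jitted function can be timed with its inputs committed to different devices, since a JAX computation runs on whichever device holds its (committed) inputs. A minimal sketch, reusing `solve_omp_jit`, `Phi`, `y`, and `K` from the harness above (assumes a Colab GPU runtime):

```python
import jax

# Pin copies of the inputs to specific devices.
cpu = jax.devices("cpu")[0]
gpu = jax.devices("gpu")[0]  # raises if no GPU is available

Phi_cpu, y_cpu = jax.device_put(Phi, cpu), jax.device_put(y, cpu)
Phi_gpu, y_gpu = jax.device_put(Phi, gpu), jax.device_put(y, gpu)

# Warm both variants up so compilation is excluded, then time each one
# with the timeit harness shown earlier.
solve_omp_jit(Phi_cpu, y_cpu, K).block_until_ready()
solve_omp_jit(Phi_gpu, y_gpu, K).block_until_ready()
```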

Possible deficiencies

  • There is an opportunity to improve parallelization in the OMP implementation.

  • The Cholesky-update-based implementation depends heavily on solving triangular systems (see the sketch below).

  • GPUs are generally not well suited to solving triangular systems, since forward and back substitution are inherently sequential.
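
To make the last two points concrete, here is a hedged sketch of the kind of incremental Cholesky update such an OMP implementation relies on; the helper names are illustrative, not taken from the actual code. When an atom phi_k joins the selected set Phi_S, extending the Cholesky factor L of the Gram matrix Phi_S^T Phi_S costs one triangular solve, and re-fitting the coefficients costs two more (forward and back substitution). Each of these solves is a sequential dependency chain, which limits how much a GPU can parallelize the inner loop.

```python
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular


def extend_cholesky(L, Phi_S, phi_k):
    """Extend the Cholesky factor L of G = Phi_S^T Phi_S after a new atom
    phi_k is selected (illustrative helper, not the library's actual code)."""
    # One lower-triangular solve per added atom: L w = Phi_S^T phi_k.
    w = solve_triangular(L, Phi_S.T @ phi_k, lower=True)
    # New diagonal entry of the extended factor.
    d = jnp.sqrt(phi_k @ phi_k - w @ w)
    k = L.shape[0]
    L_new = jnp.zeros((k + 1, k + 1))
    L_new = L_new.at[:k, :k].set(L)
    L_new = L_new.at[k, :k].set(w)
    return L_new.at[k, k].set(d)


def refit_coefficients(L, Phi_S, y):
    """Least-squares coefficients on the selected atoms via two triangular
    solves of the normal equations: L z = Phi_S^T y, then L^T x = z."""
    z = solve_triangular(L, Phi_S.T @ y, lower=True)
    return solve_triangular(L.T, z, lower=False)
```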