Friday, May 27, 2016

Increasing Performance of Octave with OpenBLAS and Multiple Threads

Octave [1] is a great platform for machine learning tasks. Its run-time performance can be improved by using OpenBLAS [2] and by increasing the number of threads Octave can use. It can be improved further with NVIDIA's cuBLAS [3]. The following steps work on Ubuntu.

For this test, the matrix multiplication Octave code from [3] was used. To use OpenBLAS and increase the number of threads, run the Octave code from a terminal as:

  • export LD_LIBRARY_PATH=/opt/OpenBLAS
  • OMP_NUM_THREADS=<NumThreads> LD_PRELOAD=/opt/OpenBLAS/lib/ octave code.m
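
The two commands above can be wrapped in a small script that tries several thread counts in turn. This is only a sketch under stated assumptions: the library file name `libopenblas.so`, the `OPENBLAS_LIB` and `SCRIPT` variables, the `run_cmd` helper and the thread counts 1, 2 and 4 are all illustrative and not taken from the original setup, so adjust them to the actual installation. If Octave, the library or the script is missing, the loop only prints the command it would have run.

```shell
#!/bin/sh
# Sketch: run an Octave script with OpenBLAS preloaded for several thread counts.
# ASSUMPTION: the shared library file name below is hypothetical; point
# OPENBLAS_LIB at the actual OpenBLAS .so file on your system.
OPENBLAS_LIB="${OPENBLAS_LIB:-/opt/OpenBLAS/lib/libopenblas.so}"
# ASSUMPTION: code.m is the Octave script to benchmark.
SCRIPT="${SCRIPT:-code.m}"

# Hypothetical helper: print the command line used for a given thread count.
run_cmd() {
    printf 'OMP_NUM_THREADS=%s LD_PRELOAD=%s octave %s\n' "$1" "$OPENBLAS_LIB" "$SCRIPT"
}

for n in 1 2 4; do
    if command -v octave >/dev/null 2>&1 && [ -f "$OPENBLAS_LIB" ] && [ -f "$SCRIPT" ]; then
        echo "== $n thread(s) =="
        # The variables apply only to this one octave invocation.
        OMP_NUM_THREADS="$n" LD_PRELOAD="$OPENBLAS_LIB" octave "$SCRIPT"
    else
        echo "skipping run; would execute: $(run_cmd "$n")"
    fi
done
```

Setting the variables on the same line as `octave` keeps them scoped to that single run, so the surrounding shell session is left unchanged.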


Matrix multiplication without OpenBLAS. Note: the Y-axis is in log scale.
Improved performance using OpenBLAS. Note: the Y-axis is in log scale.

Performance is best when OMP_NUM_THREADS equals the number of cores on the machine, as one would expect.
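
Rather than typing the core count by hand, the value can be read from the system. The snippet below is a minimal sketch that assumes GNU coreutils `nproc` is available, as it is on a stock Ubuntu install; the `NUM_THREADS` variable name is made up for illustration. Note that `nproc` counts logical processors, so on a machine with hyper-threading the result may be larger than the number of physical cores, and a smaller value may be worth trying.

```shell
#!/bin/sh
# Sketch: derive the thread count from the number of installed processors.
# --all is used because plain nproc is itself affected by an already-set
# OMP_NUM_THREADS, which would make the result circular.
NUM_THREADS="$(nproc --all)"   # hypothetical variable name

# Export so that child processes such as octave inherit the setting.
export OMP_NUM_THREADS="$NUM_THREADS"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```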


[1] GNU Octave.
[2] OpenBLAS.
[3] Drop-in Acceleration of GNU Octave.