Friday, May 27, 2016

Increasing performance of Octave with OpenBlas and Multiple Threads

Octave [1] is a great platform to perform Machine Learning tasks. Octave run time performance can be improved by adding OpenBlas [2] and by increasing the number of threads Octave can run using. This can be further improved using NVIDIA's cuBLAS [3]. The following steps would work in Ubuntu OS.

For the purposes of this test, the matrix multiplication Octave code from [3] was used. To increase the number of threads and using OpenBlas, run the octave code in a command line in a terminal as:


  • export LD_LIBRARY_PATH=/opt/OpenBLAS
  • OMP_NUM_THREADS=<NumThreads> LD_PRELOAD=/opt/OpenBLAS/lib/libopenblas.so octave code.m


Results

Matrix Multiplication without OpenBlas. Note: Y-axis is in log scale.
Improved Performance Using OpenBlas. Note: Y-axis is in log scale.


The performance is best when OMP_NUM_THREADS=#coresOnTheMachine (Obviously.).


References

[1] GNU Octave: https://www.gnu.org/software/octave/
[2] OpenBlas: http://www.openblas.net/
[3] Drop-in Acceleration of GNU Octave. https://devblogs.nvidia.com/parallelforall/drop-in-acceleration-gnu-octave/

No comments:

Post a Comment