Sunday, August 30, 2015

Caffe Install: ATLAS installation on Ubuntu reports "CPU throttling apparently enabled"

Installing Caffe requires a BLAS math library. Of the three options (ATLAS, OpenBLAS and Intel MKL), ATLAS is open source and is the default. The ATLAS installation was failing because of intel_pstate. Strictly speaking, intel_pstate is Intel's CPU frequency scaling driver rather than a governor; together with the cpufreq governors it handles CPU power management for the kernel [1].

To find out whether intel_pstate is the issue, check whether it is the active scaling driver.

  1. cpufreq-info will show the driver and governor in use.
  2. If intel_pstate is shown, disable it.
  3. Edit the file /etc/default/grub and add intel_pstate=disable to the kernel options: GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable"
  4. Update grub.cfg with grub-mkconfig -o /boot/grub/grub.cfg, then reboot so the new kernel option takes effect.
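
If cpufreq-info is not installed, the same check can be done by reading sysfs directly. The snippet below is only a rough sketch: it assumes the standard cpufreq sysfs file for CPU 0 is present, and the class and method names (PstateCheck, isIntelPstate, readDriver) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PstateCheck {
    // Standard cpufreq sysfs entry for CPU 0 (assumed to exist on this machine).
    static final Path DRIVER_FILE =
            Paths.get("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver");

    /** Returns true when the reported driver name is intel_pstate. */
    static boolean isIntelPstate(String driverName) {
        return driverName != null && driverName.trim().equals("intel_pstate");
    }

    /** Reads the driver name, or returns null when the file cannot be read. */
    static String readDriver(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
        } catch (IOException e) {
            return null; // no cpufreq support exposed, or not running on Linux
        }
    }

    public static void main(String[] args) {
        String driver = readDriver(DRIVER_FILE);
        System.out.println("scaling driver: " + driver);
        System.out.println("intel_pstate active: " + isIntelPstate(driver));
    }
}
```

After the GRUB change and a reboot, running it again should report a different driver if the kernel option was picked up.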
References
[1] https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
[2] http://unix.stackexchange.com/questions/121410/setting-cpu-governor-to-on-demand-or-conservative
[3] https://wiki.archlinux.org/index.php/CPU_frequency_scaling#Scaling_governors

Friday, February 26, 2010

Calculating distance between two IPs using Maxmind database

Using the MaxMind database is a faster way to find the distance between two IPs than computing the Haversine distance between two points (each defined by a (lat, long) tuple) on the globe.
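
For comparison, the Haversine baseline mentioned above can be sketched as follows. This is not the MaxMind code path; the coordinates would normally come from a GeoIP lookup and are passed in directly here. The class name and the mean Earth radius of 6371 km are assumptions made for the example.

```java
class Haversine {
    // Mean Earth radius in kilometres; an approximation, since the Earth is not a perfect sphere.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in km between two (lat, long) points given in degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

A call such as Haversine.distanceKm(lat1, lon1, lat2, lon2) costs several trigonometric operations per pair, which adds up when it runs for every record in a large log.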

Thursday, February 25, 2010

Hadoop: Effectiveness of a Combiner Class

The execution speed difference for an MR job with and without a combiner class is huge. The security log analytics job without a combiner class did not complete in 1.5 days. With the addition of a Combiner class, the code finished in 15-20 minutes! The reasons for the speedup are straightforward:
  • <K, V> pairs are combined while still in memory on the map side, so network traffic and latency to the reducers decrease.
  • Disk operations at the reducers are minimal because the combine step has already shrunk their input.
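
The effect is easy to see in a toy, in-memory model. The sketch below is plain Java and does not use the Hadoop API; CombinerSketch, mapOnly and combine are hypothetical names, and the job is assumed to be a simple count per key (for example, hits per IP).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {
    /** Map phase without a combiner: one (key, 1) record per input token. */
    static List<Map.Entry<String, Integer>> mapOnly(List<String> tokens) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(new AbstractMap.SimpleEntry<>(token, 1));
        }
        return out;
    }

    /** Combine step: sum the values per key locally, before anything is shuffled. */
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new HashMap<>();
        for (Map.Entry<String, Integer> record : records) {
            sums.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return sums;
    }
}
```

Without the combine step every record from mapOnly would be sent to a reducer; with it, only one record per distinct key leaves the mapper. This only works when the reduce operation is associative and commutative, as a sum is.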

Saturday, February 20, 2010

Hadoop: using ChainMapper

ChainMapper is a way to perform [MAP+ / REDUCE MAP*] operations: one or more mappers, then a reducer, then zero or more mappers.

  • Find below an example main function written to handle a chainmapper.

Notes:


  • While a ChainMapper can simplify processing, deft handling of the data usually lets most [MAP+ / REDUCE MAP*] jobs be reduced to [MAP / REDUCE MAP].
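
The note above can be illustrated with a toy, in-memory model of the data flow. This is plain Java, not the Hadoop ChainMapper API: ChainSketch, run and fuse are hypothetical names, the pre-reduce mappers are assumed to transform only the key, and the reduce step is assumed to be a count per key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class ChainSketch {
    /** Runs MAP+ -> REDUCE (count per key) -> MAP* over in-memory records. */
    static Map<String, Integer> run(List<String> input,
                                    List<Function<String, String>> preMappers,
                                    List<Function<Integer, Integer>> postMappers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : input) {
            String key = record;
            for (Function<String, String> mapper : preMappers) {
                key = mapper.apply(key);           // MAP+: each mapper feeds the next
            }
            counts.merge(key, 1, Integer::sum);    // REDUCE: count per key
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            int value = entry.getValue();
            for (Function<Integer, Integer> mapper : postMappers) {
                value = mapper.apply(value);       // MAP*: applied to the reduced values
            }
            entry.setValue(value);
        }
        return counts;
    }

    /** Collapses a chain of mappers into a single mapper by composition. */
    static <T> Function<T, T> fuse(List<Function<T, T>> chain) {
        Function<T, T> fused = Function.identity();
        for (Function<T, T> mapper : chain) {
            fused = fused.andThen(mapper);
        }
        return fused;
    }
}
```

Passing a single fused mapper to run gives the same result as passing the whole chain, which is the sense in which [MAP+ / REDUCE MAP*] collapses to [MAP / REDUCE MAP].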

Hadoop: MaxMind GeoIP lookup from Distributed Cache

Place GeoCityLite.dat in HDFS:

  • Create a JobClient object and add the file's URI to the DistributedCache.



  • Create a configure method that overrides the one from org.apache.hadoop.mapred.MapReduceBase and org.apache.hadoop.mapred.JobConfigurable.


  • Create a Location object by passing the IP to the LookupService.
  • A significant number of IPs might come back with a null city/country. Use try and catch blocks to catch exceptions and process them accordingly.
Notes:
  • Minimize the creation of LookupService objects. These are resource intensive.
  • Similarly, minimize the creation of Location objects.
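
One way to follow both notes is to create the lookup object once and remember the answers for IPs already seen. The sketch below is plain Java under stated assumptions: CityLookup is a hypothetical stand-in for the real lookup object, not the MaxMind API, and the "UNKNOWN" placeholder for null results is an arbitrary choice.

```java
import java.util.HashMap;
import java.util.Map;

class GeoLookupSketch {
    /** Hypothetical stand-in for the GeoIP lookup; may return null for unknown IPs. */
    interface CityLookup {
        String cityFor(String ip);
    }

    private final CityLookup lookup;                          // created once, reused for every record
    private final Map<String, String> cache = new HashMap<>();
    int lookupsPerformed = 0;                                 // exposed only to show the cache working

    GeoLookupSketch(CityLookup lookup) {
        this.lookup = lookup;
    }

    /** Returns the city for an IP, or "UNKNOWN" when the lookup yields nothing or fails. */
    String city(String ip) {
        String cached = cache.get(ip);
        if (cached != null) {
            return cached;                                    // no new lookup, no new result object
        }
        lookupsPerformed++;
        String city;
        try {
            city = lookup.cityFor(ip);
        } catch (RuntimeException e) {
            city = null;                                      // treat a failed lookup like a missing city
        }
        String result = (city == null || city.isEmpty()) ? "UNKNOWN" : city;
        cache.put(ip, result);
        return result;
    }
}
```

In a mapper, an object like this would be built in configure and reused in every map call. The cache here is unbounded, so for logs with very many distinct IPs a size limit would be needed.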

Saturday, December 26, 2009

Gtalk in Java Code

My analytics take a long time to execute due to the large datasets. To ease the constant need to check the execution status, I wrote a small util class to notify me of the status via Gtalk. Gtalk is based on the Jabber (XMPP) protocol. The Smack API is a Java implementation of the Jabber protocol and can be downloaded here: http://www.igniterealtime.org/projects/smack/