Saturday, February 20, 2010

Hadoop: MaxMind GeoIP lookup from Distributed Cache

Place GeoLiteCity.dat in HDFS, then:

  • Create a JobClient object and add the file's URI to the DistributedCache.



  • Create a configure method that overrides the one in org.apache.hadoop.mapred.MapReduceBase (declared by org.apache.hadoop.mapred.JobConfigurable).


  • Create a Location object by passing the IP to the LookupService.
  • A significant number of IPs might come back with a null city/country. Use try/catch blocks to catch the resulting exceptions and process them accordingly.
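
The steps above can be sketched roughly as follows, using the old org.apache.hadoop.mapred API and the legacy MaxMind Java API (com.maxmind.geoip). This is only a sketch under assumptions: the class name GeoIpMapper, the helper addGeoDatabase, the one-IP-per-line input format and the "unknown" placeholder are all made up for illustration, and it assumes the GeoIP database is the only file in the cache. It uses explicit null checks instead of catching a NullPointerException, which is one possible way to handle IPs that do not resolve.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

import com.maxmind.geoip.Location;
import com.maxmind.geoip.LookupService;

// Hypothetical mapper: emits (ip, country<TAB>city) for each input line.
public class GeoIpMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private LookupService lookup;              // created once per task, not per record
    private final Text outKey = new Text();    // reused across map() calls
    private final Text outValue = new Text();

    // Driver side (hypothetical helper): register the HDFS copy of the
    // database with the DistributedCache before submitting the job.
    public static void addGeoDatabase(JobConf conf, URI databaseUri) {
        DistributedCache.addCacheFile(databaseUri, conf);
    }

    @Override
    public void configure(JobConf job) {
        try {
            // Local paths of the cached files on this task's node.
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            if (cached == null || cached.length == 0) {
                throw new IOException("GeoIP database not found in DistributedCache");
            }
            // Assumption: the database is the first (only) cached file.
            lookup = new LookupService(cached[0].toString(),
                                       LookupService.GEOIP_MEMORY_CACHE);
        } catch (IOException e) {
            throw new RuntimeException("Could not open GeoIP database", e);
        }
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String ip = value.toString().trim();   // assumption: one IP per line
        Location loc = lookup.getLocation(ip); // may be null for unknown IPs
        String city = (loc == null || loc.city == null) ? "unknown" : loc.city;
        String country = (loc == null || loc.countryName == null)
                ? "unknown" : loc.countryName;
        outKey.set(ip);
        outValue.set(country + "\t" + city);
        output.collect(outKey, outValue);
    }

    @Override
    public void close() throws IOException {
        if (lookup != null) {
            lookup.close();                    // release the database file
        }
    }
}
```

In the driver, something like GeoIpMapper.addGeoDatabase(conf, uri) would be called on the job's JobConf before the job is submitted, with uri pointing at wherever the .dat file was placed in HDFS.
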
Notes:
  • Minimize the creation of LookupService objects. These are resource-intensive.
  • Similarly, minimize the creation of Location objects.
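
One possible way to cut down on repeated lookups, and with them repeated Location objects, is a small per-task cache keyed by IP. The sketch below is an assumption about how that could be done, not something taken from the MaxMind or Hadoop APIs: CachedLookup is a hypothetical helper built only on java.util.LinkedHashMap, and the loader function stands in for a call such as lookupService.getLocation(ip). Null results are cached as well, so IPs that do not resolve are not looked up again.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: a bounded LRU cache in front of an expensive per-IP lookup.
final class CachedLookup<V> {
    private final Function<String, V> loader;  // e.g. ip -> lookupService.getLocation(ip)
    private final Map<String, V> cache;
    private int loads = 0;                     // number of times the loader was called

    CachedLookup(Function<String, V> loader, final int maxEntries) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the least recently used IP.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    V get(String ip) {
        // containsKey (not a null check) so that cached null results count as hits.
        if (cache.containsKey(ip)) {
            return cache.get(ip);
        }
        V value = loader.apply(ip);
        loads++;
        cache.put(ip, value);
        return value;
    }

    int loadCount() {
        return loads;
    }
}
```

In a mapper this would be created once in configure, for example new CachedLookup<Location>(ip -> lookup.getLocation(ip), 10000), and queried from map. The cache size is a guess and would need tuning against the task's heap.
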