Thursday, October 29, 2015

Using Apache Zeppelin for Spark, SparkML and SparkSQL

Apache Zeppelin [1], a web-based notebook, is quite handy for exploratory data analysis with Spark, SparkML and SparkSQL.

[Figure: sample visualization in Apache Zeppelin generated using SparkSQL]

[Figure: creating an RDD and a DataFrame, then registering the DataFrame as a temp table]
[Figure: querying the data using SparkSQL]
[Figure: using SparkML for analytics]

Once the data is registered with the Spark SQLContext, Zeppelin can visualize the results of SparkSQL queries with its built-in tables and charts.
[1] Apache Zeppelin: https://zeppelin.incubator.apache.org/