Chapter 7. Accessing Hive Tables from Spark

This chapter describes how to access Hive data from Spark.

  • Spark SQL is a Spark module for structured data processing. It supports Hive data formats, user-defined functions (UDFs), and the Hive metastore, and can act as a distributed SQL query engine. You can also use Spark SQL to incorporate Hive table data into DataFrames (see "Using the Spark DataFrame API").

  • "Hive on Spark" works in the opposite direction: queries are submitted to Hive, and Spark serves as the execution backend that runs them.
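
As an illustration of the Spark SQL approach in the first item above, the following PySpark sketch reads a Hive table into a DataFrame through the Hive metastore. It is a minimal example rather than a definitive implementation, and it assumes Spark 2.0 or later (earlier releases used HiveContext instead of SparkSession) and a Spark installation that is already configured to reach the Hive metastore. The function names build_preview_query and preview_hive_table, the application name, and the table name default.sample_07 are hypothetical and chosen only for this example.

```python
import re

# Accepts "table" or "database.table"; anything else is rejected so that
# arbitrary text is never spliced into the SQL statement.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_preview_query(table_name, limit=10):
    """Return a SELECT statement that previews a table (hypothetical helper)."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError("unexpected table name: %r" % (table_name,))
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return "SELECT * FROM {} LIMIT {}".format(table_name, limit)


def preview_hive_table(table_name, limit=10):
    """Load the first rows of a Hive table into a DataFrame and print them."""
    # Imported here so that build_preview_query works without PySpark installed.
    from pyspark.sql import SparkSession

    # enableHiveSupport() connects the session to the Hive metastore, which
    # lets Spark SQL resolve Hive table names.
    spark = (
        SparkSession.builder
        .appName("hive-table-preview")  # hypothetical application name
        .enableHiveSupport()
        .getOrCreate()
    )

    df = spark.sql(build_preview_query(table_name, limit))
    df.show()
    return df
```

On a cluster where a table such as default.sample_07 exists, calling preview_hive_table("default.sample_07") would print its first ten rows and return them as a DataFrame. The query string is built in a separate function so that the table-name check can be exercised without a running Spark session.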