1. Spark SQL

Spark SQL is a Spark module for structured data processing.

The recommended way to use Spark SQL is through its programmatic APIs. For examples, see "Accessing ORC Files from Spark."
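
The following is a minimal sketch of the programmatic route. It assumes the Spark 1.x `SQLContext` API that the spark-shell examples in this chapter use, and a local master (`local[*]`) so that it can run without a cluster. It builds a small DataFrame from hypothetical recharge records, registers it as a temporary table, and then computes the same aggregate twice: once with a SQL string and once with the DataFrame API. The `Recharge` case class, the `recharges` table name, and the `SparkSqlApiSketch` object are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical sample record; stands in for a real Hive or ORC table.
case class Recharge(resellerId: Int, phoneCreditValue: Double)

object SparkSqlApiSketch {
  // Sums credit per reseller twice: with a SQL string and with the DataFrame API.
  def totals(sqlContext: SQLContext): (Map[Int, Double], Map[Int, Double]) = {
    import sqlContext.implicits._
    val df = Seq(Recharge(1, 10.0), Recharge(1, 5.0), Recharge(2, 7.5)).toDF()
    // Makes the DataFrame visible to sqlContext.sql under the name "recharges".
    df.registerTempTable("recharges")

    val viaSql = sqlContext
      .sql("SELECT resellerId, SUM(phoneCreditValue) AS total FROM recharges GROUP BY resellerId")
      .collect()
      .map(r => r.getInt(0) -> r.getDouble(1))
      .toMap

    val viaApi = df
      .groupBy("resellerId")
      .sum("phoneCreditValue")
      .collect()
      .map(r => r.getInt(0) -> r.getDouble(1))
      .toMap

    (viaSql, viaApi)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for trying the sketch; spark-submit normally sets the master.
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlApiSketch").setMaster("local[*]"))
    try {
      val (viaSql, viaApi) = totals(new SQLContext(sc))
      println(viaSql)
      println(viaApi)
    } finally {
      sc.stop()
    }
  }
}
```

Both paths are planned by the same Spark SQL engine, so the choice between a SQL string and DataFrame calls is mostly a matter of readability.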

An alternative way to access Spark SQL, especially from Beeline, is through the Spark Thrift Server. For more information, see "Accessing Spark SQL through the Spark Thrift Server."

 1.1. Using Hive UDF/UDAF/UDTF with Spark SQL

To use Hive UDFs, UDAFs, and UDTFs natively with Spark SQL:

  1. Start spark-shell, passing the JAR that contains your Hive UDFs to the --jars option:

    spark-shell --jars <path-to-your-hive-udf>.jar
  2. In spark-shell, create a temporary function for each UDF class:

    sqlContext.sql("""create temporary function balance as 'com.github.gbraccialli.hive.udf.BalanceFromRechargesAndOrders'""")
  3. Call your UDFs directly in Spark SQL statements from spark-shell:

    sqlContext.sql("""
    create table recharges_with_balance_array as
    select 
      reseller_id,
      phone_number,
      phone_credit_id,
      date_recharge,
      phone_credit_value,
      balance(orders,'date_order', 'order_value', reseller_id, date_recharge, phone_credit_value) as balance
    from orders
    """)
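
The JAR passed to --jars must contain compiled UDF classes. The following is a hedged sketch of what such a class can look like. It is a simple Hive UDF that extends `org.apache.hadoop.hive.ql.exec.UDF`, and Hive resolves its `evaluate` method by reflection. The package `com.example.hive.udf`, the class `NormalizePhone`, and its digit-stripping logic are hypothetical. Compiling it assumes that the hive-exec and hadoop-common JARs are on the classpath. Writing it in Scala rather than Java assumes that the Scala library is available at run time, which is the case inside spark-shell.

```scala
package com.example.hive.udf

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical simple Hive UDF. Hive locates `evaluate` by reflection,
// so the method name and its argument types define the SQL signature.
class NormalizePhone extends UDF {
  // Keeps only the digits of a phone number; SQL NULL in, SQL NULL out.
  def evaluate(phone: Text): Text =
    if (phone == null) null
    else new Text(phone.toString.filter(_.isDigit))
}
```

Once this class is packaged into a JAR and loaded with --jars, you would register and call it the same way as the balance function: create temporary function normalize_phone as 'com.example.hive.udf.NormalizePhone', then select normalize_phone(phone_number) from orders.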

 1.2. Accessing Spark SQL through the Spark Thrift Server

The Spark Thrift Server provides JDBC access to Spark SQL.
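
Beeline is only one JDBC client, and a JVM program can use the same connection URL. The sketch below runs SHOW TABLES and collects the table names. It assumes two things: the Hive JDBC driver (`org.apache.hive.jdbc.HiveDriver`, shipped in the hive-jdbc JAR) is on the classpath, and a Thrift Server is listening on localhost:10015, as in the Beeline example. The `ThriftServerClient` object and the `spark` user with an empty password are placeholders. These credentials work unchanged only because security is not enabled on the example cluster.

```scala
import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

object ThriftServerClient {
  // Builds a HiveServer2-style JDBC URL, the same form Beeline's !connect uses.
  def jdbcUrl(host: String, port: Int): String = s"jdbc:hive2://$host:$port"

  // Runs SHOW TABLES and returns the first column of each row (the table name).
  def showTables(url: String, user: String, password: String): List[String] = {
    // Loads the Hive JDBC driver; the hive-jdbc JAR must be on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(url, user, password)
    try {
      val stmt = conn.createStatement()
      try {
        val rs = stmt.executeQuery("SHOW TABLES")
        val names = ListBuffer.empty[String]
        while (rs.next()) names += rs.getString(1)
        names.toList
      } finally stmt.close() // closing the statement also closes its ResultSet
    } finally conn.close()
  }

  def main(args: Array[String]): Unit = {
    // Placeholder credentials: accepted only because security is not enabled.
    showTables(jdbcUrl("localhost", 10015), "spark", "").foreach(println)
  }
}
```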

The following example connects to the Thrift Server with Beeline, the HiveServer2 command-line client.

  1. Enable and start the Spark Thrift Server as specified in "Starting the Spark Thrift Server" (in the Non-Ambari Cluster Installation Guide).

  2. From the SPARK_HOME directory, switch to the spark user and launch Beeline:

    su spark
    ./bin/beeline

  3. At the Beeline prompt, connect to the Thrift Server:

    beeline> !connect jdbc:hive2://localhost:10015
    Note: This example does not have security enabled, so any username and password combination should work.

  4. Issue a query. The following example runs a SHOW TABLES query through the Spark Thrift Server:

    0: jdbc:hive2://localhost:10015> show tables;
    
    +------------+--------------+
    | tableName  | isTemporary  |
    +------------+--------------+
    | orc_table  | false        |
    | testtable  | false        |
    +------------+--------------+
    2 rows selected (1.275 seconds)

  5. Exit Beeline:

    0: jdbc:hive2://localhost:10015> exit
  6. Stop the Thrift Server:

    ./sbin/stop-thriftserver.sh