Spark SQL is a Spark module for structured data processing.
The recommended way to use Spark SQL is through its programmatic APIs. For examples, see "Accessing ORC Files from Spark."
An alternative way to access Spark SQL, especially from Beeline, is through the Spark Thrift Server. For more information, see "Accessing Spark SQL through the Spark Thrift Server."
To use Hive UDF/UDAF/UDTF natively with Spark SQL:
Open spark-shell with hive-udf.jar as its parameter:
spark-shell --jars <path-to-your-hive-udf>.jar
From spark-shell, create functions:
sqlContext.sql("""create temporary function balance as 'com.github.gbraccialli.hive.udf.BalanceFromRechargesAndOrders'""");
From spark-shell, use your UDFs directly in Spark SQL:
sqlContext.sql("""
  create table recharges_with_balance_array as
  select
    reseller_id,
    phone_number,
    phone_credit_id,
    date_recharge,
    phone_credit_value,
    balance(orders, 'date_order', 'order_value', reseller_id, date_recharge, phone_credit_value) as balance
  from orders
""");
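Before launching spark-shell, it can help to confirm that the JAR really contains the class named in the create temporary function statement, since a missing class only surfaces as an error once the function is created or called. The following shell sketch shows one way to check. The helper names class_to_entry and jar_has_class are hypothetical, and the sketch assumes the unzip utility is installed.

```shell
# Hypothetical helper: turn a fully qualified class name into the
# path that the class file would have inside a JAR.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeed only if the JAR ($1) lists the class ($2).
# Assumes unzip is available; grep -F matches the path literally.
jar_has_class() {
  unzip -l "$1" | grep -qF "$(class_to_entry "$2")"
}
```

For example, calling jar_has_class with the path to your Hive UDF JAR and com.github.gbraccialli.hive.udf.BalanceFromRechargesAndOrders returns success only if that class is listed in the JAR.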
The Spark Thrift Server provides JDBC access to Spark SQL.
The following example uses the Thrift Server over the HiveServer2 Beeline command-line interface.
Enable and start the Spark Thrift Server as specified in "Starting the Spark Thrift Server" (in the Non-Ambari Cluster Installation Guide).
Connect to the Thrift Server over Beeline. Launch Beeline from SPARK_HOME:
su spark
./bin/beeline
At the Beeline prompt, connect to the Thrift Server:
beeline> !connect jdbc:hive2://localhost:10015
Note: This example does not have security enabled, so any username-password combination should work.
Issue a request. The following example issues a SHOW TABLES query on the HiveServer2 process:
0: jdbc:hive2://localhost:10015> show tables;
+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| orc_table  | false        |
| testtable  | false        |
+------------+--------------+
2 rows selected (1.275 seconds)
Exit the Beeline session:
0: jdbc:hive2://localhost:10015> exit
Stop the Thrift Server:
./sbin/stop-thriftserver.sh
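The interactive session above can also be run non-interactively, which is convenient for scripts. The following is a minimal sketch, not a definitive procedure: it uses Beeline's -u option to pass the JDBC URL and its -e option to pass a single statement. The function names thrift_jdbc_url and run_thrift_query are hypothetical, the host and port are taken from the example above, and SPARK_HOME is assumed to point at the Spark installation. As in the example, no security is assumed.

```shell
# Assumed connection details, taken from the Beeline example above.
THRIFT_HOST="${THRIFT_HOST:-localhost}"
THRIFT_PORT="${THRIFT_PORT:-10015}"

# Hypothetical helper: build the JDBC URL that Beeline connects to.
thrift_jdbc_url() {
  printf 'jdbc:hive2://%s:%s\n' "$THRIFT_HOST" "$THRIFT_PORT"
}

# Hypothetical helper: run one SQL statement ($1) through Beeline and exit.
# -u supplies the JDBC URL; -e supplies the statement to execute.
# Falls back to the current directory if SPARK_HOME is not set.
run_thrift_query() {
  "${SPARK_HOME:-.}/bin/beeline" -u "$(thrift_jdbc_url)" -e "$1"
}
```

With the Thrift Server running, run_thrift_query "show tables;" should print the same table listing as the interactive session and then return to the shell.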