Apache Spark - PySpark createExternalTable() from SQLContext
I'm using Spark 1.6.1. I have a bunch of tables in MariaDB that I wish to convert to PySpark DataFrame objects. createExternalTable() is throwing. Example:
In [292]: tn = sql.tableNames()[10]

In [293]: df = sql.createExternalTable(tn)

/home/charles/spark-1.6.1/python/lib/py4j-0.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling o18.createExternalTable.
: java.lang.RuntimeException: Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.execution.SparkStrategies$DDLStrategy$.apply(SparkStrategies.scala:379)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
	at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
	at org.apache.spark.sql.SQLContext.createExternalTable(SQLContext.scala:695)
	at org.apache.spark.sql.SQLContext.createExternalTable(SQLContext.scala:668)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)
The same thing happens if I specify source='jdbc'.
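For completeness, the JDBC route I am attempting looks roughly like the sketch below. The host, database name, table, and credentials are placeholders; only the option-building helper is plain Python, while the actual Spark call (shown in comments) needs a live SQLContext and the MariaDB JDBC jar on the classpath:

```python
def jdbc_options(host, db, table, user, password, port=3306):
    """Build the option dict for DataFrameReader.format('jdbc').

    Plain Python; no Spark required. All argument values here are
    placeholders for illustration.
    """
    return {
        "url": "jdbc:mariadb://{}:{}/{}".format(host, port, db),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.mariadb.jdbc.Driver",
    }

# With a live SQLContext (e.g. `sqlContext` in the pyspark shell):
#   opts = jdbc_options("localhost", "quotes", tn, "user", "pw")
#   df = sqlContext.read.format("jdbc").options(**opts).load()

print(jdbc_options("localhost", "quotes", "ohlc", "u", "p")["url"])
```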
The table exists:
In [297]: sql.sql("SELECT * FROM {} LIMIT 5".format(tn)).show()
+--------------------+-----+-----+-----+----+------+------+------+----------------------+----+----+----+-----+
|                date| open| high|  low|last|change|settle|volume|prev_day_open_interest|prod|exch|year|month|
+--------------------+-----+-----+-----+----+------+------+------+----------------------+----+----+----+-----+
|1999-10-29 00:00:...|245.0|245.0|245.0|null|  null| 245.0|   1.0|                   1.0|   c| cme|2001|    h|
|1999-11-01 00:00:...|245.0|245.0|245.0|null|  null| 245.0|   0.0|                   1.0|   c| cme|2001|    h|
|1999-11-02 00:00:...|245.0|245.0|245.0|null|  null| 245.0|   0.0|                   1.0|   c| cme|2001|    h|
|1999-11-03 00:00:...|245.0|245.5|245.0|null|  null| 245.5|   5.0|                   6.0|   c| cme|2001|    h|
|1999-11-04 00:00:...|245.5|245.5|245.5|null|  null| 245.5|   0.0|                   6.0|   c| cme|2001|    h|
+--------------------+-----+-----+-----+----+------+------+------+----------------------+----+----+----+-----+
According to the error, this should work with Hive data. But I'm not using a HiveContext, only a SQLContext. And according to https://spark.apache.org/docs/latest/api/python/pyspark.sql.html this is supported for versions >= 1.3.
Is there a way to extract a DataFrame from a SQL table?
Given your description, what you want here is not createExternalTable, which is used to manage Hive tables. To get a DataFrame for a simple table:
df = sqlContext.table(tn)
or assign the result of a sql call:
df = sqlContext.sql("SELECT * FROM {}".format(tn))
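Putting the two approaches side by side, here is a minimal sketch. It assumes a live `sqlContext` (as in the pyspark shell); the query-building helper itself is plain Python, and `ohlc_quotes` is a hypothetical table name:

```python
def select_all(table_name, limit=None):
    """Build a SELECT * query string for a named table (plain Python)."""
    query = "SELECT * FROM {}".format(table_name)
    if limit is not None:
        query += " LIMIT {}".format(limit)
    return query

# With a live SQLContext, either of these yields a DataFrame for a
# registered table `tn`:
#   df = sqlContext.table(tn)             # direct lookup by name
#   df = sqlContext.sql(select_all(tn))   # via a SQL query

print(select_all("ohlc_quotes", limit=5))  # hypothetical table name
```

`table()` is the simpler choice when you want the whole table; `sql()` lets you push projection and filtering into the query.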