PySpark program throwing "name 'spark' is not defined"

The program below throws the error NameError: name 'spark' is not defined:
Traceback (most recent call last):
  File "pgm_latest.py", line 232, in <module>
    sconf = SparkConf().set(spark.dynamicAllocation.enabled, true) \
        .set(spark.dynamicAllocation.maxExecutors, 300) \
        .set(spark.shuffle.service.enabled, true) \
        .set(spark.shuffle.spill.compress, true)
NameError: name 'spark' is not defined

Submitted with:

spark-submit --driver-memory 12g --master yarn-cluster --executor-memory 6g --executor-cores 3 pgm_latest.py
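Note that the traceback is an ordinary Python NameError: in `.set(spark.dynamicAllocation.enabled, true)` the config key is unquoted, so Python tries to look up a variable named `spark` (and `true`) before any such name exists. A minimal reproduction, no Spark required:

```python
# Unquoted config keys make Python evaluate `spark` as a variable name.
# Nothing named `spark` has been defined at that point, so a NameError
# is raised, exactly as in the traceback above.
try:
    sconf = spark.dynamicAllocation.enabled  # NameError here
except NameError as e:
    print(e)  # name 'spark' is not defined
```

The immediate fix is to pass the key and value as strings, e.g. `.set("spark.dynamicAllocation.enabled", "true")`.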
Code:

#!/usr/bin/python
import sys
import os
from datetime import *
from time import *
from pyspark.sql import *
from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext()
sqlCtx = HiveContext(sc)
sqlCtx.sql('SET spark.sql.autoBroadcastJoinThreshold=104857600')
sqlCtx.sql('SET Tungsten=true')
sqlCtx.sql('SET spark.sql.shuffle.partitions=500')
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.compressed=true')
sqlCtx.sql('SET spark.sql.inMemoryColumnarStorage.batchSize=12000')
sqlCtx.sql('SET spark.sql.parquet.cacheMetadata=true')
sqlCtx.sql('SET spark.sql.parquet.filterPushdown=true')
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true')
sqlCtx.sql('SET spark.sql.parquet.binaryAsString=true')
sqlCtx.sql('SET spark.sql.parquet.compression.codec=snappy')
sqlCtx.sql('SET spark.sql.hive.convertMetastoreParquet=true')

## main functionality
def main(sc):
    ...  # body elided in the original post

if __name__ == '__main__':
    # Configure options
    sconf = SparkConf() \
        .set("spark.dynamicAllocation.enabled", "true") \
        .set("spark.dynamicAllocation.maxExecutors", 300) \
        .set("spark.shuffle.service.enabled", "true") \
        .set("spark.shuffle.spill.compress", "true")
    sc = SparkContext(conf=sconf)
    # Execute main functionality
    main(sc)
    sc.stop()
I think you are using an old Spark version, before 2.x; the spark (SparkSession) entry point only exists in Spark 2.x and later.
Instead of this:

spark.createDataFrame(...)

use this:

> df = sqlContext.createDataFrame(...)
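To make this concrete, here is a sketch of the two entry points side by side; it assumes a working Spark installation, and the column names and data are made up for illustration:

```python
# Spark 1.x: the entry points are SparkContext plus SQLContext/HiveContext.
# There is no global `spark` object, which is why the question's code fails.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'letter'])

# Spark 2.x+: SparkSession wraps all of the above, and shells/notebooks
# predefine it as `spark`; in a spark-submit script you build it yourself.
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.enableHiveSupport().getOrCreate()
# df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['id', 'letter'])
```

In short: on Spark 1.x use the sqlContext you built, and on 2.x build (or reuse) a SparkSession before referring to spark.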