Jackson: Spark crash while reading a JSON file when linked with aws-java-sdk
Let config.json be a small JSON file:

    { "toto": 1 }
I wrote a simple piece of code that reads the JSON file with sc.textFile (since the file can be on S3, local, or HDFS, textFile is convenient):
    import org.apache.spark.{SparkContext, SparkConf}

    object TestAwsSdk {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("test-aws-sdk").setMaster("local[*]")
        val sc = new SparkContext(sparkConf)
        val json = sc.textFile("config.json")
        println(json.collect().mkString("\n"))
      }
    }
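For completeness, the same textFile call covers the three storage back-ends by changing only the URI scheme. A sketch with hypothetical paths (credential and filesystem configuration omitted):

    // Hypothetical paths: one textFile call handles all three schemes,
    // provided the matching Hadoop filesystem is configured.
    val local = sc.textFile("file:///tmp/config.json")
    val hdfs  = sc.textFile("hdfs://namenode:8020/conf/config.json")
    val s3    = sc.textFile("s3n://my-bucket/config.json")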
The sbt file only pulls the spark-core library:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
    )
The program works as expected, writing the content of config.json to standard output.
Now I want to link with aws-java-sdk, Amazon's SDK for accessing S3:

    libraryDependencies ++= Seq(
      "com.amazonaws" % "aws-java-sdk" % "1.10.30" % "compile",
      "org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
    )
Executing the same code, Spark throws the following exception:
    Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
     at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
        at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
        at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
        at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
        at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
        at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
        at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
        at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
        at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
        at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
        at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
        at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
        at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:133)
        at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:133)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:133)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
        at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1012)
        at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:827)
        at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:825)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
        at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
        at TestAwsSdk$.main(TestAwsSdk.scala:11)
        at TestAwsSdk.main(TestAwsSdk.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Reading the stack trace, it seems that when aws-java-sdk is linked, sc.textFile detects that the file is a JSON file and tries to parse it with Jackson, assuming a certain format which it cannot find, of course. I need to link with aws-java-sdk, so my questions are:

1- Why does adding aws-java-sdk modify the behavior of spark-core?

2- Is there a work-around (the file can be on HDFS, S3, or local)?
I talked to Amazon support. It is a dependency issue with the Jackson library. In sbt, override Jackson:
    libraryDependencies ++= Seq(
      "com.amazonaws" % "aws-java-sdk" % "1.10.30" % "compile",
      "org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
    )

    dependencyOverrides ++= Set(
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"
    )
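To double-check that the override actually took effect, one option is to print the Jackson version at runtime. A minimal sketch (the JacksonVersionCheck object name is mine), relying on jackson-databind's PackageVersion constant, which is baked into the jar at build time:

    import com.fasterxml.jackson.databind.cfg.PackageVersion

    // Hypothetical helper: reports which jackson-databind jar actually
    // ended up on the classpath. Should print 2.4.4 once the override applies.
    object JacksonVersionCheck {
      def main(args: Array[String]): Unit = {
        println(s"jackson-databind on classpath: ${PackageVersion.VERSION}")
      }
    }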
Their answer: We have done this on a Mac, an EC2 (Red Hat AMI) instance, and on EMR (Amazon Linux), i.e. three different environments. The root cause of the issue is that sbt builds a dependency graph and deals with version conflicts by evicting the older version and picking the latest version of the dependent library. In this case, Spark depends on the 2.4 version of the Jackson library while the AWS SDK needs 2.5. So there is a version conflict, and sbt evicts Spark's dependency version (which is older) and picks the AWS SDK version (which is the latest).
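As a side note, sbt can be told to surface this kind of silent eviction instead of resolving it automatically. A sketch, assuming sbt 0.13's Ivy-based resolution: with the strict conflict manager, the build fails on any unresolved version conflict, which would have flagged the Jackson clash at build time rather than at runtime:

    // In build.sbt: fail the build on any version conflict instead of
    // silently evicting the older version (Ivy's "strict" conflict manager).
    conflictManager := ConflictManager.strict

    // Each conflict must then be resolved explicitly, e.g. with
    // dependencyOverrides as shown above.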