jackson - Spark crash while reading JSON file when linked with aws-java-sdk


Let config.json be a small JSON file:

{
    "toto": 1
}

I wrote a simple piece of code that reads the JSON file with sc.textFile (since the file can be on S3, local, or HDFS, textFile is convenient):

import org.apache.spark.{SparkContext, SparkConf}

object TestAwsSdk {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("test-aws-sdk").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val json = sc.textFile("config.json")

    println(json.collect().mkString("\n"))
  }
}

The sbt file pulls in only the spark-core library:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
)

The program works as expected, writing the content of config.json to standard output.

Now I want to link with aws-java-sdk, Amazon's SDK for accessing S3:

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.10.30" % "compile",
  "org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
)

Executing the same code, Spark throws the following exception:

Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
    at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
    at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
    at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
    at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
    at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
    at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
    at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:133)
    at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:133)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:133)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1012)
    at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:827)
    at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:825)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
    at TestAwsSdk$.main(TestAwsSdk.scala:11)
    at TestAwsSdk.main(TestAwsSdk.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Reading the stack, it seems that when aws-java-sdk is linked, sc.textFile detects that the file is a JSON file and tries to parse it with Jackson, assuming a certain format, which it cannot find of course. Since I need to link with aws-java-sdk, my questions are:

1- Why does adding aws-java-sdk modify the behavior of spark-core?

2- Is there a work-around (the file can be on HDFS, S3, or local)?

I talked to Amazon support. It is a dependency issue with the Jackson library. In sbt, override the Jackson version:

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.10.30" % "compile",
  "org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
)

dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"
)
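(As a side note: on sbt 1.x, dependencyOverrides is a Seq rather than a Set, so the same override would be written as follows; a minimal sketch using the version numbers from above.)

dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"  // pin Jackson to the 2.4 line that Spark 1.5.1 depends on
)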

Their answer: We have done this on a Mac, an EC2 (RedHat AMI) instance, and on EMR (Amazon Linux). Three different environments. The root cause of the issue is that sbt builds a dependency graph and deals with version conflicts by evicting the older version and picking the latest version of the dependent library. In this case, Spark depends on the 2.4 version of the Jackson library while the AWS SDK needs 2.5. So there is a version conflict, and sbt evicts Spark's dependency version (which is older) and picks the AWS SDK's version (which is the latest).
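To double-check which version is actually resolved, sbt's evicted task reports the evicted modules, and the version can also be verified at runtime. Here is a minimal sketch (JacksonVersionCheck is a hypothetical helper; it assumes jackson-databind is on the classpath, which it is here via spark-core):

import com.fasterxml.jackson.databind.ObjectMapper

object JacksonVersionCheck {
  def main(args: Array[String]): Unit = {
    // ObjectMapper implements Jackson's Versioned interface, so
    // version() reports the jackson-databind artifact version in use.
    println(new ObjectMapper().version())
  }
}

With the dependencyOverrides above, this should print 2.4.4; without the override, it prints the 2.5.x version that sbt resolves for aws-java-sdk 1.10.30.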

