hadoop - java.io.IOException: Not a data file


I am processing a bunch of Avro files stored in a nested directory structure in HDFS. The files are stored in a year/month/day/hour directory layout.

I wrote this simple code to process them:

    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
    val rootDir = "/user/cloudera/rootdir"
    val rdd1 = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](rootDir)
    rdd1.count()

I get the exception pasted below. The biggest problem I'm facing is that it doesn't tell me which file is not a data file. I would have to go into HDFS and scan through thousands of files to see which one is not a data file.

Is there a more efficient way to debug/solve this?
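One way to find the offending file without reading each one fully: every Avro container file begins with the four magic bytes `Obj` followed by `0x01`, so a file that fails this check is "not a data file". Below is a minimal sketch in Python that walks a directory tree (e.g. a local copy of the data, or an HDFS mount point) and flags files whose header doesn't match; the function name and directory layout are illustrative assumptions, not part of the original question.

```python
import os

# Every Avro object container file starts with these 4 magic bytes
# (assumption verified against the Avro file-format spec: "Obj" + 0x01).
AVRO_MAGIC = b"Obj\x01"

def find_non_avro_files(root):
    """Walk a directory tree and return paths whose first 4 bytes
    are not the Avro container-file magic."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                # read(4) returns fewer bytes for tiny files, which
                # also fails the comparison and flags the file
                if f.read(4) != AVRO_MAGIC:
                    bad.append(path)
    return sorted(bad)
```

For files still in HDFS you could apply the same idea from the shell, e.g. `hdfs dfs -cat <file> | head -c 4`, checking each path emitted by `hdfs dfs -ls -R`.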

    5/11/01 19:01:49 WARN TaskSetManager: Lost task 1084.0 in stage 14.0 (TID 11562, datanode): java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
        at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.createAvroFileReader(AvroRecordReaderBase.java:183)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:94)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

One of the nodes in the cluster where the block was located was down. The data could not be found because of that, and it gave the error. The solution was to repair and bring up the down nodes in the cluster.

I was getting the exact error below from a Java MapReduce program that uses Avro input. Below is a rundown of the issue.

    Error: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
        at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.createAvroFileReader(AvroRecordReaderBase.java:183)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:94)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

I decided to cat the file, because I was able to run the program on another file in the same HDFS folder, and received the following.

    INFO hdfs.DFSClient: No node available for <block location in cluster> node: java.io.IOException: No live nodes contain block BP-6168826450-10.1.10.123-1457116155679:blk_1073853378_112574 after checking nodes = [], ignoredNodes = null. No live nodes contain current block. Block locations: Dead nodes: . Will get new block locations from namenode and retry...

We had been having problems with our cluster, and unfortunately some of its nodes were down. After we remedied that problem, the error was resolved.
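When you see a "No live nodes contain block" message like the one above, the block ID it names (the `blk_<id>_<genstamp>` part) is the thing to investigate, e.g. with `hdfs fsck` (recent Hadoop versions accept a `-blockId` argument) to see which file and datanodes it belongs to. A small helper to pull block IDs out of captured log output, sketched here in Python on the assumption that the log uses the standard `blk_<blockId>_<generationStamp>` format:

```python
import re

# Matches HDFS block IDs of the form blk_<blockId>_<generationStamp>
BLOCK_ID_RE = re.compile(r"blk_\d+_\d+")

def extract_block_ids(log_text):
    """Return the distinct HDFS block IDs mentioned in a chunk of log text,
    sorted for stable output."""
    return sorted(set(BLOCK_ID_RE.findall(log_text)))
```

Each ID this returns can then be checked from the shell, e.g. `hdfs fsck -blockId blk_1073853378_112574`, to confirm whether any live datanode still holds a replica.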

