hadoop - Why use MapReduce vs HBase shell filters
I need to query data in HBase. The queries are:
- Show the books of author "authord".
- How many books by author "authora" are in the database?
As far as I know, I can use either MapReduce or HBase shell filters. Please correct me if I'm wrong.
My question is: why use MapReduce (programming needed) if I can get the same results in the HBase shell (no programming needed) using its filters?
Thanks for any answers. Have a nice day.
There are three ways to get results out of HBase.
1) Shell: fine for simple analysis of small volumes of data, e.g. a developer inspecting a small amount of data by hand. If you know the rowkey, you can fetch the data directly and quickly.
2) HBase non-batch clients: for example, a Java client that connects to HBase, applies filters, and retrieves results for a small amount of data.
Why MapReduce rather than the plain HBase API?
What happens if the data is huge and needs processing? In that case the HBase shell will either hang and become unresponsive, or the results will scroll past in a continuous stream that you cannot read or analyze.
3) MapReduce (batch client): for processing huge volumes of data. You can use the same Filter and Scan objects as in a Java HBase client program to get the results.
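As a sketch of option 2, the first query from the question ("show books of an author") could look like this with the Java client (HBase 1.x API). The table name `books`, column family `info`, qualifier `author`, and value `authora` are illustrative assumptions, not names from the original post:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class BooksByAuthor {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("books"))) {
            // Keep only rows whose info:author column equals "authora"
            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                    Bytes.toBytes("info"), Bytes.toBytes("author"),
                    CompareOp.EQUAL, Bytes.toBytes("authora"));
            Scan scan = new Scan();
            scan.setFilter(filter);
            long count = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                    count++;
                }
            }
            // Also answers the second query: how many books by this author
            System.out.println("Books by authora: " + count);
        }
    }
}
```

The same `Scan` and `SingleColumnValueFilter` objects can be fed to a MapReduce job via `TableMapReduceUtil.initTableMapperJob`, which is why option 3 reuses the client-side filter code.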
Advantages of / reasons to use MapReduce with HBase:
- batch/parallel processing
- results are persisted as part-files in HDFS (if you specify an HDFS sink)
- you can aggregate results, e.g. an ETL pipeline from a staging table into a summary table
A classic example of the above is counting the number of rows. Just think about why the HBase team provides a MapReduce job for counting rows when the same can be achieved in the HBase shell.
MapReduce way:
$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter
Usage: RowCounter [options] <tablename> [--starttime=[start] --endtime=[end] [--range=[startkey],[endkey]] [<column1> <column2>...]]
HBase shell way: from within the HBase shell you can run
hbase> count 'tablename'
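For completeness, the shell can also apply the same filter idea as the Java client, using HBase's shell filter syntax. The table and column names here are illustrative assumptions:

```shell
# Show rows where info:author equals 'authora' (illustrative names)
hbase> scan 'books', {FILTER => "SingleColumnValueFilter('info', 'author', =, 'binary:authora')"}
# Count all rows in the table
hbase> count 'books'
```

This works fine interactively, but for a large table the scan output scrolls past uncontrollably, which is exactly the limitation the MapReduce approach avoids.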
I hope this answers your question :-)