How to import data from MySQL into Hive using Apache NiFi?
I am trying to import data from MySQL into Hive using the QueryDatabaseTable and PutHiveQL processors, but an error occurs.
I have a few questions:
- What is the output format of PutHiveQL?
- Should the output table be created beforehand, or does the processor create it?
- Where can I find a template for a MySQL-to-Hive process?
Here is some information to address your questions:
The flow files that are input to PutHiveQL are output after they have been sent to Hive (or if the send fails), so the output format (and contents) are identical to the input format/contents.
The output table should be created beforehand, or you can send PutHiveQL a "CREATE TABLE IF NOT EXISTS" statement first and it will create the table for you.
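For example, a flow file containing a DDL statement like the following could be routed to PutHiveQL before any inserts; the table name, columns, and storage format here are illustrative, not taken from the question:

```sql
-- Hypothetical target table; adjust the name, columns, and types
-- to match the MySQL table being imported.
CREATE TABLE IF NOT EXISTS users (
  id INT,
  name STRING,
  email STRING
)
STORED AS ORC;
```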
I'm not aware of an existing template, but the basic approach would be the following:
QueryDatabaseTable -> ConvertAvroToJSON -> SplitJson -> EvaluateJsonPath -> UpdateAttribute (optional) -> ReplaceText -> PutHiveQL
QueryDatabaseTable performs incremental fetches of the MySQL table.
ConvertAvroToJSON gets the records into a format you can manipulate (there aren't many processors that handle Avro).
SplitJson creates one flow file for each of the records/rows.
EvaluateJsonPath can extract values from the records and put them into flow file attributes.
UpdateAttribute could add attributes containing type information. This is optional, and only needed if you are using prepared statements with PutHiveQL.
ReplaceText builds the HiveQL statement (an INSERT, for example) with either parameters (if you want prepared statements) or values hard-coded from the attributes.
PutHiveQL executes the statement(s) to get the records into Hive.
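As a sketch of the ReplaceText step, a replacement value using NiFi Expression Language could turn the extracted attributes into a statement like the one below. The table name and the `id`, `name`, and `email` attributes are assumptions standing in for whatever EvaluateJsonPath actually extracted:

```sql
-- ReplaceText replacement value; ${id}, ${name}, ${email} are hypothetical
-- flow file attributes populated by the earlier EvaluateJsonPath step.
INSERT INTO users VALUES (${id}, '${name}', '${email}')
```

Note that hard-coding values this way issues one INSERT per row, which is why the ORC/streaming approaches below are more efficient for large tables.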
In NiFi 1.0, there is a ConvertAvroToORC processor, which is a more efficient way to get data into Hive (and to query it from Hive). That approach converts the results of QueryDatabaseTable to ORC files, which are placed in HDFS (using PutHDFS), and it generates a partial Hive DDL statement to create the table (using the type information from the Avro records). You pass that statement (after filling in the target location) to PutHiveQL, and then you can start querying your table.
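The completed DDL might look like the following; the table name, columns, and HDFS path are illustrative, and only the LOCATION clause is the part you fill in yourself:

```sql
-- Partial CREATE TABLE statement generated by ConvertAvroToORC,
-- completed with an assumed target directory in HDFS (the same
-- directory PutHDFS writes the ORC files to).
CREATE EXTERNAL TABLE IF NOT EXISTS users (id INT, name STRING, email STRING)
STORED AS ORC
LOCATION '/data/users';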
There are also plans for a PutHiveStreaming processor that takes Avro records as input, so the flow could simply be QueryDatabaseTable -> PutHiveStreaming, which would insert the records directly into Hive (and is more efficient than issuing multiple INSERT statements).