amazon web services - Passing variables between EC2 instances in a multi-step AWS Data Pipeline
I have a pipeline set up with 3 main stages:
1) Take an input zipped file, unzip it in S3, and run a basic verification on each file to guarantee integrity, then move to step 2 (see the sketch after this list).
2) Kick off 2 simultaneous processing tasks on separate EC2 instances (parallelizing this step saves a lot of time, which we need for efficiency's sake). Each EC2 instance runs its data processing steps on some of the files unzipped into S3 in step 1; the files required are different for each instance.
3) After the 2 simultaneous processes are both done, spin up an EC2 instance for the final data processing. Once that's done, run a cleanup job that removes the unzipped files from S3, leaving the original zip file in place.
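To give a rough idea, step 1 boils down to something like the following (a simplified sketch using boto3 and the standard zipfile module; the bucket name, zip key, and prefix are placeholders, not our actual setup):

    import io
    import zipfile

    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-pipeline-bucket"    # placeholder bucket name
    ZIP_KEY = "input/data.zip"       # placeholder key of the original zip
    UNZIP_PREFIX = "unzipped/"       # where the extracted files go for step 2

    def unzip_and_verify():
        # Pull the archive down from S3 into memory
        body = s3.get_object(Bucket=BUCKET, Key=ZIP_KEY)["Body"].read()
        archive = zipfile.ZipFile(io.BytesIO(body))

        # Basic integrity check: testzip() returns the first corrupt member, or None
        bad = archive.testzip()
        if bad is not None:
            raise RuntimeError("Corrupt member in archive: " + bad)

        # Write each member back to S3 under the unzipped/ prefix for step 2
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            s3.put_object(Bucket=BUCKET, Key=UNZIP_PREFIX + name, Body=archive.read(name))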
So, one of the problems we're running into: we have 4 EC2 instances that run the pipeline process, and there are global parameters each EC2 instance needs access to. If we were running on a single instance we would of course use shell variables to accomplish this, but we need separate instances for efficiency. Our best idea is to store a flat file in an S3 bucket that holds these global variables, read it on initialization, and write it back if the values need to change. That feels gross and it seems like there should be a better way, but we can't figure one out yet (a sketch of what we have in mind is below). I saw there's a way to set parameters that can be accessed at any part of the pipeline, but it looks like they can only be set at the per-pipeline level, not at the granularity of each run of the pipeline. Does anyone have resources here? Any help is appreciated.
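Concretely, the flat-file idea looks roughly like this (again just a sketch; the bucket, key, and JSON shape are placeholders):

    import json

    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-pipeline-bucket"      # placeholder
    PARAMS_KEY = "state/globals.json"  # placeholder flat file holding the shared variables

    def load_globals():
        # Every instance reads this on initialization
        obj = s3.get_object(Bucket=BUCKET, Key=PARAMS_KEY)
        return json.loads(obj["Body"].read())

    def save_globals(params):
        # ...and writes it back if a value needs to change. There is no locking here,
        # so two instances writing at the same time can clobber each other's updates.
        s3.put_object(Bucket=BUCKET, Key=PARAMS_KEY, Body=json.dumps(params).encode("utf-8"))

The lack of any locking around that write is a big part of why it feels gross.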
We were able to solve this using DynamoDB to keep track of the variables/state. Data Pipeline doesn't have a mechanism for this; the other option, parameter values, unfortunately works per pipeline, not per job. You'll need to set up a DynamoDB table and use the pipeline job id to keep track of state, connecting via the CLI tools or an SDK.
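Roughly, via the SDK (boto3 here; the table name, key schema, and attribute names are just examples, and the table needs to exist beforehand with a string partition key such as jobId):

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("pipeline-state")  # example table, partition key: jobId

    def set_state(job_id, name, value):
        # Upsert a single shared variable on the item for this pipeline job
        table.update_item(
            Key={"jobId": job_id},
            UpdateExpression="SET #k = :v",
            ExpressionAttributeNames={"#k": name},
            ExpressionAttributeValues={":v": value},
        )

    def get_state(job_id):
        # Read back all shared variables for this pipeline job
        resp = table.get_item(Key={"jobId": job_id}, ConsistentRead=True)
        return resp.get("Item", {})

Each instance just needs the job id passed to it (for example as an argument to the shell command it runs) so they all read and write the same item.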