Simplify incoming data ingestion with dynamic parameterized datasets in AWS Glue DataBrew

amazon.com
 3 days ago

When data analysts and data scientists prepare data for analysis, they often rely on periodically generated data produced by upstream services, such as labeling datasets from Amazon SageMaker Ground Truth or Cost and Usage Reports from AWS Billing and Cost Management. Alternatively, they can regularly upload such data to Amazon Simple Storage Service (Amazon S3) for further processing. In this post, we demonstrate how you can prepare data from files that are already in your S3 bucket, as well as from new incoming files, using AWS Glue DataBrew, a visual data preparation service that provides over 250 transformations for cleaning and normalizing data. The recently launched dynamic datasets feature of DataBrew lets you reuse such datasets by applying a single recipe across multiple runs of the same job. We show how to do this and highlight other use cases for the dynamic datasets feature.
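As a rough illustration of what a dynamic dataset definition might look like outside the console, the sketch below builds a request for the DataBrew `CreateDataset` API through boto3. The bucket, prefix, dataset name, and the `file_name` path parameter are all hypothetical, and the field names reflect the boto3 `create_dataset` request shape as we understand it, so check them against the current API reference before relying on them.

```python
from typing import Any, Dict


def build_dynamic_dataset_request(bucket: str, prefix: str, name: str) -> Dict[str, Any]:
    """Build keyword arguments for a parameterized DataBrew dataset.

    All values passed in are illustrative; nothing here comes from the post itself.
    """
    return {
        "Name": name,
        "Format": "CSV",
        "Input": {
            "S3InputDefinition": {
                "Bucket": bucket,
                # {file_name} is a path parameter; it is meant to be
                # resolved against the bucket contents on each job run.
                "Key": f"{prefix}/{{file_name}}.csv",
            }
        },
        "PathOptions": {
            # Definition of the hypothetical path parameter used in the key.
            "Parameters": {
                "file_name": {
                    "Name": "file_name",
                    "Type": "String",
                    # Assumption: expose the matched value as a dataset column.
                    "CreateColumn": True,
                }
            },
            # Keep only the most recently modified matching file.
            "FilesLimit": {
                "MaxFiles": 1,
                "OrderedBy": "LAST_MODIFIED_DATE",
                "Order": "DESCENDING",
            },
        },
    }


def create_dynamic_dataset(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to DataBrew (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    return boto3.client("databrew").create_dataset(**request)
```

Calling `build_dynamic_dataset_request("my-bucket", "incoming", "incoming-files")` yields a request whose S3 key is `incoming/{file_name}.csv`. Because the path is resolved again on each run, a job and recipe attached to this dataset could pick up the newest file without being redefined.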

aws.amazon.com
