
Integration with Hive and EMR

We have a use case where we load raw feed data from S3 onto EMR and use Hive queries to create temp tables and transform the data before loading it into Snowflake. I didn't notice any mention of Hive/EMR in the documentation or integration guide. Is there a custom way of doing this?

1 Community Answer

Matillion Agent  

Ian Funnell —

Hi Gurudutt,

Matillion doesn’t have dedicated components for defining or running EMR jobs: Matillion’s transformation components are for Snowflake (or BigQuery, or Redshift).

It sounds like what you’ll need to do is:

  • Continue with your current EMR jobs to process the S3 input data.
  • Have the EMR jobs write their output to S3, then load those files into Snowflake using an S3 Load Component (a rough sketch of the equivalent load follows this list). Optionally, you could use an S3 Put Object Component to copy the files from HDFS into S3 yourself.
  • Use Snowflake to perform the data merging, aggregation and further transformations, inside Matillion Transformation jobs.
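
To give a feel for what the S3 Load step amounts to, here is a rough Python sketch using the snowflake-connector-python package to run the equivalent COPY INTO yourself. The table name, S3 path, credentials and connection details below are placeholders for illustration, not values Matillion generates:

    # Rough sketch: load EMR output files from S3 into a Snowflake table,
    # which is broadly what an S3 Load step does for you.
    # All names and credentials below are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="MY_USER",          # placeholder credentials
        password="MY_PASSWORD",
        account="MY_ACCOUNT",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="RAW",
    )

    copy_sql = """
        COPY INTO raw_feed                                -- placeholder target table
        FROM 's3://my-bucket/emr-output/'                 -- placeholder EMR output prefix
        CREDENTIALS = (AWS_KEY_ID='***' AWS_SECRET_KEY='***')
        FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1)
    """

    try:
        cur = conn.cursor()
        cur.execute(copy_sql)
        # COPY INTO returns one row per file with its load status and row counts
        for row in cur.fetchall():
            print(row)
    finally:
        conn.close()

In Matillion itself you would set the same details (bucket path, file format, target table) as properties on the S3 Load Component rather than writing the SQL by hand.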

Best regards,
Ian
