Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

Excel Query Component - Loading Multiple Files

Is there a way to use a File Iteration object to process multiple Excel files in a S3 bucket? I can see the properties to process one (1) file using the Excel Query Component, but can't determine how to process through many files.

1 Community Answers

Matillion Agent  

Arawan Gajajiva —

Hi Jeffrey -

Yes, you can indeed use a File Iterator component to iterate over a set of Excel files in an S3 Bucket. Generally speaking, the things you would need to do are:

  1. Create an Orchestration Job
  2. Create a Variable to store the “current_file”
  3. Add a File Iterator component
    • Configuration:
      • Input Data Type = S3
      • Input Data URL = s3://[PATH TO BUCKET]
      • Filter Regex = .*xlsx —> Or whatever pattern meets your needs.
      • Variables
        • Variable = current_file (or whatever you named the variable in Step 2)
        • Path Section = Filename
  4. Add an Excel Query component
    • Connect the File Iterator to the Excel Query Component
    • Configuration:
      • Storage Type = Amazon S3 Storage
      • Storage URL = s3://[PATH TO BUCKET]/$(current_file}
        • where ${current_file} is your Variable
    • Configure the rest of the component as you would to read a single Excel sheet.

When the Orchestration runs, it will iterate through all of the files found in the S3 bucket that matches the defined criteria and load them into Redshift. The primary assumption with this method is that ALL of the Excel files that you have identified to load all share the same structure (worksheet name and field name/order).

Regards,
Arawan

Post Your Community Answer

To add an answer please login