Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

Excel Query Component - Loading Multiple Files

Is there a way to use a File Iteration object to process multiple Excel files in a S3 bucket? I can see the properties to process one (1) file using the Excel Query Component, but can't determine how to process through many files.

3 Community Answers

Matillion Agent  

Arawan Gajajiva —

Hi Jeffrey -

Yes, you can indeed use a File Iterator component to iterate over a set of Excel files in an S3 Bucket. Generally speaking, the things you would need to do are:

  1. Create an Orchestration Job
  2. Create a Variable to store the “current_file”
  3. Add a File Iterator component
    • Configuration:
      • Input Data Type = S3
      • Input Data URL = s3://[PATH TO BUCKET]
      • Filter Regex = .*xlsx —> Or whatever pattern meets your needs.
      • Variables
        • Variable = current_file (or whatever you named the variable in Step 2)
        • Path Section = Filename
  4. Add an Excel Query component
    • Connect the File Iterator to the Excel Query Component
    • Configuration:
      • Storage Type = Amazon S3 Storage
      • Storage URL = s3://[PATH TO BUCKET]/$(current_file}
        • where ${current_file} is your Variable
    • Configure the rest of the component as you would to read a single Excel sheet.

When the Orchestration runs, it will iterate through all of the files found in the S3 bucket that matches the defined criteria and load them into Redshift. The primary assumption with this method is that ALL of the Excel files that you have identified to load all share the same structure (worksheet name and field name/order).


selen ugur —


When I did what you said there is some problem occurs like

from the task it shows like load all the files from the s3 file but when I select the table ıt just loaded the first one I choose in the excel component.

what can I do please help.


Matillion Agent  

Kalyan Arangam —

Hi Selen,

By default, the excel query component will recreate the target table. So you may only see the data from last file that was loaded.
Please try this additional small change and see if it creates one table per file.

Set the tablename property as follows – ${current_file}_xls

Next time you run the job, you should have multiple tables, one per file.

Does that help?


Post Your Community Answer

To add an answer please login