Name is required.
Email address is required.
Invalid email address
Answer is required.
Exceeding max length of 5KB

S3 Load Component in File Iterator doesn't work

I want to use an S3 load component connected to a file iterator so that I can grab the file name via a shared environment variable as instructed in this article:

https://snowflakesupport.matillion.com/customer/en/portal/articles/2877610-adding-filename-as-a-column-to-new-table

I've got all the jobs set up and the validations are all fine, but when I run the job, no rows is loaded at all.

Can someone please help?

Thanks.

3 Community Answers

Matillion Agent  

David Lipowitz —

Hi Tam,

Thanks for the question. There’s two things to try here first.

  1. Set the Force Load property to True on the S3 Load component
  2. Add the full path (though not including the bucket name) to the Pattern property on the S3 Load component

So for 2 above, try setting the the S3 Object Prefix to “s3”//your-bucket-name/" and then the Pattern to “full/path/to/your/filename.txt”.

Can you please give those a try and let us know if you still need more support?

Best Regards,
Dave


Tam Bui —

Thank you Dave!! That definitely worked!

That's weird that it made a difference ... because essentially the resulting connection string will still be s3//<bucket>/<path>/<file> but specifying it using "s3//<bucket>/<path>/" and "<file>" strings doesn't work whereas "s3//<bucket>/" and "<path>/<file>" does. That is strange.

I have a follow on question though, still following the instructions in that article, subsequent to that "S3 Load" component, I execute a Data Transformation Job, which includes 1) reading data from the table that was just loaded 2) use a "Calculator Component" to put the value of the extracted file name "variable" in as a new column and 3) write the resulting data back to the table.

There is a setting in the "Table Output" component called "Truncate", which according to the article should be set to "Append". In my test, this has caused the dataset to be duplicated. But if I set it to "Truncate", no data will be left in the table at all after processing.

And also, the resulting data still have "null" all the way down the Filename column, so I'm wondering if the way the article is accessing the environment variable in the "Calculator Component" is correct (the instruction was to use '${filename}', should those single quotes be there?).

Thanks.


Matillion Agent  

Laura Malins —

Hi Tam

The iterator works by coping each file into the table for the table output, which is why “Append” is required. Otherwise you’d only have the data for the last iteration. You can truncate the table before the iterator to avoid duplicate rows.

The quotes are required in the calculator. This is to make the value of the filename a string in Snowflake. I wonder if the set up of the variable is incorrect. You could do this as a shared environment variable or you could set up a job variable in your transformation and both of your orchestration jobs called filename and then in the “Run Transformation” and “Run Orchestration” components set the variable filename to be ${filename}.

Thanks
Laura

Post Your Community Answer

To add an answer please login