Whenever Matillion ETL performs an orchestration or transformation operation, a Task is created. This includes:
- Tasks created by user operations - This includes almost all operations that the user performs that generate database queries such as running a job, retrieving a sample or a row count
- Tasks created by the Scheduler
- For Redshift users, tasks created from Amazon SQS (Simple Queue Service).
As an overall Orchestration or Transformation task runs (for example, executing an Orchestration job), it is broken down into more granular tasks as execution continues. For Orchestration jobs, the individual tasks are created “on-the-fly” as decision points such as Ifs, Loops, Ands and Ors are reached in the Orchestration.
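The on-the-fly behaviour described above can be illustrated with a minimal Python sketch. It is a toy model only, not Matillion ETL's implementation: the step dictionaries, the `run_orchestration` function, and the job names are all hypothetical. The point it shows is that tasks after a decision point are not known until that decision has been evaluated.

```python
def run_orchestration(steps, context):
    """Yield task names as they are created (hypothetical toy model).

    Tasks that sit behind an If or a Loop only come into existence
    once that decision point has been evaluated.
    """
    for step in steps:
        kind = step["kind"]
        if kind == "component":
            yield step["name"]
        elif kind == "if":
            # The branch to follow is only known at this point.
            branch = step["then"] if step["condition"](context) else step["else"]
            yield from run_orchestration(branch, context)
        elif kind == "loop":
            # One set of body tasks is created per iteration.
            for value in step["values"]:
                loop_context = {**context, step["variable"]: value}
                yield from run_orchestration(step["body"], loop_context)


steps = [
    {"kind": "component", "name": "create_table"},
    {
        "kind": "if",
        "condition": lambda ctx: ctx["row_count"] > 0,
        "then": [{"kind": "component", "name": "load_rows"}],
        "else": [{"kind": "component", "name": "skip_load"}],
    },
    {
        "kind": "loop",
        "variable": "region",
        "values": ["eu", "us"],
        "body": [{"kind": "component", "name": "export_region"}],
    },
]

created = list(run_orchestration(steps, {"row_count": 5}))
print(created)
```

Because `run_orchestration` is a generator, the list of tasks grows as execution proceeds, which mirrors why not every future task is visible at a given moment.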
Task information can be gathered and scrutinised in several ways. Task information is immediately available in the Task panel at the bottom-right of the client interface, giving concise information on recently-run tasks and how they have progressed. These tasks can be viewed in greater detail using the Task Info screen. Meanwhile, a fuller report on all tasks logged by the instance can be found in Task History.
Internally, Matillion ETL uses queues to manage tasks. The most important thing to understand about queues is that there is one per environment. This means that when multiple tasks are run in the same environment, tasks for all users execute in sequence (not in parallel). If there is a long-running task, subsequent tasks are queued behind it. Queued tasks show in the Task panel with the waiting icon.
This behaviour exists because it is usually most efficient, for both loads and transformations, to let the parallel database engine manage the concurrency. If this behaviour is undesirable, it can be worked around by using multiple environments; however, this is not recommended.
In addition, tasks initiated by the Scheduler and the SQS Queue listener will also queue behind long-running tasks in the same environment.
For these reasons, it is strongly recommended that you set up multiple environments to separate your development and production work, so that the task queue for your production environment cannot be affected by your development work. These separate environments can connect to the same cluster and database; however, it is strongly recommended that they point to different default Schemas. For more information on environments, please refer to the Managing Environments documentation.
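The effect of having one queue per environment can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not Matillion ETL's internal code: the `EnvironmentQueues` class, its methods, and the environment and task names are all hypothetical.

```python
from collections import defaultdict, deque


class EnvironmentQueues:
    """Toy model of one FIFO task queue per environment (hypothetical)."""

    def __init__(self):
        # Each environment name maps to its own independent queue.
        self._queues = defaultdict(deque)

    def submit(self, environment, task_name):
        """Add a task to the back of its environment's queue."""
        self._queues[environment].append(task_name)

    def waiting_behind(self, environment, task_name):
        """Return the tasks that must finish before task_name can start."""
        queue = list(self._queues[environment])
        return queue[: queue.index(task_name)]


queues = EnvironmentQueues()
queues.submit("development", "long_running_experiment")
queues.submit("development", "row_count_sample")
queues.submit("production", "scheduled_nightly_load")

dev_wait = queues.waiting_behind("development", "row_count_sample")
prod_wait = queues.waiting_behind("production", "scheduled_nightly_load")
print(dev_wait)
print(prod_wait)
```

In this model the sample request in the development environment waits behind the long-running task, while the scheduled load in the production environment has nothing ahead of it.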
The Task panel shows the last 20 tasks run, running, queued, cancelled or failed since joining the current session.
All tasks can be expanded to show sub tasks.
- Failed tasks are displayed with a red icon. Once expanded, the failing sub-task is shown in red along with the component that failed and the error message.
- Successful tasks show a green tick.
- Running tasks have an animated icon. Expanding the task shows a list of sub-tasks, with the colours indicating how far the task has progressed.
- Task Info (see below section) can be accessed by clicking the icon beside each job
- An hourglass icon means the task is queued and will be executed when a free thread is available.
- Note: as orchestration tasks are calculated on the fly, not all future tasks may be listed at a given point in time; however, they are automatically appended to the list as tasks complete.
- Any output from a Python Script component is written to the Task panel.
- Some Orchestration components write status information to the task panel such as the RDS Query Component.
Running tasks can be cancelled by right-clicking on them and selecting Cancel. Currently this does not apply to loads using JDBC, RDS or Python scripts. If running Matillion ETL for Redshift, tasks are cancelled using a PG_CANCEL_BACKEND call (see here).
When a task is cancelled, all queued sub-tasks are also cancelled, including any remaining loop iterations.
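The cascading cancellation described above can be sketched as a small recursive function. This is a hypothetical model rather than Matillion ETL's implementation: the `Task` class, its state names, and the assumption that a running sub-task is also marked as cancelled are illustrative choices, not documented behaviour.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical task record with nested sub-tasks."""
    name: str
    state: str = "queued"  # queued, running, success, failed or cancelled
    subtasks: List["Task"] = field(default_factory=list)


def cancel(task):
    """Cancel a task and, recursively, its unfinished sub-tasks.

    Sub-tasks that already finished keep their state. Treating a running
    sub-task as cancelled is an assumption made for this sketch.
    """
    if task.state in ("queued", "running"):
        task.state = "cancelled"
    for subtask in task.subtasks:
        cancel(subtask)


loop = Task("loop_over_regions", "running", [
    Task("iteration_1", "success"),
    Task("iteration_2", "running"),
    Task("iteration_3", "queued"),
])
cancel(loop)
states = [subtask.state for subtask in loop.subtasks]
print(states)
```

The completed first iteration keeps its result, while the iterations that had not finished are cancelled along with the parent task.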
Completed tasks, whether successful or failed, are shown in the task history along with all their detail. This can be accessed via Project → Task History.
The Task History appears as a new tab in the project UI. The Task History displays each task that has been run or scheduled in this project along with details of each task. Clicking a particular task will expand its row, giving further details of the task including any error messages that may have prevented the task completing.
Hovering the mouse over a column name presents two additional features: 'Columns' and 'Filters'. The Columns option allows users to add or remove columns from the Task History. The Filters option allows the user to filter the tasks according to criteria that depend on the type of the selected column. Columns that list dates can be filtered according to when the task occurred, while pure text columns such as 'Version' can be matched against a given string. The 'Task Type' column can be filtered by any existing task type in Matillion ETL.
The final column has no name but shows whether a task completed (green tick) or failed (red cross), and can be filtered on these criteria. This is particularly useful when the user has queued many jobs, returns later to find that some have failed, and wants more information about those tasks.
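The kinds of filter described above (by date, by text match, and by success or failure) can be expressed as a short Python sketch over in-memory records. The record fields, the sample data, and the `filter_tasks` function are hypothetical and exist only to illustrate the filtering logic; they do not represent how Matillion ETL stores or exposes its Task History.

```python
from datetime import datetime

# Hypothetical task records; field names and values are made up.
task_history = [
    {"job": "load_orders", "version": "default",
     "started": datetime(2024, 1, 5, 2, 0), "succeeded": True},
    {"job": "load_customers", "version": "default",
     "started": datetime(2024, 1, 5, 2, 30), "succeeded": False},
    {"job": "rebuild_marts", "version": "release-2",
     "started": datetime(2024, 1, 6, 3, 0), "succeeded": False},
]


def filter_tasks(tasks, succeeded=None, started_after=None, version_contains=None):
    """Return the tasks matching every filter that is not None."""
    matches = []
    for task in tasks:
        if succeeded is not None and task["succeeded"] != succeeded:
            continue
        if started_after is not None and task["started"] <= started_after:
            continue
        if version_contains is not None and version_contains not in task["version"]:
            continue
        matches.append(task)
    return matches


failed_jobs = [t["job"] for t in filter_tasks(task_history, succeeded=False)]
recent_failed = [
    t["job"]
    for t in filter_tasks(task_history, succeeded=False,
                          started_after=datetime(2024, 1, 6))
]
print(failed_jobs)
print(recent_failed)
```

Combining the failure filter with a date filter narrows a long history down to the failed tasks from a particular run window.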
Importing Task Information Via API.
By clicking the icon beside jobs in the Task panel, their task information can be brought up for scrutiny in a new tab. This is particularly useful for surveying complex jobs with many components.
In this new tab, the job can be expanded to show its constituent components and a summary of how they performed in the task. The success or failure of each job and component is indicated by a green tick or a red cross, respectively.
A recorded run time is also given for each component and job. Failed components have an error message, which can be expanded using the ellipsis button beside the Message field.