Launching in a VPC

Overview


Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the Amazon Web Services (AWS) Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways.

We recommend using Matillion in a VPC for production environments. However, setups vary, so this document demonstrates how to set up a minimal VPC as an example and discusses some of the options.


Internet Access in a VPC


By default, an EC2 instance launched into a VPC does not have internet access unless an Elastic IP is associated with that instance. It is possible, but not recommended, to run Matillion ETL without internet access; however, there are some limitations. If you do not allow internet access, the following features and components, which rely on the AWS API, will not work:
  • Cluster discovery when setting up the environment - Redshift environment details will need to be entered manually.
  • SQS Discovery - Uses the API to list existing queues and listen for messages.
  • SNS Message Component - Uses the API to create an SNS endpoint and send messages.
  • SQS Message Component - Uses the API to send a message to an SQS queue.
  • RDS Query Component - Uses the API to upload the data to Amazon S3.
 

Setting up the VPC


These instructions set up a VPC using the AWS Command Line Interface (CLI). Installation and configuration of the AWS CLI are assumed.
See below for instructions on launching a Redshift cluster in this same manner.

The following commands can be used to set up the VPC.

First, we create a new VPC with the 10.0.*.* private address block (10.0.0.0/16). Note the VPC ID when it is created.
aws ec2 create-vpc --cidr-block 10.0.0.0/16 
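
Rather than copying the ID out of the JSON response by hand, the ID can be captured in a shell variable. The following is a minimal sketch, assuming a POSIX-compatible shell; --query and --output are standard AWS CLI global options, while the function name create_vpc_id is only illustrative and is not part of the AWS CLI or Matillion ETL.

```shell
# Create a VPC and print only its ID.
# create_vpc_id is a hypothetical helper name chosen for this example.
create_vpc_id() {
  # $1: CIDR block for the new VPC (defaults to the block used in this guide)
  aws ec2 create-vpc \
    --cidr-block "${1:-10.0.0.0/16}" \
    --query 'Vpc.VpcId' \
    --output text
}

# Usage (this makes a real API call and creates a VPC):
#   VPC_ID=$(create_vpc_id 10.0.0.0/16)
#   echo "Created $VPC_ID"
```

The same pattern should work for the later steps by changing the query path, for example 'Subnet.SubnetId' for create-subnet and 'InternetGateway.InternetGatewayId' for create-internet-gateway.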

We add a matching subnet. Note the subnet ID when it is created.
aws ec2 create-subnet --vpc-id <VPC ID> --cidr-block 10.0.0.0/16
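
To make the size of these address blocks concrete, the number of addresses in a CIDR block follows from its prefix length as 2^(32 - prefix). The sketch below is a small illustrative helper (cidr_size is a hypothetical name, not an AWS CLI command). Note that AWS reserves five addresses in every subnet, so slightly fewer are available to instances.

```shell
# Count the IPv4 addresses in a CIDR block from its prefix length.
# cidr_size is a hypothetical helper, not part of the AWS CLI.
cidr_size() {
  prefix="${1#*/}"                  # text after the slash, e.g. 16
  echo $(( 1 << (32 - prefix) ))    # 2^(32 - prefix)
}

cidr_size 10.0.0.0/16   # prints 65536
cidr_size 10.0.1.0/24   # prints 256
```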

Next, we need an internet gateway to allow us to connect out to the internet. Note the gateway ID.
aws ec2 create-internet-gateway

Now we attach the internet gateway to the VPC.
aws ec2 attach-internet-gateway --internet-gateway-id <gateway ID> --vpc-id <VPC ID>

Add a default route to the route table so that the VPC can route traffic to the internet. First, we need to find the route table for the new VPC using the following command. Note the route table ID.
aws ec2 describe-route-tables 
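
In an account with several VPCs, describe-route-tables returns every route table, so it can help to narrow the result. The following is a sketch, assuming the route table wanted is the main route table that AWS creates with the VPC; vpc-id and association.main are documented filters for this command, and main_route_table_id is an illustrative function name.

```shell
# Print the ID of the main route table for a given VPC.
# main_route_table_id is a hypothetical helper name.
main_route_table_id() {
  # $1: the VPC ID noted when the VPC was created
  aws ec2 describe-route-tables \
    --filters "Name=vpc-id,Values=$1" "Name=association.main,Values=true" \
    --query 'RouteTables[0].RouteTableId' \
    --output text
}

# Usage (replace the placeholder with your own VPC ID):
#   ROUTE_TABLE_ID=$(main_route_table_id "$VPC_ID")
```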

Create the route in the route table
aws ec2 create-route --route-table-id <route table ID> --destination-cidr-block 0.0.0.0/0 --gateway-id <gateway ID>

Find the security group that was created with the VPC.
aws ec2 describe-security-groups --filters Name=vpc-id,Values=<VPC ID>
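
If the VPC already contains more than one security group, the output of the command above lists all of them. The sketch below narrows the result to the group named default, which is the one created with the VPC; group-name is a documented filter, and default_sg_id is an illustrative function name.

```shell
# Print the ID of the default security group of a given VPC.
# default_sg_id is a hypothetical helper name.
default_sg_id() {
  # $1: the VPC ID noted when the VPC was created
  aws ec2 describe-security-groups \
    --filters "Name=vpc-id,Values=$1" "Name=group-name,Values=default" \
    --query 'SecurityGroups[0].GroupId' \
    --output text
}

# Usage:
#   SG_ID=$(default_sg_id "$VPC_ID")
```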

Then we add a rule so Matillion ETL can connect to the cluster on port 5439, the default Redshift port.
aws ec2 authorize-security-group-ingress --group-id <Security Group Id> --protocol tcp --port 5439 --cidr 10.0.0.0/16

Create Redshift cluster


Next, we create a Redshift cluster called My-Redshift-Cluster in the VPC. Before we do that, however, we need to create a cluster subnet group for the cluster to live in.

aws redshift create-cluster-subnet-group --cluster-subnet-group-name mysubnetgroup --description "My subnet group" --subnet-ids <subnet ID>

Now launch the cluster. This is a single-node cluster. For more nodes, remove --cluster-type single-node and add --number-of-nodes x, where x is the number of nodes.
aws redshift create-cluster --node-type dc1.large --master-username admin --master-user-password Password1 --cluster-type single-node --cluster-identifier My-Redshift-Cluster --db-name redshift --cluster-subnet-group-name mysubnetgroup
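
Cluster creation takes several minutes, and the endpoint is only known once the cluster is available. The following sketch waits for the cluster and then prints its endpoint address, which is the value to enter manually in the Matillion ETL environment when cluster discovery is unavailable. It relies on the AWS CLI wait cluster-available and describe-clusters commands; cluster_endpoint is an illustrative function name.

```shell
# Block until the cluster is available, then print its endpoint address.
# cluster_endpoint is a hypothetical helper name.
cluster_endpoint() {
  # $1: the cluster identifier passed to create-cluster
  aws redshift wait cluster-available --cluster-identifier "$1" || return 1
  aws redshift describe-clusters \
    --cluster-identifier "$1" \
    --query 'Clusters[0].Endpoint.Address' \
    --output text
}

# Usage (polls the API until the cluster is ready):
#   cluster_endpoint <cluster identifier>
```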

Now update the default security group for the cluster so the Redshift cluster can talk locally.
aws redshift authorize-cluster-security-group-ingress --cluster-security-group-name default --cidrip 10.0.0.0/16

You can now launch Matillion ETL into the VPC.