AWS Auto-healing VPC NAT Instance

If the NAT instance is terminated or stopped, its route shows a status of “Black Hole” in the route table. All instances in private subnets associated with that route table will no longer be able to connect to the Internet until the route is updated to point to another NAT instance.


To make the NAT instance resilient, we can use an Auto Scaling Group so that it heals itself. With MinSize and MaxSize both set to 1, the Auto Scaling Group ensures there is always one NAT instance running. The NAT instance needs to run a shell script at startup to repair the route table automatically.

The script updates route tables in the same VPC that are tagged with network=private. Use --tag-key and --tag-value to override the tag key and tag value respectively if you want to use a different tag. I recommend setting up a cron job on the NAT instance to run the shell script periodically so that any newly tagged route tables get the default route set automatically. The latest version of the shell script can be found in
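To illustrate the repair step, here is a minimal Python sketch. It is not the shell script itself. It assumes a boto3-style EC2 client is passed in, and the function name `repair_default_routes` and its parameters are hypothetical. It also assumes a single NAT instance per VPC, as in the MinSize/MaxSize = 1 setup above, so it takes over any default route that points elsewhere.

```python
DEFAULT_ROUTE = "0.0.0.0/0"


def repair_default_routes(ec2, vpc_id, instance_id,
                          tag_key="network", tag_value="private"):
    """Point the default route of every tagged route table at instance_id.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the IDs of the route tables that were changed.
    """
    response = ec2.describe_route_tables(Filters=[
        {"Name": "vpc-id", "Values": [vpc_id]},
        {"Name": "tag:" + tag_key, "Values": [tag_value]},
    ])
    changed = []
    for table in response["RouteTables"]:
        table_id = table["RouteTableId"]
        default = next((r for r in table.get("Routes", [])
                        if r.get("DestinationCidrBlock") == DEFAULT_ROUTE), None)
        if default is None:
            # No default route yet: create one through this NAT instance.
            ec2.create_route(RouteTableId=table_id,
                             DestinationCidrBlock=DEFAULT_ROUTE,
                             InstanceId=instance_id)
        elif (default.get("State") == "blackhole"
              or default.get("InstanceId") != instance_id):
            # Black hole, or pointing at another target: take the route over.
            ec2.replace_route(RouteTableId=table_id,
                              DestinationCidrBlock=DEFAULT_ROUTE,
                              InstanceId=instance_id)
        else:
            continue  # already active and pointing at this instance
        changed.append(table_id)
    return changed
```

Because a healthy route is left untouched, running this from cron every few minutes should be harmless. It does not handle paginated results.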

The NAT instance will need the following permissions to update route tables:

CloudFormation template can be found in It creates the following resources and a cron job to execute the shell script to adjust route tables if needed every 5 minutes:

  • Auto Scaling Group
  • Launch Configuration
  • Security Group
  • IAM Instance Policy
  • IAM Role

NOTE: You may need to replace the NAT AMI IDs with the latest ones. On the Choose an Amazon Machine Image (AMI) page, select the Community AMIs category, and search for amzn-ami-vpc-nat to find the most recent AMIs.
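If you would rather look up the most recent AMI programmatically than through the console, the selection step might look like the sketch below. The function name `latest_image` is hypothetical, and the input is assumed to be the "Images" list returned by a boto3 EC2 client's describe_images call.

```python
def latest_image(images, name_prefix="amzn-ami-vpc-nat"):
    """Return the newest image whose Name starts with name_prefix, or None.

    `images` is assumed to be a list of dicts with Name, ImageId and
    CreationDate keys, as in a describe_images response.
    """
    candidates = [img for img in images
                  if img.get("Name", "").startswith(name_prefix)]
    # CreationDate is an ISO 8601 UTC string, so string order is time order.
    return max(candidates, key=lambda img: img["CreationDate"], default=None)
```

With boto3 the list could come from `ec2.describe_images(Owners=["amazon"], Filters=[{"Name": "name", "Values": ["amzn-ami-vpc-nat*"]}])["Images"]`, and the AMI ID is then `latest_image(images)["ImageId"]`.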



Prepare Mac OSX to Test Chef recipes for AWS OpsWorks

This post supplements Cookbooks 101 in the AWS OpsWorks documentation. It shows how to prepare a Mac running OS X to test Chef recipes for OpsWorks using the following tools:

Installing Vagrant, VirtualBox and Chef Development Kit

Download the latest Vagrant, VirtualBox and Chef Development Kit from the above links. Each tool is installed by double-clicking its downloaded dmg package; an installation window similar to the screenshots below will appear. Follow the on-screen instructions to complete the installation.


Installing Vagrant Plugins

From Terminal, execute the following commands to install the plugins that integrate with Berkshelf and Omnibus respectively:

sudo vagrant plugin install vagrant-berkshelf

sudo vagrant plugin install vagrant-omnibus

Vagrant Berkshelf automatically downloads and installs cookbooks into Vagrant Virtual Machines.

Vagrant Omnibus automatically installs the desired version of Chef via the platform-specific Omnibus packages into Vagrant Virtual Machines.

You may be prompted to install Command Line Developer Tools if you don’t have them. If so, install the tools and re-run the vagrant plugin install commands to install the vagrant-berkshelf and vagrant-omnibus plugins.


Adjusting DHCP Option for VirtualBox

VirtualBox comes with a default DHCP server that is set to the following:

$ VBoxManage list dhcpservers
NetworkName:    HostInterfaceNetworking-vboxnet0
Enabled:        Yes

However, vboxnet0 is set to a different IP address space when it is created:

$ VBoxManage list hostonlyifs
Name:            vboxnet0
GUID:            786f6276-656e-4074-8000-0a0027000000
DHCP:            Disabled
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:00
MediumType:      Ethernet
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet0

In order to reconcile the conflict, execute the following command from Terminal to remove the default DHCP server:

VBoxManage dhcpserver remove --netname HostInterfaceNetworking-vboxnet0
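If you want to check from a script whether the default DHCP server is still present before removing it, one option is to parse the `VBoxManage list dhcpservers` output. The sketch below assumes the "Key: value" layout shown above, with blank lines between records; both function names are hypothetical.

```python
def parse_vbox_list(output):
    """Split `VBoxManage list ...` output into one dict per record."""
    records, current = [], {}
    for line in output.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values such as MAC addresses survive.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def has_dhcp_server(output, netname="HostInterfaceNetworking-vboxnet0"):
    """True if an enabled DHCP server is listed for the given network name."""
    return any(r.get("NetworkName") == netname and r.get("Enabled") == "Yes"
               for r in parse_vbox_list(output))
```

The text to parse could be captured with `subprocess.run(["VBoxManage", "list", "dhcpservers"], capture_output=True, text=True).stdout`.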

Adjusting VCS files for Berkshelf

If you plan to use git, you may encounter a permission denied error while running vagrant provision:

 stderr: /opt/chefdk/embedded/lib/ruby/2.1.0/fileutils.rb:1402:in `initialize': Permission denied @ rb_sysopen - /home/schen/.berkshelf/vagrant-berkshelf/shelves/berkshelf20150223-20325-2b14d0l-default/test_vm/.git/objects/01/6f2ce56a1eff7451b477a2b52a16ed288117da (Errno::EACCES)
    from /opt/chefdk/embedded/lib/ruby/2.1.0/fileutils.rb:1402:in `open'

If this happens, add ‘**/.git’ to EXCLUDED_VCS_FILES_WHEN_VENDORING in /opt/chefdk/embedded/apps/berkshelf/lib/berkshelf/berksfile.rb to instruct Berkshelf to ignore git’s metadata, which is stored in the .git folder.

Installing AWS Command Line Interface

If you are planning to interact with Amazon Web Services, you will need to install the AWS Command Line Interface (AWS CLI). It can be installed using pip. If your Mac does not have pip, download from and execute the following command from Terminal to install pip:


Once pip is installed, execute the following command from Terminal to install AWS Command Line Interface:

pip install awscli

Configuring AWS Command Line Interface

Execute the following command from Terminal. It will prompt you to enter your AWS profile information:

aws configure

If you have multiple profiles, edit ~/.aws/credentials directly. Refer to AWS Official Documentation for more details.
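For reference, ~/.aws/credentials is an INI-style file with one section per profile. The sketch below renders that layout with Python's configparser; the function name `render_credentials`, the profile names, and the placeholder keys are all hypothetical.

```python
import configparser
import io


def render_credentials(profiles):
    """Render {profile: (access_key_id, secret_access_key)} as credentials-file text."""
    # Interpolation is disabled so secrets are written exactly as given.
    parser = configparser.ConfigParser(interpolation=None)
    for name, (key_id, secret) in profiles.items():
        parser[name] = {
            "aws_access_key_id": key_id,
            "aws_secret_access_key": secret,
        }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Placeholder values only; never commit real keys.
print(render_credentials({
    "default": ("AKIAEXAMPLEDEFAULT", "example-secret-1"),
    "work": ("AKIAEXAMPLEWORK", "example-secret-2"),
}))
```

A non-default profile is then selected per command with the --profile option, for example `aws s3 ls --profile work`.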


Report Number of Objects and Bucket Size for S3

Use the AWS Command Line Interface (AWS CLI) to report the number of objects and the bucket size for S3 on Amazon Linux.

It will print out three columns separated by tabs. The first column is the bucket name. The second column is the number of objects in the bucket. The third column is the total size of the bucket in GB. Below is the sample output.
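To make the three-column format concrete, here is a Python sketch of the same calculation. It uses a boto3-style listing rather than the AWS CLI, so it is a different technique from the script described here. The function names are hypothetical, and it assumes 1 GB = 1024^3 bytes.

```python
def summarize_bucket(pages):
    """Return (object_count, total_bytes) for an iterable of listing pages.

    Each page is assumed to look like a boto3 list_objects_v2 response:
    a dict whose optional "Contents" list holds dicts with a "Size" in bytes.
    """
    count = total = 0
    for page in pages:
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total


def report_line(bucket, count, total_bytes):
    """Format one tab-separated row: bucket name, object count, size in GB."""
    size_gb = total_bytes / 1024 ** 3  # assumption: 1 GB = 1024^3 bytes
    return "{}\t{}\t{:.2f}".format(bucket, count, size_gb)
```

With boto3, the pages for one bucket could come from `s3.get_paginator("list_objects_v2").paginate(Bucket=name)`. An empty bucket yields pages without a "Contents" key, which the sketch treats as zero objects.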



IAM Policy for Self-managing Credentials and MFA Device

This is an IAM policy that allows users to manage their own MFA device and credentials, including access keys, on the AWS console. The CloudFormation script below creates the policy and assigns it to an existing group.

[gist /]

Refer to for reference.
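To give a rough idea of what such a policy document looks like, here is a sketch built as a Python dict. It is not the policy from the CloudFormation script. The statement IDs and the exact action list are my assumptions; the `${aws:username}` policy variable is what restricts each user to their own resources.

```python
import json


def self_service_policy():
    """Sketch of a policy letting users manage only their own credentials and MFA."""
    own_user = "arn:aws:iam::*:user/${aws:username}"
    own_mfa = "arn:aws:iam::*:mfa/${aws:username}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageOwnCredentials",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "iam:ChangePassword",
                    "iam:CreateAccessKey",
                    "iam:DeleteAccessKey",
                    "iam:ListAccessKeys",
                    "iam:UpdateAccessKey",
                ],
                "Resource": own_user,
            },
            {
                "Sid": "ManageOwnMfaDevice",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "iam:CreateVirtualMFADevice",
                    "iam:DeleteVirtualMFADevice",
                    "iam:EnableMFADevice",
                    "iam:DeactivateMFADevice",
                    "iam:ResyncMFADevice",
                    "iam:ListMFADevices",
                ],
                "Resource": [own_user, own_mfa],
            },
        ],
    }


print(json.dumps(self_service_policy(), indent=2))
```

The console usually also needs some read-only listing permissions to display the relevant pages, which this sketch leaves out.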


Setting up EC2 Operator Instance with CloudFormation

This post extends my post about Auto Start and Stop Your EC2 Instance. I put together a CloudFormation template to automate setting up the EC2 Operator instance. You can find the CloudFormation template in my GitHub repository.

I am going to go through each section of the CloudFormation template. It consists of four sections:

  1. Parameters
  2. Mappings
  3. Resources
  4. Outputs


The Parameters section defines input variables that will be used for creating resources.

  • InstanceType – Instance type for the EC2 Operator Instance
  • KeyName – Key pair used to SSH into the EC2 Operator Instance
  • SSHLocation – The IP address range that can be used to SSH to the EC2 instance
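As a sketch of how these three inputs might be declared, here is a Parameters section built as a Python dict and printed as JSON. The descriptions and default values are my assumptions, not copied from the template.

```python
import json

parameters = {
    "InstanceType": {
        "Description": "Instance type for the EC2 Operator Instance",
        "Type": "String",
        "Default": "t2.micro",  # assumed default, not taken from the template
    },
    "KeyName": {
        "Description": "Key pair used to SSH into the EC2 Operator Instance",
        "Type": "AWS::EC2::KeyPair::KeyName",
    },
    "SSHLocation": {
        "Description": "IP address range allowed to SSH to the instance",
        "Type": "String",
        "Default": "0.0.0.0/0",  # assumed default; narrow this in practice
    },
}

print(json.dumps({"Parameters": parameters}, indent=2))
```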


The Mappings section defines which AMI should be used when launching the EC2 Operator Instance. It is essentially a mapping table between regions and AMIs, because each region uses a different AMI. They are all standard Amazon Linux AMIs. If these AMIs are no longer available, please replace them with the latest Amazon Linux AMIs.


The Resources section consists of the resources that will be created:

  1. SecurityGroup – A security group for the EC2 Operator Instance
  2. OperatorInstance – The EC2 Operator Instance
  3. WaitHandle – Wait handle to pair with Wait Condition
  4. WaitCondition – Set how long to wait for the EC2 Operator Instance to set up
  5. OperatorInstanceProfile – Instance Profile for the EC2 Operator Instance
  6. OperatorRole – IAM Role to specify the actions the EC2 Operator Instance can perform

Security Group

The SecurityGroup resource defines the firewall rules for the EC2 Operator Instance:

  • TCP 22 – SSH access

If you are not planning to SSH to the instance, you may want to remove this rule. The source of this rule is taken from the SSHLocation variable you specify in the Parameters section.
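A security group resource with this single rule might look roughly like the sketch below, again written as a Python dict that serializes to the template's JSON. The description string is an assumption; the Ref to SSHLocation is what ties the rule's source to the parameter.

```python
import json

security_group = {
    "Type": "AWS::EC2::SecurityGroup",
    "Properties": {
        "GroupDescription": "Enable SSH access to the EC2 Operator Instance",
        "SecurityGroupIngress": [{
            "IpProtocol": "tcp",
            "FromPort": "22",
            "ToPort": "22",
            # The source range comes from the SSHLocation parameter.
            "CidrIp": {"Ref": "SSHLocation"},
        }],
    },
}

print(json.dumps({"SecurityGroup": security_group}, indent=2))
```

Dropping the SSH rule then amounts to removing the one entry in SecurityGroupIngress.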

Operator Instance

The instance installs the python-pip and gcc packages, which are specified in the Metadata section. The UserData section performs the following activities:

  • Interpret metadata
  • Install the croniter Python library
  • Download the Python script template
  • Add a crontab entry to run the Python script
  • Signal the WaitCondition that the EC2 Operator Instance is ready

Wait Handle and Wait Condition

WaitHandle and WaitCondition are always used together. CloudFormation is set to wait 300 seconds for the EC2 Operator Instance to signal ready.
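The pairing can be sketched as follows, as a Python dict mirroring the template JSON. The resource names follow the list above; the DependsOn entry is my assumption about how the wait is ordered after the instance.

```python
import json

wait_resources = {
    "WaitHandle": {"Type": "AWS::CloudFormation::WaitConditionHandle"},
    "WaitCondition": {
        "Type": "AWS::CloudFormation::WaitCondition",
        "DependsOn": "OperatorInstance",  # assumed ordering
        "Properties": {
            # The handle resolves to the URL the instance signals when ready.
            "Handle": {"Ref": "WaitHandle"},
            "Timeout": "300",  # seconds
        },
    },
}

print(json.dumps(wait_resources, indent=2))
```

If no signal arrives within the timeout, CloudFormation treats the wait condition as failed and the stack creation rolls back.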

Operator Instance Profile and Operator Role

Instance Profile and IAM Role are always used together. The EC2 Operator Instance needs the following permissions to manipulate instances:

  • ec2:DescribeInstances
  • ec2:StartInstances
  • ec2:StopInstances
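Put together, the role and instance profile could look like this sketch. The policy name and the "*" resource are assumptions; the three actions are the ones listed above.

```python
import json

operator_iam = {
    "OperatorRole": {
        "Type": "AWS::IAM::Role",
        "Properties": {
            # Trust policy: lets EC2 assume the role on the instance's behalf.
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    "Principal": {"Service": ["ec2.amazonaws.com"]},
                    "Action": ["sts:AssumeRole"],
                }],
            },
            "Path": "/",
            "Policies": [{
                "PolicyName": "operator",  # hypothetical policy name
                "PolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Action": [
                            "ec2:DescribeInstances",
                            "ec2:StartInstances",
                            "ec2:StopInstances",
                        ],
                        "Resource": "*",  # assumed; not scoped to instances
                    }],
                },
            }],
        },
    },
    "OperatorInstanceProfile": {
        "Type": "AWS::IAM::InstanceProfile",
        "Properties": {"Path": "/", "Roles": [{"Ref": "OperatorRole"}]},
    },
}

print(json.dumps(operator_iam, indent=2))
```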


The Outputs section outputs the instance ID of the newly created EC2 Operator Instance.







Connecting Multiple Windows Azure Virtual Networks with AWS

This post is inspired by the Connecting Windows Azure to Amazon post that Michael Washam wrote. In his post, he showed how to connect a Windows Azure Virtual Network (VNET) to a Virtual Private Cloud (VPC) hosted in Amazon Web Services (AWS) with a site-to-site VPN and OpenSwan.  I will extend his post to show you how to connect multiple Windows Azure VNETs together. I will use the following architecture diagram as an example for illustration.


The address space for the VPC in AWS is, and there are four VNETs in Windows Azure with the following address spaces.

  • Windows Azure VNET 16:
  • Windows Azure VNET 20:
  • Windows Azure VNET 24:
  • Windows Azure VNET 28:

The VNETs do not have to be in the same Windows Azure account, and they don’t have to be in the same region. The address space has to be unique among the VNETs and the VPC. You need to create one Local Network per VNET. You can create a Local Network when you create a VNET, but I recommend creating the Local Networks in advance. Then you can just pick a Local Network from the drop-down list when you create a VNET.


There are four VNETs, so four Local Networks are required. The Address Space of each Local Network is all the other address spaces, that is, every address space except that of the VNET itself. I named the Local Networks vnet16-local, vnet20-local, vnet24-local, and vnet28-local for VNET 16, VNET 20, VNET 24, and VNET 28 respectively.

  • vnet16-local:,,,
  • vnet20-local:,,,
  • vnet24-local:,,,
  • vnet28-local:,,,
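The "everything except my own address space" rule is easy to compute. The Python sketch below also checks that no two address spaces overlap; the function name is hypothetical, and the CIDR blocks in the usage note are made-up examples, not the ones from this setup.

```python
import ipaddress


def local_network_spaces(networks):
    """Map each network name to the address spaces of every other network.

    `networks` maps a name to its CIDR; include the AWS VPC as one entry.
    Raises ValueError if any two address spaces overlap.
    """
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    names = list(parsed)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if parsed[first].overlaps(parsed[second]):
                raise ValueError("{} overlaps {}".format(first, second))
    # Each network's Local Network is every address space except its own.
    return {name: [cidr for other, cidr in networks.items() if other != name]
            for name in networks}
```

For example, with the hypothetical input `{"vpc": "10.0.0.0/16", "vnet16": "10.16.0.0/16", "vnet20": "10.20.0.0/16"}`, the entry for vnet16 contains the VPC and vnet20 ranges but not its own.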

Local Networks in Windows Azure are equivalent to Route Tables in AWS. The VPN Gateway Address is the Elastic IP of the OpenSwan Linux instance in AWS.

In the OpenSwan Linux instance, you will need to create four connections. I recommend using one configuration file per connection under the /etc/ipsec.d folder. The files must have a .conf extension. I called them aws-to-vnet16.conf, aws-to-vnet20.conf, aws-to-vnet24.conf, and aws-to-vnet28.conf respectively. The format of the configuration files is slightly different from the one Michael showed.

  • [CONNECTION NAME] – The name of the IPSec tunnel connection. I would just use the name of the configuration file without the extension, such as aws-to-vnet16.
  • [LOCAL NETWORK ADDRESS SPACE] – The Local Network address spaces. They are defined in the Local Network in Windows Azure. For example:,,,
  • [AZURE VNET GATEWAY] – The Gateway IP Address of  the Windows Azure VNET
  • [AZURE VNET ADDRESS SPACE] – The Windows Azure VNET address space. It is defined in the Virtual Network in Windows Azure. For example:

You will need to update the /etc/ipsec.secrets file to include one entry per VNET with the following format.

  • [AZURE VNET GATEWAY] – The Gateway IP Address of  the Windows Azure VNET
  • [PRE-SHARED KEY] – The pre-shared key of the VNET Gateway. You can retrieve it from MANAGE KEY in the VNET Dashboard.


This is the sample /etc/ipsec.secrets.

Don’t forget to update the Security Group in AWS to allow UDP 500 and 4500 from Azure VNET Gateways to the OpenSwan Linux instance.
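If you script the security group change, the rules can be built as below. This is a sketch assuming boto3's IpPermissions structure; the function name is hypothetical and the gateway IPs in the usage note are documentation placeholders.

```python
def ipsec_ingress_rules(gateway_ips):
    """Build IpPermissions allowing UDP 500 and 4500 from each gateway IP."""
    # Each gateway is a single host, hence the /32 mask.
    ranges = [{"CidrIp": ip + "/32"} for ip in gateway_ips]
    return [{"IpProtocol": "udp", "FromPort": port, "ToPort": port,
             "IpRanges": ranges}
            for port in (500, 4500)]
```

With a boto3 EC2 client this would be applied as `ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=ipsec_ingress_rules(["203.0.113.10", "203.0.113.20"]))`.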


Restart the ipsec service in the OpenSwan instance to establish IPsec tunnels between VPC and Windows Azure VNETs.

If you need other instances in your VPC to connect to Windows Azure VNETs, you will also need to add route entries to the Route Table in AWS. The Destination is the Windows Azure VNET Address Space, and the Target is the Elastic Network Interface or the Instance of the OpenSwan Linux instance.
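The route entries can also be added in code. The helper below is a small sketch with a hypothetical name; it builds one create_route request per VNET address space, targeting the elastic network interface.

```python
def vnet_route_requests(route_table_id, eni_id, vnet_cidrs):
    """Build one create_route request per Azure VNET address space."""
    return [{"RouteTableId": route_table_id,
             "DestinationCidrBlock": cidr,
             "NetworkInterfaceId": eni_id}
            for cidr in vnet_cidrs]
```

With a boto3 EC2 client each request would be sent with `ec2.create_route(**request)`.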


You should now be able to communicate among the VNETs through AWS. Please note that the OpenSwan Linux instance is a single point of failure in this architecture. You may want to consider creating a second elastic network interface for the OpenSwan Linux instance and setting up a standby OpenSwan Linux instance for failover. When the primary OpenSwan Linux instance is down, you can move the second elastic network interface to the standby instance so that it takes over from the primary.
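The interface switch itself might look like the following sketch. It assumes a boto3 EC2 client is passed in and that the interface is a secondary one, since a primary interface cannot be detached. The function name and the default device index are assumptions.

```python
def move_interface(ec2, eni_id, standby_instance_id, device_index=1):
    """Detach eni_id from its current instance and attach it to the standby.

    `ec2` is assumed to be a boto3 EC2 client.
    """
    described = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
    attachment = described["NetworkInterfaces"][0].get("Attachment")
    if attachment:
        ec2.detach_network_interface(AttachmentId=attachment["AttachmentId"],
                                     Force=True)
        # Wait until the interface is free before re-attaching it.
        ec2.get_waiter("network_interface_available").wait(
            NetworkInterfaceIds=[eni_id])
    return ec2.attach_network_interface(NetworkInterfaceId=eni_id,
                                        InstanceId=standby_instance_id,
                                        DeviceIndex=device_index)
```

Detecting that the primary is down, and deciding when to fail over, is left out of this sketch.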


0 KB DATA IN/OUT in Site-to-Site VPN with Cisco ASA 8.4

I was working with a customer to set up a site-to-site VPN between Windows Azure and a corporate network. On the Windows Azure Virtual Network Dashboard, it showed the VPN tunnel was connected but data in and out were 0 KB even after a long time. Firewalls were open to allow the Windows Azure gateway in the corporate network.  What went wrong?


The router on the corporate network was a Cisco ASA 5500 Series device running ASA OS version 8.4. A VPN configuration script was downloaded from the Virtual Network Dashboard in Windows Azure, but the script was for OS version 8.3.

As it turned out, the script did not work well on OS version 8.4. Two changes were required in the following sections to resolve the issue.

  • Internet Key Exchange (IKE) configuration
  • Tunnel configuration

Internet Key Exchange (IKE) configuration

In this section, replace isakmp with ikev1 on the second line before policy 10.


Tunnel configuration

In this section, add ikev1 in front of the keyword pre-shared-key.
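Both edits are simple keyword changes, so they can be applied to the downloaded script mechanically. Here is a Python sketch; the function name is hypothetical, and it assumes the script contains a line of the form "crypto isakmp policy 10" and a line that starts with "pre-shared-key".

```python
import re


def patch_for_asa_84(script):
    """Apply the two 8.3 -> 8.4 keyword changes described above."""
    # IKE section: "crypto isakmp policy 10" becomes "crypto ikev1 policy 10".
    script = re.sub(r"^([ \t]*crypto[ \t]+)isakmp([ \t]+policy[ \t]+10\b)",
                    r"\1ikev1\2", script, flags=re.MULTILINE)
    # Tunnel section: prefix a line starting with "pre-shared-key" with "ikev1".
    script = re.sub(r"^([ \t]*)pre-shared-key\b", r"\1ikev1 pre-shared-key",
                    script, flags=re.MULTILINE)
    return script
```

Running it twice gives the same result as running it once, because an already patched line no longer matches either pattern.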


After re-running the modified script in the Cisco VPN device, the IN/OUT KB started to increase. VMs were able to communicate between the two networks via PING. Everything seemed to work fine.

