AWS Auto-healing VPC NAT Instance

If the NAT instance is terminated or stopped, the status of its route in the route table becomes "Black Hole". Instances in private subnets associated with that route table can no longer connect to the Internet until the route is updated to point to another NAT instance.
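
The affected route tables can also be found from the command line. The sketch below assumes the AWS CLI is installed and configured. The function name `list_blackhole_route_tables` is hypothetical, while `vpc-id` and `route.state` are standard filters of `aws ec2 describe-route-tables` (the API reports the state as `blackhole`).

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of route tables in a VPC that contain
# at least one route in the "blackhole" state (e.g. its NAT instance is gone).
# Usage: list_blackhole_route_tables REGION VPC_ID
list_blackhole_route_tables() {
  local region=$1
  local vpc_id=$2
  aws ec2 describe-route-tables \
    --region "$region" \
    --filters "Name=vpc-id,Values=$vpc_id" "Name=route.state,Values=blackhole" \
    --query 'RouteTables[].RouteTableId' \
    --output text
}
```

For example, `list_blackhole_route_tables us-east-1 vpc-0123456789abcdef0` (placeholder IDs) prints the IDs of any route tables that still have a blackhole route.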


To make the NAT instance resilient, we can use an Auto Scaling Group so that the NAT instance heals itself. With MinSize and MaxSize both set to 1, the Auto Scaling Group ensures that there is always exactly one NAT instance running. Because a replacement instance has a new instance ID, the NAT instance needs to run a shell script at startup that repairs the route table automatically.
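
From the AWS CLI, an Auto Scaling Group pinned to exactly one instance might be created along these lines. This is a sketch, not the author's template: the function, group, launch configuration and subnet names are hypothetical placeholders, the launch configuration is assumed to exist already (with the route-repair script run from its user data), and the subnet is assumed to be a public subnet, as a NAT instance requires.

```shell
#!/bin/bash
# Hypothetical helper: create an Auto Scaling Group that always keeps exactly
# one NAT instance (min = max = desired = 1). If that instance is terminated
# or becomes unhealthy, Auto Scaling launches a replacement.
# Usage: create_nat_asg GROUP_NAME LAUNCH_CONFIG_NAME SUBNET_ID
create_nat_asg() {
  local group_name=$1
  local launch_config=$2
  local subnet_id=$3
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$group_name" \
    --launch-configuration-name "$launch_config" \
    --min-size 1 \
    --max-size 1 \
    --desired-capacity 1 \
    --vpc-zone-identifier "$subnet_id"
}
```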

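#!/bin/bash
# This script sets a default route (0.0.0.0/0) through this NAT instance for
# route tables that need NAT in the same VPC.

# Default tag that marks a route table as private (override with options)
TAG_KEY="network"
TAG_VALUE="private"

# Look up this instance's ID, region and VPC from the instance metadata service (IMDSv1-style requests)
METADATA="http://169.254.169.254/latest"
INSTANCE_ID=`curl -s $METADATA/meta-data/instance-id`
REGION=`curl -s $METADATA/dynamic/instance-identity/document | grep region | awk -F\" '{print $4}'`
MAC=`curl -s $METADATA/meta-data/mac`
VPC_ID=`curl -s $METADATA/meta-data/network/interfaces/macs/$MAC/vpc-id`

# Print help
function usage() {
  echo ""
  echo "$0 [options]"
  echo " --tag-key - route table tag key to indicate if it is a private subnet (default: $TAG_KEY)"
  echo " --tag-value - route table tag value to indicate if it is a private subnet (default: $TAG_VALUE)"
  echo ""
}

# Get options
while [ "$1" != "" ]; do
  case $1 in
    --tag-key) shift
      TAG_KEY=$1 ;;
    --tag-value) shift
      TAG_VALUE=$1 ;;
    *) usage
      exit 1 ;;
  esac
  shift
done

# Determine route tables that need to use NAT for the same VPC
ROUTE_TABLES=`aws ec2 describe-route-tables --filters "Name=tag:$TAG_KEY,Values=$TAG_VALUE" "Name=vpc-id,Values=$VPC_ID" --region $REGION --query 'RouteTables[].RouteTableId' --output text`

for ROUTE_TABLE in $ROUTE_TABLES; do
  # Instance currently targeted by the default route (empty if there is none)
  TARGET=`aws ec2 describe-route-tables --route-table-ids $ROUTE_TABLE --region $REGION --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'].InstanceId" --output text`
  echo "Checking $ROUTE_TABLE"
  if [ "$TARGET" = "" ]; then
    # Create default route
    echo "No default route is detected. Creating default route for $ROUTE_TABLE"
    aws ec2 create-route --route-table-id $ROUTE_TABLE --destination-cidr-block 0.0.0.0/0 --instance-id $INSTANCE_ID --region $REGION
  elif [ "$TARGET" != "$INSTANCE_ID" ]; then
    # Replace default route
    echo "Default route is set to $TARGET. Replacing default route to $INSTANCE_ID"
    aws ec2 replace-route --route-table-id $ROUTE_TABLE --destination-cidr-block 0.0.0.0/0 --instance-id $INSTANCE_ID --region $REGION
  else
    echo "No change is required"
  fi
done

# Disable the source/destination check so the instance can forward traffic it did not originate
aws ec2 modify-instance-attribute --instance-id $INSTANCE_ID --no-source-dest-check --region $REGION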

The script updates route tables in the same VPC that are tagged with network=private. If you want to use a different tag, use --tag-key and --tag-value to override the tag key and tag value respectively. I recommend setting up a cron job on the NAT instance to run the shell script periodically, so that any newly tagged route tables get the default route automatically. The latest version of the shell script can be found in
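
For example, a private route table could be tagged with the AWS CLI so that the script picks it up, and the cron job could be installed as a file in `/etc/cron.d`. Both helpers below are hypothetical sketches: the function names, the script path `/usr/local/bin/nat-route.sh` and the log file `/var/log/nat-route.log` are assumptions, and the five-minute schedule simply mirrors the interval used by the CloudFormation template described below.

```shell
#!/bin/bash
# Hypothetical helper: tag a route table so the NAT script treats it as private.
# Usage: tag_private_route_table ROUTE_TABLE_ID [TAG_KEY] [TAG_VALUE]
tag_private_route_table() {
  local route_table_id=$1
  local key=${2:-network}
  local value=${3:-private}
  aws ec2 create-tags --resources "$route_table_id" --tags "Key=$key,Value=$value"
}

# Hypothetical helper: write a cron.d entry that runs the NAT script as root
# every 5 minutes, appending its output to a log file.
# Usage: install_nat_route_cron "SCRIPT_PATH [OPTIONS]" [CRON_FILE]
install_nat_route_cron() {
  local script_cmd=$1
  local cron_file=${2:-/etc/cron.d/nat-route}
  printf '*/5 * * * * root %s >> /var/log/nat-route.log 2>&1\n' "$script_cmd" > "$cron_file"
}
```

Because the first argument of `install_nat_route_cron` is written into the entry as-is, custom tags can be passed along, e.g. `install_nat_route_cron "/usr/local/bin/nat-route.sh --tag-key subnet-type --tag-value internal"` (run as root so it can write to `/etc/cron.d`).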

The NAT instance needs the following IAM permissions to update route tables:

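{
  "Effect": "Allow",
  "Action": ["ec2:DescribeRouteTables", "ec2:CreateRoute", "ec2:ReplaceRoute", "ec2:ModifyInstanceAttribute"],
  "Resource": "*"
}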
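
Outside CloudFormation, one way to grant these permissions is an inline policy on the instance's IAM role. A minimal sketch, assuming the role already exists and is attached to the NAT instance through an instance profile; the function, role and policy names are hypothetical, and the actions listed are the EC2 calls the script makes.

```shell
#!/bin/bash
# Hypothetical helper: attach an inline policy to an existing IAM role that
# allows the route-table updates performed by the NAT script.
# Usage: put_nat_route_policy ROLE_NAME [POLICY_NAME]
put_nat_route_policy() {
  local role_name=$1
  local policy_name=${2:-nat-route-update}
  local policy_file
  policy_file=$(mktemp)
  # Write the policy document to a temporary file
  cat > "$policy_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeRouteTables",
        "ec2:CreateRoute",
        "ec2:ReplaceRoute",
        "ec2:ModifyInstanceAttribute"
      ],
      "Resource": "*"
    }
  ]
}
EOF
  aws iam put-role-policy \
    --role-name "$role_name" \
    --policy-name "$policy_name" \
    --policy-document "file://$policy_file"
  local status=$?
  rm -f "$policy_file"
  return $status
}
```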

The CloudFormation template can be found in It creates the following resources, plus a cron job that runs the shell script every 5 minutes to adjust route tables if needed:

  • Auto Scaling Group
  • Launch Configuration
  • Security Group
  • IAM Instance Policy
  • IAM Role
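
Because the template creates IAM resources, CloudFormation requires an IAM capability acknowledgement when the stack is created (`CAPABILITY_IAM`, or `CAPABILITY_NAMED_IAM` if the template gives the IAM resources explicit names). A sketch with the AWS CLI, assuming the template has been saved locally and needs no extra parameters; the function, stack and file names are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: create the NAT stack from a local copy of the template.
# CAPABILITY_IAM acknowledges that the stack creates an IAM role and policy.
# Usage: create_nat_stack STACK_NAME TEMPLATE_FILE
create_nat_stack() {
  local stack_name=$1
  local template_file=$2
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body "file://$template_file" \
    --capabilities CAPABILITY_IAM || return 1
  # Block until creation finishes (non-zero exit status if it fails)
  aws cloudformation wait stack-create-complete --stack-name "$stack_name"
}
```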

NOTE: You may need to replace the NAT AMI IDs in the template with the latest ones. In the EC2 console, on the Choose an Amazon Machine Image (AMI) page, select the Community AMIs category and search for amzn-ami-vpc-nat to find the most recent AMIs.
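
The same search can be done with the AWS CLI. A sketch, assuming the NAT AMIs are published under the `amazon` owner alias and that the most recently created matching image is the one wanted; the function name is hypothetical, and the name pattern can be narrowed (for example to HVM images) if needed.

```shell
#!/bin/bash
# Hypothetical helper: print the ID of the most recently created Amazon NAT AMI
# in a region whose name starts with amzn-ami-vpc-nat.
# Usage: latest_nat_ami REGION
latest_nat_ami() {
  local region=$1
  aws ec2 describe-images \
    --region "$region" \
    --owners amazon \
    --filters "Name=name,Values=amzn-ami-vpc-nat*" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}
```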
