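Amazon AWS Official Blog

Automatically Registering A and PTR Records in Custom Private Domains for EC2 Instances

Background

In day-to-day operations, workloads sometimes need a custom domain name to be registered automatically when an EC2 instance is launched. By default, AWS assigns a private DNS name to every new EC2 instance, but that name cannot be customized. This post uses Lambda and Route 53 to fill the gap, and also registers the reverse (IP-to-name) record automatically. As a result, whether a log gives you an IP address or a private DNS name during troubleshooting, you can quickly identify the EC2 instance.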

Solution

Solution Overview

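Create two private hosted zones in Route 53: one resolves custom domain names to the private IP addresses of EC2 instances, and the other resolves IP addresses back to the custom domain names. In CloudWatch, under Events > Rules, add a rule that watches Tag Change on Resource. Based on the instance's Name tag, a Lambda function writes one A record and one PTR record to the private hosted zones, so the records are refreshed whenever the name changes. To keep the records in step with the instance lifecycle, add a second rule that watches Instance State-change Notification and updates the records when an EC2 instance starts running or is terminated.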
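To make the tag-driven flow concrete, here is a minimal sketch of how a Tag Change on Resource event could be reduced to the instance ID and new name that the record update needs. The sample event is hypothetical (the account ID, Region, and instance ID are made up), and `name_changes` is an illustrative helper, not part of the original functions.

```python
# Hypothetical sample of a "Tag Change on Resource" event; the account ID,
# Region, and instance ID are made up for illustration.
SAMPLE_TAG_EVENT = {
    "detail-type": "Tag Change on Resource",
    "source": "aws.tag",
    "resources": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
    ],
    "detail": {
        "changed-tag-keys": ["Name"],
        "service": "ec2",
        "resource-type": "instance",
        "tags": {"Name": "Demo"},
    },
}


def name_changes(event):
    """Return (instance_id, new_name) pairs for instances whose Name tag changed.

    new_name is '' when the Name tag was removed.
    """
    detail = event.get("detail", {})
    if "Name" not in detail.get("changed-tag-keys", []):
        return []
    new_name = detail.get("tags", {}).get("Name", "").strip()
    changes = []
    for arn in event.get("resources", []):
        # The last ARN field looks like "instance/i-0123..."
        resource_type, _, resource_id = arn.split(":")[-1].partition("/")
        if resource_type == "instance":
            changes.append((resource_id, new_name))
    return changes


print(name_changes(SAMPLE_TAG_EVENT))  # → [('i-0123456789abcdef0', 'Demo')]
```

An empty name signals that the Name tag was removed, in which case the existing records would be deleted rather than updated.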

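Applicability:

  • The IP ranges of the VPCs within one Region must share at least their leading octet.
    • Examples of VPCs that qualify:
      • Two VPCs with the IP ranges 10.1.0.0/16 and 10.2.0.0/16
      • Two VPCs with the IP ranges 18.18.0.0/20 and 18.18.128.0/20
    • Example of VPCs that do not qualify:
      • Two VPCs with the IP ranges 1.0.0/16 and 10.2.0.0/16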
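The applicability rule can be checked mechanically. The sketch below assumes IPv4 CIDR blocks and compares only the whole leading octets that each prefix length fixes. `shared_reverse_zone` is a hypothetical helper name, and the non-qualifying pair in the usage lines is an illustrative choice rather than an example from this post.

```python
import ipaddress


def shared_reverse_zone(cidrs):
    """Return the longest in-addr.arpa zone that covers every CIDR, or None.

    Only whole leading octets fixed by each prefix length are compared,
    capped at three octets.
    """
    fixed_octets = []
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr)
        octets = str(network.network_address).split(".")
        fixed_octets.append(octets[: min(network.prefixlen // 8, 3)])

    common = []
    for column in zip(*fixed_octets):
        if len(set(column)) != 1:
            break
        common.append(column[0])

    if not common:
        return None
    return ".".join(reversed(common)) + ".in-addr.arpa"


print(shared_reverse_zone(["10.1.0.0/16", "10.2.0.0/16"]))      # → 10.in-addr.arpa
print(shared_reverse_zone(["18.18.0.0/20", "18.18.128.0/20"]))  # → 18.18.in-addr.arpa
print(shared_reverse_zone(["10.1.0.0/16", "172.31.0.0/16"]))    # → None
```

A result of `None` means no single reverse zone can serve all of the VPCs, so the approach in this post does not apply as is.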

Architecture Diagram

Implementation Steps

  1. Create private hosted zone A for the custom domain names.

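  2. Create private hosted zone B for reverse lookups from an IP address to the custom domain name. If the IP range attached to your VPC is A.B.C.D, the zone's domain name must be one of the following:

a. A.in-addr.arpa, for example 18.in-addr.arpa

b. B.A.in-addr.arpa, for example 18.18.in-addr.arpa

c. C.B.A.in-addr.arpa, for example 0.18.18.in-addr.arpa

Look up the CIDR block of the VPC.

Set the zone's domain name so that it matches the CIDR block according to the rule above.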
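As a rough illustration of how an IP address maps into such a reverse zone, the sketch below builds the full PTR record name with the standard library and derives how many leading octets the zone name already encodes, which is the value the `PTR_RESERVED_PARTS` setting of the Lambda functions expects. Both helper names are hypothetical, and the addresses are examples only.

```python
import ipaddress


def ptr_record_name(private_ip, reverse_zone):
    """Return the full PTR record name for private_ip, checking it lies in reverse_zone."""
    # reverse_pointer yields e.g. '5.0.18.18.in-addr.arpa' for 18.18.0.5
    name = ipaddress.ip_address(private_ip).reverse_pointer
    if not name.endswith("." + reverse_zone.rstrip(".")):
        raise ValueError(f"{private_ip} is not covered by {reverse_zone}")
    return name


def reserved_parts(reverse_zone):
    """Number of leading IP octets already encoded in the zone name."""
    labels = reverse_zone.rstrip(".").split(".")
    return len(labels) - 2  # drop the 'in-addr' and 'arpa' labels


print(ptr_record_name("18.18.0.5", "18.18.in-addr.arpa"))  # → 5.0.18.18.in-addr.arpa
print(reserved_parts("18.18.in-addr.arpa"))                # → 2
print(reserved_parts("10.in-addr.arpa"))                   # → 1
```

An address outside the zone raises `ValueError`, which is a quick way to catch a zone name that does not match the VPC's CIDR block.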

  3. Create an IAM role named lambda-ec2-name-register-role as the execution role for the Lambda functions, and attach the following policies:

a. AmazonEC2ReadOnlyAccess

b. AmazonRoute53ReadOnlyAccess

c. AmazonRoute53AutoNamingFullAccess

d. AWSLambdaExecute
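For readers who script the role instead of using the console, the following sketch assembles the two inputs such a script would need: a trust policy that lets Lambda assume the role, and the ARNs of the four managed policies listed above. It makes no AWS calls. The partition handling (`aws` for standard Regions, `aws-cn` for the China Regions) and the `lambda.amazonaws.com` service principal are assumptions to verify for your account.

```python
import json

ROLE_NAME = "lambda-ec2-name-register-role"
MANAGED_POLICIES = [
    "AmazonEC2ReadOnlyAccess",
    "AmazonRoute53ReadOnlyAccess",
    "AmazonRoute53AutoNamingFullAccess",
    "AWSLambdaExecute",
]

# Trust policy that lets the Lambda service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def managed_policy_arns(partition="aws"):
    """Build the ARNs of the AWS managed policies listed above.

    Assumption: 'aws' for standard Regions, 'aws-cn' for the China Regions.
    """
    return [f"arn:{partition}:iam::aws:policy/{name}" for name in MANAGED_POLICIES]


print(json.dumps(TRUST_POLICY, indent=2))
for arn in managed_policy_arns():
    print(arn)
```

With boto3, these values would typically be passed to the IAM client's `create_role` (as `AssumeRolePolicyDocument`) and `attach_role_policy` (as `PolicyArn`) calls.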

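  4. Create a Lambda function that responds to tag change events.

a. Set the Lambda function name to ec2_change_name

b. Select the IAM role created earlier, lambda-ec2-name-register-role

c. In the configuration, set the timeout to 10 seconds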

import boto3
import time
import asyncio

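from types import SimpleNamespace

# SimpleNamespace allows the attribute-style access (config.HOSTED_ZONE_ID) used below
config = SimpleNamespace(
    HOSTED_ZONE_ID='<hosted zone ID of the custom domain>',  # in this example: Z0348615WGFD7IWPZOCV
    PTR_ZONE_ID='<hosted zone ID of the PTR zone>',  # in this example: Z05233005YXC6V4H0HJK
    PTR_RESERVED_PARTS=2,  # 2 in this example; use 1 if your PTR zone is 10.in-addr.arpa
)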
route53 = boto3.client('route53')
ec2 = boto3.client('ec2')
# Create the loop explicitly; newer Python versions no longer create one implicitly
loop = asyncio.new_event_loop()


def get_instance_private_ip(instance_id):
    instance = ec2.describe_instances(
        InstanceIds=[instance_id]
    )
    private_ip = instance['Reservations'][0]['Instances'][0]['PrivateIpAddress']

    return private_ip


async def add_a_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register if no custom DNS name was given
    if len(new_name) == 0:
        return

    try:
        host_zone_info = route53.get_hosted_zone(Id=config.HOSTED_ZONE_ID)
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)

        await delete_dns_record(private_ip)
        # Register the private A record

        response = route53.change_resource_record_sets(
            HostedZoneId=config.HOSTED_ZONE_ID,
            ChangeBatch={
                'Comment': 'add A %s -> %s' % (new_full_custom_dns_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': new_full_custom_dns_name,
                            'Type': 'A',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': private_ip}]
                        }
                    }
                ]
            }
        )
        print('ADD A: %s is recorded for %s, cost %.3fs' % (
            new_full_custom_dns_name, private_ip, time.time() - begin_time))
    except Exception as e:
        print(e)


async def add_ptr_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register if no custom DNS name was given
    if len(new_name) == 0:
        return

    try:
        host_zone_info = route53.get_hosted_zone(Id=config.HOSTED_ZONE_ID)
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)

        await delete_ptr_record(private_ip)

        # Add the reverse PTR record
        ptr_zone_info = route53.get_hosted_zone(
            Id=config.PTR_ZONE_ID
        )

        ip_parts = private_ip.split('.')
        ptr_reserved_ip_parts = ip_parts[config.PTR_RESERVED_PARTS:]
        ptr_reserved_ip_parts.reverse()
        ptr_name = '.'.join(ptr_reserved_ip_parts)
        ptr_full_name = ptr_name + '.' + ptr_zone_info['HostedZone']['Name']
        record_sets = route53.change_resource_record_sets(
            HostedZoneId=config.PTR_ZONE_ID,
            ChangeBatch={
                'Comment': 'add PTR %s -> %s' % (ptr_full_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': ptr_full_name,
                            'Type': 'PTR',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': new_full_custom_dns_name}]
                        }
                    }
                ]
            }
        )
        print('ADD PTR: %s is recorded for %s, cost %.3fs' % (
            ptr_full_name, new_full_custom_dns_name, time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_dns_record(private_ip):
    begin_time = time.time()
    try:
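        # Find matching records. A records are matched by value (the IP), so
        # scan the whole zone instead of starting from a record name.
        paginator = route53.get_paginator('list_resource_record_sets')
        response = {'ResourceRecordSets': [
            record
            for page in paginator.paginate(HostedZoneId=config.HOSTED_ZONE_ID)
            for record in page['ResourceRecordSets']
        ]}
        # Delete the matching records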
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'A' and record['ResourceRecords'][0]['Value'] == private_ip:
                record_sets = route53.change_resource_record_sets(
                    HostedZoneId=config.HOSTED_ZONE_ID,
                    ChangeBatch={
                        'Comment': 'delete %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'A',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': private_ip}]
                                }
                            }
                        ]
                    }
                )
                print('DEL A: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_ptr_record(private_ip):
    begin_time = time.time()
    ip_parts = private_ip.split('.')
    ip_parts.reverse()
    reversed_ip = '.'.join(ip_parts)
    try:
        # Look up the record, starting the listing at its full name
        ptr_full_name = reversed_ip + '.in-addr.arpa.'
        response = route53.list_resource_record_sets(
            HostedZoneId=config.PTR_ZONE_ID,
            StartRecordName=ptr_full_name,
            StartRecordType='PTR'
        )

        # Delete the matching record
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'PTR' and record['Name'] == ptr_full_name:
                route53.change_resource_record_sets(
                    HostedZoneId=config.PTR_ZONE_ID,
                    ChangeBatch={
                        'Comment': 'delete PTR %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'PTR',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': record['ResourceRecords'][0]['Value']}]
                                }
                            }
                        ]
                    }
                )
                print('DEL PTR: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


def add_node(instance_id, new_name):
    private_ip = get_instance_private_ip(instance_id)
    tasks = [
        add_a_record(new_name, private_ip),
        add_ptr_record(new_name, private_ip)
    ]
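    # asyncio.wait() rejects bare coroutines on Python 3.11+, so wrap them in tasks
    loop.run_until_complete(asyncio.wait([loop.create_task(t) for t in tasks]))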


def del_node(instance_id):
    private_ip = get_instance_private_ip(instance_id)
    tasks = [
        delete_dns_record(private_ip),
        delete_ptr_record(private_ip)
    ]
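    # asyncio.wait() rejects bare coroutines on Python 3.11+, so wrap them in tasks
    loop.run_until_complete(asyncio.wait([loop.create_task(t) for t in tasks]))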


def change_instance_dns_name(instance_id, new_name):
    if len(new_name) == 0:
        del_node(instance_id)
    else:
        add_node(instance_id, new_name.strip())


def lambda_handler(event, context):
    resources = event['resources']
    detail = event['detail']
    if 'changed-tag-keys' not in detail:
        return
    if 'Name' not in detail['changed-tag-keys']:
        return

    for resource in resources:
        # The last ARN field looks like 'instance/<instance-id>'
        item = resource.split(':')[-1].split('/')
        if item[0] == 'instance':
            # An empty name (Name tag removed) deletes the records
            change_instance_dns_name(item[1], detail['tags'].get('Name', ''))
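
  5. Create a Lambda function that responds to EC2 start and terminate events.

a. Set the Lambda function name to ec2_start_and_shutdown

b. Select the IAM role created earlier, lambda-ec2-name-register-role

c. In the configuration, set the timeout to 10 seconds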

import boto3
import time
import asyncio

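from types import SimpleNamespace

# SimpleNamespace allows the attribute-style access (config.HOSTED_ZONE_ID) used below
config = SimpleNamespace(
    HOSTED_ZONE_ID='<hosted zone ID of the custom domain>',  # in this example: Z0348615WGFD7IWPZOCV
    PTR_ZONE_ID='<hosted zone ID of the PTR zone>',  # in this example: Z05233005YXC6V4H0HJK
    PTR_RESERVED_PARTS=2,  # 2 in this example; use 1 if your PTR zone is 10.in-addr.arpa
)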
route53 = boto3.client('route53')
ec2 = boto3.client('ec2')
# Create the loop explicitly; newer Python versions no longer create one implicitly
loop = asyncio.new_event_loop()


def get_instance_info(instance_id):
    instance = ec2.describe_instances(
        InstanceIds=[instance_id]
    )
    info = instance['Reservations'][0]['Instances'][0]
    private_ip = info['PrivateIpAddress']
    name = ''
    # Instances without any tags have no 'Tags' key
    for tag in info.get('Tags', []):
        if tag['Key'] == 'Name':
            name = tag['Value']

    return name, private_ip


def get_instance_private_ip(instance_id):
    instance = ec2.describe_instances(
        InstanceIds=[instance_id]
    )
    private_ip = instance['Reservations'][0]['Instances'][0]['PrivateIpAddress']

    return private_ip


async def add_a_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register if no custom DNS name was given
    if len(new_name) == 0:
        return

    try:
        host_zone_info = route53.get_hosted_zone(Id=config.HOSTED_ZONE_ID)
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)

        await delete_dns_record(private_ip)
        # Register the private A record

        response = route53.change_resource_record_sets(
            HostedZoneId=config.HOSTED_ZONE_ID,
            ChangeBatch={
                'Comment': 'add A %s -> %s' % (new_full_custom_dns_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': new_full_custom_dns_name,
                            'Type': 'A',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': private_ip}]
                        }
                    }
                ]
            }
        )
        print('ADD A: %s is recorded for %s, cost %.3fs' % (
            new_full_custom_dns_name, private_ip, time.time() - begin_time))
    except Exception as e:
        print(e)


async def add_ptr_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register if no custom DNS name was given
    if len(new_name) == 0:
        return

    try:
        host_zone_info = route53.get_hosted_zone(Id=config.HOSTED_ZONE_ID)
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)

        await delete_ptr_record(private_ip)

        # Add the reverse PTR record
        ptr_zone_info = route53.get_hosted_zone(
            Id=config.PTR_ZONE_ID
        )

        ip_parts = private_ip.split('.')
        ptr_reserved_ip_parts = ip_parts[config.PTR_RESERVED_PARTS:]
        ptr_reserved_ip_parts.reverse()
        ptr_name = ('.').join(ptr_reserved_ip_parts)
        ptr_full_name = ptr_name + '.' + ptr_zone_info['HostedZone']['Name']
        record_sets = route53.change_resource_record_sets(
            HostedZoneId=config.PTR_ZONE_ID,
            ChangeBatch={
                'Comment': 'add PTR %s -> %s' % (ptr_full_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': ptr_full_name,
                            'Type': 'PTR',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': new_full_custom_dns_name}]
                        }
                    }
                ]
            }
        )
        print('ADD PTR: %s is recorded for %s, cost %.3fs' % (
            ptr_full_name, new_full_custom_dns_name, time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_dns_record(private_ip):
    begin_time = time.time()
    try:
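        # Find matching records. A records are matched by value (the IP), so
        # scan the whole zone instead of starting from a record name.
        paginator = route53.get_paginator('list_resource_record_sets')
        response = {'ResourceRecordSets': [
            record
            for page in paginator.paginate(HostedZoneId=config.HOSTED_ZONE_ID)
            for record in page['ResourceRecordSets']
        ]}
        # Delete the matching records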
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'A' and record['ResourceRecords'][0]['Value'] == private_ip:
                record_sets = route53.change_resource_record_sets(
                    HostedZoneId=config.HOSTED_ZONE_ID,
                    ChangeBatch={
                        'Comment': 'delete %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'A',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': private_ip}]
                                }
                            }
                        ]
                    }
                )
                print('DEL A: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_ptr_record(private_ip):
    begin_time = time.time()
    ip_parts = private_ip.split('.')
    ip_parts.reverse()
    reversed_ip = '.'.join(ip_parts)
    try:
        # Look up the record, starting the listing at its full name
        ptr_full_name = reversed_ip + '.in-addr.arpa.'
        response = route53.list_resource_record_sets(
            HostedZoneId=config.PTR_ZONE_ID,
            StartRecordName=ptr_full_name,
            StartRecordType='PTR'
        )

        # Delete the matching record
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'PTR' and record['Name'] == ptr_full_name:
                route53.change_resource_record_sets(
                    HostedZoneId=config.PTR_ZONE_ID,
                    ChangeBatch={
                        'Comment': 'delete PTR %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'PTR',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': record['ResourceRecords'][0]['Value']}]
                                }
                            }
                        ]
                    }
                )
                print('DEL PTR: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


def add_node(instance_id):
    instance_name, private_ip = get_instance_info(instance_id)
    tasks = [
        add_a_record(instance_name, private_ip),
        add_ptr_record(instance_name, private_ip)
    ]
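    # asyncio.wait() rejects bare coroutines on Python 3.11+, so wrap them in tasks
    loop.run_until_complete(asyncio.wait([loop.create_task(t) for t in tasks]))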


def del_node(instance_id):
    private_ip = get_instance_private_ip(instance_id)
    tasks = [
        delete_dns_record(private_ip),
        delete_ptr_record(private_ip)
    ]
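    # asyncio.wait() rejects bare coroutines on Python 3.11+, so wrap them in tasks
    loop.run_until_complete(asyncio.wait([loop.create_task(t) for t in tasks]))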


def lambda_handler(event, context):
    state = event['detail']['state']
    instance_id = event['detail']['instance-id']
    if 'running' == state:
        add_node(instance_id)

    if 'shutting-down' == state:
        del_node(instance_id)

    return 0
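
  6. In CloudWatch, under Events, create the rules.

a. Create a Tag Change on Resource event rule, set its target to Lambda, and select the function ec2_change_name

b. Create an Instance State-change Notification event rule, set its target to Lambda, and select the function ec2_start_and_shutdown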
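The two rules in step 6 correspond to event patterns roughly like the ones sketched below. The patterns reflect the standard event sources for EC2 tag and state changes, but treat them as a starting point: restricting the state rule to `running` and `shutting-down` is an assumption based on the states the second function checks, and `matches` is a deliberately simplified, hypothetical matcher for local experimentation, not the full EventBridge matching logic.

```python
import json

# Fires when tags change on an EC2 instance; target: ec2_change_name.
TAG_CHANGE_PATTERN = {
    "source": ["aws.tag"],
    "detail-type": ["Tag Change on Resource"],
    "detail": {"service": ["ec2"], "resource-type": ["instance"]},
}

# Fires when an instance starts running or begins terminating;
# target: ec2_start_and_shutdown.
STATE_CHANGE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "shutting-down"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist in the event, and each
    leaf value must equal one of the listed alternatives."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical state-change event used only to exercise the matcher.
sample = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}
print(json.dumps(STATE_CHANGE_PATTERN))
print(matches(STATE_CHANGE_PATTERN, sample))  # → True
print(matches(TAG_CHANGE_PATTERN, sample))    # → False
```

Serialized with `json.dumps`, either dictionary can be pasted into the rule's custom event pattern field in the console.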

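Testing

1. Verify forward resolution (domain name to IP) and reverse resolution (IP to domain name):

a. SSH into an instance in a VPC that is associated with the hosted zones

b. For the forward test, run ping <custom domain>; the name should resolve to the expected IP address

c. For the reverse test, run dig -x <ip address>; the query should return the custom domain name

2. Verify that the events take effect:

a. Create an EC2 instance and set the tag Key=Name, Value=Demo, then check the hosted zone records

b. Terminate an instance, then check the hosted zone records; the related records should have been deleted

c. Rename an EC2 instance, then check the hosted zone records; the related records should have been updated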
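The forward and reverse checks from test 1 can also be scripted. The sketch below uses the standard `socket` module and would need to run on an instance inside an associated VPC. `check_records` is a hypothetical helper, and the resolver functions are parameters so that the logic can be exercised without DNS access. The name and address in the usage lines are placeholders.

```python
import socket


def check_records(name, ip, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (forward_ok, reverse_ok) for a custom name and its private IP."""
    try:
        forward_ok = resolve(name) == ip
    except OSError:  # covers socket.gaierror and socket.herror
        forward_ok = False
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_ok = reverse(ip)[0].rstrip(".").lower() == name.rstrip(".").lower()
    except OSError:
        reverse_ok = False
    return forward_ok, reverse_ok


# Dry run with stand-in resolvers; drop the two keyword arguments to query real DNS.
print(check_records(
    "demo.example.internal", "18.18.0.5",
    resolve=lambda name: "18.18.0.5",
    reverse=lambda ip: ("demo.example.internal.", [], [ip]),
))  # → (True, True)
```

A result of `(True, False)` would point at a missing or stale PTR record, while `(False, True)` would point at the A record.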

Cost Estimate

Lambda: 20,000 invocations per month at 5 seconds of run time each, an estimated $0.21 per month

Route 53: two hosted zones, $2 per month
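The Lambda figure can be reproduced with a short calculation. The memory size and unit prices below are assumptions (128 MB of memory, x86 on-demand pricing in USD, no free tier) rather than values stated in this post, so check them against current pricing for your Region.

```python
def lambda_monthly_cost(
    invocations=20_000,               # from the estimate above
    seconds_per_invocation=5,         # from the estimate above
    memory_gb=128 / 1024,             # assumption: 128 MB function memory
    price_per_gb_second=0.0000166667, # assumption: x86 on-demand price, USD
    price_per_million_requests=0.20,  # assumption: USD
):
    """Return (compute_cost, request_cost, total) in USD per month."""
    compute_cost = invocations * seconds_per_invocation * memory_gb * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost, request_cost, compute_cost + request_cost


compute, requests, total = lambda_monthly_cost()
print(f"compute ${compute:.4f} + requests ${requests:.4f} = ${total:.2f}")
# → compute $0.2083 + requests $0.0040 = $0.21
```

Under these assumptions the total rounds to the $0.21 quoted above, and almost all of it is compute time rather than request charges.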

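Source Code

SAM source code: GitHub – yourlin/ec2-name-register

Deploying the Lambda functions with sam build and sam deploy --guided automatically creates and binds the CloudWatch Event rules, so there is no need to create them by hand. The IAM permissions that the Lambda functions need are attached automatically as well.

References

Enabling reverse DNS functionality for Route 53 with a PTR record

Using and overriding reverse DNS rules with Route 53 Resolver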

Creating records by using the Amazon Route 53 console

About the Author

林业

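AWS Solutions Architect, responsible for consulting on and designing AWS-based cloud computing solutions. More than 14 years of R&D experience, including building an app with tens of millions of users, and a contributor to several open-source projects on GitHub. Extensive hands-on experience in gaming, IoT, smart cities, automotive, e-commerce, and other fields.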