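Background

In day-to-day operations, the business sometimes needs an EC2 instance to register a custom domain name automatically when it launches. By default, AWS assigns a private DNS name to every EC2 instance at launch, but that private DNS name cannot be customized. This post uses Lambda and Route 53 to add that capability, and also to register the reverse record from the IP address to the domain name automatically. That way, whether a log shows an IP address or a private DNS name during troubleshooting, the EC2 instance can be located quickly.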
 
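Solution

Solution description

Create two private hosted zones in Route 53: one resolves custom domain names to the private IP addresses of EC2 instances, and the other resolves IP addresses back to the custom domain names. In CloudWatch, add an Events rule that watches Tag Change on Resource, and write an A record and a PTR record to the private hosted zones based on the instance's Name tag. So that the records stay up to date, also add a rule that watches Instance State-change Notification and updates the records when an EC2 instance changes to Running or Terminated.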
 
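Applicability:

- The IP ranges of the VPCs in a Region must share at least one leading octet.
  - Examples of VPCs that qualify:
    - Two VPCs with the IP ranges 10.1.0.0/16 and 10.2.0.0/16
    - Two VPCs with the IP ranges 18.18.0.0/20 and 18.18.128.0/20
  - Example of VPCs that do not qualify:
    - Two VPCs with the IP ranges 1.0.0/16 and 10.2.0.0/16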
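The shared-prefix requirement can be checked before creating the reverse zone. The following is a minimal sketch, assuming whole-octet prefixes; the helper name `shared_leading_octets` and the CIDR values passed to it are illustrative, and `172.31.0.0/16` is a hypothetical non-matching range.

```python
import ipaddress


def shared_leading_octets(cidrs):
    """Return the leading octets shared by every CIDR block (possibly empty)."""
    octet_lists = [
        str(ipaddress.ip_network(cidr).network_address).split(".")
        for cidr in cidrs
    ]
    shared = []
    # Walk the octets column by column and stop at the first mismatch.
    for column in zip(*octet_lists):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return shared


print(shared_leading_octets(["10.1.0.0/16", "10.2.0.0/16"]))      # ['10']
print(shared_leading_octets(["18.18.0.0/20", "18.18.128.0/20"]))  # ['18', '18']
print(shared_leading_octets(["10.1.0.0/16", "172.31.0.0/16"]))    # []
```

An empty result means no single in-addr.arpa zone covers all of the VPCs, so this approach would not apply to them.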
 
 
Architecture diagram
 
       
 
Implementation steps
 
        
- Create private hosted zone A for the custom domain name.

 
       
 
        
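- Create private hosted zone B for looking up the custom domain name from an IP address. If the IP range associated with your VPC is A.B.C.D, the zone name must be one of the following:
a. A.in-addr.arpa, for example 18.in-addr.arpa

b. B.A.in-addr.arpa, for example 18.18.in-addr.arpa

c. C.B.A.in-addr.arpa, for example 0.18.18.in-addr.arpa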
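The naming rule for the reverse zone, and the PTR record name that ends up inside it, can be sketched in a few lines. This is an illustration only: the function names and the address `18.18.0.25` are hypothetical, and the sketch assumes IPv4 with whole-octet zone boundaries.

```python
def reverse_zone_candidates(network_address):
    """List the in-addr.arpa zone names that can cover an IPv4 network address."""
    octets = network_address.split(".")
    # Take the first 1, 2 or 3 octets and reverse their order.
    return [
        ".".join(reversed(octets[:count])) + ".in-addr.arpa"
        for count in (1, 2, 3)
    ]


def ptr_record_name(private_ip, zone_name):
    """Build the PTR record name for private_ip inside the given reverse zone."""
    octets = private_ip.split(".")
    # Labels in front of "in-addr.arpa" are the octets the zone already covers.
    reserved = len(zone_name.split(".")) - 2
    host_part = ".".join(reversed(octets[reserved:]))
    return host_part + "." + zone_name


print(reverse_zone_candidates("18.18.0.0"))
print(ptr_record_name("18.18.0.25", "18.18.in-addr.arpa"))
```

The `reserved` count computed here plays the same role as a setting that says how many leading octets the zone name already contains: 2 for 18.18.in-addr.arpa, 1 for 18.in-addr.arpa.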
 
       
 
Look up the CIDR of the VPC
 
       
 
Set the zone's domain name, which must follow the naming rule above
 
        
- Create an IAM role named lambda-ec2-name-register-role for the Lambda functions to run as, and attach the following policies:
a. AmazonEC2ReadOnlyAccess
 
       b. AmazonRoute53ReadOnlyAccess
 
       c. AmazonRoute53AutoNamingFullAccess
 
       d. AWSLambdaExecute
 
       
 
        
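- Create a Lambda function that responds to tag events.
a. Set the Lambda function name to ec2_change_name

b. Select the IAM role created earlier, lambda-ec2-name-register-role

c. In the configuration, set the timeout to 10 seconds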
 
       
 
        
import boto3
import time
import asyncio
config = {
    'HOSTED_ZONE_ID': '<hosted zone ID of the custom domain>',  # in this example: Z0348615WGFD7IWPZOCV
    'PTR_ZONE_ID': '<hosted zone ID of the PTR zone>',  # in this example: Z05233005YXC6V4H0HJK
    'PTR_RESERVED_PARTS': 2,  # 2 in this example; use 1 if your PTR zone is 10.in-addr.arpa
}
route53 = boto3.client('route53')
ec2 = boto3.client('ec2')
loop = asyncio.get_event_loop()
def get_instance_private_ip(instance_id):
    instance = ec2.describe_instances(
        InstanceIds=[instance_id]
    )
    private_ip = instance['Reservations'][0]['Instances'][0]['PrivateIpAddress']
    return private_ip
async def add_a_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom DNS name is given
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        await delete_dns_record(private_ip)
        # Register the private A record
        response = route53.change_resource_record_sets(
            HostedZoneId=config['HOSTED_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add A %s -> %s' % (new_full_custom_dns_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': new_full_custom_dns_name,
                            'Type': 'A',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': private_ip}]
                        }
                    }
                ]
            }
        )
        print('ADD A: %s is recorded for %s, cost %.3fs' % (
            new_full_custom_dns_name, private_ip, time.time() - begin_time))
    except Exception as e:
        print(e)
async def add_ptr_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom DNS name is given
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        await delete_ptr_record(private_ip)
        # Add the reverse PTR record
        ptr_zone_info = route53.get_hosted_zone(
            Id=config['PTR_ZONE_ID']
        )
        ip_parts = private_ip.split('.')
        # Drop the leading octets that the PTR zone name already covers
        ptr_reserved_ip_parts = ip_parts[config['PTR_RESERVED_PARTS']:]
        ptr_reserved_ip_parts.reverse()
        ptr_name = '.'.join(ptr_reserved_ip_parts)
        ptr_full_name = ptr_name + '.' + ptr_zone_info['HostedZone']['Name']
        record_sets = route53.change_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add PTR %s -> %s' % (ptr_full_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': ptr_full_name,
                            'Type': 'PTR',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': new_full_custom_dns_name}]
                        }
                    }
                ]
            }
        )
        print('ADD PTR: %s is recorded for %s, cost %.3fs' % (
            ptr_full_name, new_full_custom_dns_name, time.time() - begin_time))
    except Exception as e:
        print(e)
async def delete_dns_record(private_ip):
    begin_time = time.time()
    try:
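        # Find matching records. A records are keyed by name, so scan the
        # whole zone and match on the record value (the private IP).
        paginator = route53.get_paginator('list_resource_record_sets')
        pages = paginator.paginate(HostedZoneId=config['HOSTED_ZONE_ID'])
        records = [r for page in pages for r in page['ResourceRecordSets']]
        # Delete the matching records
        for record in records:
            values = [rr['Value'] for rr in record.get('ResourceRecords', [])]
            if record['Type'] == 'A' and values == [private_ip]:
                record_sets = route53.change_resource_record_sets(
                    HostedZoneId=config['HOSTED_ZONE_ID'],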
                    ChangeBatch={
                        'Comment': 'delete %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'A',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': private_ip}]
                                }
                            }
                        ]
                    }
                )
                print('DEL A: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)
async def delete_ptr_record(private_ip):
    begin_time = time.time()
    ip_parts = private_ip.split('.')
    ip_parts.reverse()
    reversed_ip = '.'.join(ip_parts)
    try:
        # Look up the record by its fully qualified reverse name
        ptr_full_name = reversed_ip + '.in-addr.arpa.'
        response = route53.list_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            StartRecordName=ptr_full_name,
            StartRecordType='PTR'
        )
        # Delete the matching record
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'PTR' and record['Name'] == ptr_full_name:
                route53.change_resource_record_sets(
                    HostedZoneId=config['PTR_ZONE_ID'],
                    ChangeBatch={
                        'Comment': 'delete PTR %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'PTR',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': record['ResourceRecords'][0]['Value']}]
                                }
                            }
                        ]
                    }
                )
                print('DEL PTR: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)
def add_node(instance_id, new_name):
    private_ip = get_instance_private_ip(instance_id)
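    # Wrap the coroutines in tasks: asyncio.wait() rejects bare coroutines on Python 3.11+
    tasks = [
        loop.create_task(add_a_record(new_name, private_ip)),
        loop.create_task(add_ptr_record(new_name, private_ip))
    ]
    loop.run_until_complete(asyncio.wait(tasks))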
def del_node(instance_id):
    private_ip = get_instance_private_ip(instance_id)
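    # Wrap the coroutines in tasks: asyncio.wait() rejects bare coroutines on Python 3.11+
    tasks = [
        loop.create_task(delete_dns_record(private_ip)),
        loop.create_task(delete_ptr_record(private_ip))
    ]
    loop.run_until_complete(asyncio.wait(tasks))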
def change_instance_dns_name(instance_id, new_name):
    if len(new_name) == 0:
        del_node(instance_id)
    else:
        add_node(instance_id, new_name.strip())
def lambda_handler(event, context):
    resources = event['resources']
    detail = event['detail']
    if 'changed-tag-keys' not in detail:
        return
    if 'Name' not in detail['changed-tag-keys']:
        return
    for resource in resources:
        arn_parts = resource.split(':')
        item = arn_parts[-1].split('/')
        if 'instance' == item[0]:
            # A removed Name tag is absent from detail['tags']; treat it as an empty name
            change_instance_dns_name(item[1], detail['tags'].get('Name', ''))
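To make the handler's input concrete, here is a minimal sketch of the parsing step on its own, run against a hand-written sample event. The account ID, instance ID and the helper name `parse_tag_change` are hypothetical; the event fields mirror the ones the handler reads (`resources`, `detail.changed-tag-keys`, `detail.tags`).

```python
def parse_tag_change(event):
    """Return (instance_id, name) pairs for EC2 instances whose Name tag changed."""
    detail = event.get("detail", {})
    if "Name" not in detail.get("changed-tag-keys", []):
        return []
    # An empty name means the Name tag was removed.
    name = detail.get("tags", {}).get("Name", "").strip()
    results = []
    for arn in event.get("resources", []):
        resource = arn.split(":")[-1]  # e.g. "instance/i-0123456789abcdef0"
        kind, _, resource_id = resource.partition("/")
        if kind == "instance":
            results.append((resource_id, name))
    return results


sample_event = {
    "source": "aws.tag",
    "detail-type": "Tag Change on Resource",
    "resources": [
        "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
    ],
    "detail": {
        "changed-tag-keys": ["Name"],
        "service": "ec2",
        "resource-type": "instance",
        "tags": {"Name": "Demo"},
    },
}

print(parse_tag_change(sample_event))
```

Keeping the parsing separate from the Route 53 calls makes it possible to exercise the event handling locally without AWS credentials.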
 
         
        
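- Create a Lambda function that responds to EC2 start and shutdown events
a. Set the Lambda function name to ec2_start_and_shutdown

b. Select the IAM role created earlier, lambda-ec2-name-register-role

c. In the configuration, set the timeout to 10 seconds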
 
        
import boto3
import time
import asyncio
config = {
    'HOSTED_ZONE_ID': '<hosted zone ID of the custom domain>',  # in this example: Z0348615WGFD7IWPZOCV
    'PTR_ZONE_ID': '<hosted zone ID of the PTR zone>',  # in this example: Z05233005YXC6V4H0HJK
    'PTR_RESERVED_PARTS': 2,  # 2 in this example; use 1 if your PTR zone is 10.in-addr.arpa
}
route53 = boto3.client('route53')
ec2 = boto3.client('ec2')
loop = asyncio.get_event_loop()
def get_instance_info(instance_id):
    instance = ec2.describe_instances(
        InstanceIds=[instance_id]
    )
    private_ip = instance['Reservations'][0]['Instances'][0]['PrivateIpAddress']
    name = ''
    # Instances without any tags have no 'Tags' key in the response
    for tag in instance['Reservations'][0]['Instances'][0].get('Tags', []):
        if tag['Key'] == 'Name':
            name = tag['Value']
    return name, private_ip
def get_instance_private_ip(instance_id):
    instance = ec2.describe_instances(
        InstanceIds=[instance_id]
    )
    private_ip = instance['Reservations'][0]['Instances'][0]['PrivateIpAddress']
    return private_ip
async def add_a_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom DNS name is given
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        await delete_dns_record(private_ip)
        # Register the private A record
        response = route53.change_resource_record_sets(
            HostedZoneId=config['HOSTED_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add A %s -> %s' % (new_full_custom_dns_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': new_full_custom_dns_name,
                            'Type': 'A',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': private_ip}]
                        }
                    }
                ]
            }
        )
        print('ADD A: %s is recorded for %s, cost %.3fs' % (
            new_full_custom_dns_name, private_ip, time.time() - begin_time))
    except Exception as e:
        print(e)
async def add_ptr_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom DNS name is given
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        await delete_ptr_record(private_ip)
        # Add the reverse PTR record
        ptr_zone_info = route53.get_hosted_zone(
            Id=config['PTR_ZONE_ID']
        )
        ip_parts = private_ip.split('.')
        # Drop the leading octets that the PTR zone name already covers
        ptr_reserved_ip_parts = ip_parts[config['PTR_RESERVED_PARTS']:]
        ptr_reserved_ip_parts.reverse()
        ptr_name = '.'.join(ptr_reserved_ip_parts)
        ptr_full_name = ptr_name + '.' + ptr_zone_info['HostedZone']['Name']
        record_sets = route53.change_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add PTR %s -> %s' % (ptr_full_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': ptr_full_name,
                            'Type': 'PTR',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': new_full_custom_dns_name}]
                        }
                    }
                ]
            }
        )
        print('ADD PTR: %s is recorded for %s, cost %.3fs' % (
            ptr_full_name, new_full_custom_dns_name, time.time() - begin_time))
    except Exception as e:
        print(e)
async def delete_dns_record(private_ip):
    begin_time = time.time()
    try:
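        # Find matching records. A records are keyed by name, so scan the
        # whole zone and match on the record value (the private IP).
        paginator = route53.get_paginator('list_resource_record_sets')
        pages = paginator.paginate(HostedZoneId=config['HOSTED_ZONE_ID'])
        records = [r for page in pages for r in page['ResourceRecordSets']]
        # Delete the matching records
        for record in records:
            values = [rr['Value'] for rr in record.get('ResourceRecords', [])]
            if record['Type'] == 'A' and values == [private_ip]:
                record_sets = route53.change_resource_record_sets(
                    HostedZoneId=config['HOSTED_ZONE_ID'],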
                    ChangeBatch={
                        'Comment': 'delete %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'A',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': private_ip}]
                                }
                            }
                        ]
                    }
                )
                print('DEL A: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)
async def delete_ptr_record(private_ip):
    begin_time = time.time()
    ip_parts = private_ip.split('.')
    ip_parts.reverse()
    reversed_ip = '.'.join(ip_parts)
    try:
        # Look up the record by its fully qualified reverse name
        ptr_full_name = reversed_ip + '.in-addr.arpa.'
        response = route53.list_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            StartRecordName=ptr_full_name,
            StartRecordType='PTR'
        )
        # Delete the matching record
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'PTR' and record['Name'] == ptr_full_name:
                route53.change_resource_record_sets(
                    HostedZoneId=config['PTR_ZONE_ID'],
                    ChangeBatch={
                        'Comment': 'delete PTR %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'PTR',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': record['ResourceRecords'][0]['Value']}]
                                }
                            }
                        ]
                    }
                )
                print('DEL PTR: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)
def add_node(instance_id):
    instance_name, private_ip = get_instance_info(instance_id)
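    # Wrap the coroutines in tasks: asyncio.wait() rejects bare coroutines on Python 3.11+
    tasks = [
        loop.create_task(add_a_record(instance_name, private_ip)),
        loop.create_task(add_ptr_record(instance_name, private_ip))
    ]
    loop.run_until_complete(asyncio.wait(tasks))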
def del_node(instance_id):
    private_ip = get_instance_private_ip(instance_id)
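    # Wrap the coroutines in tasks: asyncio.wait() rejects bare coroutines on Python 3.11+
    tasks = [
        loop.create_task(delete_dns_record(private_ip)),
        loop.create_task(delete_ptr_record(private_ip))
    ]
    loop.run_until_complete(asyncio.wait(tasks))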
def lambda_handler(event, context):
    state = event['detail']['state']
    instance_id = event['detail']['instance-id']
    if 'running' == state:
        add_node(instance_id)
    if 'shutting-down' == state:
        del_node(instance_id)
    return 0
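The dispatch in this handler depends only on two fields of the state-change event. As a rough sketch, the decision can be isolated into a pure function and tried against sample events; the function name `action_for_state` and the instance ID are hypothetical, and the mapping mirrors the handler above (register on `running`, deregister on `shutting-down`).

```python
def action_for_state(event):
    """Map an EC2 state-change event to the DNS action the handler would take."""
    state = event["detail"]["state"]
    if state == "running":
        return "register"
    if state == "shutting-down":
        return "deregister"
    # pending, stopping, stopped and terminated are left alone here
    return "ignore"


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

print(action_for_state(sample_event))
```

Acting on `shutting-down` means the records are removed as soon as termination begins, while the instance can still be described.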
 
         
        
- In CloudWatch Events, create the rules
a. Create a Tag Change on Resource rule, set the target to Lambda, and select the function ec2_change_name
 
       
 
b. Create an Instance State-change Notification rule, set the target to Lambda, and select the function ec2_start_and_shutdown
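For readers who prefer to see the two rules as data, the event patterns could look roughly like the following. This is a sketch rather than the exact patterns the console generates: the `detail` filters are an assumption added here to narrow the events, and both Lambda functions still apply their own checks.

```python
import json

# Assumed pattern for the rule that targets ec2_change_name.
TAG_CHANGE_PATTERN = {
    "source": ["aws.tag"],
    "detail-type": ["Tag Change on Resource"],
    "detail": {
        "service": ["ec2"],
        "resource-type": ["instance"],
        "changed-tag-keys": ["Name"],
    },
}

# Assumed pattern for the rule that targets ec2_start_and_shutdown.
STATE_CHANGE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "shutting-down"]},
}

print(json.dumps(TAG_CHANGE_PATTERN, indent=2))
print(json.dumps(STATE_CHANGE_PATTERN, indent=2))
```

The printed JSON can be pasted into the rule's custom event pattern editor.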
 
       
 
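Testing

1. Verify forward resolution (domain name to IP) and reverse resolution (IP to domain name):

a. Use ssh to log in to an instance in a VPC associated with the hosted zones

b. For the forward lookup, run ping <custom domain>; the name should resolve to the expected IP

c. For the reverse lookup, run dig -x <ip address>; it should return the custom domain name

2. Verify that the events take effect:

a. Create an EC2 instance, set the tag Key=Name, Value=Demo, and check the hosted zone records

b. Terminate an instance and check the hosted zone records; the related records should be deleted

c. Rename the EC2 instance and check the hosted zone records; the related records should be updated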
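The forward and reverse checks in step 1 can also be scripted from an instance inside the VPC. The following is a minimal sketch using only the standard library; the function name `check_round_trip` is hypothetical, and the resolver functions are parameters so the logic can be tried without network access.

```python
import socket


def check_round_trip(hostname,
                     resolve=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then check that the IP maps back to hostname."""
    ip = resolve(hostname)
    # gethostbyaddr returns (hostname, aliases, addresses); take the primary name.
    reverse_name = reverse(ip)[0]
    same = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ip, reverse_name, same


# Usage from an instance in the VPC, with your own record name:
# print(check_round_trip("<custom domain>"))
```

A result whose last element is `False` would suggest that the A record and the PTR record disagree.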
 
Cost estimate

Lambda: 20,000 invocations per month at 5 seconds of run time each, an estimated $0.21 per month

Route 53: 2 hosted zones, $2 per month
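The Lambda figure can be reproduced with a short calculation. The sketch below assumes 128 MB of memory per function and on-demand prices of $0.0000166667 per GB-second and $0.20 per million requests, with no free tier applied; none of these values are stated above, so check them against current pricing for your Region.

```python
INVOCATIONS_PER_MONTH = 20_000
SECONDS_PER_INVOCATION = 5
MEMORY_GB = 128 / 1024               # assumption: 128 MB memory setting
PRICE_PER_GB_SECOND = 0.0000166667   # assumption: on-demand compute price
PRICE_PER_MILLION_REQUESTS = 0.20    # assumption: on-demand request price


def lambda_monthly_cost():
    """Estimate the monthly Lambda bill from compute time plus request count."""
    gb_seconds = INVOCATIONS_PER_MONTH * SECONDS_PER_INVOCATION * MEMORY_GB
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    request_cost = INVOCATIONS_PER_MONTH / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute_cost + request_cost


print("Estimated Lambda cost per month: $%.2f" % lambda_monthly_cost())
```

Under these assumptions, compute time dominates: 12,500 GB-seconds account for nearly all of the total, and the request charge is well under one cent.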
 
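Source code

SAM source code: GitHub – yourlin/ec2-name-register

Deploy the Lambda functions with sam build and sam deploy --guided. This creates and binds the CloudWatch Events rules automatically, so there is no need to create them by hand. The IAM permissions the Lambda functions need are attached automatically as well.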
 
References

Enabling reverse DNS for Route 53 with PTR records

Using and overriding reverse DNS rules with Route 53 Resolver

Creating records by using the Amazon Route 53 console

About the author