Amazon AWS Official Blog

Blog: Building Efficient, Cost-Effective EMR Clusters with ODCR and Prioritized Allocation Strategy (Part 2)

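In the previous blog post, we described how to use ODCR (On-Demand Capacity Reservations) to reserve capacity ahead of peak sales seasons such as Black Friday, and how to consume that capacity in EMR. In some scenarios, however, ODCR alone cannot fully meet our needs, or the required configuration becomes very complex. For example:

  • Scenario 1: clusters A and B are both configured with r7g.4xlarge and r7g.8xlarge, but we want cluster A to use mostly r7g.4xlarge and cluster B to use mostly r7g.8xlarge;
  • Scenario 2: a single cluster is configured with both r7g and r6g, but we want it to prefer r7g, which offers better price-performance.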

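Targeted ODCR controls resource usage by pointing at specific capacity reservations, but it is all-or-nothing: a reservation is either targeted or not, and there is no way to express a priority. In addition, On-Demand instances in an EMR Instance Fleet previously supported only one allocation strategy, lowest-price, so in Scenario 2 the fleet would always pick r6g, which has the lower unit price, rather than r7g, which has the better price-performance. Fortunately, in 2024 EMR released a new feature that lets you assign priorities to instance types. The On-Demand allocation strategy is called prioritized, and the Spot one is called capacity-optimized-prioritized. This post focuses on the use cases and concrete usage of the new prioritized feature.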
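The difference between the two On-Demand strategies in Scenario 2 can be illustrated with a small sketch. This is a simplified model, not EMR's actual implementation: the class and method names are invented for illustration, and the hourly prices are placeholder numbers chosen only so that r6g is cheaper than r7g.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified model of two On-Demand allocation strategies (illustrative only). */
final class AllocationStrategySketch {

    /** A candidate instance type in a fleet. */
    static final class Candidate {
        final String instanceType;
        final double hourlyPrice; // placeholder value, not a real AWS price
        final double priority;    // smaller value = higher priority, 0 is highest

        Candidate(String instanceType, double hourlyPrice, double priority) {
            this.instanceType = instanceType;
            this.hourlyPrice = hourlyPrice;
            this.priority = priority;
        }
    }

    /** lowest-price: choose the cheapest type and ignore priority. */
    static String lowestPrice(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble((Candidate c) -> c.hourlyPrice))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"))
                .instanceType;
    }

    /** prioritized: choose the type with the smallest priority value and ignore price. */
    static String prioritized(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble((Candidate c) -> c.priority))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"))
                .instanceType;
    }

    /** The Scenario 2 fleet: r6g is cheaper, r7g is given the higher priority. */
    static List<Candidate> scenarioTwo() {
        return Arrays.asList(
                new Candidate("r6g.4xlarge", 0.80, 1.0),
                new Candidate("r7g.4xlarge", 0.85, 0.0));
    }

    public static void main(String[] args) {
        System.out.println("lowest-price picks: " + lowestPrice(scenarioTwo()));
        System.out.println("prioritized picks:  " + prioritized(scenarioTwo()));
    }
}
```

With the same fleet definition, only the strategy changes which instance type is chosen, which is why Scenario 2 cannot be solved under lowest-price alone.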

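Using the two scenarios above, here is how instance priority applies:

  • Scenario 1: in cluster A, give r7g.4xlarge the higher priority (the smaller priority value), and cluster A will prefer r7g.4xlarge. Correspondingly, in cluster B, give r7g.8xlarge the higher priority;
  • Scenario 2: the default lowest-price strategy prefers r6g because of its lower unit price, so switch to the prioritized allocation strategy and give r7g the higher priority.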

As a reminder, a smaller priority value means a higher priority, and 0 is the highest priority. Also, allocation strategies apply only to Instance Fleets; Instance Groups have no equivalent setting.
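The "smaller value wins" rule is easy to get backwards, so here is a minimal sketch of Scenario 1 under that rule. The class, the helper methods, and the specific priority values 0.0 and 1.0 are assumptions made for illustration; the sketch only models the ordering, not how EMR provisions instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Shows that a smaller Priority value is preferred (illustrative model only). */
final class FleetPrioritySketch {

    /** Returns instance types ordered from most preferred to least preferred. */
    static List<String> preferenceOrder(Map<String, Double> priorityByType) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(priorityByType.entrySet());
        // Ascending sort, so 0 comes first. List.sort is stable, so types with
        // equal priority keep the iteration order of the input map.
        entries.sort(Map.Entry.comparingByValue());
        List<String> order = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            order.add(entry.getKey());
        }
        return order;
    }

    /** Cluster A: r7g.4xlarge gets the smaller value, so it is preferred. */
    static Map<String, Double> clusterA() {
        Map<String, Double> priorities = new LinkedHashMap<>();
        priorities.put("r7g.4xlarge", 0.0);
        priorities.put("r7g.8xlarge", 1.0);
        return priorities;
    }

    /** Cluster B: same instance types, opposite priorities. */
    static Map<String, Double> clusterB() {
        Map<String, Double> priorities = new LinkedHashMap<>();
        priorities.put("r7g.4xlarge", 1.0);
        priorities.put("r7g.8xlarge", 0.0);
        return priorities;
    }

    public static void main(String[] args) {
        System.out.println("Cluster A prefers: " + preferenceOrder(clusterA()));
        System.out.println("Cluster B prefers: " + preferenceOrder(clusterB()));
    }
}
```

Both clusters list the same two instance types; only the priority values differ, which is what lets each cluster lean toward a different size.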

The following example shows how to use the prioritized allocation strategy.

In the EMR console, choose Instance Fleet when creating the cluster and select "Apply allocation strategy". Under Allocation strategy, choose Prioritized for On-Demand and Capacity optimized prioritized for Spot.

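Then assign a priority to each instance type. In this example's console configuration, r7a.48xlarge has the highest priority, r7g.16xlarge and r7g.12xlarge are tied for second, and r6g.16xlarge has the lowest.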

With the AWS CLI, specify AllocationStrategy in LaunchSpecifications and also assign a Priority value to each instance type.

aws emr create-cluster \
 --name "priority" \
 --log-uri "s3://aws-logs-0123456789-us-west-2/elasticmapreduce" \
 --release-label "emr-6.10.1" \
 --service-role "arn:aws:iam::0123456789:role/EMR_DefaultRole" \
 --unhealthy-node-replacement \
 --ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","EmrManagedMasterSecurityGroup":"sg-0123456789","EmrManagedSlaveSecurityGroup":"sg-0123456789","KeyName":"KEY-NAME","AdditionalMasterSecurityGroups":[],"AdditionalSlaveSecurityGroups":[],"ServiceAccessSecurityGroup":"sg-0123456789","SubnetId":"subnet-0123456789"}' \
 --applications Name=Hive \
 --instance-fleets '[{"Name":"Core","InstanceFleetType":"CORE","TargetSpotCapacity":0,"TargetOnDemandCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":60,"TimeoutAction":"TERMINATE_CLUSTER","AllocationStrategy":"CAPACITY_OPTIMIZED_PRIORITIZED"},"OnDemandSpecification":{"AllocationStrategy":"PRIORITIZED"}},"InstanceTypeConfigs":[{"WeightedCapacity":4,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m6g.xlarge","Priority":0}]},{"Name":"Primary","InstanceFleetType":"MASTER","TargetSpotCapacity":0,"TargetOnDemandCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":60,"TimeoutAction":"TERMINATE_CLUSTER","AllocationStrategy":"CAPACITY_OPTIMIZED_PRIORITIZED"},"OnDemandSpecification":{"AllocationStrategy":"PRIORITIZED"}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m6g.2xlarge","Priority":0}]},{"Name":"Task - 1","InstanceFleetType":"TASK","TargetSpotCapacity":24,"TargetOnDemandCapacity":24,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":5,"TimeoutAction":"TERMINATE_CLUSTER","AllocationStrategy":"CAPACITY_OPTIMIZED_PRIORITIZED"},"OnDemandSpecification":{"AllocationStrategy":"PRIORITIZED"}},"InstanceTypeConfigs":[{"WeightedCapacity":64,"EbsConfiguration":{"EbsOptimized":true,"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"r7g.16xlarge","Priority":2},{"WeightedCapacity":192,"EbsConfiguration":{"EbsOptimized":true,"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"r7a.48xlarge","Priority":1},{"WeightedCapacity":64,"EbsConfiguration":{"EbsOptimized":true,"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":32}}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"r6g.16xlarge","Priority":10},{"WeightedCapacity":48,"EbsConfiguration":{"EbsOptimized":true,"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}},{"VolumeSpecification":{"VolumeType":"gp2","SizeInGB":64}}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"r7g.12xlarge","Priority":2}]}]' \
 --scale-down-behavior "TERMINATE_AT_TASK_COMPLETION" \
 --auto-termination-policy '{"IdleTimeout":3600}' \
 --region "us-west-2"

Within each fleet definition, the allocation strategies are set in the LaunchSpecifications block. Formatted for readability, the block used by the Core and Primary fleets is:

"LaunchSpecifications": {
  "SpotSpecification": {
    "TimeoutDurationMinutes": 60,
    "TimeoutAction": "TERMINATE_CLUSTER",
    "AllocationStrategy": "CAPACITY_OPTIMIZED_PRIORITIZED"
  },
  "OnDemandSpecification": {
    "AllocationStrategy": "PRIORITIZED"
  }
}

With the SDK, you need to specify the configuration for both provisioning and resizing:

// Launch-time (provisioning) settings for On-Demand capacity.
// onDemandCapacityReservationOptions is assumed to be defined earlier (the ODCR settings).
OnDemandProvisioningSpecification onDemandProvisioningSpecification = OnDemandProvisioningSpecification.builder()
        .allocationStrategy(OnDemandProvisioningAllocationStrategy.PRIORITIZED)
        .capacityReservationOptions(onDemandCapacityReservationOptions)
        .build();

// Launch-time (provisioning) settings for Spot capacity.
SpotProvisioningSpecification spotProvisioningSpecification = SpotProvisioningSpecification.builder()
        .allocationStrategy(SpotProvisioningAllocationStrategy.CAPACITY_OPTIMIZED_PRIORITIZED)
        .timeoutDurationMinutes(5)
        .timeoutAction(SpotProvisioningTimeoutAction.TERMINATE_CLUSTER)
        .build();

// Resize-time settings for On-Demand capacity; the same strategy enum is reused.
OnDemandResizingSpecification onDemandResizingSpecification = OnDemandResizingSpecification.builder()
        .allocationStrategy(OnDemandProvisioningAllocationStrategy.PRIORITIZED)
        .capacityReservationOptions(onDemandCapacityReservationOptions)
        .timeoutDurationMinutes(5)
        .build();

// Resize-time settings for Spot capacity.
SpotResizingSpecification spotResizingSpecification = SpotResizingSpecification.builder()
        .allocationStrategy(SpotProvisioningAllocationStrategy.CAPACITY_OPTIMIZED_PRIORITIZED)
        .timeoutDurationMinutes(5)
        .build();

Each instance type then needs a priority value, specified as a Double.

InstanceTypeConfig r6g16xLarge = InstanceTypeConfig.builder()
        .bidPriceAsPercentageOfOnDemandPrice(100.0)
        .instanceType("r6g.16xlarge")
        .ebsConfiguration(
                EbsConfiguration.builder()
                        .ebsBlockDeviceConfigs(EbsBlockDeviceConfig.builder()
                                .volumeSpecification(
                                        VolumeSpecification.builder()
                                                .sizeInGB(1400)
                                                .iops(3000)
                                                .throughput(250)
                                                .volumeType("gp3")
                                                .build()
                                )
                                .build())
                        .build())
        .weightedCapacity(64)
        .priority(10.0) // Double; a smaller value means a higher priority
        .build();

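Note that Spot uses capacity-optimized-prioritized, which considers capacity first and then honors instance priority on a best-effort basis. Taking the console configuration above as an example, when the cluster actually launches, the task fleet may end up with one r7g.16xlarge Spot instance and one r7a.48xlarge On-Demand instance. Because r7a.48xlarge Spot capacity is tight, Spot does not launch the highest-priority r7a.48xlarge and instead falls back to the next level, r7g.16xlarge.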
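The outcome described above can be reproduced with a deliberately simplified model of the task fleet from the CLI example. Several things here are assumptions rather than EMR behavior: the "constrained" set is a stand-in for whatever capacity signal Spot actually uses, ties are broken by list order, and the instance count assumes the target is filled by a single instance type and that launching continues until the target is met or exceeded. All class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Simplified model of the example task fleet (illustrative only). */
final class TaskFleetSketch {

    static final class TypeConfig {
        final String instanceType;
        final int weightedCapacity;
        final double priority; // smaller value = higher priority

        TypeConfig(String instanceType, int weightedCapacity, double priority) {
            this.instanceType = instanceType;
            this.weightedCapacity = weightedCapacity;
            this.priority = priority;
        }
    }

    /** The task fleet's instance types, in the order they appear in the CLI example. */
    static List<TypeConfig> taskFleet() {
        return Arrays.asList(
                new TypeConfig("r7g.16xlarge", 64, 2.0),
                new TypeConfig("r7a.48xlarge", 192, 1.0),
                new TypeConfig("r6g.16xlarge", 64, 10.0),
                new TypeConfig("r7g.12xlarge", 48, 2.0));
    }

    /**
     * Capacity first, then priority: skip types assumed to be capacity-constrained,
     * then take the smallest priority value. Ties go to the type listed first.
     */
    static TypeConfig pick(List<TypeConfig> fleet, Set<String> constrained) {
        TypeConfig best = null;
        for (TypeConfig candidate : fleet) {
            if (constrained.contains(candidate.instanceType)) {
                continue;
            }
            if (best == null || candidate.priority < best.priority) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no instance type has capacity");
        }
        return best;
    }

    /** Instances needed to reach or exceed the target using a single instance type. */
    static int instancesNeeded(int targetCapacity, int weightedCapacity) {
        return (targetCapacity + weightedCapacity - 1) / weightedCapacity; // ceiling division
    }

    public static void main(String[] args) {
        int target = 24; // TargetOnDemandCapacity and TargetSpotCapacity in the example

        TypeConfig onDemand = pick(taskFleet(), Collections.<String>emptySet());
        System.out.println("On-Demand: " + instancesNeeded(target, onDemand.weightedCapacity)
                + " x " + onDemand.instanceType);

        // Assume r7a.48xlarge Spot capacity is tight at launch time.
        TypeConfig spot = pick(taskFleet(), Collections.singleton("r7a.48xlarge"));
        System.out.println("Spot:      " + instancesNeeded(target, spot.weightedCapacity)
                + " x " + spot.instanceType);
    }
}
```

Because every weighted capacity in this fleet (48, 64, or 192) already exceeds the target of 24 units, a single instance satisfies each target, which matches the one Spot and one On-Demand instance in the example.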

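In summary, EMR's new prioritized and capacity-optimized-prioritized allocation strategies make instance type selection more customizable. With them, we can control the mix of instance types in an Instance Fleet according to our actual needs and the capacity currently available. Allocation strategies can be used on their own or together with ODCR, securing capacity while keeping flexibility in instance type selection. For enterprise users who need to balance resource availability against cost efficiency and who have clear preferences for specific instance types, especially customers running large-scale data processing workloads who care about precise resource allocation, the prioritized and capacity-optimized-prioritized allocation strategies are likely to be the best choice.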

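About the authors

Fang Hao (方浩)

Senior Solutions Architect at Amazon Web Services. More than 20 years of experience in the IT industry and 10 years in big data architecture; a serial entrepreneur. Dedicated to learning, researching, and promoting modern data architectures.

Ren Tiantian (任田田)

Solutions Architect at Amazon Web Services, responsible for consulting on and designing cloud computing architectures based on Amazon Web Services, and for promoting its cloud technologies and solutions.

Lu Yi (陆毅)

Senior Solutions Architect at Amazon Web Services, with 20 years of experience in traditional IT and the public cloud industry. Specializes in architecture design and operations of cloud infrastructure.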