AWS Developer Tools Blog
Leveraging the s3 and s3api Commands
Have you ever run aws help on the command line or browsed the AWS CLI Reference Documentation and noticed that there are two sets of Amazon S3 commands to choose from: s3 and s3api? If you are completely unfamiliar with either the s3 or s3api commands, you can read about them in the AWS CLI User Guide. In this post, I am going to go into detail about the two sets of commands and provide a few examples of how to leverage them to your advantage.
s3api
Most of the commands in the AWS CLI are generated from JSON models, which directly model the APIs of the various AWS services. This allows the CLI to generate commands that are a near one-to-one mapping of the service's API. The s3api commands fall into this category. They are entirely driven by these JSON models and closely mirror the API of S3, hence the name s3api. Each command operation, e.g. s3api list-objects or s3api create-bucket, shares a similar operation name, a similar input, and a similar output with the corresponding operation in S3's API. As a result, the s3api commands give you a very granular level of control over the requests you make to S3 using the CLI.
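For instance, here is a minimal sketch of exercising that granular control with s3api list-objects; the bucket name and prefix are placeholders for your own values:

$ # List at most 10 keys under a given prefix, mirroring the ListObjects API parameters
$ aws s3api list-objects --bucket mybucket --prefix photos/ --max-keys 10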
s3
The s3 commands are a custom set of commands specifically designed to make it even easier for you to manage your S3 files using the CLI. The main difference between the s3 and s3api commands is that the s3 commands are not solely driven by the JSON models. Rather, the s3 commands are built on top of the operations found in the s3api commands. As a result, these commands allow for higher-level features that are not provided by the s3api commands. This includes, but is not limited to, the ability to synchronize local directories and S3 buckets, transfer multiple files in parallel, stream files, and automatically handle multipart transfers. In short, these commands further simplify and speed up transferring files to, within, and from S3.
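As a quick sketch of those higher-level features (the paths and bucket name here are placeholders), you can synchronize a directory or upload a large file without worrying about multipart details:

$ # Synchronize a local directory to a bucket prefix
$ aws s3 sync ./local-dir s3://mybucket/backup/
$ # Upload a large file; multipart transfers are handled automatically
$ aws s3 cp ./bigfile.zip s3://mybucket/bigfile.zip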
s3 and s3api Examples
Both sets of S3 commands have a lot to offer. With this wide array of commands to choose from, it is important to be able to identify which commands you need for your specific use case. For example, if you want to upload a set of files from your local machine to your S3 bucket, you would probably want to use the s3 commands via the cp or sync command operations. On the other hand, if you wanted to set a bucket policy, you would use the s3api commands via the put-bucket-policy command operation.
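For example, setting a bucket policy with put-bucket-policy might look like the following sketch, where mybucket and policy.json are placeholders for your own bucket and policy document:

$ aws s3api put-bucket-policy --bucket mybucket --policy file://policy.json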
However, your choice of S3 commands should not be limited to strictly deciding whether you need the s3 commands or the s3api commands. Sometimes you can use both sets of commands in conjunction to satisfy your use case. Often this proves to be even more powerful, as you are able to leverage the low-level granular control of the s3api commands with the higher-level simplicity and speed of the s3 commands. Here are a few examples of how you can work with both sets of S3 commands for your specific use case.
Bucket Regions
When you create an S3 bucket, the bucket is created in a specific region. Knowing the region that your bucket is in is essential for a variety of use cases, such as transferring files across buckets located in different regions and making requests that require Signature Version 4 signing. However, you may not know or remember where your bucket is located. Fortunately, by using the s3api commands, you can determine your bucket's region.
For example, if I make a bucket located in the Frankfurt region using the s3 commands:
$ aws s3 mb s3://myeucentral1bucket --region eu-central-1
make_bucket: s3://myeucentral1bucket/
I can then use s3api get-bucket-location to determine the region of my newly created bucket:
$ aws s3api get-bucket-location --bucket myeucentral1bucket
{
    "LocationConstraint": "eu-central-1"
}
As shown above, the value of the LocationConstraint member in the output JSON is the expected region of the bucket, eu-central-1. For a quick reference to how location constraints correspond to regions, refer to the AWS Regions and Endpoints Guide. Note that for buckets created in the US Standard region, us-east-1, the value of LocationConstraint will be null.
Once you have learned the region of your bucket, you can pass the region in using the --region parameter, set it in your config file or in a profile, or set it using the AWS_DEFAULT_REGION environment variable. You can read more about how to set a region in the AWS CLI User Guide. This allows you to select the correct region when you are making subsequent requests to your bucket via the s3 and s3api commands.
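For example, either of the following (using the bucket created above) ensures that subsequent requests are sent to the correct region:

$ aws s3 ls s3://myeucentral1bucket --region eu-central-1
$ export AWS_DEFAULT_REGION=eu-central-1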
Deleting a Set of Buckets
For this example, suppose that I have a lot of buckets that I was using for testing and that are no longer needed. But I also have other buckets that need to stick around:
$ aws s3 ls
2014-12-02 13:36:17 awsclitest-123
2014-12-02 13:36:24 awsclitest-234
2014-12-02 13:36:51 awsclitest-345
2014-11-21 16:47:14 mybucketfoo
The buckets beginning with awsclitest- are test buckets that I want to get rid of. An obvious way would be to just delete each bucket using aws s3 rb one at a time, as shown below. This becomes tedious, though, if I have a lot of these test buckets or if the test bucket names are longer and more complicated. I am going to go step by step through how you can build a single command that deletes all of the buckets that begin with awsclitest-.
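For reference, the one-at-a-time approach mentioned above would look like this sketch, using one of the bucket names listed earlier; --force also deletes any objects inside the bucket:

$ aws s3 rb s3://awsclitest-123 --force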
Instead of using the s3 ls command to list my buckets, I am going to use the s3api list-buckets command to list them:
$ aws s3api list-buckets
{
    "Owner": {
        "DisplayName": "mydisplayname",
        "ID": "myid"
    },
    "Buckets": [
        {
            "CreationDate": "2014-12-02T21:36:17.000Z",
            "Name": "awsclitest-123"
        },
        {
            "CreationDate": "2014-12-02T21:36:24.000Z",
            "Name": "awsclitest-234"
        },
        {
            "CreationDate": "2014-12-02T21:36:51.000Z",
            "Name": "awsclitest-345"
        },
        {
            "CreationDate": "2014-11-22T00:47:14.000Z",
            "Name": "mybucketfoo"
        }
    ]
}
At first glance, it does not make much sense to use s3api list-buckets over s3 ls, because all of the bucket names are embedded in the JSON output of the command. However, we can take advantage of the command's --query parameter to perform JMESPath queries for specific members and values in the JSON output:
$ aws s3api list-buckets --query 'Buckets[?starts_with(Name, `awsclitest-`) == `true`].Name'
[
    "awsclitest-123",
    "awsclitest-234",
    "awsclitest-345"
]
If you are unfamiliar with the --query parameter, you can read about it in the AWS CLI User Guide. For this specific query, I am asking for the names of all of the buckets that begin with awsclitest-. However, the output is still a little difficult to parse if we hope to use it as input to the s3 rb command. To make the names easier to parse out, we can modify our query slightly and specify text for the --output parameter:
$ aws s3api list-buckets --query 'Buckets[?starts_with(Name, `awsclitest-`) == `true`].[Name]' --output text
awsclitest-123
awsclitest-234
awsclitest-345
With this output, we can now use it as input to perform a forced bucket delete on all of the buckets whose names start with awsclitest-:
$ aws s3api list-buckets --query 'Buckets[?starts_with(Name, `awsclitest-`) == `true`].[Name]' --output text | xargs -I {} aws s3 rb s3://{} --force
delete: s3://awsclitest-123/test
remove_bucket: s3://awsclitest-123/
delete: s3://awsclitest-234/test
remove_bucket: s3://awsclitest-234/
delete: s3://awsclitest-345/test
remove_bucket: s3://awsclitest-345/
As shown in the output, all of the desired buckets, along with any files inside of them, were deleted. To make sure that it worked, I can then list out all of my buckets:
$ aws s3 ls
2014-11-21 16:47:14 mybucketfoo
Aggregating S3 Server Access Logs
In this final example, I will show you how you can use the s3 and s3api commands together in order to aggregate your S3 server access logs. These logs are used to track requests for access to your S3 bucket. If you are unfamiliar with server access logs, you can read about them in the Amazon S3 Developer Guide.
Server access logs follow the naming convention TargetPrefixYYYY-mm-DD-HH-MM-SS-UniqueString, where YYYY, mm, DD, HH, MM, and SS are the digits of the year, month, day, hour, minute, and second, respectively, of when the log file was delivered. However, the number of log files delivered for a specific period of time, and the number of log records inside a specific log file, are somewhat unpredictable. As a result, it is convenient to aggregate all of the logs for a specific period of time into one file in an S3 bucket.
For this example, I am going to aggregate all of the logs that were delivered on October 31, 2014 from 11 a.m. to 12 p.m. to the file 2014-10-31-11.log in my bucket. To begin, I will use s3api list-objects to list all of the objects in my bucket beginning with logs/2014-10-31-11:
$ aws s3api list-objects --bucket myclilogs --output text --prefix logs/2014-10-31-11 --query Contents[].[Key]
logs/2014-10-31-11-19-03-D7E3D44429C236C9
logs/2014-10-31-11-19-05-9FCEDD1393C9319F
logs/2014-10-31-11-19-26-01DE8498F22E8EB6
logs/2014-10-31-11-20-03-1B26CD31AE5BFEEF
logs/2014-10-31-11-21-34-757D6904963C22A6
logs/2014-10-31-11-21-35-27B909408B88017B
logs/2014-10-31-11-21-50-1967E793B8865384
....... Continuing to the end ...........
logs/2014-10-31-11-42-44-F8AD38626A24E288
logs/2014-10-31-11-43-47-160D794F4D713F24
Using both the --query and --output parameters, I was able to list the logs in a format that can easily be used as input to the s3 commands. Now that I have identified all of the logs that I want to aggregate, I am going to take advantage of the s3 cp command's streaming capability to actually aggregate the logs.
When using s3 cp to stream, you have two options: upload a stream from standard input to an S3 object, or download an S3 object as a stream to standard output. You do so by specifying - as the first path parameter to the cp command if you want to upload a stream, or by specifying - as the second path parameter if you want to download an object as a stream.
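Here is a minimal sketch of both directions, with hypothetical bucket and key names:

$ # Download an object as a stream to standard output
$ aws s3 cp s3://mybucket/key.txt -
$ # Upload a stream from standard input to an object
$ echo "hello" | aws s3 cp - s3://mybucket/key.txt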
For my use case, I am going to stream in both directions:
$ aws s3api list-objects --bucket myclilogs --output text --prefix logs/2014-10-31-11 --query Contents[].[Key] | xargs -I {} aws s3 cp s3://myclilogs/{} - | aws s3 cp - s3://myclilogs/aggregatedlogs/2014-10-31-11.log
The workflow for this command is as follows. First, I stream each desired log one by one to standard output. Then I pipe the stream from standard output to standard input and upload the stream to the desired location in my bucket.
If you want to speed up this process, you can use the GNU parallel shell tool to make the s3 cp commands that download each log as a stream run in parallel with each other:
$ aws s3api list-objects --bucket myclilogs --output text --prefix logs/2014-10-31-11 --query Contents[].[Key] | parallel -j5 aws s3 cp s3://myclilogs/{} - | aws s3 cp - s3://myclilogs/aggregatedlogs/2014-10-31-11.log
By specifying the -j5 parameter in the command above, I am assigning each s3 cp streaming download command to one of five jobs that run in parallel. Also, note that the GNU parallel shell tool may not be installed on your machine by default; it can be installed with tools such as brew or apt-get.
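For example, on OS X with Homebrew or on a Debian-based Linux distribution, installing it might look like one of the following:

$ brew install parallel
$ sudo apt-get install parallel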
Once the command finishes, I can then verify that my aggregated log exists:
$ aws s3 ls s3://myclilogs/aggregatedlogs/
2014-12-03 10:43:49 269956 2014-10-31-11.log
Conclusion
I hope that the description and examples that I provided will help you further leverage both the s3 and s3api commands to your advantage. However, do not limit yourself to just the examples I provided. Go ahead and try to figure out other ways to utilize the s3 and s3api commands together today!
You can follow us on Twitter @AWSCLI and let us know what you'd like to read about next! If you have any questions about the CLI or any feature requests, do not hesitate to get in touch with us via our GitHub repository.
Stay tuned for our next blog post!