AWS Developer Tools Blog

Introducing support for Amazon S3 Select in the AWS SDK for PHP

We’re excited to announce support for the Amazon Simple Storage Service (Amazon S3) SelectObjectContent API with event streams in the AWS SDK for PHP in version 3.67.11. Using Amazon S3 Select, you can query for a subset of data from an S3 object by using simple SQL expressions.

Amazon S3 streams the responses as a series of events, instead of returning the full API response all at once. This enables your applications to process the parts of the response as the application receives them. To support this new API behavior, the AWS SDK for PHP supports processing these events from the API response.

Using Amazon S3 Select to query an object

Amazon S3 Select enables you to query an object that contains CSV-formatted or JSON-formatted data with simple SQL expressions.

For our example, let’s use a CSV file named target-file.csv as the key, which is uploaded to an S3 object in the bucket named my-bucket in the us-west-2 AWS Region.

The CSV file contains a common delimited list of user names and ages.

user_name,age
phprocks,13
elePHPant,22
querytime,29
...

With this CSV file, we want our application to select only users with an age greater than 20. To do this, we write an SQL expression like the following to select the user_name field for users with an age greater than 20.

SELECT user_name FROM S3Object WHERE cast(age as int) > 20

Using Amazon S3 Select to select records

We can now use the AWS SDK for PHP with the Amazon S3 SelectObjectContent API to select records from JSON and CSV files stored in Amazon S3.

First, we want our application to create a new Amazon S3 client for the AWS Region that our my-bucket is in. We’ll use this client to make the SelectObjectContent API calls.

	
use Aws\S3\S3Client;

$client = new S3Client([
    'region' => 'us-west-2',
    'version' => 'latest',
]);
	

By following the AWS SDK for PHP API documentation for SelectObjectContent, our API request parameters could look like the following for the SQL expression we want to use.

	
$result = $client->selectObjectContent([
    'Bucket' => 'my-bucket', 
    'Key' => 'target-file.csv', 
    'ExpressionType' => 'SQL',
    'Expression' => 'SELECT user_name FROM S3Object WHERE cast(age AS int) > 20',
    'InputSerialization' => [
        'CSV' => [
            'FileHeaderInfo' => 'USE', 
            'RecordDelimiter' => "\n", 
            'FieldDelimiter' => '.',
        ],
    ], 
    'OutputSerialization' => [
        'CSV' => [],
    ],
]);
	

Now we have everything ready to make the API call. How we handle processing the events depends on the environment the SDK is running in.

Using an EventParsingIterator

In the AWS SDK for PHP, Payload, the event stream member of the response, will be wrapped in an EventParsingIterator so that we can iterate over all of the events supplied on the stream from Amazon S3.

	
foreach ($result['Payload'] as $event) {
    if (isset($event['Records'])) {
        echo (string) $event['Records']['Payload'] . PHP_EOL;
    } elseif (isset($event['Stats'])) {
        echo 'Processed ' . $event['Stats']['Details']['BytesProcessed'] . ' bytes' . PHP_EOL;
    } elseif (isset($event['End'])) {
        echo 'Complete.' . PHP_EOL;
    }
}
	

Final thoughts

With Amazon S3 Select, you can create SQL expressions to select a subset of CSV or JSON records from files stored in Amazon S3. The AWS SDK for PHP provides the tools to use this API to process events from the API.