I created an Amazon Athena table using RegexSerDe. When I query the table, I get the following error: "Number of matching groups doesn't match the number of columns."

To resolve this error, be sure that the number of capturing groups in the regex pattern matches the number of fields that were defined when you created the table in Athena.

For example, here is a row of input data: - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523

The row has seven fields. This is the correct regex pattern: 

^([0-9.]+) ([\\w.-]) ([\\w.-]) \\[([A-Za-z0-9:/]+ [+-][0-9]{4})\\] \"(.+?)\" ([0-9]{3}) ([0-9]+)$

Note: RegexSerDe follows the Java standard. Because the backslash is an escape character in the Java String class, you must use a double backslash to define a single backslash. For example, to define \w, you must use \\w in your regex.

There are seven capturing groups in this regex pattern, and there are seven fields in the input data. When the table is queried, RegexSerDe doesn't throw the "Number of matching groups doesn't match the number of columns" exception.

To run a DDL statement, specify the regex capturing groups for SERDEPROPERTIES, as shown in the following example:

CREATE external table logs(col1 string, col2 string, col3 string, col4 string, col5 string, col6 string, col7 string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "^([0-9.]+) ([\\w.-]) ([\\w.-]) \\[([A-Za-z0-9:/]+ [+-][0-9]{4})\\] \"(.+?)\" ([0-9]{3}) ([0-9]+)$") LOCATION 's3://mybucket/path/'

Did this page help you? Yes | No

Back to the AWS Support Knowledge Center

Need help? Visit the AWS Support Center

Published: 2018-08-27