| Location of Jar | s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar |
| Source License | Apache License, Version 2.0 |
This section describes the following functions.
1.0 FORMAT_DT
Description
Takes a DateTimeFormat string and a DateTime (that is, a string produced by DATE_TIME()) and formats the DateTime into a string. The DateTimeFormat uses the Joda Time pattern syntax, documented in the Joda Time DateTimeFormat class reference.
Import
DEFINE FORMAT_DT org.apache.pig.piggybank.evaluation.datetime.FORMAT_DT();
Signature
FORMAT_DT(datetimeformat: chararray, date: DateTime(chararray))
returns chararray;
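As a minimal sketch of how FORMAT_DT combines with DATE_TIME, the script below renders epoch milliseconds as a readable timestamp. The input path `input/events.txt`, the relation names, and the fields `id` and `millis` are hypothetical, and the format pattern is only one possible Joda Time pattern.

```pig
-- Register the Piggybank jar and bind short names to the UDF classes.
REGISTER s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar;
DEFINE DATE_TIME org.apache.pig.piggybank.evaluation.datetime.DATE_TIME();
DEFINE FORMAT_DT org.apache.pig.piggybank.evaluation.datetime.FORMAT_DT();

-- Hypothetical input: an event id and a timestamp in epoch milliseconds.
events = LOAD 'input/events.txt' AS (id:chararray, millis:long);

-- Build a DateTime in UTC, then format it with a Joda Time pattern.
formatted = FOREACH events GENERATE
    id,
    FORMAT_DT('yyyy-MM-dd HH:mm:ss', DATE_TIME(millis, 'UTC'));
DUMP formatted;
```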
2.0 DATE_TIME
Description
Returns a DateTime string of the form yyyy-MM-dd'T'HH:mm:ss.SSSZZ.
A DateTime represents a precise point on the time line, stored as the number of milliseconds since the Java epoch of 1970-01-01T00:00:00Z. A DateTime also holds a Timezone, which is used to interpret its fields.
The constructor for this function has overloads that let you specify a default Timezone and a default DateTimeFormat. The default Timezone replaces the system default in any function that would use it. The default DateTimeFormat is tried when a string argument does not match the format that the overload expects.
A Timezone may be specified as:
- 'Z' or 'UTC' to represent the UTC time zone
- '[+-]hh:mm' to represent a numeric offset from UTC, such as '-07:00'
- A long-form time zone name supported by the system, such as 'America/Los_Angeles'
The DateTimeFormat uses the Joda Time pattern syntax, described in the Joda Time DateTimeFormat class reference.
Import
DEFINE DATE_TIME org.apache.pig.piggybank.evaluation.datetime.DATE_TIME();
DEFINE MY_DATE_TIME org.apache.pig.piggybank.evaluation.datetime.DATE_TIME(
'-07:00', 'MM-dd-yyyy-HH-mm-ss'
);
Signatures
DATE_TIME() returns DateTime;
Creates a DateTime for now with the default timezone.
DATE_TIME(timezone: chararray) returns DateTime;
Creates a DateTime for now with the given timezone.
DATE_TIME(datetime: chararray) returns DateTime;
Converts the given DateTime to the default timezone.
DATE_TIME(datetime: chararray, timezone: chararray) returns DateTime;
Converts the given DateTime to the given timezone.
DATE_TIME(instant: long) returns DateTime;
Creates a DateTime for the given number of milliseconds since 1970-01-01 with the default timezone.
DATE_TIME(instant: long, timezone: chararray) returns DateTime;
Creates a DateTime for the given number of milliseconds since 1970-01-01 with the given timezone.
DATE_TIME(str: chararray, datetimeformat: chararray) returns DateTime;
Parses str into a DateTime using datetimeformat. If str does not include a time zone, the default timezone is used.
DATE_TIME(str: chararray, datetimeformat: chararray, timezone: chararray)
returns DateTime;
Parses str into a DateTime using datetimeformat. The string is interpreted in the given timezone, and the resulting DateTime uses that timezone.
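The sketch below exercises two of these overloads side by side. It is illustrative only: the input path `input/timestamps.txt`, the relation names, and the fields `stamp` and `millis` are hypothetical, and `stamp` is assumed to hold text such as 01-31-2010-23-59-00.

```pig
REGISTER s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar;
DEFINE DATE_TIME org.apache.pig.piggybank.evaluation.datetime.DATE_TIME();

-- Hypothetical input: a text timestamp and an epoch value in milliseconds.
raw = LOAD 'input/timestamps.txt' AS (stamp:chararray, millis:long);

-- First column: the (str, datetimeformat, timezone) overload parses text
-- with an explicit Joda Time pattern and time zone.
-- Second column: the (instant, timezone) overload builds a DateTime
-- from epoch milliseconds.
parsed = FOREACH raw GENERATE
    DATE_TIME(stamp, 'MM-dd-yyyy-HH-mm-ss', 'America/Los_Angeles') AS from_text,
    DATE_TIME(millis, 'UTC') AS from_millis;
DUMP parsed;
```

The three-argument and (long, chararray) forms are used here because their argument types and counts make the chosen overload unambiguous; the two-argument chararray forms share the same argument types.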
3.0 REPLACE
Description
Replaces a string with another string inside a larger string. A null reference passed to this method is a no-op.
Import
DEFINE REPLACE org.apache.pig.piggybank.evaluation.string.REPLACE();
Signature
REPLACE(string: chararray, pattern: chararray, replacement: chararray)
returns chararray;
Note that the function performs literal string matching only; pattern is not a regular expression.
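A short sketch of the literal-matching behavior follows. The input path `input/urls.txt`, the relation names, and the field `url` are hypothetical.

```pig
REGISTER s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar;
DEFINE REPLACE org.apache.pig.piggybank.evaluation.string.REPLACE();

-- Hypothetical input: one URL per line.
urls = LOAD 'input/urls.txt' AS (url:chararray);

-- The '.' pattern is matched as a literal period,
-- not as the regular-expression "any character" wildcard.
cleaned = FOREACH urls GENERATE REPLACE(url, '.', '_');
DUMP cleaned;
```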
4.0 FORMAT
Description
Formats a list of arguments into a single string. See java.util.Formatter for the definition of format strings.
Import
DEFINE FORMAT org.apache.pig.piggybank.evaluation.string.FORMAT();
Signature
FORMAT(format: chararray, args: object...);
FORMAT(format: chararray, args: tuple);
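The following sketch shows the varargs form with a java.util.Formatter format string. The input path `input/sales.txt`, the relation names, and the fields `item`, `qty`, and `price` are hypothetical; the example assumes that Pig int, double, and chararray values reach the formatter as Java Integer, Double, and String objects.

```pig
REGISTER s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar;
DEFINE FORMAT org.apache.pig.piggybank.evaluation.string.FORMAT();

-- Hypothetical input: item name, quantity, and unit price.
sales = LOAD 'input/sales.txt' AS (item:chararray, qty:int, price:double);

-- %s takes the string, %d the integer, and %.2f the double
-- rounded to two decimal places.
labels = FOREACH sales GENERATE FORMAT('%s x%d @ %.2f', item, qty, price);
DUMP labels;
```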
5.0 EXTRACT
Description
Parses the input string with a regular expression and returns all matching groups. The regular expression syntax is documented in java.util.regex.Pattern.
You may find it useful to combine EXTRACT with FLATTEN like so:
grunt> numbers = FOREACH mylog GENERATE
FLATTEN(EXTRACT(line, '([0-9]+) [0-9]+ ([0-9]+)')) as (first:chararray, second:chararray);
grunt> dump numbers;
(22, 64)
Note the 'as (...)' clause must be included in this case due to a shortcoming in the Pig type system.
Import
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
Signature
EXTRACT(string: chararray, pattern: chararray)
returns tuple(chararray ...);
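For use in a script rather than at the grunt prompt, a sketch with the REGISTER and DEFINE steps included might look like the following. The input path `input/settings.txt`, the relation names, and the field names are hypothetical, and the input is assumed to hold lines of the form key=value.

```pig
REGISTER s3://elasticmapreduce/libs/pig/0.3/piggybank-0.3-amzn.jar;
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();

-- Hypothetical input: one key=value pair per line.
settings = LOAD 'input/settings.txt' AS (line:chararray);

-- Group 1 captures everything before the first '=',
-- group 2 captures the rest of the line.
pairs = FOREACH settings GENERATE
    FLATTEN(EXTRACT(line, '([^=]+)=(.*)')) as (key:chararray, value:chararray);
DUMP pairs;
```

The pattern deliberately covers the whole line, so it behaves the same whether the function requires a full match or finds the first match within the string.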