Data Reader for Avro events, where there may be multiple records for a given key.
Data Reader for event type CSV data, where there may be multiple records for a given key.
Data Reader for CSV events, where there may be multiple records for a given key.
Data Reader for CSV events, where there may be multiple records for a given key. Each CSV record is automatically converted to a type T that defines an Encoder.
Data Reader for event type CSV data, where there may be multiple records for a given key.
Data Reader for event type CSV data, where there may be multiple records for a given key. Each CSV record is automatically converted to an Avro record using the provided schema.
Custom aggregate data reader
DataReader to use for event type data, with multiple records per key
Aggregate data reader params
An additional timeStamp function for extracting the timestamp of the event
A cut off time to be used for aggregating features extracted from the events
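The timestamp function and cut-off time described above can be pictured with a minimal sketch in plain Scala collections. The `Event` type, the `aggregateBefore` helper, and the choice to keep only events strictly before the cut-off are assumptions made for illustration; they are not the library's own types or its exact boundary rule.

```scala
object AggregateSketch {
  // Hypothetical event type, used only for this illustration.
  final case class Event(key: String, timestampMs: Long, amount: Double)

  /** Sums `amount` per key over the events whose timestamp is strictly
    * before `cutOffMs`. The strict boundary is an assumption.
    */
  def aggregateBefore(
      events: Seq[Event],
      timeStampFn: Event => Long,
      cutOffMs: Long
  ): Map[String, Double] =
    events
      .filter(e => timeStampFn(e) < cutOffMs)   // drop events at or after the cut-off
      .groupBy(_.key)                           // many records per key
      .map { case (k, es) => k -> es.map(_.amount).sum } // one value per key
}
```

Calling `AggregateSketch.aggregateBefore(events, _.timestampMs, cutOff)` collapses the many records for each key into a single aggregated value, which is the behavior the aggregate readers are described as providing.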
Data Reader for Parquet events, where there may be multiple records for a given key.
Data Reader for Parquet events, where there may be multiple records for a given key. Each Parquet record is automatically converted to a type T that defines an Encoder.
Readers that extend this can be used as right-hand-side arguments for joins, so they should aggregate on the key and return only a single value per key.
Data Reader for Avro data.
Data Reader for CSV data that automatically infers the schema from the CSV data and converts to T <: GenericRecord.
Data Reader for CSV data that automatically infers the schema from the CSV data and converts to T <: GenericRecord. The schema is inferred from the provided headers parameter if it is given; otherwise the first row is assumed to be the header line.
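The header fallback described above can be sketched as follows. This covers only the choice of header names, not the inference of column types, and the `splitHeaders` helper and its signature are hypothetical.

```scala
object HeaderSketch {
  /** Returns (headers, dataRows).
    * Uses the supplied `headers` when non-empty; otherwise treats the
    * first row as the header line and removes it from the data.
    */
  def splitHeaders(
      rows: Seq[Seq[String]],
      headers: Seq[String] = Seq.empty
  ): (Seq[String], Seq[Seq[String]]) =
    if (headers.nonEmpty) (headers, rows)
    else rows match {
      case first +: rest => (first, rest)       // first row is the header line
      case _             => (Seq.empty, Seq.empty) // no rows at all
    }
}
```

Note that when headers are supplied, every row is treated as data, so a file that also contains a header line would need that line removed separately.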
CSV reader for any type that defines an Encoder.
CSV reader for any type that defines an Encoder. Scala case classes and tuples/products included automatically.
Data Reader for CSV data.
Data Reader for CSV data. Each CSV record will be automatically converted to an Avro record using the provided schema.
Data Reader for Avro events, when computing conditional probabilities.
Data Reader for event type CSV data (with schema inference), when computing conditional probabilities.
Data Reader for event type CSV data (with schema inference), when computing conditional probabilities. There may be multiple records for a given key. Each CSV record is automatically converted to an Avro record with an inferred schema.
Data Reader for CSV events, when computing conditional probabilities.
Data Reader for CSV events, when computing conditional probabilities. There may be multiple records for a given key. Each CSV record is automatically converted to a type T that defines an Encoder.
Data Reader for event type CSV data, when computing conditional probabilities.
Data Reader for event type CSV data, when computing conditional probabilities. There may be multiple records for a given key. Each CSV record is automatically converted to an Avro record using the provided schema.
Custom conditional aggregate data reader
DataReader to use for event type data, when modeling conditional probabilities.
DataReader to use for event type data, when modeling conditional probabilities. Predictor variables will be aggregated from events up until the occurrence of the condition. Response variables will be aggregated from events following the occurrence of the condition.
Conditional data reader params
function for extracting the timestamp from an event
function for identifying if the condition is met
optional size of time window over which the response variable is to be aggregated
optional size of time window over which the predictor variables are to be aggregated
which occurrence to use in the training set when a particular key meets the condition multiple times
optional function to compute the cutoff value based on key and aggregated sequence of events for that key
do not generate feature vectors for keys in training set where the target condition is not met. If set to false, and condition is not met, features for those
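To make the conditional parameters above concrete, here is a minimal sketch of splitting one key's events around the time the condition is met. It is written against plain Scala collections, not the library. The `Event` type, the `splitAtCondition` helper, the use of the first occurrence of the condition, and the exact window boundaries are all assumptions.

```scala
object ConditionalSketch {
  // Hypothetical event type, used only for this illustration.
  final case class Event(key: String, timestampMs: Long, isPurchase: Boolean)

  /** Splits one key's events into (predictorEvents, responseEvents)
    * around the first time `targetCondition` holds.
    * Returns None when the condition is never met, which corresponds to
    * dropping that key from the training set.
    */
  def splitAtCondition(
      events: Seq[Event],
      timeStampFn: Event => Long,
      targetCondition: Event => Boolean,
      predictorWindowMs: Option[Long] = None,
      responseWindowMs: Option[Long] = None
  ): Option[(Seq[Event], Seq[Event])] = {
    val sorted = events.sortBy(timeStampFn)
    sorted.find(targetCondition).map { hit =>
      val t = timeStampFn(hit)
      // Predictors: events before the condition, optionally within a window.
      val predictors = sorted.filter { e =>
        val ts = timeStampFn(e)
        ts < t && predictorWindowMs.forall(w => ts >= t - w)
      }
      // Responses: events from the condition onward, optionally within a window.
      val responses = sorted.filter { e =>
        val ts = timeStampFn(e)
        ts >= t && responseWindowMs.forall(w => ts < t + w)
      }
      (predictors, responses)
    }
  }
}
```

Leaving a window as `None` means that side is unbounded, which is one reading of the windows being optional.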
Data Reader for Parquet events, when computing conditional probabilities.
Data Reader for Parquet events, when computing conditional probabilities. There may be multiple records for a given key. Each Parquet record is automatically converted to a type T that defines an Encoder.
Custom data reader
DataReaders must specify:
1. An optional path to read from
2. A function for extracting the key from the records being read
3. The read method to be used for reading the data
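The three requirements above can be illustrated with a small stand-in contract. The `SketchReader` trait, its member names, and the in-memory `people` reader are hypothetical and exist only to show the shape; the real reader types work with Spark and have richer signatures.

```scala
object ReaderSketch {
  /** Hypothetical minimal reader contract mirroring the three items. */
  trait SketchReader[T] {
    def readPath: Option[String]            // 1. optional path to read from
    def key: T => String                    // 2. key-extraction function
    def read(path: Option[String]): Seq[T]  // 3. read method

    /** Reads the records and pairs each one with its key. */
    final def keyed(): Seq[(String, T)] =
      read(readPath).map(r => key(r) -> r)
  }

  final case class Person(id: Int, name: String)

  /** In-memory reader for illustration only; it ignores the path. */
  val people: SketchReader[Person] = new SketchReader[Person] {
    val readPath: Option[String] = None
    val key: Person => String = _.id.toString
    def read(path: Option[String]): Seq[Person] =
      Seq(Person(1, "Ada"), Person(2, "Lin"))
  }
}
```

`ReaderSketch.people.keyed()` yields each record alongside its string key, which is the pairing that later aggregation and joins rely on.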
Simple Avro streaming reader that monitors a Hadoop-compatible filesystem for new files.
Join Keys to use
key to use from left table
key to use from right table (will always be the aggregation key)
key of joined result
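As an illustration of how the three join keys relate, the sketch below joins two in-memory tables represented as maps. The `JoinKeys` case class here is a hypothetical stand-in with illustrative field names, and the left-outer join behavior is an assumption chosen for the example, not a statement about which join types the library supports.

```scala
object JoinSketch {
  // Hypothetical stand-in for the join key settings described above.
  final case class JoinKeys(leftKey: String, rightKey: String, resultKey: String)

  type Row = Map[String, Any]

  /** Left-outer joins `left` with `right`.
    * The right side is assumed to hold at most one row per key,
    * as it would after aggregation on the key.
    */
  def leftOuterJoin(left: Seq[Row], right: Seq[Row], keys: JoinKeys): Seq[Row] = {
    // Index the right rows by their key, dropping the key column itself.
    val rightByKey: Map[Any, Row] =
      right.flatMap(r => r.get(keys.rightKey).map(k => k -> (r - keys.rightKey))).toMap
    left.flatMap { l =>
      l.get(keys.leftKey).map { k =>
        val matched = rightByKey.getOrElse(k, Map.empty[String, Any])
        // Merge both sides and store the key under the result key name.
        ((l - keys.leftKey) ++ matched) + (keys.resultKey -> k)
      }
    }
  }
}
```

Because the right side carries one row per key, each left row produces exactly one joined row, which is why readers used on the right-hand side of a join need to aggregate first.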
ParquetReader for any type that defines an Encoder.
ParquetReader for any type that defines an Encoder. Scala case classes and tuples/products included automatically.
Time based filter for conditional aggregation
condition time column
primary time column
time window for conditional aggregation
Time column for aggregation
column name
whether to keep the column in the result
The column containing the entity being scored is always named "key".
Just a handy factory for data readers
Just a handy factory for streaming readers
Data Reader for event type CSV data, where there may be multiple records for a given key. Each CSV record is automatically converted to an Avro record by inferring a schema.