



package readers

  1. class AggregateAvroReader[T <: GenericRecord] extends AvroReader[T] with AggregateDataReader[T]


    Data reader for Avro events, where there may be multiple records for a given key.

  2. class AggregateCSVAutoReader[T <: GenericRecord] extends CSVAutoReader[T] with AggregateDataReader[T]


    Data reader for event type CSV data, where there may be multiple records for a given key. Each CSV record is automatically converted to an Avro record by inferring a schema.

  3. class AggregateCSVProductReader[T <: Product] extends CSVProductReader[T] with AggregateDataReader[T]


    Data reader for CSV events, where there may be multiple records for a given key. Each CSV record is automatically converted to a type T that defines an Encoder.

  4. class AggregateCSVReader[T <: GenericRecord] extends CSVReader[T] with AggregateDataReader[T]


    Data reader for event type CSV data, where there may be multiple records for a given key. Each CSV record is automatically converted to an Avro record using the provided schema.

  5. abstract class AggregateCustomReader[T] extends CustomReader[T] with AggregateDataReader[T]


    Custom aggregate data reader

  6. trait AggregateDataReader[T] extends AggregatedReader[T]


    DataReader to use for event type data, with multiple records per key

  7. case class AggregateParams[T](timeStampFn: Option[(T) ⇒ Long], cutOffTime: CutOffTime) extends Product with Serializable


    Aggregate data reader params


    timeStampFn: an optional function for extracting the timestamp of the event


    cutOffTime: a cut off time to be used for aggregating features extracted from the events

    • Predictor variables will be aggregated from events up until the cut off time
    • Response variables will be aggregated from events following the cut off time
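
    The cut off rule above can be illustrated with a minimal, self-contained sketch. The names `CutOffSketch`, `Event`, and `splitAtCutOff` are hypothetical stand-ins and are not part of this package. The sketch also assumes that an event stamped exactly at the cut off counts toward the response, which the description above does not specify.

    ```scala
    // Hypothetical, simplified stand-ins; not the library's actual types.
    object CutOffSketch {
      final case class Event(key: String, timestampMs: Long, amount: Double)

      /** Splits events into (predictor events, response events) around a cut off time.
        * Assumption: events strictly before the cut off are predictor events,
        * events at or after it are response events. */
      def splitAtCutOff[T](events: Seq[T], timeStampFn: T => Long, cutOffMs: Long): (Seq[T], Seq[T]) =
        events.partition(e => timeStampFn(e) < cutOffMs)

      def main(args: Array[String]): Unit = {
        val events = Seq(Event("a", 100L, 1.0), Event("a", 200L, 2.0), Event("a", 300L, 4.0))
        val (predictors, responses) = splitAtCutOff[Event](events, _.timestampMs, cutOffMs = 200L)
        println(predictors.map(_.amount).sum) // sum of amounts before the cut off
        println(responses.map(_.amount).sum)  // sum of amounts at or after the cut off
      }
    }
    ```

    Predictor features would then be aggregated from the first sequence and response features from the second.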
  8. class AggregateParquetProductReader[T <: Product] extends ParquetProductReader[T] with AggregateDataReader[T]


    Data reader for Parquet events, where there may be multiple records for a given key. Each Parquet record is automatically converted to a type T that defines an Encoder.

  9. trait AggregatedReader[T] extends DataReader[T]


    Readers that extend this trait can be used as right-hand-side arguments for joins, so they should aggregate on the key and return only a single value per key.
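
    As a rough illustration of aggregating on the key, the sketch below folds every group of records that share a key into one value. `AggregateByKeySketch` and `aggregateByKey` are hypothetical names using plain Scala collections; the actual readers are not assumed to work on in-memory sequences.

    ```scala
    // Hypothetical helper; shows the idea of "one value per key", not the library's code.
    object AggregateByKeySketch {
      /** Groups records by key and folds each group down to a single value. */
      def aggregateByKey[T, K, V](records: Seq[T])(keyFn: T => K)(zero: V)(combine: (V, T) => V): Map[K, V] =
        records.groupBy(keyFn).map { case (k, group) => k -> group.foldLeft(zero)(combine) }

      def main(args: Array[String]): Unit = {
        val clicks = Seq(("a", 1), ("a", 2), ("b", 5))
        // Sum the second tuple element per key: Map(a -> 3, b -> 5) (iteration order may vary)
        println(aggregateByKey(clicks)(_._1)(0)((acc, r) => acc + r._2))
      }
    }
    ```

    Because each key maps to exactly one value, the result can safely sit on the right-hand side of a key-based join without multiplying rows.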

  10. class AvroReader[T <: GenericRecord] extends DataReader[T]


    Data reader for Avro data.

  11. class CSVAutoReader[T <: GenericRecord] extends DataReader[T]


    Data reader for CSV data that automatically infers the schema from the data and converts each record to T <: GenericRecord. The schema is inferred using the provided headers param if given; otherwise the first row is assumed to be the header line.
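
    The header fallback described above can be sketched as follows. `CsvHeaderSketch` and `resolveHeaders` are hypothetical names, and the sketch assumes naive comma splitting with no quoting or escaping; it only models the choice of column names, not schema or type inference.

    ```scala
    // Hypothetical sketch of "use provided headers, else treat the first row as headers".
    object CsvHeaderSketch {
      /** Returns (column names, data rows).
        * If `headers` is empty, the first line is consumed as the header line. */
      def resolveHeaders(lines: Seq[String], headers: Seq[String] = Seq.empty): (Seq[String], Seq[Seq[String]]) = {
        val rows = lines.map(_.split(",", -1).map(_.trim).toSeq)
        if (headers.nonEmpty) (headers, rows)
        else rows match {
          case first +: rest => (first, rest)
          case _             => (Seq.empty, Seq.empty)
        }
      }
    }
    ```

    Calling `CsvHeaderSketch.resolveHeaders(Seq("id,age", "1,30"))` takes `id` and `age` from the first row, while passing explicit headers keeps every line as data.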

  12. class CSVProductReader[T <: Product] extends DataReader[T]


    CSV reader for any type that defines an Encoder. Scala case classes and tuples/products are included automatically.

  13. class CSVReader[T <: GenericRecord] extends DataReader[T]


    Data Reader for CSV data. Each CSV record will be automatically converted to an Avro record using the provided schema.

  14. class ConditionalAvroReader[T <: GenericRecord] extends AvroReader[T] with ConditionalDataReader[T]


    Data reader for Avro events when computing conditional probabilities.

  15. class ConditionalCSVAutoReader[T <: GenericRecord] extends CSVAutoReader[T] with ConditionalDataReader[T]


    Data reader for event type CSV data (with schema inference), when computing conditional probabilities. There may be multiple records for a given key. Each CSV record is automatically converted to an Avro record with an inferred schema.

  16. class ConditionalCSVProductReader[T <: Product] extends CSVProductReader[T] with ConditionalDataReader[T]


    Data reader for CSV events, when computing conditional probabilities. There may be multiple records for a given key. Each CSV record is automatically converted to a type T that defines an Encoder.

  17. class ConditionalCSVReader[T <: GenericRecord] extends CSVReader[T] with ConditionalDataReader[T]


    Data reader for event type CSV data, when computing conditional probabilities. There may be multiple records for a given key. Each CSV record is automatically converted to an Avro record using the provided schema.

  18. abstract class ConditionalCustomReader[T] extends CustomReader[T] with ConditionalDataReader[T]


    Custom conditional aggregate data reader

  19. trait ConditionalDataReader[T] extends AggregatedReader[T]


    DataReader to use for event type data, when modeling conditional probabilities. Predictor variables will be aggregated from events up until the occurrence of the condition. Response variables will be aggregated from events following the occurrence of the condition.
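
    A minimal sketch of this split for the events of a single key is shown below. `ConditionalSplitSketch`, `Visit`, and `splitAtCondition` are hypothetical names. Two assumptions are made that the description does not state: the earliest occurrence of the condition is used as the cut off, and the event that meets the condition is counted on the response side.

    ```scala
    // Hypothetical sketch; not the library's implementation.
    object ConditionalSplitSketch {
      final case class Visit(key: String, timestampMs: Long, purchased: Boolean)

      /** Returns Some((events before the condition, events from the condition onward)),
        * or None if the condition is never met.
        * Assumption: the earliest matching event defines the cut off. */
      def splitAtCondition[T](
        events: Seq[T],
        timeStampFn: T => Long,
        targetCondition: T => Boolean
      ): Option[(Seq[T], Seq[T])] = {
        val conditionTimes = events.filter(targetCondition).map(timeStampFn)
        if (conditionTimes.isEmpty) None
        else {
          val cutOff = conditionTimes.min
          Some(events.partition(e => timeStampFn(e) < cutOff))
        }
      }
    }
    ```

    Predictor features would be aggregated from the first sequence and response features from the second. Returning `None` leaves the caller to decide what to do with keys that never meet the condition.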

  20. case class ConditionalParams[T](timeStampFn: (T) ⇒ Long, targetCondition: (T) ⇒ Boolean, responseWindow: Option[Duration] = ..., predictorWindow: Option[Duration] = ..., timeStampToKeep: TimeStampToKeep = TimeStampToKeep.Random, cutOffTimeFn: Option[(String, Seq[T]) ⇒ CutOffTime] = None, dropIfTargetConditionNotMet: Boolean = false) extends Product with Serializable


    Conditional data reader params


    timeStampFn: function for extracting the timestamp from an event


    targetCondition: function for identifying whether the condition is met


    responseWindow: optional size of the time window over which the response variable is to be aggregated


    predictorWindow: optional size of the time window over which the predictor variables are to be aggregated


    timeStampToKeep: if a particular key met the condition multiple times, which of those times to use in the training set


    cutOffTimeFn: optional function to compute the cutoff value based on the key and the aggregated sequence of events for that key


    dropIfTargetConditionNotMet: do not generate feature vectors for keys in the training set where the target condition is not met. If set to false, and condition is not met, features for those

  21. class ConditionalParquetProductReader[T <: Product] extends ParquetProductReader[T] with ConditionalDataReader[T]


    Data reader for Parquet events, when computing conditional probabilities. There may be multiple records for a given key. Each Parquet record is automatically converted to a type T that defines an Encoder.

  22. abstract class CustomReader[T] extends DataReader[T]


    Custom data reader

  23. trait DataReader[T] extends Reader[T] with ReaderKey[T]


    DataReaders must specify:

    • An optional path to read from
    • A function for extracting the key from the records being read
    • The read method to be used for reading the data
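
    The three requirements can be modeled with a small, self-contained trait. `SimpleReader`, `Person`, and `InMemoryPeople` are hypothetical names, and the sketch assumes the read method returns an in-memory `Seq`; it is not the signature of `DataReader` itself.

    ```scala
    // Hypothetical, simplified model of the three requirements; not the library's DataReader.
    object MinimalReaderSketch {
      trait SimpleReader[T] {
        /** An optional path to read from. */
        def readPath: Option[String]
        /** A function for extracting the key from a record. */
        def key: T => String
        /** The read method (assumption: returns an in-memory Seq). */
        def read(): Seq[T]
        /** Pairs every record with its key. */
        def readKeyed(): Seq[(String, T)] = read().map(r => key(r) -> r)
      }

      final case class Person(id: Int, name: String)

      object InMemoryPeople extends SimpleReader[Person] {
        val readPath: Option[String] = None // nothing is read from disk in this sketch
        val key: Person => String = _.id.toString
        def read(): Seq[Person] = Seq(Person(1, "Ada"), Person(2, "Grace"))
      }
    }
    ```

    Here `MinimalReaderSketch.InMemoryPeople.readKeyed()` yields each `Person` paired with its `id` rendered as a string key.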

  24. class FileStreamingAvroReader[T <: GenericRecord] extends StreamingReader[T]


    Simple Avro streaming reader that monitors a Hadoop-compatible filesystem for new files.

  25. case class JoinKeys(leftKey: String = KeyFieldName, rightKey: String = KeyFieldName, resultKey: String = CombinedKeyName) extends Product with Serializable


    Join keys to use


    leftKey: key to use from the left table


    rightKey: key to use from the right table (will always be the aggregation key)


    resultKey: key of the joined result

  26. sealed abstract class JoinType extends EnumEntry with Serializable

  27. class ParquetProductReader[T <: Product] extends DataReader[T]


    Parquet reader for any type that defines an Encoder. Scala case classes and tuples/products are included automatically.

  28. trait Reader[T] extends ReaderType[T]

  29. trait StreamingReader[T] extends ReaderType[T] with ReaderKey[T]

  30. case class TimeBasedFilter(condition: TimeColumn, primary: TimeColumn, timeWindow: Duration) extends Product with Serializable


    Time based filter for conditional aggregation


    condition: condition time column


    primary: primary time column


    timeWindow: time window for conditional aggregation

  31. case class TimeColumn(name: String, keep: Boolean) extends Product with Serializable


    Time column for aggregation


    name: column name


    keep: whether to keep the column in the result

  32. sealed abstract class TimeStampToKeep extends EnumEntry with Serializable


  1. object CSVDefaults

  2. object DataFrameFieldNames extends Product with Serializable


    The name of the column containing the entity being scored will always be key

  3. object DataReaders


    Just a handy factory for data readers

  4. object JoinTypes extends Enum[JoinType]

  5. object ReaderKey extends Serializable

  6. object StreamingReaders


    Just a handy factory for streaming readers

  7. object TimeStampToKeep extends Enum[TimeStampToKeep] with Serializable

