Conditional data reader params
Function for extracting key from a record
key string
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns either an RDD or a Dataset of the type specified by this reader. It can be overridden to carry out any special logic required by the reader (i.e. filters or joins needed to produce the specified reader type).
parameters used to carry out specialized logic in the reader (passed in from the workflow)
Spark instance used to do the reading and the conversion from RDD to DataFrame
either an RDD or a Dataset of type T
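To make the contract concrete, here is a minimal self-contained sketch; the trait, record type, parameter type, and read path below are illustrative stand-ins, not the library's actual signatures. A subclass supplies the key function and overrides the read function to apply extra filtering before the DataFrame is generated.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for illustration.
case class PassengerStub(id: String, age: Double)

trait SketchReader[T] {
  def key: T => String
  def read(params: Map[String, Any])(implicit spark: SparkSession): Either[RDD[T], Dataset[T]]
}

class FilteringPassengerReader extends SketchReader[PassengerStub] {
  def key: PassengerStub => String = _.id

  def read(params: Map[String, Any])(implicit spark: SparkSession): Either[RDD[PassengerStub], Dataset[PassengerStub]] = {
    import spark.implicits._
    val raw = spark.read.json("path/to/passengers.json").as[PassengerStub]  // hypothetical path
    // Special reader logic: drop malformed rows before the DataFrame is generated.
    Right(raw.filter(_.age > 0.0))
  }
}
```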
Default optional read path
Reader type tag
Full reader input type name
full input type name
Generate the DataFrame that will be used in the OpPipeline calling this method
features to generate from the dataset read in by this reader
op parameters
Spark instance used to do the reading and the conversion from RDD to DataFrame
A DataFrame containing columns with all of the raw input features expected by the pipeline
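A hedged sketch of what generating the DataFrame amounts to; the record type, key column, and feature names are assumptions made for illustration, not part of the actual API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type; "id" plays the role of the key column.
case class Passenger(id: String, age: Double, name: String)

def generateDataFrameSketch(records: Seq[Passenger], rawFeatureNames: Seq[String])
                           (implicit spark: SparkSession): DataFrame = {
  import spark.implicits._
  // Keep the key plus exactly the raw feature columns the pipeline declared.
  records.toDF().select(("id" +: rawFeatureNames).map(col): _*)
}

// e.g. generateDataFrameSketch(data, Seq("age", "name")) yields columns id, age, name.
```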
Default method for extracting the path used by the read method. The path is resolved in the following order of priority: readerPath, then the path supplied in params.
final path to use
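The priority rule can be summarized with a small sketch (the parameter names are illustrative):

```scala
// Prefer the path configured on the reader itself; fall back to the path from the workflow params.
def finalReadPath(readerPath: Option[String], paramsPath: Option[String]): String =
  readerPath.orElse(paramsPath).getOrElse(
    throw new IllegalArgumentException("No read path specified for this reader")
  )

// finalReadPath(Some("/data/train.avro"), Some("/data/fallback.avro"))  // "/data/train.avro"
// finalReadPath(None, Some("/data/fallback.avro"))                      // "/data/fallback.avro"
```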
Default method for extracting this reader's parameters from readerParams in OpParams
contains map of reader type to ReaderParams instances
ReaderParams instance if it exists
Derives DataFrame schema for raw features.
feature array representing the raw feature data
a StructType instance
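As a rough illustration of the derivation (the feature descriptor below is a stand-in, not the library's feature type):

```scala
import org.apache.spark.sql.types._

// Hypothetical minimal feature descriptor: name plus whether the raw column is numeric.
case class RawFeatureStub(name: String, isNumeric: Boolean, isNullable: Boolean = true)

def rawFeatureSchema(features: Array[RawFeatureStub]): StructType =
  StructType(features.map { f =>
    StructField(f.name, if (f.isNumeric) DoubleType else StringType, f.isNullable)
  })

// rawFeatureSchema(Array(RawFeatureStub("age", isNumeric = true), RawFeatureStub("name", isNumeric = false)))
```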
Inner join
Type of data read by right data reader
reader from right side of join
join keys to use
joined reader
Join readers
Type of data read by right data reader
reader from right side of join
type of join to perform
join keys to use
joined reader
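At the data level, joining two readers corresponds to keying each side and joining with the requested join type; a self-contained sketch with hypothetical record types and key column:

```scala
import org.apache.spark.sql.Dataset

case class Visit(userId: String, page: String)
case class Purchase(userId: String, amount: Double)

// Join the two sides on their key columns using the requested Spark join type.
def joinReaders(left: Dataset[Visit], right: Dataset[Purchase], joinType: String): Dataset[(Visit, Purchase)] =
  left.joinWith(right, left("userId") === right("userId"), joinType)

// joinReaders(visits, purchases, "inner")       // inner join
// joinReaders(visits, purchases, "left_outer")  // left outer join
// joinReaders(visits, purchases, "outer")       // full outer join
```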
Left outer join
Type of data read by right data reader
reader from right side of join
join keys to use
joined reader
Function to repartition the data based on the op params of this reader
dataset
op params
possibly repartitioned dataset
Function to repartition the data based on the op params of this reader
rdd
op params
possibly repartitioned RDD
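Both variants reduce to the same pattern; a small sketch where the desired partition count is assumed to come from the op params:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset

// Repartition only when the params request a specific partition count, otherwise pass the data through.
def maybeRepartition[T](data: Dataset[T], numPartitions: Option[Int]): Dataset[T] =
  numPartitions.fold(data)(n => data.repartition(n))

def maybeRepartitionRDD[T](data: RDD[T], numPartitions: Option[Int]): RDD[T] =
  numPartitions.fold(data)(n => data.repartition(n))
```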
Outer join
Type of data read by right data reader
reader from right side of join
join keys to use
joined reader
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns a Dataset of the type specified by this reader.
parameters used to carry out specialized logic in the reader (passed in from the workflow)
Spark session
Dataset of type T
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns an RDD of the type specified by this reader.
parameters used to carry out specialized logic in the reader (passed in from the workflow)
Spark session
RDD of type T
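A sketch of how the Dataset and RDD variants relate to the Either returned by the generic read function; the wiring below is an assumption for illustration only, not the library's implementation.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

case class Event(id: String, ts: Long)  // hypothetical record type

// A reader typically implements one of the two variants...
def readRDD(params: Map[String, Any])(implicit spark: SparkSession): RDD[Event] =
  spark.sparkContext.textFile("path/to/events.csv")  // hypothetical path
    .map(_.split(","))
    .collect { case Array(id, ts) => Event(id, ts.toLong) }

// ...and the generic read wraps the result in the corresponding Either branch.
def read(params: Map[String, Any])(implicit spark: SparkSession): Either[RDD[Event], Dataset[Event]] =
  Left(readRDD(params))
```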
All the reader's sub readers (used in joins)
sub readers
Short reader input type name
short reader input type name
DataReader to use for event type data, when modeling conditional probabilities. Predictor variables will be aggregated from events up until the occurrence of the condition. Response variables will be aggregated from events following the occurrence of the condition.
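A self-contained sketch of the split this reader performs (the record type, field names, and grouping are hypothetical, and the real reader additionally handles aggregation, cutoff windows, and related options):

```scala
import org.apache.spark.rdd.RDD

case class Visit(userId: String, timeStamp: Long, page: String)

// For each key, find the first event satisfying the condition, then split the remaining events:
// events strictly before the cutoff feed predictor aggregation, events strictly after feed responses.
// Keys for which the condition never occurs are dropped in this sketch.
def conditionalSplit(
  events: RDD[Visit],
  condition: Visit => Boolean
): RDD[(String, (Seq[Visit], Seq[Visit]))] =
  events.groupBy(_.userId).flatMap { case (userId, evs) =>
    val sorted = evs.toSeq.sortBy(_.timeStamp)
    sorted.find(condition).map { cutoff =>
      val predictors = sorted.filter(_.timeStamp < cutoff.timeStamp)
      val responses = sorted.filter(_.timeStamp > cutoff.timeStamp)
      userId -> (predictors, responses)
    }
  }
```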