default path to data
function for extracting key from avro record
header of the CSV file as an array; otherwise the first row is assumed to be the header line
CSV options
timeZone to be used for any dateTime fields
result record namespace
result record name
aggregate params, including the function for extracting the timestamp of an event
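The aggregate params above pair a timestamp-extraction function with the reader. A minimal sketch of how such params could drive per-key aggregation, using hypothetical `Event` and `AggregateParams` stand-ins (not the library's actual types) and summing values as one simple aggregation choice:

```scala
// Hypothetical event record; field names are illustrative only.
case class Event(key: String, timestamp: Long, value: Double)

// Stand-in for aggregate params: the timestamp extractor plus a cutoff time.
case class AggregateParams[T](timeStampFn: T => Long, cutOffTime: Long)

// Aggregate events per key, keeping only those at or before the cutoff
// and summing their values.
def aggregateByKey(events: Seq[Event], params: AggregateParams[Event]): Map[String, Double] =
  events
    .filter(e => params.timeStampFn(e) <= params.cutOffTime)
    .groupBy(_.key)
    .view.mapValues(_.map(_.value).sum).toMap
```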
Full reader input type name
full input type name
Generate the DataFrame that will be used in the OpPipeline calling this method
features to generate from the dataset read in by this reader
op parameters
spark instance to do the reading and conversion from RDD to DataFrame
A DataFrame containing columns with all of the raw input features expected by the pipeline
Default method for extracting the path used in the read method. The path is taken in the following order of priority: readerPath, params
final path to use
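The path-resolution order described above (the reader's own path first, then the workflow params) can be sketched in plain Scala. The `WorkflowParams` type and its `readLocations` map are illustrative assumptions, not the library's actual API:

```scala
// Hypothetical stand-in for the workflow params carrying per-reader read locations.
case class WorkflowParams(readLocations: Map[String, String])

// Resolve the final read path: the reader's own path wins, then the
// per-reader location in the workflow params; fail if neither is set.
def getFinalReadPath(readerPath: Option[String],
                     params: WorkflowParams,
                     readerName: String): String =
  readerPath
    .orElse(params.readLocations.get(readerName))
    .getOrElse(throw new IllegalArgumentException(s"No path specified for reader $readerName"))
```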
Default method for extracting this reader's parameters from readerParams in OpParams
contains map of reader type to ReaderParams instances
ReaderParams instance if it exists
Derives DataFrame schema for raw features.
array of features representing the raw feature data
a StructType instance
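Conceptually, schema derivation maps each raw feature to one column. A simplified sketch, where `RawFeature` and `Field` are stand-ins for the library's feature type and Spark's `StructField` (names and fields are assumptions for illustration):

```scala
// Simplified stand-ins for a raw feature and a schema field.
case class RawFeature(name: String, typeName: String, isNullable: Boolean = true)
case class Field(name: String, typeName: String, nullable: Boolean)

// Derive a flat schema: one field per raw feature, preserving nullability.
def getSchema(rawFeatures: Seq[RawFeature]): Seq[Field] =
  rawFeatures.map(f => Field(f.name, f.typeName, f.isNullable))
```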
header of the CSV file as an array; otherwise the first row is assumed to be the header line
Inner join
Type of data read by right data reader
reader from right side of join
join keys to use
joined reader
Join readers
Type of data read by right data reader
reader from right side of join
type of join to perform
join keys to use
joined reader
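The inner, left outer, and outer joins listed here differ only in which keys survive the join. A minimal sketch of that key selection over two readers' keyed outputs, using plain Scala `Map`s as stand-ins for the readers' data (types and names are illustrative, not the library's API):

```scala
// Illustrative join-type marker, mirroring the three joins documented here.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object Outer extends JoinType

// Join two keyed datasets: pick the surviving keys per join type,
// then pair up whatever each side has for those keys.
def joinByKey[L, R](left: Map[String, L], right: Map[String, R],
                    joinType: JoinType): Map[String, (Option[L], Option[R])] = {
  val keys = joinType match {
    case Inner     => left.keySet intersect right.keySet
    case LeftOuter => left.keySet
    case Outer     => left.keySet union right.keySet
  }
  keys.map(k => k -> (left.get(k), right.get(k))).toMap
}
```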
function for extracting key from avro record
Left outer join
Type of data read by right data reader
reader from right side of join
join keys to use
joined reader
Function to repartition the data based on the op params of this reader
dataset
op params
the dataset, repartitioned if the op params specify a partition count
Function to repartition the data based on the op params of this reader
rdd
op params
the RDD, repartitioned if the op params specify a partition count
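The "maybe" in these repartition methods comes down to a single decision: only repartition when the params actually request a different partition count. A sketch of that decision, with a `Seq` standing in for the RDD/Dataset and a hypothetical `RepartitionParams` (not the library's real params type):

```scala
// Hypothetical params carrying an optional requested partition count.
case class RepartitionParams(numPartitions: Option[Int])

// Repartition only when a different partition count is requested;
// a real reader would call rdd.repartition(n) instead of returning n.
def maybeRepartition[T](data: Seq[T], currentPartitions: Int,
                        params: RepartitionParams): (Seq[T], Int) =
  params.numPartitions match {
    case Some(n) if n != currentPartitions => (data, n)
    case _                                 => (data, currentPartitions)
  }
```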
CSV options
Outer join
Type of data read by right data reader
reader from right side of join
join keys to use
joined reader
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns either an RDD or a Dataset of the type specified by this reader. It can be overridden to carry out any special logic required for the reader (e.g. filters or joins needed to produce the specified reader type).
parameters used to carry out specialized logic in reader (passed in from workflow)
spark instance to do the reading and conversion from RDD to DataFrame
either an RDD or a Dataset of type T
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns a Dataset of the type specified by this reader.
parameters used to carry out specialized logic in reader (passed in from workflow)
spark session
Dataset of type T
default path to data
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns an RDD of the type specified by this reader.
parameters used to carry out specialized logic in reader (passed in from workflow)
spark session
RDD of type T
result record name
result record namespace
All the reader's sub readers (used in joins)
sub readers
timeZone to be used for any dateTime fields
Short reader input type name
short reader input type name
Reader type tag
Data reader for event type CSV data, where there may be multiple records for a given key. Each CSV record will be automatically converted to an Avro record by inferring a schema.
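The schema inference this reader performs can be illustrated with a simplified sketch: guess one type per CSV column from the values seen in that column. The type names and matching rules here are illustrative stand-ins, not the actual Avro schema builder:

```scala
// Guess a column's type from its string values: integers, then
// decimals, then booleans, falling back to string.
def inferColumnType(values: Seq[String]): String =
  if (values.forall(_.matches("-?\\d+"))) "long"
  else if (values.forall(v => v.matches("-?\\d*\\.\\d+") || v.matches("-?\\d+"))) "double"
  else if (values.forall(v => v == "true" || v == "false")) "boolean"
  else "string"

// Infer a (name, type) schema from the header row and data rows.
def inferSchema(header: Seq[String], rows: Seq[Seq[String]]): Seq[(String, String)] =
  header.zipWithIndex.map { case (name, i) => name -> inferColumnType(rows.map(_(i))) }
```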