 
            
Constructor parameters:
- default path to the data
- function for extracting the key from an Avro record
- header of the CSV file as an array; otherwise the first row is assumed to be the header line
- CSV options
- time zone to be used for any dateTime fields
- result record namespace
- result record name

Conditional data reader params
 
Full reader input type name
- Returns: full input type name

Generate the DataFrame that will be used in the OpPipeline calling this method.
- Parameter: features to generate from the dataset read in by this reader
- Parameter: op parameters
- Parameter: spark instance to do the reading and conversion from RDD to DataFrame
- Returns: a DataFrame containing columns with all of the raw input features expected by the pipeline
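A usage sketch may help here. The reader factory (DataReaders.Simple.csvCase), the Passenger type, and the feature handles (age, survived) are illustrative assumptions, not part of this page; only generateDataFrame itself is documented above:

```scala
// Hypothetical setup: Passenger, its features, and the factory are assumed.
implicit val spark: SparkSession =
  SparkSession.builder().master("local[*]").getOrCreate()

val reader = DataReaders.Simple.csvCase[Passenger](
  path = Option("data/passengers.csv"),
  key = _.id.toString
)

// Materialize the raw features the pipeline expects into a single DataFrame.
val rawData: DataFrame = reader.generateDataFrame(
  rawFeatures = Array(age, survived),
  opParams = new OpParams()
)(spark)
```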
 
Default method for extracting the path used in the read method. The path is taken in the following order of priority: readerPath, params.
- Returns: final path to use
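The priority rule can be sketched as follows; this is an assumed reconstruction of the documented behavior, not the library's verbatim code:

```scala
// Assumed sketch: an explicit readerPath wins; otherwise fall back to the
// path carried in this reader's entry in OpParams, and fail if neither is set.
protected def getFinalReadPath(params: OpParams): String = {
  val finalPath = readPath.orElse(getReaderParams(params).flatMap(_.path))
  require(finalPath.isDefined, "The path is not set")
  finalPath.get
}
```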
 
             
Default method for extracting this reader's parameters from readerParams in OpParams.
- Parameter: contains a map of reader type to ReaderParams instances
- Returns: the ReaderParams instance, if it exists

Derives the DataFrame schema for raw features.
- Parameter: feature array representing the raw feature data
- Returns: a StructType instance
 
             
Header of the CSV file as an array; otherwise the first row is assumed to be the header line.

Inner join
- Type parameter: type of data read by the right data reader
- Parameter: reader from the right side of the join
- Parameter: join keys to use
- Returns: the joined reader
 
             
Join readers
- Type parameter: type of data read by the right data reader
- Parameter: reader from the right side of the join
- Parameter: type of join to perform
- Parameter: join keys to use
- Returns: the joined reader
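For illustration, joining two readers might look like the sketch below. The record types and the innerJoin method name follow common TransmogrifAI conventions but are assumptions here, not confirmed by this page:

```scala
// Two readers keyed on the same id; Passenger and Booking are hypothetical.
val passengerReader = DataReaders.Simple.avro[Passenger](key = _.getPassengerId.toString)
val bookingReader   = DataReaders.Simple.avro[Booking](key = _.getPassengerId.toString)

// Inner join keeps only keys present on both sides; a left outer join would
// keep all passengers and fill missing booking features with empty values.
val joinedReader = passengerReader.innerJoin(bookingReader)
```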
 
Function for extracting the key from an Avro record.

Left outer join
- Type parameter: type of data read by the right data reader
- Parameter: reader from the right side of the join
- Parameter: join keys to use
- Returns: the joined reader
 
Function to repartition the data based on the op params of this reader.
- Parameter: dataset
- Parameter: op params
- Returns: the dataset, possibly repartitioned

Function to repartition the data based on the op params of this reader.
- Parameter: rdd
- Parameter: op params
- Returns: the RDD, possibly repartitioned
 
CSV options

Outer join
- Type parameter: type of data read by the right data reader
- Parameter: reader from the right side of the join
- Parameter: join keys to use
- Returns: the joined reader
 
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns either an RDD or a Dataset of the type specified by this reader. It can be overridden to carry out any special logic required for the reader (i.e. filters or joins needed to produce the specified reader type).
- Parameter: parameters used to carry out specialized logic in the reader (passed in from the workflow)
- Parameter: spark instance to do the reading and conversion from RDD to DataFrame
- Returns: either an RDD or a Dataset of type T
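Overriding this function to add reader-specific logic might look like the sketch below. The Either[RDD[T], Dataset[T]] return shape and the Passenger type are assumptions for illustration:

```scala
// Hypothetical override: read the data normally, then apply special logic
// (here, a filter) before it is handed to generateDataFrame.
override def read(params: OpParams)(implicit spark: SparkSession): Either[RDD[Passenger], Dataset[Passenger]] = {
  import spark.implicits._
  val ds = spark.read
    .format("csv").option("header", "true")
    .load(getFinalReadPath(params))
    .as[Passenger]
  Right(ds.filter(_.age >= 18)) // reader-specific logic: keep adults only
}
```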
 
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns a Dataset of the type specified by this reader.
- Parameter: parameters used to carry out specialized logic in the reader (passed in from the workflow)
- Parameter: spark session
- Returns: a Dataset of type T

Default path to the data.
 
Function which reads raw data from the specified location for use in DataFrame creation, i.e. the generateDataFrame function. This function returns an RDD of the type specified by this reader.
- Parameter: parameters used to carry out specialized logic in the reader (passed in from the workflow)
- Parameter: spark session
- Returns: an RDD of type T

Result record name

Result record namespace
 
All the reader's sub readers (used in joins).
- Returns: sub readers

Time zone to be used for any dateTime fields.
 
Short reader input type name
- Returns: short reader input type name
 
Reader type tag

Data reader for event-type CSV data (with schema inference), used when computing conditional probabilities. There may be multiple records for a given key. Each CSV record will be automatically converted to an Avro record with an inferred schema.
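Constructing such a reader might look like the sketch below. The DataReaders.Conditional.csvAuto factory name, the WebEvent type, and the ConditionalParams field names follow TransmogrifAI conventions but should be treated as assumptions, not as this page's confirmed API:

```scala
// Hypothetical event type: multiple click/purchase events per user key.
val eventReader = DataReaders.Conditional.csvAuto[WebEvent](
  path = Option("data/web_events.csv"),
  key = _.getUserId.toString,
  conditionalParams = ConditionalParams(
    timeStampFn = _.getTimestamp,                     // event time of each record
    targetCondition = _.getEventType == "purchase",   // condition defining the target event
    responseWindow = Some(Duration.standardDays(7)),  // label window after the condition is met
    predictorWindow = Some(Duration.standardDays(30)) // feature window before the condition
  )
)
```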