App metrics container.
Logs and collects metrics upon completion of a Spark application, its jobs, and its stages
Full history for each column element in a vector
name for feature in column
name of immediate parent feature that was used to create the vector
names of raw features that went into the parent feature
stageNames of all stages applied to the parent feature before conversion to a vector
type of the parent feature
The name of the group a column belongs to (usually the parent feature, but in the case of TextMapVectorizer this includes the keys in maps too). Every other column in the same vector that has the same indicator group should be mutually exclusive with this one. If the column is not an indicator, this field is None
A name for a binary indicator value (for example, a null indicator or the result of a pivot), otherwise None
A name for a value that is continuous (not a binary indicator), e.g. for geolocation (lat, lon, accuracy) or, for dates that have been converted to a circular representation, the time window and the x or y coordinate; otherwise None
the index of the vector column this information is tied to
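The fields listed above can be pictured as a single record per vector column. The following is a minimal sketch only: `ColumnHistorySketch`, its field names, the `indicatorGroups` helper, and the sample values are hypothetical names chosen for illustration and are not taken from the library.

```scala
object ColumnHistoryExample {
  // Hypothetical record mirroring the fields described above (all names are assumptions).
  final case class ColumnHistorySketch(
    columnName: String,                // name for the feature in the column
    parentFeatureName: String,         // immediate parent feature used to create the vector
    parentFeatureOrigins: Seq[String], // raw features that went into the parent feature
    parentFeatureStages: Seq[String],  // stage names applied before conversion to a vector
    parentFeatureType: String,         // type of the parent feature
    indicatorGroup: Option[String],    // group of mutually exclusive columns, None if not an indicator
    indicatorValue: Option[String],    // name of the binary indicator value, otherwise None
    descriptorValue: Option[String],   // name of a continuous value, otherwise None
    index: Int                         // index of the vector column
  )

  // Collects the vector indices of indicator columns, keyed by their indicator group.
  // Columns that share a group are the ones expected to be mutually exclusive.
  def indicatorGroups(history: Seq[ColumnHistorySketch]): Map[String, Seq[Int]] =
    history
      .collect { case ColumnHistorySketch(_, _, _, _, _, Some(group), Some(_), _, idx) => group -> idx }
      .groupBy(_._1)
      .map { case (group, pairs) => group -> pairs.map(_._2).sorted }

  // Illustrative data: a pivoted "color" feature, a plain numeric column, and a geolocation latitude.
  val sample: Seq[ColumnHistorySketch] = Seq(
    ColumnHistorySketch("color_red_0", "color", Seq("color"), Seq("pivot"), "PickList", Some("color"), Some("red"), None, 0),
    ColumnHistorySketch("color_blue_1", "color", Seq("color"), Seq("pivot"), "PickList", Some("color"), Some("blue"), None, 1),
    ColumnHistorySketch("color_null_2", "color", Seq("color"), Seq("pivot"), "PickList", Some("color"), Some("NullIndicator"), None, 2),
    ColumnHistorySketch("age_3", "age", Seq("age"), Seq("vectorize"), "Real", None, None, None, 3),
    ColumnHistorySketch("location_lat_4", "location", Seq("location"), Seq("vectorize"), "Geolocation", None, None, Some("lat"), 4)
  )
}
```

Calling `ColumnHistoryExample.indicatorGroups(ColumnHistoryExample.sample)` groups the three pivoted `color` columns together and leaves out the numeric and geolocation columns, since neither carries an indicator group.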
Represents the metadata of a column in a vector.
Because we expect every vector column to have been produced by some vectorization process, we provide the name of the feature that led to this column.
Also note that each column's indicator value should be unique, meaning that the indicator values represent mutually exclusive cases. A hashing vectorizer, for instance, does not produce mutually exclusive values.
The name of the parent feature(s) for the column. Usually a column has one parent feature, but it can have many (e.g. in the case of multiple Text columns being vectorized using a shared hash space)
The type of the parent feature(s) for the column
The name of the group a column belongs to (usually the parent feature, but in the case of Maps, this is the keys). Every other column in the same vector that has this grouping should be mutually exclusive with this one. If there is no grouping, this field is None
A name for a binary indicator value (for example, a null indicator or the result of a pivot), otherwise None; e.g. this is None when the column is from a numeric group that is not pivoted
A name for a value that is continuous (not a binary indicator), e.g. for geolocation (lat, lon, accuracy) or, for dates that have been converted to a circular representation, the time window and the x or y coordinate; otherwise None
Index of the vector column this info is associated with (this is updated when OpVectorColumnMetadata is passed into OpVectorMetadata)
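To illustrate the note that the index is updated when column metadata is passed into the vector metadata, here is a minimal sketch under stated assumptions. `ColumnMetaSketch`, `VectorMetaSketch`, `fromColumns`, and the sample values are hypothetical and only mimic the behavior described above; they are not the library's classes or signatures.

```scala
object ColumnMetadataExample {
  // Hypothetical per-column metadata (names are assumptions); index defaults to 0 until assigned.
  final case class ColumnMetaSketch(
    parentFeatureName: Seq[String], // usually one parent, several for a shared hash space
    parentFeatureType: Seq[String], // type of each parent feature
    grouping: Option[String],       // group of mutually exclusive columns, if any
    indicatorValue: Option[String], // binary indicator name, otherwise None
    descriptorValue: Option[String],// continuous value name, otherwise None
    index: Int = 0
  )

  // Hypothetical wrapper holding the columns of one vector.
  final case class VectorMetaSketch(name: String, columns: Seq[ColumnMetaSketch])

  object VectorMetaSketch {
    // Re-assigns each column's index to its position in the vector.
    def fromColumns(name: String, cols: Seq[ColumnMetaSketch]): VectorMetaSketch =
      VectorMetaSketch(name, cols.zipWithIndex.map { case (c, i) => c.copy(index = i) })
  }

  // Illustrative columns, all created with the default index.
  val sample: Seq[ColumnMetaSketch] = Seq(
    ColumnMetaSketch(Seq("color"), Seq("PickList"), Some("color"), Some("red"), None),
    ColumnMetaSketch(Seq("color"), Seq("PickList"), Some("color"), Some("NullIndicator"), None),
    // Two Text parents hashed into a shared space: no grouping and no indicator value.
    ColumnMetaSketch(Seq("title", "body"), Seq("Text", "Text"), None, None, None)
  )
}
```

In this sketch, `ColumnMetadataExample.VectorMetaSketch.fromColumns("features", ColumnMetadataExample.sample)` returns columns whose indices follow their order in the vector, regardless of the index each column carried beforehand.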
Represents a metadata wrapper that includes parent feature information.
The metadata includes a columns field that describes the columns in the vector.
Spark stage metrics container for an org.apache.spark.scheduler.StageInfo. Note: all the time values are in milliseconds.
Convenience methods for working with Spark's job groups.
Dataset enrichment functions
Various Evaluator helper functions
org.apache.spark.sql.Row enrichment functions
org.apache.spark.ml.linalg.Vector enrichment functions
A factory for Spark sequence aggregators
App metrics container. Contains the app info, all the stage metrics computed by the spark listener and project version info.