spark

Type Members

case class AppMetrics(appName: String, appId: String, runType: String, customTagName: Option[String], customTagValue: Option[String], appStartTime: Long, appEndTime: Long, appDuration: Long, stageMetrics: Seq[StageMetrics], cumulativeStageMetrics: CumulativeStageMetrics, versionInfo: VersionInfo) extends MetricJsonLike with Product with Serializable

App metrics container. Contains the app info, all the stage metrics computed by the Spark listener, and the project version info.
trait BaseStageMetrics extends AnyRef
case class CumulativeStageMetrics(numTasks: Int, numAccumulables: Int, executorRunTime: Long, executorCpuTime: Long, executorDeserializeTime: Long, executorDeserializeCpuTime: Long, resultSerializationTime: Long, jvmGCTime: Long, resultSizeBytes: Long, numUpdatedBlockStatuses: Int, diskBytesSpilled: Long, memoryBytesSpilled: Long, peakExecutionMemory: Max[Long], recordsRead: Long, bytesRead: Long, recordsWritten: Long, bytesWritten: Long, shuffleFetchWaitTime: Long, shuffleTotalBytesRead: Long, shuffleTotalBlocksFetched: Long, shuffleLocalBlocksFetched: Long, shuffleRemoteBlocksFetched: Long, shuffleWriteTime: Long, shuffleBytesWritten: Long, shuffleRecordsWritten: Long, duration: Option[Long] = None) extends BaseStageMetrics with MetricJsonLike with Product with Serializable
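
The `Max[Long]` type on `peakExecutionMemory` suggests that peaks are combined by taking the maximum, while the other counters would be summed. The sketch below illustrates that idea only. `StageSlice` is a hypothetical three-field subset of the fields listed above, and `combine` is an assumed aggregation rule; how the library actually aggregates is not shown on this page.

```scala
// Hypothetical subset of the metric fields listed above (not the library's class).
final case class StageSlice(numTasks: Int, executorRunTime: Long, peakExecutionMemory: Long)

object CumulativeSketch {
  // Identity element for the fold below.
  val empty: StageSlice = StageSlice(0, 0L, 0L)

  // Assumed rule: counters and times add up, the memory peak is the maximum seen.
  def combine(a: StageSlice, b: StageSlice): StageSlice = StageSlice(
    numTasks = a.numTasks + b.numTasks,
    executorRunTime = a.executorRunTime + b.executorRunTime,
    peakExecutionMemory = math.max(a.peakExecutionMemory, b.peakExecutionMemory)
  )

  // Folds per-stage values into one cumulative value.
  def cumulative(stages: Seq[StageSlice]): StageSlice =
    stages.foldLeft(empty)(combine)
}
```

Under this assumed rule, summing the peak would overstate memory use, because stages do not necessarily hold their peak at the same time.
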
trait MetricJsonLike extends JsonLike
class OpSparkListener extends SparkListener

Logs and collects metrics upon completion of Spark applications, jobs, and stages
sealed abstract class OpStep extends EnumEntry with Serializable
case class OpVectorColumnHistory(columnName: String, parentFeatureName: Seq[String], parentFeatureOrigins: Seq[String], parentFeatureStages: Seq[String], parentFeatureType: Seq[String], grouping: Option[String], indicatorValue: Option[String], descriptorValue: Option[String], index: Int) extends JsonLike with Product with Serializable

Full history for each column element in a vector

columnName

name for feature in column

parentFeatureName

name of immediate parent feature that was used to create the vector

parentFeatureOrigins

names of raw features that went into the parent feature

parentFeatureStages

stageNames of all stages applied to the parent feature before conversion to a vector

parentFeatureType

type of the parent feature

grouping

The name of the group a column belongs to (usually the parent feature, but in the case of TextMapVectorizer this includes the keys in maps too). Every other column in the same vector that has the same indicator group should be mutually exclusive with this one. If this is not an indicator, then this field is None

indicatorValue

A name for a binary indicator value (for example a null indicator or the result of a pivot), otherwise None

descriptorValue

A name for a value that is continuous (not a binary indicator), e.g. lat, lon, or accuracy for a geolocation, or the time window and x or y coordinate for a date that has been converted to a circular representation; otherwise None

index

the index of the vector column this information is tied to
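
As a rough illustration of how these fields fit together, the sketch below builds the history for a three-column vector and looks up which vector indices trace back to a given raw feature. `ColumnHistory` is a hypothetical stand-in that mirrors the constructor signature above without the JsonLike mixin, so the example runs without the library. All feature names, stage names, type names, and column names are assumptions made up for the example.

```scala
// Hypothetical stand-in mirroring the OpVectorColumnHistory signature shown above.
final case class ColumnHistory(
  columnName: String,
  parentFeatureName: Seq[String],
  parentFeatureOrigins: Seq[String],
  parentFeatureStages: Seq[String],
  parentFeatureType: Seq[String],
  grouping: Option[String],
  indicatorValue: Option[String],
  descriptorValue: Option[String],
  index: Int
)

object ColumnHistoryExample {
  // Illustrative history for a three-column vector (all names are assumptions).
  val history: Seq[ColumnHistory] = Seq(
    // Two indicator columns produced by pivoting a categorical feature.
    ColumnHistory("color_red_0", Seq("color"), Seq("color"), Seq("pivotStage"),
      Seq("PickList"), Some("color"), Some("red"), None, 0),
    ColumnHistory("color_blue_1", Seq("color"), Seq("color"), Seq("pivotStage"),
      Seq("PickList"), Some("color"), Some("blue"), None, 1),
    // A continuous column: a descriptor value instead of an indicator value.
    ColumnHistory("location_lat_2", Seq("location"), Seq("latitude", "longitude"),
      Seq("combineStage", "geoStage"), Seq("Geolocation"), Some("location"), None, Some("lat"), 2)
  )

  // Indices of all vector columns whose parent feature was derived from the given raw feature.
  def indicesFromRawFeature(raw: String): Seq[Int] =
    history.filter(_.parentFeatureOrigins.contains(raw)).map(_.index)
}
```

A lookup such as `ColumnHistoryExample.indicesFromRawFeature("color")` selects the two pivoted columns, which is the kind of tracing the history fields are meant to support.
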
case class OpVectorColumnMetadata(parentFeatureName: Seq[String], parentFeatureType: Seq[String], grouping: Option[String], indicatorValue: Option[String] = None, descriptorValue: Option[String] = None, index: Int = 0) extends JsonLike with Product with Serializable

Represents the metadata of a column in a vector.

Because we expect every vector column to have been produced by some vectorization process, we provide the name of the feature that led to this column.

Also note that indicator values should be unique across columns, meaning that they represent mutually exclusive values. A hashing vectorizer, for instance, does not produce mutually exclusive values.

parentFeatureName

The name of the parent feature(s) for the column. Usually a column has one parent feature, but it can have many (e.g. when multiple Text columns are vectorized using a shared hash space)

parentFeatureType

The type of the parent feature(s) for the column

grouping

The name of the group a column belongs to (usually the parent feature, but in the case of Maps, this is the keys). Every other column in the same vector that has this grouping should be mutually exclusive with this one. If there is no grouping then this field is None

indicatorValue

A name for a binary indicator value (for example a null indicator or the result of a pivot), otherwise None. For example, this is None when the column is from a numeric group that is not pivoted

descriptorValue

A name for a value that is continuous (not a binary indicator), e.g. lat, lon, or accuracy for a geolocation, or the time window and x or y coordinate for a date that has been converted to a circular representation; otherwise None

index

Index of the vector column this info is associated with (this is updated when OpVectorColumnMetadata is passed into OpVectorMetadata)
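
To make the grouping and indicatorValue fields concrete, here is a minimal sketch. `ColumnMeta` is a hypothetical stand-in that mirrors the constructor signature above without the JsonLike mixin, so it runs without the library on the classpath. The feature names, type names, pivot values, and the null-indicator label are illustrative assumptions, and `indicatorsAreDistinct` is a made-up helper, not part of the library.

```scala
// Hypothetical stand-in mirroring the OpVectorColumnMetadata signature shown above.
final case class ColumnMeta(
  parentFeatureName: Seq[String],
  parentFeatureType: Seq[String],
  grouping: Option[String],
  indicatorValue: Option[String] = None,
  descriptorValue: Option[String] = None,
  index: Int = 0
)

object ColumnMetaExample {
  // A categorical feature pivoted into three indicator columns (values are assumptions).
  val pivoted: Seq[ColumnMeta] =
    Seq("red", "blue", "NullIndicator").zipWithIndex.map { case (value, i) =>
      ColumnMeta(
        parentFeatureName = Seq("color"),
        parentFeatureType = Seq("PickList"),
        grouping = Some("color"),
        indicatorValue = Some(value),
        index = i
      )
    }

  // A numeric column that was not pivoted: it carries no indicator value.
  val numeric: ColumnMeta = ColumnMeta(
    parentFeatureName = Seq("age"),
    parentFeatureType = Seq("Real"),
    grouping = Some("age"),
    index = pivoted.size
  )

  // Columns sharing a grouping are expected to be mutually exclusive,
  // so their indicator values should be distinct within each group.
  def indicatorsAreDistinct(columns: Seq[ColumnMeta]): Boolean =
    columns
      .filter(c => c.grouping.isDefined && c.indicatorValue.isDefined)
      .groupBy(_.grouping)
      .values
      .forall { group =>
        val values = group.flatMap(_.indicatorValue)
        values.distinct.size == values.size
      }
}
```

The defaults on `indicatorValue`, `descriptorValue`, and `index` let the numeric column be declared with only the fields that apply to it.
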
class OpVectorMetadata extends AnyRef

Represents a metadata wrapper that includes parent feature information.

The metadata includes a columns field that describes the columns in the vector.
case class StageMetrics extends BaseStageMetrics with MetricJsonLike with Product with Serializable

Spark stage metrics container for an org.apache.spark.scheduler.StageInfo. Note: all time values are in milliseconds.

Value Members

object CumulativeStageMetrics extends Serializable
object JobGroupUtil

Convenience methods for working with Spark's job groups.
object OpSparkListener
object OpStep extends Enum[OpStep] with Serializable
object OpVectorColumnHistory extends Product with Serializable
object OpVectorColumnMetadata extends Serializable
object OpVectorMetadata
object RichDataType
object RichDataset

Dataset enrichment functions
object RichEvaluator extends Product with Serializable

Various Evaluator helper functions
object RichMetadata
object RichParamMap
object RichRDD
object RichRow

org.apache.spark.sql.Row enrichment functions
object RichStructType
object RichVector

org.apache.spark.ml.linalg.Vector enrichment functions
object SequenceAggregators

A factory for Spark sequence aggregators
object StageMetrics extends Serializable
