Package com.salesforce.op.utils.spark

Type Members

  1. case class AppMetrics(appName: String, appId: String, runType: String, customTagName: Option[String], customTagValue: Option[String], appStartTime: Long, appEndTime: Long, appDuration: Long, stageMetrics: Seq[StageMetrics], cumulativeStageMetrics: CumulativeStageMetrics, versionInfo: VersionInfo) extends MetricJsonLike with Product with Serializable

    App metrics container. Contains the app info, all the stage metrics computed by the Spark listener, and the project version info.

  2. trait BaseStageMetrics extends AnyRef

  3. case class CumulativeStageMetrics(numTasks: Int, numAccumulables: Int, executorRunTime: Long, executorCpuTime: Long, executorDeserializeTime: Long, executorDeserializeCpuTime: Long, resultSerializationTime: Long, jvmGCTime: Long, resultSizeBytes: Long, numUpdatedBlockStatuses: Int, diskBytesSpilled: Long, memoryBytesSpilled: Long, peakExecutionMemory: Max[Long], recordsRead: Long, bytesRead: Long, recordsWritten: Long, bytesWritten: Long, shuffleFetchWaitTime: Long, shuffleTotalBytesRead: Long, shuffleTotalBlocksFetched: Long, shuffleLocalBlocksFetched: Long, shuffleRemoteBlocksFetched: Long, shuffleWriteTime: Long, shuffleBytesWritten: Long, shuffleRecordsWritten: Long, duration: Option[Long] = None) extends BaseStageMetrics with MetricJsonLike with Product with Serializable
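
    The field list suggests that the cumulative metrics aggregate the per-stage metrics: counters are summed, while peakExecutionMemory is kept as a maximum (its type is Max[Long]). A minimal sketch, assuming exactly that rule, follows. MiniStageMetrics, MiniCumulative, addStage and cumulate are hypothetical, trimmed-down stand-ins; the real classes are not called.

```scala
// Hypothetical, trimmed-down per-stage record (the real StageMetrics has many more fields).
final case class MiniStageMetrics(
    numTasks: Int,
    executorRunTime: Long,     // milliseconds
    peakExecutionMemory: Long, // bytes
    recordsRead: Long
)

// Hypothetical cumulative record: counters are summed, peak memory keeps the maximum.
final case class MiniCumulative(
    numTasks: Int,
    executorRunTime: Long,
    peakExecutionMemory: Long,
    recordsRead: Long
)

val emptyCumulative = MiniCumulative(0, 0L, 0L, 0L)

// Fold one stage into the running totals.
def addStage(acc: MiniCumulative, s: MiniStageMetrics): MiniCumulative =
  MiniCumulative(
    numTasks = acc.numTasks + s.numTasks,
    executorRunTime = acc.executorRunTime + s.executorRunTime,
    peakExecutionMemory = math.max(acc.peakExecutionMemory, s.peakExecutionMemory),
    recordsRead = acc.recordsRead + s.recordsRead
  )

// Aggregate all stages of an application, starting from the empty record.
def cumulate(stages: Seq[MiniStageMetrics]): MiniCumulative =
  stages.foldLeft(emptyCumulative)((acc, s) => addStage(acc, s))

val total = cumulate(Seq(
  MiniStageMetrics(numTasks = 4, executorRunTime = 1200L, peakExecutionMemory = 512L, recordsRead = 1000L),
  MiniStageMetrics(numTasks = 2, executorRunTime = 300L, peakExecutionMemory = 2048L, recordsRead = 250L)
))
// total == MiniCumulative(6, 1500, 2048, 1250)
```

    Summing is order-independent, and so is taking a maximum, so stages can be folded in whatever order their completion events arrive.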

  4. trait MetricJsonLike extends JsonLike

  5. class OpSparkListener extends SparkListener

    Logs and collects metrics upon completion of the Spark application and of its jobs and stages.

  6. sealed abstract class OpStep extends EnumEntry with Serializable

  7. case class OpVectorColumnHistory(columnName: String, parentFeatureName: Seq[String], parentFeatureOrigins: Seq[String], parentFeatureStages: Seq[String], parentFeatureType: Seq[String], grouping: Option[String], indicatorValue: Option[String], descriptorValue: Option[String], index: Int) extends JsonLike with Product with Serializable

    Full history of each column element in a vector.

    columnName

    Name of the feature in the column

    parentFeatureName

    Name(s) of the immediate parent feature(s) used to create the vector

    parentFeatureOrigins

    names of raw features that went into the parent feature

    parentFeatureStages

    Stage names of all stages applied to the parent feature before its conversion to a vector

    parentFeatureType

    Type(s) of the parent feature(s)

    grouping

    The name of the group a column belongs to (usually the parent feature, but in the case of TextMapVectorizer, this also includes the map keys). Every other column in the same vector that has this same indicator group should be mutually exclusive with this one. If this column is not an indicator, this field is None.

    indicatorValue

    The name of a binary indicator value (e.g. a null indicator or the result of a pivot), otherwise None

    descriptorValue

    The name of a continuous (non-binary) value, e.g. lat, lon, or accuracy for a geolocation, or, for dates converted to a circular representation, the time window and the x or y coordinate; otherwise None

    index

    the index of the vector column this information is tied to

  8. case class OpVectorColumnMetadata(parentFeatureName: Seq[String], parentFeatureType: Seq[String], grouping: Option[String], indicatorValue: Option[String] = None, descriptorValue: Option[String] = None, index: Int = 0) extends JsonLike with Product with Serializable

    Represents the metadata of a column in a vector.

    Because we expect every vector column to have been produced by some vectorization process, we provide the name of the feature that led to this column.

    Also note that each column's indicator value should be unique, meaning that the columns represent mutually exclusive values. A hashing vectorizer, for instance, does not produce mutually exclusive values.

    parentFeatureName

    The name of the parent feature(s) for the column. Usually a column has one parent feature, but it can have several (e.g. when multiple Text columns are vectorized using a shared hash space)

    parentFeatureType

    The type of the parent feature(s) for the column

    grouping

    The name of the group a column belongs to (usually the parent feature; in the case of maps, the map keys). Every other column in the same vector that has this grouping should be mutually exclusive with this one. If there is no grouping, this field is None.

    indicatorValue

    The name of a binary indicator value (e.g. a null indicator or the result of a pivot), otherwise None. For example, this is None when the column comes from a numeric group that is not pivoted.

    descriptorValue

    The name of a continuous (non-binary) value, e.g. lat, lon, or accuracy for a geolocation, or, for dates converted to a circular representation, the time window and the x or y coordinate; otherwise None

    index

    Index of the column in the vector that this info is associated with (updated when the OpVectorColumnMetadata is passed into an OpVectorMetadata)
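
    The grouping and indicatorValue rules above imply a simple invariant: in any row, at most one indicator column per grouping should be non-zero. The following is a minimal sketch of checking it, assuming a hypothetical ColumnMeta case class that keeps only the relevant fields of OpVectorColumnMetadata. indicatorsMutuallyExclusive is also hypothetical, and the column labels are chosen only for illustration.

```scala
// Hypothetical stand-in for OpVectorColumnMetadata with only the fields needed here.
final case class ColumnMeta(
    parentFeatureName: Seq[String],
    grouping: Option[String],
    indicatorValue: Option[String] = None,
    index: Int = 0
)

// Within each grouping, at most one indicator column may be non-zero in a given row.
def indicatorsMutuallyExclusive(columns: Seq[ColumnMeta], row: Array[Double]): Boolean =
  columns
    .filter(c => c.grouping.isDefined && c.indicatorValue.isDefined) // indicator columns only
    .groupBy(_.grouping)
    .values
    .forall(group => group.count(c => row(c.index) != 0.0) <= 1)

// A pivoted feature "color" (two pivot columns plus a null-indicator column),
// followed by a numeric column "age" that is not an indicator.
val cols = Seq(
  ColumnMeta(Seq("color"), Some("color"), Some("red"), index = 0),
  ColumnMeta(Seq("color"), Some("color"), Some("blue"), index = 1),
  ColumnMeta(Seq("color"), Some("color"), Some("null"), index = 2),
  ColumnMeta(Seq("age"), None, None, index = 3)
)

val ok = indicatorsMutuallyExclusive(cols, Array(0.0, 1.0, 0.0, 42.0))  // one "color" indicator set
val bad = indicatorsMutuallyExclusive(cols, Array(1.0, 1.0, 0.0, 42.0)) // two "color" indicators set
```

    A hashing vectorizer's columns would fail this check by design, which is why such columns are not described as mutually exclusive indicators.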

  9. class OpVectorMetadata extends AnyRef

    Represents a metadata wrapper that includes parent feature information.

    The metadata includes a columns field that describes the columns in the vector.
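
    The OpVectorColumnMetadata docs note that a column's index is updated when it is passed into OpVectorMetadata. One reading is that the wrapper renumbers its columns by position. The sketch below assumes that behaviour. VectorMeta and ColumnMeta are hypothetical stand-ins, not the library's API.

```scala
// Hypothetical stand-in for OpVectorColumnMetadata.
final case class ColumnMeta(
    parentFeatureName: Seq[String],
    grouping: Option[String],
    indicatorValue: Option[String] = None,
    index: Int = 0
)

// Hypothetical wrapper: the columns field describes every column of the vector,
// and each column's index is overwritten with its position in that field.
final class VectorMeta(val name: String, cols: Seq[ColumnMeta]) {
  val columns: Array[ColumnMeta] =
    cols.zipWithIndex.map { case (c, i) => c.copy(index = i) }.toArray
  def size: Int = columns.length
}

// Columns built independently per feature all start with the default index 0.
val colorCols = Seq(
  ColumnMeta(Seq("color"), Some("color"), Some("red")),
  ColumnMeta(Seq("color"), Some("color"), Some("blue"))
)
val ageCols = Seq(ColumnMeta(Seq("age"), None))

val meta = new VectorMeta("features", colorCols ++ ageCols)
// meta.columns.map(_.index) contains 0, 1, 2
```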

  10. case class StageMetrics extends BaseStageMetrics with MetricJsonLike with Product with Serializable

    Spark stage metrics container for an org.apache.spark.scheduler.StageInfo. Note: all time values are in milliseconds.
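
    Spark's TaskMetrics reports some timers in nanoseconds, such as executorCpuTime, executorDeserializeCpuTime and the shuffle write time. Others, such as executorRunTime, are reported in milliseconds. Reporting every time value in milliseconds therefore implies a conversion step. The sketch below shows one such normalization. RawStageTimes, StageTimesMs and toMillis are hypothetical names, not the real StageMetrics fields or code.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical raw timings, in the units Spark's TaskMetrics uses for them.
final case class RawStageTimes(
    executorRunTimeMs: Long,  // already milliseconds
    executorCpuTimeNs: Long,  // nanoseconds
    shuffleWriteTimeNs: Long  // nanoseconds
)

// Hypothetical normalized view with every time value in milliseconds.
final case class StageTimesMs(
    executorRunTime: Long,
    executorCpuTime: Long,
    shuffleWriteTime: Long
)

// Convert the nanosecond timers; TimeUnit.toMillis truncates toward zero.
def toMillis(raw: RawStageTimes): StageTimesMs = StageTimesMs(
  executorRunTime = raw.executorRunTimeMs,
  executorCpuTime = TimeUnit.NANOSECONDS.toMillis(raw.executorCpuTimeNs),
  shuffleWriteTime = TimeUnit.NANOSECONDS.toMillis(raw.shuffleWriteTimeNs)
)

val ms = toMillis(RawStageTimes(1500L, 1234567890L, 5000000L))
// ms == StageTimesMs(1500, 1234, 5)
```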

Value Members

  1. object CumulativeStageMetrics extends Serializable

  2. object JobGroupUtil

    Convenience methods for working with Spark's job groups.
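
    Spark's SparkContext exposes setJobGroup and clearJobGroup. A common convenience wraps a block so that the group is always cleared, even on failure. The sketch assumes JobGroupUtil offers a helper of this kind. withJobGroup and its function parameters are hypothetical: they stand in for the SparkContext calls so the example runs without Spark.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: runs `block` inside a job group and always clears the group
// afterwards, even if `block` throws. `setGroup` and `clearGroup` stand in for
// SparkContext.setJobGroup and SparkContext.clearJobGroup.
def withJobGroup[T](groupId: String, description: String)(
    setGroup: (String, String) => Unit,
    clearGroup: () => Unit
)(block: => T): T = {
  setGroup(groupId, description)
  try block
  finally clearGroup()
}

// Record the calls so their ordering can be observed without Spark.
val log = ArrayBuffer.empty[String]
val result = withJobGroup("example-group", "illustrative description")(
  (id, _) => { log += s"set:$id"; () },
  () => { log += "clear"; () }
) {
  log += "work"
  42
}
// result == 42; log holds "set:example-group", "work", "clear"
```

    Using try/finally rather than clearing after the block guarantees that a failed job does not leave its group attached to later jobs on the same thread.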

  3. object OpSparkListener

  4. object OpStep extends Enum[OpStep] with Serializable

  5. object OpVectorColumnHistory extends Product with Serializable

  6. object OpVectorColumnMetadata extends Serializable

  7. object OpVectorMetadata

  8. object RichDataType

  9. object RichDataset

    Dataset enrichment functions

  10. object RichEvaluator extends Product with Serializable

    Various Evaluator helper functions

  11. object RichMetadata

  12. object RichParamMap

  13. object RichRDD

  14. object RichRow

    org.apache.spark.sql.Row enrichment functions

  15. object RichStructType

  16. object RichVector

    org.apache.spark.ml.linalg.Vector enrichment functions

  17. object SequenceAggregators

    A factory for Spark sequence aggregators
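
    Spark's typed Aggregator is built from zero, reduce, merge and finish steps. The sketch below uses an element-wise sum over equal-length Seq[Double] values as an example of a sequence aggregation; whether SequenceAggregators provides this exact aggregator is an assumption. SumSeqAggregator is hypothetical and does not extend Spark's Aggregator, so it runs without Spark.

```scala
// Hypothetical element-wise sum over sequences of a fixed length, written with the
// same four steps (zero, reduce, merge, finish) that Spark's typed Aggregator uses.
final class SumSeqAggregator(size: Int) {
  // Empty buffer: one running sum per position.
  def zero: Vector[Double] = Vector.fill(size)(0.0)

  // Add one input row to a partial buffer.
  def reduce(buf: Vector[Double], row: Seq[Double]): Vector[Double] = {
    require(row.length == size, s"expected $size values, got ${row.length}")
    buf.zip(row).map { case (a, b) => a + b }
  }

  // Combine partial buffers computed on different partitions.
  def merge(b1: Vector[Double], b2: Vector[Double]): Vector[Double] =
    b1.zip(b2).map { case (a, b) => a + b }

  // Produce the final result from the buffer.
  def finish(buf: Vector[Double]): Seq[Double] = buf
}

val agg = new SumSeqAggregator(3)
// Simulate two partitions reduced independently, then merged.
val part1 = Seq(Seq(1.0, 2.0, 3.0), Seq(0.5, 0.5, 0.5)).foldLeft(agg.zero)((b, r) => agg.reduce(b, r))
val part2 = Seq(Seq(10.0, 0.0, 1.0)).foldLeft(agg.zero)((b, r) => agg.reduce(b, r))
val total = agg.finish(agg.merge(part1, part2))
// total == Seq(11.5, 2.5, 4.5)
```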

  18. object StageMetrics extends Serializable
