
Commit 87bd1f9

SPARK-1093: Annotate developer and experimental API's
This patch marks some existing classes as private[spark] and adds two types of API annotations:

- `EXPERIMENTAL API` = experimental user-facing module
- `DEVELOPER API - UNSTABLE` = developer-facing API that might change

There is some discussion of the different mechanisms for doing this here: https://issues.apache.org/jira/browse/SPARK-1081

I was pretty aggressive with marking things private. Keep in mind that if we want to open something up in the future we can, but we can never reduce visibility. A few notes here:

- In the past we've been inconsistent with the visibility of the X-RDD classes. This patch marks them private whenever there is an existing function in RDD that can directly create them (e.g. CoalescedRDD and rdd.coalesce()). One trade-off is that users can't subclass them.
- Noted that compression and serialization formats don't have to be wire compatible across versions.
- Compression codecs and serialization formats are semi-private, as users typically don't instantiate them directly.
- Metrics sources are made private - users only interact with them through Spark's reflection.

Author: Patrick Wendell <[email protected]>
Author: Andrew Or <[email protected]>

Closes #274 from pwendell/private-apis and squashes the following commits:

44179e4 [Patrick Wendell] Merge remote-tracking branch 'apache-github/master' into private-apis
042c803 [Patrick Wendell] spark.annotations -> spark.annotation
bfe7b52 [Patrick Wendell] Adding experimental for approximate counts
8d0c873 [Patrick Wendell] Warning in SparkEnv
99b223a [Patrick Wendell] Cleaning up annotations
e849f64 [Patrick Wendell] Merge pull request #2 from andrewor14/annotations
982a473 [Andrew Or] Generalize jQuery matching for non Spark-core API docs
a01c076 [Patrick Wendell] Merge pull request #1 from andrewor14/annotations
c1bcb41 [Andrew Or] DeveloperAPI -> DeveloperApi
0d48908 [Andrew Or] Comments and new lines (minor)
f3954e0 [Andrew Or] Add identifier tags in comments to work around scaladocs bug
99192ef [Andrew Or] Dynamically add badges based on annotations
824011b [Andrew Or] Add support for injecting arbitrary JavaScript to API docs
037755c [Patrick Wendell] Some changes after working with andrew or
f7d124f [Patrick Wendell] Small fixes
c318b24 [Patrick Wendell] Use CSS styles
e4c76b9 [Patrick Wendell] Logging
f390b13 [Patrick Wendell] Better visibility for workaround constructors
d6b0afd [Patrick Wendell] Small chang to existing constructor
403ba52 [Patrick Wendell] Style fix
870a7ba [Patrick Wendell] Work around for SI-8479
7fb13b2 [Patrick Wendell] Changes to UnionRDD and EmptyRDD
4a9e90c [Patrick Wendell] EXPERIMENTAL API --> EXPERIMENTAL
c581dce [Patrick Wendell] Changes after building against Shark.
8452309 [Patrick Wendell] Style fixes
1ed27d2 [Patrick Wendell] Formatting and coloring of badges
cd7a465 [Patrick Wendell] Code review feedback
2f706f1 [Patrick Wendell] Don't use floats
542a736 [Patrick Wendell] Small fixes
cf23ec6 [Patrick Wendell] Marking GraphX as alpha
d86818e [Patrick Wendell] Another naming change
5a76ed6 [Patrick Wendell] More visiblity clean-up
42c1f09 [Patrick Wendell] Using better labels
9d48cbf [Patrick Wendell] Initial pass
1 parent 9689b66 commit 87bd1f9

84 files changed: +614, -130 lines

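All of the per-file changes below follow one pattern: annotate the declaration and put a matching `:: DeveloperApi ::` or `:: Experimental ::` tag on the first line of its scaladoc, which the doc tooling uses to inject a badge (see the "identifier tags" and "badges" commits above). A minimal sketch of that pattern; the two classes here are hypothetical, not part of this patch:

import org.apache.spark.annotation.{DeveloperApi, Experimental}

/**
 * :: DeveloperApi ::
 * A developer-facing extension point whose API may change between releases.
 */
@DeveloperApi
class CustomSchedulerHook  // hypothetical class, for illustration only

/**
 * :: Experimental ::
 * A user-facing feature that is still stabilizing.
 */
@Experimental
class ApproximateSummary  // hypothetical class, for illustration only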

core/src/main/scala/org/apache/spark/Aggregator.scala (+3)

@@ -17,15 +17,18 @@
 
 package org.apache.spark
 
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.util.collection.{AppendOnlyMap, ExternalAppendOnlyMap}
 
 /**
+ * :: DeveloperApi ::
  * A set of functions used to aggregate data.
  *
  * @param createCombiner function to create the initial value of the aggregation.
  * @param mergeValue function to merge a new value into the aggregation result.
  * @param mergeCombiners function to merge outputs from multiple mergeValue function.
  */
+@DeveloperApi
 case class Aggregator[K, V, C] (
     createCombiner: V => C,
     mergeValue: (C, V) => C,
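For reference, a hedged sketch of how code written against this developer API might build an Aggregator; the sum-of-Ints combine logic is illustrative, not from the patch:

import org.apache.spark.Aggregator

// Per-key combine of Int values into Int partial sums (illustrative types).
val sumAggregator = new Aggregator[String, Int, Int](
  createCombiner = (v: Int) => v,                  // first value seen for a key
  mergeValue = (c: Int, v: Int) => c + v,          // fold another value into the combiner
  mergeCombiners = (c1: Int, c2: Int) => c1 + c2   // merge partial results across partitions
)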

core/src/main/scala/org/apache/spark/Dependency.scala (+11)

@@ -17,19 +17,24 @@
 
 package org.apache.spark
 
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.rdd.RDD
 import org.apache.spark.serializer.Serializer
 
 /**
+ * :: DeveloperApi ::
  * Base class for dependencies.
  */
+@DeveloperApi
 abstract class Dependency[T](val rdd: RDD[T]) extends Serializable
 
 
 /**
+ * :: DeveloperApi ::
  * Base class for dependencies where each partition of the parent RDD is used by at most one
  * partition of the child RDD. Narrow dependencies allow for pipelined execution.
  */
+@DeveloperApi
 abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd) {
   /**
    * Get the parent partitions for a child partition.
@@ -41,13 +46,15 @@ abstract class NarrowDependency[T](rdd: RDD[T]) extends Dependency(rdd) {
 
 
 /**
+ * :: DeveloperApi ::
  * Represents a dependency on the output of a shuffle stage.
  * @param rdd the parent RDD
  * @param partitioner partitioner used to partition the shuffle output
  * @param serializer [[org.apache.spark.serializer.Serializer Serializer]] to use. If set to null,
  *                   the default serializer, as specified by `spark.serializer` config option, will
  *                   be used.
  */
+@DeveloperApi
 class ShuffleDependency[K, V](
     @transient rdd: RDD[_ <: Product2[K, V]],
     val partitioner: Partitioner,
@@ -61,20 +68,24 @@ class ShuffleDependency[K, V](
 
 
 /**
+ * :: DeveloperApi ::
  * Represents a one-to-one dependency between partitions of the parent and child RDDs.
  */
+@DeveloperApi
 class OneToOneDependency[T](rdd: RDD[T]) extends NarrowDependency[T](rdd) {
   override def getParents(partitionId: Int) = List(partitionId)
 }
 
 
 /**
+ * :: DeveloperApi ::
  * Represents a one-to-one dependency between ranges of partitions in the parent and child RDDs.
  * @param rdd the parent RDD
  * @param inStart the start of the range in the parent RDD
  * @param outStart the start of the range in the child RDD
  * @param length the length of the range
  */
+@DeveloperApi
 class RangeDependency[T](rdd: RDD[T], inStart: Int, outStart: Int, length: Int)
   extends NarrowDependency[T](rdd) {
 
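Because NarrowDependency is now explicitly a developer API, its expected audience is authors of custom RDDs. A hedged sketch of such a subclass, assuming getParents returns the parent partition ids as a Seq[Int]; the dependency shape is made up for illustration:

import org.apache.spark.NarrowDependency
import org.apache.spark.rdd.RDD

// Hypothetical narrow dependency: child partition i reads parent partitions 2*i and 2*i + 1.
class PairwiseDependency[T](rdd: RDD[T]) extends NarrowDependency[T](rdd) {
  override def getParents(partitionId: Int): Seq[Int] =
    List(2 * partitionId, 2 * partitionId + 1)
}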

core/src/main/scala/org/apache/spark/FutureAction.scala (+7)

@@ -21,13 +21,16 @@ import scala.concurrent._
 import scala.concurrent.duration.Duration
 import scala.util.Try
 
+import org.apache.spark.annotation.Experimental
 import org.apache.spark.rdd.RDD
 import org.apache.spark.scheduler.{JobFailed, JobSucceeded, JobWaiter}
 
 /**
+ * :: Experimental ::
  * A future for the result of an action to support cancellation. This is an extension of the
  * Scala Future interface to support cancellation.
  */
+@Experimental
 trait FutureAction[T] extends Future[T] {
   // Note that we redefine methods of the Future trait here explicitly so we can specify a different
   // documentation (with reference to the word "action").
@@ -84,9 +87,11 @@ trait FutureAction[T] extends Future[T] {
 
 
 /**
+ * :: Experimental ::
  * A [[FutureAction]] holding the result of an action that triggers a single job. Examples include
  * count, collect, reduce.
  */
+@Experimental
 class SimpleFutureAction[T] private[spark](jobWaiter: JobWaiter[_], resultFunc: => T)
   extends FutureAction[T] {
 
@@ -148,10 +153,12 @@ class SimpleFutureAction[T] private[spark](jobWaiter: JobWaiter[_], resultFunc:
 
 
 /**
+ * :: Experimental ::
  * A [[FutureAction]] for actions that could trigger multiple Spark jobs. Examples include take,
  * takeSample. Cancellation works by setting the cancelled flag to true and interrupting the
  * action thread if it is being blocked by a job.
  */
+@Experimental
 class ComplexFutureAction[T] extends FutureAction[T] {
 
   // Pointer to the thread that is executing the action. It is set when the action is run.
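A sketch of how these Experimental types surface to users, assuming the asynchronous RDD actions (e.g. countAsync) reachable through the implicits in org.apache.spark.SparkContext; that import path is an assumption for this era of the API:

import scala.concurrent.Await
import scala.concurrent.duration.Duration

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // assumed: brings the async action implicits into scope

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("future-action-demo"))

// countAsync returns a FutureAction, i.e. a scala.concurrent.Future plus cancel().
val future = sc.parallelize(1 to 1000000).countAsync()
// future.cancel()  // would abort the underlying Spark job while it is still running

val count = Await.result(future, Duration.Inf)
println(s"count = $count")
sc.stop()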

core/src/main/scala/org/apache/spark/InterruptibleIterator.scala (+1, -1)

@@ -21,7 +21,7 @@ package org.apache.spark
  * An iterator that wraps around an existing iterator to provide task killing functionality.
  * It works by checking the interrupted flag in [[TaskContext]].
  */
-class InterruptibleIterator[+T](val context: TaskContext, val delegate: Iterator[T])
+private[spark] class InterruptibleIterator[+T](val context: TaskContext, val delegate: Iterator[T])
   extends Iterator[T] {
 
   def hasNext: Boolean = !context.interrupted && delegate.hasNext

core/src/main/scala/org/apache/spark/Logging.scala (+7)

@@ -21,11 +21,18 @@ import org.apache.log4j.{LogManager, PropertyConfigurator}
 import org.slf4j.{Logger, LoggerFactory}
 import org.slf4j.impl.StaticLoggerBinder
 
+import org.apache.spark.annotation.DeveloperApi
+
 /**
+ * :: DeveloperApi ::
  * Utility trait for classes that want to log data. Creates a SLF4J logger for the class and allows
  * logging messages at different levels using methods that only evaluate parameters lazily if the
  * log level is enabled.
+ *
+ * NOTE: DO NOT USE this class outside of Spark. It is intended as an internal utility.
+ *       This will likely be changed or removed in future releases.
  */
+@DeveloperApi
 trait Logging {
   // Make the log field transient so that objects with Logging can
   // be serialized and used on another machine
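For context, the trait is used by mixing it in; per the new NOTE it is meant only for Spark-internal code, so this sketch shows the internal pattern with a hypothetical class:

import org.apache.spark.Logging

// Hypothetical Spark-internal component. The log* methods take by-name messages,
// so the strings are only built when the corresponding log level is enabled.
private[spark] class CleanupTask extends Logging {
  def run(): Unit = {
    logInfo("Starting cleanup")
    logWarning("Cleanup took longer than expected")
  }
}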

core/src/main/scala/org/apache/spark/SerializableWritable.scala (+3)

@@ -23,6 +23,9 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.io.ObjectWritable
 import org.apache.hadoop.io.Writable
 
+import org.apache.spark.annotation.DeveloperApi
+
+@DeveloperApi
 class SerializableWritable[T <: Writable](@transient var t: T) extends Serializable {
   def value = t
   override def toString = t.toString
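A hedged sketch of the wrapper's typical role: carrying a Hadoop Writable (here a Configuration, which implements Writable but not java.io.Serializable) through Java serialization:

import org.apache.hadoop.conf.Configuration
import org.apache.spark.SerializableWritable

val hadoopConf = new Configuration()
val wrapped = new SerializableWritable(hadoopConf)  // safe to ship in a serialized closure

// After deserialization, value hands back the underlying Writable.
val restored: Configuration = wrapped.value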

core/src/main/scala/org/apache/spark/SparkContext.scala (+73, -13)

@@ -34,6 +34,7 @@ import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat, Job => NewHad
 import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat => NewFileInputFormat}
 import org.apache.mesos.MesosNativeLibrary
 
+import org.apache.spark.annotation.{DeveloperApi, Experimental}
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.deploy.{LocalSparkCluster, SparkHadoopUtil}
 import org.apache.spark.input.WholeTextFileInputFormat
@@ -48,22 +49,35 @@ import org.apache.spark.ui.SparkUI
 import org.apache.spark.util.{ClosureCleaner, MetadataCleaner, MetadataCleanerType, TimeStampedWeakValueHashMap, Utils}
 
 /**
+ * :: DeveloperApi ::
  * Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
  * cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
  *
  * @param config a Spark Config object describing the application configuration. Any settings in
  *   this config overrides the default configs as well as system properties.
- * @param preferredNodeLocationData used in YARN mode to select nodes to launch containers on. Can
- *   be generated using [[org.apache.spark.scheduler.InputFormatInfo.computePreferredLocations]]
- *   from a list of input files or InputFormats for the application.
  */
-class SparkContext(
-    config: SparkConf,
-    // This is used only by YARN for now, but should be relevant to other cluster types (Mesos,
-    // etc) too. This is typically generated from InputFormatInfo.computePreferredLocations. It
-    // contains a map from hostname to a list of input format splits on the host.
-    val preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map())
-  extends Logging {
+
+@DeveloperApi
+class SparkContext(config: SparkConf) extends Logging {
+
+  // This is used only by YARN for now, but should be relevant to other cluster types (Mesos,
+  // etc) too. This is typically generated from InputFormatInfo.computePreferredLocations. It
+  // contains a map from hostname to a list of input format splits on the host.
+  private[spark] var preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()
+
+  /**
+   * :: DeveloperApi ::
+   * Alternative constructor for setting preferred locations where Spark will create executors.
+   *
+   * @param preferredNodeLocationData used in YARN mode to select nodes to launch containers on. Can
+   *   be generated using [[org.apache.spark.scheduler.InputFormatInfo.computePreferredLocations]]
+   *   from a list of input files or InputFormats for the application.
+   */
+  @DeveloperApi
+  def this(config: SparkConf, preferredNodeLocationData: Map[String, Set[SplitInfo]]) = {
+    this(config)
+    this.preferredNodeLocationData = preferredNodeLocationData
+  }
 
   /**
    * Alternative constructor that allows setting common Spark properties directly
@@ -93,10 +107,45 @@ class SparkContext(
       environment: Map[String, String] = Map(),
       preferredNodeLocationData: Map[String, Set[SplitInfo]] = Map()) =
   {
-    this(SparkContext.updatedConf(new SparkConf(), master, appName, sparkHome, jars, environment),
-      preferredNodeLocationData)
+    this(SparkContext.updatedConf(new SparkConf(), master, appName, sparkHome, jars, environment))
+    this.preferredNodeLocationData = preferredNodeLocationData
   }
 
+  // NOTE: The below constructors could be consolidated using default arguments. Due to
+  // Scala bug SI-8479, however, this causes the compile step to fail when generating docs.
+  // Until we have a good workaround for that bug the constructors remain broken out.
+
+  /**
+   * Alternative constructor that allows setting common Spark properties directly
+   *
+   * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]).
+   * @param appName A name for your application, to display on the cluster web UI.
+   */
+  private[spark] def this(master: String, appName: String) =
+    this(master, appName, null, Nil, Map(), Map())
+
+  /**
+   * Alternative constructor that allows setting common Spark properties directly
+   *
+   * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]).
+   * @param appName A name for your application, to display on the cluster web UI.
+   * @param sparkHome Location where Spark is installed on cluster nodes.
+   */
+  private[spark] def this(master: String, appName: String, sparkHome: String) =
+    this(master, appName, sparkHome, Nil, Map(), Map())
+
+  /**
+   * Alternative constructor that allows setting common Spark properties directly
+   *
+   * @param master Cluster URL to connect to (e.g. mesos://host:port, spark://host:port, local[4]).
+   * @param appName A name for your application, to display on the cluster web UI.
+   * @param sparkHome Location where Spark is installed on cluster nodes.
+   * @param jars Collection of JARs to send to the cluster. These can be paths on the local file
+   *             system or HDFS, HTTP, HTTPS, or FTP URLs.
+   */
+  private[spark] def this(master: String, appName: String, sparkHome: String, jars: Seq[String]) =
+    this(master, appName, sparkHome, jars, Map(), Map())
+
   private[spark] val conf = config.clone()
 
   /**
@@ -189,7 +238,7 @@ class SparkContext(
     jars.foreach(addJar)
   }
 
-  def warnSparkMem(value: String): String = {
+  private def warnSparkMem(value: String): String = {
     logWarning("Using SPARK_MEM to set amount of memory to use per executor process is " +
       "deprecated, please use spark.executor.memory instead.")
     value
@@ -653,6 +702,9 @@ class SparkContext(
   def union[T: ClassTag](first: RDD[T], rest: RDD[T]*): RDD[T] =
     new UnionRDD(this, Seq(first) ++ rest)
 
+  /** Get an RDD that has no partitions or elements. */
+  def emptyRDD[T: ClassTag] = new EmptyRDD[T](this)
+
   // Methods for creating shared variables
 
   /**
@@ -716,6 +768,11 @@ class SparkContext(
     postEnvironmentUpdate()
   }
 
+  /**
+   * :: DeveloperApi ::
+   * Register a listener to receive up-calls from events that happen during execution.
+   */
+  @DeveloperApi
   def addSparkListener(listener: SparkListener) {
     listenerBus.addListener(listener)
   }
@@ -1021,8 +1078,10 @@ class SparkContext(
   }
 
   /**
+   * :: DeveloperApi ::
    * Run a job that can return approximate results.
    */
+  @DeveloperApi
   def runApproximateJob[T, U, R](
       rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
@@ -1040,6 +1099,7 @@ class SparkContext(
   /**
    * Submit a job for execution and return a FutureJob holding the result.
   */
+  @Experimental
  def submitJob[T, U, R](
      rdd: RDD[T],
      processPartition: Iterator[T] => U,
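A sketch exercising two of the additions above: the new emptyRDD method and the now-@Experimental submitJob. Only the first two submitJob parameters appear in the hunk, so the remaining parameter roles (partitions, resultHandler, resultFunc) are assumptions:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sc-demo"))

// New in this patch: an RDD with no partitions or elements.
val empty = sc.emptyRDD[Int]
println(empty.count())  // 0

// Experimental submitJob: process selected partitions and collect results via a handler.
val data = sc.parallelize(1 to 100, 4)
val partialSums = new Array[Int](2)
val future = sc.submitJob[Int, Int, Seq[Int]](
  data,
  (it: Iterator[Int]) => it.sum,                 // processPartition
  Seq(0, 1),                                     // only the first two partitions (assumed parameter)
  (index, sum) => partialSums(index) = sum,      // resultHandler (assumed parameter)
  partialSums.toSeq                              // resultFunc, produced when the job completes (assumed parameter)
)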

core/src/main/scala/org/apache/spark/SparkEnv.scala (+7, -1)

@@ -25,6 +25,7 @@ import scala.util.Properties
 import akka.actor._
 import com.google.common.collect.MapMaker
 
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.api.python.PythonWorkerFactory
 import org.apache.spark.broadcast.BroadcastManager
 import org.apache.spark.metrics.MetricsSystem
@@ -35,13 +36,18 @@ import org.apache.spark.storage._
 import org.apache.spark.util.{AkkaUtils, Utils}
 
 /**
+ * :: DeveloperApi ::
 * Holds all the runtime environment objects for a running Spark instance (either master or worker),
 * including the serializer, Akka actor system, block manager, map output tracker, etc. Currently
 * Spark code finds the SparkEnv through a thread-local variable, so each thread that accesses these
 * objects needs to have the right SparkEnv set. You can get the current environment with
 * SparkEnv.get (e.g. after creating a SparkContext) and set it with SparkEnv.set.
+ *
+ * NOTE: This is not intended for external use. This is exposed for Shark and may be made private
+ *       in a future release.
 */
-class SparkEnv private[spark] (
+@DeveloperApi
+class SparkEnv (
    val executorId: String,
    val actorSystem: ActorSystem,
    val serializer: Serializer,
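Opening SparkEnv up (with the warning above) means code such as Shark can reach the per-thread environment. A minimal sketch of that access pattern:

import org.apache.spark.SparkEnv

// SparkEnv.get returns the environment bound to the current thread:
// the driver's after a SparkContext is created, or the executor's inside a task.
val env = SparkEnv.get
val serializer = env.serializer
val blockManager = env.blockManager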

core/src/main/scala/org/apache/spark/TaskContext.scala (+6)

@@ -19,8 +19,14 @@ package org.apache.spark
 
 import scala.collection.mutable.ArrayBuffer
 
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.executor.TaskMetrics
 
+/**
+ * :: DeveloperApi ::
+ * Contextual information about a task which can be read or mutated during execution.
+ */
+@DeveloperApi
 class TaskContext(
   val stageId: Int,
   val partitionId: Int,
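TaskContext instances are created by Spark and handed to task code. A hedged sketch using the runJob overload that takes a (TaskContext, Iterator[T]) => U function; that overload is assumed, as it is not part of this diff:

import org.apache.spark.{SparkConf, SparkContext, TaskContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("task-context-demo"))
val data = sc.parallelize(1 to 10, 2)

// Each task receives its own TaskContext, exposing stageId, partitionId, etc.
val summaries = sc.runJob(data, (ctx: TaskContext, it: Iterator[Int]) =>
  s"stage=${ctx.stageId} partition=${ctx.partitionId} count=${it.size}")
summaries.foreach(println)
sc.stop()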
