
Commit 48cecf6

Marcelo Vanzin authored and pwendell committed
[SPARK-4048] Enhance and extend hadoop-provided profile.
This change does a few things to make the hadoop-provided profile more useful:

- Create new profiles for other libraries / services that might be provided by the infrastructure.
- Simplify and fix the poms so that the profiles are only activated while building assemblies.
- Fix tests so that they're able to run when the profiles are activated.
- Add a new env variable to be used by distributions that use these profiles to provide the runtime classpath for Spark jobs and daemons.

Author: Marcelo Vanzin <[email protected]>

Closes apache#2982 from vanzin/SPARK-4048 and squashes the following commits:

82eb688 [Marcelo Vanzin] Add a comment.
eb228c0 [Marcelo Vanzin] Fix borked merge.
4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes.
371ebee [Marcelo Vanzin] Review feedback.
52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
322f882 [Marcelo Vanzin] Fix merge fail.
f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9640503 [Marcelo Vanzin] Cleanup child process log message.
115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom).
e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
d1399ed [Marcelo Vanzin] Restore jetty dependency.
82a54b9 [Marcelo Vanzin] Remove unused profile.
5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles.
1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing.
1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
1 parent c9c8b21 · commit 48cecf6
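Not part of the commit itself, but as a minimal sketch of how a distribution might use the new profiles (the exact invocation is an assumption based on standard Maven profile activation and the profile ids added to assembly/pom.xml below):

    # Build a Spark assembly that leaves the Hadoop, Hive, and Parquet jars out,
    # expecting the infrastructure to provide them at runtime.
    mvn -Phadoop-provided -Phive-provided -Pparquet-provided -DskipTests package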

File tree

23 files changed (+490 −353 lines)

assembly/pom.xml

+20
@@ -354,5 +354,25 @@
       </dependency>
     </dependencies>
   </profile>
+
+  <!-- Profiles that disable inclusion of certain dependencies. -->
+  <profile>
+    <id>hadoop-provided</id>
+    <properties>
+      <hadoop.deps.scope>provided</hadoop.deps.scope>
+    </properties>
+  </profile>
+  <profile>
+    <id>hive-provided</id>
+    <properties>
+      <hive.deps.scope>provided</hive.deps.scope>
+    </properties>
+  </profile>
+  <profile>
+    <id>parquet-provided</id>
+    <properties>
+      <parquet.deps.scope>provided</parquet.deps.scope>
+    </properties>
+  </profile>
 </profiles>
</project>

bagel/pom.xml

-4
@@ -40,10 +40,6 @@
       <artifactId>spark-core_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
-    <dependency>
-      <groupId>org.eclipse.jetty</groupId>
-      <artifactId>jetty-server</artifactId>
-    </dependency>
     <dependency>
       <groupId>org.scalacheck</groupId>
       <artifactId>scalacheck_${scala.binary.version}</artifactId>

bin/compute-classpath.cmd

+7
@@ -109,6 +109,13 @@ if "x%YARN_CONF_DIR%"=="x" goto no_yarn_conf_dir
 set CLASSPATH=%CLASSPATH%;%YARN_CONF_DIR%
 :no_yarn_conf_dir

+rem To allow for distributions to append needed libraries to the classpath (e.g. when
+rem using the "hadoop-provided" profile to build Spark), check SPARK_DIST_CLASSPATH and
+rem append it to the final classpath.
+if not "x%SPARK_DIST_CLASSPATH%"=="x" (
+  set CLASSPATH=%CLASSPATH%;%SPARK_DIST_CLASSPATH%
+)
+
 rem A bit of a hack to allow calling this script within run2.cmd without seeing output
 if "%DONT_PRINT_CLASSPATH%"=="1" goto exit

bin/compute-classpath.sh

+7
@@ -146,4 +146,11 @@ if [ -n "$YARN_CONF_DIR" ]; then
   CLASSPATH="$CLASSPATH:$YARN_CONF_DIR"
 fi

+# To allow for distributions to append needed libraries to the classpath (e.g. when
+# using the "hadoop-provided" profile to build Spark), check SPARK_DIST_CLASSPATH and
+# append it to the final classpath.
+if [ -n "$SPARK_DIST_CLASSPATH" ]; then
+  CLASSPATH="$CLASSPATH:$SPARK_DIST_CLASSPATH"
+fi
+
 echo "$CLASSPATH"
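As a hypothetical usage sketch for this hook (not from the commit): a distribution built with hadoop-provided could export the variable from conf/spark-env.sh, assuming spark-env.sh is sourced by the launch scripts before the classpath is computed, as in a stock Spark layout. `hadoop classpath` is the standard Hadoop helper that prints the paths of the Hadoop jars:

    # conf/spark-env.sh (sketch; assumes the `hadoop` command is on PATH)
    export SPARK_DIST_CLASSPATH="$(hadoop classpath)"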

core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala

+14 −7

@@ -55,19 +55,26 @@ private[spark] class SparkDeploySchedulerBackend(
       "{{WORKER_URL}}")
     val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
       .map(Utils.splitCommandString).getOrElse(Seq.empty)
-    val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp =>
-      cp.split(java.io.File.pathSeparator)
-    }
-    val libraryPathEntries =
-      sc.conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp =>
-        cp.split(java.io.File.pathSeparator)
+    val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
+      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
+    val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
+      .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
+
+    // When testing, expose the parent class path to the child. This is processed by
+    // compute-classpath.{cmd,sh} and makes all needed jars available to child processes
+    // when the assembly is built with the "*-provided" profiles enabled.
+    val testingClassPath =
+      if (sys.props.contains("spark.testing")) {
+        sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
+      } else {
+        Nil
       }

     // Start executors with a few necessary configs for registering with the scheduler
     val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
     val javaOpts = sparkJavaOpts ++ extraJavaOpts
     val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
-      args, sc.executorEnvs, classPathEntries, libraryPathEntries, javaOpts)
+      args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
     val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
     val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
       appUIAddress, sc.eventLogDir)

core/src/main/scala/org/apache/spark/util/Utils.scala

+3 −2

@@ -990,11 +990,12 @@ private[spark] object Utils extends Logging {
     for ((key, value) <- extraEnvironment) {
       environment.put(key, value)
     }
+
     val process = builder.start()
     new Thread("read stderr for " + command(0)) {
       override def run() {
         for (line <- Source.fromInputStream(process.getErrorStream).getLines()) {
-          System.err.println(line)
+          logInfo(line)
         }
       }
     }.start()
@@ -1089,7 +1090,7 @@ private[spark] object Utils extends Logging {
     var firstUserLine = 0
     var insideSpark = true
     var callStack = new ArrayBuffer[String]() :+ "<unknown>"
-
+
     Thread.currentThread.getStackTrace().foreach { ste: StackTraceElement =>
       // When running under some profilers, the current stack trace might contain some bogus
       // frames. This is intended to ensure that we don't crash in these situations by

core/src/test/scala/org/apache/spark/DriverSuite.scala

+1 −1

@@ -35,7 +35,7 @@ class DriverSuite extends FunSuite with Timeouts {
     forAll(masters) { (master: String) =>
       failAfter(60 seconds) {
         Utils.executeAndGetOutput(
-          Seq("./bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
+          Seq(s"$sparkHome/bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
           new File(sparkHome),
           Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
       }
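To verify this fix with the new profiles active, a run along these lines should exercise the suite (a sketch; -DwildcardSuites comes from Spark's scalatest-maven-plugin configuration, and combining it with the provided profiles here is an assumption):

    # Run only DriverSuite in the core module against a hadoop-provided build.
    mvn -Phadoop-provided -pl core -DwildcardSuites=org.apache.spark.DriverSuite test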
