docs/quick-start.md (+31 −8)
@@ -53,7 +53,7 @@ scala> textFile.filter(line => line.contains("Spark")).count() // How many lines
 res3: Long = 15
 {% endhighlight %}
 
-## More On RDD Operations
+## More on RDD Operations
 RDD actions and transformations can be used for more complex computations. Let's say we want to find the line with the most words:
 
 {% highlight scala %}
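The computation described above is a single chained transformation and action. A minimal sketch, assuming `textFile` is the RDD created at the start of the quick start (the exact snippet in the guide may differ):

{% highlight scala %}
// Map each line to its word count, then reduce to keep the largest count.
val mostWords = textFile.map(line => line.split(" ").size)
                        .reduce((a, b) => if (a > b) a else b)
{% endhighlight %}

An equivalent reduce function is `Math.max(a, b)` after `import java.lang.Math`.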
@@ -163,8 +163,6 @@ $ sbt run
 Lines with a: 46, Lines with b: 23
 {% endhighlight %}
 
-This example only runs the job locally; for a tutorial on running jobs across several machines, see the [Standalone Mode](spark-standalone.html) documentation, and consider using a distributed input source, such as HDFS.
-
 # A Standalone Job In Java
 Now say we wanted to write a standalone job using the Java API. We will walk through doing this with Maven. If you are using other build systems, consider using the Spark assembly JAR described in the developer guide.
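The `Lines with a / Lines with b` output shown above comes from the standalone Scala job built earlier in the guide, which this hunk does not reproduce. A minimal sketch of such a job, assuming the pre-1.0 `org.apache.spark.SparkContext` constructor that takes a master URL, a job name, a Spark home, and a list of jars; the file path, Spark home, and jar name are illustrative placeholders:

{% highlight scala %}
/*** SimpleJob.scala (sketch) ***/
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    // Any local text file will do; $YOUR_SPARK_HOME is a placeholder.
    val logFile = "$YOUR_SPARK_HOME/README.md"
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
{% endhighlight %}

Running such a job with `sbt run` produces the two line counts shown in the hunk above.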
@@ -255,5 +253,3 @@
-This example only runs the job locally; for a tutorial on running jobs across several machines, see the [Standalone Mode](spark-standalone.html) documentation, and consider using a distributed input source, such as HDFS.
-
 # A Standalone Job In Python
 Now we will show how to write a standalone job using the Python API (PySpark).
 
@@ -290,6 +286,33 @@ $ ./pyspark SimpleJob.py
 Lines with a: 46, Lines with b: 23
 {% endhighlight python %}
 
-This example only runs the job locally; for a tutorial on running jobs across several machines, see the [Standalone Mode](spark-standalone.html) documentation, and consider using a distributed input source, such as HDFS.
-
-Also, this example links against the default version of HDFS that Spark builds with (1.0.4). You can run it against other HDFS versions by [building Spark with another HDFS version](index.html#a-note-about-hadoop-versions).
+# Running Jobs on a Cluster
+
+There are a few additional considerations when running jobs on a
+[Spark](spark-standalone.html), [YARN](running-on-yarn.html), or
+[Mesos](running-on-mesos.html) cluster.
+
+### Including Your Dependencies
+If your code depends on other projects, you will need to ensure they are also
+present on the slave nodes. A popular approach is to create an
+assembly jar (or "uber" jar) containing your code and its dependencies. Both
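To make the assembly-jar approach concrete, a hedged sbt sketch; the plugin version and dependency coordinates below are illustrative assumptions, not settings taken from this change:

{% highlight scala %}
// project/plugins.sbt — add the sbt-assembly plugin (version is illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.2")

// simple.sbt — mark Spark itself as "provided" so the cluster's copy is used
// rather than being bundled into the assembly jar (coordinates are illustrative)
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating" % "provided"
{% endhighlight %}

Building with `sbt assembly` then yields a single jar containing the job and its other dependencies, which can be shipped to the slave nodes.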
0 commit comments