These instructions will show you how to run a .NET for Apache Spark app using .NET Core on macOS.
- Download and install .NET Core 2.1 SDK
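  A quick way to confirm the SDK installed correctly is to check that the `dotnet` CLI resolves from your terminal (the exact version printed depends on which 2.1 SDK build you installed):

  ```bash
  # Verify the .NET Core SDK is on your PATH
  dotnet --version
  ```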
- Install Java 8
    - Select the appropriate version for your operating system, e.g., `jdk-8u231-macosx-x64.dmg`.
    - Install using the installer and verify you are able to run `java` from your command line.
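      For example, one simple check (the exact output varies by JDK build) is:

      ```bash
      # Confirm Java 8 is installed and reachable from the shell
      java -version
      ```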
- Download and install Apache Spark 2.4.4:
    - Add the necessary environment variable `SPARK_HOME` (e.g., pointing to `~/bin/spark-2.4.4-bin-hadoop2.7/`) and update your `PATH`:

      ```bash
      export SPARK_HOME=~/bin/spark-2.4.4-bin-hadoop2.7/
      export PATH="$SPARK_HOME/bin:$PATH"
      source ~/.bashrc
      ```
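      After reloading your shell, a quick sanity check (output depends on your Spark build) is to confirm that `spark-submit` resolves from the updated `PATH`:

      ```bash
      # Confirm SPARK_HOME is set and spark-submit is on the PATH
      echo $SPARK_HOME
      spark-submit --version
      ```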
- Download and install a Microsoft.Spark.Worker release:
    - Select a Microsoft.Spark.Worker release from the .NET for Apache Spark GitHub Releases page and download it to your local machine (e.g., `/bin/Microsoft.Spark.Worker/`).
    - IMPORTANT: Create a new environment variable using `export DOTNET_WORKER_DIR=<your_path>` and set it to the directory where you downloaded and extracted the Microsoft.Spark.Worker (e.g., `/bin/Microsoft.Spark.Worker/`).
    - Make sure the worker is marked as executable and remove any "quarantined" attributes, e.g.:

      ```bash
      chmod 755 /bin/Microsoft.Spark.Worker/Microsoft.Spark.Worker
      xattr -d com.apple.quarantine /bin/Microsoft.Spark.Worker/*
      ```
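      Before moving on, you can optionally confirm the variable is set and the worker binary is executable (the path below assumes the example location used above):

      ```bash
      # Confirm DOTNET_WORKER_DIR points at the extracted worker and the binary is executable
      echo $DOTNET_WORKER_DIR
      ls -l "$DOTNET_WORKER_DIR"/Microsoft.Spark.Worker
      ```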
- Use the `dotnet` CLI to create a console application:

  ```bash
  dotnet new console -o HelloSpark
  ```
- Install the `Microsoft.Spark` NuGet package into the project from the Spark nuget.org feed (see Ways to install a NuGet Package):

  ```bash
  cd HelloSpark
  dotnet add package Microsoft.Spark
  ```
- Replace the contents of the `Program.cs` file with the following code:

  ```csharp
  using Microsoft.Spark.Sql;

  namespace HelloSpark
  {
      class Program
      {
          static void Main(string[] args)
          {
              var spark = SparkSession.Builder().GetOrCreate();
              var df = spark.Read().Json("people.json");
              df.Show();
          }
      }
  }
  ```
- Use the `dotnet` CLI to build the application:

  ```bash
  dotnet build
  ```
- Open your terminal and navigate into your app folder:

  ```bash
  cd <your-app-output-directory>
  ```
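  As an illustration only: for a console project created with the .NET Core 2.1 SDK and built with the default Debug configuration, the output directory is typically `bin/Debug/netcoreapp2.1`; your actual path may differ depending on your SDK version and project settings.

  ```bash
  # Example only: from the HelloSpark project folder, the default Debug output is usually here
  cd bin/Debug/netcoreapp2.1
  ```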
- Create `people.json` with the following content:

  ```json
  { "name" : "Michael" }
  { "name" : "Andy", "age" : 30 }
  { "name" : "Justin", "age" : 19 }
  ```
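  If you prefer to create the file from the terminal, a heredoc in the same directory works as well:

  ```bash
  # Write the sample JSON-lines file into the current (output) directory
  cat > people.json <<'EOF'
  { "name" : "Michael" }
  { "name" : "Andy", "age" : 30 }
  { "name" : "Justin", "age" : 19 }
  EOF
  ```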
- Run your app:

  ```bash
  spark-submit \
    --class org.apache.spark.deploy.dotnet.DotnetRunner \
    --master local \
    microsoft-spark-<version>.jar \
    dotnet HelloSpark.dll
  ```

  Note: This command assumes you have downloaded Apache Spark and added it to your PATH environment variable to be able to use `spark-submit`; otherwise, you would have to use the full path (e.g., `~/spark/bin/spark-submit`).
- The output of the application should look similar to the output below:

  ```text
  +----+-------+
  | age|   name|
  +----+-------+
  |null|Michael|
  |  30|   Andy|
  |  19| Justin|
  +----+-------+
  ```