Tip
If you already have a Spark instance running, skip the steps to set up and clean up Spark.
-
Start a Spark instance:
docker run -d --rm --name spark-connect -p 15002:15002 apache/spark:4.1.2 bash -c "/opt/spark/sbin/start-connect-server.sh && tail -f /dev/null"
-
Install the Spark ADBC driver:
dbc install spark --pre
-
Customize the Python script main.py as needed:
- Change the connection arguments in db_kwargs
  - Format uri according to the driver documentation, or keep it as is
- If you changed which database you're connecting to, also change the SQL SELECT statement in cursor.execute()
-
Run the Python script:
uv run main.py
-
Stop the Docker container running Spark:
docker stop spark-connect
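
For orientation, the pieces named in the customization step (db_kwargs, uri, and cursor.execute()) might fit together roughly as in the sketch below. This is a hypothetical illustration, not the main.py that ships with this example. It assumes the adbc_driver_manager package is installed (depending on its version, PyArrow may also be needed to fetch rows), that the driver manager can resolve the name "spark" after the dbc install step, and that the driver accepts a Spark Connect style sc://host:port URI; check the driver documentation for the actual format. The helper names wait_for_port, build_db_kwargs, and run_query are invented for this sketch. The port check is included because the container is started detached, so the server may not be accepting connections right away.

```python
"""Hypothetical sketch of a main.py; not the script shipped with this example."""
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet; pause briefly and try again.
            time.sleep(0.5)
    return False


def build_db_kwargs(host: str = "localhost", port: int = 15002) -> dict:
    """Build the connection arguments passed to the driver."""
    # ASSUMPTION: sc://host:port is the Spark Connect client URI form.
    # The ADBC driver may expect something different; see its documentation.
    return {"uri": f"sc://{host}:{port}"}


def run_query(db_kwargs: dict, sql: str = "SELECT 1 AS answer") -> list:
    """Connect through the ADBC driver manager, run one statement, return rows."""
    # Imported lazily so the helpers above work without the package installed.
    from adbc_driver_manager import dbapi

    # ASSUMPTION: driver="spark" matches the name installed by `dbc install spark`.
    with dbapi.connect(driver="spark", db_kwargs=db_kwargs) as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()
```

With the container from the first step running, one possible use is to call wait_for_port("localhost", 15002) and, if it returns True, print the result of run_query(build_db_kwargs()). If you point the sketch at a different database, change both the uri and the SQL statement, as described in the customization step.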