Connecting Python and Apache Spark with ADBC

Instructions

Tip

If you already have a Spark Connect server running, skip the "Set up Spark" and "Clean up" steps.

Prerequisites

  1. Install uv

  2. Install dbc
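Optionally, confirm that both tools are installed and on your PATH before continuing. This small helper is not part of the quickstart. The function name missing_tools is a hypothetical choice, and the check only confirms that an executable with that name exists. It does not check versions.

```python
import shutil
import sys


def missing_tools(tools):
    """Return the names in `tools` that cannot be found as executables on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


# Example: report any prerequisite that is not on PATH.
missing = missing_tools(["uv", "dbc"])
if missing:
    print("Missing tools:", ", ".join(missing), file=sys.stderr)
else:
    print("uv and dbc were found on PATH.")
```

Add "docker" to the list if you also plan to start Spark locally with Docker.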

Set up Spark

  1. Install Docker

  2. Start a Spark instance:

    docker run -d --rm --name spark-connect -p 15002:15002 apache/spark:4.1.2 bash -c "/opt/spark/sbin/start-connect-server.sh && tail -f /dev/null"
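The container may need several seconds before the Spark Connect server accepts connections on port 15002. The sketch below polls the port using only Python's standard socket module. The helper name wait_for_port and the default 60-second timeout are assumptions. An open TCP port only shows that something is listening, not that queries will succeed.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # when nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 15002)` returns True once the port published by the docker run command is reachable.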

Connect to Spark

  1. Install the Spark ADBC driver:

    dbc install spark --pre

  2. Customize the Python script main.py as needed:

    • Change the connection arguments in db_kwargs.
    • If you connect to a different database, also update the SQL SELECT statement in cursor.execute().

  3. Run the Python script:

    uv run main.py
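The sketch below shows the general shape of an ADBC script like main.py. It is not the actual file in this repository. It relies on several assumptions:

  • The adbc-driver-manager and pyarrow packages are available.
  • The driver manager can load the driver installed by dbc under the name "spark".
  • The connection option is a "uri" in the Spark Connect form sc://host:port. Check the Spark driver's documentation for the real option names.

The helper names build_db_kwargs and run_query are hypothetical. The PEP 723 header at the top is one way `uv run` can install a standalone script's dependencies.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = ["adbc-driver-manager", "pyarrow"]
# ///
"""Hypothetical sketch of an ADBC query script; not the repository's main.py."""


def build_db_kwargs(host="localhost", port=15002):
    """Build connection options for the Spark driver.

    ASSUMPTION: the option key "uri" and the "sc://" (Spark Connect) scheme are
    illustrative; consult the Spark ADBC driver docs for the exact keys.
    """
    return {"uri": f"sc://{host}:{port}"}


def run_query(sql, db_kwargs):
    """Connect through the ADBC driver manager and return the result as an Arrow table."""
    # Imported lazily so build_db_kwargs works even without the driver manager installed.
    from adbc_driver_manager import dbapi

    # ASSUMPTION: driver="spark" resolves to the driver installed by `dbc install spark`.
    with dbapi.connect(driver="spark", db_kwargs=db_kwargs) as conn:
        with conn.cursor() as cursor:
            cursor.execute(sql)
            # fetch_arrow_table reads the full result before the cursor closes.
            return cursor.fetch_arrow_table()
```

With Spark running, `print(run_query("SELECT 1 AS one", build_db_kwargs()))` would print a one-row Arrow table, assuming the option names above match the driver.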

Clean up

Stop the Docker container running Spark. Because the container was started with --rm, Docker also removes it after it stops:

    docker stop spark-connect