Beta: This SDK is supported for production use cases, but we do expect future releases to have some interface changes; see Interface stability. We are keen to hear feedback from you on these SDKs. Please file issues, and we will address them.

See the documentation at Go Packages. See also the Terraform Provider, the SDK for Python, and the SDK for Java.
The Databricks SDK for Go includes functionality to accelerate development with Go for the Databricks Lakehouse. It covers all public Databricks REST API operations. The SDK's internal HTTP client is robust and handles failures on different levels by performing intelligent retries.
- Getting started
- Authentication
- Code examples
- Long-running operations
- Paginated responses
- `GetByName` utility methods
- Node type and Databricks Runtime selectors
- Integration with `io` interfaces for DBFS
- Logging
- Testing
- Interface stability
1. On your local development machine, with Go already installed and a Go code project active, create a `go.mod` file to track your Go code's dependencies by running the `go mod init` command, for example:

   ```bash
   go mod init sample
   ```
2. Take a dependency on the Databricks SDK for Go package by running the `go mod edit -require` command:

   ```bash
   go mod edit -require github.com/databricks/databricks-sdk-go@latest
   ```

   Your `go.mod` file should now look like this:

   ```
   module sample

   go 1.18

   require github.com/databricks/databricks-sdk-go v0.9.0 // Indirect dependencies will go here.
   ```
3. Within your project, create a Go code file that imports the Databricks SDK for Go. The following example, in a file named `main.go`, simply lists all the clusters in your Databricks workspace:

   ```go
   package main

   import (
     "context"

     "github.com/databricks/databricks-sdk-go"
     "github.com/databricks/databricks-sdk-go/service/compute"
   )

   func main() {
     w := databricks.Must(databricks.NewWorkspaceClient())
     all, err := w.Clusters.ListAll(context.Background(), compute.ListClustersRequest{})
     if err != nil {
       panic(err)
     }
     for _, c := range all {
       println(c.ClusterName)
     }
   }
   ```
4. Add any missing module dependencies by running the `go mod tidy` command:

   ```bash
   go mod tidy
   ```

   Note: If you get the error `go: warning: "all" matched no packages`, you forgot to add the preceding Go code file that imports the Databricks SDK for Go.
5. Grab copies of all packages needed to support builds and tests of packages in your `main` module by running the `go mod vendor` command:

   ```bash
   go mod vendor
   ```
6. Set up Databricks authentication on your local development machine by running the `databricks configure` command, if you have not done so already. For details, see the next section, Authentication.
7. Run your Go code file, assuming a file named `main.go`, by running the `go run` command:

   ```bash
   go run main.go
   ```

   Assuming the preceding example code is run, the output is:

   ```
   [TRACE] Loading config via environment
   [TRACE] Loading config via config-file
   ...
   [TRACE] Attempting to configure auth: pat
   [TRACE] Attempting to configure auth: basic
   [TRACE] Attempting to configure auth: azure-client-secret
   ...
   ```
If you use Databricks configuration profiles or Databricks-specific environment variables for Databricks authentication, the only code required to start working with a Databricks workspace is the following code snippet, which instructs the Databricks SDK for Go to use its default authentication flow:
```go
w := databricks.Must(databricks.NewWorkspaceClient())
w./*press TAB for autocompletion*/
```

The conventional name for the variable that holds the workspace-level client of the Databricks SDK for Go is `w`, which is shorthand for workspace.
- Default authentication flow
- Databricks native authentication
- Azure native authentication
- Google Cloud Platform native authentication
- Overriding `.databrickscfg`
- Additional authentication configuration options
- Custom credentials provider
If you run the Databricks Terraform Provider, the Databricks CLI, or applications that target the Databricks SDKs for other languages, most likely they will all interoperate nicely together. By default, the Databricks SDK for Go tries the following authentication methods, in the following order, until it succeeds:
- Databricks native authentication
- Azure native authentication
- Google Cloud Platform native authentication
- If the SDK is unsuccessful at this point, it returns an authentication error and stops running.
You can instruct the Databricks SDK for Go to use a specific authentication method by setting the `AuthType` field in `*databricks.Config`, as described in the following sections.
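For instance, a minimal sketch that forces the SDK to use only Databricks token authentication (the `"pat"` auth type named in the sections below):

```go
// A sketch only: skip the default ordering and use token auth exclusively.
w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
	AuthType: "pat",
}))
```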
For each authentication method, the SDK searches for compatible authentication credentials in the following locations, in the following order. Once the SDK finds a compatible set of credentials that it can use, it stops searching:
1. Credentials that are hard-coded into `*databricks.Config`.

   Caution: Databricks does not recommend hard-coding credentials into `*databricks.Config`, as they can be exposed in plain text in version control systems. Use environment variables or configuration profiles instead.

2. Credentials in Databricks-specific environment variables.

3. For Databricks native authentication, credentials in the `.databrickscfg` file's `DEFAULT` configuration profile from its default file location (`~` for Linux or macOS, and `%USERPROFILE%` for Windows).

4. For Azure or Google Cloud Platform native authentication, the SDK searches for credentials through the Azure CLI or Google Cloud CLI as needed.
Depending on the Databricks authentication method, the SDK uses the following information. Presented are the `*databricks.Config` arguments, their descriptions, any corresponding environment variables, and any corresponding `.databrickscfg` file fields, respectively.
By default, the Databricks SDK for Go initially tries Databricks token authentication (`AuthType: "pat"` in `*databricks.Config`). If the SDK is unsuccessful, it then tries Databricks basic (username/password) authentication (`AuthType: "basic"` in `*databricks.Config`).

- For Databricks token authentication, you must provide `Host` and `Token`; or their environment variable or `.databrickscfg` file field equivalents.
- For Databricks basic authentication, you must provide `Host`, `Username`, and `Password` (for AWS workspace-level operations); or `Host`, `AccountID`, `Username`, and `Password` (for AWS, Azure, or GCP account-level operations); or their environment variable or `.databrickscfg` file field equivalents.
| `*databricks.Config` argument | Description | Environment variable / `.databrickscfg` file field |
|---|---|---|
| `Host` | (String) The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | `DATABRICKS_HOST` / `host` |
| `AccountID` | (String) The Databricks account ID for the Databricks accounts endpoint. Only has effect when `Host` is either `https://accounts.cloud.databricks.com/` (AWS), `https://accounts.azuredatabricks.net/` (Azure), or `https://accounts.gcp.databricks.com/` (GCP). | `DATABRICKS_ACCOUNT_ID` / `account_id` |
| `Token` | (String) The Databricks personal access token (PAT) (AWS, Azure, and GCP) or Azure Active Directory (Azure AD) token (Azure). | `DATABRICKS_TOKEN` / `token` |
| `Username` | (String) The Databricks username part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` (AWS). | `DATABRICKS_USERNAME` / `username` |
| `Password` | (String) The Databricks password part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` (AWS). | `DATABRICKS_PASSWORD` / `password` |
For example, to use Databricks token authentication:
```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"os"
	"strings"

	"github.com/databricks/databricks-sdk-go"
	"github.com/databricks/databricks-sdk-go/config"
)

func main() {
	// Perform Databricks token authentication for a Databricks workspace.
	w, err := databricks.NewWorkspaceClient(&databricks.Config{
		Host:        askFor("Host:"),                  // workspace URL
		Token:       askFor("Personal Access Token:"), // PAT
		Credentials: config.PatCredentials{},          // enforce PAT auth
	})
	if err != nil {
		panic(err)
	}
	me, err := w.CurrentUser.Me(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Printf("Hello, my name is %s!\n", me.DisplayName)
}

// askFor prompts on stdin until a non-empty value is entered.
func askFor(prompt string) string {
	var s string
	r := bufio.NewReader(os.Stdin)
	for {
		fmt.Fprint(os.Stdout, prompt+" ")
		s, _ = r.ReadString('\n')
		s = strings.TrimSpace(s)
		if s != "" {
			break
		}
	}
	return s
}
```
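Similarly, a minimal sketch of Databricks basic authentication; it assumes the `config.BasicCredentials` provider and reuses the `askFor` helper from the preceding example:

```go
// A sketch only: enforce basic (username/password) authentication,
// assuming the config.BasicCredentials provider.
w, err := databricks.NewWorkspaceClient(&databricks.Config{
	Host:        askFor("Host:"),
	Username:    askFor("Username:"),
	Password:    askFor("Password:"),
	Credentials: config.BasicCredentials{},
})
```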
By default, the Databricks SDK for Go first tries Azure client secret authentication (`AuthType: "azure-client-secret"` in `*databricks.Config`). If the SDK is unsuccessful, it then tries Azure CLI authentication (`AuthType: "azure-cli"` in `*databricks.Config`). See Manage service principals.

The Databricks SDK for Go picks up an Azure CLI token if you've previously authenticated as an Azure user by running `az login` on your machine. See Get Azure AD tokens for users by using the Azure CLI.
To authenticate as an Azure Active Directory (Azure AD) service principal, you must provide one of the following. See also Add a service principal to your Azure Databricks account:

- `AzureResourceID`, `AzureClientSecret`, `AzureClientID`, and `AzureTenantID`; or their environment variable or `.databrickscfg` file field equivalents.
- `AzureResourceID` and `AzureUseMSI`; or their environment variable or `.databrickscfg` file field equivalents.
| `*databricks.Config` argument | Description | Environment variable / `.databrickscfg` file field |
|---|---|---|
| `AzureResourceID` | (String) The Azure Resource Manager ID for the Azure Databricks workspace, which is exchanged for a Databricks host URL. | `DATABRICKS_AZURE_RESOURCE_ID` / `azure_workspace_resource_id` |
| `AzureUseMSI` | (Boolean) `true` to use the Azure Managed Service Identity passwordless authentication flow for service principals. Requires `AzureResourceID` to be set. | `ARM_USE_MSI` / `azure_use_msi` |
| `AzureClientSecret` | (String) The Azure AD service principal's client secret. | `ARM_CLIENT_SECRET` / `azure_client_secret` |
| `AzureClientID` | (String) The Azure AD service principal's application ID. | `ARM_CLIENT_ID` / `azure_client_id` |
| `AzureTenantID` | (String) The Azure AD service principal's tenant ID. | `ARM_TENANT_ID` / `azure_tenant_id` |
| `AzureEnvironment` | (String) The Azure environment type (such as Public, UsGov, China, and Germany) for a specific set of API endpoints. Defaults to `PUBLIC`. | `ARM_ENVIRONMENT` / `azure_environment` |
For example, to use Azure client secret authentication:
```go
w, err := databricks.NewWorkspaceClient(&databricks.Config{
	Host:              askFor("Host:"),
	AzureResourceID:   askFor("Azure Resource ID:"),
	AzureTenantID:     askFor("AAD Tenant ID:"),
	AzureClientID:     askFor("AAD Client ID:"),
	AzureClientSecret: askFor("AAD Client Secret:"),
	Credentials:       config.AzureClientSecretCredentials{},
})
```
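For the managed identity flow, a minimal sketch, assuming the `config.AzureMsiCredentials` provider and an application running on Azure compute with a managed identity attached:

```go
// A sketch only: Azure Managed Service Identity (passwordless) auth,
// assuming the config.AzureMsiCredentials provider.
w, err := databricks.NewWorkspaceClient(&databricks.Config{
	AzureResourceID: askFor("Azure Resource ID:"),
	AzureUseMSI:     true,
	Credentials:     config.AzureMsiCredentials{},
})
```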
By default, the Databricks SDK for Go first tries GCP credentials authentication (`AuthType: "google-credentials"` in `*databricks.Config`). If the SDK is unsuccessful, it then tries Google Cloud Platform (GCP) ID authentication (`AuthType: "google-id"` in `*databricks.Config`).

The Databricks SDK for Go picks up an OAuth token in the scope of the Google Default Application Credentials (DAC) flow. This means that if you have run `gcloud auth application-default login` on your development machine, or you launch the application on compute that is allowed to impersonate the Google Cloud service account specified in `GoogleServiceAccount`, authentication should work out of the box. See Creating and managing service accounts.
To authenticate as a Google Cloud service account, you must provide one of the following:

- `Host` and `GoogleCredentials`; or their environment variable or `.databrickscfg` file field equivalents.
- `Host` and `GoogleServiceAccount`; or their environment variable or `.databrickscfg` file field equivalents.
| `*databricks.Config` argument | Description | Environment variable / `.databrickscfg` file field |
|---|---|---|
| `GoogleCredentials` | (String) GCP Service Account Credentials JSON, or the location of these credentials on the local filesystem. | `GOOGLE_CREDENTIALS` / `google_credentials` |
| `GoogleServiceAccount` | (String) The Google Cloud Platform (GCP) service account e-mail used for impersonation in the Default Application Credentials flow that does not require a password. | `DATABRICKS_GOOGLE_SERVICE_ACCOUNT` / `google_service_account` |
For example, to use Google ID authentication:
```go
w, err := databricks.NewWorkspaceClient(&databricks.Config{
	Host:                 askFor("Host:"),
	GoogleServiceAccount: askFor("Google Service Account:"),
	Credentials:          config.GoogleDefaultCredentials{},
})
```
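For the credentials-JSON variant, a minimal sketch, assuming the `config.GoogleCredentials` provider; the key-file path is hypothetical:

```go
// A sketch only: authenticate with service account credentials JSON,
// assuming the config.GoogleCredentials provider. The path is hypothetical.
w, err := databricks.NewWorkspaceClient(&databricks.Config{
	Host:              askFor("Host:"),
	GoogleCredentials: "/path/to/service-account-key.json",
	Credentials:       config.GoogleCredentials{},
})
```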
For Databricks native authentication, you can override the default behavior in `*databricks.Config` for using `.databrickscfg` as follows:
| `*databricks.Config` argument | Description | Environment variable |
|---|---|---|
| `Profile` | (String) A connection profile specified within `.databrickscfg` to use instead of `DEFAULT`. | `DATABRICKS_CONFIG_PROFILE` |
| `ConfigFile` | (String) A non-default location of the Databricks CLI credentials file. | `DATABRICKS_CONFIG_FILE` |
For example, to use a profile named `MYPROFILE` instead of `DEFAULT`:

```go
w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
	Profile: "MYPROFILE",
}))
// Now call the Databricks workspace APIs as desired...
```
For all authentication methods, you can override the default behavior in `*databricks.Config` as follows:
| `*databricks.Config` argument | Description | Environment variable |
|---|---|---|
| `AuthType` | (String) When multiple auth attributes are available in the environment, use the auth type specified by this argument. This argument also holds the currently selected auth. | (None) |
| `HTTPTimeoutSeconds` | (Integer) Number of seconds for HTTP timeout. Default is 60. | (None) |
| `RetryTimeoutSeconds` | (Integer) Number of seconds to keep retrying HTTP requests. Default is 300 (5 minutes). | (None) |
| `DebugTruncateBytes` | (Integer) Truncate JSON fields in debug logs above this limit. Default is 96. | `DATABRICKS_DEBUG_TRUNCATE_BYTES` |
| `DebugHeaders` | (Boolean) `true` to debug HTTP headers of requests made by the application. Default is `false`, as headers contain sensitive data, such as access tokens. | `DATABRICKS_DEBUG_HEADERS` |
| `RateLimit` | (Integer) Maximum number of requests per second made to the Databricks REST API. | `DATABRICKS_RATE_LIMIT` |
For example, to turn on debug HTTP headers:
```go
w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
	DebugHeaders: true,
}))
// Now call the Databricks workspace APIs as desired...
```
In some cases, you may want to have deeper control over authentication to Databricks. This can be achieved by creating your own credentials provider that returns an HTTP request visitor:
```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"github.com/databricks/databricks-sdk-go"
	"github.com/databricks/databricks-sdk-go/config"
)

type CustomCredentials struct{}

func (c *CustomCredentials) Name() string {
	return "custom"
}

// Configure returns an HTTP request visitor that attaches credentials
// to every outgoing request.
func (c *CustomCredentials) Configure(ctx context.Context, cfg *config.Config) (func(*http.Request) error, error) {
	return func(r *http.Request) error {
		token := "..."
		r.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))
		return nil
	}, nil
}

func main() {
	w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
		Credentials: &CustomCredentials{},
	}))
	// ...
	_ = w
}
```
To find code examples that demonstrate how to call the Databricks SDK for Go, see the top-level examples folder within this repository.
More than 20 methods across different Databricks APIs are long-running operations for managing things like clusters, command execution, jobs, libraries, Delta Live Tables pipelines, and Databricks SQL warehouses. For example, in the Clusters API, once you create a cluster, you receive a cluster ID, and the cluster is in the `PENDING` state while Databricks takes care of provisioning virtual machines from the cloud provider in the background. But the cluster is only usable in the `RUNNING` state. Another example is the API for running a job or repairing the run: right after the run starts, the run is in the `PENDING` state, though the job is considered to be finished only when it is in the `TERMINATED` or `SKIPPED` state. And of course you would want to know the error message when the long-running operation times out, or why things fail. And sometimes you want to configure a custom timeout other than the default of 20 minutes.
To hide all of this integration-specific complexity from the end user, the Databricks SDK for Go provides a high-level API for triggering long-running operations and waiting for the related entities to reach the right state, or returning the error message about the problem in case of failure. All long-running operations have the `XxxAndWait` name pattern, where `Xxx` is the operation name. All these generated methods return information about the relevant entity once the operation is finished. It is possible to configure a custom timeout for `XxxAndWait` by providing a functional option argument constructed by the `retries.Timeout[Zzz](time.Duration)` function, where `Zzz` is the result type of `XxxAndWait`.
In the following example, `CreateAndWait` returns `ClusterInfo` only once the cluster is in the `RUNNING` state; otherwise, it times out after 10 minutes:
```go
clusterInfo, err = w.Clusters.CreateAndWait(ctx, clusters.CreateCluster{
	ClusterName:            "Created cluster",
	SparkVersion:           latestLTS,
	NodeTypeId:             smallestWithDisk,
	AutoterminationMinutes: 10,
	NumWorkers:             1,
}, retries.Timeout[clusters.ClusterInfo](10*time.Minute))
```
You can run Python, Scala, R, or SQL code on running interactive Databricks clusters and get the results back. All supplied code gets its leading whitespace removed, so that you can easily embed Python code into Go applications. This high-level wrapper comes from the Databricks Terraform provider, where it was tested for over two years for use cases such as DBFS mounts and SQL permissions. The interface hides the intricate complexity of all the internal APIs involved, to simplify the unit-testing experience for command execution. Databricks does not recommend that you use lower-level interfaces for command execution. The execution timeout is 20 minutes and cannot be overridden, for the sake of interface simplicity, meaning that you should only use this API if you have some relatively complex executions to perform. Use jobs if your commands must run longer than 20 minutes, or use the Databricks SQL Driver for Go if your workload is purely for business intelligence.
```go
res := w.CommandExecutor.Execute(ctx, clusterId, "python", "print(1)")
if res.Failed() {
	return fmt.Errorf("command failed: %w", res.Err())
}
println(res.Text())
// Out: 1
```
You can install or uninstall libraries on running Databricks clusters. `UpdateAndWait` follows all conventions of long-running operations and wraps the `Install` and `Uninstall` operations, followed by checking the installation status of the cluster, exposing error messages back in a simplified way. This high-level wrapper came from the Databricks Terraform provider, where it was tested for over two years in the `databricks_cluster` and `databricks_library` resources. Databricks recommends that you use `UpdateAndWait` as the only API for cluster library management.
```go
err = w.Libraries.UpdateAndWait(ctx, libraries.Update{
	ClusterId: clusterId,
	Install: []libraries.Library{
		{
			Pypi: &libraries.PythonPyPiLibrary{
				Package: "dbl-tempo",
			},
		},
	},
})
```
You can track the intermediate state of a long-running operation while waiting for it to reach the correct state by supplying the `func(i *retries.Info[Zzz])` functional option, where `Zzz` is the return type of the `XxxAndWait` method:
```go
clusterInfo, err = w.Clusters.CreateAndWait(ctx, clusters.CreateCluster{
	// ...
}, func(i *retries.Info[clusters.ClusterInfo]) {
	updateIntermediateState(i.Info.StateMessage)
})
```
On the platform side, some Databricks APIs have result pagination, and some do not. Some APIs follow offset-plus-limit pagination, some start their offsets from 0 and some from 1, some use cursor-based iteration, and others just return all results in a single response. The Databricks SDK for Go hides this intricate complexity and generates a more high-level interface for retrieving all results of a certain entity type. The naming pattern is `XxxAll`, where `Xxx` is the name of the method that retrieves a single page of results.
```go
all, err := w.Repos.ListAll(ctx, repos.List{})
if err != nil {
	return fmt.Errorf("list repos: %w", err)
}
for _, repo := range all {
	println(repo.Path)
}
```
On the platform side, most Databricks objects are addressed primarily by their identifiers. But in some common workflows, it's easier to reason about workspace objects by their names. To simplify the development experience and speed up proofs of concept, the Databricks SDK for Go generates `GetByName` client-side utilities. Keep in mind that some Databricks APIs don't enforce unique names on objects, and these generated helpers return an error whenever a duplicate name is detected.
```go
repo, err := w.Repos.GetByPath(ctx, path)
if err != nil {
	return err
}
return w.Repos.Update(ctx, repos.UpdateRepo{
	RepoId: repo.Id,
	Branch: tag,
})
```
The Databricks SDK for Go provides selector methods that make it easier to develop multi-cloud applications by relying only on characteristics of the virtual machine, such as the number of cores or the availability of local disks, or by always picking the latest Databricks Runtime for an interactive or per-job cluster.
```go
// Fetch the list of Spark runtime versions.
sparkVersions, err := w.Clusters.SparkVersions(ctx)
if err != nil {
	return err
}

// Select the latest LTS version.
latestLTS, err := sparkVersions.Select(clusters.SparkVersionRequest{
	Latest:          true,
	LongTermSupport: true,
})
if err != nil {
	return err
}

// Fetch the list of available node types.
nodeTypes, err := w.Clusters.ListNodeTypes(ctx)
if err != nil {
	return err
}

// Select the smallest node type ID.
smallestWithDisk, err := nodeTypes.Smallest(clusters.NodeTypeRequest{
	LocalDisk: true,
})
if err != nil {
	return err
}

// Create the cluster and wait for it to start properly.
runningCluster, err := w.Clusters.CreateAndWait(ctx, clusters.CreateCluster{
	ClusterName:            clusterName,
	SparkVersion:           latestLTS,
	NodeTypeId:             smallestWithDisk,
	AutoterminationMinutes: 15,
	NumWorkers:             1,
})
```
You can open a file on DBFS for reading or writing with `w.Dbfs.Open`. This function returns a `dbfs.Handle` that is compatible with a subset of the `io` interfaces for reading, writing, and closing.

Uploading a file from an `io.Reader`:
```go
upload, _ := os.Open("/path/to/local/file.ext")
remote, _ := w.Dbfs.Open(ctx, "/path/to/remote/file", dbfs.FileModeWrite|dbfs.FileModeOverwrite)
_, _ = io.Copy(remote, upload)
_ = remote.Close()
```
Downloading a file to an `io.Writer`:

```go
download, _ := os.Create("/path/to/local")
remote, _ := w.Dbfs.Open(ctx, "/path/to/remote/file", dbfs.FileModeRead)
_, _ = io.Copy(download, remote)
```
You can read from or write to a DBFS file directly from a byte slice through the convenience functions `w.Dbfs.ReadFile` and `w.Dbfs.WriteFile`.
Uploading a file from a byte slice:

```go
err := w.Dbfs.WriteFile(ctx, "/path/to/remote/file", []byte("Hello world!"))
```
Downloading a file into a byte slice:

```go
buf, err := w.Dbfs.ReadFile(ctx, "/path/to/remote/file")
```
The Databricks SDK for Go loosely integrates with spf13/pflag by implementing `pflag.Value` for all enum types.
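As a minimal sketch of what that enables, the flag binding below assumes `compute.State` (the cluster state enum) as a representative enum type:

```go
package main

import (
	"fmt"

	"github.com/databricks/databricks-sdk-go/service/compute"
	"github.com/spf13/pflag"
)

func main() {
	// Because SDK enum types implement pflag.Value, they can be bound
	// directly to command-line flags; invalid values are rejected by
	// the enum's Set method. compute.State is assumed here purely for
	// illustration.
	var state compute.State
	pflag.Var(&state, "state", "cluster state to filter on")
	pflag.Parse()
	fmt.Println("selected state:", state)
}
```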
By default, the Databricks SDK for Go uses `logger.SimpleLogger`, which is a leveled proxy to `log.Printf` that prints to `os.Stderr`. You can disable logging completely by adding `log.SetOutput(io.Discard)` to your `init()` function. You are encouraged to override `logger.DefaultLogger` with your own implementation that follows the `logger.Logger` interface.
Since v0.10.0, the default logger prints only `INFO`-level messages. To replicate the more verbose behavior of previous versions, set the `DEBUG` level in `SimpleLogger`:
import "github.com/databricks/databricks-sdk-go/logger"
func init() {
logger.DefaultLogger = &logger.SimpleLogger{
Level: logger.LevelDebug,
}
}
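As a sketch of overriding `logger.DefaultLogger` entirely, the adapter below forwards SDK log lines to the standard library's log/slog package. It assumes the `logger.Logger` interface exposes the leveled, context-aware methods shown (`Enabled`, `Tracef`, `Debugf`, `Infof`, `Warnf`, `Errorf`); check the interface definition in your SDK version:

```go
import (
	"context"
	"fmt"
	"log/slog"

	"github.com/databricks/databricks-sdk-go/logger"
)

// slogAdapter is a sketch that satisfies the assumed logger.Logger
// interface by delegating to log/slog. Trace messages map to slog's
// debug level, since slog has no trace level.
type slogAdapter struct{}

func (s slogAdapter) Enabled(ctx context.Context, level logger.Level) bool { return true }
func (s slogAdapter) Tracef(ctx context.Context, format string, v ...any) {
	slog.DebugContext(ctx, fmt.Sprintf(format, v...))
}
func (s slogAdapter) Debugf(ctx context.Context, format string, v ...any) {
	slog.DebugContext(ctx, fmt.Sprintf(format, v...))
}
func (s slogAdapter) Infof(ctx context.Context, format string, v ...any) {
	slog.InfoContext(ctx, fmt.Sprintf(format, v...))
}
func (s slogAdapter) Warnf(ctx context.Context, format string, v ...any) {
	slog.WarnContext(ctx, fmt.Sprintf(format, v...))
}
func (s slogAdapter) Errorf(ctx context.Context, format string, v ...any) {
	slog.ErrorContext(ctx, fmt.Sprintf(format, v...))
}

func init() {
	logger.DefaultLogger = slogAdapter{}
}
```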
The current `Logger` interface will evolve in future versions of the Databricks SDK for Go.
The Databricks SDK for Go makes it easy to write unit tests for code that uses the SDK. The SDK provides a mockery-based mock implementation of its interfaces that you can use in those tests. For example:
```go
package my_test

import (
	"context"
	"testing"

	"github.com/databricks/databricks-sdk-go/experimental/mocks"
	"github.com/databricks/databricks-sdk-go/listing"
	"github.com/databricks/databricks-sdk-go/qa/poll"
	"github.com/databricks/databricks-sdk-go/service/compute"
	"github.com/databricks/databricks-sdk-go/service/iam"
	"github.com/databricks/databricks-sdk-go/service/sql"
	"github.com/stretchr/testify/mock"
)

func TestDatabricksSDK(t *testing.T) {
	ctx := context.Background()
	w := mocks.NewMockWorkspaceClient(t)
	w.GetMockClustersAPI().EXPECT().ListAll(
		ctx,
		mock.AnythingOfType("compute.ListClustersRequest"),
	).Return(
		[]compute.ClusterDetails{
			{ClusterName: "test-cluster-1"},
			{ClusterName: "test-cluster-2"},
		}, nil)

	// You can also mock the AccountClient as follows.
	a := mocks.NewMockAccountClient(t)
	a.GetMockAccountUsersAPI().EXPECT().ListAll(
		ctx,
		mock.AnythingOfType("iam.ListAccountUsersRequest"),
	).Return(
		[]iam.User{
			{DisplayName: "test-user-1"},
			{DisplayName: "test-user-2"},
		}, nil)
}
```
The SDK also provides several testing utilities to simplify mocking test results:

- The `*listing.SliceIterator` type simplifies mocking the results of a listing operation. You can specify the items to be iterated over as a slice.
- The `qa/poll.Simple()` method constructs a poller function to mock the results of polling for a long-running operation.
For example:
```go
func TestDatabricksSDK_helpers(t *testing.T) {
	ctx := context.Background()
	w := mocks.NewMockWorkspaceClient(t)
	a := mocks.NewMockAccountClient(t)

	// To mock iterators, you can provide the items to iterate over with
	// *listing.SliceIterator.
	iterator := listing.SliceIterator[iam.User]([]iam.User{
		{DisplayName: "test-user-1"},
		{DisplayName: "test-user-2"},
	})
	a.GetMockAccountUsersAPI().EXPECT().List(
		ctx,
		mock.AnythingOfType("iam.ListAccountUsersRequest"),
	).Return(&iterator)

	// To mock Wait* structures, you can stub out the Poll field.
	getResponse := sql.GetWarehouseResponse{
		Id: "abc",
	}
	wait := sql.WaitGetWarehouseRunning[struct{}]{
		Poll: poll.Simple(getResponse),
	}
	w.GetMockWarehousesAPI().EXPECT().Edit(mock.Anything, sql.EditWarehouseRequest{}).Return(&wait, nil)
}
```
During the Beta period, Databricks is actively working on stabilizing the Databricks SDK for Go's interfaces. API clients for all services are generated from specification files that are synchronized from the main platform. You are highly encouraged to pin the exact dependency version in the `go.mod` file and to read the changelog, where Databricks documents the changes. Some types of interfaces are more stable than others. For those interfaces that are not yet nightly tested, Databricks may make minor documented backward-incompatible changes, such as fixing mapping correctness from `int` to `int64` or renaming some type names to bring more consistency.
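For instance, pinning an exact release in `go.mod` (the version number shown is illustrative):

```
require github.com/databricks/databricks-sdk-go v0.9.0
```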