🦀databricks-kube-operator
A Kubernetes operator for Databricks
[!IMPORTANT] As of 2025/06 this project is archived. We recommend using Upjet to generate a Crossplane provider from the official Databricks Terraform Provider.
A kube-rs operator to enable GitOps style management of Databricks resources. It supports the following APIs:
Jobs 2.1: DatabricksJob
Git Credentials 2.0: GitCredential
Repos 2.0: Repo
Secrets 2.0: DatabricksSecretScope, DatabricksSecret
Experimental, headed towards stable. See the GitHub project board for the roadmap. Contributions and feedback are welcome!
Quick Start
Looking for a more in-depth example? Read the tutorial.
Installation
Add the Helm repository and install the chart:
helm repo add mach https://mach-kernel.github.io/databricks-kube-operator
helm install databricks-kube-operator mach/databricks-kube-operator
Create a config map in the same namespace as the operator. To override the config map name, pass --set configMapName=my-custom-name to helm install:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: databricks-kube-operator
data:
  api_secret_name: databricks-api-secret
EOF
Create a secret with your API URL and credentials:
cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  access_token: $(echo -n 'shhhh' | base64)
  databricks_url: $(echo -n 'https://my-tenant.cloud.databricks.com/api' | base64)
kind: Secret
metadata:
  name: databricks-api-secret
type: Opaque
EOF
Usage
See the examples directory for samples of Databricks CRDs. Resources that are created via Kubernetes are owned by the operator: your checked-in manifests are the source of truth.
apiVersion: com.dstancu.databricks/v1
kind: DatabricksJob
metadata:
  name: my-word-count
  namespace: default
spec:
  job:
    settings:
      email_notifications:
        no_alert_for_skipped_runs: false
      format: MULTI_TASK
      job_clusters:
      - job_cluster_key: word-count-cluster
        new_cluster:
          ...
      max_concurrent_runs: 1
      name: my-word-count
      git_source:
        git_branch: misc-and-docs
        git_provider: gitHub
        git_url: https://github.com/mach-kernel/databricks-kube-operator
      tasks:
      - email_notifications: {}
        job_cluster_key: word-count-cluster
        notebook_task:
          notebook_path: examples/job.py
          source: GIT
        task_key: my-word-count
        timeout_seconds: 0
      timeout_seconds: 0
Changes made by users in the Databricks webapp will be overwritten by the operator if drift is detected:
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count drifted!
Diff (remote, kube):
json atoms at path ".settings.tasks[0].notebook_task.notebook_path" are not equal:
lhs:
"examples/job_oops_is_this_right.py"
rhs:
"examples/job.py"
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count reconciling drift...
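The drift check in the log above can be pictured as a field-by-field comparison between the remote object and the manifest. The following is a minimal sketch, assuming both sides have already been flattened into path/value pairs. The `Drift` type, the `find_drift` function and the flattened representation are hypothetical and are not the operator's actual implementation, which diffs JSON documents.

```rust
use std::collections::BTreeMap;

/// One differing field: its path, the remote value, and the value in Kubernetes.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Drift {
    pub path: String,
    pub remote: Option<String>,
    pub kube: Option<String>,
}

/// Compare two flattened views of a resource and report every path whose
/// values differ, including paths that exist on only one side.
pub fn find_drift(
    remote: &BTreeMap<String, String>,
    kube: &BTreeMap<String, String>,
) -> Vec<Drift> {
    // Union of all paths, sorted and de-duplicated so the report is deterministic.
    let mut paths: Vec<&String> = remote.keys().chain(kube.keys()).collect();
    paths.sort();
    paths.dedup();

    paths
        .into_iter()
        .filter(|p| remote.get(*p) != kube.get(*p))
        .map(|p| Drift {
            path: p.clone(),
            remote: remote.get(p).cloned(),
            kube: kube.get(p).cloned(),
        })
        .collect()
}
```

An empty result means the remote object matches the manifest. Any entry in the result corresponds to a field the operator would overwrite with the value from Kubernetes.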
List the jobs that the operator's access token is allowed to view:
$ kubectl get databricksjobs
NAME STATUS
contoso-ingest-qa RUNNING
contoso-ingest-staging INTERNAL_ERROR
contoso-stats-qa TERMINATED
contoso-stats-staging NO_RUNS
$ kubectl describe databricksjob contoso-ingest-qa
...
A job's status key surfaces API information about the latest run. The status is polled every 60s:
$ kubectl get databricksjob contoso-ingest-staging -ojson | jq .status
{
  "latest_run_state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "result_state": "FAILED",
    "state_message": "Task contoso-ingest-staging failed. This caused all downstream tasks to get skipped.",
    "user_cancelled_or_timedout": false
  }
}
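To make the shape of the status key concrete, here is a minimal sketch of a Rust model for it. The field names mirror the JSON above. The `JobStatus` wrapper, the `summary` and `failed` helpers, and the assumption that `NO_RUNS` is reported when a job has no run yet are illustrative guesses based on the listing earlier in this section, not the operator's actual types.

```rust
/// Mirrors the fields of `latest_run_state` in the JSON above.
#[derive(Debug, Clone, PartialEq)]
pub struct RunState {
    pub life_cycle_state: String,
    pub result_state: Option<String>,
    pub state_message: Option<String>,
    pub user_cancelled_or_timedout: bool,
}

/// Hypothetical status wrapper: `None` means the job has never run.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct JobStatus {
    pub latest_run_state: Option<RunState>,
}

impl JobStatus {
    /// One-word summary comparable to the STATUS column.
    /// Assumption: `NO_RUNS` stands for "no run recorded yet".
    pub fn summary(&self) -> &str {
        match &self.latest_run_state {
            Some(state) => state.life_cycle_state.as_str(),
            None => "NO_RUNS",
        }
    }

    /// True when the latest run reported a `FAILED` result state.
    pub fn failed(&self) -> bool {
        self.latest_run_state
            .as_ref()
            .and_then(|state| state.result_state.as_deref())
            == Some("FAILED")
    }
}
```

Keeping `latest_run_state` optional lets a job with no runs be represented without inventing placeholder values for the other fields.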
Developers
Begin by creating the configmap as per the Helm instructions.
Generate and install the CRDs by running the crd_gen bin target:
cargo run --bin crd_gen | kubectl apply -f -
The quickest way to test the operator is with a working minikube cluster:
minikube start
minikube tunnel &
export RUST_LOG=databricks_kube
cargo run
[2022-11-02T18:56:25Z INFO databricks_kube] boot! (build: df7e26b-modified)
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD: databricksjobs.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD: gitcredentials.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for settings in config map: databricks-kube-operator
[2022-11-02T18:56:25Z INFO databricks_kube::context] Found config map
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested GitCredential(s)
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested DatabricksJob(s)
Generating API Clients
The client is generated by openapi-generator and then lightly postprocessed so that the models derive JsonSchema and some bugs are fixed.
Expand CRD macros
Deriving CustomResource uses macros to generate another struct. For this example, the output struct name would be DatabricksJob:
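use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(Clone, CustomResource, Debug, Default, Deserialize, PartialEq, Serialize, JsonSchema)]
#[kube(
    group = "com.dstancu.databricks",
    version = "v1",
    kind = "DatabricksJob",
    derive = "Default",
    namespaced
)]
pub struct DatabricksJobSpec {
    // `Job` is a model from the generated API client (import omitted here).
    pub job: Job,
}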
rust-analyzer shows squiggles when you use crds::databricks_job::DatabricksJob, but you may still want to look inside the generated code. To see what is generated, use cargo-expand:
rustup default nightly
cargo expand --bin databricks_kube
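For orientation, the expanded output contains, among other generated impls, a root struct shaped roughly like the one below. This is a hand-written approximation that compiles on its own: `ObjectMeta` and `Job` are simplified stand-ins for the real types (from k8s-openapi and the generated API client respectively), and the derives are chosen for the sketch rather than copied from the macro output.

```rust
// Simplified stand-in for k8s-openapi's ObjectMeta (the real type has many more fields).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ObjectMeta {
    pub name: Option<String>,
    pub namespace: Option<String>,
}

// Simplified stand-in for the generated `Job` model; the single field is illustrative.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Job {
    pub job_id: Option<i64>,
}

// The spec type you write by hand.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct DatabricksJobSpec {
    pub job: Job,
}

// Approximate shape of the root type emitted by `#[derive(CustomResource)]`:
// Kubernetes object metadata plus your spec.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct DatabricksJob {
    pub metadata: ObjectMeta,
    pub spec: DatabricksJobSpec,
}

impl DatabricksJob {
    /// The derive also provides a constructor that takes a name and a spec.
    pub fn new(name: &str, spec: DatabricksJobSpec) -> Self {
        Self {
            metadata: ObjectMeta {
                name: Some(name.to_string()),
                ..Default::default()
            },
            spec,
        }
    }
}
```

With this shape in mind, `DatabricksJob::new("my-word-count", spec)` yields an object whose `metadata.name` is set and whose `spec` is the value you passed in.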
Adding a new CRD
Want to add support for a new API? Provided it has an OpenAPI definition, these are the steps. Look for existing examples in the codebase:
1. Download API definition into openapi/ and make a Rust generator configuration (feel free to copy the others and change name)
2. Generate the SDK, add it to the Cargo workspace and dependencies for databricks-kube/
3. Implement RestConfig<TSDKConfig> for your new client
4. Define the new CRD Spec type (follow kube-rs tutorial)
5. impl RemoteAPIResource<TAPIResource> for MyNewCRD
6. impl StatusAPIResource<TStatusType> for MyNewCRD and specify TStatusType in your CRD
7. Add the new resource to the context ensure CRDs condition
8. Add the new resource to crdgen.rs
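The trait-implementation steps above can be sketched as follows. This only illustrates how the type parameters tie a CRD to an SDK model and a status type. The real traits live in the codebase and their methods are not shown in this document, so the trait bodies here (`desired`, `drifted`, `compute_status`) are hypothetical, synchronous stand-ins, and `Widget` and `WidgetStatus` are invented placeholder types.

```rust
/// Hypothetical SDK model, playing the role of `TAPIResource`.
#[derive(Clone, Debug, PartialEq)]
pub struct Widget {
    pub id: Option<u64>,
    pub name: String,
}

/// Hypothetical status type, playing the role of `TStatusType`.
#[derive(Clone, Debug, PartialEq)]
pub struct WidgetStatus {
    pub exists_remotely: bool,
}

/// Hypothetical CRD root type (in the real project the derive macro generates it).
#[derive(Clone, Debug, PartialEq)]
pub struct MyNewCRD {
    pub spec: Widget,
    pub status: Option<WidgetStatus>,
}

/// Stand-in trait: the methods are assumptions, not the project's signatures.
pub trait RemoteAPIResource<TAPIResource> {
    /// The object the remote API should hold, derived from the manifest.
    fn desired(&self) -> TAPIResource;
    /// Whether the remote object differs from the manifest.
    fn drifted(&self, remote: &TAPIResource) -> bool;
}

/// Stand-in trait: the method is an assumption, not the project's signature.
pub trait StatusAPIResource<TStatusType> {
    fn compute_status(&self, found_remotely: bool) -> TStatusType;
}

impl RemoteAPIResource<Widget> for MyNewCRD {
    fn desired(&self) -> Widget {
        self.spec.clone()
    }

    fn drifted(&self, remote: &Widget) -> bool {
        // Ignore the server-assigned id and compare only user-managed fields.
        remote.name != self.spec.name
    }
}

impl StatusAPIResource<WidgetStatus> for MyNewCRD {
    fn compute_status(&self, found_remotely: bool) -> WidgetStatus {
        WidgetStatus { exists_remotely: found_remotely }
    }
}
```

For the actual method sets and async signatures, use the existing CRD implementations in the codebase as the reference.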
Running tests
Tests must be run with a single thread since we use a stateful singleton to 'mock' the state of a remote API. Eventually it would be nice to have integration tests targeting Databricks.
$ cargo test -- --test-threads=1
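To illustrate why a shared singleton forces single-threaded tests, here is a minimal sketch of a process-wide mock. `MOCK_REMOTE` and the helper functions are hypothetical and much simpler than the project's actual mock. The mutex prevents data races, but it does not stop two tests running in parallel from seeing each other's records between a reset and an assertion.

```rust
use std::sync::Mutex;

/// Hypothetical process-wide mock of a remote API; every test shares this state.
static MOCK_REMOTE: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Clear the mock, typically at the start of a test.
pub fn mock_reset() {
    MOCK_REMOTE.lock().unwrap().clear();
}

/// Pretend to create a remote resource.
pub fn mock_create(name: &str) {
    MOCK_REMOTE.lock().unwrap().push(name.to_string());
}

/// Pretend to list remote resources.
pub fn mock_list() -> Vec<String> {
    MOCK_REMOTE.lock().unwrap().clone()
}
```

If one test calls `mock_reset()` and `mock_create("job-a")` while another test calls `mock_create("job-b")` on a different thread, the first test's `mock_list()` may return both names. Running with `--test-threads=1` serializes the tests so each one observes only its own state.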
License