# databricks-kube-operator

> [!IMPORTANT]
> As of 2025/06 this project is archived. We recommend using [Upjet](https://github.com/crossplane/upjet) to generate a Crossplane provider from the official [Databricks Terraform Provider](https://registry.terraform.io/providers/databricks/databricks/latest/docs).

[![Rust](https://github.com/mach-kernel/databricks-kube-operator/actions/workflows/rust.yml/badge.svg?branch=master)](https://github.com/mach-kernel/databricks-kube-operator/actions/workflows/rust.yml) [![FOSSA Status](https://app.fossa.com/api/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator.svg?type=shield)](https://app.fossa.com/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator?ref=badge_shield)

A [kube-rs](https://kube.rs/) operator to enable GitOps style management of Databricks resources. It supports the following APIs:

| API                 | CRD                                     |
| ------------------- | --------------------------------------- |
| Jobs 2.1            | DatabricksJob                           |
| Git Credentials 2.0 | GitCredential                           |
| Repos 2.0           | Repo                                    |
| Secrets 2.0         | DatabricksSecretScope, DatabricksSecret |

Experimental, headed towards stable. See the GitHub project board for the roadmap. Contributions and feedback are welcome!

[Read the docs](https://databricks-kube-operator.gitbook.io/doc)

## Quick Start

Looking for a more in-depth example? Read the [tutorial](https://databricks-kube-operator.gitbook.io/doc/tutorial).

### Installation

Add the Helm repository and install the chart:

```bash
helm repo add mach https://mach-kernel.github.io/databricks-kube-operator
helm install databricks-kube-operator mach/databricks-kube-operator
```

Create a config map in the same namespace as the operator. To override the config map name, pass `--set configMapName=my-custom-name` to `helm install`:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: databricks-kube-operator
data:
  api_secret_name: databricks-api-secret
EOF
```

Create a secret with your API URL and credentials:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  access_token: $(echo -n 'shhhh' | base64)
  databricks_url: $(echo -n 'https://my-tenant.cloud.databricks.com/api' | base64)
kind: Secret
metadata:
  name: databricks-api-secret
type: Opaque
EOF
```
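Note that Kubernetes Secret `data` values are base64-encoded, and a trailing newline gets encoded along with the payload; that is why the commands above use `echo -n`. A quick sanity check of the difference:

```shell
# "echo -n" omits the trailing newline; plain "echo" encodes it too,
# and the stray newline would end up inside the decoded token.
echo -n 'shhhh' | base64   # c2hoaGg=
echo 'shhhh' | base64      # c2hoaGgK  (extra encoded newline byte)
```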

### Usage

See the examples directory for samples of Databricks CRDs. Resources that are created via Kubernetes are owned by the operator: your checked-in manifests are the source of truth.

```yaml
apiVersion: com.dstancu.databricks/v1
kind: DatabricksJob
metadata:
  name: my-word-count
  namespace: default
spec:
  job:
    settings:
      email_notifications:
        no_alert_for_skipped_runs: false
      format: MULTI_TASK
      job_clusters:
      - job_cluster_key: word-count-cluster
        new_cluster:
          ...
      max_concurrent_runs: 1
      name: my-word-count
      git_source:
        git_branch: misc-and-docs
        git_provider: gitHub
        git_url: https://github.com/mach-kernel/databricks-kube-operator
      tasks:
      - email_notifications: {}
        job_cluster_key: word-count-cluster
        notebook_task:
          notebook_path: examples/job.py
          source: GIT
        task_key: my-word-count
        timeout_seconds: 0
      timeout_seconds: 0
```
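The other CRDs follow the same envelope: group `com.dstancu.databricks/v1`, a `metadata` block, and a `spec` wrapping the corresponding Databricks API payload (as `job` does above). A generic skeleton, with placeholder names, looks like this; the exact spec fields vary per kind, so treat the examples directory as authoritative:

```yaml
apiVersion: com.dstancu.databricks/v1
kind: GitCredential   # or Repo, DatabricksSecretScope, DatabricksSecret
metadata:
  name: my-resource
  namespace: default
spec:
  # The spec wraps the corresponding API object; field names vary per
  # kind -- see the examples directory for real samples.
```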

Changes made by users in the Databricks webapp will be overwritten by the operator if drift is detected:

```
[2024-01-11T14:20:40Z INFO  databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count drifted!
    Diff (remote, kube):
    json atoms at path ".settings.tasks[0].notebook_task.notebook_path" are not equal:
        lhs:
            "examples/job_oops_is_this_right.py"
        rhs:
            "examples/job.py"
[2024-01-11T14:20:40Z INFO  databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count reconciling drift...
```

List jobs (limited to those visible to the operator's access token):

```bash
$ kubectl get databricksjobs
NAME                     STATUS
contoso-ingest-qa        RUNNING
contoso-ingest-staging   INTERNAL_ERROR
contoso-stats-qa         TERMINATED
contoso-stats-staging    NO_RUNS

$ kubectl describe databricksjob contoso-ingest-qa
...
```

A job's status key surfaces API information about the latest [run](https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsRunsList). The status is polled every 60s:

```bash
$ kubectl get databricksjob contoso-ingest-staging -ojson | jq .status
{
  "latest_run_state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "result_state": "FAILED",
    "state_message": "Task contoso-ingest-staging failed. This caused all downstream tasks to get skipped.",
    "user_cancelled_or_timedout": false
  }
}
```

## Developers

Begin by creating the config map as described in the Helm instructions above.

Generate and install the CRDs by running the `crd_gen` bin target:

```bash
cargo run --bin crd_gen | kubectl apply -f -
```

The quickest way to test the operator is with a working [minikube](https://minikube.sigs.k8s.io/docs/start/) cluster:

```bash
minikube start
minikube tunnel &
```

```bash
export RUST_LOG=databricks_kube
cargo run
[2022-11-02T18:56:25Z INFO  databricks_kube] boot! (build: df7e26b-modified)
[2022-11-02T18:56:25Z INFO  databricks_kube::context] Waiting for CRD: databricksjobs.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO  databricks_kube::context] Waiting for CRD: gitcredentials.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO  databricks_kube::context] Waiting for settings in config map: databricks-kube-operator
[2022-11-02T18:56:25Z INFO  databricks_kube::context] Found config map
[2022-11-02T18:56:25Z INFO  databricks_kube::traits::synced_api_resource] Looking for uningested GitCredential(s)
[2022-11-02T18:56:25Z INFO  databricks_kube::traits::synced_api_resource] Looking for uningested DatabricksJob(s)
```

### Generating API Clients

The client is generated by `openapi-generator` and then lightly postprocessed so the models derive [`JsonSchema`](https://github.com/GREsau/schemars#basic-usage) and some generator bugs are fixed.

<details>

<summary>TODO: Manual client 'fixes'</summary>

```bash
# Hey!! This uses GNU sed
# brew install gnu-sed

# Jobs API
openapi-generator generate -g rust -i openapi/jobs-2.1-aws.yaml -c openapi/config-jobs.yaml -o dbr_jobs

# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_jobs/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_jobs/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_jobs/Cargo.toml

# Missing import?
gsed -r -i -e 's/(use reqwest;)/\1\nuse crate::models::ViewsToExport;/' dbr_jobs/src/apis/default_api.rs

# Git Credentials API
openapi-generator generate -g rust -i openapi/gitcredentials-2.0-aws.yaml -c openapi/config-git.yaml -o dbr_git_creds

# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_git_creds/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_git_creds/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_git_creds/Cargo.toml

# Repos API
openapi-generator generate -g rust -i openapi/repos-2.0-aws.yaml -c openapi/config-repos.yaml -o dbr_repo

# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_repo/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_repo/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_repo/Cargo.toml

# Secrets API
openapi-generator generate -g rust -i openapi/secrets-aws.yaml -c openapi/config-secrets.yaml -o dbr_secrets
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_secrets/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_secrets/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_secrets/Cargo.toml
```

</details>

### Expand CRD macros

Deriving `CustomResource` uses macros to generate another struct. For this example, the output struct name would be `DatabricksJob`:

```rust
#[derive(Clone, CustomResource, Debug, Default, Deserialize, PartialEq, Serialize, JsonSchema)]
#[kube(
    group = "com.dstancu.databricks",
    version = "v1",
    kind = "DatabricksJob",
    derive = "Default",
    namespaced
)]
pub struct DatabricksJobSpec {
    pub job: Job,
}
```

`rust-analyzer` shows squiggles when you `use crds::databricks_job::DatabricksJob`, but you may still want to inspect the generated struct. To see the macro output, use [cargo-expand](https://github.com/dtolnay/cargo-expand):

```bash
rustup default nightly
cargo expand --bin databricks_kube
```

### Adding a new CRD

Want to add support for a new API? Provided it has an OpenAPI definition, these are the steps; look to the existing implementations in the codebase for examples:

* Download API definition into `openapi/` and make a [Rust generator configuration](https://openapi-generator.tech/docs/generators/rust/) (feel free to copy the others and change name)
* Generate the SDK, add it to the Cargo workspace and dependencies for `databricks-kube/`
* Implement `RestConfig<TSDKConfig>` for your new client
* Define the new CRD Spec type ([follow kube-rs tutorial](https://kube.rs/getting-started/))
* `impl RemoteAPIResource<TAPIResource> for MyNewCRD`
* `impl StatusAPIResource<TStatusType> for MyNewCRD` and [specify `TStatusType` in your CRD](https://github.com/kube-rs/kube/blob/main/examples/crd_derive.rs#L20)
* Add the new resource to the context's ensure-CRDs condition
* Add the new resource to `crdgen.rs`
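The heart of `RemoteAPIResource` is comparing the remote object against the kube spec to detect drift. As a simplified, self-contained sketch of that pattern (the real traits in this codebase are async and parameterized over the generated SDK types; all names below are illustrative):

```rust
use std::fmt::Debug;

// Simplified stand-in for the operator's RemoteAPIResource trait; the
// real trait is async and takes the generated SDK's client config.
trait RemoteApiResource {
    type Remote: PartialEq + Debug;

    // What the remote API currently has (stubbed here).
    fn remote(&self) -> Self::Remote;
    // The desired state, taken from the kube spec.
    fn desired(&self) -> Self::Remote;

    // Drift detection: reconcile when remote and kube disagree.
    fn drifted(&self) -> bool {
        self.remote() != self.desired()
    }
}

struct MyNewCrd {
    spec_path: String,
    remote_path: String,
}

impl RemoteApiResource for MyNewCrd {
    type Remote = String;
    fn remote(&self) -> String {
        self.remote_path.clone()
    }
    fn desired(&self) -> String {
        self.spec_path.clone()
    }
}

fn main() {
    let crd = MyNewCrd {
        spec_path: "examples/job.py".into(),
        remote_path: "examples/job_oops_is_this_right.py".into(),
    };
    // The operator would now overwrite the remote job settings.
    assert!(crd.drifted());
}
```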

### Running tests

Tests must be run with a single thread since a stateful singleton 'mocks' the state of the remote API. Eventually it would be nice to have integration tests targeting Databricks.

```bash
$ cargo test -- --test-threads=1
```

## License

[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator.svg?type=large)](https://app.fossa.com/projects/custom%2B34302%2Fgithub.com%2Fmach-kernel%2Fdatabricks-kube-operator?ref=badge_large)

