See the examples directory for samples of Databricks CRDs. Resources that are created via Kubernetes are owned by the operator: your checked-in manifests are the source of truth.
Changes made by users in the Databricks webapp will be overwritten by the operator if drift is detected:
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count drifted!
Diff (remote, kube):
json atoms at path ".settings.tasks[0].notebook_task.notebook_path" are not equal:
lhs:
"examples/job_oops_is_this_right.py"
rhs:
"examples/job.py"
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count reconciling drift...
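The drift check in the log above can be pictured with a small sketch. This is not the operator's actual implementation: it assumes both the remote (Databricks) settings and the Kubernetes spec have already been flattened into path-to-value maps, and the names `Drift`, `detect_drift` and `reconcile` are hypothetical.

```rust
use std::collections::BTreeMap;

/// One differing value between the remote (Databricks) and Kubernetes copies.
#[derive(Debug, PartialEq)]
struct Drift {
    path: String,
    remote: Option<String>,
    kube: Option<String>,
}

/// Compare two flattened specs (path -> value) and list every path that differs.
/// Assumption: flattening the JSON documents happens elsewhere.
fn detect_drift(
    remote: &BTreeMap<String, String>,
    kube: &BTreeMap<String, String>,
) -> Vec<Drift> {
    // Visit the union of keys so additions and removals are both reported.
    let mut paths: Vec<&String> = remote.keys().chain(kube.keys()).collect();
    paths.sort();
    paths.dedup();

    paths
        .into_iter()
        .filter(|p| remote.get(*p) != kube.get(*p))
        .map(|p| Drift {
            path: p.clone(),
            remote: remote.get(p).cloned(),
            kube: kube.get(p).cloned(),
        })
        .collect()
}

/// Reconcile by taking the Kubernetes copy as the source of truth,
/// which is the behaviour the README describes for drifted resources.
fn reconcile(kube: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    kube.clone()
}
```

Feeding it the two notebook paths from the log would yield a single `Drift` entry for `.settings.tasks[0].notebook_task.notebook_path`, and the reconciled remote state would again match the manifest.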
List the jobs that the operator's access token is allowed to view:
$ kubectl get databricksjobs
NAME STATUS
contoso-ingest-qa RUNNING
contoso-ingest-staging INTERNAL_ERROR
contoso-stats-qa TERMINATED
contoso-stats-staging NO_RUNS
$ kubectl describe databricksjob contoso-ingest-qa
...
A job's status key surfaces API information about the latest run. The status is polled every 60s:
$ kubectl get databricksjob contoso-ingest-staging -ojson | jq .status
{
"latest_run_state": {
"life_cycle_state": "INTERNAL_ERROR",
"result_state": "FAILED",
"state_message": "Task contoso-ingest-staging failed. This caused all downstream tasks to get skipped.",
"user_cancelled_or_timedout": false
}
}
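As a rough illustration of how the status key relates to the one-word STATUS column shown earlier, here is a minimal sketch. The struct mirrors the fields in the JSON above. The helper names are hypothetical, and the rule that a job without any run is reported as `NO_RUNS` is an assumption inferred from the sample output, not taken from the operator's code.

```rust
/// Mirrors the fields shown under `.status.latest_run_state` above.
#[derive(Debug, Clone, PartialEq)]
struct RunState {
    life_cycle_state: String,
    result_state: Option<String>,
    state_message: String,
    user_cancelled_or_timedout: bool,
}

/// Hypothetical helper: derive the one-word STATUS column from the latest run.
/// Assumption: a job that has never run is reported as NO_RUNS.
fn status_column(latest: Option<&RunState>) -> String {
    match latest {
        Some(state) => state.life_cycle_state.clone(),
        None => "NO_RUNS".to_string(),
    }
}

/// Hypothetical helper: true when the latest run finished with a FAILED result.
fn needs_attention(latest: Option<&RunState>) -> bool {
    matches!(latest, Some(s) if s.result_state.as_deref() == Some("FAILED"))
}
```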
Developers
Begin by creating the configmap as per the Helm instructions.
Generate and install the CRDs by running the crd_gen bin target:
cargo run --bin crd_gen | kubectl apply -f -
The quickest way to test the operator is with a working minikube cluster:
minikube start
minikube tunnel &
export RUST_LOG=databricks_kube
cargo run
[2022-11-02T18:56:25Z INFO databricks_kube] boot! (build: df7e26b-modified)
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD: databricksjobs.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD: gitcredentials.com.dstancu.databricks
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for settings in config map: databricks-kube-operator
[2022-11-02T18:56:25Z INFO databricks_kube::context] Found config map
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested GitCredential(s)
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested DatabricksJob(s)
Generating API Clients
The API clients are generated by openapi-generator and then lightly postprocessed so that the models derive JsonSchema and some bugs in the generated code are fixed.
TODO: Manual client 'fixes'
# Hey!! This uses GNU sed
# brew install gnu-sed
# Jobs API
openapi-generator generate -g rust -i openapi/jobs-2.1-aws.yaml -c openapi/config-jobs.yaml -o dbr_jobs
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_jobs/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_jobs/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_jobs/Cargo.toml
# Missing import?
gsed -r -i -e 's/(use reqwest;)/\1\nuse crate::models::ViewsToExport;/' dbr_jobs/src/apis/default_api.rs
# Git Credentials API
openapi-generator generate -g rust -i openapi/gitcredentials-2.0-aws.yaml -c openapi/config-git.yaml -o dbr_git_creds
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_git_creds/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_git_creds/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_git_creds/Cargo.toml
# Repos API
openapi-generator generate -g rust -i openapi/repos-2.0-aws.yaml -c openapi/config-repos.yaml -o dbr_repo
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_repo/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_repo/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_repo/Cargo.toml
# Secrets API
openapi-generator generate -g rust -i openapi/secrets-aws.yaml -c openapi/config-secrets.yaml -o dbr_secrets
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_secrets/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_secrets/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_secrets/Cargo.toml
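To make the effect of the sed substitutions above easier to follow, here is a sketch of the same rewrites in plain Rust. It is only an illustration, not part of the project's tooling: the function names are hypothetical. Like sed without the `g` flag, it replaces only the first match on each line.

```rust
/// Replace the first occurrence of `from` on every line, like `sed 's/from/to/'`.
fn rewrite_lines(source: &str, from: &str, to: &str) -> String {
    let mut out = source
        .lines()
        .map(|line| line.replacen(from, to, 1))
        .collect::<Vec<_>>()
        .join("\n");
    // `lines()` drops the final newline, so restore it if the input had one.
    if source.ends_with('\n') {
        out.push('\n');
    }
    out
}

/// Make a generated model derive JsonSchema and import it before the header comment.
fn add_json_schema_derive(model_source: &str) -> String {
    let derived = rewrite_lines(model_source, "derive(Clone", "derive(JsonSchema, Clone");
    rewrite_lines(&derived, "/*", "use schemars::JsonSchema;\n/*")
}

/// Add schemars right below the `[dependencies]` header of a Cargo.toml.
fn add_schemars_dependency(manifest: &str) -> String {
    rewrite_lines(manifest, "[dependencies]", "[dependencies]\nschemars = \"0.8.11\"")
}
```

Given a model file that opens with a `/*` header comment and contains `#[derive(Clone, Debug, PartialEq)]`, the result starts with `use schemars::JsonSchema;` and the derive list gains `JsonSchema` as its first entry.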
Expand CRD macros
Deriving CustomResource uses macros to generate another struct. For this example, the output struct name would be DatabricksJob:
rust-analyzer shows squiggles when you use crds::databricks_job::DatabricksJob, and it can be useful to look at the generated code. To see what is generated, use cargo-expand:
Add the new resource to the context ensure CRDs condition
Add the new resource to crdgen.rs
Running tests
Tests must be run with a single thread since we use a stateful singleton to 'mock' the state of a remote API. Eventually it would be nice to have integration tests targeting Databricks.
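The single-thread requirement follows from how a process-wide mock behaves. The sketch below is a simplified stand-in, not the project's real mock: `MOCK_REMOTE`, `reset_mock`, `create_job` and `job_count` are hypothetical names. Because every test reads and writes the same static, two tests running in parallel could see each other's jobs. With the standard Rust test harness, serial execution is typically requested with `cargo test -- --test-threads=1`.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the remote API: one process-wide list of
/// (job name, notebook path) pairs shared by every test in the binary.
static MOCK_REMOTE: Mutex<Vec<(String, String)>> = Mutex::new(Vec::new());

/// Clear the shared state; each test would call this before it starts.
fn reset_mock() {
    MOCK_REMOTE.lock().unwrap().clear();
}

/// Pretend to create a job on the remote side.
fn create_job(name: &str, notebook_path: &str) {
    MOCK_REMOTE
        .lock()
        .unwrap()
        .push((name.to_string(), notebook_path.to_string()));
}

/// Number of jobs the mock currently holds.
fn job_count() -> usize {
    MOCK_REMOTE.lock().unwrap().len()
}
```

The mutex keeps individual calls memory-safe, but it does not isolate tests from one another: a test that resets the mock, creates one job and then asserts `job_count() == 1` only holds if no other test touches the singleton in between.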