Tutorial
An example GitOps recipe
This repository contains an example job (how did you guess it was word count?) that we are going to operationalize using Helm and databricks-kube-operator. You can follow along with a local cluster, or in an environment managed by a GitOps tool such as ArgoCD or Fleet.
Begin by creating a Helm chart. The Helm starter chart has unneeded example resources and values that we remove:
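A minimal sketch, assuming a chart named databricks-example (the name is illustrative):

```bash
# Create a new chart and strip the starter example resources;
# we keep Chart.yaml, values.yaml, and the templates/ directory
helm create databricks-example
cd databricks-example
rm -rf templates/*.yaml templates/tests templates/NOTES.txt
# Empty the generated example values
> values.yaml
```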
Your directory structure should look like this:
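Roughly, using the illustrative chart name from above:

```
databricks-example/
├── Chart.yaml
├── values.yaml
├── charts/
└── templates/
```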
In Chart.yaml, add a dependency on the operator chart:
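A sketch of the dependency entry; the repository URL and version constraint below are assumptions, so use the values published for the operator chart release you intend to install:

```yaml
# Chart.yaml (appended to the generated file)
dependencies:
  - name: databricks-kube-operator
    # Assumed chart repository and version; check the operator's documentation
    repository: https://mach-kernel.github.io/databricks-kube-operator
    version: ">=0.5.0"
```

Then run `helm dependency update` so the operator chart is pulled into charts/.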
We are now going to create our resources in the templates/ directory.
Create a secret containing your Databricks API URL and a valid access token. The snippet below is provided for convenience, to run directly against the cluster for this example. Do not turn it into a template and check your token into source control.
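For example (the secret and key names here are assumptions; check the operator's documentation for the exact keys it expects):

```bash
kubectl create secret generic databricks-api \
  --from-literal=databricks_url="https://my-workspace.cloud.databricks.com" \
  --from-literal=access_token="$DATABRICKS_ACCESS_TOKEN"
```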
Create the file below. The operator configmap expects a secret name from which to pull its REST configuration.
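A sketch of that file; the ConfigMap name and key below are assumptions to verify against the operator's README:

```yaml
# templates/operator-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed name; the operator looks this ConfigMap up at startup
  name: databricks-kube-operator
data:
  # Points the operator at the secret created in the previous step
  databricks_secret_name: databricks-api
```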
Here is another "quick snippet" for making the required secret if deploying your own job from a private repo. As previously mentioned, do not check this in as a template.
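For example (the secret and key names are again assumptions; check the GitCredential CRD for the key it expects):

```bash
kubectl create secret generic example-job-git \
  --from-literal=personal_access_token="$GIT_PAT"
```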
The available Git providers are awsCodeCommit, azureDevOpsServices, bitbucketCloud, bitbucketServer, gitHub, gitHubEnterprise, gitLab, and gitLabEnterpriseEdition.
This is for use with the Repo API, which clones a repository to your workspace. Tasks are then launched from WORKSPACE paths. You can reuse the CRD from above, removing git_source and changing the task definition to match the example below:
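A sketch of the changed task definition; the notebook path and cluster ID are illustrative, and the field names follow the Databricks Jobs API 2.1:

```yaml
      tasks:
        - task_key: word-count
          notebook_task:
            # Path inside the workspace clone created by the Repo resource (illustrative)
            notebook_path: /Repos/Test/word-count/word_count
            source: WORKSPACE
          existing_cluster_id: "1234-567890-abcdefgh"   # illustrative cluster ID
```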
Awesome! We have templates for our shiny new job. Let's make sure the chart works as expected. Inspect the resulting templates for errors:
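For example:

```bash
# Render the chart locally and review the output
helm template example-job .

# Or let the server validate the rendered manifests without installing
helm install example-job . --dry-run --debug
```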
Local/minikube: Comment out the dependency key and continue with installation
Fleet/others: Use one chart for your operator deployment, and another for the Databricks resources. On first deploy, the operator chart will sync successfully and example-job will succeed on a subsequent retry.
If successful, you should see the following Helm deployments, as well as your job in Databricks:
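To confirm, for example:

```bash
helm list --all-namespaces

# Resource name assumes the DatabricksJob CRD's plural; adjust if it differs
kubectl get databricksjobs
```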
Bump the chart version for your Databricks definitions as they change, and let databricks-kube-operator reconcile them when they are merged to your main branch.
We recommend using the Git source for your job definitions, as databricks-kube-operator does not poll Databricks to update the workspace repository clone. PRs are accepted.
Begin by creating a service principal, and use the corresponding API call to create an access token. If your new service principal is unable to issue a token, enable token permissions for it by following the Databricks instructions.
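A hedged sketch of the two API calls with curl; the endpoints shown are the SCIM ServicePrincipals API and the token-management on-behalf-of API, so confirm the exact paths and payloads against the Databricks API reference for your cloud:

```bash
# Create the service principal (SCIM API)
curl -X POST "$DATABRICKS_HOST/api/2.0/preview/scim/v2/ServicePrincipals" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/scim+json" \
  -d '{"schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"], "displayName": "databricks-kube-operator"}'

# Issue a token on behalf of the new service principal (note its application_id)
curl -X POST "$DATABRICKS_HOST/api/2.0/token-management/on-behalf-of/tokens" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"application_id": "<application_id>", "lifetime_seconds": 7776000, "comment": "databricks-kube-operator"}'
```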
In a production environment, the Databricks API URL and access token can be sourced via an external secrets integration backed by a managed secrets store (e.g. AWS Secrets Manager).
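One common approach is the External Secrets Operator; a sketch, where the SecretStore name and remote key are illustrative:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: databricks-api
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager          # illustrative store name
  target:
    name: databricks-api               # secret the operator will read
  data:
    - secretKey: databricks_url
      remoteRef:
        key: prod/databricks/operator  # illustrative remote key
        property: databricks_url
    - secretKey: access_token
      remoteRef:
        key: prod/databricks/operator
        property: access_token
```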
Public repositories do not require Git credentials. The tutorial deploys the job from this public repository. You can skip this step unless you are following along with your own job in a private repo.
Create the file below. According to the Databricks Git credentials API documentation, the available VCS providers are those listed earlier:
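A sketch of the template; the apiVersion, kind, and field names are assumptions based on the operator's CRDs, so verify them against the chart's generated CRD schema:

```yaml
# templates/git-credential.yaml
apiVersion: com.dstancu.databricks/v1   # assumed API group/version
kind: GitCredential
metadata:
  name: example-credential
spec:
  # Kubernetes secret holding the personal access token created above
  secret_name: example-job-git
  credential:
    git_provider: gitHub
    git_username: my-github-user        # illustrative
```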
Create the file below to create a job. There are two possible strategies for running jobs from Git sources. For more configuration options, see the Databricks Jobs API documentation.
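A sketch of a job launched directly from a Git source. The CRD wrapper (apiVersion, kind, spec.job.settings) is an assumption to verify against the operator's CRD schema; the inner settings follow the Databricks Jobs API 2.1, and the repository URL and notebook path are illustrative:

```yaml
# templates/job.yaml
apiVersion: com.dstancu.databricks/v1   # assumed API group/version
kind: DatabricksJob
metadata:
  name: example-job
spec:
  job:
    settings:
      name: example-job
      format: MULTI_TASK
      git_source:
        git_url: https://github.com/example-org/word-count   # illustrative repo
        git_provider: gitHub
        git_branch: main
      tasks:
        - task_key: word-count
          notebook_task:
            notebook_path: notebooks/word_count   # illustrative path within the repo
            source: GIT
          new_cluster:
            spark_version: 11.3.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 1
```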
Does your job use an instance profile? You will have to give your new service principal access to the instance profile, or your job will fail to launch.
If your credentials are configured, Databricks job definitions can directly reference a Git source. Whenever the job is triggered, it uses the latest version from source control without needing to poll the repo for updates.
Complete the steps above before proceeding.
If everything looks good, it's time to install. Unfortunately, this requires a discussion of the chicken-and-egg problem: the operator's CRDs must exist in the cluster before the chart's Databricks resources can be applied. Here are suggestions for different readers:
ArgoCD: Use a sync ordering mechanism (for example, sync waves) so the operator chart is applied before the charts containing Databricks resources.
Create the file below to create a repo. Ensure that the /Test directory exists within Repos on your Databricks instance, or the create request will fail with a 400:
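A sketch; as with the other resources, the apiVersion, kind, and field names are assumptions to verify against the CRD schema:

```yaml
# templates/repo.yaml
apiVersion: com.dstancu.databricks/v1   # assumed API group/version
kind: Repo
metadata:
  name: example-repo
spec:
  repository:
    url: https://github.com/example-org/word-count   # illustrative repo
    provider: gitHub
    # Workspace destination; /Repos/Test must already exist or the API returns 400
    path: /Repos/Test/word-count
```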