Four Hallucinations and a Python Script
Copilot Hallucination over simple Databricks Job/Task Parameters
I asked an LLM agent to get a Databricks job ID at runtime. It confidently proposed four approaches. All four were wrong. The fix was a few lines of Python I could have written in ten minutes.
I had a custom metrics table. It was working. Batch durations, row counts, streaming heartbeats, all landing in Delta. One problem: the `job_id` and `run_id` columns were null in every row.
These two columns exist so you can join custom metrics to Databricks system tables. Without them, my per-batch timing data lives in isolation. With them, one SQL join correlates batch internals with job cost, cluster utilization, and run outcomes. The whole point of the table.
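For illustration, that join might look something like this. The metrics table name and its duration/count columns are made up for the sketch; `system.lakeflow.job_run_timeline` is the Databricks system table that records run outcomes:

```sql
-- Correlate per-batch metrics with run outcomes from the system tables.
-- my_catalog.observability.custom_metrics is a hypothetical name;
-- only job_id and run_id are guaranteed by this post.
SELECT
  m.batch_duration_ms,
  m.row_count,
  t.result_state,
  t.period_start_time,
  t.period_end_time
FROM my_catalog.observability.custom_metrics AS m
JOIN system.lakeflow.job_run_timeline AS t
  ON m.job_id = t.job_id AND m.run_id = t.run_id
```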
So I asked my LLM coding agent to fix it. What followed was an afternoon of increasingly creative hallucinations, each delivered with full confidence, each completely wrong.
Hallucination 1: Spark conf
The agent’s first suggestion:
```python
job_id = spark.conf.get("spark.databricks.job.id")
run_id = spark.conf.get("spark.databricks.job.runId")
```

Sensible-looking. There are plenty of Stack Overflow answers and blog posts mentioning these keys. The agent had probably trained on hundreds of them.
The result on our serverless compute:
```
ERROR: [CONFIG_NOT_AVAILABLE] Configuration spark.databricks.job.id is not available.
```

Not “key not found.” Not “returns null.” A hard error with a JVM stack trace 80 lines long. This config key doesn’t exist in the Spark Connect protocol that serverless uses. The agent had no way to know that because it trained on content from the classic compute era.
Hallucination 2: environment variables
After the Spark conf failure, the agent pivoted to environment variables:
```python
import os

job_id = int(os.environ["DATABRICKS_JOB_ID"])
run_id = int(os.environ["DATABRICKS_RUN_ID"])
```

This one was interesting because the agent didn’t just suggest reading env vars. It invented the variable names. `DATABRICKS_JOB_ID` is not a real environment variable that Databricks sets. The agent generated a plausible-sounding name, wrote the code with confidence, and I deployed it.
The metrics kept showing null.
I dumped every environment variable matching “JOB”, “RUN”, or “DATABRICKS” from a running job. Here’s what Databricks actually sets:
```
DATABRICKS_RUNTIME_VERSION=client.5.1
DATABRICKS_CLUSTER_LIBS_PYTHON_ROOT_DIR=python
DATABRICKS_GANGLIA_ENABLED=FALSE
```

Runtime metadata. Library paths. Nothing about job or run identity. `DATABRICKS_JOB_ID` doesn’t exist. The agent made it up.
Hallucination 3: dbutils notebook context
Third attempt. The agent went deeper into the Databricks internals:
```python
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
job_id = int(ctx.tags().get("jobId").get())
run_id = int(ctx.tags().get("idInJob").get())
```

This is a real API. It actually works, in notebooks. But we run Python wheel tasks, not notebooks. The error:

```
module 'pyspark.dbutils' has no attribute 'notebook'
```

The `pyspark.dbutils` module exists in wheel task context, but the `notebook` attribute never loads: there is no notebook, so there is nothing to attach it to. The agent found an API that looks right, generated the code, and moved on.
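A defensive pattern that would have surfaced this immediately: probe the context behind a guard instead of assuming it exists. This is a sketch, not the post’s final solution; `dbruntime.databricks_repl_context` is the undocumented module that turned out to work later, and it simply isn’t importable outside the Databricks runtime:

```python
def get_notebook_job_context():
    """Best-effort job identity from the Databricks REPL context.

    Returns (job_id, run_id) when the context is available, or
    (None, None) anywhere it isn't (wheel tasks, local runs).
    """
    try:
        # Databricks-only module; ImportError outside the runtime.
        from dbruntime.databricks_repl_context import get_context
        ctx = get_context()
        if ctx is None:
            return None, None
        return ctx.jobId, ctx.idInJob
    except Exception:
        return None, None


print(get_notebook_job_context())  # locally: (None, None)
```

The guard turns “hard crash in one task type” into “explicit None you can test for,” which is what you want when the same wheel runs on several kinds of compute.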
Hallucination 4: dynamic references in spark_env_vars
At this point I pulled up the Databricks docs myself. I found the page on dynamic value references. The agent read it too and proposed putting `{{job_id}}` in the DAB `spark_env_vars`:
```yaml
spark_env_vars:
  DATABRICKS_JOB_ID: "{{job_id}}"
  DATABRICKS_RUN_ID: "{{run_id}}"
```

Two problems. First, the syntax was wrong. The correct dynamic reference is `{{job.id}}`, not `{{job_id}}`. Second, and more fundamentally, `spark_env_vars` doesn’t resolve dynamic value references at all. The values pass through as literal strings. The cluster environment showed:

```
DATABRICKS_RUN_ID={{run_id}}
```

Not the run ID. The literal text `{{run_id}}`.
The Databricks docs don’t say “dynamic value references don’t work in spark_env_vars.” They just don’t list spark_env_vars as a supported location. The docs describe where they do work (task parameters, job parameters), but they never explicitly say where they don’t. That silence is a trap for both humans and language models.
The documentation problem
The Databricks documentation for dynamic value references says you can use `{{job.id}}` in “parameters or fields that pass context into tasks.” It gives examples for notebook `base_parameters` and job-level `parameters`. For Python wheel tasks, it says “parameters defined in the task definition are passed as keyword arguments to your code.”
What it doesn’t say:
- Which specific YAML fields support resolution and which don’t
- That `spark_env_vars` passes values through without resolving them
- That the old `spark.databricks.job.id` conf key doesn’t work on serverless
- That `dbutils.notebook` doesn’t load in non-notebook task types
Each hallucination mapped to a gap in the documentation. The agent wasn’t generating random nonsense. It was generating reasonable-sounding answers to questions the docs leave unanswered. Incomplete docs don’t just confuse humans. They give LLMs just enough information to construct confident wrong answers.
The human fix: stop guessing, start testing
After four failed attempts, I did what I should have done first. I wrote a test script:
```python
import os
import sys

print("=== sys.argv ===")
print(sys.argv)

print("\n=== Job-related env vars ===")
for key in sorted(os.environ):
    if "JOB" in key or "RUN" in key or "DATABRICKS" in key:
        print(f"  {key}={os.environ[key]}")

print("\n=== dbutils context ===")
try:
    from dbruntime.databricks_repl_context import get_context
    ctx = get_context()
    print(f"  jobId={ctx.jobId}")
    print(f"  idInJob={ctx.idInJob}")
except Exception as e:
    print(f"  repl_context failed: {e}")
```

Just a few lines. I created a Databricks job, added job parameters with `{{job.id}}` and `{{job.run_id}}`, set the task parameters to pass them as CLI args, and ran it.
The output told me everything in one shot:
- `sys.argv` had the resolved job and run IDs from the task parameters
- Every env var approach was dead
- Spark conf threw hard errors
- `dbruntime.databricks_repl_context` actually worked too (undocumented but functional)
Ten minutes from “let me just test this” to knowing exactly which approaches work and which don’t. Compare that to four rounds of agent suggestions, deployments, and failures.
The working solution
Job-level parameters with dynamic value references, referenced from task `named_parameters`. The values arrive as `sys.argv` and get parsed with argparse:
```yaml
# DAB job definition
parameters:
  - name: job_id
    default: "{{job.id}}"
  - name: run_id
    default: "{{job.run_id}}"

tasks:
  - python_wheel_task:
      entry_point: "my-workflow"
      named_parameters:
        job_id: "{{job.parameters.job_id}}"
        run_id: "{{job.parameters.run_id}}"
```

```python
@staticmethod
def _parse_job_context():
    import argparse
    import sys

    parser = argparse.ArgumentParser()
    parser.add_argument("--job_id", type=int, default=None)
    parser.add_argument("--run_id", type=int, default=None)
    args, _ = parser.parse_known_args(sys.argv[1:])
    return args.job_id, args.run_id
```

We put the parsing in the base `Workflow` class. Any workflow that enables metrics gets job context automatically. The only per-workflow work is adding the `parameters` and `named_parameters` blocks to the DAB YAML.
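The parsing behavior is easy to sanity-check locally. The function below mirrors `_parse_job_context` but takes argv as an explicit parameter so it runs outside Databricks; the argument values are made up for the demo:

```python
import argparse


def parse_job_context(argv):
    # Mirrors _parse_job_context: parse_known_args tolerates unknown
    # arguments, and missing parameters fall back to None.
    parser = argparse.ArgumentParser()
    parser.add_argument("--job_id", type=int, default=None)
    parser.add_argument("--run_id", type=int, default=None)
    args, _ = parser.parse_known_args(argv)
    return args.job_id, args.run_id


# named_parameters arrive as --key=value style CLI arguments:
print(parse_job_context(["--job_id=123", "--run_id=456"]))  # (123, 456)
# Extra task parameters are ignored rather than crashing the entry point:
print(parse_job_context(["--job_id=123", "--verbose"]))     # (123, None)
```

`parse_known_args` is the load-bearing choice here: a wheel entry point often receives parameters it doesn’t own, and `parse_args` would abort on the first unrecognized one.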
Not an anti-AI post
I’m not writing this to dunk on LLMs. I use one every day. It wrote most of the boilerplate in our metrics writer. It’s genuinely good at generating code when the problem is well-understood and the patterns are common.
But there’s a specific failure mode that showed up four times in one afternoon: the agent treats documentation gaps as opportunities to interpolate. When the docs don’t say how to do something, it constructs an answer from adjacent knowledge. `spark.databricks.job.id` exists in older Databricks content, so it suggests that. `DATABRICKS_` is a common prefix for their env vars, so it invents one. The `dbutils.notebook.entry_point` chain works in notebooks, so it assumes it works everywhere.
Each interpolation sounds plausible. Each fails for a reason the agent can’t know without testing.
The fix wasn’t more prompting or a better model. It was stepping back and writing a test script. Isolating the problem. Running it. Reading the output. Deciding based on evidence instead of confidence.
That’s not a prompting skill. That’s an engineering skill. The specific one where you stop asking “what should work?” and start asking “what actually works right now, on this compute, in this runtime?”
LLMs write code. Engineers figure out which code to write. Those are different skills, and this afternoon was a good reminder that the second one isn’t going anywhere.

