
bug: repair_run triggers full job run instead of repairing failed tasks #392

@sgarla

Description

Summary

When a user asks Claude to perform a task repair (re-run only the failed tasks of a previous job run), the skill triggers a full run_now instead of a repair_run. This is because repair_run is not implemented in the MCP tools.

Steps to Reproduce

  1. Run a Databricks job where one or more tasks fail
  2. Ask Claude Code (with ai-dev-kit) to repair the failed run
  3. Observe that a new full job run is triggered instead of repairing the failed tasks

Expected Behavior

Claude should call jobs.repair_run(run_id=<failed_run_id>, ...) to re-run only the failed/skipped tasks from the original run, preserving the successful task outputs.
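The expected call shape can be illustrated with a stub standing in for the real `databricks.sdk.WorkspaceClient`, so the snippet is self-contained (the stub, the `run_id` value, and the returned dict are illustrative, not the SDK's actual behavior):

```python
class _StubJobsApi:
    """Minimal stand-in that records repair_run calls (not the real SDK)."""
    def __init__(self):
        self.calls = []

    def repair_run(self, run_id, rerun_all_failed_tasks=False, **kwargs):
        # The real SDK method also accepts rerun_tasks, job parameters, etc.
        self.calls.append({"run_id": run_id,
                           "rerun_all_failed_tasks": rerun_all_failed_tasks,
                           **kwargs})
        return {"repair_id": 1}

jobs = _StubJobsApi()
# Re-run only the failed/skipped tasks of run 12345, keeping successful outputs:
result = jobs.repair_run(run_id=12345, rerun_all_failed_tasks=True)
```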

Actual Behavior

Claude falls back to manage_job_runs(action='run_now', job_id=...), which starts a brand new full run from scratch.

Root Cause

The manage_job_runs MCP tool only exposes these actions: run_now, get, get_output, cancel, list, wait.

A repair action is missing. The Databricks SDK supports w.jobs.repair_run(), but no wrapper for it has been implemented in:

  • databricks-mcp-server/databricks_mcp_server/tools/jobs.py
  • databricks-tools-core/databricks_tools_core/jobs/runs.py

Proposed Fix

  1. Add a repair_run() function in databricks_tools_core/jobs/runs.py using w.jobs.repair_run()
  2. Add a repair action to manage_job_runs in databricks_mcp_server/tools/jobs.py
  3. Update SKILL.md to document the repair workflow

Impact

Customers using ai-dev-kit for job orchestration and failure recovery are inadvertently re-running entire jobs, wasting compute and time.
