Automating Large-Scale Dataset Migrations with Background Coding Agents


Overview

Migrating thousands of datasets across downstream consumer services can be a daunting task—prone to human error, time-consuming, and often blocking development velocity. At Spotify, we tackled this challenge by combining three powerful tools: Honk (our background coding agent), Backstage (the developer portal), and Fleet Management (our service orchestration layer). This tutorial walks through how you can build a similar system to automate dataset migrations with confidence and minimal manual overhead.

Source: engineering.atspotify.com

By the end of this guide, you’ll understand how to design background agents that interpret migration rules, apply schema transformations, and coordinate updates across a fleet of services, all while maintaining observability and rollback capabilities.

Prerequisites

Before diving in, ensure you have the following:

  • Working knowledge of Backstage – You should be familiar with creating entities, plugins, and software templates.
  • Basic understanding of Honk – Honk is a code generation and execution agent that runs as a background process. Understanding its lifecycle (generation, validation, execution) is helpful.
  • Access to Fleet Management APIs – Ability to trigger deployments, rolling updates, and health checks across your service fleet.
  • A target dataset system – For example, a data warehouse, streaming platform, or key-value store that stores consumer datasets.
  • Python/Go familiarity – Examples use Python-style pseudocode, but the principles apply to any language.

Step-by-Step Instructions

1. Model Your Migration as a Backstage Entity

Backstage serves as the central catalog for all your services, datasets, and migrations. Model each migration as a Component entity with a dedicated dataset-migration type, using the following YAML template:

# catalog/migration-template.yaml
apiVersion: backstage.io/v1beta1
kind: Component
metadata:
  name: ${{ name }}
  annotations:
    honk/agent: honk-migration-agent
    fleet-manager/target-group: ${{ fleet_group }}
spec:
  type: dataset-migration
  lifecycle: experimental
  owner: ${{ owner }}
  dependsOn:
    - component:default/${{ source_dataset }}
    - component:default/${{ target_dataset }}

This entity links the migration to a Honk agent (via annotation) and defines its dependencies. Register this template in Backstage so developers can create migration requests with standard fields like source_dataset, target_dataset, and migration_strategy.
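For instance, an entity rendered from this template might look like the following (the dataset, group, and owner names here are hypothetical):

```yaml
# catalog/migrations/migrate-orders-to-orders-v2.yaml (hypothetical example)
apiVersion: backstage.io/v1beta1
kind: Component
metadata:
  name: migrate-orders-to-orders-v2
  annotations:
    honk/agent: honk-migration-agent
    fleet-manager/target-group: orders-consumers
spec:
  type: dataset-migration
  lifecycle: experimental
  owner: team-data-platform
  dependsOn:
    - component:default/orders
    - component:default/orders-v2
```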

2. Define Migration Rules in Honk

Honk agents are triggered by Backstage entity lifecycle events. Create a migration agent script that reads the entity spec and generates a plan:

# honk/agents/migration_agent.py
from honk import Agent, Plan, Step  # internal Honk SDK

class DatasetMigrationAgent(Agent):
    def generate_plan(self, entity):
        """Read a Backstage Migration entity and emit an executable plan."""
        spec = entity['spec']
        source = spec['source_dataset']
        target = spec['target_dataset']
        strategy = spec['migration_strategy']

        plan = Plan(name=f"migrate-{source}-to-{target}")

        if strategy == 'schema-transform':
            # Validate schema compatibility before touching any data
            plan.add_step(Step(
                name='validate-schema',
                action='validate',
                params={'source': source, 'target': target}
            ))
            plan.add_step(Step(
                name='transform-data',
                action='transform',
                params={'mapping_file': spec['mapping_file']}
            ))
        elif strategy == 'bulk-copy':
            plan.add_step(Step(
                name='copy-dataset',
                action='copy',
                params={'source': source, 'target': target}
            ))
        # ... additional strategies
        return plan

Honk will store this plan and execute it in a background thread, reporting status back to Backstage.

3. Orchestrate with Fleet Management

Once Honk generates the plan, it needs to coordinate with Fleet Management to safely roll out changes across services. Use Fleet Management’s rollout API:

# fleet_manager/rollout.py
import requests

def trigger_rollout(service_name, migration_id):
    """Ask Fleet Management to roll a migration out to one service."""
    payload = {
        'service': service_name,
        'migration_id': migration_id,
        'rollout_strategy': 'canary',  # or 'blue-green'
        'max_instances': 5,  # cap the number of instances updated per wave
    }
    response = requests.post(
        'https://fleet-manager.internal/rollouts',
        json=payload,
        timeout=30,  # fail fast rather than hang the agent
    )
    response.raise_for_status()
    return response.json()['rollout_id']

Integrate this call into the Honk agent after each migration step that modifies consumer code or data schemas. For example, after a schema transform, the agent can instruct Fleet Management to gradually update services that read the old schema.
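Rollouts are rarely instantaneous, so the agent should poll Fleet Management for completion instead of assuming success. Here is a minimal polling sketch; the status-lookup call is injected as a callable, and the status names ('pending', 'in_progress', 'succeeded', 'failed') are assumptions rather than Fleet Management's actual API:

```python
import time

def wait_for_rollout(get_status, rollout_id, poll_interval=10.0, max_polls=60):
    """Poll until the rollout reaches a terminal state.

    get_status: callable taking a rollout_id and returning one of
    'pending', 'in_progress', 'succeeded', or 'failed' (assumed states).
    """
    for _ in range(max_polls):
        status = get_status(rollout_id)
        if status in ('succeeded', 'failed'):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"rollout {rollout_id} did not finish in time")
```

Injecting `get_status` keeps the retry logic testable without a live Fleet Management endpoint; in production it would wrap an HTTP call to the rollout status API.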


4. Hook Everything Together via Backstage Actions

Backstage provides a scaffolder plugin that can trigger Honk plans directly. Create a custom action:

// backstage-plugin/scaffolder/actions/migrateDataset.ts
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';

export const migrateDatasetAction = createTemplateAction({
  id: 'honk:migrate-dataset',
  async handler(ctx) {
    const { entityRef, source, target } = ctx.input;
    // Resolve the Honk service's base URL via Backstage's discovery API
    const honkUrl = await ctx.discovery.getUrl('honk');
    const response = await fetch(`${honkUrl}/agents/migrate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ entityRef, source, target }),
    });
    const { id } = await response.json();
    ctx.output('migrationId', id);
  },
});

Now, when a developer creates a new migration entity in Backstage, the scaffolder can automatically call this action—starting Honk and notifying Fleet Management—all from a simple form.
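In the software template itself, the action appears as a scaffolder step. A hypothetical excerpt (entity and dataset names are placeholders) might look like:

```yaml
# template.yaml (excerpt, hypothetical values)
spec:
  steps:
    - id: start-migration
      name: Start dataset migration
      action: honk:migrate-dataset
      input:
        entityRef: component:default/migrate-orders-to-orders-v2
        source: orders
        target: orders-v2
  output:
    text:
      - title: Migration started
        content: 'Migration ID: ${{ steps["start-migration"].output.migrationId }}'
```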

5. Monitor and Rollback

Add observability via Backstage’s TechDocs and custom plugins. Honk can log every step to a structured log stream, which feeds into a monitoring dashboard. Create a rollback step in Honk that reverses the last plan action:

def rollback(plan: Plan):
    """Undo completed steps in reverse order."""
    for step in reversed(plan.steps):
        if step.status == 'completed':
            # reverse_mapping pairs each forward action with its inverse
            reverse_action = reverse_mapping[step.action]
            reverse_action(step.params)
            # Notify Fleet Management to roll back the affected services
            fleet_manager.rollback(step.params['rollout_id'])

Include this rollback as a manual trigger in Backstage, so operators can click a “Rollback” button that restores the previous state.
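The reverse_mapping above pairs each forward action with an inverse. A minimal sketch, with hypothetical undo helpers standing in for the real cleanup calls; note that some actions (such as deletions) have no safe inverse and should map to an explicit failure:

```python
def undo_copy(params):
    # Undoing a bulk copy means removing the copied data from the target.
    return f"dropped {params['target']}"  # placeholder for the real cleanup call

def undo_transform(params):
    # Undoing a transform means restoring the pre-transform snapshot.
    return f"restored snapshot before {params['mapping_file']}"  # placeholder

def irreversible(params):
    # Deletions cannot be reversed automatically; stop and escalate.
    raise RuntimeError("no safe inverse; restore from backup manually")

reverse_mapping = {
    'copy': undo_copy,
    'transform': undo_transform,
    'delete': irreversible,
}
```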

Common Mistakes

  • Not separating plan generation from execution – If Honk both plans and executes without human review, migrations may proceed with incorrect assumptions. Always include a validation step or manual approval.
  • Overlooking backpressure – Fleet Management may throttle rollouts. Ensure Honk agents poll rollout status rather than assuming instant completion.
  • Hardcoding environment-specific values – Migration rules, endpoints, and dataset names should be parameters. Use Backstage’s entity metadata to inject them dynamically.
  • Ignoring schema drift – If source datasets change mid-migration, agents need to re-validate. Implement a pre-flight check that compares current state with the plan’s original snapshot.
  • Missing rollback capability for partial migrations – Not every step is reversible (e.g., deletions). Define rollback strategies explicitly for each migration type in your Honk plan.
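The schema-drift check mentioned above can be implemented as a fingerprint comparison: hash the source schema when the plan is generated, and refuse to execute if the hash has changed. A minimal sketch (the snapshot format is an assumption):

```python
import hashlib
import json

def schema_fingerprint(schema):
    # Canonical JSON (sorted keys) so field order does not change the hash
    canonical = json.dumps(schema, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def preflight_check(plan_snapshot, current_schema):
    """Raise if the source schema drifted since the plan was generated."""
    if schema_fingerprint(current_schema) != plan_snapshot['schema_hash']:
        raise RuntimeError('source schema drifted; regenerate the migration plan')
```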

Summary

By combining Honk’s background coding agents with Backstage’s catalog and Fleet Management’s orchestration, you can automate even the most complex dataset migrations. The key takeaways are: model migrations as entities, let agents generate and execute plans, coordinate with fleet operations, and always build in observability and rollback. This approach reduces manual toil, increases safety, and scales to thousands of datasets.
