Automating Trajectory Analysis with Agent-Driven Development on GitHub Copilot


Overview

As a software engineer or AI researcher, you’ve likely experienced the cycle of building tools to eliminate repetitive tasks—only to end up maintaining those tools. But what if you could automate not just mechanical toil, but intellectual toil? That’s exactly what the eval-agents project does: it leverages GitHub Copilot to create autonomous agents that analyze coding agent trajectories, freeing you to focus on higher-level insights.

Source: github.blog

This guide walks you through the principles and step-by-step process of setting up agent-driven development for trajectory analysis. You’ll learn how to author, share, and iterate on agents that sift through thousands of JSON files—each containing agent thought processes and actions—to surface patterns and anomalies. By the end, you’ll be able to replicate this workflow for your own evaluation benchmarks, saving hours of manual analysis.

Prerequisites

Before diving in, ensure you have the following:

  • GitHub Copilot (installed and configured in your IDE)
  • Basic understanding of agent trajectories (e.g., TerminalBench2, SWE-bench formats)
  • Familiarity with JSON and command-line tools
  • Node.js or Python (depending on your preferred scripting language)
  • Access to a benchmark dataset (local or cloud storage)

No deep AI expertise is required—the agents themselves handle the heavy lifting.

Step-by-Step Guide

1. Understand the Problem

Your goal is to analyze hundreds of agent trajectories from benchmark runs. Each trajectory is a JSON file with hundreds of lines describing agent steps, actions, and outcomes. Manually reviewing them is impractical. Instead, you’ll create agents that automate this analysis.

Start by examining a sample trajectory to identify common patterns: successful completions, failures, partial progress, or repeated loops. This will inform the agents you build.
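A quick structural summary can make that first look less tedious. The sketch below assumes each trajectory has a top-level "steps" list (the key used later in this guide) and that each step is a dict with an "action" field; the "action" field and the function name are illustrative assumptions, so adjust them to your benchmark's actual schema.

```python
import json
from collections import Counter


def summarize_trajectory(traj):
    """Return a small structural summary of one trajectory dict."""
    steps = traj.get("steps", [])
    # "action" is an assumed per-step field; anything without it counts as "unknown".
    actions = Counter(
        step.get("action", "unknown") if isinstance(step, dict) else "unknown"
        for step in steps
    )
    return {
        "top_level_keys": sorted(traj.keys()),
        "num_steps": len(steps),
        "action_counts": dict(actions),
    }


if __name__ == "__main__":
    # Inline stand-in for a real file loaded with json.load().
    sample = {
        "task": "demo",
        "steps": [{"action": "run"}, {"action": "edit"}, {"action": "run"}],
    }
    print(json.dumps(summarize_trajectory(sample), indent=2))
```

Running this over a handful of files shows which keys are always present and which actions dominate, which helps decide what an agent should look for.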

2. Set Up the Project

Create a new directory for your agent-driven analysis project:

mkdir eval-agents-project
cd eval-agents-project

Initialize the project with your language's tooling (e.g., npm init for Node.js, or python -m venv .venv for a Python virtual environment). Add a trajectories/ folder for your data and an agents/ folder for your custom agents.

mkdir trajectories agents

Place a few sample trajectory files in trajectories/ to test your agents.

3. Create Your First Agent

An agent is a script that uses GitHub Copilot’s API (or a local LLM) to process trajectories. Here’s a Python example that reads a JSON trajectory and prompts Copilot to identify issues:

# agents/failure_detector.py
import json
from copilot import Copilot  # hypothetical wrapper

def analyze_trajectory(filepath):
    """Load one trajectory file and ask the model to list likely failures."""
    with open(filepath, 'r', encoding='utf-8') as f:
        traj = json.load(f)
    # Serialize the steps as JSON so the model sees valid JSON, not a Python repr
    steps = json.dumps(traj['steps'], indent=2)
    prompt = f"""
    Analyze this agent trajectory for common failure patterns.
    Trajectory steps: {steps}
    Output a list of issues with step references.
    """
    copilot = Copilot()
    response = copilot.generate(prompt)
    return response

if __name__ == "__main__":
    result = analyze_trajectory("trajectories/sample1.json")
    print(result)

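This is a simplified illustration: the copilot module is a placeholder, not a published package. In practice, you would call whichever Copilot or LLM interface you have access to, and you may want to ask the model to reason step by step before it lists issues.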
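One way to keep the agent independent of any particular model client is to pass the text-generation function in as an argument. The following is a minimal sketch under that assumption: build_prompt, analyze, fake_generate, and the max_chars limit are all illustrative names and values, not part of any Copilot API.

```python
import json


def build_prompt(traj, max_chars=4000):
    """Build the analysis prompt, truncating very long step lists."""
    steps_json = json.dumps(traj.get("steps", []), indent=2)
    if len(steps_json) > max_chars:
        # Crude character-based cutoff; a real agent might summarize or chunk instead.
        steps_json = steps_json[:max_chars] + "\n... [truncated]"
    return (
        "Analyze this agent trajectory for common failure patterns.\n"
        f"Trajectory steps:\n{steps_json}\n"
        "Output a list of issues with step references."
    )


def analyze(traj, generate):
    """Run one analysis; `generate` is any callable mapping a prompt str to a reply str."""
    return generate(build_prompt(traj))


def fake_generate(prompt):
    """Stand-in model used for local testing; replace with a real client call."""
    return f"(stub) received prompt of {len(prompt)} characters"


if __name__ == "__main__":
    print(analyze({"steps": [{"action": "run"}]}, fake_generate))
```

Because the model call is injected, the prompt logic can be tested without network access, and swapping in a different backend only changes the function passed to analyze.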

4. Automate Bulk Analysis

Write a loop to process all trajectories in the folder:

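# agents/batch_analyzer.py
import os
from failure_detector import analyze_trajectory

def run_batch(directory):
    """Analyze every .json file in directory; return {filename: analysis}."""
    results = {}
    # Sort so the processing order is the same on every run
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".json"):
            filepath = os.path.join(directory, filename)
            results[filename] = analyze_trajectory(filepath)
    return results

if __name__ == "__main__":
    # Run from the project root: python agents/batch_analyzer.py
    output = run_batch("trajectories")
    for traj, analysis in output.items():
        # str() guards against a non-string response from the wrapper
        print(f"{traj}: {str(analysis)[:100]}...")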
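A batch run over many files will usually hit at least one malformed trajectory, and the loop above would stop at the first exception. The sketch below is one possible variant that records failures per file and writes a JSON report; run_batch_safe, save_report, and the count_steps stand-in analyzer are hypothetical names introduced here for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_batch_safe(directory, analyze_fn):
    """Apply analyze_fn to every .json file, recording errors instead of stopping."""
    results = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            results[path.name] = {"ok": True, "analysis": analyze_fn(str(path))}
        except Exception as exc:  # keep going; one bad file should not end the run
            results[path.name] = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return results


def save_report(results, out_path):
    """Write the results dict to out_path as indented JSON."""
    Path(out_path).write_text(json.dumps(results, indent=2), encoding="utf-8")


def count_steps(filepath):
    """Stand-in analyzer for the demo; a real agent would call a model here."""
    with open(filepath, "r", encoding="utf-8") as f:
        return f"{len(json.load(f)['steps'])} steps"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "good.json").write_text('{"steps": []}', encoding="utf-8")
        Path(tmp, "bad.json").write_text("not json", encoding="utf-8")
        print(json.dumps(run_batch_safe(tmp, count_steps), indent=2))
```

Write the report outside the trajectories/ folder; otherwise the next run would pick it up as if it were a trajectory.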

5. Share and Reuse Agents

To make agents easy to share, package them with a simple interface and document their purpose. Use GitHub to host your agent repository. Consider adding a package.json or setup.py for dependencies.

Encourage teammates to contribute by keeping the agents/ directory organized with clear naming conventions (e.g., failure_detector, performance_summarizer). Each agent should have a README.md explaining its input and output.
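A small registry is one way to give shared agents a uniform entry point, so teammates can look up an agent by its conventional name. This is only a sketch: the AGENTS dict, the register decorator, run_agent, and the step_counter example agent are all assumptions made for illustration.

```python
AGENTS = {}  # maps agent name -> callable taking a trajectory dict


def register(name):
    """Decorator that adds an agent function to the registry under `name`."""
    def decorator(fn):
        if name in AGENTS:
            raise ValueError(f"agent {name!r} is already registered")
        AGENTS[name] = fn
        return fn
    return decorator


@register("step_counter")
def step_counter(trajectory):
    """Trivial example agent: report how many steps the trajectory has."""
    return {"num_steps": len(trajectory.get("steps", []))}


def run_agent(name, trajectory):
    """Look up an agent by name and run it on one trajectory."""
    try:
        agent = AGENTS[name]
    except KeyError:
        raise KeyError(f"unknown agent {name!r}; available: {sorted(AGENTS)}") from None
    return agent(trajectory)


if __name__ == "__main__":
    print(run_agent("step_counter", {"steps": [{"action": "run"}]}))
```

Rejecting duplicate names at registration time surfaces naming collisions early, when two contributors pick the same agent name.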

6. Author New Agents

To make authoring easy, follow this template:

# agents/template.py
"""
Agent: [Name]
Purpose: [Describe what it analyzes]
Input: Trajectory file path or JSON object
Output: [Structured analysis]
"""
import json

def run(trajectory):
    # Your Copilot prompt logic here
    pass
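As an illustration of a filled-in template, here is a sketch of an agent that needs no model call at all: it flags runs of identical consecutive steps, one of the patterns (repeated loops) mentioned in step 1. The agent name, the min_repeats parameter, and the output shape are assumptions chosen for this example.

```python
"""
Agent: loop_detector (illustrative)
Purpose: Flag runs of identical consecutive steps that suggest the agent is stuck
Input: Trajectory as a dict with a "steps" list
Output: List of {"start", "length", "step"} dicts, one per repeated run
"""


def run(trajectory, min_repeats=3):
    steps = trajectory.get("steps", [])
    findings = []
    run_start = 0  # index where the current run of identical steps began
    for i in range(1, len(steps) + 1):
        # Close the current run at the end of the list or when the step changes.
        if i == len(steps) or steps[i] != steps[run_start]:
            length = i - run_start
            if length >= min_repeats:
                findings.append(
                    {"start": run_start, "length": length, "step": steps[run_start]}
                )
            run_start = i
    return findings


if __name__ == "__main__":
    ls = {"action": "ls"}
    print(run({"steps": [ls, ls, ls, {"action": "cat"}]}))
```

Cheap rule-based agents like this can pre-filter trajectories so that model-backed agents only examine the ones that look suspicious.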

Use Copilot itself to help write agents. For instance, start with a comment describing the agent’s behavior, and Copilot will suggest code implementations.

Common Mistakes

  • Overcomplicating agents: Start with a simple prompt and expand only when needed. Avoid trying to handle every edge case upfront.
  • Ignoring existing patterns: Reuse agent templates from your team or community—don’t reinvent the wheel.
  • Forgetting to test: Always test agents with a small sample of trajectories before running on full datasets.
  • Not iterating: Agent performance improves with feedback. Run your analysis, verify results, and adjust prompts accordingly.
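For the testing advice above, a reproducible random sample is often more representative than the first few files in a folder. A minimal sketch, assuming trajectories are identified by file name; pick_sample and its parameters are hypothetical.

```python
import random


def pick_sample(filenames, k=5, seed=0):
    """Return up to k file names chosen at random, reproducibly for a given seed."""
    ordered = sorted(filenames)  # sort first so input order does not affect the result
    rng = random.Random(seed)    # private generator; leaves the global random state alone
    return rng.sample(ordered, min(k, len(ordered)))


if __name__ == "__main__":
    names = [f"traj_{i:03d}.json" for i in range(100)]
    print(pick_sample(names, k=3, seed=42))
```

Fixing the seed means a prompt change can be compared against the same trajectories on every iteration.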

Summary

Agent-driven development with GitHub Copilot turns tedious trajectory analysis into an automated, repeatable process. By building and sharing agents, you save time and let your team collaborate on deeper insights. The key is to start small, iterate, and use Copilot's generative capabilities to compose analysis logic. With this guide, you can create your own eval-agents workflow and cut down the time spent on manual review.
