Quick Facts
- Category: Open Source
- Published: 2026-05-01 15:49:32
Introduction
Documentation is the gateway to your project, especially for open-source tools. When a command fails, an output doesn't match, or a step is unclear, most users won't file a bug report—they'll just leave. This silent drift accumulates as your code evolves, and manual testing can't keep up. The team behind Drasi, a CNCF sandbox project, faced exactly this problem: they shipped code faster than they could manually test their tutorials. After a Docker update broke every tutorial, they realized they needed an automated approach. By treating documentation testing as a monitoring problem, they built an AI agent using GitHub Copilot CLI and Dev Containers to act as a synthetic new user. In this guide, you'll learn how to replicate this process for your own project, turning documentation maintenance into an automated, continuous process.

What You Need
- GitHub Copilot CLI – The command-line interface for Copilot, which can be used to simulate user interactions.
- Dev Containers (Visual Studio Code Dev Containers or GitHub Codespaces) – To create reproducible, isolated environments for testing.
- Your project's documentation – Specifically, step-by-step tutorials or getting-started guides with exact commands and expected outputs.
- Basic scripting knowledge – Familiarity with Bash, Python, or similar to orchestrate the agent.
- A testing framework or output verifier – Something to compare actual outputs against expected ones (e.g., `diff`, `grep`, or a lightweight test runner).
- A containerized environment – Docker installed, plus any dependencies your tutorials require (e.g., k3d, sample databases).
Step-by-Step Instructions
Step 1: Define the Agent's Behavior
The key is to create an agent that mimics a naïve, literal, and unforgiving new user. Write down three principles for your agent:
- Naïveté: It must have no prior knowledge of your project. It only knows what's explicitly written in the documentation.
- Literal execution: Every command must be executed exactly as written. If a step is missing, the agent should fail.
- Unforgiving verification: It checks every expected output. If the doc says "You should see 'Success'" and the CLI returns nothing, the agent flags it as a bug.
Document these rules so your script can enforce them.
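For example, the rules could live in a small file your script loads and feeds to the agent. The file name (agent_rules.json) and fields below are hypothetical conventions for this guide, not anything Copilot CLI requires:

```json
{
  "persona": "first-time user with no prior knowledge of the project",
  "rules": [
    "Execute every command exactly as written; never improvise missing steps.",
    "Treat any mismatch between documented and actual output as a failure.",
    "Stop and report when a step is ambiguous or a prerequisite is missing."
  ]
}
```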
Step 2: Set Up a Dev Container with All Dependencies
Create a .devcontainer/devcontainer.json file for your repository. This ensures the testing environment matches your users' setup exactly. Include:
- The base image (e.g., Ubuntu, Debian).
- All runtime dependencies (Docker, k3d, sample databases, etc.).
- Environment variables and startup commands that replicate the tutorial's prerequisites.
Test the container manually first to confirm it builds and launches correctly.
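As a starting point, here is a minimal devcontainer.json sketch, assuming an Ubuntu base image with the Docker-in-Docker feature and a k3d install step; the TUTORIAL_MODE variable is a placeholder, so swap in your project's real dependencies and settings:

```json
{
  "name": "docs-test-agent",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    // Gives the container its own Docker daemon for tutorial commands
    "ghcr.io/devcontainers/features/docker-in-docker:2": {}
  },
  "containerEnv": {
    // Hypothetical flag your tutorials or scripts might check
    "TUTORIAL_MODE": "ci"
  },
  // Install extra tooling the tutorials assume, e.g. k3d
  "postCreateCommand": "curl -sSL https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash"
}
```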
Step 3: Extract Expected Commands and Outputs from Documentation
Parse your tutorial markdown files to extract each code block or instruction. For each step, note:
- The command (the exact text from the `>` prompt or code fence).
- The expected output (if mentioned).
- Any branching logic (e.g., "if you see this, run that").
You can do this manually for a few tutorials, or write a parser using regex. Save the mapping in a JSON file like tutorial_steps.json.
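The schema only needs the two fields the test script reads later (`command` and `expected_output`); the commands and expected strings below are placeholders, so verify them against your tools' actual output:

```json
[
  {
    "command": "docker ps",
    "expected_output": "CONTAINER ID"
  },
  {
    "command": "k3d cluster create tutorial",
    "expected_output": "created"
  }
]
```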
Step 4: Build the Agent Using GitHub Copilot CLI
GitHub Copilot CLI can be used to generate scripts that simulate user actions. Create a main script (e.g., agent_test.sh) that:
- Loops through each step from your extracted JSON.
- Runs the command using
copilot run <command>or directly in the shell. - Captures the output (stdout and stderr).
- Compares the output against the expected result using an assertion function (e.g.,
assert_output_contains "Success").
Example snippet:
```bash
#!/usr/bin/env bash
# Inside agent_test.sh
set -uo pipefail

while read -r step; do
  command=$(echo "$step" | jq -r '.command')
  expected=$(echo "$step" | jq -r '.expected_output')
  output=$(eval "$command" 2>&1)  # Caution: eval runs arbitrary doc text
  if [ "$expected" = "null" ] || [ -z "$expected" ]; then
    echo "RAN (no expected output): $command"  # step documented no output
    continue
  fi
  if echo "$output" | grep -qF -- "$expected"; then
    echo "PASS: $command"
  else
    echo "FAIL: $command"
    exit 1
  fi
done < <(jq -c '.[]' tutorial_steps.json)
```

Note: For security, avoid `eval` in production; validate or sandbox each command before executing it.
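If you prefer the assertion-function style mentioned above, the inline grep can be factored into a helper along these lines (a sketch, not part of any library):

```bash
# Returns non-zero unless the captured output contains the expected string.
assert_output_contains() {
  local output="$1" expected="$2"
  if ! grep -qF -- "$expected" <<<"$output"; then
    echo "ASSERTION FAILED: output does not contain '$expected'" >&2
    return 1
  fi
}

# Usage inside the loop:
# assert_output_contains "$output" "$expected" || exit 1
```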

Step 5: Integrate with a Continuous Testing Pipeline
Place the agent script inside your Dev Container setup and run it automatically on a schedule or after code changes. Use GitHub Actions or a cron job to:
- Spin up the Dev Container.
- Execute
agent_test.sh. - Collect results and report failures (e.g., via issue creation or email).
Example GitHub Actions workflow:

```yaml
name: Test Documentation
on:
  schedule:
    - cron: '0 6 * * *' # daily
  push:
    paths:
      - 'docs/**'
      - '.devcontainer/**'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and run agent
        uses: devcontainers/ci@v0.3
        with:
          runCmd: bash agent_test.sh
```

Step 6: Handle Silent Failures with Monitoring
Standard CI passes if the script exits with code 0, but silent drift (e.g., a command that succeeds but produces unexpected side effects) requires extra care. Implement these strategies:
- Log everything: Capture full command outputs and store them in a file with timestamps.
- Diff against a golden copy: After an initial successful run, save the output as the baseline. Future runs compare against this baseline; any diff signals a change (see the sketch after this list).
- Monitor for missing commands: If a step in your JSON is skipped because the agent got stuck, alert the team.
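A minimal sketch of the golden-copy approach, assuming the agent's log is deterministic and the baseline lives at golden.log (both are assumptions; timestamps and other noise would need filtering first):

```bash
#!/usr/bin/env bash
set -euo pipefail

LOG="agent-run.log"
bash agent_test.sh | tee "$LOG"

if [ -f golden.log ]; then
  # Any divergence from the recorded baseline counts as drift.
  diff -u golden.log "$LOG" || { echo "Output drifted from golden copy" >&2; exit 1; }
else
  # The first successful run becomes the baseline.
  cp "$LOG" golden.log
fi
```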
Step 7: Iterate and Improve the Agent
Run the agent on your existing tutorials. Fix any failures by updating your documentation or the agent's assumptions. Over time, you'll build a robust test suite. Consider adding:
- Parallel testing for multiple tutorials (see the sketch after this list).
- Versioning – Test against different releases or branches.
- User feedback integration – Map real user issues to the failing steps.
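For parallel testing, one lightweight approach is `xargs` with a bounded worker pool. This assumes `agent_test.sh` has been extended to take the steps file as an argument, and that each tutorial's steps live under a tutorials/ directory (both are assumptions):

```bash
# Run up to four tutorials concurrently, one steps file each.
ls tutorials/*.json | xargs -P 4 -I {} bash agent_test.sh {}
```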
Tips for Success
- Start small: Begin with one simple tutorial to validate your agent before expanding.
- Use a deterministic environment: Dev Containers ensure reproducibility, but also pin dependency versions to avoid unexpected breaks from external changes.
- Pair with human review: Automated agents catch many bugs, but some require human context (e.g., ambiguous wording). Use the agent's reports as a triage tool.
- Share the script: Give your team access to the agent script so others can contribute improvements.
- Celebrate caught bugs: When the agent finds a documentation error before a user does, it's a win. Log these wins to motivate maintenance.
- Consider privacy: If your documentation contains proprietary commands, be cautious about running them in an automated agent without approvals.
By following these steps, you transform documentation testing from a manual, reactive chore into an automated, proactive process. Your synthetic user will tirelessly verify every step, ensuring that your getting-started experience remains smooth even as your code evolves.