GitHub Service Incident Recap: April 2026

In April 2026, GitHub recorded 10 incidents that degraded service performance. Two events from April 1 are covered in detail below: a prolonged code search outage and a brief audit log disruption. This Q&A walks through what happened, the causes, the fixes, and the company’s ongoing transparency efforts. No repository data was lost in any incident.

How did GitHub’s services perform overall in April 2026?

During April 2026, GitHub experienced 10 incidents that degraded performance across various services. The most notable was a code search outage on April 1 lasting 8 hours and 43 minutes: every search query failed for 2 hours and 20 minutes, and results were stale for the rest of the incident. A separate audit log service issue occurred the same day, lasting only 4 minutes but affecting API and web UI availability for a 28-minute window. At the end of the month, GitHub published a blog post detailing the major incidents and emphasized ongoing investment in near-term and long-term reliability improvements.

Source: github.blog

What was the main code search outage and how long did it last?

On April 1, 2026, between 14:40 and 17:00 UTC, GitHub’s code search service was completely unavailable: every search query failed. At 17:00 UTC the service was partially restored in a degraded state, returning stale results that did not reflect changes made after about 07:00 UTC. Full recovery with current data was achieved by 23:45 UTC. The total duration of degraded or unavailable service was 8 hours and 43 minutes, including 2 hours and 20 minutes of complete outage. No repository data was impacted, because the search index is a secondary store derived from the repositories.
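The phases in this timeline can be checked with a few lines of date arithmetic. The sketch below uses only the timestamps quoted above; the helper names (`utc`, `fmt`) are illustrative, and the 07:00 UTC staleness point is approximate in the source. The 8 hours 43 minutes total is quoted as reported and is not recomputed here.

```python
from datetime import datetime, timedelta

DAY = "2026-04-01"  # all times below are UTC on this date


def utc(hhmm: str) -> datetime:
    """Parse an HH:MM time on the incident day."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


def fmt(delta: timedelta) -> str:
    """Format a duration as 'Hh MMm'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60:02d}m"


last_indexed = utc("07:00")     # approximate: results were stale after this point
outage_start = utc("14:40")     # every query starts failing
degraded_start = utc("17:00")   # search returns, but with stale results
full_recovery = utc("23:45")    # current data restored

full_outage = degraded_start - outage_start          # complete outage phase
stale_phase = full_recovery - degraded_start         # degraded (stale results) phase
staleness_at_restore = degraded_start - last_indexed # age of results at 17:00

print("complete outage:     ", fmt(full_outage))           # 2h 20m
print("stale-results phase: ", fmt(stale_phase))           # 6h 45m
print("index age at 17:00:  ", fmt(staleness_at_restore))  # 10h 00m
```

When search came back at 17:00 UTC, results were therefore roughly ten hours behind the repositories.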

What caused the code search outage and how was it resolved?

The root cause was an automated change during a routine infrastructure upgrade to the messaging system supporting code search. The change was applied too aggressively, causing a coordination failure between internal services. This halted search indexing, making results stale. While engineers worked to recover the messaging infrastructure, an unintended service deployment cleared internal routing state, escalating the staleness into a complete outage. The fix involved a controlled restart of the messaging system and resetting the search index to a pre-disruption point. Re-indexing completed by end of day, returning full functionality. No Git repositories were affected.

What was the audit log service incident on April 1?

On the same day, between 15:34 and 16:02 UTC, the audit log service lost connectivity to its backing data store due to a failed credential rotation. This caused a 28-minute window where audit log history was unavailable via the API and web UI, resulting in 5xx errors for 4,297 API actors and 127 users. Additionally, events created during that window were delayed by up to 29 minutes. However, no audit log events were lost; all were eventually written and streamed successfully. Customers using GitHub Enterprise Cloud with data residency were not impacted. The team was alerted 6 minutes after the failure began.
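A failed rotation can cut a service off from its data store if the old credential is retired before the new one is known to work. The sketch below illustrates one common guard: test the new credential first and keep the old one active if the test fails. This is a generic pattern, not GitHub's actual process; `CredentialStore`, `rotate`, and `can_connect` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CredentialStore:
    """Hypothetical holder for the credential a service uses to reach its data store."""
    active: str
    previous: List[str] = field(default_factory=list)


def rotate(
    store: CredentialStore,
    new_credential: str,
    can_connect: Callable[[str], bool],
) -> bool:
    """Switch to new_credential only if a test connection with it succeeds.

    Returns True if the rotation was applied, False if it was rejected.
    """
    if not can_connect(new_credential):
        # Keep serving with the current credential instead of losing connectivity.
        return False
    store.previous.append(store.active)
    store.active = new_credential
    return True


# Simulated data store that accepts only these credentials (assumption for the demo).
accepted = {"cred-v1", "cred-v2"}
store = CredentialStore(active="cred-v1")

good = rotate(store, "cred-v2", lambda cred: cred in accepted)
bad = rotate(store, "cred-broken", lambda cred: cred in accepted)

print(good, bad, store.active)  # True False cred-v2
```

The rejected rotation leaves `cred-v2` in place, so the service keeps its connection while the bad credential is investigated.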

What steps has GitHub taken to improve transparency about incidents?

At the end of April, GitHub released a detailed blog post covering the major incidents on April 23 and April 27, in addition to the April 1 events. GitHub has also expanded the information on its status page to give more granular detail during ongoing incidents. These actions are part of a broader effort to increase transparency and keep users informed about service health and incident response.

What improvements are being implemented to prevent similar outages?

GitHub is implementing several measures based on lessons from the April incidents: more gradual upgrades with better health checks to catch problems before they cascade; deployment safeguards to prevent unintended changes during active incidents; faster recovery tooling to reduce time to restore service; and better traffic isolation to prevent cascading impact from unexpected traffic spikes during outages. For the audit log issue, credential rotation processes are being hardened to reduce the risk of failures.
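Two of these measures, gradual upgrades gated by health checks and a guard against changes during an active incident, can be sketched together. The code below is a simplified illustration under stated assumptions, not GitHub's tooling: `staged_rollout`, `RolloutResult`, the batch sizes, and the simulated node names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    updated: List[str] = field(default_factory=list)
    halted: bool = False   # stopped because a health check failed
    blocked: bool = False  # stopped because an incident was active


def staged_rollout(
    nodes: Sequence[str],
    batch_sizes: Sequence[int],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    incident_active: Callable[[], bool] = lambda: False,
) -> RolloutResult:
    """Apply a change in growing batches, checking health after each batch."""
    result = RolloutResult()
    position = 0
    for size in batch_sizes:
        if incident_active():
            # Deployment safeguard: make no changes while an incident is open.
            result.blocked = True
            return result
        batch = nodes[position:position + size]
        if not batch:
            break
        for node in batch:
            apply_change(node)
            result.updated.append(node)
        position += len(batch)
        if not all(is_healthy(node) for node in result.updated):
            # Stop before the problem reaches the remaining nodes.
            result.halted = True
            return result
    return result


# Simulated fleet: node "n3" becomes unhealthy once the change is applied.
changed = set()
nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]
result = staged_rollout(
    nodes,
    batch_sizes=[1, 2, 3],
    apply_change=changed.add,
    is_healthy=lambda node: not (node == "n3" and node in changed),
)
print(result.updated, result.halted)  # ['n1', 'n2', 'n3'] True
```

In this run the rollout stops after the second batch, so half the simulated fleet never receives the faulty change. Passing `incident_active=lambda: True` makes the function return immediately without touching any node.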
