Building GitHub Copilot Extensions: A Practical Guide

WHAT COPILOT EXTENSIONS ACTUALLY ARE

A GitHub Copilot Extension is a GitHub App with a special capability: it registers an agent endpoint that Copilot Chat forwards messages to when a user invokes it with @your-extension-name. The extension receives the conversation, does whatever it needs to do (query an API, run code, retrieve context), and streams a response back into Copilot Chat.

From the user's perspective it feels native — the response appears in the same chat window as regular Copilot responses, with the same streaming feel. From the developer's perspective it's a webhook with a Server-Sent Events (SSE) response.

The use case that most interests us: building a Copilot extension that knows about your internal codebase, your team's Salesforce org, or your proprietary documentation — things the base Copilot model has no knowledge of. The extension acts as a bridge between Copilot Chat and your system of record.

THE ARCHITECTURE

A Copilot Extension consists of three parts:

A GitHub App — registered in your GitHub account or organisation. This gives you a client ID, client secret, and webhook secret for verifying requests. The App is what users install and what GitHub uses to authenticate the connection.
An agent endpoint — an HTTPS endpoint you host (or run locally via a tunnel during development). GitHub forwards Copilot Chat messages here as POST requests with the conversation payload. Your endpoint streams the response back as SSE.
The agent logic — whatever your extension actually does. Calls to your APIs, database queries, LLM calls with injected context, tool execution. This is pure application code — GitHub imposes no constraints on it.

THE REQUEST FORMAT

Each incoming request from GitHub includes the full conversation history in an OpenAI-compatible messages format, plus metadata about the GitHub user and the installation:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant...",
      "copilot_references": [...]
    },
    {
      "role": "user",
      "content": "@your-agent what's the status of PR #123?",
      "copilot_references": [
        {
          "type": "github.repository",
          "data": { "owner": "acme", "name": "backend" }
        }
      ]
    }
  ]
}

copilot_references is the most useful part of the payload. Copilot automatically attaches context about the user's current state — the repo they're looking at, the file they have open, a selected code range, a PR they've referenced. Your extension receives this context without having to ask for it.
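
As a rough sketch of how that context might be read, the helpers below collect references of a given type across the whole conversation. The function names findReferences and currentRepository are ours, and only the github.repository type and its data shape are taken from the sample payload above; any other type name is an assumption to check against real payloads.

```javascript
// Collect copilot_references of one type from every message in the payload.
// Messages without a copilot_references array are skipped.
function findReferences(messages, type) {
  return messages
    .flatMap(m => (Array.isArray(m.copilot_references) ? m.copilot_references : []))
    .filter(ref => ref && ref.type === type);
}

// Return the most recently attached repository as "owner/name", or null
// when no repository reference is present.
function currentRepository(messages) {
  const repos = findReferences(messages, 'github.repository');
  const last = repos[repos.length - 1];
  return last?.data?.owner && last?.data?.name
    ? `${last.data.owner}/${last.data.name}`
    : null;
}
```

With the sample payload above, currentRepository(messages) would yield "acme/backend", which is usually enough to scope a lookup in your own systems.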

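Always verify the request signature before processing, and return a 401 immediately for any request that fails verification. The function below checks an HMAC-SHA256 signature computed with your webhook secret, which is the scheme GitHub uses for App webhooks. GitHub's documentation for Copilot agent requests describes a public-key signature carried in request headers instead, so confirm which scheme your endpoint actually receives before relying on this check: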

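import { createHmac, timingSafeEqual } from 'crypto';

// payload must be the raw request body (string or Buffer), not re-serialised JSON.
// signature is the header value, e.g. "sha256=<hex digest>".
function verifySignature(payload, signature, secret) {
  if (typeof signature !== 'string') return false; // header missing
  const expected = createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const sig = Buffer.from(signature.replace(/^sha256=/, ''), 'hex');
  const exp = Buffer.from(expected, 'hex');
  // timingSafeEqual throws on unequal lengths, so compare lengths first
  return sig.length === exp.length && timingSafeEqual(sig, exp);
}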
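
One detail that is easy to get wrong is that the signature covers the exact bytes GitHub sent, so the body has to be read raw and checked before it is parsed. The sketch below shows one way that ordering might look. The names readRawBody and authorize are ours, and verify is a hypothetical callback standing in for whatever signature check you use (for example the verifySignature function above with your secret bound in).

```javascript
// Read the request body as raw bytes. `req` is any async iterable of
// Buffers or strings, which is what Node's http.IncomingMessage provides.
async function readRawBody(req) {
  const chunks = [];
  for await (const chunk of req) chunks.push(Buffer.from(chunk));
  return Buffer.concat(chunks);
}

// Decide how to answer before any agent logic runs.
// verify(rawBody, signature) is a hypothetical callback returning true/false.
function authorize(rawBody, signature, verify) {
  if (!signature || !verify(rawBody, signature)) {
    return { status: 401 }; // unsigned or wrongly signed: reject immediately
  }
  try {
    return { status: 200, body: JSON.parse(rawBody.toString('utf8')) };
  } catch {
    return { status: 400 }; // correctly signed but not valid JSON
  }
}
```

If you use a framework that parses JSON for you, make sure it also exposes the unparsed body, since re-serialising the parsed object will not reproduce the signed bytes reliably.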

STREAMING THE RESPONSE

Responses must be streamed as Server-Sent Events. Copilot Chat expects the OpenAI streaming chunk format: the same data: {...} envelope that the OpenAI API sends over the wire. This means you can forward an OpenAI streaming response almost directly to the SSE output, with some envelope wrapping. A stream from another provider, such as Anthropic, uses different event shapes and has to be mapped into this chunk format first:

res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');

// Forward each chunk from your LLM call
for await (const chunk of llmStream) {
  const delta = chunk.choices?.[0]?.delta?.content ?? '';
  if (delta) {
    res.write(`data: ${JSON.stringify({
      id: chunk.id,
      object: 'chat.completion.chunk',
      choices: [{ delta: { content: delta }, index: 0 }]
    })}\n\n`);
  }
}

// Signal completion
res.write('data: [DONE]\n\n');
res.end();

The [DONE] sentinel is required — Copilot Chat won't close the stream without it.
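
Because a missing sentinel is such an easy mistake, it can help to route every write through a small helper that cannot forget it. The following is a sketch under our own naming (sseChunk, SSE_DONE, and writeStream are not part of any SDK); the chunk fields simply mirror the ones written in the loop above, and res is assumed to expose write() and end() like a Node response.

```javascript
// Format one piece of text as an OpenAI-style streaming chunk in an SSE frame.
// `id` is optional and omitted from the payload when not supplied.
function sseChunk(content, id) {
  const payload = {
    ...(id ? { id } : {}),
    object: 'chat.completion.chunk',
    choices: [{ delta: { content }, index: 0 }]
  };
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// The terminating frame that marks the stream as complete.
const SSE_DONE = 'data: [DONE]\n\n';

// Write every non-empty text piece, then always send [DONE] and end the
// response, even if iterating the source throws partway through.
async function writeStream(res, textPieces) {
  try {
    for await (const text of textPieces) {
      if (text) res.write(sseChunk(text));
    }
  } finally {
    res.write(SSE_DONE);
    res.end();
  }
}
```

textPieces can be a plain array or an async iterable, so the same helper works for canned messages and for text extracted from a live LLM stream.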

INJECTING YOUR OWN CONTEXT

The real value of an extension is adding context the base model doesn't have. The pattern is straightforward: extract the user's query from the messages array, retrieve relevant context from your system, inject it into the system prompt or as a preceding assistant message, then call your LLM with the enriched context.

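// Take the most recent user message; guard against a missing message or content
const userMessage = messages.filter(m => m.role === 'user').pop();
const query = (userMessage?.content ?? '').replace('@your-agent', '').trim();

// Retrieve context from your system.
// githubUser is the requesting user, identified earlier from the request.
const relevantDocs = await searchInternalKnowledgeBase(query);
const sfCaseData   = await fetchRelevantCases(query, githubUser);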

// Build enriched system prompt
const systemPrompt = `
You are a senior engineer assistant with access to internal systems.

RELEVANT DOCUMENTATION:
${relevantDocs.map(d => d.content).join('\n\n')}

RECENT SALESFORCE CASES:
${sfCaseData.map(c => `Case ${c.Id}: ${c.Subject} — ${c.Status}`).join('\n')}

Answer the user's question using this context. Cite specific cases or docs when relevant.
`.trim();

// Call your LLM with enriched context
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  stream: true,
  messages: [
    { role: 'system', content: systemPrompt },
    ...messages.filter(m => m.role !== 'system')
  ]
});

WHAT THE DOCS GLOSS OVER

A few things we learned through trial and error:

Response latency matters more than you'd expect. Copilot Chat shows a loading state while it waits for the first SSE event. If your context retrieval is slow, users see a long pause before streaming starts. Push the first token out as fast as possible — even a "Searching..." indicator chunk improves the perceived experience significantly.
The GitHub App OAuth token is short-lived. When you need to make GitHub API calls on behalf of the user (read their repos, fetch PR data), you get a user token from the request headers (X-GitHub-Token). It expires in 8 hours. Don't assume a cached token is still valid: build token refresh, or a fallback to the token on the next request, into any feature that keeps the token across sessions.
Errors should stream, not 500. If your context retrieval fails, don't return an HTTP error — Copilot Chat handles a streaming error message far better than a broken connection. Return a 200, stream an explanatory message, send [DONE].
Test locally with a tunnel. GitHub needs a public HTTPS endpoint to send requests to. Use ngrok or GitHub's own Codespaces forwarding to expose your local dev server during development — it's much faster than deploying for every change.
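
The first and third lessons combine naturally into one wrapper around the agent logic. What follows is a minimal sketch, not a definitive implementation: respond and frame are our own names, runAgent(send) is a hypothetical callback that performs retrieval and calls send(text) for each piece of the answer, and res is assumed to be a response whose SSE headers and 200 status have already been set.

```javascript
// Wrap one text fragment in the streaming chunk envelope.
function frame(text) {
  return `data: ${JSON.stringify({
    object: 'chat.completion.chunk',
    choices: [{ delta: { content: text }, index: 0 }]
  })}\n\n`;
}

// 1. Send a status chunk before any slow retrieval starts.
// 2. Stream failures as an ordinary message instead of an HTTP error.
// 3. Always finish with [DONE] and end the response.
async function respond(res, runAgent) {
  const send = text => res.write(frame(text));
  send('Searching...\n\n');
  try {
    await runAgent(send);
  } catch (err) {
    // Log err on the server side; the user only sees a readable explanation.
    send('Sorry, I could not reach the internal systems. Please try again.');
  } finally {
    res.write('data: [DONE]\n\n');
    res.end();
  }
}
```

The user-facing error text here is only a placeholder. In practice you would probably vary it by failure type while keeping internal details out of the chat window.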

THE RIGHT USE CASE

Copilot Extensions shine for one thing: giving developers access to your internal systems from inside their editor, without context switching. Internal documentation Q&A, codebase-specific tooling, integration with your project management or CRM — anything where "I wish Copilot knew about our system" is the pain point.

They're not the right fit for customer-facing products (every user needs a GitHub Copilot plan to use extensions) or highly latency-sensitive interactions (each message takes an extra hop through GitHub before it reaches your endpoint, which adds latency compared to direct API calls). But for internal developer tooling, the distribution model is hard to beat: install the GitHub App, mention @your-agent, and it's available wherever Copilot Chat supports extensions.