WHAT COPILOT EXTENSIONS ACTUALLY ARE
A GitHub Copilot Extension is a GitHub App with a special capability: it registers an agent endpoint that Copilot Chat forwards messages to when a user invokes it with @your-extension-name. The extension receives the conversation, does whatever it needs to do (query an API, run code, retrieve context), and streams a response back into Copilot Chat.
From the user's perspective it feels native — the response appears in the same chat window as regular Copilot responses, with the same streaming feel. From the developer's perspective it's a webhook with a Server-Sent Events (SSE) response.
The use case that most interests us: building a Copilot extension that knows about your internal codebase, your team's Salesforce org, or your proprietary documentation — things the base Copilot model has no knowledge of. The extension acts as a bridge between Copilot Chat and your system of record.
THE ARCHITECTURE
A Copilot Extension consists of three parts:
- A GitHub App — registered in your GitHub account or organisation. This gives you a client ID, client secret, and webhook secret for verifying requests. The App is what users install and what GitHub uses to authenticate the connection.
- An agent endpoint — an HTTPS endpoint you host (or run locally via a tunnel during development). GitHub forwards Copilot Chat messages here as POST requests with the conversation payload. Your endpoint streams the response back as SSE.
- The agent logic — whatever your extension actually does. Calls to your APIs, database queries, LLM calls with injected context, tool execution. This is pure application code — GitHub imposes no constraints on it.
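To make the three parts concrete, here is a minimal sketch of the agent endpoint's request handler. The name handleAgentRequest and the echo behaviour are placeholders for illustration, not part of any GitHub API; a real handler would verify the request and call the agent logic where the echo is.

```javascript
// Hypothetical handler. In a real app it would be passed to http.createServer()
// or mounted as a route on the URL configured in the GitHub App.
async function handleAgentRequest(req, res) {
  // 1. Collect the raw POST body sent by GitHub.
  const parts = [];
  for await (const part of req) parts.push(Buffer.from(part));
  const rawBody = Buffer.concat(parts).toString('utf8');

  // 2. Agent logic: this placeholder just echoes the last user message.
  const { messages = [] } = JSON.parse(rawBody);
  const lastUser = messages.filter(m => m.role === 'user').pop();
  const reply = `You said: ${lastUser ? lastUser.content : '(nothing)'}`;

  // 3. Stream the reply back as Server-Sent Events.
  res.setHeader('Content-Type', 'text/event-stream');
  res.write(`data: ${JSON.stringify({
    object: 'chat.completion.chunk',
    choices: [{ index: 0, delta: { content: reply } }]
  })}\n\n`);
  res.write('data: [DONE]\n\n');
  res.end();
}
```

Because the handler only relies on req being async-iterable and res exposing setHeader, write and end, it can be exercised with simple stand-in objects before any GitHub App exists.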
THE REQUEST FORMAT
Each incoming request from GitHub includes the full conversation history in an OpenAI-compatible messages format, plus metadata about the GitHub user and the installation:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant...",
      "copilot_references": [...]
    },
    {
      "role": "user",
      "content": "@your-agent what's the status of PR #123?",
      "copilot_references": [
        {
          "type": "github.repository",
          "data": { "owner": "acme", "name": "backend" }
        }
      ]
    }
  ]
}
copilot_references is the most useful part of the payload. Copilot automatically attaches context about the user's current state — the repo they're looking at, the file they have open, a selected code range, a PR they've referenced. Your extension receives this context without having to ask for it.
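As a rough sketch of how that context might be pulled out of the payload, the helper below gathers every reference of a given type across the conversation. The name collectReferences is hypothetical, and the sample data simply mirrors the payload shown above.

```javascript
// Hypothetical helper: gather all copilot_references of one type from the messages.
function collectReferences(messages, type) {
  return messages
    // Some messages may carry no references at all, so default to an empty list.
    .flatMap(m => (Array.isArray(m.copilot_references) ? m.copilot_references : []))
    .filter(ref => ref.type === type);
}

// Sample data modelled on the request format above.
const messages = [
  { role: 'system', content: 'You are a helpful assistant...', copilot_references: [] },
  {
    role: 'user',
    content: "@your-agent what's the status of PR #123?",
    copilot_references: [
      { type: 'github.repository', data: { owner: 'acme', name: 'backend' } }
    ]
  }
];

const repos = collectReferences(messages, 'github.repository');
console.log(repos.map(r => `${r.data.owner}/${r.data.name}`)); // [ 'acme/backend' ]
```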
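Always verify the request signature before processing. GitHub signs each request with your webhook secret using HMAC-SHA256. Compute the digest over the raw request body exactly as received, not over a re-serialised copy of the parsed JSON, and return a 401 immediately for any request that fails verification:
import { createHmac, timingSafeEqual } from 'crypto';
// payload: raw request body (string or Buffer)
// signature: header value in the form "sha256=<hex digest>"
function verifySignature(payload, signature, secret) {
  // A missing header can never be valid
  if (typeof signature !== 'string') return false;
  const expected = createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const sig = Buffer.from(signature.replace('sha256=', ''), 'hex');
  const exp = Buffer.from(expected, 'hex');
  // timingSafeEqual throws on unequal lengths, so compare lengths first
  return sig.length === exp.length && timingSafeEqual(sig, exp);
}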
STREAMING THE RESPONSE
Responses must be streamed as Server-Sent Events. Copilot Chat expects the OpenAI streaming chunk format: the same data: {...} envelope that the OpenAI SDK emits. An OpenAI streaming response can therefore be piped to the SSE output almost directly, while a stream from another provider such as Anthropic first needs its events mapped into this chunk shape:
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Forward each chunk from your LLM call
for await (const chunk of llmStream) {
  const delta = chunk.choices?.[0]?.delta?.content ?? '';
  if (delta) {
    res.write(`data: ${JSON.stringify({
      id: chunk.id,
      object: 'chat.completion.chunk',
      choices: [{ delta: { content: delta }, index: 0 }]
    })}\n\n`);
  }
}
// Signal completion
res.write('data: [DONE]\n\n');
res.end();
The [DONE] sentinel is required — Copilot Chat won't close the stream without it.
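To see what actually goes over the wire, the envelope can be factored into a small helper. This is a sketch only: textChunk and DONE are hypothetical names, and the chunk carries just the fields used in the snippet above.

```javascript
// Hypothetical helper: wrap a text fragment in the chunk envelope used above.
function textChunk(text) {
  const body = {
    object: 'chat.completion.chunk',
    choices: [{ index: 0, delta: { content: text } }]
  };
  // One SSE event: a single "data:" line followed by a blank line.
  return `data: ${JSON.stringify(body)}\n\n`;
}

const DONE = 'data: [DONE]\n\n';

// JSON.stringify escapes the newline inside the text, so multi-line content
// still fits on one "data:" line and cannot break the event framing.
const wire = textChunk('Line one\nLine two') + DONE;
console.log(wire);
```

A helper like this is also handy for content that does not come from an LLM, such as a status line or an error message.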
INJECTING YOUR OWN CONTEXT
The real value of an extension is adding context the base model doesn't have. The pattern is straightforward: extract the user's query from the messages array, retrieve relevant context from your system, inject it into the system prompt or as a preceding assistant message, then call your LLM with the enriched context.
const userMessage = messages.filter(m => m.role === 'user').pop();
// Fall back to an empty query if the payload has no user turn
const query = (userMessage?.content ?? '').replace('@your-agent', '').trim();
// Retrieve context from your system
const relevantDocs = await searchInternalKnowledgeBase(query);
const sfCaseData = await fetchRelevantCases(query, githubUser);
// Build enriched system prompt
const systemPrompt = `
You are a senior engineer assistant with access to internal systems.
RELEVANT DOCUMENTATION:
${relevantDocs.map(d => d.content).join('\n\n')}
RECENT SALESFORCE CASES:
${sfCaseData.map(c => `Case ${c.Id}: ${c.Subject} - ${c.Status}`).join('\n')}
Answer the user's question using this context. Cite specific cases or docs when relevant.
`.trim();
// Call your LLM with enriched context
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  stream: true,
  messages: [
    { role: 'system', content: systemPrompt },
    // Replace the incoming system message with ours, and forward only role and
    // content: copilot_references is not part of the OpenAI message schema
    ...messages
      .filter(m => m.role !== 'system')
      .map(({ role, content }) => ({ role, content }))
  ]
});
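One detail the snippet above leaves open is what happens when retrieval comes back empty. The following sketch, with the hypothetical name buildSystemPrompt, assembles the same prompt but omits empty sections and tells the model when no context was found. The fallback wording is an assumption, not something Copilot requires.

```javascript
// Hypothetical helper: build the enriched system prompt, skipping empty sections.
// docs: [{ content }], cases: [{ Id, Subject, Status }] as in the snippet above.
function buildSystemPrompt(docs, cases) {
  const sections = ['You are a senior engineer assistant with access to internal systems.'];
  if (docs.length > 0) {
    sections.push('RELEVANT DOCUMENTATION:\n' + docs.map(d => d.content).join('\n\n'));
  }
  if (cases.length > 0) {
    sections.push('RECENT SALESFORCE CASES:\n' +
      cases.map(c => `Case ${c.Id}: ${c.Subject} - ${c.Status}`).join('\n'));
  }
  sections.push(docs.length > 0 || cases.length > 0
    ? "Answer the user's question using this context. Cite specific cases or docs when relevant."
    : 'No internal context was found for this question. Say so rather than guessing.');
  return sections.join('\n\n');
}

console.log(buildSystemPrompt(
  [{ content: 'Deploys run from the main branch.' }],
  [{ Id: '500xx', Subject: 'Login failure', Status: 'Open' }]
));
```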
WHAT THE DOCS GLOSS OVER
A few things we learned through trial and error:
- Response latency matters more than you'd expect. Copilot Chat shows a loading state while it waits for the first SSE event. If your context retrieval is slow, users see a long pause before streaming starts. Push the first token out as fast as possible — even a "Searching..." indicator chunk improves the perceived experience significantly.
- The GitHub App OAuth token is short-lived. When you need to make GitHub API calls on behalf of the user (read their repos, fetch PR data), you get a user token from the request headers. It expires in 8 hours. Build token refresh into any feature that caches the token across sessions.
- Errors should stream, not 500. If your context retrieval fails, don't return an HTTP error: Copilot Chat handles a streaming error message far better than a broken connection. Return a 200, stream an explanatory message, send [DONE].
- Test locally with a tunnel. GitHub needs a public HTTPS endpoint to send requests to. Use ngrok or GitHub's own Codespaces forwarding to expose your local dev server during development; it's much faster than deploying for every change.
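The latency and error-handling points above can be combined in one wrapper. This is a minimal sketch under stated assumptions: streamSafely and sseChunk are hypothetical names, produce stands for whatever async generator yields your response text, and the status and error wording are placeholders.

```javascript
// Format one text fragment as an SSE event in the chat-completion chunk shape.
function sseChunk(text) {
  return `data: ${JSON.stringify({
    object: 'chat.completion.chunk',
    choices: [{ index: 0, delta: { content: text } }]
  })}\n\n`;
}

// produce: hypothetical async generator function that yields text fragments
// (context retrieval plus the LLM call would happen inside it).
async function streamSafely(res, produce) {
  res.setHeader('Content-Type', 'text/event-stream');
  // Send something immediately so the user is not left watching a loading state.
  res.write(sseChunk('Searching internal systems...\n\n'));
  try {
    for await (const text of produce()) res.write(sseChunk(text));
  } catch (err) {
    // Log the details server-side and stream a readable message instead of failing.
    console.error(err);
    res.write(sseChunk('Sorry, something went wrong while fetching internal context.'));
  } finally {
    // Always terminate the stream, whether or not an error occurred.
    res.write('data: [DONE]\n\n');
    res.end();
  }
}
```

Writing the first chunk sends the response headers with the default 200 status, so a later failure has to be reported in-stream anyway. The wrapper makes that the only path.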
THE RIGHT USE CASE
Copilot Extensions shine for one thing: giving developers access to your internal systems from inside their editor, without context switching. Internal documentation Q&A, codebase-specific tooling, integration with your project management or CRM — anything where "I wish Copilot knew about our system" is the pain point.
They're not the right fit for customer-facing products (users need Copilot Business/Enterprise licenses to use extensions) or highly latency-sensitive interactions (the SSE protocol adds overhead compared to direct API calls). But for internal developer tooling, the distribution model is hard to beat: install the GitHub App, mention @your-agent, and it's available everywhere Copilot Chat runs.