Using the OpenAI SDK: A Practical Guide

SETUP AND THE BASICS

The OpenAI SDK is available in Python and Node.js, and has become the lingua franca of the AI integration world: many third-party providers expose OpenAI-compatible APIs specifically because the SDK is so widely understood.

npm install openai
# or
pip install openai

Authentication uses the OPENAI_API_KEY environment variable. A minimal chat completion call:

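// Top-level await (used below) requires an ES module, e.g. a .mjs file or "type": "module".
import OpenAI from 'openai';

// Picks up the API key from OPENAI_API_KEY by default.
const client = new OpenAI();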

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What are the key fields on a Salesforce Account object?' }
  ]
});

console.log(response.choices[0].message.content);

The response shape centres on choices, an array of completions (usually one, unless you set n greater than 1). For a standard text response, the message content is at choices[0].message.content. When function calling is involved, content may be null and message.tool_calls is populated instead.
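Because content and tool_calls are alternatives, calling code usually branches on which one is present. The sketch below shows that branch. ToolCallLike, ChatMessageLike, ChatCompletionLike and readFirstChoice are hypothetical names; the interfaces are trimmed stand-ins for the SDK's own response types and cover only the fields discussed here.

```typescript
// Hypothetical, trimmed-down shapes covering only the fields used below.
// Real code would use the response types exported by the openai package.
interface ToolCallLike {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

interface ChatMessageLike {
  role: 'assistant';
  content: string | null;
  tool_calls?: ToolCallLike[];
}

interface ChatCompletionLike {
  choices: { index: number; finish_reason: string; message: ChatMessageLike }[];
}

type FirstChoice =
  | { kind: 'text'; text: string }
  | { kind: 'tool_calls'; calls: ToolCallLike[] }
  | { kind: 'empty' };

// Classify the first choice: tool calls take priority, then text content.
function readFirstChoice(completion: ChatCompletionLike): FirstChoice {
  const message = completion.choices[0]?.message;
  if (!message) return { kind: 'empty' };
  if (message.tool_calls && message.tool_calls.length > 0) {
    return { kind: 'tool_calls', calls: message.tool_calls };
  }
  if (message.content !== null) {
    return { kind: 'text', text: message.content };
  }
  return { kind: 'empty' };
}

// A tool-calling response: content is null and tool_calls is populated.
const example: ChatCompletionLike = {
  choices: [{
    index: 0,
    finish_reason: 'tool_calls',
    message: {
      role: 'assistant',
      content: null,
      tool_calls: [{ id: 'call_1', type: 'function', function: { name: 'search_knowledge_base', arguments: '{"query":"refunds"}' } }],
    },
  }],
};

console.log(readFirstChoice(example).kind); // prints "tool_calls"
```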

STREAMING

Streaming matters most for user-facing features, because users see tokens as they are generated instead of waiting for the full completion. Enable it by setting stream: true; the SDK then returns an async iterable of chunks:

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  stream: true,
  messages: [{ role: 'user', content: prompt }]
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

Each chunk's useful payload is at choices[0].delta. For text, that's delta.content. For function calls, the arguments arrive incrementally in delta.tool_calls[i].function.arguments: accumulate the JSON string fragments across chunks, keyed by each entry's index field (the model can emit several tool calls in parallel), and parse each string only when the stream ends.
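A minimal sketch of that accumulation, assuming each streamed tool-call fragment carries an index plus optional id, function.name and function.arguments fields. The local types and the accumulateToolCalls helper are hypothetical names, and the chunks are simulated so the example runs without an API key.

```typescript
// Trimmed-down shape of one streamed tool-call fragment (a hypothetical local
// type; real code would use the SDK's chunk types).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AccumulatedToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text, concatenated across chunks
}

// Merge one chunk's tool-call fragments into the running state, keyed by
// index so that parallel tool calls stay separate.
function accumulateToolCalls(
  acc: Map<number, AccumulatedToolCall>,
  deltas: ToolCallDelta[] | undefined,
): void {
  for (const d of deltas ?? []) {
    const current = acc.get(d.index) ?? { id: '', name: '', arguments: '' };
    const name = d.function?.name;
    const args = d.function?.arguments;
    if (d.id) current.id = d.id; // with OpenAI, id and name arrive on a call's first fragment
    if (name) current.name = name;
    if (args) current.arguments += args;
    acc.set(d.index, current);
  }
}

// Simulated fragments for one call, as they might arrive across three chunks.
// In a real loop you would pass chunk.choices[0]?.delta?.tool_calls each time.
const state = new Map<number, AccumulatedToolCall>();
accumulateToolCalls(state, [{ index: 0, id: 'call_abc', function: { name: 'search_knowledge_base', arguments: '' } }]);
accumulateToolCalls(state, [{ index: 0, function: { arguments: '{"query":' } }]);
accumulateToolCalls(state, [{ index: 0, function: { arguments: '"refund policy"}' } }]);

// Parse only after the stream has ended, when each JSON string is complete.
for (const call of Array.from(state.values())) {
  const parsedArgs = JSON.parse(call.arguments) as { query: string };
  console.log(call.name, parsedArgs.query); // prints "search_knowledge_base refund policy"
}
```

Keying on index rather than always reading tool_calls[0] is what keeps parallel tool calls from being concatenated into one invalid JSON string.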

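The SDK also offers a higher-level streaming helper that collects chunks and emits typed events. It lives under client.beta in the v4 Node SDK; newer major versions may expose it directly on client.chat.completions, so check the version you have installed: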

const stream = client.beta.chat.completions.stream({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }]
});

stream.on('content', (delta) => process.stdout.write(delta));
const finalCompletion = await stream.finalChatCompletion();

FUNCTION CALLING (TOOL USE)

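Function calling lets the model decide, based on the conversation, that one of your functions should run. The model never executes anything itself: it returns the function name and JSON-encoded arguments, and your code performs the call. Define tools in the tools array, describing each function's parameters with JSON Schema: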

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_knowledge_base',
        description: 'Search the internal knowledge base for articles matching a query.',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string', description: 'The search query' },
            limit: { type: 'integer', description: 'Maximum number of results', default: 5 }
          },
          required: ['query']
        }
      }
    }
  ],
  messages
});

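When the model calls a tool, response.choices[0].finish_reason is 'tool_calls'. Read the requested calls from message.tool_calls and execute your function for each one. Then append the assistant message (including its tool_calls) to messages, followed by one tool role message per call that carries the matching tool_call_id and the result serialized as a string, before calling the API again.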
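The round trip can be sketched as follows. The message types are simplified local stand-ins for the SDK's message parameter types, and searchKnowledgeBase and buildToolResultMessages are hypothetical names (the former an in-memory fake of the tool defined above). What matters is the ordering: the assistant message with its tool_calls first, then one tool message per call, each echoing its tool_call_id.

```typescript
// Simplified message shapes (hypothetical; the SDK exports its own types).
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

type Message =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// Hypothetical in-memory implementation of the search_knowledge_base tool.
function searchKnowledgeBase(args: { query: string; limit?: number }): string[] {
  const articles = ['Refund policy', 'Resetting your password', 'Refund timelines'];
  return articles
    .filter((title) => title.toLowerCase().includes(args.query.toLowerCase()))
    .slice(0, args.limit ?? 5);
}

// Build the messages to append before the next API call: the assistant
// message itself, then one tool message per requested call.
function buildToolResultMessages(assistant: { content: string | null; tool_calls: ToolCall[] }): Message[] {
  const out: Message[] = [{ role: 'assistant', content: assistant.content, tool_calls: assistant.tool_calls }];
  for (const call of assistant.tool_calls) {
    let result: unknown;
    if (call.function.name === 'search_knowledge_base') {
      result = searchKnowledgeBase(JSON.parse(call.function.arguments));
    } else {
      result = { error: `Unknown tool: ${call.function.name}` };
    }
    // Tool results must be strings, so structured results are JSON-encoded.
    out.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
  }
  return out;
}

// Usage: extend the running conversation with a simulated tool request.
const messages: Message[] = [{ role: 'user', content: 'How do refunds work?' }];
messages.push(...buildToolResultMessages({
  content: null,
  tool_calls: [{ id: 'call_1', type: 'function', function: { name: 'search_knowledge_base', arguments: '{"query":"refund"}' } }],
}));
```

The extended messages array is what you pass to the next client.chat.completions.create call, together with the same tools definition.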

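Set tool_choice: 'required' to force the model to call at least one tool, or pass { type: 'function', function: { name: 'search_knowledge_base' } } to force one specific function. This is useful for structured extraction tasks where you want the model to populate a schema rather than return free text.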

STRUCTURED OUTPUTS

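Structured Outputs (GA since mid-2024) guarantee, when you set strict: true on a supported model, that the model returns JSON matching the schema you provide. This eliminates the malformed-JSON failures that plagued earlier function-calling workflows, though a response can still come back as a refusal, or be cut off if it hits the token limit: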

const response = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'user', content: 'Extract the customer name, issue type, and priority from this ticket: ...' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'ticket_extraction',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          customer_name: { type: 'string' },
          issue_type: { type: 'string', enum: ['billing', 'technical', 'account', 'other'] },
          priority: { type: 'string', enum: ['low', 'medium', 'high', 'critical'] }
        },
        required: ['customer_name', 'issue_type', 'priority'],
        additionalProperties: false
      }
    }
  }
});

const ticket = response.choices[0].message.parsed;

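The .parse() method (from the beta namespace) handles the JSON parsing automatically, so message.parsed holds the extracted object directly. With a hand-written schema like this one, TypeScript does not infer a useful type for parsed; to get a static type, build response_format from a Zod schema with the SDK's zodResponseFormat helper (imported from 'openai/helpers/zod'). If the model refuses the request, message.refusal is set and parsed is null.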
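In practice it helps to check for a refusal before using parsed. This sketch assumes only the message fields described above (content, refusal, parsed). ParsedMessageLike and requireParsed are hypothetical names, and TicketExtraction is a hand-written mirror of the ticket_extraction schema rather than a type the SDK generates.

```typescript
// Hand-written mirror of the ticket_extraction schema (hypothetical; with
// zodResponseFormat this type would be inferred from the Zod schema).
interface TicketExtraction {
  customer_name: string;
  issue_type: 'billing' | 'technical' | 'account' | 'other';
  priority: 'low' | 'medium' | 'high' | 'critical';
}

// Only the message fields this helper reads.
interface ParsedMessageLike<T> {
  content: string | null;
  refusal: string | null;
  parsed: T | null;
}

// Return the parsed object, or throw if the model refused or nothing was parsed.
function requireParsed<T>(message: ParsedMessageLike<T>): T {
  if (message.refusal) {
    throw new Error(`Model refused: ${message.refusal}`);
  }
  if (message.parsed === null) {
    throw new Error('No parsed output on this message');
  }
  return message.parsed;
}

// Usage with a hand-built message shaped like a successful response.
const okMessage: ParsedMessageLike<TicketExtraction> = {
  content: '{"customer_name":"Dana","issue_type":"billing","priority":"high"}',
  refusal: null,
  parsed: { customer_name: 'Dana', issue_type: 'billing', priority: 'high' },
};
console.log(requireParsed(okMessage).priority); // prints "high"
```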

THE ASSISTANTS API: WHEN TO USE IT

The Assistants API provides managed conversation threads, file search (RAG over uploaded documents), and a code interpreter that runs Python in a sandbox. It's a higher-level abstraction that handles conversation state server-side.

We use it for two scenarios:

Document Q&A — upload PDFs or text files, attach them to a thread, and the assistant retrieves relevant passages automatically. Faster to build than a custom RAG pipeline and good enough for most document sizes.
Code execution — when the feature genuinely needs to run code (data analysis, chart generation, formula evaluation), the code interpreter sandbox is much simpler than spinning up your own execution environment.

We avoid it for high-throughput or latency-sensitive features. Runs are asynchronous: you either poll until the run finishes (the SDK's createAndPoll helper wraps that loop) or consume a stream of run events, and both add latency and complexity that the plain chat completions endpoint doesn't have. For straightforward conversational features, the chat completions API with your own message history management is faster and cheaper.
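To make the polling cost concrete, here is roughly what a poll loop involves. The SDK's own helper already does this for you; this sketch only illustrates the mechanics. getStatus and pollRun are hypothetical names (getStatus stands in for retrieving the run), and the interval and timeout defaults are arbitrary assumptions, not recommended values.

```typescript
// Run statuses listed in the Assistants API reference.
type RunStatus =
  | 'queued'
  | 'in_progress'
  | 'requires_action'
  | 'cancelling'
  | 'cancelled'
  | 'failed'
  | 'completed'
  | 'incomplete'
  | 'expired';

// Stop polling on any of these. requires_action is included because the run
// is paused until the caller submits tool outputs.
const STOP_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  'requires_action', 'cancelled', 'failed', 'completed', 'incomplete', 'expired',
]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Poll until a stop status or the timeout. Each check that finds the run
// still in progress costs another intervalMs of waiting.
async function pollRun(
  getStatus: () => Promise<RunStatus>,
  intervalMs = 500,
  timeoutMs = 60_000,
): Promise<RunStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await getStatus();
    if (STOP_STATUSES.has(status)) return status;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Run still "${status}" after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Usage with a simulated sequence of statuses instead of real API calls.
const simulated: RunStatus[] = ['queued', 'in_progress', 'completed'];
let checks = 0;
pollRun(async () => simulated[Math.min(checks++, simulated.length - 1)], 10)
  .then((status) => console.log(status, 'after', checks, 'checks')); // prints "completed after 3 checks"
```

Even in this simulation the result arrives only after two waits; with a real run, each wait is added on top of the model's own generation time.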

COST AND RATE LIMIT MANAGEMENT

A few operational lessons from running OpenAI in production:

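Log usage on every call. response.usage.prompt_tokens and completion_tokens are the most direct per-call source of cost data. Aggregate them per feature and per user from day one. Streamed responses only report usage if you set stream_options: { include_usage: true }, in which case it arrives on the final chunk.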
Use the right model for the job. GPT-4o is powerful but expensive. GPT-4o-mini handles classification, extraction, and summarisation well at a fraction of the cost. Profile your use cases and match model to task.
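Implement a token budget on the prompt side. The full message history is re-sent, and billed as input tokens, on every call, so an untruncated conversation gets more expensive with each turn. Truncate history once it passes a budget rather than sending unbounded context; a 100,000-token prompt is not a trivial cost.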
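Handle rate limit errors (429) with exponential backoff. The official SDKs already retry connection errors, 408, 409, 429 and 5xx responses twice by default with a short exponential backoff, and you can tune this with the maxRetries option. For longer rate-limit windows, wrap your calls in your own retry logic with jitter for any production-grade integration.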
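The practices above can be sketched as small helpers. All names here are hypothetical, and two assumptions deserve flagging: estimateTokens uses a rough four-characters-per-token heuristic rather than a real tokenizer, and backoffDelayMs uses the "full jitter" strategy (a random delay up to an exponentially growing cap). withRetries sits on top of the SDK's built-in retries; in real code, isRetryable might check for a 429 status on the error the SDK throws.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough estimate: ~4 characters per token for English text (an assumption;
// use a real tokenizer when you need accurate counts).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Token budget: keep every system message plus as many of the most recent
// other messages as fit within maxTokens.
function truncateHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: ChatMessage[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}

// Full-jitter backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.random() * Math.min(capMs, baseMs * 2 ** attempt);
}

// Application-level retry layered on top of the SDK's own retries.
async function withRetries<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise<void>((resolve) => setTimeout(() => resolve(), backoffDelayMs(attempt, baseMs)));
    }
  }
}

// Per-feature token totals, fed from response.usage after every call.
const usageByFeature = new Map<string, { prompt: number; completion: number }>();

function recordUsage(feature: string, usage: { prompt_tokens: number; completion_tokens: number }): void {
  const totals = usageByFeature.get(feature) ?? { prompt: 0, completion: 0 };
  totals.prompt += usage.prompt_tokens;
  totals.completion += usage.completion_tokens;
  usageByFeature.set(feature, totals);
}

// Usage: the 400-character user message no longer fits a 50-token budget.
const history: ChatMessage[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'a'.repeat(400) },
  { role: 'assistant', content: 'b'.repeat(40) },
  { role: 'user', content: 'c'.repeat(40) },
];
console.log(truncateHistory(history, 50).length); // prints 3
```

truncateHistory drops whole messages from the oldest end and always keeps system messages. If your history includes tool calls, drop an assistant message and its tool results together, since the API rejects a tool message that has no preceding tool_calls entry.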