Using the OpenAI SDK: A Practical Guide

SETUP AND THE BASICS

The OpenAI SDK is available in Python and Node.js, and has become the lingua franca of the AI integration world — many third-party models expose an OpenAI-compatible API specifically because the SDK is so widely understood.

npm install openai
# or
pip install openai

Authentication uses the OPENAI_API_KEY environment variable. A minimal chat completion call:

import OpenAI from 'openai';

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What are the key fields on a Salesforce Account object?' }
  ]
});

console.log(response.choices[0].message.content);

The response shape centres on choices — an array of completions (usually one, unless you set n higher). The message content is at choices[0].message.content for a standard text response. When function calling is involved, content may be null and tool_calls is populated instead.
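Because content and tool_calls are alternatives, it can help to normalise the first choice before the rest of your code touches it. The following is a minimal sketch, not part of the SDK: readFirstChoice is a hypothetical helper name, and it assumes a response object shaped like the one described above.

```javascript
// Hypothetical helper: normalise the first choice into text and tool calls.
function readFirstChoice(response) {
  const choice = response.choices?.[0];
  if (!choice) throw new Error('Response contained no choices');

  const { message, finish_reason: finishReason } = choice;
  return {
    finishReason,
    // content may be null when the model answered with tool calls
    text: message.content ?? '',
    // tool_calls is absent on a plain text response
    toolCalls: message.tool_calls ?? [],
  };
}
```

Callers can then branch on toolCalls.length rather than checking for null content in several places.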

STREAMING

Streaming matters for user-facing features because users see text as it is generated instead of waiting for the full response. Enable it by setting stream: true, and the SDK returns an async iterable of chunks:

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  stream: true,
  messages: [{ role: 'user', content: prompt }]
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

Each chunk's useful payload is at choices[0].delta. For text, that's delta.content. For function calls, the arguments stream incrementally through delta.tool_calls[i].function.arguments. Each entry carries an index identifying which tool call the fragment belongs to, so accumulate the JSON string per index across chunks and parse it only when the stream ends.
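The accumulation step can be sketched as a small reducer. This is an illustration under assumptions rather than SDK code: createStreamState, applyChunk and finishToolCalls are hypothetical names, and the chunk shape is the one described above.

```javascript
// Sketch: fold streamed chunks into final text plus complete tool calls.
function createStreamState() {
  return { text: '', calls: [] };
}

function applyChunk(state, chunk) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) return state;

  if (delta.content) state.text += delta.content;

  for (const part of delta.tool_calls ?? []) {
    // part.index says which tool call this fragment belongs to
    const call = (state.calls[part.index] ??= { id: '', name: '', args: '' });
    if (part.id) call.id = part.id;
    if (part.function?.name) call.name = part.function.name;
    if (part.function?.arguments) call.args += part.function.arguments;
  }
  return state;
}

// Call once the stream has ended: only then is each argument string complete JSON.
function finishToolCalls(state) {
  return state.calls.filter(Boolean).map((call) => ({
    id: call.id,
    name: call.name,
    arguments: JSON.parse(call.args || '{}'),
  }));
}
```

With a real stream you would call applyChunk(state, chunk) inside the for await loop, then finishToolCalls(state) after it exits.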

The SDK also offers a higher-level streaming helper that collects chunks and emits typed events:

const stream = client.beta.chat.completions.stream({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: prompt }]
});

stream.on('content', (delta) => process.stdout.write(delta));
const finalCompletion = await stream.finalChatCompletion();

FUNCTION CALLING (TOOL USE)

Function calling lets the model decide to invoke your code based on the conversation. Define tools in the tools array using JSON Schema:

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_knowledge_base',
        description: 'Search the internal knowledge base for articles matching a query.',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string', description: 'The search query' },
            limit: { type: 'integer', description: 'Maximum number of results', default: 5 }
          },
          required: ['query']
        }
      }
    }
  ],
  messages
});

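When the model calls a tool, response.choices[0].finish_reason is 'tool_calls'. Read the calls from message.tool_calls, parse each call's function.arguments (a JSON string), and execute the matching function. Then append the assistant message that contains the tool calls, followed by one tool role message per call carrying its tool_call_id and the result, before calling the API again.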
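One way to build those follow-up messages is shown below. It is a sketch under assumptions: runToolCalls and the toolHandlers registry are hypothetical names, the search_knowledge_base handler returns placeholder data, and handlers are synchronous here for brevity (real ones are likely async).

```javascript
// Hypothetical registry mapping tool names to local functions.
const toolHandlers = {
  search_knowledge_base: ({ query, limit = 5 }) =>
    [{ title: `Result for ${query}` }].slice(0, limit),
};

// Returns the messages to append to the history before the next API call.
function runToolCalls(assistantMessage, handlers) {
  // The assistant turn with tool_calls must precede the tool results.
  const followUps = [assistantMessage];

  for (const call of assistantMessage.tool_calls ?? []) {
    let result;
    try {
      const handler = handlers[call.function.name];
      if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
      // Arguments arrive as a JSON string, not an object.
      result = handler(JSON.parse(call.function.arguments));
    } catch (err) {
      // Report failures back to the model instead of aborting the loop.
      result = { error: err.message };
    }
    followUps.push({
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    });
  }
  return followUps;
}
```

Usage would look like messages.push(...runToolCalls(response.choices[0].message, toolHandlers)) followed by another create() call. Returning errors as tool results is a design choice: it gives the model a chance to retry or explain the failure.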

Set tool_choice: 'required' to force the model to always call a tool — useful for structured extraction tasks where you want the model to populate a schema rather than return free text.

STRUCTURED OUTPUTS

Structured Outputs (GA since mid-2024) guarantees that the model returns valid JSON matching a schema you provide when strict mode is enabled. This eliminates the JSON parsing failures that plagued earlier function-calling workflows:

const response = await client.beta.chat.completions.parse({
  model: 'gpt-4o-2024-08-06',
  messages: [
    { role: 'user', content: 'Extract the customer name, issue type, and priority from this ticket: ...' }
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'ticket_extraction',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          customer_name: { type: 'string' },
          issue_type: { type: 'string', enum: ['billing', 'technical', 'account', 'other'] },
          priority: { type: 'string', enum: ['low', 'medium', 'high', 'critical'] }
        },
        required: ['customer_name', 'issue_type', 'priority'],
        additionalProperties: false
      }
    }
  }
});

const ticket = response.choices[0].message.parsed;

The .parse() method (from the beta namespace) handles the JSON parsing automatically — message.parsed gives you the typed object directly.
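A schema guarantee does not cover every outcome: the model can decline a request, in which case the message carries a refusal string instead of a payload, and output cut off by the token limit cannot be complete JSON. The guard below is a minimal sketch with a hypothetical name, unwrapParsed. It assumes the response shape shown above; depending on the SDK version, .parse() may already throw for truncated output before this code runs.

```javascript
// Hypothetical guard around a Structured Outputs response.
function unwrapParsed(response) {
  const choice = response.choices[0];

  if (choice.finish_reason === 'length') {
    throw new Error('Output was truncated before the JSON was complete');
  }
  if (choice.message.refusal) {
    throw new Error(`Model refused: ${choice.message.refusal}`);
  }
  if (choice.message.parsed == null) {
    throw new Error('No parsed payload on the message');
  }
  return choice.message.parsed;
}
```

Replacing the direct message.parsed access with unwrapParsed(response) turns these cases into explicit errors rather than a null ticket downstream.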

THE ASSISTANTS API: WHEN TO USE IT

The Assistants API provides managed conversation threads, file search (RAG over uploaded documents), and a code interpreter that runs Python in a sandbox. It's a higher-level abstraction that handles conversation state server-side.

We use it for two scenarios:

We avoid it for high-throughput or latency-sensitive features. Runs are asynchronous and require polling, which adds latency and complexity that the simple chat completions endpoint doesn't have. For straightforward conversational features, the chat completions API with your own message history management is faster and cheaper.

COST AND RATE LIMIT MANAGEMENT

A few operational lessons from running OpenAI in production: