Using the Claude SDK: A Practical Guide

SETUP AND FIRST CALL

The Anthropic SDK is available for Python and TypeScript. Installation is standard:

npm install @anthropic-ai/sdk
# or
pip install anthropic

Authentication uses an API key via the ANTHROPIC_API_KEY environment variable — the client picks it up automatically, so you don't need to pass it explicitly in most setups:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const message = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Summarise this support ticket in one sentence.' }
  ]
});

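// content is an array of typed blocks; check the type before reading .text
const first = message.content[0];
if (first?.type === 'text') {
  console.log(first.text);
}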

The response shape is consistent and predictable. message.content is an array of content blocks: in a basic call there's one text block, but with tool use there can be several blocks of different types. Always check each block's type rather than assuming content is a single string.
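As a minimal sketch of that advice, the helper below collects the text from a content array while skipping non-text blocks. The block types are simplified local stand-ins for the SDK's own types, and extractText is a hypothetical helper name, not part of the SDK.

```typescript
// Local stand-ins for the SDK's content block types (assumption: only the
// fields used here are modelled; the real SDK types carry more).
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

// Concatenate the text of every text block, skipping other block types.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((block): block is TextBlock => block.type === 'text')
    .map((block) => block.text)
    .join('');
}

// Example with a hand-built content array (not a real API response).
const sample: ContentBlock[] = [
  { type: 'text', text: 'Looking up the case. ' },
  { type: 'tool_use', id: 'toolu_example', name: 'get_case_details', input: { case_id: '5001a00000AbCdEf' } },
  { type: 'text', text: 'One moment.' },
];

console.log(extractText(sample));
```

Filtering by type keeps the call site safe if a later change (adding tools, for instance) makes the first block something other than text.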

STREAMING

For any user-facing feature, stream the response. Waiting for the full completion before showing anything to the user feels broken, even at low latencies. The SDK makes streaming straightforward:

const stream = client.messages.stream({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [{ role: 'user', content: prompt }]
});

for await (const chunk of stream) {
  if (
    chunk.type === 'content_block_delta' &&
    chunk.delta.type === 'text_delta'
  ) {
    process.stdout.write(chunk.delta.text);
  }
}

const finalMessage = await stream.finalMessage();

The stream.finalMessage() call waits for completion and returns the full message object, including usage stats. We always capture this — finalMessage.usage gives you input and output token counts, which you need for cost tracking in any production system.
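To make the cost-tracking point concrete, here is a rough sketch that turns a usage object into a dollar estimate. The prices are placeholders we made up for illustration, not real rates, and estimateCostUsd is a hypothetical helper; plug in the current pricing for whichever model you use.

```typescript
// Shape of the usage object as described above (input and output token counts).
type Usage = { input_tokens: number; output_tokens: number };

// Hypothetical prices in USD per million tokens. These are placeholders,
// not real pricing; substitute the current rates for your model.
type Pricing = { inputPerMTok: number; outputPerMTok: number };

// Estimate the cost of one request from its token counts.
function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  const inputCost = (usage.input_tokens / 1_000_000) * pricing.inputPerMTok;
  const outputCost = (usage.output_tokens / 1_000_000) * pricing.outputPerMTok;
  return inputCost + outputCost;
}

// Example values only: 2,000 input tokens and 500 output tokens.
const examplePricing: Pricing = { inputPerMTok: 10, outputPerMTok: 50 };
const exampleUsage: Usage = { input_tokens: 2_000, output_tokens: 500 };
const cost = estimateCostUsd(exampleUsage, examplePricing);

console.log(`Estimated cost: $${cost.toFixed(4)}`);
```

In practice you would call this with finalMessage.usage and write the result to your metrics pipeline alongside a feature name and user ID.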

TOOL USE

Tool use (function calling) is where the SDK earns its keep for agent-style features. You define tools as JSON schemas, and Claude decides whether and when to call them based on the conversation:

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  tools: [
    {
      name: 'get_case_details',
      description: 'Retrieve full details of a Salesforce support case by ID.',
      input_schema: {
        type: 'object',
        properties: {
          case_id: {
            type: 'string',
            description: 'The Salesforce Case ID, e.g. 5001a00000AbCdEf'
          }
        },
        required: ['case_id']
      }
    }
  ],
  messages: [{ role: 'user', content: 'What is the status of case 5001a00000AbCdEf?' }]
});

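When Claude wants to call a tool, response.stop_reason is 'tool_use' and the content array contains one or more tool_use blocks, each with an id, the tool name, and the input. You execute the tool, append the assistant's response to the message history, follow it with a user message containing a tool_result block that references the tool_use id, and call the API again. The loop continues until stop_reason is something other than 'tool_use', typically 'end_turn'.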

The SDK's tool runner helper (in beta at the time of writing) handles this loop automatically, so for most use cases you don't need to write the loop manually.
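For the cases where you do want the loop in your own hands, the sketch below shows its shape. It is deliberately decoupled from the SDK: callModel and runTool are hypothetical seams (callModel would wrap client.messages.create with your model and tools, runTool would dispatch to your own implementations), and the types are simplified local stand-ins rather than the SDK's.

```typescript
// Simplified local stand-ins for the message and block shapes.
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: unknown };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type Block = TextBlock | ToolUseBlock | ToolResultBlock;
type Message = { role: 'user' | 'assistant'; content: string | Block[] };
type ModelResponse = { stop_reason: string; content: Block[] };

// Hypothetical seams: callModel would wrap client.messages.create, and
// runTool would dispatch to your own tool implementations.
type CallModel = (messages: Message[]) => Promise<ModelResponse>;
type RunTool = (name: string, input: unknown) => Promise<string>;

async function runToolLoop(
  callModel: CallModel,
  runTool: RunTool,
  messages: Message[],
  maxTurns = 10,
): Promise<ModelResponse> {
  // Copy the history so the caller's array is not mutated.
  const history = [...messages];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await callModel(history);
    if (response.stop_reason !== 'tool_use') return response;

    // Echo the assistant turn back, then answer every tool_use block
    // with a tool_result that carries the matching id.
    history.push({ role: 'assistant', content: response.content });
    const results: ToolResultBlock[] = [];
    for (const block of response.content) {
      if (block.type === 'tool_use') {
        const output = await runTool(block.name, block.input);
        results.push({ type: 'tool_result', tool_use_id: block.id, content: output });
      }
    }
    history.push({ role: 'user', content: results });
  }
  throw new Error(`Tool loop did not finish within ${maxTurns} turns`);
}
```

The maxTurns cap is our own safety choice: without it, a tool that keeps returning unhelpful results can keep the loop (and the bill) running indefinitely.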

SYSTEM PROMPTS AND PROMPT CACHING

System prompts set the context and persona for the conversation. They go in the system parameter, separate from the messages array:

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  system: 'You are a support analyst for Acme Corp. You have access to case data and should provide concise, accurate responses. Always cite the case ID when referencing specific cases.',
  messages: conversationHistory
});

If your system prompt is long and stable across requests (a detailed instruction set, reference documentation, a large tool schema), use prompt caching. You mark the end of the stable prefix with a cache_control breakpoint, and Anthropic caches everything up to that point server-side. Writing to the cache costs slightly more than normal input pricing, but subsequent requests that hit the cache pay 10% of normal input token pricing for the cached portion:

system: [
  {
    type: 'text',
    text: longStableSystemPrompt,
    cache_control: { type: 'ephemeral' }
  }
]

The cache lives for 5 minutes by default, and the timer resets on every hit, so it stays warm under any steady-state usage pattern. Prompts shorter than a model-specific minimum length are not cached at all. For a feature handling dozens of requests per minute, prompt caching is usually the highest-leverage cost optimisation available.
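Because the cache works on a prefix, ordering matters: stable text goes first with the breakpoint, and anything that changes per request goes after it. The sketch below shows one way to build the system array that way. buildSystemBlocks is a hypothetical helper, and SystemBlock is a simplified local type rather than the SDK's.

```typescript
// Simplified local type for a system text block.
type SystemBlock = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
};

// Put the stable text first and mark it with the breakpoint; anything that
// varies per request goes after it so it does not invalidate the cached prefix.
function buildSystemBlocks(stablePrompt: string, perRequestContext?: string): SystemBlock[] {
  const blocks: SystemBlock[] = [
    { type: 'text', text: stablePrompt, cache_control: { type: 'ephemeral' } },
  ];
  if (perRequestContext) {
    blocks.push({ type: 'text', text: perRequestContext });
  }
  return blocks;
}

// Example values only.
const systemBlocks = buildSystemBlocks(
  'You are a support analyst for Acme Corp. (long, stable instructions)',
  'Current user tier: enterprise',
);

console.log(JSON.stringify(systemBlocks, null, 2));
```

To confirm the cache is actually being hit, check the cache_creation_input_tokens and cache_read_input_tokens fields on the response's usage object: the first should be non-zero on the request that writes the cache, the second on the requests that follow.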

VISION

Claude is natively multimodal. Passing images works by including image content blocks in the user message:

const response = await client.messages.create({
  model: 'claude-opus-4-7',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'image',
          source: {
            type: 'base64',
            media_type: 'image/png',
            data: base64ImageData
          }
        },
        {
          type: 'text',
          text: 'Describe what you see in this screenshot and identify any error messages.'
        }
      ]
    }
  ]
});

You can also pass a URL directly by using a source with type: 'url' and a url field instead of base64, which is useful when images are already hosted. For the Salesforce Screen Recorder, we use vision to have Claude annotate screenshots before packaging them into the debug ZIP.
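The two source variants can be wrapped behind one small helper so call sites don't care where the image lives. This is a sketch under our own naming: toImageBlock and ImageInput are hypothetical, and the types are local stand-ins for the SDK's image block types.

```typescript
// Media types accepted for images.
type ImageMediaType = 'image/jpeg' | 'image/png' | 'image/gif' | 'image/webp';

// Local stand-ins for the two image source shapes described above.
type ImageSource =
  | { type: 'base64'; media_type: ImageMediaType; data: string }
  | { type: 'url'; url: string };
type ImageBlock = { type: 'image'; source: ImageSource };

// Hypothetical input type: either a hosted image or inline base64 data.
type ImageInput =
  | { kind: 'hosted'; url: string }
  | { kind: 'inline'; mediaType: ImageMediaType; base64: string };

// Build the image content block for whichever form the caller has.
function toImageBlock(input: ImageInput): ImageBlock {
  if (input.kind === 'hosted') {
    return { type: 'image', source: { type: 'url', url: input.url } };
  }
  return {
    type: 'image',
    source: { type: 'base64', media_type: input.mediaType, data: input.base64 },
  };
}

// Example with a placeholder URL.
const hostedBlock = toImageBlock({ kind: 'hosted', url: 'https://example.com/screenshot.png' });
console.log(JSON.stringify(hostedBlock));
```

The returned block drops straight into the content array of a user message, ahead of the text block that asks the question.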

PATTERNS THAT HOLD UP IN PRODUCTION

A few things we've learned building production features on the Claude SDK:

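Always set max_tokens deliberately. It is a required parameter, and a value that is too low truncates the response: the API returns what it has and sets stop_reason to 'max_tokens' rather than raising an error. Set it to the realistic maximum for your use case and check stop_reason; you only pay for what Claude actually generates.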
Track usage on every call. Log message.usage.input_tokens and output_tokens per request, per feature, per user. AI costs can surprise you at scale and you need the data to optimise.
Treat the system prompt as code. Version it, review changes to it, and test the effects of changes before deploying. A one-line change to a system prompt can meaningfully change output behaviour across thousands of requests.
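Don't blindly retry 4xx errors. A 400 Bad Request means your payload is malformed, and retrying won't fix it. Retry only transient failures: 429 (rate limited), 5xx responses including 529 (overloaded), and network errors, with exponential backoff. The SDK already retries these a couple of times by default, so check its maxRetries setting before layering your own retry logic on top.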
Build evals before you build features. A set of input/output pairs that represent the behaviour you want is the only reliable way to know whether a model change or prompt change is an improvement or a regression.
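
The retry guidance above can be sketched as a small wrapper. This is a minimal illustration, not the SDK's built-in retry logic: it assumes 429 and all 5xx statuses (including 529) are transient, treats any error without a numeric status as a network failure, and omits jitter for brevity. The names isRetryableStatus, backoffDelayMs, and withRetry are our own.

```typescript
// Assumption: 429 (rate limited) and any 5xx, including 529 (overloaded),
// are transient; other 4xx responses indicate a problem with the request.
function isRetryableStatus(status: number): boolean {
  return status === 429 || status >= 500;
}

// Exponential backoff with a cap: 500ms, 1s, 2s, ... up to maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical error shape: anything with a numeric `status` is an HTTP
// error; anything without one is treated as a network failure.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number } | null)?.status;
      const retryable = status === undefined || isRetryableStatus(status);
      // Give up on non-transient errors or once the attempt budget is spent.
      if (!retryable || attempt >= maxAttempts - 1) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

If you wrap SDK calls in something like this, consider lowering the client's own retry count so the two layers don't multiply each other's attempts.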