Optimize OpenAI and Claude API Costs with TOON

Practical guide to reducing OpenAI GPT and Anthropic Claude API costs by 30-60% using TOON format. Includes code examples and implementation strategies.

Tags: Implementation, TOON, OpenAI, Claude

Token-based pricing from OpenAI and Anthropic can quickly become your biggest AI expense. This guide shows you exactly how to implement TOON in your applications to cut those costs by 30-60% without sacrificing functionality.

Understanding the Cost Problem

Current Pricing (2025)

OpenAI:

  • GPT-4 Turbo: $0.01/1K input tokens, $0.03/1K output
  • GPT-4: $0.03/1K input tokens, $0.06/1K output
  • GPT-3.5 Turbo: $0.0005/1K input tokens, $0.0015/1K output

Anthropic Claude:

  • Claude 3 Opus: $0.015/1K input tokens, $0.075/1K output
  • Claude 3 Sonnet: $0.003/1K input tokens, $0.015/1K output
  • Claude 3 Haiku: $0.00025/1K input tokens, $0.00125/1K output
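
At these rates, estimating a request's cost is simple arithmetic: tokens / 1,000 × rate. A minimal sketch, with rates hardcoded from the table above (they change over time, so treat them as a snapshot):

// $ per 1K tokens, copied from the table above (verify against current pricing).
const RATES = {
  'gpt-4-turbo': { input: 0.01, output: 0.03 },
  'gpt-4': { input: 0.03, output: 0.06 },
  'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 },
  'claude-3-sonnet': { input: 0.003, output: 0.015 }
};

function estimateCost(model, inputTokens, outputTokens) {
  const rate = RATES[model];
  return (inputTokens / 1000) * rate.input + (outputTokens / 1000) * rate.output;
}

// 1M calls at ~35 input tokens each on GPT-4:
// estimateCost('gpt-4', 35_000_000, 0) === 1050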

The Token Tax

Every JSON object you send includes:

  • Curly braces: { }
  • Quotation marks: ""
  • Commas: ,
  • Colons: :

These add up fast.

Example:

{
  "user": {
    "name": "Alice",
    "email": "alice@example.com",
    "age": 30
  }
}

Tokens: ~35
Cost per 1M calls with GPT-4: $1,050

Same data in TOON:

user:
  name: Alice
  email: alice@example.com
  age: 30

Tokens: ~18
Cost per 1M calls with GPT-4: $540
Savings: $510 (49%)
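
Token counts depend on the tokenizer, so measure rather than eyeball. One way to check, using the tiktoken npm package (an assumption here; the toon.countTokens helper used later in this guide works too):

const { encoding_for_model } = require('tiktoken');

function countTokens(text, model = 'gpt-4') {
  const enc = encoding_for_model(model);
  const count = enc.encode(text).length;
  enc.free(); // the encoder is WASM-backed and must be released
  return count;
}

const json = JSON.stringify(
  { user: { name: 'Alice', email: 'alice@example.com', age: 30 } },
  null,
  2
);
console.log(countTokens(json)); // roughly 35 for the pretty-printed JSON above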

Implementation Strategies

Strategy 1: Boundary Conversion

Convert at the LLM boundary: your application logic stays unchanged.

Architecture:

Your App (JSON) → Converter → TOON → LLM API → Response

OpenAI Implementation

Before (JSON):

const OpenAI = require('openai');
const openai = new OpenAI();

async function analyzeData(data) {
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{
      role: "user",
      content: JSON.stringify(data)
    }]
  });
  
  return response.choices[0].message.content;
}

// Usage
const result = await analyzeData({
  user: {
    name: "Alice",
    purchases: [
      { item: "Laptop", price: 1200 },
      { item: "Mouse", price: 25 }
    ]
  }
});

After (with TOON):

const OpenAI = require('openai');
const toon = require('toon-js');
const openai = new OpenAI();

async function analyzeData(data) {
  // Convert to TOON before sending
  const toonData = toon.stringify(data);
  
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{
      role: "user",
      content: toonData
    }]
  });
  
  return response.choices[0].message.content;
}

// Usage - same as before!
const result = await analyzeData({
  user: {
    name: "Alice",
    purchases: [
      { item: "Laptop", price: 1200 },
      { item: "Mouse", price: 25 }
    ]
  }
});

Token Savings: ~45%
Code Changes: Minimal (just the conversion line)
Application Impact: Zero

Claude Implementation

Before (JSON):

const Anthropic = require('@anthropic-ai/sdk');
const anthropic = new Anthropic();

async function processWithClaude(data) {
  const message = await anthropic.messages.create({
    model: "claude-3-sonnet-20240229",
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: JSON.stringify(data)
    }]
  });
  
  return message.content[0].text;
}

After (with TOON):

const Anthropic = require('@anthropic-ai/sdk');
const toon = require('toon-js');
const anthropic = new Anthropic();

async function processWithClaude(data) {
  // Convert to TOON
  const toonData = toon.stringify(data);
  
  const message = await anthropic.messages.create({
    model: "claude-3-sonnet-20240229",
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: toonData
    }]
  });
  
  return message.content[0].text;
}

Token Savings: ~50%

Strategy 2: System Prompt Optimization

Tell the LLM to expect TOON format.

Enhanced System Prompt:

const systemPrompt = `You are a helpful assistant. 
User data will be provided in TOON format (Token-Oriented Object Notation), 
which is similar to YAML but optimized for tokens.

Example TOON format:
key: value
nested:
  child: value
array: [1, 2, 3]

Please respond in natural language.`;

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [
    { role: "system", content: systemPrompt },
    { role: "user", content: toonData }
  ]
});

Benefits:

  • LLM understands TOON format
  • Better context understanding
  • Clearer responses

Strategy 3: Batch Processing

Process multiple items in one request using TOON's efficiency.

Example:

async function batchAnalyze(items) {
  // Create TOON batch
  const batchData = toon.stringify({ items });
  
  const response = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{
      role: "user",
      content: `Analyze each item:\n${batchData}`
    }]
  });
  
  return response.choices[0].message.content;
}

// Process 100 items at once instead of 100 separate calls
const items = [/* 100 user objects */];
const results = await batchAnalyze(items);

Savings:

  • Fewer API calls
  • Reduced per-request overhead
  • More data per context window
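
If the item set is too large for a single request, chunk it first. A minimal sketch reusing batchAnalyze from above; the chunk size is an illustrative assumption, not a measured context limit:

// Split items into fixed-size groups and run batchAnalyze (defined above) per group.
function chunk(array, size) {
  const groups = [];
  for (let i = 0; i < array.length; i += size) {
    groups.push(array.slice(i, i + size));
  }
  return groups;
}

async function batchAnalyzeAll(items, chunkSize = 100) {
  const results = [];
  for (const group of chunk(items, chunkSize)) {
    results.push(await batchAnalyze(group));
  }
  return results;
}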

Real-World Use Cases

Use Case 1: E-commerce Product Analysis

Scenario: Analyze product reviews and generate insights

Before (JSON):

const productData = {
  productId: "12345",
  reviews: [
    {
      rating: 5,
      comment: "Great product!",
      date: "2025-01-15"
    },
    {
      rating: 4,
      comment: "Good value",
      date: "2025-01-14"
    }
    // ... 98 more reviews
  ]
};

const analysis = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{
    role: "user",
    content: `Analyze these reviews: ${JSON.stringify(productData)}`
  }]
});

Tokens: ~2,500
Cost per request: $0.025

After (TOON):

const toonData = toon.stringify(productData);

const analysis = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{
    role: "user",
    content: `Analyze these reviews:\n${toonData}`
  }]
});

Tokens: ~1,250 (50% reduction)
Cost per request: $0.0125
Monthly Savings (1,000 products): $12.50

Use Case 2: Customer Support Chatbot

Scenario: Provide context from user history

Implementation:

async function getChatResponse(userMessage, userContext) {
  // Convert context to TOON
  const contextTOON = toon.stringify({
    user: userContext.profile,
    recentOrders: userContext.orders.slice(0, 5),
    supportHistory: userContext.tickets.slice(0, 3)
  });
  
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "system",
        content: "You are a customer support assistant."
      },
      {
        role: "system",
        content: `User context:\n${contextTOON}`
      },
      {
        role: "user",
        content: userMessage
      }
    ]
  });
  
  return response.choices[0].message.content;
}

Before (JSON): ~600 tokens of context
After (TOON): ~300 tokens of context
Per conversation savings: 50%
Monthly savings (10,000 conversations): ~$1.50 in input tokens at GPT-3.5 Turbo rates, and proportionally more if the context is resent on every turn

Use Case 3: Document Summarization

Scenario: Summarize structured documents with Claude

Implementation:

async function summarizeDocument(document) {
  const documentTOON = toon.stringify({
    metadata: document.metadata,
    sections: document.sections,
    references: document.references
  });
  
  const message = await anthropic.messages.create({
    model: "claude-3-sonnet-20240229",
    max_tokens: 500,
    messages: [{
      role: "user",
      content: `Summarize this document:\n${documentTOON}`
    }]
  });
  
  return message.content[0].text;
}

Token Reduction: ~55%
Cost Reduction: ~55%
Monthly Savings (500 documents): $82.50

Use Case 4: Data Analysis Pipeline

Scenario: Process analytics data through GPT-4

Implementation:

async function analyzeMetrics(metricsData) {
  const metricsTOON = toon.stringify({
    period: metricsData.period,
    metrics: {
      revenue: metricsData.revenue,
      users: metricsData.users,
      engagement: metricsData.engagement
    },
    comparisons: metricsData.previousPeriod
  });
  
  const analysis = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{
      role: "user",
      content: `Analyze these metrics and provide insights:\n${metricsTOON}`
    }],
    temperature: 0.3
  });
  
  return analysis.choices[0].message.content;
}

Before: ~1,800 tokens
After: ~900 tokens
Savings per analysis: ~$0.009 (900 input tokens at the GPT-4 Turbo rate used above)
Monthly savings (daily reports): ~$0.27

Advanced Optimization Techniques

1. Selective TOON Conversion

Not all data needs TOON. Identify high-volume, structured data.

function isStructuredData(data) {
  // Objects and arrays benefit from TOON; plain strings and numbers do not.
  return typeof data === 'object' && data !== null;
}

function optimizeForLLM(data) {
  // Convert structured data to TOON
  if (isStructuredData(data)) {
    return toon.stringify(data);
  }

  // Keep natural text as-is
  return data;
}

2. Caching with TOON

Cache TOON conversions to avoid repeated processing.

const toonCache = new Map();

// Keyed by a stable identifier; this assumes the data behind a given key
// never changes (delete the entry to invalidate it if it does).
function getCachedTOON(key, data) {
  if (!toonCache.has(key)) {
    toonCache.set(key, toon.stringify(data));
  }
  return toonCache.get(key);
}
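
Usage with a hypothetical per-user cache key (userId and userProfile are illustrative names):

const profileTOON = getCachedTOON(`user:${userId}`, userProfile);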

3. Streaming with TOON

Use streaming for large responses.

const stream = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{
    role: "user",
    content: toonData
  }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

4. Context Window Management

Fit more data in context with TOON.

GPT-4 Context: 128K tokens

With JSON: ~50 user profiles with full history
With TOON: ~85 user profiles with full history (70% more!)

function packContextWindow(users, maxTokens = 120000) {
  const toonUsers = users.map(u => toon.stringify(u));
  let packed = [];
  let tokenCount = 0;
  
  for (const user of toonUsers) {
    const tokens = toon.countTokens(user);
    if (tokenCount + tokens <= maxTokens) {
      packed.push(user);
      tokenCount += tokens;
    } else {
      break;
    }
  }
  
  return packed.join('\n---\n');
}
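
The packed string then goes out in a single request; the prompt wording here is illustrative:

const context = packContextWindow(users);

const response = await openai.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{
    role: "user",
    content: `Summarize each profile below:\n${context}`
  }]
});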

Cost Tracking and Monitoring

Track Your Savings

class TOONMetrics {
  constructor() {
    this.jsonTokens = 0;
    this.toonTokens = 0;
  }
  
  recordConversion(jsonData) {
    // Use the same counter for both formats so the comparison is fair.
    const jsonTokens = toon.countTokens(JSON.stringify(jsonData));
    const toonData = toon.stringify(jsonData);
    const toonTokens = toon.countTokens(toonData);
    
    this.jsonTokens += jsonTokens;
    this.toonTokens += toonTokens;
    
    return toonData;
  }
  
  getSavings() {
    const reduction = (this.jsonTokens - this.toonTokens) / this.jsonTokens;
    return {
      jsonTokens: this.jsonTokens,
      toonTokens: this.toonTokens,
      reduction: (reduction * 100).toFixed(2) + '%',
      savedTokens: this.jsonTokens - this.toonTokens
    };
  }
  
  getCostSavings(costPerToken) {
    const savedTokens = this.jsonTokens - this.toonTokens;
    return (savedTokens * costPerToken).toFixed(2);
  }
}

// Usage
const metrics = new TOONMetrics();

async function processData(data) {
  const toonData = metrics.recordConversion(data);
  return await callLLM(toonData);
}

// Check savings
console.log(metrics.getSavings());
// Output: { jsonTokens: 50000, toonTokens: 25000, reduction: '50.00%', savedTokens: 25000 }

console.log(`Saved $${metrics.getCostSavings(0.00001)} so far`);
// Output: Saved $0.25 so far

Migration Checklist

Week 1: Setup

  • Install TOON library
  • Identify high-volume API calls
  • Set up conversion functions
  • Create test environment

Week 2: Testing

  • Convert test data to TOON
  • Validate LLM responses
  • Measure token reduction
  • Compare costs

Week 3: Pilot

  • Deploy to 10% of traffic
  • Monitor for issues
  • Track actual savings
  • Gather feedback

Week 4: Rollout

  • Increase to 50% traffic
  • Full deployment
  • Document savings
  • Share results with team

Common Pitfalls and Solutions

Pitfall 1: Over-Converting

Problem: Converting simple strings to TOON
Solution: Only convert structured data

// Don't do this
const name = "Alice";
const toonName = toon.stringify({name}); // Wasteful

// Do this
if (typeof data === 'object' && data !== null) {
  return toon.stringify(data);
}
return data;

Pitfall 2: Ignoring System Prompts

Problem: LLM doesn't understand TOON format
Solution: Add a TOON explanation to the system prompt
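
The system prompt from Strategy 2 covers this; even a one-line system message helps:

messages: [
  { role: "system", content: "Data below is in TOON format (indentation-based, similar to YAML)." },
  { role: "user", content: toonData }
]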

Pitfall 3: Not Measuring

Problem: Assuming savings without tracking
Solution: Implement metrics tracking (e.g., the TOONMetrics class above) from day 1

Expected Results

By Scale

Small (< 10K requests/month):

  • Token reduction: 35-45%
  • Monthly savings: $50-200
  • ROI timeline: 1-2 months

Medium (10K-100K requests/month):

  • Token reduction: 40-55%
  • Monthly savings: $500-3,000
  • ROI timeline: 2-4 weeks

Large (> 100K requests/month):

  • Token reduction: 45-60%
  • Monthly savings: $5,000-50,000
  • ROI timeline: 1-2 weeks

Conclusion

Implementing TOON for OpenAI and Claude APIs:

Benefits:

  • ✅ 30-60% cost reduction
  • ✅ Minimal code changes
  • ✅ Better context window usage
  • ✅ Faster processing
  • ✅ Easy to measure ROI

Implementation:

  1. Add TOON library to your project
  2. Convert at LLM boundary
  3. Update system prompts
  4. Track and measure savings
  5. Scale gradually

ROI: Most applications see positive ROI within 2-4 weeks.

Start small, measure everything, and scale as you see savings. Your AI budget will thank you.


Ready to start saving? Download our implementation starter kit with code samples and best practices.