Understanding TOON Specification and Syntax

A comprehensive technical guide to TOON format specification, syntax rules, data types, and advanced features for developers.


A deep dive into TOON's formal specification, syntax rules, and technical details. This guide serves as the authoritative reference for developers implementing TOON parsers, converters, and applications.

TOON Specification Overview

Version

This document describes TOON Format Specification v1.0.

Design Goals

TOON was designed with these core principles:

  1. Token Efficiency: Minimize token count for LLM consumption
  2. Human Readability: Maintain clear, scannable structure
  3. Machine Parseable: Enable simple, efficient parsing
  4. Type Safety: Support all common data types
  5. Simplicity: Keep syntax minimal and intuitive

File Extension

Standard extension: .toon

Character Encoding

UTF-8 encoding is required for all TOON files.
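
The two file-level requirements above (the .toon extension and UTF-8 encoding) can be enforced when a document is loaded. The following is a minimal sketch: the function name read_toon_file is hypothetical, and treating a mismatched extension as an error rather than a warning is an assumption, not something this specification states.

```python
from pathlib import Path

def read_toon_file(path):
    """Return the text of a TOON file, rejecting content that is not UTF-8."""
    path = Path(path)
    if path.suffix != ".toon":
        # Assumption: a mismatched extension is an error, not a warning.
        raise ValueError(f"expected a .toon file, got {path.name!r}")
    # decode() raises UnicodeDecodeError on bytes that are not valid UTF-8.
    return path.read_bytes().decode("utf-8")
```

Reading bytes and decoding explicitly avoids depending on the platform's default encoding.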

Basic Syntax Rules

1. Document Structure

A TOON document consists of:

  • Key-value pairs
  • Nested objects
  • Arrays
  • Comments
  • Whitespace

Root Level:

key1: value1
key2: value2

2. Key-Value Pairs

Syntax:

key: value

Rules:

  • Keys must start at the beginning of the line (or at the appropriate indentation level)
  • Keys cannot contain colons unless escaped
  • Whitespace after the colon is optional but recommended
  • Keys are case-sensitive

Valid examples:

name: Alice
user_name: Bob
userName: Charlie
user-name: David

Discouraged (valid, but harder to read):

name:Alice  # No space after the colon

Invalid examples:

name :Alice # Space before colon not allowed
: Alice     # Key missing
name        # Colon missing
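
The key-value rules above can be checked with a few lines of code. This is a sketch only: split_pair is a hypothetical helper, and it deliberately ignores quoted keys and inline comments to stay short.

```python
def split_pair(line):
    """Split one 'key: value' line into (key, value), enforcing the basic rules.

    Simplification: quoted keys and inline comments are not handled here.
    """
    content = line.strip()
    key, colon, value = content.partition(":")
    if not colon:
        raise ValueError("colon missing")
    if not key:
        raise ValueError("key missing")
    if key != key.rstrip():
        raise ValueError("space before colon not allowed")
    # Whitespace after the colon is optional, so 'name:Alice' is accepted.
    return key, value.strip()
```

A line such as "parent:" yields an empty value, which a caller can treat as the start of a nested object.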

3. Indentation

Rules:

  • Use spaces only (no tabs)
  • Each indentation level is exactly 2 spaces
  • Inconsistent indentation is a syntax error

Valid:

parent:
  child1: value1
  child2:
    grandchild: value2

Invalid:

parent:
 child1: value1      # Only 1 space
   child2: value2    # 3 spaces (inconsistent)
    child3: value3   # 4 spaces (wrong level)
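
A per-line indentation check might look like the following sketch. The helper name indent_depth is an assumption. It catches tabs and odd space counts, but not a line that jumps more than one level deeper than its parent (the 4-space case above), because that requires knowledge of the preceding line.

```python
def indent_depth(line, indent_size=2):
    """Return the nesting depth of a line, or raise on invalid indentation."""
    stripped = line.lstrip(" ")
    if stripped.startswith("\t"):
        raise ValueError("tabs are not allowed for indentation")
    spaces = len(line) - len(stripped)
    if spaces % indent_size:
        raise ValueError(f"{spaces} spaces is not a multiple of {indent_size}")
    return spaces // indent_size
```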

4. Whitespace

Significant Whitespace:

  • Indentation spaces
  • Newlines separating key-value pairs

Ignored Whitespace:

  • Trailing spaces at end of line
  • Blank lines (treated as formatting)
  • Multiple spaces between key and value

Data Types

String Values

Unquoted Strings:

name: Alice
status: active

Rules for unquoted strings:

  • No leading/trailing whitespace
  • Cannot start with special characters: [, {, #, |, >
  • Cannot be keywords: true, false, null

Quoted Strings:

title: "Hello World"
message: "Value with: special characters"
path: "C:\\Users\\Alice"

Rules for quoted strings:

  • Use double quotes only
  • Escape sequences supported: \", \\, \n, \t, \r
  • Required for: values with spaces, special characters, or that look like other types
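
Decoding a quoted string means stripping the quotes and translating the five supported escape sequences. The sketch below assumes that any other backslash sequence is an error, in line with the "invalid escape sequences" error required later in this guide; the name unquote is hypothetical.

```python
ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t", "r": "\r"}

def unquote(token):
    """Decode a double-quoted TOON string into its text value."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a quoted string")
    out = []
    chars = iter(token[1:-1])
    for ch in chars:
        if ch == '"':
            raise ValueError("unescaped quote inside string")
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # character following the backslash, if any
        if nxt not in ESCAPES:
            raise ValueError(f"invalid escape sequence: \\{nxt}")
        out.append(ESCAPES[nxt])
    return "".join(out)
```

Under these rules a literal backslash must always be written as \\ inside quotes.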

Multi-line Strings:

Literal block (preserves newlines):

description: |
  First line
  Second line
  Third line

Folded block (collapses to single line):

description: >
  This will be
  folded into
  a single line
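
The difference between the two block styles comes down to how the lines are joined. The sketch below assumes the block's indentation has already been removed from each line and that no trailing newline is added; this guide does not specify either detail, and read_block is a hypothetical name.

```python
def read_block(style, lines):
    """Combine the lines of a '|' or '>' block into one string.

    Assumption: each line has already had the block indentation stripped,
    and no trailing newline is appended.
    """
    if style == "|":
        return "\n".join(lines)   # literal: keep line breaks
    if style == ">":
        return " ".join(lines)    # folded: line breaks become single spaces
    raise ValueError(f"unknown block style: {style!r}")
```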

Numeric Values

Integer:

age: 30
count: 1000
negative: -42

Float:

price: 29.99
temperature: -3.14
scientific: 1.5e10

Rules:

  • No quotes around numbers
  • Leading zeros allowed: 007
  • Exponential notation supported: 1e5, 2.5E-3
  • Infinity and NaN not supported (use strings instead)
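
The numeric rules can be expressed as two regular expressions. This sketch assumes that forms not shown above, such as a leading plus sign or a bare ".5", are not numbers; parse_number is a hypothetical helper.

```python
import re

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

def parse_number(token):
    """Return an int or float for a numeric token, or None if it is not one."""
    if INT_RE.fullmatch(token):
        return int(token)        # int() accepts leading zeros in a string
    if FLOAT_RE.fullmatch(token):
        return float(token)
    return None                  # for example "Infinity", "NaN", or "1.2.3"
```

Because Infinity and NaN fall through to None, a caller would treat them as ordinary strings, matching the rule above.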

Boolean Values

Syntax:

isActive: true
isDeleted: false

Rules:

  • Lowercase only: true, false
  • No quotes
  • Case-sensitive (True, TRUE, False, FALSE are invalid)

Null Values

Syntax:

middleName: null

Rules:

  • Lowercase only: null
  • Represents absence of value
  • Different from empty string

Complex Structures

Nested Objects

Syntax:

parent:
  child1: value1
  child2: value2
  nested:
    deepChild: value3

Rules:

  • Parent key ends with colon
  • Child keys indented 2 spaces
  • Arbitrary nesting depth allowed
  • Each level must be consistently indented

Example with multiple levels:

level1:
  level2:
    level3:
      level4:
        deepValue: found

Arrays

Simple Arrays (Inline):

numbers: [1, 2, 3, 4, 5]
colors: [red, green, blue]
mixed: [1, "text", true, null]

Rules for inline arrays:

  • Use square brackets: []
  • Items separated by commas
  • Whitespace after commas optional
  • Can contain any data type
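
Splitting an inline array on commas needs some care, since a comma may appear inside a quoted string or a nested array. The following sketch returns the raw text of each top-level item and leaves type conversion to a later step; split_inline_array is a hypothetical name.

```python
def split_inline_array(text):
    """Split '[a, "b, c", [d, e]]' into its top-level item strings."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("inline array must be wrapped in [ ]")
    items, current = [], []
    depth, in_quotes, escaped = 0, False, False
    for ch in text[1:-1]:
        if in_quotes:
            # Inside quotes, only an unescaped quote ends the string.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    last = "".join(current).strip()
    if last or items:
        items.append(last)
    return items
```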

Arrays of Objects (Block):

users:
  - name: Alice
    age: 30
  - name: Bob
    age: 25

Rules for block arrays:

  • Each item starts with dash (-)
  • Dash at the indentation level of the array content
  • Properties of each object indented 2 spaces from dash
  • Dash and first property can be on same line

Alternative format:

users:
  - name: Alice
    age: 30
  
  - name: Bob
    age: 25

Blank lines between items are allowed for readability.

Nested Arrays:

matrix:
  - [1, 2, 3]
  - [4, 5, 6]
  - [7, 8, 9]

Mixed content arrays:

items:
  - "simple value"
  - name: "Complex Object"
    value: 42
  - [nested, array]

Empty Values

Empty Object:

emptyObject: {}

Empty Array:

emptyArray: []

Empty String:

emptyString: ""

Comments

Syntax:

# This is a comment
key: value  # Inline comment

Rules:

  • Comments start with #
  • Everything from # to end of line is ignored
  • Can appear at start of line or after a value
  • Multiple consecutive comment lines allowed
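
Stripping comments has one subtlety: a # inside a quoted key or value is not a comment. The sketch below assumes that any # outside double quotes starts a comment; strip_comment is a hypothetical helper.

```python
def strip_comment(line):
    """Remove a trailing '# comment', ignoring '#' inside double quotes."""
    in_quotes = escaped = False
    for i, ch in enumerate(line):
        if in_quotes:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == "#":
            return line[:i].rstrip()
    return line.rstrip()
```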

Multi-line comments:

# This is a multi-line comment
# that spans several lines
# to document something complex
key: value

Advanced Syntax Features

Anchors and Aliases

Anchors (References):

defaults: &defaults
  timeout: 30
  retries: 3

service1:
  <<: *defaults
  name: API

service2:
  <<: *defaults
  name: Worker

Rules:

  • Anchor defined with &anchorName
  • Reference with *anchorName
  • Merge with <<: operator
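
In terms of the resulting data, a merge copies the anchored mapping and then applies the keys written next to it. The sketch below assumes that locally written keys override merged ones (the behavior of the equivalent YAML feature); this guide does not state the precedence, and merge_alias is a hypothetical helper.

```python
def merge_alias(anchors, alias, overrides):
    """Resolve '<<: *alias' by copying the anchored mapping, then applying local keys."""
    if alias not in anchors:
        raise KeyError(f"undefined anchor: {alias}")
    merged = dict(anchors[alias])   # shallow copy, so the anchor itself stays unchanged
    merged.update(overrides)        # assumption: local keys win over merged ones
    return merged

# The example document above, expressed as already-parsed data.
anchors = {"defaults": {"timeout": 30, "retries": 3}}
service1 = merge_alias(anchors, "defaults", {"name": "API"})
service2 = merge_alias(anchors, "defaults", {"name": "Worker"})
```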

Type Hints

Explicit type specification:

# String that looks like number
id: !!str 12345

# Force integer parsing
value: !!int "42"

# Timestamp
created: !!timestamp 2025-11-10T10:30:00Z

Supported type hints:

  • !!str - String
  • !!int - Integer
  • !!float - Float
  • !!bool - Boolean
  • !!null - Null
  • !!timestamp - ISO 8601 timestamp
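
One way to implement type hints is a table mapping each tag to a converter. This is a sketch under stated assumptions: the names CONVERTERS and apply_type_hint are hypothetical, surrounding quotes are removed without decoding escapes, and timestamps are parsed with Python's datetime.fromisoformat.

```python
from datetime import datetime

def parse_bool(text):
    if text not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return text == "true"

def parse_timestamp(text):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

CONVERTERS = {
    "!!str": str,
    "!!int": int,
    "!!float": float,
    "!!bool": parse_bool,
    "!!null": lambda text: None,
    "!!timestamp": parse_timestamp,
}

def apply_type_hint(value):
    """Convert '!!tag raw' text using the converter registered for the tag."""
    tag, _, raw = value.partition(" ")
    if tag not in CONVERTERS:
        raise ValueError(f"unsupported type hint: {tag}")
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        raw = raw[1:-1]          # simplification: escapes are not decoded here
    return CONVERTERS[tag](raw)
```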

Special Characters in Keys

Keys with special characters:

"key with spaces": value
"key:with:colons": value
"key#with#hashes": value

Rules:

  • Quote keys containing: spaces, colons, hashes, brackets
  • Use double quotes only
  • Escape internal quotes: "key with \"quotes\""
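
When writing TOON, a serializer has to decide which keys need quotes. The sketch below treats letters, digits, underscores, and hyphens as safe, a set inferred from the valid key examples earlier in this guide rather than stated by it, and quotes everything else. Escaping backslashes as well as quotes is also an assumption; format_key is a hypothetical name.

```python
import re

# Assumption: keys made only of these characters never need quoting.
PLAIN_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

def format_key(key):
    """Return the key as written in a TOON document, quoting it when required."""
    if PLAIN_KEY.fullmatch(key):
        return key
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```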

Token Optimization Patterns

Pattern 1: Flat Structure

Optimized TOON:

userId: 12345
userName: Alice
userEmail: alice@example.com

Tokens: ~12

Nested alternative:

user:
  id: 12345
  name: Alice
  email: alice@example.com

Tokens: ~14

Recommendation: Use flat when keys are naturally prefixed.
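
Flattening can be automated before serialization. The sketch below turns nested mappings into camelCase-prefixed keys, as in the example above. The helper name flatten is hypothetical, and the sketch does not guard against two different paths producing the same flattened key.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into one level of camelCase-prefixed keys."""
    flat = {}
    for key, value in obj.items():
        # Capitalize the first letter of the child key when joining it to a prefix.
        name = prefix + key[:1].upper() + key[1:] if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat
```

For example, flatten({"user": {"id": 12345}}) produces a single userId key.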

Pattern 2: Short Keys

Optimized:

id: 12345
nm: Alice
em: alice@example.com

Readable:

userId: 12345
fullName: Alice
emailAddress: alice@example.com

Recommendation: Balance token savings with readability.

Pattern 3: Array Optimization

For simple values:

tags: [js, py, go]

More efficient than:

tags:
  - js
  - py
  - go

For complex objects:

users:
  - id: 1
    name: Alice
  - id: 2
    name: Bob

More efficient than inline:

users: [{id: 1, name: Alice}, {id: 2, name: Bob}]

Parsing Rules

Parsing Algorithm

  1. Tokenization: Split into lines and tokens
  2. Indentation Analysis: Determine structure depth
  3. Type Detection: Identify value types
  4. Structure Building: Create nested object representation
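
The stages above can be sketched for the simplest case, nested key-value mappings. The function below is an illustration, not a reference parser: parse_toon is a hypothetical name, type detection is left out so every value comes back as a raw string, and arrays, quoted keys, inline comments, and multi-line strings are not handled.

```python
def parse_toon(text):
    """Parse nested 'key: value' mappings; values are returned as raw strings."""
    root = {}
    stack = [(-1, root)]               # (depth, mapping) pairs for the open parents
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue                   # skip blank lines and comment lines
        spaces = len(line) - len(line.lstrip(" "))
        if spaces % 2:
            raise ValueError(f"line {number}: indentation must be a multiple of 2")
        depth = spaces // 2
        key, colon, value = line.strip().partition(":")
        if not colon or not key:
            raise ValueError(f"line {number}: expected 'key: value'")
        while stack[-1][0] >= depth:   # close parents at the same or a deeper level
            stack.pop()
        if depth != stack[-1][0] + 1:
            raise ValueError(f"line {number}: unexpected indentation")
        parent = stack[-1][1]
        value = value.strip()
        if value:
            parent[key] = value
        else:
            parent[key] = {}           # 'key:' alone opens a nested object
            stack.append((depth, parent[key]))
    return root
```

The stack holds one entry per open parent, so a line's depth decides which mapping receives the key.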

Type Detection Order

A parser should check value types in this order:

  1. null keyword
  2. Boolean keywords (true, false)
  3. Numeric format (int or float)
  4. Quoted string
  5. Unquoted string (default)
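
The detection order translates directly into a chain of checks. This sketch assumes the value has already been separated from its key and stripped of comments; detect_type is a hypothetical function, and escape sequences in quoted strings are not decoded.

```python
import re

NUMBER_RE = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

def detect_type(token):
    """Convert a scalar token to a Python value, checking types in the listed order."""
    if token == "null":
        return None
    if token in ("true", "false"):
        return token == "true"
    if NUMBER_RE.fullmatch(token):
        is_float = any(c in token for c in ".eE")
        return float(token) if is_float else int(token)
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]     # simplification: escape sequences are not decoded
    return token               # an unquoted string is the default
```

Because keywords are matched exactly, True and "null" (with quotes) both end up as strings.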

Error Handling

Required error detection:

  • Inconsistent indentation
  • Missing colons
  • Unclosed quotes
  • Invalid escape sequences
  • Duplicate keys at same level

Example errors:

# Error: Inconsistent indentation
parent:
  child1: value
   child2: value  # 3 spaces instead of 2

# Error: Duplicate keys
name: Alice
name: Bob  # Same key repeated

# Error: Invalid structure
key: value:
  nested: wrong  # Can't have both value and nested
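
Some of these errors can be detected in a single pass over the lines. The sketch below reports inconsistent indentation, missing colons, and duplicate keys for documents made of nested key-value pairs. It is an illustration under assumptions: find_errors is a hypothetical name, block arrays are not supported, and a # inside a quoted string would be mistaken for a comment.

```python
def find_errors(text):
    """Return (line_number, message) pairs for a mapping-only TOON document."""
    errors = []
    seen = [set()]        # one set of keys per currently open nesting level
    can_nest = False      # True when the previous line was 'key:' with no value
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()   # simplification: '#' in quotes not handled
        if not line.strip():
            continue
        spaces = len(line) - len(line.lstrip(" "))
        depth = spaces // 2
        max_depth = len(seen) if can_nest else len(seen) - 1
        if spaces % 2 or depth > max_depth:
            errors.append((number, "inconsistent indentation"))
            continue
        del seen[depth + 1:]           # moving back out closes the deeper levels
        if depth == len(seen):
            seen.append(set())         # entering the block opened by the previous line
        key, colon, value = line.strip().partition(":")
        if not colon:
            errors.append((number, "missing colon"))
            continue
        if key in seen[depth]:
            errors.append((number, f"duplicate key: {key}"))
        seen[depth].add(key)
        can_nest = not value.strip()
    return errors
```

Collecting errors instead of raising on the first one lets a tool report every problem in a single run.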

Comparison with Other Formats

TOON vs JSON

Token Efficiency:

JSON: {"key": "value"}  # 5 tokens
TOON: key: value        # 3 tokens

Nesting:

JSON: {"a": {"b": "c"}}  # 9 tokens
TOON: a:                 # 4 tokens
        b: c

TOON vs YAML

TOON is similar to YAML but with key differences:

Simpler:

  • No complex anchors/merges (unless needed)
  • No multi-document support
  • No custom tags

More Consistent:

  • Always 2-space indentation
  • Limited syntax options
  • Fewer edge cases

Best Practices for Implementation

Parser Implementation

Recommended structure:

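class TOONParser:
    """Skeleton of a TOON parser; the three helper methods are left to implement."""

    def __init__(self):
        self.indent_level = 0  # current nesting depth while parsing
        self.indent_size = 2   # spaces per nesting level

    def parse(self, text):
        """Parse TOON text and return the resulting nested structure."""
        lines = self.tokenize(text)
        return self.build_structure(lines)

    def tokenize(self, text):
        """Split text into lines and handle comments."""
        raise NotImplementedError

    def build_structure(self, lines):
        """Create the nested object structure from the tokenized lines."""
        raise NotImplementedError

    def detect_type(self, value):
        """Determine the type of a raw value."""
        raise NotImplementedError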

Serialization (Object to TOON)

Algorithm:

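import re

SCALARS = (str, int, float, bool, type(None))

def format_value(value):
    """Render a scalar as a TOON literal."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if re.fullmatch(r"[A-Za-z_][\w.@/-]*", value) and value not in ("true", "false", "null"):
        return value  # safe to leave unquoted
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\t", "\\t"), ("\r", "\\r")):
        value = value.replace(raw, escaped)
    return f'"{value}"'

def format_inline(value):
    """Return the one-line form of a value, or None if it needs a block."""
    if isinstance(value, SCALARS):
        return format_value(value)
    if isinstance(value, list) and all(isinstance(x, SCALARS) for x in value):
        return "[" + ", ".join(format_value(x) for x in value) + "]"
    return "{}" if value == {} else None

def to_toon(obj, indent=0):
    """Serialize dicts, lists, and scalars to TOON text."""
    pad = "  " * indent
    inline = format_inline(obj)
    if inline is not None:
        return pad + inline
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            inline = format_inline(value)
            if inline is None:  # nested block goes on the following lines
                lines += [f"{pad}{key}:", to_toon(value, indent + 1)]
            else:
                lines.append(f"{pad}{key}: {inline}")
    else:  # list holding at least one dict or list: block array
        for item in obj:
            body = to_toon(item, indent + 1)
            lines.append(f"{pad}- {body[len(pad) + 2:]}")  # dash replaces the item's first indent
    return "\n".join(lines)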

Validation

Required validations:

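def validate_toon(text):
    """Run every check and return the combined list of errors."""
    # Each check takes the document text and returns a list of errors.
    # The four names below are placeholders that an implementation must define.
    checks = [
        check_indentation,
        check_syntax,
        check_types,
        check_structure,
    ]

    errors = []
    for check in checks:
        errors.extend(check(text))

    return errors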

Grammar Definition (EBNF)

document       ::= (comment | key_value | blank_line)*
key_value      ::= indent key ":" value
key            ::= unquoted_string | quoted_string
value          ::= simple_value | object | array
simple_value   ::= string | number | boolean | null
string         ::= unquoted_string | quoted_string | multiline_string
number         ::= integer | float
boolean        ::= "true" | "false"
null           ::= "null"
object         ::= newline (indent key_value)+
array          ::= inline_array | block_array
inline_array   ::= "[" (value ("," value)*)? "]"
block_array    ::= newline (indent "- " (value | object))+
comment        ::= "#" [^\n]*
indent         ::= "  "*
multiline_string ::= ("|" | ">") newline (indent string_line)+

Conclusion

TOON's specification prioritizes:

  • Simplicity: Easy to learn and implement
  • Efficiency: Minimal token usage
  • Clarity: Unambiguous parsing rules
  • Practicality: Covers common use cases

This specification provides enough detail for implementing compliant parsers while keeping the format accessible to developers.


For implementation examples and reference parsers, check our GitHub repository.