Understanding TOON Specification and Syntax

A deep dive into TOON's formal specification, syntax rules, and technical details. This guide serves as the authoritative reference for developers implementing TOON parsers, converters, and applications.

TOON Specification Overview

Version

This document describes the TOON Format Specification v1.0.

Design Goals

TOON was designed with these core principles:

Token Efficiency: Minimize token count for LLM consumption
Human Readability: Maintain clear, scannable structure
Machine Parseable: Enable simple, efficient parsing
Type Safety: Support all common data types
Simplicity: Keep syntax minimal and intuitive

File Extension

Standard extension: .toon

Character Encoding

UTF-8 encoding is required for all TOON files.

Basic Syntax Rules

1. Document Structure

A TOON document consists of:

Key-value pairs
Nested objects
Arrays
Comments
Whitespace

Root Level:

key1: value1
key2: value2

2. Key-Value Pairs

Syntax:

key: value

Rules:

Keys must start at the beginning of the line (or at the appropriate indentation level)
Keys cannot contain colons unless the key is quoted (see Special Characters in Keys)
Whitespace after the colon is optional but recommended
Keys are case-sensitive

Valid examples:

name: Alice
user_name: Bob
userName: Charlie
user-name: David

Discouraged (valid, but no space after the colon):

name:Alice

Invalid examples:

name :Alice # Space before colon not allowed
: Alice     # Key missing
name        # Colon missing
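The key-value rules above can be checked mechanically. The following is a minimal sketch, assuming comments have already been stripped and the key is unquoted (quoted keys are covered under Special Characters in Keys); the function name `split_key_value` is hypothetical.

```python
def split_key_value(line: str) -> tuple[str, str]:
    """Split an unquoted 'key: value' line, enforcing the key-value rules.

    Hypothetical helper: assumes comments were already stripped and the key is unquoted.
    """
    body = line.lstrip(" ")  # keys begin at the current indentation level
    key, sep, value = body.partition(":")
    if not sep:
        raise SyntaxError(f"colon missing: {line!r}")
    if not key:
        raise SyntaxError(f"key missing: {line!r}")
    if key != key.rstrip():
        raise SyntaxError(f"space before colon not allowed: {line!r}")
    # Whitespace after the colon is optional, so strip it from the value
    return key, value.strip()


print(split_key_value("user-name: David"))  # ('user-name', 'David')
print(split_key_value("name:Alice"))        # ('name', 'Alice'), valid but discouraged
```

Because `partition` splits at the first colon, a value may still contain colons; whether such values must be quoted is decided by the string rules, not by this helper.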

3. Indentation

Rules:

Use spaces only (no tabs)
Each indentation level is exactly 2 spaces
Inconsistent indentation is a syntax error

Valid:

parent:
  child1: value1
  child2:
    grandchild: value2

Invalid:

parent:
 child1: value1      # Only 1 space
   child2: value2    # 3 spaces (inconsistent)
    child3: value3   # 4 spaces (wrong level)
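A sketch of how a parser might enforce these indentation rules, assuming 2-space levels. The helper names `indent_depth` and `check_levels` are hypothetical; the level check only verifies that a line is at most one level deeper than the previous non-blank line, which catches the "wrong level" case above but does not by itself check that the deeper line follows a key without an inline value.

```python
def indent_depth(line: str, indent_size: int = 2) -> int:
    """Return a line's nesting depth, rejecting tabs and partial indentation."""
    stripped = line.lstrip(" ")
    if stripped.startswith("\t"):
        raise SyntaxError(f"tab used for indentation: {line!r}")
    spaces = len(line) - len(stripped)
    if spaces % indent_size:
        raise SyntaxError(f"{spaces} spaces is not a multiple of {indent_size}: {line!r}")
    return spaces // indent_size


def check_levels(lines: list[str]) -> None:
    """Raise if a line is more than one level deeper than the line before it."""
    previous = -1  # so the first non-blank line must be at depth 0
    for number, line in enumerate(lines, 1):
        if not line.strip():
            continue  # blank lines carry no structure
        depth = indent_depth(line)
        if depth > previous + 1:
            raise SyntaxError(f"line {number}: jumps from depth {previous} to {depth}")
        previous = depth
```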

4. Whitespace

Significant Whitespace:

Indentation spaces
Newlines separating key-value pairs

Ignored Whitespace:

Trailing spaces at end of line
Blank lines (treated as formatting)
Multiple spaces between key and value

Data Types

String Values

Unquoted Strings:

name: Alice
status: active

Rules for unquoted strings:

No leading/trailing whitespace
Cannot start with special characters: [, {, #, |, >
Cannot be keywords: true, false, null
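These rules, together with the requirement below that values looking like other types be quoted, can be combined into a single predicate. A minimal sketch; `is_bare_string` is a hypothetical name, and "looks like a number" is approximated with a regular expression matching the numeric forms described under Numeric Values.

```python
import re

SPECIAL_START = ("[", "{", "#", "|", ">")
KEYWORDS = {"true", "false", "null"}
NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def is_bare_string(text: str) -> bool:
    """True if `text` can be written as an unquoted string value."""
    return (
        text != ""                             # the empty string needs ""
        and text == text.strip()               # no leading/trailing whitespace
        and not text.startswith(SPECIAL_START)
        and text not in KEYWORDS               # would be read as boolean/null
        and NUMBER.fullmatch(text) is None     # would be read as a number
    )
```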

Quoted Strings:

title: "Hello World"
message: "Value with: special characters"
path: "C:\\Users\\Alice"

Rules for quoted strings:

Use double quotes only
Escape sequences supported: \", \\, \n, \t, \r
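Required for: values with leading or trailing whitespace, values containing special characters (such as : or #), empty strings, and values that would otherwise be read as another type (e.g. 42, true, null)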
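Decoding a quoted string means walking its body and translating the five supported escape sequences; any other backslash sequence is an error. A minimal sketch with the hypothetical name `parse_quoted`:

```python
ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t", "r": "\r"}


def parse_quoted(token: str) -> str:
    """Decode a double-quoted TOON string, rejecting unknown escape sequences."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise SyntaxError(f"unclosed quote: {token!r}")
    out, i, body = [], 0, token[1:-1]
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must be followed by one of the supported escape letters
            if i + 1 == len(body) or body[i + 1] not in ESCAPES:
                raise SyntaxError(f"invalid escape sequence in {token!r}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        elif ch == '"':
            raise SyntaxError(f"unescaped quote inside {token!r}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these rules a literal backslash must itself be escaped as `\\`.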

Multi-line Strings:

Literal block (preserves newlines):

description: |
  First line
  Second line
  Third line

Folded block (collapses to single line):

description: >
  This will be
  folded into
  a single line
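Once a parser has collected the indented lines belonging to a `|` or `>` block, the two styles differ only in how those lines are joined. This is a sketch under stated assumptions not fixed by the specification: the block's indentation is removed, the result carries no trailing newline, and folding joins lines with single spaces. `join_block` is a hypothetical name.

```python
def join_block(style: str, lines: list[str], indent: int) -> str:
    """Combine the indented content lines of a '|' (literal) or '>' (folded) block.

    Assumptions: `indent` is the block's indentation in spaces, the result has
    no trailing newline, and folding joins lines with single spaces.
    """
    content = [line[indent:] for line in lines]
    if style == "|":
        return "\n".join(content)  # literal: keep line breaks
    if style == ">":
        return " ".join(part.strip() for part in content)  # folded: one line
    raise ValueError(f"unknown block style: {style!r}")
```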

Numeric Values

Integer:

age: 30
count: 1000
negative: -42

Float:

price: 29.99
temperature: -3.14
scientific: 1.5e10

Rules:

No quotes around numbers
Leading zeros allowed: 007
Exponential notation supported: 1e5, 2.5E-3
Infinity and NaN not supported (use strings instead)
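The numeric rules translate into two regular expressions: one for integers and one for floats with a fraction, an exponent, or both. A minimal sketch; `parse_number` is a hypothetical name, and forms the rules do not mention (such as `.5` or `1.`) are rejected here as an assumption.

```python
import re

INTEGER = re.compile(r"-?\d+")
FLOAT = re.compile(r"-?\d+(\.\d+([eE][+-]?\d+)?|[eE][+-]?\d+)")


def parse_number(text: str):
    """Parse a TOON integer or float; leading zeros are allowed, Infinity/NaN are not."""
    if INTEGER.fullmatch(text):
        return int(text)    # int("007") == 7
    if FLOAT.fullmatch(text):
        return float(text)  # 29.99, -3.14, 1.5e10, 2.5E-3, 1e5
    raise ValueError(f"not a TOON number: {text!r}")
```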

Boolean Values

Syntax:

isActive: true
isDeleted: false

Rules:

Lowercase only: true, false
No quotes
Case-sensitive: True, TRUE, False, and FALSE are not recognized as booleans

Null Values

Syntax:

middleName: null

Rules:

Lowercase only: null
Represents absence of value
Different from empty string

Complex Structures

Nested Objects

Syntax:

parent:
  child1: value1
  child2: value2
  nested:
    deepChild: value3

Rules:

Parent key ends with colon
Child keys indented 2 spaces
Arbitrary nesting depth allowed
Each level must be consistently indented

Example with multiple levels:

level1:
  level2:
    level3:
      level4:
        deepValue: found

Arrays

Simple Arrays (Inline):

numbers: [1, 2, 3, 4, 5]
colors: [red, green, blue]
mixed: [1, "text", true, null]

Rules for inline arrays:

Use square brackets: []
Items separated by commas
Whitespace after commas optional
Can contain any data type
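Because items may be quoted strings or nested arrays, an inline array cannot be split on every comma. The sketch below splits only on top-level commas outside quotes and returns the raw item strings for later type detection; `split_inline_array` is a hypothetical name.

```python
def split_inline_array(text: str) -> list[str]:
    """Split an inline array into raw item strings, respecting quotes and nesting."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise SyntaxError(f"not an inline array: {text!r}")
    items, current = [], []
    depth, in_quotes, escaped = 0, False, False
    for ch in text[1:-1]:
        if in_quotes:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch in "[{":
            depth += 1  # entering a nested array or object
        elif ch in "]}":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append("".join(current).strip())  # top-level separator
            current = []
            continue
        current.append(ch)
    tail = "".join(current).strip()
    if tail or items:
        items.append(tail)
    return items
```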

Arrays of Objects (Block):

users:
  - name: Alice
    age: 30
  - name: Bob
    age: 25

Rules for block arrays:

Each item starts with dash (-)
Dash at the indentation level of the array content
Properties of each object indented 2 spaces from dash
Dash and first property can be on same line

Alternative format:

users:
  - name: Alice
    age: 30
  
  - name: Bob
    age: 25

Blank lines between items allowed for readability

Nested Arrays:

matrix:
  - [1, 2, 3]
  - [4, 5, 6]
  - [7, 8, 9]

Mixed content arrays:

items:
  - simple value
  - name: Complex Object
    value: 42
  - [nested, array]

Empty Values

Empty Object:

emptyObject: {}

Empty Array:

emptyArray: []

Empty String:

emptyString: ""

Comments

Syntax:

# This is a comment
key: value  # Inline comment

Rules:

Comments start with #
Everything from # to end of line is ignored, except when # appears inside a quoted string
Can appear at start of line or after a value
Multiple consecutive comment lines allowed

Multi-line comments:

# This is a multi-line comment
# that spans several lines
# to document something complex
key: value
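Stripping comments therefore requires tracking whether the scanner is inside a quoted string, so that keys such as "key#with#hashes" survive. A minimal sketch with the hypothetical name `strip_comment`:

```python
def strip_comment(line: str) -> str:
    """Remove a '#' comment, ignoring '#' characters inside double-quoted strings."""
    in_quotes = escaped = False
    for i, ch in enumerate(line):
        if escaped:
            escaped = False  # this character was escaped, e.g. \" inside quotes
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return line[:i].rstrip()
    return line.rstrip()
```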

Advanced Syntax Features

Anchors and Aliases

Anchors (References):

defaults: &defaults
  timeout: 30
  retries: 3

service1:
  <<: *defaults
  name: API

service2:
  <<: *defaults
  name: Worker

Rules:

Anchor defined with &anchorName
Reference with *anchorName
Merge with <<: operator
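After parsing, a merge key can be resolved by copying the anchored mapping and overlaying the keys written alongside `<<`. This sketch assumes YAML's merge-key semantics (explicit keys override merged ones), which the rules above do not state, and it represents an unresolved alias as the raw string `"*name"`; `apply_merge` and that representation are hypothetical.

```python
import copy


def apply_merge(mapping: dict, anchors: dict) -> dict:
    """Resolve a '<<: *name' merge key against previously defined anchors.

    Assumption (YAML merge-key semantics): keys written explicitly in
    `mapping` override keys pulled in from the anchor.
    """
    if "<<" not in mapping:
        return mapping
    ref = mapping["<<"]
    if not (isinstance(ref, str) and ref.startswith("*")):
        raise SyntaxError(f"merge value must be an alias, got {ref!r}")
    name = ref[1:]
    if name not in anchors:
        raise KeyError(f"undefined anchor: {name}")
    merged = copy.deepcopy(anchors[name])  # never mutate the shared anchor
    merged.update({k: v for k, v in mapping.items() if k != "<<"})
    return merged
```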

Type Hints

Explicit type specification:

# String that looks like number
id: !!str 12345

# Force integer parsing
value: !!int "42"

# Timestamp
created: !!timestamp 2025-11-10T10:30:00Z

Supported type hints:

!!str - String
!!int - Integer
!!float - Float
!!bool - Boolean
!!null - Null
!!timestamp - ISO 8601 timestamp
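A type hint bypasses normal type detection and converts the raw text directly. The sketch below is one possible mapping of the listed hints onto Python types; `apply_type_hint` is a hypothetical name, and quoted input has its quotes removed without decoding escapes.

```python
from datetime import datetime


def apply_type_hint(tag: str, raw: str):
    """Convert a raw scalar according to an explicit '!!' type hint."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        raw = raw[1:-1]  # the hint wins over the quotes (escapes not decoded here)
    if tag == "!!str":
        return raw
    if tag == "!!int":
        return int(raw)
    if tag == "!!float":
        return float(raw)
    if tag == "!!bool" and raw in ("true", "false"):
        return raw == "true"
    if tag == "!!null" and raw in ("", "null"):
        return None
    if tag == "!!timestamp":
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
        return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    raise ValueError(f"cannot apply {tag} to {raw!r}")
```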

Special Characters in Keys

Keys with special characters:

"key with spaces": value
"key:with:colons": value
"key#with#hashes": value

Rules:

Quote keys containing: spaces, colons, hashes, brackets
Use double quotes only
Escape internal quotes: "key with \"quotes\""
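A serializer can apply these rules by quoting only keys that need it. This sketch uses `json.dumps` for the quoting, which produces the same `\"` and `\\` escapes described here for ordinary text; `format_key` is a hypothetical name, and braces are treated like brackets as an assumption.

```python
import json

NEEDS_QUOTES = set(' :#[]{}"')


def format_key(key: str) -> str:
    """Quote a key if it is empty or contains spaces, colons, hashes, brackets or quotes."""
    if key and not NEEDS_QUOTES.intersection(key):
        return key
    # json.dumps wraps the key in double quotes and escapes '"' and '\\'
    return json.dumps(key, ensure_ascii=False)
```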

Token Optimization Patterns

Pattern 1: Flat Structure

Optimized TOON:

userId: 12345
userName: Alice
userEmail: alice@example.com

Tokens: ~12

Nested alternative:

user:
  id: 12345
  name: Alice
  email: alice@example.com

Tokens: ~14

Recommendation: Use flat when keys are naturally prefixed.
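Converting a nested structure into the flat, prefixed form can be automated. A minimal sketch, assuming camelCase prefixing as in the example (`user.id` becomes `userId`); `flatten` is a hypothetical name, and a real converter would also need to detect key collisions.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into camelCase-prefixed keys (user.id -> userId)."""
    flat = {}
    for key, value in obj.items():
        name = prefix + key[:1].upper() + key[1:] if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name))  # recurse with the longer prefix
        else:
            flat[name] = value
    return flat
```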

Pattern 2: Short Keys

Optimized:

id: 12345
nm: Alice
em: alice@example.com

Readable:

userId: 12345
fullName: Alice
emailAddress: alice@example.com

Recommendation: Balance token savings with readability.

Pattern 3: Array Optimization

For simple values:

tags: [js, py, go]

More efficient than:

tags:
  - js
  - py
  - go

For complex objects:

users:
  - id: 1
    name: Alice
  - id: 2
    name: Bob

More efficient than inline:

users: [{id: 1, name: Alice}, {id: 2, name: Bob}]

Parsing Rules

Parsing Algorithm

Tokenization: Split into lines and tokens
Indentation Analysis: Determine structure depth
Type Detection: Identify value types
Structure Building: Create nested object representation

Type Detection Order

Parsers should check value types in this order:

null keyword
Boolean keywords (true, false)
Numeric format (int or float)
Quoted string
Unquoted string (default)

Error Handling

Required error detection:

Inconsistent indentation
Missing colons
Unclosed quotes
Invalid escape sequences
Duplicate keys at same level

Example errors:

# Error: Inconsistent indentation
parent:
  child1: value
   child2: value  # 3 spaces instead of 2

# Error: Duplicate keys
name: Alice
name: Bob  # Same key repeated

# Error: Invalid structure
key: value:
  nested: wrong  # Can't have both value and nested
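Duplicate keys can be detected without building the full structure by keeping one set of seen keys per open nesting level. A sketch under these assumptions: 2-space indentation, unquoted keys, naive comment stripping, and no multi-line string contents. Each block-array item (`- `) starts a fresh object whose properties sit one level deeper. `find_duplicate_keys` is a hypothetical name.

```python
def find_duplicate_keys(text: str) -> list[str]:
    """Report keys repeated under the same parent object."""
    errors = []
    seen = [set()]  # seen[d] holds the keys of the open object at depth d
    for number, line in enumerate(text.splitlines(), 1):
        content = line.split("#", 1)[0].rstrip()  # naive comment stripping
        stripped = content.strip()
        if not stripped:
            continue
        depth = (len(content) - len(stripped)) // 2
        if stripped.startswith("- "):
            del seen[depth + 1:]  # each block-array item starts a fresh object
            stripped, depth = stripped[2:], depth + 1
        del seen[depth + 1:]  # returning to a shallower level closes deeper objects
        while len(seen) <= depth:
            seen.append(set())  # entering a newly opened nested object
        if ":" not in stripped:
            continue  # plain array items have no key
        key = stripped.partition(":")[0].strip()
        if key in seen[depth]:
            errors.append(f"line {number}: duplicate key {key!r}")
        seen[depth].add(key)
    return errors
```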

Comparison with Other Formats

TOON vs JSON

Token Efficiency (approximate counts; exact numbers depend on the tokenizer):

JSON: {"key": "value"}  # 5 tokens
TOON: key: value        # 3 tokens

Nesting:

JSON: {"a": {"b": "c"}}  # 9 tokens
TOON: a:                 # 4 tokens
        b: c

TOON vs YAML

TOON is similar to YAML but with key differences:

Simpler:

Only basic anchors, aliases, and merge keys, used when needed
No multi-document support
No custom tags (only the built-in type hints)

More Consistent:

Always 2-space indentation
Limited syntax options
Fewer edge cases

Best Practices for Implementation

Parser Implementation

Recommended structure:

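class TOONParser:
    """Skeleton parser; each method maps to a step of the parsing algorithm."""

    def __init__(self, indent_size=2):
        self.indent_size = indent_size  # spaces per nesting level

    def parse(self, text):
        lines = self.tokenize(text)
        return self.build_structure(lines)

    def tokenize(self, text):
        # Split into lines, strip comments, record each line's indentation depth
        raise NotImplementedError

    def build_structure(self, lines):
        # Create the nested object structure from the tokenized lines
        raise NotImplementedError

    def detect_type(self, value):
        # Determine the value type: null, boolean, number, quoted or unquoted string
        raise NotImplementedError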

Serialization (Object to TOON)

Algorithm:

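import json
import re
NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")  # text that would read as a number

def format_value(value):
    # One-line rendering of scalars, empty objects and scalar lists; strings are quoted when
    # misreadable ('&*!' join the reserved first characters: anchors, aliases, type hints)
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, dict):  # only empty dicts are written inline
        return "{}"
    if isinstance(value, list):
        return "[" + ", ".join(format_value(x) for x in value) + "]"
    text = str(value)
    if (text == "" or text != text.strip() or text in ("true", "false", "null")
            or NUMBER.fullmatch(text) or text[0] in '[{#|>"&*!'
            or any(c in text for c in ':#,"[]{}\\\n\t\r')):
        return json.dumps(text, ensure_ascii=False)  # yields \" \\ \n \t \r escapes
    return text

def is_block(value):
    # Non-empty dicts, and lists holding dicts or lists, need an indented block
    if isinstance(value, dict):
        return bool(value)
    return isinstance(value, list) and any(isinstance(x, (dict, list)) for x in value)

def to_toon(obj, indent=0):
    indent_str = "  " * indent
    result = []
    if isinstance(obj, dict):
        for key, value in obj.items():  # keys are assumed not to need quoting
            if is_block(value):
                result += [f"{indent_str}{key}:", to_toon(value, indent + 1)]
            else:
                result.append(f"{indent_str}{key}: {format_value(value)}")
    elif isinstance(obj, list):
        for item in obj:
            if is_block(item):  # dash on its own line, item body 2 spaces deeper
                result += [f"{indent_str}-", to_toon(item, indent + 1)]
            else:
                result.append(f"{indent_str}- {format_value(item)}")
    return "\n".join(result)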

Validation

Required validations:

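def validate_toon(text):
    # Each check takes the raw TOON text and returns a list of error messages;
    # the four check_* functions must be defined separately.
    checks = [
        check_indentation,
        check_syntax,
        check_types,
        check_structure,
    ]
    
    errors = []
    for check in checks:
        errors.extend(check(text))
    
    return errors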
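`validate_toon` expects each check to take the raw text and return a list of error messages. As an illustration, here is a hypothetical check for unclosed quotes, one of the required error detections, which could form part of `check_syntax`; it assumes quoted strings never span lines and that `#` outside quotes starts a comment.

```python
def check_unclosed_quotes(text: str) -> list[str]:
    """Return an error message for every line that leaves a double quote open."""
    errors = []
    for number, line in enumerate(text.splitlines(), 1):
        in_quotes = escaped = False
        for ch in line:
            if escaped:
                escaped = False
            elif in_quotes and ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = not in_quotes
            elif ch == "#" and not in_quotes:
                break  # the rest of the line is a comment
        if in_quotes:
            errors.append(f"line {number}: unclosed quote")
    return errors
```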

Grammar Definition (EBNF)

document       ::= (comment | key_value | blank_line)*
key_value      ::= indent key ":" value
key            ::= unquoted_string | quoted_string
value          ::= simple_value | object | array
simple_value   ::= string | number | boolean | null
string         ::= unquoted_string | quoted_string | multiline_string
number         ::= integer | float
integer        ::= "-"? digit+
float          ::= "-"? digit+ ("." digit+ exponent? | exponent)
exponent       ::= ("e" | "E") ("+" | "-")? digit+
boolean        ::= "true" | "false"
null           ::= "null"
object         ::= newline (indent key_value)+
array          ::= inline_array | block_array
inline_array   ::= "[" (value ("," value)*)? "]"
block_array    ::= newline (indent "- " (value | object))+
comment        ::= "#" [^\n]*
indent         ::= "  "*
multiline_string ::= ("|" | ">") newline (indent string_line)+

Conclusion

TOON's specification prioritizes:

Simplicity: Easy to learn and implement
Efficiency: Minimal token usage
Clarity: Unambiguous parsing rules
Practicality: Covers common use cases

This specification provides enough detail for implementing compliant parsers while keeping the format accessible to developers.

For implementation examples and reference parsers, check our GitHub repository.