A deep dive into TOON's formal specification, syntax rules, and technical details. This guide serves as the authoritative reference for developers implementing TOON parsers, converters, and applications.
TOON Specification Overview
Version
This document describes TOON Format Specification v1.0.
Design Goals
TOON was designed with these core principles:
- Token Efficiency: Minimize token count for LLM consumption
- Human Readability: Maintain clear, scannable structure
- Machine Parseable: Enable simple, efficient parsing
- Type Safety: Support all common data types
- Simplicity: Keep syntax minimal and intuitive
File Extension
Standard extension: .toon
Character Encoding
UTF-8 encoding is required for all TOON files.
Basic Syntax Rules
1. Document Structure
A TOON document consists of:
- Key-value pairs
- Nested objects
- Arrays
- Comments
- Whitespace
Root Level:
key1: value1
key2: value2
2. Key-Value Pairs
Syntax:
key: value
Rules:
- Keys must start at the beginning of the line (or at the appropriate indentation level)
- Keys cannot contain colons unless escaped
- Whitespace after the colon is optional but recommended
- Keys are case-sensitive
Valid examples:
name: Alice
user_name: Bob
userName: Charlie
user-name: David
Invalid or discouraged examples:
name:Alice # Valid, but omitting the space is not recommended
name :Alice # Space before colon not allowed
: Alice # Key missing
name # Colon missing
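The rules above can be sketched as a small line splitter. This is a minimal illustration, not a reference implementation: the name `split_key_value` is hypothetical, and it assumes an unindented line with an unquoted key and any comment already removed.

```python
def split_key_value(line: str) -> tuple[str, str]:
    """Split one unindented `key: value` line into (key, raw_value).

    Simplification: unquoted keys only, so the first colon ends the key.
    """
    key, colon, value = line.partition(":")
    if not colon:
        raise ValueError("missing colon")
    if not key:
        raise ValueError("missing key")
    if key != key.rstrip():
        raise ValueError("space before colon")
    # Whitespace after the colon is optional, so strip it from the value.
    return key, value.strip()
```

With this sketch, both `name: Alice` and `name:Alice` yield `("name", "Alice")`, while the other three examples above raise `ValueError`.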
3. Indentation
Rules:
- Use spaces only (no tabs)
- Each indentation level is exactly 2 spaces
- Inconsistent indentation is a syntax error
Valid:
parent:
  child1: value1
  child2:
    grandchild: value2
Invalid:
parent:
 child1: value1 # Only 1 space
   child2: value2 # 3 spaces (inconsistent)
    child3: value3 # 4 spaces (wrong level)
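One way to turn the indentation rules into code is a helper that maps a line to its nesting depth. The function name `indent_depth` and its `indent_size` parameter are assumptions made for this sketch. It checks a single line only, so detecting a jump of more than one level (the 4-space case above) would additionally require comparing against the previous line's depth.

```python
def indent_depth(line: str, indent_size: int = 2) -> int:
    """Return the nesting depth of a line, or raise on invalid indentation."""
    stripped = line.lstrip(" ")
    if stripped.startswith("\t"):
        raise ValueError("tabs are not allowed for indentation")
    spaces = len(line) - len(stripped)
    if spaces % indent_size:
        raise ValueError(f"{spaces} spaces is not a multiple of {indent_size}")
    return spaces // indent_size
```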
4. Whitespace
Significant Whitespace:
- Indentation spaces
- Newlines separating key-value pairs
Ignored Whitespace:
- Trailing spaces at end of line
- Blank lines (treated as formatting)
- Multiple spaces between key and value
Data Types
String Values
Unquoted Strings:
name: Alice
status: active
Rules for unquoted strings:
- No leading/trailing whitespace
- Cannot start with special characters: [, {, #, |, >
- Cannot be keywords: true, false, null
Quoted Strings:
title: "Hello World"
message: "Value with: special characters"
path: "C:\\Users\\Alice"
Rules for quoted strings:
- Use double quotes only
- Escape sequences supported: \", \\, \n, \t, \r
- Required for: values with spaces, special characters, or that look like other types
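As an illustration of the escape rules, the following sketch decodes a double-quoted token and rejects any escape outside the five listed above. The names `ESCAPES` and `unquote` are hypothetical, and the choice to raise `ValueError` is an assumption; the specification only says invalid escapes must be detected.

```python
# The five escape sequences listed in the rules above.
ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t", "r": "\r"}

def unquote(token: str) -> str:
    """Decode a double-quoted TOON string token into its value."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a double-quoted string")
    out = []
    body = iter(token[1:-1])
    for ch in body:
        if ch == '"':
            raise ValueError("unescaped quote inside string")
        if ch != "\\":
            out.append(ch)
            continue
        code = next(body, None)  # the character following the backslash
        if code not in ESCAPES:
            raise ValueError(f"invalid escape sequence: \\{code}")
        out.append(ESCAPES[code])
    return "".join(out)
```

Under these rules a Windows path has to double its backslashes, because a lone `\U` or `\A` is not a supported escape.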
Multi-line Strings:
Literal block (preserves newlines):
description: |
  First line
  Second line
  Third line
Folded block (collapses to single line):
description: >
  This will be
  folded into
  a single line
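The difference between the two block styles can be sketched in a few lines. The helper `read_block` is hypothetical and receives the block's lines with their indentation already removed. Whether a trailing newline is kept is not stated in this guide, so the sketch assumes none is added.

```python
def read_block(style: str, lines: list[str]) -> str:
    """Join the de-indented lines of a multi-line string block."""
    if style == "|":
        # Literal block: newlines between lines are preserved.
        return "\n".join(lines)
    if style == ">":
        # Folded block: lines are joined with single spaces.
        return " ".join(line.strip() for line in lines)
    raise ValueError(f"unknown block style: {style!r}")
```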
Numeric Values
Integer:
age: 30
count: 1000
negative: -42
Float:
price: 29.99
temperature: -3.14
scientific: 1.5e10
Rules:
- No quotes around numbers
- Leading zeros allowed: 007
- Exponential notation supported: 1e5, 2.5E-3
- Infinity and NaN not supported (use strings instead)
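A possible way to recognize numbers under these rules is a pair of regular expressions. The patterns and the name `parse_number` are assumptions of this sketch. It also assumes that values written with an exponent become floats and that forms such as `.5` or `1.` are not numbers, since the guide does not cover either case.

```python
import re

INT_RE = re.compile(r"-?\d+")
FLOAT_RE = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

def parse_number(text: str):
    """Return an int or float for a numeric token, or None if it is not one."""
    if INT_RE.fullmatch(text):
        return int(text)  # int() accepts leading zeros such as "007"
    if FLOAT_RE.fullmatch(text):
        return float(text)
    return None  # includes NaN and Infinity, which must be written as strings
```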
Boolean Values
Syntax:
isActive: true
isDeleted: false
Rules:
- Lowercase only: true, false
- No quotes
- Case-sensitive (True, TRUE, False, FALSE are invalid)
Null Values
Syntax:
middleName: null
Rules:
- Lowercase only: null
- Represents absence of value
- Different from empty string
Complex Structures
Nested Objects
Syntax:
parent:
  child1: value1
  child2: value2
  nested:
    deepChild: value3
Rules:
- Parent key ends with colon
- Child keys indented 2 spaces
- Arbitrary nesting depth allowed
- Each level must be consistently indented
Example with multiple levels:
level1:
  level2:
    level3:
      level4:
        deepValue: found
Arrays
Simple Arrays (Inline):
numbers: [1, 2, 3, 4, 5]
colors: [red, green, blue]
mixed: [1, "text", true, null]
Rules for inline arrays:
- Use square brackets: []
- Items separated by commas
- Whitespace after commas optional
- Can contain any data type
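To make the inline-array rules concrete, here is a sketch that splits a flat inline array into its raw item tokens. The function `split_inline_array` is hypothetical. It assumes commas inside double quotes do not separate items, and it does not handle nested inline arrays or validate empty items.

```python
def split_inline_array(text: str) -> list[str]:
    """Split a flat inline array such as `[1, "a, b", true]` into raw items."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("inline array must be wrapped in [ ]")
    items, current = [], []
    in_quotes = escaped = False
    for ch in text[1:-1]:
        if escaped:
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # A top-level comma closes the current item.
            items.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    if in_quotes:
        raise ValueError("unclosed quote")
    last = "".join(current).strip()
    if last or items:
        items.append(last)
    return items
```

Each returned token would still need type detection to become a number, boolean, null, or string.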
Arrays of Objects (Block):
users:
  - name: Alice
    age: 30
  - name: Bob
    age: 25
Rules for block arrays:
- Each item starts with a dash (-)
- Dash at the indentation level of the array content
- Properties of each object indented 2 spaces from dash
- Dash and first property can be on same line
Alternative format:
users:
  - name: Alice
    age: 30

  - name: Bob
    age: 25
Blank lines between items are allowed for readability.
Nested Arrays:
matrix:
  - [1, 2, 3]
  - [4, 5, 6]
  - [7, 8, 9]
Mixed content arrays:
items:
  - simple value
  - name: Complex Object
    value: 42
  - [nested, array]
Empty Values
Empty Object:
emptyObject: {}
Empty Array:
emptyArray: []
Empty String:
emptyString: ""
Comments
Syntax:
# This is a comment
key: value # Inline comment
Rules:
- Comments start with #
- Everything from # to end of line is ignored
- Can appear at start of line or after a value
- Multiple consecutive comment lines allowed
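A comment stripper following these rules might look like the sketch below. The name `strip_comment` is hypothetical. It assumes that a `#` inside double quotes is data rather than a comment, which is consistent with quoted keys such as "key#with#hashes" but is not spelled out in the rules.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing `# comment`, ignoring any `#` inside double quotes."""
    in_quotes = escaped = False
    for index, ch in enumerate(line):
        if escaped:
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return line[:index].rstrip()
    return line.rstrip()
```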
Multi-line comments:
# This is a multi-line comment
# that spans several lines
# to document something complex
key: value
Advanced Syntax Features
Anchors and Aliases
Anchors (References):
defaults: &defaults
  timeout: 30
  retries: 3
service1:
  <<: *defaults
  name: API
service2:
  <<: *defaults
  name: Worker
Rules:
- Anchor defined with &anchorName
- Reference with *anchorName
- Merge with the <<: operator
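After parsing, a merge could be resolved roughly as follows. This is a sketch under stated assumptions: `resolve_merge` and the `anchors` lookup table are hypothetical, the `<<` entry is assumed to hold the alias name as a plain string, and keys defined locally are assumed to override merged ones (the convention YAML merge keys follow; this guide does not state the precedence).

```python
def resolve_merge(node: dict, anchors: dict) -> dict:
    """Expand a `<<` entry whose value names an anchor stored in `anchors`."""
    merged = {}
    if "<<" in node:
        merged.update(anchors[node["<<"]])  # start from the anchored defaults
    # Local keys are applied last, so they win over merged ones.
    merged.update({k: v for k, v in node.items() if k != "<<"})
    return merged

anchors = {"defaults": {"timeout": 30, "retries": 3}}
service1 = resolve_merge({"<<": "defaults", "name": "API"}, anchors)
```

Here `service1` ends up with `timeout`, `retries`, and `name`, matching the intent of the example above.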
Type Hints
Explicit type specification:
# String that looks like number
id: !!str 12345
# Force integer parsing
value: !!int "42"
# Timestamp
created: !!timestamp 2025-11-10T10:30:00Z
Supported type hints:
- !!str: String
- !!int: Integer
- !!float: Float
- !!bool: Boolean
- !!null: Null
- !!timestamp: ISO 8601 timestamp
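The type hints could be applied with a simple lookup table, as in this sketch. The names `CASTS` and `apply_type_hint` are hypothetical, and mapping `!!timestamp` to Python's `datetime` is an assumption. Surrounding double quotes are dropped without processing escape sequences.

```python
from datetime import datetime

def _to_bool(raw: str) -> bool:
    if raw not in ("true", "false"):
        raise ValueError(f"not a boolean: {raw!r}")
    return raw == "true"

def _to_timestamp(raw: str) -> datetime:
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))

CASTS = {
    "!!str": str,
    "!!int": int,
    "!!float": float,
    "!!bool": _to_bool,
    "!!null": lambda raw: None,
    "!!timestamp": _to_timestamp,
}

def apply_type_hint(value: str):
    """Convert a raw value such as `!!int "42"` using its type hint."""
    tag, _, raw = value.partition(" ")
    if tag not in CASTS:
        raise ValueError(f"unknown type hint: {tag!r}")
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        raw = raw[1:-1]  # drop surrounding quotes; escapes are not processed
    return CASTS[tag](raw)
```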
Special Characters in Keys
Keys with special characters:
"key with spaces": value
"key:with:colons": value
"key#with#hashes": value
Rules:
- Quote keys containing: spaces, colons, hashes, brackets
- Use double quotes only
- Escape internal quotes: "key with \"quotes\""
Token Optimization Patterns
Pattern 1: Flat Structure
Optimized TOON:
userId: 12345
userName: Alice
userEmail: alice@example.com
Tokens: ~12
Nested alternative:
user:
  id: 12345
  name: Alice
  email: alice@example.com
Tokens: ~14
Recommendation: Use flat when keys are naturally prefixed.
Pattern 2: Short Keys
Optimized:
id: 12345
nm: Alice
em: alice@example.com
Readable:
userId: 12345
fullName: Alice
emailAddress: alice@example.com
Recommendation: Balance token savings with readability.
Pattern 3: Array Optimization
For simple values:
tags: [js, py, go]
More efficient than:
tags:
  - js
  - py
  - go
For complex objects:
users:
  - id: 1
    name: Alice
  - id: 2
    name: Bob
More efficient than inline:
users: [{id: 1, name: Alice}, {id: 2, name: Bob}]
Parsing Rules
Parsing Algorithm
- Tokenization: Split into lines and tokens
- Indentation Analysis: Determine structure depth
- Type Detection: Identify value types
- Structure Building: Create nested object representation
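The four steps can be sketched for the simplest case, nested objects with scalar values. This is a deliberately reduced illustration rather than a compliant parser: `parse_nested` is a hypothetical name, values are kept as raw strings (type detection is left out), and arrays, quoted keys, inline comments, and multi-line strings are not handled.

```python
def parse_nested(text: str, indent_size: int = 2) -> dict:
    """Parse nested `key: value` lines into dicts; values stay raw strings."""
    root: dict = {}
    stack = [root]  # stack[d] is the object that receives keys at depth d
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and full-line comments carry no data
        # Indentation analysis: leading spaces determine the depth.
        spaces = len(raw) - len(raw.lstrip(" "))
        depth, remainder = divmod(spaces, indent_size)
        if raw[spaces] == "\t" or remainder or depth >= len(stack):
            raise ValueError(f"line {number}: inconsistent indentation")
        key, colon, value = line.partition(":")
        if not colon or not key:
            raise ValueError(f"line {number}: expected 'key: value'")
        del stack[depth + 1:]  # close objects deeper than this line
        value = value.strip()
        if value:
            stack[depth][key] = value  # type detection would be applied here
        else:
            # A key with no value opens a nested object one level deeper.
            child: dict = {}
            stack[depth][key] = child
            stack.append(child)
    return root
```

Because a scalar value never pushes a new object onto the stack, an indented line that follows `key: value` is rejected as inconsistent indentation.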
Type Detection Order
Parser should check in this order:
- null keyword
- Boolean keywords (true, false)
- Numeric format (int or float)
- Quoted string
- Unquoted string (default)
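The detection order can be expressed directly as a chain of checks. In this sketch the function name `detect_type`, the returned labels, and the numeric pattern are all assumptions. Capitalized variants such as `True` fall through to the unquoted-string branch here; a strict parser may prefer to reject them instead.

```python
import re

NUMBER_RE = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")

def detect_type(token: str) -> str:
    """Classify a raw value token, checking in the order listed above."""
    if token == "null":
        return "null"
    if token in ("true", "false"):
        return "boolean"
    if NUMBER_RE.fullmatch(token):
        return "number"
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return "quoted_string"
    return "string"  # default: unquoted string
```

Checking keywords and numbers first means that `"42"` in quotes stays a string while a bare `42` becomes a number.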
Error Handling
Required error detection:
- Inconsistent indentation
- Missing colons
- Unclosed quotes
- Invalid escape sequences
- Duplicate keys at same level
Example errors:
# Error: Inconsistent indentation
parent:
  child1: value
   child2: value # 3 spaces instead of 2
# Error: Duplicate keys
name: Alice
name: Bob # Same key repeated
# Error: Invalid structure
key: value:
  nested: wrong # Can't have both value and nested
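Duplicate-key detection, one of the required checks, could be sketched as below. The function `find_duplicate_keys` and its message format are hypothetical. The sketch tracks one set of keys per nesting depth, treats each array item as a fresh object, and skips quoted keys; it does not account for multi-line string content.

```python
def find_duplicate_keys(text: str, indent_size: int = 2) -> list[str]:
    """Report keys that are repeated within the same object."""
    errors = []
    seen = [set()]  # seen[d] holds keys already used by the open object at depth d
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent_size
        if line == "-" or line.startswith("- "):
            del seen[depth + 1:]  # each array item starts a fresh object
            line, depth = line[1:].strip(), depth + 1
        if ":" not in line or line.startswith(("[", '"')):
            continue  # scalars, inline arrays, and quoted keys are skipped
        key = line.split(":", 1)[0].strip()
        del seen[depth + 1:]  # forget keys of objects that have been closed
        while len(seen) <= depth:
            seen.append(set())
        if key in seen[depth]:
            errors.append(f"line {number}: duplicate key '{key}'")
        seen[depth].add(key)
    return errors
```

The same key may legitimately appear under different parents or in different array items, which is why the sets are discarded whenever the depth decreases.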
Comparison with Other Formats
TOON vs JSON
Token Efficiency:
JSON: {"key": "value"} # 5 tokens
TOON: key: value # 3 tokens
Nesting:
JSON: {"a": {"b": "c"}} # 9 tokens
TOON: a: # 4 tokens
        b: c
TOON vs YAML
TOON is similar to YAML but with key differences:
Simpler:
- No complex anchors/merges (unless needed)
- No multi-document support
- No custom tags
More Consistent:
- Always 2-space indentation
- Limited syntax options
- Fewer edge cases
Best Practices for Implementation
Parser Implementation
Recommended structure:
class TOONParser:
    """Skeleton only: the three helper methods are left unimplemented."""

    def __init__(self):
        self.indent_level = 0
        self.indent_size = 2

    def parse(self, text):
        lines = self.tokenize(text)
        return self.build_structure(lines)

    def tokenize(self, text):
        # Split into lines, handle comments
        raise NotImplementedError

    def build_structure(self, lines):
        # Create nested object structure
        raise NotImplementedError

    def detect_type(self, value):
        # Determine value type
        raise NotImplementedError
Serialization (Object to TOON)
Algorithm:
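import json

def format_value(value):
    # Scalars and the empty object. Strings are always quoted: valid, though
    # not token-optimal (json.dumps may emit escapes beyond the five listed).
    if value is None:
        return "null"
    if isinstance(value, bool):  # before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        return "{}"
    return json.dumps(value, ensure_ascii=False)

def is_block(value):
    # Non-empty dicts, and lists that hold a dict or list, need block form.
    if isinstance(value, list):
        return any(isinstance(x, (dict, list)) for x in value)
    return isinstance(value, dict) and bool(value)

def inline(value):
    if isinstance(value, list):
        return "[" + ", ".join(format_value(x) for x in value) + "]"
    return format_value(value)

def to_toon(obj, indent=0):
    result = []
    indent_str = "  " * indent  # two spaces per level
    if isinstance(obj, dict):
        entries = [(f"{key}:", value) for key, value in obj.items()]
    else:
        entries = [("-", item) for item in obj]
    for prefix, value in entries:
        if is_block(value):
            result.append(f"{indent_str}{prefix}")
            result.append(to_toon(value, indent + 1))
        else:
            result.append(f"{indent_str}{prefix} {inline(value)}")
    return "\n".join(result)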
Validation
Required validations:
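def validate_toon(text):
    # The four check_* functions are assumed to be defined elsewhere.
    # Each takes the document text and returns a list of error messages.
    checks = [
        check_indentation,
        check_syntax,
        check_types,
        check_structure,
    ]
    errors = []
    for check in checks:
        errors.extend(check(text))
    return errors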
Grammar Definition (EBNF)
document ::= (comment | key_value | blank_line)*
key_value ::= indent key ":" value
key ::= unquoted_string | quoted_string
value ::= simple_value | object | array
simple_value ::= string | number | boolean | null
string ::= unquoted_string | quoted_string | multiline_string
number ::= integer | float
boolean ::= "true" | "false"
null ::= "null"
object ::= newline (indent key_value)+
array ::= inline_array | block_array
inline_array ::= "[" (value ("," value)*)? "]"
block_array ::= newline (indent "- " (value | object))+
comment ::= "#" [^\n]*
indent ::= "  "*
multiline_string ::= ("|" | ">") newline (indent string_line)+
Conclusion
TOON's specification prioritizes:
- Simplicity: Easy to learn and implement
- Efficiency: Minimal token usage
- Clarity: Unambiguous parsing rules
- Practicality: Covers common use cases
This specification provides enough detail for implementing compliant parsers while keeping the format accessible to developers.
For implementation examples and reference parsers, check our GitHub repository.