Lython Python Subset Specification

This document defines Lython as an embeddable, contained-by-design runtime for a strict subset of Python.

The intent is not to design a new scripting language that merely looks like Python. The intent is to provide a runtime where ordinary short Python scripts for supported file and text workflows typically work unchanged.

Lython is therefore defined by the following principle:

  • Python compatibility is the default.

When Lython diverges from Python, the divergence must be:

  • explicit
  • narrow
  • justified by safety, determinism, or embeddability
  • surfaced as a clear error rather than as silent behavioral drift

This document defines the subset boundary normatively. It does not attempt to restate lower-level grammar machinery that belongs to parser-specific documentation.

The exact parser implementation strategy is not part of the language contract. This specification does not require a bespoke parser or source generator.


0. Purpose

Lython exists to run small Python scripts in a controlled environment, primarily for tasks such as:

  • reading text files
  • reading structured delimited text files
  • transforming text
  • transforming structured in-memory data derived from text files
  • creating or overwriting files
  • renaming or deleting files
  • scanning directories
  • producing derived output files
  • applying repeated edits across a codebase

The target scripts are typically:

  • one-shot
  • short
  • generated by an automation agent or written by a supervising human
  • pragmatic rather than architecturally elaborate

Representative examples include:

  • reading a structured delimited text file
  • parsing it in-process
  • transposing rows and columns
  • writing the transformed output to another file
  • reading many source files and applying regex-based edits
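As a rough illustration of the transposition workflow listed above, the sketch below parses delimited text, swaps rows and columns, and renders the result. The names `transpose` and `transpose_delimited` are illustrative only. The naive `",".join` writer assumes fields contain no commas or quotes, and every row is assumed to have the same number of columns. The code sticks to plain loops and positional calls so that it stays close to the initial subset described later.

```python
import csv


def transpose(rows):
    # Assumes every row has the same number of columns.
    result = []
    if len(rows) == 0:
        return result
    for col in range(len(rows[0])):
        new_row = []
        for row in rows:
            new_row.append(row[col])
        result.append(new_row)
    return result


def transpose_delimited(text):
    # csv.reader accepts any iterable of lines, so no file handle is needed here.
    rows = list(csv.reader(text.splitlines()))
    lines = []
    for row in transpose(rows):
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


print(transpose_delimited("name,qty\napple,3\npear,5\n"))
```

A real script would read the input from one path and write the result to another through the host-mediated file APIs.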

Within the supported subset, Lython must behave according to Python semantics. Outside that subset, Lython must fail explicitly rather than drift semantically.


1. Core Contract

Lython is a restricted Python runtime, not a Python-inspired DSL.

This means:

  1. The source language should use Python syntax.
  2. Supported constructs must behave like Python unless explicitly specified otherwise.
  3. Unsupported constructs must fail clearly.
  4. The runtime must not silently reinterpret Python code with non-Python semantics.

Lython is intended to be:

  • a strict subset of Python
  • an extension of the Lark foundation where applicable

If a construct is already defined by the adopted Lark foundation, this specification does not need to restate it unless Lython changes, restricts, or extends its meaning.

The implementation may be independent from any existing Python runtime, but the language contract is defined in Python terms first.


2. Design Goals

Lython must be:

  • embeddable
  • deterministic enough for automation
  • safe by design
  • text-and-file-workflow oriented
  • small enough to reason about
  • compatible enough that a coding agent can usually write ordinary Python

Lython is not trying to become a general-purpose Python runtime.


3. Non-Goals

The following are explicitly outside the initial scope:

  • full Python compatibility
  • third-party package installation
  • dynamic imports from disk
  • native extensions
  • unrestricted process execution
  • ambient shell command execution outside host mediation
  • unrestricted networking
  • unrestricted reflection
  • concurrency primitives intended for long-running applications
  • becoming a general-purpose application runtime

4. Python Compatibility Strategy

4.1 Compatibility Target

Lython must target a clearly identified version family of Python syntax and core semantics.

The initial target is:

  • a modern Python 3 subset

The exact reference version must be pinned by the implementation. The design intent is contemporary Python 3, not Python 2 and not an invented hybrid.

4.2 Default Rule

If a construct is part of normal Python and is supported by Lython, it must behave like Python.

Examples include:

  • variable assignment
  • string literals
  • integer arithmetic
  • truthiness
  • if
  • for
  • function definition and calls
  • lists and dictionaries

4.3 Divergence Rule

If Lython does not support a Python feature, it must:

  • reject it explicitly, preferably during parse or bind time
  • explain that the feature is unsupported
  • avoid partial emulation with surprising semantics

Unsupported Python must fail, not degrade into something “sort of similar”.

Unsupported constructs must be detected before execution whenever practical.

The order of rejection is:

  1. parse time
  2. semantic validation / binding time
  3. runtime only when earlier detection is impractical
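One way to picture early rejection is a validation pass over a syntax tree before anything executes. The sketch below uses CPython's `ast` module purely as a stand-in; this specification does not prescribe a parser. The `UNSUPPORTED_NODES` table, `UnsupportedFeatureError`, and `check_source` are hypothetical names, and the table covers only a few of the constructs listed as unsupported later in this document.

```python
import ast

# Hypothetical table: a few constructs this specification lists as unsupported.
UNSUPPORTED_NODES = (
    (ast.Lambda, "lambda"),
    (ast.With, "with"),
    (ast.ListComp, "comprehension"),
    (ast.IfExp, "conditional expression"),
)


class UnsupportedFeatureError(Exception):
    pass


def check_source(source):
    # Step 1: parse. A syntax error is reported before anything runs.
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        raise UnsupportedFeatureError("parse: " + str(exc.msg))
    # Step 2: validation. Walk the tree and reject the first unsupported node found.
    for node in ast.walk(tree):
        for node_type, label in UNSUPPORTED_NODES:
            if isinstance(node, node_type):
                message = "validation: line %d: %s is not supported" % (node.lineno, label)
                raise UnsupportedFeatureError(message)
    return tree
```

Calling `check_source("f = lambda: 1\n")` raises before any statement of the script has run, which is the behavior the ordering above asks for.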

5. Safety and Containment

5.1 No Ambient Authority

Scripts must not gain access to the outside world implicitly.

Lython must not provide ambient access to:

  • the host-managed path space
  • the process environment
  • network APIs
  • clocks or random sources unless explicitly exposed
  • arbitrary module loading
  • arbitrary code evaluation

Every effect must go through explicit host-provided capabilities or builtins backed by those capabilities.

5.2 Host Boundary

The host application controls what a script can do.

The runtime must expose a narrow host interface for controlled operations over a host-managed path space, such as:

  • reading a UTF-8 text resource
  • writing a UTF-8 text resource
  • appending to a UTF-8 text resource
  • listing entries under a path
  • querying path metadata
  • copying, moving, and deleting paths
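To make the shape of such a boundary concrete, here is a minimal in-memory sketch. The class name `InMemoryHost` and all of its method names are assumptions made for illustration; they are not the actual host interface, which this section does not spell out. Failures are raised as `RuntimeError` to match the error mapping given later for host path and resource failures.

```python
class InMemoryHost:
    """Toy host: the path space is a dict from path string to text."""

    def __init__(self):
        self.files = {}

    def read_text(self, path):
        if path not in self.files:
            raise RuntimeError("no such resource: " + path)
        return self.files[path]

    def write_text(self, path, text):
        self.files[path] = text

    def append_text(self, path, text):
        self.files[path] = self.files.get(path, "") + text

    def list_entries(self, prefix):
        # Return every stored path under the prefix, sorted for determinism.
        base = prefix.rstrip("/") + "/"
        names = []
        for path in sorted(self.files):
            if path.startswith(base):
                names.append(path)
        return names

    def delete(self, path):
        if path not in self.files:
            raise RuntimeError("no such resource: " + path)
        del self.files[path]


host = InMemoryHost()
host.write_text("/work/notes.txt", "first\n")
host.append_text("/work/notes.txt", "second\n")
print(host.list_entries("/work"))
```

Because the script only ever sees the host object, an embedder can swap this toy for a sandboxed directory or a virtual tree without changing the script.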

5.3 Explicit Failure

If a script attempts an unavailable capability or unsupported construct, the failure must be explicit and deterministic.


6. Primary Use Case Bias

Lython is optimized for coding-agent workflows.

That means the language should be especially convenient for:

  • file reads and writes
  • string manipulation
  • regex-based text manipulation
  • lightweight structured-text processing and tabular reshaping
  • line-based edits
  • small repository-wide loops
  • guard-style checks before mutating files

The language does not need to optimize for:

  • scientific computing
  • long-lived services
  • large object models
  • interactive REPL-driven programming

7. Execution Model

7.1 Script-Oriented Runtime

The basic execution unit is a Python script.

A script:

  • is UTF-8 text
  • is parsed using Python syntax rules as constrained by this specification
  • is interpreted directly or compiled into an internal artifact before execution
  • executes against a host and optional globals

7.2 Phases

The runtime must conceptually separate:

  1. Parse
  2. Semantic validation / binding
  3. Compilation to internal form
  4. Execution

Errors must identify the phase in which they occur.

7.3 Reuse

The runtime must support:

  • parse and run
  • compile once, run many times

Compiled scripts must be reusable across multiple host instances if the required capabilities are available.


8. Language Surface

This section is normative.

Unless explicitly listed as supported, a Python construct must be treated as unsupported.

For syntax, expression forms, and lower-level language constructs already defined by the adopted Lark foundation, Lython inherits that foundation rather than restating it here.

This section therefore focuses on the additional Python-surface commitments that are not merely inherited from Lark.

8.1 Supported Python-Surface Additions

The initial subset must support:

  • named function definitions
  • positional parameters
  • positional calls
  • return values
  • return without a value
  • simple assignment
  • expression statements
  • if / elif / else
  • for
  • while
  • try / except / finally
  • raise
  • break
  • continue
  • pass

The initial subset must also support:

  • nested blocks
  • break and continue inside loops
  • recursion

The initial subset does not support:

  • loop else
  • nested function definitions

8.2 Supported Import Surface

The runtime supports an explicit allowlist of built-in modules and host-allowed local script modules. The built-in allowlist includes the standard-library subsets specified in this document, including argparse, collections, copy, csv, dataclasses, datetime, decimal, difflib, fnmatch, functools, glob, itertools, json, math, operator, os, pathlib, pkgutil, random, re, shutil, statistics, subprocess when host-enabled, sys, typing, and related contained helpers.

import ..., import ... as ..., and from ... import ... are supported for allowlisted modules and members. The runtime must reject imports outside the allowlist and local imports not explicitly permitted by the embedder.

The datetime subset is fixed-offset and host-mediated. date, time, datetime, timedelta, timezone, tzinfo, ISO parsing, formatting, and ordinary arithmetic/comparison behavior follow CPython shapes where practical. Clock- and local-offset-sensitive operations such as today(), now(), fromtimestamp(), naive timestamp(), and astimezone() must read time through the configured host rather than the ambient process. IANA timezone databases and ambient locale/timezone discovery are outside the supported surface.
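A short sketch of the fixed-offset operations described above, written against CPython's `datetime`. It deliberately avoids `now()`, `today()`, and argument-less `astimezone()`, since those depend on a clock or local offset that Lython routes through the host.

```python
from datetime import datetime, timedelta, timezone

# A fixed +02:00 offset; no IANA zone database is involved.
plus_two = timezone(timedelta(hours=2))
start = datetime(2024, 3, 1, 12, 0, 0, tzinfo=plus_two)

# ISO parsing yields an aware datetime carrying the same fixed offset.
parsed = datetime.fromisoformat("2024-03-01T12:00:00+02:00")

later = start + timedelta(hours=36)
# Converting to an explicit target offset needs no local-time lookup.
in_utc = start.astimezone(timezone.utc)

print(later.isoformat())    # 2024-03-03T00:00:00+02:00
print(in_utc.isoformat())   # 2024-03-01T10:00:00+00:00
```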

The statistics subset includes common averages, medians, modes, variance, standard deviation, quantiles, covariance, correlation, linear regression, and NormalDist. Supported real inputs may be coerced to double for aggregate helpers, including inputs from the contained decimal.Decimal subset. Empty data, insufficient sample data, invalid quantile parameters, degenerate correlation/regression inputs, and zero-sigma distribution operations must fail explicitly with CPython-shaped exceptions. NormalDist.samples(...) is deterministic when no seed is supplied; KDE helpers are outside the contained surface and must fail explicitly.
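The explicit-failure rule for empty or insufficient data can be sketched as follows, using CPython's `statistics` module. The helper `mean_or_none` is a hypothetical name; the sample data is arbitrary.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)
median = statistics.median(data)
spread = statistics.pstdev(data)


def mean_or_none(values):
    # Empty input raises StatisticsError rather than returning a placeholder.
    try:
        return statistics.mean(values)
    except statistics.StatisticsError as exc:
        return None


def stdev_fails_on_single_point():
    # The sample standard deviation needs at least two data points.
    try:
        statistics.stdev([1])
        return False
    except statistics.StatisticsError as exc:
        return True
```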

The random subset is deterministic unless a future host-provided entropy abstraction is explicitly added. Module-level helpers and random.Random instances expose independent deterministic state, seed, getstate, setstate, randrange, randint, choice, choices, shuffle, sample, getrandbits, randbytes, and common distribution helpers. sample(..., counts=...) and keyword-shaped calls are supported where CPython commonly accepts them. Weight validation for choices must reject mismatched lengths, non-monotonic cumulative weights, non-finite weights, and all-zero totals. SystemRandom must fail explicitly rather than reading ambient system entropy.
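The determinism contract can be illustrated with explicitly seeded generators, which behave the same way in CPython. The helper `rolls` is a hypothetical name. No specific output values are shown because the exact sequence is an implementation detail.

```python
import random


def rolls(rng, count):
    values = []
    for _ in range(count):
        values.append(rng.randint(1, 6))
    return values


# Two independent instances with the same seed produce the same sequence.
first_rolls = rolls(random.Random(1234), 5)
second_rolls = rolls(random.Random(1234), 5)

# getstate/setstate rewinds a generator to an earlier point.
rng = random.Random(99)
saved = rng.getstate()
before = rolls(rng, 3)
rng.setstate(saved)
after = rolls(rng, 3)
```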

The copy subset includes copy.copy, copy.deepcopy(x, memo=None), copy.replace(obj, **changes), Error/error, and dispatch_table. Deep-copy operations must preserve cycles and honor explicit memo dictionaries using contained object identity keys. copy.replace supports dataclass instances, namedtuple-like values, and objects exposing __replace__. __copy__ and __deepcopy__(memo) hooks are supported; pickle-oriented __reduce__, __reduce_ex__, __getstate__, and __setstate__ protocols must fail explicitly unless a direct copy hook handles the object.
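The cycle-preservation and memo requirements can be sketched with plain lists, as below. This mirrors CPython's `copy.deepcopy` behavior; how Lython keys its memo internally is not shown here.

```python
import copy

cycle = [1]
cycle.append(cycle)              # the list contains itself
cycle_copy = copy.deepcopy(cycle)

shared = ["x"]
pair = [shared, shared]          # one inner list referenced twice
pair_copy = copy.deepcopy(pair)

# Reusing one memo dictionary makes repeated copies resolve to the same object.
memo = {}
first_copy = copy.deepcopy(shared, memo)
second_copy = copy.deepcopy(shared, memo)
```

After this runs, `cycle_copy[1]` is `cycle_copy` itself rather than the original list, and `pair_copy` still holds one inner list referenced twice.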

The shutil subset includes host-mediated copyfile(src, dst, *, follow_symlinks=True), copy(src, dst, *, follow_symlinks=True), move(src, dst, copy_function=copy), text-only copyfileobj(fsrc, fdst, length=0), Error, and SameFileError. copy and move must treat an existing directory destination as dst / os.path.basename(src). copyfile and copy may overwrite an existing file destination through host-mediated remove plus copy. copy2 must fail explicitly because the current host contract cannot preserve file metadata. Symlink behavior, custom move copy functions, recursive tree helpers, archive helpers, ownership/permission helpers, and raw host inspection helpers remain outside the contained path model unless separately specified.

The functools subset includes WRAPPER_ASSIGNMENTS, WRAPPER_UPDATES, update_wrapper, wraps, total_ordering, reduce, partial, partialmethod, cmp_to_key, lru_cache, cache, cached_property, simple singledispatch and singledispatchmethod registration, and recursive_repr. Wrapper helpers copy supported dynamic metadata and preserve __wrapped__; cache decorators expose cache_info, cache_clear, and cache_parameters. Cache keys follow Lython's hashable-value rules. Dispatch helpers support explicit class/type registration; annotation-only registration and placeholder partial application must fail explicitly.
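A brief sketch of three of these helpers, using call syntax rather than decorator syntax. The function `slow_square` and the sample values are illustrative.

```python
import functools
import operator

calls = []


def slow_square(n):
    # Record each real invocation so cache hits are observable.
    calls.append(n)
    return n * n


cached_square = functools.lru_cache(maxsize=None)(slow_square)
cached_square(4)
cached_square(4)                 # served from the cache
info = cached_square.cache_info()

total = functools.reduce(operator.add, [1, 2, 3, 4], 0)
parse_binary = functools.partial(int, base=2)
five = parse_binary("101")
```

Here `info.hits` and `info.misses` are both 1, and `cached_square.__wrapped__` still refers to `slow_square`.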

8.3 Supported Literal Surface

The initial subset must support exactly the following literal forms:

  • decimal integer literals
  • decimal integer literals with _ separators
  • float literals
  • float literals with exponent notation
  • single-quoted string literals
  • double-quoted string literals
  • triple-quoted string literals
  • raw string literals
  • True
  • False
  • None
  • list literals
  • dictionary literals
  • tuple literals

The initial subset does not support:

  • bytes literals
  • set literals
  • formatted string literals
  • complex-number literals

8.4 Unsupported Python Surface

The following Python features are outside the initial subset and must be rejected explicitly if encountered:

  • classes
  • comprehensions
  • generator expressions
  • lambdas
  • decorators
  • with
  • yield
  • async
  • await
  • pattern matching
  • assert
  • del
  • keyword arguments
  • default parameter values
  • variadic parameters
  • conditional expressions

These features are unsupported in the initial subset. If any of them are added later, they must not be half-supported.


9. Semantic Compatibility

This section is normative.

It defines runtime semantics that are not supplied by the adopted Lark foundation.

9.1 Truthiness

Truthiness for supported value types must follow Python semantics.

In particular, the following supported values are false:

  • None
  • False
  • numeric zero
  • empty strings
  • empty lists
  • empty tuples
  • empty dictionaries

Other supported values are true unless Python semantics for that supported value type specify otherwise.

9.2 Evaluation Order

Expression evaluation order for supported constructs must follow Python semantics.

Boolean operators for the supported subset must follow Python semantics, including:

  • and
  • or
  • not

The and and or operators must short-circuit exactly as in Python.
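Short-circuiting and operand order can be observed by logging calls, as in the sketch below. The helper `record` is a hypothetical name.

```python
calls = []


def record(name, value):
    # Log the call so the evaluation order is observable.
    calls.append(name)
    return value


left = record("a", 0) and record("b", 1)       # "b" is never evaluated
right = record("c", "x") or record("d", "y")   # "d" is never evaluated
both = record("e", 1) and record("f", [])      # both sides are evaluated

print(calls)   # ['a', 'c', 'e', 'f']
```

Note that `and` and `or` return one of their operands rather than a bool, so `left` is `0`, `right` is `"x"`, and `both` is `[]`.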

9.3 None, Comparison, Membership, and Indexing

The initial subset must include the value None.

A return statement without a value must produce None.

The initial subset must support Python comparison semantics for the supported subset, including:

  • ==
  • !=
  • <
  • <=
  • >
  • >=
  • is
  • is not
  • in
  • not in

The initial subset does not support chained comparisons.

Membership semantics must follow Python for the supported subset:

  • string membership tests substring containment
  • list membership tests element equality
  • tuple membership tests element equality
  • dictionary membership tests keys

The initial subset must support indexing with Python semantics for the supported subset:

  • string indexing
  • list indexing
  • tuple indexing
  • dictionary indexing by key

Negative indices for strings, lists, and tuples must behave as in Python.

The initial subset must support Python-style slicing for strings, lists, and tuples. Mutable list slices must also support assignment and deletion as described in section 11.6.
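The membership, indexing, and slicing rules above can be checked with a few one-liners. The values are arbitrary.

```python
text = "lython"
items = [10, 20, 30, 40]
table = {"a": 1}

checks = [
    "yth" in text,     # substring containment
    20 in items,       # element equality
    "a" in table,      # dictionary membership tests keys
    1 in table,        # values are not searched
]

last_char = text[-1]          # negative index counts from the end
tail = items[-2:]
every_other = items[::2]
reversed_text = text[::-1]

print(checks)          # [True, True, True, False]
print(reversed_text)   # nohtyl
```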

9.3.1 Assignment Semantics

Assignment behavior for supported target forms must follow Python semantics.

The supported ordinary assignment targets are:

  • simple names
  • list, tuple, and dictionary subscripts
  • mutable list slices
  • object attributes

Chained assignment may assign to any supported ordinary assignment target. The right-hand expression must be evaluated once, then stored into each target from left to right.

Flat unpacking assignment with identifier targets is supported, including at most one starred identifier target. Parenthesized and list-shaped assignment targets such as (a, b) = row, [a, b] = row, and (target) = value are not supported. Nested destructuring such as (a, (b, c)) = row is not supported.
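The chained-assignment and flat-unpacking rules above can be seen in a small sketch. `make_list` is a hypothetical helper used to count evaluations.

```python
calls = []


def make_list():
    calls.append("make_list")
    return []


a = b = make_list()        # evaluated once; both names refer to one list
a.append(1)

i = 0
slots = [0, 0]
i = slots[i] = 1           # i is stored first, so slots[1] is the second target

head, *middle, tail = ["id", "x", "y", "z"]

print(slots)    # [0, 1]
print(middle)   # ['x', 'y']
```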

Annotated assignment is supported for simple names only. Annotated attribute and subscript targets such as obj.value: T = x and items[0]: T = x are not supported.

Augmented assignment is supported for simple names, subscript targets, mutable list slice targets, and object attribute targets. The supported operators are:

  • +=
  • -=
  • *=
  • /=
  • //=
  • %=
  • **=
  • |=
  • &=
  • ^=
  • <<=
  • >>=

Augmented assignment must evaluate target receivers, indexes, slice bounds, and member bases once, read the current target value before evaluating the right-hand expression, apply the corresponding operation, and store the result back to the original target. Mutable values may preserve Python-like in-place behavior where Lython supports that value type, including list += alias behavior and set update behavior for |=, &=, and ^=.
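A minimal sketch of two of these rules: the target index is evaluated once, and `+=` on a list mutates in place while `+` builds a new list. The helper `index` is a hypothetical name.

```python
calls = []


def index():
    calls.append("index")
    return 0


counts = [1, 2]
counts[index()] += 10      # the index expression runs once

original = [1]
alias = original
alias += [2]               # in place: original sees the change

rebound = original
rebound = rebound + [3]    # new list: original is untouched

print(calls)      # ['index']
print(original)   # [1, 2]
```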

Assignment expressions using := are supported for simple-name targets only. They must assign the evaluated right-hand value to the name and produce that same value as the expression result. Non-name assignment-expression targets must be rejected explicitly.
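For instance, a simple-name assignment expression can capture a computed value inside a condition. The function name `first_long_word` is illustrative.

```python
def first_long_word(words, minimum):
    for word in words:
        # size is bound by := and is also the value being compared.
        if (size := len(word)) >= minimum:
            return (word, size)
    return None


print(first_long_word(["a", "abcd", "xyzzy"], 4))   # ('abcd', 4)
```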

9.4 Numeric Semantics

Numeric behavior for the supported subset must follow Python semantics.

The initial subset supports integer values, floating-point values, and boolean values.

int must have Python integer semantics:

  • arbitrary precision
  • no fixed-width overflow
  • exact integer comparisons
  • exact integer string conversion

bool must have Python boolean semantics:

  • the values are exactly True and False
  • bool participates in truthiness exactly as in Python
  • when used in numeric contexts, False behaves as 0 and True behaves as 1

float must have Python floating-point semantics for the supported subset.

The initial subset must support numeric operators with Python semantics for the supported subset, including:

  • +
  • -
  • *
  • /
  • //
  • %

Division and remainder semantics must follow Python exactly:

  • / is true division
  • // is floor division
  • % satisfies Python remainder semantics, including for negative operands

The initial subset must support mixed int and float arithmetic for these operators with Python result semantics.

The initial subset must support numeric comparisons between int and float with Python semantics.

The identity a == (a // b) * b + (a % b) must hold for supported numeric values when b != 0 and Python defines the expression.

Division or remainder by zero must raise the Python-appropriate runtime error for the supported subset.

The runtime must not silently fall back to host-language numeric overflow, truncation, or remainder rules when they differ from Python.
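The sign behavior of `//` and `%` is where host-language rules most often differ from Python, so a small worked check may help. The operand pairs are arbitrary.

```python
pairs = [(7, 2), (-7, 2), (7, -2), (-7, -2)]
results = []
identity_holds = True
for a, b in pairs:
    quotient = a // b
    remainder = a % b
    # The division identity from this section.
    if a != quotient * b + remainder:
        identity_holds = False
    results.append((quotient, remainder))

big = 1_000_000_000_000 * 1_000_000_000_000   # no fixed-width overflow
true_division = 7 / 2
bool_sum = True + True

print(results)   # [(3, 1), (-4, 1), (-4, -1), (3, -1)]
```

The remainder takes the sign of the divisor, which differs from the truncating remainder used by many host languages.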

9.5 Function Semantics

Supported function semantics must follow Python semantics for the supported subset.

In particular, this includes:

  • positional argument passing
  • local variable behavior
  • lexical name lookup behavior for the supported scope model
  • return-value behavior
  • recursion behavior for supported functions

Additional function features are outside the initial subset. If they are added later, they must be implemented faithfully or rejected explicitly.

9.6 Collections

Collection behavior must follow modern Python semantics for supported operations.

In particular:

  • lists are ordered
  • tuples are ordered
  • dictionaries preserve insertion order

Tuple behavior must follow Python semantics for the supported subset.

The initial subset must support:

  • tuple indexing
  • tuple iteration
  • tuple truthiness
  • tuple equality and inequality
  • tuple membership with in and not in

list(iterable), tuple(iterable), and dict(iterable_of_pairs) must follow Python semantics for the supported subset.

The collections module supports the common contained container helpers:

  • defaultdict, including mapping/keyword initialization and mutable default_factory
  • Counter, including total(), keyword updates/subtractions, positive-count arithmetic, unary +/-, &, |, and missing-count equality semantics
  • deque, including maxlen, bounded eviction, indexing, assignment, index, insert, remove, rotate, and copy/clear/reverse helpers
  • namedtuple, including generated callable tuple-like records, field attributes, _fields, _field_defaults, _make, _asdict, and _replace
  • insertion-ordered OrderedDict as a dict-shaped alias
  • ChainMap for layered lookup with first-map writes/deletes
  • inert/importable collections.abc names for compatibility

UserDict, UserList, and UserString are explicitly unsupported unless a future object-wrapper model is specified.
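A compact sketch of four of the supported helpers, as they behave in CPython. The sample words and the `Point` record are illustrative.

```python
from collections import Counter, defaultdict, deque, namedtuple

words = ["ab", "c", "ab", "def", "c", "ab"]

counts = Counter(words)
missing = counts["zzz"]          # missing keys count as zero, no KeyError

groups = defaultdict(list)
for word in words:
    groups[len(word)].append(word)

recent = deque([1, 2, 3], maxlen=3)
recent.append(4)                 # bounded eviction drops the oldest item

Point = namedtuple("Point", ["x", "y"])
moved = Point(1, 2)._replace(x=5)

print(list(recent))   # [2, 3, 4]
```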

9.7 Strings

Strings must behave as Python Unicode strings, not as byte arrays.

9.8 Iteration

The initial subset must support iteration over:

  • strings
  • lists
  • tuples
  • the results of range(...)
  • the results of enumerate(...)
  • csv.reader(...) results

Dictionary iteration must follow Python semantics for the supported subset.

range(...) supports exactly:

  • range(stop)
  • range(start, stop)
  • range(start, stop, step)

enumerate(...) supports exactly:

  • enumerate(iterable)
  • enumerate(iterable, start)

The itertools module supports common lazy iterator helpers:

  • chain, including chain.from_iterable
  • islice, product, and zip_longest
  • count, repeat, and cycle
  • combinations, combinations_with_replacement, and permutations
  • accumulate, including func=None addition and keyword-only initial
  • compress, filterfalse, dropwhile, and takewhile
  • starmap, pairwise, groupby, tee, and batched

Unbounded forms such as count, repeat(..., times=None), and cycle must remain lazy. Helpers that cache values by design, including cycle, combinatorics, groupby group iterators, and tee, must remain subject to execution and memory limits.
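The laziness requirement matters most when an unbounded helper feeds a bounded one. The sketch below shows that pattern alongside a few other listed helpers. `first_char` is a hypothetical key function.

```python
import itertools

# count() is unbounded; islice takes a finite prefix without exhausting it.
evens = list(itertools.islice(itertools.count(0, 2), 5))

flat = list(itertools.chain.from_iterable([[1, 2], [3], []]))

running = list(itertools.accumulate([1, 2, 3], initial=10))


def first_char(text):
    return text[0]


# groupby groups consecutive equal keys, so the input is sorted first.
grouped = []
for key, group in itertools.groupby(sorted(["b1", "a1", "b2", "a2"]), first_char):
    grouped.append((key, list(group)))

print(evens)     # [0, 2, 4, 6, 8]
print(running)   # [10, 11, 13, 16]
```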

9.9 Exceptions

The initial subset must support Python-style exception propagation.

The initial subset must support at least the following exception classes:

  • Exception
  • ValueError
  • KeyError
  • IndexError
  • RuntimeError

The initial subset must support:

  • raise ExceptionType("message")
  • raise variable
  • try / except ExceptionType as name
  • finally

The initial subset does not support:

  • bare except
  • exception tuples in except
  • else on try

The initial subset does not support user-defined exception classes.

An except ExceptionType as name clause must match exactly the named builtin exception class. Subclass-based matching is not part of the initial subset.

If an uncaught exception reaches the top level of the script, execution must fail.

The initial subset must raise the following exception types for the following runtime conditions:

  • division or remainder by zero: ValueError
  • invalid integer conversion through int(...): ValueError
  • invalid floating-point conversion through float(...): ValueError
  • invalid regex pattern or unsupported regex syntax inside re: re.error
  • invalid JSON text: json.JSONDecodeError
  • invalid structured-delimited-text input accepted by the csv subset: csv.Error
  • missing dictionary key through indexing: KeyError
  • out-of-range string, list, or tuple index: IndexError
  • host path/resource failures and unsupported host operations: RuntimeError

10. Scope and Variables

The scope model must follow Python semantics for the supported subset.

10.1 Required Scope Levels

The runtime must support:

  • module/script scope
  • function-local scope

10.2 Scope Directives

global is supported in module and function bodies. A function-level global directive makes reads, writes, deletes, imports, function definitions, class definitions, loop targets, unpacking targets, context-manager aliases, exception aliases, pattern captures, and assignment expressions bind against module scope.

nonlocal is supported in nested function bodies when an enclosing function declares a local binding for the named value. A nonlocal directive must bind to the nearest enclosing function scope that owns the name, and the same binding forms as global must target that enclosing scope.
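The two directives can be contrasted in a few lines. The names `add` and `make_counter` are illustrative.

```python
total = 0


def add(amount):
    global total           # writes bind against module scope
    total += amount


def make_counter():
    count = 0              # local binding owned by the enclosing function

    def step():
        nonlocal count     # binds to the nearest enclosing function scope
        count += 1
        return count

    return step


add(2)
add(3)
counter = make_counter()
counter()
second = counter()

print(total)    # 5
print(second)   # 2
```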

Scope directives inside class bodies are outside the supported subset and must be reported explicitly.

10.3 Principle

The scope model must not invent a simpler but incompatible alternative if ordinary Python semantics can be preserved reasonably.


11. Builtins and Standard Library

Lython exposes a strict subset of Python's builtin and standard-library surface.

Supported builtins and supported standard-library modules must:

  • use Python names
  • follow Python semantics for the supported subset
  • fail explicitly when a script requests behavior outside the supported subset

Lython must not introduce alternate names for supported Python builtins or supported Python standard-library functions.

11.1 Supported Builtins

The initial builtin environment must include exactly the following builtins and builtin exception classes:

  • len
  • range
  • enumerate
  • zip
  • iter
  • next
  • reversed
  • map
  • filter
  • slice
  • sorted
  • any
  • all
  • min
  • max
  • sum
  • abs
  • pow
  • round
  • divmod
  • bin
  • oct
  • hex
  • chr
  • ord
  • callable
  • hash
  • bool
  • str
  • repr
  • ascii
  • format
  • int
  • float
  • bytes
  • list
  • tuple
  • dict
  • set
  • object
  • type
  • property
  • staticmethod
  • classmethod
  • super
  • isinstance
  • issubclass
  • getattr
  • hasattr
  • setattr
  • delattr
  • dir
  • vars
  • open
  • print
  • input
  • BaseException
  • Exception
  • ArithmeticError
  • LookupError
  • UnicodeError
  • Warning
  • TypeError
  • ValueError
  • KeyError
  • IndexError
  • RuntimeError
  • AssertionError
  • ImportError
  • ModuleNotFoundError
  • NameError
  • AttributeError
  • SyntaxError
  • FileNotFoundError
  • FileExistsError
  • IsADirectoryError
  • NotADirectoryError
  • PermissionError
  • TimeoutError
  • IOError
  • EnvironmentError
  • OSError
  • StopIteration
  • ZeroDivisionError
  • NotImplementedError
  • RecursionError
  • MemoryError
  • UnicodeEncodeError
  • UnicodeDecodeError
  • UnicodeTranslateError
  • OverflowError
  • SystemExit

No builtin outside this set is part of the initial supported subset unless it is explicitly added elsewhere in this specification.

11.2 Contained File And Path Surface

For path and text-resource manipulation, scripts must use Python-shaped APIs: open, pathlib, os, os.path, glob, and shutil. These APIs remain host-mediated through ILythonHost; Lython-specific global filesystem helper names are not part of the supported script surface.

pathlib.Path.read_text(...) is supported for host-mediated text resources. It accepts the Python-shaped forms read_text(), read_text("utf-8"), read_text(encoding="utf-8"), and read_text(encoding="utf-8-sig", errors="strict"). Other encodings, other error modes, and extra arguments must fail explicitly.

Python-shaped open(...) and pathlib.Path.open(...) are supported only as UTF-8 text-handle helpers. They expose ordinary text-handle inspection such as closed, readable(), writable(), seekable(), tell(), and iteration. Binary modes and random access must fail explicitly.

pathlib follows Lython's normalized /-separated path model. Path, PurePath, PurePosixPath, and PosixPath produce the same contained path values. WindowsPath and PureWindowsPath must fail explicitly because no Windows-specific path semantics are exposed through the language surface.

Path.cwd() resolves through the host current working directory. Path.home() and Path.expanduser() must fail explicitly unless a future host capability exposes a contained home-directory source; the runtime must not read the ambient process home directory.

Path.iterdir(), Path.glob(...), and Path.rglob(...) may materialize eager path lists. Case-insensitive globbing, symlink traversal, rich inode/device/user/mode stat metadata, permission APIs, symlink APIs, and path byte helpers remain outside the text-first host boundary unless separately specified.

The glob module follows the same contained path model. glob.glob(pathname, *, root_dir=None, dir_fd=None, recursive=False, include_hidden=False) returns a materialized list of Python strings. glob.iglob(...) returns a one-shot iterator over the same materialized string results. Relative patterns produce relative strings, absolute patterns produce absolute strings, and root_dir changes the contained matching root without exposing ambient filesystem authority. Recursive ** preserves duplicate matches consistently with CPython. Hidden names match only when the pattern segment starts with . or include_hidden=True.

glob.escape(pathname), glob.has_magic(s), and glob.translate(pathname, *, recursive=False, include_hidden=False, seps=None) are supported for common agent-authored scripts. glob.glob0, glob.glob1, and any non-None dir_fd must fail explicitly because raw file descriptors and CPython internal traversal helpers are outside the host path model.

The decimal module exposes the common CPython-shaped Decimal, DecimalTuple, Context, getcontext, setcontext, localcontext, rounding constants, and decimal signal names expected by ordinary scripts. Lython Decimal remains backed by .NET decimal: arithmetic is fixed-precision, context precision is surfaced for compatibility but does not provide CPython arbitrary precision, and NaN, sNaN, Infinity, and values outside the .NET decimal range must fail explicitly.

The math module exposes the common CPython 3.13 scalar and aggregate helpers expected by generated scripts: elementary functions and constants, factorial, gcd, lcm, comb, perm, isqrt, dist, variadic hypot, frexp, ldexp, modf, remainder, nextafter, ulp, exp2, expm1, log1p, cbrt, erf, erfc, gamma, lgamma, fma, sumprod, prod, and fsum. Integer-only functions must accept bool as an integer. Domain, overflow, and keyword-only call-shape errors should follow CPython for supported functions. Exact integer helpers should use arbitrary-size integers where practical, while computations that imply unbounded local loops may fail explicitly under Lython's contained execution model.
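A few of these helpers in use, as they behave in CPython. The inputs are arbitrary.

```python
import math

exact_sum = math.fsum([0.1] * 10)        # compensated summation
root = math.isqrt(10 ** 20 + 1)          # exact integer square root
divisor = math.gcd(12, 18)
choose = math.comb(5, 2)
bool_factorial = math.factorial(True)    # bool is accepted as an integer
distance = math.dist([0, 0], [3, 4])


def sqrt_or_none(value):
    # Domain errors surface as ValueError rather than NaN.
    try:
        return math.sqrt(value)
    except ValueError as exc:
        return None
```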

The operator module exposes direct-function equivalents for supported expression and container operations: truth/not/identity helpers, unary numeric helpers, arithmetic and bitwise binary helpers, rich comparisons, getitem, setitem, delitem, contains, length_hint, countOf, indexOf, in-place helpers matching Lython augmented assignment, operator.call, itemgetter, attrgetter, and methodcaller. These helpers must reuse the runtime's existing Python-shaped equality, comparison, indexing, calling, and mutation semantics. operator.matmul must fail explicitly unless Lython later adds matrix-multiplication syntax and semantics.
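As a brief illustration, the helpers below replace small ad hoc functions in sorting, mapping, and in-place updates. The sample rows are arbitrary.

```python
import operator

rows = [("pear", 5), ("apple", 3), ("fig", 9)]
by_qty = sorted(rows, key=operator.itemgetter(1))
names = list(map(operator.itemgetter(0), by_qty))

shout = operator.methodcaller("upper")
loud = shout("ok")

target = [1]
same = operator.iadd(target, [2])   # mirrors target += [2]

has_two = operator.contains(target, 2)
ones = operator.countOf([1, 1, 2], 1)

print(names)   # ['apple', 'pear', 'fig']
```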

The os module follows the same contained path model. It may expose Python-shaped constants such as name, sep, linesep, pathsep, extsep, devnull, and access-mode constants using documented contained values. Supported file-tree operations must remain host-mediated through ILythonHost.

os.environ is a live mapping backed only by an explicit contained environment supplied by the embedder. os.getenv, os.putenv, os.unsetenv, os.get_exec_path, and os.path.expandvars must use that contained mapping. The runtime must not read or mutate the ambient process environment by default.

os.scandir(path) returns an iterator/context manager of DirEntry-shaped objects exposing name, path, __fspath__(), is_file(), is_dir(), and cached basic stat() metadata. DirEntry.inode() and DirEntry.is_symlink() must fail explicitly unless the host path model grows corresponding metadata.

os.path may expose CPython-shaped pure helpers over the contained POSIX-like path model, including commonprefix, normcase, splitdrive, splitroot, expandvars, getatime, getctime, and ismount. expanduser and islink must fail explicitly unless a future host capability exposes a contained home directory or symlink source.

The os surface must not expose ambient process or filesystem authority. File descriptors, chdir, permission/owner mutation, symlink/link creation, process identity, signals, system, popen, exec*, spawn*, and similar raw OS APIs must fail explicitly or remain absent.

Lython scripts must not be able to:

  • bypass host-mediated subprocess policy to spawn shell commands
  • bypass host-mediated subprocess policy to invoke external utilities
  • bypass host-mediated subprocess policy to pipe data through external executables
  • rely on shell expansion semantics without an explicit host-mediated subprocess request
  • access a local filesystem directly outside the host-managed path space

11.3 Path Semantics

Paths in the language surface are strings.

The path separator in the language surface is /.

The initial subset must support:

  • absolute paths
  • relative paths
  • normalization of . and ..
  • a current working directory visible through os.getcwd() and Path.cwd()

Relative-path resolution and . / .. normalization must be performed by the runtime before host path operations are issued.
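The resolution step can be modeled with CPython's posixpath helpers. This is a minimal sketch, not the runtime's implementation; the function name resolve is hypothetical.

```python
import posixpath


def resolve(cwd, path):
    """Return the absolute, normalized form of `path` relative to `cwd`.

    posixpath.join discards `cwd` when `path` is already absolute, and
    posixpath.normpath collapses "." and ".." segments lexically.
    """
    return posixpath.normpath(posixpath.join(cwd, path))
```

Under this model, resolve("/work/src", "../README.md") yields "/work/README.md", and only that normalized string would be passed to the host.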

The host remains authoritative for path existence, path kinds, and resource contents.

os.mkdir(path) and Path(path).mkdir() must fail with RuntimeError if the parent path does not exist.

os.mkdir(path) and Path(path).mkdir() must fail with RuntimeError if the path already exists.

os.remove(path), os.unlink(path), os.rmdir(path), Path(path).unlink(), and Path(path).rmdir() must fail with RuntimeError if the path does not exist.

Removal APIs must remove files and empty directories only.

Removal APIs must fail with RuntimeError on non-empty directories.

shutil.copyfile(src, dst) and shutil.copy(src, dst) must fail with RuntimeError if the source path does not exist.

shutil.copyfile(src, dst) and shutil.copy(src, dst) may overwrite an existing file destination through host-mediated remove plus copy.

shutil.move(src, dst), os.rename(src, dst), os.replace(src, dst), Path.rename(dst), and Path.replace(dst) must fail with RuntimeError if the source path does not exist.

Move/rename APIs must fail with RuntimeError if the destination path already exists, except where a Python-shaped replace API explicitly overwrites.

os.listdir(path) must return a lexicographically sorted list of entry names as strings.

The returned names are relative entry names, not absolute paths.

The returned order must use ordinal string comparison.
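Ordinal comparison matches Python's default string ordering, which compares code points. A small sketch with made-up entry names shows the consequences: digits sort before uppercase letters, which sort before lowercase letters, and numeric-looking names are not sorted numerically.

```python
def listdir_order(names):
    """Sort entry names by code point, which is Python's default str ordering."""
    return sorted(names)


example = listdir_order(["b.txt", "a.txt", "B.txt", "_x", "10.txt", "2.txt"])
```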

11.4 Stat Surface

os.stat(path) and Path(path).stat() must return an object exposing at least the following attributes:

  • st_size
  • st_mtime

The attribute types are:

  • st_size: int
  • st_mtime: int

Path existence and kind checks must be exposed through Python-shaped predicates such as os.path.exists, os.path.isfile, os.path.isdir, Path.exists(), Path.is_file(), and Path.is_dir().

11.5 Path Helper Surface

The initial subset must support Python-shaped path helpers such as os.path.join, os.path.dirname, os.path.basename, Path composition with /, Path.parent, and Path.name. These helpers must operate on the language-level path model defined by this specification.
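The sketch below shows the expected shapes using CPython's posixpath and PurePosixPath, which behave the same on every host platform. Using them as stand-ins for Lython's os.path and Path is an assumption made for illustration.

```python
import posixpath
from pathlib import PurePosixPath

# Function-style helpers over plain strings.
joined = posixpath.join("/work", "src", "main.py")
folder = posixpath.dirname(joined)
leaf = posixpath.basename(joined)

# Object-style composition with the / operator.
p = PurePosixPath("/work") / "src" / "main.py"
parent_text = str(p.parent)
name = p.name
```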

11.6 Supported String, List, and Dictionary Surface

The initial subset must support the following string methods:

  • split
  • splitlines
  • join
  • strip
  • lstrip
  • rstrip
  • replace
  • startswith
  • endswith
  • find
  • lower
  • upper
  • format

The initial subset must support the following list methods:

  • append
  • extend
  • index
  • count
  • insert
  • remove
  • pop
  • reverse
  • sort
  • copy
  • clear

Lists must support Python-style repetition with list * int, int * list, and plain-name list *= int.

Lists must support Python-style slice assignment and deletion for ordinary contiguous and stepped slices. Extended-slice assignment must reject replacement sequences whose length does not match the selected slice length.

Lists and tuples must support lexicographic ordering when their corresponding elements are comparable.
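The list rules above can be exercised with a few lines of ordinary Python. The values are illustrative, and the behavior shown is CPython's, which the subset is expected to match.

```python
xs = [0, 1, 2, 3, 4, 5]

# A contiguous slice assignment may change the list length.
xs[1:3] = ["a", "b", "c"]
after_contiguous = list(xs)

# A stepped slice assignment must supply exactly as many items as it selects.
xs[::2] = ["w", "x", "y", "z"]
after_stepped = list(xs)

# Stepped deletion removes every selected element.
del xs[1::2]
after_delete = list(xs)

# A length mismatch on an extended slice is rejected.
try:
    xs[::2] = ["only one"]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# Repetition in all three supported spellings.
repeated = [0] * 3
mirrored = 2 * ["ab"]
grid = ["-"]
grid *= 3

# Lexicographic ordering of lists and tuples.
ordered = sorted([(2, "a"), (1, "b"), (1, "a")])
shorter_first = [1, 2, 3] < [1, 3]
```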

The initial subset must support the following dictionary methods:

  • get
  • keys
  • values
  • items
  • update
  • pop

dict.keys() must return an iterable view of keys.

dict.values() must return an iterable view of values.

dict.items() must return an iterable view of (key, value) tuples.

str.format(...) must support positional formatting fields only.

The initial subset does not support named formatting fields or format-spec mini-language extensions beyond the default behavior.
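A brief sketch of the dictionary and formatting surface, restricted to positional format fields as required above. The data is invented for illustration.

```python
counts = {"a": 1}
counts.update({"b": 2})

missing = counts.get("z", 0)          # default when the key is absent
pairs = [(k, v) for k, v in counts.items()]
key_list = list(counts.keys())
value_total = sum(counts.values())
removed = counts.pop("a")

# Positional fields only: automatic and explicitly numbered.
line = "{} has {} entries".format("index", len(pairs))
reordered = "{1}/{0}".format("file.txt", "dir")
```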

11.7 Regular Expressions

This section is normative.

Regular expressions are part of the standard runtime.

The runtime must support the statement:

  • import re

The imported module must expose the common Python-shaped regex helpers:

  • re.compile(pattern, flags=0)
  • re.search(pattern, string, flags=0, pos=0, endpos=len(string))
  • re.match(pattern, string, flags=0, pos=0, endpos=len(string))
  • re.fullmatch(pattern, string, flags=0, pos=0, endpos=len(string))
  • re.findall(pattern, string, flags=0, pos=0, endpos=len(string))
  • re.finditer(pattern, string, flags=0, pos=0, endpos=len(string))
  • re.sub(pattern, repl, string, count=0, flags=0, pos=0, endpos=len(string))
  • re.subn(pattern, repl, string, count=0, flags=0, pos=0, endpos=len(string))
  • re.split(pattern, string, maxsplit=0, flags=0, pos=0, endpos=len(string))
  • re.escape(string)
  • re.purge()

Compiled pattern objects must expose search, match, fullmatch, findall, finditer, sub, subn, and split with corresponding pos and endpos range arguments where applicable. They must also expose pattern, flags, groups, and groupindex.

Match objects must expose re, string, pos, endpos, lastindex, lastgroup, group, groups(default=None), groupdict(default=None), expand(template), and group-aware start(group=0), end(group=0), and span(group=0).
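The following sketch exercises the compiled-pattern and match-object attributes listed above. It passes pos and endpos to compiled-pattern methods, which is where CPython accepts them, and it uses only numbered groups. The sample text is illustrative.

```python
import re

pair = re.compile(r"(\w+)=(\d+)")
text = "a=1, bb=22"

all_pairs = pair.findall(text)

# Start searching at index 4, which skips the first pair.
m = pair.search(text, 4)
second = (m.group(1), m.group(2))
whole_span = m.span()
key_start = m.start(1)
value_end = m.end(2)

# Restrict the search window with endpos.
first_only = pair.search(text, 0, 3)

escaped = re.escape("a.b")
```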

The module must expose re.error, re.PatternError, re.RegexFlag, re.NOFLAG, re.ASCII/re.A, re.IGNORECASE/re.I, re.UNICODE/re.U, re.MULTILINE/re.M, re.DOTALL/re.S, and re.VERBOSE/re.X. re.LOCALE/re.L and re.DEBUG must fail explicitly under Lython's Unicode-only regex subset.

Bytes patterns and subjects are outside the supported regex surface unless the public bytes model is explicitly expanded. If unsupported regex features are used, the runtime must fail explicitly.

11.7.1 Supported Pattern Surface

The initial regex pattern surface must support exactly:

  • literal characters
  • escaping of metacharacters with backslash
  • .
  • character classes such as [abc]
  • character-class ranges such as [a-z]
  • negated character classes
  • grouping with (...)
  • alternation with |
  • quantifiers ?, *, +
  • bounded repetition {m}, {m,}, {m,n}
  • anchors ^ and $
  • common escapes such as \n, \r, \t, \\
  • shorthand character classes \d, \D, \s, \S, \w, \W

11.7.2 Replacement Surface

For re.sub(...), re.subn(...), compiled-pattern replacement methods, and Match.expand(...), the runtime must support Python-shaped replacement templates, numeric and named group references, and callable replacements whose return value is a string.
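A minimal sketch of the three replacement styles named above, using numbered groups only. The inputs are made up, and the behavior shown is CPython's.

```python
import re

# Template replacement with numeric group references.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")

# Callable replacement: the function receives a match and returns a string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group(0)) * 2), "a1 b22")

# subn also reports how many replacements were made.
collapsed, count = re.subn(r"\s+", " ", "a   b \t c")

# Match.expand applies a template to one existing match.
match = re.search(r"(\d+)-(\d+)", "range 10-20")
expanded = match.expand(r"\2..\1")
```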

11.7.3 Regex Semantics

Supported regex behavior must follow Python semantics for the supported subset.

Regex semantics are defined over Lython string values as Unicode text, not over raw byte sequences.

The canonical storage and interchange encoding for Lython text is UTF-8.

The runtime may choose any internal execution strategy, but the preferred implementation direction is UTF-8-native execution over the canonical PyString / UTF-8 text model rather than pervasive decode-to-.NET string bridging.

The full Python-compatible regex profile must be provided by a separate Utf8Regex.Python extension.

Lython accepts Utf8Regex and Utf8Regex.Python as implementation dependencies for import re.

Utf8Regex.Python is responsible for the Python regex syntax, Python regex replacement rules, and Python-visible regex errors of the supported subset.

Because Utf8Regex already supports UTF-8-oriented inputs and byte-aligned capture access, Lython should treat re as a UTF-8-native downstream subsystem, not as a justification for reintroducing .NET string as an internal semantic representation.

Where the runtime chooses to reject a regex feature that exists in Python, the failure must make clear that:

  • the script is using regex syntax outside the supported subset
  • the runtime is rejecting that syntax intentionally

11.8 JSON

This section is normative.

JSON support is part of the standard runtime.

The runtime must support the statement:

  • import json

The imported module must expose the following JSON helpers:

  • json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None)
  • json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None)
  • json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False)
  • json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False)
  • json.JSONDecodeError
  • json.JSONEncoder
  • json.JSONDecoder

load and dump are text-only and must operate on Lython text file handles. Binary file handles are outside the public file boundary.

loads must support object_hook, object_pairs_hook, parse_int, parse_float, and root-level parse_constant callbacks. Invalid JSON text must raise catchable JSONDecodeError with msg, doc, pos, lineno, and colno fields.

dumps and dump must support indentation, separators, key sorting, ensure_ascii, skipkeys, callable default, allow_nan, and circular-reference checks. Supported output values include None, booleans, strings, integers, floats, decimals, lists, tuples, and dictionaries. Dictionary keys may be strings, integers, finite floats, booleans, or None; unsupported keys fail unless skipkeys=True.

JSONEncoder and JSONDecoder are exposed only as explicit unsupported custom-class stubs. Passing non-None cls=... must fail explicitly.
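The options above can be illustrated with ordinary CPython calls. The helper name describe_error is hypothetical, and the documents are invented for the example.

```python
import json

# object_pairs_hook receives the decoded (key, value) pairs in document order.
pairs = json.loads('{"b": 1.5, "a": [1, 2]}', object_pairs_hook=list)

# Deterministic compact output: sorted keys and explicit separators.
compact = json.dumps({"b": 1, "a": [1, 2]}, sort_keys=True, separators=(",", ":"))

# ensure_ascii controls whether non-ASCII text is escaped.
escaped = json.dumps("é")
literal = json.dumps("é", ensure_ascii=False)

# Non-string keys are converted; unsupported keys are dropped with skipkeys=True.
converted = json.dumps({2: "two", None: "none"})
skipped = json.dumps({(1, 2): "x", "k": 1}, skipkeys=True)


def describe_error(text):
    """Return (lineno, colno, pos) for invalid JSON, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.pos
    return None
```

A repair loop can use the returned position to point at the offending character instead of re-parsing the message text.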

If a script requests JSON behavior outside the supported subset, the runtime must fail explicitly.

11.9 Structured Delimited Text

This section is normative.

Structured delimited text processing is part of the standard runtime.

The runtime must support the statement:

  • import csv

The imported module must expose:

  • csv.reader(csvfile[, dialect][, ...])
  • csv.writer([fileobj][, dialect][, ...])
  • csv.DictReader(f[, fieldnames][, restkey][, restval][, ...])
  • csv.DictWriter(fileobj, fieldnames[, restval][, extrasaction][, ...])
  • csv.Error
  • csv.QUOTE_MINIMAL
  • csv.QUOTE_ALL
  • csv.QUOTE_NONE
  • csv.QUOTE_NONNUMERIC

The object returned by csv.reader(...) must be iterable and subscript-compatible with the historical Lython helper. It must yield rows as ordered lists of strings and expose line_num.

The object returned by csv.DictReader(...) must be iterable and must yield dictionaries keyed by field name. It must expose fieldnames and line_num. If fieldnames is omitted, the first row supplies the field names. restkey and restval must handle extra and missing fields.

The object returned by csv.writer() without a file object accumulates output in memory. It must support:

  • writer.writerow(row)
  • writer.writerows(rows)
  • writer.getvalue()

The object returned by csv.writer(fileobj, ...) must write rows to a Lython writable text file handle. It may also retain the in-memory helper methods for compatibility.

The object returned by csv.DictWriter(fileobj, fieldnames, ...) must support:

  • writer.writeheader()
  • writer.writerow(rowdict)
  • writer.writerows(rowdicts)

The CSV subset must support ordinary scripts using:

  • delimiter, quotechar, quoting, doublequote, escapechar, skipinitialspace, lineterminator, and strict options
  • text file handles opened with open(...) or Path.open(...)
  • multiline quoted records across physical input lines
  • scalar writer conversion for strings, integers, floats, booleans, and None

The CSV subset does not support:

  • register_dialect, get_dialect, list_dialects, or unregister_dialect
  • field_size_limit
  • Sniffer

The default dialect is:

  • delimiter ,
  • quote character "
  • line terminator \n

If a script requests structured-text behavior outside the supported subset, the runtime must fail explicitly.
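The transpose workflow mentioned in the purpose section can be sketched as follows. This runs under CPython, where io.StringIO stands in for a Lython text file handle; whether Lython exposes io is not stated here, so treat that import as an assumption. The function names are hypothetical, and lineterminator is passed explicitly so the output matches the default dialect above.

```python
import csv
import io


def transpose_csv(text):
    """Parse delimited text, swap rows and columns, and serialize the result."""
    rows = list(csv.reader(io.StringIO(text)))
    columns = [list(col) for col in zip(*rows)]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(columns)
    return out.getvalue()


def read_records(text):
    """Read dictionaries, routing extra fields to 'rest' and filling gaps with ''."""
    reader = csv.DictReader(io.StringIO(text), restkey="rest", restval="")
    return [dict(row) for row in reader]
```

In a contained script the io.StringIO objects would typically be replaced by handles from open(...) or Path.open(...).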

11.10 Filename Matching

This section is normative.

Filename-pattern matching is part of the standard runtime.

The runtime must support the statement:

  • import fnmatch

The imported module must expose exactly the following functions:

  • fnmatch.fnmatch(name, pattern)
  • fnmatch.fnmatchcase(name, pattern)
  • fnmatch.filter(names, pattern)
  • fnmatch.translate(pattern)

Both fnmatch and fnmatchcase use deterministic POSIX-like case-sensitive matching. Lython does not apply os.path.normcase or host-platform case folding.

Patterns must support *, ?, bracket character classes, negated [!...] classes, ranges such as [a-z], and a literal leading ] inside a class, and must treat malformed or unterminated classes as literal text. filter(names, pattern) must preserve input order and return the matching names as strings.

fnmatch.translate(pattern) returns an anchored regex string compatible with Lython re. The exact regex spelling is Lython-defined and need not match CPython's internal translation text.

Filename-pattern behavior must follow Python semantics for the supported subset in a deterministic, platform-independent way.
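A small sketch of the matching rules, using fnmatchcase so the results are the same under CPython on any platform. The file names are illustrative.

```python
import fnmatch
import re

names = ["main.py", "README.md", "test_main.py", "notes.MD"]

# filter preserves the input order of matching names.
python_files = fnmatch.filter(names, "*.py")

# Matching is case-sensitive.
lower_md = fnmatch.fnmatchcase("README.md", "*.md")
upper_md = fnmatch.fnmatchcase("notes.MD", "*.md")

# Negated classes and ranges.
not_digit = fnmatch.fnmatchcase("fileA.txt", "file[!0-9].txt")
is_digit = fnmatch.fnmatchcase("file1.txt", "file[!0-9].txt")

# An unterminated class is treated as literal text.
literal_bracket = fnmatch.fnmatchcase("a[b", "a[b")

# translate returns an anchored regex usable with re.
regex = fnmatch.translate("*.py")
full = re.match(regex, "main.py") is not None
partial = re.match(regex, "main.pyc") is not None
```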

11.11 Structured Text Processing

Structured text processing is in scope.

The runtime must support the ability to:

  • read structured delimited text
  • parse rows in-process
  • transform tabular data structures in memory
  • write the transformed output elsewhere

This capability is provided through the csv surface defined in section 11.9 together with the builtins defined in sections 11.2 through 11.6.

11.12 Host-Mediated Subprocesses

Process execution is optional and must remain host-mediated.

When the host provides a subprocess capability, Lython may expose a contained subset of Python's subprocess module:

  • subprocess.run(...)
  • subprocess.call(...)
  • subprocess.check_call(...)
  • subprocess.check_output(...)
  • subprocess.CompletedProcess(args, returncode, stdout=None, stderr=None)
  • subprocess.CalledProcessError
  • subprocess.SubprocessError
  • subprocess.list2cmdline(seq)
  • subprocess.PIPE
  • subprocess.STDOUT
  • subprocess.DEVNULL

The subprocess request sent to the host must carry the command arguments, optional cwd, optional environment, stdin bytes, stream modes, shell/text-mode flags, encoding and error-mode requests, timeout, and output bounds. The host remains authoritative over whether a command may run, how streams are connected, whether shell execution is allowed, and what process environment is used.

Checked failures must raise catchable CalledProcessError with returncode, cmd, output, stdout, and stderr fields. CompletedProcess.check_returncode() must raise the same exception type for non-zero return codes. Handlers written as except SubprocessError must also catch these subprocess-specific checked failures.

The supported subprocess surface does not include Popen, TimeoutExpired, getoutput, getstatusoutput, background processes, unmanaged pipes, or ambient shell authority. Unsupported helpers must fail explicitly without broadening shell integration.
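The checked-failure shape can be shown without spawning anything by constructing a CompletedProcess directly. This is a sketch under CPython semantics; the command name lint is a placeholder, not a real tool the host is expected to allow.

```python
import subprocess

# A result as the host might report it for a failed command.
done = subprocess.CompletedProcess(
    args=["lint", "src"], returncode=2, stdout="", stderr="2 problems"
)

try:
    done.check_returncode()
    failure = None
except subprocess.SubprocessError as exc:
    # CalledProcessError is caught here because it derives from SubprocessError.
    failure = {
        "type": type(exc).__name__,
        "returncode": exc.returncode,
        "cmd": exc.cmd,
        "output": exc.output,
        "stderr": exc.stderr,
    }

# list2cmdline quotes arguments that contain whitespace.
command_line = subprocess.list2cmdline(["grep", "two words", "f.txt"])
```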

11.13 Line Diffs

Line-oriented diffing is part of the standard runtime.

The runtime must support the statement:

  • import difflib

The imported module must expose:

  • difflib.unified_diff(...)
  • difflib.context_diff(...)
  • difflib.ndiff(...)
  • difflib.restore(...)
  • difflib.get_close_matches(...)
  • difflib.diff_bytes(...)
  • difflib.Differ
  • difflib.HtmlDiff
  • difflib.SequenceMatcher
  • difflib.IS_LINE_JUNK
  • difflib.IS_CHARACTER_JUNK

The supported difflib surface must follow Python semantics for ordinary line-diff scripts, including junk predicates, grouped opcodes, Match-shaped matching blocks, close-match ranking, intraline ? hints, byte-preserving diff_bytes(...), and simple HTML table/file generation.

Diff helpers may return materialized lists rather than lazy generators. This is an intentional containment and resource-accounting choice; scripts that iterate, join, or wrap the result in list(...) must continue to behave like ordinary Python for the supported subset.
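The sketch below wraps each result in list(...), so it behaves the same whether the helper returns a generator, as in CPython, or a materialized list. The sample lines and file labels are illustrative.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

# Unified diff as a list of lines, ready to join and write to a file.
diff = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))
patch_text = "".join(diff)

# ndiff output can be turned back into either input sequence.
restored = list(difflib.restore(difflib.ndiff(old, new), 2))

# Close-match ranking, best candidate first.
suggestions = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
```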

11.14 Module Discovery

Contained module discovery is available through pkgutil.

The runtime must support:

  • pkgutil.ModuleInfo
  • pkgutil.iter_modules(...)
  • pkgutil.walk_packages(...)
  • pkgutil.find_loader(...)
  • pkgutil.get_loader(...)
  • pkgutil.extend_path(...)
  • pkgutil.resolve_name(...)

Default discovery is limited to built-in modules and explicitly allowed local modules. Explicit path discovery is host-mediated through text-shaped file and directory primitives; .py files and package directories with __init__.py are discoverable only when the local module allowlist permits the resulting module name or source path.

ModuleInfo must behave like Python's tuple-shaped record for ordinary agent scripts: attribute access, unpacking, indexing, equality, _fields, _asdict, _replace, count, and index are supported. Discovery helpers may return materialized lists instead of lazy generators.
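The tuple-shaped behavior can be sketched by constructing a record directly, as CPython allows. The module name tools.fmt is a made-up example.

```python
import pkgutil

info = pkgutil.ModuleInfo(module_finder=None, name="tools.fmt", ispkg=False)

# Unpacking and indexing work as for any three-element tuple.
finder, name, ispkg = info
second = info[1]

# Named-tuple helpers.
as_dict = info._asdict()
promoted = info._replace(ispkg=True)
```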

Importer and resource helpers that would expose ambient import machinery or binary resource reads remain unsupported by Lython: get_importer, iter_importers, iter_importer_modules, iter_zipimport_modules, get_data, and read_code.

11.15 Runtime Metadata

Lython exposes a contained sys view for ordinary runtime feature checks.

Supported metadata includes sys.version, sys.version_info, sys.hexversion, sys.implementation, sys.platform, sys.maxsize, sys.byteorder, sys.prefix, sys.base_prefix, sys.executable, sys.getdefaultencoding(), sys.path, sys.modules, sys.builtin_module_names, and sys.stdlib_module_names.

These values describe the Lython runtime and its contained import surface, not the host process, host executable path, host PATH, or ambient CPython installation. sys.path is the Lython local import base derived from SourcePath or host cwd. sys.modules is an inspection snapshot of Lython builtins and imported local modules.

sys.exc_info() reports the active handled exception inside an except block and (None, None, None) outside exception handling. sys.exit(...) raises a catchable SystemExit value whose code and args payload follow Python's ordinary shape for the supported subset.
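A minimal sketch of the SystemExit payload and of sys.exc_info() inside a handler, assuming CPython-compatible behavior. The function name run_and_report is hypothetical.

```python
import sys


def run_and_report(code):
    """Call sys.exit(code) and describe the exception that results."""
    try:
        sys.exit(code)
    except SystemExit as exc:
        # Inside the handler, exc_info reports the active exception.
        exc_type, exc_value, _ = sys.exc_info()
        return exc.code, exc.args, exc_type is SystemExit, exc_value is exc
```

Because SystemExit is catchable, a script can intercept its own exit request, and an embedder can map an uncaught one to a result code.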

Global process mutation and debugging hooks such as settrace, setprofile, setrecursionlimit, addaudithook, and audit remain unsupported and must fail explicitly.

11.16 CLI Argument Parsing

Contained command-line parsing is available through argparse.

The runtime must support ordinary agent-authored CLI parsers, including:

  • ArgumentParser(prog=None, usage=None, description=None, epilog=None, formatter_class=None, add_help=True, allow_abbrev=True, exit_on_error=True)
  • Namespace(**kwargs)
  • text-only, host-mediated FileType
  • HelpFormatter, RawDescriptionHelpFormatter, RawTextHelpFormatter, and ArgumentDefaultsHelpFormatter
  • SUPPRESS, OPTIONAL, ZERO_OR_MORE, ONE_OR_MORE, PARSER, and REMAINDER
  • add_argument(...), add_mutually_exclusive_group(...), parse_args(...), parse_known_args(...), format_usage(), format_help(), print_usage(...), print_help(...), error(...), exit(...), set_defaults(...), and get_default(...)
  • short options, long options, multiple aliases, --option=value, compact short no-value flags, dashed-name destination normalization, choices, required, default, metavar, help, and default=SUPPRESS
  • nargs values None, "?", "*", "+", and fixed positive integers for supported value-taking actions
  • actions store, store_true, store_false, append, store_const, count, and version

Type converters must accept ordinary callables, including int, float, str, pathlib.Path, and callables that raise ArgumentTypeError.

Parser failures use Python-shaped SystemExit with status code 2 by default. When exit_on_error=False, parse failures raise ArgumentError for the supported subset.

Advanced parser composition and ambient file expansion remain unsupported and must fail explicitly: fromfile_prefix_chars, parent parsers, subparsers, conflict handlers, non-default prefix character models, parser-wide argument_default, and custom Action subclasses.
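The following sketch stays inside the supported surface: count and store_true actions, a typed option, a nargs="+" positional, and exit_on_error=False. The program name and options are invented for the example.

```python
import argparse


def build_parser():
    """Build a small parser using only actions and nargs listed as supported."""
    parser = argparse.ArgumentParser(prog="rewrite", exit_on_error=False)
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--limit", type=int, default=10)
    parser.add_argument("paths", nargs="+")
    return parser


args = build_parser().parse_args(["-vv", "--dry-run", "--limit=5", "a.txt", "b.txt"])

# With exit_on_error=False a bad value raises ArgumentError instead of exiting.
try:
    build_parser().parse_args(["--limit=abc", "a.txt"])
    type_error = None
except argparse.ArgumentError as exc:
    type_error = str(exc)
```

Note that --dry-run is stored as args.dry_run, which is the dashed-name destination normalization mentioned above.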


12. Host Capability Interface

The host interface is a required part of the standalone package design.

The host model must not assume a local filesystem. It must be capable of being implemented on top of a virtual file tree, repository snapshot, in-memory workspace, or similar host-managed path space.

12.1 Minimum Expected Capabilities

The host must provide operations such as:

  • read UTF-8 text bytes at a path
  • write UTF-8 text bytes at a path
  • append UTF-8 text bytes at a path
  • list entries under a path
  • test existence
  • get metadata
  • create path container
  • copy
  • move
  • delete

The meaning of paths, roots, separators, normalization, and case sensitivity belongs to the host contract, not to the language runtime.

For text resources, the host boundary must be UTF-8-explicit. Lython should not rely on a host-provided .NET string text abstraction as the normative transport shape for file contents.
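To make the capability list concrete, here is a Python model of a host backed by an in-memory file tree. It is a sketch only: the real host interface is a .NET type whose member names are not given in this document, so the class name and every method name below are hypothetical. Note that all text payloads cross the boundary as UTF-8 bytes.

```python
import posixpath


class InMemoryHost:
    """Hypothetical model of a host-managed path space held in memory."""

    def __init__(self):
        self._files = {}      # path -> UTF-8 encoded bytes
        self._dirs = {"/"}    # known container paths

    def _require_parent(self, path):
        if posixpath.dirname(path) not in self._dirs:
            raise RuntimeError("parent does not exist: " + path)

    def create_container(self, path):
        self._require_parent(path)
        self._dirs.add(path)

    def write_text_bytes(self, path, data):
        self._require_parent(path)
        self._files[path] = bytes(data)

    def append_text_bytes(self, path, data):
        self._require_parent(path)
        self._files[path] = self._files.get(path, b"") + bytes(data)

    def read_text_bytes(self, path):
        if path not in self._files:
            raise RuntimeError("no such file: " + path)
        return self._files[path]

    def exists(self, path):
        return path in self._files or path in self._dirs

    def list_entries(self, path):
        known = list(self._files) + list(self._dirs)
        names = [
            posixpath.basename(p)
            for p in known
            if p != path and posixpath.dirname(p) == path
        ]
        return sorted(names)
```

A model like this is also a convenient test double: a script can be run against it and the resulting tree inspected without touching a real filesystem.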

Lython string values remain Unicode text values at the language level, but the runtime must not let raw .NET string behavior become the language contract for Python-like strings.

The UTF-8 requirement applies both to script text and to text-resource interchange at the host boundary. Internally, the runtime should model Python-like string values through a dedicated text subsystem rather than treating host-language string operations as the semantic definition of str.

Run options may include a contained environment mapping. That mapping is script-visible through os.environ and related helpers, is mutable for the duration of the run, and must be initialized from embedder-provided values rather than from the ambient process environment.

12.2 Optional Capabilities

Optional capabilities are limited to:

  • globbing
  • patch application
  • streams
  • pipes
  • subprocess execution

These capabilities must remain explicit and capability-bound.

Regex support must not depend on an optional host capability. It is part of the base runtime.

12.3 No Direct Machine Access

The Lython runtime itself must not bypass the host interface.


13. Error Model

Diagnostics are critical because the language is intended to be used by both humans and automation agents.

13.1 Parse Errors

Parse errors must include:

  • line
  • column
  • message
  • script name when available

13.2 Unsupported Feature Errors

When valid Python syntax is rejected because it is outside the supported subset, the error must make that explicit.

It must say, in effect:

  • this is valid Python
  • Lython does not support it

This is important both for users and for agents attempting repair.

These errors must be produced before execution begins whenever practical.

13.3 Runtime Errors

Runtime errors must include:

  • source location when available
  • call trace
  • failing operation context
  • host failure context when relevant

13.4 Error Philosophy

Errors must be:

  • deterministic
  • concise
  • structured enough for machines to consume

14. Determinism and Resource Control

Lython must be deterministic enough for automation and testing.

14.1 Deterministic Behavior

Given the same:

  • script
  • globals
  • runtime version
  • host responses

execution must behave identically.

14.2 Resource Constraints

The runtime must allow the embedding host to constrain:

  • execution time
  • memory use
  • interpreter work / execution steps
  • recursion depth
  • maximum string size
  • maximum collection size
  • maximum number of host calls
  • cancellation

The first version does not need to implement every possible quota intrinsically, but it must expose practical control points.

14.3 Cooperative Cancellation

Script execution must be cooperatively cancellable by the host.

Cancellation must be driven by a host-provided cancellation mechanism and must be checked at high frequency across:

  • statement dispatch
  • expression evaluation
  • loops
  • function calls
  • builtin calls
  • module operations

Cancellation must terminate script execution safely with an explicit runtime failure rather than leaving the host to rely on thread aborts or similar unsafe interruption mechanisms.

The public embedding API may expose synchronous execution, asynchronous execution, or both, but host-safe cancellation must not depend on unsafe preemptive interruption.
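Cooperative cancellation can be modeled as a flag that the interpreter loop polls before each unit of work. The sketch below is a Python model, not the runtime's code: threading.Event stands in for the host-provided cancellation mechanism, and the names ScriptCancelled and run_statements are hypothetical.

```python
import threading


class ScriptCancelled(RuntimeError):
    """Explicit runtime failure raised when the host requests cancellation."""


def run_statements(statements, cancel_event):
    """Run callables in order, checking for cancellation before each one."""
    completed = []
    for statement in statements:
        if cancel_event.is_set():
            raise ScriptCancelled(
                "cancelled after %d statements" % len(completed)
            )
        completed.append(statement())
    return completed
```

The important property is that the script stops at a well-defined checkpoint with a normal exception, so no thread abort or other preemptive interruption is needed.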

14.4 Stack Safety

Python-level recursion must never be allowed to become an uncontrolled CLR stack overflow.

The runtime must therefore enforce an explicit recursion budget before host stack exhaustion is possible.

If internal implementation strategies introduce additional recursive paths, those paths must be bounded or rewritten so they cannot bypass the script-level recursion limit.
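One way to picture the recursion budget is a depth counter that is charged on every script-level call and refuses entry once the limit is reached. This is an illustrative Python model; the class and function names are hypothetical.

```python
class RecursionBudgetExceeded(RuntimeError):
    """Raised before the call depth can threaten the host stack."""


class CallDepthGuard:
    """Context manager that counts nested script-level calls."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self.depth = 0

    def __enter__(self):
        if self.depth >= self.max_depth:
            raise RecursionBudgetExceeded("depth limit %d reached" % self.max_depth)
        self.depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self.depth -= 1
        return False  # never swallow exceptions


def count_down(guard, n):
    """A recursive 'script function' whose every call passes through the guard."""
    with guard:
        return 0 if n == 0 else 1 + count_down(guard, n - 1)
```

The limit is chosen well below the depth at which the host stack would overflow, so the failure is always an ordinary catchable error.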

14.5 Execution-Step Budget

Interpreter work must be constrainable by an explicit execution-step budget.

This budget exists in addition to cancellation and host-call limits. It must be capable of stopping runaway scripts that:

  • loop forever without allocating heavily
  • avoid host calls
  • remain inside pure interpreter work

The exact internal notion of a “step” is implementation-defined, but it must be stable enough to act as a practical host-protection mechanism.
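A step budget can be sketched as a counter that is charged once per loop iteration or dispatched operation. The model below is illustrative Python with hypothetical names; what counts as one step in the real runtime is implementation-defined, as stated above.

```python
class StepBudgetExceeded(RuntimeError):
    """Raised when a script uses more interpreter steps than allowed."""


class StepBudget:
    def __init__(self, limit):
        self.remaining = limit

    def charge(self, cost=1):
        """Consume `cost` steps, failing once the budget is exhausted."""
        self.remaining -= cost
        if self.remaining < 0:
            raise StepBudgetExceeded("execution-step budget exhausted")


def spin_forever(budget):
    """A runaway loop: no allocation, no host calls, only interpreter work."""
    count = 0
    while True:
        budget.charge()
        count += 1
```

Neither a memory limit nor a host-call limit would stop spin_forever; only the step budget, or cancellation from outside, ends it.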

14.6 Memory Budgets

The runtime must expose practical in-process memory budgets that reduce the risk of runaway scripts exhausting host memory.

These budgets must include explicit limits for at least:

  • string growth
  • collection growth
  • other interpreter-managed runtime structures where practical

The runtime may approximate memory pressure conservatively rather than measuring exact byte-for-byte CLR allocation cost.

Such approximations must err on the safe side: it is acceptable to reject scripts earlier than a perfect memory model would require if doing so better protects the host.

Lython does not need to promise a perfect hard memory ceiling from inside the same managed process, but it must provide conservative in-process resource controls that make runaway allocation materially harder.


15. Text Model

Lython is text-first.

15.1 Encoding

Scripts are UTF-8 text.

Text-resource operations must use UTF-8.

The canonical host-I/O boundary for text resources is UTF-8 bytes. A host implementation may internally decode or encode however it likes, but the observable contract with Lython is UTF-8 text interchange rather than host-native string transport.

15.2 Newline Behavior

Text-editing operations must define newline handling explicitly.

Lython uses Linux newline conventions.

Text produced by the runtime must use \n as the newline separator.

When reading text resources, hosts may accept other newline encodings as input, but the runtime text model is normalized to \n.
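A minimal sketch of the normalization, assuming that the other newline encodings a host accepts are \r\n and bare \r. The function name is hypothetical.

```python
def normalize_newlines(text):
    """Map \\r\\n and bare \\r to \\n; replace \\r\\n first so it is not doubled."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```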

Lython string values are Unicode text values. UTF-8 is the canonical script and text-resource encoding. String operations are defined over Unicode text semantics, not over host-specific string representations.

This distinction is important:

  • language-level str values are text values
  • host text-resource transport is UTF-8

Lython must not collapse these two concerns into an implementation-defined host string abstraction.

15.3 Internal String Representation

The runtime should treat Python-like strings as a dedicated subsystem, not as incidental .NET string values.

In particular:

  • the semantic definition of len, indexing, slicing, search, replacement, splitting, formatting, and regex interaction must not be whatever raw .NET string happens to do
  • the runtime should prefer a UTF-8-native or otherwise dedicated internal string representation for Python-like str values
  • if host-language string values are used transiently inside the implementation, they must remain an implementation detail rather than the semantic source of truth

The intended direction is stronger than a UTF-8-aware core plus string-shaped downstream modules. Text-sensitive subsystems such as:

  • re
  • json
  • csv
  • fnmatch
  • glob
  • formatting and interpolation
  • file/text-handle behavior

should operate directly on PyString values and/or their underlying UTF-8 representation whenever practical.

If an external library forces a string boundary, that boundary must remain narrow and explicit. It must not cause downstream runtime logic to become string-centric again.

The design goal is that externally visible string behavior is defined by Python Unicode semantics and by this specification, not by UTF-16 code-unit behavior inherited accidentally from the host runtime.
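The difference between code-point semantics and UTF-16 code-unit semantics shows up with any character outside the Basic Multilingual Plane. The sketch below runs under CPython; it calls encode only to count storage units for comparison, which a Lython script could not do because the initial subset has no public bytes model.

```python
s = "a\U0001F600b"   # 'a', one emoji code point, 'b'

# Python semantics: lengths and indexes count code points.
code_points = len(s)
middle = s[1]
reversed_text = s[::-1]
index_of_b = s.find("b")

# For comparison only: storage units in two encodings.
utf16_units = len(s.encode("utf-16-le")) // 2
utf8_bytes = len(s.encode("utf-8"))
```

A runtime that exposed raw UTF-16 behavior would report a length of 4 and place "b" at index 3, which is exactly the drift this section rules out.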

15.4 Bytes

The initial subset does not expose a public bytes model.

If a public bytes model is added later, it must be introduced as an explicit extension without changing the text-first semantics of the initial subset.


16. Public Embedding API

The standalone package must expose a high-level API centered on Python scripts, not on grammar-authoring mechanics.

16.1 Minimum Operations

Consumers must be able to:

  • parse source
  • compile source
  • execute source
  • execute compiled code
  • inspect diagnostics

16.2 Conceptual Shape

The exact names are not prescribed, but the API must follow this conceptual shape:

var engine = new LythonEngine(options);
var compiled = engine.Compile(source);
var result = await compiled.RunAsync(host, globals, cancellationToken);

The host-facing text-resource contract behind host should be UTF-8-explicit. In particular, the preferred public host abstraction should expose UTF-8 payloads for text-resource reads and writes, while Lython itself continues to present Python str values as text through a dedicated runtime string model rather than through accidental .NET string semantics.

16.3 Separation of Concerns

Consumers who only want the scripting runtime must not need to depend on grammar-authoring infrastructure.


17. Packaging Direction

Lython must be packaged as a standalone embeddable runtime.

The scripting runtime must be a first-class product.

If separate grammar tooling exists, it must remain separate enough that:

  • embedding the scripting runtime is simple
  • package dependencies stay small
  • consumers are not forced into source-generator workflows

The accepted dependency set for the scripting runtime includes Utf8Regex and Utf8Regex.Python for the re module surface.


18. Compatibility-Sensitive Areas

The following areas must be treated as highly compatibility-sensitive:

  • syntax acceptance
  • truthiness
  • scoping
  • function call rules
  • list and dict behavior
  • string behavior
  • regex behavior
  • builtin names and behavior
  • file-operation semantics

Behavioral drift in these areas is costly because agents will assume Python semantics.


19. Initial Supported Subset

The initial supported subset consists of the exact Python-surface subset defined in section 8, together with:

  • the re, json, csv, and fnmatch subsets defined in sections 11.7 through 11.10
  • structured delimited text processing and tabular reshaping
  • controlled path-space operations
  • host-mediated subprocess execution as defined in section 11.12, when the host provides the capability

The initial subset explicitly excludes ambient shell command execution outside host mediation.

This subset must be sufficient for a coding agent to naturally write scripts such as:

  • read a file
  • modify the text
  • write it back
  • read a structured delimited text file, reshape it in memory, and write another file
  • loop over files in a folder
  • run regex-based replacements across files
  • check preconditions and fail if they are not met

20. Success Criteria

Lython succeeds if:

  • a coding agent can usually write ordinary short Python for supported edit workflows
  • the agent does not need to learn a separate DSL
  • unsupported features fail clearly
  • the runtime remains contained and embeddable
  • the host retains full control over effects

Lython fails if:

  • agents must learn “special Lython syntax”
  • Python-looking code behaves unlike Python without explicit warning
  • the runtime becomes an uncontrolled general-purpose execution environment

21. Summary

Lython is a safe embedded runtime for a strict subset of Python.

Its defining properties are:

  • Python compatibility by default
  • explicit unsupported-feature boundaries
  • contained host-mediated effects
  • usefulness for text and file manipulation
  • deterministic and manageable behavior

The right mental model is:

  • not a full Python runtime
  • not a DSL
  • a controlled Python subset for automation