Introducing 墨砚 Moyan

A structure-aware text editor built on a persistent document database.
What does this mean? The editor is designed to go much further than other writing applications in acknowledging that language is not simply a stream of text: words, phrases, sentences, paragraphs. Spoken and written language is presented as a one-dimensional stream, but it is actually a complex network in which words, phrases, sentences, paragraphs, sections, chapters, and documents are connected to other words, phrases, sentences, and so on. This structure is often assumed to be something like a tree: a hierarchy in which the meaning of a text is not an aggregate of all the words used, but emerges from the interaction of all its words, phrases, sentences, and larger units. This is why statistical analysis of texts can give only limited understanding. A text is not a heap of words; it is a complex system.

At one end of the spectrum of current editors are plain text editors (such as Vim, Emacs, and Notepad), where the user types text as a stream of characters and the application represents it as a block of bytes. Any structure in the document is provided by a syntax layer on top of the characters, such as the markdown # Heading syntax.

At the other end of the spectrum are WYSIWYG (What You See Is What You Get) word processors (such as Microsoft Word, Google Docs, and LibreOffice Writer), where writing and presentation are combined, and both the presentation layer and the underlying structure mark a span of text as a heading, list, table, bold, underline, link, reference, and so on. This is extremely useful both for converting the document to another format and for computational linguistic analysis of the document. However, partly to stay backwards compatible with Microsoft Word, originally built in 1983, successive versions of Word, LibreOffice, and Google Docs have used an XML-based syntax as the document format.
This by itself is not ideal. Moyan aims to have the memory efficiency and speed of a gap buffer when writing, over a range of file sizes, with a native semantic tree structure inspired by the Pandoc AST design.

The first major architectural milestone of the provenance-first text editor project has now been completed.

A localized mutable gap buffer system has been integrated into the Micro editor prototype as a temporary hot editing overlay over the existing []Line backend.

The current implementation preserves Micro’s existing rendering, search, syntax highlighting, save, undo/redo, and split-buffer infrastructure while introducing a new editing model based on localized mutable regions.

The next phase is to replace the current []Line backend with a persistent B+ tree.

Core architectural principle

The document structure is canonical. The gap buffer is a temporary mutable overlay.

The canonical document structure remains authoritative at all times. The gap buffer exists only for fast localized editing. Whole-document operations flush the hot edit region first.

This rule keeps the implementation simple: character edits go into the gap buffer, while save, backup, search navigation, hashing, serialization, and other whole-document operations read from committed document structure.
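The rule above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the prototype's code: the names GapBuffer, Document, InsertRune, Commit, and Text are hypothetical, a plain []string stands in for the []Line backend, and only insertion into a single hot line is shown.

```go
package main

import "strings"

// GapBuffer holds one line as runes with a movable gap at the edit position.
// Hypothetical type for illustration; not Micro's API.
type GapBuffer struct {
	buf      []rune
	gapStart int // first index of the gap
	gapEnd   int // one past the last index of the gap
}

func NewGapBuffer(text string, capacity int) *GapBuffer {
	r := []rune(text)
	if capacity < len(r) {
		capacity = len(r)
	}
	buf := make([]rune, capacity)
	copy(buf, r)
	return &GapBuffer{buf: buf, gapStart: len(r), gapEnd: capacity}
}

// Len returns the number of runes of text, excluding the gap.
func (g *GapBuffer) Len() int { return len(g.buf) - (g.gapEnd - g.gapStart) }

// moveGap shifts text across the gap until the gap starts at pos.
func (g *GapBuffer) moveGap(pos int) {
	for g.gapStart > pos {
		g.gapStart--
		g.gapEnd--
		g.buf[g.gapEnd] = g.buf[g.gapStart]
	}
	for g.gapStart < pos {
		g.buf[g.gapStart] = g.buf[g.gapEnd]
		g.gapStart++
		g.gapEnd++
	}
}

// grow reallocates with a larger gap, keeping the text on both sides.
func (g *GapBuffer) grow() {
	newCap := len(g.buf)*2 + 8
	nb := make([]rune, newCap)
	copy(nb, g.buf[:g.gapStart])
	tail := len(g.buf) - g.gapEnd
	copy(nb[newCap-tail:], g.buf[g.gapEnd:])
	g.buf = nb
	g.gapEnd = newCap - tail
}

// Insert places r at rune offset pos; out-of-range positions are ignored.
func (g *GapBuffer) Insert(pos int, r rune) {
	if pos < 0 || pos > g.Len() {
		return
	}
	if g.gapStart == g.gapEnd {
		g.grow()
	}
	g.moveGap(pos)
	g.buf[g.gapStart] = r
	g.gapStart++
}

func (g *GapBuffer) String() string {
	return string(g.buf[:g.gapStart]) + string(g.buf[g.gapEnd:])
}

// Document pairs the canonical lines with at most one hot line.
type Document struct {
	lines   []string   // canonical structure (stand-in for []Line)
	hot     *GapBuffer // nil when no line is being edited
	hotLine int
}

// InsertRune routes a character edit into the gap buffer only.
func (d *Document) InsertRune(line, col int, r rune) {
	if d.hot == nil || d.hotLine != line {
		d.Commit() // flush any other hot line before materializing this one
		d.hot = NewGapBuffer(d.lines[line], len(d.lines[line])+16)
		d.hotLine = line
	}
	d.hot.Insert(col, r)
}

// Commit flushes the hot region back into the canonical structure.
func (d *Document) Commit() {
	if d.hot == nil {
		return
	}
	d.lines[d.hotLine] = d.hot.String()
	d.hot = nil
}

// Text is a whole-document operation, so it commits first.
func (d *Document) Text() string {
	d.Commit()
	return strings.Join(d.lines, "\n")
}
```

In this sketch the canonical lines stay unchanged while typing, and any whole-document read such as Text sees the edits only because it calls Commit first.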

What has been completed

The prototype now includes a standalone gap buffer, localized edit regions, projection-based line rendering, hot-region routing logic, structural edit detection, and a commit-before-global-operation coherence model.

The Micro buffer subsystem has also been reviewed for direct storage access, save-path correctness, backup behaviour, concurrency hazards, and abstraction boundaries.

Current prototype architecture

The editor now materializes the active editing area into a temporary gap buffer. Normal character-level edits occur inside that hot mutable region instead of directly mutating the full document structure on every keystroke.

The existing LineArray remains the canonical document structure for the prototype. The gap buffer is flushed into it before whole-document operations, and lines are projected from it during viewport-level reads.
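One way to read "projection" here is that a viewport read overlays the hot line on top of the canonical lines without committing. The sketch below assumes that interpretation; View, LineAt, and Viewport are hypothetical names, and a plain string stands in for the gap buffer contents.

```go
package main

// View is a hypothetical read-side wrapper: canonical lines plus one hot line.
type View struct {
	lines   []string // canonical structure (stand-in for LineArray)
	hotLine int      // index of the line held in the hot region, or -1
	hotText string   // current contents of the hot line (stand-in for the gap buffer)
}

// LineAt projects line i: the hot text if i is being edited, else the canonical line.
func (v *View) LineAt(i int) string {
	if i == v.hotLine {
		return v.hotText
	}
	return v.lines[i]
}

// Viewport returns up to height projected lines starting at top, without committing.
func (v *View) Viewport(top, height int) []string {
	if top < 0 {
		top = 0
	}
	if height < 0 {
		height = 0
	}
	out := make([]string, 0, height)
	for i := top; i < len(v.lines) && i < top+height; i++ {
		out = append(out, v.LineAt(i))
	}
	return out
}
```

The read path never mutates the canonical lines, so rendering stays cheap and the commit is deferred until a whole-document operation needs it.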

Structural edits, such as newline insertion or cross-line deletion, commit the active gap buffer first, then update the document structure directly.
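The routing decision described above might look like the following. It is a sketch only: EditKind, ClassifyInsert, ClassifyDelete, and SplitLine are hypothetical names, and the caller is assumed to have committed the gap buffer before calling SplitLine on the canonical lines.

```go
package main

import "strings"

// EditKind says where an edit should be applied.
type EditKind int

const (
	HotEdit        EditKind = iota // stays inside the gap buffer
	StructuralEdit                 // commit first, then change the line structure
)

// ClassifyInsert treats any insertion containing a line break as structural.
func ClassifyInsert(text string) EditKind {
	if strings.ContainsAny(text, "\r\n") {
		return StructuralEdit
	}
	return HotEdit
}

// ClassifyDelete treats a deletion that spans more than one line as structural.
func ClassifyDelete(startLine, endLine int) EditKind {
	if startLine != endLine {
		return StructuralEdit
	}
	return HotEdit
}

// SplitLine applies a newline insertion directly to the canonical lines.
// col is a rune offset; the input slice is left unmodified.
func SplitLine(lines []string, line, col int) []string {
	r := []rune(lines[line])
	out := make([]string, 0, len(lines)+1)
	out = append(out, lines[:line]...)
	out = append(out, string(r[:col]), string(r[col:]))
	out = append(out, lines[line+1:]...)
	return out
}
```

Keeping the classification separate from the mutation means the hot path never has to reason about line counts changing underneath it.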

Major findings

The coherence model has been simplified successfully. Earlier concerns around stale line counts, coordinate translation, projected offsets, multi-region synchronization, and viewport inconsistency are resolved by one rule: flush the hot edit region before any whole-document operation.

The review also identified critical save and backup hazards. Some save and backup paths read line data directly from the underlying document structure, which could bypass dirty gap-buffer state. The fix is to commit on the main goroutine before dispatching save or backup work.

Next phase: persistent B+ tree backend

The next major step is to replace LineArray with a persistent copy-on-write B+ tree.

The B+ tree will become the canonical document structure. The gap buffer will remain a temporary hot mutable overlay. Flushing will become a write to B+ tree leaves instead of a write back to []Line.
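To show what a copy-on-write flush into leaves could look like, here is a deliberately reduced sketch. It demonstrates path copying only: there is no node splitting, rebalancing, or sibling linking, lines are addressed by position through per-node counts (an assumption about the eventual design), and node, leaf, internal, Get, and Set are hypothetical names.

```go
package main

// node is either a leaf holding lines or an internal node holding children.
type node struct {
	children []*node  // non-nil for internal nodes
	lines    []string // used by leaves
	count    int      // number of lines stored under this node
}

func leaf(lines ...string) *node {
	return &node{lines: lines, count: len(lines)}
}

func internal(children ...*node) *node {
	n := &node{children: children}
	for _, c := range children {
		n.count += c.count
	}
	return n
}

func (n *node) isLeaf() bool { return n.children == nil }

// Get returns line i by descending through the per-child counts.
func (n *node) Get(i int) string {
	if n.isLeaf() {
		return n.lines[i]
	}
	for _, c := range n.children {
		if i < c.count {
			return c.Get(i)
		}
		i -= c.count
	}
	panic("line index out of range")
}

// Set returns a new root in which line i is replaced by s.
// Only nodes on the path to the leaf are copied; all other subtrees are shared,
// so the old root remains a valid, unchanged snapshot.
func (n *node) Set(i int, s string) *node {
	cp := &node{count: n.count}
	if n.isLeaf() {
		cp.lines = append([]string(nil), n.lines...)
		cp.lines[i] = s
		return cp
	}
	cp.children = append([]*node(nil), n.children...)
	for k, c := range n.children {
		if i < c.count {
			cp.children[k] = c.Set(i, s)
			return cp
		}
		i -= c.count
	}
	panic("line index out of range")
}
```

In this model a flush of the hot region is a single Set call that yields a new root, while the previous root continues to describe the document as it was before the flush.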

This means the editing architecture does not need to be redesigned. The storage layer changes, but the separation between hot mutable editing and canonical document structure remains the same.

Long-term direction

This prototype is the foundation for a provenance-first academic editor built around deterministic markdown serialization, semantic document structure, immutable snapshots, append-only provenance recording, and replayable editing history.

The project intentionally avoids HTML storage, DOM editing, opaque binary formats, and page-layout-centric document models. Plain text remains sovereign, markdown remains the canonical exchange format, and rich text is treated as a projection over structure.