commit 2c1004500473fae8c9595146c9a28af3f620f94f
parent 8d4cb7f11d5f69c90139cd28fa1a3e31e43390e8
Author: Chip Senkbeil <chip@senkbeil.org>
Date: Tue, 29 Sep 2020 00:02:41 -0500
Add vimwiki language specification 0.1.0
Diffstat:
1 file changed, 941 insertions(+), 0 deletions(-)
diff --git a/doc/specification.wiki b/doc/specification.wiki
@@ -0,0 +1,941 @@
+= Specification =
+
+The following is a draft specification for the *vimwiki* markup language. It is
+provided as a guideline for consistent parsing and rendering of vimwiki
+content.
+
+Similar to the approach taken with [[https://www.commonmark.org/|commonmark]],
+this document attempts to specify *vimwiki* syntax unambiguously. It contains
+examples of the language and describes the specifics of the language in a way
+that tooling can more easily define parsers and generators.
+
+== Version ==
+
+Current: *0.1.0*
+
+This specification is versioned in order to provide a stable point of reference
+for external tools. [[https://semver.org/|Semantic versioning]] is used to keep
+track of the current state of the specification. *MAJOR.MINOR.PATCH* is the
+format.
+
+While the specification remains at a zero major version, its contents can
+change without any requirement to maintain compatibility. Once a non-zero major
+version is released, new vimwiki language elements may only be added in minor
+releases, and any breaking change must be reflected by a major release. Tweaks
+in language that do not add, alter, or remove elements of vimwiki (such as
+typos, clarifications, etc.) may be made with patch releases.
+
+The specification will continue to remain in a development mode (zero major
+mode) until the language has become relatively stable.
+
+== Language ==
+
+The following describes the individual elements of the vimwiki language,
+hereafter referred to as components. It covers each component's purpose and
+gives a precise description of its syntax.
+
+For details on parsing precedence, see the [[#Specification#Parser Details|companion section]].
+
+=== Primitives ===
+
+In order to define the vimwiki language, we first need to present several
+definitions for the primitive building blocks from which higher-level
+components are built. Relevant definitions are borrowed from
+[[https://spec.commonmark.org/0.29/#characters-and-lines|commonmark characters and lines]].
+
+A *character* in vimwiki is a valid Unicode code point, encoded as UTF-8.
+
+A *line* is a sequence of zero or more [[#Specification#Language#Primitives#character|characters]] other than a
+newline (`U+000A` aka `\n`) or carriage return (`U+000D` aka `\r`), followed by
+a [[#Specification#Language#Primitives#line ending|line ending]] or by the end of a file.
+
+A *line ending* is a newline (`U+000A` aka `\n`), a carriage return (`U+000D`
+aka `\r`) not followed by a newline, or a carriage return and a following
+newline (`\r\n`).
+
+A line with no characters, or a line containing only spaces (`U+0020`) or tabs
+(`U+0009`), is called a *blank line*.
+
+A *whitespace character* is a space (`U+0020`) or tab (`U+0009`).
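+
+These primitives map directly onto code. The following is a minimal Rust
+sketch, not taken from any existing vimwiki tool. The function names
+`is_whitespace_char`, `split_lines`, and `is_blank_line` are hypothetical.

```rust
/// A whitespace character is a space (U+0020) or a tab (U+0009).
fn is_whitespace_char(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Splits `input` into lines. `\n`, a lone `\r`, and `\r\n` each count as a
/// single line ending; a final line without a line ending is kept.
fn split_lines(input: &str) -> Vec<&str> {
    // `\n` and `\r` are ASCII, so slicing at their byte positions always
    // lands on a UTF-8 character boundary.
    let bytes = input.as_bytes();
    let mut lines = Vec::new();
    let (mut start, mut i) = (0, 0);
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                lines.push(&input[start..i]);
                i += 1;
                start = i;
            }
            b'\r' => {
                lines.push(&input[start..i]);
                // `\r\n` is one line ending, not two.
                i += if bytes.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
                start = i;
            }
            _ => i += 1,
        }
    }
    if start < input.len() {
        lines.push(&input[start..]);
    }
    lines
}

/// A blank line has no characters, or only whitespace characters.
fn is_blank_line(line: &str) -> bool {
    line.chars().all(is_whitespace_char)
}
```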
+
+=== Block Components ===
+
+The vimwiki language has a variety of syntax that represents components within
+a page. In this section, we discuss block-level components: vimwiki syntax that
+stands alone and spans one or more entire lines within a file.
+
+==== Blockquote ====
+
+A blockquote has two different forms available: indented text or text prefixed
+with a right angle bracket or chevron `>`. Its purpose is to convey an extended
+quotation.
+
+*Form 1*:
+
+{{{vimwiki
+    This is a blockquote
+    that exists on more than one line
+}}}
+
+*Form 2*:
+
+{{{vimwiki
+> This is a blockquote
+> that exists on more than one line
+}}}
+
+===== Syntax =====
+
+*Form 1*:
+
+A blockquote is made of *one* or more [[#indented blockquote line|indented blockquote lines]].
+
+An *indented blockquote line* is made of the following:
+1. Four or more [[#whitespace character|whitespace characters]]
+2. All characters up until a [[#line ending|line ending]]
+3. A [[#line ending|line ending]] or end of input
+
+*Form 2*:
+
+A blockquote is made of *one* or more [[#chevron blockquote line|chevron blockquote lines]], which may be
+separated by zero or more [[#blank line|blank lines]]
+
+A *chevron blockquote line* is made of the following:
+1. Starts at the beginning of a line
+2. A prefix right angle bracket or chevron (`U+003E` aka `>`)
+3. A [[#whitespace character|whitespace character]] such as ' ' or '\t'
+4. All characters up until a [[#line ending|line ending]]
+5. A [[#line ending|line ending]] or end of input
+
+*Extra Notes*: Each line of a blockquote, minus the indentation or chevron, is
+trimmed to remove all leading and trailing [[#whitespace character|whitespace characters]].
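+
+As an illustration only, the Rust sketch below classifies a single line
+(without its line ending) as one of the two blockquote line forms and applies
+the trimming rule. The `BlockquoteLine` type and `parse_blockquote_line`
+function are hypothetical. Treating a whitespace-only line as a blank line
+rather than as an indented blockquote line is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum BlockquoteLine<'a> {
    /// Form 1: four or more leading whitespace characters.
    Indented(&'a str),
    /// Form 2: a `>` at the start of the line followed by a whitespace character.
    Chevron(&'a str),
}

fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Classifies one line (without its line ending) and returns its trimmed content.
fn parse_blockquote_line(line: &str) -> Option<BlockquoteLine<'_>> {
    if let Some(rest) = line.strip_prefix('>') {
        let mut chars = rest.chars();
        return match chars.next() {
            Some(c) if is_ws(c) => Some(BlockquoteLine::Chevron(chars.as_str().trim_matches(is_ws))),
            _ => None,
        };
    }
    let indent = line.chars().take_while(|c| is_ws(*c)).count();
    let content = line.trim_matches(is_ws);
    // Assumption: a whitespace-only line is a blank line, not a blockquote line.
    if indent >= 4 && !content.is_empty() {
        Some(BlockquoteLine::Indented(content))
    } else {
        None
    }
}
```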
+
+==== Definition List ====
+
+A definition list is composed of a series of terms and associated definitions.
+It mirrors the functionality available in an [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/dl|HTML Description List]].
+
+{{{vimwiki
+Term 1:: Some definition
+Term 2:: First def
+:: Second def
+Term3::
+:: Some definition
+}}}
+
+TODO ... should the definitions and terms be raw text? Or can they support
+ decorations and links? e.g. `Term 1:: *Bold* def with [[link]]`
+
+===== Syntax =====
+
+A *definition list* is composed of one or more [[#term and definitions|term and definitions]]
+
+A *term and definitions* is composed of one [[#term line|term line]] and
+zero or more [[#definition line|definition lines]]
+
+A *term line* is represented by the following:
+1. Starts at the beginning of a line
+2. One or more characters before the sequence `::`
+3. The sequence `::`
+4. Optionally, one or more characters before a [[#line ending|line ending]],
+ which form the first definition
+5. A [[#line ending|line ending]] or end of input
+
+A *definition line* is represented by the following:
+1. Starts at the beginning of a line
+2. The sequence `::`
+3. One or more characters before [[#line ending|line ending]]
+4. A [[#line ending|line ending]] or end of input
+
+*Extra Notes*: Each term and definition is trimmed to remove all leading and
+trailing [[#whitespace character|whitespace characters]].
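+
+Below is a minimal Rust sketch of these rules. It assumes each input line has
+already had its line ending removed. The names `DefLine`, `parse_def_line`, and
+`parse_definition_list` are hypothetical. Splitting a term line at its first
+`::` is an assumption; the specification does not state it.

```rust
#[derive(Debug, PartialEq)]
enum DefLine<'a> {
    /// A term and, optionally, its first definition on the same line.
    Term(&'a str, Option<&'a str>),
    /// An additional definition for the most recent term.
    Definition(&'a str),
}

fn trim_ws(s: &str) -> &str {
    s.trim_matches(|c: char| c == ' ' || c == '\t')
}

fn parse_def_line(line: &str) -> Option<DefLine<'_>> {
    if let Some(rest) = line.strip_prefix("::") {
        let def = trim_ws(rest);
        return if def.is_empty() { None } else { Some(DefLine::Definition(def)) };
    }
    // Assumption: the first `::` on the line separates the term from its definition.
    let idx = line.find("::")?;
    let term = trim_ws(&line[..idx]);
    if term.is_empty() {
        return None;
    }
    let def = trim_ws(&line[idx + 2..]);
    Some(DefLine::Term(term, if def.is_empty() { None } else { Some(def) }))
}

/// Groups consecutive lines into (term, definitions) pairs.
fn parse_definition_list<'a>(lines: &[&'a str]) -> Option<Vec<(&'a str, Vec<&'a str>)>> {
    let mut items: Vec<(&'a str, Vec<&'a str>)> = Vec::new();
    for &line in lines {
        match parse_def_line(line)? {
            DefLine::Term(term, first) => items.push((term, first.into_iter().collect())),
            // A definition line with no preceding term is rejected.
            DefLine::Definition(def) => items.last_mut()?.1.push(def),
        }
    }
    if items.is_empty() { None } else { Some(items) }
}
```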
+
+==== Divider ====
+
+A divider is composed of a sequence of dashes (`U+002D`). It mirrors the
+functionality of the [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/hr|HTML Horizontal Rule]].
+
+{{{vimwiki
+----
+}}}
+
+===== Syntax =====
+
+A *divider* is represented by the following:
+1. Starts at the beginning of a line
+2. Four or more dashes (`U+002D`)
+3. A [[#line ending|line ending]] or end of input
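+
+Read literally, this rule allows nothing but dashes on the line, with no
+leading or trailing whitespace. A one-function Rust sketch follows; the name
+`is_divider` is hypothetical.

```rust
/// A divider is four or more dashes starting at the beginning of the line,
/// with nothing else before the line ending.
fn is_divider(line: &str) -> bool {
    line.len() >= 4 && line.bytes().all(|b| b == b'-')
}
```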
+
+==== Header ====
+
+A header is composed of some content surrounded by equal-length runs of equals
+signs (`U+003D` aka `=`). It mirrors the functionality of the [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements|HTML Heading]].
+
+{{{vimwiki
+= Some Heading =
+== Some Sub Heading ==
+}}}
+
+TODO ... should the content of the header be raw text? Or can it support
+ decorations and links? e.g. `= *Bold* Header with [[link]] =`
+ Pandoc appears to support decorations, links, and other [[#inline components|inline components]]
+
+===== Syntax =====
+
+A *header* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]] (implying
+ whether or not a header is centered)
+3. One or more equals sign (`U+003D`) characters
+4. Any characters until a run of equals signs of the same length is found
+5. The same number of equals sign characters as in step #3
+6. A [[#line ending|line ending]] or end of input
+
+*Extra Notes*: Each header's content, minus the equals signs, is trimmed to
+remove all leading and trailing [[#whitespace character|whitespace characters]].
+For example, `= header =` is equal to `=header=`.
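+
+The Rust sketch below applies these steps to a single line. `Header` and
+`parse_header` are hypothetical names. Rejecting a header whose trimmed content
+is empty is an assumption. No trailing whitespace is accepted after the closing
+equals signs, because step 6 requires a line ending at that point.

```rust
#[derive(Debug, PartialEq)]
struct Header<'a> {
    level: usize,
    centered: bool,
    content: &'a str,
}

fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

fn parse_header(line: &str) -> Option<Header<'_>> {
    let after_indent = line.trim_start_matches(is_ws);
    // Any leading whitespace marks the header as centered.
    let centered = after_indent.len() < line.len();
    let level = after_indent.bytes().take_while(|b| *b == b'=').count();
    if level == 0 {
        return None;
    }
    // The line must end with exactly `level` equals signs (no more, no fewer).
    let closing = "=".repeat(level);
    let body = after_indent[level..].strip_suffix(closing.as_str())?;
    if body.ends_with('=') {
        return None;
    }
    // Assumption: a header with no content after trimming is not a header.
    let content = body.trim_matches(is_ws);
    if content.is_empty() {
        return None;
    }
    Some(Header { level, centered, content })
}
```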
+
+==== List ====
+
+A list is composed of a series of list items, each comprising
+[[#inline components|inline components]] and [[#list|sub lists]]. It mirrors the
+functionality available in an [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ul|HTML Unordered List]]
+and [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ol|HTML Ordered List]].
+
+{{{vimwiki
+- List item 1 has *bold* and [[links]]
+- List item 2 has content
+ 1. Ordered sublist
+ 2. within list item 2
+}}}
+
+===== Syntax =====
+
+A *list* is composed of one or more [[#list item|list items]].
+
+A *list item* is composed of a [[#starting list item line|starting list item line]]
+and zero or more [[#companion list item line|companion list item lines]].
+
+A *starting list item line* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]] that signify
+ the level of indentation used to determine whether later content is
+ still associated with this list item, whether a new list item begins
+ a sublist, and whether a new list item is a sibling or the sibling of a
+ parent
+3. One of the following prefixes, which determines the type of list item:
+ * Hyphen (`U+002D` aka `-`) is for an unordered list
+ * Asterisk (`U+002A` aka `*`) is for an unordered list
+ * Pound (`U+0023` aka `#`) is for an ordered list
+ * One or more digits followed by a period (`U+002E` aka `.`) or
+ a right parenthesis (`U+0029` aka `)`) is for an ordered list
+ * One or more lowercase letters (`a-z`) followed by a period
+ (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is for an
+ ordered list
+ * One or more uppercase letters (`A-Z`) followed by a period
+ (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is for an
+ ordered list
+ * One or more lowercase Roman numerals (any of `ivxlcdm`) followed by a
+ period (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is
+ for an ordered list
+ * One or more uppercase Roman numerals (any of `IVXLCDM`) followed by a
+ period (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is
+ for an ordered list
+4. A [[#whitespace character|whitespace character]]
+5. An optional [[#todo attribute|todo attribute]] followed by a [[#whitespace character|whitespace character]]
+6. Zero or more [[#inline components|inline components]]
+7. A [[#line ending|line ending]] or end of input
+
+A *companion list item line* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]] totaling at
+ least as many as the [[#starting list item line|starting list item line]]'s indentation
+3. One of the following:
+ * The start of a new [[#list|list]] (to be treated as a sublist of the
+ current list item)
+ * A series of one or more [[#inline components|inline components]]
+ (to be added to the content of the current list item) followed by
+ a [[#line ending|line ending]] or end of input
+ * A [[#blank line|blank line]] if there is guaranteed to still be some
+ list item content in later lines
+
+A *todo attribute* is a left square bracket (`U+005B` aka `[`) and a right
+square bracket (`U+005D` aka `]`) surrounding a single character that denotes
+the todo status, which is one of the following:
+* A space (`U+0020` aka ' ') meaning 0% or incomplete
+* A period (`U+002E` aka `.`) meaning 1-33% progress
+* A lowercase o (`U+006F` aka `o`) meaning 34-66% progress
+* An uppercase O (`U+004F` aka `O`) meaning 67-99% progress
+* An uppercase X (`U+0058` aka `X`) meaning 100% or completed
+* A hyphen (`U+002D` aka `-`) meaning rejected
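+
+The following hedged Rust sketch reads a todo attribute. The `TodoStatus`
+variants and `parse_todo_attribute` are hypothetical names. Only the six status
+characters listed above are accepted.

```rust
#[derive(Debug, PartialEq)]
enum TodoStatus {
    Incomplete,     // `[ ]`: 0%
    Progress1To33,  // `[.]`: 1-33%
    Progress34To66, // `[o]`: 34-66%
    Progress67To99, // `[O]`: 67-99%
    Complete,       // `[X]`: 100%
    Rejected,       // `[-]`
}

/// Parses a todo attribute at the start of `s` and returns the status along
/// with the text that follows the closing bracket.
fn parse_todo_attribute(s: &str) -> Option<(TodoStatus, &str)> {
    let mut chars = s.chars();
    if chars.next()? != '[' {
        return None;
    }
    let status = match chars.next()? {
        ' ' => TodoStatus::Incomplete,
        '.' => TodoStatus::Progress1To33,
        'o' => TodoStatus::Progress34To66,
        'O' => TodoStatus::Progress67To99,
        'X' => TodoStatus::Complete,
        '-' => TodoStatus::Rejected,
        _ => return None,
    };
    if chars.next()? != ']' {
        return None;
    }
    Some((status, chars.as_str()))
}
```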
+
+*Extra Notes*: Because Roman numerals are composed of specific alphabetic
+characters, alphabetic and Roman numeral list items can be ambiguous (e.g. `i.`
+or `c)`). A list therefore needs to be evaluated across all of its items to
+determine whether its items are Roman or alphabetic. If all list items begin
+with valid Roman numerals, then the items are Roman numerals. If any list item
+is not a valid Roman numeral, then all list items with those prefixes are
+considered to be alphabetic.
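+
+To make the disambiguation concrete, here is a minimal Rust sketch. It parses
+list item prefixes and then resolves letter prefixes across a whole list. All
+names are hypothetical. The sketch treats a token as a "valid" Roman numeral
+when it uses only the letters `ivxlcdm` (or `IVXLCDM`). That is an assumption;
+a stricter implementation might also reject malformed numerals.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ListKind {
    Hyphen,
    Asterisk,
    Pound,
    Number,
    Alpha { upper: bool },
    Roman { upper: bool },
}

fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Parses the prefix of a starting list item line. Returns the indentation
/// width, a provisional kind (letters are always `Alpha` at this stage), and
/// the prefix token without its trailing `.` or `)`.
fn parse_marker(line: &str) -> Option<(usize, ListKind, &str)> {
    let rest = line.trim_start_matches(is_ws);
    let indent = line.len() - rest.len();
    let (kind, token_len, marker_len) = match rest.chars().next()? {
        '-' => (ListKind::Hyphen, 1, 1),
        '*' => (ListKind::Asterisk, 1, 1),
        '#' => (ListKind::Pound, 1, 1),
        _ => {
            let run = rest.bytes().take_while(|b| b.is_ascii_alphanumeric()).count();
            let token = &rest.as_bytes()[..run];
            let kind = if run == 0 {
                return None;
            } else if token.iter().all(|b| b.is_ascii_digit()) {
                ListKind::Number
            } else if token.iter().all(|b| b.is_ascii_lowercase()) {
                ListKind::Alpha { upper: false }
            } else if token.iter().all(|b| b.is_ascii_uppercase()) {
                ListKind::Alpha { upper: true }
            } else {
                return None;
            };
            // The token must be followed by a period or a right parenthesis.
            match rest.as_bytes().get(run) {
                Some(b'.') | Some(b')') => (kind, run, run + 1),
                _ => return None,
            }
        }
    };
    // Step 4: a whitespace character must follow the prefix.
    match rest[marker_len..].chars().next() {
        Some(c) if is_ws(c) => Some((indent, kind, &rest[..token_len])),
        _ => None,
    }
}

/// Assumption: "valid" means the token uses only Roman numeral letters of a
/// single case; stricter checks (such as rejecting `iiii`) are left out.
fn is_roman(token: &str) -> bool {
    !token.is_empty()
        && (token.chars().all(|c| "ivxlcdm".contains(c))
            || token.chars().all(|c| "IVXLCDM".contains(c)))
}

/// Letter prefixes in one list become Roman numerals only if every letter
/// prefix in that list is a valid Roman numeral.
fn resolve_kinds(items: &mut [(ListKind, &str)]) {
    let all_roman = items
        .iter()
        .filter(|(kind, _)| matches!(kind, ListKind::Alpha { .. }))
        .all(|(_, token)| is_roman(token));
    if all_roman {
        for (kind, _) in items.iter_mut() {
            let current = *kind;
            if let ListKind::Alpha { upper } = current {
                *kind = ListKind::Roman { upper };
            }
        }
    }
}

fn classify_list(lines: &[&str]) -> Option<Vec<ListKind>> {
    let mut items = Vec::new();
    for &line in lines {
        let (_, kind, token) = parse_marker(line)?;
        items.push((kind, token));
    }
    resolve_kinds(&mut items);
    Some(items.into_iter().map(|(kind, _)| kind).collect())
}
```

+Because `a` and `b` are not Roman numerals, a list `a)`, `b)`, `c)` stays
+alphabetic even though `c` alone would be a valid Roman numeral.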
+
+==== Math Block ====
+
+A math block is composed of a series of lines representing a mathematical
+formula. It is rendered in HTML using the [[https://www.mathjax.org/|MathJax engine]].
+
+{{{vimwiki
+{{$%align%
+\sum_i a_i^2 &= 1 + 1 \\
+&= 2.
+}}$
+}}}
+
+===== Syntax =====
+
+A *math block* is composed of a [[#beginning math block line|beginning math block line]],
+one or more [[#math block line|math block lines]], and an [[#ending math block line|ending math block line]].
+
+A *beginning math block line* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]]
+3. The sequence `{{$`
+4. An optional [[#math environment|math environment]]
+5. Zero or more [[#whitespace character|whitespace characters]]
+6. A [[#line ending|line ending]] or end of input
+
+A *math environment* is represented by the following:
+1. The percent sign (`U+0025` aka `%`)
+2. One or more characters that are not the percent sign or [[#line ending|line ending]]
+3. The percent sign (`U+0025` aka `%`)
+
+A *math block line* is a line found after a [[#beginning math block line|beginning math block line]]
+and before an [[#ending math block line|ending math block line]] and is
+comprised of zero or more characters followed by a [[#line ending|line ending]].
+
+An *ending math block line* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]]
+3. The sequence `}}$`
+4. Zero or more [[#whitespace character|whitespace characters]]
+5. A [[#line ending|line ending]] or end of input
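+
+The Rust sketch below recognizes a math block in a slice of lines whose line
+endings have already been removed. The function names are hypothetical. The
+code only illustrates the rules above and is not a reference implementation.

```rust
fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Returns `Some(env)` if `line` is a beginning math block line, where `env`
/// is the optional math environment (e.g. `Some("align")` for `{{$%align%`).
fn parse_math_block_start(line: &str) -> Option<Option<&str>> {
    let rest = line.trim_start_matches(is_ws).strip_prefix("{{$")?;
    let rest = rest.trim_end_matches(is_ws);
    if rest.is_empty() {
        return Some(None);
    }
    let env = rest.strip_prefix('%')?.strip_suffix('%')?;
    if env.is_empty() || env.contains('%') {
        return None;
    }
    Some(Some(env))
}

fn is_math_block_end(line: &str) -> bool {
    line.trim_matches(is_ws) == "}}$"
}

/// Parses a math block from `lines`, returning the environment and the
/// formula lines between the beginning and ending lines.
fn parse_math_block<'a>(lines: &[&'a str]) -> Option<(Option<&'a str>, Vec<&'a str>)> {
    let (&first, rest) = lines.split_first()?;
    let env = parse_math_block_start(first)?;
    let end = rest.iter().position(|line| is_math_block_end(line))?;
    // At least one math block line is required.
    if end == 0 {
        return None;
    }
    Some((env, rest[..end].to_vec()))
}
```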
+
+==== Paragraph ====
+
+A paragraph is composed of a series of lines representing some content. It
+mirrors [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p|HTML Paragraph]].
+
+{{{vimwiki
+Some paragraph containing
+multiple lines including
+*bold* and [[links]].
+}}}
+
+===== Syntax =====
+
+A *paragraph* is composed of one or more [[#paragraph line|paragraph lines]].
+
+A *paragraph line* is represented by the following:
+1. Starts at the beginning of a line
+2. Has no [[#whitespace character|whitespace]] indentation
+3. Is not any of the following:
+ * [[#header|header]]
+ * [[#definition list|definition list]]
+ * [[#list|list]]
+ * [[#table|table]]
+ * [[#preformatted text|preformatted text]]
+ * [[#math block|math block]]
+ * [[#blank line|blank line]]
+ * [[#blockquote|blockquote]]
+ * [[#divider|divider]]
+ * [[#placeholder|placeholder]]
+4. One or more [[#inline components|inline components]]
+5. A [[#line ending|line ending]] or end of input
+
+TODO ... should we combine [[#non-blank line|non-blank line]] with paragraph
+ by having a paragraph trim all leading [[#whitespace character|whitespace]]?
+
+==== Placeholder ====
+
+A placeholder is composed of an identifier and some information. Its purpose
+is to provide metadata for use in rendering vimwiki to HTML and populating
+portions of the HTML template used with a vimwiki page.
+
+{{{vimwiki
+%title Some title
+%nohtml
+%template my_template
+%date 2020-12-23
+}}}
+
+===== Syntax =====
+
+A *placeholder* is represented by one of the following:
+* A [[#title placeholder|title placeholder]]
+* A [[#nohtml placeholder|nohtml placeholder]]
+* A [[#template placeholder|template placeholder]]
+* A [[#date placeholder|date placeholder]]
+
+A *title placeholder* is represented by the following:
+1. Starts at the beginning of a line
+2. A percent sign (`U+0025` aka `%`)
+3. The sequence `title`
+4. One or more [[#whitespace character|whitespace characters]]
+5. Any sequence of [[#whitespace character|whitespace characters]] and
+ non-whitespace characters leading up to a [[#line ending|line ending]]
+6. A [[#line ending|line ending]] or end of input
+
+A *nohtml placeholder* is represented by the following:
+1. Starts at the beginning of a line
+2. A percent sign (`U+0025` aka `%`)
+3. The sequence `nohtml`
+4. A [[#line ending|line ending]] or end of input
+
+A *template placeholder* is represented by the following:
+1. Starts at the beginning of a line
+2. A percent sign (`U+0025` aka `%`)
+3. The sequence `template`
+4. One or more [[#whitespace character|whitespace characters]]
+5. Any sequence of [[#whitespace character|whitespace characters]] and
+ non-whitespace characters leading up to a [[#line ending|line ending]]
+6. A [[#line ending|line ending]] or end of input
+
+A *date placeholder* is represented by the following:
+1. Starts at the beginning of a line
+2. A percent sign (`U+0025` aka `%`)
+3. The sequence `date`
+4. One or more [[#whitespace character|whitespace characters]]
+5. A date string in the format `YYYY-MM-DD` where `YYYY` symbolizes a
+ four-digit year (e.g. `1990`), `MM` symbolizes a two-digit month (e.g.
+ `04`), and `DD` symbolizes a two-digit day (e.g. `23`)
+6. A [[#line ending|line ending]] or end of input
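+
+A minimal Rust sketch of the placeholder rules follows. `Placeholder`,
+`parse_date`, and `parse_placeholder` are hypothetical names. As in the syntax
+above, only the shape of the date is checked, not whether it is a real calendar
+date.

```rust
#[derive(Debug, PartialEq)]
enum Placeholder<'a> {
    Title(&'a str),
    NoHtml,
    Template(&'a str),
    Date { year: u16, month: u8, day: u8 },
}

fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Parses a `YYYY-MM-DD` string. Only the format is checked, not whether the
/// date exists on the calendar.
fn parse_date(s: &str) -> Option<(u16, u8, u8)> {
    let b = s.as_bytes();
    if !s.is_ascii() || b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let (y, m, d) = (&s[0..4], &s[5..7], &s[8..10]);
    if ![y, m, d].iter().all(|part| part.bytes().all(|c| c.is_ascii_digit())) {
        return None;
    }
    Some((y.parse::<u16>().ok()?, m.parse::<u8>().ok()?, d.parse::<u8>().ok()?))
}

fn parse_placeholder(line: &str) -> Option<Placeholder<'_>> {
    let rest = line.strip_prefix('%')?;
    if rest == "nohtml" {
        return Some(Placeholder::NoHtml);
    }
    // The identifier ends at the first whitespace character.
    let split = rest.find(is_ws)?;
    let (name, arg) = (&rest[..split], rest[split..].trim_start_matches(is_ws));
    match name {
        "title" => Some(Placeholder::Title(arg)),
        "template" => Some(Placeholder::Template(arg)),
        "date" => {
            let (year, month, day) = parse_date(arg)?;
            Some(Placeholder::Date { year, month, day })
        }
        _ => None,
    }
}
```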
+
+==== Preformatted Text ====
+
+A preformatted text block is composed of a series of lines representing some
+content, usually related to a programming language. It mirrors
+[[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre|HTML Preformatted Text]].
+
+{{{vimwiki
+{{{rust
+fn my_func() -> u32 {
+    1 + 2
+}
+\}}}
+}}}
+
+TODO ... wrapping a preformatted text block with another preformatted text
+ isn't possible right now due to the use of `}}}` matching. We would
+ need some sort of escape like `\}}}` or `\{{{` that could be used to
+ avoid matching legit syntax but still provide the literal text
+ upon rendering.
+
+===== Syntax =====
+
+A *preformatted text* is composed of a [[#beginning preformatted text line|beginning preformatted text line]],
+one or more [[#preformatted text line|preformatted text lines]], and an [[#ending preformatted text line|ending preformatted text line]].
+
+A *beginning preformatted text line* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]]
+3. The sequence `{{{`
+4. An optional [[#preformatted language identifier|preformatted language identifier]]
+5. An optional [[#preformatted metadata list|preformatted metadata list]]
+6. Zero or more [[#whitespace character|whitespace characters]]
+7. A [[#line ending|line ending]] or end of input
+
+A *preformatted language identifier* is represented by the following:
+1. One or more characters that are not equals sign (`U+003D` aka `=`)
+2. An optional semicolon (`U+003B` aka `;`)
+
+A *preformatted metadata list* is composed of one or more
+[[#preformatted metadata list item|preformatted metadata list items]] separated by semicolons (`U+003B` aka `;`).
+
+A *preformatted metadata list item* is represented by the following:
+1. One or more characters leading up to an equals sign (`U+003D` aka `=`),
+ not including a [[#line ending|line ending]]
+2. An equals sign (`U+003D` aka `=`)
+3. A quotation mark (`U+0022` aka `"`)
+4. One or more characters leading up to a quotation mark (`U+0022` aka `"`)
+5. A quotation mark (`U+0022` aka `"`)
+
+A *preformatted text line* is a line found after a [[#beginning preformatted text line|beginning preformatted text line]]
+and before an [[#ending preformatted text line|ending preformatted text line]] and is
+comprised of zero or more characters followed by a [[#line ending|line ending]].
+
+An *ending preformatted text line* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]]
+3. The sequence `}}}`
+4. Zero or more [[#whitespace character|whitespace characters]]
+5. A [[#line ending|line ending]] or end of input
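+
+The Rust sketch below parses a beginning preformatted text line into its
+language identifier and metadata list. The names are hypothetical. The sketch
+treats the first `;`-delimited segment as the language only when it contains
+no `=`. That is an assumption about how the two optional parts are told apart.

```rust
#[derive(Debug, PartialEq)]
struct PreStart<'a> {
    lang: Option<&'a str>,
    metadata: Vec<(&'a str, &'a str)>,
}

fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

fn parse_pre_start(line: &str) -> Option<PreStart<'_>> {
    let mut rest = line
        .trim_start_matches(is_ws)
        .strip_prefix("{{{")?
        .trim_end_matches(is_ws);

    // Assumption: a leading segment (up to the first `;`) without `=` is the
    // language identifier; everything after it is the metadata list.
    let mut lang = None;
    let lang_end = rest.find(';').unwrap_or(rest.len());
    if lang_end > 0 && !rest[..lang_end].contains('=') {
        lang = Some(&rest[..lang_end]);
        rest = rest[lang_end..].strip_prefix(';').unwrap_or("");
    }

    // Each metadata item is `key="value"`; items are separated by `;`.
    let mut metadata = Vec::new();
    while !rest.is_empty() {
        let eq = rest.find('=')?;
        let after_eq = rest[eq + 1..].strip_prefix('"')?;
        let close = after_eq.find('"')?;
        let (key, value) = (&rest[..eq], &after_eq[..close]);
        if key.is_empty() || value.is_empty() {
            return None;
        }
        metadata.push((key, value));
        rest = &after_eq[close + 1..];
        if !rest.is_empty() {
            rest = rest.strip_prefix(';')?;
        }
    }
    Some(PreStart { lang, metadata })
}
```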
+
+==== Table ====
+
+A table is composed of a series of rows containing various other components. It
+mirrors [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/table|HTML Table]].
+
+{{{vimwiki
+| Year | Temperature (low) | Temperature (high)       | Temperature (avg) |
+|------|-------------------|--------------------------|-------------------|
+| 1990 | *50* degrees      | 90 according to [[link]] | 72                |
+| \/   | 45 degrees        | >                        | 80                |
+| \/   | \/                | >                        | 60                |
+| 2000 | >                 | >                        | >                 |
+}}}
+
+===== Syntax =====
+
+A *table* is composed of one or more [[#row|rows]] with the indentation of
+the first row indicating whether the table is centered (is indented) or not.
+
+A *row* is represented by one of the following:
+* A [[#divider row|divider row]]
+* A [[#content row|content row]]
+
+A *divider row* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]]
+3. A sequence of pairs comprised of a [[#cell boundary|cell boundary]] and
+ one or more hyphens (`U+002D` aka `-`)
+4. A final [[#cell boundary|cell boundary]]
+5. A [[#line ending|line ending]] or end of input
+
+A *content row* is represented by the following:
+1. Starts at the beginning of a line
+2. Zero or more [[#whitespace character|whitespace characters]]
+3. A sequence of pairs comprised of a [[#cell boundary|cell boundary]] and a [[#cell|cell]]
+4. A final [[#cell boundary|cell boundary]]
+5. A [[#line ending|line ending]] or end of input
+
+A *cell* is represented by one of the following:
+* A [[#span above cell|span above cell]]
+* A [[#span left cell|span left cell]]
+* A [[#content cell|content cell]]
+
+A *span above cell* is represented by the following:
+1. Zero or more [[#whitespace character|whitespace characters]]
+2. Sequence `\/`
+3. Zero or more [[#whitespace character|whitespace characters]]
+
+A *span left cell* is represented by the following:
+1. Zero or more [[#whitespace character|whitespace characters]]
+2. Sequence `>`
+3. Zero or more [[#whitespace character|whitespace characters]]
+
+A *content cell* is represented by the following:
+1. Zero or more [[#whitespace character|whitespace characters]]
+2. One or more [[#inline components|inline components]] that do not contain a [[#cell boundary|cell boundary]] (`|`)
+3. Zero or more [[#whitespace character|whitespace characters]]
+
+A *cell boundary* is represented by the pipe character (`U+007C` aka `|`).
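+
+Below is a minimal Rust sketch that classifies one table row. The `Row`, `Cell`,
+and `parse_row` names are hypothetical. Because cells are split on every `|`, a
+pipe inside a cell (for example in a link description) is not supported. This
+matches the restriction on content cells above.

```rust
#[derive(Debug, PartialEq)]
enum Cell<'a> {
    SpanAbove,
    SpanLeft,
    Content(&'a str),
}

#[derive(Debug, PartialEq)]
enum Row<'a> {
    Divider,
    Content(Vec<Cell<'a>>),
}

fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

fn parse_row(line: &str) -> Option<Row<'_>> {
    // Leading whitespace is allowed (it marks a centered table); the row must
    // start and end with a cell boundary.
    let inner = line.trim_start_matches(is_ws).strip_prefix('|')?.strip_suffix('|')?;
    let parts: Vec<&str> = inner.split('|').collect();
    if parts.iter().all(|p| !p.is_empty() && p.bytes().all(|b| b == b'-')) {
        return Some(Row::Divider);
    }
    let cells = parts
        .into_iter()
        .map(|part| match part.trim_matches(is_ws) {
            "\\/" => Some(Cell::SpanAbove),
            ">" => Some(Cell::SpanLeft),
            "" => None, // a content cell needs at least one inline component
            text => Some(Cell::Content(text)),
        })
        .collect::<Option<Vec<_>>>()?;
    Some(Row::Content(cells))
}
```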
+
+==== Non-blank Line ====
+
+A non-blank line is a single line, indented by one to three whitespace
+characters, that is not a paragraph. It is used to capture text not
+represented by any other component.
+
+{{{vimwiki
+ Some other text containing *bold* and [[links]].
+}}}
+
+===== Syntax =====
+
+A *non-blank line* is represented by the following:
+1. Between one and three [[#whitespace character|whitespace characters]]
+2. One or more [[#inline components|inline components]]
+3. A [[#line ending|line ending]] or end of input
+
+TODO ... should we combine [[#non-blank line|non-blank line]] with paragraph
+ by having a paragraph trim all leading [[#whitespace character|whitespace]]?
+
+=== Inline Components ===
+
+The vimwiki language also has a variety of syntax that can be used within a
+line on a page. These are referred to as *inline components* and can be found
+within a variety of [[#block components|block components]] as well as nested
+within other inline components.
+
+==== Math Inline ====
+
+A math inline component is composed of a single-line formula.
+Like its big brother, the [[#math block|math block]], it is rendered in HTML
+using the [[https://www.mathjax.org/|MathJax engine]].
+
+{{{vimwiki
+$ \sum_i a_i^2 = 1 $
+}}}
+
+TODO ... is there a way to escape the `$` used to mark the beginning and end
+ of an inline math component? Is `$` even a concern within an
+ inline formula? If it is, maybe an escape sequence of `\$` would
+ be applicable.
+
+===== Syntax =====
+
+An *inline math* component is represented by the following:
+1. A dollar sign (`U+0024` aka `$`)
+2. One or more characters that are not a dollar sign or [[#line ending|line ending]]
+3. A dollar sign (`U+0024` aka `$`)
+
+*Extra Notes*: The formula within the inline component is trimmed to remove all
+leading and trailing [[#whitespace character|whitespace characters]].
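+
+As a sketch only, the Rust function below scans a single line for inline math
+and returns each trimmed formula. The name `find_inline_math` is hypothetical.
+Skipping an empty `$$` pair is an assumption.

```rust
fn is_ws(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Returns the trimmed formula of every inline math component in `line`.
fn find_inline_math(line: &str) -> Vec<&str> {
    let mut formulas = Vec::new();
    let mut rest = line;
    while let Some(start) = rest.find('$') {
        let after = &rest[start + 1..];
        match after.find('$') {
            // `$$` has no characters between the dollar signs, so it is not
            // inline math; resume scanning at the second `$`.
            Some(0) => rest = after,
            Some(end) => {
                formulas.push(after[..end].trim_matches(is_ws));
                rest = &after[end + 1..];
            }
            None => break,
        }
    }
    formulas
}
```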
+
+==== Tags ====
+
+A tags component is composed of a series of individual tag elements. It is used
+both to mark various places within a page and to act as an [[#anchor|anchor]].
+
+{{{vimwiki
+:tag-1:tag-2:
+}}}
+
+===== Syntax =====
+
+A *tags* component is represented by the following:
+1. A [[#tag separator|tag separator]]
+2. A sequence of one or more [[#tag|tags]] separated by [[#tag separator|tag separators]]
+3. A [[#tag separator|tag separator]]
+
+A *tag* is represented by one or more characters that are not a colon,
+[[#whitespace character|whitespace]], or [[#line ending|line ending]]
+
+A *tag separator* is represented by a colon (`U+003A` aka `:`).
+
+TODO ... should tags with whitespace be allowed?
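+
+A minimal Rust sketch of the tags syntax follows; the name `parse_tags` is
+hypothetical. Following the definition of a tag above, it rejects any tag that
+contains whitespace.

```rust
/// Parses a tags component such as `:tag-1:tag-2:` into its tags.
fn parse_tags(s: &str) -> Option<Vec<&str>> {
    let inner = s.strip_prefix(':')?.strip_suffix(':')?;
    let tags: Vec<&str> = inner.split(':').collect();
    // Each tag needs at least one character and may not contain whitespace or
    // a line ending.
    let valid = tags
        .iter()
        .all(|tag| !tag.is_empty() && !tag.contains(|c: char| matches!(c, ' ' | '\t' | '\n' | '\r')));
    if valid { Some(tags) } else { None }
}
```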
+
+==== Link ====
+
+A link is a crucial inline component of vimwiki and is able to connect pages
+with each other as well as external wikis and resources.
+
+{{{vimwiki
+[[other page|link to another page]]
+[[wiki1:page|link to page in another wiki]]
+[[#some#anchor|link to another location in same page]]
+[[diary:2020-12-23|link to diary entry]]
+{{https://example.com/img.jpg|Transclusion to pull in image}}
+[[https://example.com|{{https://example.com/img.jpg}}]]
+}}}
+
+===== Syntax =====
+
+A *link* is represented by one of the following:
+* A [[#wiki link|wiki link]]
+* An [[#interwiki link|interwiki link]]
+* A [[#diary link|diary link]]
+* An [[#external file link|external file link]]
+* A [[#raw link|raw link]]
+* A [[#transclusion link|transclusion link]]
+
+A *wiki link* is represented by the following:
+1. A [[#link start seq|link start seq]]
+2. At least one of the following (in order):
+ 1. An optional [[#link path|link path]]
+ 2. An optional [[#link anchor|link anchor]]
+3. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]]
+4. A [[#link end seq|link end seq]]
+
+An *interwiki link* is represented by one of the following:
+* An [[#indexed interwiki link|indexed interwiki link]]
+* An [[#named interwiki link|named interwiki link]]
+
+An *indexed interwiki link* is represented by the following:
+1. A [[#link start seq|link start seq]]
+2. The sequence `wiki`
+3. One or more digits (`0-9`), but must be `1` or higher
+4. A colon (`U+003A` aka `:`)
+5. A [[#link path|link path]]
+6. An optional [[#link anchor|link anchor]]
+7. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]]
+8. A [[#link end seq|link end seq]]
+
+A *named interwiki link* is represented by the following:
+1. A [[#link start seq|link start seq]]
+2. The sequence `wn.`
+3. One or more characters that are not a colon (`U+003A` aka `:`) or [[#line ending|line ending]]
+4. A colon (`U+003A` aka `:`)
+5. A [[#link path|link path]]
+6. An optional [[#link anchor|link anchor]]
+7. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]]
+8. A [[#link end seq|link end seq]]
+
+An *external file link* is represented by the following:
+1. A [[#link start seq|link start seq]]
+2. A [[#link uri|link uri]] whose scheme is `local` or `file`, or that has no
+ scheme and starts with `//` for an absolute file path
+3. A [[#link path|link path]]
+4. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]]
+5. A [[#link end seq|link end seq]]
+
+A *raw link* is represented by a [[#link uri|link uri]] not found within
+another link type.
+
+A *transclusion link* is represented by the following:
+1. The sequence `{{`
+2. A [[#link uri|link uri]]
+3. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]]
+4. An optional sequence of [[#link key value pair|link key value pairs]],
+ each separated by [[#link inner separator|link inner separator]]
+5. The sequence `}}`
+
+A *link key value pair* is represented by the following:
+1. One or more characters that are not a pipe symbol (`U+007C`
+ aka `|`), equals sign (`U+003D` aka `=`), `}}`, or [[#line ending|line ending]]
+2. An equals sign (`U+003D` aka `=`)
+3. A quotation mark (`U+0022` aka `"`)
+4. One or more characters that are not a pipe symbol (`U+007C`
+ aka `|`), quotation mark (`U+0022` aka `"`), `}}`, or [[#line ending|line ending]]
+5. A quotation mark (`U+0022` aka `"`)
+
+A *link path* is represented by the following:
+1. Does not start with a [[#link anchor prefix|link anchor prefix]]
+2. One or more characters that are not a [[#link anchor prefix|link anchor prefix]],
+ [[#link inner separator|link inner separator]], [[#link end seq|link end seq]], or [[#line ending|line ending]]
+
+A *link description* is represented by one of the following:
+* A [[#link uri|link uri]]
+* One or more characters that are not a [[#link end seq|link end seq]] or [[#line ending|line ending]]
+
+A *link anchor* is represented by a series of pairs, each comprised of
+a [[#link anchor prefix|link anchor prefix]] and a [[#link anchor component|link anchor component]]
+
+A *link anchor component* is represented by one or more characters that are
+not a [[#link anchor prefix|link anchor prefix]], [[#link inner separator|link inner separator]],
+[[#link end seq|link end seq]], or [[#line ending|line ending]]
+
+A *link anchor prefix* is represented by a pound symbol (`U+0023` aka `#`).
+
+A *link inner separator* is represented by a pipe symbol (`U+007C` aka `|`).
+
+A *link start seq* is represented by `[[`.
+
+A *link end seq* is represented by `]]`.
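+
+Combining the pieces defined above, the Rust sketch below parses a single,
+already isolated wiki link. The `WikiLink` type and `parse_wiki_link` are
+hypothetical. The sketch does not distinguish interwiki, diary, or external
+file links, which share the same delimiters, so `[[wiki1:page]]` would be read
+here as a plain path.

```rust
#[derive(Debug, PartialEq)]
struct WikiLink<'a> {
    path: Option<&'a str>,
    anchors: Vec<&'a str>,
    description: Option<&'a str>,
}

/// Parses a single, already isolated `[[...]]` wiki link.
fn parse_wiki_link(s: &str) -> Option<WikiLink<'_>> {
    let inner = s.strip_prefix("[[")?.strip_suffix("]]")?;
    // The first link inner separator (`|`) splits the target from the description.
    let (target, description) = match inner.find('|') {
        Some(i) => (&inner[..i], Some(&inner[i + 1..])),
        None => (inner, None),
    };
    if description == Some("") {
        return None;
    }
    // Everything before the first link anchor prefix (`#`) is the path.
    let mut parts = target.split('#');
    let path = parts.next().filter(|p| !p.is_empty());
    let anchors: Vec<&str> = parts.collect();
    // Each anchor component needs at least one character, and a wiki link
    // needs a path, an anchor, or both.
    if anchors.iter().any(|a| a.is_empty()) || (path.is_none() && anchors.is_empty()) {
        return None;
    }
    Some(WikiLink { path, anchors, description })
}
```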
+
+A *link uri* is represented by the following:
+1. Starts with `www.`, `//`, or a [[#link uri scheme|link uri scheme]]
+ a. If starting with `www.`, we add a virtual prefix of `https://` going forward
+ b. If starting with `//`, we add a virtual prefix of `file:/` going forward
+2. One or more characters that are not [[#whitespace character|whitespace characters]]
+ or [[#line ending|line ending]]
+
+A *link uri scheme* is represented by a series of alphanumeric characters
+ (`a-z`, `A-Z`, `0-9`) as well as plus (`U+002B` aka `+`), period (`U+002E`
+ aka `.`), and hyphen (`U+002D` aka `-`). The scheme is terminated by a
+ colon (`U+003A` aka `:`).
+
+*Extra Notes*: Additional validation should be done to ensure that a
+[[#link uri|link uri]] properly adheres to [[https://tools.ietf.org/html/rfc3986|RFC3986]].
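+
+The following hedged Rust sketch applies the link uri rules, including the
+virtual prefixes. The name `normalize_link_uri` is hypothetical. The check that
+a scheme begins with a letter comes from RFC 3986, not from the rule above.

```rust
/// Applies the link uri rules to `s`, returning the URI with any virtual
/// prefix added, or `None` if `s` is not a link uri.
fn normalize_link_uri(s: &str) -> Option<String> {
    // A link uri contains no whitespace characters or line endings.
    if s.is_empty() || s.contains(|c: char| matches!(c, ' ' | '\t' | '\n' | '\r')) {
        return None;
    }
    if s.starts_with("www.") {
        return (s.len() > 4).then(|| format!("https://{}", s));
    }
    if s.starts_with("//") {
        return (s.len() > 2).then(|| format!("file:/{}", s));
    }
    let colon = s.find(':')?;
    let scheme = &s[..colon];
    // RFC 3986 additionally requires a scheme to begin with a letter.
    let starts_with_letter = scheme.bytes().next().map_or(false, |b| b.is_ascii_alphabetic());
    let valid_chars = scheme
        .bytes()
        .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'+' | b'.' | b'-'));
    if starts_with_letter && valid_chars && colon + 1 < s.len() {
        Some(s.to_string())
    } else {
        None
    }
}
```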
+
+==== Decorated Text ====
+
+Decorated text supports a variety of markups across [[#link|links]],
+[[#keyword|keywords]], and [[#text|text]]. It mirrors these different HTML elements:
+* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/strong|<strong>]]
+* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/em|<em>]]
+* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/s|<s>]]
+* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/code|<code>]]
+* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/sup|<sup>]]
+* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/sub|<sub>]]
+
+{{{vimwiki
+*bold*
+_italic_
+*_bold italic_*
+_*bold italic*_
+~~strikeout~~
+`code`
+^superscript^
+,,subscript,,
+}}}
+
+===== Syntax =====
+
+*Decorated text* is represented by one of the following:
+* [[#bold text|Bold text]]
+* [[#italic text|Italic text]]
+* [[#bold italic text|Bold italic text]]
+* [[#strikeout text|Strikeout text]]
+* [[#code text|Code text]]
+* [[#superscript text|Superscript text]]
+* [[#subscript text|Subscript text]]
+
+*Bold text* is represented by the following:
+1. An asterisk (`U+002A` aka `*`)
+2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until
+ an asterisk (`U+002A` aka `*`) is encountered
+3. An asterisk (`U+002A` aka `*`)
+
+*Italic text* is represented by the following:
+1. An underscore (`U+005F` aka `_`)
+2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until
+ an underscore (`U+005F` aka `_`) is encountered
+3. An underscore (`U+005F` aka `_`)
+
+*Bold italic text* is represented by either of the following:
+* Form 1
+ 1. Sequence `*_`
+ 2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until
+ an `_*` is encountered
+ 3. Sequence `_*`
+* Form 2
+ 1. Sequence `_*`
+ 2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until
+ an `*_` is encountered
+ 3. Sequence `*_`
+
+*Strikeout text* is represented by the following:
+1. Two tildes (`U+007E` aka `~`)
+2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until
+ two tildes (`U+007E` aka `~`) are encountered
+3. Two tildes (`U+007E` aka `~`)
+
+*Code text* is represented by the following:
+1. A backtick or grave accent (`U+0060`)
+2. One or more characters other than a backtick or [[#line ending|line ending]]
+3. A backtick or grave accent (`U+0060`)
+
+*Superscript text* is represented by the following:
+1. A caret or circumflex accent (`U+005E` aka `^`)
+2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until
+ a caret or circumflex accent (`U+005E` aka `^`) is encountered
+3. A caret or circumflex accent (`U+005E` aka `^`)
+
+*Subscript text* is represented by the following:
+1. Two commas (`U+002C` aka `,`)
+2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until
+ two commas (`U+002C` aka `,`) are encountered
+3. Two commas (`U+002C` aka `,`)
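+
+The delimiter rules above can be sketched as a single matcher. The following
+Python is a minimal illustration, not the implementation used by vimwiki or
+vimwiki-server; `Decoration` and `match_decoration` are hypothetical names. It
+tries two-character openers first so that `*_` is recognized as bold italic
+rather than bold, stops at the first closing delimiter, and assumes that
+decorated text cannot cross a line ending (treated here as `\r` or `\n`).
+
+{{{python
+from typing import NamedTuple, Optional
+
+
+class Decoration(NamedTuple):
+    kind: str   # e.g. "bold", "code"
+    inner: str  # raw content between the delimiters, not parsed further here
+    end: int    # index just past the closing delimiter
+
+
+# (kind, opening, closing). Two-character openers come first so that "*_"
+# is matched as bold italic instead of bold followed by "_".
+_DELIMITERS = [
+    ("bold italic", "*_", "_*"),
+    ("bold italic", "_*", "*_"),
+    ("strikeout", "~~", "~~"),
+    ("subscript", ",,", ",,"),
+    ("bold", "*", "*"),
+    ("italic", "_", "_"),
+    ("superscript", "^", "^"),
+    ("code", "`", "`"),
+]
+
+
+def _line_end(text: str, pos: int) -> int:
+    """Index of the first CR or LF at or after pos, else len(text)."""
+    ends = [i for i in (text.find("\r", pos), text.find("\n", pos)) if i != -1]
+    return min(ends, default=len(text))
+
+
+def match_decoration(text: str, start: int = 0) -> Optional[Decoration]:
+    """Match decorated text beginning exactly at text[start], if any."""
+    for kind, opening, closing in _DELIMITERS:
+        if not text.startswith(opening, start):
+            continue
+        content_start = start + len(opening)
+        # Search for the closer only up to the line ending (assumed uncrossable).
+        close = text.find(closing, content_start, _line_end(text, content_start))
+        # Needs a closing delimiter and at least one character of content.
+        if close > content_start:
+            return Decoration(kind, text[content_start:close], close + len(closing))
+    return None
+}}}
+
+Returning the inner content unparsed keeps the sketch small; a complete parser
+would go on to split it into links, keywords, and text.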
+
+TODO ... A backtick cannot currently be escaped within code text. Is this
+ something that we'd expect to support?
+
+==== Keyword ====
+
+Keywords are specific, case-sensitive words that receive alternative
+highlighting within vim but serve no other special purpose.
+
+===== Syntax =====
+
+A *keyword* is represented as one of the following:
+* `DONE`
+* `FIXED`
+* `FIXME`
+* `STARTED`
+* `TODO`
+* `XXX`
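+
+A scanner can recognize keywords with a case-sensitive match against this
+list. In the following Python sketch, `match_keyword` is a hypothetical helper.
+Requiring a word boundary on both sides, so that `TODOS` remains plain text, is
+an assumption: this section does not say whether a keyword may appear inside a
+longer word.
+
+{{{python
+import re
+from typing import Optional
+
+# The keywords listed in this section; matching is case-sensitive.
+KEYWORDS = ("DONE", "FIXED", "FIXME", "STARTED", "TODO", "XXX")
+
+# Assumption: a keyword must be a whole word (word boundary on both sides).
+_KEYWORD = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, KEYWORDS)))
+
+
+def match_keyword(text: str, pos: int = 0) -> Optional[str]:
+    """Return the keyword that starts exactly at text[pos], if any."""
+    match = _KEYWORD.match(text, pos)
+    return match.group(0) if match else None
+}}}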
+
+==== Text ====
+
+Text is a plain series of characters with no special styling applied directly;
+it can be included within other [[#inline components|inline components]].
+
+===== Syntax =====
+
+*Text* is represented by one or more characters until any of the following
+is encountered:
+* [[#inline math|inline math]]
+* [[#tags|tags]]
+* [[#link|link]]
+* [[#decorated text|decorated text]]
+* [[#keyword|keyword]]
+* [[#line ending|line ending]]
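+
+Because text ends wherever another inline component begins, a parser can scan
+forward one character at a time and stop at the first position where another
+component starts. In the following Python sketch, `scan_text` is a hypothetical
+helper and `starts_component` is a placeholder predicate standing in for the
+inline math, tag, link, decorated text, and keyword parsers; neither comes from
+an existing tool. Treating `\r` and `\n` as the start of a line ending is also
+an assumption.
+
+{{{python
+from typing import Callable
+
+# Placeholder: does another inline component begin at line[i]?
+StartsComponent = Callable[[str, int], bool]
+
+
+def scan_text(line: str, pos: int, starts_component: StartsComponent) -> int:
+    """Return the index just past the run of text that begins at `pos`."""
+    end = pos
+    while end < len(line) and line[end] not in "\r\n":
+        # Text holds at least one character, so `pos` itself is never checked.
+        if end > pos and starts_component(line, end):
+            break
+        end += 1
+    return end
+
+
+def starts_bold(line: str, i: int) -> bool:
+    """Toy predicate that only recognizes a bold opener."""
+    return line.startswith("*", i)
+
+
+print(scan_text("plain *bold*", 0, starts_bold))  # 6
+}}}
+
+Always consuming the first character guarantees progress when a parser falls
+back to text after every other inline component has failed at a position, for
+example on an unterminated `*`.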
+
+=== Comments ===
+
+Separate from [[#block components|block components]] and [[#inline components|inline components]],
+comments are another component available within vimwiki. There are two
+kinds:
+
+1. Line comment in the form of `%%CONTENT`
+2. Multi-line comment in the form of `%%+CONTENT+%%`
+
+TODO ... If comments are removed from vimwiki content before all other syntax
+ is evaluated, we need to provide an escape mechanism. Otherwise, comment syntax
+ within inline code will be removed when rendering to HTML, parsing, etc.
+
+TODO ... While vimwiki, pandoc, and vimwiki-server do not yet offer this, should
+ we consider an escape sequence that leaves comment syntax within a vimwiki file
+ as normal text? Something like `\%%` would produce a literal `%%` and `\%%+`
+ would produce a literal `%%+`. We could use the same vim conceal syntax as with
+ bold and other decorations to hide the preceding backslash.
+
+==== Line Comment Syntax ====
+
+A *line comment* is represented by the following:
+1. The sequence `%%`
+2. Any characters until a [[#line ending|line ending]]
+
+*Extra Notes*: A line comment does not consume a [[#line ending|line ending]],
+only the characters leading up to one. If a line comment is at the beginning of
+a line, it will leave a blank line in its place.
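+
+Because a line comment stops short of the line ending, removing one with a
+single regular expression leaves the blank line described above. The following
+Python sketch is illustrative: `strip_line_comments` is a hypothetical name. It
+assumes a line comment may begin anywhere in a line (the pandoc and
+vimwiki-server behavior discussed in the TODO that follows), and it skips `%%+`
+so that multi-line comments are left for their own rule.
+
+{{{python
+import re
+
+# `%%` plus everything up to, but not including, a CR or LF. The negative
+# lookahead skips `%%+`, which opens a multi-line comment instead.
+_LINE_COMMENT = re.compile(r"%%(?!\+)[^\r\n]*")
+
+
+def strip_line_comments(text: str) -> str:
+    """Remove line comments while keeping their line endings."""
+    return _LINE_COMMENT.sub("", text)
+
+
+print(repr(strip_line_comments("%% note\ntext %% aside\nmore")))  # '\ntext \nmore'
+}}}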
+
+TODO ... Today, the documentation describes a line comment as starting at the
+ beginning of a line, while pandoc and vimwiki-server support a line comment at
+ any position in a line. What stance do we want to take here? Would we need a
+ compatibility layer for people who might have used `%%` in the middle of a
+ line without expecting it to start a comment? Or is avoiding strict backwards
+ compatibility one of the advantages of finally defining a specification?
+
+==== Multi-line Comment Syntax ====
+
+A *multi-line comment* is represented by the following:
+1. The sequence `%%+`
+2. Any characters until the sequence `+%%`
+3. The sequence `+%%`
+
+*Extra Notes*: A multi-line comment consumes all characters, including
+[[#line ending|line endings]], between its opening and closing sequences. It
+can therefore join content from separate lines together, as in the example below.
+
+{{{vimwiki
+first line%%+
++%%second line
+
+would become
+
+first linesecond line
+}}}
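+
+A multi-line comment can be removed with a non-greedy regular expression
+compiled with `re.DOTALL`, so that `.` also matches line endings. In the
+following Python sketch, `strip_multi_line_comments` is a hypothetical name.
+Each `%%+` pairs with the nearest following `+%%`, and an unterminated `%%+` is
+left as plain text; this specification does not yet define that case, so the
+behavior is an assumption.
+
+{{{python
+import re
+
+# Non-greedy, so each opener pairs with the nearest closer; DOTALL lets the
+# comment body span lines, which is how separate lines get joined.
+_MULTI_LINE_COMMENT = re.compile(r"%%\+.*?\+%%", re.DOTALL)
+
+
+def strip_multi_line_comments(text: str) -> str:
+    """Remove multi-line comments, including any line endings inside them."""
+    return _MULTI_LINE_COMMENT.sub("", text)
+
+
+print(strip_multi_line_comments("first line%%+\n+%%second line"))  # first linesecond line
+}}}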
+
+== Parser Details ==
+
+When building a parser for the vimwiki language, certain components may overlap
+in the text that they can match. This means that the order in which components
+are evaluated can affect how a page is interpreted.
+
+Additionally, the inclusion of [[#comments|comments]] further complicates the
+process of parsing a file. Comments remove their own content from a file, and
+multi-line comments can also remove [[#line ending|line ending]] characters.
+
+To that end, a two-pass parser is required so that comments are extracted
+before the full vimwiki syntax is parsed:
+1. Parse all comments and remove them from the input
+2. Parse the remaining input as a page made up of [[#block components|block components]]
+
+{{{
+Comment =
+ | Multi Line Comment
+ | Line Comment
+Page = (Block Component)+
+Block Component =
+ | Header
+ | Definition List
+ | List
+ | Table
+ | Math Block
+ | Blank Line
+ | Blockquote
+ | Divider
+ | Placeholder
+ | Paragraph
+ | Non-blank Line
+Inline Component =
+ | Math Inline
+ | Tags
+ | Link
+ | Decorated Text
+ | Keyword
+ | Text
+}}}
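+
+The first pass can be sketched as a single left-to-right scan that recognizes
+both comment forms. Scanning once, rather than removing one form and then the
+other, means that a `%%+` inside a line comment, or a `%%` inside a multi-line
+comment, is consumed by the comment that already contains it. In the following
+Python sketch, `remove_comments` and `parse_page` are hypothetical names and
+`parse_blocks` is a placeholder for a real block component parser; none of them
+come from an existing tool.
+
+{{{python
+import re
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+# Pass 1: both comment forms in one pattern. The lookahead keeps `%%+` from
+# being read as a line comment; an unterminated `%%+` is left as text.
+_COMMENT = re.compile(r"%%\+.*?\+%%|%%(?!\+)[^\r\n]*", re.DOTALL)
+
+
+def remove_comments(text: str) -> str:
+    """Pass 1: strip every comment in a single left-to-right scan."""
+    return _COMMENT.sub("", text)
+
+
+def parse_page(text: str, parse_blocks: Callable[[str], T]) -> T:
+    """Pass 2: hand the comment-free input to a block component parser."""
+    return parse_blocks(remove_comments(text))
+
+
+# `str.splitlines` stands in for a real block component parser here.
+print(parse_page("x %% skip %%+\ny\n%%+ z +%%w", str.splitlines))  # ['x ', 'y', 'w']
+}}}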