vimwiki

Personal wiki for vim
git clone https://github.com/vimwiki/vimwiki.git

commit 2c1004500473fae8c9595146c9a28af3f620f94f
parent 8d4cb7f11d5f69c90139cd28fa1a3e31e43390e8
Author: Chip Senkbeil <chip@senkbeil.org>
Date:   Tue, 29 Sep 2020 00:02:41 -0500

Add vimwiki language specification 0.1.0

Diffstat:
A doc/specification.wiki | 941 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 941 insertions(+), 0 deletions(-)

diff --git a/doc/specification.wiki b/doc/specification.wiki @@ -0,0 +1,941 @@ += Specification = + +The following is a draft specification for the *vimwiki* markup language. It is +provided as a guideline for consistent parsing and rendering of vimwiki +content. + +Similar to the approach taken with [[https://www.commonmark.org/|commonmark]], +this document attempts to specify *vimwiki* syntax unambiguously. It contains +examples of the language and describes the specifics of the language in a way +that tooling can more easily define parsers and generators. + +== Version == + +Current: *0.1.0* + +This specification is versioned in order to provide a stable point of reference +for external tools. [[https://semver.org/|Semantic versioning]] is used to keep +track of the current state of the specification. *MAJOR.MINOR.PATCH* is the +format. + +While the specification remains with a zero major version, the contents of this +specification can change without any requirement to maintain compatibility. +Once a non-zero major version is released, new vimwiki language elements may +only be added in minor releases and any breaking change must be reflected by a +major release. Tweaks in language that do not add, alter, or remove elements of +vimwiki (such as typos, clarifications, etc.) may be made with patch releases. + +The specification will continue to remain in a development mode (zero major +mode) until the language has become relatively stable. + +== Language == + +The following will describe the individual elements - hereafter referred to as +components - of the vimwiki language. It will cover each component's purpose +and a clear description of the syntax. + +For details on parsing precedence, see the [[#Specification#Parser Details|companion section]]. + +=== Primitives === + +In order to define the vimwiki language, we first need to present several +definitions for primitive building blocks used to shape up higher-level +components. 
Relevant definitions are borrowed from +[[https://spec.commonmark.org/0.29/#characters-and-lines|commonmark characters and lines]]. + +A *character* in vimwiki is a valid UTF-8 code point. + +A *line* is a sequence of zero or more [[#Specification#Language#Primitives#character|characters]] other than a +newline (`U+000A` aka `\n`) or carriage return (`U+000D` aka `\r`), followed by +a [[#Specification#Language#Primitives#line ending|line ending]] or by the end of a file. + +A *line ending* is a newline (`U+000A` aka `\n`), a carriage return (`U+000D` +aka `\r`) not followed by a newline, or a carriage return and a following +newline (`\r\n`). + +A line with no characters, or a line containing only spaces (`U+0020`) or tabs +(`U+0009`), is called a *blank line*. + +A *whitespace character* is a space (`U+0020`) or tab (`U+0009`). + +=== Block Components === + +The vimwiki language has a variety of syntax that represent components within +a page. In this section, we discuss block-level components, which are vimwiki +syntax that are standalone and can comprise one or more entire lines within +a file. + +==== Blockquote ==== + +A blockquote has two different forms available: indented text or text prefixed +with a right angle bracket or chevron `>`. Its purpose is to convey an extended +quotation. + +*Form 1*: + +{{{vimwiki + This is a blockquote + that exists on more than one line +}}} + +*Form 2*: + +{{{vimwiki +> This is a blockquote +> that exists on more than one line +}}} + +===== Syntax ===== + +*Form 1*: + +A blockquote is made of *one* or more [[#indented blockquote line|indented blockquote lines]] + +An *indented blockquote line* is made of the following: +1. Four or more [[#whitespace characters|whitespace characters]] +2. All characters up until a [[#line ending|line ending]] +3. 
A [[#line ending|line ending]] or end of input + +*Form 2*: + +A blockquote is made of *one* or more [[#chevron blockquote line|chevron blockquote lines]], which may be +separated by zero or more [[#blank line|blank lines]] + +A *chevron blockquote line* is made of the following: +1. Starts at the beginning of a line +2. A prefix right angle bracket or chevron (`U+003E` aka `>`) +3. A [[#whitespace character|whitespace character]] such as ' ' or '\t' +4. All characters up until a [[#line ending|line ending]] +5. A [[#line ending|line ending]] or end of input + +*Extra Notes*: Each line of a blockquote, minus the indentation or chevron, is +trimmed to remove all leading and trailing [[#whitespace character|whitespace characters]]. + +==== Definition List ==== + +A definition list is composed of a series of terms and associated definitions. +It mirrors the functionality available in an [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/dl|HTML Description List]]. + +{{{vimwiki +Term 1:: Some definition +Term 2:: First def +:: Second def +Term3:: +:: Some definition +}}} + +TODO ... should the definitions and terms be raw text? Or can they support + decorations and links? e.g. `Term 1:: *Bold* def with [[link]]` + +===== Syntax ===== + +A *definition list* is composed of one or more [[#term and definitions|term and definitions]] + +A *term and definitions* is composed of one [[#term line|term line]] and +zero or more [[#definition line|definition lines]] + +A *term line* is represented by the following: +1. Starts at the beginning of a line +2. One or more characters before the sequence `::` +3. The sequence `::` +4. An optional one or more characters before [[#line ending|line ending]] + to be the first definition +5. A [[#line ending|line ending]] or end of input + +A *definition line* is represented by the following: +1. Starts at the beginning of a line +2. The sequence `::` +3. One or more characters before [[#line ending|line ending]] +4. 
A [[#line ending|line ending]] or end of input + +*Extra Notes*: Each term and definition is trimmed to remove all leading and +trailing [[#whitespace character|whitespace characters]]. + +==== Divider ==== + +A divider is composed of a sequence of dashes (`U+002D`). It mirrors the +functionality of the [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/hr|HTML Horizontal Rule]]. + +{{{vimwiki +---- +}}} + +===== Syntax ===== + +A *divider* is represented by the following: +1. Starts at the beginning of a line +2. Four or more dashes (`U+002D`) +3. A [[#line ending|line ending]] or end of input + +==== Header ==== + +A header is composed of some content surrounded by equals signs (`U+003D` aka +`=`) of equal length. It mirrors the functionality of the [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements|HTML Heading]]. + +{{{vimwiki += Some Heading = +== Some Sub Heading == +}}} + +TODO ... should the content of the header be raw text? Or can it support + decorations and links? e.g. `= *Bold* Header with [[link]] =` + Pandoc appears to support decorations, links, and other [[#inline components|inline components]] + +===== Syntax ===== + +A *header* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] (implying + whether or not a header is centered) +3. One or more equals sign (`U+003D`) characters +4. Any character until an equal number of equals sign characters are found +5. An equivalent number of equals sign characters as in step #3 +6. A [[#line ending|line ending]] or end of input + +*Extra Notes*: Each header's content, minus the equals signs, is trimmed to +remove all leading and trailing [[#whitespace character|whitespace characters]]. +For example, `= header =` is equal to `=header=`. + +==== List ==== + +A list is composed of a series of list items, each being comprised +of [[#inline components|inline components]] and [[#list|sub lists]]. 
It mirrors the +functionality available in an [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ul|HTML Unordered List]] +and [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ol|HTML Ordered List]]. + +{{{vimwiki +- List item 1 has *bold* and [[links]] +- List item 2 has content + 1. Ordered sublist + 2. within list item 2 +}}} + +===== Syntax ===== + +A *list* is composed of one or more [[#list item|list items]]. + +A *list item* is composed of a [[#starting list item line|starting list item line]] +and zero or more [[#companion list item line|companion list item lines]]. + +A *starting list item line* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] that signify + the level of indentation to use when understanding if later content is + still associated with this list item, if a new list item is the beginning + of a sublist, if a new list item is a sibling, or if a new list item + is the sibling of a parent +3. 
One of the following prefixes that determines the type of list item: + * Hyphen (`U+002D` aka `-`) is for an unordered list + * Asterisk (`U+002A` aka `*`) is for an unordered list + * Pound (`U+0023` aka `#`) is for an ordered list + * One or more digits followed by a period (`U+002E` aka `.`) or + a right parenthesis (`U+0029` aka `)`) is for an ordered list + * One or more lowercase alphabetic characters (`a-z`) followed by a period + (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is for an + ordered list + * One or more uppercase alphabetic characters (`A-Z`) followed by a period + (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is for an + ordered list + * One or more lowercase Roman numerals (any of `ivxlcdm`) followed by a + period (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is + for an ordered list + * One or more uppercase Roman numerals (any of `IVXLCDM`) followed by a + period (`U+002E` aka `.`) or a right parenthesis (`U+0029` aka `)`) is + for an ordered list +4. A [[#whitespace character|whitespace character]] +5. An optional [[#todo attribute|todo attribute]] and additional [[#whitespace character|whitespace character]] +6. Zero or more [[#inline components|inline components]] +7. A [[#line ending|line ending]] or end of input + +A *companion list item line* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] that total + as many or more as the [[#starting list item line|starting list item line]] indentation +3. 
One of the following: + * The start of a new [[#list|list]] (to be treated as a sublist of the + current list item) + * A series of one or more [[#inline components|inline components]] + (to be added to the content of the current list item) followed by + a [[#line ending|line ending]] or end of input + * A [[#blank line|blank line]] if there is guaranteed to still be some + list item content in later lines + +A *todo attribute* is composed of surrounding square brackets in the form +of a left square bracket (`U+005B` aka `[`) and right square bracket +(`U+005D` aka `]`). In between the square brackets is a single character to +denote the todo status and is one of the following: +* A space (`U+0020` aka ' ') meaning 0% or incomplete +* A period (`U+002E` aka `.`) meaning 1-33% progress +* A lowercase o (`U+006F` aka `o`) meaning 34-66% progress +* An uppercase O (`U+004F` aka `O`) meaning 67-99% progress +* An uppercase X (`U+0058` aka `X`) meaning 100% or completed +* A hyphen (`U+002D` aka `-`) meaning rejected + +*Extra Notes*: Because of the ambiguity of alphabetic list items and Roman +numerals, which are composed of specific alphabetic characters in various +arrangements, a list needs to be evaluated across all of its items to determine +if a list item's type is Roman or alphabetic. If all list items begin with +valid Roman numerals, then the types are Roman numerals. If any list item is +not a valid Roman numeral, then all list item types for those prefixes are +considered to be alphabetic. + +==== Math Block ==== + +A math block is composed of a series of lines representing a mathematical +formula. It is rendered in HTML using the [[https://www.mathjax.org/|MathJax engine]]. + +{{{vimwiki +{{$%align% +\sum_i a_i^2 &= 1 + 1 \\ +&= 2. 
+}}$ +}}} + +===== Syntax ===== + +A *math block* is composed of a [[#beginning math block line|beginning math block line]], +one or more [[#math block line|math block lines]], and an [[#ending math block line|ending math block line]]. + +A *beginning math block line* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] +3. The sequence `{{$` +4. An optional [[#math environment|math environment]] +5. Zero or more [[#whitespace character|whitespace characters]] +6. A [[#line ending|line ending]] or end of input + +A *math environment* is represented by the following: +1. The percent sign (`U+0025` aka `%`) +2. One or more characters that are not the percent sign or [[#line ending|line ending]] +3. The percent sign (`U+0025` aka `%`) + +A *math block line* is a line found after a [[#beginning math block line|beginning math block line]] +and before an [[#ending math block line|ending math block line]] and is +comprised of zero or more characters followed by a [[#line ending|line ending]]. + +An *ending math block line* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] +3. The sequence `}}$` +4. Zero or more [[#whitespace character|whitespace characters]] +5. A [[#line ending|line ending]] or end of input + +==== Paragraph ==== + +A paragraph is composed of a series of lines representing some content. It +mirrors [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p|HTML Paragraph]]. + +{{{vimwiki +Some paragraph containing +multiple lines including +*bold* and [[links]]. +}}} + +===== Syntax ===== + +A *paragraph* is composed of one or more [[#paragraph line|paragraph lines]]. + +A *paragraph line* is represented by the following: +1. Starts at the beginning of a line +2. Has no [[#whitespace character|whitespace]] indentation +3. 
Is not any of the following: + * [[#header|header]] + * [[#definition list|definition list]] + * [[#list|list]] + * [[#table|table]] + * [[#preformatted text|preformatted text]] + * [[#math block|math block]] + * [[#blank line|blank line]] + * [[#blockquote|blockquote]] + * [[#divider|divider]] + * [[#placeholder|placeholder]] +4. One or more [[#inline components|inline components]] +5. A [[#line ending|line ending]] or end of input + +TODO ... should we combine [[#non-blank line|non-blank line]] with paragraph + by having a paragraph trim all leading [[#whitespace character|whitespace]]? + +==== Placeholder ==== + +A placeholder is composed of an identifier and some information. Its purpose +is to provide metadata for use in rendering vimwiki to HTML and populating +portions of the HTML template used with a vimwiki page. + +{{{vimwiki +%title Some title +%nohtml +%template my_template +%date 2020-12-23 +}}} + +===== Syntax ===== + + +A *placeholder* is represented by one of the following: +* A [[#title placeholder|title placeholder]] +* A [[#nohtml placeholder|nohtml placeholder]] +* A [[#template placeholder|template placeholder]] +* A [[#date placeholder|date placeholder]] + +A *title placeholder* is represented by the following: +1. Starts at the beginning of a line +2. A percent sign (`U+0025` aka `%`) +3. The sequence `title` +4. One or more [[#whitespace character|whitespace characters]] +5. Any sequence of [[#whitespace character|whitespace characters]] and + non-whitespace characters leading up to a [[#line ending|line ending]] +6. A [[#line ending|line ending]] or end of input + +A *nohtml placeholder* is represented by the following: +1. Starts at the beginning of a line +2. A percent sign (`U+0025` aka `%`) +3. The sequence `nohtml` +4. A [[#line ending|line ending]] or end of input + +A *template placeholder* is represented by the following: +1. Starts at the beginning of a line +2. A percent sign (`U+0025` aka `%`) +3. The sequence `template` +4. 
One or more [[#whitespace character|whitespace characters]] +5. Any sequence of [[#whitespace character|whitespace characters]] and + non-whitespace characters leading up to a [[#line ending|line ending]] +6. A [[#line ending|line ending]] or end of input + +A *date placeholder* is represented by the following: +1. Starts at the beginning of a line +2. A percent sign (`U+0025` aka `%`) +3. The sequence `date` +4. One or more [[#whitespace character|whitespace characters]] +5. A date string in the format `YYYY-MM-DD` where `YYYY` symbolizes a + four-digit year (e.g. `1990`), `MM` symbolizes a two-digit month (e.g. + `04`), and `DD` symbolizes a two-digit day (e.g. `23`) +6. A [[#line ending|line ending]] or end of input + +==== Preformatted Text ==== + +A preformatted text block is composed of a series of lines representing some +content, usually related to a programming language. It mirrors +[[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/pre|HTML Preformatted Text]]. + +{{{vimwiki +{{{rust +fn my_func() -> u32 { + 1 + 2 +} +\}}} +}}} + +TODO ... wrapping a preformatted text block with another preformatted text + isn't possible right now due to the use of `}}}` matching. We would + need some sort of escape like `\}}}` or `\{{{` that could be used to + avoid matching legit syntax but still provide the literal text + upon rendering. + +===== Syntax ===== + +A *preformatted text* is composed of a [[#beginning preformatted text line|beginning preformatted text line]], +one or more [[#preformatted text line|preformatted text lines]], and an [[#ending preformatted text line|ending preformatted text line]]. + +A *beginning preformatted text line* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] +3. The sequence `{{{` +4. An optional [[#preformatted language identifier|preformatted language identifier]] +5. 
An optional [[#preformatted metadata list|preformatted metadata list]] +6. Zero or more [[#whitespace character|whitespace characters]] +7. A [[#line ending|line ending]] or end of input + +A *preformatted language identifier* is represented by the following: +1. One or more characters that are not equals sign (`U+003D` aka `=`) +2. An optional semicolon (`U+003B` aka `;`) + +A *preformatted metadata list* is composed of one or more +[[#preformatted metadata list items|preformatted metadata list items]] separated by semicolons (`U+003B` aka `;`). + +A *preformatted metadata list item* is represented by the following: +1. One or more characters leading up to an equals sign (`U+003D` aka `=`), + not including a [[#line ending|line ending]] +2. An equals sign (`U+003D` aka `=`) +3. A quotation mark (`U+0022` aka `"`) +4. One or more characters leading up to a quotation mark (`U+0022` aka `"`) +5. A quotation mark (`U+0022` aka `"`) + +A *preformatted text line* is a line found after a [[#beginning preformatted text line|beginning preformatted text line]] +and before an [[#ending preformatted text line|ending preformatted text line]] and is +comprised of zero or more characters followed by a [[#line ending|line ending]]. + +An *ending preformatted text line* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] +3. The sequence `}}}` +4. Zero or more [[#whitespace character|whitespace characters]] +5. A [[#line ending|line ending]] or end of input + +==== Table ==== + +A table is composed of a series of rows containing various other components. It +mirrors [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/table|HTML Table]]. 
+ +{{{vimwiki +| Year | Temperature (low) | Temperature (high) | Temperature (avg) | +|------|-------------------|--------------------------|-------------------| +| 1990 | *50* degrees | 90 according to [[link]] | 72 | +| \/ | 45 degrees | > | 80 | +| \/ | \/ | > | 60 | +| 2000 | > | > | > | +}}} + +===== Syntax ===== + +A *table* is composed of one or more [[#row|rows]] with the indentation of +the first row indicating whether the table is centered (is indented) or not. + +A *row* is represented by one of the following: +* A [[#divider row|divider row]] +* A [[#content row|content row]] + +A *divider row* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] +3. A sequence of pairs comprised of a [[#cell boundary|cell boundary]] and + one or more hyphens (`U+002D` aka `-`) +4. A final [[#cell boundary|cell boundary]] +5. A [[#line ending|line ending]] or end of input + +A *content row* is represented by the following: +1. Starts at the beginning of a line +2. Zero or more [[#whitespace character|whitespace characters]] +3. A sequence of pairs comprised of a [[#cell boundary|cell boundary]] and a [[#cell|cell]] +4. A final [[#cell boundary|cell boundary]] +5. A [[#line ending|line ending]] or end of input + +A *cell* is represented by one of the following: +* A [[#span above cell|span above cell]] +* A [[#span left cell|span left cell]] +* A [[#content cell|content cell]] + +A *span above cell* is represented by the following: +1. Zero or more [[#whitespace character|whitespace characters]] +2. Sequence `\/` +3. Zero or more [[#whitespace character|whitespace characters]] + +A *span left cell* is represented by the following: +1. Zero or more [[#whitespace character|whitespace characters]] +2. Sequence `>` +3. Zero or more [[#whitespace character|whitespace characters]] + +A *content cell* is represented by the following: +1. 
Zero or more [[#whitespace character|whitespace characters]] +2. One or more [[#inline components|inline components]] not comprised of `|` +3. Zero or more [[#whitespace character|whitespace characters]] + +A *cell boundary* is represented by the pipe character (`U+007C` aka `|`). + +==== Non-blank Line ==== + +A non-blank line is a single line that is not a paragraph. It is used to +capture text not represented by any other component. + +{{{vimwiki + Some other text containing *bold* and [[links]]. +}}} + +===== Syntax ===== + +A *non-blank line* is represented by the following: +1. Between one and three [[#whitespace character|whitespace characters]] +2. One or more [[#inline components|inline components]] +3. A [[#line ending|line ending]] or end of input + +TODO ... should we combine [[#non-blank line|non-blank line]] with paragraph + by having a paragraph trim all leading [[#whitespace character|whitespace]]? + +=== Inline Components === + +The vimwiki language also has a variety of syntax that can be used within a +line on a page. These are referred to as *inline components* and can be found +within a variety of [[#block components|block components]] as well as nested +within other inline components. + +==== Math Inline ==== + +A math inline component is composed of a single-line formula. +Like its big brother, the [[#math block|math block]], it is rendered in HTML +using the [[https://www.mathjax.org/|MathJax engine]]. + +{{{vimwiki +$ \sum_i a_i^2 = 1 $ +}}} + +TODO ... is there a way to escape the `$` used to mark the beginning and end + of an inline math component? Is `$` even a concern within an + inline formula? If it is, maybe an escape sequence of `\$` would + be applicable. + +===== Syntax ===== + +An *inline math* component is represented by the following: +1. A dollar sign (`U+0024` aka `$`) +2. One or more characters that are not a dollar sign or [[#line ending|line ending]] +3. 
A dollar sign (`U+0024` aka `$`) + +*Extra Notes*: The formula within the inline component is trimmed to remove all +leading and trailing [[#whitespace character|whitespace characters]]. + +==== Tags ==== + +A tags component is composed of a series of individual tag elements. It is used +both to mark various places within a page as well as act as an [[#anchor|anchor]]. + +{{{vimwiki +:tag-1:tag-2: +}}} + +===== Syntax ===== + +A *tags* component is represented by the following: +1. A [[#tag separator|tag separator]] +2. A sequence of [[#tag|tag]] separated by [[#tag separator|tag separator]] +3. A [[#tag separator|tag separator]] + +A *tag* is represented by one or more characters that are not a colon, +[[#whitespace character|whitespace]], or [[#line ending|line ending]] + +A *tag separator* is represented by a colon (`U+003A` aka `:`). + +TODO ... should tags with whitespace be allowed? + +==== Link ==== + +A link is a crucial inline component of vimwiki and is able to connect pages +with each other as well as external wikis and resources. + +{{{vimwiki +[[other page|link to another page]] +[[wiki1:page|link to page in another wiki]] +[[#some#anchor|link to another location in same page]] +[[diary:2020-12-23|link to diary entry]] +{{https://example.com/img.jpg|Transclusion to pull in image}} +[[https://example.com|{{https://example.com/img.jpg}}]] +}}} + +===== Syntax ===== + +A *link* is represented by one of the following: +* A [[#wiki link|wiki link]] +* An [[#interwiki link|interwiki link]] +* A [[#diary link|diary link]] +* An [[#external file link|external file link]] +* A [[#raw link|raw link]] +* A [[#transclusion link|transclusion link]] + +A *wiki link* is represented by the following: +1. A [[#link start seq|link start seq]] +2. At least one of the following (in order): + 1. An optional [[#link path|link path]] + 2. An optional [[#link anchor|link anchor]] +3. 
An optional [[#link inner separator|link inner separator]] and [[#link description|link description]] +4. A [[#link end seq|link end seq]] + +An *interwiki link* is represented by one of the following: +* An [[#indexed interwiki link|indexed interwiki link]] +* An [[#named interwiki link|named interwiki link]] + +An *indexed interwiki link* is represented by the following: +1. A [[#link start seq|link start seq]] +2. The sequence `wiki` +3. One or more digits (`0-9`), but must be `1` or higher +4. A colon (`U+003A` aka `:`) +5. A [[#link path|link path]] +6. An optional [[#link anchor|link anchor]] +7. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]] +8. A [[#link end seq|link end seq]] + +A *named interwiki link* is represented by the following: +1. A [[#link start seq|link start seq]] +2. The sequence `wn.` +3. One or more characters that are not a colon (`U+003A` aka `:`) or [[#line ending|line ending]] +4. A colon (`U+003A` aka `:`) +5. A [[#link path|link path]] +6. An optional [[#link anchor|link anchor]] +7. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]] +8. A [[#link end seq|link end seq]] + +An *external file link* is represented by the following: +1. A [[#link start seq|link start seq]] +2. A [[#link uri|link uri]] whose schema is `local` or `file` or has no schema + and starts with `//` for an absolute file path +3. A [[#link path|link path]] +4. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]] +5. A [[#link end seq|link end seq]] + +A *raw link* is represented by a [[#link uri|link uri]] not found within +another link type. + +A *transclusion link* is represented by the following: +1. The sequence `{{` +2. A [[#link uri|link uri]] +3. An optional [[#link inner separator|link inner separator]] and [[#link description|link description]] +4. 
An optional sequence of [[#link key value pair|link key value pairs]], + each separated by [[#link inner separator|link inner separator]] +5. The sequence `}}` + +A *link key value pair* is represented by the following: +1. One or more characters that are not a pipe symbol (`U+007C` + aka `|`), equals sign (`U+003D` aka `=`), `}}`, or [[#line ending|line ending]] +2. An equals sign (`U+003D` aka `=`) +3. A quotation mark (`U+0022` aka `"`) +4. One or more characters that are not a pipe symbol (`U+007C` + aka `|`), quotation mark (`U+0022` aka `"`), `}}`, or [[#line ending|line ending]] +5. A quotation mark (`U+0022` aka `"`) + +A *link path* is represented by the following: +1. Does not start with a [[#link anchor prefix|link anchor prefix]] +2. One or more characters that are not a [[#link anchor prefix|link anchor prefix]], + [[#link inner separator|link inner separator]], [[#link end seq|link end seq]], or [[#line ending|line ending]] + +A *link description* is represented by one of the following: +* A [[#link uri|link uri]] +* One or more characters that are not a [[#link end seq|link end seq]] or [[#line ending|line ending]] + +A *link anchor* is represented by a series of pairs, each comprised of +a [[#link anchor prefix|link anchor prefix]] and a [[#link anchor component|link anchor component]] + +A *link anchor component* is represented by one or more characters that are +not a [[#link anchor prefix|link anchor prefix]], [[#link inner separator|link inner separator]], +[[#link end seq|link end seq]], or [[#line ending|line ending]] + +A *link anchor prefix* is represented by a pound symbol (`U+0023` aka `#`). + +A *link inner separator* is represented by a pipe symbol (`U+007C` aka `|`). + +A *link start seq* is represented by `[[`. + +A *link end seq* is represented by `]]`. + +A *link uri* is represented by the following: +1. Starts with `www.`, `//`, or a [[#link uri scheme|link uri scheme]] + a. 
If starting with `www.`, we add a virtual prefix of `https://` going forward + b. If starting with `//`, we add a virtual prefix of `file:/` going forward +2. One or more characters that are not [[#whitespace character|whitespace characters]] + or [[#line ending|line ending]] + +A *link uri scheme* is represented by a series of alphanumeric characters + (`a-z`, `A-Z`, `0-9`) as well as plus (`U+002B` aka `+`), period (`U+002E` + aka `.`), and hyphen (`U+002D` aka `-`). The scheme is terminated by a + colon (`U+003A` aka `:`). + +*Extra Notes*: Additional validation should be done to ensure that a +[[#link uri|link uri]] properly adheres to [[https://tools.ietf.org/html/rfc3986|RFC3986]]. + +==== Decorated Text ==== + +Decorated text supports a variety of markups across [[#link|links]], +[[#keyword|keywords]], and [[#text|text]]. It mirrors these different HTML elements: +* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/strong|<strong>]] +* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/em|<em>]] +* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/s|<s>]] +* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/code|<code>]] +* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/sup|<sup>]] +* [[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/sub|<sub>]] + +{{{vimwiki +*bold* +_italic_ +*_bold italic_* +_*bold italic*_ +~~strikeout~~ +`code` +^superscript^ +,,subscript,, +}}} + +===== Syntax ===== + +*Decorated text* is represented by one of the following: +* [[#bold text|Bold text]] +* [[#italic text|Italic text]] +* [[#bold italic text|Bold italic text]] +* [[#strikeout text|Strikeout text]] +* [[#code text|Code text]] +* [[#superscript text|Superscript text]] +* [[#subscript text|Subscript text]] + +*Bold text* is represented by the following: +1. An asterisk (`U+002A` aka `*`) +2. 
One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until + an asterisk (`U+002A` aka `*`) is encountered +3. An asterisk (`U+002A` aka `*`) + +*Italic text* is represented by the following: +1. An underscore (`U+005F` aka `_`) +2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until + an underscore (`U+005F` aka `_`) is encountered +3. An underscore (`U+005F` aka `_`) + +*Bold Italic text* is represented by either of the following: +* Form 1 + 1. Sequence `*_` + 2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until + an `_*` is encountered + 3. Sequence `_*` +* Form 2 + 1. Sequence `_*` + 2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until + an `*_` is encountered + 3. Sequence `*_` + +*Strikeout text* is represented by the following: +1. Two tildes (`U+007E` aka `~`) +2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until + two tildes (`U+007E` aka `~`) are encountered +3. Two tildes (`U+007E` aka `~`) + +*Code text* is represented by the following: +1. A backtick or grave accent (`U+0060`) +2. Any character other than a backtick or [[#line ending|line ending]] +3. A backtick or grave accent (`U+0060`) + +*Superscript text* is represented by the following: +1. A caret or circumflex accent (`U+005E` aka `^`) +2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until + a caret or circumflex accent (`U+005E` aka `^`) is encountered +3. A caret or circumflex accent (`U+005E` aka `^`) + +*Subscript text* is represented by the following: +1. Two commas (`U+002C` aka `,`) +2. One or more [[#link|links]], [[#keyword|keywords]], or [[#text|text]] until + two commas (`U+002C` aka `,`) are encountered +3. Two commas (`U+002C` aka `,`) + +TODO ... Cannot escape a backtick within code, is this something that we'd + expect to support? 
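The decorated text rules can be sketched as a small delimiter matcher. This is only an illustration, not part of the specification: the `DECORATIONS` table and the `match_decoration` function are hypothetical names, and the sketch treats the decorated content as raw characters instead of recursing into links, keywords, and text as the rules require.

```python
# Hypothetical delimiter table: (component name, opening sequence, closing sequence).
# Both bold italic forms come first so that `*_` is not mistaken for plain bold.
DECORATIONS = [
    ("bold italic", "*_", "_*"),
    ("bold italic", "_*", "*_"),
    ("bold", "*", "*"),
    ("italic", "_", "_"),
    ("strikeout", "~~", "~~"),
    ("code", "`", "`"),
    ("superscript", "^", "^"),
    ("subscript", ",,", ",,"),
]


def match_decoration(text, pos=0):
    """Return (name, content, end_index) for a decoration starting at pos, or None."""
    for name, opener, closer in DECORATIONS:
        if not text.startswith(opener, pos):
            continue
        start = pos + len(opener)
        end = text.find(closer, start)
        if end <= start:
            # No closing sequence found, or the content would be empty.
            continue
        content = text[start:end]
        # Inline components never span a line ending.
        if "\n" in content or "\r" in content:
            continue
        return name, content, end + len(closer)
    return None
```

Trying the longer bold italic delimiters before the single-character ones is one way to resolve the overlap between `*_`, `*`, and `_`; a real parser would also need to parse the captured content as nested inline components.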
+ +==== Keyword ==== + +Keywords are specific, case-sensitive words that have an alternative +highlighting within vim, but serve no other special purpose. + +===== Syntax ===== + +A *keyword* is represented as one of the following: +* `DONE` +* `FIXED` +* `FIXME` +* `STARTED` +* `TODO` +* `XXX` + +==== Text ==== + +Text is a plain series of characters that have no special stylings applied +directly, but can be included in other [[#inline components|inline components]]. + +===== Syntax ===== + +A *text* is represented as one or more characters until any of the following +is encountered: +* [[#inline math|inline math]] +* [[#tags|tags]] +* [[#link|link]] +* [[#decorated text|decorated text]] +* [[#keyword|keyword]] +* [[#line ending|line ending]] + +=== Comments === + +Separately from [[#block components|block components]] and [[#inline components|inline components]], +comments are another component available within vimwiki. There are two +classifications: + +1. Line comment in the form of `%%CONTENT` +2. Multi-line comment in the form of `%%+CONTENT+%%` + +TODO ... if comments are removed from vimwiki before all other syntax is + evaluated, we need to provide an escape mechanism, otherwise the above syntax + within inline code will be removed when rendering to HTML, parsing, etc. + +TODO ... while vimwiki, pandoc, and vimwiki-server do not yet offer this, should + we consider an escape sequence to enable leaving a comment within a vimwiki + file as normal text? Something like `\%%` would leave `%%` and `\%%+` would + leave `%%+`. We could use the same conceal vim syntax as with bold and other + decorations to hide the preceding backslash. + +==== Line Comment Syntax ==== + +1. A *line comment* is represented by the following: + 1. The sequence `%%` + 2. Any character until [[#line ending|line ending]] + +*Extra Notes*: A line comment does not consume a [[#line ending|line ending]], +only the characters leading up to one. 
If a line comment is at the beginning of +a line, it will leave a blank line in its place. + +TODO ... today, the documentation describes a line comment as starting at the + beginning of a line while pandoc and vimwiki-server support a line comment at + any position in a line. What stance do we want to take here? Would we need a + compatibility layer for people who might have leveraged %% within the middle of + a line not expecting it to be a comment? Or should this be one of the advantages + of finally defining a specification in that we can avoid hard backwards + compatibility? + +==== Multi-line Comment Syntax ==== + +1. A *multi-line comment* is represented by the following: + 1. The sequence `%%+` + 2. Any character until the sequence `+%%` + 3. The sequence `+%%` + +*Extra Notes*: A multi-line comment consumes all characters - including +[[#line ending|line ending]] - between the surrounding character sequences. It +can be used to join content in separate lines together. See example below. + +{{{vimwiki +first line%%+ ++%%second line + +would become + +first linesecond line +}}} + +== Parser Details == + +When building a parser for the vimwiki language, certain components may overlap +in the text that they can match. This means that the order in which components +are evaluated can affect how a page is perceived. + +Additionally, the inclusion of [[#comments|comments]] further complicates the +process of parsing a file. Comments should remove any content from a file and +multi-line comments can remove [[#line ending|line ending]] characters. + +To that end, a two-pass parser is required to support properly extracting +comments prior to parsing the full vimwiki syntax: +1. Parse all comments and remove from input +2. 
Parse a page that is full of [[#block components|block components]] + +{{{ +Comment = + | Multi Line Comment + | Line Comment +Page = (Block Component)+ +Block Component = + | Header + | Definition List + | List + | Table + | Math Block + | Blank Line + | Blockquote + | Divider + | Placeholder + | Paragraph + | Non-blank Line +Inline Component = + | Math Inline + | Tags + | Link + | Decorated Text + | Keyword + | Text +}}}
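The first pass of the two-pass approach (comment removal) might look like the following. This is a minimal sketch under stated assumptions, not a reference implementation: the names `COMMENT` and `strip_comments` are hypothetical, and a line comment is assumed to be allowed at any position in a line, which the specification still lists as an open question. As the TODO notes in the Comments section warn, a pass like this also removes comment sequences that appear inside inline code.

```python
import re

# Hypothetical pattern, scanned left to right:
#   1. a multi-line comment: `%%+`, then anything (line endings included)
#      up to the first `+%%`
#   2. otherwise a line comment: `%%` up to, but not including, the line ending
COMMENT = re.compile(r"%%\+.*?\+%%|%%[^\r\n]*", re.DOTALL)


def strip_comments(source: str) -> str:
    """First pass: remove all comments so the second pass sees only content."""
    return COMMENT.sub("", source)


if __name__ == "__main__":
    # Mirrors the multi-line comment example from the specification.
    print(strip_comments("first line%%+\n+%%second line"))
```

A single alternation is used rather than two separate substitutions so that a `%%+` occurring inside an earlier line comment is consumed by that line comment and does not open a multi-line comment.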