//! GFM: table occurs in the [flow][] content type.
//!
//! ## Grammar
//!
//! Tables form with the following BNF
//! (see [construct][crate::construct] for character groups):
//!
//! ```bnf
//! gfm_table ::= gfm_table_head 0*(eol gfm_table_body_row)
//!
//! ; Restriction: both rows must have the same number of cells.
//! gfm_table_head ::= gfm_table_row eol gfm_table_delimiter_row
//!
//! gfm_table_row ::= ['|'] gfm_table_cell 0*('|' gfm_table_cell) ['|'] *space_or_tab
//! gfm_table_cell ::= *space_or_tab gfm_table_text *space_or_tab
//! gfm_table_text ::= 0*(line - '\\' - '|' | '\\' ['\\' | '|'])
//
//! gfm_table_delimiter_row ::= ['|'] gfm_table_delimiter_cell 0*('|' gfm_table_delimiter_cell) ['|'] *space_or_tab
//! gfm_table_delimiter_cell ::= *space_or_tab gfm_table_delimiter_value *space_or_tab
//! gfm_table_delimiter_value ::= [':'] 1*'-' [':']
//! ```
//!
//! As this construct occurs in flow, like all flow constructs, it must be
//! followed by an eol (line ending) or eof (end of file).
//!
//! The above grammar shows that basically anything can be a cell or a row.
//! The main thing that makes something a row, is that it occurs directly before
//! or after a delimiter row, or after another row.
//!
//! It is not required for a table to have a body: it can end right after the
//! delimiter row.
//!
//! Each column can be marked with an alignment.
//! The alignment marker is a colon (`:`) used before and/or after delimiter row
//! filler.
//! To illustrate:
//!
//! ```markdown
//! | none | left | right | center |
//! | ---- | :--- | ----: | :----: |
//! ```
//!
//! The number of cells in the delimiter row, is the number of columns of the
//! table.
//! Only the head row is required to have the same number of cells.
//! Body rows are not required to have a certain number of cells.
//! For body rows that have less cells than the number of columns of the table,
//! empty cells are injected.
//! When a row has more cells than the number of columns of the table, the
//! superfluous cells are dropped.
//! To illustrate:
//!
//! ```markdown
//! | a | b |
//! | - | - |
//! | c |
//! | d | e | f |
//! ```
//!
//! Yields:
//!
//! ```html
//!
//!
//!
//!
a
//!
b
//!
//!
//!
//!
//!
c
//!
//!
//!
//!
d
//!
e
//!
//!
//!
//! ```
//!
//! Each cell’s text is interpreted as the [text][] content type.
//! That means that it can include constructs such as [attention][attention].
//!
//! The grammar for cells prohibits the use of `|` in them.
//! To use pipes in cells, encode them as a character reference or character
//! escape: `|` (or `|`, `|`, `|`, `|`) or
//! `\|`.
//!
//! Escapes will typically work, but they are not supported in
//! [code (text)][raw_text] (and the math (text) extension).
//! To work around this, GitHub came up with a rather weird “trick”.
//! When inside a table cell *and* inside code, escaped pipes *are* decoded.
//! To illustrate:
//!
//! ```markdown
//! | Name | Character |
//! | - | - |
//! | Left curly brace | `{` |
//! | Pipe | `\|` |
//! | Right curly brace | `}` |
//! ```
//!
//! Yields:
//!
//! ```html
//!
//!
//!
//!
Name
//!
Character
//!
//!
//!
//!
//!
Left curly brace
//!
{
//!
//!
//!
Pipe
//!
|
//!
//!
//!
Right curly brace
//!
}
//!
//!
//!
//! ```
//!
//! > 👉 **Note**: no other character can be escaped like this.
//! > Escaping pipes in code does not work when not inside a table, either.
//!
//! ## HTML
//!
//! GFM tables relate to several HTML elements: `
`, ``, `
`,
//! `
`, ``, and `
`.
//! See
//! [*§ 4.9.1 The `table` element*][html_table],
//! [*§ 4.9.5 The `tbody` element*][html_tbody],
//! [*§ 4.9.9 The `td` element*][html_td],
//! [*§ 4.9.10 The `th` element*][html_th],
//! [*§ 4.9.6 The `thead` element*][html_thead], and
//! [*§ 4.9.8 The `tr` element*][html_tr]
//! in the HTML spec for more info.
//!
//! If the the alignment of a column is left, right, or center, a deprecated
//! `align` attribute is added to each `
` and `
` element belonging to
//! that column.
//! That attribute is interpreted by browsers as if a CSS `text-align` property
//! was included, with its value set to that same keyword.
//!
//! ## Recommendation
//!
//! When authoring markdown with GFM tables, it’s recommended to *always* put
//! pipes around cells.
//! Without them, it can be hard to infer whether the table will work, how many
//! columns there are, and which column you are currently editing.
//!
//! It is recommended to not use many columns, as it results in very long lines,
//! making it hard to infer which column you are currently editing.
//!
//! For larger tables, particularly when cells vary in size, it is recommended
//! *not* to manually “pad” cell text.
//! While it can look better, it results in a lot of time spent realigning
//! everything when a new, longer cell is added or the longest cell removed, as
//! every row then must be changed.
//! Other than costing time, it also causes large diffs in Git.
//!
//! To illustrate, when authoring large tables, it is discouraged to pad cells
//! like this:
//!
//! ```markdown
//! | Alpha bravo charlie | delta |
//! | ------------------- | -----------------: |
//! | Echo | Foxtrot golf hotel |
//! ```
//!
//! Instead, use single spaces (and single filler dashes):
//!
//! ```markdown
//! | Alpha bravo charlie | delta |
//! | - | -: |
//! | Echo | Foxtrot golf hotel |
//! ```
//!
//! ## Bugs
//!
//! GitHub’s own algorithm to parse tables contains a bug.
//! This bug is not present in this project.
//! The issue relating to tables is:
//!
//! * [GFM tables: escaped escapes are incorrectly treated as escapes](https://github.com/github/cmark-gfm/issues/277)
//!
//! ## Tokens
//!
//! * [`GfmTable`][Name::GfmTable]
//! * [`GfmTableBody`][Name::GfmTableBody]
//! * [`GfmTableCell`][Name::GfmTableCell]
//! * [`GfmTableCellDivider`][Name::GfmTableCellDivider]
//! * [`GfmTableCellText`][Name::GfmTableCellText]
//! * [`GfmTableDelimiterCell`][Name::GfmTableDelimiterCell]
//! * [`GfmTableDelimiterCellValue`][Name::GfmTableDelimiterCellValue]
//! * [`GfmTableDelimiterFiller`][Name::GfmTableDelimiterFiller]
//! * [`GfmTableDelimiterMarker`][Name::GfmTableDelimiterMarker]
//! * [`GfmTableDelimiterRow`][Name::GfmTableDelimiterRow]
//! * [`GfmTableHead`][Name::GfmTableHead]
//! * [`GfmTableRow`][Name::GfmTableRow]
//! * [`LineEnding`][Name::LineEnding]
//!
//! ## References
//!
//! * [`micromark-extension-gfm-table`](https://github.com/micromark/micromark-extension-gfm-table)
//! * [*§ 4.10 Tables (extension)* in `GFM`](https://github.github.com/gfm/#tables-extension-)
//!
//! [flow]: crate::construct::flow
//! [text]: crate::construct::text
//! [attention]: crate::construct::attention
//! [raw_text]: crate::construct::raw_text
//! [html_table]: https://html.spec.whatwg.org/multipage/tables.html#the-table-element
//! [html_tbody]: https://html.spec.whatwg.org/multipage/tables.html#the-tbody-element
//! [html_td]: https://html.spec.whatwg.org/multipage/tables.html#the-td-element
//! [html_th]: https://html.spec.whatwg.org/multipage/tables.html#the-th-element
//! [html_thead]: https://html.spec.whatwg.org/multipage/tables.html#the-thead-element
//! [html_tr]: https://html.spec.whatwg.org/multipage/tables.html#the-tr-element
use crate::construct::partial_space_or_tab::{space_or_tab, space_or_tab_min_max};
use crate::event::{Content, Event, Kind, Link, Name};
use crate::resolve::Name as ResolveName;
use crate::state::{Name as StateName, State};
use crate::subtokenize::Subresult;
use crate::tokenizer::Tokenizer;
use crate::util::{constant::TAB_SIZE, skip::opt_back as skip_opt_back};
use alloc::{string::String, vec};
/// Start of a GFM table.
///
/// If there is a valid table row or table head before, then we try to parse
/// another row.
/// Otherwise, we try to parse a head.
///
/// ```markdown
/// > | | a |
/// ^
/// | | - |
/// > | | b |
/// ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
if tokenizer.parse_state.options.constructs.gfm_table {
if !tokenizer.pierce
&& !tokenizer.events.is_empty()
&& matches!(
tokenizer.events[skip_opt_back(
&tokenizer.events,
tokenizer.events.len() - 1,
&[Name::LineEnding, Name::SpaceOrTab],
)]
.name,
Name::GfmTableHead | Name::GfmTableRow
)
{
State::Retry(StateName::GfmTableBodyRowStart)
} else {
State::Retry(StateName::GfmTableHeadRowBefore)
}
} else {
State::Nok
}
}
/// Before table head row.
///
/// ```markdown
/// > | | a |
/// ^
/// | | - |
/// | | b |
/// ```
pub fn head_row_before(tokenizer: &mut Tokenizer) -> State {
tokenizer.enter(Name::GfmTableHead);
tokenizer.enter(Name::GfmTableRow);
if matches!(tokenizer.current, Some(b'\t' | b' ')) {
tokenizer.attempt(State::Next(StateName::GfmTableHeadRowStart), State::Nok);
State::Retry(space_or_tab_min_max(
tokenizer,
0,
if tokenizer.parse_state.options.constructs.code_indented {
TAB_SIZE - 1
} else {
usize::MAX
},
))
} else {
State::Retry(StateName::GfmTableHeadRowStart)
}
}
/// Before table head row, after whitespace.
///
/// ```markdown
/// > | | a |
/// ^
/// | | - |
/// | | b |
/// ```
pub fn head_row_start(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
// 4+ spaces.
Some(b'\t' | b' ') => State::Nok,
Some(b'|') => State::Retry(StateName::GfmTableHeadRowBreak),
_ => {
tokenizer.tokenize_state.seen = true;
State::Retry(StateName::GfmTableHeadRowBreak)
}
}
}
/// At break in table head row.
///
/// ```markdown
/// > | | a |
/// ^
/// ^
/// ^
/// | | - |
/// | | b |
/// ```
pub fn head_row_break(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
None => {
tokenizer.tokenize_state.seen = false;
State::Nok
}
Some(b'\n') => {
// Feel free to interrupt:
tokenizer.interrupt = true;
tokenizer.exit(Name::GfmTableRow);
tokenizer.enter(Name::LineEnding);
tokenizer.consume();
tokenizer.exit(Name::LineEnding);
State::Next(StateName::GfmTableHeadDelimiterStart)
}
Some(b'\t' | b' ') => {
tokenizer.attempt(State::Next(StateName::GfmTableHeadRowBreak), State::Nok);
State::Retry(space_or_tab(tokenizer))
}
_ => {
// Whether a delimiter was seen.
if tokenizer.tokenize_state.seen {
tokenizer.tokenize_state.seen = false;
// Header cell count.
tokenizer.tokenize_state.size += 1;
}
if tokenizer.current == Some(b'|') {
tokenizer.enter(Name::GfmTableCellDivider);
tokenizer.consume();
tokenizer.exit(Name::GfmTableCellDivider);
// Whether a delimiter was seen.
tokenizer.tokenize_state.seen = true;
State::Next(StateName::GfmTableHeadRowBreak)
} else {
// Anything else is cell data.
tokenizer.enter(Name::Data);
State::Retry(StateName::GfmTableHeadRowData)
}
}
}
}
/// In table head row data.
///
/// ```markdown
/// > | | a |
/// ^
/// | | - |
/// | | b |
/// ```
pub fn head_row_data(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
None | Some(b'\t' | b'\n' | b' ' | b'|') => {
tokenizer.exit(Name::Data);
State::Retry(StateName::GfmTableHeadRowBreak)
}
_ => {
let name = if tokenizer.current == Some(b'\\') {
StateName::GfmTableHeadRowEscape
} else {
StateName::GfmTableHeadRowData
};
tokenizer.consume();
State::Next(name)
}
}
}
/// In table head row escape.
///
/// ```markdown
/// > | | a\-b |
/// ^
/// | | ---- |
/// | | c |
/// ```
pub fn head_row_escape(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'\\' | b'|') => {
tokenizer.consume();
State::Next(StateName::GfmTableHeadRowData)
}
_ => State::Retry(StateName::GfmTableHeadRowData),
}
}
/// Before delimiter row.
///
/// ```markdown
/// | | a |
/// > | | - |
/// ^
/// | | b |
/// ```
pub fn head_delimiter_start(tokenizer: &mut Tokenizer) -> State {
// Reset `interrupt`.
tokenizer.interrupt = false;
if tokenizer.lazy || tokenizer.pierce {
State::Nok
} else {
tokenizer.enter(Name::GfmTableDelimiterRow);
// Track if we’ve seen a `:` or `|`.
tokenizer.tokenize_state.seen = false;
match tokenizer.current {
Some(b'\t' | b' ') => {
tokenizer.attempt(
State::Next(StateName::GfmTableHeadDelimiterBefore),
State::Next(StateName::GfmTableHeadDelimiterNok),
);
State::Retry(space_or_tab_min_max(
tokenizer,
0,
if tokenizer.parse_state.options.constructs.code_indented {
TAB_SIZE - 1
} else {
usize::MAX
},
))
}
_ => State::Retry(StateName::GfmTableHeadDelimiterBefore),
}
}
}
/// Before delimiter row, after optional whitespace.
///
/// Reused when a `|` is found later, to parse another cell.
///
/// ```markdown
/// | | a |
/// > | | - |
/// ^
/// | | b |
/// ```
pub fn head_delimiter_before(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'-' | b':') => State::Retry(StateName::GfmTableHeadDelimiterValueBefore),
Some(b'|') => {
tokenizer.tokenize_state.seen = true;
// If we start with a pipe, we open a cell marker.
tokenizer.enter(Name::GfmTableCellDivider);
tokenizer.consume();
tokenizer.exit(Name::GfmTableCellDivider);
State::Next(StateName::GfmTableHeadDelimiterCellBefore)
}
// More whitespace / empty row not allowed at start.
_ => State::Retry(StateName::GfmTableHeadDelimiterNok),
}
}
/// After `|`, before delimiter cell.
///
/// ```markdown
/// | | a |
/// > | | - |
/// ^
/// ```
pub fn head_delimiter_cell_before(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'\t' | b' ') => {
tokenizer.attempt(
State::Next(StateName::GfmTableHeadDelimiterValueBefore),
State::Nok,
);
State::Retry(space_or_tab(tokenizer))
}
_ => State::Retry(StateName::GfmTableHeadDelimiterValueBefore),
}
}
/// Before delimiter cell value.
///
/// ```markdown
/// | | a |
/// > | | - |
/// ^
/// ```
pub fn head_delimiter_value_before(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
None | Some(b'\n') => State::Retry(StateName::GfmTableHeadDelimiterCellAfter),
Some(b':') => {
// Align: left.
tokenizer.tokenize_state.size_b += 1;
tokenizer.tokenize_state.seen = true;
tokenizer.enter(Name::GfmTableDelimiterMarker);
tokenizer.consume();
tokenizer.exit(Name::GfmTableDelimiterMarker);
State::Next(StateName::GfmTableHeadDelimiterLeftAlignmentAfter)
}
Some(b'-') => {
// Align: none.
tokenizer.tokenize_state.size_b += 1;
State::Retry(StateName::GfmTableHeadDelimiterLeftAlignmentAfter)
}
_ => State::Retry(StateName::GfmTableHeadDelimiterNok),
}
}
/// After delimiter cell left alignment marker.
///
/// ```markdown
/// | | a |
/// > | | :- |
/// ^
/// ```
pub fn head_delimiter_left_alignment_after(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'-') => {
tokenizer.enter(Name::GfmTableDelimiterFiller);
State::Retry(StateName::GfmTableHeadDelimiterFiller)
}
// Anything else is not ok after the left-align colon.
_ => State::Retry(StateName::GfmTableHeadDelimiterNok),
}
}
/// In delimiter cell filler.
///
/// ```markdown
/// | | a |
/// > | | - |
/// ^
/// ```
pub fn head_delimiter_filler(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'-') => {
tokenizer.consume();
State::Next(StateName::GfmTableHeadDelimiterFiller)
}
Some(b':') => {
// Align is `center` if it was `left`, `right` otherwise.
tokenizer.tokenize_state.seen = true;
tokenizer.exit(Name::GfmTableDelimiterFiller);
tokenizer.enter(Name::GfmTableDelimiterMarker);
tokenizer.consume();
tokenizer.exit(Name::GfmTableDelimiterMarker);
State::Next(StateName::GfmTableHeadDelimiterRightAlignmentAfter)
}
_ => {
tokenizer.exit(Name::GfmTableDelimiterFiller);
State::Retry(StateName::GfmTableHeadDelimiterRightAlignmentAfter)
}
}
}
/// After delimiter cell right alignment marker.
///
/// ```markdown
/// | | a |
/// > | | -: |
/// ^
/// ```
pub fn head_delimiter_right_alignment_after(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'\t' | b' ') => {
tokenizer.attempt(
State::Next(StateName::GfmTableHeadDelimiterCellAfter),
State::Nok,
);
State::Retry(space_or_tab(tokenizer))
}
_ => State::Retry(StateName::GfmTableHeadDelimiterCellAfter),
}
}
/// After delimiter cell.
///
/// ```markdown
/// | | a |
/// > | | -: |
/// ^
/// ```
pub fn head_delimiter_cell_after(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
None | Some(b'\n') => {
// Exit when:
// * there was no `:` or `|` at all (it’s a thematic break or setext
// underline instead)
// * the header cell count is not the delimiter cell count
if !tokenizer.tokenize_state.seen
|| tokenizer.tokenize_state.size != tokenizer.tokenize_state.size_b
{
State::Retry(StateName::GfmTableHeadDelimiterNok)
} else {
// Reset.
tokenizer.tokenize_state.seen = false;
tokenizer.tokenize_state.size = 0;
tokenizer.tokenize_state.size_b = 0;
tokenizer.exit(Name::GfmTableDelimiterRow);
tokenizer.exit(Name::GfmTableHead);
tokenizer.register_resolver(ResolveName::GfmTable);
State::Ok
}
}
Some(b'|') => State::Retry(StateName::GfmTableHeadDelimiterBefore),
_ => State::Retry(StateName::GfmTableHeadDelimiterNok),
}
}
/// In delimiter row, at a disallowed byte.
///
/// ```markdown
/// | | a |
/// > | | x |
/// ^
/// ```
pub fn head_delimiter_nok(tokenizer: &mut Tokenizer) -> State {
// Reset.
tokenizer.tokenize_state.seen = false;
tokenizer.tokenize_state.size = 0;
tokenizer.tokenize_state.size_b = 0;
State::Nok
}
/// Before table body row.
///
/// ```markdown
/// | | a |
/// | | - |
/// > | | b |
/// ^
/// ```
pub fn body_row_start(tokenizer: &mut Tokenizer) -> State {
if tokenizer.lazy {
State::Nok
} else {
tokenizer.enter(Name::GfmTableRow);
match tokenizer.current {
Some(b'\t' | b' ') => {
tokenizer.attempt(State::Next(StateName::GfmTableBodyRowBefore), State::Nok);
State::Retry(space_or_tab_min_max(
tokenizer,
0,
if tokenizer.parse_state.options.constructs.code_indented {
TAB_SIZE - 1
} else {
usize::MAX
},
))
}
_ => State::Retry(StateName::GfmTableBodyRowBefore),
}
}
}
/// Before table body row, after optional whitespace.
///
/// ```markdown
/// | | a |
/// | | - |
/// > | | b |
/// ^
/// ```
pub fn body_row_before(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'\t' | b' ') => State::Nok,
_ => State::Retry(StateName::GfmTableBodyRowBreak),
}
}
/// At break in table body row.
///
/// ```markdown
/// | | a |
/// | | - |
/// > | | b |
/// ^
/// ^
/// ^
/// ```
pub fn body_row_break(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
None | Some(b'\n') => {
tokenizer.exit(Name::GfmTableRow);
State::Ok
}
Some(b'\t' | b' ') => {
tokenizer.attempt(State::Next(StateName::GfmTableBodyRowBreak), State::Nok);
State::Retry(space_or_tab(tokenizer))
}
Some(b'|') => {
tokenizer.enter(Name::GfmTableCellDivider);
tokenizer.consume();
tokenizer.exit(Name::GfmTableCellDivider);
State::Next(StateName::GfmTableBodyRowBreak)
}
// Anything else is cell content.
_ => {
tokenizer.enter(Name::Data);
State::Retry(StateName::GfmTableBodyRowData)
}
}
}
/// In table body row data.
///
/// ```markdown
/// | | a |
/// | | - |
/// > | | b |
/// ^
/// ```
pub fn body_row_data(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
None | Some(b'\t' | b'\n' | b' ' | b'|') => {
tokenizer.exit(Name::Data);
State::Retry(StateName::GfmTableBodyRowBreak)
}
_ => {
let name = if tokenizer.current == Some(b'\\') {
StateName::GfmTableBodyRowEscape
} else {
StateName::GfmTableBodyRowData
};
tokenizer.consume();
State::Next(name)
}
}
}
/// In table body row escape.
///
/// ```markdown
/// | | a |
/// | | ---- |
/// > | | b\-c |
/// ^
/// ```
pub fn body_row_escape(tokenizer: &mut Tokenizer) -> State {
match tokenizer.current {
Some(b'\\' | b'|') => {
tokenizer.consume();
State::Next(StateName::GfmTableBodyRowData)
}
_ => State::Retry(StateName::GfmTableBodyRowData),
}
}
/// Resolve GFM table.
pub fn resolve(tokenizer: &mut Tokenizer) -> Result