| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An undocumented part of CommonMark is how to deal with things in definition
labels or definition titles (which both can span multiple lines).
Can flow (or containers?) interrupt them?
They can according to the `cmark` reference parser, so this was implemented here.
This adds a new `Content` content type, which houses zero or more definitions,
and then zero-or-one paragraphs.
Content can be followed by a setext heading underline, which either turns
into a setext heading when the content ends in a paragraph, or turns into
the start of the following paragraph when it is followed by content that
starts with a paragraph, or turns into a stray paragraph.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
* Pass state names from an enum around instead of boxed functions
* Refactor to simplify attempts a lot
* Use a subtokenizer for the the `document` content type
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Remove custom kind wrappers, use plain bytes instead
* Remove `Into`s, use the explicit expected types instead
* Refactor to use `slice.as_str` in most places
* Remove unneeded unique check before adding a definition
* Use a shared CDATA prefix in constants
* Inline byte checks into matches
* Pass bytes back from parser instead of whole parse state
* Refactor to work more often on bytes
* Rename custom `size` to `len`
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, a custom char implementation was used.
This was easier to work with, as sometimes “virtual” characters are injected,
or characters are ignored.
This replaces that with working on actual `char`s.
In the hope of in the future working on `u8`s, even.
This simplifies the state machine somewhat, as only `\n` is fed, regardless of
whether it was a CRLF, CR, or LF.
It also feeds `' '` instead of virtual spaces.
The BOM, if present, is now available as a `ByteOrderMark` event.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The approach that `micromark-js` takes is as follows: to parse a
paragraph, check whether each line starts with something else.
If it does, exit, otherwise continue.
That is slow, because our actual flow parser does similar things: the work was
being done twice.
To fix this, this commit introduces parsing each line of a paragraph separately.
And finally, when done with flow, combining adjacent paragraphs.
This same mechanism is reused for setext headings.
Additionally, this commit adds support for interrupting things (or not).
E.g., HTML (flow, complete) cannot interrupt paragraphs.
Definitions cannot interrupt paragraphs, and connect be interrupted either,
but they can follow each other.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
* Fix bug where whitespace after `:` was not allowed, it is
* Fix bug where escapes in labels did not work due to typo
* Fix to prefer first definition
* Fix whitespace after definitions
* Fix matching by adding normalizing
* Fix reference from being output as data
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is still some messy code that needs cleaning up, but it adds support for
links and images, of the resource kind (`[a](b)`).
References (`[a][b]`) are parsed and will soon be supported, but need matching.
* Fix bug to pad percent-encoded bytes when normalizing urls
* Fix bug with escapes counting as balancing in destination
* Add `space_or_tab_one_line_ending`, to parse whitespace including up to
one line ending (but not a blank line)
* Add `ParserState` to share codes, definitions, etc
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
* Improve passing stuff around
* Add traits to enums for markers and such
* Fix “life time” stuff I didn’t understand
|
|
|
|
|
|
| |
* add several helpers for parsing betwen x and y `space_or_tab`s
* use those helpers in a bunch of places
* move initial indent parsing to flow constructs themselves
|