Diffstat (limited to '')
-rw-r--r-- | readme.md | 417 |
1 files changed, 332 insertions, 85 deletions
@@ -1,114 +1,361 @@ # micromark-rs -<img align="right" width="106" height="106" alt="" src="https://raw.githubusercontent.com/wooorm/micromark-rs/14f1ad0/logo.svg?sanitize=true"> +<!-- To do: enable image when repo is public. --> -A [CommonMark][] compliant, `no_std` + `alloc`, markdown parser, with extensions, -in Rust. +<!-- <img align="right" width="106" height="106" alt="" src="https://raw.githubusercontent.com/wooorm/micromark-rs/14f1ad0/logo.svg?sanitize=true"> --> -Crate docs currently at -[`wooorm.com/micromark-rs/micromark/`][docs]. +<!-- To do: enable badges when repo is public/published --> +<!-- To do: link `Downloads`/`crate-badge` to `crate` instead of temporary site. --> +<!-- To do: enable discussions on this repo? --> -## To do +<!-- [![Build][build-badge]][build] --> +<!-- [![Downloads][crate-badge]][docs] --> +<!-- [![Coverage][coverage-badge]][coverage] --> +<!-- [![Chat][chat-badge]][chat] --> -### Docs +[![Sponsors][sponsors-badge]][opencollective] +[![Backers][backers-badge]][opencollective] -- [ ] (1) Add overview docs on how everything works +A [`CommonMark`][commonmark-spec] compliant markdown parser in [Rust][] with +positional info, concrete tokens, and extensions. -### Refactor +## Feature highlights -- [ ] (1) Improve `interrupt`, `concrete`, `lazy` fields somehow? -- [ ] (?) Remove last box: the one around the child tokenizer? -- [ ] (1) Add helper to get byte at, get char before/after, etc. -- [ ] (?) Use smaller things that usizes? +- [x] **[compliant][commonmark]** (100% to CommonMark) +- [x] **[extensions][]** (GFM, directives, frontmatter, math) +- [x] **[safe][security]** (100% safe rust, also 100% safe HTML by default) +- [x] **[robust][test]** (1800+ tests, 100% coverage) -### Test +It’s also `#![no_std]` + `alloc`, has tons of docs, and has a single dependency +(for optional debug logging). + +> 🐣 **Note**: extensions and coverage are currently in progress.
+ +## When to use this + +- If you _just_ want to turn markdown into HTML (with maybe a few extensions) +- If you want to do _really complex things_ with markdown + +See [§ Comparison][comparison] for more info. + +## Intro + +micromark is a markdown parser in Rust. +It uses a state machine to parse the entirety of markdown into concrete +tokens. +Its API compiles to HTML, but its parts are made to be used separately, so as to +generate syntax trees or compile to other output formats. +`micromark-rs` has a sibling in JavaScript, [`micromark-js`][micromark-js]. + +<!-- To do: link to unified etc if this repo gets moved there? --> +<!-- To do: link to discussions --> +<!-- * for questions, see [Discussions][chat] --> + +- to learn markdown, see this [cheatsheet and tutorial][cheat] +- to help, see [contribute][] or [sponsor][] below + +## Contents -- [ ] (1) Make sure positional info is perfect -- [ ] (3) Share tests with `micromark-js` -- [ ] (3) Add tests for a zillion attention markers, tons of lists, tons of labels, etc? - -### Misc - -- [ ] (?) Improve document performance (potential 50%) -- [ ] (?) Improve paragraph performance (potential 15%) -- [ ] (?) Improve label (link, image) performance (potential 7%) -- [ ] (3) Read through rust docs to figure out what useful functions there are, - and fix stuff I’m doing manually now -- [ ] (5) Do some research on rust best practices for APIs, e.g., what to accept, - how to integrate with streams or so? -- [ ] (3) Write comparison to other parsers -- [ ] (3) Add node/etc bindings? -- [ ] (3) Bunch of docs -- [ ] (5) Site - -### Extensions - -The extensions below are listed from top to bottom from more important to less -important.
- -- [x] (1) frontmatter (yaml, toml) (flow) - — [`micromark-extension-frontmatter`](https://github.com/micromark/micromark-extension-frontmatter) -- [x] (3) autolink literal (GFM) (text) - — [`micromark-extension-gfm-autolink-literal`](https://github.com/micromark/micromark-extension-gfm-autolink-literal) -- [ ] (3) footnote (GFM) (flow, text) - — [`micromark-extension-gfm-footnote`](https://github.com/micromark/micromark-extension-gfm-footnote) -- [ ] (3) strikethrough (GFM) (text) - — [`micromark-extension-gfm-strikethrough`](https://github.com/micromark/micromark-extension-gfm-strikethrough) -- [ ] (5) table (GFM) (flow) - — [`micromark-extension-gfm-table`](https://github.com/micromark/micromark-extension-gfm-table) -- [ ] (1) task list item (GFM) (text) - — [`micromark-extension-gfm-task-list-item`](https://github.com/micromark/micromark-extension-gfm-task-list-item) -- [ ] (3) math (flow, text) - — [`micromark-extension-math`](https://github.com/micromark/micromark-extension-math) -- [ ] (8) directive (flow, text) - — [`micromark-extension-directive`](https://github.com/micromark/micromark-extension-directive) -- [ ] (8) expression (MDX) (flow, text) - — [`micromark-extension-mdx-expression`](https://github.com/micromark/micromark-extension-mdx-expression) -- [ ] (5) JSX (MDX) (flow, text) - — [`micromark-extension-mdx-jsx`](https://github.com/micromark/micromark-extension-mdx-jsx) -- [ ] (3) ESM (MDX) (flow) - — [`micromark-extension-mdxjs-esm`](https://github.com/micromark/micromark-extension-mdxjs-esm) -- [ ] (1) tagfilter (GFM) (n/a, renderer) - — [`micromark-extension-gfm-tagfilter`](https://github.com/micromark/micromark-extension-gfm-tagfilter) - -#### After - -- [ ] (8) After all extensions, including MDX, are done, see if we can integrate - this with SWC to compile MDX - -## Scripts - -Run examples: +<!-- To do: generate this? Use remark? --> + +> 🚧 **To do**. 
+ +## Install + +With [Rust][] (rust edition 2018+, ±version 1.56+), install with `cargo`: ```sh -RUST_BACKTRACE=1 RUST_LOG=debug cargo run --example lib +cargo add micromark ``` -Format: +## Use -```sh -cargo fmt --all +```rs +extern crate micromark; +use micromark::micromark; + +fn main() { + println!("{}", micromark("## Hello, *world*!")); +} ``` -Lint: +Yields: -```sh -cargo fmt --all -- --check && cargo clippy -- -D clippy::pedantic -D clippy::cargo -A clippy::doc_link_with_quotes +```html +<h2>Hello, <em>world</em>!</h2> ``` -Tests: +Extensions (in this case GFM): -```sh -RUST_BACKTRACE=1 cargo test +```rs +extern crate micromark; +use micromark::{micromark_with_options, Constructs, Options}; + +fn main() { + println!( + "{}", + micromark_with_options( + "* [x] contact@example.com ~~strikethrough~~", + &Options { + constructs: Constructs::gfm(), + ..Options::default() + } + ) + ); +} ``` -Docs: +Yields: -```sh -cargo doc --document-private-items +```html +<ul> + <li> + <input checked="" disabled="" type="checkbox" /> + <a href="mailto:contact@example.com">contact@example.com</a> + <del>strikethrough</del> + </li> +</ul> +``` + +## API + +`micromark` exposes +[`micromark`](https://wooorm.com/micromark-rs/micromark/fn.micromark.html), +[`micromark_with_options`](https://wooorm.com/micromark-rs/micromark/fn.micromark_with_options.html), and +[`Options`](https://wooorm.com/micromark-rs/micromark/struct.Options.html). +See [crate docs][docs] for more info. + +## Extensions + +micromark supports extensions. +These extensions are maintained in this project. +They are not enabled by default but can be turned on with `options.constructs`. + +> 🐣 **Note**: extensions are currently in progress. + +- [ ] directive +- [x] frontmatter +- [ ] gfm + - [x] autolink literal + - [ ] footnote + - [ ] strikethrough + - [ ] table + - [ ] tagfilter + - [ ] task list item +- [ ] math + +It is not a goal of this project to support lots of different extensions.
+It’s instead a goal to support incredibly common, somewhat standardized, +extensions. + +## Architecture + +micromark is maintained as a single monolithic package. + +### Overview + +The process to parse markdown looks like this: + +```txt + micromark ++------------------------------------------------+ +| +-------+ +---------+ | +| -markdown->+ parse +-events->+ compile +-html- | +| +-------+ +---------+ | ++------------------------------------------------+ +``` + +### File structure + +The files in `src/` are as follows: + +- `construct/*.rs` + — CommonMark, GFM, and other extension constructs used in micromark +- `util/*.rs` + — helpers often needed when parsing markdown +- `compiler.rs` + — turns events into a string of HTML +- `constants.rs` + — magic numbers and sets used in markdown +- `event.rs` + — things with meaning happening somewhere +- `lib.rs` + — core module +- `parser.rs` + — turn a string of markdown into events +- `resolve.rs` + — steps to process events +- `state.rs` + — steps of the state machine +- `subtokenize.rs` + — handle content in other content +- `tokenizer.rs` + — glue the states of the state machine together +- `unicode.rs` + — info on unicode + +## Examples + +<!-- To do: example section with more full-fledged examples, on GFM, math, frontmatter, etc. --> + +> 🚧 **To do**. + +## Markdown + +### CommonMark + +The first definition of “Markdown” gave several examples of how it worked, +showing input Markdown and output HTML, and came with a reference implementation +(`Markdown.pl`). +When new implementations followed, they mostly followed the first definition, +but deviated from the first implementation, and added extensions, thus making +the format a family of formats. + +Some years later, an attempt was made to standardize the differences between +implementations, by specifying how several edge cases should be handled, through +more input and output examples.
+This is known as [CommonMark][commonmark-spec], and many implementations now +work towards some degree of CommonMark compliance. +Still, CommonMark describes what the output in HTML should be given some +input, which leaves many edge cases up for debate, and does not answer what +should happen for other output formats. + +micromark passes all tests from CommonMark and has many more tests to match the +CommonMark reference parsers. + +### Grammar + +The syntax of markdown can be described in Backus–Naur form (BNF) as: + +```bnf +markdown = .* ``` -(add `--open` to open them in a browser) +No, that’s [not a typo](http://trevorjim.com/a-specification-for-markdown/): +markdown has no syntax errors; anything thrown at it renders _something_. + +For more practical examples of how things roughly work in BNF, see the module docs of each `src/construct`. + +## Project + +### Comparison + +> 🚧 **To do**. + +<!-- To do. --> + +### Test + +micromark is tested with the \~650 CommonMark tests and more than 1.2k extra +tests confirmed with CM reference parsers. +Then there are even more tests for GFM and other extensions. +These tests reach all branches in the code, which means that this project has +100% code coverage. + +The following scripts are useful when working on this project: + +- run examples: + ```sh + RUST_BACKTRACE=1 RUST_LOG=debug cargo run --example lib + ``` +- format: + ```sh + cargo fmt --all + ``` +- lint: + ```sh + cargo fmt --all --check && cargo clippy -- -D clippy::pedantic -D clippy::cargo -A clippy::doc_link_with_quotes + ``` +- test: + ```sh + RUST_BACKTRACE=1 cargo test + ``` +- docs: + ```sh + cargo doc --document-private-items + ``` + +### Version + +micromark adheres to [semver](https://semver.org). + +### Security + +The typical security aspect discussed for markdown is [cross-site scripting +(XSS)][xss] attacks. +Markdown itself is safe if it does not include embedded HTML or dangerous +protocols in links/images (such as `javascript:` or `data:`).
+micromark makes any markdown safe by default, even if HTML is embedded or +dangerous protocols are used, as it encodes or drops them. +Turning on the `allow_dangerous_html` or `allow_dangerous_protocol` options for +user-provided markdown opens you up to XSS attacks. + +Another security aspect is DDoS attacks. +For example, an attacker could throw a 100mb file at micromark, in which case +it’s going to take a long while to finish. +It is also possible to crash micromark with smaller payloads, notably when +thousands of links, images, emphasis, or strong are opened but not closed. +It is wise to cap the accepted size of input (500kb can hold a big book) and to +process content in a different thread so that it can be stopped when needed. + +For more information on markdown sanitation, see +[`improper-markup-sanitization.md`][improper] by [**@chalker**][chalker]. + +### Contribute + +> 🚧 **To do**. + +<!-- To do: contrib. --> +<!-- See [`contributing.md`][contributing] for ways to get started. +See [`support.md`][support] for ways to get help. --> + +<!-- To do: CoC. --> + +### Sponsor + +> 🚧 **To do**. + +<!-- To do: mention Vercel. 
--> + +Support this effort and give back by sponsoring: + +- [GitHub Sponsors](https://github.com/sponsors/wooorm) + (personal; monthly or one-time) +- [OpenCollective](https://opencollective.com/unified) or + [GitHub Sponsors](https://github.com/sponsors/unifiedjs) + (unified; monthly or one-time) + +<!-- To do: origin story --> + +### License + +[MIT][license] © [Titus Wormer][author] + +<!-- To do: public/publish --> +<!-- [build-badge]: https://github.com/wooorm/micromark-rs/workflows/main/badge.svg --> +<!-- [build]: https://github.com/wooorm/micromark-rs/actions --> +<!-- [crate-badge]: https://img.shields.io/crates/d/micromark.svg --> +<!-- [crate]: https://crates.io/crates/micromark --> -[commonmark]: https://spec.commonmark.org +[sponsors-badge]: https://opencollective.com/unified/sponsors/badge.svg +[backers-badge]: https://opencollective.com/unified/backers/badge.svg +[opencollective]: https://opencollective.com/unified [docs]: https://wooorm.com/micromark-rs/micromark/ +[commonmark-spec]: https://spec.commonmark.org +[cheat]: https://commonmark.org/help/ +[gfm-spec]: https://github.github.com/gfm/ +[rust]: https://www.rust-lang.org +[cmsm]: https://github.com/micromark/common-markup-state-machine +[micromark-js]: https://github.com/micromark/micromark +[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting +[improper]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md +[chalker]: https://github.com/ChALkeR +[license]: https://github.com/micromark/micromark/blob/main/license +[author]: https://wooorm.com +[contribute]: #contribute +[sponsor]: #sponsor +[commonmark]: #commonmark +[extensions]: #extensions +[security]: #security +[test]: #test +[comparison]: #comparison |
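
The Security section added in this diff advises capping the accepted input size and processing content on a different thread. The sketch below shows one way that advice could look, using only the standard library. `render_guarded`, `RenderError`, and `MAX_INPUT_BYTES` are hypothetical names and not part of micromark. The renderer is passed in as a closure so the sketch stays self-contained; in real use it would presumably be something like `|s| micromark::micromark(s)`. A Rust thread cannot be forcibly killed, so on timeout this sketch only stops waiting and the worker keeps running in the background.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Upper bound on accepted input in bytes (the 500kb figure from the text).
const MAX_INPUT_BYTES: usize = 500 * 1024;

/// Reasons the guarded render can refuse or give up (hypothetical type).
#[derive(Debug, PartialEq)]
enum RenderError {
    /// Input exceeded `MAX_INPUT_BYTES`.
    TooLarge,
    /// The worker did not answer within the timeout.
    TimedOut,
    /// The worker ended without sending a result (for example, it panicked).
    Failed,
}

/// Hypothetical helper: reject oversized input, then run `render` on a
/// worker thread and wait at most `timeout` for its result.
fn render_guarded<F>(input: &str, timeout: Duration, render: F) -> Result<String, RenderError>
where
    F: FnOnce(&str) -> String + Send + 'static,
{
    if input.len() > MAX_INPUT_BYTES {
        return Err(RenderError::TooLarge);
    }

    // The worker needs its own copy because it may outlive this call.
    let owned = input.to_owned();
    let (sender, receiver) = mpsc::channel();

    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the send error then.
        let _ = sender.send(render(&owned));
    });

    match receiver.recv_timeout(timeout) {
        Ok(html) => Ok(html),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RenderError::TimedOut),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(RenderError::Failed),
    }
}

fn main() {
    // Stand-in renderer so the example runs without the micromark crate.
    let result = render_guarded("# hi", Duration::from_secs(1), |s| format!("<p>{}</p>", s));
    println!("{:?}", result);
}
```

If a runaway parse has to be truly stopped rather than abandoned, running it in a separate process that can be killed is the stronger option; that is outside this sketch.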