From 110935f485f16693f7ae8c97d40c3efaa5638383 Mon Sep 17 00:00:00 2001
From: Titus Wormer
Date: Fri, 19 Aug 2022 14:48:04 +0200
Subject: Add some docs

---
 examples/lib.rs |   2 +-
 readme.md       | 417 ++++++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 333 insertions(+), 86 deletions(-)

diff --git a/examples/lib.rs b/examples/lib.rs
index 62d7ee4..94e04f5 100644
--- a/examples/lib.rs
+++ b/examples/lib.rs
@@ -7,7 +7,7 @@ fn main() {
     env_logger::init();
 
     // Safely turn (untrusted?) markdown into HTML.
-    println!("{:?}", micromark("# Hello, world!"));
+    println!("{:?}", micromark("## Hello, *world*!"));
 
     // Turn trusted markdown into HTML.
     println!(
diff --git a/readme.md b/readme.md
index 78c5447..286999f 100644
--- a/readme.md
+++ b/readme.md
@@ -1,114 +1,361 @@
 # micromark-rs
-
+
-A [CommonMark][] compliant, `no_std` + `alloc`, markdown parser, with extensions,
-in Rust.
+
-Crate docs currently at
-[`wooorm.com/micromark-rs/micromark/`][docs].
+
+
+
-## To do
+
+
+
+
-### Docs
+[![Sponsors][sponsors-badge]][opencollective]
+[![Backers][backers-badge]][opencollective]
-- [ ] (1) Add overview docs on how everything works
+A [`CommonMark`][commonmark-spec] compliant markdown parser in [Rust][] with
+positional info, concrete tokens, and extensions.
-### Refactor
+## Feature highlights
-- [ ] (1) Improve `interrupt`, `concrete`, `lazy` fields somehow?
-- [ ] (?) Remove last box: the one around the child tokenizer?
-- [ ] (1) Add helper to get byte at, get char before/after, etc.
-- [ ] (?) Use smaller things that usizes?
+- [x] **[compliant][commonmark]** (100% to CommonMark)
+- [x] **[extensions][]** (GFM, directives, frontmatter, math)
+- [x] **[safe][security]** (100% safe rust, also 100% safe HTML by default)
+- [x] **[robust][test]** (1800+ tests, 100% coverage)
-### Test
+It’s also `#![no_std]` + `alloc`, has tons of docs, and has a single dependency
+(for optional debug logging).
+
+> 🐣 **Note**: extensions and coverage are currently in progress.
+
+## When to use this
+
+- If you _just_ want to turn markdown into HTML (with maybe a few extensions)
+- If you want to do _really complex things_ with markdown
+
+See [§ Comparison][comparison] for more info.
+
+## Intro
+
+micromark is a markdown parser in Rust.
+It uses a state machine to parse the entirety of markdown into concrete
+tokens.
+Its API compiles to HTML, but its parts are made to be used separately, so as to
+generate syntax trees or compile to other output formats.
+`micromark-rs` has a sibling in JavaScript, [`micromark-js`][micromark-js].
+
+
+
+
+
+- to learn markdown, see this [cheatsheet and tutorial][cheat]
+- to help, see [contribute][] or [sponsor][] below
+
+## Contents
-- [ ] (1) Make sure positional info is perfect
-- [ ] (3) Share tests with `micromark-js`
-- [ ] (3) Add tests for a zillion attention markers, tons of lists, tons of labels, etc?
-
-### Misc
-
-- [ ] (?) Improve document performance (potential 50%)
-- [ ] (?) Improve paragraph performance (potential 15%)
-- [ ] (?) Improve label (link, image) performance (potential 7%)
-- [ ] (3) Read through rust docs to figure out what useful functions there are,
-  and fix stuff I’m doing manually now
-- [ ] (5) Do some research on rust best practices for APIs, e.g., what to accept,
-  how to integrate with streams or so?
-- [ ] (3) Write comparison to other parsers
-- [ ] (3) Add node/etc bindings?
-- [ ] (3) Bunch of docs
-- [ ] (5) Site
-
-### Extensions
-
-The extensions below are listed from top to bottom from more important to less
-important.
-
-- [x] (1) frontmatter (yaml, toml) (flow)
-  — [`micromark-extension-frontmatter`](https://github.com/micromark/micromark-extension-frontmatter)
-- [x] (3) autolink literal (GFM) (text)
-  — [`micromark-extension-gfm-autolink-literal`](https://github.com/micromark/micromark-extension-gfm-autolink-literal)
-- [ ] (3) footnote (GFM) (flow, text)
-  — [`micromark-extension-gfm-footnote`](https://github.com/micromark/micromark-extension-gfm-footnote)
-- [ ] (3) strikethrough (GFM) (text)
-  — [`micromark-extension-gfm-strikethrough`](https://github.com/micromark/micromark-extension-gfm-strikethrough)
-- [ ] (5) table (GFM) (flow)
-  — [`micromark-extension-gfm-table`](https://github.com/micromark/micromark-extension-gfm-table)
-- [ ] (1) task list item (GFM) (text)
-  — [`micromark-extension-gfm-task-list-item`](https://github.com/micromark/micromark-extension-gfm-task-list-item)
-- [ ] (3) math (flow, text)
-  — [`micromark-extension-math`](https://github.com/micromark/micromark-extension-math)
-- [ ] (8) directive (flow, text)
-  — [`micromark-extension-directive`](https://github.com/micromark/micromark-extension-directive)
-- [ ] (8) expression (MDX) (flow, text)
-  — [`micromark-extension-mdx-expression`](https://github.com/micromark/micromark-extension-mdx-expression)
-- [ ] (5) JSX (MDX) (flow, text)
-  — [`micromark-extension-mdx-jsx`](https://github.com/micromark/micromark-extension-mdx-jsx)
-- [ ] (3) ESM (MDX) (flow)
-  — [`micromark-extension-mdxjs-esm`](https://github.com/micromark/micromark-extension-mdxjs-esm)
-- [ ] (1) tagfilter (GFM) (n/a, renderer)
-  — [`micromark-extension-gfm-tagfilter`](https://github.com/micromark/micromark-extension-gfm-tagfilter)
-
-#### After
-
-- [ ] (8) After all extensions, including MDX, are done, see if we can integrate
-  this with SWC to compile MDX
-
-## Scripts
-
-Run examples:
+
+
+> 🚧 **To do**.
+
+## Install
+
+With [Rust][] (rust edition 2018+, ±version 1.56+), install with `cargo`:
 ```sh
-RUST_BACKTRACE=1 RUST_LOG=debug cargo run --example lib
+cargo install micromark
 ```
-Format:
+## Use
-```sh
-cargo fmt --all
+```rs
+extern crate micromark;
+use micromark::micromark;
+
+fn main() {
+    println!("{}", micromark("## Hello, *world*!"));
+}
 ```
-Lint:
+Yields:
-```sh
-cargo fmt --all -- --check && cargo clippy -- -D clippy::pedantic -D clippy::cargo -A clippy::doc_link_with_quotes
+```html
+<h2>Hello, <em>world</em>!</h2>
 ```
-Tests:
+Extensions (in this case GFM):
-```sh
-RUST_BACKTRACE=1 cargo test
+```rs
+extern crate micromark;
+use micromark::{micromark_with_options, Constructs, Options};
+
+fn main() {
+    println!(
+        "{}",
+        micromark_with_options(
+            "* [x] contact@example.com ~~strikethrough~~",
+            &Options {
+                constructs: Constructs::gfm(),
+                ..Options::default()
+            }
+        )
+    );
+}
 ```
-Docs:
+Yields:
-```sh
-cargo doc --document-private-items
+```html
+
+```
+
+## API
+
+`micromark` exposes
+[`micromark`](https://wooorm.com/micromark-rs/micromark/fn.micromark.html),
+[`micromark_with_options`](https://wooorm.com/micromark-rs/micromark/fn.micromark_with_options.html), and
+[`Options`](https://wooorm.com/micromark-rs/micromark/struct.Options.html).
+See [crate docs][docs] for more info.
+
+## Extensions
+
+micromark supports extensions.
+These extensions are maintained in this project.
+They are not enabled by default but can be turned on with `options.constructs`.
+
+> 🐣 **Note**: extensions are currently in progress.
+
+- [ ] directive
+- [x] frontmatter
+- [ ] gfm
+  - [x] autolink literal
+  - [ ] footnote
+  - [ ] strikethrough
+  - [ ] table
+  - [ ] tagfilter
+  - [ ] task list item
+- [ ] math
+
+It is not a goal of this project to support lots of different extensions.
+It’s instead a goal to support incredibly common, somewhat standardized,
+extensions.
+
+## Architecture
+
+micromark is maintained as a single monolithic package.
+
+### Overview
+
+The process to parse markdown looks like this:
+
+```txt
+                    micromark
++------------------------------------------------+
+|            +-------+         +---------+       |
+| -markdown->+ parse +-events->+ compile +-html- |
+|            +-------+         +---------+       |
++------------------------------------------------+
+```
+
+### File structure
+
+The files in `src/` are as follows:
+
+- `construct/*.rs`
+  — CommonMark, GFM, and other extension constructs used in micromark
+- `util/*.rs`
+  — helpers often needed when parsing markdown
+- `compiler.rs`
+  — turns events into a string of HTML
+- `constants.rs`
+  — magic numbers and sets used in markdown
+- `event.rs`
+  — things with meaning happening somewhere
+- `lib.rs`
+  — core module
+- `parser.rs`
+  — turn a string of markdown into events
+- `resolve.rs`
+  — steps to process events
+- `state.rs`
+  — steps of the state machine
+- `subtokenize.rs`
+  — handle content in other content
+- `tokenizer.rs`
+  — glue the states of the state machine together
+- `unicode.rs`
+  — info on unicode
+
+## Examples
+
+
+
+> 🚧 **To do**.
+
+## Markdown
+
+### CommonMark
+
+The first definition of “Markdown” gave several examples of how it worked,
+showing input Markdown and output HTML, and came with a reference implementation
+(`Markdown.pl`).
+When new implementations followed, they mostly followed the first definition,
+but deviated from the first implementation, and added extensions, thus making
+the format a family of formats.
+
+Some years later, an attempt was made to standardize the differences between
+implementations, by specifying how several edge cases should be handled, through
+more input and output examples.
+This is known as [CommonMark][commonmark-spec], and many implementations now
+work towards some degree of CommonMark compliancy.
+Still, CommonMark describes what the output in HTML should be given some
+input, which leaves many edge cases up for debate, and does not answer what
+should happen for other output formats.
+
+micromark passes all tests from CommonMark and has many more tests to match the
+CommonMark reference parsers.
+
+### Grammar
+
+The syntax of markdown can be described in Backus–Naur form (BNF) as:
+
+```bnf
+markdown = .*
 ```
-(add `--open` to open them in a browser)
+No, that’s [not a typo](http://trevorjim.com/a-specification-for-markdown/):
+markdown has no syntax errors; anything thrown at it renders _something_.
+
+For more practical examples of how things roughly work in BNF, see the module docs of each `src/construct`.
+
+## Project
+
+### Comparison
+
+> 🚧 **To do**.
+
+
+
+### Test
+
+micromark is tested with the \~650 CommonMark tests and more than 1.2k extra
+tests confirmed with CM reference parsers.
+Then there are even more tests for GFM and other extensions.
+These tests reach all branches in the code, which means that this project has
+100% code coverage.
+
+The following scripts are useful when working on this project:
+
+- run examples:
+  ```sh
+  RUST_BACKTRACE=1 RUST_LOG=debug cargo run --example lib
+  ```
+- format:
+  ```sh
+  cargo fmt --all
+  ```
+- lint:
+  ```sh
+  cargo fmt --all --check && cargo clippy -- -D clippy::pedantic -D clippy::cargo -A clippy::doc_link_with_quotes
+  ```
+- test:
+  ```sh
+  RUST_BACKTRACE=1 cargo test
+  ```
+- docs:
+  ```sh
+  cargo doc --document-private-items
+  ```
+
+### Version
+
+micromark adheres to [semver](https://semver.org).
+
+### Security
+
+The typical security aspect discussed for markdown is [cross-site scripting
+(XSS)][xss] attacks.
+Markdown itself is safe if it does not include embedded HTML or dangerous
+protocols in links/images (such as `javascript:` or `data:`).
+micromark makes any markdown safe by default, even if HTML is embedded or
+dangerous protocols are used, as it encodes or drops them.
+Turning on the `allow_dangerous_html` or `allow_dangerous_protocol` options for
+user-provided markdown opens you up to XSS attacks.
+
+Another security aspect is DDoS attacks.
+For example, an attacker could throw a 100mb file at micromark, in which case
+it’s going to take a long while to finish.
+It is also possible to crash micromark with smaller payloads, notably when
+thousands of links, images, emphasis, or strong are opened but not closed.
+It is wise to cap the accepted size of input (500kb can hold a big book) and to
+process content in a different thread so that it can be stopped when needed.
+
+For more information on markdown sanitation, see
+[`improper-markup-sanitization.md`][improper] by [**@chalker**][chalker].
+
+### Contribute
+
+> 🚧 **To do**.
+
+
+
+
+
+
+### Sponsor
+
+> 🚧 **To do**.
+
+
+
+Support this effort and give back by sponsoring:
+
+- [GitHub Sponsors](https://github.com/sponsors/wooorm)
+  (personal; monthly or one-time)
+- [OpenCollective](https://opencollective.com/unified) or
+  [GitHub Sponsors](https://github.com/sponsors/unifiedjs)
+  (unified; monthly or one-time)
+
+
+
+### License
+
+[MIT][license] © [Titus Wormer][author]
+
+
+
+
+
+
-[commonmark]: https://spec.commonmark.org
+[sponsors-badge]: https://opencollective.com/unified/sponsors/badge.svg
+[backers-badge]: https://opencollective.com/unified/backers/badge.svg
+[opencollective]: https://opencollective.com/unified
 [docs]: https://wooorm.com/micromark-rs/micromark/
+[commonmark-spec]: https://spec.commonmark.org
+[cheat]: https://commonmark.org/help/
+[gfm-spec]: https://github.github.com/gfm/
+[rust]: https://www.rust-lang.org
+[cmsm]: https://github.com/micromark/common-markup-state-machine
+[micromark-js]: https://github.com/micromark/micromark
+[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting
+[improper]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md
+[chalker]: https://github.com/ChALkeR
+[license]: https://github.com/micromark/micromark/blob/main/license
+[author]: https://wooorm.com
+[contribute]: #contribute
+[sponsor]: #sponsor
+[commonmark]: #commonmark
+[extensions]: #extensions
+[security]: #security
+[test]: #test
+[comparison]: #comparison
-- 
cgit
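The readme's Architecture overview describes a pipeline of markdown → parse → events → compile → HTML. A toy sketch can make that split concrete. Everything in it is hypothetical and much simpler than micromark: `Kind`, `Event`, `parse`, `compile`, `escape`, and `micromark_toy`. It uses no state machine and handles only one-line ATX headings and single-line paragraphs. Its `Event` type is not the crate's. The sketch only shows two ideas: events point back into the source by byte offsets instead of copying text, and compiling is a separate pass over those events.

```rust
// Toy event model (hypothetical): constructs are entered and exited, and text
// is kept as a byte range into the original markdown instead of being copied.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Heading(usize), // rank, 1 to 6
    Paragraph,
}

#[derive(Debug, PartialEq)]
enum Event {
    Enter(Kind),
    Text(usize, usize), // start and end byte offsets into the source
    Exit(Kind),
}

// parse: markdown -> events. Handles one-line ATX headings; every other
// non-blank line becomes its own paragraph. No state machine, no inlines.
fn parse(markdown: &str) -> Vec<Event> {
    let mut events = Vec::new();
    let mut offset = 0;
    for line in markdown.split('\n') {
        let start = offset;
        let end = start + line.len();
        offset = end + 1; // skip the '\n' separator
        let hashes = line.bytes().take_while(|&byte| byte == b'#').count();
        let is_heading = (1..=6).contains(&hashes)
            && (line.len() == hashes || line.as_bytes()[hashes] == b' ');
        let kind = if is_heading {
            Kind::Heading(hashes)
        } else if line.trim().is_empty() {
            continue; // blank lines produce no events
        } else {
            Kind::Paragraph
        };
        // For headings, skip the opening `#`s and the space after them, if any.
        let text_start = if is_heading { (start + hashes + 1).min(end) } else { start };
        events.push(Event::Enter(kind));
        events.push(Event::Text(text_start, end));
        events.push(Event::Exit(kind));
    }
    events
}

// Minimal HTML escaping so that raw `<`, `>`, and `&` in text stay inert.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

// compile: events -> HTML, reading text back out of the source by range.
fn compile(markdown: &str, events: &[Event]) -> String {
    let mut html = String::new();
    for event in events {
        match event {
            Event::Enter(Kind::Heading(rank)) => html.push_str(&format!("<h{}>", rank)),
            Event::Exit(Kind::Heading(rank)) => html.push_str(&format!("</h{}>\n", rank)),
            Event::Enter(Kind::Paragraph) => html.push_str("<p>"),
            Event::Exit(Kind::Paragraph) => html.push_str("</p>\n"),
            Event::Text(start, end) => html.push_str(&escape(markdown[*start..*end].trim())),
        }
    }
    html
}

fn micromark_toy(markdown: &str) -> String {
    compile(markdown, &parse(markdown))
}

fn main() {
    // Prints `<h2>Hello, world!</h2>` and then `<p>Some &lt;text&gt;.</p>`.
    print!("{}", micromark_toy("## Hello, world!\n\nSome <text>."));
}
```

Because `compile` only reads events and the source, another pass over the same events could produce a syntax tree instead of HTML. That is the reuse the readme's Intro mentions.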
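The readme's Security section advises two things: cap the accepted input size, and process content in a different thread. A minimal sketch of that advice, using only the standard library, might look like the following. `render_guarded`, `RenderError`, and `MAX_INPUT_BYTES` are hypothetical names. The `render` argument stands in for a call such as `micromark::micromark`, so the sketch runs without the crate.

One caveat: Rust's standard library cannot kill a thread. On timeout the caller stops waiting, but the worker keeps running in the background. Stopping the work for real needs something like a separate process.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical cap, following the readme's "500kb can hold a big book".
const MAX_INPUT_BYTES: usize = 500 * 1024;

#[derive(Debug, PartialEq)]
enum RenderError {
    // Input larger than `MAX_INPUT_BYTES`; rejected before any parsing.
    TooLarge(usize),
    // No result arrived before the deadline.
    TimedOut,
    // The worker thread ended without sending a result (for example, it panicked).
    WorkerFailed,
}

// Hypothetical helper: `render` stands in for a call such as `micromark::micromark`.
fn render_guarded<F>(input: String, timeout: Duration, render: F) -> Result<String, RenderError>
where
    F: FnOnce(&str) -> String + Send + 'static,
{
    if input.len() > MAX_INPUT_BYTES {
        return Err(RenderError::TooLarge(input.len()));
    }

    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // `send` fails if the caller already stopped waiting; that is fine to ignore.
        let _ = sender.send(render(&input));
    });

    match receiver.recv_timeout(timeout) {
        Ok(html) => Ok(html),
        // The worker keeps running in the background: std cannot kill a thread.
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RenderError::TimedOut),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(RenderError::WorkerFailed),
    }
}

fn main() {
    // Toy renderer so this runs without the `micromark` crate.
    let toy = |markdown: &str| format!("<p>{}</p>", markdown.trim());

    println!("{:?}", render_guarded("hi".to_string(), Duration::from_secs(1), toy)); // Ok("<p>hi</p>")
    let big = "a".repeat(MAX_INPUT_BYTES + 1);
    println!("{:?}", render_guarded(big, Duration::from_secs(1), toy)); // Err(TooLarge(512001))
}
```

The size check runs before a thread is spawned, so oversized input costs almost nothing to reject. The timeout then bounds how long a caller waits on the pathological but small payloads the readme mentions, such as thousands of unclosed links.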