| author | 2022-06-22 16:41:04 +0200 | |
|---|---|---|
| committer | 2022-06-22 16:41:04 +0200 | |
| commit | b0accb11f1aade55e9fc4dc0a1c1d1b8362ab5d9 (patch) | |
| tree | 7137fb206a909feb5c6f841c1a9e3dafc590312c | |
| parent | cbd0155d7f1f20b3fe261723515203df5108a56d (diff) | |
| download | markdown-rs-b0accb11f1aade55e9fc4dc0a1c1d1b8362ab5d9.tar.gz markdown-rs-b0accb11f1aade55e9fc4dc0a1c1d1b8362ab5d9.tar.bz2 markdown-rs-b0accb11f1aade55e9fc4dc0a1c1d1b8362ab5d9.zip | |
Add docs on encoding to definition, destination
Diffstat (limited to '')
| -rw-r--r-- | readme.md | 2 |
| -rw-r--r-- | src/construct/autolink.rs | 4 |
| -rw-r--r-- | src/construct/definition.rs | 10 |
| -rw-r--r-- | src/construct/partial_destination.rs | 40 |
4 files changed, 48 insertions, 8 deletions
@@ -68,7 +68,6 @@ cargo doc --document-private-items

 #### Docs

-- [ ] (1) Add docs for sanitation (autolink, definition, resource)
 - [ ] (1) Add docs for how references and definitions match (definition, reference)
 - [ ] (1) Go through all bnf
 - [ ] (1) Go through all docs
@@ -233,6 +232,7 @@ cargo doc --document-private-items
 - [x] (1) Do not capture in `tokenizer.go`
 - [x] (1) Clean attempts
 - [x] (1) Add docs for tokenizer
+- [x] (1) Add docs for sanitation

 ### Extensions

diff --git a/src/construct/autolink.rs b/src/construct/autolink.rs
index 84c483d..33cb3f0 100644
--- a/src/construct/autolink.rs
+++ b/src/construct/autolink.rs
@@ -23,8 +23,8 @@
 //! The maximum allowed size of a domain is `63` (inclusive), which is defined
 //! in [`AUTOLINK_DOMAIN_SIZE_MAX`][autolink_domain_size_max].
 //!
-//! The grammar for autolinks is quite strict and requires ASCII to be used
-//! (without, for example, spaces).
+//! The grammar for autolinks is quite strict and prohibits the use of ASCII control
+//! characters or spaces.
 //! To use non-ascii characters and otherwise impossible characters, in URLs,
 //! you can use percent encoding:
 //!
diff --git a/src/construct/definition.rs b/src/construct/definition.rs
index b545643..57c62a5 100644
--- a/src/construct/definition.rs
+++ b/src/construct/definition.rs
@@ -30,7 +30,7 @@
 //! space_or_tab ::= ' ' | '\t'
 //! ```
 //!
-//! Definitions in markdown to not, on their own, relate to anything in HTML.
+//! Definitions in markdown do not, on their own, relate to anything in HTML.
 //! When connected with a link (reference), they together relate to the `<a>`
 //! element in HTML.
 //! The definition forms its `href`, and optionally `title`, attributes.
@@ -41,6 +41,12 @@
 //! That means that [character escapes][character_escape] and
 //! [character references][character_reference] are allowed.
 //!
+//! For info on how to encode characters in URLs, see
+//! [`partial_destination`][destination].
+//! For info on how to characters are encoded as `href` on `<a>` or `src` on
+//! `<img>` when compiling, see
+//! [`sanitize_uri`][sanitize_uri].
+//!
 //! ## Tokens
 //!
 //! *   [`Definition`][TokenType::Definition]
@@ -68,6 +74,8 @@
 //! [string]: crate::content::string
 //! [character_escape]: crate::construct::character_escape
 //! [character_reference]: crate::construct::character_reference
+//! [destination]: crate::construct::partial_destination
+//! [sanitize_uri]: crate::util::sanitize_uri
 //! [html]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element
 //!
 //! <!-- To do: link link (reference) -->
diff --git a/src/construct/partial_destination.rs b/src/construct/partial_destination.rs
index b2ceeb8..03dcbee 100644
--- a/src/construct/partial_destination.rs
+++ b/src/construct/partial_destination.rs
@@ -21,14 +21,45 @@
 //! before it.
 //! Escaped parens do not count in balancing.
 //!
-//! It is recommended to use the enclosed variant of destinations, as it allows
-//! arbitrary parens, and also allows for whitespace and other characters in
-//! URLs.
-//!
 //! The destination is interpreted as the [string][] content type.
 //! That means that [character escapes][character_escape] and
 //! [character references][character_reference] are allowed.
 //!
+//! The grammar for enclosed destinations (`<x>`) prohibits the use of `<`,
+//! `>`, and line endings to form URLs.
+//! The angle brackets can be encoded as a character reference, character
+//! escape, or percent encoding: for `<` as `&lt;`, `\<`, or `%3c` and for
+//! `>` as `&gt;`, `\>`, or `%3e`.
+//!
+//! The grammar for raw destinations (`x`) prohibits space (` `) and all
+//! [ASCII control][char::is_ascii_control] characters, which thus must be
+//! encoded.
+//! Unbalanced arens can be encoded as a character reference, character escape,
+//! or percent encoding: for `(` as `&lpar;`, `\(`, or `%28` and for `)` as
+//! `&rpar;`, `\)`, or `%29`.
+//!
+//! It is recommended to use the enclosed variant of destinations, as it allows
+//! the most characters, including arbitrary parens, in URLs.
+//!
+//! There are several cases where incorrect encoding of URLs would, in other
+//! languages, result in a parse error.
+//! In markdown, there are no errors, and URLs are normalized.
+//! In addition, unicode characters are percent encoded
+//! ([`sanitize_uri`][sanitize_uri]).
+//! For example:
+//!
+//! ```markdown
+//! [x]
+//!
+//! [x]: <https://a👍b%>
+//! ```
+//!
+//! Yields:
+//!
+//! ```html
+//! <p><a href="https://a%F0%9F%91%8Db%25">x</a></p>
+//! ```
+//!
 //! ## References
 //!
 //! *   [`micromark-factory-destination/index.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-factory-destination/dev/index.js)
@@ -37,6 +68,7 @@
 //! [string]: crate::content::string
 //! [character_escape]: crate::construct::character_escape
 //! [character_reference]: crate::construct::character_reference
+//! [sanitize_uri]: crate::util::sanitize_uri
 //!
 //! <!-- To do: link label end. -->
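The new `partial_destination` docs describe URL normalization: non-ASCII characters are percent encoded, and a stray `%` becomes `%25` instead of causing an error. The following is a minimal sketch of that idea, not the crate's actual `sanitize_uri`. The function name `normalize_uri_sketch` and the set of bytes left unencoded are assumptions made for illustration only.

```rust
/// Hypothetical helper (not part of markdown-rs): percent-encodes a URL the
/// way the doc comment's example suggests.
/// Assumption: ASCII alphanumerics and the bytes in `SAFE` pass through.
fn normalize_uri_sketch(input: &str) -> String {
    // Assumed "safe" punctuation; the real implementation may differ.
    const SAFE: &[u8] = b"!#$&'()*+,-./:;=?@_~";
    let bytes = input.as_bytes();
    let mut out = String::with_capacity(input.len());
    let mut i = 0;

    while i < bytes.len() {
        let b = bytes[i];
        if b == b'%' {
            // Keep `%` only when it starts a well-formed escape (`%` + 2 hex digits).
            let well_formed = i + 2 < bytes.len()
                && bytes[i + 1].is_ascii_hexdigit()
                && bytes[i + 2].is_ascii_hexdigit();
            out.push_str(if well_formed { "%" } else { "%25" });
        } else if b.is_ascii_alphanumeric() || SAFE.contains(&b) {
            out.push(b as char);
        } else {
            // Everything else, including each byte of a multi-byte UTF-8
            // sequence, is written as an uppercase `%XX` escape.
            out.push_str(&format!("%{:02X}", b));
        }
        i += 1;
    }
    out
}
```

Under these assumptions, calling `normalize_uri_sketch("https://a👍b%")` gives `https://a%F0%9F%91%8Db%25`, matching the `href` in the doc comment's HTML output. An existing well-formed escape such as `%3c` is left untouched.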
