author: Titus Wormer <tituswormer@gmail.com> 2022-06-22 16:41:04 +0200
committer: Titus Wormer <tituswormer@gmail.com> 2022-06-22 16:41:04 +0200
commit: b0accb11f1aade55e9fc4dc0a1c1d1b8362ab5d9
tree: 7137fb206a909feb5c6f841c1a9e3dafc590312c
parent: cbd0155d7f1f20b3fe261723515203df5108a56d
Add docs on encoding to definition, destination
-rw-r--r-- readme.md 2
-rw-r--r-- src/construct/autolink.rs 4
-rw-r--r-- src/construct/definition.rs 10
-rw-r--r-- src/construct/partial_destination.rs 40
4 files changed, 48 insertions, 8 deletions
diff --git a/readme.md b/readme.md
index d79169c..6594148 100644
--- a/readme.md
+++ b/readme.md
@@ -68,7 +68,6 @@ cargo doc --document-private-items
#### Docs
-- [ ] (1) Add docs for sanitation (autolink, definition, resource)
- [ ] (1) Add docs for how references and definitions match (definition, reference)
- [ ] (1) Go through all bnf
- [ ] (1) Go through all docs
@@ -233,6 +232,7 @@ cargo doc --document-private-items
- [x] (1) Do not capture in `tokenizer.go`
- [x] (1) Clean attempts
- [x] (1) Add docs for tokenizer
+- [x] (1) Add docs for sanitation
### Extensions
diff --git a/src/construct/autolink.rs b/src/construct/autolink.rs
index 84c483d..33cb3f0 100644
--- a/src/construct/autolink.rs
+++ b/src/construct/autolink.rs
@@ -23,8 +23,8 @@
//! The maximum allowed size of a domain is `63` (inclusive), which is defined
//! in [`AUTOLINK_DOMAIN_SIZE_MAX`][autolink_domain_size_max].
//!
-//! The grammar for autolinks is quite strict and requires ASCII to be used
-//! (without, for example, spaces).
+//! The grammar for autolinks is quite strict and prohibits the use of ASCII control
+//! characters or spaces.
//! To use non-ASCII characters, and otherwise impossible characters, in URLs,
//! you can use percent encoding:
//!
diff --git a/src/construct/definition.rs b/src/construct/definition.rs
index b545643..57c62a5 100644
--- a/src/construct/definition.rs
+++ b/src/construct/definition.rs
@@ -30,7 +30,7 @@
//! space_or_tab ::= ' ' | '\t'
//! ```
//!
-//! Definitions in markdown to not, on their own, relate to anything in HTML.
+//! Definitions in markdown do not, on their own, relate to anything in HTML.
//! When connected with a link (reference), they together relate to the `<a>`
//! element in HTML.
//! The definition forms its `href`, and optionally `title`, attributes.
@@ -41,6 +41,12 @@
//! That means that [character escapes][character_escape] and
//! [character references][character_reference] are allowed.
//!
+//! For info on how to encode characters in URLs, see
+//! [`partial_destination`][destination].
+//! For info on how characters are encoded as `href` on `<a>` or `src` on
+//! `<img>` when compiling, see
+//! [`sanitize_uri`][sanitize_uri].
+//!
//! ## Tokens
//!
//! * [`Definition`][TokenType::Definition]
@@ -68,6 +74,8 @@
//! [string]: crate::content::string
//! [character_escape]: crate::construct::character_escape
//! [character_reference]: crate::construct::character_reference
+//! [destination]: crate::construct::partial_destination
+//! [sanitize_uri]: crate::util::sanitize_uri
//! [html]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element
//!
//! <!-- To do: link link (reference) -->
diff --git a/src/construct/partial_destination.rs b/src/construct/partial_destination.rs
index b2ceeb8..03dcbee 100644
--- a/src/construct/partial_destination.rs
+++ b/src/construct/partial_destination.rs
@@ -21,14 +21,45 @@
//! before it.
//! Escaped parens do not count in balancing.
//!
-//! It is recommended to use the enclosed variant of destinations, as it allows
-//! arbitrary parens, and also allows for whitespace and other characters in
-//! URLs.
-//!
//! The destination is interpreted as the [string][] content type.
//! That means that [character escapes][character_escape] and
//! [character references][character_reference] are allowed.
//!
+//! The grammar for enclosed destinations (`<x>`) prohibits the use of `<`,
+//! `>`, and line endings to form URLs.
+//! The angle brackets can be encoded as a character reference, character
+//! escape, or percent encoding: for `<` as `&lt;`, `\<`, or `%3c` and for
+//! `>` as `&gt;`, `\>`, or `%3e`.
+//!
+//! The grammar for raw destinations (`x`) prohibits space (` `) and all
+//! [ASCII control][char::is_ascii_control] characters, which thus must be
+//! encoded.
+//! Unbalanced parens can be encoded as a character reference, character escape,
+//! or percent encoding: for `(` as `&lpar;`, `\(`, or `%28` and for `)` as
+//! `&rpar;`, `\)`, or `%29`.
+//!
+//! It is recommended to use the enclosed variant of destinations, as it allows
+//! the most characters, including arbitrary parens, in URLs.
+//!
+//! There are several cases where incorrect encoding of URLs would, in other
+//! languages, result in a parse error.
+//! In markdown, there are no errors, and URLs are normalized.
+//! In addition, Unicode characters are percent encoded
+//! ([`sanitize_uri`][sanitize_uri]).
+//! For example:
+//!
+//! ```markdown
+//! [x]
+//!
+//! [x]: <https://a👍b%>
+//! ```
+//!
+//! Yields:
+//!
+//! ```html
+//! <p><a href="https://a%F0%9F%91%8Db%25">x</a></p>
+//! ```
+//!
//! ## References
//!
//! * [`micromark-factory-destination/index.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-factory-destination/dev/index.js)
@@ -37,6 +68,7 @@
//! [string]: crate::content::string
//! [character_escape]: crate::construct::character_escape
//! [character_reference]: crate::construct::character_reference
+//! [sanitize_uri]: crate::util::sanitize_uri
//!
//! <!-- To do: link label end. -->
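
The normalization described in the `partial_destination` docs above (non-ASCII characters percent encoded, a stray `%` turned into `%25`) can be illustrated with a minimal sketch. The function name `percent_encode_sketch` and its set of "safe" ASCII characters are assumptions made for illustration only; this is not the crate's `sanitize_uri`, whose exact rules may differ.

```rust
/// Hypothetical helper for illustration; NOT the crate's `sanitize_uri`.
/// Percent-encodes every character outside an assumed "safe" ASCII set and
/// escapes a `%` that is not followed by two hex digits.
fn percent_encode_sketch(value: &str) -> String {
    let bytes = value.as_bytes();
    let mut result = String::with_capacity(value.len());

    for (index, ch) in value.char_indices() {
        match ch {
            '%' => {
                // Keep `%` only when it already starts an escape like `%3c`.
                let is_escape = bytes.len() >= index + 3
                    && bytes[index + 1].is_ascii_hexdigit()
                    && bytes[index + 2].is_ascii_hexdigit();
                if is_escape {
                    result.push('%');
                } else {
                    result.push_str("%25");
                }
            }
            // Assumed safe set: ASCII alphanumerics plus common URL punctuation.
            c if c.is_ascii_alphanumeric() || "!#$&'()*+,-./:;=?@_~".contains(c) => {
                result.push(c);
            }
            // Everything else: encode each UTF-8 byte as uppercase `%XX`.
            c => {
                let mut buffer = [0u8; 4];
                for byte in c.encode_utf8(&mut buffer).bytes() {
                    result.push_str(&format!("%{:02X}", byte));
                }
            }
        }
    }

    result
}
```

Under these assumptions, calling `percent_encode_sketch("https://a👍b%")` gives `https://a%F0%9F%91%8Db%25`, which matches the `href` in the documented example, while an existing escape such as `%20` is passed through unchanged.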