author | Titus Wormer <tituswormer@gmail.com> | 2022-06-15 19:28:54 +0200 |
---|---|---|
committer | Titus Wormer <tituswormer@gmail.com> | 2022-06-15 19:28:54 +0200 |
commit | 7875ada79cea1194dc9e15acee36ed0700be70e6 (patch) | |
tree | f9d5b82ac92a07a3ff1d05401446dd84bc24d6a7 | |
parent | b150a72975d6e75b96298b3d405afe070271d78b (diff) | |
download | markdown-rs-7875ada79cea1194dc9e15acee36ed0700be70e6.tar.gz markdown-rs-7875ada79cea1194dc9e15acee36ed0700be70e6.tar.bz2 markdown-rs-7875ada79cea1194dc9e15acee36ed0700be70e6.zip |
Add docs on sanitizing urls to autolink
-rw-r--r-- | src/construct/autolink.rs | 18 |
-rw-r--r-- | src/util/sanitize_uri.rs | 4 |
2 files changed, 20 insertions, 2 deletions
diff --git a/src/construct/autolink.rs b/src/construct/autolink.rs
index 2682878..78003fb 100644
--- a/src/construct/autolink.rs
+++ b/src/construct/autolink.rs
@@ -38,6 +38,23 @@
 //! <p><a href="https://example.com/alpha%20bravo">https://example.com/alpha%20bravo</a></p>
 //! ```
 //!
+//! There are several cases where incorrect encoding of URLs would, in other
+//! languages, result in a parse error.
+//! In markdown, there are no errors, and URLs are normalized.
+//! In addition, unicode characters are percent encoded
+//! ([`sanitize_uri`][sanitize_uri]).
+//! For example:
+//!
+//! ```markdown
+//! <https://a👍b%>
+//! ```
+//!
+//! Yields:
+//!
+//! ```html
+//! <p><a href="https://a%F0%9F%91%8Db%25">https://a👍b%</a></p>
+//! ```
+//!
 //! Interestingly, there are a couple of things that are valid autolinks in
 //! markdown but in HTML would be valid tags, such as `<svg:rect>` and
 //! `<xml:lang/>`.
@@ -73,6 +90,7 @@
 //! [text]: crate::content::text
 //! [autolink_scheme_size_max]: crate::constant::AUTOLINK_SCHEME_SIZE_MAX
 //! [autolink_domain_size_max]: crate::constant::AUTOLINK_DOMAIN_SIZE_MAX
+//! [sanitize_uri]: crate::util::sanitize_uri
 //! [html-a]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element
 //!
 //! <!-- To do: add explanation of sanitation. -->
diff --git a/src/util/sanitize_uri.rs b/src/util/sanitize_uri.rs
index 40e0f2c..d66978e 100644
--- a/src/util/sanitize_uri.rs
+++ b/src/util/sanitize_uri.rs
@@ -5,9 +5,9 @@ use crate::util::encode::encode;
 /// Make a value safe for injection as a URL.
 ///
 /// This encodes unsafe characters with percent-encoding and skips already
-/// encoded sequences (see `normalize_uri` below).
+/// encoded sequences (see [`normalize_uri`][] below).
 /// Further unsafe characters are encoded as character references (see
-/// `encode`).
+/// [`encode`][]).
 ///
 /// Then, a vec of (lowercase) allowed protocols can be given, in which case
 /// the URL is sanitized.
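
The added docs describe two behaviors: already-encoded sequences such as `%20` are kept, while non-ASCII characters and a stray `%` are percent-encoded. The following is a minimal sketch of that idea, not the crate's actual `normalize_uri` or `sanitize_uri`. The function name `percent_encode_sketch` is hypothetical, and the sketch assumes that every ASCII character other than `%` can pass through unchanged, which the real implementation may not do.

```rust
/// Hypothetical helper for illustration only; this is NOT the crate's
/// `normalize_uri`. It percent-encodes non-ASCII bytes and any `%` that does
/// not start a valid `%XX` escape. All other ASCII bytes pass through
/// unchanged (an assumption of this sketch).
fn percent_encode_sketch(value: &str) -> String {
    let bytes = value.as_bytes();
    let mut out = String::with_capacity(bytes.len());
    let mut index = 0;

    while index < bytes.len() {
        let byte = bytes[index];

        if byte == b'%' {
            // Keep `%XX` when both following bytes are ASCII hex digits.
            let escaped = index + 2 < bytes.len()
                && bytes[index + 1].is_ascii_hexdigit()
                && bytes[index + 2].is_ascii_hexdigit();

            if escaped {
                out.push('%');
            } else {
                // A bare `%` is itself encoded as `%25`.
                out.push_str("%25");
            }
        } else if byte.is_ascii() {
            out.push(byte as char);
        } else {
            // Each byte of a multi-byte UTF-8 sequence becomes `%XX`.
            out.push_str(&format!("%{:02X}", byte));
        }

        index += 1;
    }

    out
}
```

Under these assumptions, `percent_encode_sketch("https://a👍b%")` gives `https://a%F0%9F%91%8Db%25`, which matches the `href` in the documented example, and `https://example.com/alpha%20bravo` is returned unchanged.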