author Titus Wormer <tituswormer@gmail.com> 2022-06-15 19:28:54 +0200
committer Titus Wormer <tituswormer@gmail.com> 2022-06-15 19:28:54 +0200
commit 7875ada79cea1194dc9e15acee36ed0700be70e6
tree f9d5b82ac92a07a3ff1d05401446dd84bc24d6a7 /src
parent b150a72975d6e75b96298b3d405afe070271d78b
Add docs on sanitizing urls to autolink
Diffstat (limited to 'src')
-rw-r--r-- src/construct/autolink.rs 18
-rw-r--r-- src/util/sanitize_uri.rs 4
2 files changed, 20 insertions, 2 deletions
diff --git a/src/construct/autolink.rs b/src/construct/autolink.rs
index 2682878..78003fb 100644
--- a/src/construct/autolink.rs
+++ b/src/construct/autolink.rs
@@ -38,6 +38,23 @@
//! <p><a href="https://example.com/alpha%20bravo">https://example.com/alpha%20bravo</a></p>
//! ```
//!
+//! There are several cases where incorrect encoding of URLs would, in other
+//! languages, result in a parse error.
+//! In markdown, there are no errors, and URLs are normalized.
+//! In addition, Unicode characters are percent-encoded
+//! ([`sanitize_uri`][sanitize_uri]).
+//! For example:
+//!
+//! ```markdown
+//! <https://a👍b%>
+//! ```
+//!
+//! Yields:
+//!
+//! ```html
+//! <p><a href="https://a%F0%9F%91%8Db%25">https://a👍b%</a></p>
+//! ```
+//!
//! Interestingly, there are a couple of things that are valid autolinks in
//! markdown but in HTML would be valid tags, such as `<svg:rect>` and
//! `<xml:lang/>`.
@@ -73,6 +90,7 @@
//! [text]: crate::content::text
//! [autolink_scheme_size_max]: crate::constant::AUTOLINK_SCHEME_SIZE_MAX
//! [autolink_domain_size_max]: crate::constant::AUTOLINK_DOMAIN_SIZE_MAX
+//! [sanitize_uri]: crate::util::sanitize_uri
//! [html-a]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element
//!
//! <!-- To do: add explanation of sanitation. -->
diff --git a/src/util/sanitize_uri.rs b/src/util/sanitize_uri.rs
index 40e0f2c..d66978e 100644
--- a/src/util/sanitize_uri.rs
+++ b/src/util/sanitize_uri.rs
@@ -5,9 +5,9 @@ use crate::util::encode::encode;
/// Make a value safe for injection as a URL.
///
/// This encodes unsafe characters with percent-encoding and skips already
-/// encoded sequences (see `normalize_uri` below).
+/// encoded sequences (see [`normalize_uri`][] below).
/// Further unsafe characters are encoded as character references (see
-/// `encode`).
+/// [`encode`][]).
///
/// Then, a vec of (lowercase) allowed protocols can be given, in which case
/// the URL is sanitized.
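
The autolink example added in this commit (`<https://a👍b%>` yielding `https://a%F0%9F%91%8Db%25`) can be reproduced with a small sketch. This is not the crate's `sanitize_uri` or `normalize_uri`: the function name `percent_encode_sketch` and the set of ASCII punctuation left untouched are assumptions made for illustration, and the character-reference step (`encode`) and protocol filtering are left out.

```rust
/// Hypothetical helper (not part of markdown-rs): percent-encode every byte
/// that is not in an assumed "safe" ASCII set, and leave `%XX` sequences
/// that are already encoded as they are.
fn percent_encode_sketch(value: &str) -> String {
    // Assumed set of ASCII punctuation that is kept verbatim.
    const SAFE_PUNCTUATION: &[u8] = b"!#$&'()*+,-./:;=?@_~";

    let bytes = value.as_bytes();
    let mut result = String::with_capacity(bytes.len());
    let mut index = 0;

    while index < bytes.len() {
        let byte = bytes[index];

        if byte == b'%' {
            // A `%` followed by two hex digits is treated as already encoded.
            let already_encoded = index + 2 < bytes.len()
                && bytes[index + 1].is_ascii_hexdigit()
                && bytes[index + 2].is_ascii_hexdigit();

            if already_encoded {
                result.push('%');
            } else {
                // A stray `%` is itself encoded, so no parse error is needed.
                result.push_str("%25");
            }
        } else if byte.is_ascii_alphanumeric() || SAFE_PUNCTUATION.contains(&byte) {
            result.push(byte as char);
        } else {
            // Everything else, including each byte of a multi-byte UTF-8
            // character, becomes `%` plus two uppercase hex digits.
            result.push_str(&format!("%{:02X}", byte));
        }

        index += 1;
    }

    result
}
```

Calling `percent_encode_sketch("https://a👍b%")` encodes the four UTF-8 bytes of the emoji and the trailing `%`, while an input such as `https://example.com/alpha%20bravo` passes through unchanged because `%20` is already a valid escape.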