path: root/src/subtokenize.rs
Date | Commit message | Author | Files | Lines
2022-08-11 | Refactor to move some code to `event.rs` | Titus Wormer | 1 | -11/+12
2022-08-11 | Refactor to move some code to `state.rs` | Titus Wormer | 1 | -3/+4
2022-08-11 | Refactor internal docs, code style of tokenizer | Titus Wormer | 1 | -5/+1
2022-08-11 | Add improved container exit injection | Titus Wormer | 1 | -5/+8
2022-08-10 | Rename `State::Fn` to `State::Next` | Titus Wormer | 1 | -2/+2
2022-08-09 | Refactor to share some code | Titus Wormer | 1 | -56/+86
2022-08-09 | Rewrite algorithm to not pass around boxed functions | Titus Wormer | 1 | -7/+6
    * Pass state names from an enum around instead of boxed functions
    * Refactor to simplify attempts a lot
    * Use a subtokenizer for the `document` content type
2022-07-28 | Refactor to use `debug_assert` | Titus Wormer | 1 | -7/+7
2022-07-26 | Refactor to drastically improve perf around whitespace | Titus Wormer | 1 | -6/+8
2022-07-26 | Refactor to simplify tokenizer | Titus Wormer | 1 | -6/+3
2022-07-25 | Refactor to remove need for cloning codes | Titus Wormer | 1 | -10/+4
2022-07-25 | Improve performance w/ a single feed loop | Titus Wormer | 1 | -2/+6
2022-07-22 | Refactor to remove unneeded tuples in every states | Titus Wormer | 1 | -13/+9
2022-07-22 | Refactor to pass ints instead of vecs around | Titus Wormer | 1 | -4/+6
2022-07-21 | Refactor to move `index` field to `point` | Titus Wormer | 1 | -5/+5
2022-07-21 | Refactor to move some event fields to `link` | Titus Wormer | 1 | -35/+36
2022-07-20 | Refactor to share edit map | Titus Wormer | 1 | -3/+3
2022-07-20 | Refactor to use less vecs for events | Titus Wormer | 1 | -2/+4
2022-07-19 | Refactor to remove cloning in `edit_map` | Titus Wormer | 1 | -2/+2
2022-07-19 | Use `edit_map` in `subtokenize` | Titus Wormer | 1 | -67/+40
2022-07-19 | Remove an unneeded `HashMap` | Titus Wormer | 1 | -1/+1
2022-07-15 | Fix annoying bug around virtual spaces in containers | Titus Wormer | 1 | -1/+1
2022-07-07 | Add support for `Flow` content type | Titus Wormer | 1 | -2/+4
2022-07-05 | Refactor to do some to dos | Titus Wormer | 1 | -3/+2
2022-07-04 | Add support for unicode punctuation | Titus Wormer | 1 | -1/+1
2022-07-04 | Update list of todos | Titus Wormer | 1 | -2/+0
2022-06-28 | Fix jumps in `edit_map` | Titus Wormer | 1 | -101/+99
    * Use resolve more often (e.g., heading (atx, setext))
    * Fix to link whole phrasing (e.g., one big chunk of text in heading (atx, setext), titles, labels)
    * Replace `ChunkText`, `ChunkString`, with `event.content_type: Option<ContentType>`
    * Refactor to externalize `edit_map` from `label`
2022-06-24 | Add link, images (resource) | Titus Wormer | 1 | -12/+26
    This is still some messy code that needs cleaning up, but it adds support for links and images, of the resource kind (`[a](b)`). References (`[a][b]`) are parsed and will soon be supported, but need matching.
    * Fix bug to pad percent-encoded bytes when normalizing urls
    * Fix bug with escapes counting as balancing in destination
    * Add `space_or_tab_one_line_ending`, to parse whitespace including up to one line ending (but not a blank line)
    * Add `ParserState` to share codes, definitions, etc
2022-06-22 | Refactor some unneeded assignments | Titus Wormer | 1 | -2/+1
2022-06-22 | Add docs for token types | Titus Wormer | 1 | -1/+3
2022-06-21 | Add docs for `subtokenize` | Titus Wormer | 1 | -2/+51
2022-06-21 | Update todo list | Titus Wormer | 1 | -8/+1
2022-06-20 | Add support for BOM | Titus Wormer | 1 | -0/+4
2022-06-20 | Remove unneeded `content` content type | Titus Wormer | 1 | -6/+3
2022-06-14 | Fix support for deep subtokenization | Titus Wormer | 1 | -9/+19
    * Fix a couple of forgotten line ending handling in html (text)
    * Fix missing initial case for html (text) not having a `<` 😬
    * Add line ending handling to `text` construct
2022-06-14 | Reorganize to split util | Titus Wormer | 1 | -6/+4
2022-06-14 | Add docs for html (text) | Titus Wormer | 1 | -0/+1
2022-06-13 | Add basic html (text) | Titus Wormer | 1 | -3/+9
    * Add all states for html (text)
    * Fix to link paragraph tokens together
    * Add note about uncovered bug where linking paragraph tokens together doesn’t work 😅
2022-06-10 | Add text content type | Titus Wormer | 1 | -4/+10
    * Add character reference and character escapes in text
    * Add recursive subtokenization
2022-06-10 | Add proper support for subtokenization | Titus Wormer | 1 | -50/+116
    - Add “content” content type
    - Add paragraph
    - Add skips
    - Add linked tokens
2022-06-09 | Add basic subtokenization, string content in fenced code | Titus Wormer | 1 | -0/+67