|
1 | 1 | Sanitization |
2 | 2 | ================= |
3 | 3 |
|
4 | | -This example leverages python-lolhtml to delete potentially dangerous elements and attributes. |
5 | | -We assume that only elements within a `main` tag could be dangerous. Further, we forbid tags besides `p` and `span`, |
6 | | -which are common in blog posts. However, these tags could still contain malicious content, such as hidden text, |
7 | | -prompting us to remove attributes. |
| 4 | +This example leverages python-lolhtml to delete potentially dangerous elements and attributes. We forbid tags besides |
| 5 | +`p` and `span`, which are common in blog posts. However, these tags could still contain malicious content, such as |
| 6 | +hidden text, prompting us to remove attributes. |
8 | 7 |
|
9 | | -Notice how the otherwise hidden message is revealed by removing the `style` attribute whereas our known-good footer text's
10 | | -style is preserved. Additionally, the `script` tag is removed and its inner text is retained. |
| 8 | +Notice how the otherwise hidden message is revealed by removing the `style` attribute. Additionally, the `script` tag is |
| 9 | +removed and its inner text is retained. |
11 | 10 |
|
12 | 11 | While this example loads all the content into the rewriter at once with one singular read, python-lolhtml also supports |
13 | 12 | streaming content and writing in chunks. |
14 | 13 |
|
15 | 14 | > [!CAUTION] |
16 | 15 | > It is generally a bad idea to make your own sanitizer, especially if your use case prompts a more permissive set of |
17 | 16 | > rules. If you are not sure about what you are doing, consider using another purpose-built package like |
18 | | -> [nh3](https://pypi.org/project/nh3/) or [bleach](https://pypi.org/project/bleach/). |
| 17 | +> [nh3](https://pypi.org/project/nh3/) or [bleach](https://pypi.org/project/bleach/). |
| 18 | +
|
| 19 | +> [!CAUTION] |
| 20 | +> When building a sanitizer with lol-html, you should generally **not** restrict sanitization to a specific element
| 21 | +> within the HTML. The parser can misjudge where that element ends and treat content as out of scope, for example
| 22 | +> when a tag is never closed and the outcome depends on the browser's implementation. |
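
The policy the README text describes (keep only `p` and `span`, drop every attribute, and keep the inner text of removed elements such as `script`) can be sketched with the standard library alone. This is an illustration of the rule set, not the python-lolhtml example from the repository: it uses `html.parser` rather than lol-html, and the names `ALLOWED_TAGS`, `AllowlistSanitizer`, and `sanitize` are hypothetical. It applies to the whole document rather than to one element, in line with the second caution above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist mirroring the prose: only these tags survive.
ALLOWED_TAGS = {"p", "span"}


class AllowlistSanitizer(HTMLParser):
    """Re-emits allowed tags without attributes and keeps all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes (style, class, on* handlers, ...) are always dropped.
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is retained even when its enclosing tag was removed;
        # it is re-escaped so it cannot be read back as markup.
        self.parts.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


if __name__ == "__main__":
    print(sanitize('<p style="display:none">hidden</p><script>alert(1)</script>'))
```

As in the README's description, the hidden paragraph loses its `style` attribute and the `script` element is removed while its text remains. The first caution still applies: for real use, a purpose-built package is the safer choice.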