---
title: What is ReDoS? How to prevent it?
date: 2024-04-17 21:41:40
tags:
- cybersecurity
- regex
category: Tips
thumbnail: /images/covers/What-is-ReDoS-How-to-prevent-it.png
---
Regular expression Denial of Service (ReDoS) attacks can make your web application slow or unresponsive. The attack relies on catastrophic backtracking, which is triggered by specially constructed input to an unoptimized regular expression.
**WARNING: We're not responsible for damage caused by ReDoS attacks! Malicious hacking is a computer crime and you may face legal consequences! This post is meant to raise awareness of ReDoS attacks and to show a way to prevent these vulnerabilities.**
## "Evil Regex"
Let's take `/^(a|a)*$/` as an example. The visualization looks like this:
![Visualization of `/^(a|a)*$/` regular expression](/images/redos-visualize-exponential.png)
As you can see, it first matches the beginning of the string, then any number of characters that are each either "a" or "a", and finally the end of the string.
Let's try the input `aaaaaaaaaaaaaaaaaaaaaaaab` on a backtracking regular expression engine. The engine matches the beginning and then the run of "a" characters, but the next character is "b", so the end of the string can't match. Because each "a" could have been matched by either alternative, the engine backtracks and tries to match that "a" again with the other alternative.
```
a
aa
aaa
aaaa
aaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaa[BACKTRACK]
...
```
As you can see, the number of backtracks grows exponentially (<code>1 + 2 + 2<sup>2</sup> + ... + 2<sup>n</sup> = 2<sup>n+1</sup> - 1</code>). The number of steps roughly doubles with every additional "a", which is much faster than linear growth, so matching becomes very slow for long strings.
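To make the doubling concrete, here is a minimal Python sketch of a toy backtracking matcher for `/^(a|a)*$/`. It is not how any real engine is implemented; the `count_steps` function and its counting rule (one step per attempted alternative) are assumptions made for illustration.

```python
def count_steps(n):
    """Count the alternatives a toy backtracker tries for ^(a|a)*$
    on the input 'a' * n + 'b', which can never match."""
    text = "a" * n + "b"
    steps = 0

    def match_star(pos):
        nonlocal steps
        # Try one more iteration of the group, using each alternative in turn.
        for alternative in ("a", "a"):
            steps += 1
            if pos < len(text) and text[pos] == alternative:
                if match_star(pos + 1):
                    return True
        # No further iteration worked: succeed only at the end of the string.
        return pos == len(text)

    match_star(0)
    return steps


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, count_steps(n))
```

In this model, each extra "a" roughly doubles the count: `count_steps(n + 1)` equals `2 * count_steps(n) + 2`.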
Let's take another example: `/^a*a*$/`, which matches the same strings as the previous regular expression.
![Visualization of `/^a*a*$/` regular expression](/images/redos-visualize-polymonial.png)
As you can see, it first matches the beginning of the string, then any number of "a" characters, then again any number of "a" characters, and finally the end of the string.
Let's try the same `aaaaaaaaaaaaaaaaaaaaaaaab` input as before on a backtracking regular expression engine. The engine matches the beginning and then lets the first `a*` consume every "a", but the next character is "b", so the end of the string can't match. Because the second `a*` could also match some of those "a" characters, the engine backtracks and tries every way of splitting the run of "a" characters between the two `a*` parts.
```
a
aa
aaa
aaaa
aaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaa[BACKTRACK]
...
```
As you can see, the number of steps after the first backtrack is polynomial (<code>1 + 2 + 3 + 4 + 5 + ... + n = ((n+1) * n)/2 = (n<sup>2</sup> + n)/2</code>). The number of steps is a quadratic function of the input length, so it grows faster than a linear function, though much more slowly than in the exponential case. This growth makes matching slow for very long strings.
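The quadratic growth can be reproduced with a small Python sketch that models the two greedy `a*` parts as nested loops. The `count_end_checks` function is a hypothetical toy model for illustration, not a real engine; it counts how many times the end-of-string check is attempted.

```python
def count_end_checks(n):
    """Count end-of-string checks a toy backtracker makes for ^a*a*$
    on the input 'a' * n + 'b', which can never match."""
    text = "a" * n + "b"
    checks = 0
    # First a*: greedy, so try the longest run first, then give back one "a" at a time.
    for first in range(n, -1, -1):
        # Second a*: greedy over the "a" characters the first part left behind.
        for second in range(n - first, -1, -1):
            checks += 1
            if first + second == len(text):  # would mean "$" matched
                return checks
    return checks


if __name__ == "__main__":
    for n in (6, 12, 24, 48):
        print(n, count_end_checks(n))
```

With this bookkeeping the count is `(n + 1) * (n + 2) / 2`, which differs slightly from the sum above but shows the same quadratic growth: doubling the input length roughly quadruples the work.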
Overall, "Evil Regex" contains:
* Grouping with repetition
* Inside the repeated group:
    * Repetition
    * Alternation with overlapping
There are also other examples of "Evil Regex":
* `(a+)+`
* `([a-zA-Z]+)*`
* `(a|aa)+`
* `(a|a?)+`
* `(.*a){x}` for x ≥ 10
All of the above become slow when matched against the `aaaaaaaaaaaaaaaaaaaaaaaa!` string.
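The slowdown can be observed with Python's built-in `re` module, which is a backtracking engine. This is a rough sketch: the `time_match` helper is hypothetical, the patterns are anchored with `^` and `$` (an assumption, so that the engine has to reach the end of the input), and the inputs are kept short so the script finishes quickly. Exact timings depend on the machine and the Python version.

```python
import re
import time

# Some patterns from the list above, anchored so that a match must
# cover the whole input (an assumption made for this demonstration).
PATTERNS = [r"^(a+)+$", r"^([a-zA-Z]+)*$", r"^(a|aa)+$", r"^(a|a?)+$"]


def time_match(pattern, text):
    """Return (matched, seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for pattern in PATTERNS:
        for n in (8, 12, 16):
            matched, seconds = time_match(pattern, "a" * n + "!")
            print(f"{pattern:18} n={n:2} matched={matched} {seconds:.4f}s")
```

Increasing `n` a few characters at a time should make the growth visible; be careful with larger values, since the running time can quickly become impractical.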
## ReDoS prevention
You can prevent ReDoS by optimizing regular expressions: `/^(a|a)*$/` and `/^a*a*$/` could be optimized into `/^a*$/`.
Optimizations for examples of "Evil Regex":
* `(a+)+` could be optimized into `a+`
* `([a-zA-Z]+)*` could be optimized into `[a-zA-Z]*`
* `(a|aa)+` could be optimized into `a+`
* `(a|a?)+` could be optimized into `a*`
* `(.*a){x}` could be optimized into `([^a]*a){x}`
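Before swapping in an optimized pattern, it can help to check that it accepts the same strings as the original. Below is a minimal sketch that compares each pair on every short string over a small alphabet, using Python's `re.fullmatch`. The `first_difference` helper, the alphabet and the length limit are assumptions for illustration. The `(.*a){x}` pair is left out because the two forms are not equivalent under a full match (for example, `aaa` fully matches `(.*a){2}` but not `([^a]*a){2}`), which this check does not model.

```python
import re
from itertools import product

# (original, optimized) pairs taken from the text above.
PAIRS = [
    (r"(a|a)*", r"a*"),
    (r"a*a*", r"a*"),
    (r"(a+)+", r"a+"),
    (r"([a-zA-Z]+)*", r"[a-zA-Z]*"),
    (r"(a|aa)+", r"a+"),
    (r"(a|a?)+", r"a*"),
]


def first_difference(original, optimized, alphabet="aZ!", max_len=6):
    """Return the first short string the two patterns disagree on, or None."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            before = re.fullmatch(original, text) is not None
            after = re.fullmatch(optimized, text) is not None
            if before != after:
                return text
    return None


if __name__ == "__main__":
    for original, optimized in PAIRS:
        print(original, "->", optimized, first_difference(original, optimized))
```

A result of `None` only means no difference was found among the tested strings; it is not a proof of equivalence.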
You can check whether a regular expression is vulnerable with a ReDoS checker; [Devina.io has one.](https://devina.io/redos-checker)
![`/^(a|a)*$/` regular expression is checked](/images/redos-check.png)
You can also switch to a non-backtracking regular expression engine, such as RE2 or the one used in Rust's `regex` library.
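To illustrate why automaton-based matching avoids the problem, here is a minimal Python sketch for `/^(a|a)*$/`. Instead of trying one path and backtracking, it tracks the set of all states reachable so far, so each input character is processed once. The hand-built transition table and the `simulate` function are assumptions for illustration, not the actual internals of RE2 or the Rust `regex` library.

```python
def simulate(text):
    """Match ^(a|a)*$ by tracking the set of reachable automaton states.
    Returns (matched, steps), where steps counts the transitions examined."""
    # State 0 is both the start and the accepting state; the two
    # alternatives of (a|a) are two parallel "a" edges looping back to it.
    transitions = {0: [("a", 0), ("a", 0)]}
    accepting = {0}
    current = {0}
    steps = 0
    for ch in text:
        following = set()
        for state in current:
            for label, target in transitions[state]:
                steps += 1
                if label == ch:
                    following.add(target)
        current = following
        if not current:
            # No state can consume this character, so the match fails here.
            return False, steps
    return bool(current & accepting), steps


if __name__ == "__main__":
    print(simulate("a" * 24 + "b"))
```

For the 25-character input used throughout this post, the sketch examines 50 transitions before rejecting it, and that number grows linearly with the input length.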