---
title: What is ReDoS? How to prevent it?
date: 2024-04-17 21:41:40
tags:
- cybersecurity
- regex
category: Tips
thumbnail: /images/covers/What-is-ReDoS-How-to-prevent-it.png
---

A Regular expression Denial of Service (ReDoS) attack can make your web application slow or unresponsive. The attack relies on catastrophic backtracking, which specially crafted input triggers in poorly written regular expressions.

**WARNING: We’re not responsible for damage caused by ReDoS attacks! Malicious hacking is a computer crime and you may face legal consequences! This post is meant to raise awareness of ReDoS attacks and to show how to prevent these vulnerabilities.**

## "Evil Regex"

Let's take `/^(a|a)*$/` as an example. The visualization looks like this:

![Visualization of `/^(a|a)*$/` regular expression](/images/redos-visualize-exponential.png)

As you can see, it first matches the beginning of the string, then any number of characters that are each either "a" or "a", and finally the end of the string.

Let's try the input `aaaaaaaaaaaaaaaaaaaaaaaab` on a backtracking regular expression engine. The engine matches the beginning of the string and then all the "a" characters, but the next character is "b", so the end of the string cannot match. Because every "a" could also have been matched by the other branch of the alternation, the engine backtracks and tries to match each "a" again through that branch.

```
a
aa
aaa
aaaa
aaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaa[BACKTRACK]
...
```

As you can see, the number of backtracking steps grows exponentially with the number of "a" characters (<code>1 + 2 + 2<sup>2</sup> + ... + 2<sup>n</sup> = 2<sup>n+1</sup> - 1</code>). Each additional "a" roughly doubles the work, so matching becomes very slow even for fairly short strings.

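The doubling can be reproduced with a toy model. The sketch below is not how any real engine is implemented; it is a hand-written backtracking matcher for `^(a|a)*$` only, and the names `count_attempts` and `match_from` are made up for this illustration. It counts how many times the matcher is entered.

```python
def count_attempts(text):
    """Count match attempts of a toy backtracking matcher for ^(a|a)*$."""
    attempts = 0

    def match_from(pos):
        nonlocal attempts
        attempts += 1
        # Try both branches of (a|a); each consumes one "a", then repeat the group.
        for branch in ("a", "a"):
            if text.startswith(branch, pos) and match_from(pos + 1):
                return True
        # No further repetition: the pattern now requires the end of the string.
        return pos == len(text)

    matched = match_from(0)
    return matched, attempts


if __name__ == "__main__":
    for n in (4, 8, 16):
        matched, attempts = count_attempts("a" * n + "b")
        # The last column is the closed form 2^(n+1) - 1 from the text.
        print(n, matched, attempts, 2 ** (n + 1) - 1)
```

In this model, an input of n "a" characters followed by "b" takes exactly <code>2<sup>n+1</sup> - 1</code> attempts, while an input that matches succeeds on the first path tried.
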
Let's take another example: `/^a*a*$/`, which matches the same strings as the previous regular expression.

![Visualization of `/^a*a*$/` regular expression](/images/redos-visualize-polymonial.png)

As you can see, it first matches the beginning of the string, then any number of "a" characters, then again any number of "a" characters, and finally the end of the string.

Let's try the same `aaaaaaaaaaaaaaaaaaaaaaaab` input as before on a backtracking regular expression engine. The first `a*` matches all the "a" characters, but the next character is "b", so the end of the string cannot match. Because the second `a*` can also match any number of "a" characters, the engine backtracks: the first `a*` gives up one "a" at a time, and the second `a*` tries to match the rest.

```
a
aa
aaa
aaaa
aaaaa
...
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaa[BACKTRACK]
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaa[BACKTRACK]
...
```

As you can see, the number of steps after the first backtrack is polynomial, in this case quadratic (<code>1 + 2 + 3 + 4 + 5 + ... + n = ((n + 1) * n) / 2 = (n<sup>2</sup> + n) / 2</code>). This grows faster than a linear function but far slower than the exponential case, so matching only becomes slow for very long strings.

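The quadratic growth can be checked with another toy model. This is a sketch written only for `^a*a*$`, not a description of a real engine, and `count_end_checks` is a name made up for this illustration. It counts how often the matcher reaches the end-of-string check, starting from the very first attempt, so for n "a" characters followed by "b" it reports <code>(n + 1)(n + 2) / 2</code>, which is of the same quadratic order as the sum above.

```python
def count_end_checks(text):
    """Count end-of-string checks of a toy backtracking matcher for ^a*a*$."""
    # Length of the run of "a" characters at the start of the text.
    run = 0
    while run < len(text) and text[run] == "a":
        run += 1

    checks = 0
    # The first a* is greedy: longest match first, then give back one "a" at a time.
    for first in range(run, -1, -1):
        # The second a* is greedy as well over what is left of the run.
        for second in range(run - first, -1, -1):
            checks += 1
            if first + second == len(text):
                return True, checks
    return False, checks


if __name__ == "__main__":
    for n in (10, 100, 1000):
        matched, checks = count_end_checks("a" * n + "b")
        print(n, matched, checks, (n + 1) * (n + 2) // 2)
```

Multiplying the input length by 10 multiplies the work by roughly 100, which is why this kind of pattern only hurts on long inputs.
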
Overall, "Evil Regex" contains:

* Grouping with repetition
* Inside the repeated group:
  * Repetition
  * Alternation with overlapping

There are also other examples of "Evil Regex":
* `(a+)+`
* `([a-zA-Z]+)*`
* `(a|aa)+`
* `(a|a?)+`
* `(.*a){x}` for x ≥ 10

All of the above become slow when they are matched against the `aaaaaaaaaaaaaaaaaaaaaaaa!` string.

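One way to observe the slowdown is to time failed matches on growing inputs. The sketch below assumes CPython's built-in `re` module, which is a backtracking engine. The helper `time_failed_match` is a name made up for this illustration, and `(.*a){x}` is left out because `x` is a placeholder. The input sizes are kept small on purpose, and the timings depend on the machine and interpreter, so no expected numbers are given.

```python
import re
import time

# Patterns from the list above. re.fullmatch requires the whole string to match,
# so the trailing "!" forces the engine to try every alternative before failing.
EVIL_PATTERNS = [r"(a+)+", r"([a-zA-Z]+)*", r"(a|aa)+", r"(a|a?)+"]


def time_failed_match(pattern, n):
    """Return (match, seconds) for matching n "a" characters followed by "!"."""
    text = "a" * n + "!"
    start = time.perf_counter()
    match = re.fullmatch(pattern, text)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    for pattern in EVIL_PATTERNS:
        for n in (6, 10, 14):
            match, seconds = time_failed_match(pattern, n)
            print(f"{pattern!r:18} n={n:2} matched={match is not None} {seconds:.6f}s")
```

Only raise `n` carefully: with exponential patterns, a few extra characters can turn milliseconds into minutes.
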
## ReDoS prevention

You can prevent ReDoS by optimizing regular expressions so that each input can only be matched in one way: `/^(a|a)*$/` and `/^a*a*$/` can both be optimized into `/^a*$/`.

Optimizations for the examples of "Evil Regex":
* `(a+)+` can be optimized into `a+`
* `([a-zA-Z]+)*` can be optimized into `[a-zA-Z]*`
* `(a|aa)+` can be optimized into `a+`
* `(a|a?)+` can be optimized into `a*`
* `(.*a){x}` can be optimized into `([^a]*a){x}` (when unanchored, it finds a match in the same single-line inputs, although the matched text can differ)

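Before swapping a pattern for its optimized form, it is worth checking that both accept the same strings. The sketch below is one hedged way to do that with Python's `re` module: it compares each pair by brute force on every short string over a small alphabet. The names `PAIRS`, `short_strings` and `disagreements` are made up for this illustration, the `(.*a){x}` pair is left out because it is only equivalent for unanchored searches, and agreement on short strings is evidence, not proof.

```python
import itertools
import re

# (original, optimized) pairs taken from the text above.
PAIRS = [
    (r"(a|a)*", r"a*"),
    (r"a*a*", r"a*"),
    (r"(a+)+", r"a+"),
    (r"([a-zA-Z]+)*", r"[a-zA-Z]*"),
    (r"(a|aa)+", r"a+"),
    (r"(a|a?)+", r"a*"),
]


def short_strings(alphabet="ab!", max_len=5):
    """Yield every string over the alphabet with at most max_len characters."""
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars)


def disagreements(original, optimized):
    """Return the short strings that only one of the two patterns fully matches."""
    return [
        s for s in short_strings()
        if bool(re.fullmatch(original, s)) != bool(re.fullmatch(optimized, s))
    ]


if __name__ == "__main__":
    for original, optimized in PAIRS:
        print(original, "->", optimized, disagreements(original, optimized))
```

The strings are kept short so that even the vulnerable originals finish quickly.
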
You can check whether a regular expression is vulnerable with a ReDoS checker; [Devina.io has one of them.](https://devina.io/redos-checker)

![`/^(a|a)*$/` regular expression is checked](/images/redos-check.png)

You can also switch to a non-backtracking regular expression engine, such as RE2 or the one used in the Rust `regex` library.

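To see why a non-backtracking approach avoids the blow-up, here is a minimal sketch of the general idea: instead of exploring one path at a time, track the set of all states the pattern could be in after each character. This is only an illustration and not the actual implementation of RE2 or the Rust `regex` library. The function `nfa_fullmatch` and the hand-built transition table for `^(a|a)*$` are assumptions made for this example.

```python
def nfa_fullmatch(transitions, start, accepting, text):
    """Simulate an NFA by tracking the set of all active states at once."""
    active = {start}
    steps = 0
    for ch in text:
        next_active = set()
        for state in active:
            for target in transitions.get((state, ch), ()):
                steps += 1
                next_active.add(target)  # duplicate targets collapse in the set
        if not next_active:
            return False, steps  # no state can consume this character
        active = next_active
    return bool(active & accepting), steps


# Hand-built automaton for ^(a|a)*$: a single state 0, and both branches of
# the alternation lead from state 0 back to state 0 on "a".
EVIL_NFA = {(0, "a"): [0, 0]}

if __name__ == "__main__":
    print(nfa_fullmatch(EVIL_NFA, 0, {0}, "a" * 24 + "b"))
    print(nfa_fullmatch(EVIL_NFA, 0, {0}, "a" * 24))
```

Because the two branches end in the same state, the set never grows, and the work per character stays constant: 2 steps per "a" here, compared with the <code>2<sup>n+1</sup> - 1</code> attempts of the backtracking trace shown earlier.
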