Bento check: Catch catastrophic backtracking ReDoS bugs

get this check now in Bento: $ pip3 install bento-cli && bento init

ReDoS (regular expression denial-of-service) bugs occur when a specially crafted string is passed to an inefficient regular expression, making it take a very long time to complete. We recently found a high severity vulnerability in several versions of Python 2 and 3, so we created a Bento check to find similar vulnerabilities in other projects. The check is included as of Bento version 0.9.

One type of inefficient regular expression occurs with nested quantifiers. A simple example is (a+)+. Another type of inefficient expression occurs with mutually inclusive alternation. These are either/or expressions with overlapping character space like ([a-z]|a)+. Both of these inefficient expressions must also contain a pattern after the inefficient part that can’t be matched against, like (a+)+b. This causes backtracking, and in the inefficient case it causes “catastrophic backtracking.” Catastrophic backtracking can impact the security and availability of an application, thus opening the door for denial-of-service attacks.

A catastrophic backtracking bug caused a large Cloudflare outage in July, 2019. In this case, “CPU exhaustion was caused by a single WAF rule that contained a poorly written regular expression that ended up creating excessive backtracking.” This led to Cloudflare’s first global outage in six years.

Here’s an example of catastrophic backtracking:

ReDoS backtracking

To create this check, we added heuristics for detecting inefficient regular expressions into tooling and then used our program analysis platform to run the check against several thousand open source Python projects. Our first tests of the check produced many findings:

Initial findings across thousands of Python projects

A particularly interesting finding came from the Python urllib module:

rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
                'realm=(["\']?)([^"\']*)\\2', re.I)

This line was caught by the check due to nested quantifiers in the (?:.*,)* part of the regular expression. Since . and , overlap (dot means any character), by simply repeating a comma and then including a character we can cause catastrophic backtracking. This led to Python bpo-39503 and CVE-2020-8492.

In looking at the check’s other results, we found and reported bugs in several widely-used Python projects:

If you’re using regular expressions in your Python project, you can now run Bento and determine if this bug affects it. For more details on how we found the specific vulnerabilities noted above, see this post on the r2c engineering blog.

References