What Is a Regular Expression?
A regular expression (regex) is a sequence of characters that defines a search pattern. It's a mini-language for describing text: you write a pattern and then test whether a string matches it, find all occurrences in a larger text, or extract specific parts of a match.
For example, the pattern \d{3}-\d{4} matches a 7-digit phone number written in the format 555-1234. The pattern [a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,} is a basic email address check; it lists only lowercase letters, so use it with the case-insensitive flag described below. Regex patterns look cryptic at first glance, but they follow a consistent logic once you learn the building blocks.
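The example below shows the three operations from the opening paragraph in Python's standard re module: testing a string, finding all occurrences, and extracting parts of a match. It is a minimal sketch. The sample text is invented, and re.IGNORECASE is added to the email pattern because the pattern itself lists only lowercase letters.

```python
import re

# Patterns from the examples above; the email one only lists a-z,
# so IGNORECASE is added to accept capital letters too.
PHONE = re.compile(r"\d{3}-\d{4}")
EMAIL = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}", re.IGNORECASE)

# 1. Test: does the whole string match? fullmatch anchors both ends.
print(PHONE.fullmatch("555-1234") is not None)   # True
print(PHONE.fullmatch("555-12345") is not None)  # False

# 2. Find: every non-overlapping occurrence in a larger text.
text = "Call 555-1234 or 555-9876, or email Sales@Example.com."
print(PHONE.findall(text))  # ['555-1234', '555-9876']
print(EMAIL.findall(text))  # ['Sales@Example.com']

# 3. Extract: capturing groups pull out parts of the first match.
m = re.search(r"(\d{3})-(\d{4})", text)
print(m.group(1), m.group(2))  # 555 1234
```

fullmatch is used for the yes/no test on purpose. The pattern is not anchored, so re.search would still report a match inside "555-12345" (it finds "555-1234").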
Regex is supported natively in almost every programming language (JavaScript, Python, PHP, Java, Go, Ruby) and in many text editors and command-line tools (grep, sed, VS Code, Notepad++). The core syntax is largely compatible across these environments, but details vary. For example, \d is not part of the POSIX syntax that grep and sed use by default, so [0-9] is the portable choice there.
Essential Regex Syntax Reference
| Pattern | Meaning | Example |
|---|---|---|
| . | Any character except newline | a.c matches "abc", "a1c" |
| \d | Any digit (0–9) | \d\d matches "42" |
| \w | Word character (a–z, A–Z, 0–9, _) | \w+ matches "hello_world" |
| \s | Whitespace (space, tab, newline) | \s+ matches multiple spaces |
| ^ | Start of string (or line in multiline mode) | ^Hello matches a string (or, in multiline mode, a line) that starts with Hello |
| $ | End of string (or line in multiline mode) | world$ matches a string (or, in multiline mode, a line) that ends with world |
| * | Zero or more of the preceding element | go*d matches "gd", "god", "good" |
| + | One or more of the preceding element | go+d matches "god", "good" but not "gd" |
| ? | Zero or one (makes preceding element optional) | colou?r matches "color" and "colour" |
| {n,m} | Between n and m repetitions | \d{2,4} matches 2 to 4 digits |
| [abc] | Character class — matches any one of a, b, c | [aeiou] matches any vowel |
| [^abc] | Negated class — matches anything except a, b, c | [^0-9] matches non-digits |
| (abc) | Capturing group — group and capture | (\d{4}) captures a 4-digit year |
| a\|b | Alternation — matches a or b | cat\|dog matches "cat" or "dog" |
| \b | Word boundary | \bcat\b matches "cat" but not "catch" |
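You can check most rows of the table yourself with Python's re.findall, as in the sketch below. The sample strings are invented for illustration. One Python-specific detail: when a pattern contains a capturing group, findall returns the group's text rather than the whole match.

```python
import re

# (pattern, sample text, expected re.findall result) for rows of the table
ROWS = [
    (r"a.c", "abc a1c a\nc", ["abc", "a1c"]),      # . skips the newline
    (r"\d\d", "42", ["42"]),
    (r"\w+", "hello_world!", ["hello_world"]),
    (r"\s+", "a   b\tc", ["   ", "\t"]),
    (r"^Hello", "Hello world", ["Hello"]),
    (r"world$", "Hello world", ["world"]),
    (r"go*d", "gd god good", ["gd", "god", "good"]),
    (r"go+d", "gd god good", ["god", "good"]),
    (r"colou?r", "color colour", ["color", "colour"]),
    (r"\d{2,4}", "7 42 12345", ["42", "1234"]),     # greedy: takes 4 of the 5 digits
    (r"[aeiou]", "regex", ["e", "e"]),
    (r"[^0-9]", "a1b2", ["a", "b"]),
    (r"(\d{4})-\d{2}", "2024-05", ["2024"]),        # findall returns the group's text
    (r"cat|dog", "cat, dog, cow", ["cat", "dog"]),
    (r"\bcat\b", "cat catch", ["cat"]),
]

for pattern, sample, expected in ROWS:
    found = re.findall(pattern, sample)
    status = "ok" if found == expected else "MISMATCH"
    print(f"{status:<9}{pattern!r:<20}{found}")
```

In Python 3, \d and \w in str patterns also match non-ASCII digits and letters. Pass re.ASCII if you need exactly the ASCII ranges listed in the table.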
Regex Flags and Modes
Flags (also called modifiers) change how a regex pattern behaves:
- g (global): Find all matches, not just the first one. Essential for replace-all operations.
- i (case-insensitive): Makes the pattern ignore letter case. /hello/i matches "Hello", "HELLO", and "hello".
- m (multiline): Makes ^ and $ match the start and end of each line, not just the entire string. Critical when processing multi-line text.
- s (dotAll / single-line): Makes . match newline characters as well as everything else. Without this flag, . skips newlines.
- u (unicode): Enables full Unicode matching, including proper handling of characters above U+FFFF (emoji, rare scripts).
In JavaScript, flags are added after the closing slash: /pattern/gi. Most online regex testers let you toggle flags with checkboxes.
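Python has no regex literal syntax, so flags are passed as arguments instead. A rough mapping, offered as a sketch rather than a complete table, is re.IGNORECASE for i, re.MULTILINE for m and re.DOTALL for s, combined with |. There is no g flag, because findall, finditer and sub already process every match. Python str patterns also work on Unicode code points by default, so there is no direct counterpart to u. The sample text is invented.

```python
import re

text = "Hello world\nhello again"

# i: case-insensitive
print(re.findall(r"hello", text))                 # ['hello']
print(re.findall(r"hello", text, re.IGNORECASE))  # ['Hello', 'hello']

# m: ^ and $ match at every line boundary
print(re.findall(r"^\w+", text))                # ['Hello']
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['Hello', 'hello']

# s: . also matches the newline between the two lines
print(re.search(r"world.hello", text))                         # None
print(re.search(r"world.hello", text, re.DOTALL) is not None)  # True

# Several flags at once, combined with |
print(re.findall(r"^h\w+", text, re.IGNORECASE | re.MULTILINE))  # ['Hello', 'hello']

# g: no flag needed; sub replaces every match unless count is given
print(re.sub(r"o", "0", "foo boo"))           # f00 b00
print(re.sub(r"o", "0", "foo boo", count=1))  # f0o boo

# u: str patterns already work on code points, so . matches one emoji
print(re.fullmatch(r".", "\U0001F600") is not None)  # True
```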
Common Regex Patterns
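Here are practical starting-point patterns for common tasks. Most are deliberately loose, so test them against your own data before relying on them:
- Email address: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
- URL: https?://[^\s/$.?#].[^\s]*
- US phone number: (\+1\s?)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}
- IPv4 address: (\d{1,3}\.){3}\d{1,3} (checks the shape only; it also accepts out-of-range values such as 999.999.999.999)
- Date (YYYY-MM-DD): \d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (does not check month lengths, so 2023-02-31 passes)
- Hex color code: #([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})
- HTML tags: <[^>]+> (fine for quick searches, but not a substitute for an HTML parser)
- Blank lines: ^\s*$ (use with the m flag so ^ and $ match at each line)
- Whitespace cleanup: \s{2,} (finds runs of two or more whitespace characters; \s also matches tabs and newlines)
- Capitalized words: \b[A-Z][a-z]+\b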
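A quick way to try these patterns is to run them against sample values. The sketch below does this for several of them with Python's re.fullmatch, which treats the pattern as a whole-string check. PATTERNS, is_valid and is_ipv4 are hypothetical names chosen for this example, and the sample values are invented. The extra octet check in is_ipv4 illustrates one way to tighten the loose IPv4 pattern; it is not the only option.

```python
import re

# Patterns from the list above, stored as raw strings so backslashes survive.
PATTERNS = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "us_phone": r"(\+1\s?)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}",
    "ipv4": r"(\d{1,3}\.){3}\d{1,3}",
    "date": r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])",
    "hex_color": r"#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})",
}


def is_valid(kind: str, value: str) -> bool:
    """Whole-string check: fullmatch requires the pattern to cover all of value."""
    return re.fullmatch(PATTERNS[kind], value) is not None


def is_ipv4(value: str) -> bool:
    """The regex only checks the shape, so confirm each octet is 0-255 in code."""
    return is_valid("ipv4", value) and all(int(part) <= 255 for part in value.split("."))


print(is_valid("email", "jane.doe+news@example.co.uk"))  # True
print(is_valid("us_phone", "(555) 123-4567"))            # True
print(is_valid("us_phone", "5551234567"))                # False: a separator is required
print(is_valid("ipv4", "999.999.999.999"))               # True: shape only
print(is_ipv4("999.999.999.999"))                        # False
print(is_valid("date", "2023-02-31"))                    # True: month lengths are not checked
print(is_valid("hex_color", "#abcd"))                    # False
```

Several of these patterns contain capturing groups. As a result, re.findall would return only the captured text: for ipv4, that is just the last repetition of the group, such as "0." from 192.168.0.1. To extract whole matches from a larger text, loop over re.finditer and read m.group(0), or rewrite the groups as non-capturing (?:...).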
Common Regex Pitfalls
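Catastrophic backtracking: In backtracking engines, such as those behind JavaScript, Python, Java, and PHP, a pattern like (a+)+b applied to a long string that doesn't match can cause exponential slowdown. The engine tries every way of dividing the run of a's between the inner and outer quantifier before giving up. Avoid nesting quantifiers over subpatterns that can match the same characters; (a+)+b, for instance, can be written as a+b. Go's regexp package uses a non-backtracking engine and is not affected.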
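The sketch below illustrates the nested-quantifier problem in Python. It makes two points: the flat pattern a+b accepts exactly the same strings as (a+)+b, and only the nested one slows down sharply on a failing input. The equivalence is checked exhaustively on short strings, which keeps the nested pattern fast. The timing helper seconds_to_fail is a hypothetical name. Timings depend on your machine, so no numbers are shown; keep n small.

```python
import itertools
import re
import time

NESTED = re.compile(r"(a+)+b")  # nested quantifiers: many ways to split a run of a's
FLAT = re.compile(r"a+b")       # same strings, only one way to match them

# Exhaustive check on every string of a's and b's up to length 8.
for length in range(9):
    for chars in itertools.product("ab", repeat=length):
        s = "".join(chars)
        assert (NESTED.fullmatch(s) is None) == (FLAT.fullmatch(s) is None), s


def seconds_to_fail(pattern: re.Pattern, n: int) -> float:
    """Time one failed fullmatch against n a's with no trailing b."""
    start = time.perf_counter()
    pattern.fullmatch("a" * n)
    return time.perf_counter() - start


# Keep n small: each extra 'a' roughly doubles the nested pattern's work.
for n in (12, 16, 20):
    print(f"n={n}: nested {seconds_to_fail(NESTED, n):.4f}s, flat {seconds_to_fail(FLAT, n):.6f}s")
```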
Greedy vs lazy matching: By default, quantifiers are greedy: they match as much as possible. The pattern <.+> applied to <b>text</b> matches the entire string, not just <b>. Add ? to make a quantifier lazy: <.+?> matches as few characters as possible, so it stops at the first > and finds <b> and </b> as separate matches.
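Here is the greedy/lazy difference in Python, using the same <b>text</b> input. The third line is a common alternative, offered here as one option rather than the only fix: the negated class from the HTML tag pattern, which simply cannot run past a >.

```python
import re

html = "<b>text</b>"

print(re.findall(r"<.+>", html))     # ['<b>text</b>']  greedy: runs to the last '>'
print(re.findall(r"<.+?>", html))    # ['<b>', '</b>']  lazy: stops at the first '>'
print(re.findall(r"<[^>]+>", html))  # ['<b>', '</b>']  negated class cannot cross '>'
```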
Forgetting to escape special characters: The characters . * + ? ^ $ { } [ ] | ( ) \ have special meaning in regex. To match them literally, escape with a backslash: \. matches a literal dot, not "any character".
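The sketch below shows why escaping matters, using Python. When the literal text comes from a variable or user input, the standard re.escape function adds the backslashes for you. The price string is an invented example.

```python
import re

# Unescaped, . means "any character", so the pattern is too permissive.
print(re.fullmatch(r"3.14", "3x14") is not None)   # True
# Escaped, \. only matches a literal dot.
print(re.fullmatch(r"3\.14", "3x14") is not None)  # False
print(re.fullmatch(r"3\.14", "3.14") is not None)  # True

# re.escape backslash-escapes special characters in text you want to match
# literally, which is safer than escaping by hand.
price = "$9.99 (USD)"
pattern = re.escape(price)
print(pattern)
print(re.search(pattern, "Total: $9.99 (USD) today") is not None)  # True
```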
Assuming . matches newlines: By default, . does not match \n. If your text spans multiple lines and you want . to match across them, enable the dotAll flag (s).
How to Use an Online Regex Tester
An online regex tester gives you two inputs — the pattern and the test string — and highlights matches in real time. The workflow:
- Step 1: Enter your regex pattern in the pattern field (without the surrounding slashes).
- Step 2: Set your flags — enable global (g) if you want all matches highlighted, case-insensitive (i) if case shouldn't matter.
- Step 3: Paste your test string. Matches highlight immediately.
- Step 4: Tweak the pattern until it matches exactly what you need and nothing else.
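- Step 5: Copy the pattern into your code, including the flags. If you paste it into an ordinary string literal (for example in Java, or in new RegExp("...") in JavaScript), double every backslash, so \d becomes \\d; Python raw strings such as r"\d+" avoid this.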
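The minimal Python sketch below mimics what a tester's highlighting does and shows a pattern being carried over into code. highlight is a hypothetical helper, not part of any tester or library; it wraps each match in square brackets. The raw string keeps the pattern's backslashes exactly as typed in the tester, and the flags become arguments such as re.IGNORECASE. The sample text is invented.

```python
import re


def highlight(pattern: str, text: str, flags: int = 0) -> str:
    """Wrap every match in [brackets], roughly what a tester's highlighting shows."""
    return re.sub(pattern, lambda m: f"[{m.group(0)}]", text, flags=flags)


sample = "Order 1234 shipped in 2024; order 56 is pending."
print(highlight(r"\b\d{4}\b", sample))
# Order [1234] shipped in [2024]; order 56 is pending.
print(highlight(r"order", sample, re.IGNORECASE))
# [Order] 1234 shipped in 2024; [order] 56 is pending.
```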
Combine regex testing with text analysis tools: use the word counter to measure your text before and after a regex-based cleanup, or use the duplicate line remover after extracting regex matches to clean up repeated results.
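If you would rather do the same follow-up in code than in separate tools, a rough Python equivalent might look like the sketch below. The sample text is invented. dict.fromkeys drops repeated matches while keeping their first-seen order. str.split gives a simple word count, which a whitespace-only cleanup should leave unchanged.

```python
import re

log = """ERROR disk full
INFO started
ERROR disk full
ERROR timeout
"""

# Extract matches, then drop repeats while keeping first-seen order.
errors = re.findall(r"^ERROR (.+)$", log, re.MULTILINE)
unique_errors = list(dict.fromkeys(errors))
print(unique_errors)  # ['disk full', 'timeout']

# Rough word count before and after collapsing whitespace runs.
messy = "too   many    spaces  here"
cleaned = re.sub(r"\s{2,}", " ", messy)
print(cleaned)                                   # too many spaces here
print(len(messy.split()), len(cleaned.split()))  # 4 4
```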