
Python Regular Expressions (RegEx)

Regular expressions (RegEx) are patterns for matching text in strings. Python's built-in re module provides support for working with them: you can search for a pattern, check whether a string matches it, extract the matching parts, split text on it, and replace matches with new text.

To work with regular expressions, you need to import the re module:

import re

Common Functions in the re Module

Here are some of the most commonly used functions in the re module:

1. re.match()

The re.match() function checks if the regular expression matches at the beginning of the string. It returns a match object if there is a match, otherwise None.

import re

# Check if the string starts with "Hello"
result = re.match(r"Hello", "Hello World")
if result:
    print("Match found!")
else:
    print("Match not found.")

Output:

Match found!
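The match object returned on success describes what was matched and where. A short sketch reusing the strings from the example above:

```python
import re

# A successful match returns a match object describing what was matched
result = re.match(r"Hello", "Hello World")
if result:
    print(result.group())                # Hello
    print(result.start(), result.end())  # 0 5
    print(result.span())                 # (0, 5)
```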

2. re.search()

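The re.search() function scans the entire string and returns a match object for the first location where the pattern matches, or None if there is no match. Like all matching in the re module, it is case-sensitive unless you pass a flag such as re.IGNORECASE.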

import re

# Search for "World" anywhere in the string (matching is case-sensitive)
result = re.search(r"World", "Hello World")
if result:
    print("Search found!")
else:
    print("Search not found.")

Output:

Search found!
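The difference between re.match() and re.search() appears when the pattern is not at the start of the string. This sketch runs the same pattern through both functions:

```python
import re

text = "Hello World"

# re.match() only tries position 0, so it fails here
print(re.match(r"World", text))  # None

# re.search() scans the whole string and finds "World" at index 6
found = re.search(r"World", text)
print(found.start(), found.group())  # 6 World
```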

3. re.findall()

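The re.findall() function returns all non-overlapping matches of the pattern in the string as a list of strings. If the pattern contains capturing groups, the list contains the text of the groups instead (a tuple per match when there is more than one group).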

import re

# Find all words starting with 'w'
text = "The quick brown fox jumps over the lazy wolf."
result = re.findall(r"\bw\w*", text)
print(result)

Output:

['wolf']
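Two details of re.findall() are easy to miss: matches never overlap, and a capturing group changes what is returned. A small sketch; the "aaaa" input is made up for illustration:

```python
import re

# Matches never overlap: after "aa" at indices 0-1, scanning resumes at index 2,
# so "aaaa" yields two matches rather than three
print(re.findall(r"aa", "aaaa"))  # ['aa', 'aa']

# With a capturing group, findall returns the group's text, not the whole match
text = "The quick brown fox jumps over the lazy wolf."
print(re.findall(r"\bw(\w*)", text))  # ['olf']
```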

4. re.finditer()

The re.finditer() function returns an iterator yielding match objects for all non-overlapping matches of the pattern.

import re

# Find all occurrences of "fox"
text = "The quick brown fox jumps over the lazy fox."
matches = re.finditer(r"fox", text)
for match in matches:
    print(match.start(), match.end(), match.group())

Output:

16 19 fox
40 43 fox
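Because re.finditer() yields match objects rather than plain strings, it suits cases where you need the position of each match. A sketch that records every number together with its (start, end) span; the sample sentence is invented for illustration:

```python
import re

text = "Order 12 shipped on day 7, order 305 pending."

# Each match object exposes the matched text and its (start, end) span
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
print(spans)  # [('12', (6, 8)), ('7', (24, 25)), ('305', (33, 36))]
```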

5. re.sub()

The re.sub() function replaces every match of the pattern with a replacement string and returns the result as a new string.

import re

# Replace all occurrences of 'fox' with 'cat'
text = "The quick brown fox jumps over the lazy fox."
result = re.sub(r"fox", "cat", text)
print(result)

Output:

The quick brown cat jumps over the lazy cat.
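re.sub() also accepts a count argument that limits the number of replacements. The replacement can refer to captured groups with \1, \2, and so on, or it can be a function that receives each match object. A sketch with illustrative inputs:

```python
import re

text = "The quick brown fox jumps over the lazy fox."

# count limits how many matches are replaced (here: only the first)
print(re.sub(r"fox", "cat", text, count=1))
# The quick brown cat jumps over the lazy fox.

# \1 and \2 in the replacement refer to the captured groups
print(re.sub(r"(\w+) (\w+)", r"\2, \1", "John Smith"))
# Smith, John

# A function receives each match object and returns its replacement text
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 5 pears"))
# 6 apples and 10 pears
```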

6. re.split()

The re.split() function splits a string at each match of the pattern and returns the pieces as a list.

import re

# Split the string on runs of whitespace
text = "The quick brown fox"
result = re.split(r"\s+", text)
print(result)

Output:

['The', 'quick', 'brown', 'fox']
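Because the separator is a pattern, re.split() can handle several delimiters at once. It also takes a maxsplit argument, and a capturing group in the pattern keeps the delimiters in the result. A sketch with made-up inputs:

```python
import re

# A character class splits on several delimiters at once
print(re.split(r"[,;]\s*", "red, green;blue"))  # ['red', 'green', 'blue']

# maxsplit stops after the given number of splits
print(re.split(r"\s+", "The quick brown fox", maxsplit=1))  # ['The', 'quick brown fox']

# A capturing group keeps the delimiters in the result
print(re.split(r"([,;])", "a,b;c"))  # ['a', ',', 'b', ';', 'c']
```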

Regular Expression Syntax

Regular expressions are composed of various components that match text patterns. Here are some of the basic components:

  • Literal Characters: Most characters match themselves. For example, r"cat" matches the characters “cat” wherever they appear, including inside “concatenate”.
  • Metacharacters: Special characters with a specific meaning in regular expressions:
    • .: Matches any character except a newline.
    • ^: Anchors the match to the start of the string.
    • $: Anchors the match to the end of the string.
    • []: Denotes a character class, matching any one character inside the brackets.
    • |: Acts as a logical OR. Matches either the pattern on the left or the right.
    • *: Matches zero or more occurrences of the preceding element (a single character, a character class, or a parenthesized group).
    • +: Matches one or more occurrences of the preceding element.
    • ?: Matches zero or one occurrence of the preceding element.
    • {n,m}: Matches between n and m occurrences of the preceding element.
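Each metacharacter can be tried in isolation with re.findall(). A sketch with made-up sample strings, one line per metacharacter (the anchors ^ and $ are covered in the examples that follow):

```python
import re

# . matches any single character except a newline
print(re.findall(r"c.t", "cat cot cut c\nt"))  # ['cat', 'cot', 'cut']

# [] matches one character from the set
print(re.findall(r"[bc]at", "bat cat rat"))  # ['bat', 'cat']

# | matches either alternative
print(re.findall(r"cat|dog", "cat bird dog"))  # ['cat', 'dog']

# * allows zero or more b's, + requires at least one
print(re.findall(r"ab*", "a ab abbb"))  # ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))  # ['ab', 'abbb']

# ? makes the preceding element optional
print(re.findall(r"colou?r", "color colour"))  # ['color', 'colour']

# {n,m}: between 2 and 3 o's
print(re.findall(r"lo{2,3}l", "lol lool loool looool"))  # ['lool', 'loool']
```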

Examples:

  1. ^ and $ Anchors
    • ^ matches the beginning of a string.
    • $ matches the end of a string.
    import re
    
    # Check if the string starts with 'Hello' and ends with 'World'
    text = "Hello, World"
    if re.search(r"^Hello", text) and re.search(r"World$", text):
        print("Match found!")
    

    Output:

    Match found!
    
  2. Character Classes
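    • \d: Matches any digit. In str patterns this includes all Unicode decimal digits; with the re.ASCII flag it is equivalent to [0-9].
    • \D: Matches any non-digit character.
    • \w: Matches any word character (letters, digits, and the underscore). In str patterns this includes Unicode letters and digits; with re.ASCII it is equivalent to [a-zA-Z0-9_].
    • \W: Matches any non-word character.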
    • \s: Matches any whitespace character (spaces, tabs, newlines).
    • \S: Matches any non-whitespace character.

    Example: Matching digits in a string:

    import re
    
    text = "There are 123 apples and 456 bananas."
    result = re.findall(r"\d+", text)
    print(result)
    

    Output:

    ['123', '456']
    
  3. Quantifiers
    • *: Zero or more occurrences.
    • +: One or more occurrences.
    • {n}: Exactly n occurrences.
    • {n,}: At least n occurrences.
    • {n,m}: Between n and m occurrences.

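    Example: Find every run of exactly three consecutive digits. Because \d{3} does not check where a number ends, 67890 yields 678, and 45 is too short to match: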

    import re
    
    text = "I have 123 apples, 45 bananas, and 67890 oranges."
    result = re.findall(r"\d{3}", text)
    print(result)
    

    Output:

    ['123', '678']
    
  4. Grouping and Capturing
    You can group parts of a pattern using parentheses. This allows you to capture the matched groups for later use.
    import re
    
    text = "John: 123, Alice: 456"
    result = re.findall(r"(\w+): (\d+)", text)
    print(result)
    

    Output:

    [('John', '123'), ('Alice', '456')]
    

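    In this example, (\w+) captures the name and (\d+) captures the number. Because the pattern contains two groups, re.findall() returns one tuple per match instead of the full matched text.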
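Captured groups are also available on an individual match object, by number or by name. A sketch reusing the sample text above; the group names name and score are chosen for illustration only:

```python
import re

text = "John: 123, Alice: 456"

# With re.search(), groups are read from the match object by number
m = re.search(r"(\w+): (\d+)", text)
print(m.group(0))              # whole match: John: 123
print(m.group(1), m.group(2))  # John 123

# (?P<name>...) gives a group a name, which makes longer patterns easier to read
m = re.search(r"(?P<name>\w+): (?P<score>\d+)", text)
print(m.group("name"))  # John
print(m.groupdict())    # {'name': 'John', 'score': '123'}
```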

Flags

You can use flags in regular expressions to modify the behavior of matching:

  • re.IGNORECASE or re.I: Makes the pattern case-insensitive.
  • re.MULTILINE or re.M: Allows ^ and $ to match the beginning and end of each line, not just the string.
  • re.DOTALL or re.S: Makes the dot (.) match newlines as well.

Example with a flag:

import re

text = "Hello world\nhello World"
result = re.findall(r"hello", text, flags=re.IGNORECASE)
print(result)

Output:

['Hello', 'hello']
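The MULTILINE and DOTALL flags are easiest to see on a string that contains a newline, and several flags can be combined with the | operator. A sketch with a made-up two-line string:

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string
print(re.findall(r"^\w+", text))                     # ['first']
# With MULTILINE, ^ also matches right after each newline
print(re.findall(r"^\w+", text, flags=re.MULTILINE))  # ['first', 'second']

# Without DOTALL, . stops at the newline
print(repr(re.search(r"first.*line", text).group()))  # 'first line'
# With DOTALL, . also matches the newline, so .* runs to the last "line"
print(repr(re.search(r"first.*line", text, flags=re.DOTALL).group()))  # 'first line\nsecond line'

# Flags can be combined with |
print(re.findall(r"^SECOND", text, flags=re.MULTILINE | re.IGNORECASE))  # ['second']
```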

Summary

The re module in Python provides powerful tools for working with regular expressions. You can use it for searching, matching, splitting, and replacing text, as well as extracting specific parts of strings based on patterns.

Key Functions:

  • re.match()
  • re.search()
  • re.findall()
  • re.finditer()
  • re.sub()
  • re.split()