Saturday, July 20, 2024

How to perform Fuzzy Match in Python?

The thefuzz library is a modern replacement for fuzzywuzzy. This post shows a script that uses thefuzz to perform fuzzy matching in Python.





Business use case:

We have a file containing data, and the user provides a search string. The goal is to fuzzy match the search string against each line of the file. The Python script below includes the code for reading the file and implementing the fuzzy match logic.

A) Install thefuzz:

pip install thefuzz

pip install python-Levenshtein


B) Script for reading a file and fuzzy matching input against file content

import sys
from thefuzz import process

def read_file(file_path):
    """Reads the content of the file and returns it as a list of strings."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            # Strip surrounding whitespace and skip blank lines so they are not scored.
            return [line.strip() for line in file if line.strip()]
    except FileNotFoundError:
        print(f"File not found: {file_path}")
        sys.exit(1)

def fuzzy_match(content, search_string, threshold=80):
    """
    Performs fuzzy match on the content with the search string.
    
    Args:
        content (list): List of strings from the file.
        search_string (str): The string to search for.
        threshold (int): Minimum similarity ratio to consider a match.
    
    Returns:
        list: List of tuples with matching strings and their similarity ratios.
    """
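    # limit=None returns every line with its score instead of only the top 5.
    matches = process.extract(search_string, content, limit=None)
    # Each match is a (string, score) tuple; keep those at or above the threshold.
    return [match for match in matches if match[1] >= threshold]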

def main():
    if len(sys.argv) < 3:
        print("Usage: python fuzzy_match.py <file_path> <search_string> [threshold]")
        sys.exit(1)

    file_path = sys.argv[1]
    search_string = sys.argv[2]
    try:
        threshold = int(sys.argv[3]) if len(sys.argv) > 3 else 80
    except ValueError:
        print("Threshold must be an integer between 0 and 100.")
        sys.exit(1)

    content = read_file(file_path)
    matches = fuzzy_match(content, search_string, threshold)

    if matches:
        print("Matches found:")
        for match in matches:
            print(f"String: {match[0]}, Similarity: {match[1]}")
    else:
        print("No matches found.")

if __name__ == "__main__":
    main()






C) How to Run the Script

  1. Save the script as fuzzy_match.py.
  2. Prepare a text file with the content you want to search in, let's say data.txt.
  3. Run the script from the command line: 
python fuzzy_match.py data.txt "search string" [threshold]


  • data.txt is the file containing your data.
  • "search string" is the string you want to fuzzy match.
  • [threshold] is an optional parameter specifying the minimum similarity ratio (default is 80).
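
The script reads these three arguments directly from sys.argv. As an alternative, here is a minimal sketch of the same command-line interface built with the standard argparse module. The build_parser name and the help texts are made up for illustration and are not part of the original script; the argument order and the default of 80 follow the description above.

```python
import argparse


def build_parser():
    """Build a parser for: <file_path> <search_string> [threshold]."""
    parser = argparse.ArgumentParser(
        prog="fuzzy_match.py",
        description="Fuzzy match a search string against the lines of a file.",
    )
    parser.add_argument("file_path", help="file containing the data to search")
    parser.add_argument("search_string", help="string to fuzzy match")
    # nargs="?" makes the positional optional; type=int rejects non-numeric input.
    parser.add_argument(
        "threshold",
        nargs="?",
        type=int,
        default=80,
        help="minimum similarity ratio from 0 to 100 (default: 80)",
    )
    return parser


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that the values come from the command line.
args = build_parser().parse_args(["data.txt", "example search string", "75"])
print(args.file_path, args.threshold)
```

With argparse, a missing argument or a non-integer threshold produces a usage message automatically, so the manual length check and int() conversion are no longer needed.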

D) Example Usage

python fuzzy_match.py data.txt "example search string" 75

This script will read data.txt, perform a fuzzy match with "example search string", and print the matches with a similarity ratio of at least 75.

E) Explanation

  • read_file: This function reads the file content and returns it as a list of stripped strings.
  • fuzzy_match: This function performs fuzzy matching on the list of strings using the thefuzz library. It filters matches based on a similarity ratio threshold.
  • main: This is the entry point of the script. It checks for command-line arguments, reads the file content, performs the fuzzy match, and prints the results.
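
To make the threshold filtering concrete without installing anything, here is a rough sketch that uses the standard library's difflib.SequenceMatcher as a stand-in scorer. This is not the scorer thefuzz uses in process.extract, so the numbers will differ from the real script. The similarity and filter_matches names and the sample lines are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score for two strings, ignoring case."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def filter_matches(lines, search_string, threshold=80):
    """Score every line and keep those at or above the threshold, best first."""
    scored = [(line, similarity(search_string, line)) for line in lines]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up sample data standing in for the lines of data.txt.
sample_lines = ["apple pie", "aple pie", "banana bread"]

for line, score in filter_matches(sample_lines, "apple pie"):
    print(f"String: {line}, Similarity: {score}")
```

The exact match and the misspelled "aple pie" pass the default threshold of 80, while "banana bread" is filtered out. Raising the threshold makes the match stricter; lowering it lets more distant strings through.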
