The thefuzz library is a modern replacement for fuzzywuzzy. Here is a script that performs fuzzy matching in Python using thefuzz:
Business use case:
Create a detailed Python script to perform fuzzy matching. We have a file containing data, and the user will provide a search string. The goal is to perform a fuzzy match of the search string against the content of the file. The Python script should include code for reading the file and implementing the fuzzy match logic.
A) Install thefuzz:
pip install thefuzz
pip install python-Levenshtein
B) Script for reading a file and fuzzy matching input against file content
import sys

from thefuzz import fuzz
from thefuzz import process


def read_file(file_path):
    """Reads the content of the file and returns it as a list of strings."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.readlines()
        return [line.strip() for line in content]
    except FileNotFoundError:
        print(f"File not found: {file_path}")
        sys.exit(1)


def fuzzy_match(content, search_string, threshold=80):
    """
    Performs fuzzy match on the content with the search string.

    Args:
        content (list): List of strings from the file.
        search_string (str): The string to search for.
        threshold (int): Minimum similarity ratio to consider a match.

    Returns:
        list: List of tuples with matching strings and their similarity ratios.
    """
    matches = process.extract(search_string, content, limit=None)
    return [match for match in matches if match[1] >= threshold]


def main():
    if len(sys.argv) < 3:
        print("Usage: python fuzzy_match.py <file_path> <search_string> [threshold]")
        sys.exit(1)

    file_path = sys.argv[1]
    search_string = sys.argv[2]
    threshold = int(sys.argv[3]) if len(sys.argv) > 3 else 80

    content = read_file(file_path)
    matches = fuzzy_match(content, search_string, threshold)

    if matches:
        print("Matches found:")
        for match in matches:
            print(f"String: {match[0]}, Similarity: {match[1]}")
    else:
        print("No matches found.")


if __name__ == "__main__":
    main()
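The script reads its arguments directly from sys.argv, so a non-numeric threshold would raise a ValueError. As a possible alternative, here is a minimal sketch of the same three arguments parsed with the standard library's argparse module. The parse_args helper and the 0 to 100 range check are assumptions added for illustration; they are not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse file_path, search_string and an optional threshold.

    Hypothetical helper: mirrors the arguments the script reads from sys.argv.
    """
    parser = argparse.ArgumentParser(
        description="Fuzzy match a search string against the lines of a file."
    )
    parser.add_argument("file_path", help="text file to search in")
    parser.add_argument("search_string", help="string to fuzzy match")
    parser.add_argument(
        "threshold",
        nargs="?",          # optional positional argument
        type=int,           # argparse reports a clean error for non-integers
        default=80,
        help="minimum similarity ratio, 0 to 100 (default: 80)",
    )
    args = parser.parse_args(argv)
    # Assumed range check: similarity ratios are reported on a 0 to 100 scale.
    if not 0 <= args.threshold <= 100:
        parser.error("threshold must be between 0 and 100")
    return args


# Passing an explicit list instead of reading the real command line.
args = parse_args(["data.txt", "example search string", "75"])
print(args.file_path, args.search_string, args.threshold)
```

In the script, main() could call parse_args() with no arguments to read the real command line, then pass args.file_path, args.search_string and args.threshold to read_file and fuzzy_match.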
C) How to Run the Script
- Save the script as fuzzy_match.py.
- Prepare a text file with the content you want to search in, for example data.txt.
- Run the script from the command line:
python fuzzy_match.py data.txt "search string" [threshold]
- data.txt is the file containing your data.
- "search string" is the string you want to fuzzy match.
- [threshold] is an optional parameter specifying the minimum similarity ratio (default is 80).
D) Example Usage
python fuzzy_match.py data.txt "example search string" 75
This command reads data.txt, performs a fuzzy match with "example search string", and prints the matches with a similarity ratio of at least 75.
E) Explanation
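The read_file function reads the file and returns its lines as a list of strings. The fuzzy_match function scores every line against the search string using process.extract from the thefuzz library. It filters matches based on a similarity ratio threshold, keeping only lines whose score is at least the threshold.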
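To make the threshold filtering concrete without installing anything, here is a small sketch that uses Python's built-in difflib as a stand-in scorer. This is not how thefuzz computes its scores (process.extract uses its own default scorer, so the numbers will differ); the similarity and filter_matches helpers are hypothetical names used only to show the score-then-filter idea.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings (case-insensitive).

    Stand-in scorer based on difflib, not thefuzz's own scoring.
    """
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def filter_matches(lines, search_string, threshold=80):
    """Score every line, keep those at or above the threshold, best first."""
    scored = [(line, similarity(search_string, line)) for line in lines]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


lines = ["apple pie", "apple pies", "banana bread"]
for line, score in filter_matches(lines, "apple pie", threshold=75):
    print(f"String: {line}, Similarity: {score}")
```

With a threshold of 75, the two "apple" lines are kept and "banana bread" is dropped. Raising the threshold makes the match stricter, and lowering it lets in looser matches.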