Text similarity/comparison
Native package - SequenceMatcher
https://docs.python.org/3/library/difflib.html
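A minimal sketch of comparing two strings with SequenceMatcher. The helper name `similarity` is made up for illustration; `ratio()` returns a float between 0 and 1, where 1 means the sequences are identical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two strings."""
    # The first argument (isjunk) is None, so no characters are ignored.
    return SequenceMatcher(None, a, b).ratio()


print(similarity("apple pie", "apple pies"))
print(similarity("apple", "orange"))
```

The ratio is computed as 2 * M / T, where M is the number of matching characters and T is the total length of both strings, so it is a similarity score and not an edit distance.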
There are many different string metrics, such as Levenshtein, Damerau-Levenshtein, Hamming distance, Jaro-Winkler and Strike a match.
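As a small illustration of the simplest metric in that list, Hamming distance counts the positions at which two equal-length strings differ. The function below is a sketch written for these notes, not a library API.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        # Hamming distance is only defined for strings of the same length.
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


print(hamming("karolin", "kathrin"))
```

Because it cannot handle insertions or deletions, Hamming distance suits fixed-length codes better than free text.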
Levenshtein
- much faster than SequenceMatcher
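To show what the metric computes, here is a pure-Python sketch of Levenshtein distance using the standard dynamic-programming recurrence. It is for illustration only: the speed advantage noted above would come from a compiled implementation, not from code like this.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory grows with the shorter string while time stays proportional to the product of the two lengths.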
Locality-sensitive hashing
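One common form of locality-sensitive hashing for text is MinHash with banding. The sketch below assumes that variant; the function names, the 3-character shingles, and the 64 hashes / 16 bands settings are illustrative choices, not something prescribed by these notes.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Split text into overlapping character k-grams."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def _hash(seed: int, item: str) -> int:
    # Seeded, deterministic 64-bit hash built from SHA-1.
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, num_hashes: int = 64) -> list:
    """For each seeded hash function, keep the minimum hash over all items."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of signature positions that agree (estimates Jaccard similarity)."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature: list, bands: int = 16) -> set:
    """Split the signature into bands; each band acts as one hash-bucket key."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


sig_a = minhash_signature(shingles("text similarity search"))
sig_b = minhash_signature(shingles("text similarity searching"))
print(estimated_jaccard(sig_a, sig_b))
# Two texts become candidates for a full comparison if they share any bucket key.
print(bool(band_keys(sig_a) & band_keys(sig_b)))
```

The point of the banding step is to avoid comparing every pair: only texts that land in the same bucket for at least one band are compared in full.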
Elasticsearch
- Supports fuzzy search, which matches terms by Levenshtein (edit) distance
- Text similarity search with vectors
- Advanced usage - Text similarity with TF models and Elasticsearch
- https://www.ulam.io/blog/text-similarity-search-using-elasticsearch-and-python
- No model training needed: use a pre-trained model to get text embeddings
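A rough sketch of the two Elasticsearch ideas above, without needing a running cluster. `fuzzy_query` builds a request body for the fuzzy query in the Elasticsearch query DSL, and `cosine_similarity` shows the score that is typically used to compare text embeddings. Both function names and the `title` field are hypothetical, and the embedding vectors are assumed to come from whatever pre-trained model is used.

```python
import math


def fuzzy_query(field: str, text: str, fuzziness: str = "AUTO") -> dict:
    """Build a search body for a fuzzy (edit-distance based) term match."""
    return {"query": {"fuzzy": {field: {"value": text, "fuzziness": fuzziness}}}}


def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # A zero vector has no direction, so treat its similarity as 0.
    return dot / norm if norm else 0.0


# "title" is a hypothetical field name in a hypothetical index.
print(fuzzy_query("title", "elasticsaerch"))
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))
```

The dictionary from `fuzzy_query` is what would be sent as the search request body; how it is sent depends on the client and its version.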