TF-IDF (Term Frequency - Inverse Document Frequency) is a numerical statistic used to evaluate the importance of a word in a document relative to a collection or corpus of documents.
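It consists of two components: term frequency (TF), which measures how often a term occurs in a given document, and inverse document frequency (IDF), which measures how rare the term is across the whole corpus.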
The formula for TF-IDF is:
\( \text{TF-IDF} = \text{TF} \times \text{IDF} \)
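The two factors can be defined in several ways. One common variant, given here as an illustrative assumption rather than the only definition, normalizes the raw count by document length and takes the logarithm of the inverse document fraction. Here \( f_{t,d} \) is the number of times term \( t \) occurs in document \( d \), \( N \) is the number of documents in the corpus, and \( n_t \) is the number of documents that contain \( t \):
\( \text{TF}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \)
\( \text{IDF}(t) = \log \frac{N}{n_t} \)
Under this variant, a term that appears in every document has \( n_t = N \), so \( \text{IDF}(t) = \log 1 = 0 \) and its TF-IDF weight is zero regardless of how often it occurs.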
This weighting scheme prioritizes words that occur often in a document but rarely in the rest of the corpus over common, less informative ones (e.g., "the", "is", "and").
TF-IDF vectors are used for text representation in tasks like document classification and clustering. They help highlight the content that is unique to a document, which makes TF-IDF a powerful tool for keyword extraction and information retrieval.
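As a minimal sketch of how such vectors and keywords might be computed, the Python below uses only the standard library. It assumes whitespace tokenization, TF as count divided by document length, and IDF as the unsmoothed logarithm of the inverse document fraction. The function names `tf_idf` and `top_keywords` and the toy corpus are hypothetical, chosen for illustration; production libraries typically add smoothing and normalization.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    Assumes TF = count / document length and IDF = log(N / document frequency).
    Other variants (smoothed IDF, log-scaled TF) are equally common.
    """
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def top_keywords(vector, k=2):
    """Return the k highest-weighted terms of one document."""
    return sorted(vector, key=vector.get, reverse=True)[:k]


# Toy corpus, made up for this example.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
vectors = tf_idf(corpus)
print(vectors[0]["the"])         # 0.0, because "the" appears in every document
print(top_keywords(vectors[2]))  # ['bird', 'sang']
```

In this sketch "the" receives a weight of zero in every document, while terms that occur in only one document, such as "bird", rank highest as keywords for that document.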