Text normalization is the process of transforming input text into a standard, consistent format before further processing (such as tokenization, search, or machine learning).

Important aspects to consider when normalizing text:

  1. Punctuation Handling
  2. Numbers, Dates, and Quantities
  3. Abbreviations and Acronyms
  4. Homographs (same spelling, different pronunciation/meaning)
  5. Case Normalization
  6. Special Symbols & Units
  7. Handling Non-standard Words (NSWs)

Algorithms/Methods Used

Example: “Hello!! How are you today, Dr. Manner from the 123 Hospital?”
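
As a rough illustration, a simple rule-based pass over the example sentence might look like the sketch below. It touches only a few of the aspects listed above (abbreviations, numbers, punctuation, case). The abbreviation table, the digit-by-digit reading of "123", and the names `ABBREVIATIONS`, `spell_digits`, and `normalize` are assumptions made for this sketch, not a prescribed method; a real system would need context-aware rules or a trained model, for example to decide whether "123" should be read as "one two three" or "one hundred twenty-three".

```python
import re

# Hypothetical abbreviation table (assumption); real systems use a larger lexicon.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "Mrs.": "Missus"}

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Read a digit string digit by digit, e.g. '123' -> 'one two three'."""
    return " ".join(DIGIT_WORDS[int(d)] for d in match.group())


def normalize(text):
    # 1. Expand known abbreviations first, so the period in "Dr."
    #    is not later treated as ordinary punctuation.
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # 2. Verbalize numbers (digit by digit; an assumption for this sketch).
    text = re.sub(r"[0-9]+", spell_digits, text)
    # 3. Punctuation handling: replace anything that is not a word
    #    character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # 4. Case normalization, then collapse repeated whitespace.
    return " ".join(text.lower().split())


print(normalize("Hello!! How are you today, Dr. Manner from the 123 Hospital?"))
# → hello how are you today doctor manner from the one two three hospital
```

The order of the steps matters in this sketch: abbreviations are expanded before punctuation is stripped, otherwise "Dr." would already have lost its period and would no longer match the table.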