Word Tokenization
AI can split sentences into words for use in text search, keyword extraction, and data retrieval.
What is Word Tokenization?
Word tokenization is the process of identifying the boundaries between words in a sentence. More generally, tokenization breaks raw text into smaller units called “tokens”, which can be words, subwords, or characters; in this model, a token is a word. Word tokenization underpins many Natural Language Processing (NLP) pipelines, such as text search and keyword extraction. It is especially important for Thai, which is written without spaces between words and therefore has no explicit word boundaries in a sentence.
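The page does not specify the algorithm behind this service. One common baseline for tokenizing text without spaces is dictionary-based greedy longest matching; the sketch below illustrates the idea (the word list and sample string are illustrative only, not taken from this product):

```python
def tokenize_longest_match(text, dictionary):
    """Greedy longest-match tokenization against a known word list."""
    tokens = []
    max_len = max(len(w) for w in dictionary)
    i = 0
    while i < len(text):
        # Try the longest dictionary word that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            # Out-of-vocabulary character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

# Toy example on an unspaced string (illustrative only).
words = {"text", "search", "keyword", "extraction"}
print(tokenize_longest_match("textsearch", words))  # ['text', 'search']
```

Production Thai tokenizers typically combine a large dictionary with statistical or neural models to resolve ambiguous boundaries, which greedy matching alone cannot handle.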
Pricing
| Product | Credit / Request |
|---|---|
| AI Marketplace - Word Tokenization | 0.04 Credit / 1,000 Characters |
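Assuming the pricing scales linearly with character count, the cost of a request can be estimated as follows (the function name is illustrative, not part of the service's API):

```python
CREDITS_PER_1000_CHARS = 0.04  # rate from the pricing table above

def estimate_cost(num_chars):
    """Estimate credits consumed, assuming linear per-character pricing."""
    return num_chars / 1000 * CREDITS_PER_1000_CHARS

# e.g. a 5,000-character document: 5 x 0.04 = 0.2 credits
print(estimate_cost(5000))
```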