N-gram:
An n-gram is a contiguous sequence of n items from a given sequence of text or speech. An n-gram model is a type of probabilistic language model that predicts the next item in such a sequence from the n-1 items preceding it, in the form of an (n-1)-order Markov model.
In data analytics and natural language processing, the items can be words, characters, or other units, depending on the application.
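The predictive use of n-grams rests on a Markov assumption: the next item depends only on the n-1 items before it. A sketch in standard notation follows, where w_i (the i-th item) and C(·) (a count in a training corpus) are symbols introduced here for illustration rather than taken from the text above:

```latex
% Chain rule, then the n-gram (Markov) approximation
P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})
                   \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

% Relative-frequency (maximum likelihood) estimate of each factor
P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) =
    \frac{C(w_{i-n+1}, \dots, w_{i-1}, w_i)}{C(w_{i-n+1}, \dots, w_{i-1})}
```

For a bigram model (n = 2), each factor reduces to the count of a word pair divided by the count of its first word.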
For example, with words as the items:
- A unigram (or 1-gram) would be a single word.
- A bigram (or 2-gram) would be a sequence of two adjacent words.
- A trigram (or 3-gram) would be a sequence of three adjacent words.
- And so on for higher values of n.
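The list above can be illustrated with a minimal Python sketch. The function name `ngrams` and the sample sentence are hypothetical, chosen only for this example, and splitting on whitespace stands in for a real tokenizer:

```python
def ngrams(items, n):
    """Return all contiguous n-grams of `items` as a list of tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of N items yields N - n + 1 n-grams (none if n > N).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Assumption: whitespace splitting is a good enough tokenizer for this demo.
words = "the quick brown fox".split()

unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

print(bigrams)  # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

The same function works on characters if a string is passed instead of a list of words, since slicing behaves the same way on both.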
N-grams are widely used in natural language processing tasks such as text mining, sentiment analysis, language modeling, and machine translation. Because they capture the local context of each word, such as word order and frequently co-occurring words, they are often used to extract features or patterns from textual data. N-gram frequencies can also give a rough picture of the syntactic and semantic regularities of a language.
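As a rough sketch of n-grams used as features, the following Python snippet counts word bigrams and character trigrams in a short text. The helper name `ngram_counts` and the sample sentence are hypothetical, and whitespace splitting is assumed as the tokenizer:

```python
from collections import Counter


def ngram_counts(items, n):
    """Count each contiguous n-gram (as a tuple) in a sequence of items."""
    # zip over n shifted copies of the sequence pairs each item
    # with the n - 1 items that follow it.
    return Counter(zip(*(items[i:] for i in range(n))))


doc = "the movie was not good"

# Word-level bigrams: list of words in, tuples of two words out.
word_bigrams = ngram_counts(doc.split(), 2)

# Character-level trigrams: the string itself is the sequence of items.
char_trigrams = ngram_counts(doc, 3)

print(word_bigrams[("not", "good")])  # 1
```

The bigram ("not", "good") shows why local context matters for a task like sentiment analysis: counting single words alone would record "good" without the negation that precedes it.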