Explain what an n-gram is.

N-gram:

An n-gram is a contiguous sequence of n items from a given sequence of text or speech. An n-gram model is a type of probabilistic language model that predicts the next item in such a sequence from the previous n-1 items, in the form of an (n-1)-order Markov model.
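
The Markov idea behind an n-gram model can be written out as a short formula. This is a sketch using notation that is not in the original text: w_i is assumed to denote the i-th item of the sequence, and C(...) is assumed to denote how often a sequence occurs in a training corpus. The second line is the usual unsmoothed relative-frequency estimate, which is one common choice rather than the only one.

```latex
% Condition only on the previous n-1 items instead of the full history
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

% Relative-frequency estimate from corpus counts (no smoothing)
P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) \approx
  \frac{C(w_{i-n+1}, \dots, w_{i-1}, w_i)}{C(w_{i-n+1}, \dots, w_{i-1})}
```

For a bigram model (n = 2) this reduces to P(w_i | w_{i-1}), so each item is predicted from the single item before it.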

In data analytics and natural language processing, the items are usually words or characters, but they can be other units, such as syllables or phonemes, depending on the application.

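For example, taking words as the items and "the quick brown fox" as the text:

  • A unigram (or 1-gram) is a single word, such as "quick".
  • A bigram (or 2-gram) is a sequence of two adjacent words, such as "quick brown".
  • A trigram (or 3-gram) is a sequence of three adjacent words, such as "quick brown fox".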
  • And so on for higher values of n.
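
The list above can be illustrated with a minimal Python sketch that slides a window of size n over a list of tokens. The function name `ngrams`, the sample sentence, and the whitespace tokenization are assumptions made for illustration; a real application would likely use a proper tokenizer.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple, in order."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sequence of length L has L - n + 1 windows of size n (none if L < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Naive whitespace tokenization, assumed here only to keep the example short.
tokens = "the quick brown fox".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

Passing a string instead of a list of words, for example `ngrams(list("fox"), 2)`, gives character n-grams with the same function.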

N-grams are widely used in natural language processing tasks such as text mining, sentiment analysis, language modeling, and machine translation. They capture the local context of each word within a text and are often used to extract features or patterns from textual data. Because they record which items tend to occur together, n-gram statistics also give a rough view of a language's syntactic and semantic patterns, although they cannot capture dependencies that span more than n items.
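
As a rough illustration of using n-grams as features, the sketch below counts how often each word bigram occurs in a short text. The helper name `ngram_counts`, the sample text, and the lowercase-and-split tokenization are all assumptions for the example, and n is assumed to be at least 1. Counts like these are one simple form of feature; real pipelines often add weighting or smoothing.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count each word n-gram in text (assumes n >= 1)."""
    tokens = text.lower().split()
    # Zipping n shifted copies of the token list yields each window of size n.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


counts = ngram_counts("to be or not to be", 2)
top = counts.most_common(1)

print(top)
# [(('to', 'be'), 2)]
```

The resulting `Counter` maps each n-gram tuple to its frequency, which can then feed a classifier or a language model estimate.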