Class NlpUtils

java.lang.Object
ai.djl.modality.nlp.NlpUtils

public final class NlpUtils extends Object
Utility functions for processing strings and characters in NLP problems.
  • Method Details

    • isWhiteSpace

      public static boolean isWhiteSpace(char c)
      Checks whether a character is considered whitespace.

      Tab, newline, and Unicode space characters are all considered whitespace.

      Parameters:
      c - input character to be checked.
      Returns:
      whether the character is considered whitespace
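To illustrate the rule described for isWhiteSpace, here is a minimal sketch built only on java.lang.Character. It is not the library's implementation: the class WhitespaceSketch and the method isWhiteSpaceSketch are hypothetical names, and combining Character.isWhitespace with Character.isSpaceChar is an assumption about how "tab, newline and Unicode space characters" could be covered.

```java
public final class WhitespaceSketch {

    private WhitespaceSketch() {}

    // Hypothetical stand-in for the check described above (assumption, not DJL code).
    // Character.isWhitespace covers tab, newline and most Unicode spaces;
    // Character.isSpaceChar additionally covers non-breaking spaces such as U+00A0.
    public static boolean isWhiteSpaceSketch(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    public static void main(String[] args) {
        char[] samples = {' ', '\t', '\n', '\u00A0', '\u3000', 'a'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %b%n", (int) c, isWhiteSpaceSketch(c));
        }
    }
}
```

With the library on the classpath, the equivalent call would be NlpUtils.isWhiteSpace(c), using the signature shown above.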
    • isControl

      public static boolean isControl(char c)
      Checks whether a character is considered a control character.

      Tab, newline, and ISO control characters are all considered control characters.

      Parameters:
      c - input character to be checked.
      Returns:
      whether the character is considered a control character
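The following sketch shows one way to read the isControl description using only the JDK. It assumes that "ISO control characters" refers to the range recognized by Character.isISOControl (U+0000 to U+001F and U+007F to U+009F), which already includes tab and newline. ControlSketch and isControlSketch are hypothetical names, and the library's own method may treat individual characters differently.

```java
public final class ControlSketch {

    private ControlSketch() {}

    // Hypothetical stand-in for the check described above (assumption, not DJL code).
    // Character.isISOControl is true for U+0000..U+001F and U+007F..U+009F,
    // a range that contains tab ('\t') and newline ('\n').
    public static boolean isControlSketch(char c) {
        return Character.isISOControl(c);
    }

    public static void main(String[] args) {
        char[] samples = {'\t', '\n', (char) 0x00, (char) 0x7F, (char) 0x85, ' ', 'a'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %b%n", (int) c, isControlSketch(c));
        }
    }
}
```

A tokenizer would typically use such a check to drop or replace control characters before splitting text; whether tab and newline should instead be routed to the whitespace check is a design choice worth verifying against the library's actual behavior.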
    • isPunctuation

      public static boolean isPunctuation(char c)
      Checks whether a character is considered punctuation.

      We treat all non-letter, non-digit ASCII symbols as punctuation. Characters such as "^", "$", and "`" are not in the Unicode Punctuation class, but we treat them as punctuation anyway, for consistency.

      Parameters:
      c - input character to be checked
      Returns:
      whether the character is considered punctuation
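The punctuation rule above can be sketched with the JDK alone. This is an illustration under stated assumptions, not the library's code: PunctuationSketch and isPunctuationSketch are hypothetical names, the ASCII ranges (33-47, 58-64, 91-96, 123-126) are an assumed reading of "non-letter, non-digit ASCII symbols", and the Unicode Punctuation class is approximated by the P* general categories reported by Character.getType.

```java
public final class PunctuationSketch {

    private PunctuationSketch() {}

    // Hypothetical stand-in for the check described above (assumption, not DJL code).
    public static boolean isPunctuationSketch(char c) {
        // Printable ASCII symbols that are neither letters nor digits.
        // This is what makes '^', '$' and '`' count as punctuation.
        if ((c >= 33 && c <= 47) || (c >= 58 && c <= 64)
                || (c >= 91 && c <= 96) || (c >= 123 && c <= 126)) {
            return true;
        }
        // Unicode general categories Pc, Pd, Ps, Pe, Pi, Pf and Po.
        int type = Character.getType(c);
        return type == Character.CONNECTOR_PUNCTUATION
                || type == Character.DASH_PUNCTUATION
                || type == Character.START_PUNCTUATION
                || type == Character.END_PUNCTUATION
                || type == Character.INITIAL_QUOTE_PUNCTUATION
                || type == Character.FINAL_QUOTE_PUNCTUATION
                || type == Character.OTHER_PUNCTUATION;
    }

    public static void main(String[] args) {
        char[] samples = {'^', '$', '`', '!', '\u00BF', '\u3002', 'a', '7', ' '};
        for (char c : samples) {
            System.out.printf("U+%04X -> %b%n", (int) c, isPunctuationSketch(c));
        }
    }
}
```

The explicit ASCII ranges matter because Character.getType classifies "^" and "`" as modifier symbols and "$" as a currency symbol, so a category-only check would miss them.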