Package ai.djl.modality.nlp.preprocess
Contains utility classes for natural language pre-processing tasks.
Class and interface descriptions:
- Unicode normalization does not take care of "exotic" hyphens that we normally do not want in NLP input.
- A TextProcessor that applies a user-defined lambda function to input tokens.
- LowerCaseConvertor converts every character of the input tokens to its respective lower-case character.
- PunctuationSeparator separates punctuation into separate tokens.
- SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them on a given delimiter.
- Removes or replaces certain characters based on a condition.
- TextProcessor allows applying pre-processing to input tokens for natural language applications.
- A TextProcessor that adds a beginning-of-string and an end-of-string token.
- A TextProcessor that truncates text to a maximum size.
- The Tokenizer interface provides the ability to break sentences down into embeddable tokens.
- Applies Unicode normalization to input strings.
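The descriptions above suggest a two-stage contract. A tokenizer splits a sentence into tokens, and a chain of text processors then maps each token list to a new token list. The following is a minimal, self-contained sketch of that idea using only the JDK. It is not DJL's implementation or API. The interfaces `SketchTokenizer` and `SketchProcessor`, every helper method, the chosen set of "exotic" hyphens, NFKC as the normalization form, the simplified punctuation rule, and the `<s>`/`</s>` boundary tokens are all hypothetical choices made for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.IntPredicate;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of a tokenizer plus a chain of token processors; not DJL's API.
public class PreprocessSketch {

    /** Hypothetical contract: one sentence in, a list of tokens out. */
    public interface SketchTokenizer {
        List<String> tokenize(String sentence);
    }

    /** Hypothetical contract: a token list in, a new token list out. */
    public interface SketchProcessor {
        List<String> preprocess(List<String> tokens);
    }

    // Assumed set of "exotic" hyphens: hyphen, non-breaking hyphen, figure dash,
    // en dash, em dash and minus sign. Illustrative only, not exhaustive.
    private static final String EXOTIC_HYPHENS = "\u2010\u2011\u2012\u2013\u2014\u2212";

    /** Splits a sentence on a literal delimiter and drops empty pieces. */
    public static SketchTokenizer delimiterTokenizer(String delimiter) {
        return sentence -> Arrays.stream(sentence.split(Pattern.quote(delimiter)))
                .filter(piece -> !piece.isEmpty())
                .collect(Collectors.toList());
    }

    /** Applies a user-supplied function to every token. */
    public static SketchProcessor mapEach(UnaryOperator<String> fn) {
        return tokens -> tokens.stream().map(fn).collect(Collectors.toList());
    }

    /** Replaces each char matching the condition; an empty replacement removes it. */
    public static SketchProcessor cleaner(IntPredicate condition, String replacement) {
        return mapEach(token -> token.chars()
                .mapToObj(c -> condition.test(c) ? replacement : String.valueOf((char) c))
                .collect(Collectors.joining()));
    }

    /** Pads each non-letter, non-number char with spaces, then re-splits on whitespace. */
    public static SketchProcessor separatePunctuation() {
        return tokens -> tokens.stream()
                .flatMap(t -> Arrays.stream(t.replaceAll("([^\\p{L}\\p{N}])", " $1 ").trim().split("\\s+")))
                .filter(piece -> !piece.isEmpty())
                .collect(Collectors.toList());
    }

    /** Keeps at most maxSize tokens. */
    public static SketchProcessor truncate(int maxSize) {
        return tokens -> new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }

    /** Wraps the tokens in beginning-of-string and end-of-string markers. */
    public static SketchProcessor terminate(String bos, String eos) {
        return tokens -> {
            List<String> out = new ArrayList<>(tokens);
            out.add(0, bos);
            out.add(eos);
            return out;
        };
    }

    /** Tokenizes once, then passes the tokens through each processor in order. */
    public static List<String> run(String sentence, SketchTokenizer tokenizer, List<SketchProcessor> processors) {
        List<String> tokens = tokenizer.tokenize(sentence);
        for (SketchProcessor processor : processors) {
            tokens = processor.preprocess(tokens);
        }
        return tokens;
    }

    /** An illustrative chain; the order and the limit of 5 tokens are arbitrary choices. */
    public static List<SketchProcessor> defaultPipeline() {
        return List.of(
                mapEach(t -> Normalizer.normalize(t, Normalizer.Form.NFKC)), // Unicode normalization
                cleaner(c -> EXOTIC_HYPHENS.indexOf(c) >= 0, "-"),           // hyphen normalization
                mapEach(t -> t.toLowerCase(Locale.ROOT)),                      // lower-casing
                separatePunctuation(),
                truncate(5),
                terminate("<s>", "</s>"));
    }

    public static void main(String[] args) {
        // A fullwidth 'H' (U+FF28) and an en dash (U+2013) exercise the two normalizers.
        List<String> out = run("\uFF28ello, NLP\u2013World!", delimiterTokenizer(" "), defaultPipeline());
        System.out.println(out); // [<s>, hello, ,, nlp, -, world, </s>]
    }
}
```

The order of the chain matters. The hyphen cleaner runs after NFKC normalization because NFKC leaves the en dash (U+2013) unchanged, which is the gap the hyphen-normalization description points out. Truncation runs before the boundary markers are added, so they are never cut off.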