Class CharGroupTokenizer
A tokenizer that breaks text into terms whenever it encounters a character that is in a defined set. It is mostly useful when a simple custom tokenization is desired and the overhead of PatternTokenizer is not acceptable.
Namespace: OpenSearch.Client
Assembly: OpenSearch.Client.dll
Syntax
public class CharGroupTokenizer : TokenizerBase, ICharGroupTokenizer, ITokenizer
Constructors
CharGroupTokenizer()
Declaration
public CharGroupTokenizer()
Properties
MaxTokenLength
The maximum token length. If a token exceeds this length, it is split at MaxTokenLength intervals. Defaults to 255.
Declaration
public int? MaxTokenLength { get; set; }
Property Value
Type | Description |
---|---|
int? |
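The splitting rule described for MaxTokenLength can be illustrated with a small, self-contained sketch. The `SplitAtIntervals` helper below is hypothetical and is not part of OpenSearch.Client; it only mimics, under the assumption that splitting happens at fixed intervals as described above, how an over-long token would be cut into pieces.

```csharp
using System;
using System.Collections.Generic;

public static class MaxTokenLengthSketch
{
    // Hypothetical helper (not an OpenSearch.Client API): cuts a token
    // into consecutive chunks of at most maxTokenLength characters.
    public static IEnumerable<string> SplitAtIntervals(string token, int maxTokenLength)
    {
        if (maxTokenLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxTokenLength));

        for (int i = 0; i < token.Length; i += maxTokenLength)
        {
            // The last chunk may be shorter than maxTokenLength.
            int length = Math.Min(maxTokenLength, token.Length - i);
            yield return token.Substring(i, length);
        }
    }

    public static void Main()
    {
        // An 8-character token with a limit of 3 yields three chunks.
        Console.WriteLine(string.Join(", ", SplitAtIntervals("abcdefgh", 3)));
        // prints "abc, def, gh"
    }
}
```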
TokenizeOnCharacters
A list of characters to tokenize the string on. Whenever a character from this list is encountered, a new token is started. Each entry is either a single character, e.g. -, or a character group: whitespace, letter, digit, punctuation, symbol.
Declaration
public IEnumerable<string> TokenizeOnCharacters { get; set; }
Property Value
Type | Description |
---|---|
IEnumerable<string> |
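As a minimal sketch of how the two properties documented on this page might be set together, the snippet below uses only the parameterless constructor and the `MaxTokenLength` and `TokenizeOnCharacters` members listed above. The variable name and the chosen values are illustrative assumptions, and registering the tokenizer in an index's analysis settings is not shown.

```csharp
using OpenSearch.Client;

// Illustrative configuration: start a new token on any whitespace
// character and on the literal hyphen character.
var tokenizer = new CharGroupTokenizer
{
    // Mixes a character group ("whitespace") with a single character ("-").
    TokenizeOnCharacters = new[] { "whitespace", "-" },

    // Explicitly set to the documented default of 255.
    MaxTokenLength = 255
};
```

Going by the description above, input such as `quick-brown fox` would be expected to break into the terms `quick`, `brown`, and `fox` with this configuration.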