Interface ICharGroupTokenizer
A tokenizer that breaks text into terms whenever it encounters a character from a defined set. It is mainly useful when simple custom tokenization is needed and the overhead of using the PatternTokenizer is not acceptable.
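As a rough illustration of the splitting behavior described above, the sketch below breaks a string on a fixed set of literal characters. The separators are dropped, and runs of consecutive separators do not produce empty tokens. It is a hypothetical client-side approximation: the `CharGroupSketch` class and its `Tokenize` method are illustrative names and are not part of OpenSearch.Client. The real tokenization is performed by the OpenSearch server.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch (not an OpenSearch.Client API): splits text on any
// character in a fixed set, dropping the separators and never emitting
// empty tokens.
public static class CharGroupSketch
{
    public static List<string> Tokenize(string text, ISet<char> separators)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (separators.Contains(c))
            {
                // A separator ends the current token, if there is one.
                if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
            }
            else
            {
                current.Append(c);
            }
        }
        // Flush the trailing token, if any.
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }
}
```

For example, `CharGroupSketch.Tokenize("The QUICK brown-fox", new HashSet<char> { ' ', '-' })` yields the tokens `The`, `QUICK`, `brown` and `fox`.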
Namespace: OpenSearch.Client
Assembly: OpenSearch.Client.dll
Syntax
public interface ICharGroupTokenizer : ITokenizer
Properties
MaxTokenLength
The maximum token length. If a token exceeds this length, it is split at MaxTokenLength intervals. Defaults to 255.
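The splitting rule can be pictured as cutting an over-long token into consecutive chunks of at most MaxTokenLength characters, with the last chunk possibly shorter. The sketch below is a hypothetical illustration only: `MaxTokenLengthSketch.Split` is a made-up name, not part of OpenSearch.Client, and it assumes length is counted in .NET `char`s (UTF-16 code units), which may differ from the server's counting for characters outside the Basic Multilingual Plane.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not an OpenSearch.Client API) of the max_token_length
// rule: a token longer than maxTokenLength is emitted as consecutive chunks
// of at most maxTokenLength chars. Defaults to 255, mirroring the documented default.
public static class MaxTokenLengthSketch
{
    public static List<string> Split(string token, int maxTokenLength = 255)
    {
        if (maxTokenLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxTokenLength));

        var chunks = new List<string>();
        for (int start = 0; start < token.Length; start += maxTokenLength)
        {
            // The final chunk may be shorter than maxTokenLength.
            int length = Math.Min(maxTokenLength, token.Length - start);
            chunks.Add(token.Substring(start, length));
        }
        return chunks;
    }
}
```

With a limit of 3, `MaxTokenLengthSketch.Split("abcdefgh", 3)` yields `abc`, `def` and `gh`.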
Declaration
[DataMember(Name = "max_token_length")]
int? MaxTokenLength { get; set; }
Property Value
Type | Description |
---|---|
int? |
TokenizeOnCharacters
A list of characters to tokenize the string on. Whenever a character from this list is encountered, a new token is started. Each entry can be either a single character, such as -, or one of the character groups whitespace, letter, digit, punctuation, or symbol.
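To make the distinction between single characters and character groups concrete, the sketch below decides whether a character should end a token given a list of entries like those in TokenizeOnCharacters. It is a hypothetical approximation: `CharGroupMatcherSketch.IsTokenBoundary` is an illustrative name, not an OpenSearch.Client API, and the group names are mapped onto .NET's `char.IsWhiteSpace`, `char.IsLetter`, `char.IsDigit`, `char.IsPunctuation` and `char.IsSymbol`, which may not match the server's classification exactly. Entries that are neither a group name nor a single character are simply ignored here.

```csharp
using System.Collections.Generic;

// Hypothetical sketch (not an OpenSearch.Client API): returns true when c
// matches any entry, where an entry is either a group name or a single
// literal character. Groups are approximated with .NET char classification.
public static class CharGroupMatcherSketch
{
    public static bool IsTokenBoundary(char c, IEnumerable<string> tokenizeOnChars)
    {
        foreach (string entry in tokenizeOnChars)
        {
            bool match = entry switch
            {
                "whitespace" => char.IsWhiteSpace(c),
                "letter" => char.IsLetter(c),
                "digit" => char.IsDigit(c),
                "punctuation" => char.IsPunctuation(c),
                "symbol" => char.IsSymbol(c),
                // Any other one-character entry is a literal separator.
                _ => entry.Length == 1 && entry[0] == c,
            };
            if (match)
                return true;
        }
        return false;
    }
}
```

With the entries `"whitespace"` and `"-"`, both a space and a hyphen start a new token, while a letter such as `a` does not.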
Declaration
[DataMember(Name = "tokenize_on_chars")]
IEnumerable<string> TokenizeOnCharacters { get; set; }
Property Value
Type | Description |
---|---|
IEnumerable<string> |