Class CharGroupTokenizerDescriptor
A tokenizer that breaks text into terms whenever it encounters a character that is in a defined set. It is mostly useful
when simple custom tokenization is needed and the overhead of PatternTokenizer is not acceptable.
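A minimal usage sketch, assuming an OpenSearch node at http://localhost:9200 and the fluent Tokenizers(...).CharGroup(...) entry point that OpenSearch.Client carries over from NEST. The index, tokenizer, and analyzer names are hypothetical.

```csharp
using System;
using OpenSearch.Client;

public static class CharGroupTokenizerExample
{
    public static void Main()
    {
        // Assumed local endpoint; adjust for your cluster.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new OpenSearchClient(settings);

        // "my-index", "my_char_group" and "my_analyzer" are illustrative names.
        var response = client.Indices.Create("my-index", c => c
            .Settings(s => s
                .Analysis(a => a
                    .Tokenizers(t => t
                        // Split on any whitespace character and on hyphens.
                        .CharGroup("my_char_group", cg => cg
                            .TokenizeOnCharacters("whitespace", "-")
                            .MaxTokenLength(255)))
                    .Analyzers(an => an
                        .Custom("my_analyzer", ca => ca
                            .Tokenizer("my_char_group"))))));

        Console.WriteLine(response.IsValid ? "Index created." : response.DebugInformation);
    }
}
```

The descriptor is normally not constructed directly; the lambda passed to CharGroup receives it, and each method returns the same descriptor so calls can be chained.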
Inheritance
CharGroupTokenizerDescriptor
Assembly: OpenSearch.Client.dll
Syntax
public class CharGroupTokenizerDescriptor : TokenizerDescriptorBase<CharGroupTokenizerDescriptor, ICharGroupTokenizer>, IDescriptor, ICharGroupTokenizer, ITokenizer
Properties
Type
Declaration
protected override string Type { get; }
Property Value
Overrides
Methods
MaxTokenLength(int?)
The maximum token length. If a token exceeds this length, it is split at MaxTokenLength intervals. Defaults to 255.
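The splitting rule can be illustrated with a small self-contained sketch. It models only the behavior described here (start a new token at each listed character, and cut any run that reaches the maximum length); the CharGroupSketch class and its Tokenize helper are hypothetical and are not part of OpenSearch.Client or the server-side tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharGroupSketch
{
    // Hypothetical helper: models the documented behavior, not the engine's implementation.
    public static List<string> Tokenize(string text, ISet<char> splitChars, int maxTokenLength = 255)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (var ch in text)
        {
            if (splitChars.Contains(ch))
            {
                // A listed character ends the current token and is itself discarded.
                Flush(tokens, current);
                continue;
            }
            current.Append(ch);
            // Once the token reaches the limit, emit it and start a new one.
            if (current.Length == maxTokenLength)
                Flush(tokens, current);
        }
        Flush(tokens, current);
        return tokens;
    }

    private static void Flush(List<string> tokens, StringBuilder current)
    {
        if (current.Length == 0) return; // never emit empty tokens
        tokens.Add(current.ToString());
        current.Clear();
    }

    public static void Main()
    {
        var tokens = Tokenize("quick-brown fox", new HashSet<char> { '-', ' ' }, maxTokenLength: 4);
        Console.WriteLine(string.Join(", ", tokens)); // prints "quic, k, brow, n, fox"
    }
}
```

With a limit of 4, "quick" and "brown" are each cut after four characters and the remainder becomes its own token, while "fox" fits and is left intact.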
Declaration
public CharGroupTokenizerDescriptor MaxTokenLength(int? maxTokenLength)
Parameters
Type | Name | Description |
int? | maxTokenLength | |
Returns
TokenizeOnCharacters(IEnumerable<string>)
A list of characters to tokenize the string on. Whenever a character from this list is encountered, a new token is
started. This accepts either single characters, e.g. -, or character groups: whitespace, letter, digit, punctuation,
symbol.
Declaration
public CharGroupTokenizerDescriptor TokenizeOnCharacters(IEnumerable<string> characters)
Parameters
Type | Name | Description |
IEnumerable<string> | characters | |
Returns
TokenizeOnCharacters(params string[])
A list of characters to tokenize the string on. Whenever a character from this list is encountered, a new token is
started. This accepts either single characters, e.g. -, or character groups: whitespace, letter, digit, punctuation,
symbol.
Declaration
public CharGroupTokenizerDescriptor TokenizeOnCharacters(params string[] characters)
Parameters
Type | Name | Description |
string[] | characters | |
Returns
Implements
Extension Methods