
    Class TokenizersDescriptor

    Inheritance
    object
    DescriptorPromiseBase<TokenizersDescriptor, ITokenizers>
    IsADictionaryDescriptorBase<TokenizersDescriptor, ITokenizers, string, ITokenizer>
    TokenizersDescriptor
    Implements
    IDescriptor
    IPromise<ITokenizers>
    Inherited Members
    IsADictionaryDescriptorBase<TokenizersDescriptor, ITokenizers, string, ITokenizer>.Assign(string, ITokenizer)
    DescriptorPromiseBase<TokenizersDescriptor, ITokenizers>.Self
    DescriptorPromiseBase<TokenizersDescriptor, ITokenizers>.Assign(Action<ITokenizers>)
    DescriptorPromiseBase<TokenizersDescriptor, ITokenizers>.Assign<TNewValue>(TNewValue, Action<ITokenizers, TNewValue>)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: OpenSearch.Client
    Assembly: OpenSearch.Client.dll
    Syntax
    public class TokenizersDescriptor : IsADictionaryDescriptorBase<TokenizersDescriptor, ITokenizers, string, ITokenizer>, IDescriptor, IPromise<ITokenizers>

    Constructors


    TokenizersDescriptor()

    Declaration
    public TokenizersDescriptor()

    Methods


    CharGroup(string, Func<CharGroupTokenizerDescriptor, ICharGroupTokenizer>)

    A tokenizer that starts a new token whenever it encounters a character from a configured list. The list accepts either single characters, e.g. -, or character groups: whitespace, letter, digit, punctuation, symbol.

    Declaration
    public TokenizersDescriptor CharGroup(string name, Func<CharGroupTokenizerDescriptor, ICharGroupTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<CharGroupTokenizerDescriptor, ICharGroupTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor
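
    A minimal sketch of typical usage, assuming an IOpenSearchClient instance named client; the index name, tokenizer name, and chosen characters are illustrative:

    ```csharp
    // Sketch: register a char_group tokenizer while creating an index.
    // `client`, "products", and "dash_and_space" are assumptions for illustration.
    var response = client.Indices.Create("products", c => c
        .Settings(s => s
            .Analysis(a => a
                .Tokenizers(t => t
                    .CharGroup("dash_and_space", cg => cg
                        // split on any whitespace and on literal '-'
                        .TokenizeOnCharacters("whitespace", "-"))))));
    ```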

    EdgeNGram(string, Func<EdgeNGramTokenizerDescriptor, IEdgeNGramTokenizer>)

    A tokenizer of type edgeNGram.

    Declaration
    public TokenizersDescriptor EdgeNGram(string name, Func<EdgeNGramTokenizerDescriptor, IEdgeNGramTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<EdgeNGramTokenizerDescriptor, IEdgeNGramTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor
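
    A sketch of an edge n-gram configuration for prefix (autocomplete-style) matching, assuming an IOpenSearchClient instance named client; the names and gram sizes are illustrative:

    ```csharp
    // Sketch: an edge n-gram tokenizer emitting prefixes of 2 to 10 characters.
    // `client`, "autocomplete", and "edge_2_10" are assumptions for illustration.
    var response = client.Indices.Create("autocomplete", c => c
        .Settings(s => s
            .Analysis(a => a
                .Tokenizers(t => t
                    .EdgeNGram("edge_2_10", e => e
                        .MinGram(2)
                        .MaxGram(10)
                        // only keep letters and digits in the grams
                        .TokenChars(TokenChar.Letter, TokenChar.Digit))))));
    ```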

    Icu(string, Func<IcuTokenizerDescriptor, IIcuTokenizer>)

    Tokenizes text into words on word boundaries, as defined in UAX #29: Unicode Text Segmentation. It behaves much like the standard tokenizer, but adds better support for some Asian languages by using a dictionary-based approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and using custom rules to break Myanmar and Khmer text into syllables. Part of the analysis-icu plugin.

    Declaration
    public TokenizersDescriptor Icu(string name, Func<IcuTokenizerDescriptor, IIcuTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<IcuTokenizerDescriptor, IIcuTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    Keyword(string, Func<KeywordTokenizerDescriptor, IKeywordTokenizer>)

    A tokenizer of type keyword that emits the entire input as a single token.

    Declaration
    public TokenizersDescriptor Keyword(string name, Func<KeywordTokenizerDescriptor, IKeywordTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<KeywordTokenizerDescriptor, IKeywordTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    Kuromoji(string, Func<KuromojiTokenizerDescriptor, IKuromojiTokenizer>)

    A tokenizer for Japanese morphological analysis. Part of the analysis-kuromoji plugin.

    Declaration
    public TokenizersDescriptor Kuromoji(string name, Func<KuromojiTokenizerDescriptor, IKuromojiTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<KuromojiTokenizerDescriptor, IKuromojiTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    Letter(string, Func<LetterTokenizerDescriptor, ILetterTokenizer>)

    A tokenizer of type letter that divides text at non-letters. That’s to say, it defines tokens as maximal strings of adjacent letters.

    Note, this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.

    Declaration
    public TokenizersDescriptor Letter(string name, Func<LetterTokenizerDescriptor, ILetterTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<LetterTokenizerDescriptor, ILetterTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    Lowercase(string, Func<LowercaseTokenizerDescriptor, ILowercaseTokenizer>)

    A tokenizer of type lowercase that performs the function of the Letter tokenizer and the Lower Case token filter together. It divides text at non-letters and converts the resulting tokens to lower case.

    While functionally equivalent to combining the Letter tokenizer with the Lower Case token filter, performing both tasks in one pass yields a performance advantage, hence this (redundant) implementation.

    Declaration
    public TokenizersDescriptor Lowercase(string name, Func<LowercaseTokenizerDescriptor, ILowercaseTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<LowercaseTokenizerDescriptor, ILowercaseTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    NGram(string, Func<NGramTokenizerDescriptor, INGramTokenizer>)

    A tokenizer of type nGram.

    Declaration
    public TokenizersDescriptor NGram(string name, Func<NGramTokenizerDescriptor, INGramTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<NGramTokenizerDescriptor, INGramTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    Nori(string, Func<NoriTokenizerDescriptor, INoriTokenizer>)

    A tokenizer for Korean morphological analysis that ships with the analysis-nori plugin.

    Declaration
    public TokenizersDescriptor Nori(string name, Func<NoriTokenizerDescriptor, INoriTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<NoriTokenizerDescriptor, INoriTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    PathHierarchy(string, Func<PathHierarchyTokenizerDescriptor, IPathHierarchyTokenizer>)

    The path_hierarchy tokenizer takes something like this:

    /something/something/else

    And produces tokens:

    /something

    /something/something

    /something/something/else

    Declaration
    public TokenizersDescriptor PathHierarchy(string name, Func<PathHierarchyTokenizerDescriptor, IPathHierarchyTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<PathHierarchyTokenizerDescriptor, IPathHierarchyTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor
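
    A sketch of a path_hierarchy tokenizer with an explicit delimiter, assuming an IOpenSearchClient instance named client; the index and tokenizer names are illustrative:

    ```csharp
    // Sketch: a path_hierarchy tokenizer splitting on '/'.
    // `client`, "files", and "paths" are assumptions for illustration.
    var response = client.Indices.Create("files", c => c
        .Settings(s => s
            .Analysis(a => a
                .Tokenizers(t => t
                    .PathHierarchy("paths", p => p
                        .Delimiter('/'))))));
    ```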

    Pattern(string, Func<PatternTokenizerDescriptor, IPatternTokenizer>)

    A tokenizer of type pattern that can flexibly separate text into terms via a regular expression.

    Declaration
    public TokenizersDescriptor Pattern(string name, Func<PatternTokenizerDescriptor, IPatternTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<PatternTokenizerDescriptor, IPatternTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor
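
    A sketch of a pattern tokenizer that splits input on commas, assuming an IOpenSearchClient instance named client; the index and tokenizer names are illustrative:

    ```csharp
    // Sketch: a pattern tokenizer splitting on ','.
    // `client`, "csv", and "comma" are assumptions for illustration.
    var response = client.Indices.Create("csv", c => c
        .Settings(s => s
            .Analysis(a => a
                .Tokenizers(t => t
                    .Pattern("comma", p => p
                        .Pattern(","))))));
    ```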

    Standard(string, Func<StandardTokenizerDescriptor, IStandardTokenizer>)

    A tokenizer of type standard, providing a grammar-based tokenizer that works well for most European-language documents.

    The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

    Declaration
    public TokenizersDescriptor Standard(string name, Func<StandardTokenizerDescriptor, IStandardTokenizer> selector = null)
    Parameters
    Type Name Description
    string name
    Func<StandardTokenizerDescriptor, IStandardTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    UaxEmailUrl(string, Func<UaxEmailUrlTokenizerDescriptor, IUaxEmailUrlTokenizer>)

    A tokenizer of type uax_url_email that works exactly like the standard tokenizer, but tokenizes emails and URLs as single tokens.

    Declaration
    public TokenizersDescriptor UaxEmailUrl(string name, Func<UaxEmailUrlTokenizerDescriptor, IUaxEmailUrlTokenizer> selector)
    Parameters
    Type Name Description
    string name
    Func<UaxEmailUrlTokenizerDescriptor, IUaxEmailUrlTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor

    UserDefined(string, ITokenizer)

    Declaration
    public TokenizersDescriptor UserDefined(string name, ITokenizer analyzer)
    Parameters
    Type Name Description
    string name
    ITokenizer analyzer
    Returns
    Type Description
    TokenizersDescriptor
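
    Where the fluent sub-descriptors do not fit, a tokenizer instance can be registered directly. A sketch, assuming an IOpenSearchClient instance named client; the index and tokenizer names are illustrative:

    ```csharp
    // Sketch: register a concrete ITokenizer instance under a name.
    // `client`, "csv-data", and "comma" are assumptions for illustration.
    var commaTokenizer = new PatternTokenizer { Pattern = "," };
    var response = client.Indices.Create("csv-data", c => c
        .Settings(s => s
            .Analysis(a => a
                .Tokenizers(t => t
                    .UserDefined("comma", commaTokenizer)))));
    ```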

    Whitespace(string, Func<WhitespaceTokenizerDescriptor, IWhitespaceTokenizer>)

    A tokenizer of type whitespace that divides text at whitespace.

    Declaration
    public TokenizersDescriptor Whitespace(string name, Func<WhitespaceTokenizerDescriptor, IWhitespaceTokenizer> selector = null)
    Parameters
    Type Name Description
    string name
    Func<WhitespaceTokenizerDescriptor, IWhitespaceTokenizer> selector
    Returns
    Type Description
    TokenizersDescriptor
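
    A registered tokenizer only takes effect once an analyzer references it by name. A sketch tying the pieces together, assuming an IOpenSearchClient instance named client; the index, tokenizer, and analyzer names are illustrative:

    ```csharp
    // Sketch: register a whitespace tokenizer and use it in a custom analyzer.
    // `client`, "example-index", "ws", and "my_analyzer" are assumptions for illustration.
    var response = client.Indices.Create("example-index", c => c
        .Settings(s => s
            .Analysis(a => a
                .Tokenizers(t => t
                    .Whitespace("ws"))
                .Analyzers(an => an
                    .Custom("my_analyzer", ca => ca
                        .Tokenizer("ws")           // reference the tokenizer by name
                        .Filters("lowercase")))))); // then lowercase each token
    ```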

    Implements

    IDescriptor
    IPromise<ITokenizers>

    Extension Methods

    SuffixExtensions.Suffix(object, string)
    Generated by DocFX