
    Class AnalyzeTokenizersSelector

    Inheritance
    object
    SelectorBase
    AnalyzeTokenizersSelector
    Implements
    ISelector
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: OpenSearch.Client
    Assembly: OpenSearch.Client.dll
    Syntax
    public class AnalyzeTokenizersSelector : SelectorBase, ISelector

    Methods


    CharGroup(Func<CharGroupTokenizerDescriptor, ICharGroupTokenizer>)

    A tokenizer that breaks text on a configurable list of characters: whenever a character from the list is encountered, a new token is started. The list accepts either single characters, e.g. -, or character groups: whitespace, letter, digit, punctuation, symbol.

    Declaration
    public ITokenizer CharGroup(Func<CharGroupTokenizerDescriptor, ICharGroupTokenizer> selector)
    Parameters
    Type Name Description
    Func<CharGroupTokenizerDescriptor, ICharGroupTokenizer> selector
    Returns
    Type Description
    ITokenizer
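A minimal usage sketch: the char_group tokenizer can be configured inline on an Analyze request. This assumes a configured `OpenSearchClient` named `client`; the input text is illustrative, and the `TokenizeOnCharacters` descriptor method is named per the fluent API as understood here, not verified against this exact version.

```csharp
// Sketch only: assumes a configured OpenSearchClient named `client`
// and a reachable cluster.
var response = client.Indices.Analyze(a => a
    .Tokenizer(t => t.CharGroup(cg => cg
        // split on '-' and on any whitespace character
        .TokenizeOnCharacters("-", "whitespace")))
    .Text("some-text to split"));

foreach (var token in response.Tokens)
    Console.WriteLine(token.Token);
```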

    EdgeNGram(Func<EdgeNGramTokenizerDescriptor, IEdgeNGramTokenizer>)

    A tokenizer of type edgeNGram.

    Declaration
    public ITokenizer EdgeNGram(Func<EdgeNGramTokenizerDescriptor, IEdgeNGramTokenizer> selector)
    Parameters
    Type Name Description
    Func<EdgeNGramTokenizerDescriptor, IEdgeNGramTokenizer> selector
    Returns
    Type Description
    ITokenizer
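As a sketch of typical configuration (assuming a configured `OpenSearchClient` named `client`; the `MinGram`, `MaxGram`, and `TokenChars` descriptor methods follow the fluent API as understood here and are not verified against this exact version):

```csharp
// Sketch only: assumes a configured OpenSearchClient named `client`.
var response = client.Indices.Analyze(a => a
    .Tokenizer(t => t.EdgeNGram(e => e
        .MinGram(2)                   // shortest gram to emit
        .MaxGram(4)                   // longest gram to emit
        .TokenChars(TokenChar.Letter) // only letters start/continue a gram
    ))
    .Text("search"));
// "search" would yield edge n-grams anchored to the word start,
// e.g. "se", "sea", "sear"
```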

    Icu(Func<IcuTokenizerDescriptor, IIcuTokenizer>)

    Tokenizes text into words on word boundaries, as defined in UAX #29: Unicode Text Segmentation. It behaves much like the standard tokenizer, but adds better support for some Asian languages by using a dictionary-based approach to identify words in Thai, Lao, Chinese, Japanese, and Korean, and using custom rules to break Myanmar and Khmer text into syllables. Part of the analysis-icu plugin.

    Declaration
    public ITokenizer Icu(Func<IcuTokenizerDescriptor, IIcuTokenizer> selector)
    Parameters
    Type Name Description
    Func<IcuTokenizerDescriptor, IIcuTokenizer> selector
    Returns
    Type Description
    ITokenizer

    Keyword(Func<KeywordTokenizerDescriptor, IKeywordTokenizer>)

    A tokenizer of type keyword that emits the entire input as a single token.

    Declaration
    public ITokenizer Keyword(Func<KeywordTokenizerDescriptor, IKeywordTokenizer> selector)
    Parameters
    Type Name Description
    Func<KeywordTokenizerDescriptor, IKeywordTokenizer> selector
    Returns
    Type Description
    ITokenizer

    Kuromoji(Func<KuromojiTokenizerDescriptor, IKuromojiTokenizer>)

    A tokenizer for Japanese text, using morphological analysis. Part of the analysis-kuromoji plugin.

    Declaration
    public ITokenizer Kuromoji(Func<KuromojiTokenizerDescriptor, IKuromojiTokenizer> selector)
    Parameters
    Type Name Description
    Func<KuromojiTokenizerDescriptor, IKuromojiTokenizer> selector
    Returns
    Type Description
    ITokenizer

    Letter(Func<LetterTokenizerDescriptor, ILetterTokenizer>)

    A tokenizer of type letter that divides text at non-letters. That’s to say, it defines tokens as maximal strings of adjacent letters.

    Note, this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.

    Declaration
    public ITokenizer Letter(Func<LetterTokenizerDescriptor, ILetterTokenizer> selector)
    Parameters
    Type Name Description
    Func<LetterTokenizerDescriptor, ILetterTokenizer> selector
    Returns
    Type Description
    ITokenizer

    Lowercase(Func<LowercaseTokenizerDescriptor, ILowercaseTokenizer>)

    A tokenizer of type lowercase that performs the function of the Letter Tokenizer and the Lower Case Token Filter together. It divides text at non-letters and converts the resulting tokens to lower case. While functionally equivalent to combining the Letter Tokenizer with the Lower Case Token Filter, performing both tasks in a single pass gives a performance advantage, hence this (redundant) implementation.

    Declaration
    public ITokenizer Lowercase(Func<LowercaseTokenizerDescriptor, ILowercaseTokenizer> selector)
    Parameters
    Type Name Description
    Func<LowercaseTokenizerDescriptor, ILowercaseTokenizer> selector
    Returns
    Type Description
    ITokenizer

    NGram(Func<NGramTokenizerDescriptor, INGramTokenizer>)

    A tokenizer of type nGram.

    Declaration
    public ITokenizer NGram(Func<NGramTokenizerDescriptor, INGramTokenizer> selector)
    Parameters
    Type Name Description
    Func<NGramTokenizerDescriptor, INGramTokenizer> selector
    Returns
    Type Description
    ITokenizer

    Nori(Func<NoriTokenizerDescriptor, INoriTokenizer>)

    A tokenizer for Korean text that ships with the analysis-nori plugin.

    Declaration
    public ITokenizer Nori(Func<NoriTokenizerDescriptor, INoriTokenizer> selector)
    Parameters
    Type Name Description
    Func<NoriTokenizerDescriptor, INoriTokenizer> selector
    Returns
    Type Description
    ITokenizer

    PathHierarchy(Func<PathHierarchyTokenizerDescriptor, IPathHierarchyTokenizer>)

    The path_hierarchy tokenizer takes something like this:

    /something/something/else

    And produces tokens:

    /something

    /something/something

    /something/something/else

    Declaration
    public ITokenizer PathHierarchy(Func<PathHierarchyTokenizerDescriptor, IPathHierarchyTokenizer> selector)
    Parameters
    Type Name Description
    Func<PathHierarchyTokenizerDescriptor, IPathHierarchyTokenizer> selector
    Returns
    Type Description
    ITokenizer
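A usage sketch (assuming a configured `OpenSearchClient` named `client`; the `Delimiter` descriptor method is named per the fluent API as understood here, not verified against this exact version):

```csharp
// Sketch only: assumes a configured OpenSearchClient named `client`.
var response = client.Indices.Analyze(a => a
    .Tokenizer(t => t.PathHierarchy(p => p
        .Delimiter('/')))  // split on '/' and emit each ancestor path
    .Text("/something/something/else"));
// Per the description above, the tokens produced are:
// /something, /something/something, /something/something/else
```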

    Pattern(Func<PatternTokenizerDescriptor, IPatternTokenizer>)

    A tokenizer of type pattern that can flexibly separate text into terms via a regular expression.

    Declaration
    public ITokenizer Pattern(Func<PatternTokenizerDescriptor, IPatternTokenizer> selector)
    Parameters
    Type Name Description
    Func<PatternTokenizerDescriptor, IPatternTokenizer> selector
    Returns
    Type Description
    ITokenizer
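A usage sketch (assuming a configured `OpenSearchClient` named `client`; the `Pattern` descriptor method is named per the fluent API as understood here, not verified against this exact version):

```csharp
// Sketch only: assumes a configured OpenSearchClient named `client`.
var response = client.Indices.Analyze(a => a
    .Tokenizer(t => t.Pattern(p => p
        .Pattern(",")))  // the regular expression to split on
    .Text("comma,separated,values"));
```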

    Standard(Func<StandardTokenizerDescriptor, IStandardTokenizer>)

    A tokenizer of type standard providing a grammar-based tokenizer that works well for most European-language documents.

    The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

    Declaration
    public ITokenizer Standard(Func<StandardTokenizerDescriptor, IStandardTokenizer> selector = null)
    Parameters
    Type Name Description
    Func<StandardTokenizerDescriptor, IStandardTokenizer> selector
    Returns
    Type Description
    ITokenizer
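Because the selector parameter defaults to null, the standard tokenizer can be selected without further configuration. A sketch, assuming a configured `OpenSearchClient` named `client`:

```csharp
// Sketch only: assumes a configured OpenSearchClient named `client`.
// The selector argument defaults to null, so Standard() can be called bare.
var response = client.Indices.Analyze(a => a
    .Tokenizer(t => t.Standard())
    .Text("The 2 QUICK Brown-Foxes jumped."));
```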

    UaxEmailUrl(Func<UaxEmailUrlTokenizerDescriptor, IUaxEmailUrlTokenizer>)

    A tokenizer of type uax_url_email that works exactly like the standard tokenizer, but tokenizes emails and URLs as single tokens.

    Declaration
    public ITokenizer UaxEmailUrl(Func<UaxEmailUrlTokenizerDescriptor, IUaxEmailUrlTokenizer> selector)
    Parameters
    Type Name Description
    Func<UaxEmailUrlTokenizerDescriptor, IUaxEmailUrlTokenizer> selector
    Returns
    Type Description
    ITokenizer

    Whitespace(Func<WhitespaceTokenizerDescriptor, IWhitespaceTokenizer>)

    A tokenizer of type whitespace that divides text at whitespace.

    Declaration
    public ITokenizer Whitespace(Func<WhitespaceTokenizerDescriptor, IWhitespaceTokenizer> selector = null)
    Parameters
    Type Name Description
    Func<WhitespaceTokenizerDescriptor, IWhitespaceTokenizer> selector
    Returns
    Type Description
    ITokenizer

    Implements

    ISelector

    Extension Methods

    SuffixExtensions.Suffix(object, string)
    Generated by DocFX