
    Interface IKuromojiTokenizer

    A tokenizer of type kuromoji_tokenizer that splits Japanese text into terms using morphological analysis. Part of the analysis-kuromoji plugin.

    Inherited Members
    ITokenizer.Type
    ITokenizer.Version
    Namespace: OpenSearch.Client
    Assembly: OpenSearch.Client.dll
    Syntax
    public interface IKuromojiTokenizer : ITokenizer

    Properties


    DiscardCompoundToken

    Whether the original compound tokens should be discarded from the output in search mode. Defaults to false.

    Declaration
    [DataMember(Name = "discard_compound_token")]
    bool? DiscardCompoundToken { get; set; }
    Property Value
    Type Description
    bool?

    DiscardPunctuation

    Whether punctuation should be discarded from the output. Defaults to true.

    Declaration
    [DataMember(Name = "discard_punctuation")]
    bool? DiscardPunctuation { get; set; }
    Property Value
    Type Description
    bool?

    Mode

    The tokenization mode determines how the tokenizer handles compound and unknown words.

    Declaration
    [DataMember(Name = "mode")]
    KuromojiTokenizationMode? Mode { get; set; }
    Property Value
    Type Description
    KuromojiTokenizationMode?
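To illustrate how the properties documented so far serialize, here is a minimal sketch of an index settings fragment. The field names come from the DataMember attributes on this page. The tokenizer name my_kuromoji is hypothetical, and "search" is assumed to be one of the mode values the plugin accepts (alongside normal and extended).

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_kuromoji": {
          "type": "kuromoji_tokenizer",
          "mode": "search",
          "discard_punctuation": true,
          "discard_compound_token": true
        }
      }
    }
  }
}
```

Because every property on the interface is nullable, a value left unset in the client should simply be omitted from the request, and the plugin default applies.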

    NBestCost

    The nbest_cost parameter specifies an additional Viterbi cost. The KuromojiTokenizer will include all tokens in Viterbi paths that are within the nbest_cost value of the best path.

    Declaration
    [DataMember(Name = "nbest_cost")]
    int? NBestCost { get; set; }
    Property Value
    Type Description
    int?

    NBestExamples

    The nbest_examples parameter can be used to derive an nbest_cost value from examples. For example, a value of /箱根山-箱根/成田空港-成田/ indicates that for the texts 箱根山 (Mt. Hakone) and 成田空港 (Narita Airport) we'd like a cost that gives us 箱根 (Hakone) and 成田 (Narita).

    Declaration
    [DataMember(Name = "nbest_examples")]
    string NBestExamples { get; set; }
    Property Value
    Type Description
    string
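As a sketch of the serialized form, the tokenizer definition below supplies the example string from the description above and leaves nbest_cost unset, so that the cost is derived from the examples. Only the tokenizer body is shown; it would sit under a tokenizer name of your choosing in the index analysis settings.

```json
{
  "type": "kuromoji_tokenizer",
  "nbest_examples": "/箱根山-箱根/成田空港-成田/"
}
```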

    UserDictionary

    The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A user_dictionary may be appended to the default dictionary.

    Declaration
    [DataMember(Name = "user_dictionary")]
    string UserDictionary { get; set; }
    Property Value
    Type Description
    string

    UserDictionaryRules

    Inline version of UserDictionary: the dictionary rules are supplied directly as strings instead of being read from a dictionary file.

    Declaration
    [DataMember(Name = "user_dictionary_rules")]
    IEnumerable<string> UserDictionaryRules { get; set; }
    Property Value
    Type Description
    IEnumerable<string>
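A minimal sketch of how inline rules might look once serialized. It assumes the comma-separated rule layout used by the Kuromoji user dictionary (surface text, space-separated segmentation, space-separated readings, part-of-speech tag). The single rule shown is illustrative only.

```json
{
  "type": "kuromoji_tokenizer",
  "user_dictionary_rules": [
    "東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞"
  ]
}
```

Each element of the IEnumerable<string> becomes one entry in the JSON array, so a rule that would occupy one line of a dictionary file corresponds to one string here.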

    Extension Methods

    SuffixExtensions.Suffix(object, string)