Class KuromojiTokenizerDescriptor
Inheritance
KuromojiTokenizerDescriptor
Assembly: OpenSearch.Client.dll
Syntax
public class KuromojiTokenizerDescriptor : TokenizerDescriptorBase<KuromojiTokenizerDescriptor, IKuromojiTokenizer>, IDescriptor, IKuromojiTokenizer, ITokenizer
Properties
Type
Declaration
protected override string Type { get; }
Property Value
Overrides
Methods
DiscardCompoundToken(bool?)
Whether original compound tokens should be discarded from the output when the tokenizer runs in search mode. The setting defaults to false; calling this method without an argument sets it to true.
Declaration
public KuromojiTokenizerDescriptor DiscardCompoundToken(bool? discard = true)
Parameters
Type | Name | Description |
bool? | discard | |
Returns
DiscardPunctuation(bool?)
Whether punctuation should be discarded from the output. Defaults to true.
Declaration
public KuromojiTokenizerDescriptor DiscardPunctuation(bool? discard = true)
Parameters
Type | Name | Description |
bool? | discard | |
Returns
Mode(KuromojiTokenizationMode?)
The tokenization mode determines how the tokenizer handles compound and unknown words.
Declaration
public KuromojiTokenizerDescriptor Mode(KuromojiTokenizationMode? mode)
Parameters
Returns
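As a minimal sketch of how the mode and discard options above might be combined, the snippet below builds a descriptor on its own. It assumes that `KuromojiTokenizationMode` has a `Search` member and that the descriptor can be created with a parameterless constructor; neither is stated on this page. In practice the descriptor is more likely to be configured inside an index analysis settings selector than constructed directly.

```csharp
using System;
using OpenSearch.Client;

// Assumption: KuromojiTokenizationMode.Search exists and the descriptor
// has a public parameterless constructor.
var tokenizer = new KuromojiTokenizerDescriptor()
    .Mode(KuromojiTokenizationMode.Search)
    .DiscardPunctuation()       // no argument: the parameter defaults to true
    .DiscardCompoundToken();    // no argument: the parameter defaults to true

// The descriptor implements IKuromojiTokenizer, so the configured values
// can be read back through that interface.
IKuromojiTokenizer settings = tokenizer;
Console.WriteLine(settings.Mode);
Console.WriteLine(settings.DiscardPunctuation);
Console.WriteLine(settings.DiscardCompoundToken);
```

Each fluent method returns the same `KuromojiTokenizerDescriptor`, which is why the calls can be chained.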
NBestCost(int?)
The nbest_cost parameter specifies an additional Viterbi cost. The KuromojiTokenizer will include all tokens in
Viterbi paths that are within the nbest_cost value of the best path.
Declaration
public KuromojiTokenizerDescriptor NBestCost(int? cost)
Parameters
Type | Name | Description |
int? | cost | |
Returns
NBestExamples(string)
The nbest_examples parameter can be used to find an nbest_cost value based on examples. For example,
a value of /箱根山-箱根/成田空港-成田/ indicates that, for the texts 箱根山 (Mt. Hakone) and 成田空港 (Narita Airport),
we'd like a cost that also gives us 箱根 (Hakone) and 成田 (Narita).
Declaration
public KuromojiTokenizerDescriptor NBestExamples(string examples)
Parameters
Type | Name | Description |
string | examples | |
Returns
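The two n-best options described above could be set as in the following sketch. The example string is the one quoted in the description; the numeric cost is an arbitrary illustrative value, not a recommendation from this page, and the parameterless constructor is an assumption.

```csharp
using System;
using OpenSearch.Client;

// Let the tokenizer derive a cost from example pairs (string taken from the description above).
var byExamples = new KuromojiTokenizerDescriptor()
    .NBestExamples("/箱根山-箱根/成田空港-成田/");

// Or set an explicit additional Viterbi cost. 2000 is a hypothetical value for illustration.
var byCost = new KuromojiTokenizerDescriptor()
    .NBestCost(2000);

// Read the values back through the IKuromojiTokenizer interface.
Console.WriteLine(((IKuromojiTokenizer)byExamples).NBestExamples);
Console.WriteLine(((IKuromojiTokenizer)byCost).NBestCost);
```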
UserDictionary(string)
The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A user_dictionary may be
appended to the default dictionary.
Declaration
public KuromojiTokenizerDescriptor UserDictionary(string userDictionary)
Parameters
Type | Name | Description |
string | userDictionary | |
Returns
UserDictionaryRules(IEnumerable<string>)
Declaration
public KuromojiTokenizerDescriptor UserDictionaryRules(IEnumerable<string> rules)
Parameters
Returns
UserDictionaryRules(params string[])
Declaration
public KuromojiTokenizerDescriptor UserDictionaryRules(params string[] rules)
Parameters
Type | Name | Description |
string[] | rules | |
Returns
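The two `UserDictionaryRules` overloads carry no description on this page, so the sketch below only illustrates how each might be called. The rule string assumes the CSV layout used by the Kuromoji analysis plugin (text, segmentation, readings, part-of-speech tag); that format and the parameterless constructor are assumptions, not something this page confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using OpenSearch.Client;

// Assumed rule format: <text>,<tokens>,<readings>,<part-of-speech tag>
const string skytree = "東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞";

// params string[] overload: pass rules inline.
var inline = new KuromojiTokenizerDescriptor()
    .UserDictionaryRules(skytree);

// IEnumerable<string> overload: pass a collection built elsewhere.
var rules = new List<string> { skytree };
var fromList = new KuromojiTokenizerDescriptor()
    .UserDictionaryRules(rules);

// Both descriptors should now hold the same single rule.
Console.WriteLine(((IKuromojiTokenizer)inline).UserDictionaryRules.Count());
Console.WriteLine(((IKuromojiTokenizer)fromList).UserDictionaryRules.Count());
```

The `params` overload is convenient for a handful of literal rules, while the `IEnumerable<string>` overload suits rules loaded from configuration or another source.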
Implements
Extension Methods