Interface IStandardTokenizer

A tokenizer of type standard: a grammar-based tokenizer that works well for most European-language documents.

The tokenizer implements the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

public interface IStandardTokenizer : ITokenizer


The maximum token length. A token that exceeds this length is discarded. Defaults to 255.

[DataMember(Name = "max_token_length")]
int? MaxTokenLength { get; set; }

Type	Description
int?
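
The following is a minimal sketch of how this property might be set when defining index analysis settings with the object-initializer syntax. It assumes the NEST 7.x package is referenced. The names `short_standard` and `short_standard_analyzer` and the value 5 are hypothetical, chosen only for illustration.

```csharp
using Nest;

// StandardTokenizer is assumed here to be the NEST class implementing IStandardTokenizer.
// Leaving MaxTokenLength null falls back to the server-side default (255).
var tokenizer = new StandardTokenizer { MaxTokenLength = 5 };

// Hypothetical names: "short_standard" and "short_standard_analyzer".
var indexSettings = new IndexSettings
{
    Analysis = new Analysis
    {
        // Register the tokenizer under a name so analyzers can refer to it.
        Tokenizers = new Tokenizers
        {
            { "short_standard", tokenizer }
        },
        // A custom analyzer that uses the tokenizer registered above.
        Analyzers = new Analyzers
        {
            { "short_standard_analyzer", new CustomAnalyzer { Tokenizer = "short_standard" } }
        }
    }
};
```

Because the property is declared as `int?`, it can be left unset. In that case it would be expected to be omitted from the serialized `max_token_length` field rather than sent as an explicit value.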