N-gram
A contiguous sequence of n items (such as characters or words) taken from a text, used in natural language processing and text analysis.
Technical details
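An n-gram can be built over different units of text: bytes, UTF-16 code units, Unicode code points, grapheme clusters, or words. The properties of each code point (general category, script, case, directionality) are defined by the Unicode Standard. In the browser, the TextEncoder and TextDecoder APIs convert between strings and bytes (TextEncoder always produces UTF-8), and Intl.Segmenter provides locale-aware grapheme, word, and sentence boundary detection. The choice of unit matters for correctness: slicing a JavaScript string by code units can split a surrogate pair, and slicing by code points can separate a base letter from its combining mark, so character n-grams are usually best built over code points or grapheme clusters.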
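The difference between these units can be seen on a short string. The following is a minimal sketch, not a definitive implementation; the sample string and variable names are chosen here for illustration, and it assumes a runtime that supports `Intl.Segmenter`.

```javascript
// Hypothetical sample: 'e' + combining acute accent (U+0301), then an emoji outside the BMP.
const text = 'e\u0301\u{1F600}';

// Bytes: TextEncoder always encodes to UTF-8.
const bytes = new TextEncoder().encode(text);

// UTF-16 code units: this is what String.prototype.length counts.
const codeUnitCount = text.length;

// Code points: iterating a string yields code points, not code units.
const codePoints = Array.from(text);

// Grapheme clusters: user-perceived characters, via Intl.Segmenter.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = Array.from(segmenter.segment(text), (s) => s.segment);

console.log(bytes.length, codeUnitCount, codePoints.length, graphemes.length);
// 7 4 3 2
```

The same two visible characters count as 7 bytes, 4 code units, 3 code points, and 2 grapheme clusters, which is why the unit chosen for character n-grams changes the result.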
Example
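```javascript
// Tokenization: split text into words, the usual first step before building word n-grams
const input = 'Sample text for processing';

const result = input
  .trim()            // drop leading and trailing whitespace
  .split(/\s+/)      // split on runs of whitespace
  .filter(Boolean);  // discard empty strings (e.g. from an empty input)

console.log(result);
// ['Sample', 'text', 'for', 'processing']
```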
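Once a text is split into items, n-grams are the sliding windows of length n over that list. The sketch below shows one way to do this; the helper name `ngrams` and the sample inputs are assumptions made for illustration and do not come from any library.

```javascript
// Hypothetical helper: returns every contiguous window of n items from a list.
function ngrams(items, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  const out = [];
  // Stop when fewer than n items remain, so every window is complete.
  for (let i = 0; i + n <= items.length; i++) {
    out.push(items.slice(i, i + n));
  }
  return out;
}

// Word bigrams from whitespace-separated tokens.
const tokens = 'Sample text for processing'.trim().split(/\s+/).filter(Boolean);
const bigrams = ngrams(tokens, 2).map((gram) => gram.join(' '));
console.log(bigrams);
// ['Sample text', 'text for', 'for processing']

// Character trigrams over code points (Array.from iterates by code point).
const charTrigrams = ngrams(Array.from('text'), 3).map((gram) => gram.join(''));
console.log(charTrigrams);
// ['tex', 'ext']
```

A list of k items yields k - n + 1 n-grams when k is at least n, and none otherwise. To build n-grams over grapheme clusters instead of code points, the items could come from `Intl.Segmenter` with `granularity: 'grapheme'`.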