Hybrid comparison for unicode text strings consisting primarily of ASCII characters

Application No. US15719479 Filing date 2017-09-28 Publication No. US10089281B1 Publication date 2018-10-02
Applicant Tableau Software, Inc.; Inventors Thomas Neumann; Viktor Leis; Alfons Kemper;
Abstract Comparing text strings with Unicode encoding includes receiving two text strings S1 and S2. The process computes, for the first text string S1, a first weight according to a weight function ƒ that computes an ASCII prefix ƒA(S1), computes a Unicode weight suffix ƒU(S1), and concatenates the two to form the first weight ƒ(S1)=ƒA(S1)+ƒU(S1), where + denotes concatenation. Computing the ASCII prefix for the first string applies bitwise operations to n-byte contiguous blocks of the string to determine whether each block contains only ASCII characters, and replaces accented Unicode characters with equivalent unaccented ASCII characters when the comparison is designated as accent-insensitive. When there is a first block containing a non-replaceable non-ASCII character, the Unicode weight suffix is computed by performing a character-by-character Unicode weight lookup beginning with that block. The same process is applied to the second string S2. The two text strings are then compared by comparing their computed weights.
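The block-wise ASCII check and the character-by-character fallback described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the block size of 8 bytes, the UTF-8 input encoding, the function names, and the use of raw code points as a stand-in for a real Unicode weight table are all assumptions made for illustration, and the accent-insensitive replacement step is left out.

```python
BLOCK = 8  # assumed block size n; the abstract only says "n-byte"
# Mask with the high bit of every byte set: 0x8080808080808080.
HIGH_BITS = int.from_bytes(b"\x80" * BLOCK, "little")


def block_is_ascii(block: bytes) -> bool:
    """Return True if no byte in the block has its high bit set."""
    # One AND over the whole block replaces a per-byte loop.
    return int.from_bytes(block, "little") & HIGH_BITS == 0


def unicode_weight(ch: str) -> int:
    """Hypothetical stand-in for a collation-table lookup (uses the code point)."""
    return ord(ch)


def weight(s: str) -> tuple[int, ...]:
    """Concatenate an ASCII prefix with a Unicode weight suffix."""
    data = s.encode("utf-8")  # assumption: strings are stored as UTF-8
    prefix: list[int] = []
    i = 0
    while i < len(data):
        block = data[i:i + BLOCK]
        if not block_is_ascii(block):
            break  # first block containing a non-ASCII byte
        prefix.extend(block)  # ASCII bytes serve as their own weights
        i += len(block)
    # Every byte before offset i is ASCII, so i lies on a character boundary
    # and the remainder can be decoded and weighted character by character.
    suffix = [unicode_weight(ch) for ch in data[i:].decode("utf-8")]
    return tuple(prefix) + tuple(suffix)


def compare(s1: str, s2: str) -> int:
    """Return -1, 0, or 1 by comparing the computed weights."""
    w1, w2 = weight(s1), weight(s2)
    return (w1 > w2) - (w1 < w2)
```

For a string made up entirely of ASCII characters, the loop never reaches the per-character lookup, which is the case the title suggests the approach is aimed at. A real system would replace `unicode_weight` with lookups into an actual collation table.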