How do you adjust field types in Elasticsearch to suit long-tail keyword search?

2026-03-27 06:01



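(1) String types

text: used for full-text search. Values are analyzed into tokens when indexed, and the query string is also run through an analyzer before matching.

keyword: not analyzed. A search must match the complete value exactly.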
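The difference between the two string types can be sketched in plain Python. This is only an illustration: `toy_analyze` is a hypothetical stand-in for a real analyzer (it just lowercases and splits on whitespace), and the field names `title` and `raw` are assumptions. The `mapping` dict shows one common way to get both behaviours on a single field, using a `keyword` sub-field under a `text` field.

```python
import json

# Hypothetical request body for creating an index: "title" is analyzed
# full text, and "title.raw" keeps the unanalyzed value for exact matching.
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            }
        }
    }
}


def toy_analyze(value):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return value.lower().split()


def text_match(stored, query):
    """text-style match: at least one analyzed query token is in the stored tokens."""
    stored_tokens = set(toy_analyze(stored))
    return any(token in stored_tokens for token in toy_analyze(query))


def keyword_match(stored, query):
    """keyword-style match: the whole value must be identical."""
    return stored == query


doc = "Elasticsearch field types"
print(text_match(doc, "field"))                         # True
print(keyword_match(doc, "field"))                      # False
print(keyword_match(doc, "Elasticsearch field types"))  # True
print(json.dumps(mapping, indent=2))
```

With a mapping like this, full-text queries would target `title`, while exact filters, sorting, and aggregations would target `title.raw`.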

(2) Numeric types

Integer types: byte, short, integer, long

Floating-point types: float, half_float, scaled_float, double
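As a rough guide to choosing among these, the sketch below picks the narrowest integer type that covers a value range (the four types are signed 8-, 16-, 32- and 64-bit integers) and shows the idea behind `scaled_float`, which stores a decimal as a long after multiplying by a `scaling_factor`. The helper names and the `price` field are hypothetical, and Python's `round` is only an approximation of the rounding Elasticsearch applies.

```python
# Signed ranges of the integer field types, narrowest first.
INTEGER_RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "integer": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}


def smallest_integer_type(lowest, highest):
    """Return the narrowest integer type that can hold every value in [lowest, highest]."""
    for name, (type_min, type_max) in INTEGER_RANGES.items():
        if type_min <= lowest and highest <= type_max:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")


# Hypothetical field definition: prices kept to two decimal places.
price_field = {"type": "scaled_float", "scaling_factor": 100}


def to_scaled_long(value, scaling_factor):
    """Approximate the long that a scaled_float keeps internally."""
    return round(value * scaling_factor)


print(smallest_integer_type(0, 100))    # byte
print(smallest_integer_type(0, 200))    # short
print(to_scaled_long(19.99, price_field["scaling_factor"]))  # 1999
```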

(3) Date type

date
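A date field usually declares which input formats it accepts, and internally the value is kept as milliseconds since the epoch in UTC. The sketch below assumes a hypothetical field that accepts two string formats plus `epoch_millis`, and mimics the conversion in plain Python. Strings without a time zone are treated as UTC here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field definition accepting two string formats or epoch millis.
date_field = {
    "type": "date",
    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Python equivalents of the two string formats declared above.
PYTHON_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def to_epoch_millis(text):
    """Parse a date string (assumed UTC) into milliseconds since the epoch."""
    for fmt in PYTHON_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next declared format
        return (parsed - EPOCH) // timedelta(milliseconds=1)
    raise ValueError(f"unsupported date format: {text!r}")


print(to_epoch_millis("1970-01-01 00:00:01"))  # 1000
print(to_epoch_millis("1970-01-02"))           # 86400000
```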

(4) Range types

integer_range, long_range, float_range, double_range, date_range
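A range field stores a pair of bounds rather than a single value, and a range query against it can ask for one of three relations: `intersects` (the default), `within`, or `contains`. The sketch below models those relations for inclusive bounds. The field name `age_limit` and the helper `range_matches` are hypothetical.

```python
# Hypothetical field definition and a document value for it.
age_limit_field = {"type": "integer_range"}
stored = {"gte": 10, "lte": 20}


def range_matches(doc_range, query_range, relation="intersects"):
    """Model how a range query relates to a stored range (inclusive bounds)."""
    d_lo, d_hi = doc_range["gte"], doc_range["lte"]
    q_lo, q_hi = query_range["gte"], query_range["lte"]
    if relation == "intersects":   # the two ranges overlap somewhere
        return d_lo <= q_hi and q_lo <= d_hi
    if relation == "within":       # stored range lies entirely inside the query range
        return q_lo <= d_lo and d_hi <= q_hi
    if relation == "contains":     # stored range entirely covers the query range
        return d_lo <= q_lo and q_hi <= d_hi
    raise ValueError(f"unknown relation: {relation}")


print(range_matches(stored, {"gte": 15, "lte": 30}))              # True
print(range_matches(stored, {"gte": 15, "lte": 30}, "within"))    # False
print(range_matches(stored, {"gte": 12, "lte": 18}, "contains"))  # True
```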

(5) Boolean

boolean: accepts true or false

(6) Binary

binary: the value is treated as a Base64-encoded string. It is not stored by default and is not searchable.
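Because a binary field expects Base64 text, raw bytes have to be encoded before they are placed in a document. A minimal sketch, assuming a hypothetical field named `blob`:

```python
import base64

# Hypothetical field definition and payload.
blob_field = {"type": "binary"}
payload = b"\x00\x01 raw bytes"

# Encode to Base64 text; b64encode emits no embedded newlines.
document = {"blob": base64.b64encode(payload).decode("ascii")}

print(document["blob"])
# Decoding the stored string recovers the original bytes.
print(base64.b64decode(document["blob"]) == payload)  # True
```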

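(7) Complex data types

Arrays: there is no dedicated array mapping type. Any field can hold zero or more values, as long as they all share the same type.

Object type: object

Nested type: nested (for arrays of objects whose elements must be queried independently)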
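The practical difference between object and nested shows up with arrays of objects. With the object type, the inner fields are flattened into multi-valued fields, so the link between values from the same element is lost. With nested, each element is matched on its own. The simulation below uses a hypothetical `user` field and hypothetical helper names to illustrate this.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]


def flatten(field, objects):
    """Mimic object-type storage: one multi-valued field per inner key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def object_match(flat, first, last):
    """Object-style match: each condition is checked against the flattened lists."""
    return first in flat["user.first"] and last in flat["user.last"]


def nested_match(objects, first, last):
    """Nested-style match: both conditions must hold within the same element."""
    return any(o["first"] == first and o["last"] == last for o in objects)


flat = flatten("user", users)
print(flat)  # {'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}
print(object_match(flat, "Alice", "Smith"))   # True, a cross-element false positive
print(nested_match(users, "Alice", "Smith"))  # False
```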

(8) Specialized data types

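II. Testing the IK analyzer

IK provides two analyzers, ik_smart and ik_max_word. ik_smart produces the coarsest segmentation (the fewest tokens), while ik_max_word produces the finest-grained segmentation.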

Coarsest segmentation with ik_smart:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我是程序员"
}

The output is:

{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "程序员", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 }
  ]
}

Finest-grained segmentation with ik_max_word:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我是程序员"
}

The output is:

{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "是", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "程序员", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 2 },
    { "token" : "程序", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 3 },
    { "token" : "员", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 4 }
  ]
}
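To apply these analyzers to a field, one common arrangement is to index with ik_max_word and search with ik_smart. The type or analyzer of an existing field cannot be changed in place, so the usual route is to create a new index with the desired mapping and copy the data over with the reindex API. The sketch below builds both request bodies as Python dicts. The index names `articles_v1` and `articles_v2` and the field `content` are hypothetical. The token lists are copied from the outputs above to show why finer index-time segmentation helps recall.

```python
import json

# Hypothetical body for creating the new index (PUT /articles_v2).
new_index_body = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",     # fine-grained tokens at index time
                "search_analyzer": "ik_smart",  # coarse tokens at search time
            }
        }
    }
}

# Hypothetical body for POST /_reindex, copying documents into the new index.
reindex_body = {
    "source": {"index": "articles_v1"},
    "dest": {"index": "articles_v2"},
}

# Tokens reported above for the text "我是程序员".
IK_SMART_TOKENS = ["我", "是", "程序员"]
IK_MAX_WORD_TOKENS = ["我", "是", "程序员", "程序", "员"]


def term_is_indexed(term, index_tokens):
    """A query term can only hit a document if that exact token was indexed."""
    return term in index_tokens


print(term_is_indexed("程序", IK_SMART_TOKENS))     # False
print(term_is_indexed("程序", IK_MAX_WORD_TOKENS))  # True
print(json.dumps(new_index_body, ensure_ascii=False, indent=2))
```

Sending these bodies to a cluster requires an HTTP client or the Kibana console, which is left out here so that the sketch runs without a server.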
