Elasticsearch
What is ES?
Official documentation: Documentation
A platform for searching and analyzing data.
It makes large volumes of data conveniently searchable, analyzable, and explorable.
First, the user submits data to Elasticsearch. An analyzer then splits the relevant text into terms, and the terms and their weights are stored alongside the data. When the user later searches, results are scored and ranked by those weights before being returned.
Components
- Index: comparable to a table
- Document: comparable to a row
- Field: comparable to a column
Inverted index
The key to how ES implements search.
When a document is indexed, it is split into terms; ES sorts those terms and records which documents each term appears in.
At query time this makes it fast to find the documents that contain a given term.
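As a toy illustration of the idea above (not ES's actual implementation, which is Lucene's segment-based index with positions, frequencies, and norms), an inverted index can be sketched as a map from term to the sorted set of document ids containing it:

```java
import java.util.*;

// Build a toy inverted index: token -> set of document ids that contain it.
public class InvertedIndexDemo {
    static Map<String, SortedSet<Integer>> build(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // naive whitespace "analyzer"; ES would run a real analyzer here
            for (String token : docs.get(docId).toLowerCase().split("\\s+")) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Elasticsearch is a search engine",
                "Lucene powers the search engine",
                "Kibana visualizes Elasticsearch data");
        Map<String, SortedSet<Integer>> index = build(docs);
        // A query for "search" jumps straight to the posting list [0, 1]
        System.out.println(index.get("search"));        // [0, 1]
        System.out.println(index.get("elasticsearch")); // [0, 2]
    }
}
```

The point is that lookup cost depends on the term, not on scanning every document.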
Scoring
Finding the data the user actually wants more accurately is where scoring comes in.
Scoring is based on:
- Term frequency
- Inverse document frequency
- Document length
A quick note on inverse document frequency, with a small example: suppose there are 10 documents and all of them contain the term A, but only one also contains B. When searching for "A B", the document that contains both A and B will score noticeably higher.
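The example can be made concrete with the classic IDF formula idf = ln(N / df). ES 7.x actually uses BM25's smoothed variant, but the intuition is identical: the fewer documents a term appears in, the more it contributes to the score.

```java
// Simplified classic IDF to illustrate why rare terms score higher.
public class IdfDemo {
    static double idf(int totalDocs, int docsContainingTerm) {
        return Math.log((double) totalDocs / docsContainingTerm);
    }

    public static void main(String[] args) {
        int n = 10;
        double idfA = idf(n, 10); // "A" occurs in all 10 docs
        double idfB = idf(n, 1);  // "B" occurs in only 1 doc
        System.out.printf("idf(A)=%.3f idf(B)=%.3f%n", idfA, idfB);
        // idf(A) is 0, idf(B) is about 2.303, so the doc matching
        // both A and B wins almost entirely on its B match
    }
}
```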
Syntax
JSON-based query:
{
  "query": {
    "match": {
      "message": "Elasticsearch is powerful"
    }
  }
}
SQL-based query:
SELECT name, age FROM users WHERE age > 30 ORDER BY age DESC
Query conditions
See the official documentation for details:
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-match-query.html
ES clients
How do you actually use ES?
- REST API: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/rest-apis.html
- Spring Data Elasticsearch: https://spring.io/projects/spring-data-elasticsearch
- The official Java client: https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/7.17/connecting.html
- Kibana
Feeding business data into ES
Scheduled tasks
Periodically query the relational database and write the data that needs to be searchable into ES.
Double write
In application code, write to ES whenever you write to the database; deletes work the same way.
Introducing Logstash
Typically deployed together with the Kafka message queue and Beats data collectors.
Configuration-driven, decoupled from business code, and easy to extend.
Listening to the binlog
Whenever data changes, sync the change to ES; Alibaba's open-source Canal can do this.
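A minimal sketch of the double-write approach, with in-memory maps standing in for MySQL and ES (all names here are hypothetical, not taken from the project). The key caveat is that the two writes are not atomic, so real code needs a retry path for the ES write:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Double write: every save/delete touches the source-of-truth DB first,
// then the search index. In-memory maps simulate the two stores.
public class DoubleWriteDemo {
    static final Map<Long, String> DB = new ConcurrentHashMap<>(); // stands in for MySQL
    static final Map<Long, String> ES = new ConcurrentHashMap<>(); // stands in for Elasticsearch

    static void saveQuestion(long id, String title) {
        DB.put(id, title); // 1. write the source of truth first
        ES.put(id, title); // 2. then index into ES (retry on failure in real code)
    }

    static void deleteQuestion(long id) {
        DB.remove(id);     // deletes follow the same pattern
        ES.remove(id);
    }

    public static void main(String[] args) {
        saveQuestion(1L, "What is an inverted index?");
        System.out.println(DB.get(1L).equals(ES.get(1L))); // true: both stores agree
        deleteQuestion(1L);
        System.out.println(DB.containsKey(1L) || ES.containsKey(1L)); // false
    }
}
```

Compared with scheduled sync, double write keeps ES fresher but couples every write path to ES.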
Extras
ELK: a log analysis platform made up of Elasticsearch, the data collection and log parsing engine Logstash, and the analysis and visualization platform Kibana.
Download and installation
The Elasticsearch and Kibana versions must match each other; also check which Spring Boot version the project uses.
Spring Boot 2.x pairs with Spring Data Elasticsearch 4.x, which in turn pairs with ES 7.x.
Install ES
官方文档:https://www.elastic.co/guide/en/elasticsearch/reference/7.17/setup.html
Start:
.\bin\elasticsearch.bat
Test:
curl -X GET "localhost:9200/?pretty"
Installing ES as a Windows service
.\bin\elasticsearch-service.bat
Usage: elasticsearch-service.bat install|remove|start|stop|manager [SERVICE_ID]
Install Kibana
Download the matching version, unpack it, and run:
.\bin\kibana.bat
Once startup succeeds, visit localhost:5601 to get started.
Test
Open Kibana's Dev Tools, where commands can be tried out interactively:
POST /_analyze
{
  "analyzer": "standard",
  "text": "这是一个测试命令"
}
After running it, the tokenization result appears on the right-hand panel.
Installing the ik Chinese analyzer
You can install it directly with a command, or download the plugin package and install it manually. The plugin version must match your ES version.
安装包:https://release.infinilabs.com/analysis-ik/stable/
github:https://github.com/infinilabs/analysis-ik
.\bin\elasticsearch-plugin.bat install https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-7.17.23.zip
ik provides two tokenization strategies:
ik_smart: coarse-grained segmentation that favors natural words
ik_max_word: segments into as many terms as possible
Test it again in Kibana:
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "张三是好学生"
}
Create an index
PUT /question_v1
{
  "aliases": {
    "question": {}
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "tags": {
        "type": "keyword"
      },
      "answer": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "userId": {
        "type": "long"
      },
      "editTime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "createTime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "updateTime": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "isDelete": {
        "type": "keyword"
      }
    }
  }
}
- Index alias: used by external queries as a level of indirection, so query code can stay unchanged when the index name changes
- Fields defined as the text type suit longer content that needs full-text search
- keyword stores the value untokenized and is used for exact matching
- ignore_above: values longer than this limit are not indexed for exact matching
- Normalize date fields to a consistent format; it keeps searching and querying efficient
- Every type supports arrays, e.g. tags: if the value is an array, ES stores each element and matches them individually at query time
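One practical consequence of the date mappings above: every document must serialize its date fields in exactly the declared yyyy-MM-dd HH:mm:ss pattern, or ES rejects the document (typically with a mapper_parsing_exception). A shared formatter keeps all writers consistent; this is a sketch, not code from the project:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Single shared formatter matching the index mapping's date format.
public class EsDateFormat {
    static final DateTimeFormatter ES_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String format(LocalDateTime t) {
        return t.format(ES_DATE);
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDateTime.of(2023, 9, 1, 9, 0, 0)));
        // 2023-09-01 09:00:00
    }
}
```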
Backend integration
Add the dependency:
<!-- elasticsearch -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
Add the configuration:
spring:
  elasticsearch:
    uris: http://xxx:9200
    username: elastic
    password: coder_yupi_swag
Operate on ES with the template Spring provides:
@Resource
private ElasticsearchRestTemplate elasticsearchRestTemplate;
Write unit tests:
package com.yupi.mianshiya.es;

import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.IndexOperations;
import org.springframework.data.elasticsearch.core.document.Document;
import org.springframework.data.elasticsearch.core.query.*;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;

import java.util.HashMap;
import java.util.Map;

import static org.assertj.core.api.Assertions.assertThat;

@SpringBootTest
public class ElasticsearchRestTemplateTest {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    private final String INDEX_NAME = "test_index";

    // Index (create) a document
    @Test
    public void indexDocument() {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "Elasticsearch Introduction");
        doc.put("content", "Learn Elasticsearch basics and advanced usage.");
        doc.put("tags", "elasticsearch,search");
        doc.put("answer", "Yes");
        doc.put("userId", 1L);
        doc.put("editTime", "2023-09-01 10:00:00");
        doc.put("createTime", "2023-09-01 09:00:00");
        doc.put("updateTime", "2023-09-01 09:10:00");
        doc.put("isDelete", false);
        IndexQuery indexQuery = new IndexQueryBuilder().withId("1").withObject(doc).build();
        String documentId = elasticsearchRestTemplate.index(indexQuery, IndexCoordinates.of(INDEX_NAME));
        assertThat(documentId).isNotNull();
    }

    // Get (retrieve) a document by ID
    @Test
    public void getDocument() {
        String documentId = "1"; // Replace with the actual ID of an indexed document
        Map<String, Object> document = elasticsearchRestTemplate.get(documentId, Map.class, IndexCoordinates.of(INDEX_NAME));
        assertThat(document).isNotNull();
        assertThat(document.get("title")).isEqualTo("Elasticsearch Introduction");
    }

    // Update a document
    @Test
    public void updateDocument() {
        String documentId = "1"; // Replace with the actual ID of an indexed document
        Map<String, Object> updates = new HashMap<>();
        updates.put("title", "Updated Elasticsearch Title");
        updates.put("updateTime", "2023-09-01 10:30:00");
        UpdateQuery updateQuery = UpdateQuery.builder(documentId)
                .withDocument(Document.from(updates))
                .build();
        elasticsearchRestTemplate.update(updateQuery, IndexCoordinates.of(INDEX_NAME));
        Map<String, Object> updatedDocument = elasticsearchRestTemplate.get(documentId, Map.class, IndexCoordinates.of(INDEX_NAME));
        assertThat(updatedDocument.get("title")).isEqualTo("Updated Elasticsearch Title");
    }

    // Delete a document
    @Test
    public void deleteDocument() {
        String documentId = "1"; // Replace with the actual ID of an indexed document
        String result = elasticsearchRestTemplate.delete(documentId, IndexCoordinates.of(INDEX_NAME));
        assertThat(result).isNotNull();
    }

    // Delete the entire index
    @Test
    public void deleteIndex() {
        IndexOperations indexOps = elasticsearchRestTemplate.indexOps(IndexCoordinates.of(INDEX_NAME));
        boolean deleted = indexOps.delete();
        assertThat(deleted).isTrue();
    }
}
Business code for reference:
@Override
public Page<Question> searchFromEs(QuestionQueryRequest questionQueryRequest) {
    // Extract parameters
    Long id = questionQueryRequest.getId();
    Long notId = questionQueryRequest.getNotId();
    String searchText = questionQueryRequest.getSearchText();
    List<String> tags = questionQueryRequest.getTags();
    Long questionBankId = questionQueryRequest.getQuestionBankId();
    Long userId = questionQueryRequest.getUserId();
    // Note: ES pages are 0-based
    int current = questionQueryRequest.getCurrent() - 1;
    int pageSize = questionQueryRequest.getPageSize();
    String sortField = questionQueryRequest.getSortField();
    String sortOrder = questionQueryRequest.getSortOrder();
    // Build the query conditions
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    // Filters
    boolQueryBuilder.filter(QueryBuilders.termQuery("isDelete", 0));
    if (id != null) {
        boolQueryBuilder.filter(QueryBuilders.termQuery("id", id));
    }
    if (notId != null) {
        boolQueryBuilder.mustNot(QueryBuilders.termQuery("id", notId));
    }
    if (userId != null) {
        boolQueryBuilder.filter(QueryBuilders.termQuery("userId", userId));
    }
    if (questionBankId != null) {
        boolQueryBuilder.filter(QueryBuilders.termQuery("questionBankId", questionBankId));
    }
    // Must contain every tag
    if (CollUtil.isNotEmpty(tags)) {
        for (String tag : tags) {
            boolQueryBuilder.filter(QueryBuilders.termQuery("tags", tag));
        }
    }
    // Full-text search by keyword
    if (StringUtils.isNotBlank(searchText)) {
        boolQueryBuilder.should(QueryBuilders.matchQuery("title", searchText));
        boolQueryBuilder.should(QueryBuilders.matchQuery("content", searchText));
        boolQueryBuilder.should(QueryBuilders.matchQuery("answer", searchText));
        boolQueryBuilder.minimumShouldMatch(1);
    }
    // Sorting
    SortBuilder<?> sortBuilder = SortBuilders.scoreSort();
    if (StringUtils.isNotBlank(sortField)) {
        sortBuilder = SortBuilders.fieldSort(sortField);
        sortBuilder.order(CommonConstant.SORT_ORDER_ASC.equals(sortOrder) ? SortOrder.ASC : SortOrder.DESC);
    }
    // Pagination
    PageRequest pageRequest = PageRequest.of(current, pageSize);
    // Build the query
    NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(boolQueryBuilder)
            .withPageable(pageRequest)
            .withSorts(sortBuilder)
            .build();
    SearchHits<QuestionEsDTO> searchHits = elasticsearchRestTemplate.search(searchQuery, QuestionEsDTO.class);
    // Reuse the MySQL-style page object to wrap the results
    Page<Question> page = new Page<>();
    page.setTotal(searchHits.getTotalHits());
    List<Question> resourceList = new ArrayList<>();
    if (searchHits.hasSearchHits()) {
        List<SearchHit<QuestionEsDTO>> searchHitList = searchHits.getSearchHits();
        for (SearchHit<QuestionEsDTO> questionEsDTOSearchHit : searchHitList) {
            resourceList.add(QuestionEsDTO.dtoToObj(questionEsDTOSearchHit.getContent()));
        }
    }
    page.setRecords(resourceList);
    return page;
}