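In Doris, real-time data ingestion is done through Stream Load. In our experience, the Stream Load frequency and the amount of data in each load have a major impact on the stability of the Doris service.
The latest Doris release at the time of writing is 0.13.0, and the recommendation is still to load data into Doris in micro-batches through Stream Load at minute-level intervals. Monitoring and alerting are therefore a must.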
Grafana: view the monitoring data
Prometheus: alert on abnormal metrics
Doris itself: stores the statistics, such as Stream Load frequency and counts
OpenResty is an application platform built on nginx that bundles many useful features, such as running Lua scripts directly. More information: https://openresty.org/cn/
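1. Stream Load frequency per user
2. Stream Load frequency per table
3. Stream Load frequency per db
4. Duration and result of each Stream Load
5. Stream Load counts aggregated by day and other dimensions
6. Stream Load rate limiting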
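As a rough illustration of metrics 1 to 3, the sketch below assumes the Stream Load URL layout `/api/{db}/{table}/_stream_load` that appears in the access log entry later in this post, and counts loads per user, db, and table for each minute. The function names and the shape of the input records are hypothetical; they are not part of Doris or OpenResty.

```python
import re
from collections import Counter

# Assumed URL layout of a Stream Load request: /api/{db}/{table}/_stream_load
STREAM_LOAD_PATH = re.compile(r"^/api/(?P<db>[^/]+)/(?P<table>[^/]+)/_stream_load$")


def parse_target(path):
    """Return (db, table) for a Stream Load path, or None for any other path."""
    match = STREAM_LOAD_PATH.match(path)
    return (match["db"], match["table"]) if match else None


def loads_per_minute(records):
    """Count loads per (minute, user, db, table).

    `records` is an iterable of (minute, user, path) tuples, where `minute`
    is any hashable minute bucket such as "2020-12-10 22:35".
    """
    counts = Counter()
    for minute, user, path in records:
        target = parse_target(path)
        if target is not None:  # ignore requests that are not Stream Loads
            counts[(minute, user, *target)] += 1
    return counts
```

Dropping `user` or `table` from the key gives the per-table or per-db view, and swapping the minute bucket for a date gives the daily counts of metric 5.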
wget https://openresty.org/package/centos/openresty.repo
sudo mv openresty.repo /etc/yum.repos.d/
sudo yum check-update
sudo yum install -y openresty
Default installation directory: /usr/local/openresty/
vim /usr/local/openresty/nginx/conf/nginx.conf
The contents are as follows:
http {
include mime.types;
default_type application/octet-stream;
include /etc/nginx/conf.d/*.conf; ## needs to be modified
sendfile on;
keepalive_timeout 65;
server {
listen 80;
server_name localhost;
location / {
root html;
index index.html index.htm;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}
}
- Configure Stream Load forwarding
Add a configuration file in the corresponding conf.d/ directory: vim /etc/nginx/conf.d/doris_stream_load.conf
upstream normal_fe {
server fe_ip:fe_http_port; ## replace with the FE IP and the FE HTTP port
}
underscores_in_headers on;
log_format load_access_log_format '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'$upstream_response_time $request_time $content_length $http_label $resp_body';
server {
listen 9001;
access_log /data/logs/nginx/access.log load_access_log_format; ## change the directory
error_log /data/logs/nginx/error.log; ## change the directory
client_max_body_size 100000M;
proxy_connect_timeout 300;
proxy_send_timeout 300;
proxy_read_timeout 300;
send_timeout 300;
underscores_in_headers on;
set $resp_body "";
lua_need_request_body on;
body_filter_by_lua '
local resp_body = string.sub(ngx.arg[1], 1, 1000)
ngx.ctx.buffered = (ngx.ctx.buffered or "") .. resp_body
if ngx.arg[2] then
ngx.var.resp_body = ngx.ctx.buffered
end
';
location / {
proxy_pass http://normal_fe;
proxy_set_header Expect '100-continue';
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_intercept_errors on;
error_page 301 302 307 = @mirrorredirect;
}
location @mirrorredirect {
set $redirect_uri '$upstream_http_location';
proxy_pass $redirect_uri;
proxy_set_header Expect '100-continue';
proxy_pass_request_body on;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
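With this configuration in place, a client sends its Stream Load to port 9001 on the OpenResty server instead of directly to the FE. The sketch below only builds such a request with Python's standard library and does not send it. The host `openresty-host`, the credentials, the sample rows, and the helper name are placeholders. The `label` header is the value that `$http_label` records in the log format, and a header such as `column_separator` is one reason `underscores_in_headers` is switched on.

```python
import base64
import urllib.request


def build_stream_load_request(proxy, db, table, label, user, password, data):
    """Build (but do not send) a Stream Load PUT request aimed at the proxy."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"http://{proxy}/api/{db}/{table}/_stream_load",
        data=data,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",  # Stream Load uses HTTP basic auth
            "label": label,                     # logged by nginx as $http_label
            "column_separator": ",",            # assumes comma-separated rows
        },
    )


request = build_stream_load_request(
    proxy="openresty-host:9001",  # placeholder: address of the OpenResty server
    db="test",
    table="tbl_stream_load_mirror_02",
    label="stream_load_mirror_009",
    user="admin",
    password="",                  # placeholder credentials
    data=b"1,foo\n2,bar\n",       # made-up sample rows
)
# Sending it would be: urllib.request.urlopen(request).read()
```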
### Start OpenResty
### Send a Stream Load request; the OpenResty access log now contains the monitoring metrics, as in the entry below
1.1.3.65 - admin [10/Dec/2020:22:35:36 +0800] "PUT /api/test/tbl_stream_load_mirror_02/_stream_load HTTP/1.1" 200 447 "-" "curl/7.29.0" "-" 0.003 : 0.032 3.034 390 stream_load_mirror_009 {\x0A \x22TxnId\x22: 14613236,\x0A \x22Label\x22: \x22stream_load_mirror_009\x22,\x0A \x22Status\x22: \x22Success\x22,\x0A \x22Message\x22: \x22OK\x22,\x0A \x22NumberTotalRows\x22: 10,\x0A \x22NumberLoadedRows\x22: 10,\x0A \x22NumberFilteredRows\x22: 0,\x0A \x22NumberUnselectedRows\x22: 0,\x0A \x22LoadBytes\x22: 390,\x0A \x22LoadTimeMs\x22: 31,\x0A \x22BeginTxnTimeMs\x22: 1,\x0A \x22StreamLoadPutTimeMs\x22: 1,\x0A \x22ReadDataTimeMs\x22: 0,\x0A \x22WriteDataTimeMs\x22: 14,\x0A \x22CommitAndPublishTimeMs\x22: 14\x0A}
Each field corresponds one-to-one to load_access_log_format. Load this data into Doris and all the monitoring data we want is available.
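Before the log can be loaded into Doris, each line has to be split back into those fields. The following is a minimal sketch of one way to do that with a regular expression that mirrors load_access_log_format. The function name is hypothetical, and the sample line is the entry above with its response body shortened.

```python
import re
from datetime import datetime

# Mirrors load_access_log_format field by field.
LOG_LINE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)" '
    # nginx joins the times of several upstream hops with ", " or " : "
    r'(?P<upstream_response_time>[\d.-]+(?:(?: :|,) [\d.-]+)*) '
    r'(?P<request_time>[\d.]+) (?P<content_length>\S+) (?P<label>\S+) '
    r'(?P<resp_body>.*)$'
)


def parse_access_log_line(line):
    """Split one access log line into a dict of fields, or return None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["request_time"] = float(fields["request_time"])
    fields["time_local"] = datetime.strptime(
        fields["time_local"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields


sample = (
    r'1.1.3.65 - admin [10/Dec/2020:22:35:36 +0800] '
    r'"PUT /api/test/tbl_stream_load_mirror_02/_stream_load HTTP/1.1" 200 447 '
    r'"-" "curl/7.29.0" "-" 0.003 : 0.032 3.034 390 stream_load_mirror_009 '
    r'{\x0A \x22Status\x22: \x22Success\x22\x0A}'
)
record = parse_access_log_line(sample)
```

The two values in `upstream_response_time` come from the two upstream hops: the request to the FE and the redirected request handled in `@mirrorredirect`.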
The JSON at the end is the Stream Load response. If we:
replace \x0A with \n
replace \x22 with "
we get the response in its normal decoded form.
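The two replacements can be generalized to every `\xNN` escape that nginx writes into the log. The sketch below decodes such a string and parses it as JSON; the function name is hypothetical, and the input is a shortened copy of the response body from the log entry above.

```python
import json
import re


def decode_resp_body(escaped):
    """Undo nginx's \\xNN log escaping and parse the Stream Load JSON result."""
    raw = re.sub(
        rb"\\x([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),  # \x22 -> the byte 0x22, etc.
        escaped.encode("utf-8"),
    )
    return json.loads(raw.decode("utf-8"))


# Shortened copy of the response body from the log entry above.
escaped = (
    r'{\x0A \x22TxnId\x22: 14613236,\x0A \x22Label\x22: \x22stream_load_mirror_009\x22,'
    r'\x0A \x22Status\x22: \x22Success\x22,\x0A \x22NumberLoadedRows\x22: 10,'
    r'\x0A \x22LoadTimeMs\x22: 31\x0A}'
)
result = decode_resp_body(escaped)
```

Note that the body filter in the configuration keeps at most 1000 bytes of each response chunk, so a very long response may be cut off in the log and fail to parse as JSON.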
白老虎