Optimizing Excessive Memory Usage in Prometheus: A Reference


High memory usage is a common problem with Prometheus, especially when there are many scrape targets or a large volume of data.

Prometheus's built-in monitoring: Prometheus exposes metrics about itself, which you can view at `http://<host>:9090/metrics`.

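Useful metrics include:

- `process_resident_memory_bytes`: physical (resident) memory used by the Prometheus process.
- `go_memstats_alloc_bytes`: memory currently allocated by the Go runtime.
- `prometheus_tsdb_head_series`: number of time series currently in the head (in-memory) block.
- `prometheus_tsdb_head_chunks`: number of chunks currently held in the head block.

**Log check**: review the Prometheus logs for memory-related warnings or errors. They are usually under `/var/log/prometheus/`; on systemd hosts, use `systemctl status prometheus` for recent lines or `journalctl -u prometheus` for the full log.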
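To check these four metrics programmatically, you can parse the text returned by `/metrics`. The minimal sketch below assumes the standard Prometheus text exposition format and that all four metrics are unlabeled gauges. To stay self-contained, it parses a made-up sample string instead of fetching over HTTP. The function name `read_memory_metrics` is hypothetical.

```python
from typing import Dict

# The metrics discussed above; on a default build each is a single unlabeled gauge.
WATCHED = (
    "process_resident_memory_bytes",
    "go_memstats_alloc_bytes",
    "prometheus_tsdb_head_series",
    "prometheus_tsdb_head_chunks",
)


def read_memory_metrics(exposition: str) -> Dict[str, float]:
    """Return the value of each watched metric found in text-format output."""
    values: Dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # An unlabeled sample looks like "<name> <value> [timestamp]".
        parts = line.split()
        if len(parts) >= 2 and parts[0] in WATCHED:
            values[parts[0]] = float(parts[1])
    return values


# Made-up sample values, for illustration only.
sample = """\
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.147483648e+09
go_memstats_alloc_bytes 1.2e+09
prometheus_tsdb_head_series 350000
prometheus_tsdb_head_chunks 700000
"""

metrics = read_memory_metrics(sample)
print(metrics["process_resident_memory_bytes"] / 2**30, "GiB resident")
```

In practice you would feed it the body of an HTTP GET against your own Prometheus server. Matching on the exact name keeps similar metrics such as `go_memstats_alloc_bytes_total` out of the result.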

You can reduce memory usage by tuning the Prometheus configuration file (usually `prometheus.yml`).

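1. Reduce the number of scrape targets

Each scrape target contributes its own time series, so every target adds to Prometheus's memory usage. If there are too many targets, consider consolidating similar scrape jobs. Use `relabel_configs` to drop targets you do not need; filtering individual metrics is done with `metric_relabel_configs`, covered below.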

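2. Reduce the scrape frequency

Increasing `scrape_interval` (the time between scrapes) reduces the number of samples ingested. This can lower memory usage, at the cost of coarser data resolution. The number of active series stays the same, so the savings are usually smaller than those from cutting series.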
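The trade-off is simple arithmetic: each active series produces one sample per scrape, so the ingest rate is `active_series / scrape_interval`. The sketch below uses a hypothetical workload of 300,000 series to show how the rate falls as the interval grows, while the series count stays fixed.

```python
def ingest_rate(active_series: int, scrape_interval_s: float) -> float:
    """Samples per second: every active series yields one sample per scrape."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape_interval_s must be positive")
    return active_series / scrape_interval_s


def samples_per_day(active_series: int, scrape_interval_s: float) -> float:
    """Total samples ingested over 24 hours at a constant scrape interval."""
    return ingest_rate(active_series, scrape_interval_s) * 86_400


# Hypothetical workload: 300,000 active series.
for interval in (15, 30, 60):
    print(
        f"{interval:>2}s -> {ingest_rate(300_000, interval):>8.0f} samples/s, "
        f"{samples_per_day(300_000, interval):,.0f} samples/day"
    )
```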

3. Limit the number of time series

Use `metric_relabel_configs` to drop unnecessary metrics or labels.

- Example:

```yaml
scrape_configs:
  - job_name: 'example'
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'up|process_cpu_seconds_total'  # keep only these metrics
        action: keep
```
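To see what a `keep` rule like this does to scraped series, the Python sketch below emulates the action. It joins the values of `source_labels` with the default `;` separator and keeps a series only if the joined value matches the regex in full, because Prometheus anchors relabel regexes at both ends. The function name `relabel_keep` and the sample series are hypothetical. Python's `re` stands in for Go's RE2 here, so unusual regex syntax may behave differently.

```python
import re
from typing import Dict, List

Series = Dict[str, str]


def relabel_keep(series: List[Series], source_labels: List[str], regex: str,
                 separator: str = ";") -> List[Series]:
    """Emulate the relabel `keep` action: keep a series only when the joined
    source label values fully match `regex` (missing labels count as "")."""
    pattern = re.compile(f"(?:{regex})")
    kept = []
    for labels in series:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(value):  # anchored match, like Prometheus
            kept.append(labels)
    return kept


scraped = [
    {"__name__": "up", "job": "example"},
    {"__name__": "process_cpu_seconds_total", "job": "example"},
    {"__name__": "http_requests_total", "job": "example", "path": "/api"},
    {"__name__": "up_time_custom", "job": "example"},  # dropped: regex is anchored
]
kept = relabel_keep(scraped, ["__name__"], "up|process_cpu_seconds_total")
print([s["__name__"] for s in kept])  # ['up', 'process_cpu_seconds_total']
```

The anchoring is the detail that most often surprises people: `up` does not match `up_time_custom`, so a prefix match needs an explicit `up.*`.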

4. Adjust storage settings

Data retention and chunk segment size are not `storage.tsdb` keys in `prometheus.yml`. They are command-line flags passed when starting Prometheus, and are described in the TSDB section below.

Prometheus stores data in its built-in TSDB (time series database), and tuning TSDB settings can reduce resource usage.

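1. Shorten data retention

The `--storage.tsdb.retention.time` flag sets how long data is kept:

```
--storage.tsdb.retention.time=15d  # the default is 15 days; adjust as needed
```

Retention mainly bounds disk usage and the range of data available to queries. Resident memory is dominated by the in-memory head block, which holds only the most recent few hours of data. Shortening retention therefore usually saves more disk than RAM.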
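The Prometheus storage documentation gives a capacity-planning formula that makes the effect of retention concrete: `needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, with roughly 1 to 2 bytes per sample after compression. The sketch below applies it to a hypothetical ingest rate of 20,000 samples/s, using the upper estimate of 2 bytes per sample.

```python
DAY = 86_400  # seconds


def needed_disk_bytes(retention_seconds: float, samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Disk estimate: retention_time_seconds * ingested_samples_per_second * bytes_per_sample."""
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical ingest rate; compare the default 15d with a shorter 7d retention.
for days in (15, 7):
    gib = needed_disk_bytes(days * DAY, 20_000) / 2**30
    print(f"{days:>2}d retention -> about {gib:.1f} GiB on disk")
```

The estimate scales linearly with retention, which is why this flag is primarily a disk-usage lever.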

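2. Chunk segment size

The `--storage.tsdb.max-block-chunk-segment-size` flag sets the maximum size of a single chunk segment file inside a persisted block. It does not limit the size of individual chunks:

```
--storage.tsdb.max-block-chunk-segment-size=512MB
```

This is an advanced flag that mainly affects on-disk file layout; its effect on memory is likely small.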

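3. Memory-mapped head chunks

Since Prometheus v2.19, full chunks in the head block are memory-mapped from disk by default, which reduces resident memory. There is no separate `--storage.tsdb.memory-mapping` flag to enable; to benefit from this, run v2.19 or later.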

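If a single Prometheus instance cannot handle all the data, consider sharding and federation.

1. Sharding

Split the scrape targets across several Prometheus instances so that each instance handles only part of the data.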

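```yaml
# prometheus.yml on Prometheus instance 1
scrape_configs:
  - job_name: 'shard1'
    static_configs:
      - targets: ['target1:9090', 'target2:9090']

# prometheus.yml on Prometheus instance 2
scrape_configs:
  - job_name: 'shard2'
    static_configs:
      - targets: ['target3:9090', 'target4:9090']
```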
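Splitting targets by hand, as above, gets tedious as the fleet grows. Prometheus also offers the `hashmod` relabel action: every instance loads the same target list, a `hashmod` rule writes a hash bucket into a temporary label, and a `keep` rule on that label retains only the instance's own shard. The Python sketch below mimics that hash on `__address__` to preview how targets would be distributed. The byte-level details (MD5 of the value, last 8 bytes read as a big-endian integer, modulo the shard count) are an assumption based on the Prometheus source and may differ between versions. `hashmod` and `assign_shards` here are hypothetical Python helpers, not Prometheus APIs.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def hashmod(value: str, modulus: int) -> int:
    """Assumed behaviour of the relabel `hashmod` action: MD5 digest of the
    value, last 8 bytes as a big-endian unsigned int, modulo `modulus`."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def assign_shards(targets: List[str], shards: int) -> Dict[int, List[str]]:
    """Group target addresses by the shard index their hash maps to."""
    plan: Dict[int, List[str]] = defaultdict(list)
    for target in targets:
        plan[hashmod(target, shards)].append(target)
    return dict(plan)


targets = ["target1:9090", "target2:9090", "target3:9090", "target4:9090"]
for shard, members in sorted(assign_shards(targets, 2).items()):
    print(f"shard {shard}: {members}")
```

With only a handful of targets the split can be uneven; hashing evens out only as the target count grows.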

2. Federation

Use federation to aggregate data from several Prometheus instances into a central instance.

```yaml
scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':  # the /federate endpoint expects the match[] parameter
        - '{job="example"}'
    static_configs:
      - targets: ['prometheus1:9090', 'prometheus2:9090']
```
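The `/federate` endpoint reads its series selectors from the `match[]` query parameter, and each entry under `params` in a scrape config is sent as a query parameter on the scrape request. The sketch below builds the kind of URL a federating server requests, which is handy for testing a selector with `curl` or a browser first. The helper name `federate_url` is hypothetical; the hosts come from the example config.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit


def federate_url(host: str, matchers: List[str]) -> str:
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"http://{host}/federate?{query}"


url = federate_url("prometheus1:9090", ['{job="example"}'])
print(url)
# Decoding the query string recovers the selector unchanged.
print(parse_qs(urlsplit(url).query))
```

Passing several selectors yields repeated `match[]` parameters, and the endpoint returns the union of the matching series.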

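Newer releases usually bring memory and performance improvements, so upgrading Prometheus is also worth considering. If local storage is under too much pressure, consider a remote storage solution (such as Thanos, Cortex, or M3DB) to extend Prometheus's storage capacity.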

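Set up an alert on Prometheus memory usage so that problems are detected and addressed early. Example alerting rule:

```yaml
groups:
  - name: prometheus_memory
    rules:
      - alert: PrometheusMemoryUsageHigh
        expr: process_resident_memory_bytes / 1024 / 1024 > 4096  # memory above 4 GiB
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus memory usage is high"
          description: "Prometheus instance {{ $labels.instance }} is using {{ $value }} MB of memory."
```

To summarize:

- Tune the configuration: reduce the number of scrape targets and the scrape frequency.
- Limit the number of time series held in memory and review TSDB settings.
- Use sharding and federation to spread the load.
- Upgrade to the latest version.
- Use remote storage to extend capacity.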

If the problem persists, investigate further using Prometheus's own metrics and logs.