
From Datadog to Prometheus

Cost comparison, a phase-by-phase migration plan, and the automation to execute it.

Effort: High
Est. timeline: ~18 wks
Prometheus model: Free (self-hosted)
Open source: Yes

3-year cost calculator

Pre-filled for Datadog → Prometheus. Adjust every figure with your own numbers.

Every figure here is an illustrative estimate, not a vendor quote. Defaults are editable starting points compiled from public information; real, binding pricing comes from the vendor or an authorized distributor. See our methodology.

Sized at 300 monitored hosts; all costs are computed on this figure.

Stay on Datadog (3yr): $243,000
Move to Prometheus (3yr + migration): $88,200
Projected savings: $154,800 (64%)
Payback period: 11.4 mo

All figures are illustrative and fully editable — adjust the cost-per-host and migration inputs with your own numbers. Not guaranteed vendor pricing (defaults reviewed May 2026). For a binding quote, use the request form below to reach an authorized distributor or partner.
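
The savings figures above follow from simple arithmetic on the two three-year totals. A minimal sketch using only the illustrative defaults shown here; the function name is hypothetical, and the payback period is not recomputed because it depends on migration-cost inputs that are not listed on this page:

```python
def summarize_savings(stay_cost, move_cost, hosts, years=3):
    """Derive savings figures from two multi-year cost totals."""
    savings = stay_cost - move_cost
    return {
        "savings": savings,
        # Savings as a whole-number percentage of the stay-put cost.
        "savings_pct": round(100 * savings / stay_cost),
        # Implied blended cost per host per month if you stay put.
        "stay_per_host_month": stay_cost / (hosts * years * 12),
    }


# Illustrative defaults from the calculator (300 monitored hosts).
summary = summarize_savings(stay_cost=243_000, move_cost=88_200, hosts=300)
print(summary)
```

With the defaults this reproduces the $154,800 (64%) savings and implies a blended Datadog cost of $22.50 per host per month. Swap in your own totals to check the calculator against a real quote.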

Quick comparison: Datadog vs Prometheus

Common trade-offs teams weigh when staying on Datadog versus moving to Prometheus. These are general, commonly-reported considerations — not statements of fact about any vendor — so check them against your own contract and the vendors' current terms.

Datadog (current)
Datadog · Usage-based (host + ingest)
  • Already in production — no migration effort or risk
  • Mature ecosystem with vendor support and SLAs
  • Per-host plus per-GB ingest billing is hard to predict
  • Costs spike with custom metrics and high cardinality
  • Each module (APM, logs, RUM) is billed separately
  • Bill shock at scale is widely reported
  • Ongoing usage-based (host + ingest) cost to budget for
  • Higher vendor lock-in to weigh
Prometheus (planned)
Open source · Free (self-hosted)
  • Open source — no license fees
  • No vendor lock-in
  • Cost model: Free (self-hosted)
  • Requires a migration (~18 weeks, high effort)
  • Community support by default — paid support optional
  • Higher operational learning curve

In-depth guide

Datadog is a superb, all-in-one observability platform, but its consumption-based pricing (per host, plus per-GB ingest, plus per-module for APM, logs, RUM, and more) is commonly cited as one of the harder line items to forecast in modern infrastructure. Bill shock at scale is widely reported. The most common open-source replacement is a Prometheus + Grafana stack, often extended with Loki for logs and Tempo for traces. This guide is about doing that swap deliberately, not heroically.

Why teams leave Datadog

It’s almost never dissatisfaction with the product. It’s the cost trajectory: every new host, custom metric, high-cardinality tag, and module compounds the bill, and forecasting it is hard. Teams with the engineering capacity to run their own stack can cut spend dramatically — trading a managed SaaS for operational ownership.

What you’re actually replacing

Datadog bundles several products. Map each before you start:

  • Infrastructure metrics → Prometheus (with node_exporter, cAdvisor, and app /metrics endpoints) or the OpenTelemetry Collector.
  • Dashboards → Grafana.
  • Monitors/alerts → Prometheus alerting rules + Alertmanager (routing, grouping, silences, paging).
  • Logs → Loki (or Elasticsearch/OpenSearch).
  • APM/traces → Tempo (or Jaeger), instrumented via OpenTelemetry.
  • Synthetics/RUM → separate tooling (e.g., Blackbox exporter for uptime; RUM has fewer turnkey OSS options).

The honest gap: Datadog’s correlation across metrics/logs/traces and its polished UX take real effort to approximate. Grafana + Loki + Tempo get you most of the way, but you own the integration.
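
One way to make the mapping above actionable is to keep it as data and check your own Datadog usage against it. A minimal sketch: the dictionary keys and the function name are hypothetical labels, and the replacement names are the ones listed in this guide:

```python
# Datadog product -> open-source replacement(s), as listed in this guide.
# The keys are made-up labels for illustration, not Datadog identifiers.
REPLACEMENTS = {
    "infrastructure_metrics": ["Prometheus", "OpenTelemetry Collector"],
    "dashboards": ["Grafana"],
    "monitors": ["Prometheus alerting rules", "Alertmanager"],
    "logs": ["Loki", "Elasticsearch/OpenSearch"],
    "apm_traces": ["Tempo", "Jaeger"],
    "synthetics": ["Blackbox exporter"],
    "rum": [],  # fewer turnkey OSS options; needs separate tooling
}


def plan_replacements(products_in_use):
    """Split the products in use into mapped targets and open gaps."""
    mapped, gaps = {}, []
    for product in products_in_use:
        targets = REPLACEMENTS.get(product, [])
        if targets:
            mapped[product] = targets
        else:
            gaps.append(product)
    return mapped, gaps


mapped, gaps = plan_replacements(["infrastructure_metrics", "logs", "rum"])
print("mapped:", mapped)
print("gaps:", gaps)
```

Anything that lands in `gaps` needs an explicit decision (separate tooling, or keeping that one Datadog module) before the migration starts.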

Sizing and cost model

Datadog bills largely per monitored host (plus ingest). Size your migration on the number of hosts/devices under monitoring and your metric/log volume. Self-hosting shifts cost to compute + storage + engineering time — usually far lower at scale, but not zero. Plan retention deliberately: long-retention, high-cardinality metrics are what made Datadog expensive, and they’ll size your Prometheus/Mimir and Loki storage too.
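
For a first-pass storage estimate, the Prometheus documentation suggests multiplying retention time by ingested samples per second by bytes per sample, with roughly 1 to 2 bytes per sample after compression. A rough sketch under stated assumptions: the function name is hypothetical, and 1,000 series per host and a 15-second scrape interval are placeholders to replace with your own measurements:

```python
def estimate_prometheus_disk_bytes(hosts, series_per_host, scrape_interval_s,
                                   retention_days, bytes_per_sample=2.0):
    """Rough local-storage estimate: retention * samples/sec * bytes/sample."""
    active_series = hosts * series_per_host
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed inputs: 300 hosts, 1,000 series each, 15 s scrapes, 30 days kept.
disk = estimate_prometheus_disk_bytes(
    hosts=300, series_per_host=1000, scrape_interval_s=15, retention_days=30
)
print(f"{disk / 1e9:.1f} GB")
```

Doubling retention or halving the scrape interval doubles the estimate. High-cardinality labels raise `series_per_host`, which is usually the input that grows fastest.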

A safe migration flow

  1. Inventory what Datadog is doing for you: dashboards, monitors, integrations, retention, and paging/ticketing hooks. Export dashboards and monitors via the Datadog API.
  2. Stand up the stack. A common path is the kube-prometheus-stack Helm chart (Prometheus + Alertmanager + Grafana) for Kubernetes, plus Loki/Tempo as needed. Deploy node_exporter and instrument apps with exporters or OpenTelemetry.
  3. Recreate the essentials first. Translate your most important monitors into PromQL alerting rules and rebuild the top dashboards in Grafana (or import community equivalents). Don’t try to recreate everything on day one — prioritize what pages humans.
  4. Dual-run. Keep Datadog and the new stack running side by side; compare coverage, alert fidelity, and false-positive rates. This is where you find the gaps.
  5. Cut over paging. Route Alertmanager notifications to PagerDuty/Opsgenie/email once you trust the alerts, then decommission the Datadog agents.
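
The dual-run comparison in step 4 can be reduced to set arithmetic over the alerts each stack fired in the same window. A minimal sketch, assuming you can export a list of alert identifiers from each system and that someone has labeled which ones were real incidents; all names and sample values here are hypothetical:

```python
def compare_dual_run(datadog_alerts, prometheus_alerts, real_incidents):
    """Compare alerts fired by both stacks over the same time window."""
    dd = set(datadog_alerts)
    prom = set(prometheus_alerts)
    real = set(real_incidents)
    return {
        # Real incidents Datadog caught but the new stack did not.
        "missed_by_prometheus": sorted((dd & real) - prom),
        # Alerts only the new stack fired (new coverage or noise).
        "prometheus_only": sorted(prom - dd),
        # New-stack alerts that did not correspond to a real incident.
        "prometheus_false_positives": sorted(prom - real),
        # Share of real incidents the new stack alerted on.
        "coverage": len(prom & real) / len(real) if real else 1.0,
    }


report = compare_dual_run(
    datadog_alerts=["db-latency", "disk-full", "cpu-spike"],
    prometheus_alerts=["db-latency", "cpu-spike", "flappy-healthcheck"],
    real_incidents=["db-latency", "disk-full"],
)
print(report)
```

In this made-up window the new stack misses `disk-full` and covers half of the real incidents, which is the kind of gap the dual-run phase is meant to surface before paging is cut over.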

PromQL is a real shift

Datadog’s query language and Prometheus’s PromQL are different models. Rate calculations, histogram_quantile, label matching, and recording rules all need learning. Budget time for your on-call engineers to get fluent, because alert quality depends on it. Recording rules precompute expensive queries, while sensible scrape intervals and disciplined label use keep sample volume and cardinality (and therefore cost) under control.
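
To give a feel for the target format, here is a sketch that builds a Prometheus alerting rule group for two typical monitors: CPU saturation and p95 latency. The structure (`groups`, `rules`, `alert`, `expr`, `for`, `labels`, `annotations`) follows the Prometheus rule-file schema and `node_cpu_seconds_total` is a node_exporter metric. The group name, thresholds, severities, and the `http_request_duration_seconds_bucket` histogram are assumptions for illustration:

```python
import json


def alert_rule(name, expr, hold_for, severity, summary):
    """Build one alerting rule in the Prometheus rule-file structure."""
    return {
        "alert": name,
        "expr": expr,
        "for": hold_for,  # condition must hold this long before firing
        "labels": {"severity": severity},
        "annotations": {"summary": summary},
    }


rules = {
    "groups": [{
        "name": "migrated-from-datadog",  # hypothetical group name
        "rules": [
            alert_rule(
                "HighCpuUsage",
                # Busy CPU fraction per instance, from the idle-mode counter.
                '1 - avg by (instance) '
                '(rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9',
                "10m", "page", "CPU above 90% for 10 minutes",
            ),
            alert_rule(
                "HighRequestLatencyP95",
                # Assumes the app exposes this histogram; the name is illustrative.
                'histogram_quantile(0.95, sum by (le) '
                '(rate(http_request_duration_seconds_bucket[5m]))) > 0.5',
                "5m", "ticket", "p95 latency above 500 ms",
            ),
        ],
    }]
}

print(json.dumps(rules, indent=2))
```

Prometheus rule files are YAML. The JSON printed here should parse as YAML too, but it is worth confirming with `promtool check rules` before loading it into a live server.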

Validation before you trust it

Before switching paging off Datadog: fire test alerts end-to-end (trigger → Alertmanager → pager → acknowledgement), do a dashboard parity review against the metrics that matter, and run a retention/scale load test so you’re not surprised when storage fills. Treat “we get paged correctly for the incidents we care about” as the acceptance bar.
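
One way to fire a test alert end-to-end is to post a synthetic alert directly to Alertmanager's v2 API (`POST /api/v2/alerts`) and watch it travel through routing to the pager. A minimal sketch that only builds the request: the hostname, alert name, and label values are placeholders, and whether `severity: page` actually pages someone depends on your own routing configuration:

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.request import Request


def build_test_alert_request(alertmanager_url, alertname="MigrationTestAlert",
                             minutes=5):
    """Build (but do not send) a synthetic alert for Alertmanager's v2 API."""
    now = datetime.now(timezone.utc)
    payload = [{
        # Labels drive routing; these values are assumptions for the test.
        "labels": {"alertname": alertname, "severity": "page",
                   "source": "migration-test"},
        "annotations": {"summary": "Synthetic alert: verify routing and paging"},
        "startsAt": now.isoformat(),
        # The alert resolves on its own once endsAt has passed.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]
    return Request(
        url=alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_test_alert_request("http://alertmanager.example.internal:9093")
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req, timeout=10)
```

Record the time from posting to acknowledgement on the pager. That round trip, not the HTTP response, is what the acceptance bar above is measuring.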

Bottom line

Datadog → Prometheus/Grafana is primarily a cost and ownership decision. The metrics and dashboards migrate well; alerting needs PromQL fluency; correlated logs/traces and polished UX take the most effort to match. Run both stacks in parallel, prove alert fidelity, then cut over paging last. Model your per-host savings in the calculator above — and validate the numbers against a real quote, since self-hosting trades license cost for engineering time.

Why teams evaluate alternatives to Datadog

Reasons commonly cited by users and in public industry coverage for re-evaluating Datadog. These are general, reported considerations — not statements of fact about Datadog — and may not reflect your situation or the vendor's current terms. Verify against your own contract before deciding.

  • Per-host plus per-GB ingest billing is hard to predict
  • Costs spike with custom metrics and high cardinality
  • Each module (APM, logs, RUM) is billed separately
  • Bill shock at scale is widely reported

The migration plan

Roughly 18 weeks for a mid-size estate, in six phases.

Assessment & discovery
Inventory every workload, dependency, and integration; flag anything high-risk.
Target design & sizing
Size the new platform, design storage and networking, set RPO/RTO and rollback criteria.
Pilot migration
Migrate a small low-risk set end-to-end and validate the runbook.
↳ Deploy node_exporter and other exporter agents alongside Prometheus + Alertmanager; rebuild Datadog monitors as alerting rules; recreate dashboards in Grafana; dual-run before cutover.
Production migration
Move workloads in scheduled waves using automation; verify after each wave.
Validation & optimization
Tune performance, confirm backup/DR, and update monitoring and docs.
Decommission source
Reclaim licenses, retire old infrastructure, and capture lessons learned.

Tooling & automation

Deploy node_exporter and other exporter agents alongside Prometheus + Alertmanager; rebuild Datadog monitors as alerting rules; recreate dashboards in Grafana; dual-run before cutover.

OffVendor's wizard pre-fills these scripts with your environment — inventory export, disk/schema conversion, bulk provisioning, and validation.

Frequently asked

Is migrating from Datadog to Prometheus worth it?

For most teams facing rising Datadog costs, yes: Prometheus (free, self-hosted) typically lowers 3-year total cost of ownership, though the right answer depends on workload complexity and in-house skills. Use the calculator to model your own numbers.

How long does a Datadog to Prometheus migration take?

A typical estimate for a mid-size estate is around 18 weeks across six phases: discovery, design, pilot, waved production migration, validation, and decommission. Larger or more complex estates take longer.

What tools are used to migrate from Datadog to Prometheus?

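The core toolchain is node_exporter and other exporters for collection, Prometheus + Alertmanager for metrics and alerting, and Grafana for dashboards. In practice you deploy the exporters and Prometheus + Alertmanager, rebuild Datadog monitors as alerting rules, recreate dashboards in Grafana, and dual-run both stacks before cutover.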

Get a vendor-accurate Prometheus quote

A guided builder that turns your estimates into a requirements report you can send to a vendor, partner, or distributor to secure a binding quote.

How this works — and what's yours to provide
  • Your inputs, your responsibility. The figures and estimates here describe your environment and requirements — please make sure they're accurate. OffVendor's defaults are illustrative starting points only, not vendor pricing.
  • It generates a requirements report (RFQ). Use it to capture your sizing and requirements and share it with your authorized vendor / partner / distributor to obtain a final, binding quote.
  • Then close the loop on your TCO. When the real quote comes back, plug those actual prices into the calculator above to refine your TCO and see where reality differs from the estimate.
  1. Size it
  2. Requirements
  3. Your details
  4. Channels & export

How big is your Datadog estate?

Physical + virtual hosts sending telemetry. Not sure? Enter rough numbers — the distributor confirms exact counts later.

Default mid-size assumption: 300 monitored hosts.
Estimates are illustrative and configurable; production figures come from vendor list prices and your own quotes.