Databricks Just Made the Format Question Disappear. The Interesting Part Is What It Kept.

$1B+ Databricks paid for Tabular in 2024

~100 contributors to Apache Polaris at graduation

2,800+ Polaris pull requests closed in incubation

3 wks between Snowflake and Databricks v3 GA

Read Ali Ghodsi's own words on the day Databricks brought Apache Iceberg version 3, the open table format that lets multiple engines query the same files, to general availability, and you would think the whole thing had been a friendly collaboration. "The two formats are now very close to each other," the Databricks chief executive wrote on LinkedIn, listing the deletion vectors, variant type, and row tracking that Iceberg and the company's own Delta Lake now share. "Much less headache for every organization to organize its data."

He is right about the headache. He is also describing a fight his company spent more than a billion dollars to win, and the victory is real even though it looks nothing like the one most people expected.

Databricks did not beat Iceberg. It absorbed the question of whether the choice between Iceberg and Delta Lake matters, and answered it: not anymore. Snowflake reached the same general availability milestone for Iceberg v3 on May 7, roughly three weeks earlier. Amazon Web Services added v3 deletion vectors and row lineage to its Glue and S3 stack last November. When every major platform ships the same format features within months of one another, the format stops being a decision. It becomes plumbing.

The feature gap that justified picking a side is gone

For years the argument for Delta Lake over Iceberg, or the reverse, came down to a short list of capabilities one had and the other lacked. Iceberg v3 erases that list.

Deletion vectors let a system mark rows as deleted without rewriting whole data files, which is the difference between a compliance deletion that finishes in seconds and one that rewrites terabytes. Row lineage gives every row a tracking identity, so downstream pipelines process only what changed instead of reprocessing entire partitions. The variant data type lets semi-structured data (logs, telemetry, the messy JavaScript Object Notation that real applications emit) live inside the table instead of in a separate store. Those three were Delta Lake's differentiators. They are now part of the open Iceberg specification, which means a Snowflake engine and a Databricks engine read them the same way.
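The incremental-processing idea behind row lineage fits in a few lines of Python. This is a simplified model, not the Iceberg specification: the field names `row_id` and `last_updated_seq` are hypothetical stand-ins for the row-lineage metadata the v3 spec defines, and the sketch ignores deletes, which a real pipeline must track separately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    row_id: int            # hypothetical: stable identity assigned when the row is first written
    last_updated_seq: int  # hypothetical: commit sequence number of the row's last change
    payload: dict

def changed_since(rows, watermark):
    """Return only rows modified after the consumer's last processed sequence number."""
    return [r for r in rows if r.last_updated_seq > watermark]

def next_watermark(rows, watermark):
    """Advance the consumer's watermark to the newest sequence number seen."""
    return max((r.last_updated_seq for r in rows), default=watermark)

table = [
    Row(1, 10, {"status": "open"}),
    Row(2, 10, {"status": "open"}),
    Row(3, 12, {"status": "closed"}),  # updated in commit 12
    Row(4, 13, {"status": "open"}),    # inserted in commit 13
]

watermark = 11                          # the downstream job last ran after commit 11
changed = changed_since(table, watermark)
print([r.row_id for r in changed])      # [3, 4]: rows 1 and 2 are never reprocessed
watermark = next_watermark(table, watermark)
print(watermark)                        # 13
```

The pipeline touches two rows instead of the whole partition, which is the saving the paragraph describes.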

Databricks publishes a figure of roughly ten times faster merge operations from deletion vectors. That number is the vendor's own and has not been independently audited, so treat it as a direction rather than a measurement. The direction is not in dispute. The merge-on-read pattern that deletion vectors enable is a genuine reduction in write amplification, and every platform now implements it against the same spec.
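A toy model of the write-amplification gap, in Python. It treats one data file as a list of rows and compares rewriting that file (copy-on-write) with recording the deleted positions (merge-on-read). The 512-byte row size and the 8-bytes-per-position cost are hypothetical. Real deletion vectors are compressed bitmaps, so actual numbers differ, and this says nothing about the vendor's tenfold figure. Only the shape of the gap is the point.

```python
ROW_BYTES = 512      # hypothetical average encoded row size
POSITION_BYTES = 8   # crude stand-in: one 64-bit integer per deleted position

def copy_on_write_delete(rows, delete_positions):
    """Rewrite the whole data file minus the deleted rows.

    Returns the new file contents and the bytes written to produce it.
    """
    kept = [row for i, row in enumerate(rows) if i not in delete_positions]
    return kept, len(kept) * ROW_BYTES

def merge_on_read_delete(delete_positions):
    """Leave the data file untouched and record deleted row positions instead.

    Returns the deletion vector and the bytes written to store it.
    """
    vector = frozenset(delete_positions)
    return vector, len(vector) * POSITION_BYTES

def scan(rows, deletion_vector):
    """Readers skip positions in the deletion vector at query time (the 'merge')."""
    return [row for i, row in enumerate(rows) if i not in deletion_vector]

rows = list(range(100_000))  # one data file of 100,000 rows
to_delete = {42, 99_999}     # e.g. two records under a compliance request

rewritten, cow_bytes = copy_on_write_delete(rows, to_delete)
vector, mor_bytes = merge_on_read_delete(to_delete)

assert scan(rows, vector) == rewritten  # readers see the same table either way
print(f"copy-on-write wrote {cow_bytes:,} bytes; merge-on-read wrote {mor_bytes} bytes")
# copy-on-write wrote 51,198,976 bytes; merge-on-read wrote 16 bytes
```

The trade is that every reader now pays a small filtering cost at scan time, which is why implementations periodically compact files to fold accumulated deletions back in.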

The roadmap closes the door entirely. Databricks says the next versions, Iceberg v4 and Delta 5.0, will converge on a shared metadata structure. The two formats will not merely interoperate. They will sit on the same foundation. At that point the word you choose, Delta or Iceberg, describes history, not architecture.

When the format costs nothing to switch, the thing you pay for is whatever you cannot switch. That is no longer the table.

The billion-dollar acquisition bought narrative control, not technology

In June 2024, Databricks paid more than a billion dollars for Tabular, a company with revenue measured in the low single-digit millions. On a multiple of revenue the price was indefensible. On a different measure it was exact.

Tabular brought the original creators of Iceberg inside Databricks. Two years later, the people who built the open format that was supposed to be the alternative to Databricks now help set the roadmap for how Databricks governs that format. The company can stand on a stage and credibly serve both Delta and Iceberg customers because it employs the architects of both. Ghodsi's LinkedIn note frames the acquisition as a contribution to bringing the communities together, and at the engineering level that is fair. It also means the firm steering the convergence narrative is the firm that bought the right to steer it.

When SAP bought Dremio last month to acquire a lakehouse it could not build, I wrote that the open format is no longer where the lock-in lives. The Databricks announcement is the same lesson from the opposite direction. SAP bought its way into the open layer. Databricks is giving the open layer away precisely because the open layer is no longer where the money sits.

The catalog and the optimizer are what you actually cannot leave

Strip away the format debate and look at what remains proprietary. Two things.

The first is the catalog, the layer that decides who can see which table, how data is discovered, and how it is shared across engines. Databricks reached general availability for managed Iceberg tables, foreign Iceberg tables, and Iceberg v3 inside Unity Catalog, and Unity Catalog federates across Glue, Snowflake Horizon, Hive Metastore, and more. Against it sits Apache Polaris, co-created by Dremio and Snowflake and donated to the Apache Software Foundation, which graduated to a top-level project in February 2026 after closing more than 2,800 pull requests with around one hundred contributors. Dremio describes Polaris in plain terms as the open alternative to Unity Catalog and to Glue. The catalog war, unlike the format war, has clearly drawn sides.
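Part of why the catalog is the contested layer is that engines reach Iceberg tables through it. Apache Polaris implements the Apache Iceberg REST catalog API, and Unity Catalog also exposes an Iceberg REST endpoint for external readers, so any engine that speaks the protocol can ask either one for a table's metadata. The Python sketch below only builds the load-table request path described in the REST spec. The host, prefix, and namespace are hypothetical, and a real client would also handle authentication and read the prefix from the catalog's configuration response rather than hard-coding it.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the ASCII unit separator (0x1F)
# before URL encoding, as the Iceberg REST catalog spec describes.
NAMESPACE_SEPARATOR = "\x1f"

def load_table_path(base_uri, namespace, table, prefix=None):
    """Build the URL for the REST catalog's load-table call:
    GET {base}/v1/{prefix}/namespaces/{namespace}/tables/{table}
    """
    segments = [base_uri.rstrip("/"), "v1"]
    if prefix:  # optional; typically supplied by the catalog via GET /v1/config
        segments.append(quote(prefix, safe=""))
    segments += [
        "namespaces",
        quote(NAMESPACE_SEPARATOR.join(namespace), safe=""),
        "tables",
        quote(table, safe=""),
    ]
    return "/".join(segments)

# Hypothetical catalog host, prefix, and table.
url = load_table_path(
    "https://catalog.example.com/api/catalog", ["sales", "eu"], "orders", prefix="prod"
)
print(url)
# https://catalog.example.com/api/catalog/v1/prod/namespaces/sales%1Feu/tables/orders
```

The same request works whichever catalog sits behind the URL. That is exactly why the fight has moved to what the catalog decides once the request arrives.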

The second is the optimizer. Databricks runs predictive optimization and liquid clustering that automatically tune table performance, and those run best, sometimes only, inside the Databricks engine. The data sits in an open format any tool can read. The thing that makes the data fast does not travel with it.

So the honest version of the convergence story is this. Format portability is now real and nearly free. Governance portability and performance portability are neither. A table you can read from anywhere is not a table you can operate from anywhere at the same cost.

What a data team should change this quarter

Stop spending architecture cycles on Delta versus Iceberg. The answer is whichever your engines already speak, and increasingly that is both, through write-once-read-anywhere tooling such as Delta Lake's UniForm that adds negligible overhead. The decision that used to feel irreversible is now cheap to reverse.

Move the scrutiny you once aimed at format choice onto catalog choice. Ask how a table governed in Unity Catalog or Horizon or Polaris moves to a different governance layer, what access policies survive the move, and what it costs in engineering time. That is the portability question that still has teeth.
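A minimal Python sketch of how to start answering "what access policies survive the move": inventory the policies in the source catalog and check each against what the destination can express. The policy taxonomy, the sample policies, and the destination's capabilities are all hypothetical. In practice the inventory would come from the source catalog's own export, and the capability list from the target's documentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    kind: str       # hypothetical taxonomy: "grant", "row_filter", "column_mask", ...
    principal: str  # user or group the policy applies to
    target: str     # table or column the policy protects

def migration_gap(policies, target_supports):
    """Split policies into those the destination catalog can express and those it cannot."""
    survives = [p for p in policies if p.kind in target_supports]
    lost = [p for p in policies if p.kind not in target_supports]
    return survives, lost

policies = [
    Policy("grant", "analysts", "sales.orders"),
    Policy("row_filter", "eu_team", "sales.orders"),
    Policy("column_mask", "support", "sales.orders.email"),
]

# Assumption for illustration: the destination supports plain grants only.
survives, lost = migration_gap(policies, target_supports={"grant"})
for p in lost:
    print(f"rework needed: {p.kind} for {p.principal} on {p.target}")
```

Each line in the "rework needed" list is engineering time. Summed across a production estate, that list is the portability question with teeth.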

And price the optimizer separately from the storage. When a vendor tells you the data is open, agree, then ask what fraction of your performance depends on engine-specific tuning that the open format does not carry. The gap between those two answers is your real switching cost.
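One hypothetical way to put a number on that fraction: run the same queries against the same files, once in the vendor's tuned engine and once in a second engine with no vendor-specific tuning, then compare median times. The metric below is an illustration rather than an industry standard, and the timings are invented.

```python
from statistics import median

def tuning_share(tuned_seconds, portable_seconds):
    """Share of query time that engine-specific tuning saves.

    0.0 means the portable engine is just as fast. 0.6 means the tuned
    engine needs 60% less time than the portable one. Negative values
    mean the portable engine is faster.
    """
    return 1 - median(tuned_seconds) / median(portable_seconds)

# Hypothetical timings (seconds) for the same query, repeated three times per engine.
tuned = [2.0, 2.2, 1.9]     # vendor engine with its optimizer features on
portable = [5.0, 4.8, 5.3]  # second engine reading the same open-format files

share = tuning_share(tuned, portable)
print(f"{share:.0%} of query time depends on engine-specific tuning")
# 60% of query time depends on engine-specific tuning
```

Medians are used so that a single cold-cache run does not dominate the comparison. A high share means a large part of the performance you are paying for stays behind with the engine when the data leaves.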

CIO/CTO Viability Question

Before your next renewal, run one test. Take a production table, point a second engine at it through an open catalog endpoint, and measure both the query performance and the governance you lose. If the format truly no longer locks you in, that exercise should be cheap and the results should sting only a little. If it is expensive or the access controls do not follow the data, you have just located the bill the convergence story was written to keep you from reading. Find that number before the vendor finds it for you.

Sources

Databricks. "Unity Catalog and the Next Era of Apache Iceberg." Databricks Blog, 28 May 2026, databricks.com.

Ghodsi, Ali. Post on Apache Iceberg v3 general availability. LinkedIn, 28 May 2026, linkedin.com.

Snowflake. "Support for Apache Iceberg Version 3 (General Availability)." Snowflake Documentation, 7 May 2026, snowflake.com.

Amazon Web Services. "AWS Announces Support for Apache Iceberg V3 Deletion Vectors and Row Lineage." AWS What's New, 26 Nov. 2025, aws.amazon.com.

Dremio. "Apache Polaris Graduates to a Top-Level Apache Project." Dremio Blog, 19 Feb. 2026, dremio.com.

Bellamkonda, Shashi. "SAP Buys the Lakehouse It Could Not Build." shashi.co, May 2026, shashi.co.

Disclaimer: This blog reflects my personal views only. Content does not represent the views of my employer, Info-Tech Research Group. AI tools may have been used for brevity, structure, or research support. Please independently verify any information before relying on it.

Shashi.co
