Apache Iceberg & Data Lakehouse Engineer Skills 2026
Apache Iceberg is the 2026 lakehouse standard — adopted by AWS, Snowflake, Databricks, and Cloudflare. Skills, certifications, and career path explained.

Apache Iceberg won. By May 2026, every major data platform (AWS S3 Tables, Snowflake Polaris, Databricks Unity Catalog, Google BigLake, Cloudflare R2 Data Catalog) speaks native Iceberg. The format wars (Iceberg vs Delta vs Hudi) have effectively ended. Data engineers who can model, optimize, and operate Iceberg tables at scale are the ones companies want to hire in 2026.
Why Iceberg Won
Three forces tipped 2025-2026 toward Iceberg:
- AWS S3 Tables launched fully managed Iceberg in late 2024 and made it cheaper than self-managed.
- Databricks acquired Tabular (the company founded by Iceberg's creators) in 2024, neutralizing the Delta-vs-Iceberg standoff.
- Snowflake committed to Polaris and Iceberg-native external tables, signaling cross-platform interop.
The result is an open table format in which all major engines read and write the same files. Vendor lock-in dropped sharply, and skills became portable.
Skills Lakehouse Engineers Need
Table format internals: manifests, manifest lists, snapshots, partition transforms, and hidden partitioning. You should also understand time travel and schema evolution semantics, and how a partition spec can evolve without rewriting data.
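As a rough illustration of partition transforms and hidden partitioning, the Python sketch below reimplements the day, month, and truncate transforms along the lines of the Iceberg table spec. It is a simplified model, not library code: the function names, the partition spec, and the sample row are all hypothetical.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """day(ts): whole days since 1970-01-01, evaluated in UTC."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def month_transform(ts: datetime) -> int:
    """month(ts): whole months since 1970-01, evaluated in UTC."""
    utc = ts.astimezone(timezone.utc)
    return (utc.year - 1970) * 12 + (utc.month - 1)


def truncate_int(value: int, width: int) -> int:
    """truncate[width](value): round down to a multiple of width."""
    # Python's % is non-negative for a positive width, so negative
    # values also round down (for example -7 with width 10 gives -10).
    return value - (value % width)


def truncate_str(value: str, width: int) -> str:
    """truncate[width](value): keep the first `width` characters."""
    return value[:width]


def partition_values(row: dict) -> tuple:
    """Derive partition values for a hypothetical spec:
    day(event_ts), truncate[1000](user_id)."""
    return (
        day_transform(row["event_ts"]),
        truncate_int(row["user_id"], 1000),
    )


# Hypothetical row: the writer supplies only ordinary columns.
row = {
    "event_ts": datetime(2026, 5, 17, 23, 30, tzinfo=timezone.utc),
    "user_id": 48213,
}
print(partition_values(row))
```

Because the partition values are derived from ordinary columns, a query filters on event_ts directly and the engine maps that predicate onto day partitions. Nobody has to write or query a separate date column, which is the point of hidden partitioning.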
Catalogs: the REST Catalog spec is now the standard. AWS Glue Catalog, Snowflake Polaris, Unity Catalog, and Project Nessie all expose the same REST API. Authentication is via SigV4 or OAuth2, and the catalog can hand out vended credentials for storage access.
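To make the shared REST API concrete, here is a minimal sketch that builds two request URLs following the path layout in the Iceberg REST Catalog spec (`/v1/config` and `/v1/{prefix}/namespaces/{namespace}/tables/{table}`). The base URI, prefix, namespace, table name, and token are hypothetical, and the sketch only constructs URLs and headers; it does not call any real catalog.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the unit separator (0x1F)
# before being URL-encoded into a single path segment.
NAMESPACE_SEPARATOR = "\x1f"


def config_url(base_uri: str) -> str:
    """URL of the configuration endpoint a client calls first."""
    return f"{base_uri.rstrip('/')}/v1/config"


def table_url(base_uri: str, prefix: str, namespace: list[str], table: str) -> str:
    """URL for loading one table's metadata.

    `prefix` is whatever the catalog returned from its config endpoint;
    an empty prefix is simply omitted from the path.
    """
    encoded_ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(prefix)
    parts += ["namespaces", encoded_ns, "tables", quote(table, safe="")]
    return "/".join(parts)


def bearer_headers(token: str) -> dict[str, str]:
    """Headers for OAuth2 bearer-token auth (SigV4 signs requests differently)."""
    return {"Authorization": f"Bearer {token}"}


# Hypothetical catalog and table names, for illustration only.
print(config_url("https://catalog.example.com/"))
print(table_url("https://catalog.example.com", "prod", ["analytics", "events"], "clicks"))
```

Since every compliant catalog accepts the same paths, switching providers is mostly a matter of changing the base URI and the auth headers rather than rewriting client code.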
Table maintenance: small-file compaction, expiring snapshots, removing orphan files, and rewriting manifests. Spark, Trino, and S3 Tables each run and schedule these operations differently. Knowing the tradeoffs prevents silent cost explosions.
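The idea behind small-file compaction can be sketched in a few lines of Python. This is a simplified greedy planner under stated assumptions, not how any particular engine implements it: the `DataFile` type, the thresholds, and the file names are hypothetical.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    path: str
    size_bytes: int


def plan_compaction(
    files: list[DataFile],
    target_bytes: int,
    small_threshold_bytes: int,
) -> list[list[DataFile]]:
    """Group small files into rewrite batches of at most target_bytes.

    Files at or above the threshold are left alone. Groups that end up
    with a single file are dropped, since rewriting them gains nothing.
    """
    small = sorted(
        (f for f in files if f.size_bytes < small_threshold_bytes),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    groups: list[list[DataFile]] = []
    current: list[DataFile] = []
    current_size = 0
    for f in small:
        # Close the current group when adding this file would overflow it.
        if current and current_size + f.size_bytes > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        groups.append(current)
    return [g for g in groups if len(g) > 1]


# Hypothetical file listing for one partition.
files = [
    DataFile("big.parquet", 500 * MB),
    DataFile("a.parquet", 100 * MB),
    DataFile("b.parquet", 90 * MB),
    DataFile("c.parquet", 80 * MB),
    DataFile("d.parquet", 60 * MB),
    DataFile("e.parquet", 40 * MB),
]
for group in plan_compaction(files, target_bytes=256 * MB, small_threshold_bytes=128 * MB):
    print([f.path for f in group])
```

Each group becomes one rewritten file, so the number of groups and the bytes they contain are a reasonable first estimate of what a compaction run will cost.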
Engines: Spark, Trino, Flink, DuckDB, Athena, Snowflake, BigQuery, Daft. Knowing when each one shines is half the job. A common split is Trino + Iceberg for ad hoc analytics, Spark + Iceberg for ETL, and Flink + Iceberg for streaming upserts.
AWS S3 Tables Quick Start
The fastest path to production Iceberg in AWS:
- Create an S3 Tables bucket (separate from regular S3).
- Register it as a catalog in Glue or as a federated catalog in Athena.
- Write with Spark on EMR or via Glue ETL.
- Read with Athena (no compute config needed) or Redshift Spectrum.
- Let S3 Tables auto-compact in the background — this is the killer feature vs DIY Iceberg on plain S3.
Cost watch: S3 Tables charges per request and per maintenance operation on top of standard storage. Worth it for production tables; overkill for small lookup data.
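The cost tradeoff can be estimated with simple arithmetic. The sketch below models the three components named above (storage, requests, maintenance operations). All rates and workload numbers are made-up placeholders, not AWS prices; substitute the current figures from the pricing page before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    storage_per_gb_month: float
    per_1k_requests: float
    per_1k_maintenance_ops: float


def monthly_cost(rates: Rates, stored_gb: float, requests: int, maintenance_ops: int) -> dict:
    """Break a month's bill into storage, request, and maintenance parts."""
    storage = stored_gb * rates.storage_per_gb_month
    request_cost = requests / 1000 * rates.per_1k_requests
    maintenance = maintenance_ops / 1000 * rates.per_1k_maintenance_ops
    total = storage + request_cost + maintenance
    # Share of the bill that is not plain storage.
    overhead_share = (request_cost + maintenance) / total if total else 0.0
    return {
        "storage": storage,
        "requests": request_cost,
        "maintenance": maintenance,
        "total": total,
        "overhead_share": overhead_share,
    }


# PLACEHOLDER rates, chosen only to make the arithmetic easy to follow.
placeholder = Rates(storage_per_gb_month=0.025, per_1k_requests=0.005, per_1k_maintenance_ops=0.004)

# Hypothetical workloads: a large production table and a tiny lookup table.
production = monthly_cost(placeholder, stored_gb=10_000, requests=5_000_000, maintenance_ops=200_000)
lookup = monthly_cost(placeholder, stored_gb=1, requests=500_000, maintenance_ops=20_000)

print(f"production overhead share: {production['overhead_share']:.1%}")
print(f"lookup overhead share: {lookup['overhead_share']:.1%}")
```

With these placeholder numbers the per-request and maintenance charges are a small fraction of the production table's bill but almost all of the lookup table's bill, which is the pattern behind the "overkill for small lookup data" advice.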
Relevant Certifications
- AWS Data Engineer Associate (DEA-C01) — Iceberg and S3 Tables are now in the exam blueprint.
- Snowflake SnowPro Core — covers Iceberg external tables and Polaris.
- Databricks Data Engineer Professional — Unity Catalog with Iceberg interop.
- Google Cloud Professional Data Engineer — BigLake managed Iceberg.
- Cloudera Certified Data Engineer — for on-prem and hybrid Iceberg.
For most data engineers in 2026, DEA-C01 + SnowPro Core or Databricks DE-Pro is the strongest pair.
Career Paths
The hottest 2026 role is Lakehouse Platform Engineer: someone who builds the company's table catalog, sets standards for partitioning and compaction, and owns SLAs for query latency across teams. The job sits between data engineering and platform engineering.
Frequently Asked Questions
Is Delta Lake dead?
No. Delta Lake is still healthy on Databricks. With the 2024 Tabular acquisition, Databricks explicitly committed to Delta-Iceberg interop via UniForm. Most production Databricks tables are still Delta. New green-field projects are increasingly Iceberg for portability.
Is Apache Hudi still relevant?
Hudi has lost mindshare in 2025-2026 outside Uber and a few large adopters. Iceberg adoption growth eclipsed Hudi by 5-7x in the same period. New projects rarely choose Hudi today.
Do I need Spark?
Less than you think. Trino, DuckDB, and Athena all read/write Iceberg. For batch ETL, Spark is still the most powerful choice. For ad hoc analytics, you can skip Spark entirely.
Is the REST Catalog spec stable?
Yes. The Iceberg REST Catalog spec stabilized in late 2024 and is now the implementation target for AWS, Snowflake, Databricks, and Project Nessie. Implementing your own catalog is a one-week job vs the multi-month Hive Metastore migration of 2022.
