How I Built a Data Lakehouse With Delta Lake Architecture

Data Engineer Explains the Data Lakehouse Architecture

10 min read · Sep 18, 2023

“Mark my words, AI is far more dangerous than nukes.” — Elon Musk

Data runs the world now, and I write and talk about it all over my profile.

As data evolves, businesses are looking for ways to put their data to better use. The launch of ChatGPT pushed them further to recognize the potential of AI, and some wondered whether they could do something similar with their own data.

Little do they know that GPT-1, the first iteration of the GPT models behind ChatGPT, was introduced in June 2018. According to the GPT-1 paper, it had a whopping 56% accuracy score. It was not looking good at the time, but look how the tables have turned.

The point I’m trying to make is that people overlook the work behind the scenes of GPT. One does not simply create a Large Language Model without a wealth of diverse and rich data; that data is what the model is trained on.

Written by Nicholas Leong