How I Built a Data Lakehouse With Delta Lake Architecture

Data Engineer Explains the Data Lakehouse Architecture

Nicholas Leong
10 min read · Sep 18, 2023
Image by Author

“Mark my words, AI is far more dangerous than nukes.” — Elon Musk

Data runs the world now; I write and talk about it all over my profile.

As data evolves, businesses are looking for ways to make better use of it. The release of ChatGPT pushed many of them to recognize the potential of AI, and some began to wonder whether they could build something similar with their own data.

Little do they know that GPT-1, the first model in the GPT series that eventually led to ChatGPT, was introduced back in June 2018. According to the GPT-1 paper, it scored a whopping 56% accuracy. It was not looking good at the time, but look how the tables have turned.

Screenshot by Author

The point I’m trying to make is that people overlook the work behind the scenes of GPT. One does not simply create a Large Language Model without a wealth of rich, diverse data; that data is what the model is trained on.




Data Engineer — Crunching data and writing about it so you don’t get headaches. 1M+ reads on Medium.