Introduction
In data analytics, there’s an increasing push to make data accessible to everyone — not just your data science team. The goal? Empower business users to ask questions in plain language and get meaningful answers without waiting days for a dashboard or query from the data team.
Enter: Text-to-SQL. This emerging class of tools lets users ask natural language questions like “How many new customers signed up last month?” and generates SQL behind the scenes to fetch the answer.
Sounds like magic. But is it ready to replace your data team?
Not yet. While the large language models (LLMs) behind these tools are quite good at understanding natural language, they still struggle with understanding the messy, complex, and constantly changing structure of your organization’s data.
Today, I’ll walk through how text-to-SQL works, its current strengths & weaknesses, and my take on what the future of data analytics looks like as these technologies evolve.
Let’s dive in!
What is SQL? (and why does it matter?)
Before we dive deeper, let’s cover the basics.
SQL (Structured Query Language) is the primary way we interact with databases. It’s how we retrieve, insert, update, and delete data. SQL is relatively human-readable (with commands like SELECT, WHERE, and GROUP BY), but writing good SQL still requires a technical understanding of how data is stored, how tables relate, and how business logic should be applied.
That’s why most organizations rely on analysts and engineers to write SQL, even for relatively simple questions.
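To make that concrete, here is a minimal sketch of what even a simple question looks like in SQL. It uses Python's built-in sqlite3 module and a made-up customers table; the table name, column names, and sample rows are all assumptions for illustration, not a real schema.

```python
import sqlite3

# Hypothetical table for illustration: one row per customer signup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers (region, signup_date) VALUES (?, ?)",
    [
        ("Europe", "2024-05-03"),
        ("Europe", "2024-05-17"),
        ("Asia", "2024-05-21"),
        ("Europe", "2024-04-11"),
    ],
)

# Question: "How many new customers signed up in May 2024, by region?"
query = """
    SELECT region, COUNT(*) AS new_customers
    FROM customers
    WHERE signup_date >= '2024-05-01' AND signup_date < '2024-06-01'
    GROUP BY region
    ORDER BY region
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Asia', 1), ('Europe', 2)]
```

Even in this tiny example, whoever writes the query has to know that signups live in customers, that dates are stored as text, and where the month boundaries fall.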
What is Text-to-SQL?
Text-to-SQL tools aim to close this gap. Instead of writing SQL, users type a question in plain English, like “What was our revenue last quarter in Europe?”, and the system automatically translates it into SQL, runs the query, and returns the results.
Behind the scenes, this involves:
Parsing the question with Natural Language Processing (NLP)
Mapping the concepts in the question to tables, columns, and filters in the database
Generating valid SQL
Running the query and formatting the result
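The four steps above can be sketched in a few lines of Python. This is a toy, not how any particular product works: a keyword lookup stands in for the language model, and the orders table, the CONCEPTS and REGIONS mappings, and every function name here are hypothetical.

```python
import sqlite3

# Hypothetical schema; a real tool would read this from your warehouse.
SCHEMA = (
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, region TEXT, amount REAL, order_date TEXT)"
)

# Hand-written lookups standing in for what an LLM would infer.
CONCEPTS = {"revenue": "SUM(amount)", "orders": "COUNT(*)"}
REGIONS = {"europe": "Europe", "asia": "Asia"}


def parse_question(question):
    """Step 1: crude parsing (lowercase, drop the question mark, split)."""
    return question.lower().replace("?", "").split()


def map_concepts(words):
    """Step 2: map words to a metric expression and an optional region filter."""
    metric = next((CONCEPTS[w] for w in words if w in CONCEPTS), None)
    region = next((REGIONS[w] for w in words if w in REGIONS), None)
    if metric is None:
        raise ValueError("No known metric in question")
    return metric, region


def generate_sql(metric, region):
    """Step 3: build a parameterized SQL string."""
    sql = f"SELECT {metric} FROM orders"
    params = []
    if region is not None:
        sql += " WHERE region = ?"
        params.append(region)
    return sql, params


def answer(conn, question):
    """Step 4: run the query and return the SQL alongside the result."""
    metric, region = map_concepts(parse_question(question))
    sql, params = generate_sql(metric, region)
    value = conn.execute(sql, params).fetchone()[0]
    return sql, value


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO orders (region, amount, order_date) VALUES (?, ?, ?)",
    [
        ("Europe", 100.0, "2024-04-02"),
        ("Europe", 50.0, "2024-05-10"),
        ("Asia", 70.0, "2024-05-11"),
    ],
)

sql, value = answer(conn, "What was our revenue in Europe?")
print(sql)    # SELECT SUM(amount) FROM orders WHERE region = ?
print(value)  # 150.0
```

Notice that the sketch quietly ignores anything it has no mapping for, such as "last quarter". That is the hard part in practice: step 2 only works as well as the system's knowledge of your data.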
All of this is meant to make querying data feel as easy as Googling something. But for that to work consistently, the system needs more than just language understanding…
Why the Semantic Layer Matters (A Lot)
Here’s the thing: language models don’t actually “understand” your business or your data. They’re guessing based on the patterns they've seen during training, and whatever context you give them in your prompt.
And when it comes to translating questions into SQL, guessing doesn’t cut it.
This is where the semantic layer comes in.
A semantic layer is a structured map that connects business concepts (like “customer,” “churn rate,” or “total sales”) to the actual database tables, columns, joins, and logic needed to define them. It acts as a translator between how humans talk about data and how data is physically stored.
Without a semantic layer, Text-to-SQL tools are flying blind. They don’t have the context needed and often make incorrect assumptions. A few of the issues that crop up without semantics:
Ambiguous mappings: The model doesn’t know if “sales” means orders.amount, revenue, or something else.
No shared definitions: Every query that asks for “total revenue” might calculate it slightly differently.
Fragile structure: If column names or table structures change, queries silently break.
Lack of domain context: Concepts like “churn” or “customer lifetime value” require pulling data from multiple sources and calculation logic that the model won’t infer correctly without help.
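The "no shared definitions" problem is easy to reproduce. The sketch below assumes a hypothetical orders table with amount, discount, and status columns, then runs three queries a model could plausibly generate for "total revenue". All three are valid SQL, and all three return a different number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table; column names are assumptions for illustration.
conn.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, amount REAL, discount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (amount, discount, status) VALUES (?, ?, ?)",
    [
        (100.0, 0.0, "completed"),
        (80.0, 10.0, "completed"),
        (50.0, 0.0, "refunded"),
    ],
)

# Three plausible readings of "total revenue".
candidates = {
    "gross, all orders": "SELECT SUM(amount) FROM orders",
    "gross, completed only": (
        "SELECT SUM(amount) FROM orders WHERE status = 'completed'"
    ),
    "net of discounts, completed only": (
        "SELECT SUM(amount - discount) FROM orders WHERE status = 'completed'"
    ),
}

results = {label: conn.execute(sql).fetchone()[0] for label, sql in candidates.items()}
for label, value in results.items():
    print(f"{label}: {value}")
# gross, all orders: 230.0
# gross, completed only: 180.0
# net of discounts, completed only: 170.0
```

None of these queries is wrong on its face. Which one is right depends on a business decision that lives outside the database.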
By contrast, a strong semantic layer anchors the system in a consistent understanding of your business. It provides:
Clarity: Definitions like “total sales” are standardized and reused across all queries.
Stability: If your schema changes, you only update the semantic layer — not the model prompts.
Context: It encodes domain knowledge that language models otherwise wouldn’t understand.
Think of it as the difference between guessing what someone means vs. having a shared company glossary of all of your business concepts.
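Here is one way a very small semantic layer could look in code. It is a sketch under stated assumptions, not a reference to any specific product: the SEMANTIC_LAYER structure, the compile_metric helper, and the orders table are all hypothetical. The idea is that “total sales” is defined once, and every query is compiled from that definition.

```python
import sqlite3

# A toy semantic layer: business terms mapped to physical SQL in one place.
SEMANTIC_LAYER = {
    "metrics": {
        "total_sales": {
            "table": "orders",
            "expression": "SUM(amount - discount)",
            "filters": ["status = 'completed'"],
        },
    },
    "dimensions": {
        "region": "region",
    },
}


def compile_metric(layer, metric_name, group_by=None):
    """Turn a business term (plus an optional dimension) into SQL."""
    metric = layer["metrics"][metric_name]  # KeyError if the term is undefined
    select = [f"{metric['expression']} AS {metric_name}"]
    grouping = ""
    if group_by is not None:
        column = layer["dimensions"][group_by]
        select.insert(0, f"{column} AS {group_by}")
        grouping = f" GROUP BY {column} ORDER BY {column}"
    where = ""
    if metric["filters"]:
        where = " WHERE " + " AND ".join(metric["filters"])
    return f"SELECT {', '.join(select)} FROM {metric['table']}{where}{grouping}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, "
    "amount REAL, discount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (region, amount, discount, status) VALUES (?, ?, ?, ?)",
    [
        ("Europe", 100.0, 0.0, "completed"),
        ("Europe", 80.0, 10.0, "completed"),
        ("Asia", 50.0, 0.0, "refunded"),
        ("Asia", 40.0, 5.0, "completed"),
    ],
)

sql = compile_metric(SEMANTIC_LAYER, "total_sales", group_by="region")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)  # [('Asia', 35.0), ('Europe', 170.0)]
```

In a setup like this, the language model only has to pick a metric name and a dimension; it never has to guess at columns or filters. And if the amount column is renamed, only the expression in SEMANTIC_LAYER changes.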
Closing Thought
In closing, these models will get better.
Will Text-to-SQL replace analysts? I don’t think so. But it will change how they work.
In the near future, these tools will handle a large chunk of routine, low-complexity questions, which will free up analysts to focus on deeper, more strategic work. Think of it like this:
Text-to-SQL + Semantic Layer = Self-service for the 80% of questions
But that future only works if you invest in implementing these tools with sufficient context. The real unlock is building a shared, structured semantic layer that codifies your complex data landscape and tribal business knowledge into clear documentation that tools can rely on.
As always, hit reply if you’ve got thoughts or questions for me! I’d love to hear how you’re using emerging AI tools for your data work.