
💡 Lessons Learned: Implementing a Text-to-SQL Platform in Production

This page documents key lessons and practical insights from building a Text-to-SQL platform at a large tech company with a massive datalake. The goal: share practical experience with engineers who build scalable LLM-powered SQL systems.


🛠️ 1. Starting Small: The Early Roadmap

We began last year when context-window limits were a major bottleneck (e.g., prompt contexts of only ~2,048–4,096 tokens in many models at the time). Our initial experiment used a model we refer to as Llama NSQL and an open-source platform called Vanna, which requires three core context inputs:

  1. Database schema / DDL — all tables, relationships, constraints.
  2. Documentation / metadata — table/column business meanings, comments, descriptions.
  3. Question–SQL example pairs — natural language questions mapped to correct SQL for your schema.

With just these, Vanna could generate near-perfect SQL on a small, simple database. It was essentially a RAG-based approach (Retrieval-Augmented Generation) — ideal for simple cases.
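The retrieval side of this RAG setup can be sketched in a few lines. The helper names and the keyword-overlap scorer below are illustrative assumptions; Vanna's actual implementation retrieves context with vector embeddings, not word matching:

```python
# Minimal sketch of RAG-style context retrieval over the three input types:
# DDL chunks, documentation, and question-SQL example pairs.

def score(question: str, text: str) -> int:
    """Count how many question words appear in a candidate context chunk."""
    q_words = set(question.lower().split())
    return len(q_words & set(text.lower().split()))

def retrieve_context(question, ddl_chunks, docs, qa_pairs, top_k=2):
    """Pick the most relevant DDL, docs, and example pairs for one question."""
    pick = lambda items, key: sorted(items, key=key, reverse=True)[:top_k]
    return {
        "ddl": pick(ddl_chunks, lambda c: score(question, c)),
        "docs": pick(docs, lambda d: score(question, d)),
        "examples": pick(qa_pairs, lambda qa: score(question, qa[0])),
    }

ddl = ["CREATE TABLE orders (id INT, user_id INT, total DECIMAL)",
       "CREATE TABLE users (id INT, name TEXT, country TEXT)"]
docs = ["orders.total is the order amount in USD",
        "users.country uses ISO 3166 codes"]
pairs = [("total revenue per country",
          "SELECT u.country, SUM(o.total) FROM orders o "
          "JOIN users u ON o.user_id = u.id GROUP BY u.country")]

ctx = retrieve_context("What is the total revenue per country?", ddl, docs, pairs)
```

The retrieved chunks are then concatenated into the prompt; only the relevant slice of the schema reaches the model, which is what keeps small context windows workable.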


📈 2. Scaling & Roadmap Visual

We mapped out the following sequence:

| Phase | Focus | Challenge |
| --- | --- | --- |
| Phase 1 | Small DB, simple tables | Lightweight context, manageable token count |
| Phase 2 | Multiple tables, complex joins | Schema grows, context window becomes a limit |
| Phase 3 | Agent-based system, multi-tool orchestration | Latency + many LLM calls + retrieval overhead |
| Phase 4 | Production optimization | Reduce end-to-end latency, manage hallucination, scale |

Visual roadmap (a timeline graphic can be inserted here): Start → Scale → Agentization → Optimization


🔎 3. Scaling Challenges

When applying to many tables and complex joins:

  • Latency grew significantly.
  • SQL accuracy dropped for multi-table, multi-join queries.

We explored community tools and discovered BIRD, a benchmark that hosts Text-to-SQL challenges. The CHESS system from its leaderboard inspired us to adopt an agent-based architecture for reasoning + SQL generation.
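At its core, an agent-based Text-to-SQL loop is: generate SQL, execute it, feed any database error back to the model, and retry. The sketch below is a heavily simplified illustration of that loop; `fake_llm` is a stub standing in for a real model call, and none of the names here come from the CHESS implementation:

```python
import sqlite3
from typing import Optional

def fake_llm(question: str, error: Optional[str] = None) -> str:
    # A real system would prompt an LLM here; we hard-code a faulty first
    # attempt and a corrected retry to show the revision path.
    if error is None:
        return "SELECT totl FROM orders"   # deliberate typo
    return "SELECT total FROM orders"

def answer(question: str, conn: sqlite3.Connection, max_revisions: int = 2):
    """Generate -> execute -> on failure, revise with the DB error as feedback."""
    error = None
    for _ in range(max_revisions + 1):
        sql = fake_llm(question, error)
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)   # feed the error message back into the next call
    raise RuntimeError(f"gave up after {max_revisions} revisions: {error}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total REAL)")
conn.execute("INSERT INTO orders VALUES (10.0), (5.0)")
sql, rows = answer("sum of order totals", conn)
```

Each extra hop in this loop is another LLM call, which is exactly where the latency problems described in the next section come from.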


⚡ 4. Latency & Context Window Constraints

In our early agent implementation:

  • Many LLM calls per query (tool chaining + retrieval).
  • Model context window couldn't accommodate full schema + metadata + examples.
  • A single query execution sometimes took ~30 minutes.

We realized: we must optimize what we send into the model and how often.


🧹 5. Workflow Optimization & Code Snippet

We reduced end-to-end execution to ~30 seconds by:

  • Pruning irrelevant schema parts.
  • Chunking metadata.
  • Minimizing LLM calls (fewer hops).
  • Tightening prompt structure.

Example: Prompt Template

You are a SQL-generation agent.
Schema:
{{ddl_snippet}}
Metadata:
{{metadata_snippet}}
Examples:
{{qa_pairs}}
Question:
{{user_question}}
Generate optimized SQL.
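The template above can be rendered with plain string substitution; the `{{...}}` placeholders here are treated as literal markers (a real system might use a templating engine such as Jinja2 instead):

```python
# Render the prompt template by substituting each {{placeholder}}.
TEMPLATE = """You are a SQL-generation agent.
Schema:
{{ddl_snippet}}
Metadata:
{{metadata_snippet}}
Examples:
{{qa_pairs}}
Question:
{{user_question}}
Generate optimized SQL."""

def render(template: str, **fields: str) -> str:
    for name, value in fields.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = render(
    TEMPLATE,
    ddl_snippet="CREATE TABLE orders (id INT, total DECIMAL)",
    metadata_snippet="orders.total: order amount in USD",
    qa_pairs="Q: total sales? -> SELECT SUM(total) FROM orders",
    user_question="What were total sales last month?",
)
```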

Example: Preprocessing Step

# Prune the schema so only tables relevant to the question reach the model.
# (extract_relevant_tables and format_ddl are our internal helpers.)
relevant_tables = extract_relevant_tables(question, full_schema)
ddl_snippet = format_ddl(relevant_tables)

Important highlight: token budget matters. Research on schema pruning in Text-to-SQL (see arXiv) shows that models degrade when input tokens exceed roughly 4k–8k. In our case, we trimmed context until each query consistently stayed within ~3,000–4,000 tokens.
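One possible implementation of the `extract_relevant_tables` helper from the snippet above: keep only tables whose name or columns appear in the question, then check the result against the token budget. Everything here is an illustrative sketch, and the 4-characters-per-token estimate is a crude heuristic, not a real tokenizer:

```python
def extract_relevant_tables(question, full_schema):
    """full_schema: dict mapping table name -> list of column names."""
    q = question.lower()
    return {t: cols for t, cols in full_schema.items()
            if t.lower() in q or any(c.lower() in q for c in cols)}

def format_ddl(tables):
    return "\n".join(f"CREATE TABLE {t} ({', '.join(cols)});"
                     for t, cols in tables.items())

def estimate_tokens(text: str) -> int:
    return len(text) // 4   # rough heuristic: ~4 characters per token

schema = {
    "orders": ["id", "user_id", "total"],
    "users": ["id", "name", "country"],
    "audit_log": ["id", "event", "ts"],
}
ddl_snippet = format_ddl(extract_relevant_tables("total per user country?", schema))
```

For the sample question, only `orders` and `users` survive pruning; `audit_log` never enters the prompt, which is how the schema stays inside the ~3k–4k token budget.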


🧠 6. Key Technical Lessons

  1. Data quality is king. Good metadata + correct examples yield better SQL.
  2. LLM evolution brings new trade-offs. Bigger context windows help (e.g., GPT-4/5, Claude-class models), but hallucination risk can increase with them.
  3. Multi-agent architecture is complex. Agents need communication protocols, orchestration, failure modes.
  4. Prompt + data = true asset. Models are tools; your input quality defines output.
  5. You cannot be purely agile. Rapid AI evolution means you must plan architecture, tool chain, and scaling carefully.

🌟 7. Practical Advice

  • Begin with a narrow scope (few tables) to validate your method.
  • Use RAG with schema + metadata + QA examples.
  • Prune and chunk your context to stay within model limits (~3k–4k tokens).
  • Minimize LLM calls: combine retrieval + generation steps.
  • Monitor SQL accuracy, hallucination, execution failures.
  • Build modular architecture: retrieval module, generation module, revision module.
  • Document everything: schema changes, prompt iterations, agent flows.
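One lightweight way to act on the "monitor execution failures" advice is to dry-run generated SQL with EXPLAIN before executing it for real. The sketch below uses sqlite3 for illustration; production databases have their own EXPLAIN or parse endpoints, and this is an assumed pattern, not our exact implementation:

```python
import sqlite3

def is_valid_sql(sql: str, conn: sqlite3.Connection) -> bool:
    """Dry-run: EXPLAIN compiles the statement without running the query."""
    try:
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
good = is_valid_sql("SELECT name FROM users", conn)   # valid column
bad = is_valid_sql("SELECT nme FROM users", conn)     # typo is caught
```

Invalid SQL is rejected before it touches real data, and the failure rate of this check is itself a useful accuracy metric to track over time.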

✨ 8. Ongoing Reflection & Future Work

Building production-level Text-to-SQL systems isn't just coding. It's data engineering + prompt design + system architecture + human insight. Models may get better, but if your metadata is messy or schema is outdated, you'll struggle. Human expertise remains the core.


📌 Recommended Roadmap Diagram (to include)

  • Start: Small DB → RAG → Simple SQL
  • Scale: Many tables → Complex joins → Agentization
  • Optimize: Latency reduction → Context window workarounds → Hallucination checks
  • Mature: Multi-tenant, live datalake, revision agents, usage monitoring

🛠️ Tools & Technologies Mentioned

  • Vanna: Open-source Text-to-SQL platform
  • BIRD: Text-to-SQL benchmark and challenge platform
  • Llama NSQL: model used in our initial experiment
  • RAG (Retrieval-Augmented Generation): Core architectural pattern
  • Agent-based systems: Multi-tool orchestration approach