The SQL View

When you ask Wispy a question, you will see alongside the blocks and the results, an SQL view.

This shows two SQL queries, the "AI SQL" and the "Database SQL".

Note: Especially in "thorough" mode, the LLM can sometimes generate more than one query, you can switch between those by clicking on the dots in the chat view on the right.

Overview

In general, the flow is as follows:

LLM writes text, conveys intention
1. Writes postgresql ("AI SQL")
2. Creates blocks directly
Datawisp converts to blocks
1. The SQL and other operations are converted to a Datawisp sheet.
Datawisp executes the query
1. Datawisp chooses to compile (some of) the blocks to the SQL dialect your database understands ("Database SQL")
2. Datawisp executes the query directly using the Datawisp Query Engine

AI SQL

The AI SQL is directly what the LLM generated to analyze your data. To make sure you always get the best experience, we require the LLM to always write postgresql.

AI SQL is never run directly on your database. The primary use is for the LLM to convey it's intention and plan for the data analysis.

Why PostgreSQL

Most SQL dialects have some quirks that current-gen LLMs sometimes get wrong. They generate valid, but buggy SQL. Those can lead to unexpected and bad results.

To give the best user experience, Datawisp detects and handles many such cases. The LLM is always required to write in one dialect - posgresql.

We chose this dialect as it is a rather expressive dialect, has sensible date functions, straight-forward JSON support, and is widely used - so the LLM will have encountered it frequently in training.

Example

A specific example of a SQL dialect quirk: When asked to generate MySQL, sometimes, but not often, the LLM would generate a query like this:

SELECT AVG(arrival_date - ship_date) AS average_shipping_time FROM shipments;

This is completely valid MySQL, and it would run without issues. However, the output of (arrival_date - ship_date) is rather unexpected:

MySQL [(none)]> select (CURRENT_DATE);
+----------------+
| (CURRENT_DATE) |
+----------------+
| 2025-03-04     |
+----------------+

MySQL [(none)]> select (CURRENT_DATE - '2025-03-01');
+-------------------------------+
| (CURRENT_DATE - '2025-03-01') |
+-------------------------------+
|                      20248279 |
+-------------------------------+

MySQL [(none)]> select (CURRENT_DATE - '2025-01-01');
+-------------------------------+
| (CURRENT_DATE - '2025-01-01') |
+-------------------------------+
|                      20248279 |
+-------------------------------+

MySQL [(none)]> select (CURRENT_DATE - 0);
+--------------------+
| (CURRENT_DATE - 0) |
+--------------------+
|           20250304 |
+--------------------+

Generated MySQL is fine almost all of the time, but sometimes LLMs make mistakes like this. This happens especially frequently in scenarios that require more complicated SQL.

By focusing on one dialect, we can reliably detect and mitigate edge cases like this, and create a consistently good experience, no matter which database is used.

Database SQL

As soon as the LLM finished generating a query, this query is compiled to Datawisp blocks:

Those blocks are then, where possible, converted to optimized SQL queries that your database engine can run directly.

Since this SQL is mainly generated to be interpreted by your database engine, it may sometimes look a bit convoluted. In general, the extraneous parts (e.g. unnecessary WHERE 1=1) do not affect the result or the query performance negatively.

PreviousWispy NextDeployment

Last updated 4 months ago