Commit 2cd161f

feat(pkg-r): SQL tibble sources (TblSqlSource) (#165)
* docs(pkg-r): Small adjustment of description/details
* chore(pkg-r): Factor out valid table name check utility
* chore: document
* feat(pkg-r): TblLazySource -- querychat with lazy tibbles
* fix: Only `use_cte` when necessary
* chore: edit comment
* chore: Add query guidelines
  These were developed for tbl_sql, but are broadly useful, so we're including the guidelines generally if either query tool is included.
* tests(pkg-r): Add tests for edge cases
* chore: separate out data source classes
* rename: TblSqlSource
* docs: Update pkgdown ref index
* docs: Use `con` in examples
1 parent ba50a37 commit 2cd161f

30 files changed, +2037 -1030 lines changed

CLAUDE.md

Lines changed: 49 additions & 5 deletions
@@ -17,7 +17,17 @@ The repository contains separate packages for R and Python:
 ```
 /
 ├── pkg-r/ # R package implementation
-│ ├── R/ # R source files
+│ ├── R/ # R source files (R6 classes and utilities)
+│ │ ├── QueryChat.R # Main QueryChat R6 class
+│ │ ├── DataSource.R # Abstract DataSource base class
+│ │ ├── DataFrameSource.R # DataSource for data.frames
+│ │ ├── DBISource.R # DataSource for DBI connections
+│ │ ├── TblSqlSource.R # DataSource for dbplyr tbl_sql
+│ │ ├── QueryChatSystemPrompt.R # System prompt management (internal)
+│ │ ├── querychat_module.R # Shiny module functions (internal)
+│ │ ├── querychat_tools.R # Tool definitions for LLM
+│ │ ├── deprecated.R # Deprecated functional API
+│ │ └── utils-*.R # Utility functions
 │ ├── inst/ # Installed files
 │ │ ├── examples-shiny/ # Shiny example applications
 │ │ ├── htmldep/ # HTML dependencies
@@ -98,26 +108,60 @@ make py-docs-preview
 
 ### Core Components
 
+Both R and Python implementations use an object-oriented architecture:
+
 1. **Data Sources**: Abstractions for data frames and database connections that provide schema information and execute SQL queries
-   - R: `querychat_data_source()` in `pkg-r/R/data_source.R`
+   - R: R6 class hierarchy in `pkg-r/R/`
+     - `DataSource` - Abstract base class defining the interface (`DataSource.R`)
+     - `DataFrameSource` - For data.frame objects (`DataFrameSource.R`)
+     - `DBISource` - For DBI database connections (`DBISource.R`)
+     - `TblSqlSource` - For dbplyr tbl_sql objects (`TblSqlSource.R`)
    - Python: `DataSource` classes in `pkg-py/src/querychat/datasource.py`
 
 2. **LLM Client**: Integration with LLM providers (OpenAI, Anthropic, etc.) through:
    - R: ellmer package
    - Python: chatlas package
 
-3. **Query Chat Interface**: UI components and server logic for the chat experience:
-   - R: `querychat_sidebar()`, `querychat_ui()`, and `querychat_server()` in `pkg-r/R/querychat.R`
+3. **Query Chat Interface**: Main orchestration class that manages the chat experience:
+   - R: `QueryChat` R6 class in `pkg-r/R/QueryChat.R`
+     - Provides methods: `$new()`, `$app()`, `$sidebar()`, `$ui()`, `$server()`, `$df()`, `$sql()`, etc.
+     - Internal Shiny module functions: `mod_ui()` and `mod_server()` in `pkg-r/R/querychat_module.R`
    - Python: `QueryChat` class in `pkg-py/src/querychat/querychat.py`
 
-4. **Prompt Engineering**: System prompts and tool definitions that guide the LLM:
+4. **System Prompt Management**:
+   - R: `QueryChatSystemPrompt` R6 class in `pkg-r/R/QueryChatSystemPrompt.R`
+     - Handles loading and rendering of prompt templates with Mustache
+     - Manages data descriptions and extra instructions
+   - Python: Similar logic in `QueryChat` class
+
+5. **Prompt Engineering**: System prompts and tool definitions that guide the LLM:
    - R: `pkg-r/inst/prompts/`
      - Main prompt (`prompt.md`)
      - Tool descriptions (`tool-query.md`, `tool-reset-dashboard.md`, `tool-update-dashboard.md`)
    - Python: `pkg-py/src/querychat/prompts/`
      - Main prompt (`prompt.md`)
      - Tool descriptions (`tool-query.md`, `tool-reset-dashboard.md`, `tool-update-dashboard.md`)
 
+### R Package Architecture
+
+The R package uses R6 classes for object-oriented design:
+
+- **QueryChat**: Main user-facing class that orchestrates the entire query chat experience
+  - Takes data sources as input
+  - Provides methods for UI generation (`$sidebar()`, `$ui()`, `$app()`)
+  - Manages server logic and reactive values (`$server()`)
+  - Exposes reactive accessors (`$df()`, `$sql()`, `$title()`)
+
+- **DataSource hierarchy**: Abstract interface for different data backends
+  - All implementations provide: `get_schema()`, `execute_query()`, `test_query()`, `get_data()`
+  - Allows QueryChat to work with data.frames, DBI connections, and dbplyr objects uniformly
+
+- **QueryChatSystemPrompt**: Internal class for prompt template management
+  - Loads templates from files or strings
+  - Renders prompts with tool configurations using Mustache
+
+The package has deprecated the old functional API (`querychat_init()`, `querychat_server()`, etc.) in favor of the R6 class approach. See `pkg-r/R/deprecated.R` for migration guidance.
+
 ### Data Flow
 
 1. User enters a natural language query in the UI
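
The "DataSource hierarchy" bullet added to CLAUDE.md above names four methods that every backend provides: `get_schema()`, `execute_query()`, `test_query()`, and `get_data()`. The sketch below is a rough Python analogue of that idea, not the package's code. Only the four method names come from the diff; the class names `DataSourceSketch` and `SqliteTableSource`, the signatures, the return types, and the schema text format are all assumptions made for illustration.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataSourceSketch(ABC):
    """Hypothetical interface: method names from CLAUDE.md, signatures assumed."""

    @abstractmethod
    def get_schema(self) -> str: ...

    @abstractmethod
    def execute_query(self, query: str) -> list: ...

    @abstractmethod
    def test_query(self, query: str) -> bool: ...

    @abstractmethod
    def get_data(self) -> list: ...


class SqliteTableSource(DataSourceSketch):
    """Illustrative backend over a single table in a sqlite3 connection."""

    def __init__(self, con: sqlite3.Connection, table_name: str):
        self.con = con
        self.table_name = table_name

    def get_schema(self) -> str:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = self.con.execute(f'PRAGMA table_info("{self.table_name}")').fetchall()
        lines = [f"- {col[1]} ({col[2]})" for col in cols]
        return "\n".join([f"Table: {self.table_name}", "Columns:", *lines])

    def execute_query(self, query: str) -> list:
        return self.con.execute(query).fetchall()

    def test_query(self, query: str) -> bool:
        # Fetch at most one row to check that the query runs at all.
        try:
            self.con.execute(query).fetchone()
            return True
        except sqlite3.Error:
            return False

    def get_data(self) -> list:
        return self.execute_query(f'SELECT * FROM "{self.table_name}"')


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (name TEXT, mpg REAL)")
con.executemany("INSERT INTO cars VALUES (?, ?)", [("a", 21.0), ("b", 30.4)])
source = SqliteTableSource(con, "cars")
print(source.get_schema())
```

Because callers only touch the four abstract methods, a second backend (for example one wrapping an in-memory table) could be swapped in without changing the calling code, which is the uniformity the CLAUDE.md text describes.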

pkg-py/src/querychat/_system_prompt.py

Lines changed: 1 addition & 0 deletions
@@ -75,6 +75,7 @@ def render(self, tools: tuple[TOOL_GROUPS, ...] | None) -> str:
             "extra_instructions": self.extra_instructions,
             "has_tool_update": "update" in tools if tools else False,
             "has_tool_query": "query" in tools if tools else False,
+            "include_query_guidelines": len(tools or ()) > 0,
         }
 
         return chevron.render(self.template, context)
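
To see how the new `include_query_guidelines` flag behaves next to the two existing flags, the three expressions from the hunk above can be evaluated on their own. The helper name `build_tool_flags` is hypothetical and exists only for this sketch; the expressions themselves are copied from the diff.

```python
def build_tool_flags(tools):
    """Evaluate the three tool-related context entries for a tuple of tool names (or None)."""
    return {
        "has_tool_update": "update" in tools if tools else False,
        "has_tool_query": "query" in tools if tools else False,
        # True as soon as any tool is enabled; False for () and for None.
        "include_query_guidelines": len(tools or ()) > 0,
    }


for tools in [("update", "query"), ("query",), ("update",), (), None]:
    print(tools, build_tool_flags(tools))
```

With only the update tool enabled, `has_tool_query` is false while `include_query_guidelines` is true, which matches the assertions added to `test_render_with_update_only` in this commit.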

pkg-py/src/querychat/prompts/prompt.md

Lines changed: 39 additions & 2 deletions
@@ -16,6 +16,44 @@ Here is additional information about the data:
 
 For security reasons, you may only query this specific table.
 
+{{#include_query_guidelines}}
+## SQL Query Guidelines
+
+When writing SQL queries to interact with the database, please adhere to the following guidelines to ensure compatibility and correctness.
+
+### Structural Rules
+
+**No trailing semicolons**
+Never end your query with a semicolon (`;`). The parent query needs to continue after your subquery closes.
+
+**Single statement only**
+Return exactly one `SELECT` statement. Do not include multiple statements separated by semicolons.
+
+**No procedural or meta statements**
+Do not include:
+- `EXPLAIN` / `EXPLAIN ANALYZE`
+- `SET` statements
+- Variable declarations
+- Transaction controls (`BEGIN`, `COMMIT`, `ROLLBACK`)
+- DDL statements (`CREATE`, `ALTER`, `DROP`)
+- `INTO` clauses (e.g., `SELECT INTO`)
+- Locking hints (`FOR UPDATE`, `FOR SHARE`)
+
+### Column Naming Rules
+
+**Alias all computed/derived columns**
+Every expression that isn't a simple column reference must have an explicit alias.
+
+**Ensure unique column names**
+The result set must not have duplicate column names, even when selecting from multiple tables.
+
+**Avoid `SELECT *` with JOINs**
+Explicitly list columns to prevent duplicate column names and ensure a predictable output schema.
+
+**Avoid reserved words as unquoted aliases**
+If using reserved words as column aliases, quote them appropriately for your dialect.
+
+{{/include_query_guidelines}}
 {{#is_duck_db}}
 ### DuckDB SQL Tips
 
@@ -130,7 +168,7 @@ You might want to <span class="suggestion">explore the advanced features</span>
 - The user has asked a very specific question requiring only a direct answer
 - The conversation is clearly wrapping up
 
-#### Guidelines
+#### Suggestion Guidelines
 
 - Suggestions can appear **anywhere** in your response—not just at the end
 - Use list format at the end for 2-4 follow-up options (most common pattern)
@@ -141,7 +179,6 @@ You might want to <span class="suggestion">explore the advanced features</span>
 - Never use generic phrases like "If you'd like to..." or "Would you like to explore..." — instead, provide concrete suggestions
 - Never refer to suggestions as "prompts" – call them "suggestions" or "ideas" or similar
 
-
 ## Important Guidelines
 
 - **Ask for clarification** if any request is unclear or ambiguous
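
The "No trailing semicolons" rule in the SQL Query Guidelines added to `prompt.md` above says the parent query must be able to continue after the subquery closes. The sketch below illustrates that failure mode with Python's standard `sqlite3` module. The wrapper `run_as_subquery`, the table `t`, and the exact wrapping SQL are assumptions for illustration; the diff does not show how querychat or dbplyr actually embed the generated query.

```python
import sqlite3


def run_as_subquery(con, query):
    """Embed `query` as a subquery of a parent SELECT (hypothetical wrapper)."""
    wrapped = f"SELECT * FROM ({query}) AS q LIMIT 5"
    return con.execute(wrapped).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Follows the guidelines: one SELECT, aliased computed column, no semicolon.
ok_rows = run_as_subquery(con, "SELECT x, x * 2 AS doubled FROM t")

# Violates the guideline: the semicolon ends the statement inside the parentheses.
try:
    run_as_subquery(con, "SELECT x FROM t;")
    semicolon_failed = False
except (sqlite3.Error, sqlite3.Warning):
    semicolon_failed = True

print(sorted(ok_rows), semicolon_failed)
```

The same wrapping is one reason the column naming rules matter: once a query becomes a subquery, the parent query addresses its columns by name, so unaliased expressions and duplicate names become harder to reference.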

pkg-py/tests/test_system_prompt.py

Lines changed: 7 additions & 1 deletion
@@ -30,7 +30,9 @@ def sample_prompt_template():
 {{#data_description}}Data: {{data_description}}{{/data_description}}
 {{#extra_instructions}}Instructions: {{extra_instructions}}{{/extra_instructions}}
 {{#has_tool_update}}UPDATE TOOL ENABLED{{/has_tool_update}}
-{{#has_tool_query}}QUERY TOOL ENABLED{{/has_tool_query}}"""
+{{#has_tool_query}}QUERY TOOL ENABLED{{/has_tool_query}}
+{{#include_query_guidelines}}QUERY GUIDELINES{{/include_query_guidelines}}
+"""
 
 
 class TestQueryChatSystemPromptInit:
@@ -157,6 +159,7 @@ def test_render_with_both_tools(self, sample_data_source, sample_prompt_template
 
         assert "UPDATE TOOL ENABLED" in rendered
         assert "QUERY TOOL ENABLED" in rendered
+        assert "QUERY GUIDELINES" in rendered
         assert "Database Type:" in rendered
         assert "Schema:" in rendered
 
@@ -171,6 +174,7 @@ def test_render_with_query_only(self, sample_data_source, sample_prompt_template
 
         assert "UPDATE TOOL ENABLED" not in rendered
         assert "QUERY TOOL ENABLED" in rendered
+        assert "QUERY GUIDELINES" in rendered
 
     def test_render_with_update_only(self, sample_data_source, sample_prompt_template):
         """Test rendering with only update tool enabled."""
@@ -183,6 +187,7 @@ def test_render_with_update_only(self, sample_data_source, sample_prompt_templat
 
         assert "UPDATE TOOL ENABLED" in rendered
         assert "QUERY TOOL ENABLED" not in rendered
+        assert "QUERY GUIDELINES" in rendered
 
     def test_render_with_no_tools(self, sample_data_source, sample_prompt_template):
         """Test rendering with no tools enabled."""
@@ -195,6 +200,7 @@ def test_render_with_no_tools(self, sample_data_source, sample_prompt_template):
 
         assert "UPDATE TOOL ENABLED" not in rendered
         assert "QUERY TOOL ENABLED" not in rendered
+        assert "QUERY GUIDELINES" not in rendered
 
     def test_render_includes_data_description(
         self, sample_data_source, sample_prompt_template

pkg-r/DESCRIPTION

Lines changed: 2 additions & 0 deletions
@@ -37,6 +37,8 @@ Imports:
     whisker
 Suggests:
     bsicons,
+    dbplyr,
+    dplyr,
     DT,
     duckdb,
     knitr,

pkg-r/NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ export(DBISource)
 export(DataFrameSource)
 export(DataSource)
 export(QueryChat)
+export(TblSqlSource)
 export(querychat)
 export(querychat_app)
 export(querychat_data_source)
