Merged
54 changes: 49 additions & 5 deletions CLAUDE.md
Expand Up @@ -17,7 +17,17 @@ The repository contains separate packages for R and Python:
```
/
├── pkg-r/ # R package implementation
│ ├── R/ # R source files
│ ├── R/ # R source files (R6 classes and utilities)
│ │ ├── QueryChat.R # Main QueryChat R6 class
│ │ ├── DataSource.R # Abstract DataSource base class
│ │ ├── DataFrameSource.R # DataSource for data.frames
│ │ ├── DBISource.R # DataSource for DBI connections
│ │ ├── TblSqlSource.R # DataSource for dbplyr tbl_sql
│ │ ├── QueryChatSystemPrompt.R # System prompt management (internal)
│ │ ├── querychat_module.R # Shiny module functions (internal)
│ │ ├── querychat_tools.R # Tool definitions for LLM
│ │ ├── deprecated.R # Deprecated functional API
│ │ └── utils-*.R # Utility functions
│ ├── inst/ # Installed files
│ │ ├── examples-shiny/ # Shiny example applications
│ │ ├── htmldep/ # HTML dependencies
Expand Down Expand Up @@ -98,26 +108,60 @@ make py-docs-preview

### Core Components

Both R and Python implementations use an object-oriented architecture:

1. **Data Sources**: Abstractions for data frames and database connections that provide schema information and execute SQL queries
- R: `querychat_data_source()` in `pkg-r/R/data_source.R`
- R: R6 class hierarchy in `pkg-r/R/`
- `DataSource` - Abstract base class defining the interface (`DataSource.R`)
- `DataFrameSource` - For data.frame objects (`DataFrameSource.R`)
- `DBISource` - For DBI database connections (`DBISource.R`)
- `TblSqlSource` - For dbplyr tbl_sql objects (`TblSqlSource.R`)
- Python: `DataSource` classes in `pkg-py/src/querychat/datasource.py`

2. **LLM Client**: Integration with LLM providers (OpenAI, Anthropic, etc.) through:
- R: ellmer package
- Python: chatlas package

3. **Query Chat Interface**: UI components and server logic for the chat experience:
- R: `querychat_sidebar()`, `querychat_ui()`, and `querychat_server()` in `pkg-r/R/querychat.R`
3. **Query Chat Interface**: Main orchestration class that manages the chat experience:
- R: `QueryChat` R6 class in `pkg-r/R/QueryChat.R`
- Provides methods: `$new()`, `$app()`, `$sidebar()`, `$ui()`, `$server()`, `$df()`, `$sql()`, etc.
- Internal Shiny module functions: `mod_ui()` and `mod_server()` in `pkg-r/R/querychat_module.R`
- Python: `QueryChat` class in `pkg-py/src/querychat/querychat.py`

4. **Prompt Engineering**: System prompts and tool definitions that guide the LLM:
4. **System Prompt Management**:
- R: `QueryChatSystemPrompt` R6 class in `pkg-r/R/QueryChatSystemPrompt.R`
- Handles loading and rendering of prompt templates with Mustache
- Manages data descriptions and extra instructions
- Python: Similar logic in `pkg-py/src/querychat/_system_prompt.py`

5. **Prompt Engineering**: System prompts and tool definitions that guide the LLM:
- R: `pkg-r/inst/prompts/`
- Main prompt (`prompt.md`)
- Tool descriptions (`tool-query.md`, `tool-reset-dashboard.md`, `tool-update-dashboard.md`)
- Python: `pkg-py/src/querychat/prompts/`
- Main prompt (`prompt.md`)
- Tool descriptions (`tool-query.md`, `tool-reset-dashboard.md`, `tool-update-dashboard.md`)

### R Package Architecture

The R package uses R6 classes for object-oriented design:

- **QueryChat**: Main user-facing class that orchestrates the entire query chat experience
- Takes data sources as input
- Provides methods for UI generation (`$sidebar()`, `$ui()`, `$app()`)
- Manages server logic and reactive values (`$server()`)
- Exposes reactive accessors (`$df()`, `$sql()`, `$title()`)

- **DataSource hierarchy**: Abstract interface for different data backends
- All implementations provide: `get_schema()`, `execute_query()`, `test_query()`, `get_data()`
- Allows QueryChat to work with data.frames, DBI connections, and dbplyr objects uniformly

- **QueryChatSystemPrompt**: Internal class for prompt template management
- Loads templates from files or strings
- Renders prompts with tool configurations using Mustache

The package has deprecated the old functional API (`querychat_init()`, `querychat_server()`, etc.) in favor of the R6 class approach. See `pkg-r/R/deprecated.R` for migration guidance.
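
As a rough illustration of the DataSource contract described above, here is a minimal Python sketch. The four method names come from the list above. The `SQLiteSource` class, the signatures, the return types, and the behavior of `test_query()` (fetching at most one row) are assumptions made for illustration, not the package's actual implementation.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical interface mirroring the four methods listed above."""

    @abstractmethod
    def get_schema(self) -> str: ...

    @abstractmethod
    def execute_query(self, query: str) -> list: ...

    @abstractmethod
    def test_query(self, query: str) -> list: ...

    @abstractmethod
    def get_data(self) -> list: ...


class SQLiteSource(DataSource):
    """Illustrative backend: one table in a SQLite connection."""

    def __init__(self, conn: sqlite3.Connection, table_name: str):
        self.conn = conn
        self.table_name = table_name

    def get_schema(self) -> str:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = self.conn.execute(f"PRAGMA table_info({self.table_name})").fetchall()
        lines = [f"- {name} ({ctype})" for _, name, ctype, *_ in cols]
        return f"Table: {self.table_name}\nColumns:\n" + "\n".join(lines)

    def execute_query(self, query: str) -> list:
        return self.conn.execute(query).fetchall()

    def test_query(self, query: str) -> list:
        # Assumption: validate a query cheaply by fetching at most one row.
        return self.conn.execute(query).fetchmany(1)

    def get_data(self) -> list:
        return self.execute_query(f"SELECT * FROM {self.table_name}")
```

Because callers only depend on the four abstract methods, another backend could be swapped in without changing the code that consumes the data source.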

### Data Flow

1. User enters a natural language query in the UI
Expand Down
1 change: 1 addition & 0 deletions pkg-py/src/querychat/_system_prompt.py
Expand Up @@ -75,6 +75,7 @@ def render(self, tools: tuple[TOOL_GROUPS, ...] | None) -> str:
"extra_instructions": self.extra_instructions,
"has_tool_update": "update" in tools if tools else False,
"has_tool_query": "query" in tools if tools else False,
"include_query_guidelines": len(tools or ()) > 0,
}

return chevron.render(self.template, context)
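
To make the effect of the new `include_query_guidelines` flag concrete, the three tool-related context entries from the hunk above can be isolated in a small standalone function. This is a sketch only: `tool_flags` is a hypothetical helper name, and the real method builds a larger context before rendering the template.

```python
def tool_flags(tools):
    """Return the tool-related template flags for a tuple of tool names or None.

    `tool_flags` is a hypothetical helper; the three expressions are copied
    from the context dict in the diff.
    """
    return {
        "has_tool_update": "update" in tools if tools else False,
        "has_tool_query": "query" in tools if tools else False,
        # True whenever at least one tool is enabled, whichever tool it is.
        "include_query_guidelines": len(tools or ()) > 0,
    }


if __name__ == "__main__":
    for tools in (("update", "query"), ("query",), ("update",), None):
        print(tools, tool_flags(tools))
```

With `("update",)` the query tool flag is off but the guidelines flag is still on, which matches the update-only test case in this PR. Both `None` and an empty tuple turn all three flags off.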
41 changes: 39 additions & 2 deletions pkg-py/src/querychat/prompts/prompt.md
Expand Up @@ -16,6 +16,44 @@ Here is additional information about the data:

For security reasons, you may only query this specific table.

{{#include_query_guidelines}}
## SQL Query Guidelines

When writing SQL queries to interact with the database, please adhere to the following guidelines to ensure compatibility and correctness.

### Structural Rules

**No trailing semicolons**
Never end your query with a semicolon (`;`). Your query may be embedded as a subquery, so the parent query needs to continue after it closes.

**Single statement only**
Return exactly one `SELECT` statement. Do not include multiple statements separated by semicolons.

**No procedural or meta statements**
Do not include:
- `EXPLAIN` / `EXPLAIN ANALYZE`
- `SET` statements
- Variable declarations
- Transaction controls (`BEGIN`, `COMMIT`, `ROLLBACK`)
- DDL statements (`CREATE`, `ALTER`, `DROP`)
- `INTO` clauses (e.g., `SELECT INTO`)
- Locking hints (`FOR UPDATE`, `FOR SHARE`)

### Column Naming Rules

**Alias all computed/derived columns**
Every expression that isn't a simple column reference must have an explicit alias.

**Ensure unique column names**
The result set must not have duplicate column names, even when selecting from multiple tables.

**Avoid `SELECT *` with JOINs**
Explicitly list columns to prevent duplicate column names and ensure a predictable output schema.

**Avoid reserved words as unquoted aliases**
If using reserved words as column aliases, quote them appropriately for your dialect.

{{/include_query_guidelines}}
{{#is_duck_db}}
### DuckDB SQL Tips

Expand Down Expand Up @@ -130,7 +168,7 @@ You might want to <span class="suggestion">explore the advanced features</span>
- The user has asked a very specific question requiring only a direct answer
- The conversation is clearly wrapping up

#### Guidelines
#### Suggestion Guidelines

- Suggestions can appear **anywhere** in your response—not just at the end
- Use list format at the end for 2-4 follow-up options (most common pattern)
Expand All @@ -141,7 +179,6 @@ You might want to <span class="suggestion">explore the advanced features</span>
- Never use generic phrases like "If you'd like to..." or "Would you like to explore..." — instead, provide concrete suggestions
- Never refer to suggestions as "prompts" – call them "suggestions" or "ideas" or similar


## Important Guidelines

- **Ask for clarification** if any request is unclear or ambiguous
Expand Down
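
The "No trailing semicolons" rule added to `prompt.md` is easier to see with a concrete case. The sketch below assumes that a generated query gets embedded as a subquery, as the rule's rationale suggests. The `run_as_subquery` wrapper, the `cars` table, and the use of SQLite are hypothetical choices for illustration, not how querychat actually executes queries.

```python
import sqlite3


def run_as_subquery(conn, query, limit=5):
    """Hypothetical wrapper: embed a generated query inside a parent query."""
    wrapped = f"SELECT * FROM ({query}) AS q LIMIT {limit}"
    return conn.execute(wrapped).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cars (name TEXT, mpg REAL)")
    conn.executemany(
        "INSERT INTO cars VALUES (?, ?)", [("a", 21.0), ("b", 30.5)]
    )

    # Follows the guidelines: one SELECT, aliased expression, no semicolon.
    rows = run_as_subquery(conn, "SELECT name, mpg * 2 AS double_mpg FROM cars")

    # Breaks the guidelines: the semicolon lands inside the parentheses.
    try:
        run_as_subquery(conn, "SELECT name FROM cars;")
        semicolon_failed = False
    except sqlite3.Error:
        semicolon_failed = True

    return rows, semicolon_failed
```

The second call fails because the semicolon ends the statement before the closing parenthesis and the `LIMIT` clause of the parent query are reached.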
8 changes: 7 additions & 1 deletion pkg-py/tests/test_system_prompt.py
Expand Up @@ -30,7 +30,9 @@ def sample_prompt_template():
{{#data_description}}Data: {{data_description}}{{/data_description}}
{{#extra_instructions}}Instructions: {{extra_instructions}}{{/extra_instructions}}
{{#has_tool_update}}UPDATE TOOL ENABLED{{/has_tool_update}}
{{#has_tool_query}}QUERY TOOL ENABLED{{/has_tool_query}}"""
{{#has_tool_query}}QUERY TOOL ENABLED{{/has_tool_query}}
{{#include_query_guidelines}}QUERY GUIDELINES{{/include_query_guidelines}}
"""


class TestQueryChatSystemPromptInit:
Expand Down Expand Up @@ -157,6 +159,7 @@ def test_render_with_both_tools(self, sample_data_source, sample_prompt_template

assert "UPDATE TOOL ENABLED" in rendered
assert "QUERY TOOL ENABLED" in rendered
assert "QUERY GUIDELINES" in rendered
assert "Database Type:" in rendered
assert "Schema:" in rendered

Expand All @@ -171,6 +174,7 @@ def test_render_with_query_only(self, sample_data_source, sample_prompt_template

assert "UPDATE TOOL ENABLED" not in rendered
assert "QUERY TOOL ENABLED" in rendered
assert "QUERY GUIDELINES" in rendered

def test_render_with_update_only(self, sample_data_source, sample_prompt_template):
"""Test rendering with only update tool enabled."""
Expand All @@ -183,6 +187,7 @@ def test_render_with_update_only(self, sample_data_source, sample_prompt_templat

assert "UPDATE TOOL ENABLED" in rendered
assert "QUERY TOOL ENABLED" not in rendered
assert "QUERY GUIDELINES" in rendered

def test_render_with_no_tools(self, sample_data_source, sample_prompt_template):
"""Test rendering with no tools enabled."""
Expand All @@ -195,6 +200,7 @@ def test_render_with_no_tools(self, sample_data_source, sample_prompt_template):

assert "UPDATE TOOL ENABLED" not in rendered
assert "QUERY TOOL ENABLED" not in rendered
assert "QUERY GUIDELINES" not in rendered

def test_render_includes_data_description(
self, sample_data_source, sample_prompt_template
Expand Down
2 changes: 2 additions & 0 deletions pkg-r/DESCRIPTION
Expand Up @@ -37,6 +37,8 @@ Imports:
whisker
Suggests:
bsicons,
dbplyr,
dplyr,
DT,
duckdb,
knitr,
Expand Down
1 change: 1 addition & 0 deletions pkg-r/NAMESPACE
Expand Up @@ -4,6 +4,7 @@ export(DBISource)
export(DataFrameSource)
export(DataSource)
export(QueryChat)
export(TblSqlSource)
export(querychat)
export(querychat_app)
export(querychat_data_source)
Expand Down