@cpsievert
Contributor

This PR removes pandas as a required dependency and replaces it with narwhals, a lightweight DataFrame abstraction layer that supports both pandas and polars backends. Users can now choose their preferred DataFrame library.

Motivation

  • Reduced dependencies: pandas is a heavy dependency; users who prefer polars shouldn't need to install pandas
  • Flexibility: Some users prefer polars for its performance characteristics
  • Future-proofing: narwhals provides a stable API that works across DataFrame libraries

Changes

Dependencies

  • Removed pandas from required dependencies
  • Added pandas and polars as optional dependencies
  • Install with pip install querychat[pandas] or pip install querychat[polars]
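With both libraries optional, something has to detect at run time which one is installed. The following is a minimal sketch of that idea, not the code in this PR: `pick_backend` and `PREFERRED_BACKENDS` are hypothetical names, and the polars-first order is taken from the description of `_df_compat.py` below.

```python
from importlib.util import find_spec

# Hypothetical preference order; the PR states only that polars is
# preferred when it is available.
PREFERRED_BACKENDS = ("polars", "pandas")


def pick_backend(is_installed=lambda name: find_spec(name) is not None):
    """Return the name of the first installed backend.

    `is_installed` is injectable so the selection logic can be tested
    without installing or uninstalling anything.
    """
    for name in PREFERRED_BACKENDS:
        if is_installed(name):
            return name
    raise ImportError(
        "No DataFrame backend found. Install one with "
        "`pip install querychat[pandas]` or `pip install querychat[polars]`."
    )
```

Raising an `ImportError` that names both extras gives users an actionable message if they install the package with neither backend.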

API Changes (Breaking)

  • execute_query(), get_data(), and df() now return narwhals DataFrames instead of pandas DataFrames
  • Call .to_native() on returned DataFrames to get the underlying pandas/polars DataFrame

Internal Changes

  • New _df_compat.py module handles backend selection (prefers polars when available)
  • df_to_html() generates HTML directly without pandas dependency
  • DataFrameSource accepts pandas, polars, or narwhals DataFrames
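To illustrate what generating HTML without pandas can look like, here is a rough sketch that builds a table from plain column names and row tuples using only the standard library. `rows_to_html` is a hypothetical helper, not the `df_to_html()` implementation in this PR, and the `max_rows` cutoff is an assumption.

```python
from html import escape


def rows_to_html(columns, rows, max_rows=5):
    """Render column names and row tuples as a minimal HTML table.

    Every value is converted with str() and HTML-escaped, so cell
    contents cannot inject markup into the page.
    """
    head = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows[:max_rows]
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
```

With a narwhals DataFrame, the inputs could presumably come from `df.columns` and `df.rows()`, which keeps the rendering code independent of the backend.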

Tests

  • Added test_df_compat.py for the compatibility layer
  • Added test_dataframe_source.py with comprehensive DataFrameSource tests
  • Polars-specific tests are skipped when polars isn't installed
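One way to skip backend-specific tests is sketched below using the standard library's `unittest`. This is an illustration only: the test class and test name are made up, and the project's own tests may well use pytest instead (where `pytest.importorskip("polars")` serves a similar purpose).

```python
import unittest
from importlib.util import find_spec

# True only when polars can be imported in this environment.
HAS_POLARS = find_spec("polars") is not None


class TestPolarsBackend(unittest.TestCase):
    @unittest.skipUnless(HAS_POLARS, "polars is not installed")
    def test_polars_frame_has_expected_shape(self):
        # Imported inside the test so that loading this module never
        # fails on machines without polars.
        import polars as pl

        df = pl.DataFrame({"x": [1, 2, 3]})
        self.assertEqual(df.shape, (3, 1))


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPolarsBackend)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A skipped test is reported as skipped rather than failed, so the suite stays green for pandas-only installs.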

Migration Guide

# Before
df = querychat.df()  # pandas DataFrame
df.head()

# After
df = querychat.df()  # narwhals DataFrame
df.head()            # still works - narwhals has similar API

# If you need a native pandas/polars DataFrame:
native_df = querychat.df().to_native()
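For code that has to run against both the old and the new return type during a migration, a small duck-typed helper may be convenient. This is a sketch, not part of the PR: `as_native` is a hypothetical name, and it assumes that native pandas and polars DataFrames do not themselves expose a `to_native` method.

```python
def as_native(df):
    """Return the backend DataFrame, unwrapping a narwhals wrapper if present.

    Objects without a callable `to_native` attribute are assumed to be
    native DataFrames already and are returned unchanged.
    """
    to_native = getattr(df, "to_native", None)
    return to_native() if callable(to_native) else df
```

Calling `as_native(querychat.df())` would then yield a native DataFrame under either version.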

@cpsievert requested a review from Copilot December 19, 2025 23:04

This comment was marked as resolved.

@cpsievert force-pushed the feat/narwhals-df branch 2 times, most recently from 09816ed to 1081e4d December 19, 2025 23:11
- """A DataSource implementation that wraps a pandas DataFrame using DuckDB."""
+ """A DataSource implementation that wraps a DataFrame using DuckDB."""

_df: nw.DataFrame | nw.LazyFrame
Contributor Author

@cpsievert cpsievert Dec 19, 2025

I'm pretty sure it will make sense for us to have a separate LazyFrameSource, which I'll add in a follow-up PR (before the next release). The benefit is that we can be lazier about computation, and possibly have .df() return a LazyFrame in that scenario as well.


# Ensure we're working with a DataFrame, not a LazyFrame
ndf = (
self._df.head(10).collect()
Contributor Author

Note that the downstream calculation of ranges and unique values wasn't working properly because it was based on only the first 10 rows. I'll address this when implementing the new LazyFrameSource.
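The problem described in this comment can be shown with plain Python lists: summary statistics computed from a 10-row preview can differ sharply from those of the full column. The data below is invented purely for illustration.

```python
# Stand-in for one numeric column with 102 rows.
values = list(range(100)) + [3, 3]

# What a head(10)-style sample sees.
preview = values[:10]

# Range computed from the preview versus the full column.
preview_range = (min(preview), max(preview))
full_range = (min(values), max(values))

# Number of distinct values in the preview versus the full column.
preview_unique = len(set(preview))
full_unique = len(set(values))

print(preview_range, full_range)    # (0, 9) (0, 99)
print(preview_unique, full_unique)  # 10 100
```

Any schema description built from the preview would under-report both the range and the number of distinct values, which is why such statistics need the full data (or a lazily evaluated aggregate over it).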

@cpsievert force-pushed the feat/narwhals-df branch 2 times, most recently from 65e778b to 167317d December 19, 2025 23:38
Remove pandas as a required dependency in favor of narwhals, which provides
a unified DataFrame interface supporting both pandas and polars backends.

Changes:
- Add _df_compat.py module with read_csv, read_sql, and duckdb_result_to_nw helpers
- Update DataSource classes to return narwhals DataFrames
- Update df_to_html to generate HTML without pandas dependency
- Make pandas and polars optional dependencies
- Add comprehensive tests for DataFrameSource and df_compat module

Users can now install with either `pip install querychat[pandas]` or
`pip install querychat[polars]`. Use `.to_native()` on returned DataFrames
to get the underlying pandas or polars DataFrame.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cpsievert requested a review from Copilot December 19, 2025 23:48
@cpsievert marked this pull request as ready for review December 19, 2025 23:48
@cpsievert requested a review from gadenbuie December 19, 2025 23:48

This comment was marked as resolved.

@cpsievert changed the title from "feat(pkg-py): Replace pandas dependency with narwhals abstraction layer" to "feat(pkg-py): Replace pandas with narwhals" Dec 19, 2025
2 participants