
@blaiszik
Contributor

No description provided.

blaiszik and others added 10 commits January 13, 2026 22:13
This file was part of PR #469 but was not included in the merge,
causing ModuleNotFoundError when importing foundry.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The forge DOI search can return multiple results where only one
actually has the matching DOI. Previously, get_metadata_by_doi()
blindly returned the first result, which often didn't have the
requested DOI.

Now it iterates through results to find the one with the exact
DOI match, fixing test_dataframe_search_by_doi and
test_dataframe_download_by_doi tests.
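A minimal sketch of the exact-match loop described above. It assumes each search result is a dict with its DOI under a flat `doi` key, which is a hypothetical layout; the real record structure may differ. It compares case-insensitively because DOI names are case-insensitive:

```python
from typing import Any, Optional


def find_exact_doi(results: list[dict[str, Any]], doi: str) -> Optional[dict[str, Any]]:
    """Return the first result whose DOI matches `doi` exactly, else None.

    Assumes a hypothetical flat "doi" key on each result. Comparison ignores
    case and surrounding whitespace, since DOI names are case-insensitive.
    """
    wanted = doi.strip().lower()
    for result in results:
        candidate = result.get("doi")
        if isinstance(candidate, str) and candidate.strip().lower() == wanted:
            return result
    # No result carried the requested DOI; the caller decides how to report it.
    return None
```

The helper returns None rather than falling back to the first result, so a missing match becomes an explicit condition instead of silently yielding the wrong dataset.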

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The combined size of torch, tensorflow, and NVIDIA CUDA dependencies
exceeded GitHub Actions runner disk space (~4GB+). These ML frameworks
are now available as optional extras via pip install .[torch] or
pip install .[tensorflow].

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused imports (sys, rprint, Optional, pandas, numpy)
- Fix unused exception variable
- Remove f-string without placeholders
- Split long line in MCP server description
- Add noqa comment for intentional re-export

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update test imports to use foundry.mdf_client.MDFClient instead of
mdf_forge.Forge, which is no longer a required dependency.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move heavy ML dependencies to optional extras to reduce default
install size:
- pip install foundry-ml[torch]
- pip install foundry-ml[tensorflow]
- pip install foundry-ml[huggingface]
- pip install foundry-ml[excel]
- pip install foundry-ml[examples]
- pip install foundry-ml[dev]

Update README with extras install instructions and NumPy 2.0
compatibility note.
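A sketch of how code behind an optional extra might fail with an install hint instead of a bare ModuleNotFoundError. `require_optional` and the module-to-extra mapping are hypothetical; the extra names come from the list above, but which module each extra provides is an assumption:

```python
import importlib
from types import ModuleType

# Hypothetical mapping from an importable module to the extra that installs it.
EXTRA_FOR_MODULE = {
    "torch": "torch",
    "tensorflow": "tensorflow",
    "datasets": "huggingface",  # assumption
    "openpyxl": "excel",  # assumption
}


def require_optional(module_name: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        extra = EXTRA_FOR_MODULE.get(module_name, module_name)
        raise ImportError(
            f"'{module_name}' is not installed. "
            f'Install it with: pip install "foundry-ml[{extra}]"'
        ) from exc
```

Quoting the requirement in the hint keeps shells such as zsh from treating the square brackets as a glob pattern.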

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MDFClient improvements:
- Add Globus Search index ID constants (MDF_INDEX_ID, MDF_TEST_INDEX_ID)
- Add match_source_names() method with automatic version suffix stripping
- Add _has_field_filters property to detect when advanced query mode is needed
- Use advanced=True automatically for DOI and source_name searches
  (required for exact field matching in Globus Search)
- Add try/finally to ensure query state is always reset after search
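A self-contained sketch of these ideas on a hypothetical `SketchClient` (not the real `MDFClient`). It assumes version suffixes look like `_v1.1` and that the index exposes a `mdf.source_name` field; `backend` stands in for the Globus Search call:

```python
import re
from typing import Any, Callable

# Assumed suffix format, e.g. "band_gaps_v1.1" -> "band_gaps".
_VERSION_SUFFIX = re.compile(r"_v\d+(?:\.\d+)*$")


class SketchClient:
    """Hypothetical stand-in for MDFClient illustrating this commit's changes."""

    def __init__(self, backend: Callable[[str, bool], list[dict[str, Any]]]):
        # backend(query, advanced) -> results; placeholder for the search service.
        self._backend = backend
        self._free_text: list[str] = []
        self._field_filters: list[str] = []

    def match_source_names(self, names: list[str]) -> "SketchClient":
        for name in names:
            base = _VERSION_SUFFIX.sub("", name)
            # The field name is an assumption about the index schema.
            self._field_filters.append(f'mdf.source_name:"{base}"')
        return self

    def match_term(self, text: str) -> "SketchClient":
        self._free_text.append(text)
        return self

    @property
    def _has_field_filters(self) -> bool:
        # Exact field matching requires the advanced query syntax.
        return bool(self._field_filters)

    def search(self) -> list[dict[str, Any]]:
        query = " AND ".join(self._free_text + self._field_filters)
        try:
            return self._backend(query, self._has_field_filters)
        finally:
            # Reset query state even if the backend raises.
            self._free_text.clear()
            self._field_filters.clear()
```

The try/finally means a failed search cannot leak filters into the next query, which is the failure mode the commit guards against.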

Foundry search fix:
- Pass free-text query to Globus Search for server-side filtering
  instead of fetching 10 results and filtering client-side
- This fixes searches like f.search("Computational Band Gaps") that
  were failing when the target dataset wasn't in the first 10 results
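A toy illustration of why the old approach missed results. With a hypothetical 25-record in-memory index, taking the first 10 records before filtering drops a match that filtering first would find. The index and titles are placeholders:

```python
from typing import Callable

# Toy in-memory "index"; titles are placeholders, not real MDF records.
INDEX = [f"Dataset {i}" for i in range(25)]
INDEX[17] = "Computational Band Gaps"


def fetch(limit: int, predicate: Callable[[str], bool] = lambda _: True) -> list[str]:
    """Simulate a search service: apply the filter first, then the result limit."""
    return [title for title in INDEX if predicate(title)][:limit]


def matches(query: str) -> Callable[[str], bool]:
    return lambda title: query.lower() in title.lower()


def client_side_search(query: str) -> list[str]:
    # Old behavior: fetch 10 unfiltered results, then filter locally.
    return [title for title in fetch(10) if matches(query)(title)]


def server_side_search(query: str) -> list[str]:
    # New behavior: let the service filter, then take up to 10 hits.
    return fetch(10, matches(query))
```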

Test additions:
- Add test_load_mp_band_gaps_dataset to verify DOI-based dataset loading

Re-render example notebooks with updated outputs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Progress bar improvements:
- Enable progress bars by default during dataset downloads
- Show "Finding files" progress while discovering files on server
- Show "Downloading" progress with file count (e.g., 5/10 files)
- Add per-file progress bar for files > 1MB with speed and ETA
- Use tqdm.auto for automatic Jupyter/terminal detection
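A stdlib-only sketch of the decision and bookkeeping behind the per-file bar. The real change uses tqdm.auto; here `on_chunk` plays the role of a bar's `update()` method, and reading "1MB" as 1 MiB is an assumption. All function names are hypothetical:

```python
import io
from typing import BinaryIO, Callable, Mapping, Optional

# Assumption: "files > 1MB" is read here as larger than 1 MiB.
PER_FILE_BAR_THRESHOLD = 1024 * 1024


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length (key matched case-insensitively), or None."""
    for key, value in headers.items():
        if key.lower() == "content-length":
            try:
                return int(value)
            except ValueError:
                return None
    return None


def wants_file_bar(headers: Mapping[str, str]) -> bool:
    """Show a per-file bar only when the size is known and above the threshold."""
    size = content_length(headers)
    return size is not None and size > PER_FILE_BAR_THRESHOLD


def copy_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    on_chunk: Callable[[int], None],
    chunk_size: int = 64 * 1024,
) -> int:
    """Copy src to dst chunk by chunk, reporting each chunk's size; return bytes copied."""
    total = 0
    while chunk := src.read(chunk_size):
        dst.write(chunk)
        total += len(chunk)
        on_chunk(len(chunk))
    return total


def file_count_label(done: int, total: int) -> str:
    """Text for the overall bar, e.g. "Downloading 5/10 files"."""
    return f"Downloading {done}/{total} files"


if __name__ == "__main__":
    seen: list[int] = []
    copied = copy_with_progress(io.BytesIO(b"x" * 200_000), io.BytesIO(), seen.append)
    print(copied, len(seen))  # 200000 bytes in 4 chunks of at most 64 KiB
```

Passing the `update` method of a `tqdm.auto.tqdm(total=size, unit="B", unit_scale=True)` instance as `on_chunk` would drive a real bar with speed and ETA.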

README improvements:
- Add "Export to HuggingFace Hub" section with CLI and Python examples
- Document the huggingface extra installation
- Mention auto-generated Dataset Cards feature

Test updates:
- Update download tests to mock response.headers for content-length
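Why the mock needs explicit headers: a bare `MagicMock` returns another mock for `response.headers.get(...)`, and `int()` of a `MagicMock` is 1, so size-based branches would quietly take the wrong path. A sketch with a hypothetical `expected_size` helper:

```python
from unittest import mock


def expected_size(response) -> int:
    """Hypothetical helper: the size the server advertises, defaulting to 0."""
    return int(response.headers.get("content-length", 0))


# A fake response shaped the way a download test might need it.
fake_response = mock.MagicMock()
fake_response.headers = {"content-length": "2048"}
fake_response.iter_content.return_value = [b"a" * 1024, b"b" * 1024]

# Without explicit headers, the size silently becomes 1.
bare_response = mock.MagicMock()
```

A plain dict is case-sensitive, unlike the case-insensitive header mapping in `requests`, so the key in the mock must match the spelling the code under test uses.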

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…zations

Features:
- Add dataset.preview(n=5) method to show actual data samples as DataFrame
- Add "Open in Colab" badges to all example notebooks (12 notebooks)

Improved error messages with actionable hints:
- get_dataset() now raises DatasetNotFoundError instead of returning None
- DownloadError includes contextual recovery hints based on error type
- Better messages for missing files, unsupported data types, failed loads
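A sketch of the error pattern described above. These classes are stand-ins that reuse the names from the commit; the messages, the hint table, and the dict-backed `get_dataset` are hypothetical:

```python
from typing import Any, Optional


class DatasetNotFoundError(Exception):
    """Stand-in: raised instead of returning None when a lookup finds nothing."""

    def __init__(self, identifier: str):
        super().__init__(
            f"No dataset found for '{identifier}'. "
            "Check the DOI or name, or try a broader search."
        )
        self.identifier = identifier


# Hypothetical recovery hints keyed by the underlying failure type.
_HINTS = {
    PermissionError: "Check that you are authenticated and have access to this dataset.",
    TimeoutError: "The server did not respond in time; retry or check your network.",
    FileNotFoundError: "The file is missing on the server; the dataset may have changed.",
}


def _hint_for(cause: Exception) -> Optional[str]:
    for exc_type, hint in _HINTS.items():
        if isinstance(cause, exc_type):
            return hint
    return None


class DownloadError(Exception):
    """Stand-in download error whose message carries a recovery hint when one applies."""

    def __init__(self, url: str, cause: Exception):
        self.url = url
        self.hint = _hint_for(cause)
        message = f"Failed to download {url}: {cause}"
        if self.hint:
            message += f"\nHint: {self.hint}"
        super().__init__(message)


def get_dataset(catalog: dict[str, Any], identifier: str) -> Any:
    """Look up a dataset, raising rather than returning None."""
    try:
        return catalog[identifier]
    except KeyError:
        raise DatasetNotFoundError(identifier) from None
```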

Test optimizations:
- Add pytest fixtures (scope=module) to share Foundry client across tests
- Add downloaded_dataset fixture to download once, share across 4 tests
- Use small dataset (10.18126/8p6m-e135) for all tests
- Enable HTTPS download tests to run on GitHub Actions
- Reduces test time by avoiding repeated client creation and downloads

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2 participants