Patch/restore mdf client #475
Open: blaiszik wants to merge 10 commits into main from patch/restore-mdf-client
This file was part of PR #469 but was not included in the merge, causing ModuleNotFoundError when importing foundry.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The forge DOI search can return multiple results, of which only one actually has the matching DOI. Previously, get_metadata_by_doi() returned the first result unconditionally, and that result often lacked the requested DOI. It now iterates through the results to find the one whose DOI matches exactly, which fixes the test_dataframe_search_by_doi and test_dataframe_download_by_doi tests.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
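The exact-match selection described above can be sketched as follows. get_metadata_by_doi() is named in the commit; the helper name pick_exact_doi_match and the record layout (a DataCite-style dc.identifier.identifier path) are assumptions made for illustration, not foundry's actual code.

```python
from typing import Any, Dict, List, Optional


def pick_exact_doi_match(results: List[Dict[str, Any]], doi: str) -> Optional[Dict[str, Any]]:
    """Return the first search result whose DOI equals `doi`, or None.

    Hypothetical helper: the nested "dc" -> "identifier" -> "identifier"
    path is an assumed record layout.
    """
    # DOI names are case-insensitive, so compare normalized strings.
    target = doi.strip().lower()
    for record in results:
        identifier = record.get("dc", {}).get("identifier", {}).get("identifier", "")
        if identifier.strip().lower() == target:
            return record
    # No result carries the requested DOI; do not fall back to results[0].
    return None
```

Returning None instead of the first hit lets the caller decide how to report a miss, rather than silently loading the wrong dataset.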
The combined size of the torch, tensorflow, and NVIDIA CUDA dependencies exceeded the available disk space on GitHub Actions runners (~4GB+). These ML frameworks are now available as optional extras via pip install .[torch] or pip install .[tensorflow].
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused imports (sys, rprint, Optional, pandas, numpy)
- Fix unused exception variable
- Remove f-string without placeholders
- Split long line in MCP server description
- Add noqa comment for intentional re-export
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update test imports to use foundry.mdf_client.MDFClient instead of mdf_forge.Forge, which is no longer a required dependency.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move heavy ML dependencies to optional extras to reduce the default install size:
- pip install foundry-ml[torch]
- pip install foundry-ml[tensorflow]
- pip install foundry-ml[huggingface]
- pip install foundry-ml[excel]
- pip install foundry-ml[examples]
- pip install foundry-ml[dev]
Update README with extras install instructions and a NumPy 2.0 compatibility note.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
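Once a dependency moves to an extra, code that needs it has to cope with it being absent. A common pattern is to import lazily and point the user at the right extra. The sketch below assumes nothing about foundry's internals: require_extra is a hypothetical helper, and only the extra names come from this commit.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which extra provides it.

    Hypothetical helper for illustration; not part of foundry's API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install foundry-ml[{extra}]"
        ) from exc


# Usage: torch = require_extra("torch", "torch")
```

Deferring the import to the call site keeps `import foundry` working for users who never touch the torch or tensorflow code paths.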
MDFClient improvements:
- Add Globus Search index ID constants (MDF_INDEX_ID, MDF_TEST_INDEX_ID)
- Add match_source_names() method with automatic version suffix stripping
- Add _has_field_filters property to detect when advanced mode is needed
- Use advanced=True automatically for DOI and source_name searches
(required for exact field matching in Globus Search)
- Add try/finally to ensure query state is always reset after search
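The three MDFClient changes above (version-suffix stripping, detecting field filters to switch on advanced mode, and resetting query state in a finally block) fit together roughly as follows. This is a toy sketch, not MDFClient's code: the class QuerySketch, the mdf.source_name field path, and the "_v1.1"-style suffix format are assumptions. The returned dict only mimics the shape of a Globus Search query body (q and advanced).

```python
import re
from typing import Any, Dict, List

# Assumed suffix format: "_v" followed by dot-separated digits, e.g. "_v1.1".
_VERSION_SUFFIX = re.compile(r"_v\d+(?:\.\d+)*$")


class QuerySketch:
    """Hypothetical stand-in for MDFClient's query-building state."""

    def __init__(self) -> None:
        self.filters: List[str] = []
        self.free_text: str = ""

    def match_source_names(self, names: List[str]) -> "QuerySketch":
        for name in names:
            # Strip a trailing version so "ds_v1.1" matches source name "ds".
            base = _VERSION_SUFFIX.sub("", name)
            self.filters.append(f'mdf.source_name:"{base}"')
        return self

    @property
    def _has_field_filters(self) -> bool:
        # Field:value terms need Globus Search's advanced query syntax.
        return bool(self.filters)

    def search(self) -> Dict[str, Any]:
        try:
            terms = self.filters + ([self.free_text] if self.free_text else [])
            return {"q": " AND ".join(terms), "advanced": self._has_field_filters}
        finally:
            # Runs even if building or sending the query raises,
            # so the next search never inherits stale filters.
            self.filters.clear()
            self.free_text = ""
```

The return value is built before the finally block runs, so clearing the state afterwards does not affect the query that was just produced.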
Foundry search fix:
- Pass free-text query to Globus Search for server-side filtering
instead of fetching 10 results and filtering client-side
- This fixes searches like f.search("Computational Band Gaps") that
were failing when the target dataset wasn't in the first 10 results
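Why client-side filtering missed datasets can be shown with a toy in-memory index. The functions are hypothetical, and substring matching stands in for Globus Search's relevance ranking. The point is only the ordering: truncating to 10 before filtering can drop the target, while filtering on the server before truncating cannot.

```python
from typing import Dict, List

Record = Dict[str, str]


def server_search(index: List[Record], q: str = "", limit: int = 10) -> List[Record]:
    """Toy search service: filter by query text first, then truncate to `limit`."""
    hits = [r for r in index if q.lower() in r["title"].lower()]
    return hits[:limit]


def search_client_side(index: List[Record], text: str, limit: int = 10) -> List[Record]:
    # Previous behaviour: fetch the first `limit` records unfiltered, then filter locally.
    return [r for r in server_search(index, "", limit) if text.lower() in r["title"].lower()]


def search_server_side(index: List[Record], text: str, limit: int = 10) -> List[Record]:
    # Fixed behaviour: send the free text so filtering happens before truncation.
    return server_search(index, text, limit)
```

With 40 unrelated records ahead of a "Computational Band Gaps" record, the client-side version returns an empty list and the server-side version returns the target.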
Test additions:
- Add test_load_mp_band_gaps_dataset to verify DOI-based dataset loading
Re-rendered example notebooks with updated outputs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Progress bar improvements:
- Enable progress bars by default during dataset downloads
- Show "Finding files" progress while discovering files on the server
- Show "Downloading" progress with file count (e.g., 5/10 files)
- Add per-file progress bar for files > 1MB with speed and ETA
- Use tqdm.auto for automatic Jupyter/terminal detection
README improvements:
- Add "Export to HuggingFace Hub" section with CLI and Python examples
- Document the huggingface extra installation
- Mention auto-generated Dataset Cards feature
Test updates:
- Update download tests to mock response.headers for content-length
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
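A per-file progress bar that appears only above a size threshold can be sketched with tqdm.auto, which picks a notebook widget or a terminal bar. The function name, chunk size, and the reading of "1MB" as 1024 * 1024 bytes are assumptions. In a real download, total would come from the response's Content-Length header, which is why the tests mock response.headers. The sketch falls back to no bar if tqdm is not installed.

```python
import io
from typing import BinaryIO, Optional

try:
    from tqdm.auto import tqdm  # notebook or terminal bar, chosen automatically
except ImportError:  # keep the sketch runnable without tqdm
    tqdm = None

PER_FILE_BAR_THRESHOLD = 1024 * 1024  # assumed meaning of "1MB"
CHUNK_SIZE = 64 * 1024


def copy_with_progress(src: BinaryIO, dst: BinaryIO, total: Optional[int], name: str) -> int:
    """Copy src to dst in chunks; show a byte-level bar only for large files."""
    show_bar = tqdm is not None and total is not None and total > PER_FILE_BAR_THRESHOLD
    # unit="B" with unit_scale=True makes tqdm report human-readable speed and ETA.
    bar = tqdm(total=total, unit="B", unit_scale=True, desc=name, leave=False) if show_bar else None
    written = 0
    try:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
            if bar is not None:
                bar.update(len(chunk))
    finally:
        if bar is not None:
            bar.close()
    return written


if __name__ == "__main__":
    payload = b"\0" * (3 * 1024 * 1024)  # 3 MB, large enough to trigger the bar
    out = io.BytesIO()
    n = copy_with_progress(io.BytesIO(payload), out, len(payload), "example.bin")
    print(f"copied {n} bytes")
```

Skipping the bar for small files avoids a flood of short-lived bars when a dataset contains many tiny files; the overall "Downloading" counter covers those.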
…zations
Features:
- Add dataset.preview(n=5) method to show actual data samples as a DataFrame
- Add "Open in Colab" badges to all example notebooks (12 notebooks)
Improved error messages with actionable hints:
- get_dataset() now raises DatasetNotFoundError instead of returning None
- DownloadError includes contextual recovery hints based on error type
- Better messages for missing files, unsupported data types, failed loads
Test optimizations:
- Add pytest fixtures (scope=module) to share Foundry client across tests
- Add downloaded_dataset fixture to download once, share across 4 tests
- Use small dataset (10.18126/8p6m-e135) for all tests
- Enable HTTPS download tests to run on GitHub Actions
- Reduces test time by avoiding repeated client creation and downloads
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
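Raising a typed error with a recovery hint, instead of returning None, might look like the sketch below. The class names DatasetNotFoundError and DownloadError come from this commit; their constructors, the hint texts, the error-kind keys, and the dict-backed get_dataset are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical error kinds and hints; foundry's real categories may differ.
_HINTS: Dict[str, str] = {
    "timeout": "The server did not respond in time; retry or check your network.",
    "not_found": "The file is missing on the server; the dataset may have changed.",
    "permission": "Access was denied; check that you are logged in.",
}


class DatasetNotFoundError(Exception):
    """Raised instead of returning None when no dataset matches."""

    def __init__(self, query: str) -> None:
        super().__init__(f"No dataset found for {query!r}. Hint: check the DOI or broaden the search.")
        self.query = query


class DownloadError(Exception):
    """Download failure whose message carries a hint chosen by error kind."""

    def __init__(self, url: str, reason: str, kind: Optional[str] = None) -> None:
        message = f"Failed to download {url}: {reason}"
        hint = _HINTS.get(kind or "")
        if hint:
            message += f"\nHint: {hint}"
        super().__init__(message)
        self.url, self.reason, self.kind = url, reason, kind


def get_dataset(catalog: Dict[str, Any], doi: str) -> Any:
    """Toy lookup: a missing DOI raises instead of silently returning None."""
    try:
        return catalog[doi]
    except KeyError:
        raise DatasetNotFoundError(doi) from None
```

Raising at the lookup means a typo in a DOI fails immediately with a readable message, instead of surfacing later as an AttributeError on None.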
No description provided.