@blaiszik
Contributor

No description provided.

blaiszik and others added 8 commits January 13, 2026 22:13
This file was part of PR #469 but was not included in the merge,
causing ModuleNotFoundError when importing foundry.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The forge DOI search can return multiple results where only one
actually has the matching DOI. Previously, get_metadata_by_doi()
blindly returned the first result, which often didn't have the
requested DOI.

Now it iterates through results to find the one with the exact
DOI match, fixing test_dataframe_search_by_doi and
test_dataframe_download_by_doi tests.
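
A minimal sketch of the exact-match idea described above, not the actual get_metadata_by_doi() code. The flat "doi" key and the helper name find_exact_doi_match are hypothetical; the real record layout may differ. The comparison is case-insensitive because DOI names are case-insensitive.

```python
from typing import Optional


def find_exact_doi_match(results: list[dict], doi: str) -> Optional[dict]:
    """Return the first record whose DOI equals `doi`, or None if none match.

    Assumption: each record exposes its DOI under a flat "doi" key.
    """
    wanted = doi.strip().lower()
    for record in results:
        candidate = str(record.get("doi", "")).strip().lower()
        if candidate == wanted:
            return record
    # No exact match: report that explicitly instead of returning results[0].
    return None


# A search that returned two hits, only the second of which has the requested DOI.
results = [
    {"doi": "10.1234/other", "title": "Unrelated dataset"},
    {"doi": "10.1234/wanted", "title": "Requested dataset"},
]
match = find_exact_doi_match(results, "10.1234/WANTED")
```

Returning None when nothing matches keeps a near-miss from being passed off as the requested dataset, which is the failure mode the commit message describes.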

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The combined size of torch, tensorflow, and NVIDIA CUDA dependencies
exceeded GitHub Actions runner disk space (~4GB+). These ML frameworks
are now available as optional extras via pip install .[torch] or
pip install .[tensorflow].

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused imports (sys, rprint, Optional, pandas, numpy)
- Fix unused exception variable
- Remove f-string without placeholders
- Split long line in MCP server description
- Add noqa comment for intentional re-export

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update test imports to use foundry.mdf_client.MDFClient instead of
mdf_forge.Forge, which is no longer a required dependency.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move heavy ML dependencies to optional extras to reduce default
install size:
- pip install foundry-ml[torch]
- pip install foundry-ml[tensorflow]
- pip install foundry-ml[huggingface]
- pip install foundry-ml[excel]
- pip install foundry-ml[examples]
- pip install foundry-ml[dev]

Update README with extras install instructions and NumPy 2.0
compatibility note.
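
Once heavy frameworks move to extras, code that needs them usually has to import them lazily and fail with a useful message. The helper below is one common way to do that; require_extra is a hypothetical name and is not taken from the foundry codebase.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise an error naming the extra to install.

    Hypothetical helper: `extra` is only used to build the hint in the message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is not installed; "
            f"try: pip install foundry-ml[{extra}]"
        ) from exc


# Succeeds for anything importable; the standard library is used here so the
# example runs without any extras installed.
json_module = require_extra("json", "dev")
```

A call such as require_extra("torch", "torch") would then only be made inside the functions that actually need the framework, so a default install still imports cleanly.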

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MDFClient improvements:
- Add Globus Search index ID constants (MDF_INDEX_ID, MDF_TEST_INDEX_ID)
- Add match_source_names() method with automatic version suffix stripping
- Add _has_field_filters property for elegant advanced mode detection
- Use advanced=True automatically for DOI and source_name searches
  (required for exact field matching in Globus Search)
- Add try/finally to ensure query state is always reset after search
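
The three behaviors listed above (suffix stripping, automatic advanced mode, and state reset) can be sketched together. This is an illustration under assumptions, not the MDFClient implementation: SketchClient, the backend callable, the "_v<number>" suffix pattern, and the query string format are all hypothetical.

```python
import re

# Assumption: version suffixes look like "_v1" or "_v1.1" at the end of the name.
_VERSION_SUFFIX = re.compile(r"_v\d+(\.\d+)*$")


def strip_version_suffix(source_name: str) -> str:
    """Drop a trailing version suffix, leaving unversioned names unchanged."""
    return _VERSION_SUFFIX.sub("", source_name)


class SketchClient:
    """Toy query builder; `backend` is any callable(query, advanced) -> list."""

    def __init__(self, backend):
        self._backend = backend
        self._field_filters: dict[str, str] = {}

    @property
    def _has_field_filters(self) -> bool:
        # Field-specific filters need exact matching, hence advanced mode.
        return bool(self._field_filters)

    def match_source_names(self, source_name: str) -> "SketchClient":
        self._field_filters["source_name"] = strip_version_suffix(source_name)
        return self

    def search(self) -> list:
        try:
            query = " AND ".join(
                f'{field}:"{value}"' for field, value in self._field_filters.items()
            )
            return self._backend(query, advanced=self._has_field_filters)
        finally:
            # Runs on success and on error, so the next search starts clean.
            self._field_filters.clear()


calls = []


def recording_backend(query, advanced):
    calls.append((query, advanced))
    return []


def failing_backend(query, advanced):
    raise RuntimeError("search service unavailable")


SketchClient(recording_backend).match_source_names("foundry_band_gaps_v1.1").search()

broken = SketchClient(failing_backend).match_source_names("foundry_band_gaps_v1.1")
try:
    broken.search()
except RuntimeError:
    pass  # the filters were still cleared by the finally block
```

Without the finally clause, a failed search would leave its filters in place and silently narrow the next query made on the same client.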

Foundry search fix:
- Pass free-text query to Globus Search for server-side filtering
  instead of fetching 10 results and filtering client-side
- This fixes searches like f.search("Computational Band Gaps") that
  were failing when the target dataset wasn't in the first 10 results
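
A toy reproduction of why the old approach missed results. The in-memory DATASETS list and fake_index_search are stand-ins for the real search service, and the substring match is only an assumption made for this sketch.

```python
# 25 unrelated titles come first, so the target sits outside the first 10 results.
DATASETS = [f"Unrelated dataset {i}" for i in range(25)] + ["Computational Band Gaps"]


def fake_index_search(query: str = "", limit: int = 10) -> list[str]:
    """Stand-in search service: filter by substring first, then apply the limit."""
    hits = [title for title in DATASETS if query.lower() in title.lower()]
    return hits[:limit]


def search_client_side(query: str) -> list[str]:
    """Old behavior: fetch the first 10 results, then filter locally."""
    first_page = fake_index_search(limit=10)
    return [title for title in first_page if query.lower() in title.lower()]


def search_server_side(query: str) -> list[str]:
    """New behavior: let the service filter before the limit is applied."""
    return fake_index_search(query=query, limit=10)


old_hits = search_client_side("Computational Band Gaps")
new_hits = search_server_side("Computational Band Gaps")
```

Here old_hits is empty because the target never made it into the first page, while new_hits contains it because the limit is applied after filtering.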

Test additions:
- Add test_load_mp_band_gaps_dataset to verify DOI-based dataset loading

Re-rendered example notebooks with updated outputs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@blaiszik blaiszik merged commit 5327b2b into main Jan 22, 2026
4 checks passed
@what-the-diff

what-the-diff bot commented Jan 22, 2026

PR Summary

  • Enhancements to working_with_data.ipynb

    • The code has been updated to install and manage a specific version of pyarrow, a software library, more efficiently. This change also takes care of potential conflicts with other software.
    • For easier understanding, the code's format was improved and supplemented with extra comments and printed outputs.
  • Adjustments to oqmd.ipynb

    • Certain parts of the code are now run in a specific order to ensure the whole code functions properly.
    • Previous output messages, which might confuse reviewers, have been removed to make the code easier to read.
  • Optimizations in foundry.py

    • get_metadata_by_query now sends the query to the server, so filtering happens server-side rather than on the client.
    • The search feature received an upgrade, now allowing filtering by field-specific attributes like source_name.
  • Updates in mdf_client.py

    • Introduced constants for identifying specific Globus Search Indexes. This addition improves the organization and readability of the code.
    • Also included is a new method to filter data sets by source_name, enhancing navigation.
    • Error handling and management of dataset search were improved, reducing the chance of errors and facilitating smoother searches.
  • Improvements to test_foundry.py

    • A test was added to verify that a specific dataset can be loaded by its DOI (Digital Object Identifier).
    • General cleanup of the test code and removal of unnecessary comments provide greater clarity for anyone reviewing or using these tests.


2 participants