Skip to content

Error of the Stackoverflow Tokernizer example #301

@WilliamYi96

Description

@WilliamYi96

TensorFlow version: 2.5.3
fedjax version: 0.0.16
jax version: 0.4.8

When I follow the docs (https://fedjax.readthedocs.io/en/latest/fedjax.datasets.html#fedjax.datasets.stackoverflow.load_data) to process the Stackoverflow dataset by using

from fedjax.datasets import stackoverflow
# Load partially preprocessed splits.
train, held_out, test = stackoverflow.load_data(cache_dir='../data')
# Apply tokenizer during batching.
Tokenizer = stackoverflow.StackoverflowTokenizer()
train_max_length, eval_max_length = 20, 30
train_for_train = train.preprocess_batch(
    tokenizer.as_preprocess_batch(train_max_length))
train_for_eval = train.preprocess_batch(
    tokenizer.as_preprocess_batch(eval_max_length))

It has the following error:

2023-05-06 23:46:33.460149: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    tokenizer = stackoverflow.StackoverflowTokenizer()
  File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/fedjax/datasets/stackoverflow.py", line 185, in __init__
    self._table = tf.lookup.StaticVocabularyTable(
  File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/tensorflow/python/ops/lookup_ops.py", line 1255, in __init__
    raise TypeError("Invalid key dtype, expected one of %s, but got %s." %
TypeError: Invalid key dtype, expected one of (tf.int64, tf.string), but got <dtype: 'float32'>.
Exception ignored in: <function CapturableResource.__del__ at 0x2b2156f4c040>
Traceback (most recent call last):
  File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/tensorflow/python/training/tracking/tracking.py", line 269, in __del__
    with self._destruction_context():
AttributeError: 'StaticVocabularyTable' object has no attribute '_destruction_context'

Could you please help fix this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions