Description
Hi! While playing around with modula a bit, I found that the hello GPT tutorial gives NaNs if you simply comment out the code section below, i.e. the part of the tutorial that unrolls the dataloader for one step and prints the tensor shapes:
```python
# --- NOTE: Simply commenting out the code below in the hello GPT tutorial leads to NaNs! --- #
for inputs, targets in train_loader:
    print("Input shape:", inputs.shape)
    print("Target shape:", targets.shape)
    print("First input sequence:", inputs[0][:10], "...")
    print("First target sequence:", targets[0][:10], "...")
    print("\nDecoded input:", decode(inputs[0]))
    print("\nDecoded target:", decode(targets[0]))
    break
```

Link to tutorial notebook with NaNs: https://gist.github.com/amoudgl/858c3ba999d8be9af03062a3aadf7a79
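My best guess at why this innocuous-looking loop matters at all: `train_loader` is a stateful iterator, so pulling one batch out of it before training advances its RNG and shifts every batch that training subsequently sees. Removing the loop is then effectively the same as training on a different data order. Here is a minimal sketch of that effect with a toy generator-style loader (the loader below is a stand-in I wrote for illustration, not the tutorial's actual code):

```python
import numpy as np

def make_loader(seed=0):
    """Toy stand-in for the tutorial's train_loader: an infinite
    generator whose batches depend only on its internal RNG state."""
    rng = np.random.default_rng(seed)
    while True:
        yield rng.integers(0, 65, size=(4, 8))  # fake token batches

# Run A: pull one batch for inspection (the shape-printing loop), then "train".
loader_a = make_loader(seed=0)
next(loader_a)                        # the inspection batch
first_train_batch_a = next(loader_a)

# Run B: comment out the inspection loop and "train" directly.
loader_b = make_loader(seed=0)
first_train_batch_b = next(loader_b)

# Same seed, but training starts on different data in the two runs.
print((first_train_batch_a == first_train_batch_b).all())  # almost surely False
```

If that's right, the loop isn't preventing NaNs by itself; it just shifts which data order the run sees, which would be consistent with the seed sweep below where roughly half of the seeds diverge.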
I could also reproduce the NaNs by sweeping the dataloader seed (default 0 in the tutorial) from 0 to 99 while keeping the GPT tutorial code exactly as it is (i.e. keeping the shape-printing loop above). 48 of the 100 runs gave NaNs:
```
[Seed 0] Step 2000 --> val loss 1.882563829421997
[Seed 1] Step 2000 --> val loss 1.9162565469741821
[Seed 2] Step 2000 --> val loss 1.7845371961593628
[Seed 3] Step 2000 --> val loss nan
[Seed 4] Step 2000 --> val loss 1.7894586324691772
[Seed 5] Step 2000 --> val loss 1.9177284240722656
[Seed 6] Step 2000 --> val loss 1.8586244583129883
[Seed 7] Step 2000 --> val loss nan
[Seed 8] Step 2000 --> val loss nan
[Seed 9] Step 2000 --> val loss 1.8335838317871094
[Seed 10] Step 2000 --> val loss nan
[Seed 11] Step 2000 --> val loss nan
[Seed 12] Step 2000 --> val loss nan
[Seed 13] Step 2000 --> val loss nan
[Seed 14] Step 2000 --> val loss 1.9000786542892456
[Seed 15] Step 2000 --> val loss nan
[Seed 16] Step 2000 --> val loss nan
[Seed 17] Step 2000 --> val loss 1.9089670181274414
[Seed 18] Step 2000 --> val loss 1.7891883850097656
[Seed 19] Step 2000 --> val loss 1.7990573644638062
[Seed 20] Step 2000 --> val loss nan
[Seed 21] Step 2000 --> val loss nan
[Seed 22] Step 2000 --> val loss 1.832013726234436
[Seed 23] Step 2000 --> val loss nan
[Seed 24] Step 2000 --> val loss 1.8820613622665405
[Seed 25] Step 2000 --> val loss nan
[Seed 26] Step 2000 --> val loss 1.9186407327651978
[Seed 27] Step 2000 --> val loss 1.797479510307312
[Seed 28] Step 2000 --> val loss nan
[Seed 29] Step 2000 --> val loss nan
[Seed 30] Step 2000 --> val loss 1.7552546262741089
[Seed 31] Step 2000 --> val loss 2.0175182819366455
[Seed 32] Step 2000 --> val loss nan
[Seed 33] Step 2000 --> val loss nan
[Seed 34] Step 2000 --> val loss 1.959747314453125
[Seed 35] Step 2000 --> val loss 1.8269418478012085
[Seed 36] Step 2000 --> val loss nan
[Seed 37] Step 2000 --> val loss 1.8968876600265503
[Seed 38] Step 2000 --> val loss nan
[Seed 39] Step 2000 --> val loss 1.798439860343933
[Seed 40] Step 2000 --> val loss nan
[Seed 41] Step 2000 --> val loss nan
[Seed 42] Step 2000 --> val loss 1.8629509210586548
[Seed 43] Step 2000 --> val loss nan
[Seed 44] Step 2000 --> val loss 1.9264118671417236
[Seed 45] Step 2000 --> val loss nan
[Seed 46] Step 2000 --> val loss nan
[Seed 47] Step 2000 --> val loss 1.9535969495773315
[Seed 48] Step 2000 --> val loss nan
[Seed 49] Step 2000 --> val loss 1.841861367225647
[Seed 50] Step 2000 --> val loss nan
[Seed 51] Step 2000 --> val loss nan
[Seed 52] Step 2000 --> val loss nan
[Seed 53] Step 2000 --> val loss 1.8942348957061768
[Seed 54] Step 2000 --> val loss nan
[Seed 55] Step 2000 --> val loss 1.738916039466858
[Seed 56] Step 2000 --> val loss 1.953159213066101
[Seed 57] Step 2000 --> val loss 1.8719733953475952
[Seed 58] Step 2000 --> val loss 1.844765543937683
[Seed 59] Step 2000 --> val loss nan
[Seed 60] Step 2000 --> val loss nan
[Seed 61] Step 2000 --> val loss nan
[Seed 62] Step 2000 --> val loss nan
[Seed 63] Step 2000 --> val loss nan
[Seed 64] Step 2000 --> val loss 1.9283636808395386
[Seed 65] Step 2000 --> val loss nan
[Seed 66] Step 2000 --> val loss 1.9348175525665283
[Seed 67] Step 2000 --> val loss 1.8864034414291382
[Seed 68] Step 2000 --> val loss 1.9657562971115112
[Seed 69] Step 2000 --> val loss 1.9793367385864258
[Seed 70] Step 2000 --> val loss 2.008516311645508
[Seed 71] Step 2000 --> val loss 1.982019305229187
[Seed 72] Step 2000 --> val loss nan
[Seed 73] Step 2000 --> val loss 1.9085012674331665
[Seed 74] Step 2000 --> val loss nan
[Seed 75] Step 2000 --> val loss 1.9464409351348877
[Seed 76] Step 2000 --> val loss 1.8595117330551147
[Seed 77] Step 2000 --> val loss 1.9886068105697632
[Seed 78] Step 2000 --> val loss 1.824994683265686
[Seed 79] Step 2000 --> val loss nan
[Seed 80] Step 2000 --> val loss nan
[Seed 81] Step 2000 --> val loss nan
[Seed 82] Step 2000 --> val loss nan
[Seed 83] Step 2000 --> val loss 1.7689552307128906
[Seed 84] Step 2000 --> val loss 1.8707021474838257
[Seed 85] Step 2000 --> val loss nan
[Seed 86] Step 2000 --> val loss 1.9712644815444946
[Seed 87] Step 2000 --> val loss nan
[Seed 88] Step 2000 --> val loss 1.751327395439148
[Seed 89] Step 2000 --> val loss nan
[Seed 90] Step 2000 --> val loss 1.8812755346298218
[Seed 91] Step 2000 --> val loss nan
[Seed 92] Step 2000 --> val loss 1.922793984413147
[Seed 93] Step 2000 --> val loss 1.9098514318466187
[Seed 94] Step 2000 --> val loss 1.8975162506103516
[Seed 95] Step 2000 --> val loss nan
[Seed 96] Step 2000 --> val loss 1.759362816810608
[Seed 97] Step 2000 --> val loss nan
[Seed 98] Step 2000 --> val loss 1.8575009107589722
[Seed 99] Step 2000 --> val loss nan
```
Here are links to the detailed logs for each seed and to the associated code, which exposes the dataloader seed as a command-line argument:
logs: https://gist.github.com/amoudgl/0853bfa2f11af9d31ea4df364d337499
code: https://gist.github.com/amoudgl/b1bd6027f4086e9af6bd32c2ddd483c2
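For anyone who wants to reproduce the sweep without reading the gists: it is just a loop over that seed argument. A rough sketch (the script name `train_gpt.py` and the `--seed` flag here are placeholders; the gist's actual interface may differ):

```python
# Hypothetical sweep driver; "train_gpt.py" and "--seed" stand in
# for the actual script and flag in the code gist linked above.
import subprocess

for seed in range(100):
    subprocess.run(["python", "train_gpt.py", "--seed", str(seed)], check=True)
```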
cc @jxbz