Hey there,
Here's a quick update to keep in mind for the next video.
In a newer version of Scikit-Learn (0.23+), the OneHotEncoder class was upgraded to handle None and NaN values.
However, since the video was recorded with an older version of Scikit-Learn, this upgrade isn't shown.
What happens in the video:
You will see an error at 4:30-4:35 which says ValueError: Input contains NaN.
This is expected behaviour for older versions of Scikit-Learn (the version the video was made with).
What might happen with your code:
If you're running Scikit-Learn 0.23+ (you can check by running import sklearn and then print(sklearn.__version__)), no error will appear.
See the update in the documentation under "Attributes" here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
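To see which behaviour your own install gives, here is a minimal sketch. The colours column is a hypothetical toy example (not the dataset from the video), and the try/except is only there so the snippet runs on either an older or a newer Scikit-Learn:

```python
import numpy as np
import sklearn
from sklearn.preprocessing import OneHotEncoder

# Check which version of Scikit-Learn is installed
print(sklearn.__version__)

# Hypothetical toy column with one missing value (not the course dataset)
colours = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

encoder = OneHotEncoder()
try:
    # Newer versions encode the missing value instead of raising
    encoded = encoder.fit_transform(colours)
    outcome = "no error"
    print(encoder.categories_)
except ValueError as err:
    # Older versions (like the one in the video) raise here
    outcome = "ValueError"
    print(err)

print(outcome)
```

If it prints "no error", your version is handling the missing value for you, which matches what's described above.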
What to do:
Even though you see no error, you can keep coding along with the video.
Just be aware that, because no error appears, you might not notice the dataset still has missing values, so follow the steps in the video to fill them.
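Since no error will warn you, one way to confirm whether missing values are still there is to count them yourself. A small sketch, assuming your data is in a pandas DataFrame (the df below and its column names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for your own DataFrame
df = pd.DataFrame({
    "Make": ["Toyota", None, "Honda", "BMW"],
    "Doors": [4.0, 4.0, np.nan, 5.0],
})

# Count missing values (None or NaN) in each column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Any column with a count above zero still needs filling, even if OneHotEncoder ran without complaining.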
---
Thank you to ChiraagKV for pointing this out on Discord.