Handle Unknown Categories Using OneHotEncoder

How do you deal with categories that were not part of your training set?

Answer: set handle_unknown='ignore' in OneHotEncoder.

Example:

Consider the following training data set:

vw_train

Here ‘Model’ is a categorical variable which we want to encode using OneHotEncoder.

Code snippet:

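from sklearn.preprocessing import OneHotEncoder

# In scikit-learn 1.2+, this argument is named sparse_output (sparse was removed in 1.4)
enc = OneHotEncoder(handle_unknown='ignore', sparse=False)

enc_fit = enc.fit(vw_train[['Model']])

enc_fit.transform(vw_train[['Model']])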

Output:

vw_train_transformed
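
Since the images are not reproduced here, the following is a minimal, self-contained sketch of the same step. The contents of vw_train are made up for illustration (the real data set is not shown in the text), and the sketch calls .toarray() on the default sparse output instead of passing a dense-output argument, so it does not depend on the scikit-learn version.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for vw_train; the actual rows are an assumption.
vw_train = pd.DataFrame({"Model": ["Golf", "Polo", "Passat", "Golf"]})

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(vw_train[["Model"]])

# The default output is a SciPy sparse matrix; .toarray() makes it dense.
train_encoded = enc.transform(vw_train[["Model"]]).toarray()

print(enc.categories_[0])  # categories learned during fit, in sorted order
print(train_encoded)       # one column per learned category
```

Each row has a single 1 in the column of its category, and the column order follows enc.categories_.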

If the testing set contains categories that were not seen during fitting, the encoder handles them automatically and encodes each of them as a row of all 0's.

In effect, all unknown categories are lumped into a single implicit 'other' category, because every one of them receives the same encoding.

Consider the following testing data set:

vw_test

Encode the testing data set.

Code snippet:

enc_fit.transform(vw_test[['Model']])

Output:

vw_test_transformed

All unknown categories are encoded the same way, as a row of zeros. In other words, the encoder folds every unknown category into one extra, implicit category.
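
To make the all-zeros behaviour concrete, here is a small sketch with made-up data. The model names in vw_train and vw_test are assumptions chosen for illustration; 'Tiguan' and 'Arteon' simply play the role of categories that were absent during fitting.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training and testing sets (not the author's data).
vw_train = pd.DataFrame({"Model": ["Golf", "Polo", "Passat"]})
vw_test = pd.DataFrame({"Model": ["Polo", "Tiguan", "Arteon"]})

# Fit on the training set only, then encode the testing set.
enc = OneHotEncoder(handle_unknown="ignore").fit(vw_train[["Model"]])
test_encoded = enc.transform(vw_test[["Model"]]).toarray()

# 'Polo' gets its usual one-hot row; the two unseen models get all zeros.
print(test_encoded)
```

Note that the two unseen models are indistinguishable after encoding, which is exactly why they behave like one shared category.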

If we change handle_unknown to 'error' (which is the default), the encoder raises an error as soon as it finds an unknown category during transform.

Code snippet:

enc = OneHotEncoder(handle_unknown='error', sparse=False)

enc_fit = enc.fit(vw_train[['Model']])

enc_fit.transform(vw_test[['Model']])

Output:

Error
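
The error in question is a ValueError. The sketch below, again on made-up data, shows one way to catch it; the variable name raised and the printed message are illustrative choices, not part of the original code.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; 'Tiguan' does not appear in the training set.
vw_train = pd.DataFrame({"Model": ["Golf", "Polo", "Passat"]})
vw_test = pd.DataFrame({"Model": ["Polo", "Tiguan"]})

enc = OneHotEncoder(handle_unknown="error").fit(vw_train[["Model"]])

try:
    enc.transform(vw_test[["Model"]])
    raised = False
except ValueError as exc:
    # scikit-learn reports which unknown categories it found.
    raised = True
    print(f"Encoding failed: {exc}")
```

Catching the exception like this lets the calling code decide what to do next, for example skipping the prediction for that batch.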

Conclusion:

  • When unknown categories have to be handled frequently, handle_unknown='ignore' is a good option. Later on, you can add the unknown categories to the training set and re-train your model.
  • Another option is to set handle_unknown='error' and not make a prediction at all when an unknown category is found.
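
As a rough sketch of the first option, the snippet below collects the categories the encoder has not seen and then refits the encoder on the combined data. The data is made up, and the snippet only refits the encoder; in a real project you would also need labelled rows for the new categories and would re-train the downstream model.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data; 'Tiguan' is new in the testing set.
vw_train = pd.DataFrame({"Model": ["Golf", "Polo", "Passat"]})
vw_test = pd.DataFrame({"Model": ["Polo", "Tiguan"]})

enc = OneHotEncoder(handle_unknown="ignore").fit(vw_train[["Model"]])

# Compare incoming values with what the encoder learned during fit.
known = set(enc.categories_[0])
unseen = sorted(set(vw_test["Model"]) - known)
print("Unseen categories:", unseen)

# Later: add the new rows to the training data and fit a fresh encoder.
combined = pd.concat([vw_train, vw_test], ignore_index=True)
enc_refit = OneHotEncoder(handle_unknown="ignore").fit(combined[["Model"]])
print(enc_refit.categories_[0])
```

Logging the unseen values over time gives a simple signal for when a re-train is worth doing.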

You can download the full source code from my GitHub repository.
