Python MissForest: "Users will have to perform categorical features encoding by themselves"


I'm trying to use MissForest in Python to impute missing values in a data set, but I'm having issues with the categorical columns.

The original documentation gives this example:

    categorical = ["sex", "smoker", "region", "children"]
    # Default estimators are lgbm classifier and regressor
    mf = MissForest(categorical=categorical)
    mf.fit(x=train)

But when I actually try to use it like this, inside a class I'm writing:

    from missforest import MissForest
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

    mf = MissForest(
        clf=RandomForestClassifier(n_jobs=-1),
        rgr=RandomForestRegressor(n_jobs=-1),
        categorical=list(self.data.select_dtypes(exclude='number').columns)
    )
    self.data = mf.fit_transform(self.data)

It gives me the warning "UserWarning: Label encoding is no longer performed by default. Users will have to perform categorical features encoding by themselves." Every attempt I have made to encode the columns myself has failed, for example:

    from missforest import MissForest
    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    # Categorical + numeric columns
    cat_cols = list(self.data.select_dtypes(exclude='number').columns)
    num_cols = list(self.data.select_dtypes(include='number').columns)

    # Encode categorical columns safely
    encoders = {}
    for col in cat_cols:
        enc = OrdinalEncoder()
        self.data.loc[:, col] = enc.fit_transform(self.data[[col]].astype(str))
        encoders[col] = enc

    # Initialize MissForest
    mf = MissForest(
        clf=RandomForestClassifier(n_jobs=-1),
        rgr=RandomForestRegressor(n_jobs=-1),
        categorical=cat_cols
    )

    # Fit + transform
    self.data = mf.fit_transform(self.data)

    # Decode categorical columns
    for col in cat_cols:
        self.data.loc[:, col] = encoders[col].inverse_transform(self.data[[col]])

I saw a similar question on here where the answer was "try to encode your variables first then use this modified version of missforest", but it didn't specify which encoding method to use, and the updated version of missforest still doesn't work.
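One thing worth checking in the attempt above: in many pandas versions, `.astype(str)` turns a missing value into the literal string "nan". The encoder then treats it as an ordinary category, and nothing is left for the imputer to fill. Below is a minimal sketch of an encoding step that keeps missing entries as NaN, using only pandas and NumPy. The helpers `encode_categoricals` and `decode_categoricals` are hypothetical names introduced for illustration and are not part of missforest. Whether this resolves the warning depends on the installed missforest version, which the question does not state.

```python
import numpy as np
import pandas as pd


def encode_categoricals(df, cat_cols):
    """Return a copy with categorical columns as float codes; NaN stays NaN."""
    encoded = df.copy()
    categories = {}
    for col in cat_cols:
        cat = encoded[col].astype("category")
        categories[col] = np.asarray(cat.cat.categories)
        codes = cat.cat.codes.astype("float64")
        # pandas uses the code -1 for missing values; turn it back into NaN
        encoded[col] = codes.where(codes >= 0, np.nan)
    return encoded, categories


def decode_categoricals(df, categories):
    """Map imputed integer codes back to the original labels."""
    decoded = df.copy()
    for col, cats in categories.items():
        # Assumes the column no longer contains NaN (i.e. it has been imputed)
        codes = decoded[col].round().astype(int).to_numpy()
        decoded[col] = cats[codes]
    return decoded


# Hypothetical example data, not taken from the question
data = pd.DataFrame({
    "sex": ["m", "f", None, "f"],
    "age": [30.0, np.nan, 41.0, 25.0],
})
cat_cols = ["sex"]

encoded, categories = encode_categoricals(data, cat_cols)
# An imputer would run here on `encoded`, for example the MissForest call
# from the question with categorical=cat_cols, and its output would then be
# passed to decode_categoricals(imputed, categories).
```

The sketch builds the codes from pandas categoricals instead of `OrdinalEncoder` only to keep it short. What matters is that the missing cells are still NaN when the frame reaches the imputer.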
