Friday, February 24, 2012

Avoid splitting trees on missing values

Hi, could anyone kindly let me know how to prevent the Decision Trees model from using missing values as a criterion to split trees?

Thanks,

hz

Decision Trees only splits on missing values if you have sparse data. Are you possibly using nested tables? If not, then a column in your table has enough NULLs, correlated with your target, that the information gain is sufficient to cause a split. You can mark the column as NOT NULL, but that will simply prevent the model from processing at all if NULLs are present.
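To illustrate the point about NULLs being correlated with the target, here is a rough Python sketch. It uses plain entropy-based information gain, which is an assumption for illustration and not necessarily the scoring method the Microsoft algorithm applies. The column names and data are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def missing_split_gain(values, targets):
    """Information gain of splitting the cases into 'missing' vs 'present'."""
    missing = [t for v, t in zip(values, targets) if v is None]
    present = [t for v, t in zip(values, targets) if v is not None]
    total = len(targets)
    weighted = ((len(missing) / total) * entropy(missing)
                + (len(present) / total) * entropy(present))
    return entropy(targets) - weighted

# Hypothetical target column.
bought = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]

# Hypothetical predictor whose NULLs line up exactly with bought == "no".
income_correlated = [None, None, None, 50, 60, 70, 80, None]

# Same number of NULLs, but spread evenly across both target classes.
income_uncorrelated = [None, 50, None, 60, None, 70, None, 80]

gain_correlated = missing_split_gain(income_correlated, bought)
gain_uncorrelated = missing_split_gain(income_uncorrelated, bought)

print("gain when NULLs track the target:", gain_correlated)
print("gain when NULLs are unrelated:   ", gain_uncorrelated)
```

In the first column, knowing that the value is missing tells you the target, so a "missing" split looks very attractive. In the second, the same number of NULLs carries no information about the target and the split gains nothing.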

Yes, unfortunately, there are a lot of null values. I am not sure what you mean by "use NOT NULL". I have many predictor columns. Their null values are not in sync. If "NOT NULL" is used as a filter for every predictor, there may be no case left.
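The "no case left" concern can be shown with a small Python sketch. The rows and column names are hypothetical: each row is missing a different predictor, so requiring every predictor to be non-null discards everything.

```python
# Hypothetical cases: each row is missing a different predictor.
rows = [
    {"age": 34,   "income": None,  "region": "west"},
    {"age": None, "income": 52000, "region": "east"},
    {"age": 41,   "income": 61000, "region": None},
]

# Keep only cases where every predictor has a value.
complete_cases = [r for r in rows if all(v is not None for v in r.values())]

# Count the missing values per column for comparison.
missing_per_column = {
    col: sum(r[col] is None for r in rows) for col in rows[0]
}

print("complete cases:", len(complete_cases))
print("missing per column:", missing_per_column)
```

Each column on its own is missing only one value in three, yet no row survives the combined filter.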

Unfortunately, there's no way in SQL 2005 to avoid splitting on a value that appears in the data. It's something for us to think about for future versions (especially the "NULL" case).
