Rules¶
Loss functions and models for rule learning.
Overview¶
- GradientBoostingRuleEnsemble – Additive rule ensemble fitted by gradient boosting.
- logistic_loss – Logistic loss function l(y, s) = log2(1 + exp(-ys)).
- Rule – Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.
- squared_loss – Squared loss function l(y, s) = (y-s)^2.
Details¶
-
realkd.rules.logistic_loss = logistic_loss¶
Logistic loss function l(y, s) = log2(1 + exp(-ys)).
Function assumes that positive and negative values are encoded as +1 and -1, respectively.
>>> y = array([1, -1, 1, -1])
>>> s = array([0, 0, 10, 10])
>>> logistic_loss(y, s)
array([1.00000000e+00, 1.00000000e+00, 6.54967668e-05, 1.44270159e+01])
>>> logistic_loss.g(y, s)
array([-5.00000000e-01, 5.00000000e-01, -4.53978687e-05, 9.99954602e-01])
>>> logistic_loss.h(y, s)
array([2.50000000e-01, 2.50000000e-01, 4.53958077e-05, 4.53958077e-05])
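The loss and its first and second derivatives `g` and `h` can be sketched in plain NumPy (a minimal illustration, not the realkd implementation; note that the `g` and `h` values in the doctest correspond to derivatives of the natural-logarithm form of the loss):

```python
import numpy as np

def logistic_loss(y, s):
    # l(y, s) = log2(1 + exp(-y*s)); logaddexp avoids overflow for large margins
    return np.logaddexp(0.0, -y * s) / np.log(2)

def logistic_loss_g(y, s):
    # first derivative w.r.t. s of ln(1 + exp(-y*s)): -y * sigmoid(-y*s)
    return -y / (1.0 + np.exp(y * s))

def logistic_loss_h(y, s):
    # second derivative w.r.t. s: sigmoid(y*s) * (1 - sigmoid(y*s))
    p = 1.0 / (1.0 + np.exp(-y * s))
    return p * (1.0 - p)

y = np.array([1.0, -1.0, 1.0, -1.0])
s = np.array([0.0, 0.0, 10.0, 10.0])
loss = logistic_loss(y, s)
```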
-
realkd.rules.squared_loss = squared_loss¶
Squared loss function l(y, s) = (y-s)^2.
>>> squared_loss
squared_loss
>>> y = array([-2, 0, 3])
>>> s = array([0, 1, 2])
>>> squared_loss(y, s)
array([4, 1, 1])
>>> squared_loss.g(y, s)
array([ 4, 2, -2])
>>> squared_loss.h(y, s)
array([2, 2, 2])
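Analogously, a plain-NumPy sketch of the squared loss with its gradient and constant Hessian, matching the doctest values (for illustration only, not the realkd implementation):

```python
import numpy as np

def squared_loss(y, s):
    # l(y, s) = (y - s)^2
    return (y - s) ** 2

def squared_loss_g(y, s):
    # d/ds (y - s)^2 = 2 * (s - y)
    return 2 * (s - y)

def squared_loss_h(y, s):
    # second derivative w.r.t. s is the constant 2
    return np.full(np.broadcast(y, s).shape, 2)

y = np.array([-2, 0, 3])
s = np.array([0, 1, 2])
```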
-
class realkd.rules.Rule(q=True, y=0.0, z=0.0, loss=<class 'realkd.rules.SquaredLoss'>, reg=1.0, max_col_attr=10, discretization=<function qcut>, method='bestboundfirst', apx=1.0, max_depth=None)¶
Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> titanic[['Name', 'Sex', 'Survived']].iloc[0]
Name        Braund, Mr. Owen Harris
Sex                            male
Survived                          0
Name: 0, dtype: object
>>> titanic[['Name', 'Sex', 'Survived']].iloc[1]
Name        Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                    female
Survived                                                    1
Name: 1, dtype: object
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r = Rule(female, 1.0, 0.0)
>>> r(titanic.iloc[0]), r(titanic.iloc[1])
(0.0, 1.0)
>>> target = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> opt = Rule(reg=0.0)
>>> opt.fit(titanic, target)
+0.7420 if Sex==female
>>> best_logistic = Rule(loss='logistic')
>>> best_logistic.fit(titanic, target.replace(0, -1))
-1.4248 if Pclass>=2 & Sex==male
>>> best_logistic.predict(titanic)
array([-1.,  1.,  1.,  1., ...,  1.,  1., -1.])
>>> greedy = Rule(loss='logistic', reg=1.0, method='greedy')
>>> greedy.fit(titanic, target.replace(0, -1))
-1.4248 if Pclass>=2 & Sex==male
>>> empty = Rule()
>>> empty
+0.0000 if True
- Parameters
q – binary query function determining whether the rule applies to a given instance (default True)
y – prediction value if the query is satisfied
z – prediction value if the query is not satisfied
loss – loss function to optimise, e.g. 'squared' or 'logistic'
reg – regularisation strength
max_col_attr – maximum number of propositions generated per data column
discretization – discretization function for numerical columns (default qcut)
method – query search method, either 'bestboundfirst' or 'greedy'
apx – approximation ratio (ignored when method is 'greedy')
-
__call__(x)¶
Predicts the score for input data based on the loss function. For instance, for logistic loss it returns the log odds of the positive class.
- Parameters
x – input data to predict scores for
- Returns
array of prediction scores
-
fit(data, target, scores=None, verbose=False)¶
Fits the rule to provide the best loss reduction on the given data (where the baseline prediction scores are either given explicitly through the scores parameter or assumed to be 0).
- Parameters
data – pandas DataFrame containing only the feature columns
target – pandas Series containing the target values
scores – prior prediction scores according to which the reduction in prediction loss is optimised
verbose – whether to print status update and summary of query search
- Returns
self
-
predict_proba(data)¶
Generates probability predictions for the given data.
- Parameters
data – pandas dataframe with data to predict probabilities for
- Returns
two-dimensional array of probabilities
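For logistic loss, raw rule scores are log odds of the positive class, so probabilities can be recovered with the sigmoid function. A hypothetical helper (`proba_from_scores` is not part of realkd) sketching how the two-column array can be built:

```python
import numpy as np

def proba_from_scores(scores):
    # hypothetical helper: map raw log-odds scores to columns [P(y=-1), P(y=+1)]
    p_pos = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return np.column_stack([1.0 - p_pos, p_pos])

probs = proba_from_scores([-1.4248, 0.0, 1.7471])
```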
-
class realkd.rules.GradientBoostingRuleEnsemble(max_rules=3, loss=<class 'realkd.rules.SquaredLoss'>, members=[], reg=1.0, max_col_attr=10, discretization=<function qcut>, offset_rule=False, method='bestboundfirst', apx=1.0, max_depth=None)¶
Additive rule ensemble fitted by gradient boosting.
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = GradientBoostingRuleEnsemble(members=[r1, r2, r3, r4])
>>> len(ensemble)
4
>>> ensemble[2]
+0.3000 if Sex==female
>>> ensemble[:2]
-0.5000 if True
+1.0000 if Sex==female
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> survived = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> re = GradientBoostingRuleEnsemble(loss=logistic_loss)
>>> re.fit(titanic, survived.replace(0, -1), verbose=0)
-1.4248 if Pclass>=2 & Sex==male
+1.7471 if Pclass<=2 & Sex==female
+2.5598 if Age<=19.0 & Fare>=7.8542 & Parch>=1.0 & Sex==male & SibSp<=1.0
# Performance with bestboundfirst:
# found optimum after inspecting 443 nodes: -1.4248 if Pclass>=2 & Sex==male
# found optimum after inspecting 786 nodes: +1.7471 if Pclass<=2 & Sex==female
# found optimum after inspecting 6564 nodes
>>> re_with_offset = GradientBoostingRuleEnsemble(max_rules=2, loss='logistic', offset_rule=True)
>>> re_with_offset.fit(titanic, survived.replace(0, -1))
-0.4626 if True
+2.3076 if Pclass<=2 & Sex==female
>>> greedy = GradientBoostingRuleEnsemble(max_rules=3, loss='logistic', method='greedy')
>>> greedy.fit(titanic, survived.replace(0, -1))
-1.4248 if Pclass>=2 & Sex==male
+1.7471 if Pclass<=2 & Sex==female
-0.4225 if Parch<=1.0 & Sex==male
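Conceptually, each boosting round fits one new rule against the gradient and Hessian statistics of the current ensemble scores and adds it with its optimal Newton weight. A self-contained toy sketch for squared loss with simple threshold queries on a single numeric feature (not the realkd search, which optimises conjunctive queries over all columns):

```python
import numpy as np

def boost_rules(x, y, max_rules=3, reg=1.0):
    """Toy gradient boosting of threshold rules 'x <= t' on one numeric feature."""
    rules, scores = [], np.zeros_like(y, dtype=float)
    for _ in range(max_rules):
        g = 2 * (scores - y)          # squared-loss gradient w.r.t. scores
        h = np.full(len(y), 2.0)      # squared-loss Hessian (constant 2)
        best = None
        for t in np.unique(x):        # candidate queries q(x): x <= t
            covered = x <= t
            G, H = g[covered].sum(), h[covered].sum()
            w = -G / (H + reg)        # optimal Newton weight for the covered set
            gain = 0.5 * G * G / (H + reg)
            if best is None or gain > best[0]:
                best = (gain, t, w)
        _, t, w = best
        rules.append((t, w))
        scores = scores + np.where(x <= t, w, 0.0)
    return rules, scores

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
rules, scores = boost_rules(x, y, max_rules=3, reg=0.0)
```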
-
__call__(x)¶
Computes combined prediction scores using all ensemble members.
- Parameters
x – dataframe to make predictions for
- Returns
vector of prediction scores for all rows in x
-
consolidated(inplace=False)¶
Consolidates rules with equivalent queries into one.
- Parameters
inplace – whether to update self or to create new ensemble
- Returns
reference to consolidated ensemble (self if inplace=True)
For example:
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = GradientBoostingRuleEnsemble(members=[r1, r2, r3, r4])
>>> ensemble.consolidated(inplace=True)
-0.7000 if True
+1.3000 if Sex==female
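The consolidation step amounts to summing the weights of rules that share an equivalent query, keeping the order of first appearance. A minimal sketch over hypothetical (query, weight) pairs (not the realkd implementation):

```python
def consolidate(members):
    # merge rules with identical queries by summing their weights;
    # dict preserves the order in which queries first appear (Python 3.7+)
    merged = {}
    for query, weight in members:
        merged[query] = merged.get(query, 0.0) + weight
    return list(merged.items())

# mirrors the doctest above: two 'True' rules and two 'Sex==female' rules
rules = [('True', -0.5), ('Sex==female', 1.0), ('Sex==female', 0.3), ('True', -0.2)]
```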