Rules¶
Loss functions and models for rule learning.
Overview¶
- GradientBoostingRuleEnsemble – Additive rule ensemble fitted by gradient boosting.
- logistic_loss – Logistic loss function l(y, s) = log2(1 + exp(-ys)).
- Rule – Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.
- squared_loss – Squared loss function l(y, s) = (y-s)^2.
Details¶
-
realkd.rules.logistic_loss = logistic_loss¶
Logistic loss function l(y, s) = log2(1 + exp(-ys)).
Function assumes that positive and negative values are encoded as +1 and -1, respectively.
>>> y = array([1, -1, 1, -1])
>>> s = array([0, 0, 10, 10])
>>> logistic_loss(y, s)
array([1.00000000e+00, 1.00000000e+00, 6.54967668e-05, 1.44270159e+01])
>>> logistic_loss.g(y, s)
array([-5.00000000e-01, 5.00000000e-01, -4.53978687e-05, 9.99954602e-01])
>>> logistic_loss.h(y, s)
array([2.50000000e-01, 2.50000000e-01, 4.53958077e-05, 4.53958077e-05])
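The loss and its first and second derivatives `g` and `h` can be sketched in plain NumPy (a minimal illustration, not the realkd implementation; note that the `g` and `h` values in the doctest correspond to derivatives of the natural-logarithm form of the loss):

```python
import numpy as np

def logistic_loss(y, s):
    # l(y, s) = log2(1 + exp(-y*s)); logaddexp avoids overflow for large margins
    return np.logaddexp(0.0, -y * s) / np.log(2)

def logistic_loss_g(y, s):
    # first derivative w.r.t. s of ln(1 + exp(-y*s)): -y * sigmoid(-y*s)
    return -y / (1.0 + np.exp(y * s))

def logistic_loss_h(y, s):
    # second derivative w.r.t. s: sigmoid(y*s) * (1 - sigmoid(y*s))
    p = 1.0 / (1.0 + np.exp(-y * s))
    return p * (1.0 - p)

y = np.array([1.0, -1.0, 1.0, -1.0])
s = np.array([0.0, 0.0, 10.0, 10.0])
loss = logistic_loss(y, s)
```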
-
realkd.rules.squared_loss = squared_loss¶
Squared loss function l(y, s) = (y-s)^2.
>>> squared_loss
squared_loss
>>> y = array([-2, 0, 3])
>>> s = array([0, 1, 2])
>>> squared_loss(y, s)
array([4, 1, 1])
>>> squared_loss.g(y, s)
array([ 4, 2, -2])
>>> squared_loss.h(y, s)
array([2, 2, 2])
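Analogously, a plain-NumPy sketch of the squared loss with its gradient and constant Hessian, matching the doctest values (for illustration only, not the realkd implementation):

```python
import numpy as np

def squared_loss(y, s):
    # l(y, s) = (y - s)^2
    return (y - s) ** 2

def squared_loss_g(y, s):
    # d/ds (y - s)^2 = 2 * (s - y)
    return 2 * (s - y)

def squared_loss_h(y, s):
    # second derivative w.r.t. s is the constant 2
    return np.full(np.broadcast(y, s).shape, 2)

y = np.array([-2, 0, 3])
s = np.array([0, 1, 2])
```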
-
class realkd.rules.Rule(q=True, y=0.0, z=0.0, loss=<class 'realkd.rules.SquaredLoss'>, reg=1.0, max_col_attr=10, discretization=<function qcut>, method='bestboundfirst', apx=1.0, max_depth=None)¶
Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> titanic[['Name', 'Sex', 'Survived']].iloc[0]
Name        Braund, Mr. Owen Harris
Sex                            male
Survived                          0
Name: 0, dtype: object
>>> titanic[['Name', 'Sex', 'Survived']].iloc[1]
Name        Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                    female
Survived                                                    1
Name: 1, dtype: object
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r = Rule(female, 1.0, 0.0)
>>> r(titanic.iloc[0]), r(titanic.iloc[1])
(0.0, 1.0)
>>> target = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> opt = Rule(reg=0.0)
>>> opt.fit(titanic, target)
+0.7420 if Sex==female
>>> best_logistic = Rule(loss='logistic')
>>> best_logistic.fit(titanic, target.replace(0, -1))
-1.4248 if Pclass>=2 & Sex==male
>>> best_logistic.predict(titanic)
array([-1.,  1.,  1.,  1., ...,  1.,  1., -1.])
>>> greedy = Rule(loss='logistic', reg=1.0, method='greedy')
>>> greedy.fit(titanic, target.replace(0, -1))
-1.4248 if Pclass>=2 & Sex==male
>>> empty = Rule()
>>> empty
+0.0000 if True
- Parameters
q – binary query function determining whether the rule applies to a given instance (default True)
y – prediction value if the query is satisfied
z – prediction value if the query is not satisfied
loss – loss function to optimise, e.g. 'squared' or 'logistic'
reg – regularisation strength
max_col_attr – maximum number of propositions generated per data column
discretization – discretization function for numerical columns (default qcut)
method – query search method, either 'bestboundfirst' or 'greedy'
apx – approximation ratio (ignored when method is 'greedy')
-
__call__(x)¶
Predicts the score for input data based on the loss function. For instance, for logistic loss it returns the log odds of the positive class.
- Parameters
x – input data to predict scores for
- Returns
array of prediction scores
-
fit(data, target, scores=None, verbose=False)¶
Fits the rule to provide the best loss reduction on the given data (where the baseline prediction scores are either given explicitly through the scores parameter or assumed to be 0).
- Parameters
data – pandas DataFrame containing only the feature columns
target – pandas Series containing the target values
scores – prior prediction scores according to which the reduction in prediction loss is optimised
verbose – whether to print status update and summary of query search
- Returns
self
-
predict_proba(data)¶
Generates probability predictions for the given data.
- Parameters
data – pandas dataframe with data to predict probabilities for
- Returns
two-dimensional array of probabilities
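For logistic loss, raw rule scores are log odds of the positive class, so probabilities can be recovered with the sigmoid function. A hypothetical helper (`proba_from_scores` is not part of realkd) sketching how the two-column array can be built:

```python
import numpy as np

def proba_from_scores(scores):
    # hypothetical helper: map raw log-odds scores to columns [P(y=-1), P(y=+1)]
    p_pos = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    return np.column_stack([1.0 - p_pos, p_pos])

probs = proba_from_scores([-1.4248, 0.0, 1.7471])
```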
-
class realkd.rules.GradientBoostingRuleEnsemble(max_rules=3, loss=<class 'realkd.rules.SquaredLoss'>, members=[], reg=1.0, max_col_attr=10, discretization=<function qcut>, offset_rule=False, method='bestboundfirst', apx=1.0, max_depth=None)¶
Additive rule ensemble fitted by gradient boosting.
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = GradientBoostingRuleEnsemble(members=[r1, r2, r3, r4])
>>> len(ensemble)
4
>>> ensemble[2]
+0.3000 if Sex==female
>>> ensemble[:2]
-0.5000 if True
+1.0000 if Sex==female
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> survived = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> re = GradientBoostingRuleEnsemble(loss=logistic_loss)
>>> re.fit(titanic, survived.replace(0, -1), verbose=0)
-1.4248 if Pclass>=2 & Sex==male
+1.7471 if Pclass<=2 & Sex==female
+2.5598 if Age<=19.0 & Fare>=7.8542 & Parch>=1.0 & Sex==male & SibSp<=1.0
# Performance with bestboundfirst:
# found optimum after inspecting 443 nodes: -1.4248 if Pclass>=2 & Sex==male
# found optimum after inspecting 786 nodes: +1.7471 if Pclass<=2 & Sex==female
# found optimum after inspecting 6564 nodes
>>> re_with_offset = GradientBoostingRuleEnsemble(max_rules=2, loss='logistic', offset_rule=True)
>>> re_with_offset.fit(titanic, survived.replace(0, -1))
-0.4626 if True
+2.3076 if Pclass<=2 & Sex==female
>>> greedy = GradientBoostingRuleEnsemble(max_rules=3, loss='logistic', method='greedy')
>>> greedy.fit(titanic, survived.replace(0, -1))
-1.4248 if Pclass>=2 & Sex==male
+1.7471 if Pclass<=2 & Sex==female
-0.4225 if Parch<=1.0 & Sex==male
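Conceptually, each boosting round fits one new rule against the gradient and Hessian statistics of the current ensemble scores and adds it with its optimal Newton weight. A self-contained toy sketch for squared loss with simple threshold queries on a single numeric feature (not the realkd search, which optimises conjunctive queries over all columns):

```python
import numpy as np

def boost_rules(x, y, max_rules=3, reg=1.0):
    """Toy gradient boosting of threshold rules 'x <= t' on one numeric feature."""
    rules, scores = [], np.zeros_like(y, dtype=float)
    for _ in range(max_rules):
        g = 2 * (scores - y)          # squared-loss gradient w.r.t. scores
        h = np.full(len(y), 2.0)      # squared-loss Hessian (constant 2)
        best = None
        for t in np.unique(x):        # candidate queries q(x): x <= t
            covered = x <= t
            G, H = g[covered].sum(), h[covered].sum()
            w = -G / (H + reg)        # optimal Newton weight for the covered set
            gain = 0.5 * G * G / (H + reg)
            if best is None or gain > best[0]:
                best = (gain, t, w)
        _, t, w = best
        rules.append((t, w))
        scores = scores + np.where(x <= t, w, 0.0)
    return rules, scores

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
rules, scores = boost_rules(x, y, max_rules=3, reg=0.0)
```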
-
__call__(x)¶
Computes combined prediction scores using all ensemble members.
- Parameters
x – dataframe to make predictions for
- Returns
vector of prediction scores for all rows in x
-
consolidated(inplace=False)¶
Consolidates rules with equivalent queries into one.
- Parameters
inplace – whether to update self or to create new ensemble
- Returns
reference to consolidated ensemble (self if inplace=True)
For example:
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = GradientBoostingRuleEnsemble(members=[r1, r2, r3, r4])
>>> ensemble.consolidated(inplace=True)
-0.7000 if True
+1.3000 if Sex==female
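The consolidation step amounts to summing the weights of rules that share an equivalent query, keeping the order of first appearance. A minimal sketch over hypothetical (query, weight) pairs (not the realkd implementation):

```python
def consolidate(members):
    # merge rules with identical queries by summing their weights;
    # dict preserves the order in which queries first appear (Python 3.7+)
    merged = {}
    for query, weight in members:
        merged[query] = merged.get(query, 0.0) + weight
    return list(merged.items())

# mirrors the doctest above: two 'True' rules and two 'Sex==female' rules
rules = [('True', -0.5), ('Sex==female', 1.0), ('Sex==female', 0.3), ('True', -0.2)]
```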