Rules

Loss functions and models for rule learning.

Overview

realkd.rules.GradientBoostingRuleEnsemble([…])

Additive rule ensemble fitted by gradient boosting.

realkd.rules.logistic_loss

Logistic loss function l(y, s) = log2(1 + exp(-ys)).

realkd.rules.Rule([q, y, z, loss, reg, …])

Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.

realkd.rules.squared_loss

Squared loss function l(y, s) = (y-s)^2.

Details

realkd.rules.logistic_loss = logistic_loss

Logistic loss function l(y, s) = log2(1 + exp(-ys)).

Function assumes that positive and negative values are encoded as +1 and -1, respectively.

>>> from numpy import array
>>> y = array([1, -1, 1, -1])
>>> s = array([0, 0, 10, 10])
>>> logistic_loss(y, s)
array([1.00000000e+00, 1.00000000e+00, 6.54967668e-05, 1.44270159e+01])
>>> logistic_loss.g(y, s)
array([-5.00000000e-01,  5.00000000e-01, -4.53978687e-05,  9.99954602e-01])
>>> logistic_loss.h(y, s)
array([2.50000000e-01, 2.50000000e-01, 4.53958077e-05, 4.53958077e-05])
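The doctest values above can be reproduced with a short NumPy sketch. Note that while the loss itself is measured in bits (base 2), the g and h values shown match the natural-log sigmoid derivatives; that is an observation from the outputs above, not a statement about realkd's internals:

```python
import numpy as np

def log_loss(y, s):
    # l(y, s) = log2(1 + exp(-y*s)) with labels encoded as +1/-1
    return np.log2(1 + np.exp(-y * s))

def log_loss_g(y, s):
    # first derivative w.r.t. s of ln(1 + exp(-y*s)): -y * sigma(-y*s)
    return -y / (1 + np.exp(y * s))

def log_loss_h(y, s):
    # second derivative w.r.t. s: sigma(y*s) * sigma(-y*s)
    sig = 1 / (1 + np.exp(-y * s))
    return sig * (1 - sig)

y = np.array([1, -1, 1, -1])
s = np.array([0, 0, 10, 10])
print(log_loss(y, s))    # matches the loss array in the doctest above
```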

realkd.rules.squared_loss = squared_loss

Squared loss function l(y, s) = (y-s)^2.

>>> squared_loss
squared_loss
>>> y = array([-2, 0, 3])
>>> s = array([0, 1, 2])
>>> squared_loss(y, s)
array([4, 1, 1])
>>> squared_loss.g(y, s)
array([ 4,  2, -2])
>>> squared_loss.h(y, s)
array([2, 2, 2])
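The squared-loss derivatives can be checked the same way: g(y, s) = 2(s - y) and a constant h = 2 (a minimal sketch mirroring the doctest, not the library's code):

```python
import numpy as np

def sq_loss(y, s):
    return (y - s) ** 2

def sq_loss_g(y, s):
    # gradient of the loss w.r.t. the prediction score s
    return 2 * (s - y)

def sq_loss_h(y, s):
    # the second derivative is constant
    return np.full(np.broadcast(y, s).shape, 2)

y = np.array([-2, 0, 3])
s = np.array([0, 1, 2])
print(sq_loss(y, s), sq_loss_g(y, s), sq_loss_h(y, s))
# [4 1 1] [ 4  2 -2] [2 2 2]
```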
class realkd.rules.Rule(q=True, y=0.0, z=0.0, loss=<class 'realkd.rules.SquaredLoss'>, reg=1.0, max_col_attr=10, discretization=<function qcut>, method='bestboundfirst', apx=1.0, max_depth=None)

Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.

>>> from realkd.logic import Constraint, KeyValueProposition
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> titanic[['Name', 'Sex', 'Survived']].iloc[0]
Name        Braund, Mr. Owen Harris
Sex                            male
Survived                          0
Name: 0, dtype: object
>>> titanic[['Name', 'Sex', 'Survived']].iloc[1]
Name        Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                    female
Survived                                                    1
Name: 1, dtype: object
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r = Rule(female, 1.0, 0.0)
>>> r(titanic.iloc[0]), r(titanic.iloc[1])
(0.0, 1.0)
>>> target = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> opt = Rule(reg=0.0)
>>> opt.fit(titanic, target)
   +0.7420 if Sex==female
>>> best_logistic = Rule(loss='logistic')
>>> best_logistic.fit(titanic, target.replace(0, -1))
   -1.4248 if Pclass>=2 & Sex==male
>>> best_logistic.predict(titanic) 
array([-1.,  1.,  1.,  1., ...,  1.,  1., -1.])
>>> greedy = Rule(loss='logistic', reg=1.0, method='greedy')
>>> greedy.fit(titanic, target.replace(0, -1))
   -1.4248 if Pclass>=2 & Sex==male
>>> empty = Rule()
>>> empty
   +0.0000 if True
Parameters
  • q – binary query function of the rule (default: the trivially true query)

  • y – prediction value when the query is satisfied

  • z – prediction value when the query is not satisfied

  • loss – loss function to optimise (e.g., 'squared' or 'logistic')

  • reg – regularization parameter for the rule optimisation

  • max_col_attr – maximum number of attribute values per column used for discretisation

  • discretization – discretization function for numeric columns (default: pandas qcut)

  • method – query search method ('bestboundfirst' or 'greedy')

  • apx – approximation ratio (ignored when method is 'greedy')

__call__(x)

Predicts the score for the input data based on the loss function.

For instance, for logistic loss the returned score is the log odds of the positive class.

Parameters

x – input data to predict scores for

Returns

array of prediction scores

fit(data, target, scores=None, verbose=False)

Fits the rule to provide the best loss reduction on the given data (where the baseline prediction scores are either given explicitly through the scores parameter or are assumed to be 0).

Parameters
  • data – pandas DataFrame containing only the feature columns

  • target – pandas Series containing the target values

  • scores – prior prediction scores according to which the reduction in prediction loss is optimised

  • verbose – whether to print status update and summary of query search

Returns

self
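In second-order boosting, the optimal value for a rule with a fixed query is typically -Σg / (Σh + reg), taken over the rows the query selects. The sketch below illustrates that formula under this assumption; realkd's actual internals may differ:

```python
import numpy as np

def optimal_weight(g, h, selected, reg=1.0):
    # second-order optimal consequent value over the selected rows
    return -g[selected].sum() / (h[selected].sum() + reg)

# for squared loss at all-zero scores, g = -2y and h = 2, so with
# reg=0 the optimal value is just the mean target of the selection
y = np.array([1.0, 0.0, 1.0, 1.0])
sel = np.array([True, True, False, True])
g, h = -2 * y, np.full_like(y, 2.0)
print(optimal_weight(g, h, sel, reg=0.0))  # == y[sel].mean()
```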

predict_proba(data)

Generates probability predictions for the given data.

Parameters

data – pandas dataframe with data to predict probabilities for

Returns

two-dimensional array of probabilities
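Under logistic loss the score returned by __call__ is a log-odds, so the two-column probability array follows from the sigmoid. A sketch of that conversion (score_to_proba is a hypothetical helper name, not part of realkd):

```python
import numpy as np

def score_to_proba(scores):
    # map log-odds scores to rows of [P(y=-1), P(y=+1)]
    p = 1 / (1 + np.exp(-np.asarray(scores, dtype=float)))
    return np.column_stack([1 - p, p])

print(score_to_proba([0.0]))  # [[0.5 0.5]]
```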

class realkd.rules.GradientBoostingRuleEnsemble(max_rules=3, loss=<class 'realkd.rules.SquaredLoss'>, members=[], reg=1.0, max_col_attr=10, discretization=<function qcut>, offset_rule=False, method='bestboundfirst', apx=1.0, max_depth=None)

Additive rule ensemble fitted by gradient boosting.

>>> from realkd.logic import Conjunction, Constraint, KeyValueProposition
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = GradientBoostingRuleEnsemble(members=[r1, r2, r3, r4])
>>> len(ensemble)
4
>>> ensemble[2]
   +0.3000 if Sex==female
>>> ensemble[:2]
   -0.5000 if True
   +1.0000 if Sex==female
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> survived = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> re = GradientBoostingRuleEnsemble(loss=logistic_loss)
>>> re.fit(titanic, survived.replace(0, -1), verbose=0) 
   -1.4248 if Pclass>=2 & Sex==male
   +1.7471 if Pclass<=2 & Sex==female
   +2.5598 if Age<=19.0 & Fare>=7.8542 & Parch>=1.0 & Sex==male & SibSp<=1.0

Performance with bestboundfirst:

Found optimum after inspecting 443 nodes: -1.4248 if Pclass>=2 & Sex==male
Found optimum after inspecting 786 nodes: +1.7471 if Pclass<=2 & Sex==female
Found optimum after inspecting 6564 nodes

>>> re_with_offset = GradientBoostingRuleEnsemble(max_rules=2, loss='logistic', offset_rule=True)
>>> re_with_offset.fit(titanic, survived.replace(0, -1))
   -0.4626 if True
   +2.3076 if Pclass<=2 & Sex==female
>>> greedy = GradientBoostingRuleEnsemble(max_rules=3, loss='logistic', method='greedy')
>>> greedy.fit(titanic, survived.replace(0, -1)) 
   -1.4248 if Pclass>=2 & Sex==male
   +1.7471 if Pclass<=2 & Sex==female
   -0.4225 if Parch<=1.0 & Sex==male
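Schematically, gradient boosting fits one rule per round against the current prediction scores and adds its output to them. An illustrative loop with a pluggable fit_rule callback (not realkd's actual implementation):

```python
import numpy as np

def boost(fit_rule, x, y, max_rules=3):
    # fit_rule(x, y, scores) must return a callable rule that best
    # reduces the loss relative to the current prediction scores
    scores = np.zeros(len(y))
    members = []
    for _ in range(max_rules):
        rule = fit_rule(x, y, scores)
        scores = scores + rule(x)  # stagewise additive update
        members.append(rule)
    return members
```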
__call__(x)

Computes combined prediction scores using all ensemble members.

Parameters

x – dataframe to make predictions for

Returns

vector of prediction scores for all rows in x
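The combined score is simply the sum of the member rules' prediction vectors, as in this toy sketch:

```python
import numpy as np

def ensemble_score(members, x):
    # additive ensemble: sum the prediction vector of every member
    return np.sum([rule(x) for rule in members], axis=0)

# toy rules over a list of 'Sex' values
r1 = lambda rows: np.full(len(rows), -0.5)
r2 = lambda rows: np.array([1.0 if v == 'female' else 0.0 for v in rows])
print(ensemble_score([r1, r2], ['male', 'female']))  # [-0.5  0.5]
```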

consolidated(inplace=False)

Consolidates rules with equivalent queries into one.

Parameters

inplace – whether to update self or to create new ensemble

Returns

reference to consolidated ensemble (self if inplace=True)

For example:

>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = GradientBoostingRuleEnsemble(members=[r1, r2, r3, r4])
>>> ensemble.consolidated(inplace=True)
   -0.7000 if True
   +1.3000 if Sex==female
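The consolidation step amounts to summing the weights of rules whose queries are equivalent, keeping first-occurrence order. A standalone sketch over (query, weight) pairs (not realkd's actual code):

```python
def consolidate(rules):
    # merge rules with identical queries by summing their weights;
    # dicts preserve insertion order, so rule order is kept
    merged = {}
    for query, weight in rules:
        merged[query] = merged.get(query, 0.0) + weight
    return list(merged.items())

rules = [('True', -0.5), ('Sex==female', 1.0),
         ('Sex==female', 0.3), ('True', -0.2)]
for q, w in consolidate(rules):
    print(f'{w:+.4f} if {q}')
# -0.7000 if True
# +1.3000 if Sex==female
```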