Rules

Loss functions and models for rule learning.

Overview

realkd.rules.AdditiveRuleEnsemble([members])

Rules ensemble that combines scores of its member rules additively to form predictions.

realkd.rules.logistic_loss

Logistic loss function l(y, s) = log2(1 + exp(-ys)).

realkd.rules.Rule([q, y, z])

Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.

realkd.rules.RuleBoostingEstimator([…])

Additive rule ensemble fitted by boosting.

realkd.rules.squared_loss

Squared loss function l(y, s) = (y-s)^2.

realkd.rules.XGBRuleEstimator([loss, reg, …])

Fits a rule based on first and second loss derivatives of some prior prediction values.

Details

realkd.rules.logistic_loss = logistic_loss

Logistic loss function l(y, s) = log2(1 + exp(-ys)).

Function assumes that positive and negative values are encoded as +1 and -1, respectively.

>>> from numpy import array
>>> from realkd.rules import logistic_loss
>>> y = array([1, -1, 1, -1])
>>> s = array([0, 0, 10, 10])
>>> logistic_loss(y, s)
array([1.00000000e+00, 1.00000000e+00, 6.54967668e-05, 1.44270159e+01])
>>> logistic_loss.g(y, s)
array([-5.00000000e-01,  5.00000000e-01, -4.53978687e-05,  9.99954602e-01])
>>> logistic_loss.h(y, s)
array([2.50000000e-01, 2.50000000e-01, 4.53958077e-05, 4.53958077e-05])
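
For reference, the printed values are consistent with the following numpy sketch (an illustration, not realkd's implementation). Note that while the loss itself uses the base-2 logarithm, the printed derivative values match the natural-log form, expressed below via the sigmoid function:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_loss_sketch(y, s):
    # l(y, s) = log2(1 + exp(-ys))
    return np.log2(1.0 + np.exp(-y * s))

def logistic_loss_g_sketch(y, s):
    # first derivative of log(1 + exp(-ys)) w.r.t. s: -y * sigmoid(-ys)
    return -y * sigmoid(-y * s)

def logistic_loss_h_sketch(y, s):
    # second derivative w.r.t. s: sigmoid(ys) * sigmoid(-ys)
    return sigmoid(y * s) * sigmoid(-y * s)
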
realkd.rules.loss_functions = {'logistic': logistic_loss, 'logistic_loss': logistic_loss, 'squared': squared_loss, 'squared_loss': squared_loss}

Dictionary of available loss functions with keys corresponding to their string representations.
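
For example, the string identifiers accepted by the estimators below resolve through this dictionary:

>>> from realkd.rules import logistic_loss, loss_functions
>>> loss_functions['logistic'] is logistic_loss
True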

realkd.rules.squared_loss = squared_loss

Squared loss function l(y, s) = (y-s)^2.

>>> from numpy import array
>>> from realkd.rules import squared_loss
>>> squared_loss
squared_loss
>>> y = array([-2, 0, 3])
>>> s = array([0, 1, 2])
>>> squared_loss(y, s)
array([4, 1, 1])
>>> squared_loss.g(y, s)
array([ 4,  2, -2])
>>> squared_loss.h(y, s)
array([2, 2, 2])
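
Again, a minimal numpy sketch consistent with these values (an illustration, not realkd's implementation):

import numpy as np

def squared_loss_sketch(y, s):
    # l(y, s) = (y - s)^2
    return (y - s) ** 2

def squared_loss_g_sketch(y, s):
    # first derivative w.r.t. s: 2(s - y)
    return 2 * (s - y)

def squared_loss_h_sketch(y, s):
    # second derivative w.r.t. s: constant 2
    return np.full_like(s, 2)
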
class realkd.rules.AdditiveRuleEnsemble(members=[])

Rules ensemble that combines scores of its member rules additively to form predictions.

While the order of rules does not influence predictions, it is important for indexing and slicing, which provide convenient access to individual ensemble members and to modified ensembles.

For example:

>>> from realkd.logic import Conjunction, Constraint, KeyValueProposition
>>> from realkd.rules import AdditiveRuleEnsemble, Rule
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = AdditiveRuleEnsemble(members=[r1, r2, r3, r4])
>>> len(ensemble)
4
>>> ensemble[2]
   +0.3000 if Sex==female
>>> ensemble[:2]
   -0.5000 if True
   +1.0000 if Sex==female
Parameters

members (List[Rule]) – the individual rules that make up the ensemble

__call__(x)

Computes combined prediction scores using all ensemble members.

Parameters

x (DataFrame) – input data

Returns

array of prediction scores (one for each row in x)
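
Conceptually, the combined score is simply the sum of the member rule scores. A minimal sketch, assuming each member is callable on a DataFrame as documented for Rule (an illustration, not the library implementation):

import numpy as np

def ensemble_scores(members, x):
    # additive combination: sum of all member rule scores (zero for an empty ensemble)
    scores = np.zeros(len(x))
    for rule in members:
        scores = scores + rule(x)
    return scores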

append(rule)

Adds a rule to the ensemble.

Parameters

rule (Rule) – the rule to be added

Returns

self

consolidated(inplace=False)

Consolidates rules with equivalent queries into one.

Parameters

inplace (bool) – whether to update self or to create new ensemble

Returns

reference to consolidated ensemble (self if inplace=True)

For example:

>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = AdditiveRuleEnsemble([r1, r2, r3, r4])
>>> ensemble.consolidated(inplace=True)
   -0.7000 if True
   +1.3000 if Sex==female
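
As the example shows, consolidation sums the weights of rules with equivalent queries (-0.5 - 0.2 = -0.7 for the empty query and 1.0 + 0.3 = 1.3 for Sex==female). A sketch of this grouping, assuming rules expose their constructor arguments as attributes q, y, and z (an assumption for illustration):

def consolidated_rules(rules):
    merged = {}
    for r in rules:
        key = str(r.q)  # group rules by their query
        if key in merged:
            prev = merged[key]
            merged[key] = Rule(prev.q, prev.y + r.y, prev.z + r.z)
        else:
            merged[key] = r
    return list(merged.values())
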
size()

Computes the total size of the ensemble.

Currently, this is defined as the number of rules (length of the ensemble) plus the number of elementary conditions in all rule queries.

In the future, this is subject to change to a more general notion of size (taking into account the possibly greater number of parameters of more complex rules).

Returns

size of ensemble as defined above
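
A sketch of this definition, assuming the number of elementary conditions of a query is available via len(r.q) (an assumption for illustration):

def ensemble_size(rules):
    # number of rules plus the number of elementary conditions over all queries
    return len(rules) + sum(len(r.q) for r in rules)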

class realkd.rules.Rule(q=True, y=0.0, z=0.0)

Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.

>>> import pandas as pd
>>> from realkd.logic import Constraint, KeyValueProposition
>>> from realkd.rules import Rule
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> titanic[['Name', 'Sex', 'Survived']].iloc[0]
Name        Braund, Mr. Owen Harris
Sex                            male
Survived                          0
Name: 0, dtype: object
>>> titanic[['Name', 'Sex', 'Survived']].iloc[1]
Name        Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                    female
Survived                                                    1
Name: 1, dtype: object
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r = Rule(female, 1.0, 0.0)
>>> r(titanic.iloc[0]), r(titanic.iloc[1])
(0.0, 1.0)
>>> empty = Rule()
>>> empty
   +0.0000 if True
Parameters
  • q (Conjunction) – rule query (antecedent/condition)

  • y (float) – prediction value if query satisfied

  • z (float) – prediction value if query not satisfied

__call__(x)

Computes the rule's prediction scores for the given input data, i.e., y for rows that satisfy the query q and z for all other rows.

The interpretation of these scores depends on the loss function used for fitting; for instance, for logistic loss they represent log odds of the positive class.

Parameters

x (DataFrame) – input data

Returns

array of prediction scores (one for each row in x)
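
A minimal vectorized sketch of the semantics "y if q(x) else z", assuming the query evaluates to one boolean per row (an illustration, not the library implementation):

import numpy as np

def rule_scores(q, y, z, x):
    # evaluate the query per row, then select y or z accordingly
    satisfied = np.asarray(q(x), dtype=bool)
    return np.where(satisfied, y, z)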

class realkd.rules.RuleBoostingEstimator(num_rules=3, base_learner=XGBRuleEstimator(reg=1.0, loss=squared), verbose=False)

Additive rule ensemble fitted by boosting.

That is, rules are fitted iteratively by one or more base learners until the desired number of rules has been learned. In each iteration, the base learner fits a rule to the training data, taking into account the prediction scores of the already fixed part of the ensemble.

Therefore, base learners need to provide a fit method that can take into account prior predictions (see XGBRuleEstimator.fit()).
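
A minimal sketch of this boosting loop for a single base learner, using only the documented fit(data, target, scores) protocol and the fitted rule_ attribute (an illustration, not the library implementation):

def boost(base_learner, num_rules, data, target):
    ensemble = AdditiveRuleEnsemble([])
    for _ in range(num_rules):
        prior_scores = ensemble(data)  # predictions of the already fixed part
        rule = base_learner.fit(data, target, scores=prior_scores).rule_
        ensemble.append(rule)
    return ensemble

With a sequence of base learners, iteration i would instead use base_learner[min(i, len(base_learner) - 1)], so that the last learner is reused once the sequence is exhausted.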

>>> import pandas as pd
>>> from sklearn.metrics import roc_auc_score
>>> from realkd.rules import RuleBoostingEstimator, XGBRuleEstimator, logistic_loss
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> survived = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> re = RuleBoostingEstimator(base_learner=XGBRuleEstimator(loss=logistic_loss))
>>> re.fit(titanic, survived.replace(0, -1)).rules_
   -1.4248 if Pclass>=2 & Sex==male
   +1.7471 if Pclass<=2 & Sex==female
   +2.5598 if Age<=19.0 & Fare>=7.8542 & Parch>=1.0 & Sex==male & SibSp<=1.0

Multiple base learners can be specified and are used sequentially. The last base learner is used as many times as necessary to learn the desired number of rules. This mechanism can, e.g., be used to fit an “offset rule”:

>>> from realkd.logic import Conjunction
>>> re_with_offset = RuleBoostingEstimator(num_rules=2, base_learner=[XGBRuleEstimator(loss='logistic', query=Conjunction([])), XGBRuleEstimator(loss='logistic')])
>>> re_with_offset.fit(titanic, survived.replace(0, -1)).rules_
   -0.4626 if True
   +2.3076 if Pclass<=2 & Sex==female
>>> greedy = RuleBoostingEstimator(num_rules=3, base_learner=XGBRuleEstimator(loss='logistic', search='greedy'))
>>> greedy.fit(titanic, survived.replace(0, -1)).rules_ 
   -1.4248 if Pclass>=2 & Sex==male
   +1.7471 if Pclass<=2 & Sex==female
   -0.4225 if Parch<=1.0 & Sex==male
>>> roc_auc_score(survived, greedy.rules_(titanic))
0.8321136782454011
>>> opt = RuleBoostingEstimator(num_rules=3, base_learner=XGBRuleEstimator(loss='logistic', search='exhaustive'))
>>> opt.fit(titanic, survived.replace(0, -1)).rules_ 
   -1.4248 if Pclass>=2 & Sex==male
   +1.7471 if Pclass<=2 & Sex==female
   +2.5598 if Age<=19.0 & Fare>=7.8542 & Parch>=1.0 & Sex==male & SibSp<=1.0
>>> roc_auc_score(survived, opt.rules_(titanic)) 
0.8490530363553084

The fitted model can be used to generate predictions for new input data.

>>> columns = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
>>> new_passengers = [[2, 'male', 32, 1, 0, 10, 'Q'], [2, 'female', 62, 0, 0, 7, 'S']]
>>> new_data = pd.DataFrame(new_passengers, columns=columns)
>>> re.predict(new_data)
array([-1.,  1.])
>>> re.predict_proba(new_data)
array([[0.80609552, 0.19390448],
       [0.14841001, 0.85158999]])
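
For logistic loss, these probabilities are the sigmoid of the decision scores. The values above are reproduced by the following sketch (an illustration, not the library implementation):

import numpy as np

def proba_from_scores(scores):
    # P(y = +1 | x) = 1 / (1 + exp(-score)); columns ordered as (negative, positive) class
    p_pos = 1.0 / (1.0 + np.exp(-scores))
    return np.column_stack([1.0 - p_pos, p_pos])
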
Parameters
  • num_rules (int) – the desired number of ensemble members

  • base_learner (Estimator|Sequence[Estimator]) – the base learner(s) to be used in each iteration (the last base learner is used as many times as necessary to fit the desired number of rules)

decision_function(x)

Computes combined prediction scores using all ensemble members.

Parameters

x (DataFrame) – input data

Returns

array of prediction scores (one for each row in x)

class realkd.rules.XGBRuleEstimator(loss='squared', reg=1.0, search='exhaustive', search_params={'apx': 1.0, 'discretization': <function qcut>, 'max_col_attr': 10, 'max_depth': None, 'order': 'bestboundfirst'}, query=None)

Fits a rule based on first and second loss derivatives of some prior prediction values.

In more detail, given some prior prediction values \(f(x)\) and a twice differentiable loss function \(l(y,f(x))\), a rule \(r(x)=wq(x)\) is fitted by finding a binary query \(q\) that maximizes the objective function

\[\mathrm{obj}(q) = \frac{\left( \sum_{i \in I(q)} g_i \right )^2}{2n \left(\lambda + \sum_{i \in I(q)} h_i \right)}\]

and finding the optimal weight as

\[w = -\frac{\sum_{i \in I(q)} g_i}{\lambda + \sum_{i \in I(q)} h_i} \enspace .\]

Here, \(I(q)\) denotes the indices of training examples selected by \(q\) and

\[g_i=\frac{\mathrm{d} l(y_i, y)}{\mathrm{d}y}\Bigr|_{\substack{y=f(x_i)}} \enspace , \quad h_i=\frac{\mathrm{d}^2 l(y_i, y)}{\mathrm{d}y^2}\Bigr|_{\substack{y=f(x_i)}}\]

refer to the first and second order gradient statistics of the prior prediction values.
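
A direct numpy translation of these two formulas, where selected is a boolean mask representing \(I(q)\) (an illustration, not the library implementation):

import numpy as np

def weight_and_objective(g, h, selected, reg):
    # gradient statistics summed over the examples selected by the query
    g_sum = g[selected].sum()
    h_sum = h[selected].sum()
    obj = g_sum**2 / (2 * len(g) * (reg + h_sum))  # objective obj(q)
    w = -g_sum / (reg + h_sum)                     # optimal weight w
    return w, obj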

>>> import pandas as pd
>>> from realkd.rules import XGBRuleEstimator
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> target = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> opt = XGBRuleEstimator(reg=0.0)
>>> opt.fit(titanic, target).rule_
   +0.7420 if Sex==female
>>> best_logistic = XGBRuleEstimator(loss='logistic')
>>> best_logistic.fit(titanic, target.replace(0, -1)).rule_
   -1.4248 if Pclass>=2 & Sex==male
>>> best_logistic.predict(titanic) 
array([-1.,  1.,  1.,  1., ...,  1.,  1., -1.])
>>> greedy = XGBRuleEstimator(loss='logistic', reg=1.0, search='greedy')
>>> greedy.fit(titanic, target.replace(0, -1)).rule_
   -1.4248 if Pclass>=2 & Sex==male
Parameters
  • loss (str|callable) – loss function, either specified via string identifier (e.g., 'squared' for regression or 'logistic' for classification) or directly as a callable loss function with defined first and second derivatives (see loss_functions)

  • reg (float) – the regularization parameter \(\lambda\)

  • search (str|type) – search method either specified via string identifier (e.g., 'greedy' or 'exhaustive') or directly as search type (see realkd.search.search_methods())

  • search_params (dict) – parameters to apply to the discretization (when creating a binary search context from a dataframe via from_df()) as well as to the actual search method (specified by search). See realkd.search.

decision_function(x)

Predicts scores for the given input data based on the loss function.

For instance, for logistic loss, the returned scores are the log odds of the positive class.

Parameters

x (DataFrame) – input data

Returns

array of prediction scores (one for each row in x)

fit(data, target, scores=None, verbose=False)

Fits a rule to provide the best loss reduction on the given data (where the baseline prediction scores are either given explicitly through the scores parameter or are assumed to be 0).

Parameters
  • data – pandas DataFrame containing only the feature columns

  • target – pandas Series containing the target values

  • scores – prior prediction scores according to which the reduction in prediction loss is optimised

  • verbose – whether to print status updates and a summary of the query search

Returns

self

predict(data)

Generates predictions for input data.

Parameters

data – pandas DataFrame with covariates for which to make predictions

Returns

array of predictions

predict_proba(data)

Generates probability predictions for input data.

This method is only supported for loss functions that admit a probabilistic interpretation, such as logistic loss.

Parameters

data – pandas dataframe with data to predict probabilities for

Returns

array of probabilities (shape according to number of classes)