Rules
Loss functions and models for rule learning.
Overview
AdditiveRuleEnsemble | Rules ensemble that combines scores of its member rules additively to form predictions.
logistic_loss | Logistic loss function l(y, s) = log2(1 + exp(-ys)).
Rule | Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.
RuleBoostingEstimator | Additive rule ensemble fitted by boosting.
squared_loss | Squared loss function l(y, s) = (y-s)^2.
XGBRuleEstimator | Fits a rule based on first and second loss derivatives of some prior prediction values.
Details
-
realkd.rules.logistic_loss = logistic_loss
Logistic loss function l(y, s) = log2(1 + exp(-ys)).
Function assumes that positive and negative values are encoded as +1 and -1, respectively.
>>> y = array([1, -1, 1, -1])
>>> s = array([0, 0, 10, 10])
>>> logistic_loss(y, s)
array([1.00000000e+00, 1.00000000e+00, 6.54967668e-05, 1.44270159e+01])
>>> logistic_loss.g(y, s)
array([-5.00000000e-01, 5.00000000e-01, -4.53978687e-05, 9.99954602e-01])
>>> logistic_loss.h(y, s)
array([2.50000000e-01, 2.50000000e-01, 4.53958077e-05, 4.53958077e-05])
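The values in the example above can be reproduced with a small standalone sketch. It assumes, judging only from the outputs shown, that the loss value uses the base-2 logarithm while g and h are the first and second derivatives with respect to s of the natural-log form ln(1 + exp(-ys)). The function names below are illustrative and not part of realkd.

```python
import math


def logistic_loss_value(y, s):
    # Loss value as documented: log2(1 + exp(-y*s)), with y in {+1, -1}.
    return math.log2(1.0 + math.exp(-y * s))


def logistic_g(y, s):
    # Assumed first derivative w.r.t. s of ln(1 + exp(-y*s)).
    return -y / (1.0 + math.exp(y * s))


def logistic_h(y, s):
    # Assumed second derivative w.r.t. s; for y in {+1, -1} it does not depend on y.
    p = 1.0 / (1.0 + math.exp(-s))
    return p * (1.0 - p)


ys = [1, -1, 1, -1]
ss = [0, 0, 10, 10]
losses = [logistic_loss_value(y, s) for y, s in zip(ys, ss)]
grads = [logistic_g(y, s) for y, s in zip(ys, ss)]
hessians = [logistic_h(y, s) for y, s in zip(ys, ss)]
```

Under these assumptions, a confidently wrong prediction (y = -1, s = 10) yields a large loss and a gradient close to 1, whereas a confidently correct one (y = 1, s = 10) yields values close to 0.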
-
realkd.rules.loss_functions = {'logistic': logistic_loss, 'logistic_loss': logistic_loss, 'squared': squared_loss, 'squared_loss': squared_loss}
Dictionary of available loss functions with keys corresponding to their string representations.
-
realkd.rules.squared_loss = squared_loss
Squared loss function l(y, s) = (y-s)^2.
>>> squared_loss
squared_loss
>>> y = array([-2, 0, 3])
>>> s = array([0, 1, 2])
>>> squared_loss(y, s)
array([4, 1, 1])
>>> squared_loss.g(y, s)
array([ 4, 2, -2])
>>> squared_loss.h(y, s)
array([2, 2, 2])
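The derivatives in the example above are consistent with g = -2(y - s) and h = 2, taken with respect to the score s. The following standalone sketch checks this numerically with a central finite difference. The helper names are illustrative and not part of realkd.

```python
def squared_loss_value(y, s):
    # Loss value as documented: (y - s)^2.
    return (y - s) ** 2


def squared_g(y, s):
    # First derivative of (y - s)^2 with respect to s.
    return -2.0 * (y - s)


def squared_h(y, s):
    # Second derivative of (y - s)^2 with respect to s (constant).
    return 2.0


def numeric_derivative(f, s, eps=1e-5):
    # Central finite difference approximation of f'(s).
    return (f(s + eps) - f(s - eps)) / (2.0 * eps)


ys = [-2, 0, 3]
ss = [0, 1, 2]
losses = [squared_loss_value(y, s) for y, s in zip(ys, ss)]
grads = [squared_g(y, s) for y, s in zip(ys, ss)]
numeric_grads = [
    numeric_derivative(lambda t, y=y: squared_loss_value(y, t), s)
    for y, s in zip(ys, ss)
]
```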
-
class realkd.rules.AdditiveRuleEnsemble(members=[])
Rules ensemble that combines scores of its member rules additively to form predictions.
While the order of rules does not influence predictions, it matters for indexing and slicing, which provide convenient access to individual ensemble members and to modified ensembles.
For example:
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = AdditiveRuleEnsemble(members=[r1, r2, r3, r4])
>>> len(ensemble)
4
>>> ensemble[2]
+0.3000 if Sex==female
>>> ensemble[:2]
-0.5000 if True
+1.0000 if Sex==female
- Parameters
members (List[Rule]) – the individual rules that make up the ensemble
-
__call__(x)
Computes combined prediction scores using all ensemble members.
- Parameters
x (DataFrame) – input data
- Returns
array of prediction scores (one for each row in x)
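The additive combination can be illustrated with a minimal plain-Python sketch. `SimpleRule` and `ensemble_scores` are hypothetical stand-ins, not realkd classes, and rows are modelled as dictionaries rather than DataFrame rows.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass
class SimpleRule:
    # Hypothetical stand-in for a rule "r(x) = y if q(x) else z".
    query: Callable[[Mapping], bool]
    y: float = 0.0
    z: float = 0.0

    def __call__(self, row):
        return self.y if self.query(row) else self.z


def ensemble_scores(rules, rows):
    # One score per row: the sum of the outputs of all member rules.
    return [sum(rule(row) for rule in rules) for row in rows]


def always(row):
    return True


def is_female(row):
    return row["Sex"] == "female"


rules = [
    SimpleRule(always, -0.5),
    SimpleRule(is_female, 1.0),
    SimpleRule(is_female, 0.3),
    SimpleRule(always, -0.2),
]
rows = [{"Sex": "male"}, {"Sex": "female"}]
scores = ensemble_scores(rules, rows)
first_two_scores = ensemble_scores(rules[:2], rows)
```

Because the scores are plain sums, reordering `rules` changes the result only up to floating-point rounding, while slicing (as in `rules[:2]`) yields a different, smaller ensemble.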
-
append(rule)
Adds a rule to the ensemble.
- Parameters
rule (Rule) – the rule to be added
- Returns
self
-
consolidated(inplace=False)
Consolidates rules with equivalent queries into one.
- Parameters
inplace (bool) – whether to update self or to create new ensemble
- Returns
reference to consolidated ensemble (self if inplace=True)
For example:
>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r1 = Rule(Conjunction([]), -0.5, 0.0)
>>> r2 = Rule(female, 1.0, 0.0)
>>> r3 = Rule(female, 0.3, 0.0)
>>> r4 = Rule(Conjunction([]), -0.2, 0.0)
>>> ensemble = AdditiveRuleEnsemble([r1, r2, r3, r4])
>>> ensemble.consolidated(inplace=True)
-0.7000 if True
+1.3000 if Sex==female
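The idea behind consolidation can be sketched independently of realkd. The sketch assumes that rules are plain `(query, y, z)` tuples, that a query is a frozenset of condition strings, and that two queries count as equivalent exactly when these sets are equal. All of this is a simplification for illustration, not the library's implementation.

```python
def consolidate(rules):
    # Merge rules with equal queries by adding up their y and z values.
    # The order of first occurrence is preserved (dicts keep insertion order).
    merged = {}
    for query, y, z in rules:
        if query in merged:
            old_y, old_z = merged[query]
            merged[query] = (old_y + y, old_z + z)
        else:
            merged[query] = (y, z)
    return [(query, y, z) for query, (y, z) in merged.items()]


TRUE = frozenset()                 # empty conjunction, always satisfied
FEMALE = frozenset({"Sex==female"})

rules = [(TRUE, -0.5, 0.0), (FEMALE, 1.0, 0.0), (FEMALE, 0.3, 0.0), (TRUE, -0.2, 0.0)]
consolidated = consolidate(rules)
```

Since the ensemble is additive, summing the values of rules that share a query leaves all predictions unchanged while reducing the number of members.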
-
size()
Computes the total size of the ensemble.
Currently, this is defined as the number of rules (length of the ensemble) plus the number of elementary conditions in all rule queries.
This definition may change in the future to a more general notion of size that accounts for the possibly greater number of parameters of more complex rules.
- Returns
size of ensemble as defined above
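As a worked illustration of the current definition, the sketch below counts rules plus elementary conditions. It assumes each query is given as a tuple of condition strings, which is a simplification and not realkd's query representation.

```python
def ensemble_size(queries):
    # Number of rules plus the number of elementary conditions over all queries.
    return len(queries) + sum(len(query) for query in queries)


# Queries of a three-rule ensemble, written as tuples of elementary conditions.
queries = [
    ("Pclass>=2", "Sex==male"),
    ("Pclass<=2", "Sex==female"),
    ("Age<=19.0", "Fare>=7.8542", "Parch>=1.0", "Sex==male", "SibSp<=1.0"),
]
size = ensemble_size(queries)  # 3 rules + (2 + 2 + 5) conditions
```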
-
class realkd.rules.Rule(q=True, y=0.0, z=0.0)
Represents a rule of the form “r(x) = y if q(x) else z” for some binary query function q.
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> titanic[['Name', 'Sex', 'Survived']].iloc[0]
Name        Braund, Mr. Owen Harris
Sex                            male
Survived                          0
Name: 0, dtype: object
>>> titanic[['Name', 'Sex', 'Survived']].iloc[1]
Name        Cumings, Mrs. John Bradley (Florence Briggs Th...
Sex                                                    female
Survived                                                    1
Name: 1, dtype: object

>>> female = KeyValueProposition('Sex', Constraint.equals('female'))
>>> r = Rule(female, 1.0, 0.0)
>>> r(titanic.iloc[0]), r(titanic.iloc[1])
(0.0, 1.0)

>>> empty = Rule()
>>> empty
+0.0000 if True
- Parameters
q (Conjunction) – rule query (antecedent/condition)
y (float) – prediction value if query satisfied
z (float) – prediction value if query not satisfied
-
__call__(x)
Predicts score for input data based on loss function.
For instance, for logistic loss this returns the log odds of the positive class.
- Parameters
x (DataFrame) – input data
- Returns
array of prediction scores (one for each row in x)
-
class realkd.rules.RuleBoostingEstimator(num_rules=3, base_learner=XGBRuleEstimator(reg=1.0, loss=squared), verbose=False)
Additive rule ensemble fitted by boosting.
That is, rules are fitted iteratively by one or more base learners until a desired number of rules has been learned. In each iteration, the base learner fits the training data taking into account the prediction scores of the already fixed part of the ensemble.
Therefore, base learners need to provide a fit method that can take into account prior predictions (see XGBRuleEstimator.fit()).
>>> import pandas as pd
>>> from sklearn.metrics import roc_auc_score
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> survived = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> re = RuleBoostingEstimator(base_learner=XGBRuleEstimator(loss=logistic_loss))
>>> re.fit(titanic, survived.replace(0, -1)).rules_
-1.4248 if Pclass>=2 & Sex==male
+1.7471 if Pclass<=2 & Sex==female
+2.5598 if Age<=19.0 & Fare>=7.8542 & Parch>=1.0 & Sex==male & SibSp<=1.0
Multiple base learners can be specified and are used sequentially. The last base learner is used as many times as necessary to learn the desired number of rules. This mechanism can, e.g., be used to fit an “offset rule”:
>>> re_with_offset = RuleBoostingEstimator(num_rules=2, base_learner=[XGBRuleEstimator(loss='logistic', query=Conjunction([])), XGBRuleEstimator(loss='logistic')])
>>> re_with_offset.fit(titanic, survived.replace(0, -1)).rules_
-0.4626 if True
+2.3076 if Pclass<=2 & Sex==female
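The sequential use of base learners described above can be pictured with a small helper that returns the learner used in each iteration. The function `learner_schedule` is hypothetical and only mirrors the documented behaviour. It is not taken from the realkd source.

```python
def learner_schedule(base_learners, num_rules):
    # Learner i is used in iteration i; once the list is exhausted,
    # the last learner is reused for all remaining iterations.
    last = len(base_learners) - 1
    return [base_learners[min(i, last)] for i in range(num_rules)]


schedule = learner_schedule(["offset_learner", "general_learner"], 4)
```

With two learners and four rules, the first learner fits only the first rule (for example the offset rule), and the second learner fits the remaining three.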
>>> greedy = RuleBoostingEstimator(num_rules=3, base_learner=XGBRuleEstimator(loss='logistic', search='greedy'))
>>> greedy.fit(titanic, survived.replace(0, -1)).rules_
-1.4248 if Pclass>=2 & Sex==male
+1.7471 if Pclass<=2 & Sex==female
-0.4225 if Parch<=1.0 & Sex==male
>>> roc_auc_score(survived, greedy.rules_(titanic))
0.8321136782454011
>>> opt = RuleBoostingEstimator(num_rules=3, base_learner=XGBRuleEstimator(loss='logistic', search='exhaustive'))
>>> opt.fit(titanic, survived.replace(0, -1)).rules_
-1.4248 if Pclass>=2 & Sex==male
+1.7471 if Pclass<=2 & Sex==female
+2.5598 if Age<=19.0 & Fare>=7.8542 & Parch>=1.0 & Sex==male & SibSp<=1.0
>>> roc_auc_score(survived, opt.rules_(titanic))
0.8490530363553084
The fitted model can be used to make predictions for new data.
>>> columns = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
>>> new_passengers = [[2, 'male', 32, 1, 0, 10, 'Q'], [2, 'female', 62, 0, 0, 7, 'S']]
>>> new_data = pd.DataFrame(new_passengers, columns=columns)
>>> re.predict(new_data)
array([-1., 1.])
>>> re.predict_proba(new_data)
array([[0.80609552, 0.19390448], [0.14841001, 0.85158999]])
- Parameters
num_rules (int) – the desired number of ensemble members
base_learner (Estimator|Sequence[Estimator]) – the base learner(s) to be used in each iteration (last base learner is used as many times as necessary to fit desired number of rules)
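The boosting scheme can be sketched in plain Python for the special case of squared loss without regularization. This is a toy illustration under stated assumptions, not the realkd implementation: rows are dictionaries, candidate queries are a fixed list of named predicates (a real base learner searches a much larger query space), and `fit_best_rule` and `boost` are hypothetical names.

```python
def fit_best_rule(rows, targets, scores, candidates):
    # Toy base learner: for every candidate query, fit the weight that best
    # corrects the current residuals and keep the query with the largest
    # reduction in summed squared error.
    best = None
    for name, query in candidates:
        residuals = [t - s for row, t, s in zip(rows, targets, scores) if query(row)]
        if not residuals:
            continue
        weight = sum(residuals) / len(residuals)        # mean residual on selected rows
        reduction = sum(residuals) ** 2 / len(residuals)
        if best is None or reduction > best[0]:
            best = (reduction, name, query, weight)
    return best[1], best[2], best[3]


def boost(rows, targets, candidates, num_rules):
    # Fit rules one at a time; each fit sees the scores of the rules fixed so far.
    rules = []
    scores = [0.0] * len(rows)
    for _ in range(num_rules):
        name, query, weight = fit_best_rule(rows, targets, scores, candidates)
        rules.append((name, weight))
        scores = [s + (weight if query(row) else 0.0) for s, row in zip(scores, rows)]
    return rules, scores


rows = [
    {"Sex": "female", "Pclass": 1},
    {"Sex": "female", "Pclass": 3},
    {"Sex": "male", "Pclass": 1},
    {"Sex": "male", "Pclass": 3},
]
targets = [2.0, 1.0, 1.0, 0.0]
candidates = [
    ("True", lambda row: True),
    ("Sex==female", lambda row: row["Sex"] == "female"),
    ("Pclass==1", lambda row: row["Pclass"] == 1),
    ("Sex==male", lambda row: row["Sex"] == "male"),
]
rules, scores = boost(rows, targets, candidates, num_rules=2)
```

The second rule is fitted against the residuals left by the first one, which is the sense in which each iteration takes the already fixed part of the ensemble into account.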
-
decision_function(x)
Computes combined prediction scores using all ensemble members.
- Parameters
x (DataFrame) – input data
- Returns
array of prediction scores (one for each row in x)
-
class realkd.rules.XGBRuleEstimator(loss='squared', reg=1.0, search='exhaustive', search_params={'apx': 1.0, 'discretization': <function qcut>, 'max_col_attr': 10, 'max_depth': None, 'order': 'bestboundfirst'}, query=None)
Fits a rule based on first and second loss derivatives of some prior prediction values.
In more detail, given some prior prediction values \(f(x)\) and a twice differentiable loss function \(l(y,f(x))\), a rule \(r(x)=wq(x)\) is fitted by finding a binary query \(q\) via maximizing the objective function
\[\mathrm{obj}(q) = \frac{\left( \sum_{i \in I(q)} g_i \right)^2}{2n \left(\lambda + \sum_{i \in I(q)} h_i \right)}\]
and finding the optimal weight as
\[w = -\frac{\sum_{i \in I(q)} g_i}{\lambda + \sum_{i \in I(q)} h_i} \enspace .\]
Here, \(I(q)\) denotes the indices of training examples selected by \(q\) and
\[g_i=\frac{\mathrm{d} l(y_i, y)}{\mathrm{d}y}\Bigr|_{\substack{y=f(x_i)}} \enspace , \quad h_i=\frac{\mathrm{d}^2 l(y_i, y)}{\mathrm{d}y^2}\Bigr|_{\substack{y=f(x_i)}}\]
refer to the first and second order gradient statistics of the prior prediction values.
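The two formulas can be evaluated directly once the gradient statistics are known. The sketch below does this for a fixed query. It assumes that \(n\) is the total number of training examples and that the selection \(I(q)\) is given as a list of indices. The function names are illustrative and not part of realkd.

```python
def rule_objective(g, h, selected, reg=1.0):
    # obj(q) = (sum of g_i over I(q))^2 / (2 n (lambda + sum of h_i over I(q)))
    n = len(g)  # assumption: n counts all training examples
    sum_g = sum(g[i] for i in selected)
    sum_h = sum(h[i] for i in selected)
    return sum_g ** 2 / (2.0 * n * (reg + sum_h))


def rule_weight(g, h, selected, reg=1.0):
    # w = -(sum of g_i over I(q)) / (lambda + sum of h_i over I(q))
    sum_g = sum(g[i] for i in selected)
    sum_h = sum(h[i] for i in selected)
    return -sum_g / (reg + sum_h)


# Squared loss with prior predictions f(x) = 0: g_i = -2 * y_i and h_i = 2.
targets = [1.0, 1.0, 0.0, 0.0]
g = [-2.0 * y for y in targets]
h = [2.0 for _ in targets]
selected = [0, 1]  # indices selected by some query q

unregularized = (rule_objective(g, h, selected, reg=0.0), rule_weight(g, h, selected, reg=0.0))
regularized = (rule_objective(g, h, selected, reg=1.0), rule_weight(g, h, selected, reg=1.0))
```

For squared loss with zero prior predictions and \(\lambda = 0\), the weight reduces to the mean target value of the selected examples. A positive \(\lambda\) shrinks both the weight and the objective value.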
>>> import pandas as pd
>>> titanic = pd.read_csv('../datasets/titanic/train.csv')
>>> target = titanic.Survived
>>> titanic.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin', 'Survived'], inplace=True)
>>> opt = XGBRuleEstimator(reg=0.0)
>>> opt.fit(titanic, target).rule_
+0.7420 if Sex==female

>>> best_logistic = XGBRuleEstimator(loss='logistic')
>>> best_logistic.fit(titanic, target.replace(0, -1)).rule_
-1.4248 if Pclass>=2 & Sex==male

>>> best_logistic.predict(titanic)
array([-1., 1., 1., 1., ..., 1., 1., -1.])

>>> greedy = XGBRuleEstimator(loss='logistic', reg=1.0, search='greedy')
>>> greedy.fit(titanic, target.replace(0, -1)).rule_
-1.4248 if Pclass>=2 & Sex==male
- Parameters
loss (str|callable) – loss function either specified via string identifier (e.g., 'squared' for regression or 'logistic' for classification) or directly as callable loss function with defined first and second derivative (see loss_functions)
reg (float) – the regularization parameter \(\lambda\)
search (str|type) – search method either specified via string identifier (e.g., 'greedy' or 'exhaustive') or directly as search type (see realkd.search.search_methods())
search_params (dict) – parameters to apply to discretization (when creating binary search context from dataframe via from_df()) as well as to actual search method (specified by method). See search.
-
decision_function(x)
Predicts score for input data based on loss function.
For instance, for logistic loss this returns the log odds of the positive class.
- Parameters
x (DataFrame) – input data
- Returns
array of prediction scores (one for each row in x)
-
fit(data, target, scores=None, verbose=False)
Fits a rule to provide the best loss reduction on the given data, where the baseline prediction scores are either given explicitly through the scores parameter or are assumed to be 0.
- Parameters
data – pandas DataFrame containing only the feature columns
target – pandas Series containing the target values
scores – prior prediction scores according to which the reduction in prediction loss is optimised
verbose – whether to print status update and summary of query search
- Returns
self
-
predict(data)
Generates predictions for input data.
- Parameters
data – pandas dataframe with co-variates for which to make predictions
- Returns
array of predictions
-
predict_proba(data)
Generates probability predictions for input data.
This method is only supported for suitable loss functions.
- Parameters
data – pandas dataframe with data to predict probabilities for
- Returns
array of probabilities (shape according to number of classes)
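For logistic loss, where scores are log odds of the positive class, probabilities follow from the logistic (sigmoid) function. The sketch below assumes natural-log odds and a two-column layout of negative class first, positive class second. Both assumptions are consistent with the predict_proba output shown in the RuleBoostingEstimator example but are not confirmed by the source. The function name is illustrative.

```python
import math


def probabilities_from_score(score):
    # Map a log-odds score to [P(negative class), P(positive class)].
    positive = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - positive, positive]


# Scores of the two rules that fire for the new passengers in the
# RuleBoostingEstimator example (-1.4248 and +1.7471).
male_second_class = probabilities_from_score(-1.4248)
female_second_class = probabilities_from_score(1.7471)
```

A score of 0 corresponds to a probability of 0.5 for each class.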