Subgroups

Early experimental interface to subgroup discovery methods.

Overview

realkd.subgroups.ImpactRuleEstimator([…])

Fits rules with a conjunctive query based on a multiplicative combination of query coverage and the effect of query satisfaction on the target mean.

Details

class realkd.subgroups.ImpactRuleEstimator(alpha=1.0, search='greedy', search_params={}, verbose=False)

Fits rules with a conjunctive query based on impact, a multiplicative combination of query coverage and the effect of query satisfaction on the target mean. Formally, for dataset D, target variable y, and a query q with extension ext(q) (the rows of D that satisfy q):

\begin{equation}
\mathrm{imp}(q) = \left( \frac{|\mathrm{ext}(q)|}{|D|} \right)^\alpha \left( \mathrm{mean}(y; \mathrm{ext}(q)) - \mathrm{mean}(y; D) \right) .
\end{equation}
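The impact formula can be checked by hand on a small example. The following is a minimal sketch in plain Python, not the library's implementation: the function name `impact`, the boolean-mask representation of ext(q), and the choice to return 0.0 for an empty extension are all assumptions made for illustration.

```python
def impact(y, in_extension, alpha=1.0):
    """Impact of a query: coverage**alpha * (mean(y; ext) - mean(y; D)).

    y            -- target values for every row of the dataset D
    in_extension -- booleans, True where the row satisfies the query
    alpha        -- exponent applied to the coverage term
    """
    selected = [v for v, inside in zip(y, in_extension) if inside]
    if not selected:
        return 0.0  # assumption: an empty extension is given zero impact
    coverage = len(selected) / len(y)  # |ext(q)| / |D|
    effect = sum(selected) / len(selected) - sum(y) / len(y)
    return coverage ** alpha * effect


# Toy data (hypothetical): 8 rows, binary target, query satisfied by the first 4 rows.
y = [1, 1, 1, 0, 0, 0, 1, 0]
satisfied = [True, True, True, True, False, False, False, False]

print(impact(y, satisfied))           # coverage 0.5, effect 0.75 - 0.5 = 0.25
print(impact(y, satisfied, alpha=0))  # coverage term becomes 1, leaving the effect
```

With alpha=1 the coverage of 0.5 halves the effect of 0.25, giving 0.125; with alpha=0 the coverage term drops out and only the effect remains.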

>>> import pandas as pd
>>> from realkd.subgroups import ImpactRuleEstimator
>>> titanic = pd.read_csv("../datasets/titanic/train.csv")
>>> survived = titanic['Survived']
>>> titanic.drop(columns=['Survived', 'PassengerId', 'Name', 'Ticket', 'Cabin'], inplace=True)
>>> subgroup = ImpactRuleEstimator(search='exhaustive', verbose=False)
>>> subgroup.fit(titanic, survived).rule_
   +0.7420 if Sex==female
>>> subgroup.score(titanic, survived)
0.12623428448344273
>>> subgroup2 = ImpactRuleEstimator(alpha=0.5, search='exhaustive', verbose=False)
>>> subgroup2.fit(titanic, survived).rule_
   +0.9471 if Pclass<=2 & Sex==female
>>> subgroup2.score(titanic, survived)
0.24601637556150627
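In the example above, lowering alpha from 1.0 to 0.5 yields a more specific rule. A small sketch with made-up numbers shows why this can happen: because coverage lies between 0 and 1, a smaller exponent shrinks the penalty for low coverage, so a narrow query with a strong effect can overtake a broad query with a weaker one. The coverage and effect values and the helper `best_query` below are hypothetical and are not taken from the Titanic data or from realkd.

```python
# Hypothetical candidate queries described by (coverage, effect on target mean).
candidates = {
    "broad": (0.36, 0.30),   # covers many rows, moderate effect
    "narrow": (0.16, 0.60),  # covers few rows, strong effect
}


def best_query(alpha):
    """Return the name of the candidate with the highest coverage**alpha * effect."""
    return max(
        candidates,
        key=lambda name: candidates[name][0] ** alpha * candidates[name][1],
    )


for alpha in (1.0, 0.5):
    print(alpha, best_query(alpha))
```

Under alpha=1.0 the broad query scores 0.36 * 0.30 = 0.108 against 0.16 * 0.60 = 0.096 for the narrow one; under alpha=0.5 the scores become 0.6 * 0.30 = 0.18 and 0.4 * 0.60 = 0.24, so the ranking flips.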

Parameters

alpha – exponent applied to the coverage term of the impact formula (default 1.0); smaller values give coverage less weight relative to the effect on the target mean
search (str|type) – search method, specified either via a string identifier (e.g., 'greedy' or 'exhaustive') or directly as a search type (see realkd.search.search_methods())
search_params (dict) – parameters applied to the discretization (when creating the binary search context from a dataframe via from_df()) as well as to the actual search method (specified by search). See search.
verbose –