Subgroups
Early experimental interface to subgroup discovery methods.
Overview
ImpactRuleEstimator: fits rules with a conjunctive query based on a multiplicative combination of query coverage and the effect of query satisfaction on the target mean.
Details
class realkd.subgroups.ImpactRuleEstimator(alpha=1.0, search='greedy', search_params={}, verbose=False)
Fits rules with a conjunctive query based on a multiplicative combination of query coverage and the effect of query satisfaction on the target mean. Formally, for dataset D and target variable y, the impact of a query q with extension ext(q) (the rows of D that satisfy q) is:
\begin{equation}
\mathrm{imp}(q) = \left(\frac{|\mathrm{ext}(q)|}{|D|}\right)^\alpha \left(\mathrm{mean}(y; \mathrm{ext}(q)) - \mathrm{mean}(y; D)\right) .
\end{equation}
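To make the formula concrete, here is a minimal sketch in plain Python that evaluates imp(q) on a small made-up target. The function name `impact`, the boolean mask `in_ext` marking membership in ext(q), and the choice to score an empty extension as zero are assumptions for illustration only; this is not realkd's implementation.

```python
def impact(y, in_ext, alpha=1.0):
    """Return coverage**alpha * (mean of y on the extension - mean of y overall).

    y      : list of numeric target values, one per row of D
    in_ext : list of booleans, True where the row satisfies the query q
    alpha  : exponential weight of the coverage term
    """
    n = len(y)
    ext = [value for value, selected in zip(y, in_ext) if selected]
    if not ext:
        # Assumption of this sketch: a query that selects nothing has zero impact.
        return 0.0
    coverage = len(ext) / n                    # |ext(q)| / |D|
    effect = sum(ext) / len(ext) - sum(y) / n  # mean(y; ext(q)) - mean(y; D)
    return coverage ** alpha * effect


# Toy data (made up): 8 rows, the query selects the first 4.
y = [1, 1, 1, 0, 0, 0, 0, 1]
in_ext = [True, True, True, True, False, False, False, False]

print(impact(y, in_ext))             # coverage 0.5, effect 0.25
print(impact(y, in_ext, alpha=0.5))  # same effect, coverage weighted by a square root
```

With these toy values the overall mean is 0.5 and the mean inside the extension is 0.75, so the effect term is 0.25 and the coverage is 0.5.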
>>> import pandas as pd
>>> from realkd.subgroups import ImpactRuleEstimator
>>> titanic = pd.read_csv("../datasets/titanic/train.csv")
>>> survived = titanic['Survived']
>>> titanic.drop(columns=['Survived', 'PassengerId', 'Name', 'Ticket', 'Cabin'], inplace=True)
>>> subgroup = ImpactRuleEstimator(search='exhaustive', verbose=False)
>>> subgroup.fit(titanic, survived).rule_
+0.7420 if Sex==female
>>> subgroup.score(titanic, survived)
0.12623428448344273
>>> subgroup2 = ImpactRuleEstimator(alpha=0.5, search='exhaustive', verbose=False)
>>> subgroup2.fit(titanic, survived).rule_
+0.9471 if Pclass<=2 & Sex==female
>>> subgroup2.score(titanic, survived)
0.24601637556150627
Parameters
alpha – (exponential) weight of coverage term
search (str|type) – search method, specified either via string identifier (e.g., 'greedy' or 'exhaustive') or directly as search type (see realkd.search.search_methods())
search_params (dict) – parameters to apply to discretization (when creating binary search context from dataframe via from_df()) as well as to the actual search method (specified by method). See search.
verbose –
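As a rough illustration of how alpha shifts the balance between coverage and effect, the sketch below scores two hypothetical subgroups under different alpha values. The coverage and effect numbers are invented for this example, and the helper names `weighted_impact` and `preferred` are not part of realkd.

```python
def weighted_impact(coverage, effect, alpha):
    """Impact value for a subgroup with the given coverage fraction and mean shift."""
    return coverage ** alpha * effect


# Hypothetical subgroups as (coverage, effect) pairs; the numbers are made up.
CANDIDATES = {
    "broad": (0.40, 0.30),   # covers 40% of the rows, raises the target mean by 0.30
    "narrow": (0.10, 0.80),  # covers 10% of the rows, raises the target mean by 0.80
}


def preferred(alpha):
    """Name of the candidate with the highest impact for this alpha."""
    return max(CANDIDATES, key=lambda name: weighted_impact(*CANDIDATES[name], alpha))


for alpha in (1.0, 0.5, 0.0):
    print(alpha, preferred(alpha))
```

With alpha=1.0 the broad subgroup scores higher (0.12 against 0.08), while alpha=0.5 and alpha=0.0 favor the narrow one. This matches the pattern in the Titanic example, where alpha=0.5 yields a more specific rule than the default alpha=1.0.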