Subgroups

Early experimental interface to subgroup discovery methods.

Overview

realkd.subgroups.ImpactRuleEstimator([…])

Fits a rule with a conjunctive query, scored by the product of the query's coverage and the effect that satisfying the query has on the target mean.

Details

class realkd.subgroups.ImpactRuleEstimator(gamma=1.0, search='greedy', search_params={}, verbose=False)

Fits a rule with a conjunctive query, scored by the product of the query's coverage and the effect that satisfying the query has on the target mean. Formally, for a dataset D, a target variable y, and a query q with extension ext(q) (the rows of D that satisfy q):

\[\mathrm{imp}(q) = (|\mathrm{ext}(q)|/|D|) (\mathrm{mean}(y; \mathrm{ext}(q)) - \mathrm{mean}(y; D)) .\]

>>> import pandas as pd
>>> from realkd.subgroups import ImpactRuleEstimator
>>> titanic = pd.read_csv("../datasets/titanic/train.csv")
>>> survived = titanic['Survived']
>>> titanic.drop(columns=['Survived', 'PassengerId', 'Name', 'Ticket', 'Cabin'], inplace=True)
>>> subgroup = ImpactRuleEstimator(search='exhaustive', verbose=False)
>>> subgroup.fit(titanic, survived)
ImpactRuleEstimator(search='exhaustive')
>>> subgroup.rule_
   +0.7420 if Sex==female
>>> subgroup.score(titanic, survived)
0.1262342844834427
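
The impact formula above can be checked by hand on a small example. The following is a minimal sketch in plain Python, not the realkd implementation: the helper name `impact`, the toy data, and the convention that an empty extension scores zero are all assumptions made for illustration.

```python
def impact(y, in_extension):
    """Coverage of the query times the shift in target mean it induces.

    y            -- target values, one per row of the dataset D
    in_extension -- booleans, True where the row satisfies the query q
    (hypothetical helper, not part of realkd)
    """
    n = len(y)
    covered = [v for v, inside in zip(y, in_extension) if inside]
    if not covered:
        # Assumption: a query that selects no rows has impact zero.
        return 0.0
    coverage = len(covered) / n                    # |ext(q)| / |D|
    mean_shift = sum(covered) / len(covered) - sum(y) / n
    return coverage * mean_shift


# Toy data, made up for illustration: 8 rows, binary target.
y = [1, 1, 1, 0, 0, 0, 0, 1]
query = [True, True, True, True, False, False, False, False]

# Coverage is 4/8 = 0.5, the mean inside the extension is 0.75,
# and the overall mean is 0.5, so the impact is 0.5 * 0.25.
print(impact(y, query))  # 0.125
```

A query that covers many rows but barely shifts the mean, or one that shifts the mean strongly but covers few rows, both score low under this measure; the product rewards queries that do both.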