Datasets
Access to example datasets and distributions.
realkd.datasets.noisy_parity(n, d=3, variance=0.25, as_df=True, random_seed=None)
Generates observations of a mixture model of Gaussian clusters centered at the nodes of the hypercube \(\{-1, 1\}^d\), labelled according to the parity of the cube node.
That is, with \(\sigma^2\) given by variance,
\begin{align*}
C &\sim \mathrm{Unif}(\{-1, 1\}^d)\\
X | C &\sim \mathrm{Norm}(C, \sigma^2 I_d)\\
Y | C &= \prod_{i=1}^d C_i
\end{align*}
For example:
>>> x, y = noisy_parity(10, random_seed=0)
>>> x
         x1        x2        x3
0  0.633866  0.727871  0.841850
1 -0.794185 -0.478743 -1.064267
2 -0.316768 -1.332597 -0.824245
3  1.451735  1.047006  0.628250
4  0.539137  0.771137  1.110098
5  0.495191  0.895412  0.920387
6  1.270423  1.107330 -0.822314
7  0.673086  0.935193 -0.608012
8 -0.253284  0.370467  1.756962
9 -0.327062  1.390656  1.132228
>>> y
0    1
1   -1
2   -1
3    1
4    1
5    1
6   -1
7   -1
8   -1
9   -1
dtype: int64
- Parameters
n – number of observations
d – dimension of data
variance – variance \(\sigma^2\) of each Gaussian cluster
as_df – whether to wrap return value in pandas dataframe/series
random_seed – seed passed to np.random.default_rng
- Returns
data x (pandas dataframe if as_df is true, otherwise a matrix) and the corresponding labels y (pandas series or array, respectively)
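The generative model described for noisy_parity can be sketched directly in NumPy. This is a minimal illustration, not realkd's implementation. The name noisy_parity_sketch is hypothetical, and the sketch assumes cluster centres in \(\{-1, 1\}^d\), which is what the labels in the example suggest. Its random draws are not guaranteed to follow the same order as the library's, so it should not be expected to reproduce the numbers in the example, even with the same seed.

```python
import numpy as np


def noisy_parity_sketch(n, d=3, variance=0.25, random_seed=None):
    """Hypothetical sketch of the noisy parity model (not realkd's code)."""
    rng = np.random.default_rng(random_seed)
    # C: one cluster centre per observation, uniform over the nodes {-1, 1}^d
    # (assumption: centres are +/-1, consistent with the example labels)
    centers = rng.choice([-1, 1], size=(n, d))
    # X | C: isotropic Gaussian noise with covariance variance * I_d
    x = centers + rng.normal(scale=np.sqrt(variance), size=(n, d))
    # Y | C: product of the centre coordinates, which is +1 or -1
    y = centers.prod(axis=1)
    return x, y


x, y = noisy_parity_sketch(10, random_seed=0)
```

Here x is a plain array of shape (n, d) and y an integer array of length n. The sketch omits the wrapping in pandas objects that as_df=True provides.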