If $\mathbf x$ has too many discretionary features, the classification rule is likely to be irrelevant: the intersection between $\mathbf x$ and $\mathbb{F}_{\mathbf X}$ is too small to carry enough information and to support strong analogies between $\mathbf x$ and the examples. To overcome this drawback, $2^{\mathbb F}$ is split into two subsets:
\begin{itemize}
\item $\mathcal{F}_1 = \{ \mathbf x \in 2^{\mathbb F} ~ | ~ |\mathbf x \cap \mathbb{F}_{\mathbf X}| \geq \delta\}$, for a fixed $\delta \in \mathbb{N}$
\item $\mathcal{F}_2 = 2^{\mathbb F} \setminus \mathcal{F}_1$
\end{itemize}
$\mathcal{F}_1$ contains the elements that share at least $\delta$ features with the examples. An alternative criterion is $\mathcal{F}_1 = \{\mathbf x \in 2^{\mathbb F} ~ | ~ \frac{D_\mathbf x}{|\mathbf x|} \leq \delta\}$, for a fixed $\delta \in [0,1]$. In this case, $\mathcal{F}_1$ contains the elements for which the examples provide enough information. Our preliminary tests indicate that the choice between the two criteria depends on the dataset structure.
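The overlap-based membership test for $\mathcal{F}_1$ can be sketched in a few lines of Python. The feature universe, training examples, and threshold below are illustrative placeholders, not values from the paper; elements of $2^{\mathbb F}$ are represented as Python sets of feature identifiers.

```python
# Hypothetical feature universe F and training examples X (sets of feature IDs).
F = set(range(20))                  # stands in for \mathbb{F}
X = [{0, 1, 2}, {1, 3, 4}, {2, 5}]  # stands in for \mathbf{X}
F_X = set().union(*X)               # features covered by the examples, \mathbb{F}_X

def in_F1(x, delta=2):
    """Membership test for F_1: x shares at least delta features with F_X."""
    return len(x & F_X) >= delta

# Elements of 2^F not in F_1 belong to F_2 by definition.
print(in_F1({1, 2, 9}, delta=2))   # shares features 1 and 2 -> True
print(in_F1({9, 10}, delta=1))     # shares nothing with F_X -> False
```

The alternative criterion based on $\frac{D_{\mathbf x}}{|\mathbf x|}$ would replace the body of `in_F1` with a ratio test against a threshold in $[0,1]$.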
Finally, the decision rule for new cases is built as follows:
\begin{align}
\tag{R2} \label{eqn:updated_cr}
\bar J(\mathbf x) = \begin{cases}
\tilde J(\mathbf x) & \text{if } \mathbf x \in \mathcal{F}_1 \\
o_{\mathbf x} & \text{if } \mathbf x \in \mathcal{F}_2
\end{cases}
\end{align}
where $o_{\mathbf x}$ is a single draw from a Bernoulli random variable with parameter $p=\frac{|\{\mathbf x \in \mathbf X ~|~ J(\mathbf x) = 1 \}|}{|\mathbf X|}$, i.e. the prevalence of class $1$ in $\mathbf X$. This assumes that the prevalence in $\mathbf X$ is close to the prevalence over $2^{\mathbb F}$ (or, for sequential problems in which new cases are generated by an unknown random measure, that the prevalence does not change over time). The rationale is that, when the model built on the hypergraph cannot be exploited for a case $\mathbf x$, we can still model $J$ as a Bernoulli random variable and use the maximum-likelihood estimate of $p$. In a sense, this extends the {\it local} model to the entire input space $2^{\mathbb F}$.
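The full decision rule (R2) can be sketched as follows. The labelled examples, the threshold `delta`, and the stand-in classifier `J_tilde` are illustrative assumptions: the actual hypergraph-based classifier $\tilde J$ is not specified here, so a placeholder returning class $1$ is used in its stead.

```python
import random

# Hypothetical labelled training examples: (feature set, binary label J(x)).
X = [({0, 1, 2}, 1), ({1, 3}, 0), ({2, 4}, 1), ({3, 5}, 0)]
F_X = set().union(*(x for x, _ in X))
p = sum(j for _, j in X) / len(X)   # maximum-likelihood estimate of the prevalence

def J_tilde(x):
    """Placeholder for the hypergraph-based classifier (not specified here)."""
    return 1

def J_bar(x, delta=1, rng=random):
    """Decision rule (R2): model prediction on F_1, Bernoulli(p) draw on F_2."""
    if len(x & F_X) >= delta:                # x in F_1
        return J_tilde(x)
    return 1 if rng.random() < p else 0      # x in F_2: o_x ~ Bernoulli(p)

print(J_bar({1, 9}))   # shares feature 1 with F_X -> model is used
print(J_bar({99}))     # no overlap -> random draw with probability p
```

Note that on $\mathcal{F}_2$ the rule is stochastic: two calls on the same case may return different labels, which matches the definition of $o_{\mathbf x}$ as a draw rather than a deterministic value.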