[Documentation] make documentation of GATConv and GATv2Conv more explicit and accurate (#8201)

## What?
I explicitly named the different `\Theta`s used in the different layers by
indexing them with `s` (source) and `t` (target), as is already done in the
docs when discussing input sizes. I did the same for `\mathbf{a}` in the
`GATConv` layer. Finally, I removed the unclear `||` (concatenation) operator
and wrote out the resulting sums explicitly; a short sketch verifying the
equivalence is shown below.
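
For reference, a minimal sketch (with made-up dimensions) of why the two
notations describe the same quantity: splitting the attention vector
`a = [a_s || a_t]` turns the concatenation form `a^T [Θx_i || Θx_j]` into the
written-out sum `a_s^T Θx_i + a_t^T Θx_j`:

```python
import torch

# Hypothetical sizes, for illustration only.
d_in, d_out = 8, 16
theta = torch.randn(d_out, d_in)   # shared weight matrix Theta
a = torch.randn(2 * d_out)         # attention vector a = [a_s || a_t]
a_s, a_t = a[:d_out], a[d_out:]    # source/target halves of a

x_i = torch.randn(d_in)
x_j = torch.randn(d_in)

# Concatenation form: a^T [Theta x_i || Theta x_j]
lhs = a @ torch.cat([theta @ x_i, theta @ x_j])
# Written-out form: a_s^T Theta x_i + a_t^T Theta x_j
rhs = a_s @ (theta @ x_i) + a_t @ (theta @ x_j)

assert torch.allclose(lhs, rhs, atol=1e-6)
```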

## Why?
To resolve the issues discussed in #8200.

## Testing?
No tests needed, as only documentation was changed.

---------

Co-authored-by: Jintang Li <[email protected]>
Co-authored-by: rusty1s <[email protected]>
3 people authored Oct 21, 2023
1 parent 7085a3a commit 4553ca8
Showing 2 changed files with 41 additions and 25 deletions.
36 changes: 23 additions & 13 deletions torch_geometric/nn/conv/gat_conv.py
@@ -31,20 +31,23 @@ class GATConv(MessagePassing):
<https://arxiv.org/abs/1710.10903>`_ paper
.. math::
- \mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} +
- \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},
+ \mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}_{s}\mathbf{x}_{i} +
+ \sum_{j \in \mathcal{N}(i)}
+ \alpha_{i,j}\mathbf{\Theta}_{t}\mathbf{x}_{j},
where the attention coefficients :math:`\alpha_{i,j}` are computed as
.. math::
\alpha_{i,j} =
\frac{
- \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}
- [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j]
+ \exp\left(\mathrm{LeakyReLU}\left(
+ \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i
+ + \mathbf{a}^{\top}_{t} \mathbf{\Theta}_{t}\mathbf{x}_j
\right)\right)}
{\sum_{k \in \mathcal{N}(i) \cup \{ i \}}
- \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}
- [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k]
+ \exp\left(\mathrm{LeakyReLU}\left(
+ \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i
+ + \mathbf{a}^{\top}_{t}\mathbf{\Theta}_{t}\mathbf{x}_k
\right)\right)}.
If the graph has multi-dimensional edge features :math:`\mathbf{e}_{i,j}`,
@@ -53,19 +56,26 @@ class GATConv(MessagePassing):
.. math::
\alpha_{i,j} =
\frac{
- \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}
- [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_j
- \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,j}]\right)\right)}
+ \exp\left(\mathrm{LeakyReLU}\left(
+ \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i
+ + \mathbf{a}^{\top}_{t} \mathbf{\Theta}_{t}\mathbf{x}_j
+ + \mathbf{a}^{\top}_{e} \mathbf{\Theta}_{e} \mathbf{e}_{i,j}
+ \right)\right)}
{\sum_{k \in \mathcal{N}(i) \cup \{ i \}}
- \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}
- [\mathbf{\Theta}\mathbf{x}_i \, \Vert \, \mathbf{\Theta}\mathbf{x}_k
- \, \Vert \, \mathbf{\Theta}_{e} \mathbf{e}_{i,k}]\right)\right)}.
+ \exp\left(\mathrm{LeakyReLU}\left(
+ \mathbf{a}^{\top}_{s} \mathbf{\Theta}_{s}\mathbf{x}_i
+ + \mathbf{a}^{\top}_{t} \mathbf{\Theta}_{t}\mathbf{x}_k
+ + \mathbf{a}^{\top}_{e} \mathbf{\Theta}_{e} \mathbf{e}_{i,k}
+ \right)\right)}.
+
+ If the graph is not bipartite, :math:`\mathbf{\Theta}_{s} =
+ \mathbf{\Theta}_{t}`.
Args:
in_channels (int or tuple): Size of each input sample, or :obj:`-1` to
derive the size from the first input(s) to the forward method.
A tuple corresponds to the sizes of source and target
- dimensionalities.
+ dimensionalities in case of a bipartite graph.
out_channels (int): Size of each output sample.
heads (int, optional): Number of multi-head-attentions.
(default: :obj:`1`)
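To illustrate the bipartite case the updated docs call out, here is a short
usage sketch (node counts and feature sizes are made up): with a tuple
`in_channels`, `GATConv` keeps separate `Θ_s` and `Θ_t` for source and target
features.

```python
import torch
from torch_geometric.nn import GATConv

# Bipartite graph: 10 source nodes with 8 features,
# 6 target nodes with 16 features (sizes are illustrative).
conv = GATConv(in_channels=(8, 16), out_channels=32, heads=2)

x_src = torch.randn(10, 8)
x_dst = torch.randn(6, 16)
edge_index = torch.tensor([[0, 1, 2],    # source node indices
                           [0, 0, 1]])   # target node indices

out = conv((x_src, x_dst), edge_index)
print(out.shape)  # torch.Size([6, 64]) -- heads are concatenated by default
```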
30 changes: 18 additions & 12 deletions torch_geometric/nn/conv/gatv2_conv.py
@@ -34,20 +34,21 @@ class GATv2Conv(MessagePassing):
In contrast, in :class:`GATv2`, every node can attend to any other node.
.. math::
- \mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}\mathbf{x}_{i} +
- \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}\mathbf{x}_{j},
+ \mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}_{s}\mathbf{x}_{i} +
+ \sum_{j \in \mathcal{N}(i)}
+ \alpha_{i,j}\mathbf{\Theta}_{t}\mathbf{x}_{j},
where the attention coefficients :math:`\alpha_{i,j}` are computed as
.. math::
\alpha_{i,j} =
\frac{
- \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(\mathbf{\Theta}
- [\mathbf{x}_i \, \Vert \, \mathbf{x}_j]
+ \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(
+ \mathbf{\Theta}_{s} \mathbf{x}_i + \mathbf{\Theta}_{t} \mathbf{x}_j
\right)\right)}
{\sum_{k \in \mathcal{N}(i) \cup \{ i \}}
- \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(\mathbf{\Theta}
- [\mathbf{x}_i \, \Vert \, \mathbf{x}_k]
+ \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(
+ \mathbf{\Theta}_{s} \mathbf{x}_i + \mathbf{\Theta}_{t} \mathbf{x}_k
\right)\right)}.
If the graph has multi-dimensional edge features :math:`\mathbf{e}_{i,j}`,
@@ -56,19 +57,23 @@ class GATv2Conv(MessagePassing):
.. math::
\alpha_{i,j} =
\frac{
- \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(\mathbf{\Theta}
- [\mathbf{x}_i \, \Vert \, \mathbf{x}_j \, \Vert \, \mathbf{e}_{i,j}]
+ \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(
+ \mathbf{\Theta}_{s} \mathbf{x}_i
+ + \mathbf{\Theta}_{t} \mathbf{x}_j
+ + \mathbf{\Theta}_{e} \mathbf{e}_{i,j}
{\sum_{k \in \mathcal{N}(i) \cup \{ i \}}
- \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(\mathbf{\Theta}
- [\mathbf{x}_i \, \Vert \, \mathbf{x}_k \, \Vert \, \mathbf{e}_{i,k}]
+ \exp\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(
+ \mathbf{\Theta}_{s} \mathbf{x}_i
+ + \mathbf{\Theta}_{t} \mathbf{x}_k
+ + \mathbf{\Theta}_{e} \mathbf{e}_{i,k}
\right)\right)}.
Args:
in_channels (int or tuple): Size of each input sample, or :obj:`-1` to
derive the size from the first input(s) to the forward method.
A tuple corresponds to the sizes of source and target
- dimensionalities.
+ dimensionalities in case of a bipartite graph.
out_channels (int): Size of each output sample.
heads (int, optional): Number of multi-head-attentions.
(default: :obj:`1`)
@@ -96,7 +101,8 @@ class GATv2Conv(MessagePassing):
bias (bool, optional): If set to :obj:`False`, the layer will not learn
an additive bias. (default: :obj:`True`)
share_weights (bool, optional): If set to :obj:`True`, the same matrix
- will be applied to the source and the target node of every edge.
+ will be applied to the source and the target node of every edge,
+ *i.e.* :math:`\mathbf{\Theta}_{s} = \mathbf{\Theta}_{t}`.
(default: :obj:`False`)
**kwargs (optional): Additional arguments of
:class:`torch_geometric.nn.conv.MessagePassing`.
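And a matching sketch for `GATv2Conv` (again with made-up sizes), exercising
the documented `share_weights` option, which ties `Θ_s = Θ_t`, together with
multi-dimensional edge features:

```python
import torch
from torch_geometric.nn import GATv2Conv

# Illustrative sizes: 5 nodes with 8 features, 4-dimensional edge features.
conv = GATv2Conv(in_channels=8, out_channels=16, heads=2,
                 edge_dim=4,          # enables the e_{i,j} terms above
                 share_weights=True)  # Theta_s = Theta_t

x = torch.randn(5, 8)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 4]])
edge_attr = torch.randn(edge_index.size(1), 4)

out = conv(x, edge_index, edge_attr)
print(out.shape)  # torch.Size([5, 32])
```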
