# 分类归档 Recommendation System

### Graph Convolutional Matrix Completion

#### Graph Convolutional Matrix Completion

View matrix completion as as link prediction problems on graphs.

Graph auto-encoders for the task of recommendation:

• Encoder: X \in \mathcal{R}^{N \times D} -> f(\cdot, A) -> Z \in \mathcal{R}^{N \times E}
• Decoder: \check{A}=g(Z)

We can train this graph auto-encoder by minimizing the reconstruction error.

###### Graph auto-encoders

Assign seperate processing channels for each edge type(rating type). Specifically, for rating type r, we aggregate the information for user i through the following form (Here other functions can be employed and it is a direction for furture work):

\mu_{j \rightarrow i, r}=\frac{1}{c_{i j}} W_{r} x_{j}

Where c_{ij} is a normalization constant and usually set to |N_i| or \sqrt{|N_i||N_j|}. And then we aggregate the all information with different types into a single vector representation (Convolutional Layer):

h_{i}=\sigma\left[\operatorname{accum}\left(\sum_{j \in \mathcal{N}_{i, 1}} \mu_{j \rightarrow i, 1}, \ldots, \sum_{j \in \mathcal{N}_{i, R}} \mu_{j \rightarrow i, R}\right)\right]

Where accum denotes an accumulation operation such as stack, concatenation and sum. We obtain the final embedding as follows (Dense Layer):

u_i = \sigma(Wh_i)

###### Bilinear Decoder

p\left(\check{M}_{i j}=r\right)=\frac{e^{u_{i}^{T} Q_{r} v_{j}}}{\sum_{s \in R} e^{u_{i}^{T} Q_{s} v_{j}}}

Where Q_r is a trainable parameter matrix of shape E \times E. The predicted rating is computed as:

\check{M}_{i j}=g\left(u_{i}, v_{j}\right)=\mathbb{E}_{p\left(\check{M}_{i j}=r\right)}[r]=\sum_{r \in R} r p\left(\check{M}_{i j}=r\right)

###### Model training

Loss function:

\mathcal{L}=-\sum_{i, j ; \Omega_{i j}=1} \sum_{r=1}^{R} I\left[r=M_{i j}\right] \log p\left(\check{M}_{i j}=r\right)

Important tricks: Node Dropout, Mini-batching, Sparse matrix multiplications, weight sharing (Same weight )

###### Input feature representation and side information

Q: The author reckoned that feeding the content information directly into the graph convplution layer leads to a severe bottleneck of information flow.

Set a separate processing channel intpo the dense hidden layer:

u_i = \sigma(Wh_i + W_2^ff_i), \quad f_i =\sigma(W_1^fx_i^f + b)

Q: The input feature matrix is chosen as an identity matrix, with a unique one-hot vector in the graph.

Cold start analysis: compare performance through controlling the N_r(Sparsity).

###### Conclusion
• Introduce a graph convolutional matrix completion(GC-MC).
• Naturally generalize to include side information for both users and items.
###### Future work
• Extend the model to large-scale multi-modal data
• Combined with other CNN frameworks
• Attention mechanism
• Develop efficient approximate schemes such as subsampling local neighborhoods to address scalability.

### INDUCTIVE MATRIX COMPLETION BASED ON GRAPH NEURAL NETWORKS

To make matrix completion inductive, most previous works use content(side information) to make prediction.

Problems: High-quality content is noe always available and can be hard to extract

Conventional methods: low-rank rating matrix decomposition. In IMC: r_{ij}=x_i^T Q y_j, where Q is a learnable matrix modeling the feature interactionms. IMC methods have strong constraints on the content quality.

In 2018, Hartford propose exchangeable matrix layers which do not rely on side information while needing the whole rating matrix as the input, which tends to lead to a scalability challenges for practical datasets.

In this paper, a novel inductive matrix completion method that does not use any content. The key component is local graph pattern. Predicting the unknown ratings converts equivalentl to predicting labeled links in the corresponding bipartite graphs.

IGMC uses a graph-level GNN to learn representations for subgraphs.

##### IGMC
• Enclosing subgraph extraction

BFS procedure for extracting h-hop enclosing subgraphs:

• Node labeling

Assign an integer to every node in the subgraph. The purpose is to mark different roles for nodes.

• target user and target item

• User-type nodes and item-type nodes

0, 1 -> 2i, 2i+1

• IGMC purely relies on graph patterns within local enclosing subgraphs. -> inductive (compared to GC-MC 2018, which is a transductive method)
• Graph neural network architecture

Two key components:

• Message passing layers: extract a feature vector for each node in the subgraph.

R-GCN:

$$\mathbf{x}_{i}^{l+1}=\mathbf{W}_{0}^{l} \mathbf{x}_{i}^{l}+\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_{r}(i)} \frac{1}{\left|\mathcal{N}_{r}(i)\right|} \mathbf{W}_{r}^{l} \mathbf{x}_{j}^{l}$$

$$\mathbf{h}_{i}=\operatorname{concat}\left(\mathbf{x}_{i}^{1}, \mathbf{x}_{i}^{2}, \ldots, \mathbf{x}_{i}^{L}\right)$$

• A pooling layer: summarize a subgraph representation from node features

\mathbf{g}=\operatorname{concat}\left(\mathbf{h}_{u}, \mathbf{h}_{v}\right), \quad
\hat{r}=\mathbf{w}^{\top} \sigma(\mathbf{W} \mathbf{g})

• Model training

Loss function:

$$\mathcal{L}=\frac{1}{|{(u, v) \mid \mathbf{\Omega}_{u, v}=1}|} \sum_{(u, v): \boldsymbol{\Omega}_{u, v}=1}\left(R_{u, v}-\hat{R}_{u, v}\right)^{2}$$

$$\mathcal{L}_{\mathrm{ARR}}=\sum_{i=1,2, \ldots,|\mathcal{R}|-1}\left\Vert\mathbf{W}_{r_{i+1}}-\mathbf{W}_{r_{i}}\right\Vert_{F}^{2}$$

$$\mathcal{L}_{\mathrm{final}} = \mathcal{L} + \lambda \mathcal{L}_{\mathrm{ARP}}$$

$\mathcal{L}_{\mathrm{ARR}}$ encourages rating adjacent to each other to have similar parameter matrices.

• Comparison to GC-MC

• IGMC uses a graph-level GNN to map the enclosing subgraph whereas it has a higher complexity.

• GC-MC uses a node-level GNN on the bipartite graph G to learn embeddings to predict the rating. The main drawback is that it fails to model interactions and correspondences between the nodes due to the independent consideration for nodes.

### Neural Graph Collaborative Filtering

#### Neural Graph Collaborative Filtering

Drawback of the previous learning-based models: collaborative signal is not encoded in the embedding process.

A new recommendation framework is proposed: NGCF.

Two key components in learnable CF models: 1. embedding 2. interaction modeling

For CF, these methods are not sufficient to yield satisfactory embeddings for CF. The key reason lies in collaborative signal!

collaborative signal: reveal the behavioral similarity between users or items

To be more specific, interaction information tends to be neglected when embedding the nodes while employed to define the objective function for model training.

Framework: embedding layer -> multiple embedding propagation layers -> prediction layer

​ initialized state -> refined embedding ->

• First-order propagation:
• message construction:

\mathbf{m}_{u \leftarrow i}=\frac{1}{\sqrt{\left|\mathcal{N}_{u} | \mathbf{N}_{i}\right|}}\left(\mathbf{W}_{1} \mathbf{e}_{i}+\mathbf{W}_{2}\left(\mathbf{e}_{i} \odot \mathbf{e}_{u}\right)\right)

• message aggregation:

\mathbf{e}_{u}^{(1)}=\text { LeakyReLU }\left(\mathbf{m}_{u \leftarrow u}+\sum_{i \in \mathcal{N}_{u}} \mathbf{m}_{u \leftarrow i}\right)

where: m_{u \leftarrow u}=W_1e_{u}

• High-order propagation:

High-order connectivities are crucial to encode the collaborative signal to estimate the relevance score…

In the l-th step, the representation of user u is recursively formulated as:

\mathbf{e}_{u}^{(l)}=\text { LeakyReLU }\left(\mathbf{m}_{u \leftarrow u}^{(l)}+\sum_{i \in \mathcal{N}_{u}} \mathbf{m}_{u \leftarrow i}^{(l)}\right)

\mathbf{m}_{u \leftarrow i}^{(l)}=p_{u i}\left(\mathbf{W}_{1}^{(l)} \mathbf{e}_{i}^{(l-1)}+\mathbf{W}_{2}^{(l)}\left(\mathbf{e}_{i}^{(l-1)} \odot \mathbf{e}_{u}^{(l-1)}\right)\right)

\mathbf{m}_{u \leftarrow u}^{(l)}=\mathbf{W}_{1}^{(l)} \mathbf{e}_{u}^{(l-1)}

Matrix form:

\mathrm{E}^{(l)}=\text { LeakyReLU }\left((\mathcal{L}+\mathrm{I}) \mathrm{E}^{(l-1)} \mathbf{W}_{1}^{(l)}+\mathcal{L} \mathrm{E}^{(l-1)} \odot \mathrm{E}^{(l-1)} \mathbf{W}_{2}^{(l)}\right)

concatenating the representations from different layers to the final embedding:

\mathbf{e}_{u}^{*}=\mathbf{e}_{u}^{(0)}\Vert\cdots\Vert \mathbf{e}_{u}^{(L)}, \quad \mathbf{e}_{i}^{*}=\mathbf{e}_{i}^{(0)}\Vert\cdots\Vert \mathbf{e}_{i}^{(L)}

Finally, we conduct the inner product to estimate the user’s preference towards the target item:

\hat{y}_{\mathrm{NGCF}}(u, i)=\mathrm{e}_{u}^{* \top} \mathrm{e}_{i}^{*}

Optimization -> BPR Loss

\text { Loss }=\sum_{(u, i, j) \in O}-\ln \sigma\left(\hat{y}_{u i}-\hat{y}_{u j}\right)+\lambda|\Theta|_{2}^{2}

Performance

What are the recall and ndcg?