#### Graph Convolutional Matrix Completion

View matrix completion as a link-prediction problem on a bipartite user-item graph.

Graph auto-encoders for the task of recommendation:

- Encoder: X \in \mathbb{R}^{N \times D} -> f(\cdot, A) -> Z \in \mathbb{R}^{N \times E}
- Decoder: \check{A} = g(Z)

We can train this graph auto-encoder by minimizing the reconstruction error.

###### Graph auto-encoders

Assign a separate processing channel to each edge type (rating type). Specifically, for rating type r, the message passed from node j to user i takes the following form (other functions could be employed here, which the authors note as a direction for future work):

\mu_{j \rightarrow i, r}=\frac{1}{c_{i j}} W_{r} x_{j}

where c_{ij} is a normalization constant, usually set to |\mathcal{N}_i| (left normalization) or \sqrt{|\mathcal{N}_i||\mathcal{N}_j|} (symmetric normalization). We then aggregate the messages of all rating types into a single vector representation (graph convolution layer):

h_{i}=\sigma\left[\operatorname{accum}\left(\sum_{j \in \mathcal{N}_{i, 1}} \mu_{j \rightarrow i, 1}, \ldots, \sum_{j \in \mathcal{N}_{i, R}} \mu_{j \rightarrow i, R}\right)\right]

where *accum* denotes an accumulation operation such as stacking (concatenation) or summation. The final embedding is obtained as follows (dense layer):

u_i = \sigma(Wh_i)
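The encoder above can be sketched in plain numpy. This is a minimal toy implementation with random weights and a hand-written adjacency structure (all dimensions, names, and the example graph are illustrative assumptions, not the paper's code):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)

N, D, H, E, R = 6, 4, 8, 5, 3   # nodes, input dim, hidden dim, embedding dim, rating types
X = rng.normal(size=(N, D))     # node features x_j

# One weight matrix W_r per rating type
W = rng.normal(size=(R, H, D))

# neighbors[r][i] = neighbors j of node i connected by an edge of rating r (toy graph)
neighbors = [
    {0: [1, 2], 1: [0], 2: [0]},        # rating type 1
    {0: [3], 3: [0]},                   # rating type 2
    {1: [4], 4: [1], 2: [5], 5: [2]},   # rating type 3
]

def convolve(i):
    """Compute h_i: per-rating messages mu_{j->i,r}, accumulated by sum."""
    per_rating = []
    for r in range(R):
        js = neighbors[r].get(i, [])
        msg = np.zeros(H)
        for j in js:
            c_ij = len(js)              # left normalization c_ij = |N_i|
            msg += (W[r] @ X[j]) / c_ij
        per_rating.append(msg)
    return relu(np.sum(per_rating, axis=0))  # accum = sum, sigma = ReLU

W_dense = rng.normal(size=(E, H))
h0 = convolve(0)
u0 = relu(W_dense @ h0)   # final embedding u_i
print(u0.shape)           # (5,)
```

With `accum = stack` instead of `sum`, the per-rating messages would be concatenated into a vector of size R * H before the nonlinearity, and `W_dense` would have shape (E, R * H).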

###### Bilinear Decoder

p\left(\check{M}_{i j}=r\right)=\frac{e^{u_{i}^{T} Q_{r} v_{j}}}{\sum_{s \in R} e^{u_{i}^{T} Q_{s} v_{j}}}

where Q_r is a trainable parameter matrix of shape E \times E. The predicted rating is the expectation under this distribution:

\check{M}_{i j}=g\left(u_{i}, v_{j}\right)=\mathbb{E}_{p\left(\check{M}_{i j}=r\right)}[r]=\sum_{r \in R} r p\left(\check{M}_{i j}=r\right)
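A minimal numpy sketch of the bilinear decoder (toy dimensions and random parameters, purely illustrative): softmax over the bilinear scores u_i^T Q_r v_j, then the expected rating.

```python
import numpy as np

rng = np.random.default_rng(1)
E, R = 5, 3
ratings = np.array([1.0, 2.0, 3.0])   # possible rating levels r

u_i = rng.normal(size=E)              # user embedding
v_j = rng.normal(size=E)              # item embedding
Q = rng.normal(size=(R, E, E))        # one trainable E x E matrix Q_r per rating level

# p(M_ij = r): softmax over bilinear scores (shifted by max for stability)
scores = np.array([u_i @ Q[r] @ v_j for r in range(R)])
p = np.exp(scores - scores.max())
p /= p.sum()

# predicted rating = expected value of r under p
M_ij = (ratings * p).sum()
```

Since p is a proper distribution over the rating levels, the prediction M_ij always lies inside the rating range, here [1, 3].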

###### Model training

Loss function:

\mathcal{L}=-\sum_{i, j ; \Omega_{i j}=1} \sum_{r=1}^{R} I\left[r=M_{i j}\right] \log p\left(\check{M}_{i j}=r\right)
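The loss above is a cross-entropy over the observed entries only, which can be sketched as follows (random toy data; Omega is the observation mask, all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N_u, N_v, R = 4, 5, 3

# decoder output: p(M_ij = r) for every user-item pair, via softmax
logits = rng.normal(size=(N_u, N_v, R))
p = np.exp(logits - logits.max(axis=-1, keepdims=True))
p /= p.sum(axis=-1, keepdims=True)

M = rng.integers(1, R + 1, size=(N_u, N_v))   # ground-truth ratings in 1..R
Omega = rng.integers(0, 2, size=(N_u, N_v))   # mask: 1 = observed entry

# negative log-likelihood, summed over observed entries only
obs = Omega == 1
loss = -np.log(p[obs, M[obs] - 1]).sum()
```

In practice the paper's training tricks (node dropout, mini-batching over observed ratings) would be applied on top of this objective.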

Important tricks: node dropout, mini-batching, sparse matrix multiplications, and weight sharing (e.g., ordinal weight sharing W_r = \sum_{s=1}^{r} T_s, so that weight matrices for higher ratings reuse those of lower ratings).

###### Input feature representation and side information

**Q:** The authors argue that feeding content information directly into the graph convolution layer leads to a severe bottleneck of information flow.

Instead, side information enters through a separate processing channel into the dense hidden layer:

u_i = \sigma(Wh_i + W_2^ff_i), \quad f_i =\sigma(W_1^fx_i^f + b)
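This separate-channel dense layer can be sketched in numpy (toy dimensions and random weights; the variable names mirror the equation above but are otherwise assumptions):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(3)
H, E, F, Fh = 8, 5, 6, 4   # hidden, embedding, side-feature, side-hidden dims

h_i = rng.normal(size=H)    # graph convolution output for node i
x_f = rng.normal(size=F)    # side-information vector x_i^f

W   = rng.normal(size=(E, H))    # dense weight on the graph channel
W1f = rng.normal(size=(Fh, F))   # first side-info weight W_1^f
W2f = rng.normal(size=(E, Fh))   # second side-info weight W_2^f
b   = rng.normal(size=Fh)

# side info is processed separately, then merged only at the dense layer
f_i = relu(W1f @ x_f + b)
u_i = relu(W @ h_i + W2f @ f_i)
print(u_i.shape)   # (5,)
```

Note that the graph convolution itself only ever sees the (one-hot) structural input, so the side-information channel cannot bottleneck the message passing.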

**Q:** The input feature matrix X is then chosen as an identity matrix, i.e., every node in the graph gets a unique one-hot vector.

Cold-start analysis: compare performance while controlling N_r (sparsity, the number of ratings kept per cold-start user).

###### Conclusion

- Introduces graph convolutional matrix completion (GC-MC), a graph auto-encoder framework for recommendation.
- Naturally generalizes to include side information for both users and items.

###### Future work

- Extend the model to large-scale multi-modal data
- Combine with other CNN frameworks
- Attention mechanism
- Develop efficient approximate schemes such as subsampling local neighborhoods to address scalability.

## About the Author