Gnn 1.4

pageRank

Graph as Matrix

Link Analysis Algorithms

we will cover the following link Analysis approaches to compute the importance of nodes in a graph

  • PageRank
  • Personalized PageRank(PPR)
  • Random Walk with Restarts

PageRamk:Matrix Formulation

  • Stochastic adjacency Matrix M

  • Rank vector r

Recall Eigenvector of A Matrix

The Stationary Distribution

PageRank : How to solve

  • Given a graph with n nodes,we use an iterative procedure
    • Assign each node an initial page rank -Repeat until convergence ()
  • Power Iteration Method -Given a web graph with N nodes,where the nodes are pages and edges are hyperlinks.
    • Power iteration:a simple iterative scheme
  • Problem
    • Some pages are dead ends
    • Spider traps
  • Solution to Spider Traps
    • Solution for spider traps: At each time step,the random surfer has two options
      • with prob.a ,follow a link at random
      • with prob.1-a, jump to a random page
      • Common values for a are in the range 0.8 to 0.9
    • Surfer will teleport out of spider trap within a few time steps
  • Why teleports Solve the problem Why are dead-ends and spider traps a problem and why do teleports solve the problem? -Spider-trap are not a problem ,but with traps PageRank scores are not what we want - Solution : Never get stuck in a spider trap by teleporting out of it in a finite number of steps.
    • Dead-ends are a problem
      • The matrix is not column stochastic so our initial assumptions are not met
      • Solution : Make matrix column stochastic by always teleporting when there is nowhere else to go
  • Solution : Random Teleports PageRank equation

The Google Matrix

Random Walk with Restarts and Personalized PageRank

  • PageRank:
    • Ranks nodes by “importance”
    • Teleports with uniform probability to any node in the network
  • Personalized PageRank:
    • Ranks proximity of nodes to the teleport nodes S
  • Proximity on graphs:
    • Q: What is most related item to Item Q?
    • Random Walks with Restars -Teleport back to the starting nodes: S = {Q}
  • Idea : Random Walks
    • Idea
      • Every node has some importance
      • Importance gets evenly split among all edges and pushed to the neighbors
    • Given a set of QUERY_NODES ,we simulate a random walk:
      • Make a step to a random neighbor and record the visit
      • With probability ALPHA, restart the walk at one of QUERY_NODES
      • The nodes with the highest visit count have highest proximity to the QUERY_NODES

Matrix Factorization and Node Embeddings

Embeddings & Matrix Factorization

  • Recall: encoder as an embedding lookup

  • Connection to Matrix Factorization

  • Random Walk-based Similarity