CC BY 4.0 (除特别声明或转载文章外)
如果这篇博客帮助到你,可以请我喝一杯咖啡~
pageRank
Graph as Matrix
Link Analysis Algorithms
we will cover the following link Analysis approaches to compute the importance of nodes in a graph
- PageRank
- Personalized PageRank(PPR)
- Random Walk with Restarts
Links as Votes
PageRamk:Matrix Formulation
-
Stochastic adjacency Matrix M
-
Rank vector r
Recall Eigenvector of A Matrix
The Stationary Distribution
PageRank : How to solve
- Given a graph with n nodes,we use an iterative procedure
- Assign each node an initial page rank -Repeat until convergence ()
- Power Iteration Method
-Given a web graph with N nodes,where the nodes are pages and edges are hyperlinks.
- Power iteration:a simple iterative scheme
- Problem
- Some pages are dead ends
- Spider traps
- Solution to Spider Traps
- Solution for spider traps: At each time step,the random surfer has two options
- with prob.a ,follow a link at random
- with prob.1-a, jump to a random page
- Common values for a are in the range 0.8 to 0.9
- Surfer will teleport out of spider trap within a few time steps
- Solution for spider traps: At each time step,the random surfer has two options
- Why teleports Solve the problem
Why are dead-ends and spider traps a problem and why do teleports solve the problem?
-Spider-trap are not a problem ,but with traps PageRank scores are not what we want
- Solution : Never get stuck in a spider trap by teleporting out of it in a finite number of steps.
- Dead-ends are a problem
- The matrix is not column stochastic so our initial assumptions are not met
- Solution : Make matrix column stochastic by always teleporting when there is nowhere else to go
- Dead-ends are a problem
- Solution : Random Teleports PageRank equation
The Google Matrix
Random Walk with Restarts and Personalized PageRank
- PageRank:
- Ranks nodes by “importance”
- Teleports with uniform probability to any node in the network
- Personalized PageRank:
- Ranks proximity of nodes to the teleport nodes S
- Proximity on graphs:
- Q: What is most related item to Item Q?
- Random Walks with Restars -Teleport back to the starting nodes: S = {Q}
- Idea : Random Walks
- Idea
- Every node has some importance
- Importance gets evenly split among all edges and pushed to the neighbors
- Given a set of QUERY_NODES ,we simulate a random walk:
- Make a step to a random neighbor and record the visit
- With probability ALPHA, restart the walk at one of QUERY_NODES
- The nodes with the highest visit count have highest proximity to the QUERY_NODES
- Idea
Matrix Factorization and Node Embeddings
Embeddings & Matrix Factorization
-
Recall: encoder as an embedding lookup
-
Connection to Matrix Factorization
-
Random Walk-based Similarity