Each chunkserver stores many chunks; each file is split into many chunks
Read: client => master => namespace lookup => master returns the chunkserver locations to the client
Chunkserver: provides data to the client; communicates with the master through heartbeats
System architecture: master + workers
MapReduce: 6 steps
input
split (one piece per worker)
map: Map(key, value)
shuffle (sort or hash-partition across workers, group by key)
reduce: Reduce(key, values)
output
Q. How to use MapReduce to count?
Q. How to use MapReduce to build an inverted index?
Q. How to use MapReduce to group anagrams?
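A minimal single-process sketch of the three questions above. It is not a real MapReduce cluster: `map_reduce` is a hypothetical helper that runs map, shuffle (group by key), and reduce in memory, and the sample documents and words are made up for illustration. "Count" is assumed to mean word count.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run map -> shuffle -> reduce in one process (illustrative only)."""
    groups = defaultdict(list)
    for record in inputs:
        # Map: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: group values by key. A real cluster would hash or
            # sort keys so that each key lands on one reduce worker.
            groups[key].append(value)
    # Reduce: one call per key, visiting keys in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Word count: emit (word, 1), then sum the ones.
def count_map(doc):
    _doc_id, text = doc
    for word in text.split():
        yield word, 1


def count_reduce(_word, ones):
    return sum(ones)


# Inverted index: emit (word, doc_id), then collect the doc ids per word.
def index_map(doc):
    doc_id, text = doc
    for word in set(text.split()):
        yield word, doc_id


def index_reduce(_word, doc_ids):
    return sorted(doc_ids)


# Anagrams: the key is the word's sorted letters, so anagrams share a key.
def anagram_map(word):
    yield "".join(sorted(word)), word


def anagram_reduce(_signature, words):
    return sorted(words)


docs = [(1, "a b a"), (2, "b c")]
word_counts = map_reduce(docs, count_map, count_reduce)
inverted_index = map_reduce(docs, index_map, index_reduce)
anagram_groups = map_reduce(
    ["eat", "tea", "tan", "ate", "nat", "bat"], anagram_map, anagram_reduce
)
```

In all three cases the only thing that changes is the key chosen in the map step; the shuffle does the grouping.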
Evolve
Network
storage
Token bucket: generate tokens at a fixed rate and put them in a bucket until it is full. When a request comes in, it takes a token from the bucket; if the bucket is empty, the request is rejected.
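The scheme above is usually called a token bucket. A rough sketch follows, assuming tokens are refilled lazily on each request instead of by a background timer, and that the bucket starts full. The class name `TokenBucket` and the parameters `rate`, `capacity`, and `now` are illustrative choices, not from the notes.

```python
import time


class TokenBucket:
    """Allow a request only if a token is available."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum tokens the bucket can hold
        self.tokens = capacity    # assumption: the bucket starts full
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        # `now` can be injected for testing; by default use a monotonic clock.
        now = time.monotonic() if now is None else now
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend one token per request if one is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1, capacity=2, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(3)]  # third request is rejected
after_one_second = bucket.allow(now=1.0)           # one token has been refilled
```

The capacity bounds the burst size, and the rate bounds the long-run average.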
Network
Trie: lookup time O(length(word) + k)
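A small sketch of prefix lookup in a trie, assuming k is the number of suggestions to return. Walking the prefix costs O(length(prefix)); the collection step below is a plain depth-first search, so it only approximates the O(length(word) + k) bound (it can pass through nodes that are not complete words). The names `Trie`, `insert`, and `starts_with` are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix, k):
        """Return up to k stored words with the given prefix, in sorted order."""
        # Step 1: walk down the prefix, one node per character.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Step 2: depth-first search below the prefix node, stopping at k words.
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < k:
            current, word = stack.pop()
            if current.is_word:
                results.append(word)
            # Push children in reverse order so the smallest is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], word + ch))
        return results


trie = Trie()
for w in ["car", "card", "care", "cat", "dog"]:
    trie.insert(w)
suggestions = trie.starts_with("ca", 3)
```

A common refinement is to cache the top-k completions at each node so lookup does not need the search step at all.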
SQL:
SELECT name FROM table WHERE name LIKE '${query}%' ORDER BY name
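A hedged sketch of the same prefix query run through Python's built-in `sqlite3` module. The table name `names` and the sample rows are made up. The query value is passed as a bound parameter instead of being spliced into the SQL string, which avoids SQL injection; note that `%` or `_` inside the user's query would still act as LIKE wildcards.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real name store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany(
    "INSERT INTO names (name) VALUES (?)",
    [("apple",), ("apply",), ("banana",), ("app",)],
)


def suggest(conn, query):
    """Return all names that start with `query`, in sorted order."""
    rows = conn.execute(
        "SELECT name FROM names WHERE name LIKE ? ORDER BY name",
        (query + "%",),  # bound parameter: the prefix followed by a wildcard
    )
    return [row[0] for row in rows]


matches = suggest(conn, "app")
```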
Browser -> Local cache
    |
Aggregator (Dispatcher) --> Cache
    /              \
Personal Ranking    Global Ranking
Fanout + social graph
Push: Timeline lists. Each user's timeline list holds all the posts from the users they follow.
Pull: Feed lists. Each person writes to their own feed list; followers then pull from the feed lists of the users they follow.
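A toy comparison of the two models above, assuming posts are `(timestamp, author, text)` tuples held in memory. `PushFeed` and `PullFeed` are hypothetical class names: push does the work at write time (fan-out to every follower's timeline list), pull does it at read time (merge the feed lists of everyone the reader follows).

```python
from collections import defaultdict


class PushFeed:
    """Fan-out on write: a post is copied into each follower's timeline list."""

    def __init__(self):
        self.followers = defaultdict(set)   # user -> users who follow them
        self.timelines = defaultdict(list)  # user -> posts delivered to them

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def post(self, author, timestamp, text):
        # Write cost grows with the number of followers.
        for follower in self.followers[author]:
            self.timelines[follower].append((timestamp, author, text))

    def timeline(self, user):
        # Read is cheap: the list is already built. Newest first.
        return sorted(self.timelines[user], reverse=True)


class PullFeed:
    """Fan-out on read: each author writes only to their own feed list."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> users they follow
        self.feeds = defaultdict(list)     # author -> their own posts

    def follow(self, follower, followee):
        self.following[follower].add(followee)

    def post(self, author, timestamp, text):
        # Write is cheap: one append.
        self.feeds[author].append((timestamp, author, text))

    def timeline(self, user):
        # Read cost grows with the number of followed users.
        posts = [p for followee in self.following[user] for p in self.feeds[followee]]
        return sorted(posts, reverse=True)


timelines = {}
for model in (PushFeed(), PullFeed()):
    model.follow("bob", "alice")
    model.follow("bob", "carol")
    model.post("alice", 1, "hi")
    model.post("carol", 2, "yo")
    timelines[type(model).__name__] = model.timeline("bob")
```

One difference this sketch makes visible: in the push model a post is only delivered to followers who exist at write time, while the pull model always reflects the current follow list.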
DAU
Only consider active users
Timeline Builder
Build the timeline for a UserID: merge the followed users' feeds to generate the timeline
GET(userIDList)
GET(userIDList, k)  # k: top k users, lazy load
GET(userIDList, k, endTime, beginTime)
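A sketch of how the three GET variants above could share one function. Several assumptions are made here: `FEEDS` is a hypothetical in-memory store of per-user feed lists sorted newest first, posts are `(timestamp, author, text)` tuples, and k is interpreted as the number of posts returned per call (a page size for lazy loading), which may differ from what the notes intend by "top k users".

```python
import heapq
from itertools import islice

# Hypothetical feed store: user id -> that user's posts, newest first.
FEEDS = {
    "alice": [(30, "alice", "c"), (10, "alice", "a")],
    "carol": [(20, "carol", "b"), (5, "carol", "z")],
}


def get_timeline(user_id_list, k=None, end_time=None, begin_time=None):
    """Merge the feeds of user_id_list into one newest-first timeline."""
    feeds = (FEEDS.get(uid, []) for uid in user_id_list)
    # heapq.merge streams already-sorted inputs lazily; reverse=True expects
    # each input to be sorted from largest to smallest.
    merged = heapq.merge(*feeds, reverse=True)
    # Keep only posts inside the optional [begin_time, end_time] window.
    in_window = (
        post
        for post in merged
        if (end_time is None or post[0] <= end_time)
        and (begin_time is None or post[0] >= begin_time)
    )
    # Stop after k posts so later pages are not computed until requested.
    return list(in_window if k is None else islice(in_window, k))


first_page = get_timeline(["alice", "carol"], k=2)
# Next page: ask for posts older than the last one already shown.
next_page = get_timeline(["alice", "carol"], k=2, end_time=first_page[-1][0] - 1)
```

Passing the timestamp of the last post seen as `end_time` is one way to page through the timeline without recomputing earlier pages.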