KnowledgeHierarchy
From 集智百科 (the Swarma Club wiki)
Research question (RQ): What is a good question?
Zhihu tag tree
Zhihu data scraping: https://github.com/7sDream/zhihu-oauth
https://www.zhihu.com/topic/19776749/organize/entire#anchor-children-topic
import pandas as pd
from flownetwork.flownetwork import flushPrint  # progress printer used in the last step

# Topic tree dump from https://github.com/Lynxmac/zhihu_topic_tree/
with open('/Users/datalab/bigdata/zhihu_topic_tree.txt', 'r', encoding='gb18030') as f:
    lines = f.readlines()

# Parse each line: the length of the prefix before the first '─' encodes the depth
df_list = []
for index, line in enumerate(lines):
    a = line.rstrip().split('─')
    hierarchy = len(a[0])
    if index > 312482:
        hierarchy -= 1
    sign = a[0][-1]
    b = a[-1].split('_', maxsplit=2)
    ids = b[0]
    name = '_'.join(b[1:])
    df_list.append([index, hierarchy, sign, ids, name, line])
df = pd.DataFrame(df_list, columns=['loc', 'hierarchy', 'sign', 'id', 'name', 'line'])

# clean the hierarchy variable: round each value down to the nearest number of the form 3k + 1
new_hierarchy = []
for i in df.hierarchy:
    if i % 3 == 1:
        new_hierarchy.append(i)
    elif i % 3 == 2:
        new_hierarchy.append(i - 1)
    elif i % 3 == 0:
        new_hierarchy.append(i - 2)
df['new_hierarchy'] = new_hierarchy
# integer division keeps the level an int (1, 2, 3, ...)
df['good_hierarchy'] = [(i - 1) // 3 + 1 for i in new_hierarchy]

# add missing id for level 1 topics
id_list = [(29855, 19778298, '「形而上」话题'),
           (178555, 19560891, '产业'),
           (190122, 19618774, '学科'),
           (223661, 19778287, '实体'),
           (312482, 19778317, '生活、艺术、文化与活动')]
for row, topic_id, topic_name in id_list:
    # use .loc instead of chained indexing so the assignment reaches df itself
    df.loc[row, 'id'] = topic_id
    df.loc[row, 'name'] = topic_name

# delete wrong ids (rows whose id cannot be parsed as an integer)
error_id_index = []
for k, i in enumerate(df.id):
    try:
        int(i)
    except ValueError:
        error_id_index.append(k)
print(len(error_id_index))
df = df.drop(error_id_index)
df['id'] = [int(i) for i in df.id]

# construct network
# it takes around 3 hours
# search for the nearest high level neighbor and link together
net = []
for i in df.index:
    if i % 100 == 0:
        flushPrint(i)
    ids = df.at[i, 'id']
    hierarchy = df.at[i, 'good_hierarchy']
    loc = df.at[i, 'loc']
    if hierarchy == 1:
        net.append(('root', ids))
    else:
        upper_hierarchy = hierarchy - 1
        upper_nodes = df[df['good_hierarchy'] == upper_hierarchy]
        # the last row above the current one at the parent level
        upper_node_loc = [j for j in upper_nodes['loc'] if (loc - j) > 0][-1]
        upper_node_id = df['id'][df['loc'] == upper_node_loc].iloc[0]
        net.append((int(upper_node_id), ids))
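The construction loop above filters the whole DataFrame once per row, which is likely why it takes hours. Since each node is linked to the most recent earlier row one level up, the same edges can probably be produced in a single pass by remembering the last node seen at each level. The following is a minimal sketch under that assumption; `build_edges` and the toy ids are hypothetical and not part of the original script.

```python
def build_edges(rows):
    """Link each node to the most recent earlier node one level up.

    rows: iterable of (node_id, level) pairs in file order, level 1 = top.
    """
    last_seen = {}  # level -> id of the latest node seen at that level
    edges = []
    for node_id, level in rows:
        if level == 1:
            edges.append(('root', node_id))
        else:
            parent = last_seen.get(level - 1)
            if parent is None:
                raise ValueError(f'no level-{level - 1} node before {node_id}')
            edges.append((parent, node_id))
        last_seen[level] = node_id
    return edges


# Toy input with made-up ids, listed depth-first like the topic tree file.
toy_rows = [(1, 1), (11, 2), (111, 3), (12, 2), (121, 3), (2, 1), (21, 2)]
toy_edges = build_edges(toy_rows)
print(toy_edges)
```

On the real data this would be called as `build_edges(zip(df['id'], df['good_hierarchy']))`, assuming the rows of `df` are still in file order.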
StackOverflow tag network
Stack Overflow uses tags to organize questions. The full list of tags is here: https://stackoverflow.com/tags
Given a tag, such as javascript, you can see the questions tagged with it: https://stackoverflow.com/questions/tagged/javascript
Note that Stack Overflow also shows the related tags for each tag, each with the number of questions that carry both tags. For example, the javascript tag is related to:
Related Tags
- jquery × 518122
- html × 317264
- css × 146448
- angularjs × 116430
- php × 111251
- node.js × 92095
- ajax × 88493
- json × 58117
- html5 × 51808
- reactjs × 51308
- arrays × 49362
- asp.net × 31550
- regex × 28362
- twitter-bootstrap × 24516
- angular × 24174
- c# × 23339
- forms × 22346
- google-chrome × 21292
- d3.js × 21232
- dom × 19442
- google-maps × 18658
- typescript × 18244
- java × 17724
- canvas × 17054
- express × 16103
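The related-tag counts can be read as weighted edges of a tag network centered on javascript. A minimal sketch using a handful of the counts listed above; the `related` dictionary layout and the function name `to_weighted_edges` are assumptions made for illustration, and no scraping or API call is shown.

```python
# A few of the related-tag counts listed above for the javascript tag.
related = {
    'javascript': {
        'jquery': 518122,
        'html': 317264,
        'css': 146448,
        'angularjs': 116430,
        'php': 111251,
    }
}


def to_weighted_edges(related):
    """Flatten {tag: {other_tag: count}} into (tag, other_tag, count) triples,
    sorted by count in descending order."""
    edges = [(tag, other, count)
             for tag, others in related.items()
             for other, count in others.items()]
    return sorted(edges, key=lambda e: e[2], reverse=True)


edges = to_weighted_edges(related)
total = sum(count for _, _, count in edges)
# Share of each neighbor within the listed counts only.
shares = {other: count / total for _, other, count in edges}
print(edges[0])
print(round(shares['jquery'], 3))
```

Repeating this for every tag would give a weighted tag network that can be compared with the Zhihu topic tree, which is a strict hierarchy rather than a co-occurrence graph.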
Quora
https://www.quora.com/topic/Computer-Science
https://github.com/tapaswenipathak/pyQTopic/blob/master/qtopic/pyqtopics.py
https://github.com/csu/quora-api