Differences between revisions of "2020 Swarma-Kaifeng Study Camp" (2020集智凯风研读营)

From Swarma Wiki (集智百科)

Revision as of 09:55, 10 June 2020 (Wednesday)


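Theme: Artificial Intelligence for Complex Systems

The "Networks, Geometry and Machine Learning" study camp is the third installment of the "Artificial Intelligence Research for Complex Systems" series, organized by Swarma Club (集智俱乐部) and funded by the Kaifeng Foundation (凯风基金会). We plan to hold a five-day event in August 2019 devoted to reading and discussing frontier literature, covering complex networks, statistical physics, quantum physics, and machine learning. The aim is to draw new research inspiration from these frontier fields and to foster interaction and exchange among Swarma scientist members, thereby nurturing new scientific ideas.

Background

The Networks, Geometry and Machine Learning workshop aims to capture the trends and essence of current frontier physics and artificial intelligence. On the one hand, complex network models can be used to understand the origin of spacetime, and can also be used to build complex neural networks that simulate the thinking of the brain. On the other hand, networks are a further extension and generalization of geometry; we use a geometric approach to capture the simple principles behind networks, and machine learning is a necessary technique for doing so. A major goal of traditional social science research has been to imitate the natural sciences by quantifying social activities and their underlying principles mathematically. The recent trend in science of emphasizing relations rather than entities may strongly influence this blending of social and natural science and give rise to a new research paradigm. This paradigm emphasizes data analysis and the complex networks on which social activities rely; drawing on recent developments in artificial intelligence, it may well overturn previous ways of doing social science in the near future. The results of this workshop may therefore play an important role in social science research.

This camp extends the discussions of the first study camp, held in 2016. Topics include tensor networks, deep learning, and message-passing algorithms.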

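Participants

  • Jiang Zhang (张江), professor at the School of Systems Science, Beijing Normal University, founder of Swarma Club and of Swarma AI Campus (集智AI学园). Research interests include complex systems, graph networks, collective attention flows, flow networks, and allometric scaling laws.
  • Pan Zhang (张潘), associate researcher at the Institute of Theoretical Physics, Chinese Academy of Sciences, and Swarma scientist. His research area is statistical physics and complex systems; specifically, he applies theories from statistical physics, such as spin glass theory and replica symmetry breaking, to problems in applied mathematics, networks, and computer science. His interests are broad, spanning many aspects of physics, statistics, networks, and machine learning, and he keeps expanding his research areas.
  • Yi-Zhuang You (尤亦庄), assistant professor in the Department of Physics, University of California, San Diego, and Swarma scientist. His main research area is quantum many-body physics, with a focus on emergent and critical phenomena arising from collective behavior. He is also interested in information theory (especially quantum information), complex systems, and artificial intelligence.
  • Lingfei Wu (吴令飞), assistant professor at the School of Computing and Information, University of Pittsburgh, core member of Swarma Club and Swarma scientist. Research interests: the geometry of thinking and its applications to the optimal combination of human knowledge and skills, including science of science, science of skills, science of teams, and the future of work.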

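Basic information

  • Time: TBD
  • Venue: TBD

Schedule

TBD

Discussion topics

List here the papers that each teacher would like to present. For publicity purposes, please attach a brief introduction of 200-300 characters under each paper.

Suggested theme for this study camp: "Artificial Intelligence for Complex Systems"

Subtopics:

  • Automated modeling of complex systems
  • Causal inference methods
  • Computational sociology of skills, occupations, and the social division of labor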

...

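Automated modeling of complex systems

  • Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, et al.: Graph networks as learnable physics engines for inference and control, arXiv, 2018

This is a classic paper on using graph networks to learn and control the dynamics of many-body systems.

  • Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, et al.: Neural Relational Inference for Interacting Systems, arXiv:1802.04687, 2018.

This paper is the first to combine explicit learning of the network structure with learning of the system's dynamical rules.

  • Seungwoong Ha, Hawoong Jeong: Towards Automated Statistical Physics: Data-driven Modeling of Complex Systems with Deep Learning, arXiv, 2020

This paper applies the self-attention mechanism of the Transformer model from NLP to the automated modeling of many-body complex systems. It can learn both the dynamic network structure and the dynamics.

  • Danilo Jimenez Rezende, Shakir Mohamed: Variational Inference with Normalizing Flows, arXiv:1505.05770v6

This paper proposes a new method that makes gradient computation over probability density functions more convenient and faster for use in variational inference. It has become an almost indispensable method in dynamics learning.

  • Fan Yang, Ling Chen, Fan Zhou, Yusong Gao, Wei Cao: Relational State-Space Model for Stochastic Multi-Object Systems, arXiv:2001.04050v1

This paper proposes a state-space-based method for automatically learning models of stochastic many-body systems.

  • Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud: Neural Ordinary Differential Equations, arXiv:1806.07366v5

This paper is the first to propose solving ordinary differential equations differentiably using principles from optimal control. It makes deep networks continuous and treats them as dynamical systems, so that training a deep network also becomes a problem of solving ordinary differential equations.

  • Michael John Lingelbach, Damian Mrowca, Nick Haber, Li Fei-Fei, and Daniel L. K. Yamins: Towards Curiosity-Driven Learning of Physical Dynamics, "Bridging AI and Cognitive Science" (ICLR 2020)

This paper proposes an AI system in which the machine actively intervenes in a physical system in order to learn the system's rules more effectively.

  • Chengxi Zang and Fei Wang: Neural Dynamics on Complex Networks, AAAI 2020

Best paper at AAAI 2020. It combines Neural ODEs with graph networks to address general dynamics problems on complex networks, solving them using optimal control principles. The paper also recasts semi-supervised node classification as an optimal control problem, achieving significant results.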

The intrinsic geometry of Word2Vec

(Jointly edited by Yiling, hauchuang, lingfei)

Orders of associations (1st and 2nd orders)

Word2vec learns dual embeddings: each word has two vector representations, a term embedding and a context embedding. There are two training frameworks, CBOW (many-to-one) and Skip-gram with Negative Sampling (SGNS, one-to-many). In SGNS, the term and context embeddings correspond to the IN and OUT matrices, respectively; in CBOW the roles are reversed. The term-context product (T_iC_j) in SGNS implicitly models pointwise mutual information (PMI), that is, collocation between items (two items co-occur more often than expected by chance). Our assumption is that the term-term product (T_iT_j) models substitution between items (how similar or exchangeable two items are, which can be measured by the Jensen-Shannon divergence between the items' context distributions).

1. Nalisnick, E., Mitra, B., Craswell, N., & Caruana, R. (2016, April). Improving document ranking with dual word embeddings. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 83-84).

This paper shows how IN-IN and IN-OUT vector cosine similarities model substitutive and collocative word pairs, respectively.

2. Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems (pp. 2177-2185).

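This paper shows that SGNS implicitly factorizes a word-context matrix whose cells are the pointwise mutual information (PMI) of each pair, shifted by a global constant, so the term-context product (T_iC_j) implicitly models PMI.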

3. Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211-225.

This paper proposes that term-term cosine similarity models 2nd-order association and term-context cosine similarity models 1st-order association, and suggests that adding the two vectors to obtain a combined vector improves the performance of word2vec on certain NLP tasks.

4. Rapp, R. (2002, August). The computation of word associations: comparing syntagmatic and paradigmatic approaches. In Proceedings of the 19th international conference on Computational linguistics-Volume 1 (pp. 1-7). Association for Computational Linguistics.

This paper calls 1st- and 2nd-order associations "syntagmatic" and "paradigmatic" relations, respectively, following the terminology of Ferdinand de Saussure (a founder of modern linguistics). It also proposes measuring 1st-order association by co-occurrence and 2nd-order association by the similarity of context word distributions.
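The two orders of association discussed in this subsection can be illustrated with raw counts, before any embedding is trained. The following is a minimal sketch, not the method of any of the papers above: it assumes a symmetric co-occurrence window, measures 1st-order association by PMI, and measures 2nd-order association by the Jensen-Shannon divergence between context distributions. The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter

def cooccurrence(sentences, window=2):
    """Count (term, context) pairs inside a symmetric window (an assumption)."""
    pairs = Counter()
    for tokens in sentences:
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs[(term, tokens[j])] += 1
    return pairs

def pmi(pairs, term, context):
    """1st-order association: log P(t, c) / (P(t) P(c)); -inf if never seen together."""
    total = sum(pairs.values())
    joint = pairs[(term, context)]
    if joint == 0:
        return float("-inf")
    p_t = sum(n for (t, _), n in pairs.items() if t == term) / total
    p_c = sum(n for (_, c), n in pairs.items() if c == context) / total
    return math.log((joint / total) / (p_t * p_c))

def context_distribution(pairs, term):
    """Normalized distribution over the contexts observed with `term`."""
    counts = {c: n for (t, c), n in pairs.items() if t == term}
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def js_divergence(p, q):
    """2nd-order association: Jensen-Shannon divergence (natural log); 0 means identical contexts."""
    def kl(a, m):
        return sum(pa * math.log(pa / m[k]) for k, pa in a.items() if pa > 0)
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical toy corpus: "coffee" and "tea" never co-occur but share contexts.
sentences = [
    ["drink", "hot", "coffee"],
    ["drink", "hot", "tea"],
    ["drink", "cold", "coffee"],
    ["drink", "cold", "tea"],
]
pairs = cooccurrence(sentences)
coffee = context_distribution(pairs, "coffee")
tea = context_distribution(pairs, "tea")
hot = context_distribution(pairs, "hot")

print("PMI(hot, coffee) =", pmi(pairs, "hot", "coffee"))   # collocated pair
print("PMI(coffee, tea) =", pmi(pairs, "coffee", "tea"))   # never collocated
print("JS(coffee, tea)  =", js_divergence(coffee, tea))    # substitutable pair
print("JS(coffee, hot)  =", js_divergence(coffee, hot))
```

In this toy corpus "coffee" and "tea" are substitutes (identical context distributions) without ever being collocated, which is the distinction the term-term and term-context products are assumed to capture.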

Diffusion

Word2vec can be viewed as a diffusion model. This explains why it predicts the diffusion of collective attention in the search for scientific knowledge. Based on the duality between dynamics on networks (Newtonian) and the geometry of networks (Einsteinian), we can assume that every network diffusion model that uses PMI on edges as a "geo-distance" has a word2vec/representation learning/manifold learning version (in fact, Brockmann and Helbing define "effective distance" in a way similar to PMI).


Brockmann, D., & Helbing, D. (2013). The hidden geometry of complex, network-driven contagion phenomena. Science, 342(6164), 1337-1342.

This paper analyzed disease spread via the “effective distance” rather than geographical distance, wherein two locations that are connected by a strong link are effectively close. The approach was successfully applied to predict disease arrival times or disease sources.
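A minimal sketch of the effective distance may help here. It assumes the definition used by Brockmann and Helbing, in which a link from n to m has length 1 - log P(m|n), where P(m|n) is the fraction of the flux leaving n that goes to m, and the effective distance between two nodes is the length of the shortest path. The node names, the flux values, and the function names are hypothetical.

```python
import heapq
import math

def effective_lengths(flux):
    """Convert flux[n][m] (traffic from n to m) into link lengths 1 - log P(m|n)."""
    lengths = {}
    for n, out in flux.items():
        total = sum(out.values())
        lengths[n] = {m: 1.0 - math.log(f / total) for m, f in out.items() if f > 0}
    return lengths

def effective_distance(lengths, source):
    """Dijkstra shortest paths: effective distance from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist.get(n, math.inf):
            continue  # stale heap entry
        for m, w in lengths.get(n, {}).items():
            nd = d + w
            if nd < dist.get(m, math.inf):
                dist[m] = nd
                heapq.heappush(heap, (nd, m))
    return dist

# Hypothetical flux between three locations (or, by analogy, three ideas).
flux = {
    "A": {"B": 90, "C": 10},
    "B": {"C": 100},
    "C": {},
}
dist = effective_distance(effective_lengths(flux), "A")
for node, d in sorted(dist.items()):
    print(node, round(d, 3))
```

In this example the weak direct link from A to C is effectively longer than the two-step route through B, so C is reached through B, which is the sense in which strongly connected locations are "close".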

Tshitoyan, V., Dagdelen, J., Weston, L., Dunn, A., Rong, Z., Kononova, O., ... & Jain, A. (2019). Unsupervised word embeddings capture latent knowledge from materials science literature. Nature, 571(7763), 95-98.

This paper shows that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings without human labeling or supervision. The authors demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery.

The geodesic distance of ideas

We can treat an idea as a city in Brockmann's model; scientific exploration is then the migration of people between cities. Is the generation gap between scholars, the main traditional metric of scientific innovation and diffusion, the wrong perspective? Some scholars are contemporaries of others but do far more advanced research (e.g., the pioneers of quantum mechanics active in the 1920s). Some scholars live now but still cite and discuss old knowledge, out of step with the people around them. Granted, every theory has its own life cycle and active period. However, time alone cannot explain why scholars have repeatedly arrived at the same idea across time and disciplines, the "multiple discoveries" described by Merton. Perhaps the right way to study ideas is to construct a knowledge map (or network) in which each node is an idea and the distance between any two nodes is not time but the "effective distance". The number of scholars who travel from one idea to another, traced along the geodesic distance between ideas, represents the difficulty of that step of thinking. We can use word2vec to embed ideas, or we can take the article as the minimal unit of ideas, assuming that each article contains exactly one idea. Using the Wikipedia knowledge graph, we can identify the idea (or claim, a triple of head, relation, and tail) of a paper and cluster articles by claim. We can then trace a scholar's career on the idea map and measure the scholar's capacity for exploration.

Orthogonality

Sparse coding

\mathcal{L}_{\text{sc}} = \underbrace{||WH - X||_2^2}_{\text{reconstruction term}} + \underbrace{\lambda  ||H||_1}_{\text{sparsity term}}

Sparse coding (also known as sparse dictionary learning) is a representation learning method that aims to find a sparse representation of the input data as a linear combination of basic elements, together with those basic elements themselves. These elements are called atoms, and together they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may form an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than that of the signals being observed. These two properties lead to seemingly redundant atoms that allow multiple representations of the same signal, but they also improve the sparsity and flexibility of the representation.
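To make the loss above concrete, here is a minimal sketch of the coding step only: the dictionary is held fixed and the sparse code of a single signal is found by iterative soft-thresholding (ISTA). This is an illustrative choice of solver, not the algorithm of the papers cited in this section. It assumes that the columns of W are the atoms, that h is one column of H, and that x is the matching column of X; the dictionary, the signal, and the value of λ are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def residual(W, h, x):
    """Compute W h - x, with W stored as a list of rows."""
    return [sum(w * hj for w, hj in zip(row, h)) - xi for row, xi in zip(W, x)]

def objective(W, h, x, lam):
    """||W h - x||_2^2 + lam * ||h||_1 for a single signal x."""
    return sum(r * r for r in residual(W, h, x)) + lam * sum(abs(v) for v in h)

def ista(W, x, lam, steps=2000):
    """Minimize the objective over h with the dictionary W held fixed."""
    k = len(W[0])
    # The gradient of ||W h - x||^2 is 2 W^T (W h - x). Its Lipschitz constant
    # is 2 * sigma_max(W)^2, bounded above by 2 * ||W||_F^2 (used here).
    L = 2.0 * sum(w * w for row in W for w in row)
    h = [0.0] * k
    for _ in range(steps):
        r = residual(W, h, x)
        grad = [2.0 * sum(W[i][j] * r[i] for i in range(len(W))) for j in range(k)]
        h = [soft_threshold(hj - g / L, lam / L) for hj, g in zip(h, grad)]
    return h

# Hypothetical over-complete dictionary in 2-D: two axis atoms plus a diagonal atom.
s = 2 ** -0.5
W = [[1.0, 0.0, s],
     [0.0, 1.0, s]]
x = [1.0, 1.0]
lam = 0.1

h = ista(W, x, lam)
print("sparse code:", [round(v, 4) for v in h])
print("objective:  ", round(objective(W, h, x, lam), 4))
```

Because the dictionary is over-complete, x can be written either with the two axis atoms or with the diagonal atom alone; the l1 penalty favors the second, sparser representation. Learning the atoms as well would require alternating this step with an update of W, which the sketch does not attempt.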

Arora, Sanjeev, et al. "Linear algebraic structure of word senses, with applications to polysemy." Transactions of the Association for Computational Linguistics 6 (2018): 483-495. This paper shows that multiple word senses reside in linear superposition within the word embedding and that simple sparse coding can recover vectors that approximately capture the senses. A novel aspect of the technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense. Discourse atoms can be of independent interest and make the method potentially more useful. Empirical tests are used to verify and support the theory.

Honglak Lee et al., Efficient sparse coding algorithms.

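About the organizers

Swarma Club

Swarma Club (集智俱乐部), founded in 2003, is a community of explorers who pursue academic research and enjoy the pleasures of science, and is the earliest scientific community in China devoted to artificial intelligence and complex systems. It advocates interdisciplinary research and exchange in a spirit of equality, openness, and scientific empiricism, and strives to build a Chinese "institute without walls".


Kaifeng Foundation

The Kaifeng Foundation (凯风公益基金会) was established in March 2007 as a non-public-fundraising charitable foundation under the supervision of the Ministry of Civil Affairs. It is also among the first charitable foundations to be initiated by an enterprise, approved by the Ministry of Civil Affairs, and supervised by the Ministry as its superior authority. Working mainly through institutional partnerships, the foundation funds and rewards academic and philanthropic elites who have achieved important results and show strength and potential in academic research, policy research, education, and public-welfare practice, with the aim of improving public welfare, increasing public benefit, and spreading the idea of philanthropy. Its strategic partners include top comprehensive universities and research institutions in China and abroad, professional art academies, and first-class NGOs in China and abroad.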


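2020.05.29 Second discussion

Attendees: Jiang Zhang (张江), Yi-Zhuang You (尤亦庄), Pan Zhang (张潘), Lingfei Wu (吴令飞), Ting Wang (王婷)

Topic: Decide the main dates, format, and theme of this study camp.

Outcomes:

  • 1. The study camp will be held in August.
  • 2. Process: hold a reading group first to discuss the relevant topics; the specific format of the subsequent study camp and public events is not yet confirmed.
  • 3. Settle the theme of the study camp (along the lines of artificial intelligence for complex systems). Subtopics may include automated learning of dynamics, causal inference, NLP-related techniques, and so on; list them all first and consolidate the themes later.

Open issues:

  • 1. Given the teachers' needs for computing power and personnel, think about how to make better use of the study camp's funds to advance research and improve research efficiency.
  • 2. Teachers are asked to add their recommended papers to the wiki (the papers are highly specialized, so please write a brief introduction of about 200-300 characters to help with publicity preparation).



2020.05.21 First discussion

Attendees: Jiang Zhang (张江), Ting Wang (王婷), Qian Zhang (张倩), Peiyuan Liu (刘培源)

Topic: Discuss the main dates, participants, and format of this study camp.

Outcome: In light of the situation in China and abroad, some feasible measures were discussed; a time will be arranged with Pan Zhang, Jiang Zhang, Yi-Zhuang You, and Lingfei Wu to discuss the topics.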
