[READNOTE]
Tracking the dynamics of co-word networks for emerging topic identification
💡 MetaData
| Title | Tracking the dynamics of co-word networks for emerging topic identification |
|---|---|
| Journal | Technological Forecasting and Social Change |
| Authors | Lu Huang; Xiang Chen; Xingxing Ni; Jiarun Liu; Xiaoli Cao; Changtian Wang |
| Pub. date | 2021-09-01 |
| DOI | 10.1016/j.techfore.2021.120944 |
| JINFO | CAS journal ranking (upgraded edition): Management, Tier 1; Impact factor: 10.88; 5-year impact factor: 10.403; EI: yes; SSCI: Q1; AJG: 3; FMS: B; JCI: 2.41 |
| **Abstract** | Identifying emerging topics has been an essential study for nations to develop strategic priorities, for enterprises to create business strategies, and for institutions to define research areas. However, how to characterize emerging topics effectively and comprehensively is still very challenging. This study proposes a framework for identifying emerging topics based on a dynamic co-word network analysis, which integrates a link prediction model with machine learning techniques. Time-sliced co-word networks are weighted according to the frequency of terms’ co-occurrence. A back-propagation neural network is used to forecast a future network by predicting linkages among unconnected nodes based on existing links. Four indicators are then used to sort out potential candidates of emerging topics in the predicted network. A case study on information science demonstrates the reliability of the proposed methodology, followed by subsequent empirical and expert validations. |
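📜 Research overview
Problem:
Identify emerging topics by analyzing dynamic co-word networks
Status quo:
- Why emerging topics are hard to identify: uncertainty, ambiguity, complexity
- Earlier studies of this kind did not account for how the network changes over time, so there is room to improve performance
Approach:
- Build time-sliced co-word networks for link prediction
- Use a BNN (back-propagation neural network) with local, path-based, and random-walk features to improve link prediction performance
- Quantitatively evaluate the predicted network with measures based on network topology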
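The first step of the approach, time-sliced co-word networks weighted by co-occurrence frequency, can be sketched as follows. This is a minimal illustration and not the paper's pipeline: the toy records, the slice boundaries, the non-overlapping slices, and the function name `build_coword_networks` are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def build_coword_networks(records, slices):
    """Build one weighted co-word network per time slice.

    records: iterable of (year, terms) pairs, one per paper.
    slices:  list of (start_year, end_year) pairs, inclusive (assumed non-overlapping).
    Returns a list of Counters mapping an unordered term pair to its
    co-occurrence count within that slice.
    """
    networks = [Counter() for _ in slices]
    for year, terms in records:
        for idx, (start, end) in enumerate(slices):
            if start <= year <= end:
                # Every unordered pair of distinct terms in a paper co-occurs once.
                for pair in combinations(sorted(set(terms)), 2):
                    networks[idx][pair] += 1
    return networks


# Hypothetical toy records, not data from the paper.
records = [
    (2009, ["co-word analysis", "bibliometrics"]),
    (2010, ["co-word analysis", "bibliometrics", "topic model"]),
    (2014, ["link prediction", "topic model"]),
]
slices = [(2009, 2013), (2014, 2018)]
networks = build_coword_networks(records, slices)

for (start, end), net in zip(slices, networks):
    print(start, end, dict(net))
```

Sorting the terms before pairing keeps each edge key in a canonical order, so `("a", "b")` and `("b", "a")` are counted as the same undirected edge.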
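📊 Research details
- Method:
  - Data source: WoS text (terms from titles and abstracts)
  - Keywords are extracted with the VantagePoint tool; clumping merges synonyms
  - Slice by time and build co-word networks weighted by co-occurrence frequency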
  - Feature construction:
    - Number of common neighbors
    - Local path index (also takes two-hop neighbors into account)
    - SimR (random-walk based)
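  - Model: BNN; binary classification; the network is split by time slice into three parts: $G_S^1$ and $G_S^2$ serve as the training and validation sets, while $G_S^3$ takes no part in training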
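  - Future network: the best model predicts the links of $G_S^3$; a threshold decides whether an edge exists, the predicted probability sets its weight, and the result is combined with all earlier $G$ to produce the future network $G'$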
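  - Emerging topic identification:
    - Community detection with Smart Local Moving
    - Compute four indicators; the first two are relatively explicit (Group A), the last two rely more on topological structure (Group B):
      - novelty (depends on how early or late the terms appear)
      - growth (depends on term frequency and its growth rate)
      - coherence (gap between intra-cluster and inter-cluster connectivity)
      - impact (PageRank)
    - Select emerging topics: fit regressions between the Group B and Group A indicators (four pairs in total); points above the regression line are treated as emerging topics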
  - Evaluation: performance is assessed on $G_S^1$, $G_S^2$, $G_S^3$, and $G'$
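Two of the link-prediction features listed in the method can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: it treats the network as unweighted, uses the commonly cited form of the local path index (paths of length 2 plus `epsilon` times paths of length 3), and the value `epsilon=0.01`, the toy edges, and all function names are hypothetical. SimR and the BNN classifier are left out.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])


def local_path_index(adj, u, v, epsilon=0.01):
    """Local path index: (#paths of length 2) + epsilon * (#paths of length 3)."""
    paths2 = len(adj[u] & adj[v])
    # Length-3 connections u - x - y - v: x neighbors u, y neighbors v, x and y are linked.
    paths3 = sum(1 for x in adj[u] for y in adj[v] if y in adj[x])
    return paths2 + epsilon * paths3


def candidate_features(adj, epsilon=0.01):
    """Features for every currently unconnected node pair (the prediction candidates)."""
    feats = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            feats[(u, v)] = (
                common_neighbors(adj, u, v),
                local_path_index(adj, u, v, epsilon),
            )
    return feats


# Hypothetical toy network, not data from the paper.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
features = candidate_features(adj)

for pair, (cn, lp) in features.items():
    print(pair, cn, lp)
```

In a full pipeline these per-pair values would form the input vector of the binary classifier, with the label taken from whether the edge appears in a later slice.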
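- Empirical study:
  - Data: 9540 papers from 9 source journals in the IS field, 2009-2018
  - Term extraction: 4640 distinct terms
  - The networks keep growing in size, and the number of topics in the IS field increases
  - Tuning, tuning, and more tuning: the best AUC is 0.965; 3525 new edges are generated for $G_S^3$, which yields the final $G'$
  - $G'$: 4640 terms, 49 topics; the 9 topics that lie above the regression line in every regression are selected as emerging topics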
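  - Evaluation:
    - Prediction performance is better than that of the alternatives compared
    - Experts scored emergence; the average score is 0.653
    - Case study of the papers behind the identified emerging topics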
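The selection rule described above (regress Group B against Group A, keep topics above the line) can be sketched with plain least squares. Several points here are assumptions, since the note does not spell them out: the Group B indicator is taken as the dependent variable, "above the line" means a strictly positive residual, a topic must be above the line in all four pairings, and the indicator values and function names are hypothetical.

```python
from itertools import product


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def topics_above_all_lines(topics,
                           group_a=("novelty", "growth"),
                           group_b=("coherence", "impact")):
    """Return the topics lying strictly above the fitted line in every A-B pairing."""
    names = list(topics)
    selected = set(names)
    for a, b in product(group_a, group_b):
        xs = [topics[t][a] for t in names]
        ys = [topics[t][b] for t in names]
        slope, intercept = fit_line(xs, ys)
        above = {t for t, x, y in zip(names, xs, ys) if y > slope * x + intercept}
        selected &= above
    return selected


# Hypothetical indicator values for four topics, not results from the paper.
topics = {
    "T1": {"novelty": 0, "growth": 0, "coherence": 1, "impact": 2},
    "T2": {"novelty": 1, "growth": 2, "coherence": 0, "impact": 0},
    "T3": {"novelty": 2, "growth": 1, "coherence": 3, "impact": 4},
    "T4": {"novelty": 3, "growth": 3, "coherence": 2, "impact": 2},
}
emerging = topics_above_all_lines(topics)
print(sorted(emerging))
```

In this toy data T1 sits above the line in the two novelty-based fits but below it in the growth-based ones, so requiring all four pairings filters it out and only T3 remains.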
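🚩 Main conclusions
- Proposes a dynamic co-word network analysis method for identifying emerging topics and demonstrates its feasibility and effectiveness empirically
📌 Innovations and takeaways
- The literature review is explicitly organized into separate points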
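🔬 Outlook and reflections
- "Link prediction can only identify newly formed links; it fails to identify links that disappear." This looks like a point worth researching
📜 Excerpts from the original
None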