流网络根据流距离嵌入空间

来自集智百科
跳转到: 导航搜索

我们的目的是根据流网络流距离平均首达时间矩阵L)矩阵,将整个网络嵌入到一个欧氏空间里面去,使得任意两点的欧氏距离尽量靠近他们的流距离.[1][2]

目录

原理解析

给定某流网络,假想网络中各节点两两之间用弹簧相互连接,嵌入要做的即调整弹簧,使其松弛长度等于流距离。初始时各节点在欧式空间的坐标完全随机,以此我们可以计算节点两两之间的欧式距离。如果两节点间的流距离大于相应的欧式距离,表示弹簧被压缩了,那么我们应该拉伸这两节点间的弹簧;反之,我们应该压缩该弹簧。具体拉伸或者压缩多少取决于弹簧系数k及流距离和欧式距离的偏差x,即形变量=k*x。我们用该形变量去更新相应的节点欧式位置。反复这一过程,直至收敛。(注:实际运用之,我们常常不考虑流网络的源和汇节点,即流距离矩阵中不需要含有源和汇的项)

嵌入程序(Matlab版本)

初始准备

此处程序没有考虑流矩阵中无穷大的情况

%输入变量流距离矩阵-flow_distance,节点编号-nodes_num,嵌入维数dim

dim=10;
max_distance=max(max(flow_distance));
max_x=max_distance;
max_y=max_x;

%为每个节点生成随机位置

initial_pos=[];
for i=1:length(nodes_num) 
    tmp=[];
    for j=1:dim
    tmp=[tmp max_x-2*max_x*rand(1)];
    end
initial_pos=[initial_pos;tmp];
end

%第一步优化,按照欧氏距离和流距离的差别大小受力

%开始运动

current_pos=initial_pos; 
iterations=1; 
complete_temperature_iterations = 0;
current_target= compute_distance(initial_pos,flow_distance );
basic_size=20;
target=[];
all_temperature=[];
probility=[];
 
while iterations< 200
    iterations
    %-------------产生新的坐标-------------%
    new_pos=[];
    distance_matrix=compute_distance_matrix(current_pos);
    k=1.0000e-0;
    for i=1:size(current_pos,1)
        %计算每一个节点受到的合力
        %计算节点到其他节点距离和流距离的差值
        diff_distance=(distance_matrix(i,:)-flow_distance(i,:))';
        force_direction=-1*(repmat(current_pos(i,:),size(current_pos,1),1)-current_pos);
        force_direction=force_direction./repmat(sqrt(sum(force_direction.^2,2)),1,size(current_pos,2)) ;
        force_direction(isnan(force_direction))=0;
 
        force_vector=diff_distance;
        final_force_vector=force_direction.*repmat(force_vector,1,size(current_pos,2));
        all_force=sum(final_force_vector,1)/length(diff_distance);
        new_pos=[new_pos;current_pos(i,:)+all_force*k];
    end
 
     %-------------比较新的位置与原来的位置-------------%
     current_distance=compute_distance(current_pos,flow_distance);
     new_distance=compute_distance(new_pos,flow_distance);
     diff = abs(current_distance - new_distance);
     target=[target;current_distance];
 
        if complete_temperature_iterations >= 10
            temperature = cooling_rate*temperature;
            complete_temperature_iterations = 0;
        end
       iterations = iterations + 1;
       complete_temperature_iterations = complete_temperature_iterations + 1;
       current_pos=new_pos;
 
end

%第二步优化,按照每个节点失真程度大小受力

%开始运动

initial_pos=new_pos; 
 
iterations=1; 
complete_temperature_iterations = 0;
 
current_target= compute_distance(initial_pos,flow_distance );
cooling_rate=0.9;
basic_size=20;
target=[];
all_temperature=[];
probility=[];
while iterations< 200
    iterations
 
    %-------------产生新的坐标-------------%
    new_pos=[];
 
    distance_matrix=compute_distance_matrix(current_pos);
    distortion=distance_matrix./flow_distance-1;
    distortion(isnan(distortion))=0;
    k=30.0000e-0;
    for i=1:size(current_pos,1)
        %计算每一个节点受到的合力
        %计算节点到其他节点距离和流距离的差值
        diff_distance=distortion(:,i);
        force_direction=-1*(repmat(current_pos(i,:),size(current_pos,1),1)-current_pos);
        force_direction=force_direction./repmat(sqrt(sum(force_direction.^2,2)),1,size(current_pos,2)) ;
        force_direction(isnan(force_direction))=0;
        force_vector=diff_distance;
        final_force_vector=force_direction.*repmat(force_vector,1,size(current_pos,2));
        all_force=sum(final_force_vector,1)/length(diff_distance);
        new_pos=[new_pos;current_pos(i,:)+all_force*k];
    end
 
     %-------------比较新的位置与原来的位置-------------%
 
     current_distance=compute_distance(current_pos,flow_distance);
     new_distance=compute_distance(new_pos,flow_distance);
     diff = abs(current_distance - new_distance);
     target=[target;current_distance];
 
 
 
     iterations = iterations + 1;
 
 
     all_temperature=[all_temperature;temperature];
end

%计算拟合效果

%横轴是流距离,纵轴是欧式距离

figure
tmp=bone_L;
tmp(logical(eye(size(bone_L,1))))=1;
tmp_L=flow_distance(find(tmp>0));
tmp_dis=distance_matrix(find(tmp>0));
plot(tmp_L(:),tmp_dis(:),'.')
hold on
plot([0; max([max(tmp_dis(:)) max(tmp_dis(:))])],[0;max([max(tmp_dis(:)) max(tmp_dis(:))])],'r-')

%平均失真程度

distortion=tmp_dis./tmp_L;
distortion(isnan(distortion))=0;
 
distortion(find(distortion<1&distortion>0))=1./distortion(find(distortion<1&distortion>0));
average_distortion=sum(sum(distortion))/length(find(distortion>0))

%中间计算距离的函数

%现在距离与流距离的总差值

function [ target_distance ] = compute_distance(new_pos,flow_distance )
%UNTITLED Summary of this function goes here
%Detailed explanation goes here
new_distance_matrix=compute_distance_matrix(new_pos);
target_distance=sum(sum((new_distance_matrix-flow_distance).^2));
end

%根据节点坐标,计算节点之间距离

function [ distance_matrix] = compute_distance_matrix( nodes )
%UNTITLED3 Summary of this function goes here
%   Detailed explanation goes here
distance_matrix=zeros(size(nodes,1),size(nodes,1));
for i=1:size(nodes,1)
    for j=i+1:size(nodes,1)
    distance_matrix(i,j)=sqrt((nodes(i,1)-nodes(j,1))^2+(nodes(i,2)-nodes(j,2))^2);
    distance_matrix(j,i)= distance_matrix(i,j);
    end
end
flow_distance=distance_matrix;
end

嵌入程序(python版本)

输入参数

def spring_algorithm(_c, _dim):
    pass

_c:去掉源和汇节点的流距离矩阵,且默认已经将流距离无穷大的节点做处理了

_dim:嵌入维数

初始化

# 最大小幅变化次数(可弹性调整)
max_count = 10
 
# 弹簧系数(可弹性调整,一般取在(0,1)之间,且k1>k2)
k1, k2 = [0.8, 0.5]
 
# 阈值(可弹性调整, 依据具体问题而定)
threshold1, threshold2 = [0.05, 0.01]
 
# 各节点初始位置(dim维)
initial_pos = np.random.rand(_c.shape[0], _dim)  # [0, 1)

Step1 按照欧氏距离和流距离的差别大小受力(快速调整)

# step1 按照欧氏距离和流距离的差别大小受力
total_pos_mat = np.mat(initial_pos)
 
t1 = 0
# 连续小幅度变化的次数
small_count1 = 0
# 与真实流距离的误差
error_list = []
# 上一次迭代的总距离误差
last_distance_sum_error = float("inf")
while True:
    print '距离差受力第%d次迭代' % t1
    for _i in range(_c.shape[0]):
        node_i_pos_mat = total_pos_mat[_i]
        repeat_node_i_pos_mat = np.matlib.repmat(node_i_pos_mat, _c.shape[0], 1)  # 重复len(_c)*1次
 
        ##### 节点至其它节点的欧式距离euclidean_distance #####
        distance_offset_mat = repeat_node_i_pos_mat - total_pos_mat  # 当前节点i与其它节点实际位置偏移矩阵
        dis_square = np.multiply(distance_offset_mat, distance_offset_mat)  # 对应元素相乘
        euclidean_distance = np.mat(np.sqrt(np.sum(dis_square, 1)))  # 列向量:当前节点到其余所有节点(包括自己)的欧式距离
 
        # 欧式距离和流距离的偏差行向量,即弹簧形变量x
        diff_distance = np.mat(_c[_i, :] - euclidean_distance.T)
 
        # 防止除以该向量中的0元素出现Nan
        modifier_euclidean_distance = np.ones((_c.shape[0], 1)) - np.sign(euclidean_distance) + euclidean_distance
 
        # 受力方向:单位向量 当前节点距其余节点的距离向量/|当前节点距其余节点的距离向量|
        direction = distance_offset_mat / modifier_euclidean_distance
 
        ##### 距离差受力矢量 F = k*x*direction #####
        force = np.multiply(diff_distance.T, direction) * k1
 
        ##### 位置调整 #####
        sum_force = np.sum(force, 0)
        total_pos_mat[_i] += np.divide(sum_force, _c.shape[0]).astype(np.float64)
 
        ##### 误差项 #####
        node_i_error = np.sum(np.multiply(diff_distance, diff_distance))
        error_list.append(node_i_error)
 
    sum_distance_error = sum(error_list)
    print sum_distance_error
    if abs(last_distance_sum_error - sum_distance_error) <= threshold1:
            small_count1 += 1
            if small_count1 >= max_count:
                break
 
    t1 += 1
    last_distance_sum_error = sum_distance_error
    error_list = []

Step2 按照每个节点失真程度大小受力(微调)

# step2 按照每个节点失真程度大小受力
t2 = 0
# 连续小幅度变化的次数
small_count2 = 0
# 失真度误差
distortion_list = []
# 上一次迭代的总失真度误差
last_sum_distortion_error = float("inf")
while True:
    print '失真度受力第%d次迭代' % t2
    for _i in range(_c.shape[0]):
        node_i_pos_mat = total_pos_mat[_i]
        repeat_node_i_pos_mat = np.matlib.repmat(node_i_pos_mat, _c.shape[0], 1)  # 重复len(_c)*1次
 
        ##### 节点至其它节点的欧式距离euclidean_distance #####
        distance_offset_mat = repeat_node_i_pos_mat - total_pos_mat  # 当前节点i与其它节点实际位置差距矩阵
        dis_square = np.multiply(distance_offset_mat, distance_offset_mat)  # 对应元素相乘
        euclidean_distance = np.mat(np.sqrt(np.sum(dis_square, 1)))  # 列向量:当前节点到其余所有节点(包括自己)的欧式距离
 
        # 防止除以该向量中的0元素出现Nan
        modifier_euclidean_distance = np.ones((_c.shape[0], 1)) - np.sign(euclidean_distance) + euclidean_distance
 
        # 失真度=max(_c[_i] / modifier_euclidean_distance, modifier_euclidean_distance / _c[_i])
        # 这里为方便,只采用 失真度=_c[_i] / modifier_euclidean_distance
        distortion_ratio = _c[_i, :] / modifier_euclidean_distance.T
        modifier_distortion = np.ones((1, _c.shape[0])) - np.sign(distortion_ratio) + distortion_ratio - 1
 
        # 受力方向:单位向量 当前节点距其余节点的距离向量/\当前节点距其余节点的距离向量\
        direction = distance_offset_mat / modifier_euclidean_distance
 
        ##### 失真度受力矢量 #####
        force = np.multiply(modifier_distortion.T, direction) * k2
 
        ##### 位置调整 #####
        sum_force = np.sum(force, 0)
        total_pos_mat[_i] += np.divide(sum_force, _c.shape[0]).astype(np.float64)
 
        ##### 误差项 #####
        node_i_error = np.sum(np.multiply((distortion_ratio - 1), (distortion_ratio - 1)))
        distortion_list.append(node_i_error)
 
    sum_distortion_error = sum(distortion_list)
    print sum_distortion_error
 
    if abs(last_sum_distortion_error - sum_distortion_error) <= threshold2:
            small_count2 += 1
            if small_count2 >= max_count:
                break
 
    t2 += 1
    last_sum_distortion_error = sum_distortion_error
    distortion_list = []
 
return total_pos_mat

测试样例

思路

随机生成n个dim维欧式空间中的点,计算各点间的欧式距离得到n*n维欧式距离矩阵,以此当作c矩阵输入到弹簧算法中,得到n个dim维嵌入向量。若dim>2,则降维到2维,可视化前后两种点坐标表示,比较

程序

n, dim = 10, 2
# n各点初始位置(2维)
initial_pos = np.random.rand(n, dim)  # [0, 1)
plt.subplot(121)
plt.title('original position')
plt.plot(initial_pos[:, 0], initial_pos[:, 1], 'ro')
for i, (_x, _y) in enumerate(zip(initial_pos[:, 0], initial_pos[:, 1])):
    plt.text(_x, _y, i)
 
# 各节点间的欧式距离
edistance_list = []
for i in range(n):
    distance_offset_vec = initial_pos[i] - initial_pos  # 当前节点i与所有节点(包括自己)欧式位置差距
    dis_square = np.multiply(distance_offset_vec, distance_offset_vec)  # 对应元素相乘
    edistance = np.sqrt(np.sum(dis_square, 1)).T.tolist()  # 行向量
    edistance_list.append(edistance)
edistance_matrix = np.mat(edistance_list)
 
# 嵌入(此处采用迭代500次作为终止条件,也可换成采用阈值,但是需要手动调整大小)
embed_vec = spring_algorithm(edistance_matrix, 2)
 
plt.subplot(122)
plt.title('embedding position')
plt.plot(embed_vec[:, 0], embed_vec[:, 1], 'bo')
for i, (_x, _y) in enumerate(zip(embed_vec[:, 0], embed_vec[:, 1])):
    plt.text(_x, _y, i)
 
plt.show()

结果

结果如下:

Figure.png

可以看到原始图和嵌入图非常相似,这证明弹簧算法的正确性。

注意:两幅图之间可能存在平移、旋转、对称线性变换关系,但是这属于正常现象。(要想完全吻合,需要加入摩擦系数)

补充:Deepwalk方式嵌入

请参考Word2Vec与流网络

参考文献

  1. Peiteng Shi, Xiaohan Huang (2015). "A Geometric Representation of Collective Attention Flows". PLOS ONE.
  2. Shavitt Y, Tankel T (2004). "Big-bang simulation for embedding network distances in euclidean space". IEEE/ACM Transactions on Networking (TON).
个人工具
名字空间
操作
导航
工具箱