# Mobile Data Mining


The radius of gyration (or gyradius) of a body about an axis of rotation is the radial distance from the axis at which, if the whole mass of the body were concentrated at a single point, the body's moment of inertia about that axis would equal its moment of inertia with the actual mass distribution. It is denoted by $R_g$, so that $I = M R_g^2$, where $I$ is the moment of inertia and $M$ is the total mass.

Mathematically, the radius of gyration is the root-mean-square distance of the object's parts from either its center of mass or a given axis, depending on the application. In the axis case, it is the perpendicular distance from the equivalent point mass to the axis of rotation.
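For a set of point masses, the center-of-mass version is $R_g = \sqrt{\frac{1}{M}\sum_i m_i \lVert \mathbf{r}_i - \mathbf{R} \rVert^2}$. In mobility analysis the same quantity is commonly applied to a user's trajectory, treating every recorded location as a unit mass. The sketch below is a minimal illustration under those assumptions; the function name `radius_of_gyration` and the equal-weight default are hypothetical choices, and GPS coordinates would first need to be projected to planar units (e.g. km), which is not shown.

```python
import math


def radius_of_gyration(points, weights=None):
    """Mass-weighted RMS distance of points from their center of mass.

    points  : sequence of coordinate tuples, e.g. [(x1, y1), (x2, y2), ...]
    weights : optional masses m_i (hypothetical default: 1 for every point,
              i.e. each visited location counts equally)
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    dim = len(points[0])
    # center of mass R = (1/M) * sum_i m_i r_i
    center = [sum(w * p[k] for w, p in zip(weights, points)) / total
              for k in range(dim)]
    # mass-weighted mean squared distance from R
    msd = sum(
        w * sum((p[k] - center[k]) ** 2 for k in range(dim))
        for w, p in zip(weights, points)
    ) / total
    return math.sqrt(msd)
```

For example, the four corners $(\pm 1, \pm 1)$ of a square each lie at distance $\sqrt{2}$ from the origin, so their radius of gyration is $\sqrt{2}$.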

## Center of Mass

The center of mass is the unique point at the center of a distribution of mass in space that has the property that the weighted position vectors relative to this point sum to zero. In analogy to statistics, the center of mass is the mean location of a distribution of mass in space.

## Doc2Vec

### A system of particles

In the case of a system of particles $P_i$, $i = 1, \ldots, n$, each with mass $m_i$ and located in space at coordinates $\mathbf{r}_i$, the coordinates $\mathbf{R}$ of the center of mass satisfy the condition $\sum_{i=1}^n m_i(\mathbf{r}_i - \mathbf{R}) = 0.$

Solving this equation for $\mathbf{R}$ yields the formula $\mathbf{R} = \frac{1}{M} \sum_{i=1}^n m_i \mathbf{r}_i,$

where $M = \sum_{i=1}^n m_i$ is the total mass of all the particles.
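A minimal Python sketch of the formula for $\mathbf{R}$, plus a check that the defining condition $\sum_i m_i(\mathbf{r}_i - \mathbf{R}) = 0$ holds at the result. The helper names `center_of_mass` and `residual` are illustrative, not from any library.

```python
def center_of_mass(masses, positions):
    """R = (1/M) * sum_i m_i r_i for point masses m_i at positions r_i."""
    M = sum(masses)
    dim = len(positions[0])
    return tuple(sum(m * r[k] for m, r in zip(masses, positions)) / M
                 for k in range(dim))


def residual(masses, positions, R):
    """sum_i m_i (r_i - R); numerically zero when R is the center of mass."""
    dim = len(R)
    return tuple(sum(m * (r[k] - R[k]) for m, r in zip(masses, positions))
                 for k in range(dim))


masses = [1, 3]
positions = [(0, 0), (4, 0)]
R = center_of_mass(masses, positions)   # (3.0, 0.0)
print(R, residual(masses, positions, R))
```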

# Knowledge Space

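```python
from urllib.parse import urlparse  # Python 3; Python 2 used `from urlparse import urlparse`


# clean url: reduce a URL to its registered domain, or 'none' if that fails
def urlclean(url):
    try:
        host = urlparse(url).hostname
        # no host at all, or a raw IPv4 address (only digits and dots)
        if not host or host.replace('.', '').isdigit():
            return 'none'
        parts = host.split('.')
        if len(parts) < 2:
            return 'none'  # single-label host such as 'localhost'
        # keep three labels for the second-level suffix com.cn, e.g. 'sina.com.cn'
        if host.endswith('.com.cn'):
            return '.'.join(parts[-3:])
        return '.'.join(parts[-2:])
    except (AttributeError, TypeError, ValueError):
        return 'none'
```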

# Communication Network

## Node Centrality Analysis

Centrality measures computed for each node: degree, PageRank, and triangle count.

Prediction target: each user's `CONSUME_AMT`.
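The three node measures can be sketched in plain Python so the example has no dependencies; a graph library such as NetworkX provides equivalents. This assumes, hypothetically, that the communication network is given as a list of `(caller, callee)` pairs and treated as undirected. The per-node values could then serve as features for a model of `CONSUME_AMT`, which is not shown here.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (u, v) contact pairs; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree(adj):
    """Number of distinct contacts of each node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}


def triangle_count(adj):
    """Number of triangles each node belongs to."""
    tri = {}
    for n, nbrs in adj.items():
        # every triangle (n, m, k) is found once via m and once via k
        tri[n] = sum(len(nbrs & adj[m]) for m in nbrs) // 2
    return tri


def pagerank(adj, d=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration; each undirected edge is followed both ways.

    build_graph never creates isolated nodes, so every node has at least one
    neighbour and there are no dangling nodes to redistribute.
    """
    nodes = list(adj)
    N = len(nodes)
    pr = {n: 1.0 / N for n in nodes}
    for _ in range(max_iter):
        new = {n: (1.0 - d) / N for n in nodes}
        for n in nodes:
            share = d * pr[n] / len(adj[n])
            for m in adj[n]:
                new[m] += share
        diff = sum(abs(new[n] - pr[n]) for n in nodes)
        pr = new
        if diff < tol:
            break
    return pr
```

For a triangle `a`-`b`-`c` with an extra edge `c`-`d`, node `c` has degree 3, sits in one triangle, and receives the highest PageRank.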

# Mobility Network

## Jump Size

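Jump size: the distance between two consecutive recorded locations of a user.

Preferential return: the tendency of users to revisit locations they have visited before, favoring the locations they visit most often.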
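A minimal sketch of both quantities, assuming (hypothetically) that a user's trajectory is a list of `(lat, lon)` pairs in degrees and that visits are also available as hashable location IDs. Distances use the haversine formula on a spherical Earth of radius 6371 km, which is an approximation. Visit frequency against visit rank is one simple way to look at preferential return.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # clamp against rounding slightly above 1 before asin
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def jump_sizes(trajectory):
    """Distances between consecutive (lat, lon) records of one user."""
    return [haversine_km(a, b) for a, b in zip(trajectory, trajectory[1:])]


def visit_frequency_by_rank(location_ids):
    """Share of visits to the k-th most visited location, for k = 1, 2, ..."""
    counts = Counter(location_ids).most_common()
    total = len(location_ids)
    return [c / total for _, c in counts]
```

One degree of longitude on the equator gives a jump of about 111.2 km, and a user with visits `home, work, home, shop, home, work` has rank frequencies 1/2, 1/3, 1/6.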

## Predictability/Entropy
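Both scripts in this section compute a Lempel-Ziv estimate of the actual entropy of a sequence. Read off the final line of `lzentropy`, the returned quantity is

$$
S^{\text{est}} = \left( \frac{1}{n} \sum_{i=1}^{n} \Lambda_i \right)^{-1} \ln n,
$$

where $n$ is the sequence length and $\Lambda_i$ (the array `L` in `lzentropy`, and the increments added to `sum_gamma` in `actual_entropy`) is the length of the shortest substring starting at position $i$ that has not appeared earlier in the sequence. Because the natural logarithm is used, the result is in nats.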

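```matlab
function E = lzentropy(rd)
% Lempel-Ziv estimate of the actual entropy (in nats) of the sequence rd.
% L(i) is the length of the shortest substring starting at i that does not
% also start at any earlier position j < i.
n = length(rd);
L = zeros(1, n);
L(1) = 1;                       % the first symbol is always new

for i = 2:n
    sub = rd(i);
    match = rd(1:i-1) == sub;

    if all(match == 0)          % symbol rd(i) has never been seen
        L(i) = 1;
    else
        k = 1;
        while k < i
            if i + k > n        % ran off the end of the sequence
                L(i) = 0;
                break
            end

            sub = rd(i:i+k);    % candidate substring of length k+1

            % look for sub starting at any earlier position j
            for j = 1:i-1
                match = rd(j:j+length(sub)-1) == sub;
                if all(match == 1)
                    break;
                end
            end

            L(i) = length(sub);
            if ~all(match == 1) % sub is new: stop growing it
                k = i;
            end
            k = k + 1;
        end
    end
end

E = 1 / (1/n * sum(L)) * log(n);   % (mean of L)^-1 * ln(n)

end
```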

Python Script

```python
import math
import time


def contains(small, big):
    """Return True if list `small` occurs as a contiguous run inside list `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def actual_entropy(l, verbose=False):
    """Lempel-Ziv estimate of the actual entropy (in nats) of sequence `l`.

    For each position i, gamma_i is the length of the shortest substring
    starting at i that does not appear in the prefix l[:i]. If every substring
    starting at i already appears (the search runs off the end), gamma_i is 0.
    """
    l = list(l)
    n = len(l)
    sum_gamma = 1  # position 0: the first symbol is always new (gamma_0 = 1)

    starttime = time.time()
    for i in range(1, n):
        if verbose and i % 1000 == 0:
            # progress: index reached and seconds spent on the last 1000 steps
            print(i, time.time() - starttime)
            starttime = time.time()

        prefix = l[:i]
        for j in range(i + 1, n + 1):
            s = l[i:j]
            if not contains(s, prefix):
                sum_gamma += len(s)
                break

    return math.log(n) / (sum_gamma / n)
```
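The heading also mentions predictability. A common way to turn an entropy estimate $S$ (in bits) into an upper bound $\Pi^{\max}$ on how often a user's next location can be predicted is to solve Fano's inequality with equality, $S = H(\Pi^{\max}) + (1 - \Pi^{\max}) \log_2(N - 1)$, where $N$ is the number of distinct locations and $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$. The sketch below solves it by bisection; the function names are hypothetical, and since both scripts above return nats, it assumes the estimate is first converted to bits.

```python
import math


def nats_to_bits(s_nats):
    """Convert an entropy from nats (natural log) to bits (log base 2)."""
    return s_nats / math.log(2)


def fano_rhs(p, n_locations):
    """H(p) + (1 - p) * log2(N - 1), in bits, for N >= 2 locations."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0:
            h -= q * math.log2(q)
    return h + (1.0 - p) * math.log2(n_locations - 1)


def max_predictability(entropy_bits, n_locations, tol=1e-12):
    """Solve entropy_bits = fano_rhs(p, N) for p in [1/N, 1] by bisection."""
    if n_locations < 2 or entropy_bits <= 0:
        return 1.0
    if entropy_bits >= math.log2(n_locations):
        return 1.0 / n_locations
    # fano_rhs falls monotonically from log2(N) at p = 1/N to 0 at p = 1
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_rhs(mid, n_locations) > entropy_bits:
            lo = mid  # bound not reached yet: the solution lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2
```

For a location sequence `seq`, one might call `max_predictability(nats_to_bits(actual_entropy(seq)), len(set(seq)))`. Bisection is used because the right-hand side is monotone on $[1/N, 1]$, so the root is unique there.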