Text Processing and Sentiment Analysis

From Jizhi Wiki (集智百科)

Importing modules

    import nltk
    import sys
    import os
    import re

Defining functions

    #===revised from http://fjavieralba.com/basic-sentiment-analysis-with-python.html====
 
    text = """What can I say about this place. 
    The staff of the restaurant is nice and the eggplant is not bad. 
    Apart from that, very uninspired food, lack of atmosphere and too 
    expensive. I am a staunch vegetarian and was sorely dissapointed 
    with the veggie options on the menu. Will be the last time I visit, 
    I recommend others to avoid."""
 
 
    #1. Tokenize text
 
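    # Split the text into sentences, then each sentence into word tokens.
    # Requires NLTK's Punkt sentence tokenizer data to be downloaded first
    # (nltk.download('punkt'), or 'punkt_tab' on recent NLTK releases).
    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.tokenize.TreebankWordTokenizer().tokenize(s) for s in sentences]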
 
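    #2. text structure: (word, lemma, [tags])

    # Requires the NLTK part-of-speech tagger model to be downloaded.
    # No lemmatizer is applied here, so each word is stored as its own lemma.
    pos = [nltk.pos_tag(s) for s in tokenized_sentences]
    pos = [[(word, word, [postag]) for (word, postag) in s] for s in pos]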
 
 
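    #3. define dictionary

    # Keys are lowercase expressions; values are lists of sentiment tags.
    # 'dissapointed' is misspelled on purpose so that it matches the sample text.
    # Note: taggingScore looks up single tokens, so the multi-word key
    # 'recommend others to avoid' is never matched by it.
    dic = {'awesome': ['positive'],
           'superb': ['positive'],
           'nice': ['positive'],
           'cool': ['positive'],
           'bad': ['negative'],
           'uninspired': ['negative'],
           'expensive': ['negative'],
           'dissapointed': ['negative'],
           'recommend others to avoid': ['negative']}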
 
    def taggingScore(a):
        """Return (positive hits - negative hits) for one tokenized sentence."""
        n = 0
        for word in a:
            # Look up the lowercased token; unknown words carry no tags
            tags = dic.get(word.lower(), [])
            if 'positive' in tags:
                n += 1
            if 'negative' in tags:
                n -= 1
        return n
 
    # Sum the per-sentence scores; a negative total suggests a negative review.
    finalScore = 0
    for s in tokenized_sentences:
        finalScore += taggingScore(s)

    print(finalScore)
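
The scoring function above checks one token at a time, so a dictionary entry made of several words, such as 'recommend others to avoid', can never match. The following is a minimal sketch of one way to handle such entries, assuming a greedy longest-match over consecutive tokens. The names `score_with_phrases`, `demo_dic` and `max_len` are hypothetical and are not part of the original script or of NLTK.

```python
# Hypothetical helper: greedy longest-match scoring over a token list.
# demo_dic is a small stand-in for the dictionary defined in the script.
demo_dic = {
    'nice': ['positive'],
    'bad': ['negative'],
    'recommend others to avoid': ['negative'],
}


def score_with_phrases(tokens, lexicon, max_len=4):
    """Score tokens, allowing dictionary keys of up to max_len words."""
    tokens = [t.lower() for t in tokens]
    score = 0
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                tags = lexicon[phrase]
                if 'positive' in tags:
                    score += 1
                if 'negative' in tags:
                    score -= 1
                i += n          # skip past the matched phrase
                matched = True
                break
        if not matched:
            i += 1              # no entry starts here; move on
    return score


print(score_with_phrases(['I', 'recommend', 'others', 'to', 'avoid', '.'], demo_dic))
```

Trying longer phrases first keeps a multi-word entry from being shadowed by a shorter entry that begins with the same word.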
 
 
    #------word cloud--------------------
 
    from pytagcloud import create_tag_image, make_tags
    from pytagcloud.lang.counter import get_tag_counts
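
The script stops at the imports, so the word cloud itself is never built. A tag cloud is driven by word frequencies, and the counting step can be sketched with the standard library alone. This is an illustration under stated assumptions, not pytagcloud's own implementation: `word_counts`, `min_len` and `top` are hypothetical names, and the simple regex tokenizer is a stand-in for a real one.

```python
import re
from collections import Counter


def word_counts(text, min_len=3, top=10):
    """Return the `top` most frequent words of at least `min_len` letters."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return counts.most_common(top)


sample = "nice staff, nice place, bad food"
print(word_counts(sample))
```

The resulting (word, count) pairs are the kind of input a tag cloud renderer scales font sizes from.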