ImportError with gensim: cannot import name utils

If you encounter an error message similar to

gensim ImportError: cannot import name utils

then a likely reason is that there are conflicting copies of NumPy, SciPy, or utils on your system. This is common among users of Canopy. A quick fix is to uninstall all three:

pip uninstall numpy
pip uninstall scipy
pip uninstall utils

We then re-install gensim, which pulls NumPy and SciPy back in as dependencies:

pip install --upgrade gensim

Re-launch your analysis environment and gensim should import without problems. Hopefully it helps!
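If you want to see whether you really have conflicting copies before uninstalling anything, here is a minimal sketch that prints where Python would load each package from. It assumes Python 3 (importlib.util is not available in this form on 2.7), and the helper name module_origin is my own, not part of gensim or pip.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # A stray top-level "utils" or a NumPy/SciPy path outside your
    # main site-packages folder is a hint that copies are conflicting.
    for name in ("numpy", "scipy", "utils", "gensim"):
        print(name, "->", module_origin(name))
```

Run it from the same environment in which the import fails; paths pointing to different installations suggest that the uninstall and re-install steps above are worth trying.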


Emulate ggplot using plot(): logit regression

I find using ggplot for logistic regression confusing, and I’ve already spent quite some time coding my own template for logit regression in the past, so here’s a quick “fix” to make your simple plots produced by plot() emulate ggplot!

Why would one bother to do this?

Because whoever doesn’t think iris blue (#00BFC4) is pretty has a problem.

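The coral-ish color can be defined by col=rgb(248/255, 118/255, 109/255), and the iris blue can be defined by col=rgb(0, 191/255, 196/255). (The divisor is 255, not 256, because 255 is the largest value of an 8-bit color channel; rgb(0, 191, 196, maxColorValue=255) works too.)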
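As a quick sanity check on the arithmetic, here is a small Python sketch that turns a hex code into 8-bit channel values and then into the 0 to 1 scale that R's rgb() expects by default. The function names are made up for illustration, and the coral hex code #F8766D is simply what the channel values 248, 118, 109 spell out in hexadecimal.

```python
def hex_to_rgb(hex_color):
    """Convert a '#RRGGBB' string into a tuple of 0-255 integers."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def to_unit_scale(rgb):
    """Rescale 0-255 channel values to 0-1, so that 255 maps to exactly 1."""
    return tuple(round(channel / 255, 3) for channel in rgb)


if __name__ == "__main__":
    for code in ("#F8766D", "#00BFC4"):
        channels = hex_to_rgb(code)
        print(code, channels, to_unit_scale(channels))
```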

The figure below was generated by (1) first plotting the raw data of the response variable and its covariates while suppressing all labels (specifically, setting “ann” and “axes” to FALSE), and (2) then adding the logit plot on top of the original one using “par(new=TRUE)”.

Maybe it saves more time if I just spend ten minutes reading ggplot’s grammar while plotting logit regression…


Figure: Emulating ggplot using plot()


Text Processing in Python (1)

For a quick guide to installing Beautiful Soup, see here. In this post, I will briefly go over the code for processing text when we want to use text as data. The Stanford NLP Group has some fantastic resources and packages available here for processing texts.

Preliminary Step:

Launch IDLE and type:

from bs4 import BeautifulSoup

from urllib import urlopen  # Python 2.7; in Python 3 use: from urllib.request import urlopen

import os, re

TASK 1: Grabbing Basic Data from Wikipedia

soup = BeautifulSoup(urlopen('[type in your html]'))

bday = soup.find('span', {'class': 'bday'}).text

bplace = soup.find('span', {'class': 'birthplace'}).text
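To try the same lookups without hitting a live page, here is a minimal sketch that runs them on a tiny inline HTML string. The HTML fragment and its values are invented placeholders, the helper span_text is my own name, and I pass "html.parser" explicitly so Beautiful Soup does not have to guess a parser. The helper returns None instead of crashing when a page has no such span.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the spans used above; the values are placeholders.
SAMPLE_HTML = """
<table class="infobox">
  <tr><td><span class="bday">1900-01-01</span></td></tr>
  <tr><td><span class="birthplace">Example City</span></td></tr>
</table>
"""


def span_text(soup, css_class):
    """Return the text of the first <span> with the given class, or None if absent."""
    tag = soup.find("span", {"class": css_class})
    return tag.text.strip() if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

if __name__ == "__main__":
    print(span_text(soup, "bday"))
    print(span_text(soup, "birthplace"))
```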




TASK 2: Processing Texts from HTML

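soup = BeautifulSoup(urlopen('[type in your html]').read())

# I use Extension of Military and Economic Aid as the example

# take the first chunk of text inside the first <p> tag
data = soup.p.contents[0]

# lowercase everything
data1 = data.lower()

# replace every non-word character (punctuation, etc.) with a space
data2 = re.sub(r'\W', ' ', data1)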
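The lowercasing and substitution steps can be wrapped into one small function and tested on a plain string, with no scraping involved. This is only a sketch: the function name normalize and the final split into tokens are my additions, and the sample sentence is made up.

```python
import re


def normalize(text):
    """Lowercase the text, replace non-word characters with spaces, and split into tokens."""
    lowered = text.lower()
    spaced = re.sub(r"\W", " ", lowered)
    # split() with no argument also collapses the runs of spaces left by the substitution
    return spaced.split()


if __name__ == "__main__":
    print(normalize("Extension of Military & Economic Aid!"))
```

Note that \W keeps underscores and digits, since both count as word characters in Python's re module.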

Meaningful Practice:

Patrick Perry at NYU Stern has processed the raw text of the Federalist Papers, and the JSON data file is available here.

Install Beautiful Soup on Mac

This is my (very) first blog post! I am a soon-to-be Ph.D. student in the department of political science at UC San Diego. And I plan to blog about political science, text as data, and China frequently in the future.

This blog post is a quick guide for people who want to use Beautiful Soup, an awesome Python library, for their scraping projects. In the context of text analysis, Beautiful Soup saves a lot of time when you parse files such as HTML documents.

Here’s how you install Beautiful Soup on your Mac:

0. Use Python 2.7 (instead of 3)

1. Install Beautiful Soup

1.1 Download “beautifulsoup4-4.4.1.tar.gz” from here

1.2 Type Terminal in Spotlight to open Terminal

1.2.1 In Terminal, change the working directory to the unpacked download folder. For example, I type

$ cd /Users/Shane/Desktop/beautifulsoup4-4.4.1

1.2.2 In Terminal, type python setup.py install

1.3 Launch Python and type

from bs4 import BeautifulSoup
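If the import works, the installation succeeded. For a slightly more informative check, here is a small sketch that reports which version is installed. It assumes Python 3.8 or later, since importlib.metadata does not exist on the Python 2.7 used in this post, and the helper name installed_version is my own.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "beautifulsoup4" is the distribution name on PyPI; the import name is "bs4".
    print("beautifulsoup4:", installed_version("beautifulsoup4"))
```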