Computational Approaches to Language in Psychology

I’ve been reading pretty widely across the social sciences recently, looking at instances of using computational approaches to text. It seems like this sort of stuff is really sweeping through the political science literature, and I’ve stumbled across quite a few papers in that discipline which I think do a really nice job of explaining the nuts and bolts of applying these techniques. Much of my understanding is thanks to their efforts.

Maybe it’s because I have such an easy time finding this kind of work in political science that the relative lack in psychology is so stunning to me. Not to say it doesn’t exist. There are a couple of groups who seem to have picked some of this up, but my impression is that we haven’t nearly reached the level of adoption that exists in other social science disciplines. What’s going on, psychologists? A few possibilities:

1. I’m wrong

Maybe there are a bunch of people doing this work and I just haven’t seen it. I do not read every paper published in the field, and my impressions of what my colleagues are doing are heavily biased toward what I see at talks in my department and at conferences. So, it’s possible that I’m missing the work somehow. But I think this is the least likely possibility.

2. The data is at a different scale

The unit of interest for a psychologist is the individual. Computational linguistics often (though not always!) requires larger datasets, as that’s what the tools have been developed on. It could be that psychologists find it difficult to collect or find the right kind of data for these kinds of tools to be useful. This seems a bit more plausible. Political scientists are fortunate enough to have some of their key data sources be the types of things which were digitized early and in large quantities (e.g. congressional and political speeches, bills, court proceedings, news media, etc.). The same has not been true for psychologists. Yes, there’s a lot of text out there, but how much of it is connected to some kind of interesting psychological metadata? Ten years ago, we certainly could have tried scraping some LiveJournal posts, but without knowing something more about the author (i.e. personal and situational characteristics), the questions we could ask of the data would have been pretty limited. The anonymity of the internet frequently prevented this. Of course, this isn’t true to the same extent today, but I would hesitate to say that the problem has disappeared completely.

3. LIWC’s success

LIWC has been such a monumental success as a tool with which psychologists could explore language that there hasn’t been any consideration that one could do something else with text data. This one resonates with me. I had collected a sizeable amount of text data in graduate school, and after running through LIWC and not getting very much, I more or less gave up on the idea of the text being of any interest. I mean, what else was I going to do? I ran it through the standard psychological text analysis software and got nothing.

It wasn’t until a grad student over in English Literature asked me if I knew of any software which would allow for examining the presence of a complicated rhyming scheme in a bunch of poems that I inadvertently stumbled across the field of computational linguistics. So I can certainly understand how this would be overlooked.

Notably, this is one reason why I feel strongly about paying attention to what’s happening outside your area of interest. Heck, even though the research questions that drive me are social psychological in origin, I almost always find more benefit in reading papers and attending talks that are decidedly not social psychological.

Written on November 14, 2014