A practical guide to text mining with topic extraction – Karl, Wisnowski, & Rushing

Whether or not you code in R, this is a great paper. It breaks down the processes and ideas that you’ll use in topic modeling and data pre-processing into clear steps, with just enough maths that you’ll be wishing you took it for A-level!

Great paper, read it.

Karl, A., Wisnowski, J., & Rushing, W. H. (2015). A practical guide to text mining with topic extraction. Wiley Interdisciplinary Reviews: Computational Statistics, 7(5), 326-340.

Automatically profiling the author of an anonymous text – Argamon, Shlomo, et al.

Always wanted to bedazzle your friends with personality insights from their writing but didn’t have the money for a Watson API? Were you, like me, hoping for some insights on how to use corpora for sociological research? Well, buckle up, there’s a paper here for you!

Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the author of an anonymous text. Communications of the ACM, 52(2), 119-123.

Masculinities in Cyberspace – Schmitz & Kazyak – Part 2 (Methodology)

Following on from last week’s methodology, the time has come to talk about findings! I’m really excited about these findings, as the authors attempt to bucket the methods and strategies that Men’s Rights Activists use online, both to create their own identities and to convince others by progressing their arguments. Their research questions were:

  • “are the sampled MRA groups antithetical to feminism and the goals of gender equality?”
  • “what strategies do online MRA groups utilize to delegitimize feminism and the goals of gender equality?”

Let’s dive back in!

Schmitz, R. M., & Kazyak, E. (2016). Masculinities in Cyberspace: An Analysis of Portrayals of Manhood in Men’s Rights Activist Websites. Social Sciences, 5(2), 18.

Comparing Corpora using Frequency Profiling – Rayson & Garside

If you want to learn a technique, it’s a good idea to check its source in the first place. Whilst Rayson and Garside didn’t invent the technique, they perfected it! In the last post I explained how I implemented their work; this post is all about the ins and outs of their paper, which has been cited a huge 492 times!

Rayson, P., & Garside, R. (2000, October). Comparing corpora using frequency profiling. In Proceedings of the workshop on Comparing Corpora (pp. 1-6). Association for Computational Linguistics.
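
For a flavour of what frequency profiling involves, here is a minimal sketch of the log-likelihood score as it is usually stated for comparing one word's frequency across two corpora. It is not the code from my earlier post; the function name and the example counts are made up for illustration.

```python
import math

def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Log-likelihood score for one word across two corpora.

    freq_a / freq_b: how often the word occurs in corpus A / corpus B.
    total_a / total_b: total number of words in corpus A / corpus B.
    """
    combined = freq_a + freq_b
    grand_total = float(total_a + total_b)
    # Expected counts if the word were equally common in both corpora.
    expected_a = total_a * combined / grand_total
    expected_b = total_b * combined / grand_total
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Hypothetical counts: a word seen 20 times in one 1,000-word corpus
# and 5 times in another 1,000-word corpus.
print(log_likelihood(20, 1000, 5, 1000))
```

The higher the score, the bigger the difference in relative frequency between the two corpora; a score above roughly 3.84 corresponds to p < 0.05 on a chi-squared distribution with one degree of freedom.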

Masculinities in Cyberspace – Schmitz & Kazyak

You know me, I’m fascinated by masculinities online and when I came across this citation I just couldn’t resist! I’m usually a stickler for methodology in gender research but this paper really got me thinking. I’ll admit it’s not my perfect cup of tea…

But it’s pretty close!

Schmitz, R. M., & Kazyak, E. (2016). Masculinities in Cyberspace: An Analysis of Portrayals of Manhood in Men’s Rights Activist Websites. Social Sciences, 5(2), 18.

The Mann-Whitney-Wilcoxon U-Test For Corpus Linguistics (Python)

I’m currently working on the counter-analysis of the hypothesis proposed in this paper I read recently, and I thought I might share how I’m making the data do my bidding.

All cards on the table: I’m using Python 2.7 on a laptop with an i7, working on a corpus of 14,000 tweets pulled from a set of seed keywords that are linked to AAVE, plus a comparison corpus that is based on general Twitter usage.

Good? Good! I’ll start with the process, then cover some of the theory of why you’d use the Mann-Whitney-Wilcoxon test, why it works in my case, and then finally how it works!
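
As a taster, here is a minimal pure-Python sketch of the U statistic itself, assuming each sample is a list of per-document frequencies for one word. It is not the code from the full post; the function name and the sample values are made up for illustration, and it only computes U, not a p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using average ranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a
    return u_a, u_b

# Hypothetical per-document counts of one word in two small corpora.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

Because the test works on ranks rather than raw counts, a handful of documents that use a word very heavily can't dominate the result, which is a large part of its appeal for corpus work.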

Information Retrieval – Tzoukermann, Klavans, and Strzalkowski

How can we query a large database and get the most relevant text documents? What methodology displays the best results and what does this tell us about the nature of our language and our existing methodologies of research? Tell me honestly that none of those questions grabs your interest and I’ll call you a liar!

Tzoukermann, E., Klavans, J. L., & Strzalkowski, T. (2003). Information retrieval. In R. Mitkov (Ed.), The Oxford Handbook of Computational Linguistics. Oxford University Press.
