Carl Meyer REU Projects

Summer 2014 REU Project

Text Mining: Interpreting Twitter Data

Student Participants

Daniel Godfrey, University of North Carolina at Charlotte

Caley Johns, Brigham Young University - Idaho

Carol Sadek, Wofford College

Advisors

Carl D. Meyer (Faculty Advisor, NC State, Mathematics)

Shaina Race (Faculty Advisor, NC State, Institute for Advanced Analytics)

Project Description

Given unorganized data, whether derived from text or from raw numerics, the objective is to learn and develop techniques for detecting, revealing, and analyzing hidden patterns and clusters of information that share some similarity or commonality. The size and diversity of the data sets of interest make this a formidable but extremely important problem.

The first part of the project will be to learn and understand how to use some state-of-the-art techniques by analyzing selected practical applications. Emphasis at the outset will be placed on text mining and community detection, although the content can eventually be directed by the interests of the participants. Programming will be integral as students implement existing methods and develop their own improvements.

The ultimate goal is to explore possibilities for developing new methodologies and algorithms that detect patterns and structure in unlabeled data, where no measure of error or accuracy can be placed on the final result.

The mathematics employed involves linear algebra, probability and statistics, networks and graphs, and some numerical analysis coupled with scientific computing principles.
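
As a small illustration of the kind of pattern detection described above, the following Python sketch groups short texts by the cosine similarity of their word-count vectors. It is not the method used in the project: the greedy grouping rule, the function names, the 0.3 threshold, and the sample tweets are all assumptions made up for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words counts for one document (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_threshold(docs, threshold=0.3):
    """Greedy grouping (an illustrative assumption, not the project's method):
    each document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster."""
    vectors = [term_vector(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical sample data, invented for this sketch.
tweets = [
    "great goal in the world cup match",
    "what a goal that match was great",
    "new phone battery lasts all day",
    "my phone battery died again today",
]
clusters = cluster_by_threshold(tweets)
print(clusters)
```

No labels are supplied, so the grouping depends entirely on the chosen similarity measure and threshold, which is the sense in which no error or accuracy value can be attached to the result.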

Articles and Poster Presentations

The results are explained in the following two articles.

A Case Study in Text Mining: Interpreting Twitter Data from World Cup Tweets

Interpreting Clusters of World Cup Tweets

A poster presentation was given at the Undergraduate Research Symposium, McKimmon Center, NC State University, August 2014.