Summer 2014 REU Project

      Text Mining: Interpreting Twitter Data

    Student Participants
    • Daniel Godfrey, University of North Carolina at Charlotte
    • Caley Johns, Brigham Young University - Idaho
    • Carol Sadek, Wofford College


    Project Description
    • Given unorganized data, whether derived from text or simply raw numerics, the objective is to learn and develop techniques for detecting, revealing, and analyzing hidden patterns and clusters of information that share some similarity or commonality. The size and diversity of the data sets of interest make this a formidable but extremely important problem.
    • The first part of the project will be to learn and understand how to use some state-of-the-art techniques by analyzing selected practical applications. Emphasis at the outset will be placed on text mining and community detection, although the content can eventually be directed by the interests of the participants. Programming will be integral as students implement existing methods and develop their own improvements.
    • The ultimate goal is to explore possibilities for developing new methodologies and algorithms that detect patterns and structure in unlabeled data, where no value for error or accuracy can be placed on the final result.
    • The mathematics employed involves linear algebra, probability and statistics, networks and graphs, and some numerical analysis coupled with scientific computing principles.

    Articles and Poster Presentations