Content Analysis and Text Mining

Sample Studies Using WordStat

The studies below are a sample of the more than 350 that have used WordStat and/or QDA Miner.

Application of Text Mining to Aviation Safety Data

Authors: Normand Péladeau (Provalis Research) and Craig Stovall (JetBlue Airways).

Description: This technology demonstration applied text mining routines by Provalis Research Corporation to text-intensive safety reports at JetBlue Airways.

Full reference: Péladeau, N., & Stovall, C. (2005). Application of Provalis Research Corp.'s Statistical Content Analysis Text Mining to Airline Safety Reports. Global Aviation Information Network.

Content Analysis of Hotel Customer Satisfaction

Authors: Madeleine Pullman, Kelly McGuire, Charles Cleveland (Cornell University School of Hotel Administration; Quester Linguistics)

Description: Customer surveys and comment cards are all well and good, but the best way to gain a full understanding of a customer's feelings about a hotel is to analyze the context of the customer's comments. Heretofore a laborious process, qualitative data analysis is rapidly becoming feasible for hoteliers, using software applications that support content analysis and data linking and those that offer advanced linguistic analysis. The content-analysis applications allow an analyst to assess the number of times a customer uses a particular word or phrase in written material or transcribed remarks. By counting the frequency of words and noting the association of certain words, one can categorize themes and concepts. By thus "quantifying" the qualitative communication, an analyst can associate the resulting information with demographic or other quantitative data. A more sophisticated analysis is possible with linguistic analysis, which examines the semantics, syntax, and context of customers' verbal communications.

Full reference: Pullman, M., McGuire, K., & Cleveland, C. (2005). Let Me Count the Words: Quantifying Open-Ended Interactions with Guests. Cornell Hotel and Restaurant Administration Quarterly, Vol. 46 (3), 323-343.
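The word counting and word association described in this entry can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used by the authors or by WordStat: the comments, the stopword list, and the function names are all invented for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical guest comments, invented for illustration only.
COMMENTS = [
    "The room was clean and the staff was friendly",
    "Friendly staff but the room was noisy",
    "Breakfast was cold and the room was noisy",
]

# A tiny, assumed stopword list; a real study would use a fuller one.
STOPWORDS = {"the", "was", "and", "but", "a", "of"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def word_frequencies(comments):
    """Count how often each word occurs across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


def cooccurrences(comments):
    """Count how many comments contain each unordered pair of distinct words."""
    pairs = Counter()
    for comment in comments:
        words = sorted(set(tokenize(comment)))
        pairs.update(combinations(words, 2))
    return pairs


freqs = word_frequencies(COMMENTS)
pairs = cooccurrences(COMMENTS)
print(freqs.most_common(3))
print(pairs[("noisy", "room")])
```

Pairs are stored in alphabetical order, so a lookup such as ("noisy", "room") must list the words the same way. The resulting counts could then be joined to demographic or other quantitative data, as the description suggests.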

Content Analysis of Operations Research Employment Ads

Authors: ManMohan S. Sodhi and Byung-Gak Son (Cass Business School, City University London)

Description: Preliminary results from an analysis of employment ads offer insight into the job market for students, instructors, university program directors, and employers.

Full reference: Sodhi, M.S., & Son, B.-G. (2005). What Industry Wants From O.R. Grads. OR/MS Today, August 2005 Issue.

Mining Microarray Expression Data by Literature Profiling

Authors: Damien Chaussabel and Alan Sher (Laboratory of Parasitic Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health)

Description: The authors developed a mining technique based on the analysis of literature profiles generated by extracting the frequencies of certain terms from thousands of abstracts stored in the Medline literature database. Terms are then filtered on the basis of both repetitive occurrence and co-occurrence among multiple gene entries. Finally, clustering analysis is performed on the retained frequency values, shaping a coherent picture of the functional relationship among large and heterogeneous lists of genes. Such data treatment also provides information on the nature and pertinence of the associations that were formed. The analysis of patterns of term occurrence in abstracts constitutes a means of exploring the biological significance of large and heterogeneous lists of genes. This approach should contribute to optimizing the exploitation of microarray technologies by providing investigators with an interface between complex expression data and large literature resources.

Full reference: Chaussabel, D., & Sher, A. (2001). Mining microarray expression data by literature profiling. Genome Biology, 3, 1-55.
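The three steps named in this description (term frequencies per gene, filtering on occurrence across several genes, then clustering) can be sketched as follows. This is an assumption-laden toy, not the authors' implementation: the gene names and abstracts are placeholders, the thresholds are arbitrary, and a greedy cosine-similarity grouping stands in for the clustering analysis they used.

```python
import math
import re

# Hypothetical abstracts per gene; names and texts are placeholders.
ABSTRACTS = {
    "GENE_A": ["macrophage activation by cytokine", "cytokine signalling in macrophage"],
    "GENE_B": ["cytokine release from macrophage", "macrophage response"],
    "GENE_C": ["kinase controls cell cycle", "cell cycle arrest and kinase activity"],
    "GENE_D": ["cell cycle checkpoint kinase", "kinase inhibitors"],
}


def profile(abstracts):
    """Fraction of a gene's abstracts that mention each term."""
    counts = {}
    for text in abstracts:
        for term in set(re.findall(r"[a-z]+", text.lower())):
            counts[term] = counts.get(term, 0) + 1
    return {term: n / len(abstracts) for term, n in counts.items()}


def retained_terms(profiles, min_freq=0.5, min_genes=2):
    """Keep terms that reach min_freq in at least min_genes gene profiles."""
    terms = set().union(*profiles.values())
    return sorted(
        t for t in terms
        if sum(p.get(t, 0.0) >= min_freq for p in profiles.values()) >= min_genes
    )


def cosine(p, q, terms):
    """Cosine similarity of two profiles restricted to the retained terms."""
    dot = sum(p.get(t, 0.0) * q.get(t, 0.0) for t in terms)
    norm_p = math.sqrt(sum(p.get(t, 0.0) ** 2 for t in terms))
    norm_q = math.sqrt(sum(q.get(t, 0.0) ** 2 for t in terms))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def group(profiles, terms, threshold=0.5):
    """Greedy grouping: add a gene to the first cluster holding a similar gene."""
    clusters = []
    for gene in sorted(profiles):
        for cluster in clusters:
            if any(cosine(profiles[gene], profiles[o], terms) >= threshold for o in cluster):
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


profiles = {gene: profile(texts) for gene, texts in ABSTRACTS.items()}
terms = retained_terms(profiles)
clusters = group(profiles, terms)
print(terms)
print(clusters)
```

Terms that appear for only one gene (for example "activation") are dropped by the filter, so the grouping rests only on vocabulary shared among gene entries.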

Measuring Disclosure of Intangible Resources in Corporate Annual Reports

Authors: Dina Gray and Göran Roos (Center for Business Performance, Cranfield School of Management).

Description: There has been a movement in the accountancy field to induce companies to disclose the worth of their intangibles, and researchers argue that the demand for the external communication of intangibles and value drivers is increasing in capital markets. This paper discusses empirical research that has been carried out in 95 companies in the UK and 16 companies in Finland to determine what intangible resources companies consider important in value creation, what intangible resources they actually measure, and what, if any, of those measures they actually disclose to their stakeholders. WordStat content analysis features have been used to assess the level of disclosure found in the corporate annual reports of those companies.

Full reference: Gray, D. & Roos, G. (2004). What intangible resources do companies value, measure, and report? A synthesis of UK and Finnish research. International Journal of Learning and Intellectual Capital, vol. 1(3), 242-261.

Measuring Employer Expectations of Information Professionals

Authors: Linda Marion, Mary Anne Kennan, Patricia Willard and Conception S. Wilson (School of Information Systems, Technology and Management, The University of New South Wales).

Description: This paper reports the findings of an exploratory study of 395 library job advertisements in Australia and the USA from August to October 2004. To investigate similarities and differences between the two countries’ data we conducted a content analysis and co-word analysis of professional job ads from academic, public and special libraries. Interpersonal Skills, Behavioural Characteristics, and responsiveness to a changeable Environment were identified as critical requirements in both countries.

Full reference: Marion, L., Kennan, M.A., Willard, P. & Wilson, C.S. (August, 2005). A tale of two markets: employer expectations of information professionals in Australia and the United States of America. Paper presented at the World Library and Information Congress: 71st IFLA General Conference and Council, Oslo, Norway.

Content Analysis of Learning Logs of Marketing Managers

Authors: Tim Friesner & Mike Hart (Business Management Group, University of Winchester).

Description: This research project used learning logs as a research instrument to gather data on the reflection, experience and learning of a sample of marketing managers from British theatres. This paper introduces Learning Log Analysis as an analytical approach to help researchers to interpret findings. For this research project Learning Log Analysis employs content analysis, case study analysis and narrative and storytelling analysis. This paper aims merely to introduce the approach. It in no way attempts to be a conclusive formula, and encourages further research and dialogue.

Full reference: Friesner, Tim and Hart, Mike (2005) 'Learning Log Analysis: Analysing data that Record Reflection, Experience and Learning' Paper delivered at 4th European Conference on Research Methodology for Business and Management Studies [ECRM2005] Université Paris-Dauphine, 21-22nd April, 2005

Searching for Clinical Prediction Rules in MEDLINE

Authors: Bette Jean Ingui & Mary A.M. Rogers (Upstate Medical University, Syracuse, New York).

Reference: Ingui, B.J. & Rogers, M.A. (2001). Searching for clinical prediction rules in MEDLINE. Journal of the American Medical Informatics Association, 8, 391-397.

Concept Analysis of Gender, Feminist, and Women's Studies Research in the Communication Literature

Author: Timothy Stephen (Department of Communication, University at Albany & President of CIOS)

Description: In recent decades a distinctive literature has accumulated discussing the role of gender, feminism, and women's studies-related research (GFWS) in the communication field; however, questions persist about how this research is represented in the field's literature. This article sketches the history of this representation in a field test of a concept mapping technique that tracks patterns of publication and isolates conceptual associations within the titles of GFWS articles. Findings support the idea that the feminist scholarship is represented by a unique configuration of conceptual relationships, has a history unto itself separated from that of studies of gender or sex differences, and that feminist research has entered the literature in two distinctly different eras. Feminist research has a unique and uneven pattern of representation in the field's literature. The concept mapping methodology is argued to provide one means for offsetting the fragmentation of the discipline's scholarship that has occurred as a result of the rapid proliferation of new specialized communication journals occurring throughout the last three decades.

Reference: Stephen, T. (2000). Concept Analysis of Gender, Feminist, and Women's Studies Research in the Communication Literature. Communication Monographs. 67, 193-214.

Differentiating the Regional Communication Journals: A Computer Assisted Concept Analysis

Author: Timothy Stephen (Department of Communication, University at Albany & President of CIOS)

Description: The journals of the four U.S. regional communication associations, all maintaining equivalent editorial policies, have published jointly more than 2,900 articles since 1970. Computer assisted automated content analysis was employed to study the conceptual structure of the discipline as represented by this literature. Using data from the ComIndex database, the words in article titles were linguistically normalized and filtered to isolate significant concept terms. Cluster analysis was then applied to the transformed data. This procedure identified 12 clusters of concepts, representing areas of significant scholarly interest across the four journals. ANOVA procedures revealed differences between the four journals on 5 of the 12 clusters. Results are considered in light of differences between the journals and the implications of the findings for the role of omnifocus journals in an era of increasing fragmentation in scholarly publishing.

Reference: Stephen, T. (2001). Differentiating the US regional communication journals: A computer assisted concept analysis. Presented at the meeting of the International Communication Association, Washington, D.C., May.
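The ANOVA step mentioned in this description compares a cluster's scores across the four journals. A one-way ANOVA F statistic can be computed by hand as in the sketch below; the scores are made up for illustration and do not come from the study, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores on one concept cluster for articles from four journals.
journal_scores = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
    [2, 3, 4],
]

f_stat = one_way_anova_f(journal_scores)
print(f_stat)
```

A large F value relative to the F distribution with the corresponding degrees of freedom (here 3 and 8) would indicate that at least one journal differs from the others on that cluster.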

Content Analysis of Journal Abstracts in Communication

Author: Timothy Stephen (Department of Communication, University at Albany & President of CIOS)

Description: With the help of WordStat software, the author analyzed the titles of papers published in Human Communication Research. Word co-occurrences were identified and then cluster analyzed, revealing five major clusters, four of which also contained at least two subclusters. This procedure, the author suggests, shows how content analysis can be used in bibliometric research.

Reference: Stephen, T. (1999). Computer-assisted concept analysis of HCR's First 25 Years. Human Communication Research, 25, 498-513.

Automated Content Analysis of Multiple-Choice Test Item Banks

Authors: John M. Ford, Thomas A. Stetz, Marilyn M. Bott, and Brian S. O'Leary (US Office of Personnel Management)

Description: Test item review is a specialized type of content analysis conducted to identify and correct test item flaws early in the test development process. Test item reviewers not only examine the targeted content of a test but also remove inappropriate content and balance various types of incidental content. An automated content analysis implementation of Hiller’s verbal ambiguity scales and Laffal’s General Concept Dictionary of English was used to examine 576 multiple-choice test items before and after test item review and revision by experienced item editors. Hiller’s scales detected some problems with item clarity. Laffal’s categories detected content imbalance between test forms but not inappropriate item content.

Reference: Ford, J.M., Stetz, T.A., Bott, M.M., & O'Leary, B.S. Automated content analysis of multiple-choice test item banks. Social Science & Computer Review, 18, 258-271.

Content Analysis of Skills and Characteristics in the Online Ads for Academic Libraries

Author: Linda Marion (Drexel University)

Description: The paper explores the territory of digital librarianship and examines the skills employers are seeking in new hires when filling technologically oriented jobs. Marion's presentation contains a content analysis of job ads to provide a map describing the domain of digital librarianship.

Reference: Marion, L. (2001). Digital librarian, cybrarian, or librarian with specialized skills: Who will staff digital libraries? In H. Thompson (Ed.), Crossing the Divide: Proceedings of the Tenth National Conference of the Association of College and Research Libraries, March 15-18, 2001, Denver, CO. (pp. 143-149), Chicago: American Library Association. Winner of the 2001 ACRL Student Research Award.

Evaluating Learning about the Nature of Science

Authors: Pamela C. Burnley, William Evans, and Olga S. Jarrett (Georgia State University)

Description: The authors studied changes in knowledge of science and attitudes regarding science among participants in a summer Research Experiences for Undergraduates program. They developed and tested a new survey instrument based on clusters of statements representing a variety of philosophical positions. They also studied the use of open-ended questions regarding the nature of science. Statistical analysis of responses to open-ended questions was found to differentiate between college students with different science backgrounds and detect some changes over the course of their program.

Reference: Burnley, P.C., Evans, W., & Jarrett, O.S. (2002). A Comparison of Approaches and Instruments for Evaluating a Geological Sciences Research Experiences Program. Journal of Geoscience Education, 50(1), 15-24.

Fictional Market Study of Midscale Hotel Customers

Authors: Wasamon Apichatvullop & Marianne Wolenski (students at the New Jersey Institute of Technology)

Description: Fictional market research conducted by two students in a course on information retrieval and text mining (Professor Dr. Brook Wu). The objective of this study was to determine which aspects of midscale hotels customers considered most appealing and to better understand what customers’ needs are.

Content Analysis of Speeches of US Presidential Candidates

Author: Normand Péladeau (Provalis Research)

Description: This paper was presented as part of a bakeoff competition of computer assisted content analysis software. Speeches were made available about one week before the conference.

Reference: Péladeau, N. (2001). Analysis of US Presidential Candidates's Speeches using WordStat 3.0. Paper presented at the Computer Assisted Content Analysis (CATA) workshop, 51st Annual Conference of the International Communication Association, Washington, DC.

Miscellaneous Projects

Analysis of online group discussions of members with diabetes (June Forkner-Dunn & Sylvia L. Marino)
