WordStat Information

Content Analysis and Text Mining

Highly advanced content analysis and text-mining software with unmatched handling and analysis capabilities

“For those who have ever needed to find themes or relationships in verbatim responses, focus group transcripts, or other text sources, WordStat is very attractive indeed.” Marketing Research, Spring 2006

WordStat is flexible, easy-to-use text analysis software, whether you need text mining tools for fast extraction of themes and trends or careful and precise measurement with state-of-the-art quantitative content analysis tools. WordStat's seamless integration with Simstat, our statistical data analysis tool, and QDA Miner, our qualitative data analysis software, gives you unprecedented flexibility for analyzing text and relating its content to structured information, including numerical and categorical data. It can also be used as an extension for the Stata statistical package.

What is it used for?

WordStat can be used by anyone who needs to quickly extract and analyze information from large numbers of documents. Our content analysis and text mining software is used for:

• Content analysis of open-ended responses and interview or focus group transcripts
• Business intelligence and competitive website analysis
• Information extraction and knowledge discovery from incident reports and customer complaints
• Content analysis of news coverage or scientific literature
• Automatic tagging and classification of documents
• Fraud detection, authorship attribution, and patent analysis
• Taxonomy development and validation

Key and Unique Features

Powerful content analysis and text mining software for handling large amounts of unstructured information. WordStat can process up to 20 million words per minute and identify all references to user-defined concepts using categorization dictionaries.

Integrated exploratory text mining and visualization tools such as clustering, multidimensional scaling, proximity plots, and more, to quickly extract themes and automatically identify patterns.

Relates unstructured text to structured data such as dates, numbers, or categories, to identify temporal trends or differences between subgroups, or to assess relationships with ratings and other categorical or numerical variables.

Use existing hierarchical content analysis dictionaries or taxonomies, or create your own, composed of words, word patterns, phrases, and proximity rules (such as NEAR, AFTER, BEFORE) for precise measurement of concepts.

Unique computer assistance for dictionary building, with tools for extracting common phrases and technical terms and for quickly identifying misspellings, synonyms, antonyms, and related words in your text collection.

One-click access to keyword-in-context and keyword retrieval tools for easy identification and coding of relevant text segments, validation of content analysis dictionaries, word-sense disambiguation, or drilling down to the source documents.

Seamless integration with a state-of-the-art qualitative coding tool (QDA Miner) allows more precise exploration of data or more in-depth analysis of specific documents or extracted text segments when needed.

Machine learning for automatic document classification using Naive Bayes and K-Nearest Neighbours algorithms, with automatic feature selection and validation tools. Classification models can then be saved to disk and reapplied to new data.

Easy import of databases, spreadsheets, and documents (including PDF and HTML), as well as export of text analysis results to common industry file formats (Excel, SPSS, ASCII, HTML, XML, MS Word) and of graphs to image formats (PNG, BMP, and JPEG).

GIS mapping module to create interactive plots of data points, thematic maps, and heatmaps, along with a geocoding web service for transforming location names, postal codes, and IP addresses into latitude and longitude.
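
To make the idea of a categorization dictionary with proximity rules more concrete, here is a minimal Python sketch. It is not WordStat's implementation or dictionary format: the `DICTIONARY` layout, the rule tuple `("NEAR", word_a, word_b, window)`, and the function names are all assumptions made for illustration. It counts plain word matches per category and treats a NEAR rule as "word_a occurs with word_b within a given number of words on either side".

```python
import re
from collections import Counter

# Hypothetical dictionary layout: category -> list of rules.
# A rule is either a plain word or a ("NEAR", word_a, word_b, window) tuple.
DICTIONARY = {
    "PRICE": ["price", "cost", "expensive"],
    "SERVICE_COMPLAINT": [("NEAR", "service", "slow", 3)],
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_near(tokens, word_a, word_b, window):
    """Count occurrences of word_a that have word_b within `window` tokens on either side."""
    hits = 0
    for i, tok in enumerate(tokens):
        if tok == word_a:
            lo = max(0, i - window)
            nearby = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            if word_b in nearby:
                hits += 1
    return hits

def categorize(text, dictionary=DICTIONARY):
    """Return the number of matches per category for one document."""
    tokens = tokenize(text)
    word_counts = Counter(tokens)
    result = Counter()
    for category, rules in dictionary.items():
        for rule in rules:
            if isinstance(rule, tuple):
                _, word_a, word_b, window = rule
                result[category] += count_near(tokens, word_a, word_b, window)
            else:
                result[category] += word_counts[rule]
    return dict(result)

print(categorize("The service was very slow and the price too expensive."))
```

Directional rules such as AFTER or BEFORE could be sketched the same way by searching only one side of the matched word.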


Benefits

WordStat is a content analysis and text mining add-on module for QDA Miner. With this powerful software, you can:

• Quickly analyze large amounts of unstructured data such as customer feedback, emails, open-ended responses, interview transcripts, incident reports, patents, legal documents, blogs, or websites.
• Build a content analysis dictionary to automatically categorize text data and quickly retrieve text segments related to a specific category (for example, positive and negative comments). You can apply statistical analysis to categories or explore the relationship between categories and other variables associated with the documents (e.g., author, location, or time) to identify trends. To save time, you can reapply the same dictionary to a similar project in a few seconds, or customize or reuse an existing dictionary.
• Use statistical and visualization tools that are easy to interpret, such as word frequencies, clustering, correspondence analysis, and heatmaps. WordStat can also compute statistical tests to assess the strength of observed relationships. These features let you quickly identify themes, trends, and patterns without reading every document, and explore the relationship between document content and other categorical or numerical variables such as gender, age, or education level.
• Transform text into statistical tables and graphics and, at any time, drill down to the source documents to see what is behind the numbers.
• Keep full control over the content analysis process and stay close enough to the data to balance text analysis efficiency with precision in the results.
• Easily create presentations and professional reports that include statistical tables and graphics provided by WordStat, such as bar charts, pie charts, bubble charts, dendrograms, concept maps, correspondence analysis plots, and more.
• Analyze text data in almost any language, because the software relies on language-independent techniques.
• Have spelling mistakes in your documents corrected automatically, saving the time of identifying and fixing them manually. To standardize the writing of similar words and phrases, and so obtain more accurate statistical results, WordStat also lets you substitute any word or phrase with another of your choice.
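
As a rough illustration of why substituting variant spellings before counting gives more accurate frequencies, the following Python sketch standardizes a few variants and then tallies word frequencies across documents. The `SUBSTITUTIONS` table and the function names are hypothetical and are not taken from WordStat; the sketch only shows the general technique.

```python
import re
from collections import Counter

# Hypothetical substitution list: variant spelling -> standard form.
SUBSTITUTIONS = {
    "e-mail": "email",
    "emails": "email",
    "colour": "color",
}

def standardize(text, substitutions=SUBSTITUTIONS):
    """Replace each whole-word variant with its standard form (case-insensitive)."""
    for variant, standard in substitutions.items():
        # Lookarounds keep the match from starting or ending inside a longer word.
        pattern = r"(?<![\w-])" + re.escape(variant) + r"(?![\w-])"
        text = re.sub(pattern, standard, text, flags=re.IGNORECASE)
    return text

def word_frequencies(documents):
    """Count word frequencies over all documents after standardization."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z]+", standardize(doc).lower()))
    return counts

docs = [
    "Please send an e-mail.",
    "Two emails arrived.",
    "Email me the colour chart.",
]
freq = word_frequencies(docs)
print(freq["email"], freq["color"])
```

Without the substitution step, "e-mail", "emails", and "Email" would be counted as separate entries, which would understate how often the concept appears.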

Reviews of WordStat

ORGANIZATIONAL RESEARCH METHODS, March 2010
RESEARCH, September 2008
THE POLITICAL METHODOLOGIST, vol 15(1), Summer 2007
JOURNAL OF MIXED METHODS RESEARCH, April 2007
MARKETING RESEARCH, Spring 2006
OR/MS Today, October 2005
RESEARCH, August 2005
AMERICAN STATISTICIAN, February 2005
LINGUIST, April 2004
SOCIAL SCIENCE COMPUTER REVIEW, vol 18(3), Fall 2000
FIELD METHODS, vol 11(2), 1999