How to start using automatic categorization in 9 steps to save up to 60 hours a month!

21.07.2024

Step 1: Understanding automatic categorization

Automatic categorization of survey statements is a process in which a computer system analyzes text data and assigns it to appropriate categories without the need for manual reading and analysis. By using advanced machine learning and artificial intelligence algorithms, automatic categorization allows for efficient management of large amounts of information, which is particularly useful in market research, surveys or opinion analysis.

Why is it important? First of all, automatic categorization saves time. Manually assigning responses is time-consuming and prone to errors, while automating the process allows for quick and precise assignment of data. It also increases operational efficiency, enabling companies to make faster decisions and better manage resources. Finally, it improves the accuracy of data analysis, minimizing the risk of human error and leading to more accurate reports.

Key application examples

Consumer opinion analysis: Companies can automatically categorize open-ended survey responses to understand what customers think about products and services.
Feedback analysis: Categorizing customer feedback and suggestions helps identify areas for improvement and develop new features.
Unsolicited feedback analysis: Social media and map sites are excellent sources of knowledge about the needs and concerns of your own customers as well as those of your competitors.

Categorization automation can be approached in two ways:

Free automatic categorization: The system independently identifies and creates categories based on content analysis. This is useful when you don't have a pre-existing data structure or want to discover new patterns and themes in responses.
Automatic categorization matching a list of categories: The system assigns data to predefined categories. This method is more controlled and precise, especially when you have clearly defined analysis goals and know which categories are relevant. In this case, dedicated language models trained on the specific categories are used.

Step 2: Choosing the right tool

Choosing the right tool for automatic categorization is crucial to the efficiency and accuracy of the process. Below are some important criteria to consider when choosing a tool, and examples of popular solutions available on the market.

Criteria for selecting a categorization automation tool

Functionality:
- Compliance with requirements: Make sure the tool supports all necessary features, such as text analysis, integration with other systems, the ability to define custom categories, and automatic learning from new data.
- Flexibility: The tool should be flexible enough to adapt to the specific needs of your organization. There are solutions on the market that let you tailor how the mechanisms operate and analyze in detail how well the categorization models perform.
Ease of use:
- User interface: An intuitive and easy-to-use interface will allow quick implementation of the tool and reduce the learning curve for employees.
- Technical support and documentation: The availability of detailed documentation and technical support can be crucial if you have problems configuring or using the tool.
Scalability and performance:
- Performance: The tool should be able to process large amounts of data in a reasonable amount of time.
- Scalability: Make sure the tool can grow with your needs, handling increasingly large data sets and more complex analysis.
Integrations:
- Compatibility with other systems: The tool should easily integrate with other systems used in your organization, such as CRM, analytics systems or content management platforms.
Cost:
- Pricing model: Consider licensing costs, maintenance fees, and any additional costs for implementation and employee training.
Security:
- Data confidentiality: Your data should be protected and processed in a way that ensures its security, which in practice means locally. Relying on a third-party cloud service, as many solutions on the market do, carries the risk of your data being used to train public models.

An example of a research tool that provides automated processing is YourCX.

YourCX

YourCX is a platform for analyzing customer experience and conducting research of any kind. It offers automatic categorization of open-ended responses and sentiment analysis.

Functionality: Response categorization, sentiment and emotion analysis, identification of key themes, monitoring of key customer satisfaction indicators.
Ease of use: Intuitive interface, comprehensive technical support and documentation.
Integrations: Easy integration with CRM systems, marketing tools and analytics platforms.

Platforms like Medallia and Qualtrics also have similar capabilities.

Step 3: Data preparation

Automated categorization of open responses requires properly prepared data that can be analyzed. Data sources can be diverse:

Surveys: Data from online surveys in which respondents answer open-ended questions.
Online reviews: Reviews and opinions posted on websites such as e-commerce stores, discussion forums and product review sites.
Social media: Comments, posts and opinions posted on social media platforms such as Facebook, Twitter, Instagram, Google Maps or LinkedIn.

Determine the purpose of the question

For data to be analyzed effectively by automated categorization tools, it is important to determine when and in what situation it was collected. This gives the content analysis mechanisms the right context. In particular:

Determine the exact purpose of the question being asked:
- Purpose of the question: Think about what you want to achieve through the question you are asking respondents. Is it to understand customer satisfaction, to identify product problems, or to solicit ideas for service improvements?
- Context of the question: Determine the context in which the question is being asked. Is it about a specific service, a product, the timing of a purchase, or perhaps a general opinion about the brand?
Determine the moment of the respondent's interaction with the survey:
- Timing of the survey: Determine at what point the customer receives the survey. Is it after making a purchase, after using a service, or as part of a regular satisfaction survey?
- Relevance: Make sure the question being analyzed relates to the customer's current experience and is sent at the right time to get the most valuable responses.

Step 4: Don't have initial categories? Nothing easier - use automated categorization

If you don't have predefined categories to begin with, YourCX allows you to automatically generate categories based on text analysis. This is a quick and 100% automated solution:

Automatic categorization: The tool uses advanced machine learning algorithms to identify patterns and create categories. This gives you access to grouped statements almost in real time.
Extracting relevant categories: After automatic categorization, review the generated categories and select the ones that are most relevant and fit the goals of your analysis.

In a very short time, you can find out what respondents are writing about and pull out business-relevant topics.

Step 5: Prepare training data - import categorized statements or generate opinions for categorization

To ensure that the model is trained accurately, it is necessary to prepare the training data properly. After all, we want the categorization model to work exactly as we expect it to. The better the examples we provide, the better the mechanism will work in the future.

What to pay attention to and what to do:

Import data

Import existing reviews that have already been categorized manually. You can do this directly through the YourCX data importer. Real examples of manually categorized statements are very good input for the categorization model.

Category revision

Categories should be logically separated if you want to have a low proportion of categories assigned redundantly. Examples of close categories are reliability and failures or availability and locations. However, if it is acceptable to assign several categories to a statement, close categories can be left.

Include all aspects of the topic

Categories should cover all aspects that may appear in a topic. If you want to break out categories for problems, there should also be categories for praise on similar topics, or general categories. Otherwise, promoters praising a mobile app could be assigned the category "mobile app performance problems." However, if the model is to work only on critical statements, it can target problems alone. This also ties back to the earlier analysis of the purpose and context of the question being analyzed.

Generating training data

If you don't have enough self-categorized opinions, YourCX provides mechanisms to generate additional opinions automatically to provide enough training data. With just a few clicks, you can generate thousands of diverse statements used to train a categorization model. By automatically preparing training statements, you can save dozens of hours.

Remember - generate synthetic statements and add manually categorized statements so that each category has a minimum of 200 examples. The better the examples, the better the categorization model will work.
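Before training, it can help to check which categories still fall short of the 200-example minimum. The snippet below is a minimal sketch in plain Python; the function name and the sample labels are hypothetical and are not part of YourCX.

```python
from collections import Counter

MIN_EXAMPLES_PER_CATEGORY = 200  # minimum suggested in the text above


def categories_needing_examples(labels, minimum=MIN_EXAMPLES_PER_CATEGORY):
    """Return {category: number of examples still missing} for under-filled categories."""
    counts = Counter(labels)
    return {cat: minimum - n for cat, n in counts.items() if n < minimum}


# Hypothetical training set: one category label per statement.
labels = ["delivery"] * 250 + ["price"] * 120 + ["mobile app"] * 200
shortfall = categories_needing_examples(labels)
print(shortfall)  # {'price': 80}
```

Here the "price" category would need 80 more manually categorized or generated statements. Categories with no examples at all do not appear in the result, so it is worth comparing the output against your full category list.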

Step 6: Train the model

Training the categorization model in YourCX is the step that creates a viable mechanism and language model that assigns categories according to your expectations.

Train the model based on imported or generated feedback. YourCX will automatically adjust the model parameters to achieve the best possible accuracy. However, if you think it would be useful to change the training parameters, you can adjust all of them. Examples of parameters you can influence:

Choice of base language model
The number of training runs
The setting of the weighted loss function
The minimum acceptable number of statements for a category to be included in training
The minimum ratio of the least numerous category to the most numerous category
Threshold of probability of acceptance for the first category
Threshold of probability of acceptance for subsequent categories
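To make the parameters above more concrete, here is a hedged sketch of how such settings could be represented and how the two data-related ones (minimum count per category and minimum ratio between the smallest and largest category) could be checked. All field names and default values are illustrative assumptions, not actual YourCX settings, and the way YourCX applies these parameters internally may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Hypothetical names and defaults, chosen only for illustration.
    base_model: str = "example-base-model"
    training_runs: int = 3
    weighted_loss: bool = True
    min_examples_per_category: int = 200
    min_category_ratio: float = 0.2
    first_category_threshold: float = 0.5
    next_category_threshold: float = 0.7


def check_training_set(labels, config):
    """Return the categories kept for training and whether the balance ratio is met."""
    counts = Counter(labels)
    # Drop categories that have too few examples to be trained on.
    kept = {c: n for c, n in counts.items() if n >= config.min_examples_per_category}
    if not kept:
        return kept, False
    # Ratio of the least numerous kept category to the most numerous one.
    ratio = min(kept.values()) / max(kept.values())
    return kept, ratio >= config.min_category_ratio


labels = ["delivery"] * 1000 + ["price"] * 250 + ["parking"] * 40
kept, balanced = check_training_set(labels, TrainingConfig())
print(kept, balanced)
```

In this example "parking" is excluded because it has only 40 statements, and the remaining categories have a ratio of 250 / 1000 = 0.25, which satisfies the assumed minimum of 0.2.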

Step 7: Assess the quality of the categorization model

Once the model has been trained, it is necessary to evaluate its quality to ensure that it works properly and categorizes opinions effectively.

Testing the model: The model is automatically tested on a training dataset. If you would like to test on an additional validation dataset, just import it and the analysis will be performed automatically.

Evaluation metrics: Use metrics such as accuracy, precision, recall (sensitivity) and F1-score to evaluate the quality of the model.
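For readers who want to see how these metrics are calculated, the following is a minimal sketch for the single-label case (one true category and one predicted category per statement). The function names and sample data are hypothetical; YourCX computes these metrics for you.

```python
def evaluate_category(true_labels, predicted_labels, category):
    """Return (precision, recall, f1) for one category."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)  # correctly assigned
    fp = sum(1 for t, p in pairs if t != category and p == category)  # assigned by mistake
    fn = sum(1 for t, p in pairs if t == category and p != category)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(true_labels, predicted_labels):
    """Share of statements whose predicted category equals the true one."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


true_labels = ["price", "price", "delivery", "delivery", "price", "app"]
predicted = ["price", "delivery", "delivery", "delivery", "price", "price"]
print(evaluate_category(true_labels, predicted, "price"))
print(accuracy(true_labels, predicted))
```

For "price" in this sample there are 2 correct assignments, 1 mistaken assignment and 1 missed statement, so precision, recall and F1 all equal 2/3; overall accuracy is 4 out of 6.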

Error checking: Identify and analyze cases miscategorized by the model (if any). YourCX allows you to easily review and analyze such cases, as well as edit categories.

Step 8: Further optimize the model's performance

If the model is not working well enough, YourCX offers full support for optimizing any problematic or inconsistent cases.

Correct the categories

If the problem is related to incorrectly assigned categories, manually assign correct categories for the problematic statements.

Add new categories

If there are new topics that were not included in the existing categories, generate additional feedback for these topics to expand the model.

Modify parameters

Modify the model's training parameters, such as the loss function, to improve how well the model performs.

Manipulate probability acceptance thresholds

The working model determines probabilities for all categories. By setting the thresholds high enough, you can get rid of over-assigned categories. On the other hand, the risk of eliminating a correct category also increases.
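The trade-off described above can be illustrated with a short sketch. It assumes the model returns one probability per category, that the most probable category is compared against the first threshold, and that each further category is compared against the threshold for subsequent categories. This interaction is an assumption made for illustration; the function name and sample values are hypothetical.

```python
def assign_categories(probabilities, first_threshold, next_threshold):
    """Pick categories from {category: probability} using two acceptance thresholds."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    assigned = []
    for category, prob in ranked:
        # The first accepted category uses its own threshold, later ones the stricter one.
        threshold = first_threshold if not assigned else next_threshold
        if prob < threshold:
            break  # remaining categories are even less probable
        assigned.append(category)
    return assigned


probs = {"delivery": 0.82, "price": 0.55, "mobile app": 0.10}
print(assign_categories(probs, 0.5, 0.5))  # ['delivery', 'price']
print(assign_categories(probs, 0.5, 0.7))  # ['delivery']
print(assign_categories(probs, 0.9, 0.7))  # []
```

Raising the threshold for subsequent categories removes the weaker "price" assignment, while raising the first threshold too far leaves the statement without any category.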

The confusion matrix informs about potential problems with category assignment by showing which categories the model tends to mix up.
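A confusion matrix of this kind can be sketched in a few lines of Python. The example counts how often each true category was predicted as each other category and lists the most frequent mix-ups. The function names and sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true category, predicted category) pairs."""
    return Counter(zip(true_labels, predicted_labels))


def most_confused(matrix, top=3):
    """Return the most frequent pairs where the prediction differs from the truth."""
    errors = {pair: n for pair, n in matrix.items() if pair[0] != pair[1]}
    return sorted(errors.items(), key=lambda item: item[1], reverse=True)[:top]


true_labels = ["reliability", "reliability", "failures", "failures", "price"]
predicted = ["failures", "failures", "failures", "reliability", "price"]
matrix = confusion_matrix(true_labels, predicted)
print(most_confused(matrix, top=1))  # [(('reliability', 'failures'), 2)]
```

A pair that shows up often, such as "reliability" predicted as "failures" here, suggests the two categories are too close and may need clearer definitions or better training examples.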

Re-training: Train the model again on the revised and new data to achieve better results.

Step 9: Additional capabilities - sentiment and emotion analysis

Categorizing statements is not enough. With sufficiently general and separable categories, it is worth analyzing the sentiment and emotions in the statements to know which are positive and which are negative. YourCX offers this type of solution off-the-shelf, allowing you to automatically analyze the distribution of emotions and sentiments for business-relevant issues, as well as get alerts for critical remarks, e.g. about Customer Service.

Sentiment analysis:

Use sentiment analysis tools in YourCX to determine whether feedback is positive, negative or neutral. This will help you better understand customer sentiment.

Emotion analysis:

YourCX allows you to identify emotions expressed in reviews, such as joy, anger, sadness, surprise, etc. Integrating the results of emotion analysis with categorization gives a more complete picture of customer feedback.

Integration of results:

Integrating sentiment and emotion analysis results with categorization produces more detailed and valuable data for analysis. For example, Google Maps opinion data can be processed automatically to give a satisfaction rating for each issue together with the percentage share of that issue.
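As a rough illustration of this kind of integration, the sketch below combines categories with a sentiment score per opinion and reports, for each issue, its share of all opinions and its average sentiment. The data format (sentiment encoded as -1, 0 or 1), the function name and the sample opinions are assumptions made for this example, not the YourCX output format.

```python
from collections import defaultdict


def summarize_by_category(opinions):
    """opinions: list of (categories, sentiment), sentiment being -1, 0 or 1."""
    totals = defaultdict(lambda: [0, 0])  # category -> [count, sentiment sum]
    for categories, sentiment in opinions:
        for category in categories:
            totals[category][0] += 1
            totals[category][1] += sentiment
    n = len(opinions)
    return {
        category: {"share": count / n, "avg_sentiment": total / count}
        for category, (count, total) in totals.items()
    }


# Hypothetical, already categorized and sentiment-scored opinions.
opinions = [
    (["parking"], -1),
    (["staff"], 1),
    (["staff", "parking"], 1),
    (["staff"], 0),
]
for category, stats in summarize_by_category(opinions).items():
    print(category, stats)
```

In this sample, "parking" appears in half of the opinions with a neutral average, while "staff" appears in three quarters of them with a clearly positive average.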

If you are interested in the topic and would like to save time on analyzing survey or opinion data, let's talk.
