Analyzing corporate data often requires examining various texts, such as customer reviews, open-ended survey responses, and third-party product descriptions. Traditionally, this involves complex methods to prepare the text for analysis: translating, removing unnecessary characters and stop words, breaking it into words or phrases, and standardizing the text. After preparation, techniques like frequency analysis, sentiment analysis, classification, summarization, and named entity extraction can be applied.
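To make the traditional preparation steps concrete, here is a minimal Python sketch of cleaning, tokenizing, stop-word removal, and frequency analysis. The stop-word list and the sample reviews are illustrative assumptions, not data from the project, and a production pipeline would typically rely on a dedicated NLP library instead.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a
# fuller, language-specific list).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "was", "but", "very"}


def preprocess(text: str) -> list[str]:
    """Lowercase, replace non-letter characters with spaces, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # ASCII letters only, for simplicity
    tokens = text.split()
    return [token for token in tokens if token not in STOP_WORDS]


def term_frequencies(texts: list[str]) -> Counter:
    """Count how often each remaining token appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    return counts


# Hypothetical sample data.
reviews = [
    "The delivery was fast, but the packaging was damaged!",
    "Fast delivery and great price.",
]
frequencies = term_frequencies(reviews)
```

Even this small example needs hand-tuned rules (which characters to strip, which words to ignore), which illustrates why each task tends to require its own specialized handling.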
Automating this process is challenging. Each specific task often requires a specialized algorithm, and complex analyses might still need manual processing. Due to these complexities, text data is underutilized in business intelligence (BI), despite its potential to provide valuable insights for decision-making.
With the introduction of large language models (LLMs) in natural language processing (NLP), Zoolatech has developed a streamlined approach to simplify and automate text analysis for BI.
All input data and processing results are stored in Google BigQuery Data Warehouse (DWH). BigQuery offers built-in features for machine learning and integrates seamlessly with Vertex AI.
Data processing, whether run in batches or triggered manually, is managed with Google Dataform. Dataform lets analysts write code in SQL and JavaScript, store scripts in Git/GitHub, and run them manually or on a schedule. It also supports unit tests, which significantly simplifies the analysts’ workflow.
The Vertex AI platform, fully integrated with BigQuery, enables the use of the Google Gemini model in the BI cycle. This integration allows for advanced data processing functions and patterns, enhancing the efficiency and quality of text analysis.
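As a rough illustration of what calling Gemini from BigQuery can look like, the Python sketch below assembles a query around BigQuery's `ML.GENERATE_TEXT` function. It assumes a remote model pointing at a Gemini endpoint has already been created in BigQuery; the model, table, and column names are hypothetical, and the article does not state which exact SQL the team uses.

```python
def build_generate_text_query(
    model: str, source_table: str, text_column: str, instruction: str
) -> str:
    """Return a BigQuery SQL statement that sends each row's text to a remote LLM.

    All identifiers passed in are placeholders chosen for illustration.
    """
    # Escape backslashes and single quotes so the instruction is a valid
    # SQL string literal.
    safe_instruction = instruction.replace("\\", "\\\\").replace("'", "\\'")
    return f"""
SELECT
  {text_column},
  ml_generate_text_llm_result AS llm_answer
FROM ML.GENERATE_TEXT(
  MODEL `{model}`,
  (
    SELECT {text_column}, CONCAT('{safe_instruction}', {text_column}) AS prompt
    FROM `{source_table}`
  ),
  STRUCT(0.0 AS temperature, 256 AS max_output_tokens, TRUE AS flatten_json_output)
)
""".strip()


# Hypothetical names; replace with real project, dataset, and model identifiers.
query = build_generate_text_query(
    model="my_project.my_dataset.gemini_model",
    source_table="my_project.my_dataset.survey_responses",
    text_column="response_text",
    instruction="Summarize the customer's feedback in one sentence: ",
)
```

In a Dataform project the same SELECT statement would more likely live directly in a SQL file than be built in Python; the function form is used here only to keep the example self-contained.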
By leveraging LLMs and advanced data management platforms, Zoolatech has significantly simplified the process of text analysis, making it more accessible and effective for business intelligence purposes.
The applied solution reduced processing time for certain types of text data by up to 80%.
As a result, text data can be analyzed and displayed alongside traditional numerical data in Looker Studio. In particular, examples have been developed that include the following types of diagrams and analyses:
Histograms that classify text responses. To build them, the LLM was first asked to identify common categories across the full set of responses. Once the final list of categories was compiled, the LLM classified each response against that list.
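The two-pass classification described above can be sketched in Python as follows. The `llm` argument stands for any function that takes a prompt and returns the model's text reply (for example, a wrapper around Gemini); the prompts, function names, and the "Other" fallback are assumptions made for illustration, not the project's actual implementation.

```python
from collections import Counter
from typing import Callable

# Any function that maps a prompt to the model's text reply.
LLM = Callable[[str], str]


def discover_categories(responses: list[str], llm: LLM, max_categories: int = 5) -> list[str]:
    """Pass 1: ask the model to propose a short list of categories."""
    prompt = (
        f"Read the survey responses below and propose at most {max_categories} "
        "short category names, one per line.\n\n"
        + "\n".join(f"- {r}" for r in responses)
    )
    lines = [line.strip(" -\t") for line in llm(prompt).splitlines()]
    # Drop empty lines and duplicates while preserving order.
    return list(dict.fromkeys(line for line in lines if line))[:max_categories]


def classify(response: str, categories: list[str], llm: LLM, fallback: str = "Other") -> str:
    """Pass 2: assign one response to one category from the agreed list."""
    prompt = (
        "Assign the response to exactly one of these categories: "
        + ", ".join(categories)
        + f".\nAnswer with the category name only.\n\nResponse: {response}"
    )
    answer = llm(prompt).strip()
    # Guard against replies that are not in the agreed list.
    by_lower = {c.lower(): c for c in categories}
    return by_lower.get(answer.lower(), fallback)


def category_histogram(responses: list[str], llm: LLM) -> Counter:
    """Run both passes and count responses per category."""
    categories = discover_categories(responses, llm)
    return Counter(classify(r, categories, llm) for r in responses)
```

Fixing the category list before the second pass keeps the labels consistent across responses, which is what makes the resulting counts usable as a histogram.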
This method is used to create generalized recommendations and reviews.
Summarization and anonymization of responses. When individual answers must be displayed as well as analyzed, summaries help readers understand the answers and suggestions in more detail. The LLM can also preserve anonymity by removing identifying information and neutralizing the style of each message.
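A minimal Python sketch of this step is shown below. The prompt wording and the regular-expression safety net are assumptions added for illustration; the article only states that the LLM handles summarization and anonymization. The `llm` argument is any function that takes a prompt and returns the model's reply.

```python
import re
from typing import Callable

# Rough patterns for obvious identifiers (illustrative only; they will miss
# some cases and may over-mask others, such as long order numbers).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def build_prompt(response: str) -> str:
    """Ask for a neutral one-sentence summary without identifying details."""
    return (
        "Summarize the response below in one neutral sentence. "
        "Remove names, contact details, and any other identifying information, "
        "and do not imitate the author's writing style.\n\n"
        f"Response: {response}"
    )


def summarize_anonymously(response: str, llm: Callable[[str], str]) -> str:
    """Summarize with the LLM, then mask identifiers it may have left in."""
    summary = llm(build_prompt(response)).strip()
    summary = EMAIL.sub("[email]", summary)
    summary = PHONE.sub("[phone]", summary)
    return summary
```

The post-processing step is a defensive choice: it does not replace the model's own anonymization, but it catches the most recognizable identifiers if the model overlooks them.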
This is an alternative to frequency analysis for cases where responses cannot be clearly categorized but trends still need to be visualized. To surface those trends, the LLM is tasked with reducing each answer to one or two words.
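The reduction of each answer to one or two words can be sketched in Python as follows. The prompt text and function names are hypothetical, and `llm` again stands for any function that returns the model's reply to a prompt. The replies are normalized before counting so that variants such as "Slow delivery." and "slow delivery" fall into the same bucket.

```python
import re
from collections import Counter
from typing import Callable


def to_keyword(response: str, llm: Callable[[str], str], max_words: int = 2) -> str:
    """Ask the model for the main idea and normalize its reply."""
    prompt = (
        f"Reduce the response below to its main idea in at most {max_words} words. "
        "Answer with those words only.\n\n"
        f"Response: {response}"
    )
    reply = llm(prompt)
    # Lowercase, keep only alphanumeric words, and enforce the word limit
    # in case the model returns more than requested.
    words = re.findall(r"[^\W_]+", reply.lower())
    return " ".join(words[:max_words])


def keyword_counts(responses: list[str], llm: Callable[[str], str]) -> Counter:
    """Count how often each normalized keyword occurs, skipping empty replies."""
    keywords = (to_keyword(r, llm) for r in responses)
    return Counter(k for k in keywords if k)
```

The resulting counts can then be fed to whichever chart is used to show the trends.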
Thus, it is possible to conduct analysis based on the number of respondents in each group, or even use different analysis methods for each group.