The client is a company specializing in data science and reputation management. They have developed an innovative product that integrates academic evolutionary biology and automated data visualization to improve the speed and accuracy of reputation management.
An essential component of the product is a set of metrics and algorithms for calculating numerical characteristics of a company’s reputation. Some reputation indicators rely on analyzing media content associated with the company. The client aims to provide its customers with monthly reports on reputation changes, detailed explanations, and actionable advice.
However, the media content supplied by a third party was often generalized, making it difficult to analyze and categorize. It was frequently irrelevant or lacked the context needed to link an article to a specific company or its reputation.
Before creating the report, we identified several problems related to the quality of the supplied content and its duplication. To address these issues, we:
Our solution utilized the Amazon Bedrock platform with the following foundation models:
We used the LangChain.js framework running under Node.js, which decoupled the main logic from specific APIs and simplified work with various aspects of LLMs. Vector storage was implemented within the application’s main DBMS (PostgreSQL), using the pgvector extension.
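To illustrate how vector storage inside PostgreSQL can support this kind of content analysis, here is a minimal TypeScript sketch of a nearest-neighbor lookup with pgvector. It is not the project's actual code: the `articles` table, its `id`, `title`, and `embedding` columns, and the function names are hypothetical, and the idea of using the lookup to spot near-duplicate articles is an assumption. Only the pgvector pieces are real: the `'[x,y,z]'` text literal for vectors and the `<=>` operator, which returns cosine distance.

```typescript
// A parameterized query in the shape most PostgreSQL clients accept.
interface NearestQuery {
  text: string;
  values: [string, number];
}

// pgvector accepts a vector as a text literal such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Builds a query for the k stored articles closest to the given embedding.
// Table and column names are hypothetical; <=> is pgvector's cosine distance.
function buildNearestArticlesQuery(embedding: number[], k: number): NearestQuery {
  if (!Number.isInteger(k) || k <= 0) {
    throw new Error("k must be a positive integer");
  }
  return {
    text:
      "SELECT id, title, embedding <=> $1::vector AS distance " +
      "FROM articles ORDER BY embedding <=> $1::vector LIMIT $2",
    values: [toVectorLiteral(embedding), k],
  };
}

// What <=> computes, written out in application code:
// cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine distance is undefined for a zero vector");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The resulting `text` and `values` could be passed to any PostgreSQL client that supports positional parameters. A small distance between a new article and a stored one would suggest a near-duplicate, with the cutoff left as a tuning decision.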