Industry
Solution and expertise
Oracle Cloud Infrastructure, Azure AI Translator, Oracle Autonomous Database, Oracle APEX, API
A European agricultural media portal consolidates information flows from various areas of the agricultural industry in order to analyze, organize, and enhance the data. The goal is to publish the resulting insights and conclusions for stakeholders in the field.
Although a large number of data sources had been integrated, there were no accepted standards or patterns for how the data was provided. Some of the sources from different regions published their content in local languages. To achieve the target quality, it was necessary to unify the data standards, bring all content into a single language, and enrich the existing data with additional valuable information. In addition, behavior templates had to be created for each data flow to allow flexible processing of data from various sources.
To support historical analysis and improve the quality of already ingested data, the processed and prepared data had to be stored using a dedicated approach. Using the stored information as a basis, the client also requested analytical dashboards representing the calculated value and quality of the gathered information.
Considering the vision for the target solution, the existing data, and the anticipated outcomes, we developed a semi-automated solution that includes an approval process and a mapping engine to facilitate the delivery of target content.
Each data source was analyzed in terms of integration approach, data format, data structure, and data quality. Specific domain areas were classified, and the expected outcomes for each source were defined. A flexible tool for data source configuration and information structuring was provided to the persons responsible for each source, enabling them to efficiently manage and organize data according to platform-specific needs and requirements.
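As a rough illustration, a per-source behavior template of this kind could be expressed as a small configuration object. All field names below are illustrative assumptions, not the actual platform schema:

```python
from dataclasses import dataclass, field

@dataclass
class SourceTemplate:
    """Hypothetical per-source behavior template; all fields are illustrative."""
    source_id: str
    region: str
    language: str                    # ISO 639-1 code of the source content
    categories: list[str]            # domain areas this source is expected to cover
    translate_to_en: bool = True     # whether language unification applies
    enrichment_steps: list[str] = field(default_factory=list)

# Example: a German-language source covering crop market news.
agro_news_de = SourceTemplate(
    source_id="agro-news-de",
    region="DE",
    language="de",
    categories=["crops", "market-prices"],
    enrichment_steps=["normalize-units", "deduplicate"],
)
```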
To manage the ingested information, an additional workspace was established for the data steward. It controls all data flows, defining the logical rules for data quality, data enrichment, and acceptance of results. Each data flow is monitored and processed in terms of categories, regions, and languages, and any additional data improvements can be configured in the processing template for each source. The workspace also includes a historical data monitor, allowing the steward to track each data flow across the whole processing period.
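To make the idea of steward-defined quality rules concrete, here is a minimal sketch of how an acceptance check might be expressed. The required fields and record shape are assumptions for illustration only:

```python
# Hypothetical acceptance rules a data steward might configure for a data flow;
# the required fields and record shape are illustrative assumptions.
REQUIRED_FIELDS = ("title", "body", "category", "region", "language")

def passes_quality_rules(record: dict) -> bool:
    """Accept a record only if every required field is present and non-empty."""
    return all(record.get(name) for name in REQUIRED_FIELDS)

rejected = {"title": "Wheat prices rise", "body": "", "category": "market-prices"}
assert not passes_quality_rules(rejected)  # empty body fails the check
```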
All processed and validated data is published to the client's agricultural portal following specified categories, rules, and relevant cross-references within the portal.
The Data Content Hub was designed as a modular, distributed solution to enable scalable and efficient data management and processing across multiple components. This architecture provides the flexibility to integrate and expand along with the evolving source structure and content flows of the data ecosystem.
The diagram below provides a logical view of the Data Content Hub model as a centralized solution for the consolidation and processing of any data source:
The solution was developed with a cost-effective approach, leveraging the database services of Oracle Cloud Infrastructure and the AI Translator service of Microsoft Azure. A specialized SaaS platform is used as the extraction tool to fetch the data and prepare it in a format ready for ingestion into the Data Content Hub. All processed and prepared data is hosted in the cloud-based database, and the essential information is transmitted to the external agricultural portal's own hosting platform.
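For context, connecting an ingestion job to an Oracle Autonomous Database instance typically looks like the following python-oracledb sketch. The schema name, wallet paths, and credentials are placeholders:

```python
import oracledb

# Minimal python-oracledb (thin mode) connection to an Autonomous Database;
# the schema, wallet paths, and passwords below are placeholders.
conn = oracledb.connect(
    user="CONTENT_HUB",                  # hypothetical schema name
    password="********",
    dsn="contenthub_high",               # TNS alias from the ADB wallet
    config_dir="/opt/oracle/wallet",     # directory containing tnsnames.ora
    wallet_location="/opt/oracle/wallet",
    wallet_password="********",
)
with conn.cursor() as cur:
    cur.execute("SELECT sysdate FROM dual")
    print(cur.fetchone()[0])
```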
After the data is extracted from a source, it is uploaded to the cloud database for further processing, enrichment, and analysis. From there, the data goes through the following steps:
Within the cloud database, all data is first loaded into raw storage and, based on the existing rules and source-behavior templates, prepared for further processing.
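A landing step of this kind could be as simple as the following sketch; the raw_content table and its columns are assumptions, not the actual Content Hub schema:

```python
import json

# Sketch of landing one extracted record in raw storage; the raw_content
# table and its columns are assumptions, not the actual Content Hub schema.
def land_raw_record(conn, source_id: str, payload: dict) -> None:
    with conn.cursor() as cur:
        cur.execute(
            """INSERT INTO raw_content (source_id, ingested_at, payload)
               VALUES (:source_id, SYSTIMESTAMP, :payload)""",
            source_id=source_id,
            payload=json.dumps(payload),
        )
    conn.commit()
```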
In the next steps, the data is categorized and labeled based on source content insights. To avoid data discrepancies, enrichment tools are applied, including language unification using the Azure AI Translator.
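The language-unification step maps naturally onto the Azure AI Translator v3 REST endpoint. The sketch below shows a minimal call, with the subscription key and region as placeholders:

```python
import requests

# Minimal call to the Azure AI Translator v3 "translate" endpoint;
# the subscription key and region are placeholders.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def translate_to_english(text: str, key: str, region: str) -> str:
    resp = requests.post(
        ENDPOINT,
        params={"api-version": "3.0", "to": "en"},
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Ocp-Apim-Subscription-Region": region,
            "Content-Type": "application/json",
        },
        json=[{"Text": text}],
    )
    resp.raise_for_status()
    return resp.json()[0]["translations"][0]["text"]
```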
All information is processed, unified, prepared, and stored as historical data, which is then used when processing the next batch of incoming data from the sources.
To enable process control and metadata adjustments, a UI is provided for the data steward.
Additional analysis dashboards were prepared to monitor the quality of data flows.
Finally, the prepared data is adjusted and cross-linked with additional content based on the target portal's structure and then automatically published to the portal.
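The publishing step itself depends on the portal's private API, so the following is only a hedged sketch with a hypothetical endpoint, token, and payload shape:

```python
import requests

# Hedged sketch of the final publishing step; the portal endpoint, token,
# and payload shape are hypothetical, since the target portal's API is not public.
PORTAL_API = "https://portal.example.com/api/articles"   # placeholder URL

def publish(article: dict, token: str) -> None:
    resp = requests.post(
        PORTAL_API,
        headers={"Authorization": f"Bearer {token}"},
        json=article,  # prepared record with categories and cross-references
    )
    resp.raise_for_status()
```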
All these steps are automated and supported by the data steward's activity, establishing a flexible and controlled data lifecycle within the data ecosystem.