Data integrity for AI success


Your AI tools will fail without trusted data. Dr Tendü Yoğurtçu at Precisely explains why, and how to fix the problem.

With artificial intelligence (AI) rapidly disrupting the business landscape, organisations are now looking to AI tools to fuel innovation, improve decision-making, and facilitate productivity. In fact, according to Forbes, the current UK AI market is worth over £16.8 billion, and is predicted to grow to £801.6 billion by 2035.

Businesses don’t trust their data to power AI models

Despite significant use of AI across the business world, not all organisations are ready to reap its benefits. In fact, recent reports show that only four percent of organisations believe that their data is ‘AI-ready’, casting doubt on the reliability and accuracy of AI already in operation.

Organisations that are not correctly preparing their data for AI initiatives are left vulnerable to a wide array of potential issues, and we’ve seen the unintended outcomes of this play out via news headlines during the past 12 months.

A recent example is Google’s Gemini AI tool, which, due to a lack of sufficiently rigorous testing, produced a range of controversial images of historical figures with incorrect ethnic backgrounds. This led to a significant backlash, forcing Google to issue a public apology.

Data bias also creates AI bias

Although AI is sometimes perceived as objective, the reality is that organisations need to follow a business-outcome-driven approach, and the dependability of AI outputs is inextricably linked to the trustworthiness of the data that feeds them. For example, if there is a lack of representation in the data powering an AI model, then it is highly likely to start producing biased outputs.
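As a rough illustration of what checking for "a lack of representation" can look like in practice, the sketch below measures each group's share of a dataset and flags any group that falls under a chosen threshold. The function name, the `group` attribute and the 20 percent threshold are assumptions made for this example, not a recommended standard.

```python
from collections import Counter


def representation_report(records, attribute, min_share=0.10):
    """Return each group's share of the records and flag groups below min_share."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {
        group: {
            "share": count / total,
            "under_represented": count / total < min_share,
        }
        for group, count in counts.items()
    }


# Hypothetical training sample: 18 records from group A, 2 from group B.
sample = [{"group": "A"}] * 18 + [{"group": "B"}] * 2
report = representation_report(sample, "group", min_share=0.20)

for group, stats in report.items():
    print(group, round(stats["share"], 2), stats["under_represented"])
```

A check like this only sees groups that appear at least once; a group that is missing from the data entirely would need to be compared against an expected list.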

We like to think of data as being something that’s factual, or even impartial, but the truth is that human biases can create data biases too.

This is already producing real-world problems, such as inequities in healthcare provision and facial recognition software that is less effective at identifying women of colour. In fact, after testing the AI tools created by OpenAI and Meta, UNESCO concluded that there is "unequivocal evidence of prejudice against women" in the large language models (LLMs) that fuel AI.

For organisations to maximise the potential of AI, they must ensure that the data fuelling it has the utmost integrity – meaning data is accurate, consistent, and has context. The path to achieving this can be broken down into three main steps: integrating existing data across all environments, applying a robust approach to data governance and quality, and leveraging location intelligence and data enrichment to uncover hidden insights.

Step One: Integrate critical datasets across different systems

Large organisations usually leverage several, and often disjointed, environments to host critical data relating to customers, prospects, vendors, inventory, employees and more. Many companies, particularly in the financial services industry, also rely on mainframe systems to store sensitive information.

These systems are highly reliable and secure, but it can be challenging for businesses to effectively integrate complex mainframe data into the cloud environment where AI is being managed.

To improve the reliability and trustworthiness of AI outcomes, organisations need to start by breaking down data silos and integrating critical data across cloud, on-premises, and hybrid environments as well as across business silos. This helps to bring together relevant data, for example the customer demographics being served by the business, or information across all countries of operation.

Building a holistic view of the data ultimately gives AI models a more comprehensive understanding of the patterns and trends it contains, producing more reliable and well-informed results.
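To make the idea of a holistic view concrete, here is a minimal sketch that merges records from two hypothetical systems (a CRM export and a mainframe extract) into one record per customer. The field names, the shared `customer_id` key and the "first non-empty value wins" rule are all assumptions for illustration; real integration also has to deal with format conversion, mismatched keys and conflicting values.

```python
def integrate(*sources, key="customer_id"):
    """Merge records from several systems into one unified record per key.

    Earlier sources take precedence; later sources only fill in fields
    that are still missing or empty.
    """
    unified = {}
    for source in sources:
        for record in source:
            merged = unified.setdefault(record[key], {})
            for field, value in record.items():
                if value is not None and merged.get(field) is None:
                    merged[field] = value
    return unified


# Hypothetical extracts from two separate systems.
crm = [{"customer_id": 1, "name": "Ada", "country": None}]
mainframe = [
    {"customer_id": 1, "country": "UK", "balance": 1200},
    {"customer_id": 2, "country": "DE", "balance": 300},
]

view = integrate(crm, mainframe)
print(view[1])
```

Customer 1 ends up with a name from one system and a country and balance from the other, which neither silo could provide alone.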

Step Two: Build data quality and governance frameworks

Although data integration allows AI to be trained on a more complete picture of an organisation’s data, if the data itself is of poor quality – whether due to being inaccurate, outdated, incomplete, inconsistent, or irrelevant – it’s still highly unlikely that the insights an AI model produces will be trustworthy.

AI initiatives require a dedicated approach to data quality to ensure they’re using data that is accurate, consistent, and fit for purpose. This requires proactive core data quality and business rules, automated validation and cleansing, and AI-powered data observability. Together, these allow businesses to quickly identify and address any anomalies in datasets that may cause issues downstream if left unaddressed.
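The following sketch shows one simple way that business rules and automated validation can work together: each field has a rule, and records that break any rule are quarantined with the reasons attached rather than passed downstream. The rules, field names and the `partition` helper are hypothetical examples, not a complete data quality framework.

```python
from datetime import date

# Hypothetical business rules: one check per required field.
RULES = {
    "email": lambda v: isinstance(v, str) and "@" in v,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "updated": lambda v: isinstance(v, date) and v <= date.today(),
}


def validate(record, rules=RULES):
    """Return the names of fields that are missing or break their rule."""
    return [
        field
        for field, is_valid in rules.items()
        if field not in record or not is_valid(record[field])
    ]


def partition(records, rules=RULES):
    """Split records into clean ones and quarantined (record, failures) pairs."""
    clean, quarantined = [], []
    for record in records:
        failures = validate(record, rules)
        if failures:
            quarantined.append((record, failures))
        else:
            clean.append(record)
    return clean, quarantined


records = [
    {"email": "ada@example.com", "age": 36, "updated": date(2024, 1, 5)},
    {"email": "not-an-email", "age": 150},
]
clean, quarantined = partition(records)
print(len(clean), quarantined[0][1])
```

Keeping the failure reasons alongside each quarantined record makes it easier to trace an anomaly back to its source instead of silently dropping it.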

Data governance also plays a critical role by helping to maintain the privacy and security of data being used in AI. Data must be monitored to ensure compliance with privacy and security regulations related to handling personally identifiable information (PII).

Data access and usage should also be tracked through data governance to ensure that data is being used for its intended purposes. Governance establishes confidence in an organisation’s data by ensuring that AI models have access to all necessary information, and that the data is being used ethically and responsibly. When used in this way, data governance ultimately becomes the foundation of AI governance for the organisation.
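As a minimal sketch of these two governance ideas, the example below pseudonymises fields classified as PII before a record is released to an AI pipeline, and logs every access request along with its stated purpose. The field classification, the list of approved purposes and the function names are assumptions for illustration. An unsalted hash is used only to keep the example short and would not be adequate protection on its own.

```python
import hashlib
from datetime import datetime, timezone

PII_FIELDS = {"name", "email"}        # hypothetical PII classification
ALLOWED_PURPOSES = {"churn_model"}    # hypothetical approved uses
access_log = []


def pseudonymise(value):
    """Replace a value with a short, stable token (illustrative only)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def release_for_ai(record, user, purpose):
    """Log the request, enforce the purpose check, and mask PII fields."""
    granted = purpose in ALLOWED_PURPOSES
    access_log.append({
        "user": user,
        "purpose": purpose,
        "granted": granted,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not granted:
        raise PermissionError(f"purpose not approved: {purpose}")
    return {
        field: pseudonymise(value) if field in PII_FIELDS else value
        for field, value in record.items()
    }


customer = {"name": "Ada", "email": "ada@example.com", "plan": "gold"}
released = release_for_ai(customer, user="analyst1", purpose="churn_model")
print(released["plan"], len(access_log))
```

Logging the request before the purpose check means that denied attempts are recorded too, which is usually what an audit needs.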

Step Three: Leverage external data to reduce bias

Having complete and accurate data is imperative for reliable AI outputs. However, without context, AI models are still vulnerable to biases and can lack the nuance required for precise decision-making and predictive modelling.

Organisations can greatly increase the diversity of their data, and reveal hidden patterns that may otherwise have been missed, by enriching it with trustworthy third-party datasets and geospatial insights. This can include points of interest data, demographics data, detailed address information, and environmental risk factors such as the likelihood of natural disasters.

Take, for example, the rise of record-setting weather and wildfire events. It’s never been more important for underwriters to have as much context as possible to accurately assess risk and price policies. Hyper-accurate geospatial insights and data enrichment help make that possible.
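In code, enrichment of this kind often comes down to joining internal records to an external reference dataset on a shared location key. The sketch below attaches risk attributes to an insurance policy by postcode. The postcodes, risk fields and values are invented for the example, and a real geospatial join would typically rely on verified addresses and coordinates rather than a simple lookup.

```python
# Hypothetical third-party reference data keyed by postcode.
RISK_BY_POSTCODE = {
    "AB1 2CD": {"flood_risk": "high", "wildfire_risk": "low"},
    "EF3 4GH": {"flood_risk": "low", "wildfire_risk": "medium"},
}


def enrich(policy, reference=RISK_BY_POSTCODE):
    """Return a copy of the policy with external risk attributes attached.

    The 'enriched' flag records whether a match was found, so unmatched
    policies can be reviewed instead of being treated as low risk.
    """
    key = policy["postcode"].strip().upper()
    extra = reference.get(key)
    return {**policy, **(extra or {}), "enriched": extra is not None}


policies = [
    {"policy_id": "P-1", "postcode": " ab1 2cd "},
    {"policy_id": "P-2", "postcode": "ZZ9 9ZZ"},
]
enriched = [enrich(policy) for policy in policies]
print(enriched[0]["flood_risk"], enriched[1]["enriched"])
```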

Recent years have also seen a huge increase in customer expectations for personalised communications and tailored services across industries. Enriching customer data helps companies to understand consumers at a whole new level – allowing them to stay one step ahead of the competition.

By fuelling AI models with augmented data, organisations can be sure that they are powering the most contextually relevant and reliable outcomes for all kinds of critical decision-making.

Creating a foundation of data integrity for AI success

It seems clear that the use of AI will continue to grow at an almost exponential rate across the business world, driven by access to virtually limitless tools and applications. C-suite discussions will remain dominated by the ways AI can drive innovation while improving productivity and efficiency, keeping it a top priority for investment.

But to truly reap the benefits of AI, organisations need to be sure they can trust the data powering it. By prioritising robust data integrity strategies, businesses can build a foundation of accurate, consistent, and contextual data, ensuring AI outcomes are valuable and reliable.

Tendü Yoğurtçu PhD is chief technology officer at Precisely

Main image courtesy of iStockPhoto.com and Peshkova