AI Talk: Rubbish in, rubbish out - data quality in the age of GenAI

Technology, 13 Nov 2024

On 5 November 2024, AI Talk host Kevin Craine was joined by Diana Kearns-Manolatos, Technology Transformation Research Leader, Deloitte; Pat McGrew, Managing Director, McGrewGroup; and Deyana Petrova, Digital, Growth and Product Head, August Collection.

Views on news

According to "Data Governance in the Age of AI", a June 2024 study from TechTarget’s Enterprise Strategy Group, 70% of organisations said they prioritise data quality and integrity in their AI-driven initiatives. This focus underscores the close link between strong data governance and the success of AI projects. Lack of trust is a major barrier to GenAI adoption, and data quality and workers’ trust in the technology are also critical to deploying and scaling up AI.

Organic and GenAI-generated data must also be distinguished. AI governance is complex, and those taking on this new role must be trained to understand all its nuances. For SMEs, the CRM system, and how data enters and is stored in it, is a logical place to start. Customer data often sits in silos, and when data is inconsistent across silos, the discrepancies need to be resolved and a single source of truth established before any AI deployment.
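Resolving discrepancies across silos can start with something as simple as matching customer records and agreeing which version wins. The Python sketch below is an illustration only: the CustomerRecord fields, the use of the email address as the matching key, the sample data and the "most recently updated record wins" rule are all hypothetical assumptions, not a method described by the panel. Real master-data projects usually need fuzzier matching and field-by-field survivorship rules.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CustomerRecord:
    source: str         # silo the record came from, e.g. "crm" or "billing" (hypothetical)
    email: str          # used here as the matching key (an assumption)
    name: str
    phone: str
    updated_at: datetime


def normalise_email(email: str) -> str:
    """Normalise the matching key so trivially different spellings match."""
    return email.strip().lower()


def find_discrepancies(records: list[CustomerRecord]) -> dict[str, list[str]]:
    """Return, per customer key, the fields whose values disagree across silos."""
    groups: dict[str, list[CustomerRecord]] = {}
    for record in records:
        groups.setdefault(normalise_email(record.email), []).append(record)
    issues: dict[str, list[str]] = {}
    for key, group in groups.items():
        for field_name in ("name", "phone"):
            values = {getattr(r, field_name).strip() for r in group}
            if len(values) > 1:
                issues.setdefault(key, []).append(field_name)
    return issues


def build_single_source_of_truth(records: list[CustomerRecord]) -> dict[str, CustomerRecord]:
    """Keep one 'golden' record per customer; the most recently updated record wins."""
    golden: dict[str, CustomerRecord] = {}
    for record in records:
        key = normalise_email(record.email)
        if key not in golden or record.updated_at > golden[key].updated_at:
            golden[key] = record
    return golden


# Hypothetical sample data from two silos.
records = [
    CustomerRecord("crm", "Ana@Example.com", "Ana Silva", "555-0100", datetime(2024, 3, 1)),
    CustomerRecord("billing", "ana@example.com ", "Ana M. Silva", "555-0100", datetime(2024, 6, 1)),
    CustomerRecord("crm", "bo@example.com", "Bo Lee", "555-0199", datetime(2024, 1, 5)),
]
print(find_discrepancies(records))  # {'ana@example.com': ['name']}
print(build_single_source_of_truth(records)["ana@example.com"].name)  # Ana M. Silva
```

Flagging conflicting fields before merging, rather than silently overwriting them, gives a data owner the chance to decide which silo is authoritative before the merged records feed any AI system.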

It’s equally important to consider which AI use cases are relevant to the company, as well as the data architecture needed to enable those capabilities. Organisations must also invest in data management and data lifecycle programmes to run successful AI deployment projects.

The importance of governance and ethical training

In some sectors, such as healthcare, digital data has been stored with special care. Insurers, for example, would like to get hold of patient data to better manage their risks, which is illegal. Even so, an actuary with the best of intentions may cause a data leak if they are not trained properly. Bringing in external experts can be more effective than relying on internal, self-trained coaches. GenAI-focused ethical training should cover how these models can be trained on data ethically, for example by having users of a tool give their consent before developers use their conversations. Companies can also adopt established frameworks, such as Deloitte’s Trustworthy AI Framework, which has seven layers of trust, starting with transparency and explainability.

It’s key for users who prompt a GenAI model to know what data it has been trained on, as hallucinations increase when it receives a question beyond its contextual understanding. It can also help if the company maintains a library of prompts that have been pre-configured and pre-tested on the model to ensure reliable outputs. Bias will always be present to some extent, so, for now, a human must stay in the loop to keep it to a minimum: even an analyst with a tiny bit of bias can put a business at risk. Bias should be part of not only ethical training but also quarterly reviews. Employees who have worked for the same company for a long time may also develop tunnel vision that prevents them from noticing silos and biases, which makes it harder to trust the company’s own data.
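A library of pre-tested prompts can be as simple as a registry that refuses to hand out any prompt that has not yet been validated against the model. The sketch below assumes a hypothetical PromptLibrary class built on Python's standard string.Template; the "tested" flag stands in for whatever evaluation process a company actually uses and is not drawn from the panel.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class PromptEntry:
    template: Template
    required_fields: frozenset[str]
    tested: bool = False  # flip to True only after the prompt has passed your own evaluation


class PromptLibrary:
    """Hypothetical registry of pre-configured prompts; only pre-tested ones can be used."""

    def __init__(self) -> None:
        self._entries: dict[str, PromptEntry] = {}

    def register(self, name: str, template_text: str, required_fields: set[str]) -> None:
        # New prompts start out untested.
        self._entries[name] = PromptEntry(Template(template_text), frozenset(required_fields))

    def mark_tested(self, name: str) -> None:
        self._entries[name].tested = True

    def render(self, name: str, **values: object) -> str:
        entry = self._entries[name]
        if not entry.tested:
            raise PermissionError(f"prompt {name!r} has not been pre-tested")
        missing = entry.required_fields - set(values)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        return entry.template.substitute(values)


lib = PromptLibrary()
lib.register(
    "summarise_ticket",
    "Summarise this support ticket in at most $max_words words:\n$ticket",
    {"ticket", "max_words"},
)
lib.mark_tested("summarise_ticket")
print(lib.render("summarise_ticket", ticket="Printer jams on page 2.", max_words=50))
```

Gating prompts in code means an unvalidated prompt fails loudly instead of reaching the model; a human still needs to review the outputs for bias.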

Questioning your data always helps, but you must also do everything in your power to make it trustworthy. AI-driven tools can streamline and manage the data quality assurance process, for example by monitoring the organisation’s data infrastructure and surfacing hallucinations and instances of unpredictability.

The panel’s advice

The International Association of Privacy Professionals (IAPP) is a great source of guidance for those who want to improve their knowledge of AI governance.
Consider whether you are using proprietary data, retrieval-augmented generation (RAG) to bring in data from external sources, or multi-agent applications.
Data protection requirements vary widely depending on the use case.
We are in an age when data will tell you the story you want it to tell.
Establish clear boundaries between human responsibility and what the AI is responsible for.
