Ayman Sayed at BMC Software examines the implications of poor-quality data for artificial intelligence and how it can impact businesses and social groups in the real world
The saying “garbage in, garbage out” has proven extremely relevant in the context of artificial intelligence (AI). Without high-quality data, even the most advanced AI systems will falter and produce biased, inaccurate, or meaningless outcomes.
Although AI is making what was once impossible an everyday occurrence at lightning speed, it remains inextricably bound to the data it consumes. Data, consequently, requires unending attention. Organisations cannot be complacent about it, nor grow tired of it – it is paramount to AI’s success.
AI needs data, but what does data need?
Put simply, AI is about making sense of vast amounts of data to derive insights, automate processes, and make predictions. In a sense, machine learning (ML) models, trained on data to recognise patterns and make decisions, are the backbone of AI. And the effectiveness of these models hinges on the diversity, quality, and volume of the data they are fed.
Undoubtedly, access to large datasets is important. However, for that data to be of organisational use, it must be of high quality. Accurate, clean, and relevant data ensures that AI models learn effectively and make reliable predictions – poor-quality data can lead to erroneous conclusions and decisions.
Diverse datasets that accurately represent the target population are invaluable for producing unbiased AI systems. Biases in datasets can perpetuate and even amplify societal biases, creating unfair and potentially harmful outcomes.
For instance, if a bank principally provides loans to customers from higher socioeconomic backgrounds, AI models trained on that data will do the same, denying loans to applicants from lower socioeconomic backgrounds. Frequent reviews and audits of AI systems can identify where bias has crept in, and ethical guidelines for AI development and deployment can help mitigate it and ensure fairness.
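To make that concrete, the sketch below shows one simple check such an audit might include: comparing approval rates across groups in a model’s decisions and flagging large disparities. The group labels, sample decisions, and the four-fifths threshold are illustrative assumptions, not a prescribed standard.

```python
# A minimal sketch of a fairness audit: compare loan-approval rates
# across socioeconomic groups in a model's decisions. Group labels and
# the 0.8 (four-fifths) threshold are illustrative assumptions.
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs, approved in {0, 1}."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact(rates):
    """Ratio of lowest to highest approval rate; well below 0.8 is a red flag."""
    lo, hi = min(rates.values()), max(rates.values())
    return lo / hi if hi else 0.0

decisions = [
    ("higher_socioeconomic", 1), ("higher_socioeconomic", 1),
    ("higher_socioeconomic", 0), ("lower_socioeconomic", 1),
    ("lower_socioeconomic", 0), ("lower_socioeconomic", 0),
]
rates = approval_rates(decisions)
print(rates)                     # per-group approval rates
print(disparate_impact(rates))  # 0.5 here - a disparity worth investigating
```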
Large volumes of data are paramount to AI systems – they facilitate the learning of complex patterns, reduce bias, and enable the production of personalised solutions. Amid the rapidly changing landscape of the contemporary digital world, the relevance of data can diminish quickly. Datasets must therefore remain up to date to ensure accurate and effective AI-driven predictions and insights. This requires a healthy infrastructure.
For example, in the retail industry, real-time or near real-time data is critical. When products are purchased in-store or online, that data must be captured instantly so that product inventory stays up to date. This prevents customers from waiting on items that were never in stock, or from trying to buy something in-store that was only sold online.
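A minimal sketch of that idea follows, with hypothetical SKUs and channel names: every sale, in-store or online, flows through one handler that updates a shared stock count immediately, so no channel can promise an item that is not actually available. The in-memory dictionary stands in for whatever shared store a real deployment would use.

```python
# Minimal sketch of channel-agnostic, near real-time inventory updates.
# SKUs, quantities, and channel names are hypothetical; the in-memory
# dict stands in for a shared database, cache, or event stream.
import threading

_inventory = {"SKU-123": 10, "SKU-456": 3}
_lock = threading.Lock()

def record_sale(sku: str, quantity: int, channel: str) -> bool:
    """Apply a sale from any channel ("store", "online", ...) atomically.

    Returns False rather than letting stock go negative, so customers
    are never left waiting on items that were never in stock.
    """
    with _lock:
        on_hand = _inventory.get(sku, 0)
        if on_hand < quantity:
            return False  # reject: this sale would oversell the item
        _inventory[sku] = on_hand - quantity
        return True

print(record_sale("SKU-456", 2, "online"))  # True  - 1 unit left
print(record_sale("SKU-456", 2, "store"))   # False - only 1 unit left
```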
Handling data is foundational
It’s not unusual to see data scattered throughout multiple systems and departments within a single organisation. These silos can impede the utilisation of data and hinder AI initiatives. Organisations can remedy this by implementing data integration tools and platforms that facilitate seamless data flow and accessibility.
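As a toy illustration of what such integration means in practice, the sketch below joins records from two hypothetical departmental systems on a shared customer ID, producing the single unified view an AI initiative would need. Real integration platforms do this at scale with connectors, schema mappings, and change-data capture.

```python
# Toy sketch of integrating two departmental silos on a shared key.
# System names and fields are hypothetical.
crm = {"C-001": {"name": "A. Smith", "segment": "retail"}}
billing = {"C-001": {"balance": 120.50}, "C-002": {"balance": 0.0}}

def unify(*silos):
    """Merge records from each silo into one view per customer ID."""
    merged = {}
    for silo in silos:
        for key, record in silo.items():
            merged.setdefault(key, {}).update(record)
    return merged

print(unify(crm, billing))
# {'C-001': {'name': 'A. Smith', 'segment': 'retail', 'balance': 120.5},
#  'C-002': {'balance': 0.0}}
```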
Inaccurate, inconsistent, and incomplete data compromises the integrity of AI models. Ensuring data quality through rigorous validation and cleaning processes is a constant challenge. Different date formats are a simple example of poor-quality data – whether it is 01/10/2024 or October 1, 2024, this type of inconsistency can lead to significant data analysis errors.
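To make the date example concrete, here is a hedged sketch of the cleaning step involved: normalising a few common formats to ISO 8601 before the data reaches a model. The list of candidate formats is an assumption – 01/10/2024 parses differently under day-first and month-first conventions, so a real pipeline must pin down which convention each data source uses.

```python
# Minimal sketch of normalising inconsistent date strings to ISO 8601
# (YYYY-MM-DD). The candidate formats are assumptions; a day-first
# convention is assumed for slash-separated dates.
from datetime import datetime

FORMATS = ["%d/%m/%Y", "%B %d, %Y", "%Y-%m-%d"]

def normalise_date(raw: str) -> str:
    """Try each known format in turn; fail loudly on anything unrecognised."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {raw!r}")

for value in ["01/10/2024", "October 1, 2024", "2024-10-01"]:
    print(value, "->", normalise_date(value))  # all normalise to 2024-10-01
```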
The Personal Information Protection and Electronic Documents Act (PIPEDA), the EU’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the upcoming Digital Operational Resilience Act (DORA), and other data protection regulations hold organisations accountable for their handling of personally identifiable information (PII) and overall data privacy and compliance. Ensuring that data collection, retention, and usage practices adhere to these regulations is critical.
Although organisations often want to build the plane while flying it, they need the necessary data governance framework first – defining data standards, policies, and procedures, and assigning roles and responsibilities for data management. Resources can be found through the Data Governance Institute, the EDM Council, the MITRE Corporation, and Gartner.
Organisations need scalable solutions to store, process, and analyse data efficiently. Modernising data infrastructure to handle the complexity and scale of big data is critical. That infrastructure can span from on-premises to cloud to a hybrid of both and must factor in compute power, security, and recovery considerations.
AI needs the right preparations
Organisations need to cultivate a culture that values data and understands the power of data-driven decision-making – people are a vital variable in the data equation. Business leaders need to catalyse their employees’ willingness to embrace this kind of culture through effective training. It can open their eyes to the benefits of data literacy, data sharing, and incorporating data analytics into everyday business processes.
As we continue to understand the capabilities of AI, the focus must remain on the data that underpins it. Diverse, high-quality, and well-governed data is the cornerstone of effective AI systems – it keeps businesses moving. By adopting strategic approaches and addressing the challenges in data management, organisations can unlock the true power of AI, driving efficiency, innovation, and growth.
AI is undoubtedly shaping the business and consumer worlds alike. However, it is easy to forget how intrinsically linked AI’s success is to the quality of the data it leverages. When data structures and systems are perfected, the heights AI is capable of reaching are virtually unbounded.
Ayman Sayed is CEO of BMC Software