Are we on the cusp of digitally transforming data retrieval beyond recognition?
Whether it’s car keys, TV remotes, pets or mislaid spectacles (tip: you’re wearing them), people spend a lot of time searching for all sorts of things – both at home and at work.
While we all know that the best antidote to this is being tidy and organised, many find that’s easier said than done. According to a study commissioned by IKEA, we spend almost 5,000 hours (about 208 days) of our lives looking for things around the home.
Although losing so many of these precious days to rummaging for misplaced physical items sounds like a genuine waste, knowledge workers, according to McKinsey, compound this even further, with 9.3 hours per week spent on finding and gathering information at work.
McKinsey’s report is a decade old, but the fact that it’s still extensively quoted and its figures have remained relevant shows that enterprise search hasn’t had its transformational moment yet.
Folders – both in their physical and digital formats – have always been key to making data searchable. What lies at the core of the digital data management challenge, though, is processing all the unstructured information that employees and clients leave behind in their digital trails, and gleaning valuable and actionable insights from it across the different functions of the business.
Folding your data lakes into a unified database
If you want to spruce up a messy room, you must sort through the objects and categorise them into containers. A process analogous to this happens when a large language model (LLM) cleans and categorises unstructured text.
In order to “structure” Word documents, email messages, PowerPoint presentations, survey responses, meeting write-ups or posts from blogs and social media, the text is first segmented into tokens – essentially words, or short sequences of them – which are then subjected to various linguistic analysis techniques.
The outcome of this processing is a relational database in which the tokens extracted from the unstructured text are indexed and can be treated as structured data. Video and audio can be converted into indexed data too, using technology specific to those formats, such as computer vision and voice recognition.
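As a toy illustration of this tokenise-and-index step, the sketch below segments two hypothetical documents into word tokens and stores them in a relational (SQLite) table that can then be queried as structured data. Real pipelines apply far richer linguistic analysis, and the sample documents and table schema here are assumptions for the example.

```python
import re
import sqlite3

def tokenize(text):
    """Segment text into lowercase word tokens (a deliberately simple scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())

# Hypothetical "unstructured" documents: a survey response and a meeting write-up
docs = {
    1: "Quarterly survey responses show customers want faster support.",
    2: "Meeting write-up: support tickets rose after the product launch.",
}

# Index the tokens in a relational table so the text becomes queryable
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token_index (token TEXT, doc_id INTEGER, position INTEGER)")
for doc_id, text in docs.items():
    for pos, token in enumerate(tokenize(text)):
        conn.execute("INSERT INTO token_index VALUES (?, ?, ?)", (token, doc_id, pos))

# The formerly unstructured text can now be searched like structured data
rows = conn.execute(
    "SELECT DISTINCT doc_id FROM token_index WHERE token = ?", ("support",)
).fetchall()
print(sorted(r[0] for r in rows))  # → [1, 2]
```

A query for “support” now finds both documents instantly, without scanning the raw text.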
Considering that structured data only accounts for 10 per cent or less of the quintillion bytes of data produced globally per day, the automation of unstructured data processing is revolutionary in itself. Intelligent data processing can bring about a step-change for data analysts too, who, according to research, currently spend 60 per cent of their time cleaning and organising data.
Data analysis is a job that has always had a strong monotonous, manual component – at least until now. With today’s AI-based tools, analysts can spend the time automation frees up on more valuable tasks, such as modelling data.
A right sort
Transforming this unstructured data into entirely new, neat databases can be a daunting prospect, but the integration of a business’s existing databases has its own challenges too.
To create a single source of enterprise truth, a business must either integrate its current data systems, or migrate that data to a single platform.
Both methodologies have their advantages. While migrating data to a cloud platform comes with some disruption to business operations, integrating different types of databases with specialised software components, called “connectors”, across functions is more of an ongoing exercise.
Irrespective of which data centralisation path a business chooses, synchronising and harmonising data across different systems is a pivotal part of the process.
Data must also be cleansed to eliminate ROT (redundant, obsolete and trivial) data to prevent it from clogging up the unified data search system.
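The ROT-cleansing step can be sketched in a few lines. The records, the staleness cut-off and the minimum-length threshold below are all assumptions for illustration; real cleansing tools apply much more sophisticated rules.

```python
from datetime import date

# Hypothetical records pulled from several systems before unification
records = [
    {"id": 1, "text": "Q3 revenue report", "updated": date(2024, 5, 1)},
    {"id": 2, "text": "Q3 revenue report", "updated": date(2024, 5, 1)},   # redundant duplicate
    {"id": 3, "text": "2015 holiday rota", "updated": date(2015, 12, 1)},  # obsolete
    {"id": 4, "text": "ok", "updated": date(2024, 6, 1)},                  # trivial
]

CUTOFF = date(2020, 1, 1)   # assumption: anything older counts as obsolete
MIN_LENGTH = 5              # assumption: shorter entries count as trivial

def cleanse(rows):
    """Drop redundant, obsolete and trivial (ROT) records before indexing."""
    seen, kept = set(), []
    for row in rows:
        if row["updated"] < CUTOFF:        # obsolete
            continue
        if len(row["text"]) < MIN_LENGTH:  # trivial
            continue
        if row["text"] in seen:            # redundant
            continue
        seen.add(row["text"])
        kept.append(row)
    return kept

clean = cleanse(records)
print([r["id"] for r in clean])  # → [1]
```

Only the first copy of the current, substantive record survives, keeping the unified index lean.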
Grounding LLMs in enterprise data
While large language models are instrumental to making enterprise search more efficient, they can also affect the system’s reliability.
Although there are a number of methods to ensure that an LLM’s search results remain relevant for enterprise search – and for domain-specific data in general – the most common is retrieval-augmented generation (RAG). This method allows private and proprietary data from a business’s walled garden – a closed ecosystem with limited, controlled access to external systems – to be used by LLMs as raw material for generating a response.
What RAG does is enhance the accuracy of LLMs by integrating an enterprise database as external knowledge into the answer generation process. As long as LLMs’ tendency to hallucinate is curbed by grounding them in enterprise data, they can improve enterprise search in several ways. They enable a more effective retrieval of information by focusing on semantic meaning rather than keywords, which means that they can handle synonyms, misspellings and fuzzy searches efficiently.
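The two RAG steps – retrieve relevant enterprise documents, then ground the model’s answer in them – can be sketched as below. For self-containment this uses a toy bag-of-words similarity; production RAG systems use learned embedding models, and the documents and prompt wording here are assumptions for the example.

```python
import math
import re
from collections import Counter

# Hypothetical private documents inside the business's walled garden
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The holiday calendar for 2024 lists offices closed on 25 December.",
    "Expense claims must be filed through the finance portal by month end.",
]

def vectorize(text):
    """Toy bag-of-words vector; real systems use semantic embeddings."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Retrieval step: find the k documents most similar to the query."""
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Augmentation step: ground the LLM by prepending retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is the refund policy?", documents)
print(prompt)
```

The assembled prompt, containing the retrieved refund-policy document, would then be passed to the LLM, constraining its answer to the enterprise data rather than its general training corpus.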
As mentioned earlier, they can also help automate indexing unstructured data. Although these two features are real game-changers, what will perhaps strike users most is how LLMs change the way they interact with enterprise search tools. LLMs create an interface where an employee’s query formulated in natural language is then translated into structured query language (SQL) used for managing indexed databases.
Only a couple of years ago, data management professionals were arguing for training all knowledge employees to use SQL, to squeeze more value from data. But with the emergence of today’s gen AI tools, it would be much harder to make a case for this. The reason is that gen AI radically democratises access to data management tools, as – with the help of natural language to SQL converters – knowledge workers can now make data queries without any real SQL knowledge.
NLSQL lets users type queries in plain conversational English, which are then translated into executable SQL code, leveraging its natural language (NL) processing capabilities and machine learning algorithms.
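To make the translation step concrete, the sketch below converts one narrow shape of English question into SQL and runs it against a hypothetical sales table. In a real NL-to-SQL tool an LLM generates the SQL rather than a hand-written pattern; the table, data and question template here are all assumptions for the example.

```python
import re
import sqlite3

# Hypothetical sales table standing in for an enterprise database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120), ("south", 80), ("north", 50)])

def nl_to_sql(question):
    """Toy NL-to-SQL translator for one question shape; production tools
    have an LLM generate the SQL instead of matching a fixed pattern."""
    m = re.match(r"what is the total (\w+) for the (\w+) region", question.lower())
    if not m:
        raise ValueError("question shape not supported in this sketch")
    column, region = m.groups()
    return f"SELECT SUM({column}) FROM sales WHERE region = '{region}'"

sql = nl_to_sql("What is the total amount for the north region?")
total = conn.execute(sql).fetchone()[0]
print(sql)    # SELECT SUM(amount) FROM sales WHERE region = 'north'
print(total)  # → 170
```

The employee never sees the SQL: they ask a plain-English question and receive the aggregated figure back.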
Gen AI-powered enterprise search engines can also answer more personalised queries, based on the user’s behaviour, previous searches and a running project’s context – a feature called relevance tuning or relevance optimisation.
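One simple way relevance tuning might work is to nudge base search scores using the user’s recent activity. The scores, history and boost factor below are assumptions for illustration; real systems learn these signals rather than hard-coding them.

```python
# Hypothetical base search scores for candidate results
base_scores = {"budget-2024.xlsx": 0.70, "roadmap.pptx": 0.68, "policy.pdf": 0.60}

# The user's recent activity, used to personalise the ranking
recent_opens = {"roadmap.pptx": 3, "policy.pdf": 1}

BOOST = 0.05  # assumption: score bonus per recent open

def tune(scores, history):
    """Relevance tuning: boost each result by the user's recent behaviour."""
    return {doc: s + BOOST * history.get(doc, 0) for doc, s in scores.items()}

tuned = tune(base_scores, recent_opens)
ranking = sorted(tuned, key=tuned.get, reverse=True)
print(ranking)  # → ['roadmap.pptx', 'budget-2024.xlsx', 'policy.pdf']
```

The document the user has been working with lately rises to the top, even though its base score was lower.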
As these capabilities suggest, the most advanced enterprise search solutions, sometimes referred to as intelligent assistants, are firing on all cylinders, combining all the latest technological advancements.
However, while making the most of advanced digital technologies, such as NLP, semantic search, LLM-based summarisation, computer vision and the like, they also integrate tried-and-tested approaches such as keyword-based searches, relational databases and SQL.
It stands to reason, however, that these tools need to make the most of available technologies, both old and new, if they are even to rival a hypothetical human colleague who has been with the company since its foundation, knows the content of every database by heart, has read, watched and listened to every email and company-related blog and post, and can reel off any of this information in a few seconds.
© 2024, Lyonsdown Limited. Business Reporter® is a registered trademark of Lyonsdown Ltd. VAT registration number: 830519543