Legal tech in 2025: Data, data and more data management


Posted by Andrew Lindsay, general manager at Legal Futures Associate LexisNexis Enterprise Solutions

Lindsay: Law firms need to use their own data to train their LLMs

Last year was when the adoption of generative artificial intelligence (AI) in the legal world became a certainty. Even the staunchest sceptics now recognise that it is here to stay. But was it also the year that the AI ‘hype bubble’ burst?

The initial excitement and DIY approach to AI have given way to the need to demonstrate a tangible return on investment. The Boston Consulting Group recently reported that 74% of companies struggle to achieve and scale value when adopting AI, and legal providers are taking note: they are only considering AI solutions that deliver demonstrable business value to their firm.

Data strategy and management should be the number one priority

Consequently, it’s safe to say that a robust data strategy for continuous and thorough data management should be the number one focus for law firms in 2025.

Today, it is widely recognised that the promise of AI, and more specifically generative AI, relies almost entirely on the quality and integrity of the data that large language models (LLMs) are fed.

Research shows that technology companies are predicted to exhaust publicly available data for LLM training as early as 2026. So, here’s the rub, especially for those firms who are tempted to buy ‘sexy’ systems that say ‘AI’ on the tin as a short-cut to AI adoption – law firms need to start thinking about their own data source, and how they can glean value from it before they even attempt to gain efficiencies from AI exploiting internal knowledge.

Let’s face it, no law firm can deliver better advice and ‘lawyering’ simply by using tools trained on publicly available data (e.g. ChatGPT). Real value is derived when firms use their own data to train their LLMs.

A legal practice’s invaluable knowledge and expertise resides in the data held within its systems, so it makes sense that, to truly extract value from AI technology, using their own data warehouse to power it is essential – in addition, of course, to external private and proprietary sources of data, whose quality, reliability, and integrity are proven.

Untangling data – a messy affair

However, if we are honest, many lawyers’ data houses are not necessarily in ‘order’, and the task of solving the data management problem can undoubtedly be difficult and messy.

While such projects are unlikely to be exciting, they are nevertheless essential, not only because of the promise the future of AI holds, but also because of the increasing client and legislative demands weighing on modern legal practitioners. Law firms are therefore better off having a well-developed data strategy that is ‘well on the way’ to implementing AI, rather than treating data cleansing and management as ‘tomorrow’s job’.

The garage analogy is a fitting comparison. Picture a garage that, for the best part of its existence, has been filled with ‘stuff’, with the door lowered to hide the clutter from view. Sorting through it, deciding what to discard or retain, and then organising the useful items is a daunting, drawn-out and potentially painful exercise.

What should be thrown away? What are the decision-making criteria? Should the whole garage be organised in one go, or should the process be staggered? However arduous the task, it is better to consider these factors before decluttering to make sure nothing of value is lost.

As a result, a plan of action can be determined, and the garage will finally be in a fit state to house the shiny new car and qualify for the reduced insurance premium that comes with keeping it in a locked garage: a win-win!

Considerations for data management

Today, firms have disparate, disorganised and duplicated (even triplicated) data across various formats – Word, Excel, Outlook, PMS/CMS/CRM systems, and more. A key reason for this is that, in most firms, digital transformation has occurred gradually over the years, often by converting hard copies into digital files.

A careful process of identifying the best, most representative documents to use for training, rather than just feeding all available documents into the model, is crucial.

Given the colossal volume of data residing in law firms, they must conceptualise and build a data framework to collect, store, curate and manage that data so that every piece is held only once. This data normalisation is important to ensure data quality and integrity.
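To make the ‘held only once’ idea concrete, here is a minimal Python sketch of one possible way to spot copies of the same document stored in different systems. It is an illustration only: the file names, the sample text and the helper functions `content_fingerprint` and `group_duplicates` are hypothetical, and it only catches copies whose text is identical after normalising case and whitespace, not near-duplicates or edited versions.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash the text after normalising case and whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def group_duplicates(documents: dict[str, str]) -> list[list[str]]:
    """Return groups of document names whose normalised content is identical."""
    groups = defaultdict(list)
    for name, text in documents.items():
        groups[content_fingerprint(text)].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical sample: the same lease saved in two different systems.
documents = {
    "dms/lease_v3.docx": "This Lease is made between the Landlord and the Tenant.",
    "email/attachment_0412.docx": "This  lease is made between the landlord and the tenant.",
    "cms/engagement_letter.docx": "We are pleased to confirm our engagement terms.",
}

duplicate_groups = group_duplicates(documents)
print(duplicate_groups)
```

In this sample, the two lease files fall into one group, which a firm could then review to decide which copy becomes the single retained record.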

Routinely, some files may be drafts, outdated versions, or not representative of the firm’s best practices. If these lower-quality documents are used to train the LLM, it could lead to the model producing biased or inaccurate outputs. So, law firms then must determine which data is trustworthy and appropriate for training AI models.
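As a rough illustration of that selection step, the Python sketch below filters a document inventory down to training candidates. Everything in it is an assumption made for the example: the `DocumentRecord` fields, the ‘final’ status label and the review cut-off date are hypothetical, and each firm would need to define its own criteria for what counts as trustworthy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentRecord:
    name: str
    status: str            # hypothetical labels: "draft" or "final"
    superseded: bool       # True if a newer version exists
    last_reviewed: date


def select_training_candidates(
    records: list[DocumentRecord], cutoff: date
) -> list[DocumentRecord]:
    """Keep final, current documents reviewed on or after the cut-off date."""
    return [
        record
        for record in records
        if record.status == "final"
        and not record.superseded
        and record.last_reviewed >= cutoff
    ]


# Hypothetical inventory for illustration.
records = [
    DocumentRecord("share_purchase_agreement_final", "final", False, date(2024, 3, 1)),
    DocumentRecord("share_purchase_agreement_draft2", "draft", False, date(2024, 2, 1)),
    DocumentRecord("nda_template_2015", "final", True, date(2015, 6, 1)),
]

candidates = select_training_candidates(records, cutoff=date(2020, 1, 1))
print([record.name for record in candidates])
```

Only the current, final agreement survives the filter here; the draft and the superseded template are held back from training.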

The firm’s data strategy must be driven by the business need and its timeline for AI adoption. Of course, the ultimate vision has to be a carefully cleansed and automatically managed data environment, but realistically this goal cannot be realised in one day.

Data strategy and management projects can take anywhere from two to five years to deliver the full return on investment. In the interim, firms must decide what data they need immediately for training the AI models so that the AI adoption vision can progress.

Part of the data strategy must also be determining, or even taking a stand on, who actually owns the data that will be used to train the LLMs – the law firm or its clients?

Some food for thought: A client instructs the firm to act on its behalf. The firm uses its knowledge, expertise, and experience to process the legal case, delivering an outcome and result for the client. Thereafter, the firm anonymises the client files/data, removing the client-specific information to the extent that it is not possible to identify the client.
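The simplest version of that anonymisation step can be sketched in Python as below. This is a deliberately naive illustration, assuming the firm already holds a list of known client identifiers; the function `redact_identifiers`, the `[CLIENT]` placeholder and the sample text are all hypothetical. Removing names alone does not guarantee that a client cannot be identified from the remaining detail, so it should be read as a starting point rather than a complete approach.

```python
import re


def redact_identifiers(
    text: str, identifiers: list[str], placeholder: str = "[CLIENT]"
) -> str:
    """Replace each known identifier with a placeholder, ignoring case."""
    # Process longer identifiers first so a full company name is replaced
    # as a whole before any shorter name contained within it.
    for identifier in sorted(identifiers, key=len, reverse=True):
        pattern = re.compile(re.escape(identifier), flags=re.IGNORECASE)
        text = pattern.sub(lambda _match: placeholder, text)
    return text


# Hypothetical sample text and identifiers.
sample = "Acme Holdings Ltd instructed the firm; Acme later settled."
redacted = redact_identifiers(sample, ["Acme", "Acme Holdings Ltd"])
print(redacted)
```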

This experience, collated and accumulated through thousands of such cases and files (in the form of data), is invaluable and, if fed to the LLM, will undoubtedly improve the law firm’s future legal service delivery.

Nonetheless, it is important to define the intended use of this client data from the get-go, ensuring the client is informed even when the data is anonymised, so that no compliance issues arise.

In summary, the law firms that prioritise the development and execution of a realistic and comprehensive data strategy are the ones that will be best placed to support and derive value from their AI initiatives in 2025.

As the AI market matures, partners and business owners will be expecting to see real, tangible return for their continued commitment, creating ‘AI hope’ as opposed to ‘AI hype’, proving the value that AI holds for the future of law… and your business.
