The Use of AI to Assist Experts in the Analysis of Documentary Evidence


By Howard Elliott

This article discusses the difficulties and considerations relevant to developing Artificial Intelligence (AI) tools for expert witnesses to aid in the analysis of documentary evidence. For this article, documentary evidence means documents discovered or otherwise provided for the purpose of legal proceedings, including reports, emails and other business documents (that is, mainly text-based documents).

The article outlines the aspirations experts and AI developers hold for expert evidence documentary analysis tools, before describing the challenges in trying to realise those aspirations and the work it will involve. In doing so, this article aims to provide a picture of the developing legal tech landscape and its progress towards more ‘complete’ AI tools, especially in the realm of document analysis and expert evidence.

The application of AI tools such as ChatGPT, or of other Natural Language Processing (NLP) technologies, to the analysis of documentary evidence in the legal domain presents both promising opportunities and significant challenges. If the challenges can be overcome, the main benefit will be a significant increase in productivity for experts. The article addresses the current reality in which, most of the time, documents are provided as a set of digital files, typically not indexed, and where, in many cases, document management systems are already being used to improve productivity.

What’s the Problem?
Most civil legal cases involve large numbers of documents, often provided through a court direction (e.g. subpoena, notice to produce). These are complex documents with contents semantically bound to the technical subject matter or facts of the proceedings. For example, documents in an IT-related matter may contain language and terms that relate to IT and can only be understood in the full technical context of IT work. Any AI able to read and understand these documents will need to be able to detect and comprehend their contents in the correct technical context.
Moreover, individual documents may be large, in terms of total word count, and hence require significant time and effort to read and understand on the part of the expert and legal teams. Importantly, the interconnectedness between the contents of documents or their significance to the matter may depend on a more detailed consideration of the content and context of the document against the facts and context of the matter. AI for expert evidence will therefore also need to be able to identify the place or importance of document contents to the factual and legal questions of the relevant matter.
A number of tools currently exist (see below) which automate eDiscovery and provide a controlled framework within which a litigation team can review and analyse documents. Some of the tools offer analysis mechanisms that include named entity recognition (i.e. the ability to locate the people, organisations, places and other named items mentioned in a text and to classify them into categories).
The ideal ‘solution’ would involve an expert training an AI tool to read each document and provide analysis of the contents based on (a) the context of the specific legal matter; (b) the context of the document subject matter area; and, (c) the perspective of the expert. The training would be iterative such that the analysis is recursively refined following feedback from the Expert. The result would be a more focused and intuitive analysis function reflecting the perspectives of the expert.
Advantages:
Such a tool would have some clear advantages, all of which relate to improving the productivity of the expert:
- Efficiency: AI can sort through large volumes of data faster than a human can, reducing the time and manpower needed.
- Accuracy: By setting specific parameters, AI tools can identify key information consistently without getting tired or distracted (as human experts naturally do).
- Interconnectedness Detection: AI can potentially identify relationships and patterns within the data that might not be immediately apparent to human analysts. More advanced AI could even present these relationships in more diverse ways, including graphical representations such as network diagrams and thematic maps.
Challenges:
There are some obvious challenges with an AI approach. These include:
  - Complexity: Legal documents often contain language and concepts that are complex and specialised, making it difficult to program an AI tool to understand them fully.
  - Context and Semantics: Legal arguments often rely heavily on context, which can be difficult for an AI tool to grasp. A single sentence can change meaning significantly depending on the surrounding text and/or the subject matter area. This context is multidimensional and often incorporates a wide range of dynamic elements including:
    1. Context of the specific matter – the facts, people, timeline, and events involved, perhaps defined by the relevant filings.
    2. Legal and jurisdictional context – the relevant legislative instruments.
    3. Context of the expert subject matter – for example a particular industry or sector of an industry.
    4. Context of the expert – the specific area of expertise of the expert and their understanding of that subject
  - Transparency and Accountability: An expert must be able to demonstrate how an opinion has been reached. Often this includes taking the audience on a journey of the evidence and how such evidence supports the expert’s opinion. We may struggle to develop an AI which can demonstrate reasoning.
  - Ethical and Legal Concerns: The use of AI in legal decision-making will raise concerns about avoiding bias and maintaining accountability and data privacy.
Practical Approaches:
There are some approaches AI developers could consider when correcting or refining AI technology for the purpose of assisting expert witnesses. They include:
- Training the AI: The AI tool could be trained or fine-tuned using a set of legal documents that are relevant to the case at hand. Ideally these would include the pleadings and the defence. This way, the technology becomes more sensitive to the specific language, concepts, and context specific to certain cases. Further, the knowledge and expertise of the Expert would contribute to the AI knowledge base. That is, the model could be trained to view documents from a particular perspective (that of the expert).
- Query-based Analysis: Experts could interact with the AI system to pose queries related to the case, refining the AI’s understanding and improving the relevance of its output. This is akin to the chatbot model used by tools such as ChatGPT and Bard.
- Annotation and Feedback Loop: Experts could manually annotate a subset of the documents to teach the AI system what to look for. This could also be a part of a feedback loop where the AI learns from the corrections and insights provided by experts.
- Output: The AI system could produce outputs which highlight key findings, indicate confidence levels for those findings, and show how different pieces of evidence are interconnected. The output could represent the relationships in many ways including graphical representations such as network diagrams and thematic maps.
- User Interface: A well-designed interface can facilitate effective human-AI interaction, making it easier for the expert to input data, pose queries, and interpret the AI’s output.
- Transparency and Auditability: The AI tool could be trained to explain its reasoning and processes whenever arriving at an opinion or undertaking analysis. Being able to trace back the AI’s decision-making process is crucial for ethical and legal compliance. This is particularly important where some of the output of the tool is tendered as evidence.
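To make the ideas of interconnected output and auditability a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any existing tool: the function name `co_mention_graph`, the document identifiers and the sample text are all hypothetical, and entity matching is done by plain substring search rather than by a trained model. The point it shows is that every relationship reported can carry the list of documents that support it, so the expert can trace a finding back to its source.

```python
from collections import defaultdict
from itertools import combinations


def co_mention_graph(docs: dict[str, str], entities: list[str]) -> dict[tuple[str, str], list[str]]:
    """Map each pair of entities to the documents in which both are mentioned."""
    edges = defaultdict(list)
    for doc_id, text in docs.items():
        lowered = text.lower()
        # Assumption: a case-insensitive substring match is good enough here.
        present = sorted(e for e in entities if e.lower() in lowered)
        for pair in combinations(present, 2):
            # Record the supporting document so the link can be audited later.
            edges[pair].append(doc_id)
    return dict(edges)


# Hypothetical sample documents and entities.
docs = {
    "email_01": "John Smith asked Jane Doe about Project X.",
    "email_02": "Jane Doe approved the Project X budget.",
    "memo_01": "John Smith is on leave.",
}
graph = co_mention_graph(docs, ["John Smith", "Jane Doe", "Project X"])
for pair, sources in graph.items():
    print(pair, "supported by", sources)
```

The resulting mapping is an edge list, which is the usual input for drawing a network diagram; the drawing step itself is left out of this sketch.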
Document Analysis – what does this mean?
Current models of document analysis follow a general process similar to the following:
Pre-processing is typically the first step and may be used to:
- Correct data errors, which may have arisen as a result of OCR errors. From a legal perspective, cleansing needs to be carefully checked as misspellings and typographic errors may be significant, especially where transcripts and quotes are included.
- Remove noise words such as “the”, “a”, and “to”, which are almost always the terms (tokens) with the highest frequency in a document.
- Stem or lemmatise words, that is, reduce them to their root form (for example “helping” to “help”).
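The pre-processing steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the stop-word list is a small hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a proper stemmer or lemmatiser. Note that the original text is left untouched, which matters in a legal setting where the exact wording of a document may itself be evidence.

```python
import re

# Illustrative stop-word list; a real project would use a fuller, reviewed list.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "was", "with"}

# Naive suffix stripping, standing in for a proper stemmer or lemmatiser.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(token: str) -> str:
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lower-case, tokenise, drop stop words and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


original = "The vendor was helping to test the payment gateways."
tokens = preprocess(original)
print(tokens)  # ['vendor', 'help', 'test', 'payment', 'gateway']
```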
Feature Extraction is the process of transforming the cleansed data into a form that AI algorithms can work with. Common techniques include bag-of-words (counting the occurrences of each word in a document) and TF-IDF (term frequency-inverse document frequency), which measures how important a term is to a document within a collection of documents.
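As an illustration of TF-IDF, the following Python sketch computes the weights directly from their definition: term frequency (the share of a document's tokens taken up by a term) multiplied by the logarithm of the inverse document frequency. It uses one common, unsmoothed variant of the formula; libraries often apply smoothing, so their numbers will differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return a TF-IDF weight for every term in every tokenised document."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    weights = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical, already pre-processed documents.
docs = {
    "email_01": ["gateway", "outage", "gateway", "refund"],
    "email_02": ["refund", "policy", "customer"],
    "report_01": ["gateway", "test", "result"],
}
weights = tf_idf(docs)
top_term = max(weights["email_01"], key=weights["email_01"].get)
print(top_term)  # 'outage'
```

Although “gateway” occurs twice in the first email, “outage” scores higher because it appears in no other document. A term that appears in every document receives a weight of zero.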
Topic modelling uses techniques such as:
- LDA (Latent Dirichlet Allocation), which uses the words in a document to find topics to which a document belongs.
- LSA (Latent Semantic Analysis), which uses techniques such as singular value decomposition to represent documents and words in a shared semantic space.
- pLSA (Probabilistic Latent Semantic Analysis), which is similar to LSA but can be ‘trained’ with expectation-maximisation algorithms.
- BERTopic and Top2Vec, which are more advanced topic modelling techniques that use document embeddings produced at the Feature Extraction step to identify topics in documents.
- NMF (Non-negative Matrix Factorization), which involves the use of multivariate analysis and linear algebra algorithms to model topics and cluster documents.
- Document clustering, used to cluster documents together based on their contents.
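The topic models listed above need specialised libraries, but the underlying idea of grouping documents by their contents can be shown with a much simpler stand-in. The Python sketch below is not LDA, LSA or NMF: it is a greedy, single-pass clustering that compares bag-of-words vectors by cosine similarity. The function names, the threshold of 0.3 and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs: dict[str, list[str]], threshold: float = 0.3) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed document is similar enough."""
    clusters: list[tuple[Counter, list[str]]] = []
    for doc_id, tokens in docs.items():
        vec = Counter(tokens)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(doc_id)
                break
        else:
            # No existing cluster matched, so this document seeds a new one.
            clusters.append((vec, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical, already pre-processed documents.
docs = {
    "email_01": ["gateway", "outage", "refund"],
    "email_02": ["invoice", "payment", "overdue"],
    "report_01": ["gateway", "outage", "test"],
    "email_03": ["invoice", "overdue", "reminder"],
}
groups = cluster(docs)
print(groups)  # [['email_01', 'report_01'], ['email_02', 'email_03']]
```

The result depends on the order in which documents are processed and on the threshold chosen, which is one reason the more principled techniques above are normally preferred.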
Summarisation and chat functions draw on techniques such as:
- Named entity recognition seeks to locate and classify named entities into pre-defined categories.
- Extractive summarisation takes out the important sentences or phrases from the original text and joins them to form a summary. It can use, amongst other techniques, output from Feature Extraction, to determine the importance.
- Abstractive summarisation seeks to produce a document summary in a more readable form. There are many methodologies used in this area. Perhaps the most relevant to us is the ontological method which informs the summarisation through the use of a domain knowledge base.
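Extractive summarisation is the easiest of these techniques to illustrate. The Python sketch below scores each sentence by the average frequency of its non-stop-words across the whole text and returns the top-scoring sentences in their original order. It is a deliberately simple frequency-based approach, assumed here for illustration; the stop-word list, function name and sample text are hypothetical. Because the summary consists of verbatim sentences, each one can be traced back to the source document.

```python
import re
from collections import Counter

# Illustrative stop-word list only.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "was", "during", "after"}


def content_words(sentence: str) -> list[str]:
    """Lower-case words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarise(text: str, max_sentences: int = 2) -> list[str]:
    """Return the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


text = (
    "The gateway failed during testing. The weather was pleasant. "
    "The vendor retested the gateway after the gateway patch."
)
summary = summarise(text)
for sentence in summary:
    print(sentence)
```

In this example the sentence about the weather is dropped because none of its words recur elsewhere in the text.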

The Ideal Tool

So what would the ideal AI tool for expert evidence document analysis look like?

Most importantly, it would not take the form of a public cloud-based service. The system must respect any privacy or confidentiality obligations. Some documents may also be subject to data sovereignty restrictions.

The ideal tool would be based on existing AI frameworks and tools. This is because the tool will benefit from future enhancements to the framework. The operation of the tool could include:

- Loading relevant legislation and precedent (cited) case law.
- Loading and analysing the pleadings (and defence). This is likely to involve interaction with a human to ensure the tool “understands” the pleadings.
- Loading and processing the evidence documents.
- Allowing the expert to interact with and refine the model through, for example, a prompt-based approach.

The effectiveness of the tool would be driven in part by the development of effective prompts. Creating effective prompts for the tool for the purpose of legal document analysis is a nuanced process that requires careful consideration. The tool’s capacity to generate useful responses largely depends on how questions and tasks are framed. Below are examples of key prompt-types that should be considered for the legal tech space, particularly when it comes to document analysis and expert evidence:

Prompt Type	Example
Goal-Oriented Prompts: Information Extraction and Named entity recognition.	“Extract all mentions of the plaintiff’s obligations from this contract.”
Contextual Understanding: information that relies on understanding a context.	“In the context of mobile app development, what does this paragraph imply?”
Document Summarisation: summarise large volumes of text, focusing on key facts, arguments, or principles.	“Summarise the key findings in this test results worksheet.”
Relationship Mapping: relationships between multiple documents or entities based on specified criteria.	“Identify and map the relationships between these emails concerning Project X, John Smith and Jane Doe.”
Conditional and Layered Prompts	“If the document contains a force majeure clause, summarise its conditions. If not, indicate its absence.”
Provide confidence level:	“How confident are you in the accuracy of your analysis of this indemnity clause?”
Specificity	“Extract all mentions of financial terms from these contracts.” “List the emails that discuss the defendant by name or alias.” “Identify any clauses in this contract that could be considered unfair or one-sided.” “Highlight any areas in these communications that may imply misconduct or unethical behaviour.”
Validate and Refine (Iterative Feedback): an expert reviews and corrects the outputs, and those corrections are used to refine future prompts
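
Several of these prompt types can be combined in a single, consistently structured prompt. The Python sketch below shows one possible way to assemble such a prompt as plain text; it does not call any AI service. The function `build_prompt`, its parameters and the wording of each layer are hypothetical choices made for illustration, and the matter description is invented.

```python
from typing import Optional


def build_prompt(task: str, document: str, matter: str, domain: str,
                 fallback: Optional[str] = None,
                 ask_confidence: bool = False) -> str:
    """Assemble a layered prompt: context first, then the task, then the document."""
    parts = [
        f"Matter context: {matter}",
        f"Subject-matter context: {domain}",
        f"Task: {task}",
    ]
    if fallback:
        # Conditional layer: say what to do when the task does not apply.
        parts.append(f"If this does not apply: {fallback}")
    if ask_confidence:
        parts.append("State how confident you are in the analysis and why.")
    # Asking for quotations supports later checking by the expert.
    parts.append("Quote the passages you rely on.")
    parts.append(f"Document:\n{document}")
    return "\n".join(parts)


prompt = build_prompt(
    task="If the document contains a force majeure clause, summarise its conditions.",
    document="(contract text goes here)",
    matter="Dispute over late delivery of a mobile app.",
    domain="Mobile app development",
    fallback="Indicate that no force majeure clause is present.",
    ask_confidence=True,
)
print(prompt)
```

Keeping the structure fixed and changing only the arguments makes it easier for an expert to compare outputs across documents and to refine the wording of one layer at a time.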

What is There Now
The integration of artificial intelligence into legal technology (often referred to as ‘LegalTech’) has spurred the development of various tools that can assist in the document analysis process. However, it’s essential to understand that the landscape is continually evolving, and new tools are emerging rapidly.
Here is a list of more commonly known tools relevant to this space.
- Generic LLM platforms, including OpenAI’s GPT-3 and GPT-4, Google Bard, etc.: These can be fine-tuned for specific tasks, including legal text analysis.
- TensorFlow and PyTorch: These general-purpose machine learning frameworks can be customized for specific legal text analysis projects.
- spaCy: An open-source NLP library that can be customized for specific legal text analysis tasks. spaCy has good named entity recognition capabilities.
- DocLime – analyses PDF documents and allows you to query the document.
- ClarifyPDF – summarises documents and responds to simple queries.
- AILyze – produces summaries and thematic and frequency analysis, and answers queries.
- Relativity: Often used in e-discovery for large litigation cases, Relativity uses machine learning to help sort and analyze large datasets. Relativity offers an advanced set of features and is widely recognised as a leading solution in this space.
- Everlaw: Provides cloud-based e-discovery and litigation software that uses predictive coding (a form of machine learning) to help sort through large volumes of data.
- Logikcull: Offers automated legal document management and provides quick data discovery solutions.
- ROSS: An AI tool that aims to help legal teams sort through case law more efficiently, although its focus is not solely on document analysis.
- DISCO: An AI-powered legal solution focusing on e-discovery, legal document review, and case management.
- Spellbook – focussed on contract drafting
- Goldfynch – eDiscovery and document statistical analysis
- Opentext – eDiscovery and document statistical analysis
- Nuix – eDiscovery and document statistical analysis
- Kira: Uses machine learning to extract relevant data from contracts and other similar types of documents.
- ThoughtRiver: Focuses on pre-screening contracts to assess their risks before they reach the legal team.
- LawGeex: Compares contracts against in-house policies to expedite the review process.
- Casetext: Uses AI to help with legal research, specifically in finding case law.
- Luminance: Uses machine learning to help with due diligence and compliance-related tasks.
This is a non-exhaustive list serving to illustrate that some solutions exist. Further, additional document analysis tools are emerging on an increasingly regular basis.
Conclusion
AI tools have great potential in assisting with the analysis of documentary evidence in legal cases. Orchestrating the operation of these tools into a productive toolset for experts will result in significant productivity gains for the expert.
Using the expert as the primary trainer would help the tool produce useful and accurate outputs that can validly and defensibly form an important part of the expert’s opinion.

Howard Elliott is an ExpertsDirect Exclusive Expert. He is a seasoned technology professional with over 35 years of experience in IT, telecommunications and electronic payments with a proven track record of delivering complex projects and driving innovation.

His expertise includes:

Technology strategy and implementation
Risk assessment and mitigation
Governance
Software development and acquisition
Telecommunications, payments and AI
