- Legal Futures - https://www.legalfutures.co.uk -

Building Unstructured Data Solutions for Today and Tomorrow

NuixBy Alex Chatzistamatis, [1] Principal Solutions Consultant for Legal Futures’ Associates Nuix [2]

Now that we’ve covered collections [3]—what  I believe is a foundational component to an unstructured data solution—I  think it’s time to focus on easily the most important requirement: processing.

Continuing with the concept of building a structure (haha, see what I did there?) one could argue that processing is the frame which everything else is ultimately built around. As I’m sure most of you reading this are aware, processing is a make or break stage. The key thing to remember is your processing tool is only as good as the data you plan to consume, regardless of how narrow or broad your scope may be.


Frankenstein Processing

As I discussed in Part 2 about collections, processing is one of the other steps that can suffer from the Frankenstein effect—various technology components being used to achieve the same result. Even though I’ve seen this time and time again, it still shocks me that many technologists believe there’s no immediate solution for ‘Frankensteined’ technology.

During a recent consultation, I ran out of fingers counting the number of separate tools the organization was using, each being a point solution for a specific data type. Within a few minutes of discussing some of the capabilities that Nuix processing has to offer, the client was blown away with the variety of data that Nuix can process, regardless of total volume, with unmatched velocity, thus proving there IS a better way.

Our processing ‘way’

Some of the most common requirements my clients have when trying to address their processing challenges include:

Of course, this is just scratching the surface, but the Nuix Engine is up for the challenge. The end-to-end Nuix platform is an extremely powerful solution for organizations, but the unique and extensible Nuix Engine is the driving force that makes it all possible.

The Nuix Engine enables endless possibilities including the ability to:

Now, allow me to geek out a bit and highlight some of the additional benefits Nuix processing can provide an organization:

Taking on the breadth of your data

Take a minute to review the list of file types below. That’s right … Nuix can consume all these different data sources at unparalleled speeds, all while making the data fully searchable from a single pane of glass!

Imagine being able to combine data from endpoints, DLPs, web servers, disk images, user email, mobile device, and chat data from a single location. Pretty cool, huh?


How exactly is this possible? It all comes down to the secret sauce in the Nuix Engine and how our users decide to present the data for their organizational use cases. For example, in preparation for litigation, it may be a good idea to perform a proportionality analysis to determine the depth of the extraction required.

Doug Austin from eDiscovery Today recently referenced [5] a Craig Ball blog [5] in which the topic of advanced settings are considered . Whether you’re preparing data for litigation, beginning your forensic investigation, or working on data governance initiatives, Nuix has you covered with a wide range of options.

Below is a screenshot of Nuix Workstation data processing settings, which enable the aforementioned benefits. Specifically, I’d like to call out:

  1. Basic Index Settings – Control the depth of details
    1. Full content (text), metadata, and the family/hierarchy of files
    2. Flexibility to control storing the item’s binary and other settings to enable near duplicates, topic modelling, and identifying sensitive data.
  2. Forensic Settings – Control the level of forensic vigour
    1. Recovering deleted files, working with slack space, and file carving
    2. Creating photo thumbnails, skin-tone analysis, face detection, and deep learning.
  3. Advanced Index Settings – Enrich your index with additional options specific to use-cases
    1. Grouping families together for search, enabling exact queries, leverage stemming, or stop words
    2. Controlling the digests and hash calculations
    3. Streamlining imaging, OCR, and metadata workflow.


Once you’re locked in with the appropriate data processing settings (which can be turned into an easily reusable template) buckle your seatbelt! Nuix can consume data of all sizes and take advantage of the horsepower it’s running on to scale from GBs per hour to TBs per day.

In fact, one of my new clients in the pharmaceutical space was able to process a rather large and complex data set with Nuix in 36 hours as opposed to its older technology, which couldn’t even move the needle after close to 1.5 months.

To make it clear, the client recognized an immediate ROI within days of implementing Nuix!

Making $@#$ searchable

“While processing is a key capability of the Nuix Engine, it is so much more powerful,” said Nuix CTO Stephen Stewart. “The Nuix Engine’s ability consume all 10 dimensions of data, normalize all the text and metadata into a consistent schema is amazing. Said more straight—Nuix makes $@#$ searchable.”

Stephen is absolutely right! The fun really starts after the data has been processed and it is normalized and fully searchable. Nuix provides simple yet powerful ways to prepare the data for downstream analysis and review.

Again, regardless of use case, Nuix has you covered:

Early data assessment including number crunching, pivot reports, and metadata.


Wrapping up and looking ahead

Between the volume, velocity, and variety that the Nuix Engine brings to the table to quickly consume various types of data, making it fully enriched and searchable, coupled with the ability to filter and cull hundreds of different ways, it should be pretty apparent why building an unstructured data solution around the Nuix platform makes sense.

Looking back at the some of the processing requirements, here are a few ways Nuix can help solve these challenges:

Taking the critical endpoint data capture using Nuix Endpoint (covered in Part 2), we can layer it next to additional unstructured data sources to be fully processed and searchable using the Nuix Engine. Regardless of data type or size, Nuix is flexible enough to scale based on your organisation’s requirements.

Stayed tuned for Part Four, where I’ll continue our unstructured data solution journey and understand how the data that’s been collected and processed can be explored and analysed using our software.