The London Interbank Offered Rate (LIBOR) is a benchmark rate that some of the world’s leading banks charge each other for short-term loans. It serves as the first step in calculating interest rates on various loans throughout the world. A number of large banks and brokerages have been brought to trial and fined billions for their roles in fixing the LIBOR. During the discovery phase of these cases, law firms received massive amounts of subpoenaed data, including structured and unstructured email communications, chat logs, and phone records. Within this big data environment, firms often struggled to process and organize the data for review, let alone investigate it efficiently and effectively. In response, Praescient developed a niche analytic offering that combines cutting-edge partner technologies with proven workflow methodologies.
In this successful use case, we investigate the collapse, and subsequent bankruptcy, of Lehman Brothers in 2008. Following the firm’s bankruptcy filing, many bankers and brokers were accused of colluding with one another to rig the LIBOR to insulate themselves from loss. In response to the analytic challenge presented by this and other LIBOR-specific investigations, Praescient was contracted by legal clients to identify nefarious actors and uncover indicators of intent to fix interest rates. To accomplish this mandate, Praescient partnered with NexLP, whose powerful cognitive computing platform leverages advanced natural language processing technology.
The NexLP Solution
[Watch the full NexLP LIBOR demo here.]
NexLP combines powerful artificial intelligence and machine learning components with an intuitive user interface that allows analysts and investigators to efficiently perform link/network analysis, sentiment analysis, temporal analysis, and pattern detection on massive volumes of data. These capabilities mesh perfectly with the big data analytic needs of firms working LIBOR-related cases, and specifically those associated with the Lehman Brothers crash of 2008.
Step 1: Data Modelling and Integration
The most critical aspects of any big data challenge are data modelling and integration. If the data cannot be ingested quickly and accurately, at scale, into the analytic platform, the analyst’s mission is doomed from the start. Fortunately, NexLP’s compatibility with the types of data encountered in LIBOR investigations allowed us to rapidly ingest data at scale and immediately begin the analytic process on up to hundreds of millions of documents and email threads.
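To make the ingestion step concrete, here is a minimal, stdlib-only sketch of turning raw email files into structured records ready for analysis. This is an illustration of the general technique, not NexLP’s actual loader, and the field names in the output dictionaries are our own choices.

```python
# Minimal email ingestion sketch (illustrative, not NexLP's loader):
# parse every .eml file in a directory into a flat record of fields.
import email
from email import policy
from pathlib import Path

def load_emails(directory):
    """Parse each .eml file under `directory` into a dict of header/body fields."""
    records = []
    for path in Path(directory).glob("*.eml"):
        msg = email.message_from_bytes(path.read_bytes(), policy=policy.default)
        body = msg.get_body(preferencelist=("plain",))
        records.append({
            "sender": msg["From"],
            "recipients": msg.get_all("To", []),
            "date": msg["Date"],
            "subject": msg["Subject"],
            "body": body.get_content() if body is not None else "",
        })
    return records
```

At real scale this loop would be parallelized and stream into a database rather than a list, but the modelling decision is the same: normalize every communication into a common record shape before analysis begins.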
Step 2: Filtering for Investigation-Relevant Communications
To better understand our corpus of data, we began with NexLP’s Story Engine and its graphical network analysis capability. Using the Story Engine, we can visualize these communications and home in on information relevant to the investigation. In Figure 1.1, which shows more than 2,000 individual communicators, we can discern nine major network hubs, or distinct communicators. Based on information in the Quick Filters section of the Global View (top pane, Fig. 1.1), we see that 66% of our communications consisted of a single message thread with no reply. These one-off communications are unlikely to pertain to our investigation. Likewise, we can ignore the additional 25% of communications (top pane, Fig. 1.1) that were sent to a large number of recipients; these are very likely newsletters or company-wide announcements sent via a distribution list and therefore not relevant. Another high-level filtering exercise is to view the popular topics discussed (right pane, Fig. 1.1). Although the topics at this level are quite broad (‘funding’, ‘bank’, or the acronym ‘usd’, for example), we can use this pane to quickly filter out irrelevant topics and narrow our dataset.
Figure 1.1: Global View breakdown of pertinent high-level filters and visualization of communications. Major communicators can begin to be parsed out at this point.
Figure 1.2: Global View breakdown of results and filters after a date range filter has been applied to the overall dataset.
In addition to filtering by topic discussed, Story Engine also allows us to filter by start and end date. For this investigation, we chose to narrow our search to the window between September 15 and September 26, 2008 (Figure 1.2), the period immediately surrounding the Lehman Brothers bankruptcy filing. Using this technique, we are left with a more manageable dataset of 10,556 documents and 5,532 email threads.
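The three Step 2 filters (drop no-reply one-offs, drop mass mailings, keep the date window) can be sketched as a single pass over the records. The field names and the recipient-count cutoff below are assumptions for illustration, not NexLP’s schema or thresholds.

```python
# Conceptual sketch of the Step 2 filters; field names are illustrative.
from datetime import date

BLAST_THRESHOLD = 20  # assumed cutoff for a "large number of recipients"

def filter_communications(comms, start, end):
    """Drop one-off no-reply threads and mass mailings, keep only the date window."""
    kept = []
    for c in comms:
        if c["reply_count"] == 0:                       # one-off thread, no reply
            continue
        if len(c["recipients"]) >= BLAST_THRESHOLD:     # likely newsletter/blast
            continue
        if not (start <= c["date"] <= end):             # outside the window
            continue
        kept.append(c)
    return kept
```

Each rule mirrors a Quick Filter from the Global View; applying them in sequence is what shrinks hundreds of thousands of communications down to a reviewable set.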
Step 3: Using Machine Learning to Further Parse Out Relevant Communications
Even after filtering our dataset to the exact date range we were interested in, we were still left with thousands of communications, which would be cumbersome to review by hand.
Instead, we used Story Engine’s machine learning capability to teach the system how to review communications on our behalf, selecting which were relevant and which were not. Figure 1.3 shows the Thread Viewer, which we used to review messages manually until the machine learning algorithm took over. In the Thread Viewer, we reviewed communication threads one by one, selecting the “yes” box when a thread mentioned LIBOR rates or other key terms relevant to our investigation, and the “no” box when a thread was not relevant or came from an automated market tracking system. Within minutes, the Story Engine was able to anticipate our selections, and we had to manually review fewer and fewer communications. After completing our computer-assisted review of the communications dataset, which took only a few hours as opposed to days or weeks, we identified 204 documents and 54 threads as directly relevant to our investigation.
Figure 1.3: Email Thread Viewer – allows the platform and analyst to jointly judge documents of interest.
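The yes/no review loop described above is a form of computer-assisted review: a classifier learns from the analyst’s labels and surfaces the thread it is least sure about for the next manual judgment. The sketch below is a simplified, stdlib-only stand-in (a tiny Naive Bayes with uncertainty sampling), not NexLP’s proprietary model.

```python
# Simplified computer-assisted review sketch (not NexLP's actual model):
# a small Naive Bayes classifier trained on yes/no labels, plus
# uncertainty sampling to pick the next thread for manual review.
import math
from collections import Counter

def _tokens(text):
    return text.lower().split()

def train(texts, labels):
    """Build per-class token counts and class priors (1 = relevant, 0 = not)."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(_tokens(text))
    return counts, priors

def p_relevant(model, text):
    """P(relevant | text) under Naive Bayes with add-one smoothing."""
    counts, priors = model
    vocab = set(counts[0]) | set(counts[1])
    logp = {}
    for c in (0, 1):
        total = sum(counts[c].values())
        logp[c] = math.log(priors[c] / sum(priors.values()))
        for tok in _tokens(text):
            logp[c] += math.log((counts[c][tok] + 1) / (total + len(vocab) + 1))
    m = max(logp.values())
    e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
    return e1 / (e0 + e1)

def next_to_review(model, unlabeled):
    """Uncertainty sampling: surface the thread whose score is closest to 0.5."""
    return min(range(len(unlabeled)),
               key=lambda i: abs(p_relevant(model, unlabeled[i]) - 0.5))
```

In production such a loop would retrain after every batch of labels, which is why the analyst’s manual workload shrinks within minutes as the model converges.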
Step 4: Analyzing Remaining Dataset to Identify Potential LIBOR Rate Fixers
Our final step was to analyze our much smaller, curated dataset to uncover likely LIBOR rate fixers. The top communicators within this dataset (see top pane, Figure 1.4) were Subject A (redacted), Subject B (redacted), and Subject C (redacted), while the highest-scored words were ‘libors,’ ‘problem,’ and ‘Lehman.’ Additionally, the visualization (Fig. 1.4) now shows a single major communicator whose correspondence falls within the machine-learning parameters we set.
Figure 1.4: Story Engine display after utilizing the machine learning process in the thread viewer application.
The next obvious avenue of approach is to investigate this highly linked communicator. As a well-connected individual, this person (Subject D) could be the logical center of any rate-fixing discussions. However, after we interrogated Subject D’s communications using the Thread Viewer, it became clear that he was not a major player or a key communicator: the vast majority of his communications were email blasts sent to distribution lists conveying general market updates. So even though he was the single most connected communicator in our network, his threads are irrelevant to our investigation and we can discount him as a major player. Reaching these types of “dead ends” is an expected part of any analytic investigation, but because of the machine learning and visualization capabilities of NexLP’s Story Engine, we encounter fewer dead ends and are able to move to the “next best” investigative path quickly and accurately.
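The kind of connectedness that surfaced Subject D can be approximated with a plain counterparty-degree count over thread metadata. This is a sketch of the general network-analysis idea, not NexLP’s scoring, and the field names are hypothetical.

```python
# Degree-count sketch for finding the most connected communicators
# (illustrative only; field names are hypothetical, not NexLP's schema).
from collections import Counter

def top_communicators(threads, n=3):
    """Rank people by number of distinct counterparties across all threads."""
    links = set()
    for t in threads:
        for r in t["recipients"]:
            links.add(frozenset((t["sender"], r)))   # undirected sender-recipient pair
    degree = Counter()
    for pair in links:
        for person in pair:
            degree[person] += 1
    return degree.most_common(n)
```

As the Subject D dead end shows, a high degree count is only a lead: the content of the threads, not the link count alone, determines relevance.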
In this case, instead of continuing to focus on highly connected individuals, we turned our attention to suspicious topics of conversation. We selected ‘libor’ as a topic filter at the bottom right of the Global View to bring back only communications that included ‘libor’ in the subject or body of an email. This allowed us to isolate 43 documents and 9 threads, which we quickly reviewed using the Story Engine’s Thread Viewer. This expedited review revealed a conversation between two connected subjects that indicated collusion related to LIBOR fixing (see Figures 1.5 and 1.6).
Figure 1.5: Visualization showing a communication thread including key communicators after filtering for conversations including ‘libors’.
Figure 1.6: A portion of the email communication.
Based on this lead, we expanded the communications network for Subject A (redacted). Figures 1.8 and 1.9 show the Global View summary for Subject A as well as a snapshot of his communication with his colleague, Subject B (redacted). After examining Subject A’s communications in the Thread Viewer, it became clear that he had colluded with others to fix the LIBOR on numerous occasions. Ultimately, the individual was found guilty of rigging the global benchmark interest rate and sentenced to four years in prison.
Figure 1.8: Global View summary of a key communicator.
Figure 1.9: Snapshot of conversations between key communicators in the Thread Viewer.
Utilizing the machine learning and natural language processing capabilities of the NexLP Story Engine, alongside Praescient’s proven analytic methodologies, we were able to rapidly sift through terabytes of unstructured data to identify the most pertinent, incriminating information. This included rapidly integrating hundreds of thousands of communication events, quickly narrowing those events to a manageable, pertinent set of communications, and leveraging machine learning and data visualization to further home in on truly relevant actors. Ultimately, using this workflow, we were able to pinpoint key actors in the conspiracy and highlight the specific communication events that showed collusion associated with LIBOR.
Praescient was founded in 2011 by a team of analysts, entrepreneurs, and engineers committed to applying cutting-edge analytic technologies and methodologies to complex information challenges across the globe. Praescient specializes in technology assessment, advanced training, intelligence analysis, and investigative support to Law Enforcement, the Intelligence Community, the Department of Defense and the Legal and Commercial sectors.
Founded by a team of data scientists, programmers, and eDiscovery experts, NexLP uses artificial intelligence and machine learning to derive actionable insight from unstructured and structured data. Powered by a proprietary cognitive computing engine, NexLP uses next-generation text analytics to help corporations uncover answers in their data. Their specialties include: natural language processing, machine learning, pattern detection, sentiment/relevance analysis, eDiscovery, data visualization, and text mining/analytics.
Watch the full video demonstration on the Praescient YouTube channel here.