In April 2016 West Point’s Combating Terrorism Center (CTC) released preliminary analysis of over 4,600 unique ISIS administrative records, the largest such collection ever circulated in open sources. The majority of these records, originally delivered to NBC News by an ISIS defector, consisted of entry forms for foreign fighters joining the Islamic State. The leaks have provided researchers and intelligence practitioners with never-before-seen details of the organization’s geographic, educational, and occupational diversity, as well as fighters’ preferences for roles within ISIS, prior jihadist experience, and entry methods into Syria. It’s hard to overstate the value of these documents: 4,600 individual ISIS members, each providing self-testimony on their own backgrounds and ambitions. The sheer volume of self-reported data marks a pivotal point in building a new understanding of the organization.
Traditional Approach to Data Ingest and Analysis
Thoroughly analyzing several thousand handwritten documents for all of their intelligence value is no small task for a traditional research institution, and CTC’s introduction to its report reflects that humility, with qualifiers such as “more analysis is required,” “the size and challenging nature of the data,” and “a significant amount of work remains to be done.” We get more insight into CTC’s methodology in the report’s acknowledgments, which thank various researchers for “translating, coding, and organizing the massive amount of primary source information.” CTC is describing a well-grounded, traditional approach to analysis: researchers poring over the documents by hand and assigning tags based on key trends. Coding or tagging the data in this fashion allows analysts to create distinct groupings within the dataset and to make comparisons between data subsets. The result is a comprehensive and thoughtful trend analysis of the leaks, complete with charts and graphics.
Technology Tools Can Streamline Analysis
Analyzing 4,600 records and producing accompanying data visualizations may seem like a painstaking research task approachable only by a leading research institution such as CTC, but a variety of powerful software tools can streamline the process of ingesting, analyzing, and visualizing datasets, even datasets orders of magnitude larger than this one. Today’s leading technology tools can augment traditional analysis and produce rapid insight into far larger volumes of data than was previously possible.
Here’s how Praescient would approach the initial ingest and analysis of the ISIS leaks. The first priority is converting all of the records, which appear to be structured responses in PowerPoint and Excel, into a single digital format. Next, the records can be ingested into an advanced link analysis platform. Many leading platforms include document translation capabilities (of varying accuracy) that can expedite an otherwise arduous human workload.
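For illustration only, a minimal sketch of that consolidation step might look like the following, assuming the records have been exported as individual Excel files; the file paths and column layout here are hypothetical, not the actual leak’s schema.

```python
import glob
import pandas as pd

# Hypothetical example: consolidate individual record spreadsheets into one table.
# File paths and column names are illustrative assumptions.
frames = []
for path in glob.glob("records/*.xlsx"):
    df = pd.read_excel(path)      # one entry form per file
    df["source_file"] = path      # keep provenance for later auditing
    frames.append(df)

records = pd.concat(frames, ignore_index=True)
records.to_csv("isis_records_consolidated.csv", index=False)
print(f"Consolidated {len(records)} records from {len(frames)} files")
```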
The most important step is creating a data ontology, which tells the platform how to interpret and organize the data being ingested. This data model defines all possible object types, properties, and relationships that will be created when the data is integrated. As each ISIS fighter’s record comes into the database, it is recognized as a distinct object, perhaps a “person” entity; assigned relevant properties (e.g. age, gender, educational background); and connected to other entities with which it has a relationship. The whole ingest and integration process can be finished in seconds, especially with a relatively small dataset of 4,600 records.
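To make the ontology idea concrete, here is a minimal Python sketch of the kind of data model a platform might apply. The entity types, property names, and relationship labels are assumptions for illustration, not any particular platform’s schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Entity:
    """A node in the link-analysis graph, e.g. a person, location, or organization."""
    entity_id: str
    entity_type: str                      # "person", "location", etc.
    properties: Dict[str, str] = field(default_factory=dict)

@dataclass
class Relationship:
    """A labeled edge connecting two entities."""
    source_id: str
    target_id: str
    relation: str                         # e.g. "entered_via", "recruited_by"

def ingest_record(row: dict) -> Tuple[Entity, List[Relationship]]:
    """Map one fighter's entry form onto the ontology (column names are hypothetical)."""
    fighter = Entity(
        entity_id=row["record_id"],
        entity_type="person",
        properties={
            "age": row.get("age", ""),
            "education": row.get("education", ""),
            "country_of_origin": row.get("country", ""),
        },
    )
    links = [
        Relationship(fighter.entity_id, row["entry_point"], "entered_via"),
        Relationship(fighter.entity_id, row["facilitator"], "recruited_by"),
    ]
    return fighter, links
```

The point of defining the model up front is that every incoming record is mapped to the same set of objects and edges, so later filtering and network visualization work across the whole dataset rather than one spreadsheet at a time.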
Suddenly thousands of pieces of distinct data become a manageable dataset and, as on a digital potter’s wheel, the analyst can intuitively shape the data in myriad ways. The burden of deriving meaningful insights is eased by the ability to geographically locate entities, visualize relationship networks, plot events over time, and histogram tens of thousands of properties in seconds, filtering by age group, education level, and other affiliations. In the end, an analyst can share these insights by creating multiple visualizations from within a single platform, instead of hand-generating charts and graphics in Excel.
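As a rough illustration of that kind of filtering and charting outside any particular platform, the snippet below works from the consolidated table in the earlier sketch, filters on a hypothetical education field, and histograms ages; the column names are again assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

records = pd.read_csv("isis_records_consolidated.csv")  # from the earlier sketch

# Filter to one subset of interest, then histogram a property across it.
university_educated = records[records["education"] == "university"]
university_educated["age"].hist(bins=20)
plt.title("Age distribution, university-educated subset (illustrative)")
plt.xlabel("Age")
plt.ylabel("Count")
plt.savefig("age_histogram.png")

# Quick cross-tabulation of two properties for comparison between subsets.
print(pd.crosstab(records["education"], records["preferred_role"]))
```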
The New Standard for OSINT Analysis
This is not meant as a criticism of CTC’s analytic product, which is impressive and painstakingly thorough. Our purpose here is to show how currently available technology tools can improve the speed and ease of traditional analysis. Returning to CTC’s initial statements: yes, the data is vast and challenging in nature, and yes, significant work remains. But let’s embrace these challenges and eagerly leverage a new breed of analytical technologies against them. A recent report on the foundations of ISIS, released by the RAND Corporation, can serve as a model for this type of analysis. The final report, a collaboration between RAND, a Praescient analyst, CTC, and many others, leveraged a number of advanced technology platforms, including Palantir, to exploit ISIS primary source documents and extract the most relevant insights. It is our hope that this type of partnership between research institutions, government, and technology integrators like Praescient can set the new standard for OSINT analysis.
Image credit: Reuters/BBC