Praescient is recognized as a thought leader in data analysis, and each of our staff members embodies that expertise. Our people work as analysts, engineers, and field service support specialists, augmenting our technology partners’ powerful analytic tools. Praescient delivers actionable insight, and ultimately mission wins, for our clients by merging the technical prowess of our engineers with the proven abilities and workflows of our analysts.
Praescient employees frequently engage in continuing education in order to stay up to date in the ever-changing world of information technology. James Spencer, a Praescient Analyst Consultant, recently enrolled in a new course from the Massachusetts Institute of Technology (MIT) designed to get to the heart of what big data means and why the analyst/engineer dynamic matters. Called “Tackling the Challenges of Big Data,” the course is indicative of Praescient’s approach to leveraging big data analysis against tough problem sets.
I had a chance to speak with James in a telephone interview, and we discussed some of what he is learning, as well as his thoughts on what big data means for the practice of analysis:
Charlotte: What attracted you to the “Tackling the Challenges of Big Data” course?
James: Prior to this course, I had been taking the occasional self-directed course through edX, which offers coursework from big-name schools like Harvard, Berkeley, and MIT. There is a wealth of information that is really accessible there, and some of the paid courses even offer certifications. Anyway, I saw an email from MIT promoting this course, and what really attracted me was the basic description: it would offer overarching, generalized coverage of big data – what it is and how you deal with it.
Honestly, taking the course is really for my own edification. I wanted to learn more about the back end of the software. Once you understand the platform on an even deeper level, you get new ideas for how to make the product better, and you find new approaches that you haven’t thought of before.
C: The course is conducted in part by MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), a leading information technology research organization. What can you tell me about the educators in this unique department?
J: Well, I knew vaguely that MIT has this big computer science department and does all these important things, but before this class I didn’t really know about CSAIL. What’s nice about this course is that alongside the concepts, they provide full bios of the really accomplished professors who specialize in machine learning, artificial intelligence, and data analytics.
One of the great appeals of taking this course is that even though it is not a class taking place at MIT proper, it was developed and is being administered by some of the leaders in technology education. Everyone in the world has heard of MIT, and some really great, cutting-edge research comes from there. So the people who teach these courses are going to understand the fundamentals of big data and do a great job of teaching them to a class.
C: One of the learning objectives established in the course description is to define big data. How has your experience in the course shaped how you define big data as a concept?
J: I work with big data every day as an Analyst Consultant, but it isn’t always easy to define. Most people just think of massive databases and large-scale sets of information, but don’t really have any context for it.
It isn’t just about the size of the data. Of course, that is a major factor, but it is also about looking at disparate sets of data and the way you’re analyzing them. It is not simply about how you store the information; it is actually much more complex. Anybody can store data – you could go out to Costco, buy a bunch of 1TB hard drives, and string them together into one great big storage device, but that won’t give you the ability to actually look at the data and derive anything useful from it.
You could summarize it in one sentence: “big data is essentially any type of data that defies analysis using conventional methods.”
C: Where is big data useful? In what sectors should decision makers be taking advantage of big data?
J: Oh, gosh…with data analysis, the sky’s the limit. In any field where you are collecting data, you can find some sort of insight in it. Research and development is of course a major example. Any time you’re in a research environment, you’re potentially collecting a huge amount of data; this could be anything from genomic information to flight test data from rocket launches. It really benefits researchers to be able to crunch their numbers quickly.
Healthcare is a key field here and you can look even further into many sub-sectors within that industry. You could be looking at things like patient outcomes, but also billing data or even healthcare fraud. Really, there are so many fields where this kind of analysis applies.
(Read Praescient’s blog post “Healthcare Fraud: Big Data to the Rescue?” to learn more about our approach.)
C: MIT’s course description mentions a case study visualizing Twitter data. What do those visualizations look like and how useful are they?
J: There is a lot of stuff out there that attempts to use Twitter data to conduct what’s called “sentiment analysis.” The goal is to look at that data and try to determine how people within a particular geographic region feel about a particular topic. The MIT course uses a really interesting tool that runs the social media analysis on GPUs. The advantage is that graphics processors are built to perform huge numbers of simple calculations in parallel – they’re designed to render every pixel on a screen at once – so you can analyze massive amounts of information quickly.
We saw Twitter messages overlaid onto a global map using geolocation data contained within the tweets. You can look at something as simple as concentrations of tweets from major population centers, but you can also drill down into keywords to look for specific trends. For example, Ebola was a topic that was heavily tweeted about recently, and if you were to search for that term with this tool, you would see heavier concentrations in West Africa, as well as in the US due to prolific news coverage.
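(To make that drill-down idea concrete, here is a minimal, plain-Python sketch of the same pattern: filter geotagged messages by a keyword and count them per coarse map cell. This is not the GPU-backed tool from the course – the sample tweets and coordinates are hypothetical, and a real sentiment pipeline would score each message with a lexicon or trained model rather than just matching a keyword.)

```python
# Toy keyword drill-down over geotagged tweets (hypothetical sample data).
# Real systems stream millions of messages through GPU pipelines; the
# counting logic below is the same idea at blackboard scale.
from collections import Counter

# (text, latitude, longitude) -- purely illustrative records
tweets = [
    ("Ebola response teams arrive in Monrovia", 6.3, -10.8),
    ("Ebola screening begins at US airports", 40.7, -74.0),
    ("Great coffee this morning", 51.5, -0.1),
    ("CDC issues new Ebola guidance", 38.9, -77.0),
]

def grid_cell(lat, lon, size=5.0):
    """Snap a coordinate to a coarse grid cell, `size` degrees per side."""
    return (round(lat / size) * size, round(lon / size) * size)

keyword = "ebola"
counts = Counter(
    grid_cell(lat, lon)
    for text, lat, lon in tweets
    if keyword in text.lower()
)

# Denser cells correspond to the heavier concentrations on the map overlay.
for cell, n in counts.most_common():
    print(f"cell centered near {cell}: {n} matching tweet(s)")
```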
(Want to discover more tools for sentiment analysis? Check out Praescient’s technology partner, Recorded Future!)
C: How does machine learning factor into using big data?
J: Speaking from my own understanding of machine learning, the way I think about it is as an element of artificial intelligence. It enables you to break out of the structured, repetitive approach computers take when running algorithms, and instead helps the computer “learn” patterns of behavior so that it can predict what a user means, or process data that isn’t quite formatted the way the computer expects to see it. Normally, that sort of divergence from a rigid rule set would cause the program to fail or the process to be terminated. Machine learning and AI allow for more variability and enable the computer to figure out the inferred meaning of input or commands, which provides greater flexibility and utility.
We already see this in our everyday lives with things like Google’s predictive search terms – you start typing something in the search bar and Google immediately starts guessing what you’re looking for, and more often than not it’s spot on. It makes our interactions with computers that much more efficient, and it helps computers learn our ways of communicating and take guesses beyond what is rigidly written into their code.
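(Here is an equally minimal sketch of the idea behind that kind of prediction: rank previously seen queries by frequency and suggest the most common completions of whatever the user has typed. Google’s actual system is a learned model over far richer signals; the query log below is purely hypothetical.)

```python
# Toy query completion from a (hypothetical) search log: suggest past
# queries that start with the typed prefix, most frequent first.
from collections import Counter

query_log = [
    "big data", "big data analytics", "big data course",
    "big data", "machine learning", "big data analytics",
]
frequencies = Counter(query_log)

def suggest(prefix, k=3):
    """Return up to k logged queries starting with `prefix`, by frequency."""
    matches = {q: n for q, n in frequencies.items() if q.startswith(prefix)}
    return sorted(matches, key=matches.get, reverse=True)[:k]

print(suggest("big d"))  # -> ['big data', 'big data analytics', 'big data course']
```

Even this frequency table is a crude form of “learning” from past behavior; real predictive search layers personalization, context, and language models on top of the same basic ranking step.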
C: What kind of collaboration needs to take place between analysts, data scientists, and decision makers to make big data useful?
J: The problem as I see it is that many enterprises treat analysis and engineering as two separate entities that don’t really talk to each other. I think you lose out with that kind of perspective. You have the analysts, who are subject matter experts and can answer detailed questions about the data sets themselves, and then you have the data scientists, who are the experts in curating the information. These folks need to talk to each other and figure out what it is they are trying to answer. That is where you can derive very useful insight.
Think of it this way: if an engineer or data scientist makes a tool that’s really good at answering question “X,” but the analyst cares about answering the question “Y,” then it doesn’t matter how revolutionary that tool is because it doesn’t meet the needs of the users. If the analysts and data scientists work together, they can generate understanding on both sides. Analysts will start to understand what some of the technical limits are with their tools, such as the limits on the size of data that can be imported, or that it can only be analyzed at a certain speed. On the opposite side, data scientists can understand how analysts think and tweak the software to make it more useful. This symbiosis between the engineer and analyst is something I think we do really well at Praescient.
Praescient takes great pride in employing smart folks like James who are passionate about expanding their knowledge of data analysis. We encourage all of our employees to pursue new educational opportunities and offer tuition reimbursement as part of our extensive benefits package. Check back with the Ideas Blog for more from our awesome team!
Praescient Analytics is a Veteran-Owned Small Business that delivers training, data integration, platform customization, and embedded analytical services in partnership with leading technology providers. Praescient’s teams of analysts and engineers provide comprehensive solutions to federal and commercial clients engaged in critical defense, law enforcement, intelligence, cyber security, financial, investigative, and legal analytics missions.
Charlotte Stasio is Praescient’s Communications Specialist.