Imagine a hoard of very valuable information locked away in little nooks and crevices that only a few people can reach. Now imagine what could be done if all of it suddenly became available to the people who need it to do their jobs, helping them solve problems and protect our citizens. That is the essence of my job: I unlock data from government databases and other sources so it can be used by federal law enforcement. Our users analyze this data on the Palantir platform, which lets them operate more efficiently than they could with other tools and helps them surface insights they would otherwise miss.
The platform can ingest any kind of digital data, from documents and databases to media files like images and video, transform it into our working format, and put it at our users' fingertips. To accomplish this, we have a broad set of tools and choose among them depending on the situation: a user entering data manually on the front end goes through a different process than a batch import from a remote database. For many of the batch imports, I use a domain-specific language written in Groovy and designed for ingesting semi-structured data such as CSV and XML. With it, I can rapidly build an end-to-end integration from a source like the federal government's spending data into Palantir, so that investigators can identify potential instances of fraud.
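The DSL itself is Palantir-internal, so I can't reproduce it here, but the flavor of the work is easy to sketch in plain Groovy: read a delimited file, pair each row with its header, and coerce the raw text into typed fields before it is mapped onto the platform's object model. The file path and column names below are invented for illustration.

```groovy
// Plain-Groovy sketch of the row-to-object mapping an ingestion script performs.
// The input path and column layout are hypothetical.
def records = []
new File('/data/feeds/spending_awards.csv').withReader { reader ->
    def header = reader.readLine().tokenize(',')   // e.g. award_id,recipient,amount,award_date
    reader.eachLine { line ->
        // Zip the header with this row's values to get a named map of fields.
        def row = [header, line.tokenize(',')].transpose().collectEntries { it }
        records << [
            id       : row.award_id,
            recipient: row.recipient,
            amount   : row.amount as BigDecimal,                  // coerce text to a numeric type
            awarded  : Date.parse('yyyy-MM-dd', row.award_date)   // and to a real date
        ]
    }
}
println "Parsed ${records.size()} award records"
```

The real scripts do the same thing at a higher level: declare what the source looks like, declare how its fields map onto our data model, and let the framework handle the plumbing in between.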
Once an import script exists, we have to orchestrate its execution schedule alongside all the other jobs, which vary in length and frequency. To balance keeping our users' data fresh against bogging down the servers during their prime working hours, we use Rundeck, a state-of-the-art process automation tool. Rundeck removes the possibility of operator error from manual runs and frees up human time for harder problems than pushing buttons in a particular order. By using modern tools like this, fewer engineers can accomplish more, all while staying sane.
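To give a sense of what that orchestration looks like, here is a rough sketch of a nightly job in Rundeck's YAML job-definition format. The job name, schedule, and script path are invented, and the exact keys can differ between Rundeck versions.

```yaml
# Hypothetical off-hours ingestion job, sketched in Rundeck's YAML job-definition format.
- name: nightly-spending-ingest
  description: Refresh the federal spending data outside of business hours.
  loglevel: INFO
  schedule:
    time:
      hour: '02'        # run at 2:00 AM, when our users are off the system
      minute: '00'
      seconds: '0'
    month: '*'
    year: '*'
    weekday:
      day: '*'
  sequence:
    keepgoing: false    # stop the sequence if a step fails instead of plowing ahead
    strategy: node-first
    commands:
      - exec: groovy /opt/ingest/spending_import.groovy
```

Definitions like this live in version control, so the schedule itself gets reviewed and rolled back like any other piece of code rather than living in someone's head.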