What’s the big deal with Artificial Intelligence? Dozens of publications have recently run articles on government agencies focusing on AI implementation and data strategies. In June 2018 the Department of Defense released its Artificial Intelligence Strategy through the commissioning of the Joint Artificial Intelligence Center (JAIC), a central organization focused on harnessing the power of AI in the DoD. In the last two months alone, the Federal government announced that it plans to spend $1 billion on non-defense Artificial Intelligence research and development in fiscal year 2020, and the Department of Energy created its Artificial Intelligence and Technology Office to oversee its implementation of AI. With all this rush to implement AI, it is important that Federal agencies have an in-depth understanding of how to oversee data management so that AI can be used effectively in the Federal space.
Artificial Intelligence is only as effective as your data. When training models, data congruence, accuracy, and dimensionality are all key to success. Data sets need to be uniform when training models because neural networks focus on differences in the data to identify key features. For example, if you are training a model to recognize images of balloons, there are millions of pictures of balloons freely available on the internet that you could use. However, these images come in different pixel resolutions, and some are grayscale while others are in color. If you were to train a model on these images as they are, the model could treat pixel resolution and color scheme as identifying features, which would undermine the training. To train a model effectively, all images need to be set to a uniform pixel resolution and color scheme, allowing the model to focus on the significant features of the image rather than extraneous ones.
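The normalization step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: it assumes images are already loaded as nested lists of pixels, and the function names (`to_grayscale`, `resize_nearest`, `normalize`) and the tiny 4x4 target size are hypothetical choices made for the example. A real project would more likely rely on an imaging library.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 brightness values."""
    # Standard luma weights; any consistent weighting would serve the example.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def resize_nearest(image, width, height):
    """Resize a 2-D grid of pixels to width x height by nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def normalize(image, width=4, height=4):
    """Bring any input image to one resolution and one color scheme."""
    # Color inputs are rows of tuples; grayscale inputs are rows of ints.
    if isinstance(image[0][0], tuple):
        image = to_grayscale(image)
    return resize_nearest(image, width, height)


# Two inputs that differ in both resolution and color scheme.
color_image = [[(255, 255, 255)] * 2 for _ in range(2)]  # 2x2, color
gray_image = [[128] * 8 for _ in range(8)]               # 8x8, grayscale

dataset = [normalize(color_image), normalize(gray_image)]
```

After this step every image in `dataset` is a 4x4 grid of grayscale values, so resolution and color scheme no longer vary between samples.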
Additionally, the accuracy of the data is imperative to training success. If you search for balloons on Google Images, you can see for yourself the wide variety of images that show up, not all of them in line with what you are looking for. Some are images of single balloons, some show balloons arranged in formations, and some are even images of hot air balloons, an entirely different object. These images need to be filtered so that every one of them is an accurate example of what you are trying to identify with the model.
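As a rough sketch of that filtering step, the snippet below keeps only the images a reviewer has confirmed as the target object. The `ImageRecord` type, its `reviewed_label` field, and the file names are all hypothetical and stand in for whatever review process an agency actually uses.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    path: str
    reviewed_label: str  # hypothetical field: label assigned during human review


def keep_target(records, target="balloon"):
    """Return only the records whose reviewed label matches the target object."""
    return [r for r in records if r.reviewed_label == target]


# Made-up search results, including a look-alike that should be dropped.
search_results = [
    ImageRecord("img_001.jpg", "balloon"),
    ImageRecord("img_002.jpg", "hot air balloon"),
    ImageRecord("img_003.jpg", "balloon"),
]

training_set = keep_target(search_results)
```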
On top of this, the efficiency of training depends on the dimensionality of the data sets. Continuing the balloon example, the higher the pixel resolution, the more data the model has to parse and the more time it needs to train. This can be costly in compute hours, and excessively high dimensionality can even confuse the model. Therefore, data not only needs to be congruent, but also needs a sensible dimensionality so that models can be trained effectively and in a timely manner.
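To make the cost of resolution concrete, the short calculation below counts how many input values a single image contributes at a few sizes. The specific resolutions are illustrative assumptions, not figures from any particular agency or model.

```python
def input_dimensionality(width, height, channels):
    """Number of values one image contributes: one per pixel per color channel."""
    return width * height * channels


# Illustrative sizes: a full HD color photo, a downsized color image,
# and a small grayscale thumbnail.
full_hd_color = input_dimensionality(1920, 1080, 3)  # 6,220,800 values
resized_color = input_dimensionality(224, 224, 3)    # 150,528 values
small_gray = input_dimensionality(64, 64, 1)         # 4,096 values
```

Under these assumptions, each full HD color image carries over forty times as many input values as the downsized color version, which is why choosing a sensible resolution matters for training time.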
Based on these factors, it is important that the Federal space allocate funding not only to research and development in AI, but also to having highly trained AI data specialists structure data sets. The Defense Advanced Research Projects Agency (DARPA) has been funding AI research and development for over 56 years, and it believes that the DoD’s JAIC should be in charge of managing AI data. A centralized hub for AI data would be an instrumental stride toward a successful implementation of artificial intelligence in the Federal space, lifting much of the responsibility for structuring data from individual agencies. Without proper centralized data management, the implementation of AI in the Federal space could be rendered useless. While the iron is hot, data management should be a high priority for the Federal space to allow for a bright future of AI implementation.