A neural network is essentially a much-simplified model of the network of neurons and synapses in our brains. It looks something like a stacked sandwich, layer upon layer resting on the one below. Each layer is made up of artificial neurons: simple computational units that, when tickled, get excited and pass that excitement along to their connected counterparts.
Excitement in this context is just a number: each unit has an activation value, and each connection has a weight that controls how much of that activation gets passed along. The weight models the strength of the connection between units and layers in much the same way as our brain’s connections among its neurons. When a connection strengthens, more of that excitement is passed along to the units on the other side.
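A single one of these fake neurons can be sketched in a few lines of Python. The inputs, weights, and bias below are invented purely for illustration:

```python
import math

def neuron(inputs, weights, bias):
    """One fake neuron: a weighted sum of incoming excitement,
    squashed through a sigmoid into a value between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation

# Stronger connection weights pass more of the excitement along.
strong = neuron([0.9, 0.1], [2.0, 0.5], -0.5)
weak = neuron([0.9, 0.1], [0.5, 0.5], -0.5)
print(strong > weak)  # True
```

The same inputs produce more downstream excitement when the connection weights are larger, which is all “strengthening a connection” means here.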
In building an artificial neural network, one connects each layer of these fake neurons to the next layer, and that one to the next, and so on; as activity cascades upward, the layers settle toward a single representation of a thought. Like “baseball”. To arrive at that single thought of “baseball”, you have to start at the lowest layer with lots of candidate thoughts, only one of which is “baseball” and many of which are “not baseball”.
Some neurons excite the thought of “baseball”, while secondary or lower-level neurons excite thoughts of “not baseball” that are nonetheless associated with it. As each fake neuron connects with other fake neurons, the network is “learning” that “baseball” is an important thought.
This is how machine learning does its thing and is, in a highly simplified explanation, the foundation of Artificial Intelligence.
While this technique has been around for 50 years, its application in fields like Cybersecurity or Market Intelligence has only now become feasible through our ability to quickly parse huge data lakes. One of the most successful applications of deep neural nets today is in image recognition. It starts with a picture. To stay in the analogy, let’s say it’s Aaron Judge, the current Yankees’ phenom. We feed Aaron’s picture into our neural network and we set the excitement metric for each of our fake neurons to the brightness of each pixel in the image. Then we feed tons of other images into the system with similar instructions: Hippopotamus, Dart, Kangaroo, Pencil, Hot Dog, Rabbit, Barcalounger.
At this point, the connections between our neurons have random weights, as our brain might at birth. On “seeing” the image for the first time, the input neurons fire strongly on the brightest pixels and weakly on the dimmest. At the highest layer of the network, we would expect to eventually get an image of Aaron Judge along with the value “Aaron Judge”.
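That forward pass, pixel brightness in at the bottom and excitement flowing layer to layer through random “at birth” weights, might be sketched like this. The four-pixel “image” and the layer sizes are invented for illustration:

```python
import random

random.seed(0)  # make the random "at birth" weights reproducible

def layer(inputs, weights, biases):
    """Pass excitement from one layer to the next: each downstream
    neuron takes a weighted sum of the upstream activations."""
    out = []
    for w_row, b in zip(weights, biases):
        total = sum(x * w for x, w in zip(inputs, w_row)) + b
        out.append(max(0.0, total))  # ReLU: excited, or silent
    return out

# A hypothetical 4-pixel image: input excitement is pixel brightness.
pixels = [0.9, 0.2, 0.7, 0.1]

# Random connection weights, as in a network that has never trained.
hidden_w = [[random.uniform(-1, 1) for _ in pixels] for _ in range(3)]
hidden = layer(pixels, hidden_w, [0.0] * 3)

output_w = [[random.uniform(-1, 1) for _ in hidden] for _ in range(2)]
output = layer(hidden, output_w, [0.0] * 2)
print(output)  # untrained scores for, say, "Aaron Judge" vs "Barcalounger"
```

With random weights the output scores are meaningless noise; the whole point of training is to nudge those weights until the right score wins.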
Now imagine this in real time, parsing through billions of images. Also imagine that sometimes along the way toward image recognition, our processor misreads the brightness of two or more pixels in a way that causes weights to be mis-assigned, so that now we get an image of a Barcalounger along with the value “Aaron Judge”. You will undoubtedly not recall, but when you were learning how to recognize the world around you, it often happened that you would refer to a tree as a ball or could not find the correct context in which to hold the weird guy who shoved letters into your mailbox.
This is exactly how neural networks operate and is also the framework for modern artificial intelligence.
This is also where the AI world engages a technique known as Backpropagation to rejigger either the pixel reads or the strength of each network connection. And this is where “machine learning” becomes key to AI and where errors in each training example can be corrected.
To continue oversimplifying, the process starts at the output layer and identifies how much of a difference exists between what the excitement numbers should have been and what they actually were. Then it looks at each of the connections leading into those neurons, determines its contribution to the error, and repeats layer by layer until it reaches the very bottom, adjusting the weights appropriately along the way. Hence the name: propagating (or tracing) errors back down through each layer from the end result.
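That bookkeeping can be sketched on the smallest possible network, one hidden neuron feeding one output neuron. The input value, the target, and the learning rate are invented for illustration:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy two-layer network: input -> hidden -> output, one neuron each.
x, target = 0.5, 1.0
w1, w2 = 0.1, 0.1  # connection strengths, initially weak

for step in range(1000):
    # Forward: excitement flows up through the layers.
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    # Backward: measure the error at the output, then trace each
    # connection's share of the blame back down the layers.
    d_y = (y - target) * y * (1 - y)   # error signal at the output
    d_h = d_y * w2 * h * (1 - h)       # error propagated to the hidden layer
    w2 -= 0.5 * d_y * h                # adjust each weight by its contribution
    w1 -= 0.5 * d_h * x

print(round(y, 3))  # creeps toward the target of 1.0 as training proceeds
```

Each pass nudges the weights a little in the direction that shrinks the output error, which is all “learning” means mechanically.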
When done with millions of images, the neural network gets quite good at declaring Aaron Judge against an image of … Aaron Judge. In addition, each layer begins to “see” an increasingly holistic version of the final recognition. One layer starts to excitedly detect certain image edges that it “knows” form a human head; the layer above gets excited as it identifies sets of shapes it has “seen” before, like a baseball cap or a bat, and passes that excitement along to the next layer, the network actually beginning to self-organize hierarchically without external stimuli.
The next time it “sees” the formation of Aaron Judge-like stuff, it “knows” earlier in the process and much faster than the time before that it “has” an Aaron Judge. The same way you learn that saying certain words to your parents will result in a harsh reprimand.
What is even more remarkable is that today’s neural networks are able to build representations of ideas. Train a neural network on text from a multi-billion-word encyclopedic source, for example, letting it assign each word excitement weights that become coordinates in a vector space, and words with similar “meanings” will end up alongside each other in searches.
Then, applying vector arithmetic, one can subtract the vector for California from the vector for Yosemite, add the vector for Yellowstone, and end up with Wyoming, without preconditioning of any kind.
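A toy sketch of that vector arithmetic, using hand-made 2-D vectors rather than vectors learned from an encyclopedia (real embedding toolkits also exclude the query words themselves when hunting for the nearest neighbor, as done here):

```python
# Invented 2-D vectors for illustration; real learned embeddings have
# hundreds of dimensions. One axis loosely encodes "which state",
# the other loosely encodes "is a park".
vectors = {
    "California":  (1.0, 0.0),
    "Yosemite":    (1.0, 1.0),
    "Yellowstone": (2.0, 1.0),
    "Wyoming":     (2.0, 0.0),
    "Hot Dog":     (9.0, 9.0),
}

def nearest(v, exclude):
    """Closest stored vector by Euclidean distance, skipping the
    query words themselves (standard practice in analogy searches)."""
    candidates = {w: u for w, u in vectors.items() if w not in exclude}
    return min(candidates,
               key=lambda w: sum((a - b) ** 2 for a, b in zip(candidates[w], v)))

# Yosemite - California + Yellowstone -> ?
query = ("Yosemite", "California", "Yellowstone")
yos, cal, yel = (vectors[w] for w in query)
v = tuple(a - b + c for a, b, c in zip(yos, cal, yel))
print(nearest(v, exclude=query))
```

With these toy coordinates the search lands on Wyoming; the trained version does the equivalent across hundreds of dimensions at once.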
Training neural networks is the process of assembling vast sets of “things” into multidimensional vector spaces, with weights assigned via coordinates related to the proximity of one thought, thing, or idea to another. And because of our relatively newfound ability to process vast seas of data in near-real time, neural networks appear to magically mimic human brain activity. But as with all complex problems, the devil is found in the details, or in this case may also be found in Montana or Idaho, where parts of Yellowstone Park reside.
And it is in our ability to properly weight all of the vector relationships that could possibly exist in, say, the presence of a Heartbleed bug in a cryptographic library that we find the gating factor in successfully applying this incredible process to Cybersecurity discovery and defense.
An advanced set of algorithms that detects what it thinks may be the formation of an advanced persistent threat and can trigger appropriate incident response is a gold mine, right up until the moment when that same set of algorithms mistakenly fails to recognize a hedge fund trader’s surreptitious pilfering of the company’s proprietary trading algorithms and believes it instead to be an innocent download of transaction detail for the Charles Payne account by a properly credentialed user.
Neural networks, for all their prodigious promises, are simply an assortment of empty, fuzzy pattern-recognition algorithms, and represent, at best, a limited brand of easily hoodwinked fake intelligence.
Changing a single pixel, or adding visual noise that is imperceptible to the human eye, can completely flummox a neural network. For all the time and calories that have been consumed in the development of autonomous cars, we still see that self-driving vehicles routinely fail to navigate conditions they have never “seen” before. All kinds of machines equipped with AI and Deep or Machine Learning fail to consistently parse language or events that are contextually inconsistent with their training.
Until we get to the point where our neural networks don’t collapse when the layers suddenly can’t assign a proper weight to a flipped bit they have never seen before, we will remain stuck in the endless leapfrogging groove that keeps repeating in our Cybersecurity playlist. So near we can smell it, but still it remains a small fuzzy image hovering on the distant horizon.
And by the way, instead of breathlessly publishing each and every new research discovery and breakthrough at MIT and Stanford, maybe we could keep the ones that really matter hidden, so that the bad guys don’t immediately spin up countermeasures that bump us back into that groove.
Discretion continues to remain the better part of valor. Especially in Cybersecurity defense.