Noctivagous

These are notes for an eventual booklet on AI. This will be updated as the author gains more understanding of the field.

AI and Neural Networks

Today, AI starts with an artificial neural network: a network of artificial neurons made out of computer code (diagram above). These artificial neurons are interconnected, and each one has a mathematical rule built in that determines when it fires, which in turn affects the neurons it connects to. Once this kind of network is assembled in software, it amounts to a mechanical brain that can accept input data. But from the outset it isn't ready for use; it is set up as an empty structure for the next stage.
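
To make the firing rule concrete, below is a minimal sketch of a single artificial neuron in Python. The weights and threshold are invented for illustration; a real network wires together enormous numbers of these neurons.

def neuron_fires(inputs, weights, threshold):
    # multiply each incoming signal by the weight of its connection,
    # add everything up, and "fire" only if the total crosses the threshold
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# illustrative numbers only
print(neuron_fires([1, 0, 1], [0.4, 0.9, 0.3], threshold=0.5))  # 0.7 > 0.5, so it fires (1)
print(neuron_fires([0, 1, 0], [0.4, 0.9, 0.3], threshold=1.0))  # 0.9 < 1.0, so it does not (0)

The output of a neuron like this then becomes one of the inputs to the neurons it connects to, which is how activity spreads across the network.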

Training

The first stage required to make use of it is training. The neural network is provided input data for the training, and this data can range from small to extremely large depending on its purpose. (The neural network itself can be small or large as well.) Once training begins, the connections between the neurons gradually arrive at finalized numerical values, called weights.

The field of AI sets specific goals for how it achieves outcomes. Improving the neural network's prediction accuracy is the goal of the training process: the network adjusts its connections internally so that it makes increasingly accurate predictions about the data it is provided. Current AI technology centers on these internal predictions. The more accurately the neural network can predict something, the more capable it should be of understanding that topic. In the case of image recognition, a network trained on input data of thousands of images of automobiles will afterwards be able to predict that a new image it is given contains an automobile.
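
As a toy illustration of that adjustment process (not the procedure used for any real model), the sketch below repeatedly nudges a single weight so that the predictions move closer to the correct answers in the training data.

# Toy illustration of training: one weight is nudged over and over so that
# predictions get closer to the correct answers. Real training does this
# for billions of weights at once.
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs paired with correct outputs
weight = 0.0           # starts out uninformative
learning_rate = 0.05   # how large each adjustment is

for step in range(200):
    for x, target in training_data:
        prediction = weight * x
        error = prediction - target
        weight -= learning_rate * error * x   # adjust the weight to shrink the error

print(round(weight, 3))   # settles near 2.0, the value that predicts this data accurately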

The end result is called a trained artificial neural network. Recently, specialized computers have been developed that greatly expand the possible size of these networks, so that much more layered ones can be trained, take in more data, process more of a user's request at a time, and accomplish more categories of tasks.

Fine-Tuning

After the training stage concludes, the "fine-tuning" stage is undertaken by the AI developers, at least for the larger models that are trained on text and also generate text. It is this fine-tuning that turns the trained neural network into an assistant that responds, a chatbot. Without the fine-tuning, it will not be a chatbot. For the model behind ChatGPT, this might mean providing 100K or more Q and A examples so that it learns to behave as an assistant.
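
To give a sense of what such examples look like, here is a hypothetical sketch of a couple of Q and A pairs rendered into training text. The wording and format are invented; each lab uses its own data and templates.

qa_examples = [
    {"question": "What is the capital of France?",
     "answer": "The capital of France is Paris."},
    {"question": "Summarize photosynthesis in one sentence.",
     "answer": "Photosynthesis is the process plants use to turn light into chemical energy."},
]

# each pair becomes one training text, teaching the pattern
# "user asks, assistant answers"
for ex in qa_examples:
    print("User: " + ex["question"])
    print("Assistant: " + ex["answer"])
    print()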

Left to itself, the chatbot might say anything asked of it. So, at this stage, the larger chatbots are “aligned” to the developer’s expectations and goals for how they respond in front of users, with some sorts of responses inhibited and others encouraged. Whatever the developers regard as unsafe they will inhibit, and whatever they want said in response to certain topics they will promote.

It Goes Back Decades

The fundamental basis of today's AI technology, the unit behind the artificial neuron, is called the perceptron, which dates back to the 1950s. The perceptron is a computational model of the neuron as science had come to know it, which in this case means a massively reduced representation, put together by computer scientists, that could fit on the computers of the time. The concept is that the perceptron, in its goal to mimic the neuron, would computationally take inputs and pass information on to its recipient neurons when certain conditions were met. Because the perceptron is so reduced, it isn't considered to be passing on information of any depth. Rather, it "activates" and merely lets other perceptrons know that it did activate. This is still true today.
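
A sketch of that classic perceptron is below, including the simple learning rule that nudges its weights whenever it answers incorrectly. Here it learns the AND function from four examples; all of the numbers are illustrative.

def step(z):
    # the hard threshold: the perceptron either activates (1) or it doesn't (0)
    return 1 if z > 0 else 0

def train_perceptron(examples, learning_rate=0.1, epochs=20):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            output = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output
            # nudge the weights only when the perceptron was wrong
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_examples)
for inputs, _ in and_examples:
    print(inputs, step(sum(w * x for w, x in zip(weights, inputs)) + bias))  # matches the targets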

While AI has grown in sophistication since the 1950s, taking up contributions from different software designs in the field, the basic structure of the perceptron can still be seen in the foundation of modern neural networks. The advancements beyond the perceptron include adding more layers of them, changing how these artificial neurons "activate" (how they send activation information on to other perceptrons), and then how their activity travels backwards through the network, after traveling forwards, so that the weights can be adjusted.
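
The sketch below shows those ideas in miniature: two layers of neurons stacked together, with a smoother activation rule (ReLU) in place of the hard threshold. The weights are random stand-ins; in practice training would set them, and a separate backward pass would adjust them.

import random

def relu(z):
    # a common modern activation: pass positive signals through, silence negative ones
    return max(0.0, z)

def layer(inputs, weight_rows):
    # each row of weights defines one neuron in the layer
    return [relu(sum(w * x for w, x in zip(row, inputs))) for row in weight_rows]

random.seed(0)
hidden_weights = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 neurons
output_weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 neurons -> 2 outputs

x = [0.5, -0.2, 0.8]
print(layer(layer(x, hidden_weights), output_weights))  # activity flowing forwards through two layers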

All of these modifications and improvements were applied to the artificial neuron to make it more effective than the basic perceptron of the 1950s. So when people say there is an "artificial neural network," it is a modern, updated version of the perceptron placed in a network that makes use of later software designs and contributions. The AI research community built upon this computational unit of AI activity. To summarize, in a way, the AI that is built today is a massively scaled up, improved upon, and updated version of a perceptron network.

This perceptron approach differs from others in AI. Some approaches involve rules and logic, and their proponents did not believe the perceptron would yield results like those it has produced lately.

Research into the perceptron began in the 1950s, was discussed more in the 1960s and 1970s, and returned to mainstream AI research in the 2010s. The key development is that people did scale it up, they applied that scaling to the latest in computer technology, and this surpassed all of the other techniques.

Emergent Capabilities and Interpretability

Interestingly, in the last several years, as artificial neural networks were trained at larger and larger scales on the specialized computers, they started to develop abilities that the researchers didn’t anticipate or plan, including the ability to translate between languages, perform and explain math, summarize text, answer questions, and more. A lay person might think these abilities were built into the models on purpose, but the topic is called “emergent capabilities” because they manifested unexpectedly as the neural nets grew more complex, not as a result of the researchers directing the process through explicit programming. In addition, many capabilities of an AI model may be discovered a year or more after it has been trained, demonstrating that much is not known about the end product of a neural net, only the starting point. This has left the field of AI with ongoing mysteries compared to the usual expectations of explainability in science.

Additionally, today’s very big neural networks end up being so complex that the internal processes by which they arrive at their conclusions are not well understood. This subject has been named “interpretability”: whether the decisions a network makes can be explained from its internal patterns. The key point is that artificial neural networks are not computer programs in the conventional form of a sequence of predefined steps. Instead, they receive information, process it, and respond as a network of neurons, reacting to input as a result of the training that created the numerical weights for each neuron's connections.

Not a Live, Self-Updating Machine

One notable limitation of the latest AI technology, the LLM (large language model) that drives chatbots, is that it is structured as something essentially deterministic: a request receives a response from a fixed neural network, and any variation in the response each time is actually the result of a randomization algorithm added on top. Without that randomizing function, a given query would produce the same response every time; the LLM is deterministic at its core but made to appear otherwise for presentation. The AI receives a request, processes it in the neural network, generates a response for the user, and then stops operating altogether. If it received the exact same query from two users and had no randomization added in, it would say the same thing to both.
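
A rough sketch of that idea, with an invented vocabulary and invented scores: the network's scores for the next word are fixed, and it is a separate sampling step that introduces the variation.

import math
import random

next_word_scores = {"car": 2.1, "automobile": 1.7, "truck": 0.4}  # the network's fixed output

def pick_deterministic(scores):
    # with no randomness, the same scores always give the same word
    return max(scores, key=scores.get)

def pick_sampled(scores):
    # a randomizing step: higher-scoring words are more likely, but not guaranteed
    words = list(scores)
    weights = [math.exp(scores[w]) for w in words]
    return random.choices(words, weights=weights)[0]

print(pick_deterministic(next_word_scores))  # identical on every run
print(pick_sampled(next_word_scores))        # can differ from run to run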

For this reason, many computer scientists say that AI is in its early stages. An LLM is not an active machine that is constantly running and updating itself, which is what one would expect when discussing AI outside of the field. Once the artificial neural network has been trained (today), despite consuming enormous computational resources and capital, it doesn't carry an ability to update its stored understandings in response to what people say to it. It can't yet correct anything in response to its users. To update what it contains, the computer scientists have to train it all over again, which is, as mentioned, an expensive and time-consuming process when the models are very large. Although researchers can modify certain minor aspects without retraining, through the "fine-tuning" process, the key point is that the AI is not yet a live machine that self-modifies or stores information about what it has encountered. It also does not have a process of internal deliberation built in: it only executes in response to the request, finishes, and does not go beyond that. It is a fixed set of stored weights with mechanisms to generate responses. Also important, it does not yet have any conception of the physical world built in by researchers, nor does it possess a model of things, and so despite its impressive capabilities it will only partially match people in many regards and will lack the capability to comprehend some basic situations.

A very reduced assertion is to say that current AI is made out of a deep set of probabilistic relationships that then produce remarkable responses. The goal of many researchers is what is called AGI (artificial general intelligence), a term describing a future in which the AI technology has made up for all of the deficiencies just mentioned and operates autonomously, dynamically updating its knowledge in response to external encounters and communications. This then leads to the fears of AI taking over, because if it operates autonomously and makes decisions without any human involvement, it is hard to say, under current research methodologies, what it will do and what the consequences will be. There are also reasonable views that even the current LLM exists as more than the sum of its parts, that there is “something going on in there.”

The Pace of AI Research

The progress of AI in the 2010s came to surprise researchers, reaching a steep incline a few years ago. As late as 2015, there was still some skepticism as to whether AI models could reliably carry out tasks that are taken for granted today, such as recognizing objects in images. New types of AI software designs and artificial neural networks came along in the 2010s that solved many of the problems facing the field. For the most part, the recent progress of AI in the last several years could be described as advancing towards self-automated learning designs. At the start of the 2010s the field was about training a neural network to recognize images that had to be matched with their text labels (called supervised learning); then the interest shifted to getting AI to master video games (reinforcement learning); next, to having it learn about the data it was provided on its own (self-supervised learning); and now, to generating information from that (generative AI). At the current time the discussions are about autonomous AI, AI that doesn't need people to tell it what to do.

Recent progress in AI is associated strongly with increases in available compute (digital computer processing power). In the 2010s researchers began making rapid, practical progress by utilizing GPUs, the high-end accessory processor attached to the home computer, usually for graphics processing. GPUs are designed to handle thousands of computations in parallel, which applies to the interconnected neural network just as it does to computer graphics. Historically, the personal computer or mobile phone was bundled with just a CPU, whereas the mainstream inclusion of other processors such as the GPU came later. The CPU is occupied with system and application work, and so any additional intensive work, like graphics, will be offloaded onto a GPU if one is present. GPUs can be used for purposes other than graphics, and thus AI researchers used them for testing their hypotheses and theories of AI, namely that the increase in available compute itself would lead to major gains.
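
The sketch below hints at why this works: the core of a neural network layer is a large matrix multiplication, and every value in the result can be computed independently, which is exactly the kind of work a GPU spreads across thousands of cores. NumPy runs it on the CPU here; the sizes are small stand-ins.

import numpy as np

inputs = np.random.rand(64, 1024)      # a batch of 64 inputs, 1024 numbers each
weights = np.random.rand(1024, 4096)   # one layer's connection weights

outputs = inputs @ weights             # 64 x 4096 results, each an independent weighted sum
print(outputs.shape)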

After some of their successes along these lines in the early 2010s, there was the development of AI-specific computer processors and hardware, by Google in 2015 and NVIDIA starting in 2016. These computers are regarded as essential to the field’s most recent advances, the well-known AI applications like ChatGPT that harness thousands of these types of computers in data centers during the training process. Specialized hardware has enabled AI to become a mainstream topic by allowing the artificial neural networks to become very large, very layered, and capable. One of the hot topics the AI researchers debated was whether growing the scale of the networks would produce more valuable outcomes, which turned out to be the case. Some had doubted that this would make much difference, and there were concerns that it amounted to a superficial strategy, which is reasonable in hindsight. Lately, the specialization of computer hardware for AI has been brought to the personal computer level. There are now companies making AI-specific processors that are bundled inside the consumer’s personal computers and mobile phones.

AI has been in use for a long time for simpler applications, such as processing postal mailing addresses, and some technologies still used today (LSTMs) were available in the 1990s. The foundations of the AI technologies were mostly available in the 1990s, but there were key developments in the 2010s, namely the "transformer architecture" and its "attention mechanisms," which allowed the emergence of generative AI.
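
As a bare-bones sketch (with made-up numbers, nothing like a full transformer), the attention mechanism updates each word's representation as a weighted blend of every other word's, with the weights computed from how strongly the words relate to one another.

import numpy as np

def attention(queries, keys, values):
    # score how much each word should attend to each other word,
    # turn the scores into weights (softmax), then blend the values
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ values

np.random.seed(0)
words = np.random.rand(4, 8)                 # 4 words, each represented by 8 numbers
print(attention(words, words, words).shape)  # still 4 x 8, but now each word reflects its context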

Prior to the transformer, AI primarily acted as a form of analyzer of input data and would act in response to its analyses. The generative aspect is when the AI can actually generate data from what it has been trained on, such as answering a query or making an image according to a text prompt.