Large language models are the technology that makes ChatGPT work. Think about how children learn to speak: first they listen to large amounts of speech, which they later mimic when they begin speaking themselves. LLMs are built in a similar way. They are neural networks that imitate the functioning of the brain. Initially, they are trained on large quantities of text, mostly sourced from the Internet; think of sites like Wikipedia and large news websites.
If we input text, an LLM will consider many possible continuations and use what it has learned to determine the most likely one. For example, if we write “give”, there’s a huge range of possibilities. But if we add more context by writing “give pause for”, the range of possibilities narrows, and it will most likely generate “give pause for thought”.
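To make this concrete, here is a toy Python sketch. The probability values are invented purely for illustration and do not come from any real model, but they show the idea: adding context collapses the range of likely next words.

```python
# Toy illustration of next-token prediction (not a real LLM).
# The probabilities below are hypothetical, hand-picked numbers.

# With only "give" as context, many continuations are roughly comparable.
after_give = {"me": 0.20, "up": 0.15, "it": 0.12, "back": 0.10, "pause": 0.02}

# With "give pause for" as context, the distribution collapses.
after_give_pause_for = {"thought": 0.92, "reflection": 0.05, "concern": 0.01}

def most_likely(next_token_probs):
    """Return the continuation with the highest probability."""
    return max(next_token_probs, key=next_token_probs.get)

print(most_likely(after_give))            # "me" -- several options are close behind
print(most_likely(after_give_pause_for))  # "thought" -- context makes it near-certain
```

A real LLM does the same thing over a vocabulary of tens of thousands of tokens, with the probabilities computed by the network rather than written by hand.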
LLMs are probabilistic models that rely on a great many parameters. To give you an idea, GPT-3, the model family behind ChatGPT’s first release, has 175 billion parameters. The latest model, GPT-4, is reported to be far larger still, although OpenAI has not disclosed its exact size. A major challenge in LLM development is to maintain the same performance while significantly reducing the number of parameters, because the energy and environmental cost of training these models is very high.
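A rough back-of-the-envelope calculation hints at why parameter count drives cost. Assuming the weights are stored as 16-bit numbers (an assumption for this estimate, not a detail from the text), simply holding GPT-3’s parameters in memory takes hundreds of gigabytes, before counting any of the extra memory needed during training.

```python
# Back-of-the-envelope estimate of the memory needed just to store model weights.
# Assumes 2 bytes per parameter (16-bit precision); real deployments vary.

params_gpt3 = 175e9      # 175 billion parameters (GPT-3)
bytes_per_param = 2      # fp16 storage, an assumption for this sketch

memory_gb = params_gpt3 * bytes_per_param / 1e9
print(f"Weights alone: ~{memory_gb:.0f} GB")  # ~350 GB, before gradients or optimizer state
```

Training requires several times more memory and compute than this, since gradients and optimizer state must be stored alongside the weights, which is why shrinking models without losing performance matters so much.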