LLMs Demystified

Ninad Parab
6 min read · Jan 5, 2025


Photo by Mika Baumeister on Unsplash

LLMs, especially ChatGPT, have become synonymous with Artificial Intelligence (AI). The world has been amazed by the capabilities of LLMs and wonders what is in store in the future. Will the exponential growth of this technology continue? Will machines learn from images and videos just as they have learnt from text? Are LLMs really intelligent machines that pose an existential threat to humanity? Some of these questions can be answered by understanding how LLMs actually work. Here is a non-technical primer, based on my understanding, that addresses some of them.

Language models that predict the next word and write content have existed for a long time. Think of autocomplete on your phone or in Gmail. These models predict the next word from the context of the past few words. For example, given the statement ‘32 teams played in the Football World ___’, most people can guess that the next word is ‘Cup’. There are other plausible words, such as ‘Championship’ or ‘Tournament’, but based on what you have learnt over the years, the probability of the next word being ‘Cup’ is the highest. LLMs simply scaled up this methodology by learning from the whole internet and a great deal of other reading material, and they stored what they learnt in a book (called the Model) as a list of rules (called Parameters).
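
To make the idea concrete, here is a toy sketch of next-word prediction in Python. The vocabulary and probabilities are invented for illustration; a real LLM computes such scores from billions of parameters, not from a hand-written table.

```python
# Toy illustration of next-word prediction (not how a real LLM is built):
# score a handful of candidate words given the context and pick the one
# with the highest probability.

context = "32 teams played in the Football World"

# Made-up probabilities standing in for what a trained model would compute.
candidate_probs = {
    "Cup": 0.92,
    "Championship": 0.05,
    "Tournament": 0.02,
    "Series": 0.01,
}

next_word = max(candidate_probs, key=candidate_probs.get)
print(context, next_word)  # -> "... Football World Cup"
```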

So how does an LLM store what it has learnt? It stores the relationships between different parts of the text it has read as a set of rules, or Parameters. Without going into technical details, assume that the LLM notes down the relationships it finds in a ‘Parameter Book’, the Model. It will learn that the word ‘Football’ is followed by ‘World Cup’ on many occasions (i.e. with high probability) and will write this rule down as one or more parameters. Because it reads such a vast amount of data, it may also learn that the phrase ‘Football World Cup’ is associated with FIFA and write that down as another parameter. Some parameters may store links between football players and the World Cup; some may store the history of the World Cup. The book includes not only positive linkages but also exclusionary rules, e.g. ‘T20’ rarely appears in the context of football. As you can imagine, there can be a huge number of such linkages, and the LLM diligently writes all these rules into its ‘Parameter Book’.
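
Sticking with the metaphor, a toy ‘Parameter Book’ might look like the sketch below, where each rule is just an association strength between two phrases, including a negative, exclusionary one. Real parameters are numeric weights buried inside a neural network, not readable rules like these, and all the numbers here are made up.

```python
# Toy 'Parameter Book': association strengths between phrases learnt from text.
# Positive values mean the phrases tend to appear together; negative values
# mean they rarely do. Numbers are invented purely for illustration.
parameter_book = {
    ("Football", "World Cup"): 0.9,    # strong positive linkage
    ("Football World Cup", "FIFA"): 0.8,
    ("World Cup", "players"): 0.6,
    ("Football", "T20"): -0.7,         # exclusionary rule: rarely co-occur
}

def association(a, b):
    """Look up how strongly two phrases are linked in the toy book."""
    return parameter_book.get((a, b), 0.0)

print(association("Football", "World Cup"))  # 0.9
print(association("Football", "T20"))        # -0.7
```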

The size of this ‘Parameter Book’ depends on a couple of factors: the breadth of domains covered and how extensive the linkages are. Most advanced LLMs today are ‘trained’ on the entire internet, countless books, forums and so on, so they cover a wide array of topics. But there can also be LLMs built for specific domains, or even for a company’s internal data, including chats. The second factor, the linkages, depends on how deep one wants to go. Continuing with the example above, a smaller ‘Parameter Book’ (Model) might have, say, only 10,000 rules about football, covering key teams, players and tournaments, while a bigger model might have ten times as many rules, covering strategies, history and the more obscure corners of the game. That is why companies release multiple versions of their models: Meta’s Llama, for example, has shipped in versions ranging from 7 billion to 65 billion parameters. Users can select the number of parameters to suit the application, just as one might choose a pocket dictionary with 50,000 words for daily use and the complete 600,000-word edition to study words in detail.
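
One practical consequence of the book’s size is memory. A back-of-the-envelope sketch, assuming each parameter takes 2 bytes (16-bit precision), shows why bigger models need bigger hardware; the figures are rough approximations, not official numbers.

```python
# Rough memory footprint of a model, assuming each parameter is stored in
# 16-bit precision (2 bytes). Real deployments vary, e.g. with quantisation.
BYTES_PER_PARAM = 2

for params_in_billions in (7, 13, 65):
    gigabytes = params_in_billions * 1e9 * BYTES_PER_PARAM / 1e9
    print(f"{params_in_billions}B parameters ≈ {gigabytes:.0f} GB of memory")
# 7B ≈ 14 GB, 13B ≈ 26 GB, 65B ≈ 130 GB
```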

Creating this ‘Parameter Book’ is not an easy task and requires a huge number of specialised processors (largely manufactured by NVIDIA). Only a few companies with deep pockets have been able to build these books. They then wrap the books in interfaces that users can work with. So when someone asks ChatGPT a question, it consults this ‘Parameter Book’ to guess what the next words should be and composes the answer. The quality of the answer depends on how detailed the ‘Parameter Book’ is.
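
Composing the answer really is just repeating that next-word guess over and over. Here is a minimal sketch of the loop, with a hypothetical predict_next_word function standing in for the lookup into the ‘Parameter Book’.

```python
# Minimal sketch of how an answer is composed: predict one word, append it,
# and repeat until the model decides to stop.
def predict_next_word(words):
    # Hypothetical stand-in for the real model, which would consult
    # billions of parameters to score every word in its vocabulary.
    canned = {"The": "Football", "Football": "World", "World": "Cup",
              "Cup": "is", "is": "held", "held": "every", "every": "four",
              "four": "years", "years": "<end>"}
    return canned.get(words[-1], "<end>")

answer = ["The"]
while True:
    word = predict_next_word(answer)
    if word == "<end>":
        break
    answer.append(word)

print(" ".join(answer))  # The Football World Cup is held every four years
```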

The company that created the ‘Parameter Book’ decides how people can use it, and this is where open-source and proprietary models differ. Proprietary models such as ChatGPT do not share the ‘Parameter Book’ with everyone. People can only ask questions and get answers through the interface made available by these companies, which charge customers based on how much they use it.
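
In practice, using a proprietary model means calling its hosted API and paying per use. Below is a minimal sketch using OpenAI’s Python SDK; it assumes the openai package is installed and an API key is set, and the model name is purely illustrative.

```python
# Sketch of querying a proprietary model through its hosted API.
# Requires the `openai` package and an API key; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Which team won the last Football World Cup?"}],
)
print(response.choices[0].message.content)
```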

In the open-source model, the ‘Parameter Book’ with all its rules is made available to everyone. Users can take these rules as a base and create additional rules from their own use cases and data. This lets them build their own ‘Parameter Book’ by combining the relevant rules from the open-source book with rules of their own. Such a customised ‘Parameter Book’ can answer questions for their particular use case much better than the original open-source one. The open-source company can then charge users for building on its rules.
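
In the ‘Parameter Book’ metaphor, this customisation (fine-tuning, in practice) is like overlaying your own rules on the published ones. A toy sketch, with all rules, numbers and names invented for illustration:

```python
# Toy sketch of customisation: start from the open-source 'Parameter Book'
# and overlay rules learnt from your own data (fine-tuning, in practice).
open_source_rules = {
    ("Football", "World Cup"): 0.9,
    ("invoice", "payment"): 0.3,
}

# Rules learnt from a company's internal documents (invented examples).
company_rules = {
    ("invoice", "payment"): 0.95,         # stronger link in this company's data
    ("Project Falcon", "deadline"): 0.8,  # entirely new, company-specific rule
}

custom_parameter_book = {**open_source_rules, **company_rules}
print(custom_parameter_book[("invoice", "payment")])  # 0.95, overrides the base
```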

With a big parameter book, LLMs can answer most questions and give the impression that some intelligent being is doing the answering. And this is precisely why LLMs can get simple things wrong, such as 1.1⁵. Because LLMs just build rules from text, they do not necessarily understand what ‘1.1⁵’ means. They treat the numbers as text and look for a similar pattern of text in their rule book. If no such sequence was encountered in the past, it can throw the LLM off, and it gives whichever answer has the highest probability. This whole mechanism makes me think (and I have tweeted as much) that LLMs follow the rote-learning approach popular in India and Asia rather than the independent-thinking approach favoured in the West. Rote learning produces remarkable results in reproducing existing knowledge, but it lacks a fundamental understanding of the concepts. Of course, models are constantly improving, and they can start adding rules about addition to the ‘Parameter Book’. Does that mean they ‘understand’ the basics of maths? I doubt it. That is why the claims that these LLMs can evolve and pose a threat to humanity seem far-fetched to me.
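
For reference, the actual value is trivial to obtain with ordinary arithmetic, which is exactly the step a pure text-pattern approach skips; a quick check in Python:

```python
# The question the text-pattern approach struggles with, answered by doing
# the arithmetic rather than by matching text seen before.
print(1.1 ** 5)            # roughly 1.61051 (plus a little floating-point noise)
print(round(1.1 ** 5, 5))  # 1.61051

# To a language model, "1.1^5" is just a short string of characters/tokens,
# with no built-in notion that it denotes repeated multiplication.
```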

If not LLMs, then can video models learn and become sentient? The latest video models from Google and OpenAI show a lot of promise. Image and video models differ from LLMs mainly in how information is encoded. Words are one of the most efficient encodings of information that humans have: the word ‘car’ in a text denotes a specific entity that can appear in the context of other text. Images, on the other hand, are stored as pixels, so an image of a car is represented as a set of pixels. If the car appears at another angle or in a different setting, the set of pixels changes, and the image model does not intrinsically understand that it is the same car in new surroundings. A car in a sentence, by contrast, remains a car irrespective of the context. Predicting images and videos therefore requires much more processing, because instead of a single word, every pixel feeds into the prediction of the next frame. So generating a full-length movie is still some time away, even though books written with LLMs are already available.
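
To get a feel for the gap in sheer volume, compare one word with the raw numbers in a second of HD video; the figures below are back-of-the-envelope only.

```python
# Back-of-the-envelope comparison: one word vs the raw numbers in video frames.
word = "car"                                  # a single symbol for a text model

width, height, channels = 1920, 1080, 3       # one HD frame, RGB
values_per_frame = width * height * channels  # pixel values in a single frame
frames_per_second = 24
values_per_second = values_per_frame * frames_per_second

print(f"One frame:  {values_per_frame:,} pixel values")   # 6,220,800
print(f"One second: {values_per_second:,} pixel values")  # 149,299,200
```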

What could accelerate progress in image and video models is a move from a pixel model to an entity model. The model should understand the car, not the pixels: if the car changes direction, the model should recognise that the entity (the car) has remained the same and only the context has changed. Models are being developed on this principle, and I believe that will lead to rapid progress in how machines understand the world, bringing them closer to the way humans do. Machines can watch millions of hours of video and learn, just as they read all those books to create LLMs.
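
A toy sketch of the difference in representation: an entity model keeps the same object and only updates its state, while a pixel model sees a brand-new grid of numbers every frame. The Car structure below is invented purely to illustrate the idea.

```python
# Toy contrast between an entity view and a pixel view of the same scene.
from dataclasses import dataclass

@dataclass
class Car:
    identity: str
    position: tuple    # (x, y) in the scene
    heading_deg: float

# Entity view: the car turns and moves, but it is recognisably the same entity.
car = Car(identity="red hatchback", position=(10, 5), heading_deg=0.0)
car.heading_deg = 90.0
car.position = (10, 8)

# Pixel view: the same change produces a different grid of numbers.
frame_before = [[0] * 64 for _ in range(64)]
frame_after = [[0] * 64 for _ in range(64)]
for y in range(20, 30):          # the car's pixels have moved...
    for x in range(30, 40):
        frame_after[y][x] = 255
# ...but nothing in the grids themselves says "this is still the same car".
```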

But is creating these rules from text and videos the same as ‘intelligence’? What would make machines as intelligent as humans, if not more so? These are complicated questions, because we still do not completely understand how the neurons in our brains, the original neural networks, actually work. We can no longer rely on the Turing Test, as machines can often fool you into believing they are human. Will solving all the problems humans solve, and meeting the various benchmarks set for humans, put them on a par with ‘intelligent human beings’?

It reminds me of a question from almost 200 years ago, when an organic compound was first created from inorganic compounds. Before that, it was believed that organic chemicals could only be created by living organisms. Today we treat intelligence as something that living organisms display, even when machines can perform similar tasks. Should intelligence be defined by the constituents of the entity (human vs machine) or by the ability to complete tasks? Is the argument about whether machines are becoming as intelligent as humans merely semantics? Aren’t we already on that road? We are indeed heading into an interesting future.


Written by Ninad Parab

Data Scientist · Banker · Anorak · Football fan · Language/Culture Enthusiast
