OpenAI Open Sourced: What Does It Mean to You?
What makes a model open source, and how do you use one?
OpenAI released open-source models. People think it's a revolution. But this is not the first time such a revolution has occurred. Meta was the first to do it with their Llama models. DeepSeek did it right off the bat. But we are not here today to debate who did it first; we are here to embrace what's happening and figure out how to look at it.
Soon, you will see videos of people talking about this release. Your brain will tag it as a new, distant, complex thing. But it is just another open-source model making its way to its users.
LLMs and Their Components
An LLM has two parts: the code and the weights.
The code, in turn, has two parts: training and inference.
Training involves chunking (breaking large text into smaller pieces), tokenizing (splitting sentences into words or sub-words), and vectorization (converting tokens into numbers so that GPUs can work with them). Then comes the training phase, where the magic of feeding a large neural network (a math equation) with huge amounts of training data happens, followed by fine-tuning, another layer of training with a task-specific dataset that nudges the LLM toward a specific task (e.g., ChatGPT = to chat = Q&A). The output of all of this is the weights.
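Here is a minimal sketch of that preprocessing in Python. The chunk size, the toy vocabulary, and the 8-dimensional embeddings are made up for illustration; they are not what OpenAI actually uses.

import numpy as np

# Toy preprocessing pipeline: chunk -> tokenize -> vectorize.
# All sizes here are illustrative, not real model settings.
text = "open models put the weights in your hands"

# 1. Chunking: split long text into smaller pieces (here, 5 words per chunk).
words = text.split()
chunks = [words[i:i + 5] for i in range(0, len(words), 5)]

# 2. Tokenizing: map each word to an integer ID from a vocabulary.
vocab = {word: idx for idx, word in enumerate(sorted(set(words)))}
token_ids = [[vocab[w] for w in chunk] for chunk in chunks]

# 3. Vectorizing: turn each token ID into a dense vector the GPU can multiply.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))   # 8-dimensional toy embeddings
vectors = [embeddings[ids] for ids in token_ids]

print(chunks)
print(token_ids)
print(vectors[0].shape)   # (5, 8): five tokens, eight numbers each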
Inference is what you see ChatGPT do when you ask it a question. The text you enter goes through the same chunking, tokenization, and vectorization, except this time the weights produced by training make it hop through specific hoops based on what the model "learned" previously. Certain nodes of the neural network light up while others remain dormant, just like in our brain. Finally, output vectors are generated, converted back into words, and presented to you.
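A toy forward pass makes the "some nodes light up, some stay dormant" idea concrete. This is a made-up two-layer network with random weights and a four-word vocabulary, not the real architecture; a real LLM stacks many transformer layers instead.

import numpy as np

# Toy inference pass: input vector -> hidden layer -> predicted word.
rng = np.random.default_rng(1)
vocab = ["yes", "no", "maybe", "hello"]

x = rng.normal(size=8)                 # vectorized input token (8 toy features)
W1 = rng.normal(size=(8, 16))          # "learned" weights, layer 1
W2 = rng.normal(size=(16, len(vocab))) # "learned" weights, layer 2

hidden = np.maximum(0, x @ W1)         # ReLU: positive nodes light up, the rest stay at 0
logits = hidden @ W2                   # a score for every word in the vocabulary

print("active nodes:", int((hidden > 0).sum()), "of", hidden.size)
print("predicted word:", vocab[int(np.argmax(logits))])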
Weights: the neural network, the vectorization, and the transformation layers together make up the architecture of a model. But all of this is of no use without the weights.
A model is truly open source when both the weights and the data used for training are released. We can do without the training data, because even with such large volumes of data we wouldn't have the infrastructure to train on it. But the weights? Those you can't do without.
What are these weights, anyway?
Numbers.
Yep. That's it.
Neural networks are large, glorified linear equations with billions of parameters. When you see 30B on a model, that means 30B variables in the equation.
Ax + By + Cz + … (30B such terms of complicated, incomprehensible chaos that somehow makes sense)
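To see how the count adds up, here is a back-of-the-envelope sketch in Python. The layer shapes are invented for illustration; the point is only that every weight matrix is a grid of numbers and the "30B" in a model name is the total count of such numbers.

# Back-of-the-envelope parameter counting for a made-up stack of layers.
layer_shapes = [(4096, 11008), (11008, 4096), (4096, 4096)]  # invented sizes

total = sum(rows * cols for rows, cols in layer_shapes)
print(f"{total:,} parameters in this tiny made-up stack")   # ~107 million
# A 30B model simply repeats this idea until the count reaches ~30,000,000,000.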
How to use OSS models?
That covers the understanding part; what's in it for you? How can you make use of this?
Well, if you have a GPU, you can run your own ChatGPT without ever fearing that your midnight therapy sessions will be exposed.
Even if you don't, there is a smaller 20B model that I hope will run on a CPU. Unlike GPUs, CPUs struggle to perform math this complicated. On top of that, the 20B model weighs in at about 14GB, and the whole 14GB of code + weights has to be loaded into your RAM for you to use it.
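A rough way to sanity-check that number: weight memory is roughly parameter count times bits per weight. The bit width and overhead below are assumptions for illustration, not gpt-oss specifics.

# Rough RAM estimate for a 20B-parameter model, assuming ~4-bit quantized
# weights plus some headroom for activations and the runtime itself.
n_params = 20e9
bits_per_weight = 4.5          # assumed average precision after quantization
overhead_gb = 2.0              # assumed runtime + cache headroom

weights_gb = n_params * bits_per_weight / 8 / 1e9
print(f"weights ~{weights_gb:.1f} GB, total ~{weights_gb + overhead_gb:.1f} GB")
# -> weights ~11.2 GB, total ~13.2 GB: the same ballpark as the ~14GB figure above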
So, if your system has more than 24GB of RAM (14GB for the model plus room for other apps and the operating system), use Ollama to run gpt-oss on your machine.
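Once Ollama is installed and the model is pulled, you can talk to it from Python through its local REST API. A minimal sketch, assuming the server is on its default port 11434 and the model tag is gpt-oss:20b (adjust the tag if yours differs).

import requests

# Ask a locally running Ollama server a question via its REST API.
# Assumes the model has already been pulled (e.g. via `ollama pull`).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gpt-oss:20b",
        "prompt": "Explain model weights in one sentence.",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])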
To try larger models without a GPU, you can use a hosting provider like Groq. They maintain the infrastructure for you, while you can use the models via API calls.
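Groq exposes an OpenAI-compatible endpoint, so the standard openai Python client works against it. The model ID below is an assumption on my part; check Groq's model list for the current names, and set GROQ_API_KEY in your environment first.

import os
from openai import OpenAI

# Call a hosted open-weight model through Groq's OpenAI-compatible API.
# The model ID is an assumption; consult Groq's docs for the exact list.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

chat = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What makes a model open source?"}],
)
print(chat.choices[0].message.content)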
Happy to answer questions in the comments.
I tried running the Qwen 12B and osmosis-ai-mcp-4b models on my system (16GB RAM & a 3060 12GB GPU) to do some MCP work against a self-hosted Kanboard instance. I was able to run inference and pull information from Kanboard through a self-hosted MCP server, and it was wayyy slower than Claude Pro. But it was a fun project.