Much like other leading tech giants, Meta has developed its own prominent generative AI model, known as Llama. Llama stands out among top-tier models due to its “open” nature, allowing developers to freely download and utilize it within certain boundaries. This is different from models such as Anthropic’s Claude, Google’s Gemini, xAI’s Grok, and most versions of OpenAI’s ChatGPT, which are only accessible through APIs.
To provide developers with more flexibility, Meta has teamed up with providers like AWS, Google Cloud, and Microsoft Azure to offer cloud-based versions of Llama. The company also releases a variety of resources in its Llama cookbook, including tools, libraries, and guides, to assist developers in customizing, testing, and adapting the models for their specific needs. With the introduction of newer versions such as Llama 3 and Llama 4, these features now include built-in multimodal capabilities and expanded cloud availability.
This guide covers all the essentials about Meta’s Llama, including its features, different versions, and where it can be accessed. We’ll update this article as Meta rolls out new updates and developer tools for the model.
What is Llama?
Llama isn’t a single model but a collection of models. The most recent release is Llama 4, which debuted in April 2025 and consists of three models:
- Scout: 17 billion active parameters, 109 billion total parameters, and a context window of 10 million tokens.
- Maverick: 17 billion active parameters, 400 billion total parameters, and a context window of 1 million tokens.
- Behemoth: Not yet available, but will feature 288 billion active parameters and 2 trillion total parameters.
(In data science, tokens are segments of raw data, such as the syllables “fan,” “tas,” and “tic” in the word “fantastic.”)
A model’s context window refers to the amount of input (like text) it considers before generating output. Having a large context window helps the model retain information from recent documents and data, reducing the chance of losing track or making off-topic predictions. However, longer context windows can sometimes cause the model to overlook certain safety measures and generate responses that align too closely with ongoing conversations, which has occasionally led to users developing unrealistic beliefs.
For comparison, Llama 4 Scout’s 10 million token context window is roughly equivalent to the content of 80 average-length novels. Llama 4 Maverick’s 1 million token window is about the size of eight novels.
According to Meta, all Llama 4 models were trained on vast amounts of unlabeled text, images, and videos to provide them with broad visual comprehension, and they support 200 languages.
Scout and Maverick are Meta’s first open-weight models with native multimodal support. They utilize a “mixture-of-experts” (MoE) architecture, which helps reduce computational demands and boosts efficiency during training and inference. Scout employs 16 experts, while Maverick uses 128.
Behemoth, which also uses 16 experts, is described by Meta as a “teacher” model for the smaller variants.
Llama 4 builds upon the Llama 3 series, which included versions 3.1 and 3.2, both widely adopted for instruction-tuned tasks and cloud-based deployments.
What can Llama do?
Llama, like other generative AI models, can handle a variety of tasks, such as coding, solving basic math problems, and summarizing documents in at least 12 languages (including Arabic, English, German, French, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese). It is well-suited for text-heavy workloads, such as analyzing large files like PDFs and spreadsheets, and all Llama 4 models can process text, images, and video inputs.
Scout is optimized for extended workflows and large-scale data analysis. Maverick is a versatile model that balances reasoning ability and response speed, making it ideal for programming, chatbots, and technical assistants. Behemoth is intended for advanced research, model distillation, and STEM-related tasks.
Llama models, including Llama 3.1, can be set up to use third-party apps, tools, and APIs for various tasks. They are trained to use Brave Search for up-to-date information, the Wolfram Alpha API for math and science questions, and a Python interpreter for code validation. However, these integrations need to be configured and are not enabled by default.
Where can I use Llama?
If you want to interact with Llama directly, it powers the Meta AI chatbot on Facebook Messenger, WhatsApp, Instagram, Oculus, and Meta.ai across 40 countries. Customized versions of Llama are used in Meta AI features in more than 200 countries and regions.
Scout and Maverick from Llama 4 can be accessed via Llama.com and through Meta’s partners, such as Hugging Face. Behemoth is still under development. Developers can download, use, or fine-tune Llama models on most major cloud platforms. Meta reports that over 25 partners, including Nvidia, Databricks, Groq, Dell, and Snowflake, host Llama. While Meta does not primarily profit from selling access to these models, it does earn revenue through sharing agreements with hosting partners.
Some partners have created additional tools and services for Llama, such as solutions that allow the models to reference proprietary data and operate with lower latency.
It’s important to note that the Llama license restricts how the model can be deployed: developers whose apps have more than 700 million monthly users must obtain a special license from Meta, which is granted at the company’s discretion.
In May 2025, Meta introduced a new initiative to encourage startups to use its Llama models. The Llama for Startups program offers companies support from Meta’s Llama team and potential funding opportunities.
What tools does Meta offer for Llama?
Meta also provides a suite of tools to enhance the safety of using Llama:
- Llama Guard, a moderation system.
- Prompt Guard, which helps defend against prompt injection attacks.
- CyberSecEval, a set of cybersecurity risk assessment benchmarks.
- Llama Firewall, a security feature designed to help build secure AI systems.
- Code Shield, which supports filtering out unsafe code generated by LLMs during inference.
Llama Guard is designed to identify potentially harmful content either input into or generated by a Llama model, including material related to crime, child exploitation, copyright infringement, hate speech, self-harm, and sexual abuse. However, it is not foolproof; previous Meta guidelines allowed the chatbot to engage in romantic or sensual conversations with minors, and some reports indicated these escalated to sexual discussions. Developers can adjust which categories of content are blocked and apply these restrictions across all supported languages.
Prompt Guard, like Llama Guard, can block certain inputs, but it specifically targets prompts intended to manipulate the model into undesirable behavior. Meta claims Llama Guard can stop both overtly malicious prompts (such as jailbreak attempts to bypass safety filters) and those containing “injected inputs.” Llama Firewall is designed to detect and prevent risks like prompt injection, unsafe code, and dangerous tool interactions. Code Shield helps reduce the risk of insecure code suggestions and provides secure execution for seven programming languages.
CyberSecEval is more of a benchmarking suite than a tool, used to evaluate the security risks posed by Llama models (according to Meta’s standards) to developers and end users, in areas such as “automated social engineering” and “scaling offensive cyber operations.”
Llama’s limitations
Image Credits:Artificial Analysis
Like all generative AI systems, Llama has its own set of risks and shortcomings. For instance, although the latest version supports multimodal input, this functionality is currently mostly limited to English.
On a broader scale, Meta used a dataset that included pirated e-books and articles to train Llama. A federal court recently ruled in Meta’s favor in a copyright case brought by 13 authors, determining that using copyrighted material for training qualified as “fair use.” Still, if Llama reproduces a copyrighted passage and it ends up in a product, there could be potential copyright infringement issues.
Meta has also drawn criticism for training its AI on content from Instagram and Facebook, including posts, photos, and captions, while making it challenging for users to opt out.
Programming is another area where caution is advised when using Llama. Compared to some other generative AI models, Llama is more likely to generate faulty or insecure code. On the LiveCodeBench benchmark, which evaluates AI on competitive programming tasks, Llama 4 Maverick scored 40%, whereas OpenAI’s GPT-5 high scored 85% and xAI’s Grok 4 Fast achieved 83%.
It’s always recommended to have a human expert review any code produced by AI before integrating it into software or services.
Finally, like other AI models, Llama can still generate convincing but inaccurate or misleading information, whether it’s related to programming, legal advice, or emotional conversations with AI personas.
This article was first published on September 8, 2024 and is regularly updated with the latest information.
