Qwen 3: Powerful Open-Source Hybrid LLM Beats DeepSeek R1 in Benchmarks
Alibaba's new Qwen 3 series, including 235B and 30B parameter models, outperforms top models in coding, math, and reasoning tasks. Optimized for efficiency and global applications.
April 30, 2025

Discover the power of Qwen 3, a groundbreaking open-source hybrid LLM that outperforms top models like DeepSeek R1 across a range of benchmarks. This efficient and versatile AI solution is poised to change the way you approach tasks, from coding and mathematics to general reasoning and problem-solving.
Comparison to Top Tier Models
The Lightweight Qwen 3 Model
Qwen 3's Architectural Innovations
Testing the Qwen 3 Model
Developing a Front-End App
Implementing Conway's Game of Life
Generating SVG Butterfly
Solving a Math Word Problem
Creating a TV Channel Simulator
Summarizing a Climate Research Article
Solving a Logic Puzzle
Conclusion
Comparison to Top Tier Models
The Qwen 3 series, particularly the flagship 235 billion parameter model, rivals top-tier models like DeepSeek R1, Grok 3, Gemini 2.5 Pro, and OpenAI's models. The 235 billion parameter Qwen 3 model outcompetes these other models in almost every benchmark, including coding, mathematics, and general reasoning.
Surprisingly, the lightweight 30 billion parameter Qwen 3 model also performs quite well compared to models like GPT-4o, Gemma 3, and DeepSeek V3, as well as Alibaba's previous models. This lightweight version could be a more practical choice for local deployment due to its smaller size.
The Qwen 3 series uses a mixture-of-experts architecture with only around 10% of parameters active per token, which drastically reduces inference and training costs. Alibaba has also introduced a new "hybrid thinking mode" that lets users switch between step-by-step reasoning and instant answers based on task complexity and budget.
Overall, the Qwen 3 series delivers a massive efficiency gain, positioning it as a major breakthrough for fast, scalable AI deployment. Its open-source release and range of model sizes make Qwen 3 an attractive alternative to other top-tier models on the market.
The Lightweight Qwen 3 Model
The Qwen 3 series includes a lightweight 30 billion parameter model with just 3 billion active parameters. This model is a compelling option for those looking to use a high-performance AI model locally, as its smaller size makes it more accessible and efficient.
Despite its relatively lightweight nature, the 30 billion parameter Qwen 3 model performs impressively, outcompeting models like GPT-4o, Gemma 3, and DeepSeek V3 on a variety of benchmarks, including coding, mathematics, and general reasoning tasks. This is largely due to the model's mixture-of-experts architecture, which keeps only about 10% of the parameters active during inference, drastically reducing computational costs.
The Qwen 3 model also introduces a new "hybrid thinking mode" that allows users to switch between step-by-step reasoning and instant answers, depending on the task complexity and available resources. This flexibility makes the model well-suited for a wide range of applications.
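As a minimal sketch of how this switch can be driven in practice, the snippet below toggles thinking mode through Hugging Face transformers. The repository name and the `enable_thinking` flag come from the Qwen 3 model card documentation rather than from this article, so treat the exact details as assumptions.

```python
# Sketch of switching Qwen 3's hybrid thinking mode on or off with Hugging Face
# transformers. The model name and the enable_thinking flag are assumptions
# based on the Qwen 3 model card documentation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"  # the lightweight 30B MoE variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 50?"}]

# enable_thinking=True adds a step-by-step reasoning phase before the answer;
# set it to False for a direct, lower-latency reply.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```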
Overall, the lightweight Qwen 3 model represents a significant advancement in the field of efficient and high-performing AI models, offering a compelling alternative to larger, more resource-intensive options.
Qwen 3's Architectural Innovations
The Qwen 3 series from Alibaba showcases several architectural innovations that position it as a major breakthrough in efficient and scalable AI deployment:
- Mixture of Experts Architecture: Qwen 3 uses a mixture-of-experts architecture in which only about 10% of the parameters are active during inference. This drastically reduces computational cost and enables faster inference (a toy routing sketch follows this list).
- Hybrid Reasoning Mode: Qwen 3 introduces a new "hybrid thinking mode" that allows users to switch between step-by-step reasoning and instant answers, depending on task complexity and resource budget.
- Enhanced Multilingual Capabilities: The model supports 119 languages, thanks to pre-training on 36 trillion tokens, twice the amount used for Qwen 2.5.
- Improved Coding and Agentic Capabilities: Qwen 3 demonstrates enhanced coding abilities and can perform various computer-based tasks, including tool use and function calling.
- Efficiency Gains: The model's architectural design and optimization techniques deliver significant efficiency gains, making it well suited for fast and scalable AI deployment.
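To make the "only a fraction of parameters active per token" idea concrete, here is a toy top-k routing sketch in plain NumPy. It illustrates the general mixture-of-experts pattern under simplified assumptions (8 experts, 2 active per token) and is not Qwen 3's actual routing code.

```python
# Toy top-k mixture-of-experts routing: each token is sent to only k of the
# n_experts feed-forward blocks, so most expert parameters stay inactive per token.
# This is a generic illustration, not Qwen 3's implementation.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2           # 2 of 8 experts active per token
W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-k experts only."""
    logits = x @ W_router                       # router score for each expert
    top = np.argsort(logits)[-top_k:]           # indices of the k best-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    # Only the selected experts' parameters are used for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)                   # (64,)
```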
Overall, the Qwen 3 series represents a major advancement in the field of large language models, combining impressive performance with increased efficiency and versatility.
Testing the Qwen 3 Model
The Qwen 3 series from Alibaba is a remarkable set of open-source mixture-of-experts models. The flagship Qwen 3 model with 235 billion parameters rivals top-tier models like DeepSeek R1, Grok 3, Gemini 2.5 Pro, and OpenAI's models, outperforming them on various benchmarks.
The lightweight 30 billion parameter Qwen 3 model also performs impressively, competing well against models like GPT-4o, Gemma 3, and DeepSeek V3. The Qwen 3 series uses a mixture-of-experts architecture with only about 10% of parameters active, significantly reducing inference and training costs.
The models showcase impressive capabilities, from software engineering tasks like developing a front-end application and implementing Conway's Game of Life, to mathematical problem-solving and logical reasoning. The models also demonstrate strong reading comprehension and summarization abilities when tested on a research article about climate modeling.
Overall, the Qwen 3 series is a significant breakthrough in the open-source AI model space, offering efficient and high-performing alternatives to the industry's leading models. The models' open-source nature and availability of dense variants make them accessible for local deployment and experimentation, positioning them as a game-changer in the AI landscape.
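If you want to experiment locally, a minimal loading sketch with the Hugging Face transformers pipeline might look like the following. The repository name "Qwen/Qwen3-8B" (one of the dense variants) is an assumption based on how the Qwen family is typically published on the Hub; substitute whichever size fits your hardware.

```python
# Minimal local-experimentation sketch using a dense Qwen 3 variant through the
# transformers pipeline API. The repo id "Qwen/Qwen3-8B" is assumed from how the
# Qwen family is published on the Hugging Face Hub; adjust it to your hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen3-8B",   # a dense variant small enough for a single consumer GPU
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Explain in two sentences why mixture-of-experts models are cheap to run."
out = generator(prompt, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"])
```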
Developing a Front-End App
The Qwen 3 model successfully generated the code for a simple note-taking web application with the ability to add sticky notes. The application features a clean, modern user interface that lets users add titles and notes. The model also made the sticky notes draggable, enhancing the interactivity of the application.
The generated code demonstrates the model's capability in designing web application structures, implementing interactive components, and handling user interactions. The model was able to create a responsive and visually appealing front-end application, showcasing its proficiency in software engineering tasks that involve front-end development.
Overall, the Qwen 3 model performed well in this task, delivering a functional and visually pleasing note-taking application that meets the requirements specified in the prompt.
Implementing Conway's Game of Life
The model was able to generate an implementation of Conway's Game of Life that runs in the terminal. This task tests the model's ability to implement a cellular automaton, checking its grid-manipulation logic and demonstrating its software engineering capabilities.
The generated code includes the necessary functions to initialize the game grid, apply the rules of the Game of Life, and simulate the evolution of the cells over time. The model was able to handle the algorithmic complexity of the problem and produce a working implementation that can be executed and observed.
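For context, a minimal terminal version of the mechanics described above could look like the sketch below; it is an illustration of the task itself, not the code the model actually produced.

```python
# Minimal terminal Game of Life: random grid, toroidal wrapping, printed each step.
# An illustrative sketch of the task described above, not the model's output.
import os
import random
import time

ROWS, COLS, STEPS = 20, 40, 100

def random_grid():
    return [[random.random() < 0.25 for _ in range(COLS)] for _ in range(ROWS)]

def neighbours(grid, r, c):
    # Count live neighbours with wrap-around edges.
    return sum(
        grid[(r + dr) % ROWS][(c + dc) % COLS]
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )

def step(grid):
    # A dead cell with 3 neighbours is born; a live cell with 2 or 3 survives.
    return [
        [
            neighbours(grid, r, c) == 3 or (cell and neighbours(grid, r, c) == 2)
            for c, cell in enumerate(row)
        ]
        for r, row in enumerate(grid)
    ]

grid = random_grid()
for _ in range(STEPS):
    os.system("cls" if os.name == "nt" else "clear")
    print("\n".join("".join("█" if cell else " " for cell in row) for row in grid))
    grid = step(grid)
    time.sleep(0.1)
```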
Overall, the model performed well on this software engineering task, showcasing its proficiency in implementing a classic computer science problem that involves logical reasoning and programming skills.
Generating SVG Butterfly
The model was unable to generate correct SVG code for the shape of a butterfly. The output resembled a Pokemon-like creature rather than a butterfly. While the model captured some elements, such as the antennae, the overall shape and structure did not match the expected butterfly design. This appears to be one of the more challenging prompts, as generating accurate SVG code for a complex visual shape requires visual reasoning capabilities the model currently lacks.
Solving a Math Word Problem
The model was able to solve the math word problem involving the relative motion of two trains. It provided the correct answer of 1:12 PM as the time when the two trains meet, and also showed the step-by-step process to arrive at the solution.
The problem involved calculating the time when a train leaving city A at 9:00 AM traveling at 60 km/h and another train leaving city B at 11:00 AM traveling towards city A at 90 km/h would meet, given the total distance between the cities is 450 km.
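For reference, the arithmetic behind that answer can be checked in a few lines:

```python
# Quick check of the train meeting time: A departs 9:00 AM at 60 km/h,
# B departs 11:00 AM at 90 km/h toward A, and the cities are 450 km apart.
head_start = 60 * 2                          # km covered by train A before 11:00 -> 120 km
remaining = 450 - head_start                 # gap left when train B departs -> 330 km
closing_speed = 60 + 90                      # km/h at which the gap shrinks -> 150 km/h
hours_after_11 = remaining / closing_speed   # 2.2 h = 2 h 12 min
print(hours_after_11)                        # 2.2 -> the trains meet at 1:12 PM
```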
The model demonstrated its capability in performing multi-step calculations, applying the distance formula (speed × time), and logically reasoning through the problem to arrive at the final answer. This showcases the model's strong mathematical and problem-solving abilities.
Creating a TV Channel Simulator
The Qwen 3 model was able to generate a TV channel simulator with channels mapped to the number keys 0 through 9. The simulation showcases the model's abilities in creative programming, handling keyboard input, and implementing array mapping and canvas masking with the p5.js library.
The generated code creates a TV-like interface with different channels that can be accessed by pressing the corresponding number keys on the keyboard. Each channel displays a unique animation or visual representation, demonstrating the model's creativity and programming skills.
The simulation handles the array mapping of the channel numbers, ensuring that the correct channel is displayed when the corresponding key is pressed. Additionally, the code implements canvas masking to create a TV-like frame around the channel content, further enhancing the visual experience.
Overall, the Qwen 3 model successfully completed this software engineering task, showcasing its capabilities in creative programming, input handling, and visual representation.
Summarizing a Climate Research Article
The model was able to effectively read and summarize the key points from the provided climate research article. It demonstrated the ability to:
- Synthesize information from multiple sections of the article
- Identify and integrate the main ideas and findings
- Infer cause-effect relationships within the content
- Provide a concise summary addressing the specific questions asked
The summary highlights the model's strong reading comprehension and reasoning capabilities, allowing it to extract and convey the essential insights from the complex climate modeling research. This showcases the model's proficiency in handling academic and scientific content, a valuable skill for various applications.
Solving a Logic Puzzle
The model did a great job in terms of logical reasoning with this puzzle. It provided a quick summary table of the step-by-step analysis to find the guilty person, and it correctly identified David as the guilty party.
The model demonstrated its ability to effectively check the statements, assume each person is telling the truth, and find the consistency to arrive at the right conclusion. This showcases the model's strong deductive reasoning capabilities.
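The article does not reproduce the puzzle's actual statements, so the sketch below uses hypothetical suspects and claims purely to illustrate the assume-a-culprit-and-check-consistency strategy described above; the placeholder statements are deliberately chosen so that, as in the article's puzzle, David is the only consistent culprit.

```python
# Generic brute-force consistency check for a "who is guilty?" puzzle.
# The suspects and statements below are HYPOTHETICAL placeholders (the article
# does not quote the real puzzle); they only illustrate the strategy of assuming
# each candidate is guilty and testing every statement for consistency.
suspects = ["Alice", "Bob", "Carol", "David"]

# Each statement is a predicate over the assumed culprit.
statements = {
    "Alice": lambda guilty: guilty != "Alice",  # "I didn't do it."
    "Bob":   lambda guilty: guilty == "David",  # "David did it."
    "Carol": lambda guilty: guilty != "Bob",    # "Bob is innocent."
    "David": lambda guilty: guilty != "David",  # "I didn't do it."
}

# Puzzle rule assumed here: the guilty person lies, everyone else tells the truth.
for candidate in suspects:
    consistent = all(
        tells_truth(candidate) != (speaker == candidate)
        for speaker, tells_truth in statements.items()
    )
    if consistent:
        print(f"Only consistent culprit: {candidate}")  # prints David
```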
Overall, the model's performance on this logical puzzle is impressive, as it was able to navigate the complex web of statements and deduce the correct solution. This is a testament to the model's robust reasoning skills and its capacity to tackle challenging problems that require logical thinking and problem-solving.
Conclusion
Qwen 3 matches top models like DeepSeek R1 and o3 in multiple categories, including math, coding, and reasoning, and it does so with far fewer active parameters. This is thanks to its mixture-of-experts architecture, which makes it fast and efficient, and its hybrid design covers both instant answers and deep reasoning.
It's open source with open weights, which means you can run the various dense models on your own machine. Obviously not the largest model, but there are plenty of options for running it on your phone as well as your local computer. It's a game-changer, especially on fast inference hardware like GPUs.
Overall, it's a huge step forward for the open-source AI model space. The techniques used to develop this model will likely carry over to other models, which is great news for people building AI systems. I definitely think you should try it out locally; it's a strong open-source alternative to many of the other models we've seen, like o3, o1, and DeepSeek R1.