NVIDIA has announced Cosmos 3, which it describes as the world’s first open omni-model AI, designed to teach the rules of physics to robots and autonomous vehicles. Here are the details.
Physics-Based Reasoning: Cosmos 3 enables robotic systems and autonomous vehicles to understand the physical interactions, movements and spatio-temporal relationships around them before producing images or movements.
Fully Open Omni-Model: It is the first open-source, broad-based model that can natively generate text, scenes, images, ambient sounds and actions with high physical accuracy.
Flexible Use with Different Versions: “Cosmos 3 Super”, which has the highest fidelity, and the more compact “Cosmos 3 Nano” are available now, while “Cosmos 3 Edge”, which will perform real-time analysis on edge devices, is said to be arriving soon.
A New Era in Physical Artificial Intelligence: What Is NVIDIA Cosmos 3?
Artificial intelligence technologies have long been very successful at processing digital information. Things get harder, however, when autonomous vehicles have to move safely through streets or humanoid robots have to grasp objects correctly in homes and factories.
Insufficient training data and fragmented simulation environments slow down the development of artificial intelligence that perceives the physical world. NVIDIA Cosmos 3 was developed to fill this gap.
Cosmos 3 uses its AI architecture to work out how objects relate to one another, along with their orientation, scale and direction of movement. In other words, when an autonomous vehicle or robot plans its next move, it not only looks at the object in front of it, but also predicts how that object will behave under the laws of physics.
Reasoning and Generation Working Together
The special architecture that NVIDIA uses in this model brings together two different transformer structures: a reasoning transformer and an expert generation transformer. Thanks to this dual structure, Cosmos 3 analyzes the interactions of objects in depth before creating an image or a motion path.
For those unfamiliar with the term, transformers are deep learning networks that track relationships and context within sequential data. Because they can analyze the elements of a sequence in parallel instead of processing them one by one, they speed up processing considerably.
Cosmos 3 uses this parallel processing to fuse a robot’s ambient sounds, visual input and physical position into a single representation within milliseconds.
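To make the transformer idea described above more concrete, here is a minimal sketch of scaled dot-product self-attention, the general mechanism transformers use to relate every element of a sequence to every other element. It is a generic illustration only, not the Cosmos 3 architecture: the function names, the toy input vectors and the simplification of using the inputs directly as queries, keys and values (no learned weights) are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Simplifying assumption: no learned projection matrices are used.
    """
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        # Compare this element with every element of the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all input vectors.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(d)
        ])
    return outputs

# Hypothetical toy sequence of three 2-dimensional vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)
```

Each output row depends only on the original inputs, never on another output row, which is why real implementations can compute all rows at once as matrix operations instead of stepping through the sequence.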
In Which Areas Can Cosmos 3 Be Used?
NVIDIA states that this open omni-model will benefit the industry in three basic scenarios. First, the model functions as an advanced vision language model (VLM). In other words, it can describe the world it sees in human language or turn commands into visuals.
Second, it functions as a “World Model” that simulates physical environments and predicts future world states. This allows autonomous vehicles to anticipate what they may encounter after rounding a bend, or in which direction an object will be thrown if it falls. Finally, it provides a powerful base layer on which other developers can build their own custom world models.
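The notion of predicting future world states can be illustrated with a deliberately simple toy: a hand-written physics rollout that steps an object forward in time under gravity. This is only a sketch of the concept. A learned world model such as the one described here infers dynamics from data rather than from hard-coded equations, and the `State` class, the `step` and `rollout` functions and the starting values below are all hypothetical.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2

@dataclass
class State:
    x: float   # horizontal position (m)
    y: float   # height above the ground (m)
    vx: float  # horizontal velocity (m/s)
    vy: float  # vertical velocity (m/s)

def step(s, dt):
    """Advance the state by one time step (semi-implicit Euler, gravity only)."""
    vy = s.vy - G * dt
    return State(s.x + s.vx * dt, s.y + vy * dt, s.vx, vy)

def rollout(s, dt=0.01, max_steps=10_000):
    """Predict successive future states until the object reaches the ground."""
    trajectory = [s]
    for _ in range(max_steps):
        if s.y <= 0:
            break
        s = step(s, dt)
        trajectory.append(s)
    return trajectory

# Hypothetical scenario: an object slides off a 1 m ledge at 2 m/s.
start = State(x=0.0, y=1.0, vx=2.0, vy=0.0)
path = rollout(start)
landing = path[-1]
```

Here `landing.x` gives the predicted horizontal distance at which the object hits the ground, the kind of forward-looking estimate a planner can use before acting.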
Three Different Model Options Are Offered
NVIDIA offers the Cosmos 3 model in different packages to suit the needs of developers and technology companies. Cosmos 3 Super, which has the highest accuracy and quality, and Cosmos 3 Nano, designed for faster and lighter workloads, are available as of now.
The company is also preparing to launch the Cosmos 3 Edge version very soon. It will be able to run real-time inference on edge devices without the need for an internet connection, so robots in factories can make instant decisions directly on their own hardware.