Navigating the physical world has proven to be much more challenging than many followers of the autonomous vehicle (AV) industry expected. Most current AV development is modular, with separate components for object detection, tracking, trajectory prediction, path planning, and control, but this approach can fall short in real-world scenarios, according to Nvidia. So company researchers built Hydra-MDP, an end-to-end driving system that performs accurate perception and robust decision-making under a single unified transformer model.
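
To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two designs. Every function name below is a hypothetical stand-in chosen for this article, not Nvidia code.

```python
# Hypothetical sketch contrasting a modular AV stack with an end-to-end one.

def detect(sensors):  return ["car@(12m, 3m)"]                        # object detection
def track(objs):      return [{"id": i, "obj": o} for i, o in enumerate(objs)]
def predict(tracks):  return [{"track": t, "future": "straight"} for t in tracks]
def plan(futures):    return "keep lane, 25 m/s"                      # path planning
def control(path):    return {"steer": 0.0, "throttle": 0.3}          # low-level commands

def modular_drive(sensors):
    # Each stage is trained and tuned separately; errors can compound
    # across the hand-designed interfaces between stages.
    return control(plan(predict(track(detect(sensors)))))

def end_to_end_drive(sensors, model):
    # A single learned model (e.g., a unified transformer) maps raw
    # sensor data directly to a driving trajectory.
    return model(sensors)

print(modular_drive({"camera": "...", "lidar": "..."}))
```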

Today, the Nvidia technology was named an Autonomous Grand Challenge winner at the Computer Vision and Pattern Recognition (CVPR) conference organized by the IEEE Computer Society and Computer Vision Foundation and running this week in Seattle. Building on last year’s win in 3D occupancy prediction, Nvidia Research topped the leaderboard this year in the End-to-End Driving at Scale category with its Hydra-MDP model, outperforming more than 400 entries from around the world.

“This milestone shows the importance of generative AI in building applications for physical AI deployments in autonomous vehicle development,” wrote Danny Shapiro, Vice President, Automotive at Nvidia, in a blog post. “The technology can also be applied to industrial environments, healthcare, robotics and other areas.”

The winning submission also received CVPR’s Innovation Award, recognizing Nvidia’s approach to improving “any end-to-end driving model using learned open-loop proxy metrics.”

The race to develop self-driving cars involves three distinct yet crucial parts operating simultaneously: AI training, simulation, and autonomous driving, according to Shapiro. Each requires its own accelerated computing platform, and together the full-stack systems built for these steps enable continuous development cycles that steadily improve performance and safety.

To accomplish this, a model is first trained on an Nvidia DGX AI supercomputer. It’s then tested and validated in simulation—using the Nvidia Omniverse platform and running on an Nvidia OVX system—before entering the vehicle. Finally, the Nvidia Drive AGX platform processes sensor data through the model in real time.

This year’s CVPR challenge asked participants to develop an end-to-end AV model, trained on the nuPlan dataset, the world’s first large-scale planning benchmark for autonomous driving, that generates driving trajectories from sensor data. The models were submitted for testing inside the open-source NavSim simulation framework and were tasked with navigating thousands of scenarios they hadn’t previously encountered. Model performance was scored on metrics for safety, passenger comfort, and deviation from the original recorded trajectory.
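
The article doesn’t reproduce the challenge’s exact scoring formula, but composite driving metrics of this kind are often built by gating softer objectives behind hard safety checks. The sketch below is a hypothetical illustration with made-up weights, not the benchmark’s official metric.

```python
# Hypothetical composite driving score: hard safety requirements act as
# multiplicative gates, softer objectives are blended with example weights.

def composite_score(no_collision: bool, in_drivable_area: bool,
                    progress: float, comfort: float, imitation: float) -> float:
    """progress, comfort, imitation are assumed normalized to [0, 1]."""
    # Any at-fault collision or drivable-area violation zeroes the score.
    gate = float(no_collision) * float(in_drivable_area)
    # Illustrative weights for progress, passenger comfort, and closeness
    # to the recorded human trajectory.
    blended = 0.4 * progress + 0.3 * comfort + 0.3 * imitation
    return gate * blended

print(composite_score(True, True, progress=0.9, comfort=0.8, imitation=0.85))
```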

For the winning entry in CVPR’s challenge, Hydra-MDP ingests one second of vehicle trajectory history, along with camera and lidar data sampled at a reduced frame rate of just 2 frames per second, and generates the next four seconds of the optimal vehicle path as output. Nvidia says the simplified end-to-end architecture streamlines the pipeline, with less code and better performance.
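
The published details pin down only the input and output horizons, so the following PyTorch sketch just illustrates the tensor shapes involved. The layer sizes, token counts, and architecture are assumptions made for illustration, not Hydra-MDP’s actual design.

```python
# Shape-level sketch: 1 s of ego history plus sensor features at 2 frames/s
# in, a 4 s planned trajectory out. All sizes are hypothetical.

import torch
import torch.nn as nn

FPS = 2                  # reduced input frame rate (2 frames/s)
HIST_STEPS = 1 * FPS     # 1 s of trajectory history
PLAN_STEPS = 4 * FPS     # 4 s of planned future path
FEAT = 256               # assumed feature width

class TinyPlanner(nn.Module):
    def __init__(self):
        super().__init__()
        self.hist_proj = nn.Linear(3, FEAT)          # (x, y, heading) per step
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=FEAT, nhead=8, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(FEAT, PLAN_STEPS * 2)  # (x, y) waypoints

    def forward(self, history, sensor_tokens):
        # history: (B, HIST_STEPS, 3); sensor_tokens: (B, N, FEAT) stand in
        # for camera/lidar features from a real perception backbone.
        tokens = torch.cat([self.hist_proj(history), sensor_tokens], dim=1)
        fused = self.encoder(tokens).mean(dim=1)
        return self.head(fused).view(-1, PLAN_STEPS, 2)

plan = TinyPlanner()(torch.zeros(1, HIST_STEPS, 3), torch.zeros(1, 64, FEAT))
print(plan.shape)  # torch.Size([1, 8, 2]): 8 waypoints over 4 s at 2 Hz
```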

The model can learn from both real-world and simulated driving data. This makes it easier to handle rare corner cases and dangerous scenarios, and to mimic human driving more closely, providing a more comfortable and predictable experience.
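
As a rough illustration of the idea, one simple way to train on both sources is to concatenate real and simulated samples into a single training set. The datasets and mixing ratio below are placeholders, not Nvidia’s recipe.

```python
# Hypothetical sketch: mixing logged real drives with simulated rare
# scenarios in one supervised training loop.

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

real = TensorDataset(torch.randn(1000, 8), torch.randn(1000, 2))  # logged drives
sim = TensorDataset(torch.randn(200, 8), torch.randn(200, 2))     # rare scenarios

loader = DataLoader(ConcatDataset([real, sim]), batch_size=32, shuffle=True)
for features, target in loader:
    pass  # a standard imitation/planning update would go here
```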

Nvidia also ranked second for its submission to the CVPR Autonomous Grand Challenge for Driving with Language. Nvidia’s approach connects vision language models and autonomous driving systems, integrating the power of large language models to help make decisions and achieve generalizable, explainable driving behavior.
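
The article doesn’t detail that interface, but the general pattern is a vision language model producing a textual scene description that a language model turns into an explainable driving decision. The stub below is purely hypothetical and calls no real model or Nvidia API.

```python
# Purely illustrative stub of language-assisted driving decisions.

def vlm_describe(image) -> str:
    # A real system would run a vision language model on camera input.
    return "A pedestrian is waiting at the crosswalk ahead on the right."

def llm_decide(description: str, question: str) -> str:
    # A real system would query a large language model here.
    if "pedestrian" in description and "crosswalk" in description:
        return "Slow down and yield; the pedestrian may enter the crosswalk."
    return "Proceed at the current speed."

scene = vlm_describe(image=None)
print(llm_decide(scene, "What should the vehicle do next?"))
```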

In conjunction with CVPR, and building on the Autonomous Grand Challenge win, Nvidia announced a new set of software APIs (application programming interfaces) that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of any kind. The APIs build on the workflow the researchers used to win the competition, which can be replicated in high-fidelity simulated environments with Nvidia Omniverse Cloud Sensor RTX. This means AV simulation developers can recreate the workflow in a physically accurate environment before testing their AVs in the real world. Nvidia Omniverse Cloud Sensor RTX microservices will be available later this year.

For this year’s CVPR, more than 50 Nvidia papers were accepted on topics spanning automotive, healthcare, and robotics. Over a dozen cover Nvidia’s automotive-related research, including the Hydra-MDP paper on end-to-end multimodal planning with multi-target hydra distillation, which won the CVPR challenge, and a paper on producing and leveraging online map uncertainty in trajectory prediction, a CVPR best paper award finalist. Sanja Fidler, Vice President of AI Research at Nvidia, will also speak on vision language models at the CVPR Workshop on Autonomous Driving.