NVIDIA Blog · Published 3d ago

Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning

Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

Vision AI agents are becoming a practical way to automatically turn video data from the physical world into operational intelligence in factories, […]

Covers

Paper: HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration
Paper: Dynamo: Dynamic Skill-Tool Evolution for Vision-Language Agents
Paper: Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
Repo: mrreviewai/ai-tools-for-content-creators
Repo: krushna081/chakravyuh-ai

Covers (incoming)

Paper: Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing
Paper: UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image
Paper: VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes
Paper: Domain Adaptation with Adaptive Imagination for Visual Reinforcement Learning under Limited Target Data
Paper: Articulating then Matching: Zero-Shot Shape Matching for Uncurated Data
Paper: ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving
Paper: DriveWeaver: Point-Conditioned Video Inpainting for Controllable Vehicle Insertion in Autonomous Driving Simulation
Paper: MVP-Nav: Multi-layer Value Map Planner Navigator
Paper: No Place to Hide: Benchmarking Video Hallucination with Background-Controlled Pairs
Paper: FlexViT: A Flexible FPGA-based Accelerator for Edge Vision Transformers
Paper: DPPE: Rethinking Camera-Based Positional Encoding for Scaling Multi-View Transformers
Paper: Preserve the Hard, Regenerate the Rest: Uncertainty-Guided Synthetic Training Data Augmentation with Diffusion Models
Paper: TreeAgent: A Generalizable Multi-Agent Framework for Automated Bias Labeling in Forestry via Compiled Expert Rules and Vision-Language Models
Paper: Dataset Biases and Shortcut Learning in Motion-Based AI-Generated Video Detection
Paper: GenAU: Language-Grounded Industrial Anomaly Understanding with Vision-Language Models
Paper: Understanding How Humans Inject Knowledge into Machine Learning Workflows through Visual Analytics
Paper: Structured 4D Latent Predictive Model for Robot Planning
Paper: Ink3D: Sculpting 3D Assets with Extremely Complex Textures via Video Generative Models
Repo: Unpr3dictable/neurizon.ai
Repo: NVIDIA-NeMo/DataDesigner
Repo: omnigent-ai/omnigent
Repo: roboflow/inference
Repo: open-edge-platform/geti
Repo: DoubangoTelecom/compv
Repo: pixeltable/pixeltable
Repo: voxel51/fiftyone
Repo: Eventual-Inc/Daft
Paper: Real-Time Visual Intelligence on Low-Cost UAVs: A Modular Approach for Tracking, Scanning, and Navigation
Paper: Search-based Testing of Vision Language Models for In-Car Scene Understanding
Paper: Seek to Segment: Active Perception for Panoramic Referring Segmentation
Repo: isl-org/Open3D
Repo: pytorch/vision
Repo: kornia/kornia
Repo: Somnusochi/VLM-AutoYOLO
Repo: JosephOIbrahim/Comfy-Cozy
