News · NVIDIA Blog · Trust 88 · Lab · Published 3d ago · Live 3d ago
Into the Omniverse: Three Workflows for Improving Vision AI Agent Accuracy With Synthetic Data and Fine-Tuning
Editor’s note: This post is part of Into the Omniverse, a series focused on how developers, 3D practitioners, and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse. Vision AI agents are becoming a practical way to automatically turn video data from the physical world into operational intelligence in factories, […]
Covers
paper: HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration
paper: Dynamo: Dynamic Skill-Tool Evolution for Vision-Language Agents
paper: Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
repo: mrreviewai/ai-tools-for-content-creators
repo: krushna081/chakravyuh-ai
Covers (incoming)
paper: Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing
paper: UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image
paper: VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes
paper: Domain Adaptation with Adaptive Imagination for Visual Reinforcement Learning under Limited Target Data
paper: Articulating then Matching: Zero-Shot Shape Matching for Uncurated Data
paper: ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving
paper: DriveWeaver: Point-Conditioned Video Inpainting for Controllable Vehicle Insertion in Autonomous Driving Simulation
paper: MVP-Nav: Multi-layer Value Map Planner Navigator
paper: No Place to Hide: Benchmarking Video Hallucination with Background-Controlled Pairs
paper: FlexViT: A Flexible FPGA-based Accelerator for Edge Vision Transformers
paper: DPPE: Rethinking Camera-Based Positional Encoding for Scaling Multi-View Transformers
paper: Preserve the Hard, Regenerate the Rest: Uncertainty-Guided Synthetic Training Data Augmentation with Diffusion Models
paper: TreeAgent: A Generalizable Multi-Agent Framework for Automated Bias Labeling in Forestry via Compiled Expert Rules and Vision-Language Models
paper: Dataset Biases and Shortcut Learning in Motion-Based AI-Generated Video Detection
paper: GenAU: Language-Grounded Industrial Anomaly Understanding with Vision-Language Models
paper: Understanding How Humans Inject Knowledge into Machine Learning Workflows through Visual Analytics
paper: Structured 4D Latent Predictive Model for Robot Planning
paper: Ink3D: Sculpting 3D Assets with Extremely Complex Textures via Video Generative Models
repo: Unpr3dictable/neurizon.ai
repo: NVIDIA-NeMo/DataDesigner
repo: omnigent-ai/omnigent
repo: roboflow/inference
repo: open-edge-platform/geti
repo: DoubangoTelecom/compv
repo: pixeltable/pixeltable
repo: voxel51/fiftyone
repo: Eventual-Inc/Daft
paper: Real-Time Visual Intelligence on Low-Cost UAVs: A Modular Approach for Tracking, Scanning, and Navigation
paper: Search-based Testing of Vision Language Models for In-Car Scene Understanding
paper: Seek to Segment: Active Perception for Panoramic Referring Segmentation
repo: isl-org/Open3D
repo: pytorch/vision
repo: kornia/kornia
repo: Somnusochi/VLM-AutoYOLO
repo: JosephOIbrahim/Comfy-Cozy
