Topic

Cv

50 items across the graph — tagged with Cv.

From the graph · 50

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

shimat/opencvsharp

OpenCV wrapper for .NET

scverse/scanpy

Single-cell analysis in Python. Scales to >100M cells.

emgucv/emgucv

Emgu CV is a cross-platform .NET wrapper to the OpenCV image processing library.

TimefoldAI/timefold-solver

The open source Solver AI for Java and Kotlin to optimize scheduling and routing. Solve the vehicle routing problem, employee rostering, task assignment, mainte…

keras-team/keras-hub

Pretrained model hub for Keras 3.

mosecorg/mosec

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine.

scverse/anndata

Annotated data.

ROCm/MIVisionX

AMD MIVisionX is a computer vision toolkit built around a highly optimized, conformant open-source implementation of the Khronos OpenVX™ 1.3 specification. As o…

capsulerun/vpod

Lightweight, secure Linux sandboxes for untrusted processes.

samuelhm/42Jobs

AI-powered job search platform for junior software engineers: job fetching, smart filtering, keyword extraction, and ATS-optimized CV generation. Built with .NE…

Ruichen0424/ai-paper-explorer

AIPaperX: A one-stop search engine for top-tier AI conference & journal papers

uvaishmohd307-sketch/GestureX

AI-powered virtual mouse using hand gestures built with Python, OpenCV and MediaPipe.

DisciplineGen-1M: A Large-Scale Dataset for Multidisciplinary Visual Generation and Editing

Recent image generation and editing models can produce visually appealing natural images, yet they remain unreliable when the target image is a knowledge-intens…

HiRes: A Hierarchical Cascaded Method for Resistor Value Identification

Accurate identification of resistor values from unconstrained images remains a challenging computer vision task due to variations in lighting, orientation, scal…

ASTAD: Asymmetric Style Transfer for Synthetic-to-Real Adaptation in Autonomous Driving

Synthetic data mitigates the data scarcity problem in autonomous driving perception. However, the synthetic-to-real gap leads to performance degradation, hinder…

Your Data Manifold is Secretly a Reward Model: Shell-LCC for Text-to-Video Generation

Recent text-to-video (T2V) diffusion models rely heavily on auxiliary reward signals (e.g., via reward models or DPO) to align generated content with human aest…

Efficient PEFT Methods with Adaptive Checkpointing for Vision Models and VLMs on Resource Constrained Consumer-GPUs

Modern pretrained vision models achieve strong accuracy but demand substantial GPU memory for fine-tuning, making edge deployment impractical. This paper compar…

High-dimensional Embedding Prior for Noisy K-space Domain MRI Reconstruction

Magnetic resonance imaging (MRI) reconstruction under realistic acquisition conditions can be fundamentally viewed as estimating the underlying k-space distribu…

AnyBokeh: Physics-Guided Any-to-Any Bokeh Editing with Optical Fingerprint Transfer

Depth-of-field control is a fundamental tool in photography, yet post-capture bokeh editing from a single image remains challenging. A practical editor should h…

FR-DETR: Frequency and Recurrent Feature Refinement for Robust Object Detection under Adverse Weather

Object detection under adverse weather remains challenging due to severe visual degradations and domain shifts. Existing enhancer-based approaches attempt to im…

Perceive-to-Reason: Decoupling Perception and Reasoning for Fine-Grained Visual Reasoning

Fine-grained visual reasoning remains challenging for vision-language models, especially when small but critical visual cues are buried in high-resolution image…

PS-MOT: Cultivating Instance Awareness from Point Seeds for Multi-Object Tracking

We introduce Point-supervised Multi-Object Tracking (PS-MOT) as a cost-effective alternative to traditional bounding box supervision, shifting the focus from sp…

Hyper-Network Neural Functional Maps for Unsupervised Robust 3D Shape Matching

Functional maps are the cornerstone of recent non-rigid 3D shape matching methods due to their efficiency and performance. However, existing methods struggle wi…

CPDDNet: Color-Polarization Denoising and Demosaicking Network

Color-polarization imaging using a color-polarization filter array (CPFA) sensor captures both texture (color intensity) and physical (polarization) information…

Real-Time Visual Intelligence on Low-Cost UAVs: A Modular Approach for Tracking, Scanning, and Navigation

Autonomous drones are rapidly transforming modern warfare and civil applications alike. This paper presents the development of an integrated intelligent drone s…

RESOLVE: A Multi-Resolution and Multi-Modal Dataset for Roadside Cooperative Perception

LiDAR has increasingly been integrated into traffic cameras to expand coverage and mitigate occlusion in roadside cooperative perception. However, how unimodal…

Automated Background Swapping for Robustness against Spurious Backgrounds

Classifiers based on Deep Neural Networks exhibit strong performance across domains, yet can fail catastrophically if they rely on spurious correlations, i.e.,…

Seek to Segment: Active Perception for Panoramic Referring Segmentation

Existing referring segmentation models passively process static images captured from fixed perspectives, limiting their applicability in Embodied AI, where agen…

3D Scene-Adaptive Trajectory-Controllable Human Image Animation with Camera Movement

Human image animation, which aims to generate a video of a reference subject following a provided action sequence, has received increasing research interest. Wi…

W4A4 Quantization for Inference on Wan2.2-I2V-A14B

We summarize our submission to Sub-Challenge 1: W4A4 Quantization for Inference (HiF4 / MXFP4) of the ICME 2026 Low-Bit-width Large-Model Quantization Challenge…

PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception

We introduce PerceptionRubrics, a rubric-based evaluation framework that addresses the gap between saturated benchmark scores and real-world brittleness. Shifti…

FLORA: A deep learning approach to predict forest attributes from heterogeneous LiDAR data

Forest attributes are essential for national-scale resource monitoring. Airborne LiDAR metrics are among the auxiliary variables most strongly correlated with f…

Towards Metric-Agnostic Trajectory Forecasting

Accurate trajectory forecasting of surrounding traffic participants is a core capability for autonomous driving, enabling vehicles to anticipate behavior and pl…

QuaMoE-DRF: Proactive Beam and Rate Adaptation via Multimodal Dynamic Radio Map Forecasting in ISAC Networks

Static radio maps provide location-dependent propagation priors, but they cannot capture short-term blockage caused by moving objects. Direct sensing-assisted b…

GaussianEmoTalker: Real-Time Emotional Talking Head Synthesis with Audio-Driven and Blendshape-Based 3D Gaussian Splatting

Audio-driven talking head synthesis has achieved impressive progress in lip synchronization and visual quality, yet generating expressive emotional avatars with…

SuperFlex: Deformable Superquadrics for Point Cloud Decomposition

Superquadrics have proven to provide a compact, geometrically meaningful representation for 3D objects. However, existing methods suffer from limited reconstruc…

D$^{2}$R$^{2}$OSR: Degradation-Disentangled Representation for Real-World Omnidirectional Image Super-Resolution

With the growing demand for immersive visual experiences, high-quality omnidirectional images (ODIs) have become increasingly important. However, limitations in…

RayPE: Ray-Space Positional Encoding for 3D-Aware Video Generation

Modern video diffusion transformers position their tokens through RoPE on the (u,v,t) axes -- a description of the camera's sampling grid that says nothing abou…

Learning to Evolve Scenes: Reasoning about Human Activities with Scene Graphs

Understanding human behavior while interacting with the surrounding world is crucial for many applications of embodied AI. First-person videos are particularly…

Exact and Deterministic Patch Descriptor Retrieval via Hierarchical Normalization

We present a patch descriptor retrieval method that returns the exact nearest neighbour -- provably identical to exhaustive full-vector search -- while evaluati…

Learning from Reliable Latent Prompts for Visual Recognition with Missing Modalities

Large-scale multimodal models (LMMs) have achieved superior performance in visual recognition by synergizing information across diverse, massive-scale paired mo…

GMO-E$^2$DIT: Grounded Multi-Operation Editing for E-Commerce Images

Real-world e-commerce image editing often requires multiple, localized, and auditable operations rather than global restyling. This compositional nature poses a…

MVP-Nav: Multi-layer Value Map Planner Navigator

Zero-shot Object Goal Navigation (ZSON) with RGB-only perception poses a fundamental challenge for embodied agents, as the absence of explicit depth information…

MirrorPPR: Exemplar-Based Portrait Photo Retouching

While text-guided image editing has made remarkable progress, it remains limited in structural portrait retouching. Textual descriptions struggle to convey fine…

Sculpting NeRF Geometry: Human-Preference Fine-Tuning of a 3D-Aware Face GAN

Reinforcement learning from human feedback (RLHF) for 3D generation is now established across a number of works, but most existing pipelines optimise explicit s…

RSICCLLM: A Multimodal Large Language Model for Remote Sensing Image Change Captioning

Remote Sensing Image Change Captioning (RSICC) aims to describe changes between bi-temporal remote sensing images and holds significant research and application…

SatSplatDiff: Geometry-preserving generative refinement for high-fidelity satellite Gaussian Splatting

Gaussian Splatting has been recently explored for satellite 3D reconstruction, demonstrating flexibility and efficiency in representing radiometrically diverse…

No Place to Hide: Benchmarking Video Hallucination with Background-Controlled Pairs

We introduce VidPair-Halluc, a new benchmark for evaluating video hallucination in large video models (LVMs) under rigorous and controlled conditions. Unlike pr…

Ink3D: Sculpting 3D Assets with Extremely Complex Textures via Video Generative Models

Recent 3D generative models can synthesize high-quality geometry but often struggle to reproduce intricate textures from reference images, largely due to the sc…

Related topics

cs.CV 37 · machine-learning 9 · python 5 · computer-vision 5 · deep-learning 4 · llm 3 · cv 3 · ai 3 · dotnet 3 · artificial-intelligence 3 · jax 2 · bioinformatics 2
