Multimodal Interaction

🌐 Language: English | 中文

This directory collects papers and code implementations on multimodal interaction in embodied AI.

Main Contents

Manually Added Papers

Date Title Paper Code Rating
2024-09 ReMEmbR: Retrieval-Enhanced Memory for Robot Reasoning and Navigation [pdf] NVIDIA-AI-IOT/remembr ⭐️⭐️⭐️
2024 Gesture-Based Control for Robotic Systems [pdf] ⚠️ ⭐️⭐️
2023 Natural Language Instructions for Robot Manipulation [pdf] example/lang_robot ⭐️⭐️⭐️

Automatically Updated Papers

Date Title Paper Code Rating
2025-06-25 [Multi-Agent] Personalized Mental State Evaluation in Human-Robot Interaction using Federated Learning [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-25 How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-24 The MOTIF Hand: A Robotic Hand for Multimodal Observations with Thermal, Inertial, and Force Sensors [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-23 TritonZ: A Remotely Operated Underwater Rover with Manipulator Arm for Exploration and Rescue Operations [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-23 [Multi-Agent] Situated Haptic Interaction: Exploring the Role of Context in Affective Perception of Robotic Touch [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-19 History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-19 On using AI for EEG-based BCI applications: problems, current challenges and future trends [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-18 Designing Intent: A Multimodal Framework for Human-Robot Cooperation in Industrial Workspaces [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-18 Vision in Action: Learning Active Perception from Human Demonstrations [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-18 I Know You’re Listening: Adaptive Voice for HRI [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-17 Design an Editable Speech-to-Sign-Language Transformer System: A Human-Centered AI Approach [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-16 Multimodal “Puppeteer”: An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-16 A Cooperative Contactless Object Transport with Acoustic Robots [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-13 Robot Context Protocol (RCP): A Runtime-Agnostic Interface for Agent-Aware Robot Control [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-13 Robotic System for Chemical Experiment Automation with Dual Demonstration of End-effector and Jig Operations [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-12 RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-12 Using Vision Language Models to Detect Students’ Academic Emotion through Facial Expressions [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-11 Integrating Quantized LLMs into Robotics Systems as Edge AI to Leverage their Natural Language Processing Capabilities [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-11 A Navigation Framework Utilizing Vision-Language Models [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-11 Test-Time Adaptation for Generalizable Task Progress Estimation [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-11 A Unified Framework for Probabilistic Dynamic-, Trajectory- and Vision-based Virtual Fixtures [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-11 Cybernetic Marionette: Channeling Collective Agency Through a Wearable Robot in a Live Dancer-Robot Duet [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-10 Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-10 Towards Biosignals-Free Autonomous Prosthetic Hand Control via Imitation Learning [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-09 LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-09 BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-09 Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion Models in a Vision-Language-Action Framework [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-07 Active Test-time Vision-Language Navigation [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-07 Attention-Based Convolutional Neural Network Model for Human Lower Limb Activity Recognition using sEMG [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-06 HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-05 GEX: Democratizing Dexterity with Fully-Actuated Dexterous Hand and Exoskeleton Glove [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-05 Multimodal Limbless Crawling Soft Robot with a Kirigami Skin [pdf] ⚠️ ⭐️⭐️⭐️
2025-06-02 EPFL-Smart-Kitchen-30: Densely annotated cooking dataset with 3D kinematics to challenge video and language models [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-30 Learning API Functionality from Demonstrations for Tool-based Agents [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-30 Towards Tangible Immersion for Cobot Programming-by-Demonstration: Visual, Tactile and Haptic Interfaces for Mixed-Reality Cobot Automation in Semiconductor Manufacturing [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-29 Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-29 Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-26 DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-26 Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-26 The Many Challenges of Human-Like Agents in Virtual Game Environments [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-26 CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-23 [Trajectory] DTRT: Enhancing Human Intent Estimation and Role Allocation for Physical Human-Robot Collaboration [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-23 VideoGameBench: Can Vision-Language Models complete popular video games? [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-22 Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-22 DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-21 ClickSight: Interpreting Student Clickstreams to Reveal Insights on Learning Strategies via LLMs [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-21 Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-20 Sketch Interface for Teleoperation of Mobile Manipulator to Enable Intuitive and Intended Operation: A Proof of Concept [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-20 Robotic Monitoring of Colorimetric Leaf Sensors for Precision Agriculture [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-20 Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-20 [Multi-Agent] UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-20 Certifiably Safe Manipulation of Deformable Linear Objects via Joint Shape and Tension Prediction [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-19 Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-19 [Multi-Agent] Interpretable Robotic Friction Learning via Symbolic Regression [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-16 Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-16 Open-Source Multi-Viewpoint Surgical Telerobotics [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-15 Context-aware collaborative pushing of heavy objects using skeleton-based intention prediction [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-14 Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-14 Grasp EveryThing (GET): 1-DoF, 3-Fingered Gripper with Tactile Sensing for Robust Grasping [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-13 CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-13 The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-13 A Social Robot with Inner Speech for Dietary Guidance [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-13 WaLLM – Insights from an LLM-Powered Chatbot deployment via WhatsApp [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 Intuitive Human-Robot Interfaces Leveraging on Autonomy Features for the Control of Highly-redundant Robots [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 [Adaptive] Hybrid Control Strategies for Safe and Adaptive Robot-Assisted Dressing [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 AcoustoBots: A swarm of robots for acoustophoretic multimodal interactions [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 When Near Becomes Far: From Rayleigh to Optimal Near-Field and Far-Field Boundaries [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 BodyGPS: Anatomical Positioning System [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 Circulators based on Coupled Quantum Anomalous Hall Insulators and Resonators [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-12 TPT-Bench: A Large-Scale, Long-Term and Robot-Egocentric Dataset for Benchmarking Target Person Tracking [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-09 Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-09 An Empirical Study of Fuzz Harness Degradation [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-09 Polymer-Shell Coating of Mie-Resonant Silicon Nanospheres for Controlled Fabrication of Self-Assembled Monolayer [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-09 Preferential Attachment Trees with Vertex Death: Persistence of the Maximum Degree [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-09 Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation [pdf] yc4ny/SVAD ⭐️⭐️⭐️
2025-05-08 A Survey [pdf] hzxie/awesome-3d-scene-generation ⭐️⭐️⭐️
2025-05-08 Predicting Structure and Motion via Ray Origin and Endpoint Diffusion [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Generating Physically Stable and Buildable LEGO Designs from Text [pdf] AvaLovelace1/LegoGPT ⭐️⭐️⭐️
2025-05-08 Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Preference Alignment via Comparison Oracles [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Understanding Perception and Reasoning through Model Merging [pdf] shiqichen17/vlm_merging ⭐️⭐️⭐️
2025-05-08 Primordial black-hole formation and heavy r-process element synthesis from the cosmological QCD transition. Two aspects of an inhomogeneous early Universe [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Marsden–Meyer–Weinstein reduction for $k$-contact field theories [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Representation Stability for Marked Graph Complexes [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 A Dataset of Misleading Narratives Surrounding Recent UK General Elections [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Emergence of Spin-Polarized Unconventional Skin Effect in Hatano-Nelson Model [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 A Study on Improvement of Image Quality in Quantum Polarized Microscopy using an Entangled-Photon Source [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Conversational Process Model Redesign [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Reinforcement Learning-Driven Data Assimilation with Uncertainty-Aware Constrained Ensembles [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 The Brownian marble [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Novel Forms of Early Dark Energy [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 todd [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 a cosmic explosion with a complex off-axis jet and cocoon from a massive progenitor [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 implications for the observed abundance of ultra-violet luminous galaxies at z>10 [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Leveraging Co-Speech Gestures to Augment LLM-Based Interaction in Virtual Reality [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Stabilization of Kac polynomials [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Scalable Bernoulli factories for Bayesian inference with intractable likelihoods [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 The effective energy of a lattice metamaterial [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Non-Markovianity in collision models with initial intra-environment correlations [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Boundary Energy-Momentum Tensors for Asymptotically Flat Spacetimes [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Statistical Characterization of Entanglement Degradation Under Markovian Noise in Composite Quantum Systems [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Two-dimensional water waves with constant vorticity and general bottom topography [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Theoretical modeling of approximate universality of tidally deformed neutron stars [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Empowering Scientific Workflows with Federated Agents [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Efficient Data Filtering and Verification for High-Quality LLM Training Data [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 On differentiation of integrals in Lebesgue spaces [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 An efficient second-order cone programming approach for dynamic optimal transport on staggered grid discretization [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 an LLM-based Literary Translation evaluation metric with Professional Question Answering [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Robustly optimal dynamics for active matter reservoir computing [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 A new time-dependent quantum theory based on Tsallis’ distribution [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 A Budget-Constrained Routing Perspective [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Evidence of chiral fermion edge modes through geometric engineering of thermal Hall in $α$-RuCl$_3$ [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Variable Selection for Fixed and Random Effects in Multilevel Functional Mixed Effects Models [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Fermi lune and transdimensional orbital magnetism in rhombohedral multilayer graphene [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Post-Training Compression for Ultra-Low Power Hyperdimensional Computing [pdf] ⚠️ ⭐️⭐️⭐️
2025-05-08 Dynamic injection of a compressible gas into a confined porous layer [pdf] ⚠️ ⭐️⭐️⭐️

📊 Statistics

Last updated: 2025-06-28