Conversational AI with Embedded Sensors: Human-Robot Interaction Case Study

Conversational AI on Embedded Hardware: The Baru HRI System

Executive summary: Baru is a complete human-robot interaction (HRI) system built by Maedcore. It combines a conversational AI engine running on embedded hardware with a multi-modal sensor array (distance, touch, voice), session-state management, and continuous NLP-driven personalisation. The deployment form factor is a child-safe zoomorphic enclosure, a design decision aimed at lowering the acceptance barrier in constrained interaction contexts. The underlying architecture (edge AI inference, multi-modal input fusion, adaptive response generation) applies directly to industrial HMI panels, voice-controlled machinery, and human-machine collaboration systems in manufacturing environments.

Funding: Developed by Maedcore as an R&D initiative with funding support from the Ayuntamiento de Madrid and Madrid Innovación.

The Engineering Challenge

Building a conversational AI system that runs reliably on constrained embedded hardware — without cloud dependency — while handling multi-modal input in real time presents three core challenges:

Latency under resource constraints. NLP inference and sensor polling must run concurrently on a single embedded system without perceptible response delay. Any lag between user input and system response breaks the interaction loop and degrades perceived intelligence.

Multi-modal input fusion. The system receives simultaneous input from three sensor types — ultrasonic distance sensors, capacitive touch sensors, and a microphone array — each with different polling rates and data formats. The controller must fuse these streams into a coherent interaction context.

Adaptive personalisation without cloud dependency. Session state and interaction history are stored and processed on-device, allowing the AI to personalise responses over time without transmitting sensitive data to external servers.

System Architecture

The Baru system operates on three integrated layers.

Layer 1 — Multi-Modal Sensor Input

Three sensor streams feed the interaction controller simultaneously:

Ultrasonic distance sensors — detect user proximity and presence, triggering wake-on-approach behaviour without requiring explicit user action.
Capacitive touch sensors — register intentional contact input, mapped to interaction triggers and conversational branching points.
Microphone array — captures voice input for NLP processing, with hardware-level noise filtering for noisy environments.
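The sketch below illustrates wake-on-approach. It assumes an ultrasonic sensor that reports the echo round-trip time. The round trip is halved to get distance, and two thresholds (hysteresis) stop the system flickering between sleep and wake when a user lingers at the edge of the range. The speed-of-sound constant assumes air at roughly 20 °C. The threshold values and the `ProximityWake` class are hypothetical, because the source does not publish Baru's values.

```python
SPEED_OF_SOUND_CM_PER_S = 34_300  # approx. speed of sound in air at ~20 °C (assumption)


def echo_to_distance_cm(echo_time_s: float) -> float:
    """Convert an ultrasonic echo round-trip time to one-way distance in cm."""
    return SPEED_OF_SOUND_CM_PER_S * echo_time_s / 2


class ProximityWake:
    """Hypothetical wake-on-approach detector with hysteresis (illustrative thresholds)."""

    def __init__(self, wake_cm: float = 80.0, sleep_cm: float = 120.0):
        if wake_cm >= sleep_cm:
            raise ValueError("wake threshold must be below sleep threshold")
        self.wake_cm = wake_cm
        self.sleep_cm = sleep_cm
        self.awake = False

    def update(self, distance_cm: float) -> bool:
        # Wake only when the user comes closer than wake_cm; go back to sleep
        # only once they move beyond sleep_cm. Readings in between keep the state.
        if not self.awake and distance_cm < self.wake_cm:
            self.awake = True
        elif self.awake and distance_cm > self.sleep_cm:
            self.awake = False
        return self.awake
```

The key design choice is making the sleep threshold wider than the wake threshold. A single threshold would toggle the system on every small movement across it.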

All three streams are polled asynchronously and merged by the input fusion controller, which assigns priority weights based on interaction context (e.g., voice dominates when the user is speaking; proximity dominates at session initialisation).
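The context-dependent weighting can be sketched as a lookup of per-context weights that scale each stream's confidence, with the highest weighted score winning. The weight values, the context names, `SensorEvent`, and `fuse` are illustrative assumptions. The source only states that voice dominates while the user is speaking and proximity dominates at session initialisation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weight tables; only the qualitative ordering comes from the source.
CONTEXT_WEIGHTS = {
    "session_init": {"proximity": 0.6, "touch": 0.3, "voice": 0.1},
    "user_speaking": {"proximity": 0.1, "touch": 0.2, "voice": 0.7},
    "idle": {"proximity": 0.4, "touch": 0.4, "voice": 0.2},
}


@dataclass
class SensorEvent:
    modality: str      # "proximity", "touch" or "voice"
    confidence: float  # 0.0-1.0, as reported by that stream's driver


def fuse(events: list[SensorEvent], context: str) -> tuple[str, float] | None:
    """Return (dominant modality, weighted score), or None if there are no events."""
    weights = CONTEXT_WEIGHTS[context]
    scores: dict[str, float] = {}
    for ev in events:
        score = weights.get(ev.modality, 0.0) * ev.confidence
        # Keep only the strongest event per modality in this polling window.
        scores[ev.modality] = max(scores.get(ev.modality, 0.0), score)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best, scores[best]
```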

Layer 2 — Conversational AI Engine

The NLP pipeline processes fused input and generates responses on-device:

Speech-to-intent classification maps spoken input to one of the system’s defined interaction intents, handling natural linguistic variation without requiring exact phrasing.
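Baru's intent classifier is not described in detail, so what follows is a deliberately simple stand-in, not the system's method. It assumes a speech recogniser has already produced a text transcript. The transcript is scored against example phrasings for each intent using token-set (Jaccard) overlap, and a fallback intent is returned below a confidence threshold. The intents, the examples, and the threshold are hypothetical. A production system would more likely use a trained classifier, but the interface is the same: a transcript in, one of a fixed set of intents out.

```python
from __future__ import annotations

import re

# Hypothetical intent set and example phrasings (Baru's real intents are not published).
INTENT_EXAMPLES = {
    "greet": ["hello", "hi there", "good morning"],
    "play_game": ["let's play a game", "can we play", "i want to play"],
    "tell_story": ["tell me a story", "read me a story", "i want a story"],
}


def tokens(text: str) -> set[str]:
    """Lower-case word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify(transcript: str, threshold: float = 0.3) -> str:
    """Return the intent whose example best overlaps the transcript, else 'fallback'."""
    words = tokens(transcript)
    best_intent, best_score = "fallback", 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            ex_words = tokens(example)
            union = words | ex_words
            score = len(words & ex_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else "fallback"
```

Matching on overlap rather than exact strings lets phrasing vary ("Could you tell me a story?" still maps to `tell_story`).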

Contextual response generation selects and adapts output based on the current session state, the user’s interaction history, and the active intent. Responses are generated as parameterised templates, allowing variation without requiring generative inference at runtime.
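Parameterised templates can be illustrated with Python's standard `string.Template`, with no generative model involved at runtime. A template is chosen by intent and vocabulary level, varied by rotating through alternatives, and filled from session state. The template bank, the session keys (`name`, `vocab_level`, `visits`, `last_topic`), and `render_response` are hypothetical.

```python
from string import Template

# Hypothetical template bank keyed by (intent, vocabulary level).
TEMPLATES = {
    ("greet", "simple"): ["Hi $name!", "Hello again, $name!"],
    ("greet", "advanced"): ["Welcome back, $name. Shall we continue with $last_topic?"],
}


def render_response(intent: str, session: dict) -> str:
    """Pick a template by intent and vocabulary level, then fill it from session state."""
    level = session.get("vocab_level", "simple")
    # Fall back to the simple register if no template exists for this level.
    options = TEMPLATES.get((intent, level)) or TEMPLATES[(intent, "simple")]
    # Rotate through variants by visit count so repeated greetings differ.
    chosen = options[session.get("visits", 0) % len(options)]
    # substitute() raises KeyError if the session lacks a slot the template needs.
    return Template(chosen).substitute(session)
```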

Emotion and engagement signalling is conveyed through the expressive display (facial states) and audio output, synchronised with the NLP response.

Layer 3 — Session State and Personalisation

Interaction data is persisted per user session:

Vocabulary level and response complexity adapt to the user's demonstrated linguistic patterns.
Engagement metrics (response latency, touch frequency, proximity patterns) update the personalisation model after each session.
Cumulative data is available for export to external analysis systems via a secure local API — no cloud transmission required.
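The source does not specify Baru's personalisation model. One simple way to update a profile after each session, assumed here for illustration, is an exponential moving average over the engagement metrics listed above. Recent sessions count more, but a single unusual session cannot overwrite the profile. The field names, the smoothing factor `alpha`, and `update_profile` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EngagementProfile:
    """Hypothetical per-user engagement profile; field names are illustrative."""
    mean_response_latency_s: float = 0.0
    touches_per_min: float = 0.0
    mean_approach_distance_cm: float = 0.0
    sessions: int = 0


METRICS = ("mean_response_latency_s", "touches_per_min", "mean_approach_distance_cm")


def update_profile(p: EngagementProfile, session_metrics: dict, alpha: float = 0.3) -> EngagementProfile:
    """Blend one session's metrics into the profile with an exponential moving average."""
    for name in METRICS:
        new = session_metrics[name]
        old = getattr(p, name)
        # The first session seeds the profile; later sessions are smoothed by alpha.
        setattr(p, name, new if p.sessions == 0 else (1 - alpha) * old + alpha * new)
    p.sessions += 1
    return p
```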

Implementation: Embedded Hardware Integration

The hardware integration process involved three engineering phases:

Component specification and layout. Processing unit selection balanced NLP inference performance against power envelope and thermal constraints. Sensor placement was validated against occlusion patterns and interaction geometry — the system must detect approach from any angle, regardless of user height.

Real-time OS configuration. The embedded OS was configured for deterministic task scheduling, ensuring the NLP inference loop and sensor polling loops share CPU time without priority inversion or starvation under peak load.
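The source does not say which scheduling policy Baru's OS uses. Suppose, as an assumption, that the sensor-polling and inference loops run as fixed-priority periodic tasks under rate-monotonic priorities. The Liu and Layland utilisation bound then gives a quick sufficient check that every task meets its deadline: the sum of C_i / T_i (worst-case execution time over period) must not exceed n(2^(1/n) - 1). The task timings below are hypothetical, except the 60 Hz polling period, which comes from the performance table.

```python
from __future__ import annotations


def rm_utilisation_bound(n: int) -> float:
    """Liu & Layland sufficient bound for n periodic tasks under rate-monotonic priorities."""
    return n * (2 ** (1 / n) - 1)


def rm_schedulable(tasks: list[tuple[float, float]]) -> tuple[float, float, bool]:
    """tasks: (worst-case execution time, period) pairs in the same time unit.

    Returns (total utilisation, bound, whether the sufficient test passes).
    """
    utilisation = sum(c / t for c, t in tasks)
    bound = rm_utilisation_bound(len(tasks))
    return utilisation, bound, utilisation <= bound


# Hypothetical task set, times in ms.
TASKS = [
    (2.0, 1000 / 60),  # sensor fusion poll at 60 Hz
    (60.0, 200.0),     # one NLP inference slice every 200 ms (assumed)
    (5.0, 50.0),       # display/audio update at 20 Hz (assumed)
]
u, bound, ok = rm_schedulable(TASKS)
```

The test is sufficient but not necessary. A task set that exceeds the bound may still be schedulable and would need exact response-time analysis. Utilisation alone also says nothing about priority inversion, which is normally handled separately, for example with priority-inheritance mutexes on buffers shared between tasks.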

Acoustic enclosure engineering. The microphone array required an acoustic geometry that maximises voice pickup while attenuating structural vibration from the servo-driven display actuators embedded in the same chassis.

Performance Results

Metric	Result
Voice-to-response latency	< 800 ms end-to-end on-device
Sensor fusion polling rate	60 Hz across all three streams
Session personalisation data	Stored and updated per interaction, no cloud dependency
Operating environment	Continuous multi-hour operation at room temperature
Input variability tolerance	Handles natural speech variation, background noise, and partial sensor occlusion

Technology Applications Beyond This Deployment

The Baru HRI architecture addresses a class of problems that recurs across industrial and enterprise contexts:

Industrial HMI panels. A voice and touch interface running conversational AI on embedded hardware — with no cloud dependency — is directly applicable to factory-floor control panels where network connectivity is unreliable and response latency is critical.

Voice-controlled machinery. The multi-modal input fusion layer (voice + proximity + touch) provides a more robust control interface than single-modality voice systems, reducing false-trigger rates in noisy industrial environments.
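The false-trigger reduction can be illustrated with a gating rule, assumed here for illustration. A voice command is accepted only if the intent classifier is confident and a second modality corroborates it, meaning the user is within proximity range or has touched the interface recently. All thresholds and the `accept_voice_command` function are hypothetical.

```python
from __future__ import annotations


def accept_voice_command(intent_confidence: float,
                         distance_cm: float | None,
                         seconds_since_touch: float | None,
                         min_confidence: float = 0.6,      # hypothetical
                         max_distance_cm: float = 150.0,   # hypothetical
                         touch_window_s: float = 5.0) -> bool:  # hypothetical
    """Accept a voice command only if it is confident AND corroborated by another modality."""
    if intent_confidence < min_confidence:
        return False
    near = distance_cm is not None and distance_cm <= max_distance_cm
    touched = seconds_since_touch is not None and seconds_since_touch <= touch_window_s
    return near or touched
```

Compared with voice alone, speech picked up from a distant machine or a passing conversation is rejected unless an operator is actually present at the panel.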

Human-machine collaboration. The adaptive session state layer — building a behavioural model of the operator over time — is the foundation for assistive systems that adjust to individual work patterns rather than requiring fixed interaction protocols.

Accessibility interfaces. The system’s low-pressure, non-screen-dependent interaction model translates directly to accessible interfaces for operators with motor or cognitive constraints.

Technologies Used

Project developed with: Conversational AI — NLP — Embedded Systems — Edge AI Inference — Multi-Modal Sensor Fusion — Human-Robot Interaction (HRI) — Session State Management

Building an HRI or Conversational AI System?

Baru demonstrates Maedcore’s ability to take a conversational AI system from architecture to embedded hardware deployment — without cloud dependency, with full sensor integration and adaptive personalisation. If you need an HRI solution, an industrial voice interface, or an edge AI system for a constrained environment, request a technical consultation.
