Conversational AI on Embedded Hardware: Baru HRI System
Maedcore builds Baru: a conversational AI system running on embedded hardware with multi-modal sensor input (voice, touch, distance), adaptive NLP, and session-level personalisation. Full HRI case study.
Written by Eduardo Fuentevilla Blanco
Robotics Engineer at Maedcore
Executive summary: Baru is a complete human-robot interaction (HRI) system built by Maedcore. It combines a conversational AI engine running on embedded hardware with a multi-modal sensor array (distance, touch, voice), session-state management, and continuous NLP personalisation. The deployment form factor is a child-safe zoomorphic enclosure, a design decision intended to lower the acceptance barrier in constrained interaction contexts. The underlying architecture (edge AI inference, multi-modal input fusion, adaptive response generation) is directly applicable to industrial HMI panels, voice-controlled machinery, and human-machine collaboration systems in manufacturing environments.
The Engineering Challenge
Building a conversational AI system that runs reliably on constrained embedded hardware — without cloud dependency — while handling multi-modal input in real time presents three core challenges:
Latency under resource constraints. NLP inference and sensor polling must run concurrently on a single embedded system without perceptible response delay. Any lag between user input and system response breaks the interaction loop and degrades perceived intelligence.
Multi-modal input fusion. The system receives simultaneous input from three sensor types — ultrasonic distance sensors, capacitive touch sensors, and a microphone array — each with different polling rates and data formats. The controller must fuse these streams into a coherent interaction context.
Adaptive personalisation without cloud dependency. Session state and interaction history are stored and processed on-device, allowing the AI to personalise responses over time without transmitting sensitive data to external servers.
System Architecture

The Baru system operates on three integrated layers:
Layer 1 — Multi-Modal Sensor Input
Three sensor streams feed the interaction controller simultaneously:
- Ultrasonic distance sensors — detect user proximity and presence, triggering wake-on-approach behaviour without requiring explicit user action.
- Capacitive touch sensors — register intentional contact input, mapped to interaction triggers and conversational branching points.
- Microphone array — captures voice input for NLP processing, with hardware-level noise filtering for noisy environments.
All three streams are polled asynchronously and merged by the input fusion controller, which assigns priority weights based on interaction context (e.g., voice dominates when the user is speaking; proximity dominates at session initialisation).
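The priority weighting described above can be illustrated with a minimal sketch. It assumes each sensor stream has already been reduced to a normalised activation in [0, 1]; the context names, the weight values, the `Reading` type and the `fuse` function are hypothetical and are not taken from the Baru codebase.

```python
from dataclasses import dataclass

# Hypothetical priority weights per interaction context; the values are
# illustrative and not taken from the Baru implementation.
CONTEXT_WEIGHTS = {
    "session_init": {"proximity": 0.6, "touch": 0.3, "voice": 0.1},
    "user_speaking": {"proximity": 0.1, "touch": 0.2, "voice": 0.7},
}


@dataclass
class Reading:
    modality: str      # "proximity", "touch" or "voice"
    activation: float  # normalised signal strength in [0, 1]
    timestamp: float   # seconds, from a monotonic clock


def fuse(readings, context, now, max_age=0.1):
    """Return (dominant_modality, scores) using the freshest reading per stream.

    Readings older than max_age seconds are ignored, so a slow stream
    cannot dominate with stale data.
    """
    weights = CONTEXT_WEIGHTS[context]
    latest = {}
    for r in readings:
        if now - r.timestamp > max_age:
            continue
        prev = latest.get(r.modality)
        if prev is None or r.timestamp > prev.timestamp:
            latest[r.modality] = r
    scores = {m: weights[m] * r.activation for m, r in latest.items()}
    dominant = max(scores, key=scores.get) if scores else None
    return dominant, scores


readings = [
    Reading("proximity", 0.9, 10.00),
    Reading("touch", 0.0, 10.01),
    Reading("voice", 0.8, 10.02),
]
print(fuse(readings, "session_init", now=10.03)[0])
print(fuse(readings, "user_speaking", now=10.03)[0])
```

With the same three readings, proximity dominates in the `session_init` context and voice dominates in `user_speaking`, which is the behaviour the paragraph describes. A production controller would likely add debouncing and per-sensor calibration on top of this.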
Layer 2 — Conversational AI Engine
The NLP pipeline processes fused input and generates responses on-device:
Speech-to-intent classification maps spoken input to one of the system’s defined interaction intents, handling natural linguistic variation without requiring exact phrasing.
Contextual response generation selects and adapts output based on the current session state, the user’s interaction history, and the active intent. Responses are generated as parameterised templates, allowing variation without requiring generative inference at runtime.
Emotion and engagement signalling is expressed via the expressive display (facial states) and audio output, synchronised to the NLP response.
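As a rough illustration of intent classification followed by parameterised templates, the sketch below assumes speech has already been transcribed to text and uses simple keyword overlap in place of whatever classifier Baru actually runs. The intent names, keyword sets, template strings and session fields are all hypothetical.

```python
import re

# Hypothetical intents and trigger words; the source does not list Baru's
# actual intent set or describe its classifier.
INTENT_KEYWORDS = {
    "greet": {"hello", "hi", "hey", "morning"},
    "ask_name": {"name", "who", "called"},
    "farewell": {"bye", "goodbye", "later"},
}

# One template per (intent, vocabulary level). Placeholders are filled from
# session state at runtime, so no generative model is needed.
TEMPLATES = {
    ("greet", "simple"): "Hi {user}!",
    ("greet", "advanced"): "Hello {user}, good to see you again. This is visit number {visits}.",
    ("ask_name", "simple"): "I am Baru!",
    ("ask_name", "advanced"): "My name is Baru. And you are {user}, right?",
    ("farewell", "simple"): "Bye {user}!",
    ("farewell", "advanced"): "Goodbye {user}, see you next time.",
}
FALLBACK = "Sorry, can you say that again?"


def classify_intent(utterance):
    """Pick the intent whose keyword set overlaps most with the utterance."""
    tokens = set(re.findall(r"[a-z']+", utterance.lower()))
    best, best_overlap = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best, best_overlap = intent, overlap
    return best


def respond(utterance, session):
    """Select a template by intent and vocabulary level, then fill it in."""
    intent = classify_intent(utterance)
    if intent is None:
        return FALLBACK
    template = TEMPLATES[(intent, session["vocab_level"])]
    return template.format(**session)


session = {"user": "Ana", "visits": 3, "vocab_level": "simple"}
print(respond("What are you called?", session))
print(respond("Hello there!", {**session, "vocab_level": "advanced"}))
```

Because every response is a filled-in template, the worst-case cost of response generation is a dictionary lookup plus string formatting, which is one plausible reason to prefer templates over generative inference on constrained hardware.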
Layer 3 — Session State and Personalisation
Interaction data is persisted per user session:
- Vocabulary level and response complexity adapt to demonstrated linguistic patterns.
- Engagement metrics (response latency, touch frequency, proximity patterns) update the personalisation model after each session.
- Cumulative data is available for export to external analysis systems via a secure local API — no cloud transmission required.
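One way to keep such a profile entirely on-device is a small JSON file that is updated after each session. The sketch below is an assumption about how this could look, not a description of Baru's storage layer: the file format, the metric names, the smoothing factor `ALPHA` and the three helper functions are hypothetical.

```python
import json
import os
import tempfile

ALPHA = 0.3  # hypothetical smoothing factor for the running averages


def load_profile(path):
    """Read a user's profile from local storage, or start a fresh one."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {"sessions": 0, "metrics": {}}


def update_profile(profile, session_metrics):
    """Blend this session's metrics into the stored running averages."""
    for name, value in session_metrics.items():
        old = profile["metrics"].get(name)
        if old is None:
            profile["metrics"][name] = value
        else:
            profile["metrics"][name] = (1 - ALPHA) * old + ALPHA * value
    profile["sessions"] += 1
    return profile


def save_profile(path, profile):
    """Write to a temporary file and rename it over the old profile,
    so a reader never sees a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(profile, f)
    os.replace(tmp, path)
```

A session would end with something like `save_profile(path, update_profile(load_profile(path), {"response_latency_s": 2.0}))`. An exponential moving average is used here because it needs only one stored number per metric, which suits limited on-device storage; other update rules would work equally well.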
Implementation: Embedded Hardware Integration

The hardware integration process involved three engineering phases:
Component specification and layout. Processing unit selection balanced NLP inference performance against power envelope and thermal constraints. Sensor placement was validated against occlusion patterns and interaction geometry — the system must detect approach from any angle, regardless of user height.
Real-time OS configuration. The embedded OS was configured for deterministic task scheduling, ensuring the NLP inference loop and sensor polling loops share CPU time without priority inversion or starvation under peak load.
Acoustic enclosure engineering. The microphone array required an acoustic geometry that maximises voice pickup while attenuating structural vibration from the servo-driven display actuators embedded in the same chassis.
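For the scheduling phase, one common way to reason about whether periodic tasks can share a CPU without starvation is the Liu-Layland utilisation bound for rate-monotonic scheduling. The source does not say which scheduling policy Baru uses, so the sketch below is only an illustration of that style of check. The task names, execution times and periods are hypothetical, apart from the 60 Hz polling rate reported in the results.

```python
def rm_utilisation_bound(n):
    """Liu-Layland bound for n periodic tasks under rate-monotonic priorities."""
    return n * (2 ** (1 / n) - 1)


def check_schedulable(tasks):
    """tasks: list of (name, worst_case_exec_ms, period_ms).

    Returns (utilisation, bound, passes). Passing is sufficient but not
    necessary: a task set above the bound may still be schedulable.
    """
    utilisation = sum(c / t for _, c, t in tasks)
    bound = rm_utilisation_bound(len(tasks))
    return utilisation, bound, utilisation <= bound


# Hypothetical task set. Only the 60 Hz sensor period (about 16.7 ms) comes
# from the reported results; execution times and other periods are placeholders.
TASKS = [
    ("sensor_poll", 2.0, 1000 / 60),
    ("display_update", 5.0, 50.0),
    ("nlp_inference_slice", 40.0, 100.0),
]

u, bound, ok = check_schedulable(TASKS)
print(f"utilisation={u:.3f} bound={bound:.3f} schedulable={ok}")
```

With these placeholder numbers the total utilisation is 0.62, below the three-task bound of roughly 0.78. The check says nothing about priority inversion, which is usually handled separately through mechanisms such as priority inheritance on shared locks.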
Performance Results
| Metric | Result |
|---|---|
| Voice-to-response latency | < 800 ms end-to-end on-device |
| Sensor fusion polling rate | 60 Hz across all three streams |
| Session personalisation data | Stored and updated per interaction, no cloud dependency |
| Operating environment | Continuous multi-hour operation at room temperature |
| Input variability tolerance | Handles natural speech variation, background noise, and partial sensor occlusion |
Technology Applications Beyond This Deployment
The Baru HRI architecture addresses a class of problems that recurs across industrial and enterprise contexts:
Industrial HMI panels. A voice and touch interface running conversational AI on embedded hardware — with no cloud dependency — is directly applicable to factory-floor control panels where network connectivity is unreliable and response latency is critical.
Voice-controlled machinery. The multi-modal input fusion layer (voice + proximity + touch) provides a more robust control interface than single-modality voice systems: a voice command can be cross-checked against proximity and touch before it is acted on, which helps reduce false triggers in noisy industrial environments.
Human-machine collaboration. The adaptive session state layer — building a behavioural model of the operator over time — is the foundation for assistive systems that adjust to individual work patterns rather than requiring fixed interaction protocols.
Accessibility interfaces. The system’s low-pressure, non-screen-dependent interaction model translates directly to accessible interfaces for operators with motor or cognitive constraints.
Technologies Used
Project developed with: Conversational AI — NLP — Embedded Systems — Edge AI Inference — Multi-Modal Sensor Fusion — Human-Robot Interaction (HRI) — Session State Management
Building an HRI or Conversational AI System?
Baru demonstrates Maedcore’s ability to take a conversational AI system from architecture to embedded hardware deployment — without cloud dependency, with full sensor integration and adaptive personalisation. If you need an HRI solution, an industrial voice interface, or an edge AI system for a constrained environment, request a technical consultation.
About the Author
Eduardo Fuentevilla Blanco
Robotics Engineer
For over a decade, I have been driven by a single mission: leveraging AI and robotics to build a world of automated production. I believe that by creating self-sufficient systems, we can empower people to refocus on what truly matters—their families and their passions. My expertise spans from winning prestigious European startup competitions to architecting complex, integrated hardware and software projects. I specialize in bridging the gap between today's industrial challenges and tomorrow's autonomous solutions.
Expert review: Maedcore Team