Real-Time Computer Vision: Facial Landmark Detection and Projection Mapping
Maedcore builds a real-time computer vision pipeline: AI facial landmark detection at 60+ keypoints, sub-frame projection transform calculation, and dynamic visual mapping synchronized to live facial motion. Full technical case study.
Written by Eduardo Fuentevilla Blanco
Robotics Engineer at Maedcore
Executive summary: Maedcore developed a real-time computer vision pipeline for dynamic projection mapping onto a moving subject. It has three parts: an AI-powered facial landmark detection model whose latency stays below one frame period; a projective transform layer that compensates frame by frame for head rotation, expression changes, and translational movement; and a live compositing system that renders and projects effects synchronized to facial motion with no perceptible lag. The system processes camera input, runs landmark inference, calculates the deformation transform, composites the output frame, and drives the projector, all within a single deterministic loop at 30+ fps. The deployment application is live performance and interactive art; the same computer vision architecture applies to industrial quality control inspection, robotics guidance systems, and any real-time human-machine vision interface.
The Computer Vision Challenge
Projecting a static image onto a static surface is a solved problem. Projecting dynamically generated visual content onto a moving, deforming surface — a human face — in real time introduces three hard requirements that compound each other:
Sub-frame landmark detection latency. The facial landmark model must complete inference and return updated keypoint coordinates within a single frame cycle (~33 ms at 30 fps). If landmark detection takes longer than one frame, the projection falls behind the subject’s movement and the visual alignment breaks.
Projective transform accuracy under motion. The transform matrix that maps from screen-space effect coordinates to the physical surface of the face must be recalculated on every frame, accounting for:
- Translation (the person moves closer, further, or sideways).
- Rotation (head tilts, turns, nods).
- Non-rigid deformation (facial expression changes the shape of the surface).
A single global homography is insufficient for non-rigid deformation, because it can only model a planar surface seen under perspective. The system requires per-region transforms that approximate the curved, deforming geometry of the face.
Single-loop determinism. Camera capture, landmark inference, transform calculation, compositing, and projector output must all complete within one frame cycle. If any stage is delayed by OS scheduling, network latency, or inference variance, the output lags and the visual coherence collapses.
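The frame-budget constraint described above can be made concrete with a small timing harness. This is a minimal sketch, not the production loop: the stage names mirror the text, but the stage functions are hypothetical placeholders and `run_frame` is an illustrative helper, not part of any library.

```python
import time

FPS = 30
FRAME_BUDGET_S = 1.0 / FPS  # about 33.3 ms per frame at 30 fps


def run_frame(stages, frame):
    """Run the stages in order and record how long each one takes.

    Returns (result, timings, within_budget), where within_budget is True
    when the summed stage times fit inside one frame period.
    """
    timings = {}
    data = frame
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    within_budget = sum(timings.values()) <= FRAME_BUDGET_S
    return data, timings, within_budget


# Hypothetical placeholder stages; real ones would call the camera, the
# landmark model, the warp, and the projector output.
stages = [
    ("capture", lambda f: f),
    ("landmarks", lambda f: f),
    ("transform", lambda f: f),
    ("composite", lambda f: f),
]

result, timings, ok = run_frame(stages, frame="frame-0")
```

Logging `timings` per frame makes it possible to see which stage is responsible when `ok` turns false, rather than only observing that the output lags.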
System Architecture

The system is structured as a single deterministic processing loop with four stages executing sequentially on each frame:
Stage 1 — Camera Capture and Pre-Processing
A high-frame-rate camera captures the subject at 60 fps, delivering a frame roughly every 16.7 ms, which is half the 33.3 ms period of the 30 fps output. Frames are pre-processed (colour space conversion, exposure normalisation) before being passed to the landmark model. Pre-processing is implemented in hardware-accelerated code to stay within the frame budget.
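To illustrate the two pre-processing steps named above, here is a minimal pure-Python sketch. It assumes an RGB-to-luma conversion with BT.601 weights and a simple mean-based gain for exposure normalisation; the source does not state which conversion or normalisation the system uses, and the real implementation is hardware-accelerated rather than written like this. The function names and `target_mean` are illustrative.

```python
def to_luma(rgb_frame):
    """Convert rows of (R, G, B) pixels to luma using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_frame]


def normalise_exposure(luma, target_mean=128.0):
    """Scale the frame so its mean brightness matches target_mean.

    Values are clipped to the 0..255 range after scaling.
    """
    pixels = [p for row in luma for p in row]
    mean = sum(pixels) / len(pixels)
    gain = target_mean / mean if mean > 0 else 1.0
    return [[min(255.0, max(0.0, p * gain)) for p in row] for row in luma]


# A tiny 2x2 underexposed test frame (assumed values, for illustration only).
frame = [[(40, 40, 40), (80, 80, 80)],
         [(40, 40, 40), (80, 80, 80)]]
luma = to_luma(frame)
normalised = normalise_exposure(luma)
```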
Stage 2 — Facial Landmark Detection
The facial landmark model processes each captured frame and returns a set of 60+ keypoints representing the geometric structure of the face: eye corners, nose bridge, nostril positions, lip contour, jaw line, and forehead boundary.
The model runs on a dedicated inference accelerator (GPU or NPU) to isolate its latency from the CPU-bound transform and compositing stages. The model was selected and optimised for:
- Inference time < 15 ms on the target hardware.
- Keypoint stability — low jitter between consecutive frames under normal head motion.
- Robustness — consistent performance across different skin tones, lighting conditions, and face sizes in frame.
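The keypoint stability criterion can be quantified in several ways. One simple option, sketched here as an assumption rather than as the metric used in the project, is the mean displacement of matching keypoints between two consecutive frames of a subject holding still. The function name `mean_jitter` and the sample coordinates are illustrative.

```python
import math


def mean_jitter(prev_points, curr_points):
    """Mean Euclidean displacement (in pixels) between matching keypoints."""
    if len(prev_points) != len(curr_points) or not prev_points:
        raise ValueError("keypoint lists must be non-empty and the same length")
    total = sum(math.dist(p, q) for p, q in zip(prev_points, curr_points))
    return total / len(prev_points)


# Three keypoints from two consecutive frames; only the first one moved.
prev = [(100.0, 100.0), (150.0, 100.0), (125.0, 140.0)]
curr = [(103.0, 104.0), (150.0, 100.0), (125.0, 140.0)]
jitter = mean_jitter(prev, curr)
```

On a moving subject this value mixes real motion with detector noise, so it is only meaningful as a jitter measure when the subject is stationary.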
Stage 3 — Projective Transform Calculation
Using the 60+ landmark positions from Stage 2, the transform calculation stage computes the mapping from effect-coordinate-space to projector-output-space:
- Triangulation — the 60+ keypoints define a mesh of triangles covering the face surface.
- Per-triangle affine transform — for each triangle, a local affine transform is calculated mapping from effect coordinates to projector output coordinates. Three vertex correspondences determine an affine transform exactly.
- Warp application — the effect image is warped using the per-triangle transforms, producing a distorted output image where each region of the effect is correctly aligned to the corresponding region of the face.
This piecewise-linear approach handles the non-rigid deformation of facial expression changes without requiring a full 3D mesh reconstruction.
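The per-triangle step can be sketched as follows. This is a minimal pure-Python illustration, assuming each triangle is given as three (x, y) vertices in effect space and three matching vertices in projector space; it solves the six affine coefficients with Cramer's rule. The function names and sample coordinates are illustrative and are not taken from the project's code.

```python
def affine_from_triangles(src, dst):
    """Return ((a, b, c), (d, e, f)) such that, for each vertex pair,
    u = a*x + b*y + c and v = d*x + e*y + f."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)
    if abs(det) < 1e-12:
        raise ValueError("degenerate source triangle")

    def solve(p1, p2, p3):
        # Cramer's rule for one output coordinate.
        a = (p1 * (y2 - y3) + p2 * (y3 - y1) + p3 * (y1 - y2)) / det
        b = (x1 * (p2 - p3) + x2 * (p3 - p1) + x3 * (p1 - p2)) / det
        c = (p1 * (x2 * y3 - x3 * y2) + p2 * (x3 * y1 - x1 * y3)
             + p3 * (x1 * y2 - x2 * y1)) / det
        return a, b, c

    (u1, v1), (u2, v2), (u3, v3) = dst
    return solve(u1, u2, u3), solve(v1, v2, v3)


def apply_affine(matrix, point):
    """Map one (x, y) point through the 2x3 affine matrix."""
    (a, b, c), (d, e, f) = matrix
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


# One triangle in effect space and its target in projector space.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(10.0, 20.0), (14.0, 20.0), (10.0, 26.0)]
m = affine_from_triangles(src, dst)
mapped = apply_affine(m, (0.5, 0.5))
```

In an OpenCV-based pipeline, `cv2.getAffineTransform` computes the same 2x3 matrix from three point pairs and `cv2.warpAffine` applies it to image pixels; whether the project uses those specific calls is not stated in the source.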
Stage 4 — Compositing and Projection Output
The warped effect frame is composited against any ambient layer, brightness-corrected for the ambient light level in the environment, and sent to the projector output buffer. The projector renders at 30 fps, with its output timing aligned to the 60 fps camera capture.
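As a rough illustration of the compositing and brightness-correction step, the sketch below blends one effect pixel over one ambient-layer pixel and applies a gain. It assumes a simple alpha blend followed by a scalar gain derived from the ambient light level; the source does not specify the blend mode or how the correction is computed, so `alpha`, `gain`, and the function name are all hypothetical.

```python
def composite_pixel(effect, ambient, alpha, gain):
    """Alpha-blend an effect value over an ambient value, then apply a
    brightness gain and clip the result to the 0..255 range."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    blended = alpha * effect + (1.0 - alpha) * ambient
    return min(255.0, max(0.0, blended * gain))


# Effect at 200, ambient layer at 100, 50% opacity, 20% brightness boost.
value = composite_pixel(effect=200.0, ambient=100.0, alpha=0.5, gain=1.2)

# A very bright result is clipped rather than allowed to overflow.
clipped = composite_pixel(effect=250.0, ambient=250.0, alpha=1.0, gain=2.0)
```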
Hardware Configuration

The physical system consists of three hardware components positioned relative to each other at calibrated distances:
| Component | Specification |
|---|---|
| Camera | 60 fps high-frame-rate, low-motion-blur, hardware-triggered |
| Projector | High-luminance (3500+ lm) for visibility under ambient light; short-throw lens for close-range operation |
| Processing unit | GPU-equipped embedded system; dedicated inference accelerator |
Camera and projector are co-mounted to maintain a fixed geometric relationship, simplifying the projective transform calibration. System calibration uses a checkerboard target to establish the camera-to-projector homography baseline, which is applied as a pre-correction to every output frame.
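The pre-correction described above amounts to mapping each camera-space point through a 3x3 homography and dividing by the homogeneous coordinate. The sketch below shows only that mapping step. The matrix `H` is an illustrative placeholder, not a calibrated value, and the function name is hypothetical.

```python
def apply_homography(h, point):
    """Map an (x, y) point through a 3x3 homography with perspective divide."""
    x, y = point
    xp = h[0][0] * x + h[0][1] * y + h[0][2]
    yp = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (xp / w, yp / w)


# Placeholder camera-to-projector homography (assumed values, not calibrated).
H = [
    [1.0, 0.0, 50.0],
    [0.0, 1.0, -20.0],
    [0.0, 0.001, 1.0],
]

projector_point = apply_homography(H, (100.0, 200.0))
```

With OpenCV, the matrix itself would typically be estimated from detected checkerboard corners, for example with `cv2.findChessboardCorners` followed by `cv2.findHomography`; the source does not say which functions the calibration actually uses.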
Performance Results
| Metric | Result |
|---|---|
| Landmark detection keypoints | 60+ per frame |
| Output frame rate | 30 fps sustained |
| End-to-end pipeline latency | < 33 ms (sub-frame) |
| Perceptible lag | None under normal subject movement |
| Lighting robustness | Validated under controlled stage lighting, ambient gallery lighting, and mixed natural light |
| Subject variability | Validated across multiple face geometries, skin tones, and distances |

Technology Applications
The computer vision pipeline Maedcore built for this project addresses a core pattern of real-time vision-driven control systems, one that recurs across industrial and enterprise contexts:
Industrial quality control inspection. The landmark-detection-to-transform pipeline is structurally identical to a vision system that detects feature positions on a manufactured component and calculates whether they fall within tolerance. The frame-rate real-time constraint and per-region transform calculation are the same engineering problems.
Robotics guidance systems. A robot that tracks a moving target — product on a conveyor, a human collaborator’s hand, a docking port on a vehicle — requires the same sub-frame perception-to-action latency that this projection system achieves.
Augmented reality overlays. Any system that renders virtual content aligned to a real-world surface in real time (AR headsets, industrial AR maintenance guidance, in-situ measurement overlays) uses the same projective transform and compositing architecture.
Real-time human-machine interfaces. The deterministic single-loop architecture — camera to inference to output with no buffering — is the pattern required for any vision-based HMI where response time to human movement is a design constraint.
Technologies Used
Project developed with: Computer Vision — Real-Time AI — Facial Landmark Detection — Projective Transform — Projection Mapping — OpenCV — Embedded Vision — GPU Inference — Mechatronics
Building a Real-Time Computer Vision System?
Maedcore engineered a complete real-time CV pipeline — perception, inference, transform, output — meeting sub-frame latency requirements on embedded hardware. If you need a computer vision system for inspection, guidance, HMI, or any real-time visual application, request a technical consultation.
About the Author
Eduardo Fuentevilla Blanco
Robotics Engineer
For over a decade, I have been driven by a single mission: leveraging AI and robotics to build a world of automated production. I believe that by creating self-sufficient systems, we can empower people to refocus on what truly matters—their families and their passions. My expertise spans from winning prestigious European startup competitions to architecting complex, integrated hardware and software projects. I specialize in bridging the gap between today's industrial challenges and tomorrow's autonomous solutions.
Expert review: Maedcore Team