HARIS · IAU Graduation Project · 2026

Real-time hostile activity
recognition for CCTV.

HARIS is a multi-stage surveillance AI that detects weapons, tracks people across frames, classifies actions from skeletons, and reasons about threat context — producing auditable alerts instead of black-box scores. Built for operators, not dashboards.

Live demo available · Imam Abdulrahman Bin Faisal University · College of Computer Science & IT · Graduation showcase · May 2026

01 · The problem we're solving

Most CCTV deployments are reactive. Hours of footage are reviewed only after an incident, cameras don't talk to each other, and the few "smart" systems that exist flood operators with false alarms or hide behind a single opaque score. A security operator watching 16 camera feeds cannot physically pay attention to all of them — and the moment that matters is usually the one nobody was watching.

HARIS is an AI-first video intelligence system that watches every feed in real time, flags the moments that matter, and tells operators why it flagged them — with a visible skeleton, a tracked identity, a weapon bounding box, and a reasoning trail they can audit.

02 · The pipeline

Five specialist models run per frame, each solving one sub-problem and passing structured evidence to the next. No single network is asked to do everything — that's what makes the output auditable.

T1 · RT-DETR · People + weapons object detection
T2 · BoT-SORT · Multi-object tracking across frames
T2 · RTMPose · 17-keypoint skeleton per tracked person
T3 · ST-GCN · Skeleton-based action classification
T4 · Alert engine · Persistence gates · holder binding · cooldown
Stage | Component | What it contributes
Detect | RT-DETR (custom) | Per-frame bounding boxes for person / gun / knife classes. Fine-tuned on CCTV-domain data for improved recall at surveillance angles.
Track | BoT-SORT | Assigns stable IDs across frames so actions accumulate per-person, not per-detection.
Pose | RTMPose | 17 COCO keypoints per tracked person. Multi-tier confidence gating separates render-only vs classifier-usable skeletons.
Action | ST-GCN | Classifies 3-second skeleton windows into normal / suspicious / hostile categories. Works from geometry, not pixels — robust to lighting and clothing.
Re-ID | FaceNet watchlist | Optional face re-identification for known persons of interest. Privacy-gated, operator-enabled.
Reason | Alert engine | Geometric gates on weapon detections (aspect ratio, area ratio) + temporal persistence windows + cooldown logic + wrist-to-weapon holder binding.
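The Reason stage's geometric gates reduce to a shape-plausibility check on each weapon box. A minimal sketch; the threshold values below are illustrative, not the tuned values HARIS ships with:

```python
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def aspect(self) -> float:
        h = self.y2 - self.y1
        return (self.x2 - self.x1) / h if h > 0 else float("inf")

def passes_geometric_gates(weapon: Box, frame_w: int, frame_h: int,
                           max_aspect: float = 4.0,
                           max_area_ratio: float = 0.25) -> bool:
    """Reject weapon boxes with implausible shape or size.

    Aspect-ratio gate: handheld weapons at CCTV scale are rarely extreme
    slivers, so very wide or very tall boxes are usually edges or poles.
    Area-ratio gate: a 'gun' covering a quarter of the frame is far more
    likely a misfire on furniture than a real detection.
    """
    ar = weapon.aspect()
    if ar > max_aspect or ar < 1.0 / max_aspect:
        return False
    if weapon.area() / (frame_w * frame_h) > max_area_ratio:
        return False
    return True
```

Boxes that pass these gates still have to survive the temporal persistence window and cooldown before an alert fires.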

03 · Measured results

False-positive reduction from the weapon detector v2 fine-tune, measured against the v1 baseline on CCTV-domain test footage.

−74% · Raw weapon false positives · Per-frame RT-DETR, CCTV domain
−41% · Alert-level false positives · After temporal + geometric gates
−17% · Video-level false alerts · Per-clip confirmed-alert rate
5.4k+ · Curated training images · Post-deduplication, grouped for clean CV splits

Training set was built through a multi-source pipeline (COCO, CCTV footage, real-world guns/knives), deduplicated via perceptual hash + CLIP similarity, and grouped by source to prevent train/test leakage during cross-validation. Evaluation pairs CCTV-domain positives (UCF-Crime shooting/assault) with self-recorded negatives for dual-test-set honesty.
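Group-aware splitting is the leakage guard that keeps near-duplicates from one source video on both sides of a split. A minimal sketch of the idea, assuming each image carries a source-group label; the perceptual-hash + CLIP dedup itself is a separate upstream step:

```python
import hashlib

def group_split(items, group_of, test_frac=0.2):
    """Assign items to train or test by hashing their source group, so
    every image from one source video/camera lands on the same side of
    the split. Deterministic: the same group always hashes the same way.

    items: any iterable of records.
    group_of: callable mapping a record to its source-group string.
    """
    train, test = [], []
    for item in items:
        bucket = int(hashlib.md5(group_of(item).encode()).hexdigest(), 16) % 1000
        (test if bucket < test_frac * 1000 else train).append(item)
    return train, test
```

Because assignment is a pure function of the group label, re-running the pipeline after adding new images never moves an old group across the boundary.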

04 · Operator-facing features

The dashboard is designed as a professional DVR/NVR replacement — not a research notebook. Every overlay is toggleable, every threshold is live-tunable, and every alert shows its reasoning.

👤 Continuous body overlay

Skeleton + mannequin rendering for every tracked person. When pose estimation drops a frame, a last-valid-pose snapshot holds for ~1.5 seconds; below that, a generic body glow indicates presence without faking anatomy. Tracks fade along their velocity vector for 2 seconds after loss, instead of popping out abruptly.
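The dropout behavior above can be expressed as a small state function. Timings follow the text; the structure is a simplification of the real renderer, which also tracks a velocity vector for the fade:

```python
def overlay_state(frames_since_seen: int, fps: int = 10,
                  hold_s: float = 1.5, fade_s: float = 2.0):
    """Decide how to render a tracked person after pose dropout.

    Returns 'live' (fresh skeleton), 'held' (last-valid-pose snapshot),
    'glow' (generic body glow, fading out), or None (track removed).
    """
    if frames_since_seen == 0:
        return "live"
    t = frames_since_seen / fps
    if t <= hold_s:
        return "held"              # freeze the last valid skeleton
    if t <= hold_s + fade_s:
        return "glow"              # presence indicated without fake anatomy
    return None                    # fade complete, drop the overlay
```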

🎯 Weapon-threshold slider

Operators dial confidence sensitivity in real time. Drag up to suppress noisy low-confidence detections; drag down for security-critical contexts. Applies live to the detection panel, overlay strokes, the auto-flagger, and the threat heatmap timeline — no page reload.

🔥 Threat-density heatmap

The scrub bar renders a time-density heatmap of detected threats across the clip, so operators can scan a 10-minute video at a glance and jump directly to the seconds that matter. Manual flag buttons persist operator annotations alongside model alerts.
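Computing the scrub-bar density is straightforward timestamp binning; the bin count and peak normalization here are assumptions, not the shipped values:

```python
def threat_heatmap(alert_times, clip_len_s, n_bins=100):
    """Bin alert timestamps into a fixed number of scrub-bar cells and
    normalize to [0, 1] so the UI can map density straight to color.

    alert_times: alert timestamps in seconds within the clip.
    clip_len_s: total clip length in seconds.
    """
    bins = [0] * n_bins
    for t in alert_times:
        i = min(int(t / clip_len_s * n_bins), n_bins - 1)
        bins[i] += 1
    peak = max(bins) or 1          # avoid division by zero on clean clips
    return [b / peak for b in bins]
```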

🤝 Weapon holder binding

Detected weapons are bound to the wrist of the nearest tracked person via pose-based proximity. The overlay draws a line from weapon to holder, so operators don't have to guess who is carrying what in crowded scenes.
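Holder binding reduces to a nearest-wrist search over the COCO keypoints (indices 9 and 10 are the left and right wrists). The distance cap and confidence floor below are illustrative:

```python
import math

def bind_weapon_to_holder(weapon_center, persons, max_dist=80.0):
    """Bind a weapon to the tracked person whose most reliable wrist
    keypoint is closest to the weapon box center.

    persons: dict of track_id -> list of 17 (x, y, conf) COCO keypoints.
    Returns the holder's track_id, or None if no wrist is close enough.
    """
    best_id, best_d = None, max_dist
    wx, wy = weapon_center
    for tid, kps in persons.items():
        for idx in (9, 10):              # left / right wrist
            x, y, conf = kps[idx]
            if conf < 0.3:               # skip unreliable keypoints
                continue
            d = math.hypot(x - wx, y - wy)
            if d < best_d:
                best_id, best_d = tid, d
    return best_id
```

The cap matters: with no `max_dist`, a weapon dropped on the floor would always bind to somebody.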

🌙 Night + tint modes

Per-clip brightness/contrast boost for low-light footage and a customizable dark tint for washed-out daytime clips. Both stackable, both persisted per-operator, both zero performance cost (GPU-accelerated CSS filters).

🔔 Auditable alerts

Every alert carries its evidence: which frames fired, which persons were involved, which weapon class, which confidence scores, which temporal window. Operators can acknowledge, mark false-positive, or escalate — with the reasoning chain attached for review.

05 · What makes HARIS different

🧩 Specialist pipeline, not monolith

Every decision is traceable to a named sub-model. When HARIS is wrong, we know which stage was wrong — and can fix that stage without retraining the whole system. This is the difference between an engineered system and a demo.

⏱️ Temporal, not single-frame

Actions are classified over 3-second skeleton windows. Alerts require persistence — 3 out of 5 recent frames must agree, with at least 10% of the window above the persistence threshold. Single-frame detections never fire alerts.
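The persistence rule can be sketched as two rolling windows over per-frame decisions. The 3-of-5 and 10% numbers come from the text; the rest is an assumed implementation:

```python
from collections import deque

class PersistenceGate:
    """Fire only when detections persist: k of the last n frames must be
    positive AND positives must cover at least min_frac of the longer
    window (3 s at 10 fps = 30 frames). A single-frame spike can never
    satisfy the k-of-n condition, so it never fires.
    """
    def __init__(self, n=5, k=3, window=30, min_frac=0.10):
        self.recent = deque(maxlen=n)
        self.window = deque(maxlen=window)
        self.k = k
        self.min_frac = min_frac

    def update(self, positive: bool) -> bool:
        self.recent.append(positive)
        self.window.append(positive)
        recent_ok = sum(self.recent) >= self.k
        window_ok = sum(self.window) >= self.min_frac * len(self.window)
        return recent_ok and window_ok
```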

🌐 API-first, multi-client ready

The web dashboard is one of several planned clients. A clean JSON boundary means future mobile and desktop clients can plug in to the same server: phone-as-camera, portable operator UI, on-demand face re-identification scans.
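An alert crossing that JSON boundary might be shaped roughly like this. The field names are illustrative, not the project's actual schema:

```json
{
  "alert_id": "a-0192",
  "type": "weapon_detected",
  "weapon_class": "gun",
  "confidence": 0.87,
  "track_id": 14,
  "frame_window": [1520, 1550],
  "evidence": {
    "persistence": "3/5 recent frames",
    "holder_bound": true
  }
}
```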

🔍 Honest evaluation

Evaluated on a dual test set: public UCF-Crime clips (shooting + assault categories) plus self-recorded domain-specific footage. Group-aware splits prevent source-leakage in the metrics. Source-level deduplication before training.

06 · Roadmap

Completed milestones and what's shipping before the graduation showcase.

Semester 1 · 2025
Foundations
Problem framing, literature review, initial dataset construction, first-pass detector and pose integration. Proposal defense passed.
Semester 2 · Q1 2026
Full pipeline end-to-end
RT-DETR + BoT-SORT + RTMPose + ST-GCN + alert engine running on uploaded video. Web dashboard with realtime overlay, socket-pushed alerts, and operator controls.
April 2026
Weapon detector v2 · overlay polish
Fine-tuned detector: −74% raw FP, −41% alert FP, −17% video-level FP vs v1. Continuous body overlay, threat heatmap, weapon-threshold slider, holder binding, night/tint modes.
Late April · Early May 2026
VLM-on-alert · FaceNet enable
Qwen2.5-VL-3B running locally on alert frames for natural-language threat summaries. FaceNet watchlist re-identification surfaced in the operator UI.

07 · Honest limitations

We publish the caps. A system that pretends to have no limits is a system that hides them from its operators — and that's the opposite of what surveillance AI should be.

Operational caps in the current build

  • Top-4 persons per frame get action labels. Tracking applies to everyone in frame; skeleton-based action classification applies to the four highest-detection-confidence persons. Documented and visible in the UI.
  • Video processing cap: 60 seconds at 10 fps per upload. Environment-overridable for longer evaluation runs; the dashboard is scoped for short-clip operator workflows.
  • Tracker re-identification and track-recovery are gated features. Code is shipped but disabled by default — the re-ID path has a wall-time cost we haven't optimized, and tracker-side appearance matching is a known next-pass upgrade.
  • Pose confidence gates silently drop low-quality skeletons. Far-field subjects below ~100px tall get bounding boxes and tracks but no action labels. The overlay surfaces this with a generic body glow, so operators aren't left wondering why a box is empty.
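The multi-tier pose gating behind the last two caps can be sketched as a single tier function. Threshold values are hypothetical; only the ~100 px height floor comes from the text:

```python
def skeleton_tier(mean_kp_conf: float, bbox_height_px: float,
                  render_thresh=0.3, classify_thresh=0.5, min_height=100):
    """Multi-tier skeleton gating: skeletons good enough to draw may
    still be too noisy to classify, and far-field subjects below the
    height floor are tracked but never fed to the action classifier.
    """
    if bbox_height_px < min_height or mean_kp_conf < render_thresh:
        return "glow"          # presence only, no skeleton drawn
    if mean_kp_conf < classify_thresh:
        return "render_only"   # draw skeleton, skip ST-GCN
    return "classifier"        # full pipeline, eligible for action labels
```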

08 · Team

Five students · College of Computer Science & Information Technology · Imam Abdulrahman Bin Faisal University.

Turki · Team Leader
Aseel · Team Member
Hamza · Team Member
Anas · Team Member
Khalid · Team Member

Supervised by faculty at the College of Computer Science & Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Kingdom of Saudi Arabia.