HARIS is a multi-stage surveillance AI that detects weapons, tracks people across frames, classifies actions from skeletons, and reasons about threat context — producing auditable alerts instead of black-box scores. Built for operators, not dashboards.
Most CCTV deployments are reactive. Hours of footage are reviewed only after an incident, cameras don't talk to each other, and the few "smart" systems that exist flood operators with false alarms or hide behind a single opaque score. A security operator watching 16 camera feeds cannot physically pay attention to all of them — and the moment that matters is usually the one nobody was watching.
HARIS is an AI-first video intelligence system that watches every feed in real time, flags the moments that matter, and tells operators why it flagged them — with a visible skeleton, a tracked identity, a weapon bounding box, and a reasoning trail they can audit.
Five specialist models run per frame, each solving one sub-problem and passing structured evidence to the next; a rule-based alert engine then reasons over their combined output. No single network is asked to do everything — that's what makes the output auditable.
| Stage | Component | What it contributes |
|---|---|---|
| Detect | RT-DETR (custom) | Per-frame bounding boxes for person / gun / knife classes. Fine-tuned on CCTV-domain data for improved recall at surveillance angles. |
| Track | BoT-SORT | Assigns stable IDs across frames so actions accumulate per-person, not per-detection. |
| Pose | RTMPose | 17 COCO keypoints per tracked person. Multi-tier confidence gating separates render-only vs classifier-usable skeletons. |
| Action | ST-GCN | Classifies 3-second skeleton windows into normal / suspicious / hostile categories. Works from geometry, not pixels — robust to lighting and clothing. |
| Re-ID | FaceNet watchlist | Optional face re-identification for known persons of interest. Privacy-gated, operator-enabled. |
| Reason | Alert engine | Geometric gates on weapon detections (aspect ratio, area ratio) + temporal persistence windows + cooldown logic + wrist-to-weapon holder binding. |
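The geometric gates in the Reason stage can be sketched as a plausibility check on each weapon box. This is a minimal illustration, not the production code: the function and threshold values (`min_aspect`, `max_aspect`, `max_area_ratio`) are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def w(self) -> float:
        return self.x2 - self.x1

    @property
    def h(self) -> float:
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.w * self.h

def passes_geometric_gate(weapon: Box, frame_area: float,
                          min_aspect: float = 0.2, max_aspect: float = 5.0,
                          max_area_ratio: float = 0.15) -> bool:
    """Reject weapon boxes whose shape or size is implausible.

    Thresholds here are illustrative placeholders, not HARIS's tuned values.
    """
    if weapon.h == 0:
        return False
    aspect = weapon.w / weapon.h          # aspect-ratio gate
    area_ratio = weapon.area / frame_area  # area-ratio gate vs the frame
    return min_aspect <= aspect <= max_aspect and area_ratio <= max_area_ratio
```

Boxes that survive this gate still have to pass the temporal persistence and cooldown checks before an alert fires.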
False-positive reduction from the weapon detector v2 fine-tune, measured against the v1 baseline on CCTV-domain test footage.
Training set was built through a multi-source pipeline (COCO, CCTV footage, real-world guns/knives), deduplicated via perceptual hash + CLIP similarity, and grouped by source to prevent train/test leakage during cross-validation. Evaluation pairs CCTV-domain positives (UCF-Crime shooting/assault) with self-recorded negatives for dual-test-set honesty.
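The perceptual-hash half of that deduplication step can be sketched with a difference hash (dHash): one bit per horizontal neighbor pair in a downsampled grayscale grid, with near-duplicates detected by Hamming distance. The function names and the `max_dist` cutoff are illustrative; the CLIP-similarity pass is a separate embedding comparison not shown here.

```python
def dhash(gray):
    """Difference hash: one bit per horizontal neighbor pair.

    `gray` is a small 2D grayscale grid (e.g. 8 rows x 9 cols after resize);
    a real pipeline would downsample the frame first, e.g. with Pillow.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits

def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

def is_duplicate(h1: int, h2: int, max_dist: int = 6) -> bool:
    """Treat images as near-duplicates when their hashes almost agree."""
    return hamming(h1, h2) <= max_dist
```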
The dashboard is designed as a professional DVR/NVR replacement — not a research notebook. Every overlay is toggleable, every threshold is live-tunable, and every alert shows its reasoning.
Skeleton + mannequin rendering for every tracked person. When pose estimation drops a frame, a last-valid-pose snapshot holds for ~1.5 seconds; below that, a generic body glow indicates presence without faking anatomy. Tracks fade along their velocity vector for 2 seconds after loss, instead of popping out abruptly.
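The hold-then-glow and velocity-fade behavior can be sketched as two small pure functions. The constants reuse the numbers above (1.5 s pose hold, 2 s fade); the function names and return conventions are assumptions for the sketch.

```python
POSE_HOLD_S = 1.5   # hold the last valid skeleton this long after a drop
FADE_S = 2.0        # fade a lost track along its velocity for this long

def render_state(last_pose, pose_age_s: float) -> str:
    """What to draw for a tracked person given how stale their pose is."""
    if last_pose is None:
        return "glow"        # presence indicator, no fake anatomy
    if pose_age_s <= POSE_HOLD_S:
        return "skeleton"    # last valid pose, held briefly
    return "glow"

def faded_position(last_xy, velocity, age_s: float):
    """Extrapolate a lost track along its velocity; alpha fades to zero."""
    if age_s >= FADE_S:
        return None          # track fully faded out, stop drawing
    alpha = 1.0 - age_s / FADE_S
    x = last_xy[0] + velocity[0] * age_s
    y = last_xy[1] + velocity[1] * age_s
    return (x, y), alpha
```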
Operators dial confidence sensitivity in real time. Drag up to suppress noisy low-confidence detections; drag down for security-critical contexts. Applies live to the detection panel, overlay strokes, the auto-flagger, and the threat heatmap timeline — no page reload.
The scrub bar renders a time-density heatmap of detected threats across the clip, so operators can scan a 10-minute video at a glance and jump directly to the seconds that matter. Manual flag buttons persist operator annotations alongside model alerts.
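The time-density heatmap reduces to binning alert timestamps across the clip and normalizing to the busiest bin. A minimal sketch, with the function name and bin count as assumptions:

```python
def threat_density(alert_times, clip_len_s, n_bins=60):
    """Bin alert timestamps into a normalized density array for the scrub bar.

    Returns one value in [0, 1] per bin; 1.0 marks the busiest moment.
    """
    bins = [0] * n_bins
    for t in alert_times:
        i = min(int(t / clip_len_s * n_bins), n_bins - 1)
        bins[i] += 1
    peak = max(bins) or 1  # avoid division by zero on quiet clips
    return [b / peak for b in bins]
```

The UI would map each value to a color intensity along the scrub bar, letting operators jump straight to the densest bins.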
Detected weapons are bound to the wrist of the nearest tracked person via pose-based proximity. The overlay draws a line from weapon to holder, so operators don't have to guess who is carrying what in crowded scenes.
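Wrist-to-weapon binding can be sketched as a nearest-wrist search over tracked skeletons. The COCO-17 wrist indices (9 and 10) are standard; the function name, confidence floor, and `max_dist` pixel cutoff are illustrative assumptions.

```python
import math

LEFT_WRIST, RIGHT_WRIST = 9, 10  # COCO 17-keypoint wrist indices

def weapon_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def bind_weapon_to_holder(weapon_box, tracks, max_dist=120.0):
    """Return the track id whose confident wrist lies closest to the weapon.

    `tracks` maps track_id -> 17 (x, y, conf) COCO keypoints.
    Returns None when no wrist falls within `max_dist` pixels.
    """
    cx, cy = weapon_center(weapon_box)
    best_id, best_d = None, max_dist
    for tid, kpts in tracks.items():
        for idx in (LEFT_WRIST, RIGHT_WRIST):
            x, y, conf = kpts[idx]
            if conf < 0.3:  # ignore low-confidence wrist keypoints
                continue
            d = math.hypot(x - cx, y - cy)
            if d < best_d:
                best_id, best_d = tid, d
    return best_id
```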
Per-clip brightness/contrast boost for low-light footage and a customizable dark tint for washed-out daytime clips. Both stackable, both persisted per-operator, both zero performance cost (GPU CSS filters).
Every alert carries its evidence: which frames fired, which persons were involved, which weapon class, which confidence scores, which temporal window. Operators can acknowledge, mark false-positive, or escalate — with the reasoning chain attached for review.
Every decision is traceable to a named sub-model. When HARIS is wrong, we know which stage was wrong — and can fix that stage without retraining the whole system. This is the difference between an engineered system and a demo.
Actions are classified over 3-second skeleton windows. Alerts require persistence — 3 out of 5 recent frames must agree, with at least 10% of the window above the persistence threshold. Single-frame detections never fire alerts.
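The 3-of-5 frame vote can be sketched as a fixed-size deque of per-frame verdicts. This covers only the frame-agreement rule; the 10%-of-window check and cooldown logic are separate gates not shown. Class and method names are illustrative.

```python
from collections import deque

class PersistenceGate:
    """Fire only when enough recent frame-level verdicts agree (3 of 5)."""

    def __init__(self, window: int = 5, required: int = 3):
        self.votes = deque(maxlen=window)  # oldest verdict drops automatically
        self.required = required

    def update(self, frame_is_hostile: bool) -> bool:
        """Record one frame's verdict; return whether the gate fires now."""
        self.votes.append(frame_is_hostile)
        return sum(self.votes) >= self.required
```

Because the gate needs three agreeing frames, a single-frame detection can never fire an alert, exactly as stated above.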
The web dashboard is one of several planned clients. A clean JSON boundary means future mobile and desktop clients can plug in to the same server: phone-as-camera, portable operator UI, on-demand face re-identification scans.
Evaluated on a dual test set: public UCF-Crime clips (shooting + assault categories) plus self-recorded domain-specific footage. Group-aware splits prevent source leakage in the metrics. Source-level deduplication before training.
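A group-aware split holds out entire sources, so clips from one video never land on both sides of the boundary. A minimal stdlib sketch (the function name and 20% test fraction are assumptions; a real pipeline might use scikit-learn's `GroupShuffleSplit`):

```python
import random

def group_split(samples, groups, test_frac=0.2, seed=0):
    """Split by source group so no source appears in both train and test.

    `groups[i]` names the source of `samples[i]` (e.g. a video or dataset id).
    """
    rng = random.Random(seed)
    uniq = sorted(set(groups))
    rng.shuffle(uniq)                          # randomize which sources are held out
    n_test = max(1, int(len(uniq) * test_frac))
    test_groups = set(uniq[:n_test])
    train = [s for s, g in zip(samples, groups) if g not in test_groups]
    test = [s for s, g in zip(samples, groups) if g in test_groups]
    return train, test
```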
Completed milestones and what's shipping before the graduation showcase.
We publish the caps. A system that pretends to have no limits is a system that hides them from its operators — and that's the opposite of what surveillance AI should be.
Five students · College of Computer Science & Information Technology · Imam Abdulrahman Bin Faisal University.
Supervised by faculty at the College of Computer Science & Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Kingdom of Saudi Arabia.