Lesson 8.3: Audio Analytics (Hearing the Threat)
Module: 8 – AI & Advanced Analytics
Prerequisites: Lesson 5.2 (Glass Break Sensors) & Lesson 8.1 (AI Stack)
Estimated Time: 45–60 Minutes
1. Learning Objectives
By the end of this lesson, you will be able to:
- Contrast “Decibel Thresholding” (Dumb) with “Spectral Analysis” (AI) to explain why modern sensors don’t false alarm on slamming doors.
- Explain the physics of Gunshot Detection using TDOA (Time Difference of Arrival).
- Define “Aggression Detection” and explain how it identifies threats without recording or analyzing spoken words (protecting privacy).
- Navigate the legal minefield of Audio Surveillance (Wiretap Laws vs. GDPR).
2. The Evolution: From Volume to Signature
For decades, audio security was unreliable because it relied on volume alone.
- Old Way (Thresholding): “If sound is louder than 80dB, trigger alarm.”
- Result: A book drops, a door slams, or a janitor laughs $\rightarrow$ False Alarm.
- New Way (Spectral Analysis): AI converts sound into a visual picture called a Spectrogram (Frequency vs. Time, with intensity shown as brightness). Instead of reacting to volume alone, it looks for the “Shape” of the sound.
- Gunshot: Near-instant rise time (millisecond spike) followed by a specific decay.
- Scream: High frequency (pitch), sustained duration, and harmonic distortion.
- Glass Break: Low frequency “thud” (impact) + High frequency “shatter.”
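The contrast between the two approaches can be sketched with synthetic signals. This is a minimal illustration, not a trained AI model: the sample rate, the dBFS threshold, the 2 kHz split, and the hand-made `high_band_share` feature are all assumptions chosen for the demo, and the "door slam" and "glass break" are simple generated tones rather than real recordings.

```python
import numpy as np

RATE = 16_000   # samples per second (assumed for this demo)
FRAME = 256     # samples per analysis window (assumed)

def peak_dbfs(signal):
    """Loudest sample, in dB relative to full scale (1.0)."""
    return float(20 * np.log10(np.max(np.abs(signal)) + 1e-12))

def threshold_alarm(signal, limit_dbfs=-6.0):
    """Old way: trigger on volume alone."""
    return bool(peak_dbfs(signal) > limit_dbfs)

def spectrogram(signal, frame=FRAME, hop=FRAME // 2):
    """Magnitude spectrogram: rows are time frames, columns are frequency bins."""
    window = np.hanning(frame)
    starts = range(0, len(signal) - frame + 1, hop)
    frames = np.array([signal[s:s + frame] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def high_band_share(signal, split_hz=2000.0):
    """Fraction of spectral energy above split_hz (a crude 'shape' feature)."""
    power = spectrogram(signal) ** 2
    freqs = np.fft.rfftfreq(FRAME, d=1 / RATE)
    return float(power[:, freqs >= split_hz].sum() / power.sum())

def looks_like_glass(signal, min_high_share=0.3):
    """New way (toy version): require a high-frequency 'shatter' component."""
    return bool(high_band_share(signal) > min_high_share)

def normalise(x, peak=0.9):
    """Scale a signal so both test sounds are equally loud."""
    return peak * x / np.max(np.abs(x))

# Synthetic stand-ins: a low "thud" and a high-frequency "shatter".
t = np.arange(RATE // 2) / RATE
thud = np.sin(2 * np.pi * 100 * t) * np.exp(-20 * t)
shatter = sum(np.sin(2 * np.pi * f * t) for f in (4000, 5500, 7000)) * np.exp(-8 * t)

door_slam = normalise(thud)
glass_break = normalise(0.5 * thud + 0.3 * shatter)

for name, sound in (("door slam", door_slam), ("glass break", glass_break)):
    print(name, "| threshold:", threshold_alarm(sound), "| spectral:", looks_like_glass(sound))
```

Both sounds have the same peak level, so the volume threshold fires on each of them, while the spectral feature only flags the one that contains the high-frequency component. A production sensor would replace the single hand-picked feature with a classifier trained on many labeled spectrograms.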

3. Gunshot Detection Technologies
There are two distinct ways to detect a shooter.
A. Indoor (Acoustic Signature)
- Hardware: A specialized sensor (or an AI camera microphone).
- Logic: It listens for the specific “Bang” of the muzzle blast.
- Challenge: Echoes in hallways. The AI must be trained to ignore reverb.
B. Outdoor (Triangulation / TDOA)
- Hardware: Requires at least 3 Microphones spaced far apart (e.g., on different light poles).
- Logic: Time Difference of Arrival (TDOA).
- Speed of Sound = ~343 meters/second.
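- If Mic A hears the shot at 0.00s, Mic B hears it at 0.05s, and Mic C hears it at 0.08s, the computer converts each delay into a distance difference (0.05s × 343 m/s ≈ 17 m farther from Mic B than from Mic A) and solves for the one location that fits all of them.
- Result: It places a red dot on the map at the estimated GPS coordinates of the shooter.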
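The geometry step can be sketched as a brute-force grid search: for every candidate point on a map grid, predict the arrival-time differences and keep the point that best matches the measured ones. This is a simplified 2D illustration under stated assumptions; the microphone coordinates, the search extent, the 1 m grid step, and the function name `locate_tdoa` are all hypothetical, and commercial systems use more refined solvers and account for noise, wind, and temperature.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second, as in the lesson

def locate_tdoa(mics, arrival_times, extent=200.0, step=1.0):
    """Return the (x, y) grid point whose predicted TDOAs best match the measurements.

    mics:          list of (x, y) microphone positions in metres (assumed known)
    arrival_times: time each microphone heard the shot, in seconds
    """
    mics = np.asarray(mics, dtype=float)
    times = np.asarray(arrival_times, dtype=float)
    measured = times - times[0]                 # delays relative to mic 0

    axis = np.arange(-extent, extent + step, step)
    gx, gy = np.meshgrid(axis, axis)            # candidate shooter positions

    # Distance from every candidate point to every mic: shape (n_mics, ny, nx)
    dist = np.sqrt((gx[None] - mics[:, 0, None, None]) ** 2
                   + (gy[None] - mics[:, 1, None, None]) ** 2)
    predicted = (dist - dist[0]) / SPEED_OF_SOUND

    # Sum of squared mismatches between predicted and measured delays
    error = ((predicted - measured[:, None, None]) ** 2).sum(axis=0)
    iy, ix = np.unravel_index(np.argmin(error), error.shape)
    return float(gx[iy, ix]), float(gy[iy, ix])

# Simulated check: three mics on light poles, a shot fired at a known point.
mics = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
true_source = np.array([30.0, 40.0])
fired_at = 12.5  # unknown to the system; only the differences matter
arrivals = [fired_at + np.linalg.norm(true_source - np.array(m)) / SPEED_OF_SOUND
            for m in mics]

estimate = locate_tdoa(mics, arrivals)
print("estimated shooter position (m):", estimate)
```

Note that the absolute firing time cancels out because only the differences between arrival times are used. The result is in local metres; mapping it to GPS coordinates requires the surveyed position of each microphone.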
4. Aggression Detection (Predicting Violence)
This is popular in Hospitals (ER waiting rooms) and Schools.
- How it works: It detects Stress patterns in the human voice.
- Rising Pitch (Frequency).
- Rising Volume (Amplitude).
- Rapid Cadence (Speed of speech).
- The Key Differentiator: It does NOT use Speech-to-Text. It does not know what you said; it only knows how you said it.
- Benefit: This usually reduces privacy concerns because no intelligible words are analyzed or recorded.
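Two of the stress cues listed above, rising pitch and rising volume, can be measured without any speech recognition. The sketch below is an assumption-laden illustration: it uses a basic autocorrelation pitch estimate, synthetic tones in place of a real voice, hypothetical function names, and it leaves out cadence entirely. A real product would use trained models on real speech.

```python
import numpy as np

RATE = 8_000    # samples per second (assumed)
FRAME = 1024    # samples per analysis frame (assumed)

def frame_rms(frame):
    """Volume of one frame as root-mean-square amplitude."""
    return float(np.sqrt(np.mean(frame ** 2)))

def frame_pitch_hz(frame, rate=RATE, fmin=75.0, fmax=500.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(rate / fmax), int(rate / fmin)   # lag range for fmax..fmin
    lag = lo + int(np.argmax(corr[lo:hi + 1]))
    return rate / lag

def stress_trend(signal, rate=RATE, frame=FRAME):
    """Compare the second half of a clip with the first half.

    Returns (pitch_ratio, volume_ratio); values well above 1.0 mean the
    voice became higher and louder. No words are decoded at any point.
    """
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, frame)]
    pitch = np.array([frame_pitch_hz(f, rate) for f in frames])
    loud = np.array([frame_rms(f) for f in frames])
    half = len(frames) // 2
    return (float(pitch[half:].mean() / pitch[:half].mean()),
            float(loud[half:].mean() / loud[:half].mean()))

# Synthetic stand-in for a voice: calm (160 Hz, quiet) then agitated (320 Hz, loud).
t = np.arange(8 * FRAME) / RATE
calm = 0.2 * np.sin(2 * np.pi * 160 * t)
agitated = 0.6 * np.sin(2 * np.pi * 320 * t)
clip = np.concatenate([calm, agitated])

pitch_ratio, volume_ratio = stress_trend(clip)
print(f"pitch x{pitch_ratio:.2f}, volume x{volume_ratio:.2f}")
```

Only numeric features (pitch in Hz, RMS level) ever leave the function, which is the property the lesson describes: the system knows how something was said, not what was said.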
5. Privacy & The Law (The Integrator’s Minefield)
Warning: Audio laws are stricter than Video laws.
- Video: In public/commercial spaces, you generally have “No Expectation of Privacy.” You can film people.
- Audio:
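- USA: The Federal Wiretap Act sets a “One-Party Consent” baseline (at least one person in the conversation must consent). Some states follow that rule; others are “Two-Party Consent” (also called all-party consent: everyone in the conversation must consent). Recording a conversation without the required consent can be a Felony.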
- EU (GDPR): Extremely strict. Recording audio in a workplace is rarely lawful unless justified by a specific, documented high-security threat.
The Integrator’s Standard Operating Procedure (SOP):
- Default OFF: Always ship cameras with microphones disabled.
- Signage: If audio is active, you must post signs: “Audio and Video Surveillance in Progress.”
- Waiver: Make the client sign a document stating they are responsible for legal compliance, not you.