Deepfake Phishing: How to Train Employees to Recognise AI Voice and Video Impersonation

2026-05-27 12 min read By PhishSkill Team

AI voice cloning and deepfake video have moved from research demos to operational attack tooling. The training fix is behavioural, not visual.

[Image: Finance team verifying a deepfake video meeting request through an out-of-band channel before authorising a payment]

In a case that Hong Kong police disclosed in February 2024, a finance employee at the engineering firm Arup transferred approximately 25 million US dollars to attackers after attending what appeared to be a video call with the company's chief financial officer and several other senior colleagues. Every participant on the call, except the employee, was a deepfake. The audio matched. The video matched. The mannerisms matched. The instructions felt entirely consistent with the relationships and the context the employee already understood. The transfer was authorised, and the money was gone before anyone realised what had happened.

The Arup incident is the highest-value documented deepfake fraud case to date, but it is not unique. The Wall Street Journal reported a 2019 case in which a UK energy company executive transferred 243,000 dollars after a phone call from someone who sounded exactly like the parent company's CEO. The FBI Internet Crime Complaint Center has been tracking the rise of AI-enabled fraud in its annual reports for several years, and the trajectory is unambiguous: voice cloning and deepfake video are no longer research curiosities. They are operational attack tooling, and the awareness programmes most organisations deployed five years ago were not designed for them.

This article covers what deepfake phishing looks like in practice, why traditional red-flag training does not stop it, what behavioural defences do work, and how to think honestly about deepfake phishing simulation software as a category of tools.

The Deepfake Attack Surface in 2026

Two technical shifts have changed what is possible. The first is the dramatic improvement in generative AI voice models. Producing a convincing clone of someone's voice no longer requires hours of source audio or expensive specialised software. Less than a minute of public speaking — a conference talk, an earnings call, a YouTube interview — provides enough material for credible cloning. The second shift is the maturity of real-time deepfake video. Live video calls in which the attacker's face is replaced with a target's face in real time are now within reach of motivated criminal actors, not just nation-state programmes.

The combination changes attacker economics. Spear phishing that required hours of research and personalised writing can now be augmented or replaced by phone calls and video meetings impersonating senior figures the target genuinely knows. The traditional moat — that voices and faces could not be fabricated convincingly — has been drained.

Three populations face elevated risk. Finance and accounts payable teams are the most common direct targets because they have the authority to move money. Executive assistants and chiefs of staff are targeted because they manage the calendars and communications of high-value figures and are routinely asked to act on those figures' behalf. Anyone with privileged access — IT administrators, security operations staff, board members — is targeted because their credentials or decisions can unlock larger compromises downstream.

How Voice Cloning Phishing Works

A voice cloning phishing attack typically proceeds in three stages.

Stage one: source collection. The attacker collects audio of the target voice. Public recordings — podcast appearances, conference talks, video introductions, customer testimonials — are the most common sources. Internal recordings extracted from previous breaches are a secondary source. The minimum viable input has fallen dramatically; some commercial voice cloning services can produce useable output from 15 to 30 seconds of clean audio.

Stage two: cloning and rehearsal. The attacker generates the cloned voice and tests it against the specific phrases the operation will require. Banking fraud calls rehearse phrases like "authorise the transfer," "wire the funds," and "confirm the account details." Executive impersonation rehearses informal greetings, common verbal tics, and the specific tone the target uses with subordinates.

Stage three: the live call. The attacker calls the recipient, typically someone in finance, accounts payable, or executive support, and uses the cloned voice to issue an instruction or request a confirmation. The call is short and high-pressure. Urgency compresses decision time. Authority overrides scrutiny. The target executes the requested action before the verification habits that would have caught a fake email have a chance to engage.

Some variants use the cloned voice as the second step in a multi-channel attack: an email arrives first to establish the pretext, and the call follows minutes later to apply pressure. This is the pattern documented in many of the published 2024 and 2025 voice-cloning fraud cases.

How Deepfake Video Phishing Works

Deepfake video phishing is more operationally complex but produces higher-value outcomes. The pattern observed in incidents like the Arup case typically involves a pre-arranged "meeting" invitation, often through a calendar appointment that looks legitimate. The target joins a video call where multiple deepfaked participants — usually senior figures the target knows or knows of — are present. The conversation reinforces the legitimacy of the pretext: previous interactions are referenced, organisational context is included, the body language and facial expressions are convincing.

The fraud instruction is delivered within the meeting. A transfer, a credential reset, a contract approval, a sensitive data share. The pressure is collective: multiple senior figures appear to be requesting the action together. The target's mental category of "I would never authorise this from a single email" does not engage because what they are seeing is not a single email. It is a video meeting with three or four trusted colleagues all confirming the instruction.

The Arup transfer is the published high-water mark. Industry threat intelligence indicates several other attempts of similar magnitude that were stopped at the verification stage. The pattern is now mature enough that security teams should plan for it, not treat it as a future scenario.

Why Traditional Red Flag Training Does Not Work

Most awareness training built between 2010 and 2022 focused on visual and grammatical red flags. Look at the sender domain. Hover over the link to inspect the URL. Watch for spelling errors. Notice generic greetings. These are useful skills for email phishing, but they barely apply to voice and video deepfake attacks.

There is no domain to inspect when the attack arrives over a phone call. There are no grammatical errors in a deepfake video meeting because the deepfaked figure is reading scripted dialogue or speaking from a fluent voice clone. The visual cues that earlier deepfakes produced — unnatural eye movement, awkward lip sync, audio-video desynchronisation — have largely been engineered out of current-generation tooling. Training employees to look for "deepfake artefacts" is training them for a defence that does not survive the next quality improvement cycle.

The other problem is psychological. The cognitive defences that make email phishing recognisable — "this looks weird, I should check" — are weakened in voice and video because the modalities feel inherently more trustworthy. Hearing a familiar voice activates relational trust circuits that text does not. Seeing a familiar face on a video call activates them further. The result is that the same employee who would scrutinise a suspicious email may comply reflexively with a deepfaked voice call from someone who sounds like their boss.

AI-generated phishing emails created the first significant gap in red-flag training. Deepfake voice and video created the second, and the gap is wider.

The Behavioural Defence: Verification Norms

The durable defence against deepfake phishing is not better detection of the deepfake itself. The deepfake will keep getting better, and content-based detection will keep losing ground. The durable defence is verification norms that operate independently of audio or video authentication.

Three specific habits, deployed consistently, defeat the large majority of deepfake voice and video attacks.

Out-of-band verification for high-stakes requests. Any request involving financial action, credential changes, or sensitive data sharing is verified by contacting the requester through a channel known in advance (a phone number from the corporate directory, an internal chat handle, a verified email address), not by responding on the channel the request arrived on. This single habit defeats most deepfake attacks because the attacker behind the deepfake cannot answer a callback placed to the real person's phone.

Code-word challenges. Executive and finance teams agree on a verification phrase or challenge question that only the real person would know. The phrase is established in advance, not exchanged in the moment, and is rotated periodically. An attacker who has cloned the voice does not know the phrase. The Hollywood version of this idea has been around for decades; the AI era is the moment it became operationally necessary.

Mandatory cooling-off for unscheduled high-value actions. Wire transfers above a threshold, credential resets for privileged accounts, and approvals for off-cycle vendor changes do not happen in real time on a phone call or video meeting. The policy is that they require a documented confirmation step — an email approval logged in a system of record, a second signatory, a 24-hour review window — that cannot be compressed by urgency claims. The deepfake's primary weapon is real-time pressure. Mandatory cooling-off removes that weapon.

These three habits are organisational policy decisions as much as training topics. Awareness training teaches employees to apply them; the underlying policy is what makes the application defensible against pushback in the moment.
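To make the three habits concrete, here is a minimal Python sketch of how a policy gate for a high-stakes request might be expressed. Every name in it (HighStakesRequest, blocking_reasons, the channel labels) is hypothetical, and the 24-hour window simply reuses the example above. It also assumes the organisation requires both a second signatory and a review window, which is one possible reading of the cooling-off policy and not the only one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set

# All names and thresholds here are illustrative assumptions, not a standard.
COOLING_OFF = timedelta(hours=24)
TRUSTED_CHANNELS = {"directory_phone", "internal_chat", "verified_email"}


@dataclass
class HighStakesRequest:
    description: str
    received_at: datetime
    arrival_channel: str                 # e.g. "video_call" or "inbound_phone"
    verified_via: Optional[str] = None   # channel used for the callback, if any
    code_word_confirmed: bool = False    # pre-arranged phrase was matched
    approvers: Set[str] = field(default_factory=set)  # documented sign-offs


def blocking_reasons(req: HighStakesRequest, now: datetime) -> List[str]:
    """Return why the request may not proceed yet; an empty list means it may."""
    reasons = []
    # Habit 1: verify on a channel known in advance, never the arrival channel.
    if req.verified_via not in TRUSTED_CHANNELS or req.verified_via == req.arrival_channel:
        reasons.append("no out-of-band verification")
    # Habit 2: the pre-arranged phrase must have been confirmed.
    if not req.code_word_confirmed:
        reasons.append("code word not confirmed")
    # Habit 3: urgency cannot shorten the review window or skip the second signatory.
    if now - req.received_at < COOLING_OFF:
        reasons.append("cooling-off window still open")
    if len(req.approvers) < 2:
        reasons.append("second signatory missing")
    return reasons
```

Note that nothing in the check asks how convincing the caller looked or sounded. That omission is deliberate: the gate depends only on facts the attacker cannot fake in real time.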

What Deepfake-Aware Awareness Training Should Cover

A 2026-current awareness training programme should explicitly include deepfake content alongside the email phishing curriculum. Five elements are essential.

Pattern exposure. Employees who have seen real deepfake examples — voice clones of public figures, deepfake video clips — recognise the modality faster when they encounter it in an attack. The exposure does not need to be technical; it needs to be experiential.

The "what to do when the voice is right" framework. Employees need explicit guidance for the scenario where every visual and audio cue confirms legitimacy. The default action is verification, not compliance. Repetition matters here; the verification habit has to feel automatic, not optional.

Role-specific scenarios. Finance teams should walk through voice-cloning fraud scenarios. Executive assistants should walk through deepfake video meeting scenarios. IT teams should walk through helpdesk voice-cloning scenarios where the attacker is impersonating an executive demanding an urgent credential reset. Generic training applied uniformly does not prepare these high-risk roles.

The collective-fraud pattern. The Arup case demonstrated that attackers can convincingly stage multi-participant deepfake video meetings. Awareness content should explicitly cover the "everyone in the meeting is fake" scenario, because the cognitive shift from "is this person real" to "is this entire meeting real" is not automatic for most employees.

Organisational policy reinforcement. Training is most effective when it points to specific organisational policies the employee can rely on. "The CFO will never instruct a wire transfer over a video call. If anyone in a video call says otherwise, the request is fraudulent regardless of how it looks." Specific, defensible, repeatable.

For specific guidance on the executive-protection side, CEO fraud and whaling attack prevention covers the wire-transfer scenario in depth. For the regional dimension, the GCC family offices and wealth management guide examines deepfake voice attacks against principal-led advisory environments where verification habits are often the only defence in place.

Deepfake Phishing Simulation Software: What Exists, What Works

A small but growing category of vendor tools now markets itself as "deepfake phishing simulation software" or "tools for simulating deepfake voice phishing." It is worth being honest about what these tools currently are and are not.

The tools exist, and several reputable vendors offer them. The typical pattern is a controlled outbound voice call to an opted-in employee, using a pre-recorded or synthesised voice that impersonates a senior figure, with a scripted social engineering ask. Some platforms add deepfake video as a second module. The output is the same kind of behavioural data that email phishing simulation produces: who complied, who verified, who reported.
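As a rough illustration of that behavioural data, the sketch below summarises simulation results into outcome rates. The outcome labels, the employee identifiers, and the outcome_rates function are assumptions made for this example, not the output format of any particular vendor, and the sketch simplifies by recording one outcome per participant.

```python
from collections import Counter
from typing import Dict

# Hypothetical labels mirroring "who complied, who verified, who reported".
OUTCOMES = ("complied", "verified", "reported")


def outcome_rates(results: Dict[str, str]) -> Dict[str, float]:
    """Map each outcome to its share of participants (0.0 to 1.0)."""
    if not results:
        return {outcome: 0.0 for outcome in OUTCOMES}
    counts = Counter(results.values())
    return {outcome: counts[outcome] / len(results) for outcome in OUTCOMES}


# Made-up sample data: one recorded outcome per opted-in participant.
sample = {
    "emp-01": "complied",
    "emp-02": "verified",
    "emp-03": "verified",
    "emp-04": "reported",
}
print(outcome_rates(sample))
# {'complied': 0.25, 'verified': 0.5, 'reported': 0.25}
```

A real programme would likely track these rates by role and over time, since the high-risk populations are the ones whose verification rate matters most.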

The honest assessment is that deepfake simulation is operationally complex and ethically sensitive in ways that email simulation is not. Deploying it requires careful coordination with HR and legal, because employees may genuinely believe they are receiving a call from a senior leader and may experience the simulation as more distressing than an email simulation. Most organisations should not deploy deepfake simulation as a standalone exercise. The order of operations that produces the best outcome is: deploy strong awareness training on deepfake patterns first, build the verification habits and cooling-off policies, and only then consider live deepfake simulation as a refinement layer for high-value roles.

PhishSkill does not currently run deepfake voice or video simulations. Our live simulation channels are email and WhatsApp, and our awareness training modules cover the deepfake recognition patterns and the verification behaviours that defeat them. Organisations with a near-term requirement for live deepfake simulation should evaluate that capability separately from the rest of their awareness programme rather than expecting a single platform to do everything well.

Deepfake Phishing Awareness for Regulated Industries

Regulated industries face elevated deepfake risk and elevated regulatory exposure when an incident occurs.

Financial services. Wire transfer fraud has been the consistent first-order monetisation path for deepfake attacks. The regulatory framing — FFIEC, FINRA, the UK FCA, CBUAE consumer protection requirements — increasingly references AI-augmented fraud explicitly. Awareness training is part of the documented control set examiners expect to see.

Healthcare. Patient data fraud, prescription fraud, and insurance impersonation are emerging deepfake targets. HIPAA's Security Rule does not explicitly mandate deepfake training, but its security awareness and training requirement for the workforce can reasonably be read to cover it for organisations of any significant size.

Critical infrastructure. The 2019 UK energy company case was an early indicator. CISA guidance on AI-augmented threats has matured significantly since 2023 and now explicitly addresses voice cloning. Operators of critical infrastructure should treat deepfake recognition training as part of their cybersecurity workforce baseline.

Government and defence. Insider threat programmes increasingly include deepfake recognition because the social engineering threat to cleared personnel is among the most rapidly evolving categories.

For organisations in any of these sectors, the training requirement is not optional and will keep tightening. Building deepfake-aware curriculum into your existing awareness programme is the cleanest path; bolting it on later under regulatory pressure is more expensive and less effective.

Building Resilience Without Overreacting

It would be a mistake to read this article and conclude that every voice call and video meeting should now be treated with paranoid suspicion. The defensive goal is not to undermine routine operational communication. The goal is to ensure that the small category of high-stakes actions — fund transfers, credential changes, sensitive data shares, off-cycle approvals — always passes through a verification step that deepfake content cannot bypass.

That is a smaller, more achievable behavioural change than "spot every deepfake." It is also more durable, because it does not rely on the recipient's ability to detect AI-generated content that will keep getting better.
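The scoping argument can be sketched in a few lines of Python. The category names below come from the list of high-stakes actions above; the Category enum and the requires_verification function are hypothetical names used only for illustration.

```python
from enum import Enum


class Category(Enum):
    FUND_TRANSFER = "fund_transfer"
    CREDENTIAL_CHANGE = "credential_change"
    SENSITIVE_DATA_SHARE = "sensitive_data_share"
    OFF_CYCLE_APPROVAL = "off_cycle_approval"
    ROUTINE = "routine"


# Only this small set triggers mandatory verification; everything else flows normally.
HIGH_STAKES = frozenset({
    Category.FUND_TRANSFER,
    Category.CREDENTIAL_CHANGE,
    Category.SENSITIVE_DATA_SHARE,
    Category.OFF_CYCLE_APPROVAL,
})


def requires_verification(category: Category) -> bool:
    """Decide on the type of action alone, never on how genuine the caller seemed."""
    return category in HIGH_STAKES
```

The function takes no input describing the call itself, which is the point: the decision does not get harder as deepfake quality improves.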

The combination of pattern-aware phishing awareness training, role-specific scenarios for the highest-risk populations, and organisational policies that make verification mandatory for high-stakes actions is the defensive posture that holds up against deepfake phishing in 2026 and beyond. The technology will keep evolving. The behavioural defence does not need to.


Related Reading

For the foundation, start with What Is Phishing Awareness Training? and What Is Spear Phishing?.

To understand the AI-augmented attack surface more broadly, AI-Generated Phishing Emails: Why They Are Harder to Detect is the umbrella reference. For the executive-protection angle, CEO Fraud and Whaling Attack Prevention covers the wire-transfer scenarios most exposed to voice cloning.

For voice and SMS channel context, Vishing and Smishing Simulation Training examines the broader mobile-channel awareness picture. For a regional dimension, GCC Family Offices and Wealth Management covers deepfake voice exposure in principal-led advisory environments.

External authority: the ENISA guidance on AI and deepfake threats documents the European threat picture, and the NIST AI Risk Management Framework provides the underlying risk-management methodology.

Ready to build deepfake-aware training into your programme? Start a free 30-day Starter trial and use the awareness training modules to baseline your team's recognition skills before layering simulation.
