BABY TRUMP & FRIENDS

From SNL to Stable Diffusion: A Short History of AI-Powered Political Satire

Genre history · Published May 28, 2026 · 9-minute read

The Baby Trump videos surfacing in our daily feed look new, but they sit at the end of a long, well-documented line of political caricature. The technology has changed several times. The underlying impulse — render the powerful as ridiculous, then watch them try to keep their composure — hasn't changed at all. If you want to understand why Baby Trump landed when it did, it helps to look at where the form came from and what actually shifted to make 2026's version possible.

The Pre-Digital Centuries

Political caricature in the West has a recognizable form by the 1730s, when British print shops started producing single-sheet engravings of George II as a child, an invalid, or a domesticated animal. The fundamental move — exaggerated proportions, a single distinguishing physical feature, the subject placed in a setting that contradicts their official posture — is locked in by the time of James Gillray's late-1700s work on William Pitt and Napoleon. Honoré Daumier inherits the form in 1830s France, gets jailed for it briefly, and ports it into the new medium of mass-circulation newspapers. By the late nineteenth century, Thomas Nast in the United States is using the same toolkit on Tammany Hall, and by the 1930s the print caricature is a daily fixture of every major paper.

What's notable about this two-century run is that none of the technology shifts — engraving to lithography, lithography to halftone printing, halftone to wirephoto syndication — changed the underlying form. The cartoonist drew a recognizable face, distorted a specific feature, and placed the subject in a degrading context. Faster distribution. Same craft.

The Television Decades

The first real form shift comes with television. That Was the Week That Was in 1962 brings caricature into motion, then Spitting Image in 1984 commits to the puppet as a satirical instrument. The puppet does something the cartoon couldn't: it talks, in real time, in the puppet maker's chosen voice, and it appears in the same television frame as the actual politician it's mocking. The viewer's brain processes the puppet and the news anchor as occupying the same medium. The deflation works differently: not "look at this funny drawing" but "look at this person on TV who is also a puppet."

Saturday Night Live's cold opens, running from the mid-1970s onward, are the American answer. Live actors in latex, behaving in character, doing the same deflation move that the cartoons and the puppets did, just with the addition of a comic performer's timing. Tina Fey's 2008 Sarah Palin and Alec Baldwin's Donald Trump, which debuted in 2016, both work primarily because the performers can hold a single physical mannerism long enough that the audience laughs at the gesture itself, independent of the script.

The television era's limit is production cost. A Spitting Image puppet takes weeks to sculpt. An SNL cold open requires a writers' room, a wardrobe department, a soundstage, and a live audience. The form is gated by the budget required to enter it, which means the producers are necessarily large media institutions with editorial standards and legal departments. The satire is therefore curated to the average tolerance of a network audience.

The First Web Wave

The browser breaks the budget gate. JibJab's "This Land" video in 2004, the Bush-Kerry singing parody made by two brothers at their own small studio, demonstrates that you can put a viral political-satire video in front of fifty million people without a network behind you. The animation technique (cut-out heads of real politicians on cartoon bodies) is crude, but the comedy lands because it inherits centuries of caricature shorthand. Viewers don't need to be taught how to read it.

The first web wave runs from JibJab through the YouTube auteur era (2007–2015): the Gregory Brothers' Auto-Tune the News and Songify the News, and Bad Lip Reading. The form is heavily edit-driven: take real political footage and recontextualize it with music, with subtitles, with juxtaposition. The creators are small operations, from a single person to a handful of collaborators, and they hit virality reliably because the source material (cable news clips) is free and inexhaustible.

What this era doesn't have is generated footage. The politician on screen is always real, just reframed, because animating a politician's face on a normal human timeline still requires a costume, a rotoscoper, or a frame-by-frame animator. The web removed the budget gate for distribution; it didn't yet remove it for production.

The Deepfake Interregnum (2017–2022)

The first generative-video approach to political caricature is deepfaking, which becomes practical for hobbyists around 2017. Early-generation deepfakes are uncanny and the public discussion immediately frames them as a misinformation hazard rather than a comedic medium. The few satirical deepfakes that do get made — the Jordan Peele PSA, Ctrl Shift Face's Bill Hader morphs, a handful of YouTube channels — tend to be either earnest demonstrations of the technology or careful, signposted comedy bits in which the satirical intent is unmistakable.

The deepfake era produces three useful things for what's coming. First, a viewer base that's already calibrated to expect AI-generated faces of politicians. Second, a body of public debate that establishes acceptable and unacceptable uses of the form. Third, a generation of creators who learn the production pipeline (face mapping, audio cloning, post-processing) and carry that skill into the next era. What it doesn't do is produce a recognizable comedic genre. The technology is still too expensive in time-per-second to support the rapid iteration that internet comedy requires.

The Diffusion Inflection

The shift that produces the Baby Trump era happens between roughly mid-2024 and early 2026, when generative video models cross several thresholds at once. Image diffusion models had already gone mainstream in 2022 (Stable Diffusion's open release); the video equivalents (Sora, Runway Gen-3, Veo, Kling) cross the "watchable" line in 2024 and the "feature-grade" line in 2025. By the time Baby Trump starts trending in late 2025, a single creator with a consumer GPU and a paid API subscription can generate a thirty-second clip of a stylized character speaking, in sync, with consistent identity across cuts.

The diffusion inflection matters because it removes two production-side gates that the previous eras still had. First, character consistency: previous AI video required careful frame-by-frame curation to keep the same character looking like itself. Diffusion models with reference-image conditioning (and increasingly, dedicated character-locking LoRAs) make the same character reproducible across arbitrary scenes by a single creator. Second, voice: text-to-speech and voice-cloning models in the 2024–2025 wave finally cross the "you don't have to be a voice actor" threshold for short comedic delivery. The Baby Trump voice you hear on the Baby News Network is a cloned-and-tuned voice model, not an impersonator, and the consistency across hundreds of videos is what allows the character to feel like a character rather than a one-off bit.

Why "Baby" Specifically

The choice to render the politicians as toddlers rather than as photorealistic copies is partly a comedic decision and partly a technical one. Photorealistic generation of a recognizable real human is fraught, both legally and under platform policy; YouTube's deceptive-content rules, post-2024, are restrictive about lifelike impersonation of real people. A stylized toddler render is unmistakably a caricature, sits comfortably within satire norms, and clears moderation without much friction.

But the stylized-toddler aesthetic also turns out to be the right comedic register for an era that's exhausted by both performative seriousness and performative transgression. The toddler frame disarms the politics. It says: this isn't an attack, it's a deflation. The same political subject that would feel exhausting in a photorealistic deepfake feels like permission to laugh in the toddler aesthetic. The aesthetic does the work of moral framing that the previous eras' comedians had to do with tone of voice.

Where the Form Is Headed

The next inflection is already visible at the edges of the feed: multi-character sustained narratives, episodic structures, recurring side characters with their own arcs. The Baby News Network already operates closer to an animated sitcom than to a one-off satire channel, and a handful of newer entrants are pushing toward longer formats (eight-to-fifteen-minute "episode" videos rather than the dominant sixty-second cuts). Whether the longer format works is still being tested; our own feed data shows the longer videos lagging the shorter ones, but the experiment is happening.

Whatever the next iteration looks like, it'll be built on the same caricature instinct that has been working since the 1730s: render the powerful as ridiculous, let the audience laugh. The toolchain keeps changing. The job stays the same.