Key Takeaways: Em dashes have become the most-cited signal that a piece of writing came from an LLM, and the most common response (search-and-replace) does more damage than good. The em dash is a real punctuation mark with legitimate uses. The problem with AI-generated em dashes isn’t the punctuation; it’s the underlying sentence structure that depends on them. Fix the sentences, not the dashes.
Spend ten minutes on LinkedIn this week and you’ll see it: someone confidently declaring that any em dash in a post means it was written by ChatGPT. The em dash has gone from forgotten typographic detail to viral AI signal in about eighteen months.
There’s something to it. Modern language models produce em dashes at rates no human writing pool matches. But the response most teams are settling on, stripping every em dash out of AI drafts before publishing, is the wrong fix for the wrong problem. Em dashes aren’t the issue. The sentences underneath them are.
The Em Dash Has a Real Job
Three different horizontal lines exist in serious typography, and they aren’t interchangeable. The hyphen (-) joins compound words. The en dash (–) connects ranges of numbers or dates. The em dash (—) is the longest of the three, and per Merriam-Webster, it does at least four things commas and parentheses can’t quite do as well:
- It sets off extra information when commas would create clutter. Compare “The new policy, which the board approved in March after two months of debate involving most of senior management, takes effect in July” with “The new policy — approved by the board in March after two months of debate — takes effect in July.”
- It marks an interruption or abrupt change of direction in a sentence.
- It introduces an explanatory clause with more force than a colon.
- It precedes attributions in pull quotes ("— Margaret Atwood").
The em dash is not a bug. It’s a tool. Hemingway used them. Emily Dickinson built her entire prosody around them. The Chicago Manual of Style devotes an entire section to them. Banning em dashes is not a return to natural writing; it’s an overcorrection driven by a recent statistical pattern in LLM output.
So Why Does AI Use So Many?
Here’s where the story gets less satisfying than the LinkedIn version.
Sean Goedecke wrote the most thoughtful investigation I’ve seen on this, and his honest conclusion is that nobody is quite sure. The leading theory: large-model training pipelines started leaning more heavily on digitised printed books, and printed English from the late 1800s and early 1900s used em dashes roughly 30% more often than contemporary prose. GPT-3.5 didn’t have the em-dash habit. GPT-4o did. Something changed in the training data composition, and a stylistic tic from a hundred-year-old corpus surfaced as a dominant feature of modern AI prose.
That’s the best theory, and Goedecke himself flags it as speculation. There are alternative explanations (tokenisation effects, RLHF preferences leaking through, training on web prose that itself absorbed AI output) but none of them holds up cleanly. The honest answer is: we know LLMs over-produce em dashes, we have a plausible theory about why, and we don’t have proof.
What’s more useful than the origin story is the consequence: AI-generated em dashes are not a careful writer’s choice. They’re a statistical artefact. That’s why blanket replacement misses the point.
Why Replace-All Is the Wrong Fix
Take an AI draft, run search-and-replace on every em dash, and two things happen.
The few em dashes that were doing real work (setting off a parenthetical, marking a sharp pivot) get flattened into commas that don’t carry the same emphasis. The rest get replaced too, but the underlying sentences are still built around them. The result is prose with the same metronome rhythm and the same hollow structure, just punctuated more conventionally. The reader still feels something is off; they just can’t point at the punctuation any more.
Here’s a typical AI sentence:
“Modern ERP isn’t just about software — it’s about transformation.”
Replace the em dash with a comma:
“Modern ERP isn’t just about software, it’s about transformation.”
Now technically free of em dashes. Still bad writing. The “isn’t just X — it’s Y” structure (a “negative parallelism,” in technical terms) is itself one of the strongest AI signals. The em dash wasn’t the problem. The vacuous claim and the persuasion-attempt structure were.
Fix the sentence:
“Most ERP projects fail at the change-management stage, not the software stage.”
Real claim. Defensible position. No em dash. Nothing to flag.
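Part of what makes the blanket fix so tempting is how trivial it is to implement. A minimal sketch of the replace-all approach this section argues against (the regex and the comma substitution are an assumption about how teams typically do it):

```python
import re

def strip_em_dashes(text: str) -> str:
    """Naive replace-all: swap every em dash for a comma,
    regardless of what the dash was actually doing."""
    # Collapse "—" and any surrounding spaces into ", "
    return re.sub(r"\s*\u2014\s*", ", ", text)

before = "Modern ERP isn\u2019t just about software \u2014 it\u2019s about transformation."
after = strip_em_dashes(before)
print(after)
# prints: Modern ERP isn’t just about software, it’s about transformation.
```

The output is exactly the comma-spliced sentence above: technically dash-free, structurally untouched. The one-liner cannot tell a dash doing real parenthetical work from one propping up a hollow claim, which is the whole argument.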
The Actual Fix: Rewrite, Don’t Replace
Handling em dashes properly is closer to editing than to find-and-replace. The workflow we’ve found useful, applied to any AI draft going out under a human name:
- Read each sentence containing an em dash and ask: what is this dash doing?
- Is it setting off truly parenthetical material that would otherwise create a comma pile-up? Keep it.
- Is it marking an actual interruption or abrupt pivot? Keep it.
- Is it propping up a vague claim or a “not just X but Y” construction? The dash is a symptom. Rewrite the sentence.
- For each kept dash, ask: would a comma, period, or parenthesis serve as well? Often yes. Default to the simpler punctuation. Save em dashes for the rare moment when nothing else carries the right weight.
- For each rewritten sentence, ask: would a real practitioner write this? A useful test: is there a concrete fact, number, or specific observation in the sentence? AI sentences often leave abstractions in place because that’s where models are most fluent. Adding a real example or a measurable outcome usually breaks the pattern.
The goal isn’t zero em dashes. It’s em dashes that earn their place, used at roughly the rate a careful human writer would use them. That’s somewhere around once or twice per 1,000 words for most editorial writing, not once per paragraph.
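The triage steps above can be sketched as a small review helper: surface every dash-bearing sentence for human eyes, flag the ones with structural tells, and report density against the once-or-twice-per-1,000-words baseline. The negative-parallelism pattern and the exact splitting rules are illustrative assumptions, not a complete linter:

```python
import re

EM_DASH = "\u2014"
# Rough signal for the "isn't just X — it's Y" construction
NEG_PARALLEL = re.compile(r"\b(?:not|isn[\u2019']t)\s+just\b", re.IGNORECASE)

def em_dash_report(text: str) -> dict:
    """Surface em-dash sentences for review instead of auto-replacing."""
    words = len(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", text)
    flagged = [s for s in sentences if EM_DASH in s]
    suspect = [s for s in flagged if NEG_PARALLEL.search(s)]
    per_1000 = text.count(EM_DASH) / words * 1000 if words else 0.0
    return {
        "dashes_per_1000_words": round(per_1000, 1),
        "review": flagged,        # every dash sentence gets human judgment
        "rewrite_first": suspect, # structural tells: rewrite, don't repunctuate
    }
```

A draft whose `dashes_per_1000_words` sits near 1–2 and whose `rewrite_first` list is empty is probably fine; anything in `rewrite_first` is a sentence problem, not a punctuation problem.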
What Tools Actually Help
There’s a growing set of small tools and prompts aimed at this problem. The most useful one we’ve found is the open-source humanizer skill, which explicitly treats em-dash overuse as one of fourteen specific signs of AI writing. Its guidance is the same as the workflow above: replace em dashes with commas, periods, or parentheses where appropriate, but treat each one as a rewrite prompt, not a find-and-replace target.
A few practical observations from using it on real drafts:
- It works best as a checklist after a first pass, not as an automatic filter. The skill surfaces patterns; the human still decides which sentences need rewriting.
- Tools that purely strip em dashes produce technically clean but voiceless copy. Tools that flag the underlying patterns (negative parallelism, rule of three, hollow adjectives, copula avoidance) are more valuable.
- No tool catches everything. The single best signal that a draft has been properly humanised is whether it has a recognisable point of view. Punctuation is a proxy. Voice is the real thing.
What About AI Detectors?
The detection side of the same problem deserves a quick word.
Tools like GPTZero and Originality.ai claim very low false-positive rates (under 1.5% in vendor benchmarks). Independent studies tell a different story. A 2025 arXiv evaluation found GPTZero misclassified 16% of human-written essays as AI-generated; a separate analysis on Plagiarism Today logged around 12% in similar conditions. False positives climb above 20% for non-native English writers and for highly structured commercial writing. SEO posts, technical documentation, and most marketing content sit in that high-risk zone because they look optimised, not because they were generated.
The practical consequence: passing an AI detector is the wrong success metric for editorial work. Stripping em dashes and rotating vocabulary to lower a detector score produces text that’s just as soulless as before, except now it also reads slightly oddly. The workflow that holds up under scrutiny is the obvious one. Edit for voice and substance. Treat detector scores as one diagnostic among several. Never let a probability number be the gate.
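The base-rate arithmetic makes the point concrete. Taking the 16% false-positive rate from the arXiv evaluation cited above, an assumed 90% true-positive rate (the studies don’t report one for this setting), and a hypothetical pool where one draft in five is AI-written, the share of flagged drafts that are actually AI is far lower than a confident-looking detector score suggests:

```python
def positive_predictive_value(fpr: float, tpr: float, ai_share: float) -> float:
    """Bayes' rule: P(AI | flagged) from the false-positive rate,
    true-positive rate, and prior share of AI-written drafts."""
    flagged = tpr * ai_share + fpr * (1 - ai_share)
    return tpr * ai_share / flagged

# 16% FPR (arXiv study); 90% TPR and 20% AI share are assumptions
ppv = positive_predictive_value(fpr=0.16, tpr=0.90, ai_share=0.20)
print(f"{ppv:.0%} of flagged drafts are actually AI")
# prints: 58% of flagged drafts are actually AI
```

Under these assumptions, roughly four flagged drafts in ten are human-written. That is why a detector score can inform an editorial review but should never be the gate on its own.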
The Meta-Point
We use AI in our own content workflow at Trobz. We also use it for code, for client research, for first drafts of internal documentation. The em-dash question keeps coming up, and the answer we’ve landed on is the boring one: this isn’t a punctuation problem, it’s an editing problem.
The reason AI drafts feel off when they’re published unedited is the same reason a junior consultant’s first deck feels off: no perspective, no specificity, no judgment. Those are added in editing, not in formatting. The discipline a team builds around an AI tool matters more than the tool itself, and editorial workflow is part of that discipline.
So the policy isn’t “no em dashes.” It’s “no AI prose published without human editing.” If a finished post has em dashes that earn their place, that’s fine. If it has none, that’s also fine. What it shouldn’t have is sentences that depend on the em dash to feel substantial, regardless of whether they were generated by a model or a human.
One self-imposed constraint while writing this article: em dashes only where they were genuinely doing work. Count them above. The number is small enough to make the point and not so small that perfectly good sentences had to be mangled to avoid them.
If your team is publishing content with AI assistance and trying to figure out a sustainable editorial workflow around it, we’d be glad to compare notes. The em dash is one signal among many. The broader question of how to keep AI-assisted writing recognisably yours is the one worth spending time on.