Part 2 of the AI Writing Voice series

From Gut Feel to Measurable Style: A Stylometry Primer for Writers

Quick Takeaways
  • Writing voice breaks down into five measurable dimensions: sentence architecture, lexical fingerprints, rhythm, rhetorical moves, and perspective
  • Running three to five samples through a style analyzer reveals the stable patterns that define our voice, not just one-off quirks
  • Contradictions between dimensions (like dense vocabulary paired with casual contractions) are often what makes a voice distinctive

The Voice Identification Paradox

Most of us can pick our own writing out of a lineup. We glance at a paragraph and just know whether it sounds like us. But when someone asks us to describe that voice, we tend to reach for vague labels. "Conversational but professional." "Casual but smart." "Kind of like a podcast, but written down."

Those descriptions are almost useless as instructions. Try telling an AI to write "casual but smart" and see what comes back. It will sound like everyone and no one.

This is the core problem we explored in Part 1: AI defaults to a generic style because we give it generic instructions. So how do we get specific? That is where stylometry comes in. It is the study of measurable patterns in writing, and it gives us a vocabulary for the things we can feel but cannot quite name.

The Five Dimensions of Writing Style

Authorship attribution research has found that writing style breaks down into measurable dimensions. Five of them matter most for our purposes.

Figure: A sample writing style fingerprint across five measurable dimensions (radar chart; axes: sentence length, vocabulary, readability, rhythm, and voice)

1. Sentence Architecture

The structural skeleton: average sentence length, how much that length varies, and how complex the grammar gets.

Some benchmarks to think about: Hemingway averaged around 14 words per sentence. Malcolm Gladwell sits closer to 22. Faulkner could push past 38. Think of them less as scores and more as fingerprints.

What tends to be more revealing than the average is the variation. A writer who alternates between 6-word punches and 30-word flows has a different rhythm than someone who consistently writes 18-word sentences. Both can be effective. But they are different instruments.
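To make the idea of average and variation concrete, here is a minimal Python sketch. It assumes a naive sentence splitter (a break after a period, exclamation mark, or question mark followed by whitespace), and the function names are our own illustrations, not part of any particular tool.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Return the word count of each sentence in text.

    Naive splitter (an assumption of this sketch): breaks after ., !, or ?
    followed by whitespace, so abbreviations such as "e.g." are miscounted.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def sentence_stats(text: str) -> dict[str, float]:
    """Return the mean and population standard deviation of sentence length."""
    lengths = sentence_lengths(text)
    if not lengths:
        return {"mean": 0.0, "stdev": 0.0}
    return {
        "mean": statistics.mean(lengths),
        "stdev": statistics.pstdev(lengths),
    }


sample = "Both can be effective. But they are different instruments."
print(sentence_stats(sample))
```

The mean captures the skeleton; the standard deviation captures how much a writer alternates between short punches and long flows.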

2. Lexical Fingerprints

This dimension covers word choice at the macro level: vocabulary richness, contraction usage, jargon density, and the ratio of content words (nouns, verbs, adjectives) to function words (the, of, and, but).

A high contraction rate ("we're," "that's," "won't") signals informality. A low one signals either formality or academic convention. Jargon density tells us whether a piece is written for insiders or a general audience. Neither is wrong, but each creates a distinct reading experience.

Lexical density, the percentage of content words in a passage, is a useful proxy for information load. Typical blog prose lands between 40% and 55%. Academic writing often pushes into the 55% to 70% range. Above 70%, the prose starts feeling compressed, like a textbook or a legal document.
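Lexical density can be approximated in a few lines of Python. The sketch below treats any word that is not in a short, hand-picked function-word list as a content word. That list is an assumption for illustration; a real analyzer would more likely use part-of-speech tagging or a much longer list, so the percentages here will not match the ranges above exactly.

```python
import re

# Hypothetical, deliberately small function-word list (an assumption of this
# sketch). Anything not listed here is counted as a content word.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "of", "and", "but", "or", "nor", "so", "if", "than",
    "to", "in", "on", "at", "for", "with", "by", "from", "as", "into",
    "is", "are", "was", "were", "be", "been", "am", "do", "does", "did",
    "has", "have", "had", "will", "would", "can", "could", "may", "might",
    "i", "we", "you", "he", "she", "it", "they", "me", "us", "them",
    "my", "our", "your", "his", "her", "its", "their",
    "this", "that", "these", "those", "not", "no",
})


def lexical_density(text: str) -> float:
    """Return the percentage of words that are not function words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not words:
        return 0.0
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return 100 * len(content_words) / len(words)


print(lexical_density("The cat sat on the mat"))
```

In the sample sentence, three of the six words ("cat," "sat," "mat") are content words, so the sketch reports a density of 50%.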

3. Rhythm and Pacing

If sentence architecture is the skeleton, rhythm is the heartbeat. This dimension captures paragraph length, punctuation patterns, and the visual cadence of the text.

Some writers lean heavily on parenthetical asides. Others prefer semicolons to join related thoughts. Some use colons to set up lists or punchlines. These punctuation habits are surprisingly stable across a writer's work, and they contribute to the feeling of pace.

Short paragraphs (1 to 3 sentences) create a sense of speed. Longer paragraphs (5 or more sentences) slow the reader down and signal that an idea needs room to develop. Most of us have a default range we return to without thinking about it.
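These rhythm signals are simple to count. A rough Python sketch follows, assuming paragraphs are separated by blank lines and that a sentence ends with a period, exclamation mark, or question mark followed by whitespace or the end of the paragraph. The function name and the returned keys are illustrative choices of ours.

```python
import re
import statistics


def rhythm_profile(text: str) -> dict[str, float]:
    """Count paragraphs, sentences per paragraph, and a few punctuation habits."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # Naive sentence count: terminal punctuation followed by whitespace or
    # the end of the paragraph. Sentences ending inside quotes are missed.
    sentence_counts = [
        len(re.findall(r"[.!?]+(?=\s|$)", p)) for p in paragraphs
    ]
    return {
        "paragraphs": len(paragraphs),
        "mean_sentences_per_paragraph": (
            statistics.mean(sentence_counts) if sentence_counts else 0.0
        ),
        "semicolons": text.count(";"),
        "colons": text.count(":"),  # also counts colons in times like 10:30
        "parentheticals": text.count("("),
    }


sample = "One. Two; three.\n\nFour: five (six)."
print(rhythm_profile(sample))
```

Raw counts are only comparable between samples of similar length; dividing each count by the word count would give a rate that travels better.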

4. Rhetorical Moves

This is the strategic layer: how we open sections, transition between ideas, deploy evidence, and land conclusions. It is harder to quantify than sentence length, but it is still measurable.

Do we open with questions? Anecdotes? Bold claims? Do we transition with explicit connectors ("However," "On the other hand") or let juxtaposition do the work? When we cite evidence, do we drop it in as a parenthetical fact or build a narrative around it?

These patterns reveal something about our relationship with the reader. A writer who opens every section with a question is inviting participation. One who opens with a declarative claim is establishing authority. Neither is better; they are just different rhetorical postures.
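Only some of these moves can be detected mechanically. As a partial sketch in Python, the function below checks how many paragraphs open with a question and how many open with an explicit connector. The connector list is an assumption for illustration, and anecdotes or bold claims would need a human reader or a far more capable classifier.

```python
import re

# Hypothetical connector list (an assumption of this sketch); extend as needed.
CONNECTORS = ("however", "on the other hand", "therefore", "in contrast")


def opening_moves(text: str) -> dict[str, int]:
    """Count paragraphs whose first sentence is a question or starts with a connector."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    question_openers = 0
    connector_openers = 0
    for paragraph in paragraphs:
        # First sentence: everything up to the first ., !, or ? plus whitespace.
        first_sentence = re.split(r"(?<=[.!?])\s+", paragraph, maxsplit=1)[0]
        if first_sentence.endswith("?"):
            question_openers += 1
        if first_sentence.lower().startswith(CONNECTORS):
            connector_openers += 1
    return {
        "paragraphs": len(paragraphs),
        "question_openers": question_openers,
        "connector_openers": connector_openers,
    }


sample = (
    "Do we open with questions? Often.\n\n"
    "However, some writers do not.\n\n"
    "Voice is measurable."
)
print(opening_moves(sample))
```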

5. Perspective and Stance

This final dimension captures point of view, certainty, and emotional register. Its main signals are first-person density (how often "I" or "we" appears), hedging language ("seems," "may," "suggests"), and the overall formality gradient.

A writer who hedges frequently ("this might suggest," "one possible interpretation") comes across as cautious and exploratory. One who uses certainty language ("this proves," "without question") projects authority. Most of us sit somewhere along this spectrum, and our position tends to be consistent.
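A rough way to see where we sit on that spectrum is to count marker phrases. The Python sketch below uses short lists drawn from the examples in this section; the lists are assumptions, not a validated lexicon, and plain word matching has obvious blind spots (the month "May" counts as a hedge, and the "i" in "i.e." counts as first person).

```python
import re

# Illustrative marker lists (assumptions of this sketch, not a standard lexicon).
HEDGES = ("seems", "may", "might", "suggests", "possible")
CERTAINTY = ("proves", "shows", "without question")
FIRST_PERSON = ("i", "we")


def count_phrases(text: str, phrases: tuple[str, ...]) -> int:
    """Count whole-word, case-insensitive occurrences of each phrase."""
    lowered = text.lower()
    return sum(
        len(re.findall(rf"\b{re.escape(phrase)}\b", lowered))
        for phrase in phrases
    )


def stance_profile(text: str) -> dict[str, int]:
    """Return raw counts of hedges, certainty markers, and first-person pronouns."""
    return {
        "hedges": count_phrases(text, HEDGES),
        "certainty": count_phrases(text, CERTAINTY),
        "first_person": count_phrases(text, FIRST_PERSON),
        "words": len(text.split()),
    }


sample = "This may work. It seems we might win. This proves it."
print(stance_profile(sample))
```

Dividing each count by the word total turns these into densities that can be compared across samples of different lengths.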

The formality gradient matters too. Contractions, colloquialisms, sentence fragments: these are all signals that a reader processes without consciously noticing, yet they shape the entire experience of the text.

Measuring Our Own Style

So how do we actually get these numbers for our own writing?

The Writing Style Analyzer on this site measures all five dimensions automatically. The process is simple: paste in 500 to 1,500 words of finished prose, run the analysis, and get back a five-dimension profile with specific numbers.

One sample, though, only tells us about one piece. Our writing shifts depending on the topic, the audience, and even the time of day. To find the stable patterns (the ones that really define our voice), it helps to run three to five different pieces through the analyzer. Use samples from different topics if possible, but keep them in the same genre. A blog post and an academic paper will look different by design; comparing across genres just adds noise.

What we are looking for is consistency. If our average sentence length lands between 17 and 21 across four samples, that is a real pattern. If it swings from 12 to 28, the signal is in the swing itself: we might be a high-variation writer, and that is a legitimate stylistic feature.
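The consistency check itself is just a comparison of one metric across samples. Here is a minimal Python sketch; the 6-word threshold for "stable" is a hypothetical cutoff chosen to separate the two examples above, not an established standard.

```python
def summarize_metric(values: list[float], stable_spread: float = 6.0) -> dict:
    """Summarize one metric (e.g. mean sentence length) across several samples.

    stable_spread is a hypothetical threshold: if the gap between the lowest
    and highest sample is at most this large, we call the pattern stable.
    """
    if not values:
        raise ValueError("need at least one sample value")
    low, high = min(values), max(values)
    spread = high - low
    return {
        "low": low,
        "high": high,
        "spread": spread,
        "stable": spread <= stable_spread,
    }


# Average sentence length from four samples, as in the two cases above.
print(summarize_metric([17, 19, 21, 18]))  # narrow range
print(summarize_metric([12, 28, 15, 24]))  # wide swing
```

A result flagged as not stable is not a failure. As noted above, a wide swing can itself be the pattern worth recording.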

Reading the Results

Once we have a profile, how do we interpret it? Here are some practical benchmarks for the key metrics.

Sentence Length

Under 15 words on average: punchy, direct, fast-moving. Think news writing or action-oriented blogs. Between 15 and 22: balanced, the comfort zone for most nonfiction prose. Between 22 and 30: flowing, suited to narrative or analytical writing that develops ideas at length. Above 30: dense, common in academic or literary prose where complexity is the point.

The standard deviation matters as much as the mean. A low standard deviation (say, 4 to 6 words) means predictable rhythm. A high one (10 or more) means deliberate variation: short sentences for emphasis, long ones for development.
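These benchmarks translate directly into a lookup. The Python sketch below makes a few assumptions of its own: an average of exactly 22 or 30 is assigned to the lower band, any standard deviation of 6 or less counts as predictable, and the gap between 6 and 10 gets the label "moderate," which is our own filler rather than a benchmark from this section.

```python
def describe_sentence_length(mean: float, stdev: float) -> tuple[str, str]:
    """Map mean sentence length and its standard deviation to rough labels."""
    if mean < 15:
        pace = "punchy"
    elif mean <= 22:
        pace = "balanced"
    elif mean <= 30:
        pace = "flowing"
    else:
        pace = "dense"

    if stdev <= 6:
        rhythm = "predictable"
    elif stdev >= 10:
        rhythm = "varied"
    else:
        rhythm = "moderate"  # assumed label for the unlisted 6-to-10 range

    return pace, rhythm


print(describe_sentence_length(19, 5))
print(describe_sentence_length(12, 11))
```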

Lexical Density

Between 40% and 55%: typical prose. Readable, with enough function words to provide flow and transition. Between 55% and 70%: information-rich. Every sentence carries more payload. Above 70%: highly compressed. This is where writing starts to feel technical or specialized.

Contraction Rate

High (above 60% of possible contractions used): informal, conversational. Low (below 20%): formal, academic, or deliberately measured. Between 20% and 60%: a mixed register that most professional writers naturally fall into.
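"Possible contractions used" can be estimated by counting contracted forms against their spelled-out equivalents. The Python sketch below assumes a small, hypothetical mapping; a fuller version would cover many more pairs and would have to deal with ambiguous cases such as "it's" standing for "it has."

```python
import re

# Hypothetical, partial mapping of contractions to expanded forms
# (an assumption of this sketch).
CONTRACTIONS = {
    "it's": "it is",
    "that's": "that is",
    "we're": "we are",
    "don't": "do not",
    "doesn't": "does not",
    "isn't": "is not",
    "won't": "will not",
    "can't": "cannot",
}


def _alternation(phrases) -> re.Pattern:
    """Build one whole-word pattern so overlapping phrases are counted once."""
    return re.compile(r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b")


CONTRACTED = _alternation(CONTRACTIONS.keys())
EXPANDED = _alternation(CONTRACTIONS.values())


def contraction_rate(text: str) -> float:
    """Return the percentage of contraction opportunities that were contracted."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    used = len(CONTRACTED.findall(lowered))
    unused = len(EXPANDED.findall(lowered))
    total = used + unused
    return 100 * used / total if total else 0.0


sample = "We're sure it's fine, but it is not done and we do not know."
print(contraction_rate(sample))
```

The sample contains two contractions ("we're," "it's") and two missed opportunities ("it is," "do not"), so the sketch reports 50%, which falls in the mixed register.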

Hedging vs. Certainty

This one has no target range. If our writing consistently uses phrases like "it seems," "one possibility," or "the evidence suggests," that exploratory posture is part of our voice. If we tend toward "this shows," "the data confirm," or "without question," that assertive stance is equally legitimate. The goal is awareness: knowing our default well enough to specify it.

What to Watch For

A few patterns that are easy to miss:

The metrics that feel unremarkable are often the most important ones. If our average sentence length is 19 words across every sample, that consistency is the finding, even though 19 feels like a perfectly ordinary number. It means our voice has a specific structural center.

Outlier samples are worth investigating. If one piece looks nothing like the others, it might be the one we wrote on deadline, or while channeling a different genre. It might not belong in the style specification at all.

Contradictions between dimensions can be the most interesting discoveries. High lexical density combined with a high contraction rate, for example, creates an unusual combination: dense information delivered in a casual register. That tension might be exactly what makes our writing distinctive.

From Numbers to Instructions

Measurable patterns give us a foundation, but numbers alone do not make a style specification. The next step is translating those measurements into instructions an AI can follow consistently.

The Style Specification Guide covers exactly that: how to take a five-dimension profile and turn it into a working document that produces prose in our voice.

Voice is a set of measurable habits. Once we can see them, we can teach them.

Next in the series: Building Your Style Spec: The Document That Makes AI Write Like You