Fleet managers get asked some version of "are our routes getting better?" more often than almost any other operational question. The problem is that most of the numbers people reach for first — stops per route, miles per day, average route duration — don't actually answer it. They describe activity, not efficiency.
We've tracked enough fleet data at Routely to have a clear view on which metrics genuinely reflect route quality and which ones create false confidence or misleading comparisons. Here are the five that matter, with some context on why the obvious alternatives fall short.
1. On-time percentage, segmented by time-window type
On-time delivery percentage is the right starting point for route efficiency measurement, but only if you segment it correctly. A single aggregate number obscures too much. A carrier that delivers 94% of stops on time sounds good until you see that its tight 2-hour-window stops have a 78% on-time rate while its all-day open-window stops have a 99% rate. The problem is concentrated exactly where it costs the most in customer experience and contractual exposure.
Segment on-time percentage by: hard time windows (customer-specified, narrow), soft time windows (expected delivery windows with some flex), and open stops (best-effort, no specified window). Each segment has a different performance baseline and a different consequence for failure. Mixing them into a single percentage tells you almost nothing actionable.
We're not saying a single aggregate metric can never be useful — at the fleet-summary level, trend direction on aggregate on-time rate is a reasonable health signal. But for diagnosing route quality problems and knowing where to focus improvement effort, segmented on-time is the number that matters.
Track it by day of week, by route, and by driver. Routes that consistently underperform on tight-window stops are usually telling you something about stop sequencing relative to time window feasibility — the optimizer put a tight-window stop too late in a sequence that often runs long.
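As a rough illustration of the segmentation described above, here is a minimal Python sketch. The Stop record, its field names, the "hard"/"soft"/"open" labels, and the sample data are all made up for this example, and it assumes an open stop with no window counts as on time once delivered. It is not how Routely computes the metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Stop:
    window_type: str                # "hard", "soft", or "open" (labels assumed here)
    window_end: Optional[datetime]  # None for open stops with no specified window
    delivered_at: datetime


def on_time_by_window_type(stops):
    """Return {window_type: on-time fraction} for a list of Stop records."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for stop in stops:
        totals[stop.window_type] += 1
        # Assumption: a stop without a window is on time whenever it is delivered.
        if stop.window_end is None or stop.delivered_at <= stop.window_end:
            on_time[stop.window_type] += 1
    return {wt: on_time[wt] / totals[wt] for wt in totals}


# Illustrative data only.
stops = [
    Stop("hard", datetime(2024, 5, 6, 12, 0), datetime(2024, 5, 6, 11, 40)),
    Stop("hard", datetime(2024, 5, 6, 14, 0), datetime(2024, 5, 6, 14, 25)),
    Stop("soft", datetime(2024, 5, 6, 17, 0), datetime(2024, 5, 6, 16, 10)),
    Stop("open", None, datetime(2024, 5, 6, 15, 30)),
]
rates = on_time_by_window_type(stops)
print(rates)
```

The same grouping key could be extended to (window type, route) or (window type, driver) to get the per-route and per-driver cuts.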
2. Failed delivery attempt rate, by first-attempt versus repeat
Every carrier tracks failed delivery attempts. Most don't distinguish between first-attempt failures and repeat failures on the same stop. That distinction matters because the root causes are completely different, and conflating them hides where to intervene.
First-attempt failures on residential stops typically indicate a time window problem: the driver arrived when no one was home because the route ran behind schedule, or the scheduled delivery time didn't account for customer availability. These are route planning and re-sequencing problems. If you have a persistent first-attempt failure rate above 8–10% on residential routes, something in your sequence optimization is consistently overrunning time windows.
Repeat failures — stops that fail again after a rescheduled attempt — indicate a different problem: usually bad address data, a customer who has genuinely moved or cancelled, a commercial stop with irregular access hours, or a driver who's marking stops as "attempted" without actually making the attempt. Each of these requires a different response, none of which is a routing algorithm fix.
Tracking first-attempt and repeat failure rates separately lets you direct the right intervention to each. Routing optimization helps with first-attempt failures. Data quality work, driver compliance monitoring, and customer communication help with repeat failures.
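One way to keep the two rates apart is to key every attempt by its attempt number. The sketch below assumes a hypothetical Attempt record where attempt_number is 1 for the first try and 2 or more for rescheduled tries. The field names and sample data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    stop_id: str
    attempt_number: int  # 1 = first attempt, 2+ = rescheduled attempt (assumed convention)
    succeeded: bool


def failure_rates(attempts):
    """Return (first_attempt_failure_rate, repeat_failure_rate).

    Either value is None when there are no attempts in that group.
    """
    first = [a for a in attempts if a.attempt_number == 1]
    repeat = [a for a in attempts if a.attempt_number > 1]

    def rate(group):
        if not group:
            return None
        return sum(not a.succeeded for a in group) / len(group)

    return rate(first), rate(repeat)


# Illustrative data only.
attempts = [
    Attempt("A", 1, True),
    Attempt("B", 1, False),
    Attempt("B", 2, True),
    Attempt("C", 1, False),
    Attempt("C", 2, False),
    Attempt("D", 1, True),
]
first_rate, repeat_rate = failure_rates(attempts)
print(first_rate, repeat_rate)
```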
3. Actual versus planned route duration variance
Route duration variance — the gap between the planned end time and actual end time — is a direct measure of your route model's accuracy. If your routing engine consistently plans 7-hour routes that take 8.5 hours, you have a systematic bias in your dwell time estimates, your drive time estimates, or both.
The useful form of this metric is the distribution of variance, not just the average. A fleet with an average actual-versus-planned gap of 22 minutes might have that gap spread fairly evenly across routes (most within 15–30 minutes of plan) or concentrated in specific route types that run significantly over. The second case reveals a structural estimation problem in how you model certain territory types or stop categories.
High variance on specific routes often traces back to dwell time underestimation for particular stop types. Commercial receiving docks that require the driver to wait for a dock worker, apartment complexes with parking and elevator time, stops that regularly require POD signatures from a specific person — these have materially higher dwell times than a standard residential stop and need to be modeled differently.
When route duration variance comes down, it usually means your optimizer is building routes that are actually completable within the planned window. That has direct downstream effects on on-time percentage and driver overtime cost.
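To see why the distribution matters more than the average, the following sketch groups overruns (actual minus planned minutes) by route type. The route-type labels and numbers are hypothetical, and a production version would likely look at percentiles rather than only mean, median, and max.

```python
from collections import defaultdict
from statistics import mean, median


def overrun_by_route_type(routes):
    """routes: iterable of (route_type, planned_minutes, actual_minutes)."""
    overruns = defaultdict(list)
    for route_type, planned, actual in routes:
        overruns[route_type].append(actual - planned)
    return {
        rt: {"mean": mean(vals), "median": median(vals), "max": max(vals)}
        for rt, vals in overruns.items()
    }


# Illustrative data only.
routes = [
    ("urban_residential", 420, 430),
    ("urban_residential", 400, 420),
    ("urban_residential", 410, 425),
    ("commercial_dock", 420, 510),
    ("commercial_dock", 390, 460),
]
fleet_mean = mean(actual - planned for _, planned, actual in routes)
summary = overrun_by_route_type(routes)
print(fleet_mean)
print(summary)
```

In this made-up sample the fleet-wide mean overrun sits between the two groups and describes neither of them, which is the pattern the per-type breakdown is meant to expose.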
4. Re-sequencing acceptance rate
If your routing product offers live re-sequencing, tracking how often drivers follow the proposed re-sequence is a critical feedback metric that most fleet managers don't look at. Re-sequencing acceptance rate measures what fraction of proposed sequence changes drivers actually execute versus what fraction they ignore and continue with their prior plan.
A high acceptance rate (above 75%) generally means drivers trust the system's suggestions — the re-sequences are making sense to them operationally, the time savings are visible, and the reasons are being communicated clearly enough that drivers aren't second-guessing the change. A low acceptance rate (below 50%) is a warning sign that either the re-sequencing logic is producing suggestions that don't make practical sense to drivers, or the reasoning isn't being surfaced well enough to be persuasive.
Re-sequencing acceptance also interacts with route quality outcomes. When drivers ignore suggested re-sequences during traffic incidents, you often see a cluster of late completions on the routes where the ignored re-sequence would have helped. Correlating acceptance rate with on-time percentage by route is a productive analysis — it separates "the optimizer gave bad advice" from "the optimizer gave good advice that drivers didn't follow."
We track this metric in Routely's dispatcher view, with per-driver and per-route breakdowns. Drivers with consistently low acceptance rates are candidates for the contextual coaching Dana wrote about in our driver onboarding post — often it's a trust gap rather than a workflow problem.
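A per-driver acceptance rate can be sketched as below. This is not Routely's implementation. It assumes each proposed re-sequence has already been labeled as accepted or ignored, which in practice is the hard part, and the driver IDs and data are invented. The 50% cutoff is the warning threshold mentioned above.

```python
from collections import defaultdict


def acceptance_rate_by_driver(proposals):
    """proposals: iterable of (driver_id, accepted) pairs, one per proposed re-sequence."""
    proposed = defaultdict(int)
    accepted = defaultdict(int)
    for driver_id, was_accepted in proposals:
        proposed[driver_id] += 1
        if was_accepted:
            accepted[driver_id] += 1
    return {d: accepted[d] / proposed[d] for d in proposed}


# Illustrative data only.
proposals = [
    ("driver_1", True), ("driver_1", True), ("driver_1", True), ("driver_1", False),
    ("driver_2", False), ("driver_2", False), ("driver_2", True), ("driver_2", False),
]
acceptance = acceptance_rate_by_driver(proposals)
low_acceptance = sorted(d for d, r in acceptance.items() if r < 0.5)
print(acceptance)
print(low_acceptance)
```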
5. Stops per vehicle-operating-hour (not stops per route)
Stops per route is the metric most carriers look at. It's also one of the most misleading, because it conflates two different things: the number of stops planned (a dispatching decision) and how efficiently the route was executed. A 22-stop route and a 14-stop route are not comparable on this metric without knowing territory type, stop type, and operational window length.
Stops per vehicle-operating-hour normalizes for route length and provides a meaningful efficiency comparison across different route types and territories. Vehicle-operating-hour starts when the driver leaves the depot and ends when they return. Stops completed during that window, divided by hours elapsed, gives you a rate that's comparable across routes of different planned lengths.
Typical ranges vary significantly by territory type. Dense urban residential routes with short drive segments and standard residential dwell time might run 5–8 stops per vehicle-operating-hour. Lower-density outer suburban routes with longer inter-stop drive times and a higher proportion of commercial stops might run 3–5. Comparing a Columbus inner-ring route against an outer I-270 beltway route using raw stop count tells you nothing useful; comparing their stops-per-vehicle-operating-hour ratios against their respective territory benchmarks tells you whether each route is performing at expected efficiency for its territory type.
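The calculation itself is simple. This sketch follows the definition above (depot departure to depot return); the function name, parameter names, and sample times are chosen for illustration.

```python
from datetime import datetime


def stops_per_operating_hour(stops_completed, depot_departure, depot_return):
    """Stops completed divided by hours between leaving and returning to the depot."""
    hours = (depot_return - depot_departure).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("depot_return must be after depot_departure")
    return stops_completed / hours


# Illustrative data only: a 22-stop route and a 14-stop route.
route_a = stops_per_operating_hour(
    22, datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 12, 0)
)
route_b = stops_per_operating_hour(
    14, datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 11, 30)
)
print(route_a, route_b)
```

The resulting rates only become meaningful once each is compared against the benchmark for its own territory type rather than against each other.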
What these five metrics have in common
Each of these metrics is designed to separate signal from noise — to tell you something actionable about route quality rather than just describing what happened. On-time percentage segmented by window type tells you where customer commitments are being missed. Failed attempt rate separated by attempt number tells you whether the problem is routing or data. Duration variance tells you whether your model is realistic. Re-sequencing acceptance tells you whether your live optimization layer is actually working. Stops per vehicle-operating-hour gives you a territory-normalized productivity measure.
Taken together, they give a fleet manager a clear picture of where route performance is strong, where it's weak, and — most importantly — what category of problem is causing the weakness. That's the starting point for actually improving routes, rather than watching aggregate numbers move and hoping they mean something.
We've built dashboards around these five in Routely's operations view. The hardest part wasn't the calculation; it was agreeing internally on definitions, particularly around what counts as a "hard" versus "soft" time window and how to classify first versus repeat attempts when a stop has a customer-requested reschedule versus a dispatcher-initiated one. Getting the definitions clean took longer than building the charts. It always does.