Domain-Expert AI for Predictive Maintenance

Predictive maintenance is the application every manufacturer wants and almost none get right. The promise is irresistible: catch a bearing before it seizes, a pump before it cavitates, a motor before it burns out — and convert unplanned downtime into a scheduled, ten-minute swap. The math is compelling, the vendors are eager, and the pilots look great on a slide.

Then the model meets the plant floor, and it quietly fails. Not with a crash, but with a shrug: it raises too many false alarms, misses the one failure that mattered, and can't explain itself to the technician who has to act on it. Within a quarter, the dashboard is ignored and the team is back to running equipment to failure.

The failure is rarely the algorithm. It's the assumption underneath it — that predictive maintenance is a pure data problem, solvable by pointing enough sensors and enough compute at a machine. Four challenges break that assumption, and each one points to the same fix.

1. Failures are rare — so the data is too

The whole value of predictive maintenance lives in the failures, and failures are, by design, rare. A critical asset might fail a handful of times across its entire service life. A purely data-driven model typically needs hundreds of labeled failure examples to learn a signature; the plant can offer three. You cannot build statistical confidence from an event that almost never happens.

Worse, the failures you do have are heterogeneous. A gearbox fails from misalignment, from lubrication breakdown, from contamination, from fatigue — and each leaves a different fingerprint in the vibration spectrum. Lump them together and the model learns nothing; separate them and each class has a sample size of one.
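
To put a number on how little three examples can prove, here is a minimal sketch using the Wilson score interval for a proportion. The scenario is hypothetical: a model that flagged all three of a plant's recorded failures. The function name and the 95% level (z = 1.96) are choices made for this illustration, not part of any particular product.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical case: the model flagged all 3 of the plant's 3 recorded failures.
low, high = wilson_interval(successes=3, trials=3)
print(f"3/3 detected: recall is plausibly anywhere in [{low:.2f}, {high:.2f}]")

# The same perfect score on a hypothetical 300 failures, for contrast.
low_big, high_big = wilson_interval(successes=300, trials=300)
print(f"300/300 detected: [{low_big:.2f}, {high_big:.2f}]")
```

With three failures, a perfect record still leaves the lower bound on recall at roughly 0.44. With three hundred, it sits near 0.99. And that is before the three failures are split across different failure modes.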

2. Black-box scores don't earn trust on the floor

Even a model that works statistically has to survive contact with the people who act on it. A maintenance planner who is told "asset health: 34%, act now" and given no reason will, correctly, distrust it. The first false alarm that pulls a crew off real work confirms the suspicion, and the tool is dead.

Trust on the floor is earned the way it's earned between two engineers: with a reason. Which signal moved, why it matters, what failure mode it points to, and how confident the call is. A score without a rationale is not a recommendation — it's a demand, and demands get ignored.

The bottleneck in predictive maintenance was never the sensor data. It was the expertise required to interpret it.

3. The expertise isn't in the dataset — it's in the people

Ask the reliability engineer who has kept a line running for twenty years how they know a compressor is about to go, and they won't cite a threshold. They'll describe a pattern: a particular change in discharge temperature combined with a shift in current draw under a specific load condition, which they've learned means the valve is leaking. That knowledge is real, precise, and completely absent from the historian.

This is the knowledge that actually solves the problem, and data-driven pipelines throw it away. They treat the sensor stream as the ground truth and the expert as, at best, someone to label examples. It's exactly backwards. The expert holds the causal model; the data only holds the symptoms.
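
As an illustration, the compressor pattern described above could be written down as an explicit rule that returns its reasons along with its verdict. This is only a sketch: the baseline values, thresholds, load band, and every name (Reading, Alert, check_valve_leak) are hypothetical placeholders that a real reliability engineer would supply, and the source pattern does not say which direction each signal moves, so the rule checks the size of the shift only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reading:
    discharge_temp_c: float   # compressor discharge temperature
    motor_current_a: float    # motor current draw
    load_pct: float           # operating load, percent of rated

@dataclass
class Alert:
    failure_mode: str
    evidence: list[str]
    confidence: str

# All numbers below are illustrative placeholders, not real limits.
BASELINE = Reading(discharge_temp_c=95.0, motor_current_a=42.0, load_pct=80.0)
TEMP_SHIFT_C = 8.0
CURRENT_SHIFT_A = 3.0
LOAD_BAND_PCT = (70.0, 90.0)

def check_valve_leak(r: Reading, baseline: Reading = BASELINE) -> Optional[Alert]:
    """Return an Alert only when both signals shift inside the load band."""
    if not (LOAD_BAND_PCT[0] <= r.load_pct <= LOAD_BAND_PCT[1]):
        return None  # the pattern is only meaningful under this load condition
    temp_delta = r.discharge_temp_c - baseline.discharge_temp_c
    current_delta = r.motor_current_a - baseline.motor_current_a
    if abs(temp_delta) < TEMP_SHIFT_C or abs(current_delta) < CURRENT_SHIFT_A:
        return None  # one signal moving alone is not the expert's pattern
    return Alert(
        failure_mode="valve leak",
        evidence=[
            f"discharge temperature moved {temp_delta:+.1f} C from baseline",
            f"current draw moved {current_delta:+.1f} A from baseline",
            f"observed at {r.load_pct:.0f}% load, inside the band where the pattern applies",
        ],
        confidence="expert rule matched; not yet confirmed by a recorded failure",
    )

alert = check_valve_leak(Reading(discharge_temp_c=106.0, motor_current_a=46.5, load_pct=82.0))
if alert:
    print(alert.failure_mode)
    for line in alert.evidence:
        print(" -", line)
```

The point of the structure is that the alert carries the failure mode and the evidence with it, so the planner sees which signals moved and under what condition rather than a bare score.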

4. Models don't transfer — every asset relearns from zero

Train a model on Pump A and it will often fail on Pump B, even if they're the same make and model, because the operating context differs: different duty cycle, different fluid, different ambient conditions, different installation. A purely statistical model has no built-in way to carry what it learned about pumps in general to a new instance. So every asset, every line, every site starts its data collection over, and the failures are still rare.

Scaling predictive maintenance across a fleet this way is a treadmill: the cost of the next deployment never comes down, because there's no shared understanding to reuse.

The fix is the same in all four cases: put knowledge first

Every one of these challenges is a symptom of the same mistake — starting from data instead of from what your best people already know. A knowledge-first approach inverts the order.

You begin by capturing the expert's causal model of the asset: the failure modes, the mechanisms behind them, the signals that indicate each one, and the operating conditions that matter. That domain knowledge does the work the missing data can't. It tells the system what to look for before it has ever seen a failure, so three examples become enough to confirm a signature the expert already described. It gives every alert a reason — the specific failure mode and the evidence for it — so the score arrives with a rationale a planner can act on. It keeps the expert's judgment in the loop as the authority, not the labeler. And because the knowledge is about the mechanism, not one specific pump, it transfers: the understanding built on Pump A carries to Pump B, and the fleet gets cheaper to cover, not more expensive.
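
One way to picture that transfer is to keep the failure-mode knowledge separate from each asset's operating context. The sketch below does this under loud assumptions: the two failure modes, their indicator signals, the baseline numbers, and the 15% tolerance are all invented for illustration, and the "every indicator must shift" logic is a deliberately simple stand-in for real expert reasoning.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureMode:
    name: str
    mechanism: str
    indicators: tuple[str, ...]   # signals the expert says to watch

# Shared knowledge about the asset class. Entries are illustrative placeholders.
PUMP_KNOWLEDGE = (
    FailureMode("cavitation", "vapor bubbles collapse at the impeller",
                ("suction_pressure", "vibration_rms")),
    FailureMode("bearing wear", "lubrication breaks down and friction rises",
                ("bearing_temp", "vibration_rms")),
)

@dataclass
class Asset:
    name: str
    baseline: dict[str, float]    # normal values for THIS installation
    tolerance: float = 0.15       # hypothetical relative deviation that counts as a shift

def explain(asset: Asset, reading: dict[str, float]) -> list[str]:
    """List the failure modes whose indicators have all shifted, with the evidence."""
    findings = []
    for mode in PUMP_KNOWLEDGE:
        shifted = []
        for signal in mode.indicators:
            base = asset.baseline[signal]
            change = (reading[signal] - base) / base
            if abs(change) >= asset.tolerance:
                shifted.append(f"{signal} {change:+.0%}")
        if len(shifted) == len(mode.indicators):
            evidence = ", ".join(shifted)
            findings.append(f"{asset.name}: {mode.name} ({mode.mechanism}); evidence: {evidence}")
    return findings

# Same knowledge, two installations with different normal operating points.
pump_a = Asset("Pump A", {"suction_pressure": 2.0, "vibration_rms": 1.0, "bearing_temp": 60.0})
pump_b = Asset("Pump B", {"suction_pressure": 3.5, "vibration_rms": 2.2, "bearing_temp": 75.0})

reading = {"suction_pressure": 2.0, "vibration_rms": 2.2, "bearing_temp": 75.0}
print(explain(pump_a, reading))
print(explain(pump_b, reading))
```

In this sketch the same reading points to bearing wear on Pump A and to nothing on Pump B, because only the baseline differs. Covering a new pump means supplying its operating context, not relearning what cavitation or bearing wear look like.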

Data still matters — enormously. But it's the confirmation, not the foundation. The foundation is the expertise you already have, made executable. That's the difference between a predictive-maintenance model that lives on a dashboard nobody opens and one that a technician trusts enough to pull a crew for.