Big Data – Big Gain or Big Pain?

Data is King. Fleet managers who leverage their vehicles’ operational data can attain tremendous savings and operational efficiencies. While the term “big data” has become popular in business today, leading and progressive fleet managers have used data to their advantage for decades. However, the volume of data available can quickly become overwhelming, even for small to mid-sized fleets. Some upfront pain will ensure long-term gain; care must be taken to avoid subtle differences in data input and keying errors – inconsistencies that can render portions of valuable data unusable.

At best, such differences can make database queries difficult and extremely time-consuming, often requiring dozens of hours of data cleansing and/or manual re-keying to make the data usable. In this post, I will discuss the use of fleet data to create competitive advantage for your fleet, and ways to wrangle it into submission.

Examples of basic fleet data include miles/kilometers driven, engine and PTO hours, fuel consumed, repair and preventive maintenance costs, accident costs, the book and/or depreciated value of vehicles/equipment, cost of capital, utilization, availability (uptime), engine idling and more. Getting all this data and managing it into a logical and usable format for analysis and decision-making can indeed be challenging.

Vehicle telematics combined with input from a fuel dispensing system, fuel cards and your fleet management software and/or work order system, plus interfaces from an enterprise and/or corporate general ledger system, can make data collection a lot easier. For fleets that outsource to a fleet management/leasing company and employ its fuel cards and fleet maintenance services, most of the basic data – and analysis of it – can be part of the service provider’s offerings. For fleets without such technologies and interfaces, collecting data manually is immensely time-consuming and very error-prone (but not impossible).

Wikipedia describes big data as “a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them”. Data sets grow rapidly – in part because they are increasingly gathered by numerous information-sensing mobile devices (telematics), aerial remote sensing, software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks. The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes of data were generated every day (one exabyte is one quintillion bytes!).

Life Cycle Analysis

Our work at Fleet Challenge regularly involves the analysis of large data sets from our clients’ fleets. Today’s fleet managers know that the best-informed go-forward decisions are made by studying the trends of the past; as the adage goes, “the past predicts the future”. A key use of historical data is life cycle analysis. Savvy fleet managers consult their historical operating data to determine the optimal age for go-forward vehicle replacement based on economic life cycle analysis.
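To make the idea concrete, here is a minimal sketch of one common simplification of economic life cycle analysis: pick the replacement age with the lowest average annual cost of ownership (depreciation plus cumulative operating cost, divided by age). This is only an illustration, not necessarily the method used by Fleet Challenge or taught by any particular expert. The function name and all dollar figures are hypothetical, and the sketch ignores cost of capital and inflation.

```python
def optimal_replacement_age(purchase_price, resale_values, operating_costs):
    """Return (age, average_annual_cost) with the lowest average yearly cost.

    resale_values[i]   -- estimated resale value at the end of year i + 1
    operating_costs[i] -- repair, maintenance and downtime cost during year i + 1
    """
    best_age, best_cost = None, float("inf")
    cumulative_operating = 0.0
    for year, (resale, operating) in enumerate(
        zip(resale_values, operating_costs), start=1
    ):
        cumulative_operating += operating
        depreciation = purchase_price - resale
        average_cost = (depreciation + cumulative_operating) / year
        if average_cost < best_cost:
            best_age, best_cost = year, average_cost
    return best_age, best_cost


# Hypothetical data for one vehicle class (not real fleet figures).
price = 100_000
resale = [75_000, 60_000, 48_000, 38_000, 30_000, 24_000, 19_000, 15_000]
operating = [8_000, 9_000, 11_000, 14_000, 19_000, 23_000, 29_000, 36_000]

age, cost = optimal_replacement_age(price, resale, operating)
print(f"Lowest average annual cost: ${cost:,.0f} at year {age}")
```

With these made-up numbers, depreciation dominates in the early years and rising operating costs dominate later, so the average cost bottoms out in between. Real analyses would feed in your own historical cost data by vehicle class.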

Dr. Andrew K.S. Jardine, PhD, CEng, MIMechE, MIET, PEng, FCAE, FIIE, FISEAM (Hon), is the Founding Director of the Centre for Maintenance Optimization and Reliability Engineering (C-MORE). Dr. Jardine is a globally recognized expert on evidence-based asset management; many fleet managers (myself included) have studied life cycle analysis under Professor Jardine. His work and training had a big influence on my own career as a fleet manager. 

Cost-Benefit Analysis

Data trends should be studied in most facets of fleet management to (for example) determine correct PM intervals, assess the performance of vehicles and engine/transmission/drivetrain combinations, and evaluate the cost-effectiveness of new technologies – the list goes on and is restricted only by one’s imagination and inquisitiveness.

Best-in-class fleets use historical cost and reliability (uptime) data to make decisions about maintenance scheduling along with a myriad of other critical decisions. For example, over-maintaining a fleet is wasteful in terms of resources and labor while under-maintaining vehicles comes with its own set of risks and costs.

Accurately scheduling PM events can have a big payoff. If, for example, each of a fleet’s 100 highway tractors received just one premature PM inspection each year, including oil and filter changes, etc., at a cost of, say, $1,000 per PM event (including labor, materials, downtime and other PM costs), the fleet would be spending $100,000 unnecessarily each year!
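The arithmetic is simple enough to rerun with your own numbers. The short sketch below uses the figures from the example above; the function and variable names are illustrative only.

```python
def annual_premature_pm_cost(units, premature_pms_per_unit, cost_per_event):
    """Return the yearly cost of PM events performed earlier than needed."""
    return units * premature_pms_per_unit * cost_per_event


fleet_size = 100             # highway tractors
premature_pms_per_unit = 1   # unnecessary PM events per tractor per year
cost_per_pm = 1_000          # dollars per PM event, all-in

wasted = annual_premature_pm_cost(fleet_size, premature_pms_per_unit, cost_per_pm)
print(f"${wasted:,} per year")  # $100,000 per year
```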

Extending PM intervals can be a risky business, so the decision to do so must be fully informed and based on data, not guesswork. Clearly there are advantages to tailoring PM schedules based on well-informed decisions, determined by studying data on the potential impacts of increased or decreased levels of preventive maintenance on uptime rates.

PM schedules should always be determined based on – you guessed it – big data.

Lube oil analysis is an example of data informing fleet managers about the frequency of vehicle oil changes and whether extending the interval is possible. Of course, many other factors must be considered in deciding on PM intervals, such as how frequently vehicles need to come back to the shop for air brake inspections and adjustments, as well as other maintenance procedures and mandatory inspections, but these types of decisions should also be based on frequency data. 

Data Conventions

As one of Fleet Challenge’s data analysts, I spend innumerable hours poring through tens of thousands of rows of fleet data searching for trends that will lead to recommendations and provide information to assist and inform fleet managers in their decision-making. A data set for a mid-sized fleet we recently analyzed contained almost 100,000 rows of operating cost data. Each row of data had more than 10 data points, which meant over one million opportunities for error!

Issues most often come from keying errors and inconsistencies in the ways the data fields were originally set up. Some examples our team has encountered include:

  • Numbers entered as: “1, 2, 3, etc.”, then other times entered as text: “one, two, three, etc.”
  • The number “0” entered as the letter “o”
  • Peterbilt input as “Pete” and also as “Peterbilt”
  • Naming inconsistencies such as “Chev”, “Chevy”, “Chevrolet”, “GM”, and “GMC” 
  • Model years entered as, e.g., “03” instead of “2003”
  • Misspelled or colloquially spelled info: “bio-diesel” vs. “biodiesel”, “gas” vs. “gasoline”, “pick-up” vs. “pickup”, etc.
  • Lumping too much key data into single fields, e.g., “2016 Kenworth tandem tractor” vs. individual, searchable fields such as “2016”, “Kenworth”, “tandem axle” and “tractor”
  • The misuse of hyphens – adding hyphens between blocks of characters in license numbers is an example we often observe
  • Date protocol – is it “month/day/year” or “day/month/year”?
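Several of these problems can be repaired automatically once you decide on the rules. The following is a minimal sketch, not a complete cleansing tool. All function names are hypothetical, and the two-digit-year pivot (below 50 means 20xx) and the stated date order are assumptions that you would need to confirm against your own data.

```python
import re
from datetime import datetime

NUMBER_WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
                "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}


def clean_count(raw):
    """Turn '3', 'three', or digits with a letter 'o' typed for zero into an int."""
    text = raw.strip().lower()
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    # Anything still unreadable raises ValueError so it can be reviewed by hand.
    return int(text.replace("o", "0"))


def clean_model_year(raw, pivot=50):
    """Expand two-digit years: below the pivot -> 20xx, otherwise 19xx (assumed rule)."""
    year = int(raw.strip())
    if year < 100:
        year += 2000 if year < pivot else 1900
    return year


def clean_plate(raw):
    """Drop hyphens and spaces so 'ABC-1234' and 'ABC 1234' both become 'ABC1234'."""
    return re.sub(r"[\s-]+", "", raw).upper()


def parse_date(raw, day_first):
    """Parse a slash-separated date using an explicitly stated field order."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(raw.strip(), fmt).date()


print(clean_count("three"), clean_count("1o"))    # 3 10
print(clean_model_year("03"))                     # 2003
print(clean_plate("abc-1234"))                    # ABC1234
print(parse_date("03/04/2017", day_first=True))   # 2017-04-03
```

Note that the date function cannot guess the order for you: “03/04/2017” is valid either way, which is exactly why the protocol has to be agreed on before the data is keyed in.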

Humans can accept variance and assimilate contextual sameness; computers cannot.

– Jim Dyer, Regional Account Manager for Dossier Systems, a leading provider of fleet management software systems since 1979

For example: Unit Category: pickup, pick-up, pick up, PU, P.U. – to a human it’s the same category. But to a computer it’s five different things. Consider what your standard naming conventions should be (are they pickups or pick-ups? Are they flatbed or flat bed? Is the vendor NAPA305, NAPA 305 or NAPA #305?) – decide on one convention for each, and make them all the same. Once the data is imported, you can limit certain users to the existing selections, so they can’t add non-approved categories, vendors, etc. Now when you pull a report on all costs related to ‘pickup’, everything is included. This type of consideration will help your data conversion go much more smoothly and effectively, and you will start receiving benefits and ROI much faster!
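As a rough illustration of the point, the sketch below maps the category spellings mentioned above onto one approved name before totaling costs, and rejects anything that is not on the approved list. The synonym table, function name and work order amounts are all hypothetical; a fleet management system would normally enforce this through its own configuration rather than through a script.

```python
import re

# Hypothetical table: punctuation-free lower-case key -> approved category name.
CATEGORY_SYNONYMS = {
    "pickup": "pickup",
    "pu": "pickup",
    "flatbed": "flatbed",
}


def normalize_category(raw):
    """Map a free-text unit category onto its approved name, or raise ValueError."""
    key = re.sub(r"[^a-z0-9]", "", raw.lower())
    if key not in CATEGORY_SYNONYMS:
        raise ValueError(f"Non-approved unit category: {raw!r}")
    return CATEGORY_SYNONYMS[key]


# Made-up work order rows: (unit category as keyed, cost in dollars).
work_orders = [
    ("pickup", 120.00),
    ("Pick-Up", 80.00),
    ("pick up", 60.00),
    ("P.U.", 45.50),
    ("flat bed", 300.00),
]

totals = {}
for category, cost in work_orders:
    approved = normalize_category(category)
    totals[approved] = totals.get(approved, 0.0) + cost

print(totals)  # {'pickup': 305.5, 'flatbed': 300.0}
```

Without the normalization step, the same report would show four separate “pickup” categories, each with only part of the cost.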

As described by Jim Dyer, there are dozens of such examples that can make working with big data a real pain. What to do? Whether your team is installing a new fleet management software system, or developing a new Excel worksheet for tracking operational data, it is critical to spend time upfront developing a set of data conventions and naming protocols. These should include a set of data entry guidelines for all staff, whether they are entering data on a work order on the shop floor, or data-entry/administrative support personnel. 

Having a data naming convention is important because it is a set of rules that, when applied consistently, yields data elements described in a standardized and logical fashion.

The choice of naming conventions (and the extent to which they are enforced) is often a contentious issue, with partisans holding their viewpoint to be the best and others to be inferior. Moreover, even with known and well-defined naming conventions in place, some organizations may fail to adhere to them, causing inconsistency and confusion. These challenges may be exacerbated if the naming convention rules are internally inconsistent, arbitrary, difficult to remember, or otherwise perceived as more burdensome than beneficial.

In the general area of computer programming, a data naming convention refers to the set of rules followed to choose the sequence of characters which will be used as identifiers in the source code and documentation. Following a data naming convention, rather than letting each programmer choose names arbitrarily, makes the source code easier to read and understand and makes bugs easier to trace.

Some Further Considerations:

  • Use dropdown tables wherever possible, so users need only click on their choices from tables instead of manually keying in information.
  • Embed standard ATA VMRS (Vehicle Maintenance and Repair Standard) codes for use in entering work order data, to facilitate data analysis later
  • Shorter identifiers may be preferred as more expedient, because they are easier to type
  • Extremely short identifiers (such as ‘i’ or ‘j’) are very difficult to uniquely distinguish using automated search and replace tools
  • Longer identifiers may be preferred because short identifiers cannot encode enough information or appear too cryptic
  • Longer identifiers may be disfavored because of visual clutter

If your data is stored in Excel, or if your fleet management system exports to Excel for analysis, here are some tips:

  • Using Excel’s Data Validation feature, you can eliminate inappropriate data. You specify the conditions a value must meet, and Excel rejects values that don’t meet those conditions. 
  • Excel’s AutoComplete feature matches previous entries to the current input. As soon as the input value matches an existing value (in a single-column contiguous range), Excel attempts to complete the current entry. 
  • Validating data input values is one way to limit data entry mistakes, but it isn’t foolproof. For instance, you can assign a rule that limits entries to a decimal value, but the user can still enter the wrong decimal value. When input values belong to a known set, create a drop-down list.
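The same two ideas (a rule each value must meet, and a fixed list of allowed choices) can also be applied to data after it has been exported. Below is a minimal sketch that checks rows from a CSV export. The column names, the approved fuel list and the quantity limit are hypothetical, and this is an after-the-fact audit, not a replacement for Excel’s own Data Validation feature.

```python
import csv
import io

ALLOWED_FUEL_TYPES = {"diesel", "gasoline", "biodiesel"}  # hypothetical approved list
MAX_QUANTITY = 1500.0  # hypothetical upper limit for one fuel transaction


def validate_row(row):
    """Return a list of problems found in one exported row (empty if none)."""
    problems = []
    if row["fuel_type"] not in ALLOWED_FUEL_TYPES:
        problems.append(f"fuel_type not in approved list: {row['fuel_type']!r}")
    try:
        quantity = float(row["quantity"])
    except ValueError:
        problems.append(f"quantity is not a number: {row['quantity']!r}")
    else:
        if not 0 < quantity <= MAX_QUANTITY:
            problems.append(f"quantity out of range: {quantity}")
    return problems


# Made-up export; in practice this would be a file saved from Excel as CSV.
export = io.StringIO(
    "unit,fuel_type,quantity\n"
    "101,diesel,412.5\n"
    "102,bio-diesel,390\n"
    "103,gasoline,6o.2\n"
    "104,diesel,41.25\n"
)

report = {row["unit"]: validate_row(row) for row in csv.DictReader(export)}
for unit, problems in report.items():
    print(unit, problems or "OK")
```

Unit 104 passes every check even if the real quantity was 412.5 and the decimal point was keyed in the wrong place, which is the “not foolproof” caveat in practice.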

As I’ve described, some pain up front can make big data in fleet management much more usable and efficient to work with, translating directly into big gains.

For more information about fleet data management, or if you have comments or suggestions about this post, we invite you to contact Roger Smith at   
