

July 20, 2023

Condition Monitoring on a 400-ton asset using IoT & Advanced Analytics


A journey into the world of condition monitoring on a 400-ton asset, where IoT and advanced analytics converge to revolutionize the way Caterpillar operates.

Predictive and preventative analytics use data and business knowledge to reduce unplanned downtime, and the idea of repair before failure comes to life with the use of IoT and advanced analytics.

Harsheel Shah, IoT Analytics Manager at Caterpillar, has been working across key functional areas to develop and deploy end-to-end, scalable predictive analytics on on-prem and cloud infrastructure. He joined us at IoT World & The AI Summit Austin (now Applied Intelligence Live! Austin) to present what Caterpillar is building in this space to help its customers make informed decisions for fleet uptime.

For similar use cases presented at this year’s edition, check out the Industrial Enterprise Stage.


The Transcript:


Good afternoon, everyone. Thank you for joining us, especially after lunch. I'll try my best to keep this interesting and not let you sleep. Over the last day and a half, we've had really interesting discussions about IoT connectivity, analytics pipelines, and knowing your business before even thinking about AI. So maybe the talk today will try to bring everything together into perspective. I'll take one of the applications of these technologies that we've been talking about for the last two days, and that application is condition monitoring. What is Caterpillar doing in the condition monitoring space? We'll try to answer the question of how Caterpillar is monitoring a 400-ton machine using advanced analytics. Obviously, you need IoT connectivity for the data to flow through; only then can you get to the analytics portion.



So how many of you in the crowd have seen a 797, or even heard about a 797 truck, a Caterpillar truck? Oh, quite a few, apart from the Caterpillar folks. Yes, great. For the others who have not, I do have a video. We will keep coming back to that machine, so keep it in mind so that we can connect it to each of the technologies that we talk about, IoT connectivity and condition monitoring. The video will explain a bit more.



Okay, going a bit deeper into what Caterpillar is. As a company, we are nearly 100 years old. I say nearly because we were founded in 1925, so in about three years we will be hitting a century, for all the cricketers out there. We have about 4 million products out in the field right now, and out of those 4 million, 1.2 million are connected assets. Now, when we say connected, there are different levels of connectivity, and it depends on the business case. So again, this connects to the point of knowing your business and your customers. If the customer just wants to know where their machines are, you really don't need to get time series data off the machines. So there are different levels of connectivity; just keep that in mind. We have about 160 dealers. I think this is the strongest network that we have, and it helps us succeed in the market. Using our dealer network, we are able to reach our customers. And we have 100,000-plus employees.



Mainly, I would like to point out the different product lines that we cater to as of today. The 797 that I mentioned falls under mining equipment. It is one of the largest trucks that Caterpillar produces, and the 400 tons is the load that it carries in one cycle. We will see that in a video soon. But the main point is that there are multiple applications that Caterpillar has to think about when we think in terms of IoT connectivity and analytics. So that's one point to note. As far as locations are concerned, we have about 150 locations across 25 countries.



Now, going back to those 1.2 million connected assets: we didn't connect all the assets in a day. It was an exponential curve, a slow process, and I'll get to that in a moment. Let's look at the industrial evolution that we've gone through in the last 300 years or so. We'll start with the first revolution, say the 1700s and 1800s. The revolution from the 1700s onwards was that we were able to build machines. We were happy about it; we were able to build something. In the second revolution came the division of labor: you can have different departments come together and build assets, so now you can build even more. Right after that came Caterpillar, 1925-ish. The third revolution is the automation part. Now we can build more, but how do you do it? With automation. That's where the third revolution is. And we are currently in the fourth revolution. Along with these revolutions, keep an eye on productivity. Productivity can mean different things for different industries. In the software industry, you would look at your number of customers. For Caterpillar, it is more about productivity and customer success over time.



We are currently in this fourth, digital, connected revolution, and all of the technologies that need to be in service to get through this revolution are what we've been hearing about at this conference: big data, AI, IoT. And this is the thing I was talking about, the 1.2 million connected assets you see right at the top. We did not start too early, but still, it's been about 20 years since we had our first connected asset, where we just came to know where our machine was working using GPS location. Going forward, we started developing the hardware and software that go onto the machines when we produce them. So before we take our machines to the customers, we put those devices on the assets so that we can collect the data as and when the customer starts using them. And going up to the end of the curve, we have also come up with an aftermarket device. So if there are customer-specific requirements for condition monitoring, we have our own devices for that as well. An example would be vibration monitoring. Not every customer needs it; someone who is constructing an airport, say, is not really in need of knowing the vibration of their components. Vibration condition monitoring mainly comes in with the mining equipment, which is very expensive and time-consuming as well. You will see the value case real soon. So let me go through the video that is coming up next; it will give you a good line of sight into what these machines are doing. Again, we go back to the 797 example.


05:37 – (Video played)



So that was the 797. The main reason I had this video in there is for you to give some thought to the environment that these trucks are working in. You saw the deep mine that the truck has to go into. When it is down in the mines, connectivity is an issue, and that's where the advancements in IoT connectivity come into play. Because if we have to give real-time feedback to our customers, we need to get real-time data as well. Only then will we be able to do real-time analytics, cloud or edge, whatever we choose as far as technology is concerned. So this is the truck that we're looking at. Just to give you a size estimate of where we are relative to the truck, that's me waving at you. (So that is me, right near the truck tire, if you see right down there. That's the size. And there is an operator sitting up there trying to hold this machine and travel long distances carrying 400 tons.) What I'm trying to get at is the machine, the application, the size, and the operations that the customers are running. And now we combine the IoT connectivity technologies with the different components that make up this truck: say the truck bed, engine, and transmission; just the tires themselves are one component for us. You cannot ship the whole truck at one time. You have to ship the different parts to the site and build the truck on the site, and then the truck dies there; you cannot get it back either. So all the maintenance has to be done on the site itself. We have 150-plus IoT sensors on it, and most of them are capable of more than one hertz. Again, depending on the business case and what the customer really needs, we try to define what will be needed from a data perspective. (I think we are missing one zero up there, but in terms of weight, you can have 20 different New York taxi cabs, in terms of weight and volume, in the bed.)
If you're ever in Peoria, Illinois, go to the CAT visitor center. They have a 797 truck bed that they have made into an auditorium, to give you an idea of the size of the truck bed itself. It's a really good experience if you'd like to go there.



Coming back to the connectivity: big machines, small machines, everything is connected. Now we want to define the use case for why we need connectivity. I talked about the customer operations. On these mine sites, there are different meetings that happen, there are lunch breaks, and there are breakdowns of different machines as well. The point is, there is so much uncertainty. And if even one truck, that big truck that I'm showing you, goes down, the whole mine site has to go down. The cost of downtime for our customers is, on average, about $150,000 per incident, and around $100,000 per hour in terms of production loss if the mine is not working. So you can see how important it is to have the connectivity pieces, the IoT pieces, and real-time monitoring. If the customer can know that the machine is going to go down, even a week before or a day before, they will take that as an analytics solution. So this defines the why; now I'll go into the how.



Caterpillar has a three-tiered approach to defining a solution for condition monitoring. We start with the 1.2 million connected assets, and we define what type of data we need, again depending on the application. Caterpillar is all about the machines. We sell the machines through our dealers, and we want to know what sort of oil is being poured into the machine and how customers are using it. There are different parameters that we can assess, or KPIs we can define, to see how our machines are being used. Are customers using the machines as we designed and intended them to be used, or are they abusing them? That gives us a good idea of why components are failing in the field. Again, that's the condition monitoring aspect of it.



In the three-tiered approach, the first tier is obviously the connectivity, IoT connectivity altogether. Next is the platform: once you have the data through the connectivity layer, it goes onto the platform. We have an internal team taking care of the software part of it; we call it the platform team. The third tier is the application. Once we develop the technology and have these models ready, somehow we have to get back to our customers, and that is what the application team is doing. These are all dealer-facing applications. The analytics portion sits between the platform and the applications. We develop the models and push them to the application; the application goes out to the dealer; the dealer connects with the customer and says, "Hey, customer, something's going wrong."



Okay. All right. So how do we connect the physical and the digital? Everything that we saw in terms of the application and the three-layered approach, this is how we are operationalizing the whole model. We have a human CMA; CMA stands for Condition Monitoring Advisor. The CMA is ready to get a warning from any system, any model that we create, and is ready to connect with the dealer and then the customer to warn them. We can even do it directly, but in these sorts of applications, where the machines are huge, it is better for Caterpillar to have a person, a human in the loop. For smaller machines, the warnings go directly out to the customers. So how do we connect to the CMA and then go to the dealers? We have these different assets on the IoT platform. Everything is in the cloud, so all the data points go up to the cloud. It's off-board analytics that we're talking about. Cloud computing versus edge computing: this is cloud computing we're talking about. Then comes the condition monitoring portfolio, and this is again defined by your application. This goes back to our point about knowing your business and whether or not AI or analytics is going to help. Depending on what you want to achieve, you define the models that you want to create for your condition monitoring application. We rely heavily on anomaly detection; it's a generic tool that people use to tell when something is off, right off the charts. Besides the anomaly detector, there are different rules-based models. Because we design these assets and produce them internally, we know how they are going to fail. You don't need a hammer for every nail; if you can use a pushpin, you might as well do that.
So if you know the rule by which a machine is going to fail, with 100% certainty, then rather than creating a machine learning model, keep it simple and make sure that the impact is the highest. So we have rules-based models as well. The physics-based models and digital twin models go hand in hand. Again, the machines are designed by us, so we know the physics behind them. We have our own simulations, and we try to build a digital twin so that we can simulate the failures on those assets and then, again, connect to the dealers and customers. This becomes the whole condition monitoring portfolio. Once we have this in place, we use the applications that are created to notify the CMA, and then the CMA connects with the dealer on behalf of the customer. Either a repair recommendation goes back to the dealer, or, if the CMA deems that the model has produced a false positive, that comes back to us and the model gets retrained. So this is the whole connection pipeline, I would say, between the physical world and the digital world. We are from the digital analytics team; we create models using the data that's coming from these machines, and a totally different department designs and produces these products.
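
Editor's sketch: the human-in-the-loop step described above can be pictured as a small routing function. This is only an illustration under stated assumptions, not Caterpillar's system. The names `Alert`, `Verdict`, and `FeedbackLoop` are hypothetical, and the only behavior taken from the talk is that a confirmed alert becomes a repair recommendation for the dealer while a false positive goes back to the analytics team for retraining.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    """Outcome of the Condition Monitoring Advisor's (CMA) review. Hypothetical labels."""
    CONFIRMED = "confirmed"
    FALSE_POSITIVE = "false_positive"


@dataclass
class Alert:
    """A warning raised by any model in the portfolio. Field names are assumptions."""
    asset_id: str
    model: str
    message: str


@dataclass
class FeedbackLoop:
    """Routes each reviewed alert to the dealer or back to the modeling team."""
    dealer_recommendations: list = field(default_factory=list)
    retraining_queue: list = field(default_factory=list)

    def review(self, alert: Alert, verdict: Verdict) -> str:
        if verdict is Verdict.CONFIRMED:
            # Confirmed issue: a repair recommendation goes to the dealer.
            self.dealer_recommendations.append(alert)
            return "sent_to_dealer"
        # False positive: keep the example so the model can be retrained on it.
        self.retraining_queue.append(alert)
        return "queued_for_retraining"


loop = FeedbackLoop()
first = loop.review(Alert("truck-001", "anomaly", "oil temperature drifting"), Verdict.CONFIRMED)
second = loop.review(Alert("truck-002", "rule", "pressure spike"), Verdict.FALSE_POSITIVE)
```

Keeping the false positives in their own queue matters for the trust point made later in the talk: those rejected alerts are what the team uses to retrain the model.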



Okay, this is a very generic slide, so I shall not spend too much time on it. These are the types of failures that we are trying to address. You'd be very well aware of the failure curve, the bathtub curve. Early in a machine's life, if there is a defect in the machine, it's going to fail early. Then those failures go down as the machine lives longer, and then you go into the wear-out part of the curve. Overall, if you think about this failure curve, what we're trying to determine is whether a failure is predictable or totally random, random meaning we have no idea how it is happening. We have a different way of approaching these random failures: trying to understand from a physics perspective what could have gone wrong, and whether there are any other features that would be affecting those random failures. One of the simplest examples is when you are trying to predict a failure in time. There are different ways that you can do it and different parameters that you might have identified. If you want some sort of lead time, that's where anomaly detection comes in. Anomaly detection does give you a week, two weeks, in some cases even a month's worth of lead time, if you have enough samples to train on. Or you can go rules-based as well, which is the easiest approach. If you know that the oil temperature should not be over 100 degrees Celsius, or your engine is going to go down, you might as well put that rule in to monitor the engines in real time.
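
Editor's sketch: the two approaches contrasted above can be shown side by side. The 100 degrees Celsius oil temperature limit is the figure from the talk; everything else is an assumption made for illustration. In particular, the rolling z-score detector is a simple stand-in for anomaly detection in general, not the method Caterpillar uses, and the readings are made up.

```python
from statistics import mean, stdev

OIL_TEMP_LIMIT_C = 100.0  # threshold quoted in the talk


def rule_alert(oil_temp_c: float, limit_c: float = OIL_TEMP_LIMIT_C) -> bool:
    """Rules-based check: fire only when the known limit is exceeded."""
    return oil_temp_c > limit_c


def zscore_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the preceding window.

    Each reading is compared with the mean and sample standard deviation of the
    `window` readings before it. Window size and threshold are illustrative.
    """
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: a z-score is undefined, so skip
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up oil temperature readings in degrees Celsius.
temps = [90.0, 91.0, 90.5, 89.5, 90.0, 90.2, 97.0, 90.1]
anomalies = zscore_anomalies(temps)
rule_hits = [i for i, t in enumerate(temps) if rule_alert(t)]
```

With these readings the jump to 97 is flagged as anomalous even though it never crosses the 100 degree rule, which is the sense in which anomaly detection can buy lead time before a hard limit is reached.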



Okay. All right. So this is one of our mining customers with whom we used the solution that I just shared: condition monitoring, and then connecting back to the customer to let them know that things were going wrong. You can see the difference between without analytics and with analytics. Without it, they wouldn't have planned for that downtime. So we are trying to convert unplanned downtime into planned downtime, if that makes sense.



If it's unplanned, again, the site is down and you're incurring the downtime costs that I mentioned for multiple hours, and given the supply chain issues that we've seen during COVID times, that multiplies. So with analytics, it is always better to have an upfront monitoring system for our customers; that will always keep you safe, and you will see the ROI in the long term. The tagline that we are going for is to have the right parts at the right place at the right time, so that we can connect with our parts team, they connect with the dealers, and the dealers send the parts over to our customers.
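
Editor's sketch: a back-of-the-envelope way to see the value of converting unplanned downtime into planned downtime. It uses only the roughly $100,000 per hour production loss quoted earlier in the talk. The outage durations are hypothetical, chosen purely for illustration, and the function names are assumptions.

```python
HOURLY_PRODUCTION_LOSS_USD = 100_000  # approximate per-hour figure quoted in the talk


def production_loss(hours_down: float, hourly_loss: float = HOURLY_PRODUCTION_LOSS_USD) -> float:
    """Production loss for an outage of the given length."""
    if hours_down < 0:
        raise ValueError("hours_down must be non-negative")
    return hours_down * hourly_loss


def avoided_loss(unplanned_hours: float, planned_hours: float,
                 hourly_loss: float = HOURLY_PRODUCTION_LOSS_USD) -> float:
    """Loss avoided when an unplanned outage is replaced by a shorter planned one."""
    return production_loss(unplanned_hours, hourly_loss) - production_loss(planned_hours, hourly_loss)


# Hypothetical durations: 12 hours waiting on parts versus a 4-hour planned repair.
saving = avoided_loss(unplanned_hours=12, planned_hours=4)
```

The durations drive the result, so the number is only as good as the estimate of how much shorter a planned repair is when the right parts are already on site.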



Okay, with that, I'll leave you with the summary slide and three takeaways, again going back to the technologies that we talked about and the different sessions that we've been through. The first is solving the right problem the right way. Understand your own products and what you want to achieve. There is technology out there, but if you go the other way around, where you have the technology and then try to find the problem, you will never succeed. We've gone through the same thing. So now we try to find the problem first and then decide what sort of technology we need. Connectivity is key, and understanding the customer pain point is key as well, as in the example of a customer who needs to know where their machines are versus a customer who needs to know when a machine is going to go down.



The second one is to develop capabilities. I think once you do this right, you will be able to understand whether you need edge computing, cloud computing, or in-house on-prem servers, depending on the solution and the budget that you have. You will be able to go through this process; I think it was covered in one of the talks as well. It's the five-step process of analytics: you ideate, get the data, test and iterate, deploy, and then scale.



The third one is customer value. I think customer value is the output of those two. Once you are able to address the customer pain points, then obviously you will be increasing uptime, and you will be building trust with the end customer. I think that is one of the main keywords that has been used over the last two days as well: building trust is key. If the model is generating a lot of false positives, then even if you have the best solution out there, the best application, the best-looking portal, no one is going to look at it. We've been there, done that. So now we are taking a better approach of understanding what the customer needs and building that relationship with the customer, so that we can help our dealers and the dealers can help our customers. Okay, I think that's my last slide. Five minutes for Q&A. Anybody got any questions? We've got a few minutes here. I've got a microphone to give you. All right.





So you mentioned the size of those trucks and where they're located. I live in Utah, near the Kennecott copper mine in Copperton, Utah, so I know it's not in a metro area. How do you ensure that you have connectivity when you're not in a densely populated area?



Good question. We came across that issue as well. When we are not in cities, or not even cities but well-connected areas, there are two solutions that you can go to. One is cellular connectivity. If not that, then sites would have, or maybe Caterpillar would have in partnership with the dealers, their own radio networks. So we try to make sure that the site is connected, depending on its location, by either radio or cellular, again going with the technology that makes sense from a business perspective. But we've had multiple situations, going back to the mine example that I showed, where when the truck goes down into the pit, even radio sometimes doesn't work. So we are still figuring out how to get that data in real time so that we can go back with our analytics models going forward. Good question, though.



Can you hear me? Yep. So how difficult is it to do root cause analysis in the UK? Is it something trivial, or is it a challenge?



Sorry, could you say that again?



How difficult is it to do root cause analysis in the UK? Is it straightforward? Do you understand from the anomaly what the source of the failure is, or does it not reveal the root cause?



You mean root cause analysis? Yeah, thank you. With anomaly detection, yes, root cause analysis is tricky. But again, we are trying to develop post-processing algorithms. After you detect an anomaly, there are, say, five ways that this problem could occur, so you need some sort of post-processing to give the right root cause. Obviously, it doesn't come as a byproduct of the anomaly detection algorithm; there is some sort of analysis that needs to be done, and that, again, comes from product knowledge. We know the folks who are designing this product, so we tell them, "Hey, this is the anomaly," and then there are five different possible reasons. We are also trying to use AI on the back end: if we have years and years' worth of data where people have solved similar issues with some documented steps, AI can provide the solution. But for that, we need more and more data. Until that time, we are depending on the SMEs.
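
Editor's sketch: one simple way to picture the post-processing step described above is to compare the signals an anomaly touched against cause signatures supplied by subject matter experts, then rank the candidate causes. This is an assumption made for illustration, not the algorithm used by the speaker's team. The cause names, signal names, and the choice of Jaccard similarity are all hypothetical.

```python
def rank_root_causes(anomalous_signals, cause_signatures):
    """Rank candidate causes by overlap with the anomalous signals.

    `cause_signatures` maps a cause name to the signals an expert expects it to
    disturb. The score is Jaccard similarity: shared signals divided by all
    signals in either set. Causes with no overlap are dropped.
    """
    observed = set(anomalous_signals)
    scored = []
    for cause, expected_signals in cause_signatures.items():
        expected = set(expected_signals)
        union = observed | expected
        if not union:
            continue
        score = len(observed & expected) / len(union)
        if score > 0:
            scored.append((cause, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical expert-provided signatures.
signatures = {
    "coolant_leak": ["coolant_level", "engine_temp"],
    "oil_pump_wear": ["oil_pressure", "oil_temp"],
    "sensor_fault": ["oil_temp"],
}
ranking = rank_root_causes(["oil_temp", "oil_pressure"], signatures)
```

The ranked list is meant as a starting point for the experts, in line with the talk's point that root cause still depends on product knowledge.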



I'd love to know what challenges you faced getting to this solution and what challenges you see ahead of you in the future.



Yeah, one of the main challenges is data collection, the very first step. We've had a hard time getting both predictor data and ground truth data. When we talk about modeling, since I'm in the analytics world, I would say it's a loop problem. We try to develop our machines so that they don't fail, and if they don't fail, you don't get ground truth data points. So you have to come up with synthetic ways of making your field machines fail over time. So ground truth data has been one of the challenges. In the future, I think connectivity is surely one of the points that we are worried about. Again, for the mines that we are talking about, we have more and more customers asking for different solutions for their own mine sites. There are different types of mines and different levels of mines, I would say. So mining is one of our main concerns as of now. Okay, thank you so much for... Yeah, go ahead.



Quick question. Do you also use the data as feedback in your design process to increase lifetime, that is, as feedback into the design process of the Caterpillar trucks?



Yeah, we haven't done that yet, but it is in the works. While we are designing most of our components, as of now we haven't taken that component quality data into consideration. There was a really good talk by Mr. Betts from the Daimler company this morning. That is something we're trying to look into as well. We've just started this Cat Digital analytics portion of it. But if we try to understand the quality of our own products, I think it can resolve some of the issues up front, before we see those issues in the field. So, good point, yeah.

