Edge Computing and AI for Operational Efficiency and Production
Abhinav Kohar, the Head of AI Hub - Americas for SLB (previously Schlumberger), discusses the use of AI and IoT for operational efficiency and production in satellite field wells.
The challenge: to use AI and IoT data to monitor satellite field wells for operational efficiency and production.
The solution: A platform that runs a machine learning algorithm at the edge to classify Dynacards and enable smart alerts that proactively notify users when anomalies are found in the monitored data. It reduced anomaly downtime by up to 70% for the connected wells.
Sharing a real-world deployment of an end-to-end machine learning pipeline, this session explores the challenges faced in digitizing remote locations and the need for real-time monitoring to prevent downtime and production loss.
Watch it on-demand now.
First, Schlumberger has rebranded itself as SLB: driving energy innovation for a balanced planet. So you don't have to pronounce the complicated Schlumberger anymore. It's very simple: SLB. Alright, and this is the agenda for today. First, we'll talk about the challenge of using AI and IoT to monitor satellite field wells for operational efficiency and production. Then we'll talk about a platform solution which runs a machine learning algorithm at the edge to classify Dynacards and enable smart alerts that proactively notify users if anomalies are found when monitoring the data. And finally, we'll talk about the results, one of them being anomaly downtime reduced by up to 70% for the connected wells.
Most of the satellite fields are in remote locations, and they are not digitized. They lack the infrastructure for collecting data and for artificial lift optimization. And that is where advanced analytics comes into the picture. Manual data collection on these fields leads to two kinds of problems. The first is insufficient and discrete data for any post-failure analysis. The second is minimal data to investigate pre-failure events. We need real-time monitoring to prevent well downtime and production loss.
So what does a satellite well look like? It has rod pumps, which are the most common form of artificial lift for oil wells. Today, these systems are used to lift formation fluids from more than 600,000 wells worldwide. A rod pump system consists of a prime mover, a surface pump, a sucker rod string and a downhole pump. The overwhelming majority of the surface units are beam pumps. You can see a sucker rod pump right here. And what is a Dynacard? A Dynacard is simply a plot of rod load against displacement over a pump cycle for a sucker rod pump. It is critical for analyzing the health of a downhole pump: the shape of the card tells us the health of the pump.
And a proper and detailed analysis of the Dynacard can help in reducing pumping cost and equipment failure, selecting appropriate pumping equipment and increasing production. If you look here, there are some sample Dynacard shapes. This is the normal pump operating condition, the ideal card. This is a slanted, normal condition. This is fluid pound, gas interference. And finally, a worn-out pump. These are just some sample shapes; there are more than these.
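Since a Dynacard is a closed loop of rod load against displacement, the card area that comes up later in the talk can be sketched with the shoelace formula. This is a minimal illustration and not SLB's implementation: the function name `card_area`, the units and the sample points are all assumptions made for the example.

```python
from typing import Sequence


def card_area(displacement: Sequence[float], load: Sequence[float]) -> float:
    """Area enclosed by a closed load-vs-displacement loop (shoelace formula).

    Points are assumed to be ordered around the loop for one pump cycle.
    """
    if len(displacement) != len(load) or len(displacement) < 3:
        raise ValueError("need at least 3 matching (displacement, load) points")
    n = len(displacement)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around so the last point connects to the first
        twice_area += displacement[i] * load[j] - displacement[j] * load[i]
    return abs(twice_area) / 2.0


# Hypothetical idealized card: 100 units of stroke by a 5000-unit load swing.
ideal = card_area([0, 100, 100, 0], [2000, 2000, 7000, 7000])
print(ideal)
```

A shrinking area from one cycle to the next is one simple signal that the card shape, and so the pump condition, is changing.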
In the past, we were periodic and reactive in doing well interventions. For some satellite fields, it could take months to get permit approvals. But today we are continuous and predictive, monitoring all these wells in real time and diagnosing them. This is what the workflow looked like before the Internet of Things and artificial intelligence. First, we would mobilize a team for the site. Second, they would get permit approvals, which can take a long time. Then they would manually go to the remote location and record the Dynacards, come back to the office, and share them over email with the team, who would use software to visualize these Dynacards. Finally, the cards would be sent to the subject matter experts, who would tell what is wrong at the site and send this information to the maintenance team. So you can see there is a long cycle here, and the data available is discrete. Sometimes it is very hard to predict failures before they happen.
So what do we need? We need a new data flow architecture to avoid all these problems. The satellite fields do not have any existing infrastructure for getting the data. So one of the requirements is that digitizing these fields should be economical. Second, we should be able to send this data to the cloud, which will enable machine learning and AI on it. So this is the satellite field, and we are able to collect data from this field using the controllers. This data is sent to the rod lift optimizer from Agora, which sends it to the cloud through a cellular network. This is just an IoT gateway, and Agora is an SLB subsidiary company. So you can see that with a very economical setup, we are able to send this data to the cloud, which powers our platform.
The first thing in this platform is the data visualization layer: we can visualize the Dynacards in real time and interpret them. Here you can see a time lapse of Dynacards plotted over one another, which tells me how the pump health is evolving over time. Second, we can visualize the field status: we can check which wells are over-pumping, under-pumping or optimally pumping, and adjust our strategy accordingly. We can also visualize a lot of sucker rod pump parameters, such as the motor current, the strokes per minute, the Dynacard area and the pump fillage, which also tell you about your pump health. And finally, since we have so many Dynacards being generated every single day, thousands of them, it is no longer possible to rely on our subject matter experts to tell us what each one means or what the situation at the pump currently is. So we need a machine learning algorithm which can tell us what kind of failure is happening at the pump and how to diagnose it. Here we can take a look at sample classes: normal pump operating conditions, high fluid level, very high fluid level and gas interference. On the right-hand side, you can see a sample data distribution for these classes for a particular well. As you can see, this is an imbalanced class problem for machine learning to handle.
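The talk does not say how the class imbalance was handled. One common option, shown here purely as an assumption, is to weight each class inversely to its frequency so that rare failure classes count more during training. The label counts below are made up for illustration and are not the distribution shown on the slide.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label distribution for one well (not taken from the talk).
labels = (
    ["normal"] * 80
    + ["high_fluid_level"] * 12
    + ["very_high_fluid_level"] * 5
    + ["gas_interference"] * 3
)
weights = balanced_class_weights(labels)
for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls}: {w:.3f}")
```

With these weights every class contributes the same total weight to the loss, so the majority "normal" class no longer dominates training.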
So what we did was send all these Dynacards to our subject matter experts, and this is also part of the platform. They tagged the Dynacards with different classes: normal operation, high fluid level and all the others. We fed this through a neural network, and we were able to predict with high accuracy, 80 to 85% for different wells, what kind of failure is going to happen at a particular pump. This was used to send smart alarms to the maintenance team. It was also deployed at the edge so that we can do the predictions in real time. That way we don't need to send all that data to the cloud, which requires a lot of bandwidth, so we save some bandwidth.
And finally, we deployed this using an end-to-end machine learning pipeline which consists of six stages. The first one is the orchestrated experiment. The second is the continuous integration of building, testing and deploying the packages. The third is continuous delivery. The fourth is the automatic triggering of the pipeline. The fifth is model delivery, or model serving. And then we have performance monitoring, and based on this we can trigger the pipeline again. As you can see here, the upper part is for the experimentation or development phase, and this one is for the staging and production phase.
We also provide an ability for our subject matter experts to fine-tune the model. We are predicting the pump health thousands of times a day, and if the subject matter experts feel that something is not correct, or something is changing at the pump, they can change the tag. This will be fed back into our machine learning pipeline for retraining, and the model will be updated in an online learning fashion. As you might know, model health keeps decaying with time: no model is perfect, and the performance will degrade. So we have automatic triggers to retrain the model based on two things. One is the F1 score: if the F1 score goes below, say, a threshold, we retrigger the training. The second is when data drift happens. Data drift is a covariate shift in your data. Based on a combination of these two factors, we retrigger training, and this keeps your models in evergreen health.
Now it is time to discuss the results. We'll discuss downtime reduction, the smart alarms and sucker rod pump optimization based on this platform for the satellite fields.
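The two retraining triggers described above, an F1 threshold and data drift, could be sketched as follows. Everything specific here is an assumption for illustration: the talk does not say how drift is measured or how the two factors are combined, so this sketch uses a two-sample Kolmogorov-Smirnov statistic on one input feature, combines the triggers with a simple OR, and picks hypothetical thresholds.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    max_gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def should_retrain(f1_score, reference, current,
                   f1_threshold=0.80, drift_threshold=0.20):
    """Retrain if F1 falls below a threshold OR the feature distribution drifts.

    Both thresholds and the OR combination are hypothetical choices.
    """
    drifted = ks_statistic(reference, current) > drift_threshold
    return f1_score < f1_threshold or drifted


# Hypothetical card-area samples from training time versus recent production.
training_areas = [100, 102, 98, 101, 99, 100, 103, 97]
recent_areas = [80, 82, 79, 85, 81, 78, 83, 80]
print(should_retrain(0.86, training_areas, recent_areas))
```

In this example the F1 score is still acceptable, but the recent card areas no longer overlap with the training distribution, so the drift check alone triggers retraining.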
So, downtime reduction. In the first graph, you can see the Dynacard area on this axis and strokes per minute on this axis, over time. Here, when the failure event happens, you can see the Dynacard area drops, the strokes per minute drop, and it takes the team around 60 hours to bring the pump back online. But after the deployment of this new infrastructure, we are able to generate an algorithm-based alert before this event happens, and the team is able to bring the pump back online in less than six hours. So you can see there is an order of magnitude reduction in the downtime, and this is very useful for all our wells.
Secondly, we are able to send these smart alarms to the maintenance team before a catastrophic failure happens, and they are able to take preemptive actions to normalize the pump conditions. The smart alarm algorithm is based on the moving average of the Dynacard area, and this is the overall algorithm.
Next, we go to sucker rod pump optimization based on the platform. On the left-hand side, you can see a low pump fillage based on the Dynacard area, and the well was ramped down in order to prevent failure. After some time, we were able to ramp up the well, and we can see normal operating conditions using the Dynacards. If you kept operating the pump in the off condition, it would lead to catastrophic failure. So we are able to reduce downtime here as well. On the right-hand side, you can see another optimization. Here we have high gas interference and high friction in the pump, which sometimes happens due to debris or residue in the crude oil. The subject matter expert was able to see this on the platform and recommended a hot water circulation through the pump. After that was done, we were able to improve fillage and reduce the friction, and this again prevented a catastrophic failure at the pump.
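The talk says only that the smart alarm is based on a moving average of the Dynacard area. A minimal sketch of that idea follows. The window length, the use of the first full window as a healthy baseline and the 20% drop threshold are all assumptions, not details of the deployed algorithm.

```python
from collections import deque


def moving_average_alarms(card_areas, window=3, drop_fraction=0.2):
    """Return indices where the moving average of card area falls more than
    `drop_fraction` below the baseline (the first full window's average)."""
    recent = deque(maxlen=window)
    baseline = None
    alarms = []
    for i, area in enumerate(card_areas):
        recent.append(area)
        if len(recent) < window:
            continue  # not enough history yet
        avg = sum(recent) / window
        if baseline is None:
            baseline = avg  # assumption: the first full window is healthy
            continue
        if avg < (1.0 - drop_fraction) * baseline:
            alarms.append(i)
    return alarms


# Hypothetical card-area series that degrades gradually before a failure.
areas = [100, 100, 100, 98, 95, 85, 70, 60]
print(moving_average_alarms(areas))
```

Averaging over a window keeps a single noisy card from raising an alarm, at the cost of reacting a few cycles later than a per-card check would.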
So you can see that this economical infrastructure, using an edge IoT gateway, and this platform allow us to digitize our satellite fields. We are able to prevent failures and increase efficiency. So that is my time, and now I will open up for some questions.
Question & Answer Session
If you would like to ask a question, can I just ask you to raise your hand so I can bring the microphone to you?
How many data points are required for retraining?
That's a very good question. For retraining, you need to keep a few things in mind. The first is your data size: say 10 million data points versus 1 million versus 10,000. The second is data drift, which we talked about: a covariate shift in your data distribution. The third is the frequency of updates: how frequently do you get new data? Every month, every second, every nanosecond? Based on that, you can use different strategies for retraining. You can have a fixed window strategy, which means you just use the new data to retrain and forget about the past. You can have a dynamic window strategy: say you have 100,000 data points and you get 5,000 new ones. You keep those 5,000 as test data, treat the other 100,000 as training data, use a grid search or something similar to find the window size, and retrain. Or you can use representative sample selection, where your training data looks like production data and you train on that. But in all these strategies, you need to be mindful that you do not have data drift or concept drift.
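The dynamic window strategy from this answer can be sketched as a small grid search: hold out the newest points as test data, try several training-window sizes over the older data and keep the size that scores best. The helper names and the toy scoring function below are hypothetical stand-ins for a real model and metric.

```python
def choose_window(history, holdout, candidate_sizes, train_and_score):
    """Grid-search the training window size against the newest data.

    `history` is older data in time order, `holdout` is the newest batch,
    and `train_and_score(train, test)` returns a score where higher is better.
    Candidate sizes are assumed to be positive.
    """
    best_size, best_score = None, float("-inf")
    for size in candidate_sizes:
        train = history[-size:]  # most recent `size` points before the holdout
        score = train_and_score(train, holdout)
        if score > best_score:
            best_size, best_score = size, score
    return best_size


def mean_model_score(train, test):
    """Toy stand-in for a real model: predict the training mean,
    score by negative mean absolute error on the test data."""
    mean = sum(train) / len(train)
    return -sum(abs(x - mean) for x in test) / len(test)


# Hypothetical series whose level shifted recently; the holdout is the newest batch.
history = [10.0] * 50 + [20.0] * 10
holdout = [20.0] * 5
print(choose_window(history, holdout, [10, 30, 60], mean_model_score))
```

When the data has shifted recently, as in this made-up series, the search favors a short window, which behaves like the fixed window strategy. With stable data a longer window would tend to score at least as well.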
Thank you, good presentation. Can you comment on the cost of the system versus the savings?
So the cost of the system is the edge IoT gateway, which we put in the field where we have a controller. It has everything, including a cellular connection to send your data. And the platform is available on the cloud. That is the entire cost of the system, and the dollar amount depends on your usage, so that varies. The savings, as we saw, are a 70% downtime reduction and an 85% reduction in failures. So that is millions of dollars of savings relative to the cost of this infrastructure.
Okay. I guess just a follow-on: isn't there also the cost, when you add those systems, of the people to manage them and generate the models, which would not be a cost if you didn't have those systems?
So, as we talked about, the people are only required in the beginning, when we deploy the box, and after that everything is automated. The model is already automated, and the retraining is automated through the ML pipeline: if your model performance degrades, say, for a particular field, it will automatically get new data from that field and retrain itself. Apart from that, there is no involvement from our side on this particular field unless there is client support required or something like that. The subject matter experts at the field can just visualize what is going on. They can also trigger a retrain manually: if something doesn't look correct, they change some labels, trigger retraining, and that's it. So there is not a lot of involvement from our side, and most of this is automated. Alright, thank you, everyone.