1. INTRODUCTION

Machine learning (ML) has become a popular approach in industrial applications [1]. However, feature extraction remains a time-consuming step in applied machine learning projects. It relies heavily on the data analyst's experience and domain knowledge, which hinders the broader adoption of ML by non-experts. Automated machine learning (AutoML) aims to ease the process of building ML models by automating commonly used steps, such as feature extraction and preprocessing, model selection, and hyperparameter tuning [2]. For example, Featuretools [3] is a Python library for automatically engineering features from relational and transactional data. The library introduces a concept called Deep Feature Synthesis (DFS). For multiple datasets with relationships defined among them, such as parent-child relationships based on primary keys (or unique identifier columns), DFS creates new features by applying calculations such as sum, count, mean, mode, and standard deviation. Such a data-schema-based feature extraction approach can be extended to industrial domains by considering causal relationships.

To leverage causal relationships in feature extraction, a formalized description model is a prerequisite. In industrial Fault Detection and Isolation (FDI), several graphical models have been proposed to describe the causal relationships of dynamic systems. Ould Bouamama et al. [4] divide them into three categories: (1) structural-graph-based Analytical Redundancy Relation (ARR) methods, including bond graphs, bipartite graphs, and linear graphs; (2) qualitative graph-based methods, such as causal graphs, functional graphs, and signed directed graphs; and (3) causal probabilistic models, including Bayesian networks and dynamic Bayesian networks. Graphical methods are attractive because they are easy to construct and because key properties (monitorability, diagnosability, sensor placement, observability, controllability) can be derived from them before industrial design.
Sztyber et al. [5] introduced a new causal graph called the GP (Graph of a Process). A GP can be constructed on the basis of process diagrams and expert knowledge. Vertices of the graph represent process variables, disturbances, and control signals. Directed edges depict relations between variables.

This paper explores the feasibility of automatic feature extraction in the industrial and manufacturing domain. In this domain, system dynamics diagrams, piping and instrumentation diagrams (P&ID), and other system diagrams implicitly contain rich "structural relationships" between raw variables. If these relationships are properly captured, many useful features can be extracted automatically to ease the machine learning process.

The paper is structured as follows. The system dynamics diagram (SDG) [6] approach is discussed in Section 2. Algorithms for extracting typical model structures from an SDG, including anchor variables, paths, branches, and closed loops, are presented in Section 3. The automatic feature extraction policy for each model structure is discussed in Section 4.

2. SYSTEM DYNAMICS MODELLING

To be consistent with previous work, the symbolic representation of Sztyber et al. is adopted with minimal extensions, as shown in Figure 1. The first three symbols are the same as in the work of Sztyber et al.: measurements are marked with black circles, control signals with white circles containing a black dot, and unmeasured state variables with white circles. Besides these three node types, three further node types and two linkage types are defined. To model branch and merge structures clearly, a virtual node drawn as a dashed circle is added. This is useful in many industrial processes. For example, in the adaptive water feed system, the water of the three water feed subsystems comes from the same main water pipe, and the steam generated by each subsystem is fed into the same steam pipe leading to the turbine generator.
A group node denotes that multiple variables are co-located, such as the pressure and flow of the main water pipe. To identify the control objective, an objective symbol is added. Linkages are divided into two types: physical linkages and logic linkages. A logic linkage represents the designed control logic, and a physical linkage models an interaction in the physical world. Since the purpose of this paper is not limited to fault diagnosis, the fault node type of Sztyber et al. is omitted here. Group nodes and branch/joint points are categorized as virtual nodes, since they do not exist physically but are logical nodes that represent the system structure. The other four node types are called basic nodes.

An adaptive water feed subsystem in a power plant is used as an example throughout this paper to illustrate the concepts and algorithms. The system receives water from another subsystem through a pump. The feed water flow rate is adjusted to the plant's target power output by controlling the pump speed. The water is fed into three steam generator branches. The water flow rate of each branch is controlled by the valve opening. The steam from the three branches is merged into a main steam pipe to drive the steam turbine for power generation. The expected water level in each steam generator is set according to the target power output and is maintained automatically by adjusting the opening of the corresponding valve.

The P&ID is shown in Figure 2a [7]. Based on the description above, the control policy diagram is shown in Figure 2b. The system dynamics diagram of the water feed subsystem is shown in Figure 3a. The control policy can be merged with the original SDG of the system. The merge algorithm is shown below, and the process is illustrated in Figure 3. After the merge, we obtain a directed graph with vertices and linkages. Here are some annotations for the convenience of the discussion hereafter.
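The node and linkage types described in this section can be encoded as a small typed directed graph. The following Python sketch is one possible encoding and is not the authors' implementation. The names `NodeType`, `LinkType`, `Diagram`, and `merge`, as well as the example node names, are assumptions made for illustration. The merge shown here is a plain union that treats nodes with the same name as identical, which may be simpler than the merge procedure used in the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    MEASUREMENT = "measurement"    # black circle
    CONTROL = "control"            # white circle with a black dot
    STATE = "state"                # unmeasured state variable, white circle
    OBJECTIVE = "objective"        # control objective
    GROUP = "group"                # virtual: co-located variables
    BRANCH_JOINT = "branch_joint"  # virtual: branch or joint point


class LinkType(Enum):
    PHYSICAL = "physical"  # interaction in the physical world
    LOGIC = "logic"        # designed control logic


VIRTUAL_TYPES = {NodeType.GROUP, NodeType.BRANCH_JOINT}


@dataclass
class Diagram:
    nodes: dict = field(default_factory=dict)  # name -> NodeType
    links: set = field(default_factory=set)    # (source, target, LinkType)

    def add_node(self, name, node_type):
        self.nodes[name] = node_type

    def add_link(self, source, target, link_type):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be declared before linking")
        self.links.add((source, target, link_type))

    def is_virtual(self, name):
        return self.nodes[name] in VIRTUAL_TYPES


def merge(physical, control):
    """Union of two diagrams (assumption: equal names mean the same node)."""
    merged = Diagram(dict(physical.nodes), set(physical.links))
    for name, node_type in control.nodes.items():
        existing = merged.nodes.setdefault(name, node_type)
        if existing != node_type:
            raise ValueError(f"conflicting types for node {name!r}")
    merged.links |= control.links
    return merged


# Hypothetical fragment of the water feed example.
physical = Diagram()
physical.add_node("pump_speed", NodeType.CONTROL)
physical.add_node("feed_flow", NodeType.MEASUREMENT)
physical.add_link("pump_speed", "feed_flow", LinkType.PHYSICAL)

control = Diagram()
control.add_node("power", NodeType.MEASUREMENT)
control.add_node("pump_speed", NodeType.CONTROL)
control.add_link("power", "pump_speed", LinkType.LOGIC)

merged = merge(physical, control)
```

Keeping the linkage type on every edge lets later steps filter by type: path identification can follow physical linkages only, while control loop identification can look for cycles that are closed by a logic linkage.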
3. TYPICAL STRUCTURE IDENTIFICATION

With the merged system dynamics diagram, four types of model structures can be identified. Each model structure has its own feature extraction policy, which is discussed in the next section.

3.1 Anchor variable identification

Anchor variables are the variables that determine the behavior of the whole system; that is, an anchor variable is not the result of any other variable. An anchor variable might be a system supervisory parameter or a variable from a higher-level system. Based on this definition, an anchor variable in the SDG has an inbound linkage degree of zero and a non-zero outbound degree. If a node with zero inbound linkages lies inside a group node, and the group node also has zero inbound linkages, the node is also an anchor variable. According to this rule, only the "Power" node is an anchor variable. This is understandable, since the target power output is the fundamental driver of all other variables in the water feed subsystem.

3.2 Path and branch identification

A path is a sequence of directed physical linkages between nodes that starts from a basic node and ends at a virtual node or at a basic node without an outbound physical linkage. The identification algorithm is shown below. The paths in Figure 3 are shown in Figure 4. If two paths have the same preceding node and the same ending node, they are called parallel branches.

3.3 Control loop identification

Under a fault condition, a closed control loop can sometimes still bring the target variable to its configured set point, but the control variable then usually exhibits abnormal behavior. The control loop structure is therefore also important for feature extraction. The identification algorithm is shown below. The control loop identification process for Figure 3 is illustrated in Figure 5.
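The three identification rules of this section can be sketched on a toy graph as follows. This is a minimal Python illustration under stated assumptions, not the authors' algorithms. The node names, the `NODES` and `EDGES` layout, and all function names are hypothetical. A path is assumed to start at a basic node that has no basic physical predecessor, and a control loop is assumed to be a directed cycle closed by at least one logic linkage.

```python
from collections import defaultdict, deque

# Hypothetical toy diagram: name -> "basic" | "virtual"
NODES = {"power": "basic", "pump_speed": "basic", "main_flow": "basic",
         "split": "virtual", "valve_1": "basic", "level_1": "basic"}
# (source, target, "physical" | "logic")
EDGES = [("power", "pump_speed", "logic"),
         ("pump_speed", "main_flow", "physical"),
         ("main_flow", "split", "physical"),
         ("valve_1", "level_1", "physical"),
         ("level_1", "valve_1", "logic")]


def find_anchor_variables(nodes, edges, group_of=None):
    """Basic nodes with no inbound linkage and some outbound linkage.

    group_of maps a member to its group node; a member with no inbound
    linkage also counts when its group has no inbound linkage.
    """
    group_of = group_of or {}
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for source, target, _ in edges:
        outdeg[source] += 1
        indeg[target] += 1
    anchors = []
    for name, kind in nodes.items():
        if kind != "basic" or indeg[name] > 0:
            continue
        group = group_of.get(name)
        in_free_group = group is not None and indeg[group] == 0
        if outdeg[name] > 0 or in_free_group:
            anchors.append(name)
    return anchors


def find_paths(nodes, edges):
    """Follow physical linkages until a virtual node or a dead end."""
    succ, has_basic_pred = defaultdict(list), set()
    for source, target, kind in edges:
        if kind == "physical":
            succ[source].append(target)
            if nodes[source] == "basic":
                has_basic_pred.add(target)
    starts = [n for n, k in nodes.items()
              if k == "basic" and n not in has_basic_pred and succ[n]]
    paths = []

    def walk(path):
        last = path[-1]
        following = [t for t in succ[last] if t not in path]  # avoid cycles
        if nodes[last] == "virtual" or not following:
            paths.append(path)
            return
        for target in following:
            walk(path + [target])

    for start in starts:
        walk([start])
    return paths


def find_control_loops(edges):
    """For each logic link s -> t, search the shortest route from t back to s."""
    succ = defaultdict(list)
    for source, target, _ in edges:
        succ[source].append(target)
    loops = []
    for source, target, kind in edges:
        if kind != "logic":
            continue
        parent, queue = {target: None}, deque([target])
        while queue and source not in parent:
            node = queue.popleft()
            for following in succ[node]:
                if following not in parent:
                    parent[following] = node
                    queue.append(following)
        if source in parent:
            route, node = [], source
            while node is not None:
                route.append(node)
                node = parent[node]
            loops.append(route[::-1])  # target ... source, closed by the logic link
    return loops
```

On the toy graph, `find_anchor_variables(NODES, EDGES)` returns `["power"]`, `find_paths(NODES, EDGES)` returns the two paths `pump_speed, main_flow, split` and `valve_1, level_1`, and `find_control_loops(EDGES)` returns the single loop `valve_1, level_1`. A loop that contains several logic linkages would be reported once per logic linkage, so a real implementation would need a deduplication step.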
4. FEATURE EXTRACTION POLICY

For variables with the same unit (for example, degrees Celsius), PCA, PLS, ICA, and other linear or nonlinear feature extraction methods are more meaningful than operators that cross different physical units. For different variables on the same path, differences between them are also meaningful if they share the same physical unit, for example the pressure drop across a valve or the temperature drop along a pipeline. For variables with different units, a grammar-based genetic algorithm can be used [8], where the grammar defines the allowed combinations. For the branch structure, pairwise comparisons or allocation ratios are often meaningful. For example, if the flow in one branch of a pipe network is much lower than in the others, the pipe may be blocked. For the loop structure, the transient behavior of objective variables and the value distribution of control variables are important features for anomaly detection. Clusters of anchor variable values correspond to different working conditions, and the cluster labels can be used as categorical features in data analytics.

5. CONCLUDING REMARKS

In this paper, a system dynamics diagram (SDG) based feature extraction approach is proposed. The SDG is adopted to capture the relationships between variables. An algorithm for merging the P&ID and the control policy diagram is presented. SDG structure identification algorithms are introduced for anchor variables, single paths, parallel branches, and closed loops. Different feature extraction operators can be configured for different SDG structures. The approach is illustrated with a feed water flow control system. Such automated machine learning based on a formalized graph structure can significantly reduce feature extraction effort.

There are several directions for extension. Firstly, the SDG in this paper is given by domain experts. A data-aided approach [10] will be useful when no clear mechanistic relationship is known.
Secondly, SDG model simplification and consistency checking algorithms are necessary for complex SDGs. Thirdly, the feasibility of a predictive model [11] can be deduced from the SDG structure. All of this work will enrich the synergy between data-driven and domain-knowledge-driven approaches [12] in industrial data analytics.

ACKNOWLEDGMENTS

This work is supported in part by the National Key R&D Program of China (2018YFB1700605) and the Open Fund of the MIIT Key Laboratory of Big Data for Industrial Quality (2021-IEQBD-01).

REFERENCES

[1] Qin, S. J. and Chiang, L. H., “Advances and opportunities in machine learning for process data analytics,” Computers and Chemical Engineering 126, 465–473 (2019). https://doi.org/10.1016/j.compchemeng.2019.04.003
[2] Hutter, F., “Automated Machine Learning: Methods, Systems, Challenge,” Springer, Gewerbestrasse (2017).
[3] Das, S. and Mert C. U., “Hands-On Automated Machine Learning: A Beginner’s Guide to Building Automated Machine Learning Systems using AutoML and Python,” Packt Publishing, Birmingham (2018).
[4] Bouamama, B. O., Biswas, G., Loureiro, R. and Merzouki, R., “Graphical methods for diagnosis of dynamic systems: A review,” Annu. Rev. Control 38(2), 199–219 (2014). https://doi.org/10.1016/j.arcontrol.2014.09.004
[5] Sztyber, A. O. and Kościelny, J. M., “Graph of a process - A new tool for finding model structures in a model-based diagnosis,” IEEE Trans. Syst. Man, Cybern. Syst. 45(7), 1004–1017 (2015). https://doi.org/10.1109/TSMC.2014.2384000
[6] Wang, H. T., “Decoupled control of coal miller,” Thermal Power Generation 42(02), 58–61 (2013).
[7] Guangdong Nuclear Power Training Center, “900MW Pressurized Water Reactor Power System and Equipment,” Atomic Energy Press, Beijing (2005).
[8] Silva, A. M. D. and Leong, P. H. W., “Grammar-Based Feature Generation for Time-Series Prediction,” Springer, Singapore (2015). https://doi.org/10.1007/978-981-287-411-5
[9] Tidriri, K., Chatti, N., Verron, S. et al., “Bridging data-driven and model-based approaches for process fault diagnosis and health monitoring: A review of researches and future challenges,” Annual Reviews in Control 42, 63–81 (2016). https://doi.org/10.1016/j.arcontrol.2016.09.008
[10] Alizadeh, E., Koujok, M. E., Ragab, A. et al., “A data-driven causality analysis tool for fault diagnosis in industrial processes,” IFAC 51(24), 147–152 (2018).
[11] Rashidi, B., Singh, D. S. and Zhao, Q., “Data-driven root-cause fault diagnosis for multivariate non-linear processes,” Control Engineering Practice 70(11), 134–147 (2017).
[12] Bikmukhametov, T. and Jäschke, J., “Combining machine learning and process engineering physics towards enhanced accuracy and explainability of data-driven models,” Computers and Chemical Engineering 138, 106834 (2020). https://doi.org/10.1016/j.compchemeng.2020.106834