Xinxin Hu, Jiayao Lv
Proceedings Volume Second International Conference on Biomedical and Intelligent Systems (IC-BIS 2023), 1272408 (2023) https://doi.org/10.1117/12.2687762
The research data were obtained from the information on ER antagonists provided in Problem D of the Huawei Cup. We screened molecular descriptor variables to construct a quantitative prediction model for the biological activity of the compounds against ER, and classification prediction models for the Caco-2, CYP3A4, hERG, HOB, and MN properties of the compounds. We also determined the optimal ranges of the molecular descriptors that lead to compounds with better bioactivity for ER inhibition and better ADMET properties. In this way, we provide data analysis and predictive models for breast cancer research. In the first step, we used a feature selection method to remove redundant variables and applied random forest to rank the remaining variables by relevance, selecting seven molecular descriptor variables: MDEC-23, maxHsOH, minHBa, minHsOH, minHBint4, C1SP2, and nHBAcc. In the second step, the pIC50 values in the training set were used as the dependent variable and the seven molecular descriptors as independent variables, and extremely randomized trees were applied to construct a non-linear quantitative regression model relating the compounds to ER bioactivity. In the third step, we used different ensemble methods to construct classification prediction models for Caco-2, CYP3A4, hERG, HOB, and MN, respectively. In the fourth step, after eliminating the 1974 compounds with a sum of ADMET indicators less than 3, we solved the prediction model inversely using the particle swarm optimization algorithm to obtain the maximum value of pIC50 and the optimal value of each molecular descriptor variable. The maximum pIC50 is 8.9055, and the corresponding values of the molecular descriptors C1SP2, minHBa, minHBint4, minHsOH, maxHsOH, nHBAcc, and MDEC-23 are 0, -7.75978169, -3.42842314, 9.46164889, 0.51061438, 2.63674280, and 80, respectively.
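The inverse-solving idea in the fourth step can be illustrated with a minimal particle swarm optimization sketch. This is not the authors' implementation: the function `pso_maximize`, its parameter values (swarm size, inertia, acceleration coefficients), and the stand-in objective `toy_predict` are all assumptions made for illustration. In the setting described above, `predict` would be the trained pIC50 regressor and `bounds` the allowed range of each of the seven descriptors.

```python
import random


def pso_maximize(predict, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize predict(x) over the box `bounds` with a basic global-best PSO.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles uniformly inside the box, with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # best position seen by each particle
    best_val = [predict(p) for p in pos]      # score at that position
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp so each variable stays inside its allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = predict(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical stand-in for the trained regressor: a concave function
# whose maximum value 5.0 is reached at (1.0, -2.0).
def toy_predict(x):
    return 5.0 - (x[0] - 1.0) ** 2 - (x[1] + 2.0) ** 2


best_x, best_y = pso_maximize(toy_predict, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

A gradient-free search of this kind is a natural fit when the objective is a tree ensemble, since such models produce piecewise-constant predictions and offer no useful gradients.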