
2023-05-24

There has been growing interest in computational methods that predict biological activity from chemical structure, so as to decide whether a compound has the desired properties. For this purpose, the well-known quantitative structure–activity relationship (QSAR) method [13,14,15,16,17] has been developed and has proved to be a useful tool for predicting the biological activities of compounds from experimental data and molecular structures [18,19]. With a model derived by the QSAR method, the biological properties of novel compounds can be estimated without the experimental effort of synthesizing and testing them [20,21,22]. These features have led to the method being extended to several fields, and it has recently been employed for screening the biological activities of drugs in drug design [23]. A QSAR model is generated by collecting experimental data and then calculating theoretical parameters for the newly designed compounds. In a QSAR model, the experimental data are biological properties such as toxicity, bioavailability or activity, which serve as the dependent variables of the model. The theoretical parameters are numerous descriptors computed from the molecular structures. Among these descriptors, only those that correlate with the biological activity should be selected. Hence, employing a technique to select the relevant variables is one of the essential steps in the QSAR method [24,25]. Progress in variable selection tools has produced notable methods such as stepwise selection [26,27], simulated annealing [28] and genetic algorithms (GAs) [29]. Once the relevant descriptors have been obtained, the model is built with a modeling method such as multiple linear regression (MLR) [30,31], support vector machine (SVM) [32,33] or partial least squares (PLS) [34].
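The workflow outlined above (compute descriptors, select the relevant ones, fit a regression model) can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the data are synthetic, the selection rule is a plain forward stepwise search that maximizes R², and the names `solve`, `fit_mlr`, `r_squared` and `forward_select` are hypothetical helpers introduced only for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_mlr(rows, y, cols):
    """Return [intercept, b1, b2, ...] for y regressed on descriptor columns `cols`."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations (X'X) b = X'y

def r_squared(rows, y, cols):
    """Coefficient of determination of the MLR model built on `cols`."""
    coef = fit_mlr(rows, y, cols)
    pred = [coef[0] + sum(b * row[c] for b, c in zip(coef[1:], cols)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def forward_select(rows, y, n_keep):
    """Greedy forward stepwise selection: add the descriptor that raises R^2 most."""
    chosen = []
    while len(chosen) < n_keep:
        candidates = [c for c in range(len(rows[0])) if c not in chosen]
        chosen.append(max(candidates, key=lambda c: r_squared(rows, y, chosen + [c])))
    return chosen

# Synthetic example (assumption): 30 compounds, 6 descriptors,
# activity depends only on descriptors 1 and 4 plus small noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(30)]
activity = [2.0 + 1.5 * row[1] - 1.2 * row[4] + rng.gauss(0, 0.1) for row in X]

selected = forward_select(X, activity, n_keep=2)
coefficients = fit_mlr(X, activity, selected)
print("selected descriptors:", selected)
print("coefficients:", [round(b, 2) for b in coefficients])
print("R^2:", round(r_squared(X, activity, selected), 3))
```

In practice the descriptor matrix would come from a descriptor-calculation package and the number of retained descriptors would be chosen by validation rather than fixed in advance.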
In the present work, the multiple linear regression technique was used to generate the QSAR model, with a genetic algorithm as the variable selection method. The objective of this study is to develop a QSAR model that gives an accurate quantitative relationship between the molecular structure and the ACK1 inhibition activity of the studied compounds.
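To make the GA–MLR combination concrete, the following Python sketch shows one possible way a genetic algorithm can search over descriptor subsets, scoring each subset by the fit of a linear regression. Several choices here are assumptions and are not taken from this study: the fitness is the adjusted R², the GA uses elitism, tournament selection, uniform crossover and bit-flip mutation with arbitrary settings, and the data are synthetic. The function names `r2_subset`, `fitness` and `ga_select` are hypothetical.

```python
import random

def r2_subset(columns, y):
    """R^2 of a least-squares fit with intercept, via Gram-Schmidt projection."""
    n = len(y)
    mean_y = sum(y) / n
    resid = [t - mean_y for t in y]           # centring absorbs the intercept
    ss_tot = sum(r * r for r in resid)
    basis = []
    for col in columns:
        mean_c = sum(col) / n
        v = [x - mean_c for x in col]
        for q in basis:                        # remove what earlier columns span
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm < 1e-12:                       # skip exactly collinear columns
            continue
        q = [a / norm for a in v]
        basis.append(q)
        proj = sum(a * b for a, b in zip(resid, q))
        resid = [a - proj * b for a, b in zip(resid, q)]
    return 1.0 - sum(r * r for r in resid) / ss_tot

def fitness(mask, columns, y):
    """Adjusted R^2 (assumed fitness), so extra descriptors are not rewarded for free."""
    k, n = sum(mask), len(y)
    r2 = r2_subset([c for c, keep in zip(columns, mask) if keep], y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def ga_select(columns, y, rng, pop_size=30, generations=40, p_mut=0.05):
    """Evolve boolean masks over descriptors; return the best mask and its fitness."""
    def score(mask):
        return fitness(mask, columns, y)

    pop = [[rng.random() < 0.5 for _ in columns] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]                                    # elitism: keep the two best
        while len(nxt) < pop_size:
            pa = max(rng.sample(pop, 3), key=score)      # tournament selection
            pb = max(rng.sample(pop, 3), key=score)
            child = [a if rng.random() < 0.5 else b for a, b in zip(pa, pb)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]   # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)

# Synthetic example (assumption): 30 compounds, 8 descriptors stored column-wise,
# activity depends only on descriptors 2 and 5 plus small noise.
rng = random.Random(1)
n_compounds, n_descriptors = 30, 8
columns = [[rng.gauss(0, 1) for _ in range(n_compounds)] for _ in range(n_descriptors)]
activity = [5.0 + 1.0 * columns[2][i] - 0.7 * columns[5][i] + rng.gauss(0, 0.1)
            for i in range(n_compounds)]

best_mask, best_score = ga_select(columns, activity, rng)
selected = [i for i, keep in enumerate(best_mask) if keep]
print("selected descriptors:", selected)
print("adjusted R^2:", round(best_score, 3))
```

Because adjusted R² penalizes model size only mildly, the selected subset may contain a few uninformative descriptors alongside the informative ones; a cross-validated criterion would be a stricter alternative.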
Materials and methods
Results and discussion
First, the data set was divided into a training set and a test set [45,46], containing 30 and 7 compounds respectively (approximately 80% and 20% of the whole series), based on a hierarchical clustering technique. After clustering of the whole data set, the test set compounds were randomly selected from all clusters. The molecules selected as the test set are shown in Table 1 and marked in bold in Fig. 1. These molecules were not involved in selecting the descriptors used to build the QSAR model; their main purpose is to test the accuracy of the resulting model. For the selection of the best variables, the genetic algorithm (GA) subset selection method was used. Finally, MLR analysis was combined with the GA to build the model on the training set. The most statistically meaningful descriptors selected by the genetic algorithm are MAXDP, PW5, Mor30e, E1s, H7u and H-047. Multicollinearity among the selected descriptors was assessed by calculating the variance inflation factors (VIF) [47]. The correlation coefficients and the corresponding VIF values for each descriptor are given in Table 2. As can be seen from this table, the correlation coefficient of each pair of descriptors was less than 0.47, which indicates that the selected descriptors are not strongly correlated with one another; in addition, all variables have a VIF value below 5, indicating that multicollinearity is not a concern and that the obtained model is statistically sound [47]. The GA–MLR model was developed on the training set with the selected descriptors, and the linear equation was as follows: where N is the number of compounds in the training set, , and