4-Methylcytosine (4mC) is an important DNA base modification in which a methyl group is added to the cytosine base. This methylation is involved in gene expression, DNA repair, cell cycle regulation, and other processes. 4mC binding sites are specific regions within the DNA sequence that engage with 4mC through interactions with proteins or other molecules, influencing gene expression and other cellular functions. Accurate prediction of 4mC binding sites is therefore important for studying the function and regulatory mechanisms of DNA methylation and for elucidating the mechanisms of cellular processes and disease development. Traditional wet-lab experiments for locating 4mC binding sites are useful but costly and time-consuming, so various computational models have been developed to predict 4mC binding sites. In this study, we develop a deep learning framework named “4mCMS”. The model uses a multi-scale feature encoding strategy that combines k-mers of different lengths so that more comprehensive DNA sequence information can be captured. On the C. elegans dataset, the model achieves an accuracy (ACC) of 0.931, exceeding the results of the comparative models. These experimental results demonstrate the effectiveness of the proposed model on the 4mC datasets.
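The multi-scale k-mer encoding described above can be illustrated with a minimal sketch. The abstract does not specify the values of k, the stride, or how the scales are combined inside 4mCMS, so the function names, the default `ks=(1, 2, 3)`, the stride of 1, and the handling of ambiguous bases below are all assumptions made for illustration only.

```python
from itertools import product


def kmer_vocab(k):
    """Map every k-mer over the alphabet ACGT to an integer index (hypothetical helper)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def kmer_tokens(seq, k):
    """Slide a window of width k over the sequence with stride 1 (assumed stride)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multi_scale_encode(seq, ks=(1, 2, 3)):
    """Return one list of k-mer indices per scale k.

    k-mers containing characters outside ACGT (e.g. N) map to the extra
    index len(vocab); this is an assumption, not taken from the paper.
    """
    seq = seq.upper()
    encoded = {}
    for k in ks:
        vocab = kmer_vocab(k)
        unknown = len(vocab)
        encoded[k] = [vocab.get(t, unknown) for t in kmer_tokens(seq, k)]
    return encoded


if __name__ == "__main__":
    print(multi_scale_encode("ACGTAC"))
```

Each scale yields an index sequence of a different length (a sequence of length n gives n - k + 1 tokens), so a downstream model would need a separate embedding per scale or some padding scheme before the scales are combined.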
SARS-CoV-2 inhibitors play an important role in COVID-19 preclinical drug discovery. Because existing SARS-CoV-2 inhibitors all have deficiencies to some degree, new candidate inhibitors are urgently needed. De novo molecular design plays a very important role in drug discovery. One popular approach is to use deep learning models to automatically generate candidate drug molecules, and most existing models take SMILES (Simplified Molecular Input Line Entry System) strings as input. In this study, we embed SMILES using a sub-word algorithm named BPE (Byte Pair Encoding) instead of one-hot encoding. First, BPE learns a vocabulary of high-frequency SMILES substrings from a large SMILES dataset; SMILES strings are then tokenized according to this vocabulary. Results show that the BPE algorithm can effectively learn SMILES grammar and can help our generative model generate potential SARS-CoV-2 inhibitors after transfer learning on 1253 known SARS-CoV-2 inhibitors. Overall, this paper provides an effective method for de novo molecular design of SARS-CoV-2 inhibitors.
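The two BPE steps described above (learning a vocabulary of frequent substrings, then tokenizing SMILES with it) can be sketched as follows. This is a toy illustration, not the implementation used in the paper: the function names, the tiny corpus, and the tie-breaking rule are assumptions, and starting from single characters splits two-letter atoms such as Cl or Br, which a real SMILES tokenizer would likely treat as units.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace each adjacent occurrence of `pair` with one merged token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn an ordered list of merges from a list of SMILES strings."""
    words = [list(s) for s in corpus]  # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))  # count adjacent token pairs
        if not pairs:
            break
        # Most frequent pair; ties broken lexicographically (assumed rule).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        words = [merge_pair(w, best) for w in words]
    return merges


def tokenize(smiles, merges):
    """Tokenize a SMILES string by replaying the learned merges in order."""
    tokens = list(smiles)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


if __name__ == "__main__":
    corpus = ["CCO", "CCN", "CCC", "c1ccccc1O"]  # toy corpus, not real training data
    merges = learn_bpe(corpus, num_merges=5)
    print(tokenize("CCOc1ccccc1", merges))
```

The resulting tokens would then be mapped to integer ids and passed through an embedding layer in place of one-hot vectors; concatenating the tokens always reproduces the original SMILES string.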