Big Data in Chemistry

Edited by Igor V. Tetko, Helmholtz Zentrum München, Germany

The increasing volume of biomedical data in chemistry and life sciences requires the development of new methodologies and approaches for their analysis. Artificial Intelligence (AI) and machine learning, especially neural networks, are increasingly used in the chemical industry, in particular with respect to Big Data.

The goal of this special collection is to show progress and exemplify the current needs, trends and requirements for machine learning in chemical data analysis. In particular, it focuses on the use of chemical informatics and machine learning methodologies to analyse chemical Big Data, e.g. to predict biological activities and physico-chemical properties, facilitate property-oriented data mining, predict biological targets for compounds on a large scale, design new chemical compounds, and analyse large virtual chemical spaces.

The collection mainly contains a selection of articles to be presented during the BIGCHEM special session of the International Conference on Artificial Neural Networks, which is co-organized by the  and the Horizon2020 Marie Skłodowska-Curie Innovative Training Networks European Industrial .


  1. The increasing volume of biomedical data in chemistry and life sciences requires development of new methods and approaches for their analysis. Artificial Intelligence and machine learning, especially neural ne...

    Authors: Igor V. Tetko and Ola Engkvist
    Citation: Journal of Cheminformatics 2020 12:74
  2. We present the open-source AiZynthFinder software that can be readily used in retrosynthetic planning. The algorithm is based on a Monte Carlo tree search that recursively breaks down a molecule to purchasable...

    Authors: Samuel Genheden, Amol Thakkar, Veronika Chadimová, Jean-Louis Reymond, Ola Engkvist and Esben Bjerrum
    Citation: Journal of Cheminformatics 2020 12:70
  3. In de novo molecular design, recurrent neural networks (RNN) have been shown to be effective methods for sampling and generating novel chemical structures. Using a technique called reinforcement learning (RL),...

    Authors: Thomas Blaschke, Ola Engkvist, Jürgen Bajorath and Hongming Chen
    Citation: Journal of Cheminformatics 2020 12:68
  4. The technological advances of the past century, marked by the computer revolution and the advent of high-throughput screening technologies in drug discovery, opened the path to the computational analysis and v...

    Authors: Laurianne David, Amol Thakkar, Rocío Mercado and Ola Engkvist
    Citation: Journal of Cheminformatics 2020 12:56
  5. Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecule...

    Authors: Alice Capecchi, Daniel Probst and Jean-Louis Reymond
    Citation: Journal of Cheminformatics 2020 12:43
  6. Affinity fingerprints report the activity of small molecules across a set of assays, and thus permit to gather information about the bioactivities of structurally dissimilar compounds, where models based on ch...

    Authors: Isidro Cortés-Ciriano, Ctibor Škuta, Andreas Bender and Daniel Svozil
    Citation: Journal of Cheminformatics 2020 12:41
  7. An affinity fingerprint is the vector consisting of compound’s affinity or potency against the reference panel of protein targets. Here, we present the QAFFP fingerprint, 440 elements long in silico QSAR-based...

    Authors: C. Škuta, I. Cortés-Ciriano, W. Dehaen, P. Kříž, G. J. P. van Westen, I. V. Tetko, A. Bender and D. Svozil
    Citation: Journal of Cheminformatics 2020 12:39
  8. Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, t...

    Authors: Josep Arús-Pous, Atanas Patronov, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen and Ola Engkvist
    Citation: Journal of Cheminformatics 2020 12:38
  9. For kinase inhibitors, X-ray crystallography has revealed different types of binding modes. Currently, more than 2000 kinase inhibitors with known binding modes are available, which makes it possible to derive...

    Authors: Raquel Rodríguez-Pérez, Filip Miljković and Jürgen Bajorath
    Citation: Journal of Cheminformatics 2020 12:36
  10. Activity landscapes (ALs) are graphical representations that combine compound similarity and activity data. ALs are constructed for visualizing local and global structure–activity relationships (SARs) containe...

    Authors: Javed Iqbal, Martin Vogt and Jürgen Bajorath
    Citation: Journal of Cheminformatics 2020 12:34
  11. Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequent...

    Authors: Ruud van Deursen, Peter Ertl, Igor V. Tetko and Guillaume Godin
    Citation: Journal of Cheminformatics 2020 12:22
  12. Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity an...

    Authors: Jennifer Hemmerich, Ece Asilar and Gerhard F. Ecker
    Citation: Journal of Cheminformatics 2020 12:18
  13. We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in high...

    Authors: Pavel Karpov, Guillaume Godin and Igor V. Tetko
    Citation: Journal of Cheminformatics 2020 12:17
  14. Designing a molecule with desired properties is one of the biggest challenges in drug development, as it requires optimization of chemical compound structures with respect to many complex properties. To improv...

    Authors: Łukasz Maziarka, Agnieszka Pocha, Jan Kaczmarczyk, Krzysztof Rataj, Tomasz Danel and Michał Warchoł
    Citation: Journal of Cheminformatics 2020 12:2
  15. Neural Message Passing for graphs is a promising and relatively recent approach for applying Machine Learning to networked data. As molecules can be described intrinsically as a molecular graph, it makes sense...

    Authors: M. Withnall, E. Lindelöf, O. Engkvist and H. Chen
    Citation: Journal of Cheminformatics 2020 12:1
  16. Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generativ...

    Authors: Oleksii Prykhodko, Simon Viet Johansson, Panagiotis-Christos Kotsias, Josep Arús-Pous, Esben Jannik Bjerrum, Ola Engkvist and Hongming Chen
    Citation: Journal of Cheminformatics 2019 11:74
  17. Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings, have shown the capacity to create large chemical spaces of valid and meaningful structures. He...

    Authors: Josep Arús-Pous, Simon Viet Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen and Ola Engkvist
    Citation: Journal of Cheminformatics 2019 11:71
  18. This study aims at improving upon existing activity predictions methods by augmenting chemical structure fingerprints with bio-activity based fingerprints derived from high-throughput screening (HTS) data (HTS...

    Authors: Oliver Laufkötter, Noé Sturm, Jürgen Bajorath, Hongming Chen and Ola Engkvist
    Citation: Journal of Cheminformatics 2019 11:54