On the value of popular crystallographic databases for machine learning prediction of space groups
Peer reviewed, Journal article
Original version: Acta Materialia. 2022, 240, 118353. https://doi.org/10.1016/j.actamat.2022.118353
Predicting crystal structure information is a challenging problem in materials science that clearly benefits from artificial intelligence approaches. The leading strategies in machine learning are notoriously data-hungry, and although a handful of large crystallographic databases are currently available, their predictive quality has never been assessed. In this article, we employ composition-driven machine learning models, as well as deep learning, to predict space groups using well-known experimental and theoretical databases. The results of comprehensive testing indicate that data-abundant repositories such as COD (Crystallography Open Database) and OQMD (Open Quantum Materials Database) do not provide the best models, even for heavily populated space groups. Classification models trained on databases such as the Pearson Crystal Database and ICSD (Inorganic Crystal Structure Database), and to a lesser extent the Materials Project, generally outperform their data-richer counterparts because the representative classes are more evenly distributed. Experimental validation with novel high-entropy compounds was used to confirm the predictive value of the different databases and to showcase the scope of the machine learning approaches employed.
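The point about class balance can be made concrete with a small sketch. The snippet below is not the method used in the article; it assumes normalized Shannon entropy as one possible measure of how evenly space-group labels are distributed in a training set. The function name `class_balance` and the toy label lists are hypothetical and carry no data from any of the databases named above.

```python
import math
from collections import Counter


def class_balance(labels):
    """Normalized Shannon entropy of a list of class labels.

    Returns 1.0 when every class is equally populated and 0.0 when
    zero or one class is present. This is an illustrative metric only,
    not the measure used in the article.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    entropy = -sum(
        (c / n_samples) * math.log(c / n_samples) for c in counts.values()
    )
    # Divide by the maximum possible entropy for this number of classes.
    return entropy / math.log(n_classes)


# Hypothetical toy label sets (space-group numbers), invented for illustration.
skewed = [225] * 90 + [194] * 5 + [62] * 5
balanced = [225] * 34 + [194] * 33 + [62] * 33

print(f"skewed:   {class_balance(skewed):.3f}")
print(f"balanced: {class_balance(balanced):.3f}")
```

Under this assumed measure, a larger database dominated by a few space groups scores lower than a smaller one whose classes are more evenly populated, which is the kind of contrast the abstract draws between data-abundant and better-balanced repositories.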