Classifying Small X-ray Diffraction Datasets: Data Augmentation and Deep Neural Networks

科技工作者之家 2019-07-02

Source: 知社学术圈


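Rapid materials characterization is essential for high-throughput exploration of new materials. X-ray diffraction (XRD) is a key tool for characterizing and screening materials, but acquiring XRD patterns and classifying them is usually time-consuming, which makes it one of the bottlenecks of high-throughput characterization. For example, acquiring XRD data at high angular resolution typically takes about 1 hour, after which a trained crystallographer generally needs another 1-2 hours for Rietveld refinement. That figure covers only known crystalline phases; unknown phases take longer still.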

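A research team from the Massachusetts Institute of Technology and Singapore has developed a supervised machine learning framework for rapidly acquiring and identifying XRD patterns of novel thin-film materials. They first built a database from the XRD patterns of 164 thin-film halides from the Inorganic Crystal Structure Database (ICSD) and 115 experimentally synthesized thin films. Working from this small library, they developed a model-agnostic, physics-informed data augmentation method to build the training set. They then used that set to train a convolutional neural network for XRD pattern classification, reaching accuracies of 93% for dimensionality and 89% for space group. The approach addresses the data scarcity inherent to new materials exploration and allows the XRD pattern of a new material to be acquired and classified quickly, in 5.5 minutes or less.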
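The article does not spell out which transformations the physics-informed augmentation applies. As a minimal sketch, assume three that are commonly motivated by thin-film physics: scaling peak intensities, eliminating some peaks (both mimicking preferred orientation), and shifting the pattern slightly along 2θ (mimicking strain or sample displacement). The function name `augment_pattern`, its parameters, and all default values are hypothetical and are not taken from the paper.

```python
import random


def augment_pattern(intensities, max_shift=3, scale_range=(0.5, 1.5),
                    drop_prob=0.1, block=10, rng=None):
    """Return one augmented copy of a 1-D XRD pattern (list of intensities).

    Illustrative only: the three transformations and every default value
    are assumptions, not the published procedure.
    Assumes max_shift is smaller than the pattern length.
    """
    rng = rng or random.Random()
    n = len(intensities)
    out = list(intensities)

    # 1) Scale or eliminate intensities block by block along 2-theta,
    #    a crude stand-in for texture (preferred orientation) effects.
    for start in range(0, n, block):
        if rng.random() < drop_prob:
            factor = 0.0  # eliminate everything in this block
        else:
            factor = rng.uniform(*scale_range)
        for i in range(start, min(start + block, n)):
            out[i] *= factor

    # 2) Shift the whole pattern by a few steps, padding with zeros,
    #    a crude stand-in for strain or sample displacement.
    shift = rng.randint(-max_shift, max_shift)
    if shift > 0:
        out = [0.0] * shift + out[:n - shift]
    elif shift < 0:
        out = out[-shift:] + [0.0] * (-shift)

    # 3) Renormalize so the strongest remaining peak has intensity 1.
    peak = max(out)
    if peak > 0:
        out = [v / peak for v in out]
    return out
```

Calling `augment_pattern` many times on each simulated or measured pattern, with a different random state each time, would turn a library of a few hundred patterns into a much larger training set while keeping the original class labels.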

The paper was recently published in npj Computational Materials 5: 60 (2019), and its PDF is freely available. The English title and abstract follow.


Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks

Felipe Oviedo, Zekun Ren, Shijing Sun, Charles Settens, Zhe Liu, Noor Titan Putri Hartono, Savitha Ramasamy, Brian L. DeCost, Siyu I. P. Tian, Giuseppe Romano, Aaron Gilad Kusne & Tonio Buonassisi

X-ray diffraction (XRD) data acquisition and analysis is among the most time-consuming steps in the development cycle of novel thin-film materials. We propose a machine learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of thin-film XRD patterns. We overcome the scarce data problem intrinsic to novel materials development by coupling a supervised machine learning approach with a model-agnostic, physics-informed data augmentation strategy using simulated data from the Inorganic Crystal Structure Database (ICSD) and experimental data. As a test case, 115 thin-film metal-halides spanning three dimensionalities and seven space groups are synthesized and classified. After testing various algorithms, we develop and implement an all convolutional neural network, with cross-validated accuracies for dimensionality and space group classification of 93 and 89%, respectively. We propose average class activation maps, computed from a global average pooling layer, to allow high model interpretability by human experimentalists, elucidating the root causes of misclassification. Finally, we systematically evaluate the maximum XRD pattern step size (data acquisition rate) before loss of predictive accuracy occurs, and determine it to be 0.16° 2θ, which enables an XRD pattern to be obtained and classified in 5.5 min or less.
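The class activation maps mentioned in the abstract can be illustrated with a small sketch. It assumes the network ends in a global average pooling layer followed by a single dense layer, so that the map for one class is the weighted sum of the last convolutional feature maps, using that class's dense-layer weights. The function names `class_activation_map` and `average_cam` are hypothetical, and averaging the maps over the patterns of a class is one plausible reading of "average class activation maps", not a confirmed detail.

```python
def class_activation_map(feature_maps, class_weights):
    """Compute CAM(x) = sum_c w_c * f_c(x) for a single class.

    feature_maps: list of C channels from the last convolutional layer,
                  each a list of L activations along the 2-theta axis.
    class_weights: list of C dense-layer weights connecting the pooled
                   channels to the class of interest.
    """
    length = len(feature_maps[0])
    return [
        sum(w * channel[x] for w, channel in zip(class_weights, feature_maps))
        for x in range(length)
    ]


def average_cam(cams):
    """Point-wise mean of several equal-length CAMs (assumed: one per pattern)."""
    n = len(cams)
    return [sum(values) / n for values in zip(*cams)]
```

Under these assumptions, the mean of a map along the 2θ axis equals the class logit minus its bias. Regions where the map is large are therefore the parts of the pattern that push the network toward that class, and comparing a misclassified pattern's map with the class average points to the peaks responsible.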



Original link: http://mp.weixin.qq.com/s?__biz=MzIwMjk1OTc2MA==&mid=2247498097&idx=4&sn=c94cd7bc5019296f6a88db14a03c547e&chksm=96d4078ea1a38e98fd991763ef84afb89daa56819a03daeb8959e62e83d149658f827d870704&scene=27#wechat_redirect
