I am an incoming Ph.D. student at Carnegie Mellon University (Fall 2019). My research focuses on data mining algorithms and applications, specifically on outlier & anomaly detection, ensemble learning, and clustering.

I am an enthusiastic open-source developer who builds machine learning libraries and systems. In 2018, I initiated the Python Outlier Detection (PyOD) project, which has become the most popular Python outlier detection toolkit. In July 2019, I also initiated combo, a Python toolbox for machine learning combination methods; it is currently under active development. Watches, stars, and follows are welcome!

I am a dedicated technical writer with 75,000 followers on Zhihu (知乎), often called the "Chinese Quora," which has more than 200 million registered users. I have been recognized as a Top Zhihu Writer (优秀回答者) in four fields: artificial intelligence, machine learning, data mining, and statistics. See my Zhihu page here.

I received my master's degree in computer science from the University of Toronto in 2017 and my bachelor's degree in computer engineering, with minors in mathematics and computer science, from the University of Cincinnati in 2015. I also spent three happy years (2007-2009) at Shanxi Experimental Secondary School (山西省实验中学).

From 2017 to 2019, I was a senior consultant at PwC Canada, where I delivered multiple large-scale technology transformation projects for major Canadian financial institutions. Earlier, from 2012 to 2014, I completed a 16-month software engineering internship at Siemens PLM Software (USA).

This site is built with Google Sites, so visitors from Mainland China may need a VPN to access figures, hyperlinks, and PDF files. Sorry for any inconvenience.

Contact: zhaoy@cmu.edu | GitHub | LinkedIn

News & Travel

July 2019: I started combo, a new Python toolbox that makes combination methods in machine learning easy to use.

Jun & Jul 2019: I am on vacation in China with limited bandwidth :)

May 27th, 2019: Our paper on PyOD, an anomaly detection toolkit, is published in the Journal of Machine Learning Research (JMLR).

Mar 7th, 2019: Our paper on music artist classification with deep neural networks is accepted at the International Joint Conference on Neural Networks (IJCNN).

Feb 22nd, 2019: I received and accepted a Ph.D. offer from Carnegie Mellon University. See you in Pittsburgh soon :)

Dec 21st, 2018: Our paper on LSCP, an outlier ensemble method, is accepted at the SIAM International Conference on Data Mining (SDM).

Research Interests

Data mining and knowledge discovery algorithms, systems, and applications, and their implications for decision-making and policy-making. Specifically, my interests are:

  1. proposing fundamental algorithms to tackle complex problems, including anomaly detection, ensemble learning, and clustering.
  2. designing scalable machine learning systems with performance optimization techniques, e.g., parallelization and just-in-time (JIT) compilation.
  3. combining data mining with other areas (such as healthcare and finance) to build applications and understand their implications for decision-making, public policy, social welfare, and society as a whole.

Research keywords: outlier & anomaly detection, outlier ensembles, ensemble learning, ML systems, clustering

I always ask myself, "How can emerging data techniques enhance existing decision-making and policy-making processes?" For example, anomaly detection algorithms can reveal how pharmacies exploit the healthcare system by filing fraudulent claims, and the identified data patterns can then be used to design and impose a targeted policy.