I am an incoming Ph.D. student at Carnegie Mellon University (Fall 2019). My research focuses on data mining algorithms and applications, specifically outlier & anomaly detection, ensemble learning, and clustering. I am also an enthusiastic open-source developer: I build machine learning libraries and systems. For instance, I initiated the Python Outlier Detection library (PyOD) in 2018, which has become the most popular Python outlier detection toolkit.

I obtained my Master's and Bachelor's degrees from the University of Toronto (computer science) and the University of Cincinnati (computer engineering, with minors in mathematics and computer science) in 2017 and 2015, respectively.

I was a senior consultant at PwC Canada from 2017 to 2019, where I delivered multiple large-scale technology transformation projects to major financial institutions in Canada. I also completed 16 months of software engineering internships at Siemens PLM Software (USA) between 2012 and 2014.

This site is built with Google Sites; visitors from Mainland China need a VPN to access figures, hyperlinks, and PDF files. Sorry for any inconvenience this may cause.

Contact: zhaoy@cmu.edu | GitHub | LinkedIn

News

May 27th, 2019: Our paper on the anomaly detection toolkit PyOD has been published in the Journal of Machine Learning Research (JMLR).

Mar 7th, 2019: Our paper on music artist classification with deep neural networks has been accepted at the International Joint Conference on Neural Networks (IJCNN).

Feb 22nd, 2019: I received and accepted a Ph.D. offer from Carnegie Mellon University. See you in Pittsburgh soon :)

Dec 21st, 2018: Our paper on the outlier ensemble method LSCP has been accepted at the SIAM International Conference on Data Mining (SDM).

Research Interests

Data mining and knowledge discovery algorithms, systems, and applications, and their implications for decision processes and policy-making. Specifically, my interests are:

  1. proposing fundamental algorithms to tackle complex problems, including anomaly detection, ensemble learning, and clustering.
  2. designing scalable machine learning systems with performance optimization techniques, e.g., parallelization and just-in-time (JIT) compilation.
  3. combining data mining with other areas (such as healthcare and finance) to build applications and understand their implications for decision-making, public policy, social welfare, and society as a whole.

Research keywords: outlier & anomaly detection, outlier ensembles, ensemble learning, ML systems, clustering

I always wonder how emerging data techniques can be used to enhance existing decision-making and policy-making processes. For example, anomaly detection algorithms can reveal how pharmacies take advantage of the healthcare system by filing fraudulent claims, and a targeted policy can then be designed and enforced based on the identified data patterns.