I am an incoming Ph.D. student at Carnegie Mellon University (Fall 2019). My research focuses on data mining algorithms and applications, specifically on outlier & anomaly detection, ensemble learning, and active learning. I am also an enthusiastic open-source developer: I BUILD machine learning libraries and systems.

I obtained my Master and Bachelor degrees from University of Toronto (computer science) and University of Cincinnati (computer engineering, minors in mathematics and computer science) in 2017 and 2015, respectively. I was as a senior technology consultant at PwC Canada from 2017 to 2019; I delivered multiple large-scale technology transformation projects to major financial institutions in Canada. Besides, I finished 16 month software engineering internship at Siemens PLM Software (USA) from 2012 to 2014.

本站使用Google Sites创建,中国内地访客需使用VPN来获取图片、链接、PDF文档,不便请见谅。This site is built with Google Sites; the visitors (IPs) from Mainland China need VPN to access figures, hyperlinks, and PDF files. Sorry for any inconvenience this may cause.

Contact: zhaoy@cmu.edu | GitHub | LinkedIn

Research Interests

Data mining and knowledge discovery algorithms, systems, applications, and their implications to decision process and policy-making. Specifically, my interests are:

  1. proposing fundamental algorithms to tackle complex problems, including anomaly detection, ensemble learning, and clustering.
  2. designing scalable machine learning systems with performance optimization instruments, e.g., parallelization and JIT.
  3. marrying data mining with other areas (such as healthcare and finance) to build applications and understand their implications to decision making, public policy, social welfare, and society as a whole.

Research keywords: outlier & anomaly detection, outlier ensembles, ensemble learning, ML systems

I always wonder "how to use emerging data techniques to enhance existing decision and policy making processes ." For example, anomaly detection algorithms can reveal how pharmacies are taking advantage of the healthcare system by filing fraudulent claims, and a targeted policy can therefore be designed and imposed by using the identified data patterns.