About me

Mingjie Tang is a member of technical staff at Hortonworks, Inc. He works on Spark SQL, Spark MLlib, and Spark Streaming. He has broad research interests in database management systems, similarity query processing, data indexing, big data computation, data mining, and machine learning.

He received his PhD from the Department of Computer Science at Purdue University, West Lafayette, Indiana, where he was advised by Professor Walid G. Aref. During his PhD studies, he built a distributed in-memory spatial data processing system over Spark; more details can be found on the project webpage, LocationSpark. He also worked with Yongyang to build a distributed in-memory matrix computation system over Spark. The first version is available for download from the project website, MatCatalyzer.


Education

2010.9 - 2016.9 PhD in Computer Science
Purdue University, IN, USA
2007.9 - 2010.7 Master of Computer Science
University of Chinese Academy of Sciences, Beijing, China
2003.8 - 2007.7 Bachelor of Science, Department of Computer Science
Sichuan University, Chengdu, China

Industry Experience

2015.5 - 2015.8 Research Intern, IBM Research Almaden, CA, USA
2012.5 - 2012.8 Software Engineer Intern, Microsoft, Seattle, USA

Publications (Selected)

Google Scholar, DBLP, Full list

  • In-memory Distributed Matrix Computation Processing and Optimization [code]
    Yongyang Yu, Mingjie Tang, Walid Aref, Qutaibah Malluhi, Mostafa Abbas and Mourad Ouzzani
    in 33rd IEEE International Conference on Data Engineering (ICDE) 2017
  • LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data (Demo) [code]
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah Malluhi, and Mourad Ouzzani
    in 42nd International Conference on Very Large Data Bases (VLDB) 2016
  • Atlas: On the Expression of Spatial-Keyword Group Queries Using Extended Relational Constructs (Systems Paper)
    Walid Aref, Ahmed Mahmood, Ahmed Aly, Mingjie Tang
    in 24th International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2016
  • Cruncher: Distributed In-Memory Processing for Location-Based Services [Demo]
    Ahmed S. Abdelhamid, Mingjie Tang, Ahmed M. Aly, Ahmed R. Mahmood, Walid G. Aref, Saleh Basalamah
    in 32nd IEEE International Conference on Data Engineering (ICDE) 2016
  • Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce [pdf] [ppt] [code]
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah Malluhi, and Mourad Ouzzani
    in 18th International Conference on Extending Database Technology (EDBT) 2015
  • The Similarity-aware Relational Intersect Database Operator [pdf]
    Wadha J. Al Marri, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang and Walid G. Aref
    in 7th International Conference on Similarity Search and Applications (SISAP) 2014, Best Paper Award
    Full journal version: Information Systems 2015 [paper]
  • Similarity Group-by Operators for Multi-dimensional Relational Data [pdf] [code]
    Mingjie Tang, Ruby Y. Tahboub, Walid G. Aref, Mikhail Atallah, Qutaibah Malluhi, Mourad Ouzzani, Yasin N. Silva
    in IEEE Transactions on Knowledge and Data Engineering (TKDE) 2015
  • Bird Flu Outbreak Prediction via Satellite Tracking [pdf]
    YuanChun Zhou, Mingjie Tang, Weike Pan, Jinyan Li, Weihang Wang, Jing Shao, Liang Wu, Jianhui Li, Qiang Yang, BaoPing Yan
    in IEEE Intelligent Systems (IS) 2013
  • Exploring the Wild Bird's Role as Potential Vector for H5N1 Transmission by Clustering and Association Analysis [pdf]
    Mingjie Tang, YuanChun Zhou, Weihang Wang, Peng Cui, Jinyan Li, Yuan-Sheng Hou, BaoPing Yan
    in Knowledge and Information Systems (KAIS) 2010
  • Birds Bring Flues? Mining Frequent and High Weighted Cliques from Birds Migration Networks [pdf]
    Mingjie Tang, Weihang Wang, Yexi Jiang, YuanChun Zhou, Jinyan Li, Peng Cui, Ying Liu, BaoPing Yan
    in 15th International Conference on Database Systems for Advanced Applications (DASFAA) 2010
  • Discovery of Migration Habitats and Routes of Wild Bird Species by Clustering and Association Analysis [pdf]
    Mingjie Tang, Weihang Wang, YuanChun Zhou, Yexi Jiang, Jinyan Li, Peng Cui, Yuan-Sheng Hou, BaoPing Yan
    in 5th International Conference on Advanced Data Mining and Applications (ADMA) 2009, Best Application Paper Award