About me

Mingjie Tang works at Ant Financial on AutoML systems and algorithms. Previously, he was a member of technical staff at Hortonworks, Inc. He has broad research interests in RDBMS, distributed machine learning systems, big data computation engines, and distributed deep learning systems.

He received his PhD from the Department of Computer Science at Purdue University, West Lafayette, Indiana, where he was advised by Professor Walid G. Aref. During his PhD studies, he worked on distributed systems for spatial computation, machine learning, and AI.

Education

2010.9 - 2016.9 PhD in Computer Science,
Purdue University, IN, USA
2010.9 - 2012.12 Master of Computer Science,
Purdue University, IN, USA
2007.9 - 2010.7 Master of Computer Science
University of Chinese Academy of Sciences, Beijing, China
2003.8 - 2007.7 Bachelor of Science, Department of Computer Science
Sichuan University, Chengdu, China

Industry Experience

2018.10 - present Software Engineer Ant Financial, CA, USA
2016.9 - 2018.10 Member of Technical Staff, Hortonworks, CA, USA
2015.5 - 2015.8 Research Intern, IBM Research Almaden, CA, USA
2012.5 - 2012.8 Software Engineer Intern, Microsoft, Seattle, USA

Publications (Selected)

Google Scholar, DBLP, Full list

  • A Demonstration of GPTuner: A GPT-Based Manual-Reading Database Tuning System.
    Jiale Lao, Yibo Wang, Yufei Li, Jianping Wang, Yunjia Zhang, Zhiyuan Cheng, Wanghu Chen, Yuanchun Zhou, Mingjie Tang, Jianguo Wang
    Demo in Proceedings of ACM Conference on Management of Data (SIGMOD), 2024.
  • GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization.
    Jiale Lao, Yibo Wang, Yufei Li, Jianping Wang, Yunjia Zhang, Zhiyuan Cheng, Wanghu Chen, Mingjie Tang, Jianguo Wang
    in Proceedings of Very Large Data Bases Conference (VLDB), 2024.
  • Couler: Unified Machine Learning Workflow Optimization in Cloud
    Xiaoda Wang, Yuan Tang, Tengda Guo, Bo Sang, Jingji Wu, Jian Sha, Ke Zhang, Jiang Qian, Mingjie Tang
    in 40th IEEE International Conference on Data Engineering (ICDE) 2024
  • Cougar: A General Framework for Jobs Optimization In Cloud
    Bo Sang, Shuwei Gu, Xiaojun Zhan, Mingjie Tang, Jian Liu, Xuan Chen, Jie Tan, Haoyuan Ge, Ke Zhang, Ruoyi Ruan, Wei Yan
    in 39th IEEE International Conference on Data Engineering (ICDE) 2023
  • STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data
    Guizhen Wang, Jingjing Guo, Mingjie Tang, José Florencio de Queiroz Neto, Calvin Yau, Anas Daghistani, Morteza Karimzadeh, Walid G. Aref, and David S. Ebert.
    in IEEE Conference on Visual Analytics Science and Technology (VAST 2020)
  • LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah Malluhi, and Mourad Ouzzani
    in Frontiers in Big Data, section Data Mining and Management 2020
  • A Natural-language-based Visual Query Approach of Uncertain Human Trajectories
    Zhaosong Huang, Ye Zhao, Wei Chen, Shengjie Gao, Kejie Yu, Weixia Xu, Mingjie Tang, Minfeng Zhu, Mingliang Xu
    in IEEE Transactions on Visualization and Computer Graphics (TVCG 2019)
  • SAC: A System for Big Data Lineage Tracking [code]
    Mingjie Tang, Saisai Shao, Weiqing Yang, Yanbo Liang, Yongyang Yu, Bikas Saha, Dongjoon Hyun
    in 35th IEEE International Conference on Data Engineering (ICDE) 2019
  • Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah Malluhi, and Mourad Ouzzani
    in 35th IEEE International Conference on Data Engineering (ICDE) 2019
  • Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster
    Ahmed R. Mahmood, Anas Daghistani, Ahmed M. Aly, Walid G. Aref, Mingjie Tang, Saleh M. Basalamah, Sunil Prabhakar
    in 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2018
  • Optimizing Generalized Linear Models with Billions of Variables [code]
    Yanbo Liang, Yongyang Yu, Mingjie Tang, Weiqing Yang, Weichen Xu, Chaozhuo Li and Ruifeng Zheng
    in ACM International Conference on Information and Knowledge Management (CIKM) 2018
  • Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah Malluhi, and Mourad Ouzzani
    in IEEE Transactions on Knowledge and Data Engineering (TKDE) 2018
  • SHC: Distributed Query Processing for Non-Relational Data Store [code]
    Weiqing Yang*, Mingjie Tang*, Yongyang Yu, Yanbo Liang, Bikas Saha
    in 34th IEEE International Conference on Data Engineering (ICDE) 2018
    * both are the leading authors
  • COACT: a query interface language for collaborative databases [link]
    K Mershad, QM Malluhi, M Ouzzani, Mingjie Tang, M Gribskov, WG Aref, Deo Prakash
    in Distributed and Parallel Databases 2017
  • AUDIT: approving and tracking updates with dependencies in collaborative databases [link]
    K Mershad, QM Malluhi, M Ouzzani, Mingjie Tang, M Gribskov, WG Aref
    in Distributed and Parallel Databases 2017
  • In-memory Distributed Matrix Computation Processing and Optimization [code]
    Yongyang Yu, Mingjie Tang, Walid Aref, Qutaibah Malluhi, Mostafa Abbas and Mourad Ouzzani
    in 33rd IEEE International Conference on Data Engineering (ICDE) 2017
  • LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data [code]
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah Malluhi, and Mourad Ouzzani
    in 42nd International Conference on Very Large Data Bases (VLDB) 2016
  • Atlas: On the Expression of Spatial-Keyword Group Queries Using Extended Relational Constructs (Systems Paper)
    Walid Aref, Ahmed Mahmood, Ahmed Aly, Mingjie Tang
    in 24th International Conference on Advances in Geographic Information Systems (SIGSPATIAL) 2016
  • Cruncher: Distributed In-Memory Processing for Location-Based Services
    Ahmed S. Abdelhamid, Mingjie Tang, Ahmed M. Aly, Ahmed R. Mahmood, Walid G. Aref, Saleh Basalamah
    in 32nd IEEE International Conference on Data Engineering (ICDE) 2016
  • Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce [pdf] [ppt] [code]
    Mingjie Tang, Yongyang Yu, Walid G. Aref, Qutaibah Malluhi, and Mourad Ouzzani
    in 18th International Conference on Extending Database Technology (EDBT) 2015
  • The Similarity-aware Relational Intersect Database Operator [pdf]
    Wadha J. Al Marri, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang and Walid G. Aref
    in 7th International Conference on Similarity Search and Applications (SISAP) 2014 Best Paper Award
    full journal version Information Systems 2015 [paper]
  • Similarity Group-by Operators for Multi-dimensional Relational Data [pdf] [code]
    Mingjie Tang, Ruby Y. Tahboub, Walid G. Aref, Mikhail Atallah, Qutaibah Malluhi, Mourad Ouzzani, Yasin N. Silva
    in IEEE Transactions on Knowledge and Data Engineering (TKDE) 2015
  • Bird Flu Outbreak Prediction via Satellite Tracking [pdf]
    YuanChun Zhou, Mingjie Tang, Weike Pan, Jinyan Li, Weihang Wang, Jing Shao, Liang Wu, Jianhui Li, Qiang Yang, BaoPing Yan
    in IEEE Intelligent Systems (IS) 2013
  • Exploring the Wild Bird's Role as Potential Vector for H5N1 Transmission by Clustering and Association Analysis [pdf]
    Mingjie Tang, YuanChun Zhou, Weihang Wang, Peng Cui, Jinyan Li, Yuan-Sheng Hou, BaoPing Yan
    in Knowledge and Information Systems (KAIS) 2010
  • Birds Bring Flues? Mining Frequent and High Weighted Cliques from Birds Migration Networks [pdf]
    Mingjie Tang, Weihang Wang, Yexi Jiang, YuanChun Zhou, Jinyan Li, Peng Cui, Ying Liu, BaoPing Yan
    in 15th International Conference on Database Systems for Advanced Applications (DASFAA) 2010
  • Discovery of Migration Habitats and Routes of Wild Bird Species by Clustering and Association Analysis [pdf]
    Mingjie Tang, Weihang Wang, YuanChun Zhou, Yexi Jiang, Jinyan Li, Peng Cui, Yuan-Sheng Hou, BaoPing Yan
    in 5th International Conference on Advanced Data Mining and Applications (ADMA) 2009 Best Application Paper Award.