Tian Shi, DAC Ph.D. student in computer science

When Chandan Reddy, associate professor in computer science, joined the DAC faculty in the National Capital Region in August 2016, one of his Ph.D. students, Tian Shi, moved right along with him.

“I feel very lucky to be Dr. Reddy’s student. He has helped me very much in both my research and life,” said Shi.

A Ph.D. in computer science will be the second Ph.D. for Shi. His first, from Wayne State, is in physical chemistry.

Shi’s research was in theoretical and computational chemistry built upon quantum mechanics, statistical physics, and ab initio calculations. Various projects led him to computer science, where he found an interest in data mining, machine learning, and data visualization.

“There are many opportunities in this interdisciplinary area, such as applying machine learning to traditional computational chemistry,” said Shi. “During my Ph.D. studies in computer science I will focus on my research projects in text mining and will be trying to apply what I have learned in physical chemistry to data mining.”

Shi is interested in developing new algorithms to discover knowledge from text data. One of his current research projects involves topic modeling, a powerful tool for discovering hidden semantic structures in a collection of text documents.

“Every day, large numbers of short texts are generated, such as tweets, search queries, questions, image tags, and ad keywords, and they play an important role in our daily lives,” said Shi. “Discovering knowledge from them is an interesting and challenging research focus because short texts consist of only a few words and they are arbitrary, noisy, and ambiguous.”

More conventional methods are designed to discover topics in long documents and have difficulty capturing the semantics of short texts, which lack abundant word correlations, Shi said.

The non-negative matrix factorization-based algorithm he proposes in his research tries to tackle this problem by leveraging a recently advanced word embedding technique. The proposed models have achieved significant improvement in quality over conventional methods in terms of word coherence and document representation. A paper he collaborated on about this research has been accepted by the WWW 2018 conference, to be held in Lyon, France, next week.

“I have benefited greatly from Dr. Reddy, who guided me to this area of research and shared a lot of his knowledge with me,” said Shi. “I have also benefited from discussions with my colleagues, and from group meetings and seminars. All have helped me gain comprehensive knowledge and deeper understanding of the research areas I am interested in.”