News featuring Naren Ramakrishnan

Research award aims to develop new algorithms for information extraction and understanding from scholarly literature

Naren Ramakrishnan, Director of DAC and Professor in the Department of Computer Science

The Discovery Analytics Center has received a research award from the Center for Security and Emerging Technology (CSET) at Georgetown University to support data-informed analysis for policymakers concerning emerging technologies and their security implications. DAC will develop methods that use full-text analysis of publications to extract novel insights at scale, to better understand emerging technologies and their prevalence, spatial and temporal trends, and relationships.

“Algorithmic components developed by DAC will go into a high-performance pipeline that enables inspection of extracted patterns as well as the lineage of data transformations underlying the patterns,” said Naren Ramakrishnan, the Thomas L. Phillips Professor of Engineering and DAC director, who is the principal investigator for the project.

Ramakrishnan’s team at DAC — which includes senior research associate Patrick Butler; research associate Brian Mayer; and three Ph.D. students — will develop a machine learning framework based on weak supervision to process full-text AI publications into extracted structured fields, such as information on computational platforms utilized, language and library dependencies, compute time, research methods, objective tasks, and links to source code and data resources.

The initial focus will be on arXiv, where researchers will evaluate and assess progress, followed by extraction from China National Knowledge Infrastructure (CNKI) literature, which provides full-text articles from more than 8,000 Chinese journals covering natural sciences, engineering, technology, agriculture, medicine, and selected topics in economics and social sciences.

The project gives DAC the opportunity to build on its prior work in extracting information from news articles about civil unrest events. It will also draw on DAC’s experience with automated extraction of epidemiological line lists from disease reports, which will inform the development of custom word embeddings aimed at recognizing the typical language patterns used to describe computational details in the scholarly literature.

“This project brings together machine learning, computational linguistics, and human-computer interaction capabilities to extract features at scale. The information we extract will be mapped over time to help identify key trends and potential gaps that can support analysts and policymakers at CSET,” said Ramakrishnan.

“We are looking forward to seeing how this innovative work can help inform CSET’s analysis as we strive to inform the future of AI policy,” said Dewey Murdick, director of Data Science at CSET.




Discovery Analytics Center study sheds light on what turns a peaceful protest into a violent one

Protest in Brazil

Protests are an increasingly common occurrence, but only a small percentage of them turn violent. In a collaborative study led by the Discovery Analytics Center with the University of California, San Diego, and George Mason University, a team of researchers set out to uncover triggers that foretell violence by crowds.

Gathering data from thousands of online news sources in five Latin American countries — Argentina, Brazil, Colombia, Paraguay, and Venezuela — the researchers used the characteristics of past events to develop new methods that forecast the occurrence of violent crowd behavior in advance.

“Crowd violence is not generally initiated by one factor but, often, is a culmination of outrage over a stream of preceding unresolved public issues or events,” said Yue Ning, lead author of the study, who was a computer science Ph.D. student at the Discovery Analytics Center at the time of the study and is now an assistant professor at Stevens Institute of Technology. “Our study showed that before a violent protest in any of these countries, other protests and strike events, even if peaceful, occurred during the prior week.”

“The fact that violent protest can be modeled before it happens is an important finding of the study,” said David Mares, Institute of the Americas Chair for Inter-American Affairs and professor of political science at the University of California, San Diego. “The link between the act of protesting and violent behavior in a protest has been difficult to understand because so many factors are operating at the same time. Our model gives us confidence that it will be possible to develop a better understanding of the factors that transform peaceful protest into violent confrontations.”

The study was designed to give governments, law enforcement, and community organizations insights that can help them support the right to peaceful assembly, ease the frustration and anger felt by people who have taken part in many recent protests, and reduce the risk of violence.

“Being able to forecast violent events can help policymakers make better decisions about how to deal with protests,” said Naren Ramakrishnan, the Thomas L. Phillips Professor of Engineering in the Department of Computer Science and director of the Discovery Analytics Center. “And understanding triggers is important because any effort to decrease the probability of a violent gathering without understanding the dynamics that differentiate violent from non-violent events can lead to measures that have the opposite effect.”

For example, he said, a significant show of force by police or the military at the first sign of protest can intimidate and frustrate protesters rather than make them feel protected. If such intimidation and frustration build into anger, the likelihood of violence increases at the next such gathering.

Huzefa Rangwala, professor of computer science at George Mason University, said the study also showed that events can be influenced by what is happening in different locations. “One might have thought that people would be most affected by what happens locally, but our data suggests that those protesters prone to violence reflect upon national and not just local experiences when voicing grievances and increasing frustrations that lead to violence.”

In addition to Ning, Mares, Ramakrishnan, and Rangwala, the research team included Sathappan Muthiah, a Ph.D. student in the Discovery Analytics Center majoring in computer science.

Read the full study, “When do Crowds turn Violent? Uncovering Triggers from Media.”




DAC and UrbComp actively participating at KDD 2018 with conference organization and research presentations


The Discovery Analytics Center and the Urban Computing Certificate Program (funded through a National Science Foundation traineeship grant and administered through DAC) will be well represented at the 24th Annual Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining (KDD 2018) conference in London, August 19-23.

The overall theme of this year’s conference is data mining for social good.

Chandan Reddy, associate professor of computer science and DAC faculty, served as a poster co-chair for the KDD conference.

Naren Ramakrishnan, the Thomas L. Phillips Professor of Engineering and DAC director, served on the senior program committee for the KDD research track.

Aditya Prakash, assistant professor of computer science and DAC faculty, served on the committee for Health Day at KDD, held in conjunction with the conference, and is one of four organizers of epiDAMIK: Epidemiology meets Data Mining and Knowledge Discovery, a Health Day workshop.

This workshop serves as a forum to discuss new insights into how data mining can play a bigger role in epidemiology and public health research. While the integration of data science methods into epidemiology has significant potential, it remains understudied, Prakash said.

The goal of the workshop is to raise the profile of this emerging research area of data-driven and computational epidemiology and create a venue for presenting state-of-the-art and in-progress results — in particular, results that would otherwise be difficult to present at a major data mining conference, including lessons learned in the “trenches.”

The paper “Forecasting the Flu: Designing Social Network Sensors for Epidemics” (B. Aditya Prakash; Naren Ramakrishnan; Huijuan Shao, K.S.M. Tozammel Hossain, and Hao Wu, all DAC Ph.D. alumni; Madhav Marathe, professor of computer science and director of the Network Dynamics and Simulation Science Lab (NDSSL) at Virginia Tech; Anil Vullikanti, associate professor of computer science at NDSSL; and Maleq Khan, assistant professor at Texas A&M University) will be presented at the epiDAMIK workshop by Prakash and Vullikanti.

An Urban Computing workshop is also scheduled in conjunction with KDD 2018. The workshop aims to give professionals, researchers, and technologists a single forum in which to discuss and share the state of the art in urban computing development and applications, present their ideas and contributions, and set future directions for innovative research in urban computing. It is aimed particularly at those interested in sensing, mining, and understanding urban data in order to tackle challenges in cities and help shape the future of cities.

The following posters from DAC have been accepted for presentation at the workshop:

Additionally, a DAC alumnus, Prithwish Chakraborty, is running a third workshop taking place during the conference, Machine Learning for Medicine and Healthcare (MLMH).

DAC Student Spotlight: Yue Ning

Yue Ning, DAC Ph.D. student in computer science

“Working in data science and machine learning is exciting, but it is even more exciting when science helps us solve real-world challenges,” said Yue Ning, a Ph.D. student in the computer science department.

The opportunity to be involved in high impact research drew Ning to Virginia Tech and DAC. “I am fortunate and honored to be working with Dr. Naren Ramakrishnan, who is one of the leading researchers in data analytics and applied machine learning,” she said.

Ning’s interest in computer science evolved from her love of math and puzzles in elementary school.

“When I first discovered the computer, I was attracted to the beauty of its processing power and multiple fascinating functions. Without a doubt, I chose to study computer software when I enrolled in college,” Ning said. “And that is when social media really took off.”

Since then, she said, the world has become more and more connected, generating accessible data at massive scales. Data-driven models are motivated by, and have contributed to, many domains including social informatics, security, games and health.

“I believe in data and find myself especially interested in data-driven machine learning and AI applications. The area has provided tons of opportunities for computer scientists to explore with the help of innovative algorithms. I am always excited to learn cutting-edge theories, models, and applications in this big data era,” Ning said.

Her research focuses on applying machine learning algorithms to solve real-world problems such as forecasting societal events and predicting users’ behaviors in online services. Ning’s thesis is about discovering precursors for use in event modeling and forecasting. A key problem of interest to social scientists and policymakers is modeling and forecasting large-scale societal events such as civil unrest, disease outbreaks, and turmoil in economic markets. Forecasting algorithms are expected not only to make accurate predictions but also to provide insights into the causative attributes that influence an event’s evolution.

“With the machine learning paradigm known as multi-instance learning I have been studying and developing frameworks that discover event precursors,” said Ning. “Using large-scale distributed representations of news articles and multi-task learning, I can demonstrate how this framework can provide clues into the spatio-temporal progression of events.”

Ning, who received a master’s degree in computer science and applications from the Graduate University of Chinese Academy of Sciences, expects to graduate in summer 2018 and join the Department of Computer Science at Stevens Institute of Technology as an assistant professor in the fall.

Among her other accomplishments as a Ph.D. student, Ning earlier this year received a Student Travel Award to attend the SIAM International Conference on Data Mining, was invited to serve on the program committee for the Advances in Social Networks Analysis and Mining (ASONAM) conference, and had a paper accepted by the ACM Transactions on Knowledge Discovery from Data (TKDD).

DAC has strong presence at ICDM 2017

DAC Ph.D. student Zhiqian Chen presenting his paper at ICDM 2017.

The Discovery Analytics Center was strongly represented at the IEEE International Conference on Data Mining (ICDM) in New Orleans, Nov. 18-21, with a number of accepted research papers by DAC faculty and students, as well as DAC faculty serving on committees and panels.

Research papers accepted for the conference include:

DAC faculty participation in the ICDM conference included Chang-Tien Lu serving on the program committee and Naren Ramakrishnan serving as an area chair. Ramakrishnan also co-chaired a panel on ethical and professional issues in working with social data; Tanushree (Tanu) Mitra, assistant professor of computer science, was one of the panelists. B. Aditya Prakash was invited to participate as a mentor in the ICDM Ph.D. Forum.

The ICDM has established itself as the world’s premier research conference in data mining. It provides an international forum for the presentation of original research results, as well as the exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of data mining, including algorithms, software and systems, and applications. ICDM draws researchers and application developers from a wide range of areas related to data mining, such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems, and high-performance computing. By promoting novel, high-quality research findings and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state of the art in data mining. Besides the technical program, the conference features workshops, tutorials, and panels.