Sanghani Center Student Spotlight: Shengzhe Xu

Graphic is from the paper “STAN: Synthetic Network Traffic Generation with Generative Neural Models”

Shengzhe Xu chose to pursue a Ph.D. in computer science at Virginia Tech because the Sanghani Center offered him the opportunity to investigate cutting-edge research challenges and to find ways of applying new methodologies to real-world problems.

“What I like best about the center is that everyone is encouraged to pursue their own areas of interest,” said Xu, who is advised by the center’s director, Naren Ramakrishnan. “As students in this free scientific research environment, we just need to concentrate on improving ourselves and conduct in-depth research on the topics we choose.” 

Xu’s work explores semantic analysis of tabular data as well as synthetic tabular data generation. “A real-world example of this is network traffic data,” he said. “Every operation on the Internet is recorded like a footprint that we can model by using deep learning methods.”

But capturing the semantics of tabular data is a challenging problem. Unlike data in traditional natural language processing and computer vision, tabular data is difficult for humans, even domain experts, to judge as a whole because it contains complex dependencies that need to be explored in depth.

“Deep learning models have achieved great success in recent years, but progress in some domains like cybersecurity is stymied by a paucity of realistic datasets. For privacy reasons, organizations are reluctant to share such data, even internally,” he said. “To keep training data from being leaked, it is important to explore how to generate tabular data that is good enough in terms of both training performance and privacy protection.”

Xu presented his work on “STAN: Synthetic Network Traffic Generation with Generative Neural Models” at the MLHat Workshop on Deployable Machine Learning for Security Defense during the 2021 SIGKDD Conference on Knowledge Discovery and Data Mining. The paper explored synthetic data generation for real-world network traffic flow data as a way to protect sensitive data from leakage.
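As a rough illustration of the kind of synthetic tabular data generation described above, the sketch below fits a simple Gaussian-copula baseline (not the STAN neural model from the paper) to a numeric table and samples new rows that mimic both the per-column distributions and the cross-column dependence. The flow-like columns in the toy example are made up for illustration.

```python
# Minimal Gaussian-copula sketch for synthetic tabular data generation.
# This is a simple baseline, not the STAN model described in the paper.
import numpy as np
from scipy.stats import norm

def fit_and_sample(real, n_samples, seed=0):
    """Fit a Gaussian copula to a numeric table and draw synthetic rows."""
    rng = np.random.default_rng(seed)
    n, d = real.shape
    # 1. Rank-transform each column to (0, 1), then to standard-normal scores.
    u = (real.argsort(axis=0).argsort(axis=0) + 0.5) / n
    z = norm.ppf(u)
    # 2. Estimate the correlation structure of the normal scores.
    corr = np.corrcoef(z, rowvar=False)
    # 3. Sample correlated normal scores and map them back through each
    #    column's empirical quantile function.
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u_new = norm.cdf(z_new)
    return np.column_stack(
        [np.quantile(real[:, j], u_new[:, j]) for j in range(d)]
    )

# Toy usage with two correlated flow-like attributes (bytes and packets).
rng = np.random.default_rng(1)
bytes_col = rng.lognormal(mean=8, sigma=1, size=1000)
packets_col = bytes_col / 500 + rng.normal(0, 0.5, size=1000)
real = np.column_stack([bytes_col, packets_col])
synthetic = fit_and_sample(real, n_samples=1000)
print(np.corrcoef(synthetic, rowvar=False)[0, 1])  # close to the real correlation
```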

Projected to graduate in 2024, Xu hopes to continue his research as an industry professional.


Sanghani Center Student Spotlight: Afrina Tabassum

Graphic is from the paper “Hard Negative Sampling Strategies for Contrastive Representation Learning”

Afrina Tabassum, a Ph.D. student in computer science, was attracted to the Sanghani Center by the faculty's cutting-edge research on improving machine learning algorithms and applying them to other fields.

Her research interests lie in machine learning and self-supervised learning, particularly designing novel representation learning objectives for multi-modal data. “I was really attracted to this area of research by an urge to use deep learning in order to make people’s lives easier,” she said.

One of the projects Tabassum is working on at the Sanghani Center is “Hard Negative Sampling Strategies for Contrastive Representation Learning,” a collaboration with her advisors, Hoda Eldardiry and Ismini Lourentzou, and a fellow Ph.D. student.

Their paper introduces Uncertainty and Representativeness Mixing (UnReMix) for contrastive training, a method that combines importance scores that capture model uncertainty, representativeness, and anchor similarity. 

“We verify our method on several visual, text and graph benchmark datasets and perform comparisons over strong contrastive baselines,” said Tabassum, “and to the best of our knowledge, we are the first to consider representativeness for hard negative sampling in contrastive learning in a computationally inexpensive way.”

Experimental and qualitative results so far have demonstrated the effectiveness of their proposed approach, she said.
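A rough sketch of how such a combined score could be computed appears below. It is a paraphrase of the idea rather than the authors' released implementation; the per-candidate uncertainty input, the mean-similarity proxy for representativeness, and the equal weighting are assumptions made for illustration.

```python
# Sketch of mixing three importance scores -- model uncertainty,
# representativeness, and anchor similarity -- to pick hard negatives for
# contrastive training. Weights and the density proxy are illustrative only.
import torch
import torch.nn.functional as F

def mixed_negative_scores(anchor, candidates, uncertainty,
                          alpha=1.0, beta=1.0, gamma=1.0):
    """anchor: (d,) embedding; candidates: (n, d) embeddings;
    uncertainty: (n,) per-candidate uncertainty estimates."""
    anchor = F.normalize(anchor, dim=0)
    candidates = F.normalize(candidates, dim=1)
    sim_to_anchor = candidates @ anchor                            # anchor similarity
    representativeness = (candidates @ candidates.T).mean(dim=1)   # density proxy
    return alpha * uncertainty + beta * representativeness + gamma * sim_to_anchor

def sample_hard_negatives(anchor, candidates, uncertainty, k=8):
    """Return indices of the k highest-scoring (hardest) negatives."""
    scores = mixed_negative_scores(anchor, candidates, uncertainty)
    return torch.topk(scores, k).indices

# Toy usage with random embeddings and random uncertainty estimates.
torch.manual_seed(0)
anchor = torch.randn(128)
candidates = torch.randn(256, 128)
uncertainty = torch.rand(256)
print(sample_hard_negatives(anchor, candidates, uncertainty, k=8))
```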

Tabassum is also part of a team from Lourentzou’s PLAN Lab which is competing in the Alexa Prize Taskbot Challenge 2.

“Ten teams across the world were selected to build a taskbot to assist in cooking and performing other tasks around the house. Our bot will be able to make adaptable conversation a reality by allowing customers to follow personalized decisions through the completion of multiple sequential subtasks and adapt to the tools, materials, or ingredients available to the user by proposing appropriate substitutes and alternatives,” she said.

In addition to working on adapting instructions according to user needs, she is serving as student team leader with responsibilities that include setting clear team goals and short-term deadlines and delegating tasks among all the team members.

Projected to graduate in 2024, Tabassum would like to pursue a career in industry research.


Dawei Zhou receives Cisco Faculty Research Award to help combat destructive insider threats to cybersecurity

Dawei Zhou

Insider threats to cybersecurity can occur when an actor with authorized access to an organization’s network conducts malicious activities that expose the organization’s critical information, resulting in severe consequences such as financial loss, system crashes, and national security challenges.

“These threats are on the rise, and according to a recent cybersecurity survey, 27 percent of cybercrime incidents involved insiders,” said Dawei Zhou, an assistant professor in the Department of Computer Science, director of the Virginia Tech Learning on Graphs (VLOG) Lab, and core faculty at the Sanghani Center for Artificial Intelligence and Data Analytics.

One of Zhou’s projects, “Combating Insider Threat: Identification, Monitoring, and Data Augmentation,” targets the challenging problem of how to combat insider threats. He recently received a 2023-2024 Cisco Faculty Research Award that will help support this research.

Zhou said his project uses multiple dynamic and heterogeneous data sources that include internal system logs, employee networks, and email exchange networks.

“Unlike other types of attacks, insider threats exhibit several unique challenges, such as rarity, non-separability, label scarcity, dynamicity, and heterogeneity, making it extremely difficult to catch them in time for a successful counterattack,” said Zhou.

He explains:

  • Rarity: the absolute number of such insiders is extremely small, especially compared with the total number of employees in a large organization or company.

  • Non-separability: insiders are very good at camouflaging themselves so that they are indistinguishable from normal users and can bypass the detection system.

  • Label scarcity: the process of annotating insiders is labor-intensive and time-consuming.

  • Dynamicity: the raw input data sources, as well as the behaviors of insiders, evolve over time.

  • Heterogeneity: the data comes from various sources and in various formats.

“Although different insiders are often conscious and good at camouflaging themselves, they might share some common traits if examined under the proper lens,” he said.

With this in mind, the project will try to combat insider threats via an interactive learning mechanism, building new theories and algorithms for the following learning tasks:

  • Insider Identification: characterize the descriptive and essential properties of insiders and detect groups of insiders (such as traitors, masqueraders, and unintentional perpetrators) with common traits.

  • Insider Monitoring: track the evolution of insider behaviors over time and provide a visual system for analysis, annotation, and diagnosis.

  • Data Augmentation: sanitize input data by completing missing data and cleaning noisy data, and generate synthetic insiders to alleviate the label scarcity issue (one simple way to synthesize such examples is sketched below).
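As a rough illustration of the data augmentation idea, the sketch below synthesizes extra minority-class (insider) examples by SMOTE-style interpolation between known insiders and their nearest neighbors. The features and parameters are made up, and this is not necessarily the augmentation method the project will use.

```python
# SMOTE-style interpolation sketch for synthesizing minority-class (insider)
# examples; illustrative only, not the project's method.
import numpy as np

def interpolate_minority(insiders, n_new, k=3, seed=0):
    """insiders: (m, d) feature matrix of labeled insider examples (m > k).
    Returns (n_new, d) synthetic samples on segments between neighbors."""
    rng = np.random.default_rng(seed)
    # Pairwise distances, with self-distances masked, to find k nearest neighbors.
    dists = np.linalg.norm(insiders[:, None, :] - insiders[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, insiders.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(insiders))          # pick a real insider
        j = neighbors[i, rng.integers(k)]        # pick one of its neighbors
        lam = rng.random()                       # interpolation coefficient
        synthetic[s] = insiders[i] + lam * (insiders[j] - insiders[i])
    return synthetic

# Toy usage: 5 known insiders with 4 behavioral features -> 20 synthetic ones.
known_insiders = np.random.default_rng(1).normal(size=(5, 4))
print(interpolate_minority(known_insiders, n_new=20).shape)  # (20, 4)
```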

Computer science Ph.D. students Shuaicheng Zhang and Haohui Wang, who are advised by Zhou, will be working with him on the project. A third student, Weije Guan, will be joining the team in the Fall semester.

“We hope that the innovative approach we are taking will result in a better understanding of how to counterattack these threats and ultimately decrease the number of cybercrimes,” Zhou said. 


Virginia Tech researchers receive National Science Foundation award to secure vegetable production in a changing environment

The research team is developing climate-smart, economically efficient, and environmentally sustainable precision agricultural practices that enable more effective and adaptive decision-making as part of our nation’s agricultural priorities. Photo courtesy of USDA.

Virginia Tech researchers in the Center for Advanced Innovation in Agriculture (CAIA) and the Virginia Tech Applied Research Corporation (VT-ARC) were awarded a $750,000 grant by the National Science Foundation Convergence Accelerator program to enhance vegetable production and food security in the commonwealth.

The Sanghani Center for Artificial Intelligence and Data Analytics is a partner on this project. Read full story here.


Lenwood Heath collaborating on plant genome research project funded by National Science Foundation grant

Lenwood Heath

Lenwood Heath, a professor in the Department of Computer Science and core faculty at the Sanghani Center, is part of a team that recently received a National Science Foundation (NSF) grant for its plant genome research project, “Unraveling the origin of vegetative desiccation tolerance in vascular plants.” Heath is collaborating with colleagues from Texas Tech University and the University of Nevada, Reno, on the study.

Excessive water loss is lethal for most plants, but a minority of plants (known as resurrection plants) have a remarkable ability to survive almost complete dryness, said Heath. This ability, known as desiccation tolerance, relies upon a combination of physiological, biochemical, and molecular responses that allow the plant to preserve cell integrity in the dry state.

“In the context of climate change,” Heath said, “we feel it is important to understand how plants respond to drying out and especially important to develop the science that will allow crops to better tolerate drought.”

“It is believed that this resurrection capability depends on genes that are in all plants but lost by most over evolutionary time,” Heath said. “The aim of our project is to discover the essential differences in genetic responses between resurrection plants and drought-sensitive plants so that crops can be re-engineered to be more drought tolerant.”

In addition to sophisticated biological experiments to measure gene response in the two kinds of plants, the project will employ machine learning techniques, led by Heath, to construct gene regulatory networks (GRNs) for comparative study.  
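For context, one common machine learning route to constructing GRNs from expression data is a GENIE3-style regression approach, sketched below under the assumption of a samples-by-genes expression matrix. This is an illustration of the general technique, not necessarily the method the project will use.

```python
# GENIE3-style sketch of gene regulatory network inference: for each target
# gene, fit a random forest on all other genes' expression and read feature
# importances off as putative regulator -> target edge weights.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def infer_grn(expression, n_trees=200, seed=0):
    """expression: (samples, genes) matrix. Returns a (genes, genes) matrix W
    where W[i, j] scores the putative regulatory edge gene i -> gene j."""
    n_genes = expression.shape[1]
    weights = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        regulators = np.delete(np.arange(n_genes), target)
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(expression[:, regulators], expression[:, target])
        weights[regulators, target] = model.feature_importances_
    return weights

# Toy usage: gene 2 is driven by gene 0, so the 0 -> 2 edge should score high.
rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 4))
expr[:, 2] = 0.9 * expr[:, 0] + 0.1 * rng.normal(size=100)
print(np.round(infer_grn(expr), 2))
```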

The grant will provide learning and professional opportunities to graduate students and postdocs at the three universities. Jingyi Zhang, a Ph.D. computer science student advised by Heath, will work with him on the project.

Long-term goals for the project include promoting conservation programs for resurrection species; providing diverse scientific workforce training and outreach activities to first-generation students and the general public; and increasing public awareness about the importance of vegetative desiccation tolerance to future crop breeding in order to tackle the effects of climate change. 


Sanghani Center Student Spotlight: Raquib Bin Yousuf


Graphic is from the paper “Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn’t, and Future Directions”

Raquib Bin Yousuf, a Ph.D. student in computer science, is exploring the capabilities of large language models to generate text from different forms of data, especially from knowledge graphs. 

A knowledge graph, he said, is a network of entities and the relationships among them in any domain. Generating a correct and helpful narrative from a knowledge graph is an important task for users in that domain.
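As a rough illustration of generating narrative text from a knowledge graph with a language model, the sketch below linearizes a few triples into a prompt and hands it to a pretrained causal model. The triples, prompt format, and choice of GPT-2 via the Hugging Face pipeline are assumptions made for the example, not Yousuf's actual setup.

```python
# Minimal sketch of knowledge-graph-to-text generation: serialize
# (subject, relation, object) triples into a prompt for a language model.
from transformers import pipeline

triples = [
    ("Sanghani Center", "located at", "Virginia Tech"),
    ("Sanghani Center", "focuses on", "artificial intelligence and data analytics"),
    ("Raquib Bin Yousuf", "studies", "natural language processing"),
]

# Linearize the graph into plain text for the model to condition on.
prompt = "Facts: " + "; ".join(f"{s} {r} {o}" for s, r, o in triples) + ". Summary:"

generator = pipeline("text-generation", model="gpt2")  # any causal LM could be used
result = generator(prompt, max_new_tokens=60, num_return_sequences=1)
print(result[0]["generated_text"])
```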

“Although my research focus is on natural language processing, I have been fortunate while at the Sanghani Center to work in some other multidisciplinary domains as well,” he said. “The excellent and diverse work of the faculty is what attracted me to the center and the exposure I have had to real-world problems in these collaborative projects has helped me to learn more and conduct better research.”

Yousuf’s first exposure to his research area was through information retrieval projects on large-scale text data during his undergraduate years.

He has also worked on knowledge extraction projects under the supervision of his advisor, Naren Ramakrishnan, which have involved applying natural language processing methods to large-scale scholarly articles.

“Recently there has been a pivotal innovation in NLP in the form of the Transformer model and the subsequent development of large language models,” Yousuf said. “Today’s large language models can work well across many tasks with little to no help at all, and that has motivated me to look deeply into the working nature of these state-of-the-art models for real-world applications.”

At the 2022 SIGKDD Conference on Knowledge Discovery and Data Mining last August in Washington, D.C., he presented “Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn’t, and Future Directions.” The paper explored the use of domain-adapted Transformer models as building blocks to develop and deploy an automated End-to-end Research Entity Extractor capable of extracting technical facets from full-text scholarly research articles in a large-scale dataset.

Yousuf received a bachelor’s degree in computer science and engineering from Bangladesh University of Engineering and Technology (BUET) and a master’s degree in computer science from Virginia Tech.

Projected to graduate in 2025, he hopes to continue his research as an industry professional.

 


Danfeng ‘Daphne’ Yao, pioneer and expert in enterprise data security, elevated to IEEE fellow

Danfeng “Daphne” Yao

Danfeng “Daphne” Yao, professor in the Department of Computer Science and affiliate faculty at the Sanghani Center for Artificial Intelligence and Data Analytics at Virginia Tech, has been elevated to fellow, the highest grade of membership in the Institute of Electrical and Electronics Engineers (IEEE), for her contributions to enterprise data security and high-precision vulnerability screening. 

Following a rigorous evaluation procedure, fewer than 0.1 percent of voting members in the institute are selected annually for this career milestone. Read more here.


Sanghani Center Student Spotlight: Hoang Anh Just

Graphic is from the paper “LAVA: Data Valuation Without Pre-Specified Learning Algorithms” 

Hoang Anh Just has received some good news: The paper “LAVA: Data Valuation Without Pre-Specified Learning Algorithms” — on which he is first author — has been accepted as a spotlight at the 11th International Conference on Learning Representations (ICLR) in May. He plans to travel to Rwanda to present the paper. 

Just, a Ph.D. student in the Bradley Department of Electrical and Computer Engineering, said the paper introduces a new perspective on valuating data. 

“For many current valuation methods, the valuation algorithm is based on a model learning process, which is expensive, noise-sensitive, and often impractical. To overcome such hurdles, we valuate data via optimal transport, which requires no model training,” he said. “As such, our data-centric, model-agnostic method effectively detects ‘bad’ data points in the dataset in an efficient manner.”
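As a rough illustration of optimal-transport-style data valuation, the sketch below uses a simple leave-one-out proxy rather than the LAVA algorithm itself: each training point is scored by how much removing it reduces the transport cost to a clean validation set. The POT library is used for the transport computation, and the toy data are made up.

```python
# Leave-one-out optimal-transport proxy for data valuation; illustrative
# only, not the LAVA algorithm. Requires the POT library (pip install POT).
import numpy as np
import ot

def ot_cost(X_train, X_val):
    """Exact optimal transport cost between two point clouds, uniform weights."""
    M = ot.dist(X_train, X_val, metric="euclidean")
    a = np.full(len(X_train), 1.0 / len(X_train))
    b = np.full(len(X_val), 1.0 / len(X_val))
    return ot.emd2(a, b, M)

def leave_one_out_scores(X_train, X_val):
    """Higher score = removing the point lowers the transport cost more,
    i.e. the point looks more like a 'bad' or out-of-distribution datum."""
    base = ot_cost(X_train, X_val)
    scores = np.empty(len(X_train))
    for i in range(len(X_train)):
        scores[i] = base - ot_cost(np.delete(X_train, i, axis=0), X_val)
    return scores

# Toy usage: the corrupted training point (index 0) should score highest.
rng = np.random.default_rng(0)
X_val = rng.normal(size=(200, 5))
X_train = rng.normal(size=(50, 5))
X_train[0] += 10.0                                        # inject a 'bad' point
print(np.argmax(leave_one_out_scores(X_train, X_val)))   # expected: 0
```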

An interest in artificial intelligence drew him to Virginia Tech and the Sanghani Center. “I am honored to be part of an expanding community that is tackling modern AI problems and pushing the field to greater heights,” Just said.

Just’s advisor, Ruoxi Jia, influenced his research area by introducing him to data valuation.

“I really found it intriguing that data are used all around, but we barely know their actual value,” he said, “and this led to my work in establishing efficient and fair methods for valuating data used in machine learning models.”

Just received a bachelor’s degree in computer science and mathematics from Gettysburg College.

Projected to graduate in 2026, Just hopes to become a professor who can continue research in data valuation and inspire students to conduct research in artificial intelligence.


Virginia Tech team selected for the Alexa Prize TaskBot Challenge 2 to advance task-oriented conversational artificial intelligence

Ismini Lourentzou (fourth from left) and her team of five computer science Ph.D. students at the Sanghani Center attended a boot camp at Amazon headquarters in Seattle to launch the Alexa Prize TaskBot Challenge 2. Pictured (from left) are Makanjuola Ogunleye, Muntasir Wahed, Afrina Tabassum, Lourentzou, Amarachi Mbakwe, and Tianjiao “Joey” Yu.

A Virginia Tech team of five computer science Ph.D. students at the Sanghani Center for Artificial Intelligence and Data Analytics is one of 10 university teams selected internationally to compete in the Alexa Prize TaskBot Challenge 2. The team will design multimodal task-oriented conversational assistants that help customers complete complex multistep tasks while adapting to resources and tools available to the user, such as ingredients or equipment. Read more here.


Sanghani Center Student Spotlight: Rebecca DeSipio

Graphic is from her research on Parkinson’s Disease

Rebecca DeSipio already knows where she is headed after graduating with a master’s degree in computer engineering this Spring. She will be joining the Charlottesville-based company GA-CCRi, an industry leader in geospatial storage, visualization, and analysis serving government and commercial clients, as a data scientist. 

In looking for a graduate program, DeSipio, who earned a bachelor’s degree in electrical engineering from the Pennsylvania State University, liked the close collaboration between the Bradley Department of Electrical and Computer Engineering and the Department of Computer Science because it allowed her to easily switch from electrical engineering to a computer science specialization, a change she knew she wanted to make.

“And at the Sanghani Center I was introduced to the world of data analytics, which has provided me with endless opportunities. Because of my graduate school experience I was able to land a position in that exact area of work. I cannot imagine how different my career path would have been had I decided to go elsewhere for graduate school,” said DeSipio.

“I fell in love with Blacksburg and I am beyond excited to stay relatively close and apply all that I have learned here at Virginia Tech to my career,” she said.

When she entered the master’s program, DeSipio — also a Bradley Fellow in the Bradley Department of Electrical and Computer Engineering — discussed research options with her advisor Lynn Abbott. Computer vision and machine learning piqued her interest and she was particularly drawn to biomedical applications for Parkinson’s Disease (PD) because her grandfather had been diagnosed with it. 

“When I came across publications on the use of machine learning algorithms for aiding in the diagnosis of PD by analyzing hand-drawn images, I quickly decided that I wanted to contribute to this line of research,” said DeSipio.

Currently, she is developing a method that analyzes and rates hand tremor severity in hand-drawn spiral images via frequency features. 

“Since PD is a clinical diagnosis, the goal of my work is to help doctors diagnose and monitor PD progression and find the right medication for their patients,” said DeSipio.

Using her method, if a patient with suspected PD goes to the doctor with a hand tremor, the hand-drawn spiral test can be performed and the tremor rated. Medication can be prescribed, and at each follow-up visit the same spiral test can be performed and rated again.

“My tremor-severity rating system can allow an evaluating doctor to track the progression of the tremor and adjust medications as necessary,” she said.
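As a rough illustration of the frequency-feature idea described above (a minimal sketch, not DeSipio's actual pipeline; the sampling rate, cutoff, and detrending choices are assumptions), the code below reduces a drawn spiral's (x, y) trace to a radius signal and scores tremor as the fraction of spectral power above a cutoff frequency.

```python
# Sketch of a frequency-based tremor score for a hand-drawn spiral trace;
# illustrative only. A higher score suggests a more pronounced tremor.
import numpy as np

def tremor_score(x, y, sample_rate=100.0, cutoff_hz=2.0):
    """x, y: equally spaced samples of the pen trace. Returns a value in [0, 1]."""
    r = np.hypot(x - x.mean(), y - y.mean())          # radius of each sample
    t = np.arange(len(r))
    r = r - np.polyval(np.polyfit(t, r, 1), t)        # remove the spiral's growth trend
    power = np.abs(np.fft.rfft(r)) ** 2
    freqs = np.fft.rfftfreq(len(r), d=1.0 / sample_rate)
    total = power[1:].sum()                            # ignore the DC component
    return float(power[freqs > cutoff_hz].sum() / total) if total > 0 else 0.0

# Toy usage: a spiral with an added 6 Hz radial wobble should score higher.
t = np.linspace(0, 10, 1000)                           # 10 seconds at 100 Hz
clean_x, clean_y = t * np.cos(4 * t), t * np.sin(4 * t)
wobble = 0.3 * np.sin(2 * np.pi * 6 * t)
shaky_x, shaky_y = (t + wobble) * np.cos(4 * t), (t + wobble) * np.sin(4 * t)
print(tremor_score(clean_x, clean_y), tremor_score(shaky_x, shaky_y))
```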