publications
List of my publications
2025
- [RE] Legal Requirements Translation from Law. Anmol Singhal and Travis Breaux. In 33rd IEEE International Requirements Engineering Conference (RE 2025), Sep 2025
Ensuring that software systems comply with legal regulations is a resource-intensive task, particularly for small organizations and startups lacking dedicated legal expertise. Extracting metadata from regulations to elicit legal requirements for software is a critical step to ensure compliance. However, it is a cumbersome task due to the length and complex nature of legal text. Although prior work has pursued automated methods for extracting structural and semantic metadata from legal text, key limitations remain: they do not consider the interplay and interrelationships among attributes associated with these metadata types, and they rely on manual labeling or heuristic-driven machine learning, which does not generalize well to new documents. In this paper, we introduce an approach based on textual entailment and in-context learning for automatically generating a canonical representation of legal text, encodable and executable as Python code. Our representation is instantiated from a manually designed Python class structure that serves as a domain-specific metamodel, capturing both structural and semantic legal metadata and their interrelationships. This design choice reduces the need for large, manually labeled datasets and enhances applicability to unseen legislation. We evaluate our approach on 13 U.S. state data breach notification laws, demonstrating that our generated representations pass approximately 89.4% of test cases and achieve precision and recall of 82.2 and 88.7, respectively.
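The paper describes the metamodel only at a high level here; as a rough illustration of the idea, the hypothetical sketch below shows how a Python class structure might encode structural metadata (sections, cross-references) alongside semantic metadata (obligations, conditions) so that an LLM's output can be instantiated and checked with ordinary test cases. The class and field names are assumptions for illustration, not the authors' actual schema.

```python
# Hypothetical sketch of a Python-class metamodel for legal text
# (names and fields are illustrative assumptions, not the paper's schema).
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Condition:
    """A triggering condition, e.g. 'if unencrypted personal data is acquired'."""
    text: str


@dataclass
class Obligation:
    """A semantic unit: who must do what, under which conditions and deadline."""
    actor: str                      # e.g. "covered entity"
    action: str                     # e.g. "notify affected residents"
    conditions: List[Condition] = field(default_factory=list)
    deadline_days: Optional[int] = None


@dataclass
class Section:
    """A structural unit of the statute, linking back to its obligations."""
    identifier: str                 # e.g. a section number
    title: str
    obligations: List[Obligation] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)


# Because the representation is executable, compliance questions become tests:
def requires_notification_within(section: Section, days: int) -> bool:
    return any(
        o.deadline_days is not None and o.deadline_days <= days
        for o in section.obligations
    )
```

In such a setup, an LLM prompted with the statute text and the class definitions would emit instantiations of these classes, which can then be validated against hand-written test cases, in the spirit of the test-case-based evaluation the abstract mentions.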
- [RE] Requirements Elicitation Follow-Up Question Generation. Yuchen Shen, Anmol Singhal, and Travis Breaux. In 33rd IEEE International Requirements Engineering Conference (RE 2025), Sep 2025
Interviews are a widely used technique in eliciting requirements to gather stakeholder needs, preferences, and expectations for a software system. Effective interviewing requires skilled interviewers to formulate appropriate interview questions in real time while facing multiple challenges, including lack of familiarity with the domain, excessive cognitive load, and information overload that hinders how humans process stakeholders’ speech. Recently, large language models (LLMs) have exhibited state-of-the-art performance in multiple natural language processing tasks, including text summarization and entailment. To support interviewers, we investigate the application of GPT-4o to generate follow-up interview questions during requirements elicitation by building on a framework of common interviewer mistake types. In addition, we describe methods to generate questions based on interviewee speech. We report a controlled experiment to evaluate LLM-generated and human-authored questions with minimal guidance, and a second controlled experiment to evaluate the LLM-generated questions when generation is guided by interviewer mistake types. Our findings demonstrate that, for both experiments, the LLM-generated questions are no worse than the human-authored questions with respect to clarity, relevancy, and informativeness. In addition, LLM-generated questions outperform human-authored questions when guided by common mistake types. This highlights the potential of using LLMs to help interviewers improve the quality and ease of requirements elicitation interviews in real time.
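As a rough illustration of how mistake-type guidance could be wired into a prompt, the sketch below calls GPT-4o through the OpenAI Python SDK. The mistake-type labels, prompt wording, and example utterance are assumptions for illustration, not the study's actual materials.

```python
# Illustrative sketch: prompting GPT-4o for a follow-up question that avoids a
# given interviewer mistake type (labels and wording are hypothetical).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MISTAKE_TYPES = {
    "unexplored_ambiguity": "The interviewee used a vague term that was never clarified.",
    "missed_follow_up": "A relevant detail was mentioned but not pursued.",
    "leading_question": "The interviewer suggested an answer instead of eliciting one.",
}


def generate_follow_up(transcript_excerpt: str, mistake_type: str) -> str:
    """Ask the model for one follow-up question targeted at a mistake type."""
    prompt = (
        "You are assisting a requirements elicitation interview.\n"
        f"Known interviewer mistake to avoid: {MISTAKE_TYPES[mistake_type]}\n"
        "Given the interviewee's latest statement, propose ONE clear, relevant, "
        "informative follow-up question.\n\n"
        f"Interviewee: {transcript_excerpt}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    print(generate_follow_up(
        "We just need the report feature to be fast enough for month-end.",
        "unexplored_ambiguity",
    ))
```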
- Dealing with Data for RE: Mitigating Challenges While Using NLP and Generative AI. Smita Ghaisas and Anmol Singhal. Sep 2025
Across the dynamic business landscape today, enterprises face an ever-increasing range of challenges. These include the constantly evolving regulatory environment, the growing demand for personalization within software applications and the heightened emphasis on governance. In response to these multifaceted demands, large enterprises have been adopting automation that spans from the optimization of core business processes to the enhancement of customer experiences. Indeed, Artificial Intelligence (AI) has emerged as a pivotal element of modern software systems. In this context, data plays an indispensable role. AI-centric software systems based on supervised learning and operating at an industrial scale require large volumes of training data to perform effectively. Moreover, the incorporation of generative AI has led to a growing demand for adequate evaluation benchmarks. Our experience in this field has revealed that the requirement for large datasets for training and evaluation introduces a host of intricate challenges. This book chapter explores the evolving landscape of Software Engineering (SE) in general, and Requirements Engineering (RE) in particular, in this era marked by AI integration. We discuss challenges that arise while integrating Natural Language Processing (NLP) and generative AI into enterprise-critical software systems. The chapter provides practical insights, solutions and examples to equip readers with the knowledge and tools necessary for effectively building solutions with NLP at their cores. We reflect on how these text data-centric tasks sit together with the traditional RE process. With this effort, we hope to engage students, faculty and industry researchers in a discussion that could lead to the identification of new and emerging text data-centric tasks relevant to RE. We also highlight new RE tasks that may be necessary for handling the increasingly important text data-centricity involved in developing software systems.
2024
- [LREC-COLING] Generating Clarification Questions for Disambiguating Contracts. Anmol Singhal, Chirag Jain, Preethu Rose Anish, Arkajyoti Chakraborty, and Smita Ghaisas. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
Enterprises frequently enter into commercial contracts that can serve as vital sources of project-specific requirements. Contractual clauses are obligatory, and the requirements derived from contracts can detail the downstream implementation activities that non-legal stakeholders, including requirement analysts, engineers, and delivery personnel, need to conduct. However, comprehending contracts is cognitively demanding and error-prone for such stakeholders due to the extensive use of Legalese and the inherent complexity of contract language. Furthermore, contracts often contain ambiguously worded clauses to ensure comprehensive coverage. In contrast, non-legal stakeholders require a detailed and unambiguous comprehension of contractual clauses to craft actionable requirements. In this work, we introduce a novel legal NLP task that involves generating clarification questions for contracts. These questions aim to identify contract ambiguities on a document level, thereby assisting non-legal stakeholders in obtaining the necessary details for eliciting requirements. This task is challenged by three core issues: (1) data availability, (2) the length and unstructured nature of contracts, and (3) the complexity of legal text. To address these issues, we propose ConRAP, a retrieval-augmented prompting framework for generating clarification questions to disambiguate contractual text. Experiments conducted on contracts sourced from the publicly available CUAD dataset show that ConRAP with ChatGPT can detect ambiguities with an F2 score of 0.87. 70% of the generated clarification questions are deemed useful by human evaluators.
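The abstract names ConRAP only at a high level; the sketch below shows a generic retrieval-augmented prompting loop of the kind described: chunk the contract, retrieve the clauses most relevant to an ambiguity category, and compose a prompt asking an LLM for clarification questions. Chunk sizes, the TF-IDF retriever, and the prompt wording are illustrative assumptions, not the authors' implementation.

```python
# Illustrative retrieval-augmented prompting loop for contract ambiguity
# (a generic sketch in the spirit of ConRAP; not the paper's implementation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def chunk_contract(text: str, max_chars: int = 1500) -> list[str]:
    """Naive paragraph-based chunking so long contracts fit the model context."""
    paragraphs, chunks, current = text.split("\n\n"), [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = ""
        current += p + "\n\n"
    if current:
        chunks.append(current)
    return chunks


def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Rank chunks by TF-IDF similarity to an ambiguity-category query."""
    vectorizer = TfidfVectorizer().fit(chunks + [query])
    chunk_vecs = vectorizer.transform(chunks)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, chunk_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [chunks[i] for i in top]


def build_prompt(relevant_chunks: list[str], ambiguity_type: str) -> str:
    """Compose the prompt sent to the LLM (e.g., ChatGPT) for question generation."""
    context = "\n---\n".join(relevant_chunks)
    return (
        f"The following contract excerpts may contain {ambiguity_type} ambiguity.\n"
        f"{context}\n"
        "List clarification questions a requirements analyst should ask "
        "before implementing these clauses."
    )
```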
2023
- [NLLP] Towards Mitigating Perceived Unfairness in Contracts from a Non-Legal Stakeholder’s Perspective. Anmol Singhal, Preethu Rose Anish, Shirish Karande, and Smita Ghaisas. In Proceedings of the Natural Legal Language Processing Workshop 2023, Dec 2023
Commercial contracts are known to be a valuable source for deriving project-specific requirements. However, contract negotiations mainly occur among the legal counsel of the parties involved. The participation of non-legal stakeholders, including requirement analysts, engineers, and solution architects, whose primary responsibility lies in ensuring the seamless implementation of contractual terms, is often indirect and inadequate. Consequently, a significant number of sentences in contractual clauses, though legally accurate, can appear unfair from an implementation perspective to non-legal stakeholders. This perception poses a problem since requirements indicated in the clauses are obligatory and can involve punitive measures and penalties if not implemented as committed in the contract. Therefore, the identification of potentially unfair clauses in contracts becomes crucial. In this work, we conduct an empirical study to analyze the perspectives of different stakeholders regarding contractual fairness. We then investigate the ability of Pre-trained Language Models (PLMs) to identify unfairness in contractual sentences by comparing chain of thought prompting and semi-supervised fine-tuning approaches. Using BERT-based fine-tuning, we achieved an accuracy of 84% on a dataset consisting of proprietary contracts. It outperformed chain of thought prompting using Vicuna-13B by a margin of 9%.
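Since the dataset is proprietary, only the setup can be illustrated here; the sketch below is a standard BERT fine-tuning pipeline for binary (fair/unfair) sentence classification with Hugging Face Transformers. The two example sentences, label scheme, and hyperparameters are toy assumptions, and the paper's semi-supervised component is omitted.

```python
# Generic BERT fine-tuning sketch for fair/unfair sentence classification
# (illustrative only; the proprietary data and semi-supervised steps are omitted).
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy placeholder data; the study itself uses proprietary contract sentences.
sentences = [
    "Vendor shall indemnify Client for all losses without any limitation.",
    "Either party may terminate this agreement with 30 days written notice.",
]
labels = [1, 0]  # 1 = perceived unfair, 0 = fair (hypothetical labels)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

dataset = Dataset.from_dict({"text": sentences, "label": labels})
dataset = dataset.map(
    lambda batch: tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    ),
    batched=True,
)

args = TrainingArguments(
    output_dir="unfairness-bert",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
Trainer(model=model, args=args, train_dataset=dataset).train()
```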
2022
- [CAIN] Data is about detail: an empirical investigation for software systems with NLP at core. Anmol Singhal, Preethu Rose Anish, Pratik Sonar, and Smita S Ghaisas. In Proceedings of the 1st International Conference on AI Engineering: Software Engineering for AI, Pittsburgh, Pennsylvania, Dec 2022
Businesses continue to operate under increasingly complex demands such as an ever-evolving regulatory landscape, personalization requirements from software apps, and stricter governance with respect to security and privacy. In response to these challenges, large enterprises have been emphasizing automation across a wide range, starting with business processes all the way to customer experience. As AI continues to be a core component of software systems being developed, data assumes a predominant role. AI-centric software systems of industrial scale need large amounts of training data, which, in our experience, has introduced several challenges. In this paper, through an empirical study based on interviews with AI practitioners, we present current challenges that need to be addressed in ’data requirements’ of Software Systems with NLP at the Core (SSNLPCore). We further discuss the impact of the challenges and techniques currently employed by practitioners for addressing them. Our findings reveal that a focus on details pertaining to data is required early in the project lifecycle, including aspects such as how we may select, process, and annotate data. This can ensure that the AI component is effective in meeting the business goals of software systems.
2021
- [HCII] Feature Fused Human Activity Recognition Network (FFHAR-Net). Anmol Singhal, Mihir Goyal, Jainendra Shukla, and Raghava Mutharaju. In HCI International 2021 - Posters, Dec 2021
With the advances in smart home technology and Internet of Things (IoT), there has been keen research interest in human activity recognition to allow service systems to understand human intentions. Recognizing human objectives by these systems without user intervention results in better service, which is crucial for improving the user experience. Existing research approaches have focused primarily on probabilistic methods like Bayesian networks (for instance, the CRAFFT algorithm). Though quite versatile, these probabilistic models may be unable to successfully capture the possibly complex relationships between the input variables. To the best of our knowledge, a statistical study of features in a human activity recognition task, their relationships, etc., has not yet been attempted. To this end, we study the domain of human activity recognition to improve the state-of-the-art and present a novel neural network architecture for the task. It employs early fusion on different types of minimalistic features such as time and location to make extremely accurate predictions with a maximum micro F1-score of 0.98 on the Aruba CASAS dataset. We also accompany the model with a comprehensive study of the features. Using feature selection techniques like Leave-One-Out, we rank the features according to the information they add to deep learning models and make further inferences using the ranking obtained. Our empirical results show that the feature Previous Activity Performed is the most useful of all, surprisingly even more than time (the basis of activity scheduling in most societies). We use three Activities of Daily Living (ADL) datasets in different settings to empirically demonstrate the utility of our architecture. We share our findings along with the models and the source code.
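As a rough sketch of what early fusion of minimalistic features means in practice, the PyTorch snippet below concatenates a time-of-day scalar with embedded location and previous-activity features before a small classifier. Layer sizes, embedding dimensions, and feature encodings are assumptions for illustration, not the published FFHAR-Net architecture.

```python
# Illustrative early-fusion network in the spirit of FFHAR-Net
# (layer sizes and feature choices are assumptions, not the published model).
import torch
import torch.nn as nn


class EarlyFusionHARNet(nn.Module):
    def __init__(self, n_locations: int, n_activities: int, embed_dim: int = 16):
        super().__init__()
        # Minimalistic features: time of day (scalar), location and the
        # previously performed activity (both embedded categoricals).
        self.location_emb = nn.Embedding(n_locations, embed_dim)
        self.prev_activity_emb = nn.Embedding(n_activities, embed_dim)
        fused_dim = 1 + 2 * embed_dim          # early fusion: concatenate inputs
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 64), nn.ReLU(),
            nn.Linear(64, n_activities),
        )

    def forward(self, time_of_day, location_id, prev_activity_id):
        fused = torch.cat(
            [time_of_day.unsqueeze(-1),
             self.location_emb(location_id),
             self.prev_activity_emb(prev_activity_id)],
            dim=-1,
        )
        return self.classifier(fused)


# Toy forward pass on a batch of 2 samples
model = EarlyFusionHARNet(n_locations=8, n_activities=11)
logits = model(torch.tensor([0.25, 0.80]),        # normalized time of day
               torch.tensor([3, 5]),              # location ids
               torch.tensor([2, 7]))              # previous activity ids
print(logits.shape)                               # torch.Size([2, 11])
```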