publications
List of my publications
2024
- [LREC-COLING] Generating Clarification Questions for Disambiguating Contracts. Anmol Singhal, Chirag Jain, Preethu Rose Anish, Arkajyoti Chakraborty, and Smita Ghaisas. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
Enterprises frequently enter into commercial contracts that can serve as vital sources of project-specific requirements. Contractual clauses are obligatory, and the requirements derived from contracts can detail the downstream implementation activities that non-legal stakeholders, including requirement analysts, engineers, and delivery personnel, need to conduct. However, comprehending contracts is cognitively demanding and error-prone for such stakeholders due to the extensive use of Legalese and the inherent complexity of contract language. Furthermore, contracts often contain ambiguously worded clauses to ensure comprehensive coverage. In contrast, non-legal stakeholders require a detailed and unambiguous comprehension of contractual clauses to craft actionable requirements. In this work, we introduce a novel legal NLP task that involves generating clarification questions for contracts. These questions aim to identify contract ambiguities at the document level, thereby assisting non-legal stakeholders in obtaining the necessary details for eliciting requirements. This task is challenged by three core issues: (1) data availability, (2) the length and unstructured nature of contracts, and (3) the complexity of legal text. To address these issues, we propose ConRAP, a retrieval-augmented prompting framework for generating clarification questions to disambiguate contractual text. Experiments conducted on contracts sourced from the publicly available CUAD dataset show that ConRAP with ChatGPT can detect ambiguities with an F2 score of 0.87. Human evaluators deemed 70% of the generated clarification questions useful.
- [PRE-PRINT] Dealing with Data for RE: Mitigating Challenges while using NLP and Generative AI. Smita Ghaisas and Anmol Singhal. May 2024
Across the dynamic business landscape today, enterprises face an ever-increasing range of challenges. These include the constantly evolving regulatory environment, the growing demand for personalization within software applications, and the heightened emphasis on governance. In response to these multifaceted demands, large enterprises have been adopting automation that spans from the optimization of core business processes to the enhancement of customer experiences. Indeed, Artificial Intelligence (AI) has emerged as a pivotal element of modern software systems. In this context, data plays an indispensable role. AI-centric software systems based on supervised learning and operating at an industrial scale require large volumes of training data to perform effectively. Moreover, the incorporation of generative AI has led to a growing demand for adequate evaluation benchmarks. Our experience in this field has revealed that the requirement for large datasets for training and evaluation introduces a host of intricate challenges. This book chapter explores the evolving landscape of Software Engineering (SE) in general, and Requirements Engineering (RE) in particular, in this era marked by AI integration. We discuss challenges that arise while integrating Natural Language Processing (NLP) and generative AI into enterprise-critical software systems. The chapter provides practical insights, solutions, and examples to equip readers with the knowledge and tools necessary for effectively building solutions with NLP at their cores. We also reflect on how these text data-centric tasks sit together with the traditional RE process. We also highlight new RE tasks that may be necessary for handling the increasingly important text data-centricity involved in developing software systems.
2023
- [NLLP] Towards Mitigating Perceived Unfairness in Contracts from a Non-Legal Stakeholder’s Perspective. Anmol Singhal, Preethu Rose Anish, Shirish Karande, and Smita Ghaisas. In Proceedings of the Natural Legal Language Processing Workshop 2023, Dec 2023
Commercial contracts are known to be a valuable source for deriving project-specific requirements. However, contract negotiations mainly occur among the legal counsel of the parties involved. The participation of non-legal stakeholders, including requirement analysts, engineers, and solution architects, whose primary responsibility lies in ensuring the seamless implementation of contractual terms, is often indirect and inadequate. Consequently, a significant number of sentences in contractual clauses, though legally accurate, can appear unfair from an implementation perspective to non-legal stakeholders. This perception poses a problem since requirements indicated in the clauses are obligatory and can involve punitive measures and penalties if not implemented as committed in the contract. Therefore, the identification of potentially unfair clauses in contracts becomes crucial. In this work, we conduct an empirical study to analyze the perspectives of different stakeholders regarding contractual fairness. We then investigate the ability of Pre-trained Language Models (PLMs) to identify unfairness in contractual sentences by comparing chain of thought prompting and semi-supervised fine-tuning approaches. Using BERT-based fine-tuning, we achieved an accuracy of 84% on a dataset consisting of proprietary contracts. It outperformed chain of thought prompting using Vicuna-13B by a margin of 9%.
2022
- [CAIN] Data is about detail: an empirical investigation for software systems with NLP at core. Anmol Singhal, Preethu Rose Anish, Pratik Sonar, and Smita S Ghaisas. In Proceedings of the 1st International Conference on AI Engineering: Software Engineering for AI, Pittsburgh, Pennsylvania, Dec 2022
Businesses continue to operate under increasingly complex demands such as an ever-evolving regulatory landscape, personalization requirements from software apps, and stricter governance with respect to security and privacy. In response to these challenges, large enterprises have been emphasizing automation across a wide range, starting with business processes all the way to customer experience. As AI continues to be a core component of software systems being developed, data assumes a predominant role. AI-centric software systems of industrial scale need large amounts of training data, which, in our experience, has introduced several challenges. In this paper, through an empirical study based on interviews with AI practitioners, we present current challenges that need to be addressed in ‘data requirements’ of Software Systems with NLP at the Core (SSNLPCore). We further discuss the impact of the challenges and techniques currently employed by practitioners for addressing them. Our findings reveal that a focus on details pertaining to data is required early in the project lifecycle, including aspects such as how to select, process, and annotate data. This can ensure that the AI component is effective in meeting business goals of software systems.
2021
- [HCII] Feature Fused Human Activity Recognition Network (FFHAR-Net). Anmol Singhal, Mihir Goyal, Jainendra Shukla, and Raghava Mutharaju. In HCI International 2021 - Posters, Dec 2021
With the advances in smart home technology and the Internet of Things (IoT), there has been keen research interest in human activity recognition to allow service systems to understand human intentions. When these systems recognize human objectives without user intervention, they deliver better service, which is crucial to improving the user experience. Existing research approaches have focused primarily on probabilistic methods like Bayesian networks (for instance, the CRAFFT algorithm). Though quite versatile, these probabilistic models may be unable to successfully capture the possibly complex relationships between the input variables. To the best of our knowledge, a statistical study of features in a human activity recognition task and of the relationships among them has not yet been attempted. To this end, we study the domain of human activity recognition to improve the state of the art and present a novel neural network architecture for the task. It employs early fusion on different types of minimalistic features such as time and location to make extremely accurate predictions with a maximum micro F1-score of 0.98 on the Aruba CASAS dataset. We also accompany the model with a comprehensive study of the features. Using feature selection techniques like Leave-One-Out, we rank the features according to the information they add to deep learning models and make further inferences using the ranking obtained. Our empirical results show that the feature Previous Activity Performed is the most useful of all, surprisingly even more than time (the basis of activity scheduling in most societies). We use three Activities of Daily Living (ADL) datasets in different settings to empirically demonstrate the utility of our architecture. We share our findings along with the models and the source code.