  • Reranking algorithms have made progress in improving document retrieval quality by efficiently aggregating relevance judgments generated by large language models (LLMs). However, identifying relevant documents for queries that require in-depth reasoning remains a major challenge. Reasoning-intensive queries often exhibit multifaceted information needs and nuanced interpretations, rendering document relevance inherently context dependent. To address this, we propose contextual relevance, which we define as the probability that a document is relevant to a given query, marginalized over the distribution of different reranking contexts it may appear in (i.e., the set of candidate documents it is ranked alongside and the order in which the documents are presented to a reranking model). While prior work has studied methods to mitigate the positional bias LLMs exhibit by accounting for the ordering of documents, we empirically find that the composition of these batches also plays an important role in reranking performance. To efficiently estimate contextual relevance, we propose TS-SetRank, a sampling-based, uncertainty-aware reranking algorithm. Empirically, TS-SetRank improves nDCG@10 over retrieval and reranking baselines by 15-25% on BRIGHT and 6-21% on BEIR, highlighting the importance of modeling relevance as context-dependent.

    Published October 2025
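
The idea of estimating contextual relevance by sampling can be sketched with a Thompson-sampling loop: maintain a Beta posterior per document, draw batches, and update from the judgments the reranker returns in each context. This is a minimal illustration, not the paper's algorithm; `judge` is a hypothetical stand-in for an LLM reranker, and batch construction here is a simple heuristic.

```python
import random

def ts_setrank_sketch(doc_ids, judge, n_rounds=200, batch_size=4, seed=0):
    """Sketch of a Thompson-sampling estimator of contextual relevance.

    `judge(batch)` is a hypothetical stand-in for an LLM reranker: given an
    ordered batch of doc ids, it returns a 0/1 relevance judgment for each
    document as judged *in that context*.
    """
    rng = random.Random(seed)
    # Beta(alpha, beta) posterior over each document's contextual relevance.
    alpha = {d: 1.0 for d in doc_ids}
    beta = {d: 1.0 for d in doc_ids}
    for _ in range(n_rounds):
        # Thompson step: sample a relevance estimate from each posterior and
        # form a batch from the documents most worth inspecting next.
        draws = {d: rng.betavariate(alpha[d], beta[d]) for d in doc_ids}
        batch = sorted(doc_ids, key=draws.get, reverse=True)[:batch_size]
        rng.shuffle(batch)  # randomize ordering to average out position effects
        for d, rel in zip(batch, judge(batch)):
            alpha[d] += rel
            beta[d] += 1 - rel
    # Posterior mean approximates relevance marginalized over the sampled contexts.
    return {d: alpha[d] / (alpha[d] + beta[d]) for d in doc_ids}
```

Because each document is judged inside many different randomly ordered candidate sets, the posterior mean averages over both batch composition and position, which is the marginalization the abstract describes.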
  • Retrieval-augmented generation (RAG) systems rely on retrieval models for identifying relevant contexts and answer generation models for utilizing those contexts. However, retrievers exhibit imperfect recall and precision, limiting downstream performance. We introduce RAG-RL, an answer generation model trained for multi-hop question answering (MHQA) to not only generate answers but also to identify and cite relevant information from larger sets of retrieved contexts, shifting some of the burden of identifying relevant documents from the retriever to the answer generator. Our approach uses curriculum learning, where models are trained across retrieval settings with varying levels of noise. Our experiments show that training samples with fewer distractor documents enable models to acquire citation and reasoning skills with greater sample efficiency and generalizability, demonstrating strong model performance even as the number of irrelevant passages increases. We benchmark our methods on three open-domain MHQA datasets and report significant gains in answer and citation accuracy. Furthermore, our experiments provide empirical insights into how simpler training samples can give models stronger signals for learning specific skills (e.g., citation generation) and how different components of post-training (e.g., training set construction, rule-based rewards, training sample ordering, etc.) impact final model performance.

    Published July 2025
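
The curriculum idea above, i.e., training across retrieval settings with increasing noise, can be sketched by ordering training samples by distractor count. The stage sizes and sample layout below are illustrative assumptions, not the paper's exact schedule.

```python
import random

def build_noise_curriculum(samples, stages=(0, 2, 4), seed=0):
    """Sketch of a noise-based curriculum for RAG-RL-style training.

    Each sample is (question, gold_docs, distractor_docs). `stages` gives the
    number of distractors mixed into the retrieved context at each phase,
    so the model first learns citation/reasoning on clean inputs, then on
    progressively noisier ones.
    """
    rng = random.Random(seed)
    curriculum = []
    for k in stages:
        for question, gold, distractors in samples:
            noise = rng.sample(distractors, min(k, len(distractors)))
            contexts = list(gold) + noise
            rng.shuffle(contexts)  # don't leak relevance through position
            curriculum.append({"question": question,
                               "contexts": contexts,
                               "n_distractors": len(noise)})
    return curriculum
```

Shuffling the gold and distractor passages together matters: if gold documents always appeared first, the model could learn position rather than citation.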
  • Owing to the inherent fault tolerance of deep neural network (DNN) models used for classification, many structural faults in the processing elements (PEs) of a systolic array-based AI accelerator are functionally benign. Brute-force fault simulation for determining fault criticality is computationally expensive due to the many potential fault sites in the accelerator array and the dependence of PE criticality characterization on the functional input data. Supervised learning techniques can accurately estimate fault criticality, but they require ground truth for model training, and ground-truth collection involves extensive, computationally expensive fault simulations. We present a framework for analyzing fault criticality with a negligible amount of ground-truth data. We incorporate the gate-level structural and functional information of the PEs in their “neural twins”, referred to as “PE-Nets”. The PE netlist is translated into a trainable PE-Net, where the standard-cell instances are substituted by their corresponding “Cell-Nets” and the wires translate to neural connections. Each Cell-Net is a pre-trained DNN that models the Boolean-logic behavior of the corresponding standard cell. In the PE-Net, every neural connection is associated with a bias that represents a perturbation in the signal propagated by that connection. We utilize a recently proposed misclassification-driven training algorithm to sensitize and identify biases that are critical to the functioning of the accelerator for a given application workload. The proposed framework achieves up to 100% accuracy in fault-criticality classification in 16-bit and 32-bit PEs by using the criticality knowledge of only 2% of the total faults in a PE.

    Published October 2021
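
A toy illustration of the Cell-Net idea: a tiny differentiable unit mimicking a standard cell, with an additive bias on each input wire modeling a signal perturbation (a fault candidate, zero when fault-free). The hand-set weights below are an assumption for illustration; in the paper each Cell-Net is a pre-trained DNN and the biases are sensitized by a misclassification-driven training algorithm.

```python
import math

def cellnet_nand(a, b, bias_a=0.0, bias_b=0.0):
    """Toy Cell-Net: a fixed-weight sigmoid unit mimicking a 2-input NAND cell.

    bias_a / bias_b model perturbations injected on the input wires; a
    bias of 0 means the connection is fault-free. Weights are hand-set
    here for illustration, not learned as in the actual framework.
    """
    a, b = a + bias_a, b + bias_b
    # Single sigmoid neuron: output is high unless both inputs are high.
    z = 10.0 * (1.5 - a - b)
    return 1.0 / (1.0 + math.exp(-z))
```

Because the whole PE-Net built from such cells is differentiable, gradients with respect to the wire biases indicate which perturbations can flip the accelerator's output for a given workload, which is what separates critical from benign faults without brute-force simulation.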