You’re Not Wrong, You’re Just Unsatisfiable

You’re Not Wrong, You’re Just Unsatisfiable - Investigating ASP Code Generation with LLMs via Fine-tuning and Feedback Loops

During the Generative AI module, I continued my work on generating syntactically and semantically correct ASP programs with Large Language Models. This project builds on Ctrl+Z for LLMs: Iterative Syntax Feedback for ASP Code Generation, which centered on a solver-in-the-loop approach to syntactic correction.

This extension adds a semantic feedback loop to the architecture, in which a second LLM judges whether the generated ASP program actually matches the natural-language specification. In addition, a logic-optimized QLoRA fine-tuning strategy was applied to a distilled, modularly constructed ASP dataset that includes explicit repair tasks.
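The two-stage loop described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the project's implementation: `dual_stage_repair`, `toy_syntax_check`, `toy_semantic_check` and `stub_generator` are hypothetical names, and the toy checkers only stand in for the Clingo solver (syntax stage) and the second LLM (semantic stage).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stage signatures, kept as plain callables so the loop runs
# without a real LLM or solver.
Generate = Callable[[str, Optional[str]], str]           # (spec, feedback) -> program
CheckSyntax = Callable[[str], Optional[str]]             # program -> error or None
ValidateSemantics = Callable[[str, str], Optional[str]]  # (spec, program) -> critique or None


@dataclass
class RepairResult:
    program: str
    syntax_ok: bool
    semantics_ok: bool
    iterations: int
    history: List[str] = field(default_factory=list)


def dual_stage_repair(spec: str, generate: Generate, check_syntax: CheckSyntax,
                      validate_semantics: ValidateSemantics, max_iters: int = 5) -> RepairResult:
    """Regenerate until a program passes the syntax stage and then the semantic stage."""
    feedback: Optional[str] = None
    history: List[str] = []
    program = ""
    for i in range(1, max_iters + 1):
        program = generate(spec, feedback)
        # Stage 1: syntactic validation (a solver in the real setup).
        error = check_syntax(program)
        if error is not None:
            feedback = f"Syntax error: {error}"
            history.append(feedback)
            continue
        # Stage 2: semantic validation (a second LLM in the real setup).
        critique = validate_semantics(spec, program)
        if critique is None:
            return RepairResult(program, True, True, i, history)
        feedback = f"Semantic issue: {critique}"
        history.append(feedback)
    # Iteration budget exhausted: report the state of the last candidate.
    return RepairResult(program, check_syntax(program) is None, False, max_iters, history)


# Toy stand-ins (hypothetical, for illustration only).
def toy_syntax_check(program: str) -> Optional[str]:
    """Crude check: every non-empty, non-comment line must end with a period."""
    for n, line in enumerate(program.splitlines(), start=1):
        text = line.strip()
        if text and not text.startswith("%") and not text.endswith("."):
            return f"line {n} does not end with '.'"
    return None


def toy_semantic_check(spec: str, program: str) -> Optional[str]:
    """Crude check: a scheduling program must produce some assign/2 atoms."""
    return None if "assign(" in program else "no assign/2 atoms; tasks are never scheduled"


def stub_generator(candidates: List[str]) -> Generate:
    """Replays fixed candidates in order and ignores the feedback."""
    it = iter(candidates)
    return lambda spec, feedback: next(it)


gen = stub_generator([
    "slot(1..3)",   # syntax error: missing period
    "slot(1..3).",  # parses, but schedules nothing
    "slot(1..3).\n1 { assign(T,S) : slot(S) } 1 :- task(T).",
])
result = dual_stage_repair("Assign every task to exactly one slot.", gen,
                           toy_syntax_check, toy_semantic_check)
print(result.iterations, result.semantics_ok, result.history)
```

Keeping each stage as a separate callable makes it easy to swap the semantic validator on its own, which matters here because the unreliable semantic stage is exactly what the results single out.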

More information

The abstract of the paper is reproduced below.

Large Language Models (LLMs) show promise for generating Answer Set Programming (ASP) code from natural language, but reliably producing programs that are both syntactically and semantically correct remains challenging. Prior work has shown that iterative syntax feedback can improve compilability, while semantic correctness often lags behind. In this paper, we investigate whether combining supervised fine-tuning with a dual-stage repair architecture, consisting of syntactic validation and an LLM-based semantic feedback loop, can improve ASP code generation. We fine-tune a Qwen2.5-7B-Instruct model on a custom dataset built from publicly available Clingo code and augmented with synthetic repair tasks to explicitly train debugging behavior. We evaluate our approach on four established scheduling domains using multiple experimental setups with and without fine-tuning and semantic feedback. Our results show that fine-tuning leads to near-perfect syntactic correctness and a substantial improvement in semantic accuracy. However, adding a semantic feedback loop yields no additional gains and in some cases degrades performance due to unreliable semantic validation. These findings suggest that semantic validation itself should be treated as a learning problem, motivating future work on fine-tuning dedicated semantic validator models.
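Synthetic repair tasks of the kind the abstract mentions can be sketched by corrupting known-good Clingo code and keeping the original as the target. The two corruption operators, the `make_repair_task` helper and the record fields below are hypothetical illustrations; the paper's actual repair-task types and prompt format are not described on this page.

```python
import random
from typing import Callable, Dict, List, Optional

# A corruption takes the program's lines and returns broken lines, or None if it does not apply.
Corruption = Callable[[List[str], random.Random], Optional[List[str]]]


def _code_lines(lines: List[str]) -> List[int]:
    """Indices of non-comment lines (comments start with '%' in ASP)."""
    return [i for i, line in enumerate(lines) if not line.lstrip().startswith("%")]


def drop_period(lines: List[str], rng: random.Random) -> Optional[List[str]]:
    """Remove the terminating period from one randomly chosen statement."""
    candidates = [i for i in _code_lines(lines) if lines[i].rstrip().endswith(".")]
    if not candidates:
        return None
    i = rng.choice(candidates)
    out = list(lines)
    out[i] = out[i].rstrip()[:-1]
    return out


def drop_closing_paren(lines: List[str], rng: random.Random) -> Optional[List[str]]:
    """Delete the last ')' on one randomly chosen line, leaving parentheses unbalanced."""
    candidates = [i for i in _code_lines(lines) if ")" in lines[i]]
    if not candidates:
        return None
    i = rng.choice(candidates)
    j = lines[i].rfind(")")
    out = list(lines)
    out[i] = lines[i][:j] + lines[i][j + 1:]
    return out


CORRUPTIONS: Dict[str, Corruption] = {
    "missing_period": drop_period,
    "unbalanced_paren": drop_closing_paren,
}


def make_repair_task(program: str, seed: int = 0) -> Dict[str, str]:
    """Turn a valid program into a (broken input, fixed output) training pair."""
    rng = random.Random(seed)
    lines = program.strip().splitlines()
    names = sorted(CORRUPTIONS)
    rng.shuffle(names)  # try the operators in a seed-dependent order
    for name in names:
        broken = CORRUPTIONS[name](lines, rng)
        if broken is not None:
            return {
                "instruction": "The following ASP program does not parse. Fix it.",
                "input": "\n".join(broken),
                "output": "\n".join(lines),
                "corruption": name,
            }
    raise ValueError("no corruption operator applies to this program")


program = """
slot(1..3).
task(a;b).
1 { assign(T,S) : slot(S) } 1 :- task(T).
"""
task = make_repair_task(program, seed=42)
print(task["corruption"])
print(task["input"])
```

Because the target is always the untouched source program, each pair is correct by construction, and fixing the seed keeps the generated dataset reproducible.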

Downloads

Files available for download.

You’re Not Wrong, You’re Just Unsatisfiable - Investigating ASP Code Generation with LLMs via Fine-tuning and Feedback Loops.pdf