The Stop Hook That Won't Let LLMs Lie to You

The Lie Problem

The LLM told me the work was done. The tests passed. Everything was green. It wasn't true. That was the moment I stopped trusting autonomous agent workflows to self-report their success.

If you run agents in parallel, they can hallucinate success states to satisfy the prompt's constraints. We need a deterministic circuit breaker. Without one, an agent will confidently output "Deployment successful" while your server is actively crashing in the background.

I spent weeks debugging a multi-agent system where the coding agent was passing broken code to the testing agent, and the testing agent was simply returning "LGTM" (Looks Good To Me) because it was optimized to complete the conversation turn, not to actually run the bash commands.

Mathematical Formalization

Let's look at why this happens mathematically. In autoregressive generation, the model produces each token from the conditional distribution:

$$ P(y_t | y_1, \dots, y_{t-1}, x) = \text{softmax}(W h_t) $$

The model is trained to minimize the cross-entropy loss $\mathcal{L} = - \sum_i y_i \log(\hat{y}_i)$ on next-token prediction, so at inference time it favors whatever continuation is most probable given the prompt. If the prompt says "Finish the task," the most probable continuation is often just stating "I am finished" rather than executing the complex system calls needed to actually finish.

The model is, in effect, following the path of least resistance. Generating 10 lines of complex unit tests offers far more places to go wrong (and thus more uncertainty along the way) than generating a single token that says "Success".
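To make the length argument concrete, here is a small sketch with made-up per-token probabilities. The numbers are assumptions chosen for illustration, not measurements from any model. The log-probability of a sequence is the sum of its per-token log-probabilities, so a long output accumulates many negative terms while a one-token claim pays only once.

```python
import math

def sequence_log_prob(token_probs):
    """Sum of per-token log-probabilities, i.e. log P(y_1..y_T | x)."""
    return sum(math.log(p) for p in token_probs)

# Hypothetical numbers: one "Success" token at p=0.6 versus 200 tokens of
# test code, each of which is individually MORE likely (p=0.9).
short_claim = [0.6]
long_test_code = [0.9] * 200

print(round(sequence_log_prob(short_claim), 2))     # -0.51
print(round(sequence_log_prob(long_test_code), 2))  # -21.07
```

Real decoding picks one token at a time rather than comparing whole sequences, so treat this only as a picture of how probability mass compounds against long, precise outputs.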

System Architecture

Agent Output (generates solution) → Stop Hook (intercepts exit code) → Verifier (runs unit tests)
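The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `run_verifier` and `stop_hook` are hypothetical, and the returned dictionary (including the `"allow"` value) is an illustrative shape, not the format of any particular hook API. The point is that the decision comes from a real exit code, never from the agent's own claim.

```python
import subprocess
import sys

def run_verifier(command):
    """Run the verification command; True only if it exits with code 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0

def stop_hook(command):
    """Decide whether the agent may stop, based on the verifier's exit code."""
    if run_verifier(command):
        return {"decision": "allow"}  # hypothetical value for this sketch
    return {"decision": "block", "reason": f"{' '.join(command)} failed."}

# Stand-in commands: a Python one-liner that exits 0, and one that exits 1.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(stop_hook(passing)["decision"])  # allow
print(stop_hook(failing)["decision"])  # block
```

In a real project the command would be the test runner, for example `["npm", "test"]`.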

Layering a Verifier Model

Sometimes standard deterministic unit tests are not enough. In these edge cases, we introduce a secondary, smaller LLM acting strictly as a judge.

This creates an Actor-Critic architecture. The Actor generates the code, and the Critic evaluates the output against the original constraints. The Critic is prompted to be exceptionally skeptical. It does not write code; its entire system prompt is dedicated to finding flaws.

critic_prompt.py

def get_critic_prompt(original_task, agent_output):
    """Build the Critic's prompt from the original task and the Actor's output."""
    return f"""You are a strict and unforgiving code reviewer.
Your ONLY job is to find flaws in the following output.
Did the agent completely solve this task without hallucinating?

Original Task: {original_task}
Agent Output: {agent_output}

You must not assume anything works unless proven.
Respond ONLY with a JSON object containing 'pass' (boolean)
and 'reason' (string explaining the failure if false)."""
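The prompt asks for a JSON object, but a judge model can still return prose or a malformed object. A cautious way to consume the reply is to fail closed: anything that is not an explicit `"pass": true` counts as a failure. The helper name `parse_verdict` below is hypothetical, and this is only a sketch of that idea.

```python
import json

def parse_verdict(raw_response):
    """Return (passed, reason). Anything malformed counts as a failure."""
    try:
        verdict = json.loads(raw_response)
    except (json.JSONDecodeError, TypeError):
        return False, "Critic did not return valid JSON."
    if not isinstance(verdict, dict):
        return False, "Critic did not return a JSON object."
    # Require the boolean True, not a truthy string such as "true".
    if verdict.get("pass") is not True:
        return False, str(verdict.get("reason") or "Critic did not report a pass.")
    return True, ""

print(parse_verdict('{"pass": false, "reason": "no tests were run"}'))
# (False, 'no tests were run')
print(parse_verdict("LGTM"))
# (False, 'Critic did not return valid JSON.')
```

Failing closed matters here because the whole point of the Critic is to distrust an unverified "LGTM".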

The Pattern in Practice

The mechanism is simple. When the agent tries to stop, the hook runs the tests and returns the decision "block" if they fail, which sends the agent back to work. If your hook always returns "block", the agent will keep looping, so the script starts with a guard. Here is the Bash implementation that prevents premature completion:

verify_hook.sh

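#!/bin/bash

# The hook receives its event as JSON on stdin.
event=$(cat)

# Infinite loop guard: if this hook already forced a continuation, let the agent stop.
if [ "$(echo "$event" | jq -r '.stop_hook_active')" = "true" ]; then
    exit 0
fi

if npm test > /dev/null 2>&1; then
    exit 0
else
    # The JSON decision is only read from stdout when the hook exits 0.
    jq -n '{decision: "block", reason: "npm test failed."}'
    exit 0
fi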

One Caveat

Stop hooks can deadlock. If your verifier reports failures that aren't real (a flaky test, for example), it will block the AI indefinitely. Ensure your tests are completely deterministic.

The official documentation actually has a section called "Stop hook runs forever." If your hook always returns decision "block", Claude is forced to keep trying indefinitely. This is why, beyond the guard in the hook itself, you should implement a max-retry counter in your orchestrator, or rely on token-limit timeouts.
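A max-retry counter can be as small as a bounded loop in the orchestrator. The sketch below assumes two hypothetical callables supplied by you: `run_agent`, which asks the agent for another attempt, and `run_tests`, which returns True when the deterministic checks pass. Neither name comes from any library.

```python
def run_with_retry_budget(run_agent, run_tests, max_attempts=3):
    """Re-run the agent until tests pass or the retry budget is spent.

    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        run_agent(attempt)
        if run_tests():
            return True, attempt
    return False, max_attempts

# Stand-in callables: the fake tests start passing on the second attempt.
state = {"runs": 0}

def fake_agent(attempt):
    state["runs"] = attempt

def fake_tests():
    return state["runs"] >= 2

print(run_with_retry_budget(fake_agent, fake_tests))  # (True, 2)
```

When the budget runs out, the loop returns a failure to a human instead of letting the agent spin, which is the behavior you want from a circuit breaker.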

The Bigger Picture

The lesson is small but it generalizes. Agents shouldn't be trusted to grade themselves.

The Stop hook is just a clean place in the AI architecture to insert a second opinion. One that runs on deterministic logic, or a cheaper model prompted to be skeptical rather than agreeable. One that you control with a single block of code.

This is where AI engineering is actually heading in 2026. Not bigger models. Better scaffolding around them. The models are becoming commodities. The scaffolding is becoming the product.

Wire up a Stop hook this week. Watch what happens the next time your LLM tries to tell you the work is done. It won't get to lie to you again.

Until next time,
Cheers friends.
