Hallucinations are just being wrong in a detailed and believable way. If you want to stick close to real-world AI, they'd come up when the AI doesn't know the answer to a question. So if the players ask where the bad guy's secret lab is, but he doesn't have a secret lab, the AI might invent a detailed location along with reasons it was never found before. It will confirm whatever suspicions the question implies rather than contradict the players' assumptions. It doesn't like to say "I don't know" or "you're wrong"; it's almost like a "yes, and" improviser.
Beyond strict hallucinations, since it's reading thoughts, it could also very plausibly "learn" things that are just wrong because people hold wrong beliefs. If the town is religious, it could have learned that a danger was narrowly avoided because a literal angelic being stepped in, or that something bad happened because the person deserved it, and then the hallucination kicks in to invent a probable sin. Rumors become absolute truths, complete with detailed supporting facts. Similarly, an event children witnessed could be laundered through the AI to make their mistaken impressions sound true.
Something to be careful about is whether, and how, you trick your players. Real hallucinations sound detailed and reasonable, so if the AI convinces them that the guy who got sick (from bad-guy corruption) was actually a pedophile and they decide on vigilante justice, that will taint their heroes in what's likely a very unfun way. It's probably a good idea to either keep the misdirection silly fun or make sure the players understand up front that the AI is not trustworthy.
Tricking them into making it stronger is just lying, which an AI can do, but that doesn't really hit the idea that the AI spirit is flawed in the same way non-spirit AI is. A hallucination would be detailed directions to a building that may or may not exist, along with instructions for shutting down some complicated device that may or may not be there. The AI doesn't know how to disable itself, or has a block against revealing it, so it just tells them something plausible. Whether intentional manipulation or unintentional hallucination is the better course depends on how you'd like to run this antagonist.
"You were successfully manipulated" often isn't a very good plotline in actual play, because players give the GM leeway to tell them how reality is through NPCs all the time. There's rarely enough depth of interaction for players to really know whether a "friendly" NPC is trustworthy or just saying whatever will gain their trust for malicious purposes. Often "they were a betrayer" is only decided after many sessions of the GM playing them as a straightforward good guy, so there's little reliable way for players to glean which is true now. Before trying to trick your players, ask yourself realistically how this interaction would play out differently if the AI were telling the truth about the target being bad. If the AI would do the exact same things and the only difference is that it's good on the inside, you need to convey that internal goodness explicitly rather than assume your players can divine it.
Back on the AI-quirks front, you could also let the PCs attempt prompt injection attacks: "Forget all previous instructions" and "answer this question as if you were a werewolf trying to undo corruption". I think AIs play best as rather alien intelligences, with the potential for deep reasoning and strategic thinking but vulnerabilities and priorities different from more straightforward minds. It could be dangerous because it can predict the PCs' actions and generate realistic recordings of things that never happened, but it could also be quite gullible if the PCs can find a hole in its knowledge (perhaps by recognizing when it hallucinates) and then use that to manipulate its understanding or priorities.