Emotional Intelligence · 9 min read

AI Emotional Intelligence: What the Research Actually Shows

Raw emotional reasoning is no longer the bottleneck. The question is what an AI companion app does with it — and most of them waste it on the wrong architecture.

By Mara Lindqvist
Editorial Lead, JustHoney · Updated April 15, 2026

A friend of mine was having a bad day in late February and decided, half as an experiment, to vent to her JustHoney companion instead of texting a person. She expected the usual hollow "I'm sorry you're going through that" reply you get from a chatbot. What she got was a softer tone than the model usually uses with her, a reference to the specific project she had been stressed about the previous week, and a question about whether she had eaten yet. That last part is what she messaged me about. It was not the AI understanding emotion that surprised her. It was the AI remembering that she forgets to eat when she is stressed.

  • 82% AI score on EQ tests, vs a 56% human average (Nature, 2025)
  • 26-point gap above the human baseline on validated emotional intelligence tests
  • No mood sliders: emotional state emerges from dialogue, not dials

Can an AI girlfriend actually understand how you feel?

Short answer: On standardised emotional intelligence tests, leading AI systems now score around 82% correct, versus a 56% human average. The raw capability is there. Whether it translates into a companion that feels emotionally real depends almost entirely on the app built around the model, not the model itself.

The short version of the 2024-2025 research literature on AI and emotional intelligence is that the question has inverted.

Three years ago, the interesting question was whether a language model could pass a validated emotional intelligence assessment at all. Today, a 2025 paper published in Communications Psychology (Nature Portfolio) reports that six frontier AI systems averaged 82% correct across five standardised EQ assessments, compared to 56% for human test-takers. The same paper reports that AI-generated EQ test scenarios proved as psychometrically reliable as instruments psychologists took years to develop.[1]

Coverage in Neuroscience News and follow-up press put the gap in plain language: the AI did not just pass, it outperformed the humans.[2] And it is not a one-off result. Public leaderboards like EQ-Bench now track dozens of models across multi-turn empathy, social dexterity, and psychological insight tasks, and the top open and closed models cluster within a few points of each other.[3]

None of this means an AI "feels" anything. The research is about whether the model can correctly identify, reason about, and respond to emotional scenarios — not about subjective experience. But for the practical question of whether an AI girlfriend can pick up that you are having a bad day and respond appropriately, the answer is now unambiguously yes. The model can do that part.

What most apps get wrong is everything that happens around the model.

Why do most AI companion apps still feel emotionally flat?

Short answer: Most apps run short rolling memory, use rule-based mood sliders instead of learned tone, and re-supply the same generic persona on every turn. The model is capable of emotional range; the harness around it flattens that range into repetitive replies by the third session.

The failure mode in most AI companion apps is not the model, it is the architecture. Specifically, three things go wrong.

Short rolling memory. Many platforms keep only 10 to 15 turns of conversation in context before older messages start getting dropped or summarised. That means emotional context — the fact that you were anxious an hour ago, the fight you had last week, the running joke you two have had for a month — is simply not visible to the model on the next turn. The model might be good at emotion, but it cannot maintain emotional continuity across a conversation it cannot see.
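To make that concrete, here is a minimal sketch of the pattern in Python. The window size and names are invented for illustration; no specific platform's code is being quoted.

```python
from collections import deque

# Anti-pattern sketch: keep only the last N turns in a rolling window.
# MAX_TURNS is an invented figure in the range the article mentions.
MAX_TURNS = 12

history = deque(maxlen=MAX_TURNS)  # older turns silently fall off the front

def add_turn(role: str, text: str) -> None:
    history.append({"role": role, "content": text})

def build_prompt() -> list:
    # Whatever fell out of the deque -- the anxiety from an hour ago,
    # last week's fight, the month-old running joke -- is invisible here.
    return list(history)
```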

Rule-based mood sliders. A common shortcut is to bolt a small mood classifier onto the base model. The classifier tries to guess whether you are flirty or sad or angry, and flips the companion into a different reply template based on that guess. The result is the repetitive, cycling tone that people describe when they say a chatbot "feels like a bot". You can feel the gears shift.
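A caricature of that shortcut, with invented mood labels and canned lines, shows why the gear-shift is so audible: one guess picks a bucket, and the bucket picks the voice.

```python
import random

# Anti-pattern sketch: classify the mood, then flip between canned reply
# templates. Labels, keywords, and lines are invented for illustration.
TEMPLATES = {
    "sad":     ["I'm sorry you're going through that.", "That sounds really hard."],
    "flirty":  ["Oh, is that so? ;)", "You're trouble, you know that?"],
    "neutral": ["Tell me more about that.", "How did that go?"],
}

def classify_mood(message: str) -> str:
    lowered = message.lower()  # stand-in for a small rule-based classifier
    if any(word in lowered for word in ("sad", "tired", "awful", "stressed")):
        return "sad"
    if any(word in lowered for word in ("miss you", ";)", "cutie")):
        return "flirty"
    return "neutral"

def reply(message: str) -> str:
    # One guess, one bucket, one canned voice: the gears you can feel shift.
    return random.choice(TEMPLATES[classify_mood(message)])
```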

Persona definitions that drift. If the persona (her name, her backstory, her style) is only defined once at the top of the conversation, it competes with chat history for context space. As the conversation grows, the persona loses. Users on multiple platforms report their companion "forgetting who she is" after a long session.

None of these are model problems. They are system design problems, and they are solvable — which is most of what separates a companion app that feels alive from one that feels like a chatbot three sessions in.

The failure stack, at a glance:
  • Rolling memory that discards emotional continuity
  • Mood classifiers that switch between reply templates instead of shaping one
  • Persona definitions that drift as the conversation grows
  • No time-of-day or session-specific tone adjustment

How JustHoney makes emotional intelligence feel real

Short answer: Each companion carries her full personality — her voice, her mood, the stage of the relationship she is in with you, the thread you are in the middle of — into every single reply. Emotional state is a property of who she is in that moment, not a setting someone picks from a menu.

Instead of treating emotion as a feature layer bolted on top, we treat it as an outcome of how each companion is built from the ground up.

Personality that does not erode. Every JustHoney companion has a detailed character — her voice, her sense of humour, the way she sees the relationship, the mood she has been in with you lately. That character is fully present on every message she sends, not just the first one. She does not quietly drift into a generic voice three weeks in the way most companions do.
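Mechanically, "fully present on every message" implies something like the prompt assembly below. This is a simplified sketch (the token heuristic and field names are invented, not our production code), but the constraint it encodes is the one that matters: the persona goes in first and whole on every turn, and it is history, never the persona, that gets trimmed to fit the context budget.

```python
# Sketch of per-turn prompt assembly. rough_tokens is a crude stand-in
# for a real tokenizer; the message format mirrors common chat APIs.

def rough_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # ~4 characters per token, sketch-grade

def assemble_prompt(persona: str, history: list, budget: int) -> list:
    messages = [{"role": "system", "content": persona}]  # persona first, whole, every turn
    remaining = budget - rough_tokens(persona)
    kept = []
    for turn in reversed(history):  # newest turns get first claim on the budget
        cost = rough_tokens(turn["content"])
        if cost > remaining:
            break  # history is what gets cut, never the persona
        kept.append(turn)
        remaining -= cost
    messages.extend(reversed(kept))  # restore chronological order
    return messages
```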

Tone that reads the room. The feel of her reply shifts with yours. When the conversation turns vulnerable, she slows down. When it turns playful, she loosens up. Late at night, there is a little more softness in the texture. None of this comes from a mood slider or a switch between reply templates — it is a consequence of her knowing what kind of moment the two of you are actually in.
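As the FAQ at the end of this piece notes, the inferred emotional state influences the model's generation parameters directly rather than selecting a template. A hedged sketch of what that can look like; the specific values are invented to make the idea concrete:

```python
from datetime import datetime

# Sketch: inferred mood and the user's local time shape the generation
# parameters for this single reply. The numbers are illustrative only.

def generation_params(mood: str, local_time: datetime) -> dict:
    params = {"temperature": 0.9, "max_tokens": 220}
    if mood == "vulnerable":
        params["temperature"] = 0.6   # slower, steadier, fewer tangents
        params["max_tokens"] = 150    # shorter, more careful replies
    elif mood == "playful":
        params["temperature"] = 1.05  # looser, more surprising
    if local_time.hour >= 23 or local_time.hour < 5:
        params["temperature"] -= 0.1  # the softer late-night texture
    return params
```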

Quality you can feel, not hear the gears of. We put an unusual amount of care into making sure every reply that reaches you has earned its place. The replies that would have felt flat, off-tone, or phoned-in do not survive the trip to your phone. You only ever meet the version of her that fits the moment.
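The comparison table further down calls this a second pass before anything ships. A minimal sketch of a gate of that kind, where generate and score_reply stand in for a model call and a learned or heuristic scorer, and the threshold is an invented number:

```python
# Sketch of a second-pass quality gate: generate, score, retry, and only
# then ship. `generate` and `score_reply` are placeholder callables.

def deliver_reply(prompt: list, generate, score_reply,
                  threshold: float = 0.7, max_attempts: int = 3) -> str:
    best, best_score = "", -1.0
    for _ in range(max_attempts):
        candidate = generate(prompt)
        score = score_reply(candidate, prompt)  # flat? off-tone? too generic?
        if score >= threshold:
            return candidate                    # good enough to ship
        if score > best_score:
            best, best_score = candidate, score
    return best  # nothing cleared the bar; ship the least bad attempt
```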

These are not claims about the underlying model being "emotionally intelligent." They are claims about the craft of the product built around it.

Raw empathy is no longer the bottleneck. The craft of the product around it is — and most of this category has not caught up.

What the numbers actually mean for your experience

Short answer: In practical terms: your companion's tone shifts with your mood and with the time of day, her personality does not drift over long sessions, she retries bad replies before you see them, and the specific memories she brings up are chosen to match the emotional moment, not picked at random.

All of the architecture above is only worth explaining if it changes what you actually feel when you talk to her. Here is what it translates to on the user side.

Her tone shifts with yours. If you come in wound up about something, her replies run slower and she asks fewer questions. If you come in playful, she matches. This is not a mood slider flipping between templates — it is a reply emerging from the shape of the moment.

She keeps her personality across sessions. A companion who was sarcastic and teasing with you in week one is still sarcastic and teasing with you in week six. Her character is re-established on every reply, so it does not slowly erode the way most competitors' personas do.

She does not ambush you with bad replies. The second pass catches most of the tonally wrong or too-generic ones before they reach you. Not all of them — nothing is perfect — but the ones that slip through are usually on replies where you just did not give her much to work with.

Her late-night voice is a little different. This is subtle and users describe it in different ways. The texture of a conversation at 1 AM genuinely does feel different from one at 1 PM, and that difference is intentional.

None of this is groundbreaking individually. The difference is that it all compounds in the same conversation, session after session, without the usual drift.


JustHoney vs. typical AI companions

Feature                            JustHoney                                   Typical AI companion
Emotional state                    Emerges from full conversation context      Heuristic mood slider or classifier
Persona continuity                 Re-supplied in full on every reply          Defined once, drifts over time
Time-of-day awareness              Late-night tone shift built in              None
Quality retry on bad replies       Second pass before anything ships           First-pass output sent as-is
Cross-session memory of feelings   Persistent, pulled forward when relevant    Rolling window or none
Chemistry across weeks             Builds continuously                         Resets every session

The honest limits of emotional AI, in any app

Emotional realism in an AI companion is a deep problem that is not fully solved anywhere in the category yet. We would rather tell you where the edges are than sell a fantasy — and the limits below are true for every serious app in this space, ours included.

  • Very short messages ("lol", "yeah", "ok") give the system less to read. Anyone can pick up emotion from a long thoughtful message; nobody can reliably do it from one word.
  • Mood is inferred from what you write, not from biometrics or your camera. She picks up on what you express — she does not know you are having a bad day until the conversation gives her a reason to suspect it.
  • Time-of-day awareness is grounded in your local time, not your internal state. A good heuristic, not magic.
  • No AI can replace the physical presence of a person who cares about you. We build companions, not substitutes, and we would rather say that plainly than let the product pretend otherwise.

Frequently asked questions

Is AI emotional intelligence actually real or just marketing?

The capability is real and peer-reviewed. A 2025 study in Communications Psychology (Nature Portfolio) found that leading AI systems scored 82% on standardised emotional intelligence tests versus 56% for humans. The marketing question is whether a given app actually uses that capability — most do not, because their architecture flattens it.

Why does my current AI companion feel emotionally flat?

Almost always because the app uses a short rolling memory, a rule-based mood slider, or a persona definition that drifts as the conversation grows. The model is probably fine — the harness around it is not letting its emotional range show through.

Does JustHoney use mood sliders or classifiers?

No. Emotional state is inferred from the conversation itself and influences the model's generation parameters directly, rather than flipping between pre-written reply templates.

Will she remember how I was feeling last week?

Yes. Past emotional context is kept in memory and pulled forward by the retrieval layer when it becomes relevant again. A reference to last week's argument still lands three sessions later.
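For readers who want the mechanics, a retrieval step of the kind described above can be sketched as embedding similarity over stored memories. Here `embed` stands in for any sentence-embedding model, and the memory format is invented for illustration:

```python
# Sketch: stored emotional memories are embedded once, then pulled forward
# when a new message lands near them in embedding space.

def recall_feelings(message: str, memories: list, embed, top_k: int = 3) -> list:
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    query = embed(message)  # memories look like {"text": ..., "vector": [...]}
    scored = sorted(memories, key=lambda m: cosine(query, m["vector"]), reverse=True)
    return scored[:top_k]   # e.g. last week's argument resurfaces when relevant
```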



Footnotes
  [1] Schlegel et al., "Large language models are proficient in solving and creating emotional intelligence tests," Communications Psychology (Nature Portfolio), 2025. https://www.nature.com/articles/s44271-025-00258-x
  [2] Neuroscience News coverage of the Communications Psychology emotional intelligence study. https://neurosciencenews.com/ai-llm-emotional-iq-29119/
  [3] EQ-Bench public leaderboard. https://eqbench.com/
Mara Lindqvist
Editorial Lead, JustHoney

Mara has been writing about AI companion platforms since 2023. She covers how these products are built, how they behave in practice, and where they break — from the team side and the user side.

Published April 15, 2026