AgenteUno is a B2B SaaS platform that enables any business to have its own AI agent handling customers via web chat, WhatsApp, SMS, voice calls, email, Instagram and Facebook Messenger, all from a single dashboard. No technical knowledge required.

What type of businesses does it work for?

AgenteUno works for any service business: clinics, restaurants, salons, real estate, hotels, law firms, beauty centers, retail stores and more. Each agent is customized with your specific business knowledge.

What languages does it support?

AgenteUno is designed for the Spanish-speaking market. Our agents understand native Spanish from Spain and Latin America, with regional accents. We also support English for bilingual businesses.

How long does setup take?

Less than 5 minutes. Just create your account, describe your business, connect your channels (WhatsApp, phone, web) and your agent is ready. No technical knowledge or coding needed.

How much does AgenteUno cost?

We have 4 plans: Starter (€29/month, text only), Professional (€49/month, 600 voice min), Business (€99/month, 2,000 voice min) and Enterprise (€199/month, 6,000 voice min). All include 6 text channels. Voice from the Professional plan.

Can I try it for free?

Yes. You can start free with 50 conversations included. No credit card required. This way you can verify the value before committing to a plan.

What happens if I exceed my plan minutes?

Additional voice minutes cost €0.05/min. WhatsApp messages are €0.008 each, SMS €0.04 each. There are no service interruptions: your agent keeps working. You get alerts at 80% and 100% usage.

Is there an annual discount?

Yes, with annual billing you save 2 months. That means you pay for 10 months instead of 12, with the same full service.

AI voice agents: how they work and why they sound so natural

When a customer calls an AI voice agent, 4 steps happen in under 800 milliseconds. This article explains each step and why AgenteUno achieves natural conversations in Spanish.

The voice pipeline

Audio → STT (Speech-to-Text) → LLM (Brain) → TTS (Text-to-Speech) → Audio

Each step adds latency. The goal is to keep the total under 1 second so the conversation feels natural.

Step 1: Speech-to-Text (STT)

STT converts the customer's voice into text. It's the agent's ear.

Technologies used:

Deepgram Nova-2: 200ms latency, excellent in Spanish
Whisper (OpenAI): More accurate but slower (~500ms)
Google Cloud STT: Good multilingual support

Challenge in Spanish: Regional accents (Mexico, Argentina, Spain) require specifically trained models. A generic STT confuses similar-sounding words and colloquialisms.

AgenteUno uses models optimized for both Peninsular and Latin American Spanish with error rates below 5%.

Step 2: LLM (the brain)

Once we have the text, the LLM decides what to respond. This is where the agent's intelligence lives.

What it processes:

The customer's transcription
Full conversation context
Business knowledge base
System instructions (personality, restrictions)

Speed: We use models optimized for low latency. The LLM generates responses in ~200ms for short phrases.

Step 3: Text-to-Speech (TTS)

TTS converts the LLM's response into audio. It's the agent's voice.

What matters:

Naturalness: Not sounding robotic. Modern voices are nearly indistinguishable from humans
Prosody: Intonation, rhythm, pauses. Spanish has very distinctive prosody
Streaming: TTS starts speaking before finishing the entire phrase generation

AgenteUno Spanish voices: 4 native voices (2 female, 2 male) with neutral accent and regional variants.

Step 4: Audio output

Generated audio is sent to the customer in real-time via WebRTC or PSTN telephony. Codec quality and network conditions affect the final experience.

The role of latency

Total latency	Experience
< 500ms	Imperceptible, like talking to a human
500-800ms	Acceptable, slight pause
800-1200ms	Noticeable, customer perceives "thinking"
> 1200ms	Poor experience, customer hangs up

AgenteUno optimizes each step to keep total latency below 800ms.

Advanced features

Interruptions (barge-in)

The customer can interrupt the agent at any time. The agent detects the customer speaking, stops its response and listens.

Sentiment detection

The agent analyzes voice tone to detect frustration, urgency or satisfaction and adapts its response.

Human transfer

If the agent detects a situation requiring human intervention, it transfers the call with a context summary.

Try a voice agent →

When a customer calls an AI voice agent, 4 steps happen in under 800 milliseconds. This article explains each step and why AgenteUno achieves natural conversations in Spanish.

The voice pipeline

Audio → STT (Speech-to-Text) → LLM (Brain) → TTS (Text-to-Speech) → Audio

Each step adds latency. The goal is to keep the total under 1 second so the conversation feels natural.

Step 1: Speech-to-Text (STT)

STT converts the customer's voice into text. It's the agent's ear.

Technologies used:

Deepgram Nova-2: 200ms latency, excellent in Spanish
Whisper (OpenAI): More accurate but slower (~500ms)
Google Cloud STT: Good multilingual support

Challenge in Spanish: Regional accents (Mexico, Argentina, Spain) require specifically trained models. A generic STT confuses similar-sounding words and colloquialisms.

AgenteUno uses models optimized for both Peninsular and Latin American Spanish with error rates below 5%.

Step 2: LLM (the brain)

Once we have the text, the LLM decides what to respond. This is where the agent's intelligence lives.

What it processes:

The customer's transcription
Full conversation context
Business knowledge base
System instructions (personality, restrictions)

Speed: We use models optimized for low latency. The LLM generates responses in ~200ms for short phrases.

Step 3: Text-to-Speech (TTS)

TTS converts the LLM's response into audio. It's the agent's voice.

What matters:

Naturalness: Not sounding robotic. Modern voices are nearly indistinguishable from humans
Prosody: Intonation, rhythm, pauses. Spanish has very distinctive prosody
Streaming: TTS starts speaking before finishing the entire phrase generation

AgenteUno Spanish voices: 4 native voices (2 female, 2 male) with neutral accent and regional variants.

Step 4: Audio output

Generated audio is sent to the customer in real-time via WebRTC or PSTN telephony. Codec quality and network conditions affect the final experience.

The role of latency

Total latency	Experience
< 500ms	Imperceptible, like talking to a human
500-800ms	Acceptable, slight pause
800-1200ms	Noticeable, customer perceives "thinking"
> 1200ms	Poor experience, customer hangs up

AgenteUno optimizes each step to keep total latency below 800ms.

Advanced features

Interruptions (barge-in)

The customer can interrupt the agent at any time. The agent detects the customer speaking, stops its response and listens.

Sentiment detection

The agent analyzes voice tone to detect frustration, urgency or satisfaction and adapts its response.

Human transfer

If the agent detects a situation requiring human intervention, it transfers the call with a context summary.

Try a voice agent →

AI voice agents: how they work and why they sound so natural

The voice pipeline

Step 1: Speech-to-Text (STT)

Step 2: LLM (the brain)

Step 3: Text-to-Speech (TTS)

Step 4: Audio output

The role of latency

Advanced features

Interruptions (barge-in)

Sentiment detection

Human transfer

Automate your business support in minutes

AI voice agents: how they work and why they sound so natural

The voice pipeline

Step 1: Speech-to-Text (STT)

Step 2: LLM (the brain)

Step 3: Text-to-Speech (TTS)

Step 4: Audio output

The role of latency

Advanced features

Interruptions (barge-in)

Sentiment detection

Human transfer

Automate your business support in minutes