Knowledge sources: frequently asked questions

If you’ve ever had a chatbot give you a flat-out wrong answer, it probably wasn’t the AI itself that slipped up. More often, the issue is with the knowledge sources behind it. In simple terms, knowledge sources are just the stuff your bot looks at when it’s trying to answer — PDFs, spreadsheets, websites, even API feeds.

And just like any library, what matters isn’t how many books you have, but whether they’re the right ones.

What exactly counts as a “knowledge source”?

Pretty much anything your team already uses to store knowledge.

A PDF of your HR leave policy.
A customer onboarding deck in PowerPoint.
A CSV of last month’s sales data.
Even a page from your Shopify store.

What surprised me the first time I set this up: images can count too. The system can actually read the text in a scanned product manual or a screenshot of instructions (thanks to OCR).

How much content can I actually upload?

This depends on your plan, but here’s the ballpark:

Starter: 5MB total
Professional: 100MB total
Enterprise: basically unlimited (with some guardrails so servers don’t tip over)

Now, you could dump in every policy doc, slide deck, and email archive. But I wouldn’t. From what I’ve seen, the teams that start with just 5–10 key documents — like an FAQ, pricing sheet, and a couple of policy PDFs — end up with better results than those that throw in 400 files on day one.

How does the AI “learn” from all this?

Okay, this part gets a bit technical, but stay with me.

It extracts text — so your PDF turns into searchable text.
It does semantic analysis — which is a fancy way of saying it figures out what the words actually mean in context.
It builds vector embeddings — think of these like index cards in an old-school library. Instead of pointing to a page number, they point to the meaning of the content.

Honestly, I don’t know the math behind embeddings (and I don’t really need to). What matters is that they make the AI better at finding the right answer.
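If you’re curious what “pointing to the meaning” looks like mechanically, here’s a minimal Python sketch of retrieval by vector similarity. It’s a toy built on loud assumptions. `embed` just counts words instead of calling a real embedding model, and the documents and chunk names are made up. The shape matches the steps above, though: turn each chunk into a vector once, turn the question into a vector, and return the closest chunk by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts.

    A real embedding model maps text to dense vectors that capture meaning.
    Word counts only match exact words, but they keep this example
    dependency-free while showing the same retrieval flow.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means nothing shared."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical text chunks extracted from uploaded documents.
chunks = {
    "hr_policy.pdf": "Employees accrue 20 days of paid leave per year.",
    "pricing.csv": "The Pro plan costs 49 dollars per month.",
    "returns_faq.pdf": "Items can be returned within 30 days of delivery.",
}

# The "index cards": one vector per chunk, computed once at upload time.
index = {name: embed(text) for name, text in chunks.items()}


def retrieve(question: str) -> tuple[str, float]:
    """Return (source name, similarity) for the chunk closest to the question."""
    q = embed(question)
    return max(
        ((name, cosine(q, vec)) for name, vec in index.items()),
        key=lambda pair: pair[1],
    )


print(retrieve("How many days of paid leave do I get?")[0])  # hr_policy.pdf
```

Swap `embed` for a real embedding model and the rest of the flow stays the same. That swap is what lets a question about “vacation” find a passage that only says “paid leave.”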

Can I organize my knowledge sources?

Yes — and you’ll thank yourself if you do.

I’ve seen teams that skip this and end up with a chaotic mess where no one knows which “FAQ_Final_v3.pdf” is the real one.

Best practices I’ve found helpful:

Create folders by department (HR, Sales, Support).
Tag sources with keywords like “pricing,” “onboarding,” or “returns.”
Set priority levels so the AI knows your official FAQ beats a random internal deck.
Archive old material — outdated info is worse than no info at all.
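Here’s one way the folder, tag, priority, and archive ideas could fit together, sketched in Python. The `Source` fields and the “highest priority wins” rule are assumptions for illustration, not how any particular platform actually ranks sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    name: str
    folder: str                                   # e.g. "HR", "Sales", "Support"
    tags: set = field(default_factory=set)        # e.g. {"pricing", "returns"}
    priority: int = 0                             # higher wins when sources overlap
    archived: bool = False                        # archived sources are never used


def pick_source(sources: List[Source], tag: str) -> Optional[Source]:
    """Return the highest-priority active source carrying the tag, if any."""
    candidates = [s for s in sources if tag in s.tags and not s.archived]
    return max(candidates, key=lambda s: s.priority, default=None)


sources = [
    # An old duplicate: archived, so it is skipped even though it matches.
    Source("FAQ_Final_v3.pdf", "Support", {"returns"}, priority=5, archived=True),
    Source("Official_FAQ.pdf", "Support", {"returns", "pricing"}, priority=10),
    Source("Sales_Deck.pptx", "Sales", {"pricing"}, priority=2),
]
print(pick_source(sources, "pricing").name)  # Official_FAQ.pdf
print(pick_source(sources, "onboarding"))    # None
```

The archive flag does the real work here: the stale “FAQ_Final_v3.pdf” still carries the right tag, but it can never win.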

How do I keep my chatbot from giving outdated info?

The short answer: treat knowledge sources like living documents.

Keep quality high. A sloppy PowerPoint leads to sloppy answers.
Update on a schedule. Weekly for things like product prices, monthly for team changes, quarterly for policies.
Listen to feedback. If someone flags that the bot gave a wrong vacation policy, go back and update the HR doc.
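A schedule is easier to stick to when something tells you what’s overdue. This sketch flags sources whose last update is older than the cadence above. The category names, the 30- and 91-day stand-ins for “monthly” and “quarterly,” and the dates are all hypothetical.

```python
from datetime import date, timedelta

# Refresh cadence per content type, following the schedule suggested above.
REFRESH_INTERVALS = {
    "prices": timedelta(weeks=1),
    "team": timedelta(days=30),      # stand-in for "monthly"
    "policies": timedelta(days=91),  # stand-in for "quarterly"
}


def due_for_review(sources: dict[str, tuple[str, date]], today: date) -> list[str]:
    """Return names of sources whose last update is older than their interval.

    `sources` maps a source name to (category, date last updated).
    """
    return sorted(
        name
        for name, (category, last_updated) in sources.items()
        if today - last_updated >= REFRESH_INTERVALS[category]
    )


sources = {
    "price_list.csv": ("prices", date(2024, 5, 1)),
    "org_chart.pdf": ("team", date(2024, 4, 20)),
    "leave_policy.pdf": ("policies", date(2024, 3, 1)),
}
print(due_for_review(sources, today=date(2024, 5, 10)))  # ['price_list.csv']
```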

I’m not sure there’s a way to guarantee “never wrong,” but this combo gets you close.

Can I connect external data?

Yep, and this is where it starts feeling powerful. You can:

Sync with Google Drive or Dropbox for always-updated docs.
Pull in Salesforce data so the bot knows customer history.
Connect Shopify so it can answer “Do you have this in stock?” without manual uploads.
Use APIs or webhooks for real-time updates.

I once worked with a retailer who connected their chatbot to Shopify and inventory data. Suddenly, customers could ask, “Do you have this shirt in medium?” and get a real answer — not a guess.
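Under the hood, that kind of integration typically means the store pushes an event whenever stock changes, and the bot answers from the synced copy. Here’s a minimal sketch of that flow. The payload fields (`sku`, `size`, `quantity`) are hypothetical and are not Shopify’s actual webhook format. A production receiver would also need to verify each request’s signature before trusting it.

```python
import json

# Local copy of inventory, keyed by (sku, size), kept in sync by webhook events.
inventory: dict[tuple[str, str], int] = {}


def handle_inventory_webhook(body: str) -> None:
    """Apply one inventory-update event (hypothetical JSON shape) to the local copy."""
    event = json.loads(body)
    inventory[(event["sku"], event["size"])] = int(event["quantity"])


def in_stock(sku: str, size: str) -> str:
    """Answer a stock question from synced data instead of guessing."""
    qty = inventory.get((sku, size))
    if qty is None:
        return "I don't have stock data for that item."
    return f"Yes, {qty} in stock." if qty > 0 else "Sorry, that size is sold out."


handle_inventory_webhook('{"sku": "shirt-123", "size": "M", "quantity": 4}')
handle_inventory_webhook('{"sku": "shirt-123", "size": "L", "quantity": 0}')
print(in_stock("shirt-123", "M"))  # Yes, 4 in stock.
print(in_stock("shirt-123", "L"))  # Sorry, that size is sold out.
```

Note the “no data” branch: when the sync hasn’t covered an item, the bot says so rather than inventing an answer.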

What if my chatbot gives the wrong answer?

It happens. Even the best-trained systems slip.

What helps is:

Confidence scoring so you know when the AI is guessing.
Source links so users can see where the info came from.
Feedback buttons so wrong answers get flagged.
Manual overrides so you can fix things quickly.
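Here’s a rough sketch of how three of those safeguards might combine in one response function. Manual overrides are checked first, low-confidence matches fall back to a hand-off instead of a guess, and confident answers carry their source link. The `Retrieved` type, the 0.6 threshold, the override text, and the URL are all placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Retrieved:
    text: str          # the passage the bot would answer from
    source_url: str    # link shown to the user
    score: float       # retrieval similarity; higher means more confident


CONFIDENCE_THRESHOLD = 0.6  # hypothetical; tune it against real questions
FALLBACK = "I'm not sure about that one, so let me connect you with a person."

# Manual overrides: fixed answers that win over retrieval until the doc is fixed.
MANUAL_OVERRIDES = {
    "what is the vacation policy?": "Full-time staff get 20 days of paid leave per year.",
}


def respond(question: str, best: Optional[Retrieved]) -> str:
    override = MANUAL_OVERRIDES.get(question.strip().lower())
    if override is not None:
        return override
    if best is None or best.score < CONFIDENCE_THRESHOLD:
        return FALLBACK  # low confidence: say so instead of guessing
    return f"{best.text} (Source: {best.source_url})"


hit = Retrieved("Returns are accepted within 30 days.", "https://example.com/returns", 0.82)
print(respond("Can I return this?", hit))
# Returns are accepted within 30 days. (Source: https://example.com/returns)
```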

No AI is perfect, but the real value is that you can see when it’s wrong and do something about it.

How secure are my knowledge sources?

Security isn’t the flashiest topic, but it matters. Typically you get:

Encryption in transit and at rest.
Role-based permissions (so finance docs don’t leak into marketing).
GDPR compliance and audit logs.
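Role-based permissions are easiest to reason about when the filter runs before the bot ever searches, so restricted text never reaches an answer. This sketch assumes a hypothetical role-to-folder mapping; your platform will have its own way of defining these rules.

```python
# Hypothetical mapping: which folders each role is allowed to read.
ROLE_ACCESS = {
    "finance": {"Finance", "General"},
    "marketing": {"Marketing", "General"},
    "admin": {"Finance", "Marketing", "HR", "General"},
}


def visible_sources(sources: dict[str, str], role: str) -> list[str]:
    """Filter sources (name -> folder) down to those the role may see.

    Unknown roles get nothing, which is the safer default.
    """
    allowed = ROLE_ACCESS.get(role, set())
    return sorted(name for name, folder in sources.items() if folder in allowed)


sources = {
    "q3_budget.xlsx": "Finance",
    "brand_guide.pdf": "Marketing",
    "office_hours.pdf": "General",
}
print(visible_sources(sources, "marketing"))  # ['brand_guide.pdf', 'office_hours.pdf']
```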

Personally, I always tell teams: don’t upload anything you wouldn’t already store in your company’s cloud drive. That gut check saves you from over-sharing.

Can multiple people manage all this?

Yes, and in larger teams they probably should. You can:

Assign permissions (view, edit, approve).
Require approvals for changes.
Track who uploaded or deleted a file.
Roll back if someone messes up.
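Tracking and rollback can both come from one idea: never overwrite, only append. This hypothetical `VersionedDoc` class keeps every version plus a log of who did what, and “rolling back” re-publishes an old version instead of deleting the bad one. Approvals are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VersionedDoc:
    """Keeps every version of a document plus an audit log of who did what."""
    name: str
    versions: list[str] = field(default_factory=list)
    log: list[tuple[str, str, str]] = field(default_factory=list)  # (UTC time, user, action)

    def _record(self, user: str, action: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.log.append((stamp, user, action))

    def upload(self, user: str, content: str) -> None:
        self.versions.append(content)
        self._record(user, f"uploaded v{len(self.versions)}")

    def restore(self, user: str, version: int) -> None:
        """Roll back by re-publishing an old version; history is never deleted."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"no version {version}")
        self.versions.append(self.versions[version - 1])
        self._record(user, f"restored v{version} as v{len(self.versions)}")

    @property
    def current(self) -> str:
        return self.versions[-1]


doc = VersionedDoc("leave_policy.pdf")
doc.upload("ana", "20 days of paid leave")
doc.upload("ben", "2 days of paid leave")  # an accidental edit
doc.restore("ana", 1)
print(doc.current)  # 20 days of paid leave
for stamp, user, action in doc.log:
    print(stamp, user, action)
```

Because the bad version stays in history, the log still answers “who changed the policy doc?” after the fix.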

This avoids the classic “who changed the policy doc without telling anyone?” problem.

How do I know if my knowledge sources are working?

Metrics. Some useful ones:

Which docs are pulled most often.
Where the bot still can’t answer.
How satisfied users are (ratings, thumbs up/down).
How often sources are refreshed.
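If your platform lets you export a query log, the first three metrics come down to a few lines of arithmetic. The log format here (the source used, or `None` when the bot couldn’t answer, plus an optional thumbs up/down) is an assumption, and the entries are made up.

```python
from collections import Counter
from typing import Optional

# Each entry: (source the bot answered from, or None if it couldn't answer;
#              rating: 1 = thumbs up, 0 = thumbs down, None = unrated).
QueryLog = list[tuple[Optional[str], Optional[int]]]

log: QueryLog = [
    ("returns_faq.pdf", 1),
    ("returns_faq.pdf", 1),
    ("pricing.csv", 0),
    (None, 0),
    ("returns_faq.pdf", None),
    (None, None),
]


def summarize(log: QueryLog) -> dict:
    pulls = Counter(src for src, _ in log if src is not None)
    unanswered = sum(1 for src, _ in log if src is None)
    ratings = [r for _, r in log if r is not None]
    return {
        "most_pulled": pulls.most_common(3),  # which docs are pulled most often
        "unanswered_rate": unanswered / len(log) if log else 0.0,
        "satisfaction": sum(ratings) / len(ratings) if ratings else None,
    }


report = summarize(log)
print(report["most_pulled"])                # [('returns_faq.pdf', 3), ('pricing.csv', 1)]
print(f"{report['unanswered_rate']:.0%}")   # 33%
print(report["satisfaction"])               # 0.5
```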

I’ve noticed that teams who check these metrics quarterly end up with cleaner, more helpful knowledge bases than those who “set it and forget it.”

Quick start: 5 steps to set up your knowledge sources

Audit what you already have — grab your top 5–10 documents.
Upload them.
Test with real questions (not just the easy ones).
Ask your team for honest feedback.
Iterate — add, update, prune.

(One tangent here: don’t underestimate the value of pruning. Old docs sneak in over time, and pruning them out often improves answers more than uploading new ones.)
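For step 3, it helps to keep your real questions in a small test set and rerun it after every change. This sketch assumes each question is paired with the document that should answer it. `keyword_retrieve` is a deliberately naive stand-in for your bot’s actual retrieval, and the questions are hypothetical.

```python
from typing import Callable

# Real-world-style questions, each paired with the document that should answer it.
TEST_CASES = [
    ("Can I return a shirt after three weeks?", "returns_faq.pdf"),
    ("How much is the Pro plan?", "pricing.csv"),
    ("Do part-timers get paid leave?", "hr_policy.pdf"),
]


def evaluate(cases: list[tuple[str, str]],
             retrieve: Callable[[str], str]) -> tuple[float, list[str]]:
    """Return the hit rate and the questions routed to the wrong source."""
    misses = [q for q, expected in cases if retrieve(q) != expected]
    return 1 - len(misses) / len(cases), misses


def keyword_retrieve(question: str) -> str:
    """Naive stand-in retriever: matches a couple of hard-coded keywords."""
    q = question.lower()
    if "return" in q:
        return "returns_faq.pdf"
    if "plan" in q or "price" in q:
        return "pricing.csv"
    return "unknown"


hit_rate, misses = evaluate(TEST_CASES, keyword_retrieve)
print(f"{hit_rate:.0%}", misses)  # 67% ['Do part-timers get paid leave?']
```

The third question misses, and that’s the point: the awkward, real-world phrasings are the ones that expose gaps.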

Final thought

Here’s the thing: a chatbot isn’t smart on its own. It’s smart because of what you feed it. Strong knowledge sources don’t guarantee perfect answers — but weak ones almost always guarantee bad ones.

So start small, keep it clean, and update often. That’s usually enough to make a chatbot feel genuinely useful.