Custom AI Assistants in Practice: The Code and Commands That Really Matter
Personalized AI Assistants: The Essentials in One Article — Real Code, Diagrams, and Concrete Steps, Excerpts from a 44-Lesson Course.
No endless theory here: open the terminal and practice. Here are the essentials of Custom AI Assistants, extracted directly from a complete 44-lesson course, with real code you can copy and paste right now.
- Introduction and Getting Started
- Designing an Effective Assistant
- Creating Custom GPTs
- Claude Projects and Gemini Gems
- Actions and External Tools
Threads, messages and runs
Learning objectives
- Create and manage a conversation Thread
- Add user messages to a Thread
- Launch a Run and manage its lifecycle
- Choose between polling and streaming
- Retrieve and display the assistant's response
Threads: what are they concretely?
A Thread is a persistent container that stores the history of a conversation between a user and an assistant. OpenAI persists it on its servers. You only manage the ID.
Best practice: one Thread per user session. For example, in your app, you create a Thread when the user starts conversing. You store the thread_id in your database.
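The "one Thread per session" practice can be sketched as follows. This is a minimal illustration, not the course's own code: `client` is assumed to be an `openai.OpenAI()` instance, `session_store` is a hypothetical dict-like stand-in for your database, and `get_or_create_thread_id` is a name invented for this example.

```python
def get_or_create_thread_id(client, session_store, session_id):
    """Return the thread_id for a session, creating a Thread on first use.

    Assumptions: `client` is an openai.OpenAI() instance and
    `session_store` is any dict-like mapping standing in for your database.
    """
    thread_id = session_store.get(session_id)
    if thread_id is None:
        # OpenAI persists the conversation history on its side
        thread = client.beta.threads.create()
        thread_id = thread.id
        # We only keep the ID
        session_store[session_id] = thread_id
    return thread_id
```

Calling it twice with the same `session_id` returns the same Thread, so the history is preserved between messages.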
A Run executes the Assistant on a Thread and moves through the following statuses:
| Status | Meaning |
|---|---|
| queued | Waiting in the queue, will start shortly |
| in_progress | The assistant is generating the response |
| requires_action | Function call: your code must respond |
| completed | Completed successfully |
| failed | Failed, see last_error |
| cancelled | Manually cancelled |
| expired | Timeout (10 minutes) |
Polling: wait for the Run to finish
The classic pattern is to check the Run status at regular intervals until it reaches a status where the loop can stop.
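A minimal sketch of that polling loop, assuming `client` is an `openai.OpenAI()` instance. The function name `wait_for_run`, the default interval, and the 600-second timeout (mirroring the 10-minute Run timeout) are choices made for this example. The loop also stops on `requires_action`, because at that point your code has to act before the Run can continue.

```python
import time

# Statuses at which polling should stop. "incomplete" is not in the table
# above but is also returned by the API, so it is included here as well.
STOP_STATUSES = {
    "completed", "failed", "cancelled", "expired",
    "requires_action", "incomplete",
}

def wait_for_run(client, thread_id, run_id, poll_interval=1.0, timeout=600.0):
    """Poll a Run until it stops progressing on its own, then return it."""
    deadline = time.monotonic() + timeout
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run_id
        )
        if run.status in STOP_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} still {run.status!r} after {timeout}s")
        time.sleep(poll_interval)
```

If your installed SDK version provides a `create_and_poll` helper on `client.beta.threads.runs`, it can replace a hand-written loop like this one.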
Complete example: simple conversation
A global Assistant
Create the Assistant once and store its ID in config.
One Thread per session
Create a Thread when the user connects, store the ID in the database.
Retrieve the Thread
With each new message, reuse the stored ID to preserve history.
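The three steps above (global Assistant, one Thread per session, reuse of the stored ID) can be tied together in a single helper. This is a sketch under assumptions: `client` is an `openai.OpenAI()` instance, `assistant_id` comes from your config, `thread_id` from your database, and `ask` is a name chosen for this example.

```python
import time

def ask(client, assistant_id, thread_id, user_text, poll_interval=1.0):
    """Send one user message on an existing Thread and return the reply text."""
    # 1. Append the user message to the stored Thread
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=user_text
    )
    # 2. Launch a Run of the global Assistant on that Thread
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id
    )
    # 3. Poll while the Run is still pending or generating
    while run.status in ("queued", "in_progress"):
        time.sleep(poll_interval)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run.id
        )
    if run.status != "completed":
        raise RuntimeError(f"Run ended with status {run.status!r}")
    # 4. Messages are requested newest first, so the reply is the first item
    messages = client.beta.threads.messages.list(
        thread_id=thread_id, order="desc", limit=1
    )
    return messages.data[0].content[0].text.value
```

This sketch assumes the first content block of the reply is text; a reply that starts with an image block would need an extra check.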
Limits and quotas
| Aspect | Limit |
|---|---|
| Run timeout | 10 minutes |
| Messages per Thread | No strict limit, but monitor token usage as the history grows |
| Threads per account | No strict limit |
| Concurrent Runs | Depends on tier (10-1000 simultaneous) |
| Thread storage | 30 days by default, then purged automatically |
File search and code interpreter
Learning objectives
- Create a Vector Store and upload files to it
- Attach a Vector Store to an Assistant for RAG
- Enable Code Interpreter and test Python computation
- Retrieve files generated by Code Interpreter
- Combine both tools in the same Assistant
File Search: OpenAI's native RAG
File Search is the native RAG tool of the Assistants API. You give it files, it indexes them automatically (chunking, embedding, vector store) and allows the Assistant to search for relevant passages.
Difference from a custom RAG pipeline: with File Search, you do not choose a chunker or an embedding model, and you do not manage a vector database. OpenAI handles all of it.
Vector Stores
A Vector Store is a container of indexed files, reusable across multiple Assistants. You create it once, add your documents, then attach it to the Assistants that need it.
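The create-once, attach-many idea might look like the sketch below. Assumptions to note: `vector_stores` is the SDK's vector store resource (depending on the SDK version it is reached as `client.vector_stores` or `client.beta.vector_stores`), the `file_ids` refer to files already uploaded with purpose `"assistants"`, and both function names are invented for this example.

```python
def build_knowledge_base(vector_stores, name, file_ids):
    """Create a Vector Store and add already-uploaded files to it.

    `vector_stores` is assumed to be the SDK's vector store resource,
    passed in explicitly so the sketch does not depend on its location.
    """
    store = vector_stores.create(name=name)
    for file_id in file_ids:
        # Each added file is chunked and embedded by OpenAI
        vector_stores.files.create(vector_store_id=store.id, file_id=file_id)
    return store.id

def file_search_params(vector_store_id):
    """Keyword arguments that attach a Vector Store to an Assistant."""
    return {
        "tools": [{"type": "file_search"}],
        "tool_resources": {
            "file_search": {"vector_store_ids": [vector_store_id]}
        },
    }
```

The returned dictionary can be unpacked into `client.beta.assistants.create(...)` for every Assistant that should search the same store.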
Retrieving citations
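When File Search is used, the assistant's text blocks carry annotations that point back to the source files. A minimal sketch of reading them, assuming `message` is a Message object returned by the API; `extract_citations` is a name invented here, and replacing the inline markers with `[1]`, `[2]` is one presentation choice among others.

```python
def extract_citations(message):
    """Return (text, file_ids) with inline markers replaced by [1], [2], ..."""
    parts, file_ids = [], []
    for block in message.content:
        if block.type != "text":
            continue  # skip image blocks and other non-text content
        text = block.text.value
        for annotation in block.text.annotations:
            if annotation.type != "file_citation":
                continue
            file_ids.append(annotation.file_citation.file_id)
            # annotation.text is the raw marker embedded in the answer
            text = text.replace(annotation.text, f"[{len(file_ids)}]")
        parts.append(text)
    return "\n".join(parts), file_ids
```

Each returned file ID can then be resolved to a filename on your side to display a readable source list.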
Enabling Code Interpreter
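Enabling the tool comes down to listing it in the Assistant's `tools`. A minimal sketch, assuming `client` is an `openai.OpenAI()` instance; the function name, the Assistant name and the instructions are illustrative, and the model is left as a parameter rather than guessed.

```python
def create_analyst(client, model):
    """Create an Assistant that can write and run Python in a sandbox."""
    return client.beta.assistants.create(
        name="Data analyst",  # illustrative name
        instructions="Write and run Python code to answer numeric questions.",
        model=model,
        tools=[{"type": "code_interpreter"}],
    )
```

A quick test is to ask it for a computation that is tedious by hand, such as a standard deviation over a list of numbers.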
Uploading a file for Code Interpreter
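A file becomes available to Code Interpreter once it is uploaded and referenced in `tool_resources`. A minimal sketch, assuming `client` is an `openai.OpenAI()` instance and `path` points to a local file; `upload_for_code_interpreter` is a name invented for this example.

```python
def upload_for_code_interpreter(client, path):
    """Upload a local file and return the tool_resources entry that exposes it."""
    with open(path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="assistants")
    # Pass this dict as `tool_resources` when creating the Assistant or Thread
    return {"code_interpreter": {"file_ids": [uploaded.id]}}
```

The returned dictionary is meant to be passed as `tool_resources=` alongside `tools=[{"type": "code_interpreter"}]`.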
Function calling and custom tools
Learning objectives
- Define a custom function in JSON Schema format
- Attach it to an Assistant as a tool
- Handle the requires_action status and submit a response
- Define multiple functions and let the Assistant choose
- Build a mini-agent that combines multiple tools
Why function calling?
File Search and Code Interpreter are powerful but limited: they only work with uploaded files and sandboxed Python. For EVERYTHING else (your internal DB, your custom APIs, specific business actions), you need function calling.
Principle: you describe your Python functions to the model. When it wants to call them, it returns the function name and arguments. You execute the function on your side and submit the result.
Defining a function in JSON Schema format
You describe each function with a name, a description, and a parameter schema:
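As an illustration, here is what such a description could look like for a hypothetical `get_weather` function. The function name, its parameters and the descriptions are invented for this example; only the overall shape (name, description, JSON Schema parameters) follows the text above.

```python
import json

# Tool definition for a hypothetical get_weather(city, unit) function
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. Paris",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["city"],
        },
    },
}

# The definition is plain JSON-serializable data
print(json.dumps(get_weather_tool, indent=2))
```

The description fields matter: the model relies on them to decide when to call the function and how to fill in the arguments.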
Handling the requires_action status
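When the Assistant decides to call your function, the Run moves to the requires_action status. You must read the requested tool calls, execute the matching functions on your side, and submit their results so the Run can resume.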
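Handling requires_action can be sketched as follows, assuming `client` is an `openai.OpenAI()` instance and `run` is a Run currently in that status. `functions` is a hypothetical dict mapping function names to your Python callables, and `submit_function_results` is a name chosen for this example.

```python
import json

def submit_function_results(client, thread_id, run, functions):
    """Execute the functions requested by a Run and send back their results."""
    tool_calls = run.required_action.submit_tool_outputs.tool_calls
    outputs = []
    for call in tool_calls:
        handler = functions[call.function.name]
        # Arguments arrive as a JSON string chosen by the model
        arguments = json.loads(call.function.arguments)
        result = handler(**arguments)
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    # Submitting the outputs lets the Run resume
    return client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread_id, run_id=run.id, tool_outputs=outputs
    )
```

After the submission, the Run goes back to being processed, so polling continues until it completes or asks for another action.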
Parallelization of tool_calls
If the Assistant requests multiple functions at the same time (for example "give me the weather in Paris AND Lyon"), you can execute them in parallel to reduce latency.
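One way to run the requested calls concurrently is a thread pool, which suits I/O-bound functions such as HTTP or database calls. This is a sketch, not the only option: `tool_calls` is assumed to be the list found on a Run in requires_action, `functions` is a hypothetical name-to-callable dict, and `run_tool_calls_in_parallel` is a name invented here.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls_in_parallel(tool_calls, functions, max_workers=8):
    """Execute all requested functions concurrently, preserving call order."""
    def execute(call):
        arguments = json.loads(call.function.arguments)
        result = functions[call.function.name](**arguments)
        return {"tool_call_id": call.id, "output": json.dumps(result)}

    if not tool_calls:
        return []
    workers = min(max_workers, len(tool_calls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(execute, tool_calls))
```

The returned list has the shape expected for tool outputs, so all results can be submitted in a single call.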
Mini-agent: combining native tools and functions
The pinnacle: an Assistant that combines file_search, code_interpreter AND your custom functions. At that point it becomes a true agent.
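A compact sketch of such a mini-agent, under stated assumptions: `client` is an `openai.OpenAI()` instance, `function_schemas` are function descriptions in JSON Schema format, `functions` is a hypothetical name-to-callable dict, and both helper names are invented for this example. The loop alternates between polling and answering function calls until the Run stops.

```python
import json
import time

def build_agent_tools(function_schemas):
    """Combine both native tools with your custom function definitions."""
    tools = [{"type": "file_search"}, {"type": "code_interpreter"}]
    tools += [{"type": "function", "function": s} for s in function_schemas]
    return tools

def run_agent_turn(client, thread_id, assistant_id, functions, poll_interval=1.0):
    """Run the Assistant once, answering function calls until it stops."""
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id
    )
    while True:
        if run.status == "requires_action":
            calls = run.required_action.submit_tool_outputs.tool_calls
            outputs = []
            for call in calls:
                args = json.loads(call.function.arguments)
                result = functions[call.function.name](**args)
                outputs.append(
                    {"tool_call_id": call.id, "output": json.dumps(result)}
                )
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id, run_id=run.id, tool_outputs=outputs
            )
        elif run.status in ("queued", "in_progress"):
            time.sleep(poll_interval)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread_id, run_id=run.id
            )
        else:
            return run  # completed, failed, cancelled, expired, ...
```

The native tools run on OpenAI's side, so only the custom functions pass through `requires_action` in this loop.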
This article covers the most useful excerpts — the complete Custom AI Assistants course (11 chapters, 44 lessons, corrected exercises and final project) takes you all the way.
FAQ
How long does it take to learn Custom AI Assistants?
Are there any prerequisites?
Where to start concretely?