One of the biggest challenges was determining which consultation stage a learner had reached in a live role-play, using only the conversation itself rather than a fixed script or manual tags. Because conversations were open-ended, it was difficult to keep feedback and scores fair across learners and cases, especially when learners jumped between steps or skipped key questions.
Choosing the right AI models was another issue. Using a large, expensive model for every step increased our running costs by about 30-50%, while using only small models risked lower-quality feedback, especially in cases such as handling complaints, budget limits, or prescription choices.
Adding two-way voice for both the "learner" and the "patient" was also difficult. When the response delay exceeded 1-2 seconds during a live session, the conversation felt slow and unnatural, and learners were more likely to interrupt or repeat themselves.
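The model-choice trade-off described above can be framed as a routing decision: send only the harder stages to the larger model and everything else to a smaller one. The following is a minimal sketch of that framing, not a description of what was actually built. The stage names, the tier labels, and the `pick_model_tier` function are all hypothetical.

```python
# Hypothetical stage names, chosen to mirror the difficult cases named in the
# text (complaints, budget limits, prescription choices). Not from the project.
HIGH_STAKES_STAGES = {
    "complaint_handling",
    "budget_limits",
    "prescription_choice",
}


def pick_model_tier(stage: str) -> str:
    """Return an assumed tier label: "large" for hard stages, else "small"."""
    return "large" if stage in HIGH_STAKES_STAGES else "small"


tier_for_complaint = pick_model_tier("complaint_handling")
tier_for_greeting = pick_model_tier("greeting")
```

Under this assumption, cost scales with how often a session reaches a high-stakes stage rather than with the total number of steps.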
On the system side, unstable internet connections sometimes caused the same message to be sent more than once. This created duplicate messages in the conversation history in about 2-5% of sessions, which could change the detected consultation stage and trigger repeated feedback.
We also needed each scenario to feel realistic, with details such as prescription, budget, and daily needs, without making the dialogue feel scripted, repetitive, or identical across learners.
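To illustrate the duplicate-message problem described above, here is a minimal sketch of one common safeguard: the conversation history ignores any message whose ID it has already stored. It assumes the client attaches a stable ID to each message and reuses it on retries. The `Message` and `ConversationHistory` types and the `message_id` field are hypothetical, not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str  # assumed client-generated ID, reused when a send is retried
    speaker: str     # "learner" or "patient"
    text: str


@dataclass
class ConversationHistory:
    messages: list = field(default_factory=list)
    _seen_ids: set = field(default_factory=set)

    def append(self, message: Message) -> bool:
        """Store the message unless its ID was already recorded.

        Returns True if the message was stored, False if it was a duplicate.
        """
        if message.message_id in self._seen_ids:
            return False
        self._seen_ids.add(message.message_id)
        self.messages.append(message)
        return True


history = ConversationHistory()
first = Message("m-1", "learner", "What is your budget?")
retry = Message("m-1", "learner", "What is your budget?")  # resent after a network drop

stored_first = history.append(first)
stored_retry = history.append(retry)
```

With a check like this, a retried send leaves the history unchanged, so stage detection and feedback would see each learner turn only once.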










