The client
SITA is a global leader in air transport communications and IT solutions. It drives digital transformation for more than 400 customers, covering 18,000 commercial aircraft, and partners with over 90 air navigation service providers. Serving 90% of the world’s airlines, SITA offers solutions such as digital operations management, aircraft data handling, and communication services for aircraft.
The challenge
SITA embarked on a mission to enhance the precision and efficiency of incident classification across the aviation industry. In doing so, the organisation encountered a series of interconnected challenges rooted in operational complexity, regulatory demands, and advanced analytics.
Airlines continually process high volumes of incident reports that must be accurately mapped to multi-layered risk management structures, such as the Bow Tie Risk Model. SITA needed a system that could automatically categorise incident data across several interrelated dimensions:
- risk categories,
- specific events,
- potential threats,
- related control measures.
This required an approach that balanced scalability with domain understanding, ensuring that classification reflected the true nature of incidents within a safety-critical context.
A key priority for SITA was to ensure that no critical incidents were overlooked. This called for a robust classification model with high recall, capable of identifying even the most uncommon cases. Missing such incidents could compromise safety insights and operational decision-making.
Operating in a highly regulated industry, SITA needed to deliver interpretable confidence scores alongside predictions. This transparency was crucial for building trust with airline stakeholders and for supporting compliance with strict data governance and aviation safety regulations. The solution had to not only be technically sound but also auditable and explainable.
Another challenge involved the integration of aviation experts’ knowledge into the AI workflow. SITA recognised that domain expertise could significantly enhance model performance, especially in edge cases where contextual insights were essential. The project required mechanisms for incorporating expert feedback, enabling continuous learning and refinement of predictions.
The Bow Tie Risk Model implemented by SITA comprised over 500 unique combinations of events, threats, and control actions, representing a significant classification challenge. Navigating this taxonomy required advanced modelling techniques and a deep understanding of how each element contributes to overall risk assessment.
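To make the size of this label space concrete, the taxonomy can be pictured as a nested mapping from events to threats to control actions. The sketch below is illustrative only: the event, threat, and control names are invented placeholders, not entries from SITA's actual Bow Tie model, and the nesting is an assumed representation.

```python
from typing import Dict, List, Tuple

# Hypothetical miniature taxonomy: event -> threat -> control actions.
# All names are invented placeholders for illustration.
Taxonomy = Dict[str, Dict[str, List[str]]]

TAXONOMY: Taxonomy = {
    "runway_excursion": {
        "wet_runway": ["braking_action_report", "landing_distance_check"],
        "unstable_approach": ["go_around_policy"],
    },
    "loss_of_separation": {
        "miscommunication": ["readback_procedure", "standard_phraseology"],
    },
}


def enumerate_paths(taxonomy: Taxonomy) -> List[Tuple[str, str, str]]:
    """List every (event, threat, control) combination in the taxonomy."""
    return [
        (event, threat, control)
        for event, threats in taxonomy.items()
        for threat, controls in threats.items()
        for control in controls
    ]


paths = enumerate_paths(TAXONOMY)
print(len(paths))  # 5 combinations in this toy taxonomy
```

Even this toy structure yields five distinct label paths. A real model with over 500 combinations is far too large to treat as a single flat label list, which is one reason to classify stage by stage.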
The solution
BitPeak developed a multi-label text classification system that assigns several relevant labels to each incident report. The system is based on GPT-4o and runs entirely within SITA’s secure Azure environment. This setup ensures that all data is processed and stored within SITA’s own cloud infrastructure, fully compliant with GDPR and data residency requirements.
We used integrated tools in Databricks to track progress, manage model versions, and deploy updates efficiently. This allowed the client to monitor results in real time during development and created a solid foundation for future automation. With this setup, new model versions can be automatically tested and deployed as new data becomes available or classification logic improves. Together, this enables faster updates, better oversight, and ongoing performance improvements in the live system.
The solution was designed around the Bow Tie Risk Model and tackled each classification stage individually:
Risk category identification
The first step assigned one or more top-level risk categories to each incident. This acted as a filter, determining whether the incident warranted deeper analysis. With 88% accuracy and 96% recall, the system reliably captured nearly all critical cases for further review.
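Recall matters most at this stage because an incident that the filter misses never reaches the later stages. The source does not state how the metrics were averaged across labels, so the sketch below assumes micro-averaged recall over label sets. The function name and the example data are invented for illustration.

```python
from typing import List, Set


def micro_recall(true_labels: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Share of all true labels the model recovered: TP / (TP + FN)."""
    true_positives = sum(len(t & p) for t, p in zip(true_labels, predicted))
    total_true = sum(len(t) for t in true_labels)
    return true_positives / total_true if total_true else 0.0


# Invented example: three reports, each with one or more true risk categories.
truth = [{"ground_ops"}, {"flight_ops", "weather"}, {"maintenance"}]
preds = [{"ground_ops"}, {"flight_ops"}, {"maintenance", "weather"}]

recall = micro_recall(truth, preds)
print(recall)  # 0.75: three of the four true labels were found
```

The extra "weather" label on the third report does not lower recall. It would lower precision, which is the usual trade-off accepted when missed cases are costlier than false alarms.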
Event recognition
Once risk categories were assigned, the system classified incidents into structured event types within the Bow Tie framework. Because events followed a hierarchical structure, we used a layered classification approach to streamline the process, achieving 97% accuracy while reducing both complexity and cost.
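A layered approach of this kind can be sketched as two successive choices: first a parent group, then only the children of that group. The hierarchy, the names, and the keyword-based chooser below are assumptions made for illustration. In the described system the choice at each layer would come from the language model, not from word overlap.

```python
from typing import Callable, Dict, List

# Hypothetical two-level event hierarchy; all names are placeholders.
EVENT_TREE: Dict[str, List[str]] = {
    "ground": ["pushback_collision", "fuel_spill"],
    "airborne": ["turbulence_injury", "bird_strike"],
}

Chooser = Callable[[str, List[str]], str]


def keyword_chooser(report: str, options: List[str]) -> str:
    """Stand-in for a model call: pick the option sharing most words with the report."""
    words = set(report.lower().split())
    return max(options, key=lambda option: len(words & set(option.split("_"))))


def classify_layered(report: str, choose: Chooser) -> List[str]:
    """Pick a parent group first, then choose only among that group's children."""
    parent = choose(report, list(EVENT_TREE))
    child = choose(report, EVENT_TREE[parent])
    return [parent, child]


result = classify_layered("bird strike while airborne on climb", keyword_chooser)
print(result)  # ['airborne', 'bird_strike']
```

Each call sees a short option list instead of the full set of event types, which is a plausible source of the reduced complexity and cost mentioned above.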
Threat detection
At this stage, the system identified potential threats linked to each event using built-in domain knowledge. The model showed strong performance, with 82% accuracy and 81% recall, confirming its ability to correctly interpret context and spot key risk factors.
Control action assignment
Finally, each identified threat was matched to a relevant control measure. In more complex scenarios, adding expert insights significantly improved outcomes: accuracy rose by over 12% and recall by nearly 50% compared to models that relied only on historical data. This clearly showed the value of combining AI with expert knowledge to enhance safety and decision-making.
The solution incorporated several innovative features to maximise accuracy, scalability, and long-term adaptability:
- We used a dynamic knowledge base that enriches the AI’s understanding of each incident by pulling in context-specific expert insights. This significantly improved the system’s performance in more complex classification tasks. The knowledge base updates continuously as new information becomes available, ensuring the model stays current over time.
- A new method was implemented to help the AI better assess and communicate how confident it is in its predictions. Based on recent research (Tian et al., 2023), this technique led to more accurate and trustworthy confidence estimates than traditional approaches, helping users make better-informed decisions.
- The AI model was built with a flexible architecture that supports efficient scaling and deployment, using Databricks’ MLflow tools. This ensures the solution can be maintained and expanded easily as business needs evolve.
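The confidence method mentioned in the list above lets the model state its own confidence as part of its answer. A minimal sketch of that idea follows, covering only prompt construction and response parsing. The prompt wording, the response format, and the function names are assumptions, and no model call is made here.

```python
import re
from typing import List, Optional, Tuple

# Assumed prompt wording and answer format; not taken from the source.
PROMPT_TEMPLATE = (
    "Classify the incident report into one of: {labels}.\n"
    "Answer in the form 'Label: <label>; Probability: <number from 0.0 to 1.0>'.\n"
    "Report: {report}"
)

_PATTERN = re.compile(
    r"Label:\s*(?P<label>[^;]+);\s*Probability:\s*(?P<prob>[0-9]*\.?[0-9]+)"
)


def build_prompt(report: str, labels: List[str]) -> str:
    """Fill the template with the candidate labels and the report text."""
    return PROMPT_TEMPLATE.format(labels=", ".join(labels), report=report)


def parse_response(text: str) -> Optional[Tuple[str, float]]:
    """Extract (label, confidence) from a reply; return None if malformed."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    # Clamp to [0, 1] in case the model states an out-of-range value.
    probability = min(max(float(match.group("prob")), 0.0), 1.0)
    return match.group("label").strip(), probability


parsed = parse_response("Label: bird_strike; Probability: 0.85")
print(parsed)  # ('bird_strike', 0.85)
```

Returning None for malformed replies keeps the decision about retries or fallbacks with the caller, so a missing score is never silently treated as a confident one.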
To reach our goal, we used the following tech stack:
- Azure Databricks – scalable data processing and deployment environment
- Azure OpenAI – foundation model for advanced natural language understanding and classification
- MLflow – experiment tracking, model versioning, artifact management, and model deployment
- FAISS – similarity search index leveraging training data for dynamic RAG during classification
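The retrieval step behind dynamic RAG can be illustrated without the full stack. The sketch below is a deliberately simplified stand-in: it replaces FAISS and learned embeddings with brute-force cosine similarity over word counts, and the training examples are invented. It shows only the idea of fetching the most similar labelled reports to include as context during classification.

```python
import math
from collections import Counter
from typing import List, Tuple

# Invented labelled examples standing in for historical training data.
TRAINING: List[Tuple[str, str]] = [
    ("bird strike on engine during takeoff", "bird_strike"),
    ("fuel spill on apron during refuelling", "fuel_spill"),
    ("tug collided with aircraft during pushback", "pushback_collision"),
]


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system would use dense vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Return the k training examples most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(TRAINING, key=lambda ex: cosine(query_vec, embed(ex[0])), reverse=True)
    return ranked[:k]


neighbours = retrieve("engine ingested a bird after takeoff", k=1)
print(neighbours[0][1])  # bird_strike
```

The retrieved examples and their labels would then be added to the classification prompt. A vector index such as FAISS serves the same purpose at scale, where brute-force comparison against every stored report becomes too slow.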
The benefits
Safety & risk management
The project demonstrated the real-world value of using AI to analyse airline safety incidents. By combining AI with aviation-specific knowledge, we improved the model’s ability to detect patterns and classify risks effectively. This lays a strong foundation for future AI-supported risk analysis tools, helping teams better anticipate and mitigate operational safety risks.
Data & analytics
We compared traditional methods of assessing model certainty with a new approach that allows the AI to evaluate its own confidence directly. This technique proved more reliable, aligning confidence levels more closely with actual outcomes. This increases trust in model outputs, enabling analysts to make better-informed decisions based on high-confidence predictions and prioritise cases more effectively.
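One common way to quantify how closely confidence aligns with outcomes is expected calibration error (ECE). The source does not say which metric was used, so the sketch below is an assumption, and the example data is invented.

```python
from typing import List, Tuple


def expected_calibration_error(
    results: List[Tuple[float, bool]], n_bins: int = 5
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if not results:
        return 0.0
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for confidence, correct in results:
        # Confidence 1.0 falls into the last bin rather than out of range.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / len(results)) * abs(avg_confidence - accuracy)
    return ece


# Invented data: both models claim 90% confidence on every prediction.
well_calibrated = [(0.9, True)] * 9 + [(0.9, False)]      # right 9 times in 10
overconfident = [(0.9, True)] * 5 + [(0.9, False)] * 5    # right 5 times in 10

print(round(expected_calibration_error(well_calibrated), 3))  # 0.0
print(round(expected_calibration_error(overconfident), 3))    # 0.4
```

A lower value means that stated confidence can be read more literally, which is what lets analysts prioritise cases by confidence score.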
Legal, compliance & audit
The AI system includes built-in explanations for each classification and provides confidence scores for every report. These features support transparency and traceability, which are key for complying with EU regulations such as the AI Act.
Engineering & IT operations
The model was deployed with a framework that supports easy updates, performance monitoring, and continuous learning. That enables smoother collaboration across tech teams, reduces deployment overhead, and sets the groundwork for future MLOps pipelines.