From isolated alerts to contextual intelligence: Agentic maritime anomaly analysis with generative AI

This post is co-written with Arad Ben Haim and Hannah Danan Moise from Windward.

Windward is a leading Maritime AI™ company, delivering mission-grade, multi-source intelligence for maritime-based operations. By fusing Automatic Identification System (AIS) data, remote sensing signals, proprietary AI models, and generative AI, Windward provides a 360° view of global maritime activity so defense and intelligence agencies, law enforcement, and commercial leaders can anticipate threats, protect critical assets, and stay in control at sea.

This blog post demonstrates how Windward helps enhance and accelerate alert investigation processes by combining geospatial intelligence with generative AI, enabling analysts to focus on decision-making rather than data collection. Prior to using Windward, maritime analysts spent hours manually gathering and correlating complex data to understand vessel behavior anomalies: unusual activity spikes, unexpected movements, deviations from known patterns. It required significant time and deep domain expertise. Windward’s Maritime AI™ automates this process, surfacing context and implications so analysts and companies can make informed decisions about maritime risks and opportunities with speed and precision.

Challenge

Maritime analysts rely on Windward’s system to stay ahead of complex global threats. As part of Windward’s ongoing commitment to facilitate a “mission-ready” user experience, the company continuously evolves how users move from detection to decision-making. While Windward Early Detection successfully identifies suspicious patterns, Windward further accelerated situational awareness by making the investigative process more fluid and automated.

To optimize the analytical workflow, Windward sought to enhance the correlation of external context through three key strategic improvements:

Unified Workflow: Minimizing the need to consult external data sources, facilitating a continuous and focused analytical environment.

Expertise Optimization: Automating the collection of weather, news, and alert data to allow domain experts to dedicate more time to strategic interpretation.

Comprehensive Coverage: Streamlining the synthesis of information to enable more rapid and in-depth investigation of multiple alerts simultaneously.

As a core component of MAI Expert™, the first generative AI maritime agent, Windward partnered with the AWS Generative AI Innovation Center to deliver a solution that automatically contextualizes maritime anomalies. This collaboration helped enhance the user experience by correlating alerts with relevant public and proprietary data, integrating these findings seamlessly with Windward’s internal models, and uses generative AI to help deliver comprehensive, actionable risk assessments.

Solution overview

In collaboration with AWS, Windward developed a multi-step AI-powered solution that automatically fetches relevant data from a variety of internal and external data sources and uses this information to generate a textual description that contextualizes maritime anomaly events.Figure 1 depicts the end-to-end architecture of the solution deployed to AWS.

Architecture diagram for windward aws blog

Figure 1. Solution architecture

Given an anomaly identified in the Windward Early Detection system, the solution extracts relevant metadata from the anomaly event using Windward’s internal database. The metadata includes the anomaly timestamp, region coordinates, anomaly type, vessel class, and other relevant information.

Next, the anomaly metadata is passed to the agentic analysis system powered by large language models (LLMs) on Amazon Bedrock. The multi-step anomaly analysis pipeline is orchestrated using AWS Step Functions. In the first step, the system queries multiple, diverse external data sources to provide relevant background on the anomaly, which is a key part of creating new value for our customers. These sources include:

Real-time news feed: Alerts and event signals discovered from public data are fetched and filtered based on the maritime anomaly’s time and location.
Intelligent web search: The system uses large language models to generate precise search queries, retrieving real-time web search results that provide up-to-date context for the anomaly.
Weather data: An external API is used to retrieve relevant weather data, such as temperature, wind speed, and precipitation, for the anomaly’s location and time.

Each data source is queried using a separate AWS Lambda function. After retrieving the data from the three sources, the pipeline moves to the second step. In the second step, a separate LLM—powered by Anthropic’s Claude through Amazon Bedrock—examines the data items and decides whether there is a need to fetch additional web search results. The LLM is instructed to make the decision after cross-checking the anomaly data against the retrieved data items and judging whether the data retrieved so far is sufficient to explain the anomaly or if some aspects related to the event are missing. The LLM either generates a new search query or a command to move to the next step of the pipeline. The Lambda function parses the LLM output and optionally triggers the web search function again to retrieve additional news that might provide important context about the anomaly, appending it to the previous search results. If there are no new search queries, the Step Function proceeds to the next Lambda function in the pipeline.

Diagram of flow through self-reflection

Figure 2. Self-reflection logic

After running self-reflection and additional data retrieval, the system performs two filtering and ranking steps to remove news items that are not related to the considered anomaly. First, it uses a re-ranking AI model, Amazon Rerank, which sorts the data items according to their relevance to the anomaly. This step is geared toward maintaining high recall, focusing on removing the most irrelevant data items to reduce the set of candidate items to process on the next stage. Second, each of the top-ranked items is further scored by the LLM across multiple dimensions, including time, location, matching vessel type, and others. The system assigns relevance scores between 0 and 100 and only keeps data items with a relevance score above a threshold determined by the solution developers. This step is more precise and is geared toward high precision, making sure only the most relevant news items are kept. The top-ranked data and news items are passed to the next step of the solution pipeline.Finally, the pipeline uses another LLM that uses the top-ranked data items to generate a contextualized report on the anomaly, summarizing its potential causes, risks, and implications. The concise report is written for Windward’s customers and directly cites the data sources used, which allows users to verify the information and learn additional details by following the links. Figure 3 provides an example of what the generated report looks like for one of the vessel activity anomalies.

Maritime intelligence product

Figure 3. Example Anomaly Report

Evaluation

The end-to-end system is evaluated on a set of existing maritime anomalies that occurred in the past. The evaluation consists of several stages. First, the summaries are automatically evaluated using an LLM-as-a-judge approach, a method that included human-alignment work for the LLM judges. The judge uses a set of six predefined criteria, including credibility, data quality, source diversity, coherence, and ethical bias. The judge evaluates each dimension on a scale between 1 and 100 and assigns the scores to each report. Figure 4 depicts example scores assigned to one of the generated reports by the LLM judge.Second, we calculate several deterministic metrics on the report quality. This includes the length of the report in characters, as well as the number of data sources explicitly cited in the text. These metrics help to judge the size and the credibility of the generated explanation.Finally, the selected summaries are also evaluated by human experts, who cross-check the generated summaries and retrieved data sources against their own search results and domain understanding.

Explaination of LLM Judge outputs

Figure 4. Example LLM-as-a-judge scores

Conclusion

The initial agentic solution presented in this blog marked an important milestone in the development of Windward’s MAI Expert™. Building on Windward’s already powerful system, this enhancement helped accelerate maritime alert investigation and enabled analysts to focus even more on decision-making rather than data collection.This approach combined geospatial intelligence with generative AI to streamline what was previously a manual, time-intensive process. High-quality anomaly summaries generated by the system helped analysts better understand the context of maritime events—unusual activity spikes, unexpected movements, deviations from known patterns—and make informed decisions about corresponding risks and opportunities.These capabilities expanded Windward’s value proposition across user segments. For existing users with deep maritime expertise, they further helped streamline workflows and reduce the time needed to derive relevant context. For users with limited maritime expertise, they opened new possibilities by surfacing critical insights without requiring manual correlation of complex datasets.

Artificial Intelligence

From isolated alerts to contextual intelligence: Agentic maritime anomaly analysis with generative AI

Challenge

Solution overview

Evaluation

Conclusion

About the authors

Resources

Blog Topics

Follow

Learn

Resources

Developers

Help