AWS Public Sector Blog
Stop Soldier Suicide partners with Pariveda, AWS on mission to reduce suicide rates among US service members and veterans
In July 2024, the Pentagon conducted a study that revealed a sobering statistic: US soldiers are nine times more likely to die by suicide than in combat, making suicide the number one cause of death for US soldiers. Suicide rates among veterans continued to climb from 2001 to 2018 before dipping in 2019 and 2020, yet the veteran suicide rate remains 1.5 times that of the general population. In 2020, there were 6,146 veteran suicides, an average of 16.8 veterans dying by suicide every day.
Stop Soldier Suicide (SSS) is the only national nonprofit focused solely on solving the issue of suicide among US veterans and service members. SSS uses leading-edge technology and data insights to apply innovative strategies for finding the veterans and service members at highest risk and serving them with targeted suicide intervention services.
SSS has an ambitious goal to reduce military and veteran suicide rates by 40 percent by 2030, which would save more than 2,400 lives per year.
The Black Box Project
Because more than two-thirds of service members who die by suicide have no history of mental illness or suicidal ideation, SSS started the Black Box Project in partnership with Amazon Web Services (AWS) Professional Services. Launched as an early prototype to analyze data from the devices of those lost to suicide, the project aims to give better insight into the warning signs of suicide among veterans and to support suicide postvention, intervention, and, ultimately, prevention.
In the fall of 2023, Pariveda was brought on as an AWS technology partner to build on this foundation, codeveloping with SSS a vision for a Suicide Intelligence Platform (SIP). The SIP represents the teams’ commitment to reducing the incidence of suicide among service members and veterans by augmenting evidence-based clinical practices with data-driven predictive technologies.
Our approach with the SIP was to architect a data lake platform that gives SSS scalable, flexible, cost-effective, and secure data infrastructure, enabling the organization to:
- Set up data ingestion pipelines for client health data (device data from the Black Box Project and health data from Salesforce) to understand client risk profiles and the effectiveness of treatment programs
- Use data from sales and marketing platforms, such as Facebook Ads, to inform holistic marketing strategies and donor outreach
- Combine these datasets and develop foundational reporting capabilities for their data analysts and engineers to query, report, and create dashboards for end-user consumption
We implemented an automated data ingestion pipeline with a streamlined data cleansing process, which reduces the time data scientists spend cleansing data before analysis. The platform also lets the marketing and health teams explore new insights that augment existing analysis of suicidality, which can then be shared with researchers, clinicians, and external stakeholders. Building the platform on AWS allowed us to be up and running within the first three months of the project.
The data platform supports two primary use cases: the core Black Box Project data pipeline and marketing data processing. Our data lakehouse is organized into three distinct zones: landing (raw), cleansed (analytics-ready), and curated (purpose-ready). These zones are managed using AWS Glue Data Catalog and AWS Lake Formation, and all tables are stored in the Apache Iceberg open table format. To fit SSS’s budget and modest data volume, we developed a custom AWS Lambda container image that runs Apache Spark within the Lambda runtime. This approach provides full compatibility with Apache Iceberg while avoiding the overhead of deploying and operating a full cluster.
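A minimal sketch of the Spark-on-Lambda idea, assuming the container image already bundles PySpark, a Java runtime, and the Iceberg Spark runtime and Iceberg AWS bundle JARs. The catalog name `glue_catalog`, the app name, and both helper functions are hypothetical; the configuration keys are standard Spark and Apache Iceberg settings for a Glue-backed Iceberg catalog, with Spark running in single-process `local[*]` mode instead of on a cluster.

```python
# Hypothetical catalog name; the post does not say what the Spark catalog is called.
CATALOG = "glue_catalog"


def build_spark_conf(warehouse: str, catalog: str = CATALOG) -> dict:
    """Spark settings for a single-process session inside Lambda that reads
    and writes Iceberg tables registered in the AWS Glue Data Catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enable Iceberg SQL extensions (MERGE INTO, stored procedures, etc.)
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Register an Iceberg catalog backed by AWS Glue, with S3 file I/O
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
        # Lambda offers no cluster networking, and only /tmp is writable
        "spark.driver.bindAddress": "127.0.0.1",
        "spark.local.dir": "/tmp",
        "spark.ui.enabled": "false",
    }


def create_spark_session(warehouse: str):
    # Imported lazily so the config helper works without PySpark installed
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("sip-lambda").master("local[*]")
    for key, value in build_spark_conf(warehouse).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Keeping the configuration in a plain dictionary makes it testable without starting a JVM; only `create_spark_session` needs PySpark and the Iceberg JARs.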
Solution overview
To begin, Black Box Project families temporarily loan us their loved ones’ digital devices, such as smartphones and tablets, for our team of forensic experts to examine the data. The solution then follows these steps:
- The forensic analyst unlocks the device and uploads the resulting Excel file to an SSH File Transfer Protocol (SFTP) endpoint served through AWS Transfer Family, which stores it in our landing zone Amazon Simple Storage Service (Amazon S3) bucket. This initiates the transformation pipeline.
- When the file arrives in the S3 bucket, an Amazon S3 event notification triggers a Lambda function, which starts an AWS Step Functions workflow to process the uploaded data.
- The first Lambda function within the Step Functions workflow converts the Excel file into a series of PySpark DataFrames, which are then written as tables registered in the AWS Glue Data Catalog for further processing.
- To handle personally identifiable information (PII) redaction, we implemented an AWS Glue extract, transform, and load (ETL) job using the AWS Glue EntityDetector library, which redacts sensitive information before the cleaned data is saved to our cleansed zone.
- An Amazon Elastic Container Service (Amazon ECS) cluster lets us scale the feature engineering jobs, which are written in Pandas. These jobs create several features, such as social isolation, message sentiment, and sleep duration. Because Pandas does not provide a direct Apache Iceberg interface, the jobs use Amazon Athena to read and write the underlying Apache Iceberg tables.
- Results in the curated zone are then available for visualization and reporting through Amazon QuickSight and for model training through Amazon SageMaker.
- For the other use case, we use Fivetran to ingest data from external sources (clinical, marketing, and fundraising data). Fivetran integrates with AWS Glue Data Catalog and stores the ingested data in Apache Iceberg format.
- Additional data source integrations are implemented using a custom Lambda connector.
- Several Lambda functions transform and redact the raw data before it is saved into the cleansed zone. One of these functions also writes the fundraising analytics data directly to Salesforce through the Salesforce Bulk API 2.0.
- The Black Box device data is enriched with the other data sources, creating a comprehensive donor profile that provides a holistic view for downstream analysis and insights.
- All infrastructure is provisioned using the AWS Cloud Development Kit (AWS CDK) and deployed through AWS CodePipeline. We built a custom library that, given an AWS CDK stack, creates a deployment pipeline that synthesizes the stack and deploys it across our environments. The pipeline runs automatically on every update to the main branch, in keeping with trunk-based development, and includes a manual approval step before production deployment.
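The S3-triggered start of the Black Box workflow could look roughly like the following Lambda handler. This is a sketch, not SSS’s code: the `STATE_MACHINE_ARN` environment variable and the execution input shape are assumptions. It uses the boto3 Step Functions `start_execution` call and decodes object keys, which S3 event notifications deliver URL-encoded.

```python
import json
import os
import urllib.parse


def objects_from_s3_event(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification.
    Keys arrive URL-encoded (spaces become '+'), so decode them."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def handler(event, context, sfn_client=None):
    # Hypothetical environment variable holding the workflow's ARN
    state_machine_arn = os.environ["STATE_MACHINE_ARN"]
    if sfn_client is None:
        import boto3  # available by default in the Lambda Python runtime
        sfn_client = boto3.client("stepfunctions")
    executions = []
    for bucket, key in objects_from_s3_event(event):
        # One workflow execution per uploaded file
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps({"bucket": bucket, "key": key}),
        )
        executions.append(response["executionArn"])
    return {"executions": executions}
```

Injecting `sfn_client` is only a testing convenience; in Lambda the handler creates its own client.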
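The Step Functions workflow in the steps above chains a Lambda conversion, a Glue redaction job, and an ECS feature-engineering task. One way to express that chain in Amazon States Language is sketched below as a Python dictionary. State names, the `--source_key` job argument, and the use of Fargate with explicit subnets are assumptions; the `lambda:invoke`, `glue:startJobRun.sync`, and `ecs:runTask.sync` resources are standard Step Functions service integrations, and the `.sync` suffix makes each state wait for its job to finish.

```python
import json


def build_definition(convert_fn_arn: str, redact_job: str, cluster_arn: str,
                     task_def_arn: str, subnets: list) -> dict:
    """Illustrative Amazon States Language (ASL) for the Black Box pipeline."""
    return {
        "Comment": "Black Box Project pipeline (illustrative sketch)",
        "StartAt": "ConvertExcel",
        "States": {
            "ConvertExcel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                # Pass the whole execution input ({"bucket", "key"}) to the function
                "Parameters": {"FunctionName": convert_fn_arn, "Payload.$": "$"},
                "ResultPath": "$.convert",
                "Next": "RedactPII",
            },
            "RedactPII": {
                "Type": "Task",
                # .sync waits for the Glue job run to finish before moving on
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": redact_job,
                    # Hypothetical job argument naming the upload to process
                    "Arguments": {"--source_key.$": "$.key"},
                },
                "ResultPath": "$.redact",
                "Next": "EngineerFeatures",
            },
            "EngineerFeatures": {
                "Type": "Task",
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_def_arn,
                    # Assumption: Fargate tasks, which require awsvpc networking
                    "LaunchType": "FARGATE",
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {"Subnets": subnets}
                    },
                },
                "End": True,
            },
        },
    }


def definition_json(**kwargs) -> str:
    # Step Functions expects the definition as a JSON string
    return json.dumps(build_definition(**kwargs), indent=2)
```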
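For the Excel conversion step, here is a sketch that assumes one worksheet per extracted data category (calls, messages, and so on) and a Glue-backed Iceberg catalog named `glue_catalog`. The naming helper and the `device_` prefix are hypothetical. `pandas.read_excel(..., sheet_name=None)` returns every sheet as a dictionary of DataFrames (reading `.xlsx` requires `openpyxl`), and Spark’s `writeTo(...).using("iceberg").createOrReplace()` writes each one as an Iceberg table registered in the catalog.

```python
import re


def glue_table_name(sheet_name: str, prefix: str = "device_") -> str:
    """Turn a worksheet or column name into a Glue-friendly identifier:
    lowercase, with runs of other characters collapsed to '_'."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", sheet_name.strip().lower()).strip("_")
    return prefix + (cleaned or "sheet")


def excel_to_iceberg(spark, path: str, database: str, catalog: str = "glue_catalog"):
    import pandas as pd  # imported lazily; only needed when actually converting

    # sheet_name=None loads every worksheet; dtype=str keeps raw values untyped
    sheets = pd.read_excel(path, sheet_name=None, dtype=str)
    written = []
    for sheet_name, pdf in sheets.items():
        if pdf.empty:
            continue  # Spark cannot infer a schema from an empty sheet
        pdf.columns = [glue_table_name(str(c), prefix="") for c in pdf.columns]
        table = f"{catalog}.{database}.{glue_table_name(sheet_name)}"
        df = spark.createDataFrame(pdf.fillna(""))
        df.writeTo(table).using("iceberg").createOrReplace()
        written.append(table)
    return written
```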
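The pipeline itself uses the AWS Glue EntityDetector for PII redaction. To illustrate the detect-and-replace idea without guessing at that library’s API, the following swaps in a plain regular-expression redactor. The patterns are illustrative, far less thorough than a managed detector, and not what SSS runs.

```python
import re

# Illustrative stand-in only: NOT the AWS Glue EntityDetector API.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # US-style phone numbers such as (555) 123-4567 or +1 555.123.4567
    "PHONE": re.compile(
        r"(?<!\d)(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}(?!\d)"),
    "SSN": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
}


def redact(text: str) -> str:
    """Replace every match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

In a Spark job, a function like this could be applied to text columns through a UDF, though a managed detector covers many more entity types (names, addresses, and so on).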
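The post names social isolation, message sentiment, and sleep duration as engineered features without defining them. As a hypothetical illustration only, the following Pandas sketch derives two crude daily proxies from a message log: distinct contacts per day (isolation) and the longest gap without activity ending on a given day (a rough stand-in for sleep). The `timestamp` and `contact` column names and both definitions are assumptions, not SSS’s clinical features.

```python
import pandas as pd


def daily_features(messages: pd.DataFrame) -> pd.DataFrame:
    """Per-day features from a log with 'timestamp' (datetime64) and 'contact'."""
    df = messages.sort_values("timestamp").copy()
    df["date"] = df["timestamp"].dt.date
    # Gap since the previous event of any day, credited to the day it ends,
    # so an overnight silence counts toward the following morning
    df["gap"] = df["timestamp"].diff()
    features = df.groupby("date").agg(
        events=("contact", "count"),
        distinct_contacts=("contact", "nunique"),
        longest_gap=("gap", "max"),
    )
    features["longest_gap_hours"] = features["longest_gap"].dt.total_seconds() / 3600
    return features.drop(columns="longest_gap")
```

For the Athena read and write step, one option is the AWS SDK for pandas (`awswrangler`), whose `athena.read_sql_query` and `athena.to_iceberg` functions move DataFrames in and out of Iceberg tables through Athena; the post does not say which client SSS used.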
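The Salesforce write-back in the steps above uses the Bulk API 2.0 ingest flow: create a job, upload CSV data, then mark the upload complete so Salesforce queues the job. A minimal standard-library sketch follows; the API version, the target object, and the external ID field `Donor_Key__c` are assumptions, and the HTTP transport is injected so the flow can be tested offline.

```python
import csv
import io
import json
import urllib.request

API_VERSION = "v59.0"  # assumption; use a version the org supports


def to_csv(rows: list, fields: list) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def bulk_upsert(send, sobject: str, external_id_field: str, rows: list, fields: list) -> str:
    base = f"/services/data/{API_VERSION}/jobs/ingest"
    # 1. Create the ingest job
    job = send("POST", base, json.dumps({
        "object": sobject,
        "operation": "upsert",
        "externalIdFieldName": external_id_field,
        "contentType": "CSV",
        "lineEnding": "LF",
    }), "application/json")
    job_id = job["id"]
    # 2. Upload the CSV payload
    send("PUT", f"{base}/{job_id}/batches", to_csv(rows, fields), "text/csv")
    # 3. Close the upload so Salesforce starts processing
    send("PATCH", f"{base}/{job_id}", json.dumps({"state": "UploadComplete"}),
         "application/json")
    return job_id


def make_sender(instance_url: str, access_token: str):
    """Real transport: plain HTTPS calls with an OAuth bearer token."""
    def send(method, path, body, content_type):
        req = urllib.request.Request(
            instance_url + path, data=body.encode("utf-8"), method=method,
            headers={"Authorization": f"Bearer {access_token}",
                     "Content-Type": content_type, "Accept": "application/json"})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        return json.loads(payload) if payload else {}  # the upload returns no body
    return send
```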
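The custom deployment library is not published in the post. The following is a hypothetical sketch of the same idea with CDK Pipelines in Python: given a function that adds the application stack to a stage, it builds a `CodePipeline` that synthesizes on every push to `main` and inserts a `ManualApprovalStep` before production. The environment names, repository string, and CodeStar connection ARN are assumptions.

```python
# Hypothetical environment list; the post says only that production
# deployments require manual approval.
ENVIRONMENTS = ["dev", "test", "prod"]


def plan_stages(environments, approval_envs=("prod",)):
    """Ordered (environment, needs_manual_approval) pairs."""
    return [(env, env in approval_envs) for env in environments]


def build_pipeline(scope, stack_factory, repo: str, connection_arn: str):
    """stack_factory(stage, env_name) adds the application stack to a Stage."""
    from aws_cdk import Stage, pipelines  # imported lazily; needs aws-cdk-lib

    pipeline = pipelines.CodePipeline(
        scope, "Pipeline",
        synth=pipelines.ShellStep(
            "Synth",
            # Trunk-based: every push to main triggers the pipeline
            input=pipelines.CodePipelineSource.connection(
                repo, "main", connection_arn=connection_arn),
            commands=["npm install -g aws-cdk",
                      "pip install -r requirements.txt",
                      "cdk synth"],
        ),
    )
    for env_name, needs_approval in plan_stages(ENVIRONMENTS):
        stage = Stage(scope, f"{env_name.capitalize()}Stage")
        stack_factory(stage, env_name)
        pre = [pipelines.ManualApprovalStep(f"Approve-{env_name}")] if needs_approval else []
        pipeline.add_stage(stage, pre=pre)
    return pipeline
```

Separating the approval plan into `plan_stages` keeps the one policy decision, which environments need a human gate, testable without the CDK installed.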
The following diagram shows these steps and the high-level architecture.
Conclusion
Pariveda and AWS continue to partner with Stop Soldier Suicide to evolve this data platform foundation and further enable its lifesaving work in a variety of ways: facilitating ongoing open-ended research into the causes and risk factors associated with suicide, supporting fundraising management and analysis, and enabling new secure data access patterns for additional use cases.
For more information, contact your AWS account team or the AWS Public Sector team.