PwC and AWS Help European Parliament Deliver Fast Digital Access to Over 450,000 Parliamentary Documents
PricewaterhouseCoopers (PwC), a global AWS Partner, worked closely with the European Parliament to build the ArchiBot application, which gives citizens digital access to more than 450,000 archived documents and reduces document search time by 80 percent. PwC, the Parliament, and AWS collaborated on an application that stores its data on Amazon S3, which enabled the Parliament to quickly build a dashboard in Amazon QuickSight, thanks to the service’s native Amazon S3 integration and ease of use.
Seeking to Make Government Documents Accessible to All Public Citizens
The European Parliament, one of two European Union legislative bodies, works with the Council of the European Union to adopt legislation that impacts European citizens. As part of its service to the public, the Parliament allows researchers and citizens to view historical documents such as meeting transcripts and parliamentary plenary sessions, as well as letters. These scanned documents form part of a library of five million records.
However, accessing records was proving difficult for researchers and citizens. Upon request, the Parliament would send PDF versions of documents to users. Many of these were low quality and difficult to read because the original source documents were typewritten. The electronic archive management system was also cumbersome as it was designed for research specialists. “We wanted to make these documents easily accessible to all public citizens, not just researchers,” says Ludovic Delepine, head of archives at the European Parliament. “To do that, we knew we had to build a new management system that would enable people to quickly search documents and generate data in a readable digital format. We also wanted to provide the data in a dashboard for easy visualization.”
"We’d tried for over a year to use another business intelligence tool to build a document search solution, but we were able to build this application in under a week using Amazon QuickSight."
- Marco Amabilino, Head of Digitalization Department, European Parliament
Creating a Document Search Dashboard on AWS
The European Parliament evaluated different optical character recognition (OCR) software options but decided that a cloud-based solution would best meet its needs for performance and scalability. “We have an expanding document archive and increasingly complex research topics, so we needed to support fast document analysis. Ultimately, we decided the cloud was the best way to achieve these goals, and Amazon Web Services (AWS) could scale to our needs while providing elasticity and a pay-as-you-go model without the need to provision and manage infrastructure,” says Delepine.
The Parliament decided to work with PwC, an AWS Partner, to conduct a series of studies of AWS-based solutions. Because the Parliament is an AWS Enterprise Support customer, both the Parliament and PwC received support and architectural guidance from an AWS Solutions Architect and Technical Account Manager. “We collaborated closely to find the right solution for the Parliament’s challenges,” says Thierry Kremser, data and analytics leader at PwC Luxembourg. “We evaluated various technologies in terms of effectiveness in addressing these challenges.”
As a result of these efforts, PwC supported the Parliament by evaluating different AWS services on archive documents, including Amazon Textract (to extract text from documents), Amazon Comprehend (to identify structural data and uncover critical information), and Amazon OpenSearch Service (to create smart search indexes from the data). The application, called ArchiBot, uses AI algorithms and AWS Lambda functions for text extraction and document summarization, pulling important information out of low-quality documents. This information is then used to generate metadata for each document. The application relies on Amazon Simple Storage Service (Amazon S3) for storing document data.
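The extraction-to-metadata flow described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the ArchiBot implementation: the function name `build_metadata`, the metadata fields, and the `min_score` threshold are hypothetical. The clients are passed in as arguments; inside a Lambda function they could be created with `boto3.client("textract")` and `boto3.client("comprehend")`. The two service calls used are Textract's `detect_document_text` and Comprehend's `detect_key_phrases`.

```python
from typing import Any, Dict


def build_metadata(textract: Any, comprehend: Any, bucket: str, key: str,
                   language: str = "en", min_score: float = 0.9) -> Dict[str, Any]:
    """Extract text from one scanned page in S3 and derive simple metadata.

    `textract` and `comprehend` are service clients (assumed to be boto3
    clients in a real deployment); all field names below are illustrative.
    """
    # Synchronous OCR call on an object stored in Amazon S3.
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # The response also contains PAGE and WORD blocks; keep only full lines.
    lines = [b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE"]
    text = "\n".join(lines)

    keywords = []
    if text:  # Comprehend rejects empty input, so skip blank pages.
        phrases = comprehend.detect_key_phrases(Text=text, LanguageCode=language)
        # Keep confident phrases only, de-duplicated and sorted.
        keywords = sorted({p["Text"].lower() for p in phrases["KeyPhrases"]
                           if p["Score"] >= min_score})

    return {"s3_key": key, "line_count": len(lines),
            "text": text, "keywords": keywords}
```

In practice, multi-page PDFs would typically go through Textract's asynchronous `start_document_text_detection` operation instead, and long texts would need to be split to respect Comprehend's input size limits.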
For data visualization, the Parliament decided to use Amazon QuickSight, a serverless business intelligence solution that offers native machine learning integration and connects directly to Amazon S3. In collaboration with the AWS Enterprise Support account team, the Parliament created a production-ready QuickSight dashboard that allows users to visualize search results and quickly find archived documents, with information taken from metadata filters. “I was surprised by how easy it was to create and implement the dashboard,” says Marco Amabilino, head of the digitalization department at the European Parliament.
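One common way to connect QuickSight to data in Amazon S3 is a manifest file that tells the service where the files are and what format they use. The source does not say how the Parliament configured its dataset, so the sketch below is only an assumption of how that step could look; the bucket name, the prefix, and the choice of JSON as the file format are hypothetical.

```python
import json


def quicksight_manifest(bucket: str, prefix: str) -> str:
    """Build a QuickSight S3 manifest that points at every file under a prefix."""
    manifest = {
        # Load all objects whose keys start with the given prefix.
        "fileLocations": [{"URIPrefixes": [f"s3://{bucket}/{prefix}"]}],
        # Assumes the metadata was written as JSON files.
        "globalUploadSettings": {"format": "JSON"},
    }
    return json.dumps(manifest, indent=2)


# Hypothetical bucket and prefix, for illustration only.
print(quicksight_manifest("example-archive-metadata", "metadata/"))
```

The resulting JSON document can be uploaded when creating an S3 data source in QuickSight.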
Giving Citizens Digital Access to 450,000 Archived Documents
With these AWS technologies, PwC and the Parliament launched the Archives Unit Dashboard, a web-based solution that gathers more than 450,000 digitized documents from the years 1952 to 1979. Documents from the remaining years, including up to the present, will be available in the near future. The solution automatically reads archived, scanned PDFs and condenses the text while preserving the key information and meaning. Using an Amazon QuickSight–powered dashboard to search documents and visualize data, citizens and researchers can quickly and easily access documents to discover what members of Parliament discussed and debated over the years. Eventually, all five million documents in the archive will be accessible via the solution.
PwC and the Parliament also used advanced AI algorithms to add an interactive “top words” word cloud that offers insight into parliamentary questions asked over the years. Clicking a word in the word cloud reveals all documents containing that word, and an AI feature helps users identify related parliamentary questions.
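The two behaviors described above, ranking the most frequent words and finding every document that contains a clicked word, can be illustrated with a plain frequency count and an inverted index. The source does not describe the algorithms actually used, so this is a simplified stand-in rather than the Parliament's method; the stop-word list, function names, and sample documents are made up for the example.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# A tiny illustrative stop-word list; a real system would use a fuller,
# language-specific one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "on", "for", "is"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic words that are not stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def top_words(docs: Dict[str, str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent words across all documents."""
    counts: Counter = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts.most_common(n)


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


# Made-up sample documents, for illustration only.
docs = {
    "q1": "Question on coal and steel tariffs",
    "q2": "Steel production in the Community",
    "q3": "Agricultural policy",
}
print(top_words(docs, n=1))                 # prints [('steel', 2)]
print(sorted(build_index(docs)["steel"]))   # prints ['q1', 'q2']
```

Selecting a word in the cloud then corresponds to a single lookup in the index.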
Reducing Search Time by 80%
With the Archives Unit Dashboard, citizens and researchers have a simple tool for quickly finding archived documents. “Users surface documents almost immediately, instead of waiting for them to be sent out,” says Delepine. As a result, users spend 80 percent less time obtaining the documents they seek. “Users are able to use the dashboard filters to select certain time periods or subjects and access fast summaries instantly.”
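Filtering by time period or subject amounts to selecting documents whose metadata matches the chosen criteria. The following sketch shows that idea on in-memory records; the `DocumentMeta` fields, the `filter_documents` function, and the sample records are hypothetical and do not reflect the dashboard's actual data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class DocumentMeta:
    doc_id: str
    year: int
    subjects: FrozenSet[str]


def filter_documents(docs: Iterable[DocumentMeta],
                     start_year: Optional[int] = None,
                     end_year: Optional[int] = None,
                     subject: Optional[str] = None) -> List[DocumentMeta]:
    """Keep documents within the year range (inclusive) that match the subject."""
    selected = []
    for doc in docs:
        if start_year is not None and doc.year < start_year:
            continue
        if end_year is not None and doc.year > end_year:
            continue
        if subject is not None and subject not in doc.subjects:
            continue
        selected.append(doc)
    return selected


# Made-up sample records, for illustration only.
archive = [
    DocumentMeta("doc-001", 1958, frozenset({"agriculture"})),
    DocumentMeta("doc-002", 1967, frozenset({"trade", "agriculture"})),
    DocumentMeta("doc-003", 1975, frozenset({"energy"})),
]
hits = filter_documents(archive, start_year=1960, end_year=1979,
                        subject="agriculture")
print([d.doc_id for d in hits])  # prints ['doc-002']
```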
The Parliament recently reported results from a survey of archive users showing that the customer satisfaction score increased from 3 to 4.75. “The increase in satisfaction comes from the ease of use and speed of the dashboard, which help people quickly find what they’re looking for,” says Delepine.
Using serverless AWS Lambda functions, Amazon S3, and Amazon QuickSight, the Parliament optimized the total cost of the solution by taking advantage of the AWS pay-as-you-go model and avoiding the administrative overhead that comes with managing infrastructure.
PwC is now working with the Parliament to enhance the dashboard and add new search functionality. The Parliament also anticipates other government organizations will use the tool. “Other parliaments are very interested in this tool, and they’re impressed with the results we’ve achieved on AWS,” Delepine says. “Often, organizations conduct studies to examine new technologies, but we’ve actually implemented something that solved a real business challenge.”
About the European Parliament
The European Parliament is one of two legislative bodies of the European Union and one of its seven overall institutions. Together with the Council of the European Union, it adopts European legislation, following a proposal of the European Commission. The Parliament is composed of 705 members.
About PwC
PwC is an AWS Partner providing IT solutions, services, and consulting via a network of firms in 155 countries. The organization employs over 284,000 professionals who deliver solutions and services to a range of customers.
Published September 2022