AWS Public Sector Blog
A faster, more resilient digital repository: Migrating DSpace to AWS

Automated bot traffic has surged across academic digital repositories, creating real performance problems for institutions that make research openly accessible. At Johns Hopkins University (JHU), the problem was compounding an already difficult situation. The Sheridan Libraries’ installation of DSpace—an open-source digital repository system used by thousands of institutions worldwide—was running on on-premises infrastructure that the team could no longer update without significant manual work. The system was many versions behind the latest release, and the single-server setup demanded substantial dedicated resources to handle frequent traffic spikes.
These challenges made modernizing DSpace a necessity to support the university community. The Digital Research and Curation Center (DRCC), the group within the Sheridan Libraries that builds and manages digital infrastructure for open scholarship, migrated DSpace to the cloud with Amazon Web Services (AWS). Using Amazon Elastic Container Service (Amazon ECS) with AWS Fargate, the team achieved a faster, more scalable repository without the operational burden of maintaining on-premises infrastructure.
Frozen infrastructure and surging traffic
DSpace, known at Johns Hopkins as JScholarship, is the central repository for the university’s research and cultural collections, housing over 150 collections that include research papers, theses, dissertations, historical documents, newsletters, articles, images, audio, video, sheet music, and maps.
“JScholarship users range from students depositing theses, to government employees researching state land maps, to musicians searching for historic sheet music compositions,” said Allison Fischbach, digital repositories manager for the Sheridan Libraries. JScholarship also supports the university’s open-access policy, providing faculty with a place to make their research publicly discoverable. “DSpace is used to maintain a permanent record of university scholarship,” said Bill Branan, Hodson Director of the DRCC and Open Source Programs Office.
But running DSpace on-premises had become unsustainable. Recent licensing changes resulted in increased hosting costs, and much of the maintenance and administration for supporting DSpace relied on manual processes. Pushing code updates into production took months.
“Updates hadn’t really been done to DSpace for a very long time because there was a lack of confidence in the process,” explained Steven Miklovic, senior cloud engineer in the DRCC.
Meanwhile, automated bot traffic—driven largely by AI companies scraping open-access research—had surged, and the infrastructure needed frequent manual intervention to keep up.
Before the migration, the DRCC team evaluated whether to replace DSpace entirely. They determined that the software was still the right fit, but that software changes needed to move into production faster and that the deployment environment needed to scale to meet demand without manual intervention. These requirements pointed to a container-based deployment with an automated build pipeline.
“Given the on-premises architecture, deploying changes in a timely manner would have been very difficult,” said Russell Poetker, senior software engineer in the DRCC.
Building on an existing modernization effort
The DRCC team brought relevant experience to the project. Engineers on the team had already modernized the Public Access Submission System (PASS), a custom application that allows researchers to deposit articles into DSpace, using a similar containerized architecture. The team also drew on experience with DSpaceDirect, a hosted service run through Lyrasis, the organizational home of the DSpace open-source project. That prior work showed that hosting DSpace in the cloud could deliver consistency, repeatability, and resiliency.
Throughout the project, the DRCC team worked with AWS through a consultation-based approach, meeting at key milestones for architectural reviews. Those sessions validated the architecture and surfaced important security features and optimizations.
Six months from architecture to production
The technical implementation spanned about six months. The first three to four months focused on defining the initial architecture, including a significant data migration sub-project. The on-premises environment stored DSpace files differently from Amazon Simple Storage Service (Amazon S3), so the team went through several iterations of migrating data, validating it, and refining their scripts.
“Steps like that are how you build the confidence in the cloud for people,” Miklovic noted.
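The post doesn’t include the team’s migration scripts; as a minimal sketch of the kind of validation pass described above, the following uses the AWS SDK for JavaScript v3 to confirm that each file recorded in a local manifest landed in Amazon S3 with the expected size. The bucket name, manifest format, and key layout are illustrative assumptions, not the DRCC’s actual tooling.

```typescript
// Hypothetical validation pass: confirm each migrated file exists in S3
// with the expected size. Bucket name, manifest format, and key layout
// are illustrative assumptions, not the DRCC team's actual scripts.
import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";
import { readFileSync } from "node:fs";

const s3 = new S3Client({ region: "us-east-1" });
const BUCKET = "jscholarship-assetstore"; // assumed bucket name

interface ManifestEntry {
  key: string;        // object key expected in S3
  sizeBytes: number;  // size recorded from the on-premises asset store
}

async function validate(manifestPath: string): Promise<void> {
  const entries: ManifestEntry[] = JSON.parse(readFileSync(manifestPath, "utf8"));
  const failures: string[] = [];

  for (const entry of entries) {
    try {
      const head = await s3.send(
        new HeadObjectCommand({ Bucket: BUCKET, Key: entry.key })
      );
      if (head.ContentLength !== entry.sizeBytes) {
        failures.push(`${entry.key}: size mismatch (${head.ContentLength} vs ${entry.sizeBytes})`);
      }
    } catch {
      failures.push(`${entry.key}: missing from bucket`);
    }
  }

  console.log(`${entries.length - failures.length}/${entries.length} objects verified`);
  failures.forEach((f) => console.error(f));
}

validate("migration-manifest.json").catch(console.error);
```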
This phase also included the DRCC’s creation of infrastructure as code (IaC) to automate the deployment process, laying a repeatable foundation for future migrations.
Once the production environment was created using IaC tooling, the team performed testing and validation prior to a final production launch on January 12, 2026. Post-launch, they tuned scaling policies and optimized resource allocation to handle bot traffic spikes, followed by additional efficiency improvements.
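The post doesn’t spell out the scaling policies themselves. As a hedged sketch of the kind of tuning involved, an Amazon ECS service on Fargate can use target-tracking auto scaling so task count follows CPU load and request volume during traffic spikes; the capacities and thresholds below are placeholders, not the team’s production values (AWS CDK, TypeScript).

```typescript
// Sketch of target-tracking auto scaling for an ECS service on Fargate.
// The service, target group, and thresholds are illustrative assumptions.
import { Duration } from "aws-cdk-lib";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as elbv2 from "aws-cdk-lib/aws-elasticloadbalancingv2";

export function addScaling(
  service: ecs.FargateService,
  targetGroup: elbv2.ApplicationTargetGroup
): void {
  const scaling = service.autoScaleTaskCount({
    minCapacity: 2,   // keep a baseline for availability
    maxCapacity: 10,  // cap spend during bot-driven surges
  });

  // Add tasks when average CPU across the service stays above 60 percent.
  scaling.scaleOnCpuUtilization("CpuScaling", {
    targetUtilizationPercent: 60,
    scaleOutCooldown: Duration.minutes(2),
    scaleInCooldown: Duration.minutes(5),
  });

  // Also track load balancer request volume per task.
  scaling.scaleOnRequestCount("RequestScaling", {
    requestsPerTarget: 1000,
    targetGroup,
  });
}
```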
A serverless architecture built for maintainability
Moving to a serverless architecture was more complex than a straightforward lift-and-shift, but the DRCC team chose that path deliberately. An earlier attempt at JHU to run a different application in a more advanced container orchestration environment had proven too burdensome. Amazon ECS with AWS Fargate offered a managed middle path.
“We wanted to really simplify the operational burden of an advanced architecture and focus on the developers being the primary support for the application,” said Miklovic. By shifting infrastructure management to AWS-managed services, the team could redirect their focus from operational maintenance to development, effectively adopting a DevOps model where the engineers who build the application also own its deployment and observability.
DSpace naturally breaks into several components, including a front end, back-end API, search index, and scheduled jobs, which the team split into separate containers so each can scale independently. The architecture includes:
- Amazon Relational Database Service (Amazon RDS) for PostgreSQL, configured for high availability
- Amazon S3 for the DSpace asset store
- AWS WAF in combination with Cloudflare for application security and bot traffic management
- Elastic Load Balancing using Application Load Balancers for public and internal traffic
- Amazon EventBridge for scheduled tasks
- Amazon CloudWatch for monitoring

The team also used Amazon Q Developer for the first time to support architectural decisions.
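The team’s actual templates live in the open-source reference architecture linked at the end of this post. As a rough sketch of how the core pieces above could be expressed as infrastructure as code with the AWS CDK, the following stack wires a Fargate-backed front end behind an Application Load Balancer to a multi-AZ PostgreSQL instance and an S3 asset store; the names, sizes, and container image are illustrative assumptions.

```typescript
// Minimal sketch of the core pieces, assuming AWS CDK v2.
// Names, sizes, and the container image are placeholders, not the DRCC's values.
import { Stack, StackProps, RemovalPolicy } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as ecsPatterns from "aws-cdk-lib/aws-ecs-patterns";
import * as rds from "aws-cdk-lib/aws-rds";
import * as s3 from "aws-cdk-lib/aws-s3";

export class DspaceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, "Vpc", { maxAzs: 2 });
    const cluster = new ecs.Cluster(this, "Cluster", { vpc });

    // Asset store for bitstreams (theses, images, audio, and so on).
    const assetStore = new s3.Bucket(this, "AssetStore", {
      removalPolicy: RemovalPolicy.RETAIN,
    });

    // PostgreSQL metadata database, configured for high availability.
    const database = new rds.DatabaseInstance(this, "Database", {
      engine: rds.DatabaseInstanceEngine.postgres({
        version: rds.PostgresEngineVersion.VER_15,
      }),
      vpc,
      multiAz: true,
    });

    // Public-facing DSpace front end on Fargate behind an ALB.
    const frontEnd = new ecsPatterns.ApplicationLoadBalancedFargateService(
      this,
      "FrontEnd",
      {
        cluster,
        desiredCount: 2,
        taskImageOptions: {
          image: ecs.ContainerImage.fromRegistry("dspace/dspace-angular"),
          containerPort: 4000,
        },
        publicLoadBalancer: true,
      }
    );

    assetStore.grantReadWrite(frontEnd.taskDefinition.taskRole);
    database.connections.allowDefaultPortFrom(frontEnd.service);
  }
}
```

In this kind of layout, the back-end API, search index, and scheduled jobs would follow the same pattern as separate services, with scheduled jobs typically expressed as EventBridge-triggered Fargate tasks.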
The migration also gave back to the open-source community. The team found that DSpace’s Amazon S3 storage integration relied on an outdated version of the AWS software development kit, upgraded it, and contributed the fix upstream.
“That’s one of the nice things about working in open source,” said Branan. “If we find something that’s a problem, not only can we fix it, but we can push it back up for anyone else who needs to use it.”
Faster performance, faster deployments, and greater confidence
Since launching, the new environment has reached stable performance after an anticipated tuning period. The public-facing load balancer typically averages 400,000 to 500,000 requests per day, while a second, internal load balancer handles over 2 million requests per day, reflecting the volume of communication between DSpace’s internal components.
The difference has been immediate for the people who use DSpace every day. Students searching for dissertations, faculty accessing research, and staff managing collections all noticed faster response times as soon as the cutover happened. Where the old single-server setup left the repository vulnerable to bot traffic spikes, the new architecture absorbs surges without degrading the experience for real users.
Centralized logging and alerts now give the DRCC team real-time visibility across the environment, replacing the reactive troubleshooting of the old setup. The serverless nature of the deployment also gives engineers more time to focus on improving the application itself.
The new deployment pipeline has also shortened the path from code change to a testable environment.
“Verification of application changes in a pre-production environment now happens within a few minutes after a PR is merged. This is a big improvement for our development and test cycle,” said Poetker.
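The post doesn’t name the pipeline tooling behind that workflow. One way to get a merge-triggered, pre-production deployment within minutes is CDK Pipelines, sketched below with an assumed GitHub repository, branch, and stage layout; none of these names come from the DRCC’s setup.

```typescript
// Sketch of a merge-triggered deployment pipeline using CDK Pipelines.
// Repository, branch, and stage names are illustrative assumptions.
import { Stack, StackProps, Stage, StageProps } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as pipelines from "aws-cdk-lib/pipelines";
import { DspaceStack } from "./dspace-stack"; // stack from the earlier sketch

class PreProdStage extends Stage {
  constructor(scope: Construct, id: string, props?: StageProps) {
    super(scope, id, props);
    new DspaceStack(this, "Dspace");
  }
}

export class PipelineStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const pipeline = new pipelines.CodePipeline(this, "Pipeline", {
      synth: new pipelines.ShellStep("Synth", {
        // Runs when a pull request is merged to the main branch.
        input: pipelines.CodePipelineSource.gitHub("example-org/dspace-infra", "main"),
        commands: ["npm ci", "npm run build", "npx cdk synth"],
      }),
    });

    // Deploy to a pre-production environment for verification before promotion.
    pipeline.addStage(new PreProdStage(this, "PreProd"));
  }
}
```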
With faster performance for users and a streamlined workflow for developers, the DSpace migration has given the DRCC team confidence that they can apply the same approach with other applications in their portfolio. Stakeholders for other library systems are eager for similar transitions.
A roadmap for other institutions
The DRCC team is already migrating more applications into a similar architecture and exploring how AI can support DevOps visibility over time. Other academic libraries and cultural institutions considering this type of migration can draw on the team’s experience: start with a managed service like Amazon ECS as a pathway into the cloud; take small steps to build confidence; and use what others have built.
To that end, the DRCC team published an open-source reference architecture for DSpace on AWS on GitHub, which also breaks out components that other institutions can reuse for different applications, so they don’t have to build it all from scratch.
With bot traffic continuing to grow and on-premises infrastructure increasingly difficult to maintain, modernizing digital collections in the cloud is becoming a practical necessity. Explore how AWS helps institutions build secure, scalable solutions for higher education.
Read related stories on the AWS Public Sector Blog
- Reimagining university libraries with AWS: University of Maryland’s six-month cloud migration
- Old Dominion University helps to modernize quantum chemistry software for 140,000 researchers with AWS
- Seattle University’s 8-year cloud journey: Key lessons, wins, and a new path forward
- Macquarie University accelerates cloud transformation with AWS