Smithsonian releases 2.8 million images through Smithsonian Open Access Initiative
The Smithsonian Institution announced the availability of more than 2.8 million two- and three-dimensional images and files through the Smithsonian Open Access Initiative. With this initiative, anyone with an internet connection has access to high-resolution media files, the accompanying metadata, and research from the Smithsonian Institution’s 19 museums, nine research centers, and the National Zoo.
This Smithsonian dataset is the largest museum collection released to date. It is cross-disciplinary, combining data from science and technology, art and design, and history and culture. With this release, the Smithsonian Institution is setting a new standard for the museum sector by making millions of digital assets available for learning, discovery, and creative reuse.
The Smithsonian Open Access Initiative is supported by the AWS Public Dataset Program, which provides Amazon Simple Storage Service (Amazon S3) storage for publicly available high-value datasets. All of the data released through this initiative can be read directly from Amazon S3, so anyone can analyze it and build services on top of it without downloading or storing their own copies. High-fidelity metadata can be searched and analyzed with services like Amazon Athena to identify relevant media files. Metadata can be combined with media and analyzed using artificial intelligence (AI) and machine learning (ML) to derive new insights about the history reflected in the Smithsonian’s collections.
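As a rough illustration of working with the metadata directly, the Python sketch below builds a public URL for an object in the bucket and filters metadata records by keyword. It rests on several assumptions that do not come from this announcement and should be checked against the dataset's own documentation: the bucket name `smithsonian-open-access`, metadata stored as line-delimited JSON, and a `title` field on each record. The function names are hypothetical and chosen only for this example.

```python
import json
from urllib.parse import quote

# Assumption: bucket name as listed in the public dataset registry; verify before use.
BUCKET = "smithsonian-open-access"


def object_url(key, bucket=BUCKET):
    """Build a virtual-hosted-style HTTPS URL for an object in a public S3 bucket."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def parse_records(lines):
    """Yield one dict per non-empty line, assuming line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def find_by_title(records, keyword):
    """Return records whose (hypothetical) 'title' field contains the keyword."""
    needle = keyword.lower()
    return [r for r in records if needle in str(r.get("title") or "").lower()]


if __name__ == "__main__":
    # In-memory sample standing in for a downloaded metadata file.
    sample = [
        '{"title": "Apollo 11 Command Module"}',
        '{"title": "Teapot"}',
    ]
    for record in find_by_title(parse_records(sample), "apollo"):
        print(record["title"])
```

For queries across the whole collection, a service such as Amazon Athena, mentioned above, is likely a better fit than scanning files one at a time like this.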
For example, users could apply computer vision techniques to match styles across objects in diverse collections, such as paintings, drawings, furniture, coins, bank notes, and photographs. Such matching could reveal interesting historical patterns across collections, offer new ways to look at them, and provide content for style transfer. Already, AWS is using computer vision on images of plants and insects to help build deep learning models based on Smithsonian data and data aggregated from multiple institutions. These models can be used not only to identify species, but also to look at global patterns of shape and species diversity.
“By hosting the Smithsonian Open Access dataset in the cloud, we increase the availability of our trusted data, leverage cloud-based processing tools, and keep download times for users worldwide to a minimum – all of which furthers our mission to make the nation’s collections more accessible than ever before,” said Effie Kapsalis, senior digital program officer at the Smithsonian.
From the National Air and Space Museum, which explores the history of aeronautics and space exploration, to the Cooper Hewitt National Design Museum, which explores design objects and design history, to the National Museum of African American History and Culture, which features photographs and objects representing African American history – the Smithsonian Open Access Initiative brings the museums and their research to people around the world.
Bringing all of the Smithsonian Institution’s data and content to a central place allows scientists, historians, humanists, and artists to make connections across disciplines, open new lines of research, and create new content. This includes training ML models on botany specimens, as well as work in climate research, computer vision, genome analysis, cultural heritage preservation, and digital humanities.
The open access data is available to anyone, but it is targeted toward K-12 and undergraduate teachers and students, artists and designers, scientific researchers, digital humanities scholars, and technologists and technology companies.
The Smithsonian Open Access Initiative will continue to grow the dataset, adding assets each year. The Smithsonian will work to improve the discoverability of digital resources, create new digital collections, and embed them throughout the semantic web so they are easily accessible to the world’s 4.3 billion internet users.