AWS Big Data Blog
Using CombineInputFormat to Combat Hadoop’s Small Files Problem
James Norvell is a Big Data Cloud Support Engineer for AWS. Many Amazon EMR customers have architectures that track events and streams and store the data in S3, which frequently produces many small files. It’s well known that Hadoop doesn’t deal well with small files. This issue can be amplified when migrating from Hadoop […]