Diffbot Increases Efficiency
What do you like best about the product?
Prior to using Diffbot, we relied primarily on RSS feeds and a web scraping tool based on the visual layout and HTML of a webpage, which made us heavily dependent on XPaths to get the data we wanted. We find that Diffbot's crawlers are more stable in the long term because they are less affected by website design changes. This saves us a lot of time that we would otherwise spend on maintenance.
What do you dislike about the product?
The two issues that are most challenging for us are:
1. Diffbot does not recognize PDF documents, and we would frequently like to ingest them as articles.
2. We find it difficult to troubleshoot a crawler when it is not bringing in data, or is not bringing in the data we expect.
What problems is the product solving and how is that benefiting you?
The biggest problem Diffbot solves for us is reducing the amount of maintenance we have to do on the websites we scrape. We make heavy use of Diffbot's full-text capability, and Diffbot's metadata is also useful to us. The metadata we use most is Diffbot's language designation, which ensures that our clients see only articles in the languages they choose.
We also see great potential in using the bulk API to make our content ingest process more efficient, and we are excited to continue exploring this option.