Diffbot APIs

Diffbot | 1

Reviews from AWS Marketplace

0 AWS reviews
  • 5 star: 0
  • 4 star: 0
  • 3 star: 0
  • 2 star: 0
  • 1 star: 0

External reviews

29 reviews
from G2

External reviews are not included in the AWS star rating for the product.


    James C.

Here's the deal... leave structuring web data to the PROS

  • June 01, 2020
  • Review provided by G2

What do you like best about the product?
Diffbot can augment data streams for SO MANY industries/use cases. Within ours we're able to keep track of news mentions on universities (from literally all over the web), and enrich leads for outreach. I'm sure there's a ton more we could be doing with Diffbot. But even with those uses the service has paid for itself many times over. It doesn't take many saved work hours to justify the $299 price tag...
What do you dislike about the product?
To tap into the full power of Diffbot's offerings you do need a technical team member. (But for what service is this not the case?) Basically you can deal with pre-extracted sites (of which there seem to be millions) with the Knowledge Graph and Enhance. If you want to crawl a specific site repeatedly you'll need to at least know how to make an API call.
What problems is the product solving and how is that benefiting you?
At a high level, we're using Diffbot for data extraction. More specifically, we're enriching lead data and monitoring news sources about a large group of organizations.

In the past we've built custom scrapers, but even with an (albeit small) data team the upkeep required to monitor even scores of sites made projects balloon in complexity and cost. The fact that we have multiple entry points to data streams about web properties that matter to us is HUGE.


    Ben E.

Game Changer for Cold Start Data Extraction

  • June 01, 2020
  • Review provided by G2

What do you like best about the product?
Diffbot's Extraction APIs and Crawlbot API provide an incredibly valuable, versatile, and simple to use pipeline for acquiring crucial information from web pages that may not have been visited before. The Analyze API makes it a snap to determine if the page in question is a product page or not, and the wide array of elements that Diffbot returns from most pages is exceptionally useful!
What do you dislike about the product?
In our space, we tend to cover a large percentage of the e-commerce world, and that takes us to many domains that are either irregular, outdated, or less than perfect in terms of function. We've noticed that for those pages, or for ones on domains with sophisticated/aggressive bot-blocking techniques, Diffbot will often fail to provide a result (or at least not within a minute or two). This can be problematic for a company like ours that explores tens of thousands of domains each day, as it can slow down our discovery pipeline that finds new listings and e-commerce domains.
What problems is the product solving and how is that benefiting you?
We typically use Diffbot to provide data elements that we need for machine learning and AI, but that would cost too many human-hours to write selectors for. Additionally, we use the Crawlbot API to help us get wider coverage of certain sites, while still leveraging the power of the automated extraction tools that Diffbot offers.


    Andres P.

Extremely powerful API for text extraction

  • May 31, 2020
  • Review verified by G2

What do you like best about the product?
We have used Diffbot for several years; their API for text extraction is extremely powerful and accurate. It has become an important part of our data processing pipeline. Their API(s) allow us to convert unstructured HTML data into information we can ingest and store.

Their support is also very responsive and has always provided us with valuable answers and feedback when needed.
What do you dislike about the product?
They also provide a web interface to define custom rules. That functionality has also proved very useful; however, its UI can be unintuitive at times.
What problems is the product solving and how is that benefiting you?
It allows us to extract structured data from HTML pages.


    Artur R.

Content extraction done right

  • May 29, 2020
  • Review verified by G2

What do you like best about the product?
We've been a happy customer for about 6 years now, and we tend to forget Diffbot is there, since their data flows seamlessly. Our work depends a lot on data processing, and we don't want to worry about how data sources provide their data, or when they change their process along the way. With Diffbot we can really focus on processing.
What do you dislike about the product?
Nothing worth mentioning. The few glitches we had in the past were promptly dealt with by their support.
What problems is the product solving and how is that benefiting you?
We're using data extraction APIs for getting web data. We're evaluating the knowledge graph.


    Minn K.

Powerful tool for exploring data!

  • May 29, 2020
  • Review provided by G2

What do you like best about the product?
Impressive database of information curated from across the web
What do you dislike about the product?
There is a bit of a learning curve to the Diffbot Query Language, but it's worth it!
What problems is the product solving and how is that benefiting you?
I'm using it to enrich a dataset, based on a smaller list of fields.
Recommendations to others considering the product:
Have a specific use case in mind. Their documentation is also very useful.


    Eric S.

Diffbot is our favorite content provider by a landslide

  • May 28, 2020
  • Review provided by G2

What do you like best about the product?
We needed a content sourcing solution for our product, Tanjo Animated Personas, or TAPs. Tanjo Animated Personas are simulated customers that learn and evolve over time. Our personas need to read a continual stream of articles in order to evolve and function properly. Diffbot gives us an easy way to source that content.

We have been a Diffbot customer for over 5 years, and have used all of their products, including Crawlbot and Knowledge Graph. Before Diffbot, we mainly relied on RSS feeds and custom scrapers to import articles into our system. The results were often inconsistent, with misread or malformed text blocks. It was tedious and unsustainable. Diffbot provided an almost limitless set of sources with high quality data.

Implementing Diffbot has greatly improved scalability, efficiency and quality of feeding internet articles into our platform. They are always willing to work with us if we encounter any issues. They take customer feedback seriously and are willing to hear out suggestions for what features could be improved or added. We appreciate Diffbot’s flexibility to work with us for our needs.
What do you dislike about the product?
Diffbot has always been open to hearing our suggestions for what could be improved or added to their website. I don't think it would be fair to "dislike" anything since they have taken our feedback seriously in the past and iterated on their platform. If we think things could be better, we let Diffbot know.
What problems is the product solving and how is that benefiting you?
We needed an automated method to extract article text and images from popular websites online that was much more reliable and required much less effort to maintain. Diffbot provides an almost limitless set of sources with high quality data.


    Internet

Diffbot great for extracting data without engineering help!

  • May 28, 2020
  • Review verified by G2

What do you like best about the product?
Their support team is very helpful. Even without purchasing their support plan to have an SLA, they usually get back within a week and provide thorough responses. Sometimes, they'll even see your API configuration, adjust it for you, and explain how the new setting is better.

I would highly recommend Diffbot for their robust and dependable products, supportive sales and customer support staff, and transparent pricing plans. Even their base plans make it easy for any company or team of any size to test it and determine what their positive ROI looks like.
What do you dislike about the product?
Documentation could be improved a bit. It can be hard for new users who aren't familiar with HTML and CSS to work out how to apply specific filters and selectors. My recommendation here is to provide templates or additional documentation on best practices for scraping data from popular sources such as Wikipedia.

Another small thing they can improve on is providing better visibility into account usage statistics for accounts with multiple tokens, which are all tied into one parent account.
What problems is the product solving and how is that benefiting you?
Their data extraction APIs are customizable and flexible. Almost any page on the internet can be scraped. It expedites data extraction for our team as we don't need to depend on custom Python scripts or software engineers to help collect data for our needs. We were able to reduce the time to get working APIs to extract data from days to mere hours. For a startup that is now part of a much larger company, this type of efficiency helped us allocate our engineers to more important sprints.


    Information Technology and Services

Excellent Spidering and Content Extraction

  • May 28, 2020
  • Review provided by G2

What do you like best about the product?
Crawlbot paired with Diffbot's extraction API saved us thousands of hours when acquiring web data for research and implementing consulting projects.
What do you dislike about the product?
Occasionally, when looking to update content from previously crawled URLs, Diffbot would be inconsistent. While this was not frequent, and we were able to find workarounds, we were never able to successfully troubleshoot the issues.
What problems is the product solving and how is that benefiting you?
As a consulting organization working with clients to leverage NLP and auto-classification, we used Diffbot to extract text, image, and product information.

The big benefit of Diffbot is it seamlessly integrated into our tool set enabling us to quickly generate structured data upon which to work.


    Henry V.

Diffbot was extremely helpful, attentive, and responsive to meet my business' custom needs.

  • May 27, 2020
  • Review provided by G2

What do you like best about the product?
Diffbot provides a simple, well documented API that allows for mind-boggling web scraping with brain-dead code. By finding what's important on nearly every kind of webpage, Diffbot helped launch my project further than I could have imagined, saving me hours writing code which would have only been able to understand a few websites.
What do you dislike about the product?
One suggestion for them: there are probably individuals and small businesses out there that can't afford the plans they offer but could still get a lot out of Diffbot, so maybe they should consider adding a smaller plan. But as a user I haven't encountered anything to dislike yet, really! I haven't had a single issue using the API, and it was really easy to get started with all of their help.
What problems is the product solving and how is that benefiting you?
Several times a day, we're scraping URLs which are dynamically chosen by a program and pulling data from those web pages. Since we don't know which sites will be scraped in advance, it's a daunting programming task to reliably scrape the important info from any given webpage. Diffbot does this job reliably with any web page we encounter. Ultimately it gives us a ton of mental space to tackle other important aspects of my program, rather than muck around in the mess of web code.


    Information Technology and Services

Very good tech and support team

  • May 27, 2020
  • Review verified by G2

What do you like best about the product?
Data extraction capability of the software, support team's commitment to our success
What do you dislike about the product?
Minor inconsistencies and slightly wonky UI
What problems is the product solving and how is that benefiting you?
Data collection