Diffbot Increases Efficiency
What do you like best about the product?
Prior to using Diffbot, we relied primarily on RSS feeds and a web scraping tool based on the visual layout and HTML of a webpage, which made us heavily dependent on XPaths to get the data we wanted. We find that Diffbot's crawlers are more stable in the long term because they are less affected by website design changes. This saves us a lot of time that we would otherwise spend on maintenance.
What do you dislike about the product?
The two issues that are most challenging for us are:
1. Diffbot does not recognize PDF documents, and we would frequently like to ingest them as articles.
2. We find it difficult to troubleshoot a crawler when it is not bringing in data, or is not bringing in the data we expect.
What problems is the product solving and how is that benefiting you?
The biggest problem Diffbot solves for us is reducing the amount of maintenance we have to do on the websites we scrape. We make heavy use of Diffbot's full-text capability, and Diffbot's metadata is also useful to us. The metadata we use most is Diffbot's language designation, which ensures that our clients see only articles in the languages they choose.
We also see great potential in using the bulk API to make our content ingest process more efficient, and we are excited to continue exploring this option.