SysNucleus WebHarvy 7.4.0.228

File size: 133.5 MB

An intuitive, powerful visual web scraper. WebHarvy can automatically scrape text, images, URLs, and email addresses from websites and save the scraped content in various formats. It uses an advanced built-in browser and an easy point-and-click interface to scrape images, text, or any other data displayed on a web page.

- Incredibly easy-to-use, start scraping within minutes
- Extract data from multiple pages/categories/keywords
- Save extracted data to file or database
- Built-in scheduler and proxy support

Point and Click Interface
WebHarvy is a visual web scraper. There is no need to write any scripts or code to scrape data. You navigate web pages in WebHarvy's built-in browser and select the data to be scraped with mouse clicks. It is that easy!

Auto Pattern Detection
WebHarvy automatically identifies patterns of data occurring in web pages. If you need to scrape a list of items (name, address, email, price, etc.) from a web page, no additional configuration is needed: if data repeats, WebHarvy scrapes it automatically.

Export Data to File/Database
You can save the data extracted from web pages in a variety of formats. The current version of WebHarvy lets you export the scraped data as an XML, CSV, JSON, or TSV file, or to an SQL database.
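
As a rough illustration of what these file formats look like for the same records, here is a minimal Python sketch using only the standard library. It is not WebHarvy's exporter; the sample records, field names, and the function names `to_delimited`, `to_json`, and `to_xml` are all assumptions made for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical scraped records; the field names are illustrative only.
records = [
    {"name": "Widget A", "price": "9.99"},
    {"name": "Widget B", "price": "14.50"},
]

def to_delimited(rows, delimiter=","):
    """Serialize rows as CSV (default) or as TSV when given a tab delimiter."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(rows[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)

def to_xml(rows, root_tag="items", row_tag="item"):
    """Serialize rows as XML, one child element per field."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")

print(to_delimited(records))        # CSV
print(to_delimited(records, "\t"))  # TSV
print(to_json(records))
print(to_xml(records))
```

The same list of dictionaries feeds every serializer, which is why one scrape can be saved in any of the listed formats.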

Scrape from Multiple Pages
Web pages often display data such as product listings across multiple pages. WebHarvy can automatically crawl and extract data from multiple pages. Just point out the "link to the next page" and WebHarvy will scrape data from all pages.
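
The general idea of following a "next page" link until none remains can be sketched in Python as below. This is only a conceptual sketch, not how WebHarvy works internally: the `PAGES` dictionary is a hypothetical in-memory stand-in for a paginated site, and `fetch` and `scrape_all` are names invented for the example.

```python
# Hypothetical in-memory "site": each page has items and an optional next link.
PAGES = {
    "/list?page=1": {"items": ["A", "B"], "next": "/list?page=2"},
    "/list?page=2": {"items": ["C"], "next": "/list?page=3"},
    "/list?page=3": {"items": ["D", "E"], "next": None},
}

def fetch(url):
    """Stand-in for an HTTP request plus parsing; returns the page dict."""
    return PAGES[url]

def scrape_all(start_url, max_pages=100):
    """Collect items from start_url and every page reachable via 'next'."""
    items = []
    seen = set()
    url = start_url
    # Stop when there is no next link, a page repeats, or the cap is reached.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        items.extend(page["items"])
        url = page["next"]
    return items

print(scrape_all("/list?page=1"))
```

The `seen` set and `max_pages` cap guard against sites whose "next" links loop back on themselves.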

Keyword-Based Scraping
Scrape data by automatically submitting a list of input keywords to search forms. Any number of input keywords can be submitted to multiple input text fields to perform a search, and data can be extracted from the search results for all combinations of input keywords.
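
To make "all combinations of input keywords" concrete, the following Python sketch enumerates every pairing of keywords for two search fields. The keyword lists and the function name `keyword_combinations` are hypothetical, and the sketch only shows the combinatorics, not how searches are submitted.

```python
from itertools import product

def keyword_combinations(*keyword_lists):
    """Return every combination with one keyword taken from each list."""
    return list(product(*keyword_lists))

# Hypothetical keywords for two search fields (job title and city).
jobs = ["engineer", "designer"]
cities = ["Austin", "Boston", "Denver"]

queries = keyword_combinations(jobs, cities)
for job, city in queries:
    print(f"search: title={job!r} city={city!r}")
```

With 2 job titles and 3 cities this yields 6 searches; the number of searches is the product of the list lengths, so long keyword lists multiply quickly.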

Proxy Servers / VPN
To scrape anonymously and to prevent the scraper from being blocked by web servers, you can access target websites via proxy servers or a VPN. Either a single proxy server address or a list of proxy server addresses may be used.
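
One common way to use a list of proxy addresses is simple round-robin rotation, sketched below in Python. The description above does not say how WebHarvy picks a proxy from the list, so round-robin is an assumption here, and the addresses (taken from documentation-only IP ranges) and the function name `proxy_rotation` are hypothetical.

```python
from itertools import cycle

def proxy_rotation(proxies):
    """Return an endless iterator that cycles through proxies in order."""
    if not proxies:
        raise ValueError("at least one proxy address is required")
    return cycle(proxies)

# Hypothetical proxy addresses from reserved documentation IP ranges.
proxy_list = ["192.0.2.10:8080", "198.51.100.7:3128", "203.0.113.5:8000"]

rotation = proxy_rotation(proxy_list)
# Assign a proxy to each of five hypothetical requests.
assigned = [next(rotation) for _ in range(5)]
print(assigned)
```

A single-address list works the same way: every request simply gets the same proxy.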

Category Scraping
WebHarvy lets you scrape data from a list of links that lead to similar pages or listings within a website. This allows you to scrape categories and sub-categories within a website using a single configuration.

Regular Expressions
WebHarvy lets you apply regular expressions (RegEx) to the text or HTML source of web pages and scrape the matching portion. This gives you more flexibility when scraping data.
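
As an example of scraping only the matching portion, the Python sketch below pulls a price out of an HTML fragment. The HTML snippet, the pattern, and the function name `extract_price` are assumptions for illustration, and the pattern uses Python's `re` syntax, which may differ in details from the regular expression dialect WebHarvy accepts.

```python
import re

# Hypothetical HTML fragment of a product listing.
html = '<div class="item"><span class="price">Price: $1,299.00</span></div>'

# A dollar sign, optional whitespace, then digits with optional thousands
# separators and optional cents. The capture group keeps only the number.
PRICE_RE = re.compile(r"\$\s*([\d,]+(?:\.\d{2})?)")

def extract_price(text):
    """Return the first price found in text, or None if there is none."""
    match = PRICE_RE.search(text)
    return match.group(1) if match else None

print(extract_price(html))
```

The capture group is what makes this useful for scraping: the surrounding label and markup are matched for context but only the grouped portion is kept.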

Run JavaScript
Run your own JavaScript code in the browser before extracting data. This can be used to interact with page elements or to invoke JavaScript functions already implemented in the target page.

Download Images
Images can be downloaded, or their URLs can be extracted. WebHarvy can automatically extract multiple images displayed on the product details pages of eCommerce websites.

Automate Browser Interaction
WebHarvy can be easily configured to perform tasks such as clicking links, selecting list/drop-down options, entering text into a field, and scrolling the page.




