WebExtractor lets you easily define extracts from Web pages and save the extraction definitions.
An extraction definition can then be used in:
- custom applications (the initial application is personal bank account management)
- custom widgets
- etc. (synchronization to a Palm device in Plucker or a dedicated format, retrieving data for plotting...)
Web extracts are defined by:
- a set of URLs to follow, possibly with parameters, possibly using values stored in the keychain
- a position in the DOM tree, selected by tag, id, and class, and possibly by content
- some post-processing, e.g. converting to a date, following links, etc.
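The three parts of an extract definition listed above can be pictured as a small data structure that is saved and reloaded. The following is only a sketch in Python: the class names, field names, the "{keychain:...}" placeholder syntax and the JSON storage format are assumptions made for illustration, not WebExtractor's actual format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class DomStep:
    """One step of the path through the DOM tree (hypothetical layout)."""
    tag: str
    id: Optional[str] = None
    css_class: Optional[str] = None
    contains: Optional[str] = None  # optional text the element must contain


@dataclass
class ExtractDefinition:
    """URLs to follow, a DOM position, and post-processing steps."""
    urls: List[str]
    path: List[DomStep]
    post_processing: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts the nested DomStep objects to plain dicts.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "ExtractDefinition":
        raw = json.loads(text)
        raw["path"] = [DomStep(**step) for step in raw["path"]]
        return cls(**raw)


# Example definition; the URL and the placeholder syntax are made up.
definition = ExtractDefinition(
    urls=["https://bank.example/accounts?user={keychain:user}"],
    path=[
        DomStep(tag="table", id="statement"),
        DomStep(tag="td", css_class="balance"),
    ],
    post_processing=["to_number"],
)

saved = definition.to_json()
restored = ExtractDefinition.from_json(saved)
```

Keeping the definition as plain data is what would allow the same definition to be reused by an application, a widget, or a synchronization tool.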
Developers wishing to contribute can contact me.
Currently supported features:
- multiple URLs/pages, multiple extractions per page
- partial editing of DOM extraction path
- recognition of DOM elements containing a month (which allows a "click on current month" action)
- recognition/extraction of dates and date ranges
- operations on extracted numeric values
- integrated widgets
- bundle framework library
- array extraction
Sources are now on Google Code.
In browser mode, WebExtractor follows your actions (clicking on a link, etc.) and keeps track of the URLs.
Extract definition mode
Versatile date parsing
WebClipper can recognise and convert dates or date ranges from a variety of natural language formats (e.g. "17 > 19 Avril", "Du 17/04 au 29/04", ...) to NSCalendarDate objects.
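As an illustration of this kind of parsing, here is a minimal Python sketch that handles only the two example formats quoted above. It is not the application's parser: the function name, the regular expressions and the explicit year argument are assumptions, and it returns Python date objects rather than NSCalendarDate objects.

```python
import re
from datetime import date
from typing import Optional, Tuple

# French month names, with and without accents (assumed input language).
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12, "decembre": 12,
}

# "17 > 19 Avril": two day numbers followed by a month name.
DAY_RANGE = re.compile(r"(\d{1,2})\s*>\s*(\d{1,2})\s+(\w+)")
# "Du 17/04 au 29/04": two day/month pairs.
NUMERIC_RANGE = re.compile(
    r"du\s+(\d{1,2})/(\d{1,2})\s+au\s+(\d{1,2})/(\d{1,2})", re.IGNORECASE
)


def parse_date_range(text: str, year: int) -> Optional[Tuple[date, date]]:
    """Return (start, end) for the two sample formats, else None.

    The year is not present in the text, so the caller supplies it.
    """
    match = DAY_RANGE.search(text)
    if match:
        month = FRENCH_MONTHS.get(match.group(3).lower())
        if month is not None:
            first_day, last_day = int(match.group(1)), int(match.group(2))
            return date(year, month, first_day), date(year, month, last_day)

    match = NUMERIC_RANGE.search(text)
    if match:
        d1, m1, d2, m2 = (int(group) for group in match.groups())
        return date(year, m1, d1), date(year, m2, d2)

    return None


range_a = parse_date_range("17 > 19 Avril", 2024)
range_b = parse_date_range("Du 17/04 au 29/04", 2024)
```

An impossible date such as 31/02 raises ValueError in this sketch; a real parser would need to decide how to report that.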
The selected extract is kept in HTML format. The HTML elements that follow it can be kept too.
Follow URLs and HTML elements containing a month or day
While you specify the HTML element to be extracted, WebExtractor automatically recognises that the element contains a month name and proposes the current month as the selection criterion (e.g. "use the list element containing the current month" instead of "use the 7th list element").
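The difference between selecting by position and selecting by current month can be sketched as follows. This is a Python illustration only: the function and class names are hypothetical, only English month names and list (li) elements are handled, and the application's own recognition logic may differ.

```python
import re
from datetime import date
from html.parser import HTMLParser
from typing import List, Optional, Tuple

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]


class ListItemCollector(HTMLParser):
    """Collects the text of every <li> element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[str] = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.items.append("")
            self._in_item = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def find_month_name(text: str) -> Optional[str]:
    """Return the first whole word of the text that is a month name."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in MONTH_NAMES:
            return word
    return None


def select_current_month_item(html: str, today: date) -> Optional[Tuple[int, str]]:
    """Pick the list item naming today's month instead of a fixed index."""
    collector = ListItemCollector()
    collector.feed(html)
    collector.close()
    wanted = MONTH_NAMES[today.month - 1]
    for index, text in enumerate(collector.items):
        if find_month_name(text) == wanted:
            return index, text.strip()
    return None


page = "<ul><li>May 2024</li><li>June 2024</li><li>July 2024</li></ul>"
selected = select_current_month_item(page, date(2024, 7, 15))
```

A fixed index stops pointing at the right item as soon as the page adds a new month, while the month-based criterion keeps working.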
The extract can itself be a URL, and can be used to fetch the next page.
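A possible way to turn an extract into the next URL to fetch is sketched below in Python. The function name and the example URLs are assumptions; the sketch takes the first link found in the HTML extract (or the extract text itself when it contains no link) and resolves it against the address of the page it came from. It does not perform the fetch.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class FirstLinkFinder(HTMLParser):
    """Remembers the href of the first <a> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def next_page_url(page_url: str, extract_html: str) -> Optional[str]:
    """Resolve the URL carried by an extract against its source page."""
    finder = FirstLinkFinder()
    finder.feed(extract_html)
    finder.close()
    # Fall back to the raw extract text when it holds no <a> element.
    target = finder.href if finder.href is not None else extract_html.strip()
    if not target:
        return None
    return urljoin(page_url, target)


resolved = next_page_url(
    "https://example.org/accounts/index.html",
    '<a href="statement?month=7">July</a>',
)
```

Resolving against the source page matters because links in extracted HTML are often relative.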