Building Intelligent Web Agents with CFML Michael Dinowitz November, 2000


Page 1: Intelligent agents presentation with focus on Regular Expressions

Building Intelligent Web Agents with CFML

Michael Dinowitz
November, 2000

Page 2

Intelligent Agents in ColdFusion

What are Agents?
– Code that does automatic work for you
– Involves retrieving information
– Processing or storing that information
– Usually a single page or has no interface

What are Intelligent Agents (IA)?
– Term used for a specific class of agents
– Retrieves remote information
– Processes the retrieved information
– Decision making code built in
– Usually involves parsing operations
– Interfaces with remote processes

Page 3

Intelligent Agents in ColdFusion

What aren’t Intelligent Agents?
– Push of any sort (CFMAIL)
– Calls to structured locations
  • DBs
  • LDAP
– Browsers

Grey Areas - Structured data
– Syndicated data (Spectra)
– HTTP query returns
– Comma delimited information
– Most local information calls

Page 4

Intelligent Agents in ColdFusion

Broad examples
– CF_StockGrabber - grabs and processes stock information
– CF_UPS - interface to UPS shipping data
– CF_MetaSearch - searches multiple search engines and collates results
– CF_GetTags

Page 5

Intelligent Agents in ColdFusion

Technologies used for retrieval
– CFHTTP - retrieves websites
– CFFTP - retrieves ftp information
– CFX_Socket - socket calls for information
– CFX_NNTP - retrieves usenet news

Technologies used for parsing
– Find() / FindNoCase()
– Replace() / ReplaceNoCase()
– Mid()
– REFind() / REFindNoCase()
– REReplace() / REReplaceNoCase()

Page 6

IA technique I - CF_EbayItem

1. Define what you want
– A page from ebay with the results of a search
2. Define how it will be displayed
– Whole page returned in a variable. No parsing
3. Define the steps to get it
– CFHTTP to retrieve a page
– Place information in file or on browser

Page 7

CFHTTP Basics

<CFHTTP
– Url - Url to retrieve. Does not need http:// prefix
– Method - Get or Post.
– ResolveUrl - Turns all relative links into ‘full’ ones. Needed for graphics and links from the page.

Notes:
– The URL does not need to be prefixed by http://, but it’s good practice to do so.
– Get is standard and uses the tag ‘as is’. Post requires a CFHTTPPARAM as well as a closing CFHTTP tag.
– ResolveUrl should only be used when you expect to follow links from the called page or want to see the media content.
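
To make the ResolveUrl idea concrete, here is a small Python sketch (not CFML) that turns relative links into full ones with the standard library's urljoin. The base URL and the links are made-up examples, and the sketch only approximates the idea; it does not claim to reproduce CFHTTP's exact rewriting rules.

```python
from urllib.parse import urljoin

# Hypothetical page address and links, chosen only for illustration.
BASE_URL = "http://www.example.com/search/results.html"


def resolve_links(base_url, links):
    """Return each link as a full URL, resolved against base_url."""
    return [urljoin(base_url, link) for link in links]


resolved = resolve_links(
    BASE_URL,
    ["item.html", "/images/logo.gif", "http://other.example.com/a"],
)
for url in resolved:
    print(url)
```

Links that are already full URLs pass through unchanged, which is why resolving is only worth doing when the links or media will actually be followed.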

Page 8

IA technique I - CF_EbayItem (Code)

<!--- CF_EbayItem - Module to get all items from ebay and return it --->

<!--- Required attributes --->
<CFPARAM name="attributes.searchitem">
<CFPARAM name="attributes.ReturnVar" default="ReturnVar">

<cfhttp url="http://search-desc.ebay.com/search/search.dll?MfcISAPICommand=GetResult&ebaytag1=ebayreg&ht=1&query=#attributes.searchitem#&ebaytag1code=0&srchdesc=y&SortProperty=MetaNewSort" method="GET" resolveurl="true">

<!--- Return the whole page to the calling template --->
<CFSET "Caller.#Attributes.ReturnVar#"=CFHttp.FileContent>

Page 9

IA technique II - CF_EbayItem

1. Define what you want
– All items from an ebay search
2. Define how it will be displayed
– In a return array
3. Define the string to search for in the page
– <a href="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=449570667">HEBREW AMULETS: By T Schrire</a>
4. Define the steps to get it
– CFHTTP to retrieve a page
– CFLOOP over the page for elements
– FindNoCase() to get start of specific element
– FindNoCase() to get end of specific element
– Mid() to get whole element
– Place information in array for return

Page 10

Find()/FindNoCase() Basics

FindNoCase(substring, string [, start])
– SubString - The exact string you’re looking for
– String - The string that you’re searching
– Start - Optional start position.

Notes:
– FindNoCase is slightly slower, but better when you don’t know exactly what you’re looking for.
– Always a good idea to set a start. Speeds up the search.
– Remember that the return value is the START position of the SubString. Add the SubString length to get the end position.
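
The start/end arithmetic in the notes can be tried out in a Python sketch (not CFML). The helper find_no_case is a hypothetical stand-in that mimics the 1-based, 0-when-missing behaviour described above; the sample string is made up, and lowercasing for the comparison is only safe for plain ASCII text.

```python
def find_no_case(substring, string, start=1):
    """1-based, case-insensitive search; returns 0 when nothing is found."""
    # str.find is 0-based and returns -1 on failure, so shift by one.
    return string.lower().find(substring.lower(), start - 1) + 1


page = 'Intro <A HREF="one.html">One</A> outro'
needle = "<a href="

start = find_no_case(needle, page)   # START position of the substring
end = start + len(needle)            # first position after the substring
print(start, end)
```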

Page 11

Mid() Basics

Mid(string, start, count)
– String - The string that contains the SubString you want.
– Start - The start position of the SubString you want.
– Count - The number of characters in the SubString that you want.

Notes:
– When used with FindNoCase, it is usual to have a start variable and an end variable. The count would then be noted as
  • End-Start
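
The End-Start count can be checked with a short Python sketch (not CFML). The helper mid is a hypothetical 1-based equivalent of Mid(), and the text and positions are invented for the example.

```python
def mid(string, start, count):
    """1-based substring in the style of Mid(string, start, count)."""
    return string[start - 1:start - 1 + count]


text = "Price: 42 dollars"
start = 8    # 1-based position of the "4"
end = 10     # 1-based position just past the "2"
value = mid(text, start, end - start)
print(value)
```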

Page 12

IA technique II - CF_EbayItem

<!--- CF_EbayItem - Module to get all items from ebay and return it --->

<!--- Required attributes --->
<CFPARAM name="Attributes.SearchItem">
<CFPARAM name="Attributes.ReturnArray" default="ReturnArray">

<cfhttp url="http://search-desc.ebay.com/search/search.dll?MfcISAPICommand=GetResult&ebaytag1=ebayreg&ht=1&query=#Attributes.SearchItem#&ebaytag1code=0&srchdesc=y&SortProperty=MetaNewSort" method="GET" resolveurl="true">

<!--- Position to start the first search from --->
<CFSET End=1>
<!--- Set local array for storage. We set all values to a local array rather than to the calling template to reduce the number of ‘calls’ between templates and improve performance. --->
<CFSET LocalArray=ArrayNew(1)>

Page 13

IA technique II - CF_EbayItem

<CFLOOP condition="1">
    <CFSET Start = FindNoCase('<a href="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=', cfhttp.filecontent, end)>
    <CFIF Start>
        <!--- Add the length of the closing tag (4) to its position to get the true end position of the link. This will help in getting its full value in the Mid() function. --->
        <CFSET End=FindNoCase('</a>', cfhttp.filecontent, start)+4>
        <!--- Add item to a local array --->
        <CFSET ArrayAppend(LocalArray,Mid(cfhttp.filecontent, start, end-start))>
    <cfelse>
        <cfbreak>
    </cfif>
</cfloop>

<!--- Set local array to calling template --->
<CFSET "caller.#Attributes.ReturnArray#"=LocalArray>
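
The same find-start, find-end, cut-out loop can be sketched in Python (not CFML) against a made-up page, to show the logic without a live request. The function name extract_links and the sample HTML are assumptions for illustration; only the first link text comes from the slides. Unlike the CFML above, this sketch also stops if a closing tag is missing.

```python
LINK_START = '<a href="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item='
LINK_END = "</a>"


def extract_links(page):
    """Collect every item link in page using plain substring searches."""
    lowered = page.lower()            # case-insensitive, like FindNoCase()
    needle = LINK_START.lower()
    items = []
    position = 0                      # Python strings are 0-based
    while True:
        start = lowered.find(needle, position)
        if start == -1:               # no more links
            break
        close = lowered.find(LINK_END, start)
        if close == -1:               # unterminated link: stop rather than guess
            break
        end = close + len(LINK_END)   # position just past the closing tag
        items.append(page[start:end])
        position = end                # resume the search after this link
    return items


# Made-up sample page; the second item number and title are invented.
sample = (
    '<p><a href="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=449570667">'
    "HEBREW AMULETS: By T Schrire</a></p>"
    '<p><A HREF="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=123">'
    "Sample item</A></p>"
)
links = extract_links(sample)
for link in links:
    print(link)
```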

Page 14

IA technique III - CF_EbayItem

1. Define what you want
– All items from an ebay search
2. Define how it will be displayed
– In a return array
3. Define the string to search for in the page
– <a href="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=449570667">HEBREW AMULETS: By T Schrire</a>
4. Define the steps to get it
– CFHTTP to retrieve a page
– CFLOOP over the page for elements
– REFindNoCase() to get specific element
– Mid() to get whole element
– Place information in array for return

Page 15

REFind()/REFindNoCase() Basics

REFindNoCase(RegEx, String [, start] [, returnsub])
– RegEx - Regular Expression to use as search criteria
– String - String to search in
– Start - Position in String to start search at
– ReturnSub - Returns sub expressions as defined in the RegEx

Notes:
– Start should always be used as it speeds up the search. If using ReturnSub, it is required and can be set to 1.
– This function returns the numeric position of the searched for text unless ReturnSub is specified. Then it returns a structure.

Page 16

REFind()/REFindNoCase() Basics

– Structure returned by this function will have two keys (Pos, Len) with each key being an array. The first array element (Variable.Pos[1], Variable.Len[1]) will always contain the position/length of the ENTIRE match. Each additional array element will contain the position and length of a subexpression.
– Variable
  • Pos
    – [1]
    – [2]
  • Len
    – [1]
    – [2]
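
To see what such a Pos/Len structure holds, here is a Python sketch (not CFML). The helper re_find_no_case is hypothetical and only imitates the shape described above; Python lists are 0-based, so index 0 here plays the role of [1] on the slide, and Python's regular expression syntax is not identical to ColdFusion's.

```python
import re


def re_find_no_case(pattern, string, start=1):
    """Return {'pos': [...], 'len': [...]} with 1-based positions.

    Index 0 of each list describes the entire match; later entries
    describe each parenthesized subexpression. No match gives [0], [0].
    """
    match = re.compile(pattern, re.IGNORECASE).search(string, start - 1)
    if match is None:
        return {"pos": [0], "len": [0]}
    pos, length = [], []
    for group in range(match.re.groups + 1):
        begin, end = match.span(group)   # (-1, -1) if the group did not take part
        pos.append(begin + 1 if begin >= 0 else 0)
        length.append(end - begin if begin >= 0 else 0)
    return {"pos": pos, "len": length}


result = re_find_no_case(r"item=([0-9]+)", "ViewItem&ITEM=449570667 end")
print(result)
```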

Page 17

RegEx Basics

The following is a fast rundown of important characters in Regular Expressions
– In most cases, a character is equal to itself
– A \ will escape any special character
– A period (.) represents any one character
  • .at can mean bat, cat, rat, or anything that has a single character and ends with at.
– A pair of brackets denotes a set of characters (i.e. one of them can be used)
  • [01256] means any one of those digits
– A dash (-) within a set means “a range of”
  • [0-9] means any single digit from 0 through 9
– A caret (^) at the start of a set means “not in the set”
  • [^aeiou] means any character but a vowel

Page 18

RegEx Basics

– Parentheses are used to denote a compound expression OR a subexpression
  • (this) will return the position and length of the word “this”
– When used within a compound, a pipe (|) means either/or
  • (this|that) will return the position and length of the first occurrence of “this” or “that”
– A question mark (?) means that the previous character, set or compound may or may not exist but if it does, will exist 1 time
– A plus (+) means that the previous character, set or compound must exist 1 or more times
– An asterisk (*) means that the previous character, set or compound may exist 0 or more times
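
The special characters covered in the RegEx Basics rundown can be tried out with Python's re module (not ColdFusion's engine, though these basic characters behave the same way in both). The patterns and test strings are made up for illustration; each row records whether the whole string is expected to match.

```python
import re

checks = [
    (r".at", "cat", True),            # period: any one character
    (r"3\.5", "3x5", False),          # backslash escapes the period
    (r"[01256]", "7", False),         # set: one of the listed characters
    (r"[0-9]", "7", True),            # dash: a range inside a set
    (r"[^aeiou]", "e", False),        # caret: anything not in the set
    (r"(this|that)", "that", True),   # pipe inside a compound: either/or
    (r"colou?r", "color", True),      # question mark: zero or one time
    (r"[0-9]+", "", False),           # plus: one or more times
    (r"ab*", "a", True),              # asterisk: zero or more times
]

results = [bool(re.fullmatch(pattern, text)) for pattern, text, _ in checks]
for (pattern, text, _), matched in zip(checks, results):
    print(f"{pattern!r} against {text!r}: {matched}")
```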

Page 19

IA technique III - CF_EbayItem

<!--- CF_EbayItem - Module to get all items from ebay and return it --->

<!--- Required attributes --->
<CFPARAM name="Attributes.SearchItem">
<CFPARAM name="Attributes.ReturnArray" default="ReturnArray">

<cfhttp url="http://search-desc.ebay.com/search/search.dll?MfcISAPICommand=GetResult&ebaytag1=ebayreg&ht=1&query=#Attributes.SearchItem#&ebaytag1code=0&srchdesc=y&SortProperty=MetaNewSort" method="GET" resolveurl="true">

<!--- Position to start the first search from --->
<CFSET end=1>
<!--- Set local array for storage. We set all values to a local array rather than to the calling template to reduce the number of ‘calls’ between templates and improve performance. --->
<CFSET LocalArray=ArrayNew(1)>

Page 20

IA technique III - CF_EbayItem

<CFLOOP condition="1">
    <!--- Search the CFHTTP.FileContent for any link (A HREF=></A>) where two parts of the link will change.
    The Url variable item= will always contain a number. [0-9]+ will get 1 or more digits.
    The text in the body of the A tag will contain any characters, but never HTML. Using [^<]+ to search for anything other than an opening angle bracket will get us all the text.
    Note that a backslash is used before each period and question mark in the URL to ‘escape’ these characters and have them treated as normal characters rather than special RegEx characters. --->
    <CFSET Item=REFindNoCase('<a href="http://cgi\.ebay\.com/aw-cgi/eBayISAPI\.dll\?ViewItem&item=[0-9]+">[^<]+</a>', cfhttp.filecontent, end, 1)>

Page 21

IA technique III - CF_EbayItem

    <!--- If the value of Item.Len[1] is TRUE (i.e. not 0) then add the element to the array. Else break out of the loop --->
    <CFIF Item.len[1]>
        <!--- Add the search item length to its position. This will be used as the new position to start the search from in the next loop iteration. A simple +1 would work as well. --->
        <CFSET End=Item.pos[1]+Item.len[1]>
        <!--- Add item to a local array. Note that the return from a REFind()/REFindNoCase() function fits perfectly into a Mid() function. --->
        <CFSET ArrayAppend(LocalArray,Mid(cfhttp.filecontent, Item.pos[1], item.len[1]))>
    <cfelse>
        <cfbreak>
    </cfif>
</cfloop>
<!--- Set local array to calling template --->
<CFSET "caller.#Attributes.ReturnArray#"=LocalArray>
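
For comparison, the regular-expression loop can be sketched in Python (not CFML) against a made-up page. The pattern is the one from the slides; the function name extract_links_re and the sample HTML are assumptions for illustration. Note how the link with a non-numeric item value is skipped, which the plain substring search cannot do.

```python
import re

ITEM_LINK = re.compile(
    r'<a href="http://cgi\.ebay\.com/aw-cgi/eBayISAPI\.dll\?ViewItem&item=[0-9]+">[^<]+</a>',
    re.IGNORECASE,
)


def extract_links_re(page):
    """Collect every item link in page, one regex search per loop pass."""
    items = []
    position = 0
    while True:
        match = ITEM_LINK.search(page, position)
        if match is None:             # nothing left to find
            break
        items.append(match.group(0))  # the ENTIRE match
        position = match.end()        # start the next search after this match
    return items


# Made-up sample page; only the first link text comes from the slides.
sample = (
    '<a href="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=449570667">'
    "HEBREW AMULETS: By T Schrire</a>"
    '<a href="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=abc">Not numeric</a>'
    '<A HREF="http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=123">Sample item</A>'
)
links = extract_links_re(sample)
for link in links:
    print(link)
```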

Page 22

Extra Information

CFHTTP Headers - extra information returned by a CFHTTP (or any HTTP) call
– FILECONTENT - Text grabbed
– HEADER - Header info (including cookies)
– MIMETYPE - Return mime type
– RESPONSEHEADER - structure with all information except content
– STATUSCODE - HTTP return code

Page 23

Syndication (WDDX & Queries)

Can return structured information as a query
Better to use WDDX to send query encoded in a packet
Basis of Spectra syndication
Can pass binary files encoded with ToBase64() function
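
The Base64 step can be illustrated with a Python sketch (not CFML): binary data becomes plain text that can travel inside a text packet and is decoded back on the other side. The four bytes are a made-up stand-in for a file's contents, and WDDX packaging itself is not shown.

```python
import base64

# Made-up stand-in for the contents of a binary file.
binary_data = bytes([0, 255, 16, 32])

# Encode to ASCII text that is safe to embed in a text packet.
encoded = base64.b64encode(binary_data).decode("ascii")

# The receiving side reverses the step to recover the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == binary_data)
```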

Page 24

Conference Closing Slide