Website Auto Scraping with AutoIt and .NET HttpRequest

Thursday, June 9, 2022 AuTo Scraping Blackie Tsai [email protected]

Upload: chen-tien-tsai

Post on 12-Apr-2017


Page 1: Website Auto scraping with Autoit and .Net HttpRequest

May 3, 2023

AuTo Scraping
Blackie Tsai

[email protected]

Page 2: Website Auto scraping with Autoit and .Net HttpRequest

Agenda
• Background
• Behavior and System Analysis
• HLD

Page 3: Website Auto scraping with Autoit and .Net HttpRequest

Background

Page 4: Website Auto scraping with Autoit and .Net HttpRequest

Background
• User requirements
  • A desktop application for scraping Odds and OnlineList data from a rental site
  • 1,652 links in total, containing 165,200 records
  • Arrange the data format and export it to an Excel file
• Non-functional requirement
  • Avoid getting the account locked by request behavior that resembles a DDoS attack
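
The non-functional requirement above amounts to pacing the requests. A minimal sketch of one way to do that follows, written in Python for brevity even though the deck's stack is C# and .NET. The `Throttle` class, its parameters, and the 2-second interval are illustrative assumptions, not the author's implementation.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests.

    Hypothetical helper: the name and the interval are assumptions,
    not taken from the original slides.
    """

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record the time at which the caller is released.
        self._last = self._clock()
        return delay
```

Calling `throttle.wait()` before each of the 1,652 link requests with an assumed 2-second interval would spread the run over roughly 55 minutes (1,652 × 2 s = 3,304 s).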

Page 5: Website Auto scraping with Autoit and .Net HttpRequest

Behavior and System Analysis

Page 6: Website Auto scraping with Autoit and .Net HttpRequest

Behavior Analysis
• The website requires login first
• The login session stays alive while you idle on MainPage (the client posts a periodic sync request)
• After login to MainPage, each click opens a pop-up window to display the data
• Each data page displays 100 records according to the filter you give

Page 7: Website Auto scraping with Autoit and .Net HttpRequest

System Analysis
• Security
  • The website requires login to obtain the SessionId and StickyId used in requests
  • The website has a security mechanism that redirects invalid requests
  • A one-time token prevents users from requesting page data without permission after login to the Main page
  • All sub-pages (pop-up windows) can only be opened from the Main page
  • The website uses RESTful-like routing that includes the UserSession token
• Routing and Request Post
  • URL routing includes the Login Token
  • MainPage routing includes the BuildVersion
  • Requests to the Odds and OnlineList services need a Query key (passed from the Main window)
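
The routing points above suggest that every request URL has to be assembled from values captured at login. The sketch below shows the idea in Python (the deck itself uses C#). The slides do not give the real route layout, so the host, the path order, and the `key` parameter name are all hypothetical placeholders.

```python
from urllib.parse import quote, urlencode

# Hypothetical host; the slides do not name the real site.
BASE_URL = "https://rental.example"


def main_page_url(login_token: str, build_version: str) -> str:
    """Build a MainPage URL that embeds the login token and build version.

    The path layout is an assumption made for illustration only.
    """
    token = quote(login_token, safe="")
    build = quote(build_version, safe="")
    return f"{BASE_URL}/{token}/{build}/MainPage"


def service_url(login_token: str, service: str, query_key: str) -> str:
    """Build a URL for the Odds or OnlineList service.

    The query key is the value handed over by the Main window; the
    parameter name "key" is a placeholder, not the site's real name.
    """
    token = quote(login_token, safe="")
    name = quote(service, safe="")
    return f"{BASE_URL}/{token}/{name}?{urlencode({'key': query_key})}"
```

Percent-encoding each segment keeps tokens containing `/` or `+` from changing the route by accident.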

Page 8: Website Auto scraping with Autoit and .Net HttpRequest

Challenge
• Issue 1
  • Too many links to scrape with Selenium or a similar solution.
• Issue 2
  • Some data needs JavaScript to decrypt and regenerate it (RSA token, one-time token, etc.).
• Issue 3
  • Need to capture the response headers (Session and StickyId) to mock the requests that query the Odds and OnlineList services.
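
Issue 3 can be illustrated with a short sketch that pulls the session values out of response headers and replays them on later requests. It is written in Python rather than the deck's C#, and it assumes the values arrive as `Set-Cookie` headers named `SessionId` and `StickyId`; the slides do not confirm the exact header names or format.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable, Tuple


def extract_cookies(
    set_cookie_headers: Iterable[str],
    wanted: Tuple[str, ...] = ("SessionId", "StickyId"),  # assumed names
) -> Dict[str, str]:
    """Collect the wanted cookie values from raw Set-Cookie header lines."""
    found: Dict[str, str] = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parses "name=value; Path=/; HttpOnly"
        for name, morsel in jar.items():
            if name in wanted:
                found[name] = morsel.value
    return found


def cookie_header(cookies: Dict[str, str]) -> str:
    """Format captured cookies as a Cookie request-header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())
```

The string returned by `cookie_header` would be sent as the `Cookie` header of each mocked request, so the server sees the same session and sticky routing as the logged-in browser.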

Page 9: Website Auto scraping with Autoit and .Net HttpRequest

HLD

Page 10: Website Auto scraping with Autoit and .Net HttpRequest

Use of Technology
• C# and .NET Framework
• AutoIt (Download)
  • AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting.
  • Has a script editor for building scripts
  • Can execute shell scripts
  • Can compile scripts to .exe files
• AutoItX aka NAutoIt (Download)
  • Methods available to AutoIt BASIC, but not provided via AutoItX, are replaced by .NET counterparts.
  • AutoItX can be used with PowerShell, .NET, C, COM, COM interop and reg-free COM interfaces.

Page 11: Website Auto scraping with Autoit and .Net HttpRequest

HLD

Page 12: Website Auto scraping with Autoit and .Net HttpRequest

Use of Technology - AutoIt
• AutoIt Window Info

Page 13: Website Auto scraping with Autoit and .Net HttpRequest

Use of Technology - AutoIt
• SciTE Script Editor
  • Write scripts in an IDE with code hints and a Help guide

Page 14: Website Auto scraping with Autoit and .Net HttpRequest

Use of Technology - AutoIt
• Run Script
  • Execute an .au3 or .a3x file.
• Compile Script to .exe
  • Convert an .au3 script to an .exe or .a3x file

Page 15: Website Auto scraping with Autoit and .Net HttpRequest

Use of Technology - AutoItX
• Setup
  • Add a project reference to AutoItX3.Assembly.dll
  • Add AutoItX3.dll to the project and set it to copy to the output directory (CopyToOutput)

Page 16: Website Auto scraping with Autoit and .Net HttpRequest

Q & A

Page 17: Website Auto scraping with Autoit and .Net HttpRequest

11F., No.399, Ruiguang Rd., Neihu Dist., Taipei City 114, Taiwan TEL: +886 2 2798 8529 Fax: +886 2 2798 8531 Website : www.xuenn.com

THANK YOU!