EZproxy Stanza Deconstruction

Sébastien Nadeau

BIBLIOTHÈQUE DE L'UNIVERSITÉ LAVAL

March 31st 2016

Upload: others

Post on 07-Mar-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EZproxy Stanza Deconstruction - OCLC...add a new host and use it the same way it uses an explicit host in the stanza’s Host line. 9. In fact, if you delete the ezproxy.hst file and

BIBLIOTHÈQUE DE L'UNIVERSITÉ LAVALBIBLIOTHÈQUE DE L'UNIVERSITÉ LAVAL

Sébastien Nadeau

March 31st 2016

EZproxy Stanza Deconstruction

Page 2: EZproxy Stanza Deconstruction - OCLC...add a new host and use it the same way it uses an explicit host in the stanza’s Host line. 9. In fact, if you delete the ezproxy.hst file and

BIBLIOTHÈQUE DE L'UNIVERSITÉ LAVALBIBLIOTHÈQUE DE L'UNIVERSITÉ LAVAL 2

Anatomy of a complete URL, part 1

protocol://host:port/path?query

• A URL may contain only some of these parts:

• https://www.example.com

• www.example.com

• www.example.com:8080

• http://www.example.com/starting_point

• The host is also commonly called the domain.

• Not to be confused with EZproxy’s Host and Domain directives!
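As a rough illustration of the five URL parts listed above, here is a minimal Python sketch built on the standard library's urllib.parse. The helper name split_url and the choice to prepend http:// to bare hosts are assumptions made for this example; they are not part of EZproxy.

```python
from urllib.parse import urlsplit


def split_url(url, default_scheme="http"):
    """Split a URL into the five parts named on the slide."""
    # Bare hosts such as "www.example.com:8080" carry no protocol.
    # Prepending one (an assumption of this sketch) lets urlsplit
    # recognise the host and port instead of treating them as a path.
    if "://" not in url:
        url = f"{default_scheme}://{url}"
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not state a port
        "path": parts.path,
        "query": parts.query,
    }


# The four partial URLs from the slide.
for example in (
    "https://www.example.com",
    "www.example.com",
    "www.example.com:8080",
    "http://www.example.com/starting_point",
):
    print(example, "->", split_url(example))
```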


Anatomy of a complete URL, part 2

URLs are used in EZproxy URL and Host directives.

Only the host (or domain) part of a URL is used in a Domain directive. Sometimes, in order to be more inclusive, only the trailing part of the domain is used (for example, example.com rather than www.example.com).

URLs used in URL and Host directives are often called Starting Point URLs (SPUs).


Disambiguation part 1

1. URL and Host lines are equivalent as far as EZproxy is concerned.

From the EZproxy reference manual: The URL directive is an implicit Host directive, making it redundant to specify the same protocol/host/port in both a URL directive and a Host directive.

The Host directive also authorizes the specified protocol/host/port for use in Starting Point URLs, similar to the behavior of the URL directive.

2. Whatever you put after the host and the port in a URL or Host line does not matter. In other words, the path and query in a URL line do not matter.

3. For a Host line, when the protocol is omitted, http is the default. To make things clear, I recommend always specifying a protocol.

4. Domain lines do not give you starting point access to resources; you need a URL or Host line for that.

5. Domain lines let EZproxy add new hosts in that domain automagically if you click on a vendor link that goes to a different host in that domain. EZproxy does not care whether the Domain lines are in the same stanza as the URL or Host line that provided starting point access to the resource.
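Points 2 and 3 can be pictured as reducing each URL or Host value to a protocol/host/port triple. The Python sketch below is only a model of that idea, not EZproxy's actual code; the function name origin is hypothetical, and the default ports (80 for http, 443 for https) are the ones quoted from the manual later in this deck.

```python
from urllib.parse import urlsplit

# Default ports as stated in the manual excerpt quoted later in the deck.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(value):
    """Return the (protocol, host, port) a URL or Host value stands for."""
    # Point 3: when the protocol is omitted, http is assumed.
    if "://" not in value:
        value = "http://" + value
    parts = urlsplit(value)
    # Point 2: path and query are simply dropped here.
    # Only http and https are handled in this sketch.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


# A long path and a query change nothing about what is authorized.
print(origin("https://www.example.com/some/path?x=1") == origin("https://www.example.com"))
# A bare host is treated as http on port 80.
print(origin("www.example.com"))
```

Under this model, two lines are equivalent exactly when they reduce to the same triple, which is why specifying the protocol explicitly avoids surprises.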


Disambiguation part 2

7. Domain lines are used to instruct EZproxy about what to rewrite in web pages fetched from vendor websites.

8. EZproxy, under the hood, knows nothing about domains per se; it simply knows to add a new host and use it the same way it uses an explicit host in the stanza’s Host line.

9. In fact, if you delete the ezproxy.hst file and restart, EZproxy will simply start adding new hosts as people search, whether they are explicitly in the stanza or not.

A Starting Point URL (SPU) is a URL, in your catalog or website, that links to the right place on the vendor website. It often contains a path part and sometimes a query part. It is prefixed by your EZproxy URL, like this:

http://ezproxy.institution.edu/login?url=http://www.example.com
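A small Python sketch of how such a link could be assembled. The function name starting_point_url is hypothetical, and the target is appended as-is because that is what the example above shows; whether a given installation expects the target to be encoded is not covered here.

```python
def starting_point_url(ezproxy_base, target):
    """Prefix a vendor URL with the EZproxy login URL, as in the slide's example."""
    # The target is appended unencoded, mirroring the example in the slide.
    return f"{ezproxy_base.rstrip('/')}/login?url={target}"


print(starting_point_url("http://ezproxy.institution.edu", "http://www.example.com"))
```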


Disambiguation part 3

10. DJ somedomain.com actually acts like a wildcard *.somedomain.com, but only when rewriting pages.

11. It does not act as a wildcard when authorizing starting point URLs; that is the job of H and HJ lines, which do not allow wildcard domains.

12. So don’t get the false impression that adding a D or DJ line is enough to instruct EZproxy to accept all URLs ending with the given domain as SPUs.

13. That’s why every possible starting point URL must be specified, and it is the main reason why the sagepub and oxfordjournals stanzas are so huge.

More about all this in the coming slides.
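The contrast in points 10 to 13 can be sketched in Python as two different tests on a host name. This is only a model under stated assumptions: both function names are hypothetical, protocol and port are ignored for brevity, and treating the bare domain itself as a match for the wildcard is an assumption of the sketch rather than something the slides state.

```python
def host_in_domain(host, domain):
    """Wildcard-style test, as a D or DJ line is described for page rewriting."""
    host, domain = host.lower(), domain.lower()
    # Assumption: the bare domain counts as a match, as well as any subdomain.
    return host == domain or host.endswith("." + domain)


def allowed_as_starting_point(host, listed_hosts):
    """Exact test, as H and HJ lines are described: no wildcards."""
    return host.lower() in {h.lower() for h in listed_hosts}


# Hypothetical stanza: one H line for www.somedomain.com, one DJ line for somedomain.com.
listed = ["www.somedomain.com"]
for host in ("www.somedomain.com", "journals.somedomain.com"):
    print(host,
          "rewritten:", host_in_domain(host, "somedomain.com"),
          "starting point:", allowed_as_starting_point(host, listed))
```

In this model journals.somedomain.com is covered for rewriting but still refused as a starting point, which is the situation that forces long lists of H or HJ lines.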


Now, let’s deconstruct some Stanzas.

Example #1: Oxford Journals

Title Oxford Journals

URL https://www.oxfordjournals.org

Domain oxfordjournals.org

DJ oxfordjournals.org

OK, looks good, but one line is useless. Which one?

Let’s just say it’s a well-known case…

http://www.oxfordjournals.org/en/help/tech-info/ezproxyconfig.txt

The useless line: Domain oxfordjournals.org


Why?

EZproxy Reference Manual [Draft], page 20:

If a database stanza contains Host, Domain, and DomainJavaScript directives that correspond to a specific protocol/host/port, DomainJavaScript takes priority and enables additional processing.

Domain oxfordjournals.org

DJ oxfordjournals.org (takes priority)


Example #2: IEEE

Title IEEE Xplore

URL https://ieeexplore.ieee.org/xpl/bkBrowse.jsp

HJ ieeexplore.ieee.org

HJ m.ieeexplore.ieee.org

HJ opac.ieeecomputersociety.org

HJ search.ieeexplore.ieee.org

HJ www.computer.org

HJ www.ieeexplore.ieee.org

HJ www.ieee.org

HJ xplorestaging.ieee.org

HJ https://ieeexplore.ieee.org

DJ computer.org

DJ ieee.org

DJ ieeecomputersociety.org

This is not a bad stanza. However, there’s a useless line in there. Can you spot it?

The useless line: HJ https://ieeexplore.ieee.org


Why?

• URL https://ieeexplore.ieee.org/xpl/bkBrowse.jsp already authorizes all other SPUs that start with https://ieeexplore.ieee.org. The HJ https://ieeexplore.ieee.org line is thus redundant.

• Then why is this HJ needed: HJ ieeexplore.ieee.org?

• At first glance, it seems to be already taken care of by the URL line. But no.

• URL authorizes only the specified protocol (in this case https), so http://ieeexplore.ieee.org will not be authorized as an SPU.

• The EZproxy documentation is clear on this point: Host specifies a specific protocol/host/port which should be rewritten by EZproxy. If http:// and https:// are both omitted, then EZproxy assumes that the protocol is http. If port is omitted, the default is 80 for http or 443 for https.

• This means that http is not covered by the URL line of this stanza, but is (implicitly) covered by the first HJ line. All the other HJ lines also implicitly cover only http. To allow https for all hosts, we would need to add HJ lines with explicit https for each host listed.
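The reasoning above can be turned into a small checker. The Python sketch below is a model built only on the rules quoted in this deck (URL, Host and HostJavaScript lines each authorize one protocol/host/port; missing protocol means http; default ports 80 and 443). The names origin and redundant_lines are hypothetical, and the sketch deliberately ignores any difference in JavaScript processing between the directives.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Directives that authorize a starting point, with the short forms used in the slides.
STARTING_POINT_DIRECTIVES = {"URL", "U", "HOST", "H", "HOSTJAVASCRIPT", "HJ"}


def origin(value):
    """Reduce a URL or Host value to (protocol, host, port)."""
    if "://" not in value:
        value = "http://" + value  # omitted protocol means http
    parts = urlsplit(value)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme])


def redundant_lines(stanza):
    """List lines whose origin was already authorized by an earlier line."""
    seen = set()
    redundant = []
    for line in stanza.strip().splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].upper() not in STARTING_POINT_DIRECTIVES:
            continue  # Title, D, DJ and other lines do not authorize starting points
        key = origin(fields[1])
        if key in seen:
            redundant.append(line.strip())
        seen.add(key)
    return redundant


IEEE = """
Title IEEE Xplore
URL https://ieeexplore.ieee.org/xpl/bkBrowse.jsp
HJ ieeexplore.ieee.org
HJ m.ieeexplore.ieee.org
HJ opac.ieeecomputersociety.org
HJ search.ieeexplore.ieee.org
HJ www.computer.org
HJ www.ieeexplore.ieee.org
HJ www.ieee.org
HJ xplorestaging.ieee.org
HJ https://ieeexplore.ieee.org
DJ computer.org
DJ ieee.org
DJ ieeecomputersociety.org
"""

print(redundant_lines(IEEE))
```

Run on the IEEE stanza, the model flags only HJ https://ieeexplore.ieee.org, while HJ ieeexplore.ieee.org survives because it stands for http rather than https.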


The path part

• The purpose of URL is to specify the main SPU for a database.

• IEEE’s URL line could be stripped of the path part (/xpl/bkBrowse.jsp).

• URL https://ieeexplore.ieee.org

• The only practical use I see for the path part of the URL line is menu.htm.

• This page is displayed when a user logs in without specifying a resource.

• i.e. http://ezproxy.institution.edu/login

• If anyone knows of other use cases for the path part of URL, please share!


But it’s just a single line, nothing to really worry about… no?

• Wait for example #3…


Example #3: Lynda.com

ProxyHostnameEdit ^lynda.com$ lyndacom

ProxyHostnameEdit .lynda.com$ lyndacom

Option NoHttpsHyphens

Title Lynda.com

URL http://iplogin.lynda.com

URL https://iplogin.lynda.com

URL https://www.lynda.com/lyndaCampus/LoginOrCreateProfile.aspx

URL http://lynda.com/page/ajaxedfooter

URL https://lynda.com/page/ajaxedfooter

URL http://ldcemail.lynda.com/gT0N0yu0600hFS0gD000CHq

URL http://ldcemail.lynda.com/DDS000B0SFN00H0huy0q0g6

URL http://ldcemail.lynda.com/K0060DF00uHR0AgNq0hy0S0

URL http://ldcemail.lynda.com/Q0HS0NF0DP0u60yhg00y0q0

URL http://ldcemail.lynda.com/O0H0q06S0000yDuO0x0ghNF

URL http://ldcemail.lynda.com/K0060DF00uHN0wgNq0hy0S0

URL http://ldcemail.lynda.com/m000yv00006ghuqNFHM0DS0

Host iplogin.lynda.com

Host www.lynda.com

DJ lynda.com

NeverProxy cdn.lynda.com

Option HttpsHyphens


Christian Wimmer, answering on the list:

« Are you sure that OCLC provided that stanza? This looks more like something a vendor would produce. From my understanding there should only be one URL line per stanza, the rest of them should be Host lines. This should make the 10 or so ldcemail.lynda.com lines kinda redundant. Same goes for the extra H iplogin.lynda.com. »

ProxyHostnameEdit ^lynda.com$ lyndacom

ProxyHostnameEdit .lynda.com$ lyndacom

Option NoHttpsHyphens

Title Lynda.com

U http://iplogin.lynda.com

H https://iplogin.lynda.com

H http://www.lynda.com

H https://www.lynda.com

H http://lynda.com

H https://lynda.com

H http://ldcemail.lynda.com

DJ lynda.com

NeverProxy cdn.lynda.com

Option HttpsHyphens


Or as published now

https://www.oclc.org/support/services/ezproxy/documentation/db/lynda.en.html

ProxyHostnameEdit ^lynda.com$ lyndacom

ProxyHostnameEdit .lynda.com$ lyndacom

Option NoHttpsHyphens

Title Lynda.com

URL http://iplogin.lynda.com

HJ iplogin.lynda.com

HJ www.lynda.com

HJ lynda.com

DJ lynda.com

Option HttpsHyphens

However, this version doesn’t support https, and the first HJ line is useless.


To Javascript or Not To Javascript? (1/3)

• I wondered what the difference is, strictly in terms of Javascript processing, between DomainJavaScript and HostJavaScript.

• Fortunately, Susan Musser was able to get a definitive answer for me.

• A DJ line will turn on Javascript processing for any URL in the stanza that ends in the given domain.

• So for example, the following stanza:

Title Some Database

URL http://www.somedatabase.com

DJ somedatabase.com

• Would turn on Javascript processing for an SPU with http://www.somedatabase.com as a target URL, because it matches the domain somedatabase.com.


To Javascript or Not To Javascript? (2/3)

• It would also enable Javascript processing for any URL clicked once the proxied session has begun that ends in the domain somedatabase.com, and this could include both http and https URLs.

• An HJ line will turn on Javascript processing for URLs that match the given origin. So for example, the following stanza:

Title Some Database

URL http://www.somedatabase.com

HJ www.somedatabase.com

D somedatabase.com

• Would turn on Javascript processing for any of the following URLs:

http://www.somedatabase.com

http://www.somedatabase.com/research


To Javascript or Not To Javascript? (3/3)

• But not the following:

https://www.somedatabase.com (because the URL is https, not http)

http://www.somedatabase.com:8080 (because the port is different)

http://www.research.somedatabase.com (because the URL origin does not match)

http://somedatabase.com (again because the URL origin does not match)

• Each of these is excluded for the reason given, and because the final line of the stanza is a D and not a DJ.

• Conclusion:

» When Javascript processing is needed, which is true for 99% or more of the resources, DJ is the way to go.

» When DJ is activated, Javascript processing will happen, even for resources in URLs specified by H lines.
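The DJ versus HJ behaviour described over these three slides can be summarised in a short Python model. It is a sketch of the explanation given here, not of EZproxy internals: the function names are hypothetical, and counting the bare domain as a DJ match is an assumption of the sketch.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(value):
    """Reduce a URL or HJ value to (protocol, host, port); http when omitted."""
    if "://" not in value:
        value = "http://" + value
    parts = urlsplit(value)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme])


def javascript_processing(url, hj_values=(), dj_domains=()):
    """True if, in this model, Javascript processing is turned on for url."""
    scheme, host, port = origin(url)
    # DJ: any host ending in the domain, whatever the protocol.
    # Assumption: the bare domain itself also counts as a match.
    if any(host == d or host.endswith("." + d) for d in dj_domains):
        return True
    # HJ: the exact protocol/host/port must match.
    return any(origin(v) == (scheme, host, port) for v in hj_values)


# Stanza from slide 2/3: HJ www.somedatabase.com plus a plain D line (no DJ).
hj = ["www.somedatabase.com"]
for url in (
    "http://www.somedatabase.com/research",
    "https://www.somedatabase.com",
    "http://www.somedatabase.com:8080",
    "http://www.research.somedatabase.com",
    "http://somedatabase.com",
):
    print(url, javascript_processing(url, hj_values=hj))
```

Passing dj_domains=["somedatabase.com"] instead models the first stanza, where both http and https URLs on www.somedatabase.com get Javascript processing.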


Questions?

• Please ask your questions in the chat

• Easier for me!


Thank you!

Credit to all those who inspired me with their posts on the mailing list and forum, from whom I borrowed a lot of the material in this presentation:

• Andrew Anderson

• Dom Benson

• Paul Butler

• Phil Elms

• Bruno Ménette

• Susan Musser

• Ian Richmond

• Mandi Schwarz

• Christian Wimmer

• And apologies to all those I forgot…