BIBLIOTHÈQUE DE L'UNIVERSITÉ LAVAL
Sébastien Nadeau
March 31st 2016
EZproxy Stanza Deconstruction
Anatomy of a complete URL, part 1
protocol://host:port/path?query
• A URL may contain only some of these parts:
• https://www.example.com
• www.example.com
• www.example.com:8080
• http://www.example.com/starting_point
• host is also commonly called: domain
• Not to be confused with EZproxy’s Host and Domain directives!
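The parts named above can be pulled out of a complete URL with Python's standard library. This is only an illustrative sketch: the `url_parts` helper and the `?id=42` query are made up for the example and have nothing to do with EZproxy itself.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a complete URL into the parts named on the slide."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not state a port
        "path": parts.path,
        "query": parts.query,
    }


# Hypothetical URL that contains every part.
example = url_parts("https://www.example.com:8080/starting_point?id=42")
print(example)
```

Note that `urlsplit` expects the protocol to be present; a bare `www.example.com` is not split into a host by this helper.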
Anatomy of a complete URL, part 2
URLs are used in EZproxy URL and Host directives.
Only the host or domain part of a URL is used in a Domain
directive. Sometimes, in order to be more inclusive, only a subset
of a domain is used.
URLs used in URL and Host directives are often called Starting
Point URLs (SPU).
Disambiguation part 1
1. URL and Host lines are equivalent to EZproxy.
From the EZproxy reference manual: The URL directive is an implicit Host directive, making it redundant to
specify the same protocol/host/port in both a URL directive and a Host directive.
The Host directive also authorizes the specified protocol/host/port for use in Starting Point URLs, similar
to the behavior of the URL directive.
2. Nothing you put after the host and port in a URL or Host line matters: the path
and query are ignored.
3. In a Host line, http is the default when the protocol is omitted. To keep things clear, I
recommend always specifying the protocol.
4. Domain lines do not grant starting point access to resources; you need a URL or
Host line for that.
5. Domain lines let EZproxy add new hosts in that domain automagically when you click on
a vendor link that goes to a different host in that domain. EZproxy does not care whether
the Domain lines are in the same stanza as the URL or Host line that provided
starting point access to the resource.
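Rules 2 and 3 can be sketched as a small normalization step. The `origin` helper below is hypothetical and only mirrors the rules on this slide (path and query ignored, http assumed when no protocol is given, port 80 or 443 by default); it is not EZproxy's actual parser.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(value: str) -> tuple:
    """Reduce the argument of a URL or Host line to (protocol, host, port)."""
    if "://" not in value:
        value = "http://" + value  # rule 3: http is the default protocol
    parts = urlsplit(value)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # Rule 2: the path and query are simply dropped.
    return (parts.scheme, parts.hostname, port)


print(origin("www.example.com"))
print(origin("https://www.example.com/starting_point?x=1"))
```

Under these assumptions, `URL http://www.example.com/a` and `Host www.example.com` reduce to the same triple, which is why specifying both is redundant.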
Disambiguation part 2
7. Domain lines are used to instruct EZproxy about what to rewrite in web pages
fetched from vendor websites.
8. EZproxy, under the hood, knows nothing about domains per se; it simply knows to
add a new host and use it the same way it uses an explicit host in the stanza’s Host
line.
9. In fact, if you delete the ezproxy.hst file and restart, EZproxy will simply start adding
new hosts as people search, whether they are explicitly in the stanza or not.
A Starting Point URL (SPU) is a URL, in your catalog or website, that links to the
right place in the vendor website. It often contains a path part and sometimes a
query part. It is prefixed by your EZproxy URL, like this:
http://ezproxy.institution.edu/login?url=http://www.example.com
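Building such a link is plain string concatenation. A minimal sketch, assuming the target is appended as-is exactly like in the example above; the `starting_point_url` helper is hypothetical, and whether a given target needs any encoding is not covered here.

```python
def starting_point_url(ezproxy_base: str, target: str) -> str:
    """Prefix a vendor URL with the EZproxy login URL."""
    return f"{ezproxy_base.rstrip('/')}/login?url={target}"


spu = starting_point_url("http://ezproxy.institution.edu", "http://www.example.com")
print(spu)
```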
Disambiguation part 3
10. DJ somedomain.com actually acts like a wildcard *.somedomain.com, but only
when rewriting pages.
11. It does not act that way when allowing starting point URLs, which is the job of H and HJ,
and those do not allow wildcard domains.
12. So don’t get the false impression that adding a D or DJ line is enough to instruct
EZproxy to accept all URLs ending with the given domain as SPUs.
13. That’s why every possible starting point URL must be specified, and it is the main reason
why the sagepub and oxfordjournals stanzas are so huge.
More about all this in the coming slides.
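The contrast between points 10 and 11 can be sketched as two different checks. Both helpers are hypothetical illustrations of the behavior described on this slide, not EZproxy code. Ports are left out for brevity, and treating the bare domain itself as a match is an assumption.

```python
from urllib.parse import urlsplit


def in_domain(host: str, domain: str) -> bool:
    """Wildcard-style test used when rewriting pages (D and DJ lines)."""
    # Assumption: the bare domain counts as a match too.
    return host == domain or host.endswith("." + domain)


def spu_authorized(url: str, allowed: set) -> bool:
    """Exact test used for starting point URLs (URL, H and HJ lines)."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname) in allowed


# Hypothetical stanza: "H http://www.somedomain.com" plus "DJ somedomain.com"
allowed = {("http", "www.somedomain.com")}
dj_domain = "somedomain.com"

print(in_domain("journals.somedomain.com", dj_domain))                # rewritten in pages
print(spu_authorized("http://journals.somedomain.com/toc", allowed))  # not accepted as an SPU
```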
Now, let’s deconstruct some Stanzas.
Example #1: Oxford Journals
Title Oxford Journals
URL https://www.oxfordjournals.org
Domain oxfordjournals.org
DJ oxfordjournals.org
OK, looks good, but one line is useless. Which one?
Let’s just say it’s a well known case…
http://www.oxfordjournals.org/en/help/tech-info/ezproxyconfig.txt
Domain oxfordjournals.org
Why?
EZproxy Reference Manual [Draft], page 20:
If a database stanza contains Host, Domain, and
DomainJavaScript directives that correspond to a specific
protocol/host/port, DomainJavaScript takes priority and
enables additional processing.
Domain oxfordjournals.org
DJ oxfordjournals.org (takes priority)
Example #2: IEEE
Title IEEE Xplore
URL https://ieeexplore.ieee.org/xpl/bkBrowse.jsp
HJ ieeexplore.ieee.org
HJ m.ieeexplore.ieee.org
HJ opac.ieeecomputersociety.org
HJ search.ieeexplore.ieee.org
HJ www.computer.org
HJ www.ieeexplore.ieee.org
HJ www.ieee.org
HJ xplorestaging.ieee.org
HJ https://ieeexplore.ieee.org
DJ computer.org
DJ ieee.org
DJ ieeecomputersociety.org
This is not a bad stanza. However, there’s a useless line in there. Can you spot it?
HJ https://ieeexplore.ieee.org
Why?
• URL https://ieeexplore.ieee.org/xpl/bkBrowse.jsp
already authorizes every other SPU that starts with https://ieeexplore.ieee.org. The
HJ https://ieeexplore.ieee.org line is thus redundant.
• Then why is this HJ needed: HJ ieeexplore.ieee.org?
• It seems to be taken care of by the URL line already. But no.
• URL allows only the specified protocol (in this case https), so
http://ieeexplore.ieee.org would not be authorized as an SPU.
• The EZproxy documentation is clear on this point:
Host specifies a specific protocol/host/port which should be rewritten by EZproxy. If http:// and https:// are
both omitted, then EZproxy assumes that the protocol is http. If port is omitted, the default is 80 for http or
443 for https.
• This means that http is not covered by the URL line of this stanza, but is (implicitly)
covered by the first HJ line. All the other HJ lines also implicitly cover only http. To
allow https for all hosts, we would need to add HJ lines with an explicit https
for each host listed.
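The http/https coverage described above can be checked mechanically. The sketch below is a hypothetical helper, not an EZproxy tool: it reads the IEEE stanza from this example, applies the "http when the protocol is omitted" rule, and lists the hosts that are covered for http only.

```python
from urllib.parse import urlsplit

STANZA = """\
Title IEEE Xplore
URL https://ieeexplore.ieee.org/xpl/bkBrowse.jsp
HJ ieeexplore.ieee.org
HJ m.ieeexplore.ieee.org
HJ opac.ieeecomputersociety.org
HJ search.ieeexplore.ieee.org
HJ www.computer.org
HJ www.ieeexplore.ieee.org
HJ www.ieee.org
HJ xplorestaging.ieee.org
HJ https://ieeexplore.ieee.org
DJ computer.org
DJ ieee.org
DJ ieeecomputersociety.org
"""

STARTING_POINT_DIRECTIVES = {"URL", "U", "Host", "H", "HostJavaScript", "HJ"}


def covered(stanza: str) -> dict:
    """Map each host to the set of protocols its URL/Host lines authorize."""
    result = {}
    for line in stanza.splitlines():
        directive, _, value = line.strip().partition(" ")
        if directive not in STARTING_POINT_DIRECTIVES:
            continue  # Title, D and DJ lines do not authorize starting points
        if "://" not in value:
            value = "http://" + value  # omitted protocol means http
        parts = urlsplit(value)
        result.setdefault(parts.hostname, set()).add(parts.scheme)
    return result


http_only = sorted(h for h, protocols in covered(STANZA).items() if "https" not in protocols)
print(http_only)
```

Only ieeexplore.ieee.org ends up with both protocols; every other listed host would need an explicit https line.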
The path part
• The purpose of URL is to specify the main SPU for a database.
• IEEE’s URL line could be stripped of the path part (/xpl/bkBrowse.jsp).
• URL https://ieeexplore.ieee.org
• The only practical use I see for the path part of the URL is for menu.htm.
• This page is displayed when a user is logging in without specifying a resource.
• i.e. http://ezproxy.institution.edu/login
• If anyone knows of other use cases for the path part of the URL line, please share!
But it’s just a single line, nothing to really worry about… no?
• Wait for example #3…
Example #3: Lynda.com
ProxyHostnameEdit ^lynda.com$ lyndacom
ProxyHostnameEdit .lynda.com$ lyndacom
Option NoHttpsHyphens
Title Lynda.com
URL http://iplogin.lynda.com
URL https://iplogin.lynda.com
URL https://www.lynda.com/lyndaCampus/LoginOrCreateProfile.aspx
URL http://lynda.com/page/ajaxedfooter
URL https://lynda.com/page/ajaxedfooter
URL http://ldcemail.lynda.com/gT0N0yu0600hFS0gD000CHq
URL http://ldcemail.lynda.com/DDS000B0SFN00H0huy0q0g6
URL http://ldcemail.lynda.com/K0060DF00uHR0AgNq0hy0S0
URL http://ldcemail.lynda.com/Q0HS0NF0DP0u60yhg00y0q0
URL http://ldcemail.lynda.com/O0H0q06S0000yDuO0x0ghNF
URL http://ldcemail.lynda.com/K0060DF00uHN0wgNq0hy0S0
URL http://ldcemail.lynda.com/m000yv00006ghuqNFHM0DS0
Host iplogin.lynda.com
Host www.lynda.com
DJ lynda.com
NeverProxy cdn.lynda.com
Option HttpsHyphens
Christian Wimmer, answering on the list:
« are you shure that OCLC provided that Stanza? This looks more like something a
vendor would produce. From my understanding there should only be one URL line per
stanza, the rest of them should be host lines. This should make the 10 or so
ldcemail.lynda.com kinda redundant. Same goes for the extra H iplogin.lynda.com. »
ProxyHostnameEdit ^lynda.com$ lyndacom
ProxyHostnameEdit .lynda.com$ lyndacom
Option NoHttpsHyphens
Title Lynda.com
U http://iplogin.lynda.com
H https://iplogin.lynda.com
H http://www.lynda.com
H https://www.lynda.com
H http://lynda.com
H https://lynda.com
H http://ldcemail.lynda.com
DJ lynda.com
NeverProxy cdn.lynda.com
Option HttpsHyphens
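One way to arrive at a compact list like the one above is to reduce every URL and Host line of the original stanza to its protocol and host, then keep only the distinct pairs. The sketch below is a hypothetical helper, not an OCLC tool; only two of the seven ldcemail lines are reproduced, for brevity.

```python
from urllib.parse import urlsplit

# URL and Host lines from the original Lynda.com stanza
# (two of the seven ldcemail lines shown).
ORIGINAL_LINES = [
    "URL http://iplogin.lynda.com",
    "URL https://iplogin.lynda.com",
    "URL https://www.lynda.com/lyndaCampus/LoginOrCreateProfile.aspx",
    "URL http://lynda.com/page/ajaxedfooter",
    "URL https://lynda.com/page/ajaxedfooter",
    "URL http://ldcemail.lynda.com/gT0N0yu0600hFS0gD000CHq",
    "URL http://ldcemail.lynda.com/DDS000B0SFN00H0huy0q0g6",
    "Host iplogin.lynda.com",
    "Host www.lynda.com",
]


def distinct_origins(lines: list) -> list:
    """Return each protocol://host once, in order of first appearance."""
    seen = []
    for line in lines:
        _, _, value = line.partition(" ")
        if "://" not in value:
            value = "http://" + value  # omitted protocol means http
        parts = urlsplit(value)
        pair = f"{parts.scheme}://{parts.hostname}"
        if pair not in seen:
            seen.append(pair)
    return seen


origins = distinct_origins(ORIGINAL_LINES)
for pair in origins:
    print(pair)
```

The distinct pairs are the same seven that appear in the U and H lines of the rewritten stanza.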
Or as published now
https://www.oclc.org/support/services/ezproxy/documentation/db/lynda.en.html
ProxyHostnameEdit ^lynda.com$ lyndacom
ProxyHostnameEdit .lynda.com$ lyndacom
Option NoHttpsHyphens
Title Lynda.com
URL http://iplogin.lynda.com
HJ iplogin.lynda.com
HJ www.lynda.com
HJ lynda.com
DJ lynda.com
Option HttpsHyphens
However, this version doesn’t support https, and the first HJ line is useless because the URL line already covers http://iplogin.lynda.com.
To JavaScript or Not To JavaScript? (1/3)
• I wondered what the difference is, strictly in terms of JavaScript processing,
between DomainJavaScript and HostJavaScript.
• Fortunately, Susan Musser was able to get a definitive answer for me.
• A DJ line will turn on JavaScript processing for any URL in the stanza that ends in the
given domain.
• So for example, the following stanza:
Title Some Database
URL http://www.somedatabase.com
DJ somedatabase.com
• Would turn on JavaScript processing for an SPU with http://www.somedatabase.com
as a target URL because it matches the domain somedatabase.com.
To JavaScript or Not To JavaScript? (2/3)
• It would also enable JavaScript processing for any URL clicked once the proxied
session has begun that ends in the domain somedatabase.com, and this could include
both http and https URLs.
• An HJ line will turn on JavaScript processing for URLs that match the given
origin. So for example, the following stanza:
Title Some Database
URL http://www.somedatabase.com
HJ www.somedatabase.com
D somedatabase.com
• Would turn on JavaScript processing for any of the following URLs:
http://www.somedatabase.com
http://www.somedatabase.com/research
To JavaScript or Not To JavaScript? (3/3)
• But not for the following:
https://www.somedatabase.com (because the URL is https, not http)
http://www.somedatabase.com:8080 (because the port is different)
http://www.research.somedatabase.com (because the URL origin does not match)
http://somedatabase.com (again because the URL origin does not match)
• JavaScript processing stays off for these URLs for each of the reasons given, and because
the final line is a D and not a DJ.
• Conclusion:
» When JavaScript processing is needed, which is true for 99% or more of the resources, DJ is
the way to go.
» When DJ is activated, JavaScript processing will happen, even for resources in URLs
specified by H lines.
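The HJ and DJ matching rules from these three slides can be condensed into one small function. This is a sketch under stated assumptions, not EZproxy's implementation: `js_processing` and `origin` are hypothetical helpers, HJ is modeled as an exact protocol/host/port match, and DJ as "host ends in the domain" with protocol and port ignored.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Reduce a URL or HJ argument to (protocol, host, port)."""
    if "://" not in url:
        url = "http://" + url  # omitted protocol means http
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def js_processing(url: str, hj=(), dj=()) -> bool:
    """True if an HJ or DJ line would turn on JavaScript processing for url."""
    target = origin(url)
    host = target[1]
    if any(host == d or host.endswith("." + d) for d in dj):
        return True  # DJ: any host ending in the domain, http or https
    return any(origin(h) == target for h in hj)  # HJ: exact origin only


# Stanza from slide 2/3: HJ www.somedatabase.com with a plain D line.
hj_lines = ["www.somedatabase.com"]
print(js_processing("http://www.somedatabase.com/research", hj=hj_lines))
print(js_processing("https://www.somedatabase.com", hj=hj_lines))

# Stanza from slide 1/3: DJ somedatabase.com.
print(js_processing("https://www.somedatabase.com", dj=["somedatabase.com"]))
```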
Questions?
• Please ask your questions in the chat
• Easier for me!
Thank you!
Credit to all those who inspired me with their posts on the mailing list and forum, from whom I borrowed a lot of the material in this presentation:
• Andrew Anderson
• Dom Benson
• Paul Butler
• Phil Elms
• Bruno Ménette
• Susan Musser
• Ian Richmond
• Mandi Schwarz
• Christian Wimmer
• And apologies to all those I forgot…