Introduction to data and text mining - Jisc Digifest 2016

Introduction to Data and Text mining Catherine Grout

Upload: jisc

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Introduction to data and text mining - Jisc Digifest 2016

Introduction to Data and Text mining

Catherine Grout

Page 2: Introduction to data and text mining - Jisc Digifest 2016


https://en.wikipedia.org/wiki/Text_mining

2/03/2016

Page 3: Introduction to data and text mining - Jisc Digifest 2016

What is data and text mining?

» Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text

» High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning

» Text mining usually involves the process of structuring the input text … deriving patterns within the structured data, and finally evaluation and interpretation of the output

» 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness

Ref : http://bit.ly/_jisc_textmining
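
The three stages named above (structuring the input text, deriving patterns, interpreting the output) can be illustrated with a minimal Python sketch. It is only one possible reading of those stages, not the method used in the Jisc report: the sample sentences are invented, and TF-IDF weighting is swapped in here as a simple, well-known stand-in for "statistical pattern learning".

```python
import math
import re
from collections import Counter

# Tiny illustrative corpus (invented for this sketch, not from the report).
DOCS = [
    "Text mining derives patterns from text.",
    "Data mining finds patterns in large data sets.",
    "Researchers use text mining to explore the literature.",
]


def tokenize(text):
    """Structuring step: lowercase the raw text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Pattern step: return one {term: score} dict per document.

    Score = (term frequency in the document) * log(N / document frequency),
    so a term that appears in every document scores 0.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k=3):
    """Interpretation step: the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    for doc, doc_scores in zip(DOCS, tf_idf(DOCS)):
        print(doc, "->", top_terms(doc_scores))
```

Terms shared by every document (here "mining") score zero, which is a crude proxy for the "novelty" and "relevance" criteria mentioned above; real text mining tools use far richer models.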

Text mining

2/03/2016 Introduction to Data and Text mining

Page 4: Introduction to data and text mining - Jisc Digifest 2016

What is data and text mining?

» Data mining is an imprecise term, but can mean anything from:

› Large-scale data analysis within science, e.g. outputs of the Hubble telescope or the CERN Large Hadron Collider

› Analysing census data for socio-economic trends (medium scale, with a finite amount of data)

› Mining connected small objects/collections of research data to find new insight, e.g. bringing together various versions of the Mona Lisa and using data mining to analyse their underlying structure

Ref : http://bit.ly/_jisc_textmining

Data mining


Page 5: Introduction to data and text mining - Jisc Digifest 2016


Page 6: Introduction to data and text mining - Jisc Digifest 2016

What is its value for research and education?

» 2012 – Jisc published a key report “Value and benefits of text mining” http://bit.ly/Jisc_textmining

» Took a case study approach and also undertook an economic analysis of the benefits (…biomedicine)

» Wider, at-scale benefits were harder to come by, owing to legal and technical limitations inhibiting systematic use

» Since then new benefits have emerged


Page 7: Introduction to data and text mining - Jisc Digifest 2016


What types of benefits?

» Finding research insights that were not possible through other techniques

» Bringing together texts/data across different disciplines and finding new insights

» “Text mining offers a way of helping researchers to make sense of and leverage value from the vast sea of electronic resources, which is continually expanding.”

» “…potential to increase the research base available to business and society and to enable business and others to use the research base more effectively”

Health benefits of outdoor education

https://en.wikipedia.org/wiki/Outdoor_education

Page 8: Introduction to data and text mining - Jisc Digifest 2016


Innovative research in Humanities and Social sciences

» Digging into Data challenge: http://diggingintodata.org

» International initiative, now in its 4th funding round, e.g.:

› Trees and Tweets: http://bit.ly/treesandtweets

› DiLiPaD: http://dilipad.history.ac.uk/


Page 9: Introduction to data and text mining - Jisc Digifest 2016


Page 10: Introduction to data and text mining - Jisc Digifest 2016


Page 11: Introduction to data and text mining - Jisc Digifest 2016

Mining repositories: CORE

» CORE is an aggregation of Open Access repositories and offers itself as a platform for text and data mining (TDM) (25 million articles)

› Can use an API (of interest if you want to build value-added services on top)

› Or download the whole aggregation as an open dataset here: https://core.ac.uk/intro/data_dump

› Jisc and the Open University are running CORE in partnership, with the back-end aggregation hosted by the OU and the front-end services hosted by Jisc. (Further services by Jisc could be developed on top of this.)
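
As a rough idea of what mining a downloaded aggregation might involve, the Python sketch below streams records one at a time and counts how many mention each keyword. It rests on assumptions not stated in the slides: that the records have been exported to one JSON object per line, and that each has `title` and `abstract` fields. Those field names and the sample records are hypothetical; the real CORE dataset layout and API may differ and should be checked against CORE's own documentation.

```python
import io
import json
from collections import Counter


def count_keyword_hits(lines, keywords):
    """Count how many records mention each keyword in the title or abstract.

    `lines` is any iterable of JSON strings, one record per line.
    The `title` and `abstract` field names are hypothetical, not CORE's schema.
    """
    hits = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing or null fields are treated as empty strings.
        text = f"{record.get('title') or ''} {record.get('abstract') or ''}".lower()
        for keyword in keywords:
            if keyword.lower() in text:
                hits[keyword] += 1
    return hits


# Invented sample records standing in for a downloaded file.
SAMPLE = io.StringIO(
    '{"title": "Text mining in biomedicine", "abstract": "We mine abstracts."}\n'
    '{"title": "Census trends", "abstract": "Socio-economic data mining."}\n'
    '\n'
    '{"title": "Open access repositories", "abstract": null}\n'
)

if __name__ == "__main__":
    print(count_keyword_hits(SAMPLE, ["mining", "open access"]))
```

Reading line by line, rather than loading everything into memory, is the design choice that matters for an aggregation of this size; an open file handle can be passed in place of `SAMPLE`.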


Page 12: Introduction to data and text mining - Jisc Digifest 2016


Universities and industry

» NCUB (National Council for Universities and Business) is developing a tool called an “Intelligent Broker”

› To assist with making better links between universities and industry

› Could potentially harvest and mine data from key sources like the Research Council’s Gateway to Research, equipment.data (national equipment portal) and potentially other services, like CORE

› This would give SMEs more intelligence about research-intensive activity in particular areas, for example


Page 13: Introduction to data and text mining - Jisc Digifest 2016


Page 14: Introduction to data and text mining - Jisc Digifest 2016

And finally…

» Open Citation Experiment (using text mining techniques – see Digifest session and demo on this!)

» Jisc are commissioning a study to examine the text mining landscape and future contributions to this space, to review:

› The current landscape – primarily in UK HE but also looking internationally, and within other relevant sectors to provide a broad view

› The market – what are the value chains and where might Jisc contribute?

› The legal position and other inhibitors

› Researcher practice, the issues they encounter, their current and future needs, considering subjects that use text mining and those that don’t

› Existing platforms, services and tools, and potential for use by Jisc or its customers

› Recommendations on possible future areas of work or services for Jisc to explore

Page 15: Introduction to data and text mining - Jisc Digifest 2016

jisc.ac.uk


For more information

Contact

Catherine Grout

Head of change - [email protected]
