content migration part 1: terminalfour t44u 2013

The Aviva Stadium Dublin, 21-22 November 2013

Upload: terminalfour

Post on 20-Aug-2015



• TERMINALFOUR Site Manager comes with a number of tools for automated migration.

• Even with automated migration, some manual migration will be required for content not handled by the import tools.

• TERMINALFOUR endeavours to migrate at least 80% of the existing content.

• In some cases, custom tools are required to handle specific site and content structures.


Manual

• The amount of content to be migrated is small (<1500)

• Is content well structured & marked-up correctly?

• Complexity of the original web site

Auto

• Is it coming from another CMS (e.g. RedDot)?

• Is the HTML source consistent?

• Can we access an XML extract?

Integration

• Structured Data: Use Web Objects / Data Objects / Content Syncer / Web Services integration features

• Live code in pages

• A combination of the three options above?


1. Access Web Site

2. Analyse the Data

3. Configure the Migration Tool

4. Run the Migration

5. Test / QA

6. Manual updates


STEP 1: Copy of the website

The content in HTML, XML, or a database dump.

Media files including images, linked documents, videos, Flash movies etc.

STEP 2: Analysis of the website

How to determine the site structure?

Is there multi-lingual content to be migrated?

What is the structure of the content (pages)?

STEP 3: Configure the Migration Tool

Map the elements from the existing page layouts to the new content templates.
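One way to picture the Step 3 mapping is as a lookup from markers in the old page layout to element names in the new content template. The sketch below is only an illustration under stated assumptions: the element ids (`page-title`, `main-content`) and the template field names (`Title`, `Body`) are hypothetical, and this is not how the TERMINALFOUR migration tool is actually configured.

```python
from html.parser import HTMLParser

# Hypothetical mapping: id attribute in the old layout -> field in the new template.
LAYOUT_TO_TEMPLATE = {"page-title": "Title", "main-content": "Body"}

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class FieldExtractor(HTMLParser):
    """Collects the text found inside elements whose id appears in the mapping."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.fields = {}
        self._current = None  # template field currently being captured, if any
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        field = self.mapping.get(dict(attrs).get("id"))
        if field:
            self._current, self._depth = field, 1
            self.fields[field] = ""

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract_fields(html, mapping=LAYOUT_TO_TEMPLATE):
    """Return {template field: text} for every mapped element found in the page."""
    parser = FieldExtractor(mapping)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.fields.items()}
```

Pages without consistent markers would not fit a mapping like this, which is one reason the analysis step asks whether the HTML source is consistent.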


STEP 4: Running the Migration

Defined XML format and defined structure for:

• Content

• Site structure

• Assets

STEP 5: Test/QA

A link checker is run on the published website to determine if there are any broken links or missing content.

Two types of manual review are required
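The link check in Step 5 can be sketched roughly as follows. This is a minimal illustration, not the link checker TERMINALFOUR runs: it assumes the published pages are available as a dictionary of site-relative path to HTML, and it only checks internal links that start with `/`. The function and class names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects every href and src value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_links(pages):
    """Return (page, link) pairs for internal links whose target is not published.

    pages: dict mapping a site-relative path (e.g. "/index.html") to its HTML.
    """
    broken = []
    for path, html in sorted(pages.items()):
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            # Ignore fragments and query strings when looking up the target.
            target = link.split("#", 1)[0].split("?", 1)[0]
            is_internal = target.startswith("/") and not target.startswith("//")
            if is_internal and target not in pages:
                broken.append((path, link))
    return broken
```

Media files would need to be listed in `pages` as well (with any placeholder content) for image and document links to count as present.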

STEP 6: Manual Update

Some manual updates are required in order to fix issues.


1. How much data (i.e. pages, sites) is to be migrated by TERMINALFOUR?

2. Will the content be exported from the existing CMS in order to migrate it, or will the published site be used? That is, in what format will the website be provided?

3. Is the website structure the same or different in the new system? If different, a mapping will be required between the old and the new structure.

4. Is there a one-to-one mapping from the old page layouts to the new page layouts?

5. Is there multi-lingual content to be migrated?

6. Will there be content that is not currently in the existing site to be migrated? If so, what format will this be in?

7. Is there mirrored content (same source content appearing in multiple locations on the site) within the website that needs to be handled during the migration? This includes portions of pages.

8. Are the pages well structured with markers to identify different components of the page?


– Serena Collage (University of St. Thomas, University of Liverpool)

– Interwoven TeamSite (Southern States Coop)

– Documentum (Missouri State Technical College)

– Open Text RedDot (University of the Arts London)

– Vignette (OECD)

– Microsoft CMS (UNAIDS)

– BroadVision (Aer Lingus)

– Hannon Hill Cascade Server

– Percussion (NUIG)

– SunGard Luminis CMS (University of Huddersfield)

– Active Network IronPoint CMS (University of the Fraser Valley, LMU)

– Dreamweaver HTML pages (VCU)

– Squiz (RMIT Australia, University of Stirling)


Case Study: Weitz & Luxenberg

• A targeted list of page IDs within a data source, to be migrated in bulk from the existing custom CMS.

• Create a Hierarchy Builder to build the parent and child structure within Site Manager from the data source (MS Excel).

• HTML code being migrated is “cleaned” by removing specified non-required HTML tags.

• Link Resolver to recurse through the imported HTML code and check for links that can be resolved (continual checking).

• Automatic Static/Regex URL Redirect

Case Study: W&L – Database to Database Migration Proposal

[Diagram: W&L DB, Content Syncer, SM DB, HTML Cleaner, Link Resolver; steps numbered 1 to 6]

Steps 1 to 3 (W&L DB and Content Syncer):

1. Client produces a table of WebPageID, Level, Section Name, ContentHTML, OriginalURL, TemplateID, MetaDescrip, MetaKeywords.

2. Table is imported into the Content Syncer using pre-defined fields.

3. Content is imported into Site Manager using template / column mapping.

Steps 4 to 6 (SM DB, HTML Cleaner and Link Resolver):

• The HTML is cleaned in the SM DB without resolved links.

• New section/page information is used to resolve the links, using a new function to match previous IDs with new SM IDs.
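The link-resolution idea in this proposal (matching previous IDs with new Site Manager IDs) could look something like the sketch below. It rests on assumptions not stated on the slide: that old links carry the previous page ID in an `id=` query parameter, and that a resolved link is written as `/section/<new id>`. Both formats are hypothetical placeholders. Links that cannot be resolved yet are left untouched and reported, so the function can be run again as more sections are imported, in the spirit of "continual checking".

```python
import re

# Hypothetical old-link format: any href containing ?id=<digits> or &id=<digits>.
OLD_LINK_RE = re.compile(r'href="[^"]*[?&]id=(\d+)[^"]*"')


def resolve_links(html, old_to_new_id):
    """Rewrite links whose previous ID has a known new ID.

    Returns (rewritten_html, unresolved_old_ids).
    """
    unresolved = []

    def replace(match):
        old_id = int(match.group(1))
        new_id = old_to_new_id.get(old_id)
        if new_id is None:
            unresolved.append(old_id)
            return match.group(0)  # leave the link as-is for a later pass
        return f'href="/section/{new_id}"'

    return OLD_LINK_RE.sub(replace, html), unresolved
```

A second call with a larger `old_to_new_id` mapping picks up the links left over from the first pass.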


Case Study: W&L – Data Source

The data source can be Excel, SQL or MySQL. Fields need to follow the exact naming convention.
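Because the field names must match exactly, a quick check of the column headers before import can save a failed run. The sketch below assumes the required names are the ones listed in the migration proposal (WebPageID, Level, Section Name, ContentHTML, OriginalURL, TemplateID, MetaDescrip, MetaKeywords) and that the table has been exported as CSV; the function name is hypothetical.

```python
import csv
import io

# Assumed to be the required names, taken from the table in the proposal.
REQUIRED_FIELDS = [
    "WebPageID", "Level", "Section Name", "ContentHTML",
    "OriginalURL", "TemplateID", "MetaDescrip", "MetaKeywords",
]


def check_field_names(csv_text):
    """Return (missing, unexpected) column names for a CSV export of the table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    unexpected = [name for name in header if name not in REQUIRED_FIELDS]
    return missing, unexpected
```

The comparison is deliberately case-sensitive and whitespace-sensitive, so `MetaDescription` or `webpageid` would be reported rather than silently accepted.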


Case Study: W&L – External Content Syncer Handler

Setup Connection to Data Source


Case Study: W&L – Content Syncer

Ensure the Site Creator plugin is set, and test that you can query the database.


Case Study: W&L – Content Hierarchy Built & Imported

• Example ‘t44u’ shows the section created, along with its hierarchy and content.
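A parent and child structure can be derived from a flat table when each row carries a Level value. The sketch below is an assumption-laden illustration, not the Hierarchy Builder itself: it assumes rows arrive in site order, that Level starts at 1 for top-level sections, and that a row's parent is the most recent row one level above it. Column names follow the table in the migration proposal.

```python
def build_hierarchy(rows):
    """Build a nested section tree from rows ordered as they appear in the site.

    rows: iterable of dicts with "WebPageID", "Level" and "Section Name" keys.
    """
    root = {"name": "ROOT", "id": None, "children": []}
    stack = [root]  # stack[n] is the most recent section seen at level n
    for row in rows:
        level = int(row["Level"])
        if level < 1 or level > len(stack):
            raise ValueError(
                f"Row {row['WebPageID']} skips a level (Level={level})"
            )
        node = {"name": row["Section Name"], "id": row["WebPageID"], "children": []}
        del stack[level:]  # forget sections at this depth or deeper
        stack[level - 1]["children"].append(node)
        stack.append(node)
    return root
```

Raising on a skipped level (for example a Level 3 row directly under a Level 1 row) surfaces data problems before anything is created in the target system.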


Case Study: W&L – HTML Cleaner

• Interface currently available within Site Manager

Specify the section to clean and upload a properties file


Case Study: W&L – HTML Cleaner - Options

• Remove tags only:

Parse the HTML using Jsoup, extract the content enclosed by the tags to be removed, and write it out without those tags.

• Remove attributes only:

Only the defined attributes in the tag are removed. The tag itself remains within the content.

• Remove tags and content:

Parse the HTML, find the relevant tag and pull it and the enclosed content out of the file.
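The three cleaning options can be illustrated with a small sketch. The HTML Cleaner described here uses Jsoup (a Java library); the version below swaps in Python's standard `html.parser` purely for illustration, and the class, function and parameter names (`unwrap_tags`, `drop_attrs`, `drop_tags`) are hypothetical. It is a simplification: comments are dropped and malformed markup is not repaired.

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlCleaner(HTMLParser):
    def __init__(self, unwrap_tags=(), drop_attrs=(), drop_tags=()):
        super().__init__()
        self.unwrap_tags = set(unwrap_tags)  # remove tags only, keep content
        self.drop_attrs = set(drop_attrs)    # remove attributes only
        self.drop_tags = set(drop_tags)      # remove tags and their content
        self.out = []
        self._skip_depth = 0  # > 0 while inside a tag being dropped

    def handle_starttag(self, tag, attrs):
        dropping = tag in self.drop_tags
        if self._skip_depth or dropping:
            if dropping and tag not in VOID_TAGS:
                self._skip_depth += 1
            return
        if tag in self.unwrap_tags:
            return
        kept = ""
        for name, value in attrs:
            if name not in self.drop_attrs:
                kept += ' {}="{}"'.format(name, escape(value or ""))
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if self._skip_depth:
            if tag in self.drop_tags and tag not in VOID_TAGS:
                self._skip_depth -= 1
            return
        if tag in self.unwrap_tags or tag in self.drop_tags or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(html, **options):
    """Apply the chosen cleaning options and return the cleaned HTML string."""
    cleaner = HtmlCleaner(**options)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean_html(html, unwrap_tags=["font"])` keeps the text inside `<font>` elements, while `clean_html(html, drop_tags=["script"])` removes scripts together with their contents.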


Case Study: W&L – HTML Cleaner – Properties File

• Sample Properties file to keep listed tags and remove everything else


Case Study: W&L – URL Redirect

The original URL is captured during import and used to create a static or regex URL rewrite.
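The difference between a static and a regex rewrite can be sketched as below. This is only an illustration of the two lookup styles, assuming exact-match entries are tried first and regex rules second; the URLs, patterns and function name are hypothetical and do not reflect how Site Manager stores its redirects.

```python
import re


def build_redirector(static_map, regex_rules):
    """Return a function mapping an original URL path to its new path, or None.

    static_map:  dict of exact old path -> new path.
    regex_rules: list of (pattern, replacement template) pairs; templates may
                 use backreferences such as \\1.
    """
    compiled = [(re.compile(pattern), template) for pattern, template in regex_rules]

    def redirect(path):
        if path in static_map:  # static rewrite: exact match
            return static_map[path]
        for pattern, template in compiled:  # regex rewrite: first full match wins
            match = pattern.fullmatch(path)
            if match:
                return match.expand(template)
        return None

    return redirect
```

A static entry suits one-off pages captured from the OriginalURL field, while a single regex rule can cover a whole family of URLs that share a pattern.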


Case Study: W&L – URL Redirect

Example IIS Static URL Rewrite Mapping
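The slide's example is a screenshot that is not reproduced in this transcript. As a rough stand-in, the sketch below generates a rewrite map from (original URL, new URL) pairs. The `rewriteMaps` / `rewriteMap` / `add key value` element names follow the IIS URL Rewrite module's rewrite map format as commonly documented; the map name and URLs are hypothetical, and a rule that references the map would still be needed in the site's configuration.

```python
import xml.etree.ElementTree as ET


def build_rewrite_map(name, url_pairs):
    """Return a <rewriteMaps> XML fragment with one <add> entry per URL pair."""
    maps = ET.Element("rewriteMaps")
    rewrite_map = ET.SubElement(maps, "rewriteMap", {"name": name})
    for old_url, new_url in url_pairs:
        ET.SubElement(rewrite_map, "add", {"key": old_url, "value": new_url})
    return ET.tostring(maps, encoding="unicode")


# Hypothetical example data: original URLs captured at import, and their new paths.
fragment = build_rewrite_map(
    "StaticRedirects",
    [("/about.asp", "/about/"), ("/contact.asp", "/contact-us/")],
)
print(fragment)
```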


Case Study: W&L – URL Redirect – The Future ‘V8’

Beta Screen Grab for V8 – URL Redirect