Content Migration Part 1: TERMINALFOUR t44u 2013
2 2 t44u.2013
• TERMINALFOUR Site Manager comes with a number of tools for automated migration.
• Even with automated migration, some manual migration will be required for content not handled by the import tools.
• TERMINALFOUR endeavours to migrate at least 80% of the existing content automatically.
• In some cases, custom tools are required to handle specific site and content structures.
Manual
• The amount of content to be migrated <1500
• Is content well structured & marked-up correctly?
• Complexity of the original web site
Auto
• Is it coming from another CMS (e.g. RedDot)?
• Is the HTML source consistent?
• Can we access an XML extract?
Integration
• Structured data: use Web Objects / Data Objects / Content Syncer / Web Services integration features
• Live code in pages
• A combination of the three options above?
1. Access Web Site
2. Analyse the Data
3. Configure the Migration Tool
4. Run the Migration
5. Test / QA
6. Manual updates
STEP 1:
Copy of the website
The content in HTML, XML, or a database dump.
Media files including images, linked documents, videos, Flash movies, etc.
STEP 2:
Analysis of the website
How is the site structure to be determined?
Is there multi-lingual content to be migrated?
What is the structure of the content (pages)?
STEP 3:
Configure the Migration Tool
Map the elements from the existing page layouts to the new content templates.
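As a rough illustration of Step 3, the mapping from old page layouts to new content templates can be thought of as a lookup table. The sketch below is not Site Manager configuration; the layout name, element keys, template name and field names are all hypothetical.

```python
# Hypothetical mapping: old layout name -> new template and element-to-field rules.
LAYOUT_TO_TEMPLATE = {
    "two-column": {
        "template": "General Content",
        "fields": {"h1": "Title", "div#main": "Body", "div#sidebar": "Related"},
    },
}


def map_page(layout, elements):
    """Map extracted page elements onto the fields of the new template.

    elements: dict of old element key -> extracted HTML/text.
    Elements with no rule are reported so they can be handled manually.
    """
    rule = LAYOUT_TO_TEMPLATE[layout]
    content = {
        field: elements[source]
        for source, field in rule["fields"].items()
        if source in elements
    }
    unmapped = sorted(set(elements) - set(rule["fields"]))
    return {"template": rule["template"], "content": content, "unmapped": unmapped}
```

Anything listed under `unmapped` is a candidate for the manual updates in Step 6.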
STEP 4:
Running the Migration
Defined XML format & defined structure for content, site structure, and assets.
STEP 5 :
Test/QA
A link checker is run on the published website to determine if there are any broken links or missing content.
Two types of manual review are required
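The link check in Step 5 can be sketched with the Python standard library. This is a minimal illustration, not the checker TERMINALFOUR uses: it assumes the published pages are already available as a dict of site-relative URL to HTML (a hypothetical input), and it only checks internal links.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects href and src attribute values from one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_links(pages):
    """Return sorted (source_page, target) pairs whose internal target is missing.

    pages: dict mapping site-relative URL -> HTML string.
    """
    broken = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            # Resolve relative links against the page and ignore #fragments.
            target, _fragment = urldefrag(urljoin(page_url, link))
            if target.startswith(("http://", "https://", "mailto:")):
                continue  # external targets are out of scope for this sketch
            if target and target not in pages:
                broken.append((page_url, target))
    return sorted(broken)
```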
STEP 6:
Manual Update
Some manual updates are required in order to fix issues.
1. How much data (i.e. pages, sites) is to be migrated by TERMINALFOUR?
2. Will the content be exported from the existing CMS in order to migrate it, or will the published site be used? I.e. what format will the website be provided in?
3. Is the website structure the same or different in the new system? If different, a mapping will be required between the old and the new structure.
4. Is there a one to one mapping from the old page layouts to the new page layouts?
5. Is there multi-lingual content to be migrated?
6. Will there be content that is not currently in the existing site to be migrated? If so, what format will this be in?
7. Is there mirrored content (same source content appearing in multiple locations on the site) within the website that needs to be handled during the migration? This includes portions of pages.
8. Are the pages well structured with markers to identify different components of the page?
– Serena Collage (University of St. Thomas, University of Liverpool)
– InterWoven TeamSites (Southern States Coop)
– Documentum (Missouri State technical College)
– Open Text RedDot (University of the Arts London)
– Vignette (OECD)
– Microsoft CMS (UNAIDS)
– BroadVision (Aer Lingus)
– HannonHill Cascade Server
– Percussion (NUIG)
– SunGard LuminisCMS (University of Huddersfield)
– Active Networks IronPoint CMS (University of Fraser Valley, LMU)
– DreamWeaver HTML pages (VCU)
– Squiz (RMIT Australia, University of Stirling)
Case Study: Weitz & Luxenberg
• A targeted list of page IDs within a data source to be migrated in bulk from the existing custom CMS.
• Create a Hierarchy Builder to build the parent and child structure within Site Manager from the data source (MS Excel).
• Migrated HTML code to be “cleaned” by removing specified non-required HTML tags.
• Link Resolver to recurse through the imported HTML code and check for links that can be resolved – continual checking.
• Automatic Static/Regex URL Redirect
Case Study: W&L - Database to Database Migration Proposal
[Diagram: numbered steps 1 to 6 linking the W&L DB, Content Syncer, SM DB, HTML Cleaner and Link Resolver]
• W&L DB: Client produces a table of WebPageID, Level, Section Name, ContentHTML, OriginalURL, TemplateID, MetaDescrip, MetaKeywords.
• Content Syncer: Table is imported into the Content Syncer using pre-defined fields.
• SM DB: Content is imported into Site Manager using template / column mapping.
• HTML Cleaner: The HTML is cleaned in the SM DB without resolved links.
• Link Resolver: New section/page information is used to resolve the links, using a new function to match previous IDs with new SM IDs.
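The ID matching performed by the Link Resolver might look roughly like the sketch below. It is an assumption-laden illustration: the two lookup tables (OriginalURL to old WebPageID, old WebPageID to new SM ID) and the `/section/<id>` target format are hypothetical, and the real function is not shown in the slides.

```python
import re

# Matches double-quoted href attributes only; enough for a sketch.
HREF_RE = re.compile(r'href="([^"]*)"')


def resolve_links(html, old_url_to_old_id, old_id_to_new_id):
    """Rewrite links whose target has already been imported.

    Returns (rewritten_html, unresolved_urls). Unresolved links are left
    untouched so the check can be repeated after later imports.
    """
    unresolved = []

    def swap(match):
        old_url = match.group(1)
        old_id = old_url_to_old_id.get(old_url)
        new_id = old_id_to_new_id.get(old_id)
        if new_id is None:
            unresolved.append(old_url)
            return match.group(0)
        return f'href="/section/{new_id}"'  # hypothetical new URL format

    return HREF_RE.sub(swap, html), unresolved
```

Returning the unresolved URLs, rather than failing on them, is what allows the "continual checking" described in the case study.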
Case Study: W&L – Data Source
• Data source can be Excel, SQL or MySQL
• Fields need to follow the exact naming convention
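A quick pre-import check of the field names can catch naming mismatches early. The slide does not spell out the convention in text, so the sketch below assumes it is exactly the eight column names listed in the migration proposal; the function name is hypothetical.

```python
# Assumed convention: the column names from the W&L migration proposal.
REQUIRED_FIELDS = [
    "WebPageID", "Level", "Section Name", "ContentHTML",
    "OriginalURL", "TemplateID", "MetaDescrip", "MetaKeywords",
]


def check_fields(header):
    """Compare a data source's column names with the assumed convention.

    Returns (missing, unexpected); names are compared case-sensitively.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    unexpected = [name for name in header if name not in REQUIRED_FIELDS]
    return missing, unexpected
```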
Case Study: W&L – Content Syncer
Ensure the Site Creator Plugin is set, and test that you can query the database
Case Study: W&L – Content Hierarchy Built & Imported
• Example ‘t44u’ shows section created and hierarchy & content created
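The parent and child structure can be derived from the Level column of the data source. The sketch below is one possible approach, not the Hierarchy Builder itself: it assumes rows arrive in document order and that Level is an integer depth, so each row's parent is the nearest preceding row with a smaller Level.

```python
def build_hierarchy(rows):
    """Return a dict of WebPageID -> parent WebPageID (None for top-level rows).

    rows: list of dicts with at least 'WebPageID' and 'Level', in document order.
    """
    parents = {}
    stack = []  # (level, page_id) pairs forming the current ancestor chain
    for row in rows:
        level = int(row["Level"])
        # Drop siblings and deeper rows; what remains on top is the parent.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parents[row["WebPageID"]] = stack[-1][1] if stack else None
        stack.append((level, row["WebPageID"]))
    return parents
```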
Case Study: W&L – HTML Cleaner
• Current interface is available now within Site Manager
• Specify the section to clean and upload a properties file
Case Study: W&L – HTML Cleaner - Options
• Remove tags only:
Parse the HTML using Jsoup, extract the content enclosed by the listed tags, and write it out without the tags themselves.
• Remove attributes only:
Only the defined attribute in the tag is removed. The tag itself remains within the content.
• Remove tags and content:
Parse the HTML, find the relevant tag, and pull it and the enclosed content out of the file.
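The three options can be illustrated with a small cleaner. The slides name Jsoup (a Java library); the sketch below swaps in Python's standard `html.parser` instead, so it shows the behaviour of each option rather than the actual implementation. The class, function and parameter names are hypothetical, and comments and doctypes are dropped for brevity.

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlCleaner(HTMLParser):
    def __init__(self, unwrap_tags=(), drop_tags=(), drop_attrs=()):
        super().__init__(convert_charrefs=False)
        self.unwrap_tags = set(unwrap_tags)  # "remove tags only"
        self.drop_tags = set(drop_tags)      # "remove tags and content"
        self.drop_attrs = set(drop_attrs)    # "remove attributes only"
        self.out = []
        self.skip_depth = 0  # > 0 while inside a tag dropped with its content

    def _render(self, tag, attrs):
        kept = [(k, v) for k, v in attrs if k not in self.drop_attrs]
        text = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in kept
        )
        return f"<{tag}{text}>"

    def handle_starttag(self, tag, attrs):
        if tag in self.drop_tags:
            if tag not in VOID_TAGS:  # void tags have no end tag to wait for
                self.skip_depth += 1
            return
        if self.skip_depth or tag in self.unwrap_tags:
            return
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <br/>: no matching end tag follows.
        if self.skip_depth or tag in self.drop_tags or tag in self.unwrap_tags:
            return
        self.out.append(self._render(tag, attrs))

    def handle_endtag(self, tag):
        if tag in self.drop_tags:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in self.unwrap_tags:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def clean_html(html, **options):
    cleaner = HtmlCleaner(**options)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean_html(source, unwrap_tags=["font"], drop_tags=["script"], drop_attrs=["style"])` keeps the text inside `<font>`, removes `<script>` blocks entirely, and strips `style` attributes while leaving their tags in place.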
Case Study: W&L – HTML Cleaner – Properties File
• Sample Properties file to keep listed tags and remove everything else
Case Study: W&L – URL Redirect
Original URL is captured during import and used to create a Static or Regex URL Rewrite
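The difference between static and regex redirects can be sketched as follows. This is a generic illustration, not Site Manager's rewrite syntax; the rule formats, function name and example URLs are hypothetical.

```python
import re


def build_redirector(static_rules, regex_rules):
    """Return a function mapping an old URL path to its new location.

    static_rules: dict of exact old path -> new path.
    regex_rules: list of (pattern, replacement template) pairs, tried in order.
    """
    compiled = [(re.compile(pattern), template) for pattern, template in regex_rules]

    def redirect(path):
        if path in static_rules:  # static rules take priority
            return static_rules[path]
        for pattern, template in compiled:
            match = pattern.fullmatch(path)
            if match:
                return match.expand(template)
        return None  # no rule applies

    return redirect
```

A static rule suits one-off pages whose OriginalURL was captured at import, while a single regex rule can cover a whole family of URLs that share a pattern.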