IAEA International Atomic Energy Agency
Special Characters Implementation
Zbigniew Majewski
12th Joint INIS/ETDE Technical Committee Meeting 21-22 October 2009, Vienna, Austria
IAEA, 21-22 October 2009, Vienna. 12th INIS/ETDE Joint Technical Committee Meeting
Outcome of the 11th JTCM
• XML implementation for INIS output and the development of a new input tool should allow the introduction of Unicode.
• A recommendation was made to develop a detailed plan addressing the possible implications of Unicode implementation.
Problem
• INIS allows only the letters a-z and A-Z, digits and a few special characters
• The quality of INIS records is constrained by the limited character set
• Some abstracts, original titles, author names, conference titles and journal titles use multilingual characters
• Some INIS records need formulas in their abstracts
• Extra effort is required to strip the rich character set from electronic input
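
The effect of the restricted set can be illustrated with a minimal Python sketch that reports which characters of an input string would have to be replaced or removed. The `ALLOWED` set and the sample string are assumptions made for illustration only; the slides do not list the special characters INIS actually permits.

```python
import string

# Assumed stand-in for the restricted INIS character set; the real list of
# permitted special characters is not given in the slides.
ALLOWED = set(string.ascii_letters + string.digits + " .,;:()-'/")


def disallowed_characters(text):
    """Return the distinct characters of text outside ALLOWED, in order of first appearance."""
    seen = []
    for ch in text:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


# Illustrative input with a multilingual character and a formula.
sample = "Kraków: E = mc²"
print(disallowed_characters(sample))
```

Every character reported by such a check is one that currently has to be transliterated or dropped by hand before the record can be accepted.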
Impacts
• Storage
  • Databases and data exchange files
• Processing
  • QA (checking rules, authority validation)
  • Retrieval
  • External applications
• Presentation
  • HTML/XML enabled browsers
  • User Interface using tool specific data formats
Approach options
• Unicode enabled storage based
  • Unicode encoding (binary representation) implemented in all layers (storage, processing and presentation)
  • Use of XML for interfaces (like Atomindex)
• Mark-up based
  • ASCII based mark-up for Unicode characters implemented for storage and presentation
  • Processing modified to recognize mark-up or to become character agnostic
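
The difference between the two options can be sketched in Python. The slides do not say which mark-up scheme would be used, so XML numeric character references serve here purely as an assumed example of an ASCII based mark-up; the function names are hypothetical.

```python
import html


def to_unicode_bytes(text):
    """Unicode enabled option: store the text in a binary Unicode encoding (UTF-8 here)."""
    return text.encode("utf-8")


def to_markup(text):
    """Mark-up based option: keep ASCII as-is and write every other
    character as an XML numeric character reference (assumed scheme)."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_markup(marked):
    """Presentation step: turn the character references back into characters."""
    return html.unescape(marked)


title = "Skłodowska"
print(to_unicode_bytes(title))
print(to_markup(title))
print(from_markup(to_markup(title)) == title)
```

With mark-up, storage stays pure ASCII, but every processing step must either recognize the references or pass them through untouched. Text that already contains a literal `&` would need its own escaping rule, which this sketch does not handle.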
Barriers
Processing Step                  Software Component         Unicode compatibility
                                                            (Pres. / Proc. / Storage)
BR preparation                   FIBRE                      - / - / -
                                 MET                        + / + / +
Submission to INIS Secretariat   e-mail, FTP, File system   + / + / +
Image processing                 Scanning/OCR               + / - / +
BR QA                            IDPS                       + / - / +
                                 Thesaurus                  + / + / +
                                 Journals                   + / + / +
                                 CAI                        + / + / +
INIS Products                    Atomindex                  +
                                 NCL Collection             +
                                 INIS DB on DVD             +
                                 INIS DB on Web             + / - / +
Actions
• Finalize upgrading the software platform used by INIS applications
• Modify FIBRE and IDPS to allow Unicode characters
• Extend use of XML as the INIS record format throughout the entire INIS process
• Agree on use of Unicode in Atomindex
• Replace the search engine to allow searches with Unicode characters
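
As a rough illustration of XML as a Unicode-capable record format, the following Python sketch writes a record as UTF-8 encoded XML and reads it back. The element names `record`, `title` and `author` are hypothetical and do not reflect the actual INIS XML schema; the sample title is invented.

```python
import xml.etree.ElementTree as ET


def build_record(title, author):
    """Serialize a minimal record to UTF-8 XML bytes (hypothetical element names)."""
    record = ET.Element("record")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "author").text = author
    return ET.tostring(record, encoding="utf-8", xml_declaration=True)


def read_title(xml_bytes):
    """Parse the XML bytes and return the text of the title element."""
    return ET.fromstring(xml_bytes).findtext("title")


data = build_record("Über Röntgenstrahlen", "Majewski, Z.")
print(data.decode("utf-8"))
print(read_title(data))
```

Because the encoding is declared in the XML header, any XML-aware component can recover the original characters without a separate character-set convention.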
Thank you!