Building Database-Backed Multilingual, Multimedia Data Repositories: The aAQUA Experience
Developmental Informatics Lab
Introduction
aAqua (almost All questions answered): an online forum where experts in the field answer questions from the grassroots.
Bridges gaps in the use of ICT:
– Usability
– Availability
– Multi-linguality
– Multi-media support
– Multi-lingual storage and retrieval
– Reusability
Usability
A Sample Thread
aAqua in Operation
[Figure: aAqua in operation. Labels: aAqua Server; aAqua; Crop Doctor; Crop Recommendation; KeywordBrowser; BhavPuchiye; Internet (HTTP); aAquaOffline; Mobile network; Gateway (SMS); aAquaMobile.]
aAqua Demo
aAQUA: a technical perspective
– Employs a three-tier web architecture.
– Uses mvnforum, which is based on the MVC (Model-View-Controller) architecture.
– Uses Lucene as its search engine.
– Is compatible with any servlet container that supports JSP 1.2 and Servlet 2.3; runs on Tomcat.
– Works with Unicode (UTF-8) compliant Oracle 9i as well as MySQL databases.
– Is integrated with open-source digital library software.
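Lucene is a Java library with a far richer API than anything shown here; the Python sketch below only illustrates the inverted-index idea that such a search engine is built on, applied to mixed English and Hindi forum posts. The `InvertedIndex` class, the `tokenize` rules and the sample posts are hypothetical and are not aAQUA's code.

```python
import unicodedata
from collections import defaultdict

# Punctuation trimmed from token edges; "\u0964" is the Devanagari danda.
PUNCT = ".,;:!?\"'()[]\u0964"


def tokenize(text):
    """NFC-normalise, case-fold, split on whitespace, trim punctuation."""
    text = unicodedata.normalize("NFC", text).casefold()
    tokens = []
    for raw in text.split():
        token = raw.strip(PUNCT)
        if token:
            tokens.append(token)
    return tokens


class InvertedIndex:
    """Hypothetical minimal index mapping each term to the ids of posts containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, post_id, text):
        for term in tokenize(text):
            self.postings[term].add(post_id)

    def search(self, query):
        """Return ids of posts that contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Rust spots on wheat leaves")
index.add(2, "कपास की फसल")
index.add(3, "Wheat sowing time?")
print(sorted(index.search("wheat")))       # [1, 3]
print(sorted(index.search("wheat rust")))  # [1]
print(sorted(index.search("फसल")))         # [2]
```

A plain whitespace split keeps Devanagari vowel signs attached to their words without depending on how a regex engine classifies combining marks; a production engine would add language-aware analysis such as stemming and stop words, which Lucene provides through its analyzers.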
Multi-Linguality
Multi-lingual Storage and Retrieval
[Figure: multilingual storage and retrieval through UNL. Labels: Query in Hindi ("flowers scorch"); UNL Document; Inforepository; Result in Hindi; UNL graph.]
Example sentence: …The plants blossom but the flowers scorch…
Its UNL expression, one relation per line:
and(blossom(icl>develop(obj>thing)):0S.@entry.@custom, scorch(icl>dry(obj>thing)):2E.@contrast.@custom)
obj(blossom(icl>develop(obj>thing)):0S.@entry.@custom, plant(icl>organism):04.@def.@pl)
obj(scorch(icl>dry(obj>thing)):2E.@contrast.@custom, flower(icl>reproductive structure):1P.@pl.@def)
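The UNL expression above is a set of binary relations of the form relation(node1, node2). The Python sketch below parses exactly the syntax visible in this example (headword, optional constraint in parentheses, ":ID", ".@attribute" suffixes) into relation triples. It is not a full UNL parser, and the class and function names are hypothetical. The last lines show one simple way a query could be compared with a stored document at the UNL level; that matching step is an assumption for illustration, not necessarily how aAQUA matches queries.

```python
from dataclasses import dataclass, field


@dataclass
class UNLNode:
    headword: str            # e.g. "blossom"
    constraint: str          # e.g. "icl>develop(obj>thing)"; "" if absent
    node_id: str             # e.g. "0S"; "" if absent
    attributes: list = field(default_factory=list)  # e.g. ["entry", "custom"]


@dataclass
class UNLRelation:
    name: str                # e.g. "and", "obj"
    source: UNLNode
    target: UNLNode


def matching_paren(text, start):
    """Index of the ')' that closes the '(' at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError(f"unbalanced parentheses in {text!r}")


def parse_node(text):
    """Parse 'headword(constraint):ID.@attr1.@attr2'."""
    text = text.strip()
    paren, colon = text.find("("), text.find(":")
    if paren != -1 and (colon == -1 or paren < colon):
        close = matching_paren(text, paren)
        headword, constraint, rest = text[:paren], text[paren + 1:close], text[close + 1:]
    else:
        end = colon if colon != -1 else len(text)
        headword, constraint, rest = text[:end], "", text[end:]
    pieces = [p for p in rest.lstrip(":").split(".") if p]
    node_id = pieces[0] if pieces and not pieces[0].startswith("@") else ""
    attributes = [p[1:] for p in pieces if p.startswith("@")]
    return UNLNode(headword, constraint, node_id, attributes)


def parse_relation(line):
    """Parse 'rel(node1, node2)', splitting only on the top-level comma."""
    line = line.strip()
    paren = line.index("(")
    close = matching_paren(line, paren)
    args, depth, start = [], 0, paren + 1
    for i in range(paren + 1, close):
        if line[i] == "(":
            depth += 1
        elif line[i] == ")":
            depth -= 1
        elif line[i] == "," and depth == 0:
            args.append(line[start:i])
            start = i + 1
    args.append(line[start:close])
    if len(args) != 2:
        raise ValueError(f"expected two arguments in {line!r}")
    return UNLRelation(line[:paren], parse_node(args[0]), parse_node(args[1]))


DOCUMENT = """\
and(blossom(icl>develop(obj>thing)):0S.@entry.@custom, scorch(icl>dry(obj>thing)):2E.@contrast.@custom)
obj(blossom(icl>develop(obj>thing)):0S.@entry.@custom, plant(icl>organism):04.@def.@pl)
obj(scorch(icl>dry(obj>thing)):2E.@contrast.@custom, flower(icl>reproductive structure):1P.@pl.@def)"""

relations = [parse_relation(line) for line in DOCUMENT.splitlines()]
for r in relations:
    print(r.name, r.source.headword, "->", r.target.headword)
# and blossom -> scorch
# obj blossom -> plant
# obj scorch -> flower

# Hypothetical matching step: a query such as "flowers scorch", once converted
# to UNL, might yield the triple below, which can be looked up in the document.
triples = {(r.name, r.source.headword, r.target.headword) for r in relations}
print(("obj", "scorch", "flower") in triples)   # True
```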
Unicode
Computers store letters and other characters by assigning a number to each one. Hundreds of different encoding systems have been used to assign these numbers, and before Unicode no single encoding could contain enough characters. Unicode is a universal encoded character set that:
– Enables information from any language to be stored using a single character set.
– Provides a unique code value for every character, regardless of the platform, program, or language.
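The "one number per character" idea can be checked directly in Python: `ord` returns a character's code point and the standard `unicodedata` module returns its official Unicode name, identically for Latin, Devanagari and Cyrillic characters on any platform. The `describe` helper and the sample characters are arbitrary choices for illustration.

```python
import unicodedata


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# Written as escapes so the source file's own encoding cannot interfere.
for ch in ["A", "\u00e1", "\u0915", "\u0424"]:
    print(ch, describe(ch))
# A U+0041 LATIN CAPITAL LETTER A
# á U+00E1 LATIN SMALL LETTER A WITH ACUTE
# क U+0915 DEVANAGARI LETTER KA
# Ф U+0424 CYRILLIC CAPITAL LETTER EF
```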
Unicode standard: UTF-8 encoding
– Popular with HTML.
– Transforms all Unicode characters into a variable-length encoding of one to four bytes.
– The Unicode characters corresponding to the familiar ASCII set have the same byte values as in ASCII.
– As a result, UTF-8 can be used with much existing software without extensive rewrites.
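To make the UTF-8 points concrete, the sketch below encodes one code point by hand using the standard UTF-8 bit layout, so it can be checked against Python's built-in codec. `encode_utf8` is a hypothetical, purely illustrative helper; real code should call `str.encode("utf-8")`. Note that code points below U+0080 come out as the single, unchanged ASCII byte.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustration only)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 1 byte: 0xxxxxxx, identical to ASCII
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(encode_utf8(0x0041).hex(" "))   # 41
print(encode_utf8(0x0915).hex(" "))   # e0 a4 95
```

DEVANAGARI LETTER KA (U+0915) takes three bytes, as does every character in the Devanagari block, while ASCII letters keep their one-byte values.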
UTF-16 encoding
– Used when efficient access to characters is needed together with economical use of storage.
– Most heavily used characters fit into a single 16-bit code unit, while all other characters are accessed via pairs of 16-bit code units (surrogate pairs).
– Better compatibility with Java, whose strings are sequences of UTF-16 code units.
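The surrogate-pair mechanism can be sketched as follows: for a code point above U+FFFF, 0x10000 is subtracted and the remaining 20 bits are split into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). `utf16_code_units` and `units_from_codec` are hypothetical helpers; the second cross-checks the first against Python's `utf-16-be` codec.

```python
from typing import List


def utf16_code_units(cp: int) -> List[int]:
    """Split one code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:          # Basic Multilingual Plane: one 16-bit unit
        return [cp]
    v = cp - 0x10000          # 20-bit value, split into 10 + 10 bits
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def units_from_codec(s: str) -> List[int]:
    """The same units obtained from Python's built-in big-endian UTF-16 codec."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for cp in (0x0915, 0x10402):
    print(f"U+{cp:04X}", " ".join(f"{u:04X}" for u in utf16_code_units(cp)))
# U+0915 0915
# U+10402 D801 DC02

# Python's len() counts code points; Java's String.length() counts UTF-16 units.
s = "\u0915\U00010402"
print(len(s), len(units_from_codec(s)))   # 2 3
```

The last line shows why Java compatibility matters here: a Java `String` counts a supplementary character as two code units, while Python's `len` counts it as one code point.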
Unicode Encodings
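Each character with its code point and its UTF-16 and UTF-8 encodings (hexadecimal):
Character          Code point  UTF-16     UTF-8
c                  U+0063      0063       63
á                  U+00E1      00E1       C3 A1
t                  U+0074      0074       74
CJK ideograph      U+6100      6100       E6 84 80
d                  U+0064      0064       64
ö                  U+00F6      00F6       C3 B6
Ф                  U+0424      0424       D0 A4
Deseret letter     U+10402     D801 DC02  F0 90 90 82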
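Byte values like these are easy to get wrong by hand, so it helps to regenerate them from Python's built-in codecs. `encoding_row` is a hypothetical helper, and `bytes.hex` with a separator argument needs Python 3.8 or later.

```python
def encoding_row(ch: str):
    """Code point, UTF-16 code units and UTF-8 bytes for one character, in hex."""
    utf16 = ch.encode("utf-16-be").hex(" ", 2).upper()   # group bytes into 16-bit units
    utf8 = ch.encode("utf-8").hex(" ").upper()
    return f"U+{ord(ch):04X}", utf16, utf8


# The characters of the table above, written as escapes.
CHARS = ["c", "\u00e1", "t", "\u6100", "d", "\u00f6", "\u0424", "\U00010402"]

print(f"{'Code point':<11} {'UTF-16':<10} UTF-8")
for ch in CHARS:
    cp, utf16, utf8 = encoding_row(ch)
    print(f"{cp:<11} {utf16:<10} {utf8}")
```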
Unicode and the Web
The preferred encoding form for Unicode characters on the web is UTF-8. The HTTP header of a document should contain the line:
– Content-Type: text/html; charset=utf-8 (for HTML files)
– Content-Type: text/plain; charset=utf-8 (for text files)
Alternatively, in an HTML document, add the following element inside the HEAD element:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
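A minimal sketch of both approaches using Python's standard WSGI tooling: the app sends `Content-Type: text/html; charset=utf-8` in the HTTP header and also embeds the `<meta>` element, so the page still declares its encoding when it is opened without the header (for example, saved to disk). The page content and the `call_app` helper are hypothetical; aAQUA itself runs in a Java servlet container, where the equivalent would be `response.setContentType("text/html; charset=UTF-8")`.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical page; the <meta> element repeats the charset declaration.
PAGE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>aAqua</title>
</head>
<body>कपास की फसल</body>
</html>"""


def app(environ, start_response):
    """Minimal WSGI app that serves PAGE as UTF-8 and declares it in the header."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),   # length in bytes, not characters
    ])
    return [body]


def call_app(wsgi_app):
    """Run a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)
        return lambda data: None              # WSGI write() callable, unused here

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(app)
print(status, "|", headers["Content-Type"])   # 200 OK | text/html; charset=utf-8

# To serve it for real (not run here):
# from wsgiref.simple_server import make_server
# make_server("", 8000, app).serve_forever()
```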
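Creating Unicode databases
MySQL/Oracle
– CREATE DATABASE database_name CHARACTER SET character_set;
– Example (MySQL): CREATE DATABASE confluence CHARACTER SET utf8;
– MySQL's utf8 stores at most three bytes per character, so characters outside the Basic Multilingual Plane require utf8mb4.
– Oracle 9i also supports UTF-16, as the national character set (NATIONAL CHARACTER SET AL16UTF16).
PostgreSQL
– CREATE DATABASE database_name WITH ENCODING 'UTF8';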
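SQLite (bundled with Python as `sqlite3`) is used below only as a convenient stand-in for MySQL, Oracle or PostgreSQL: it stores text as UTF-8 by default, so the sketch can show a Hindi string surviving a database round trip and the difference between its length in characters and in bytes. The `post` table and the `store_and_fetch` function are hypothetical.

```python
import sqlite3


def store_and_fetch(text):
    """Round-trip text through an in-memory SQLite database (stand-in only)."""
    conn = sqlite3.connect(":memory:")
    try:
        encoding = conn.execute("PRAGMA encoding").fetchone()[0]   # "UTF-8" by default
        conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
        conn.execute("INSERT INTO post (body) VALUES (?)", (text,))
        body, chars, nbytes = conn.execute(
            # length() on TEXT counts characters; on a BLOB it counts bytes
            "SELECT body, length(body), length(CAST(body AS BLOB)) FROM post"
        ).fetchone()
        return encoding, body, chars, nbytes
    finally:
        conn.close()


encoding, body, chars, nbytes = store_and_fetch("कपास की फसल")
print(encoding, body, chars, nbytes)
```

The byte count is larger than the character count because each Devanagari character occupies three bytes in UTF-8; any column size declared in bytes rather than characters has to allow for this.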
Thank You