CENSUS DATA ANALYSIS TOOLS, AREAS, ISSUES & NEEDS
Neena Sharma, IAS
Director of Census Operations, Uttar Pradesh
Office of the Registrar General & Census Commissioner, India
Stages in Census Operations-2011
1 • Field Work • Scanning of Forms
2 • Image based recognition • Content Validation
3 • Tabulation • Dissemination
DATA COLLECTION
Census of India 2011-Data Collection
• Census 2011 is the 15th Census of India since 1872
• Census 2011 was held in two phases:
– Houselisting & Housing Census (April to September 2010)
– Population Enumeration (9th to 28th February 2011)
Reference Date: 0:00 Hours of 1st March 2011
– In snow-bound areas the Population Enumeration was conducted from 11th to 30th September 2010
Reference Date: 0:00 Hours of 1st October 2010
Some Facts about Census 2011
Cost USD 445 Mn
Cost per person USD 0.37
No. of Census Functionaries 2.7 Mn
No. of Languages in which Schedules were canvassed 16
No. of Languages in which Training Manuals were prepared 18
No. of Schedules Printed 340 Mn
No. of Training Manuals Printed 5.4 Mn
Paper Utilised 8,000 MTs
Material Transported 10,500 MTs
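The per-person cost quoted above can be checked by dividing the total cost by the enumerated population. The population is not given on this slide; the sketch below assumes roughly 1,210 million, the Census 2011 total.

```python
# Check of the per-person cost quoted above.
# Assumption: a population of about 1,210 million (Census 2011 total),
# which is not stated on the slide itself.
total_cost_usd_mn = 445.0
population_mn = 1210.0  # assumed, see note above

cost_per_person_usd = total_cost_usd_mn / population_mn
print(f"Cost per person: USD {cost_per_person_usd:.2f}")
```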
Yes. We have been counted !!!!!!
DATA PROCESSING
Capturing Information and Processing huge volume of Census Data
• Indian Census: always at the forefront of using the latest technology
• 1961 Census – Unit Record machines used
• 1971 Census – Key-punching (electrical cum mechanical) machines used; an IBM 1401 computer with IBM card reader used
• 1981 Census – Data entry made using Key-to-Disk machines. Processing by HP 1000, CD-Cyber 730 & NEC-1000 computer systems at NIC
• 1991 Census – Medha 930 mainframe computer system used for data processing. Unix-based dumb terminals used for data entry
• 2001 Census – First large country to use image-based Automatic Form Processing technology; high-speed duplex scanners used for image capturing
• 2011 Census – Using more developed ICR technology with advanced features
Census Data Processing-17 locations
Processing stages shown on the slide: Scanning, ASCII File, Prepare Batch, Recognition, Tiling, Completion, Exception, Export/Archival
Tiling Station: Image Based Forms Processing
• The unique TILE module optimizes data accuracy with a systematic display of characters grouped together to allow easy identification
• It is possible to identify which characters are correct and which are not, and to mark the incorrect ones as rejects
• This makes the completion step more accurate
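The tiling idea can be illustrated with a minimal sketch: characters that the recognition step read as the same value are shown together, so an operator can spot the odd one out and reject it for the completion step. The `Snippet` structure, the function names and the sample data are hypothetical and do not describe the actual ICR software used in the Census.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """One recognized character cut from a scanned form (hypothetical structure)."""
    form_id: str
    field: str
    recognized_as: str  # value proposed by the recognition step


def build_tiles(snippets):
    """Group snippets by recognized value, so look-alikes are shown together."""
    tiles = defaultdict(list)
    for s in snippets:
        tiles[s.recognized_as].append(s)
    return dict(tiles)


def apply_rejects(tiles, rejected):
    """Split snippets into accepted ones and those the operator rejected."""
    accepted, to_completion = [], []
    for group in tiles.values():
        for s in group:
            (to_completion if s in rejected else accepted).append(s)
    return accepted, to_completion


snippets = [
    Snippet("F001", "age", "3"),
    Snippet("F002", "age", "3"),
    Snippet("F003", "age", "8"),
    Snippet("F004", "age", "3"),  # suppose this image is really an "8"
]
tiles = build_tiles(snippets)
rejected = {snippets[3]}  # operator marks the odd one out in the "3" tile
accepted, to_completion = apply_rejects(tiles, rejected)
print(len(tiles["3"]), len(accepted), len(to_completion))
```

Only the rejected snippets go on to manual completion, which is why grouping by recognized value reduces the keying workload.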
DATA ANALYSIS
• Provisional Population Totals for India and the States, compiled manually from the Enumerators' Abstracts, were declared within about four weeks: population, population aged 0-6, and number of literates
• Filled-in Schedules are collected, scanned and processed in two phases: Houselisting & Housing Census and Population Enumeration
• Extensive quality checks and data validation undertaken
• CSPro software used for tabulation
• More than 300 tables to be published on Census 2011 at National, State and District levels, including Primary Census Abstracts
Administrative Units in India
Levels shown on the slide: Country, State, District, Sub-district / C D Block, Town / Village, Ward, Panchayat, Village
Number of Administrative Units in India (Census 2011)
States/UTs 35
Districts 640
Sub-districts 5,924
Towns 7,935
Villages 0.64 million
Census - Not merely a head count
Biggest source of comprehensive data with information on:
• Population
• Age
• Marital Status
• Scheduled Castes
• Scheduled Tribes
• Mother Tongue & Language
• Religion
• Village Directory and Town Directory
• Literacy & Educational Status
• Economic activity
• Migration & Urbanization
• Fertility & Mortality
• Disability
• Housing
• Availability of amenities
Issues in Data Analysis
Census creates two separate databases:
• Houselisting & Housing Census Data (at Household level) (April to September 2010)
• Population Enumeration Data (at the level of individual members of the Household) (February 2011)
• In Census 2011, an attempt is being made to link these two databases to cross-tabulate information (an issue in the past, to be tested now)
• It then becomes possible to cross-tabulate, for example, Condition of Housing with Economic Condition
To use Abridged Houselist
• Boundary of the Enumeration Areas (EA) kept unchanged during the two phases of operation
• Provision made in the Household Questionnaire (Phase 2 Operation) to record the Household Number marked in the Phase 1 Operation
• The EA and HH Numbers to serve as link fields in the two databases
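A minimal sketch of the linkage described above: household records from Phase 1 are matched to person records from Phase 2 on the (EA, HH) pair, and the linked records are cross-tabulated. The field names, category labels and sample records are illustrative assumptions, not the Census file layout.

```python
from collections import Counter

# Hypothetical Phase 1 records (Houselisting & Housing Census), one per household.
housing = [
    {"ea": "0001", "hh": 1, "house_condition": "Good"},
    {"ea": "0001", "hh": 2, "house_condition": "Livable"},
    {"ea": "0002", "hh": 1, "house_condition": "Dilapidated"},
]

# Hypothetical Phase 2 records (Population Enumeration), one per person.
persons = [
    {"ea": "0001", "hh": 1, "activity": "Main worker"},
    {"ea": "0001", "hh": 1, "activity": "Non-worker"},
    {"ea": "0001", "hh": 2, "activity": "Main worker"},
    {"ea": "0002", "hh": 1, "activity": "Marginal worker"},
    {"ea": "0003", "hh": 5, "activity": "Non-worker"},  # no Phase 1 match
]

# Index households by the link fields (EA number, household number).
housing_by_key = {(h["ea"], h["hh"]): h for h in housing}

crosstab = Counter()
unmatched = []
for p in persons:
    h = housing_by_key.get((p["ea"], p["hh"]))
    if h is None:
        unmatched.append(p)  # keep for review rather than dropping silently
        continue
    crosstab[(h["house_condition"], p["activity"])] += 1

for (condition, activity), n in sorted(crosstab.items()):
    print(f"{condition:12} {activity:16} {n}")
print("Unmatched person records:", len(unmatched))
```

Counting the unmatched records is worth doing in any such pilot, since the match rate shows how reliably the household number was carried over between the two phases.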
• Generating time series tables from the previous censuses
• As the boundaries of Enumeration Areas (EAs) are not permanent, it is not possible to link an EA from one census to the next
• EAs are carved out on the basis of population size; therefore, if the population changes, the number of EAs carved out also varies
• Consequently, every Census has generated stand-alone databases
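A small sketch of why EA counts drift between censuses: if each EA is meant to hold up to a fixed number of people, the number of EAs in a locality follows its population. The target size of 600 and the village populations are assumed values for illustration only.

```python
import math


def number_of_eas(population, target_ea_size):
    """Number of Enumeration Areas needed if each holds at most target_ea_size people."""
    return math.ceil(population / target_ea_size)


TARGET_EA_SIZE = 600  # assumed figure for illustration only

# Hypothetical population of one village at two censuses.
for census_year, village_population in [(2001, 1700), (2011, 2500)]:
    print(census_year, number_of_eas(village_population, TARGET_EA_SIZE))
```

With these assumed figures the same village has three EAs at one census and five at the next, so there is no one-to-one EA match to build a time series on.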
• New districts, sub-districts, towns and villages have been created, which has impeded time series analysis
• The number of these administrative units has changed significantly over the last three Censuses
• An attempt is underway to link the databases available since 1991 Census on jurisdictional changes up to Town and Village levels.
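One common way to handle jurisdictional changes is a concordance table that maps each current unit to the earlier unit it was carved from, so that current figures can be re-aggregated to the older boundaries. The sketch below assumes whole-unit splits only; the district names and populations are hypothetical.

```python
from collections import defaultdict

# Hypothetical concordance: each 2011 district mapped to the 2001 district
# whose territory it came from (whole-unit splits only).
concordance_2011_to_2001 = {
    "District A": "District A",
    "District A-East": "District A",  # carved out of District A after 2001
    "District B": "District B",
}

population_2001 = {"District A": 3_000_000, "District B": 1_500_000}
population_2011 = {
    "District A": 2_100_000,
    "District A-East": 1_400_000,
    "District B": 1_800_000,
}

# Re-aggregate 2011 figures to 2001 boundaries so the two censuses are comparable.
population_2011_on_2001_boundaries = defaultdict(int)
for district, pop in population_2011.items():
    population_2011_on_2001_boundaries[concordance_2011_to_2001[district]] += pop

for district, pop_2001 in population_2001.items():
    pop_2011 = population_2011_on_2001_boundaries[district]
    growth = 100 * (pop_2011 - pop_2001) / pop_2001
    print(f"{district}: {growth:.1f}% decadal growth")
```

Units formed from parts of several older units would need the same mapping at a lower level (sub-district, town or village), which is why the linkage is attempted down to those levels.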
• In Census 2001, 1%/5% micro-data files on housing census released
– India and States (1% data)
– States and Districts (5% for large states and 10% for smaller states)
• Sample micro-data files from census on population enumeration not released in public domain
– Planning to make available micro-data files for research in institutions/universities through work-stations
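A sketch of drawing sample micro-data files at the fractions mentioned above. The slides do not state the sampling design; simple random sampling without replacement is assumed here, and the records are hypothetical.

```python
import random


def draw_sample(records, fraction, seed=0):
    """Simple random sample without replacement of about `fraction` of records."""
    rng = random.Random(seed)  # fixed seed so the extract is reproducible
    k = round(len(records) * fraction)
    return rng.sample(records, k)


# Hypothetical household records for one large state and one smaller state.
large_state = [{"state": "Large", "hh": i} for i in range(10_000)]
small_state = [{"state": "Small", "hh": i} for i in range(1_000)]

sample_large = draw_sample(large_state, 0.05)  # 5% for large states
sample_small = draw_sample(small_state, 0.10)  # 10% for smaller states
print(len(sample_large), len(sample_small))
```

A higher fraction for smaller states keeps the absolute sample size large enough for district-level estimates.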
Needs
• Linking of files pilot-tested
• Enhancing capacity of staff members in data processing and analysis unit in SPSS, SAS etc. at national and state levels
• Organizing jurisdictional changes (redistricting) for trend analysis
• Developing architecture for data warehousing and mining to enable trend and in-depth analysis
– Feasibility study to be undertaken
• Support in setting up work-stations for research in micro-data (anonymized) - good practices from other countries
Thank you