Transforming Administrative Data into Census Data: Challenges and Solutions
Explore the transition from traditional census to register-based approaches, focusing on transforming administrative data into census data. Delve into technical difficulties and challenges faced, such as inconsistency between registers, lack of a unified identification system, and differences in concepts. Discover types of administrative registers and methodologies for constructing integrated statistical registers. Gain insights into data linkage, conflict resolution, and more for a successful census process.
Presentation Transcript
United Nations Regional Workshop on the 2020 World Programme for Population and Housing Censuses for Arabic-speaking countries
5-8 December 2022, Algiers, Algeria
Session 12: Transforming administrative data into census data
United Nations Statistics Division, Demographic and Social Statistics Branch, Demographic Statistics Section
Outline
- Generic model for the transition from the traditional census to registers-based approaches
- Types of administrative registers
- Technical difficulties and challenges of using administrative data
- Statistical Population Register (SPR)
- Constructing integrated statistical registers for the purpose of census
  - Data linkage
  - Dealing with duplication
  - Conflict resolution
  - Updating and the Signs of Life methodology
  - Editing and imputation
- Research and testing
- Conclusion
Generic model for the transition from the traditional census to registers-based approaches
Types of Administrative Registers
Base registers:
- Population registers
- Building and dwelling registers
- Enterprise (business) registers
Specialized or supplementary registers:
- Social security or pension registers
- Tax registers
- Employment, unemployment and jobseeker registers
- Education and student registers
- Health registers
- Border control data
- Other supplementary registers
Technical difficulties and challenges of using administrative data
Technical challenges and difficulties in a register-based census include:
- Inconsistency between registers
- Timeliness and reference periods
- Lack of a unified identification system
- Classification systems
- Missing data and the under-coverage of some populations
- Differences in concepts and definitions
Register vs. Statistical Register
Register:
- Typically, a register is a structured list of units, containing a number of attributes for each of those units, and having a regular updating mechanism
- Built primarily for administrative purposes, not for statistical purposes
- Many administrative data files that are updated regularly can be considered registers, but the results of one-off data collections cannot
Statistical register:
- A statistical register is a register that is constructed and maintained for statistical purposes, according to statistical concepts and definitions, and under the control of statisticians
- Administrative registers can be used as sources for statistical registers in a one-way flow of data, in line with the Fundamental Principles of Official Statistics
Statistical Population Register (SPR)
- The SPR is a systematized and indexed collection of individual records for every resident (including nationals and foreign citizens) of a country, created by the national statistical office (NSO) by processing data from administrative registers for statistical purposes, in accordance with statistical concepts and definitions
- It is different from the registration of individuals in registers used for administrative purposes
- It is connected on a regular basis with the relevant administrative registers so that it can be updated regularly
- The legal framework for the use of administrative data must ensure that the SPR is used solely for statistical purposes
- The SPR is usually generated by the NSO from the administrative population register; if a centralized population register is lacking, the SPR can be constructed by integrating several population registers
Constructing integrated statistical registers for the purpose of census
Key processes involved in the construction of a statistical register are:
1. Data linkage
2. Dealing with duplication
3. Conflict resolution
4. Updating and the Signs of Life methodology
5. Editing and imputation
1. Data linkage
Record linkage refers to the identification and combination of records corresponding to the same entities (persons, enterprises, dwellings and households) in two or more data sources.
- Deterministic or exact matching: possible when unique IDs, such as personal identification numbers (PINs), exist in the data sources to be matched
- Probabilistic matching: performed when exact matching is not possible. Linkage is then based on probabilistic decision rules established from a set of key variables common to the data sources, such as name, sex, date of birth and address
- Combined approach: exact/deterministic linking is applied first to as many records as possible, followed by probabilistic linking for the remaining records
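The combined approach can be sketched in Python as follows. This is a minimal, simplified illustration in the spirit of Fellegi-Sunter scoring, not the method prescribed by the presentation: the `Record` layout, the weights in `WEIGHTS` and the `LINK_THRESHOLD` cut-off are all hypothetical, and real weights would be estimated from the data rather than fixed by hand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Record:
    rec_id: str
    pin: Optional[str]  # personal identification number; None if absent
    name: str
    sex: str
    birth_date: str  # ISO format, "YYYY-MM-DD"
    address: str

# Hypothetical (agreement, disagreement) weights per key variable. In practice
# they would be estimated from the data (e.g. as log2(m/u) ratios).
WEIGHTS = {"name": (4.0, -3.0), "sex": (1.0, -4.0),
           "birth_date": (5.0, -4.0), "address": (3.0, -1.0)}
LINK_THRESHOLD = 8.0  # hypothetical cut-off for declaring a probabilistic link

def match_score(a: Record, b: Record) -> float:
    """Sum agreement/disagreement weights over the key variables."""
    score = 0.0
    for field, (agree, disagree) in WEIGHTS.items():
        same = getattr(a, field).strip().lower() == getattr(b, field).strip().lower()
        score += agree if same else disagree
    return score

def link(source_a: List[Record], source_b: List[Record]) -> List[Tuple[str, str, str]]:
    links, used_b, leftover_a = [], set(), []
    b_by_pin = {r.pin: r for r in source_b if r.pin}
    # Step 1: deterministic (exact) matching on PIN.
    for a in source_a:
        b = b_by_pin.get(a.pin) if a.pin else None
        if b is not None and b.rec_id not in used_b:
            links.append((a.rec_id, b.rec_id, "exact"))
            used_b.add(b.rec_id)
        else:
            leftover_a.append(a)
    # Step 2: probabilistic matching of the remaining records against the
    # still-unlinked records of source B (greedy, one-to-one).
    for a in leftover_a:
        candidates = [b for b in source_b if b.rec_id not in used_b]
        if not candidates:
            break
        best = max(candidates, key=lambda b: match_score(a, b))
        if match_score(a, best) >= LINK_THRESHOLD:
            links.append((a.rec_id, best.rec_id, "probabilistic"))
            used_b.add(best.rec_id)
    return links
```

A single threshold is used here for brevity; a fuller implementation would typically add a second, lower threshold and send the pairs in between to clerical review, and would use approximate string comparison instead of exact equality on names and addresses.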
1. Data linkage: identification numbers
- Unique identification numbers greatly facilitate the linkage of several administrative data sources
- The unique identifier should be common across all relevant registers
- ID numbers are often created for administrative purposes, for use in population registers, civil registers, national identification systems or other administrative registers (although they are sometimes created for statistical purposes)
- The way ID numbers are implemented varies among countries. Sometimes the number relates to attributes of the individual; sometimes it is a unique, information-free number. It may be issued to citizens at birth or upon reaching legal age
- To increase personal data protection, ID numbers should be encrypted to prevent the information from being read by unauthorized parties
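One common way to protect ID numbers while keeping them usable for linkage is to replace them with a keyed hash. Note that this swaps in a different technique from reversible encryption: a keyed hash (HMAC-SHA256 from Python's standard library) is one-way. The key name, its value and the normalization rule below are hypothetical.

```python
import hashlib
import hmac

def pseudonymise_pin(pin: str, secret_key: bytes) -> str:
    """Replace a PIN with a keyed SHA-256 hash (HMAC).

    The same PIN and key always give the same token, so linkage on the token
    still works, but the PIN cannot be recovered by anyone without the key.
    """
    # Hypothetical normalization: ignore spaces and hyphens in the PIN.
    normalised = pin.replace(" ", "").replace("-", "").upper()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()

key = b"hypothetical-key-held-by-the-NSO"
token_tax = pseudonymise_pin("1985-0512-345", key)
token_health = pseudonymise_pin("19850512345", key)
print(token_tax == token_health)  # True: the two registers can still be linked
```

Because PINs usually come from a small, structured number space, anyone holding the key could rebuild the mapping by hashing every possible PIN, so under this approach the key itself must be protected as strictly as the original IDs.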
1. Data linkage: linking persons to dwellings/households
- The basic counting units of a census include persons, households, families and dwellings
- When linking persons to dwellings, all units require identification; however, the minimum necessary identifiers are those for persons (PIN) and dwellings (address code and/or spatial coordinates)
- In a traditional census, households are built using the housekeeping concept. This is challenging in a register-based census, so many countries instead use the household-dwelling concept, which considers all persons living in the same housing unit as one household
- In some countries a distinct household register exists, which eases the process of building households
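Under the household-dwelling concept, building households reduces to grouping persons by the dwelling they are linked to. The sketch below assumes a simple, hypothetical input: a mapping from PIN to dwelling address code, with `None` where no dwelling link was found.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

def build_households(
    person_dwelling: Dict[str, Optional[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Household-dwelling concept: everyone linked to the same dwelling
    (address code) forms one household. Persons with no dwelling link are
    returned separately for follow-up."""
    households: Dict[str, List[str]] = defaultdict(list)
    unplaced: List[str] = []
    for pin, dwelling in sorted(person_dwelling.items()):
        if dwelling:
            households[dwelling].append(pin)
        else:
            unplaced.append(pin)
    return dict(households), unplaced

households, unplaced = build_households({"P1": "D1", "P2": "D1", "P3": "D2", "P4": None})
print(households)  # {'D1': ['P1', 'P2'], 'D2': ['P3']}
print(unplaced)    # ['P4']
```

Persons without a dwelling link cannot be placed in a household under this concept, which is one reason the dwelling identifier is listed among the minimum necessary identifiers.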
2. Dealing with duplication
To avoid significant coverage issues, an adequate process for detecting and removing duplicates should be in place. Duplication of persons in the statistical population register can occur if:
- good record linkage methods are not in place
- identifiers are not of good quality across data sources, causing false matches or missed matches between records
- newly created units (true births or immigrants) and deleted units (true deaths or emigrants) are not well reflected in the administrative data sources
Maintaining a log of changes to the registers is helpful.
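A minimal duplicate-removal step that also keeps a log of changes might look like the sketch below. The record layout (dicts with `rec_id`, `pin`, `name`, `birth_date`, `sex`) and the fallback key of normalized name, date of birth and sex are hypothetical choices; this simple version does not catch a person who appears once with a PIN and once without.

```python
from typing import Dict, List, Tuple

def dedup_key(r: Dict) -> Tuple:
    """Use the PIN when present; otherwise fall back to normalised name,
    date of birth and sex (a hypothetical choice of key variables)."""
    if r.get("pin"):
        return ("pin", r["pin"])
    return ("demo", " ".join(r["name"].lower().split()), r["birth_date"], r["sex"])

def deduplicate(records: List[Dict], change_log: List[Tuple[str, str]]) -> List[Dict]:
    """Keep the first record for each key; log every record that is dropped."""
    seen, kept = set(), []
    for r in records:
        key = dedup_key(r)
        if key in seen:
            change_log.append(("removed_duplicate", r["rec_id"]))
        else:
            seen.add(key)
            kept.append(r)
    return kept
```

Keeping the first record is the simplest rule; in practice the surviving record would more likely be chosen, or assembled, using the priority rules applied during conflict resolution.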
3. Conflict resolution
When multiple administrative data sources are combined to create statistical registers, the values of a variable may be inconsistent across sources.
Potential reasons:
- delay in reporting changes by individuals
- delay in updating by administrative authorities
- incidents of multiple homes
- differing definitions, classifications or reference periods
- errors in one source
To resolve:
- identify which source is most likely to be up to date and accurate for a particular variable
- establish a priority/decision rule for each variable
- ensure that lower-priority sources do not overwrite data from a higher-priority source
Adverse effects:
- coverage errors due to conflicting (or multiple) address information
- content errors due to inadequate priority/decision rules
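A per-variable priority rule can be expressed as an ordered list of sources, taking the value from the first source that actually has one. The source names and orderings in `PRIORITY` are hypothetical; each NSO would set them after assessing which source is most up to date and accurate for each variable.

```python
from typing import Dict, Optional, Tuple

# Hypothetical priority order per variable: earlier sources take precedence.
PRIORITY = {
    "address": ["population_register", "tax_register", "social_security"],
    "marital_status": ["civil_registration", "population_register"],
}

def resolve(variable: str, values: Dict[str, Optional[str]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source) from the highest-priority source that has a
    non-missing value, so a lower-priority source never overwrites a
    higher-priority one. Sources not listed for the variable are ignored."""
    for source in PRIORITY.get(variable, []):
        value = values.get(source)
        if value not in (None, ""):
            return value, source
    return None, None

print(resolve("address", {"tax_register": "12 Rue B", "population_register": "3 Rue A"}))
# ('3 Rue A', 'population_register')
```

Returning the source alongside the value makes it possible to record which rule produced each census value, which helps when evaluating whether the priority rules themselves are adequate.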
4. Updating and the Signs of Life methodology
- The SPR needs to be continuously updated
- The basic minimum input for keeping the SPR up to date is the information obtained from the civil registration of births, deaths and marriages, and of any changes of address resulting from either internal or international migration
- The Signs of Life (SOL) methodology is a commonly used tool to ensure that the SPR covers the census population, i.e., that it includes only persons who are alive and meet a set of pre-defined residency criteria
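Applying that minimum input to the SPR can be sketched as processing a date-ordered stream of events. The event type names, the record layout and the choice to delete records on death or emigration are all hypothetical simplifications; a production SPR would more likely mark such records inactive and keep their history.

```python
from typing import Dict, Iterable, Tuple

SPR = Dict[str, Dict[str, str]]            # PIN -> attributes
Event = Tuple[str, str, Dict[str, str]]    # (event type, PIN, payload)

def apply_events(spr: SPR, events: Iterable[Event]) -> SPR:
    """Apply civil-registration and migration events, in date order, to the SPR.
    Marriage or address events for persons not in the SPR are skipped here."""
    for event_type, pin, payload in events:
        if event_type in ("birth", "immigration"):
            spr[pin] = dict(payload)                  # new resident enters
        elif event_type in ("death", "emigration"):
            spr.pop(pin, None)                        # resident leaves the population
        elif event_type == "marriage" and pin in spr:
            spr[pin]["marital_status"] = "married"
        elif event_type == "change_of_address" and pin in spr:
            spr[pin]["address"] = payload["address"]  # internal migration
        elif event_type not in ("marriage", "change_of_address"):
            raise ValueError(f"unknown event type: {event_type}")
    return spr
```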
4. Updating and the Signs of Life methodology
- Signs of Life (SOL) is a set of "activity rules" that can be used to check the various administrative data sources available to the NSO in order to determine whether or not a person is alive and resident at a particular point in time
- No list of SOL markers can be conclusive; however, the more markers that are used, the more accurate the judgement will be
- SOL markers may be determined using data from civil registration and vital statistics (CRVS), tax registers, social security, unemployment databases, education databases, etc.
- If a person has been active (has at least one record) in a register during a specific year, the value of the SOL marker for that register is 1; otherwise it is 0
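The 0/1 marker rule translates directly into code. Combining the markers into an inclusion decision requires a residency rule that each country defines; the rule below (not registered as dead and at least `min_signs` active registers in the reference year) is a hypothetical placeholder, as is the representation of each register as a set of (PIN, year) pairs.

```python
from typing import Dict, Set, Tuple

Activity = Set[Tuple[str, int]]  # (PIN, year) pairs with at least one record

def sol_markers(pin: str, year: int, registers: Dict[str, Activity]) -> Dict[str, int]:
    """1 if the person has at least one record in the register that year, else 0."""
    return {name: int((pin, year) in activity) for name, activity in registers.items()}

def include_in_census(pin: str, year: int, registers: Dict[str, Activity],
                      deaths: Set[str], min_signs: int = 1) -> bool:
    """Hypothetical residency rule: not registered as dead and at least
    `min_signs` registers show activity in the reference year."""
    if pin in deaths:
        return False
    return sum(sol_markers(pin, year, registers).values()) >= min_signs

registers = {"tax": {("P1", 2022)}, "education": {("P2", 2022)},
             "social_security": {("P1", 2021)}}
print(sol_markers("P1", 2022, registers))  # {'tax': 1, 'education': 0, 'social_security': 0}
```

Raising `min_signs`, or weighting registers differently, trades over-coverage (people who have left but still appear in one register) against under-coverage (residents with little administrative activity).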
4. Updating and the Signs of Life methodology
To use SOL markers, two key pre-conditions need to be satisfied, at least at the national level:
- administrative registers should contain records in which all persons in all registers are identified by their unique ID codes, and all living quarters in all registers are identified by their unique address IDs; and
- all registers should cover the whole population and be updated regularly, at least annually
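The first pre-condition can be checked mechanically before SOL markers are computed. The sketch covers only ID completeness; coverage and update frequency need separate checks. It assumes a hypothetical record layout in which every record has a `pin` field and records that refer to living quarters carry an `address` field and should also carry an `address_id`.

```python
from typing import Dict, List

def check_id_preconditions(registers: Dict[str, List[Dict]]) -> List[str]:
    """Return a list of problems; an empty list means every record has a
    person ID and every record referring to a living quarter has an address ID."""
    problems = []
    for name, records in registers.items():
        missing_pin = sum(1 for r in records if not r.get("pin"))
        missing_addr = sum(1 for r in records if "address" in r and not r.get("address_id"))
        if missing_pin:
            problems.append(f"{name}: {missing_pin} record(s) without a unique person ID")
        if missing_addr:
            problems.append(f"{name}: {missing_addr} record(s) without an address ID")
    return problems
```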
5. Editing and imputation
Editing and imputation procedures keep the SPR clean and consistent, with no conflicts between individual data items and no missing or improbable values.
- Editing: correcting values that are clearly erroneous or implausible
- Imputation: inserting plausible values where data items are missing
Administrative data sources should be carefully investigated to detect and resolve systematic errors (either coverage or content errors). Examining metadata, particularly on the editing procedures used within the administrative authority, is critical to understanding any systematic limitations.
Resolving approaches:
- where data come from several sources, editing and consistency checks should be carried out simultaneously across those sources
- maintain close contact with the administrative authorities to improve the way that data are collected and recorded
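A small editing-then-imputation pass can be sketched as follows. The edit rules (age above a maximum is implausible; "married" below a minimum age is inconsistent) and their thresholds are hypothetical, and the imputation step uses modal imputation within classes of sex and 10-year age band, a deliberately simple substitute for the hot-deck or model-based methods an NSO might actually use.

```python
from collections import Counter, defaultdict
from typing import Dict, List

MAX_AGE = 120           # hypothetical plausibility bound
MIN_MARRIAGE_AGE = 15   # hypothetical consistency-rule threshold

def edit(record: Dict) -> Dict:
    """Blank out values that fail simple plausibility/consistency rules,
    so that they can be imputed afterwards."""
    r = dict(record)
    age = r.get("age")
    if age is not None and not 0 <= age <= MAX_AGE:
        r["age"] = None
    elif age is not None and age < MIN_MARRIAGE_AGE and r.get("marital_status") == "married":
        r["marital_status"] = None
    return r

def impute_marital_status(records: List[Dict]) -> List[Dict]:
    """Fill missing marital_status with the most frequent value among records
    in the same imputation class (sex, 10-year age band)."""
    def imputation_class(r: Dict):
        return (r["sex"], None if r["age"] is None else r["age"] // 10)

    counts: Dict = defaultdict(Counter)
    for r in records:
        if r.get("marital_status"):
            counts[imputation_class(r)][r["marital_status"]] += 1
    result = []
    for r in records:
        r = dict(r)
        donors = counts.get(imputation_class(r))
        if not r.get("marital_status") and donors:
            r["marital_status"] = donors.most_common(1)[0][0]
        result.append(r)
    return result
```

Editing first and imputing second means that a value rejected by an edit rule is treated exactly like a missing value, so both are filled by the same procedure.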
Research and testing
NSOs should never integrate new administrative data sources into census data production without feasibility research. Feasibility research involves:
- developing a detailed understanding of the administrative authority's data collection processes, the population covered and the variables included in the administrative source, as well as how accessible these data are
- obtaining and examining test data in detail to identify quality issues and to define cleaning and harmonization procedures, along with validation checks
- combining the source with other available registers to verify data quality and to select the most reliable variables and values, in accordance with the methodological rules developed
- producing estimates from the test data and evaluating them by comparison with previous census results or other data sources
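The last step, evaluating estimates from test data against a benchmark, can start with a simple relative-difference check by category. The 5% tolerance and the age-group categories are hypothetical. A flagged difference is a prompt for investigation rather than proof of error, since real population change since the previous census also produces differences.

```python
from typing import Dict

def compare_estimates(test_estimates: Dict[str, int], benchmark: Dict[str, int],
                      tolerance: float = 0.05) -> Dict[str, float]:
    """Return {category: relative difference} for categories whose absolute
    relative difference from the benchmark exceeds the tolerance.
    Categories with a zero benchmark are skipped (relative difference undefined)."""
    flagged = {}
    for category, bench in benchmark.items():
        if bench == 0:
            continue
        rel = (test_estimates.get(category, 0) - bench) / bench
        if abs(rel) > tolerance:
            flagged[category] = round(rel, 4)
    return flagged

print(compare_estimates({"0-14": 1020, "15-64": 2700, "65+": 600},
                        {"0-14": 1000, "15-64": 3000, "65+": 500}))
# {'15-64': -0.1, '65+': 0.2}
```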
Research and testing
While performing research and testing, NSOs should address the following challenges when deriving census characteristics:
- ascertain the international standards (definitions, classifications, etc.) applicable to the target census characteristic
- compare and contrast census definitions and classifications with those used in the administrative source
- test the accuracy of the administrative data recorded against alternative sources, and work collaboratively with data suppliers to eliminate or mitigate any shortcomings
- determine which, and how many, sources are required to derive each target census characteristic and assure its quality
- establish optimal rules for deriving each census characteristic and develop the necessary data processing software, optimised for the quality of outputs sought; and
- where characteristics are not covered by any administrative source, take steps to ensure the creation of the necessary register (e.g. suggest amendments to register procedures, the legal environment, etc.)
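Comparing an administrative classification with the census classification often ends in a concordance table. The sketch maps hypothetical education codes from an administrative source to hypothetical census categories and reports codes the concordance does not cover; a real mapping would be built from the applicable international standard (for education, ISCED) and the source's own metadata.

```python
from typing import List, Optional, Tuple

# Hypothetical concordance: administrative education codes -> census categories.
CONCORDANCE = {
    "PRIM": "Primary",
    "LSEC": "Lower secondary",
    "USEC": "Upper secondary",
    "UNIV": "Tertiary",
}

def derive_characteristic(admin_codes: List[str],
                          concordance: dict) -> Tuple[List[Optional[str]], List[str]]:
    """Map admin codes to census categories. Unmapped codes yield None and are
    reported, so the concordance (or the source) can be reviewed."""
    derived, unmapped = [], set()
    for code in admin_codes:
        category = concordance.get(code)
        if category is None:
            unmapped.add(code)
        derived.append(category)
    return derived, sorted(unmapped)

print(derive_characteristic(["PRIM", "UNIV", "VOC", "PRIM"], CONCORDANCE))
# (['Primary', 'Tertiary', None, 'Primary'], ['VOC'])
```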
Conclusion
- Constructing the Statistical Population Register (SPR) is a complicated process involving several steps that must be followed with extreme caution
- The basic minimum input for keeping the SPR up to date is the information obtained from the civil registration of births, deaths and marriages, and of any changes of address resulting from either internal or international migration
Resources:
- UNSD: Handbook on Registers-based Population and Housing Censuses, https://unstats.un.org/unsd/demographic-social/meetings/2021/egm-20211215/hb-reg-phc.pdf
- ECE: Guidelines for assessing the quality of administrative sources, https://unece.org/sites/default/files/2021-10/ECECESSTAT20214_WEB.pdf
- ECE: Guidelines on the use of registers and administrative data for population and housing censuses, https://unece.org/DAM/stats/publications/2018/ECECESSTAT20184.pdf
- ECE: Register-based statistics in the Nordic countries, https://unece.org/DAM/stats/publications/Register_based_statistics_in_Nordic_countries.pdf