
From GSIM v1.0 Glossary

Survey Population

A Population for which information can be obtained in a survey.

A Population which can realistically be studied (example: people currently residing in the province of Ontario not in an institution nor in a remote northern location nor temporarily out of the province). The Survey Population is therefore often a subset of the Target Population.

Synonyms: object class

Frame population

A Population represented by records in a frame, which is the observable part of a Target Population and provides a reasonable approximation to it.

Example: most recent population census frame 

Synonyms: object class

 

 

 

14 Comments

  1. Understanding Populations -

    Consider the following survey: estimate the percentage of US adults who have health insurance.

    Conduct the survey by contacting households and people through random digit dialing.

    Populations:

    1) Target population - US adults

    2) Survey population - US adults with a working telephone

    3) Frame population - Working household/personal telephones
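    As a rough illustration of how the three populations in this example relate, here is a minimal Python sketch on invented toy data. The records, phone numbers, and variable names are all assumptions made for illustration, and it simplifies by assuming at most one telephone per adult, which a real random digit dialing survey would not guarantee.

```python
import random

# Hypothetical toy data: each record is one adult (a unit in the target population).
# "phone" is None when the adult has no working telephone.
adults = [
    {"id": 1, "phone": "555-0101", "insured": True},
    {"id": 2, "phone": "555-0102", "insured": False},
    {"id": 3, "phone": None, "insured": False},
    {"id": 4, "phone": "555-0104", "insured": True},
    {"id": 5, "phone": None, "insured": True},
]

# 1) Target population: all adults.
target_population = {a["id"] for a in adults}

# 2) Survey population: adults who can actually be reached by telephone.
survey_population = {a["id"] for a in adults if a["phone"] is not None}

# 3) Frame population: working telephone numbers, i.e. proxies for people.
frame_population = {a["phone"] for a in adults if a["phone"] is not None}

# The sample is drawn from the frame (numbers) and then mapped back to persons.
phone_to_adult = {a["phone"]: a for a in adults if a["phone"] is not None}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sampled_numbers = rng.sample(sorted(frame_population), k=2)
respondents = [phone_to_adult[n] for n in sampled_numbers]

# Share of the target population covered by the survey population.
coverage = len(survey_population) / len(target_population)
print(f"coverage of target by survey population: {coverage:.0%}")
```

    The point of the sketch is only that the frame holds telephone numbers while the survey and target populations hold persons, so the sample elements point to persons rather than being persons.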

  2. The issue triggers a set of reflections.

    1) I am not very familiar with random digit dialing, but as I understand it you do not need a frame from which to take your survey units. You still need a frame for raising and weighting. I assume that this frame is not generated in the same way as the sample, but from administrative information on telephone connections (with problems of over- and under-coverage, e.g. due to time lags). My point is that in this example the usual relation between sample and frame is not valid. I also wonder whether the survey population should instead have been defined in terms of telephones. This would make the example less suitable to me.

    2) I am quite surprised by the synonyms mentioned above. Should I conclude that survey population and frame population are synonyms?

    3) On the explanation to the definition of survey population: why only mention the under coverage, and not mention potential over coverage (persons temporarily in the province)?

    4) Why do we need the fourth population concept: analysis population and analysis unit? I am familiar with the concept of analytical units in business statistics. These units are not units in themselves, but are artificially created for statistical purposes (for instance to make the units more homogeneous in economic activity).

     

     

  3. 1) The sample is indeed selected from the frame in this survey.  What may be confusing is that what constitutes the sample (phone numbers) and the units contacted (persons) are not the same.

    I don't understand the point about weights.

    By a frame here I mean the set of things that exemplify the frame population.  That is consistent with the first part of the definition above.  A sample is selected from the frame.

    The usual relation between sample and frame is maintained here, but as I said above, the sample elements only point to elements in the target population; they aren't part of the target population themselves.

    2) The second part of the definition of frame population is not correct.  My example illustrates that a more complicated relationship exists.  Frame population and survey population are not synonyms.

    3) That is only an example.  The issue is the survey population is the population you can actually measure, yet you use it to make an estimate for the target population - persons with telephones are a proxy for all persons, for instance.

    4) Analytical populations and units appear as a description of tables and the like.  I agree, this may require a little more thought to make sure the concept is useful.

    1. Dan, it is probably my ignorance of random digit dialing, and I am reluctant to show my ignorance to the world. I imagined that a machine would dial random numbers without any frame. This way you verify that the telephone actually works. I already wondered how you would introduce the distinction between private telephone numbers and numbers related to businesses and organisations. I now understand that you do start from a frame of private telephone numbers (an administrative file). I do not see how you can certify that the telephone numbers in your frame actually constitute working telephone connections.

      Anyway, if the survey population is taken from the frame, they should have the same units, in this case working private telephones.

      (My remark on raising and weighting was based on the assumption that the usual relation between frame and survey population was not valid)

  4. Wim, I think in the vast majority of cases, the frame, survey, and target populations have a lot in common.  So, it is confusing to consider these in cases where they might be quite different.  In a usual establishment survey, the populations might look like this:

    frame - known business establishments (rendered as a business register)

    survey - known business establishments (Note - this is the same!)

    target - business establishments

    Similar constructs are used in household surveys with either a housing or population register as the frame.

    Another case that might help clarify differences is in political polling prior to elections.  The US relies on these polls extensively in the run-up to elections.  They are almost always RDD surveys.  For these polls, the populations are as follows:

    Frame - working private telephone number

    Survey - people who say they are likely to vote

    Target - voters

    In this case and my previous example using RDD, the frame contains proxies (phone numbers) for units in the survey population.

  5. Thank you for the explanations. For me the more familiar example of different units in the target population and in the frame is to have a frame of addresses and a survey of addresses pointing to persons.

    Your new opinion poll example would for me be a kind of two-stage survey: first a selection from working private telephone numbers and a survey on voters/non-voters linked to the chosen numbers; then, with the voters as the new survey population, a survey of the voters on their voting behaviour. These two steps will probably be integrated in one questionnaire, but that does not change the view on the design of the process and the information objects needed.

    I wonder whether the definitions we have express well the differences between the different types of population. The difference is not in the data structure, it is in the function in the statistical process. Actually, the same data set can be a survey population in one phase of the process and in a later phase of the process become a frame.

  6. I'm really confused.

    A Population is an aggregation of Units (figure 11 in the specification). Frame Population, Survey Population, Target Population and Analysis Population are sub-types of Population (also figure 11). Therefore, each Frame Population, Survey Population, Target Population and Analysis Population must specify a subset of units in the Population.

    For example, Frame Population is defined as "Population represented by records in a frame, which is the observable part of a Target Population and provides a reasonable approximation to it." The units in the Frame Population must be a subset of the same Population as the subset of units in the Target Population.

    In his comment of 4 June, Dan says "1) The sample is indeed selected from the frame in this survey.  What may be confusing is that what constitutes the sample (phone numbers) and the units contacted (persons) are not the same." This introduces some unmodeled objects: Sample and Frame. While a frame may well consist of telephone numbers, that's surely not the same as the Frame Population. You're not seeking the political views of phone numbers. Maybe it is people answering known phone numbers.

    I think Wim might have the key to a solution when he says "Actually, the same data set can be a survey population in one phase of the process and in a later phase of the process become a frame." Following from this, perhaps there is no need to have sub-types of Population at all. Perhaps all the meaning we need could be represented in how Population relates to other objects, or maybe even in understanding what type of processes consume and create Population objects.

  7. 2/7/13 meeting: Carrie Ashley to coordinate ABS input

  8. Current content in GSIM and key posts to date

    There are four sub-types of Population identified in GSIM

    • Survey Population
    • Target Population
    • Frame Population
    • Analysis Population

    In the Enterprise Architect file, Population itself is NOT abstract.  (This means you can have concrete examples of Population in its own right, not just examples of its subtypes?)

    Jenny Linnerud originally posted this issue, regarding the definition and use of the subtypes, particularly Survey Population and  Frame Population.

    Dan and Wim discussed examples and definitions related to the subtypes.

    Gareth then commented he was "really confused".  The conclusion of his post was

    I think Wim might have the key to a solution when he says "Actually, the same data set can be a survey population in one phase of the process and in a later phase of the process become a frame." Following from this, perhaps there is no need to have sub-types of Population at all. Perhaps all the meaning we need could be represented in how Population relates to other objects, or maybe even in understanding what type of processes consume and create Population objects.


    Key Issues

    If GSIM is going to define subtypes of Population then these definitions need to be clear, agreed and applied consistently.  There is a broader question, however, of whether it is necessary and appropriate to have subtyping of Population within GSIM at all.

     

    Key Considerations

    In following up this issue, John Machin and his team held extensive discussions with methodologists across the ABS who work with definition and use of (subtypes of) populations (and units).  The original intent was to provide improved, "methodologist agreed", definitions and examples of the subtypes.  

    It became apparent through the investigations that economic and social statistics areas within the ABS subtype populations on different and inconsistent bases.  Around 20 potential subtypes were identified, with some being refinements of broader subtypes.  This raised the question of how to determine what "granularity" of subtyping GSIM should aim to have agreement on.  Seeking firm international agreement on a reference categorisation and definition of Population subtypes would seem to be a major undertaking, one which would be more appropriately led by methodologists rather than metadata modellers.          

    The issue that Gareth raises was also raised by ABS methodologists.  The subtypes relate primarily to the purpose for which a population is being defined/used in a particular case.  Exactly the same population in terms of constituent units could have a different "subtyped" use/role in different contexts.  There is typically no set of different attributes that needs to be recorded about a population depending on its use/role based subtype.

    While it is necessary to be able to identify the role a particular population is performing in a particular instance, this is an attribute to be recorded in regard to (eg) the role of Process Inputs, or a description of methodology, rather than being intrinsic to the definition of the Population itself.  
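    One way to picture this consideration is the following minimal Python sketch, in which the role is recorded on the link between a process step and a population rather than on the population itself. The class names (`PopulationRole`, `Population`, `ProcessInput`) and their fields are hypothetical and are not taken from the GSIM specification; they only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class PopulationRole(Enum):
    """Roles a population can play; assumed labels mirroring the V1.0 subtypes."""
    TARGET = "target"
    FRAME = "frame"
    SURVEY = "survey"
    ANALYSIS = "analysis"


@dataclass(frozen=True)
class Population:
    """A population with no role-specific attributes of its own."""
    name: str
    unit_ids: frozenset


@dataclass(frozen=True)
class ProcessInput:
    """Hypothetical link between a process step and a population."""
    process_step: str
    population: Population
    role: PopulationRole


voters = Population("likely voters", frozenset({"p1", "p2", "p3"}))

# The same Population object plays different roles in different steps.
screening = ProcessInput("screening interview", voters, PopulationRole.SURVEY)
follow_up = ProcessInput("follow-up sample selection", voters, PopulationRole.FRAME)
```

    In this sketch nothing about `voters` changes between the two steps; only the `role` recorded on each `ProcessInput` differs, which is the sense in which the role is not intrinsic to the Population.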

     

    Key Options

    Option A: Remove reference to Population subtypes from GSIM V2.0

    Option B: Seek to clarify and improve definitions and examples for Population subtypes for GSIM V2.0, obtaining "best fit" to the views of methodologists in the time available.

     

    Recommendation  

    Option A

    Descriptions of Population could still refer to examples to make it clear, for example, that Population is inclusive of populations that typically cannot be enumerated precisely in terms of the specific units they contain at a particular point in time (eg Target Population) and those which can (eg Frame Population and Sample Population).  It is just that particular sub-typing is not structural to GSIM as a model.

     

    Rationale

    It appears highly questionable whether sub-typing of Population belongs in any future version of GSIM.

    Even if sub-typing of Population were to be appropriate for a future release of GSIM, however, improving specification and documentation of the subtyping from GSIM V1.0 does not appear to be a high value inclusion for GSIM V2.0.  Time could be better spent on addressing higher priority improvements for GSIM V2.0.

    Seeking greater clarity and agreement on Population subtypes within the methodological community internationally, before seeking to add subtypes back into GSIM, would appear appropriate IF such subtypes belong in GSIM in the first place.     

     

    High level implications

    Feedback from ABS Methodologists suggested subtyping of Unit (currently Observation Unit and Analysis Unit) has the same issue as subtyping of Population.  The same unit can, for example, be both an Observation Unit and an Analysis Unit depending on context.  It is recommended that if subtyping is removed from Population for GSIM V2.0, it also be removed from Unit.

  9. The survey population is the collection of the units, while, in addition, the frame population also contains information about the units (addresses, travel costs connected to interviews etc.) necessary to draw an optimal sample.  An example of the difference between the frame population and the survey population is that a unit (e.g. a person) in a survey population may remain the same from one year to another, but if he/she changes his/her address, the corresponding unit in the frame population will change.
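    The distinction drawn in this comment could be sketched as follows in Python. The classes `Unit` and `FrameRecord`, their fields, and the example values are assumptions for illustration only; they are not GSIM objects.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Unit:
    """A unit in the survey population, identified only by its id."""
    unit_id: str


@dataclass(frozen=True)
class FrameRecord:
    """A frame entry: the unit plus information needed to draw a sample."""
    unit: Unit
    address: str
    travel_cost: float


person = Unit("person-001")
record_year1 = FrameRecord(person, "1 Example Street", 12.5)

# The person moves: the frame record changes, the unit does not.
record_year2 = replace(record_year1, address="9 Sample Road", travel_cost=30.0)
```

    Here `record_year1` and `record_year2` are different frame records that both refer to the same unchanged `Unit`.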

  10. Discussion 27/8:

    Agreed to remove subtypes of population and unit.

    How is GSIM going to support sampling? We don't think that population subtypes would solve this.

    We could have roles for populations rather than subtypes - how? It is the relationship with/to the object not the object itself.

    1. This very much depends on what GSIM should provide with regard to sampling.  I was part of a group that built a model for describing a sampling scheme for DDI.  We could take a look at that.

      Describing a population is nothing like describing a sample, except there is a population associated with a sample.

  11. Dan, I am going to close this issue. The sampling discussion should be raised as a separate issue so we don't get confused.