Here are our answers to questions you have asked. For ease of access, they have been grouped by theme below. Don’t hesitate to submit your own question(s) using this form. We will try our best to get back to you as soon as possible!
For additional Q&A, check out the NCES Frequently Asked Questions (FAQ).
If you do not see your question answered below, ask us!
Is there special software to use to analyze PIAAC data?
Can one just use SAS, Stata or SPSS normally to analyze PIAAC data?
Does one need to use the macros at all times when dealing with PIAAC data?
I am looking at the relationship between problem solving and educational attainment. I want to use the data from the PIAAC 2012 survey but it was difficult using the code book to find the Scaled scores that are Below 1, Level 1, 2, and 3 for problem solving. How was that coded?
How can I calculate the number/share of people who were excluded from problem solving in technology-rich environments (PS-TRE) due to their lack of computer skills?
How do I conduct analysis with the IALS and ALL data I received from Statistics Canada?
How do I access U.S. data files?
What are the differences between the U.S. Public Use File (PUF) on the NCES website and the U.S. PUF available for download on the Organization for Economic Co-operation and Development (OECD) PIAAC web site? Is there only information added or has some old information been removed?
What additional information is available in the U.S. Restricted Use File (RUF)? How can I gain access to the RUF?
Is there any way to get figures for the U.S. competencies grouped by state, county, or region?
I am trying to clarify how the Education and Skills Online assessment can be utilized. Would geographic areas (counties) be able to get data on their region? When will it be available? Will there be any cost in establishing a specific group/region for assessment purposes?
I'm looking for the results of the PIAAC U.S. Prison Study. In particular, I would like to see the results of the background questionnaire in that group. Are those results available yet? And if so, how could I find them?
Has information on country of birth been collected in the United States?
I am trying to link observations geographically and by occupation with other data. Is an occupation variable with 4-digit ISCO codes available in the restricted-use dataset? Is the occupation variable with 2-digit ISCO codes available in the public-use dataset or only in the restricted-use dataset?
How can I compare results from PIAAC with results from the Program for International Student Assessment (PISA) in Turkey?
Have the test items for the Computer Literacy Core (CLC) and Computer Numeracy Core (CNC) been released? If so, where can I find them?
How is the derived employment variable C_D05 defined? It appears to reflect a current employment status, but when matched with other employment status variables, such as C_Q07, a number of people are presenting as employed in one and unemployed in another.
Which items comprise the “Readiness to Learn” derived index in the PIAAC data set?
What is the meaning when variables are "Derived by CAPI" or "Trend-IALS/ALL"?
How does PIAAC convert educational attainment to years of school?
DATA ANALYSIS
Q1: Is there special software to use to analyze PIAAC data?
A: You can use SAS, Stata, or SPSS programs, or an online tool, the International Data Explorer (IDE), to analyze PIAAC data. The Organization for Economic Co-operation and Development (OECD), in collaboration with international partners, has developed SAS and Stata macros that incorporate the PIAAC complex sampling and assessment designs. SPSS macros were developed by the Data Processing and Research Center of the International Association for the Evaluation of Educational Achievement (IEA-DPC) and take the form of a free add-on, the IDB Analyzer, which generates SPSS syntax for analysis. For basic analysis, you can use the IDE, a user-friendly online tool. The NCES IDE can be found here: http://nces.ed.gov/surveys/international/ide/ and the OECD IDE can be found here: http://piaacdataexplorer.oecd.org/ide/idepiaac/. An explanation of the differences between the NCES IDE and the OECD IDE can be found on the NCES IDE homepage. For more information on PIAAC data and to access the IDB Analyzer or the SAS and Stata macros, go to: http://www.oecd.org/site/piaac/publicdataandanalysis.htm.
Q2: Can one just use SAS, Stata or SPSS normally to analyze PIAAC data?
A: Instead of a single proficiency score, PIAAC provides 10 plausible values (PVs) per respondent for each domain, and these must be combined in a specific way to produce correct estimates and standard errors. In principle, one could follow the OECD Technical Report and write one’s own macro to estimate proficiency levels and average scores or to run regressions. However, to make this easier for researchers, the OECD has created software tools that take into account the complex sampling and assessment design of PIAAC. See our response to Q1 above in this “Data Analysis” section for more information about these tools.
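To illustrate what "combined in a specific way" means, here is a minimal sketch of the usual plausible-value combination rules (often called Rubin's rules) together with replicate-weight variance, for a weighted mean. It is not the OECD macro code. It assumes the PVs, the full-sample weight, and the replicate weights are already loaded as plain Python lists, and that you know the replication constant (`variance_factor`) for the country's variance estimation method from the OECD Technical Report. This sketch averages the sampling variance over all PVs; some tools use only the first PV as a shortcut.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of one variable under one set of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def pv_mean_with_se(pvs, final_weight, rep_weights, variance_factor):
    """Combine M plausible values into one mean and its standard error.

    pvs: list of M columns (e.g. PV1..PV10), one score per respondent
    final_weight: full-sample weight per respondent
    rep_weights: list of R replicate-weight columns
    variance_factor: replication constant for the country's method (assumed known)
    """
    m = len(pvs)
    estimates, sampling_vars = [], []
    for pv in pvs:
        full = weighted_mean(pv, final_weight)
        reps = [weighted_mean(pv, rw) for rw in rep_weights]
        # Replication-based sampling variance for this plausible value
        sampling_vars.append(variance_factor * sum((r - full) ** 2 for r in reps))
        estimates.append(full)

    point = sum(estimates) / m                     # average of the M estimates
    sampling_var = sum(sampling_vars) / m          # average sampling variance
    imputation_var = sum((e - point) ** 2 for e in estimates) / (m - 1)
    total_var = sampling_var + (1 + 1 / m) * imputation_var
    return point, math.sqrt(total_var)
```

The final standard error therefore reflects both sampling uncertainty (spread across replicates) and measurement uncertainty (spread across PVs). Analyzing a single PV, or averaging the PVs per person first, understates the second component.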
Q3: Does one need to use the macros at all times when dealing with PIAAC data?
A: Data management (e.g., combining multiple variables or creating an index) is done outside of the macros, in the software of your choice. The analysis is divided into two parts: the survey (background questionnaire) and the direct assessment. If one is analyzing the background questionnaire only, SAS and Stata can handle the full-sample and replicate weights, so one can use their regular complex survey design procedures. The SPSS Complex Samples add-on allows one to specify cluster and/or stratification variables, but it uses Taylor series linearization to produce standard errors, whereas PIAAC data use jackknife repeated replication (JRR) or other forms of replication-based estimation, depending on each country’s sampling design.
The analysis of the direct assessment (e.g., estimating the literacy of adults or regressing literacy on other independent variables) requires the use of the OECD-provided macros.
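For background-questionnaire variables alone (no plausible values), a replication-based standard error only needs the full-sample and replicate weights. The sketch below estimates a weighted share for a 0/1 indicator. The replication constants are the standard ones for JK1, JK2 (paired jackknife), BRR, and Fay's BRR; which method applies to a given country must be taken from the OECD Technical Report. To our understanding the international files carry the full-sample weight as SPFWT0 and the replicates as SPFWT1 to SPFWT80, but confirm the names in the codebook.

```python
import math


def variance_factor(method, n_reps, fay_k=0.5):
    """Multiplier applied to the sum of squared replicate deviations."""
    if method == "JK1":
        return (n_reps - 1) / n_reps
    if method == "JK2":
        return 1.0
    if method == "BRR":
        return 1.0 / n_reps
    if method == "FAY":
        return 1.0 / (n_reps * (1.0 - fay_k) ** 2)
    raise ValueError(f"unknown replication method: {method}")


def weighted_share(indicator, weights):
    """Weighted proportion of respondents with indicator == 1."""
    return sum(x * w for x, w in zip(indicator, weights)) / sum(weights)


def share_with_replicate_se(indicator, final_weight, rep_weights, method):
    """Point estimate from the full-sample weight, SE from the replicate weights."""
    est = weighted_share(indicator, final_weight)
    reps = [weighted_share(indicator, rw) for rw in rep_weights]
    var = variance_factor(method, len(rep_weights)) * sum((r - est) ** 2 for r in reps)
    return est, math.sqrt(var)
```

This is the same computation that the survey procedures in SAS and Stata perform once they are told to use the replicate weights and the matching replication method.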
Q4: I am looking at the relationship between problem solving and educational attainment. I want to use the data from the PIAAC 2012 survey but it was difficult using the code book to find the Scaled scores that are Below 1, Level 1, 2, and 3 for problem solving. How was that coded?
A: There is no variable on the file indicating whether an individual’s score was Below Level 1, Level 1, Level 2, or Level 3 for problem solving, because each individual is not assigned just one proficiency score or level. Each individual who takes the PIAAC assessment is given a set of 10 plausible values (imputed proficiency scores) on a 0-500 scale, in order to take into account the uncertainty associated with measuring skills in large-scale surveys and to obtain more accurate estimates of group proficiency. The plausible values for problem solving are the variables PVPSL1-PVPSL10 on the data file and in the codebook. The proficiency levels are defined by score-point ranges and by the difficulty of the tasks within these ranges. The ranges for problem solving are Below Level 1 (0-240), Level 1 (241-290), Level 2 (291-340), and Level 3 (341-500).
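As a sketch of how a level distribution is built from plausible values: each PV is classified into a level separately, the weighted distribution is computed for each PV, and the distributions are averaged. The cut points come from the ranges above; treating any continuous score below 241 as Below Level 1 (and so on) is our assumption about the boundaries. Only point estimates are shown; standard errors additionally require the replicate weights and the PV combination rules.

```python
from bisect import bisect_right

# Lower bounds of Levels 1, 2 and 3 for problem solving, from the ranges above.
# Assumption: a score < 241 is Below Level 1, 241 to < 291 is Level 1, etc.
PS_CUTS = [241, 291, 341]
PS_LEVELS = ["Below Level 1", "Level 1", "Level 2", "Level 3"]


def ps_level(score):
    """Map one plausible value to its problem-solving proficiency level."""
    return PS_LEVELS[bisect_right(PS_CUTS, score)]


def level_shares(pvs, weights):
    """Weighted share of adults at each level, averaged over the plausible values.

    pvs: list of M columns (PVPSL1..PVPSL10), one score per respondent
    weights: full-sample weight per respondent
    """
    m = len(pvs)
    total_w = sum(weights)
    shares = {level: 0.0 for level in PS_LEVELS}
    for pv in pvs:
        # Classify each PV on its own, then average the M distributions
        for score, w in zip(pv, weights):
            shares[ps_level(score)] += w / total_w / m
    return shares
```

Averaging the 10 PVs per person first and then classifying the average would give a biased distribution, which is why the classification happens inside the PV loop.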
You will need to use special data analysis tools in order to correctly analyze PIAAC data, taking into account these plausible values and the complex sampling design of PIAAC. For most basic analysis, we recommend using the PIAAC International Data Explorer (IDE). The IDE is a user-friendly online tool that will allow you to do PIAAC analysis and take into account the plausible values and sampling weights for you. You could, for example, use the IDE to do an analysis of the proficiency level distribution at different levels of education.
For more complex analysis with the data files, there is a program called the IDB Analyzer that creates SPSS code, and there are SAS and Stata macros. These can be found at this link: http://www.oecd.org/site/piaac/publicdataandanalysis.htm.
For more information on the PIAAC Data Files, please visit the PIAAC Gateway page: http://piaacgateway.com/datasets/
Q5: How can I calculate the number/share of people who were excluded from problem solving in technology-rich environments (PS-TRE) due to their lack of computer skills?
A: You can use the variable PSLSTATUS to determine the percentage of the population who have plausible values (PVs) for problem solving in technology-rich environments (PS-TRE). The variable also shows the percentage of cases that have literacy-related nonresponse, and those that were excluded from PS-TRE for computer-based reasons (CBA Non-Response).
The variable PBROUTE gives more detailed information on why the respondent was excluded from the computer-based assessment and routed to the paper-based assessment (i.e., they failed a Core test of ICT skills, refused the computer-based assessment, or had answered a question indicating that they had no computer experience). For the U.S., you should find that about 16% of the weighted sample were excluded solely because of a lack of computer knowledge. Note that this group does not include those who passed the Core test of ICT skills but failed the computer-based Core test of literacy and numeracy skills and were subsequently routed to the paper-based assessment.
The cases excluded for literacy-related nonresponse (about 4% of the U.S. weighted sample) are also flagged in the corresponding domain variables LITSTATUS and NUMSTATUS. These respondents did not respond to the background questionnaire because of language difficulties or learning or mental disabilities, and they do not have plausible values for any of the domains.
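A minimal sketch of turning a status or routing variable such as PSLSTATUS or PBROUTE into both an estimated number of adults and a share: sum the full-sample weights within each category. The category labels in the usage example are hypothetical placeholders; the actual codes are listed in the codebook. Standard errors would again need the replicate weights.

```python
from collections import defaultdict


def weighted_distribution(categories, weights):
    """Return {category: (estimated number of adults, share)}.

    categories: one code per respondent (e.g. values of PBROUTE or PSLSTATUS)
    weights: full-sample weight per respondent
    """
    totals = defaultdict(float)
    for cat, w in zip(categories, weights):
        totals[cat] += w  # summed weights estimate a population count
    grand_total = sum(totals.values())
    return {cat: (tot, tot / grand_total) for cat, tot in totals.items()}


# Usage with hypothetical labels (not the real PBROUTE codes):
example = weighted_distribution(
    ["no computer experience", "computer-based", "failed ICT core", "computer-based"],
    [1200.0, 3000.0, 800.0, 5000.0],
)
```

Because the full-sample weights sum to the target population, the first element of each pair is an estimated count of adults and the second is the weighted percentage divided by 100.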
Q6: How do I conduct analysis with the IALS and ALL data I received from Statistics Canada?
A: After you use the provided SPSS Syntax Files (.sps) to create the SPSS Data Document (.sav), you need to make a few edits to the produced data file before starting your analysis in the IEA International Database (IDB) Analyzer.
In both the IALS and ALL data files, you need to delete the leading 0 from Replic01-Replic09 and rename them as Replic1-Replic9.
In the IALS data file, you also need to rename the variable WEIGHT as POPWT.
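The two renaming steps above can be generated rather than typed by hand. This sketch builds the rename mapping and emits it as an SPSS `RENAME VARIABLES` command using the `(old=new)` pair form; check the generated line against the syntax reference for your SPSS version. The function names are our own, for illustration only.

```python
def ials_all_renames(is_ials):
    """Renames needed before IDB Analyzer: Replic01..Replic09 -> Replic1..Replic9,
    plus WEIGHT -> POPWT for the IALS file only."""
    renames = {f"Replic{i:02d}": f"Replic{i}" for i in range(1, 10)}
    if is_ials:
        renames["WEIGHT"] = "POPWT"
    return renames


def spss_rename_syntax(renames):
    """Format the mapping as one SPSS RENAME VARIABLES command."""
    pairs = " ".join(f"({old}={new})" for old, new in renames.items())
    return f"RENAME VARIABLES {pairs}."


print(spss_rename_syntax(ials_all_renames(is_ials=True)))
```

If you prefer to prepare the file in Python instead, the same dictionary can be passed to pandas `DataFrame.rename(columns=...)`.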
When using the IDB Analyzer, after you select the analysis file and the “Select Study Type” option pops up, choose “IALS/ALLS.” Then select “IALS/ALLS(Rescaled data - 2013)” in the Analysis Type drop-down menu before using the IDB Analyzer to generate SPSS syntax for analysis. For more information on how to use the IDB Analyzer, please refer to the PIAAC Distance Learning Dataset Training module “Considerations for Analysis of PIAAC Data”.
For some countries, the populations included in the file provided by Statistics Canada are not aligned with the PIAAC sample population. Therefore, you have to combine or exclude parts of the populations to have a PIAAC-comparable sample and match the estimates for IALS and ALL included in the International Data Explorer (IDE) and rescaled estimates reported elsewhere (in reports published after October 2013). The Canadian population is reported separately as Canada(English) and Canada(French) in both the IALS and ALL files from Statistics Canada, so you need to combine these populations using their country identification codes (CNRTID) if you want to do analysis of the Canadian population as a whole.
Additionally, the IALS dataset reports data from Great Britain as a whole (England/Scotland/Wales), while PIAAC collected data from and reports data on England/Northern Ireland. Therefore, the comparable population between IALS and PIAAC is the population from England. To select only the population from England, and exclude those from Scotland or Wales from analysis with the IALS file from Statistics Canada, you can use the variable GBR. To select a comparable population with the PIAAC data file, you would also have to use the variable for participating country or sub-national entity code (CNTRYID_E) in the PIAAC data from England/Northern Ireland, in order to select only the population from England, and exclude those from Northern Ireland. Additionally, in IALS, a few countries included adults older than 65 in their sample, so analysis with the IALS file from Statistics Canada needs to exclude those over 65 to be comparable to the PIAAC population. The variable AGEINT can be used to exclude that population.
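The three harmonization steps described above (pooling the two Canadian populations, keeping only England from Great Britain, and dropping respondents over 65) can be sketched as a per-record filter. Every code value is passed in as a parameter because the real values must come from the Statistics Canada codebook; the numbers in the test are hypothetical. The country identifier is used as written above (CNRTID); check its exact spelling in your file.

```python
def harmonize_record(rec, canada_codes, canada_pooled, gb_code, england_code):
    """Return a PIAAC-comparable copy of one IALS/ALL record, or None to drop it.

    rec: dict of variables for one respondent (CNRTID, AGEINT, and GBR for Great Britain)
    canada_codes: set of codes for Canada(English) and Canada(French) (hypothetical)
    canada_pooled: single code to use for all of Canada (hypothetical)
    gb_code / england_code: codes for Great Britain and for England in GBR (hypothetical)
    """
    # PIAAC covers adults up to age 65, so drop older respondents.
    if rec["AGEINT"] > 65:
        return None
    # Great Britain: keep only respondents from England.
    if rec["CNRTID"] == gb_code and rec.get("GBR") != england_code:
        return None
    out = dict(rec)
    # Pool Canada(English) and Canada(French) under one country code.
    if out["CNRTID"] in canada_codes:
        out["CNRTID"] = canada_pooled
    return out
```

On the PIAAC side, the matching step is a filter on CNTRYID_E that keeps England and drops Northern Ireland.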
DATA FILES
Q1: How do I access U.S. data files?
A: SPSS and SAS PIAAC Public Use Files are available on the NCES website (see the response to Q3 below in this “Data Files” section for more information on the PIAAC Restricted Use Files). To get a Stata-readable file, you can use StatTransfer software, but be aware of labeling issues related to Stata value cut-offs: you will need to edit some labels in the syntax to make it run correctly. The SAS files are linked to format files, so be sure to run the SAS format programs before you import the data files into SAS. For more information go to: http://nces.ed.gov/surveys/piaac/datafiles.asp
Q2: What are the differences between the U.S. Public Use File (PUF) on the NCES website and the U.S. PUF available for download on the Organization for Economic Co-operation and Development (OECD) PIAAC web site? Is there only information added or has some old information been removed?
A: The U.S. PUF data file on the NCES website includes additional variables above those included in the U.S. PUF from the OECD website. The PUF on the NCES website includes U.S.-only variables on topics such as race/ethnicity, English language ability, health information practices, as well as all variables following national routing for analysis. In addition, PUFs of other PIAAC participating countries are only available on the OECD website.
Q3: What additional information is available in the U.S. Restricted Use File (RUF)? How can I gain access to the RUF?
A: The U.S. Restricted Use File (RUF) contains more detailed information for data that were suppressed in the public-use dataset due to confidentiality concerns for respondents, such as continuous age and earnings variables and more detailed industry and occupation variables. The RUF contains all variables at the level of detail at which the questions were asked and answered in the Background Questionnaire (BQ). If some variables are missing, top-coded, or suppressed in the PUF, they will be present in the RUF in the original form of the answer to the BQ question. To access the U.S. restricted-use data, a restricted-use license has to be applied for and obtained from NCES. More information on the process is available at: http://nces.ed.gov/pubsearch/licenses.asp. Please note that access to the RUF is only available to individuals residing in the U.S. Please allow several months for the NCES review of the restricted-use license application.
Q1: Is there any way to get figures for the U.S. competencies grouped by state, county, or region?
A: Unfortunately, PIAAC data are not currently available by state or county. Although the data are nationally representative, NCES did not collect data from all of the states or counties, so the data are not representative of any individual state or county. As you might expect, state or county representativeness would require a larger sample in each state or county and therefore more resources. This is one of the trade-offs NCES often has to make in creating nationally representative datasets for its international assessments.
NCES is currently engaged in a modeling exercise based on small area estimates to produce synthetic estimates of averages per state per domain, but of course there will not be breakouts by all the variables and it is a synthetic estimate (i.e. an estimate for an area calculated using descriptive or demographic data combined with direct assessment estimates for various demographic groups).
However, there is a way to get proficiency scores/levels broken down by the four U.S. census regions (Northeast, Midwest, South, and West). For basic analysis, there is an online tool called the International Data Explorer (IDE) for the U.S. Using this tool, you can do analysis such as average score or proficiency level distribution by the variable Geographical region – Respondent (US Census regions)(ID: REGIONUS).
For more complex analysis, the variable REGION_US is also available on the U.S. Public Use File (PUF) found here: http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2014045. For analysis with this data file, there is a program called the IDB Analyzer that creates SPSS code, and there are SAS and Stata macros that can be found here: http://www.oecd.org/site/piaac/publicdataandanalysis.htm. For more information on the PIAAC Data Files, please visit the PIAAC Gateway page: http://piaacgateway.com/datasets/.
Q2: I am trying to clarify how the Education and Skills Online assessment can be utilized. Would geographic areas (counties) be able to get data on their region? When will it be available? Will there be any cost in establishing a specific group/region for assessment purposes?
A: The Education and Skills Online (E&S Online) assessment can be used at an individual or organizational level, for example by an adult learning center or an employer. The data and results would be available to the individual or the organization (including county-wide or state-wide programs) that sponsored the assessment. A county could sponsor the use of E&S Online and administer the assessment to people in its region or to particular populations in its region (e.g., the prison population or the unemployed population). The sponsoring organization will own the data and can use the results for its own purposes. Please note that a county or any other sponsoring organization would not be able to get data or results from unaffiliated individuals who took Education and Skills Online on their own.
The E&S Online is available at the following link: http://www.oecd.org/skills/ESonline-assessment/
We expect that the cost to use E&S Online would be around $10-15 per individual and we are unsure if there will be discounted rates for groups and organizations.
Q3: I'm looking for the results of the PIAAC U.S. Prison Study. In particular, I would like to see the results of the background questionnaire in that group. Are those results available yet? And if so, how could I find them?
A: The U.S. PIAAC National Supplement Household and Prison Studies data are not available yet. NCES completed National Supplement data collection for the Household Study in May 2014 and the Prison Study in June 2014. The data will be available in early 2016.
To receive news regarding PIAAC, sign up for the PIAAC Buzz monthly newsletter. When the data become available, you will be able to download the Public Use data files from the NCES website: http://nces.ed.gov/surveys/piaac/index.asp
Q4: Has information on country of birth been collected in the United States?
A: Several variables on the Public Use Files (PUF), International Data Explorer (IDE), and Restricted Use Files (RUF) address this question. In the PUF and IDE, relevant variables include J_Q04A (whether the respondent was born in the country: native, non-native), CNT_BRTHUS_C (respondent’s country of birth, collapsed into 2 categories: native, foreign-born), and BIRTRGNUS_C (respondent’s country of birth, collapsed into 3 categories: North America and Western Europe, Latin America and the Caribbean, Other). In addition to the variables listed above, the RUF includes a detailed set of variables, J_Q04bUS/J_S04b (respondent’s country of birth: Mexico, China, Philippines, India, Russia, Colombia, Other/specific answer to the ‘Other’ option).
Q5: I am trying to link observations geographically and by occupation with other data. Is an occupation variable with 4-digit ISCO codes available in the restricted-use dataset? Is the occupation variable with 2-digit ISCO codes available in the public-use dataset or only in the restricted-use dataset?
A: In the U.S. public-use data set, current occupation is available as 1-digit (ISCO1C), 2-digit (ISCO2C), and 3-digit (ISCO08_CUS_C) ISCO codes. The 4-digit ISCO occupation variable (ISCO08_C) is only available in the restricted-use data set. Note that the 4-digit ISCO variable slices the data so finely that analyses using it may have weak reporting power. For example, dividing the U.S. sample of about 5,000 adults (not all of whom had or reported an occupation) into more than 400 detailed ISCO occupation codes means that many codes will have only one or two cases, which is too small a sample to produce reliable, stable estimates and poses a disclosure risk. If one is analyzing occupation in conjunction with proficiency levels, it is advisable to use a variable no more detailed than the 1-digit ISCO occupation variable. Some reports have used a 4-category occupational variable (ISCOSKIL4) that collapses the 1-digit ISCO occupation variable into four derived categories: skilled occupations; semi-skilled white-collar occupations; semi-skilled blue-collar occupations; and elementary occupations.
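ISCO-08 codes are hierarchical: the first digit is the major group, the first two digits the sub-major group, and so on, so a detailed code can be collapsed by truncation. The sketch below collapses codes to a chosen number of digits and flags groups with fewer unweighted cases than a threshold, which is a quick way to see the small-cell problem described above. Codes are kept as strings because some ISCO-08 codes (armed forces) begin with 0. The `min_n` value is purely illustrative and is not an NCES reporting standard; the function names are hypothetical.

```python
from collections import Counter


def collapse_isco(code, digits):
    """Truncate an ISCO-08 code string to its first `digits` digits."""
    return code[:digits]


def small_cells(codes, digits, min_n):
    """Groups with fewer than min_n unweighted cases after collapsing.

    codes: one ISCO-08 code string per respondent; empty string = no occupation reported
    """
    counts = Counter(collapse_isco(c, digits) for c in codes if c)
    return {group: n for group, n in counts.items() if n < min_n}
```

Running this at 4 digits and again at 1 digit on the same file makes the trade-off concrete: the number of thinly populated groups drops sharply as the codes are collapsed.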
The geographic variable available for analysis is U.S. census region (REGION_US) of Northeast, Midwest, South, and West.
Q6: How can I compare results from PIAAC with results from the Program for International Student Assessment (PISA) in Turkey?
A: Turkey participated in Round 2 of PIAAC, and collected data in 2014, so results from the administration of PIAAC in Turkey have not been reported yet (and will not be available until 2016). Additionally, in the absence of evidence from a study linking PISA and PIAAC, caution is advised in comparing the results of the two assessments. The overlap between the target populations of PIAAC and PISA is not complete; and while the concepts of literacy in PIAAC and reading literacy in PISA, and the concepts of numeracy in PIAAC and mathematical literacy in PISA are closely related, the measurement scales are not the same. However, the figures on pages 206-207 of OECD Skills Outlook 2013: First Results from the Survey of Adult Skills show that there is a reasonably close correlation between countries’ performance in the different cycles of PISA and the proficiency of the relevant age cohorts in literacy and numeracy in PIAAC. A more detailed description of the relationship between the two assessments, including differences in the target populations and skills assessed, can be found in Chapter 6 of The Survey of Adult Skills – Reader’s Companion. You could also listen to the presentation of Patrick Bussière, co-leader of Canada's PIAAC team, describing the similarities and differences in design and purpose between PISA and PIAAC. In the videotaped webinar, Patrick Bussière suggests the reasons why it could be of interest to compare the results from the two surveys as we move forward.
Q7: Have the test items for the Computer Literacy Core (CLC) and Computer Numeracy Core (CNC) been released? If so, where can I find them?
A: The test items for CLC and CNC have not been released, because they are currently being used in the additional rounds of PIAAC and will be used for trend data in future administrations. The written descriptions of a few of the items in the OECD Technical Report are the best information available on the CLC and CNC items. On page 5 of Chapter 21 (page 516 of the PDF), there are written descriptions of two of the CLC items (SGIH and Election Results). On page 10 of Chapter 21 (page 521 of the PDF), there is a written description of one of the CNC items (Bottles). Appendix 1 of the OECD Technical Report (beginning on page 554 of the PDF) also includes some characteristics of the difficulty levels of the CLC and CNC items.
DERIVED VARIABLES AND INDICES
Q1: How is the derived employment variable C_D05 defined? It appears to reflect a current employment status, but when matched with other employment status variables, such as C_Q07, a number of people are presenting as employed in one and unemployed in another.
A: Variable C_D05 is a "routing" variable that was derived from five preceding questions and was used to route or branch respondents to subsequent sections for employed, formerly employed, or non-employed adults. This variable flags the currently active population, i.e., the labor force measured in relation to a short reference period such as one week. It is the "objective" measure of employment because it is based on an objective definition of employment and is determined by the combination of the participant’s responses to five questions on having, not having, or seeking employment. Variable C_D05 was used to determine which subsequent job-related questions the respondent received, so it also determines which data are available in other sections of the background questionnaire. You can see how it is derived in the background questionnaire. For information about the motivation, definition, and rationale for this variable in accessible language (rather than the exact syntax and coding of the variable), please refer to the Background Questionnaire Framework.
There is also a "subjective" measure of employment, variable C_Q07 (where respondents are asked to self-report their own status, such as whether they are a student, retired, etc.) that provides a broader indication of respondents’ current situation.
Therefore, the two variables do not necessarily line up for various reasons such as definition differences, "objectivity" vs. "subjectivity", and misreporting.
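To see how often and where the two measures disagree in your own analysis sample, a weighted cross-tabulation of C_D05 against C_Q07 is a simple starting point. This is a generic sketch; the category labels in the test are hypothetical placeholders, and the real codes for both variables are in the codebook.

```python
from collections import defaultdict


def weighted_crosstab(row_var, col_var, weights):
    """Weighted share of the sample in each (row, column) cell.

    row_var: e.g. C_D05 codes, one per respondent
    col_var: e.g. C_Q07 codes, one per respondent
    weights: full-sample weight per respondent
    """
    table = defaultdict(float)
    for r, c, w in zip(row_var, col_var, weights):
        table[(r, c)] += w
    total = sum(table.values())
    return {cell: w / total for cell, w in table.items()}
```

Cells that pair an "employed" code in one variable with a non-employed self-report in the other correspond to the cases described in the question; the answer above explains why such cells are expected.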
Q2: Which items comprise the “Readiness to Learn” derived index in the PIAAC data set?
A: The “Readiness to Learn” index is derived from the six “About Yourself: Learning strategies” questions: I_Q04b (relate new ideas to real life), I_Q04d (like learning new things), I_Q04h (relate something new to what one already knows), I_Q04j (get to the bottom of difficult things), I_Q04l (figure out how different ideas fit together), and I_Q04m (look for additional information for clarity). You can view the items here: http://nces.ed.gov/surveys/piaac/final_en_bq.htm#I_Q04b .
Q3: What is the meaning when variables are "Derived by CAPI" or "Trend-IALS/ALL"?
A: When a variable is labeled as “Derived by CAPI”, it means that the variable was derived/coded through the computer-assisted personal interview (CAPI) system that the interviewers used and was a variable created specifically for CAPI in order to route or branch respondents to subsequent questions. Since the variables were coded by the system in the process of the interview, some inconsistency between Background Questionnaire variables and CAPI derived variables may exist. When a variable is labeled as “Trend-IALS/ALL,” it means that the variable can be linked back to an identical or similar variable that was used in one or both of the previous international adult literacy assessments, IALS and ALL. These variables must be used when conducting an analysis across the assessments.
Q4: How does PIAAC convert educational attainment to years of school?
A: PIAAC used the International Standard Classification of Education (ISCED) to map educational attainment to years of schooling. The OECD Technical Report reports the ISCED classifications for each level of educational attainment and how each level of educational attainment was converted to total years of schooling for the U.S. and all other countries. The ISCED mapping and conversion of educational attainment to years of schooling for the U.S. is found in Appendix 5 on pages 873-874 of the PDF. The mapping used for other countries is also found in the same appendix of the technical report. In the PIAAC dataset, information on years of schooling is available in the derived variable for highest level of education imputed into years of education (YRSQUAL).