Race data

Quality of Secondary Data on Race and Ethnicity in Health Research

Prepared by: Center for Aging in Diverse Communities (CADC), UCSF and RCMAR Coordinating Center (UCLA)

Although race is regularly used as a descriptor in health research, many have challenged the scientific utility of the concept. The construct is often used uncritically: it sometimes serves as a proxy for socioeconomic status and is very often used synonymously with terms such as ethnicity and culture. As a result, a growing body of writing questions whether race exists as a biologically or genetically meaningful construct. Similarly, some have noted that while terms such as “Black,” “Hispanic,” “Asian,” “American Indian,” and “White” may be bureaucratically convenient, they have oversimplified how race and ethnicity are discussed and understood. Some argue that these are overly broad pan-ethnic labels that lump together a number of disparate groups who happen to share skin color and a few other visible physical features. Others argue that these are biologically meaningful population groups with significant genetic differences.

These varying perspectives have produced some interesting and provocative reading material. This list is based on the assumption that all social scientists interested in a deeper understanding of population group differences can benefit from exposure to the ongoing debate about the appropriate conceptualization and measurement (e.g., self-report versus observer report) of race and ethnicity, and about the quality of data on race/ethnicity in different datasets.


Many studies demonstrating an association between race and the use of medical services have used hospital discharge abstract data, but the quality of the race measures in such sources had not previously been examined. Hospital discharge abstract data from New York State were used to identify 767 cardiac patients who had been admitted to a hospital twice. Racial classifications at the two admissions were concordant 93.7% of the time. Kappa was 0.89 for Blacks, 0.72 for Whites, and 0.43 for all other racial groups. The evidence suggests that misclassification of race in hospital discharge abstract data is nondifferential. Because nondifferential misclassification tends to bias observed associations toward the null, racial discrepancies in access to medical services are probably even greater than those previously reported.
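
The kappa statistic reported above measures agreement beyond what chance alone would produce. As a minimal sketch (not the study's own code), the Python below computes Cohen's kappa from a square cross-tabulation of race codes at the first and second admission. It also shows one common way to obtain a category-specific kappa, by collapsing the table to "this category versus all others"; the source does not say how its per-group values were calculated. The function names and the counts in the example are hypothetical and are not the New York data.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of patients coded category i at the first
    admission and category j at the second admission.
    Undefined (division by zero) if chance agreement equals 1.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Proportion of patients coded identically on both occasions.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


def collapse_to_binary(table, index):
    """Collapse a k-by-k table to 2-by-2: category `index` versus all others."""
    k = len(table)
    n = sum(sum(row) for row in table)
    both = table[index][index]
    first_only = sum(table[index]) - both
    second_only = sum(table[i][index] for i in range(k)) - both
    neither = n - both - first_only - second_only
    return [[both, first_only], [second_only, neither]]


# Hypothetical counts for three categories (illustration only).
example = [
    [40, 2, 1],
    [3, 30, 2],
    [1, 2, 19],
]
print(f"overall kappa: {cohen_kappa(example):.2f}")
print(f"kappa, first category vs. rest: "
      f"{cohen_kappa(collapse_to_binary(example, 0)):.2f}")
```

A high overall concordance can coexist with a low kappa for a small category, because chance agreement on "not in this category" is already high when the category is rare.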


Examined agreement between the ethnicity recorded for a sample of members of five Northern California Kaiser Permanente medical centers and their self-reported ethnicity. Sensitivities and positive predictive values of the Kaiser classification were high among Blacks (0.95 for both measures) and Whites (0.98 and 0.94, respectively), slightly lower among Asians (0.88 and 0.95, respectively), and considerably lower among Hispanics (0.55 and 0.81, respectively) and American Indians (0.47 and 0.50, respectively). Among Asian subgroups, the proportion classified as Asian was high among Chinese (0.94) and Japanese (0.99) but lower among Filipinos (0.79) and other Asians (0.74).
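
Sensitivity and positive predictive value answer different questions: sensitivity is the share of people who self-identify with a group whom the records also place in that group, while positive predictive value is the share of people the records place in a group who self-identify with it. The sketch below, which treats self-report as the reference standard, shows the arithmetic in Python. The function name, the group codes, and the sample labels are hypothetical and are not drawn from the Kaiser data.

```python
def sensitivity_and_ppv(record_labels, self_labels, group):
    """Return (sensitivity, PPV) of record-based labels for one group,
    treating self-reported labels as the reference standard."""
    true_pos = false_pos = false_neg = 0
    for recorded, reported in zip(record_labels, self_labels):
        if recorded == group and reported == group:
            true_pos += 1      # record and self-report agree on the group
        elif recorded == group:
            false_pos += 1     # record says group, self-report says otherwise
        elif reported == group:
            false_neg += 1     # self-report says group, record missed it
    nan = float("nan")
    sensitivity = true_pos / (true_pos + false_neg) if true_pos + false_neg else nan
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos else nan
    return sensitivity, ppv


# Hypothetical paired labels for six people (illustration only).
recorded = ["H", "W", "W", "H", "W", "A"]
reported = ["H", "H", "W", "W", "W", "A"]
sens, ppv = sensitivity_and_ppv(recorded, reported, "H")
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```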


Investigates two hypotheses: (1) The race of infants of different-race parents is more likely to be differentially classified at birth and death than the race of infants of same-race parents. (2) States with a greater proportion of infant deaths of a given race are less likely to differentially classify infants of that race on birth and death certificates than states with a smaller proportion of infant deaths of that race. Both are supported.


To understand ethnic inequalities in health, we must take account of the relationship between ethnic minority status, structural disadvantage and agency. So far, the direct effects of racial oppression on health, and the role of ethnicity as identity, which is in part a product of agency, have been ignored. Factor analysis suggested that dimensions of ethnic identity were consistent across the various ethnic minority groups. Initially some of these dimensions of ethnic identity appeared to be related to health, but in a multivariate model the factor relating to a racialised identity was the only one that exhibited any relationship with health. These findings suggest that ethnic identity is not related to health. Rather, the multivariate analyses showed strong independent relationships between health and experiences of racism, perceived racial discrimination and class.


Examined agreement of administrative data with self-reported race/ethnicity and identified correlates of agreement. Data sources were Veterans Affairs administrative data and patient self-report. Relatively low rates of agreement (approximately 60%) between data sources were largely attributable to administrative records in which race/ethnicity was unknown, with the least agreement for Native American, Asian, and Pacific Islander patients.


This paper evaluates the new race/ethnicity codes for Asian Americans (AAs), Hispanics, and Native Americans (NAs) that have recently been added to the Medicare enrollment database. The race/ethnicity code revisions are described and evaluated by (1) comparing the numbers of persons identified as AAs, Hispanics, and NAs with corresponding population census projections and (2) determining whether Medicare enrollees born in Asian and Hispanic countries are assigned relevant codes. Among persons aged 65 and older, approximately 24% of Hispanics, 17% of NAs, and 56% of AAs are identifiable by the new codes. From 18% to 29% of enrollees aged 65 and older born in Mexico, Puerto Rico, and Cuba are coded as Hispanic, and from 14% to 73% of enrollees born in nine Asian countries are classified as Asian American. Researchers should resist the temptation to base analyses on the revised Health Care Financing Administration (HCFA) race/ethnicity codes, since coverage is incomplete and biased.


Identification of Hispanic ethnicity for beneficiaries in the Medicare claims files is problematic, greatly limiting the use of these administrative data for examining race/ethnicity differences. This article reports on two studies assessing the effectiveness of a Hispanic surname match for improving the accuracy of race/ethnicity codes for elderly males in the Medicare data sets. With self-identification as the gold standard, adding the Spanish surname match increased accuracy for Hispanics and Whites compared with the Medicare race code alone. Using surname information to supplement the Medicare race code could greatly enhance researchers' ability to examine healthcare equity.


This study evaluated the validity of registry-reported race for individuals who participated in research studies conducted since 1980 through the Metropolitan Detroit Cancer Surveillance System (MDCSS), a Surveillance, Epidemiology, and End Results (SEER) Program registry. It finds that misclassification in the MDCSS registry of African Americans as Whites, and vice versa, is relatively low.


In the Greater Bay Area Cancer Registry, Hispanic ethnicity is determined by medical record review and by matching to surname lists. This study compared these classification methods with self-report. Among those persons classified as Hispanic by either or both of these sources, only two-thirds agreed (predictive value positive = 66%), and many self-identified Hispanics were classified incorrectly (sensitivity = 68%). Classification based on either medical record or surname alone had a lower sensitivity (59% and 61%, respectively) but a higher predictive value positive (77% and 70%, respectively). Ethnic classification by medical record alone resulted in an underestimate of Hispanic cancer cases and incidence rates. Bias was reduced when medical records and surnames were used together to classify cancer cases as Hispanic.
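
The trade-off described above follows from how an "either source" rule works: flagging a person as Hispanic when the medical record or the surname list says so can only add flags, so sensitivity cannot fall, while the extra flags may include more people who do not self-identify as Hispanic, which can lower the predictive value positive. The Python sketch below illustrates the rule under that assumption. The function names and the boolean values are hypothetical and are not registry data.

```python
def either_source(record_flags, surname_flags):
    """Flag a person if the medical record OR the surname match says Hispanic."""
    return [r or s for r, s in zip(record_flags, surname_flags)]


def evaluate(flags, self_reported):
    """Return (sensitivity, predictive value positive) against self-report."""
    true_pos = sum(1 for f, s in zip(flags, self_reported) if f and s)
    false_pos = sum(1 for f, s in zip(flags, self_reported) if f and not s)
    false_neg = sum(1 for f, s in zip(flags, self_reported) if not f and s)
    nan = float("nan")
    sensitivity = true_pos / (true_pos + false_neg) if true_pos + false_neg else nan
    pvp = true_pos / (true_pos + false_pos) if true_pos + false_pos else nan
    return sensitivity, pvp


# Hypothetical flags for five people (illustration only).
by_record = [True, False, False, True, False]
by_surname = [False, True, False, True, True]
self_report = [True, True, True, False, False]

for name, flags in [
    ("medical record only", by_record),
    ("surname only", by_surname),
    ("either source", either_source(by_record, by_surname)),
]:
    sens, pvp = evaluate(flags, self_report)
    print(f"{name}: sensitivity = {sens:.2f}, predictive value positive = {pvp:.2f}")
```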


Last updated March 2006