TECHNICAL NOTES


CCR Data Collection

The California Cancer Registry (CCR), through a network of ten regional registries, routinely collects information on all incident (that is, newly diagnosed) cancers in California. Information collected includes diagnosis, patient identifiers and demographic characteristics, tumor attributes, stage of disease at the time of diagnosis, first course of treatment, and follow-up. These data are abstracted from records in medical treatment facilities. The ten regional registries consolidate the data, perform data quality procedures, and conduct analyses of interest to the regions. Each quarter, updated data tapes are submitted to the CCR, which collates the data, performs additional data quality control, and analyzes the data on a statewide basis. Mortality data obtained from the Center for Health Statistics, California Department of Health Services, are also incorporated into the analysis. Public use tapes, which do not contain any confidential data, are prepared by the CCR and made available upon request.

Cancer Reporting Regions

Ten regions covering the entire state have reported cancer incidence data to the CCR since January 1, 1988 (Figure A and Figure B):

Region 1 Santa Clara Region (Monterey, San Benito, Santa Clara and Santa Cruz Counties).
Region 2 Central Region (Fresno, Kern, Kings, Madera, Mariposa, Merced, Stanislaus, Tulare and Tuolumne Counties).
Region 3 Sacramento Region (Alpine, Amador, Calaveras, El Dorado, Nevada, Placer, Sacramento, San Joaquin, Sierra, Solano, Sutter, Yolo and Yuba Counties).
Region 4 Tri-County Region (San Luis Obispo, Santa Barbara and Ventura Counties).
Region 5 Inland Empire Region (Inyo, Mono, Riverside and San Bernardino Counties).
Region 6 North Region (Butte, Colusa, Del Norte, Glenn, Humboldt, Lake, Lassen, Mendocino, Modoc, Napa, Plumas, Shasta, Siskiyou, Sonoma, Tehama and Trinity Counties).
Region 7 San Diego Region (Imperial and San Diego Counties).
Region 8 Bay Area Region (Alameda, Contra Costa, Marin, San Francisco and San Mateo Counties).
Region 9 Los Angeles County.
Region 10 Orange County.

Cases and Deaths

Incidence data presented in this report are based on cases of primary breast cancer diagnosed in California women and reported to the CCR. Breast cancers were coded and reported according to the International Classification of Diseases for Oncology, Second Edition (ICD-O-2) (1). A "case" is defined here as a primary breast cancer (i.e., a cancer with site code C500 - C509, excluding types M9590 - M9970, which refer to lymphohematopoietic neoplasms), as distinguished from cancer that has spread to the breast from a tumor in another site.
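
The code ranges in the case definition can be expressed as a simple filter. The following is a minimal sketch only: the function name, the record layout, and the code formats (site as a string such as "C504", histology as the four-digit morphology number without the "M" prefix) are assumptions made for illustration and do not describe the CCR's actual processing.

```python
def is_primary_breast_case(site_code: str, histology_code: int) -> bool:
    """Apply the code ranges of the case definition.

    Assumed formats (not from the report): site as "C504" or "C50.4",
    histology as the four-digit ICD-O-2 morphology number.
    """
    site = site_code.strip().upper().replace(".", "")
    # Breast sites are C500 through C509.
    is_breast_site = len(site) == 4 and site[:3] == "C50" and site[3].isdigit()
    # Lymphohematopoietic neoplasms (M9590 - M9970) are excluded.
    is_lymphohematopoietic = 9590 <= histology_code <= 9970
    return is_breast_site and not is_lymphohematopoietic


# Hypothetical records: (site code, histology code)
records = [("C504", 8500), ("C509", 9650), ("C619", 8140)]
breast_cases = [r for r in records if is_primary_breast_case(*r)]
```

In this hypothetical list only the first record qualifies: the second has a breast site code but an excluded histology, and the third is not a breast site.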

Chapter 2 (Incidence and Mortality) of this report includes cases diagnosed from 1988 to 1993 and reported to the CCR as of October 1995. Cases included in all other chapters were those diagnosed from 1988 to 1992 and reported to the CCR as of November 1994. Because of the dynamic nature of cancer registries, additional cases are continuously reported and incorporated into the CCR database. Therefore, the number of cases diagnosed during a given period will change with each data submission. For example, as of November 1994 the reported number of invasive breast cancers diagnosed in California in 1992 was 17,889. An additional 139 cases diagnosed in 1992 were later reported to the CCR, and the number for 1992 rose to 18,028 as of October 1995.

Mortality data are based on the Death Certificate Master Files from the Department of Health Services. "Deaths" are defined as the number of women who died in California with breast cancer as the underlying cause of death (i.e., deaths coded as 1740 - 1749 according to the International Classification of Diseases, Ninth Revision) (2).

Of all breast cancer cases reported to the CCR as of October 1995, 1,294 (1.1%) were of unknown race/ethnicity. A total of 6 deaths due to breast cancer (0.02%) were of unknown race/ethnicity. These cases and deaths of unknown race/ethnicity were included in counts and rates for all races combined, but were not assigned to a specific race/ethnic group. Therefore, race-specific incidence or mortality counts will not sum to the total for all races combined.

Population Estimates

Annual mid-year population estimates by age, race/ethnicity, and sex to the county level for non-Hispanic whites, non-Hispanic blacks, Hispanics, and non-Hispanic Asian/Others were obtained from the California Department of Finance (DOF), Demographic Research Unit. Estimates for 1988, 1989, and 1990 are from a special population summary for 1970-1990 with gender, age, and race/ethnic detail released by DOF in April 1993, benchmarked to the 1990 Census. Estimates for 1991, 1992, and 1993 are from current estimates of the California population with gender, age and race/ethnic detail consistent with Report 93 E-2 (3).

Definition of Race/Ethnicity

Race/ethnicity for both cases/deaths and population estimates is grouped into the mutually exclusive categories of non-Hispanic white, non-Hispanic black, Hispanic, and non-Hispanic Asian/Other. Hispanic ethnicity is based on information in the medical record or on the death certificate, and on surname. Persons with race coded as white, black, or unknown and with a last name on the 1980 U.S. Census list of 12,497 Hispanic surnames were categorized as Hispanic for analyses in this report (4). The CCR has adopted the use of surname to identify Hispanics to compensate for the recognized under-reporting of Hispanic ethnicity in medical records and death certificates (5).
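
The grouping rule can be sketched as a short function. This is an illustration under stated assumptions, not the CCR's coding procedure: the function name, the lowercase race labels, and the three-name surname set are hypothetical stand-ins (the real list has 12,497 entries), and the sketch assumes that a recorded Hispanic ethnicity takes precedence over recorded race, which the text does not spell out.

```python
# Races for which the surname list is consulted, per the rule above.
SURNAME_ELIGIBLE_RACES = {"white", "black", "unknown"}


def classify_race_ethnicity(race, hispanic_on_record, surname, hispanic_surnames):
    """Assign one of the mutually exclusive analysis groups.

    race: assumed labels "white", "black", "unknown", or anything else
          (treated here as Asian/Other).
    hispanic_on_record: True if the medical record or death certificate
          indicates Hispanic ethnicity.
    hispanic_surnames: set of upper-case surnames (hypothetical stand-in).
    """
    if hispanic_on_record:
        return "Hispanic"  # assumption: recorded ethnicity takes precedence
    if race in SURNAME_ELIGIBLE_RACES and surname.strip().upper() in hispanic_surnames:
        return "Hispanic"
    if race == "white":
        return "non-Hispanic white"
    if race == "black":
        return "non-Hispanic black"
    if race == "unknown":
        return "unknown"  # counted only in totals for all races combined
    return "non-Hispanic Asian/Other"


# Toy surname set for illustration only; not the 1980 Census list.
TOY_SURNAMES = {"GARCIA", "MARTINEZ", "RODRIGUEZ"}
group = classify_race_ethnicity("white", False, "Garcia", TOY_SURNAMES)
```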

SEER Program

The National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program comprises a set of geographically defined, population-based central tumor registries covering approximately ten percent of the total population in the United States (6). Three of the CCR regional registries participate in the SEER Program, one of them since 1973 (Bay Area Region) and the other two since 1992 (Santa Clara Region and Los Angeles County).

Socioeconomic Status

Information on the patient's socioeconomic indicators is not available to the CCR. To overcome the absence of such indicators in medical records, a methodology using census data has been proposed. This approach characterizes patients according to the socioeconomic profile of the block group corresponding to their residential address. Block groups are subdivisions of a census tract, more homogeneous than the tract as a whole, containing an average of 1,000 residents. This methodology is still subject to biases, in that grouped data are applied to individuals. However, a recent study has validated the use of block-group data as a reasonably accurate surrogate for socioeconomic data collected at an individual level (7). In this report, two socioeconomic indicators are used: (i) the median household income in the block group, and (ii) the proportion of residents 25 years and older in the block group with at least a high school diploma. Sufficient information to determine census block group was available for 93.1% of breast cancer cases.

Income level of neighborhood

Median income in the patient's neighborhood was characterized as low, medium or high based on the distribution (quartiles) of median block group income for each of the ten cancer reporting regions in California. A block group was considered to have low income if its median household income was within the 25% lowest block group incomes recorded in that particular reporting region. Similarly, high income block groups were those with median household income within the 25% highest block group incomes in the specified region. Medium income block groups were those with median household incomes falling between the 25% lowest and the 25% highest incomes in the reporting region. Regional median income was chosen over statewide values to adjust for differences in cost-of-living among different regions in California.
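
The region-specific quartile rule can be sketched as follows. This is an illustration only: the function name, the block group identifiers, and the income values are hypothetical, the quartile cut points use the default method of Python's statistics.quantiles (the report does not say how its quartiles were computed), and the handling of values exactly on a cut point is an assumption.

```python
from statistics import quantiles


def income_levels(block_group_income):
    """Label each block group in ONE reporting region as low, medium, or high.

    block_group_income: {block_group_id: median household income}.
    """
    # Three cut points split the region's block groups into quartiles.
    q1, _median, q3 = quantiles(list(block_group_income.values()), n=4)
    levels = {}
    for block_group, income in block_group_income.items():
        if income <= q1:        # lowest 25% in the region (boundary is an assumption)
            levels[block_group] = "low"
        elif income >= q3:      # highest 25% in the region
            levels[block_group] = "high"
        else:
            levels[block_group] = "medium"
    return levels


# Hypothetical block groups in a single region
region = {"a": 10_000, "b": 20_000, "c": 30_000, "d": 40_000,
          "e": 50_000, "f": 60_000, "g": 70_000, "h": 80_000}
levels = income_levels(region)
```

Running the function separately for each region, rather than once for the whole state, is what makes the cut points region-specific.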

Education level of neighborhood

The proportion of persons 25 years and older with at least a high school diploma was chosen to represent the educational attainment in the patient's neighborhood. A woman was considered to live in a neighborhood with less formal education if less than 25% of residents 25 years and older in that particular block group had a high school diploma.

Technical Terms

Age-adjusted rate

Age-adjusted rates are weighted averages of the age-specific rates, where the weights represent the age distribution of a standard population. Rates in this report are age-adjusted by the direct method (8) to the 1970 United States population or the world standard population. Breast cancer incidence and mortality increase with age, so two distinct populations may have different rates due not to an intrinsically higher or lower cancer risk, but to a difference in their age distribution. Age-adjustment, by controlling for age differences in populations, allows for meaningful comparisons of cancer rates.
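
The direct method can be sketched in a few lines. The function name and the two-group figures below are hypothetical and chosen only to keep the arithmetic easy to follow; an actual calculation would use the five-year age groups and the 1970 United States or world standard population.

```python
def age_adjusted_rate(cases, population, standard_population, per=100_000):
    """Directly standardized rate: a weighted average of age-specific rates.

    cases, population, standard_population: sequences with one entry per
    age group, in the same order.
    """
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("all inputs must cover the same age groups")
    total_standard = sum(standard_population)
    adjusted = 0.0
    for c, p, s in zip(cases, population, standard_population):
        age_specific_rate = c / p
        weight = s / total_standard  # share of the standard population
        adjusted += age_specific_rate * weight
    return adjusted * per


# Hypothetical data: two age groups, standard population weighted 3:1
rate = age_adjusted_rate(cases=[10, 90],
                         population=[100_000, 100_000],
                         standard_population=[3, 1])
```

Here the age-specific rates are 10 and 90 per 100,000 and the weights are 0.75 and 0.25, so the adjusted rate is 0.75 x 10 + 0.25 x 90 = 30 per 100,000.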

Age-specific rate

Age-specific rates are calculated by dividing the total number of cases or deaths in a specific age group by the total population in that age group. Age at diagnosis or death was categorized into five-year age categories, starting with birth to 4 years old and ending with age 85 and older. The race- and age-specific total number of cases or deaths over the five-year period was divided by the race- and age-specific population sum over the same period. This rate was then multiplied by 100,000 to yield an average annual age-specific rate per 100,000 population.
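
A minimal sketch of the age grouping and the rate calculation follows. The function names and the counts are hypothetical; the grouping assumes age is recorded in completed years.

```python
def age_group_index(age_in_years: int) -> int:
    """Map age to one of 18 five-year groups: 0-4, 5-9, ..., 80-84, 85+."""
    return min(age_in_years // 5, 17)


def age_specific_rate(cases_by_year, population_by_year, per=100_000):
    """Average annual rate for one age group (and one race/ethnic group).

    Cases summed over the period are divided by the population summed
    over the same period, then scaled to a rate per 100,000.
    """
    return sum(cases_by_year) / sum(population_by_year) * per


# Hypothetical counts for one age group over a five-year period
cases = [50, 55, 60, 45, 40]          # 250 cases in total
population = [100_000] * 5            # 500,000 person-years in total
rate = age_specific_rate(cases, population)
```

With these figures the rate is 250 / 500,000 x 100,000 = 50 per 100,000 per year.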

Crude rate

Crude rates are calculated by dividing the total number of cases or deaths by the total population at risk. For the race/ethnic groups for which annual population estimates were available, the race-specific total number of cases or deaths over the five-year period was divided by the race-specific population sum over the same period, and multiplied by 100,000 for an average annual crude rate per 100,000 population. Crude rates are useful in summarizing the cancer burden in a specific population. Because they do not account for differences in age distribution, they are not useful for comparing the risk of developing cancer in different race/ethnic groups, geographic areas, or time periods.
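
The limitation of crude rates for comparisons can be seen in a small numerical sketch. The two populations below are hypothetical: both have the same age-specific rates (10 per 100,000 in the younger group and 90 per 100,000 in the older group) and differ only in age distribution.

```python
def crude_rate(cases_by_age, population_by_age, per=100_000):
    """Total cases divided by total population, per 100,000."""
    return sum(cases_by_age) / sum(population_by_age) * per


# Hypothetical populations: [younger age group, older age group]
younger_population = [180_000, 20_000]
younger_cases = [18, 18]      # 10 and 90 per 100,000

older_population = [100_000, 100_000]
older_cases = [10, 90]        # 10 and 90 per 100,000

crude_younger = crude_rate(younger_cases, younger_population)
crude_older = crude_rate(older_cases, older_population)
```

The crude rates are 18 and 50 per 100,000, although the risk at every age is identical in the two populations.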

Estimated Annual Percent Change

The Estimated Annual Percent Change (EAPC) represents the average percent increase or decrease per year in the age-adjusted rate, assuming that the rate changes by a constant percentage each year over the interval. The EAPC is calculated by fitting a linear regression to the natural logarithm of the annual rates (r), using calendar year as the predictor variable (i.e., ln(r) = m(year) + b, where m is the slope and b is the intercept; EAPC = 100 * (e^m - 1)). Testing the hypothesis that the EAPC is equal to zero is equivalent to testing the hypothesis that the slope of the line in the above equation is equal to zero.
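
The calculation can be sketched with an ordinary least-squares fit written out by hand. The function name and the rates are hypothetical; the example series is constructed to grow by exactly 2% per year so that the result is easy to check.

```python
import math


def estimated_annual_percent_change(years, rates):
    """Fit ln(rate) = m * year + b by least squares; return 100 * (e^m - 1)."""
    n = len(years)
    log_rates = [math.log(r) for r in rates]
    mean_year = sum(years) / n
    mean_log = sum(log_rates) / n
    # Slope of the least-squares line
    sxx = sum((x - mean_year) ** 2 for x in years)
    sxy = sum((x - mean_year) * (y - mean_log) for x, y in zip(years, log_rates))
    m = sxy / sxx
    return 100 * (math.exp(m) - 1)


# Hypothetical age-adjusted rates rising 2% per year from 1988 to 1993
years = list(range(1988, 1994))
rates = [100 * 1.02 ** i for i in range(len(years))]
eapc = estimated_annual_percent_change(years, rates)
```

For this constructed series the slope m equals ln(1.02), so the EAPC is 2%. The sketch does not include the significance test on the slope.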

Cautions on Interpretation

Differences by race/ethnicity

The reliability of race-specific rates depends on the accuracy of race classification in both cases or deaths and in population estimates. Some variation in race-specific rates may reflect misclassification bias, rather than a true difference in cancer risk. Population estimates are based on self-identification at the time of the 1990 Census. The Census Bureau reports that the 1990 Census undercounted the total population by 1.6%, the Asian population by 2.3%, the black population by 4.4%, and the Hispanic population by 5% (9). The effect of undercounting the population in certain race/ethnic groups is that rates for these groups will be overestimated (while overcounting the population leads to underestimation of rates).
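
The size of the resulting overestimate can be approximated with simple arithmetic. The sketch below assumes that an undercount of u percent means the counted population is (100 - u) percent of the true population and that case counts are unaffected; the function name is hypothetical.

```python
def rate_overestimate_percent(undercount_percent):
    """Percent by which a rate is inflated when its denominator is undercounted.

    Assumes counted population = true population * (1 - undercount / 100).
    """
    counted_fraction = 1 - undercount_percent / 100
    return (1 / counted_fraction - 1) * 100


# Undercount percentages cited in the text above
inflation_hispanic = rate_overestimate_percent(5.0)
inflation_total = rate_overestimate_percent(1.6)
```

Under these assumptions a 5% undercount inflates a rate by about 5.3% (1 / 0.95 = 1.053), and a 1.6% undercount inflates it by about 1.6%.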

Race/ethnicity information for breast cancer cases is primarily based on information contained in the patient's medical record. This information may be based on self-identification by the patient, on assumptions made by an admissions clerk or other medical personnel, or on an inference from the race/ethnicity of parents, birthplace, maiden name or last name. Race/ethnicity for cancer deaths, on the other hand, is based on information on the death certificate, which is often completed by the funeral director or coroner, and may not always be based on information provided by next-of-kin. The reporting of race/ethnicity in either system may be influenced by the race/ethnic distribution of the local population, by local interpretation of data collection guidelines, and by other factors. While use of surname lists partially compensates for misclassification of some race/ethnic groups, it is likely that some differences in race-specific rates reflect biases of classification rather than true differences in risk.

California and SEER rates by race/ethnicity may not be directly comparable, due to different definitions of race/ethnic groups in each database. Hispanics in the SEER Program do not constitute an independent race/ethnic group, but are included in the "white" category. Because incidence rates of breast cancer among Hispanic women are lower than among whites, rates for white California women (which do not include Hispanics) are likely to be higher than the corresponding SEER rates.

Statistical significance

When comparing cancer rates for two populations, it should be kept in mind that results of such comparisons depend on both the magnitude of the difference between the two rates and the number of individuals in each population group (10). Estimates derived from a small population group tend to be less precise, and differences based on them are less likely to be deemed significant. Therefore, statistical tests may fail to detect real differences when they are based on a small number of individuals. On the other hand, statistically significant variations in rates do not necessarily mean that these variations are relevant from a biologic or public health standpoint. A statistically significant result may simply reflect the large number of individuals being studied.
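
The dependence on population size can be illustrated with a simple test statistic. The sketch below is not the method used in this report, which does not specify one: it assumes case counts follow a Poisson distribution and uses a normal approximation for the difference between two crude rates. The function name and all counts are hypothetical.

```python
import math


def rate_difference_z(cases_a, population_a, cases_b, population_b):
    """z statistic for the difference of two rates (normal approximation).

    Assumes Poisson-distributed counts, so the variance of each rate
    is cases / population**2.
    """
    rate_a = cases_a / population_a
    rate_b = cases_b / population_b
    standard_error = math.sqrt(cases_a / population_a ** 2 +
                               cases_b / population_b ** 2)
    return (rate_a - rate_b) / standard_error


# The same two rates (120 vs. 100 per 100,000) in small and large groups
z_small = rate_difference_z(60, 50_000, 50, 50_000)
z_large = rate_difference_z(6_000, 5_000_000, 5_000, 5_000_000)
```

Under these assumptions the identical difference in rates falls short of the conventional 1.96 threshold in the small groups and far exceeds it in the large groups.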


References

  1. Percy C, Van Holten V, Muir C (eds). International Classification of Diseases for Oncology. 2nd ed. Geneva, Switzerland: World Health Organization; 1990.

  2. World Health Organization. International Classification of Diseases. 9th rev. Geneva, Switzerland: World Health Organization, 1977.

  3. California Department of Finance. Population Estimates for California State and Counties: Report 93 E-2. Sacramento, CA: California Department of Finance, Demographic Research Unit, February 1994.

  4. California Department of Health Services. Cancer Reporting in California: Abstracting and Coding Procedures for Hospitals. California Cancer Reporting System Standards. 3rd ed. Sacramento, CA: California Cancer Registry, Data Standards and Assessment Unit, November 1994.

  5. Stewart SL, Glaser SL, Horn-Ross PL, West DW. SEER Study of Methods to Classify Hispanic Cancer Patients (Final Report: Contract N01-CN-05224). Union City, CA: Northern California Cancer Center, April 1993.

  6. Kosary CL, Ries LAG, Miller BA, Hankey BF, Harras A, Edwards BK (eds). SEER Cancer Statistics Review, 1973-1992: Tables and Graphs. National Cancer Institute. NIH Pub. No. 95-2789. Bethesda, MD, 1995.

  7. Krieger N. Overcoming the absence of socioeconomic data in medical records: validation and application of a census-based methodology. Am J Public Health 1992;82:703-710.

  8. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley and Sons, 1981.

  9. Bureau of the Census. Assessment of Accuracy of Adjusted versus Unadjusted 1990 Census Base for use in Intercensal Estimates: Report of the Committee on Adjustment of Postcensal Estimates. August 7, 1992.

  10. Oakes M. Statistical Inference. 1st ed. Chestnut Hill, MA: Epidemiologic Resources Inc., 1990.



http://www.ccrcal.org/breast96/technote.htm