In studies based on electronic health record (EHR) databases, diagnostic codes are commonly used to define clinical outcomes. However, the accuracy of the codes depends on several factors, such as whether the medical diagnosis is correct and the opportunity for physical examination (ascertainment process), and validity can vary between datasets.1, 2 The diagnosis of asthma and allergic diseases (AAD) in young children is particularly challenging: the symptoms are intermittent and the differential diagnosis is difficult.3 Therefore, most diagnoses rely on response to treatment and on parental report of symptoms, which can be influenced by past experience of disease in the children and their parents and can in turn lead to recall bias. The impact of disease misclassification depends on whether it is differential or non-differential, and on whether it is dependent on other errors.4
We recently analysed the association between exposure to antibiotics and the risk of AAD (asthma, atopic eczema, and allergic rhinitis5, 6) in children participating in the Born in Bradford (BiB) birth cohort study.7 Briefly, 12,453 pregnant women were recruited to BiB between 2007 and 2010, resulting in the births of over 13,500 children. Consent for health record linkage was obtained, and linkage has been achieved for approximately 98% of participants. In total, 13,044 children were linked to EHR. The protocol for the antibiotics study, written before the study started, is available elsewhere.5 In this letter, we present our approach to defining AAD outcomes using CTV3 Read codes (coded clinical terms designed for use in EHR in the NHS in the UK) and British National Formulary (BNF) codes for prescriptions of medications.
Initially, we planned to follow the common practice of using only validated definitions described in previous EHR studies, to ensure comparability. However, we identified several concerns: diagnostic procedures are not standardised; the codes used and their frequency can vary across settings and doctors; and some cases may not be recorded with the validated codes. Conversely, including all Read codes in our EHR that relate to our outcomes could lead to bias, because some of these codes are used for non-cases (e.g., family history of asthma).
Using some of the methods recommended for developing clinical codelists,8 we first developed conceptual definitions for each disease based on available data. Then, we searched for diagnoses in our EHR database in two ways: (1) using diagnostic codes described in previous studies, and (2) using case-insensitive text mining of the term definitions that accompany Read codes that could indicate diagnosis of AAD. For asthma, we found a large number of terms, which required us to adopt a pragmatic approach to shortlisting. Two authors (SSC and LP) selected all codes describing diagnoses, current adherence to treatment and control assessments, and excluded those describing asthma screening or considered too vague. For atopic eczema and hay fever, all codes found were related to the diagnosis and did not require the steps we employed for asthma. Additionally, we searched for BNF codes for the most common medications used to treat AAD (including generic and brand names). We discussed our definitions and lists of Read/BNF codes with clinicians and other researchers with expertise in AAD and agreed on the final definitions.
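The text-mining step can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the code used in the study: the dictionary entries, the placeholder codes (DEMO-1 and so on), the search patterns and the function name `search_codes` are all hypothetical. The exclusion patterns provide only a first-pass filter; in our study, shortlisting was done by manual review.

```python
import re

# Hypothetical code dictionary: (code, term description).
# The codes are placeholders, NOT real CTV3 Read codes.
CODE_TERMS = [
    ("DEMO-1", "Asthma"),
    ("DEMO-2", "Family history of asthma"),
    ("DEMO-3", "Asthma screening"),
    ("DEMO-4", "Childhood ASTHMA"),
    ("DEMO-5", "Atopic eczema"),
    ("DEMO-6", "Hay fever"),
    ("DEMO-7", "Fracture of wrist"),
]

# Assumed search terms per outcome (illustrative only).
INCLUDE = {
    "asthma": [r"asthma"],
    "atopic_eczema": [r"eczema"],
    "allergic_rhinitis": [r"hay ?fever", r"allergic rhinitis"],
}

# Assumed first-pass exclusions for terms that flag non-cases.
EXCLUDE = [r"family history", r"screening"]


def search_codes(code_terms, include_patterns, exclude_patterns):
    """Case-insensitive search of term descriptions.

    Returns (candidates, rejected): terms matching an include pattern,
    split by whether they also match an exclude pattern.
    """
    include = [re.compile(p, re.IGNORECASE) for p in include_patterns]
    exclude = [re.compile(p, re.IGNORECASE) for p in exclude_patterns]
    candidates, rejected = [], []
    for code, term in code_terms:
        if not any(p.search(term) for p in include):
            continue  # term is unrelated to this outcome
        if any(p.search(term) for p in exclude):
            rejected.append((code, term))  # kept for audit, not used as a case
        else:
            candidates.append((code, term))  # passed on to manual review
    return candidates, rejected


if __name__ == "__main__":
    candidates, rejected = search_codes(CODE_TERMS, INCLUDE["asthma"], EXCLUDE)
    print("Candidates for review:", candidates)
    print("Rejected:", rejected)
```

Returning the rejected terms alongside the candidates keeps every exclusion visible, so that reviewers can check that no genuine diagnostic term was discarded by an overly broad pattern.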
To account for uncertainty about whether the presence of a Read code for AAD reflected a confirmed diagnosis, we created two definitions for each outcome. For asthma and atopic eczema, the first definition was regarded as more specific than the second. The final case definitions are detailed in Table 1, and the CTV3 Read codes can be found at https://doi.org/10.17037/DATA.00003098.