Clinicians and Clients Disagree: Implications for Evidence-Based Practice

This blog piece by Dr. Douglas Samuel from Purdue University discusses a recently published article in the Journal of Abnormal Psychology.

It is well established that approaches to diagnosis differ substantially between clinical practice and research settings. Whereas the typical research study collects data using either a semi-structured interview administered by a research assistant or a self-report questionnaire completed by the client, the norm within real-world practice settings is to provide diagnoses on the basis of an unstructured clinical interview. Unfortunately, relatively little attention has been paid to the level of diagnostic agreement between these alternative approaches. In other words, how similar are the individuals diagnosed with a given disorder in clinical practice to those who are diagnosed in research settings?

Alarmingly, accumulating evidence suggests considerable divergence between research and clinical diagnoses. David Rettew and his colleagues (2009) reviewed the literature (a total of nearly 16,000 clients across 38 studies) on the agreement between semi-structured interviews and routine clinical diagnoses and reported that the overall agreement across all disorders was κ = .27. Although there were a few bright spots (e.g., eating disorders showed strong agreement), the overall conclusion was that the correspondence between research and practice diagnoses was strikingly low for most disorders.
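For readers less familiar with the kappa statistic, the sketch below shows how chance-corrected agreement between two diagnostic sources can be computed. It assumes a single present/absent diagnosis per client; the function name and the ten example ratings are hypothetical and are not data from Rettew et al. (2009).

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of clients on whom the two sources agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each source's base rates.
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sources use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten clients (1 = disorder diagnosed, 0 = not).
interview = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # semi-structured interview
clinician = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # unstructured clinical diagnosis

kappa = cohens_kappa(interview, clinician)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two sources agree on 7 of 10 clients, yet kappa is only about .35, because much of that raw agreement would be expected by chance given how rarely each source assigns the diagnosis.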

My colleagues and I have recently examined this question much more closely for personality disorder (PD), as this has been an area of diagnosis with a long-standing disconnect between clinical and research diagnoses (Perry, 1992). I first conducted a systematic review of clinicians’ diagnoses based on unstructured interviews and learned that across 27 studies, the median dimensional agreement with research diagnoses (e.g., semi-structured interview or self-report questionnaire) was only .23 (Samuel, 2015). Follow-up analyses revealed that this value was only modestly better for research diagnoses assigned based on semi-structured interviews (r = .28) than for those based on self-report questionnaires (r = .22). Furthermore, even when the clinician diagnoses were assigned using systematic assessment tools, such as the Shedler-Westen Assessment Procedure (Shedler & Westen, 1998), the overall agreement improved only to r = .33. In sum, the literature suggests that for PDs – and many disorders across the manual – the diagnoses assigned in routine clinical practice share limited variance with those assigned in research settings.
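To put "limited variance" in concrete terms, the short sketch below squares each of the correlations reported above, using the standard reading of r squared as the proportion of variance two measures share. The dictionary labels are shorthand of my own, and squaring a median correlation is only a rough guide rather than a result reported in the review.

```python
# Correlations reported in the paragraph above (Samuel, 2015).
reported_r = {
    "unstructured vs. research diagnoses (median)": 0.23,
    "unstructured vs. semi-structured interview": 0.28,
    "unstructured vs. self-report questionnaire": 0.22,
    "systematic clinician tools vs. research diagnoses": 0.33,
}

# Squaring a correlation gives the approximate proportion of shared variance.
shared_variance = {label: r ** 2 for label, r in reported_r.items()}

for label, r2 in shared_variance.items():
    print(f"{label}: r = {reported_r[label]:.2f}, shared variance = {r2:.1%}")
```

By this rough yardstick, clinical and research diagnoses share somewhere between about 5% and 11% of their variance.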

This has profound implications for the possibility of translating empirical findings into evidence-based practice (EBP), as it suggests that the findings from research studies may not be applicable to the individuals being treated in clinical practice. Arguably the greatest accomplishment in mental health over the past few decades has been the increased focus on providing psychological treatment that is supported by empirical research. This effort to develop and test interventions that offer demonstrable improvement in Randomized Controlled Trials (RCTs) has yielded a lengthy list of Empirically Supported Treatments that can inform an EBP (https://www.div12.org/psychological-treatments/treatments/). Yet in nearly all of these RCTs, the clients have been carefully diagnosed with a semi-structured interview. For example, Dialectical Behavior Therapy (DBT; Linehan, Tutek, Heard, & Armstrong, 1994) has proven to be an effective treatment for those diagnosed with Borderline Personality Disorder (BPD). Yet, given the disagreement between structured and unstructured diagnostic methods, how much confidence should a therapist who arrives at an unstructured diagnosis of BPD have that even high-fidelity DBT will benefit their client?

A second question that emerges from these diagnostic disagreement findings is: which source should be trusted? There is a robust literature on the relative value of informants – and discrepancies between informants – in the childhood psychopathology literature, but relatively little comparable information about this in adults (De Los Reyes, Thomas, Goodman, & Kundey, 2013). This is particularly salient for the limited agreement between clinicians’ ratings and clients’ self-reports. The typical response – at least as it pertains to personality pathology – is to dismiss the discrepant self-report information as invalid and biased by clients' lack of insight or deliberate attempts to portray themselves in a positive light (Huprich, Bornstein, & Schmitt, 2011).

Here again, though, recent evidence has suggested that it may not be as simple as trusting the clinician. Within a large clinical sample, we compared therapist and client ratings of PD pathology at baseline to indicators of psychosocial functioning collected five years later. Quite surprisingly, we found that self-report ratings routinely offered incremental prediction beyond the therapist ratings, whereas the reverse was only rarely true (Samuel et al., 2013). Although it comes from a single study, this finding provocatively suggests that self-reports have been given short shrift in terms of their value. My lab is currently analyzing additional data that compare client and therapist reports for predicting outcomes of psychotherapy.
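"Incremental prediction" here means that one source explains variance in the outcome over and above what the other source already explains. The sketch below illustrates the idea with the gain in R squared from adding a second predictor to a one-predictor regression, computed from the three pairwise correlations. It is not the analysis from Samuel et al. (2013); the function names and the eight sets of scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def incremental_r2(outcome, first, second):
    """Gain in R^2 when `second` joins a regression that already has `first`.

    Assumes `first` and `second` are not perfectly correlated.
    """
    r_y1 = pearson_r(outcome, first)
    r_y2 = pearson_r(outcome, second)
    r_12 = pearson_r(first, second)
    # R^2 of the two-predictor model, written in terms of correlations.
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return r2_full - r_y1 ** 2


# Hypothetical baseline ratings and later functioning scores for eight clients.
therapist = [2, 4, 3, 5, 1, 4, 2, 3]
client = [1, 5, 2, 4, 2, 5, 1, 3]
functioning = [1, 5, 3, 4, 2, 5, 1, 3]

gain_client = incremental_r2(functioning, therapist, client)
gain_therapist = incremental_r2(functioning, client, therapist)
print(f"Self-report beyond therapist: delta R^2 = {gain_client:.3f}")
print(f"Therapist beyond self-report: delta R^2 = {gain_therapist:.3f}")
```

Comparing the two gains mirrors the question asked in the study: does each source add something the other misses, or is one largely redundant once the other is known?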

In sum, there is robust evidence for systematic discrepancies between the diagnoses offered in routine clinical practice and those from research settings. These discrepancies have important implications for the ability of the empirical literature to inform clinical practice. More specifically, these findings encourage practitioners in the continued adoption of Evidence-Based Assessment strategies (Hunsley & Mash, 2007) as a crucial component of EBP. For researchers, the disagreement highlights an increased need to investigate the relative validities of alternative methods to determine how information from various sources can yield maximally valid diagnoses.

Discussion Questions

Considering longstanding concerns about the validity of categorical psychiatric diagnoses broadly, do you feel the low agreement is a function of true differences across sources or inherent limitations of the categories? Do you think ratings of cross-cutting dimensions would show greater agreement?
Within clinical practice, how do you typically incorporate information from clients that is discrepant from your own formulation?

Reference Article

Samuel, D. B., Suzuki, T., & Griffin, S. A. (in press). Clinicians and clients don’t agree: Five implications for clinical science. Journal of Abnormal Psychology.

About the Author

Douglas B. Samuel, Ph.D., is an assistant professor of clinical psychology at Purdue University. His research focuses on investigating dimensional models – particularly the Five Factor Model of personality – for improving the conceptualization of psychopathology. He is particularly interested in how multiple sources (e.g., clients, therapists, informants) and methods (e.g., EEG, ecological momentary assessment) can be combined to better assess and diagnose mental illness.