The Quantitative Cultural Validation of Three Tools among People Affected by Leprosy in West Java, Indonesia

Keywords: Indonesia, Leprosy, Neglected tropical diseases (NTDs), P-Scale Short, SALSA, Toolkit validation, Validation study, WGQ


Neglected tropical diseases (NTDs) are a medically diverse group of 17 diseases that are strongly associated with poverty. Globally, an estimated 1 billion people are affected by NTDs; in Indonesia, for example, most of the population lives in areas where at least one NTD is endemic. Figures on NTD-related morbidity and disability are only estimates, because culturally valid tools to measure them are lacking. In previous research, Van 't Noordende et al. (2016) developed a prototype toolkit of tools categorised according to the domains of the International Classification of Functioning, Disability and Health (ICF). This study aims to quantitatively validate the P-Scale Short, the WGQ and the SALSA scale, three tools that relate to the ICF components activity and participation.


The overall objective was to contribute, through a quantitative cultural validation study among persons affected by leprosy in West Java, Indonesia, to the development of an internationally usable, cross-NTD and cross-cultural toolkit of instruments to measure NTD-related morbidity and disability.


This study used a framework based on the work of Herdman et al. (1998), Stevelink & Van Brakel (2013) and Terwee et al. (2007). The framework comprised five types of equivalence: four qualitative (conceptual, semantic, item and operational) and one quantitative (measurement). Measurement equivalence refers to the psychometric properties of the tools. The psychometric properties assessed in this study were construct validity, internal consistency, floor and ceiling effects, and interpretability.
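Two of the properties listed above, internal consistency and floor and ceiling effects, have standard definitions that may help in reading the results. The formulas below are a sketch of the usual textbook definitions, not a reproduction of this study's analysis: for a scale of k items with item scores Y_i and total score X,

```latex
\begin{aligned}
\alpha &= \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
  \qquad X = \sum_{i=1}^{k} Y_i \\
\text{floor} &= 100 \cdot \frac{n_{\min}}{n}\,\%, \qquad
\text{ceiling} = 100 \cdot \frac{n_{\max}}{n}\,\%
\end{aligned}
```

Here n is the number of respondents, and n_min and n_max are the numbers of respondents with the lowest and highest possible total score. As a point of comparison, the quality criteria of Terwee et al. (2007) rate a Cronbach's alpha between 0.70 and 0.95 as positive and regard floor or ceiling effects as present when more than 15% of respondents reach the lowest or highest possible score; whether this study applied exactly these cut-offs is not stated in this summary.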


Before the scales could be validated quantitatively, a qualitative check was needed; this was carried out during the first three interviews. The quantitative validation used data from 101 interviews. The study population consisted of persons affected by leprosy living in a former leprosy community in Bantam, Java, Indonesia. Combining the WGQ(6), WGQ(11), P-Scale Short and SALSA had several advantages: all of the tools are based on the ICF, relate to the same or comparable ICF domains, are quick and easy to use, and have cross-cultural potential. To assess measurement equivalence, Spearman’s correlations, Cronbach’s alphas, percentages of the lowest and highest scores, means, standard deviations, medians and inter-quartile ranges were calculated in SPSS. Ethical approval was obtained, and all participants signed an informed consent form.
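The study computed these statistics in SPSS. Purely as an illustration, the sketch below computes Cronbach's alpha, floor and ceiling percentages and Spearman's rho in plain Python. The function names and any example data are hypothetical, complete answers (no missing values) are assumed, and this is not the SPSS implementation.

```python
# Hypothetical sketch of three psychometric statistics named in the methods.
# Assumes complete data: every item list has one score per respondent.
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def floor_ceiling(totals, lowest, highest):
    """Percentages of respondents at the lowest and highest possible total score."""
    n = len(totals)
    floor = 100 * sum(t == lowest for t in totals) / n
    ceiling = 100 * sum(t == highest for t in totals) / n
    return floor, ceiling


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / (ss_x * ss_y) ** 0.5
```

For example, with three invented items scored 0 to 2 for five respondents, `cronbach_alpha([[0, 1, 2, 2, 1], [0, 1, 2, 1, 1], [0, 2, 2, 2, 1]])` gives about 0.93, and `floor_ceiling([0, 4, 6, 5, 3], 0, 6)` gives `(20.0, 20.0)`. For construct validity, `spearman_rho` would be applied to the total scores of two instruments (for instance a WGQ version and the P-Scale Short) for the same respondents.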


The conceptual, semantic and item equivalence of the WGQ(6) and WGQ(11) was supported. During the pilot, item 4 was found confusing; after revision, item 4, like the other items, was qualitatively valid. Across all interviews, most participants needed an explanation of question 7. Construct validity, internal consistency, and floor and ceiling effects were appropriate for both WGQ versions, although the WGQ(11) scored better on all of these aspects. As an example of interpretability, the mean total WGQ(11) score was significantly higher among participants with disability grade 1 or 2 than among participants with disability grade 0.
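The interpretability example compares total scores between disability-grade groups. A minimal sketch of the descriptive side of that comparison follows, assuming records of (disability grade, total score) pairs; the names and data are hypothetical, and because the significance test used in the study is not specified in this summary, none is included.

```python
# Hypothetical sketch: descriptive statistics of a total score (e.g. WGQ(11))
# split into disability grade 0 versus grade 1 or 2. Not the study's SPSS
# analysis; names and data are illustrative assumptions.
from statistics import mean, median, quantiles, stdev


def describe(scores):
    """Mean, SD, median and inter-quartile range; needs at least two scores."""
    q1, _, q3 = quantiles(scores, n=4)  # quartile cut points, default 'exclusive' method
    return {
        "n": len(scores),
        "mean": mean(scores),
        "sd": stdev(scores),
        "median": median(scores),
        "iqr": q3 - q1,
    }


def describe_by_grade(records):
    """Summarise (disability_grade, total_score) pairs per grade group."""
    groups = {"grade 0": [], "grade 1-2": []}
    for grade, score in records:
        # grades are assumed to be 0, 1 or 2; anything non-zero goes to "grade 1-2"
        label = "grade 0" if grade == 0 else "grade 1-2"
        groups[label].append(score)
    return {label: describe(scores) for label, scores in groups.items()}


if __name__ == "__main__":
    example = [(0, 10), (0, 12), (0, 14), (1, 20), (2, 22), (2, 24)]  # invented data
    for label, stats in describe_by_grade(example).items():
        print(label, stats)
```

A significance test, for example a Mann-Whitney U test if scores are skewed, would then be applied to the two groups; which test the study actually used is not stated here.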

During the pilot, semantic, item and operational equivalence were supported for the P-Scale Short. The measurement properties were considered good. Question 12 was the most difficult to understand: most participants needed an example or an explanation.

For the SALSA scale, no problems were noticed during the pilot regarding conceptual, semantic or operational equivalence. Question 2 caused confusion and was therefore revised. The measurement properties were considered good. Item 4 was the most difficult to understand.


During the pilot, WGQ item 4 was revised. Based on this and other studies, this study recommends using the revised version of the item. Semantic, item and operational equivalence are supported. Compared with the WGQ(6), the WGQ(11) covers all aspects of the construct activity more comprehensively, is more internally consistent and is more sensitive in identifying people with activity limitations. In conclusion, the WGQ(11) is preferable for use in the study population.

The Bahasa Indonesia version of the P-Scale had already been used in two large studies among persons affected by leprosy, partly conducted on Java, so remaining translation problems were unlikely. Indeed, this study found no problems regarding semantic, item or operational equivalence. Although internal consistency was lower than in other studies, it is still considered adequate. The other measurement equivalences were considered good.

This and other studies suggest that SALSA item 3 becomes less relevant when the answer option about risk avoidance is deleted. This study therefore recommends removing question 3 if that answer option is removed as well. For question 2, it is recommended to use the revised version and to split it into two separate questions. The measurement properties of the SALSA scale are supported.


Overall, this study shows that the WGQ(6), WGQ(11), P-Scale Short and SALSA are culturally valid for use among persons affected by leprosy in West Java, with the WGQ(11) preferable to the WGQ(6) in this study context. These results cannot be generalised directly to other NTDs or regions: the tools need to be carefully tested or validated in each new context. Anyone who wants to use these or other tools needs to be highly aware of the importance of cultural validation. Validation on a broader scale will be needed to develop an internationally usable cross-NTD toolkit of instruments to measure NTD-related morbidity and disability.