Similarity Analysis of Product Customization Artefacts
Many software companies increasingly have to devote resources to customizing their products for specific customer needs. Because software product customizations (SPCs) are more frequent and involve more limited changes than typical releases of the software, overlap is common, both between SPCs and between SPCs and the normal evolution of the software.
This thesis aims to analyse and test how text similarity measures, such as the Normalized Compression Distance (NCD) and Normalized Google Distance (NGD), can be used to help detect overlap between SPC artefacts. Potentially this could decrease costs considerably by detecting overlap in early phases of SPC handling. The main focus will be on artefacts related to the request itself and the requirements, but later development artefacts can also be interesting targets.
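As a rough illustration of the NGD mentioned above, the following Python sketch computes it from occurrence counts. The function name `ngd` and its parameters are hypothetical. The sketch also assumes the counts can come from any indexed document collection (for example a corpus of SPC requests) rather than from a web search engine.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from occurrence counts (illustrative sketch).

    fx, fy -- number of documents containing term x, respectively term y
    fxy    -- number of documents containing both terms
    n      -- total number of indexed documents (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("both terms must occur at least once")
    if n <= max(fx, fy):
        raise ValueError("n must exceed both term counts")
    if fxy <= 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

Terms that always occur together get a distance of 0, and the distance grows as co-occurrence becomes rarer relative to the individual counts.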
The NCD and related measures have theoretically pleasing properties, since they approximate a universal similarity metric, and they have also shown a lot of promise in practice.
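For orientation, the NCD of two strings x and y is commonly defined as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed length. A minimal Python sketch follows. It assumes zlib as the compressor, which is an illustrative choice and not necessarily the one the thesis would use, and the helper names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts, approximated with zlib."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 suggest that the two texts share most of their content, while values near 1 suggest little shared structure. With a real compressor the result is only approximate and can slightly exceed 1, especially for short texts.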
Steps
The thesis project will involve
- studying and summarizing existing uses of text similarity metrics (TSMs), with a special focus on the NCD and NGD,
- describing and analysing the different artefacts involved in SPC handling at a partner company,
- testing, through experiments, how TSMs can be used to detect overlap and similarity between SPCs at the company.
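The experimental step could start from something like the Python sketch below, which ranks all pairs of SPC request texts by compression-based distance so that the most similar pairs can be inspected first. Everything here is an assumption made for illustration: the function names, the use of zlib as compressor, and the example requests, which are invented and not taken from the partner company.

```python
import zlib
from itertools import combinations


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts, approximated with zlib."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = len(zlib.compress(bx, 9)), len(zlib.compress(by, 9))
    cxy = len(zlib.compress(bx + by, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_overlap_candidates(artefacts: dict[str, str]) -> list[tuple[float, str, str]]:
    """Return (distance, id_a, id_b) for every pair of artefacts, most similar first."""
    scored = [
        (ncd(text_a, text_b), id_a, id_b)
        for (id_a, text_a), (id_b, text_b) in combinations(artefacts.items(), 2)
    ]
    return sorted(scored)


if __name__ == "__main__":
    # Hypothetical SPC requests, invented for this example.
    requests = {
        "SPC-1": "Customer A requests that the monthly sales report can be "
                 "exported as a PDF file including the company logo on every page.",
        "SPC-2": "Customer B requests that the monthly sales report can be "
                 "exported as a PDF document including the company logo on the first page.",
        "SPC-3": "Add support for two-factor authentication via SMS when "
                 "administrators log in to the configuration console.",
    }
    for distance, id_a, id_b in rank_overlap_candidates(requests):
        print(f"{id_a} vs {id_b}: {distance:.2f}")
```

Comparing all pairs is quadratic in the number of artefacts, which is probably acceptable for an initial experiment but may need filtering or indexing for a large SPC history.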
Prerequisites
Students interested in this topic should preferably have knowledge/experience/interest in:
- software engineering (some),
- data/text mining (merit).
Links / Input
- R. Feldt, R. Torkar, T. Gorschek, and W. Afzal, "Searching for Cognitively Diverse Tests: Towards Universal Test Diversity Metrics", In Proceedings of the 1st Workshop on Search-Based Software Testing, pp. 178-186, Lillehammer, Norway, 2008.
- P. Runeson, M. Alexandersson, and O. Nyholm, "Detection of Duplicate Defect Reports Using Natural Language Processing", In Proceedings of the 29th International Conference on Software Engineering, pp. 499-510, 2007.
- Ilyas and Kung, "A Similarity Measurement Framework for Requirements Engineering", In Proceedings of the Fourth International Multi-Conference on Computing in the Global Information Technology, pp. 31-34, 2009.
- CompLearn Toolkit for NCD
- String Metrics library