Skip to main content
留学咨询

辅导案例-CS547

By May 15, 2020No Comments

CS547 – Advanced Topics in Software Engineering Software Project Economics – Cost Estimation and Genetic Programming Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 1 / 20 Introduction Cost estimation is a challenging problem Given a project with various parameters (estimates of size, domain, programming languages, expertise of development team etc. etc.) how much is it going to cost? Many different approaches to making predictions: Algorithmic (COCOMO etc.) Historical data (Expert Judgement, Statistics and Machine Learning) Numerous approaches but no agreement over which one to use This topic looks at the use of genetic programming for cost estimation Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 2 / 20 Cost Estimation – Using Historical Data Typical dataset example: @attribute Input numeric @attribute Output numeric @attribute Inquiry numeric @attribute File numeric @attribute FPAdj numeric @attribute RawFPcounts numeric @attribute AdjFP numeric @attribute Effort numeric @data 25 150 75 60 1 1750 1750 102.4 193 98 70 36 1 1902 1902 105.2 70 27 0 12 0.8 535 428 11.1 40 60 20 12 1.15 660 759 21.1 10 69 1 9 0.9 478.89 431 28.8 13 19 0 23 0.75 377.33 283 10 34 14 0 5 0.8 256.25 205 8 … A project may have several parameters which capture various characteristics Will vary between projects and datasets Interested in deriving the most accurate prediction for the Effort parameter Given a new project we can use this to make an estimate for the development effort Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 3 / 20 Genetic Programming Application of evolutionary computation to programs or functions Genetically breeds a population of programs to solve a problem Uses the same principles of genetic operators: sexual recombination, mutation etc. Aim (in this case) is to evolve a function which makes an accurate prediction of cost given a project’s characteristics GP is also widely used for a range of other problems such as code optimisation or automated repair Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 4 / 20 Program Representation Programs expressed as syntax trees rather than lines of code e.g. max(x+x, x+3*y) Internally represented as trees or prefix expressions:- (max(+ x x) (+ x (* 3 y))) Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 5 / 20 GP – Key Components Need to specify: Terminal set Function Set Fitness Measure Run Parameters Termination Criterion Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 6 / 20 Terminal Set Identify the possible leaves that constitute the program. May consist of: External inputs (x, y etc.) Numerical constants (0, 42, 3.14159 etc.) 0-arity functions (rand() etc.) Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 7 / 20 Function Set Identify the possible non-terminal elements of the program. May consist of simple arithmetic operators (+, -, /, *, /) and a range of additional functions: Mathematical (sin, abs etc.) Boolean (and, or, not) Conditional (if-then-else) Looping (for, repeat) Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 8 / 20 Other Key Components Need to specify: Fitness Measure Captures goal of search. Same as for other search-based problems. Cost estimation: distance between predicted and known effort value Run Parameters Genetic operator probabilities Maximum size of programs … Termination Criterion Number of generations or fitness-based success criterion Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 9 / 20 Steps in a Run Identical to other search-based approaches 1 Randomly create an initial population of programs from available primitives 2 While the termination criteria remains unsatisfied, do: 1 Execute each program to determine its fitness 2 Select programs to contribute to the next generation 3 Create new individual programs by applying genetic operators 3 Return best individual Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 10 / 20 Initialising the Population Programs in initial population built by recursively generating a tree composed of random choices of functions and terminals Individuals generated subject to a defined maximum size Three standard approaches Full initialisation Grow initialisation Ramped half-and-half (combination of full and grow) Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 11 / 20 Full Initialisation Nodes taken from function set until a maximum tree depth is reached, after which only terminals may be chosen Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 12 / 20 Grow Initialisation Like full but allows nodes to be taken from the full function set until a maximum tree depth is reached Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 13 / 20 Genetic Operators – Crossover Given copies of two selected parents, crossover randomly selects a crossover point in each parent tree and swaps the sub-trees rooted at the crossover points Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 14 / 20 Genetic Operators – Mutation Mutation randomly selects a mutation point in a tree and replaces the sub-tree rooted there with a randomly generated sub-tree Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 15 / 20 Example using tiny gp.java Data: 1 1 2 4 3 9 4 16 5 25 … 45 2025 46 2116 47 2209 48 2304 49 2401 50 2500 Output: Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 16 / 20 Cost Estimation – Determining Prediction Accuracy Various standard measures to assess how good a prediction is: MMRE – Mean Magnitude of Relative Error:- MRE = (abs(actual effort – estimated effort))/actual effort Pred(n) – Proportion of estimates within n% of the actual MAE – Mean Absolute Error :- abs(actual effort – estimated effort) These measures are used to evaluate the performance of a model and may also form the basis of a fitness function Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 17 / 20 How Good is The Evolved Program? One approach is to use a train/test split: Train the model on 80% of the data, and hold back 20% for testing Alternatively, especially if the data set is small, use k-fold cross-validation: Evolve a program using k-1 partitions and compare the program’s output with the remaining partition Repeat and average over the whole dataset Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 18 / 20 Comparing Against Other Estimation Techniques Large number in use, e.g. Linear Regression Decision Trees KNN (K Nearest Neighbours) Plus many many other more sophisticated machine learning techniques…. Most implemented in Weka (machine learning and data mining toolkit) – installed in labs and uses same format as cost estimation data files you will be working from, or R (more powerful language and set of libraries), or within Python libraries. Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 19 / 20 Assessed Task Using GP to build software project cost estimation functions Read section 8.4 in the ‘Techniques Taxonomy, Tutorial’ paper Collection of effort datasets on myplace – try at le
ast two of these (more if you wish) Several frameworks/libraries to support this (e.g. JGAP for Java or DEAP for Python) Evaluate the effectiveness of your approach Compare with one other basic technique and also baseline (e.g. mean Effort) Produce a short report More details on myplace Software Project Economics – Cost Estimation and Genetic ProgrammingCS547 – Advanced Topics in Software Engineering 20 / 20

admin

Author admin

More posts by admin