Virtual Coffee, Social Golf and Genetic Algorithms

With everyone working remotely at the moment, we have virtual coffee mornings once a week where we divide people into random groups of four to have a coffee and chat together over a virtual call. Easy enough to organise at the start but very quickly we started to get repeat meetings with the same people, and it took time to manually try to prevent repeat meetings so we thought we could automate the process. After all, we’re a technology company!

It turns out the task is a bit more challenging that appears at first glance. It’s a combinatorial optimisation problem and similar problems have been posed in the past.

Kirkman’s Schoolgirls

In 1850, Rev Thomas Penyngton Kirkman posed the following problem:

Fifteen young ladies in a school walk out three abreast for seven days in succession: it is required to arrange them daily so that no two shall walk twice abreast.

REV THOMAS PENYNGTON KIRKMAN

There are only 7 possible solutions to this problem but there are 404,756,352,000 combinations to check.
(Source: https://en.wikipedia.org/wiki/Kirkman’s_schoolgirl_problem)

sOCIAL gOLF

In 1998, this question was famously posted on a usenet newsgroup:

A group of 32 golfers plays golf once a week in groups of 4. Schedule these golfers to play for as many weeks as possible without any two golfers playing in the same group more than once.

USENET NEWSGROUP

We know that the golfers cannot play in unique groups for 11 or more weeks because each golfer has three new partners each week and after 11 weeks would have 11 x 3 = 33 partners. Solutions were quickly presented for weeks 1 – 9, but it took 6 years before Alejandro Aguado found a solution for 10 weeks that provided unique pairing for all 10 weeks. And there is still no deterministic algorithm available that solves this problem for 10 weeks.

Size Matters

These problems belong to a class of computer problems that are defined as NP Hard. They are nondeterministic in nature and cannot be solved in polynomial time. As the number of items increases, the time required to solve them generally increases exponentially or in our case factorially!

Kirkman’s Schoolgirls has over 4 x 1011 combinations to check. With the Social Golfer problem, the solution space has more than 2.6 x 1036 combinations.  And with 80 people for coffee to put into 20 groups, we have about 3.88 x 10117 solutions or

149103035513049588114190695256638608678301369463261984535644677204639665115583643422226968473501696000000000000000000 solutions to check each week. To put that into context, there’s estimated to be 1082 atoms in the observable universe.

Just randomly assign them

So, if we can’t find a perfect solution, what happens if we just randomly assign people to groups every week. After all, with 80 people, in theory we could go 27 weeks before we have a repeat. (27 * 3 = 81). It turns out the odds are against us. We have a 12% chance of repeats in the second week and this increases to over 50% after a few more weeks. The Math is 1 – (C(76, 3) / C(79,3))6 = 51%

So even with a large set of people and small groups, simple randomisation is not good enough. We are OK with some repeat meetings but would like to minimise the number of repeat meetings without having to brute-force check the entire solution space.

Genetic Algorithms

This is where genetic algorithms come to the fore. A Genetic Algorithm is a search-based optimization technique based on the principles of Genetics and Natural Selection. It allows us to come up with a “good enough” solution very quickly.

It’s based on the principals of natural selection. First, we generate a random population of solutions, and we then combine and mutate the solutions to create new solutions in the population. Each solution is evaluated for “fitness” and the strongest are selected for the next generation and the process is repeated over hundreds of generations until we have a “fit enough” solution.

In our example, each user will be represented as a Gene and the user group assignments will be represented as a Chromosome. We generate a population of Chromosomes at random, then we define a fitness function that causes the algorithm to converge on a solution that has the fewest possible repeat meetings.

Each Chromosome contains a sequence of genes, and the order of the genes represents the groups that the users will be assigned to. This table shows how two Chromosomes with 16 genes would be assigned into 4 groups.

CombinationChromosomeGroup 1Group 2Group 3Group 4
Combination 1[a,b,c,d,e,f,g,h,I,j,k,l,m,n,o,p]a b c de f g hi j k lm n o p
Combination 2[b,a,c,d,e,f,g,h,I,j,k,l,m,n,o,p]b a c de f g hi j k lm n o p

Fitness

How do we evaluate the fitness of each chromosome? We want to minimise the number of repeat meetings each week and if there are repeat meetings, we’d like the algorithm to favour solutions that had repeat meetings long ago over solutions that had repeat meetings recently.

To do this, we read in the history of all the previous group assignments for the last n weeks, and we create a penalty score matrix for all users that have met previously. If two users met n weeks ago, they get 1 penalty point. If they met n-1 weeks ago, they get 2 penalty points. If they met last week, they get n-1 penalty points.

Once we calculate the penalty points for all 80 users for the last 20 weeks (80 x 20 calculations), we have a lookup matrix for our fitness function.

The fitness function is applied to each Chromosome solution. For each group we look at all the pairs of people and lookup their penalty score. We aggregate all the combinations for all users in all groups in each Chromosome and assign the combined penalty score to that Chromosome.

Evolving

The algorithm is then able to compare the fitness of two Chromosomes using the Fitness function.

To “breed” new children, we take two parents and select two random points along the Chromosome in one parent. The genes between the points are passed to the child. The remaining genes are transferred from the same parent, but in the order that they appear in the second parent. The result is that the child contains all of the values from a single parent but includes ordering, and therefore traits, from both parents and avoids the possibility of duplicates or exclusions.

We also encourage the best solutions to pass through to the next generation by keeping a percentage of the Chromosomes unchanged. These are referred to as Elite chromosomes.

Finally, to introduce some diversity into the population we introduce mutations. We randomly change the order of Genes in some of the Chromosomes. This helps the algorithm to avoid local maxima and to converge more quickly to a more optimal global solution.

Coding

There’s a .NET NuGet library called the Genetic Algorithm Framework which does all the heavy lifting with implementing the algorithm.

Our main tasks are to read in the historical data, generate a score matrix, and use it to write a fitness function for a Chromosome. We output the data to screen and serialize it back to a CSV file.

We also added an “exclude” flag for each user so that the algorithm can exclude them from the groupings in the event that they are on holidays or are unavailable for some reason.

The source code is available on GitHub for anyone that’s interested.  For best results, play around with the genetic algorithm parameters for elitism, mutation, crossover and population size. The fitness function can also be modified and improved for better/faster results.

https://github.com/seamusbrady/CoffeeMorningAssigner

Disclaimer, it’s provided “as-is”, without warranty of any kind.

Previous Post
Searching and Learning – a TEKenable Salesforce Insight

Read More

Menu