Find the year with the most people alive in Python

Given a list of people with their birth and death years (all between 1900 and 2000), find the year with the most people alive.

Here is my somewhat crude solution:

    def most_populated(population, single=True):
        years = dict()
        for person in population:
            for year in xrange(person[0], person[1]):
                if year in years:
                    years[year] += 1
                else:
                    years[year] = 0
        return max(years, key=years.get) if single else \
            [key for key, val in years.iteritems() if val == max(years.values())]

    print most_populated([(1920, 1939), (1911, 1944), (1920, 1955), (1938, 1939)])
    print most_populated([(1920, 1939), (1911, 1944), (1920, 1955), (1938, 1939), (1937, 1940)], False)

I am looking for a more efficient way to solve this problem in Python. Both readability and efficiency count. Also, for some reason, my code does not print [1938, 1939] when it should.

Update

The input is a list of tuples, where the first element of each tuple is the year the person was born and the second element is the year of death.

Update 2

The end year (the second element of the tuple) also counts as a year of the person's life: if a person dies in September 1939 (we don't care about the month), he was still alive for at least part of 1939. This should account for 1939 missing from my results.
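With Update 2 in mind, the missing 1939 comes from xrange(person[0], person[1]), which stops one year before the death year. Separately, the else branch starts each year's count at 0 instead of 1; that lowers every count by one equally, so it does not change which years win, but the counts themselves are off. A minimal corrected sketch of the same approach, written for Python 3 (range and items() in place of xrange and iteritems()), that also computes the maximum only once:

```python
def most_populated(population, single=True):
    """Count, for every year, how many people are alive; the death year is included."""
    years = {}
    for birth, death in population:
        # range() excludes its end, so use death + 1 to count the death year
        for year in range(birth, death + 1):
            # The first person seen in a year makes the count 1, not 0
            years[year] = years.get(year, 0) + 1
    best = max(years.values())  # computed once, not once per dictionary item
    if single:
        return max(years, key=years.get)
    return sorted(year for year, count in years.items() if count == best)
```

With the second input from the question, most_populated(..., False) now returns [1938, 1939].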

The best solution?

While @joran-beasley's solution wins on readability, @njzk2's was the most efficient algorithm for larger inputs. Thanks to @hannes-ovrén for providing the analysis in an IPython notebook on Gist.

python algorithm
6 answers
    >>> from collections import Counter
    >>> from itertools import chain
    >>> def most_pop(pop):
    ...     pop_flat = chain.from_iterable(range(i, j + 1) for i, j in pop)
    ...     return Counter(pop_flat).most_common()
    ...
    >>> most_pop([(1920, 1939), (1911, 1944), (1920, 1955), (1938, 1939)])[0]
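most_common()[0] returns a single (year, count) pair even when several years tie. The sketch below keeps the same Counter idea but returns every tied year, which is what the question's single=False case asks for; most_pop_years is a hypothetical name, not part of the answer:

```python
from collections import Counter
from itertools import chain

def most_pop_years(pop):
    """Return every year that ties for the largest number of people alive."""
    # Each person contributes every year from birth through death, inclusive
    counts = Counter(chain.from_iterable(range(i, j + 1) for i, j in pop))
    best = max(counts.values())
    return sorted(year for year, n in counts.items() if n == best)
```

For the sample input above, 1938 and 1939 both have four people alive, so both are returned.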

I would say:

  • Sort people by year of birth (the unborn list).
  • Starting from the earliest birth year:
    • Put each person born this year on the alive list,
    • using insertion sort by date of death (the list is kept sorted, so use a binary search),
    • until you reach a person who was not born this year.
  • Then, starting with the person on the alive list who dies first, remove people from the list as long as they died before this year.
  • Store the size of the alive list in a dict, keyed by year.
  • Increment the year.
  • Loop until the unborn and alive lists are both empty.

The complexity should be around O((m + n) * log(m)), where n is the number of years and m the number of people: each year is considered only once and each person only twice, times the cost of inserting into the alive list.

Implementation

    from bisect import insort

    def most_populated(population, single=True):
        years = dict()
        unborn = sorted(population, key=lambda x: -x[0])
        alive = []
        dead = []
        for year in range(unborn[-1][0], max(population, key=lambda x: x[1])[1] + 1):
            while unborn and unborn[-1][0] == year:
                insort(alive, -unborn.pop()[1])
            while alive and alive[-1] == -(year - 1):
                dead.append(-alive.pop())
            years[year] = len(alive)
        return max(years, key=years.get) if single else \
            [key for key, val in years.iteritems() if val == max(years.values())]
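bisect.insort finds the insertion point in O(log m), but inserting into a Python list still shifts the elements after it, so each insertion can cost O(m); the bisect documentation notes that the slow insertion step dominates. A min-heap from the standard heapq module keeps the logarithmic cost per person that the complexity estimate above assumes. This is a swapped-in variant rather than the answer's own code: most_populated_heap is a hypothetical name, it stores plain death years instead of negated ones, and it returns only the earliest best year.

```python
import heapq

def most_populated_heap(population):
    """Sweep the years once, keeping the death years of living people in a min-heap."""
    unborn = sorted(population)  # ascending by birth year
    alive = []                   # min-heap of death years
    first = unborn[0][0]
    last = max(death for _, death in population)
    best_year, best_count = first, 0
    i = 0
    for year in range(first, last + 1):
        # Everyone born this year joins the heap: O(log m) per push
        while i < len(unborn) and unborn[i][0] == year:
            heapq.heappush(alive, unborn[i][1])
            i += 1
        # Anyone whose death year is before this year is no longer alive
        while alive and alive[0] < year:
            heapq.heappop(alive)
        if len(alive) > best_count:
            best_year, best_count = year, len(alive)
    return best_year
```

The heap's smallest element is always the next person to die, which plays the role of alive[-1] in the list-based version.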

Another solution I just thought of:

  • Create two tables, birthdates and deathdates.
  • Record the birth dates and the dates of death in these tables.
  • Walk through these tables, accumulating the number of people alive at each point in time.

Overall O(n) complexity

Implementation

    from collections import Counter

    def most_populated(population, single=True):
        birth = map(lambda x: x[0], population)
        death = map(lambda x: x[1] + 1, population)
        b = Counter(birth)
        d = Counter(death)
        alive = 0
        years = {}
        for year in range(min(birth), max(death) + 1):
            alive = alive + b[year] - d[year]
            years[year] = alive
        return max(years, key=years.get) if single else \
            [key for key, val in years.iteritems() if val == max(years.values())]

Even better:

    from collections import Counter
    from itertools import accumulate
    import operator

    def most_populated(population, single=True):
        delta = Counter(x[0] for x in population)
        delta.subtract(Counter(x[1] + 1 for x in population))
        start, end = min(delta.keys()), max(delta.keys())
        years = list(accumulate(delta[year] for year in range(start, end)))
        return max(enumerate(years), key=operator.itemgetter(1))[0] + start if single else \
            [i + start for i, val in enumerate(years) if val == max(years)]
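The first version above relies on Python 2, where map returns a list. Under Python 3, map returns a one-shot iterator, so Counter(birth) consumes it and the later min(birth) raises ValueError on the now-empty iterator. A minimal Python 3 sketch of the same birth/death counting, with list comprehensions and items() in place of map and iteritems():

```python
from collections import Counter

def most_populated(population, single=True):
    # Lists, not map objects: birth and death are each read more than once below
    birth = [b for b, _ in population]
    death = [d + 1 for _, d in population]  # a person is gone the year after death
    b = Counter(birth)
    d = Counter(death)
    alive = 0
    years = {}
    for year in range(min(birth), max(death) + 1):
        alive += b[year] - d[year]  # running total of people alive this year
        years[year] = alive
    best = max(years.values())
    if single:
        return max(years, key=years.get)
    return [year for year, val in years.items() if val == best]
```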

We can also use numpy slicing, which is pretty neat and should also be efficient enough:

    import numpy as np
    from collections import namedtuple

    Person = namedtuple('Person', ('birth', 'death'))
    people = [Person(1900, 2000), Person(1950, 1960), Person(1955, 1959)]

    START_YEAR = 1900
    END_YEAR = 2000

    people_alive = np.zeros(END_YEAR - START_YEAR + 1)  # Alive each year

    for p in people:
        a = p.birth - START_YEAR
        b = p.death - START_YEAR + 1  # include year of death
        people_alive[a:b] += 1

    # Find indexes of maximum aliveness and convert to year
    most_alive = np.flatnonzero(people_alive == people_alive.max()) + START_YEAR

EDIT: It seems that namedtuple adds a bit of overhead, so to speed things up a little more, drop the namedtuple and iterate with "for birth, death in people" instead.
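A sketch of that namedtuple-free variant, assuming people is a plain list of (birth, death) tuples. Whether it is measurably faster depends on the input size, so treat the speedup as the answerer's observation rather than a guarantee:

```python
import numpy as np

START_YEAR = 1900
END_YEAR = 2000

# Plain (birth, death) tuples instead of namedtuple instances
people = [(1900, 2000), (1950, 1960), (1955, 1959)]

people_alive = np.zeros(END_YEAR - START_YEAR + 1)  # alive count per year
for birth, death in people:
    # The slice end is exclusive, so +1 keeps the year of death
    people_alive[birth - START_YEAR:death - START_YEAR + 1] += 1

# Indexes of maximum aliveness, converted back to years
most_alive = np.flatnonzero(people_alive == people_alive.max()) + START_YEAR
```

Here most_alive holds 1955 through 1959, the years in which all three people overlap.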


How about this:

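    from itertools import chain

    def max_pop(pop):
        # Each person contributes +1 in the birth year and -1 in the year after death
        events = chain.from_iterable(((b, 1), (d + 1, -1)) for b, d in pop)
        p = 0
        best = (0, 0)  # (year, population); avoids shadowing the built-in max
        # Sorting the tuples puts a -1 before a +1 in the same year
        for y, i in sorted(events):
            p += i
            if p > best[1]:
                best = (y, p)
        return best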

This is not affected by the length of the year span, but it is n log n in |pop| (unless you implement a radix sort, which would be about 10n for a thousand-year span and should be faster for |pop| > 1000). You can't have it both ways. A very general solution would be to scan first and decide which algorithm to use based on the measured year span and |pop|.
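The "scan first, then choose" idea could look roughly like the sketch below. It rests on assumptions: max_pop_sweep, max_pop_array and max_pop_auto are hypothetical names, the cutoff span <= n * bit_length(n) is a crude stand-in for comparing O(span + n) with O(n log n) rather than a tuned threshold, and the span-bounded branch uses a per-year difference array (the approach in other answers here) instead of the radix sort mentioned above.

```python
from itertools import chain

def max_pop_sweep(pop):
    """Sort-based sweep: O(|pop| log |pop|), independent of the year span."""
    events = sorted(chain.from_iterable(((b, 1), (d + 1, -1)) for b, d in pop))
    p, best = 0, (0, 0)
    for year, delta in events:
        p += delta
        if p > best[1]:
            best = (year, p)
    return best

def max_pop_array(pop, first, last):
    """Difference array over the span: O(|pop| + span), no sorting."""
    changes = [0] * (last - first + 2)  # extra slot for the year after the last death
    for b, d in pop:
        changes[b - first] += 1
        changes[d + 1 - first] -= 1
    p, best = 0, (0, 0)
    for offset, delta in enumerate(changes):
        p += delta
        if p > best[1]:
            best = (first + offset, p)
    return best

def max_pop_auto(pop):
    # First pass: measure the year span and |pop|
    first = min(b for b, _ in pop)
    last = max(d for _, d in pop)
    span = last - first + 1
    n = len(pop)
    # Hypothetical cost model: n.bit_length() is roughly log2(n) + 1
    if span <= n * max(1, n.bit_length()):
        return max_pop_array(pop, first, last)
    return max_pop_sweep(pop)
```

Both branches return the earliest (year, population) pair with the largest population, so the choice affects only running time.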


I came up with the following code, which does exactly what you need.

Let's say the range of years is 1900 to 2000.

Algorithm steps

  • Build an array X of 102 integers, all initialized to zero: one slot for each year from 1900 to 2000, plus one for 2001 so that a death in 2000 still has a slot for the following year.
  • For each of N people, increase X[year of birth - 1900] by one and decrease X[year of death + 1 - 1900] by one, since the year of death still counts as a year alive.
  • Iterate through X while keeping a running sum of its elements. The year with the most living people is 1900 plus the index where the running sum is largest.

Code (Python, as requested):

    def year_with_max_population(people):
        # One slot per year from 1900 to 2001; each entry is a change in population
        population_changes = [0] * 102
        for birth_year, death_year in people:
            population_changes[birth_year - 1900] += 1
            # The year of death counts as alive, so the person leaves the year after
            population_changes[death_year + 1 - 1900] -= 1
        max_population = 0
        max_population_index = 0
        population = 0
        # The running sum of the changes is the number of people alive in each year
        for index, population_change in enumerate(population_changes):
            population += population_change
            if population > max_population:
                max_population = population
                max_population_index = index
        return 1900 + max_population_index

Credit to Brian Schmitz.

