Rosalind's "First Mendel Law" IPRB

As preparation for the upcoming bioinformatics course, I am doing some tasks from rosalind.info. I am currently stuck in the assignment of Mendel’s First Law .

I think that I could overcome myself through this, but somehow my thinking should be too confused. My approach would be this:

Build a probability tree that has three levels. There are two creatures that mate, creature A and creature B. First level: what is the likelihood of being selected as a creature. Homozygous dominant (k), heterozygous (m) or homozygous recessive (n). It seems that, for example, for a homozygous dominant, since there is a total number (k + m + n) of creatures, and k of them are homozygous dominants, the probability is k / (k + m + n).

Then in this tree under each of them there will come a probability of the creature B, which is k / m / n, given that we know which creature A was chosen. For example, if a creature was chosen heterozygous (m), then the likelihood that creature B would also be heterozygous would be (m-1) / (k + m + n-1), because now fewer heterozygous creatures remain.

This will give two levels of probability and will include a lot of code to get this far, since I would literally build a tree structure and manually write the code for this part for each branch.

enter image description here

Now, choosing creatures A and B, each of them has two chromosomes. One of these chromosomes can be randomly selected. Thus, for chromosome 1 or 2, you can choose the same for B. Thus, there are 4 different options: choose 1 from A, 1 from B. Choose 2 from A, 1 from B. Choose 1 from A, 2 from B Pick 2 of A, 2 of B. The probability of each of them will be 1/4. So finally, this tree will have these leaf probabilities.

Then from there, somehow, by magic, I would put together all these probabilities to see how likely it is that two organisms will produce a creature with a dominant allele.

I doubt that this task was designed to solve several hours. What am I thinking too much about?

Update:

I decided this in the funniest way of brute force. They simply launched thousands of simulated mating and found out the part that ended up with the dominant allele until there was enough accuracy to transmit the task.

import random k = 26 m = 18 n = 25 trials = 0 dominants = 0 while True: s = ['AA'] * k + ['Aa'] * m + ['aa'] * n first = random.choice(s) s.remove(first) second = random.choice(s) has_dominant_allele = 'A' in [random.choice(first), random.choice(second)] trials += 1 if has_dominant_allele: dominants += 1 print "%.5f" % (dominants / float(trials)) 
+1
python bioinformatics rosalind
source share
6 answers

Species with dominant alleles: AA or AA .

Your total ppopulation ( k + n + m consists of k ( hom ) homozygous dominant organisms with AA , m ( het ) heterozygous dominant organisms with AA and n ( rec ) homozygous recessive organisms with AA . Each of them can be combined with any other.

Probability for organisms with a dominant allele:

 P_dom = n_dominant/n_total or 1 - n_recessive/n_total 

Performing punnett squares for each of these combinations is a good idea:

  hom + het | A | a ----------- A | AA | Aa a | Aa | aa het + rec | a | a ----------- A | Aa | Aa a | aa | aa 

Mating two organisms appears to lead to four possible children. hom + het gives 1 out of 4 organisms with a recessive allele, het + rec gives 2 out of 4 organisms with a recessive allele.

You might also want to do this for other combinations.

Since we do not just pair organisms one on one, but together with it we collect a whole bunch of k + m + n , it would be nice to know the total number of descendants and the number of “children” with a certain allele.

If you don't mind a bit of Python, a comb from scipy.misc might be useful here. in the calculation, do not forget (a) that you get 4 children from each combination and (b) that you need a coefficient (from Punnett squares) to determine the recessive (or dominant) offspring from the combinations.

Update

  # total population pop_total = 4 * comb(hom + het + rec, 2) # use PUNNETT squares! # dominant organisms dom_total = 4*comb(hom,2) + 4*hom*het + 4*hom*rec + 3*comb(het,2) + 2*het*rec # probability for dominant organisms phom = dom_total/pop_total print phom # probability for dominant organisms + # probability for recessive organisms should be 1 # let check that: rec_total = 4 * comb(rec, 2) + 2*rec*het + comb(het, 2) prec = totalrec/totalpop print 1 - prec 
+4
source share

Klaus's decision has most of the correctness; however, an error occurs when calculating the number of combinations that have at least one dominant allele. This part is incorrect, because although there are 4 possibilities when combining 2 alleles to form offspring, in fact only one possibility is fulfilled. Therefore, Klaus decision calculates a percentage that is noticeably higher than it should be.

The correct way to count the number of combo organisms with at least one dominant allele is:

 # k = number of homozygous dominant organisms # n = number of heterozygous organisms # m = number of homozygous recessive organisms dom_total = comb(k, 2) + k*m + k*n + .5*m*n + .75*comb(m, 2) # Instead of: # 4*comb(k,2) + 4*k*n + 4*k*m + 3*comb(n,2) + 2*n*m 

The above code segment works to calculate the total number of dominant combos , because it multiplies each part by the percentage (100% is 1) that it will generate dominant offspring. You can think of each part as the number of squared putns for each type of combo (k & k, k & m, k & n, m & n, m & m).

Thus, the entire correct code segment will look like this:

 # Import comb (combination operation) from the scipy library from scipy.special import comb def calculateProbability(k, m, n): # Calculate total number of organisms in the population: totalPop = k + m + n # Calculate the number of combos that could be made (valid or not): totalCombos = comb(totalPop, 2) # Calculate the number of combos that have a dominant allele therefore are valid: validCombos = comb(k, 2) + k*m + k*n + .5*m*n + .75*comb(m, 2) probability = validCombos/totalCombos return probability # Example Call: calculateProbability(2, 2, 2) # Example Output: 0.783333333333 
+1
source share

You do not need to run thousands of simulations during the cycle. You can run one simulation and calculate the probability from it.

 from itertools import product k = 2 # AA homozygous dominant m = 2 # Aa heterozygous n = 2 # aa homozygous recessive population = (['AA'] * k) + (['Aa'] * m) + (['aa'] * n) all_children = [] for parent1 in population: # remove selected parent from population. chosen = population[:] chosen.remove(parent1) for parent2 in chosen: # get all possible children from 2 parents. Punnet square children = product(parent1, parent2) all_children.extend([''.join(c) for c in children]) dominants = filter(lambda c: 'A' in c, all_children) # float for python2 print float(len(list(dominants))) / len(all_children) # 0.7833333 
+1
source share

This is more a matter of probability / count than coding. It is easier to calculate the likelihood that offspring have only recessive traits. Let me know if you have trouble understanding anything. I executed the following code, and my grader gave the result.

 def mendel(x, y, z): #calculate the probability of recessive traits only total = x+y+z twoRecess = (z/total)*((z-1)/(total-1)) twoHetero = (y/total)*((y-1)/(total-1)) heteroRecess = (z/total)*(y/(total-1)) + (y/total)*(z/(total-1)) recessProb = twoRecess + twoHetero*1/4 + heteroRecess*1/2 print(1-recessProb) # take the complement #mendel(2, 2, 2) with open ("rosalind_iprb.txt", "r") as file: line =file.readline().split() x, y, z= [int(n) for n in line] print(x, y, z) file.close() print(mendel(x, y, z)) 
0
source share

I just found a formula for the answer. You have 8 possible interactions that can lead to dominant offspring:

 DDxDD, DDxDd, DdxDD, DdxDd, DDxdd, ddxDD, Ddxdd, ddxDd 

With corresponding probabilities of production of dominant offspring:

 1.0, 1.0, 1.0, 0.75, 1.0, 1.0, 0.5, 0.5 

It seemed strange to me at first that DDxdd and ddxDD were two separate pairing events, but if you think about it, they are conceptually slightly different. The probability of DDxdd is k/(k+m+n) * n/((k-1)+m+n) and the probability of ddxDD is n/(k+m+n) * k/(k+m+(n-1)) . Mathematically, they are identical, but, in terms of probability, these are two separate events. Thus, your total probability is the sum of the probabilities of each of these different mating events, multiplied by the probability that the mating event produced dominant offspring. I will not simplify this here step by step, but it gives you the code:

 total_probability = ((k ** 2 - k) + (2 * k * m) + (3 / 4 * (m ** 2 - m)) + (2* k * n) + (m * n)) / (total_pop ** 2 - total_pop) 

All you have to do is enter the values k , m and n and you get the probability they are asking for.

0
source share

I doubt that this task was designed to solve it for hours. What am I thinking too much?

I also had the same question. After reading the whole thread, I came up with the code.

I hope the code itself will explain the calculation of probability:

 def get_prob_of_dominant(k, m, n): # A - dominant factor # a - recessive factor # k - amount of organisms with AA factors (homozygous dominant) # m - amount of organisms with Aa factors (heterozygous) # n - amount of organisms with aa factors (homozygous recessive) events = ['AA+Aa', 'AA+aa', 'Aa+aa', 'AA+AA', 'Aa+Aa', 'aa+aa'] # get the probability of dominant traits (set up Punnett square) punnett_probabilities = { 'AA+Aa': 1, 'AA+aa': 1, 'Aa+aa': 1 / 2, 'AA+AA': 1, 'Aa+Aa': 3 / 4, 'aa+aa': 0, } event_probabilities = {} totals = k + m + n # Event: AA+Aa -> P(X=k, Y=m) + P(X=m, Y=k): P_km = k / totals * m / (totals - 1) P_mk = m / totals * k / (totals - 1) event_probabilities['AA+Aa'] = P_km + P_mk # Event: AA+aa -> P(X=k, Y=n) + P(X=n, Y=k): P_kn = k / totals * n / (totals - 1) P_nk = n / totals * k / (totals - 1) event_probabilities['AA+aa'] = P_kn + P_nk # Event: Aa+aa -> P(X=m, Y=n) +P(X=n, Y=m): P_mn = m / totals * n / (totals - 1) P_nm = n / totals * m / (totals - 1) event_probabilities['Aa+aa'] = P_mn + P_nm # Event: AA+AA -> P(X=k, Y=k): P_kk = k / totals * (k - 1) / (totals - 1) event_probabilities['AA+AA'] = P_kk # Event: Aa+Aa -> P(X=m, Y=m): P_mm = m / totals * (m - 1) / (totals - 1) event_probabilities['Aa+Aa'] = P_mm # Event: aa+aa -> P(X=n, Y=n) + P(X=n, Y=n) = 0 (will be * 0, so just don't use it) event_probabilities['aa+aa'] = 0 # Total probability is the sum of (prob of dominant factor * prob of the event) total_probability = 0 for event in events: total_probability += punnett_probabilities[event] * event_probabilities[event] return round(total_probability, 5) 
0
source share

All Articles