Optimization algorithms

It's called optimization the process of iterating over a landscape of possible solutions to find the best one. One form of classifying these algorithms relies on its capacity to search throughout all possibilities:

Let's look at one important example of exhaustive search in python



Dynamic Programming


In simple terms, all the information an organism requires to regulate and replicate itself is located in the DNA. The DNA is represented by the combination of four nucleic acids that are usually referred to as:
[A,T,G,C]
the combination of multiple letters is called a sequence. A DNA sequence can span from a couple of nucleotides to the entire genome. Two versions of the same gene, found in different (but related) organisms have to first be aligned. Alignment is the process of overlapping two or more sequences arguing that two sequences that look more like each other than to any other sequence potentially represent the same genetic information in both organisms.
Let's take a look at the following two sequences:
# DD. SEQUENCE
# seq = str
# interp. a string of characters representing nucleotides
seq0 = "TCCATCACCCTGGGCTGGCGGCGTGTGGCTATGGGGACGCTGGGCAGGGCTGGCCAGGAGGATGGCTGAGACACTGGAGTCCCAGCAGGCACGCGTCACCCCTGGCACATCCCCAGGCAGTGGGACTCCCTGTCCCCAGTTCCTCAGCATCCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCTGGGGGCCTGGGGAATCTACGGGCACTGCAAGGGGCCCCGGGGGCTCAGCCCCCAGCTGGGGGGGGCCTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC" 
seq1 = "GAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGACTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC"
                
Notice that the length of the sequences is not the same. The goal of the dynamic programming algorithm is to find the best possible combination of letters in order to maximize a score. To achieve this, it is permitted to add gaps in between spaces. Gaps represent assumptions on missing data, and should be penalized. However, they are necessary to move the entire sequence to the right from a starting point; the position where the gap is inserted. We also have to penalize those positions that don't share the same letters.

The best alignment is the one which maximizes the number of letters that are the same while minimizing the number of mismatches and missing data
Look at the aligned version of the previous sequence:
__TCCA_TCACCCTGGGCTGGCGGC_GTGTGGCTATGGGGACGCTGGGCAGGGCTGGCC_AG_GAGGATGGCTGAG_ACACTGGAGTCCCAGCAGGCAC_GCGTCA_CCCCTGGCACATCCCCAGGCAG_TGGGACTCCCTGTC_CC_CA_GTTCCTCAGCA_T_CCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCT_GGGGGCCTGGGGAATCTACGGGCACTG_CAAGGGGCCCCGGGGGCTCAGC_CCCCAGCTGGGGGGGGC_CTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGT_CTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC_________________________________
GAGC_AAC_ACC__A__C_GGC___CG_G_GG_________C_C_GG_CGG__CC__CCTG_G_____TG_CCAG_GG_G______C___________TC_____AAC_______AT__CCCCAA_CGAG_____CTCTTC___C_TC_ACGCT____GG_GGCT______GGT___G__A__G_____________C_CTGGTGGAG_AA_CCTG____CT_GG____T__G_G__TG___G_______CT__G____CC_A_TC_________________C_____________________TG_________A_______A_______________________________________________________________________________________________________________________________________________________________________GAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGA_CTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC
                
There is something quite pleasing on looking at the sorted data this way, don't you agree? Here is the information that we need to implement our algorithm:


The parameters for the algorithm are:

  • PENALIZING_COSTS: dictionary - determines the penalty on gaps and mismatches of letters
  • SEQUENCE_0: string - the first sample used for the alignment. Its letters occupy the columns in the matrix
  • SEQUENCE_1: string - the second sample used for the alignment: Its letters occupy the rows in the matrix
  • SCORE_MATRIX: [[int, ..., n=len(sequence0)] , ..., n=len(sequence1)] - a 2D array of numbers representing all the possible combinations of letters between the sequences
  • # DD. PENALIZING_COSTS
    # penalizingCostOf = {"TRANSITION":int, "TRANSVERSION":int,"MATCH":0, "GAP":int}
    # interp. a summary of the penalizing costs for the comparison between nucleotides in a sequence alignment
    penalizingCostOf = {"TRANSITION":1, "TRANSVERSION":3,"MATCH":0, "GAP":2}
        
        
    # DD. SEQUENCE
    # seq = str
    # interp. a string of characters representing nucleotides
    seq0 = "TCCATCACCCTGGGCTGGCGGCGTGTGGCTATGGGGACGCTGGGCAGGGCTGGCCAGGAGGATGGCTGAGACACTGGAGTCCCAGCAGGCACGCGTCACCCCTGGCACATCCCCAGGCAGTGGGACTCCCTGTCCCCAGTTCCTCAGCATCCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCTGGGGGCCTGGGGAATCTACGGGCACTGCAAGGGGCCCCGGGGGCTCAGCCCCCAGCTGGGGGGGGCCTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC"  #sequence in the columns
    seq1 = "GAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGACTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC"  #sequence on the rows
        
    # DD. MATRIX
    # matrix = [[int, ..., len(SEQUENCE1)], ..., len(SEQUENCE2)]
    # interp. a matrix made from two sequences compared against each other
    matrix = [[0 for n2 in seq1] for n1 in seq0]
        
    # TEMPLATE FOR MATRIX
    # for row in matrix:
    #   for pt in row:
    #       ... pt
    

    The Algorithm

  • Generate and populate SCORE_MATRIX, comparing letters
    • For every cell:
      • Evaluate the cost of inserting a gap in SEQUENCE_0
      • Evaluate the cost of inserting a gap in SEQUENCE_1
      • Evaluate the cost of keeping the letters in their current configuration
      • keep the max score
    • With the SCORE_MATRIX populated, traverse the matrix backwards
      • invert the SCORE_MATRIX, the sequence0 and the sequence1
      • while the index of rows (y) and columns (x) in the SCORE_MATRIX_REVERSED are still in range of their sequences
        • Generate a list containing the MINIMUM_SCORES
        • Add the score to the right (x+1) to list MINIMUM_SCORES
        • Add the score below (y+1) to list MINIMUM_SCORES
        • Add the score diagonal (x+1,y+1) to list MINIMUM_SCORES
        • pick the index of the minimum score in the list MINIMUM_SCORES
        • Is the minimum score at index 0? T: It must be right. Increase x+1 and add a _ to SEQUENCE_0, while adding the letter at index x to SEQUENCE_1
        • Is the minimum score at index 1? T: It must be below. Increase y+1 and add a _ to SEQUENCE_1, while adding the letter at index y to SEQUENCE_0
        • Otherwise: It must be diagonal. Increase x+1 and y+1 and add the nucleotide at index x to SEQUENCE_1, and nucleotide at index y to SEQUENCE_0
    Looking at our algorithm this way can be tricky, even boring. I invite you to look at the following video, hoping it helps clarifying the logic behind the implementation!
    Let's work through the implementation together!
    # DATA DEFS
    
    # DD. TRANSITIONS
    
    # DD. PENALIZING_COSTS
    # penalizingCostOf = {"TRANSITION":int, "TRANSVERSION":int,"MATCH":0, "GAP":int}
    # interp. a summary of the penalizing costs for the comparison between nucleotides in a sequence alignment
    penalizingCostOf = {"TRANSITION": 1, "TRANSVERSION": 3, "MATCH": 0, "GAP": 2}
    
    # DD. SEQUENCE
    # seq = str
    # interp. a string of characters representing nucleotides
    seq0 = "TCCATCACCCTGGGCTGGCGGCGTGTGGCTATGGGGACGCTGGGCAGGGCTGGCCAGGAGGATGGCTGAGACACTGGAGTCCCAGCAGGCACGCGTCACCCCTGGCACATCCCCAGGCAGTGGGACTCCCTGTCCCCAGTTCCTCAGCATCCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCTGGGGGCCTGGGGAATCTACGGGCACTGCAAGGGGCCCCGGGGGCTCAGCCCCCAGCTGGGGGGGGCCTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC"  #sequence in the column
    seq1 = "GAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGACTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC"  #sequence on the top
    
    # DD. MATRIX
    # matrix = [[int, ..., len(SEQUENCE1)], ..., len(SEQUENCE2)]
    # interp. a matrix made from two sequences compared against each other
    matrix = [[0 for n2 in seq1] for n1 in seq0]
    
    # TEMPLATE FOR MATRIX
    # for row in matrix:
    #   for pt in row:
    #       ... pt
    
    # CODE
    
    
    def evaluateTransTransv(s1, s2):
      nm = [["A", "G"], ["C", "T"]]
    
      for nt in nm:
        if s1 in nt and s2 in nt:
          return penalizingCostOf["TRANSITION"]
      return penalizingCostOf["TRANSVERSION"]
    
    
    def evaluate(x, y, dir):
      # coming from left (l)
      if dir == "l":
        return matrix[y][x - 1] - penalizingCostOf["GAP"]
      # coming from top (t)
      elif dir == "t":
        return matrix[y - 1][x] - penalizingCostOf["GAP"]
      # coming from diagonal (d)
      else:
        if seq0[y] == seq1[x]:
          return matrix[y - 1][x - 1] - penalizingCostOf["MATCH"]
        else:
          return matrix[y - 1][x - 1] - evaluateTransTransv(seq0[y], seq1[x])
    
    
    for y, row in enumerate(matrix):
      for x, pt in enumerate(row):
        score = None
        # point at origin is 0
        if y == 0 and x == 0:
          score = 0
        # points in the top edge can come from left
        elif y == 0:
          score = evaluate(x, y, "l")
        # points in the left edge can come from top
        elif x == 0:
          score = evaluate(x, y, "t")
        # other points can come from left, top, diagonal
        else:
          scores = []
          # left (gap insertion)   0
          scores.append(evaluate(x, y, "l"))
          # top (gap insertion)    1
          scores.append(evaluate(x, y, "t"))
          # diagonal               2
          scores.append(evaluate(x, y, "d"))
          score = max(scores)
        matrix[y][x] = score
    
    seq0_aligned = ""
    seq1_aligned = ""
    revSeq0 = str(seq0[::-1])
    revSeq1 = str(seq1[::-1])
    revMatrix = [row[::-1] for row in matrix[::-1]]
    
    y = 0
    x = 0
    while y<len(revMatrix) and x<len(revMatrix[0]):
      # if last row and last column, keep the letters
      if y == len(matrix)-1 and x == len(matrix)-1:
        seq0_aligned += revSeq0[y]
        seq1_aligned += revSeq1[x]
        x += 1
        y += 1
      # if last row we can only go right in the inverse matrix.
      elif y == len(matrix)-1:
        # add insertion to revSeq0
        seq0_aligned += "_"
        # add nucleotide to revSeq1
        seq1_aligned += revSeq1[x]
        # update x
        x +=1
      # if last column, check only down in the inverse matrix
      elif x == len(row)-1:
        # add nucleotide to revSeq0
        seq0_aligned += revSeq0[y]
        # add insertion to revSeq1
        seq1_aligned += "_"
        # update y
        y +=1
      
      # otherwise evaluate right, diagonal, down
      else:
        max_scores = []
        max_scores.append(revMatrix[y][x+1]) 
        max_scores.append(revMatrix[y+1][x]) 
        max_scores.append(revMatrix[y+1][x+1])
        idx_minScore = max_scores.index(max(max_scores))
    
        # case 1. going right
        if idx_minScore == 0:
          # add insertion to revSeq0
          seq0_aligned += "_"
          # add nucleotide to revSeq1
          seq1_aligned += revSeq1[x]
          # update x
          x +=1
    
        # case 2. going down
        elif idx_minScore == 1:
          # add nucleotide to revSeq0
          seq0_aligned += revSeq0[y]
          # add insertion to revSeq1
          seq1_aligned += "_"
          # update y
          y +=1
        # case 3. going diagonal
        else:
          # add nucleotide to revSeq0
          seq0_aligned += revSeq0[y]
          # add nucleotide to revSeq1
          seq1_aligned += revSeq1[x]
          # update x and y
          x += 1
          y +=1
    
    seq0_aligned = seq0_aligned[::-1]
    seq1_aligned = seq1_aligned[::-1]
    print(seq0_aligned)
    print(seq1_aligned)