Optimization algorithms
It's called optimization the process of iterating over a landscape of possible solutions to find the best one. One form of classifying these algorithms relies on its capacity to search throughout all possibilities:- Exhaustive: find the best global solution by looking into every possible result
- Heuristic: finds the "best so far" solution by systematically navigating the landscape of possible results
Dynamic Programming
In simple terms, all the information an organism requires to regulate and replicate itself is located in the DNA. The DNA is represented by the combination of four nucleic acids that are usually referred to as:
[A,T,G,C]
the combination of multiple letters is called a sequence. A DNA sequence can span from
a couple of nucleotides to the entire genome. Two versions of the same gene, found in different (but
related) organisms
have to first be aligned. Alignment is the process of overlapping two or more sequences
arguing that two
sequences that look more like each other than to any other sequence potentially represent the same
genetic information in both organisms.
Let's take a look at the following two sequences:
# DD. SEQUENCE # seq = str # interp. a string of characters representing nucleotides seq0 = "TCCATCACCCTGGGCTGGCGGCGTGTGGCTATGGGGACGCTGGGCAGGGCTGGCCAGGAGGATGGCTGAGACACTGGAGTCCCAGCAGGCACGCGTCACCCCTGGCACATCCCCAGGCAGTGGGACTCCCTGTCCCCAGTTCCTCAGCATCCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCTGGGGGCCTGGGGAATCTACGGGCACTGCAAGGGGCCCCGGGGGCTCAGCCCCCAGCTGGGGGGGGCCTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC" seq1 = "GAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGACTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC"
Notice that the length of the sequences is not the same. The goal of the dynamic programming
algorithm is to find the best possible combination of letters in order to maximize a score. To achieve
this, it is permitted to add gaps in between spaces. Gaps represent assumptions on missing data, and
should be penalized. However, they are necessary to move the entire sequence to the right from a
starting point; the position where the gap is inserted. We also have to penalize those positions that
don't share the same letters.
The best alignment is the one which maximizes the number of letters that are the same while minimizing the number of mismatches and missing data
The best alignment is the one which maximizes the number of letters that are the same while minimizing the number of mismatches and missing data
Look at the aligned version of the previous sequence:
__TCCA_TCACCCTGGGCTGGCGGC_GTGTGGCTATGGGGACGCTGGGCAGGGCTGGCC_AG_GAGGATGGCTGAG_ACACTGGAGTCCCAGCAGGCAC_GCGTCA_CCCCTGGCACATCCCCAGGCAG_TGGGACTCCCTGTC_CC_CA_GTTCCTCAGCA_T_CCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCT_GGGGGCCTGGGGAATCTACGGGCACTG_CAAGGGGCCCCGGGGGCTCAGC_CCCCAGCTGGGGGGGGC_CTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGT_CTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC_________________________________ GAGC_AAC_ACC__A__C_GGC___CG_G_GG_________C_C_GG_CGG__CC__CCTG_G_____TG_CCAG_GG_G______C___________TC_____AAC_______AT__CCCCAA_CGAG_____CTCTTC___C_TC_ACGCT____GG_GGCT______GGT___G__A__G_____________C_CTGGTGGAG_AA_CCTG____CT_GG____T__G_G__TG___G_______CT__G____CC_A_TC_________________C_____________________TG_________A_______A_______________________________________________________________________________________________________________________________________________________________________GAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGA_CTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC
There is something quite pleasing on looking at the sorted data this way, don't you agree? Here is the
information that we need to implement our algorithm:
The parameters for the algorithm are:
# DD. PENALIZING_COSTS # penalizingCostOf = {"TRANSITION":int, "TRANSVERSION":int,"MATCH":0, "GAP":int} # interp. a summary of the penalizing costs for the comparison between nucleotides in a sequence alignment penalizingCostOf = {"TRANSITION":1, "TRANSVERSION":3,"MATCH":0, "GAP":2} # DD. SEQUENCE # seq = str # interp. a string of characters representing nucleotides seq0 = "TCCATCACCCTGGGCTGGCGGCGTGTGGCTATGGGGACGCTGGGCAGGGCTGGCCAGGAGGATGGCTGAGACACTGGAGTCCCAGCAGGCACGCGTCACCCCTGGCACATCCCCAGGCAGTGGGACTCCCTGTCCCCAGTTCCTCAGCATCCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCTGGGGGCCTGGGGAATCTACGGGCACTGCAAGGGGCCCCGGGGGCTCAGCCCCCAGCTGGGGGGGGCCTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC" #sequence in the columns seq1 = "GAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGACTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC" #sequence on the rows # DD. MATRIX # matrix = [[int, ..., len(SEQUENCE1)], ..., len(SEQUENCE2)] # interp. a matrix made from two sequences compared against each other matrix = [[0 for n2 in seq1] for n1 in seq0] # TEMPLATE FOR MATRIX # for row in matrix: # for pt in row: # ... pt
The Algorithm
- For every cell:
- Evaluate the cost of inserting a gap in SEQUENCE_0
- Evaluate the cost of inserting a gap in SEQUENCE_1
- Evaluate the cost of keeping the letters in their current configuration
- keep the max score
- With the SCORE_MATRIX populated, traverse the matrix backwards
- invert the SCORE_MATRIX, the sequence0 and the sequence1
- while the index of rows (y) and columns (x) in the SCORE_MATRIX_REVERSED are still in range of their sequences
- Generate a list containing the MINIMUM_SCORES
- Add the score to the right (x+1) to list MINIMUM_SCORES
- Add the score below (y+1) to list MINIMUM_SCORES
- Add the score diagonal (x+1,y+1) to list MINIMUM_SCORES
- pick the index of the minimum score in the list MINIMUM_SCORES
- Is the minimum score at index 0? T: It must be right. Increase x+1 and add a _ to SEQUENCE_0, while adding the letter at index x to SEQUENCE_1
- Is the minimum score at index 1? T: It must be below. Increase y+1 and add a _ to SEQUENCE_1, while adding the letter at index y to SEQUENCE_0
- Otherwise: It must be diagonal. Increase x+1 and y+1 and add the nucleotide at index x to SEQUENCE_1, and nucleotide at index y to SEQUENCE_0
Looking at our algorithm this way can be tricky, even boring. I invite you to look at the following
video, hoping it
helps clarifying the logic behind the implementation!
Let's work through the implementation together!
# DATA DEFS # DD. TRANSITIONS # DD. PENALIZING_COSTS # penalizingCostOf = {"TRANSITION":int, "TRANSVERSION":int,"MATCH":0, "GAP":int} # interp. a summary of the penalizing costs for the comparison between nucleotides in a sequence alignment penalizingCostOf = {"TRANSITION": 1, "TRANSVERSION": 3, "MATCH": 0, "GAP": 2} # DD. SEQUENCE # seq = str # interp. a string of characters representing nucleotides seq0 = "TCCATCACCCTGGGCTGGCGGCGTGTGGCTATGGGGACGCTGGGCAGGGCTGGCCAGGAGGATGGCTGAGACACTGGAGTCCCAGCAGGCACGCGTCACCCCTGGCACATCCCCAGGCAGTGGGACTCCCTGTCCCCAGTTCCTCAGCATCCTCTCGGCCTTGCCATCAGGCGCTGCGGGAGCGCTGGCAGGGTGGGCTGGGGGCCTGGGGAATCTACGGGCACTGCAAGGGGCCCCGGGGGCTCAGCCCCCAGCTGGGGGGGGCCTGGGGTTGAGGTGGGGGCCATGTCGACGCTGGCCCCCCTGCGCCTGCTGCGCAAGCCCTGGAACACCAGTGAGGGCAACCAGAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGAGTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATC" #sequence in the column seq1 = "GAGCAACACCACGGCCGGGGCCGGCGGCCCCTGGTGCCAGGGGCTCAACATCCCCAACGAGCTCTTCCTCACGCTGGGGCTGGTGAGCCTGGTGGAGAACCTGCTGGTGGTGGCTGCCATCCTGAAGAACAGGAACCTGCACTCGCCCACGTACTACTTCATCTGCTGCCTGGCCATCTCCGACATGCTGGTGAGCGTCAGCAACCTGGTGGAGACGCTCTTCATGCTGCTGATGGAGCACGGTGTGCTGGTGATCCACGCCAGCATCGTCCGCCACATGGATAACGTCATCGACATGCTCATCTGCAGCTCCATCGTGTCCTCCCTTTCCTTCCTGGGGGTCATTGCTGTGGACCGCTACATCACCATCTTCTACGCCCTGCGCTACCACAACATCATGACGCTGCAACGGGCTGTGGTGACCATGGCCAGTGTCTGGCTGGCCAGCACCGTCTCCAGCACCATCTTCATCACCTACTACCACAATAACGCCATCCTCCTCTGCCTCATCGGCTTCTTCCTCTTTGTGCTGGTCCTCATGCTGGTGCTCTACATCCACATGTTCATCCTGTCCCGCCACCACCTCTGCAGCATCTCCAGCCAGCAGAAGCAGCCCACCATCTACTGCAGCAGCAGCCTGAAGGGAACCATCACGCTCACCATCCTCCTGGGACTCTTCTTCATCTGCTGGGGGCCCTTCTTCTTCCACCTCATCCTCATCGTCACCTGCCCCACAAACCCCTTCTGCACCTGCTTCTTCAGCTACTTCAACCTCTTCTTCATCCTCATCATCTGCAACTCGGTGGTCAACCCCCTC" #sequence on the top # DD. MATRIX # matrix = [[int, ..., len(SEQUENCE1)], ..., len(SEQUENCE2)] # interp. a matrix made from two sequences compared against each other matrix = [[0 for n2 in seq1] for n1 in seq0] # TEMPLATE FOR MATRIX # for row in matrix: # for pt in row: # ... pt # CODE def evaluateTransTransv(s1, s2): nm = [["A", "G"], ["C", "T"]] for nt in nm: if s1 in nt and s2 in nt: return penalizingCostOf["TRANSITION"] return penalizingCostOf["TRANSVERSION"] def evaluate(x, y, dir): # coming from left (l) if dir == "l": return matrix[y][x - 1] - penalizingCostOf["GAP"] # coming from top (t) elif dir == "t": return matrix[y - 1][x] - penalizingCostOf["GAP"] # coming from diagonal (d) else: if seq0[y] == seq1[x]: return matrix[y - 1][x - 1] - penalizingCostOf["MATCH"] else: return matrix[y - 1][x - 1] - evaluateTransTransv(seq0[y], seq1[x]) for y, row in enumerate(matrix): for x, pt in enumerate(row): score = None # point at origin is 0 if y == 0 and x == 0: score = 0 # points in the top edge can come from left elif y == 0: score = evaluate(x, y, "l") # points in the left edge can come from top elif x == 0: score = evaluate(x, y, "t") # other points can come from left, top, diagonal else: scores = [] # left (gap insertion) 0 scores.append(evaluate(x, y, "l")) # top (gap insertion) 1 scores.append(evaluate(x, y, "t")) # diagonal 2 scores.append(evaluate(x, y, "d")) score = max(scores) matrix[y][x] = score seq0_aligned = "" seq1_aligned = "" revSeq0 = str(seq0[::-1]) revSeq1 = str(seq1[::-1]) revMatrix = [row[::-1] for row in matrix[::-1]] y = 0 x = 0 while y<len(revMatrix) and x<len(revMatrix[0]): # if last row and last column, keep the letters if y == len(matrix)-1 and x == len(matrix)-1: seq0_aligned += revSeq0[y] seq1_aligned += revSeq1[x] x += 1 y += 1 # if last row we can only go right in the inverse matrix. elif y == len(matrix)-1: # add insertion to revSeq0 seq0_aligned += "_" # add nucleotide to revSeq1 seq1_aligned += revSeq1[x] # update x x +=1 # if last column, check only down in the inverse matrix elif x == len(row)-1: # add nucleotide to revSeq0 seq0_aligned += revSeq0[y] # add insertion to revSeq1 seq1_aligned += "_" # update y y +=1 # otherwise evaluate right, diagonal, down else: max_scores = [] max_scores.append(revMatrix[y][x+1]) max_scores.append(revMatrix[y+1][x]) max_scores.append(revMatrix[y+1][x+1]) idx_minScore = max_scores.index(max(max_scores)) # case 1. going right if idx_minScore == 0: # add insertion to revSeq0 seq0_aligned += "_" # add nucleotide to revSeq1 seq1_aligned += revSeq1[x] # update x x +=1 # case 2. going down elif idx_minScore == 1: # add nucleotide to revSeq0 seq0_aligned += revSeq0[y] # add insertion to revSeq1 seq1_aligned += "_" # update y y +=1 # case 3. going diagonal else: # add nucleotide to revSeq0 seq0_aligned += revSeq0[y] # add nucleotide to revSeq1 seq1_aligned += revSeq1[x] # update x and y x += 1 y +=1 seq0_aligned = seq0_aligned[::-1] seq1_aligned = seq1_aligned[::-1] print(seq0_aligned) print(seq1_aligned)