TAKE ADVANTAGE OF THE COMPUTING POWER OF DNA COMPUTERS

Ever since Adleman [1] solved the Hamiltonian path problem using a combinatorial molecular method, many other hard computational problems have been investigated with DNA computers [3] [25] [9] [12] [19] [22] [24] [27] [29] [30]. However, each of these methods works toward a single answer through a series of steps that depend on the initial conditions. If a single one of these conditions changes, all the procedures must be repeated, no matter how complicated the procedures are or how simple the change is. The method we propose in this paper addresses this problem: when the initial conditions change, only a few extra steps are needed, which saves considerable time and cost.


Introduction
Since Adleman [1] and Lipton [25] presented the idea of solving difficult combinatorial search problems with DNA molecules, there has been a flood of proposals for using DNA for computation [4] [3] [25] [9] [12] [16] [18] [19] [20] [22] [24] [27] [28] [29] [30]. Because one liter of water can hold 10^22 bases of DNA, these methods all exploit the massive parallelism available in DNA computers, which raises the hope of solving problems that are intractable for electronic computers. However, brute-force approaches usually take a long time to complete because of their lengthy laboratory procedures, and all existing methods must restart the whole process when the initial conditions change.
In this paper, we propose a new class of algorithms to be implemented on a DNA computer. These algorithms are only slightly affected by changes to the initial conditions, which gives DNA computers great flexibility. Knapsack problems are classical problems that can be solved by this method. Because these problems are NP-complete, solving large instances with conventional electronic computers is unrealistic. Thanks to their massive parallelism, DNA computers using our method can solve substantially larger instances.
Throughout the paper, we assume that all molecular biological procedures are error-free. This is not true in the real world, but a large body of research has addressed this error-resistance problem [5] [10] [11] [14] [35] [36], and these works also propose many fault-tolerant techniques. We hope that errors arising during our DNA computer operations can be handled by these techniques.
The rest of the paper is organized as follows. The next section explains the methodology. Section 3 describes how NP-complete problems are solved. Section 4 explains how we can avoid repeating all the procedures when a small change is made to the initial conditions. The last section concludes the paper and points out future work.

DNA Computation Model
In this section, we present the fundamental model of the basic operations used in our DNA computation. All operations needed to solve the NP-complete problems in the next section are introduced and explained below.

1. reset(S): This operation, also called initialization, generates all the strands for the subsequent operations. Depending on the requirement, the strands can all represent the same value or represent different values.
2. addin(x, S): This operation adds a value x to every number in the set S.
3. sub(x, S): This operation subtracts a value x from every number in the set S.
4. divide(S, S1, S2, C): This operation separates the set S into two sets S1 and S2 based on the criterion C. If no criterion is given, the elements of S are picked at random and evenly distributed between S1 and S2.
5. union(S1, S2, S): This operation combines, in parallel, all the elements of S1 and S2 into the set S.
6. copy(S, S1): This operation produces a copy S1 of S.
7. select(S, C): This operation selects an element of S that satisfies the criterion C. If no C is given, an element is selected at random.
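To make the abstract model concrete, the seven operations can be mimicked in software. The following Python sketch is a hypothetical in-silico simulation, not a DNA implementation. A tube is modeled as a list with one entry per strand, holding the integer that strand encodes, and criteria are ordinary predicates. Because Python functions return values, divide returns the pair (S1, S2) and union returns S instead of filling output arguments. The rng parameter is an illustrative addition that makes random splits reproducible.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical in-silico model: a tube is a list, one entry per strand,
# holding the integer that the strand encodes.
Tube = List[int]
Criterion = Callable[[int], bool]


def reset(count: int, value: int = 0) -> Tube:
    """Generate `count` strands that all encode `value`."""
    return [value] * count


def addin(x: int, s: Tube) -> Tube:
    """Add x to the number encoded by every strand in the tube."""
    return [v + x for v in s]


def sub(x: int, s: Tube) -> Tube:
    """Subtract x from the number encoded by every strand in the tube."""
    return [v - x for v in s]


def divide(s: Tube, c: Optional[Criterion] = None,
           rng: Optional[random.Random] = None) -> Tuple[Tube, Tube]:
    """Split by criterion c; with no criterion, split randomly into halves."""
    if c is not None:
        return [v for v in s if c(v)], [v for v in s if not c(v)]
    shuffled = list(s)
    (rng or random.Random()).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def union(s1: Tube, s2: Tube) -> Tube:
    """Pour two tubes together."""
    return s1 + s2


def copy(s: Tube) -> Tube:
    """Duplicate a tube (done with PCR in the wet lab)."""
    return list(s)


def select(s: Tube, c: Optional[Criterion] = None,
           rng: Optional[random.Random] = None) -> Optional[int]:
    """Return one strand satisfying c (any strand if c is None), or None."""
    candidates = [v for v in s if c is None or c(v)]
    return (rng or random.Random()).choice(candidates) if candidates else None


if __name__ == "__main__":
    tube = reset(8)                          # eight strands encoding 0
    s1, s2 = divide(tube, rng=random.Random(0))
    tube = union(addin(5, s1), s2)           # add 5 to one half only
    print(sorted(tube))                      # [0, 0, 0, 0, 5, 5, 5, 5]
```

Modeling the tube as a plain list keeps duplicate strands visible, which mirrors the fact that a real tube holds many copies of each strand.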

Biological Implementation
In this section, we describe the biological operations that implement our DNA computation model.

1. reset(S): This initialization operation can be accomplished with mature biological DNA operations [1] [4] [6] [13] [15] [30]. It generates a tube of DNA strands representing the same number, i.e., strands consisting of exactly the same nucleotides in the same order. It may also generate different strands representing different numbers, depending on the requirement.
2. addin(x, S): Several arithmetic operations for DNA computers have been developed [30] [19]; we use the method introduced in [30]. This operation adds a number to all the strands in the tube, using the method first introduced in [8]. After the addition, every strand in the tube represents the sum of the value represented by the original strand and the added number, regardless of what the tube contained before the operation. Readers may refer to [30] for details of how this addition is performed.
3. sub(x, S): This operation is carried out in the same way as addin; it subtracts the value x from every strand in the tube.
4. divide(S, S1, S2, C): This step separates one tube of strands into two tubes. Without a criterion, each resulting tube holds approximately half of the strands of the original tube. The criterion C can be the presence or absence of a certain segment, e.g., ATTCG, in which case the magnetic bead method can be used to extract the matching strands [1] [23].
5. union(S1, S2, S): This operation simply pours two tubes of strands into one.
6. copy(S, S1): This operation copies the DNA strands of the original tube, doubling the number of strands. The best and easiest method for this is PCR (Polymerase Chain Reaction). Because PCR is regarded as unstable [31] [34] [33] [32], we use this operation as few times as possible.
7. select(S, C): This procedure extracts the strands we are looking for from tube S according to a criterion C. Existing methods introduced in [2] [21] [25] can be used.
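The segment-based divide and the PCR-based copy can be illustrated with strands written as sequence strings. This Python sketch is an idealized toy under stated assumptions: separation is error-free, a strand is captured exactly when it contains the segment (the bead is assumed to carry the segment's reverse complement), and every PCR cycle exactly doubles every strand. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Watson-Crick pairing, used to build the capture probe for a segment.
COMPLEMENT: Dict[str, str] = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Sequence (5'->3') of the probe that hybridizes with `seq`."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def bead_divide(tube: List[str], segment: str) -> Tuple[List[str], List[str]]:
    """Idealized bead separation on the criterion 'contains segment'.

    Strands containing `segment` bind to beads carrying its reverse
    complement (S1); all other strands stay in solution (S2).
    """
    bound = [s for s in tube if segment in s]
    unbound = [s for s in tube if segment not in s]
    return bound, unbound


def pcr_copy(tube: List[str], cycles: int = 1) -> List[str]:
    """Idealized PCR: every cycle exactly doubles each strand."""
    for _ in range(cycles):
        tube = tube + tube
    return tube


if __name__ == "__main__":
    tube = ["GGATTCGAA", "GGCCCTTAA", "TTATTCGCC"]
    s1, s2 = bead_divide(tube, "ATTCG")
    print(reverse_complement("ATTCG"))  # CGAAT
    print(s1, s2)  # ['GGATTCGAA', 'TTATTCGCC'] ['GGCCCTTAA']
    print(len(pcr_copy(s2, cycles=3)))  # 8
```

Real PCR falls short of exact doubling and can introduce errors, which is why the text limits how often copy is used.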

One Simplified NP-complete Problem
We now show how to solve a simplified knapsack problem [7], one of the NP-complete problems, which becomes intractable for current electronic computers as the problem size grows.
Problem: Given an integer K and n items of different sizes such that the ith item has integer size k_i, find a subset of the items whose sizes sum to exactly K, or determine that no such subset exists [26] [17].
To solve this knapsack problem on a DNA computer, we use the operations presented in the previous section, as follows:
a-1 reset(S): Generate a large number of strands in a tube S, all representing 0, i.e., all identical. Each strand is treated as a "bag" that holds items, so initially all potential bags are empty.
a-2 divide(S, S1, S2): Split S randomly into S1 and S2. When S contains a large number of strands, the two sets are almost identical and close to exact copies of S.
a-3 addin(k_i, S1): Add the integer k_i, the size of the ith item, to every strand in S1.
a-4 union(S1, S2, S): Combine the two sets. About half of the potential bags now contain the ith item and the others do not.
a-5 Repeat steps a-2, a-3 and a-4 until all n items have been added.
a-6 select(S, C): The criterion C is whether a strand representing K exists. If such a strand is found, some subset of the items has sizes summing to exactly K; if there is no such strand, the answer is no.
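Steps a-1 to a-6 can be checked with a small idealized simulation. In this Python sketch, which is hypothetical and assumes error-free operations, the tube is represented by its set of distinct strand species, and divide is modeled in the limiting case described in a-2, where each half of a very large tube contains every species. As an extra bookkeeping assumption not present in the paper's encoding, each species also records which items were added, so the result can be verified. The function also counts operations.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# One strand species: (integer the strand encodes, indices of items added).
# The item indices are a hypothetical bookkeeping tag used only for checking.
Strand = Tuple[int, FrozenSet[int]]


def simplified_knapsack(sizes: List[int], K: int) -> Tuple[Optional[Strand], int]:
    """Idealized run of steps a-1..a-6; returns (strand encoding K or None, #operations)."""
    ops = 0
    tube: Set[Strand] = {(0, frozenset())}          # a-1 reset: every bag is empty
    ops += 1
    for i, k_i in enumerate(sizes):                 # a-5 repeat for every item
        s1, s2 = set(tube), set(tube)               # a-2 divide (idealized: each half holds every species)
        s1 = {(v + k_i, items | {i}) for v, items in s1}  # a-3 addin(k_i, S1)
        tube = s1 | s2                              # a-4 union
        ops += 3
    hits = sorted((s for s in tube if s[0] == K), key=lambda s: sorted(s[1]))
    ops += 1                                        # a-6 select: is there a strand encoding K?
    return (hits[0] if hits else None), ops


if __name__ == "__main__":
    hit, ops = simplified_knapsack([3, 5, 8, 13], 16)
    print(hit[0], ops)                              # 16 14
    print(simplified_knapsack([3, 5, 8, 13], 2))    # (None, 14)
```

The operation counter returns 3n+2 for n items: one reset, three operations per item and one final select. The number of distinct species, by contrast, doubles with every item, which in the wet lab is absorbed by the number of strands in the tube.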
The algorithm requires 3n+2 steps, where n is the total number of items: one reset, three operations (divide, addin and union) per item, and one final select. Unlike another DNA solution for this kind of knapsack problem, introduced by Baum [7], we do not require the items to have balanced sizes. Therefore, any instance of the simplified knapsack problem can be solved with a number of operations polynomial (in fact linear) in n, as long as the total size of the n items can be represented by the method of [30].

An Advanced Problem
In the previous section, we gave a DNA algorithm for one of the NP-complete problems, the simplified knapsack problem. Here, we extend the method to solve the complete version of the knapsack problem, which is computationally more intensive.
Problem: The input is a set X in which each element x ∈ X has an associated size k(x) and value v(x). The problem is to determine whether there is a subset B ⊆ X whose total size is ≤ K and whose total value is ≥ V.
Algorithm: The advanced algorithm, based on the one for the simplified knapsack problem, is as follows:
b-1 reset(S): Generate a large number of "empty" strands in the set. Each strand is treated as two parts, both initially zero: the first part X represents the total size of the items in the bag, and the second part U represents their total value. Since the bags are initially empty, both the size and the value are zero, and each strand has the form 5′-X-U-3′. X and U are made long enough to hold the total item size and the total item value, so that no operation on one part interferes with the other.
b-2 divide(S, S1, S2): Split S randomly into S1 and S2; as in a-2, when S contains a large number of strands, the two sets are close to exact copies of S.
b-3 addin(k_i, S1) and addin(v_i, S1): Operate on S1 only. The size k_i of the ith item is added to part X and its value v_i is added to part U of every strand in S1; S2 is left untouched. Adding different numbers to their intended parts relies on the locators L_i and R_i described in [30].
b-4 union(S1, S2, S): Combine the two sets. About half of the potential bags now contain the ith item and the others do not.
b-5 Repeat steps b-2, b-3 and b-4 until all items have been added.
Because we are not looking for one particular number, but for strands whose size X is at most K while their value U is at least V, we cannot easily pick out a single strand. We therefore perform one extra step before extracting the final result.
b-6 divide(S, S1, S2, C): The criterion C is X ≤ K versus X > K. Assume that S1 contains all strands with X ≤ K and S2 holds the rest.
b-7 select(S1, C): We extract the answer from S1, since S1 contains exactly the bags whose items do not exceed the capacity, i.e., X ≤ K. Because the value of each strand is represented by a fixed number of digits, we only need to go through these digits one by one to find a strand whose value U is at least V.
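Steps b-1 to b-7 can be simulated in the same idealized way. This Python sketch assumes error-free operations, the same limiting-case divide as before, and a hypothetical per-item bookkeeping tag on each species. For b-7 it additionally assumes that U and V are written as fixed-width decimal digit strings compared most significant digit first. The paper does not fix the base or the scanning order, so at_least is only one plausible reading of "going through the digits one by one".

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# One strand species: (X = total size, U = total value, items added).
# The item indices are a hypothetical bookkeeping tag used only for checking.
Strand = Tuple[int, int, FrozenSet[int]]


def at_least(u: str, v: str) -> bool:
    """Digit-by-digit test of U >= V on equal-width decimal strings, most significant first."""
    for du, dv in zip(u, v):
        if du != dv:
            return du > dv
    return True  # every digit matches, so U == V


def knapsack(sizes: List[int], values: List[int], K: int, V: int,
             width: int = 6) -> Optional[Strand]:
    """Idealized run of steps b-1..b-7; returns a strand with X <= K and U >= V, or None."""
    if sum(values) >= 10 ** width or V >= 10 ** width:
        raise ValueError("width too small to encode U and V")
    tube: Set[Strand] = {(0, 0, frozenset())}            # b-1 reset: X = U = 0
    for i, (k_i, v_i) in enumerate(zip(sizes, values)):  # b-5 repeat for every item
        s1, s2 = set(tube), set(tube)                    # b-2 divide (idealized)
        s1 = {(x + k_i, u + v_i, items | {i}) for x, u, items in s1}  # b-3 addin to X and U
        tube = s1 | s2                                   # b-4 union
    s1 = sorted((s for s in tube if s[0] <= K),          # b-6 divide on X <= K
                key=lambda s: (s[0], s[1], sorted(s[2])))
    target = str(V).zfill(width)
    for x, u, items in s1:                               # b-7 select on U >= V
        if at_least(str(u).zfill(width), target):
            return (x, u, items)
    return None


if __name__ == "__main__":
    print(knapsack([2, 3, 4, 5], [3, 4, 5, 6], K=5, V=7))  # (5, 7, frozenset({0, 1}))
    print(knapsack([2, 3, 4, 5], [3, 4, 5, 6], K=5, V=8))  # None
```

Scanning from the most significant digit lets the comparison stop at the first digit where U and V differ, which is why the fixed digit width of U matters.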

Problem Reconsideration
In the previous section, we introduced new algorithms for solving an NP-complete problem, the knapsack problem. Here we show the main advantage of our algorithms: unlike existing algorithms, which must restart the whole computation whenever the initial conditions change, ours need only a few extra operations to solve the modified problem. This greatly reduces time and cost, because DNA computing usually requires expensive materials and can take a very long time, e.g., months, to complete.
We first consider the simplified knapsack problem. The initial condition is an integer K and n items of different sizes. After the procedure shown in section 3.1, we obtain a bag of size K that holds m items, where m ≤ n. Suppose we make a minor change to the initial condition: one of the n items is lost, so only n-1 items remain. If the lost item does not contribute to the bag, nothing changes. If the item is in the bag of size K, we need a different solution. Instead of going through all the steps above again, we add a few new steps to the existing algorithm, which makes the new result much easier to obtain.
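The reuse idea can be checked with a small idealized simulation. This Python sketch is hypothetical: it assumes error-free operations, a tube large enough that every subset of items is represented, and, beyond the paper's encoding, a per-item tag on each strand so that strands containing the lost item Y can be separated. It builds the final tube once. Each lost item is then handled by dividing on Y, subtracting Y's size and recombining, followed by a select for K, and the tube is never rebuilt.

```python
from typing import FrozenSet, List, Set, Tuple

# One strand species: (integer the strand encodes, indices of items added).
# Assumption: each strand carries a hypothetical tag per added item, so the
# criterion "contains item Y" can be tested.
Strand = Tuple[int, FrozenSet[int]]


def build_tube(sizes: List[int]) -> Set[Strand]:
    """Idealized steps a-1..a-5: the tube holds one species per subset of items."""
    tube: Set[Strand] = {(0, frozenset())}
    for i, k_i in enumerate(sizes):
        tube = tube | {(v + k_i, items | {i}) for v, items in tube}
    return tube


def remove_item(tube: Set[Strand], y: int, size_y: int) -> Set[Strand]:
    """Reuse a finished tube after item y is lost from the initial list."""
    s1 = {s for s in tube if y in s[1]}                  # A-1 divide: strands containing y
    s2 = tube - s1
    s1 = {(v - size_y, items - {y}) for v, items in s1}  # A-2 sub: take y's size back out
    return s1 | s2                                       # A-3 union


def has_sum(tube: Set[Strand], K: int) -> bool:
    """A-4 select: is there a strand encoding exactly K?"""
    return any(v == K for v, _ in tube)


if __name__ == "__main__":
    sizes = [3, 5, 8, 13]
    tube = build_tube(sizes)                 # computed once for the original instance
    print(has_sum(tube, 16))                 # True  (3 + 13 or 3 + 5 + 8)
    tube = remove_item(tube, 3, sizes[3])    # item 13 is lost
    print(has_sum(tube, 16))                 # True  (3 + 5 + 8)
    tube = remove_item(tube, 2, sizes[2])    # item 8 is lost as well
    print(has_sum(tube, 16))                 # False (only 3 and 5 remain)
```

After the removals, the reused tube encodes exactly the same sums as a tube rebuilt from scratch for the remaining items. With r items removed, the update costs 3r+1 operations, compared with 3(n-r)+2 for recomputing everything.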
The following extra steps are added:
A-1 divide(S, S1, S2, C): Separate the set S into two sets, where S1 contains the strands that include item Y and S2 contains the strands without Y. Y is the item that should no longer be counted.
A-2 sub(y, S1): Subtract the size y of item Y from every strand in S1.
A-3 union(S1, S2, S): Combine the two sets. S now has no strand that counts item Y, so Y has been removed.
A-4 select(S, C): This is exactly the same procedure as step a-6 in section 3.1; the criterion C is again whether there is a strand representing the number K. If at least one such strand exists, we have a solution. Otherwise, no subset of the remaining n-1 items sums to K.
If more than one item has been removed from the initial list, steps A-1 to A-3 are repeated once for each removed item before step A-4 is performed.
For the complete knapsack problem, similar procedures can be used after the initial condition changes. If the same modification as in the simplified example above is made, i.e., one item is removed from the initial list, the following operations are needed to obtain the solution.
B-1 union(S1, S2, S): Put the remaining potential answers, which step b-6 separated into S1 and S2, back together.
B-2 divide(S, S1, S2, C): Separate the set S according to the criterion C, so that S1 contains the strands with item i, which should be removed from the bag, and S2 contains the strands without item i.
B-3 sub(k_i, S1) and sub(v_i, S1): Subtract item i from every strand in S1, i.e., subtract its size k_i from X and its corresponding value v_i from U.
Then union(S1, S2, S) recombines the two sets, and we only need to perform steps b-6 and b-7 above to check whether the expected answer exists.

Conclusion
In this paper, we addressed a class of problems to which DNA computers can be applied. As an example, we demonstrated that the simplified knapsack problem can be solved efficiently on our DNA computer, and we extended the algorithm to solve the complete version of the knapsack problem. These examples illustrate the advantages of DNA computers.
We note that knapsack problems are NP-complete, in both the simplified and the complete version. Our method solves both versions, despite their different complexities, with a number of operations polynomial in the number of items. The biggest advantage of our method over other existing methods for these NP-complete problems is that it not only produces the correct answer but also saves considerable computing time and resources when the given conditions change slightly. We may also extend our algorithm to a problem currently under investigation, the graph connectivity problem, and consider cases in which the conditions change substantially. Future work includes implementing our algorithms in a biological laboratory and making them more robust.