0 votes

Let x1, . . . , xn be an input array of n numbers sorted in non-descending order. The problem of 1-D k-means clustering is defined as assigning elements of the input 1-D array into k clusters so that the sum of squares of within-cluster distances from each element to its corresponding cluster mean is minimized. 

in Dynamic Programming by (116 points)

4 Answers

0 votes

Hello Duan! 

         I think this question is just like the last question of the midterm exam.

         Here is my answer:  Assume that we use k servers to divide the array into k clusters. Each server serve as the mean position of the clusters. 

Notation:  sum(k) { server[0], server[1], server[2]... server[k-1]}, server[i] represent the location of server i , and it locates in the change point such that the sum of the left part of cluster[i] is bigger than the right part of that cluster. But if you move the server toward left for one step, then the sum of the left part of that cluster is less than the right part of that cluster. 

Optimality :  still thinking how to prove it.          

Recurrence relation:  sum(k-1) { ...}, we add server k-1 to the position 0 of sum(k-1) {...}. Then according to the location of  server k-2 in sum(k-1){...}, move server k-1. Then According to the location of server k-1, move k-2, then move k-3... finally move 0. And we return to server k-1, go through the same loop until no move happens. Finally we get sum(k) {...}.

 How to move?

if the left part of cluster i is less than the right part of cluster i, move the server i for one step toward right.

Algorithm:

For sum(i) from 0 to k-1:

       put the server i in location 0;

       loop : 

              move the server i according to the location of server i-1 until it can not move;

              move the server i-1 according to the location of server i until it can not move;

             move the server i-2 according to the location of server i-1 until it can not move;

             ...

             move the server 0 according to the location of server 1 until it can not move;

             if no move happens : break;

             else                          :return to the beginning of the loop.

 I guess the time complexity is O[N*K]

by (164 points)
0 votes
Optimality :  If there is another sum(k) {...} B, which minimized the result that we want.

Condition 1:  if the B don't qualify the requirement in Notation. Then there must be at least one server, whose location is not the change point. We can move the server to the change point, such that decrease the sum of that cluster. So the B can not be the optimal solution.

Condition 2:  if the B qualify the requirement in Notation, but it is different from A. We know that if the location of server 0 is defined, then every server's location is defined. So the location of server 0 in B is different from that of A. According to our policy, A must be the first optimal solution.  So the server 0 in B must locate in the right of the server 0 in A.

    Because server 0 in B also qualify the requirement in our Notation, we can move server 0 in B toward left. Such that the result of B will be increased. We keep moving server 0 in B until it arrive at the position of server 0 in A. Then we change the B to A, and we also keep increasing the result of B. According to our policy, this condition couldn't happen because the server 0 in A would move until it can not move. So this condition contradict our policy. So B doesn't exist.
by (164 points)
Nice try. I think you are also going towards the right direction. We can build a bigger solution based on the solution to a smaller question. But if we adjust the position of server(i) i = 0 to i-1every time we calculating server(i+1), the time complexity would be much larger than n*k (maybe n**k).
0 votes

I will prove the time complexity is O (KN)

Consider this algorithm:

For sum(i) from 0 to k-1:

       put the server i in location 0;

       loop : 

              move the server i according to the location of server i-1 until it can not move;

              move the server i-1 according to the location of server i until it can not move;

             move the server i-2 according to the location of server i-1 until it can not move;

             ...

             move the server 0 according to the location of server 1 until it can not move;

             if no move happens : break;

             else                          :return to the beginning of the loop.

 We need to calculate the number of compare. For each server, if the result of judge(compare) is true, then the server will move one step, if the result is false, it will stop and go on to next server. So we need to calculate the numbers of "true" and "false". 

For server 0, it will move for each "for" loop and for each "inner" loop, or the move will not happen. 

So we assume for "for" loop, first server 0 move m1 step, second it move m2 step, finally it move mk step. For each "for" loop, there must be a "false" result of compare. So, for server 0, the total number of compare is " m1+1+m2+1+...+mk+1 = (m1+m2+...mk)+k " < N+K

Such that for the first "for" loop, there is at most m1+1 inner loops. That means for the server i, there are at most m1+1 compares whose results are "false".  For the same reason, for the final "for" loop, we could get maximized number  "mk+1".  For the server i, we could get the total number of compares whose results are "false" is less than " m1+1+m2+1+...+mk+1" <N+K

Because for server i, the total number of compares whose results are "true" are less than N. (it is impossible for server i to move more than N steps). 

Such that for server i, the total number of compares are less than N+K+N = 2N+K.

because we have k servers, so the total number of compares for all servers are less than (2N+K)*K= 2NK + K^2.

So the time complexity is O(NK).

by (164 points)
0 votes
Sorry, I just found that I can not prove the optimility of this solution. Maybe It is not an optimal  solution.
by (164 points)
The Book: Analysis and Design of Algorithms | Presentations on Slideshare | Lecture Notes, etc
...