As we know that the cost of solving a problem on a single processor is the execution time of the fastest known sequential algorithm. A parallel system or a parallel algorithm is said to be cost-optimal when its cost matches the run time of the best known sequential algorithm T_{s} for the same problem. Or in other words a parallel algorithm is cost optimal if the cost of solving a problem on a parallel computer has the same asymptotic growth (in Θ terms) as a function of the input size as the fastest-known sequential algorithm on a single processing element.

The run time of the best known sequential algorithm | ≥ | The run time of the parallel algorithm T_{s} (Cost on parallel algorithm) |

A cost-optimal parallel system has an efficiency of O(1) because the efficiency is the ratio of sequential cost to parallel cost.

## Prove that a cost optimal parallel algorithm exists?

If total number of operations performed by the parallel algorithm is of the same complexity class as an optimal sequential algorithm, then a cost optimal parallel algorithm does exist.

For example; consider the example of adding n numbers. Its parallel algorithm performs n/2 additions in first step, n/4 addition in second step, n/8 addition in third step and soon. Thus executes total of (n – 1) additions over ⌈logn ⌉ iterations. Since both the sequential and parallel algorithm for the problem under consideration perform (n – 1) additions, a cost-optimal variant of parallel algorithm exists.

### Example 1: Is cost of adding n numbers on n processors is cost optimal?

By the reference of example 2.1 for adding n number on n processing elements has parallel computation time is T(n, p) = O (log n) [in worst case], so the processor-time product will yield the corresponding cost as

C(n, p) = O (n log n) [in worst case].

Since the serial runtime or sequential runtime to operation of adding n numbers is O(n).

By using the condition of cost optimality;

The run time of the best known sequential algorithm | ≥ | The run time of the parallel algorithm T_{s} (Cost on parallel algorithm) |

i.e. O(n) ≥ O(n log n)

The above condition is contradictory when we putting the value of n ≥ 11.

Hence, O(n) ≱ O(n log n)

Thus we can say that the cost of adding n numbers on n processors is not cost optimal.

### Example 2: Is adding n numbers on p processing elements is cost optimal? When n > p.

**Proof:** Consider the problem of adding n numbers on p processing elements such that p < n and both n and p are powers of 2. We use the same parallel algorithm as use in Example 2.1 for n processors except in this case simulate n processing elements on p processing elements think them as virtual processors. Each of the p processors is now assigned (n/p) virtual processors. The first (log p) of the (log n) steps of the original algorithm are simulated in [(n/p) log p] steps on p processing elements. Subsequent (log n – log p) steps do not require any communication.

The overall parallel execution time now is Θ ((n/p) log p).

To build it more cost optimal fashion, we have to do the following;

- Each processing element locally adds its n / p numbers in time Θ (n / p).
- The p partial sums on p processing elements can be added in time Θ(n /p).

So the parallel runtime of this algorithm is

T_{p} = Θ ((n/p) + log p).

And the cost = Θ (p × [(n/p) + log p])

[∵Cost is the product of parallel runtime and the number of processing elements used]

= Θ (n + p log p)

As long as n ≈ Θ (p log p),

Hence the cost of this parallel algorithm = Θ (n + n) = Θ (2n) = Θ (n).

To check the cost optimality of this parallel algorithm;

The run time of the best known sequential algorithm | ≥ | The run time of the parallel algorithm T_{s} (Cost on parallel algorithm) |

** ** ∴ Θ (n) ≥ Θ (n)

The above condition is true. Thus we can say that the cost of adding n numbers on p processing elements is cost optimal.

Let us consider an example n = 16 and p = 4.

The steps leading to the solution are shown below figure;

These simple examples demonstrate that the manner in which the computation is mapped onto processing elements may determine whether a parallel system is cost-optimal. Note, however, that we cannot make all non-cost-optimal systems cost-optimal by scaling down the number of processing elements.