The definition of cost of solving a problem on a parallel system is the product of parallel runtime and the number of processing elements used. Mathematically, the cost C(n, p) of a parallel computation is defined as;
C(n, p) = p × T (n, p )
This Cost reflects the sum of the time that each processing element spends solving the problem. Efficiency can also be expressed as the ratio of the execution time of the fastest known sequential algorithm for solving a problem to the cost of solving the same problem on p processing elements.
Note: The cost of solving a problem on a single processing element (single processor) is the execution time of the fastest known sequential algorithm.
Example 2.1: Calculate the cost of adding n numbers on n processors.
Consider the problem of adding n numbers by using n processing elements or n number of processors. Initially, each processing element is assigned one of the numbers to be added and, at the end of the computation; one of the processing elements stores the sum of all the numbers.
Assuming that n is a power of two, then

We can perform this operation in ‘log n ‘ steps by propagating partial sums up a logical binary tree of processing elements.
Below figure illustrates the procedure for n = 16. The processing elements are labeled from 0 to 15. Similarly, the 16 numbers to be added are labeled from 0 to 15. The sum of the numbers with consecutive labels from i to j is denoted by .

Each step shown in above figure consists of one addition and the communication of a single word. The addition can be performed in some constant time, say tc, and the communication of a single word can be performed in time ts + tw (where ts is the sequential time and tw is the time taken to communicate of a single word). Therefore, the addition and communication operations take a constant amount of time. Thus,
T (n, p) = Tp = Θ (log n) —— eq. (1)
∵ The parallel computation time is T(n, p) = Θ (log n) from equation (1), so the processor-time product will yield the corresponding cost as
C(n, p) = Θ (n logn)
Hence the algorithm for adding n numbers on n processing elements has worst case time is a processor-time product of O(n log n).
Also Read: Parallel Algorithm Design Approach