Effectively using the large number of processors (1,000 to 100,000 and more) available on today's (c. 2008) massively parallel processors (MPPs) often requires careful structuring of the flow of information and computation. Writing output from thousands of processors also raises problems of I/O efficiency and of producing data in a usable format. We will discuss which MPI constructs limit scalability and present techniques for writing scalable codes, including strategies and MPI constructs for overlapping interprocessor communication with computation. We will also cover weak and strong scaling, as well as efficient strategies for parallel I/O.