For many computational tasks, effectively using the large numbers of processors available on today’s MPPs (from 1,000 to 500,000 and more) requires carefully structuring the flow of information and computation. Output from thousands of processors also raises issues of I/O efficiency and of the usability of the resulting data. We will discuss which MPI constructs kill scalability and present techniques for producing scalable codes, including strategies and MPI constructs for overlapping interprocessor communication with computation. We will also cover weak and strong scaling, as well as efficient strategies for parallel I/O.