Mathematics

A Course in Real Analysis provides a rigorous treatment of the foundations of differential and integral calculus at the advanced undergraduate level.

The first part of the text presents the calculus of functions of one variable. This part covers traditional topics, such as sequences, continuity, differentiability, Riemann integrability, numerical series, and the convergence of sequences and series of functions. It also includes optional sections on Stirling’s formula, functions of bounded variation, Riemann–Stieltjes integration, and other topics.

The second part focuses on functions of several variables. It introduces the topological ideas needed (such as compact and connected sets) to describe analytical properties of multivariable functions. This part also discusses differentiability and integrability of multivariable functions and develops the theory of differential forms on surfaces in $\mathbb{R}^n$.

The third part consists of appendices on set theory and linear algebra as well as solutions to some of the exercises.

Features
• Provides a detailed axiomatic account of the real number system
• Develops the Lebesgue integral on $\mathbb{R}^n$ from the beginning
• Gives an in-depth description of the algebra and calculus of differential forms on surfaces in $\mathbb{R}^n$
• Offers an easy transition to the more advanced setting of differentiable manifolds by covering proofs of Stokes’s theorem and the divergence theorem at the concrete level of compact surfaces in $\mathbb{R}^n$
• Summarizes relevant results from elementary set theory and linear algebra
• Contains over 90 figures that illustrate the essential ideas behind a concept or proof
• Includes more than 1,600 exercises throughout the text, with selected solutions in an appendix

With clear proofs, detailed examples, and numerous exercises, this book gives a thorough treatment of the subject. It progresses from single variable to multivariable functions, providing a logical development of material that will prepare readers for more advanced analysis-based studies.

A COURSE IN REAL ANALYSIS

HUGO D. JUNGHENN
The George Washington University
Washington, D.C., USA

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20150109
International Standard Book Number-13: 978-1-4822-1928-9 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

TO THE MEMORY OF MY PARENTS Rita and Hugo

Contents

Preface
List of Figures
List of Tables
List of Symbols

I Functions of One Variable

1 The Real Number System
    1.1 From Natural Numbers to Real Numbers
    1.2 Algebraic Properties of $\mathbb{R}$
    1.3 Order Structure of $\mathbb{R}$
    1.4 Completeness Property of $\mathbb{R}$
    1.5 Mathematical Induction
    1.6 Euclidean Space

2 Numerical Sequences
    2.1 Limits of Sequences
    2.2 Monotone Sequences
    2.3 Subsequences and Cauchy Sequences
    2.4 Limits Inferior and Superior

3 Limits and Continuity on $\mathbb{R}$
    3.1 Limit of a Function
    *3.2 Limits Inferior and Superior
    3.3 Continuous Functions
    3.4 Properties of Continuous Functions
    3.5 Uniform Continuity

4 Differentiation on $\mathbb{R}$
    4.1 Definition of Derivative and Examples
    4.2 The Mean Value Theorem
    *4.3 Convex Functions
    4.4 Inverse Functions
    4.5 L’Hospital’s Rule
    4.6 Taylor’s Theorem on $\mathbb{R}$
    *4.7 Newton’s Method

5 Riemann Integration on $\mathbb{R}$
    5.1 The Riemann–Darboux Integral
    5.2 Properties of the Integral
    5.3 Evaluation of the Integral
    *5.4 Stirling’s Formula
    5.5 Integral Mean Value Theorems
    *5.6 Estimation of the Integral
    5.7 Improper Integrals
    5.8 A Deeper Look at Riemann Integrability
    *5.9 Functions of Bounded Variation
    *5.10 The Riemann–Stieltjes Integral

6 Numerical Infinite Series
    6.1 Definition and Examples
    6.2 Series with Nonnegative Terms
    6.3 More Refined Convergence Tests
    6.4 Absolute and Conditional Convergence
    *6.5 Double Sequences and Series

7 Sequences and Series of Functions
    7.1 Convergence of Sequences of Functions
    7.2 Properties of the Limit Function
    7.3 Convergence of Series of Functions
    7.4 Power Series

II Functions of Several Variables

8 Metric Spaces
    8.1 Definitions and Examples
    8.2 Open and Closed Sets
    8.3 Closure, Interior, and Boundary
    8.4 Limits and Continuity
    8.5 Compact Sets
    *8.6 The Arzelà–Ascoli Theorem
    8.7 Connected Sets
    8.8 The Stone–Weierstrass Theorem
    *8.9 Baire’s Theorem

9 Differentiation on $\mathbb{R}^n$
    9.1 Definition of the Derivative
    9.2 Properties of the Differential
    9.3 Further Properties of the Differential
    9.4 Inverse Function Theorem
    9.5 Implicit Function Theorem
    9.6 Higher Order Partial Derivatives
    9.7 Higher Order Differentials and Taylor’s Theorem
    *9.8 Optimization

10 Lebesgue Measure on $\mathbb{R}^n$
    10.1 General Measure Theory
    10.2 Lebesgue Outer Measure
    10.3 Lebesgue Measure
    10.4 Borel Sets
    10.5 Measurable Functions

11 Lebesgue Integration on $\mathbb{R}^n$
    11.1 Riemann Integration on $\mathbb{R}^n$
    11.2 The Lebesgue Integral
    11.3 Convergence Theorems
    11.4 Connections with Riemann Integration
    11.5 Iterated Integrals
    11.6 Change of Variables

12 Curves and Surfaces in $\mathbb{R}^n$
    12.1 Parameterized Curves
    12.2 Integration on Curves
    12.3 Parameterized Surfaces
    12.4 m-Dimensional Surfaces

13 Integration on Surfaces
    13.1 Differential Forms
    13.2 Integrals on Parameterized Surfaces
    13.3 Partitions of Unity
    13.4 Integration on Compact m-Surfaces
    13.5 The Fundamental Theorems of Calculus
    *13.6 Closed Forms in $\mathbb{R}^n$

III Appendices

A Set Theory
B Linear Algebra
C Solutions to Selected Problems

Bibliography
Index

Preface

The purpose of this text is to provide a rigorous treatment of the foundations of differential and integral calculus at the advanced undergraduate level. It is assumed that the reader has had the traditional three-semester calculus sequence and some exposure to elementary set theory and linear algebra. As regards the last two subjects, appendices provide a summary of most of the results used in the text. Linear algebra will not be needed until Part II.

The book consists of three parts. Part I treats the calculus of functions of one variable. Here, one can find the traditional topics: sequences, continuity, differentiability, Riemann integrability, numerical series, and convergence of sequences and series of functions. Optional sections on Stirling’s formula, Riemann–Stieltjes integration, and other topics are also included. As the ideas inherent in these subjects ultimately rest on properties of real numbers, the book begins with a careful treatment of the real number system. For this we take an axiomatic rather than a constructive approach, guided as much by the need for efficiency of exposition as by pedagogical preference. Of course, presenting the real number system in this way begs the excellent question as to whether such a system exists. It is a question we do not answer, but the interested reader may wish to consult a text on the construction of the real number system from the natural numbers, or even on the philosophy of mathematics.

Part II treats functions of several variables. Many of the results in Part I, such as the chain rule, the inverse function theorem, and the change of variables theorem, have counterparts in Part II. The reader’s exposure to the one-variable results should make the multivariable versions more meaningful and accessible. As might be expected, however, some results in Part II have no counterparts in Part I, the implicit function theorem and the iterated integral (Fubini–Tonelli) theorem being obvious examples.

Part II begins with a chapter on metric spaces. Here we introduce the topological ideas needed to describe some of the analytical properties of multivariable functions. Primary among these are the notions of compact set and connected set, which, for example, allow the extension to higher dimensions of the extreme value and intermediate value theorems. The remainder of Part II covers differentiability and integrability of multivariable functions. As regards integrability, we have chosen to develop from the beginning the Lebesgue integral rather than to extend the Riemann integral to higher dimensions. The additional time required for this approach is, in my view, more than offset by the enormous added utility of the Lebesgue integral. The last chapter of Part II develops the theory of differential forms on surfaces in $\mathbb{R}^n$. The chapter culminates with proofs of Stokes’s theorem and the divergence theorem for compact surfaces. It is hoped that exposure to these topics at the concrete level of surfaces in $\mathbb{R}^n$ will ease the transition to more advanced courses such as calculus on differentiable manifolds.

Part III consists of the aforementioned appendices on set theory and linear algebra, as well as solutions to some of the over 1600 exercises found in the text. For convenience, exercises with solutions that appear in the appendix are marked with a superscript S. Exercises that will find important uses later are marked with a downward arrow ⇓. Instructors with suitable bona fides may obtain from the publisher a manual of complete solutions to all of the exercises.

The book is an outgrowth of notes developed over many years of teaching real analysis to undergraduates at George Washington University. The more recent versions of these notes have been specifically tested in classes over the last three years. During this period, the typical two-semester course closely followed the non-starred sections of this text: Chapters 1–7 for the first semester and 8–13 for the second. Given the wealth of material, it was necessary to leave some proofs for students to read on their own, a not wholly unfortunate compromise. Material in some starred sections was assigned as optional reading.

I would like to express my gratitude to the many students whose critical eyes caught errors before they made their way into these pages. Of course, any remaining errors are my complete responsibility. Special thanks are due to Zehua Zhang, whose enlightened comments have improved the exposition of several topics. Finally, to my wife Mary for her support and understanding during the writing of this book: thank you!

Hugo D. Junghenn
Washington, D.C.
September 2014

List of Figures

1.1 Supremum and infimum of A
1.2 Greatest integer function

2.1 Convergence of a sequence
2.2 Squeeze principle
2.3 Interval halving process
2.4 Limits supremum and infimum

3.1 Limit of a function
3.2 L can’t be greater than M
3.3 One-to-one correspondence between D and $\mathbb{Q}$
3.4 Intermediate value property

4.1 Trigonometric inequality
4.2 Local extrema
4.3 Mean value theorems
4.4 Convex function
4.5 Convex function inequalities
4.6 Intermediate value property implies monotonicity
4.7 Intermediate value property implies continuity
4.8 Newton’s method

5.1 Upper and lower sums
5.2 The partitions $\mathcal{P}$ and $\mathcal{Q}$
5.3 The partition $\mathcal{P}_n$
5.4 The partitions $\mathcal{P}'$, $\mathcal{P}$, and $\mathcal{P}''$
5.5 Riemann sum
5.6 The partitions $\mathcal{P}^x$ and $\mathcal{P}^y$
5.7 Trapezoidal rule approximation
5.8 Midpoint rule approximation
5.9 Simpson’s rule approximation
5.10 The partition $\mathcal{Q}$

7.1 Uniform convergence
7.2 Pointwise convergence insufficient

8.1 An open ball is open
8.2 The functions $g_n$ and $g$
8.3 Convex and non-convex sets
8.4 The neighborhoods $U_x$ and $V_x$
8.5 A $2\varepsilon$ net
8.6 A bounded set in $\mathbb{R}^n$ is totally bounded
8.7 A separation $(U, V)$ of $E$
8.8 $C_1(-1, 0) \cup C_1(1, 0)$ is path connected
8.9 $E$ is path connected
8.10 A piecewise linear function
8.11 Sawtooth function

9.1 The domain of $\arg_{\theta_0}$
9.2 Saddle point

10.1 Interval grid
10.2 Coverings
10.3 Middle thirds
10.4 Ternary expansion algorithm
10.5 Decomposition into half-open intervals
10.6 $K = \mathrm{cl}(E) \setminus U$
10.7 The components of $f_k$
10.8 The components of $f_{k+1}$

11.1 Partition of an n-dimensional interval
11.2 Three-dimensional simplex
11.3 Concentric cube and ball
11.4 The paving $Q_r$
11.5 Theorem of Pappus

12.1 Curves in $\mathbb{R}^2$
12.2 A piecewise smooth curve with tangent vectors
12.3 Inscribed polygonal line
12.4 Vector field on $E$
12.5 Closed curve $\varphi$
12.6 Concatenation of curves
12.7 Tangent spaces at $p$
12.8 Affine space
12.9 The inward unit normal
12.10 Normal vector to $S$ at $p$
12.11 Surface of revolution
12.12 Möbius strip
12.13 The mapping $G_a^{-1}$
12.14 Transition mappings
12.15 Stereographic projection
12.16 The mapping $d\psi_x$
12.17 Cylinder-with-boundary
12.18 Surface element
12.19 Induced orientation of $T_a \partial S$
12.20 Stereographic projection from $p$

13.1 Parallelogram approximation to $\varphi(Q)$
13.2 Two dimensional simplex
13.3 A partition of unity subordinate to $U_1$ and $U_2$
13.4 The functions $h$ and $g$
13.5 The cubes $W_i$ and $V_i$
13.6 Regular region $E$
13.7 Annulus in $\mathbb{R}^2$ with exterior normal
13.8 The case $a \in E$
13.9 The case $a \in \mathrm{bd}(E)$
13.10 Regular region in $\mathbb{R}^2$
13.11 Piecewise smooth surfaces
13.12 Oriented cube without bottom face
13.13 Closed polygon
13.14 Surfaces $S_1$ and $S_2$ with common boundary $C$
13.15 Curves contracting to $p$ must pass through $q$
13.16 Boundary parametrization
13.17 Star-shaped and non-star-shaped regions

C.1 Open balls for Exercise 1

List of Tables

4.1 Newton’s method for $e^x + x - 2 = 0$

5.1 Table for evaluating $\int f h$ by parts
5.2 Table for evaluating $\int (x + 1)^3 e^{5x}\,dx$ by parts
5.3 A comparison of the methods

9.1 Values of $\Delta$
9.2 Values of $\Delta$

List of Symbols

$\mathbb{R}$    real number system
$\sum$    summation symbol
$\prod$    product symbol
$\mathbb{N}$    set of natural numbers
$\mathbb{Z}$    set of integers
$\mathbb{Q}$    set of rational numbers
$\mathbb{I}$    set of irrational numbers
$n!$    n factorial
$a < b$    less than
$b > a$    greater than
$a \le b$    less than or equal
$b \ge a$    greater than or equal
$|x|$    absolute value of x
$\max S$    maximum of S
$\min S$    minimum of S
$x^+$    positive part of x
$x^-$    negative part of x
$\sup A$    supremum of A
$\inf A$    infimum of A
$\lfloor x \rfloor$    greatest integer in x
$\overline{\mathbb{R}}$    extended real number system
$+\infty, -\infty$    positive infinity, negative infinity
$(a, b)$    open interval
$(a, b]$    left-open interval
$[a, b)$    right-open interval
$[a, b]$    closed interval
$\binom{n}{k}$    binomial coefficient
$\mathbb{R}^n$    Euclidean space
$x \cdot y$    Euclidean inner product
$\|x\|_2$    Euclidean norm
$\|x\|_1$    $\ell^1$ norm
$\|x\|_\infty$    max norm
$a \times b$    cross product
$\lim_n a_n$    limit of a sequence
$a_n \uparrow$    increasing sequence of real numbers
$a_n \downarrow$    decreasing sequence of real numbers
$a_n \uparrow a$    sequence increases to a
$a_n \downarrow a$    sequence decreases to a
$\liminf_n a_n$    limit infimum of a sequence
$\limsup_n a_n$    limit supremum of a sequence
$\mathcal{N}(a) = \mathcal{N}_r(a)$    neighborhood of a
$\lim_{x \to a,\, x \in E} f(x)$    limit of f along E
$\lim_{x \to a^-} f(x)$    left-hand limit
$\lim_{x \to a^+} f(x)$    right-hand limit
$\lim_{x \to a} f(x)$    two-sided limit
$\lim_{x \to +\infty} f(x)$    limit at $+\infty$
$\lim_{x \to -\infty} f(x)$    limit at $-\infty$
$\liminf_{x \to a,\, x \in E} f(x)$    limit inferior of f along E
$\limsup_{x \to a,\, x \in E} f(x)$    limit superior of f along E
$f' = Df = \frac{df}{dx}$    derivative of f
$D_\ell f(a) = f'_\ell(a)$    left-hand derivative at a
$D_r f(a) = f'_r(a)$    right-hand derivative at a
$f^{(n)}$    nth derivative of f
$T_n(x, a)$    Taylor polynomial
$R_n(x, a)$    Taylor remainder
$\|\mathcal{P}\|$    mesh of partition $\mathcal{P}$
$\underline{S}(f, \mathcal{P})$    lower Darboux sum
$\overline{S}(f, \mathcal{P})$    upper Darboux sum
$\underline{\int}_a^b f$    lower Darboux integral
$\overline{\int}_a^b f$    upper Darboux integral
$\int_a^b f$    Riemann–Darboux integral
$\mathcal{R}_a^b$    set of Riemann integrable functions on $[a, b]$
$S(f, \mathcal{P}, \xi)$    Riemann sum
$\int f$    indefinite integral of f
$V_a^b(f)$    total variation of f on $[a, b]$
$S_w(f, \mathcal{P}, \xi)$    Riemann–Stieltjes sum
$\int_a^b f\,dw$    Riemann–Stieltjes integral
$\overline{S}_w(f, \mathcal{P})$    upper Darboux–Stieltjes sum
$\underline{S}_w(f, \mathcal{P})$    lower Darboux–Stieltjes sum
$\overline{\int}_a^b f\,dw$    upper Darboux–Stieltjes integral
$\underline{\int}_a^b f\,dw$    lower Darboux–Stieltjes integral
$\sum a_n = \sum_{n=1}^\infty a_n$    infinite series of real numbers
$\lim_m \lim_n a_{m,n}$    iterated limit
$\lim_{m,n} a_{m,n}$    double limit
$\sum_{m,n} a_{m,n}$    double infinite series
$\sum_{j=1}^\infty \sum_{k=1}^\infty a_{j,k}$    iterated series
$\sum_n f_n = \sum_{n=1}^\infty f_n$    infinite series of functions
$\sum_{n=0}^\infty c_n (x - a)^n$    power series in x about a
$R = \rho^{-1}$    radius of convergence
$\binom{a}{n}$    generalized binomial coefficient
$(X, d)$    metric space
$\|x\|$    norm of x
$(\mathcal{X}, \|\cdot\|)$    normed vector space
$d_2$    Euclidean metric on $\mathbb{R}^n$
$d_1$    $\ell^1$ metric on $\mathbb{R}^n$
$d_\infty$    max metric on $\mathbb{R}^n$
$B(S)$    space of bounded $f : S \to \mathbb{R}$
$\|f\|_\infty$    supremum norm of f
$\ell^\infty$    set of bounded sequences
$\ell^1$    set of summable sequences
$\|a\|_1$    $\ell^1$ norm of a
$d \times \rho$    product metric
$B_r(x)$    open ball
$C_r(x)$    closed ball
$S_r(x)$    sphere
$C([a, b])$    space of continuous f on $[a, b]$
$D([a, b])$    space of differentiable f on $[a, b]$
$[a : b]$    line segment from a to b
$\mathrm{cl}(E)$    closure of E
$\mathrm{int}(E)$    interior of E
$\mathrm{bd}(E)$    boundary of E
$\lim_{x \to a,\, x \in E} f(x)$    limit of f along E
$\lim_{(x,y) \to (a,b)} f(x, y)$    double limit
$\lim_{x \to a} \lim_{y \to b} f(x, y)$    iterated limit
$d(A)$    diameter of A
$d(A, B)$    distance between A and B
$C(X, Y)$    set of continuous $f : X \to Y$
$\mathrm{ext}(E)$    exterior of E
$C(X)$    space of continuous $f : X \to \mathbb{R}$
$\partial_j f = f_{x_j} = \frac{\partial f}{\partial x_j}$    partial derivative of f
$\nabla f$ or $\mathrm{grad}\, f$    gradient of f
$df_a : \mathbb{R}^n \to \mathbb{R}^m$    differential of f at a
$f'(a)$    Jacobian matrix of f at a
$J_f(a) = \frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}(a)$    Jacobian of f
$\frac{\partial^m f}{\partial x_{i_1}^{m_1} \cdots \partial x_{i_n}^{m_n}}$    higher order partial derivative
$D^m f_x$    mth total differential of f
$\binom{m}{m_1, m_2, \ldots, m_n}$    multinomial coefficient
$T_m(x, a)$    Taylor polynomial
$R_m(x, a)$    Taylor remainder term
$\lambda^*$    Lebesgue outer measure
$\mathcal{M} = \mathcal{M}(\mathbb{R}^n)$    Lebesgue measurable sets
$\lambda = \lambda^n$    Lebesgue measure on $\mathbb{R}^n$
$\mathcal{B} = \mathcal{B}(\mathbb{R}^n)$    Borel measurable sets
$1_A$    indicator function of A
$\mathcal{S}^+(\mathcal{F})$    set of $\mathcal{F}$-measurable simple functions $\ge 0$
$\int f\,d\lambda$    Lebesgue integral of f
$\int_E f\,d\lambda$    Lebesgue integral of f on E
$L^1(E)$    space of integrable functions on E
$\int_{\mathbb{R}^p} \int_{\mathbb{R}^q} f(x, z)\,dz\,dx$    iterated integral
$f * g$    convolution of f and g
$\alpha_n$    volume of unit ball in $\mathbb{R}^n$
$\mathrm{length}(\varphi)$    length of curve $\varphi$
$\int_\varphi f\,ds$    line integral over $\varphi$
$\vec{F}$    vector field
$\vec{T}_\varphi$    unit tangent vector field along $\varphi$
$f_1\,dx_1 + \cdots + f_n\,dx_n$    1-form in $\mathbb{R}^n$
$\omega \cdot \vec{H}$    inner product of a form and vector field
$\varphi$    parameterized m-surface
$T_{\varphi(u)}$    tangent space of $\varphi$
$\mathrm{sign}(\varphi)$    sign of parametrization $\varphi$
$\partial\varphi^\perp$    normal vector to surface $\varphi$
$\vec{N}_\varphi$    normal vector field
$\varphi_a : U_a \to S_a$    local parametrization of S
$\varphi_{ab}$    transition mapping
$S_a$    surface element
$\mathbb{R}^{n-1}_+$    $\mathbb{R}^{n-1}$ with $x_{n-1} > 0$
$\mathbb{H}^{n-1}$    $\mathbb{R}^{n-1}$ with $x_{n-1} \ge 0$
$\partial\mathbb{H}^{n-1}$    boundary of $\mathbb{H}^{n-1}$
$\partial S$    boundary of S
$S \setminus \partial S$    interior of S
$dx_j$    multidifferential
$\omega_x$    differential form
$\omega \wedge \eta$    wedge product
$d\omega$    differential of $\omega$
$\varphi^* \omega$    pullback of $\omega$ by $\varphi$
$\mathrm{area}(\varphi)$    area of $\varphi$
$\int_\varphi f\,dS$    integral of f on a parameterized surface
$\int_\varphi \omega$    integral of a form on a parameterized surface
$L(U, V)$    set of linear transformations $T : U \to V$
$A^t$    transpose of A
$[T]$    matrix of T
$\det A$    determinant of A

Part I

Functions of One Variable

Chapter 1 The Real Number System

If the notion of limit is the cornerstone of analysis, then the real number system is the bedrock. In this chapter we provide a description of the real number system that is sufficiently detailed to allow a careful development of limit in the various forms that appear in this book. The real number system is defined as a nonempty set R together with two algebraic operations, called addition and multiplication, and an ordering, called less than, that collectively satisfy three sets of axioms: the algebraic or field axioms, the order axioms, and the completeness axiom. These are discussed in Sections 1.2–1.4. We begin, however, with a brief description of how the real number system may be constructed from a more fundamental number system.

1.1 From Natural Numbers to Real Numbers

A rigorous construction of the real number system starts with the set of natural numbers (positive integers) N and then proceeds to the set of integers Z, the rational number system Q, and, finally, the real number system R. In this approach the natural numbers are assumed to satisfy a set of axioms called the Peano Axioms. These are used to define the operations of addition and multiplication in N. Subtraction is introduced by enlarging the system of natural numbers to Z, thereby allowing solutions of all equations of the form x + m = n, m, n ∈ Z. To obtain division, Z is enlarged to Q by forming all quotients m/n, where m, n ∈ Z, n ≠ 0. In this system, one may solve all equations of the form ax + b = c, a ≠ 0. The final step, the construction of R from Q, may be viewed as “filling in the gaps” of the rational number line, these gaps corresponding to the so-called irrational numbers.¹ For the details of this “bottom up” approach, the interested reader is referred to [7] or [10]. We shall instead take a “top down” approach, describing the real number system axiomatically.

¹ This step results in a system that, while having the structure necessary to formulate a robust theory of limits, does not allow solutions of all polynomial equations. This shortcoming is removed by introducing complex numbers, a subject outside the scope of this book.

1.2 Algebraic Properties of R

In this section we list the axioms that govern the use of addition (+) and multiplication (·) in the real number system. These axioms lead to all of the familiar algebraic properties of real numbers.

The operations of addition and multiplication satisfy the following field axioms, where a, b, c denote arbitrary members of R:

• Closure under addition: a + b ∈ R.
• Associative law of addition: (a + b) + c = a + (b + c).
• Commutative law of addition: a + b = b + a.
• Existence of an additive identity: There exists a member 0 of R such that a + 0 = a for all a ∈ R.
• Existence of additive inverses: For each a ∈ R there exists a member −a of R such that a + (−a) = 0.
• Closure under multiplication: a · b ∈ R.
• Associative law of multiplication: (a · b) · c = a · (b · c).
• Commutative law of multiplication: a · b = b · a.
• Existence of a multiplicative identity: There exists a real number 1 ≠ 0 such that a · 1 = a for all a ∈ R.
• Existence of multiplicative inverses: For each a ≠ 0 there exists a member $a^{-1}$ of R such that $a \cdot a^{-1} = 1$.
• Distributive law: a · (b + c) = a · b + a · c.

We use the following standard notation:

$$\frac{a}{b} = a/b = ab^{-1}, \quad a + b + c = (a + b) + c = a + (b + c), \quad abc = (ab)c = a(bc), \quad a - b = a + (-b), \quad ab = a \cdot b,$$

$$a^n = \underbrace{aa\cdots a}_{n}, \quad a^{-n} = 1/a^n \ (a \ne 0), \quad \text{and} \quad a^0 = 1.$$

We also use the summation and product symbols $\sum$ and $\prod$ defined by

$$\sum_{j=m}^{n} a_j = a_m + a_{m+1} + \cdots + a_n \quad \text{and} \quad \prod_{j=m}^{n} a_j = a_m a_{m+1} \cdots a_n.$$
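As an informal illustration of the axioms and notation above, the following Python sketch spot-checks the field axioms on a finite sample of rational numbers and implements the Σ and Π symbols. The names `check_field_axioms`, `sigma`, and `pi` are hypothetical, and `fractions.Fraction` is used as an assumption so that the arithmetic is exact (it models the subfield Q, not all of R). A finite check of this kind illustrates the axioms; it proves nothing.

```python
from fractions import Fraction
from itertools import product
from math import prod


def check_field_axioms(samples):
    """Spot-check the field axioms on a finite list of rationals."""
    zero, one = Fraction(0), Fraction(1)
    for a, b, c in product(samples, repeat=3):
        assert (a + b) + c == a + (b + c)    # associative law of addition
        assert a + b == b + a                # commutative law of addition
        assert (a * b) * c == a * (b * c)    # associative law of multiplication
        assert a * b == b * a                # commutative law of multiplication
        assert a * (b + c) == a * b + a * c  # distributive law
    for a in samples:
        assert a + zero == a                 # additive identity
        assert a + (-a) == zero              # additive inverse
        assert a * one == a                  # multiplicative identity
        if a != zero:
            assert a * (1 / a) == one        # multiplicative inverse
    return True


def sigma(a, m, n):
    """Return a(m) + a(m+1) + ... + a(n)."""
    return sum(a(j) for j in range(m, n + 1))


def pi(a, m, n):
    """Return a(m) * a(m+1) * ... * a(n)."""
    return prod(a(j) for j in range(m, n + 1))


samples = [Fraction(p, q) for p in range(-3, 4) for q in (1, 2, 3)]
print(check_field_axioms(samples))
print(sigma(lambda j: j, 1, 4), pi(lambda j: j, 1, 4))
```

Exact rationals are used deliberately: floating-point numbers do not satisfy the associative and distributive laws exactly.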

The field axioms may be used to derive the standard rules of algebra. Some of these are given in the following proposition; others may be found in Exercise 1.

1.2.1 Proposition. The following algebraic properties hold in R:
(a) The additive identity is unique; that is, if 0′ is a real number such that a + 0′ = a for all a ∈ R, then 0′ = 0.
(b) The additive inverse of a real number is unique; that is, if a + b = 0, then b = −a.
(c) The multiplicative identity is unique; that is, if 1′ is a real number such that a · 1′ = a for all a ∈ R, then 1′ = 1.
(d) a · 0 = 0 for all a ∈ R.
(e) The multiplicative inverse of a nonzero real number is unique; that is, if ab = 1, then b = 1/a.
(f) If ab = 0, then either a = 0 or b = 0.
(g) If ab = ac and a ≠ 0, then b = c.
(h) If b ≠ 0 and d ≠ 0, then a/b = c/d if and only if ad = bc.
(i) If a ≠ 0 and b ≠ 0, then $(ab)^{-1} = a^{-1}b^{-1}$, or $\dfrac{1}{ab} = \dfrac{1}{a}\cdot\dfrac{1}{b}$.

Proof. (a) If a + 0′ = a for all a then, in particular, 0 + 0′ = 0. But, by definition of 0 and commutativity of addition, 0 + 0′ = 0′. Therefore 0′ = 0.
(b) By associativity and commutativity of addition, b = b + 0 = 0 + b = (−a + a) + b = −a + (a + b) = −a + 0 = −a.
(c) If a · 1′ = a for all a then, in particular, 1 · 1′ = 1. But, by definition of the multiplicative identity and commutativity of multiplication, 1 · 1′ = 1′. Therefore 1′ = 1.
(d) By the distributive property, a · 0 = a(0 + 0) = a · 0 + a · 0. Adding −(a · 0) to both sides of this equation and using associativity of addition produces the desired equation.
(e) By associativity and commutativity of multiplication, $b = 1 \cdot b = (a^{-1}a)b = a^{-1}(ab) = a^{-1} \cdot 1 = a^{-1}$.
(f) Assume a ≠ 0. By (d) and commutativity and associativity of multiplication, $0 = a^{-1} \cdot 0 = (a^{-1})(ab) = (a^{-1}a)b = 1 \cdot b = b$.
(g) By commutativity and associativity of multiplication, $b = 1 \cdot b = (a^{-1}a)b = a^{-1}(ab) = a^{-1}(ac) = (a^{-1}a)c = c$.
(h) If a/b = c/d, then multiplying both sides by bd and using the commutativity and associativity of multiplication we obtain ad = bc. Conversely, if ad = bc, then multiplying both sides by 1/(bd) yields a/b = c/d.
(i) By associativity and commutativity of multiplication, $(ab)(a^{-1}b^{-1}) = (aa^{-1})(bb^{-1}) = 1$. Now apply (e).

The reader will notice that the assertions in the proposition are implications, that is, statements of the form p implies q, frequently written p ⇒ q. Such assertions may be proved directly by assuming p and then deducing q, or indirectly by assuming the negation of q and arguing to a contradiction or to the negation of p. Part (h) also contains an assertion of the form p if and only if q (hereafter, shortened to p iff q). Such an assertion is established by proving both p ⇒ q and q ⇒ p. Throughout the text, we shall encounter many examples of such proofs. The reader is advised that a careful proof requires that each (nontrivial) step be justified by citing hypotheses, appropriate axioms, or previously proved results.

One more point of logic: To prove that a general statement involving the universal quantifier “for every” (or “for all”) is false, one must construct a counterexample. For example, the assertion that xy = x + y for all real numbers x and y is clearly false. For a proof, one need only find a single pair of numbers x and y such that xy ≠ x + y, for example x = y = 1. On the other hand, to prove that $x^2 - y^2 = (x - y)(x + y)$ for all real numbers x and y, it is not sufficient to verify the statement for a specific pair of numbers; a general proof is needed here. For details on constructing proofs in mathematics, the reader is referred to [2].

The number systems described in Section 1.1 are summarized as follows:

• N = {1, 2 := 1 + 1, 3 := 2 + 1, . . .} (positive integers),
• Z = {0, ±1, ±2, ±3, . . .} (integers),
• Q = {m/n : m, n ∈ Z, n ≠ 0} (rational numbers),
• I = R \ Q (irrational numbers).

An integer n is said to be even (odd) if n = 2k (n = 2k + 1) for some k ∈ Z. A precise definition of N is given in Section 1.5. From this it is possible to argue rigorously that the number system N is closed under addition and multiplication. As a consequence, Z is closed under addition, subtraction, and multiplication, and Q is closed under addition, subtraction, multiplication, and division (Exercise 2).

Exercises

1. Prove the following properties of addition and multiplication in R:
(a) −(−a) = a.
(b)S −(ab) = (−a)b = a(−b).
(c)⇓² (−a)(−b) = ab.
(d)S (−1)a = −a.
(e) If b, c, d ≠ 0, then $\dfrac{a/b}{c/d} = \dfrac{a}{b}\cdot\dfrac{d}{c} = \dfrac{ad}{bc}$.
(f)S If b ≠ 0 and d ≠ 0, then $\dfrac{a}{b} + \dfrac{c}{d} = \dfrac{ad + bc}{bd}$.

2. Let r, s ∈ Q. Assuming that Z is closed under addition and multiplication, prove that r ± s, rs, r/s ∈ Q, the last provided that s ≠ 0.

3.S If r ∈ Q, r ≠ 0, and x ∈ I, prove that r ± x, rx, r/x ∈ I.

4. Let n ∈ N. Prove the following identities without using mathematical induction:
(a)⇓³ $x^n - y^n = (x - y)\displaystyle\sum_{j=1}^{n} x^{n-j}y^{j-1}$.
(b) $x^n + y^n = (x + y)\displaystyle\sum_{j=1}^{n} (-1)^{j-1}x^{n-j}y^{j-1}$ if n is odd.
(c) $x^{-n} - y^{-n} = (y - x)\displaystyle\sum_{j=1}^{n} x^{j-n-1}y^{-j}$ if x ≠ 0 and y ≠ 0.

5.S Define 0! = 1 and, for n ∈ N, define n! = n(n − 1) · · · 2 · 1 (n factorial). Prove the following:
(a) $(1 - 1/n)(1 - 2/n)\cdots\big(1 - (n-1)/n\big) = \dfrac{n!}{n^n}$.
(b) $1 \cdot 3 \cdot 5 \cdots (2n - 1) = \dfrac{(2n)!}{2^n\, n!}$.

6.⇓⁴ For n ∈ Z⁺ and k = 0, 1, . . . , n, define the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
(read “n choose k”). Prove that
$$\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}.$$

² This exercise will be used in 1.3.2.
³ This exercise will be used in 4.1.2.
⁴ This exercise will be used in 1.5.5.

7. Without using mathematical induction, prove that for any n ∈ N,
(a) $\displaystyle\sum_{k=0}^{n} \frac{1}{(k+1)(n-k+1)} = \frac{2}{n+2}\sum_{k=0}^{n}\frac{1}{k+1}$.
(b) $\displaystyle\sum_{k=0}^{n} \frac{1}{(2k+1)(2n-2k+1)} = \frac{1}{n+1}\sum_{k=0}^{n}\frac{1}{2k+1}$.

8.S Find a polynomial f(x) of degree 2 such that $\sum_{k=1}^{n} f(k) = n^3$ for all n.

1.3 Order Structure of R

The order relation on R is derived from the following order axiom.

There exists a nonempty subset P of R, closed under addition and multiplication, such that for each x ∈ R exactly one of the following holds: x ∈ P, −x ∈ P, or x = 0.

The last part of the axiom is known as the trichotomy property. A real number x is called positive if x ∈ P and negative if −x ∈ P.

1.3.1 Definition. Let a and b be real numbers. If b − a ∈ P, we write a < b or b > a and say that a is less than b or that b is greater than a. ♦

1.3.2 Proposition. The order relation < on R has the following properties:
(a) a < b iff −a > −b (reflection property).
(b) If a < b and b < c, then a < c (transitive property).
(c) If a < b and c < d, then a + c < b + d (addition property).
(d) If a < b and c > 0, then ac < bc (multiplication property).
(e) For a, b ∈ R, exactly one of the following is true: a = b, a < b, or b < a (trichotomy property).
(f) If x ≠ 0, then $x^2 > 0$. In particular, 1 > 0.

Proof. (a) a < b iff (−a) − (−b) = b − a ∈ P iff −a > −b.
(b) By hypothesis, b − a ∈ P and c − b ∈ P, hence, by closure under addition, c − a = (b − a) + (c − b) ∈ P, that is, a < c.
(c) Similar to (b).
(d) Since b − a, c ∈ P, bc − ac = (b − a)c ∈ P, that is, ac < bc.
(e) This follows by applying the trichotomy property of P to a − b.
(f) If x > 0, then, by closure of P under multiplication, $x^2 > 0$. If x < 0, then −x > 0 so, by Exercise 1.2.1(c), $x^2 = (-x)(-x) > 0$.

1.3.3 Definition. Let a and b be real numbers. If either a < b or a = b, we write a ≤ b or b ≥ a and say that a is less than or equal to b or that b is greater than or equal to a. If A ⊆ R, we define A⁺ = {x ∈ A : x ≥ 0}. ♦

Note that by the trichotomy property,

a ≤ b and b ≤ a ⇒ a = b. (1.1)

The inequality a ≤ b is sometimes called weak inequality in contrast to strict inequality a < b. The reader may check that parts (a)–(d) of the above proposition are valid if strict inequality is replaced by weak inequality.

1.3.4 Definition. The absolute value of a real number x is defined by
$$|x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$$ ♦

For example, |0| = 0 and |2| = |−2| = 2.

1.3.5 Proposition. Absolute value has the following properties:
(a) |x| ≥ 0.
(b) |x| = 0 iff x = 0.
(c) |−x| = |x|.
(d) −|x| ≤ x ≤ |x|.
(e) |xy| = |x| |y|.
(f) $\left|\dfrac{x}{y}\right| = \dfrac{|x|}{|y|}$ (y ≠ 0).
(g) |x + y| ≤ |x| + |y| and (h) | |x| − |y| | ≤ |x − y| (triangle inequalities).

Proof. Properties (a)–(e) are easily established by considering cases. For example, in (e), if x ≥ 0 and y ≤ 0, then xy ≤ 0, hence |xy| = −(xy) = x(−y) = |x| |y|. For part (f), use (e) to obtain
$$|x| = \left|\frac{x}{y}\,y\right| = \left|\frac{x}{y}\right| |y|,$$
and then divide both sides by |y|. For (g), we have ±x ≤ |x| and ±y ≤ |y| by (d), hence ±(x + y) ≤ |x| + |y|. Since one of the signed quantities on the left is |x + y|, the assertion follows. From (g) we have |x| = |(x − y) + y| ≤ |x − y| + |y|, hence |x| − |y| ≤ |x − y|. Switching x and y and using (c) yields (h).
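The properties in 1.3.5 can be spot-checked numerically. The sketch below is illustrative only: `abs_value` and `check_abs_properties` are hypothetical names, `abs_value` follows the case definition in 1.3.4, and exact rationals from `fractions.Fraction` are assumed as sample inputs. Passing on a finite sample is evidence, not a proof.

```python
from fractions import Fraction
from itertools import product


def abs_value(x):
    """Absolute value by cases, as in Definition 1.3.4."""
    return x if x >= 0 else -x


def check_abs_properties(samples):
    """Spot-check properties (a)-(h) of Proposition 1.3.5 on a finite sample."""
    for x in samples:
        assert abs_value(x) >= 0                      # (a)
        assert (abs_value(x) == 0) == (x == 0)        # (b)
        assert abs_value(-x) == abs_value(x)          # (c)
        assert -abs_value(x) <= x <= abs_value(x)     # (d)
    for x, y in product(samples, repeat=2):
        assert abs_value(x * y) == abs_value(x) * abs_value(y)              # (e)
        if y != 0:
            assert abs_value(x / y) == abs_value(x) / abs_value(y)          # (f)
        assert abs_value(x + y) <= abs_value(x) + abs_value(y)              # (g)
        assert abs_value(abs_value(x) - abs_value(y)) <= abs_value(x - y)   # (h)
    return True


samples = [Fraction(p, q) for p in range(-4, 5) for q in (1, 2, 3)]
print(check_abs_properties(samples))
```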

1.3.6 Definition. Let S be a nonempty set of real numbers. The largest element or maximum of S is a member max S of S that satisfies max S ≥ s for all s ∈ S. The smallest element or minimum of S, denoted by min S, is defined analogously. ♦

A set may not have a largest or smallest member. The existence of max S and min S for a nonempty finite set may be established by mathematical induction. (See Exercise 1.5.2.)

1.3.7 Definition. The positive and negative parts of a real number x are defined by x⁺ = max{x, 0} and x⁻ = max{−x, 0}. ♦
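The inductive idea behind the existence of max S for a finite set can be mirrored by recursion on the number of elements. The following is a minimal sketch under that assumption; `max_by_induction`, `positive_part`, and `negative_part` are hypothetical names, and the last two simply apply Definition 1.3.7.

```python
def max_by_induction(s):
    """Maximum of a nonempty finite collection, by recursion on its size."""
    items = list(s)
    if not items:
        raise ValueError("S must be nonempty")
    if len(items) == 1:                    # base step: a one-element set
        return items[0]
    rest_max = max_by_induction(items[1:])  # induction hypothesis on n - 1 elements
    return items[0] if items[0] >= rest_max else rest_max


def positive_part(x):
    """x+ = max{x, 0}."""
    return max_by_induction([x, 0])


def negative_part(x):
    """x- = max{-x, 0}."""
    return max_by_induction([-x, 0])


print(max_by_induction([3, -1, 7, 2]))
print(positive_part(-2), negative_part(-2))
```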

Exercises

Prove the following:

1. (a) If ab > 0, then a and b have the same sign.
(b) a > 0 iff 1/a > 0.
(c)S Suppose either b, d < 0 or b, d > 0. Then a/b > c/d iff ad > bc.

2. If x > 1, then $x^2 > x$. If 0 < x < 1, then $x^2 < x$.

3. (a) If 0 < x < y and 0 < a < b, then 0 < ax < by.
(b) If x < y < 0 and a < b < 0, then 0 < by < ax.
(c) Let x, y > 0. Then x < y iff $x^2 < y^2$.

4.S If either 0 < x < y or x < y < 0, then 1/y < 1/x.

5. If −1 < x < y or x < y < −1, then x/(x + 1) < y/(y + 1). What if x < −1 < y?

6. If 0 < x < y and n ∈ N, then
(a)S $0 < y^n - x^n \le n(y - x)y^{n-1}$,
(b) $\dfrac{ny + 1}{nx + 1} < \dfrac{(n+1)y + 1}{(n+1)x + 1}$.

7. If x > 1, m, n ∈ N, and $\dfrac{x-1}{x} < \dfrac{m}{n} < 1$, then n > x.

8.S If a < b and 0 < t < 1, then a < ta + (1 − t)b < b. In particular, a < (a + b)/2 < b.

9. $x^2 + y^2 + axy \ge 0$ for all x, y ∈ R iff |a| ≤ 2.

10.S If a ≤ b + x for every x > 0, then a ≤ b.

11. If 0 < a ≤ bx for every x > 1, then a ≤ b.

12. If a/x ≤ x + 1 for every x > 0, then a ≤ 0.

13. For all x, y, z, w ∈ R,
(a) $2xy \le x^2 + y^2$.
(b)S $xy + yz + xz \le x^2 + y^2 + z^2$.
(c) $(xy + zw)^2 \le (x^2 + z^2)(y^2 + w^2)$.
(d) $(x + y)^2 \le 2(x^2 + y^2)$.

14.S If x, a > 0, then $x + a^2/x \ge 2a$. Equality holds iff x = a.

15. (a) |x − y| ≤ |x − z| + |z − y|.
(b) |x − L| < ε iff L − ε < x < L + ε.

16. Let S, T ⊆ R be finite and nonempty. Define −S := {−s : s ∈ S}. Then
(a) max(−S) = − min S.
(b) min(−S) = − max S.
(c) max(S ∪ T) = max{max S, max T}.
(d) min(S ∪ T) = min{min S, min T}.

17. For any x, y ∈ R,
(a) x⁺ ≥ 0, x⁻ ≥ 0, x = x⁺ − x⁻, and |x| = x⁺ + x⁻.
(b) x⁺ = (|x| + x)/2 and x⁻ = (|x| − x)/2.
(c) x = y − z and |x| = y + z imply y = x⁺ and z = x⁻.
(d) (x + y)⁺ ≤ x⁺ + y⁺ and (x + y)⁻ ≤ x⁻ + y⁻.
(e) (x − y)⁻ ≤ y, if x, y ≥ 0.

18.S If a ≤ x ≤ b, then |x| ≤ max{|a|, |b|}.

19. (a) max{x, y} = (x + y + |x − y|)/2.
(b) min{x, y} = (x + y − |x − y|)/2.

20. (a) $\max\{a, b, c\} = \tfrac14\Big(a + b + 2c + |a - b| + \big|a + b - 2c + |a - b|\big|\Big)$.
(b) $\min\{a, b, c\} = \tfrac14\Big(a + b + 2c - |a - b| - \big|a + b - 2c - |a - b|\big|\Big)$.

21.S Let $S = \{a_1, \ldots, a_n\}$, where $a_1 < \cdots < a_n$. Let 1 ≤ k < n and denote by $S_1, \ldots, S_m$ the subsets obtained by removing exactly k members from S, where $m = \binom{n}{k}$ is the binomial coefficient (see Theorem 1.5.5). Then
$$\max\{\min S_1, \ldots, \min S_m\} = a_{k+1}.$$

1.4 Completeness Property of R

A system (F, +, ·, <) […]

[…] > 0 there exists n ∈ N such that na > b.

Proof. Suppose, for a contradiction, that na ≤ b for all n ∈ N. The set S = {na : n ∈ N} is then bounded above and hence has a least upper bound u. Since u − a < u, the approximation property for suprema implies that u − a < na for some n ∈ N. But then u < (n + 1)a ∈ S, contradicting that u is an upper bound for S.

1.4.5 Example. Let
$$A = \left\{(-1)^n \frac{n}{n+1} : n \in \mathbb{N}\right\} = \left\{-\frac12, \frac23, -\frac34, \ldots\right\}.$$
Since A is bounded above by 1 and below by −1, −1 ≤ inf A ≤ sup A ≤ 1. Let 0 < r < 1. By the Archimedean principle we may choose an even integer n such that n > r/(1 − r). Then r < n/(n + 1) ∈ A, which shows that r cannot be an upper bound of A. Therefore, sup A = 1. Similarly, inf A = −1. ♦

1.4.6 Well-Ordering Principle. Every nonempty subset A of N has a smallest member.

Proof. Since A is bounded below by 1, it has a greatest lower bound ℓ. The theorem will follow if we show that ℓ ∈ A. Suppose, for a contradiction, that ℓ ∉ A. By the approximation property for infima, there exists a ∈ A such that ℓ < a < ℓ + 1. Choose any real number r with ℓ < r < a, for example, r = (a + ℓ)/2. By the approximation property again, there exists a′ ∈ A such that ℓ < a′ < r. We now have ℓ < a′ < a < ℓ + 1, which implies that a − a′ is an integer strictly between 0 and 1. As this is impossible,⁵ it follows that ℓ must be a member of A.

1.4.7 Greatest Integer Function. For each x ∈ R there exists a unique integer ⌊x⌋ such that x − 1 < ⌊x⌋ ≤ x.

Proof. The uniqueness is clear. To prove existence, apply the Archimedean principle twice: first to obtain an integer k such that x + k ≥ 1 and then to conclude that the set A := {n ∈ N : n > x + k} is nonempty. By the well-ordering principle, A has a least member a. Since 1 ≤ x + k < a, a − 1 is a positive integer. Since a − 1 < a, a − 1 cannot be in A so x + k ≥ a − 1. Therefore, x − 1 < a − 1 − k ≤ x, hence the integer ⌊x⌋ := a − 1 − k has the required property.
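The existence argument for ⌊x⌋ is constructive enough to be traced step by step. The sketch below follows the two steps of the proof literally, under the assumption that x is given as an exact rational; the function name `floor_via_well_ordering` is hypothetical, and the linear searches stand in for the Archimedean and well-ordering principles (they are not meant to be efficient).

```python
from fractions import Fraction
import math


def floor_via_well_ordering(x):
    """Compute the integer a - 1 - k from the proof of 1.4.7."""
    x = Fraction(x)
    # Step 1 (Archimedean principle): find an integer k with x + k >= 1.
    k = 0
    while x + k < 1:
        k += 1
    # Step 2 (well-ordering): least natural number a with a > x + k.
    a = 1
    while not a > x + k:
        a += 1
    return a - 1 - k


for x in (Fraction(5, 2), Fraction(-5, 2), Fraction(3), Fraction(-3), Fraction(0)):
    n = floor_via_well_ordering(x)
    assert x - 1 < n <= x            # the defining inequality of 1.4.7
    assert n == math.floor(x)        # agrees with the library floor
    print(x, n)
```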

FIGURE 1.2: Greatest integer function.

⁵ This is intuitively clear. The abstract definition of N given in Section 1.5 may be used to give a rigorous proof.

The integer ⌊x⌋ is called the greatest integer in x or the floor of x. The greatest integer function allows a simple proof of the following important result:

1.4.8 Density of the Rationals. Between any pair of distinct real numbers there is a rational number.

Proof. Let a < b. By the Archimedean principle, n(b − a) > 1 for some n ∈ N. Let m := ⌊na⌋ + 1. Then na < m ≤ na + 1 < nb, hence a < m/n < b.

1.4.9 Definition (nth roots). Let n be a positive integer and let b > 0. The unique positive solution of the equation $x^n = b$ is called the positive nth root of b and is denoted by $b^{1/n}$. For m ∈ Z we define $b^{m/n} = (b^{1/n})^m$. As usual we write $\sqrt{b}$ for $b^{1/2}$. ♦

The existence of $b^{1/n}$ is an easy consequence of the intermediate value theorem, proved in Chapter 3. Uniqueness follows from Exercise 1.2.4(a). We omit the straightforward (but admittedly tedious) proof of the following theorem that summarizes the familiar rules of rational exponentiation.

1.4.10 Theorem. For r, s ∈ Q and positive real numbers a, b,
$$b^r b^s = b^{r+s}, \quad \frac{b^r}{b^s} = b^{r-s}, \quad (b^r)^s = b^{rs}, \quad \text{and} \quad (ab)^r = a^r b^r.$$

Proposition 1.4.11 below gives a simple way to generate irrational numbers.
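The proof of 1.4.8 is an explicit recipe: choose n with n(b − a) > 1 and set m = ⌊na⌋ + 1. A minimal sketch of that recipe follows, assuming the endpoints are supplied as exact rationals (a computer cannot hold an arbitrary real number); `rational_between` is a hypothetical name, and the particular choice n = ⌊1/(b − a)⌋ + 1 is just one n that the Archimedean principle guarantees.

```python
from fractions import Fraction
import math


def rational_between(a, b):
    """Return m/n with a < m/n < b, following the proof of 1.4.8."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    n = math.floor(1 / (b - a)) + 1   # then n > 1/(b - a), i.e. n(b - a) > 1
    m = math.floor(n * a) + 1         # then na < m <= na + 1 < nb
    return Fraction(m, n)


q = rational_between(Fraction(1, 3), Fraction(1, 2))
print(q)   # 3/7, since n = 7 and m = floor(7/3) + 1 = 3
```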

1.4.11 Proposition. If n is a positive integer that is not a perfect square, then $\sqrt{n}$ is irrational.

Proof. By definition of the greatest integer function, $\sqrt{n} - 1 < \lfloor\sqrt{n}\rfloor \le \sqrt{n}$. Since $\sqrt{n}$ is assumed not to be an integer, the second inequality is strict, hence $0 < \sqrt{n} - \lfloor\sqrt{n}\rfloor < 1$. Suppose, for a contradiction, that $\sqrt{n}$ is rational. Then the set $A := \{m \in \mathbb{N} : m\sqrt{n} \in \mathbb{N}\}$ is nonempty. By the well-ordering principle, A has a least member $m_0$. In particular, $m_0\sqrt{n} \in \mathbb{N}$, hence both of the quantities
$$m := m_0\big(\sqrt{n} - \lfloor\sqrt{n}\rfloor\big) \quad \text{and} \quad m\sqrt{n} = m_0\big(n - \sqrt{n}\,\lfloor\sqrt{n}\rfloor\big)$$
are positive integers. But then m ∈ A, which is impossible since $m < m_0$. Therefore, $\sqrt{n}$ must be irrational.

In later chapters, we shall see other important examples of irrational numbers, notably the base e of the natural logarithm.

1.4.12 Definition. The extended real number system is the set $\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$, where +∞, −∞ are symbols with the following prescribed properties:

−∞ < x < +∞ for all x ∈ R,
x + ∞ = +∞ if −∞ < x ≤ +∞,
x − ∞ = −∞ if −∞ ≤ x < +∞,
x · (+∞) = +∞ if 0 < x ≤ +∞,
x · (+∞) = −∞ if −∞ < x < 0,
x · (−∞) = −∞ if 0 < x < +∞,
x · (−∞) = +∞ if −∞ ≤ x < 0,
$\dfrac{x}{+\infty} = \dfrac{x}{-\infty} = 0$ if −∞ < x < +∞.
♦

The above algebraic conventions are derived from limit considerations. Note that the operations

$$\pm\infty \mp \infty, \quad (\pm\infty)\cdot(\mp\infty), \quad \frac{\pm\infty}{\pm\infty}, \quad \frac{\pm\infty}{\mp\infty}, \quad \text{and} \quad 0\cdot(\pm\infty) \tag{1.2}$$

are not defined.

1.4.13 Definition. If A ≠ ∅ is not bounded above, we set sup A = +∞. Similarly, if A is not bounded below, we set inf A = −∞. We also define sup ∅ = −∞ and inf ∅ = +∞. ♦

The reader may verify that the approximation properties for suprema and infima given in 1.4.3 hold in the extended system $\overline{\mathbb{R}}$.

1.4.14 Definition. An interval in R is a nonempty set I with the property that a, b ∈ I and a < x < b imply that x ∈ I. An interval containing more than one point is said to be nondegenerate. ♦

Arguing cases, one may show that the definition of interval reduces to the following familiar subsets of R:

(a, b) := {x : a < x < b},
(a, b] := {x : a < x ≤ b},
[a, b) := {x : a ≤ x < b},
[a, b] := {x : a ≤ x ≤ b}.

For example, if I is unbounded below and bounded above with b := sup I ∈ I, then I = (−∞, b]. If, instead, I is bounded below and above such that a := inf I ∈ I and b := sup I ∉ I, then I = [a, b). Intervals that contain their endpoints are said to be closed; those that don’t contain any endpoints are called open. The length |a − b| of a finite interval I with endpoints a, b will be denoted by |I|. Note that the length of a degenerate interval is zero.

Exercises

1. Prove that inf (−A) = − sup A, where −A := {−a : a ∈ A}. Conclude that every nonempty subset of R that is bounded below has a greatest lower bound.

2. Find the supremum and infimum of the following sets, where $r_n$ denotes the remainder on division of n ∈ N by 3.⁶
(a)S $\{(-1)^n (r_n^2 + 3r_n + 2) : n \in \mathbb{N}\}$.
(b)S $\left\{\dfrac{(-1)^n\, n(r_n - 1)}{(n+1)(r_n + 1)} : n \in \mathbb{N}\right\}$.
(c) $\left\{(-1)^n\,\dfrac{3n+2}{2n+3} : n \in \mathbb{N}\right\}$.
(d) $\left\{\big((-1)^{\lfloor n/3\rfloor} - 1\big)\dfrac{n}{n+1} : n \in \mathbb{N}\right\}$.

⁶ For the existence of $r_n$, see Exercise 1.5.15.

3. Find the supremum and infimum of the following sets.
(a) $\{x : x^2 - 5x + 6 < 0\}$.
(b) {x : (x + 3)(x − 4) < −6}.
(c) {x : (x − 4)/(x − 3) < −2}.
(d)S {x : x − 2 < 1/(x − 1)}.
(e)S {x : (x − 1)/x < 4}.
(f)S {x > 0 : x/(2 − x) > 3}.
(g) $\{x : |x^2 - 3x + 2| \le 1/4\}$.
(h) {x : |x − 1| + |x − 2| ≤ 3}.
(i)S $\{x : \sqrt{x - 1/8} > x\}$.
(j) $\{x : \sqrt{x + 1/8} > x\}$.
(k)S {x : 2|x − 1| + 3|y − 2| < 6 for some y ∈ R}.
(l) $\{x : 2|x^2 - 1| + 3|y^2 - 2| < 6 \text{ for some } y \in \mathbb{R}\}$.
(m)S $\{(-1)^n \sin(n\pi/2) - n^{-1} : n \in \mathbb{N}\}$.
(n) $\{(-1)^n \sin(m\pi/2) - n^{-1} : m, n \in \mathbb{N}\}$.

4. Let A ⊆ B be nonempty subsets of R. Prove that sup A ≤ sup B and inf A ≥ inf B.

5.S ⇓⁷ For a nonempty bounded set A define |A| := {|a| : a ∈ A}. Prove that sup |A| − inf |A| ≤ sup A − inf A. Hint. Use | |x| − |y| | ≤ |x − y|.

6. For r ∈ Q, x ∈ R, and nonempty subsets A and B of R, define

xA = {xa : a ∈ A},
A + B = {a + b : a ∈ A, b ∈ B},
AB = {ab : a ∈ A, b ∈ B},
$A^r = \{a^r : a \in A\}$, A ⊆ (0, +∞).

Under the conventions described in 1.4.12, prove that
(a) sup (A + B) ≤ sup A + sup B, inf (A + B) ≥ inf A + inf B.
(b)S sup (xA) = x sup A, inf (xA) = x inf A if x ≥ 0.
(c) sup (AB) ≤ (sup A)(sup B) and inf (AB) ≥ (inf A)(inf B) if A, B ⊆ (0, ∞).
(d) $\sup A^r = (\sup A)^r$, $\inf A^r = (\inf A)^r$ if A ⊆ (0, ∞) and r > 0.
(e) $\sup A^{-1} = 1/\inf A$, $\inf A^{-1} = 1/\sup A$ if A ⊆ (0, ∞).

7. Let A ⊆ R be nonempty such that inf{|x − y| : x, y ∈ A, x ≠ y} > 0 (for example, any set of integers). If A is bounded above, prove that sup A ∈ A, that is, A has a maximum.

8. Let A be a nonempty bounded set and let r ∈ R such that x − y < r for all x, y ∈ A. Show that sup A − inf A ≤ r.

9.S Prove that between any pair of distinct real numbers there is an irrational number.

⁷ This exercise will be used in 5.2.6.

10. Prove that between any pair of real numbers a < b there exist infinitely many rational numbers and infinitely many irrational numbers.

11. (Density of the dyadic rationals). Prove that for each pair of real numbers a < b there exists m ∈ Z and n ∈ N such that $a < m/2^n < b$. (Suggestion. You might want to use the fact that $2^n > n$, a consequence of the binomial theorem, proved in the next section.) A number of the form $m/2^n$ is called a dyadic rational.

12. Prove:
(a) ⌊x⌋ = ⌊−x⌋ iff x = 0.
(b)S ⌊x⌋ = −⌊−x⌋ iff x ∈ Z.
(c)S −1 < x + ⌊−x⌋ ≤ 0.
(d) ⌊x⌋ + ⌊m − x⌋ = m or m − 1.

13. Let m ∈ Z, n ∈ N, $x_j \in \mathbb{R}$, and define
$$s := \sum_{j=0}^{n} x_j \quad \text{and} \quad t := \sum_{j=0}^{n} \lfloor x_j \rfloor.$$
Prove:
(a) 0 ≤ ⌊s⌋ − t ≤ n.
(b) k ≤ s − t < k + 1 for some k = 0, 1, . . . , n.

14.S Let b > 0. Prove that $b^{m/n} = (b^m)^{1/n}$.

15.⇓⁸ Prove that for a, b > 0 and n ∈ N,
$$a^{1/n} - b^{1/n} = (a - b)\left(\sum_{j=1}^{n} a^{1-j/n}\, b^{(j-1)/n}\right)^{-1}.$$

16. Show that if 0 ≤ a < b and n ∈ N, then $a^{1/n} < b^{1/n}$.

17.S Prove that if A is a bounded set, then there exists an integer N such that |x| ≤ N for all x ∈ A.

18. Let a, b ∈ Q \ {0} and n ∈ N. Prove that $x := a + b\sqrt{n}$ is irrational iff n is not a perfect square.

19. Show that if $x, y \in \mathbb{Q}(\sqrt{2})$, then $x \pm y,\ xy,\ x/y \in \mathbb{Q}(\sqrt{2})$, the last provided that y ≠ 0. Conclude that $\mathbb{Q}(\sqrt{2})$ is an ordered subfield of R. Show that $\mathbb{Q}(\sqrt{2})$ is not complete.

20.S (a) Find all n ∈ N such that $\sqrt{n + 11} + \sqrt{n} \in \mathbb{Q}$.
(b) Same question for $\sqrt{n + 21} + \sqrt{n}$.

21. Let p ∈ N be prime, that is, divisible only by 1 and itself. Prove that $(\sqrt{n} + 1)(\sqrt{n + p} + 1)^{-1} \in \mathbb{Q}$ iff $n = (p - 1)^2/4$.

⁸ This exercise will be used in 4.1.2.

1.5 Mathematical Induction

In this section we give an abstract characterization of the natural number system. This will lead directly to the principle of mathematical induction.

1.5.1 Definition. A set S of real numbers is said to be inductive if
• 1 ∈ S,
• x ∈ S implies x + 1 ∈ S.
The set N of natural numbers is then defined as the intersection of all inductive subsets of R. ♦

The sets (a, +∞), and (a, +∞) ∩ Q, a < 1, are clearly inductive. More importantly, N itself is inductive. Indeed, since 1 is common to all inductive sets, 1 ∈ N, and if n is common to all inductive sets, then so is n + 1. We may therefore characterize N as the smallest inductive set (in the sense of set inclusion). The principle of mathematical induction follows immediately from this characterization:

1.5.2 Principle of Mathematical Induction. For each n ∈ N, let P(n) be a statement depending on n. Suppose that
(a) P(1) is true,
(b) P(n + 1) is true whenever P(n) is true.
Then P(n) is true for all n.

Proof. Let S denote the set of n ∈ N for which P(n) is true. Then (a) and (b) imply that S is inductive and hence, as a subset of N, must in fact equal N.

In a particular application of 1.5.2, part (a) is called the base step and part (b) the inductive step. The assumption in (b) that P(n) is true is called the induction hypothesis. The principle of mathematical induction has been loosely described as the “domino principle”: If dominoes are lined up vertically in such a way that the (n + 1)st domino will fall if the nth one falls, then, if the first domino is tipped, all the dominoes will fall.

Mathematical induction may be used to give a rigorous proof that N is closed under addition: Let P(n) be the statement that n + m ∈ N for all m ∈ N. Then P(1) is true because N is inductive, and if, for some n, P(n) is true, that is, if n + m ∈ N for all m, then clearly P(n + 1) is true. A similar argument shows that N is closed under multiplication.
Mathematical induction is indispensable in proving many useful inequalities and formulas. We offer two examples; others may be found in the exercises.

1.5.3 Example. We prove by induction that $3^n n! > n^n$ for all n ∈ N. This is obvious for n = 1. For the induction step, we need the fact (verified in Example 2.2.4) that $(1 + 1/n)^n < 3$, or equivalently, $(n + 1)^n < 3n^n$, for all n. Assuming this, we see that if $3^n n! > n^n$, then
$$3^{n+1}(n+1)! = 3(n+1)\,3^n n! > 3(n+1)\,n^n > (n+1)^{n+1}.$$ ♦

1.5.4 Example. We derive a closed formula for $f(n) := \sum_{k=1}^{n} (3k - 1)^2$ and then verify the result by induction. A little experimentation suggests that we should try a polynomial in n of degree 3, say $g(n) := An^3 + Bn^2 + Cn + D$. Then
$$g(n+1) - g(n) = A\big[(n+1)^3 - n^3\big] + B\big[(n+1)^2 - n^2\big] + C\big[(n+1) - n\big] = 3An^2 + (3A + 2B)n + A + B + C$$
and
$$f(n+1) - f(n) = \sum_{k=1}^{n+1} (3k-1)^2 - \sum_{k=1}^{n} (3k-1)^2 = \big(3(n+1) - 1\big)^2 = 9n^2 + 12n + 4.$$
Assuming that f(n) = g(n) for all n, we may equate coefficients to obtain A = 3, B = 3/2, and C = −1/2. Since f(1) = 4, we see that D = 0. Thus, under the assumption that the sum has a closed form that is a cubic polynomial, we obtain the formula
$$\sum_{k=1}^{n} (3k-1)^2 = 3n^3 + \tfrac32 n^2 - \tfrac12 n.$$
To prove the validity of the formula we use induction. When n = 1, each side equals 4. Assuming the formula holds for n, we have
$$\sum_{k=1}^{n+1} (3k-1)^2 = \sum_{k=1}^{n} (3k-1)^2 + \big(3(n+1) - 1\big)^2 = \big(3(n+1) - 1\big)^2 + 3n^3 + \tfrac32 n^2 - \tfrac12 n.$$
A little algebra shows that the last expression reduces to $3(n+1)^3 + \tfrac32 (n+1)^2 - \tfrac12 (n+1)$. Thus the formula holds for n + 1, completing the induction.

♦
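The coefficient matching in Example 1.5.4 can be reproduced mechanically. The sketch below is one possible way to do it, not the text's own procedure: it recovers A, B, C from the differences d(n) = f(n + 1) − f(n) at n = 1, 2, 3, using d(n) = 3An² + (3A + 2B)n + (A + B + C), and then compares f and g on a range of n. The function names are hypothetical, and exact rationals are assumed so that 3/2 and −1/2 are represented without rounding.

```python
from fractions import Fraction


def f(n):
    """The sum of (3k - 1)^2 for k = 1, ..., n."""
    return sum((3 * k - 1) ** 2 for k in range(1, n + 1))


def g(n):
    """The closed form 3n^3 + (3/2)n^2 - (1/2)n obtained in Example 1.5.4."""
    n = Fraction(n)
    return 3 * n**3 + Fraction(3, 2) * n**2 - Fraction(1, 2) * n


def derive_coefficients():
    """Recover (A, B, C, D) from differences of f, assuming f is a cubic in n."""
    d1, d2, d3 = (f(n + 1) - f(n) for n in (1, 2, 3))
    A = Fraction(d3 - 2 * d2 + d1, 6)       # second difference of d equals 6A
    B = (Fraction(d2 - d1) - 12 * A) / 2    # d(2) - d(1) = 12A + 2B
    C = d1 - 7 * A - 3 * B                  # d(1) = 7A + 3B + C
    D = f(1) - (A + B + C)                  # g(1) = A + B + C + D = f(1)
    return A, B, C, D


print(derive_coefficients())
print(all(f(n) == g(n) for n in range(1, 200)))
```

Agreement on finitely many n supports the formula but does not replace the induction argument given in the example.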

The stalwart reader may wish to use the methods of the last example to derive and then verify by induction the formula
$$\sum_{k=1}^{n} k^4 = \frac{n\,(6n^4 + 15n^3 + 10n^2 - 1)}{30}.$$
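Before attempting the derivation, it can be reassuring to test the stated formula numerically. The following sketch (with hypothetical function names) compares both sides for small n and also checks the identity that the inductive step would need, namely that the closed form increases by (n + 1)⁴ when n is replaced by n + 1. It is a check only and does not carry out the derivation.

```python
def sum_fourth_powers(n):
    """Left side: 1^4 + 2^4 + ... + n^4."""
    return sum(k**4 for k in range(1, n + 1))


def closed_form(n):
    """Right side: n(6n^4 + 15n^3 + 10n^2 - 1)/30, computed in integers."""
    numerator = n * (6 * n**4 + 15 * n**3 + 10 * n**2 - 1)
    assert numerator % 30 == 0   # the quotient is an integer for each tested n
    return numerator // 30


# Base step and a range of direct comparisons.
print(closed_form(1) == 1)
print(all(sum_fourth_powers(n) == closed_form(n) for n in range(1, 200)))

# The identity behind the inductive step, checked for the same range of n.
print(all(closed_form(n + 1) - closed_form(n) == (n + 1) ** 4 for n in range(1, 200)))
```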

There are many other types of applications of the principle of mathematical induction, some of which are given in the exercises. The following has important consequences in combinatorics, probability theory, and infinite series.

1.5.5 Binomial Theorem. Let a, b ∈ R and n ∈ N. Then
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}, \quad \text{where} \quad \binom{n}{k} := \frac{n!}{k!(n-k)!}.$$

Proof. For n = 1 the formula asserts that
$$a + b = \binom{1}{0} a^0 b^1 + \binom{1}{1} a^1 b^0,$$
which follows from the convention 0! = 1. Suppose that the formula holds for some n ≥ 1. Writing $(a + b)^{n+1}$ as $(a + b)(a + b)^n$ and using the induction hypothesis, we have
$$\begin{aligned}
(a + b)^{n+1} &= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} \\
&= \sum_{k=1}^{n} \binom{n}{k-1} a^k b^{n+1-k} + \sum_{k=1}^{n} \binom{n}{k} a^k b^{n+1-k} + a^{n+1} + b^{n+1} \\
&= \sum_{k=1}^{n} \left[\binom{n}{k-1} + \binom{n}{k}\right] a^k b^{n+1-k} + a^{n+1} + b^{n+1} \\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k},
\end{aligned}$$
where, for the last step, we used Exercise 1.2.6. By induction, the formula holds for all n.
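The two ingredients of this proof, the identity of Exercise 1.2.6 and the expansion itself, are easy to test on examples. The sketch below uses hypothetical names `binom` and `binomial_expansion`, defines the coefficient directly from factorials as in the statement of the theorem, and assumes exact rational inputs.

```python
from fractions import Fraction
from math import comb, factorial


def binom(n, k):
    """Binomial coefficient n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def binomial_expansion(a, b, n):
    """Right side of the binomial theorem: sum of C(n, k) a^k b^(n-k)."""
    return sum(binom(n, k) * a**k * b ** (n - k) for k in range(n + 1))


# The identity from Exercise 1.2.6, used in the last step of the proof.
print(all(binom(n + 1, k) == binom(n, k - 1) + binom(n, k)
          for n in range(1, 12) for k in range(1, n + 1)))

# The theorem itself, on one pair of rationals.
a, b = Fraction(2, 3), Fraction(-5, 7)
print(all(binomial_expansion(a, b, n) == (a + b) ** n for n in range(1, 12)))
```

The factorial-based `binom` agrees with the standard library's `math.comb` on the tested range, which offers an independent cross-check.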

Exercises

1.⇓⁹ Let $0 < a < x_1,\ y_1 < b := a + 1$ and define
$$x_{n+1} = a + \sqrt{|x_n - a|} \quad \text{and} \quad y_{n+1} = b - \sqrt{|b - y_n|}.$$
Prove that $a < x_n < x_{n+1} < b$ and $a < y_{n+1} < y_n < b$ for all n ∈ N.

2. Use induction to prove that a nonempty finite set has a maximum and a minimum.

3.S ⇓¹⁰ Verify by induction that
$$\sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \sum_{k=n+1}^{2n} \frac{1}{k} \quad \text{for all } n \ge 1.$$

⁹ This exercise will be used in 2.2.3.
¹⁰ This exercise will be used in 6.4.8.

4. Establish the following formulas by mathematical induction:
(a) $\displaystyle\sum_{k=1}^{n} k = n(n+1)/2$.
(b) $\displaystyle\sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6$.
(c) $\displaystyle\sum_{k=1}^{n} k^3 = [n(n+1)/2]^2$.
(d) $\displaystyle\sum_{k=1}^{n} (2k-1)^2 = n(4n^2 - 1)/3$.
(e) $\displaystyle\sum_{k=1}^{n} (2k-1)^3 = n^2(2n^2 - 1)$.
(f) $\displaystyle\sum_{k=1}^{n} \frac{1}{\sqrt{k-1} + \sqrt{k}} = \sqrt{n}$.
(g) $\displaystyle\sum_{k=1}^{n} (4k^3 - 6k^2 + 4k - 1) = n^4$.
(h) $\displaystyle\sum_{k=1}^{n} \frac{2k + \sqrt{k(k-1)} - 1}{\sqrt{k} + \sqrt{k-1}} = n\sqrt{n}$.

5.S Use the methods of 1.5.4 to derive and verify a closed formula for $\sum_{k=1}^{n} (5k-4)^2$.

6. Use known formulas to calculate

(a) $1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 + \cdots + 999 \cdot 1000$.

(b)S $1 \cdot 3 + 3 \cdot 5 + 5 \cdot 7 + \cdots + 999 \cdot 1001$.

(c) $1 \cdot 3 + 5 \cdot 7 + 9 \cdot 11 + \cdots + 1001 \cdot 1003$.

7.S Use the principle of mathematical induction to prove the following variant: Let $n_0 \in \mathbb{Z}$ and let $P(n)$ be a statement depending on integers $n \ge n_0$ such that

(a) $P(n_0)$ is true,

(b) if $n \ge n_0$ and $P(n)$ is true, then $P(n+1)$ is true.

Then $P(n)$ is true for every $n \ge n_0$.

8. Use the variant of mathematical induction in Exercise 7 to verify the following inequalities. (For (e) use $(1 + 1/n)^n > 2$, an easy consequence of the binomial theorem.)

(a)S $2n + 1 < 2^n$, $n \ge 3$.

(b) $n^2 < 2^n$, $n \ge 5$.

(c) $2^n < n!$, $n \ge 4$.

(d) $3^n < n!$, $n \ge 7$.

(e)S $2^n n! < n^n$, $n \ge 6$.

(f) $8^n n! < (2n)!$, $n \ge 6$.

9.S Use the variant of mathematical induction in Exercise 7 to prove that $n < \ln(n!)$, $n \ge 6$.

10. Prove Bernoulli's inequality: $(1 + x)^n \ge 1 + nx$, $n \in \mathbb{Z}^+$, $x \ge -1$.


11. Use the principle of mathematical induction to prove the following variant: Let $n_0 \in \mathbb{Z}$ and let $P(n)$ be a statement depending on integers $n \ge n_0$ such that

(a) $P(n_0)$ is true,

(b) $P(n+1)$ is true whenever $P(j)$ is true for all $n_0 \le j \le n$.

Then $P(n)$ is true for every $n \ge n_0$.

12. (Prime Factorization). Use the variant of induction in Exercise 11 to prove that every integer $n \ge 2$ may be written as a product of powers of prime numbers (for example, $72 = 2^3 \cdot 3^2$).

13.S The Fibonacci numbers $f_n$ are defined recursively by $f_0 = f_1 = 1$ and $f_{n+1} = f_n + f_{n-1}$, $n \ge 1$. Use the variant of induction in Exercise 11 to prove that

$$f_n = \frac{1}{\sqrt{5}}\bigl(a^{n+1} - b^{n+1}\bigr), \qquad a := \frac{1+\sqrt{5}}{2}, \quad b := \frac{1-\sqrt{5}}{2},$$

where $a$, $b$ are the zeros of $x^2 - x - 1$.

14. Let $a_0$ and $a_1$ be arbitrary and define $a_{n+1} = \tfrac{1}{2}(a_n + a_{n-1})$, $n \ge 1$. Use the variant of induction in Exercise 11 to prove that for all $n \ge 0$,

$$a_n = \frac{(-1)^n}{3 \cdot 2^{n-1}}(a_0 - a_1) + \frac{1}{3}(a_0 + 2a_1).$$

15.S (Division algorithm). Prove that for each pair of integers $m$ and $n$ with $n > 0$ there exist unique integers $q$ and $r$ such that $m = qn + r$ and $0 \le r \le n - 1$. (The integer $q$ is called the quotient and $r$ the remainder on division of $m$ by $n$.)

16. Use the variant of induction in Exercise 11 to prove that each $n \in \mathbb{N}$ may be uniquely expressed in the form $\sum_{k=0}^{p} d_k 10^k$ for some $p \in \mathbb{N}$ and $d_k \in \{0, 1, \dots, 9\}$. The representation $n = d_p d_{p-1} \dots d_0$ is called the decimal positional notation for $n$.

1.6 Euclidean Space

The real number system may be used to construct other important mathematical systems, such as $n$-dimensional Euclidean space and the complex number system. In this section we construct the former. The reader may delay reading this section, as the material will not be needed until Chapter 8.

For $n \in \mathbb{N}$, let $\mathbb{R}^n$ denote the set of all $n$-tuples $x := (x_1, x_2, \dots, x_n)$, where $x_j \in \mathbb{R}$. Each such $n$-tuple is called a point or vector, depending on context. The distinction between points and vectors is important in physics and geometry, as it allows one to refer to a vector at a point, a notion useful in describing, say, forces or tangent vectors.

The set $\mathbb{R}^n$ has an algebraic structure which is defined as follows: Let $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$, and $t \in \mathbb{R}$. The operations of addition $x + y$ and scalar multiplication $tx$ in $\mathbb{R}^n$ are then defined by

$$x + y = (x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n)$$

and

$$tx = t(x_1, \dots, x_n) = (tx_1, \dots, tx_n).$$

We also define

$$-x := (-x_1, \dots, -x_n) \quad\text{and}\quad 0 := (0, \dots, 0).$$

The following theorem asserts that $\mathbb{R}^n$ is a vector space under these operations (see Appendix B). The straightforward proof is left to the reader.

1.6.1 Theorem. Addition and scalar multiplication on $\mathbb{R}^n$ have the following properties:

• associativity of addition: $(x + y) + z = x + (y + z)$;
• commutativity of addition: $x + y = y + x$;
• existence of an additive identity: $x + 0 = x$;
• existence of additive inverses: $x + (-x) = 0$;
• associativity of scalar multiplication: $(st)x = s(tx)$;
• distributivity of a scalar over vector addition: $s(x + y) = sx + sy$;
• distributivity of a vector over scalar addition: $(s + t)x = sx + tx$;
• existence of a scalar multiplicative identity: $1x = x$.


1.6.2 Definition. Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$. The Euclidean inner product $x \cdot y$ of $x$ and $y$ and the Euclidean norm $\|x\|_2$ of $x$ are defined by

$$x \cdot y = \sum_{j=1}^{n} x_j y_j \quad\text{and}\quad \|x\|_2 = \Bigl(\sum_{j=1}^{n} x_j^2\Bigr)^{1/2} = \sqrt{x \cdot x}.$$

The set $\mathbb{R}^n$ with its vector space structure and the Euclidean inner product is called $n$-dimensional Euclidean space. ♦

The structure of Euclidean space allows one to define lines, planes, length, perpendicularity, angle between vectors, etc. These ideas will be useful in later chapters.

1.6.3 Theorem. The inner product in $\mathbb{R}^n$ has the following properties:

(a) $x \cdot x = \|x\|_2^2$.

(b) $x \cdot y = y \cdot x$ (commutativity).

(c) $t(x \cdot y) = (tx) \cdot y = x \cdot (ty)$ (associativity).

(d) $x \cdot (y + z) = (x \cdot y) + (x \cdot z)$ (additivity).

(e) $|x \cdot y| \le \|x\|_2 \|y\|_2$ (Cauchy–Schwarz inequality).

Proof. Properties (a) and (b) are immediate, and parts (c) and (d) follow respectively from the calculations

$$t \sum_{j=1}^{n} x_j y_j = \sum_{j=1}^{n} (t x_j) y_j = \sum_{j=1}^{n} x_j (t y_j) \quad\text{and}\quad \sum_{j=1}^{n} x_j (y_j + z_j) = \sum_{j=1}^{n} x_j y_j + \sum_{j=1}^{n} x_j z_j.$$

The inequality in (e) holds trivially if $y = 0$. Suppose $y \ne 0$, so $\|y\|_2 \ne 0$. By properties (a)–(d),

$$0 \le \|x - ty\|_2^2 = (x - ty) \cdot (x - ty) = \|x\|_2^2 - 2t(x \cdot y) + t^2 \|y\|_2^2.$$

Setting $t = (x \cdot y)/\|y\|_2^2$, we obtain

$$0 \le \|x\|_2^2 - 2(x \cdot y)^2/\|y\|_2^2 + (x \cdot y)^2/\|y\|_2^2 = \|x\|_2^2 - (x \cdot y)^2/\|y\|_2^2,$$

which implies that $(x \cdot y)^2 \le \|x\|_2^2 \|y\|_2^2$. Taking square roots yields (e).

1.6.4 Theorem. The Euclidean norm on $\mathbb{R}^n$ has the following properties:

(a) $\|x\|_2 \ge 0$ (nonnegativity).

(b) $\|x\|_2 = 0$ iff $x = 0$ (coincidence).

(c) $\|tx\|_2 = |t| \, \|x\|_2$ (absolute homogeneity).

(d) $\|x + y\|_2 \le \|x\|_2 + \|y\|_2$ (triangle inequality).

Proof. Parts (a) and (b) are clear, and (c) follows from

$$\|tx\|_2^2 = \sum_{j=1}^{n} (t x_j)^2 = t^2 \sum_{j=1}^{n} x_j^2 = t^2 \|x\|_2^2.$$

For (d) we use 1.6.3:

$$\|x + y\|_2^2 = (x + y) \cdot (x + y) = \|x\|_2^2 + \|y\|_2^2 + 2(x \cdot y) \le \|x\|_2^2 + \|y\|_2^2 + 2\|x\|_2 \|y\|_2 = (\|x\|_2 + \|y\|_2)^2.$$

Taking square roots yields (d).
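The inner product and norm of 1.6.2 translate directly into code, which gives a quick way to test the Cauchy–Schwarz and triangle inequalities on sample vectors. The sketch below is illustrative only: the names `dot` and `norm2`, the random test vectors, and the small floating-point tolerance are assumptions, and passing such a test proves nothing in general.

```python
import math
import random


def dot(x, y):
    """Euclidean inner product of two equal-length sequences."""
    return sum(xj * yj for xj, yj in zip(x, y))


def norm2(x):
    """Euclidean norm, computed as sqrt(x . x)."""
    return math.sqrt(dot(x, x))


random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    # Tolerance absorbs floating-point rounding in near-equality cases.
    tol = 1e-9 * (1 + norm2(x) * norm2(y))
    # Cauchy-Schwarz inequality, 1.6.3(e).
    assert abs(dot(x, y)) <= norm2(x) * norm2(y) + tol
    # Triangle inequality, 1.6.4(d).
    s = [xj + yj for xj, yj in zip(x, y)]
    assert norm2(s) <= norm2(x) + norm2(y) + tol
```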

Exercises

1.S Solve the following system of vector equations for $x$ and $y$ in terms of $a$, $b$, $c$, $d$, and $e$, assuming that $(a \cdot b)(d \cdot b) \ne 1$.

$$x + (y \cdot b)a = c, \qquad y + (x \cdot b)d = e.$$

2. Prove the following:

(a) $\|x + y\|_2^2 - \|x - y\|_2^2 = 4(x \cdot y)$ (polarization identity).

(b) $\|x + y\|_2^2 + \|x - y\|_2^2 = 2\bigl(\|x\|_2^2 + \|y\|_2^2\bigr)$ (parallelogram rule).

(c)S $\bigl|\, \|x\|_2 - \|y\|_2 \bigr| \le \|x - y\|_2$.

(d) $\|x_1 + \cdots + x_n\|_2 \le \sum_{j=1}^{n} \|x_j\|_2$ (generalized triangle inequality).

3.S Suppose that $x_i \cdot x_j = 0$ for $i \ne j$. Prove that

$$\|x_1 + \cdots + x_k\|_2^2 = \sum_{j=1}^{k} \|x_j\|_2^2.$$

4. ⇓11 For $x = (x_1, \dots, x_n)$ define

$$\|x\|_1 = \sum_{j=1}^{n} |x_j| \quad\text{and}\quad \|x\|_\infty = \max\{|x_1|, \dots, |x_n|\}.$$

Verify that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ have the properties (a)–(d) of 1.6.4.

5. A nonempty subset $C$ of $\mathbb{R}^n$ is said to be convex if $x, y \in C$ and $t \in [0, 1]$ imply that $tx + (1 - t)y \in C$. Let $r > 0$. Prove that $\{x \in \mathbb{R}^n : \|x\|_2 \le r\}$ is convex. Is the set $\{x \in \mathbb{R}^n : \|x\|_2 = r\}$ convex? What about the sets $\{x \in \mathbb{R}^n : \|x\|_1 \le r\}$ and $\{x \in \mathbb{R}^n : \|x\|_\infty \le r\}$?

11 This exercise will be used in Section 8.1.

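6. Find positive constants $a$, $b$, $c$ such that for all $x \in \mathbb{R}^n$,

$$\|x\|_2 \le a\|x\|_1, \qquad \|x\|_1 \le b\|x\|_\infty, \qquad\text{and}\qquad \|x\|_\infty \le c\|x\|_2.$$

7.S Prove that $\|x\|_2 = \|y\|_2 = \|(x + y)/2\|_2 = 1 \Rightarrow x = y$. Is the same true for $\|\cdot\|_\infty$ or $\|\cdot\|_1$?

8. Show that in $\mathbb{R}^3$, $a \cdot b = \|a\| \, \|b\| \cos\theta$, where $\theta$ is the (smaller) angle between $a$ and $b$.

9. The cross product of vectors $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ in $\mathbb{R}^3$ is defined by

$$a \times b = \left\langle \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix},\; -\begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix},\; \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \right\rangle = \langle a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1 \rangle.$$

Let $\theta$ be the (smaller) angle between $a$ and $b$. Verify the following:

(a) $(a \times b) \cdot a = (a \times b) \cdot b = 0$.

(b) $b \times a = -a \times b$.

(c) $a \times (tx + sy) = t(a \times x) + s(a \times y)$.

(d) $(a \times b) \cdot c = a \cdot (b \times c)$.

(e) $a \times (b \times c) = (a \cdot c)b - (a \cdot b)c$.

(f) $\|a \times b\| = \|a\| \, \|b\| \sin\theta$.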

Chapter 2 Numerical Sequences

2.1 Limits of Sequences

Simply stated, a sequence in a set $E$ is a function from $\mathbb{N}$ to $E$. It is more instructive, however, to think of a sequence as an infinite ordered list of members of $E$. The list may be written out, for example, as $a_1, a_2, \dots, a_n, \dots$ or abbreviated by $\{a_n\}_{n=1}^{\infty}$ or simply by $\{a_n\}$. A sequence usually starts with the index 1, although this is not necessary, 0 being a common alternative.

The set $E$ in the definition of sequence is arbitrary. However, for Part I of the book, we consider only numerical sequences, that is, sequences contained in $\mathbb{R}$. Sequences may be defined by a closed formula, such as $a_n = (-1)^n$, or recursively, such as the Fibonacci sequence, defined by $a_0 = a_1 = 1$ and $a_{n+1} = a_n + a_{n-1}$, $n \ge 1$ (see Exercise 1.5.13).

The following notion will occasionally be useful. A property $P$ of a sequence $\{a_n\}$ is said to hold eventually if there exists an index $N$ such that $a_n$ has property $P$ for all $n \ge N$. For example, by the Archimedean principle, the sequence $\{1/n\}$ is eventually less than 0.001. Or, consider the sequence defined by $a_n = n^2 + 100(-1)^n$; the reader may verify that eventually $a_n < a_{n+1}$.

Convergence of a sequence to a number $a$ expresses the idea that eventually the terms of the sequence will be as close to $a$ as desired. The following definition makes this precise.

2.1.1 Definition. A sequence $\{a_n\}$ in $\mathbb{R}$ is said to converge to a real number $a$, written

$$a_n \to a \quad\text{or}\quad \lim_{n} a_n = \lim_{n \to +\infty} a_n = a,$$

if for each $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that

$$|a_n - a| < \varepsilon \quad (\text{equivalently, } a - \varepsilon < a_n < a + \varepsilon) \quad\text{for all } n \ge N.$$

If no such real number $a$ exists, then the sequence is said to diverge. ♦

[Figure 2.1: Convergence of a sequence to $a$.]

It follows immediately from the definition that $a_n \to a$ iff the terms of the sequence eventually lie in any open interval containing $a$. The definition also implies that $a_n \to a$ iff $|a_n - a| \to 0$.

Limits, if they exist, are unique. Indeed, if $a_n \to a$ and $a_n \to b$, then by the triangle inequality $|a - b| \le |a - a_n| + |b - a_n| \to 0$, hence $a = b$.

Examples. (a) The sequence $\{(-1)^n\}$ oscillates between $-1$ and $1$ and so cannot converge. For a rigorous proof, suppose $(-1)^n \to a$ for some $a \in \mathbb{R}$. Choose $N$ such that $a - 1 < (-1)^n < a + 1$ for all $n \ge N$. Thus, if $n \ge N$ is even, then $1 < a + 1$, and if $n \ge N$ is odd, then $a - 1 < -1$. Adding these inequalities produces the absurdity $a < a$.

(b) To show that

$$\lim_{n} \frac{(-1)^n}{n} = 0,$$

let $\varepsilon > 0$ and choose an integer $N > 1/\varepsilon$ (Archimedean principle). Then $|(-1)^n/n - 0| = 1/n < \varepsilon$ for all $n \ge N$.

(c) To verify that

$$\lim_{n} \frac{2n + 1}{3n + 5} = \frac{2}{3},$$

note that

$$\left| \frac{2n + 1}{3n + 5} - \frac{2}{3} \right| = \frac{7}{3(3n + 5)} < \frac{7}{n},$$

so any index $N > 7/\varepsilon$ satisfies the condition in 2.1.1.

♦
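The recipe in example (c), namely that any index $N > 7/\varepsilon$ works, can be tried out numerically. The sketch below is an illustration under stated assumptions: the names `a` and `index_for` are hypothetical, and only a finite window of indices past $N$ is inspected, so the check supports the estimate but does not replace it.

```python
import math


def a(n):
    """General term of the sequence in example (c)."""
    return (2 * n + 1) / (3 * n + 5)


def index_for(eps):
    """Return an integer N with N > 7/eps, as suggested by the estimate."""
    return math.floor(7 / eps) + 1


for eps in (0.5, 0.1, 0.01, 0.001):
    N = index_for(eps)
    # Inspect a finite window of indices n >= N.
    assert all(abs(a(n) - 2 / 3) < eps for n in range(N, N + 10_000))
```

The bound $7/n$ is generous: since the actual error is $7/(3(3n+5))$, much smaller indices already work, but the definition only asks that some $N$ exist.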

2.1.2 Definition. A sequence $\{a_n\}$ is said to be bounded (above, below) if the set of its terms is bounded (above, below). ♦

2.1.3 Proposition. A convergent sequence in $\mathbb{R}$ is bounded.

Proof. Assume that $a_n \to a \in \mathbb{R}$. Choose $N$ such that $|a_n - a| < 1$ for all $n > N$. Since $|a_n| - |a| \le |a_n - a|$, we see that $|a_n| \le |a_n - a| + |a| < 1 + |a|$ for all $n > N$. Thus $|a_n| \le \max\{1 + |a|, |a_1|, \dots, |a_N|\}$ for all $n \in \mathbb{N}$.


2.1.4 Theorem. Let $\{a_n\}$ and $\{b_n\}$ be sequences with $a_n \to a$ and $b_n \to b$. If $a_n \le b_n$ for infinitely many $n$, then $a \le b$.

Proof. Suppose $b < a$. Then $b < (a + b)/2 < a$, hence we may choose indices $N_1$ and $N_2$ such that $b_n < (a + b)/2$ for all $n \ge N_1$ and $a_n > (a + b)/2$ for all $n \ge N_2$. But then $b_n < a_n$ for all $n \ge \max\{N_1, N_2\}$, contradicting the hypothesis.

Note that, as a consequence of the preceding theorem, a convergent sequence in a closed interval $I$ must have its limit in $I$.

2.1.5 Theorem (Squeeze principle). Let $\{a_n\}$, $\{b_n\}$, and $\{c_n\}$ be sequences in $\mathbb{R}$ such that $a_n \le b_n \le c_n$ for all $n$. If $\lim_n a_n = \lim_n c_n = x \in \mathbb{R}$, then $\lim_n b_n = x$.

Proof. Given $\varepsilon > 0$, choose $N_1, N_2 \in \mathbb{N}$ such that $|a_n - x| < \varepsilon$ for all $n \ge N_1$ and $|c_n - x| < \varepsilon$ for all $n \ge N_2$. For $n \ge \max\{N_1, N_2\}$, the inequalities

$$-\varepsilon < a_n - x \le b_n - x \le c_n - x < \varepsilon$$

imply that $|b_n - x| < \varepsilon$.

[Figure 2.2: The squeeze principle.]

2.1.6 Example. We show that $\lim_n n r^n = 0$ for any $r \in (0, 1)$. Let $h = r^{-1} - 1$. Then $h > 0$ and, by the binomial theorem,

$$r^{-n} = (1 + h)^n = 1 + nh + \tfrac{1}{2} n(n-1)h^2 + \cdots > \tfrac{1}{2} n(n-1)h^2,$$

hence

$$0 < n r^n < \frac{2}{(n-1)h^2}, \quad n > 1.$$

Since the term on the right tends to 0 as $n \to +\infty$, the squeeze principle shows that $n r^n \to 0$. (See Exercise 16 for an extension of this result.) ♦

For another illustration of the squeeze principle we prove

2.1.7 Proposition. For any real number $x$ there exist sequences $\{a_n\}$ in $\mathbb{Q}$ and $\{b_n\}$ in $\mathbb{I}$ such that $\lim_n a_n = \lim_n b_n = x$.

Proof. By 1.4.8 and Exercise 1.4.9, for each $n \in \mathbb{N}$ we may choose points $a_n \in (x - 1/n, x + 1/n) \cap \mathbb{Q}$ and $b_n \in (x - 1/n, x + 1/n) \cap \mathbb{I}$. The squeeze principle then implies that $a_n, b_n \to x$.

2.1.8 Definition. (Infinite limits) A sequence $\{a_n\}$ in $\mathbb{R}$ is said to diverge to $+\infty$, written

$$a_n \to +\infty \quad\text{or}\quad \lim_{n} a_n = \lim_{n \to +\infty} a_n = +\infty,$$

if for each real number $M$ there exists an index $N$ such that $a_n \ge M$ for all $n \ge N$. Divergence to $-\infty$ is defined analogously. ♦

2.1.9 Example. If $r > 1$, then $r^n/n \to +\infty$. This follows from 2.1.6: Given $M > 0$ there exists $N \in \mathbb{N}$ such that $n/r^n < 1/M$, hence $r^n/n > M$, for all $n \ge N$. ♦

2.1.10 Example. If $r > 0$, then $a_n := r^n n! \to +\infty$. Indeed, since

$$\frac{a_n}{a_{n-1}} = rn \to +\infty,$$

there exists $N \in \mathbb{N}$ such that $a_n > 2a_{n-1}$ for all $n > N$. Iterating, we see that $a_n > 2^k a_{n-k} \ge k a_{n-k}$, so taking $k = n - N$ we have $a_n > (n - N)a_N$ for all $n > N$. Since $\lim_n (n - N)a_N = +\infty$ (Archimedean principle), the assertion follows. ♦

For the following theorem, recall the conventions regarding addition and multiplication in the extended real number system $\overline{\mathbb{R}}$ (1.4.12).

2.1.11 Theorem. Let $\{a_n\}$ and $\{b_n\}$ be sequences in $\mathbb{R}$. The following limit properties hold in $\overline{\mathbb{R}}$ in the sense that if the expression on the right side of the equation exists in $\overline{\mathbb{R}}$, then the limit on the left side exists and equality holds.

(a) $\lim_n (s a_n + t b_n) = s \lim_n a_n + t \lim_n b_n$, $s, t \in \mathbb{R}$.

(b) $\lim_n a_n b_n = \lim_n a_n \lim_n b_n$.

(c) $\lim_n a_n / b_n = \lim_n a_n / \lim_n b_n$, if $\lim_n b_n \ne 0$.

(d) $\lim_n |a_n| = |\lim_n a_n|$.

(e) $\lim_n \sqrt{a_n} = \sqrt{\lim_n a_n}$ if $a_n \ge 0$ for all $n$.

Proof. Let $a_n \to a$, $b_n \to b$. We prove the theorem first for the case $a, b \in \mathbb{R}$. Let $\varepsilon > 0$. For (a) choose $N_1$ and $N_2$ so that

$$|a_n - a| < \frac{\varepsilon}{2(|s| + 1)} \text{ for all } n \ge N_1 \quad\text{and}\quad |b_n - b| < \frac{\varepsilon}{2(|t| + 1)} \text{ for all } n \ge N_2.$$

If $n \ge N := \max\{N_1, N_2\}$, then both of these inequalities hold, hence, by the triangle inequality,

$$|s a_n + t b_n - (sa + tb)| \le |s| \, |a_n - a| + |t| \, |b_n - b| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

To prove (b), choose $M \ge |a|$ so that $|b_n| \le M$ for all $n$ (2.1.3) and choose $N$ so that $|a_n - a| < \varepsilon/(2M)$ and $|b_n - b| < \varepsilon/(2M)$ for all $n \ge N$. For such $n$,

$$|a_n b_n - ab| = |(a_n - a)b_n + a(b_n - b)| \le |a_n - a| \, |b_n| + |a| \, |b_n - b| \le M|a_n - a| + M|b_n - b| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

For (c) it suffices to show that $1/b_n \to 1/b$. Choose $N$ such that

$$|b_n - b| < \min\{|b|/2, \; \varepsilon b^2/2\} \quad\text{for all } n \ge N.$$

For such $n$, $|b_n| \ge |b| - |b_n - b| > |b|/2$, hence

$$\left| \frac{1}{b_n} - \frac{1}{b} \right| = \frac{|b_n - b|}{|b \, b_n|} \le \frac{2|b_n - b|}{b^2} < \varepsilon.$$

Part (d) follows from the inequality $\bigl|\, |a_n| - |a| \,\bigr| \le |a_n - a|$.

For (e), observe first that $a \ge 0$ (2.1.4). If $a = 0$, choose $N$ such that $a_n < \varepsilon^2$ for all $n \ge N$. If $a > 0$, choose $N$ such that $|a_n - a| < \varepsilon\sqrt{a}$ for all $n \ge N$. For such $n$,

$$|\sqrt{a_n} - \sqrt{a}| = \frac{|a_n - a|}{\sqrt{a_n} + \sqrt{a}} \le \frac{|a_n - a|}{\sqrt{a}} < \varepsilon.$$

To illustrate the remaining cases $a = \pm\infty$ or $b = \pm\infty$, we prove part (b) for the case $-\infty < a < 0$ and $b_n \to +\infty$. To show that $a_n b_n \to -\infty$, let $M < 0$ and choose $N$ so that

$$a_n < a/2 \quad\text{and}\quad b_n > 2M/a \quad\text{for all } n \ge N.$$

For such $n$,

$$-a_n b_n > (-a/2)(2M/a) = -M,$$

hence $a_n b_n < M$.

2.1.12 Example. To find

$$\lim_{n} \frac{\sqrt{4n^6 - 3n^2 + 5}}{2n^3 + 7n + 3},$$

divide the numerator and denominator of the general term $a_n$ by $n^3$, the highest power of $n$ occurring in the denominator, to obtain

$$a_n = \frac{\sqrt{4 - 3/n^4 + 5/n^6}}{2 + 7/n^2 + 3/n^3}.$$

The quotients in the numerator and denominator tend to 0, hence, by 2.1.11, $a_n \to \sqrt{4}/2 = 1$. ♦
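A few numerical values make the limit in 2.1.12 plausible before (or after) it is computed with the limit theorems. This is only a sketch: the function name `a` and the sample indices are chosen for illustration, and floating-point evaluation at finitely many points is evidence, not proof.

```python
import math


def a(n):
    """General term of Example 2.1.12."""
    return math.sqrt(4 * n**6 - 3 * n**2 + 5) / (2 * n**3 + 7 * n + 3)


# Print the term at a few increasingly large indices.
for n in (10, 100, 1000, 10_000):
    print(n, a(n))
```

The printed values approach 1 from below, in agreement with the value $\sqrt{4}/2$ obtained above.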


Exercises

1. Let $a, b \in \mathbb{R}$. Find a closed formula for the $n$th term $a_n$ of the sequences

(a)S $a, b, a, b, \dots$

(b) $a, a, b, b, a, a, \dots$

(c) $a, a, a, b, b, b, a, a, a, \dots$

(d) $a, b, a, c, a, b, a, c, \dots$

(e) $1, 2, 3, 4, 1, 2, 3, 4, \dots$

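2. Find a recursive formula for the sequence $a, b, a, b, \dots$

3. Use the $\varepsilon$, $N$ definition of limit to prove that

(a) $\lim_n \dfrac{4n - 1}{2n + 7} = 2$.

(b)S $\lim_n \dfrac{2n^2 - n}{n^2 + 3} = 2$.

(c) $\lim_n \dfrac{5\sqrt{n} + 7}{3\sqrt{n} + 2} = \dfrac{5}{3}$.

(d) $\lim_n \dfrac{n - 1}{\sqrt{n + 1}} = +\infty$.

(e)S $\lim_n \left(2 + \dfrac{1}{n}\right)^3 = 8$.

(f) $\lim_n \sqrt{\dfrac{n + 2}{n + 1}} = 1$.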

4. Prove rigorously that the sequence $\{(-1)^n n/(n + 1)\}$ has no limit.

5.S Find $\lim_n \sin(n! \, r\pi)$ for $r \in \mathbb{Q}$.

6. Find $\lim_n \dfrac{1}{n}\left(n + \dfrac{1}{n}\right)^p$ for all $p \in \mathbb{R}$.

7.S Let $\{a_n\}$ be contained in a finite set $A$. Prove that if $a_n \to a$, then there exists an index $N$ such that $a_n = a$ for all $n \ge N$. In particular, $a \in A$.

8. Find $\lim_n b_n$ if

(a)S $a_n \to a$ and $3a_n + 2b_n \to c$.

(b) $a_n \to 2$ and $3a_n b_n + 5a_n^2 - 2b_n \to 1$.

9. Let $k \in \mathbb{N}$ and $a, b > 0$. Evaluate $\lim_n a_n$ if $a_n =$

(a) $\dfrac{2n + 1}{(n^k + 3n + 1)^{1/k}}$.

(b) $\left(\dfrac{an - 1}{bn + 1}\right)^{1/2}$.

(c) $\sqrt{n^2 + kn} - n$.

(d)S $\sqrt{an + b\sqrt{n}} - \sqrt{an}$.

(e)S $\dfrac{(n + k)!}{n! \, (n + k)^k}$.

(f) $n^k\bigl(\sqrt{a^2 + n^{-k}} - a\bigr)$.

(g)S $n\bigl[(a - 1/n)^k - a^k\bigr]$.

(h) $n\bigl[1 - (1 - a/n)^{1/k}\bigr]$.

(i) $(1 - 1/2)(1 - 1/3) \cdots (1 - 1/n)$.

(j) $\sum_{j=1}^{n} (n^2 + j)^{-1}$.

(k)S $(1 - 1/2^2)(1 - 1/3^2) \cdots (1 - 1/n^2)$.

(l) $\sum_{j=1}^{n} (n^k + j)^{-1/k}$, $k > 1$.

10. Let $\{a_n\}$ be bounded and $b_n \to 0$. Prove that $a_n b_n \to 0$.

11.S Let $a_n \to a \in \mathbb{R}$, $b_n \to b \in \mathbb{R}$, and $r > 0$ such that $|a_n - b_n| \le r$ for all $n$. Prove that $|a - b| \le r$.

12. Prove that if $n a_n \to a \in \mathbb{R}$, then $\sqrt{n}\, a_n \to 0$. Show that the converse is false.

13. Let $a_n \ge 0$ for all $n$ and $a_n \to a$. Prove that $a_n^{1/k} \to a^{1/k}$, $k \in \mathbb{N}$.

14. Let $r > 0$ and $k \in \mathbb{N}$. Prove in each case that $a_n \to 1$:

(a)S $a_n = r^{1/n}$.

(b) $a_n = n^{1/n}$.

(c) $a_n = (r + n^k)^{1/n}$.

(d) $a_n = [\sin(1/n)]^{1/n}$.

15. Prove that $a_n \to a$ iff $a_n^+ \to a^+$ and $a_n^- \to a^-$. (See 1.3.7.)

16. Let $m \in \mathbb{N}$ and $r \in (-1, 1)$. Prove that $\lim_n n^m r^n = 0$.

17.S Let $0 < r < 1$, $a_n > 0$, and $a_{n+1}/a_n < r$ for all $n$. Prove that $a_n \to 0$. Construct a sequence $\{a_n\}$ such that $a_n > 0$ and $a_{n+1}/a_n < 1$ for all $n$ but $a_n \not\to 0$.

18. Suppose that $a_n \to a \in \mathbb{R}$. Prove that

$$\lim_{n} \, (a_1 + \cdots + a_n)/n = a.$$

Is the converse true?

19.S Let $a_n \to a \in \mathbb{R}$ and let $a_n \ge a$ for all $n$. Prove that

$$\lim_{n} \min\{a_1, \dots, a_n\} = a.$$

Does $\min\{a_1, \dots, a_n\} \to a$ imply that $a_n \to a$?

20. Show that if $n^{-1} a_n \to 0$, then $n^{-1} \max\{a_1, \dots, a_n\} \to 0$. Prove that the converse holds if $\{a_n\}$ is bounded below. Give an example to show that the converse is not generally true.

21. Let $0 < x_1 \le \cdots \le x_k$. Prove that

$$\lim_{n} \, (x_1^n + \cdots + x_k^n)^{1/n} = x_k.$$

22.S Let $f(x)$ be any real-valued function on $\mathbb{R}$ such that $f(x) - x$ is bounded for all $x$ (for example, $f(x) = \lfloor x \rfloor$). Use Exercise 1.5.4 to prove that

(a) $(1/n^2) \sum_{j=1}^{n} f(jx) \to x/2$.

(b) $(1/n^3) \sum_{j=1}^{n} f(j^2 x) \to x/3$.

23. Let $a_0, a_1 > 0$ and $a_n = \sqrt{a_{n-1} a_{n-2}}$, $n \ge 2$. Find $\lim_n a_n$.

24. Let $k \in \mathbb{N}$ and let $\{a_n\}$ be a sequence such that $a_{n+k} - a_n \to c \in \mathbb{R}$. Prove that $a_n/n \to c/k$. Suggestion. Consider first the case $k = 1$ to get the general idea.

2.2 Monotone Sequences

2.2.1 Definition. A sequence $\{a_n\}$ in $\mathbb{R}$ is said to be increasing (strictly increasing) if $a_n \le a_{n+1}$ ($a_n < a_{n+1}$) for all $n$. Decreasing and strictly decreasing sequences are defined analogously. A sequence that is either increasing or decreasing is called monotone. If $\{a_n\}$ is increasing (decreasing), we write $a_n \uparrow$ ($a_n \downarrow$). If $a_n \uparrow$ ($a_n \downarrow$) and $a_n \to a \in \overline{\mathbb{R}}$, we write $a_n \uparrow a$ ($a_n \downarrow a$). ♦

2.2.2 Monotone Sequence Theorem. If $\{a_n\}$ is increasing (decreasing), then $a_n \uparrow \sup_k a_k$ ($a_n \downarrow \inf_k a_k$). In particular, every bounded monotone sequence converges in $\mathbb{R}$.

Proof. Assume $\{a_n\}$ is increasing and let $r < \sup_k a_k$. By the approximation property of suprema, $r < a_N \le \sup_k a_k$ for some $N$. Since $\{a_n\}$ is increasing, $r < a_n \le \sup_k a_k$ for all $n \ge N$. Therefore, $a_n \uparrow \sup_k a_k$. The proof for the decreasing case is similar.

2.2.3 Example. Let $0 < a < x_1, y_1 < b := a + 1$ and define $\{x_n\}$ and $\{y_n\}$ recursively by

$$x_{n+1} = a + \sqrt{|x_n - a|} \quad\text{and}\quad y_{n+1} = b - \sqrt{|b - y_n|}.$$

By Exercise 1.5.1, $\{x_n\}$ is strictly increasing, $\{y_n\}$ is strictly decreasing, and $a < x_n, y_n < b$ for all $n$. By 2.2.2, $x_n \uparrow x$ and $y_n \downarrow y$ for some $x, y \in \mathbb{R}$. To find $x$, let $n \to \infty$ in the equation $x_{n+1} = a + \sqrt{x_n - a}$ to obtain $x = a + \sqrt{x - a}$. This has solutions $x = a$ and $x = b$. Since $\{x_n\}$ is increasing, $x = b$. Similarly, $y = a$. ♦

2.2.4 Example. We use the monotone sequence theorem to show that the sequence $\{(1 + 1/n)^n\}$ converges. By the binomial theorem (1.5.5) and the inequality $k! \ge 2^{k-1}$ (easily established by induction),

$$\begin{aligned}
(1 + 1/n)^n &= \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k} \\
&= 2 + \sum_{k=2}^{n} (1 - 1/n)(1 - 2/n) \cdots (1 - (k-1)/n)/k! \\
&\le 2 + \sum_{k=2}^{n} 1/2^{k-1}.
\end{aligned}$$

Since the sum in the last inequality is $\le 1$, $\{(1 + 1/n)^n\}$ is bounded above by 3. Now let $m = n + 1$. Then

$$1 - k/m \ge 1 - k/n \ge 0, \quad k = 1, \dots, n - 1,$$

hence

$$\begin{aligned}
(1 + 1/m)^m &\ge 2 + \sum_{k=2}^{n} (1 - 1/m)(1 - 2/m) \cdots (1 - (k-1)/m)/k! \\
&> 2 + \sum_{k=2}^{n} (1 - 1/n)(1 - 2/n) \cdots (1 - (k-1)/n)/k! \\
&= (1 + 1/n)^n.
\end{aligned}$$

Thus $\{(1 + 1/n)^n\}$ is increasing. By 2.2.2, the sequence has a limit in $\mathbb{R}$, which is denoted by the letter $e$:

$$e := \lim_{n} \, (1 + 1/n)^n = 2.71828182845905\ldots$$

♦
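The two facts established in 2.2.4, monotonicity and the bound 3, can be observed directly on the first thousand terms. The sketch below is illustrative: the name `e_n` and the cutoff 1000 are assumptions, and `math.e` is used only as a reference value for comparison.

```python
import math


def e_n(n):
    """The n-th term (1 + 1/n)^n of the sequence in Example 2.2.4."""
    return (1 + 1 / n) ** n


terms = [e_n(n) for n in range(1, 1001)]

# The computed terms are strictly increasing and bounded above by 3.
assert all(s < t for s, t in zip(terms, terms[1:]))
assert all(t < 3 for t in terms)

# Compare the last computed term with the library value of e.
print(terms[-1], math.e)
```

Convergence is slow: the gap between $e$ and the $n$th term shrinks roughly like a constant multiple of $1/n$, which is why this sequence is used to define $e$ rather than to compute it.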

Exercises

1.S Let $0 < a < 1 < b$. Prove that $a^{1/n} \uparrow 1$ and $b^{1/n} \downarrow 1$.

2. Let $a_n = a^n/n^k$ and $b_n = b^n/n^k$, where $0 < a < 1 < b$ and $k \in \mathbb{Z}^+$. Prove that $\{a_n\}$ is strictly decreasing and that $\{b_n\}$ is eventually strictly increasing.

3.S Let

$$a_n = \frac{na}{1 + n^2 b}, \quad a, b > 0.$$

Prove: $a_n \downarrow 0$ (eventually) and $n a_n \uparrow a/b$.

4. Let $x_n > 0$ and $x_n \uparrow x$. Prove that $(x_1^n + \cdots + x_n^n)^{1/n} \to x$.

5. Prove that for any nonempty set $A$ of real numbers there exist sequences $\{a_n\}$ and $\{b_n\}$ in $A$ such that $a_n \uparrow \sup A$ and $b_n \downarrow \inf A$.

6. Let $\{a_n\}$ be monotone and set $b_n := (a_1 + a_2 + \cdots + a_n)/n$. Prove that $\{b_n\}$ is monotone. (Compare with Exercise 2.1.18.)

7.S Define $a_1 = 1$ and $a_n = 1 + (1 + a_{n-1})^{-1}$. Find $\lim_n a_n$ by first showing that $1 \le a_n \le 2$, $\{a_{2n}\}$ is decreasing, and $\{a_{2n+1}\}$ is increasing.

8. Let $r > 0$, $a_0 = \sqrt{r}$, and $a_n = \sqrt{r + a_{n-1}}$, $n \ge 1$. Find $\lim_n a_n$.

9.S Let $r > 0$, $a_1 > 0$ and define $a_n = \tfrac{1}{2}(a_{n-1} + r/a_{n-1})$, $n > 1$. Show that $a_n \ge a_{n+1} \ge \sqrt{r}$ and find $\lim_n a_n$.

10. Prove that $e = \lim_n (1 - 1/n)^{-n}$.

11. Let $0 < x_0 < y_0$ and define

$$x_{n+1} = \sqrt{x_n y_n} \quad\text{and}\quad y_{n+1} = (x_n + y_n)/2.$$

Prove that $0 < x_n < x_{n+1} < y_{n+1} < y_n$ and that $\lim_n x_n = \lim_n y_n$.

2.3 Subsequences and Cauchy Sequences

2.3.1 Definition. A subsequence of a sequence $\{a_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ is a sequence $\{a_{n_k}\}_{k=1}^{\infty}$, where the indices satisfy $1 \le n_1 < n_2 < \cdots$. The limit in $\overline{\mathbb{R}}$ of a subsequence is called a cluster point of $\{a_n\}$. ♦

For example, in the following sequence the underlined terms define the beginning of a subsequence $\{a_{n_k}\}$ with $n_1 = 3$, $n_2 = 4$, $n_3 = 6$, etc.

$$a_1, \; a_2, \; \underline{a_3}, \; \underline{a_4}, \; a_5, \; \underline{a_6}, \; a_7, \; \dots$$

Note that the indices $n_k$ of a subsequence satisfy $n_k \ge k$.

Examples. (a) The sequence

$$\bigl\{1 - (-1)^{\lfloor (n-1)/2 \rfloor}\bigr\} = \{0, 0, 2, 2, 0, 0, 2, 2, \dots\}$$

is a subsequence of

$$\{1 - (-1)^n\} = \{2, 0, 2, 0, \dots\},$$

which has cluster points 0 and 2.

(b) The sequence $\{n \sin(n\pi/2)\}$ has cluster points 0 and $\pm\infty$.

(c) Let $\{r_1, r_2, \dots\}$ be an arbitrary enumeration of the rational numbers (see Appendix A). Then every real number $x$ is a cluster point of $\{r_n\}$. Indeed, since every interval of the form $(x - 1/n, x + 1/n)$ contains infinitely many terms of the sequence, we may choose $n_1 \ge 1$ such that $|x - r_{n_1}| < 1$, $n_2 > n_1$ such that $|x - r_{n_2}| < 1/2$, etc. In this way we may construct a subsequence inductively such that $|x - r_{n_k}| < 1/k$ for all $k$, hence $r_{n_k} \to x$. ♦

Notation. It is occasionally convenient to use the following alternate method to describe a subsequence: If we set $b_k = a_{n_k}$ and then change the index in $\{b_k\}_{k=1}^{\infty}$ to $n$, then $\{b_n\}$ may be used to denote the subsequence $\{a_{n_k}\}$. This provides a convenient way to denote a subsequence of a subsequence. In this regard, note that if $\{c_n\}$ is a subsequence of $\{b_n\}$ and $\{b_n\}$ is a subsequence of $\{a_n\}$, then $\{c_n\}$ is a subsequence of $\{a_n\}$.

The following proposition shows that a convergent sequence has a single cluster point.

2.3.2 Proposition. If $\{a_n\}$ is a sequence in $\mathbb{R}$ and $a_n \to a \in \overline{\mathbb{R}}$, then $a_{n_k} \to a$ for any subsequence $\{a_{n_k}\}$ of $\{a_n\}$.

Proof. We prove the proposition for the case $a \in \mathbb{R}$ and leave the other cases for the reader. Given $\varepsilon > 0$, choose $N$ such that $|a_n - a| < \varepsilon$ for all $n \ge N$. Since $n_k \ge k$, $|a_{n_k} - a| < \varepsilon$ for all $k \ge N$. Therefore, $a_{n_k} \to a$.

2.3.3 Example. We calculate $\lim_n (1 + 1/n^2)^{3n^2 + 5}$ by writing

$$\left(1 + \frac{1}{n^2}\right)^{3n^2 + 5} = \left[\left(1 + \frac{1}{n^2}\right)^{n^2}\right]^3 \left(1 + \frac{1}{n^2}\right)^5.$$

The term in the square brackets is a subsequence of $(1 + 1/n)^n$ and hence converges to $e$ (see 2.2.4). It follows that $(1 + 1/n^2)^{3n^2 + 5} \to e^3$. ♦

The following result will have important consequences in later chapters.

2.3.4 Bolzano–Weierstrass Theorem. Every bounded sequence in $\mathbb{R}$ has a convergent subsequence.

Proof. The proof is based on the observation that if a union of two sets contains infinitely many terms of a sequence, then at least one of the sets must contain infinitely many of the terms of the sequence.

Let $\{a_n\}$ be a bounded sequence, say $c_0 \le a_n \le d_0$ for all $n$. Bisect the interval $I_0 := [c_0, d_0]$. By the preceding observation, one of the resulting subintervals, call it $I_1$, contains infinitely many terms of the sequence. Choose one such term, say $a_{n_1}$. Now bisect $I_1$. Again, one of the resulting subintervals, call it $I_2$, contains infinitely many terms of the sequence. Choose one such term $a_{n_2}$ with $n_2 > n_1$. By repeating this procedure, we produce a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ of $\{a_n\}$ and a sequence of intervals $I_k = [c_k, d_k]$, $k = 0, 1, \dots$, such that

$$c_0 \le c_{k-1} \le c_k \le a_{n_k} \le d_k \le d_{k-1} \le d_0 \quad\text{and}\quad d_{k+1} - c_{k+1} = \tfrac{1}{2}(d_k - c_k).$$

Since $\{c_k\}$ and $\{d_k\}$ are monotone and bounded, $c_k \to c$ and $d_k \to d$ for some $c, d \in \mathbb{R}$. Since $d_k - c_k = 2^{-k}(d_0 - c_0) \to 0$, $c = d$. By the squeeze principle, $a_{n_k} \to c$.

[Figure 2.3: Interval halving process.]

The Bolzano–Weierstrass theorem may be extended as follows:

2.3.5 Theorem. Every sequence in $\mathbb{R}$ has a subsequence that converges in $\overline{\mathbb{R}}$.

Proof. If $\{a_n\}$ is bounded, then the Bolzano–Weierstrass theorem applies. Suppose that $\{a_n\}$ is unbounded above. Then for each $k \in \mathbb{N}$ there exist infinitely many indices $n$ such that $a_n > k$. We may then construct a subsequence $\{a_{n_k}\}$ with $a_{n_k} > k$ for all $k$, so $a_{n_k} \to +\infty$.

2.3.6 Corollary. A sequence $\{a_n\}$ in $\mathbb{R}$ has a limit in $\overline{\mathbb{R}}$ iff it has exactly one cluster point in $\overline{\mathbb{R}}$.

Proof. The necessity is 2.3.2. For the sufficiency, suppose that $\{a_n\}$ has exactly one cluster point $a \in \overline{\mathbb{R}}$. Consider first the case $a = +\infty$. We claim that $a_n \to +\infty$. If not, then there exists $M \in \mathbb{R}$ such that $a_n \le M$ for infinitely many $n$, hence there exists a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ with $a_{n_k} \le M$ for all $k$. By 2.3.5, $\{a_{n_k}\}$ has a cluster point $b \in \overline{\mathbb{R}}$. But $b \le M < a$, so $\{a_n\}$ has more than one cluster point, a contradiction. Therefore, $a_n \to +\infty$, as claimed. The case $a = -\infty$ is treated similarly.

Now suppose $a \in \mathbb{R}$. Then $a_n \to a$. If not, then there exists $\varepsilon > 0$ such that $|a_n - a| \ge \varepsilon$ for infinitely many $n$, so there is a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ with $|a_{n_k} - a| \ge \varepsilon$ for all $k$. Note that $\{a_n\}$ is bounded, since otherwise $+\infty$ or $-\infty$ would be a second cluster point (see the proof of 2.3.5). By 2.3.4, $\{a_{n_k}\}$ has a cluster point $b$ in $\mathbb{R}$. But then $|b - a| \ge \varepsilon$, so again $\{a_n\}$ has more than one cluster point.

2.3.7 Definition. A sequence $\{a_n\}$ is said to be Cauchy if for each $\varepsilon > 0$ there exists an index $N$ such that $|a_n - a_m| < \varepsilon$ for all $m, n \ge N$. We express this condition by writing

$$\lim_{m,n} \, (a_n - a_m) = 0. \quad ♦$$

The definition asserts that the terms of a Cauchy sequence get closer to one another. Thus the following result is not surprising.

2.3.8 Proposition. Every convergent sequence is Cauchy.

Proof. Let $a_n \to a$. Given $\varepsilon > 0$, choose $N$ such that $|a_n - a| < \varepsilon/2$ for all $n \ge N$. Then for $n, m \ge N$,

$$|a_n - a_m| = |(a_n - a) + (a - a_m)| \le |a_n - a| + |a_m - a| < \varepsilon.$$

It is of fundamental importance that the converse of 2.3.8 is true. To prove this, we need the following lemma.

2.3.9 Lemma. A Cauchy sequence is bounded.

Proof. Let $\{a_n\}$ be a Cauchy sequence. Choose $N$ such that $|a_n - a_m| < 1$ for all $m, n \ge N$. Then $|a_n| \le |a_n - a_N| + |a_N| < 1 + |a_N|$ for all $n \ge N$, hence $|a_n| \le \max\{1 + |a_N|, |a_1|, |a_2|, \dots, |a_{N-1}|\}$ for all $n$.

2.3.10 Cauchy Criterion. Every Cauchy sequence in $\mathbb{R}$ converges.

Proof. By 2.3.9 and the Bolzano–Weierstrass theorem, a Cauchy sequence $\{a_n\}$ has a convergent subsequence, say $a_{n_k} \to a \in \mathbb{R}$. We claim that $a_n \to a$. Let $\varepsilon > 0$ and choose $N$ such that $|a_n - a_m| < \varepsilon$ for all $m, n \ge N$. In particular, $|a_n - a_{n_k}| < \varepsilon$ for $n, k \ge N$. Fixing $n \ge N$ and letting $k \to \infty$ in the last inequality yields $|a_n - a| \le \varepsilon$, verifying the claim.
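The Cauchy condition can be watched in action on the averaging recursion $a_{n+1} = (a_n + a_{n-1})/2$ of Exercise 1.5.14, whose closed formula shows that the terms tend to $(a_0 + 2a_1)/3$. The sketch below is illustrative only: the function name `averaged`, the starting values 0 and 1, and the cutoff index 40 are assumptions, and the "spread" of a finite tail merely approximates the supremum of $|a_n - a_m|$ over all $m, n \ge N$.

```python
def averaged(a0, a1, count):
    """First `count` terms of the recursion a_{n+1} = (a_n + a_{n-1}) / 2."""
    terms = [a0, a1]
    while len(terms) < count:
        terms.append((terms[-1] + terms[-2]) / 2)
    return terms[:count]


terms = averaged(0.0, 1.0, 60)

# Largest distance between any two computed terms with index >= N.
N = 40
spread = max(terms[N:]) - min(terms[N:])

# Limit predicted by the closed formula of Exercise 1.5.14.
limit = (0.0 + 2 * 1.0) / 3

print(spread, terms[-1], limit)
```

Here consecutive terms differ by $|a_1 - a_0|/2^n$, so the tail spread decays geometrically, which is the quantitative content behind the Cauchy property in this example.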

Exercises

1. Find all cluster points of $\{a_n\}$, where $a_n =$

(a)S $(-1)^n \dfrac{2n + 1}{4n + 3} \sin\dfrac{n\pi}{3}$.

(b) $(-1)^n \dfrac{2n + 1}{n + 5} \cos^2\dfrac{n\pi}{4}$.

(c)S $(-1)^{\lfloor n/3 \rfloor}(1 + 1/n)^2 + (-1)^{\lfloor n/4 \rfloor}(2 + 1/n)^2 + (-1)^{\lfloor n/5 \rfloor}(3 + 1/n)^2$.

(d) $(-1)^n r_n + r_{2n}$, where $r_k$ is the remainder on division of $k$ by 3.

2. Construct a sequence with precisely the cluster points $1, 2, 3, +\infty$.

3. Let $k \in \mathbb{N}$. Use the fact that $\lim_n (1 + 1/n)^n = e$ (2.2.4) to find $\lim_n a_n$ for $a_n =$

(a) $\left(1 + \dfrac{1}{kn}\right)^n$.

(b) $\left(1 + \dfrac{1}{k + n}\right)^n$.

(c) $\left(\dfrac{1}{k} + \dfrac{1}{n}\right)^n$.

(d)S $\left(1 + \dfrac{1}{2n + k}\right)^{kn}$.

(e) $\left(1 + \dfrac{1}{3n^3 + 5}\right)^{7n^3 - 4}$.

4. Let $\{a_n\}$ and $\{b_n\}$ be bounded sequences. Show that there exist convergent subsequences of $\{a_n\}$ and $\{b_n\}$ with the same indices.

5.S Prove that a sequence contained in a finite set has a constant subsequence.

6. Let $-\infty < a_n < r \le +\infty$ with $a_n \to r$. Show that $\{a_n\}$ has a strictly increasing subsequence.

7. Show that every sequence of distinct real numbers has a strictly monotone subsequence.

8.S Let $k \in \mathbb{N}$ and suppose that the series $\sum_{n=1}^{\infty} |a_{n+k} - a_n|$ converges (see Chapter 6). Prove that $\{a_n\}$ has a convergent subsequence.

9. Let $a_0$, $a_1$ be arbitrary and define $a_{n+1} = (a_n + a_{n-1})/2$, $n \ge 1$. Show directly that $\{a_n\}$ is a Cauchy sequence. (Its limit may be found from Exercise 1.5.14.)

10.S Let $0 < p \le q$ and $a_n > 0$ for all $n$. Set $b_n = a_n^q/(1 + a_n^p)$. Show that $a_n \to 0$ iff $b_n \to 0$. Is the assertion true if $0 < q < p$?

11. Let $I$ be an open interval and let $\{a_n\}$ have the property that each open subinterval $J$ of $I$ contains $a_n$ for infinitely many $n$. Prove that every point of $I$ is a cluster point of $\{a_n\}$. Give an example of such a sequence.

12. Suppose that the cluster points of $\{a_n\}$ form a sequence $\{b_n\}$. Show that every cluster point $b$ of $\{b_n\}$ is a cluster point of $\{a_n\}$. Hint. Choose a subsequence $\{b_{n_k}\}$ such that $|b_{n_k} - b| < 1/k$.

2.4 Limits Inferior and Superior

For an arbitrary sequence $\{a_n\}$ in $\mathbb{R}$, define

$$\underline{a}_n = \inf_{k \ge n} a_k \quad\text{and}\quad \overline{a}_n = \sup_{k \ge n} a_k, \quad n = 1, 2, \dots.$$

Then $\{\underline{a}_n\}$ is increasing and $\{\overline{a}_n\}$ is decreasing, hence the limits

$$\liminf_{n} a_n := \lim_{n} \underline{a}_n \quad\text{and}\quad \limsup_{n} a_n := \lim_{n} \overline{a}_n$$

exist in $\overline{\mathbb{R}}$. These limits are called, respectively, the limit inferior and limit superior of the sequence $\{a_n\}$.

[Figure 2.4: $\underline{a} = \liminf_n a_n$ and $\overline{a} = \limsup_n a_n$.]

Clearly,

$$\underline{a}_n \le a_n \le \overline{a}_n \quad\text{and}\quad \liminf_{n} a_n \le \limsup_{n} a_n.$$

Furthermore, if $\{a_n\}$ is unbounded below, then $\liminf_n a_n = -\infty$, and if $\{a_n\}$ is unbounded above, then $\limsup_n a_n = +\infty$. Here are some examples:

(a) $\liminf_n \dfrac{(-1)^n n}{n + 1} = -1$, $\quad \limsup_n \dfrac{(-1)^n n}{n + 1} = 1$,

(b) $\liminf_n \, [(-1)^n + 1]\,n = 0$, $\quad \limsup_n \, [(-1)^n + 1]\,n = +\infty$,

(c) $\liminf_n \sin n = -1$, $\quad \limsup_n \sin n = 1$.

Example (c) follows from Example 8.3.10. (See Exercise 8.3.15.)

The next proposition shows that lim sup and lim inf have properties similar to those of limits. Their usefulness derives from this fact together with the property that, in contrast to ordinary limits, the limits inferior and superior of a sequence always exist (in $\overline{\mathbb{R}}$).

2.4.1 Proposition. For any sequences $\{a_n\}$ and $\{b_n\}$ in $\mathbb{R}$,
(a) $\limsup_n (-a_n) = -\liminf_n a_n$.
(b) $\limsup_n (a_n + b_n) \le \limsup_n a_n + \limsup_n b_n$ if the right side is defined.
(c) $\liminf_n (a_n + b_n) \ge \liminf_n a_n + \liminf_n b_n$ if the right side is defined.
(d) $\limsup_n c a_n = c \limsup_n a_n$ if $c \ge 0$.
(e) $\liminf_n c a_n = c \liminf_n a_n$ if $c \ge 0$.
(f) $\limsup_n (a_n b_n) \le (\limsup_n a_n)(\limsup_n b_n)$ if $a_n, b_n \ge 0$ for all $n$.
(g) $\liminf_n (a_n b_n) \ge (\liminf_n a_n)(\liminf_n b_n)$ if $a_n, b_n \ge 0$ for all $n$.
(h) $\liminf_n a_n \le \liminf_n b_n$ and $\limsup_n a_n \le \limsup_n b_n$ if $a_n \le b_n$ for all $n$.

Proof. Part (a) follows from $\sup_{k \ge n} (-a_k) = -\inf_{k \ge n} a_k$ and part (h) is a direct consequence of the definitions. Part (b) follows by taking limits in the inequality
\[ \sup_{k \ge n} (a_k + b_k) \le \sup_{k \ge n} a_k + \sup_{k \ge n} b_k. \]
Part (f) follows similarly from
\[ \sup_{k \ge n} a_k b_k \le \sup_{k \ge n} a_k \, \sup_{k \ge n} b_k. \]
Part (d) is a consequence of
\[ \sup_{k \ge n} c a_k = c \sup_{k \ge n} a_k, \qquad c \ge 0. \]

Parts (c), (e), and (g) are proved in a similar manner.

2.4.2 Theorem. For any sequence $\{a_n\}$ in $\mathbb{R}$, the extended real numbers $\underline{a} := \liminf_n a_n$ and $\overline{a} := \limsup_n a_n$ are cluster points of $\{a_n\}$. All other cluster points of $\{a_n\}$ in $\overline{\mathbb{R}}$ lie between these.

Proof. We leave the case $\overline{a} = -\infty$ to the reader. Assume then that $\overline{a} > -\infty$ and recall that $\overline{a}_n \downarrow \overline{a}$. Choose a strictly increasing sequence of real numbers $r_n$ tending to $\overline{a}$. Since $r_1 < \overline{a}_1$, by the approximation property of suprema there exists an index $n_1$ such that $r_1 < a_{n_1} \le \overline{a}_{n_1}$. Similarly, since $r_2 < \overline{a}_{n_1+1}$, there exists an index $n_2 > n_1$ such that $r_2 < a_{n_2} \le \overline{a}_{n_2}$. In this way we may construct inductively a subsequence $\{a_{n_k}\}$ such that $r_k < a_{n_k} \le \overline{a}_{n_k}$. By the squeeze principle, $a_{n_k} \to \overline{a}$. The limit inferior case is treated similarly.

Now let $\{a_{n_k}\}$ be any subsequence of $\{a_n\}$ with $a_{n_k} \to a \in \overline{\mathbb{R}}$. Then, for any $m$ and $k \ge m$, $\underline{a}_m \le a_{n_k} \le \overline{a}_m$. Letting $k \to \infty$ yields $\underline{a}_m \le a \le \overline{a}_m$. Letting $m \to \infty$ we obtain $\underline{a} \le a \le \overline{a}$.

Since $\lim_n a_n$ exists in $\overline{\mathbb{R}}$ iff $\{a_n\}$ has exactly one cluster point (2.3.6), the following result is immediate.

2.4.3 Corollary. For any sequence $\{a_n\}$ in $\mathbb{R}$, $\lim_n a_n$ exists in $\overline{\mathbb{R}}$ iff $\liminf_n a_n = \limsup_n a_n$. In this case, all three limits are equal.
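As a worked illustration of 2.4.2 and 2.4.3 (a sketch added for orientation, using the sequence from example (a) of this section), the cluster points can be read off from the even and odd subsequences:

```latex
\[
a_n = \frac{(-1)^n n}{n+1}:\qquad
a_{2k} = \frac{2k}{2k+1} \longrightarrow 1,
\qquad
a_{2k+1} = -\frac{2k+1}{2k+2} \longrightarrow -1 .
\]
% Every subsequence contains infinitely many even-indexed or infinitely many
% odd-indexed terms, so a convergent subsequence can only converge to 1 or -1.
\[
\liminf_n a_n = -1 \;<\; 1 = \limsup_n a_n ,
\qquad\text{so by 2.4.3 } \lim_n a_n \text{ does not exist.}
\]
```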


Exercises 1. Find lim inf n an and lim supn an if (−1)n 5n + 7 (a)S an = . 3n + 5 (b) an = nsin(nπ/2) + (1/n) cos(n). (c)S an = (−1)bn/3c (1+1/n)2 +(−1)bn/4c (2+1/n)2 +(−1)bn/5c (3+1/n)2 . 2nrn + 1 , rk the remainder on division of k ∈ N by 3. (d) an = nr2n + 1 (e) an = (−1)rn xn + (−1)rn+1 yn + (−1)rn+2 zn , where xn → x, yn → y, zn → z, and x < y < z. (f) a1 = 1, a2n = ra2n−1 , a2n+1 = ar + a2n , 0 < r < 1, a > 0. (g) an = 2n + 2−n + (−1)n (2n − 2−n ). 3n cos (nπ/4) + 2 (h)S an = . 2n sin (nπ/4) + 3 2. Show by example that the inequalities (b), (c), (f), and (g) in 2.4.1 may be strict. 3.S Let an > 0 for all n. Prove that lim sup(1/an ) = 1/ lim inf an and lim inf (1/an ) = 1/ lim sup an . n

n

n

n

4. Let $\{a_n\}$ be bounded and nonnegative and let $r \in \mathbb{Q}^+$. Prove that
\[ \limsup_n a_n^r = \Big(\limsup_n a_n\Big)^r \quad\text{and}\quad \liminf_n a_n^r = \Big(\liminf_n a_n\Big)^r. \]

5.S Show that for any subsequence $\{a_{n_k}\}$ of $\{a_n\}$,
\[ \limsup_{k\to\infty} a_{n_k} \le \limsup_n a_n \quad\text{and}\quad \liminf_{k\to\infty} a_{n_k} \ge \liminf_n a_n. \]

6. Let $b_n \to b \in (0, +\infty)$. Prove that
\[ \limsup_n (a_n + b_n) = b + \limsup_n a_n \quad\text{and}\quad \liminf_n (a_n + b_n) = b + \liminf_n a_n. \]

7.S Let $a_n \ge 0$ for all $n$ and $b_n \to b \in (0, +\infty)$. Prove that
\[ \limsup_n a_n b_n = b \limsup_n a_n \quad\text{and}\quad \liminf_n a_n b_n = b \liminf_n a_n. \]

8. Prove that lim sup an ≤ lim sup |an | and lim inf an ≥ lim inf |an |. n

n

n

Show by examples that the inequalities may be strict.

n


9. Let $\{n_k\}$ be a sequence of positive integers that contains each positive integer exactly once. Show that
\[ \limsup_k a_{n_k} = \limsup_n a_n \quad\text{and}\quad \liminf_k a_{n_k} = \liminf_n a_n. \]
In particular, if $a_n \to a$, then $a_{n_k} \to a$. Note: $\{a_{n_k}\}_{k=1}^{\infty}$ is not necessarily a subsequence of $\{a_n\}$.

10.S Let $a_n \to a > 0$ and $\liminf_n b_n > 0$. If $b_n^2 - a_n b_n - 6a_n^2 \to 0$, prove that $\limsup_{n\to\infty} b_n \le 3a$.

11. Prove that for any sequence $\{a_n\}$,
\[ \liminf_n a_n \le \liminf_n \frac{1}{n}\sum_{j=1}^{n} a_j \le \limsup_n \frac{1}{n}\sum_{j=1}^{n} a_j \le \limsup_n a_n. \]

12.S ⇓¹ Let $a_n > 0$ for all $n$. Prove that
\[ \liminf_n \frac{a_{n+1}}{a_n} \le \liminf_n a_n^{1/n} \le \limsup_n a_n^{1/n} \le \limsup_n \frac{a_{n+1}}{a_n}. \]
Use this to calculate $\lim_n n/(n!)^{1/n}$.

¹ This exercise will be used in 7.4.2.

Chapter 3 Limits and Continuity on R

3.1 Limit of a Function

The definition of limit of a function $f$ given in 3.1.3 below is a precise formulation of the intuitive idea that as $x$ gets closer to a number $a$, the function value $f(x)$ approaches some fixed number $L$. This notion is conveniently described in terms of certain subsets of $\mathbb{R}$ called neighborhoods.

3.1.1 Definition. Let $r > 0$. A neighborhood of $a \in \overline{\mathbb{R}}$ is an interval of the form
\[ N(a) = N_r(a) := \begin{cases} (a - r, a + r) & \text{if } a \in \mathbb{R}, \\ (r, +\infty) & \text{if } a = +\infty, \\ (-\infty, -r) & \text{if } a = -\infty. \end{cases} \]
If $a \in \mathbb{R}$, the set $N(a) \setminus \{a\} := (a - r, a) \cup (a, a + r)$ is called a deleted neighborhood of $a$. ♦

The reader should verify that the intersection of finitely many neighborhoods of $a$ is again a neighborhood of $a$ and that neighborhoods separate points, that is, if $a \ne b$ are extended real numbers, then there exist neighborhoods $N(a)$ and $N(b)$ such that $N(a) \cap N(b) = \emptyset$.

3.1.2 Definition. An accumulation point of a nonempty set $E$ of real numbers is an extended real number $a$ such that every neighborhood of $a$ contains a point of $E$ not equal to $a$. A member of $E$ that is not an accumulation point of $E$ is called an isolated point of $E$. ♦

For example, the set of accumulation points of $E := (\mathbb{Q} \cap (-1, 0)) \cup \mathbb{N}$ is $[-1, 0] \cup \{+\infty\}$, and the set of isolated points of $E$ is $\mathbb{N}$.

The following definition of limit is sufficiently general to include the usual limits encountered in calculus: one-sided limits, two-sided limits, limits at infinity, and infinite limits.

3.1.3 Definition. Let $E \subseteq \mathbb{R}$, let $f$ be a real-valued function whose domain includes $E$, and let $a, L \in \overline{\mathbb{R}}$, where either $a \in E$ or $a$ is an accumulation point of $E$ (not necessarily in the domain of $f$). We write
\[ L = \lim_{x \to a,\ x \in E} f(x) \]
if, for each neighborhood $N(L)$ of $L$, there is a neighborhood $N(a)$ of $a$ such that
\[ x \in E \cap N(a) \ \text{ implies } \ f(x) \in N(L). \tag{3.1} \]
In this case we say that $f(x)$ approaches $L$ as $x$ tends to $a$ along $E$. ♦

The restrictions on $a$ guarantee that $E \cap N(a) \ne \emptyset$, hence condition (3.1) is not vacuously satisfied. Note that if $a \in E$ is not an accumulation point of $E$, then it must be an isolated point, in which case $\lim_{x \to a,\ x \in E} f(x)$ trivially exists and equals $f(a)$.

We single out the following important special cases, where $a \in \mathbb{R}$ and $s > 0$:
(a) left-hand limit: $\lim_{x \to a^-} f(x) := \lim_{x \to a,\ x \in E} f(x)$, $E = (a - s, a)$.
(b) right-hand limit: $\lim_{x \to a^+} f(x) := \lim_{x \to a,\ x \in E} f(x)$, $E = (a, a + s)$.
(c) two-sided limit: $\lim_{x \to a} f(x) := \lim_{x \to a,\ x \in E} f(x)$, $E = (a - s, a + s) \setminus \{a\}$.
(d) limit at $+\infty$: $\lim_{x \to +\infty} f(x) := \lim_{x \to +\infty,\ x \in E} f(x)$, $E = (s, +\infty)$.
(e) limit at $-\infty$: $\lim_{x \to -\infty} f(x) := \lim_{x \to -\infty,\ x \in E} f(x)$, $E = (-\infty, -s)$.

FIGURE 3.1: $\delta$ works for $\varepsilon_1$ but not for $\varepsilon_2$.

Applying the definition of limit to the cases (a)–(e) above produces the standard limit definitions encountered in beginning calculus. For example, if the limit $L$ in (c) is finite, then, in the context of (c), 3.1.3 asserts that for each $\varepsilon > 0$ there exists a $\delta \in (0, s)$ such that $|f(x) - L| < \varepsilon$ for all $x$ with $0 < |x - a| < \delta$. (See Figure 3.1.) For (e) and the case $L = +\infty$, the definition asserts that for each $M \in \mathbb{R}$ there exists an $r > s$ such that $f(x) > M$ for all $x$ with $x < -r$.


The advantage of having a single definition of limit is that it provides a unified theory and allows for economy of thought and presentation.

As in the case of sequences, limits of functions are unique. Indeed, if $L_1 \ne L_2$ both satisfy criterion (3.1), then, given neighborhoods $N(L_1)$ and $N(L_2)$, there would exist a neighborhood $N(a)$ such that $x \in E \cap N(a) \Rightarrow f(x) \in N(L_1) \cap N(L_2)$. However, $N(L_1)$ and $N(L_2)$ may be taken to be disjoint, and choosing any $x \in E \cap N(a)$ then results in a contradiction.

In any discussion of limits we shall tacitly assume that $a$ and $E$ satisfy the conditions of 3.1.3.

3.1.4 Example. Let $f(x) = (3x + 2)/(2x - 1)$. Then
(a) $\lim_{x \to +\infty} f(x) = \lim_{x \to -\infty} f(x) = 3/2$.
(b) $\lim_{x \to a} f(x) = f(a)$ $(a \ne 1/2)$.
(c) $\lim_{x \to 1/2^+} f(x) = +\infty$.
(d) $\lim_{x \to 1/2^-} f(x) = -\infty$.

To verify (a), let $\varepsilon > 0$ and note that the quantity
\[ \left| f(x) - \frac{3}{2} \right| = \frac{7}{2|2x - 1|} \]
will be less than $\varepsilon$ if $|2x - 1| > 7/(2\varepsilon)$. The latter inequality is satisfied if either $x > (1 + 7/(2\varepsilon))/2$ or $x < (1 - 7/(2\varepsilon))/2$.

For (b), observe first that
\[ |f(x) - f(a)| = \left| \frac{3x + 2}{2x - 1} - \frac{3a + 2}{2a - 1} \right| = \frac{7|x - a|}{|2x - 1|\,|2a - 1|}. \]
By the triangle inequality,
\[ |2x - 1| \ge |2a - 1| - |(2a - 1) - (2x - 1)| = |2a - 1| - 2|a - x|. \]
Hence if $0 < |a - x| < |2a - 1|/4$, then $|2x - 1| > |2a - 1|/2$ and therefore
\[ |f(x) - f(a)| < \frac{14|x - a|}{|2a - 1|^2}. \]
It follows that $|f(x) - f(a)|$ will be less than $\varepsilon$ if we require additionally that $|x - a| < \varepsilon|2a - 1|^2/14$. Therefore, any $\delta < \min\{|2a - 1|/4,\ \varepsilon|2a - 1|^2/14\}$ will satisfy criterion (3.1).

To prove (c), note that if $0 < x - 1/2 < 1/2$, then $3x + 2 > 2$ and $x - 1/2 > 0$, hence
\[ f(x) = \frac{1}{2} \cdot \frac{3x + 2}{x - 1/2} > \frac{1}{x - 1/2}. \]
Given $M > 2$, let $\delta = 1/M$. Then $0 < x - 1/2 < \delta \Rightarrow 0 < x - 1/2 < 1/M \Rightarrow f(x) > M$, proving (c). The proof of part (d) is similar. ♦
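The bound on $\delta$ obtained in part (b) of Example 3.1.4 can be sanity-checked numerically. The sketch below is only an illustration: the function names `delta_for` and `check`, the choice of half the stated minimum as a concrete $\delta$, and the sampling grid are all assumptions made here, and sampling finitely many points is evidence rather than a proof.

```python
def f(x):
    # The function of Example 3.1.4.
    return (3 * x + 2) / (2 * x - 1)


def delta_for(a, eps):
    """Pick one delta strictly below min{|2a-1|/4, eps*|2a-1|^2/14} (a != 1/2)."""
    c = abs(2 * a - 1)
    return 0.5 * min(c / 4, eps * c * c / 14)


def check(a, eps, samples=10_000):
    """Sample points x with |x - a| < delta and test |f(x) - f(a)| < eps."""
    d = delta_for(a, eps)
    for i in range(1, samples):
        x = a - d + 2 * d * i / samples  # stays inside the open interval (a-d, a+d)
        if abs(f(x) - f(a)) >= eps:
            return False
    return True


if __name__ == "__main__":
    for a, eps in [(2.0, 0.01), (0.6, 1e-3), (-3.0, 0.5)]:
        print(a, eps, delta_for(a, eps), check(a, eps))
```

Note how the admissible $\delta$ shrinks as $a$ approaches $1/2$, reflecting the factor $|2a - 1|^2$ in the bound.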


3.1.5 Theorem. Let $f$ be a function with domain $D$ and let $E = E_1 \cup E_2 \subseteq D$. Suppose that one of the following holds:
• $a$ is an accumulation point of both $E_1$ and $E_2$.
• $a$ is an isolated point of both $E_1$ and $E_2$.
• $a$ is an accumulation point of $E_1$ and an isolated point of $E_2$.
• $a$ is an accumulation point of $E_2$ and an isolated point of $E_1$.
Then $\lim_{x \to a,\ x \in E} f(x)$ exists in $\overline{\mathbb{R}}$ iff both limits $\lim_{x \to a,\ x \in E_1} f(x)$ and $\lim_{x \to a,\ x \in E_2} f(x)$ exist in $\overline{\mathbb{R}}$ and are equal. In this case all three limits are equal.

Proof. If $a$ is an accumulation point of $E_1$ or $E_2$, then $a$ is an accumulation point of $E$. If $a$ is an isolated point of $E_1$ and $E_2$, then $a$ is an isolated point of $E$. This shows that in each case $\lim_{x \to a,\ x \in E} f(x)$ is at least defined.

Now suppose that $L := \lim_{x \to a,\ x \in E} f(x)$ exists. Then (3.1) holds for $E$, so it must hold for each of the subsets $E_1$ and $E_2$ as well. Therefore, $\lim_{x \to a,\ x \in E_1} f(x)$ and $\lim_{x \to a,\ x \in E_2} f(x)$ exist and equal $L$.

Conversely, suppose that the limits along $E_1$ and $E_2$ exist and equal $K \in \overline{\mathbb{R}}$. Then, given a neighborhood $N(K)$, there exists a neighborhood $N(a)$ of $a$ such that $x \in E_j \cap N(a)$ implies $f(x) \in N(K)$, $j = 1, 2$. Thus $x \in E \cap N(a)$ implies $f(x) \in N(K)$, proving that $\lim_{x \to a,\ x \in E} f(x) = K$.

3.1.6 Example. Take $E_1 = \mathbb{N}$ and $E_2 = (0, 2)$. Then 2 is an isolated point of $E_1$ and an accumulation point of $E_2$, and $\lim_{x \to 2,\ x \in E_1} f(x) = f(2)$. Therefore, by the theorem, $\lim_{x \to 2,\ x \in E} f(x)$ exists iff $\lim_{x \to 2^-} f(x) = f(2)$. ♦

3.1.7 Example (Dirichlet function). Let
\[ d(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{otherwise.} \end{cases} \]
Since $\lim_{x \to a,\ x \in \mathbb{Q}} d(x) = 1$ and $\lim_{x \to a,\ x \in \mathbb{I}} d(x) = 0$, $\lim_{x \to a} d(x)$ cannot exist. ♦

The following is an immediate consequence of 3.1.5.

3.1.8 Corollary. $\lim_{x \to a} f(x)$ exists iff $\lim_{x \to a^-} f(x)$ and $\lim_{x \to a^+} f(x)$ exist and are equal. In this case all three limits are equal.

The next result shows that function limits may be characterized in terms of limits of sequences.

3.1.9 Sequential Characterization of Limit. Let $f$ be a function whose domain includes $E$ and let $a \in \overline{\mathbb{R}}$ be an accumulation point of $E$. Then $\lim_{x \to a,\ x \in E} f(x)$ exists in $\overline{\mathbb{R}}$ and equals $L$ iff $f(a_n) \to L$ for all sequences $\{a_n\}$ in $E$ with $a_n \to a$.


Proof. Assume that $\lim_{x \to a,\ x \in E} f(x) = L$ and let $\{a_n\}$ be a sequence in $E$ with $a_n \to a$. Given a neighborhood $N(L)$, choose $N(a)$ as in (3.1) and then choose $N$ such that $a_n \in N(a)$ for all $n \ge N$. For such $n$, $f(a_n) \in N(L)$. Therefore, $f(a_n) \to L$.

Now suppose that the relation $\lim_{x \to a,\ x \in E} f(x) = L$ does not hold. Then there is a neighborhood $N(L)$ of $L$ such that (3.1) fails for each neighborhood $N(a)$ of $a$. Consider the case $a, L \in \mathbb{R}$. Then $N(L)$ is of the form $(L - r, L + r)$ for some $r > 0$. Taking $N(a) = (a - 1/n, a + 1/n)$ we see that for each $n \in \mathbb{N}$ there exists $a_n \in E$ with $|a_n - a| < 1/n$ and $|f(a_n) - L| \ge r$. Thus $a_n \to a$ and $f(a_n) \not\to L$, so the sequential condition does not hold. A similar argument works if either $a$ or $L$ is infinite.

3.1.10 Example. Let $f(x) = \sin(1/x)$, $x \ne 0$. Since $f\big(1/(n\pi)\big) = 0$ and $f\big(2/((4n + 1)\pi)\big) = 1$, $\lim_{x \to 0^+} f(x)$ does not exist. ♦

3.1.11 Cauchy Criterion for Functions. Let $a$ be an accumulation point of $E$. Then $\lim_{x \to a,\ x \in E} f(x)$ exists in $\mathbb{R}$ iff given $\varepsilon > 0$ there exists a neighborhood $N(a)$ of $a$ such that $|f(x) - f(y)| < \varepsilon$ for all $x, y \in E \cap N(a)$.

Proof. If $L := \lim_{x \to a,\ x \in E} f(x)$ exists in $\mathbb{R}$, choose $N(a)$ such that $|f(x) - L| < \varepsilon/2$ for all $x \in E \cap N(a)$. An application of the triangle inequality then shows that the condition of the theorem holds. Conversely, assume that the condition holds and let $\{a_n\}$ be a sequence in $E$ with $a_n \to a$. Since $a_n$ eventually lies in any given neighborhood of $a$, the hypothesis implies that $\{f(a_n)\}$ is a Cauchy sequence and so converges to some real number $L$. Suppose $\{b_n\}$ is another sequence in $E$ converging to $a$. Then $a_n$ and $b_n$ both eventually lie in any given neighborhood $N(a)$, so, by the condition, $f(a_n) - f(b_n) \to 0$. Therefore, $f(b_n) \to L$. By 3.1.9, $\lim_{x \to a,\ x \in E} f(x) = L$.

3.1.12 Theorem. Let $f$ and $g$ be functions whose domains include $E$ and let $a \in \overline{\mathbb{R}}$ be an accumulation point of $E$. Then the following properties hold in the sense that if the expressions on the right exist in $\overline{\mathbb{R}}$, then the limits on the left exist and the equality holds.
(a) $\lim_{x \to a,\ x \in E} [s f(x) + t g(x)] = s \lim_{x \to a,\ x \in E} f(x) + t \lim_{x \to a,\ x \in E} g(x)$, $s, t \in \mathbb{R}$.
(b) $\lim_{x \to a,\ x \in E} f(x) g(x) = \lim_{x \to a,\ x \in E} f(x) \cdot \lim_{x \to a,\ x \in E} g(x)$.
(c) $\lim_{x \to a,\ x \in E} \dfrac{f(x)}{g(x)} = \dfrac{\lim_{x \to a,\ x \in E} f(x)}{\lim_{x \to a,\ x \in E} g(x)}$ if $\lim_{x \to a,\ x \in E} g(x) \ne 0$.
(d) $\lim_{x \to a,\ x \in E} |f(x)| = \left| \lim_{x \to a,\ x \in E} f(x) \right|$.

Proof. The assertions follow immediately from 2.1.11 and 3.1.9. However, it is instructive to formulate direct proofs. We do this for the finite version of part (c). Assume that the limits
\[ L := \lim_{x \to a,\ x \in E} f(x) \quad\text{and}\quad M := \lim_{x \to a,\ x \in E} g(x) \ne 0 \]
are finite and let $\varepsilon > 0$. Choose $N_1(a)$ such that $|g(x) - M| < |M|/2$ for all $x \in E \cap N_1(a)$. For such $x$, $|g(x)| \ge |M| - |M - g(x)| \ge |M|/2$, hence
\begin{align*}
\left| \frac{f(x)}{g(x)} - \frac{L}{M} \right| &= \frac{|M f(x) - L g(x)|}{|M g(x)|} = \frac{|M(f(x) - L) + L(M - g(x))|}{|M g(x)|} \\
&\le \frac{|M|\,|f(x) - L| + |L|\,|M - g(x)|}{|M|^2/2} \\
&= \frac{2}{|M|}\,|f(x) - L| + \frac{2|L|}{M^2}\,|M - g(x)| \\
&\le K \big( |f(x) - L| + |M - g(x)| \big), \qquad K := 2/|M| + 2|L|/M^2.
\end{align*}
Now choose $N_2(a)$ so that $|f(x) - L| < \varepsilon/2K$ and $|M - g(x)| < \varepsilon/2K$ for all $x \in E \cap N_2(a)$. Then $x \in E \cap N_1(a) \cap N_2(a) \Rightarrow |f(x)/g(x) - L/M| < \varepsilon$.

3.1.13 Example (Limits of rational functions at infinity). Let $f(x) = P(x)/Q(x)$, where
\[ P(x) = a_0 + a_1 x + \cdots + a_n x^n \quad\text{and}\quad Q(x) = b_0 + b_1 x + \cdots + b_m x^m, \qquad a_n, b_m \ne 0. \]
For any $a, c \in \mathbb{R}$, $\lim_{x \to c} a = a$ and $\lim_{x \to c} x = c$, hence, by 3.1.12, $\lim_{x \to c} f(x) = f(c)$, provided $Q(c) \ne 0$. To calculate limits at $+\infty$, write
\[ f(x) = \frac{a_0 x^{-n} + a_1 x^{-n+1} + \cdots + a_{n-1} x^{-1} + a_n}{b_0 x^{-m} + b_1 x^{-m+1} + \cdots + b_{m-1} x^{-1} + b_m}\; x^{n-m}. \]
Since $\lim_{x \to +\infty} x^{-j} = 0$ for $j \in \mathbb{N}$, we see that
\[ \lim_{x \to +\infty} f(x) = \begin{cases} 0 & \text{if } m > n, \\ a_n/b_m & \text{if } m = n, \text{ and} \\ \pm\infty & \text{if } m < n, \end{cases} \]
where the sign in the last case is that of $a_n/b_m$.

♦
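The three cases of Example 3.1.13 translate directly into a small routine. The following is a minimal sketch under stated assumptions: the function name `limit_at_plus_infinity` and the convention of passing coefficients as lists `[a0, ..., an]` and `[b0, ..., bm]` are choices made here for illustration, not notation from the text.

```python
import math


def limit_at_plus_infinity(p, q):
    """Limit of P(x)/Q(x) as x -> +infinity.

    p = [a0, ..., an] and q = [b0, ..., bm] are coefficient lists in increasing
    degree; the leading coefficients an and bm must be nonzero.
    """
    if not p or not q or p[-1] == 0 or q[-1] == 0:
        raise ValueError("leading coefficients must be nonzero")
    n, m = len(p) - 1, len(q) - 1
    ratio = p[-1] / q[-1]  # a_n / b_m
    if m > n:
        return 0.0
    if m == n:
        return ratio
    # m < n: the limit is infinite, with the sign of a_n / b_m.
    return math.copysign(math.inf, ratio)


if __name__ == "__main__":
    # f(x) = (3x + 2)/(2x - 1) from Example 3.1.4: P = 2 + 3x, Q = -1 + 2x.
    print(limit_at_plus_infinity([2, 3], [-1, 2]))
    print(limit_at_plus_infinity([1], [0, 1]))         # 1/x
    print(limit_at_plus_infinity([0, 0, -1], [1, 1]))  # -x^2/(1 + x)
```

The first call agrees with part (a) of Example 3.1.4, where the limit at infinity is $3/2$.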

3.1.14 Theorem. Let $f$ and $g$ be functions whose domains include $E$ and let $a \in \overline{\mathbb{R}}$ be an accumulation point of $E$. If $f(x) \le g(x)$ for all $x \in E$ and if $L := \lim_{x \to a,\ x \in E} f(x)$ and $M := \lim_{x \to a,\ x \in E} g(x)$ exist in $\overline{\mathbb{R}}$, then $L \le M$.

Proof. Assume, for a contradiction, that $M < L$. Choose any $K \in (M, L)$ and then choose neighborhoods $N(L) \subseteq (K, +\infty)$ and $N(M) \subseteq (-\infty, K)$ (see Figure 3.2). Then there exists a neighborhood $N(a)$ such that $f(x) \in N(L)$ and $g(x) \in N(M)$ for all $x \in E \cap N(a)$. But for any such $x$, $g(x) < f(x)$, contradicting the hypothesis.


FIGURE 3.2: L can’t be greater than M . 3.1.15 Theorem (Squeeze principle for functions). Let f be a function whose domain contains E and let a ∈ R be an accumulation point of E. If f (x) ≤ g(x) ≤ h(x) for all x ∈ E and if the limits lim{x→a, x∈E} f (x) and lim{x→a, x∈E} h(x) exist in R and are equal, then lim{x→a, x∈E} g(x) exists in R and all three limits are equal. Proof. Let L denote the common limit. For the case L ∈ R, given ε > 0 there exists a neighborhood N (a) of a such that L − ε ≤ f (x) ≤ g(x) ≤ h(x) < L + ε for all x ∈ E ∩ N (a). The cases L = ±∞ are proved similarly. 3.1.16 Definitions. A function f is said to be strictly increasing on E if f (x) < f (y) for all x, y ∈ E with x < y. Similarly, f is increasing on E if f (x) ≤ f (y) for all x, y ∈ E with x < y. The notions of strictly decreasing and decreasing are defined analogously. If f is either (strictly) increasing or (strictly) decreasing on E, then f is said to be (strictly) monotone on E. Finally, f is bounded on E if there exists a real number M such that |f (x)| ≤ M for all x ∈ E. ♦ The reader should compare the following theorem with the monotone sequence theorem (2.2.2). 3.1.17 Monotone Function Theorem. Let a, b, c ∈ R with a < c < b. If f is monotone on (a, b), then limx→a+ f (x), limx→b− f (x) exist in R and limx→c− f (x), limx→c+ f (x) exist in R. Proof. Assume that f is increasing. Let s := supa 0 for all x ∈ E. Prove that lim sup x→a x∈E

1 1 = . f (x) lim inf {x→a, x∈E} f (x)

5. Prove that lim sup f (x) ≤ lim sup |f (x)| and lim inf f (x) ≥ lim inf |f (x)|. x→a x→a x→a x→a x∈E

x∈E

x∈E

x∈E

Show by examples that the inequalities may be strict. 6. Let f : [a, b) → R and g(x) = supa≤t≤x f (t), a ≤ x < b. Prove that g(x0 ) ≤ limx→x0 + g(x) for every x0 ∈ [a, b).

3.3 Continuous Functions

3.3.1 Definition. A function f with domain D is said to be continuous at a point a ∈ D if lim{x→a, x∈D} f (x) = f (a); that is, for each ε > 0 there exists a δ > 0 such that |f (x) − f (a)| < ε for all x ∈ D with |x − a| < δ. If f is continuous at each point of a subset E of D, then f is said to be continuous on E. If f is continuous on D, then f is simply said to be continuous. A point in D at which f is not continuous is called a discontinuity of f . ♦ The definition of continuity implies that any function f : D → R is continuous at an isolated point of D. For example, if D is a finite set or a set of integers, then every function f : D → R is continuous. Continuity of f on E is not the same as continuity of the restriction f |E . For example, the function on R that is identically equal to one on Z and zero elsewhere is not continuous on Z, yet its restriction to Z is continuous (as a function with domain Z). From the sequential characterization of limit we have 3.3.2 Sequential Characterization of Continuity. A function f with domain D is continuous at a ∈ D iff f (an ) → f (a) for all sequences {an } in D with an → a. 3.3.3 Example. Let {r1 , r2 . . .} be an enumeration of the rationals in (0, 1). Define f on (0, 1) by f (rn ) = 1/n and f (x) = 0 if x is irrational. We use the sequential characterization of continuity to show that f is continuous precisely at the irrational numbers in (0, 1).


Let $x \in (0, 1)$ be rational. Choose a sequence $\{x_n\}$ of irrational numbers converging to $x$. Since $f(x_n) = 0$ for all $n$ and $f(x) \ne 0$, $f(x_n) \not\to f(x)$. Therefore, $f$ is not continuous at any rational.

Now let $x \in (0, 1)$ be irrational and let $\{x_n\}$ be any sequence converging to $x$. If $f(x_n) \not\to f(x)$, then there exists an $N \in \mathbb{N}$ and a subsequence $\{y_n\}$ of $\{x_n\}$ such that $f(y_n) \ge 1/N$ for all $n$. By definition of $f$, $y_n \in \{r_1, r_2, \ldots, r_N\}$. But this implies that $x \in \{r_1, r_2, \ldots, r_N\}$, contradicting that $x$ is irrational. (For a variation of this example, see Exercise 10.) ♦

The following is an immediate consequence of 3.1.12.

3.3.4 Theorem. Let $f$ and $g$ be functions with domain $D$, let $\alpha, \beta \in \mathbb{R}$ and let $a \in D$. If $f$ and $g$ are continuous at $a$, then so are $\alpha f + \beta g$, $fg$, and $f/g$ (the last provided that $g(a) \ne 0$).

3.3.5 Theorem. Let $g : D \to \mathbb{R}$ and $f : E \to \mathbb{R}$ with $g(D) \subseteq E$. If $g$ is continuous at $a \in D$ and $f$ is continuous at $g(a)$, then $f \circ g$ is continuous at $a$.

Proof. Let $b := g(a)$. Given $\varepsilon > 0$, choose $\eta > 0$ such that $|f(y) - f(b)| < \varepsilon$ for all $y \in E$ with $|y - b| < \eta$. Next, choose $\delta > 0$ such that $|g(x) - b| < \eta$ for all $x \in D$ with $|x - a| < \delta$. Then $|x - a| < \delta$ implies $|f(g(x)) - f(b)| < \varepsilon$.

A more succinct proof uses the sequential characterization of continuity:
\[ a_n \to a \text{ in } D \ \Rightarrow\ g(a_n) \to g(a) \ \Rightarrow\ f\big(g(a_n)\big) \to f\big(g(a)\big). \]

Constant functions and the function $f(x) = x$ are clearly continuous. It follows from 3.3.4 that polynomials and rational functions are continuous. Continuity of trigonometric, logarithmic, and exponential functions will follow from results in Chapter 4. Power functions $x^\alpha := e^{\alpha \ln x}$ are continuous as they are compositions of continuous functions. Of course, in each case the domain of the function must be carefully specified.

It is possible that a function is nowhere continuous. The Dirichlet function (3.1.7) is an example. By contrast, we have

3.3.6 Theorem. A monotone function on an open interval $I$ has at most countably many discontinuities.

Proof. Assume without loss of generality that $f$ is increasing on $I$. Let $D$ denote the set of discontinuities of $f$ on $I$. For each $t \in I$, let
\[ a_t = \lim_{x \to t^-} f(x) \quad\text{and}\quad b_t = \lim_{x \to t^+} f(x) \]
and let $I_t = (a_t, b_t)$. Clearly, $I_t \ne \emptyset$ iff $t \in D$ (see Figure 3.3). Furthermore, by monotonicity, $s < t \Rightarrow b_s \le a_t$. Therefore, the sets $I_t$ are pairwise disjoint. For each $t \in D$, choose a rational number $r_t$ in $I_t$. Since the correspondence $t \mapsto r_t$ is one-to-one and the set of rationals is countable, $D$ is countable.

FIGURE 3.3: One-to-one correspondence between $t \in D$ and $r_t \in \mathbb{Q}$.

Exercises

1.S Define
\[ f(x) = \begin{cases} mx + 3 & \text{if } x < 2, \\ 3x^2 + 7 & \text{if } x > 2. \end{cases} \]
If $f$ is continuous at $x = 2$, find the values of $f(2)$ and $m$.

2. Find all values of $a$ for which the following function is continuous on $\mathbb{R}$.
\[ f(x) = \begin{cases} 3x^2 + 5x - 7 & \text{if } x < a, \\ 2x^2 + 2x + 3 & \text{if } x \ge a. \end{cases} \]

3. Let $f : (a, b) \to \mathbb{R}$ and $g : (b, c) \to \mathbb{R}$ be continuous and suppose that
\[ \lim_{x \to b^-} f(x) = \lim_{x \to b^+} g(x). \]
Show that there exists a continuous function $h : (a, c) \to \mathbb{R}$ such that $h = f$ on $(a, b)$ and $h = g$ on $(b, c)$.

4.S Let $g$ be continuous on $\mathbb{R}$ and let $d(x)$ be the Dirichlet function. Show that $f(x) := g(x)d(x)$ is continuous precisely at the zeros of $g$.

5. Let $f$ be defined on an open interval $I$ and let $c \in I$. Show that $f$ is continuous at $c$ iff for each strictly increasing sequence $\{a_n\}$ converging to $c$ and each strictly decreasing sequence $\{b_n\}$ converging to $c$, $f(a_n) \to f(c)$ and $f(b_n) \to f(c)$.

6. Let $f$ be a continuous function on $[a, b]$ and let $\{a_n\}$ be a sequence in $[a, b]$. Prove:
(a) $f\big(\limsup_{n \to \infty} a_n\big) \le \limsup_{n \to \infty} f(a_n)$.
(b) $f\big(\liminf_{n \to \infty} a_n\big) \ge \liminf_{n \to \infty} f(a_n)$.
Show that equality holds in each case if $f$ is increasing. Give examples to show that the inequalities may be strict.

7. Let $f_1, \ldots, f_n$ be continuous at $x_0$. Prove that the functions
\[ M_n(x) := \max_{1 \le j \le n} f_j(x) \quad\text{and}\quad m_n(x) := \min_{1 \le j \le n} f_j(x) \]
are continuous at $x_0$. Give examples to show that the corresponding result is not true for infinitely many functions, where max is replaced by sup and min by inf.

8.S Let $f : \mathbb{R} \to \mathbb{R}$ be continuous at zero and satisfy $f(x + y) = f(x) + f(y)$ for all $x, y \in \mathbb{R}$. Prove that $f(tx) = t f(x)$ for all $t, x \in \mathbb{R}$. Conclude that $f(x) = f(1)x$ for all $x \in \mathbb{R}$.

9. A function $f$ is right continuous at $a$ if $\lim_{x \to a^+} f(x) = f(a)$ and left continuous at $a$ if $\lim_{x \to a^-} f(x) = f(a)$.
(a) Prove that $f$ is continuous at $a$ iff $f$ is both right and left continuous at $a$.
(b) Prove that the greatest integer function $\lfloor x \rfloor$ is right continuous on $\mathbb{R}$ but not left continuous at any integer.
(c)S Let $\{c_n\}$ be any sequence in $\mathbb{R}$. For $x \in \mathbb{R}$ define
\[ f(x) = \sum_{n \,:\, c_n \le x} 2^{-n}, \]
where the notation indicates that the sum, possibly infinite, is taken over all indices $n$ for which $c_n \le x$. (If there are no such indices, the sum is defined to be 0.) Prove that $f$ is right continuous everywhere. Prove also that $f$ is left continuous at $a$ iff $a$ is not equal to any $c_n$. (Note that, because the series $\sum_{n=1}^{\infty} 2^{-n}$ converges, the order of summation is irrelevant (6.4.10). Thus $f(x)$ is well-defined.)
(d) Let $f$ be increasing on an interval $I$. Define $g$ on $I$ by
\[ g(x) = \lim_{t \to x^+} f(t) = \inf_{t > x} f(t). \]
Prove that $g$ is increasing and right continuous on $I$ and that $g$ is continuous at $a$ iff $f$ is continuous at $a$.

10. Define $f : (0, 1) \to \mathbb{R}$ by
\[ f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational,} \\ 1/n & \text{if } x = m/n, \text{ reduced.} \end{cases} \]

Use the sequential characterization of continuity to show that f is continuous precisely at the irrational numbers in (0, 1).


11.S Let f : [0, 1] → R have the property that the limit g(x) := limt→x f (t) exists in R for all x ∈ [0, 1]. Prove that (a) g is continuous. (b) f has at most countably many discontinuities. Hint. For (a), use the sequential criterion. For (b), use ideas similar to those used in the proof of 3.3.6.

3.4 Properties of Continuous Functions

3.4.1 Extreme Value Theorem. If $f$ is continuous on a closed bounded interval $[a, b]$, then $f$ has a maximum and a minimum; that is, there exist $x_m, x_M \in [a, b]$ such that $f(x_m) \le f(x) \le f(x_M)$ for all $x \in [a, b]$.

Proof. We show first that $f$ is bounded. Suppose, for instance, that $f$ is not bounded above. Then for each $n \in \mathbb{N}$ there exists $a_n \in [a, b]$ such that $f(a_n) > n$. On the other hand, by the Bolzano–Weierstrass theorem, $\{a_n\}$ has a convergent subsequence, say $a_{n_k} \to x_0$. But then, by continuity, $n_k < f(a_{n_k}) \to f(x_0) < +\infty$, which is impossible. Thus $f$ must be bounded above. Similarly, $f$ is bounded below.

Now let $M := \sup\{f(x) : x \in [a, b]\}$. By the first paragraph, $M$ is finite. By the approximation property for suprema, there exists a sequence $x_n \in [a, b]$ such that $f(x_n) \to M$. By the Bolzano–Weierstrass theorem again, there exists a subsequence $x_{n_k}$ converging to some $x_M \in [a, b]$. By continuity, $f(x_M) = M$. Therefore, $f(x_M)$ is the maximum of $f$. The proof for the minimum case is similar.

The examples $f(x) = 1/x$ on $(0, 1)$ and $f(x) = x$ on $[0, +\infty)$ show that the interval in the theorem must be both closed and bounded.

3.4.2 Definition. A function $f$ is said to have the intermediate value property on an interval $I$ if, for each $a, b \in I$ with $a < b$ and each $y_0$ between $f(a)$ and $f(b)$, there exists an $x_0 \in (a, b)$ such that $f(x_0) = y_0$. ♦

The intermediate value property simply asserts that $f(I)$ is an interval whenever $I$ is an interval.

3.4.3 Intermediate Value Theorem. A continuous function $f$ on an interval $I$ has the intermediate value property.


Proof. Let $a, b \in I$ with $a < b$ and suppose that $f(a) < y_0 < f(b)$. The set $E := \{x \in [a, b] : f(x) < y_0\}$ contains $a$ and is bounded above, hence $x_0 := \sup E$ exists and lies in $[a, b]$. By continuity of $f$ at $a$, $E$ contains an interval $[a, a + \delta)$, hence $x_0 > a$. Since $f(x) < y_0$ for all $x \in E$, 3.1.14 and the continuity of $f$ at $x_0$ imply that
\[ f(x_0) = \lim_{x \to x_0,\ x \in E} f(x) \le y_0. \]
In particular, $x_0 \ne b$. Similarly, since $f(x) \ge y_0$ for all $x \in (x_0, b)$,
\[ f(x_0) = \lim_{x \to x_0^+} f(x) \ge y_0. \]
Therefore, $y_0 = f(x_0)$. Figure 3.4 illustrates the proof.

FIGURE 3.4: $y_0 = f(x_0)$.

Simple examples show that the continuity hypothesis is essential. Of course, there are many discontinuous functions that have the intermediate value property (see Exercise 5). Interestingly, all derivatives have the intermediate value property, whether they are continuous or not (Exercise 4.2.25). Thus a function without the intermediate value property cannot have an antiderivative.

Combining the extreme and intermediate value theorems we obtain

3.4.4 Corollary. If $f$ is continuous on $[a, b]$, then $f([a, b]) = [f(x_m), f(x_M)]$.

3.4.5 Corollary (Existence of nth roots). For each $b > 0$ and $n \in \mathbb{N}$, the equation $x^n = b$ has a unique positive solution.

Proof. Let $f(x) = x^n$. Since $\lim_{x \to +\infty} x^n = +\infty$, we may choose $c > 0$ such that $f(c) > b > f(0) = 0$. By the intermediate value theorem, the equation $f(x) = b$ has a positive solution. By Exercise 3.1.12, $x^n$ is strictly increasing on $(0, +\infty)$, hence the solution is unique.

Here is another application of the intermediate value theorem.


3.4.6 Example. The equation √ 2 x + sin (3x2 ) 5x2 + e2x+7 f (x) := + =0 (x − 1)3 (x − 2)5 has a solution x = x0 between 1 and 2. Indeed, since lim f (x) = +∞ and

x→1+

lim f (x) = −∞,

x→2−

there must exist 1 < a < b < 2 such that f (a) > 0 > f (b). By the intermediate value theorem, f (x0 ) = 0 for some x0 ∈ (a, b). ♦ Remark. The zeros of a continuous function f may be approximated using the interval halving method, reminiscent of the proof of the Bolzano–Weierstrass theorem: Suppose f (a) < 0 < f (b) so that a zero of f lies in (a, b). Bisect the interval [a, b] and compute the values of f at the endpoints of the resulting two intervals. If one of these values is zero, stop. If neither is zero, then for one of the intervals, denote it by [a1 , b1 ], the values of f at the endpoints have opposite signs. The intermediate value theorem then implies that a zero of f lies in (a1 , b1 ), and we may approximate the zero by either a1 or b1 . Continuing this process, we may (theoretically) approximate a zero of f to any desired degree of accuracy. The procedure is easily programmable. ♦
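The interval halving method described in the remark can be sketched as follows. This is one possible rendering, not code from the text: the function name `bisect_zero`, the tolerance `tol`, and the iteration cap `max_iter` are assumptions introduced here. The sketch accepts either sign pattern at the endpoints, whereas the remark assumes $f(a) < 0 < f(b)$.

```python
def bisect_zero(f, a, b, tol=1e-10, max_iter=200):
    """Approximate a zero of a continuous f on [a, b] by interval halving.

    Requires f(a) and f(b) to have opposite signs (or one of them to be zero).
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0 or (b - a) / 2 < tol:
            return mid
        if (fa < 0) != (fm < 0):
            # Sign change on [a, mid]: by the intermediate value theorem
            # a zero lies there, so keep the left half.
            b, fb = mid, fm
        else:
            # Otherwise the sign change is on [mid, b]: keep the right half.
            a, fa = mid, fm
    return (a + b) / 2


if __name__ == "__main__":
    # Cube root of 2 as the positive solution of x^3 = 2 (compare 3.4.5).
    root = bisect_zero(lambda x: x**3 - 2, 1.0, 2.0)
    print(root)
```

Each pass halves the length of the bracketing interval, so after $k$ passes the zero is located to within $(b - a)/2^k$.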

Exercises

1. Find an example of a bounded function on $[0, 1]$ with a single discontinuity that has no maximum or minimum.

2.S Let $f$ be continuous and positive on $\mathbb{R}$ with $\lim_{x \to \pm\infty} f(x) = 0$. Prove that $f$ has a maximum value on $\mathbb{R}$.

3. Let $f$ be continuous on $\mathbb{R}$ with $\lim_{x \to \pm\infty} f(x) = +\infty$. Prove that $f$ has a minimum value on $\mathbb{R}$.

4. A function $f$ defined on an interval $J$ and taking values in $\mathbb{R}$ is said to be upper (lower) semicontinuous at $x_0 \in J$ if
\[ f(x_0) \ge \limsup_{x \to x_0} f(x) \qquad \Big( f(x_0) \le \liminf_{x \to x_0} f(x) \Big), \]
where the limits are one-sided if $x_0$ is an endpoint of $J$. If $f$ is upper (lower) semicontinuous at each point of $J$, then $f$ is said to be upper (lower) semicontinuous on $J$.
(a) Prove that $f$ is upper semicontinuous at $x_0$ iff $-f$ is lower semicontinuous at $x_0$.
(b) Prove that $f$ is continuous at $x_0$ iff it is both upper and lower semicontinuous at $x_0$.

(c) Show that, at any integer $n$, $\lfloor x \rfloor$ is upper semicontinuous but not lower semicontinuous.
(d) Let $f(x) = \sin(1/x)$, $x \ne 0$, and $f(0) = a$. Show that $f$ is upper (lower) semicontinuous at 0 iff $a \ge 1$ ($a \le -1$).
(e)S Let $f_i$ be defined on $J$ and upper semicontinuous at $x_0$ for every $i$ in some index set $I$. Define $f(x) = \inf_{i \in I} f_i(x)$, $x \in J$. Show that $f$ is upper semicontinuous at $x_0$. Give an example to show that $f$ may not be continuous at $x_0$ even if each $f_i$ is continuous on $J$.
(f) (Semi-extreme value property) Prove: If $f$ is upper (lower) semicontinuous at each point of $[a, b]$, then $f$ is bounded above (below) on $[a, b]$ and there exists $x_0 \in [a, b]$ such that $f(x_0) \ge f(x)$ ($f(x_0) \le f(x)$) for all $x \in [a, b]$.

5. Give an example of a function on $[0, 1]$ with the intermediate value property that is
(a) discontinuous at precisely the points $1/n$, $n = 1, 2, \ldots$.
(b)S discontinuous everywhere.

6. Prove that a polynomial $P$ of odd degree maps $\mathbb{R}$ onto $\mathbb{R}$. In particular, $P$ has a real zero.

7. Use the intermediate value theorem to show that each of the following equations has a solution in the indicated interval $I$.
(a) $\ln x + x = e$, $I = (1, e)$.
(b) $\sin x = ax$, $I = (\pi/2, \pi)$, $0 < a < 2/\pi$.
(c)S $\tan x = x$, $I = (n\pi, (n + 1/2)\pi)$, $n \in \mathbb{N}$.
(d) $e^x = 4.82 \sin x$, $I = (0, \pi/2)$ and $I = (\pi/2, \pi)$.
(e) $\dfrac{x^4 + x^2 + 1}{x + 1} + \dfrac{x^3 + 1}{x} + \dfrac{e^{-x} + x}{x - 1} = 0$, $I = (-1, 0)$ and $I = (0, 1)$.

(f)S

e1−x − x2 2x2 − 5 = , I = (0, π/2). sin x cos x

2

8. Prove that the equation ex = xn (n ∈ N) has a solution in R iff n ≥ 3. Hint. Find the minimum of ex /xn on (0, +∞). 9.S Let f : [a, b] → [a, b] be continuous. Prove that there exists x ∈ [a, b] such that f (x) = x. 10. Prove that if n ∈ N is odd, then every real number has a unique nth root. 11. Let f be continuous and nonzero on R. Let a0 be arbitrary and define {an } recursively by an = an−1 + f (an−1 ), n ≥ 1. Show that either an ↑ +∞ or an ↓ −∞.

3.5 Uniform Continuity

Recall that a function f is continuous on a set E if for each y ∈ E and each ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all x in the domain of f with |x − y| < δ. The number δ typically depends on both ε and y. Removing the dependence on y results in the notion of uniform continuity: 3.5.1 Definition. A function f is said to be uniformly continuous on a subset E of the domain of f if for each ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all x, y ∈ E with |x − y| < δ.

♦
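To see concretely how δ can depend on the point y, here is a small numerical sketch for f(x) = 1/x on (0, +∞). The helper name `largest_delta` and its closed form are our own; the formula comes from solving 1/(y − d) − 1/y = ε for d and is an illustration only, not part of the text's development.

```python
def largest_delta(y, eps):
    """Largest delta such that |1/x - 1/y| < eps whenever x > 0 and |x - y| < delta.

    Derived for f(x) = 1/x only: the binding case is x < y, where
    1/(y - d) - 1/y = eps gives d = eps*y^2 / (1 + eps*y).
    """
    return eps * y * y / (1.0 + eps * y)


eps = 0.5
for y in (1.0, 0.1, 0.01, 0.001):
    # The admissible delta shrinks to 0 as y -> 0+, so no single delta
    # serves every y in (0, 1); on [r, +inf) the value at y = r works.
    print(y, largest_delta(y, eps))
```

The printed values shrink roughly like εy², which is the dependence on y that uniform continuity rules out.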

The following result is frequently useful in determining whether or not a function is uniformly continuous.

3.5.2 Sequential Characterization of Uniform Continuity. A function f is uniformly continuous on E iff f(x_n) − f(y_n) → 0 for all sequences {x_n} and {y_n} in E with x_n − y_n → 0.

Proof. Let f be uniformly continuous on E and let {x_n} and {y_n} be sequences in E with x_n − y_n → 0. Given ε > 0, choose δ > 0 so that |f(x) − f(y)| < ε for all x, y ∈ E with |x − y| < δ. Next, choose N ∈ N such that |x_n − y_n| < δ for all n ≥ N. For such n, |f(x_n) − f(y_n)| < ε. Thus f(x_n) − f(y_n) → 0. Now assume that f is not uniformly continuous on E. Then there exist an ε > 0 and sequences {x_n}, {y_n} in E with |x_n − y_n| < 1/n and |f(x_n) − f(y_n)| ≥ ε for all n. Then x_n − y_n → 0 but f(x_n) − f(y_n) ↛ 0, so f does not satisfy the sequential condition.

3.5.3 Example. The function f(x) = 1/x, x > 0, is uniformly continuous on intervals of the form [r, +∞), r > 0, as may be seen from the inequality
\[ |f(x) - f(y)| = \frac{|x - y|}{xy} \le \frac{|x - y|}{r^2}, \quad x, y \ge r. \]
However, f is not uniformly continuous on (0, +∞). Indeed, if x_n = 1/(2n) and y_n = 1/n, then x_n − y_n → 0 yet f(x_n) − f(y_n) = n → +∞. ♦

3.5.4 Theorem. Let f, g be uniformly continuous on E and let α, β ∈ R. Then
(a) αf + βg is uniformly continuous on E.
(b) If f and g are bounded, then fg is uniformly continuous on E.
(c) If g ≠ 0 and 1/g is bounded on E, then 1/g is uniformly continuous on E.


Proof. Part (a) follows easily from the sequential characterization of uniform continuity. For (b), let M > 0 be such that |f(x)|, |g(x)| ≤ M for all x ∈ E. Uniform continuity of fg then follows from the inequalities
\[ |f(x)g(x) - f(y)g(y)| \le |f(x)g(x) - f(y)g(x)| + |f(y)g(x) - f(y)g(y)| \le M|f(x) - f(y)| + M|g(x) - g(y)|. \]
For (c), choose K > 0 such that 1/|g(x)| < K for all x ∈ E. Uniform continuity of 1/g then follows from
\[ \left| \frac{1}{g(x)} - \frac{1}{g(y)} \right| = \frac{|g(x) - g(y)|}{|g(x)g(y)|} \le K^2 |g(x) - g(y)|, \quad x, y \in E. \]

The following theorem may be given a short proof based on the sequential criterion for uniform continuity. We leave the details to the reader.

3.5.5 Theorem. Suppose that g is uniformly continuous on D, f is uniformly continuous on E, and g(D) ⊆ E. Then f ◦ g is uniformly continuous on D.

The next theorem shows that on closed and bounded intervals the notions of continuity and uniform continuity coincide.

3.5.6 Theorem. If f is continuous on a closed bounded interval [a, b], then f is uniformly continuous there.

Proof. We use the sequential characterization of uniform continuity. Let {x_n} and {y_n} be sequences in [a, b] with x_n − y_n → 0. Suppose, for a contradiction, that f(x_n) − f(y_n) ↛ 0. Then |f(x_n) − f(y_n)| > ε for some ε > 0 and infinitely many n, hence along a subsequence. Changing notation if necessary, we may suppose that the inequality holds for all n. By the Bolzano–Weierstrass theorem, {x_n} has a convergent subsequence, say x_{n_k} → x_0. Since x_{n_k} − y_{n_k} → 0, y_{n_k} → x_0. But then, by continuity, |f(x_{n_k}) − f(y_{n_k})| → 0, which is impossible.

The connection between continuity and uniform continuity on open intervals is more complicated. For this, we need the following definitions.

3.5.7 Definition. A continuous function f on D is said to have a continuous extension to a set D_1 ⊇ D if there exists a continuous function f_1 : D_1 → R such that f_1|_D = f. In the special case D_1 = D ∪ {a}, where a ∉ D, f is said to have a removable discontinuity at x = a. ♦

3.5.8 Proposition. Let f be defined and continuous on D and let a be an accumulation point of D, a ∉ D. Then f has a removable discontinuity at x = a iff L := lim_{x→a, x∈D} f(x) exists in R.

Proof. The necessity is clear. For the sufficiency, simply set f(a) = L to obtain a continuous extension of f to D ∪ {a}.


For example, the functions
\[ x \sin\frac{1}{x}, \qquad \frac{\sin x}{x}, \qquad \text{and} \qquad \frac{x}{\sqrt{|x|}}, \]
defined for x ≠ 0, have removable discontinuities at x = 0 and hence have unique continuous extensions to R. On the other hand, since lim_{x→0+} sin(1/x) does not exist, the function sin(1/x) does not have a removable discontinuity at x = 0.

The following theorem is the main result regarding uniform continuity of functions on bounded open intervals.

3.5.9 Theorem. Let f be continuous on the bounded interval (a, b). The following statements are equivalent:
(a) lim_{x→a+} f(x) and lim_{x→b−} f(x) exist in R.
(b) f has a continuous extension to [a, b].
(c) f is uniformly continuous on (a, b).

Proof. (a) ⇒ (b) is immediate from 3.5.8.
(b) ⇒ (c): By 3.5.6, a continuous extension g of f to [a, b] is uniformly continuous. Therefore, f = g|_{(a,b)} is uniformly continuous.
(c) ⇒ (a): Let {a_n} be any sequence in (a, b) converging to a. Then {a_n} is Cauchy and since f is uniformly continuous, {f(a_n)} is Cauchy (Exercise 7). Therefore, L := lim_{n→∞} f(a_n) exists. We claim that lim_{x→a+} f(x) exists and equals L. To see this, let {a′_n} be any sequence in (a, b) converging to a. Then a_n − a′_n → 0, hence, by uniform continuity, f(a_n) − f(a′_n) → 0, so f(a′_n) → L. By the sequential characterization of limit (3.1.9), lim_{x→a+} f(x) = L. A similar argument shows that lim_{x→b−} f(x) exists.

For example, since sin(1/x) has no continuous extension to [0, 1], it is not uniformly continuous on (0, 1]. On the other hand, for any p > 0, lim_{x→0+} x^p sin(1/x) = 0, hence x^p sin(1/x) is uniformly continuous on (0, 1]. For another example, consider f(x) = (1 − cos x)/x on R \ {0}. By l'Hospital's rule, proved in the next chapter, lim_{x→0} f(x) = lim_{x→0} sin x = 0, hence f has a continuous extension to R. Moreover, since lim_{x→±∞} f(x) = 0, f is uniformly continuous on R (Exercise 5).

3.5.10 Corollary. A bounded, continuous, monotone function f on a bounded interval (a, b) is uniformly continuous there.

Proof. By 3.1.17, lim_{x→a+} f(x) and lim_{x→b−} f(x) exist in R.
The following result relies on the mean value theorem proved in the next chapter. 3.5.11 Theorem. If f has a bounded derivative on an interval I, then f is uniformly continuous on I.


Proof. Let M be a bound for |f′| on I. By the mean value theorem, for any x, y ∈ I there exists a z between x and y such that f(x) − f(y) = f′(z)(x − y). Thus |f(x) − f(y)| ≤ M|x − y|, which implies uniform continuity.

For example, sin^n x and cos^n x have bounded derivatives for every n ∈ N, hence are uniformly continuous on R. This also follows from periodicity (see Exercise 11). On the other hand, x^p is not uniformly continuous on (0, +∞) for p > 1. Indeed, if x_n = n + n^{(1−p)/2} and y_n = n, then, by the mean value theorem, for each n there exists z_n ∈ (y_n, x_n) such that
\[ x_n^p - y_n^p = p z_n^{p-1}(x_n - y_n) = \frac{p z_n^{p-1}}{n^{(p-1)/2}} \ge p\, n^{(p-1)/2} \to +\infty. \]
Since x_n − y_n → 0, 3.5.2 implies that x^p is not uniformly continuous.
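The sequences used for x^p can be tabulated numerically. The following is a minimal sketch (the function name `witness` is hypothetical, chosen for this illustration) that evaluates the gap x_n − y_n and the difference x_n^p − y_n^p for x_n = n + n^{(1−p)/2}, y_n = n.

```python
def witness(p, n):
    """Return (x_n - y_n, x_n**p - y_n**p) for x_n = n + n**((1-p)/2), y_n = n."""
    x = n + n ** ((1.0 - p) / 2.0)
    y = float(n)
    return x - y, x ** p - y ** p


for n in (10, 1_000, 100_000):
    gap, diff = witness(2.0, n)
    # For p = 2 the gap is n**(-1/2) -> 0 while the difference of
    # function values is at least 2*sqrt(n), which is unbounded.
    print(n, gap, diff)
```

The gap column tends to 0 while the difference column grows without bound, which is the failure of the sequential criterion 3.5.2.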

Exercises

1.S Find functions f and g with f continuous and g uniformly continuous such that neither f ◦ g nor g ◦ f is uniformly continuous.

2. Let r > 0. Show that the function f(x) = (3x + 2)/(2x − 1) in 3.1.4 is uniformly continuous on D_r but not on its domain D, where D_r := (−∞, 1/2 − r] ∪ [1/2 + r, +∞) and D = (−∞, 1/2) ∪ (1/2, +∞).

3. Let a, b > 0. Give a careful ε, δ proof that each of the following functions is uniformly continuous on R.
(a)S √(ax² + b). (b) 1/√(ax² + b). (c) |ax + b|.

4. Show that ln x is uniformly continuous on (r, +∞) for every r > 0 but is not uniformly continuous on (0, 1).

5. Let f be continuous on [0, +∞). Prove that if lim_{x→+∞} f(x) exists and is finite, then f is uniformly continuous on [0, +∞). Give an example of a bounded continuous function on [0, +∞) that is not uniformly continuous.

6. Prove that each of the following functions is uniformly continuous on the indicated interval, where n ∈ N:
(a) sin(1/x), [r, +∞), r > 0.
(b) x sin(1/x), [0, +∞).
(c) arctan x, (−∞, +∞).
(d) x^n e^{−x}, [0, +∞).
(e) cos √(x² + 1), (−∞, +∞).
(f) x^p, 0 < p ≤ 1, [0, +∞).
(g) (1 + x^n)^{1/n}, [0, +∞).
(h) (1 + x^n)^{−1/n}, [0, +∞).

7.S Let f be uniformly continuous on E and let {an } be a Cauchy sequence in E. Prove that {f (an )} is Cauchy.


8. Suppose that f(x) is uniformly continuous on [0, +∞). Prove that the function g is uniformly continuous on R, where
\[ g(x) := \begin{cases} f(x) & \text{if } x \ge 0, \\ f(-x) & \text{if } x < 0. \end{cases} \]

9.S Let f be uniformly continuous on R. Prove that f(|x|), |f(x)|, and |f(|x|)| are uniformly continuous on R.

10. Let f be uniformly continuous on each of the intervals (a, b) and (c, d), where a < b < c < d. Prove that f is uniformly continuous on the set (a, b) ∪ (c, d). What if b = c?

11. Let f : R → R be periodic with period p > 0, that is, f(x + p) = f(x) for all x ∈ R. If f is continuous on [0, p], prove that f is uniformly continuous and bounded on R.

12. Let f_1, . . . , f_n be uniformly continuous on E. Prove that the functions
\[ M(x) := \max_{1 \le j \le n} f_j(x) \quad \text{and} \quad m(x) := \min_{1 \le j \le n} f_j(x) \]
are uniformly continuous on E.

13.S Find all values of p > 0 for which the function f(x) = x^{−p} sin x, x > 0, has a continuous extension to [0, +∞). Prove that for all such p the extension is uniformly continuous.

14. Let r > 0. Prove that f(x) := sin(x^p) is uniformly continuous on (r, +∞) iff p ≤ 1.

15.S Prove that a uniformly continuous function f on a bounded interval (a, b) is bounded. Give examples to show that the result is not true if (a, b) is unbounded or if f is merely continuous.

16. Give examples to show that parts (b) and (c) of 3.5.4 are not necessarily true if the boundedness conditions are removed.

17. Let f be continuous on [a, b]. Prove that
\[ g(x) := \sup_{a \le t \le x} f(t) \]
is continuous on [a, b].

18.S Let f(x) = (1 − e^{1/x})^{−1}, x ≠ 0. Is it possible to define f(0) so that f is continuous on R? What about for the function g(x) = x(1 − e^{1/x})^{−1}, x ≠ 0?

Chapter 4 Differentiation on R

The notion of rate of change of one quantity with respect to another is fundamental to many disciplines. It is expressed mathematically as the derivative of a function. In this chapter we establish the main properties of this important construct.

4.1 Definition of Derivative and Examples

4.1.1 Definition. A real-valued function f defined in a neighborhood of a ∈ R is said to be differentiable at a if the limit
\[ f'(a) = Df(a) = \frac{df}{dx}\bigg|_{a} := \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} \]
exists in R. The limit is then called the derivative of f at a. If f is differentiable at each member of a set E, then f is said to be differentiable on E and the function
\[ f' = Df = \frac{df}{dx} \]
is called the derivative of f on E. If f′ is continuous on E, then f is said to be continuously differentiable on E. ♦

It follows immediately from the definition that the derivative of a constant function is 0. Here are some nontrivial examples.

4.1.2 Example. We prove the following special cases of the power rule (the general power rule will be proved later): Let n ∈ N and r = n or 1/n. Then Dx^r = rx^{r−1}. (In the second case x ≠ 0, and x > 0 if n is even.) The case r = n is obtained by letting h → 0 in the identity
\[ \frac{(x + h)^n - x^n}{h} = \sum_{j=1}^{n} (x + h)^{n-j} x^{j-1} \]
(Exercise 1.2.4). Each term in the sum tends to x^{n−1}, and since there are n terms the formula follows. For the case r = 1/n we use the identity
\[ \frac{(x + h)^{1/n} - x^{1/n}}{h} = \left[ \sum_{j=1}^{n} (x + h)^{1 - j/n} x^{(j-1)/n} \right]^{-1} \]
(Exercise 1.4.15). As h → 0, the term in square brackets tends to nx^{1−1/n}, verifying the formula. ♦

For the next example, and indeed for the remainder of the book, we shall use the standard definitions of cosine and sine as coordinates of points on the unit circle.¹ From this one can derive the usual trigonometric identities, which we shall invoke as needed.

¹ A more rigorous approach to the calculus of trigonometric functions may be based on the inverse sine function. This approach is described briefly in Section 4.4.

FIGURE 4.1: sin h < h < tan h.

4.1.3 Example. D sin x = cos x. From the identity sin² h + cos² h = 1 and the inequalities sin h < h < tan h, 0 < h < π/2, which may be derived with the help of Figure 4.1, we see that
\[ \sqrt{1 - h^2} < \sqrt{1 - \sin^2 h} = \cos h < \frac{\sin h}{h} < 1, \quad 0 < h < \pi/2. \tag{4.1} \]
Since sin(−h) = −sin h and cos(−h) = cos h, (4.1) holds for −π/2 < h < 0 as well. By the squeeze principle,
\[ \lim_{h \to 0} \cos h = \lim_{h \to 0} \frac{\sin h}{h} = 1. \]
From this and the calculation
\[ \frac{\cos h - 1}{h} = \frac{\cos^2 h - 1}{h(\cos h + 1)} = -\left( \frac{\sin h}{h} \right)^{2} \frac{h}{\cos h + 1} \]
we see that
\[ \lim_{h \to 0} \frac{\cos h - 1}{h} = 0. \]
Therefore,
\[ \frac{\sin(x + h) - \sin x}{h} = \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} = \sin x\, \frac{\cos h - 1}{h} + \cos x\, \frac{\sin h}{h} \to \cos x \quad \text{as } h \to 0. \]
♦
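The two limits driving Example 4.1.3 are easy to observe numerically. The sketch below is only a sanity check in floating point (the helper names are ours), not a substitute for the squeeze argument.

```python
import math


def sine_quotient(h):
    """(sin h)/h for h != 0."""
    return math.sin(h) / h


def cosine_quotient(h):
    """(cos h - 1)/h for h != 0."""
    return (math.cos(h) - 1.0) / h


for h in (0.5, 0.1, 0.01, 0.001):
    # Inequality (4.1) says cos h < (sin h)/h < 1 on (0, pi/2).
    print(h, math.cos(h), sine_quotient(h), cosine_quotient(h))

# Difference quotient of sin at x = 1 compared with cos(1).
h = 1e-4
print((math.sin(1.0 + h) - math.sin(1.0)) / h, math.cos(1.0))
```

As h decreases, the third column approaches 1 and the fourth approaches 0, and the final line shows the difference quotient of sin at 1 close to cos 1.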

It is occasionally necessary to consider one-sided derivatives, which are defined by using one-sided limits in 4.1.1. Specifically, the left-hand and right-hand derivatives are, respectively,
\[ D_\ell f(a) = f'_\ell(a) := \lim_{x \to a^-} \frac{f(x) - f(a)}{x - a} \quad \text{and} \quad D_r f(a) = f'_r(a) := \lim_{x \to a^+} \frac{f(x) - f(a)}{x - a}. \]
From the general theory of limits, a function is differentiable at a iff it has equal right-hand and left-hand derivatives at a. For example, at x = 0 the function f(x) = |x| has right-hand derivative 1 and left-hand derivative −1 and so is not differentiable there. Although we shall have no need to do so, one may even consider the more general expressions
\[ \liminf_{\substack{x \to a \\ x \in E}} \frac{f(x) - f(a)}{x - a} \quad \text{and} \quad \limsup_{\substack{x \to a \\ x \in E}} \frac{f(x) - f(a)}{x - a}, \]
where a is an accumulation point of E. The so-called Dini derivates are obtained by taking E to be intervals of the form (c, a) and (a, c).

The following proposition provides a useful characterization of differentiability. It asserts that for x near a, f(x) is approximated by the linear function y = f(a) + f′(a)(x − a), the equation of the tangent line at a.

4.1.4 Proposition. Let f be defined in a neighborhood N(a) of a. Then f is differentiable at a iff there exists a function η on N(a), continuous at a, such that f(x) = f(a) + η(x)(x − a) for all x ∈ N(a). In this case, f′(a) = η(a).

Proof. If such a function η exists, then
\[ \frac{f(x) - f(a)}{x - a} = \eta(x) \to \eta(a) \quad \text{as } x \to a, \]


hence f′(a) exists and equals η(a). Conversely, if f is differentiable at a, define
\[ \eta(x) = \begin{cases} \dfrac{f(x) - f(a)}{x - a} & \text{if } x \in N(a) \setminus \{a\}, \\ f'(a) & \text{if } x = a. \end{cases} \]
Then η has the required properties.

4.1.5 Corollary. If f is differentiable at a, then f is continuous there.

Proof. Simply note that f(x) = f(a) + η(x)(x − a) → f(a) as x → a.

The example |x| considered above shows that the converse of the corollary is false: |x| is continuous at 0 but not differentiable there. It is a remarkable fact that there are continuous functions on R that are nowhere differentiable (see 8.9.7).

4.1.6 Theorem. If c ∈ R and f and g are differentiable at a, then so are f + g, cf, fg, and f/g, the last provided that g(a) ≠ 0. Moreover, in this case,
(a) (f + g)′(a) = f′(a) + g′(a),
(b) (cf)′(a) = cf′(a),
(c) (fg)′(a) = f(a)g′(a) + f′(a)g(a),
(d) (f/g)′(a) = [g(a)f′(a) − f(a)g′(a)] / g²(a).

Proof. We prove only (d). Let h = f/g. Since g is continuous at a and g(a) ≠ 0, h is defined in a neighborhood N(a) on which g is not 0. For x ∈ N(a) \ {a}, a little algebra shows that
\[ \frac{h(x) - h(a)}{x - a} = \frac{g(a)\,\dfrac{f(x) - f(a)}{x - a} - f(a)\,\dfrac{g(x) - g(a)}{x - a}}{g(x)g(a)}. \]

Letting x → a, using the continuity of g at a, yields (d).

The preceding theorem, together with 4.1.2 and 4.1.3, shows that polynomials, rational functions, and trigonometric functions are differentiable. (See Exercise 2.) The following important result will yield additional examples.

4.1.7 Chain Rule. Let g be differentiable at a and let f be differentiable at g(a). Then f ◦ g is differentiable at a and (f ◦ g)′(a) = f′(g(a))g′(a).

Proof. Set b := g(a). By 4.1.4, there exists a function η, defined in a neighborhood N(b) of b and continuous at b with η(b) = f′(b), such that
\[ f(y) = f(b) + \eta(y)(y - b), \quad y \in N(b). \tag{4.2} \]
Since g is continuous at a, we may choose a neighborhood N(a) of a such that g(N(a)) ⊆ N(b). Then f ◦ g is defined on N(a), and by (4.2)
\[ \frac{f(g(x)) - f(g(a))}{x - a} = \eta(g(x))\, \frac{g(x) - g(a)}{x - a}, \quad x \in N(a) \setminus \{a\}. \]
Letting x → a produces the desired result.
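The factorization used in the chain rule proof can be checked numerically. The sketch below builds the function η of 4.1.4 for a given f and compares η(g(x)) · (g(x) − g(a))/(x − a) with the direct difference quotient of f ◦ g. The names `make_eta` and `factored_quotient`, and the choice f = sin, g(x) = x², are ours and serve only as an illustration.

```python
import math


def make_eta(f, fprime_b, b):
    """The function eta of Proposition 4.1.4 for f at the point b."""
    def eta(y):
        # eta(b) is defined to be f'(b); elsewhere it is the difference quotient.
        return fprime_b if y == b else (f(y) - f(b)) / (y - b)
    return eta


def factored_quotient(f, g, fprime_b, a, x):
    """eta(g(x)) * (g(x) - g(a)) / (x - a), the right side of the displayed identity."""
    eta = make_eta(f, fprime_b, g(a))
    return eta(g(x)) * (g(x) - g(a)) / (x - a)


f, g = math.sin, (lambda x: x * x)
a = 1.0
for x in (1.1, 1.01, 1.0001):
    direct = (f(g(x)) - f(g(a))) / (x - a)
    factored = factored_quotient(f, g, math.cos(g(a)), a, x)
    # Both columns should agree and approach f'(g(a)) g'(a) = 2 cos(1).
    print(x, direct, factored, 2.0 * math.cos(1.0))
```

The design point is that η is defined even where g(x) = g(a), which is why the proof avoids dividing by g(x) − g(a).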


The formula (f ◦ g)′(x) = f′(g(x))g′(x) is sometimes easier to apply when written in Leibniz notation as
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx}, \quad \text{where } y = f(u) \text{ and } u = g(x). \]

4.1.8 Example. The power rule Dx^r = rx^{r−1}, r ∈ Q, follows from 4.1.2 and the chain rule: Let r = m/n, m, n ∈ N, and set u = x^{1/n} and y = u^m. Then y = x^r and
\[ \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = m u^{m-1} \cdot \frac{1}{n} x^{1/n - 1} = \frac{m}{n} x^{m/n - 1} = r x^{r-1}. \]
The case r < 0 may be verified using the quotient rule. ♦

Higher order derivatives of y = f(x) are defined inductively by
\[ f'' = D^2 f = \frac{d^2 y}{dx^2} := \frac{d}{dx}\frac{dy}{dx}, \quad \ldots, \quad f^{(n)} = D^n f = \frac{d^n f}{dx^n} := \frac{d}{dx}\frac{d^{n-1} f}{dx^{n-1}}. \]
By convention, we set f^{(0)} = D^0 f := f.

Exercises

1. Use the limit definition to find the derivative of
(a) x² + x + 1. (b)S √(2x + 1). (c) 1/(x² + 1). (d)S 1/√(3x + 2).

2. Use the techniques of 4.1.3 to find the derivative of cos x. Use rules of differentiation to obtain the derivatives of tan x, cot x, sec x, and csc x. 3. Use rules of differentiation to find f 0 for each of the functions f : 2/3 2 √ √ 2x + 5 x −1 5 S 3 S (a) 5x + 7 3x + 2. (b) . (c) sin . 7x + 2 x2 + 1 q √ sin2 x − 1 (d) . (e) tan cos(1/x) . (f) ax + bx + c. 2 sin x + 1 4. Assuming that y is a differentiable function of x that satisfies the given dy equation, use the rules of differentiation to find : dx (a) x3 + y 3 − xy = 1. (b) S sin(xy 2 ) + x2 = 1. (c) tan(x + y) + y 2 = x.

5. Let f(x) = x^n |x|, n ∈ N. Find f^{(n−1)} and f^{(n)}.

6. Let f(x) = x^m ⌊x⌋, m ∈ N. Find f′_ℓ(n) and f′_r(n), n ∈ Z.

7.S Find all values of a, b, such that f′ exists on R, where
\[ f(x) = \begin{cases} ax^2 + bx + a/x & \text{if } x > 1, \\ x^3 & \text{if } x \le 1. \end{cases} \]

8. Find all values of a, b, and c such that f′ is continuous on (0, +∞), where
\[ f(x) = \begin{cases} ax^2 + bx & \text{if } x > 1, \\ c\sqrt{x} & \text{if } 0 < x \le 1. \end{cases} \]

9. Let
\[ f(x) = \begin{cases} ax^2 + bx + c & \text{if } x > 1, \\ x^3 & \text{if } x \le 1. \end{cases} \]
Find all values of a, b, and c such that
(a) f is continuous on R.
(b) f is differentiable on R.
(c) f′ is continuous on R.
(d) f″ exists on R.

10. Find all values of c such that f′(c) exists, where
\[ f(x) = \begin{cases} ax - 4 & \text{if } x > c, \\ 9x^2 & \text{if } x \le c. \end{cases} \]
Is f′ continuous at these values?

11. Let f be differentiable at a. Use the limit definition of derivative to calculate
\[ \text{(a)} \ \lim_{h \to 0} \frac{f(a + 5\sin h) - f(a + 2\sin h)}{h}. \qquad \text{(b)S} \ \lim_{h \to 0} \frac{f(a + h^2) - f(a - h)}{h}. \]

12. Let g be differentiable on an open interval I and let f(x) = g(x)d(x), where d(x) is the Dirichlet function (3.1.7). Let a be a zero of g. Prove that f′(a) exists iff a is a zero of g′.

13. Let f be differentiable at c and let {a_n} and {b_n} be sequences such that a_n < c < b_n and a_n, b_n → c. Prove that
\[ \lim_{n \to \infty} \frac{f(b_n) - f(a_n)}{b_n - a_n} = f'(c). \]

14.S Let f be differentiable and increasing on (a, b). Prove that f′(x) ≥ 0 for all x ∈ (a, b).


15. Let f be differentiable at a and nonnegative in a neighborhood of a with f(a) = 0. Prove that f′(a) = 0.

16.S Prove Leibniz's rule: If f and g are n times differentiable, then
\[ D^n(fg) = \sum_{k=0}^{n} \binom{n}{k} (D^k f)(D^{n-k} g). \]

17. Prove that if f has right-hand and left-hand derivatives at a (not necessarily equal), then f is continuous at a.

18. Assuming that f, g, and h have the necessary differentiability, find general formulas for
(a) D(f ◦ (gh)). (b) D(f ◦ (g/h)). (c)S D²(f ◦ g). (d) D(f ◦ g ◦ h).

19. Find a formula for the nth derivative of
(a)S 1/x. (b) 1/√x. (c) xe^x. (d) xe^{−x}.

20. Find all values of p ∈ R for which the function
\[ f(x) = \begin{cases} |x|^p \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{otherwise} \end{cases} \]
is (a) continuous, (b) differentiable, (c) continuously differentiable on R.

21.S Define f(0) = 0 and f(x) = x^m sin x^n, x ≠ 0, where m ∈ Z, n ∈ N. For what values of m and n does f′(0) exist? For which of these values is f′ continuous on R?

22. A function f defined on a symmetric neighborhood (−a, a) of 0 is said to be odd if f(−x) = −f(x) and even if f(−x) = f(x).
(a) Prove that any function h : (−a, a) → R is the sum of an even function f and an odd function g.
(b) Prove that if f is differentiable and odd (even), then f′ is even (odd).
(c) Is the converse true? That is, if f′ is even (odd), is f odd (even)?

23.S Let f_j, g_j, and h_j be differentiable, j = 1, 2, 3. Prove that
\[ \begin{vmatrix} f_1 & f_2 \\ g_1 & g_2 \end{vmatrix}' = \begin{vmatrix} f_1' & f_2' \\ g_1 & g_2 \end{vmatrix} + \begin{vmatrix} f_1 & f_2 \\ g_1' & g_2' \end{vmatrix} \]
and
\[ \begin{vmatrix} f_1 & f_2 & f_3 \\ g_1 & g_2 & g_3 \\ h_1 & h_2 & h_3 \end{vmatrix}' = \begin{vmatrix} f_1' & f_2' & f_3' \\ g_1 & g_2 & g_3 \\ h_1 & h_2 & h_3 \end{vmatrix} + \begin{vmatrix} f_1 & f_2 & f_3 \\ g_1' & g_2' & g_3' \\ h_1 & h_2 & h_3 \end{vmatrix} + \begin{vmatrix} f_1 & f_2 & f_3 \\ g_1 & g_2 & g_3 \\ h_1' & h_2' & h_3' \end{vmatrix}. \]

4.2 The Mean Value Theorem

The mean value theorem relates the average rate of change of a function to its instantaneous rate of change. It is one of the most useful theorems in analysis and will play a central role in the proof of the fundamental theorem of calculus in Chapter 5. The proof of the mean value theorem is based on the existence of local extrema. 4.2.1 Definition. A function f is said to have a local maximum (local minimum) at c if f is defined on an open interval I containing c and f (x) ≤ f (c) (f (x) ≥ f (c)) for all x ∈ I. In either case, f is said to have a local extremum at c. ♦

FIGURE 4.2: Local extrema of f.

4.2.2 Local Extremum Theorem. If f has a local extremum at c and if f is differentiable at c, then f′(c) = 0.

Proof. Suppose that f has a local maximum at c. Let I be an open interval containing c such that f(x) ≤ f(c) for all x ∈ I. Then
\[ \frac{f(x) - f(c)}{x - c} \ \begin{cases} \ge 0 & \text{if } x \in I \text{ and } x < c, \\ \le 0 & \text{if } x \in I \text{ and } x > c. \end{cases} \]
It follows that the left-hand derivative of f at c is ≥ 0 and the right-hand derivative is ≤ 0, hence f′(c) = 0. The proof for the local minimum case is similar.

4.2.3 Rolle's Theorem. Let f be continuous on [a, b] and differentiable on (a, b). If f(a) = f(b), then there exists a point c ∈ (a, b) such that f′(c) = 0.

Proof. By the extreme value theorem there exist x_m, x_M ∈ [a, b] such that f(x_m) ≤ f(x) ≤ f(x_M) for all x ∈ [a, b]. If f(x_m) = f(x_M), then f is a constant function and the assertion of the theorem holds trivially. If f(x_m) ≠ f(x_M), then either x_m ∈ (a, b) or x_M ∈ (a, b), and the conclusion follows from the local extremum theorem.


The following result is the key ingredient in the proof of l'Hospital's rule in Section 4.5.

4.2.4 Cauchy Mean Value Theorem. Let f and g be continuous on [a, b] and differentiable on (a, b). Then there exists a point c ∈ (a, b) such that
\[ [f(b) - f(a)]\, g'(c) = [g(b) - g(a)]\, f'(c). \]

Proof. The function h(x) := [f(b) − f(a)]g(x) − [g(b) − g(a)]f(x) is continuous on [a, b], differentiable on (a, b), and satisfies h(a) = h(b). By Rolle's theorem, h′(c) = 0 for some c ∈ (a, b), which is the assertion of the theorem.

FIGURE 4.3: (a) Cauchy mean value theorem. (b) Mean value theorem.

If f(a) ≠ f(b) and f′(x) ≠ 0 on (a, b), then the conclusion of 4.2.4 may be written
\[ \frac{g(b) - g(a)}{f(b) - f(a)} = \frac{g'(c)}{f'(c)}. \]
For smooth functions f and g, this equation asserts that at some point (f(c), g(c)) on the curve given parametrically by the equations x = f(t) and y = g(t), the line through the endpoints (f(a), g(a)) and (f(b), g(b)) is parallel to the line tangent to the curve at (f(c), g(c)). See Figure 4.3(a).

Taking g(x) = x in the Cauchy mean value theorem yields the standard mean value theorem (Figure 4.3(b)):

4.2.5 Mean Value Theorem. If f is continuous on [a, b] and differentiable on (a, b), then there exists c ∈ (a, b) such that
\[ \frac{f(b) - f(a)}{b - a} = f'(c). \]
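The mean value theorem only asserts that a suitable c exists. For a concrete function one can often locate such a point numerically. The sketch below assumes, beyond the theorem's hypotheses, that f′ is continuous and that f′ minus the chord slope changes sign between a and b, so that bisection applies; the function name `mean_value_point` is hypothetical.

```python
def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Approximate c in [a, b] with f'(c) = (f(b) - f(a)) / (b - a) by bisection.

    Assumption (not part of the theorem): f' is continuous and
    f' - slope has opposite signs at a and b.
    """
    slope = (f(b) - f(a)) / (b - a)
    lo, hi = a, b
    g_lo = fprime(lo) - slope
    if g_lo * (fprime(hi) - slope) > 0:
        raise ValueError("f' - slope does not change sign at the endpoints")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        g_mid = fprime(mid) - slope
        if g_lo * g_mid <= 0:
            hi = mid              # a root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # a root lies in [mid, hi]
    return (lo + hi) / 2.0


# f(x) = x^3 on [0, 2]: the chord slope is 4, so 3c^2 = 4 and c = 2/sqrt(3).
c = mean_value_point(lambda x: x ** 3, lambda x: 3.0 * x * x, 0.0, 2.0)
print(c)
```

When the sign-change assumption fails, the theorem still guarantees a point c, but this particular search strategy is not applicable and the function raises an error instead.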


4.2.6 Corollary. Let f(x) and g(x) be differentiable on an open interval I such that f′(x) = g′(x) for all x ∈ I. Then there exists a constant k such that f = g + k on I.

Proof. Let a, b ∈ I with a < b. By the mean value theorem applied to h := f − g, there exists c ∈ (a, b) such that h(a) − h(b) = h′(c)(a − b). Since h′ = 0, h(a) = h(b). Since a and b were arbitrary, h must be constant.

4.2.7 Corollary. Let f be differentiable on an open interval I.
(a) If f′ ≥ 0 (f′ > 0) on I, then f is increasing (strictly increasing) on I.
(b) If f′ ≤ 0 (f′ < 0) on I, then f is decreasing (strictly decreasing) on I.

Proof. We prove (a) for the strictly increasing case. Let a, b ∈ I, a < b. By the mean value theorem, f(b) − f(a) = f′(c)(b − a) for some c ∈ (a, b). Since f′(c) > 0, f(b) > f(a).

Exercises

1.S Show that cos x = √x − 1 has exactly one solution x in the interval (0, π/2).

2. Find an interval I such that for each c ∈ I, sin x = x²/2 + x + c has exactly one solution x in the interval (0, π/2).

3.S Show that f(x) = x⁴ − 4x³ + 4x² + c has at most one zero in the interval (1, 2). For what interval of values of c does f have exactly one zero in (1, 2)?

4. Let f have k derivatives and n distinct zeros on an interval I. Prove that f^{(k)} has at least n − k distinct zeros in I.

5. Let f have a continuous second derivative on [−1, 3], f(1) = 0, and set g(x) = x²f(x). Prove that g″ has at least one zero in [−1, 2]. Hint. Consider the function g_n(x) := x(x + 1/n)f(x).

6. Let P(x) be a polynomial of degree n and let a ≠ 0. Prove that the equation e^{ax} = P(x) has at most n + 1 solutions.

7.S Let P(x) be a polynomial of degree n and let a ≠ 0. Prove that the equation sin(ax) = P(x) has at most n + 1 solutions.

8. Prove Bernoulli's inequality: (1 + x)^r ≥ 1 + rx for all x ≥ −1 and all rational numbers r ≥ 1. (Cf. Exercise 1.5.10.)

9.S Let f and g be continuous on [a, b] and differentiable on (a, b) such that |f′| ≤ |g′|. If g′ is never zero on (a, b), prove that |f(x) − f(y)| ≤ |g(x) − g(y)| for all x, y ∈ [a, b].


10. Let f and g be differentiable on an open interval I and let a, b ∈ I with a < b. Prove that if f(a) = g(a) and f′ > g′ on (a, b), then f > g on (a, b). Use this to show that
(a) ln x < x − 1 on the interval (1, +∞).
(b) sin x < x on the interval (0, π/2).
(c) cos x > 1 − x on the interval (0, π/2).
(d) tan x > x on the interval (0, π/2).
(e) e^x > 1 + x + x²/2! + · · · + x^n/n! on the interval (0, +∞). (Use induction.)

11.S Show that (sin x)/x is a decreasing function on (0, π/2).

12. Show that on (0, π/2)
(a) x sin x + cos x > 1.
(b) x sin x + p cos x < p, p ≥ 2.
(c) x^{−1}(1 − cos x) is increasing.
(d) x^{−2}(1 − cos x) is decreasing.

13. Let a, b, p > 0, and for x ≥ 0 define f(x) = a^p + x^p − (a + x)^p. Show that for x > 0,
\[ f'(x) \ \begin{cases} > 0 & \text{if } 0 < p < 1, \\ < 0 & \text{if } p > 1. \end{cases} \]
Conclude that
\[ (a + b)^p \ \begin{cases} < a^p + b^p & \text{if } 0 < p < 1, \\ > a^p + b^p & \text{if } p > 1. \end{cases} \]

14. Let f and g have derivatives of order n on an open interval I and let a ∈ I. Suppose that f^{(j)}(a) = g^{(j)}(a) = 0, j = 0, . . . , n − 1, and f^{(j)}(x)g^{(j)}(x) ≠ 0 for x > a and j = 0, . . . , n. Prove that for any b ∈ I with b > a there exists c ∈ (a, b) such that
\[ \frac{f(b)}{g(b)} = \frac{f^{(n)}(c)}{g^{(n)}(c)}. \]

15. Suppose that f has a local maximum at c. Prove that
\[ \liminf_{x \to c^-} \frac{f(x) - f(c)}{x - c} \ge 0 \ge \limsup_{x \to c^+} \frac{f(x) - f(c)}{x - c}. \]


16. Let f and g be continuous on [a, b], differentiable on (a, b) and let f(a) = f(b) = 0. Show that there exists c ∈ (a, b) such that f′(c) = g′(c)f(c).

17.S Show that for any polynomial P(x) there exist finitely many intervals with union R such that P is strictly monotone on each interval.

18. Suppose that f has the property |f(x) − f(y)| ≤ c|x − y|^{1+ε} for all x, y ∈ R, where c, ε > 0. Prove that f is constant.

19.S Let f have a bounded derivative on R. Prove that for sufficiently large r the function g(x) := rx + f(x) is one-to-one and maps R onto R.

20. Suppose f > 0 on (1, +∞) and lim_{x→+∞} xf′(x)/f(x) ∈ (1, +∞). Prove that x/f(x) is decreasing on (b, +∞) for some b > 1.

21. Let f be twice differentiable on (0, a), f″ ≥ 0, and lim_{x→0+} f(x) = 0. Prove that f(x)/x is increasing on (0, a). Show that the conclusion is false if the hypothesis f″ ≥ 0 is dropped.

22.S Let g(x) = x² sin(1/x) if x ≠ 0 and g(0) = 0. Set f(x) = x + g(x). Show that f′(0) > 0 but f is not monotone on any neighborhood of 0.

23. Let lim_{x→+∞} f′(x) = 0. Prove that if g ≥ c > 0 on (a, +∞), then
\[ \lim_{x \to +\infty} \big[ f(x + g(x)) - f(x) \big] = 0. \]

24. Let f be differentiable on R with sup_{x∈R} |f′(x)| < 1. Prove that the sequence {x_n} defined by x_{n+1} = f(x_n) converges, where x_1 is arbitrary. Conclude that f has a unique fixed point; that is, there exists a unique x ∈ R such that f(x) = x.

25.S Suppose f is differentiable on an open interval I. Show that f′ has the intermediate value property. Conclude that if f′(x) ≠ 0 on I, then f is strictly monotone on I. Hint. Apply the extreme value theorem to the function g(x) = f(x) − y_0(x − a), a ≤ x ≤ b.

26. Let f be differentiable on I := (1, +∞). Prove that if f′ has finitely many zeros in I, then lim_{x→+∞} f(x) exists in R.

27. Let f and g have continuous derivatives on an interval I with g′ ≠ 0 and let a_j, b_j ∈ I with a_j < b_j, j = 1, . . . , n. Prove that there exists c ∈ I such that
\[ \sum_{j=1}^{n} [f(b_j) - f(a_j)]\, g'(c) = \sum_{j=1}^{n} [g(b_j) - g(a_j)]\, f'(c). \]


28.S A function f is said to be uniformly differentiable on an open interval I if, given ε > 0, there exists δ > 0 such that f (x) − f (y) 0 0, there exists a δ > 0 such that f (x) − f (y) f 0 (y) g(x) − g(y) − g 0 (y) < ε, for all x and y in I with 0 < |x − y| < δ. 30. Let f be differentiable on [a, +∞) and suppose that the zeros of f 0 form a strictly increasing sequence an ↑ +∞. Prove that if L := limn f (an ) exists in R, then limx→+∞ f (x) = L. 31.S Prove that a function f is continuously differentiable on an open interval I iff there exists a continuous function ϕ on I 2 such that f (x) − f (y) = ϕ(x, y)(x − y) for all x, y ∈ I. 32. Let f be continuous on (−r, r) and differentiable on (−r, 0) ∪ (0, r). If limx→0 f 0 (x) exists, prove that f 0 (0) exists and f 0 is continuous at 0.

*4.3 Convex Functions

4.3.1 Definition. A function f is said to be convex on an interval (a, b) if
\[ f\big((1 - t)u + tv\big) \le (1 - t)f(u) + tf(v) \]
for all a < u < v < b and all t ∈ [0, 1]. The function f is concave if −f is convex. ♦

For example, |x| is convex on R, as is easily established using the triangle inequality. To see the geometric significance of convexity, let L_{uv} : [u, v] → R denote the function whose graph is the line segment from (u, f(u)) to (v, f(v)). Since a typical point on the line segment may be written
\[ (1 - t)\big(u, f(u)\big) + t\big(v, f(v)\big) = \big((1 - t)u + tv,\ (1 - t)f(u) + tf(v)\big), \quad t \in [0, 1], \]


we see that
\[ L_{uv}\big((1 - t)u + tv\big) = (1 - t)f(u) + tf(v). \]
This shows that f is convex iff the line segment connecting any two points on the graph of f lies above the part of the graph between the two points. (See Figure 4.4.)

FIGURE 4.4: Convex function.

Now let x ∈ (u, v). Then for some t ∈ (0, 1), x = (1 − t)u + tv = t(v − u) + u = (1 − t)(u − v) + v, hence
\[ t = \frac{x - u}{v - u} \quad \text{and} \quad 1 - t = \frac{v - x}{v - u}. \]
It follows that f is convex on (a, b) iff
\[ f(x) \le L_{uv}(x) = f(u)\,\frac{v - x}{v - u} + f(v)\,\frac{x - u}{v - u} \quad \text{for all } a < u < x < v < b. \tag{4.3} \]
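Inequality (4.3) lends itself to a quick numerical spot check. The sketch below tests it on a finite grid of triples u < x < v; passing such a test does not prove convexity, and the helper names `chord_value` and `looks_convex` are ours.

```python
import math


def chord_value(f, u, v, x):
    """L_uv(x): the value at x of the line through (u, f(u)) and (v, f(v))."""
    return f(u) * (v - x) / (v - u) + f(v) * (x - u) / (v - u)


def looks_convex(f, a, b, samples=20, tol=1e-12):
    """Check f(x) <= L_uv(x) for all grid triples u < x < v inside (a, b)."""
    pts = [a + (b - a) * k / (samples + 1) for k in range(1, samples + 1)]
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            for k in range(j + 1, len(pts)):
                u, x, v = pts[i], pts[j], pts[k]
                if f(x) > chord_value(f, u, v, x) + tol:
                    return False  # the graph rises above the chord at x
    return True


print(looks_convex(abs, -1.0, 1.0))             # |x| satisfies (4.3)
print(looks_convex(lambda x: x * x, -2.0, 3.0))  # so does x^2
print(looks_convex(math.sin, 0.0, 3.0))          # sin is concave on (0, 3)
```

The small tolerance absorbs rounding in cases such as |x|, where (4.3) holds with equality on each linear piece.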

4.3.2 Theorem. If f : (a, b) → R has an increasing derivative, then f is convex. In particular, f is convex if f″ ≥ 0.

Proof. Let a < u < x < v < b. By the mean value theorem applied to f on each of the intervals [u, x] and [x, v], there exist points y ∈ (u, x) and z ∈ (x, v) such that
\[ \frac{f(x) - f(u)}{x - u} = f'(y) \le f'(z) = \frac{f(v) - f(x)}{v - x}. \]
Solving the inequality for f(x) yields (4.3).

Thus x^{2n} is convex on R for any n ∈ N, ln(x) is concave on (0, +∞), and x^p is convex on (0, +∞) if p ≥ 1 and concave if 0 < p < 1. There is a partial converse to 4.3.2. For this we need the following lemma.

4.3.3 Lemma. If f is convex and a < u < x ≤ y < v < b, then
\[ \text{(a)} \quad \frac{f(x) - f(u)}{x - u} \le \frac{f(y) - f(u)}{y - u} \le \frac{f(v) - f(y)}{v - y}, \quad \text{and} \]
\[ \text{(b)} \quad \frac{f(v) - f(x)}{v - x} \le \frac{f(v) - f(y)}{v - y}. \]


Proof. Referring to Figure 4.5, for (a) we have
\begin{align*}
\frac{f(x) - f(u)}{x - u} &\le \frac{L_{uy}(x) - f(u)}{x - u} && \text{by convexity, since } u < x < y, \\
&= \frac{f(y) - f(u)}{y - u} && \text{by equality of slopes on } L_{uy}, \\
&\le \frac{L_{uv}(y) - f(u)}{y - u} && \text{by convexity, since } u < y < v, \\
&= \frac{L_{uv}(v) - L_{uv}(y)}{v - y} && \text{by equality of slopes on } L_{uv}, \\
&\le \frac{f(v) - f(y)}{v - y} && \text{by convexity, since } u < y < v.
\end{align*}

FIGURE 4.5: Convex function inequalities.

A similar calculation verifies (b):
\[ \frac{f(v) - f(y)}{v - y} \ge \frac{L_{xv}(v) - L_{xv}(y)}{v - y} = \frac{L_{xv}(v) - L_{xv}(x)}{v - x} = \frac{f(v) - f(x)}{v - x}. \]

4.3.4 Theorem. If f is convex, then f′_r and f′_ℓ exist, are increasing, and f′_ℓ(x) ≤ f′_r(x).

Proof. Let a < u < x ≤ y < v < b. By (a) of the lemma, the difference quotients [f(x) − f(u)]/(x − u) decrease as x → u+, so f′_r(u) exists in the extended real number system [−∞, +∞] and
\[ f'_r(u) \le \frac{f(v) - f(y)}{v - y} < +\infty. \]
Letting v → y+ shows that f′_r(u) ≤ f′_r(y). Therefore, f′_r is increasing. Similarly, by (b) the difference quotients [f(v) − f(y)]/(v − y) increase as y → v−, so f′_ℓ(v) exists in [−∞, +∞] and
\[ f'_\ell(v) \ge \frac{f(v) - f(x)}{v - x} > -\infty. \]
Taking x = y in (a) of the lemma, we have
\[ \frac{f(x) - f(u)}{x - u} \le \frac{f(v) - f(x)}{v - x}. \]


Letting $u \uparrow x$ and $v \downarrow x$, we obtain $f_\ell'(x) \le f_r'(x)$. In particular, $f_\ell'(x)$ and $f_r'(x)$ are finite.

4.3.5 Corollary. A convex function $f$ is continuous.

Proof. By the theorem, $f$ has finite left-hand and right-hand derivatives and hence is left and right continuous.

4.3.6 Theorem. If a convex function $f$ is differentiable at $x \in (u, v)$, then

$$f'(x)(t - x) + f(x) \le f(t) \quad \text{for all } t \in (u, v).$$

That is, the tangent line at $(x, f(x))$ lies below the graph of $f$ on $(u, v)$.

Proof. Since the difference quotients $[f(t) - f(x)]/(t - x)$ decrease as $t \downarrow x$,

$$f_r'(x) \le \frac{f(t) - f(x)}{t - x}, \quad t > x.$$

The same difference quotients increase as $t \uparrow x$, hence

$$f_\ell'(x) \ge \frac{f(t) - f(x)}{t - x}, \quad t < x.$$

Therefore, if $f'(x)$ exists, then $f'(x)(t - x) + f(x) \le f(t)$ for all $t$.

4.4 Inverse Functions

In this section we prove that under suitable conditions the inverse of a one-to-one continuous (differentiable) function is continuous (differentiable). For this we need the following two lemmas. The proof of the first is illustrated in Figures 4.6 and 4.7.

4.4.1 Lemma. Let $f$ be one-to-one on an interval $I$. If $f$ has the intermediate value property, then $f$ is strictly monotone and continuous on $I$.

Proof. Let $a, b$ be arbitrary points in $I$ with $a < b$. Assume, for definiteness, that $f(a) < f(b)$. We claim that $f(a) < f(x) < f(b)$ for all $a < x < b$. Indeed, if, say, $f(x) < f(a)$, then $f(a)$ lies between $f(x)$ and $f(b)$, hence, by the intermediate value property, there exists $c \in (x, b)$ such that $f(c) = f(a)$, contradicting that $f$ is one-to-one.

Next we show that $f$ is strictly increasing on $[a, b]$. Let $a < x_1 < x_2 < b$ and suppose that $f(x_2) < f(x_1)$. Then $f(x_2)$ lies between $f(a)$ and $f(x_1)$, hence there exists $d \in (a, x_1)$ such that $f(d) = f(x_2)$, again contradicting that $f$ is one-to-one. Thus $f$ is strictly increasing on $[a, b]$. It follows that $f$ must be strictly increasing on any closed and bounded subinterval of $I$ containing $[a, b]$. Since every pair of points in $I$ lies in such a subinterval, $f$ is strictly increasing on $I$.

[Figure 4.6: $f(x) < f(a)$ or $f(x_1) > f(x_2)$ violates the one-to-one hypothesis.]

[Figure 4.7: Intermediate value property implies continuity.]

Now let $x_0 \in I$. To verify continuity of $f$ at $x_0$, note that by monotonicity

$$\alpha := \lim_{x \to x_0^-} f(x) \le f(x_0) \le \beta := \lim_{x \to x_0^+} f(x).$$

(If $x_0$ is an endpoint, only one of these inequalities holds.) Continuity of $f$ at $x_0$ will then follow if we show that $\alpha = f(x_0) = \beta$. Suppose, for example, that $\alpha < f(x_0)$. Choose any $x_1 < x_0$ in $I$. Since $f(x_1) < \alpha < f(x_0)$, there exists some $x \in (x_1, x_0)$ such that $f(x) = \alpha$. But choosing $x_2 \in (x, x_0)$ then produces the contradiction $f(x) = \alpha < f(x_2) < \alpha$.

4.4.2 Lemma. If $f$ is strictly increasing (decreasing) on an interval $I$, then $f^{-1}$ is strictly increasing (decreasing) on $f(I)$.

Proof. Assume that $f$ is strictly increasing. If $y_1 = f(x_1) < y_2 = f(x_2)$, then $x_1 < x_2$ (that is, $f^{-1}(y_1) < f^{-1}(y_2)$), since otherwise $f(x_1) \ge f(x_2)$. Therefore, $f^{-1}$ is strictly increasing on $f(I)$.

The next two theorems are the main results on inverse functions. They assert that the properties of continuity or differentiability of a one-to-one function are inherited by the inverse function.

4.4.3 Theorem. Let $f$ be continuous and one-to-one on an interval $I$. Then $J := f(I)$ is an interval and $f^{-1} : J \to I$ is continuous. Moreover, $f$ and $f^{-1}$ are strictly monotone.

Proof. Since $f$ is continuous, it has the intermediate value property, hence $J$ is an interval. Moreover, by 4.4.1 and 4.4.2, $f$ and $f^{-1}$ are strictly monotone. Since $I = f^{-1}(J)$ is an interval, $f^{-1}$ has the intermediate value property. The continuity of $f^{-1}$ now follows from 4.4.1.

4.4.4 Theorem. Let $I$ be an open interval and let $f : I \to \mathbb{R}$ be continuous and one-to-one on $I$. If $f$ is differentiable at $a \in I$ and $f'(a) \ne 0$, then $f^{-1}$ is differentiable at $f(a)$, and

$$\big(f^{-1}\big)'\big(f(a)\big) = \frac{1}{f'(a)}.$$

Proof. Let $y = f(x)$ and $b = f(a)$. For $x$ near $a$, $x \ne a$,

$$\frac{f^{-1}(y) - f^{-1}(b)}{y - b} = \frac{x - a}{f(x) - f(a)} = \left[\frac{f(x) - f(a)}{x - a}\right]^{-1}.$$

Since $f^{-1}$ is continuous, $x = f^{-1}(y) \to f^{-1}(b) = a$ as $y \to b$ and the conclusion follows.

If $f$ is differentiable with $f'$ nonzero on $I$ and $y = f^{-1}(x)$, then $x = f(y)$ and the assertion of the theorem may be written in Leibniz notation as

$$\frac{dy}{dx} = \frac{1}{\;\dfrac{dx}{dy}\;}.$$

From 4.4.3 we obtain the following result, which will be generalized in Chapter 9 to functions on open subsets of $\mathbb{R}^n$.

4.4.5 Inverse Function Theorem. Let $f$ be continuously differentiable on an open interval $I$. If $f'(a) \ne 0$, then there exist open intervals $I_a \subseteq I$ and $J_a = f(I_a)$ with $a \in I_a$ such that $f$ is one-to-one on $I_a$ and $f^{-1} : J_a \to I_a$ is continuously differentiable.

Proof. Since $f'$ is continuous and $f'(a) \ne 0$, there exists an open interval $I_a$ containing $a$ on which $f' \ne 0$. By the mean value theorem, $f$ is one-to-one on $I_a$, hence, by 4.4.3, $J_a = f(I_a)$ is an interval, and, by 4.4.4, $f^{-1} : J_a \to I_a$ is continuously differentiable.
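The derivative formula of Theorem 4.4.4 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the text's method: the function $f(x) = x^3 + x$, the bisection routine `inverse`, the bracket $[0, 2]$, and the step `h` are all hypothetical choices. It compares a symmetric difference quotient of $f^{-1}$ at $b = f(a)$ with $1/f'(a)$.

```python
def inverse(f, y, lo, hi, iters=200):
    """Solve f(x) = y by bisection.

    Assumes f is continuous and strictly increasing on [lo, hi]
    with f(lo) <= y <= f(hi).
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


f = lambda x: x**3 + x           # strictly increasing, since f'(x) = 3x^2 + 1 > 0
fprime = lambda x: 3 * x**2 + 1

a = 1.0
b = f(a)                         # b = f(a) = 2
h = 1e-5
# Symmetric difference quotient of the inverse function at b.
dq = (inverse(f, b + h, 0.0, 2.0) - inverse(f, b - h, 0.0, 2.0)) / (2 * h)
print(dq, 1 / fprime(a))         # both should be close to 1/4
```

Bisection is used only because it needs nothing beyond continuity and monotonicity, which are exactly the hypotheses of 4.4.3.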

4.4.6 Global Inverse Function Theorem. Let $f$ be continuously differentiable with $f'$ nonzero on an open interval $I$. Then $f$ is one-to-one on $I$, $J := f(I)$ is an open interval, and $f^{-1} : J \to I$ is continuously differentiable.

Proof. That $f$ is one-to-one follows from the mean value theorem. By 4.4.3, $J$ is an interval and $f^{-1} : J \to I$ is continuous. Since continuous differentiability is a local property, 4.4.5 implies that $f^{-1}$ is continuously differentiable.

The following examples, as well as exercises below, establish the existence and basic properties of several well-known functions.

4.4.7 Example. Since $x = \sin y$ is strictly increasing on $[-\pi/2, \pi/2]$, the inverse function $y = \sin^{-1} x$ exists, is strictly increasing on $[-1, 1]$, and

$$\frac{dy}{dx} = \left(\frac{dx}{dy}\right)^{-1} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - x^2}}, \quad -1 < x < 1.$$

Similarly, $x = \cos y$ is strictly decreasing on $[0, \pi]$, hence $y = \cos^{-1} x$ exists, is strictly decreasing on $[-1, 1]$, and

$$\frac{dy}{dx} = \left(\frac{dx}{dy}\right)^{-1} = \frac{-1}{\sin y} = \frac{-1}{\sqrt{1 - x^2}}, \quad -1 < x < 1.$$

♦

An alternate approach to the preceding example is to define the inverse sine by

$$\sin^{-1} x = \int_0^x \frac{dt}{\sqrt{1 - t^2}}, \quad -1 < x < 1,$$

and then obtain the sine function as the inverse of $\sin^{-1}$. This allows the derivation of the standard properties of $\sin x$, and ultimately of the other trig functions, without relying on geometric arguments. The disadvantage of this approach is that verification of these properties is detailed and lengthy. Still another approach is based on complex infinite series. For the latter, the reader may wish to consult [7]. The following example illustrates the integral approach for the exponential function. Some of the assertions in the example rely on results from Chapters 5 and 6 but should be familiar to the reader.

4.4.8 Example. The natural logarithm function is defined by

$$\ln x := \int_1^x \frac{1}{t}\,dt, \quad x > 0.$$

One may show that all the familiar algebraic properties of the natural log follow from this definition. (See Exercise 5.) Since $\ln x$ is strictly increasing on $(0, +\infty)$, the inverse function

$$\exp x := \ln^{-1} x$$

exists and is strictly increasing. Since $\ln 2 > 0$, $\ln 2^n = n \ln 2 \to +\infty$ and $\ln 2^{-n} = -n \ln 2 \to -\infty$, hence

$$\lim_{x \to +\infty} \ln x = +\infty \quad \text{and} \quad \lim_{x \to 0^+} \ln x = -\infty.$$

It follows from these limits and the intermediate value theorem that the range of $\ln x$, that is, the domain of $\exp x$, must be $\mathbb{R}$. Thus, by Exercise 4,

$$\lim_{x \to -\infty} \exp x = 0 \quad \text{and} \quad \lim_{x \to +\infty} \exp x = +\infty.$$

From the fundamental theorem of calculus, proved in the next chapter,

$$\frac{d \ln y}{dy} = \frac{1}{y}, \quad \text{hence} \quad \frac{d \exp x}{dx} = \left(\frac{d \ln y}{dy}\right)^{-1} = y = \exp x.$$

Moreover, since

$$1 = \left.\frac{d \ln y}{dy}\right|_{y=1} = \lim_{n \to +\infty} \frac{\ln(1 + 1/n) - \ln 1}{1/n} = \lim_{n \to +\infty} \ln (1 + 1/n)^n,$$

continuity of $\exp$ and 2.2.4 imply that

$$\exp 1 = \lim_{n \to +\infty} \exp\big(\ln (1 + 1/n)^n\big) = \lim_{n \to +\infty} (1 + 1/n)^n = e.$$

Additional properties of $\exp x$ may be found in the exercises, including the identity $\exp r = e^r$, $r \in \mathbb{Q}$. Because of this identity, we frequently write $e^x$ for $\exp x$. Indeed, the function $\exp$ is the basis for rigorous definitions of the general exponential function $a^x$, $a > 0$, and the power function $x^a$, $x \ge 0$. (See Exercises 8 and 9.) ♦
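The integral approach of Example 4.4.8 can be mimicked numerically. The following is a rough sketch under stated assumptions: `ln` uses a midpoint rule with a hypothetical step count `n`, and `exp_via_inverse` inverts it by bisection on an assumed bracket $[0.01, 100]$, so it is only meaningful for $|y| < \ln 100$. Neither routine is intended as a practical implementation.

```python
import math


def ln(x, n=5000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x, for x > 0."""
    h = (x - 1) / n                       # negative when 0 < x < 1
    return h * sum(1 / (1 + (k + 0.5) * h) for k in range(n))


def exp_via_inverse(y, lo=0.01, hi=100.0, iters=60):
    """Invert ln by bisection; relies on ln being strictly increasing."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ln(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(ln(6.0), ln(2.0) + ln(3.0))         # additive property, approximately equal
print(exp_via_inverse(1.0), math.e)       # exp 1 compared with e
print((1 + 1 / 10**6) ** 10**6)           # (1 + 1/n)^n for n = 10^6
```

The library constant `math.e` appears only as a reference value for comparison; the two routines themselves use nothing but arithmetic.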

Exercises

1. Find $f^{-1}$ and its domain for each of the following functions $f$ with the given domain:
(a) $x^2 - 4x + 5$, $[2, +\infty)$.
(b)S $\dfrac{3x + 2}{2x + 3}$, $\mathbb{R} \setminus \{-3/2\}$.
(c) $\dfrac{5e^{-x} + 2}{3e^{-x} + 7}$, $(-\infty, +\infty)$.
(d) $\sin^2 x - 4 \sin x + 3$, $[-\pi/2, \pi/2]$.
(e) $e^x - 2e^{-x}$, $(-\infty, +\infty)$.
(f)S $\dfrac{2 + \cos x}{3 + \cos x}$, $(0, \pi)$.

2. Let $f(x) = ax + |x| + |x - 1|$. Find all values of $a$ for which $f^{-1}$ exists on $\mathbb{R}$. For these values, find $f^{-1}$.

3. Give an example of a one-to-one continuous function on the union of two intervals that is (a) not monotone, (b) strictly monotone but with discontinuous inverse.

4. Let $f$ be defined, continuous, and strictly increasing on $(a, b)$, so the limits
$$c := \lim_{x \to a^+} f(x) \quad \text{and} \quad d := \lim_{x \to b^-} f(x)$$
exist in $\overline{\mathbb{R}}$. Show that the domain of $f^{-1}$ is $(c, d)$ and that
$$\lim_{x \to c^+} f^{-1}(x) = a \quad \text{and} \quad \lim_{x \to d^-} f^{-1}(x) = b.$$

5. Verify the following properties of $\ln x$, as defined in 4.4.8:
(a) $\ln 1 = 0$, $\ln e = 1$.
(b)S $\ln(xy) = \ln x + \ln y$.
(c) $\ln(x/y) = \ln x - \ln y$.
(d) $\ln x^r = r \ln x$, $r \in \mathbb{Q}$.

6. Prove that $\exp(x + y) = \exp(x) \exp(y)$.

7. For $c, d \in \mathbb{R}$ with $c > 0$, define $c^d = \exp(d \ln c)$. Show that this definition agrees with the usual one if $d$ is rational and verify the following properties, where $x, y \in \mathbb{R}$ and $a, b > 0$.
(a) $\ln a^x = x \ln a$.
(b)S $a^x a^y = a^{x+y}$.
(c) $a^x / a^y = a^{x-y}$.
(d) $(a^x)^y = a^{xy}$.
(e) $(ab)^x = a^x b^x$.
(f) $a^{\ln b} = b^{\ln a}$.

8. Let $a > 0$, $a \ne 1$, and define $a^x$ as in Exercise 7. Find $\lim_{x \to -\infty} a^x$, $\lim_{x \to +\infty} a^x$, and $(a^x)'$.

9.S Let $a \in \mathbb{R}$ and for $x > 0$ define $x^a$ as in Exercise 7. Prove the power rule $(x^a)' = a x^{a-1}$.

10. Prove that $\tan x$ restricted to $(-\pi/2, \pi/2)$ has a differentiable inverse defined on $\mathbb{R}$. Find $\lim_{x \to -\infty} \tan^{-1} x$, $\lim_{x \to +\infty} \tan^{-1} x$, and $(\tan^{-1} x)'$.

11. Prove that $\sec x$ restricted to $[0, \pi/2) \cup [\pi, 3\pi/2)$ has a continuous inverse defined on $(-\infty, -1] \cup [1, +\infty)$. Show that $\sec^{-1} x$ is differentiable on $(-\infty, -1) \cup (1, +\infty)$ and compute its derivative. Also, find $\lim_{x \to -\infty} \sec^{-1} x$ and $\lim_{x \to +\infty} \sec^{-1} x$.

12. Verify the inequalities
(a) $\dfrac{x - 1}{x} < \ln x < x - 1$, $x > 1$.
(b) $|\tan^{-1} x - \tan^{-1} y| \le |x - y|$.
(c) $\dfrac{y - x}{\sqrt{1 - x^2}} < |\sin^{-1} y - \sin^{-1} x| < \dfrac{y - x}{\sqrt{1 - y^2}}$, $-1 < x < y < 1$.

13. Verify the identities
(a) $\tan(\sin^{-1} x) = \dfrac{x}{\sqrt{1 - x^2}}$, $-1 < x < 1$.
(b) $\sin^{-1} x + \cos^{-1} x = \pi/2$, $-1 \le x \le 1$.
(c)S $\cos^{-1}\!\left(\dfrac{x^2 - 1}{x^2 + 1}\right) + 2 \tan^{-1} x = \pi$, $x \ge 0$.
(d) $\cos^{-1} x = 2 \sin^{-1} \sqrt{\dfrac{1 - x}{2}}$, $-1 \le x \le 1$.
(e) $\tan^{-1} x + \tan^{-1}(2/x) + \tan^{-1}(x + 2/x) = \pi$, $x \ne 0$.

14.S Suppose $f$ satisfies $f(x + y) = f(x) f(y)$ for all $x, y \in \mathbb{R}$. Show that if $a := f'(0)$ exists, then $f(x) = f(0) e^{ax}$.

15. Suppose that $f : [0, 1] \to [0, 1]$ is continuous, one-to-one, onto, and $f = f^{-1}$. Prove that either $f(x) = x$ for all $x$ or $f$ is monotone decreasing.

16. Suppose $f'$ is one-to-one on an open interval $I$. Show that $f'$ is continuous and strictly monotone on $I$. (See Exercise 4.2.25.)

17. Let $f$ be differentiable on an open interval $I$ with $f' \ne 0$. Let $a, b \in I$ with $a < b$ and suppose that $f : [a, b] \to [a, b]$ is one-to-one and onto. Prove that there exists $c \in (a, b)$ such that
$$\frac{f(b) - f(a)}{f^{-1}(b) - f^{-1}(a)} = f'(c)\, f'\big(f^{-1}(c)\big).$$

18.S Let $f$ be twice differentiable and $f' \ne 0$ on an open interval $I$. Show that $(f^{-1})''(x)$ exists on $f(I)$ and find a formula.

4.5 L’Hospital’s Rule

The rule for calculating the limit of a quotient of functions, namely,

$$\lim_{\substack{x \to a \\ x \in E}} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a,\ x \in E} f(x)}{\lim_{x \to a,\ x \in E} g(x)}, \tag{4.4}$$

requires that the limits on the right be finite and the denominator be nonzero. If, instead, the limits in the quotient are both zero or $\pm\infty$, then the expression on the left in (4.4) is called an indeterminate form of type $\frac{0}{0}$ or $\frac{\pm\infty}{\pm\infty}$, respectively. There are other types of indeterminate forms, but all may be converted to one of these. The following theorem describes a method for evaluating these limits.

4.5.1 l’Hospital’s Rule. Let $J$ be an open interval, finite or infinite, and let $a \in \overline{\mathbb{R}}$ be an accumulation point of $J$. Suppose that $f$ and $g$ are differentiable on $E := J \setminus \{a\}$ and that $g(x) g'(x) \ne 0$ for every $x \in E$. If the limits

$$A := \lim_{\substack{x \to a \\ x \in E}} f(x), \quad B := \lim_{\substack{x \to a \\ x \in E}} g(x), \quad \text{and} \quad L := \lim_{\substack{x \to a \\ x \in E}} \frac{f'(x)}{g'(x)}$$

exist in $\overline{\mathbb{R}}$ and either $A = B = 0$ or $B = \pm\infty$, then

$$\lim_{\substack{x \to a \\ x \in E}} \frac{f(x)}{g(x)} = L.$$

Proof. There are a number of cases to consider, but the proofs of many of these are essentially the same. We prove the theorem for four fundamentally different cases and for one-sided limits, so $E = (a, c)$ or $(c, a)$ for some $c$. As a first step, we use the Cauchy mean value theorem to obtain, for every pair of distinct numbers $x, b \in E$, a number $\xi = \xi(x, b)$ between $x$ and $b$ such that

$$[f(x) - f(b)]\, g'(\xi) = [g(x) - g(b)]\, f'(\xi). \tag{4.5}$$

Now set $h(x) = \dfrac{f(x)}{g(x)}$.

Case 1: $A = B = 0$, $a$ and $L$ are finite, and $E = (a, c)$. Extend $f$ and $g$ continuously to $[a, c)$ by defining $f(a) = g(a) = 0$. Taking $b = a$ and $x \in (a, c)$ in (4.5) we see that

$$h(x) = \frac{f'(\xi)}{g'(\xi)}.$$

Since $\xi \to a$ as $x \to a$, $\lim_{x \to a^+} h(x) = L$, as required.

For the remaining cases, we use the Cauchy mean value theorem in the following form. Divide (4.5) by $g'(\xi) g(x)$ and solve the resulting equation for $h = f/g$ to obtain

$$h(x) = \frac{f(b)}{g(x)} + \left[1 - \frac{g(b)}{g(x)}\right] \frac{f'(\xi)}{g'(\xi)}, \quad x, b \in E. \tag{4.6}$$

Case 2: $A = B = 0$, $a = L = +\infty$, and $E = (c, +\infty)$. Let $M > 0$ and choose $x_0 \in E$ such that

$$\frac{f'(x)}{g'(x)} > 2M \quad \text{for } x > x_0.$$

Let $b > x > x_0$. For large $b$, $g(b)/g(x) < 1/2$, hence from (4.6)

$$h(x) \ge \frac{f(b)}{g(x)} + \frac{1}{2}(2M) = \frac{f(b)}{g(x)} + M.$$

Letting $b \to +\infty$ we see that $h(x) \ge M$. Therefore, $\lim_{x \to +\infty} h(x) = +\infty$.

Case 3: $B = +\infty$, $a$ and $L$ are finite, and $E = (c, a)$. Given $\varepsilon > 0$, choose $b \in E$ such that

$$\left|\frac{f'(t)}{g'(t)} - L\right| < \varepsilon/2 \quad \text{for all } t \in (b, a).$$

Let $x \in (b, a)$. By (4.6),

$$h(x) - \frac{f'(\xi)}{g'(\xi)} = \frac{f(b)}{g(x)} - \frac{g(b)}{g(x)} \cdot \frac{f'(\xi)}{g'(\xi)}.$$

Since the right side tends to 0 as $x \to a$,

$$|h(x) - L| \le \left|h(x) - \frac{f'(\xi)}{g'(\xi)}\right| + \left|\frac{f'(\xi)}{g'(\xi)} - L\right| < \varepsilon/2 + \varepsilon/2 = \varepsilon$$

for all $x$ near $a$. Therefore, $\lim_{x \to a^-} h(x) = L$.

Case 4: $B = +\infty$, $a = L = +\infty$, and $E = (c, +\infty)$. Given $M > 0$, choose $b > c$ such that

$$\frac{f'(t)}{g'(t)} > 3M \quad \text{for all } t > b.$$

Let $x > b$ such that $g(x) > g(b)$. By (4.6),

$$h(x) \ge \frac{f(b)}{g(x)} + \left[1 - \frac{g(b)}{g(x)}\right] 3M.$$

Since the quotients on the right side tend to zero, for all sufficiently large $x$ we have

$$h(x) \ge -\frac{M}{2} + \left[1 - \frac{1}{2}\right](3M) = M.$$

Therefore, $\lim_{x \to +\infty} h(x) = +\infty$.

The following examples illustrate typical applications of l’Hospital’s rule.

Examples. (a) The limit

$$L := \lim_{x \to 0} \frac{x - \tan x}{x^3}$$

is of the form $\frac{0}{0}$, hence

$$L = \lim_{x \to 0} \frac{1 - \sec^2 x}{3x^2} = \lim_{x \to 0} \frac{2 \sec^2 x \tan x}{-6x} = \lim_{x \to 0} \frac{\sec^4 x + 2 \sec^2 x \tan^2 x}{-3} = -\frac{1}{3}.$$

Note that each step except the last produces a limit of the form $\frac{0}{0}$, allowing another application of l’Hospital’s rule. The validity of each step is ultimately justified by the existence of the final limit.
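As an informal check on example (a), one can tabulate the original quotient and the first derivative quotient for small $x$. This is only a numerical illustration with hypothetical helper names (`quotient`, `derivative_quotient`); it suggests the value $-1/3$ but proves nothing, and for much smaller $x$ floating-point cancellation would spoil the digits.

```python
import math


def quotient(x):
    """(x - tan x) / x^3, the original 0/0 form."""
    return (x - math.tan(x)) / x**3


def derivative_quotient(x):
    """(1 - sec^2 x) / (3 x^2), the quotient after one application of the rule."""
    return (1 - 1 / math.cos(x) ** 2) / (3 * x**2)


for x in (1e-1, 1e-2, 1e-3):
    print(x, quotient(x), derivative_quotient(x))   # both columns approach -1/3
```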

(b) The limit

$$L := \lim_{x \to +\infty} \frac{\sin(1/x)}{e^{1/x} - 1}$$

is of the form $\frac{0}{0}$; however, it is complicated to apply l’Hospital’s rule directly. Making the substitution $y = 1/x$ produces a more tractable problem:

$$L = \lim_{y \to 0^+} \frac{\sin y}{e^y - 1} = \lim_{y \to 0^+} \frac{\cos y}{e^y} = 1.$$

(c) The limit

$$L := \lim_{x \to +\infty} x \sin(1/x)$$

is of the form $\infty \cdot 0$, but a simple algebraic manipulation produces the form $\frac{0}{0}$:

$$L = \lim_{x \to +\infty} \frac{\sin(1/x)}{1/x} = \lim_{y \to 0^+} \frac{\sin y}{y} = 1.$$

Here, l’Hospital’s rule was unnecessary, since we could use a known limit.

(d) The limit $L := \lim_{x \to 1^+} x^{1/(x^p - 1)}$, $p > 0$, is of the form $1^\infty$, so we take logarithms to obtain the form $\frac{0}{0}$:

$$\lim_{x \to 1^+} \ln\!\left[x^{1/(x^p - 1)}\right] = \lim_{x \to 1^+} \frac{\ln x}{x^p - 1} = \lim_{x \to 1^+} \frac{1/x}{p x^{p-1}} = \frac{1}{p}.$$

Thus $L = e^{1/p}$.

(e) The technique used in (d) shows that

$$\lim_{x \to +\infty} \left(1 + \frac{t}{x}\right)^x = e^t,$$

since

$$\lim_{x \to +\infty} \ln\left(1 + \frac{t}{x}\right)^x = \lim_{y \to 0^+} \frac{\ln(1 + ty)}{y} = \lim_{y \to 0^+} \frac{t}{1 + ty} = t.$$

(f) The limit

$$L := \lim_{x \to \pi/2^+} \left[\frac{1}{x - \pi/2} + \sec x\right]$$

is of the form $\infty - \infty$. Combining fractions we obtain a limit of the form $\frac{0}{0}$. Thus

$$\begin{aligned}
L &= \lim_{x \to \pi/2^+} \frac{\cos x + x - \pi/2}{(x - \pi/2) \cos x} \\
&= \lim_{x \to \pi/2^+} \frac{1 - \sin x}{(\pi/2 - x) \sin x + \cos x} \\
&= \lim_{x \to \pi/2^+} \frac{-\cos x}{(\pi/2 - x) \cos x - 2 \sin x} \\
&= 0.
\end{aligned}$$

♦


Exercises 1. Evaluate the following limits, where p, q > 0: epx − eqx epx − ep (b) lim x→0 x→1 tan(x − 1) sin x ln(sin px) (d) S lim x 1 − e1/x (e) lim+ x→+∞ x→0 ln(sin qx) −1 x − tan x 1 1 S (g) lim (h) lim+ − x→0 x − sin−1 x x→0 x sin x (a) S lim

(j)

(xp )

lim (sin x)(ln x) (k) lim+ x x→0+ x→0 x−1 1 x x+1 − (n) lim+ (m) S lim+ x→1 x→0 tan x x x−1 2 1 − cos x sin x + cos x − 1 (p) S lim 2 (q) lim x→0 x + x3 sin x x→0 ln(1 + x) −x 1 S 1/(ln ln x)p (s) lim x (t) lim 1− √ x→+∞ x→+∞ x S

(v) S lim+ xsin x x→0

x cos x − sin x x→0 x2 sin x

(w) lim

ln(3x2 − 1) x→+∞ ln(5x2 − 1) p sin(px) − p2 x (f) lim x→0 x3 (c) lim

(i) lim+ ln(x − 1) ln x x→1

1/x2

(l) lim (cos x) x→0 √ x ln x (o) lim x→1 x − 1 x x−1 (r) lim x→+∞ x + 1 (u) lim+ (sin x)

x

x→0

(x) lim+ x→0

(1 + x)1/x − e x

2. For each function $f : (0, 1] \to \mathbb{R}$ below, define $f(0)$ so that $f$ is continuous on $[0, 1]$.
(a) $\dfrac{1 - e^x}{x}$.
(b) $\dfrac{\ln(1 + x)}{x}$.
(c)S $\dfrac{\sin 5x}{\sin 3x}$.
(d) $\dfrac{x}{\tan x}$.
(e) $x \ln x$.
(f) $\dfrac{1 - \cos 2x}{1 - \cos 3x}$.

3. Find $\lim_n a_n$, where $a_n =$
(a)S $\sin^{1/n}(1/n)$.
(b) $n - n^2 \ln(1 + 1/n)$.
(c) $n\,[(1 + 1/n)^n - e]$.

4. Show that
$$(n + 1/n)^p - n^p \to \begin{cases} 0 & \text{if } p < 2, \\ 2 & \text{if } p = 2, \\ +\infty & \text{if } p > 2. \end{cases}$$

5. By considering the sequences $\{n\}$ and $\{n + 1/n\}$, use l’Hospital’s rule to prove that $e^x$ is not uniformly continuous on $[0, +\infty)$.

6.S Let $f(x) = x^{1 + 1/x}$. Evaluate $\lim_n \big[f(n + 1) - f(n)\big]$.

7. Let $f$ be differentiable on $(a, b)$ and suppose that $\lim_{x \to a^+} f(x)$ and $\lim_{x \to a^+} f'(x)$ exist in $\mathbb{R}$. Find a continuous extension of $f$ to $[a, b)$ such that $f'$ exists and is continuous at $a$.

8. Let $g$ be differentiable on $(1, +\infty)$ and $h$ differentiable on $(-\infty, 1]$ with
$$\lim_{x \to 1^+} g(x) = h(1) \quad \text{and} \quad \lim_{x \to 1^+} g'(x) = h_\ell'(1). \tag{$\dagger$}$$
Define
$$f(x) = \begin{cases} g(x) & \text{if } x > 1, \\ h(x) & \text{if } x \le 1. \end{cases}$$
Show that $f$ is differentiable at $x = 1$ and hence on $\mathbb{R}$. Conversely, suppose that $f'(1)$ exists. Do the limit equations in ($\dagger$) hold?

9.S Let $f$ and $g$ be differentiable on $(0, +\infty)$ with
$$\lim_{x \to +\infty} f(x) = \lim_{x \to +\infty} g(x) = +\infty \quad \text{and} \quad \lim_{x \to +\infty} \frac{f'(x)}{g'(x)} \in (0, +\infty).$$
Evaluate
$$\lim_{x \to +\infty} \frac{\ln f(x)}{\ln g(x)}.$$

10. Let $f$ be differentiable in a neighborhood of $a$ and suppose that $f''(a)$ exists. For $\alpha, \beta \in \mathbb{R}$ calculate
(a)S $\displaystyle\lim_{h \to 0} \frac{\beta f(a + \alpha h) - \alpha f(a + \beta h) + (\alpha - \beta) f(a)}{h^2}$.
(b) $\displaystyle\lim_{h \to 0} \frac{f(a + \alpha h) + f(a + \beta h) - 2 f(a)}{h^2}$ if $f'(a) = 0$.

11. Suppose that $f$ has $n$ derivatives on $[a, +\infty)$ and that $\lim_{x \to +\infty} f^{(n)}(x)$ exists in $\mathbb{R}$. Prove that $\lim_{x \to +\infty} f(x)/x^n$ exists in $\mathbb{R}$.

12.S Suppose that $f$ has $n$ derivatives on $(0, a)$ and $L := \lim_{x \to 0^+} x^{2n} f^{(n)}(x)$ exists in $\mathbb{R}$. Find $\lim_{x \to 0^+} x^n f(x)$ in terms of $L$.

13. Let $f$ be differentiable on $(1, +\infty)$ and $\lim_{x \to +\infty} f(x) = 0$. Prove that if $\lim_{x \to +\infty} x^2 f'(x)$ exists in $\mathbb{R}$, then $\lim_{x \to +\infty} x f(x)$ also exists in $\mathbb{R}$. Is the converse true?

14. Suppose that, in a deleted neighborhood of 0, $f$ is differentiable with $f' \ne 0$ and that $\lim_{x \to 0} f(x) = 0$. Prove that if $\lim_{x \to 0} f(x)/f'(x)$ exists, then it must equal 0.

15. Let $g(x)$ be differentiable on $(1, \infty)$ with $g$ and $g'$ nonzero and let $f(x)$ be differentiable in a neighborhood of 0. Suppose that $\lim_{x \to +\infty} g(x) = 0$, $f(0) = 0$ and $f'$ is continuous at 0. Find
$$\lim_{x \to +\infty} \frac{f(g(x))}{g(x)}.$$
Give nontrivial examples of functions $f$ and $g$ that satisfy these conditions.

16.S Let $f$ and $g$ be differentiable on $(1, +\infty)$ with $g' \ne 0$ and suppose that $\lim_{x \to +\infty} f(x) = \lim_{x \to +\infty} g(x) = +\infty$ and that the limit $L := \lim_{x \to +\infty} f'(x)$ exists in $\mathbb{R}$. Find
$$\lim_{x \to +\infty} \frac{f(g(x))}{g(x)}.$$
Give nontrivial examples that satisfy these conditions with $L$ finite.

17. Let $f$ be differentiable on $(1, +\infty)$ and suppose that $\lim_{x \to +\infty} f(x)$ and $\lim_{x \to +\infty} f'(x)$ exist in $\mathbb{R}$. Prove that the second limit must be zero. Does the assertion still hold if $\lim_{x \to +\infty} f(x)$ is infinite?

18.S Let $f$ be differentiable on $(1, +\infty)$ and suppose that $\lim_{x \to +\infty} f(x)$ and $\lim_{x \to +\infty} x f'(x)$ exist in $\mathbb{R}$. Prove that the second limit is zero. Does the assertion still hold if $\lim_{x \to +\infty} f(x)$ is infinite?

19. Let $f$ be differentiable in a deleted neighborhood of 0 and suppose that $\lim_{x \to 0} f(x)$ and $\lim_{x \to 0} f'(x) \tan x$ exist in $\mathbb{R}$. Prove that the second limit must be 0. Does the assertion still hold if $\lim_{x \to 0} f(x)$ is infinite?

20. Let $f$ be differentiable on $(0, b)$ and suppose that the limits $\lim_{x \to 0^+} f(x)$ and $\lim_{x \to 0^+} x^2 f'(x)$ exist in $\mathbb{R}$. Prove that one of these limits must be zero. Does the assertion still hold if $\lim_{x \to 0^+} f(x)$ is infinite?

21. Let $f''$ exist and be continuous on $(-1, 1)$ and $f(0) = f'(0) = 0$. Prove that there exists a continuous function $g$ on $(0, 1)$ such that $f(x) = x^2 g(x)$. Must $g$ be differentiable at 0?

4.6 Taylor’s Theorem on R

Taylor’s theorem may be viewed as a generalization of the mean value theorem. Its importance derives from its use in establishing various inequalities and from its fundamental connection with power series.

4.6.1 Taylor’s Theorem. Let $f$ have $n + 1$ derivatives in an open interval $I$. Then, for each $x, a \in I$ with $x \ne a$, there exists a number $c$ between $x$ and $a$ such that

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + \frac{f^{(n+1)}(c)}{(n + 1)!} (x - a)^{n+1}. \tag{4.7}$$

Proof. Assume for definiteness that $a < x$. Define a function $g$ on $[a, x]$ by

$$g(t) = \sum_{k=0}^{n} \frac{f^{(k)}(t)}{k!} (x - t)^k + \alpha\, \frac{(x - t)^{n+1}}{(n + 1)!} - f(x), \tag{4.8}$$

where $\alpha$ is chosen so that $g(a) = 0$. Since $g$ is continuous on $[a, x]$, differentiable on $(a, x)$ and $g(x) = g(a)$, there exists, by Rolle’s theorem, $c \in (a, x)$ such that $g'(c) = 0$. From the calculations

$$\frac{d}{dt}\left[\frac{f^{(k)}(t)}{k!} (x - t)^k\right] = \begin{cases} \dfrac{f^{(k+1)}(t)}{k!} (x - t)^k - \dfrac{f^{(k)}(t)}{(k - 1)!} (x - t)^{k-1} & \text{if } k \ge 1, \\[2ex] f'(t) & \text{if } k = 0, \end{cases}$$

we have

$$\begin{aligned}
g'(t) &= \sum_{k=0}^{n} \frac{f^{(k+1)}(t)}{k!} (x - t)^k - \sum_{k=0}^{n-1} \frac{f^{(k+1)}(t)}{k!} (x - t)^k - \alpha\, \frac{(x - t)^n}{n!} \\
&= \frac{f^{(n+1)}(t)}{n!} (x - t)^n - \alpha\, \frac{(x - t)^n}{n!}.
\end{aligned}$$

In particular,

$$0 = g'(c) = \frac{f^{(n+1)}(c)}{n!} (x - c)^n - \alpha\, \frac{(x - c)^n}{n!},$$

hence $\alpha = f^{(n+1)}(c)$. Thus from (4.8),

$$0 = g(a) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + \frac{f^{(n+1)}(c)}{(n + 1)!} (x - a)^{n+1} - f(x),$$

which is (4.7).

Equation (4.7) is frequently written $f(x) = T_n(x, a) + R_n(x, a)$, where

$$T_n(x, a) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k \quad \text{and} \quad R_n(x, a) = \frac{f^{(n+1)}(c)}{(n + 1)!} (x - a)^{n+1}.$$

The expression $T_n(x, a)$ is called the $n$th Taylor polynomial of $f$ about $a$, and $R_n(x, a)$ is called the remainder. It may be shown that $T_n(x, a)$ is the unique polynomial of degree $\le n$ that best approximates $f$ near $a$ in the sense that

$$\lim_{x \to a} \frac{f(x) - T_n(x, a)}{(x - a)^n} = 0.$$

(See Exercise 4.) The remainder term $R_n(x, a)$ has other forms, one of which is given in Exercise 3. Observe that if $R_n(x, a) \to 0$ as $n \to +\infty$, then $T_n(x, a) \to f(x)$, which implies that $f(x)$ is expressible as a power series about $a$. We exploit this idea in Section 7.4. The following application of Taylor’s theorem is a generalization of the second derivative test.

4.6.2 nth Derivative Test. Let $f$ have $n$ continuous derivatives on an open interval $I$ and let $a \in I$ with $f^{(j)}(a) = 0$, $1 \le j \le n - 1$, and $f^{(n)}(a) \ne 0$.

(a) If $n$ is even and $f^{(n)}(a) > 0$ ($f^{(n)}(a) < 0$), then $f$ has a local minimum (local maximum) at $a$.

(b) If $n$ is odd, then $f$ has neither a local minimum nor a local maximum at $a$.

Proof. Assume $f^{(n)}(a) > 0$. By continuity, $f^{(n)}(x) > 0$ for all $x$ in an open interval $J$ containing $a$. Let $x \in J$, $x \ne a$. By Taylor’s theorem, there exists $c$ between $a$ and $x$ such that

$$f(x) = f(a) + f^{(n)}(c)\, \frac{(x - a)^n}{n!}.$$

Thus if $n$ is even, then $f(x) > f(a)$, hence $f$ has a local minimum at $a$. If $n$ is odd, then $f(x) > f(a)$ if $x > a$ and $f(x) < f(a)$ if $x < a$, so $f$ has neither a local maximum nor a local minimum at $a$. A similar argument works for the case $f^{(n)}(a) < 0$.

Note that the familiar second derivative test, obtained by taking $n = 2$ in the theorem, is inconclusive for the function $f(x) = x^4$ at $a = 0$. Here, one must take $n = 4$.
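Returning to the decomposition $f = T_n + R_n$ of Taylor’s theorem 4.6.1, the following sketch compares the actual error of the Taylor polynomial of the exponential function with the bound obtained from the remainder formula. It is an illustration only: the choice $f = \exp$, the helper names `taylor_exp` and `lagrange_bound_exp`, and the use of the library exponential as a reference value are all assumptions made for the example.

```python
import math


def taylor_exp(x, n, a=0.0):
    """T_n(x, a) for f = exp, using f^(k)(a) = e^a for every k."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))


def lagrange_bound_exp(x, n, a=0.0):
    """Upper bound for |R_n(x, a)| when f = exp.

    Since c lies between a and x, |f^(n+1)(c)| = e^c <= e^max(a, x).
    """
    return math.exp(max(a, x)) * abs(x - a) ** (n + 1) / math.factorial(n + 1)


for n in (2, 5, 10):
    err = abs(math.exp(1.0) - taylor_exp(1.0, n))
    print(n, err, lagrange_bound_exp(1.0, n))   # the error stays below the bound
```

Because the bound tends to 0 as $n \to +\infty$ for each fixed $x$, this is an instance of the situation $R_n(x, a) \to 0$ mentioned after the theorem.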

Exercises

1. Define $f(0) = 0$ and $f(x) = e^{-1/x^2}$, $x \ne 0$. Prove that $f^{(n)}$ exists on $\mathbb{R}$ and $f^{(n)}(0) = 0$ for all $n$. Conclude that every Taylor polynomial for $f$ about 0 is identically 0.

2. Verify the following inequalities:
(a) $\displaystyle\sum_{k=0}^{2n-1} (-1)^k x^k < \frac{1}{1 + x} < \sum_{k=0}^{2n} (-1)^k x^k$, $x > 0$.
(b)S $\displaystyle\sum_{k=0}^{2n-1} \frac{(-1)^k}{k!} x^k < e^{-x} < \sum_{k=0}^{2n} \frac{(-1)^k}{k!} x^k$, $x > 0$.
(c) $\displaystyle\sum_{k=1}^{2n} \frac{(-1)^{k+1}}{(2k - 1)!} x^{2k-1} < \sin x < \sum_{k=1}^{2n+1} \frac{(-1)^{k+1}}{(2k - 1)!} x^{2k-1}$, $0 < x < \pi$.
(d) $\displaystyle\sum_{k=0}^{2n-1} \frac{(-1)^k}{(2k)!} x^{2k} < \cos x < \sum_{k=0}^{2n} \frac{(-1)^k}{(2k)!} x^{2k}$, $0 < x < \pi$.
(e) $\displaystyle\sum_{k=1}^{n-1} \frac{(-1)^{k-1}}{k} x^k < \ln(1 + x) < \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k} x^k$ if $n$ is odd, the reverse inequalities if $n$ is even.

3.S ⇓² Show that if $f^{(n+1)}$ is continuous on $I$, then
$$R_n(x, a) = \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t)\,dt.$$
Hint. Integrate by parts $n$ times.

4. Prove that a polynomial $P_n(x) = \sum_{k=0}^{n} a_k (x - a)^k$ satisfies
$$\lim_{x \to a} \frac{f(x) - P_n(x)}{(x - a)^n} = 0$$
iff $P_n = T_n$, the $n$th Taylor polynomial of $f$ about $a$.

5.S Let $P(x) = \sum_{k=0}^{n} a_k (x - a)^k = \sum_{k=0}^{n} b_k (x - b)^k$. Show that
$$b_k = \sum_{j=0}^{n-k} \binom{j + k}{k} (b - a)^j a_{k+j}.$$

6. Let $P$ be a polynomial of degree $n$. Prove that the polynomials $P(x \pm 1)$ may be written as linear combinations of $P^{(k)}(x)$, $k = 0, \ldots, n$. Find simplified expressions for $P(x + 1) \pm P(x - 1)$.

7. Let $f$ have $n$ derivatives on $[0, 1]$. Show that for each $y \ne f(1)$ there exists an extension $g$ of $f$ to $[0, +\infty)$ with $n$ derivatives such that $g(b) = y$ for some $b > 1$.

*4.7 Newton’s Method

A simple zero of a differentiable function $f$ is a number $z$ such that $f(z) = 0$ and $f'(z) \ne 0$. Newton’s method is a rapidly converging recursion scheme for approximating such a zero. The idea is to choose $x_1$ near $z$ and then define a sequence $\{x_n\}$ recursively by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n = 1, 2, \ldots, \tag{4.9}$$

as illustrated in Figure 4.8. Under suitable conditions, the sequence is well-defined and converges to $z$, hence may be used to approximate $z$ to (theoretically) any desired degree of accuracy.

4.7.1 Newton’s Method. Let $f''$ be continuous on an open interval $I$ and let $z$ be a simple zero of $f$ in $I$. If $x_1$ is chosen sufficiently near $z$, then the sequence $\{x_n\}$ lies in $I$ and converges to $z$.

Footnote 2 (to Exercise 3 of Section 4.6): This exercise will be used in 5.6.3.

[Figure 4.8: Newton’s method. The curve $y = f(x)$ and the tangent line $y = f(x_n) + f'(x_n)(x - x_n)$; points $z$, $x_{n+2}$, $x_{n+1}$, $x_n$ marked on the $x$-axis.]

Proof. Since $f'(z) \ne 0$, there exists a neighborhood $I_z$ of $z$ contained in $I$ on which $|f'| \ge c > 0$. Suppose that $x_n \in I_z$. By Taylor’s theorem, for each $x \in I$ there exists $\xi$ between $x$ and $x_n$ such that

$$f(x) = f(x_n) + f'(x_n)(x - x_n) + \tfrac{1}{2} f''(\xi)(x - x_n)^2.$$

In particular,

$$0 = f(z) = f(x_n) + f'(x_n)(z - x_n) + \tfrac{1}{2} f''(\xi)(z - x_n)^2.$$

Dividing by $f'(x_n)$, we have

$$x_{n+1} - z = x_n - z - \frac{f(x_n)}{f'(x_n)} = \frac{f''(\xi)}{2 f'(x_n)} (z - x_n)^2.$$

Thus if $d$ is the maximum of $|f''|$ on $I_z$, then

$$|x_{n+1} - z| \le \alpha |x_n - z|^2, \quad \alpha := \frac{d}{2c}.$$

Iterating, we have

$$|x_{n+1} - z| \le \cdots \le \alpha^{2^{k+1} - 1} |x_{n-k} - z|^{2^{k+1}} \le \cdots \le \alpha^{2^n - 1} |x_1 - z|^{2^n}.$$

Thus if $x_1$ is sufficiently near $z$, and in particular if $\alpha |x_1 - z| < 1$, then $x_n \in I_z$ for all $n$ and $x_n \to z$.

4.7.2 Example. Let $f(x) = \sin x - x/3$. Since

$$f(3\pi/4) = 1/\sqrt{2} - \pi/4 < 0 < 1 - \pi/6 = f(\pi/2),$$

$f$ has a zero in $[\pi/2, 3\pi/4]$ by the intermediate value theorem. Taking $x_1 = 3\pi/4$ yields the zero 2.27886266, accurate to eight decimal places. Taking $x_1 = 1$ produces the symmetric zero $-2.27886266$, while $x_1 = \pi/4$ produces 0. ♦

If $x_1$ is not sufficiently near $z$, then the sequence $\{x_n\}$ may converge more slowly to $z$ or may not converge at all (see Exercise 6).

4.7.3 Example. For an approximate solution of $e^x = 2 - x$ we apply Newton’s method to $f(x) = e^x + x - 2$. By the intermediate value theorem, $f$ has a zero in $(0, 1)$. The recursion formula for $f$ is

$$x_{n+1} = x_n - (e^{x_n} + x_n - 2)(e^{x_n} + 1)^{-1}.$$

Table 4.1 gives the first few terms of the sequence $\{x_n\}$ and the corresponding values of $f$ (accurate up to seven decimal places) for the initial values $x_1 = 1$ and $x_1 = 5$.

TABLE 4.1: Newton’s method for $e^x + x - 2 = 0$.

Initial value x1 = 1:
  n        1            2            3            4            5
  x_n      1            .5378828     .4456167     .4428567     .4428544
  f(x_n)   1.7182818    .2502604     .0070696     .0000059     .0000000

Initial value x1 = 5:
  n        1            2            3            4            5
  x_n      5            3.9866142    2.9686340    1.9701667    1.0961884
  f(x_n)   151.4131591  55.8587993   20.4339472   7.1420387    2.0889256

The convergence is significantly slower for the larger value. The solution, accurate to 10 decimal places, is .4428544010. ♦
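The recursion (4.9) is short enough to write out directly. The following is a minimal sketch rather than a robust root finder: the function name `newton`, the stopping tolerance `tol`, and the iteration cap `max_iter` are illustrative assumptions, and the code does not check that the starting point is near enough to a simple zero for the convergence theorem to apply.

```python
import math


def newton(f, fprime, x1, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) and return the list of iterates.

    Stops when two successive iterates differ by less than `tol`
    or after `max_iter` steps, whichever comes first.
    """
    xs = [x1]
    for _ in range(max_iter):
        x = xs[-1]
        d = fprime(x)
        if d == 0:
            raise ZeroDivisionError("f'(x_n) = 0, so the recursion is undefined")
        xs.append(x - f(x) / d)
        if abs(xs[-1] - x) < tol:
            break
    return xs


f = lambda x: math.exp(x) + x - 2        # the function of Example 4.7.3
fp = lambda x: math.exp(x) + 1

for x1 in (1.0, 5.0):
    xs = newton(f, fp, x1)
    print(x1, len(xs) - 1, xs[-1])       # start, number of steps, final iterate
```

Returning the whole list of iterates, rather than only the last one, makes it easy to compare the first few terms with a table such as Table 4.1.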

Exercises

1. Find a zero, accurate to eight decimal places, of the given polynomial in the indicated interval.
(a)S $x^3 - x + 2$, $[-2, -1]$.
(b) $x^3 + x + 1$, $[-1, 0]$.
(c) $x^3 - 2x + 2$, $[-2, -1]$.
(d)S $x^5 - 2x + 3$, $[-2, -1]$.
(e) $x^7 - x - 1$, $[1, 2]$.
(f) $x^4 - 2x^3 + 5x^2 - 8x - 6$, $[2, 3]$.
(g)S $20x^4 - 20x^3 - 8x^2 + 4x - 1$, $[1, 2]$.
(h) $20x^4 - 20x^3 - 4x + 1$, $[1, 2]$.

2. Find a solution of the given equation in the indicated interval, correct to eight decimal places.
(a)S $\sin x = x^2$, $[.5, 1]$.
(b) $\sin x = x^3$, $[.5, 1]$.
(c) $\ln x + x = 2$, $[1, 2]$.
(d) $2 \cos x = e^x$, $[0, 1]$.
(e)S $\ln x = e^{-x}$, $[1, 2]$.
(f) $\tan x + x = 1$, $[0, 1]$.

3. Show that Newton’s method applied to the function $x^{-1} - c$ produces the equation $x_{n+1} = 2x_n - c x_n^2$. Use this to find $1/2.34567$, correct to eight decimal places. Check your answer with a calculator.

4.S Use Newton’s method to find $\sqrt{63}$ correct to eight decimal places. Check your answer with a calculator.

5. What happens when you apply Newton’s method with $x_1 = 1$ to the polynomial in part (c) of Exercise 1?

6. Show that the sequence generated by Newton’s method applied to $f(x) = x^{1/3}$ cannot converge for any value of $x_1 \ne 0$.

Chapter 5 Riemann Integration on R

5.1 The Riemann–Darboux Integral

Throughout this section, $f$ denotes an arbitrary bounded, real-valued function on a closed and bounded interval $[a, b]$.

The first step in the development of the Riemann–Darboux integral is to partition the interval $[a, b]$ into finitely many subintervals, which are used to form upper and lower sums of $f$. Under suitable conditions, the sums converge to the integral.

5.1.1 Definition. A partition of $[a, b]$ is a set $\mathcal{P} = \{x_0, x_1, \ldots, x_{n-1}, x_n\}$, where

$$x_0 := a < x_1 < \cdots < x_{n-1} < x_n := b.$$

The points $x_1, \ldots, x_{n-1}$ are called the interior points of the partition. The mesh of the partition is defined as

$$\|\mathcal{P}\| := \max_{1 \le j \le n} \Delta x_j, \quad \text{where } \Delta x_j := x_j - x_{j-1}, \ 1 \le j \le n.$$

A refinement of $\mathcal{P}$ is a partition containing $\mathcal{P}$. The common refinement of partitions $\mathcal{P}$ and $\mathcal{Q}$ is the partition $\mathcal{P} \cup \mathcal{Q}$. ♦

5.1.2 Example. Let $p \in \mathbb{N}$. Then, for each $n \in \mathbb{N}$,

$$\mathcal{P}_n := \{j/p^n : j = 0, 1, \ldots, p^n\}$$

is a partition of $[0, 1]$, $\|\mathcal{P}_n\| = p^{-n}$, and $\mathcal{P}_{n+1}$ is a refinement of $\mathcal{P}_n$. ♦

5.1.3 Definition. The lower and upper (Darboux) sums of $f$ over a partition $\mathcal{P}$ of $[a, b]$ are defined, respectively, by

$$\underline{S}(f, \mathcal{P}) := \sum_{j=1}^{n} m_j \Delta x_j \quad \text{and} \quad \overline{S}(f, \mathcal{P}) := \sum_{j=1}^{n} M_j \Delta x_j,$$

where

$$m_j = m_j(f) := \inf_{x_{j-1} \le x \le x_j} f(x) \quad \text{and} \quad M_j = M_j(f) := \sup_{x_{j-1} \le x \le x_j} f(x).$$

♦
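To make Definition 5.1.3 concrete, the sketch below computes lower and upper sums in the special case of an increasing function, where the infimum and supremum on each subinterval are attained at the left and right endpoints. That restriction is an assumption of the example, not of the definition. The helper names are hypothetical, and the partitions are those of Example 5.1.2 with $p = 2$.

```python
def darboux_sums_increasing(f, partition):
    """Lower and upper Darboux sums of an increasing function over a partition.

    For increasing f, m_j = f(x_{j-1}) and M_j = f(x_j) on [x_{j-1}, x_j].
    """
    lower = upper = 0.0
    for left, right in zip(partition, partition[1:]):
        dx = right - left
        lower += f(left) * dx
        upper += f(right) * dx
    return lower, upper


def dyadic_partition(n):
    """P_n = {j / 2^n : j = 0, ..., 2^n}, a partition of [0, 1]."""
    return [j / 2**n for j in range(2**n + 1)]


square = lambda x: x * x                  # increasing on [0, 1]

for n in (1, 2, 4, 8):
    lo, up = darboux_sums_increasing(square, dyadic_partition(n))
    print(n, lo, up, up - lo)             # the gap up - lo equals 2^(-n) here
```

For a general bounded function the infimum and supremum on each subinterval would have to be found by other means, and sampling at finitely many points only approximates them.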

108

A Course in Real Analysis

A geometric interpretation of the upper and lower sums for a positive continuous function is given in Figure 5.1. The lower (upper) sum is the total area of the smaller (larger) rectangles.

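FIGURE 5.1: Upper and lower sums of $f$.

The following proposition asserts that refinements increase lower sums and decrease upper sums.

5.1.4 Proposition. If $\mathcal{Q}$ is a refinement of $\mathcal{P}$, then
\[ \underline{S}(f, \mathcal{P}) \le \underline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{P}). \]

Proof. The middle inequality is clear. To prove the rightmost inequality, let $\mathcal{P} = \{x_0 = a < x_1 < \cdots < x_{n-1} < x_n = b\}$ and assume first that $\mathcal{Q} = \mathcal{P} \cup \{c\}$. Choose $k$ so that $x_{k-1} < c < x_k$ and set
\[ M_k' = \sup_{x_{k-1} \le x \le c} f(x) \quad \text{and} \quad M_k'' = \sup_{c \le x \le x_k} f(x). \]
Then $M_k', M_k'' \le M_k$, hence
\begin{align*}
\overline{S}(f, \mathcal{Q}) &= \sum_{j=1}^{k-1} M_j \Delta x_j + \sum_{j=k+1}^{n} M_j \Delta x_j + M_k'(c - x_{k-1}) + M_k''(x_k - c) \\
&\le \sum_{j=1}^{k-1} M_j \Delta x_j + \sum_{j=k+1}^{n} M_j \Delta x_j + M_k(c - x_{k-1}) + M_k(x_k - c) \\
&= \overline{S}(f, \mathcal{P}).
\end{align*}
For the general case, observe that any refinement $\mathcal{Q}$ of $\mathcal{P}$ may be obtained by successively adding points to $\mathcal{P}$. At each step, the upper sum is decreased so that ultimately one obtains the desired inequality. The proof for lower sums is similar.

5.1.5 Corollary. For any partitions $\mathcal{P}$ and $\mathcal{Q}$ of $[a, b]$, $\underline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{P})$.

Proof. By 5.1.4,
\[ \underline{S}(f, \mathcal{Q}) \le \underline{S}(f, \mathcal{P} \cup \mathcal{Q}) \le \overline{S}(f, \mathcal{P} \cup \mathcal{Q}) \le \overline{S}(f, \mathcal{P}). \tag{5.1} \]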
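The Darboux sums and the effect of refinement can be computed exactly in a simple case. The sketch below is an illustration only: it assumes an increasing integrand, so that the infimum and supremum on each subinterval are attained at the left and right endpoints, and the names `darboux_sums`, `uniform_partition`, and `square` are hypothetical.

```python
from fractions import Fraction

def darboux_sums(f, partition):
    """Lower and upper Darboux sums of an INCREASING function f.

    For increasing f, the infimum on [x_{j-1}, x_j] is f(x_{j-1}) and the
    supremum is f(x_j), so no search over the subinterval is needed.
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(a) * (b - a) for a, b in pairs)
    upper = sum(f(b) * (b - a) for a, b in pairs)
    return lower, upper

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * Fraction(j, n) for j in range(n + 1)]

def square(x):
    return x * x

for n in (4, 8, 16):
    lo, up = darboux_sums(square, uniform_partition(Fraction(0), Fraction(1), n))
    print(n, lo, up, up - lo)
```

Each partition in the loop refines the previous one, so the printed lower sums increase and the upper sums decrease, in line with 5.1.4, while every lower sum stays below every upper sum, as in 5.1.5.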


5.1.6 Definition. The lower and upper (Darboux) integrals of $f$ on $[a, b]$ are defined, respectively, by
\[ \underline{\int_a^b} f = \underline{\int_a^b} f(x)\,dx := \sup_{\mathcal{P}} \underline{S}(f, \mathcal{P}) \quad \text{and} \quad \overline{\int_a^b} f = \overline{\int_a^b} f(x)\,dx := \inf_{\mathcal{P}} \overline{S}(f, \mathcal{P}), \]
where the supremum and infimum are taken over all partitions $\mathcal{P}$ of $[a, b]$. In each case, $f$ is called the integrand and $x$ the integration variable. ♦

5.1.7 Proposition. For any partition $\mathcal{P}$ of $[a, b]$,
\[ \underline{S}(f, \mathcal{P}) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}). \]

Proof. The left and right inequalities are immediate from the definition of lower and upper integrals. The middle inequality follows by taking the supremum over $\mathcal{Q}$ and then the infimum over $\mathcal{P}$ in (5.1).

5.1.8 Proposition. The following statements are equivalent:

(a) $\displaystyle \underline{\int_a^b} f = \overline{\int_a^b} f$.

(b) For each $\varepsilon > 0$ there exists a partition $\mathcal{P}_\varepsilon$ of $[a, b]$ such that $\overline{S}(f, \mathcal{P}_\varepsilon) - \underline{S}(f, \mathcal{P}_\varepsilon) < \varepsilon$.

Proof. (a) ⇒ (b): Given $\varepsilon > 0$, there exist partitions $\mathcal{P}'$ and $\mathcal{P}''$ such that
\[ \underline{\int_a^b} f - \varepsilon/2 < \underline{S}(f, \mathcal{P}') \quad \text{and} \quad \overline{S}(f, \mathcal{P}'') < \overline{\int_a^b} f + \varepsilon/2. \]
By 5.1.4, the inequalities still hold if $\mathcal{P}'$ and $\mathcal{P}''$ are each replaced by their common refinement $\mathcal{P}_\varepsilon := \mathcal{P}' \cup \mathcal{P}''$. Subtracting the resulting inequalities and applying (a) yields (b).

(b) ⇒ (a): If the inequality in (b) holds then, by 5.1.7,
\[ 0 \le \overline{\int_a^b} f - \underline{\int_a^b} f \le \overline{S}(f, \mathcal{P}_\varepsilon) - \underline{S}(f, \mathcal{P}_\varepsilon) < \varepsilon. \]
Since $\varepsilon$ is arbitrary, the integrals must be equal.

5.1.9 Definition. The function $f$ is said to be (Darboux) integrable on $[a, b]$ if one (hence both) of the conditions (a), (b) of 5.1.8 holds. In this case, the common value of the integrals in (a) is called the (Riemann–Darboux) integral of $f$ on $[a, b]$ and is denoted by
\[ \int_a^b f = \int_a^b f(x)\,dx. \]
Also, define
\[ \int_b^a f = -\int_a^b f \quad \text{and} \quad \int_a^a f = 0. \]
The collection of all integrable functions on $[a, b]$ is denoted by $\mathcal{R}_a^b$.

♦

The following theorem guarantees a rich supply of integrable functions.

5.1.10 Theorem. If $f$ is continuous on $[a, b]$ except possibly at finitely many points, then $f \in \mathcal{R}_a^b$.

Proof. Denote the points of discontinuity of $f$, if any, by $d_1 < \cdots < d_n$. For convenience, we assume that these lie in $(a, b)$; only a minor modification of the proof is needed if $d_1 = a$ or $d_n = b$. Let $\varepsilon > 0$. For each $j$, remove an open interval of width $r$ centered at $d_j$, the value of $r$ to be determined. Since $f$ is continuous on each of the resulting $n + 1$ closed intervals $I_0, \dots, I_n$, it is uniformly continuous there. (If $f$ is continuous on $[a, b]$, then $n = 0$ and $I_0 = [a, b]$.) Thus there exists a $\delta > 0$ such that for each $j$,
\[ |f(x) - f(y)| < \frac{\varepsilon}{2(b - a)} \quad \text{for all } x, y \in I_j \text{ with } |x - y| < \delta. \]
Now, the endpoints of the intervals $I_j$ form a partition $\mathcal{P}$ of $[a, b]$. If necessary, refine $\mathcal{P}$ by inserting points (marked by ∗ in Figure 5.2) into these intervals so that the distance between consecutive points is less than $\delta$. The subintervals of the resulting partition $\mathcal{Q}$ are of two types: those that contain some $d_j$, which we mark by $\alpha$, and those that do not, which we mark by $\beta$.

FIGURE 5.2: The partitions $\mathcal{P}$ and $\mathcal{Q}$.

Thus, in the obvious notation,
\[ \overline{S}(f, \mathcal{Q}) - \underline{S}(f, \mathcal{Q}) = \sum_{\alpha} (M_j - m_j)\Delta x_j + \sum_{\beta} (M_j - m_j)\Delta x_j. \]
In the first sum, $\Delta x_j < r$ and in the second, $M_j - m_j \le \varepsilon/(2(b - a))$. Since the first sum has $n$ terms (corresponding to the $n$ discontinuities $d_j$),
\[ \overline{S}(f, \mathcal{Q}) - \underline{S}(f, \mathcal{Q}) < 2Mnr + \varepsilon/2, \]
where $M$ is a bound for $|f|$ on $[a, b]$. Choosing $r < \varepsilon/(4Mn)$, we then have $\overline{S}(f, \mathcal{Q}) - \underline{S}(f, \mathcal{Q}) < \varepsilon$, which shows that $f$ is integrable on $[a, b]$.


The set of discontinuities of an integrable function can be infinite but may not be too large. We make this precise in Section 5.8. In the meantime, we offer the following examples to illustrate the basic idea. In the first example, the function is discontinuous only on a countably infinite set, while in the second the function is discontinuous everywhere.

FIGURE 5.3: The partition $\mathcal{P}_n$ of Example 5.1.11.

5.1.11 Example. Let $f$ be any bounded function on $[0, 1]$ such that $f(x) = 0$ if $x \notin \{1/n : n = 2, 3, \dots\}$. We claim that $f$ is integrable and that $\int_0^1 f = 0$. The idea is to enclose the points of discontinuity of $f$ in small intervals, as in the proof of 5.1.10. Fix $n$ and let
\[ \mathcal{P}_n = \{x_0 = 0, x_1, x_2, \dots, x_{2n-2}, x_{2n-1} = 1\}, \]
where
\[ x_{2j-1} < \frac{1}{n - j + 1} < x_{2j} < x_{2j+1} \quad \text{and} \quad \Delta x_{2j} = x_{2j} - x_{2j-1} < \frac{1}{n^2}, \quad j = 1, 2, \dots, n - 1. \]
(See Figure 5.3.) Let $|f| \le M$ on $[0, 1]$. Since $f = 0$ on $[x_{2j}, x_{2j+1}]$ and $m_j \ge -M$,
\begin{align*}
\underline{S}(f, \mathcal{P}_n) &= m_1 x_1 + m_2(x_2 - x_1) + \cdots + m_{2n-2}(x_{2n-2} - x_{2n-3}) \\
&\ge -M\big[x_1 + (x_2 - x_1) + (x_4 - x_3) + \cdots + (x_{2n-2} - x_{2n-3})\big] \\
&\ge -M\big(1/n + (n - 1)/n^2\big) = -M\big(2/n - 1/n^2\big).
\end{align*}
A similar calculation shows that $\overline{S}(f, \mathcal{P}_n) \le M(2/n - 1/n^2)$. Therefore,
\[ \lim_n \underline{S}(f, \mathcal{P}_n) = \lim_n \overline{S}(f, \mathcal{P}_n) = 0, \]
hence $f$ is integrable with zero integral.

♦

5.1.12 Example. The Dirichlet function $d(x)$ (3.1.7) is not integrable on any (nondegenerate) interval $[a, b]$. Indeed, every upper sum of $d(x)$ has the value $b - a$ and every lower sum has the value 0. ♦

A useful characterization of integrability may be given in terms of the limits of $\underline{S}(f, \mathcal{P})$ and $\overline{S}(f, \mathcal{P})$ as $\|\mathcal{P}\| \to 0$.

5.1.13 Definition. Let $L \in \mathbb{R}$. We write $L = \lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P})$ if, given $\varepsilon > 0$, there exists $\delta > 0$ such that $|\overline{S}(f, \mathcal{P}) - L| < \varepsilon$ for all partitions $\mathcal{P}$ with $\|\mathcal{P}\| < \delta$. The limit $\lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P})$ is defined analogously. ♦


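5.1.14 Lemma. Let $\mathcal{P}' = \{x_0' = a < x_1' < \cdots < x_n' < x_{n+1}' = b\}$ be a partition of $[a, b]$ and let $|f| \le M$ on $[a, b]$. Then
\[ \overline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}') + 3nM\|\mathcal{P}\| \]
for all partitions $\mathcal{P}$ of $[a, b]$ with $\|\mathcal{P}\| < \delta' := \min_j \Delta x_j'$.

FIGURE 5.4: The partitions $\mathcal{P}'$, $\mathcal{P}$, and $\mathcal{P}''$.

Proof. Since $\|\mathcal{P}\| < \Delta x_j'$, no interval of $\mathcal{P}$ can contain more than one interior point of $\mathcal{P}'$. Mark the intervals of $\mathcal{P}$ that contain exactly one interior point of $\mathcal{P}'$ by $\alpha$ and mark those that contain no interior point of $\mathcal{P}'$ by $\gamma$. Consider the common refinement $\mathcal{P}'' = \mathcal{P} \cup \mathcal{P}'$ of $\mathcal{P}$ and $\mathcal{P}'$. Some of the intervals of $\mathcal{P}''$ were formed from an interval of $\mathcal{P}$ of type $\alpha$; we mark those by $\beta$. The remaining intervals of $\mathcal{P}''$, intervals that were not formed from an interval of $\mathcal{P}$ of type $\alpha$, are precisely the intervals marked $\gamma$ in $\mathcal{P}$. Thus the terms of $\overline{S}(f, \mathcal{P})$ and $\overline{S}(f, \mathcal{P}'')$ corresponding to intervals of type $\gamma$ are identical, hence cancel under subtraction of upper sums. Therefore, in the obvious notation,
\begin{align*}
\overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}'') &= \sum_{\alpha} M_j(f)\Delta x_j - \sum_{\beta} M_j''(f)\Delta x_j'' \\
&\le M\Big[\sum_{\alpha} \Delta x_j + \sum_{\beta} \Delta x_j''\Big] \\
&\le M\big[n\|\mathcal{P}\| + 2n\|\mathcal{P}''\|\big],
\end{align*}
the last inequality because there are at most $n$ intervals of type $\alpha$ and at most $2n$ intervals of type $\beta$. Since $\mathcal{P}''$ is a refinement of $\mathcal{P}'$ and $\mathcal{P}$, we have $\|\mathcal{P}''\| \le \|\mathcal{P}\|$ and
\[ \overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}') \le \overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}'') \le 3nM\|\mathcal{P}\|. \]

5.1.15 Theorem. For any bounded function $f$ on $[a, b]$,
\[ \overline{\int_a^b} f = \lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P}) \quad \text{and} \quad \underline{\int_a^b} f = \lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P}). \tag{5.2} \]
Thus $f$ is integrable on $[a, b]$ iff the limits in (5.2) are equal, in which case
\[ \int_a^b f = \lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P}) = \lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P}). \tag{5.3} \]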


Proof. Given $\varepsilon > 0$, choose a partition $\mathcal{P}'$ such that
\[ \overline{S}(f, \mathcal{P}') < \overline{\int_a^b} f + \varepsilon/2. \]
In the notation of 5.1.14, for any partition $\mathcal{P}$ with $\|\mathcal{P}\| < \delta'$,
\[ \overline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}') + 3nM\|\mathcal{P}\| < \overline{\int_a^b} f + \varepsilon/2 + 3nM\|\mathcal{P}\|. \]
Hence if $\|\mathcal{P}\| < \min\{\delta', \varepsilon/(6nM)\}$, then
\[ \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}) < \overline{\int_a^b} f + \varepsilon. \]
Since $\varepsilon$ was arbitrary, the first limit in (5.2) is established. The second follows from the first by considering $-f$ and using Exercise 5.1.3.

Equation (5.3) represents the integral as a limit of upper and lower sums. It is also possible to represent the integral as a limit of intermediate sums, called Riemann sums.

5.1.16 Definition. Let $\mathcal{P} = \{x_0 = a < x_1 < \cdots < x_n = b\}$ be a partition of $[a, b]$ and let $\xi = (\xi_1, \dots, \xi_n)$, where $\xi_j \in [x_{j-1}, x_j]$. The sum
\[ S(f, \mathcal{P}, \xi) := \sum_{j=1}^{n} f(\xi_j)\Delta x_j \]
is called the Riemann sum of $f$ determined by $\mathcal{P}$ and $\xi$.

♦
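A Riemann sum is straightforward to evaluate once a partition and its intermediate points are fixed. The following Python sketch is illustrative only: the integrand $\sin$ on $[0, \pi]$, the uniform partitions, the randomly chosen tags, and the helper names `riemann_sum` and `random_tagged_partition` are all assumptions made for the example.

```python
import math
import random

def riemann_sum(f, partition, tags):
    """S(f, P, xi) = sum over j of f(xi_j) * (x_j - x_{j-1})."""
    return sum(f(t) * (b - a)
               for a, b, t in zip(partition, partition[1:], tags))

def random_tagged_partition(a, b, n, rng):
    """Uniform partition of [a, b] with one random tag in each subinterval."""
    partition = [a + (b - a) * j / n for j in range(n + 1)]
    tags = [rng.uniform(lo, hi) for lo, hi in zip(partition, partition[1:])]
    return partition, tags

rng = random.Random(0)
exact = 2.0  # the integral of sin over [0, pi]
for n in (10, 100, 1000):
    P, xi = random_tagged_partition(0.0, math.pi, n, rng)
    print(n, abs(riemann_sum(math.sin, P, xi) - exact))
```

Whatever tags are drawn, the sum lies between the lower and upper sums of the same partition, so for this integrand the error is at most $\pi^2/n$ and shrinks with the mesh.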

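Figure 5.5 illustrates a Riemann sum for a positive continuous function $f$. In this case $S(f, \mathcal{P}, \xi)$ is the total area of the rectangles with heights $f(\xi_j)$ and bases $\Delta x_j$.

FIGURE 5.5: A Riemann sum.

5.1.17 Definition. Let $\mathcal{P} = \{x_0 = a < x_1 < \cdots < x_n = b\}$ be a partition of $[a, b]$ and let $\xi = (\xi_1, \dots, \xi_n)$, where $\xi_j \in [x_{j-1}, x_j]$. We write
\[ L = \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi) \]
if for each $\varepsilon > 0$ there exists $\delta > 0$ such that $|S(f, \mathcal{P}, \xi) - L| < \varepsilon$ for all partitions $\mathcal{P}$ with $\|\mathcal{P}\| < \delta$ and all choices of $\xi$. Similarly, we write
\[ L = \lim_{\mathcal{P}} S(f, \mathcal{P}, \xi) \]
if for each $\varepsilon > 0$ there exists a partition $\mathcal{P}_\varepsilon$ such that $|S(f, \mathcal{P}, \xi) - L| < \varepsilon$ for all refinements $\mathcal{P}$ of $\mathcal{P}_\varepsilon$ and all choices of $\xi$. ♦

We may now give Riemann's characterization of integrability.

5.1.18 Theorem. The following statements are equivalent:

(a) $f \in \mathcal{R}_a^b$.

(b) $\lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi)$ exists in $\mathbb{R}$.

(c) $\lim_{\mathcal{P}} S(f, \mathcal{P}, \xi)$ exists in $\mathbb{R}$.

If these conditions hold, then
\[ \int_a^b f = \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi) = \lim_{\mathcal{P}} S(f, \mathcal{P}, \xi). \]

Proof. (a) ⇒ (b): Let $L = \int_a^b f$. For any partition $\mathcal{P}$ and any $\xi$, we have
\[ \underline{S}(f, \mathcal{P}) - L \le S(f, \mathcal{P}, \xi) - L \le \overline{S}(f, \mathcal{P}) - L, \]
hence (b) follows from 5.1.15.

(b) ⇒ (c): Let $L := \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi)$. Given $\varepsilon > 0$, choose $\delta > 0$ such that
\[ |S(f, \mathcal{P}, \xi) - L| < \varepsilon \quad \text{for all partitions } \mathcal{P} \text{ with } \|\mathcal{P}\| < \delta \text{ and all } \xi. \tag{5.4} \]
Choose any partition $\mathcal{P}_\varepsilon$ with $\|\mathcal{P}_\varepsilon\| < \delta$. If $\mathcal{P}$ is any refinement of $\mathcal{P}_\varepsilon$, then $\|\mathcal{P}\| \le \|\mathcal{P}_\varepsilon\| < \delta$, hence (5.4) holds for $\mathcal{P}$.

(c) ⇒ (a): Let $L := \lim_{\mathcal{P}} S(f, \mathcal{P}, \xi)$. Given $\varepsilon > 0$, choose a partition $\mathcal{P}_\varepsilon$ such that
\[ |S(f, \mathcal{P}, \xi) - L| < \varepsilon \quad \text{for all refinements } \mathcal{P} \text{ of } \mathcal{P}_\varepsilon \text{ and all } \xi. \tag{5.5} \]
For such a partition $\mathcal{P}$, by the approximation property of suprema there exists for each $j$ a sequence $\{\xi_{j,k}\}_{k=1}^{\infty}$ in $[x_{j-1}, x_j]$ such that $\lim_k f(\xi_{j,k}) = M_j(f)$. It follows that
\[ \lim_k S(f, \mathcal{P}, \xi^k) = \overline{S}(f, \mathcal{P}), \quad \text{where } \xi^k = (\xi_{1,k}, \xi_{2,k}, \dots, \xi_{n,k}). \]
From (5.5), $\overline{\int_a^b} f - L \le \overline{S}(f, \mathcal{P}) - L \le \varepsilon$. Since $\varepsilon$ was arbitrary, $\overline{\int_a^b} f \le L$. Similarly, $\underline{\int_a^b} f \ge L$. Therefore $\underline{\int_a^b} f = \overline{\int_a^b} f$.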


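Exercises

1. Prove that if $k$ is a constant, then $\underline{\int_a^b} k = \overline{\int_a^b} k = k(b - a)$.

2. Let $a \le c < d \le b$. Define $f$ on $[a, b]$ by $f(x) = 1$ if $x \in [c, d]$ and $f(x) = 0$ otherwise. Show that $f \in \mathcal{R}_a^b$ and evaluate $\int_a^b f$.

3.S ⇓¹ Prove that
(a) $\underline{S}(-f, \mathcal{P}) = -\overline{S}(f, \mathcal{P})$ and $\underline{\int_a^b} (-f) = -\overline{\int_a^b} f$.
(b) $f \in \mathcal{R}_a^b \Rightarrow -f \in \mathcal{R}_a^b$ and $\int_a^b (-f) = -\int_a^b f$.

4. ⇓² Prove that a monotone function is integrable.

5.S Let $f \in \mathcal{R}_a^b$ and let $g : [a, b] \to \mathbb{R}$ be any function that differs from $f$ at finitely many points in $[a, b]$. Prove that $g \in \mathcal{R}_a^b$ and that $\int_a^b f = \int_a^b g$. Does the same result hold if $g$ differs from $f$ at countably many points?

6. Let $f \in \mathcal{R}_a^b$. Prove:
(a) If $\inf_{a \le x \le b} f(x) > 0$, then $1/f \in \mathcal{R}_a^b$.
(b) If $f(x) \ge 0$ for all $x \in [a, b]$, then $\sqrt{f} \in \mathcal{R}_a^b$.
(c)S $\sin(f) \in \mathcal{R}_a^b$.

7.S Let $F(\mathcal{P})$ be a real-valued function of partitions $\mathcal{P}$ on an interval $[a, b]$. Write
\[ L = \lim_{\mathcal{P}} F(\mathcal{P}) \]
if, given $\varepsilon > 0$, there exists a partition $\mathcal{P}_\varepsilon$ such that $|F(\mathcal{P}) - L| < \varepsilon$ for all partitions $\mathcal{P}$ refining $\mathcal{P}_\varepsilon$.
(a) Show that the limit is linear, that is,
\[ \lim_{\mathcal{P}} \big[\alpha F(\mathcal{P}) + \beta G(\mathcal{P})\big] = \alpha \lim_{\mathcal{P}} F(\mathcal{P}) + \beta \lim_{\mathcal{P}} G(\mathcal{P}), \]
provided the right side exists.
(b) Let $f$ be a bounded function on $[a, b]$. With this definition, show that
\[ \underline{\int_a^b} f = \lim_{\mathcal{P}} \underline{S}(f, \mathcal{P}) \quad \text{and} \quad \overline{\int_a^b} f = \lim_{\mathcal{P}} \overline{S}(f, \mathcal{P}). \]

8. Let $f \in \mathcal{R}_0^1$ and set $g(x) = x^q$, where $q > 0$. Prove that $f \circ g \in \mathcal{R}_0^1$.

¹ This exercise will be used in 5.2.2.
² This exercise will be used in 5.9.8.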


5.2 Properties of the Integral

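The following lemma will be useful in proving certain properties of integrals.

5.2.1 Lemma. Let $f : [a, b] \to \mathbb{R}$ be bounded. Then there exists a sequence of partitions $\{\mathcal{P}_n\}$ of $[a, b]$ such that
\[ \lim_{n \to \infty} \underline{S}(f, \mathcal{P}_n) = \underline{\int_a^b} f \quad \text{and} \quad \lim_{n \to \infty} \overline{S}(f, \mathcal{P}_n) = \overline{\int_a^b} f. \]
Moreover, the limits still hold if each $\mathcal{P}_n$ is replaced by a refinement.

Proof. By the approximation property of infima and suprema, for each $n$ there exist partitions $\mathcal{P}_n'$ and $\mathcal{P}_n''$ of $[a, b]$ such that
\[ \underline{\int_a^b} f - 1/n < \underline{S}(f, \mathcal{P}_n') \le \underline{\int_a^b} f \quad \text{and} \quad \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}_n'') < \overline{\int_a^b} f + 1/n. \]
Since refinements decrease upper sums and increase lower sums, the inequalities still hold if $\mathcal{P}_n'$ and $\mathcal{P}_n''$ are replaced by their common refinement $\mathcal{P}_n$ or by any refinement of $\mathcal{P}_n$. Letting $n \to +\infty$ completes the proof.

5.2.2 Theorem. If $f, g \in \mathcal{R}_a^b$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in \mathcal{R}_a^b$ and
\[ \int_a^b (\alpha f + \beta g) = \alpha \int_a^b f + \beta \int_a^b g. \]

Proof. By 5.2.1, we may choose a sequence of partitions $\mathcal{P}_n$ such that
\[ \lim_n \overline{S}(f, \mathcal{P}_n) = \int_a^b f \quad \text{and} \quad \lim_n \overline{S}(g, \mathcal{P}_n) = \int_a^b g. \]
(There exists one such sequence for $f$, another for $g$; the sequence of common refinements then works for both functions.) Letting $n \to \infty$ in
\[ \overline{\int_a^b} (f + g) \le \overline{S}(f + g, \mathcal{P}_n) \le \overline{S}(f, \mathcal{P}_n) + \overline{S}(g, \mathcal{P}_n) \]
yields
\[ \overline{\int_a^b} (f + g) \le \int_a^b f + \int_a^b g. \]
Similarly,
\[ \underline{\int_a^b} (f + g) \ge \int_a^b f + \int_a^b g. \]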


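It follows that $f + g$ is integrable and
\[ \int_a^b (f + g) = \int_a^b f + \int_a^b g. \]
It remains to prove that $\alpha f$ is integrable and that $\int_a^b \alpha f = \alpha \int_a^b f$. If $\alpha > 0$, then $\underline{S}(\alpha f, \mathcal{P}) = \alpha \underline{S}(f, \mathcal{P})$ and $\overline{S}(\alpha f, \mathcal{P}) = \alpha \overline{S}(f, \mathcal{P})$. Taking the infimum and supremum over $\mathcal{P}$ yields
\[ \underline{\int_a^b} \alpha f = \alpha \int_a^b f = \overline{\int_a^b} \alpha f. \]
If $\alpha < 0$, then $-\alpha > 0$, hence
\[ \int_a^b \alpha f = \int_a^b (-\alpha)(-f) = (-\alpha) \int_a^b (-f) = \alpha \int_a^b f, \]
the last equality by Exercise 5.1.3.

5.2.3 Proposition. If $f \in \mathcal{R}_a^b$ and $a \le c < d \le b$, then $f|_{[c,d]} \in \mathcal{R}_c^d$.

Proof. Given $\varepsilon > 0$, let $\mathcal{P}$ be a partition of $[a, b]$ with $\overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) < \varepsilon$. We may assume that $c, d \in \mathcal{P}$, otherwise replace $\mathcal{P}$ by the refinement $\mathcal{P} \cup \{c, d\}$. If $\mathcal{Q} = \mathcal{P} \cap [c, d]$, then clearly
\[ \overline{S}\big(f|_{[c,d]}, \mathcal{Q}\big) - \underline{S}\big(f|_{[c,d]}, \mathcal{Q}\big) \le \overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) < \varepsilon, \]
hence $f|_{[c,d]} \in \mathcal{R}_c^d$.

The following is a converse of 5.2.3.

5.2.4 Theorem. Let $a < c < b$. If $f|_{[a,c]} \in \mathcal{R}_a^c$ and $f|_{[c,b]} \in \mathcal{R}_c^b$, then $f \in \mathcal{R}_a^b$ and
\[ \int_a^b f = \int_a^c f + \int_c^b f. \]

Proof. By 5.2.1, we may choose sequences of partitions $\mathcal{P}_n'$ of $[a, c]$ and $\mathcal{P}_n''$ of $[c, b]$ such that
\[ \lim_n \overline{S}\big(f|_{[a,c]}, \mathcal{P}_n'\big) = \int_a^c f \quad \text{and} \quad \lim_n \overline{S}\big(f|_{[c,b]}, \mathcal{P}_n''\big) = \int_c^b f. \]
Then $\mathcal{P}_n := \mathcal{P}_n' \cup \mathcal{P}_n''$ is a partition of $[a, b]$ and
\[ \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}_n) = \overline{S}\big(f|_{[a,c]}, \mathcal{P}_n'\big) + \overline{S}\big(f|_{[c,b]}, \mathcal{P}_n''\big). \]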


Letting $n \to \infty$, we obtain
\[ \overline{\int_a^b} f \le \int_a^c f + \int_c^b f. \]
Replacing $f$ by $-f$ produces the reverse inequality for the lower integral of $f$, proving the theorem.

5.2.5 Theorem. If $f, g \in \mathcal{R}_a^b$ and $f \le g$ on $[a, b]$, then
\[ \int_a^b f \le \int_a^b g. \]
In particular, if $m \le f(x) \le M$ for all $x \in [a, b]$, then
\[ m(b - a) \le \int_a^b f \le M(b - a). \]

Proof. Let $\mathcal{P}$ be a partition of $[a, b]$. By hypothesis, $M_j(f) \le M_j(g)$ for each $j$, hence $\overline{S}(f, \mathcal{P}) \le \overline{S}(g, \mathcal{P})$. Taking the infimum over $\mathcal{P}$ yields the first inequality. The second inequality follows from the first and Exercise 5.1.1.

5.2.6 Theorem. If $f \in \mathcal{R}_a^b$, then $|f| \in \mathcal{R}_a^b$ and
\[ \Big|\int_a^b f\Big| \le \int_a^b |f|. \]

Proof. By Exercise 1.4.5, for any partition $\mathcal{P}$ of $[a, b]$,
\[ M_j(|f|) - m_j(|f|) \le M_j(f) - m_j(f). \]
Summing over $j$,
\[ \overline{S}(|f|, \mathcal{P}) - \underline{S}(|f|, \mathcal{P}) \le \overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}). \]
Since the right side can be made arbitrarily small, $|f| \in \mathcal{R}_a^b$. Applying 5.2.5 to $\pm f \le |f|$ we obtain
\[ \pm \int_a^b f \le \int_a^b |f|, \]
which gives the desired inequality.

5.2.7 Theorem. If $f, g \in \mathcal{R}_a^b$, then $fg \in \mathcal{R}_a^b$.

Proof. Since $fg = \tfrac{1}{2}\big[(f + g)^2 - f^2 - g^2\big]$, it suffices to prove that $f^2 \in \mathcal{R}_a^b$. To this end, let $\mathcal{P}$ be any partition of $[a, b]$ and let $|f| \le M$. Then
\[ M_j(f^2) - m_j(f^2) = M_j^2(|f|) - m_j^2(|f|) \le 2M\big[M_j(|f|) - m_j(|f|)\big]. \]
Summing over $j$,
\[ \overline{S}(f^2, \mathcal{P}) - \underline{S}(f^2, \mathcal{P}) \le 2M\big[\overline{S}(|f|, \mathcal{P}) - \underline{S}(|f|, \mathcal{P})\big]. \]
Since $|f| \in \mathcal{R}_a^b$, the right side of the last inequality may be made arbitrarily small. Therefore, $f^2 \in \mathcal{R}_a^b$.


Exercises

1.S Let $\{c_n\}$ be a convergent sequence in $[a, b]$ and let $f$ be a bounded function on $[a, b]$ with $f(x) = 0$ for all $x \notin \{c_n\}$. Prove that $f \in \mathcal{R}_a^b$ and find $\int_a^b f$.

2. Define $f$ on $[0, 1]$ by $f(0) = 0$ and
\[ f(x) = 2^{-n} \quad \text{if } 2^{-n-1} < x \le 2^{-n}, \ n \ge 0. \]
Prove that $f \in \mathcal{R}_0^1$ and evaluate $\int_0^1 f$.

3. Prove or disprove: $|f| \in \mathcal{R}_a^b$ implies $f \in \mathcal{R}_a^b$.

4. A function $s$ on $[a, b]$ is called a step function if there exists a partition of $[a, b]$ such that $s$ is constant on the interior of each partition interval. Show that a step function is integrable. Prove that a bounded function $f$ is integrable on $[a, b]$ iff for each $\varepsilon > 0$ there exist step functions $s_\ell$ and $s_u$ such that $s_\ell \le f \le s_u$ and $\int_a^b (s_u - s_\ell) < \varepsilon$.

5.S Prove that if $f_j \in \mathcal{R}_a^b$, $1 \le j \le n$, then $\max\{f_1, \dots, f_n\} \in \mathcal{R}_a^b$ and $\min\{f_1, \dots, f_n\} \in \mathcal{R}_a^b$.

6.S Let $f$ be continuous and $f(x) < M$ for all $x \in [a, b]$. Prove that
\[ \int_a^b f < M(b - a). \quad \text{(Compare with 5.2.5.)} \]

7. Let $f \in \mathcal{R}_a^b$ be nonnegative. Prove that if $f$ is continuous at some point $x_0 \in [a, b]$ and $f(x_0) \ne 0$, then $\int_a^b f > 0$.

8. Let $f \in \mathcal{R}_a^b$ such that either
(a) $\int_a^b fg = 0$ for every continuous function $g$, or
(b) $\int_a^b fg = 0$ for every step function $g$.
Prove that $f$ is zero at each point of continuity of $f$.

9.S Let $f \in \mathcal{R}_a^b$ and for $x, y \in [a, b]$ define $F(x, y) = \int_x^y f$. Prove that $F(x, y)$ is continuous in $y$ for each $x$ and continuous in $x$ for each $y$.

10. Let $f$ be bounded on $[a, b]$ and integrable on $[c, b]$ for every $a < c < b$. Prove that the following statements are equivalent:
(a) $\displaystyle \lim_{x \to a^+} \int_x^b f$ exists in $\mathbb{R}$.
(b) $\displaystyle \lim_{n \to +\infty} \int_{a_n}^b f$ exists in $\mathbb{R}$ for some sequence $a_n \downarrow a$.
(c) $f \in \mathcal{R}_a^b$.
Conclude from Exercise 9 that if $f \in \mathcal{R}_a^b$, then the limit in (a) is $\int_a^b f$.


11. Let $f$ be integrable on $[0, x]$ for all $x > 0$. Prove that
\[ \liminf_{x \to +\infty} f(x) \le \liminf_{x \to +\infty} \frac{1}{x} \int_0^x f \le \limsup_{x \to +\infty} \frac{1}{x} \int_0^x f \le \limsup_{x \to +\infty} f(x). \]
Conclude that if $L := \lim_{x \to +\infty} f(x)$ exists in $\mathbb{R}$, then
\[ \lim_{x \to +\infty} \frac{1}{x} \int_0^x f(t)\,dt = L. \]

12.S Let $f$ be continuous on $[a, b]$ and let $M = \sup_{a \le x \le b} |f(x)|$. Prove:
(a) For each $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ \delta(M - \varepsilon) \le \int_a^b |f(x)|\,dx \le M(b - a). \]
(b) $\displaystyle M = \lim_{p \to +\infty} \Big(\int_a^b |f|^p\Big)^{1/p}$.

13. ⇓³ Let $f, g : [a, b] \to \mathbb{R}$ be continuous. Supply the details in the following outline of a proof of the Cauchy–Schwarz inequality
\[ \Big(\int_a^b fg\Big)^2 \le \int_a^b f^2 \int_a^b g^2. \]
(a) The inequality holds if $\int_a^b g^2 = 0$.
(b) For any real number $t$,
\[ 0 \le \int_a^b (f - tg)^2 = \int_a^b f^2 - 2t \int_a^b fg + t^2 \int_a^b g^2. \]
(c) Let $t = \Big[\int_a^b fg\Big]\Big[\int_a^b g^2\Big]^{-1}$ in (b).

³ This exercise will be used in 5.7.19.

5.3 Evaluation of the Integral

The theorems in this section describe standard methods for evaluating integrals. The first of these expresses the integral of a function $f$ in terms of a primitive or antiderivative, that is, a function whose derivative is $f$. It also shows that the process of integration is the inverse of that of differentiation.


5.3.1 Fundamental Theorem of Calculus. Let $f : [a, b] \to \mathbb{R}$ be continuous.

(a) The function $G(x) := \int_a^x f(t)\,dt$, $x \in [a, b]$, is a primitive of $f$.

(b) For any primitive $F$ of $f$,
\[ \int_a^b f = F(x)\Big|_a^b := F(b) - F(a). \]

(c) If $f' \in \mathcal{R}_a^b$, then $\int_a^b f' = f(b) - f(a)$. In particular, $f(x) = f(a) + \int_a^x f'$.

Proof. (a) We assume that $a \le x < b$ and prove that
\[ \lim_{h \to 0^+} \frac{G(x + h) - G(x)}{h} = f(x). \tag{5.6} \]
By 5.2.4 and 5.2.6, if $h > 0$ and $x + h < b$, then
\[ \Big|\frac{G(x + h) - G(x)}{h} - f(x)\Big| = \frac{1}{h} \Big|\int_x^{x+h} \big[f(t) - f(x)\big]\,dt\Big| \le \frac{1}{h} \int_x^{x+h} |f(t) - f(x)|\,dt. \]
By continuity of $f$ at $x$, given $\varepsilon > 0$ we may choose $\delta > 0$ such that $|t - x| < \delta$ implies $|f(t) - f(x)| < \varepsilon$. Thus if $h < \delta$, then the term on the right in the above inequality is $\le \varepsilon$, proving (5.6). The left-hand limit for $a < x \le b$ is treated in the same way.

(b) Let $F$ be any primitive of $f$. Then $F' = f = G'$, hence $F = G + c$ for some constant $c$. Thus from (a),
\[ \int_a^b f = G(b) - G(a) = F(b) - F(a). \]

(c) For any partition $\mathcal{P}$, by the mean value theorem
\[ f(x_j) - f(x_{j-1}) = f'(\xi_j)\Delta x_j \]
for some $\xi_j \in [x_{j-1}, x_j]$, $j = 1, \dots, n$. For this choice of $\xi_j$,
\[ S(f', \mathcal{P}, \xi) = \sum_{j=1}^{n} f'(\xi_j)\Delta x_j = \sum_{j=1}^{n} \big[f(x_j) - f(x_{j-1})\big] = f(b) - f(a). \]
Since we may choose $\mathcal{P}$ so that $S(f', \mathcal{P}, \xi)$ is arbitrarily near $\int_a^b f'$, (c) follows.

The general primitive of a continuous function $f$ is denoted by $\int f$ and is called the indefinite integral of $f$. (In this context, $\int_a^b f$ is called a definite integral.) For example, one writes $\int 3x^2\,dx = x^3 + c$, where $c$ is the so-called constant of integration. In general, since primitives of a function differ only by a constant, we write
\[ \int f(x)\,dx = F(x) + c, \]
where $F$ is any particular primitive of $f$.
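Parts (a) and (b) of the fundamental theorem can be checked numerically. The sketch below is an illustration under stated assumptions: the integrand is $\cos$, the integral is approximated by a midpoint rule (the helper `integral` and its step count are hypothetical choices), and the derivative of $G$ is approximated by a one-sided difference quotient.

```python
import math

def integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

f = math.cos

def G(x):
    """G(x) = integral of f from 0 to x, as in part (a)."""
    return integral(f, 0.0, x)

x, h = 1.0, 1e-3
difference_quotient = (G(x + h) - G(x)) / h
print(difference_quotient, f(x))               # part (a): G'(x) is close to f(x)
print(G(2.0), math.sin(2.0) - math.sin(0.0))   # part (b): F = sin is a primitive of cos
```

The two numbers on each printed line agree up to the discretization error of the difference quotient and of the midpoint rule.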


5.3.2 Change of Variables Theorem. Let $\varphi : [a, b] \to \mathbb{R}$ be continuously differentiable with $\varphi'$ never zero and let $f$ be integrable on $[c, d] := \varphi([a, b])$. Then $(f \circ \varphi)|\varphi'| \in \mathcal{R}_a^b$ and
\[ \int_a^b f(\varphi(x))|\varphi'(x)|\,dx = \int_c^d f(y)\,dy. \tag{5.7} \]

Proof. By the intermediate value theorem, we may assume that $\varphi'(x) > 0$ for all $x$, so $\varphi$ is strictly increasing, $c = \varphi(a)$, and $d = \varphi(b)$.

FIGURE 5.6: The partitions $\mathcal{P}^x$ and $\mathcal{P}^y$.

We show first that $f \circ \varphi \in \mathcal{R}_a^b$. For this we use the fact that $\varphi$ induces a one-to-one correspondence between partitions $\mathcal{P}^x = \{x_0, \dots, x_n\}$ of $[a, b]$ and partitions $\mathcal{P}^y = \{y_0, \dots, y_n\}$ of $[c, d]$, where $y_j = \varphi(x_j)$ ($x_j = \varphi^{-1}(y_j)$) (see Figure 5.6). Since $\varphi([x_{j-1}, x_j]) = [y_{j-1}, y_j]$,
\[ M_j^x(f \circ \varphi) = \sup_{x_{j-1} \le x \le x_j} f(\varphi(x)) = \sup_{y_{j-1} \le y \le y_j} f(y) = M_j^y(f). \tag{5.8} \]
Moreover, by the mean value theorem, there exists $z_j \in [y_{j-1}, y_j]$ such that
\[ \Delta x_j = \varphi^{-1}(y_j) - \varphi^{-1}(y_{j-1}) = (\varphi^{-1})'(z_j)\Delta y_j \le C\Delta y_j, \tag{5.9} \]
where $C$ is a bound for $|(\varphi^{-1})'|$ on $[c, d]$. From (5.8) and (5.9),
\[ \overline{S}(f \circ \varphi, \mathcal{P}^x) \le C\,\overline{S}(f, \mathcal{P}^y). \]
The same inequality evidently holds for $-f$, hence
\[ -\underline{S}(f \circ \varphi, \mathcal{P}^x) \le -C\,\underline{S}(f, \mathcal{P}^y). \]
Adding these inequalities,
\[ \overline{S}(f \circ \varphi, \mathcal{P}^x) - \underline{S}(f \circ \varphi, \mathcal{P}^x) \le C\big[\overline{S}(f, \mathcal{P}^y) - \underline{S}(f, \mathcal{P}^y)\big]. \]


Since the right side may be made arbitrarily small, $f \circ \varphi \in \mathcal{R}_a^b$, hence also $(f \circ \varphi)\varphi' \in \mathcal{R}_a^b$.

To prove (5.7), we argue as in the first part of the proof, but now compare the Riemann sums $S((f \circ \varphi)\varphi', \mathcal{P}^x, \xi)$ and $S(f, \mathcal{P}^y, \zeta)$, where the intermediate points in each case are taken to be left endpoints:
\[ \xi := (x_0, \dots, x_{n-1}), \quad \zeta := (y_0, \dots, y_{n-1}) = (\varphi(x_0), \dots, \varphi(x_{n-1})). \]
Then
\[ S\big((f \circ \varphi)\varphi', \mathcal{P}^x, \xi\big) = \sum_{j=1}^{n} f(\zeta_j)\varphi'(\xi_j)\Delta x_j \]
and, by the mean value theorem,
\[ S(f, \mathcal{P}^y, \zeta) = \sum_{j=1}^{n} f(\zeta_j)\Delta\varphi(x_j) = \sum_{j=1}^{n} f(\zeta_j)\varphi'(t_j)\Delta x_j, \]
for some $t_j \in [x_{j-1}, x_j]$. Subtracting these equations and using the triangle inequality, we obtain
\begin{align*}
\big|S\big((f \circ \varphi)\varphi', \mathcal{P}^x, \xi\big) - S(f, \mathcal{P}^y, \zeta)\big| &\le \sum_{j=1}^{n} |f(\zeta_j)|\,|\varphi'(\xi_j) - \varphi'(t_j)|\,\Delta x_j \\
&\le M \sum_{j=1}^{n} |\varphi'(\xi_j) - \varphi'(t_j)|\,\Delta x_j,
\end{align*}
where $M$ is a bound for $|f|$ on $[c, d]$. By the uniform continuity of $\varphi'$ on $[a, b]$, given $\varepsilon > 0$ there exists a $\delta > 0$ such that $|\varphi'(s) - \varphi'(t)| < \varepsilon/(M(b - a))$ for all $s, t$ with $|s - t| < \delta$. Hence if $\|\mathcal{P}^x\| < \delta$, then
\[ \big|S\big((f \circ \varphi)\varphi', \mathcal{P}^x, \xi\big) - S(f, \mathcal{P}^y, \zeta)\big| < \varepsilon. \]
Letting $\|\mathcal{P}^x\| \to 0$ and noting that then also $\|\mathcal{P}^y\| \to 0$ (because $\Delta y_j = \varphi'(t_j)\Delta x_j \le B\|\mathcal{P}^x\|$, where $B$ is a bound for $|\varphi'|$), we see that
\[ \Big|\int_a^b f(\varphi(x))\varphi'(x)\,dx - \int_c^d f(y)\,dy\Big| \le \varepsilon. \]
Since $\varepsilon$ was arbitrary, the two integrals are equal, completing the proof.

Remark. Whether $\varphi$ is increasing or decreasing, (5.7) may be written as
\[ \int_a^b f(\varphi(x))\varphi'(x)\,dx = \int_{\varphi(a)}^{\varphi(b)} f(y)\,dy. \]
This formula has an easy proof if $f$ is continuous. Indeed, in this case $f$ has a primitive $F$ on $[c, d]$, hence, by the chain rule, $F \circ \varphi$ is a primitive for $(f \circ \varphi)\varphi'$. The desired formula now follows from the fundamental theorem of calculus:
\[ \int_a^b f(\varphi(x))\varphi'(x)\,dx = F(\varphi(b)) - F(\varphi(a)) = \int_{\varphi(a)}^{\varphi(b)} f(y)\,dy. \]
Note that in this case it is not necessary to assume that $\varphi' \ne 0$.

♦
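The change of variables formula can be sanity-checked on a concrete case. In the sketch below, every specific is an assumption chosen for illustration: $\varphi(x) = x^2$ on $[1, 2]$, $f(y) = 1/y$, and a midpoint rule (`integral`) for both sides. Both integrals should then be close to $\ln 4$.

```python
import math

def integral(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

def phi(x):
    return x * x          # increasing on [1, 2], with phi' > 0 there

def dphi(x):
    return 2 * x          # derivative of phi

def f(y):
    return 1.0 / y        # continuous, hence integrable, on [phi(1), phi(2)] = [1, 4]

a, b = 1.0, 2.0
left = integral(lambda x: f(phi(x)) * dphi(x), a, b)   # integral of f(phi(x)) phi'(x) dx
right = integral(f, phi(a), phi(b))                    # integral of f(y) dy
print(left, right, math.log(4.0))
```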

5.3.3 Integration by Parts Formula. Let $f$ and $g$ be differentiable on $[a, b]$ with $f', g' \in \mathcal{R}_a^b$. Then
\[ \int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f'(x)g(x)\,dx. \tag{5.10} \]

Proof. Since $(fg)' = f'g + fg' \in \mathcal{R}_a^b$, 5.3.1(c) implies that
\[ f(x)g(x)\Big|_a^b = \int_a^b (fg)' = \int_a^b f'g + \int_a^b fg'. \]

5.3.4 Example. We show that
\[ \int_0^{\pi/2} \sin^k x\,dx = \begin{cases} \dfrac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 5\cdot 3}, & k \text{ odd}, \\[2ex] \dfrac{\pi}{2}\,\dfrac{(k-1)(k-3)\cdots 5\cdot 3}{k(k-2)\cdots 4\cdot 2}, & k \text{ even}. \end{cases} \]
Let $I_k = \int_0^{\pi/2} \sin^k x\,dx$. Integrating by parts,
\[ I_k = \int_0^{\pi/2} \sin^{k-1} x \sin x\,dx = (k - 1)\int_0^{\pi/2} \sin^{k-2} x \cos^2 x\,dx. \]
Since $\cos^2 x = 1 - \sin^2 x$, $I_k = (k - 1)(I_{k-2} - I_k)$, hence
\[ I_k = \frac{k - 1}{k}\,I_{k-2}. \]
Iterating, we obtain
\[ I_k = \frac{(k-1)(k-3)\cdots(k-2j+1)}{k(k-2)\cdots(k-2j+2)}\,I_{k-2j}. \]
If $k$ is odd, take $j = (k - 1)/2$ so
\[ I_k = \frac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 5\cdot 3}\,I_1. \]
If $k$ is even, take $j = (k - 2)/2$ so
\[ I_k = \frac{(k-1)(k-3)\cdots 5\cdot 3}{k(k-2)\cdots 6\cdot 4}\,I_2. \]
Since $I_1 = 1$ and $I_2 = \pi/4$, the formula follows.

♦
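The reduction $I_k = \frac{k-1}{k} I_{k-2}$ lends itself to a direct computation. The following sketch is illustrative: the function names are hypothetical, the recursion is run down to $I_1 = 1$ or to $I_0 = \pi/2$ (an assumption equivalent to the value $I_2 = \pi/4$ used in the example), and the result is compared with a midpoint-rule approximation of the integral.

```python
import math
from fractions import Fraction

def wallis_coefficient(k):
    """Rational factor of I_k obtained from I_k = (k-1)/k * I_{k-2}.

    I_k equals this factor for odd k (since I_1 = 1) and this factor
    times pi/2 for even k (since I_0 = pi/2).
    """
    coeff = Fraction(1)
    while k >= 2:
        coeff *= Fraction(k - 1, k)
        k -= 2
    return coeff

def sine_power_integral(k):
    """Value of the integral of sin^k over [0, pi/2] from the closed form."""
    c = float(wallis_coefficient(k))
    return c if k % 2 == 1 else c * math.pi / 2

def midpoint(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

for k in (1, 2, 5, 6):
    numeric = midpoint(lambda x: math.sin(x) ** k, 0.0, math.pi / 2)
    print(k, wallis_coefficient(k), sine_power_integral(k), numeric)
```

For $k = 5$ the rational factor is $\frac{4 \cdot 2}{5 \cdot 3}$ and for $k = 6$ it is $\frac{5 \cdot 3}{6 \cdot 4 \cdot 2}$, matching the two cases of the formula.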


If $f'$ and $g'$ are continuous, then (5.10) has the following analog for indefinite integrals:
\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx. \tag{5.11} \]
Setting $h = g'$ and using the symbols $D$ for differentiation and $I$ for integration, we may write (5.11) as
\[ I(fh) = f \cdot Ih - I(Df \cdot Ih). \]
By induction we obtain
\[ I(fh) = \sum_{k=1}^{n} (-1)^{k-1} D^{k-1}f \cdot I^k h + (-1)^n I\big(D^n f \cdot I^n h\big). \tag{5.12} \]
The fundamental theorem of calculus may then be used to calculate $\int_a^b fh$. Formula (5.12) may be expressed in tabular form as shown in Table 5.1. For each $k$, the entries in column $k$ are multiplied and the resulting products are added. The exception is in column $n + 1$, where the product must be integrated before adding. The process terminates if and when $D^n f = 0$.

TABLE 5.1: Table for evaluating $\int fh$ by parts.
\[ \begin{array}{l|cccccc}
k & 1 & 2 & 3 & \cdots & n & n+1 \\ \hline
(-1)^{k-1} & +1 & -1 & +1 & \cdots & (-1)^{n-1} & (-1)^n \\
D^{k-1}f & f & Df & D^2f & \cdots & D^{n-1}f & D^n f \\
I^k h & Ih & I^2h & I^3h & \cdots & I^n h & I^n h
\end{array} \]

5.3.5 Example. Using Table 5.1 with $f(x) = (x + 1)^3$ and $h(x) = e^{5x}$, we have
\[ \int (x + 1)^3 e^{5x}\,dx = e^{5x}\Big[\frac{(x + 1)^3}{5} - \frac{3(x + 1)^2}{5^2} + \frac{6(x + 1)}{5^3} - \frac{6}{5^4}\Big] + c. \]

TABLE 5.2: Table for evaluating $\int (x + 1)^3 e^{5x}\,dx$ by parts.
\[ \begin{array}{l|ccccc}
k & 1 & 2 & 3 & 4 & 5 \\ \hline
(-1)^{k-1} & +1 & -1 & +1 & -1 & +1 \\
D^{k-1}f & (x+1)^3 & 3(x+1)^2 & 6(x+1) & 6 & 0 \\
I^k h & e^{5x}/5 & e^{5x}/5^2 & e^{5x}/5^3 & e^{5x}/5^4 & e^{5x}/5^5
\end{array} \]
♦
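The tabular scheme is mechanical enough to automate when $f$ is a polynomial and $h$ is an exponential. The sketch below is an illustration, not the text's method verbatim: it assumes $h(x) = e^{rx}$, so that $I^k h = e^{rx}/r^k$, represents the polynomial by its coefficients in powers of $u = x + 1$, and uses the hypothetical helper names `derivative` and `tabular_antiderivative`.

```python
from fractions import Fraction

def derivative(coeffs):
    """Derivative of the polynomial c0 + c1*u + c2*u^2 + ... given as [c0, c1, c2, ...]."""
    return [j * c for j, c in enumerate(coeffs)][1:]

def tabular_antiderivative(coeffs, rate):
    """Coefficients of q with: integral of p(u) e^{rate*x} dx = q(u) e^{rate*x} + c.

    Here u = x + const, so d/dx acts on p like d/du. The columns of the
    table are summed: q = sum over k of (-1)^(k-1) * D^(k-1) p / rate^k.
    """
    q = [Fraction(0)] * len(coeffs)
    sign, power, p = 1, Fraction(rate), list(coeffs)
    while p:                                  # stops once D^n p = 0
        for j, c in enumerate(p):
            q[j] += sign * c / power
        sign, power, p = -sign, power * rate, derivative(p)
    return q

# p(u) = u^3 with u = x + 1, and h(x) = e^{5x}, as in Example 5.3.5
q = tabular_antiderivative([0, 0, 0, 1], 5)
print(q)
```

The returned coefficients of $1, u, u^2, u^3$ are $-6/5^4$, $6/5^3$, $-3/5^2$, $1/5$, in agreement with the antiderivative displayed in the example; differentiating $q(u)e^{5x}$ gives back $u^3 e^{5x}$ because $q' + 5q = p$.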


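Exercises

1.S ⇓⁴ Let $f : \mathbb{R} \to \mathbb{R}$ be continuous and periodic with period $p > 0$, that is, $f(x + p) = f(x)$ for all $x$. Prove that
\[ \int_0^p f(x + y)\,dx = \int_0^p f(x)\,dx \quad \text{for all } y \in \mathbb{R}. \]

2. Let $f : (a, b) \to \mathbb{R}$ have a uniformly continuous derivative. Prove that $f' \in \mathcal{R}_a^b$ and
\[ \int_a^b f' = \lim_{\varepsilon \to 0^+} \big[f(b - \varepsilon) - f(a + \varepsilon)\big]. \]

3. Verify the following inequalities:
(a)S $\displaystyle \frac{2}{\pi}\big(\sqrt{2} - 1\big) \le \int_0^1 \frac{\sin x\,dx}{\sqrt{1 + x^2}} \le \sqrt{2} - 1.$
(b) $\displaystyle \frac{1}{2^q(p + 1)} \le \int_0^1 \frac{x^p\,dx}{(1 + x^p)^q} \le \frac{2^{1-q} - 1}{(p + 1)(1 - q)}, \quad p, q > 0, \ q \ne 1.$

4. Establish the formula
\[ \int_0^1 (1 - x)^m x^n\,dx = \frac{m!}{(n + 1)(n + 2)\cdots(n + m + 1)}. \]

5. Let $n \in \mathbb{N}$. Evaluate
(a)S $\displaystyle \int_0^1 \exp(x^{1/n})\,dx.$
(b) $\displaystyle \int_1^e \ln^n x\,dx.$

6. Let $k \in \mathbb{N}$. Show that
\[ \int_0^{\pi/2} \cos^k x\,dx = \begin{cases} \dfrac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 5\cdot 3}, & k \text{ odd}, \\[2ex] \dfrac{\pi}{2}\,\dfrac{(k-1)(k-3)\cdots 5\cdot 3}{k(k-2)\cdots 4\cdot 2}, & k \text{ even}. \end{cases} \]

7.S ⇓⁵ Let $k \in \mathbb{N}$. Show that
\[ \int_0^1 \frac{x^k}{\sqrt{1 - x^2}}\,dx = \begin{cases} \dfrac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 3\cdot 1} & \text{if } k \text{ is odd}, \\[2ex] \dfrac{(k-1)(k-3)\cdots 3\cdot 1}{k(k-2)\cdots 4\cdot 2}\,\dfrac{\pi}{2} & \text{if } k \text{ is even}. \end{cases} \]
N.B. The integral is improper but converges by Exercise 5.7.7. For the even case, use Exercise 6.

⁴ This exercise will be used in 13.6.4.
⁵ This exercise will be used in 13.4.2.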


8. Let $f'$ be continuous and positive on $[a, b]$. Prove that
\[ \int_a^b f(x)\,dx + \int_{f(a)}^{f(b)} f^{-1}(y)\,dy = bf(b) - af(a). \]
Interpret geometrically for $f > 0$ and $a > 0$.

9.S (Young's inequality). Let $f$ be continuous and strictly increasing on $[0, a]$ with $f(0) = 0$. Prove that
\[ \int_0^x f + \int_0^y f^{-1} = yf^{-1}(y) + \int_{f^{-1}(y)}^x f. \]
Deduce that
\[ \int_0^x f + \int_0^y f^{-1} \ge xy, \quad 0 \le x \le a, \ 0 \le y \le f(a). \]

10. Use Young's inequality to verify the following inequalities:
(a) $\sqrt{1 - y^2} + y \sin^{-1} y \ge xy + \cos x$, $0 \le x \le \pi/2$, $0 \le y \le 1$.
(b)S $x \ln x + e^y \ge xy + x$, $1 \le x \le 2$, $0 \le y \le \ln 2$.

11. Give an example of a discontinuous function that (a) has a primitive, (b) has no primitive.

12. Let $f$ and $g$ be continuously differentiable with $g > 0$. Prove that
\[ \int \frac{f(x)g'(x)}{g^2(x)}\,dx = \int \frac{f'(x)}{g(x)}\,dx - \frac{f(x)}{g(x)}. \]

13.S Let $f' \in \mathcal{R}_a^b$. Prove that
\[ \lim_n \int_a^b f(x)\sin(nx)\,dx = 0. \]

14. Let $f$ be continuous on $[0, +\infty)$ such that $\lim_{x \to +\infty} f(x)$ exists in $\mathbb{R}$ and let $a > 0$. Find
\[ \lim_{n \to +\infty} \int_0^a f(nx)\,dx. \]

15. Let $h'$ be continuous and positive on $[a, b]$ and let $g'$ be continuous on $[c, d] = [h(a), h(b)]$. Prove that
\[ \int_a^b g(h(x))\,dx = g(d)b - g(c)a - \int_c^d g'(t)h^{-1}(t)\,dt. \]


16. Let $f \in \mathcal{R}_{-a}^{a}$, $a > 0$. Show that
\[ \int_{-a}^{a} f = \begin{cases} 0 & \text{if } f \text{ is an odd function}, \\ 2\int_0^a f & \text{if } f \text{ is an even function}. \end{cases} \]

17.S Let $f : [a, b] \to \mathbb{R}$ be continuous and let $u, v$ be differentiable functions with range contained in $[a, b]$. Prove that
\[ \frac{d}{dx} \int_{u(x)}^{v(x)} f = f(v(x))v'(x) - f(u(x))u'(x). \]

18. Let functions $a, b, c, d : [0, 1] \to [0, 1]$ have continuous derivatives and let $f : [0, 1] \to \mathbb{R}$ be continuous. Suppose that
\[ \int_{a(x)}^{b(x)} f = \int_{c(x)}^{d(x)} f \quad \text{for all } x \in [0, 1]. \]
Prove that
\[ \int_{b(0)}^{b(1)} f + \int_{c(0)}^{c(1)} f = \int_{a(0)}^{a(1)} f + \int_{d(0)}^{d(1)} f. \]

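19.S Let $f$ be continuous and $g$ differentiable with bounded derivative on $[a, b]$. Evaluate
\[ \lim_{x \to a} \frac{g(x)}{x - a} \int_a^x f. \]

20. Let $p > 0$, $q > 1$, and $m, k \in \mathbb{N}$ with $m > k$. Evaluate $\lim_{n \to +\infty} s_n$ if $s_n =$
(a)S $\displaystyle \sum_{k=1}^{n} \frac{k^p}{n^{p+1}}.$
(b) $\displaystyle \sum_{k=1}^{n} \frac{k^{q-1}}{n^q + k^q}.$
(c) $\displaystyle \Big[\frac{(mn)!}{n^{kn}\,[(m - k)n]!}\Big]^{1/n}.$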

21.S Let $|f'| \le M$ on $[a, b]$. For $n \in \mathbb{N}$ set $h = (b - a)/n$ and $x_k = a + kh$, $k = 0, 1, \dots, n - 1$. Prove that
\[ \Big|\int_a^b f - h\sum_{k=1}^{n} f(x_{k-1})\Big| \le hM(b - a). \]

22. Let $f$ be continuous on $[0, 1]$. Prove that
\[ \int_0^1 \int_0^x f(t)\,dt\,dx = \int_0^1 (1 - x)f(x)\,dx. \]

23. Let $f, g : [0, 1] \to \mathbb{R}$ be continuously differentiable, $f$ monotone, and $g(x) > g(0) = g(1)$ on $(0, 1)$. Prove that $\int_0^1 fg' = 0$ iff $f$ is constant.

*5.4 Stirling's Formula

Stirling's formula gives an estimate for $n!$ when $n$ is large. The proof relies on material from Section 4.3. We begin with the following lemma, which provides the fundamental inequality needed to establish the formula.

5.4.1 Lemma. If $f$ is concave and differentiable on $(a, b)$, then
\[ \frac{f(u) + f(v)}{2} \le \frac{1}{v - u} \int_u^v f(t)\,dt \le f\Big(\frac{u + v}{2}\Big), \quad a < u < v < b. \]

Proof. By the concave versions of 4.3.6 and (4.3),
\[ f(u)\,\frac{v - t}{v - u} + f(v)\,\frac{t - u}{v - u} \le f(t) \le f'(x)(t - x) + f(x) \]
for all $a < u < v < b$ and all $x, t \in [u, v]$. Integrating with respect to $t$,
\[ \frac{f(u) + f(v)}{2}\,(v - u) \le \int_u^v f(t)\,dt \le f'(x)\,\frac{(v - x)^2 - (x - u)^2}{2} + f(x)(v - u). \]
Taking $x = (u + v)/2$ and dividing by $v - u$ produces the desired inequalities.

5.4.2 Stirling's Inequalities. For all $n$,
\[ e^{7/8} \le \frac{e^n n!}{n^n \sqrt{n}} \le e, \tag{5.13} \]
where the middle term is decreasing in $n$.

Proof. Taking $f(x) = \ln x$, $u = k \in \mathbb{N}$, and $v = k + 1$ in the lemma, we have
\[ \tfrac{1}{2}\ln(k^2 + k) = \tfrac{1}{2}\big[\ln(k) + \ln(k + 1)\big] \le \int_k^{k+1} \ln(t)\,dt \le \ln\big(k + \tfrac{1}{2}\big). \]
Rearranging,
\[ 0 \le \int_k^{k+1} \ln(t)\,dt - \tfrac{1}{2}\ln(k^2 + k) \le \ln\big(k + \tfrac{1}{2}\big) - \tfrac{1}{2}\ln(k^2 + k). \tag{5.14} \]
Now observe that
\begin{align*}
\sum_{k=1}^{n-1} \int_k^{k+1} \ln t\,dt &= \int_1^n \ln t\,dt = n\ln n - n + 1, \\
\sum_{k=1}^{n-1} \ln(k^2 + k) &= \sum_{k=1}^{n-1} \big[\ln(k + 1) + \ln k\big] = 2\sum_{k=2}^{n} \ln k - \ln n = 2\ln n! - \ln n, \quad \text{and} \\
\ln\big(k + \tfrac{1}{2}\big) - \tfrac{1}{2}\ln(k^2 + k) &= \tfrac{1}{2}\ln\Big(1 + \frac{1}{4(k^2 + k)}\Big) \le \frac{1}{8(k^2 + k)},
\end{align*}


where, for the last inequality, we used the fact that $\ln(1 + x) < x$ for $x > 0$, which follows directly from the integral definition of $\ln(x + 1)$. Summing in (5.14) and using the above inequalities, we obtain
\[ 0 \le \big(n + \tfrac{1}{2}\big)\ln n - n + 1 - \ln n! \le \sum_{k=1}^{n-1} \frac{1}{8(k^2 + k)} = \frac{1}{8}\sum_{k=1}^{n-1} \Big(\frac{1}{k} - \frac{1}{k + 1}\Big) \le \frac{1}{8}. \]
Note that the term $\big(n + \tfrac{1}{2}\big)\ln n - n + 1 - \ln n!$ is increasing in $n$ since it was obtained as a sum of nonnegative terms in (5.14). Rearranging, we have
\[ \frac{7}{8} \le -\big(n + \tfrac{1}{2}\big)\ln n + n + \ln n! \le 1, \]
where the middle term is decreasing in $n$. Exponentiating yields the desired inequalities.

5.4.3 Stirling's Formula.
\[ \lim_n \frac{e^n n!}{n^n \sqrt{n}} = \sqrt{2\pi}. \]

Proof. By 5.4.2, the limit $L$ in the formula exists in $\mathbb{R}$. Set $I_n = \int_0^{\pi/2} \sin^n x\,dx$. By 5.3.4,
\[ I_{2n+1} = \frac{(2n)(2n - 2)\cdots 4\cdot 2}{(2n + 1)(2n - 1)\cdots 5\cdot 3} \quad \text{and} \quad I_{2n} = \frac{\pi}{2}\,\frac{(2n - 1)(2n - 3)\cdots 5\cdot 3}{2n(2n - 2)\cdots 4\cdot 2}. \]
For $x \in [0, \pi/2]$ and $n \ge m$, $\sin^n x \le \sin^m x$, hence
\[ \frac{I_{2n+2}}{I_{2n}} \le \frac{I_{2n+1}}{I_{2n}} \le \frac{I_{2n}}{I_{2n}} = 1. \]
It follows that
\[ \frac{2n + 1}{2n + 2}\,\frac{\pi}{2} \le \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n - 2)^2 \cdot (2n)^2}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2n - 1)^2 (2n + 1)} \le \frac{\pi}{2}, \]
from which we obtain Wallis's product
\[ \lim_n \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n - 2)^2 \cdot (2n)^2}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2n - 1)^2 (2n + 1)} = \frac{\pi}{2}. \]
Denote the general term in Wallis's product by $\alpha_n$. Since
\[ 2 \cdot 4 \cdots (2n - 2) \cdot (2n) = 2^n n! \quad \text{and} \quad 3 \cdot 5 \cdots (2n - 1) = \frac{(2n)!}{2^n n!}, \]
we see that
\[ \sqrt{\alpha_n} = \frac{2^{2n}(n!)^2}{(2n)!\,\sqrt{2n + 1}}. \]

Riemann Integration on R

131

en n! √ and note that nn n

Now set βn =

√ √ βn2 e2n (n!)2 (2n)2n 2n (n!)2 22n 2 √ . = 2n+1 = β2n n e2n (2n)! (2n)! n Dividing by

√

αn ,

√ √ (n!)2 22n 2 (2n)! 2n + 1 √ p √ = 2 2 + 1/n → 2. = β2n αn 22n (n!)2 (2n)! n p √ Since αn → π/2 and βn → L, we also have r βn2 2 lim . →L √ n β2n αn π q √ Therefore, L π2 = 2, hence L = 2π. βn2 √
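The inequalities in 5.4.2 and the limit in 5.4.3 can be checked numerically. The following is a minimal sketch, not part of the text's argument: the function names `beta` and `wallis` are hypothetical, and `beta` works with logarithms (via `math.lgamma`) only to avoid overflow for large n.

```python
import math

def beta(n: int) -> float:
    """Return n! e^n / (n^n sqrt(n)), computed through logarithms."""
    log_beta = math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)
    return math.exp(log_beta)

def wallis(n: int) -> float:
    """Return the n-th term of Wallis's product (the quantity alpha_n)."""
    prod = 1.0
    for k in range(1, n + 1):
        # (2k)^2 / ((2k-1)(2k+1)) multiplies out to the displayed quotient.
        prod *= (2 * k) ** 2 / ((2 * k - 1) * (2 * k + 1))
    return prod

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, beta(n), wallis(n))
    print("sqrt(2*pi) =", math.sqrt(2 * math.pi), " pi/2 =", math.pi / 2)
```

The printed values of `beta(n)` should decrease from e toward the square root of 2π while staying above e^{7/8}, and `wallis(n)` should increase toward π/2.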

5.5

Integral Mean Value Theorems

The following theorem asserts that the average value of a continuous function over an interval $[a, b]$ is actually assumed by the function at some intermediate point $c$.

5.5.1 First Mean Value Theorem for Integrals. If $f$ is continuous on $[a, b]$, then there exists $c \in (a, b)$ such that
\[ \frac{1}{b-a}\int_a^b f = f(c). \]

Proof. Apply the mean value theorem for derivatives and the fundamental theorem of calculus to the function $G(x) := \int_a^x f(t)\,dt$.

The next theorem is a weighted average generalization of 5.5.1.

5.5.2 Weighted Mean Value Theorem for Integrals. Let $f$ be continuous on $[a, b]$ and let $g \in \mathcal{R}_a^b$. If $g$ does not change sign in $[a, b]$, then there exists $c \in [a, b]$ such that
\[ \int_a^b fg = f(c)\int_a^b g. \tag{5.15} \]

Proof. We may assume that $g \ge 0$ on $[a, b]$, so $\int_a^b g \ge 0$. Suppose first that $\int_a^b g = 0$. If $C$ is an upper bound for $|f|$ on $[a, b]$, then
\[ \Big|\int_a^b fg\Big| \le \int_a^b |f|g \le C\int_a^b g = 0, \]
hence both sides of (5.15) are zero. Now assume that $\int_a^b g > 0$. Let $m = f(x_m)$ and $M = f(x_M)$ denote the minimum and maximum values of $f$ on $[a, b]$. Since $mg \le fg \le Mg$,
\[ m\int_a^b g \le \int_a^b fg \le M\int_a^b g, \]
hence
\[ m \le \frac{\int_a^b fg}{\int_a^b g} \le M. \]
An application of the intermediate value theorem completes the proof.

5.5.3 Second Mean Value Theorem for Integrals. Let $f$ be continuous and $g$ differentiable and monotone on $[a, b]$ with $g' \in \mathcal{R}_a^b$. Then there exists $c \in [a, b]$ such that
\[ \int_a^b fg = g(a)\int_a^c f + g(b)\int_c^b f. \]

Proof. Let $F(x) = \int_a^x f$. Integrating by parts,
\[ \int_a^b fg = \int_a^b F'g = F(b)g(b) - \int_a^b g'F. \]
Since $g$ is monotone, the sign of $g'$ does not change, hence, by 5.5.2, there exists $c \in [a, b]$ such that
\[ \int_a^b g'F = F(c)\int_a^b g' = F(c)[g(b) - g(a)]. \]
Therefore,
\[ \int_a^b fg = F(b)g(b) - F(c)[g(b) - g(a)] = g(a)F(c) + g(b)[F(b) - F(c)], \]
which is the assertion of the theorem.

Remarks. (a) Because derivatives have the intermediate value property (Exercise 4.2.25), the monotonicity requirement on $g$ will be satisfied if $g' \ne 0$ on $[a, b]$.
(b) The second mean value theorem for integrals holds under the less restrictive hypotheses that $f$ is integrable and $g$ is monotone. A proof may be found in [3]. ♦
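As a numerical illustration of 5.5.1, the point c can be located for a concrete function. This is only a sketch under extra assumptions not made in the theorem: f is taken to be strictly monotone so that bisection applies, the integral is approximated by a composite Simpson sum, and the helper names `composite_simpson` and `mean_value_point` are hypothetical.

```python
import math

def composite_simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def mean_value_point(f, a, b, tol=1e-12):
    """Return (c, avg) with f(c) ~ avg, the average of f on [a, b].

    Assumes f is continuous and strictly monotone on [a, b].
    """
    avg = composite_simpson(f, a, b) / (b - a)
    lo, hi = a, b
    increasing = f(a) <= f(b)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Move toward the side on which f crosses the average value.
        if (f(mid) < avg) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), avg

if __name__ == "__main__":
    c, avg = mean_value_point(math.exp, 0.0, 1.0)
    print("average value:", avg, " attained at c =", c)
```

For f(x) = e^x on [0, 1] the average is e − 1, so the computed c should agree with ln(e − 1).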


Exercises

1. Let $0 \le a < b$ and let $f$ be continuous on $[\sqrt{a}, \sqrt{b}]$. Prove that there exists $c \in [a, b]$ such that
\[ \frac12\int_a^b f(\sqrt{x})\,dx = \sqrt{a}\int_{\sqrt{a}}^{\sqrt{c}} f(x)\,dx + \sqrt{b}\int_{\sqrt{c}}^{\sqrt{b}} f(x)\,dx. \]

2. Let $0 < a < b$ and let $f$ be continuous on $[b^{-1}, a^{-1}]$. Prove that there exists $c \in [a, b]$ such that
\[ \int_a^b f(1/x)\,dx = b^2\int_{1/b}^{1/c} f(x)\,dx + a^2\int_{1/c}^{1/a} f(x)\,dx. \]

3.S Let $f$ be continuous on $[0, 1]$. Prove that there exists $c \in [1/2, \sqrt{3}/2]$ such that
\[ \int_{\pi/6}^{\pi/3} f(\sin x)\,dx = \frac{2}{\sqrt{3}}\int_{1/2}^{c} f(x)\,dx + 2\int_{c}^{\sqrt{3}/2} f(x)\,dx. \]

4. Let $f$ be continuous on $[0, 1]$. Prove that there exists $c \in [0, 1]$ such that
\[ \int_0^{\pi/4} f(\tan x)\,dx = \int_0^c f(x)\,dx + \frac12\int_c^1 f(x)\,dx. \]

5. Let $f$ and $g$ be continuous on $[a, b]$. Show that there exists $c \in (a, b)$ such that
\[ g(c)\int_a^b f = f(c)\int_a^b g. \]

6. Prove: If $f$ is continuous, $g \in \mathcal{R}_a^b$, and $m$ is a lower bound for $g$, then there exist $c, d \in [a, b]$ such that
\[ \int_a^b fg = f(c)\int_a^b g + m(b-a)[f(d) - f(c)]. \]

7.S Prove the following variant of the second mean value theorem for integrals: Let $f, g \in \mathcal{R}_a^b$ with $g \ge 0$. If $m \le f \le M$ on $[a, b]$, then there exists $c \in [a, b]$ such that
\[ \int_a^b fg = m\int_a^c g + M\int_c^b g. \]
Hint. Consider $G(x) := m\int_a^x g + M\int_x^b g$.

8. Let $g$ have a nonnegative integrable derivative on $[0, 1]$ with $g(0) = 0$ and $g(1) = 1$. Show that there exists $c \in [0, 1]$ such that
\[ \int_0^1 x^n g(x)\,dx = \frac{1 - c^{n+1}}{n+1}. \]

9.S Let $g$ have a nonnegative integrable derivative on $[0, \pi]$ with $g(0) = 0$ and $g(\pi) = 1$. Show that there exists $c \in [0, \pi]$ such that
\[ \int_0^{\pi} g(x)\sin x\,dx = \cos c + 1. \]

10. Let $g$ be twice differentiable on $[a, b]$ with $g'' < 0$ and $g'' \in \mathcal{R}_a^b$, and let $f$ be continuous on $g([a, b])$. Show that if $g' \ge m > 0$ and $|f| \le M$, then
\[ \Big|\int_a^b f' \circ g\Big| \le \frac{2M}{m}. \]
Hint. Use the second mean value theorem for integrals.

*5.6

Estimation of the Integral

Integrals that cannot be evaluated exactly may be approximated by various numerical methods. Of course, an integral may always be approximated by a Riemann sum; however, unless the intermediate points of the subintervals are chosen judiciously, a Riemann sum usually offers only a coarse approximation of the integral. In this section we discuss three techniques, the trapezoidal rule, the midpoint rule, and Simpson’s rule, that yield good numerical estimates of an integral. The approximation techniques are given in order of increasing precision. For each of these, we use partitions of the form
\[ x_k = a + kh_n, \quad k = 0, 1, \ldots, n, \quad\text{where } h_n := \frac{b-a}{n}. \tag{5.16} \]
The integral $\int_a^b f$ is then estimated by replacing $f$ on the interval $[x_k, x_{k+1}]$ by a simpler function $f_k$. The approximation is therefore
\[ \int_a^b f(x)\,dx \approx \sum_{k=0}^{n-1}\int_{x_k}^{x_{k+1}} f_k(x)\,dx. \]
The error in the approximation is simply the difference between the left and right sides. The main goal in the approximation schemes described below is to obtain, for a given class of functions, the sharpest upper bound for the magnitude of the error. The reader may wish to compare the error bounds in the three approximation techniques described below with the error bound for the approximation given by the Riemann sum
\[ R_n = \frac{b-a}{n}\big[f(x_0) + f(x_1) + \cdots + f(x_{n-1})\big]. \tag{5.17} \]
By Exercise 5.3.21, for functions $f$ with a bounded derivative one has in general only the first order error bound
\[ \Big|\int_a^b f - R_n\Big| \le h_n(b-a)\|f'\|_\infty, \]
implying that a good estimate requires a large $n$. Here, for a bounded function $g$ on $[a, b]$,
\[ \|g\|_\infty := \sup\{|g(x)| : a \le x \le b\}. \]

Trapezoidal Rule

Let
\[ P_k := (x_k, f(x_k)) = (x_k, y_k), \quad k = 0, 1, \ldots, n, \tag{5.18} \]
where the points $x_k$ are given in (5.16). The trapezoidal rule uses the line segment from $P_k$ to $P_{k+1}$ to approximate $f$ on $[x_k, x_{k+1}]$, $k = 0, 1, \ldots, n-1$. Thus the approximating function $f_k$ is given by
\[ f_k(x) = y_k + m_k(x - x_k), \quad x_k \le x \le x_{k+1}, \qquad m_k := \frac{y_{k+1} - y_k}{x_{k+1} - x_k}. \]
A simple calculation shows that
\[ \int_{x_k}^{x_{k+1}} f_k = \frac{h_n}{2}(y_{k+1} + y_k). \]
The sum
\[ T_n := \sum_{k=0}^{n-1}\int_{x_k}^{x_{k+1}} f_k = \frac{h_n}{2}\big[y_0 + 2y_1 + \cdots + 2y_{n-1} + y_n\big] \]
is then used to approximate $\int_a^b f$. If $f > 0$, $T_n$ may be realized as the sum of areas of trapezoids. (See Figure 5.7.)

5.6.1 Trapezoidal Rule. If $f \in \mathcal{R}_a^b$, then $\lim_n T_n = \int_a^b f$. Moreover, if $f''$ exists and is continuous on $[a, b]$, then the following error estimate holds:
\[ \Big|\int_a^b f - T_n\Big| \le \frac{h_n^2}{12}(b-a)\|f''\|_\infty. \]

FIGURE 5.7: Trapezoidal rule approximation.

Proof. For the Riemann sum $R_n$ in (5.17),
\[ R_n - T_n = \frac{b-a}{2n}\big[f(x_0) - f(x_n)\big] = \frac{b-a}{2n}\big[f(a) - f(b)\big] \to 0, \]
hence $T_n = (T_n - R_n) + R_n \to \int_a^b f$. To obtain the error estimate, consider the function
\[ g_k(x) := \frac{f(x) - f_k(x)}{(x - x_k)(x - x_{k+1})} = \frac{f(x) - y_k - m_k(x - x_k)}{(x - x_k)(x - x_{k+1})}, \]
which has singularities at $x_k$ and $x_{k+1}$. Since both the numerator and the denominator vanish at these points, the singularities may be removed using l’Hospital’s rule. Therefore, $g_k(x)$ has a continuous extension to $[x_k, x_{k+1}]$. Since $(x - x_k)(x - x_{k+1})$ does not change sign on $[x_k, x_{k+1}]$, by the weighted mean value theorem for integrals (5.5.2) there exists a point $z_k \in [x_k, x_{k+1}]$ such that
\[ \int_{x_k}^{x_{k+1}} [f(x) - f_k(x)]\,dx = \int_{x_k}^{x_{k+1}} g_k(x)(x - x_k)(x - x_{k+1})\,dx = g_k(z_k)\int_{x_k}^{x_{k+1}} (x - x_k)(x - x_{k+1})\,dx = -g_k(z_k)\frac{h_n^3}{6}. \]
It follows that
\[ \int_a^b f(t)\,dt - T_n = \sum_{k=0}^{n-1}\int_{x_k}^{x_{k+1}} [f(x) - f_k(x)]\,dx = -\frac{h_n^3}{6}\sum_{k=0}^{n-1} g_k(z_k). \tag{5.19} \]
Now fix $x \in (x_k, x_{k+1})$ and define $\psi(z)$ on $[x_k, x_{k+1}]$ by
\[ \psi(z) = f(z) - f_k(z) - g_k(x)(z - x_k)(z - x_{k+1}). \]
Since $\psi$ has distinct zeros $x$, $x_k$, and $x_{k+1}$, Rolle’s theorem applied twice shows that $\psi''$ has a zero $v_k \in (x_k, x_{k+1})$. It follows that $f''(v_k) = 2g_k(x)$. Since $x$ was arbitrary, $|g_k(x)| \le \tfrac12\|f''\|_\infty$ for all $x \in [x_k, x_{k+1}]$. From this and (5.19) we see that
\[ \Big|\int_a^b f(t)\,dt - T_n\Big| \le \frac{nh_n^3}{12}\|f''\|_\infty = \frac{h_n^2}{12}(b-a)\|f''\|_\infty. \]
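The trapezoidal sum and the error bound of 5.6.1 are easy to compute directly. The sketch below is illustrative only: the function names are hypothetical, the test integrand f(x) = 1/x on [1, 2] is the one used in Table 5.3, and the value 2 for the supremum of |f''| on [1, 2] is supplied by hand (f'' = 2/x^3).

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal sum T_n with n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

def trapezoid_bound(a, b, n, sup_f2):
    """Error bound h_n^2 (b - a) ||f''|| / 12 from 5.6.1."""
    h = (b - a) / n
    return h * h * (b - a) * sup_f2 / 12

if __name__ == "__main__":
    f = lambda x: 1.0 / x
    for n in (4, 8):
        err = math.log(2) - trapezoid(f, 1.0, 2.0, n)
        print(n, f"{err:.10f}", trapezoid_bound(1.0, 2.0, n, 2.0))
```

The signed error is negative here because 1/x is convex, so each chord lies above the graph.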

Midpoint Rule

Let
\[ \bar{x}_k := \frac{x_k + x_{k+1}}{2} = a + \big(k + \tfrac12\big)h_n, \quad k = 0, 1, \ldots, n-1, \]
where the points $x_k$ are given in (5.16). The midpoint rule uses the constant function
\[ f_k(x) = f(\bar{x}_k), \quad x_k \le x \le x_{k+1}, \]
to approximate $f$ on $[x_k, x_{k+1}]$. This amounts to approximating $\int_a^b f$ by Riemann sums $M_n$, where the intermediate points are the midpoints of the intervals:
\[ M_n = \frac{b-a}{n}\big[f(\bar{x}_0) + f(\bar{x}_1) + \cdots + f(\bar{x}_{n-1})\big]. \]

FIGURE 5.8: Midpoint rule approximation.

5.6.2 Midpoint Rule. If $f''$ exists and is continuous on $[a, b]$, then the following error estimate holds:
\[ \Big|\int_a^b f - M_n\Big| \le \frac{h_n^2}{24}(b-a)\|f''\|_\infty. \]

Proof. The function
\[ g_k(x) = \frac{f(x) - f(\bar{x}_k) - f'(\bar{x}_k)(x - \bar{x}_k)}{(x - \bar{x}_k)^2} \]
has a double singularity at $\bar{x}_k$, which may be removed by applying l’Hospital’s rule twice and defining $g_k(\bar{x}_k)$ to be the resulting limit. Since
\[ f(x) - f(\bar{x}_k) - f'(\bar{x}_k)(x - \bar{x}_k) = g_k(x)(x - \bar{x}_k)^2 \quad\text{and}\quad \int_{x_k}^{x_{k+1}} (x - \bar{x}_k)\,dx = 0, \]
we see that
\[ \int_{x_k}^{x_{k+1}} [f(x) - f(\bar{x}_k)]\,dx = \int_{x_k}^{x_{k+1}} g_k(x)(x - \bar{x}_k)^2\,dx. \]
Since $(x - \bar{x}_k)^2$ has constant sign on $[x_k, x_{k+1}]$, the weighted mean value theorem for integrals implies that the integral on the right equals
\[ g_k(z_k)\int_{x_k}^{x_{k+1}} (x - \bar{x}_k)^2\,dx = g_k(z_k)\frac{h_n^3}{12} \]
for some point $z_k \in [x_k, x_{k+1}]$. Therefore,
\[ \int_{x_k}^{x_{k+1}} [f(x) - f(\bar{x}_k)]\,dx = g_k(z_k)\frac{h_n^3}{12}. \tag{5.20} \]
Now fix $x \in [x_k, \bar{x}_k) \cup (\bar{x}_k, x_{k+1}]$. By Taylor’s theorem, there exists a point $\xi_k$ between $x$ and $\bar{x}_k$ such that
\[ f(x) = f(\bar{x}_k) + f'(\bar{x}_k)(x - \bar{x}_k) + \frac{f''(\xi_k)}{2}(x - \bar{x}_k)^2. \]
Solving for $f''(\xi_k)$ we see that $f''(\xi_k) = 2g_k(x)$. Therefore, $|g_k(x)| \le \|f''\|_\infty/2$ for all $x \in [x_k, x_{k+1}]$, hence from (5.20),
\[ -\frac{h_n^3}{24}\|f''\|_\infty \le \int_{x_k}^{x_{k+1}} [f(x) - f(\bar{x}_k)]\,dx \le \frac{h_n^3}{24}\|f''\|_\infty. \]
Summing, we obtain
\[ -\frac{nh_n^3}{24}\|f''\|_\infty \le \int_a^b f(x)\,dx - M_n \le \frac{nh_n^3}{24}\|f''\|_\infty, \]
which is the assertion of the theorem.

Note that the estimates in both the trapezoidal rule and the midpoint rule are exact for all linear functions $f$, since then $f'' = 0$.
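A short computation illustrates the midpoint rule and its bound. As before this is a sketch with hypothetical function names; the integrand 1/x on [1, 2] and the hand-supplied value 2 for the supremum of |f''| are assumptions chosen for the example.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint sum M_n with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def midpoint_bound(a, b, n, sup_f2):
    """Error bound h_n^2 (b - a) ||f''|| / 24 from 5.6.2."""
    h = (b - a) / n
    return h * h * (b - a) * sup_f2 / 24

if __name__ == "__main__":
    f = lambda x: 1.0 / x
    for n in (4, 8):
        err = math.log(2) - midpoint(f, 1.0, 2.0, n)
        print(n, f"{err:.10f}", midpoint_bound(1.0, 2.0, n, 2.0))
    # A linear integrand is reproduced up to rounding, since f'' = 0.
    print(midpoint(lambda x: 3 * x + 1, 0.0, 2.0, 5))
```

For this convex integrand the midpoint error is positive and roughly half the size of the trapezoidal error, in line with the constants 24 and 12 in the two bounds.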


Simpson’s Rule

Simpson’s rule assumes $n = 2m$ in (5.16) and uses a parabola through each triple of points
\[ (P_{k-1}, P_k, P_{k+1}), \quad k = 2j+1, \quad j = 0, \ldots, m-1, \qquad P_k := (x_k, f(x_k)) = (x_k, y_k), \]
to approximate $f$. To obtain the rule, observe that any polynomial $p(x)$ of degree $\le 2$ may be written in the form
\[ p(x) = b_k(x - x_{k-1})(x - x_k) + c_k(x - x_{k-1}) + d_k, \tag{5.21} \]
where, with $h = h_n$,
\[ b_k = \frac{p(x_{k+1}) - 2p(x_k) + p(x_{k-1})}{2h^2}, \quad c_k = \frac{p(x_k) - p(x_{k-1})}{h}, \quad\text{and}\quad d_k = p(x_{k-1}). \]

FIGURE 5.9: Simpson’s rule approximation.

It follows that the unique polynomial $p_k$ of degree $\le 2$ that passes through the points $P_{k-1}$, $P_k$, and $P_{k+1}$ is obtained by choosing
\[ b_k = b_k(f) := \frac{f(x_{k+1}) - 2f(x_k) + f(x_{k-1})}{2h^2}, \quad c_k = c_k(f) := \frac{f(x_k) - f(x_{k-1})}{h}, \quad\text{and}\quad d_k = d_k(f) := f(x_{k-1}). \tag{5.22} \]
With this choice, one readily calculates
\[ S_{n,k} := \int_{x_{k-1}}^{x_{k+1}} p_k(x)\,dx = \frac{h_n}{3}\big[y_{k-1} + y_{k+1} + 4y_k\big], \quad k = 2j+1, \quad j = 0, \ldots, m-1, \]
which is taken as an approximation of $\int_{x_{k-1}}^{x_{k+1}} f$. Note that the approximation is exact for all polynomials $f$ of degree $\le 2$, since such a polynomial may be written in the form (5.21). Summing this result, we see that the integral of the approximating function on $[a, b]$ is
\[ S_n := \frac{b-a}{3n}\big[y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n\big]. \]

5.6.3 Simpson’s Rule. If $f \in \mathcal{R}_a^b$, then $\lim_n S_n = \int_a^b f$. Moreover, if $f^{(4)}$ exists and is continuous on $[a, b]$, then the following error estimate holds:
\[ \Big|\int_a^b f - S_n\Big| \le \frac{h_n^4(b-a)\|f^{(4)}\|_\infty}{180}. \]

Proof. Set
\[ R_n' := \big[y_0 + y_2 + \cdots + y_{n-2}\big](2h_n) \quad\text{and}\quad R_n'' := \big[y_1 + y_3 + \cdots + y_{n-1}\big](2h_n). \]
These are Riemann sums for $f$ on $[a, b]$ and
\[ 6S_n = 2R_n' + 4R_n'' + [f(b) - f(a)](2h_n). \]
It follows that $S_n \to \int_a^b f$. To obtain the error estimate, let $f^{(4)}$ be continuous on $[a, b]$ and denote the errors by
\[ E_{n,k} = \int_{x_{k-1}}^{x_{k+1}} f(x)\,dx - S_{n,k} \quad\text{and}\quad E_n = \sum_{j=0}^{m-1} E_{n,2j+1} = \int_a^b f(x)\,dx - S_n. \]
We show that there exists a point $\xi_k \in [x_{k-1}, x_{k+1}]$ such that
\[ E_{n,k} = -\frac{h_n^5 f^{(4)}(\xi_k)}{90}. \tag{5.23} \]
It will follow that
\[ |E_n| \le \frac{mh_n^5\|f^{(4)}\|_\infty}{90} = \frac{h_n^4(b-a)\|f^{(4)}\|_\infty}{180}, \]
proving the theorem. To verify (5.23), fix $k$ and choose a point $x_k^* \in (x_{k-1}, x_k) \cup (x_k, x_{k+1})$. For any function $g$, define a function $Lg$ on $[x_{k-1}, x_{k+1}]$ by
\[ (Lg)(x) = a_k(g)(x - x_{k-1})(x - x_k)(x - x_{k+1}) + b_k(g)(x - x_{k-1})(x - x_k) + c_k(g)(x - x_{k-1}) + d_k(g), \]
where $b_k(g)$, $c_k(g)$, and $d_k(g)$ are defined as in (5.22) and $a_k(g)$ is chosen so that $(Lg)(x_k^*) = g(x_k^*)$. Then $Lg$ is the unique polynomial of degree $\le 3$ passing through the four points
\[ \big(x_{k-1}, g(x_{k-1})\big), \quad \big(x_k, g(x_k)\big), \quad \big(x_{k+1}, g(x_{k+1})\big), \quad\text{and}\quad \big(x_k^*, g(x_k^*)\big). \]
Note that the coefficients in the definition of $L$ are linear functions of $g$, hence $L$ itself is a linear function. Furthermore, $Lg = g$ for all polynomials $g$ of degree $\le 3$. Since
\[ (Lf)(x) = a_k(f)(x - x_{k-1})(x - x_k)(x - x_{k+1}) + p_k(x) \quad\text{and}\quad \int_{x_{k-1}}^{x_{k+1}} (x - x_{k-1})(x - x_k)(x - x_{k+1})\,dx = 0, \]
we see that
\[ \int_{x_{k-1}}^{x_{k+1}} Lf = \int_{x_{k-1}}^{x_{k+1}} p_k = S_{n,k}. \]
By Taylor’s formula with integral remainder (Exercise 4.6.3), there exists a polynomial $T_3(x)$ of degree $\le 3$ such that
\[ f(x) = T_3(x) + R_3(x), \quad\text{where}\quad R_3(x) := \frac{1}{3!}\int_{x_{k-1}}^{x} (x - t)^3 f^{(4)}(t)\,dt. \]
The remainder may be written
\[ R_3(x) = \frac{1}{3!}\int_{x_{k-1}}^{x_{k+1}} q_t(x) f^{(4)}(t)\,dt, \quad\text{where}\quad q_t(x) := \begin{cases} (x - t)^3 & \text{if } t \le x, \\ 0 & \text{if } t > x. \end{cases} \]
Since
\[ Lf = LT_3 + LR_3 = T_3 + LR_3 = f - R_3 + LR_3, \]
we see that
\[ E_{n,k} = \int_{x_{k-1}}^{x_{k+1}} (f - Lf) = \int_{x_{k-1}}^{x_{k+1}} (R_3 - LR_3). \]
In the remaining calculations, for ease of notation we assume that $[x_{k-1}, x_{k+1}] = [-h, h]$. By Fubini’s theorem for continuous functions,
\[ \int_{-h}^{h} R_3(x)\,dx = \frac{1}{3!}\int_{-h}^{h} f^{(4)}(t)\int_{-h}^{h} q_t(x)\,dx\,dt = \frac{1}{4!}\int_{-h}^{h} f^{(4)}(t)(h - t)^4\,dt. \tag{5.24} \]
Also, because $L$ is linear,
\[ (LR_3)(x) = \frac{1}{3!}\int_{-h}^{h} f^{(4)}(t)(Lq_t)(x)\,dt. \]
(Footnote 6: This may be proved using the dominated convergence theorem. See Exercise 11.3.??) Therefore, by Fubini’s theorem,
\[ \int_{-h}^{h} (LR_3)(x)\,dx = \frac{1}{3!}\int_{-h}^{h} f^{(4)}(t)\int_{-h}^{h} (Lq_t)(x)\,dx\,dt. \tag{5.25} \]
Now, by definition of $L$,
\[ (Lq_t)(x) = a_t(x + h)x(x - h) + b_t(x + h)x + c_t(x + h) + d_t, \]
where
\[ b_t = \frac{q_t(h) - 2q_t(0) + q_t(-h)}{2h^2}, \quad c_t = \frac{q_t(0) - q_t(-h)}{h}, \quad\text{and}\quad d_t = q_t(-h). \]
Since $q_t(-h) = 0$ and $q_t(h) = (h - t)^3$, $t \in [-h, h]$,
\[ \int_{-h}^{h} (Lq_t)(x)\,dx = \tfrac23 h^3 b_t + 2h^2 c_t = \tfrac13 h\big[(h - t)^3 + 4q_t(0)\big]. \tag{5.26} \]
From (5.24), (5.25), and (5.26),
\[ \int_{-h}^{h} (f - Lf) = \int_{-h}^{h} (R_3 - LR_3) = \frac{1}{72}\int_{-h}^{h} f^{(4)}(t)\alpha(t)\,dt, \tag{5.27} \]
where
\[ \alpha(t) := 3(h - t)^4 - 4h\big[(h - t)^3 + 4q_t(0)\big]. \]
Recalling the definition of $q_t(0)$, we see that
\[ \alpha(t) = \begin{cases} (t - h)^3(3t + h) + 16ht^3 & \text{if } -h \le t \le 0, \\ (t - h)^3(3t + h) & \text{if } 0 \le t \le h. \end{cases} \tag{5.28} \]
Thus if $t \ge 0$,
\[ \alpha(-t) = (t + h)^3(3t - h) - 16ht^3 \quad\text{and}\quad \alpha(t) = (t - h)^3(3t + h). \tag{5.29} \]
The two polynomials in (5.29) have the same leading term $3t^4$, so their difference has degree $\le 3$. This difference is easily seen to vanish at the conveniently chosen points $t = 0, \pm h, 2h$, and therefore the polynomials must be identical. Thus $\alpha$ is an even function of $t$, so (5.28) may be rewritten
\[ \alpha(t) = \begin{cases} (t + h)^3(3t - h) & \text{if } -h \le t \le 0, \\ (t - h)^3(3t + h) & \text{if } 0 \le t \le h. \end{cases} \]
Taking derivatives, we see that $\alpha$ is decreasing on $[-h, 0]$ and increasing on $[0, h]$. Since $\alpha(-h) = \alpha(h) = 0$, it follows that $\alpha \le 0$ on $[-h, h]$. By (5.27) and the weighted mean value theorem for integrals, for some point $\xi \in [-h, h]$ we have
\[ \int_{-h}^{h} (f - Lf) = \frac{f^{(4)}(\xi)}{72}\int_{-h}^{h} \alpha(t)\,dt = \frac{f^{(4)}(\xi)}{36}\int_{0}^{h} (t - h)^3(3t + h)\,dt = -\frac{h^5 f^{(4)}(\xi)}{90}. \]
The same result holds for $E_{n,k}$, with the point $\xi$ depending on $k$. This completes the proof of the theorem.
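Simpson's sum and the fourth-order bound of 5.6.3 can be tried out on the same integrand used for the other rules. This is a sketch with hypothetical function names; the integrand 1/x on [1, 2] is an assumption for the example, and the value 24 for the supremum of |f^(4)| on [1, 2] is supplied by hand (f^(4) = 24/x^5).

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson sum S_n; n must be even."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        # Odd-indexed nodes get weight 4, even interior nodes weight 2.
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def simpson_bound(a, b, n, sup_f4):
    """Error bound h_n^4 (b - a) ||f^(4)|| / 180 from 5.6.3."""
    h = (b - a) / n
    return h ** 4 * (b - a) * sup_f4 / 180

if __name__ == "__main__":
    f = lambda x: 1.0 / x
    for n in (4, 8):
        err = math.log(2) - simpson(f, 1.0, 2.0, n)
        print(n, f"{err:.10f}", simpson_bound(1.0, 2.0, n, 24.0))
```

Doubling n divides the bound by 16, which is the practical meaning of the factor h_n^4.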


Comparison of the Approximations

Table 5.3 below gives the errors $\int_1^2 x^{-1}\,dx - A_n$, rounded to 10 decimal places, where $A_n$ is the approximation. The left point rule simply refers to approximation by the Riemann sum $R_n$. The exact value of the integral, up to 10 decimal places, is $\ln 2 = .6931471805\ldots$

TABLE 5.3: A comparison of the methods.

Method              n = 4            n = 8
Left Point Rule     -.0663766290     -.0322246698
Trapezoidal Rule    -.0038766290     -.0009746698
Midpoint Rule        .0019272893      .0004866265
Simpson’s Rule      -.0001067877     -.0000073501

5.7

Improper Integrals

In this section, the Riemann integral is extended in two ways: First, the integrand is allowed to be unbounded and second, the integration interval can be infinite.

5.7.1 Definition. A function $f$ is said to be locally integrable on an interval $I$ if $f \in \mathcal{R}_c^d$ for every interval $[c, d]$ contained in $I$. ♦

For example, a continuous function is locally integrable on any interval.

5.7.2 Definition. Each expression in (a)–(c) below is called an improper integral. The integral is said to converge if the limit exists in $\mathbb{R}$ and to diverge otherwise. In the former case, $f$ is said to be improperly integrable on $I$.

(a) $\displaystyle\int_a^b f := \lim_{t \to b^-}\int_a^t f$, where $f$ is locally integrable on $[a, b)$.

(b) $\displaystyle\int_a^b f := \lim_{t \to a^+}\int_t^b f$, where $f$ is locally integrable on $(a, b]$.

(c) $\displaystyle\int_a^b f := \int_a^c f + \int_c^b f$, where $f$ is locally integrable on $(a, c) \cup (c, b)$. ♦

Note that the limits of integration in these definitions, where appropriate, may be infinite.


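It is easy to see that if $f$ is locally integrable on $(a, b]$, then $\int_a^b f$ converges iff $\int_a^c f$ converges for some (every) $c \in (a, b)$. In this case,
\[ \int_a^b f = \int_a^c f + \int_c^b f. \]
The first integral on the right is improper while the second is a Riemann integral. Moreover, if $f$ is also bounded and $a, b \in \mathbb{R}$, then, by Exercise 5.2.10, the improper integral $\int_a^b f$ is simply the Riemann integral. Analogous remarks apply to the other cases.

5.7.3 Examples. (a) Let $p \in \mathbb{R}$. For $0 < s < t$,
\[ \int_s^t \frac{dx}{x^p} = \begin{cases} (1 - p)^{-1}\big(t^{1-p} - s^{1-p}\big) & \text{if } p \ne 1, \\ \ln t - \ln s & \text{if } p = 1. \end{cases} \]
It follows that
\[ \int_1^\infty \frac{dx}{x^p} \text{ converges iff } p > 1 \quad\text{and}\quad \int_0^1 \frac{dx}{x^p} \text{ converges iff } p < 1. \]

(b) Let $r > 0$, $r \ne 1$. For $t > 1$,
\[ \int_1^t r^x\,dx = \frac{r^t - r}{\ln r}, \quad\text{hence}\quad \int_1^\infty r^x\,dx \text{ converges iff } r < 1. \]

(c) Since $\displaystyle\int_s^t (1 + x^2)^{-1}\,dx = \tan^{-1} t - \tan^{-1} s$,
\[ \int_0^\infty \frac{dx}{1 + x^2} = \int_{-\infty}^0 \frac{dx}{1 + x^2} = \frac{\pi}{2}, \quad\text{hence}\quad \int_{-\infty}^\infty \frac{dx}{1 + x^2} = \pi. \]

(d) $\displaystyle\int_{-1}^1 \frac{dx}{\sqrt{|x|}} = \int_{-1}^0 \frac{dx}{\sqrt{-x}} + \int_0^1 \frac{dx}{\sqrt{x}} = 2\lim_{t \to 0^+}\int_t^1 \frac{dx}{\sqrt{x}} = 4.$ ♦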
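The dichotomy in Example 5.7.3(a) is visible numerically from the closed form. The following sketch uses a hypothetical helper `power_integral`, which evaluates the displayed formula for the integral of x^(-p) over [s, t]; the sample exponents and cutoffs are arbitrary choices for illustration.

```python
import math

def power_integral(p, s, t):
    """Closed form of the integral of x**(-p) over [s, t], where 0 < s < t."""
    if p == 1:
        return math.log(t) - math.log(s)
    return (t ** (1 - p) - s ** (1 - p)) / (1 - p)

if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0):
        at_infinity = [power_integral(p, 1.0, 10.0 ** k) for k in (2, 4, 6)]
        at_zero = [power_integral(p, 10.0 ** -k, 1.0) for k in (2, 4, 6)]
        print(p, at_infinity, at_zero)
```

For p = 2 the values over [1, t] settle near 1 while those over [s, 1] blow up; for p = 1/2 the roles are reversed; for p = 1 both grow like a logarithm.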

For ease of exposition, for the remainder of the section we consider only integrals that are improper at the upper limit. Analogous discussions hold for the other types of improper integrals. The proof of the following theorem is left to the reader.

5.7.4 Theorem. Let $f$ and $g$ be locally integrable on $[a, b)$ and let $\alpha, \beta \in \mathbb{R}$. If the improper integrals $\int_a^b f$ and $\int_a^b g$ converge, then so does $\int_a^b (\alpha f + \beta g)$, and
\[ \int_a^b (\alpha f + \beta g) = \alpha\int_a^b f + \beta\int_a^b g. \]


In contrast to the Riemann integral, the product of improperly integrable functions may not be improperly integrable. For example, $f(x) := 1/\sqrt{1 - x}$ is improperly integrable on the interval $[0, 1)$ but $f^2$ is not. The following example illustrates the same phenomenon, but on an unbounded interval. It is the first of several examples in this section that use the fact, proved in Chapter 6, that a series of the form $\sum_{n=1}^\infty 1/n^p$ converges iff $p > 1$.

5.7.5 Example. Define $f$ on $[1, +\infty)$ by $f(x) = n$ if $n \le x < n + 1/n^{5/2}$, $n = 1, 2, \ldots$, and $f(x) = 0$ otherwise. Then
\[ \int_1^{n+1} f = \sum_{k=1}^n \frac{1}{k^{3/2}} \quad\text{and}\quad \int_1^{n+1} f^2 = \sum_{k=1}^n \frac{1}{k^{1/2}}, \]
hence $\int_1^\infty f$ converges, whereas $\int_1^\infty f^2$ diverges. ♦

We now have examples, on both bounded and unbounded intervals, of nonnegative improperly integrable functions whose squares are not improperly integrable. Conversely, there exist locally integrable nonnegative functions on unbounded intervals, for example, $f(x) = 1/x$ on $[1, +\infty)$, such that $f^2$ is improperly integrable but $f$ is not. However, for bounded intervals this is not possible: If $f^2$ is improperly integrable on a bounded interval, then so is $|f|$. (Exercise 25.)

The remainder of this section describes various convergence tests for improper integrals. Many of these are analogs of convergence tests for infinite series, discussed in Chapter 6.

5.7.6 Comparison Test for Integrals. Let $f$ and $g$ be locally integrable on $[a, b)$ such that $0 \le f \le g$. If $\int_a^b g$ converges, then so does $\int_a^b f$.

Proof. Let $F(x) = \int_a^x f$ and $G(x) = \int_a^x g$, $a \le x < b$. Since $f$ and $g$ are nonnegative, $F$ and $G$ are increasing, hence, by the monotone function theorem (3.1.17),
\[ \int_a^b f = \lim_{x \to b^-} F(x) \quad\text{and}\quad \int_a^b g = \lim_{x \to b^-} G(x) \]
exist in the extended real number system $\overline{\mathbb{R}}$. Since $F \le G$, the conclusion follows.

5.7.7 Example. Let $f(x) = \dfrac{1 + \sin x}{\sqrt{x}\,(x + 1)^2}$, $x > 0$. By definition,
\[ \int_0^\infty f = \int_0^1 f + \int_1^\infty f, \]
provided the integrals on the right converge. That this is indeed the case follows from 5.7.3(a), 5.7.6, and the inequalities $f(x) \le 2/\sqrt{x}$ on $(0, 1]$ and $f(x) \le 2/(x + 1)^2$ on $[1, +\infty)$. ♦


5.7.8 Example. Define the gamma function $\Gamma$ by
\[ \Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt, \quad x > 0. \]
To see that the integral converges for all $x > 0$, note that $t^{x-1}e^{-t} \le t^{x-1}$ for $t \in (0, 1]$, hence $\int_0^1 t^{x-1}e^{-t}\,dt$ converges by comparison with $\int_0^1 t^{x-1}\,dt$ (see 5.7.3(a)). Furthermore, by l’Hospital’s rule applied sufficiently many times,
\[ \lim_{t \to +\infty} t^{x+1}e^{-t} = 0, \]
so there exists $t_0 > 1$ such that $t^{x+1}e^{-t} \le 1$, or $t^{x-1}e^{-t} \le t^{-2}$, for all $t \ge t_0$. Therefore, $\int_1^\infty t^{x-1}e^{-t}\,dt$ converges by comparison with $\int_1^\infty t^{-2}\,dt$. The gamma function has the following recursive property:
\[ \Gamma(x + 1) = x\Gamma(x). \]
To see this, integrate $\Gamma(x + 1)$ by parts to obtain
\[ \int_a^b t^x e^{-t}\,dt = \Big[t^x e^{-t}\Big]_{t=b}^{t=a} + x\int_a^b t^{x-1}e^{-t}\,dt, \]
and then let $a \to 0$ and $b \to +\infty$. In particular, for $n \in \mathbb{N}$,
\[ \Gamma(n + 1) = n\Gamma(n) = n(n - 1)\Gamma(n - 1) = \cdots = n(n - 1)\cdots 1\cdot\Gamma(1). \]
Since
\[ \Gamma(1) = \int_0^\infty e^{-t}\,dt = 1, \]
we see that $\Gamma(n + 1) = n!$. Thus $\Gamma(x)$ is a continuous (indeed, differentiable) extension of the factorial function on $\mathbb{N}$. ♦

5.7.9 Limit Comparison Test for Integrals. Let $f$ and $g$ be locally integrable on $[a, b)$ with $f \ge 0$ and $g > 0$. If $L := \lim_{x \to b} f(x)/g(x)$ exists and $0 < L < +\infty$, then $\int_a^b g$ converges iff $\int_a^b f$ converges.

Proof. Since $f, g \ge 0$, $\int_a^b f$ and $\int_a^b g$ exist in the extended real number system $\overline{\mathbb{R}}$. Choose $c \in (a, b)$ such that $L/2 < f(x)/g(x) < 2L$ for all $x \in [c, b)$. For such $x$, $g(x) < 2f(x)/L$ and $f(x) < 2Lg(x)$. The assertion then follows from the inequalities
\[ \int_c^b g \le \frac{2}{L}\int_c^b f \quad\text{and}\quad \int_c^b f \le 2L\int_c^b g. \]

5.7.10 Example. Let $f(x) = \dfrac{\sqrt{2x^5 - x^2 + 1}}{x^4 + 3x + 5}$, $x \ge 1$. For $g(x) = x^{-3/2}$,
\[ \lim_{x \to +\infty}\frac{f(x)}{g(x)} = \lim_{x \to +\infty}\frac{\sqrt{2x^8 - x^5 + x^3}}{x^4 + 3x + 5} = \sqrt{2}. \]
Since $\int_1^\infty g$ converges, so does $\int_1^\infty f$. ♦
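Returning to the gamma function of Example 5.7.8, the identities Γ(x + 1) = xΓ(x) and Γ(n + 1) = n! can be spot-checked by crude quadrature. This is a sketch under stated assumptions: `gamma_numeric` is a hypothetical helper that applies a midpoint sum on the truncated interval (0, 60], which is adequate only for moderate x ≥ 1; the standard-library `math.gamma` is used as an independent reference.

```python
import math

def gamma_numeric(x, upper=60.0, n=100_000):
    """Midpoint-sum estimate of the integral of t**(x-1) * exp(-t) over (0, upper]."""
    h = upper / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h  # midpoint of the k-th subinterval, always > 0
        total += t ** (x - 1) * math.exp(-t)
    return h * total

if __name__ == "__main__":
    for x in (1.0, 2.5, 3.5, 5.0):
        print(x, gamma_numeric(x), math.gamma(x))
    print("3.5 vs 2.5 * Gamma(2.5):", gamma_numeric(3.5), 2.5 * gamma_numeric(2.5))
```

The truncation at 60 is harmless here because the integrand decays like e^{-t}; for x close to 0 the singularity at t = 0 would call for a different method.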


5.7.11 Root Test for Integrals. Let $f$ be locally integrable and nonnegative on $[a, b)$, where $b > 0$, and suppose that $L := \lim_{x \to b^-} [f(x)]^{1/x}$ exists in $\mathbb{R}$. Then $\int_a^b f$ converges if $L < 1$ and diverges if $L > 1$.

Proof. Suppose $L < 1$. Choose $r \in (L, 1)$ and $x_0 \in (a, b) \cap (0, b)$ such that $[f(x)]^{1/x} < r$ for all $x \ge x_0$. For such $x$, $f(x) < r^x$, hence, by the comparison theorem and 5.7.3(b), $\int_{x_0}^b f$ converges. Therefore, $\int_a^b f$ converges. A similar argument shows that $\int_a^b f$ diverges if $L > 1$.

5.7.12 Example. For $p \in \mathbb{R}$ and $x \ge 1$, let
\[ f(x) = \Big(\frac{2x + \cos x}{3x + \sin x}\Big)^{px}. \]
Since $\lim_{x \to +\infty} [f(x)]^{1/x} = (2/3)^p$, $\int_1^{+\infty} f$ converges iff $p > 0$. ♦

There are examples of convergent integrals and divergent integrals with $L = 1$, so the root test is inconclusive in this case (see Exercise 3).

5.7.13 Definition. Let $f$ be locally integrable on $[a, b)$. The improper integral $\int_a^b f$ is said to converge absolutely if $\int_a^b |f|$ converges. In this case $f$ is said to be improperly absolutely integrable on $[a, b)$. If $\int_a^b f$ converges but not absolutely, then the integral is said to converge conditionally. ♦

5.7.14 Proposition. If $f$ is improperly absolutely integrable on $[a, b)$, then $\int_a^b f$ converges and
\[ \Big|\int_a^b f\Big| \le \int_a^b |f|. \]

Proof. Set $g(x) := |f(x)| + f(x)$, so $0 \le g \le 2|f|$ on $[a, b)$. By the comparison test, $\int_a^b g$ converges. Since $f = g - |f|$ is the difference of two improperly integrable functions, $f$ is improperly integrable. The inequality follows on letting $t \to b^-$ in the inequality $\big|\int_a^t f\big| \le \int_a^t |f|$.

5.7.15 Example. For $p > 0$, define
\[ f(x) = \frac{(-1)^{n+1}}{n^p}, \quad n \le x < n + 1, \quad n = 1, 2, \ldots. \]
Then
\[ \int_1^{n+1} |f| = \sum_{k=1}^n \frac{1}{k^p} \quad\text{and}\quad \int_1^{n+1} f = \sum_{k=1}^n \frac{(-1)^{k+1}}{k^p}. \]
The first sum has a finite limit iff $p > 1$, while the second sum has a finite limit iff $p > 0$ (see Chapter 6). Therefore, $\int_1^\infty f$ converges absolutely iff $p > 1$ and conditionally iff $0 < p \le 1$. ♦
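The contrast between absolute and conditional convergence in Example 5.7.15 shows up clearly in the partial sums. The sketch below uses a hypothetical helper `partial_integrals` that returns the two sums displayed in the example; the choice of p and n is arbitrary, and the limiting values ln 2 and π²/6 quoted in the comments are standard facts about the alternating harmonic series and the series of reciprocal squares, not results from this section.

```python
import math

def partial_integrals(p, n):
    """Return (integral of |f|, integral of f) over [1, n+1] for the step
    function f(x) = (-1)**(k+1) / k**p on [k, k+1)."""
    abs_sum = sum(1.0 / k ** p for k in range(1, n + 1))
    alt_sum = sum((-1) ** (k + 1) / k ** p for k in range(1, n + 1))
    return abs_sum, alt_sum

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        # p = 1: the first value keeps growing, the second approaches ln 2.
        print(n, partial_integrals(1.0, n))
    # p = 2: both values stabilize; the first approaches pi**2 / 6.
    print(partial_integrals(2.0, 100000), math.pi ** 2 / 6)
```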


The following theorem is useful in establishing conditional convergence of improper integrals.

5.7.16 Dirichlet’s Test for Integrals. Let $f$ be continuous and $g'$ improperly absolutely integrable on $[a, b)$. If the function $F(t) := \int_a^t f$ is bounded on $[a, b)$ and $\lim_{x \to b^-} g(x) = 0$, then $\int_a^b fg$ converges.

Proof. Let $M$ be a bound for $|F|$ on $[a, b)$. Then $|Fg'| \le M|g'|$, hence, by the comparison test, $Fg'$ is absolutely integrable on $[a, b)$. Integrating by parts yields
\[ \int_a^t fg = F(t)g(t) - \int_a^t Fg'. \]
Since $\int_a^b Fg'$ converges and $\lim_{t \to b^-} F(t)g(t) = 0$, $\int_a^b fg$ converges.

5.7.17 Corollary. Let $f$ be continuous and $g'$ locally integrable on $[a, b)$ with $\lim_{x \to b^-} g(x) = 0$. If the function $F(t) := \int_a^t f$ is bounded on $[a, b)$ and if $g'$ has constant sign, then $\int_a^b fg$ converges.

Proof. By the fundamental theorem of calculus, $\int_a^t g' = g(t) - g(a)$, hence $g'$ is absolutely integrable on $[a, b)$ and Dirichlet’s test applies.

5.7.18 Example. Let $h(x) = x^{-p}\sin x$, $x \ge 1$, where $p > 0$. Taking $f(x) = \sin x$ and $g(x) = x^{-p}$ in 5.7.17 shows that $\int_1^\infty h$ converges. Since $|h(x)| \le 1/x^p$, $h$ is improperly absolutely integrable on $[1, +\infty)$ if $p > 1$. If $0 < p \le 1$, the sums on the right in the inequality
\[ \int_\pi^{n\pi} |h| = \sum_{k=2}^n \int_{(k-1)\pi}^{k\pi} |h| > \sum_{k=2}^n (k\pi)^{-p}\int_{(k-1)\pi}^{k\pi} |\sin x|\,dx = 2\pi^{-p}\sum_{k=2}^n k^{-p} \]
are unbounded (see Example 6.2.5), hence $h$ is not improperly absolutely integrable in this case. ♦

5.7.19 Cauchy–Schwarz Inequality for Improper Integrals. Let $f$ and $g$ be continuous with $f^2$ and $g^2$ improperly integrable on $[a, b)$. Then $fg$ is improperly absolutely integrable on $[a, b)$ and
\[ \Big(\int_a^b |fg|\Big)^2 \le \int_a^b f^2 \cdot \int_a^b g^2. \]

Proof. By Exercise 5.2.13, for all $t \in [a, b)$,
\[ \Big(\int_a^t |fg|\Big)^2 \le \int_a^t f^2 \cdot \int_a^t g^2. \]
Now let $t \to b$.


Exercises Z

1

dx converges. p (1 − x)q (sin x) 0 Z ∞ Z ∞ 2. Let p > 0. Show that x−px dx converges and x−p/x dx diverges. 1 1 Z 1 Z 1 −px Show that the same behavior holds for x dx and x−p/x dx. 1.S Determine all values of p, q > 0 for which

0

0

3. Find examples for which limx→+∞ [f (x)] = 1 and Z ∞ Z ∞ (a) f converges. (b) f diverges. 1/x

1

1

4. Let f and g be positive and continuous on [1,R +∞). Prove that if ∞ f f (x) L := limx→+∞ exists in R, then lim Rx∞ = L. x→+∞ g(x) g x 5.S Determine if the integrals converge or diverge: Z 1 Z 1 Z 1√ sin x sin x sin x − x dx. (b) dx. (c) dx. (a) 3 2 x x x 0 0 0 Z ∞ Z ∞ Z ∞ (ln x)(sin x) (sin x)(cos x−1 ) 1 (d) dx. (e) dx. (f) dx. sin2 ln x x x 2 2 1 π/2

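6. Prove that $\displaystyle\int_0^{\pi/2} \cos(\sec^p x)\,dx$ converges for all $p > 0$.

7. Show that $\displaystyle\int_0^1 \frac{x^p}{\sqrt{1 - x^2}}\,dx$ converges iff $p > -1$.

8.S Show that $\displaystyle\int_0^1 \frac{\sin^p x}{x^q}\,dx$ converges iff $p < 1 + q$.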

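9. Find all values of $p$ for which the integral converges:

(a)S $\displaystyle\int_1^\infty x^p e^{-x}\,dx$. (b) $\displaystyle\int_0^1 x^p e^{-x}\,dx$. (c)S $\displaystyle\int_0^1 \sin x^p\,dx$.

(d) $\displaystyle\int_0^1 x^p \sin x^p\,dx$. (e) $\displaystyle\int_0^1 x^p \ln x\,dx$. (f) $\displaystyle\int_1^\infty x^p \ln x\,dx$.

(g) $\displaystyle\int_0^{\pi/2} \sin^p x\,dx$. (h)S $\displaystyle\int_0^{\pi/2} x\sin^p x\,dx$. (i) $\displaystyle\int_0^{\pi/2} (1 - \sin x)^p\,dx$.

(j) $\displaystyle\int_0^{\pi/2} \tan^p x\,dx$. (k)S $\displaystyle\int_0^{\pi/2} x^p \cos x\,dx$. (l) $\displaystyle\int_0^{\pi/2} x^p \sin x\,dx$.

10. Find all values of $p > 0$ for which $\displaystyle\int_0^1 x^{-p}\sin e^x\,dx$ converges absolutely.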


11.S Prove that $\displaystyle\int_1^\infty \frac{x\sin x}{1 + x^2}\,dx$ converges conditionally.

12. Prove that $\displaystyle\int_1^\infty x^p \sin e^x\,dx$ converges for all $p$. For what values of $p$ does

the integral converge conditionally? (See 5.7.18.) 13.S Find all values of p, q > 0 for which the integral converges: Z 1 Z 1 Z ∞ xp dx dx √ p . (b) dx. (c) . (a) p )q 2p )q (1 − x (1 − x x + xq 0 1 0 Z 1 Z π/2 Z π/2 sinp x 1 dx √ p (d) . (e) dx. (f) p q dx. qx q cos sin x x + x 0 0 0 Z ∞ 14. Prove by induction that xn e−x dx = n!. 0

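15.S Given that $\displaystyle\int_{-\infty}^\infty e^{-x^2/2}\,dx = \sqrt{2\pi}$ (to be established in 11.5.3), show that
\[ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty x^{2n}e^{-x^2/2}\,dx = (2n - 1)(2n - 3)\cdots 3\cdot 1 = \frac{(2n)!}{n!\,2^n}. \]

16. Given that $\displaystyle\int_0^\infty e^{-s^2}\,ds = \sqrt{\pi}/2$, show that
\[ \Gamma\Big(\frac12\Big) = \sqrt{\pi}, \quad \Gamma\Big(\frac32\Big) = \frac{\sqrt{\pi}}{2}, \quad\text{and}\quad \Gamma\Big(\frac52\Big) = \frac{3\sqrt{\pi}}{4}. \]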

17. The formula $\Gamma(x) = x^{-1}\Gamma(x + 1)$ may be used to extend the gamma function to non-integer values $x < 0$. Use this to find $\Gamma\big(-\tfrac12\big)$ and $\Gamma\big(-\tfrac32\big)$.

18. Prove that if $f$ is absolutely integrable on $[1, \infty)$, then
\[ \lim_{n \to \infty}\int_1^\infty f(x^n)\,dx = 0. \]

19. (Log test for integrals). Let $f$ be locally integrable and positive on $[0, +\infty)$ such that
\[ L := \lim_{x \to \infty}\frac{-\ln f(x)}{\ln x} \]
exists in $\mathbb{R}$. Prove that $\int_0^\infty f$ converges if $L > 1$ and diverges if $L < 1$.

20. Use Exercise 19 to determine the convergence behavior of Z ∞ Z ∞ − ln x −√x (a) ln x dx. (b) ln x dx. S

1

What does the root test reveal?

1

21. Prove that $L(a) := \displaystyle\lim_{t \to +\infty}\int_{1/t}^{t} \frac{\sin ax}{x}\,dx$ converges for all $a \in \mathbb{R}$ and that $L(a) = L(1)$ for all $a > 0$.

22. Let $f$ be differentiable and nonzero on $[1, +\infty)$. If $\lim_{x \to +\infty} xf'(x)/f(x)$ exists in $\mathbb{R}$ and is less than $-1$, prove that $\int_1^\infty f$ converges.

23. Prove that if $\int_0^\infty f(x)\,dx$ converges, then $\lim_n \int_0^1 f(nx)\,dx = 0$.

24.S Prove that if $f \ge 0$ and $\int_1^\infty f$ converges, then $\int_1^\infty \sqrt{f(x)}/x\,dx$ converges.

25. Prove that if $[a, b)$ is finite and $f^2$ is improperly integrable on $[a, b)$, then $|f|$ is improperly integrable on $[a, b)$.

26.S Let $f$ be continuous and $g$ locally integrable and positive on $[a, b)$. Suppose that the function $G(x) := \int_a^x g$ is bounded on $[a, b)$ and that $\lim_{x \to b^-} f(x) = 0$. Prove that $\int_a^b fg$ converges.

27. Let $f$ be continuous on $[a, b)$ such that $\int_a^b f$ converges. If $g'$ is locally integrable and has constant sign on $[a, b)$, prove that $\int_a^b fg$ converges.

28.S Let $f$ be improperly integrable on $(-\infty, +\infty)$ and $c \in \mathbb{R}$. Prove that
\[ \int_{-\infty}^{\infty} f(x + c)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx. \]

5.8

A Deeper Look at Riemann Integrability

In this section we characterize Riemann integrability of a function in terms of the size of its set of discontinuities.

5.8.1 Definition. A set $A$ of real numbers is said to have (Lebesgue) measure zero if for each $\varepsilon > 0$ there exists a finite or infinite sequence of intervals $I_n$ with total length $\sum_n |I_n| < \varepsilon$ such that the sequence covers $A$, that is, every member of $A$ is contained in some $I_n$. ♦

Any countable set has measure zero. Indeed, if $A = \{a_1, a_2, \ldots\}$ and $\varepsilon > 0$, then the intervals $I_n = (a_n - \varepsilon/2^{n+2},\, a_n + \varepsilon/2^{n+2})$ obviously cover $A$ and have total length $< \varepsilon$. In particular, the set of rational numbers has measure zero. An uncountable set of measure zero is constructed in Example 10.3.4.

The following result will be proved in Chapter 11.

5.8.2 Theorem. Let $f$ be bounded on $[a, b]$. Then $f \in \mathcal{R}_a^b$ iff its set of discontinuities has measure zero.
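The covering argument for countable sets can be made concrete with exact arithmetic. In the sketch below, `cover_lengths` is a hypothetical helper that lists the lengths of the first few intervals of half-width ε/2^(n+2) centered at the points a_n; the points themselves play no role in the total, so only the lengths are computed.

```python
from fractions import Fraction

def cover_lengths(eps, count):
    """Lengths of the intervals I_n, n = 1..count, each of half-width eps / 2**(n+2)."""
    eps = Fraction(eps)
    # An interval of half-width eps / 2**(n+2) has length eps / 2**(n+1).
    return [eps / 2 ** (n + 1) for n in range(1, count + 1)]

if __name__ == "__main__":
    eps = Fraction(1, 100)
    total = sum(cover_lengths(eps, 60))
    print("total length of first 60 intervals:", float(total))
    print("stays below eps / 2:", total < eps / 2)
```

The lengths form a geometric series with sum ε/2, so every finite partial total is below ε/2 and the full cover has total length less than ε, as claimed.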


Examples 5.1.11 and 5.1.12 are relevant here: The function in the first example, shown to be integrable, has a countable set of discontinuities. The function in the second example, shown not to be integrable, has [0, 1] as its set of discontinuities, certainly not a set of measure zero. Theorem 5.8.2 allows simple proofs of many of the properties discussed in this chapter. For example, if f and g are integrable with sets of discontinuity A and B, respectively, then f + g and f g have sets of discontinuity contained in A ∪ B, a set of measure zero (Exercise 2), and hence are integrable.

Exercises

1. Show that if $B$ has measure zero and $A \subseteq B$, then $A$ has measure zero.

2.S Prove: If $A_n$ has measure zero for every $n \in \mathbb{N}$, then so does $A_1 \cup A_2 \cup \cdots$.

3. Let $A$ have measure zero. Prove that $A + \mathbb{Q}$ has measure zero.

4. Let $f : [a, b] \to [c, d]$ be integrable and $g : [c, d] \to \mathbb{R}$ continuous. Prove that $g \circ f$ is integrable.

5. A set $A$ of real numbers has (Jordan) content zero if for each $\varepsilon > 0$ there exist finitely many intervals of total length $< \varepsilon$ that cover $A$. Show that
(a) a convergent sequence has content zero.
(b) $[0, 1] \cap \mathbb{Q}$ does not have content zero.

6.S Prove that the function $f$ in Exercise 3.3.10 is integrable on $[a, b]$ and find its integral.

*5.9

Functions of Bounded Variation

5.9.1 Definition. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be a partition of $[a, b]$. For $f : [a, b] \to \mathbb{R}$ define
$$V_P(f) = \sum_{j=1}^{n} |f(x_j) - f(x_{j-1})|.$$
The total variation of $f$ on $[a, b]$ is the extended real number
$$V_a^b(f) := \sup_P V_P(f).$$
The function $f$ is said to have bounded variation on $[a, b]$ if $V_a^b(f) < +\infty$. The set of all functions with bounded variation on $[a, b]$ is denoted by $\mathcal{BV}_a^b$. ♦
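The sums $V_P(f)$ of Definition 5.9.1 are easy to compute for a concrete partition. The sketch below is illustrative only: the names `variation_sum` and `uniform_partition` and the sample functions are assumptions, and a finite computation can only give lower bounds for the supremum $V_a^b(f)$.

```python
import math

def variation_sum(f, partition):
    """V_P(f): sum of |f(x_j) - f(x_{j-1})| over consecutive partition points."""
    return sum(
        abs(f(partition[j]) - f(partition[j - 1]))
        for j in range(1, len(partition))
    )

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * j / n for j in range(n + 1)]

def cube(x):
    return x ** 3

# For a monotone function the sum telescopes to |f(b) - f(a)|.
v_cube = variation_sum(cube, uniform_partition(-1.0, 2.0, 37))

# For sin on [0, 2*pi], refining the partition can only increase V_P.
v_coarse = variation_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 2))
v_fine = variation_sum(math.sin, uniform_partition(0.0, 2 * math.pi, 8))

print(v_cube, v_coarse, v_fine)
```

The coarse partition of $[0, 2\pi]$ misses both extrema of the sine function, while the finer one contains them, which is why the two sums differ so sharply.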


5.9.2 Proposition. Let $f : [a, b] \to \mathbb{R}$.

(a) If $f \in \mathcal{BV}_a^b$, then $f$ is bounded.

(b) If $f$ has a bounded derivative on $[a, b]$, then $f \in \mathcal{BV}_a^b$.

(c) If $f$ is monotone on $[a, b]$, then $V_a^b(f) = |f(b) - f(a)|$.

(d) If $g \in \mathcal{R}_a^b$ and $f(x) = \int_a^x g(t)\,dt$, then $V_a^b(f) \le (b - a) \sup_{[a,b]} |g|$.

(e) If $P$ is a partition of $[a, b]$ and $Q$ is a refinement of $P$, then $V_P(f) \le V_Q(f)$.

(f) If $f, g \in \mathcal{BV}_a^b$ and $c \in \mathbb{R}$, then $f + g,\ cf,\ fg \in \mathcal{BV}_a^b$.

Proof. (a) Let $a < x < b$ and $P = \{a, x, b\}$. Then
$$2|f(x)| \le |f(x) - f(a)| + |f(x) - f(b)| + |f(a)| + |f(b)| = V_P(f) + |f(a)| + |f(b)| \le V_a^b(f) + |f(a)| + |f(b)|.$$

(b) Let $|f'| \le C$ on $[a, b]$. By the mean value theorem, given a partition $P$, there exists for each $j$ a point $t_j \in (x_{j-1}, x_j)$ such that
$$V_P(f) = \sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| = \sum_{j=1}^{n} |f'(t_j)|(x_j - x_{j-1}) \le C(b - a).$$
Therefore, $V_a^b(f) \le C(b - a)$.

(c) If $f$ is increasing, then
$$\sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| = \sum_{j=1}^{n} \bigl(f(x_j) - f(x_{j-1})\bigr) = f(b) - f(a).$$

(d) Let $M := \sup_{a \le t \le b} |g(t)|$. Then, for any partition $P$,
$$V_P(f) \le \sum_{j=1}^{n} \int_{x_{j-1}}^{x_j} |g(t)|\,dt \le \sum_{j=1}^{n} M (x_j - x_{j-1}) = M(b - a).$$

(e) Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ and $P' = P \cup \{c\}$, where $c \in [x_{i-1}, x_i]$. Then
$$V_P(f) = \sum_{j \ne i} |f(x_j) - f(x_{j-1})| + |f(x_i) - f(x_{i-1})| \le \sum_{j \ne i} |f(x_j) - f(x_{j-1})| + |f(x_i) - f(c)| + |f(c) - f(x_{i-1})| = V_{P'}(f).$$
Adding points successively yields (e).


(f) Let $|f|, |g| \le M$ on $[a, b]$. The inequality
$$|(fg)(x_j) - (fg)(x_{j-1})| \le M |g(x_j) - g(x_{j-1})| + M |f(x_j) - f(x_{j-1})|$$
shows that $fg \in \mathcal{BV}_a^b$. The proofs of the remaining parts of (f) are similar.

5.9.3 Example. For $\alpha > 0$, define a continuous function $f_\alpha$ on $[0, 1]$ by
$$f_\alpha(x) := \begin{cases} x^\alpha \sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}$$
We show that if $\alpha \le 1$, then $f_\alpha$ does not have bounded variation on $[0, 1]$. Set
$$a_k := \frac{1}{2k\pi + \pi/2} = \frac{2}{(4k + 1)\pi} \quad\text{and}\quad b_k := \frac{1}{2k\pi}$$
and note that
$$f_\alpha(b_k) = 0 \quad\text{and}\quad f_\alpha(a_k) = a_k^\alpha = \frac{c}{(4k + 1)^\alpha}, \quad\text{where } c := \frac{2^\alpha}{\pi^\alpha}.$$

Since bk+1 < ak < bk , for sufficiently small ε > 0 we may form the partition Pε = {ε < ap < bp < ap−1 < · · · < ak < bk < · · · < bq+1 < aq < bq < 1} of [ε, 1], where p and q are, respectively, the largest and smallest integers satisfying ε < ap < bq < 1, equivalently, 1 2 − πε s. It follows that lim+ Vtb (f ) ≥ s. Since s was arbitrary, the assertion follows. t→a

5.9.6 Example. We use 5.9.5 to show that the function $f_\alpha$ in 5.9.3 has bounded variation on $[0, 1]$ if $\alpha > 1$. We have
$$|f_\alpha'(x)| = |\alpha x^{\alpha-1} \sin(1/x) - x^{\alpha-2} \cos(1/x)| \le \alpha x^{\alpha-1} + x^{\alpha-2}.$$
If $\alpha > 1$, the integral $\int_0^1 x^{\alpha-2}\,dx$ converges, hence $\int_0^1 |f_\alpha'|$ converges. ♦

5.9.7 Theorem. If $f \in \mathcal{BV}_a^b$, then there exist monotone increasing functions $g$ and $h$ on $[a, b]$ such that $f = g - h$.

Proof. For $x \in [a, b]$, define $g(x) := V_a^x(f)$ and $h(x) := g(x) - f(x)$. Clearly, $g$ is increasing. To see that $h$ is increasing, let $x < y$, let $P_x$ be an arbitrary partition of $[a, x]$, and let $P_y = P_x \cup \{y\}$. Then
$$V_{P_x}(f) + f(y) - f(x) \le V_{P_y}(f) \le g(y).$$
Taking suprema over all partitions $P_x$ yields $g(x) + f(y) - f(x) \le g(y)$, that is, $h(x) \le h(y)$.

From Exercise 5.1.4 we have

5.9.8 Corollary. $\mathcal{BV}_a^b \subseteq \mathcal{R}_a^b$.


*5.10

The Riemann–Stieltjes Integral

In this section we describe the main features of the Riemann-Stieltjes integral, a generalization of the Riemann integral. These integrals have many of the properties of Riemann integrals; however, as we shall see, there are some striking differences.

Definition and General Properties

5.10.1 Definition. Let $f$ and $w$ be bounded, real-valued functions on an interval $[a, b]$. If $P = \{x_0 = a < x_1 < \cdots < x_n = b\}$ and $\xi_j \in [x_{j-1}, x_j]$, then
$$S_w(f, P, \xi) := \sum_{j=1}^{n} f(\xi_j)\,\Delta w_j, \quad \Delta w_j := w(x_j) - w(x_{j-1}), \quad \xi := (\xi_1, \ldots, \xi_n),$$
is called a Riemann-Stieltjes sum of $f$ with respect to $w$. The function $f$ is said to be Riemann-Stieltjes integrable with respect to $w$ if for some $I \in \mathbb{R}$ and each $\varepsilon > 0$, there exists a partition $P_\varepsilon$ such that
$$|S_w(f, P, \xi) - I| < \varepsilon \quad\text{for all refinements } P \text{ of } P_\varepsilon \text{ and all choices of } \xi.$$
In this case $I$ is called the Riemann-Stieltjes integral of $f$ with respect to $w$ and is denoted by
$$\int_a^b f\,dw = \int_a^b f(x)\,dw(x) = \lim_P S_w(f, P, \xi). \tag{5.32}$$
The function $f$ is called the integrand and $w$ the integrator. The collection of all functions that are Riemann-Stieltjes integrable with respect to $w$ is denoted by $\mathcal{R}_a^b(w)$. ♦

It follows from 5.1.18 that, for the integrator $w(x) = x$, the Riemann-Stieltjes integral reduces to the Riemann integral.

It is clear that constant functions are Riemann-Stieltjes integrable. The following example shows that, in contrast to the Riemann integral, if $f$ has a simple discontinuity, then $\int_a^b f\,dw$ may not exist.

5.10.2 Example. Let $f : [0, 1] \to \mathbb{R}$ and define
$$w(x) := \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1. \end{cases}$$
We show that $f \in \mathcal{R}_0^1(w)$ iff $f$ is continuous at 1. Let $P = \{x_0 = 0 < x_1 < \cdots < x_n = 1\}$ be any partition of $[0, 1]$. Then
$$S_w(f, P, \xi) = f(\xi_n)[w(1) - w(x_{n-1})] = f(\xi_n).$$


Hence if $f \in \mathcal{R}_0^1(w)$ and $\xi$ is chosen so that first $\xi_n = 1$ and second $\xi_n < 1$, we see that $f$ is continuous at 1 and $\int_0^1 f\,dw = f(1)$. Conversely, if $f$ is not continuous at 1, then there exists a sequence $\{a_m\}$ and $r > 0$ such that $a_m \uparrow 1$ and $|f(a_m) - f(1)| \ge r$ for every $m$. Let $P_m$ denote the refinement $P \cup \{a_m\}$ of $P$, where $a_m \in (x_{n-1}, 1]$. If $\xi$ consists of the left endpoints of the intervals of $P_m$, then $S_w(f, P_m, \xi) = f(a_m)$, hence
$$|S_w(f, P_m, \xi) - f(1)| = |f(a_m) - f(1)| \ge r.$$
Since $P$ was arbitrary, $f \notin \mathcal{R}_0^1(w)$. ♦
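The behavior described in Example 5.10.2 can be observed numerically. The following is a minimal sketch under stated assumptions: the function names `rs_sum`, `w_step`, `f_cont`, and `f_jump` and the uniform partition are illustrative choices, not part of the text.

```python
def rs_sum(f, w, partition, tags):
    """Riemann-Stieltjes sum: sum over j of f(tag_j) * (w(x_j) - w(x_{j-1}))."""
    return sum(
        f(t) * (w(partition[j + 1]) - w(partition[j]))
        for j, t in enumerate(tags)
    )

def w_step(x):
    """The integrator of Example 5.10.2: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1 else 0.0

def f_cont(x):
    """An integrand that is continuous at 1."""
    return x * x

def f_jump(x):
    """An integrand that is discontinuous at 1."""
    return 1.0 if x == 1 else 0.0

n = 1000
partition = [j / n for j in range(n + 1)]
left_tags, right_tags = partition[:-1], partition[1:]

# Only the last subinterval contributes, so each sum equals f(last tag).
cont_left = rs_sum(f_cont, w_step, partition, left_tags)
cont_right = rs_sum(f_cont, w_step, partition, right_tags)
jump_left = rs_sum(f_jump, w_step, partition, left_tags)
jump_right = rs_sum(f_jump, w_step, partition, right_tags)

print(cont_left, cont_right, jump_left, jump_right)
```

For the continuous integrand the two tag choices give nearly equal sums, while for the discontinuous one they stay a fixed distance apart however fine the partition is.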

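5.10.3 Theorem. If $f, g \in \mathcal{R}_a^b(w)$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in \mathcal{R}_a^b(w)$ and
$$\int_a^b (\alpha f + \beta g)\,dw = \alpha \int_a^b f\,dw + \beta \int_a^b g\,dw.$$

Proof. This follows from the identity
$$S_w(\alpha f + \beta g, P, \xi) = \alpha S_w(f, P, \xi) + \beta S_w(g, P, \xi)$$
and the linearity of the limit in (5.32), as is readily established by a standard argument.

5.10.4 Theorem. Let $w := \alpha u + \beta v$, where $\alpha, \beta \in \mathbb{R}$ and $u, v : [a, b] \to \mathbb{R}$ are bounded. If $f \in \mathcal{R}_a^b(u) \cap \mathcal{R}_a^b(v)$, then $f \in \mathcal{R}_a^b(w)$ and
$$\int_a^b f\,dw = \alpha \int_a^b f\,du + \beta \int_a^b f\,dv.$$

Proof. This follows from
$$S_w(f, P, \xi) = \alpha S_u(f, P, \xi) + \beta S_v(f, P, \xi)$$
and the linearity of the limit in (5.32).

5.10.5 Theorem. Let $a < c < b$. If $f|_{[a,c]} \in \mathcal{R}_a^c(w)$ and $f|_{[c,b]} \in \mathcal{R}_c^b(w)$, then $f \in \mathcal{R}_a^b(w)$ and
$$\int_a^b f\,dw = \int_a^c f\,dw + \int_c^b f\,dw.$$

Proof. Given $\varepsilon > 0$, choose partitions $P'_\varepsilon$ of $[a, c]$ and $P''_\varepsilon$ of $[c, b]$ such that the following hold:
$$\Bigl| \int_a^c f\,dw - S_w(f, P', \xi') \Bigr| < \varepsilon/2 \quad\text{for all refinements } P' \text{ of } P'_\varepsilon \text{ and all } \xi',$$
$$\Bigl| \int_c^b f\,dw - S_w(f, P'', \xi'') \Bigr| < \varepsilon/2 \quad\text{for all refinements } P'' \text{ of } P''_\varepsilon \text{ and all } \xi''.$$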


Then $P_\varepsilon := P'_\varepsilon \cup P''_\varepsilon$ is a partition of $[a, b]$ containing $c$. Moreover, if $P$ is a refinement of $P_\varepsilon$, then $P' := P \cap [a, c]$ and $P'' := P \cap [c, b]$ are refinements of $P'_\varepsilon$ and $P''_\varepsilon$, respectively. From
$$S_w(f, P, \xi) = S_w(f, P', \xi') + S_w(f, P'', \xi'')$$
and the above inequalities we see that
$$\Bigl| \int_a^c f\,dw + \int_c^b f\,dw - S_w(f, P, \xi) \Bigr| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
This establishes the existence of $\int_a^b f\,dw$ as well as the desired equality.

5.10.6 Example. Consider the floor function integrator $w(x) = \lfloor x \rfloor$. A slight modification of the argument in 5.10.2 shows that $\int_0^n f(x)\,d\lfloor x \rfloor$ exists iff $f$ is left continuous at the integers $1, 2, \ldots, n$, in which case $\int_{k-1}^{k} f(x)\,d\lfloor x \rfloor = f(k)$. For such a function, 5.10.5 implies that
$$\int_0^n f(x)\,d\lfloor x \rfloor = \sum_{k=1}^{n} \int_{k-1}^{k} f(x)\,d\lfloor x \rfloor = \sum_{k=1}^{n} f(k). \qquad ♦$$
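The identity in Example 5.10.6 can be illustrated with Riemann-Stieltjes sums over partitions that contain the integers. This is a sketch only: `rs_sum`, `integer_partition`, and the integrand `square` are names assumed for the illustration.

```python
import math

def rs_sum(f, w, partition, tags):
    """Riemann-Stieltjes sum: sum over j of f(tag_j) * (w(x_j) - w(x_{j-1}))."""
    return sum(
        f(t) * (w(partition[j + 1]) - w(partition[j]))
        for j, t in enumerate(tags)
    )

def integer_partition(n, m):
    """Partition of [0, n] with m equal subintervals per unit (contains 0..n)."""
    return [j / m for j in range(n * m + 1)]

def square(x):
    return x * x

n = 4
target = sum(square(k) for k in range(1, n + 1))  # f(1) + ... + f(n) = 30

# With right-endpoint tags, each jump of floor is sampled exactly at an integer.
coarse = integer_partition(n, 8)
right_sum = rs_sum(square, math.floor, coarse, coarse[1:])

# With left-endpoint tags, continuity of f brings the sum close to the target.
fine = integer_partition(n, 1024)
left_sum = rs_sum(square, math.floor, fine, fine[:-1])

print(target, right_sum, left_sum)
```

Only the subintervals ending at an integer contribute, since $\lfloor x \rfloor$ is constant on the others.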

The preceding example suggests that improper Riemann-Stieltjes integration could be used to provide a unified theory that includes both improper Riemann integrals and infinite series. This is indeed possible; however, it turns out that Lebesgue integration is a more efficient approach. Lebesgue theory on $\mathbb{R}^n$ is developed in Chapter 11.

The following theorem reveals a remarkable symmetry between integrand and integrator.

5.10.7 Integration by Parts Formula. If $f \in \mathcal{R}_a^b(w)$, then $w \in \mathcal{R}_a^b(f)$ and
$$\int_a^b f\,dw + \int_a^b w\,df = f(b)w(b) - f(a)w(a).$$

Proof. For any partition $P = \{x_0 = a, x_1, \ldots, x_{n-1}, x_n = b\}$,
$$f(b)w(b) - f(a)w(a) = \sum_{j=1}^{n} f(x_j)w(x_j) - \sum_{j=1}^{n} f(x_{j-1})w(x_{j-1}) \quad\text{and}$$
$$S_f(w, P, \xi) = \sum_{j=1}^{n} w(\xi_j)f(x_j) - \sum_{j=1}^{n} w(\xi_j)f(x_{j-1}).$$
Subtracting we obtain
$$f(b)w(b) - f(a)w(a) - S_f(w, P, \xi) = \sum_{j=1}^{n} f(x_{j-1})[w(\xi_j) - w(x_{j-1})] + \sum_{j=1}^{n} f(x_j)[w(x_j) - w(\xi_j)] = S_w(f, Q, \zeta),$$


where $\zeta = (a, x_1, x_1, x_2, x_2, \ldots, x_{n-1}, x_{n-1}, b)$ and $Q$ is the refinement of $P$ obtained by adding the coordinates of $\xi$ to $P$. Therefore,
$$\Bigl| f(b)w(b) - f(a)w(a) - \int_a^b f\,dw - S_f(w, P, \xi) \Bigr| = \Bigl| S_w(f, Q, \zeta) - \int_a^b f\,dw \Bigr|.$$

FIGURE 5.10: The partition Q.

Since $f \in \mathcal{R}_a^b(w)$, the right side may be made arbitrarily small. Therefore, $\int_a^b w\,df$ exists and equals $f(b)w(b) - f(a)w(a) - \int_a^b f\,dw$.

The next result shows that under certain general conditions the Riemann-Stieltjes integral reduces to a Riemann integral.

5.10.8 Theorem. Let $f \in \mathcal{R}_a^b(w)$. If $w$ is continuously differentiable, then $f w' \in \mathcal{R}_a^b$ and
$$\int_a^b f\,dw = \int_a^b f(x)w'(x)\,dx.$$

Proof. For any partition $P$ of $[a, b]$ and any $\xi$,
$$S_w(f, P, \xi) - S(f w', P, \xi) = \sum_{j=1}^{n} f(\xi_j)\,\Delta w_j - \sum_{j=1}^{n} f(\xi_j) w'(\xi_j)\,\Delta x_j.$$
By the mean value theorem, for each $j$ there exists $t_j \in (x_{j-1}, x_j)$ such that $\Delta w_j = w(x_j) - w(x_{j-1}) = w'(t_j)\,\Delta x_j$. Therefore,
$$S_w(f, P, \xi) - S(f w', P, \xi) = \sum_{j=1}^{n} f(\xi_j)\bigl[w'(t_j) - w'(\xi_j)\bigr]\Delta x_j. \tag{5.33}$$
Let $|f| \le M$ on $[a, b]$. By uniform continuity of $w'$, given $\varepsilon > 0$, there exists a $\delta > 0$ such that
$$|w'(x) - w'(y)| < \frac{\varepsilon}{2M(b - a)} \quad\text{whenever } |x - y| < \delta. \tag{5.34}$$


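Let $P'_\varepsilon$ be a partition of $[a, b]$ with $\|P'_\varepsilon\| < \delta$. From (5.33) and (5.34),
$$|S_w(f, P, \xi) - S(f w', P, \xi)| \le \frac{\varepsilon}{2(b - a)} \sum_{j=1}^{n} \Delta x_j = \frac{\varepsilon}{2} \tag{5.35}$$
for all refinements $P$ of $P'_\varepsilon$ and all $\xi$. Next, choose a partition $P''_\varepsilon$ such that
$$\Bigl| \int_a^b f\,dw - S_w(f, P, \xi) \Bigr| < \varepsilon/2 \quad\text{for all } \xi \text{ and all refinements } P \text{ of } P''_\varepsilon. \tag{5.36}$$
If $P$ is a refinement of $P'_\varepsilon \cup P''_\varepsilon$, then both (5.35) and (5.36) hold, hence, by the triangle inequality,
$$\Bigl| \int_a^b f\,dw - S(f w', P, \xi) \Bigr| < \varepsilon.$$
This shows that $f w' \in \mathcal{R}_a^b$ and establishes the equality.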
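Theorem 5.10.8 can be illustrated by comparing a Riemann-Stieltjes sum with the corresponding ordinary Riemann sum. The sketch below assumes the sample choices $f(x) = x$ and $w(x) = x^2$ on $[0, 1]$, for which both integrals equal $2/3$; the function names are hypothetical.

```python
def rs_sum(f, w, partition, tags):
    """Riemann-Stieltjes sum of f with respect to w."""
    return sum(
        f(t) * (w(partition[j + 1]) - w(partition[j]))
        for j, t in enumerate(tags)
    )

def riemann_sum(g, partition, tags):
    """Ordinary Riemann sum of g (the case w(x) = x)."""
    return sum(
        g(t) * (partition[j + 1] - partition[j])
        for j, t in enumerate(tags)
    )

def f(x):
    return x

def w(x):
    return x * x

def f_times_w_prime(x):
    # f(x) * w'(x) with w'(x) = 2x
    return x * 2 * x

n = 1000
partition = [j / n for j in range(n + 1)]
tags = partition[:-1]  # left endpoints

stieltjes = rs_sum(f, w, partition, tags)
riemann = riemann_sum(f_times_w_prime, partition, tags)
exact = 2 / 3  # integral of 2x^2 over [0, 1]

print(stieltjes, riemann, exact)
```

Both sums approach the same limit as the partition is refined, although for a fixed partition they generally differ, as (5.33) indicates.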

Monotone Increasing Integrators

If $w : [a, b] \to \mathbb{R}$ is monotone increasing, then the Riemann-Stieltjes integral may be characterized in terms of upper and lower sums, as in the Darboux theory. This fact will lead to an important existence theorem for integrators of bounded variation and continuous integrands.

Let $f : [a, b] \to \mathbb{R}$ be bounded and let $P$ be a partition of $[a, b]$. Define the upper and lower Darboux–Stieltjes sums of $f$ with respect to $w$ by
$$\overline{S}_w(f, P) = \sum_{j=1}^{n} M_j\,\Delta w_j \quad\text{and}\quad \underline{S}_w(f, P) = \sum_{j=1}^{n} m_j\,\Delta w_j,$$
where
$$M_j = M_j(f) := \sup_{x_{j-1} \le x \le x_j} f(x) \quad\text{and}\quad m_j = m_j(f) := \inf_{x_{j-1} \le x \le x_j} f(x).$$
The upper and lower Darboux–Stieltjes integrals of $f$ with respect to $w$ are defined, respectively, by
$$\overline{\int_a^b} f\,dw := \inf_P \overline{S}_w(f, P) \quad\text{and}\quad \underline{\int_a^b} f\,dw := \sup_P \underline{S}_w(f, P).$$
As in the Darboux theory, if $Q$ is a refinement of $P$ then, because $w$ is increasing,
$$\underline{S}_w(f, P) \le \underline{S}_w(f, Q) \le \underline{\int_a^b} f\,dw \le \overline{\int_a^b} f\,dw \le \overline{S}_w(f, Q) \le \overline{S}_w(f, P).$$
Here is the analog of 5.1.8 for Riemann–Stieltjes integrals.


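5.10.9 Theorem. The following statements are equivalent:

(a) $f \in \mathcal{R}_a^b(w)$.

(b) For each $\varepsilon > 0$, there exists a partition $P_\varepsilon$ such that $\overline{S}_w(f, P_\varepsilon) - \underline{S}_w(f, P_\varepsilon) < \varepsilon$.

(c) $\underline{\int_a^b} f\,dw = \overline{\int_a^b} f\,dw$.

If these conditions hold, then
$$\int_a^b f\,dw = \underline{\int_a^b} f\,dw = \overline{\int_a^b} f\,dw.$$

Proof. That (b) and (c) are equivalent is proved exactly as in 5.1.8. Assume that (a) holds. Given $\varepsilon > 0$, choose a partition $P_\varepsilon$ such that
$$\Bigl| \int_a^b f\,dw - S_w(f, P, \xi) \Bigr| < \varepsilon/3 \quad\text{for all refinements } P \text{ of } P_\varepsilon \text{ and all } \xi. \tag{5.37}$$
For such a partition $P$ and for each $j$, there exists a sequence $\{\xi_{j,k}\}_{k=1}^{\infty}$ in $[x_{j-1}, x_j]$ such that $\lim_k f(\xi_{j,k}) = M_j(f)$. It follows that
$$\lim_k S_w(f, P, \xi^k) = \overline{S}_w(f, P), \quad\text{where } \xi^k = (\xi_{1,k}, \ldots, \xi_{n,k}).$$
From (5.37),
$$\Bigl| \int_a^b f\,dw - \overline{S}_w(f, P) \Bigr| \le \varepsilon/3.$$
Similarly,
$$\Bigl| \int_a^b f\,dw - \underline{S}_w(f, P) \Bigr| \le \varepsilon/3.$$
Part (b) now follows from the triangle inequality.

Now assume that (c) holds. Let $I$ denote the common value of the integrals in (c). Given $\varepsilon > 0$, choose partitions $P'_\varepsilon$ and $P''_\varepsilon$ such that
$$I - \varepsilon < \underline{S}_w(f, P'_\varepsilon) \quad\text{and}\quad \overline{S}_w(f, P''_\varepsilon) < I + \varepsilon.$$
The inequalities still hold if $P'_\varepsilon$ and $P''_\varepsilon$ are replaced by any refinement $P$ of $P_\varepsilon := P'_\varepsilon \cup P''_\varepsilon$. Thus
$$-\varepsilon < \underline{S}_w(f, P) - I \le S_w(f, P, \xi) - I \le \overline{S}_w(f, P) - I < \varepsilon.$$
This shows that $f \in \mathcal{R}_a^b(w)$ and $\int_a^b f\,dw = I$.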


Integrators of Bounded Variation

Recall that a function of bounded variation may be expressed as the difference of two monotone increasing functions (5.9.7). This, together with 5.10.9, allows for a simple proof of the following existence theorem.

5.10.10 Theorem. If $f : [a, b] \to \mathbb{R}$ is continuous and $w : [a, b] \to \mathbb{R}$ has bounded variation, then $f \in \mathcal{R}_a^b(w)$.

Proof. By the remark preceding the theorem and by 5.10.4, we may assume that $w$ is increasing. By uniform continuity of $f$, given $\varepsilon > 0$, there exists a $\delta > 0$ such that
$$|f(x) - f(y)| < \frac{\varepsilon}{w(b) - w(a) + 1} \quad\text{for all } x, y \text{ with } |x - y| < \delta.$$
Let $P_\varepsilon$ be a partition with $\|P_\varepsilon\| < \delta$. For any refinement $P$ of $P_\varepsilon$, $\|P\| < \delta$, hence
$$M_j(f) - m_j(f) \le \frac{\varepsilon}{w(b) - w(a) + 1}.$$
Therefore,
$$\overline{S}_w(f, P) - \underline{S}_w(f, P) = \sum_{j=1}^{n} \bigl[M_j(f) - m_j(f)\bigr]\Delta w_j \le \varepsilon,$$
which shows that $f \in \mathcal{R}_a^b(w)$.

The conclusion of the theorem does not necessarily hold if $w$ fails to have bounded variation, even if $w$ is continuous:

5.10.11 Example. Let $f = w = f_{1/2}$, where $f_\alpha$ is defined as in Example 5.9.3. We show that $\int_0^1 f\,dw$ does not exist. Referring to that example, let $P_\varepsilon$ be the partition
$$\varepsilon < a_p < b_p < a_{p-1} < \cdots < b_{k+1} < a_k < b_k < \cdots < b_{q+1} < a_q < b_q < 1$$
of $[\varepsilon, 1]$, and let $\xi$ consist of left endpoints of $P_\varepsilon$. Then
$$S_w(f, P_\varepsilon, \xi) = f(\varepsilon)\bigl[w(a_p) - w(\varepsilon)\bigr] + f(b_q)\bigl[w(1) - w(b_q)\bigr] + \sum_{k=q}^{p} f(a_k)\bigl[w(b_k) - w(a_k)\bigr] + \sum_{k=q}^{p-1} f(b_{k+1})\bigl[w(a_k) - w(b_{k+1})\bigr].$$
Since $f_{1/2}(b_k) = 0$ and $f_{1/2}(a_k) = \sqrt{a_k}$,
$$S_w(f, P_\varepsilon, \xi) = f(\varepsilon)\bigl[\sqrt{a_p} - w(\varepsilon)\bigr] - \sum_{k=q}^{p} a_k.$$
The first term is bounded and the sums $\sum_{k=q}^{p} a_k$ diverge as $\varepsilon \to 0$, so $\lim_{\varepsilon \to 0} S_w(f, P_\varepsilon, \xi) = -\infty$. ♦

♦

Chapter 6 Numerical Infinite Series

An infinite series is the limit of a sequence of expanding finite sums. The terms of these sums may be real numbers or functions. In this chapter we examine the convergence behavior of series of the former type; series whose terms are functions are treated in the next chapter. In the first section, we give examples of series that may be summed, that is, for which an explicit numerical value may be calculated. The remaining sections describe various tests for convergence of general series. Additional methods of summing series may be found in Section 7.4.

6.1

Definition and Examples

6.1.1 Definition. Let $\{a_n\}$ be a sequence of real numbers. The various symbols
$$\sum_{n=1}^{\infty} a_n = \sum_n a_n = \sum a_n = a_1 + a_2 + \cdots + a_n + \cdots$$
represent what is called an infinite series with $n$th term $a_n$ or, simply, a series. The $n$th partial sum of the series is defined by
$$s_n = \sum_{k=1}^{n} a_k.$$
The series is said to converge if the sequence of partial sums converges, in which case we write
$$\sum a_n = \lim_n s_n$$
and call $\sum a_n$ the sum of the series. If the sequence $\{s_n\}$ diverges, then the series is said to diverge. ♦

6.1.2 Remark. A series may begin with an index other than 1. In this regard, note that, because
$$s_n = s_{m-1} + \sum_{k=m}^{n} a_k, \quad n \ge m > 1,$$
the series $s := \sum_{n=1}^{\infty} a_n$ converges iff $\sum_{n=m}^{\infty} a_n$ converges. In this case the "tail end" of the series tends to zero:
$$\lim_{m\to+\infty} \sum_{n=m}^{\infty} a_n = \lim_{m\to+\infty} (s - s_{m-1}) = 0. \qquad ♦$$

6.1.3 Example. Using the definition $e := \lim_{n\to\infty} (1 + 1/n)^n$ (see 2.2.4), we show that
$$e = \sum_{n=0}^{\infty} \frac{1}{n!}.$$
First, since the partial sums $s_n := \sum_{k=0}^{n} 1/k!$ increase, the limit $s := \lim_n s_n$ exists in $\overline{\mathbb{R}}$. From the calculations in 2.2.4,
$$(1 + 1/n)^n = 2 + \sum_{k=2}^{n} \frac{1}{k!}(1 - 1/n)(1 - 2/n) \cdots (1 - (k - 1)/n) \le s_n.$$
Letting $n \to \infty$, we obtain $e \le s$. On the other hand, if $n > m$, then
$$(1 + 1/n)^n > 2 + \sum_{k=2}^{m} \frac{1}{k!}(1 - 1/n)(1 - 2/n) \cdots (1 - (k - 1)/n).$$
Letting $n \to \infty$, we see that $e \ge s_m$. Letting $m \to \infty$ yields $e \ge s$. ♦

6.1.4 Example. The geometric series
$$\sum_{n=0}^{\infty} a r^n, \quad\text{where } a, r \in \mathbb{R} \text{ and } a \ne 0,$$
converges iff $|r| < 1$, in which case
$$\sum_{n=0}^{\infty} a r^n = \frac{a}{1 - r}.$$
This follows from the calculation
$$s_n = \sum_{k=0}^{n} a r^k = a\,\frac{1 - r^{n+1}}{1 - r}, \quad r \ne 1. \qquad ♦$$
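The closed form for the partial sums in Example 6.1.4 can be confirmed with exact rational arithmetic. This is a small sketch; the function names and the sample values $a = 3$, $r = 1/2$, $n = 10$ are assumptions for illustration.

```python
from fractions import Fraction

def geometric_partial_sum(a, r, n):
    """s_n = a + a*r + ... + a*r^n, computed term by term."""
    return sum(a * r ** k for k in range(n + 1))

def geometric_closed_form(a, r, n):
    """s_n = a * (1 - r^(n+1)) / (1 - r), valid for r != 1."""
    return a * (1 - r ** (n + 1)) / (1 - r)

a, r, n = 3, Fraction(1, 2), 10
direct = geometric_partial_sum(a, r, n)
closed = geometric_closed_form(a, r, n)
limit = a / (1 - r)          # the sum of the series when |r| < 1
remainder = limit - direct   # equals a * r^(n+1) / (1 - r)

print(direct, closed, limit, remainder)
```

The remainder shrinks by the factor $r$ with each additional term, which is why convergence holds exactly when $|r| < 1$.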

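6.1.5 Example. For $m \in \mathbb{N}$,
$$\sum_{n=1}^{\infty} \frac{1}{n(m + n)} = \frac{1}{m} \sum_{k=1}^{m} \frac{1}{k}. \tag{6.1}$$
To see this, we use partial fractions: For $n > m$,
$$m s_n = \sum_{k=1}^{n} \frac{m}{k(m + k)} = \sum_{k=1}^{n} \Bigl( \frac{1}{k} - \frac{1}{k + m} \Bigr) = \sum_{k=1}^{m} \frac{1}{k} - \sum_{k=n+1}^{n+m} \frac{1}{k}. \tag{6.2}$$
The second sum on the extreme right in (6.2) is less than $m/(n + 1)$ and hence tends to zero as $n \to \infty$. ♦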
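Identity (6.2) and the bound on its tail can be verified exactly for particular values. The sketch below uses rational arithmetic; the helper names and the choices $m = 3$, $n = 50$ are illustrative assumptions.

```python
from fractions import Fraction

def partial_sum(m, n):
    """s_n = sum over k = 1..n of 1 / (k (m + k))."""
    return sum(Fraction(1, k * (m + k)) for k in range(1, n + 1))

def harmonic_block(lo, hi):
    """Sum of 1/k for k = lo..hi."""
    return sum(Fraction(1, k) for k in range(lo, hi + 1))

m, n = 3, 50                      # requires n > m, as in the text
lhs = m * partial_sum(m, n)
tail = harmonic_block(n + 1, n + m)
rhs = harmonic_block(1, m) - tail

print(lhs == rhs, float(tail), float(Fraction(m, n + 1)))
```

Since the tail is bounded by $m/(n+1)$, the partial sums approach $\frac{1}{m}\sum_{k=1}^{m} 1/k$, which is (6.1).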


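The series in (6.1) is an example of a telescopic series, the name referring to the cancellations taking place in (6.2).

6.1.6 Theorem. Let $\{a_n\}$ and $\{b_n\}$ be sequences and let $\alpha, \beta \in \mathbb{R}$. If $\sum a_n$ and $\sum b_n$ converge, then $\sum (\alpha a_n + \beta b_n)$ converges and
$$\sum (\alpha a_n + \beta b_n) = \alpha \sum a_n + \beta \sum b_n. \tag{6.3}$$

Proof. Let $s_n = \sum_{k=1}^{n} a_k$ and $t_n = \sum_{k=1}^{n} b_k$. Then $\alpha s_n + \beta t_n$ is the $n$th partial sum of the series $\sum (\alpha a_n + \beta b_n)$ and
$$\lim_{n\to\infty} (\alpha s_n + \beta t_n) = \alpha \lim_{n\to\infty} s_n + \beta \lim_{n\to\infty} t_n,$$
which is (6.3).

6.1.7 Example. By 6.1.6 and 6.1.4,
$$\sum_{n=1}^{\infty} \frac{2 \cdot 3^{n+1} + 3 \cdot 2^{n-1}}{6^n} = 6 \sum_{n=1}^{\infty} \frac{1}{2^n} + \frac{3}{2} \sum_{n=1}^{\infty} \frac{1}{3^n} = 6\,\frac{1/2}{1 - 1/2} + \frac{3}{2}\,\frac{1/3}{1 - 1/3} = 6.75. \qquad ♦$$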

The following result is a test for divergence. It implies that a series whose $n$th term does not tend to zero must diverge. For example, the series $\sum \sin n$ and $\sum 2^{-1/n}$ diverge.

6.1.8 Proposition. If $\sum a_n$ converges, then $a_n \to 0$.

Proof. $a_n = s_n - s_{n-1} \to s - s = 0$.

The converse of 6.1.8 is false:

6.1.9 Example. The harmonic series $\sum_{n=1}^{\infty} 1/n$ diverges. Indeed, if $s_n$ is the $n$th partial sum of the series, then for all $n$
$$s_{2^n} - s_{2^{n-1}} = \frac{1}{2^{n-1} + 1} + \cdots + \frac{1}{2^{n-1} + 2^{n-1}} > \frac{2^{n-1}}{2^{n-1} + 2^{n-1}} = \frac{1}{2},$$
hence $\{s_n\}$ is not a Cauchy sequence.

It is of interest to note that, while the sequence $s_n$ diverges, the sequence $t_n := s_n - \ln n$ converges. To see this, observe first that
$$\ln n = \int_1^n \frac{dx}{x} = \sum_{k=1}^{n-1} \int_k^{k+1} \frac{dx}{x} < \sum_{k=1}^{n} \int_k^{k+1} \frac{dx}{k} = \sum_{k=1}^{n} \frac{1}{k} = s_n,$$
so $t_n > 0$. Furthermore,
$$\ln(n + 1) - \ln n = \int_n^{n+1} \frac{dx}{x} > \int_n^{n+1} \frac{1}{n + 1}\,dx = \frac{1}{n + 1},$$
hence
$$t_n - t_{n+1} = \ln(n + 1) - \ln n + s_n - s_{n+1} = \ln(n + 1) - \ln n - \frac{1}{n + 1} > 0.$$
Therefore, $\{t_n\}$ is bounded below and decreasing, hence converges. The number
$$\gamma := \lim_{n\to\infty} t_n = \lim_{n\to\infty} \Bigl( \sum_{k=1}^{n} \frac{1}{k} - \ln n \Bigr)$$
is known as Euler's constant. Its value to eleven decimal places is .57721566490 . . .. As of this writing, it is not known whether γ is irrational. Note that since $s_n = t_n + \ln n$, the convergence of $\{t_n\}$ provides another proof that the harmonic series diverges. ♦
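The monotone convergence of $t_n = s_n - \ln n$ in Example 6.1.9 is visible numerically. The following sketch uses floating-point arithmetic, so the printed values are approximations; the function name `t` and the sample indices are assumptions.

```python
import math

def t(n):
    """t_n = s_n - ln n, where s_n is the nth partial sum of the harmonic series."""
    s_n = sum(1.0 / k for k in range(1, n + 1))
    return s_n - math.log(n)

# Sample the sequence at n = 10, 100, ..., 100000.
values = [t(10 ** j) for j in range(1, 6)]
gamma_approx = 0.5772156649  # value quoted in the text, truncated

for j, v in enumerate(values, start=1):
    print(10 ** j, v)
```

The sampled values decrease toward the constant from above, in line with the proof that $\{t_n\}$ is decreasing and bounded below.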

Exercises 1. Let m ∈ N. Sum the series (a) S (c) S

P∞

n=1

m2n+1 . (m + 1)2n−1 n+1 2 +2 ln n+1 . 2 +1

an , where an = (−1)n+1 m3n+1 . (m + 1)3n−1 1 (d) p √ √ . n(n + 1)( n + 1 + n) 12 (f) . (n + 1)(n + 2)(n + 3) (−1)n (h) . (n + 1)(n + 3)(n + 5) (b)

(−1)n , m even. n(n + m) 1 . (g) S (n + 1)(n + 3)(n + 5) p √ m n + n(n + 1) S p (i) ln √ . m n + 1 + n(n + 1) 1 (k) . √ √ √ (n + m) n + n n + m (−1)n (n + m + 1) (m) . (2n + 1)(2n + 4m + 3) (e) S

n2 + 4n + 4 . n2 + 4n + 3 18 . (l) n(n + 1)(n + 2)(n + 3) (−1)n (2n + 2m + 1) (n) S . n(n + 2m + 1) (j)

ln

P∞ 2. Let 0 < r < 1 and m ∈ N. Sum the series n=0 an if an = (a)S rn cos (nπ)/2 . (b) (−1)bn/3c rn . (c) (−1)bn/mc rn . P∞ P∞ 3. Given thatPe = n=0 1/n! and e−1 = n=0 (−1)n 1/n!, find the value of ∞ the series n=0 an if an = (a) S

(2n + 3)3 . n!

(b)

4. Let p > 0 and sn = sn / ln ln n → +∞.

1 1 n . (c) S . (d) . (2n)! (2n + 1)! (2n + 1)!

Pn

k=1

(e)

n . (2n)!

1/k. Prove that sn /np → 0, sn / ln n → 1, and


5. Let γ denote Euler’s constant (6.1.9). Prove that n X √ γ 1 S (a) − ln n → ln 2 + . 2k − 1 2 (b)

k=1 n X

k=1

4k − ln n → ln 4 + γ − 1. (2k − 1)(2k + 1)

∞ X

1 = ln 4 − 1. n(2n − 1)(2n + 1) n=1 P an converges iff for each ε > 0 there exists an index N 6. Prove that such that n+p X an < ε for all n ≥ N and p ≥ 1. (c)

k=n

7. Suppose that an tends monotonically to 0 and that s := converges.

P∞

n=1

an

(a) Prove that nan → 0. (b) Let p ∈ N. Show that t := in terms of s.

P∞

n=1

n(an −an+p ) converges and express t

Suggestion. For (b), consider first the case p = 1. P P 8.S Let an and bn be convergent series with bn > 0 for all n. Suppose that L := limn (an /bn ) exists in R. Prove that P∞ k=n ak lim P∞ = L. n k=n bk P∞ P∞ 2 2 −1 Use this to calculate limn . k=n sin(3/k ) k=n 1/k 9. For a sequence {cn }, define ∆cn = cn+1 − cn . Prove the following discrete analog of l’Hospital’s rule: Let {an } and {bn } be sequences with {bn } strictly monotone. Suppose that either (a) an → 0 and bn → 0, or (b) bn → ±∞. Then an ∆an lim = lim , n bn n ∆bn provided that the limit on the right exists in R. P 10. Let {an } and Pn {bn } be sequences Pnwith bn > 0 for all n and n bn = +∞. Set An = k=1 ak and Bn = k=1 bk . Use Exercise 9 to prove that lim n

an An = lim , n bn Bn

provided that the limit on the right exists in R. Use this to calculate the limits of

168

A Course in Real Analysis n X

(a)

n X

sin(1/k)

k=1 n X

,

(b)

1/k

k=1

n X

ln k

k=1 n X

,

(c)

kp

k=1

rk

k=1 n X

, kp

k=1

where r, p > 0. 11. Let {bn }∞ n=1 be a sequence obtained P by rearranging finitely P many terms of a sequence {an }∞ bn converges in R iff an converges n=1 . Show that in R, in which case the series are equal. 12.S Let {bk } be a sequence obtained from a sequence {an } by grouping, that is, bk = ank−1 +1 + ank−1 +2 + · · · + ank , k = 1, 2, . . . , where {nk }k is a strictly increasing sequence of nonnegative integers P P and n0 = 0. Show that if n an converges in R, then so does k bk and the series are equal. Show that the converse is true if an ≥ 0 for all sufficiently large n. What if the terms an change sign infinitely often? P∞ 13. LetP{an } be decreasing and nonnegative. Prove that n=1 an converges ∞ iff k=0 2k a2k converges. Hint. Set sn =

n X

aj and tk =

j=1

k X

2j a2j .

j=0

Show that sn ≤ tk if n ≤ 2k+1 − 1 and sn ≥ tk /2 if n ≥ 2k . 14. (Decimal representation of real numbers). Prove that every real number x ≥ 0 has a decimal representation x = bN bN −1 · · · b0 .a1 a2 · · · :=

N X

bn 10n +

n=0

∞ X

an 10−n ,

n=1

where the digits bn , an are integers from 0 to 9. Hint. By Exercise 1.5.16, it may be assumed that x ∈ [0, 1). Prove by induction that for each n there exist aj ∈ {0, 1, . . . 9} and xn ∈ [0, 10−n ) such that n X x = xn + aj 10−j = xn + (.a1 a2 · · · an ). j=1

15.S Call a decimal representation bN bN −1 · · · b0 .a1 a2 · · · standard if no index n exists such that ak = 9 for all k ≥ n. Prove that every real number has a unique standard decimal representation.


16. A real number x ≥ 0 is a repeating decimal if it has decimal representation of the form x = bN bN −1 · · · b0 .a1 a2 · · · am am+1 am+2 · · · am+k , where the upper bar indicates that the block repeats forever. (For example, 61/495 = .12323 · · · = .123.) Prove that every repeating decimal is rational. 17. Prove the converse of Exercise 16, that is, every rational number p/q is a repeating Conclude that if f : N 7→ N is strictly increasing, P −fdecimal. then 10 (n) is irrational. Hint. By the division algorithm you may assume that 1 ≤ p < q. Begin by showing that if p/q = .a1 a2 · · · , then for each n p rn = .a1 a2 · · · an + n , where rn ∈ {0, 1, . . . , q − 1}, q 10 q and use this to show that qan = 10rn−1 − rn , where r0 := p.

6.2

Series with Nonnegative Terms

There are a variety of tests for the convergence of series with nonnegative terms. The most basic of these is the following theorem.

6.2.1 Theorem. If $a_n \ge 0$ for all $n$, then the series $\sum a_n$ converges in $\mathbb{R}$ iff its partial sums are bounded.

Proof. Since the terms of the series are nonnegative, the sequence of partial sums is increasing. The assertion therefore follows from the monotone sequence theorem (2.2.2).

6.2.2 Remark. By 6.1.2, the theorem is still valid if the inequality $a_n \ge 0$ holds only eventually, that is, for all $n \ge$ some $m$. Many of the results in this chapter have similar extensions. Rather than make these explicit, we leave the straightforward formulations to the reader. ♦

6.2.3 Example. Let $a_n, b_n \ge 0$ for all $n$ and suppose that $\sum a_n$ and $\sum b_n$ converge. By the Cauchy–Schwarz inequality (1.6.3(e)),
$$\sum_{k=1}^{n} \sqrt{a_k b_k} \le \Bigl( \sum_{k=1}^{n} a_k \Bigr)^{1/2} \Bigl( \sum_{k=1}^{n} b_k \Bigr)^{1/2}.$$
Since the sums on the right are bounded, so are the sums on the left. Therefore, $\sum \sqrt{a_n b_n}$ converges. ♦


The following test relates the convergence of a series to that of an improper integral.

6.2.4 Integral Test. Let $f$ be decreasing, positive, and locally integrable on the interval $[1, \infty)$. Then the series $\sum_{n=1}^{\infty} f(n)$ converges iff the improper integral $\int_1^\infty f$ converges. Moreover, for every $n \in \mathbb{N}$
$$0 \le s - s_n \le \int_n^\infty f(x)\,dx. \tag{6.4}$$

Proof. For each $n \in \mathbb{N}$ let
$$s_n = \sum_{k=1}^{n} f(k) \quad\text{and}\quad t_n = \int_1^n f.$$
For each $k \in \mathbb{N}$ and $x \in [k, k + 1]$, $f(k + 1) \le f(x) \le f(k)$, hence
$$f(k + 1) \le \int_k^{k+1} f \le f(k)$$
and so
$$s_n - f(1) = \sum_{k=2}^{n} f(k) = \sum_{k=1}^{n-1} f(k + 1) \le \sum_{k=1}^{n-1} \int_k^{k+1} f = t_n \le \sum_{k=1}^{n-1} f(k) = s_{n-1}.$$
Therefore, $\{s_n\}$ is bounded iff $\{t_n\}$ is bounded. The first assertion of the theorem now follows from 6.2.1. Now observe that for $m > n$,
$$0 \le s_m - s_n = \sum_{k=n+1}^{m} f(k) = \sum_{k=n}^{m-1} f(k + 1) \le \sum_{k=n}^{m-1} \int_k^{k+1} f = \int_n^m f.$$
Letting $m \to +\infty$ yields (6.4).

Inequality (6.4) allows one to estimate the error made by approximating $s$ by a partial sum $s_n$.

6.2.5 Example. (p-series). By 5.7.3(a), $\int_1^\infty 1/x^p\,dx$ converges iff $p > 1$. Therefore, the same is true of the series $s := \sum_{n=1}^{\infty} 1/n^p$. Furthermore, if $p > 1$, then
$$0 \le s - s_n \le \int_n^\infty x^{-p}\,dx = \frac{1}{(p - 1)n^{p-1}}.$$
Thus if the partial sum $s_n$ is to agree with $s$ in, say, the first 10 decimal places, then $n$ should be chosen so that $(p - 1)n^{p-1} > 10^{10}$. ♦

6.2.6 Comparison Test. Let $0 \le a_n \le b_n$ for all $n$. If $\sum b_n$ converges, then so does $\sum a_n$.

Proof. The partial sums of $\sum b_n$ are bounded and dominate those of $\sum a_n$, hence the assertion follows from 6.2.1.

6.2.7 Limit Comparison Test. Let $a_n, b_n > 0$ for all $n$.

(a) If $\overline{r} := \limsup (a_n/b_n) < +\infty$ and $\sum b_n$ converges, then $\sum a_n$ converges.

(b) If $\underline{r} := \liminf (a_n/b_n) > 0$ and $\sum a_n$ converges, then $\sum b_n$ converges.

(c) If $r := \lim (a_n/b_n)$ exists and $r \in (0, +\infty)$, then $\sum b_n$ converges iff $\sum a_n$ converges.

Proof. For (a), let $r \in (\overline{r}, +\infty)$ and choose $N$ so that $\sup_{n \ge N} a_n/b_n < r$. Then $a_n < b_n r$ for every $n \ge N$, hence the conclusion follows from the comparison test and 6.2.2. Part (b) follows similarly by choosing $r \in (0, \underline{r})$ and then $N$ so that $\inf_{n \ge N} a_n/b_n > r$. Part (c) follows from (a) and (b).

6.2.8 Examples. (a) The series
$$\sum_n \frac{2^n + n^3}{3^n + n^2}$$
converges by comparison with the convergent series $\sum_n (2/3)^n$, since
$$\frac{2^n + n^3}{3^n + n^2}\,(2/3)^{-n} = \frac{1 + n^3/2^n}{1 + n^2/3^n} \to 1.$$

(b) The series
$$\sum_n \frac{\sqrt{cn + d} - \sqrt{cn}}{n + 1}, \quad c, d > 0,$$
converges by comparison with the convergent series $\sum_n n^{-3/2}$, since
$$\frac{\sqrt{cn + d} - \sqrt{cn}}{n + 1}\,n^{3/2} = \frac{d\,n^{3/2}}{(n + 1)(\sqrt{cn + d} + \sqrt{cn})} \to \frac{d}{2\sqrt{c}}. \qquad ♦$$
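The two limits computed in Examples 6.2.8 can be checked numerically. The sketch below is illustrative: the function names and the sample values $c = 2$, $d = 3$ are assumptions, and part (b) uses the rationalized form from the text to avoid cancellation error.

```python
import math

def ratio_a(n):
    """(2^n + n^3) / (3^n + n^2) divided by (2/3)^n, using exact integers."""
    return (2 ** n + n ** 3) * 3 ** n / ((3 ** n + n ** 2) * 2 ** n)

def ratio_b(n, c=2.0, d=3.0):
    """n^(3/2) * (sqrt(cn + d) - sqrt(cn)) / (n + 1), in rationalized form."""
    return d * n ** 1.5 / ((n + 1) * (math.sqrt(c * n + d) + math.sqrt(c * n)))

limit_b = 3.0 / (2.0 * math.sqrt(2.0))  # d / (2 sqrt(c)) for c = 2, d = 3

print(ratio_a(60))             # close to 1
print(ratio_b(10 ** 6), limit_b)
```

In both cases the ratio of the given terms to the comparison terms settles at a finite positive value, which is the hypothesis of part (c) of the limit comparison test.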

6.2.9 Ratio Test. Let $a_n > 0$ for all $n$.

(a) If $\overline{r} := \limsup_n \dfrac{a_{n+1}}{a_n} < 1$, then $\sum a_n$ converges.

(b) If $\underline{r} := \liminf_n \dfrac{a_{n+1}}{a_n} > 1$, then $\sum a_n$ diverges.

Proof. (a) Let $r \in (\overline{r}, 1)$ and choose $N$ so that $\sup_{n \ge N} a_{n+1}/a_n < r$. For $n > N$
$$a_n < a_{n-1} r < a_{n-2} r^2 < \cdots < a_N r^{n-N},$$
so $\sum a_n$ converges by the comparison test.

(b) If $\underline{r} > 1$ there exists $N$ such that $\inf_{n \ge N} a_{n+1}/a_n > 1$. Therefore,
$$a_n > a_{n-1} > a_{n-2} > \cdots > a_N > 0, \quad n > N,$$
so $a_n$ cannot converge to zero. Therefore, $\sum a_n$ diverges.


6.2.10 Examples. (a) Let $a_n$ denote the general term of the series
$$\sum_{n=1}^{\infty} \frac{8 \cdot 14 \cdot 20 \cdots (6n + 2)}{6 \cdot 11 \cdot 16 \cdots (5n + 1)}\,c^n,$$
where $c > 0$. Then
$$\frac{a_{n+1}}{a_n} = \frac{6n + 8}{5n + 6}\,c \to \frac{6}{5}\,c,$$
hence the series converges if $c < 5/6$ and diverges if $c > 5/6$. If $c = 5/6$, then
$$a_n = \frac{8 \cdot 14 \cdot 20 \cdots (6n + 2)\,5^n}{6 \cdot 11 \cdot 16 \cdots (5n + 1)\,6^n} = \frac{(1 + 1/3)(2 + 1/3) \cdots (n + 1/3)}{(1 + 1/5)(2 + 1/5) \cdots (n + 1/5)} > 1,$$
so the series diverges in this case as well.

(b) For the series
$$\sum_{n=1}^{\infty} (n!)^p\,r^{n^2}, \quad r > 0,\ p \in \mathbb{R},$$
the ratios are
$$\frac{a_{n+1}}{a_n} = (n + 1)^p\,r^{2n+1},$$
hence the series converges iff $r < 1$.

(c) For the series
$$\sum_{n=2}^{\infty} \frac{2^n \ln^2 n}{n!},$$
$$\frac{a_{n+1}}{a_n} = \frac{2 \ln^2(n + 1)}{(n + 1) \ln^2 n} \le \frac{2 \ln^2(n + 1)}{n + 1} \to 0 \quad (n \ge 3),$$
hence the series converges. ♦

6.2.11 Root Test. Let $a_n \ge 0$ for all $n$ and set $\rho := \limsup_n a_n^{1/n}$.

(a) If $\rho < 1$, then $\sum a_n$ converges.

(b) If $\rho > 1$, then $\sum a_n$ diverges.

Proof. (a) Let $r \in (\rho, 1)$ and choose $N$ such that $\sup_{n \ge N} a_n^{1/n} < r$. Then $a_n < r^n$ for all $n \ge N$, hence $\sum a_n$ converges by the comparison test.

(b) By 2.4.2, there exists a subsequence $a_{n_k}^{1/n_k} \to \rho$. Then, for all sufficiently large $k$, $a_{n_k} > 1$, hence the series diverges.

6.2.12 Example. For the series
$$\sum_{n=1}^{\infty} \bigl(a + (-1)^n b\bigr)^n, \quad\text{where } a > b > 0,$$
$\limsup_n a_n^{1/n} = a + b$, hence the series converges if $a + b < 1$ and diverges if $a + b > 1$. If $a + b = 1$, $a_n \not\to 0$ so the series diverges in this case as well. ♦

6.2.13 Remark. No conclusion regarding the convergence of the series $\sum a_n$ in 6.2.9 and 6.2.11 can be inferred from the relations $\overline{r} \ge 1$, $\underline{r} \le 1$, or $\rho = 1$. Indeed, the series $\sum 1/n^2$ and $\sum 1/n$ satisfy $\overline{r} = \underline{r} = \rho = 1$, yet the first series converges while the second diverges. ♦

In Section 6.3, we consider more refined tests that can detect convergence or divergence in cases where the ratio or root test fails. Here's an example:

6.2.14 Example. Let $a_n$ denote the $n$th term of the series
$$\sum_{n=1}^{\infty} \Bigl( \sqrt{an + b\sqrt{n}} - \sqrt{an} \Bigr)^n,$$
where $a, b > 0$. Then
$$a_n^{1/n} = \sqrt{an + b\sqrt{n}} - \sqrt{an} = \frac{b\sqrt{n}}{\sqrt{an + b\sqrt{n}} + \sqrt{an}} \to \frac{b}{2\sqrt{a}},$$
hence the series $\sum a_n$ converges if $b^2 < 4a$ and diverges if $b^2 > 4a$. If $b^2 = 4a$, the root test fails but the log test (6.3.4) shows that the series converges in this case. (Exercise 6.3.11.) ♦

6.2.15 Remark. By Exercise 2.4.12, if $a_n > 0$ for all $n$, then
$$\liminf_n \frac{a_{n+1}}{a_n} \le \liminf_n a_n^{1/n} \le \limsup_n a_n^{1/n} \le \limsup_n \frac{a_{n+1}}{a_n}.$$
This shows that if the ratio test determines convergence or divergence conclusively, then so does the root test. It also suggests that the root test may be effective when the ratio test fails. ♦

6.2.16 Example. Let $a_n = s^n \delta_{n-1} + t^n \delta_n$, where $\delta_n = \frac{1}{2}[1 + (-1)^n]$ and $0 < s < t < 1$. Then
$$a_n = \begin{cases} s^n & \text{if } n \text{ is odd}, \\ t^n & \text{if } n \text{ is even}, \end{cases}$$
so the ratios $a_{n+1}/a_n$ are $s^{n+1}/t^n$ or $t^{n+1}/s^n$, and the roots $a_n^{1/n}$ are $s$ or $t$, depending on the parity of $n$. Therefore, $\underline{r} = 0$, $\overline{r} = +\infty$ and $\rho = t$, which shows that the root test detects the convergence of $\sum a_n$ while the ratio test does not. ♦
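The contrast between the two tests in Example 6.2.16 shows up clearly in a short computation. This is a sketch with assumed sample values $s = 0.3$ and $t = 0.5$; the function name `term` is hypothetical.

```python
def term(n, s=0.3, t=0.5):
    """a_n = s^n for odd n and t^n for even n, with 0 < s < t < 1."""
    return s ** n if n % 2 == 1 else t ** n

indices = range(1, 41)
ratios = [term(n + 1) / term(n) for n in indices]  # a_{n+1} / a_n
roots = [term(n) ** (1.0 / n) for n in indices]    # a_n^(1/n)

# Ratios at odd n grow without bound; ratios at even n shrink to zero.
print(ratios[38], ratios[39])   # n = 39 and n = 40
# Roots alternate between s and t, so their lim sup is t < 1.
print(max(roots), min(roots))
```

The ratios oscillate between very large and very small values, so neither part of the ratio test applies, while the roots never exceed $t$.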

Exercises 1.S Determine whether the series n! . 3 · 5 · · · (2n + 1) 4n n! (d) . 5 · 8 · · · (3n + 2) (a)

P

an converges or diverges, where an =

3 · 5 · · · (2n + 1) . (2n + 1)! 2 · 4 · · · (2n) (e) . 4 · 7 · · · (3n + 1) (b)

3 · 6 · · · (3n) . 3 · 5 · · · (2n + 1) 4 · 7 · · · (3n + 1) (f) . 5 · 9 · · · (4n + 1) (c)


A Course in Real Analysis P 2. Determine whether the series an converges or diverges, where an = (a) S (d) S

n3 . 2n n! . nn

2

ln n . n1.1 1 (f) 1+1/n . n

(b) (1 + r/n)n , r > 0. (c) (e) (n1/n − r)n .

2n 1 . (h) . n(ln n)(ln ln n)p n! √ rn , r 6= ±1. (k) (j) S sin2 (1/ n). 1 − rn n + sin n n + ln n (m) S 3 . (n) r . n + sin n n ln n n! 3n n! (q) . (p) S n . n (1.1)n3 1 1 (s) S ln n . (t) ln n . 2 3 3 3n + 4n . (w) (1 − r/n)n , r > 0. (v) S n 8 − 6n

(g) S

(i) sin2 (1/n). 1 , r 6= ±1. (1 − rn )2 1 (o) r . n ln n n 1 + an (r) , a, b > 0. 1 + bn (l)

(u) rsin n , r > 0. (x) 1/rln n , r > 0.

P P 3. Let an > 0 for all n and suppose that an diverges. Prove that an bn diverges for all sequences {bn } with lim inf n bn > 0. P∞ 4. Let bn → p > 0. Prove that n=1 n−bn converges if p > 1 and diverges if p < 1. Give anP example of a sequence {bn } with bn > 1 for all n and bn ↓ 1 such that n−bn diverges. P P 1/n 5.S Let an > 0 for all n. Prove that an converges iff n an converges. P∞ 6. Find all values of a, b, p, q > 0 for which n=1 an converges if an = lnp n 1 . (c) . q q n n lnp n −1 n Y (n + 1)p − np qp 1/q p S n/2 (d) (n + 1) − n . (e) . (f) p n! pj + 1 . nq j=1 n a + np 1 + anp 1 + anp (g) S . (h) . (i) . b + nq 1 + bnq 1 + bnq (a) S

1 . lnp n

(b)

P P 7. Let {an } be positive and decreasing. Prove that an converges iff a2n converges. P∞ 8. Let an > 0 for all n. Prove or disprove: If n=1 an converges, then

Numerical Infinite Series P∞

n=1 bn


converges, where bn =

(a) S a2n .

(b)

√

X

(c)

an .

(d) S min aj .

aj .

n≤j≤2n

n≤j≤n+m

(e) max aj . (f) n≤j≤2n

(i)

n X

an aj .

j=1

X

(g)

aj .

1 an

Y

aj . (k)

n 0. Prove that if

(h) S

aj .

n≤j≤2n

n≤j≤2n

(j)

Y 1 an

X

Y

aj .

1≤j≤n

aj . (l) S

n 0 and p > 3/r. Prove that bn /(rp − 3) converges for all sequences {bn } with bn → r. P 11.S Let an , bn > 0 and an+1 bn P/an ≤ bn+1 /bn for all n. Prove that if converges, then so does an . P P 12. Let an > 0. Show that an converges iff f (an ) converges, where f (x) = (a) sin x. x (e) . 1 + ax

P

(b) tan x.

(c) sin−1 x.

(d) tan−1 x.

(f) ln(1 + x).

(g) ex − 1.

(h) x3 + x2 + x.

13. Let {pn } be a sequence in Z+ and {an } a sequence of positive reals. P P √ (a) Prove that if n an converges, then n an an+pn converges, provided that either {pn } is bounded or an is decreasing. P √ (b) Suppose {pnP } is bounded and an ↓ 0. Prove that if n an an+pn converges, then n an converges. Does (b) hold if {an } is not monotone or {pn } is not bounded? 14.S Let g be positive and differentiable on [1, ∞) such that limx→∞ g(x) = 0, and let f be differentiable in a neighborhood of 0 such that f (0) = 0, fP(x) > 0 for x > 0, f 0 is continuous at 0, and f 0 (0) > 0. Prove that P∞ ∞ n=1 f (g(n)) converges iff n=1 g(n) converges. 15. Let f : R → [0, +∞) be twice differentiable and p > 0. Prove: P (a)S If p ≤ 1 and f (1/np ) converges, then f (0) = f 0 (0) = 0. P (b) If p ≥ 1 and f (0) = f 0 (0) = 0, then f (1/np ) converges. P 16. Let an ≥ 0 forPall n and suppose that an converges. Prove that if √ n−α an converges. Give an example which shows that α > 1/2, then the assertion is false if α = 1/2.

17.S This exercise shows that $e$ is irrational. Assume, for a contradiction, that $e = m/n$, $m, n \in \mathbb{N}$. Let $s_n = \sum_{k=0}^{n} 1/k!$. Using the series representation $e = \sum_{k=0}^{\infty} 1/k!$, show that
(a) $n!(e - s_n) \in \mathbb{N}$.
(b) $n!(e - s_n) < \sum_{k=1}^{\infty} (n + 1)^{-k} = 1/n$.
Conclude that $e$ must be irrational.

18. Let $s_n = \sum_{k=1}^{n} k^{-p}$, $0 < p < 1$. Show that $\{s_n - (1 - p)^{-1} n^{1-p}\}$ converges. Conclude that
$$\lim_{n \to +\infty} \frac{1}{n^q} \sum_{k=1}^{n} \frac{1}{k^p} = \begin{cases} 0 & \text{if } p + q > 1, \\ \dfrac{1}{1 - p} & \text{if } p + q = 1, \\ +\infty & \text{if } p + q < 1. \end{cases}$$

19. Let $s_n = \sum_{k=1}^{n} a_k$ and suppose that $\sum a_n^2 n^{-p} < +\infty$, where $a_n, p > 0$. Prove that $\lim_n s_n n^{-q} = 0$ for all $q > (p + 1)/2$.

20. Let $\{a_n\}$, $\{b_n\}$, and $\{c_n\}$ be positive sequences such that $\sum c_n$ diverges, $b_n \to b \in (0, +\infty]$, and $a_n/a_{n+1} = 1 + b_n c_n$. Prove that $a_n \to 0$. Hint. Let $r \in (0, b)$ and choose $m$ so that $b_n > r$ for all $n \ge m$. Then $a_{m+k}/a_{m+k+1} > 1 + r c_{m+k}$ for all $k \ge 0$.

6.3

More Refined Convergence Tests

The tests in this section are frequently useful when the root and ratio tests fail. The first is a generalization of the ratio test.

6.3.1 Kummer’s Test. Let $a_n, b_n > 0$ for all $n$ and set
$$c_n := \frac{a_n}{a_{n+1}} b_n - b_{n+1}.$$
(a) If $\underline{c} := \liminf_n c_n > 0$, then $\sum_n a_n$ converges.
(b) If $\overline{c} := \limsup_n c_n < 0$ and $\sum_n b_n^{-1}$ diverges, then $\sum_n a_n$ diverges.

Proof. (a) Set $s_n = \sum_{k=1}^{n} a_k$ and let $r \in (0, \underline{c})$. Choose $N$ so that $c_n \ge r$ for all $n \ge N$. Since $a_n b_n - a_{n+1} b_{n+1} = c_n a_{n+1}$, for all $m > N$ we have
$$a_N b_N \ge a_N b_N - a_m b_m = \sum_{n=N}^{m-1} \bigl(a_n b_n - a_{n+1} b_{n+1}\bigr) \ge r \sum_{n=N}^{m-1} a_{n+1} = r(s_m - s_N),$$
hence $s_m \le s_N + a_N b_N / r$. The partial sums of $\sum a_n$ are therefore bounded so the series converges.

(b) If $\overline{c} < 0$, there exists an $N$ such that $a_k b_k - a_{k+1} b_{k+1} < 0$ for all $k \ge N$. Then
$$a_N b_N - a_n b_n = \sum_{k=N}^{n-1} (a_k b_k - a_{k+1} b_{k+1}) < 0,$$
so $a_n > (a_N b_N)/b_n$ for all $n > N$. Since $\sum 1/b_n$ diverges, $\sum a_n$ diverges by the comparison test.

A simple but important consequence of Kummer’s test is

6.3.2 Raabe’s Test. Let $a_n > 0$ for all $n$ and set
$$d_n := n \left( \frac{a_n}{a_{n+1}} - 1 \right).$$
(a) If $\underline{d} := \liminf_n d_n > 1$, then $\sum a_n$ converges.
(b) If $\overline{d} := \limsup_n d_n < 1$, then $\sum a_n$ diverges.

Proof. Take $b_n = n$ in Kummer’s test, so
$$c_n = \frac{a_n}{a_{n+1}}\, n - (n + 1) = d_n - 1.$$
Then $\underline{c} = \underline{d} - 1$ and $\overline{c} = \overline{d} - 1$ and the assertions follow.

6.3.3 Example. We use Raabe’s test to show that the series
$$\sum_n \frac{1}{(n + m)!} \prod_{k=1}^{n} (k + a), \quad \text{where } a > 0 \text{ and } m \in \mathbb{N},$$
converges iff $m > 1 + a$. Indeed, since
$$n \left( \frac{a_n}{a_{n+1}} - 1 \right) = n \left( \frac{n + m + 1}{n + 1 + a} - 1 \right) = \frac{n(m - a)}{n + 1 + a} \to m - a,$$
the series converges if $m - a > 1$ and diverges if $m - a < 1$. If $m - a = 1$, then the general term reduces to
$$\frac{1}{(n + m)!} \prod_{k=1}^{n} (m - 1 + k) = \frac{m(m + 1) \cdots (m + n - 1)}{(n + m)!} = \frac{1}{(m + n)(m - 1)!},$$
hence the series diverges in this case as well. Note that the ratio test is inconclusive in this example since $a_{n+1}/a_n \to 1$. ♦
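The Raabe quotient in Example 6.3.3 can be checked with exact rational arithmetic. The sketch below is an illustration only; the helper names `term` and `raabe_quotient` and the sample values m = 3, a = 1/2 are assumptions, not taken from the text.

```python
from fractions import Fraction
from math import factorial


def term(n, m, a):
    """Return a_n = (1 / (n + m)!) * prod_{k=1}^{n} (k + a) as an exact Fraction."""
    product = Fraction(1)
    for k in range(1, n + 1):
        product *= k + Fraction(a)
    return product / factorial(n + m)


def raabe_quotient(n, m, a):
    """Return d_n = n * (a_n / a_{n+1} - 1), computed from the terms themselves."""
    return n * (term(n, m, a) / term(n + 1, m, a) - 1)


if __name__ == "__main__":
    m, a = 3, Fraction(1, 2)  # assumed sample values; here m - a = 5/2 > 1
    for n in (1, 10, 100, 1000):
        print(n, float(raabe_quotient(n, m, a)))
```

The printed values approach m - a, matching the closed form n(m - a)/(n + 1 + a) obtained in the example.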

The following test is sometimes useful when the root test fails.

6.3.4 Log Test. Let $a_n > 0$ for all $n$ and set $c_n := \ln(a_n^{-1})/\ln n$.
(a) If $\underline{c} := \liminf_n c_n > 1$, then $\sum a_n$ converges.
(b) If $\overline{c} := \limsup_n c_n < 1$, then $\sum a_n$ diverges.

Proof. (a) Let $p \in (1, \underline{c})$. Then there exists $N$ such that $c_n > p$ for all $n \ge N$. For such $n$, $\ln(a_n^{-1}) > \ln n^p$, hence $a_n < 1/n^p$. Since $p > 1$, $\sum a_n$ converges by the comparison test. The proof of (b) is similar.

6.3.5 Example. Let $a_n$ denote the general term of the series
$$\sum_{n=1}^{\infty} \left( \frac{a + n^p}{b + n^q} \right)^n,$$
where $a, b, p, q > 0$. The root test shows that the series converges if $p < q$ and diverges if $p > q$. If $p = q$, the test is inconclusive, so we consider cases. If $a \ge b$, then $a_n \ge 1$ and the series diverges. If $a < b$, we use the log test: By l’Hospital’s rule, the sequence
$$c_n = \frac{-\ln a_n}{\ln n} = \frac{\ln(b + n^p) - \ln(a + n^p)}{(\ln n)/n}$$
has the same limit as
$$\frac{\dfrac{p n^{p-1}}{b + n^p} - \dfrac{p n^{p-1}}{a + n^p}}{\dfrac{1 - \ln n}{n^2}} = \frac{p n^{p+1}}{1 - \ln n} \cdot \frac{(a + n^p) - (b + n^p)}{(a + n^p)(b + n^p)} = \frac{p(a - b)}{b/n^p + 1} \cdot \frac{n}{(1 - \ln n)(a + n^p)}.$$
The first quotient in the last expression tends to $p(a - b) < 0$. By l’Hospital’s rule, the second quotient has the same limit as
$$\frac{1}{(1 - \ln n)(p n^{p-1}) - (a + n^p)/n} = \frac{-n^{1-p}}{p(\ln n - 1) + (a/n^p + 1)},$$
which converges to $0$ if $p \ge 1$ and to $-\infty$ if $p < 1$. Thus if $p = q$ and $a < b$, then
$$\lim_n c_n = \begin{cases} 0 & \text{if } p \ge 1, \\ +\infty & \text{if } p < 1, \end{cases}$$
hence $\sum a_n$ converges iff $p < 1$. ♦


Exercises 1. Show that the ratio test is a consequence of Kummer’s test. 2. Show that Raabe’s test detects the convergence properties of the p-series P 1/np for p 6= 1, whereas the ratio and root tests do not. P 3.S Use Raabe’s test to determine the convergence of an if an = n n n Y Y 3k − 1 1 Y 2k − 1 1 . (b) . (c) n (3k + 1). (a) 3k + 1 2n 2k 3 (n + 1)! k=1

k=1

k=1

Show that the ratio test is inconclusive in each case. 4. Let a, b > 0 and m ∈ N. Use Raabe’s test to show that the following series converges iff b − a > m: n X Y n

Y −1 n mk + a mk + b .

k=1

k=1

5. Find all values of p > 0 for which the series converge: X pn n! X pn n! . (a)S . (b) n n (p + 1)(2p + 1) · · · (np + 1) n n What does the ratio test reveal? 6.S Show that the series ∞ X

1 · 3 · · · (2n − 1) (2 + p) · (4 + p) · · · (2n + p) n=1 converges iff p > 1. 7. Let p ∈ N. Use Raabe’s test to show that the series X (pn)! ppn (n!)p converges if p > 3 and diverges if p < 3. What does the ratio test tell us for these values of p? 8. Let a, b, c > 0 and m ∈ Z+ . Use Raabe’s test to show that the series ∞ n X 1 Y ak + b nm ak + c n=1 k=1

converges iff c > (m + 1)a + b.


9. Let b > 0 and m ∈ N. Use Raabe’s test to show that !m ∞ n X Y kb kb + 1 n=1 k=1

converges if m > b and diverges if m < b. What does the ratio test reveal? What happens if m = b = 1? 10. Let P r > 0. Use the log test to determine the convergence behavior of an if an = 1 1 1 (a)S rln ln n . (b) . (c) . (d)S . r ln n r ln ln n n n (ln n)rn P 11. Let an be as in 6.2.14. Use the log test to verify that an converges if b2 = 4a. P (np ) 12. Let p, r > 0. Use the log test to verify that r converges iff r < 1. P ln n 13.S Let bn → b > 0. Use the log test to verify that b− converges if n b > e and diverges if b < e. P (np ) 14. Use the log test to show that the series (1 − 1/n) converges iff p > 1. P 15. Let p > 0 and a 6= 0. Use the log test to verify that (1 − a/np )n diverges if p ≥ 1, converges if 0 < p < 1 and a > 0, and diverges if 0 < p < 1 and a < 0. What does the root test reveal? P 16. Let a, b, p, q > 0. Determine the convergence behavior of an if an = a + np ln n a + np ln ln n 1 + anp ln ln n (a)S . (b) . (c) . b + nq b + nq 1 + bnq P 17. Show that (ln n)bn diverges if {bn } is bounded. What happens in the unbounded special cases (a) bn = − ln n and (b) bn = −np , p > 0? What does the root test reveal in (b)? 18.S (Loglog test) Let an > 0 for all n and set cn = −

ln (nan ) , c := lim inf cn , and c := lim sup cn . n ln ln n n

P Prove that an converges if c > 1 and diverges if c < 1. Use the test to determine the convergence behavior of ln ln n ∞ X 1 + an , a, b > 0. 1 + bn n=2


19. Let a, b > 0. Use the log test to show that X 1 + anp ln n n

1 + bnq

diverges if p > q; converges if p < q; and if p = q, then converges if b/a > e and diverges if b/a < e. Use the log log test to show that the series also diverges if p = q and b/a = e. 20. Use Kummer’s test to prove Gauss’s test: Let an > 0 for all n and let {αn } be a bounded sequence such that αn an r =1+ − s, an+1 n n P where r, s ∈ R, s > 1. Then an converges iff r > 1. 21.S Use Kummer’s test to prove Bertrand’s test: Let an > 0 for all n and let {βn } be a sequence such that βn an 1 . =1+ − an+1 n n ln n Then

$\sum a_n$ converges if $\liminf_n \beta_n > 1$ and diverges if $\limsup_n \beta_n < 1$.

6.4

Absolute and Conditional Convergence

The convergence tests in Sections 6.2 and 6.3 apply only to series with nonnegative terms. In this section we consider tests applicable to general series.

6.4.1 Definition. A series $\sum a_n$ is said to converge absolutely if $\sum |a_n|$ converges. A convergent series that does not converge absolutely is said to converge conditionally. ♦

6.4.2 Theorem. (a) If $\sum a_n$ converges absolutely, then the series $\sum a_n$, $\sum a_n^+$, and $\sum a_n^-$ converge and
$$\sum a_n = \sum a_n^+ - \sum a_n^-, \qquad \sum |a_n| = \sum a_n^+ + \sum a_n^-.$$
(b) If $\sum a_n$ converges conditionally, then $\sum a_n^+$ and $\sum a_n^-$ diverge.

Proof. (a) If $\sum |a_n|$ converges, then the inequalities
$$0 \le a_n^{\pm} = \tfrac{1}{2}\bigl(|a_n| \pm a_n\bigr) \le |a_n|$$
and the comparison test show that $\sum a_n^+$ and $\sum a_n^-$ converge. The remaining assertions in (a) follow from the identities $a_n = a_n^+ - a_n^-$ and $|a_n| = a_n^+ + a_n^-$.

(b) If $\sum a_n$ and $\sum a_n^-$ converge, then $\sum |a_n| = \sum a_n + 2 \sum a_n^-$ converges. The same conclusion holds if $\sum a_n$ and $\sum a_n^+$ converge. Hence if $\sum a_n$ converges conditionally, then neither $\sum a_n^+$ nor $\sum a_n^-$ can converge.

All series of the form $\sum_{n=1}^{\infty} (-1)^{n+1}/n^p$, $0 < p \le 1$, converge conditionally. This follows from the alternating series test given below. The following example is somewhat more interesting.

6.4.3 Example. We show that the series
$$s := \sum_{n=2}^{\infty} \bigl[(-1)^n n^p - 1\bigr]^{-1}$$
converges conditionally iff $1/2 < p \le 1$ and absolutely iff $p > 1$. To see this, note first that if $p < 0$, then the $n$th term of the series does not tend to zero, and if $p = 0$ the series is undefined. So assume $p > 0$. If $s_n$ denotes the $n$th partial sum of the series, then
$$s_{2n+1} = \sum_{k=1}^{n} \left( \frac{1}{(2k)^p - 1} - \frac{1}{(2k + 1)^p + 1} \right) = \sum_{k=1}^{n} (\alpha_k + \beta_k), \tag{6.5}$$
where
$$\alpha_k := \frac{(2k + 1)^p - (2k)^p}{\bigl[(2k)^p - 1\bigr]\bigl[(2k + 1)^p + 1\bigr]} \quad \text{and} \quad \beta_k := \frac{2}{\bigl[(2k)^p - 1\bigr]\bigl[(2k + 1)^p + 1\bigr]}.$$
By the mean value theorem applied to $x^p$ on the interval $[2k, 2k + 1]$,
$$\alpha_k = \frac{p\, x_k^{p-1}}{\bigl[(2k)^p - 1\bigr]\bigl[(2k + 1)^p + 1\bigr]}, \quad \text{for some } x_k \in (2k, 2k + 1).$$
If $0 < p \le 1$, then
$$\alpha_k \le \frac{p}{(2k)^{1-p}\bigl[(2k)^p - 1\bigr]\bigl[(2k)^p + 1\bigr]} = \frac{p}{(2k)^{1-p}\bigl[(2k)^{2p} - 1\bigr]} \le \frac{1}{k^{p+1}},$$
the last inequality for sufficiently large $k$. Therefore, $\sum_{k=1}^{n} \alpha_k$ converges by comparison with $\sum_k 1/k^{p+1}$. Also, since
$$\frac{\beta_k}{k^{-2p}} = \frac{2}{[2^p - k^{-p}][(2 + 1/k)^p + k^{-p}]} \to \frac{1}{2^{2p-1}},$$
the limit comparison test shows that $\sum_{k=1}^{n} \beta_k$ converges iff $p > 1/2$. Therefore the partial sum (6.5) has a finite limit iff $p > 1/2$. Since $s_{2n+1} - s_{2n} \to 0$, the series $s$ converges iff $p > 1/2$. Since $n^p - 1 \le \bigl|(-1)^n n^p - 1\bigr| \le n^p + 1$, $s$ converges absolutely iff $p > 1$. ♦

The tests of Sections 6.2 and 6.3 for positive-term series may be used in conjunction with 6.4.2 to test series with terms of mixed sign. For example, the inequality $|n^{-2} \sin n| \le n^{-2}$, together with the comparison test, shows that the series $\sum n^{-2} \sin n$ converges absolutely and hence converges.

The remainder of the section describes tests that are useful for establishing conditional convergence. They rely on the following discrete analog of the integration-by-parts formula, due to Abel.

6.4.4 Summation by Parts. Let $\{a_n\}$, $\{b_n\}$, and $\{s_n\}$ be sequences such that $s_0 = 0$ and $s_k - s_{k-1} = a_k$, $k \ge 1$. Then, for $m > n \ge 1$,
$$\sum_{k=n}^{m} a_k b_k = \sum_{k=n}^{m-1} s_k (b_k - b_{k+1}) + s_m b_m - s_{n-1} b_n.$$

Proof. Since $a_k = s_k - s_{k-1}$,
$$\sum_{k=n}^{m} a_k b_k = \sum_{k=n}^{m} s_k b_k - \sum_{k=n}^{m} s_{k-1} b_k = \sum_{k=n}^{m} s_k b_k - \sum_{k=n-1}^{m-1} s_k b_{k+1}.$$
Combining the last two sums yields the desired formula.

6.4.5 Dirichlet’s Test. Let $\{a_n\}$ and $\{b_n\}$ be sequences such that the following conditions hold:
(a) The partial sums of $\sum a_n$ are bounded.
(b) $\lim_n b_n = 0$, and
(c) The series $\sum |b_{n+1} - b_n|$ converges, which is the case, for example, if $\{b_n\}$ is monotone.
Then $\sum a_n b_n$ converges.

Proof. Let
$$s_n := \sum_{k=1}^{n} a_k \quad \text{and} \quad t_n := \sum_{k=1}^{n} a_k b_k.$$
If $|s_n| \le M$ for every $n$, then, by 6.4.4,
$$|t_m - t_{n-1}| = \left| \sum_{k=n}^{m} a_k b_k \right| \le M \sum_{k=n}^{m} |b_k - b_{k+1}| + M(|b_n| + |b_m|), \quad m \ge n > 1.$$
Since the right side of the inequality tends to $0$ as $m, n \to \infty$, $\{t_n\}$ is a Cauchy sequence and hence converges. If $\{b_n\}$ is monotone, say decreasing, then
$$\sum_{k=1}^{n} |b_{k+1} - b_k| = \sum_{k=1}^{n} (b_k - b_{k+1}) = b_1 - b_{n+1},$$
which converges.
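The summation-by-parts identity 6.4.4 can be verified on concrete data. The following sketch uses exact rational arithmetic; the function name and the sample sequences are illustrative assumptions. Lists are padded at index 0 so that indices match the text.

```python
from fractions import Fraction
from itertools import accumulate


def summation_by_parts_sides(a, b, n, m):
    """Return both sides of 6.4.4 for 1 <= n < m.

    a and b are lists whose entry at index k is a_k or b_k; index 0 is padding.
    """
    # s[k] = a_1 + ... + a_k, with s[0] = 0 as required by 6.4.4.
    s = [0] + list(accumulate(a[1:]))
    left = sum(a[k] * b[k] for k in range(n, m + 1))
    right = (
        sum(s[k] * (b[k] - b[k + 1]) for k in range(n, m))
        + s[m] * b[m]
        - s[n - 1] * b[n]
    )
    return left, right


if __name__ == "__main__":
    a = [0, 1, -2, 3, 5, -7]                           # a_1..a_5 (assumed sample)
    b = [0] + [Fraction(1, k) for k in range(1, 6)]    # b_k = 1/k
    print(summation_by_parts_sides(a, b, 2, 5))
```

Both components of the returned pair agree for every admissible choice of n and m, which is what the proof's index shift establishes.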

6.4.6 Example. We apply Dirichlet’s test to the series $\sum_{n=1}^{\infty} b_n \sin(n\theta)$, where $\{b_n\}$ is monotone and $b_n \to 0$. To establish the boundedness of the sequence of partial sums $s_n := \sum_{k=1}^{n} \sin(k\theta)$, we use the identity
$$2 \sin(\theta/2) \sin(k\theta) = \cos\bigl((k - 1/2)\theta\bigr) - \cos\bigl((k + 1/2)\theta\bigr).$$
Summing, the right side telescopes and we obtain
$$2 \sin(\theta/2) \sum_{k=1}^{n} \sin(k\theta) = \cos(\theta/2) - \cos\bigl((n + 1/2)\theta\bigr).$$
Thus if $\theta$ is not a multiple of $2\pi$, then $|s_n| \le |\sin(\theta/2)|^{-1}$. By 6.4.5, $\sum_{n=1}^{\infty} b_n \sin(n\theta)$ converges for all $\theta$. (If $\theta$ is a multiple of $2\pi$, every term is zero.) Note that if, for example, $\theta = \pi/2$ and $b_n = 1/n$, then the convergence is conditional. ♦

6.4.7 Alternating Series Test. If $b_n \downarrow 0$, then $\sum_{n=1}^{\infty} (-1)^{n+1} b_n$ converges.

Proof. The partial sums of $\sum_{n=1}^{\infty} (-1)^{n+1}$ are clearly bounded, hence the assertion follows from 6.4.5.
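The bound on the partial sums of $\sum \sin(k\theta)$ used in Example 6.4.6 can be checked numerically. This is a floating-point sketch under the assumption θ = 1; the function names are hypothetical.

```python
import math


def sine_partial_sums(theta, n_max):
    """Return [s_1, ..., s_{n_max}] with s_n = sum_{k=1}^{n} sin(k * theta)."""
    total, sums = 0.0, []
    for k in range(1, n_max + 1):
        total += math.sin(k * theta)
        sums.append(total)
    return sums


def telescoped_value(theta, n):
    """Closed form (cos(theta/2) - cos((n + 1/2) theta)) / (2 sin(theta/2))."""
    return (math.cos(theta / 2) - math.cos((n + 0.5) * theta)) / (2 * math.sin(theta / 2))


if __name__ == "__main__":
    theta = 1.0  # assumed sample angle, not a multiple of 2*pi
    bound = 1 / abs(math.sin(theta / 2))
    sums = sine_partial_sums(theta, 1000)
    print(max(abs(s) for s in sums), "<=", bound)
```

The closed form comes from the telescoping identity, and the bound follows because the numerator has absolute value at most 2.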

6.4.8 Example (Alternating Harmonic Series). By 6.4.7, the series $\sum_{n=1}^{\infty} (-1)^{n+1} n^{-1}$ converges. We show that its value is $\ln 2$. Let
$$s_n = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k} \quad \text{and} \quad t_n = \sum_{k=1}^{n} \frac{1}{k} - \ln n.$$
By 6.1.9, the sequence $\{t_n\}$ converges. Also, by Exercise 1.5.3,
$$s_{2n} = \sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \sum_{k=n+1}^{2n} \frac{1}{k} = t_{2n} - t_n + \ln 2.$$
It follows that $s_{2n} \to \ln 2$. Since $s_{2n+1} - s_{2n} \to 0$, $s_n \to \ln 2$. ♦
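The identity $s_{2n} = \sum_{k=n+1}^{2n} 1/k$ used in Example 6.4.8 can be confirmed exactly for small n, and the convergence to ln 2 observed numerically. A minimal sketch with hypothetical function names:

```python
import math
from fractions import Fraction


def alternating_partial_sum(n):
    """Return s_n = sum_{k=1}^{n} (-1)^(k+1) / k as an exact Fraction."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, n + 1))


def upper_half_sum(n):
    """Return sum_{k=n+1}^{2n} 1/k as an exact Fraction."""
    return sum(Fraction(1, k) for k in range(n + 1, 2 * n + 1))


if __name__ == "__main__":
    for n in (1, 5, 50, 500):
        print(n, float(alternating_partial_sum(2 * n)), math.log(2))
```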

The contrast between absolutely convergent and conditionally convergent series is strikingly displayed in the context of rearrangements.

6.4.9 Definition. A rearrangement of a series $\sum_{n=1}^{\infty} a_n$ is a series $\sum_{k=1}^{\infty} a_{m_k}$, where $\{m_k\}$ is a sequence of positive integers that contains every positive integer exactly once.1 ♦

1 In other words, $k \mapsto m_k$ is a one-to-one mapping of $\mathbb{N}$ onto itself.

6.4.10 Theorem. If $\sum_{n=1}^{\infty} a_n$ converges absolutely to $s$, then any rearrangement $\sum_{k=1}^{\infty} a_{m_k}$ converges absolutely to $s$.

Proof. Assume first that $a_n \ge 0$ for all $n$. Let
$$t_n = \sum_{k=1}^{n} a_{m_k} \quad \text{and} \quad s_n = \sum_{k=1}^{n} a_k.$$
For each $N$, choose $K$ so large that the terms $a_k$, $1 \le k \le N$, are included among the terms $a_{m_k}$, $1 \le k \le K$. Then $s_N \le t_K \le \sum_{k=1}^{\infty} a_{m_k}$. Letting $N \to \infty$ shows that $\sum_n a_n \le \sum_k a_{m_k}$. Since $\sum_n a_n$ is a rearrangement of $\sum_k a_{m_k}$, the reverse inequality holds as well. The general case follows by considering $\sum a_n^+$ and $\sum a_n^-$ and using 6.4.2.

6.4.11 Example. Consider the series
$$t := 1 - \frac{1}{2^p} - \frac{1}{4^p} + \frac{1}{3^p} - \frac{1}{6^p} - \frac{1}{8^p} + \frac{1}{5^p} - \frac{1}{10^p} - \frac{1}{12^p} + \cdots,$$
which is a rearrangement of the alternating series
$$s := 1 - \frac{1}{2^p} + \frac{1}{3^p} - \frac{1}{4^p} + \frac{1}{5^p} - \frac{1}{6^p} + \frac{1}{7^p} - \frac{1}{8^p} + \frac{1}{9^p} - \frac{1}{10^p} + \cdots.$$
If $p > 1$, then both series converge absolutely and $t = s$. If $p = 1$, then the two series converge to different values. Indeed, if $s_n$ and $t_n$ denote the $n$th partial sums of $s$ and $t$, respectively, then
$$t_{3n} = \sum_{k=1}^{n} \left( \frac{1}{2k - 1} - \frac{1}{4k - 2} - \frac{1}{4k} \right) = \frac{1}{2} \sum_{k=1}^{n} \left( \frac{1}{2k - 1} - \frac{1}{2k} \right) = \frac{s_{2n}}{2} \to \frac{s}{2}.$$
Since
$$t_{3n+1} = t_{3n} + \frac{1}{2n + 1} \quad \text{and} \quad t_{3n+2} = t_{3n+1} - \frac{1}{4n + 2},$$
we see that $t_n \to s/2$. ♦
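For p = 1, the relation $t_{3n} = s_{2n}/2$ in Example 6.4.11 is an exact identity between finite sums, so it can be checked with rational arithmetic. The function names below are illustrative assumptions.

```python
from fractions import Fraction


def rearranged_block_sum(n):
    """Return t_{3n} for p = 1: n blocks of the form 1/(2k-1) - 1/(4k-2) - 1/(4k)."""
    return sum(
        Fraction(1, 2 * k - 1) - Fraction(1, 4 * k - 2) - Fraction(1, 4 * k)
        for k in range(1, n + 1)
    )


def alternating_partial_sum(n):
    """Return s_n = sum_{k=1}^{n} (-1)^(k+1) / k for the alternating harmonic series."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, float(rearranged_block_sum(n)), float(alternating_partial_sum(2 * n)) / 2)
```

Combined with Example 6.4.8, the two printed columns approach (ln 2)/2, half the sum of the original series.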

The phenomenon illustrated in the last example holds generally, as shown by the following remarkable result due to Riemann.

6.4.12 Theorem. If $s := \sum_{n=1}^{\infty} a_n$ converges conditionally, then, for any real number $x$, some rearrangement of $s$ converges to $x$.

Proof. We may assume that $x \ge 0$. For $n \in \mathbb{N}$ let
$$s_n := \sum_{j=1}^{n} a_j, \quad s_n^+ := \sum_{j=1}^{n} a_j^+, \quad s_n^- := \sum_{j=1}^{n} a_j^-, \quad \text{and} \quad s_0^+ := 0.$$
Since $s_n^+ \to +\infty$ (6.4.2), there exists a smallest integer $m_1$ such that $s_{m_1}^+ > x$. Since $x \ge 0$, $m_1 \ne 0$. Because $s_n^- \to +\infty$, there exists a smallest positive integer $n_1$ such that $s_{m_1}^+ - s_{n_1}^- < x$ and then a smallest integer $m_2$ such that $s_{m_2}^+ - s_{n_1}^- > x$. Obviously, $m_2 > m_1$. Continuing in this manner, we obtain strictly increasing sequences $\{m_k\}$ and $\{n_k\}$ with the following properties:

• $m_k$ is the smallest integer such that
$$t_k := s_{m_k}^+ - s_{n_{k-1}}^- = (a_1^+ + \cdots + a_{m_k}^+) - (a_1^- + \cdots + a_{n_{k-1}}^-) > x,$$

• $n_k$ is the smallest integer such that
$$r_k := s_{m_k}^+ - s_{n_k}^- = (a_1^+ + \cdots + a_{m_k}^+) - (a_1^- + \cdots + a_{n_k}^-) < x.$$

Now consider the series
$$s' := a_1^+ + \cdots + a_{m_1}^+ - a_1^- - \cdots - a_{n_1}^- + a_{m_1+1}^+ + \cdots + a_{m_2}^+ - a_{n_1+1}^- - \cdots.$$
The terms of $s'$ are either $a_j$ or $0$, and $s'$ contains each term of the series $s$ exactly once. Thus $s'$ is a rearrangement of $s$. We show that $s' = x$. By the minimality properties of the sequences $\{m_k\}$ and $\{n_k\}$,
$$t_k - a_{m_k}^+ \le x < t_k \quad \text{and} \quad r_k < x \le r_k + a_{n_k}^-,$$
hence
$$x - a_{n_k}^- \le r_k < x < t_k \le x + a_{m_k}^+.$$
Since $a_n \to 0$,
$$\lim_k r_k = \lim_k t_k = x. \tag{6.6}$$
Now let $s'_k$ denote the $k$th partial sum of the series $s'$ and consider the partial sums
$$r_1 = (a_1^+ + \cdots + a_{m_1}^+) - (a_1^- + \cdots + a_{n_1}^-),$$
$$t_2 = (a_1^+ + \cdots + a_{m_2}^+) - (a_1^- + \cdots + a_{n_1}^-),$$
$$r_2 = (a_1^+ + \cdots + a_{m_2}^+) - (a_1^- + \cdots + a_{n_2}^-).$$
If $m_1 + n_1 \le k \le m_2 + n_1$, then $s'_k$ includes the terms of $r_1$, additional terms from $a_{m_1+1}^+ + \cdots + a_{m_2}^+$, and no others, hence $r_1 \le s'_k \le t_2$. Similarly, if $m_2 + n_1 \le k \le m_2 + n_2$, then $s'_k$ includes the terms of $t_2$, additional terms from $-a_{n_1+1}^- - \cdots - a_{n_2}^-$, and no others, so $r_2 \le s'_k \le t_2$. In general, for $j \ge 1$,
$$m_j + n_j \le k \le m_{j+1} + n_j \Rightarrow r_j \le s'_k \le t_{j+1}$$
and
$$m_{j+1} + n_j \le k \le m_{j+1} + n_{j+1} \Rightarrow r_{j+1} \le s'_k \le t_{j+1}.$$
From (6.6), $s'_k \to x$.
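The construction in the proof of 6.4.12 can be imitated for the alternating harmonic series. The sketch below is a simplified greedy variant, not the proof's exact bookkeeping: it draws directly from the positive terms 1/(2k-1) and the negative terms -1/(2k) rather than from the sequences $a_j^+$ and $a_j^-$, and it uses floating-point sums. The function name and targets are assumptions.

```python
from itertools import count


def greedy_rearrangement_sum(x, num_terms):
    """Sum num_terms terms of a rearranged alternating harmonic series aimed at x.

    While the running total is at most x, take the next unused positive term;
    otherwise take the next unused negative term. Each term is used at most once
    and in its original relative order within its sign class.
    """
    positives = (1.0 / (2 * k - 1) for k in count(1))
    negatives = (-1.0 / (2 * k) for k in count(1))
    total = 0.0
    for _ in range(num_terms):
        total += next(positives) if total <= x else next(negatives)
    return total


if __name__ == "__main__":
    for target in (1.5, 0.0, -0.25):  # assumed sample targets
        print(target, greedy_rearrangement_sum(target, 100_000))
```

After the running total first crosses the target, it stays within one term's size of the target, and the term sizes tend to zero. This mirrors the estimate that yields (6.6).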

Exercises P P 1. Suppose that an converges absolutely. Prove that an bn converges absolutely for all sequences {bn } with lim supn→∞ |bn | < +∞. P 2.S Suppose an does not converge absolutely. P the ratio test shows that Can an still converge conditionally? P∞ n+1 3. For an alternating series s = bn , prove the inequality n=1 (−1) |s − sn | ≤ bn . This result is useful in estimating the error made by using sn to approximate s. For example, use the estimate to determine how large P∞ n should be so that the partial sum sn agrees with s = n=1 (−1)n+1 /n4 in nine decimal places.

Numerical Infinite Series 4. Let p > 0. Determine whether the series conditionally, or diverges, where an = (−1)n . n1/n (c) S (−1)n sin(1/np ).

187 P

an converges absolutely,

(b) S (−1)n (n1/n − 1).

(a) S

(d) (−1)n sin−1 (1/np ).

(e) (−1)n tan(1/np ).

(f) (−1)n tan−1 (1/np ).

sin[(2n + 1)π/2] . ln n √ √ n+1− n . (i) S (−1)n np 3n . (k) (−1)n √ n3 + 2 (−1)n , (p 6= 1). (m) S n p + (−1)n (−1)n n! (o) . 3 · 5 · · · (2n + 1) (−1)n en n! . (q) 5 · 8 · · · (3n + 2) (g)

(h) (j) (l) (n) (p)

(−2)n . n! (−1)n . n lnp (n + 1) (n!)2 (−1)n pn . (2n)! (−1)n , (n ≥ 2). np + (−1)n (−1)n 3 · 6 · · · (3n) . 3 · 5 · · · (2n + 1)

(r) (−1)n+1 n[(1)

n

−3]/2

.

5. Suppose that {bn } is monotone and bn → 0. Use the identity 2 sin (θ/2) cos (nθ) = sin (n + 1/2)θ − sin (n − 1/2)θ P∞ to verify that the series n=1 bn cos nθ converges if θ/(2π) 6∈ Z. P∞ 6. Let bn ↓ 0 and m ∈ N. Show that n=0 (−1)bn/mc bn converges. 7. Let m ∈ N. Show that m ∞ X (−1)n+1 m X (−1)n+m+1 = + δm ln 2, n(n + m) n n=1 n=1

where δm = 0 or 2 according as m is even or odd. P∞ 8. (Abel) Prove that if n=1 an converges and {bn } is bounded and monoP∞ tone, then n=1 an bn converges. 9.S Prove that

∞ X (n − 1/2) sin(nθ) converges for all real θ iff p > 1. np + (−1)n n=2

10. Let p > 1. Express each of the series ∞ X 1 terms of . p n n=1

∞ X

∞ X 1 (−1)n and in (2n − 1)p np n=1 n=1

188

A Course in Real Analysis P P that if 11. Prove nan converges, then an converges and, moreover, P |an |p converges for every p > 1. What if p = 1? P −p P −q 12. Prove that if n an converges, then n an converges for all q > p. P n 13.S (a) Let sn = k=1 an , where an → 0. Suppose P there exists a positive integer q such that snq → s ∈ R. Prove that an converges to s. (b) Use (a) to sum the series s := 1 +

1 1 1 1 1 1 1 1 + − − − + + + − ··· , 2 3 4 5 6 7 8 9

where sums of length three alternate signs. Generalize your result to alternating sums of length p > 1. (c) Show that in contrast to (b), the following series diverges, where sums of lengths p = 3 and q = 2 alternate signs. t := 1 +

$$\frac{1}{2} + \frac{1}{3} - \frac{1}{4} - \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} - \frac{1}{9} - \cdots.$$

*6.5

Double Sequences and Series

A double sequence is a doubly indexed infinite array $\{a_{m,n}\} = \{a_{m,n}\}_{m,n=1}^{\infty}$ of real numbers $a_{m,n}$.2 Associated with each double sequence are the so-called iterated limits
$$\lim_m \lim_n a_{m,n} \quad \text{and} \quad \lim_n \lim_m a_{m,n}.$$
For the first iterated limit to exist, each inner limit $b_m := \lim_n a_{m,n}$, as well as the outer limit $\lim_m b_m$, must exist. Similar remarks apply to the second iterated limit. The following scheme illustrates the case when the iterated limits exist and equal $L$.
$$\begin{array}{cccccc}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} & \to & b_1 \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} & \to & b_2 \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} & \to & b_m \\
\downarrow & \downarrow & & \downarrow & & \downarrow \\
c_1 & c_2 & \cdots & c_n & \to & L
\end{array}$$
In addition to iterated limits, a double sequence gives rise to a third type of limit, frequently called a double limit to distinguish it from iterated limits.

2 More precisely, a double sequence is a function $(m, n) \mapsto a_{m,n}$ from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{R}$.

6.5.1 Definition. Let $L \in \mathbb{R}$. We write
$$L = \lim_{m,n} a_{m,n}$$
and say that $a_{m,n}$ converges to $L$ or has limit $L$ if for each $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $|a_{m,n} - L| < \varepsilon$ for all $n, m \ge N$. We also write
$$\lim_{m,n} a_{m,n} = +\infty \ (-\infty)$$
if for each $r \in \mathbb{R}$ there exists $N \in \mathbb{N}$ such that $a_{m,n} > r$ $(< r)$ for all $n, m \ge N$. ♦

Double limits have properties similar to limits of single sequences. For example, double limit analogs of 2.1.3, 2.1.4, 2.1.5, and 2.1.11 are readily formulated and proved.

It is easy to find examples of iterated limits that exist but are unequal; $a_{m,n} = (1 - 1/n)^m$ is one such. When this happens, the double limit cannot exist, as shown in 6.5.2 below. However, even if the iterated limits are equal, the double limit may fail to exist. This is the case for the sequence defined by
$$a_{m,n} = \begin{cases} 1 & \text{if } m = n, \\ 0 & \text{otherwise}, \end{cases}$$
which has zero iterated limits. Finally, the example $a_{m,n} = (-1)^{m+n}(1/m + 1/n)$ shows that a double limit may exist even if both iterated limits fail to exist. The following theorem gives the basic connection between double limits and iterated limits.

6.5.2 Iterated Limit Theorem. Let $\{a_{m,n}\}$ be a double sequence such that $\lim_n a_{m,n}$ exists for each $m$ and $\lim_m a_{m,n}$ exists for each $n$. If the double limit $\lim_{m,n} a_{m,n}$ exists, then the iterated limits $\lim_m \lim_n a_{m,n}$ and $\lim_n \lim_m a_{m,n}$ exist and equal the double limit.

Proof. Let $L := \lim_{m,n} a_{m,n}$, $b_m := \lim_n a_{m,n}$, and $c_n := \lim_m a_{m,n}$. Given $\varepsilon > 0$, choose $N \in \mathbb{N}$ such that $|a_{m,n} - L| < \varepsilon$ for all $m, n \ge N$. Letting $n \to +\infty$ yields $|b_m - L| \le \varepsilon$ for all $m \ge N$. Therefore, $b_m \to L$. Similarly, $c_n \to L$.

6.5.3 Definition. Given a double sequence $\{a_{m,n}\}$, form the partial sums
$$s_{m,n} = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k}, \quad m, n \in \mathbb{N}.$$

The double infinite series
$$\sum a_{m,n} = \sum_{m,n} a_{m,n} = \sum_{m,n=1}^{\infty} a_{m,n}$$
is said to converge to $s \in \mathbb{R}$ if $\{s_{m,n}\}$ converges to $s$ in the sense of 6.5.1. The series converges absolutely if $\sum |a_{m,n}|$ converges, and converges conditionally if $\sum a_{m,n}$ converges but not absolutely. ♦

As in the case of single series, an absolutely convergent double series converges (Exercise 7). Moreover, a double series with nonnegative terms converges absolutely iff the partial sums $\sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k}$ are bounded (Exercise 5).

The iterated limits
$$\lim_m \lim_n s_{m,n} = \lim_m \lim_n \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k} = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} a_{j,k}$$
and
$$\lim_n \lim_m s_{m,n} = \lim_n \lim_m \sum_{k=1}^{n} \sum_{j=1}^{m} a_{j,k} = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} a_{j,k}$$
are called iterated series. The following result, a special case of the Fubini–Tonelli theorem, establishes a connection between double and iterated series.

6.5.4 Fubini–Tonelli Theorem for Series. A double series $\sum a_{m,n}$ is absolutely convergent iff one (hence both) of the following conditions hold:
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} |a_{m,n}| < +\infty \quad \text{and} \quad \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} |a_{m,n}| < +\infty. \tag{6.7}$$
In this case,
$$\sum_{m,n} a_{m,n} = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{m,n} = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{m,n}. \tag{6.8}$$

Proof. Set $s_{m,n} = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k}$ and $t_{m,n} = \sum_{j=1}^{m} \sum_{k=1}^{n} |a_{j,k}|$. The first assertion of the theorem is clear, since each condition in (6.7) implies that
$$T := \sup_{m,n} t_{m,n} < +\infty,$$
and conversely. Now suppose that $\sum a_{m,n}$ is absolutely convergent. Let $s := \lim_{m,n} s_{m,n}$. For each $j$,
$$\sum_{k=1}^{n} |a_{j,k}| \le t_{j,n} \le T \quad \text{for all } n,$$
hence $\sum_{k=1}^{\infty} a_{j,k}$ converges. Set $r_m := \sum_{j=1}^{m} \sum_{k=1}^{\infty} a_{j,k}$. Given $\varepsilon > 0$, choose $N$ such that
$$\left| \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k} - s \right| < \varepsilon \quad \text{for all } m, n \ge N.$$
Fixing $m \ge N$ and letting $n \to +\infty$ in this inequality yields $|r_m - s| \le \varepsilon$. This shows that $r_m \to s$, which is the first equality in (6.8). The proof of the second equality is similar.
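Equality (6.8) can be illustrated on a concrete absolutely convergent double series. The example $a_{m,n} = (-1)^{m+n}/(2^m 3^n)$ below is an assumption chosen for illustration, not taken from the text. Its iterated sums both equal the product of two geometric series, $(-1/3)(-1/4) = 1/12$. The sketch truncates both indices at a hypothetical cutoff.

```python
def term(m, n):
    """Return a_{m,n} = (-1)^(m+n) / (2^m * 3^n), an assumed sample double sequence."""
    return (-1) ** (m + n) / (2**m * 3**n)


def rows_then_columns(m_max, n_max):
    """Truncated iterated sum: outer index m, inner index n."""
    return sum(sum(term(m, n) for n in range(1, n_max + 1)) for m in range(1, m_max + 1))


def columns_then_rows(m_max, n_max):
    """Truncated iterated sum: outer index n, inner index m."""
    return sum(sum(term(m, n) for m in range(1, m_max + 1)) for n in range(1, n_max + 1))


if __name__ == "__main__":
    cutoff = 60  # assumed truncation level; the neglected tails are of order 2**-60
    print(rows_then_columns(cutoff, cutoff), columns_then_rows(cutoff, cutoff), 1 / 12)
```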

Exercises 1. Let α : N → N be strictly increasing. Show that if L := limm,n am,n exists in R, then limm,n aα(n),n exists and equals L. 2. A double sequence {am,n } is said to be Cauchy if, given ε > 0, there exists N ∈ N such that |am,n − am+p,n+q | < ε for all m, n ≥ N and all p, q ≥ 0. Prove that {am,n } converges iff it is Cauchy. Hint. Show that {an,n } converges. 3.S Determine the convergence behavior, double and iterated, of the following sequences, where a, b > 0: (a) sin(m/n). m−n . m+n 1 (g) 1/n . m n + nm sin(1/n) (j) . am + bn (d)

4. Show that if

ln(mn) . n mn (e) . (m + n)2 n (h) . m + n2 m2 n (k) 2 . an + bm4 (b)

(−1)m m . m+n mn (f) 2 . m + n2 n3 m (i) 4 . m + n4 n2 sin(1/n) (l) . m+n

(c)

am,n converges, then limm,n am,n = 0. P 5. Let am,n ≥ 0 for all m, n ∈ N. Prove that m,n am,n converges iff s := supm,n sm,n < +∞, in which case the series sums to s. P

6. State and prove a comparison test for double series with nonnegative terms. 7. Prove that an absolutely convergent double series converges. 8. For = an bm . Prove P that c := P sequences {an } and {bn }, set cm,n P b conm,n cm,n converges absolutely iff a := n an and b := P n n verge absolutely, in which case c = ab. Conclude that m,n m−q n−p converges iff p, q > 1. 9.S Given a double sequence {am,n } with am,n ≥ 0, let {bn } be the sequence obtained = n + 1, that is, Pn by summing am,n alongPthe diagonals j + k P bn := j=1 aj,n+1−j . Prove that am,n converges iff n bn converges, in which case the two series are equal.


10. Use Exercise 9 to show that the double series X X 1 1 S , and (c) (a) , (b) p 2 + n2 )p/2 (m + n) (m m,n m,n

1 p + np m m,n

X

converge iff p > 2. Show that for p > 2, ∞ ∞ X X 1 1 1 = − . p p−1 (m + n) n np n=2 n=2 m,n=1 ∞ X

P 11.S Prove that m,n rmn converges iff |r| < 1, in which case the iterated P∞ P∞ mn series m=1 n=1 r converges. 12.S Prove the root test for double series with Pnonnegative terms: Suppose that L := limm,n am,n 1/mn exists. Then m,n am,n converges if L < 1 and diverges if L > 1. 13. Let am,n = (−1)m n−m−2 . Prove that X |am,n | = 1 and m≥0,n≥2

X m≥0,n≥2

am,n = 1/2.

Chapter 7 Sequences and Series of Functions

7.1

Convergence of Sequences of Functions

Unlike numerical sequences, sequences of functions have several modes of convergence. In this chapter we consider the two most common types: pointwise and uniform. Other types of convergence will be examined in Chapter 11.

7.1.1 Definition. Let $S$ be a nonempty set. A sequence of real-valued functions $f_n$ on $S$ is said to converge pointwise on $S$ to a function $f : S \to \mathbb{R}$ if $f_n(x) \to f(x)$ for each $x \in S$. We then write $f = \lim_n f_n$ or $f_n \to f$ (on $S$). ♦

The following theorem is an immediate consequence of 2.1.11 and 3.1.9.

7.1.2 Theorem. Let $f_n \to f$ and $g_n \to g$ pointwise on $S$ and let $h$ be continuous such that $h \circ f_n$ and $h \circ f$ are defined on $S$. Then, for $\alpha, \beta \in \mathbb{R}$,
$$\alpha f_n + \beta g_n \to \alpha f + \beta g, \quad f_n g_n \to f g, \quad \frac{f_n}{g_n} \to \frac{f}{g} \ (\text{if } g \ne 0), \quad \text{and} \quad h \circ f_n \to h \circ f$$
pointwise on $S$.

The definition of pointwise convergence may be phrased as follows: For each $x \in S$ and $\varepsilon > 0$ there exists an index $N$ such that $|f_n(x) - f(x)| < \varepsilon$ for all $n \ge N$. Here, the index $N$ usually depends on both $\varepsilon$ and $x$. Removing the dependence on $x$ results in the stronger property of uniform convergence:

[FIGURE 7.1: Uniform convergence of $f_n$ to $f$.]

7.1.3 Definition. A sequence of functions $f_n : S \to \mathbb{R}$ is said to converge uniformly on $S$ to a function $f : S \to \mathbb{R}$ if, for each $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that

\[ |f_n(x) - f(x)| < \varepsilon \quad \text{for all } n \ge N \text{ and all } x \in S. \]

(See Figure 7.1.) ♦

Clearly, uniform convergence implies pointwise convergence. The examples below show that the converse is not generally true. For these examples and for the exercises at the end of the section, the following propositions are useful.

7.1.4 Proposition. Let $f_n, f : S \to \mathbb{R}$. Suppose that there exists a sequence $\{a_n\}$ of positive real numbers such that $a_n \to 0$ and $|f_n(x) - f(x)| \le a_n$ for all $x \in S$ and all $n$. Then $f_n$ converges uniformly to $f$ on $S$.

Proof. One need only choose $N$ in the definition of uniform convergence so that $a_n < \varepsilon$ for all $n \ge N$.

7.1.5 Proposition. Let $f_n, f : S \to \mathbb{R}$. Then $f_n$ converges uniformly to $f$ on $S$ iff

\[ \lim_n \big[ f_n(b_n) - f(b_n) \big] = 0 \]

for any sequence $\{b_n\}$ in $S$.

Proof. If $f_n$ converges uniformly to $f$ on $S$, then, given $\varepsilon > 0$, choose $N$ so that $|f_n(x) - f(x)| < \varepsilon$ for all $n \ge N$ and all $x \in S$. For such $n$, $|f_n(b_n) - f(b_n)| < \varepsilon$. Conversely, suppose $f_n$ does not converge uniformly to $f$ on $S$. Then there exists an $\varepsilon > 0$ such that, for infinitely many $n$, there is a point $b_n \in S$ with $|f_n(b_n) - f(b_n)| \ge \varepsilon$. Choosing $b_n \in S$ arbitrarily for the remaining $n$ gives a sequence $\{b_n\}$ for which the sequential condition fails.

7.1.6 Examples.
(a) The sequence $\{x^n\}$ converges pointwise but not uniformly to zero on $(-1, 1)$. (Take $b_n = 1/2^{1/n}$ in 7.1.5, so that $b_n^n = 1/2$ for all $n$.) The convergence is uniform on intervals $[-r, r]$, $0 < r < 1$, since on such an interval $|x^n| \le r^n$ and $r^n \to 0$.
(b) The sequence $\{n/x^n\}$ converges pointwise to zero on $(1, +\infty)$, but the convergence is not uniform there, as can be seen by taking $b_n = 2^{1/n}$ in 7.1.5. The convergence is uniform for $x \in [r, +\infty)$, $r > 1$, since then $|n/x^n| \le n/r^n \to 0$.
(c) The sequence $\{x^n e^{-nx}\}$ converges uniformly to zero on $[0, +\infty)$, since $x^n e^{-nx} \le e^{-n}$ for $x \ge 0$.
(d) The sequence $\{n^{-1} \sin nx\}$ converges uniformly to zero on $\mathbb{R}$, since $|n^{-1} \sin nx| \le 1/n$ for all $x$.
(e) The sequence $\{\sin(x/n)\}$ converges pointwise to zero on $\mathbb{R}$, but the convergence is not uniform, as can be seen, for example, by taking $b_n = \pi n/2$ in 7.1.5. The convergence is uniform on bounded intervals $[a, b]$, since on such an interval $|\sin(x/n)| \le |x|/n \le \max\{|a|, |b|\}/n$. ♦

There is an analog of 7.1.2 for uniform convergence; however, it is more restrictive and requires the notion of uniform boundedness.
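Example 7.1.6(a) can be probed numerically. The following is a minimal sketch, not part of the text: the helper name `sup_error_power`, the grid size, and the choice r = 0.9 are illustrative assumptions. It approximates the supremum of $|x^n|$ on $[-r, r]$ and evaluates $x^n$ at the witness points $b_n = 1/2^{1/n}$ from 7.1.5.

```python
def sup_error_power(n, lo, hi, samples=10001):
    """Approximate sup of |x**n - 0| over [lo, hi] on an evenly spaced grid."""
    step = (hi - lo) / (samples - 1)
    return max(abs((lo + i * step) ** n) for i in range(samples))


# On [-r, r] the supremum is r**n, which tends to 0 (uniform convergence).
r = 0.9  # illustrative choice of r in (0, 1)
closed = [sup_error_power(n, -r, r) for n in (10, 50, 200)]

# On (-1, 1) the points b_n = 1 / 2**(1/n) keep the error at 1/2 for every n,
# so the sequential condition of 7.1.5 fails.
witness = [(0.5 ** (1.0 / n)) ** n for n in (10, 50, 200)]

print(closed)
print(witness)
```

The first list shrinks toward zero while the second stays at 1/2, which is the numerical face of the distinction between uniform and merely pointwise convergence.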


7.1.7 Definition. A sequence of functions $f_n$ is said to be uniformly bounded on $S$ with uniform bound $M$ if $|f_n(x)| \le M$ for all $x \in S$ and all $n$. ♦

7.1.8 Proposition. Let $f_n \to f$ pointwise on a set $S$.
(a) If $\{f_n\}$ is uniformly bounded on $S$, then $f$ is bounded on $S$.
(b) If each $f_n$ is bounded on $S$ and $f_n \to f$ uniformly on $S$, then $\{f_n\}$ is uniformly bounded on $S$, hence $f$ is bounded.
(c) If $f_n \to f$ uniformly on $S$ and $f$ is bounded, then $\{f_n\}_{n=N}^{\infty}$ is uniformly bounded for some $N$.

Proof. (a) This follows by letting $n \to +\infty$ in the inequality $|f_n(x)| \le M$.
(b) Choose $N$ such that $|f_n(x) - f(x)| \le 1$ for all $n \ge N$ and $x \in S$. For such $n$ and for all $x \in S$,

\[ |f_n(x)| \le |f_n(x) - f(x)| + |f(x) - f_N(x)| + |f_N(x)| \le 2 + M_N, \]

where $M_N$ is a bound for $f_N$ on $S$. Since the functions $f_1, \ldots, f_{N-1}$ are bounded, $\{f_n\}_{n=1}^{\infty}$ is uniformly bounded.
(c) Let $|f(x)| \le M$ for all $x$. Choose $N$ such that $|f_n(x) - f(x)| \le 1$ for all $n \ge N$ and $x \in S$. For such $n$, $|f_n(x)| \le 1 + M$ for all $x \in S$.

The sequence $\{f_n\}$ on $(0, 1)$ defined by

\[ f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n, \\ 0 & \text{otherwise} \end{cases} \]

shows that the first assertion in (b) may be false if the convergence is merely pointwise.

7.1.9 Theorem. Let $f_n \to f$ and $g_n \to g$ uniformly on $S$ and let $h$ be uniformly continuous such that $h \circ f$ and $h \circ f_n$ are defined on $S$. Then
(a) $\alpha f_n + \beta g_n \to \alpha f + \beta g$ uniformly on $S$, $\alpha, \beta \in \mathbb{R}$.
(b) $h \circ f_n \to h \circ f$ uniformly on $S$.
(c) $f_n g_n \to f g$ uniformly on $S$ if $\{f_n\}$ and $\{g_n\}$ are uniformly bounded on $S$.
(d) $1/g_n \to 1/g$ uniformly on $S$ if $\{1/g_n\}$ is uniformly bounded on $S$.


Proof. The proof of (a) is left to the reader. To prove (b), given $\varepsilon > 0$, choose $\delta > 0$ such that $|h(u) - h(v)| < \varepsilon$ for all $u, v$ with $|u - v| < \delta$, and choose $N$ such that $|f_n(x) - f(x)| < \delta$ for all $x \in S$ and $n \ge N$. For such $n$, $|h \circ f_n(x) - h \circ f(x)| < \varepsilon$.

For (c), let $M > 0$ be a common uniform bound for the sequences $\{|f_n|\}$ and $\{|g_n|\}$ and let $\varepsilon > 0$. Choose $N$ such that

\[ |f_n(x) - f(x)| < \varepsilon/(2M) \quad \text{and} \quad |g_n(x) - g(x)| < \varepsilon/(2M) \]

for all $x \in S$ and $n \ge N$. For such $n$ and $x$,

\[ \begin{aligned} |f_n(x) g_n(x) - f(x) g(x)| &\le |f_n(x) g_n(x) - f(x) g_n(x)| + |f(x) g_n(x) - f(x) g(x)| \\ &= |g_n(x)|\,|f_n(x) - f(x)| + |f(x)|\,|g_n(x) - g(x)| \\ &\le M |f_n(x) - f(x)| + M |g_n(x) - g(x)| < \varepsilon. \end{aligned} \]

For (d), let $1/|g_n(x)| \le M$ for all $n$ and $x$. Then the same inequality holds for $g$, and

\[ \left| \frac{1}{g_n(x)} - \frac{1}{g(x)} \right| = \frac{|g_n(x) - g(x)|}{|g_n(x) g(x)|} \le M^2 |g_n(x) - g(x)|. \]

The hypothesis of uniform boundedness in parts (c) and (d) of the theorem cannot be relaxed. (See Exercises 6 and 7.)

There are versions of the Cauchy criterion for pointwise and uniform convergence of sequences of functions. For the pointwise version, consider a sequence of functions $f_n$ on $S$ such that $\lim_{m,n} |f_n(x) - f_m(x)| = 0$ for each $x \in S$. Then $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence of real numbers and hence converges to a unique real number $f(x)$. Thus $f_n \to f$ on $S$. Here is the analogous result for uniform convergence:

7.1.10 Uniform Cauchy Criterion. A sequence of functions $f_n$ converges uniformly on a set $S$ iff for each $\varepsilon > 0$ there exists an index $N$ such that

\[ |f_n(x) - f_m(x)| < \varepsilon \quad \text{for all } x \in S \text{ and all } m, n \ge N. \tag{7.1} \]

Proof. If $f_n \to f$ uniformly on $S$, then, given $\varepsilon > 0$, there exists an index $N$ such that $|f_n(x) - f(x)| < \varepsilon/2$ for all $x \in S$ and all $n \ge N$. An application of the triangle inequality yields (7.1).

Conversely, assume that the condition holds. Then, in particular, $\lim_{m,n} |f_n(x) - f_m(x)| = 0$ for every $x \in S$, hence, by the observation preceding the theorem, there exists a function $f$ such that $f_n \to f$ pointwise on $S$. We claim that the convergence is in fact uniform. To see this, let $\varepsilon > 0$ and choose $N$ as in (7.1). Letting $m \to +\infty$ in that inequality then yields $|f_n(x) - f(x)| \le \varepsilon$ for all $x \in S$ and all $n \ge N$. This shows that $f_n \to f$ uniformly on $S$.

7.1.11 Definition. Let $S$ be an arbitrary set and let $f_n : S \to \mathbb{R}$. If the sequence $\{f_n(x)\}$ is increasing (decreasing) for each $x \in S$ and $f_n \to f$ on $S$, we write $f_n \uparrow f$ ($f_n \downarrow f$). In either case we say that $\{f_n\}$ is monotone. ♦


The following theorem gives general conditions under which pointwise convergence implies uniform convergence.

7.1.12 Dini's Theorem. Let $f$ and $f_n$ be continuous on $[a, b]$ for each $n$ and suppose that either $f_n \downarrow f$ or $f_n \uparrow f$ on $[a, b]$. Then $f_n \to f$ uniformly.

Proof. We may assume that $f_n \downarrow f$. Let $g_n = f_n - f$, so $g_n \downarrow 0$. Suppose the assertion of the theorem is false. Then there exists an $\varepsilon > 0$, a subsequence $\{h_n\}$ of $\{g_n\}$, and a sequence $\{x_n\}$ in $[a, b]$ such that $h_n(x_n) \ge \varepsilon$ for all $n$. (Why?) By the Bolzano–Weierstrass theorem, there exists a subsequence $\{x_{n_k}\}$ converging to some $x \in [a, b]$. Since $h_n \downarrow$, for any fixed $n$ and all sufficiently large $k$, $h_n(x_{n_k}) \ge h_{n_k}(x_{n_k})$, hence $h_n(x_{n_k}) \ge \varepsilon$. Letting $k \to +\infty$ in the last inequality yields $h_n(x) \ge \varepsilon$ for all $n$, contradicting that $h_n(x) \to 0$.

The examples $x^n$ on $[0, 1)$ and $x^{-n}$ on $[2, +\infty)$ show that Dini's theorem is false if the interval is not closed and bounded. The decreasing sequence defined by

\[ f_n(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 1 + n(1 - x) & \text{if } 1 \le x \le 1 + 1/n, \\ 0 & \text{if } 1 + 1/n \le x \le 2 \end{cases} \tag{7.2} \]

shows that continuity of the limit function in Dini's theorem is essential.
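The role of the sequence (7.2) can be illustrated with a short computation. This is a sketch under stated assumptions: the function names `f`, `limit`, and `gap_at_midpoint` are hypothetical, and the sample point $x = 1 + 1/(2n)$ is chosen here only because $f_n$ takes the value 1/2 there while the pointwise limit is 0.

```python
def f(n, x):
    """The n-th function of the sequence (7.2) on [0, 2]."""
    if x <= 1.0:
        return 1.0
    if x <= 1.0 + 1.0 / n:
        return 1.0 + n * (1.0 - x)
    return 0.0


def limit(x):
    """Pointwise limit of (7.2): 1 on [0, 1] and 0 on (1, 2]."""
    return 1.0 if x <= 1.0 else 0.0


def gap_at_midpoint(n):
    """|f_n - limit| at x = 1 + 1/(2n), the midpoint of the sloped piece."""
    x = 1.0 + 1.0 / (2 * n)
    return abs(f(n, x) - limit(x))


for n in (1, 10, 1000):
    print(n, gap_at_midpoint(n))
```

The gap stays at 1/2 however large $n$ is, so the supremum of $|f_n - f|$ over $[0, 2]$ cannot tend to zero even though the sequence is decreasing and each $f_n$ is continuous.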

Exercises

1. Find the largest subset of $\mathbb{R}$ on which the given sequence converges pointwise, and determine the intervals on which the convergence is uniform.
(a) $x^n (1 - x)^n$. (b)S $n^p x^n (1 - x)$. (c) $e^{x/n}$.
(d)S $\dfrac{n x^2}{e^{n x^2}}$. (e) $\dfrac{1}{1 + x^{2n} (1 - x)^2}$. (f) $n^{1/2} \sin \dfrac{x}{n^{2/3}}$.
(g)S $\dfrac{\sqrt{n}\, x^2}{1 + n x^2}$. (h) $\dfrac{n x^2}{1 + n x^2}$. (i) $\left( \dfrac{x}{2 + x} \right)^n$.
(j)S $\dfrac{x^{2n}}{2 + x^{2n}}$. (k) $\dfrac{1}{1 + |x|^n}$. (l) $\dfrac{n \sin x^2}{1 + n x^2}$.

2. Describe the convergence behavior of the following sequences on $[0, 1]$:
(a)S $\dfrac{x}{nx + 1}$. (b)S $\dfrac{nx}{nx + 1}$. (c) $\dfrac{nx}{n^2 x + 1}$. (d) $\dfrac{1}{n^2 x^2 + 1}$.

3. Describe the convergence behavior of the sequences on $(0, 1)$:
(a) $\{x^{1/n}\}$. (b) $\{x^{1 + 1/n}\}$. (c) $\{x^{-1/n}\}$. (d) $\{x^{1 - 1/n}\}$.

4. Show directly that the sequence defined in (7.2) does not converge uniformly.


5. Let $p, q > 0$. Prove that the sequence of functions $\dfrac{x^p}{n + x^q}$ converges uniformly to zero on $[0, +\infty)$ iff $p < q$.

6.S Give an example of sequences $\{f_n\}$, $\{g_n\}$ and functions $f$, $g$ such that $f_n \to f$ and $g_n \to g$ uniformly, and $f_n g_n \to f g$ pointwise but not uniformly.

7. Give an example of a sequence $\{g_n\}$ and a function $g$ such that $g_n \to g$ uniformly and $1/g_n \to 1/g$ pointwise but not uniformly.

8. Let $-\infty < a < b \le +\infty$. Suppose that $f_n \to f$ uniformly on $[a, r]$ for every $r \in (a, b)$. Prove that $f_n \to f$ uniformly on $[a, b)$ iff for each sequence $\{b_n\}$ with $b_n \uparrow b$, $f_n(b_n) - f(b_n) \to 0$. Use this to show that $f_n(x) := x^{-n}$ does not converge uniformly on $[2, +\infty)$.

9. Let $f_n$ be bounded for each $n$ and let $f_n \to f$ uniformly on a set $S$. Prove that $\sup_S f_n \to \sup_S f$ and $\inf_S f_n \to \inf_S f$.

10.S Let $f$ be uniformly continuous on $\mathbb{R}$ and $a_n \to a$. Set $f_n(x) = f(x + a_n)$. Show that $\{f_n\}$ converges uniformly on $\mathbb{R}$.

11. Let $f_n$ be continuous on $[a, b]$ for each $n$ and let $f_n$ converge uniformly on $(a, b) \cap \mathbb{Q}$. Prove that $f_n$ converges uniformly on $[a, b]$.

12. Prove: If $f_n \to f$ uniformly on each of the sets $S_1, \ldots, S_m$, then $f_n \to f$ uniformly on $S_1 \cup \cdots \cup S_m$. Show that the corresponding statement for a union of infinitely many sets is false.

13.S For $x \in [0, 1]$ define

\[ f_n(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \text{ and } x = k/m \text{ in reduced form with } m \le n, \\ 0 & \text{otherwise.} \end{cases} \]

Show that $\{f_n\}$ converges pointwise but not uniformly to the Dirichlet function.

14. Let $p \in \mathbb{N}$. For $x \in [0, 1]$ define

\[ g_n(x) = \begin{cases} (m + 1/n)^p & \text{if } x \in \mathbb{Q},\ x = k/m \text{ in reduced form,} \\ 0 & \text{if } x \text{ is irrational.} \end{cases} \]

Show that $g_n$ converges uniformly on $[0, 1]$ iff $p = 1$.

15. Let $\{f_n\}$ be uniformly bounded, let $f, g$ be bounded on $[0, 1]$, and suppose that $f_n \to f$ pointwise (uniformly) on $[r, 1]$ for each $0 < r < 1$. If $g$ is continuous at 0 and $g(0) = 0$, prove that $f_n g \to f g$ pointwise (uniformly) on $[0, 1]$.


16. Let $\{f_n\}$ be uniformly bounded and $f_n \to f$ uniformly on $S$.
(a) Prove that $(f_1 + f_2 + \cdots + f_n)/n \to f$ uniformly on $S$.
(b)S Suppose for some $r > 0$ that $f_n(x) \ge r$ for all $n$ and all $x \in S$. Prove that $(f_1 f_2 \cdots f_n)^{1/n} \to f$ uniformly on $S$.

17.S Let $f_0$ be a bounded function on a set $S$ and $0 < r < 1$. Define a sequence $\{f_n\}$ recursively by $f_n(x) = \sin\big( r f_{n-1}(x) \big)$, $x \in S$, $n \ge 1$. Prove that $\{f_n\}$ converges uniformly on $S$. Show that a similar result holds if $S$ is an interval and $\sin x$ is replaced by any function $g$ such that $\sup_x |g'(x)| < 1/r$, where $r$ is any positive number.

18. Let $g$ and $h$ be positive and continuous on $[a, b]$ and define

\[ f_n(x) := \frac{n g(x)}{1 + n^2 h(x)}. \]

Prove that the following convergence is uniform on $[a, b]$:
(a) $n \sin f_n \to \dfrac{g}{h}$. (b) $n (1 - \cos f_n) \to 0$. (c) $n^2 (1 - \cos f_n) \to \dfrac{g^2}{2h^2}$.

7.2 Properties of the Limit Function

The theorems in this section give conditions under which the properties of continuity, integrability, or differentiability of functions in a sequence are passed along to the limit function. We shall see that pointwise convergence is generally insufficient for this; the stronger property of uniform convergence is needed.

The following theorem asserts that under suitable conditions two limit processes may be interchanged. It is one of several such results to be found in the text.

7.2.1 Interchange of Limits. Let $f_n \to f$ uniformly on a subset $E$ of $\mathbb{R}$ and let $a$ be an accumulation point of $E$ such that $L_n := \lim_{x \to a,\, x \in E} f_n(x)$ exists in $\mathbb{R}$ for each $n$. Then $L := \lim_n L_n$ exists in $\mathbb{R}$ and $\lim_{x \to a,\, x \in E} f(x) = L$. In other words, the equality

\[ \lim_n \lim_{\substack{x \to a \\ x \in E}} f_n(x) = \lim_{\substack{x \to a \\ x \in E}} \lim_n f_n(x) \]

holds provided that each inner limit exists in $\mathbb{R}$ and the convergence in the inner limit on the right is uniform.


Proof. Given $\varepsilon > 0$, for each $n$ choose $\delta_n > 0$ such that $|f_n(x) - L_n| < \varepsilon/3$ for all $x \in E$ with $|x - a| < \delta_n$. Next, choose $N \in \mathbb{N}$ such that $|f_n(x) - f(x)| < \varepsilon/6$ for all $x \in E$ and all $n \ge N$. For $n, m \ge N$, choose $x \in E$ such that $|x - a| < \min\{\delta_n, \delta_m\}$. Then

\[ |L_n - L_m| \le |L_n - f_n(x)| + |f_n(x) - f_m(x)| + |L_m - f_m(x)| < \varepsilon. \]

This shows that $\{L_n\}$ is a Cauchy sequence and hence converges to some $L \in \mathbb{R}$. Let $n \ge N$ be sufficiently large so that $|L_n - L| < \varepsilon/6$. If $x \in E$ and $|x - a| < \delta_n$, then

\[ |f(x) - L| \le |f(x) - f_n(x)| + |f_n(x) - L_n| + |L_n - L| < \varepsilon/6 + \varepsilon/3 + \varepsilon/6 < \varepsilon. \]

Therefore, $\lim_{x \to a,\, x \in E} f(x) = L$.

7.2.2 Corollary. If $f_n \to f$ uniformly on an interval $I$ and if each $f_n$ is continuous at some $a \in I$, then $f$ is continuous at $a$.

Proof. Take $L_n = f_n(a)$ in the theorem.

The corollary is false if the convergence is only pointwise. For example, the sequence of continuous functions $x^n$ converges pointwise on $[0, 1]$ to a function that is discontinuous at $x = 1$.

7.2.3 Theorem. If $f_n \to f$ uniformly on $[a, b]$ and $f_n \in \mathcal{R}_a^b$ for all $n$, then $f \in \mathcal{R}_a^b$ and

\[ \lim_n \int_a^b f_n(t)\, dt = \int_a^b f(t)\, dt. \tag{7.3} \]

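Proof. By 7.1.8, $f$ is bounded. By uniform convergence, given $\varepsilon > 0$, there exists an $N$ such that

\[ f_n(x) - \frac{\varepsilon}{4(b - a)} < f(x) < f_n(x) + \frac{\varepsilon}{4(b - a)} \]

for all $x \in [a, b]$ and $n \ge N$. It follows that for fixed $n \ge N$ and any partition $\mathcal{P}$, the lower sums $\underline{S}$ and upper sums $\overline{S}$ satisfy

\[ \underline{S}(f_n, \mathcal{P}) - \frac{\varepsilon}{4} \le \underline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}) \le \overline{S}(f_n, \mathcal{P}) + \frac{\varepsilon}{4}, \]

hence

\[ \overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) \le \overline{S}(f_n, \mathcal{P}) - \underline{S}(f_n, \mathcal{P}) + \frac{\varepsilon}{2}. \]

Since $f_n$ is integrable, $\mathcal{P}$ may be chosen so that the right side of this inequality is less than $\varepsilon$. Therefore, $f$ is integrable. Since $|f_n(t) - f(t)| < \varepsilon/\big(4(b - a)\big)$ for $n \ge N$ and all $t$,

\[ \left| \int_a^b f_n(t)\, dt - \int_a^b f(t)\, dt \right| \le \int_a^b |f_n(t) - f(t)|\, dt \le \frac{\varepsilon}{4}, \]

which shows that $\int_a^b f_n \to \int_a^b f$.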
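Theorem 7.2.3 can be checked on a concrete sequence. The sketch below uses $f_n(x) = (1 + x/n)^n$ on $[0, 1]$, which converges uniformly to $e^x$ there; the midpoint rule, the grid sizes, and the helper names `midpoint_integral` and `sup_gap` are illustrative assumptions rather than anything prescribed by the text.

```python
import math


def f(n, x):
    """f_n(x) = (1 + x/n)**n, which converges uniformly to e**x on [0, 1]."""
    return (1.0 + x / n) ** n


def midpoint_integral(g, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def sup_gap(n, samples=1001):
    """Grid approximation of sup over [0, 1] of |f_n(x) - e**x|."""
    return max(abs(f(n, i / (samples - 1)) - math.exp(i / (samples - 1)))
               for i in range(samples))


for n in (10, 100, 1000):
    integral = midpoint_integral(lambda x: f(n, x), 0.0, 1.0)
    print(n, sup_gap(n), abs(integral - (math.e - 1.0)))
```

In each row the integral error is no larger than the sup-norm gap times the interval length, in line with the final estimate of the proof, and both columns tend to zero.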


The following examples show that the hypothesis of uniform convergence in 7.2.3 cannot be relaxed.

7.2.4 Example. Define $f_n : [0, \pi] \to \mathbb{R}$ by

\[ f_n(x) = \begin{cases} n \sin(nx) & \text{if } 0 \le x \le \pi/n, \\ 0 & \text{if } \pi/n \le x \le \pi. \end{cases} \]

FIGURE 7.2: Pointwise convergence insufficient.

Each $f_n$ is continuous and $\{f_n\}$ converges pointwise on $[0, \pi]$ to the zero function, yet $\int_0^{\pi} f_n = 2$ for all $n$. ♦

7.2.5 Example. Let $r_1, r_2, \ldots$ be an enumeration of the rationals in $[0, 1]$ and let

\[ f_n(x) = \begin{cases} 1 & \text{if } x \in \{r_1, \ldots, r_n\}, \\ 0 & \text{otherwise.} \end{cases} \]

Then $f_n$ is integrable with zero integral and $f_n$ converges pointwise to the Dirichlet function, which is not Riemann integrable. ♦

In the two preceding examples, either the sequence was not uniformly bounded or the limit function was not integrable. It will follow from results in Chapter 11 that if $\{f_n\}$ is uniformly bounded, $f_n, f \in \mathcal{R}_a^b$, and $f_n \to f$ merely pointwise on $[a, b]$, then (7.3) holds.

7.2.6 Theorem. Let $f_n$ be differentiable on $(a, b)$ for each $n$ and let $\{f_n'\}$ converge uniformly on $(a, b)$. If $\{f_n(x_0)\}$ converges for some $x_0 \in (a, b)$, then $\{f_n\}$ converges uniformly to a differentiable function $f$ on $(a, b)$ and $f_n' \to f'$ on $(a, b)$.

Proof. Given $\varepsilon > 0$, choose $N$ such that, for all $m, n \ge N$ and $x \in (a, b)$,

\[ |f_n(x_0) - f_m(x_0)| < \frac{\varepsilon}{2} \quad \text{and} \quad |f_n'(x) - f_m'(x)| < \frac{\varepsilon}{2(b - a)}. \]

Fix $m, n \ge N$. By the mean value theorem applied to $f_n - f_m$, for each pair


$x, y \in (a, b)$ there exists $\xi_{m,n} \in (a, b)$ such that

\[ \big| f_n(x) - f_m(x) - [f_n(y) - f_m(y)] \big| = |f_n'(\xi_{m,n}) - f_m'(\xi_{m,n})|\,|x - y| \le \frac{\varepsilon |x - y|}{2(b - a)} \le \frac{\varepsilon}{2}. \tag{7.4} \]

In particular, for all $x \in (a, b)$,

\[ |f_n(x) - f_m(x)| \le \big| f_n(x) - f_m(x) - [f_n(x_0) - f_m(x_0)] \big| + |f_n(x_0) - f_m(x_0)| < \varepsilon/2 + \varepsilon/2 = \varepsilon. \]

By the uniform Cauchy criterion, $\{f_n\}$ converges uniformly on $(a, b)$ to some function $f$. Also, from (7.4), for fixed $y$ and for all $x \ne y$,

\[ \left| \frac{f_n(x) - f_n(y)}{x - y} - \frac{f_m(x) - f_m(y)}{x - y} \right| \le \frac{\varepsilon}{2(b - a)}. \]

Therefore, the sequence of functions $[f_n(x) - f_n(y)]/(x - y)$ converges uniformly in $x$ on the set $E_y := (a, y) \cup (y, b)$. Since $f_n$ converges to $f$,

\[ \frac{f_n(x) - f_n(y)}{x - y} \to \frac{f(x) - f(y)}{x - y} \quad \text{uniformly in } x \text{ on } E_y. \]

By 7.2.1 with $E = E_y$,

\[ \lim_n f_n'(y) = \lim_n \lim_{x \to y} \frac{f_n(x) - f_n(y)}{x - y} = \lim_{x \to y} \frac{f(x) - f(y)}{x - y} = f'(y). \]

The sequence given by $f_n(x) = x^n/n$, $0 < x < 1$, shows that uniform convergence of a sequence of functions does not guarantee that the derivatives converge uniformly.
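The closing remark about $f_n(x) = x^n/n$ can be made concrete with a few evaluations. In this sketch the function names `f` and `df` and the witness points $b_n = (1/2)^{1/(n-1)}$ are assumptions chosen for illustration; they are not taken from the text.

```python
def f(n, x):
    """f_n(x) = x**n / n on (0, 1); bounded by 1/n, so f_n -> 0 uniformly."""
    return x ** n / n


def df(n, x):
    """Derivative f_n'(x) = x**(n - 1); tends to 0 pointwise on (0, 1)."""
    return x ** (n - 1)


for n in (2, 10, 1000):
    b_n = 0.5 ** (1.0 / (n - 1))  # point of (0, 1) where f_n' equals 1/2
    print(n, f(n, 0.999999), df(n, 0.5), df(n, b_n))
```

The second column is at most $1/n$ and the third goes to zero for the fixed point $x = 1/2$, while the last column stays at 1/2, so the derivatives converge pointwise but not uniformly on $(0, 1)$.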

Exercises

1. Prove: If $f_n \to f$ uniformly on an interval $I$ and each $f_n$ is continuous at $a \in I$, then, for any sequence $\{a_n\}$ in $I$ with $a_n \to a$, $\lim_n f_n(a_n) = f(a)$.

2. Show that if $f_n \to f$ uniformly on a subset $E$ of $\mathbb{R}$ and each $f_n$ is uniformly continuous on $E$, then $f$ is uniformly continuous on $E$.

3. Prove that $(1 + x/n)^n \to e^x$ uniformly on any bounded interval of $\mathbb{R}$. Conclude that $\int_0^1 (1 + x/n)^n\, dx \to e - 1$.

4.S Show that $n^2 x e^{-nx} \to 0$ for all $x \ge 0$, yet $\int_0^1 n^2 x e^{-nx}\, dx \not\to 0$. Why does this not contradict 7.2.3?

5. Evaluate $\lim_n \int_0^1 f_n$ if $f_n(x) =$

1 x n(ex/n − 1) . (b) . (c) . cos(x/n) n sin(x/n) x √ n e−x/n − 1 ax(x + 1)n + 1 (d)S . (e) arctan , a > 0. x nx + 1 √ 6.S Prove that fn (x) := n/(1 + n2 x2 ) converges to 0 pointwise on (0, +∞), uniformly on [r, +∞) for every r > 0, but not uniformly on (0, 1). Show R1 that, nonetheless, 0 fn → 0. (a)

7. Let $\{a_n\}$ be a positive, strictly increasing sequence. Prove that

\[ \lim_n \int_0^1 \frac{a_n x}{1 + a_n x}\, dx = \int_0^1 \lim_n \frac{a_n x}{1 + a_n x}\, dx. \]

8. Let f and f 0 be positive and continuous on [a, b]. Define p 2n f 0 (x) nf 0 (x) and gn (x) := . fn (x) := 1 + n2 f (x) 1 + n2 f (x) Use Exercise 7.1.18 to find Z b Z b Z b (a)S lim n sin fn . (b) lim n(1 − cos fn ). (c) lim n(1 − cos gn ). n

n

a

n

a

a

9.S Show that if $f_n \to f$ uniformly on $[a, b]$ and $f_n$ is integrable for each $n$, then

\[ \int_a^x f_n(t)\, dt \to \int_a^x f(t)\, dt \]

uniformly in $x$ on $[a, b]$.

10. Suppose that $f_n$ is improperly integrable on $[a, c)$, $f_n \to f$ uniformly on $[a, t]$ for all $t \in [a, c)$, and $|f_n| \le g$ on $[a, c)$ for all $n$, where $g$ is improperly integrable on $[a, c)$. Prove that $f$ is improperly integrable on $[a, c)$ and

\[ \lim_n \int_a^c f_n = \int_a^c f. \]

11. Prove that if $f$ is continuous on $[0, 1]$, then

\[ \lim_n \int_0^1 f(x^n)\, dx = f(0). \]

12. For each $n$, let $f_n$ be continuous on $[a, +\infty)$, $a > 0$, and suppose that $c_n := \lim_{x \to +\infty} f_n(x)$ exists in $\mathbb{R}$. Prove that if $f_n \to f$ uniformly on $[a, +\infty)$, then $\lim_n c_n$ and $\lim_{x \to +\infty} f(x)$ exist and are equal. Show also that

\[ \lim_n \int_0^{1/a} f_n(x)\, dx = \int_0^{1/a} f(x)\, dx. \]

Hint. Let $g_n(x) = f_n(1/x)$, $0 < x \le 1/a$, and apply 7.2.1.

13. Let $f_n$ be as in 7.2.4 and define

\[ g_n(x) = \int_0^x f_n(t)\, dt, \qquad h_n(x) = x g_n(x), \qquad 0 \le x \le \pi. \]

Show that
(a) $\{g_n\}$ converges pointwise and monotonically on $[0, \pi]$ but not uniformly.
(b) $\{h_n\}$ converges uniformly on $[0, \pi]$.
(c) $\{h_n'\}$ does not converge uniformly on $[0, \pi]$.

7.3 Convergence of Series of Functions

7.3.1 Definition. Let $\{f_n\}$ be a sequence of real-valued functions on a set $S$. For each $x \in S$ and $n \in \mathbb{N}$ form the $n$th partial sums

\[ s_n(x) = \sum_{k=1}^{n} f_k(x) \quad \text{and} \quad t_n(x) = \sum_{k=1}^{n} |f_k(x)|. \]

The infinite series of functions $\sum_n f_n = \sum_{n=1}^{\infty} f_n$ is said to converge
• pointwise on $S$ if $\sum_n f_n(x)$ converges for each $x \in S$;
• absolutely pointwise on $S$ if $\sum_n f_n(x)$ converges absolutely for every $x \in S$;
• uniformly on $S$ if $\{s_n\}$ converges uniformly on $S$;
• absolutely uniformly on $S$ if $\{t_n\}$ converges uniformly on $S$. ♦

The methods of Chapter 6 for numerical series may be applied at each $x$ to test pointwise convergence of a series of functions. For uniform convergence, additional tests are required. The following result is an immediate consequence of 7.1.9.

7.3.2 Theorem. Let $\sum_n f_n$ and $\sum_n g_n$ converge uniformly on a set $S$ and let $\alpha, \beta \in \mathbb{R}$. Then $\sum_n (\alpha f_n + \beta g_n)$ converges uniformly on $S$ and

\[ \sum_n (\alpha f_n + \beta g_n) = \alpha \sum_n f_n + \beta \sum_n g_n. \]


The next theorem is a useful test for nonuniform convergence of a series. The proof is immediate from the identity $f_n = s_n - s_{n-1}$.

7.3.3 Theorem. If $\sum_n f_n$ converges uniformly on a set $S$, then $f_n \to 0$ uniformly on $S$.

For example, the geometric series

\[ \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}, \quad |x| < 1, \tag{7.5} \]

converges pointwise but not uniformly on $(-1, 1)$, since $x^n$ does not tend to zero uniformly on $(-1, 1)$. We show below that the series converges uniformly on all closed subintervals of $(-1, 1)$.

The comparison test for uniform convergence of a series of functions takes the following form:

7.3.4 Uniform Comparison Test. If $|f_n(x)| \le g_n(x)$ for all $n$ and all $x \in S$ and if $\sum_n g_n$ converges uniformly on $S$, then $\sum_n f_n$ converges absolutely uniformly on $S$.

Proof. Since $\sum_{k=m}^{n} |f_k(x)| \le \sum_{k=m}^{n} g_k(x)$, the assertion follows from the uniform Cauchy criterion.

7.3.5 Corollary. If $\sum_n f_n$ converges absolutely uniformly on a set $S$, then $\sum_n f_n$ converges uniformly on $S$.

Proof. $0 \le f_n + |f_n| \le 2|f_n|$, hence, by 7.3.4, $\sum_n (f_n + |f_n|)$ converges uniformly on $S$ and therefore so must

\[ \sum_n f_n = \sum_n (f_n + |f_n|) - \sum_n |f_n|. \]
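The behavior of the geometric series (7.5) described above can be examined numerically. The following is a rough sketch: the names `partial_sum` and `tail`, the choice r = 0.5, and the witness points $b_n = (1/2)^{1/(n+1)}$ are assumptions made for illustration. It compares the remainder after $n$ terms on $[-r, r]$ with the bound $r^{n+1}/(1 - r)$ and then evaluates the remainder at points approaching 1.

```python
def partial_sum(n, x):
    """Partial sum x**0 + x**1 + ... + x**n of the geometric series (7.5)."""
    return sum(x ** k for k in range(n + 1))


def tail(n, x):
    """|1/(1 - x) - partial_sum(n, x)|, which equals |x|**(n + 1) / |1 - x|."""
    return abs(1.0 / (1.0 - x) - partial_sum(n, x))


r = 0.5  # illustrative closed subinterval [-r, r] of (-1, 1)
grid = [-r + i * (2 * r) / 1000 for i in range(1001)]
for n in (5, 15, 30):
    print(n, max(tail(n, x) for x in grid), r ** (n + 1) / (1 - r))

# On (-1, 1): at b_n = (1/2)**(1/(n + 1)) the n-th remainder is 0.5 / (1 - b_n),
# which never drops below 1/2.
for n in (5, 50, 200):
    print(n, tail(n, 0.5 ** (1.0 / (n + 1))))
```

On $[-r, r]$ the largest remainder matches the bound and shrinks geometrically, while the remainders at the points $b_n$ stay bounded away from zero, consistent with nonuniform convergence on $(-1, 1)$.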

7.3.6 Weierstrass M-test. If there exist positive constants $M_n$ such that $\sum_n M_n < +\infty$ and $|f_n| \le M_n$ on $S$ for all $n$, then $\sum_n f_n$ converges absolutely uniformly on $S$.

Proof. Take $g_n$ to be the constant function $M_n$ in 7.3.4.

For example, taking $M_n = r^n$, we see that the geometric series (7.5) converges uniformly in every interval $[-r, r]$, $0 < r < 1$.

The next results are uniform convergence analogs of Dirichlet's and Abel's tests for numerical series.

7.3.7 Theorem. If $\sum_n f_n$ converges uniformly on a set $S$ and if there exists a constant $M$ such that

\[ |g_1(x)| + \sum_{n=1}^{\infty} |g_{n+1}(x) - g_n(x)| \le M \quad \text{for all } x \in S, \]

then $\sum_n f_n g_n$ converges uniformly on $S$.

Proof. Let $s_n = \sum_{k=1}^{n} f_k - \sum_{n=1}^{\infty} f_n$ and $t_n = \sum_{k=1}^{n} f_k g_k$. For each $n > 1$,

\[ g_n = \sum_{k=1}^{n-1} (g_{k+1} - g_k) + g_1, \]

hence $|g_n| \le M$ on $S$. Given $\varepsilon > 0$, choose $N$ so that $|s_n(x)| < \varepsilon$ for all $n \ge N$ and $x \in S$. By 6.4.4, for $m > n > N$ and $x \in S$,

\[ \begin{aligned} |t_m(x) - t_{n-1}(x)| &\le \sum_{k=n}^{m} |s_k(x)|\,|g_k(x) - g_{k+1}(x)| + |s_m(x)|\,|g_m(x)| + |s_{n-1}(x)|\,|g_n(x)| \\ &\le M\varepsilon + M\varepsilon + M\varepsilon = 3M\varepsilon. \end{aligned} \tag{7.6} \]

Therefore, $\{t_n\}$ is uniformly Cauchy on $S$ and hence converges uniformly.

7.3.8 Theorem. If, on a set $S$, the partial sums of $\sum_n f_n$ are uniformly bounded, $\sum_n |g_{n+1} - g_n|$ converges uniformly, and $g_n \to 0$ uniformly, then $\sum_n f_n g_n$ converges uniformly on $S$.

Proof. Let $t_n$ be as in the proof of 7.3.7, $s_n := \sum_{k=1}^{n} f_k$, and let $M$ be a uniform bound for $\{s_n\}$ on $S$. Given $\varepsilon > 0$, choose $N$ such that

\[ |g_n(x)| < \varepsilon \quad \text{and} \quad \sum_{k=n}^{m} |g_k(x) - g_{k+1}(x)| < \varepsilon, \quad m > n > N,\ x \in S. \tag{7.7} \]

Since (7.6) holds in the current setting, (7.7) implies that

\[ |t_m(x) - t_{n-1}(x)| \le 3M\varepsilon, \quad m > n > N,\ x \in S. \]

Therefore, $\{t_n\}$ converges uniformly on $S$.

7.3.9 Corollary. If the partial sums of $\sum_n f_n$ are uniformly bounded and if $g_n \downarrow 0$ or $g_n \uparrow 0$ uniformly on $S$, then $\sum_n f_n g_n$ converges uniformly on $S$.

Proof. Assume that $\{g_n\}$ is decreasing. Then

\[ \sum_{k=1}^{n} |g_{k+1} - g_k| = \sum_{k=1}^{n} (g_k - g_{k+1}) = g_1 - g_{n+1}, \]

hence $\sum_{n=1}^{\infty} |g_{n+1} - g_n|$ converges uniformly.

7.3.10 Example. Let $g_n$ be continuous and $g_n \downarrow 0$ or $g_n \uparrow 0$ on $\mathbb{R}$. We apply the preceding corollary to the series

\[ s(x) := \sum_n g_n(x) \sin nx \]

on closed bounded intervals $I$ not containing any integer multiple of $2\pi$. By Dini's theorem, $g_n \to 0$ uniformly on $I$. Also, by 6.4.6, $s(x)$ converges pointwise on $\mathbb{R}$. Moreover, if $x$ is not a multiple of $2\pi$, then

\[ \left| \sum_{k=1}^{n} \sin(kx) \right| \le \frac{1}{|\sin(x/2)|}. \]

Since $\inf_I |\sin(x/2)| > 0$, the sums $\sum_{k=1}^{n} \sin(kx)$ are uniformly bounded on $I$. By 7.3.9, $s(x)$ converges uniformly on $I$. By 7.3.8, the same result holds if, instead of monotonicity of the sequence $\{g_n\}$, we require that $\sum_n |g_{n+1} - g_n|$ converges and $g_n \to 0$, both uniformly on $I$. Analogous results hold for series of the form $\sum_n g_n(x) \cos nx$. ♦

7.3.11 Uniform Alternating Series Test. If $g_n \downarrow 0$ or $g_n \uparrow 0$ uniformly on a set $S$, then $\sum_{n=1}^{\infty} (-1)^{n+1} g_n$ converges uniformly on $S$.

Proof. Take $f_n = (-1)^{n+1}$ in 7.3.9.

7.3.12 Example. Let $f$ be continuous on $\mathbb{R}$ and monotone in some neighborhood $\mathcal{N}$ of 0 with $f(0) = 0$. If $a_n \downarrow 0$, then the series

\[ \sum_{n=1}^{\infty} (-1)^{n+1} f(a_n x) \]

converges uniformly on any closed, bounded interval $I$. We verify this for the case $I \subseteq [0, +\infty)$ and $f$ increasing. Choose $N$ so that $a_n x \in \mathcal{N}$ for all $n \ge N$ and $x \in I$. Then $f(a_n x) \downarrow 0$ on $I$, hence, by Dini's theorem and 7.3.11, $\sum_{n=1}^{\infty} (-1)^{n+1} f(a_n x)$ converges uniformly on $I$. For example, taking $a_n = 1/n$ we see that the series

\[ \sum_{n=1}^{\infty} (-1)^{n+1} \sin(x/n), \qquad \sum_{n=1}^{\infty} (-1)^{n+1} n^{-1} x e^{x/n}, \qquad \text{and} \qquad \sum_{n=1}^{\infty} (-1)^{n+1} \big[ 1 - e^{-n^{-2} x^2} \big] \]

all converge uniformly on closed bounded intervals. ♦

The following theorem is an immediate consequence of 7.2.2, 7.2.3, and 7.2.6 applied to the sequence of partial sums of the series.

7.3.13 Theorem. Let $f_n : [a, b] \to \mathbb{R}$ and $s := \sum_n f_n$.
(a) If $s$ converges uniformly on $[a, b]$ and each $f_n$ is continuous, then $s$ is continuous.
(b) If $s$ converges uniformly on $[a, b]$ and $f_n \in \mathcal{R}_a^b$ for all $n$, then $s \in \mathcal{R}_a^b$ and $\int_a^b s = \sum_n \int_a^b f_n$.
(c) Let $f_n$ be differentiable on $(a, b)$ and suppose that the derived series $\sum_n f_n'$ converges uniformly on $(a, b)$ and that $\sum_n f_n(x_0)$ converges for some $x_0 \in (a, b)$. Then $s$ converges uniformly on $(a, b)$ and $s' = \sum_n f_n'$.

7.3.14 Example. (a) By 7.3.10, $\sum_n n^{-1} \sin(nx)$ converges uniformly on intervals $[a, b] \subseteq (0, 2\pi)$, hence

\[ \int_a^b \sum_n n^{-1} \sin(nx)\, dx = \sum_n \int_a^b n^{-1} \sin(nx)\, dx = \sum_n \frac{\cos(na) - \cos(nb)}{n^2}. \]

On the other hand, the derived series $\sum_n \cos(nx)$ does not converge.
(b) Both $s(x) := \sum_n n^{-1} \sin(x/n)$ and its derived series $\sum_n n^{-2} \cos(x/n)$ converge uniformly on $\mathbb{R}$, hence the latter equals $s'(x)$. ♦

A closed form for a series $s := \sum_n f_n$ on a subset $E$ of $\mathbb{R}$ is a "standard function" that equals $s$ on $E$. Closed forms are typically combinations of rational, power, exponential, logarithmic, trigonometric, or inverse trigonometric functions.

7.3.15 Example. Since $1/(1 - x)$ is a closed form for the geometric series (7.5) on $(-1, 1)$, the function

\[ \frac{1}{1 - \dfrac{1}{2 + \sin x}} = \frac{2 + \sin x}{1 + \sin x} \]

is a closed form for the series $\displaystyle \sum_{n=0}^{\infty} \left( \frac{1}{2 + \sin x} \right)^n$ on intervals $I$ not containing $(4n - 1)\pi/2$ or $-(4n + 1)\pi/2$, $n = 0, 1, 2, \ldots$. By the Weierstrass M-test, the series converges absolutely uniformly on closed subintervals of $I$, since on such a subinterval $0 < 1/(2 + \sin x) < 1/(1 + \varepsilon)$ for some $\varepsilon > 0$. ♦
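The closed form in Example 7.3.15 is easy to sanity-check by summing the series directly. In this sketch the function names `series` and `closed_form`, the truncation at 200 terms, and the sample points are illustrative assumptions; the sample points are chosen away from the excluded values where $\sin x = -1$.

```python
import math


def series(x, terms=200):
    """Partial sum of sum_{n>=0} (1 / (2 + sin x))**n with `terms` terms."""
    q = 1.0 / (2.0 + math.sin(x))
    return sum(q ** n for n in range(terms))


def closed_form(x):
    """Closed form (2 + sin x) / (1 + sin x) from Example 7.3.15."""
    return (2.0 + math.sin(x)) / (1.0 + math.sin(x))


for x in (0.0, 1.0, -1.0, 3.0):
    print(x, series(x), closed_form(x))
```

The two columns agree to many digits at each sample point; agreement degrades as $x$ approaches a point where $\sin x = -1$, since the common ratio $1/(2 + \sin x)$ then approaches 1.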

Exercises

1. For the functions $f_n$ below, determine all subintervals of $[0, +\infty)$ on which $\sum_{n=0}^{\infty} f_n(x)$ converges pointwise or uniformly, where $p \in \mathbb{N}$.
(a)S $\dfrac{1}{1 + x^n}$. (b) $\dfrac{x^n}{1 + x^n}$. (c) $\dfrac{x}{n^2 + x}$.
(d)S $\dfrac{x}{1 + n^2 x}$. (e) $\left( \dfrac{x}{x - 2} \right)^n$. (f) $\dfrac{\sin(nx)}{1 + n^2 x^2}$.
(g)S $n^p e^{-nx}$. (h) $n^{-x}$. (i)S $\sin(x/n^p)$.
(j) $x^n (1 - x)^n$. (k) $\left( \dfrac{1 - x}{1 + x} \right)^n$. (l) $x^n e^{-nx}$.

2. Find the largest intervals of pointwise convergence and uniform convergence and a closed form for the series $\sum_{n=0}^{\infty} f_n(x)$, where $f_n(x) =$
(a) $\cos^n(\pi x/2)$, $x \in [0, 1]$. (b)S $\ln^n(1/x)$. (c) $\dfrac{(-1)^n}{e^{nx}}$. (d) $(x^2 \ln x)^n$.

3. Prove that if $\sum_n f_n$ converges absolutely uniformly on a set $S$, then $\sum_n f_n^+$ and $\sum_n f_n^-$ converge uniformly on $S$, where, for each $x \in S$, $f_n^+(x)$ and $f_n^-(x)$ are, respectively, the positive and negative parts of $f_n(x)$.

4.S Suppose that the numerical series $\sum_n a_n$ converges absolutely. Let

\[ s(t) = \sum_n a_n \sin\big( (2n + 1)t \big) \quad \text{and} \quad c(t) = \sum_n a_n \cos(nt). \]

Find series expansions for

\[ \int_x^{\pi/2} s(t)\, dt \quad \text{and} \quad \int_0^x c(t)\, dt. \]

5. Let $p > 0$ and $s(x) = \sum_{n=1}^{\infty} \sin(x/n^p)$. Prove:
(a) If $p \le 1$, then $s(x)$ diverges for all $x \ne 0$.
(b) If $p > 1$, then $s(x)$ converges absolutely uniformly on bounded intervals (hence pointwise on $\mathbb{R}$) but not uniformly on $\mathbb{R}$.

6.S Let $p > 0$ and $s(x) = \sum_{n=1}^{\infty} [1 - \cos(x/n^p)]$. Prove:
(a) If $p \le 1/2$, then $s(x)$ diverges for all $x \ne 0$.
(b) If $p > 1/2$, then $s(x)$ converges absolutely uniformly on bounded intervals (hence pointwise on $\mathbb{R}$) but not uniformly on $\mathbb{R}$.

7. Let $f(x)$ be bounded on $[0, 1]$ and

\[ t(x) := \sum_{n=0}^{\infty} x^n f(x), \quad x \in [0, 1]. \]

(a) Prove that $t(x)$ converges pointwise on $[0, 1)$ and uniformly on $[0, r]$ for $0 < r < 1$.
(b) Prove that if $f(1) \ne 0$, then the convergence of $t(x)$ is not uniform on $[0, 1)$.
(c) Suppose that $L := \lim_{x \to 1^-} (1 - x)^{-1} f(x)$ exists. Prove that the convergence of $t(x)$ is uniform on $[0, 1)$ iff $L = 0$.
(d) Let $m \in \mathbb{N}$. Determine whether the convergence of $t(x)$ is uniform on $[0, 1)$ for $f(x) =$
(i) $(1 - x)^m$. (ii) $1 - x^m$. (iii) $1 - \sin(\pi x/2)$. (iv) $\cos(\pi x/2)$.


8. (Uniform limit comparison test). Let $f_n \ge 0$ and $g_n > 0$ on a set $S$ and let $f_n/g_n \to h$ uniformly on $S$, where $h : S \to \mathbb{R}$ satisfies

\[ 0 < \inf_S h \le \sup_S h < +\infty. \]

Prove that $\sum_n f_n$ converges uniformly on $S$ iff $\sum_n g_n$ converges uniformly on $S$.

9.S Suppose that $f'$ exists, is bounded on $I := (-r, r)$, and $f(0) = 0$. Prove that the series

\[ s(x) := \sum_{n=1}^{\infty} \frac{1}{n} f\!\left( \frac{x}{n + 1} \right) \]

converges uniformly on $I$ and that $s'(0) = f'(0)$.

10. Suppose that $|f(x)| \le |x|$ on $I = (-r, r)$, $r > 0$. If $f$ is differentiable on $I$ and $f'$ is continuous at 0, show that the series $s(x)$ in Exercise 9 converges uniformly on $I$ and that $|s'(0)| \le 1$.

11.S Let $f_n(x)$ be continuous and nonnegative on $[a, b]$. Prove that if $\sum_n f_n$ converges pointwise on $[a, b]$ to a continuous function, then the convergence is uniform.

12. Let $\{a_n\}$ be a sequence such that $\sum_{n=1}^{\infty} a_n^{-1}$ converges absolutely. Prove that $\sum_{n=1}^{\infty} |x - a_n|^{-1}$ converges uniformly on bounded intervals not containing any $a_n$.

13.S Suppose that $f_n$ is monotone on $[a, b]$ for each $n$. Prove that if $\sum_n f_n(a)$ and $\sum_n f_n(b)$ converge absolutely, then $\sum_n f_n \in \mathcal{R}_a^b$ and

\[ \int_a^b \sum_n f_n = \sum_n \int_a^b f_n. \]

14. Let $\sum_n f_n$ converge uniformly on $S$ and let $\{g_n\}$ be a uniformly bounded sequence of functions on $S$ such that $\{g_n\}$ is either monotone increasing or monotone decreasing on $S$. Prove that $\sum_n f_n g_n$ converges uniformly on $S$.

15.S Suppose that the partial sums of $\sum_n f_n$ are uniformly bounded on $I = [a, b]$, $g_n$ is continuous for each $n$, and $g_n \downarrow 0$ or $g_n \uparrow 0$ on $I$. Prove that $\sum_n f_n g_n$ converges uniformly on $I$.

16. Suppose that $\sum_n f_n$ converges uniformly on $I = [a, b]$, $g_n$ is continuous for each $n$, and $g_n \downarrow g$ or $g_n \uparrow g$ on $I$, where $g$ is continuous. Prove that $\sum_n f_n g_n$ converges uniformly on $I$.

17. Suppose that $g_n$ is continuous on $I = [a, b]$ for each $n$, $\{g_n\}$ is monotone, and $s(x) := \sum_n (-1)^n g_n(x)$ converges for each $x \in I$. Prove that $s(x)$ is continuous on $I$.


18.S Let $g$ be continuous and nonnegative on $\mathbb{R}$. Prove that the series

\[ s(x) := \sum_{n=1}^{\infty} (-1)^n\, \frac{g(x) + n}{n^2} \]

converges uniformly on bounded intervals, hence pointwise on $\mathbb{R}$, but does not converge absolutely for any $x$.

19. Let $g_n$ be continuous and $g_n \downarrow 0$ on $\mathbb{R}$. Show that if $[a, b]$ does not contain any odd multiple of $\pi$, then $\sum_n (-1)^n g_n(x) \cos nx$ converges uniformly on $[a, b]$.

7.4 Power Series

A power series in $x$ about $a$ is an infinite series of the form

\[ s(x) = \sum_{n=0}^{\infty} c_n (x - a)^n, \quad a, c_n \in \mathbb{R}. \]

In the following four subsections we examine the properties of these important series. The first step is to determine the convergence set of a power series.

Radius of Convergence of a Power Series

7.4.1 Radius of Convergence Theorem. Given a power series $s(x) := \sum_{n=0}^{\infty} c_n (x - a)^n$, define the extended real number $R \in [0, +\infty]$ by

\[ R = \rho^{-1}, \quad \text{where } \rho := \limsup_n |c_n|^{1/n}. \]

Then $s(x)$
(a) converges absolutely pointwise for $|x - a| < R$;
(b) converges absolutely uniformly for $|x - a| \le r < R$;
(c) diverges for $|x - a| > R$.

Proof. For the case $R = 0$ ($\rho = +\infty$), the theorem asserts that $s(x)$ diverges for all $x \ne a$. This is immediate from the root test. A similar application of the root test proves (c): If $|x - a| > R$, then

\[ \limsup_n |c_n (x - a)^n|^{1/n} = \rho |x - a| > 1. \]

To prove (a) and (b), assume $R > 0$ ($\rho < +\infty$) and let $0 < r < s < R$.


Then $\rho < 1/s$, so there exists an index $N$ such that $|c_n|^{1/n} < 1/s$ for all $n \ge N$. For such $n$ and for all $x$ with $|x - a| \le r$, $|c_n (x - a)^n| \le (r/s)^n$. Since $r/s < 1$, the series converges uniformly on $[a - r, a + r]$ by the Weierstrass M-test. Since $r$ is arbitrary, part (a) follows.

The number $R = 1/\rho$ is called the radius of convergence of the series. The set $I$ of all $x$ for which the series $\sum_{n=0}^{\infty} c_n (x - a)^n$ converges is called the interval of convergence. By 7.4.1, $I$ is one of the intervals $\{a\}$, $(a - R, a + R)$, $(a - R, a + R]$, $[a - R, a + R)$, or $[a - R, a + R]$. The theorem gives no further information regarding $I$. The methods of Chapter 6 may be applied to determine convergence behavior at the endpoints $a \pm R$ if $R$ is finite.

The following characterization of $R$ is frequently useful.

7.4.2 Theorem. If $c_n > 0$ for all sufficiently large $n$, then

\[ R = \lim_n \frac{|c_n|}{|c_{n+1}|}, \]

provided the limit exists in $\mathbb{R}$.

Proof. Let $L$ denote the limit and set $a_n = |c_n| > 0$ for all $n \ge N$. The assertion then follows from the inequalities

\[ \frac{1}{L} = \liminf_n \frac{a_{n+1}}{a_n} \le \liminf_n a_n^{1/n} \le \rho = \limsup_n a_n^{1/n} \le \limsup_n \frac{a_{n+1}}{a_n} = \frac{1}{L} \]

(Exercise 2.4.12).

Here are some typical examples using 7.4.2, where $I$ is the convergence interval.

Examples.
(a) $\displaystyle \sum_{n=1}^{\infty} n^n x^n$, $I = \{0\}$.
(b) $\displaystyle \sum_{n=1}^{\infty} \frac{x^n}{n!}$, $I = (-\infty, +\infty)$.
(c) $\displaystyle \sum_{n=1}^{\infty} \frac{x^n}{\sqrt{n}}$, $I = [-1, 1)$, conditional convergence at $-1$.
(d) $\displaystyle \sum_{n=1}^{\infty} \frac{x^n}{n^2}$, $I = [-1, 1]$, absolute convergence at $\pm 1$. ♦

The following example is somewhat more interesting.


7.4.3 Example. The Fibonacci sequence $\{c_n\}$ is defined by

\[ c_0 = c_1 = 1, \quad c_n = c_{n-1} + c_{n-2}, \quad n \ge 2. \]

The Fibonacci power series is the series $\sum_{n=0}^{\infty} c_n x^n$. We use 7.4.2 to show that the radius of convergence of the series is $(\sqrt{5} - 1)/2$. Set $r_n = c_{n+1}/c_n$. Note that the first few terms of the sequence $\{r_n\}$ are $1, 2, 3/2, 5/3, 8/5$ and that

\[ r_n = \frac{c_n + c_{n-1}}{c_n} = 1 + \frac{1}{r_{n-1}}, \quad n \ge 2. \tag{7.8} \]

An induction argument then shows that

\[ 3/2 \le r_n \le 5/3, \quad n \ge 2. \tag{7.9} \]

Now, from (7.8),

\[ r_n - r_m = \frac{1}{r_{n-1}} - \frac{1}{r_{m-1}} = \frac{r_{m-1} - r_{n-1}}{r_{m-1} r_{n-1}}. \tag{7.10} \]

In particular,

\[ r_{2k+1} - r_{2k-1} = \frac{r_{2k-2} - r_{2k}}{r_{2k} r_{2k-2}} \quad \text{and} \quad r_{2k} - r_{2k-2} = \frac{r_{2k-3} - r_{2k-1}}{r_{2k-1} r_{2k-3}}, \]

hence

\[ r_{2k+1} - r_{2k-1} = \frac{r_{2k-1} - r_{2k-3}}{r_{2k} r_{2k-2} r_{2k-1} r_{2k-3}}. \]

Iterating, we obtain $r_{2k+1} - r_{2k-1} = (r_3 - r_1)/a_k$ for some $a_k > 0$. Since $r_3 - r_1 = 5/3 - 2 < 0$, the sequence $\{r_{2k+1}\}$ is decreasing. A similar argument, using $r_2 - r_0 = 1/2 > 0$, shows that $\{r_{2k}\}$ is increasing. Since both sequences are bounded, they converge, say, $r_{2k+1} \to L$ and $r_{2k} \to M$. From (7.10),

\[ |r_n - r_{n-1}| = \frac{|r_{n-1} - r_{n-2}|}{r_{n-1} r_{n-2}} = \cdots = \frac{|r_2 - r_1|}{b_n}, \]

where $b_n$ is a product of $2n - 4$ terms, each of which is an $r_k$. From (7.9), $b_n \to +\infty$, hence $r_n - r_{n-1} \to 0$. Therefore,

\[ \frac{c_{n+1}}{c_n} = r_n \to L = M = 1/R, \]

where $R$ is the radius of convergence of the series. Taking limits in (7.8) shows that $1/R = 1 + R$, which has positive solution $R = (\sqrt{5} - 1)/2$. ♦

Since a power series converges uniformly on closed bounded subintervals of $(a - R, a + R)$, 7.3.13 implies that the series is continuous on the entire interval. The following theorem extends continuity to the endpoints.
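As a numerical aside to Example 7.4.3, the ratios $r_n = c_{n+1}/c_n$ can simply be computed. The helper name `fibonacci_ratios` and the cutoff of 60 terms are assumptions of this sketch; the indexing follows the example, with $c_0 = c_1 = 1$.

```python
import math


def fibonacci_ratios(count):
    """Return [r_0, ..., r_{count-1}] with r_n = c_{n+1} / c_n and c_0 = c_1 = 1."""
    c, c_next = 1, 1
    ratios = []
    for _ in range(count):
        ratios.append(c_next / c)
        c, c_next = c_next, c + c_next
    return ratios


ratios = fibonacci_ratios(60)
radius_estimate = 1.0 / ratios[-1]          # R = lim c_n / c_{n+1}, as in 7.4.2
radius_exact = (math.sqrt(5.0) - 1.0) / 2.0

print(ratios[:5])
print(radius_estimate, radius_exact)
```

The first five ratios reproduce the values listed in the example, and the reciprocal of a late ratio agrees with $(\sqrt{5} - 1)/2$ to machine precision.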


7.4.4 Abel's Continuity Theorem. Let $s(x) := \sum_{n=0}^{\infty} c_n (x - a)^n$ have radius of convergence $R$ with $0 < R < +\infty$. If $s(x)$ converges at $x = a + R$, then $s(x)$ converges uniformly on $[b, a + R]$ for any $b \in (a - R, a + R)$. In particular, $s$ is continuous on $(a - R, a + R]$.

Proof. The transformation $x = Ry + a$ produces a power series in $y = (x - a)/R$ that converges on $(-1, 1]$. Hence we may assume in the original series that $a = 0$ and $s(x)$ converges on $(-1, 1]$. It suffices then to show that $s(x)$ converges uniformly on $[0, 1]$.

Let $s_n(x) = \sum_{k=0}^{n} c_k x^k$, $0 \le x \le 1$. For $n \ge m > 1$, define
\[
C_{m,n} = \sum_{k=m}^{n} c_k = s_n(1) - s_{m-1}(1).
\]
By 6.4.4,
\[
s_n(x) - s_{m-1}(x) = \sum_{k=m}^{n-1} C_{m,k}\,(x^k - x^{k+1}) + C_{m,n}\, x^n,
\]
the boundary term at $k = m$ being zero because $C_{m,m-1}$ is an empty sum. Since $\sum_n c_n$ converges, given $\varepsilon > 0$, we may choose $N$ such that $|C_{m,n}| < \varepsilon/2$ for all $n \ge m \ge N$. Then for all $n > m \ge N$ and $0 \le x \le 1$,
\[
|s_n(x) - s_{m-1}(x)| \le \sum_{k=m}^{n-1} |C_{m,k}|\,(x^k - x^{k+1}) + |C_{m,n}| \le \frac{\varepsilon}{2} \sum_{k=m}^{n-1} (x^k - x^{k+1}) + \frac{\varepsilon}{2}.
\]
Since $\sum_{k=m}^{n-1} (x^k - x^{k+1}) = x^m - x^n \le 1$, the last expression is $\le \varepsilon$. This shows that $\{s_n\}$ is uniformly Cauchy on $[0, 1]$, hence converges uniformly.

The next result shows that a power series may be differentiated or integrated term by term over the interior of the interval of convergence.

7.4.5 Theorem. Let $s(x) := \sum_{n=0}^{\infty} c_n (x - a)^n$ have radius of convergence $R > 0$. Then the derived series and the integrated series
\[
D(x) := \sum_{n=1}^{\infty} n c_n (x - a)^{n-1} \quad\text{and}\quad I(x) := \sum_{n=0}^{\infty} \frac{c_n}{n+1} (x - a)^{n+1}
\]
have radius of convergence $R$. Moreover, $s(x)$ is differentiable on the interval $(a - R, a + R)$, and for $x \in (a - R, a + R)$
\[
s'(x) = D(x) \quad\text{and}\quad \int_a^x s(t)\, dt = I(x).
\]


Proof. Since $\lim_n n^{1/n} = \lim_n 1/(n + 1)^{1/n} = 1$,
\[
\limsup_n |n c_n|^{1/n} = \limsup_n |c_n/(n + 1)|^{1/n} = \limsup_n |c_n|^{1/n}.
\]
Therefore, the series $s(x)$, $D(x)$, and $I(x)$ have the same radius of convergence. Since the differentiation and integration take place on closed subintervals where the convergence of each of the three series is uniform, the remaining assertions follow from 7.3.13.

Representation of Functions by Power Series

A power series $s(x) = \sum_{n=0}^{\infty} c_n (x - a)^n$ is said to represent a function $f$ on an interval $I$ if $f = s$ on $I$. The largest interval for which the representation is valid is called the representation interval. Note that the representation interval may be smaller than the convergence interval. (See the examples below.)

Power series representations are unique. Indeed, if $f$ is represented by $\sum_{n=0}^{\infty} c_n (x - a)^n$ on $I_a := (a - r, a + r)$, $r > 0$, then, by 7.4.5, $f$ has derivatives of all orders on $I_a$, and repeated differentiation of the identity
\[
f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n, \quad x \in I_a,
\]
shows that $f^{(n)}(a) = c_n\, n!$. Therefore, if $f$ has a power series representation about $a$, then
\[
f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n, \quad x \in I_a.
\]
The last series is called the Taylor series expansion of $f$ about $a$. For $a = 0$ it is called a Maclaurin series. The following examples show how various power series representations may be obtained from the geometric series representation of $(1 - x)^{-1}$ given in (7.5).

7.4.6 Examples. (a) Differentiating (7.5) term by term and multiplying the result by $x$ yields the representation
\[
\frac{x}{(1 - x)^2} = \sum_{n=1}^{\infty} n x^n, \quad |x| < 1. \tag{7.11}
\]
(b) Replacing $x$ in (7.5) by $-t$ and integrating produces
\[
\ln(x + 1) = \int_0^x \frac{1}{1 + t}\, dt = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n, \quad |x| < 1. \tag{7.12}
\]
Since the series converges at $x = 1$, Abel's continuity theorem shows that
\[
\ln 2 = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n},
\]
a result obtained in 6.4.8 by another method.

(c) Replacing $x$ in (7.5) by $-t^2$ and integrating produces
\[
\arctan x = \int_0^x \frac{1}{1 + t^2}\, dt = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n + 1}, \quad |x| < 1. \tag{7.13}
\]
(d) For an example with $a \ne 0$, consider
\[
\frac{3}{5 - 2x} = \frac{3}{3 - 2(x - 1)} = \frac{1}{1 - 2(x - 1)/3} = \sum_{n=0}^{\infty} \frac{2^n}{3^n} (x - 1)^n, \quad |x - 1| < \frac{3}{2}. \qquad ♦
\]
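As a numerical sanity check on representations (7.12) and (7.13), the following Python sketch compares partial sums with the standard library's `math.log` and `math.atan`. The function names and the sample point $x = 0.5$ are illustrative choices, not part of the text. At the endpoint $x = 1$ the check uses the standard alternating series estimate: the error after $N$ terms is at most the first omitted term, $1/(N + 1)$.

```python
import math

def log1p_series(x, terms):
    """Partial sum of sum_{n=1}^{terms} (-1)^(n+1) x^n / n, as in (7.12)."""
    return sum((-1) ** (n + 1) * x ** n / n for n in range(1, terms + 1))

def arctan_series(x, terms):
    """Partial sum of sum_{n=0}^{terms-1} (-1)^n x^(2n+1) / (2n+1), as in (7.13)."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

# Inside the interval |x| < 1 the terms decay geometrically.
err_log = abs(log1p_series(0.5, 60) - math.log(1.5))
err_atan = abs(arctan_series(0.5, 60) - math.atan(0.5))

# At the endpoint x = 1 (Abel's theorem) convergence is slow:
# the error after N terms is bounded by the first omitted term 1/(N + 1).
N = 10_000
err_endpoint = abs(log1p_series(1.0, N) - math.log(2.0))
```

The contrast between the two regimes illustrates why the endpoint needs a separate argument: interior convergence is fast and uniform on closed subintervals, while at $x = 1$ the series converges only conditionally.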

The next example and the theorem thereafter show that differentiation can be a powerful tool for finding a closed form for a power series.

7.4.7 Example. We show that
\[
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \quad -\infty < x < +\infty. \tag{7.14}
\]
Let $s(x)$ denote the series. By 7.4.2, the radius of convergence of $s$ is
\[
\lim_n \frac{(n + 1)!}{n!} = \lim_n (n + 1) = +\infty,
\]
so $s(x)$ converges for all $x$. Differentiating the series term by term yields $s'(x) = s(x)$. Now set $g(x) = e^{-x} s(x)$. Then $g'(x) = e^{-x}[s'(x) - s(x)] = 0$, hence $g$ is constant. Since $g(0) = 1$, $s(x) = e^x$. ♦

The following result is an extension of the binomial theorem. The coefficient $\binom{a}{n}$ in (7.15) is called a generalized binomial coefficient.

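7.4.8 Binomial Series. For any $a \in \mathbb{R}$ and $|x| < 1$,
\[
(1 + x)^a = \sum_{n=0}^{\infty} \binom{a}{n} x^n, \qquad \binom{a}{n} := \frac{a(a - 1) \cdots (a - n + 1)}{n!}, \quad \binom{a}{0} := 1. \tag{7.15}
\]
Proof. Let $s(a, x)$ denote the series in (7.15). A simple calculation shows that
\[
\left| \binom{a}{n} \binom{a}{n+1}^{-1} \right| = \frac{n + 1}{|a - n|} \to 1.
\]
Therefore, by 7.4.2, $s(a, x)$ converges for $|x| < 1$. For such $x$,
\[
\begin{aligned}
(1 + x)\, s(a - 1, x) &= \sum_{n=0}^{\infty} \binom{a-1}{n} x^n + \sum_{n=0}^{\infty} \binom{a-1}{n} x^{n+1} \\
&= 1 + \sum_{n=0}^{\infty} \left[ \binom{a-1}{n+1} + \binom{a-1}{n} \right] x^{n+1} \\
&= 1 + \sum_{n=0}^{\infty} \binom{a}{n+1} x^{n+1} \\
&= s(a, x),
\end{aligned} \tag{7.16}
\]
where for the third equality we used the identity (Exercise 6)
\[
\binom{a-1}{n} + \binom{a-1}{n+1} = \binom{a}{n+1}, \quad n \in \mathbb{Z}^+. \tag{7.17}
\]
Now differentiate the series $s(a, x)$ term by term to obtain
\[
s'(a, x) = \sum_{n=1}^{\infty} \binom{a}{n} n x^{n-1} = \sum_{n=0}^{\infty} \binom{a}{n+1} (n + 1) x^n = a \sum_{n=0}^{\infty} \binom{a-1}{n} x^n = a\, s(a - 1, x). \tag{7.18}
\]
Set $g(x) = (1 + x)^{-a} s(a, x)$, $|x| < 1$. By (7.18) and (7.16),
\[
\begin{aligned}
g'(x) &= -a(1 + x)^{-a-1} s(a, x) + a(1 + x)^{-a} s(a - 1, x) \\
&= a(1 + x)^{-a-1} \left[ -s(a, x) + (1 + x)\, s(a - 1, x) \right] = 0.
\end{aligned}
\]
Therefore, $g(x) = g(0) = 1$, hence $s(a, x) = (1 + x)^a$, as claimed.

7.4.9 Example. Taking $a = -1/2$ in (7.15) and replacing $x$ by $-x$, we have
\[
\frac{1}{\sqrt{1 - x}} = \sum_{n=0}^{\infty} (-1)^n \binom{-1/2}{n} x^n, \quad |x| < 1.
\]
Since
\[
\begin{aligned}
\binom{-1/2}{n} &= \frac{1}{n!} \left( -\frac{1}{2} \right) \left( -\frac{3}{2} \right) \cdots \left( -\frac{2n - 1}{2} \right) \\
&= \frac{(-1)^n}{n!} \cdot \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n} \\
&= \frac{(-1)^n}{n!} \cdot \frac{1 \cdot 2 \cdot 3 \cdot 4 \cdots (2n - 1) \cdot 2n}{2^n \cdot 2 \cdot 4 \cdots 2n} \\
&= \frac{(-1)^n (2n)!}{(n!)^2\, 4^n},
\end{aligned}
\]
we see that
\[
\frac{1}{\sqrt{1 - x}} = \sum_{n=0}^{\infty} \frac{(2n)!}{(n!)^2\, 4^n} x^n, \quad |x| < 1. \tag{7.19}
\]
Replacing $x$ by $t^2$ and integrating term by term from $0$ to $x$ yields the Maclaurin series for $\arcsin x$:
\[
\arcsin x = \sum_{n=0}^{\infty} \frac{(2n)!}{(2n + 1)\, 4^n\, (n!)^2} x^{2n+1}, \quad |x| < 1. \tag{7.20}
\]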
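The coefficients in (7.20) can be checked numerically. The sketch below is an illustration only: it rewrites $(2n)!/(n!)^2$ as the central binomial coefficient via Python's `math.comb` and compares a partial sum with `math.asin` at the arbitrarily chosen point $x = 0.5$, where $\arcsin x = \pi/6$.

```python
import math

def arcsin_coeff(n):
    """Coefficient (2n)! / ((2n+1) 4^n (n!)^2) of x^(2n+1) in (7.20).

    Uses (2n)!/(n!)^2 = C(2n, n) to avoid forming large factorials twice.
    """
    return math.comb(2 * n, n) / ((2 * n + 1) * 4 ** n)

def arcsin_series(x, terms):
    """Partial sum of the Maclaurin series (7.20) with the given number of terms."""
    return sum(arcsin_coeff(n) * x ** (2 * n + 1) for n in range(terms))

approx = arcsin_series(0.5, 40)
exact = math.asin(0.5)  # equals pi/6
```

The first few coefficients are $1$, $1/6$, $3/40$, matching the familiar expansion $\arcsin x = x + x^3/6 + 3x^5/40 + \cdots$.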


7.4.10 Remark. If $a > 0$ and is not an integer, then the binomial series converges absolutely and uniformly on $[-1, 1]$. Indeed, if $a_n = \left| \binom{a}{n} \right|$, then
\[
\frac{a_n}{a_{n+1}} = \frac{|a(a - 1) \cdots (a - n + 1)|}{n!} \cdot \frac{(n + 1)!}{|a(a - 1) \cdots (a - n)|} = \frac{n + 1}{|a - n|},
\]
hence, for sufficiently large $n$,
\[
n \left( \frac{a_n}{a_{n+1}} - 1 \right) = n \left( \frac{n + 1}{n - a} - 1 \right) = \frac{n(1 + a)}{n - a} \to 1 + a > 1.
\]
By Raabe's test (6.3.2) the series converges absolutely at $x = \pm 1$, hence, by Abel's continuity theorem (7.4.4), the series converges absolutely and uniformly on the interval $[-1, 1]$. ♦

Multiplication of Power Series

7.4.11 Definition. The Cauchy product of the power series $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=0}^{\infty} b_n x^n$ is the power series $\sum_{n=0}^{\infty} c_n x^n$, where
\[
c_n = \sum_{k=0}^{n} a_k b_{n-k}. \qquad ♦
\]
Note that $\sum_n c_n x^n$ is precisely the series one obtains by formally carrying out the multiplication
\[
(a_0 + a_1 x + a_2 x^2 + \cdots)(b_0 + b_1 x + b_2 x^2 + \cdots)
\]
and collecting like powers. We show below that if the power series $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=0}^{\infty} b_n x^n$ converge for $|x| < R$, then so does the Cauchy product. For this we need the following result due to Mertens.

7.4.12 Lemma. If the numerical series $A := \sum_{n=0}^{\infty} \alpha_n$ and $B := \sum_{n=0}^{\infty} \beta_n$ both converge, and if at least one of the series converges absolutely, then the Cauchy product
\[
C := \sum_{n=0}^{\infty} \gamma_n, \qquad \gamma_n = \sum_{k=0}^{n} \alpha_k \beta_{n-k},
\]
converges and $C = AB$.

Proof. Assume that $\sum_{n=0}^{\infty} \alpha_n$ converges absolutely. Let
\[
A_n = \sum_{k=0}^{n} \alpha_k, \quad B_n = \sum_{k=0}^{n} \beta_k, \quad C_n = \sum_{k=0}^{n} \gamma_k, \quad\text{and}\quad A' = \sum_{n=0}^{\infty} |\alpha_n|.
\]
Then
\[
\begin{aligned}
C_n &= \alpha_0 \beta_0 + (\alpha_0 \beta_1 + \alpha_1 \beta_0) + \cdots + (\alpha_0 \beta_n + \alpha_1 \beta_{n-1} + \cdots + \alpha_n \beta_0) \\
&= \alpha_0 B_n + \alpha_1 B_{n-1} + \cdots + \alpha_n B_0 \\
&= \alpha_0 (B_n - B + B) + \alpha_1 (B_{n-1} - B + B) + \cdots + \alpha_n (B_0 - B + B) \\
&= \alpha_0 (B_n - B) + \alpha_1 (B_{n-1} - B) + \cdots + \alpha_n (B_0 - B) + A_n B.
\end{aligned}
\]


Thus to show that $C_n \to AB$ it suffices to verify that
\[
\alpha_0 (B_n - B) + \alpha_1 (B_{n-1} - B) + \cdots + \alpha_n (B_0 - B) \to 0.
\]
Given $\varepsilon > 0$, choose $N$ such that $|B_n - B| < \varepsilon/(2A')$ for all $n > N$. Since $\alpha_n \to 0$, we may choose $N' > N$ so that for all $n > N'$
\[
|\alpha_n (B_0 - B) + \alpha_{n-1} (B_1 - B) + \cdots + \alpha_{n-N} (B_N - B)| < \varepsilon/2.
\]
For such $n$,
\[
\begin{aligned}
|\alpha_0 (B_n - B) &+ \alpha_1 (B_{n-1} - B) + \cdots + \alpha_n (B_0 - B)| \\
&\le |\alpha_n (B_0 - B) + \alpha_{n-1} (B_1 - B) + \cdots + \alpha_{n-N} (B_N - B)| \\
&\quad + |\alpha_{n-N-1}|\, |B_{N+1} - B| + |\alpha_{n-N-2}|\, |B_{N+2} - B| + \cdots + |\alpha_0|\, |B_n - B| \\
&< \varepsilon/2 + \varepsilon/2 = \varepsilon.
\end{aligned}
\]

7.4.13 Cauchy Product Theorem. For each $x$, let
\[
C(x) = \sum_{n=0}^{\infty} c_n x^n, \qquad c_n := \sum_{k=0}^{n} a_k b_{n-k},
\]
be the Cauchy product of series $A(x) = \sum_{n=0}^{\infty} a_n x^n$ and $B(x) = \sum_{n=0}^{\infty} b_n x^n$. If $A(x)$ and $B(x)$ have radii of convergence $R_a$ and $R_b$, respectively, then $C(x)$ has radius of convergence $R_c \ge \min\{R_a, R_b\}$ and
\[
C(x) = A(x) B(x), \quad |x| < \min\{R_a, R_b\}. \tag{7.21}
\]
Moreover, if, say, $R_b < R_a$ and $B(R_b)$ converges, then $C(R_b)$ converges and $C(R_b) = A(R_b) B(R_b)$.

Proof. Assume that $R_b \le R_a$ and let $|x| < R_b$. By 7.4.12 applied to $\alpha_n = a_n x^n$ and $\beta_n = b_n x^n$, the series $C(x)$ converges, hence $R_c \ge |x|$ and (7.21) holds. Since $|x|$ was arbitrary, $R_c \ge R_b = \min\{R_a, R_b\}$. The last assertion of the theorem follows from 7.4.4 by letting $x \uparrow R_b$ in (7.21).

7.4.14 Example. By (7.5) and (7.14), for $|x| < 1$
\[
\frac{e^x}{1 + x} = \sum_{n=0}^{\infty} c_n x^n, \quad\text{where}\quad c_n = \sum_{k=0}^{n} \frac{(-1)^{n-k}}{k!} = (-1)^n \sum_{k=0}^{n} \frac{(-1)^k}{k!}. \qquad ♦
\]
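The Cauchy product of Definition 7.4.11 is a discrete convolution of coefficient sequences, which makes Example 7.4.14 easy to check numerically. The following Python sketch is an illustration under stated assumptions: the helper `cauchy_product`, the truncation length `N = 60`, and the sample point `x = 0.3` are choices made here, not part of the text.

```python
import math

def cauchy_product(a, b):
    """Coefficients c_n = sum_{k=0}^{n} a_k b_{n-k} for n below min(len(a), len(b))."""
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

N = 60
a = [1 / math.factorial(n) for n in range(N)]  # coefficients of e^x, by (7.14)
b = [(-1) ** n for n in range(N)]              # coefficients of 1/(1 + x)
c = cauchy_product(a, b)

# Closed form of c_n stated in Example 7.4.14.
closed = [(-1) ** n * sum((-1) ** k / math.factorial(k) for k in range(n + 1))
          for n in range(N)]

# Evaluate the truncated product series at a point with |x| < 1.
x = 0.3
series_value = sum(cn * x ** n for n, cn in enumerate(c))
exact_value = math.exp(x) / (1 + x)
```

Only the first `N` coefficients are formed, so `series_value` is a partial sum of $C(x)$; for $|x| < 1$ the neglected tail is geometrically small.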

Remark. If $R_a = R_b$ and both $A(R_a)$ and $B(R_b)$ in 7.4.13 converge, it does not necessarily follow that $C(R_a)$ converges. Consider, for example,
\[
A(x) = B(x) = \sum_{n=1}^{\infty} (-1)^n \frac{x^n}{\sqrt{n}},
\]
which has radius of convergence $1$ and converges conditionally at $x = 1$. The Cauchy product at $x = 1$ is $\sum_{n=1}^{\infty} c_n$, where
\[
c_n = (-1)^n \sum_{k=1}^{n-1} \frac{1}{\sqrt{k(n - k)}}.
\]
However, for odd $n \ge 3$, each term with $k \le (n - 1)/2$ satisfies $k(n - k) \le (n - 1)^2/2$, so
\[
|c_n| = \sum_{k=1}^{n-1} \frac{1}{\sqrt{k(n - k)}} \ge \sum_{k=1}^{(n-1)/2} \frac{1}{\sqrt{k(n - k)}} \ge \sum_{k=1}^{(n-1)/2} \frac{1}{\sqrt{(n - 1)^2/2}} = \frac{\sqrt{2}}{2},
\]
hence $c_n \not\to 0$ and $\sum_n c_n$ diverges. ♦
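The lower bound in the preceding remark can be observed directly. The Python sketch below computes the Cauchy product terms $c_n$ for odd $n$ in an arbitrarily chosen range and compares $|c_n|$ with $\sqrt{2}/2$; the function name `cauchy_term` is introduced here purely for illustration.

```python
import math

def cauchy_term(n):
    """c_n = (-1)^n * sum_{k=1}^{n-1} 1/sqrt(k (n - k)), the n-th product term at x = 1."""
    return (-1) ** n * sum(1 / math.sqrt(k * (n - k)) for k in range(1, n))

# Absolute values of c_n for odd n = 3, 5, ..., 399.
odd_terms = [abs(cauchy_term(n)) for n in range(3, 400, 2)]
lower_bound = math.sqrt(2) / 2
smallest = min(odd_terms)
```

Since the terms stay bounded away from zero, the series fails the basic necessary condition for convergence, which is all the remark requires.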

Analytic Functions

7.4.15 Definition. A function $f$ is said to be (real) analytic at a point $a$ if, for some $r > 0$, $f$ has derivatives of all orders on $(a - r, a + r)$ and is represented there by its Taylor series at $a$, that is,
\[
f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n, \quad |x - a| < r.
\]
If $f$ is analytic at each point of a set $E$, then $f$ is said to be analytic on $E$. ♦

A function that has derivatives of all orders on an interval may not be analytic there. This is the case for the function in Exercise 29 below. The following theorem gives a necessary and sufficient condition for analyticity at a point.

7.4.16 Taylor Series Representation. Let $f$ have derivatives of all orders on an open interval $I$ containing $a$. Then $f$ is analytic at $a$ iff there exist positive constants $M$ and $r$ such that
\[
|f^{(k)}(x)| \le k!\, M^k \quad\text{for all } k \in \mathbb{N} \text{ and } x \in (a - r, a + r). \tag{7.22}
\]
Proof. Assume condition (7.22) holds. To prove that $f$ is analytic at $a$ we use Taylor's theorem (Section 4.6), which asserts that for each $n \in \mathbb{N}$ and $x \in (a - r, a + r)$ there exists a number $c = c(n, x)$ between $x$ and $a$ such that $f(x) = T_n(x) + R_n(x)$, where
\[
T_n(x) := \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!} (x - a)^k \quad\text{and}\quad R_n(x) := \frac{f^{(n)}(c)}{n!} (x - a)^n.
\]
Shrinking $r$ if necessary, we may assume that $r \in (0, 1/M)$. Let $|x - a| < r$. By hypothesis,
\[
|R_n(x)| \le M^n |x - a|^n \le (Mr)^n.
\]
Since $Mr < 1$, $R_n(x) \to 0$, hence $T_n(x) \to f(x)$. Therefore, $f$ is analytic at $a$.


Conversely, let $f$ be analytic at $a$. Then there exist constants $r_1 \in (0, 1)$ and $c_n$ such that
\[
f(x) = \sum_{n=0}^{\infty} c_n (x - a)^n, \quad |x - a| \le r_1. \tag{7.23}
\]
In particular, $|c_n r_1^n| \to 0$. Choose $M_1 > 1$ so that $|c_n r_1^n| < M_1$ for all $n$. Termwise differentiation of (7.23) yields
\[
f^{(k)}(x) = \sum_{n=k}^{\infty} n(n - 1) \cdots (n - k + 1)\, c_n (x - a)^{n-k},
\]
hence for $|x - a| \le r_1/2$,
\[
\begin{aligned}
|f^{(k)}(x)| &\le \sum_{n=k}^{\infty} n(n - 1) \cdots (n - k + 1)\, |c_n|\, (r_1/2)^{n-k} \\
&\le M_1 r_1^{-k} \sum_{n=k}^{\infty} n(n - 1) \cdots (n - k + 1)\, (1/2)^{n-k}.
\end{aligned}
\]
The last series is the $k$th derivative of the geometric series for $(1 - x)^{-1}$ evaluated at $1/2$ and therefore equals
\[
\left. \frac{d^k}{dx^k} (1 - x)^{-1} \right|_{x = 1/2} = k!\,(1 - 1/2)^{-k-1} = k!\, 2^{k+1}.
\]
Thus
\[
|f^{(k)}(x)| \le M_1 r_1^{-k}\, k!\, 2^{k+1}, \quad |x - a| \le r_1/2.
\]
To obtain (7.22), take $r = r_1/2$ and choose $M > 4M_1/r_1$, so that $M^k > M_1 r_1^{-k} 2^{k+1}$ for all $k$.

7.4.17 Example. Let $f(x) = \sin x$. Then $f^{(2k)}(0) = 0$ and $f^{(2k+1)}(0) = (-1)^k$. Since the derivatives of $f$ are bounded, (7.22) holds for all $x$. Therefore,
\[
\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n + 1)!} x^{2n+1} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots, \quad -\infty < x < +\infty.
\]
Similarly,
\[
\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots, \quad -\infty < x < +\infty. \qquad ♦
\]
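The remainder estimate behind Example 7.4.17 can be made concrete. Because every derivative of $\sin$ is bounded by $1$, Taylor's theorem gives $|R_n(x)| \le |x|^n/n!$ for the Taylor polynomial $T_n$ of 7.4.16 (powers $x^k$ with $k < n$). The Python sketch below tabulates the actual error against this bound; the sample point $x = 2$ and the helper names are illustrative assumptions.

```python
import math

def sin_taylor(x, n):
    """T_n(x) = sum over k < n of f^(k)(0) x^k / k! for f = sin.

    Only odd k = 2j + 1 contribute, and 2j + 1 <= n - 1 means j < n // 2.
    """
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(n // 2))

def remainder_bound(x, n):
    """Bound |R_n(x)| <= |x|^n / n!, valid since |sin^(n)| <= 1 everywhere."""
    return abs(x) ** n / math.factorial(n)

x = 2.0
# Triples (n, actual error, Taylor bound) for n = 1, ..., 24.
errors = [(n, abs(math.sin(x) - sin_taylor(x, n)), remainder_bound(x, n))
          for n in range(1, 25)]
```

The bound $|x|^n/n!$ tends to $0$ for every fixed $x$, which is why the representation holds on all of $\mathbb{R}$ rather than only near $0$.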

It is clear from 7.3.2 and 7.4.13 that the sum and product of functions analytic at a are analytic at a. In Exercise 33 the reader is asked to show that the reciprocal of a nonzero analytic function is analytic. It follows that the ratio of two analytic functions, if defined, is analytic. The next result extends the property of analyticity to nearby points.

7.4.18 Theorem. If the series $f(x) = \sum_{n=0}^{\infty} a_n (x - a)^n$ converges on $I := (a - r, a + r)$, then $f$ is analytic on $I$.

Proof. By considering $g(x) = f(x + a)$, we may suppose that $a = 0$. Let $|b| < r$, $0 < s < r - |b|$, and $|x - b| < s$. We show that $f$ has a power series expansion about $b$ on the interval $|x - b| < s$. Since $b$ is arbitrary, it will follow that $f$ is analytic on $I$.

By the binomial theorem applied to $\big( (x - b) + b \big)^n$,
\[
f(x) = \sum_{n=0}^{\infty} a_n \sum_{k=0}^{n} \binom{n}{k} (x - b)^k b^{n-k} = \sum_{n=0}^{\infty} \sum_{k=0}^{\infty} a_n d_{k,n} (x - b)^k b^{n-k}, \tag{7.24}
\]
where $d_{k,n} = \binom{n}{k}$ for $k = 0, 1, \ldots, n$ and $d_{k,n} = 0$ for $k > n$. Now,
\[
\sum_{n=0}^{\infty} \sum_{k=0}^{\infty} |a_n (x - b)^k b^{n-k} d_{k,n}| = \sum_{n=0}^{\infty} |a_n| \sum_{k=0}^{n} \binom{n}{k} |x - b|^k |b^{n-k}| = \sum_{n=0}^{\infty} |a_n| \big( |x - b| + |b| \big)^n.
\]
If $|x - b| < s$ then $|x - b| + |b| < s + |b| < r$ and the last series converges. Therefore, (7.24) converges uniformly for $|x - b| < s$. By 6.5.4, the order of summation may be interchanged, so
\[
\begin{aligned}
f(x) &= \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} a_n d_{k,n} (x - b)^k b^{n-k} \\
&= \sum_{k=0}^{\infty} \sum_{n=k}^{\infty} a_n \binom{n}{k} (x - b)^k b^{n-k} \\
&= \sum_{k=0}^{\infty} b_k (x - b)^k, \quad\text{where}\quad b_k := \sum_{n=k}^{\infty} a_n \binom{n}{k} b^{n-k}.
\end{aligned}
\]
This shows that $f$ has a power series expansion about $b$ on $(b - s, b + s)$.

7.4.19 Theorem. Let $f$ be analytic on an open interval $I$ and let $f = 0$ on a subinterval $(a, b)$ of $I$. Then $f = 0$ on $I$.

Proof. Let $c \in I$, $c > b$, and define
\[
A = \{ t \in (a, c) \mid f^{(n)} = 0 \text{ on } (a, t] \text{ for all } n \ge 0 \}.
\]
Then $A \ne \emptyset$ and $t_0 := \sup A \le c$. Suppose, for a contradiction, that $t_0 < c$. Since $f$ is analytic at $t_0$, $f$ has a Taylor series representation about $t_0$ on $J := (t_0 - r, t_0 + r)$ for some $r > 0$. By continuity and the approximation property of suprema, $f^{(n)}(t_0) = 0$ for each $n$. It follows that $f$ is identically zero on $J$, contradicting the definition of $t_0$. Therefore, $t_0 = c$, hence $f = 0$ on $(a, c)$. Since $c$ was arbitrary, $f(x) = 0$ for all $x \in I$ with $x \ge a$. Similarly, $f(x) = 0$ for all $x \in I$ with $x \le b$.


The proof of the following corollary is left to the reader.

7.4.20 Corollary. Let $f$ and $g$ be analytic on the open intervals $I$ and $J$, respectively. If $I \cap J \ne \emptyset$ and $f = g$ on an open subinterval of $I \cap J$, then there exists an analytic function $h$ on $I \cup J$ such that $h|_I = f$ and $h|_J = g$.

The preceding corollary is known as analytic continuation, as it may be used to extend an analytic function to a larger interval.

Exercises 1. Find the interval of convergence of

P∞

n=1

fn (x), where fn (x) =

(−1)n n 23n n3 xn n2 n! (x − 1)n . (b) √ (x − 2)n . . (c) n 2 (2n)! n! (1 + 2/n)n (−1)n nxn n!xn (d) S . (e) n+2−1/n . (f) (x + 1)n . (n + 1) ln(n + 2) 3n n (1.5)(2.5) · · · (n + .5) n 1 n 2n + 5n 2n (g) S [3 + (−1)n ]n sin x . (i) S x . x . (h) n n n 3 +4 n! (a) S

2. Use (7.5) to represent the following functions as power series about the given point a. In each case, find the representation interval. x3 x x (a) , a = 0. (b)S , a = 0. (c) , a = 1. 2 (x + 1) 2 − 3x 3 + 2x 3. Use (7.12) to find power series representations for (a)S x ln x, (b) x2 ln x about the point a = 1. 4. Without using 7.4.16, find the Maclaurin series and representation interval for the following functions. 2 1 + 2x S (a) ln . (b) (1 + x2 ) arctan x. (c) x3 e−3x . 1 − 3x √ ex − 1 sin x cos x − 1 S √ . . (e) (d) (f) . x x2 x 1 (g) S sin x cos x. (h) √ . (i) sin(x + π/3). 9 − x2 5.S Use an identity and 7.4.9 to find the Maclaurin series for arccos x. 6. Verify the identity (7.17). 7. Without using 7.4.16, show that (a) sin x = (b) cos x =

∞ X n=0 ∞ X n=0

an (x − a)n , a2n =

(−1)n sin a (−1)n cos a , a2n+1 = . (2n)! (2n + 1)!

bn (x − a)n , b2n =

(−1)n cos a (−1)n+1 sin a , b2n+1 = . (2n)! (2n + 1)!

8. Prove that
\[
\sum_{n=0}^{\infty} \frac{4(-1)^n}{2n + 1} = \sum_{n=0}^{\infty} \frac{2(2n)!}{(2n + 1)\, 4^n\, (n!)^2} = \pi.
\]

9. Find a power series representation for (a)

S

sin t − t . t3

√

(b)

cos t t.

10. Find a closed form for the series (a) n2 xn .

Rx

P∞

n=0 (n

2

f (t) dt if f (t) =

(c)

P∞

cos t − 1 . t

2

(d)

et − 1 . t2

fn (x), |x| < 1, where fn (x) =

n=0

(b)S (−1)n (2n + 1)x2n+1 .

11.S Sum the series

0

(c)

xn+1 n2 xn . (d) . (n + 1)(n + 2) n+1

+ n + 1)3−n .

12. Use 7.4.13 to find a series representation and representation interval for sin x ln(1 − x) e−x arctan x 2 . (c) (a)S . (b) √ . (d)S ex sin x. (e) . 2 1+x 1 − x x(1 + x2 ) 1−x 13. By calculating the Maclaurin series of the function sin2 x in two ways, establish the identity n

X 22n+1 1 = . (2n + 2)! (2k + 1)!(2n − 2k + 1)! k=0

14. By calculating the Maclaurin series of the function cos2 x in two ways, establish the identity n

X 22n−1 1 = . (2n)! (2k)!(2n − 2k)! k=0

15. By calculating the Maclaurin series of the function (1 − x)−3/2 in two ways, establish the identity n

(2n + 1)! X (2k)! = . (n!)2 4n (k!)2 4k k=0

16.S Show that the Fibonacci power series $s(x)$ (7.4.3) has the closed form
\[
(1 - x - x^2)^{-1}, \quad |x| < (\sqrt{5} - 1)/2.
\]
Conclude from Abel's continuity theorem (7.4.4) that $s(x)$ cannot converge at the endpoint $(\sqrt{5} - 1)/2$.

17. Let $a_n \to L \in \mathbb{R}$ and set $s(x) := \sum_{n=0}^{\infty} a_n x^n$, $|x| < 1$. For $m \in \mathbb{N}$, define
\[
\varphi_m(x) := \sum_{k=0}^{2m-1} (-1)^k x^k.
\]
Prove that $\lim_{x \to 1^-} \varphi_m(x) s(x) = mL$. Hint. Use Abel's continuity theorem.

18.S Use the method of 7.4.9 to establish the representation
\[
\ln\left( \sqrt{1 + x^2} + x \right) = \sum_{n=0}^{\infty} \frac{(-1)^n (2n)!}{4^n (2n + 1)(n!)^2} x^{2n+1}, \quad |x| < 1.
\]

19. Let $R$ be the radius of convergence of $\sum_n c_n (x - a)^n$ and let $p \ge 0$. Prove:
(a) If $\liminf_n |c_n| n^p > 0$, then $R \le 1$.
(b) If $\limsup_n |c_n|/n^p < +\infty$, then $R \ge 1$.

20. Let $R_a$ and $R_b$ denote the radii of convergence of $A(x) := \sum_n a_n x^n$ and $B(x) := \sum_n b_n x^n$, respectively. Suppose that
\[
\limsup_n \big( |a_n|/|b_n| \big) < +\infty.
\]
Prove that $R_a \ge R_b$.

21.S Let $R_s$ and $R_t$ denote the radii of convergence of
\[
s(x) := \sum_n c_n (x - a)^n \quad\text{and}\quad t(x) := \sum_n c_{n^2} (x - a)^n,
\]
respectively. Prove:
(a) If $R_s > 1$, then $R_t = +\infty$.
(b) If $R_s \le 1$, then no conclusion is possible.

22. Let $R_s$ and $R_t$ denote the radii of convergence of
\[
s(x) := \sum_n c_n (x - a)^n \quad\text{and}\quad t(x) := \sum_n c_n (x - a)^{n^2},
\]
respectively. Prove:
(a)S If $0 < R_s < +\infty$, then $R_t = 1$.
(b) If $R_s = 0$, then $R_t \le 1$, and any value of $R_t \le 1$ is possible.
(c) If $R_s = +\infty$, then $R_t \ge 1$, and any value of $R_t \ge 1$ is possible.

23. Suppose that the series
\[
A := \sum_{n=0}^{\infty} a_n, \quad B := \sum_{n=0}^{\infty} b_n, \quad\text{and}\quad C := \sum_{n=0}^{\infty} c_n
\]
converge, where $c_n = \sum_{k=0}^{n} a_k b_{n-k}$. Use 7.4.12 and 7.4.4 to prove that $AB = C$.

24. Prove that for any $a, b \in \mathbb{R}$ and $n \in \mathbb{N}$,
\[
\binom{a}{0} \binom{b}{n} + \binom{a}{1} \binom{b}{n-1} + \cdots + \binom{a}{n} \binom{b}{0} = \binom{a + b}{n}.
\]

25. Let $n \in \mathbb{Z}^+$. The Bessel function of order $n$ may be defined as the power series
\[
J_n(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{(n + k)!\, k!} \left( \frac{x}{2} \right)^{n + 2k}.
\]
Prove:
(a) The radius of convergence of $J_n(x)$ is $+\infty$.
(b) $J_n$ satisfies Bessel's differential equation $x^2 y'' + x y' + (x^2 - n^2) y = 0$.
(c) $\dfrac{d}{dx} \big[ x^n J_n(x) \big] = x^n J_{n-1}(x)$, $n \ge 1$.

26. Prove that
\[
\lim_{x \to 1^-} \sum_{n=1}^{\infty} \frac{x^n}{n^p}
\begin{cases}
= +\infty & \text{if } p \le 1, \\
< +\infty & \text{if } p > 1.
\end{cases}
\]

27.S Let $\{c_n\}$ tend monotonically to $0$. Prove that $\sum_n c_n x^n$ is continuous on $[-1, 1)$.

28. Let $f(x)$ be bounded on $[0, 1]$.
(a)S Prove that $t(x) := \sum_n n x^n f(x)$ converges pointwise on $[0, 1)$ and uniformly on $[0, r]$ for $0 < r < 1$.
(b) Suppose that $L := \lim_{x \to 1^-} (1 - x)^{-2} f(x)$ exists. Prove that the convergence of $t(x)$ in (a) is uniform on $[0, 1)$ iff $L = 0$. (Compare with Exercise 7.3.7.)

29. Show that the function
\[
f(x) =
\begin{cases}
e^{-1/x^2} & \text{if } x \ne 0, \\
0 & \text{otherwise}
\end{cases}
\]
is not analytic at $0$. (See Exercise 4.6.1.)

30.S Prove 7.4.20.

31. Prove: If $f(x)$ is analytic at $a$, then $f'(x)$ and $g(x) := \int_a^x f(t)\, dt$ are analytic at $a$.

32. Let $f$ be analytic at $a$ and let $\{a_n\}$ be a sequence of distinct real numbers such that $a_n \to a$ and $f(a_n) = 0$ for all $n$. Prove that $f$ is identically zero in a neighborhood of $a$. Hint. Assume that $a_n \uparrow a$ (how?). Construct, by induction, sequences $\{a_n^{(k)}\}_n$ such that $\lim_n a_n^{(k)} = a$ and $f^{(k)}(a_n^{(k)}) = 0$ for all $n$ and $k$.

33.S Let $f$ be analytic at $a$ and $f(a) \ne 0$. Carry out the following steps to show that $1/f$ is analytic at $a$.
(a) Assume that $f(a) = 1$ and that
\[
f(x) = \sum_{n=0}^{\infty} a_n (x - a)^n \ne 0, \quad |x - a| < r,
\]
for some $r$. Define a series $g$ formally by
\[
g(x) = \sum_{n=0}^{\infty} b_n (x - a)^n,
\]
where the sequence $\{b_n\}$ is given recursively by $b_0 = 1$ and
\[
b_n = -\sum_{k=1}^{n} a_k b_{n-k}, \quad n \ge 1.
\]
Show that if $g(x)$ converges for $|x - a| < r_1$ for some $0 < r_1 < r$, then $f(x) g(x) = 1$ for $|x - a| < r_1$.
(b) Show that if $|a_n| \le M^n$ for all $n$, then $|b_n| \le (2M)^n$ for all $n$.
(c) Conclude that $g$ is analytic at $a$ and that $g = 1/f$.

Part II

Functions of Several Variables

Chapter 8 Metric Spaces

The essential feature in the notion of limit of a function is the idea of nearness. This is made precise by a distance function, which, in the case of limits on R, is derived from the absolute value function. It turns out that there are many other important mathematical structures equipped with a distance function and therefore admitting a definition of limit. In this chapter, we examine the general properties of these structures.

8.1 Definitions and Examples

8.1.1 Definition. A metric on a nonempty set $X$ is a function $d : X \times X \to \mathbb{R}$ such that, for all $x, y, z \in X$,
(a) $d(x, y) \ge 0$ (nonnegativity),
(b) $d(x, y) = 0$ iff $x = y$ (coincidence),
(c) $d(x, y) = d(y, x)$ (symmetry), and
(d) $d(x, y) \le d(x, z) + d(y, z)$ (triangle inequality).
The ordered pair $(X, d)$ is called a metric space. A nonempty subset $E$ of $X$ with the metric $d|_{E \times E}$ is called a subspace of $X$ and is denoted by $(E, d)$. ♦

The real number system is a metric space under the usual metric $d(x, y) = |x - y|$. The following example shows that any nonempty set may be given a metric.

8.1.2 Example. (Discrete metric space). On a nonempty set $X$ define $d(x, x) = 0$ for all $x \in X$, and $d(x, y) = 1$ if $x \ne y$. Then $d$ is easily seen to be a metric, called the discrete metric on $X$. For example, the triangle inequality $d(x, y) \le d(x, z) + d(y, z)$ holds because the left side of the inequality is at most $1$, and if it equals $1$ then $x \ne y$, so either $x \ne z$ or $y \ne z$, implying that the right side must be at least $1$. ♦

8.1.3 Definition. A subset $E$ of a metric space $X$ is said to be bounded if for some $x_0 \in X$ and $M > 0$, $d(x, x_0) \le M$ for all $x \in E$. ♦

The point $x_0$ in the preceding definition may be replaced by any other point $y_0 \in X$ since for $x \in E$,
\[
d(x, y_0) \le d(x, x_0) + d(x_0, y_0) \le M + d(x_0, y_0).
\]
The notions of convergence and completeness readily carry over to general metric spaces:

8.1.4 Definition. A sequence $\{x_n\}$ in a metric space $(X, d)$ is said to converge to a member $x$ of $X$ if $\lim_n d(x_n, x) = 0$. In this case we write $x_n \to x$ or $\lim_n x_n = x$. A cluster point of a sequence in $X$ is the limit of a convergent subsequence. ♦

The limit of a sequence $\{x_n\}$ in $X$, if it exists, must be unique. Indeed, if $x_n \to x$ and $x_n \to y$, then, by the triangle inequality,
\[
0 \le d(x, y) \le d(x, x_n) + d(y, x_n) \to 0,
\]
hence $d(x, y) = 0$ and so $x = y$.

8.1.5 Definition. A sequence $\{x_n\}$ in a metric space $(X, d)$ is said to be Cauchy if $\lim_{m,n} d(x_m, x_n) = 0$. A metric space $(X, d)$ is said to be complete if every Cauchy sequence in $X$ converges to a member of $X$. A subset $E$ of $X$ is complete if it is complete as a subspace of $X$, that is, every Cauchy sequence in $E$ converges to a member of $E$. ♦

The real number system is complete under the usual metric. The subspace $\mathbb{Q}$ of $\mathbb{R}$ is not complete: a sequence of rational numbers converging to an irrational number is Cauchy but has no limit in $\mathbb{Q}$. A discrete metric space is complete, since every Cauchy sequence is eventually constant and therefore trivially converges.

8.1.6 Proposition. (a) Every Cauchy sequence is bounded. (b) Every convergent sequence is Cauchy, hence bounded.

Proof. (a) If $\{x_n\}$ is Cauchy, choose an index $N$ such that $d(x_m, x_n) < 1$ for all $m, n \ge N$. Then, for all $n \in \mathbb{N}$,
\[
d(x_N, x_n) < 1 + \max\{ d(x_N, x_1), d(x_N, x_2), \ldots, d(x_N, x_{N-1}) \}.
\]
(b) If $x_n \to x$, then the inequality $d(x_m, x_n) \le d(x_m, x) + d(x_n, x)$ implies that $\{x_n\}$ is Cauchy.

The notions of pointwise convergence and uniform convergence of a sequence of real-valued functions easily extend to general metric spaces:

8.1.7 Definition. Let $S$ be a nonempty set and let $(X, d)$ be a metric space. A sequence of functions $f_n : S \to X$ is said to converge pointwise to a function $f : S \to X$ if $f_n(s) \to f(s)$ for each $s \in S$. In this case we write $f = \lim_n f_n$ or $f_n \to f$ (on $S$). The sequence converges uniformly to $f$ on $S$ if for each $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that
\[
d\big( f_n(s), f(s) \big) < \varepsilon \quad\text{for all } n \ge N \text{ and } s \in S. \qquad ♦
\]

8.1.8 Definition. Let $\mathcal{X}$ be a vector space. A norm on $\mathcal{X}$ is a function $\| \cdot \|$ from $\mathcal{X}$ to $\mathbb{R}$ such that for all $x, y \in \mathcal{X}$ and $t \in \mathbb{R}$
(a) $\|x\| \ge 0$ (nonnegativity),
(b) $\|x\| = 0$ iff $x = 0$ (coincidence),
(c) $\|tx\| = |t|\, \|x\|$ (absolute homogeneity),
(d) $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).
The pair $(\mathcal{X}, \| \cdot \|)$ is then called a normed vector space. ♦

The proof of the following proposition is left to the reader.

8.1.9 Proposition. If $(\mathcal{X}, \| \cdot \|)$ is a normed vector space, then the function $d(x, y) := \|x - y\|$ is a metric on $\mathcal{X}$.

From 1.6.4 and Exercise 1.6.4 we see that $\| \cdot \|_2$, $\| \cdot \|_1$, and $\| \cdot \|_\infty$ are norms on $\mathbb{R}^n$, hence, according to 8.1.9, give rise to metrics. We denote these, respectively, by $d_2$, $d_1$, and $d_\infty$. In Exercise 17 the reader is asked to show that $\mathbb{R}^n$ is complete in each of these metrics. The metric $d_2$ is called the Euclidean metric on $\mathbb{R}^n$. The metric $d_1$ is the $\ell^1$ metric on $\mathbb{R}^n$ and $d_\infty$ the max metric on $\mathbb{R}^n$. Clearly, for $n = 1$, all three metrics reduce to absolute value on $\mathbb{R}$.

8.1.10 Example. Let $S$ be a nonempty set and let $B(S)$ denote the set of all bounded real-valued functions on $S$. Then $B(S)$ is a vector space under the operations of addition $f + g$ and scalar multiplication $cf$ defined by
\[
(f + g)(s) = f(s) + g(s) \quad\text{and}\quad (cf)(s) = c f(s), \quad s \in S.
\]
The supremum norm of $f \in B(S)$ is defined by
\[
\|f\|_\infty = \sup\{ |f(s)| : s \in S \}.
\]
It is easy to check that $\| \cdot \|_\infty$ is indeed a norm. For example, the triangle inequality follows by taking the supremum over $s \in S$ in the inequality
\[
|f(s) + g(s)| \le |f(s)| + |g(s)| \le \|f\|_\infty + \|g\|_\infty.
\]
Note that convergence of a sequence of functions in $B(S)$ is simply uniform convergence on $S$. For this reason, $\| \cdot \|_\infty$ is also called the uniform norm.

The space $B(S)$ is complete in the metric $d_\infty(f, g) := \|f - g\|_\infty$ induced by the norm. To see this, let $\{f_n\}$ be a Cauchy sequence in $B(S)$ and let $\varepsilon > 0$. Choose $N$ such that $d_\infty(f_n, f_m) < \varepsilon$ for all $m, n \ge N$. For such $m, n$
\[
|f_n(s) - f_m(s)| < \varepsilon \quad\text{for all } s \in S, \tag{8.1}
\]
hence $\{f_n(s)\}$ is a Cauchy sequence in $\mathbb{R}$ for every $s \in S$. Since $\mathbb{R}$ is complete, $f_n(s) \to f(s)$ for some $f(s) \in \mathbb{R}$. Fixing $n$ in (8.1) and letting $m \to +\infty$ yields
\[
|f_n(s) - f(s)| \le \varepsilon \quad\text{for all } s \in S \text{ and all } n \ge N.
\]
This shows that $f$ is bounded and that $f_n \to f$ in $B(S)$. ♦
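When $S$ is a finite set of $n$ points, $B(S)$ with the uniform norm is just $\mathbb{R}^n$ with the max metric $d_\infty$, so the three metrics $d_1$, $d_2$, $d_\infty$ can be compared side by side. The Python sketch below is an illustration only: it represents points as tuples and spot-checks the triangle inequality of 8.1.1(d) on randomly generated sample points. The sample size, the seed, and the small floating-point tolerance are assumptions made here; a finite check of this kind illustrates the axiom but does not prove it.

```python
import math
import random

def d1(x, y):
    """The l^1 metric: sum of coordinate distances."""
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    """The Euclidean metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_inf(x, y):
    """The max metric: largest coordinate distance."""
    return max(abs(a - b) for a, b in zip(x, y))

def triangle_holds(d, x, y, z, tol=1e-12):
    """Check d(x, y) <= d(x, z) + d(y, z), allowing for rounding error."""
    return d(x, y) <= d(x, z) + d(y, z) + tol

random.seed(0)
n = 4
points = [tuple(random.uniform(-5, 5) for _ in range(n)) for _ in range(30)]
metrics_ok = all(
    triangle_holds(d, x, y, z)
    for d in (d1, d2, d_inf)
    for x in points for y in points for z in points
)
```

For instance, the points $(0, 0)$ and $(3, 4)$ in $\mathbb{R}^2$ are at distance $7$, $5$, and $4$ in $d_1$, $d_2$, and $d_\infty$, respectively.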

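In the case $S = \mathbb{N}$, $B(S)$ may be identified with the set of all bounded sequences and as such is denoted by $\ell^\infty$.

8.1.11 Example. Let $\ell^1$ denote the set of all sequences $\mathbf{a} = \{a_n\}$ in $\mathbb{R}$ such that $\sum_n |a_n| < +\infty$. Clearly, $\ell^1$ is a vector subspace of $\ell^\infty$. It is easy to check that $\|\mathbf{a}\|_1 := \sum_n |a_n|$ defines a norm on $\ell^1$. We show that $(\ell^1, \| \cdot \|_1)$ is complete in this norm.

Let $\{\mathbf{a}_n := (a_{1,n}, a_{2,n}, \ldots)\}_{n=1}^{\infty}$ be a Cauchy sequence in $\ell^1$, and let $\varepsilon > 0$. Choose $N$ so that
\[
\|\mathbf{a}_n - \mathbf{a}_m\|_1 = \sum_{k=1}^{\infty} |a_{k,n} - a_{k,m}| < \varepsilon \quad\text{for all } n, m \ge N. \tag{8.2}
\]
Since $|a_{k,n} - a_{k,m}| \le \|\mathbf{a}_n - \mathbf{a}_m\|_1$, the sequence $\{a_{k,n}\}_n$ is Cauchy for each $k$, hence converges. Let $a_k = \lim_n a_{k,n}$ and $\mathbf{a} = \{a_k\}$. Fix $K \in \mathbb{N}$ and $n \ge N$. From (8.2),
\[
\sum_{k=1}^{K} |a_{k,n} - a_{k,m}| < \varepsilon \quad\text{for all } m \ge N.
\]
Letting $m \to +\infty$, we obtain $\sum_{k=1}^{K} |a_{k,n} - a_k| \le \varepsilon$. Since $K$ was arbitrary,
\[
\|\mathbf{a}_n - \mathbf{a}\|_1 = \sum_{k=1}^{\infty} |a_{k,n} - a_k| \le \varepsilon \quad\text{for all } n \ge N.
\]
It follows that $\mathbf{a} \in \ell^1$ and $\mathbf{a}_n \to \mathbf{a}$. ♦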

8.1.12 Definition. Let $(X, d)$ and $(Y, \rho)$ be metric spaces. The product metric $d \times \rho$ on $X \times Y$ is defined by
\[
(d \times \rho)\big( (x, y), (a, b) \big) := d(x, a) + \rho(y, b), \quad x, a \in X, \ y, b \in Y.
\]
The pair $(X \times Y, d \times \rho)$ is called the product of the metric spaces $X$ and $Y$. ♦

In Exercise 13 the reader is asked to prove, among other things, that $d \times \rho$ is indeed a metric and that a sequence $\{(x_n, y_n)\}$ converges to $(a, b)$ in $X \times Y$ in this metric iff $x_n \to a$ in $X$ and $y_n \to b$ in $Y$.
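To make the product metric concrete, the following Python sketch builds $d \times \rho$ from the discrete metric of 8.1.2 on a set of labels and the usual metric on $\mathbb{R}$. The function names, the labels `"u"` and `"v"`, and the sample sequence are hypothetical choices made for illustration.

```python
def discrete(x, y):
    """Discrete metric of Example 8.1.2."""
    return 0 if x == y else 1

def usual(x, y):
    """Usual metric on the real line."""
    return abs(x - y)

def product_metric(d, rho):
    """Return the product metric (d x rho) of Definition 8.1.12 as a function."""
    def metric(p, q):
        (x, y), (a, b) = p, q
        return d(x, a) + rho(y, b)
    return metric

eta = product_metric(discrete, usual)

# The sequence (x_n, y_n) = ("u", 1/n) approaches ("u", 0): both coordinates converge.
dists = [eta(("u", 1 / n), ("u", 0.0)) for n in range(1, 6)]

# It stays at distance at least 1 from ("v", 0): the first coordinate never converges.
far = [eta(("u", 1 / n), ("v", 0.0)) for n in range(1, 6)]
```

The two lists illustrate the coordinatewise characterization of convergence: the distance to the limit tends to zero exactly when each coordinate distance does.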


Exercises

1.S Determine whether $d$ is a metric on $\mathbb{R}^2$, where $d\big( (x_1, x_2), (y_1, y_2) \big) =$
(a) $2|x_1 - y_1| + 3|x_2 - y_2|$.
(b) $|x_1^2 - y_1^2| + |x_2^2 - y_2^2|$.
(c) $|x_1^3 - y_1^3| + |x_2^3 - y_2^3|$.
(d) $|x_1 - x_2| + |y_1 - y_2|$.
(e) $\dfrac{|x_1 - y_1| + |x_2 - y_2|}{2 + |x_1 - y_1| + |x_2 - y_2|}$.
(f) $|e^{x_1} - e^{y_1}| + |e^{x_2} - e^{y_2}|$.

2. (p-adic metric). Let p be a fixed prime number. Define ρp (n, n) = 0, and for m 6= n ∈ Z define ρp (m, n) = 1/pα , where α is the power of p in the unique prime factorization of |m − n|. (For example, ρ2 (42, 2) = 1/8, ρ5 (42, 2) = 1/5, and ρ3 (42, 2) = 1.) Show that ρp is a metric on Z. 3.S (Hamming distance). Let A be a nonempty set and X := An . For x = (x1 , . . . , xn ), y = (y1 , . . . , yn ) ∈ X define d(x, y) to be the number of indices j for which xj 6= yj . Show that d is a metric on X. (The metric is named after Richard Hamming, who pioneered the field of error correcting codes.) 4. Let X be as in Exercise 3. Define ρ(x, x) = 0, and for distinct points x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) in X define ρ(x, y) = 2−j , where j is the smallest index for which xj 6= yj . Show that ρ is a metric on X. 5.S Prove that a metric d satisfies |d(x, y) − d(a, b)| ≤ d(x, a) + d(y, b). Conclude that if xn → a and yn → b, then d(xn , yn ) → d(a, b). 6. Let X and Y be nonempty sets and let f : X → Y be one-to-one. Show that if ρ is a metric on Y , then d(x, y) := ρ(f (x), f (y)) defines a metric on X. 7. Prove that a finite union of bounded sets in a metric space is bounded. 8. Prove 8.1.9. 9. ⇓1 Prove that a Cauchy sequence with a cluster point converges. 10.S Let E1 , . . . , Em be complete subspaces of (X, d). Prove that the finite union E1 ∪ · · · ∪ Em is complete. Does the analogous assertion hold for a countable union of complete subspaces? 11. Let X := [1, +∞) have the metric d(x, y) = |x−1 − y −1 | (see Exercise 6). Show that xn → x with respect to the usual metric on X iff xn → x with respect to d. Is (X, d) complete? 1 This

exercise will be used in 8.5.8.

236

A Course in Real Analysis

12. Same as Exercise 11 but with the metric ρ(x, y) = x(1 + x2 )−1 − y(1 + y 2 )−1 . 13.S Let (X, d) and (Y, ρ) be metric spaces and let (Z, η) = (X × Y, d × ρ) be the product space. Prove: (a) η is a metric on Z. (b) A sequence {(xn , yn )} is Cauchy in Z iff {xn } is Cauchy in X and {yn } is Cauchy in Y . (c) A sequence {(xn , yn )} converges to (x, y) in Z iff xn → x in X and yn → y in Y . (d) Z is complete iff X and Y are complete. 14. Metrics d, ρ on a set X are said to be metrically equivalent if there exist positive constants a and b such that d(x, y) ≤ a ρ(x, y) and ρ(x, y) ≤ b d(x, y) for all x, y ∈ X. For example, by Exercise 1.6.6, the metrics d1 , d2 , and d∞ are metrically equivalent. Suppose that d and ρ are metrically equivalent. Let {xn } be a sequence in X and let x ∈ X. Prove the following: (a) xn → x in (X, d) iff xn → x in (X, ρ). (b) {xn } is Cauchy in (X, d) iff {xn } is Cauchy in (X, ρ). (c) (X, d) is complete iff (X, ρ) is complete. 15.S Let d be a metric on a set X and a > 0. Define ρ(x, y) := min{d(x, y), a}. Prove: (a) ρ is a metric on X. (b) A sequence is Cauchy in (X, ρ) iff it is Cauchy in (X, d). (c) A sequence converges in (X, ρ) iff it converges in (X, d). (d) (X, ρ) is complete iff (X, d) is complete. Are d and ρ metrically equivalent? Does σ(x, y) := max{d(x, y), a} define a metric on X? 16. Let ρ1 and ρ2 be metrics on X. Prove that max{ρ1 , ρ2 } is a metric. Is min{ρ1 , ρ2 } a metric? 17. Let x := (x1 , . . . , xn ), xk := (x1,k , . . . , xn,k ) ∈ Rn , k = 1, 2, . . .. Prove: (a) xk → x in (Rn , d2 ) iff xj,k → xj for j = 1, . . . , n. (b) {xk } is Cauchy in (Rn , d2 ) iff {xj,k }∞ k=1 is Cauchy in R for each j = 1, . . . , n. (c) Rn is complete in each of the metrics d1 , d2 , d∞ . (Use Exercise 14.)


18.S Let $d$ be a metric on a set $X$ and define
$$\rho(x, y) := \frac{d(x, y)}{1 + d(x, y)}.$$
Verify that (a)–(d) of Exercise 15 hold. Are $d$ and $\rho$ metrically equivalent?

19. Let $\rho_1$ and $\rho_2$ be metrics on a set $X$ and let $\alpha, \beta > 0$. Define $\rho(x, y) := \alpha\rho_1(x, y) + \beta\rho_2(x, y)$. Prove:
(a) $\rho$ is a metric on $X$.
(b) A sequence $\{x_n\}$ converges to $x$ in $(X, \rho)$ iff it converges to $x$ in both $(X, \rho_1)$ and $(X, \rho_2)$.

20.S Let $\{d_k\}_{k=1}^{\infty}$ be a sequence of metrics on a set $X$. For $x, y \in X$ define
$$\rho_k(x, y) = \frac{d_k(x, y)}{1 + d_k(x, y)} \quad\text{and}\quad \rho(x, y) = \sum_{k=1}^{\infty} 2^{-k}\rho_k(x, y).$$
Prove:
(a) $\rho$ is a metric on $X$. (See Exercise 18.)
(b) $\rho(x_n, x) \to 0$ iff $d_k(x_n, x) \to 0$ for every $k$.

21. Let $C(\mathbb{R})$ denote the set of continuous, real-valued functions on $\mathbb{R}$. For $f, g \in C(\mathbb{R})$ define
$$\rho(f, g) := \sum_{k=1}^{\infty} 2^{-k}\rho_k(f, g),$$
where
$$d_k(f, g) = \sup_{-k \le x \le k} |f(x) - g(x)| \quad\text{and}\quad \rho_k(f, g) = \frac{d_k(f, g)}{1 + d_k(f, g)}.$$
Prove:
(a) $\rho$ is a metric on $C(\mathbb{R})$.
(b) $f_n \to f$ in this metric iff $f_n \to f$ uniformly on each bounded subset of $\mathbb{R}$.
(c) $C(\mathbb{R})$ is complete in this metric.

22. For $f \in C([a, b])$ define $\|f\|_1 = \int_a^b |f|$. Show that $\|\cdot\|_1$ is a norm on $C([a, b])$ and that $C([a, b])$ is not complete in the metric induced by this norm.

23.S Show that the sequence of functions $f_n(x, y) = (1 + x^n)^{1/n}(1 + y^n)^{-1/n}$ converges uniformly to $f(x, y) = x/y$ on $[1, b] \times [1, b]$ for any $b > 1$.
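The p-adic metric of Exercise 2 and the Hamming distance of Exercise 3 can be spot-checked numerically before attempting a proof. The following Python sketch reproduces the three values quoted in Exercise 2 and tests the triangle inequality on small finite samples. The function names `rho_p`, `hamming`, and `triangle_ok` are illustrative choices, not notation from the text, and a finite check of this kind is only a sanity test, not a proof.

```python
from fractions import Fraction
from itertools import product


def rho_p(p, m, n):
    """p-adic distance on the integers: 1/p**alpha, where alpha is the
    exponent of p in the prime factorization of |m - n| (0 if m == n)."""
    if m == n:
        return Fraction(0)
    diff, alpha = abs(m - n), 0
    while diff % p == 0:
        diff //= p
        alpha += 1
    return Fraction(1, p ** alpha)


def hamming(x, y):
    """Number of indices at which the equal-length tuples x and y differ."""
    if len(x) != len(y):
        raise ValueError("tuples must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)


def triangle_ok(dist, points):
    """Check d(x, z) <= d(x, y) + d(y, z) for all triples from a finite sample."""
    pts = list(points)
    return all(dist(x, z) <= dist(x, y) + dist(y, z)
               for x, y, z in product(pts, repeat=3))
```

For instance, `rho_p(2, 42, 2)` evaluates the example from Exercise 2, since $|42 - 2| = 40 = 2^3 \cdot 5$.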

8.2 Open and Closed Sets

Throughout this section, (X, d) denotes an arbitrary metric space.

It is frequently useful to formulate assertions regarding a metric space $X$ in terms of certain subsets of $X$ rather than the metric. The subsets of most interest in this regard are described in the next two definitions.

8.2.1 Definition. Let $x \in X$ and $r > 0$. The sets $B_r(x) := \{y \in X : d(x, y) < r\}$ and $C_r(x) := \{y \in X : d(x, y) \le r\}$ are called, respectively, the open and closed balls with center $x$ and radius $r$. The set $S_r(x) := C_r(x) \setminus B_r(x) = \{y \in X : d(x, y) = r\}$ is called the sphere with center $x$ and radius $r$. The ball $B_r(x)$ is also called a neighborhood of $x$. ♦

The open (closed) balls in $\mathbb{R}$ with the usual metric are simply the bounded open (closed) intervals. The spheres are the endpoints of these intervals. The open (closed) balls in Euclidean space $\mathbb{R}^2$ are open (closed) disks and the spheres are circles. The open and closed balls in a discrete metric space $X$ are the sets $X$ and $\{x\}$; the spheres are $X \setminus \{x\}$ and the empty set.

8.2.2 Definition. A subset $U$ of $X$ is said to be open if either $U = \emptyset$ or else $U$ has the following property: For each $x \in U$ there exists $\varepsilon > 0$ such that $B_\varepsilon(x) \subseteq U$. A subset of $X$ is closed if its complement is open. The collection of all open sets is called the (metric) topology of $(X, d)$. ♦

In any metric space, $X$ and $\emptyset$ are both open and closed. There are many metric spaces for which these are the only subsets that are both open and closed; Euclidean space $\mathbb{R}^n$ is an important example (see Section 8.7). The sets $\mathbb{Q}$ and $\mathbb{I}$ are neither open nor closed in $\mathbb{R}$ since every open ball (= open interval) contains members of both sets. A finite set $F$ is always closed. Indeed, if $x \in F^c$, then $B_r(x) \subseteq F^c$, where $r = \min\{d(x, y) : y \in F\}$, hence $F^c$ is open.

8.2.3 Proposition. An open ball is open, a closed ball is closed, and a sphere is closed.

Proof. Let $x \in B_r(x_0)$. We claim that $B_\varepsilon(x) \subseteq B_r(x_0)$, where $\varepsilon := r - d(x, x_0)$. Indeed, if $y \in B_\varepsilon(x)$ then $d(y, x_0) \le d(y, x) + d(x, x_0) < \varepsilon + d(x, x_0) = r$,

hence $y \in B_r(x_0)$ (Figure 8.1). Since $x$ was arbitrary, $B_r(x_0)$ is open. A similar argument shows that $C_r(x_0)^c$ and $S_r(x_0)^c$ are open, hence $C_r(x_0)$ and $S_r(x_0)$ are closed. (See Exercise 2.) That $S_r(x_0)$ is closed also follows from 8.2.6 below.

[Figure 8.1: An open ball is open.]

8.2.4 Theorem. Open sets in $(X, d)$ have the following properties:
(a) If $U_i$ is open for each $i$ in an index set $I$, then $\bigcup_{i \in I} U_i$ is open.
(b) If $V_1, \dots, V_n$ are open, then $V_1 \cap \dots \cap V_n$ is open.

Proof. (a) Let $U$ denote the union. If $x \in U$, then $x \in U_i$ for some $i$, hence there exists $r > 0$ such that $B_r(x) \subseteq U_i \subseteq U$. Therefore, $U$ is open.
(b) Let $V$ denote the intersection and let $x \in V$. For each $j = 1, \dots, n$ there exists $r_j > 0$ such that $B_{r_j}(x) \subseteq V_j$. Then $B_r(x) \subseteq V$, where $r = \min\{r_1, \dots, r_n\}$. Therefore, $V$ is open.

8.2.5 Corollary. A nonempty subset $U$ is open iff it is the union of open balls.

For example, in a discrete metric space, every subset is a union of open balls $\{x\} = B_1(x)$ and hence is open. It follows that every subset is also closed.

8.2.6 Corollary. Closed sets in $(X, d)$ have the following properties:
(a) If $C_i$ is closed for each $i$ in an index set $I$, then $\bigcap_{i \in I} C_i$ is closed.
(b) If $C_1, \dots, C_n$ are closed, then $C_1 \cup \dots \cup C_n$ is closed.

Proof. In (a), each $C_i^c$ is open, hence, using DeMorgan's law and 8.2.4,
$$\Big(\bigcap_{i \in I} C_i\Big)^c = \bigcup_{i \in I} C_i^c$$
is open, that is, $\bigcap_{i \in I} C_i$ is closed. Part (b) is proved in a similar manner, using DeMorgan's law for complements of finite unions.
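The radius $\varepsilon = r - d(x, x_0)$ used in the proof of 8.2.3 depends only on the triangle inequality, so the same choice works for every metric. The sketch below tests this numerically in $\mathbb{R}^2$ for the metrics $d_1$, $d_2$, and $d_\infty$. The function names, the sampling scheme, and the small floating-point tolerance are assumptions made for illustration; random sampling can only fail to find a counterexample, it does not prove the proposition.

```python
import math
import random


def d1(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def d2(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dinf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def inner_radius(d, x, x0, r):
    """Radius eps = r - d(x, x0) from the proof; positive iff x lies in B_r(x0)."""
    return r - d(x, x0)


def check_ball_is_open(d, x0, r, trials=2000, seed=0):
    """Sample x in B_r(x0) and y in B_eps(x); report whether every y lies in B_r(x0)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (x0[0] + rng.uniform(-r, r), x0[1] + rng.uniform(-r, r))
        if d(x, x0) >= r:
            continue  # x is outside the open ball; skip this sample
        eps = inner_radius(d, x, x0, r)
        y = (x[0] + rng.uniform(-eps, eps), x[1] + rng.uniform(-eps, eps))
        if d(y, x) >= eps:
            continue  # y is outside B_eps(x); skip this sample
        # 1e-12 absorbs floating-point rounding in the comparison
        if d(y, x0) >= r + 1e-12:
            return False
    return True
```

A call such as `check_ball_is_open(d1, (0.0, 0.0), 1.0)` exercises the argument for the metric $d_1$.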


8.2.7 Theorem. A subset $C$ of $X$ is closed iff $C$ contains the limit of each convergent sequence in $C$.

Proof. Assume that $C$ is closed and let $\{x_n\}$ be a sequence in $C$ with $x_n \to x$. If $x \notin C$, then, because $C^c$ is open, there exists $\varepsilon > 0$ such that $B_\varepsilon(x) \cap C = \emptyset$. But then $x_n$ is eventually in $B_\varepsilon(x) \subseteq C^c$, which is impossible. Therefore, $x \in C$.
Now suppose $C$ is not closed. Then $C^c$ is not open, hence there exists $x \in C^c$ such that $B_{1/n}(x) \not\subseteq C^c$, that is, $B_{1/n}(x) \cap C \ne \emptyset$, for every $n \in \mathbb{N}$. Choosing a point $x_n$ in this intersection, we then obtain a sequence $\{x_n\}$ in $C$ that converges to a point not in $C$.

8.2.8 Corollary. Let $(X, d)$ be a metric space and let $Y$ be a subspace of $X$.
(a) If $X$ is complete and $Y$ is closed, then $Y$ is complete.
(b) If $Y$ is complete, then $Y$ is closed.

Proof. (a) Let $\{y_n\}$ be a Cauchy sequence in $Y$. Since $X$ is complete, there exists $x \in X$ such that $y_n \to x$. Since $Y$ is closed, $x \in Y$. Therefore, $Y$ is complete.
(b) Let $\{y_n\}$ be a sequence in $Y$ such that $y_n \to x \in X$. Then $\{y_n\}$ is Cauchy and hence converges to some $y \in Y$. Since limits are unique, $x = y$. Therefore, $x \in Y$, hence $Y$ is closed.

8.2.9 Example. Let $C([a, b])$ denote the set of all continuous real-valued functions on the interval $[a, b]$. Each such function is bounded, hence $C([a, b])$ is a vector subspace of $B([a, b])$ (8.1.10). Since the uniform limit of continuous functions is continuous (7.2.2), $C([a, b])$ is closed in the uniform metric. Since $B([a, b])$ is complete, 8.2.8(a) shows that $C([a, b])$ is complete. ♦

8.2.10 Example. The subspace $D([a, b])$ of $C([a, b])$ consisting of all differentiable functions is not complete in the uniform metric. To see this take $[a, b] = [0, 1]$ and define a sequence of continuous functions $g_n(x)$, $n \ge 2$, on $[0, 1]$ such that $g_n = 1$ on $[0, 1/2]$, $g_n = 0$ on $[1/2 + 1/n, 1]$, and $g_n$ is linear on $[1/2, 1/2 + 1/n]$. Also, define $g(t)$ on $[0, 1]$ by $g = 1$ on $[0, 1/2]$ and $g = 0$ on $(1/2, 1]$. (See Figure 8.2.)

[Figure 8.2: The functions $g_n$ and $g$.]

Now set
$$f_n(x) = \int_0^x g_n(t)\,dt \quad\text{and}\quad f(x) = \int_0^x g(t)\,dt, \qquad x \in [0, 1].$$
Then $f_n \in D([0, 1])$, $f \in C([0, 1])$, and
$$|f_n(x) - f(x)| \le \int_0^1 |g_n - g| = \int_{1/2}^{1/2 + 1/n} g_n = \frac{1}{2n}.$$
Therefore, $f_n \to f$ uniformly on $[0, 1]$. Since $f$ is not differentiable at $1/2$, $D([0, 1])$ is not closed in $C([0, 1])$ and hence, by 8.2.8(b), is not complete. ♦

8.2.11 Definition. Let $Y$ be a subset of $X$. A subset $A \subseteq Y$ is said to be relatively open (relatively closed) in $Y$ if $A$ is open (closed) in the subspace $(Y, d)$ of $(X, d)$. ♦

8.2.12 Theorem. Let $A \subseteq Y \subseteq X$. Then $A$ is relatively open (relatively closed) in $Y$ iff $A = Y \cap B$ for some open (closed) subset $B$ of $X$.

Proof. By definition, a nonempty open set $A$ in the subspace $Y$ is a union of open balls in $Y$. The latter are of the form $Y \cap B_r(y)$, where $y \in Y$ and $B_r(y)$ is an open ball of $X$. Therefore, $A = Y \cap B$, where $B$ is the corresponding union of the open balls $B_r(y)$.
From the first paragraph, the closed sets of $Y$ are of the form $Y \setminus A = Y \cap B^c$, where $B$ is open in $X$. Since $B^c$ is closed in $X$, the assertion regarding closed sets follows.

8.2.13 Definition. Let $X$ be a vector space and let $a, b \in X$. The line segment from $a$ to $b$ is defined by
$$[a : b] = \{(1 - t)a + tb : 0 \le t \le 1\}.$$
A subset $E$ of $X$ is said to be convex if $a, b \in E$ implies $[a : b] \subseteq E$. ♦

[Figure 8.3: Convex and non-convex sets.]

Recall that, by definition, the convex subsets of $\mathbb{R}$ are the intervals. The reader may easily check that if $D \subseteq \mathbb{R}^p$ and $E \subseteq \mathbb{R}^q$ are convex, then $D \times E$, as a subset of $\mathbb{R}^{p+q}$, is convex. In particular, Cartesian products $I_1 \times \dots \times I_n$ of intervals $I_j$ are convex in $\mathbb{R}^n$. Other examples are given in Exercise 5.
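Returning to Example 8.2.10, the functions $f_n$ and $f$ have simple closed forms, so the bound $1/(2n)$ on the uniform distance can be checked directly. The Python sketch below is a hedged illustration: the closed forms are obtained by integrating the piecewise linear $g_n$ by hand, the names `f_n`, `f_limit`, and `sup_distance` are assumptions, and the supremum is only approximated on a finite grid.

```python
def f_n(n, x):
    """Integral of g_n over [0, x], with g_n as in Example 8.2.10 (n >= 2)."""
    if x <= 0.5:
        return x
    if x <= 0.5 + 1.0 / n:
        s = x - 0.5
        # On [1/2, 1/2 + 1/n], g_n(1/2 + u) = 1 - n*u; integrate over [0, s].
        return 0.5 + s - n * s * s / 2.0
    return 0.5 + 1.0 / (2 * n)


def f_limit(x):
    """Integral of g over [0, x]: equals x up to 1/2 and is constant afterwards."""
    return min(x, 0.5)


def sup_distance(n, grid=10001):
    """Grid approximation of sup |f_n - f| on [0, 1]."""
    step = 1.0 / (grid - 1)
    return max(abs(f_n(n, i * step) - f_limit(i * step)) for i in range(grid))
```

The one-sided difference quotients of `f_limit` at $1/2$ are $1$ from the left and $0$ from the right, which is why the uniform limit leaves $D([0, 1])$.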


Exercises

1.S Sketch $B_1(0, 0) \subseteq \mathbb{R}^2$ for the metrics $d_1$ and $d_\infty$ derived from the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$.

2. Prove that a closed ball is closed.

3.S Let $x, y$ be distinct points in a metric space $(X, d)$. Find the largest number $r$ such that $B_r(x) \cap B_r(y) = \emptyset$.

4. Show that every open subset $U$ of $\mathbb{R}^n$ is a countable union of open balls as well as a countable union of bounded open $n$-dimensional intervals $(a_1, b_1) \times \dots \times (a_n, b_n)$.

5.S Prove that open and closed balls in a normed vector space are convex. Are spheres convex?

6. Show by example that arbitrary intersections of open sets may not be open and that arbitrary unions of closed sets may not be closed.

7. Metrics $d$ and $\rho$ on a set $X$ are said to be topologically equivalent if they have the property that a sequence $\{x_n\}$ converges to $x$ in $(X, d)$ iff it converges to $x$ in $(X, \rho)$.
(a) Prove that metrically equivalent metrics are topologically equivalent. (See Exercise 8.1.14.)
(b) Prove that $d$ and $\rho$ are topologically equivalent iff $(X, d)$ and $(X, \rho)$ have the same topologies, that is, the metrics produce the same open sets.
(c) Are topologically equivalent metrics necessarily metrically equivalent?

8.S Prove that the metric $\rho(x, y) = |e^x - e^y|$ on $\mathbb{R}$ is topologically equivalent to the usual metric. Is $\mathbb{R}$ complete in this metric? Is $\rho$ metrically equivalent to the usual metric on $\mathbb{R}$?

9. Let $Y$ be a subspace of $(X, d)$ with the property that for some $r > 0$, $d(x, y) \ge r$ for all $x, y \in Y$ with $x \ne y$. Prove that $Y$ is complete, hence closed. Conclude that finite metric spaces, discrete metric spaces, and the subspaces $\mathbb{N}$ and $\mathbb{Z}$ of $\mathbb{R}$ are complete.

10. Let $x_n \to x_0$ in $(X, d)$. Prove that the set $C := \{x_0, x_1, x_2, \dots\}$ is closed in $X$.

11. Let $Y$ be open (closed) in $(X, d)$. Prove that a subset $U$ of $Y$ is relatively open (relatively closed) in $Y$ iff it is open (closed) in $X$.

12.S Prove that the set $C := \{f \in C([0, 1]) : f(x) = f(1 - x) \text{ for all } x \in [0, 1]\}$ is closed in the supremum metric (8.1.10) but not in the metric of Exercise 8.1.22.

13. Prove that the subspaces
$$V := \Big\{f \in B([0, +\infty)) : \lim_{x \to +\infty} f(x) \text{ exists in } \mathbb{R}\Big\} \quad\text{and}\quad W := \Big\{f \in V : \lim_{x \to +\infty} f(x) = 0\Big\}$$
are closed in the supremum metric.

8.3 Closure, Interior, and Boundary

Throughout this section, (X, d) denotes an arbitrary metric space.

8.3.1 Definition. Let $E \subseteq X$.
• The closure $\mathrm{cl}(E) = \mathrm{cl}_X(E)$ of $E$ in $X$ is the intersection of all closed subsets of $X$ containing $E$.
• The interior $\mathrm{int}(E) = \mathrm{int}_X(E)$ of $E$ is the union of all open subsets of $X$ contained in $E$.
• The boundary $\mathrm{bd}(E) = \mathrm{bd}_X(E)$ of $E$ is the set $\mathrm{cl}(E) \setminus \mathrm{int}(E)$. ♦

8.3.2 Examples. (a) Since every nonempty open set of R (with the usual metric) contains rational and irrational points, int(Q) = int(I) = ∅ and cl(Q) = cl(I) = R, hence bd(Q) = bd(I) = R. For bounded intervals we have cl((a, b)) = [a, b], int([a, b]) = (a, b), and bd((a, b)) = bd([a, b]) = {a, b}. (b) In a discrete metric space a subset E is both open and closed, hence cl(E) = int(E) = E and bd(E) = ∅. ♦ By 8.2.4 and 8.2.6, int(E) is open and cl(E) is closed, hence bd(E) is closed. The following proposition asserts that int(E) is the largest open set contained in E and cl(E) is the smallest closed set containing E. 8.3.3 Proposition. If U is open, C is closed, and U ⊆ E ⊆ C, then U ⊆ int(E) ⊆ E ⊆ cl(E) ⊆ C. Proof. Simply note that U is one of the open sets in the definition of int(E) and that C is one of the closed sets in the definition of cl(E).


8.3.4 Corollary. Let $E \subseteq X$.
(a) $E$ is open in $X$ iff $\mathrm{int}(E) = E$.
(b) $\mathrm{int}(\mathrm{int}(E)) = \mathrm{int}(E)$.
(c) $E$ is closed in $X$ iff $\mathrm{cl}(E) = E$.
(d) $\mathrm{cl}(\mathrm{cl}(E)) = \mathrm{cl}(E)$.

Proof. If $E$ is open, take $U = E$ in the proposition. If $E$ is closed, take $C = E$. This proves (a) and (c). Parts (b) and (d) follow from these.

8.3.5 Proposition. For any subset $E$ of $X$,
(a) $\mathrm{cl}(E^c) = (\mathrm{int}(E))^c$,
(b) $\mathrm{int}(E^c) = (\mathrm{cl}(E))^c$,
(c) $\mathrm{bd}(E) = \mathrm{cl}(E) \cap \mathrm{cl}(E^c) = \mathrm{bd}(E^c)$.

Proof. For (a) we have
$$(\mathrm{int}(E))^c = \Big(\bigcup_{\substack{U \subseteq E \\ U \text{ open}}} U\Big)^c = \bigcap_{\substack{C \supseteq E^c \\ C \text{ closed}}} C = \mathrm{cl}(E^c).$$
Parts (b) and (c) follow from (a).

8.3.6 Proposition. Let $E \subseteq X$. Then $x \in \mathrm{cl}(E)$ iff there exists a sequence $\{a_n\}$ in $E$ such that $a_n \to x$.

Proof. Let $C$ be the set of all limits of convergent sequences in $E$, including constant sequences, so $E \subseteq C$. We show that $C = \mathrm{cl}(E)$, which will establish the proposition.
First, $C$ is closed. If not, then $C^c$ is not open, hence there exists $y \in C^c$ and for each $n$ a point $y_n \in B_{1/n}(y) \cap C$. By definition of $C$, each $y_n$ is the limit of a sequence in $E$, hence there exists $a_n \in E$ such that $d(y_n, a_n) < 1/n$. By the triangle inequality, $d(a_n, y) < 2/n$, hence $a_n \to y$. But then $y \in C$, a contradiction. Therefore $C$ must be closed. It follows that $\mathrm{cl}(E) \subseteq C$. Since $\mathrm{cl}(E)$ contains the limit of all convergent sequences in $E$ (8.2.7), $C \subseteq \mathrm{cl}(E)$. Therefore, $C = \mathrm{cl}(E)$.

8.3.7 Example. (Topologist's sine curve). Let $A = \{(x, \sin(1/x)) : 0 < x < 2/\pi\}$ and $B = \{0\} \times [-1, 1]$. We show that $\mathrm{cl}(A) = A \cup B$. For the inclusion $A \cup B \subseteq \mathrm{cl}(A)$, note first that
$$\Big\{\sin\frac{1}{x} : \frac{2}{(4n + 3)\pi} \le x \le \frac{2}{(4n + 1)\pi}\Big\} = [-1, 1], \qquad n \in \mathbb{Z}^+.$$
It follows from the intermediate value theorem that for each $y \in [-1, 1]$ and $n \in \mathbb{N}$ there exists $x_n \in \mathbb{R}$ such that
$$0 < x_n \le \frac{2}{(4n + 1)\pi} \quad\text{and}\quad \sin(1/x_n) = y.$$

Since $(x_n, y) \in A$ and $(x_n, y) \to (0, y)$, $(0, y) \in \mathrm{cl}(A)$. Therefore, $B \subseteq \mathrm{cl}(A)$, hence $A \cup B \subseteq \mathrm{cl}(A)$. The reverse inclusion will follow if we show that $A \cup B$ is closed. For this we use 8.2.7. Let $\{(x_n, y_n)\}$ be a sequence in $A \cup B$ with $(x_n, y_n) \to (x, y)$.
Case 1. There exists a subsequence $\{(x_{n_k}, y_{n_k})\}$ that lies in $B$. Then, since $B$ is closed, $(x, y) \in B$.
Case 2. $\{(x_n, y_n)\}$ eventually lies in $A$, so $y_n = \sin(1/x_n)$ for all sufficiently large $n$. Since $\lim_{t \to 0} \sin(1/t)$ does not exist, $x$ cannot be zero, hence $y = \sin(1/x)$, that is, $(x, y) \in A$.
In each case $(x, y) \in A \cup B$, hence $A \cup B$ is closed. ♦

8.3.8 Definition. A subset $E$ of $X$ is said to be dense in $X$ if $\mathrm{cl}(E) = X$. Equivalently, every $x \in X$ is the limit of a sequence in $E$. ♦

By 8.3.2, $\mathbb{Q}$ and $\mathbb{I}$ are dense in $\mathbb{R}$. The set of all points in $\mathbb{R}^2$ with rational coordinates is dense in $\mathbb{R}^2$. A discrete space has no proper dense subsets. In Section 8.8 we show that the set of polynomials on $[a, b]$ is dense in $C([a, b])$ in the uniform norm.

8.3.9 Example. (Dirichlet). If $\xi$ is irrational, then the set $E := \{n\xi + m : m \in \mathbb{Z}, n \in \mathbb{N}\}$ is dense in $\mathbb{R}$. To verify this we show that for any $x \in \mathbb{R}$ and $k \in \mathbb{N}$ there exists $z \in E$ such that $|z - x| < 1/k$. To this end, let $y_j = j\xi - \lfloor j\xi \rfloor$, $j = 1, \dots, k + 1$. Because $\xi$ is irrational, $0 < y_j < 1$, hence $y_j$ must be in one of the intervals $(0, 1/k), (1/k, 2/k), \dots, ((k - 1)/k, 1)$. Since there are only $k$ intervals, one of these must contain $y_i$ and $y_j$ for some $i \ne j$. (This is an instance of the so-called pigeonhole principle.) By the irrationality of $\xi$, $y_j \ne y_i$. Hence one of the quantities $\pm(y_j - y_i)$, call it $y$, is in $E$ and $|y| < 1/k$. We consider two cases.
If $y > 0$, choose $m \in \mathbb{Z}$ such that $x + m > 0$ and let $n$ be the smallest integer such that $ny > x + m$. Then $n \in \mathbb{N}$ and $(n - 1)y \le x + m$, hence $z := ny - m \in E$ and
$$0 < z - x = ny - m - x \le y < 1/k.$$
On the other hand, if $y < 0$, choose $m \in \mathbb{Z}$ such that $x + m < 0$ and let $n$ be the smallest integer such that $n(-y) > -(x + m)$, that is, $ny < x + m$. Again, $z := ny - m \in E$, and in this case, since $(n - 1)y \ge x + m$,
$$-1/k < y \le ny - m - x = z - x < 0.$$
In either case, $|z - x| < 1/k$, as required. ♦
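The density claim of Example 8.3.9 can be explored numerically. The sketch below does not follow the pigeonhole construction of the proof; it is a plain brute-force search over $n$, pairing each $n$ with the integer $m$ that brings $n\xi + m$ closest to $x$. The function name `dirichlet_approx` and the cutoff `max_n` are assumptions, and since floating-point numbers are rational, the computation only illustrates the statement for an irrational $\xi$ such as $\sqrt{2}$.

```python
import math


def dirichlet_approx(xi, x, k, max_n=100000):
    """Search for n in N and m in Z with |n*xi + m - x| < 1/k.

    Returns the first pair (n, m) found, or None if no n <= max_n works.
    """
    for n in range(1, max_n + 1):
        m = round(x - n * xi)  # integer shift that minimizes |n*xi + m - x|
        if abs(n * xi + m - x) < 1.0 / k:
            return n, m
    return None
```

For example, `dirichlet_approx(math.sqrt(2), 0.3, 1000)` looks for a member of $E$ within $1/1000$ of $0.3$.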


8.3.10 Example. We show that the set $S := \{\sin n : n \in \mathbb{N}\}$ is dense in the interval $[-1, 1]$. Let $x \in \mathbb{R}$ and take $\xi = 1/(2\pi)$ in the preceding example. Then $n_k/(2\pi) + m_k \to x$ for some integer sequences $\{n_k\}$ and $\{m_k\}$ with $n_k > 0$, hence
$$\sin n_k = \sin\big(2\pi(n_k/(2\pi) + m_k)\big) \to \sin(2\pi x).$$
Since $x$ was arbitrary, every member of $[-1, 1]$ is the limit of a sequence in $S$. A similar argument shows that $\{\cos n : n \in \mathbb{N}\}$ is dense in $[-1, 1]$. ♦

8.3.11 Definition. A metric space is said to be separable if it has a countable dense subset. ♦

For example, $\mathbb{R}^n$ is separable (consider all points with rational coordinates). An uncountable discrete space is not separable. The space $C([a, b])$ is separable in the supremum norm (Exercise 19).
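The density of $\{\sin n : n \in \mathbb{N}\}$ in $[-1, 1]$ from Example 8.3.10 is easy to observe experimentally. The following Python sketch searches for the first positive integer $n$ whose sine lies within a given tolerance of a target value. The function name `nearest_sine`, the tolerance, and the search cutoff are illustrative assumptions; finding such an $n$ for a handful of targets is evidence for, not a proof of, density.

```python
import math


def nearest_sine(target, tol=1e-2, max_n=100000):
    """Return the first n >= 1 with |sin(n) - target| < tol, or None if none is found."""
    for n in range(1, max_n + 1):
        if abs(math.sin(n) - target) < tol:
            return n
    return None
```

Shrinking `tol` generally pushes the first admissible $n$ further out, which reflects the fact that good approximations require larger integers.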

Exercises

1. Let $(X, d)$ be a metric space and $A, B \subseteq X$. Prove the following:
(a)S $\mathrm{cl}(A \cup B) = \mathrm{cl}(A) \cup \mathrm{cl}(B)$.
(b) $\mathrm{cl}(A \cap B) \subseteq \mathrm{cl}(A) \cap \mathrm{cl}(B)$.
(c) $\mathrm{int}(A \cap B) = \mathrm{int}(A) \cap \mathrm{int}(B)$.
(d)S $\mathrm{int}(A \cup B) \supseteq \mathrm{int}(A) \cup \mathrm{int}(B)$.
(e) $\mathrm{bd}(A \cup B) \subseteq \mathrm{bd}(A) \cup \mathrm{bd}(B)$.
(f)S $\mathrm{bd}(\mathrm{cl}(A)) \subseteq \mathrm{bd}(A)$.
(g) $\mathrm{bd}(\mathrm{int}(A)) \subseteq \mathrm{bd}(A)$.
(h) $\mathrm{cl}(A) = A \cup \mathrm{bd}(A)$.
Show by examples that the inclusions may be strict.

2. Prove:
$$\mathrm{bd}(A \cap B) \subseteq \big[A \cap \mathrm{bd}(B)\big] \cup \big[B \cap \mathrm{bd}(A)\big] \cup \big[\mathrm{bd}(A) \cap \mathrm{bd}(B)\big].$$
Show that the inclusion may be strict.

3. Find $\mathrm{cl}(A) \setminus A$ for $A =$
(a)S $\{(1/n, 1/m) : m, n \in \mathbb{N}\}$.
(b)S $\{(\cos t, \sin t, e^{-t}) : t > 0\}$.
(c) $\big\{\big(\cos t, \sin t, \tfrac{t}{1 + |t|}\big) : t \in \mathbb{R}\big\}$.
(d) $\{(t\cos t, t\sin t, t) : t > 0\}$.
(e)S $\big\{\big(\tfrac{\cos t}{1 + t}, \tfrac{\sin t}{1 + t}\big) : t > 0\big\}$.
(f) $\big\{\big(\tfrac{t\cos t}{1 + t}, \tfrac{t\sin t}{1 + t}\big) : t > 0\big\}$.

4. An induction argument shows that parts (a) and (c) of Exercise 1 hold for any finite number of sets. Show, by example, that the analogous statements for infinitely many sets are false.

5. Prove that if $\mathrm{cl}(A) \cap \mathrm{cl}(B) = \emptyset$, then $\mathrm{int}(A \cup B) = \mathrm{int}(A) \cup \mathrm{int}(B)$.

6. Let $Y$ be a subspace of $(X, d)$ and $A \subseteq Y$. Prove that
(a)S $\mathrm{cl}_Y(A) = \mathrm{cl}_X(A) \cap Y$.
(b) $\mathrm{int}_X(A) \cap Y \subseteq \mathrm{int}_Y(A)$.
(c) $\mathrm{bd}_Y(A) \subseteq \mathrm{bd}_X(A)$.
Show by examples that the inclusions in (b) and (c) may be strict.

7. Let $x_n \to x_0$ in $X$. Show that $\mathrm{cl}\{x_1, x_2, \dots\} = \{x_0, x_1, x_2, \dots\}$.

8.S Let $f_n(x) = x^n$, $0 \le x \le 1$. Show that the set $\{f_1, f_2, \dots\}$ is closed in $(C([0, 1]), \|\cdot\|_\infty)$. Is it closed in $(C([0, 1]), \|\cdot\|_1)$?

9. Let $B = B_r(x_0)$ and $C = C_r(x_0)$. Prove that
(a)S $B \subseteq \mathrm{int}(C)$.
(b) $\mathrm{cl}(B) \subseteq C$.
(c) $\mathrm{bd}(B) \subseteq C \setminus B$.
Show, by example, that the inclusions may be strict.

10. Prove that in a normed vector space the inclusions in Exercise 9 are equalities.

11. Prove that the set $E = \{(x, y) : x, y \in \mathbb{Q} \text{ and } x \ne y\}$ is neither open nor closed and is dense in Euclidean space $\mathbb{R}^2$.

12. Let $x \in \mathbb{R}$, $r \in \mathbb{Q}$, $r \ne 0$. In each case, find the largest interval in which the given set is dense.
(a) $\{\sin(rn) : n \in \mathbb{N}\}$.
(b)S $\{\sin(x + n) : n \in \mathbb{N}\}$.
(c) $\{\sin n \cos n : n \in \mathbb{N}\}$.
(d) $\{\tan^2 n : n \in \mathbb{N}\}$.

13. Show that $\lim_n \sin(\pi n x)$ does not exist for any irrational number $x$. Conclude that $\lim_n \sin(nr)$ does not exist for any nonzero rational number $r$.

14. (a) Let $E$ be dense in $X$ and let $F$ be a proper finite subset of $E$. Show that $E \setminus F$ is dense in $X \setminus F$. Is $E \setminus F$ necessarily dense in $X$?
(b) Let $X$ be a normed vector space with $\{a_1, a_2, \dots\}$ dense in $X$. Show that $\{a_n : n \ge N\}$ is dense in $X$ for every $N \in \mathbb{N}$. Conclude that $\{\sin n : n \ge N\}$ is dense in $[-1, 1]$.

15. Show that $\liminf_n \sin n = -1$ and $\limsup_n \sin n = 1$.

16.S Let $Y$ be dense in $X$ and $U \subseteq X$ open. Show that $U \cap Y$ is dense in $U$. What if $U$ is not open?

17. Let $X = \mathbb{R}^n$ with the Euclidean metric and let $Y \subseteq X$ have the property of Exercise 8.2.9. Prove that $Y^c$ is open and dense in $X$. Conclude that $\mathbb{N}^c$ and $\mathbb{Z}^c$ are open and dense in $\mathbb{R}$.

18. Show that in a separable space, every nonempty open set $U$ is a countable union of open balls.

19. Use the Weierstrass approximation theorem (8.8.5, below) to show that $(C([a, b]), \|\cdot\|_\infty)$ is separable.

20. (a)S Let $\{I_i : i \in I\}$ be a family of open intervals in $\mathbb{R}$ with the property that each pair has a nonempty intersection. Show that $\bigcup_{i \in I} I_i$ is an open interval.
(b) Prove that every nonempty open set in $\mathbb{R}$ is a countable union of disjoint open intervals.

8.4 Limits and Continuity

In this section, $(X, d)$, $(Y, \rho)$, and $(Z, \mu)$ denote arbitrary metric spaces.

8.4.1 Definition. Let $E \subseteq X$. A member $a \in X$ is said to be an accumulation point of $E$ if $E \cap B_r(a) \setminus \{a\} \ne \emptyset$ for each $r > 0$. A member of $E$ that is not an accumulation point is called an isolated point of $E$. ♦

It follows from the definition that $a$ is an accumulation point of $E$ iff there exists a sequence of distinct points of $E$ converging to $a$. No subset of a discrete metric space has an accumulation point. The set of functions $x \mapsto x^n$ in $C([0, 1])$, $n \in \mathbb{N}$, has no accumulation points in the uniform norm but the identically zero function is an accumulation point in the norm $\|\cdot\|_1$.

8.4.2 Definition. Let $E \subseteq X$, $f : E \to Y$, and let $a \in X$ be either a member of $E$ or an accumulation point of $E$. If $b \in Y$, we write $b = \lim_{x \to a,\, x \in E} f(x)$ if for each $\varepsilon > 0$ there exists $\delta > 0$ such that
$$x \in E \text{ and } d(x, a) < \delta \text{ implies } \rho(f(x), b) < \varepsilon. \tag{8.3}$$
In the special case $E = X \setminus \{a\}$, we write simply $b = \lim_{x \to a} f(x)$. ♦

Note that condition (8.3) may be written $f(E \cap B_\delta(a)) \subseteq B_\varepsilon(b)$. This observation will be useful later in proving a global characterization of continuity.

Many of the results in Chapter 3 on limits of functions on subsets of $\mathbb{R}$ hold for real-valued functions defined on a metric space. These include the theorems on limits of sums, products, and quotients of functions, the comparison theorem, the squeeze principle, and the sequential characterization of limit. The statements and proofs are essentially the same: simply replace $|x - y|$ by the metric $d(x, y)$. For future reference, we explicitly state:

8.4.3 Sequential Characterization of Limit. Let $a$ be an accumulation point of $E \subseteq X$ and let $f : E \to Y$. Then $\lim_{x \to a,\, x \in E} f(x)$ exists and equals $b \in Y$ iff $f(a_n) \to b$ for all sequences $\{a_n\}$ in $E$ with $a_n \to a$.

The following theorem gives sufficient conditions for a double limit to equal an iterated limit.

8.4.4 Iterated Limit Theorem. Let $X \times Y$ have the product metric $\eta := d \times \rho$, and let $a$ and $b$ be accumulation points of $X \setminus \{a\}$ and $Y \setminus \{b\}$, respectively. If $f : X \times Y \setminus \{(a, b)\} \to Z$ has the properties
(a) $g(x) := \lim_{y \to b} f(x, y)$ exists in $Z$ for each $x \in X$, and
(b) $z := \lim_{(x, y) \to (a, b)} f(x, y)$ exists in $Z$,
then $\lim_{x \to a} g(x)$ exists and equals $z$.

Proof. Given $\varepsilon > 0$, by (b) choose $\delta > 0$ such that $\mu(f(x, y), z) < \varepsilon$ for all $(x, y) \in X \times Y$ with $0 < \eta((x, y), (a, b)) < \delta$. Let $0 < d(x, a) < \delta$. Then, for all $y$ sufficiently near $b$, $\eta((x, y), (a, b)) < \delta$, hence
$$\mu(g(x), z) \le \mu(g(x), f(x, y)) + \mu(f(x, y), z) < \mu(g(x), f(x, y)) + \varepsilon.$$
Letting $y \to b$ in this inequality, noting that $f(x, y) \to g(x)$, we obtain $\mu(g(x), z) \le \varepsilon$. This shows that $\lim_{x \to a} g(x) = z$.

The theorem implies that
$$\lim_{(x, y) \to (a, b)} f(x, y) = \lim_{x \to a}\lim_{y \to b} f(x, y) = \lim_{y \to b}\lim_{x \to a} f(x, y)$$
provided the limit on the left exists and the inner limits on the right exist for each $x$ and $y$, respectively. The limits on the right are called iterated limits and the limit on the left is sometimes called a double limit. In particular, if the iterated limits exist and are unequal, then the double limit cannot exist.

In many cases, the iterated limit theorem (suitably modified) still holds if $f$ is defined on subsets $E$ of $X \times Y$ more general than $X \times Y \setminus \{(a, b)\}$. This is the case in Examples (c) and (d) that follow.

8.4.5 Examples. In (a)–(e), $X = Y = Z = \mathbb{R}$. Note that in this case the product metric $\eta$ is equivalent to the Euclidean metric on $\mathbb{R}^2$.

(a) Let $E = (0, +\infty) \times (0, +\infty)$. To calculate the limit
$$\lim_{\substack{(x, y) \to (0, 0) \\ (x, y) \in E}} \frac{\sin(x + 2y)}{2x + y}$$
we write the function as
$$\frac{\sin(x + 2y)}{x + 2y} \cdot \frac{x + 2y}{2x + y}.$$
As $(x, y) \to (0, 0)$ along $E$, the first factor tends to 1 but the second factor has no limit. Indeed, along a path $y = mx$, $m > 0$, $x > 0$,
$$\frac{x + 2y}{2x + y} = \frac{x + 2mx}{2x + mx} = \frac{1 + 2m}{2 + m}.$$

Therefore, the double limit does not exist. The iterated limits exist and are unequal:
$$\lim_{y \to 0^+}\lim_{x \to 0^+} f(x, y) = \lim_{y \to 0^+} \frac{\sin(2y)}{y} = 2, \qquad \lim_{x \to 0^+}\lim_{y \to 0^+} f(x, y) = \lim_{x \to 0^+} \frac{\sin x}{2x} = \frac{1}{2}.$$

(b) Let $E$ be as in (a) and let $p, q > 0$. The limit
$$L := \lim_{\substack{(x, y) \to (0, 0) \\ (x, y) \in E}} \frac{x^p + y^q}{x^2 + y^2}$$
exists iff $p, q > 2$ or $p = q = 2$. In the former case, $L = 0$ and in the latter, $L = 1$. This is best seen by converting to polar coordinates $x = r\cos\theta$, $y = r\sin\theta$, $0 < \theta < \pi/2$:
$$L = \lim_{r \to 0^+} \big(r^{p-2}\cos^p\theta + r^{q-2}\sin^q\theta\big).$$
Both iterated limits exist iff $p, q \ge 2$.

(c) Let $E = \{(x, y) : x > 0, y > 0, x \ne y\}$. Then
$$\lim_{\substack{(x, y) \to (0, 0) \\ (x, y) \in E}} \frac{x^p - y^p}{x - y}$$
exists iff $p \ge 1$ and has zero limit if $p > 1$. Indeed, if $0 < x < y$, then, by the mean value theorem, there exists $t \in (x, y)$ such that $x^p - y^p = pt^{p-1}(x - y)$, hence
$$px^{p-1} < \frac{x^p - y^p}{x - y} < py^{p-1},$$
and the assertion follows from the squeeze principle. Clearly, the iterated limits exist (hence equal the double limit) iff $p \ge 1$.

(d) Let $E$ be as in (c). Then
$$\lim_{\substack{(x, y) \to (0, 0) \\ (x, y) \in E}} \frac{x^p + y^p}{y - x}$$
does not exist for any value of $p$. Indeed, along the path $y = mx$, $m, x > 0$, $m \ne 1$, the function has values
$$\frac{x^p + (mx)^p}{mx - x} = \frac{x^{p-1}(1 + m^p)}{m - 1},$$
so the limit cannot exist if $p \le 1$. Let $p > 1$ and set $\theta_r = mr^{p-1} + \pi/4$. Along the path given by
$$x = r\cos\theta_r = \frac{r}{\sqrt{2}}\big(\cos mr^{p-1} - \sin mr^{p-1}\big), \qquad y = r\sin\theta_r = \frac{r}{\sqrt{2}}\big(\cos mr^{p-1} + \sin mr^{p-1}\big),$$
where $r \downarrow 0$, the function has values
$$\frac{x^p + y^p}{y - x} = \frac{1}{\sqrt{2}\,m} \cdot \frac{mr^{p-1}}{\sin(mr^{p-1})}\big(\cos^p\theta_r + \sin^p\theta_r\big),$$
which tends to $2^{(1-p)/2}/m$ as $r \to 0$. Neither of the iterated limits exists if $p < 1$. If $p > 1$, then clearly
$$\lim_{y \to 0}\lim_{x \to 0} \frac{x^p + y^p}{y - x} = \lim_{x \to 0}\lim_{y \to 0} \frac{x^p + y^p}{y - x} = 0,$$
and if $p = 1$, then
$$\lim_{y \to 0}\lim_{x \to 0} \frac{x^p + y^p}{y - x} = 1, \quad\text{while}\quad \lim_{x \to 0}\lim_{y \to 0} \frac{x^p + y^p}{y - x} = -1.$$

(e) Let $E = \{(x, y) : x > 0, y > 0\}$. Then
$$\lim_{\substack{(x, y) \to (0, 0) \\ (x, y) \in E}} \frac{x^p y}{x + y}$$
exists iff $p > 0$, in which case the limit is zero. Indeed, along the path $y = mx$ the function has values $mx^p/(1 + m)$, so the limit cannot exist if $p \le 0$. If $p > 0$, one can introduce polar coordinates as in (b). Both iterated limits exist iff $p \ge 0$, but are unequal if $p = 0$. ♦

8.4.6 Definition. A function $f : X \to Y$ is said to be continuous at a point $a \in X$ if $\lim_{x \to a} f(x) = f(a)$. Also, $f$ is said to be continuous on a set $E \subseteq X$ if $f$ is continuous at each point of $E$. If $E = X$, then $f$ is simply said to be continuous. If $f$ is continuous, one-to-one, and onto $Y$ and if $f^{-1} : Y \to X$ is continuous, then $f$ is called a homeomorphism. ♦

From the sequential characterization of limit we have

8.4.7 Sequential Characterization of Continuity. Let $f : X \to Y$ and $a \in X$. Then $f$ is continuous at $a$ iff $f(a_n) \to f(a)$ for all sequences $\{a_n\}$ in $X$ with $a_n \to a$.

The next theorem gives an important global characterization of continuity.

8.4.8 Theorem. Let $f : X \to Y$. The following statements are equivalent:
(a) $f$ is continuous.
(b) $f^{-1}(V)$ is open in $X$ for each open subset $V$ of $Y$.
(c) $f^{-1}(C)$ is closed in $X$ for each closed subset $C$ of $Y$.

Proof. That (b) and (c) are equivalent follows from the general set-theoretic identity $f^{-1}(B^c) = (f^{-1}(B))^c$.
(a) ⇒ (b): Let $V \subseteq Y$ be open. If $x \in f^{-1}(V)$, then $f(x) \in V$ so there exists $\varepsilon > 0$ such that $B_\varepsilon(f(x)) \subseteq V$. By continuity there exists $\delta > 0$ such that $f(B_\delta(x)) \subseteq B_\varepsilon(f(x))$. Therefore, $f(B_\delta(x)) \subseteq V$, hence $B_\delta(x) \subseteq f^{-1}(V)$.
(b) ⇒ (a): Let $x \in X$ and $\varepsilon > 0$. Since $U := f^{-1}(B_\varepsilon(f(x)))$ is open in $X$ and contains $x$, we may choose $\delta > 0$ such that $B_\delta(x) \subseteq U$. Then $f(B_\delta(x)) \subseteq f(U) \subseteq B_\varepsilon(f(x))$, which shows that $f$ is continuous at $x$.

8.4.9 Definition. A function $f : X \to Y$ is said to be uniformly continuous on a set $E \subseteq X$ if, given $\varepsilon > 0$, there exists $\delta > 0$ such that $\rho(f(u), f(v)) < \varepsilon$ for all $u, v \in E$ with $d(u, v) < \delta$. ♦

8.4.10 Example. The function
$$f(x, y) = \frac{1}{2.1 + \sin x + \sin y}$$
is uniformly continuous on $\mathbb{R}^2$. Indeed, for all $(x, y), (a, b) \in \mathbb{R}^2$,
$$\begin{aligned}
|f(x, y) - f(a, b)| &= \frac{|(\sin x + \sin y) - (\sin a + \sin b)|}{(2.1 + \sin x + \sin y)(2.1 + \sin a + \sin b)} \\
&\le \frac{|\sin x - \sin a| + |\sin y - \sin b|}{(2.1 + \sin x + \sin y)(2.1 + \sin a + \sin b)} \\
&\le 100|\sin x - \sin a| + 100|\sin y - \sin b| \\
&\le 100(|x - a| + |y - b|) \\
&\le 200\sqrt{(x - a)^2 + (y - b)^2}.
\end{aligned}$$
♦
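The chain of inequalities in Example 8.4.10 shows that $f$ is Lipschitz with constant $200$ in the Euclidean metric, which is stronger than uniform continuity. The Python sketch below samples pairs of nearby points and checks that bound numerically. The function names, the sampling ranges, and the tolerance are assumptions made for illustration, and the check is a sanity test rather than a proof.

```python
import math
import random


def f(x, y):
    """The function of Example 8.4.10; the denominator is at least 0.1."""
    return 1.0 / (2.1 + math.sin(x) + math.sin(y))


def lipschitz_bound_holds(trials=10000, seed=0, span=50.0, step=1e-3):
    """Check |f(P) - f(Q)| <= 200 * |P - Q| on randomly sampled nearby pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-span, span), rng.uniform(-span, span)
        a, b = x + rng.uniform(-step, step), y + rng.uniform(-step, step)
        lhs = abs(f(x, y) - f(a, b))
        rhs = 200.0 * math.hypot(x - a, y - b)
        if lhs > rhs + 1e-12:  # tolerance for floating-point rounding
            return False
    return True
```

Given $\varepsilon > 0$, the bound shows that $\delta = \varepsilon/200$ works in Definition 8.4.9 at every point simultaneously.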

The proof of the following theorem is entirely analogous to that of 3.5.2. The details are left to the reader.

8.4.11 Sequential Characterization of Uniform Continuity. A function $f : X \to Y$ is uniformly continuous on $E \subseteq X$ iff $\rho(f(u_n), f(v_n)) \to 0$ for all sequences $\{u_n\}$ and $\{v_n\}$ in $E$ with $d(u_n, v_n) \to 0$.

For example, every function on a discrete metric space is uniformly continuous, since eventually $u_n = v_n$. The indefinite integral function $F(f)(x) = \int_a^x f(t)\,dt$ on the space $C([a, b])$ is uniformly continuous with respect to the uniform norm, since $\|f_n - g_n\|_\infty \to 0 \Rightarrow \|F(f_n) - F(g_n)\|_\infty \to 0$. The addition function $(x, y) \mapsto x + y$ is uniformly continuous on $\mathbb{R}^2$ since $(x_n, y_n) - (a_n, b_n) \to (0, 0)$ clearly implies that $x_n + y_n - (a_n + b_n) \to 0$. On the other hand, the multiplication function $(x, y) \mapsto xy$ is not uniformly continuous on $\mathbb{R}^2$, since $(n + 1/n, n + 1/n) - (n, n) \to (0, 0)$ but $(n + 1/n)^2 - n^2 = 2 + 1/n^2 \to 2$.
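Returning to Example 8.4.5(a), the failure of the double limit can be seen numerically by evaluating the function at points close to the origin on different rays $y = mx$. The Python sketch below compares those values with the ratio $(1 + 2m)/(2 + m)$ derived in the example. The function names and the sample point $x = 10^{-6}$ are illustrative assumptions.

```python
import math


def f(x, y):
    """The function sin(x + 2y) / (2x + y) of Example 8.4.5(a), for x, y > 0."""
    return math.sin(x + 2 * y) / (2 * x + y)


def value_along_ray(m, x=1e-6):
    """Value of f at the point (x, m*x) on the ray y = m*x, with x > 0 small."""
    return f(x, m * x)


def predicted(m):
    """Limit of f along the ray y = m*x, as computed in the example."""
    return (1 + 2 * m) / (2 + m)
```

Different slopes give different limiting values, for instance $0.8$ for $m = 1/2$ and $1.4$ for $m = 3$, so no single double limit can exist.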


Exercises 1. For each of the functions f (x) below, find lim{x→0, x∈E} f (x) and the corresponding iterated limits or show that the limits fail to exist. In each case take E to be the natural domain of the function. (a) (d) (g) (j) (m) (p)

y 2 + sin2 x . 3x2 + 2y 2 sin x sin y p . x2 + y 2

x2 y 2 x2 y . (c) . 2 4 + 2y x + 7y 4 x4 1 (e) S 4 . (f) (x + y) sin 2 . 2 4 x − xy + y x + y2 p (1 + x2 )(1 + y 2 ) − 1 sin(3xy 2 + 2xy 3 ) xy 2 cos(xy) S . (h) . (i) . 2 2 2 xy x +y x2 + y 2 3x + 2y x2 + |y|2.1 1 − cos(xy) S . (l) . . (k) sin x sin y x2 + y 2 (x2 + y 2 )1/3 p 1 − cos |xy| sin x ± sin y x−y . (n) . (o) S . |x|p x−y ln x − ln y xy + yz + xz x|y|1.1 3x2 + 2y 2 + z 2 p . (r) S p . (q) . x2 + y 2 + z 2 sin2 x2 + y 2 x2 + y 2 + z 2 (b) S

5x2

2.S Let $a > 0$, $p > 1$. Evaluate the limit
$$\lim_{\substack{(x, y) \to (0, 0) \\ (x, y) \in E}} \frac{x^2 - 5y^2}{x^2 + 3y^2}$$
for the sets
(a) $E = \{(x, y) : |y| \le a|x|^p\}$
(b) $E = \{(x, y) : |y| < |x|\}$.

3.S Let $f$ be continuously differentiable on $(-\pi/2, \pi/2)$. Define $g$ on the set $E := \{(x, y) \in (-\pi/2, \pi/2)^2 : x \ne y\}$ by
$$g(x, y) = \frac{f(x) - f(y)}{\sin x - \sin y}.$$
Show that $g$ has a continuous extension to $(-\pi/2, \pi/2)^2$.

4. Let $f$ and $g$ be continuously differentiable on some open interval $(a, b)$ and suppose that $g' \ne 0$. Define $h$ on the set $E := \{(x, y) \in (a, b)^2 : x \ne y\}$ by
$$h(x, y) = \frac{f^2(x) - f^2(y)}{g(x) - g(y)}.$$
Prove that $h$ has a continuous extension to $(a, b)^2$.

5. Let $f : X \to Y$. Prove that the following statements are equivalent:
(a) $f$ is continuous.
(b) $f(\mathrm{cl}(A)) \subseteq \mathrm{cl}(f(A))$ for each subset $A$ of $X$.
(c) $\mathrm{cl}(f^{-1}(B)) \subseteq f^{-1}(\mathrm{cl}(B))$ for each subset $B$ of $Y$.
(d) $f^{-1}(\mathrm{int}(B)) \subseteq \mathrm{int}(f^{-1}(B))$ for each subset $B$ of $Y$.

6.S Show that $d : X \times X \to \mathbb{R}$ is uniformly continuous with respect to the product metric $\eta := d \times d$ on $X \times X$.

7.S Let $f : [0, a) \to \mathbb{R}$ and $g(x, y) := f\big(\sqrt{x^2 + y^2}\big)$, $\sqrt{x^2 + y^2} < a$.

(a) Prove that g is uniformly continuous iff f is uniformly continuous. (b) Use (a) to show that the functions p x2 + y 2 , p

1 x2

+

y2

+1

, and sin

p

x2 + y 2

are uniformly continuous on R2 but sin(x2 + y 2 ) is not. 8.S Let f (x) be uniformly continuous on R. Prove that the function g(x, y) := f (αx + βy) is uniformly continuous on R2 . Give an example of a bounded uniformly continuous function f on R such that the function h(x, y) := f (xy) is not uniformly continuous on R2 . 9. Show that the function f (x, y) =

$$\frac{1}{1 - \sin x \sin y}$$

is uniformly continuous on the set Er := [−π/2 + r, π/2 − r] × [−π/2 + r, π/2 − r] for any 0 < r < π/2, but is not uniformly continuous on E := (−π/2, π/2) × (−π/2, π/2). 10. Let f : (X, d) → (Y, ρ) and g : (Y, ρ) → (Z, µ) be (uniformly) continuous. Prove that g ◦ f : (X, d) → (Z, µ) is (uniformly) continuous. 11.S Let f : X → Rk , say f (x) = f1 (x), . . . , fk (x) . Prove that f is (uniformly) continuous iff each fj is (uniformly) continuous. 12.S Let fn : (X, d) → (Y, ρ) converge uniformly to f on X. Prove that if each fn is (uniformly) continuous, then f is (uniformly) continuous.

Metric Spaces

8.5 Compact Sets

Throughout this section, (X, d) and (Y, ρ) denote arbitrary metric spaces. Compactness is one of the most important concepts in analysis. For example, it allows the formulation of results such as the extreme value theorem and the uniform continuity theorem in the context of general metric spaces. It is also the key feature that distinguishes the finite dimensional space R^n from its infinite dimensional counterparts ℓ^∞ and ℓ^1.

8.5.1 Definition. Let E ⊆ X. A collection U = {U_i : i ∈ I} of subsets of X is called a cover of E if E is contained in the union of the sets U_i. If each U_i is open, then U is called an open cover of E. A cover U of E is said to have a finite subcover if there exists a finite subset I_0 of I such that {U_i : i ∈ I_0} is a cover of E. If every open cover of E has a finite subcover, then E is said to be compact. ♦

Finite subsets of a metric space are compact. In a discrete metric space, these are the only compact sets. Indeed, if E is an infinite subset of a discrete space, then {{x} : x ∈ E} is an open cover of E with no finite subcover.

8.5.2 Proposition. A compact subset of a metric space is closed and bounded.

Proof. Let E be compact and let a ∈ E^c. For each x ∈ E let U_x and V_x denote disjoint open balls with centers x and a, respectively (see Figure 8.4). Then {U_x : x ∈ E} is an open cover of E, hence there exists a finite subset E_0 of E such that {U_x : x ∈ E_0} covers E. Set $V = \bigcap_{x \in E_0} V_x$. Then V is an open ball with center a, and since V ∩ U_x = ∅ for each x ∈ E_0, V ⊆ E^c. Therefore E^c is open.

[FIGURE 8.4: The neighborhoods U_x and V_x.]

To show that E is bounded, choose any x ∈ X and consider the open cover {B_n(x) : n ∈ N} of E. Let F be a finite subset of N such that {B_n(x) : n ∈ F} covers E. Then E ⊆ B_m(x), where m is the largest member of F.


The converse of 8.5.2 is false. For example, the set Q ∩ [−√2, √2] is closed and bounded in Q but not compact. Indeed, if {r_n} is a sequence in Q with r_n ↑ √2, then {(−r_n, r_n) : n ∈ N} is an open cover of Q ∩ [−√2, √2] with no finite subcover. For another example, consider a discrete metric space. Here, the entire metric space is closed and bounded but only finite sets are compact.

8.5.3 Proposition. A closed subset of a compact metric space is compact.

Proof. Let X be compact, E ⊆ X closed, and let U = {U_i : i ∈ I} be an open cover of E. Then U ∪ {E^c} is an open cover of X, hence there exists a finite subset I_0 of I such that $X = E^c \cup \bigcup_{i \in I_0} U_i$. It follows that $E \subseteq \bigcup_{i \in I_0} U_i$.

Closely related to compactness is the notion of total boundedness.

8.5.4 Definition. Let E ⊆ X and ε > 0. An ε-net for E is a set F ⊆ X such that {B_ε(x) : x ∈ F} covers E. E is said to be totally bounded if for each ε > 0 there exists a finite ε-net for E. ♦

An ε-net F for E has the property that every member of E is within ε of a member of F. For example, Q is an ε-net for R, and Z is a 1-net for R. The following proposition shows that the set F in the definition of total boundedness may be taken to be a subset of E.

8.5.5 Proposition. If E has a finite ε-net F, then E has a finite 2ε-net contained in E.

Proof. For each x ∈ F, apply the following procedure: If E ∩ B_ε(x) = ∅, remove x from F. Otherwise, choose any a ∈ E ∩ B_ε(x) and replace B_ε(x) by B_{2ε}(a) and x in F by a. Since B_ε(x) ⊆ B_{2ε}(a), the revised set is a finite 2ε-net for E contained in E.

[FIGURE 8.5: A 2ε-net.]

Since a finite union of open balls is bounded (Exercise 8.1.3), every totally bounded set is bounded. The converse is false. For example, in a discrete space all sets are bounded but no infinite set can be totally bounded. Open and closed balls in C([0, 1]) with the supremum norm are bounded but not totally bounded (Exercise 8). Contrast this with the following example:


8.5.6 Example. Every bounded subset E of R^n is totally bounded. To see this, let ε > 0 and choose k ∈ N so large that E ⊆ I := [−kδ, kδ]^n, where 0 < δ < 2ε/√n. Subdividing, we see that I is a finite union of sets of the form J := [a_1, b_1] × [a_2, b_2] × · · · × [a_n, b_n], where b_j − a_j = δ. (See Figure 8.6.) The largest diagonal in J has length √n δ < 2ε, hence J may be enclosed in an open ball with radius ε and center c = (c_1, . . . , c_n), where c_j = (a_j + b_j)/2. The resulting collection of balls is a finite ε-cover of E. ♦

[FIGURE 8.6: A bounded set in R^n is totally bounded.]

8.5.7 Definition. A subset E of X is said to be sequentially compact if every sequence in E has a cluster point in E. ♦

By the Bolzano–Weierstrass theorem, closed and bounded intervals in R are sequentially compact. The same is not true in Q; for example, Q ∩ [0, √2] is not sequentially compact. In a discrete space, no infinite set can be sequentially compact since sequences with distinct terms cannot converge.

8.5.8 Heine–Borel Theorem. The following statements are equivalent:
(a) X is compact.
(b) X is sequentially compact.
(c) X is complete and totally bounded.

Proof. (a) ⇒ (b): We prove the contrapositive ∼(b) ⇒ ∼(a). Let {a_n} be a sequence in X with no cluster point. Then for each x ∈ X there must exist an open ball B(x) with center x that contains only finitely many terms of the sequence. This implies that every finite subcollection of the open cover


{B(x) : x ∈ X} of X contains only finitely many terms of the sequence and hence cannot cover X. Therefore, X is not compact.

(b) ⇒ (c): Let X be sequentially compact and let {a_n} be a Cauchy sequence in X. By hypothesis, {a_n} has a convergent subsequence, say a_{n_k} → a ∈ X. By Exercise 8.1.9, a_n → a. Therefore, X is complete. Suppose that X is not totally bounded. Then there exists ε > 0 such that no finite collection of open balls of radius ε covers X. Choose any a_1 ∈ X. Since B_ε(a_1) does not cover X, there exists a_2 ∈ X \ B_ε(a_1). Since B_ε(a_1) ∪ B_ε(a_2) does not cover X, there exists a_3 ∈ X \ (B_ε(a_1) ∪ B_ε(a_2)). Continuing in this fashion, we construct a sequence {a_n} in X such that a_n ∈ X \ (B_ε(a_1) ∪ B_ε(a_2) ∪ · · · ∪ B_ε(a_{n−1})). It follows that d(a_n, a_m) ≥ ε for all m ≠ n. But then no subsequence of {a_n} can converge. Therefore, X must be totally bounded.

(c) ⇒ (a): Assume that X is complete and totally bounded but not compact. Then X has an open cover U = {U_i : i ∈ I} with no finite subcover. For each k let F_k be a finite set of points in X such that {B_{1/k}(x) : x ∈ F_k} is a cover of X. Consider the case k = 1. If for each x ∈ F_1 the ball B_1(x) could be covered by finitely many members of U, then X itself would have such a cover, contradicting our assumption. Thus there exists x_1 ∈ F_1 such that E_1 := B_1(x_1) cannot be covered by finitely many members of U. Since {B_{1/2}(x) : x ∈ F_2} covers X, {E_1 ∩ B_{1/2}(x) : x ∈ F_2} covers E_1, so by similar reasoning there exists x_2 ∈ F_2 such that E_2 := E_1 ∩ B_{1/2}(x_2) cannot be covered by finitely many members of U. In this manner we construct a sequence of points x_n in X and decreasing sets

$E_n = B_1(x_1) \cap B_{1/2}(x_2) \cap \cdots \cap B_{1/n}(x_n) = E_{n-1} \cap B_{1/n}(x_n)$   (8.4)

that cannot be covered by finitely many members of U. In particular, E_n ≠ ∅. Choose a point y_n ∈ E_n. If n > m, then y_n ∈ E_m, hence from (8.4)

d(x_m, x_n) ≤ d(x_m, y_n) + d(y_n, x_n) < 1/m + 1/n.

It follows that {x_n} is a Cauchy sequence. Since X is complete, x_n → x for some x ∈ X. Choose i ∈ I such that x ∈ U_i. Since U_i is open, there exists r > 0 such that B_r(x) ⊆ U_i. Next, choose n > 2/r such that d(x_n, x) < r/2. By the triangle inequality, B_{1/n}(x_n) ⊆ B_r(x). But then E_n ⊆ U_i, contradicting the noncovering property of E_n. Therefore, X must be compact, completing the proof.

8.5.9 Corollary. A subset of R^n is compact iff it is closed and bounded.

Proof. We have already seen that a compact set in a metric space is closed and bounded. Conversely, let C ⊆ R^n be closed and bounded. Since R^n is complete (Exercise 8.1.17), C is complete (8.2.8). Since C is bounded, it is totally bounded (8.5.6). By the theorem, C is compact.
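The proof of 8.5.9 relies on the grid construction of Example 8.5.6, and Proposition 8.5.5 shows how to move a net inside the set. The following Python sketch illustrates both steps numerically. It is only an illustration under stated assumptions: the set E is replaced by a finite sample of points, the Euclidean metric is used, and the function names `grid_net`, `refine_to_subset`, and `is_net` are hypothetical, not taken from the text.

```python
import itertools
import math


def grid_net(points, eps):
    """Finite eps-net for a finite sample `points` of a bounded set in R^n.

    Follows Example 8.5.6: enclose the sample in [-k*delta, k*delta]^n,
    subdivide into cubes of side delta < 2*eps/sqrt(n), and return the
    centers of the cubes.
    """
    n = len(points[0])
    delta = eps / math.sqrt(n)  # assumption: any 0 < delta < 2*eps/sqrt(n) would do
    bound = max(abs(c) for p in points for c in p)
    k = max(1, math.ceil(bound / delta))  # ensures the sample lies in [-k*delta, k*delta]^n
    # Midpoints c_j = (a_j + b_j)/2 of the subintervals of length delta.
    axis = [(-k + i + 0.5) * delta for i in range(2 * k)]
    return list(itertools.product(axis, repeat=n))


def refine_to_subset(points, net, eps):
    """Proposition 8.5.5: turn an eps-net into a 2*eps-net contained in `points`."""
    refined = []
    for x in net:
        near = [a for a in points if math.dist(a, x) < eps]
        if near:
            refined.append(near[0])  # replace x by some a in E ∩ B_eps(x)
        # otherwise E ∩ B_eps(x) is empty and x is dropped
    return refined


def is_net(points, net, radius):
    """Check that every sample point lies within `radius` of some net point."""
    return all(any(math.dist(p, x) < radius for x in net) for p in points)
```

For a sample such as `[(0.3, -1.2), (2.0, 0.5), (-1.7, 1.9)]` and `eps = 0.5`, `grid_net` yields an ε-net of cube centers, and `refine_to_subset` yields a 2ε-net consisting of sample points only. The sketch needs `math.dist`, which is available in Python 3.8 and later.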


The validity of the preceding corollary ultimately rests on the finite dimensionality of R^n. For infinite dimensional normed spaces such as C([0, 1]), a closed and bounded set need not be compact (Exercise 8). In the next section, we characterize the compact subsets of spaces like C([0, 1]).

8.5.10 Theorem. If f : X → Y is continuous and X is compact, then f(X) is compact.

Proof. Let {V_i : i ∈ I} be an open cover of f(X) in Y. For each i ∈ I, set U_i = f⁻¹(V_i). Then {U_i : i ∈ I} is an open cover of X, hence there exists a finite subset I_0 of I such that {U_i : i ∈ I_0} is a cover of X. It follows that {V_i : i ∈ I_0} is a finite cover of f(X).

8.5.11 Corollary. Let f : X → Y be continuous, one-to-one, and onto Y. If X is compact then f⁻¹ : Y → X is continuous, hence f is a homeomorphism.

Proof. Let g = f⁻¹ and let C be a closed subset of X. Then C is compact (8.5.3), hence, by the theorem, g⁻¹(C) = f(C) is compact and therefore closed in Y (8.5.2). By 8.4.8, g is continuous.

Corollary 8.5.11 is false for noncompact X (Exercise 19).

8.5.12 Extreme Value Theorem. If f : X → R is continuous and X is compact, then there exist points x_m and x_M in X such that f(x_m) ≤ f(x) ≤ f(x_M) for all x ∈ X.

Proof. By 8.5.10 and 8.5.2, f(X) is closed and bounded in R and therefore contains its supremum and infimum.

8.5.13 Theorem. If f : X → Y is continuous and X is compact, then f is uniformly continuous.

Proof. Let ε > 0. By continuity, for each x ∈ X there exists γ_x > 0 such that

$f\bigl(B_{\gamma_x}(x)\bigr) \subseteq B_{\varepsilon/2}\bigl(f(x)\bigr).$   (8.5)

Set δ_x = γ_x/2. The collection {B_{δ_x}(x) : x ∈ X} is an open cover of X, hence there exists a finite set F ⊆ X such that the collection {B_{δ_x}(x) : x ∈ F} covers X. Let δ := min_{x∈F} δ_x and let a, b ∈ X with d(a, b) < δ. Choose x ∈ F such that a ∈ B_{δ_x}(x). Then

d(x, a) < δ_x < γ_x and d(x, b) ≤ d(a, b) + d(x, a) < δ_x + δ_x = γ_x,

so a, b ∈ B_{γ_x}(x). By (8.5),

ρ(f(a), f(b)) ≤ ρ(f(a), f(x)) + ρ(f(x), f(b)) < ε/2 + ε/2 = ε.

Therefore, f is uniformly continuous.

The following is a generalization of 3.5.9.

8.5.14 Corollary. Let X be compact, Y complete, E a dense subset of X, and f : E → Y continuous. The following statements are equivalent:
(a) lim_{x→a, x∈E} f(x) exists for each a ∈ X.
(b) f has a continuous extension to X; that is, there exists a continuous function g : X → Y such that g|E = f.
(c) f is uniformly continuous on E.

Proof. (a) ⇒ (b): For each a ∈ X define g(a) = lim_{x→a, x∈E} f(x). Since f is continuous, g|E = f. If g is not continuous at a ∈ X, then there exist ε > 0 and a sequence {x_n} in X such that x_n → a and ρ(g(x_n), g(a)) ≥ ε for all n. By definition of g(x_n), for each n we may choose a_n ∈ E such that d(x_n, a_n) < 1/n and ρ(g(x_n), f(a_n)) < ε/2. Then a_n → a but

ρ(f(a_n), g(a)) ≥ ρ(g(x_n), g(a)) − ρ(g(x_n), f(a_n)) > ε/2,

contradicting the definition of g(a). Therefore, g is continuous.

(b) ⇒ (c): By 8.5.13, g is uniformly continuous on X, hence f is uniformly continuous on E.

(c) ⇒ (a): Let a ∈ X and let {x_n} be a sequence in E such that x_n → a. Since f is uniformly continuous, {f(x_n)} is Cauchy and therefore converges to some b ∈ Y. If {y_n} is another sequence in E such that y_n → a, then d(y_n, x_n) → 0 so, by uniform continuity again, ρ(f(y_n), f(x_n)) → 0, hence f(y_n) → b. By the sequential criterion for limits, lim_{x→a, x∈E} f(x) exists and equals b.

Exercises

1. Determine which of the following subsets of R² are closed, bounded, or compact.
(a)S {(x, y) : 2x² + y² + 6y ≤ 8x}.
(b)S {(x, y) : 3x² + 2y ≤ 6x}.
(c) {(x, y) : xy = 1}.
(d) {(x, y) : x^{1/3} + y^{1/3} = 1}.
(e) {(x, y) : x^{2/3} + y^{2/3} = 1}.
(f)S $\left\{ \left( \dfrac{x \cos x}{1 + x}, \dfrac{x \sin x}{1 + x} \right) : x \ge 0 \right\}$.
(g) {(e^{−x} cos x, e^{−x} sin x) : x ≥ 0}.
(h)S {(x, y) : x³/y + y³/x > 0}.

2. Let {x_n} be a convergent sequence in X with x_n → x_0. Prove that the set {x_0, x_1, x_2, . . .} is compact.

3.S Prove that a finite union of totally bounded (compact) sets is totally bounded (compact).


4.S Prove that the intersection of an arbitrary family of compact subsets of a metric space X is compact.

5. Prove that X × Y is compact in the product metric η := d × ρ iff X and Y are compact.

6. Prove that the closure of a totally bounded subset of a metric space is totally bounded.

7.S Prove that a subset E of a complete metric space X is totally bounded iff every sequence in E has a cluster point in X.

8. Prove that in (C([0, 1]), ‖·‖_∞), the closed ball with radius 1 and center the zero function is not compact.

9. Let C_0([0, +∞)) be the vector subspace of B([0, +∞)) consisting of all real-valued continuous functions f on [0, +∞) such that lim_{x→+∞} f(x) = 0. Prove that C_0([0, +∞)) is closed in the uniform norm and that the closed ball C_1(0) in C_0([0, +∞)) with radius 1 and center the zero function is not compact and therefore is not totally bounded.

10. For n ∈ N, define f_n ∈ B([0, +∞)) by f_n = 1 on [n, n + 1] and zero elsewhere. Prove that the set E := {f_1, f_2, . . .} is bounded but not totally bounded in the sup metric.

11.S (Cantor's intersection theorem). Let C_1, C_2, . . . be a sequence of nonempty compact subsets of a metric space X such that C_{n+1} ⊆ C_n for all n. Prove that $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$.

12. A collection of subsets of a metric space X is said to have the finite intersection property if every finite subcollection has a nonempty intersection. Prove that X is compact iff every collection of closed subsets of X with the finite intersection property has a nonempty intersection.

13.S The diameter of a nonempty subset A of (X, d) is defined by d(A) := sup {d(a, b) : a, b ∈ A}.
(a) Prove that if A is compact, then there exist points a, b ∈ A such that d(A) = d(a, b).
(b) Give an example of a closed and bounded set A in a metric space such that d(A) > d(a, b) for all a, b ∈ A.

14. ⇓³ The distance between nonempty subsets A and B of (X, d) is defined as d(A, B) := inf {d(a, b) : a ∈ A, b ∈ B}.
(a) Prove that if A and B are disjoint with A closed and B compact, then d(A, B) > 0.
(b) Show by example that the conclusion in (a) is false if B is merely closed.
(c) Show that if both sets are compact, then there exist a ∈ A and b ∈ B such that d(A, B) = d(a, b).

³ This exercise will be used in 8.7.2.

15.S ⇓⁴ Let A be a nonempty subset of X and define d(A, ·) : X → R by d(A, x) = d(A, {x}) (see Exercise 14). Prove the following:
(a) |d(A, x) − d(A, y)| ≤ d(x, y), hence d(A, ·) is uniformly continuous.
(b) d(A, x) = 0 iff x ∈ cl(A).
(c) If A and B are disjoint closed sets, then the function

$F_{AB}(x) = \dfrac{d(x, A)}{d(x, A) + d(x, B)}, \quad x \in X,$

is well-defined and continuous, 0 ≤ F_{AB} ≤ 1 on X, and A = {x : F_{AB}(x) = 0}, B = {x : F_{AB}(x) = 1}.
(d) If A and B are disjoint closed sets of X, then there exist disjoint open sets U and V such that A ⊆ U and B ⊆ V. (U and V are then said to separate A and B.)

⁴ This exercise will be used in 11.2.17.

16. Referring to 8.1.10, show that the set {f ∈ ℓ^∞ : |f(n)| ≤ e^{−n}} is compact. Is {f ∈ B([1, +∞)) : |f(x)| ≤ e^{−x}} compact?

17. (Lebesgue's number). Let X be compact and let U = {U_i : i ∈ I} be an open cover of X. Prove that there exists a number r > 0 such that every set with diameter < r (Exercise 13) is contained in some U_i.

18. (Dini's Theorem). Let X be compact and let f_n, g : X → R be continuous such that either f_n ↓ g or f_n ↑ g on X. Prove that the convergence is uniform. (See 7.1.12.)

19.S Let f : [0, 2π) → R² be defined by f(t) = (cos t, sin t). Show that f is continuous, one-to-one, and maps [0, 2π) onto the circle x² + y² = 1 but has a discontinuous inverse.

20. Let f : R² → R be defined by f(x, y) = x. Prove or disprove:
(a) If E ⊆ R² is closed, then f(E) is closed.
(b) If E ⊆ R² is open, then f(E) is open.


21.S Let A and B be compact subsets of R. Prove that the sets AB := {ab : a ∈ A, b ∈ B} and A + B := {a + b : a ∈ A, b ∈ B} are compact.

22. ⇓⁵ Let a sequence of continuous functions f_n : (X, d) → (Y, ρ) converge uniformly to f on X, let C ⊆ X be compact, and let U ⊆ Y be open. Prove that if f(C) ⊆ U, then f_n(C) ⊆ U for all sufficiently large n.

⁵ This exercise will be used in 13.6.5.

*8.6 The Arzelà–Ascoli Theorem

Throughout this section, (X, d) and (Y, ρ) denote arbitrary metric spaces and C(X, Y) denotes the set of all continuous functions from X to Y. As noted in the previous section, closed and bounded subsets in infinite dimensional spaces such as C([0, 1]) need not be compact. The additional property of equicontinuity is needed to characterize compact subsets of such spaces.

8.6.1 Definition. A family F of functions in C(X, Y) is said to be
• equicontinuous at a point a ∈ X if, for each ε > 0, there exists δ > 0 such that ρ(f(x), f(a)) < ε for all x ∈ X with d(x, a) < δ and all f ∈ F;
• equicontinuous on E ⊆ X if F is equicontinuous at each point of E;
• uniformly equicontinuous on E if, for each ε > 0, there exists δ > 0 such that ρ(f(x), f(y)) < ε for all f ∈ F and all x, y ∈ E with d(x, y) < δ. ♦

The distinguishing feature of equicontinuity is that, while δ may vary with the point a, it is independent of the functions f ∈ F. With uniform equicontinuity, δ is independent of both f and a.

8.6.2 Example. For each x, t ∈ R, define f_t(x) = tx. Let I = (c, d) be a bounded interval and set M = max{|c|, |d|}. The inequality

|f_t(x) − f_t(y)| = |t| |x − y| ≤ M |x − y|, t ∈ I,

shows that the collection of functions {f_t : t ∈ I} is uniformly equicontinuous on R. On the other hand, the larger collection {f_t : t ∈ R} is not equicontinuous at any a ∈ R. Indeed, no δ can be chosen so that |tx − ta| < 1 for all t ∈ R and all x ∈ R with |x − a| < δ. ♦
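The contrast in Example 8.6.2 can be checked numerically. The Python sketch below estimates the largest value of |f_t(x) − f_t(a)| over a finite family of parameters t and sampled points x with |x − a| < δ. The function name `modulus`, the sampling scheme, and the particular parameter lists are illustrative assumptions, not part of the text.

```python
def modulus(ts, a, delta, samples=1000):
    """Largest |f_t(x) - f_t(a)| over t in ts and sampled x with |x - a| < delta,
    where f_t(x) = t * x as in Example 8.6.2."""
    # Sample points strictly inside the interval (a - delta, a + delta).
    xs = [a - delta + 2 * delta * (i + 0.5) / samples for i in range(samples)]
    return max(abs(t * x - t * a) for t in ts for x in xs)


# Parameters sampled from the bounded interval I = (-2, 3), so M = 3.
bounded_ts = [-2 + 5 * (i + 0.5) / 200 for i in range(200)]
# Parameters growing without bound, standing in for t ranging over all of R.
unbounded_ts = [10.0 ** k for k in range(6)]

small = modulus(bounded_ts, a=1.0, delta=0.01)    # stays below M * delta
large = modulus(unbounded_ts, a=1.0, delta=0.01)  # grows with the largest |t|
```

For the bounded family the estimate never exceeds Mδ, so a single δ works for every f_t. For the unbounded family the estimate scales with the largest |t| in the list, which is why no δ can serve all t ∈ R at once.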

A straightforward modification of the proof of 8.5.13 yields

8.6.3 Theorem. If X is compact and F is equicontinuous on X, then F is uniformly equicontinuous.

8.6.4 Definition. A metric space is said to have the Bolzano–Weierstrass property if every bounded sequence has a cluster point. ♦

A compact metric space and the space R^n have the Bolzano–Weierstrass property, while infinite discrete metric spaces, the space Q, and the infinite dimensional space (C([0, 1]), ‖·‖_∞) do not.

8.6.5 Proposition. (a) A metric space with the Bolzano–Weierstrass property is complete. (b) A metric space has the Bolzano–Weierstrass property iff every closed and bounded set is compact.

Proof. For (a), use the fact that a Cauchy sequence is bounded and apply Exercise 8.1.9. Part (b) follows from 8.5.8.

The following lemma may be proved using familiar ideas such as those found in 8.1.10. The details are left to the reader.

8.6.6 Lemma. Let (X, d) be compact and (Y, ρ) complete. For f, g ∈ C(X, Y) define

$\sigma(f, g) = \sup_{x \in X} \rho\bigl(f(x), g(x)\bigr).$

Then σ is a metric on C(X, Y), and C(X, Y) is complete in this metric.

8.6.7 Lemma. A compact metric space X has a countable dense subset of the form $D = \bigcup_{k=1}^{\infty} F_k$, where F_k is a finite (1/k)-net for X.

Proof. For each k ∈ N, the collection {B_{1/k}(x) : x ∈ X} is an open cover of X, hence has a finite subcover {B_{1/k}(x) : x ∈ F_k}. By definition of ε-net, $\bigcup_{k=1}^{\infty} F_k$ is dense in X.

8.6.8 Arzelà–Ascoli Theorem. Let X be compact and let Y have the Bolzano–Weierstrass property. Then a set F is compact in (C(X, Y), σ) iff it is closed, bounded, and equicontinuous.

Proof. Suppose F is compact in C(X, Y), hence closed and bounded. If F is not equicontinuous at some a ∈ X, then there exist an ε > 0 and, for every n, members x_n of X and f_n of F such that

d(x_n, a) < 1/n and ρ(f_n(x_n), f_n(a)) ≥ ε.   (8.6)

By compactness of F, we may assume that {f_n} converges uniformly to some f ∈ F (otherwise, take a subsequence). Since x_n → a, the uniform convergence of {f_n} and the continuity of f imply that f_n(x_n) → f(a). Since also f_n(a) → f(a), it follows that ρ(f_n(x_n), f_n(a)) → 0. But this contradicts (8.6). Therefore, F is equicontinuous.


Conversely, assume that F is closed, bounded, and equicontinuous and let {f_n} be any sequence in F. We show that {f_n} has a convergent subsequence. The compactness of F will then follow from 8.5.8.

Let F_k and D = {x_1, x_2, . . .} be as in 8.6.7. We show first that {f_n} has a subsequence that converges pointwise on D. For this we use the Bolzano–Weierstrass property of Y and the following diagonalization argument: Because {f_n} is bounded, we may choose a subsequence $\{f_n^{(1)}\}$ of $\{f_n^{(0)} := f_n\}$ such that the sequence $\{f_n^{(1)}(x_1)\}$ converges to some y_1 ∈ Y. We may then choose a subsequence $\{f_n^{(2)}\}$ of $\{f_n^{(1)}\}$ such that $\{f_n^{(2)}(x_2)\}$ converges to some y_2 ∈ Y. Continuing in this way, we obtain for each k a sequence $\{f_n^{(k)}\}$ such that $\{f_n^{(k+1)}\}$ is a subsequence of $\{f_n^{(k)}\}$ and $\lim_n f_n^{(k)}(x_k) = y_k$. Now take the diagonal sequence $\{g_n := f_n^{(n)}\}$, which is a subsequence of {f_n} and for each k, except for the first k − 1 terms, is a subsequence of $\{f_n^{(k)}\}$. It follows that lim_n g_n(x_k) = y_k for each k. The scheme may be depicted as follows:

$$
\begin{array}{ll}
f_1^{(1)},\ f_2^{(1)},\ \ldots,\ f_n^{(1)},\ \ldots & \to y_1 \text{ at } x_1 \\
f_1^{(2)},\ f_2^{(2)},\ \ldots,\ f_n^{(2)},\ \ldots & \to y_2 \text{ at } x_2 \\
\qquad\vdots & \\
f_1^{(n)},\ f_2^{(n)},\ \ldots,\ f_n^{(n)},\ \ldots & \to y_n \text{ at } x_n \\
\qquad\vdots & \\
\text{diagonal } f_n^{(n)} & \to y_k \text{ at each } x_k
\end{array}
$$

Having obtained a subsequence {g_n} of {f_n} that converges pointwise on the dense set D, we now show that {g_n} converges uniformly on X, which will complete the proof. By the uniform equicontinuity of {g_n}, given ε > 0, we may choose δ > 0 such that

ρ(g_n(x), g_n(y)) < ε/3, for all n ∈ N and x, y ∈ X with d(x, y) < δ.   (8.7)

Let k > 1/δ. Since {g_n} converges pointwise on F_k and F_k is finite, we may choose N_k so that

ρ(g_n(y), g_m(y)) < ε/3, for all n, m ≥ N_k and all y ∈ F_k.   (8.8)

Since F_k is a δ-net, given x ∈ X, there exists y ∈ F_k such that d(x, y) < δ. It follows from (8.7) and (8.8) that for m, n ≥ N_k,

ρ(g_n(x), g_m(x)) ≤ ρ(g_n(x), g_n(y)) + ρ(g_n(y), g_m(y)) + ρ(g_m(y), g_m(x)) < ε/3 + ε/3 + ε/3 = ε.

Since x was arbitrary, {g_n} is a Cauchy sequence in C(X, Y). Since C(X, Y) is complete, {g_n} converges in C(X, Y).
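The bookkeeping behind the diagonalization step in the proof above can be illustrated on index sets alone. In the Python sketch below, the k-th nested subsequence is modeled, purely as an assumption for illustration, by the indices that are multiples of 2^k. The names `stage` and `diagonal` are hypothetical. The sketch does not choose convergent subsequences; it only shows that the diagonal indices increase strictly and that, from the k-th term on, they all belong to the k-th stage.

```python
def stage(k, length):
    """First `length` indices of the k-th nested subsequence.

    Illustrative choice: multiples of 2**k, so that stage k+1 is a
    subsequence of stage k.
    """
    return [2 ** k * j for j in range(1, length + 1)]


def diagonal(length):
    """Indices of the diagonal sequence: the n-th term of the n-th stage."""
    return [stage(n, n)[n - 1] for n in range(1, length + 1)]


diag = diagonal(6)
# Strictly increasing indices: the diagonal is a subsequence of the original.
increasing = all(diag[i] < diag[i + 1] for i in range(len(diag) - 1))
# From the k-th term on, every diagonal index lies in stage k.
tail_in_stage = all(
    diag[n - 1] % 2 ** k == 0 for k in range(1, 7) for n in range(k, 7)
)
```

Both checks evaluate to `True`, mirroring the claim that {g_n}, except for its first k − 1 terms, is a subsequence of the k-th stage.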


Remark. The proof of the sufficiency of the theorem did not require that F be uniformly bounded. All that was used was the property of pointwise boundedness, that is, {f(x) : f ∈ F} bounded in Y for each x ∈ X. Uniform boundedness is then a consequence of equicontinuity. ♦

8.6.9 Example. Let X be compact. Then any convergent sequence of functions f_n in C(X, R), say f_n → f, is equicontinuous. This may be verified directly, but a quick proof uses 8.6.8 applied to the set {f, f_1, f_2, . . .}, whose compactness is readily established. ♦

Exercises

1. Let X × Y have the product metric η := d × ρ and let f : X → Y. The graph of f is the set G(f) = {(x, y) : x ∈ X and y = f(x)}. Prove that if f is continuous, then G(f) is closed in X × Y. Conversely, prove that if G(f) is closed, f(X) is bounded, and Y has the Bolzano–Weierstrass property, then f is continuous. Give an example of a real-valued discontinuous function on [0, 1] with a closed graph.

2. Let X have the Bolzano–Weierstrass property and let {x_n} be a bounded sequence in X with only finitely many cluster points y_1, . . . , y_k. Prove that the set C := {y_1, . . . , y_k, x_1, x_2, . . .} is compact.

3.S Prove that a subset F of C(X, Y) is equicontinuous at a ∈ X iff for any sequences {f_n} in F and {x_n} in X with x_n → a, ρ(f_n(x_n), f_n(a)) → 0.

4. Prove that a subset F of C(X, Y) is uniformly equicontinuous on E ⊆ X iff for any sequences {f_n} in F and {x_n}, {a_n} in E with d(x_n, a_n) → 0, ρ(f_n(x_n), f_n(a_n)) → 0.

5. Prove that a finite set of uniformly continuous functions f : X → Y is uniformly equicontinuous.

6. Prove that the uniform closure of a set F ⊆ C(X, Y) of uniformly equicontinuous functions is uniformly equicontinuous.

7.S Let c, p > 0 and define f_n(x) = (nx)^{−p}, x ≥ c. Show that the sequence {f_n} is uniformly equicontinuous.

8. Define f_n(x) = ln(n + x). Show that the sequence {f_n} is uniformly equicontinuous on (0, +∞).

9.S Define f_n(x) = sin(nx). Use Exercise 3 and Exercise 8.3.13 to show that the sequence {f_n} is not equicontinuous at any nonzero rational number r.


10. Let M > 0 and define R_M := {f : f is locally integrable on [0, +∞) and ‖f‖_∞ ≤ M}. For f ∈ R_M define

$F_f(x) = \int_0^x f, \quad x \ge 0.$

Prove that the set F := {F_f : f ∈ R_M} is uniformly equicontinuous on [0, +∞).

11.S Let M > 0 and define D_M := {f : (a, b) → R : |f′(x)| ≤ M for all a < x < b}. Show that D_M is uniformly equicontinuous. Conclude that if g has a bounded derivative on R, then the set of functions {g_t : t ∈ R} is uniformly equicontinuous on I, where g_t(x) = g(t + x).

12. Let f : X × Y → R have the property that f(x, y) is continuous in y for each fixed x and continuous in x for each fixed y. Define F := {f(· , y) : y ∈ Y}. Prove:
(a) If F is equicontinuous, then f is continuous.
(b) If f is continuous and Y is compact, then F is equicontinuous.

13. Let X be compact. Show that a totally bounded subset of C(X, Y) is uniformly equicontinuous.

14.S Let {f_i : i ∈ I} be a uniformly bounded subset of R_a^b. Define

$F_i(x) := \int_a^x f_i(t)\,dt, \quad a \le x \le b.$

Show that {F_i : i ∈ I} is a totally bounded subset of C([a, b]).

15. Let f(t, x, y) be continuous on [a, b]³ and define f_t(x, y) = f(t, x, y). Prove that the family {f_t : t ∈ [a, b]} is uniformly equicontinuous on [a, b]². Apply this to the function

$f(t, x, y) = \dfrac{1 + t \sin x}{2 + t \sin y}$ on [0, 1]³.

8.7 Connected Sets

Throughout this section, (X, d) and (Y, ρ) denote arbitrary metric spaces.

8.7.1 Definition. A pair (U, V) of open sets in X is said to separate X if X = U ∪ V, U ≠ ∅, V ≠ ∅, and U ∩ V = ∅. The pair (U, V) is then called a separation of X. The space X is said to be connected if it has no separation, and disconnected otherwise. A subset E of X is connected if it is connected as a subspace of X. ♦

It follows from the definition that if E is disconnected, then there exist sets U, V open in X such that (E ∩ U, E ∩ V) is a separation of E. The sets U and V need not be disjoint in this definition; however the next theorem shows that this useful state of affairs may always be achieved. In this case we shall call (U, V) a separation of E.

8.7.2 Theorem. A subset E of X is disconnected iff there exists a separation (E ∩ U, E ∩ V) of E such that U ∩ V = ∅.

[FIGURE 8.7: A separation (U, V) of E.]

Proof. The sufficiency is clear. For the necessity, assume that E is disconnected and that (E ∩ U_1, E ∩ V_1) is a separation of E. Here, U_1 and V_1 are open in X but may not be disjoint. However, since E ∩ U_1 and E ∩ V_1 are disjoint, cl_E(E ∩ U_1) ∩ V_1 = ∅. Indeed, if, to the contrary, x ∈ cl_E(E ∩ U_1) ∩ V_1 for some x, then there would be a sequence {x_n} in E ∩ U_1 converging to x, which would imply that eventually x_n ∈ E ∩ V_1, impossible. Recalling that cl_E(E ∩ U_1) = E ∩ cl_X(U_1), we now see that v ∉ cl_X(U_1) for each v ∈ E ∩ V_1. Similarly, u ∉ cl_X(V_1) for each u ∈ E ∩ U_1.

By Exercise 8.5.14 it follows that for u ∈ E ∩ U_1 and v ∈ E ∩ V_1 the distances

r(u) := inf{d(u, x) : x ∈ cl_X(V_1)} and s(v) := inf{d(v, x) : x ∈ cl_X(U_1)}

are positive. Define

$U = \bigcup_{u \in E \cap U_1} B_{r(u)/2}(u)$ and $V = \bigcup_{v \in E \cap V_1} B_{s(v)/2}(v).$

Clearly, U and V are open in X and contain E ∩ U_1 and E ∩ V_1, respectively. To prove that (U, V) is a separation of E, it remains to show that U ∩ V = ∅. Suppose to the contrary that there exists a point x ∈ U ∩ V. Then, by the above, d(x, u) < r(u)/2 for some u ∈ E ∩ U_1 and d(x, v) < s(v)/2 for some v ∈ E ∩ V_1. Adding and using the triangle inequality we have

d(u, v) < r(u)/2 + s(v)/2.

On the other hand, by definition of r(u) and s(v), d(u, v) ≥ r(u) and d(u, v) ≥ s(v), hence

d(u, v) ≥ (r(u) + s(v))/2.

This contradiction shows that U ∩ V = ∅ and completes the proof of the theorem.

In any metric space, the empty set and the singletons {x} are trivially connected, but no other finite subsets are connected. In a discrete space the only connected sets are the empty set and the singletons. The set Q is not connected in R, since the open sets (−∞, √2) and (√2, +∞) separate Q.

8.7.3 Theorem. X is not connected iff there exists a continuous function from X onto {0, 1}. Equivalently, X is connected iff every continuous function from X into {0, 1} is constant.

Proof. Assume that X is not connected and let (U, V) separate X. Define

$g(x) = \begin{cases} 0 & \text{if } x \in U, \\ 1 & \text{if } x \in V. \end{cases}$

Then g maps X onto {0, 1}. Let W be any open set in R. Then g⁻¹(W) is one of the sets ∅, U, V, or X, each of which is open in X. Therefore, g is continuous. Conversely, if a continuous function g from X onto {0, 1} exists, then the open sets g⁻¹((−1, 1/2)) and g⁻¹((1/2, 2)) separate X.

8.7.4 Corollary. The nonempty connected subsets of R are the intervals.


Proof. By the intermediate value theorem, there can be no continuous function from an interval onto {0, 1}. Hence intervals must be connected. Now let E be a nonempty subset of R that is not an interval. Choose real numbers a < c < b with a, b ∈ E but c ∉ E. Then (−∞, c) and (c, +∞) separate E, hence E is not connected.

The following is a generalization of the intermediate value theorem.

8.7.5 Corollary. If f : X → Y is continuous and X is connected, then f(X) is connected.

Proof. Let g : f(X) → {0, 1} be continuous. Then g ∘ f : X → {0, 1} is continuous and hence must be constant. It follows that g itself must be constant.

8.7.6 Corollary. If A ⊆ X is connected and A ⊆ B ⊆ cl(A), then B is connected. In particular, the closure of a connected set is connected.

Proof. Let g : B → {0, 1} be continuous. Then g|A is continuous, hence must be constant. Since B ⊆ cl(A), g itself must be constant. Therefore, B is connected.

The converse of 8.7.6 is false. For example, cl(Q) = R is connected but Q is not.

8.7.7 Definition. A path in X from x to y is a continuous function ϕ from an interval [a, b] to X such that ϕ(a) = x, the initial point of the path, and ϕ(b) = y, the terminal point. X is said to be path connected if for each pair of points x, y ∈ X there exists a path in X from x to y. A subset E of X is path connected if it is path connected as a subspace of X. ♦

Note that if ϕ : [a, b] → X is a path from x to y, then (−ϕ)(t) := ϕ(−t), −b ≤ t ≤ −a, defines a path from y to x. Also, if ϑ : [c, d] → X is a path from y to z, then the sum or concatenation ϕ + ϑ : [0, 2] → X of the paths ϕ and ϑ is a path from x to z, where

$(\varphi + \vartheta)(t) = \begin{cases} \varphi\bigl(a + (b - a)t\bigr) & \text{if } 0 \le t \le 1, \\ \vartheta\bigl(c + (d - c)(t - 1)\bigr) & \text{if } 1 \le t \le 2. \end{cases}$

A convex subset C of a normed vector space X is path connected. Indeed, if x, y ∈ C, then the line segment

ϕ(t) := (1 − t)x + ty, 0 ≤ t ≤ 1,

joins x to y and lies in C. In particular, open and closed balls in X are path connected.
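The reversal and concatenation of paths described after Definition 8.7.7 translate directly into code. The Python sketch below represents a path as a function together with the endpoints of its parameter interval. The names `reverse` and `concat` and the sample paths are illustrative assumptions. The concatenation is well defined at t = 1 only when ϕ(b) = ϑ(c), which the caller is assumed to guarantee.

```python
def reverse(phi, a, b):
    """Return (-phi, -b, -a), where (-phi)(t) = phi(-t) for -b <= t <= -a.

    The reversed path runs from phi(b) to phi(a).
    """
    return (lambda t: phi(-t)), -b, -a


def concat(phi, a, b, theta, c, d):
    """Return the path phi + theta on [0, 2].

    Assumes phi(b) == theta(c), so that the two pieces agree at t = 1.
    """
    def path(t):
        if 0 <= t <= 1:
            return phi(a + (b - a) * t)
        if 1 < t <= 2:
            return theta(c + (d - c) * (t - 1))
        raise ValueError("t must lie in [0, 2]")
    return path


# Sample paths in R^2: phi from (0, 0) to (1, 0), theta from (1, 0) to (1, 3).
phi = lambda t: (t, 0)     # defined on [0, 1]
theta = lambda t: (1, t)   # defined on [0, 3]
both = concat(phi, 0, 1, theta, 0, 3)
back, lo, hi = reverse(phi, 0, 1)
```

Here `both(0)` is the initial point of ϕ, `both(2)` is the terminal point of ϑ, and `back` traverses ϕ from its terminal point to its initial point over the interval [lo, hi].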


8.7.8 Theorem. If X is path connected, then it is connected.

Proof. Let g : X → {0, 1} be a continuous function, let x, y ∈ X, and let ϕ : [a, b] → X be a path from x to y. Then g ◦ ϕ : [a, b] → {0, 1} is continuous and, because [a, b] is connected, must be constant. In particular, g(x) = (g ◦ ϕ)(a) = (g ◦ ϕ)(b) = g(y). Since x and y were arbitrary, g is constant.

8.7.9 Example. The subset B_1(−1, 0) ∪ B_1(1, 0) of R^2 is not connected, hence not path connected. However, its closure C_1(−1, 0) ∪ C_1(1, 0) is path connected, as can be seen from the figure, hence is connected. ♦

FIGURE 8.8: C_1(−1, 0) ∪ C_1(1, 0) is path connected.

8.7.10 Example. A sphere in R^n, n > 1, is path connected, hence connected. For example, consider the sphere S = {x ∈ R^n : ‖x‖_2 = 1}. We show that there is a path from the point a = (1, 0, . . . , 0) to any point b = (b_1, b_2, . . . , b_n) of S. It will then follow that any pair of points in S may be joined by a path in S through a. If b = (−1, 0, . . . , 0), then (cos t, sin t, 0, . . . , 0), 0 ≤ t ≤ π, is such a path. Suppose b ≠ (−1, 0, . . . , 0). Then the line segment

ϕ(t) = (1 − t)a + tb = (1 − t + tb_1, tb_2, . . . , tb_n), 0 ≤ t ≤ 1,

is never zero, hence ‖ϕ(t)‖_2^{−1} ϕ(t) is a path from a to b in S. ♦
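The normalization trick of Example 8.7.10 can be checked numerically. The following is a minimal sketch, assuming b lies on the unit sphere and b ≠ −a; the function name `sphere_path` is hypothetical.

```python
import math


def sphere_path(b):
    """Path on the unit sphere from a = (1, 0, ..., 0) to b.

    Assumes ||b||_2 = 1 and b != (-1, 0, ..., 0), so that the segment
    (1 - t) a + t b never passes through the origin.
    """
    n = len(b)
    a = (1.0,) + (0.0,) * (n - 1)

    def path(t):
        p = [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]
        norm = math.sqrt(sum(c * c for c in p))
        return tuple(c / norm for c in p)  # radial projection onto the sphere

    return path
```

Every value of the returned path has Euclidean norm 1 up to rounding, and the path starts at a and ends at b.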

The converse of 8.7.8 is false, as the following example—the topologist’s sine curve (8.3.7)—demonstrates.


8.7.11 Example. Let A = {(x, sin(1/x)) : 0 < x < 2/π}, B = {0} × [−1, 1], and E = A ∪ B. Since A is connected and E = cl(A), 8.7.6 shows that E is connected. However, E is not path connected. Indeed, no point in A can be joined to a point in B by a path in E. Suppose such a path existed, say ϕ : [a, b] → E, where ϕ(t) = (x(t), y(t)), ϕ(a) ∈ A, and ϕ(b) ∈ B. Let

S := {t ∈ [a, b] : ϕ([a, t]) ⊆ A}.

Since S is nonempty and bounded, c := sup S exists and c ∈ [a, b]. Note that x(t) > 0 on S. If x(c) > 0, then c < b, hence, by continuity, x(s) is positive on [a, c + δ] for some δ > 0, contradicting the definition of c. Therefore, x(c) = 0 and x(t) > 0 on [a, c). This implies that ϕ(t) = (x(t), sin(1/x(t))) on [a, c) and lim_{t→c−} x(t) = 0. By continuity, for each δ > 0 the set x([c − δ, c]) is an interval of the form [0, d], d > 0. Therefore, y(t) = sin(1/x(t)) takes on all values in [−1, 1] on each interval [c − δ, c), which implies that lim_{t→c−} y(t) cannot exist. But this contradicts the continuity of ϕ at c. ♦

While there is no strict converse to 8.7.8, the next theorem provides a partial converse.

8.7.12 Theorem. An open connected subset E of a normed vector space X is path connected.

Proof. Fix a point x ∈ E and let U denote the set of all points u ∈ E for which there exists a path in E from x to u. We claim that U is open. Let u_0 ∈ U and choose r > 0 such that B_r(u_0) ⊆ E. By definition of U, there exists a path in E from x to u_0. Since B_r(u_0) is convex, there exists a line segment in B_r(u_0) from u_0 to any point u ∈ B_r(u_0). The sum of these paths is then a path in E from x to u. Therefore, B_r(u_0) ⊆ U, which shows that U is open. A similar argument shows that V := E \ U is open. Since E is connected and x ∈ U, V = ∅. Therefore, E = U.

FIGURE 8.9: E is path connected.


Exercises

1. Determine which sets are connected in R^2 (or R^3 where indicated):
(a) B_1(−1, 0) ∪ {(0, 0)} ∪ B_1(1, 0).
(b) R^2 \ {(1/m, 1/n) : m, n ∈ N}.
(c)S Q^2.
(d)S R^2 \ Q^2.
(e)S {(x, sin(1/x)) : x ≠ 0} ∪ {(0, a)}.
(f) R^2 \ G, where G is the graph of a bounded function f : [a, b] → R.
(g) R^2 \ G, where G is the graph of an equation F(x, y) = 0.
(h) {(x, y, z) : x^2 + y^2 − z^2 = 1}.
(i) {(x, y, z) : x^2 + y^2 − z^2 = −1}.
(j) {(x, y, z) : x^2 + y^2 − z^2 = 0, 0 < x^2 + y^2 ≤ 1}.

2. Prove that a metric space X is connected iff it has no proper nonempty subset that is both open and closed.

3. Prove that X is connected iff it cannot be expressed as the union of nonempty sets A and B such that A ∩ cl_X(B) = cl_X(A) ∩ B = ∅. Hint. Use 8.7.3 and the sequential characterization of continuity.

4. Prove that X × Y is connected in the product metric d × ρ iff X and Y are connected.

5.S Let X be connected and f : X → R continuous. Suppose there exist u, v ∈ X such that f(u)f(v) < 0. Show that the equation f(x) = 0 has a solution.

6. Let X be connected and f : X → Y continuous. Suppose f has the property that for each x ∈ X there exists ε > 0, possibly depending on x, such that f is constant on B_ε(x). Prove that f is constant on X.

7.S Let X be connected and let g, h : X → R be continuous such that g(x) ≠ h(x) for all x ∈ X. Prove that g > h or h > g on X.

8.⇓ Let X be a normed vector space and u, v ∈ X. A polygonal path P from u to v is a finite sequence of line segments L_k = [x_k : x_{k+1}], k = 1, . . . , n − 1, where x_1 = u and x_n = v. The path P is nonoverlapping if L_j ∩ L_k = ∅ unless j = k − 1, in which case L_j ∩ L_k = {x_k}. A subset E of a normed vector space X is polygonally connected if for each pair of points u and v in E there exists a polygonal path from u to v contained in E. For example, a convex set is polygonally connected. Prove that every open connected subset E of X is polygonally connected. Show also that it is always possible to choose P to be nonoverlapping. (Footnote 6: This exercise will be used in 12.2.10.)

9.S Show that for n > 1 the complement of an open ball or a closed ball in R^n is path connected, hence connected.

10. Suppose A ⊆ X is connected. By 8.7.6, cl(A) is connected. Prove or disprove: (a) int(A) is connected, (b) bd(A) is connected.

11. The exterior ext(E) of a subset E of a metric space X is defined as the interior of E^c. Show that X = int(E) ∪ bd(E) ∪ ext(E). Conclude that X is connected iff every subset of X with nonempty interior and nonempty exterior also has a nonempty boundary.

12.S Let {A_n} be a finite or infinite sequence of connected subsets of X such that A_n ∩ A_{n+1} ≠ ∅ for each n. Prove that ⋃_n A_n is connected.

13. Let {A_i : i ∈ I} be a collection of nonempty connected sets and i_0 ∈ I such that A_i ∩ A_{i_0} ≠ ∅ for all i. Prove that ⋃_i A_i is connected.

14. Let {A_n} be an infinite sequence of compact connected subsets of X such that A_{n+1} ⊆ A_n. Prove that ⋂_n A_n is connected.

15. Let X = A_1 ∪ · · · ∪ A_p and Y = B_1 ∪ · · · ∪ B_q, p < q, where A_j and B_j are connected, the A_j's are pairwise disjoint, and the B_j's are pairwise disjoint and closed. Show that no continuous function f : X → Y can map X onto Y.

16.S Prove that no one-to-one continuous function can map a closed line segment L onto a circle C. Show, however, that there are continuous functions that can do this.

17. Suppose closed line segments L_1, L_2, L_3 in the plane meet at a single endpoint P. Show that no one-to-one continuous function can map a closed line segment L onto L_1 ∪ L_2 ∪ L_3. Show, however, that there are continuous functions that can do this.

18. Let C_1 and C_2 be tangent circles in the plane. Show that no one-to-one continuous function can map C_1 ∪ C_2 onto a circle C. Show, however, that there are continuous functions that can do this.

19. Show that no one-to-one continuous function can map the set E := {(x, y, z) : x^2 + y^2 = z^2, x^2 + y^2 ≤ 1} onto a closed disk D. Show, however, that there are continuous functions that can do this.


20.S Let X be a normed vector space and f : X → R continuous. Let A := {x ∈ X : f(x) ≥ c} and B := {x ∈ X : f(x) = c}. Prove that bd(A) ⊆ B and that the inclusion may be strict.

21. Let X be connected and have at least two points. Show that X is uncountable. Hint. For all sufficiently small r > 0, X ≠ B_r(x) ∪ C_r^c(x).

22.S Let U be an open subset of a normed vector space X and let x ∈ U. The component of U containing x is the union C_x of all connected subsets of U containing x. (a) Prove that C_x is open and connected and that U is a union of pairwise disjoint components. (b) Show that the number of components is countable if X is a Euclidean space R^n.

23. Let (X, d) be complete, (Y, ρ) connected, c > 0, and let f : X → Y be a continuous mapping such that f(X) is open and ρ(f(u), f(v)) ≥ c d(u, v) for all u, v ∈ X. Prove that Y is complete.

8.8 The Stone–Weierstrass Theorem

Let (X, d) be a compact metric space and let C(X) denote the space of all continuous real-valued functions on X with the supremum norm ‖f‖_∞ = sup_{x∈X} |f(x)|. A member f of C(X) is said to be uniformly approximated by members of a subset S of C(X) if f ∈ cl(S). This is equivalent to the existence of a sequence {f_n} in S converging uniformly to f on X. Weierstrass's approximation theorem asserts that any function in C([a, b]) may be uniformly approximated by polynomials. Stone's generalization of Weierstrass's theorem replaces [a, b] by a compact metric space (footnote 7: more generally, by a compact Hausdorff topological space) and the set of polynomials by a more general class of functions.

The proof of Weierstrass's theorem given below is due to Lebesgue. The basic idea is to show that every continuous function may be uniformly approximated by piecewise linear functions and that these in turn may be uniformly approximated by polynomials.


8.8.1 Definition. Let a = x_0 < x_1 < . . . < x_k = b. A function g on [a, b] is said to be piecewise linear with vertices (x_j, y_j) if, for j = 0, 1, . . . , k − 1,

\[
g(x) = y_j + m_j (x - x_j), \qquad m_j = \frac{y_{j+1} - y_j}{x_{j+1} - x_j}, \qquad x_j \le x \le x_{j+1}.
\]

♦

Note that a piecewise linear function is necessarily continuous and that its graph consists of a sequence of line segments joined at the vertices. (See Figure 8.10.)

FIGURE 8.10: A piecewise linear function.

8.8.2 Lemma. Every continuous function f on [a, b] may be uniformly approximated by piecewise linear functions.

Proof. Given ε > 0, choose δ > 0 such that |f(x) − f(y)| < ε/2 whenever |x − y| ≤ δ. Let x_0 = a < x_1 < · · · < x_k = b be a partition of [a, b] with mesh < δ and let g be as in 8.8.1 with y_j = f(x_j). If x_j ≤ x ≤ x_{j+1}, then

\[
|m_j|(x - x_j) = |f(x_{j+1}) - f(x_j)| \, \frac{x - x_j}{x_{j+1} - x_j} \le |f(x_{j+1}) - f(x_j)| < \varepsilon/2,
\]

hence

\[
|f(x) - g(x)| \le |f(x) - f(x_j)| + |m_j|(x - x_j) < \varepsilon.
\]
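Lemma 8.8.2 can be illustrated numerically. The sketch below builds the interpolant g of 8.8.1 with y_j = f(x_j) on a uniform partition and estimates ‖f − g‖_∞ on a sample grid. The helper names `piecewise_linear` and `sup_error` are hypothetical, and a grid maximum only bounds the true supremum from below.

```python
import math


def piecewise_linear(xs, ys):
    """Return g as in 8.8.1 with vertices (xs[j], ys[j]); xs strictly increasing."""
    def g(x):
        # Find j with xs[j] <= x <= xs[j + 1] (last segment used past the end).
        j = 0
        while j < len(xs) - 2 and x > xs[j + 1]:
            j += 1
        m = (ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
        return ys[j] + m * (x - xs[j])
    return g


def sup_error(f, a, b, k, samples=2000):
    """Grid estimate of ||f - g||_inf, g interpolating f at k + 1 uniform nodes."""
    xs = [a + (b - a) * j / k for j in range(k + 1)]
    g = piecewise_linear(xs, [f(x) for x in xs])
    grid = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - g(x)) for x in grid)
```

For f = sin on [0, 3] the estimated error shrinks as the mesh is refined, in line with the lemma.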

8.8.3 Lemma. The function g in 8.8.1 may be written

\[
g(x) = y_0 + \sum_{j=0}^{k-1} c_j (x - x_j)^+, \qquad a \le x \le b,
\]

for suitably chosen constants c_j.

Proof. For 0 ≤ j ≤ k − 1 and x_j ≤ x ≤ x_{j+1}, the desired equation reduces to

\[
y_j + m_j (x - x_j) = y_0 + \sum_{i=0}^{j} c_i (x - x_i) = y_0 - \sum_{i=0}^{j} c_i x_i + x \sum_{i=0}^{j} c_i .
\]

This holds iff

\[
m_j = \sum_{i=0}^{j} c_i \quad \text{and} \quad y_j - m_j x_j = y_0 - \sum_{i=0}^{j} c_i x_i. \tag{8.9}
\]

The first equation in (8.9) is satisfied by taking c_0 = m_0 and c_j = m_j − m_{j−1}, j ≥ 1. For this choice,

\[
\begin{aligned}
y_0 - \sum_{i=0}^{j} c_i x_i
&= y_0 + \sum_{i=1}^{j} m_{i-1} x_i - \sum_{i=0}^{j} m_i x_i \\
&= y_0 - m_j x_j + \sum_{i=0}^{j-1} m_i (x_{i+1} - x_i) \\
&= y_0 - m_j x_j + \sum_{i=0}^{j-1} (y_{i+1} - y_i) \\
&= y_j - m_j x_j,
\end{aligned}
\]

which shows that the second equation in (8.9) is also satisfied.

8.8.4 Lemma. The functions |x| and x^+ may be uniformly approximated by polynomials on any bounded interval I.

Proof. By 7.4.10, the binomial series

\[
\sum_{n=0}^{\infty} \binom{1/2}{n} (-t)^n
\]

converges uniformly to \sqrt{1 - t} on [−1, 1]. Setting t = 1 − x^2 we see that

\[
\sum_{n=0}^{\infty} \binom{1/2}{n} (x^2 - 1)^n
\]

converges uniformly to \sqrt{x^2} = |x| on [−1, 1]. Thus if s_n(x) denotes the nth partial sum of the last series and m is chosen so that I ⊆ [−m, m], then Q_n(x) := m s_n(x/m) defines a sequence of polynomials converging uniformly to |x| on I. Since x^+ = (1/2)(x + |x|), the polynomials P_n(x) := (1/2)[x + Q_n(x)] converge uniformly to x^+ on I.

8.8.5 Weierstrass Approximation Theorem. The set of all polynomials on [a, b] is dense in C([a, b]). That is, every member of C([a, b]) may be uniformly approximated by polynomials.

Proof. Let f ∈ C([a, b]) and ε > 0. By 8.8.2, there exists a piecewise linear function g on [a, b] such that ‖f − g‖_∞ < ε/2. By 8.8.3 and 8.8.4, there exists a polynomial P such that ‖P − g‖_∞ < ε/2. Then, by the triangle inequality, ‖f − P‖_∞ < ε.
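The two ingredients of Lebesgue's proof, the representation of 8.8.3 and the polynomial approximation of |x| in 8.8.4, can be explored with the following sketch. The function names are hypothetical. The coefficients follow the choice c_0 = m_0, c_j = m_j − m_{j−1} made in the proof, and the binomial coefficients are generated by the standard recurrence binom(1/2, k) = binom(1/2, k − 1) · (1/2 − (k − 1))/k.

```python
def hinge_coefficients(xs, ys):
    """Coefficients of 8.8.3: c_0 = m_0 and c_j = m_j - m_{j-1} for j >= 1."""
    slopes = [(ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
              for j in range(len(xs) - 1)]
    return [slopes[0]] + [slopes[j] - slopes[j - 1]
                          for j in range(1, len(slopes))]


def hinge_form(xs, ys):
    """Return x -> y_0 + sum_j c_j (x - x_j)^+ for x in [xs[0], xs[-1]]."""
    cs = hinge_coefficients(xs, ys)
    return lambda x: ys[0] + sum(c * max(x - xj, 0.0)
                                 for c, xj in zip(cs, xs))


def abs_partial_sum(n):
    """Return s_n, the nth partial sum of sum_k binom(1/2, k) (x^2 - 1)^k."""
    coeffs = [1.0]  # coeffs[k] = binom(1/2, k)
    for k in range(1, n + 1):
        coeffs.append(coeffs[-1] * (0.5 - (k - 1)) / k)
    return lambda x: sum(c * (x * x - 1.0) ** k
                         for k, c in enumerate(coeffs))
```

The convergence of s_n to |x| on [−1, 1] is uniform but slow near x = 0, so a few hundred terms give only about two correct decimal places there. This does not affect the proof, which needs only uniform convergence.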


For the statement of the Stone–Weierstrass theorem, we need the following definitions.

8.8.6 Definition. A collection A of real-valued functions on a set S is said to be an algebra if A is closed under addition, multiplication, and scalar multiplication; that is, f, g ∈ A and α ∈ R ⇒ f + g, fg, αf ∈ A. A is said to separate points of S if for each pair of distinct points s and t in S there exists f ∈ A such that f(s) ≠ f(t). ♦

For example, the collection of all polynomials on [a, b] is an algebra that separates points of [a, b].

8.8.7 Stone–Weierstrass Theorem. Let X be a compact metric space and let A be an algebra in C(X) that contains the constant functions and separates points of X. Then A is dense in C(X).

Proof. Set B := cl(A). The proof that B = C(X) consists of the following sequence of steps.

I. B is an algebra in C(X).

⟦ If f_n, g_n ∈ A, f_n → f, g_n → g, and α ∈ R, then

(a) ‖αf_n − αf‖_∞ = |α| ‖f_n − f‖_∞ → 0,

(b) ‖(f_n + g_n) − (f + g)‖_∞ ≤ ‖f_n − f‖_∞ + ‖g_n − g‖_∞ → 0, and

(c) ‖f_n g_n − fg‖_∞ ≤ ‖f_n g_n − f g_n‖_∞ + ‖f g_n − fg‖_∞ ≤ ‖g_n‖_∞ ‖f_n − f‖_∞ + ‖f‖_∞ ‖g_n − g‖_∞ → 0,

the convergence in (c) holding because {g_n} is uniformly bounded. (Each g_n is bounded and g_n converges uniformly to a bounded function.) Thus B is closed under addition, multiplication, and scalar multiplication. ⟧

II. f ∈ B ⇒ |f | ∈ B.

⟦ Let M = ‖f‖_∞. By 8.8.4 there exists a sequence of polynomials P_n(x) converging uniformly to |x| on [−M, M]. It follows that P_n ◦ f converges uniformly to |f| on X. Because B is an algebra containing the constants, P_n ◦ f ∈ B. Indeed, if P_n(x) = Σ_{j=0}^{k} a_j x^j, then P_n ◦ f = Σ_{j=0}^{k} a_j f^j. Since B is closed, |f| ∈ B. ⟧

III. f1 , . . . , fk ∈ B ⇒ max{f1 , . . . , fk }, min{f1 , . . . , fk } ∈ B.

⟦ By induction, it suffices to consider the case k = 2. This follows from step II and the identities

max{f_1, f_2} = (1/2)(f_1 + f_2 + |f_1 − f_2|), min{f_1, f_2} = (1/2)(f_1 + f_2 − |f_1 − f_2|). ⟧


IV. Let f ∈ C(X). Then for each pair of distinct points x, y in X there exists a function g_{xy} ∈ A such that g_{xy}(x) = f(x) and g_{xy}(y) = f(y).

⟦ Choose a function h ∈ A such that h(x) ≠ h(y) (A separates points). Define

\[
g_{xy}(z) = f(x) + \frac{f(x) - f(y)}{h(x) - h(y)} \bigl(h(z) - h(x)\bigr), \qquad z \in X.
\]

Because A contains the constant functions, g_{xy} ∈ A. Clearly, g_{xy}(x) = f(x) and g_{xy}(y) = f(y). ⟧
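The two-point interpolation of step IV is easy to test in a concrete case. The sketch below uses a hypothetical helper `two_point_interpolant`; any f and any h separating x and y may be supplied.

```python
import math


def two_point_interpolant(f, h, x, y):
    """Return g_xy = f(x) + (f(x) - f(y)) / (h(x) - h(y)) * (h - h(x)).

    Requires h(x) != h(y); then g_xy(x) = f(x) and g_xy(y) = f(y).
    """
    hx, hy = h(x), h(y)
    if hx == hy:
        raise ValueError("h does not separate x and y")
    slope = (f(x) - f(y)) / (hx - hy)
    return lambda z: f(x) + slope * (h(z) - hx)
```

For instance, with f = exp and h(z) = z^2 on [0, 1], the function g_xy is a constant plus a multiple of h, so it belongs to any algebra that contains h and the constants.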

V. If f ∈ C(X), x ∈ X, and ε > 0, then there exists a function g_x ∈ B such that g_x(x) = f(x) and g_x(z) < f(z) + ε for all z ∈ X.

⟦ By continuity, for each y ∈ X the set U_y := {z ∈ X : g_{xy}(z) < f(z) + ε} is open in X, where g_{xy} is the function in step IV. Moreover, U_y contains both x and y. Since X is compact, there exist y_1, . . . , y_k ∈ X such that X = U_{y_1} ∪ · · · ∪ U_{y_k}. Set g_x := min{g_{xy_1}, . . . , g_{xy_k}}. Then g_x clearly has the required properties and, by step III, g_x ∈ B. ⟧

VI. If f ∈ C(X) and ε > 0, then there exists a function g ∈ B such that f(z) − ε < g(z) < f(z) + ε for all z ∈ X.

⟦ By continuity, for each x ∈ X the set V_x := {z ∈ X : g_x(z) > f(z) − ε} is open in X, where g_x is the function in step V. Moreover, V_x clearly contains x and f(z) − ε < g_x(z) < f(z) + ε for all z ∈ V_x. Since X is compact, there exist x_1, . . . , x_m ∈ X such that X = V_{x_1} ∪ · · · ∪ V_{x_m}. Set g := max{g_{x_1}, . . . , g_{x_m}}. By step III, g ∈ B, and g clearly satisfies the desired inequality. ⟧

To complete the proof of the theorem, observe that step VI asserts that C(X) = cl(B). Since B is closed, C(X) = B.


8.8.8 Example. A trigonometric polynomial is a function on R of the form

\[
T(x) = a_0 + \sum_{j=1}^{m} \bigl[a_j \cos(jx) + b_j \sin(jx)\bigr], \qquad a_j, b_j \in \mathbb{R}.
\]

The collection T([a, b]) of all trigonometric polynomials on the interval [a, b] clearly contains the constant functions and is closed under addition and scalar multiplication. Since sin jx cos kx = (1/2)[sin(j − k)x + sin(j + k)x], with similar identities holding for sin jx sin kx and cos jx cos kx, T([a, b]) is an algebra. If 0 < b − a < 2π, then {cos x, sin x}, and hence T([a, b]), separate points of [a, b]. By the Stone–Weierstrass theorem, every member of C([a, b]) may be uniformly approximated by trigonometric polynomials on [a, b].

If b − a = 2π, then T([a, b]) no longer separates points of [a, b]. However, in this case every member f of C([a, b]) with f(a) = f(b) may be uniformly approximated by trigonometric polynomials. We verify this for the interval [0, 2π]. Let E denote the algebra of continuous functions f : [0, 2π] → R with f(0) = f(2π), and let X denote the circle x^2 + y^2 = 1 with the Euclidean R^2 metric. For each f ∈ E, define F_f : X → R by

F_f(cos t, sin t) = f(t), 0 ≤ t ≤ 2π.

It is straightforward to verify that F_f is continuous. For example, if (cos t_n, sin t_n) → (1, 0), then every convergent subsequence {t_{n_k}} converges either to 0 or to 2π, hence F_f(cos t_{n_k}, sin t_{n_k}) = f(t_{n_k}) → f(0) = f(2π) = F_f(1, 0). The set

A := {F_T : T ∈ T([0, 2π])}

is easily seen to be an algebra that contains the constant functions. Moreover, A separates points of X. Indeed, if x := (cos s, sin s) and y := (cos t, sin t) with x ≠ y, then, say, cos s ≠ cos t, hence F_T(x) ≠ F_T(y), where T(x) = cos x. Therefore, each F_f may be uniformly approximated on X by members of A. It follows that each member of E may be uniformly approximated on [0, 2π] by trigonometric polynomials. ♦
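The algebra property of the trigonometric polynomials in Example 8.8.8 rests on the standard product-to-sum identities. The following sketch evaluates the residuals of all three identities at a point; the function name `product_to_sum_residuals` is hypothetical.

```python
import math


def product_to_sum_residuals(j, k, x):
    """Residuals (lhs - rhs) of the three product-to-sum identities at x."""
    s, d = (j + k) * x, (j - k) * x
    r_sc = math.sin(j * x) * math.cos(k * x) - 0.5 * (math.sin(d) + math.sin(s))
    r_ss = math.sin(j * x) * math.sin(k * x) - 0.5 * (math.cos(d) - math.cos(s))
    r_cc = math.cos(j * x) * math.cos(k * x) - 0.5 * (math.cos(d) + math.cos(s))
    return r_sc, r_ss, r_cc
```

Each residual vanishes up to rounding, which is why a product of two trigonometric polynomials is again a trigonometric polynomial.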

Exercises

1. Give an example of a bounded continuous function that cannot be approximated uniformly by polynomials on (0, 1).

2. Let f be continuous on [a, +∞) such that lim_{x→+∞} f^{(m)}(x) ≠ 0 for all sufficiently large m ∈ N. Prove that f cannot be uniformly approximated by polynomials on [a, +∞). Give an example of such a function.

3.S Let f ∈ C([a, b]) have the property that ∫_a^b x^n f(x) dx = 0 for all n ∈ Z^+. Prove that f = 0 on [a, b]. Show that if a ≥ 0, then it is enough that the given property holds for even integers n in Z^+.

4. Let f : [a, b] → R have continuous derivatives up to order k such that

∫_a^b x^n f^{(k)}(x) dx = 0 for all n ∈ Z^+.

Prove that f is a polynomial.

5. Let f : [a, b] → R have continuous derivatives up to order k. Prove that there exists a sequence of polynomials P_n such that lim_n P_n^{(j)} = f^{(j)} uniformly on [a, b] for j = 0, 1, . . . , k.

6.S Let X be compact and let A be an algebra in C(X) that contains the constant functions and separates the points of X. Let x_0 ∈ X and let f ∈ C(X) satisfy f(x_0) = 0. Prove that there exists a sequence f_n ∈ A converging uniformly to f such that f_n(x_0) = 0 for all n.

7. Show that there exists a sequence of polynomials P_n converging uniformly to sin x on [0, π] such that P_n(0) = P_n(π) = 0 for all n.

8. Let f be an odd (even) continuous function on [−a, a], a > 0. Prove that there is a sequence of odd (even) polynomials that converges uniformly to f on [−a, a].

9.S Let f ∈ C([0, 2π]) have the properties f(0) = f(2π) and

∫_0^{2π} f(x) sin^m x cos^n x dx = 0 for all m, n ∈ Z^+.

Prove that f is identically zero on [0, 2π].

10. Let f : R → R be continuous and periodic with period 2π. Prove that there exists a sequence of trigonometric polynomials that converges uniformly to f on R.

11.S Let f ∈ C([−π/2, π/2]) with f(0) = 0. Prove that f can be uniformly approximated on [−π/2, π/2] by functions of the form Σ_{j=1}^{m} b_j sin(jx).

12. Let g be continuous and one-to-one on [a, b]. Prove that any function in C([a, b]) may be uniformly approximated by functions of the form Σ_{j=0}^{m} a_j g^j.

13. Prove the following version of the Stone–Weierstrass theorem: If V is a linear subspace of C(X) that contains the constant functions, separates points of X, and contains |f| for all f ∈ V, then V is dense in C(X).


14. Show that for any f ∈ C([0, 2π]) there exists a sequence of trigonometric polynomials T_n such that ∫_0^{2π} |f − T_n| → 0.

15.S Let X and Y be compact metric spaces and let f(x, y) ∈ C(X × Y) be a continuous real-valued function on X × Y. Show that for every ε > 0 there exist g_1, . . . , g_n ∈ C(X) and h_1, . . . , h_n ∈ C(Y) such that

|f(x, y) − Σ_{i=1}^{n} g_i(x) h_i(y)| < ε for all (x, y) ∈ X × Y.

16. Let E_0 denote the algebra of all continuous functions f : [a, b] → R such that f(a) = f(b) = 0. If A_0 is an algebra in E_0 that separates points of (a, b), show that A_0 is dense in E_0 in the uniform norm. Hint. Use ideas of 8.8.8 by considering the algebra generated by A_0 and the constant functions.

17. Let C_0(R) denote the algebra of all continuous functions f on R such that lim_{t→±∞} f(t) = 0. Let B_0 be an algebra in C_0(R) that separates points of R. Show that B_0 is dense in C_0(R) in the uniform norm. Hint. Consider θ(t) = tan^{−1}[(t − π)/2], 0 < t < 2π, and use Exercise 16.

*8.9 Baire's Theorem

Let (X, d) be a metric space. The diameter d(E) of a nonempty subset E of X is defined by

d(E) = sup_{x,y∈E} d(x, y).

8.9.1 Lemma. If X is complete, then the intersection C of any decreasing sequence of nonempty closed sets C_n in X with d(C_n) → 0 contains a single point.

Proof. For each n choose a point x_n ∈ C_n. If m > n, then x_m ∈ C_n, hence d(x_m, x_n) ≤ d(C_n). Since d(C_n) → 0, {x_n} is Cauchy. Let x_n → x. Since x_n, x_{n+1}, . . . ∈ C_n and C_n is closed, x ∈ C_n for all n, that is, x ∈ C. Since d(C) ≤ d(C_n) → 0, C = {x}.

8.9.2 Baire Category Theorem. Let X be a complete metric space. Then the following statements hold:

(a) If U_n ⊆ X is open and dense in X for all n, then G := ⋂_n U_n is dense in X.

(b) If C_n ⊆ X is closed and has empty interior for all n, then F := ⋃_n C_n has empty interior.


Proof. To prove (a), we show that B ∩ G ≠ ∅ for any open ball B. Since B ∩ U_1 is open and nonempty, C_1 := C_{r_1}(x_1) ⊆ B ∩ U_1 for some x_1 ∈ X and 0 < r_1 ≤ 1. Since B_{r_1}(x_1) ∩ U_2 is open and nonempty, C_2 := C_{r_2}(x_2) ⊆ B_{r_1}(x_1) ∩ U_2 for some x_2 ∈ X and 0 < r_2 < 1/2. Continuing in this manner, we obtain a decreasing sequence of closed balls C_n ⊆ B ∩ U_n with diameters tending to zero. By 8.9.1, ⋂_n C_n contains a point x. Then x ∈ B ∩ U_n for all n, hence x ∈ B ∩ G.

Part (b) follows from (a). Indeed, suppose int(C_n) = ∅ for all n. Then U_n := C_n^c is open and dense in X, hence ⋂_n U_n is dense in X. It follows that the interior of (⋂_n U_n)^c = F is empty.

We give three applications of Baire's theorem. The first is known as the principle of uniform boundedness.

8.9.3 Theorem. Let X and Y be complete normed vector spaces and let L be a family of continuous linear transformations from X to Y such that

sup_{T∈L} ‖Tx‖ < ∞ for each x ∈ X.

Then there exists M > 0 such that ‖Tx‖ ≤ M‖x‖ for all x ∈ X and T ∈ L.

Proof. For each n, set C_n = {x ∈ X : ‖Tx‖ ≤ n for all T ∈ L}. By hypothesis, X = ⋃_n C_n. By continuity of the transformations T, each C_n is closed. Therefore, Baire's theorem shows that int(C_n) ≠ ∅ for some n. Thus there exist x_0 and r > 0 such that ‖Ty‖ ≤ n for all T ∈ L and y ∈ X with ‖y − x_0‖ ≤ r. If ‖x‖ ≤ r, then, taking y = x + x_0, we have

‖Tx‖ ≤ ‖Tx + Tx_0‖ + ‖Tx_0‖ = ‖Ty‖ + ‖Tx_0‖ ≤ n + ‖Tx_0‖.

It follows that for all x ≠ 0 and T ∈ L,

\[
\Bigl\| T\Bigl(\frac{r x}{\|x\|}\Bigr) \Bigr\| \le n + \|T x_0\|,
\quad \text{hence} \quad
\|T x\| \le r^{-1}\bigl(n + \|T x_0\|\bigr)\|x\|.
\]

Since ‖Tx_0‖ ≤ n for every T ∈ L (take y = x_0), the assertion holds with M = 2n/r.

The following corollary is one of the few instances in analysis (Dini's theorem being another) when pointwise convergence of a sequence of continuous functions is sufficient to convey the property of continuity to the limit function.

8.9.4 Corollary. Let X and Y be complete normed vector spaces and let {T_n} be a sequence of continuous linear transformations from X to Y converging pointwise on X to a function T. Then T is linear and continuous.


Proof. Linearity of T is clear. For continuity, note that sup_n ‖T_n x‖ < +∞ for each x ∈ X, hence, by the theorem, there exists M > 0 such that ‖T_n x‖ ≤ M‖x‖ for all n and x. Letting n → +∞ yields ‖Tx‖ ≤ M‖x‖, hence T is continuous.

For the second application of Baire's theorem, recall that there exist functions f : R → R whose set of discontinuity points is precisely Q (3.3.3). The obvious question raised by this fact is answered in the following theorem.

8.9.5 Theorem. There is no function f : R → R whose set of continuity points is precisely Q.

Proof. For each n, let U_n denote the union of all intervals (a, b) such that |f(x) − f(y)| < 1/n for all x, y ∈ (a, b). Then U_n is open and the set of continuity points of f is precisely C := ⋂_{n=1}^{∞} U_n. Suppose that C = Q. Then each U_n contains Q and hence is dense in R. Let {r_1, r_2, . . .} be an enumeration of Q. Then the open sets V_m := R \ {r_m} are also dense in R and have intersection I. By Baire's theorem, the collection of sets {U_n, V_m : m, n ∈ N} has a nonempty intersection. But this intersection is Q ∩ I = ∅. Therefore, C cannot equal Q.

The last application of Baire's theorem shows that there is a rich supply of continuous, nowhere differentiable functions. For the proof we need the following lemma.

8.9.6 Lemma. If g is piecewise linear on [a, b], then there exists M > 0 such that |g(x) − g(y)| ≤ M|x − y| for all x, y ∈ [a, b].

Proof. Let g be as in 8.8.1 and set M = max_j {|m_j|}. If x_i ≤ x ≤ x_{i+1} ≤ x_j ≤ y ≤ x_{j+1}, then

\[
\begin{aligned}
|g(x) - g(y)| &\le |g(x) - g(x_{i+1})| + |g(x_{i+1}) - g(x_{i+2})| + \cdots + |g(x_j) - g(y)| \\
&\le |m_i|(x_{i+1} - x) + |m_{i+1}|(x_{i+2} - x_{i+1}) + \cdots + |m_j|(y - x_j) \\
&\le M(y - x).
\end{aligned}
\]

8.9.7 Theorem. The set of all continuous, nowhere differentiable functions on an interval [a, b] is dense in C([a, b]) in the uniform norm.

Proof. For each n ∈ N and f ∈ C([a, b]) define

E_n(f) = {x ∈ [a, b] : |f(y) − f(x)| ≤ n|x − y| for all y ∈ [a, b]}.

We break the proof into several steps:

I. ⋃_{n=1}^{∞} E_n(f) contains all points at which f is differentiable.

⟦ Let x be such a point and choose δ > 0 such that

\[
\Bigl| \frac{f(y) - f(x)}{y - x} - f'(x) \Bigr| < 1 \quad \text{for all } y \in [a, b] \text{ with } 0 < |x - y| < \delta.
\]

Then

\[
|f(y) - f(x)| \le
\begin{cases}
\bigl(1 + |f'(x)|\bigr)\,|y - x| & \text{if } |x - y| < \delta,\\
2\|f\|_\infty \le 2\delta^{-1}\|f\|_\infty\,|y - x| & \text{if } |x - y| \ge \delta,
\end{cases}
\]

which shows that x ∈ E_n(f) for all n > 1 + |f'(x)| + 2δ^{−1}‖f‖_∞. ⟧

II. ℰ_n := {f ∈ C([a, b]) : E_n(f) ≠ ∅} is closed in C([a, b]).

⟦ Let {f_k} be a sequence in ℰ_n converging uniformly to f ∈ C([a, b]). For each k, choose a point x_k ∈ E_n(f_k). We may assume that x_k → x for some x ∈ [a, b] (otherwise, take a subsequence). Then for all y ∈ [a, b],

\[
\begin{aligned}
|f(y) - f(x)| &\le |f(y) - f_k(y)| + |f_k(y) - f_k(x_k)| + |f_k(x_k) - f_k(x)| + |f_k(x) - f(x)| \\
&\le 2\|f - f_k\|_\infty + n|y - x_k| + n|x_k - x|.
\end{aligned}
\]

Letting k → ∞ shows that |f(y) − f(x)| ≤ n|y − x|, that is, x ∈ E_n(f). Therefore, f ∈ ℰ_n. ⟧

III. ℰ_n^c is dense in C([a, b]).

⟦ Let f ∈ C([a, b]) and ε > 0. We construct a function h ∈ B_ε(f) ∩ ℰ_n^c. By 8.8.2, there exists a piecewise linear function g such that ‖f − g‖_∞ < ε/2. By 8.9.6, there exists M > 0 such that |g(x) − g(y)| ≤ M|x − y| for all x, y ∈ [a, b]. Let r > 0 and let x_0 = a < x_1 < · · · < x_{2p} = b be a partition of [a, b] with mesh < r. Construct a "sawtooth" piecewise linear function h_r with vertices

(x_0, 1), (x_2, 1), . . . , (x_{2p}, 1) and (x_1, −1), (x_3, −1), . . . , (x_{2p−1}, −1),

and set h := g + εh_r/2.

FIGURE 8.11: The sawtooth function h_r.

Then

\[
\|h - f\|_\infty \le \|h - g\|_\infty + \|g - f\|_\infty = \frac{\varepsilon}{2}\|h_r\|_\infty + \|g - f\|_\infty < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,
\]

so h ∈ B_ε(f). To show that h ∈ ℰ_n^c, let x be an arbitrary member of [a, b]. If h_r(x) ≤ 0 (≥ 0) choose x_j such that |x − x_j| < r and h_r(x_j) = 1 (= −1) (see Figure 8.11). Then |h_r(x) − h_r(x_j)| ≥ 1, hence

\[
\begin{aligned}
|h(x) - h(x_j)| &\ge \frac{\varepsilon}{2}\,|h_r(x) - h_r(x_j)| - |g(x) - g(x_j)| \\
&\ge \frac{\varepsilon}{2} - M|x - x_j| \\
&\ge \Bigl(\frac{\varepsilon}{2r} - M\Bigr)|x - x_j|.
\end{aligned}
\]

If r is chosen so that ε/(2r) − M > n, then x ∉ E_n(h), hence h ∉ ℰ_n. ⟧

To complete the proof note that by step III and Baire's theorem, F := ⋂_{n=1}^{∞} ℰ_n^c is dense in C([a, b]). Since f ∈ F implies that E_n(f) = ∅ for every n, and since a point at which f is differentiable must lie in some E_n(f), no member of F can be differentiable at any point of [a, b].
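The sawtooth perturbation of step III can be tried out numerically. The sketch below assumes a uniform partition of [a, b], takes g(x) = x (so M = 1) purely for illustration, and uses the hypothetical helper names `sawtooth` and `max_quotient_near`.

```python
def sawtooth(a, b, p):
    """Piecewise linear h_r equal to 1 at even nodes and -1 at odd nodes.

    Uses the uniform partition x_j = a + j (b - a) / (2p), j = 0, ..., 2p,
    and returns the function together with the mesh width.
    """
    step = (b - a) / (2 * p)

    def h_r(x):
        s = (x - a) / step          # position measured in mesh widths
        j = min(int(s), 2 * p - 1)  # index of the subinterval containing x
        frac = s - j
        left = 1.0 if j % 2 == 0 else -1.0
        return left * (1.0 - 2.0 * frac)  # runs linearly from left to -left

    return h_r, step


def max_quotient_near(h, x, nodes, r):
    """Largest |h(x) - h(x_j)| / |x - x_j| over nodes with 0 < |x - x_j| < r."""
    return max(abs(h(x) - h(xj)) / abs(x - xj)
               for xj in nodes if 0 < abs(x - xj) < r)


# Illustration: g(x) = x on [0, 1], eps = 0.1, n = 5, mesh 0.005 < r = 0.006,
# so that eps / (2 r) - M is about 7.3 > n.
eps = 0.1
h_r, step = sawtooth(0.0, 1.0, 100)
h = lambda x: x + eps * h_r(x) / 2
nodes = [j * step for j in range(201)]
```

With these choices h stays within ε/2 of g, yet every sampled point x has a nearby node at which the difference quotient of h exceeds n = 5, so x ∉ E_5(h).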

Exercises

1.S Prove the converse of 8.9.1: If the intersection of any decreasing sequence of nonempty closed sets C_n in X with d(C_n) → 0 contains a single point, then X is complete.

2. Let Q have the usual metric. Find a decreasing sequence of closed sets C_n in Q with d(C_n) → 0 and ⋂_n C_n = ∅.

3.S Show that 8.9.2 does not hold in Q with the usual metric.

4. Let D = {x_1, x_2, . . .} be a proper subset of a complete metric space X. Show that (a) and (b) of 8.9.2 hold for Y := X \ D. Conclude that the set of irrationals I with the usual metric satisfies (a) and (b) of the theorem.

Chapter 9 Differentiation on R^n

For the remainder of the book, the Euclidean norm ‖·‖_2 on the spaces R^n will be denoted simply by ‖·‖. In this chapter we extend the ideas of Chapter 4 to vector-valued functions of several variables. This will require some notions from linear algebra, a brief review of which may be found in Appendix B.

9.1 Definition of the Derivative

To motivate the general definition of the derivative of a function on R^n, we begin with two important special cases.

Derivative of a Vector-Valued Function of a Real Variable

The definition of derivative in this case is a natural extension of the definition of the derivative of a scalar-valued function:

9.1.1 Definition. Let I ⊆ R be an interval and a ∈ I. A function f : I → R^m is said to be differentiable at a if the (vector) limit

\[
f'(a) := \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{t \to a} \frac{f(t) - f(a)}{t - a}
\]

exists in R^m. (The limit is one-sided if a is an endpoint of I.) The vector f'(a) is called the derivative of f at a. If f is differentiable at each point in I, then f is said to be differentiable on I and the resulting function f' : I → R^m is called the derivative of f on I. ♦

The function f may be viewed as a parametrization of a curve C in R^m. The vector f'(a) is then called the tangent vector to C at the point f(a). If the variable t is interpreted as time, then C may be viewed as the path of a particle in R^m. In this context, f'(a) is called the velocity of the particle and ‖f'(a)‖ the speed. The curve is said to be smooth if f' is continuous and nonzero on I. Parameterized curves will be examined in detail in Chapter 12.
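As a numerical illustration of Definition 9.1.1, the following sketch estimates the velocity and speed of a helix by a central difference quotient. The function names and the choice of curve are assumptions made for the example, and the difference quotient only approximates the limit.

```python
import math


def derivative(f, a, h=1e-6):
    """Central-difference estimate of f'(a) for f : R -> R^m (tuples)."""
    fp, fm = f(a + h), f(a - h)
    return tuple((p - m) / (2 * h) for p, m in zip(fp, fm))


def helix(t):
    """The curve t -> (cos t, sin t, t) in R^3."""
    return (math.cos(t), math.sin(t), t)


velocity = derivative(helix, 1.0)                # approx (-sin 1, cos 1, 1)
speed = math.sqrt(sum(v * v for v in velocity))  # approx sqrt(2)
```

The estimate is computed componentwise, so each entry of `velocity` is just the ordinary derivative of the corresponding component of the curve.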


Note that the function f : I → R^m may be written f = (f_1, . . . , f_m), where f_j : I → R is the jth component function of f.

9.1.2 Proposition. Let I be an interval and f = (f_1, . . . , f_m) : I → R^m. Then f is differentiable at a ∈ I iff each f_j is differentiable at a, in which case

f'(a) = (f_1'(a), . . . , f_m'(a)).

In particular, if f is differentiable at a, then f is continuous at a.

Proof. The assertions follow directly from the inequalities

\[
\Bigl| \frac{f_j(a + h) - f_j(a)}{h} - x_j \Bigr|^2
\le \Bigl\| \frac{f(a + h) - f(a)}{h} - (x_1, \ldots, x_m) \Bigr\|^2
\le \sum_{i=1}^{m} \Bigl| \frac{f_i(a + h) - f_i(a)}{h} - x_i \Bigr|^2 .
\]

The differential of f at a is the linear transformation df_a : R → R^m that takes a real number h to the vector hf'(a): df_a(h) = hf'(a), h ∈ R. Definition 9.1.1 may then be rephrased as follows: f is differentiable at a iff there exists a linear transformation T : R → R^m such that

\[
\lim_{h \to 0} \frac{f(a + h) - f(a) - Th}{|h|} = 0,
\]

in which case T = df_a.

Derivative of a Real-Valued Function of Several Variables

The derivative of a scalar-valued function of n variables is defined as follows:

9.1.3 Definition. Let U ⊆ R^n be open and a ∈ U. Then f : U → R is said to be differentiable at a if there exists a vector f'(a) in R^n such that

\[
\lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a) \cdot h}{\|h\|} = 0. \tag{9.1}
\]

The vector f'(a) is called the derivative of f at a. The differential of f at a is the linear transformation df_a ∈ L(R^n, R) defined by

df_a(h) = f'(a) · h, h ∈ R^n. ♦

Now let e_j = (0, . . . , 0, 1, 0, . . . , 0) (with 1 in the jth position), j = 1, . . . , n, denote the standard basis vectors in R^n. If f'(a) exists, then, taking h = te_j in (9.1), we have

\[
\lim_{t \to 0} \frac{f(a + t e_j) - f(a) - t f'(a) \cdot e_j}{t} = 0,
\]

or, equivalently,

\[
\lim_{t \to 0} \frac{f(a + t e_j) - f(a)}{t} = f'(a) \cdot e_j. \tag{9.2}
\]

The expression on the right is just the jth component of f'(a). The limit on the left is called the jth partial derivative of f at a and is denoted variously by

\[
\partial_j f = f_{x_j} = \frac{\partial f}{\partial x_j}.
\]

We have proved the following result.

9.1.4 Proposition. If f is differentiable at a, then the partial derivatives ∂_j f(a) of f exist at a and

\[
f'(a) = \bigl(\partial_1 f(a), \partial_2 f(a), \ldots, \partial_n f(a)\bigr). \tag{9.3}
\]

In particular, the derivative is unique.

The vector on the right in (9.3) is called the gradient of f at a and is denoted by ∇f or grad f. The linear transformation df_a ∈ L(R^n, R) may now be written

\[
df_a(h) = \nabla f(a) \cdot h, \qquad h \in \mathbb{R}^n. \tag{9.4}
\]

For an alternate notation, let dx_j : R^n → R be the linear function defined by dx_j(h) = h_j, h = (h_1, . . . , h_n). Then df_a may be expressed as

\[
df_a(h) = \sum_{j=1}^{n} \frac{\partial f(a)}{\partial x_j}\, dx_j(h).
\]

If the partial derivatives of f exist at each point of U, we write simply

\[
df = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\, dx_j.
\]

For example, d sin(x^2 y) = 2xy cos(x^2 y) dx + x^2 cos(x^2 y) dy. We show below that if f has continuous partial derivatives on U, then f is differentiable on U. The continuity hypothesis cannot be removed: There are functions f that are not differentiable on U but whose partial derivatives exist throughout U. This is the case for the function in the following example.
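The formula for d sin(x^2 y) can be compared with finite differences. This is a minimal sketch with hypothetical function names; the central differences only approximate the partial derivatives.

```python
import math


def f(x, y):
    return math.sin(x * x * y)


def grad_f(x, y):
    """Gradient read off from d sin(x^2 y) = 2xy cos(x^2 y) dx + x^2 cos(x^2 y) dy."""
    c = math.cos(x * x * y)
    return (2 * x * y * c, x * x * c)


def numeric_partials(func, x, y, h=1e-6):
    """Central-difference estimates of the two partial derivatives."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)


def differential(x, y, h1, h2):
    """df_a(h) = grad f(a) . h for a = (x, y), h = (h1, h2)."""
    gx, gy = grad_f(x, y)
    return gx * h1 + gy * h2
```

The quotient in (9.1), computed with `differential`, becomes small as h shrinks, which is what differentiability at a requires.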


9.1.5 Example. Let $m \in \mathbb{N}$. The function

$$f(x,y) = \begin{cases} \dfrac{x^m y}{x^2+y^2} & \text{if } (x,y) \neq (0,0), \\[1ex] 0 & \text{otherwise} \end{cases}$$

exhibits a variety of behavior depending on the value of $m$. The partial derivatives of $f$ are

$$f_x(x,y) = \begin{cases} \dfrac{m x^{m+1} y + m x^{m-1} y^3 - 2x^{m+1} y}{(x^2+y^2)^2} & \text{if } (x,y) \neq (0,0), \\[1ex] 0 & \text{otherwise,} \end{cases}
\qquad
f_y(x,y) = \begin{cases} \dfrac{x^m (x^2-y^2)}{(x^2+y^2)^2} & \text{if } (x,y) \neq (0,0), \\[1ex] 0 & \text{otherwise.} \end{cases}$$

If $m = 1$, $f$ is not continuous at $(0,0)$, hence is not differentiable there (see 9.1.11, below). If $m = 1$ or $2$, the partial derivatives exist at $(0,0)$ but are not continuous there. If $m = 2$, the function is continuous at $(0,0)$, with zero partial derivatives at $(0,0)$, but is not differentiable there since in this case the limit

$$\lim_{\mathbf{x} \to 0} \frac{f(\mathbf{x}) - f(0) - 0\cdot \mathbf{x}}{\|\mathbf{x}\|} = \lim_{(x,y)\to(0,0)} \frac{x^2 y}{(x^2+y^2)^{3/2}}$$

fails to exist. If $m \ge 3$, $f$ has continuous partial derivatives and is differentiable on $\mathbb{R}^2$. ♦

The definition of the $j$th partial derivative of $f$ at $a$ may be written explicitly as

$$\partial_j f(a) = \lim_{h \to 0} \frac{f(a_1, \ldots, a_j + h, \ldots, a_n) - f(a_1, \ldots, a_j, \ldots, a_n)}{h}.$$

This is simply the derivative at $a_j$ of the one-variable function $t \mapsto f(a_1, \ldots, a_{j-1}, t, a_{j+1}, \ldots, a_n)$. Thus to find the $j$th partial derivative of $f(x_1, \ldots, x_j, \ldots, x_n)$, one simply differentiates $f$ with respect to $x_j$ while holding the other variables fixed. It follows that the standard formulas for derivatives of functions of one variable hold for partial derivatives of functions of several variables. For example, the product rule takes the form

$$\partial_j (fg)(a) = f(a)\,\partial_j g(a) + g(a)\,\partial_j f(a),$$

and the quotient rule becomes

$$\partial_j \Big(\frac{f}{g}\Big)(a) = \frac{g(a)\,\partial_j f(a) - f(a)\,\partial_j g(a)}{g^2(a)}, \qquad g(a) \neq 0.$$
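The failure of differentiability in Example 9.1.5 for $m = 2$ can be seen numerically: along a ray through the origin the quotient $x^2y/(x^2+y^2)^{3/2}$ is constant, and the constant depends on the ray. The sketch below evaluates the quotient along two rays; the function names and sample radii are illustrative assumptions.

```python
import math

def f(x, y, m=2):
    # The function of Example 9.1.5
    if x == 0 and y == 0:
        return 0.0
    return x**m * y / (x * x + y * y)

def quotient(x, y, m=2):
    # (f(h) - f(0) - 0 . h) / ||h||, using the candidate derivative (0, 0)
    return f(x, y, m) / math.hypot(x, y)

def along_ray(angle, m=2, radii=(1e-1, 1e-3, 1e-6)):
    # Values of the quotient at points r (cos angle, sin angle) approaching the origin
    return [quotient(r * math.cos(angle), r * math.sin(angle), m) for r in radii]

on_axis = along_ray(0.0)                    # stays at 0 along the x-axis
on_diagonal = along_ray(math.pi / 4)        # stays near 2**-1.5 along y = x
smooth_case = along_ray(math.pi / 4, m=3)   # tends to 0 when m = 3
```

Since the two rays give different limiting values when $m = 2$, the limit in the example cannot exist; for $m = 3$ the same quotient tends to zero, in line with the last sentence of the example.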

Differentiation on Rn


Derivative of a Vector-Valued Function of Several Variables

We now consider the general case. The following definition includes the two special cases discussed before.

9.1.6 Definition. Let $U \subseteq \mathbb{R}^n$ be open. A function $f : U \to \mathbb{R}^m$ is said to be differentiable at $a \in U$ if there exists a linear transformation $df_a : \mathbb{R}^n \to \mathbb{R}^m$, called the differential of $f$ at $a$, such that

$$\lim_{h \to 0} \frac{f(a+h) - f(a) - df_a(h)}{\|h\|} = 0.$$

The $m \times n$ matrix $[df_a]$ is called the derivative of $f$ at $a$, or the Jacobian matrix of $f$ at $a$, and is denoted by $f'(a)$. ♦

9.1.7 Example. If $T \in L(\mathbb{R}^n, \mathbb{R}^n)$, then, by the linearity of $T$, $T(x+h) - T(x) - Th = 0$ for all $h$. It follows that $dT_x = T$ for all $x$. This is the $n$-dimensional version of the familiar result that the derivative of the function $x \mapsto tx$ is the constant $t$. ♦

9.1.8 Theorem. Let $U \subseteq \mathbb{R}^n$ be open, $f = (f_1, \ldots, f_m) : U \to \mathbb{R}^m$, and let $a \in U$. Then $f$ is differentiable at $a$ iff each function $f_i : U \to \mathbb{R}$ is differentiable at $a$. In this case, $\partial_j f_i(a)$ exists and equals $df_a(e_j)\cdot e_i$, and

$$df_a(h) = \big(\nabla f_1(a)\cdot h, \ldots, \nabla f_m(a)\cdot h\big), \qquad h \in \mathbb{R}^n. \tag{9.5}$$

In particular, if the differential exists, it is unique.

Proof. Let $f$ be differentiable at $a$. For $i = 1, \ldots, m$ and $j = 1, \ldots, n$, let $b_{ij} = df_a(e_j)\cdot e_i$, the $i$th component of $df_a(e_j)$ and the $(i,j)$th entry of the matrix $[df_a]$. Then $df_a(h) = (b_1\cdot h, \ldots, b_m\cdot h)$, where $b_i := (b_{i1}, \ldots, b_{in})$. Thus for each $i$,

$$|f_i(a+h) - f_i(a) - b_i\cdot h| \le \|f(a+h) - f(a) - df_a(h)\|,$$

from which it follows that

$$\lim_{h \to 0} \frac{f_i(a+h) - f_i(a) - b_i\cdot h}{\|h\|} = 0.$$

Therefore, the derivative of $f_i$ at $a$ exists and equals $b_i$. By 9.1.4, $b_i = \nabla f_i(a)$, that is, $b_{ij} = \partial_j f_i(a)$.

Conversely, suppose each $f_i$ is differentiable at $a$. Then $\nabla f_i(a)$ exists and by (9.4),

$$\lim_{h \to 0} \frac{|f_i(a+h) - f_i(a) - \nabla f_i(a)\cdot h|}{\|h\|} = 0, \qquad i = 1, \ldots, m.$$

Let $T(h)$ denote the right side of (9.5). Then $T$ is linear and

$$\frac{\|f(a+h) - f(a) - T(h)\|^2}{\|h\|^2} = \sum_{i=1}^{m} \frac{|f_i(a+h) - f_i(a) - \nabla f_i(a)\cdot h|^2}{\|h\|^2} \to 0$$

as $h \to 0$. Therefore, $df_a$ exists and equals $T$.

By the theorem, the $(i,j)$ entry of $f'(a)$ is $\partial_j f_i(a)$. The effect of $df_a$ on a vector $h \in \mathbb{R}^n$ may therefore be expressed in matrix form as

$$f'(a)h^t = \begin{bmatrix} \partial_1 f_1(a) & \cdots & \partial_n f_1(a) \\ \vdots & & \vdots \\ \partial_1 f_m(a) & \cdots & \partial_n f_m(a) \end{bmatrix} \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix} = \begin{bmatrix} \nabla f_1(a)\cdot h \\ \vdots \\ \nabla f_m(a)\cdot h \end{bmatrix},$$

where $h^t$ denotes the transpose of the vector $h$. In the special case $m = n$, the determinant of $f'(a)$ is called the Jacobian of $f$ at $a$ and is denoted variously by

$$\det f'(a) = J_f(a) = \frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}(a).$$

9.1.9 Example. The transformation $(x, y, z) = (r\cos\theta, r\sin\theta, z)$ from cylindrical coordinates to rectangular coordinates in $\mathbb{R}^3$ has Jacobian

$$\frac{\partial(x,y,z)}{\partial(r,\theta,z)} = \begin{vmatrix} \cos\theta & \sin\theta & 0 \\ -r\sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix} = r.$$ ♦

The following characterization of differentiability will be useful.

9.1.10 Theorem. Let $f : U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is open. Then $f$ is differentiable at $a \in U$ iff there exist $T \in L(\mathbb{R}^n, \mathbb{R}^m)$ and, for sufficiently small $r$, a function $\eta : B_r(0) \to \mathbb{R}^m$ such that

$$f(a+h) = f(a) + Th + \|h\|\,\eta(h), \quad\text{and}\quad \lim_{h \to 0} \eta(h) = 0. \tag{9.6}$$

In this case, $T = df_a$.

Proof. Assume that $f$ is differentiable at $a$. Choose $r > 0$ such that $B_r(a) \subseteq U$ and define $\eta : B_r(0) \to \mathbb{R}^m$ by $\eta(0) = 0$ and

$$\eta(h) = \frac{f(a+h) - f(a) - df_a(h)}{\|h\|} \quad\text{if } h \neq 0.$$

Then (9.6) holds with $T = df_a$. Conversely, if (9.6) holds for some $\eta$ and $T$, then

$$\lim_{h \to 0} \frac{\|f(a+h) - f(a) - Th\|}{\|h\|} = \lim_{h \to 0} \|\eta(h)\| = 0,$$

hence $f$ is differentiable at $a$ with $df_a = T$.


9.1.11 Corollary. If $f$ is differentiable at $a$, then $f$ is continuous at $a$.

Proof. By (9.6) and the continuity of linear transformations,

$$\lim_{h \to 0} \big[f(a+h) - f(a)\big] = \lim_{h \to 0} \|h\|\,\eta(h) + \lim_{h \to 0} df_a(h) = 0.$$

Exercises 1. Find the differential df for each of the functions f (x, y): (a) S (d) (g)

x−y . x+y x cos . y

(b)

sec (yex ).

(h) S exy .

ln(x2 + y 3 ).

(e) S sin (x2 y). 2

(c)

arctan (xy 2 ).

(f)

y arcsin , 0 < y < x. x 3x + 2y tan . 2x + 3y

(i)

2. Find f 0 (x) where f (x) = xy x2 − y 2 3 3 2 2 S x y S (a) x − y , x y . (b) e sin y, e sin x . (c) , . x2 + y 2 x2 + y 2 (d) ln(x2 + y 2 + z 2 + 1), xyz . (e) arctan(x − y), exy , x/y . 3. For each of the functions f (x, y) below, find all values of p, q ∈ N for which on R2 (i) fx , fy exist, (ii) fx , fy are continuous, (iii) f 0 exists. p p q q x + y if (x, y) 6= 0, x y if (x, y) 6= 0, S 2 2 x +y x2 + y 2 (a) (b) 0 0 otherwise. otherwise. ( xp sin 1 + y q if x 6= 0, (x − y)p sin(x − y)−1 if x 6= y, (c) (d) S x y q 0 otherwise. otherwise. p p q q x + y x y if x 6= y, if (x, y) 6= 0, x−y x−y (e) (f) 0 0 otherwise. otherwise. 4. Find all values of p, q, s ∈ (0, +∞) for which on R2 (i) fx , fy exist,

(ii) fx , fy are continuous,

(iii) f 0 exists,

where f (0, 0) = 0 and, for (x, y) 6= (0, 0), f (x, y) = (a)S |x|p |y|q ln(x2 + y 2 ). (d)S

sin |x|p |y|q . (x2 + y 2 )s

(b)

sin(x2 + y 2 )p . (x2 + y 2 )q

(e)

sin−1 |x|p |y|q . (x2 + y 2 )s

(c)

tan(x2 + y 2 )p . (x2 + y 2 )q


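5. Spherical coordinates $(\rho, \varphi, \theta)$ in $\mathbb{R}^3$ are defined by $x = \rho\sin\varphi\cos\theta$, $y = \rho\sin\varphi\sin\theta$, $z = \rho\cos\varphi$, where $\rho \ge 0$, $0 \le \varphi \le \pi$, and $0 \le \theta < 2\pi$. Show that
$$\frac{\partial(x,y,z)}{\partial(\rho,\varphi,\theta)} = \rho^2\sin\varphi.$$

6. Let $(u, v) = \big(\sin f(x,y), \cos f(x,y)\big)$. Find $\dfrac{\partial(u,v)}{\partial(x,y)}$.

7. Let $(u, v, w) = (y/z, z/x, x/y)$, where $xyz \neq 0$. Find $\dfrac{\partial(u,v,w)}{\partial(x,y,z)}$.

8.S Let $f(x) = \sum_{i=1}^{n} x_i^{a_i}$ and $g(x) = \prod_{i=1}^{n} x_i^{a_i}$, where $x_i, a_i > 0$ and $\sum_i a_i = 1$. Find
(a) $x\cdot\nabla f(x)$. (b) $x\cdot\nabla g(x)$.

9. Let $f(x)$ be defined implicitly by the equation
$$\frac{1}{f(x)} = \sum_{i=1}^{n} \frac{1}{x_i}.$$
Express $\nabla f(x)$ in terms of $f$.

10.S Let $f(x) = \ln\Big(\sum_{i=1}^{n} e^{x_i}\Big)$. Express $\nabla f(x)$ in terms of $f$.

11. Let the equation $\alpha x_n - x_1 x_2 \cdots x_{n-1} = 0$, $\alpha \neq 0$, define each of the variables $x_1, \ldots, x_{n-1}$ as a differentiable function of $x_n$. Show that
$$x_n^{\,n-2}\, \frac{\partial x_1}{\partial x_n} \frac{\partial x_2}{\partial x_n} \cdots \frac{\partial x_{n-1}}{\partial x_n} = \alpha.$$

12. Let $x = (x_1, \ldots, x_n)$. Find $\partial_i$ of
(a)S $\|x\|$. (b) $\dfrac{1}{\|x\|}$. (c)S $\dfrac{x_i}{\|x\|}$. (d) $\dfrac{x_i}{\|x\|^2}$.

13. Let $f : \mathbb{R} \to \mathbb{R}$ be differentiable and $p > 0$. Show that for $x \neq 0$,
$$x\cdot\nabla\|x\|^p = p\|x\|^p \quad\text{and}\quad x\cdot\nabla f\big(\|x\|^p\big) = p\, f'\big(\|x\|^p\big)\, \|x\|^p.$$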

9.2 Properties of the Differential

In this section we consider analogs of differentiation rules for single variable functions. Deeper properties of the differential are taken up in later sections.

Linearity of the Differential

9.2.1 Theorem. Let $U \subseteq \mathbb{R}^n$ be open, let $f, g : U \to \mathbb{R}^m$ be differentiable at $a \in U$, and let $\alpha, \beta \in \mathbb{R}$. Then $\alpha f + \beta g$ is differentiable at $a$ and $d(\alpha f + \beta g)_a = \alpha\, df_a + \beta\, dg_a$.

Proof. By 9.1.10, there exist functions $\eta(h)$, $\mu(h)$, defined for $h \in \mathbb{R}^n$ with sufficiently small norm, such that

$$f(a+h) = f(a) + df_a(h) + \|h\|\,\eta(h), \qquad \lim_{h \to 0} \eta(h) = 0,$$

and

$$g(a+h) = g(a) + dg_a(h) + \|h\|\,\mu(h), \qquad \lim_{h \to 0} \mu(h) = 0.$$

Then

$$(\alpha f + \beta g)(a+h) = (\alpha f + \beta g)(a) + (\alpha\, df_a + \beta\, dg_a)(h) + \|h\|\,(\alpha\eta + \beta\mu)(h)$$

and

$$\lim_{h \to 0} (\alpha\eta + \beta\mu)(h) = 0.$$

Another application of 9.1.10 completes the proof.

The Norm of a Linear Transformation

For additional properties of the differential, including product rules, we need the notion of operator norm on the space $L(\mathbb{R}^n, \mathbb{R}^m)$ of linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$.

9.2.2 Definition. Let $T \in L(\mathbb{R}^n, \mathbb{R}^m)$. The operator norm of $T$ is defined as

$$\|T\| = \sup\{\|Tx\| : x \in \mathbb{R}^n,\ \|x\| = 1\}.$$ ♦

The following proposition justifies the use of the term “norm.”

9.2.3 Proposition. $\|T\|$ defines a norm on $L(\mathbb{R}^n, \mathbb{R}^m)$ such that

$$\|Tx\| \le \|T\|\,\|x\| \quad\text{for all } x \in \mathbb{R}^n. \tag{9.7}$$

Moreover, if $[a_{ij}]_{m\times n}$ is the matrix of $T$, then for all $k$, $\ell$,

$$|a_{k\ell}| \le \|T\| \le \Big(\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2\Big)^{1/2}. \tag{9.8}$$

Proof. Inequality (9.7) is clear if $x = 0$. If $x \neq 0$, then $\|x\|^{-1}x$ has norm 1, hence

$$\|x\|^{-1}\|Tx\| = \big\|T(\|x\|^{-1}x)\big\| \le \|T\|.$$

To verify (9.8), let $a_i = (a_{i1}, \ldots, a_{in})$. Since $Tx = (a_1\cdot x, \ldots, a_m\cdot x)$, by the Cauchy–Schwarz inequality,

$$\|Tx\|^2 = \sum_{i=1}^{m} (a_i\cdot x)^2 \le \sum_{i=1}^{m} \|a_i\|^2\|x\|^2 = \sum_{i,j} a_{ij}^2, \qquad \|x\| = 1,$$

which verifies the second inequality in (9.8). The first inequality follows from

$$|a_{k\ell}|^2 \le \sum_{i=1}^{m} |a_{i\ell}|^2 = \|Te_\ell\|^2 \le \|T\|^2.$$

To see that $\|T\|$ defines a norm, note that homogeneity follows directly from the definition, and the triangle inequality $\|T_1 + T_2\| \le \|T_1\| + \|T_2\|$ is a consequence of

$$\|(T_1 + T_2)x\| \le \|T_1x\| + \|T_2x\| \le \|T_1\| + \|T_2\|, \qquad \|x\| = 1.$$

The property of coincidence (that $\|T\| = 0$ only if $T = 0$) follows directly from (9.7).

9.2.4 Corollary. A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is uniformly continuous.

Proof. This follows from $\|Tx - Ty\| = \|T(x-y)\| \le \|T\|\,\|x-y\|$, using the linearity of $T$.

Since $L(\mathbb{R}^n, \mathbb{R}^m)$ is a normed vector space, it is a metric space under the distance function $\rho(T_1, T_2) := \|T_1 - T_2\|$. Thus the methods of Chapter 8 apply. In particular, we have the following consequence of 9.2.3.

9.2.5 Corollary. Let $(X, d)$ be a metric space and let $F$ be a function from $X$ to $L(\mathbb{R}^n, \mathbb{R}^m)$. For each $x \in X$, let $[a_{ij}(x)]_{m\times n}$ denote the matrix of $F(x)$. Then $F$ is (uniformly) continuous with respect to the metric $\rho$ iff each function $a_{ij}(x)$ is (uniformly) continuous on $X$.

Proof. The matrix of $F(x) - F(y)$ is $[a_{ij}(x) - a_{ij}(y)]_{m\times n}$, hence, by (9.8),

$$|a_{k\ell}(x) - a_{k\ell}(y)|^2 \le \|F(x) - F(y)\|^2 \le \sum_{i,j} [a_{ij}(x) - a_{ij}(y)]^2.$$

The assertion follows.
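The two-sided estimate (9.8) of Proposition 9.2.3 can be illustrated for a small matrix. The sketch below approximates the operator norm of a $3 \times 2$ matrix by sampling unit vectors $(\cos t, \sin t)$; the matrix `A`, the sample count, and the function names are assumptions made for the illustration, and the sampled value is only an approximation from below of the supremum in Definition 9.2.2.

```python
import math

def apply(A, x):
    # Matrix-vector product Ax
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def operator_norm_2d(A, samples=20000):
    # Approximates sup ||Ax|| over unit vectors x = (cos t, sin t) in R^2.
    # Half a circle suffices because ||A(-x)|| = ||Ax||.
    best = 0.0
    for i in range(samples):
        t = math.pi * i / samples
        best = max(best, norm(apply(A, (math.cos(t), math.sin(t)))))
    return best

A = [[1.0, 2.0], [0.0, -3.0], [4.0, 1.0]]   # hypothetical matrix of some T in L(R^2, R^3)
op = operator_norm_2d(A)
frobenius = math.sqrt(sum(a * a for row in A for a in row))   # right side of (9.8)
largest_entry = max(abs(a) for row in A for a in row)         # left side of (9.8)
```

For this matrix the sampled norm lies strictly between the largest entry and the square root of the sum of squares, so neither inequality in (9.8) is an equality in general.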


Product Rules

We consider two product rules; additional product rules, as well as a quotient rule, are given in the exercises.

9.2.6 Theorem (Scalar Product Rule). Let $U$ be open in $\mathbb{R}^n$ and let $f : U \to \mathbb{R}^m$ and $\psi : U \to \mathbb{R}$ be differentiable at $a \in U$. Then

$$d(\psi f)_a(h) = \psi(a)\,df_a(h) + \big[\nabla\psi(a)\cdot h\big] f(a), \qquad h \in \mathbb{R}^n. \tag{9.9}$$

Proof. By 9.1.10, there exist functions $\eta(h)$ and $\mu(h)$, defined for $h \in \mathbb{R}^n$ with sufficiently small norm, such that

$$f(a+h) - f(a) - df_a(h) = \|h\|\,\eta(h), \qquad \lim_{h \to 0} \eta(h) = 0,$$

$$\psi(a+h) - \psi(a) - \nabla\psi(a)\cdot h = \|h\|\,\mu(h), \qquad \lim_{h \to 0} \mu(h) = 0.$$

Let $Th$ denote the right side of (9.9) and set

$$\nu(h) := (\psi f)(a+h) - (\psi f)(a) - Th = \psi(a+h)f(a+h) - \psi(a)f(a) - \psi(a)\,df_a(h) - \big[\nabla\psi(a)\cdot h\big] f(a).$$

Then $T$ is linear and

$$\begin{aligned}
\nu(h) &= \psi(a+h)\big[f(a+h) - f(a) - df_a(h)\big] + \big[\psi(a+h) - \psi(a) - \nabla\psi(a)\cdot h\big] f(a) + \big[\psi(a+h) - \psi(a)\big]\,df_a(h) \\
&= \psi(a+h)\|h\|\,\eta(h) + \|h\|\,\mu(h) f(a) + \big[\psi(a+h) - \psi(a)\big]\,df_a(h).
\end{aligned}$$

Since $\|df_a(h)\| \le \|df_a\|\,\|h\|$,

$$\frac{\|\nu(h)\|}{\|h\|} \le |\psi(a+h)|\,\|\eta(h)\| + |\mu(h)|\,\|f(a)\| + |\psi(a+h) - \psi(a)|\,\|df_a\|.$$

By continuity of $\psi$ at $a$, the right side of the last inequality tends to zero as $h \to 0$, proving the theorem.

9.2.7 Theorem (Dot Product Rule). Let $U$ be open in $\mathbb{R}^n$ and let $f, g : U \to \mathbb{R}^m$ be differentiable at $a \in U$. Then

$$d(f\cdot g)_a(h) = f(a)\cdot dg_a(h) + g(a)\cdot df_a(h), \qquad h \in \mathbb{R}^n. \tag{9.10}$$

Proof. Let $\eta(h)$ and $\mu(h)$ be functions defined for sufficiently small $\|h\|$ such that

$$f(a+h) - f(a) - df_a(h) = \|h\|\,\eta(h), \qquad \lim_{h \to 0} \eta(h) = 0,$$

$$g(a+h) - g(a) - dg_a(h) = \|h\|\,\mu(h), \qquad \lim_{h \to 0} \mu(h) = 0.$$

Let $Th$ denote the right side of (9.10) and define

$$\nu(h) := (f\cdot g)(a+h) - (f\cdot g)(a) - Th = f(a+h)\cdot g(a+h) - f(a)\cdot g(a) - f(a)\cdot dg_a(h) - g(a)\cdot df_a(h), \qquad h \in \mathbb{R}^n.$$

Then $T$ is linear and

$$\begin{aligned}
\nu(h) &= f(a+h)\cdot\big[g(a+h) - g(a) - dg_a(h)\big] + g(a)\cdot\big[f(a+h) - f(a) - df_a(h)\big] + \big[f(a+h) - f(a)\big]\cdot dg_a(h) \\
&= \|h\|\,f(a+h)\cdot\mu(h) + \|h\|\,g(a)\cdot\eta(h) + \big[df_a(h) + \|h\|\,\eta(h)\big]\cdot dg_a(h).
\end{aligned}$$

By the Cauchy–Schwarz and operator norm inequalities,

$$\frac{|\nu(h)|}{\|h\|} \le \|f(a+h)\|\,\|\mu(h)\| + \|g(a)\|\,\|\eta(h)\| + \|df_a\|\,\|dg_a\|\,\|h\| + \|\eta(h)\|\,\|dg_a\|\,\|h\|.$$

Since the right side of this inequality tends to zero as $h \to 0$, so does the left, completing the proof.

Continuity of the Differential

If $U$ is an open subset of $\mathbb{R}^n$ and $f : U \to \mathbb{R}^m$ is differentiable, then the mapping $x \mapsto df_x$ is a function from $U$ to $L(\mathbb{R}^n, \mathbb{R}^m)$. Since $L(\mathbb{R}^n, \mathbb{R}^m)$ is a metric space in the operator norm, the notion of continuity of this mapping is meaningful.

9.2.8 Definition. Let $U \subseteq \mathbb{R}^n$ be open. A function $f : U \to \mathbb{R}^m$ is said to be continuously differentiable on $U$ if $df_x$ exists and is continuous as a function of $x$ on $U$. In this case, $f$ is also said to be of class $C^1$ on $U$. A function $g$ is continuously differentiable on a subset $E$ of $\mathbb{R}^n$ if $g$ is the restriction to $E$ of a continuously differentiable function $f$ on an open set $U \supseteq E$. ♦

9.2.9 Theorem. Let $f = (f_1, \ldots, f_m) : U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is open. Then $f$ is continuously differentiable on $U$ iff the partial derivatives $\partial_j f_i$, $1 \le i \le m$, $1 \le j \le n$, exist and are continuous on $U$.

Proof. If $f$ is continuously differentiable on $U$ then, by 9.2.5, the matrix $f'(x)$ has continuous entries. By 9.1.8, these entries are the partial derivatives of the components of $f$.

For the sufficiency, by 9.1.8 we may assume that $m = 1$, that is, $f$ is real-valued. Suppose then that the partial derivatives $\partial_j f$ exist and are continuous on $U$. Let $a \in U$ and $\varepsilon > 0$. Choose $r > 0$ such that $B_r(a) \subseteq U$ and fix $h = (h_1, \ldots, h_n)$ such that $\|h\| < r$. For $1 \le j \le n$ set

$$g_j(t) := f\big(a + \mathbf{h}_j(t)\big), \qquad \mathbf{h}_j(t) := (h_1, \ldots, h_{j-1}, t h_j, 0, \ldots, 0), \qquad 0 \le t \le 1.$$

Then

$$g_j(1) - g_j(0) = f\big(a + \mathbf{h}_j(1)\big) - f\big(a + \mathbf{h}_j(0)\big).$$

Also, by the mean value theorem and the chain rule, there exists $t_j \in (0,1)$ such that

$$g_j(1) - g_j(0) = g_j'(t_j) = h_j\,\partial_j f\big(a + \mathbf{h}_j(t_j)\big).$$

Therefore,

$$f(a+h) - f(a) = \sum_{j=1}^{n} \big[g_j(1) - g_j(0)\big] = \sum_{j=1}^{n} h_j\,\partial_j f\big(a + \mathbf{h}_j(t_j)\big),$$

hence

$$f(a+h) - f(a) - \nabla f(a)\cdot h = \sum_{j=1}^{n} \big[\partial_j f\big(a + \mathbf{h}_j(t_j)\big) - \partial_j f(a)\big] h_j = \nu(h)\cdot h,$$

where

$$\nu(h) := \sum_{j=1}^{n} \big[\partial_j f\big(a + \mathbf{h}_j(t_j)\big) - \partial_j f(a)\big] e_j.$$

Since $\lim_{h \to 0} \mathbf{h}_j(t_j) = 0$, the continuity of $\partial_j f$ at $a$ implies that $\lim_{h \to 0} \nu(h) = 0$. Since $|\nu(h)\cdot h| \le \|\nu(h)\|\,\|h\|$,

$$\frac{|f(a+h) - f(a) - \nabla f(a)\cdot h|}{\|h\|} \le \|\nu(h)\| \to 0,$$

completing the proof.

Exercises

1.S Prove that for $T \in L(\mathbb{R}^n, \mathbb{R}^m)$, $\|T\| = \sup\{\|Tx\| : x \in \mathbb{R}^n,\ \|x\| \le 1\}$.

2. Let $T_1 \in L(\mathbb{R}^m, \mathbb{R}^k)$ and $T_2 \in L(\mathbb{R}^n, \mathbb{R}^m)$. Prove that $\|T_1T_2\| \le \|T_1\|\,\|T_2\|$. (We use the standard notation $T_1T_2$ for composition of linear operators.)

3.S (Quotient rule) Let $U$, $f$, and $\psi$ be as in 9.2.6. If $\psi(a) \neq 0$, prove that
$$d\Big(\frac{f}{\psi}\Big)_a(h) = \frac{\psi(a)\,df_a(h) - \big[\nabla\psi(a)\cdot h\big] f(a)}{\psi^2(a)}.$$

4. Find $dg_x(x)$, $\|x\| \neq 0$, if $g(x) =$
(a)S $\|x\|\,x$. (b) $\|x\|^{-2}x$. (c) $\|x\|^{-1}x$.


5. The cross product of vectors $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ is defined by
$$a \times b = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} e_1 - \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} e_2 + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} e_3.$$
(See Exercise 1.6.9.) Let $f : U \to \mathbb{R}^3$ and $g : U \to \mathbb{R}^3$, where $U \subseteq \mathbb{R}^n$ is open. Define $f \times g$ on $U$ by $(f \times g)(x) = f(x) \times g(x)$. Prove that
$$d(f \times g)_a(h) = f(a) \times dg_a(h) + df_a(h) \times g(a).$$

6.S Let $V \subseteq \mathbb{R}^p$ and $W \subseteq \mathbb{R}^q$ be open, $f : V \to \mathbb{R}^k$, $g : W \to \mathbb{R}^k$, and $\alpha, \beta \in \mathbb{R}$. Define $F$ on $V \times W \subseteq \mathbb{R}^{p+q}$ by
$$F(x, y) = \alpha f(x) + \beta g(y), \qquad x \in V,\ y \in W.$$
If $f$ is differentiable at $a \in V$ and $g$ is differentiable at $b \in W$, prove that $F$ is differentiable at $c := (a, b)$ and
$$dF_c(h, k) = \alpha\, df_a(h) + \beta\, dg_b(k), \qquad h \in \mathbb{R}^p,\ k \in \mathbb{R}^q.$$

7. Let $V \subseteq \mathbb{R}^p$ and $W \subseteq \mathbb{R}^q$ be open and $f : V \to \mathbb{R}^k$, $g : W \to \mathbb{R}^k$. Define $F$ on $V \times W \subseteq \mathbb{R}^{p+q}$ by
$$F(x, y) = f(x)\cdot g(y), \qquad x \in V,\ y \in W.$$
If $f$ is differentiable at $a \in V$ and $g$ is differentiable at $b \in W$, prove that $F$ is differentiable at $c := (a, b)$ and
$$dF_c(h, k) = g(b)\cdot df_a(h) + f(a)\cdot dg_b(k), \qquad h \in \mathbb{R}^p,\ k \in \mathbb{R}^q.$$

8. Formulate and prove the analog of Exercise 7 for cross products.

9. Let $f : I \to \mathbb{R}^m$ be differentiable and $\|f\| = 1$ on an open interval $I$. Prove that $f(t)$ and $f'(t)$ are perpendicular for all $t$, that is, $f\cdot f' = 0$ on $I$.

10.S Let $f : [a, b] \to \mathbb{R}^m$ be differentiable and $v \notin A := f([a, b])$. Referring to Exercise 8.5.15 with $d(x, y) = \|x - y\|$, show that
(a) $d(A, v) = \|f(t_0) - v\|$ for some $t_0 \in [a, b]$.
(b) $\big(f(t_0) - v\big)\cdot f'(t_0) = 0$ if $t_0 \in (a, b)$.

11. A path $\varphi : [a, b] \to \mathbb{R}^n$ is piecewise smooth if there exists a partition $a_0 = a < a_1 < \cdots < a_n = b$ of $[a, b]$ such that $\varphi'$ exists and is continuous on each subinterval $[a_{j-1}, a_j]$. Let $U \subseteq \mathbb{R}^n$ be nonempty, open, and connected. Show that if $\varepsilon > 0$, then any pair of points can be joined by a piecewise smooth path $\varphi$ in $U$ such that $\sup_{a_{j-1} \le t \le a_j} \|\varphi'(t)\| < \varepsilon$ for each $j$.

9.3 Further Properties of the Differential

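In this section we prove two important theorems, the first of which is an $n$-dimensional version of the chain rule.

9.3.1 Chain Rule. Let $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ be open and $f : U \to \mathbb{R}^m$, $g : V \to \mathbb{R}^k$ with $f(U) \subseteq V$. If $f$ is differentiable at $a \in U$ and $g$ is differentiable at $b := f(a)$, then $g \circ f : U \to \mathbb{R}^k$ is differentiable at $a$ and the linear transformation $d(g \circ f)_a : \mathbb{R}^n \to \mathbb{R}^k$ is the composition of the linear transformations $dg_b : \mathbb{R}^m \to \mathbb{R}^k$ and $df_a : \mathbb{R}^n \to \mathbb{R}^m$:

$$d(g \circ f)_a = dg_b \circ df_a.$$

Proof. Choose $r, s > 0$ such that $B_r(a) \subseteq U$ and $B_s(b) \subseteq V$. By 9.1.10 there exist functions $\eta : B_r(0) \to \mathbb{R}^m$ and $\nu : B_s(0) \to \mathbb{R}^k$ such that

$$f(a+h) = f(a) + df_a(h) + \|h\|\,\eta(h), \qquad \lim_{h \to 0} \eta(h) = 0, \tag{9.11}$$

and

$$g(b+\mathbf{k}) = g(b) + dg_b(\mathbf{k}) + \|\mathbf{k}\|\,\nu(\mathbf{k}), \qquad \lim_{\mathbf{k} \to 0} \nu(\mathbf{k}) = 0. \tag{9.12}$$

Set

$$\mathbf{k} = f(a+h) - f(a) = df_a(h) + \|h\|\,\eta(h). \tag{9.13}$$

By the continuity of $f$ at $a$, $\mathbf{k} \in B_s(0)$ for all sufficiently small $\|h\|$. For such $h$ set

$$\mu(h) = (g \circ f)(a+h) - (g \circ f)(a) - (dg_b \circ df_a)(h).$$

To complete the proof we show that

$$\lim_{h \to 0} \frac{\mu(h)}{\|h\|} = 0. \tag{9.14}$$

From (9.11), (9.12), and (9.13),

$$\mu(h) = g(b+\mathbf{k}) - g(b) - dg_b(\mathbf{k}) + dg_b\big[\mathbf{k} - df_a(h)\big] = \|\mathbf{k}\|\,\nu(\mathbf{k}) + \|h\|\,dg_b(\eta(h)),$$

hence

$$\frac{\|\mu(h)\|}{\|h\|} \le \frac{\|\mathbf{k}\|}{\|h\|}\,\|\nu(\mathbf{k})\| + \|dg_b(\eta(h))\|.$$

Since

$$\|\mathbf{k}\| = \big\|df_a(h) + \|h\|\,\eta(h)\big\| \le \big(\|df_a\| + \|\eta(h)\|\big)\|h\|,$$

we have

$$\frac{\|\mu(h)\|}{\|h\|} \le \big(\|df_a\| + \|\eta(h)\|\big)\|\nu(\mathbf{k})\| + \|dg_b(\eta(h))\|.$$

Since $h \to 0$ implies $\mathbf{k} \to 0$, (9.14) follows.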


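9.3.2 Remark. Let $f$ be differentiable on $U$ and $g$ differentiable on $V$. Set $y = f(x)$ and $z = (g \circ f)(x) = g(y)$. Then the chain rule may be written in matrix form as $(g \circ f)'(x) = g'(y)f'(x)$ or

$$\begin{bmatrix} \dfrac{\partial z_1}{\partial x_1} & \cdots & \dfrac{\partial z_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial z_k}{\partial x_1} & \cdots & \dfrac{\partial z_k}{\partial x_n} \end{bmatrix}
=
\begin{bmatrix} \dfrac{\partial z_1}{\partial y_1} & \cdots & \dfrac{\partial z_1}{\partial y_m} \\ \vdots & & \vdots \\ \dfrac{\partial z_k}{\partial y_1} & \cdots & \dfrac{\partial z_k}{\partial y_m} \end{bmatrix}
\begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

From this we obtain the familiar formulas

$$\frac{\partial z_\ell}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial z_\ell}{\partial y_i}\frac{\partial y_i}{\partial x_j}, \qquad j = 1, \ldots, n, \quad \ell = 1, \ldots, k.$$

♦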
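The matrix form of the chain rule in Remark 9.3.2 can be checked numerically on a concrete pair of maps. In the sketch below the maps `f` and `g`, the base point `x`, and the finite-difference helper `jacobian` are all illustrative assumptions; the Jacobian matrices are approximated by central difference quotients rather than computed symbolically.

```python
import math

def f(x):
    # A sample map R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2, x[0] - x[1]]

def g(y):
    # A sample map R^3 -> R^2
    return [y[0] * y[2] + math.exp(y[1]), y[0] - y[1] * y[2]]

def jacobian(func, x, t=1e-6):
    # Entry (i, j) approximates the partial derivative d_j func_i(x)
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += t
        xm[j] -= t
        cols.append([(p - m) / (2 * t) for p, m in zip(func(xp), func(xm))])
    return [list(row) for row in zip(*cols)]   # transpose columns into rows

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

x = [0.4, -1.1]
lhs = jacobian(lambda v: g(f(v)), x)              # (g o f)'(x), a 2 x 2 matrix
rhs = matmul(jacobian(g, f(x)), jacobian(f, x))   # g'(y) f'(x), (2 x 3)(3 x 2)
max_gap = max(abs(l - r) for lr, rr in zip(lhs, rhs) for l, r in zip(lr, rr))
```

The two matrices agree up to the finite-difference error, and the shapes $(2 \times 3)(3 \times 2) = 2 \times 2$ mirror the composition $dg_b \circ df_a$.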

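9.3.3 Example. Let the partial derivatives of $u = f(x, y)$ and $v = g(x, y)$ exist on $\mathbb{R}^2$. If $x = r\cos\theta$ and $y = r\sin\theta$, we may use the chain rule to find $f_x$, $f_y$, $g_x$, and $g_y$ in terms of $u_r$, $v_r$, $u_\theta$, and $v_\theta$. Indeed, from 9.3.2,

$$\begin{bmatrix} u_r & u_\theta \\ v_r & v_\theta \end{bmatrix} = \begin{bmatrix} f_x & f_y \\ g_x & g_y \end{bmatrix} \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix},$$

hence

$$\begin{bmatrix} f_x & f_y \\ g_x & g_y \end{bmatrix} = \begin{bmatrix} u_r & u_\theta \\ v_r & v_\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}^{-1} = \frac{1}{r} \begin{bmatrix} u_r & u_\theta \\ v_r & v_\theta \end{bmatrix} \begin{bmatrix} r\cos\theta & r\sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

Thus, for example, $f_x = (\cos\theta)\,u_r - r^{-1}(\sin\theta)\,u_\theta$. ♦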

9.3.4 Remark. The chain rule may be used to suggest a definition of tangent plane to a smooth surface. Let $f : U \to \mathbb{R}$ be differentiable on the open subset $U$ of $\mathbb{R}^n$ and let $c \in \mathbb{R}$. The set

$$S = \{x \in U : f(x) = c \text{ and } \nabla f(x) \neq 0\}$$

is called a level surface of $f$ in $\mathbb{R}^n$. Let $a \in S$ and let $\varphi : (-r, r) \to \mathbb{R}^n$ be a smooth path in $S$ such that $\varphi(0) = a$. The existence of such paths may be justified by the implicit function theorem, proved in the next section. Applying the chain rule to the identity $f(\varphi(t)) = c$, we see that

$$0 = (f \circ \varphi)'(0) = \nabla f(a)\cdot\varphi'(0).$$

Since $\varphi'(0)$ is tangent to the curve at $a$, $\nabla f(a)$ is perpendicular to $S$ at $a$. The tangent hyperplane to $S$ at $a$ is then defined as the set of all points $x \in \mathbb{R}^n$ such that $x - a$ is perpendicular to $\nabla f(a)$, that is,

$$(x - a)\cdot\nabla f(a) = 0.$$

For example, the hyperplane tangent at $a$ to the $(n-1)$-dimensional sphere $\{x \in \mathbb{R}^n : \|x\|^2 = 1\}$ is the set of all $x$ such that

$$\sum_{i=1}^{n} 2a_i(x_i - a_i) = 0 \quad\text{or}\quad a\cdot x = 1.$$

The tangent hyperplane at $a$ to a surface $S$ may be seen as the best linear approximation to $S$ near $a$. ♦

The second main result of this section is an $n$-dimensional version of the mean value theorem of Chapter 4. While such a theorem is not generally available for vector-valued functions (Exercise 14), there is a version for scalar-valued functions. For its statement, we recall that the line segment in $\mathbb{R}^n$ from $a$ to $b$ is defined by

$$[a : b] = \{(1-t)a + tb : 0 \le t \le 1\}.$$

9.3.5 Mean Value Theorem. Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}$ be differentiable on $U$. For each pair of points $a, b \in U$ with $[a : b] \subseteq U$ there exists $\mathbf{c} \in [a : b]$ such that

$$f(b) - f(a) = df_{\mathbf{c}}(b - a) = \nabla f(\mathbf{c})\cdot(b - a).$$

Proof. Set $\varphi(t) = (1-t)a + tb$, $0 \le t \le 1$, and $g = f \circ \varphi$. Since $\varphi'(t) = b - a$, the chain rule and one-variable mean value theorem imply that

$$f(b) - f(a) = g(1) - g(0) = g'(c) = df_{\varphi(c)}(b - a)$$

for some $c \in (0, 1)$. Setting $\mathbf{c} = \varphi(c)$ completes the proof.

We conclude this section with two applications of the mean value theorem.

9.3.6 Theorem. Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}^m$ be continuously differentiable on $U$. Let $C \subseteq U$ be compact and convex and define $c := \sup_{z \in C} \|df_z\|$. Then $c < +\infty$ and

$$\|f(x) - f(y)\| \le c\,\|x - y\|, \qquad x, y \in C.$$

Proof. Since $z \mapsto df_z$ is continuous and $C$ is compact, $c < +\infty$. Let $x, y \in C$ and $u \in \mathbb{R}^m$. By 9.3.5 applied to the scalar function $g := u\cdot f$, there exists a point $\mathbf{c} \in [x : y] \subseteq C$ such that

$$u\cdot\big(f(x) - f(y)\big) = g(x) - g(y) = dg_{\mathbf{c}}(x - y) = u\cdot df_{\mathbf{c}}(x - y).$$

Taking $u = f(x) - f(y)$ and using the Cauchy–Schwarz and the operator norm inequalities, we have

$$\|f(x) - f(y)\|^2 = \big(f(x) - f(y)\big)\cdot df_{\mathbf{c}}(x - y) \le c\,\|f(x) - f(y)\|\,\|x - y\|.$$

Dividing by $\|f(x) - f(y)\|$ (when it is nonzero) completes the proof.
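Theorem 9.3.6 can be tested on a map whose differential has an easily computed norm. The sketch below uses $f(x, y) = (x^2 - y^2, 2xy)$ on the unit square $C = [0,1]^2$; the choice of map, the set $C$, and the random sampling are assumptions made for illustration and prove nothing in general.

```python
import math
import random

def f(p):
    x, y = p
    return (x * x - y * y, 2 * x * y)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# For this f the matrix of df_z is [[2x, -2y], [2y, 2x]], which scales every
# vector by 2 * sqrt(x^2 + y^2). On the unit square the supremum is 2 * sqrt(2).
c = 2 * math.sqrt(2)

random.seed(0)
worst = 0.0   # largest observed ratio ||f(p) - f(q)|| / ||p - q||
for _ in range(10000):
    p = (random.random(), random.random())
    q = (random.random(), random.random())
    if p != q:
        worst = max(worst, dist(f(p), f(q)) / dist(p, q))
```

Every sampled ratio stays below the constant `c`, as the theorem predicts for the compact convex set $C$.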


9.3.7 Corollary. Let $U \subseteq \mathbb{R}^n$ be open and connected and let $f : U \to \mathbb{R}^m$ be differentiable on $U$. If $df_x = 0$ for all $x \in U$, then $f$ is constant.

Proof. Let $x \in U$ and choose $r > 0$ such that $C_r(x) \subseteq U$. Since $C_r(x)$ is compact and convex, 9.3.6 implies that

$$\|f(x) - f(y)\| \le c\,\|x - y\|, \quad y \in C_r(x), \qquad c := \sup_{z \in C_r(x)} \|df_z\|.$$

By hypothesis, $c = 0$, hence $f(y) = f(x)$ for all $y \in C_r(x)$. Thus $f$ is constant on any ball contained in $U$.

Now let $a \in U$ and define

$$U_a = \{x \in U : f(x) = f(a)\} \quad\text{and}\quad V_a = \{x \in U : f(x) \neq f(a)\}.$$

By the first paragraph, if $x \in U_a$, then a ball with center $x$ is contained in $U_a$. Therefore, $U_a$ is open. A similar argument shows that $V_a$ is open. Since $U$ is connected and $U_a \neq \emptyset$, $U_a = U$, that is, $f(x) = f(a)$ for all $x \in U$.

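Exercises

1.S Let $g, \varphi, \psi : \mathbb{R} \to \mathbb{R}$ be differentiable and let $f(x, y) = g\big(\varphi(x)\psi(y)\big)$. Find $\nabla f(x, y)$ in terms of $g$, $\varphi$, and $\psi$.

2. Let $\varphi : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R}^3 \to \mathbb{R}$ be differentiable and set $f(x, y) := g\big(x, \varphi(x + 2y), \varphi(x - 3y)\big)$. Find $f_y$ in terms of $g$ and $\varphi$.

3.S Let $g : \mathbb{R}^2 \to \mathbb{R}$ be differentiable, $a, b \in \mathbb{R}^n$, and set $f(x) = g(a\cdot x, b\cdot x)$. Find $\nabla f$.

4. Let the partial derivatives of $f : \mathbb{R}^2 \to \mathbb{R}$ exist and let $z = f(x, y) = f(r\cos\theta, r\sin\theta)$. Prove that
$$r^2\Big(\frac{\partial z}{\partial r}\Big)^2 + \Big(\frac{\partial z}{\partial \theta}\Big)^2 = r^2\Big[\Big(\frac{\partial z}{\partial x}\Big)^2 + \Big(\frac{\partial z}{\partial y}\Big)^2\Big].$$

5. Let $F : \mathbb{R}^n \to \mathbb{R}$ be differentiable and set $f(x) = F(x, \ldots, x)$. Prove that $f'(x) = (1, \ldots, 1)\cdot\nabla F(x, \ldots, x)$.

6. Let $f(x, y)$ be continuously differentiable. Prove that
$$f(x, y) = \int_0^1 (x, y)\cdot\nabla f(tx, ty)\,t\,dt + \int_0^1 f(tx, ty)\,dt.$$

7.S Let $f : U \to \mathbb{R}^m$ be differentiable on an open set $U \subseteq \mathbb{R}^n$. Find $(T \circ f)'(x)$ for $T \in L(\mathbb{R}^m, \mathbb{R}^k)$.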


8. Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable and $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ with $a_n \neq 0$. Prove that $a\cdot\nabla f(x) = 0$ for all $x \in \mathbb{R}^n$ iff there exists a differentiable function $g : \mathbb{R}^{n-1} \to \mathbb{R}$ such that
$$f(x_1, x_2, \ldots, x_n) = g\big(x_1 - b_1x_n,\ x_2 - b_2x_n,\ \ldots,\ x_{n-1} - b_{n-1}x_n\big),$$
where $b_j = a_j/a_n$, $1 \le j \le n-1$.

9. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ smooth. Let $\alpha, \beta : I \to \mathbb{R}^n$ be smooth paths in $U$ such that $(\nabla f) \circ \alpha = \alpha'$, $\alpha(t_1) = \beta(t_1)$, and $\|\alpha'(t_1)\| = \|\beta'(t_1)\| = 1$ for some $t_1 \in I$ (that is, $\alpha$ and $\beta$ both have unit speed at the intersection). Show that $(f \circ \alpha)'(t_1) \ge (f \circ \beta)'(t_1)$.

10. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. If $u \in \mathbb{R}^n$ with $\|u\| = 1$, define the directional derivative of $f$ at $a \in U$ in the direction of $u$ by
$$D_u f(a) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t}.$$
(a)S Show that if $f$ is differentiable at $a$, then $D_u f(a)$ exists and equals $u\cdot\nabla f(a)$.
(b) Show that if $D_u f$ exists, then $D_{-u} f$ exists and $D_{-u} f = -D_u f$.
(c)S Define
$$f(x, y) = \begin{cases} \dfrac{xy^2}{x^2 + y^4} & \text{if } (x, y) \neq (0, 0), \\[1ex] 0 & \text{otherwise.} \end{cases}$$
Show that $D_u f(0, 0)$ exists for each $u$ but $f$ is not even continuous at $(0, 0)$.
(d) Find all unit vectors $u$ such that $D_u (xy)^{1/3}$ exists at $(0, 0)$.
(e) Find all unit vectors $u$ such that $D_u |x + y|$ exists at $(x_0, -x_0)$.
(f) Find all unit vectors $u$ such that $D_u (x + y)^{1/3}$ exists at $(0, 0)$.

11. Let $z = F(x, y)$, where $x = x(u, v)$, $y = y(u, v)$, $z = z(u, v)$, and the partial derivatives of these functions exist on $\mathbb{R}^2$. Suppose that $x_u y_v - y_u x_v \neq 0$. Find $z_x$ and $z_y$ in terms of $z_u$, $z_v$, $x_u$, $x_v$, $y_u$, and $y_v$.

12.S Let $f$ and $f_x$ be continuous on $[a, b] \times [c, d]$. Use the mean value theorem to prove that
$$\frac{d}{dx}\int_a^b f(t, x)\,dt = \int_a^b f_x(t, x)\,dt, \qquad c \le x \le d.$$

13. Let $f$ and $f_x$ be continuous on $\mathbb{R}^2$ and $u(x)$, $v(x)$ differentiable on $\mathbb{R}$. Use Exercises 5 and 12 to prove that
$$\frac{d}{dx}\int_{u(x)}^{v(x)} f(t, x)\,dt = \int_{u(x)}^{v(x)} f_x(t, x)\,dt + f\big(v(x), x\big)\,v'(x) - f\big(u(x), x\big)\,u'(x).$$


14. Show that the mean value theorem does not generally hold for vector-valued functions.

15.S A function $f : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ is homogeneous of degree $p > 0$ if $f(tx) = t^p f(x)$ for all $t > 0$ and all $x \neq 0$. Prove that a differentiable function $f$ is homogeneous of degree $p$ iff $x\cdot\nabla f(x) = p f(x)$ for every $x \neq 0$.

16. Prove the following generalization of the Cauchy mean value theorem: Let $U \subseteq \mathbb{R}^n$ be open and convex and let $f, g : U \to \mathbb{R}$ be differentiable on $U$. Then, for each pair of points $a, b \in U$, there exists $c \in [a : b]$ such that
$$\big[f(b) - f(a)\big]\,\nabla g(c)\cdot(b - a) = \big[g(b) - g(a)\big]\,\nabla f(c)\cdot(b - a).$$

17.S Let $f : U \to \mathbb{R}^m$ be continuously differentiable on the open set $U \subseteq \mathbb{R}^n$ and let $C$ be a compact convex subset of $U$. Prove that
$$\|f(x) - f(y) - df_y(x - y)\| \le \sup_{z \in C} \|df_z - df_y\|\,\|x - y\|, \qquad x, y \in C,$$
and that the supremum is finite.

18. Let $f(x, y) = (x^2 - y^2, 2xy)$ and $(a, b) \neq (0, 0)$. Show that if the functions $\varphi, \psi : (-1, 1) \to \mathbb{R}^2$ are differentiable and $\varphi(0) = \psi(0) = (a, b)$, then
$$\frac{(f \circ \varphi)'(0)\cdot(f \circ \psi)'(0)}{\|(f \circ \varphi)'(0)\|\,\|(f \circ \psi)'(0)\|} = \frac{\varphi'(0)\cdot\psi'(0)}{\|\varphi'(0)\|\,\|\psi'(0)\|},$$
that is, the angle between the curves $\varphi$ and $\psi$ at their intersection is preserved under the transformation $f$.

9.4 Inverse Function Theorem

The one-dimensional inverse function theorem of Section 4.4 has the following $n$-dimensional extension.

9.4.1 Inverse Function Theorem. Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}^n$ be continuously differentiable on $U$. If $J_f(a) \neq 0$ for some $a \in U$, then there exist open sets $U_a \subseteq U$ and $V_a = f(U_a)$ with $a \in U_a$ such that $f$ is one-to-one on $U_a$ and $f^{-1} : V_a \to U_a$ is continuously differentiable. Moreover,

$$(df_x)^{-1} = d(f^{-1})_y, \qquad x \in U_a,\ y := f(x). \tag{9.15}$$

The conclusion of the theorem may be summarized by saying that $f$ has a continuously differentiable local inverse at $a$. Of course, since $f$ need not be one-to-one on $U$, $f$ may not have a “global” inverse.

The proof of the theorem requires two lemmas. The first is of some independent interest.

9.4.2 Lemma (Contraction Mapping Principle). Let $(X, d)$ be a complete metric space and let $\varphi : X \to X$ be a continuous function such that, for some $0 \le c < 1$,

$$d\big(\varphi(x), \varphi(y)\big) \le c\, d(x, y) \quad\text{for all } x, y \in X.$$

Then there exists a unique point $x \in X$ such that $\varphi(x) = x$.

Proof. Choose any point $x_0$ in $X$ and define a sequence $\{x_n\}$ recursively by $x_n = \varphi(x_{n-1})$, $n \ge 1$. By hypothesis,

$$d(x_{k+1}, x_k) \le c\, d(x_k, x_{k-1}) \le c^2 d(x_{k-1}, x_{k-2}) \le \cdots \le c^k d(x_1, x_0).$$

Thus, by the triangle inequality, for $m > n$,

$$d(x_n, x_m) \le \sum_{k=n}^{m-1} d(x_k, x_{k+1}) \le d(x_1, x_0) \sum_{k=n}^{\infty} c^k.$$

Since $c < 1$, the series $\sum_{k=1}^{\infty} c^k$ converges, hence the sum on the right tends to zero as $n \to \infty$. It follows that $\{x_n\}$ is a Cauchy sequence and therefore converges to some $x \in X$. Letting $n \to +\infty$ in the equation $x_n = \varphi(x_{n-1})$ yields $\varphi(x) = x$. If also $\varphi(y) = y$, then $d(x, y) = d\big(\varphi(x), \varphi(y)\big) \le c\, d(x, y)$, which is possible only if $x = y$.

9.4.3 Lemma. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}^n$ continuously differentiable. If $a \in U$ with $J_f(a) \neq 0$, then there exists $r > 0$ such that the linear transformation $df_x$ is invertible for each $x \in B_r(a)$.

Proof. Since $f'$ is continuous, its entries are continuous, hence $J_f(x)$ is a continuous function of $x$. Since $J_f(a) \neq 0$, there exists $r > 0$ such that $J_f(x) \neq 0$ on $B_r(a) \subseteq U$. Since a linear transformation on $\mathbb{R}^n$ is invertible iff the determinant of its matrix is not zero, $df_x$ is invertible for $x \in B_r(a)$.

Proof of the inverse function theorem. By 9.4.3, there exists an $r > 0$ such that $C_r(a) \subseteq U$ and $df_x$ is invertible for each $x$ in an open set $W_r$ containing $C_r(a)$. Let $T = df_a$ and define $g = T^{-1} \circ f$ on $W_r$. Then $dg_a = T^{-1} \circ df_a = I_n$, the identity transformation on $\mathbb{R}^n$. Now apply 9.3.6 to the function $g(x) - x$ on $C_r(a)$. The constant $c$ in that theorem is

$$\sup\{\|dg_z - dg_a\| : z \in C_r(a)\},$$


which we can make less than $1/2$ by taking $r$ sufficiently small, using the continuity of the function $z \mapsto dg_z$ at $a$. Thus

$$\|g(x) - g(y) - (x - y)\| \le \tfrac12\|x - y\|, \qquad x, y \in C_r(a). \tag{9.16}$$

Since

$$\|x - y\| - \|g(x) - g(y)\| \le \|g(x) - g(y) - (x - y)\|,$$

we see from (9.16) that

$$\tfrac12\|x - y\| \le \|g(x) - g(y)\|, \qquad x, y \in B_r(a).$$

In particular, $g$ is one-to-one on $B_r(a)$.

Next, we use 9.4.2 to show that $g(B_r(a))$ is open. Let $c \in B_r(a)$, $d = g(c)$, and choose $s > 0$ so that $C_s(c) \subseteq B_r(a)$. We claim that

$$B_{s/2}(d) \subseteq g\big(C_s(c)\big) \subseteq g\big(B_r(a)\big). \tag{9.17}$$

The second inclusion is clear. For the first, let $u \in B_{s/2}(d)$. To show that $u \in g\big(C_s(c)\big)$, define

$$\varphi(x) = x - g(x) + u, \qquad x \in C_s(c).$$

Then

$$\begin{aligned}
\|c - \varphi(x)\| &= \|g(x) - g(c) - (x - c) + d - u\| \\
&\le \|g(x) - g(c) - (x - c)\| + \|d - u\| \\
&\le \tfrac12\|x - c\| + \|d - u\| \qquad\text{by (9.16)} \\
&< s/2 + s/2 = s,
\end{aligned}$$

so $\varphi\big(C_s(c)\big) \subseteq B_s(c)$. Moreover, using (9.16) again we have

$$\|\varphi(x) - \varphi(y)\| \le \tfrac12\|x - y\|, \qquad x, y \in C_s(c).$$

By Lemma 9.4.2, $\varphi(x) = x$ for some $x \in B_s(c)$, hence $u = g(x) \in g\big(B_s(c)\big)$. Since $u$ was arbitrary, (9.17) holds. Since $d \in g\big(B_r(a)\big)$ was arbitrary, $g\big(B_r(a)\big)$ is open.

Next, we show that $g^{-1} : g\big(B_r(a)\big) \to B_r(a)$ is differentiable at $b := g(a)$. Since $b \in g\big(B_r(a)\big)$ and $g\big(B_r(a)\big)$ is open, $b + k \in g\big(B_r(a)\big)$ for sufficiently small $\|k\|$, that is, for each such $k$, $b + k = g(a + h)$ for some $\|h\| < r$. By (9.16),

$$\|h\| - \|k\| \le \|h - k\| = \|g(a + h) - g(a) - h\| \le \tfrac12\|h\|,$$


hence kkk ≥ 12 khk. Since g −1 (b + k) = a + h and g −1 (b) = a, recalling that dga = In we have kg −1 (b + k) − g −1 (b) − In kk kh − kk kg(a + h) − g(a) − dga (h)k = ≤2 . kkk kkk khk Since k → 0 implies that h → 0, which in turn implies that the right side of the above inequality tends to zero, we see that g −1 is differentiable at b with derivative In . Now set Ua = Br (a) and Va = (T ◦ g)(Ua ). Since T is invertible, it is a homeomorphism, hence Va is open. Moreover, since g is one-to-one on Ua and maps Ua onto g(Ua ), f = T ◦ g is one-to-one on Ua and maps Ua onto Va . Since f −1 = g −1 ◦ T −1 , the chain rule implies that f −1 is differentiable at f (a) = T b. Now observe that the entire above argument may be used at any point x of Ua , since all that is needed is the invertibility of dfx . Therefore, f −1 is differentiable on Va . To verify (9.15) apply the chain rule to f −1 ◦ f = In : d(f −1 )y ◦ dfx = d(f −1 ◦ f )x = d(In )x = In , y = f (x) ∈ Va . 9.4.4 Corollary. Let U ⊆ Rn be open and f : U → Rn continuously differentiable with Jf (x) 6= 0 for each x ∈ U . Then f is an open map, that is, if E ⊆ U is open, then f (E) is open. If particular, f (U ) is open. Proof. In the notation of the theorem, f (E) is the union of the open sets f (Ua ∩ E), a ∈ E. Since continuous differentiability is a local property, we have 9.4.5 Global Inverse Function Theorem. Under the conditions of the preceding corollary, if f is also one-to-one on U , then f −1 : f (U ) → U is continuously differentiable. 9.4.6 Example. The function (x, y) = f (r, θ) = (r cos θ, r sin θ), r > 0, θ ∈ R, has Jacobian r, hence is locally invertible at each point of its domain. Since the function is not one-to-one, it has no global inverse. However, if the domain of f is suitably restricted, say by requiring θ0 < θ < θ0 + 2π, then f is one-to-one on the resulting open set Uθ0 := (0, +∞) × (θ0 , θ0 + 2π). 
By 9.4.5, the restriction $g$ of $f$ to $U_{\theta_0}$ has a continuously differentiable inverse $(r(x, y), \theta(x, y)) = g^{-1}(x, y)$ on the open set $V_{\theta_0} = f(U_{\theta_0})$, obtained by removing the ray $(r, \theta_0)$, $r \ge 0$, from $\mathbb{R}^2$. Clearly, $r(x, y) = \sqrt{x^2 + y^2}$. The function $\theta(x, y)$ is called the argument of $(x, y)$ (determined by $\theta_0$) and is denoted by $\arg_{\theta_0}(x, y)$. Thus
\[ g^{-1}(x, y) = \left(\sqrt{x^2 + y^2},\ \arg_{\theta_0}(x, y)\right) \quad \text{on } V_{\theta_0}. \]
For example, if $\theta_0 = -\pi$, then $\arg_{\theta_0}(x, y) = \arctan(y/x)$ for $x > 0$.

♦
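The branch $g^{-1}$ of Example 9.4.6 can be sketched numerically as follows. This is only an illustration: the function names `arg_branch` and `polar_inverse` are hypothetical, and the sketch assumes that `math.atan2` supplies the principal angle, which is then shifted into the window $(\theta_0, \theta_0 + 2\pi)$.

```python
import math

def arg_branch(x, y, theta0):
    """Angle of (x, y) chosen in (theta0, theta0 + 2*pi).

    Points on the removed ray (angle theta0) are outside the domain
    V_theta0; for them this sketch returns theta0 itself.
    """
    theta = math.atan2(y, x)  # principal value in (-pi, pi]
    # Shift by a multiple of 2*pi into the chosen window.
    return theta0 + (theta - theta0) % (2 * math.pi)

def polar_inverse(x, y, theta0):
    """Branch g^{-1}(x, y) = (sqrt(x^2 + y^2), arg_theta0(x, y))."""
    return math.hypot(x, y), arg_branch(x, y, theta0)
```

With $\theta_0 = -\pi$ and $x > 0$ the second component agrees with $\arctan(y/x)$, as stated in the example.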


FIGURE 9.1: The domain of $\arg_{\theta_0}$.

If a function $f$ has a nonzero Jacobian on an open set $U$ and if $f$ is one-to-one on an open subset $U_0$ of $U$, then the inverse of the restriction of $f$ to $U_0$ is called a branch of $f^{-1}$ (even though a global $f^{-1}$ may not exist). In the preceding example, $g^{-1}$ is one of infinitely many branches of $f^{-1}$.

9.4.7 Example. The function $(x, y) = f(u, \theta) = (e^u\cos\theta, e^u\sin\theta)$, where $(u, \theta) \in \mathbb{R}^2$, has Jacobian $e^{2u}$, hence is locally invertible at each point of $\mathbb{R}^2$. The set $U_{\theta_0} = \mathbb{R} \times (\theta_0, \theta_0 + 2\pi)$ is open, and $f$ restricted to $U_{\theta_0}$ is one-to-one. Therefore, the corresponding branch of $f^{-1}$ is continuously differentiable on $f(U_{\theta_0})$, which is the set $V_{\theta_0}$ of 9.4.6. The inverse may be given explicitly by
\[ u = \ln\sqrt{x^2 + y^2}, \qquad \theta = \arg_{\theta_0}(x, y). \]
♦

9.4.8 Example. Let $(u, v) = f(x, y) = (2x^2 - 3y^2,\ 3x^2 - 2y^2)$. The Jacobian is nonzero on the open set $U = \{(x, y) : xy \ne 0\}$. Solving the equations for $x^2$ and $y^2$ yields
\[ x^2 = \frac{3v - 2u}{5} \quad \text{and} \quad y^2 = \frac{2v - 3u}{5}. \]
Restricting $f$ to each of the open quadrants of $\mathbb{R}^2$, we obtain four natural branches of $f^{-1}$, each defined on the open set
\[ V := \{(u, v) : 3v > 2u \text{ and } 2v > 3u\} = \{(u, v) : v > \max\{2u/3,\ 3u/2\}\}, \]
and each of the form
\[ f^{-1}(u, v) = \left(\pm\sqrt{\frac{3v - 2u}{5}},\ \pm\sqrt{\frac{2v - 3u}{5}}\right), \qquad (u, v) \in V. \]

For example, in the open second quadrant of the x, y plane, one chooses the minus sign in the first coordinate and the plus sign in the second. ♦
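As a quick numerical illustration of Example 9.4.8, the sketch below evaluates a branch of $f^{-1}$ and checks it against $f$. The names `f` and `inverse_branch` and the sign arguments `sx`, `sy` are assumptions made for this illustration only.

```python
import math

def f(x, y):
    """The map (x, y) -> (2x^2 - 3y^2, 3x^2 - 2y^2)."""
    return 2 * x * x - 3 * y * y, 3 * x * x - 2 * y * y

def inverse_branch(u, v, sx, sy):
    """Branch of f^{-1} on V; sx, sy in {+1, -1} select the quadrant."""
    if not (3 * v > 2 * u and 2 * v > 3 * u):
        raise ValueError("(u, v) lies outside V")
    return (sx * math.sqrt((3 * v - 2 * u) / 5),
            sy * math.sqrt((2 * v - 3 * u) / 5))
```

For a point in the open second quadrant, such as $(-1.5, 0.5)$, the call `inverse_branch(*f(-1.5, 0.5), -1, +1)` recovers the point.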


Exercises 1. Find the largest set at each point of which the inverse function theorem guarantees a local C 1 inverse of f , where f (x) = (a) S (x + y, xy). (c) (e) S

2

(b) S (sin x + cos y, cos x + sin y). 2

(d) (sin x + sin y, cos x − cos y). 1 √ S , x, y > 0. (f) ln xy, 2 x + y2 x y (h) , . 1 + x2 + y 2 1 + x2 + y 2

ye−x , xe−y . ye−2x , ye3x .

(g) (xy, x2 − y 2 ). (i) S (x2 + y 2 , xy). 2 2 (k) ye−x , yex .

(j) S (xy 2 , x2 z, yz 2 ). (l) (x/y, y/z, z/x), xyz 6= 0.

2. Find a local inverse of the function in the specified part of Exercise 1 about the point $(a, b)$ and find $d(f^{-1})_{(u,v)}$.
(i)S (a), $a > b > 0$.
(ii) (e), $ab \ne 0$.
(iii)S (f), $a > b > 0$.
(iv) (g), $a > b > 0$.
(v) (i), $a, b > 0$.
(vi) (k), $a, b > 0$.
Show that for part (a) in Exercise 1, no inverse is possible on $(0, +\infty)^2$.

3. Let $f(\rho, \phi, \theta) = (x(\rho, \phi, \theta),\ y(\rho, \phi, \theta),\ z(\rho, \phi, \theta))$ be the spherical coordinate transformation of Exercise 9.1.5. Find an explicit formula for the branch of $f^{-1}$ on the set $\{(\rho, \phi, \theta) : \rho > 0,\ 0 < \phi < \pi,\ 0 < \theta < \pi\}$.

4.S Let
\[ f(x, y) := \left(\frac{y}{x^2 + y^2},\ \frac{x}{x^2 + y^2}\right), \qquad (x, y) \ne (0, 0). \]
Show that $f = f^{-1}$ and find $J_f$.

5. By considering the function
\[ f(x) = \begin{cases} x + x^2\sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{otherwise,} \end{cases} \]
show that the hypothesis in the statement of the inverse function theorem that $df$ be continuous on $U$ cannot be removed.

6. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}^n$ of class $C^1$ such that for some $c > 0$, $\|f(x) - f(y)\| \ge c\|x - y\|$ for all $x, y \in U$. Prove that $df_x$ is invertible for each $x \in U$. Conclude that $f : U \to f(U)$ is a homeomorphism.

9.5 Implicit Function Theorem

The implicit function theorem is one of the most important applications of the inverse function theorem. The theorem gives conditions under which an equation of the form $F(x, y) = 0$ may be solved locally for $y$ in terms of $x$. The resulting function is then said to be implicitly defined by the equation $F(x, y) = 0$. The following simple example illustrates the basic idea.

9.5.1 Example. Let $F(x, y, z) = x^2 + y^2 + z^2 - 1$. Consider the problem of finding all points $(a, b, c)$ with $F(a, b, c) = 0$ such that the equation $F(x, y, z) = 0$ has a continuously differentiable solution $z = z(x, y)$ satisfying $z(a, b) = c$. The key fact here is that such a solution is possible if $F_z(a, b, c) = 2c \ne 0$. Indeed, in this case $a^2 + b^2 = 1 - c^2 < 1$, hence $x^2 + y^2 < 1$ for all $(x, y, z)$ sufficiently near $(a, b, c)$ that satisfy $F(x, y, z) = 0$. For such points the solution
\[ z(x, y) = \pm\sqrt{1 - x^2 - y^2} \]
is continuously differentiable, and if the sign chosen is that of $c$, then $z(x, y)$ is the unique solution satisfying $z(a, b) = c$. ♦

Notation. For the statement and proof of the implicit function theorem we use the following conventions: For points $z \in \mathbb{R}^{n+m}$ we write
\[ z = (x, y) = (x_1, \ldots, x_n, y_1, \ldots, y_m), \qquad x \in \mathbb{R}^n,\ y \in \mathbb{R}^m. \]
For a differentiable function $F(z) = F(x, y) = (F_1(x, y), \ldots, F_m(x, y))$, we denote by $F_y(x, y)$ the $m \times m$ matrix with $(i, j)$th entry $\dfrac{\partial F_i}{\partial y_j}(x, y)$. ♦

9.5.2 Implicit Function Theorem. Let $U$ be an open subset of $\mathbb{R}^{n+m}$, let $F = (F_1, \ldots, F_m) : U \to \mathbb{R}^m$ be continuously differentiable, and let $F(a, b) = 0$ for some $(a, b) \in U$. If
\[ \frac{\partial(F_1, \ldots, F_m)}{\partial(y_1, \ldots, y_m)} = \det F_y(a, b) \ne 0, \]
then there is an open set $V_a \subseteq \mathbb{R}^n$ containing $a$ and a unique continuously differentiable mapping $f : V_a \to \mathbb{R}^m$ such that $f(a) = b$ and $F(x, f(x)) = 0$ for every $x \in V_a$.

Proof. Define $G : U \to \mathbb{R}^{n+m}$ by
\[ G(x, y) = (x, F(x, y)) = (x, F_1(x, y), \ldots, F_m(x, y)). \]
Then $G$ is continuously differentiable, and
\[ G'(x, y) = \begin{bmatrix} I_{n \times n} & O_{n \times m} \\ A & F_y \end{bmatrix}, \]
where $I_{n \times n}$ is the $n \times n$ identity matrix, $O_{n \times m}$ is the $n \times m$ zero matrix, and $A$ is an $m \times n$ matrix of partial derivatives of the components of $F$ with respect to $x$. Therefore, $J_G = \det F_y$. Since $\det F_y(a, b) \ne 0$, by the inverse function theorem there exists an open set $W \subseteq U$ containing $(a, b)$ and an open set $V \subseteq \mathbb{R}^{n+m}$ containing $G(a, b) = (a, 0)$ such that $G(W) = V$ and
\[ H = (H_1, \ldots, H_n, H_{n+1}, \ldots, H_{n+m}) := G^{-1} : V \to W \]
is continuously differentiable. Note that the identities $H(G(x, y)) = (x, y)$ and $G(H(x, y)) = (x, y)$ imply, respectively, that
\[ \big(H_{n+1}(G(x, y)), \ldots, H_{n+m}(G(x, y))\big) = y, \qquad (x, y) \in W, \tag{9.18} \]
and
\[ F\big(x, H_{n+1}(x, y), \ldots, H_{n+m}(x, y)\big) = y, \qquad (x, y) \in V. \tag{9.19} \]
Now let $V_a = \{x \in \mathbb{R}^n : (x, 0) \in V\}$. Then $V_a$ is open and contains $a$. Define $f$ on $V_a$ by
\[ f(x) = \big(H_{n+1}(x, 0), \ldots, H_{n+m}(x, 0)\big). \]
Then $f$ is continuously differentiable, and since $(a, 0) = G(a, b)$, (9.18) implies that
\[ f(a) = \big(H_{n+1}(a, 0), \ldots, H_{n+m}(a, 0)\big) = b. \]
Furthermore, (9.19) implies that $F(x, f(x)) = 0$ on $V_a$. This establishes the existence of $f$. To show uniqueness, assume that $F(x, g(x)) = 0$ for some function $g : V_a \to \mathbb{R}^m$. Then
\[ G(x, f(x)) = (x, F(x, f(x))) = (x, F(x, g(x))) = G(x, g(x)). \]
Since $G$ is one-to-one, $f(x) = g(x)$.

9.5.3 Example. The point $(x, y, u, v) = (-1, 1, 1, 1)$ is a solution of the system
\[ F(x, y, u, v) := xu^2 + y = 0, \qquad G(x, y, u, v) := xy^2 + u^2v^2 = 0, \]
and at that point
\[ \frac{\partial(F, G)}{\partial(u, v)} = 4xu^3v \ne 0. \]
By the implicit function theorem, there are $C^1$ functions $u(x, y)$ and $v(x, y)$ defined on a ball $B_r(-1, 1)$ that satisfy the above system with $u(-1, 1) = v(-1, 1) = 1$. If $r < 1$, then $(x, y) \in B_r(-1, 1)$ implies that $x < 0 < y$ and we have the explicit solution
\[ u = \sqrt{-y/x}, \qquad v = -x\sqrt{y}. \]
♦
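The explicit solution in Example 9.5.3 can be checked numerically. The following is a minimal sketch, with the helper name `solution` introduced here only for illustration; it substitutes $u = \sqrt{-y/x}$, $v = -x\sqrt{y}$ back into the system at a point with $x < 0 < y$.

```python
import math

def F(x, y, u, v):
    return x * u**2 + y

def G(x, y, u, v):
    return x * y**2 + u**2 * v**2

def solution(x, y):
    """Explicit solution (u, v) of F = G = 0, valid for x < 0 < y."""
    return math.sqrt(-y / x), -x * math.sqrt(y)

# Substitute the solution back into the system near (-1, 1).
x, y = -1.2, 0.9
u, v = solution(x, y)
residuals = (F(x, y, u, v), G(x, y, u, v))  # both should be ~0
```

At $(x, y) = (-1, 1)$ the function returns $(u, v) = (1, 1)$, matching the base point of the example.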


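9.5.4 Remark. Let $f = (f_1, \ldots, f_m)$ be the function in the statement of the implicit function theorem. Set $y = f(x)$ and $w = F(x, y) = F(z)$. Applying the chain rule to the identity $F(x, f(x)) = 0$ yields
\[ \frac{\partial w_i}{\partial x_j} + \frac{\partial w_i}{\partial y_1}\frac{\partial y_1}{\partial x_j} + \cdots + \frac{\partial w_i}{\partial y_m}\frac{\partial y_m}{\partial x_j} = 0, \qquad i = 1, \ldots, m,\ j = 1, \ldots, n. \]
This may be written in matrix form as
\[
\begin{bmatrix}
\dfrac{\partial w_1}{\partial y_1} & \cdots & \dfrac{\partial w_1}{\partial y_m} \\
\vdots & & \vdots \\
\dfrac{\partial w_m}{\partial y_1} & \cdots & \dfrac{\partial w_m}{\partial y_m}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
= -
\begin{bmatrix}
\dfrac{\partial w_1}{\partial x_1} & \cdots & \dfrac{\partial w_1}{\partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial w_m}{\partial x_1} & \cdots & \dfrac{\partial w_m}{\partial x_n}
\end{bmatrix},
\]
or, in the above notation, as $F_y(z) f'(x) = -F_x(z)$. Therefore,
\[ f'(x) = -F_y(z)^{-1} F_x(z), \]
which shows that the partial derivatives of the solution $f$ in the implicit function theorem may be calculated by carrying out a matrix inversion. However, this is practical only for small dimensions, and even in this case it is often easier to apply the chain rule directly and then use Cramer's rule. The next example illustrates the latter approach. ♦

9.5.5 Example. Suppose $(x_0, y_0, u_0, v_0)$ satisfies the system
\[ F(x, y, u, v) = G(x, y, u, v) = 0, \tag{9.20} \]
where $F$ and $G$ are $C^1$ in a neighborhood of $(x_0, y_0, u_0, v_0)$ and
\[ \frac{\partial(F, G)}{\partial(u, v)}(x_0, y_0, u_0, v_0) \ne 0. \]
Then (9.20) has a $C^1$ solution $u = u(x, y)$, $v = v(x, y)$ near $(x_0, y_0)$ such that $u_0 = u(x_0, y_0)$, $v_0 = v(x_0, y_0)$. Differentiating each equation in (9.20) with respect to $x$ and $y$, we obtain the two systems
\[
\begin{aligned}
F_u u_x + F_v v_x &= -F_x, &\qquad F_u u_y + F_v v_y &= -F_y, \\
G_u u_x + G_v v_x &= -G_x, &\qquad G_u u_y + G_v v_y &= -G_y.
\end{aligned}
\]
Cramer's rule gives the following solutions near $(x_0, y_0, u_0, v_0)$:
\[
u_x = -\frac{\partial(F, G)/\partial(x, v)}{\partial(F, G)/\partial(u, v)}, \quad
v_x = -\frac{\partial(F, G)/\partial(u, x)}{\partial(F, G)/\partial(u, v)}, \quad
u_y = -\frac{\partial(F, G)/\partial(y, v)}{\partial(F, G)/\partial(u, v)}, \quad
v_y = -\frac{\partial(F, G)/\partial(u, y)}{\partial(F, G)/\partial(u, v)}.
\]
♦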
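To see the Cramer's rule formula for $u_x$ at work, the sketch below applies it to the system of Example 9.5.3, $F = xu^2 + y$ and $G = xy^2 + u^2v^2$, and compares the result with a finite-difference derivative of the explicit solution $u = \sqrt{-y/x}$. The helper names `jac`, `ux_cramer` and `u_explicit` are assumptions made for this illustration; the partial derivatives are entered by hand.

```python
import math

def jac(a, b, c, d):
    """2x2 determinant with rows (a, b) and (c, d)."""
    return a * d - b * c

def ux_cramer(x, y, u, v):
    """u_x = -[d(F,G)/d(x,v)] / [d(F,G)/d(u,v)] for the system of 9.5.3."""
    # Partials of F = x u^2 + y and G = x y^2 + u^2 v^2.
    Fx, Fu, Fv = u**2, 2 * x * u, 0.0
    Gx, Gu, Gv = y**2, 2 * u * v**2, 2 * u**2 * v
    return -jac(Fx, Fv, Gx, Gv) / jac(Fu, Fv, Gu, Gv)

def u_explicit(x, y):
    """Explicit solution u(x, y) = sqrt(-y/x), valid for x < 0 < y."""
    return math.sqrt(-y / x)

# Central difference of the explicit solution at (-1, 1) for comparison.
h = 1e-6
ux_numeric = (u_explicit(-1.0 + h, 1.0) - u_explicit(-1.0 - h, 1.0)) / (2 * h)
```

Both computations give $u_x(-1, 1) = 1/2$, consistent with $u_x = -u/(2x)$ obtained by simplifying the quotient of Jacobians.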


Exercises

1.S What does the implicit function theorem tell us about solving the equation $x + y^2 + e^{xy} = 1$ near $(0, 0)$ for one of the variables in terms of the other?

2. Suppose $(x_0, y_0, z_0)$ satisfies the equation $F(x, y, z) = 0$, where $F$ is $C^1$ in a neighborhood of $(x_0, y_0, z_0)$ and $F_z(x_0, y_0, z_0) \ne 0$. By the implicit function theorem, $F(x, y, z) = 0$ has a $C^1$ solution $z = z(x, y)$ near $(x_0, y_0)$ with $z_0 = z(x_0, y_0)$. Show that near $(x_0, y_0, z_0)$,
\[ z_x = -\frac{F_x}{F_z} \quad \text{and} \quad z_y = -\frac{F_y}{F_z}. \]

3. Show that for each of the functions $F$ below the equation $F(x, y, z) = 0$ has a local $C^1$ solution $z = z(x, y)$ on some ball $B_r(a, b)$ such that $z(a, b) = c$. Calculate $z_x$ in a neighborhood of $(a, b, c)$.
(a) $\sin(xyz) + \cos(xyz) - 1$, $(a, b, c) = (1, \pi, 0)$.
(b) $e^{xyz} + x + y + z - 1$, $(a, b, c) = (0, 0, 0)$.
(c) $z\sin(x + y + z) - \pi\sqrt{3}/6$, $(a, b, c) = (\pi/6, \pi/6, \pi/3)$.
(d) $xyz + \ln(x + y + z) - 1 - \ln 3$, $(a, b, c) = (1, 1, 1)$.
(e) $x\ln z + y\ln x + z\ln y$, $(a, b, c) = (1, 1, 1)$.
(f) $x\sin z + y\sin x + z\sin y - 3\pi/2$, $(a, b, c) = (\pi/2, \pi/2, \pi/2)$.
(g) $z^{2n} + xz^{2n-1} + xy - 1$, $n \in \mathbb{N}$, $(a, b, c) = (1, 1, -1)$.
(h) $\cos(xyz) + \cos(xz) + \cos(yz)$, $(a, b, c) = (0, 1, \pi/2)$.

4. Suppose $(x_0, y_0, z_0)$ satisfies the system $F(x, y, z) = G(x, y, z) = 0$, where $F$ and $G$ are $C^1$ in a neighborhood of $(x_0, y_0, z_0)$ and
\[ \frac{\partial(F, G)}{\partial(x, y)}(x_0, y_0, z_0) \ne 0. \]
By the implicit function theorem, the system has a $C^1$ solution $(x, y) = (x(z), y(z))$ near $z_0$ with $(x_0, y_0) = (x(z_0), y(z_0))$. Show that near $(x_0, y_0, z_0)$,
\[ x'(z) = -\frac{\partial(F, G)/\partial(z, y)}{\partial(F, G)/\partial(x, y)} \quad \text{and} \quad y'(z) = -\frac{\partial(F, G)/\partial(x, z)}{\partial(F, G)/\partial(x, y)}. \]

5.S Show that each pair of variables in the system
\[ \sin(x + z) + \ln(y + z) = \sqrt{2}/2, \qquad e^{xz} + \sin(\pi y + z) = 1 \]
are $C^1$ functions of the other variable near $(x, y, z) = (\pi/4, 1, 0)$. In the case $(x, y) = (x(z), y(z))$, calculate $x'(z)$ and $y'(z)$ in a neighborhood of $(\pi/4, 1, 0)$.

6. Show that each pair of variables in the system
\[ xy + yz + xz = 11, \qquad xyz + x + y = 9 \]
are $C^1$ functions of the other variable near $(x, y, z) = (1, 2, 3)$. In the case $(x, y) = (x(z), y(z))$, calculate $x'(z)$ and $y'(z)$ in a neighborhood of $(1, 2, 3)$.

7. Show that each pair of the variables $(u, v)$, $(x, y)$, and $(x, v)$ in the system
\[ x^2 - y^2 + uv - v^2 = 0, \qquad x^2 + y^2 + uv + u^2 = 4 \]
are $C^1$ functions of the remaining variables near $(x, y, u, v) = (1, 1, 1, 1)$. In the case $u(x, y)$, $v(x, y)$, calculate $u_x$ in a neighborhood of $(1, 1)$.

8.S Show that the system
\[
\begin{aligned}
x - y + z + u^2 &= 2 \\
-x + 2z + u^3 &= 2 \\
-y + 3z + u^4 &= 3
\end{aligned}
\]
cannot be solved for $x$, $y$, and $z$ in terms of $u$ near the point $(x, y, z, u) = (1, 1, 1, 1)$, but for any other group of three variables a local $C^1$ solution in terms of the fourth variable is possible.

9. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. Give conditions on $f_x$ and $f_y$ such that each of the equations below has a $C^1$ solution $y = y(x)$ on some interval $(-r, r)$ with $y(0) = 0$. Calculate $y'(x)$ in each case.
(a) $f(2y,\ 2x - 3y) = 0$.
(b)S $f(f(x, y),\ y) = 0$.
(c) $f(f(x, y),\ f(x, y)) = 0$.

10. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. Give conditions on $f_x$ and $f_y$ under which each of the equations below has a $C^1$ solution $z = z(x, y)$ on some open ball $B_r(0, 0)$ with $z(0, 0) = 0$.
(a) $f(2y + 3z,\ 3x - 2z) = 0$.
(b) $f\big(f(x, -z),\ z\ln(e^2 + x + y)\big) = 0$.
(c) $f\big(e^{2z} f(x, 2z),\ f(y, \sin 3z)\big) = 0$.
(d) $f\big(f(z, x),\ f(y, z)\big) = 0$.


11. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. For each system below, give conditions on $f_x$ and $f_y$ under which the system has a $C^1$ solution $x = g(z)$, $y = h(z)$ on some interval $(-r, r)$ with $g(0) = h(0) = 0$.
(a)S $f\big(f(x, y),\ f(z, y)\big) = 0$, $\ f\big(f(y, z),\ f(x, z)\big) = 0$.
(b) $f\big(f(z, z),\ f(x, y)\big) = 0$, $\ f\big(f(x, y),\ f(y, z)\big) = 0$.
For each system, calculate $g'(z)$.

12. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. What does the implicit function theorem tell us about the possibility of solving the system
\[ f\big(f(u, x),\ f(v, y)\big) = f\big(f(y, u),\ f(x, v)\big) = 0 \]
(a) for $(x, y)$ in terms of $(u, v)$ such that $x(0, 0) = y(0, 0) = 0$?
(b) for $(u, v)$ in terms of $(x, y)$ such that $u(0, 0) = v(0, 0) = 0$?

13.S Let $f$, $g$, and $h$ be continuously differentiable and $f(1) = g(1) = h(1) = 0$. Give conditions on $f'$, $g'$, and $h'$ so that the system
\[ f(xu) + g(yu) + h(zu) = 0, \qquad f(xv) + g(yv) + h(zv) = 0 \]
has a $C^1$ solution $u = u(x, y, z)$, $v = v(x, y, z)$ on some ball $B_r(1, 1, 1)$ such that $u(1, 1, 1) = v(1, 1, 1) = 1$. Calculate $u_x$.

14. Let $D \subseteq \mathbb{R}^2$ be compact and let $F(x, y, z)$ be continuous on the set $E := D \times [a, b]$ such that for each $(x, y) \in D$ there exists a unique $z = z(x, y) \in [a, b]$ for which $F(x, y, z(x, y)) = 0$. Prove that $z(x, y)$ is continuous on $D$.

15.S Suppose the equation $F(x_1, \ldots, x_n) = 0$ may be solved for each variable $x_j$ in terms of the others. Show that under suitable conditions
\[ \frac{\partial x_2}{\partial x_1}\,\frac{\partial x_3}{\partial x_2} \cdots \frac{\partial x_n}{\partial x_{n-1}}\,\frac{\partial x_1}{\partial x_n} = (-1)^n. \]
Verify this for each of the functions
(a) $F(x_1, x_2, x_3) = x_1x_2x_3 - 1$,
(b) $F(x_1, x_2, x_3, x_4) = x_1x_2x_3x_4 - 1$.

16. Let $p(x, y)$ and $q(x, y)$ be $C^1$ on an open set $U$ containing $(0, 0)$ such that $p(0, 0) = q(0, 0) = 0$ and, for $(x, y) \in U \setminus \{(0, 0)\}$, $p(x, y) > 0$ and $-1 \le q(x, y) \le 1$. Let
\[ f(x, y, z) = z^3 + p(x, y)z + q(x, y), \qquad (x, y) \in U,\ z \in \mathbb{R}. \]
Prove that there is a unique solution $z = z(x, y)$ to $f(x, y, z) = 0$ on all of $U$ which is $C^1$ on $U \setminus \{(0, 0)\}$ and satisfies $z(0, 0) = 0$.

9.6 Higher Order Partial Derivatives

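Let $f$ be a real-valued function defined on an open subset of $\mathbb{R}^2$ with first partial derivatives $f_x$ and $f_y$. The higher order partial derivatives are defined inductively by
\[
\begin{aligned}
f_{xx} &= \frac{\partial^2 f}{\partial x^2} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), &\qquad
f_{yy} &= \frac{\partial^2 f}{\partial y^2} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right), \\
f_{xy} &= \frac{\partial^2 f}{\partial y\,\partial x} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), &\qquad
f_{yx} &= \frac{\partial^2 f}{\partial x\,\partial y} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \\
f_{xxx} &= \frac{\partial^3 f}{\partial x^3} := \frac{\partial}{\partial x}\left(\frac{\partial^2 f}{\partial x^2}\right), &\qquad
f_{xxy} &= \frac{\partial^3 f}{\partial y\,\partial x^2} := \frac{\partial}{\partial y}\left(\frac{\partial^2 f}{\partial x^2}\right),
\end{aligned}
\]
and so on. Analogous definitions are given for functions of $n$ variables. For such a function $f$, integers $m_i \in \mathbb{Z}^+$ and a permutation $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$,
\[ \frac{\partial^m f}{\partial x_{i_1}^{m_1} \cdots \partial x_{i_n}^{m_n}}, \qquad m := m_1 + \cdots + m_n, \]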

is called a partial derivative of order $m$. The following result will allow some simplifications in calculating higher order partial derivatives.

9.6.1 Theorem. Let $U \subseteq \mathbb{R}^2$ be open and let $f : U \to \mathbb{R}$ have continuous first partial derivatives $f_x$ and $f_y$ on $U$. If $f_{xy}$ exists on $U$ and is continuous at $(a, b) \in U$, then $f_{yx}(a, b)$ exists and equals $f_{xy}(a, b)$.

Proof. Choose $r > 0$ such that $(a - r, a + r) \times (b - r, b + r) \subseteq U$. For $|h|, |k| < r$, define
\[
\begin{aligned}
\varphi_k(x) &= f(x, b + k) - f(x, b), \qquad x \in (a - r, a + r), \\
\psi_h(y) &= f(a + h, y) - f(a, y), \qquad y \in (b - r, b + r), \\
\Delta(h, k) &= \varphi_k(a + h) - \varphi_k(a) = \psi_h(b + k) - \psi_h(b) \\
&= f(a + h, b + k) - f(a, b + k) + f(a, b) - f(a + h, b).
\end{aligned}
\]
By the mean value theorem applied twice, there exist $s, t \in (0, 1)$ such that
\[ \Delta(h, k) = \varphi_k'(a + sh)\,h = \big[f_x(a + sh, b + k) - f_x(a + sh, b)\big]h = f_{xy}(a + sh, b + tk)\,hk. \]
By continuity of $f_{xy}$ at $(a, b)$,
\[ \lim_{(h,k) \to (0,0)} \frac{\Delta(h, k)}{hk} = \lim_{(h,k) \to (0,0)} f_{xy}(a + sh, b + tk) = f_{xy}(a, b). \]
On the other hand, for each $h$,
\[ \lim_{k \to 0} \frac{\Delta(h, k)}{k} = \lim_{k \to 0} \frac{\psi_h(b + k) - \psi_h(b)}{k} = \psi_h'(b) = f_y(a + h, b) - f_y(a, b), \]
so by the iterated limit theorem (8.4.4),
\[ \lim_{h \to 0} \frac{f_y(a + h, b) - f_y(a, b)}{h} = \lim_{h \to 0} \lim_{k \to 0} \frac{\Delta(h, k)}{hk} = \lim_{(h,k) \to (0,0)} \frac{\Delta(h, k)}{hk}. \]

Therefore, $f_{yx}(a, b) = f_{xy}(a, b)$.

The following example shows that continuity of at least one of the second partial derivatives in the theorem is essential.

9.6.2 Example. Let $f(0, 0) = 0$ and define
\[ f(x, y) = \frac{x^3y - y^3x}{x^2 + y^2} \quad \text{if } (x, y) \ne (0, 0). \]
Then the first partial derivatives exist and are continuous on $\mathbb{R}^2$, the second partial derivatives exist on $\mathbb{R}^2$, but $f_{xy}(0, 0) \ne f_{yx}(0, 0)$. Indeed, $f_x(0, 0) = 0$ and, for $y \ne 0$,
\[ f_x(0, y) = \lim_{h \to 0} \frac{f(h, y) - f(0, y)}{h} = \lim_{h \to 0} \frac{h^2y - y^3}{h^2 + y^2} = -y, \]
and similarly $f_y(x, 0) = x$. Therefore, $f_{xy}(0, 0) = -1$ and $f_{yx}(0, 0) = 1$.

♦
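The unequal mixed partials in Example 9.6.2 can be observed numerically. The sketch below is an illustration only: it approximates $f_x$ and $f_y$ by central differences with an inner step `h` much smaller than the outer step `k`, both chosen here as assumptions, and then differentiates once more at the origin.

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x**3 * y - y**3 * x) / (x * x + y * y)

def fx(x, y, h=1e-6):
    """Central-difference approximation of f_x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    """Central-difference approximation of f_y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def fxy_at_origin(k=1e-3):
    """Approximates d/dy of f_x at (0, 0); the exact value is -1."""
    return (fx(0.0, k) - fx(0.0, -k)) / (2 * k)

def fyx_at_origin(k=1e-3):
    """Approximates d/dx of f_y at (0, 0); the exact value is 1."""
    return (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)
```

The two approximations come out close to $-1$ and $1$ respectively, in agreement with the example.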

Theorem 9.6.1 may be extended to functions $f$ of $n$ variables. Indeed, if $1 \le i < j \le n$, then under suitable continuity conditions one has
\[ \frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}, \]
since the only "active" variables in this identity are $x_i$ and $x_j$. Combining this observation with an induction argument leads to the following result.

9.6.3 Corollary. Let $f$ be a real-valued function defined on an open subset $U$ of $\mathbb{R}^n$ and let $m = m_1 + m_2 + \cdots + m_n$, $m_i \in \mathbb{Z}^+$. Then, for any permutation $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$,
\[ \frac{\partial^m f}{\partial x_{i_1}^{m_{i_1}} \cdots \partial x_{i_n}^{m_{i_n}}} = \frac{\partial^m f}{\partial x_1^{m_1} \cdots \partial x_n^{m_n}}, \]
provided that all partial derivatives of $f$ up to order $m$ are continuous on $U$.


9.6.4 Definition. Let $r \in \mathbb{N}$. A real-valued function $f$ on an open set $U \subseteq \mathbb{R}^n$ is said to be of class $C^r$ on $U$ (or simply $C^r$ on $U$) if all partial derivatives up to order $r$ exist and are continuous on $U$. Also, $f$ is of class $C^\infty$ on $U$ if it is of class $C^r$ on $U$ for every $r \in \mathbb{N}$. A vector-valued function is $C^r$ if each component function is $C^r$. Continuous functions are said to be of class $C^0$. A function is of class $C^r$ on a set if it is the restriction of a $C^r$ function on a larger open set. ♦

9.6.5 Remarks. (a) A function of class $C^{r+1}$ is of class $C^r$. The function
\[ f(x_1, \ldots, x_n) = \begin{cases} x_1^{r+1} & \text{if } x_1 \ge 0, \\ 0 & \text{otherwise} \end{cases} \]
is $C^r$ on $\mathbb{R}^n$ but not $C^{r+1}$.

(b) The standard rules of differentiation show that if $f$ and $g$ are real-valued functions of class $C^r$, then so are $\alpha f$, $f + g$, $fg$, and $f/g$. For example, if $f(x, y)$ and $g(x, y)$ are of class $C^2$, then
\[ (fg)_{xx} = f_{xx}g + fg_{xx} + 2f_xg_x, \]
with similar formulas holding for $(fg)_{xy}$ and $(fg)_{yy}$. Since the terms on the right are continuous, $fg$ is $C^2$. In particular, polynomials and rational functions of several variables are of class $C^\infty$.

(c) The composite $f = g \circ h$ of real-valued $C^r$ functions is again $C^r$. This follows from the chain rule: The matrix equation $f'(x) = g'(h(x))\,h'(x)$ shows that the entries of $f'(x)$ are sums of products of $C^{r-1}$ functions, hence $f$ is $C^r$.

(d) If the function $f$ in the statement of the inverse function theorem is $C^r$ on $U$, then the local inverse of $f$ is also $C^r$. This is proved by induction on $r$ as follows. Assume that the assertion holds for $r - 1$, and let $f$ be $C^r$ on $U$. Then the entries of the matrix $f'(x)$ are $C^{r-1}$, hence, near $a$, the entries of
\[ (f^{-1})'(y) = \big[f'(f^{-1}(y))\big]^{-1} \]
are $C^{r-1}$, as these are rational functions of the entries of $f'$. Therefore, the components of $f^{-1}$ are $C^r$.

(e) If the function $F$ in the statement of the implicit function theorem is $C^r$, then the solution $y = f(x)$ to the equation $F(x, y) = 0$ is $C^r$. This follows from (d), since $f$ is constructed using the inverse function theorem. ♦

The following example illustrates how the chain rule may be used to calculate higher order partial derivatives of composite functions.

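9.6.6 Example. Let $u = f(x, y)$ be $C^2$ on $\mathbb{R}^2$ and let $x = r\cos\theta$, $y = r\sin\theta$. Then
\[
\begin{aligned}
u_r &= (\cos\theta)u_x + (\sin\theta)u_y, \qquad u_\theta = -(r\sin\theta)u_x + (r\cos\theta)u_y, \\
u_{rr} &= (\cos\theta)u_{xr} + (\sin\theta)u_{yr} \\
&= (\cos\theta)^2u_{xx} + (2\sin\theta\cos\theta)u_{xy} + (\sin\theta)^2u_{yy}, \\
u_{\theta\theta} &= -(r\cos\theta)u_x - (r\sin\theta)u_{x\theta} - (r\sin\theta)u_y + (r\cos\theta)u_{y\theta} \\
&= (r\sin\theta)^2u_{xx} - (2r^2\sin\theta\cos\theta)u_{xy} + (r\cos\theta)^2u_{yy} - ru_r.
\end{aligned}
\]
Calculations like these are useful for changing coordinates in differential operators. For example, the above equations imply that
\[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}. \tag{9.21} \]
The operator on the left is called the Laplacian. The equation expresses the Laplacian in polar coordinates. ♦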
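Identity (9.21) can be spot-checked numerically. The sketch below uses the sample function $u(x, y) = x^2y + y^3$, chosen here only for illustration (its Laplacian is $8y$), and approximates the polar side of (9.21) with central differences; the step size `h` is an assumption.

```python
import math

def u(x, y):
    """Sample C^2 function; u_xx + u_yy = 2y + 6y = 8y."""
    return x * x * y + y**3

def U(r, t):
    """The same function expressed in polar coordinates."""
    return u(r * math.cos(t), r * math.sin(t))

def polar_laplacian(r, t, h=1e-4):
    """Finite-difference value of U_rr + U_r / r + U_tt / r^2."""
    U_rr = (U(r + h, t) - 2 * U(r, t) + U(r - h, t)) / h**2
    U_r = (U(r + h, t) - U(r - h, t)) / (2 * h)
    U_tt = (U(r, t + h) - 2 * U(r, t) + U(r, t - h)) / h**2
    return U_rr + U_r / r + U_tt / r**2
```

At $(r, \theta) = (1.3, 0.7)$ the value returned by `polar_laplacian` agrees with $8y = 8r\sin\theta$ up to discretization error.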

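Exercises

1. Let $z = f(x, y)$ be $C^2$ on $\mathbb{R}^2$. Show that the following equations hold for the given functions $x = x(r, t)$ and $y = y(r, t)$.
(a) $z_{xx} + z_{yy} = \dfrac{z_{rr} + z_{tt}}{a^2 + b^2}$, $\quad x = ar + bt$, $y = at - br$.
(b)S $xz_{xx} - z_{yy} = \dfrac{rz_{rr} - tz_{tt}}{t - r}$, $\quad x = rt$, $y = r + t$.
(c) $z_{xx} + 4z_{yy} = \dfrac{z_{rr} + z_{tt}}{r^2 + t^2}$, $\quad x = rt$, $y = r^2 - t^2$.
(d) $x^2z_{xx} + y^2z_{yy} = \tfrac{1}{2}\big[r^2z_{rr} + z_{tt} - rz_r\big]$, $\quad x = re^t$, $y = re^{-t}$.
(e)S $z_{xx} + z_{yy} = e^{-2r}\big[z_{rr} + z_{tt}\big]$, $\quad x = e^r\sin t$, $y = e^r\cos t$.
(f)S $a^2x^2z_{xx} + b^2y^2z_{yy} = z_{rr} + z_{tt} - az_r - bz_t$, $\quad x = e^{ar}$, $y = e^{bt}$.

2. Let $z = f(x, y)$ be $C^2$ on $\mathbb{R}^2$, $x = ar + bs$, and $y = cr + ds$. Show that
\[ \begin{bmatrix} z_{rr} \\ z_{ss} \\ z_{rs} \end{bmatrix} = \begin{bmatrix} a^2 & c^2 & 2ac \\ b^2 & d^2 & 2bd \\ ab & cd & ad + bc \end{bmatrix} \begin{bmatrix} z_{xx} \\ z_{yy} \\ z_{xy} \end{bmatrix}. \]
In particular, if $x = r - s$ and $y = r + s$, show that
\[ \begin{bmatrix} z_{xx} \\ z_{yy} \\ z_{xy} \end{bmatrix} = \frac{1}{4} \begin{bmatrix} 1 & 1 & -2 \\ 1 & 1 & 2 \\ 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} z_{rr} \\ z_{ss} \\ z_{rs} \end{bmatrix}. \]

3. Let $z = f(x, y)$, $x = g(r, s)$, $y = h(r, s)$ be $C^2$ on $\mathbb{R}^2$. Show that
\[ \frac{\partial^2 z}{\partial r^2} = \frac{\partial z}{\partial x}\frac{\partial^2 x}{\partial r^2} + \frac{\partial z}{\partial y}\frac{\partial^2 y}{\partial r^2} + \frac{\partial^2 z}{\partial x^2}\left(\frac{\partial x}{\partial r}\right)^2 + \frac{\partial^2 z}{\partial y^2}\left(\frac{\partial y}{\partial r}\right)^2 + 2\,\frac{\partial^2 z}{\partial x\,\partial y}\frac{\partial x}{\partial r}\frac{\partial y}{\partial r}. \]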


4.S Let $F(x, y, z)$ be $C^2$ on an open set $U$ and assume that the equation $F(x, y, z) = 0$ defines $z$ implicitly as a function of $x$ and $y$. Express $z_{xx}$ in terms of partial derivatives of $F$.

5.S Show that each of the following functions $u = u(t, x)$ satisfies the one dimensional heat equation $u_t = k^2u_{xx}$.
(a) $u = (a\sin x + b\cos x)\exp(-k^2t)$.
(b) $u = t^{-1/2}\exp\big(-x^2/(4k^2t)\big)$.

6. Let f (x) and g(x) be twice differentiable. (a) Show that the function u(t, x) = f (x − ct) + g(x + ct) satisfies the one dimensional wave equation utt = c2 uxx . 1 (b) Show that the function v(t, x) = [f (x − ct) + g(x + ct)], x > 0, x 1 c2 satisfies the equation vtt = c2 1 + vxx + vx . x x 7.S (Spherical coordinate analog of (9.21)). Let w = f (x, y, z) be of class C 2 on R3 , where x = ρ sin φ cos θ, y = ρ sin φ sin θ, and z = ρ cos φ. Show that ∂2w ∂2w ∂2w ∂ 2 w 2 ∂w 1 ∂ 2 w cos φ ∂w 1 ∂2w + + = + + + + . ∂x2 ∂y 2 ∂z 2 ∂ρ2 ρ ∂ρ ρ2 ∂φ2 ρ2 sin φ ∂φ ρ2 sin2 φ ∂θ2 8. Show that if f (x, y) is C 2 and homogeneous of degree n ≥ 2 (Exercise 9.3.15), then x2 fxx + 2xyfxy + y 2 fyy = n(n − 1)f (x, y). 9.S Let g be C 2 on (0, +∞), p 6= 0, and f (x) = g (kxkp ), x ∈ Rn \ {0}. Show that n

1X fx x = (n + p − 2)kxkp−2 g 0 (kxkp ) + pkxk2(p−1) g 00 (kxkp ) and p i=1 i i h iX 1X fxi xj = (p − 2)kxkp−4 g 0 (kxkp ) + kxk2(p−2) g 00 (kxkp ) xi xj . p i 0, 0 ≤ t ≤ T


into $u_\tau(\tau, x) = (k - 1)u_x(\tau, x) + u_{xx}(\tau, x) - ku(\tau, x)$, $k := 2r/\sigma^2$. The first equation arises in the Black–Scholes theory of option pricing. The second is an example of a diffusion equation.

11. Show that the substitution
\[ u(\tau, x) = e^{ax + b\tau}w(\tau, x), \qquad a := \tfrac{1}{2}(1 - k), \quad b := a(k - 1) + a^2 - k = -\tfrac{1}{4}(k + 1)^2, \]
reduces the diffusion equation in Exercise 10 to the heat equation $w_\tau(\tau, x) = w_{xx}(\tau, x)$.

9.7 Higher Order Differentials and Taylor’s Theorem

Higher order differentials of a function $f$ of several variables are analogs of higher order derivatives of functions of a single variable. These may be conveniently expressed in terms of higher order partial derivatives of $f$. An important consequence is Taylor’s theorem in $n$ dimensions, which is used to establish convergence of power series in several variables.

We begin by giving an alternate description of the space $L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R}))$. For a member $B$ of this space and each $h \in \mathbb{R}^n$, $Bh \in L(\mathbb{R}^n, \mathbb{R})$ has matrix $[(Bh)e^1 \ \cdots \ (Bh)e^n]$, which we identify with the vector $((Bh)e^1, \ldots, (Bh)e^n)$, so that $(Bh)k$ may be written $(Bh) \cdot k$. Now define
\[ \tilde{B}(h, k) := (Bh) \cdot k, \qquad h, k \in \mathbb{R}^n. \]
Clearly, $\tilde{B}$ is linear in $h$ for each fixed $k$ and linear in $k$ for each fixed $h$. Such a function is called a bilinear functional on $\mathbb{R}^n$. Using the bilinearity, we have
\[ \tilde{B}(h, k) = \tilde{B}\Big(\sum_{i=1}^n h_ie^i,\ \sum_{j=1}^n k_je^j\Big) = \sum_{i=1}^n\sum_{j=1}^n B_{ij}h_ik_j, \tag{9.22} \]
where $B_{ij} := \tilde{B}(e^i, e^j) = (Be^i) \cdot e^j$. In matrix notation,
\[ \tilde{B}(h, k) = [h_1 \ \cdots \ h_n]\,\big[B_{ij}\big] \begin{bmatrix} k_1 \\ \vdots \\ k_n \end{bmatrix}. \]


Conversely, given any bilinear functional $\tilde{B}$ on $\mathbb{R}^n$, the equation $(Bh)k := \tilde{B}(h, k)$ defines a member $B$ of $L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R}))$. Thus, identifying $B$ with $\tilde{B}$, we see that $L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R}))$ may be viewed as the vector space of all bilinear functionals on $\mathbb{R}^n$.

Now let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}$ be $C^2$ on $U$. Then $df$ is a function on $U$ taking values in $L(\mathbb{R}^n, \mathbb{R})$. Identifying $df$ with the vector function $\nabla f = (\partial_1f, \ldots, \partial_nf)$, we define $d^2f_x \in L(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R}))$ by
\[ d^2f_x = d(df)_x = d(\nabla f)_x, \]
that is, by the above identification,
\[ d^2f_x(h, k) = d(\nabla f)_x(h) \cdot k, \qquad x \in U,\ h, k \in \mathbb{R}^n. \]
The matrix of $d(\nabla f)_x$ has $(i, j)$ entry $\partial_j\partial_if(x) = \partial_i\partial_jf(x)$, since $f$ is $C^2$. Thus
\[ d^2f_x(h, k) = \sum_{i,j=1}^n \frac{\partial^2f(x)}{\partial x_i\,\partial x_j}\,h_ik_j, \qquad h, k \in \mathbb{R}^n. \]
The bilinear functional $d^2f_x$ is called the second order differential of $f$ at $x$. For higher order differentials, we need the following generalization of a bilinear functional:

9.7.1 Definition. An $m$-multilinear functional on $\mathbb{R}^n$ is a real-valued function $M(h^1, \ldots, h^m)$ of vectors $h^j = (h^j_1, \ldots, h^j_n) \in \mathbb{R}^n$ that is linear in each variable $h^j$ when the other variables are held fixed. ♦

Analogous to (9.22) we have
\[ M(h^1, \ldots, h^m) = \sum_{j_1=1}^n \cdots \sum_{j_m=1}^n M_{j_1,\ldots,j_m}\,h^1_{j_1} \cdots h^m_{j_m}, \tag{9.23} \]
where $M_{j_1,\ldots,j_m} := M(e^{j_1}, \ldots, e^{j_m})$.

Now let $f$ be $C^m$ on $U$, $m \ge 2$. The $m$th order differential of $f$ at $x$ is defined inductively by $d^mf_x = d(d^{m-1}f)_x$. As in the case $m = 2$, we may interpret $d^mf_x$ as the $m$-multilinear functional
\[ d^mf_x(h^1, \ldots, h^m) = \sum_{j_1=1}^n \cdots \sum_{j_m=1}^n \frac{\partial^mf(x)}{\partial x_{j_1} \cdots \partial x_{j_m}}\,h^1_{j_1} \cdots h^m_{j_m}. \]
The $m$th total differential $D^mf_x$ of $f$ at $x$ is then defined by
\[ D^mf_x(h) := d^mf_x(h, \ldots, h) = \sum_{j_1=1}^n \cdots \sum_{j_m=1}^n \frac{\partial^mf(x)}{\partial x_{j_1} \cdots \partial x_{j_m}}\,h_{j_1} \cdots h_{j_m}, \qquad h := (h_1, \ldots, h_n), \tag{9.24} \]


which is frequently written
\[ D^mf_x = \sum_{j_1, j_2, \ldots, j_m = 1}^n \frac{\partial^mf(x)}{\partial x_{j_1}\,\partial x_{j_2} \cdots \partial x_{j_m}}\,dx_{j_1}\,dx_{j_2} \cdots dx_{j_m}, \qquad dx_j(h) := h_j. \]
By 9.6.3, each partial derivative in (9.24) may be expressed as
\[ \frac{\partial^mf}{\partial x_1^{m_1} \cdots \partial x_n^{m_n}}, \qquad m := m_1 + \cdots + m_n,\ m_j \in \mathbb{Z}^+. \]
Similarly, the corresponding product of $h$’s in (9.24) may be written in the form $h_1^{m_1} \cdots h_n^{m_n}$. For a fixed multi-index $(m_1, \ldots, m_n) \in \mathbb{Z}^+ \times \cdots \times \mathbb{Z}^+$, the number of terms in (9.24) of the form
\[ \frac{\partial^mf}{\partial x_1^{m_1} \cdots \partial x_n^{m_n}}\,h_1^{m_1} \cdots h_n^{m_n} \]
is given by the multinomial coefficient
\[ \binom{m}{m_1, m_2, \ldots, m_n} = \frac{m!}{m_1!\,m_2! \cdots m_n!}, \tag{9.25} \]
which is the number of distinct ways of arranging $m$ objects, where $m_1$ are alike, $m_2$ are alike, etc. With this notation, (9.24) may be written
\[ D^mf_x(h) = \sum \binom{m}{m_1, \ldots, m_n} \frac{\partial^mf(x)}{\partial x_1^{m_1} \cdots \partial x_n^{m_n}}\,h_1^{m_1} \cdots h_n^{m_n}, \tag{9.26} \]
or, in differential notation,
\[ D^mf_x = \sum \binom{m}{m_1, \ldots, m_n} \frac{\partial^mf(x)}{\partial x_1^{m_1} \cdots \partial x_n^{m_n}}\,(dx_1)^{m_1} \cdots (dx_n)^{m_n}, \]
where the sums are taken over all multi-indices $(m_1, \ldots, m_n) \in \mathbb{Z}^+ \times \cdots \times \mathbb{Z}^+$ for which $m_1 + \cdots + m_n = m$. We may go a step further by appealing to the following generalization of the binomial theorem.

9.7.2 Multinomial Theorem. Let $h_1, \ldots, h_n \in \mathbb{R}$ and $m \in \mathbb{N}$. Then
\[ (h_1 + \cdots + h_n)^m = \sum \binom{m}{m_1, \ldots, m_n} h_1^{m_1} \cdots h_n^{m_n}, \tag{9.27} \]
where the summation is taken over all multi-indices $(m_1, \ldots, m_n)$ for which $m_1 + \cdots + m_n = m$.

Proof. The theorem may be proved by induction, but we give a combinatorial argument instead. The left side of (9.27) expands into a sum of products of the form $x_1 \cdots x_m$, where each $x_i$ is one of the terms in the sum $h_1 + \cdots + h_n$.


Each such product may be written uniquely as $h_1^{m_1} \cdots h_n^{m_n}$, where $m_j \ge 0$ and $m_1 + \cdots + m_n = m$. For each fixed $(m_1, \ldots, m_n)$, the number of products of this form is the number of ways $m_1$ factors in the product $x_1 \cdots x_m$ may be chosen to be $h_1$, $m_2$ factors may be chosen to be $h_2$, etc. This number is precisely the multinomial coefficient (9.25).
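The multinomial theorem (9.27) is easy to test by brute-force enumeration of multi-indices. The sketch below is an illustration only; the helper names `multinomial`, `multi_indices` and `expand` are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from itertools import product
from math import factorial, prod

def multinomial(ms):
    """m! / (m1! ... mn!) for a multi-index ms with sum m, as in (9.25)."""
    out = factorial(sum(ms))
    for mj in ms:
        out //= factorial(mj)
    return out

def multi_indices(n, m):
    """All (m1, ..., mn) of nonnegative integers with m1 + ... + mn = m."""
    return [ms for ms in product(range(m + 1), repeat=n) if sum(ms) == m]

def expand(h, m):
    """Right side of (9.27) for the numbers h = (h1, ..., hn)."""
    return sum(multinomial(ms) * prod(hj**mj for hj, mj in zip(h, ms))
               for ms in multi_indices(len(h), m))
```

For instance, `expand((1, 2, 3), 4)` sums over the 15 multi-indices of length 3 and total 4 and reproduces $(1 + 2 + 3)^4 = 1296$.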

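Now consider the operator $h_i\dfrac{\partial}{\partial x_i}$, which takes a $C^1$ function $f$ to the function $h_i\dfrac{\partial f}{\partial x_i}$. If multiplication of such operators is defined as operator composition, then the usual laws of algebra hold. For example, the operator
\[ \left(h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2}\right)h_2\frac{\partial}{\partial x_2} \]
applied to a $C^2$ function $f$ yields
\[ h_1h_2\frac{\partial^2f}{\partial x_1\,\partial x_2} + h_2^2\frac{\partial^2f}{\partial x_2^2} = \left(h_1h_2\frac{\partial^2}{\partial x_1\,\partial x_2} + h_2^2\frac{\partial^2}{\partial x_2^2}\right)f, \]
hence we may write
\[ \left(h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2}\right)h_2\frac{\partial}{\partial x_2} = h_1h_2\frac{\partial^2}{\partial x_1\,\partial x_2} + h_2^2\frac{\partial^2}{\partial x_2^2}. \]
Similarly,
\[ \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 = h^2\frac{\partial^2}{\partial x^2} + 2hk\frac{\partial^2}{\partial x\,\partial y} + k^2\frac{\partial^2}{\partial y^2}. \]
The last example suggests that the multinomial theorem is valid in this setting. This is indeed the case (a similar proof works). It follows from (9.26) that the $m$th total differential may be written in operator form as
\[ D^mf_x(h) = \left(h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2} + \cdots + h_n\frac{\partial}{\partial x_n}\right)^m f(x) = (h \cdot \nabla)^mf(x). \]
We may now state the $n$-dimensional version of Taylor’s theorem.

9.7.3 Taylor’s Theorem. Let $U \subseteq \mathbb{R}^n$ be open, $m \in \mathbb{N}$, and let $f : U \to \mathbb{R}$ be $C^{m+1}$ on $U$. Then for each pair of distinct points $a, x \in U$ for which $[a : x] \subseteq U$, there exists a point $c \in [a : x]$ depending on $x$ and $a$ such that
\[ f(x) = \sum_{p=0}^m \frac{1}{p!}(h \cdot \nabla)^pf(a) + \frac{1}{(m + 1)!}(h \cdot \nabla)^{m+1}f(c), \qquad h := x - a. \tag{9.28} \]

Proof. The line segment $[a : x]$ is described by
\[ \varphi(t) := (1 - t)a + tx = a + th, \qquad 0 \le t \le 1. \]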


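Since $U$ is open, there exists an $r > 0$ such that $\varphi((-r, 1 + r)) \subseteq U$. Let $F = f \circ \varphi$. By the chain rule,
\[ F'(t) = \sum_{j=1}^n \frac{\partial f(\varphi(t))}{\partial x_j}\,h_j \quad \text{and} \quad \frac{d}{dt}\,\frac{\partial f(\varphi(t))}{\partial x_j} = \sum_{i=1}^n \frac{\partial^2f(\varphi(t))}{\partial x_i\,\partial x_j}\,h_i, \]
hence
\[ F''(t) = \sum_{i,j=1}^n \frac{\partial^2f(\varphi(t))}{\partial x_i\,\partial x_j}\,h_ih_j. \]
An induction argument shows that
\[ F^{(p)}(t) = \sum_{j_1, \ldots, j_p = 1}^n \frac{\partial^pf(\varphi(t))}{\partial x_{j_1} \cdots \partial x_{j_p}}\,h_{j_1} \cdots h_{j_p} = (h \cdot \nabla)^pf(\varphi(t)). \]
By Taylor’s theorem in one variable, there exists $t_0 \in (0, 1)$ such that
\[ f(x) = F(1) = \sum_{p=0}^m \frac{F^{(p)}(0)}{p!} + \frac{F^{(m+1)}(t_0)}{(m + 1)!}. \]
Setting $c = \varphi(t_0)$ completes the proof.

The summation in (9.28) is called an $m$th order Taylor polynomial about $a$ and is denoted by $T_m(x, a)$. For example, the second order Taylor polynomial of a $C^2$ function $f(x_1, x_2)$ is
\[ f + h_1\frac{\partial f}{\partial x_1} + h_2\frac{\partial f}{\partial x_2} + \frac{1}{2}h_1^2\frac{\partial^2f}{\partial x_1^2} + h_1h_2\frac{\partial^2f}{\partial x_1\,\partial x_2} + \frac{1}{2}h_2^2\frac{\partial^2f}{\partial x_2^2}, \]
where $h_j = x_j - a_j$ and the terms are evaluated at $(a_1, a_2)$. The last term in (9.28) is called the remainder term and is denoted by $R_m(x, a)$. The following theorem gives a sufficient condition for a $C^\infty$ function to be expressed as a multi-variable Taylor series.

9.7.4 Taylor Series Representation. Let $U \subseteq \mathbb{R}^n$ be open and convex and let $f : U \to \mathbb{R}$ be $C^\infty$ on $U$. Suppose that for some $M < +\infty$
\[ \left|\frac{\partial^pf(x)}{\partial x_1^{p_1}\,\partial x_2^{p_2} \cdots \partial x_n^{p_n}}\right| \le M \]
for all $x \in U$, $p \in \mathbb{N}$, and all $p_j \in \mathbb{Z}^+$, where $p = p_1 + \cdots + p_n$. Then
\[ f(x) = \sum_{p=0}^\infty \frac{1}{p!}(h \cdot \nabla)^pf(a), \qquad a, x \in U,\ h := x - a. \tag{9.29} \]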


Proof. By 9.7.3, the theorem will follow if we show that the remainder term
\[ R_m(x, a) = \frac{1}{(m + 1)!}(h \cdot \nabla)^{m+1}f(c) \]
tends to zero as $m \to \infty$. By (9.26),
\[ |R_m(x, a)| \le \frac{M}{(m + 1)!}\sum \binom{m + 1}{m_1, \ldots, m_n}|h_1|^{m_1}|h_2|^{m_2} \cdots |h_n|^{m_n}, \]
where the summation is taken over all multi-indices $(m_1, \ldots, m_n)$ for which $m_1 + \cdots + m_n = m + 1$. By the multinomial theorem, this sum is $s^{m+1}$, where $s = |h_1| + |h_2| + \cdots + |h_n|$. Therefore,
\[ |R_m(x, a)| \le \frac{M s^{m+1}}{(m + 1)!}, \]
which implies $\lim_m R_m(x, a) = 0$.

The series on the right in (9.29) is called the Taylor series for $f$ about $a$. While the theorem may be applied directly, in many cases it is easier to make use of single variable series. For example, from the series expansion for $e^x$ we have
\[ e^{xy} = \sum_{n=0}^\infty \frac{x^ny^n}{n!} \quad \text{and} \quad e^{x+y} = e^xe^y = \sum_{n=0}^\infty\sum_{j=0}^n \frac{x^jy^{n-j}}{j!\,(n - j)!}. \]
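The two series for $e^{xy}$ and $e^{x+y}$ can be compared with the exponential function by summing finitely many terms. The sketch below is a small illustration; the function names and the truncation level `N` are assumptions.

```python
import math

def exp_xy_partial(x, y, N):
    """Partial sum of sum_n (xy)^n / n! for n = 0, ..., N."""
    return sum((x * y)**n / math.factorial(n) for n in range(N + 1))

def exp_sum_partial(x, y, N):
    """Partial sum of sum_n sum_{j<=n} x^j y^(n-j) / (j! (n-j)!), n <= N."""
    return sum(x**j * y**(n - j) / (math.factorial(j) * math.factorial(n - j))
               for n in range(N + 1) for j in range(n + 1))
```

For moderate arguments, such as $x = 0.3$, $y = -0.5$, twenty terms already agree with `math.exp(x * y)` and `math.exp(x + y)` to within rounding error.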

Exercises

1. Let $f$ be of class $C^3$. Write out explicitly
(a) $D^2f(x, y)$. (b)S $D^3f(x, y)$. (c) $D^2f(x, y, z)$.

2.S Calculate $D^2f$ for the functions $f(x, y) =$
(a) $x^3y^2 + x^2y^3$. (b) $\dfrac{1}{x^2y}$. (c) $\sin(xy)$. (d) $e^{x^2 + y^2}$. (e) $\ln(x^2 + y)$.

3.S Find $D^{m+n+1}(x^my^n)$.

4. Let $f(x, y)$ be $C^n$. Show that for $1 \le k \le n$,
\[ \frac{\partial^k}{\partial t^k}f(tx, ty)\Big|_{t=1} = \big((x, y) \cdot \nabla\big)^kf(x, y). \]
Conclude that if $f$ is homogeneous of degree $n$ (Exercise 9.3.15), then
\[ \big((x, y) \cdot \nabla\big)^kf(x, y) = n(n - 1) \cdots (n - k + 1)f(x, y). \]


5. Write out explicitly
(a)S the first order Taylor polynomial for a $C^1$ function $f(x_1, x_2, x_3)$.
(b) the third order Taylor polynomial for a $C^3$ function $f(x_1, x_2)$.

6. A polynomial of degree $m + n$ in two variables $x$ and $y$ is a function of the form
$$\sum_{i=0}^{m} \sum_{j=0}^{n} a_{ij} x^i y^j, \quad\text{where } a_{ij} \in \mathbb{R} \text{ and } a_{mn} \ne 0.$$
Prove that $f(x, y)$ is a polynomial of degree $\le p$ on $B_r(a, b)$ iff $D^{p+1} f(x, y) = 0$ for all $(x, y) \in B_r(a, b)$.

7. Let $P(x, y)$ be a polynomial in $x, y$. Prove that the polynomials $P(x \pm 1)$ may be written as linear combinations of derivatives
$$\frac{\partial^k P(x, y)}{\partial x^i\, \partial y^j}, \qquad k = i + j.$$

8.S Let $\varphi(t)$ be of class $C^m$ on an interval $(-r, r)$ and let $f(x) = \varphi(b \cdot x)$, where $b, x \in \mathbb{R}^n$. Show that the Taylor polynomial for $f$ of order $m$ about $0$ is
$$\sum_{p=0}^{m} \frac{\varphi^{(p)}(0)}{p!}\,(b \cdot x)^p.$$

9. Let $U \subseteq \mathbb{R}^2$ be open and connected and let $f$ be $C^\infty$ on $U$ such that for each $(x, y) \in U$ there exist $r > 0$ and $p \in \mathbb{N}$ depending on $(x, y)$ such that $D^p f = 0$ on $B_r(x, y)$. Prove that there exists a single $p \in \mathbb{N}$ such that $D^p f = 0$ on $U$. Hint. Use Exercise 6.

10. Let $U \subseteq \mathbb{R}^n$ be open and let $f$ be $C^p$ on $U$ such that all partial derivatives of $f$ of order $r < p$ vanish throughout $U$. Let $C$ be a compact convex subset of $U$. Prove that there exists $c < +\infty$ such that
$$\|f(x) - f(y)\| \le c\,\|x - y\|^p, \qquad x, y \in C.$$

11. Use the one variable Taylor series to find third order Taylor polynomials with $a = (0, 0)$ for the functions
(a)S $\sin(x + y)$. (b) $\cos\sqrt{xy}$. (c) $\ln(1 - x - y)^{-1}$.
(d)S $\arctan(x + y)$. (e) $e^{2x + 3y}$. (f) $\dfrac{y}{1 + xy}$.

*9.8 Optimization

Throughout the section, $f : U \to \mathbb{R}$ denotes a $C^1$ function on an open subset $U$ of $\mathbb{R}^n$.

In this section we use differential theory to find the maximum and minimum values of f on subsets E of U . The first step is to find all local extrema.

Local Extrema and Critical Points

9.8.1 Definition. Let $a \in U$. If $f(a)$ is the maximum (minimum) value of $f$ on some ball in $U$ with center $a$, then $f$ is said to have a local maximum (local minimum) at $a$. In either case, $f$ is said to have a local extremum at $a$. ♦

The following theorem gives a necessary condition for the existence of a local extremum.

9.8.2 Local Extremum Theorem. If $f$ has a local extremum at $a$, then $df_a = 0$.

Proof. For each $j$, the function $g(t) := f(a_1, \dots, a_{j-1}, t, a_{j+1}, \dots, a_n)$ has a local extremum at $t = a_j$, hence, by the single variable local extremum theorem (4.2.2), $\partial_j f(a_1, \dots, a_n) = g'(a_j) = 0$. Since every partial derivative of $f$ vanishes at $a$, $df_a = 0$.

9.8.3 Definition. A point $a \in U$ is called a critical point of $f$ if $df_a = 0$. A critical point $a$ is a local maximum (local minimum) point if $f$ has a local maximum (local minimum) at $a$. If $a$ is neither a local maximum nor a local minimum point, then $a$ is called a saddle point. ♦

FIGURE 9.2: Saddle point.

By definition, a critical point $a$ is a saddle point iff in each ball $B_r(a)$ there exist points $x$ and $y$ such that $f(x) < f(a) < f(y)$. This means that the graph of $f$ rises in some directions from $a$ and falls in others. A familiar example is $f(x, y) = y^2 - x^2$ at $(0, 0)$ (Figure 9.2).


Second Derivative Test

The following theorem gives sufficient conditions for a critical point of a function $f$ to be a local maximum point, a local minimum point, or a saddle point. It may be seen as an extension of the second derivative test for functions of one variable.

9.8.4 Second Derivative Test. Let $f$ be $C^2$ on $U$ and let $a \in U$ be a critical point of $f$.
(a) If $D^2 f_a(h) > 0$ for all $h \ne 0$, then $a$ is a local minimum point.
(b) If $D^2 f_a(h) < 0$ for all $h \ne 0$, then $a$ is a local maximum point.
(c) If $D^2 f_a(h) > 0$ for some $h$ and $D^2 f_a(k) < 0$ for some $k$, then $a$ is a saddle point of $f$.

Proof. Choose $r > 0$ such that $B_r(a) \subseteq U$. By (9.28) with $m = 1$, for each $h$ with $\|h\| < r$ there exists $c \in [a : a + h]$ such that
$$f(a + h) - f(a) = \tfrac{1}{2} D^2 f_c(h) = \tfrac{1}{2}\big[D^2 f_a(h) + \eta(h)\big], \tag{9.30}$$
(the first order term vanishes because $df_a = 0$), where
$$\eta(h) = D^2 f_c(h) - D^2 f_a(h) = \sum_{i,j=1}^{n} h_i h_j \left[\frac{\partial^2 f(c)}{\partial x_i\, \partial x_j} - \frac{\partial^2 f(a)}{\partial x_i\, \partial x_j}\right].$$
Set
$$\varepsilon(h) = \begin{cases} \|h\|^{-2}\,\eta(h) & \text{if } \|h\| \ne 0,\\ 0 & \text{if } \|h\| = 0. \end{cases}$$
Since $|h_i h_j| \le \|h\|^2$,
$$|\varepsilon(h)| \le \sum_{i,j=1}^{n} \left|\frac{\partial^2 f(c)}{\partial x_i\, \partial x_j} - \frac{\partial^2 f(a)}{\partial x_i\, \partial x_j}\right|.$$
Since $f$ is $C^2$, $\lim_{h \to 0} \varepsilon(h) = 0$.

With these preliminaries out of the way, assume that the hypothesis in (a) holds. Since the function $D^2 f_a(h)$ is continuous in $h$, it has a positive minimum $m$ on the sphere $S_1(0)$ in $\mathbb{R}^n$. Thus
$$D^2 f_a(h) = \|h\|^2\, D^2 f_a\!\left(\frac{h}{\|h\|}\right) \ge m\|h\|^2, \qquad h \ne 0,$$
so from (9.30)
$$f(a + h) - f(a) \ge \tfrac{1}{2}\big[m\|h\|^2 + \eta(h)\big] = \tfrac{1}{2}\big[m + \varepsilon(h)\big]\|h\|^2.$$
Since $m > 0$ and $\varepsilon(h) \to 0$, $f(a + h) - f(a) > 0$ for all $h \ne 0$ with sufficiently small norm. This proves (a). Part (b) follows from (a) by considering $-f$.


To prove (c), suppose for some $h, k$ that $D^2 f_a(h) > 0$ and $D^2 f_a(k) < 0$. By (9.30),
$$f(a + th) - f(a) = \frac{t^2}{2}\big[D^2 f_a(h) + \|h\|^2\, \varepsilon(th)\big]$$
for all $t > 0$ with $t\|h\| < r$. Therefore, $f(a + th) - f(a) > 0$ for all sufficiently small $t > 0$. Similarly, $f(a + tk) - f(a) < 0$ for all sufficiently small $t > 0$.

9.8.5 Example. Let $f(x, y, z) = x^2 + y^2 + xy + 3x + \sin^2 z$. The system
$$f_x = 2x + y + 3 = 0, \quad f_y = x + 2y = 0, \quad f_z = \sin(2z) = 0$$
has solutions $a_n = (-2, 1, n\pi/2)$, $n \in \mathbb{Z}$. From $f_{xx} = f_{yy} = 2$, $f_{zz} = 2\cos(2z)$, $f_{xy} = 1$, and $f_{xz} = f_{yz} = 0$, we have
$$D^2 f(h, k, \ell) = \left(h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} + \ell \frac{\partial}{\partial z}\right)^{2} f = h^2 f_{xx} + k^2 f_{yy} + \ell^2 f_{zz} + 2(hk f_{xy} + h\ell f_{xz} + k\ell f_{yz}) = 2\big[h^2 + k^2 + hk + \ell^2 \cos(2z)\big].$$
Therefore,
$$D^2 f_{a_n}(h, k, \ell) = \begin{cases} 2(h^2 + k^2 + hk + \ell^2) & \text{if } n \text{ is even},\\ 2(h^2 + k^2 + hk - \ell^2) & \text{if } n \text{ is odd}. \end{cases}$$
Since $h^2 + k^2 + hk = (h + k/2)^2 + 3k^2/4 \ge 0$ for all $h, k$, with equality only for $h = k = 0$, $a_n$ is a local minimum point if $n$ is even and a saddle point if $n$ is odd. ♦

The second derivative test gives no information if $D^2 f_a$ does not change sign but vanishes at some $h \ne 0$. For example, the critical point $(0, 0)$ of the function $f(x, y) = x^n + y^2$, $n \ge 3$, is a saddle point if $n$ is odd and a local minimum point if $n$ is even.

For functions of two variables, there is a simpler version of the second derivative test:

9.8.6 Corollary. Let $U \subseteq \mathbb{R}^2$ be open and let $f : U \to \mathbb{R}$ be $C^2$ on $U$. For a critical point $(a, b)$ of $f$, set
$$\Delta = \Delta(a, b) = \begin{vmatrix} f_{xx}(a, b) & f_{xy}(a, b)\\ f_{yx}(a, b) & f_{yy}(a, b) \end{vmatrix} = f_{xx}(a, b)\, f_{yy}(a, b) - f_{xy}^2(a, b).$$
(a) If $\Delta > 0$ and $f_{xx}(a, b) > 0$, then $(a, b)$ is a local minimum point.
(b) If $\Delta > 0$ and $f_{xx}(a, b) < 0$, then $(a, b)$ is a local maximum point.
(c) If $\Delta < 0$, then $(a, b)$ is a saddle point of $f$.


Proof. Let $\alpha = f_{xx}(a, b)$, $\beta = f_{xy}(a, b)$, and $\gamma = f_{yy}(a, b)$. Then $\Delta = \alpha\gamma - \beta^2$ and
$$D^2 f_{(a,b)}(h, k) = \alpha h^2 + 2\beta hk + \gamma k^2, \qquad h, k \in \mathbb{R}. \tag{9.31}$$
If $\alpha \ne 0$, completing the square yields
$$D^2 f_{(a,b)}(h, k) = \alpha\left(h + \frac{k\beta}{\alpha}\right)^{2} + \frac{k^2(\alpha\gamma - \beta^2)}{\alpha} = \alpha\left(h + \frac{k\beta}{\alpha}\right)^{2} + \frac{k^2 \Delta}{\alpha}.$$
Thus if $\Delta > 0$, $\alpha > 0$, and $(h, k) \ne (0, 0)$, then $D^2 f_{(a,b)}(h, k) > 0$, hence, by the theorem, (a) holds. A similar argument proves (b).

Now suppose $\Delta < 0$. If $\alpha \ne 0$, then from (9.31)
$$D^2 f_{(a,b)}(1, 0) = \alpha \quad\text{and}\quad D^2 f_{(a,b)}(-\beta\alpha^{-1}, 1) = \frac{\Delta}{\alpha},$$
which have opposite signs. If $\gamma \ne 0$, then completing the square yields
$$D^2 f_{(a,b)}(h, k) = \gamma\left(k + \frac{h\beta}{\gamma}\right)^{2} + \frac{h^2 \Delta}{\gamma},$$
and one may argue similarly. (This also shows that (a) and (b) hold with $f_{xx}$ in the statement replaced by $f_{yy}$.) Finally, if $\alpha = \gamma = 0$, then $\beta \ne 0$, and (9.31) shows that, again, $D^2 f_{(a,b)}(h, k)$ has positive and negative values. This proves (c).

9.8.7 Example. Let $f(x, y) = 3x^2 y + 2xy^2 - 6xy$. Since
$$f_x(x, y) = 2y(3x + y - 3) \quad\text{and}\quad f_y(x, y) = x(3x + 4y - 6),$$
the critical points are $(0, 0)$, $(2, 0)$, $(0, 3)$, and $(2/3, 1)$.

TABLE 9.1: Values of $\Delta$.

$(a, b)$           $(0, 0)$   $(2, 0)$   $(0, 3)$   $(2/3, 1)$
$f_{xx}(a, b)$     $0$        $0$        $18$       $6$
$f_{yy}(a, b)$     $0$        $8$        $0$        $8/3$
$f_{xy}(a, b)$     $-6$       $6$        $6$        $2$
$\Delta(a, b)$     $-36$      $-36$      $-36$      $12$

Table 9.1 shows that $f$ has three saddle points and one local minimum point. ♦
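The entries of Table 9.1 can be reproduced mechanically. The sketch below is an illustration only: the helper names `second_partials` and `classify` are not from the text. It evaluates the second partials of $f(x, y) = 3x^2 y + 2xy^2 - 6xy$ exactly with rational arithmetic and applies the criteria of Corollary 9.8.6.

```python
from fractions import Fraction

def second_partials(x, y):
    """Second partials of f(x, y) = 3x^2 y + 2x y^2 - 6xy, computed by hand:
    f_xx = 6y, f_yy = 4x, f_xy = 6x + 4y - 6."""
    return 6 * y, 4 * x, 6 * x + 4 * y - 6

def classify(x, y):
    """Return (Delta, verdict) for a critical point (x, y) via Corollary 9.8.6."""
    fxx, fyy, fxy = second_partials(x, y)
    delta = fxx * fyy - fxy ** 2
    if delta < 0:
        verdict = "saddle"
    elif delta > 0:
        verdict = "local min" if fxx > 0 else "local max"
    else:
        verdict = "inconclusive"  # the corollary says nothing when Delta = 0
    return delta, verdict

if __name__ == "__main__":
    for point in [(0, 0), (2, 0), (0, 3), (Fraction(2, 3), 1)]:
        print(point, classify(*point))
```

Using `Fraction` for the point $(2/3, 1)$ avoids rounding, so the value $\Delta = 12$ is obtained exactly.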


9.8.8 Example. Let
$$f(x, y) = (cx^2 + y^2)\,e^{-x^2 - y^2}, \qquad c \ne 0, 1.$$
The system
$$f_x = 2x e^{-x^2 - y^2}(c - cx^2 - y^2) = 0, \qquad f_y = 2y e^{-x^2 - y^2}(1 - cx^2 - y^2) = 0$$
has solutions $(0, 0)$, $(0, \pm 1)$, and $(\pm 1, 0)$. The second partial derivatives are
$$f_{xx} = 2e^{-x^2 - y^2}\big[c - 3cx^2 - y^2 + 2x^2(cx^2 + y^2 - c)\big],$$
$$f_{yy} = 2e^{-x^2 - y^2}\big[1 - cx^2 - 3y^2 + 2y^2(cx^2 + y^2 - 1)\big], \text{ and}$$
$$f_{xy} = 4xy\,e^{-x^2 - y^2}\big[cx^2 + y^2 - c - 1\big].$$

TABLE 9.2: Values of $\Delta$.

$(a, b)$           $(0, 0)$   $(0, 1)$         $(0, -1)$        $(1, 0)$          $(-1, 0)$
$f_{xx}(a, b)$     $2c$       $2(c - 1)/e$     $2(c - 1)/e$     $-4c/e$           $-4c/e$
$f_{yy}(a, b)$     $2$        $-4/e$           $-4/e$           $2(1 - c)/e$      $2(1 - c)/e$
$f_{xy}(a, b)$     $0$        $0$              $0$              $0$               $0$
$\Delta(a, b)$     $4c$       $8(1 - c)/e^2$   $8(1 - c)/e^2$   $8c(c - 1)/e^2$   $8c(c - 1)/e^2$

The values of $\Delta$ at the critical points $(a, b)$ are given in Table 9.2. Assigning values to $c$ produces a variety of local extreme points. For example, if $c > 1$, then $(0, \pm 1)$ are saddle points, $(0, 0)$ is a local minimum point, and $(\pm 1, 0)$ are local maximum points of $f$. ♦
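A numerical cross-check of this example for one value of the parameter may be helpful. The sketch below takes $c = 2$, a value chosen here only for illustration, approximates the second partials of $f(x, y) = (cx^2 + y^2)e^{-x^2 - y^2}$ by central differences, and applies Corollary 9.8.6. The step size `h` and the tolerance are assumptions of the sketch, not part of the text.

```python
import math

def f(x, y, c):
    return (c * x ** 2 + y ** 2) * math.exp(-x ** 2 - y ** 2)

def hessian(x, y, c, h=1e-4):
    """Central-difference approximations of f_xx, f_yy, f_xy at (x, y)."""
    fxx = (f(x + h, y, c) - 2 * f(x, y, c) + f(x - h, y, c)) / h ** 2
    fyy = (f(x, y + h, c) - 2 * f(x, y, c) + f(x, y - h, c)) / h ** 2
    fxy = (f(x + h, y + h, c) - f(x + h, y - h, c)
           - f(x - h, y + h, c) + f(x - h, y - h, c)) / (4 * h ** 2)
    return fxx, fyy, fxy

def classify(x, y, c, tol=1e-6):
    fxx, fyy, fxy = hessian(x, y, c)
    delta = fxx * fyy - fxy ** 2
    if abs(delta) < tol:
        return "inconclusive"
    if delta < 0:
        return "saddle"
    return "local min" if fxx > 0 else "local max"

if __name__ == "__main__":
    for point in [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]:
        print(point, classify(*point, c=2))
```

At $(1, 0)$ with $c = 2$ the exact values are $f_{xx} = -8/e$ and $f_{yy} = -2/e$, so $\Delta = 16/e^2 > 0$ with $f_{xx} < 0$, and the finite-difference estimate agrees with this to several decimal places.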

Global Extrema

We now turn to the problem described at the beginning of the section, namely, to find the points in a subset $E$ of $U$ at which $f$ has a maximum or a minimum. Such points, called global extrema, will always exist if $E$ is closed and bounded. The following examples illustrate a common technique for finding them.

9.8.9 Example. Let
$$f(x, y) = 2x^3 - x^2 + 3y^2, \qquad E := \{(x, y) : x^2 + y^2 \le 1\}.$$
By 9.8.2, the extreme values of $f$ occur at points on $\mathrm{bd}(E)$ or at critical points of $f$ in $\mathrm{int}(E)$. Solving the system $f_x = 6x^2 - 2x = 0$, $f_y = 6y = 0$ yields the critical points $(0, 0)$ and $(1/3, 0)$, which are candidates for extrema in $\mathrm{int}(E)$. To find possible extrema on $\mathrm{bd}(E)$ we substitute $1 - x^2$ for $y^2$ in the expression for $f$ to obtain the function $F(x) = 2x^3 - 4x^2 + 3$, $-1 \le x \le 1$. Since the only zero of $F'(x)$ in $[-1, 1]$ is $x = 0$, single variable optimization theory gives us the additional extrema candidates $(0, \pm 1)$ and $(\pm 1, 0)$. Calculating the values of $f$ at these six points shows that $f(0, \pm 1) = 3$ is the maximum value of $f$ on $E$ and $f(-1, 0) = -3$ is the minimum. ♦

9.8.10 Example. Let
$$f(x, y, z) = (x - 1)^2 + (y - 2)^2 + z^2, \qquad E := \{(x, y, z) : x^2 + y^2 + z^2 \le 6\}.$$
The solution of the system $f_x = f_y = f_z = 0$ is $(1, 2, 0)$, at which $f$ has minimum value zero. The maximum of $f$ must then occur on $\mathrm{bd}(E)$. Substituting the expression $6 - x^2 - y^2$ for $z^2$ in the definition of $f$, we obtain the function
$$F(x, y) = (x - 1)^2 + (y - 2)^2 + 6 - x^2 - y^2 = 11 - 2x - 4y, \qquad x^2 + y^2 \le 6.$$
The system $F_x = F_y = 0$ has no solution, hence the extreme values of $F$ must lie on the boundary $x^2 + y^2 = 6$. To find these values, let $x = \sqrt{6}\cos\theta$ and $y = \sqrt{6}\sin\theta$, so
$$F(x, y) = G(\theta) := 11 - 2\sqrt{6}\cos\theta - 4\sqrt{6}\sin\theta, \qquad 0 \le \theta \le 2\pi.$$
Applying single variable optimization techniques to $G$, we see that possible extreme values of $F$ on $x^2 + y^2 = 6$ occur at points $(x, y)$ for which $\theta = 0$ or $\tan\theta = 2$, that is, $(x, y) = (\sqrt{6}, 0)$ and $\pm\big(\sqrt{6/5},\, 2\sqrt{6/5}\big)$. Calculating the values of $F$ at these points shows that the maximum value of $f$ on $E$ is
$$f\big(-\sqrt{6/5},\, -2\sqrt{6/5},\, 0\big) = 11 + 2\sqrt{30} \approx 22. \quad ♦$$

In the above examples, $E$ was the closure of an open set whose boundary is a smooth surface. In many important cases, however, $E$ itself is a surface. The surfaces we shall consider are of the form
$$E = \{x \in U : g_1(x) = \cdots = g_m(x) = 0\},$$
where $U \subseteq \mathbb{R}^n$ is open, $m < n$, and the functions $g_j$ are $C^1$ on $U$. The equations $g_j(x) = 0$ are then called constraints and $E$ is the constraint set. If $f(a)$ is the maximum or minimum value of $f$ on $E$, then $f$ is said to have an extremum at $a$ subject to the constraints $g_j = 0$.

9.8.11 Example. We find the points on the surface $z^2 - x^2 y = 1$ closest to the origin. This is equivalent to minimizing $f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $z^2 - x^2 y - 1 = 0$. Although the surface is unbounded, it suffices to consider that part of the surface inside a sufficiently large ball with center $0$. To find the minimum, we substitute $z^2 = x^2 y + 1$ into $f$ to obtain a function $F(x, y) = x^2(1 + y) + y^2 + 1$ defined on an open disk containing a point at which $f$ is minimum. The critical points of $F$, solutions of the system
$$F_x = 2x(1 + y) = 0, \qquad F_y = x^2 + 2y = 0,$$
are $(0, 0)$ and $(\pm\sqrt{2}, -1)$. The last two are easily seen to be saddle points, while $(0, 0)$ is a local minimum point. Therefore, the minimum of $f$ occurs at $(0, 0, \pm 1)$, hence the distance from the surface to the origin is $1$. ♦
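The candidate-point technique of Example 9.8.9 can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function names and the grid resolution `n` are chosen here. It evaluates $f(x, y) = 2x^3 - x^2 + 3y^2$ at the six candidates and compares the result with a brute-force search over grid points of the closed unit disk.

```python
import math

def f(x, y):
    return 2 * x ** 3 - x ** 2 + 3 * y ** 2

# Interior critical points, then the boundary candidates from F'(x) = 0
# and from the endpoints x = -1, 1 of the boundary parametrization.
candidates = [(0, 0), (1 / 3, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

def extreme_values(points):
    """Smallest and largest value of f over a finite list of points."""
    values = [f(x, y) for x, y in points]
    return min(values), max(values)

def grid_extremes(n=200):
    """Brute-force min and max of f over grid points with x^2 + y^2 <= 1."""
    lo, hi = math.inf, -math.inf
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = i / n, j / n
            if x * x + y * y <= 1:
                v = f(x, y)
                lo, hi = min(lo, v), max(hi, v)
    return lo, hi

if __name__ == "__main__":
    print(extreme_values(candidates))
    print(grid_extremes())
```

The grid search is no substitute for the argument in the example, since a grid can miss an extremum; here it simply confirms that no grid point beats the candidates.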

Lagrange Multipliers

In 9.8.11, it was possible to solve the constraint equation for one of the variables in terms of the others, reducing the dimension by one, thereby simplifying the problem. This is not always possible, but the implicit function theorem may be used to solve the constraint equation locally. This is the method used in the proof of the next theorem. For its statement, we use the following notational conventions, similar to those used in the proof of the implicit function theorem.

Notation. Let $m < n$ and $p := n - m$. For points $z \in \mathbb{R}^n = \mathbb{R}^{m+p}$ we write
$$z = (x, y) = (x_1, \dots, x_m, y_1, \dots, y_p), \qquad x \in \mathbb{R}^m,\ y \in \mathbb{R}^p.$$
If $G := (g_1, \dots, g_m) : U \to \mathbb{R}^m$, then $G(z)$ may be written as $G(x, y)$. If $G$ is differentiable, we define
$$G_x = \begin{pmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_m}\\ \vdots & & \vdots\\ \dfrac{\partial g_m}{\partial x_1} & \cdots & \dfrac{\partial g_m}{\partial x_m} \end{pmatrix} \quad\text{and}\quad G_y = \begin{pmatrix} \dfrac{\partial g_1}{\partial y_1} & \cdots & \dfrac{\partial g_1}{\partial y_p}\\ \vdots & & \vdots\\ \dfrac{\partial g_m}{\partial y_1} & \cdots & \dfrac{\partial g_m}{\partial y_p} \end{pmatrix}. \quad ♦$$

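9.8.12 Lagrange Multipliers. Let $U \subseteq \mathbb{R}^n$ be open and let $f, g_j : U \to \mathbb{R}$, $j = 1, \dots, m < n$, be $C^1$ functions. Set $G := (g_1, \dots, g_m)$. Suppose that $f$ has a global extremum at $c = (a, b) \in U$ subject to the constraint $G = 0$. If $\det G_x(c) \ne 0$, then there exist constants $\lambda_1, \dots, \lambda_m$ such that
$$\nabla f(c) = \sum_{i=1}^{m} \lambda_i \nabla g_i(c). \tag{9.32}$$

Proof. Equation (9.32) is the system
$$\partial_j f(c) = \lambda_1 \frac{\partial g_1}{\partial x_j} + \cdots + \lambda_m \frac{\partial g_m}{\partial x_j}, \qquad j = 1, \dots, m,$$
$$\partial_{j+m} f(c) = \lambda_1 \frac{\partial g_1}{\partial y_j} + \cdots + \lambda_m \frac{\partial g_m}{\partial y_j}, \qquad j = 1, \dots, p,$$
with the partial derivatives of the $g_i$ evaluated at $c$, which may be written in matrix form as
$$\begin{pmatrix} \lambda_1 & \cdots & \lambda_m \end{pmatrix} G_x(c) = \begin{pmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{pmatrix}, \tag{9.33}$$
$$\begin{pmatrix} \lambda_1 & \cdots & \lambda_m \end{pmatrix} G_y(c) = \begin{pmatrix} \partial_{m+1} f(c) & \cdots & \partial_n f(c) \end{pmatrix}. \tag{9.34}$$
Equation (9.33) is satisfied by defining
$$\begin{pmatrix} \lambda_1 & \cdots & \lambda_m \end{pmatrix} := \begin{pmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{pmatrix} G_x^{-1}(c). \tag{9.35}$$
It remains to show that (9.34) is satisfied for this choice of $\lambda_1, \dots, \lambda_m$. By the implicit function theorem applied to $G$, there is an open set $V_b \subseteq \mathbb{R}^p$ containing $b$ and a continuously differentiable mapping $h = (h_1, \dots, h_m) : V_b \to \mathbb{R}^m$ such that $h(b) = a$ and $G\big(h(y), y\big) = 0$ for every $y \in V_b$. Applying the chain rule to each component equation $g_i\big(h(y), y\big) = 0$ yields
$$\frac{\partial g_i}{\partial x_1} \frac{\partial h_1}{\partial y_j} + \cdots + \frac{\partial g_i}{\partial x_m} \frac{\partial h_m}{\partial y_j} + \frac{\partial g_i}{\partial y_j} = 0, \qquad i = 1, \dots, m,\ j = 1, \dots, p,$$
which may be written in matrix form as
$$\begin{pmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_m}\\ \vdots & & \vdots\\ \dfrac{\partial g_m}{\partial x_1} & \cdots & \dfrac{\partial g_m}{\partial x_m} \end{pmatrix} \begin{pmatrix} \dfrac{\partial h_1}{\partial y_1} & \cdots & \dfrac{\partial h_1}{\partial y_p}\\ \vdots & & \vdots\\ \dfrac{\partial h_m}{\partial y_1} & \cdots & \dfrac{\partial h_m}{\partial y_p} \end{pmatrix} = -\begin{pmatrix} \dfrac{\partial g_1}{\partial y_1} & \cdots & \dfrac{\partial g_1}{\partial y_p}\\ \vdots & & \vdots\\ \dfrac{\partial g_m}{\partial y_1} & \cdots & \dfrac{\partial g_m}{\partial y_p} \end{pmatrix}$$
or in the above notation as $G_x(c)\,h'(b) = -G_y(c)$. Multiplying the last equation on the left by $\begin{pmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{pmatrix} G_x^{-1}(c)$ and using (9.35), we obtain
$$\begin{pmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{pmatrix} h'(b) = -\begin{pmatrix} \lambda_1 & \cdots & \lambda_m \end{pmatrix} G_y(c). \tag{9.36}$$
Since $f\big(h(y), y\big)$ has a local extremum at $b$, its partial derivatives must vanish there:
$$\frac{\partial f(c)}{\partial x_1} \frac{\partial h_1(b)}{\partial y_j} + \cdots + \frac{\partial f(c)}{\partial x_m} \frac{\partial h_m(b)}{\partial y_j} + \frac{\partial f(c)}{\partial y_j} = 0, \qquad j = 1, 2, \dots, p.$$
In matrix form,
$$\begin{pmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{pmatrix} h'(b) = -\begin{pmatrix} \partial_{m+1} f(c) & \cdots & \partial_n f(c) \end{pmatrix}. \tag{9.37}$$
Equation (9.34) now follows from (9.36) and (9.37).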


9.8.13 Example. Let $c, x \in \mathbb{R}^n$, $c \ne 0$. We find the extreme values of $f(x) := c \cdot x$ on the sphere $\|x\| = 1$, that is, subject to the constraint $g(x) := \|x\|^2 - 1 = 0$. By Lagrange multipliers, the extreme values occur at points $x$ for which $\nabla f(x) = \lambda \nabla g(x)$ for some $\lambda \in \mathbb{R}$. This leads to the system $c_i = 2\lambda x_i$, $1 \le i \le n$. Squaring and adding yields $\|c\|^2 = 4\lambda^2 \|x\|^2 = 4\lambda^2$, hence $2\lambda = \pm\|c\|$ and $x = c/(2\lambda) = \pm c/\|c\|$. Therefore, the extreme values of $f$ are $f\big(\pm c/\|c\|\big) = \pm\|c\|$. ♦

The last example has an important application to directional derivatives: Let $h$ be differentiable on $B_r(a)$. From Exercise 9.3.10, the directional derivative $D_x h(a)$ of $h$ at $a$ in the direction of a unit vector $x$ is $c \cdot x$, where $c = \nabla h(a)$. Thus, by the example, $D_x h(a)$ is maximum when $x = c/\|c\|$, that is, when $x$ is in the direction of the gradient of $h$.

9.8.14 Example. Let $x = (x_1, \dots, x_n)$, $a = (a_1, \dots, a_n)$, and $c = (c_1, \dots, c_n)$, where $x_j \ge 0$, $a_j > 0$, and $c_j > 0$. We find the maximum value of
$$f(x) = x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$$
subject to the constraint $c \cdot x = 1$. Note that the conditions $x_j \ge 0$ and $c_j > 0$ imply that the constraint set is closed and bounded. Set $g(x) = c \cdot x - 1$. The maximum of $f$ occurs at points $x$ for which $\nabla f(x) = \lambda \nabla g(x)$ for some $\lambda \in \mathbb{R}$. This leads to the equations
$$a_j f(x) = \lambda c_j x_j, \qquad j = 1, \dots, n. \tag{9.38}$$
Adding and using the constraint yields $a f(x) = \lambda$, or $f(x) = \lambda/a$, where $a = \sum_{j=1}^{n} a_j$. From (9.38), $a_j = a c_j x_j$, so the maximum occurs at the point
$$\left(\frac{a_1}{a c_1}, \frac{a_2}{a c_2}, \dots, \frac{a_n}{a c_n}\right).$$
In particular, if $a_1 = \cdots = a_n = 1$ and $c_1 = \cdots = c_n = 1/c$, $c > 0$, then $f(x_1, x_2, \dots, x_n) = x_1 x_2 \cdots x_n$ has maximum $f(c/n, \dots, c/n) = (c/n)^n$. Thus $x_1 x_2 \cdots x_n \le (c/n)^n$, or equivalently $(x_1 x_2 \cdots x_n)^{1/n} \le c/n$ for all $x_j > 0$ satisfying $x_1 + \cdots + x_n = c$. Since $c$ is arbitrary, we obtain the classic result
$$(x_1 x_2 \cdots x_n)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad x_j \ge 0,$$
which asserts that the geometric mean of nonnegative data does not exceed the arithmetic mean. ♦
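The maximizer found in Example 9.8.14 can be tested against random feasible points. The following sketch is illustrative only: the sample vectors `a` and `c`, the helper names, and the random sampling scheme are assumptions made here, not part of the text. It checks that the point $\big(a_j/(a c_j)\big)_j$ satisfies $c \cdot x = 1$ and that no sampled feasible point gives a larger value of $f$.

```python
import random

def f(x, a):
    """f(x) = x_1^{a_1} * ... * x_n^{a_n}."""
    prod = 1.0
    for xj, aj in zip(x, a):
        prod *= xj ** aj
    return prod

def lagrange_point(a, c):
    """The point (a_j / (a * c_j))_j, where a = a_1 + ... + a_n."""
    total = sum(a)
    return [aj / (total * cj) for aj, cj in zip(a, c)]

def random_feasible(c, rng):
    """A random point with x_j >= 0 and c . x = 1."""
    w = [rng.random() for _ in c]
    s = sum(w)
    return [wj / (s * cj) for wj, cj in zip(w, c)]

if __name__ == "__main__":
    a, c = [1.0, 2.0, 3.0], [2.0, 1.0, 0.5]   # hypothetical sample data
    best = lagrange_point(a, c)
    rng = random.Random(0)
    trials = [f(random_feasible(c, rng), a) for _ in range(1000)]
    print(best, f(best, a), max(trials))
```

With $a_j = 1$ and $c_j = 1/c$ the same code returns the point $(c/n, \dots, c/n)$, which is the case that yields the inequality between geometric and arithmetic means.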

Exercises 1. In each case classify the critical point a := (π/2, π/2, π/2) of the function. (a) (sin x)(sin y)(sin z).

(b) (sin x)(cos y)(cos z).

2.S Show that the function x2 + 2y 2 + 3z 2 − xy − yz − xz on R3 has minimum value zero.


3. Find and classify the critical points of the following functions. (a) S x3 + 2xy + 3x2 + y 2 .

(b) x3 + 3x2 y 2 − 6x2 − 12y 2 .

(c) x2 y 2 + 2/x + 2/y.

(d) S x4 + 2y 2 − 4xy.

(e) x−1 + y −1 + ln(x2 + y 2 ).

(f) S x−1 + y −1 + arctan(y/x).

(g) x3 − xy 2 + x2 − y 2 .

(h) x4 − 2x2 + 4y 3 − 12y.

(i) S xy − x2 y − xy 2 .

(j) x4 − 4x3 + 4x2 + y 2 .

4. Find the maximum and minimum values of each of the following functions f on R2 \ {(0, 0)}. √ x+y x + 3y x + 2y x2 + xy (a)S p . . (b) p . (c) p . (d) 2 x + y2 x2 + y 2 x2 + y 2 x2 + y 2 5. Show that the point (x, x2 ) on the curve y = x2 nearest the point (1, 2) satisfies the equation 2x3 − 3x − 1 = 0. In Exercises 6–9, use the method of 9.8.9 and 9.8.10. 6. Find the extreme values of the following functions on the disk D := (x, y) : x2 + y 2 ≤ 1 : (a) 3x2 + 2y 2 − x. (b)S x2 + xy − x + y 2 . (c) cos(xy). (d)S sin(xy). 7.S Prove that the maximum of f (x, y) = x2 + ay 2 + (a − 1)y on the disk D := (x, y) : x2 + y 2 ≤ 1 occurs on bd(D). 8. Let f (x, y) = x2 + y 2 + axy on the disk D := (x, y) : x2 + y 2 ≤ 1 . Prove that a maximum of f occurs on bd(D), and that a minimum of f occurs on bd(D) iff |a| ≥ 2. Pn 9. Show that the (minimum) value of fn (x1 , . . . , xn ) = i=1 xi √ maximum √ on C1 (0) is n (− n). 10.S Let f (x, y) = ax−1 + by −1 + xy, a, b > 0. Prove that f has a minimum on (0, +∞) × (0, +∞) and that the minimum value is 3(ab)1/3 . 11.S Consider the data points (xi , yi ), 1 ≤ i ≤ n, where xi = 6 xj for at least one pair of points. The linear least squares fit is the line y = mx + b with the property that the sum of squares of the vertical distances from 2 Pn the data points to the line, namely, i=1 yi − mxi − b , is minimum. Show that x · y − nx y , and b = y − mx, where kxk2 − nx2 n n 1X 1X x = (x1 , . . . , xn ), y = (y1 , . . . , yn ), x := xi , and y := yi . n i=1 n i=1 m=


12. Let x, a, b, c ∈ Rn . Prove that f (x) := kx − ak2 + kx − bk2 + kx − ck2 has a minimum value and find the point at which it occurs. In Exercises 13–28, use Lagrange multipliers. 13. Show that the maximum value of f (x) = x1 x2 · · · xn on the set E := Pn −n {x : xi ≥ 0 and . i=1 xi ≤ 1} is n 14. Find the maximum and minimum of 2x − 3y subject to the constraint (x + 1)2 + (y − 1)2 = 1. 15.S Find the maximum and minimum of ax2 + 2bxy + y 2 subject to the constraint x2 + y 2 = c2 , where abc 6= 0 and (a + 1)2 + 4(a − b2 ) ≥ 0. 16. Show that the point√(x, y, z)√on the√surface x2 + y 2 + z 2 = 1 nearest the point (1, 2, 3) is (1/ 14, 2/ 14, 3/ 14). 17.S Show that the point (x, y, z) on the surface z = x2 + y 2 nearest the point (1, 2, 3) satisfies the equations 10x3 − 5x − 1 = 0, y = 2x, and z = x2 + y 2 = 5x2 . 18. Show that the point on the surface x2 + y 2 − z 2 = 1 nearest to the point (1, 2, 3) is x, 2x, 3x/(2x − 1)), where 20x4 − 20x3 − 8x2 + 4x − 1 = 0. 19.S Show that the point on the surface z 2 − x2 − y 2 = 1 nearest (1, 2, 3) is x, 2x, 3x/(2x − 1)), where 20x4 − 20x3 − 4x + 1 = 0. 20. The intersection of the surfaces z = x2 + y 2 and x + y + z = 1 is an ellipse lying above the xy plane. Find the highest and lowest points of the ellipse. 21. Let a, b, c > 0. Show that the maximum and minimum values of the function f (x, y, z) = ax + by + cz subject to the constraints x2 + z 2 = 1, y 2 + z 2 = 1, x, y, z ≥ 0, are, respectively, √ the maximum and minimum of the quantities c, a + b, and (a + b + cd)/ 1 + d2 , where d := c/(a + b). 22.S Find the maximum and minimum values of x + 2y + 3z subject to the constraints x + y + z = 1 and x2 + y 2 + z 2 = 1. 23. Let a > 1/3. Show that the maximum value of xyz subject to the constraints x + y + z = 1 and x2 + y 2 + z 2 = a is r 1 3a − 1 2 3 xyz = (1 − 3t + 2t ), where t = . 27 2


24.S Let x = (x1 , · · · , xn ), a = (a1 , · · · , an ) 6= 0, and b = (b1 , · · · , bn ). Show shortest distance from b to the hyperplane a · x = c is √ that the 2 c − a · b kak−1 . Pn 25. Let p ≥ 2. Show that the largest distance from the surface i=1 |xi |p = 1 to the origin is n(p−2)/2p and the smallest distance is 1. 26.S Find the distance from the point a = (a1 , . . . , an ) to the (n − 1)dimensional sphere kxk = 1 in Rn , where aj > 0, and kak = 6 1. 27. Let a = (a1 , . . . , an ) and b = (b1 , . . . , bn ), where ai , bi > 0. (a)S Show that the minimum value of the function a · x subject to the n n p X 2 X constraint bi /xi = 1, where xi > 0, is ai bi . i=1

(b) Show that the minimum value of n X 3 1/3 2/3 in (a) is ai bi .

i=1

Pn

i=1

ai x2i subject to the constraint

i=1

28. Let Pn a = (a1 , . . . , an ) and b = (b1 , . . . , bn ), where ai , bi > 0 and i=1 bi = 1. Find the minimum value of a · x subject to the conQn b straint i=1 xjj = 1, where xi > 0. 29. Let U ⊆ Rn be open and let f : U → R be C 2 on U . Show that if f has a local maximum (minimum) at a ∈ U , then D2 fa (h) ≤ 0 (≥ 0) for all h ∈ Rn . 30.S Prove the following generalization of Rolle’s theorem: Let U ⊆ Rn be bounded and open and let f : U → R be differentiable on U , continuous on cl(U ), and constant on bd(U ). Then f 0 (u) = 0 for some u ∈ U . 31. Let U ⊆ Rn be open and f : U → Rn C 1 on U such that Jf 6= 0 on U . Let a ∈ U and let C := Cr (a) ⊆ U , r > 0. Prove that if supC kf (x) − xk < r/2, then the equation f (x) = a has a solution in C.

Chapter 10 Lebesgue Measure on Rn

The methods of Chapter 5 may be modified in a natural way to construct the Riemann integral of a function of several variables. In Section 11.1, we briefly describe how this is done. However, the main goal of the present chapter and the next is to construct the more general Lebesgue integral. The choice to develop the n-dimensional Lebesgue integral rather than the n-dimensional Riemann integral is motivated by the fact that, as an analytical tool, the former has several distinct advantages over the latter. For example, the Lebesgue theory allows the interchange of limit and integral in more general settings. Furthermore, the collection of Lebesgue integrable functions, which includes unbounded functions on unbounded domains, is significantly larger than the set of Riemann integrable functions. These advantages make the Lebesgue theory better suited for applications based on, for example, probability theory and, in particular, stochastic processes. The key idea in Riemann integration on Rn is the partitioning of the domain of the integrand f into n-dimensional subintervals. The Riemann integral is then obtained as a limit of Riemann sums, that is, sums of function values times the volumes of the subintervals. In Lebesgue integration, it is the range of f rather than the domain that is partitioned into subintervals (see Figure 10.7). This still produces a partition of the domain of f ; however, the sets in this partition are generally more complicated than subintervals. The Lebesgue integral is constructed by multiplying the measure of these sets by function values, adding the results, and then taking limits. In this chapter we construct the measure and in the next chapter we construct the integral. The precise connection between the Riemann and Lebesgue integrals is made in Section 11.4.

10.1 General Measure Theory

In this section we give a brief description of those aspects of measure theory that will be needed to construct Lebesgue measure on $\mathbb{R}^n$. For a comprehensive treatment see, for example, [4].


Sigma Fields

10.1.1 Definition. A σ-field on a nonempty set $S$ is a collection $\mathcal{F}$ of subsets of $S$ such that
(a) $S, \emptyset \in \mathcal{F}$;
(b) $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$;
(c) $A_k \in \mathcal{F}$, $k \in \mathbb{N}$, implies $\bigcup_k A_k \in \mathcal{F}$. ♦

Part (c) of the definition says that $\mathcal{F}$ is closed under countable unions. By De Morgan's law,
$$\Big(\bigcup_k A_k\Big)^{c} = \bigcap_k A_k^c,$$
hence part (b) implies that $\mathcal{F}$ is also closed under countable intersections. The collection of all subsets of $S$ and the collection $\{\emptyset, S\}$ are simple examples of σ-fields. The following examples are somewhat more interesting.

10.1.2 Example. If $\mathcal{A}$ is an arbitrary collection of subsets of $S$, then the σ-field generated by $\mathcal{A}$ is the intersection $\sigma(\mathcal{A})$ of all σ-fields containing $\mathcal{A}$. It is the smallest σ-field containing $\mathcal{A}$ in the sense that if $\mathcal{F}$ is a σ-field containing $\mathcal{A}$ then $\mathcal{F}$ contains $\sigma(\mathcal{A})$. In the special case where $\mathcal{A} = \{A_1, A_2, \dots\}$ is a countable partition of $S$, $\sigma(\mathcal{A})$ is simply the collection $\mathcal{F}$ of all unions of members of $\mathcal{A}$. Indeed, $\mathcal{F}$ is clearly closed under countable unions, and the calculation
$$\Big(\bigcup_{k \in F} A_k\Big)^{c} = \bigcup_{k \in F^c} A_k, \qquad F \subseteq \mathbb{N},$$
shows that $\mathcal{F}$ is closed under complements. Thus, by minimality, $\sigma(\mathcal{A}) = \mathcal{F}$. ♦

10.1.3 Example. If $\mathcal{F}$ is a σ-field on $S$ and $E \subseteq S$, then the collection $\mathcal{F}_E := \{A \cap E : A \in \mathcal{F}\}$ is a σ-field of subsets of $E$. Moreover, $\mathcal{F}_E \subseteq \mathcal{F}$ iff $E \in \mathcal{F}$. (See Exercise 2.) ♦
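For a finite partition, the description of the generated σ-field in Example 10.1.2 can be carried out explicitly. The sketch below is an illustration with hypothetical names: it forms all unions of blocks of a partition of a finite set and checks the defining properties. On a finite set, closure under pairwise unions is all that countable unions require.

```python
from itertools import combinations

def sigma_field_from_partition(blocks):
    """All unions of blocks of a finite partition, as a set of frozensets."""
    blocks = [frozenset(b) for b in blocks]
    field = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            field.add(frozenset().union(*chosen))  # r = 0 gives the empty set
    return field

def is_sigma_field(field, space):
    """Check (a)-(c) of Definition 10.1.1 for a finite collection of subsets."""
    space = frozenset(space)
    contains_basics = space in field and frozenset() in field
    closed_complement = all(space - a in field for a in field)
    closed_union = all(a | b in field for a in field for b in field)
    return contains_basics and closed_complement and closed_union

if __name__ == "__main__":
    space = {1, 2, 3, 4, 5, 6}
    field = sigma_field_from_partition([{1, 2}, {3}, {4, 5, 6}])
    print(len(field), is_sigma_field(field, space))
```

A partition into $m$ blocks yields $2^m$ sets, one for each subset of the index set, which matches the identity for complements displayed in the example.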

Measure on a Sigma Field

10.1.4 Definition. A measure on a σ-field $\mathcal{F}$ of subsets of $S$ is a function $\mu : \mathcal{F} \to [0, +\infty]$ such that $\mu(\emptyset) = 0$ and $\mu$ has the additivity property
$$\mu\Big(\bigcup_k A_k\Big) = \sum_k \mu(A_k)$$
for any finite or infinite sequence of pairwise disjoint sets $A_k \in \mathcal{F}$. The extended real number $\mu(A)$ is called the measure of $A$. ♦


10.1.5 Example. Let $\{p_k\}$ be a sequence of nonnegative real numbers. Define
$$\mu(E) = \sum_{k \in E} p_k, \qquad E \subseteq \mathbb{N},$$
where the sum may be infinite. (By convention, the sum over the empty set is zero.) It is not difficult to show that $\mu$ is a measure on the σ-field of all subsets of $\mathbb{N}$. In the special case $p_k = 1$ for all $k$, $\mu(E)$ counts the number of elements in $E$ if $E$ is a finite set, and $\mu(E) = +\infty$ otherwise. In this case, $\mu$ is called a counting measure. ♦

10.1.6 Proposition. Let $\mu$ be a measure on a σ-field $\mathcal{F}$ and $A_1, A_2, \dots \in \mathcal{F}$.
(a) If $A_1 \subseteq A_2$, then $\mu(A_1) \le \mu(A_2)$ (monotonicity).
(b) $\mu\big(\bigcup_k A_k\big) \le \sum_k \mu(A_k)$ (subadditivity).
(c) $\mu(A_1) + \mu(A_2) = \mu(A_1 \cup A_2) + \mu(A_1 \cap A_2)$ (inclusion-exclusion).
(d) If $A_k \uparrow A$, then $\mu(A_k) \uparrow \mu(A)$ (continuity from below).
(e) If $A_k \downarrow A$ and $\mu(A_1) < +\infty$, then $\mu(A_k) \downarrow \mu(A)$ (continuity from above).

Proof. (a) By additivity, $\mu(A_2) = \mu(A_2 \setminus A_1) + \mu(A_1) \ge \mu(A_1)$.

(b) Write
$$\bigcup_k A_k = A_1 \cup (A_2 \cap A_1^c) \cup \cdots \cup (A_m \cap A_1^c \cap \cdots \cap A_{m-1}^c) \cup \cdots.$$
Since the sets in the union on the right are pairwise disjoint, by countable additivity and monotonicity
$$\mu\Big(\bigcup_k A_k\Big) = \mu(A_1) + \sum_{m \ge 2} \mu\big(A_1^c \cap \cdots \cap A_{m-1}^c \cap A_m\big) \le \sum_{m \ge 1} \mu(A_m).$$

(c) Since $A_1 \cup A_2$ is the union of the pairwise disjoint sets $A_1 \cap A_2^c$, $A_1 \cap A_2$, and $A_2 \cap A_1^c$, additivity implies that
$$\mu(A_1 \cup A_2) = \mu(A_1 \cap A_2^c) + \mu(A_1 \cap A_2) + \mu(A_2 \cap A_1^c).$$
Similarly,
$$\mu(A_1) + \mu(A_2) = \mu(A_1 \cap A_2^c) + 2\mu(A_2 \cap A_1) + \mu(A_2 \cap A_1^c).$$
It follows that $\mu(A_1 \cup A_2) = +\infty$ iff $\mu(A_1) + \mu(A_2) = +\infty$, which proves (c) in the infinite case. In the finite case, simply subtract the above equations to get (c).

(d) This is clear if some $A_k$ has infinite measure, so assume $\mu(A_k) < +\infty$ for all $k$. Set $A_0 = \emptyset$ and $E_k = A_k \setminus A_{k-1}$. The sets $E_k$ are pairwise disjoint, $A = \bigcup_{k=1}^{\infty} E_k$, and $\mu(E_k) = \mu(A_k) - \mu(A_{k-1})$, hence by additivity
$$\mu(A) = \sum_{k=1}^{\infty} \mu(E_k) = \lim_n \sum_{k=1}^{n} \big[\mu(A_k) - \mu(A_{k-1})\big] = \lim_n \mu(A_n).$$

(e) Note that $A_1 \setminus A_k \uparrow A_1 \setminus A$, hence, by (d),
$$\mu(A_1) - \mu(A) = \mu(A_1 \setminus A) = \lim_k \mu(A_1 \setminus A_k) = \mu(A_1) - \lim_k \mu(A_k).$$
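The weighted measure of Example 10.1.5 is easy to experiment with on finite sets. The sketch below restricts attention to subsets of $\{1, \dots, 10\}$ and uses the weights $p_k = 2^{-k}$; both choices are assumptions made for illustration. It checks monotonicity and inclusion-exclusion from Proposition 10.1.6 with exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical weights p_k = 2^(-k) on the index set {1, ..., 10}.
weights = {k: Fraction(1, 2 ** k) for k in range(1, 11)}

def mu(E):
    """mu(E) = sum of p_k over k in E; the empty sum is zero."""
    return sum((weights[k] for k in E), Fraction(0))

if __name__ == "__main__":
    A1, A2 = {1, 2, 3, 4}, {3, 4, 5}
    print(mu(A1) + mu(A2), mu(A1 | A2) + mu(A1 & A2))  # inclusion-exclusion
    # Increasing sets A_n = {1, ..., n}: the measures increase to mu of the union.
    print([mu(set(range(1, n + 1))) for n in (1, 2, 3, 10)])
```

Because the weights are rational, the two sides of the inclusion-exclusion identity agree exactly rather than up to rounding.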

Exercises

For the following exercises, $\mathcal{F}$ is a σ-field of subsets of a set $S$ and $\mu$ is a measure on $\mathcal{F}$.

1.S Find an example which shows that the hypothesis $\mu(A_1) < +\infty$ in 10.1.6(e) cannot be removed.

2. Verify that the collection $\mathcal{F}_E$ in 10.1.3 is a σ-field.

3.S Let $A, B \in \mathcal{F}$ with $\mu(B) = 0$. Show that $\mu(A \cup B) = \mu(A \setminus B) = \mu(A)$.

4. Let $A_k, B_k \in \mathcal{F}$ and let $s$ denote the sum $\sum_k \mu(A_k \setminus B_k)$. Prove that
(a) $\mu\Big(\bigcup_k A_k \setminus \bigcup_k B_k\Big) \le s$. (b) $\mu\Big(\bigcap_k A_k \setminus \bigcap_k B_k\Big) \le s$.

5.S (General inclusion-exclusion principle). Let µ A1 ∪ · · · ∪ An < +∞. Prove that for n ≥ 2 n X µ A1 ∪ · · · ∪ An = µ(Ai ) − i=1

+

n X

n X

µ(Ai ∩ Aj )

1≤i |I| − ε. Let {Ik } be any sequence of intervals covering I. By


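10.2.4, we may take $I_k \in \mathcal{O}$. Let $\{J_k\}$ be a sequence in $\mathcal{H}$ such that $I_k \subseteq J_k$ and $|J_k| < |I_k| + \varepsilon/2^k$ (Figure 10.2). Since $J$ is compact, there exists an $m$ such that $J \subseteq I_1 \cup \cdots \cup I_m \subseteq J_1 \cup \cdots \cup J_m$. Therefore,
$$|I| - \varepsilon < |J| \le |J_1| + \cdots + |J_m| \le \varepsilon + \sum_{k=1}^{\infty} |I_k|,$$
the second inequality by 10.2.2(b). Letting $\varepsilon \to 0$, we have $|I| \le \sum_{k=1}^{\infty} |I_k|$. Therefore, $|I| \le \lambda^*(I)$.

For (e), we may assume that $\lambda^*(A_k) < +\infty$ for all $k$. Let $\varepsilon > 0$ and for each $k$ choose a sequence $\{I_{k,j}\}_{j=1}^{\infty}$ in $\mathcal{I}$ such that
$$A_k \subseteq \bigcup_{j=1}^{\infty} I_{k,j} \quad\text{and}\quad \sum_{j=1}^{\infty} \lambda^*(I_{k,j}) \le \lambda^*(A_k) + \frac{\varepsilon}{2^k}.$$
Since the countable collection $\{I_{k,j} : k, j = 1, 2, \dots\}$ covers $\bigcup_{k=1}^{\infty} A_k$,
$$\lambda^*\Big(\bigcup_{k=1}^{\infty} A_k\Big) \le \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} \lambda^*(I_{k,j}) \le \sum_{k=1}^{\infty} \lambda^*(A_k) + \varepsilon.$$
Since $\varepsilon$ was arbitrary, (e) follows.

For (f), let $I \cup J \subseteq \bigcup_{k=1}^{m} I_k$, where $I_k \in \mathcal{H}$. Since $I_k \supseteq (I_k \cap I) \cup (I_k \cap J)$, 10.2.2(c) shows that $|I_k| \ge |I_k \cap I| + |I_k \cap J|$. Therefore, by (c),
$$\sum_{k=1}^{m} |I_k| \ge \sum_{k=1}^{m} |I_k \cap I| + \sum_{k=1}^{m} |I_k \cap J| \ge \lambda^*(I) + \lambda^*(J) = |I| + |J|.$$
Taking the infimum we have $\lambda^*(I \cup J) \ge |I| + |J|$. The reverse inequality follows from (e).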
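The $\varepsilon/2^k$ device used in the proof of (e) is worth seeing in numbers. The sketch below is an illustration with hypothetical function names: if the $k$th point of a countable set is covered by an interval of length $\varepsilon/2^k$, the total length used after any number of points stays below $\varepsilon$, which is why countable sets have outer measure zero.

```python
from fractions import Fraction

def cover_lengths(eps, count):
    """Lengths eps/2^k, k = 1..count, of intervals covering `count` points."""
    return [Fraction(eps) / 2 ** k for k in range(1, count + 1)]

def total_length(eps, count):
    """Total length of the cover; equals eps * (1 - 2^(-count))."""
    return sum(cover_lengths(eps, count), Fraction(0))

if __name__ == "__main__":
    eps = Fraction(1, 10)
    for count in (1, 5, 50):
        print(count, total_length(eps, count), total_length(eps, count) < eps)
```

The same bookkeeping, with a cover of each $A_k$ in place of a single interval, gives the estimate $\sum_k \lambda^*(A_k) + \varepsilon$ in the proof.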

Exercises

1.S Prove the assertions in 10.2.4. More generally prove the following: Let $\mathcal{J}$ be a collection of bounded intervals with the property that for each bounded interval $I$ and each $\varepsilon > 0$ there exists $J \in \mathcal{J}$ containing $I$ such that $|J| < |I| + \varepsilon$. For $A \subseteq \mathbb{R}^n$, define
$$\alpha(A) := \inf\Big\{\sum_k |J_k| : J_k \in \mathcal{J} \text{ and } \bigcup_k J_k \supseteq A\Big\}.$$
Then $\lambda^*(A) = \alpha(A)$.

2. Prove that in the definition of $\lambda^*(A)$, $\mathcal{I}$ may be replaced by the collection $\mathcal{I}_r$ of all bounded intervals $I$ whose coordinate intervals have rational endpoints.


3. Prove that in the definition of λ∗ (A), I may be replaced by the collection U of all bounded open subsets of R and also by the collection K of all compact sets. 4.S Show that Lebesgue outer measure is translation invariant, that is, λ∗ (A + x) = λ∗ (A) for every A ⊆ Rn and x ∈ Rn , where A + x := {a + x : a ∈ A}. 5. Show that Lebesgue outer measure has the reflection property λ∗ (−A) = λ∗ (A) for every A ⊆ Rn , where −A := {x : −x ∈ A}. 6. Show that Lebesgue outer measure has the dilation property λ∗ (rA) = |r|n λ∗ (A) for every A ⊆ Rn and r ∈ R, where rA := {rx : x ∈ A}.

10.3 Lebesgue Measure

By subadditivity of outer measure,
$$\lambda^*(C) \le \lambda^*(C \cap E) + \lambda^*(C \cap E^c)$$
for all subsets $E$ and $C$ of $\mathbb{R}^n$. The following definition singles out those sets $E$ that also satisfy the reverse inequality for all sets $C$.

10.3.1 Definition. A subset $E$ of $\mathbb{R}^n$ is said to be Lebesgue measurable if
$$\lambda^*(C) \ge \lambda^*(C \cap E) + \lambda^*(C \cap E^c) \tag{10.3}$$
for all subsets $C$ of $\mathbb{R}^n$. The collection of all Lebesgue measurable subsets of $\mathbb{R}^n$ is denoted by $\mathcal{M} = \mathcal{M}(\mathbb{R}^n)$. The restriction of $\lambda^*$ to $\mathcal{M}$ is called Lebesgue measure on $\mathbb{R}^n$ and is denoted by $\lambda = \lambda^n$. Any particular set $C$ satisfying (10.3) is called a test set for $E$. ♦

If $C$ is a test set for $E$, then
$$\lambda^*(C) = \lambda^*(C \cap E) + \lambda^*(C \cap E^c);$$
the set $E$ splits the outer measure of $C$.

10.3.2 Theorem. $\mathcal{M}$ is a sigma field containing all sets of outer measure zero and $\lambda$ is a measure on $\mathcal{M}$.


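Proof. Clearly, $\emptyset, \mathbb{R}^n \in \mathcal{M}$, and since $E$ and $E^c$ appear symmetrically in (10.3), $E^c \in \mathcal{M}$ iff $E \in \mathcal{M}$. If $\lambda^*(E) = 0$, then, by monotonicity,
\[ \lambda^*(C \cap E) + \lambda^*(C \cap E^c) \le \lambda^*(E) + \lambda^*(C \cap E^c) = \lambda^*(C \cap E^c) \le \lambda^*(C), \]
hence $E \in \mathcal{M}$. Therefore, $\mathcal{M}$ contains all sets of Lebesgue outer measure 0.

It remains to show that, for a sequence $\{E_k\}$ in $\mathcal{M}$, $\bigcup_{k=1}^\infty E_k \in \mathcal{M}$ and furthermore that $\lambda^*\big(\bigcup_{k=1}^\infty E_k\big) = \sum_{k=1}^\infty \lambda^*(E_k)$ if the sets $E_k$ are pairwise disjoint. This is accomplished in the following four steps:

I. If $E, F \in \mathcal{M}$, then $E \cup F,\ E \cap F \in \mathcal{M}$.

▶ To show that $E \cup F \in \mathcal{M}$, take any set $C$ as a test set for $E$ and take $C \cap E^c$ as a test set for $F$ to obtain
\[ \lambda^*(C) = \lambda^*(C \cap E) + \lambda^*(C \cap E^c) \quad\text{and}\quad \lambda^*(C \cap E^c) = \lambda^*(C \cap E^c \cap F) + \lambda^*(C \cap E^c \cap F^c). \]
Combining these and using subadditivity,
\[ \lambda^*(C) = \lambda^*(C \cap E) + \lambda^*(C \cap E^c \cap F) + \lambda^*(C \cap E^c \cap F^c) \ge \lambda^*\big((C \cap E) \cup (C \cap E^c \cap F)\big) + \lambda^*(C \cap E^c \cap F^c). \tag{10.4} \]
Since $(C \cap E) \cup (C \cap E^c \cap F) \supseteq C \cap (E \cup F)$, by monotonicity and (10.4),
\[ \lambda^*(C) \ge \lambda^*\big(C \cap (E \cup F)\big) + \lambda^*(C \cap E^c \cap F^c) = \lambda^*\big(C \cap (E \cup F)\big) + \lambda^*\big(C \cap (E \cup F)^c\big). \]
This shows that $E \cup F \in \mathcal{M}$. That $E \cap F \in \mathcal{M}$ follows from De Morgan's law $E \cap F = (E^c \cup F^c)^c$. ◀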

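II. If $C \subseteq \mathbb{R}^n$ and $E, F \in \mathcal{M}$ with $E \cap F = \emptyset$, then $\lambda^*\big(C \cap (E \cup F)\big) = \lambda^*(C \cap E) + \lambda^*(C \cap F)$.

▶ Use $C \cap (E \cup F)$ as a test set for $E$ to obtain
\[ \lambda^*\big(C \cap (E \cup F)\big) = \lambda^*\big(C \cap (E \cup F) \cap E\big) + \lambda^*\big(C \cap (E \cup F) \cap E^c\big) = \lambda^*(C \cap E) + \lambda^*(C \cap F). \] ◀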

III. If the sets $E_k$ are pairwise disjoint and $F := \bigcup_k E_k$, then $F \in \mathcal{M}$ and $\lambda(F) = \sum_k \lambda(E_k)$.

▶ Set $F_k = \bigcup_{j=1}^k E_j$ and let $C \subseteq \mathbb{R}^n$. By steps I and II and induction, $F_k \in \mathcal{M}$ and
\[ \lambda^*(C \cap F_k) = \sum_{j=1}^k \lambda^*(C \cap E_j). \]
Thus, by monotonicity,
\[ \lambda^*(C) = \lambda^*(C \cap F_k) + \lambda^*(C \cap F_k^c) \ge \sum_{j=1}^k \lambda^*(C \cap E_j) + \lambda^*(C \cap F^c). \]
Since $k$ was arbitrary, by subadditivity
\[ \lambda^*(C) \ge \sum_{j=1}^\infty \lambda^*(C \cap E_j) + \lambda^*(C \cap F^c) \ge \lambda^*(C \cap F) + \lambda^*(C \cap F^c) \ge \lambda^*(C). \]
The inequalities are therefore equalities, which shows that $F \in \mathcal{M}$. Taking $C = F$ verifies the second assertion of III. ◀

IV. $\bigcup_{k=1}^\infty E_k \in \mathcal{M}$.

▶ Use I, III and
\[ \bigcup_{k=1}^\infty E_k = E_1 \cup (E_2 \cap E_1^c) \cup (E_3 \cap E_1^c \cap E_2^c) \cup \cdots. \] ◀
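The splitting condition (10.3) can be tried out on a finite toy model. The following sketch does not use Lebesgue outer measure: it assumes a hypothetical outer measure `outer` on the three-point set {1, 2, 3} (value 0 on the empty set, 2 on the whole set, 1 otherwise) and tests every subset E against every test set C. The names `X`, `subsets`, `outer`, and `is_measurable` are illustrative only.

```python
from itertools import combinations

X = frozenset({1, 2, 3})

def subsets(s):
    """All subsets of s, ordered by size."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer(a):
    """Toy outer measure (an assumption of this sketch, not Lebesgue's)."""
    if not a:
        return 0
    if a == X:
        return 2
    return 1

def is_measurable(e):
    """Caratheodory condition: every C must satisfy
    outer(C) >= outer(C & E) + outer(C - E)."""
    return all(outer(c) >= outer(c & e) + outer(c - e) for c in subsets(X))

# Sanity check that `outer` is monotone and subadditive.
for a in subsets(X):
    for b in subsets(X):
        assert outer(a | b) <= outer(a) + outer(b)
        if a <= b:
            assert outer(a) <= outer(b)

measurable = [set(e) for e in subsets(X) if is_measurable(e)]
print(measurable)
```

In this toy model only the empty set and the whole space split every test set. For instance, E = {1} fails with C = {1, 2}, since 1 < 1 + 1. The measurable sets still form a σ-field, as the theorem predicts, but here it is the trivial one.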

10.3.3 Definition. A set $E$ is said to have (Lebesgue) measure zero if $\lambda(E) = 0$. A property $P(x)$ depending on points $x \in \mathbb{R}^n$ is said to hold almost everywhere (a.e.) or for almost all $x$ if the set of all $x$ for which $P(x)$ is false has measure zero. ♦

For example, the Dirichlet function is zero a.e. More generally, if $E \in \mathcal{M}$ then $1_E = 0$ a.e. iff $\lambda(E) = 0$. By subadditivity, a countable union of sets of measure zero has measure zero. Since a point has measure zero, it follows that every countable set has measure zero. In particular, $\mathbb{Q}^n$ has measure zero. The following is an example of an uncountable set with measure zero.

10.3.4 Example. (Cantor ternary set). Remove from $I_{0,1} := [0, 1]$ the "middle third" open interval $(1/3, 2/3)$, leaving closed intervals $I_{1,1}$ and $I_{1,2}$ with union $E_1$ and total length $2/3$. Next, remove from each of $I_{1,1}$ and $I_{1,2}$ the middle third open interval, leaving closed intervals $I_{2,1}$, $I_{2,2}$, $I_{2,3}$, and $I_{2,4}$ with union $E_2$ and total length $4/9 = (2/3)^2$.

FIGURE 10.3: Middle thirds construction. (The figure shows $I_{0,1}$ and the intervals $I_{k,j}$ making up $E_1$, $E_2$, $E_3$, with the points $.00220\ldots$ and $.22202\ldots$ marked.)

By induction, one obtains a decreasing sequence of closed sets $E_k = \bigcup_{j=1}^{2^k} I_{k,j}$ such that, by subadditivity, $\lambda^*(E_k) \le (2/3)^k$. If

$E$ denotes the intersection of these sets, then $E$ is closed and, by monotonicity, $\lambda^*(E) \le (2/3)^k$ for all $k$. Therefore, $\lambda^*(E) = 0$.

To show that $E$ is uncountable, we use the fact that every real number $x \in [0, 1]$ has both ternary and binary representations
\[ x = .d_1 d_2 \ldots \text{ (ternary)} = \sum_{k=1}^\infty d_k 3^{-k}, \quad\text{where } d_k \in \{0, 1, 2\}, \]
\[ x = .e_1 e_2 \ldots \text{ (binary)} = \sum_{k=1}^\infty e_k 2^{-k}, \quad\text{where } e_k \in \{0, 1\}. \]
These are obvious analogs of the decimal representation of a real number (see Exercise 6.1.14). As with decimal representations, there is some ambiguity; for example, $1/3 = .1000\ldots = .0222\ldots$ (ternary).

FIGURE 10.4: $x \in I_{k-1,j} \Rightarrow x \in I_{k,\,2j-1+d_k/2}$. (The figure shows $I_{k-1,j}$ splitting into $I_{k,2j-1}$ for $d_k = 0$ and $I_{k,2j}$ for $d_k = 2$.)

Now observe that if $d_k = 0$ or $2$ for all $k$ in the above ternary representation, then $x \in E$. For example, $.00220\ldots \in I_{1,1} \cap I_{2,1} \cap I_{3,2} \cap I_{4,4} \cap \cdots$ and $.22202\ldots \in I_{1,2} \cap I_{2,4} \cap I_{3,8} \cap I_{4,15} \cap \cdots$ (see Figure 10.3). In general, if $x \in I_{k-1,j}$, then $x \in I_{k,\,2j-1+d_k/2}$. Conversely, let $x \in E$. Since $x \in E_1$, we may choose $d_1 = 0$ or $2$. Similarly, since $x \in E_2$, we may choose $d_2 = 0$ or $2$, etc. Continuing in this manner, we see that every member of $E$ has a (unique) ternary representation with digits 0 or 2.

Now define $\varphi : E \to [0, 1]$ by
\[ \varphi\big(.d_1 d_2 \ldots \text{ (ternary)}\big) = .e_1 e_2 \ldots \text{ (binary)}, \quad\text{where } d_k \in \{0, 2\} \text{ and } e_k = d_k/2. \]
The function $\varphi$ is not one-to-one; for example, $\varphi(.0222\ldots) = .0111\ldots = .1000\ldots = \varphi(.2000\ldots)$. However, by removing from $E$ the countable set of all numbers with ternary representations having a tail end of zeros, these being necessarily rational, we obtain a set $F$ on which $\varphi$ is one-to-one. Since $\varphi(F) = (0, 1]$, it follows that $E$ is uncountable. ♦
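The middle thirds construction and the index rule $j \mapsto 2j - 1 + d_k/2$ can be checked with exact rational arithmetic. The sketch below is illustrative only; the function names `cantor_level`, `total_length`, `interval_index`, and `phi` are hypothetical, and `phi` acts on a finite truncation of the digit sequence rather than on the full expansion.

```python
from fractions import Fraction

def cantor_level(k):
    """Return the 2**k closed intervals I_{k,1}, ..., I_{k,2^k}
    as (left, right) pairs, in increasing order."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))      # left closed third
            nxt.append((b - third, b))      # right closed third
        intervals = nxt
    return intervals

def total_length(k):
    """Total length of E_k, which should equal (2/3)**k."""
    return sum(b - a for a, b in cantor_level(k))

def interval_index(digits):
    """Follow j -> 2j - 1 + d/2, starting from j = 1 for I_{0,1}.
    `digits` is a finite list of ternary digits, each 0 or 2."""
    j, path = 1, []
    for d in digits:
        j = 2 * j - 1 + d // 2
        path.append(j)
    return path

def phi(digits):
    """Binary value obtained by halving each ternary digit (truncated)."""
    return sum(Fraction(d // 2, 2 ** (i + 1)) for i, d in enumerate(digits))

print(total_length(3))                  # (2/3)^3
print(interval_index([0, 0, 2, 2]))     # indices for .0022...
print(interval_index([2, 2, 2, 0]))     # indices for .2220...
```

As a spot check, $.0022$ (ternary) equals $8/81$, which is the left endpoint of the fourth interval of $E_4$, in agreement with the index path computed by `interval_index`.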

We show in the next section that intervals, open sets, and closed sets are Lebesgue measurable. It follows that countable unions and intersections of these sets are also Lebesgue measurable. The reader may well ask if there are any subsets of $\mathbb{R}^n$ that are not Lebesgue measurable. The answer is that there are many, but their construction is surprisingly intricate. The following is an example for the case $n = 1$.

10.3.5 Example. (A non-measurable set). Consider sets of the form $x + \mathbb{Q}$, $x \in \mathbb{R}$. We claim that if $(x + \mathbb{Q}) \cap (y + \mathbb{Q}) \ne \emptyset$, then $x + \mathbb{Q} = y + \mathbb{Q}$. To see this, choose $z \in (x + \mathbb{Q}) \cap (y + \mathbb{Q})$, say $z = x + r_1 = y + r_2$, $r_1, r_2 \in \mathbb{Q}$. Then, for any $r \in \mathbb{Q}$, $x + r = y + r_2 - r_1 + r \in y + \mathbb{Q}$ and $y + r = x + r_1 - r_2 + r \in x + \mathbb{Q}$, hence $x + \mathbb{Q} = y + \mathbb{Q}$. It follows that every real number is in exactly one of the sets $x + \mathbb{Q}$. Now form a set $E$ by choosing exactly one number in each of the distinct sets $x + \mathbb{Q}$.¹ For each $x \in \mathbb{R}$, the set $E \cap (x + \mathbb{Q})$ has a single member, hence $x = y + r$ for unique $y \in E$ and $r \in \mathbb{Q}$. Thus $\mathbb{R}$ may be expressed as a disjoint union
\[ \mathbb{R} = \bigcup_{k=1}^\infty (r_k + E), \tag{10.5} \]
where $\{r_1, r_2, \ldots\}$ is an enumeration of $\mathbb{Q}$.

Suppose, for a contradiction, that $E$ is Lebesgue measurable. Then $\lambda(E) > 0$, otherwise, by (10.5), translation invariance (Exercise 1), and countable additivity, $\mathbb{R}$ would have measure zero. On the other hand, let $I$ be an arbitrary bounded interval and set $J = \mathbb{Q} \cap (0, 1)$. Since $I$ is measurable (Section 10.4, below), the set
\[ F := \bigcup_{r \in J} \big(r + (E \cap I)\big) \]
is measurable. Also, since $I$ and $J$ are bounded, so is $F$. Thus, by countable additivity and translation invariance,
\[ +\infty > \lambda(F) = \sum_{r \in J} \lambda\big(r + (E \cap I)\big) = \sum_{r \in J} \lambda(E \cap I). \]
Since $J$ is an infinite set, $\lambda(E \cap I) = 0$. But then
\[ \lambda(E) = \sum_{k=0}^\infty \lambda\big(E \cap [k, k+1)\big) + \sum_{k=0}^\infty \lambda\big(E \cap [-k-1, -k)\big) = 0. \]
This contradiction shows that $E$ cannot be Lebesgue measurable. ♦

¹ The existence of $E$ requires the axiom of choice, one of the axioms of Zermelo–Fraenkel set theory with choice (ZFC).

Exercises

1. ⇓² Show that $E \in \mathcal{M}$ and $x \in \mathbb{R}^n$ imply that $x + E \in \mathcal{M}$. Conclude from Exercise 10.2.4 that $\lambda(x + E) = \lambda(E)$.
2.S Show that $E \in \mathcal{M}$ implies that $-E \in \mathcal{M}$. Conclude from Exercise 10.2.5 that $\lambda(-E) = \lambda(E)$.
3. Show that $E \in \mathcal{M}$ and $r \ne 0$ imply that $rE \in \mathcal{M}$. Conclude from Exercise 10.2.6 that $\lambda(rE) = |r|^n \lambda(E)$.
4.S Show that for any $\varepsilon > 0$ there exists an open set $D$ dense in $\mathbb{R}^n$ such that $\lambda(D) < \varepsilon$.
5. Prove that if $f$ and $g$ are continuous real-valued functions on $\mathbb{R}^n$ which are equal a.e., then $f = g$. Does the same result hold if only one of the functions is continuous?
6. Let $A$ be the subset of $[0, 1]$ whose members are missing the digit three in their decimal expansions. Prove that $A$ is uncountable and $\lambda(A) = 0$.

10.4 Borel Sets

Recall that the σ-field generated by a collection $\mathcal{A}$ of sets is the intersection of all σ-fields containing $\mathcal{A}$ (10.1.2). The following special case is of particular importance.

10.4.1 Definition. The Borel σ-field $\mathcal{B} = \mathcal{B}(\mathbb{R}^n)$ is the σ-field generated by the open sets of $\mathbb{R}^n$. A member of $\mathcal{B}$ is called a Borel set. ♦

10.4.2 Remark. Since open sets and closed sets are complements of one another, $\mathcal{B}$ is also generated by the closed sets. Furthermore, since an open set is a countable union of $n$-dimensional open intervals (Exercise 8.2.4), $\mathcal{B}$ is also generated by $\mathcal{O}$. Since every open interval is a countable union of closed and bounded intervals and every closed interval is a countable intersection of open intervals, $\mathcal{B}$ is also generated by $\mathcal{C}$. Similar considerations show that $\mathcal{B}$ is generated by $\mathcal{H}$ as well. ♦

10.4.3 Theorem. $\mathcal{B}(\mathbb{R}^n) \subseteq \mathcal{M}(\mathbb{R}^n)$.

Proof. By 10.4.2, it suffices to show that $\mathcal{H} \subseteq \mathcal{M}$. Note first that if $I, J \in \mathcal{H}$ then, using partitions as in the proof of 10.2.2, $I \setminus J$ may be expressed (usually in several ways) as a disjoint union of members of $\mathcal{H}$. (See Figure 10.5.)

² This exercise will be used in 11.2.18.

FIGURE 10.5: $I \setminus J = I_1 \cup I_2 \cup I_3 \cup I_4 \cup I_5$.

Now let $I \in \mathcal{H}$, $C \subseteq \mathbb{R}^n$, and let $\{I_k\}$ be any sequence in $\mathcal{H}$ that covers $C$. We show that
\[ \lambda^*(C \cap I) + \lambda^*(C \cap I^c) \le \sum_k \lambda^*(I_k). \tag{10.6} \]

Taking the infimum over all such sequences $\{I_k\}$ produces the inequality $\lambda^*(C \cap I) + \lambda^*(C \cap I^c) \le \lambda^*(C)$, proving that $I \in \mathcal{M}$.

To verify (10.6), we may assume that $\sum_{k=1}^\infty \lambda^*(I_k) < +\infty$. For each $k$ there exist, according to the observation at the beginning of the proof, intervals $J_{j,k} \in \mathcal{H}$ such that $I_k \setminus I = \bigcup_{j=1}^{m_k} J_{j,k}$ (disjoint union). Then
\[ I_k = (I_k \cap I) \cup (I_k \setminus I) = (I_k \cap I) \cup \bigcup_{j=1}^{m_k} J_{j,k} \quad\text{(disjoint union)}, \]
hence, by 10.2.5(f) and induction,
\[ \lambda^*(I_k) = \lambda^*(I_k \cap I) + \sum_{j=1}^{m_k} \lambda^*(J_{j,k}). \]
Since $\{I_k \cap I\}_k$ covers $C \cap I$ and $\{J_{j,k}\}_{j,k}$ covers $C \cap I^c$,
\[ \sum_k \lambda^*(I_k) = \sum_k \lambda^*(I_k \cap I) + \sum_k \sum_{j=1}^{m_k} \lambda^*(J_{j,k}) \ge \lambda^*(C \cap I) + \lambda^*(C \cap I^c). \]

It may be shown that the inclusion $\mathcal{B} \subseteq \mathcal{M}$ is proper.³ The importance of Borel sets is that they are closely linked to the topology of $\mathbb{R}^n$ and hence are better suited for contexts involving continuous functions. The remainder of the section demonstrates the precise connection between $\mathcal{B}$ and $\mathcal{M}$.

³ See, for example, [4].

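10.4.4 Lemma. For any bounded $E \in \mathcal{M}$, there exists a decreasing sequence of bounded open sets $U_k \supseteq E$ such that
\[ \lim_k \lambda(U_k) = \lim_k \lambda\big(\mathrm{cl}(U_k)\big) = \lambda(E). \]

Proof. By definition of $\lambda(E)$, for each $k$ we may choose a sequence of open intervals $I_{j,k}$ with union $V_k$ containing $E$ such that
\[ \lambda(E) \le \lambda(V_k) \le \lambda\big(\mathrm{cl}(V_k)\big) \le \sum_j |\mathrm{cl}(I_{j,k})| < \lambda(E) + 1/k. \]
The sequence of open sets $U_k := V_1 \cap \cdots \cap V_k$ is decreasing, contains $E$, and satisfies
\[ \lambda(E) \le \lambda(U_k) \le \lambda\big(\mathrm{cl}(U_k)\big) \le \lambda\big(\mathrm{cl}(V_k)\big) \le \lambda(E) + 1/k. \]
Letting $k \to +\infty$ proves the assertion.

10.4.5 Lemma. For any $E \in \mathcal{M}$, there exists an increasing sequence of compact sets $C_k \subseteq E$ such that $\lim_k \lambda(C_k) = \lambda(E)$.

Proof. Suppose first that $E$ is bounded. Let $I$ be a bounded open interval containing $\mathrm{cl}(E)$ and let $\varepsilon > 0$. Choose a sequence of open intervals $I_k$ with union $U \supseteq I \setminus E$ such that $\sum_{k=1}^\infty |I_k| < \lambda(I \setminus E) + \varepsilon$. Since $I$ is open, we may assume that $I_k \subseteq I$ (otherwise, replace $I_k$ by $I_k \cap I$). Then $I \setminus E \subseteq U \subseteq I$ and
\[ \lambda(U) \le \lambda(I \setminus E) + \varepsilon = \lambda(I) - \lambda(E) + \varepsilon. \]
Set $K = I \setminus U$. Then $K \subseteq E \subseteq \mathrm{cl}(E) \subseteq I$, hence $K = \mathrm{cl}(E) \setminus U$. Therefore, $K$ is compact and
\[ \lambda(K) = \lambda(I) - \lambda(U) \ge \lambda(I) - \big[\lambda(I) - \lambda(E) + \varepsilon\big] = \lambda(E) - \varepsilon. \]

FIGURE 10.6: $K = \mathrm{cl}(E) \setminus U$. (The figure shows $E$, $K = I \setminus U$, and $I \setminus E \subseteq U := \bigcup_k I_k$ inside the interval $I$.)

Now let $E \in \mathcal{M}$ be arbitrary and let $\{E_k\}$ be a sequence of bounded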

measurable sets such that $E_k \uparrow E$. By the first paragraph, for each $k$ we may choose a compact set $K_k \subseteq E_k$ such that $\lambda(K_k) > \lambda(E_k) - 1/k$. These conditions still hold if $K_k$ is replaced by the compact set $C_k = K_1 \cup \cdots \cup K_k$. The sequence $\{C_k\}$ is increasing, contained in $E$, and $\lambda(C_k) \to \lambda(E)$.

10.4.6 Lemma. If $E \in \mathcal{M}$ is bounded, then there exists an increasing sequence of compact sets $C_k$ and a decreasing sequence of bounded open sets $U_k$ such that $C_k \subseteq E \subseteq U_k$ and $\lim_k \lambda(U_k \setminus C_k) = 0$.

Proof. If $C_k$ and $U_k$ are as in 10.4.4 and 10.4.5 with $U_k$ bounded, then $\lambda(U_k \setminus C_k) = \lambda(U_k \setminus E) + \lambda(E \setminus C_k) \to 0$.

10.4.7 Theorem. If $E \in \mathcal{M}$, then there exist Borel sets $F$ and $G$ such that $F \subseteq E \subseteq G$ and $\lambda(G \setminus F) = 0$.

Proof. Suppose first that $E$ is bounded. Set $F = \bigcup_{k=1}^\infty C_k$ and $G = \bigcap_{k=1}^\infty U_k$, where $C_k$ and $U_k$ are the sets in 10.4.6. Then $F \subseteq E \subseteq G$ and $G \setminus F \subseteq U_k \setminus C_k$ for all $k$, hence $\lambda(G \setminus F) \le \lambda(U_k \setminus C_k) \to 0$.

In the general case, there exists a sequence of bounded measurable sets $E_k \uparrow E$. By the first paragraph, there exist Borel sets $F_k$ and $G_k$ such that $F_k \subseteq E_k \subseteq G_k$ and $\lambda(G_k \setminus F_k) = 0$. Let
\[ F = \bigcup_{k=1}^\infty F_k \quad\text{and}\quad G = \bigcup_{k=1}^\infty G_k. \]
Then $F$ and $G$ are Borel sets, $F \subseteq E \subseteq G$, and $G \setminus F \subseteq \bigcup_{k=1}^\infty (G_k \setminus F_k)$. By countable subadditivity, $\lambda(G \setminus F) = 0$.

10.4.8 Corollary. Every $E \in \mathcal{M}$ is the disjoint union of a Borel set and a set of Lebesgue measure zero.

Proof. By the theorem, $E = F \cup (E \setminus F)$, where $F \in \mathcal{B}$ and $\lambda(E \setminus F) = 0$.

Exercises

1.S Let $\varepsilon > 0$. Construct an explicit compact subset $C \subseteq E := [0, 1] \cap I$ such that $\lambda(E \setminus C) < \varepsilon$.
2. Show that the graph $G := \{(x, y) : y = f(x)\}$ of a continuous function $f : \mathbb{R} \to \mathbb{R}$ is a Borel set with two-dimensional Lebesgue measure zero.
3. Let $E$ denote the Cantor set (10.3.4). Show that $E + \mathbb{Q}$ and $E + E$ are Borel sets and find their measures.
4.S Let $B \in \mathcal{B}(\mathbb{R}^n)$, $y \in \mathbb{R}^n$, and $r \in \mathbb{R}$. Prove that $B + y := \{x + y : x \in B\}$, $rB := \{rx : x \in B\}$ and $-B := \{x : -x \in B\}$ are Borel sets.

10.5 Measurable Functions

In this section, $\mathcal{F}$ denotes a σ-field of subsets of a set $S$.

Definition and Basic Properties

10.5.1 Lemma. Let $f : S \to \overline{\mathbb{R}} = [-\infty, +\infty]$. The following statements are equivalent:
(a) $f^{-1}(\{+\infty\}),\ f^{-1}(\{-\infty\}) \in \mathcal{F}$, and $f^{-1}(U) \in \mathcal{F}$ for all open sets $U \subseteq \mathbb{R}$.
(b) $f^{-1}(\{+\infty\}),\ f^{-1}(\{-\infty\}) \in \mathcal{F}$, and $f^{-1}(F) \in \mathcal{F}$ for all closed sets $F \subseteq \mathbb{R}$.
(c) $f^{-1}(\{+\infty\}),\ f^{-1}(\{-\infty\}) \in \mathcal{F}$, and $f^{-1}(B) \in \mathcal{F}$ for all Borel sets $B \subseteq \mathbb{R}$.
(d) $\{x : f(x) \le t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.
(e) $\{x : f(x) < t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.
(f) $\{x : f(x) \ge t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.
(g) $\{x : f(x) > t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.

Proof. The equivalence of (a) and (b) follows from the general set theoretic relation $f^{-1}(A^c) = \big(f^{-1}(A)\big)^c$. Clearly, (c) implies (b). For the converse, denote by $\mathcal{G}$ the collection of all Borel subsets $B$ of $\mathbb{R}$ such that $f^{-1}(B) \in \mathcal{F}$. Then $\mathcal{G}$ is a σ-field. If (b) holds, then $\mathcal{G}$ contains the closed sets, hence, by minimality, $\mathcal{G} = \mathcal{B}$. This proves (c) and hence shows that (a)–(c) are equivalent.

The implications (c) ⇒ (d) ⇒ (e) ⇒ (f) ⇒ (g) ⇒ (d) are proved using the following set relations:
(c) ⇒ (d): $\{x : f(x) \le t\} = f^{-1}(\{-\infty\}) \cup f^{-1}\big((-\infty, t]\big)$.
(d) ⇒ (e): $\{x : f(x) < t\} = \bigcup_{n=1}^\infty \{x : f(x) \le t - 1/n\}$.
(e) ⇒ (f): $\{x : f(x) \ge t\} = \{x : f(x) < t\}^c$.
(f) ⇒ (g): $\{x : f(x) > t\} = \bigcup_{n=1}^\infty \{x : f(x) \ge t + 1/n\}$.
(g) ⇒ (d): $\{x : f(x) \le t\} = \{x : f(x) > t\}^c$.
Thus (d)–(g) are equivalent and are implied by (a)–(c). Now assume that (d)–(g) hold. Then the sets
\[ f^{-1}(+\infty) = \bigcap_{k=1}^\infty \{x : f(x) > k\}, \qquad f^{-1}(-\infty) = \bigcap_{k=1}^\infty \{x : f(x) < -k\} \]
are members of $\mathcal{F}$, and for $-\infty < a < b < +\infty$,
\[ f^{-1}\big((a, b)\big) = \{x : f(x) > a\} \cap \{x : f(x) < b\} \in \mathcal{F}. \]
Since every open subset of $\mathbb{R}$ is a countable union of open intervals, (a) holds, completing the proof.

10.5.2 Definition. A function $f : S \to \overline{\mathbb{R}}$ is said to be measurable with respect to $\mathcal{F}$, or simply $\mathcal{F}$-measurable, if any (hence all) of the conditions in Lemma 10.5.1 hold. ♦

The following theorem shows that the collection of all measurable functions is closed under the standard ways of combining functions. The functions $f^+$, $f^-$, $\sup_k f_k$, $\inf_k f_k$, $\limsup_k f_k$, and $\liminf_k f_k$ in the statement of the theorem are defined by
\[ f^+(x) := \max\{f(x), 0\}, \qquad f^-(x) := \max\{-f(x), 0\}, \]
\[ (\sup_k f_k)(x) := \sup_k f_k(x), \qquad (\inf_k f_k)(x) := \inf_k f_k(x), \]
\[ (\limsup_k f_k)(x) := \limsup_k f_k(x), \qquad (\liminf_k f_k)(x) := \liminf_k f_k(x). \]

10.5.3 Theorem. Let $f$, $g$, $f_k$ be measurable with respect to a σ-field $\mathcal{F}$ on $S$. If $\alpha \in \mathbb{R}$ and $p > 0$, then $f + g$, $\alpha f$, $f^2$, $fg$, $|f|^p$, $f^+$, $f^-$, $\sup_k f_k$, $\inf_k f_k$, $\limsup_k f_k$, and $\liminf_k f_k$ are measurable.

Proof. The proof is based on the following equalities. The details are left to the reader.
• $\{x : (f + g)(x) < t\} = \bigcup_{r \in \mathbb{Q}} \{x : f(x) < r\} \cap \{x : g(x) < t - r\}$.
• $\{x : \alpha f(x) < t\} = \{x : f(x) < t/\alpha\}$ for $\alpha > 0$.
• $\{x : f^2(x) < t\} = \{x : -\sqrt{t} < f(x) < \sqrt{t}\}$ for $t > 0$.
• $fg = \tfrac{1}{2}\big[(f + g)^2 - f^2 - g^2\big]$.
• $\{x : |f|^p(x) < t\} = \{x : -t^{1/p} < f(x) < t^{1/p}\}$ for $t > 0$.
• $f^+ = \tfrac{1}{2}(|f| + f)$, $f^- = \tfrac{1}{2}(|f| - f)$.
• $\{x : \sup_k f_k(x) \le t\} = \bigcap_k \{x : f_k(x) \le t\}$.
• $\inf_k f_k = -\sup_k (-f_k)$.
• $\liminf_k f_k = \sup_k \inf_{j \ge k} f_j$; $\limsup_k f_k = -\liminf_k (-f_k)$.

10.5.4 Corollary. If $f_k : S \to \mathbb{R}$ is $\mathcal{F}$-measurable for every $k$ and if $f_k \to f$ on $S$, then $f$ is $\mathcal{F}$-measurable.

Simple Functions

10.5.5 Definition. The indicator function of a set $A \subseteq S$ is the function $1_A$ on $S$ defined by
\[ 1_A(x) = \begin{cases} 1 & \text{if } x \in A, \text{ and} \\ 0 & \text{if } x \notin A. \end{cases} \] ♦

For example, the Dirichlet function may be expressed as $1_{\mathbb{Q}}$.

10.5.6 Definition. A function $f : S \to \mathbb{R}$ with finite range is called a simple function. The collection of all nonnegative $\mathcal{F}$-measurable simple functions is denoted by $\mathcal{S}_+(\mathcal{F})$. ♦

10.5.7 Remarks. (a) A linear combination of indicator functions is a simple function. Conversely, a simple function $f$ may be expressed in many ways as a linear combination of indicator functions. The most important of these is the standard form
\[ f = \sum_{j=1}^m a_j 1_{A_j}, \qquad A_j := \{x \in S : f(x) = a_j\}, \tag{10.7} \]
where $a_1, \ldots, a_m \in \mathbb{R}$ are the distinct values of $f$. Note that the sets $A_j$ form a partition of $S$. By 10.5.3 and Exercise 8, $f$ is $\mathcal{F}$-measurable iff $A_j \in \mathcal{F}$ for each $j$.
(b) If $f_1, f_2 \in \mathcal{S}_+(\mathcal{F})$, $\alpha \ge 0$, and $p > 0$, then the functions $\alpha f_1$, $f_1 + f_2$, $f_1 f_2$, $f_1^p$, $\max\{f_1, f_2\}$, $\min\{f_1, f_2\}$ are nonnegative, measurable, and have finite ranges, hence are in $\mathcal{S}_+(\mathcal{F})$. ♦

The following theorem shows that the collection $\mathcal{S}_+(\mathcal{F})$ generates all measurable functions. It is a key ingredient in the development of the Lebesgue theory.

10.5.8 Theorem. For each nonnegative $\mathcal{F}$-measurable function $f$ on $S$, there exists a sequence $\{f_k\}$ in $\mathcal{S}_+(\mathcal{F})$ such that $f_k \uparrow f$ on $S$.

Proof. Let $f_0 = 0$, and for each $k \in \mathbb{N}$ define
\[ f_k = \sum_{j=1}^{k 2^k} \frac{j - 1}{2^k}\, 1_{A_{k,j}} + k\, 1_{A_k}, \quad\text{where } A_k = \{x : f(x) \ge k\} \text{ and} \]
\[ A_{k,j} = \{x : (j - 1) 2^{-k} \le f(x) < j 2^{-k}\}, \quad j = 1, 2, \ldots, k 2^k. \]
(See Figure 10.7.)

FIGURE 10.7: The components of $f_k$.

We show that $f_k(x) \uparrow f(x)$ for each $x \in S$. This is clear if $f(x) = +\infty$, since then $f_k(x) = k$ for all $k$. Suppose $f(x) \in \mathbb{R}$ and let $k \in \mathbb{N}$. If $f(x) \ge k + 1$, then $f_{k+1}(x) = k + 1 > k = f_k(x)$. If $k \le f(x) < k + 1$, then $f_{k+1}(x) \ge k = f_k(x)$. Finally, suppose that $f(x) < k$. Then $(j - 1) 2^{-k} \le f(x) < j 2^{-k}$ for some $1 \le j \le k 2^k$, hence
\[ \frac{2j - 2}{2^{k+1}} \le f(x) < \frac{2j - 1}{2^{k+1}} \quad\text{or}\quad \frac{2j - 1}{2^{k+1}} \le f(x) < \frac{2j}{2^{k+1}}. \]
(See Figure 10.8.) In either case,
\[ f_{k+1}(x) \ge \frac{2j - 2}{2^{k+1}} = \frac{j - 1}{2^k} = f_k(x). \]

FIGURE 10.8: The components of $f_{k+1}$.

Thus $f_k \uparrow$ on $S$. Since $0 \le f(x) - f_k(x) < 2^{-k}$ for all sufficiently large $k$, $f_k(x) \to f(x)$.
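The approximants in the proof of 10.5.8 depend on $x$ only through the value $f(x)$, so they are easy to evaluate pointwise. The following is a minimal sketch, assuming floating-point values stand in for $f(x)$; the name `dyadic_approx` is hypothetical.

```python
import math

def dyadic_approx(value, k):
    """Value of f_k at a point where f takes the nonnegative value `value`.

    On A_k = {f >= k} the approximant equals k (this also covers +inf);
    on A_{k,j} it equals (j - 1) / 2**k, the largest multiple of 2**-k
    that does not exceed `value`.
    """
    if value >= k:
        return float(k)
    return math.floor(value * 2 ** k) / 2 ** k

for k in range(1, 6):
    print(k, dyadic_approx(math.pi, k))
```

For $f(x) = \pi$ the sequence climbs through 1, 2, 3 while $k \le 3$ and then refines in steps of size $2^{-k}$, so the error is below $2^{-k}$ as soon as $k > f(x)$, matching the last step of the proof.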

Lebesgue and Borel Measurable Functions

10.5.9 Definition. A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to be Borel (Lebesgue) measurable if $f$ is measurable with respect to the σ-field $\mathcal{B}(\mathbb{R}^n)$ ($\mathcal{M}(\mathbb{R}^n)$). ♦

10.5.10 Proposition. If $f$ is Lebesgue measurable and $f = g$ a.e., then $g$ is Lebesgue measurable.

Proof. Let $A = \{x : f(x) \ne g(x)\}$. By hypothesis, $A$ has Lebesgue measure zero, hence $A^c$ and $\{x : g(x) < t\} \cap A \in \mathcal{M}$. Therefore,
\[ \{x : g(x) < t\} = \big(\{x : f(x) < t\} \cap A^c\big) \cup \big(\{x : g(x) < t\} \cap A\big) \in \mathcal{M}. \]

If $f$ is Borel measurable and $f = g$ a.e., then $g$ need not be Borel measurable. Indeed, there exist sets $E \in \mathcal{M} \setminus \mathcal{B}$ with measure zero, hence $1_E = 0$ a.e. but $1_E$ is not Borel measurable.⁴

Clearly, a Borel measurable function is Lebesgue measurable. The preceding paragraph shows that the converse is false. However, we have

10.5.11 Proposition. If $f : \mathbb{R}^n \to \mathbb{R}$ is Lebesgue measurable, then there exists a Borel measurable function $g : \mathbb{R}^n \to \mathbb{R}$ such that $g = f$ a.e.

Proof. Consider first the case $f = 1_E$, $E \in \mathcal{M}$. By 10.4.8, $E$ is the disjoint union of a Borel set $F$ and a set $A$ of Lebesgue measure zero. Thus $g := 1_F$ is Borel measurable and $f = g + 1_A = g$ a.e. The assertion therefore holds for indicator functions. If $f$ is a simple function, then each term in the standard form of $f$ is a.e. equal to a Borel function. Therefore, the assertion holds for simple functions.

If $f \ge 0$, then, by 10.5.8, there exists a sequence of nonnegative Lebesgue measurable simple functions $f_k$ such that $f_k \to f$ on $\mathbb{R}^n$. By the previous paragraph, for each $k$ there exists a Borel measurable function $g_k$ such that $f_k = g_k$ a.e. Let
\[ A_k := \{x : f_k(x) \ne g_k(x)\} \quad\text{and}\quad A := \bigcup_{k=1}^\infty A_k. \]
Then $A \in \mathcal{M}$, $\lambda(A) = 0$ and $f_k(x) = g_k(x)$ for all $x \in A^c$ and all $k$. Let $B$ denote the set of all $x$ such that the sequence $\{g_k(x)\}$ does not converge. Then $B \subseteq A$ and, by 10.5.3, $B \in \mathcal{B}$. Let $g = \lim_k g_k 1_{B^c}$. Then $g$ is Borel measurable and $\{x : g(x) \ne f(x)\} \subseteq A$, so $g = f$ a.e. Therefore, the assertion holds for nonnegative $f$. The general case follows from the identity $f = f^+ - f^-$.

Part (a) of 10.5.1 implies that a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ is Borel measurable. In a similar vein,

10.5.12 Proposition. If $f : \mathbb{R}^n \to \mathbb{R}$ is continuous except on a set $E$ of Lebesgue measure zero, then $f$ is Lebesgue measurable.

Proof. Let $U \subseteq \mathbb{R}$ be open. Then $f^{-1}(U) = A \cup B$, where $A := f^{-1}(U) \cap E$ and $B := f^{-1}(U) \cap E^c$. Since $A \subseteq E$ and $\lambda(E) = 0$, $A \in \mathcal{M}$. Since $f$ is continuous on $E^c$, $B$ is open in $E^c$, hence $B = V \cap E^c$ for some open subset $V$ of $\mathbb{R}^n$. Therefore, $B \in \mathcal{M}$, so $f^{-1}(U) \in \mathcal{M}$. By 10.5.1, $f$ is Lebesgue measurable.

⁴ See, for example, [4].

Proposition 10.5.12 implies that a function with at most countably many discontinuities is Lebesgue measurable. An examination of the proof shows that such a function is in fact Borel measurable. In particular, monotone functions on R, hence also functions of bounded variation, are Borel measurable (see 3.3.6 and 5.9.7). Note that a function that is continuous except on a set of measure zero is not necessarily equal a.e. to a continuous function (Exercise 12). Conversely, a function equal a.e. to a continuous function need not be continuous anywhere; the Dirichlet function is an obvious example.

Exercises

In Exercises 1–8, $\mathcal{F}$ denotes a σ-field of subsets of a set $S$.

1.S Let $f : S \to \mathbb{R}$ have the property that $1_{A_k} f$ is $\mathcal{F}$-measurable for every $k$, where $A_k \in \mathcal{F}$ and $\bigcup_k A_k = S$. Prove that $f$ is $\mathcal{F}$-measurable.
2. Prove that if $f : S \to \mathbb{R}$ is $\mathcal{F}$-measurable and never zero, then $1/f$ is $\mathcal{F}$-measurable.
3. Let $f : S \to \mathbb{R}$ have the property that $\{x : f(x) < r\} \in \mathcal{F}$ for all $r \in \mathbb{Q}$. Prove that $f$ is $\mathcal{F}$-measurable.
4. Let $f : S \to \mathbb{R}$ be $\mathcal{F}$-measurable and let $g : \mathbb{R} \to \mathbb{R}$ be continuous. Show that $g \circ f$ is $\mathcal{F}$-measurable.
5. Let $g, h : S \to \mathbb{R}$ be $\mathcal{F}$-measurable functions. Prove that the following sets are $\mathcal{F}$-measurable:
(a)S $\{x \in S : g(x) > h(x)\}$,
(b) $\{x \in S : g(x) \ge h(x)\}$,
(c) $\{x \in S : g(x) = h(x)\}$,
(d)S $\{x \in S : g(x)h(x) = 1\}$.
6. Let $\{f_k : S \to \mathbb{R}\}$ be a sequence of $\mathcal{F}$-measurable functions. Prove that the set $\{x \in S : \lim_k f_k(x) \text{ exists in } \mathbb{R}\}$ is $\mathcal{F}$-measurable.
7. Let $f : S \to \mathbb{R}$ have range consisting of the distinct values $a_k$, $k \in \mathbb{N}$. Show that $f$ is $\mathcal{F}$-measurable iff $\{x \in S : f(x) = a_k\} \in \mathcal{F}$ for every $k$.
8.S Let $E \subseteq S$. Prove that $1_E$ is $\mathcal{F}$-measurable iff $E \in \mathcal{F}$.
9. Let $A$, $B$, and $C$ be subsets of $S$. Prove:
(a) $1_{AB} = 1_A 1_B$.
(b) $1_{A \cup B} = 1_A + 1_B - 1_A 1_B$.
(c) $1_{A^c} = 1 - 1_A$.
(d) $1_A \le 1_B$ iff $A \subseteq B$.
10.S Define the symmetric difference $A \Delta B$ of sets $A$ and $B$ by $A \Delta B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$. Prove that $1_{A \Delta B} = |1_A - 1_B|$.
11. Let $A_k \subseteq S$ and set $B = \liminf_k A_k$ and $C = \limsup_k A_k$ (see Exercise 10.1.6). Prove that
(a) $1_B = \liminf_k 1_{A_k}$.
(b) $1_C = \limsup_k 1_{A_k}$.
12. Prove that $1_{[0,1]}$ is not equal a.e. to a continuous function on $\mathbb{R}$.
13. Let $f : \mathbb{R} \to \mathbb{R}$. Prove that if $f'$ exists on $\mathbb{R}$, then $f'$ is Borel measurable.
14.S Let $f(x) = \lfloor x^{-1} \rfloor^{-1}$, $0 < x \le 1$. Show that $f$ is Borel measurable on $(0, 1]$.
15. Let $f(x) = 1 + r\big(\lfloor x^{-1} \rfloor\big)$, $0 < x \le 1$, where $r(k)$ denotes the remainder on division of an integer $k$ by 3. Show that $f$ is Borel measurable.
16. Define $f : [0, 1] \to \mathbb{R}$ by $f(x) = 0$ if $x$ is rational and $f(x) = 1/\sqrt{d}$ if $x$ is irrational, where $d$ is the first nonzero digit in the decimal expansion of $x$. Prove that $f$ is Borel measurable.
17.S Prove that if the function $f$ in 10.5.8 is bounded, then the convergence of the sequence is uniform.
18. Let $f : \mathbb{R}^2 \to \mathbb{R}$ have the property that $f(x, y)$ is continuous in $x$ for each $y$ and Borel measurable in $y$ for each $x$. Let $g : \mathbb{R} \to \mathbb{R}$ be Borel measurable. Prove that the function $h(y) := f\big(g(y), y\big)$ is Borel measurable. Hint. Start with indicator functions $g$.
19.S ⇓⁵ Let $f = (f_1, \ldots, f_m) : \mathbb{R}^n \to \mathbb{R}^m$, where each $f_j : \mathbb{R}^n \to \mathbb{R}$ is Borel measurable. Prove:
(a) $\mathcal{F} := \{B \in \mathcal{B}(\mathbb{R}^m) : f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)\}$ is a σ-field.
(b) $\mathcal{F} = \mathcal{B}(\mathbb{R}^m)$, that is, $f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)$ for every $B \in \mathcal{B}(\mathbb{R}^m)$.
(c) If $F : \mathbb{R}^m \to \mathbb{R}$ is Borel measurable, then the function $g := F \circ f$ is Borel measurable.
20. (a) Show that $B \times \mathbb{R} \in \mathcal{B}(\mathbb{R}^2)$ for all $B \in \mathcal{B}(\mathbb{R})$.
(b) Let $f : \mathbb{R} \to \mathbb{R}$ be Borel measurable and define $g : \mathbb{R}^2 \to \mathbb{R}$ by $g(x, y) = f(x)$. Show that $g$ is Borel measurable.
21. Let $0 \in A \subseteq \mathbb{R}^n$. Define the "radius function" $f_A : \mathbb{R}^n \to \mathbb{R}$ by
\[ f_A(x) := \sup\{t \ge 0 : tx \in A\}, \quad x \in \mathbb{R}^n. \]
(a) Let $0 \in A_k$ for all $k$ and $A_k \uparrow A$. Show that $f_{A_k} \uparrow f_A$.
(b) Show that if $A$ is open, then $f_A$ is positive and Borel measurable.
(c) Use (b) to show that if $A$ is compact, then $f_A$ is Borel measurable.
(d) Conclude from 10.4.5 that $f_A$ is Borel measurable for any Borel set $A$ containing 0.

⁵ This exercise will be used in 11.5.4.

Chapter 11 Lebesgue Integration on Rn

In this chapter we use the measure theory developed in Chapter 10 to construct the Lebesgue integral of a measurable function of several variables. For comparison purposes, we begin with a brief description of the Riemann integral on compact subintervals of Rn .

11.1 Riemann Integration on $\mathbb{R}^n$

The $n$-dimensional Riemann integral is constructed in essentially the same way as the one-dimensional integral: Let $f$ be a bounded real-valued function on an $n$-dimensional interval
\[ [\mathbf{a}, \mathbf{b}] := [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n], \]
where $\mathbf{a} := (a_1, \ldots, a_n)$ and $\mathbf{b} := (b_1, \ldots, b_n)$.

FIGURE 11.1: Partition of $[a, b] \times [c, d]$.

For each $j$, let $\mathcal{P}_j$ be a partition of the coordinate interval $[a_j, b_j]$. The collection of all Cartesian products of the resulting coordinate subintervals produces a partition $\mathcal{P}$ of $[\mathbf{a}, \mathbf{b}]$ consisting of $n$-dimensional subintervals $I = I_1 \times I_2 \times \cdots \times I_n$ with volume $\Delta V_I := |I_1|\,|I_2| \cdots |I_n|$ (see Figure 11.1). The lower and upper sums of $f$ over $\mathcal{P}$ are defined by
\[ \underline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} m_I\, \Delta V_I, \quad m_I := \inf_{\mathbf{x} \in I} f(\mathbf{x}), \quad\text{and} \]
\[ \overline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} M_I\, \Delta V_I, \quad M_I := \sup_{\mathbf{x} \in I} f(\mathbf{x}). \]
The lower and upper integrals on $[\mathbf{a}, \mathbf{b}]$ are defined by
\[ \underline{\int_{\mathbf{a}}^{\mathbf{b}}} f := \sup_{\mathcal{P}} \underline{S}(f, \mathcal{P}) \quad\text{and}\quad \overline{\int_{\mathbf{a}}^{\mathbf{b}}} f := \inf_{\mathcal{P}} \overline{S}(f, \mathcal{P}), \]
where the supremum and infimum are taken over all partitions $\mathcal{P}$ of $[\mathbf{a}, \mathbf{b}]$. If the two integrals are equal, then $f$ is said to be Riemann–Darboux integrable on $[\mathbf{a}, \mathbf{b}]$. The common value of these integrals is then denoted by $\int_{\mathbf{a}}^{\mathbf{b}} f$. As in the one-variable case,
\[ \int_{\mathbf{a}}^{\mathbf{b}} f = \lim_{\|\mathcal{P}\| \to 0} S\big(f, \mathcal{P}, \{\boldsymbol{\xi}_I\}_I\big), \]
where $\|\mathcal{P}\| = \max_j \|\mathcal{P}_j\|$ and $S\big(f, \mathcal{P}, \{\boldsymbol{\xi}_I\}_I\big)$ is the Riemann sum
\[ S\big(f, \mathcal{P}, \{\boldsymbol{\xi}_I\}_I\big) := \sum_I f(\boldsymbol{\xi}_I)\, \Delta V_I, \quad \boldsymbol{\xi}_I \in I,\ I \in \mathcal{P}. \]

The $n$-dimensional Riemann integral has properties analogous to those of the one-dimensional integral. Moreover, as is shown in Section 11.5, if $f$ is continuous, then $\int_{\mathbf{a}}^{\mathbf{b}} f$ may be expressed as an iterated integral
\[ \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\, dx_n \cdots dx_1, \]
effectively reducing the theory to the one-dimensional case. Integrals over regions bounded by "nice" surfaces may be similarly evaluated.
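The lower and upper sums can be computed directly in simple cases. The sketch below is a hedged illustration, not a general-purpose routine: it assumes a uniform partition with `n` pieces per coordinate and a function that is nondecreasing in each coordinate, so that the infimum and supremum over each subinterval are attained at its lower and upper corners. The name `darboux_sums` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def darboux_sums(f, a, b, n):
    """Lower and upper sums of f over the uniform partition of
    [a1, b1] x ... x [ad, bd] with n pieces per coordinate.

    Assumption: f is nondecreasing in each coordinate, so m_I and M_I
    are the values of f at the lower and upper corners of I.
    """
    dim = len(a)
    steps = [(Fraction(b[i]) - Fraction(a[i])) / n for i in range(dim)]
    volume = Fraction(1)
    for h in steps:
        volume *= h                       # Delta V_I, the same for every I
    lower = upper = Fraction(0)
    for idx in product(range(n), repeat=dim):
        lo = [Fraction(a[i]) + idx[i] * steps[i] for i in range(dim)]
        hi = [lo[i] + steps[i] for i in range(dim)]
        lower += f(*lo) * volume
        upper += f(*hi) * volume
    return lower, upper

# f(x, y) = x * y on [0, 1] x [0, 2]; the iterated integral equals 1.
low4, up4 = darboux_sums(lambda x, y: x * y, (0, 0), (1, 2), 4)
print(low4, up4)
```

For this example the sums work out to $(n-1)^2/n^2$ and $(n+1)^2/n^2$, which squeeze the value 1 of the iterated integral as $n$ grows.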

11.2 The Lebesgue Integral

The Lebesgue integral on $\mathbb{R}^n$ is defined first for nonnegative Lebesgue measurable simple functions and is then extended to a larger class of functions, including all nonnegative Lebesgue measurable functions. The identity $f = f^+ - f^-$ is then used to define the integral for general measurable functions.

The Integral of a Simple Function

11.2.1 Definition. Let $f \in \mathcal{S}_+(\mathcal{M})$ have standard form
\[ f = \sum_{j=1}^m a_j 1_{A_j}, \qquad A_j := \{x : f(x) = a_j\}, \]
where $\{A_1, \ldots, A_m\}$ is a (measurable) partition of $\mathbb{R}^n$. The Lebesgue integral of $f$ on $\mathbb{R}^n$ is defined by
\[ \int f\, d\lambda := \sum_{j=1}^m a_j \lambda(A_j). \] ♦

Note that the above sum may contain a term of the form $0 \cdot (+\infty)$. While this expression was heretofore undefined, it is now necessary to make the definition $0 \cdot (+\infty) := 0$. In particular, the integral of the identically zero function is $0 \cdot \lambda(\mathbb{R}^n) = 0$.

11.2.2 Lemma. If $f, g \in \mathcal{S}_+(\mathcal{M})$ and $\alpha \ge 0$, then
(a) $\int \alpha f\, d\lambda = \alpha \int f\, d\lambda$;
(b) $\int (f + g)\, d\lambda = \int f\, d\lambda + \int g\, d\lambda$;
(c) $\int f\, d\lambda \le \int g\, d\lambda$ if $f \le g$ a.e.;
(d) $\int f\, d\lambda = \int g\, d\lambda$ if $f = g$ a.e.

Proof. Part (a) is immediate from the definition, and (d) follows from (c). To prove (b) and (c), let $f$ and $g$ have standard representations
\[ f = \sum_{i=1}^m a_i 1_{A_i} \quad\text{and}\quad g = \sum_{j=1}^k b_j 1_{B_j}, \]
so
\[ \int f\, d\lambda = \sum_{i=1}^m a_i \lambda(A_i) \quad\text{and}\quad \int g\, d\lambda = \sum_{j=1}^k b_j \lambda(B_j). \]
Since $\mathbb{R}^n = \bigcup_{i=1}^m A_i = \bigcup_{j=1}^k B_j$ and the unions are disjoint,
\[ \lambda(A_i) = \sum_{j=1}^k \lambda(A_i \cap B_j) \quad\text{and}\quad \lambda(B_j) = \sum_{i=1}^m \lambda(A_i \cap B_j), \]
hence
\[ \int f\, d\lambda = \sum_{i=1}^m \sum_{j=1}^k a_i \lambda(A_i \cap B_j) \quad\text{and}\quad \int g\, d\lambda = \sum_{j=1}^k \sum_{i=1}^m b_j \lambda(A_i \cap B_j). \tag{11.1} \]

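Now let $c_1, \ldots, c_p$ be the distinct values of $f + g$, and set $C_\ell = \{x : (f + g)(x) = c_\ell\}$, $\ell = 1, \ldots, p$. Then
\[ f + g = \sum_{\ell=1}^p c_\ell 1_{C_\ell} \quad\text{and}\quad C_\ell = \bigcup_{\{(i,j)\,:\,a_i + b_j = c_\ell\}} A_i \cap B_j \quad\text{(disjoint)}, \]
so
\[ \int (f + g)\, d\lambda = \sum_{\ell=1}^p c_\ell \lambda(C_\ell) = \sum_{\ell=1}^p c_\ell \sum_{\{(i,j)\,:\,a_i + b_j = c_\ell\}} \lambda(A_i \cap B_j) = \sum_{i=1}^m \sum_{j=1}^k (a_i + b_j) \lambda(A_i \cap B_j). \]
By (11.1), the last sum is $\int f\, d\lambda + \int g\, d\lambda$, proving (b).

For (c), suppose $f \le g$ a.e. and let $E = \{x : f(x) \le g(x)\}$. Then $\lambda(E^c) = 0$ and $a_i \le b_j$ for all $i, j$ for which $A_i \cap B_j \cap E \ne \emptyset$. From
\[ \lambda(A_i \cap B_j) = \lambda(A_i \cap B_j \cap E) + \lambda(A_i \cap B_j \cap E^c) = \lambda(A_i \cap B_j \cap E) \]
and (11.1), we have
\[ \int f\, d\lambda = \sum_{i=1}^m \sum_{j=1}^k a_i \lambda(A_i \cap B_j \cap E) \le \sum_{j=1}^k \sum_{i=1}^m b_j \lambda(A_i \cap B_j \cap E) = \int g\, d\lambda. \]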
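Definition 11.2.1 and the additivity in 11.2.2(b) can be illustrated on a finite model. The sketch below assumes a fixed finite measurable partition of the space into cells with known measures (possibly $+\infty$), and represents a nonnegative simple function by its constant value on each cell. The names `standard_form` and `integral` are hypothetical, and the convention $0 \cdot (+\infty) := 0$ is implemented by skipping the value 0.

```python
import math
from collections import defaultdict

def standard_form(values, measures):
    """Group cells by function value: returns {a_j: lambda(A_j)}."""
    grouped = defaultdict(float)
    for v, m in zip(values, measures):
        grouped[v] += m
    return dict(grouped)

def integral(values, measures):
    """Sum of a_j * lambda(A_j) over the standard form,
    using the convention 0 * (+inf) := 0."""
    total = 0.0
    for a, m in standard_form(values, measures).items():
        if a != 0:
            total += a * m
    return total

# Cells: [0, 1), [1, 3), and the rest of the line (infinite measure).
measures = [1.0, 2.0, math.inf]
f = [2, 5, 0]
g = [1, 1, 0]
f_plus_g = [x + y for x, y in zip(f, g)]

print(integral(f, measures), integral(g, measures), integral(f_plus_g, measures))
```

Working cell by cell on a common partition mirrors the proof, which passes to the refinement $\{A_i \cap B_j\}$ before regrouping by the values of $f + g$.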

The Integral of a Measurable Function

11.2.3 Definition. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be Lebesgue measurable. If $f \ge 0$, define
\[ \int f\, d\lambda = \int f(x)\, d\lambda(x) := \sup\Big\{ \int f_s\, d\lambda : f_s \le f,\ f_s \in \mathcal{S}_+(\mathcal{M}) \Big\}. \tag{11.2} \]
In general, define the Lebesgue integral on $\mathbb{R}^n$ by
\[ \int f\, d\lambda := \int f^+\, d\lambda - \int f^-\, d\lambda, \]
provided at least one of the terms on the right is finite. For $E \in \mathcal{M}$ define the Lebesgue integral on $E$ by
\[ \int_E f\, d\lambda := \int f \cdot 1_E\, d\lambda \]
whenever the right side is defined. If both $\int_E f^+\, d\lambda$ and $\int_E f^-\, d\lambda$ are finite, then $f$ is said to be (Lebesgue) integrable on $E$. The collection of all integrable functions on $E$ is denoted by $L^1(E)$. Finally, $f$ is said to be integrable if it is integrable on $\mathbb{R}^n$. ♦

Note that from the definition, $f \ge 0 \Rightarrow \int f \ge 0$. More generally,

11.2.4 Proposition. If $f, g : \mathbb{R}^n \to \mathbb{R}$ are Lebesgue measurable, $f \le g$ a.e., and $\int f\, d\lambda$, $\int g\, d\lambda$ are defined, then $\int f\, d\lambda \le \int g\, d\lambda$. In particular, if $f \ge 0$ and $g$ is integrable, then $f$ is integrable.

Proof. Assume first that $f, g \ge 0$. Let $f_s \in \mathcal{S}_+(\mathcal{M})$ with $f_s \le f$ and set $g_s := 1_E f_s$, where $E := \{x : f(x) \le g(x)\}$. Then $g_s \in \mathcal{S}_+(\mathcal{M})$, $f_s = g_s$ a.e., and $g_s \le 1_E f \le 1_E g \le g$. By 11.2.2, $\int f_s\, d\lambda = \int g_s\, d\lambda \le \int g\, d\lambda$. Since $f_s$ was arbitrary, $\int f\, d\lambda \le \int g\, d\lambda$. In the general case, $f^+ \le g^+$ and $f^- \ge g^-$ a.e., hence, by the first part of the proof,
\[ \int f\, d\lambda = \int f^+\, d\lambda - \int f^-\, d\lambda \le \int g^+\, d\lambda - \int g^-\, d\lambda = \int g\, d\lambda. \]

11.2.5 Corollary. If $f : \mathbb{R}^n \to \mathbb{R}$ is integrable and $f = g$ a.e., then $g$ is integrable and $\int f\, d\lambda = \int g\, d\lambda$.

Proof. By 10.5.10, $g$ is Lebesgue measurable. Moreover, $f^+ = g^+$ a.e., so by 11.2.4, $\int f^+\, d\lambda = \int g^+\, d\lambda$. Similarly, $\int f^-\, d\lambda = \int g^-\, d\lambda$.

11.2.6 Proposition. If $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is integrable, then $f$ is finite a.e.

Proof. Suppose first that $f \ge 0$. Let $A = \{x : f(x) = +\infty\}$ and $A_k = \{x : f(x) \ge k\}$. Since $f \ge f 1_{A_k} \ge k 1_{A_k} \ge k 1_A$,
\[ 0 \le \lambda(A) \le \frac{1}{k} \int f\, d\lambda < +\infty. \]
Letting $k \to +\infty$ shows that $\lambda(A) = 0$. In the general case, apply the result of the first paragraph to $f^+$ and $f^-$ to obtain
\[ \lambda\{x : f^+(x) = +\infty\} = \lambda\{x : f^-(x) = +\infty\} = 0, \]
hence $\lambda\{x : |f(x)| = +\infty\} = 0$.

11.2.7 Proposition. Let $f : \mathbb{R}^n \to [0, +\infty]$ be Lebesgue measurable. Then $\int f\, d\lambda = 0$ iff $f = 0$ a.e.

Proof. The sufficiency follows from 11.2.5. For the necessity, suppose $\int f\, d\lambda = 0$ and let $B = \{x : f(x) > 0\}$ and $B_k = \{x : f(x) > 1/k\}$. Then $B = \bigcup_{k=1}^\infty B_k$ and $f \ge f 1_{B_k} \ge k^{-1} 1_{B_k}$, so
\[ 0 \le \lambda(B_k) \le k \int f\, d\lambda = 0. \]
Therefore, $\lambda(B_k) = 0$. By countable subadditivity, $\lambda(B) = 0$.

By 11.2.4, $f \ge 0$ implies that $\int_A f\, d\lambda \ge 0$ for all $A \in \mathcal{M}$. The following is a converse:

11.2.8 Proposition. Let $f : \mathbb{R}^n \to \mathbb{R}$ be Lebesgue measurable and suppose that $\int_A f\, d\lambda$ is defined for all $A \in \mathcal{M}$.
(a) If $\int_A f\, d\lambda \ge 0$ for all $A \in \mathcal{M}$, then $f \ge 0$ a.e.
(b) If $\int_A f\, d\lambda = 0$ for all $A \in \mathcal{M}$, then $f = 0$ a.e.

Proof. Part (b) follows from part (a). To prove (a), let
\[ A_k = \{x : f(x) \le -k^{-1}\} \quad\text{and}\quad A = \{x : f(x) < 0\}. \]
Then $f 1_{A_k} \le -k^{-1} 1_{A_k}$ or $1_{A_k} \le -k f 1_{A_k}$, hence, since $\int_{A_k} f \ge 0$,
\[ 0 \le \lambda(A_k) \le -k \int_{A_k} f\, d\lambda \le 0. \]
Therefore, $\lambda(A_k) = 0$. Since $A = \bigcup_{k=1}^\infty A_k$, $\lambda(A) = 0$.

11.2.9 Remark. The above properties of integrals on $\mathbb{R}^n$ also hold for integrals on $E \in \mathcal{M}$. For example, if $f$ is integrable on $E$, then $f$ is finite a.e. on $E$: simply replace $f$ in 11.2.6 by $f \cdot 1_E$. This observation applies to most of the results that follow. We shall usually refrain from making this explicit, but the reader is invited to formulate and verify such generalizations. ♦

Linearity of the Integral

The following lemma is a special case of the monotone convergence theorem proved in the next section.

11.2.10 Lemma (Beppo–Levi). If $\{f_k\}$ is a sequence of nonnegative Lebesgue measurable functions such that $f_k \uparrow f$ on $\mathbb{R}^n$, then
\[ \int f\,d\lambda = \lim_k \int f_k\,d\lambda. \]

Proof. By 10.5.3, $f$ is Lebesgue measurable, hence $\int f\,d\lambda$ is defined. It follows from $0 \le f_k \le f_{k+1} \le f$ and 11.2.4 that
\[ \int f_k\,d\lambda \le \int f_{k+1}\,d\lambda \le \int f\,d\lambda. \]
Therefore, $L := \lim_k \int f_k\,d\lambda$ exists in $\overline{\mathbb{R}}$ and $L \le \int f\,d\lambda$. For the reverse inequality, it suffices to show that $\int g\,d\lambda \le L$ for any $g \in S_+(\mathcal{M})$ with $g \le f$. Let $0 < r < 1$ and set $E_k = \{x : f_k(x) \ge r g(x)\}$. Since the sequence $\{f_k\}$ is increasing, $E_k \subseteq E_{k+1}$. Since for each $x$ we have $f_k(x) \ge r g(x)$ for all large $k$, $E_k \uparrow \mathbb{R}^n$. If $g$ has standard form $\sum_{j=1}^{m} a_j 1_{A_j}$, then
\[ f_k \ge f_k 1_{E_k} \ge r \sum_{j=1}^{m} a_j 1_{E_k \cap A_j}, \]
hence
\[ \int f_k\,d\lambda \ge r \sum_{j=1}^{m} a_j \lambda(E_k \cap A_j). \]
Letting $k \to +\infty$, noting that $E_k \cap A_j \uparrow_k A_j$, we then obtain
\[ L \ge r \sum_{j=1}^{m} a_j \lambda(A_j) = r \int g\,d\lambda. \]
Letting $r \uparrow 1$ yields $L \ge \int g\,d\lambda$, as required.

11.2.11 Theorem. If $f, g : \mathbb{R}^n \to [0, +\infty]$ are Lebesgue measurable, then
\[ \int (\alpha f + \beta g)\,d\lambda = \alpha \int f\,d\lambda + \beta \int g\,d\lambda, \qquad \alpha, \beta \in \mathbb{R}^+. \]
In particular, if $f$ and $g$ are integrable then so is $\alpha f + \beta g$.

Proof. By 10.5.8, there exist sequences $\{f_k\}$ and $\{g_k\}$ in $S_+(\mathcal{M})$ such that $f_k \uparrow f$ and $g_k \uparrow g$. Then $\alpha f_k + \beta g_k \uparrow \alpha f + \beta g$ and, by 11.2.10 and 11.2.2,
\[ \int (\alpha f + \beta g)\,d\lambda = \lim_k \int (\alpha f_k + \beta g_k)\,d\lambda = \alpha \lim_k \int f_k\,d\lambda + \beta \lim_k \int g_k\,d\lambda = \alpha \int f\,d\lambda + \beta \int g\,d\lambda. \]

11.2.12 Corollary. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be Lebesgue measurable.
(a) $f$ is integrable iff $|f|$ is integrable.
(b) If $f$ is integrable and $|g| \le |f|$, then $g$ is integrable.
(c) If $f$ and $g$ are integrable, then $f + g$ is integrable.
(d) If $f$ is integrable and $E \in \mathcal{M}$, then $f$ is integrable on $E$.

Proof. (a) If $f$ is integrable then, by definition, both $f^+$ and $f^-$ are integrable, hence, by the theorem, $|f| = f^+ + f^-$ is integrable. Conversely, if $|f|$ is integrable, then the inequalities $0 \le \int f^{\pm}\,d\lambda \le \int |f|\,d\lambda$ show that both $f^+$ and $f^-$ are integrable, hence $f$ is integrable.


(b) By (a), $|f|$ is integrable. The inequality $|g| \le |f|$ then implies that $|g|$ is integrable. By (a) again, $g$ is integrable.
(c) If $f$ and $g$ are integrable, then so are $|f|$ and $|g|$. The inequality $|f + g| \le |f| + |g|$ then shows that $|f + g|$ is integrable. By (a), $f + g$ is integrable.
(d) This follows from (b) since $|f 1_E| \le |f|$.

The following theorem complements 11.2.11.

11.2.13 Theorem. Let $f, g : \mathbb{R}^n \to \overline{\mathbb{R}}$ be Lebesgue measurable with $g$ integrable, and let $c \in \mathbb{R}$. Then the following hold:
(a) $cg$ is integrable and $\int cg\,d\lambda = c \int g\,d\lambda$.
(b) If $f$ is integrable, then $f + g$ is integrable and
\[ \int (f + g)\,d\lambda = \int f\,d\lambda + \int g\,d\lambda. \tag{11.3} \]
(c) If $\int f\,d\lambda$ is defined, then $\int (f + g)\,d\lambda$ is defined and (11.3) holds.$^1$

$^1$ To avoid undefined expressions such as $\infty - \infty$ in the integrand $f + g$ in (b) and (c), it must be assumed that $g$ is finite-valued. This is no real loss of generality since $g$ is integrable, hence finite-valued a.e. (11.2.6).

Proof. (a) If $c \ge 0$, then $(cg)^+ = c g^+$ and $(cg)^- = c g^-$, hence, by 11.2.11, the functions $(cg)^{\pm}$ are integrable and
\[ \int cg\,d\lambda = \int (cg)^+\,d\lambda - \int (cg)^-\,d\lambda = c \int g^+\,d\lambda - c \int g^-\,d\lambda = c \int g\,d\lambda. \]
Next, observe that $(-g)^+ = g^-$ and $(-g)^- = g^+$, so
\[ \int (-g)\,d\lambda = \int (-g)^+\,d\lambda - \int (-g)^-\,d\lambda = \int g^-\,d\lambda - \int g^+\,d\lambda = -\int g\,d\lambda. \]
Therefore, if $c < 0$,
\[ \int cg\,d\lambda = \int (-c)(-g)\,d\lambda = -c \int (-g)\,d\lambda = c \int g\,d\lambda. \]
(b) By 11.2.12, $f + g$ is integrable. By 11.2.6, there exists a set $A$ of Lebesgue measure zero such that $f(x), g(x) \in \mathbb{R}$ for $x \in A^c$. Then on the set $A^c$,
\[ (f + g)^+ - (f + g)^- = f + g = f^+ - f^- + g^+ - g^-, \]
hence
\[ (f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+. \]


By 11.2.11 and 11.2.5,
\[ \int (f + g)^+\,d\lambda + \int f^-\,d\lambda + \int g^-\,d\lambda = \int (f + g)^-\,d\lambda + \int f^+\,d\lambda + \int g^+\,d\lambda. \]
Since the integrals in this equation are finite, rearranging yields
\[ \int (f + g)\,d\lambda = \int (f + g)^+\,d\lambda - \int (f + g)^-\,d\lambda = \int f^+\,d\lambda - \int f^-\,d\lambda + \int g^+\,d\lambda - \int g^-\,d\lambda = \int f\,d\lambda + \int g\,d\lambda. \]
(c) The cases to be considered are
(i) $\int f^-\,d\lambda < +\infty$ and $\int f^+\,d\lambda = +\infty$;
(ii) $\int f^+\,d\lambda < +\infty$ and $\int f^-\,d\lambda = +\infty$.
Suppose that (i) holds. We may assume that both $f^-$ and $g$ are finite-valued. Since
\[ \int (f + g)^-\,d\lambda \le \int (f^- + g^-)\,d\lambda < +\infty, \]
$\int (f + g)\,d\lambda$ is defined. If $\int (f + g)^+\,d\lambda < +\infty$, then $f + g$ would be integrable, hence, by part (b), so would $f^+ = (f + g) + f^- - g$, contrary to our assumption. Therefore,
\[ \int (f + g)\,d\lambda = +\infty = \int f\,d\lambda + \int g\,d\lambda. \]
Case (ii) is similar (or apply Case (i) to $-f$).

11.2.14 Corollary. If $f$ is integrable, then $\left| \int f\,d\lambda \right| \le \int |f|\,d\lambda$.

Proof. Since $\pm f \le |f|$,
\[ \pm \int f\,d\lambda = \int \pm f\,d\lambda \le \int |f|\,d\lambda. \]

Approximation of Integrable Functions

11.2.15 Definition. For $E \in \mathcal{M}$ and $f \in L^1(E)$ define the $L^1$ seminorm of $f$ by
\[ \|f\|_1 := \int_E |f|\,d\lambda. \]

11.2.16 Theorem. $L^1(E)$ is a linear space and $\|\cdot\|_1$ has all the properties of a norm except the coincidence property.


Proof. That $L^1(E)$ is a linear space follows from 11.2.13. Coincidence may fail since $\|f\|_1 = 0$ only implies that $f = 0$ a.e. (Consider the Dirichlet function.) The other properties of a norm are easily established.

11.2.17 Theorem. Let $f \in L^1(\mathbb{R}^n)$ and $\varepsilon > 0$. Then there exist a simple function $g$ and a continuous function $h$, each vanishing outside a bounded interval, such that $\|f - g\|_1 < \varepsilon$ and $\|f - h\|_1 < \varepsilon$.

Proof. By considering positive and negative parts, we may assume that $f \ge 0$. By definition of $\int f\,d\lambda$, there exists $f_s \in S_+(\mathcal{M})$ with $f_s \le f$ such that
\[ \|f - f_s\|_1 = \int f\,d\lambda - \int f_s\,d\lambda < \varepsilon/4. \]
Let $f_s = \sum_{i=1}^{m} a_i 1_{A_i}$, where $a_i > 0$. Since
\[ \sum_{i=1}^{m} a_i \lambda(A_i) = \int f_s\,d\lambda \le \int f\,d\lambda < +\infty, \]
$\lambda(A_i) < +\infty$ for each $i$. Let $M = \max_i a_i$. By 10.1.6(d), there exists a bounded interval $I$ such that
\[ \lambda(A_i) - \lambda(I \cap A_i) < \varepsilon/(4Mm), \qquad i = 1, \ldots, m. \]
Set $B_i := A_i \cap I$ and $g := \sum_{i=1}^{m} a_i 1_{B_i}$. Then
\[ \|g - f_s\|_1 = \sum_{i=1}^{m} a_i \bigl( \lambda(A_i) - \lambda(B_i) \bigr) < \varepsilon/4, \]
hence
\[ \|f - g\|_1 \le \|f - f_s\|_1 + \|f_s - g\|_1 < \varepsilon/2. \]
To obtain $h$, for each $i$ choose a compact set $C_i$ and a bounded open set $U_i$ such that $C_i \subseteq B_i \subseteq U_i$ and $\lambda(U_i \setminus C_i) < \varepsilon/(4mM)$ (10.4.6). By Exercise 8.5.15, there exists a continuous function $h_i : \mathbb{R}^n \to [0, 1]$ such that $h_i = 1$ on $C_i$ and $h_i = 0$ on $U_i^c$. Since $h_i - 1_{B_i} = 0$ on $C_i \cup U_i^c = (U_i \setminus C_i)^c$,
\[ \|1_{B_i} - h_i\|_1 = \int_{U_i \setminus C_i} |1_{B_i} - h_i|\,d\lambda \le 2 \lambda(U_i \setminus C_i) < \varepsilon/(2mM). \]
The function $h := \sum_{i=1}^{m} a_i h_i$ is continuous, vanishes outside the bounded set $\bigcup_i U_i$, and by the triangle inequality $\|g - h\|_1 < \varepsilon/2$. Therefore, $\|f - h\|_1 < \varepsilon$, completing the proof.
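The last step of the proof can be illustrated numerically in one dimension. The sketch below is not from the text: it assumes $B = [0,1]$, $C = [\delta, 1-\delta]$, $U = (-\delta, 1+\delta)$, and takes a piecewise linear $h$ as one possible choice of the continuous function supplied by Exercise 8.5.15. The names `indicator_B`, `h`, and `l1_distance` are hypothetical, and the integral is only estimated by a midpoint sum.

```python
def indicator_B(x):
    """Indicator function of B = [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def h(x, delta):
    """Continuous and piecewise linear: 1 on C = [delta, 1 - delta],
    0 outside U = (-delta, 1 + delta). Assumes 0 < delta < 1/2."""
    if x <= -delta or x >= 1.0 + delta:
        return 0.0
    if x < delta:
        return (x + delta) / (2.0 * delta)
    if x > 1.0 - delta:
        return (1.0 + delta - x) / (2.0 * delta)
    return 1.0


def l1_distance(delta, n=100_000):
    """Midpoint-rule estimate of the integral of |1_B - h| over [-1, 2]."""
    a, b = -1.0, 2.0
    step = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * step
        total += abs(indicator_B(x) - h(x, delta))
    return total * step


for delta in (0.1, 0.01):
    # lambda(U \ C) = 4 * delta, so the bound from the proof is 8 * delta.
    print(delta, l1_distance(delta), 8 * delta)
```

For this particular $h$ the distance works out to $\delta$, comfortably inside the bound $2\lambda(U \setminus C) = 8\delta$ used in the proof, and it shrinks with $\delta$ as the theorem requires.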


Translation Invariance of the Integral

11.2.18 Theorem. If $f : \mathbb{R}^n \to \mathbb{R}$ is Lebesgue measurable and $y \in \mathbb{R}^n$, then
\[ \int f(x + y)\,dx = \int f(x)\,dx \tag{11.4} \]
in the sense that if one side is defined, then so is the other and the integrals are then equal.

Proof. If $E \in \mathcal{M}$, then $E - y \in \mathcal{M}$ and $\lambda(E - y) = \lambda(E)$ (Exercise 10.3.1), hence
\[ \int 1_E(y + x)\,dx = \int 1_{E - y}\,d\lambda = \lambda(E - y) = \int 1_E\,d\lambda. \]
Therefore, (11.4) holds for indicator functions.

For a function $h$, define $h_y(x) := h(y + x)$. Let $f \ge 0$ and let $g \in S_+(\mathcal{M})$ with $g \le f$. Then $g_y \le f_y$ and, by the first paragraph and linearity, $\int g = \int g_y$, hence $\int g \le \int f_y$. Taking the supremum over $g$ yields $\int f \le \int f_y$. Replacing $y$ by $-y$ and $f$ by $f_y$ in this inequality produces the reverse inequality. Therefore, (11.4) holds for $f \ge 0$. The general case follows from this and the identities $(f^{\pm})_y = (f_y)^{\pm}$.

Exercises

1. Let $f$ and $g$ be integrable. Prove:
(a) If $\int_E f\,d\lambda \le \int_E g\,d\lambda$ for all $E \in \mathcal{M}$, then $f \le g$ a.e.
(b) If $\int_E f\,d\lambda = \int_E g\,d\lambda$ for all $E \in \mathcal{M}$, then $f = g$ a.e.

2. Let $f(x) = 1 + r(\lfloor x^{-1} \rfloor)$ for $0 < x \le 1$, where $r(k)$ is the remainder on division of the positive integer $k$ by 3. (Cf. Exercise 10.5.15.) Show that
\[ \int_{(0,1]} f\,d\lambda = \frac{2}{3} + \sum_{k=1}^{\infty} \left[ \frac{1}{3k(3k+1)} + \frac{2}{(3k+1)(3k+2)} + \frac{3}{(3k+2)(3k+3)} \right]. \]

3.$^S$ Define $f : [0, 1] \to \mathbb{R}$ by $f(x) = 0$ if $x$ is rational, and $f(x) = d^2$ if $x$ is irrational, where $d$ is the first nonzero digit in the decimal expansion of $x$. (See Exercise 10.5.16.) Show that $\int_{[0,1]} f\,d\lambda = 95/3$.

4. (a) Prove the following mean value theorem for integrals: Let $f$ be continuous on a compact connected set $K \subseteq \mathbb{R}^n$. Then there exists $x_K \in K$ such that
\[ \int_K f\,d\lambda = f(x_K) \lambda(K). \]
(b) Let $f$ be continuous on $C_1(x_0)$. Prove that
\[ \lim_{r \to 0} \frac{1}{\lambda(C_r(x_0))} \int_{C_r(x_0)} f\,d\lambda = f(x_0). \]


5.$^S$ Let $f$ be Lebesgue measurable on $\mathbb{R}$ and let $m \le f \le M$ on $E \in \mathcal{M}(\mathbb{R})$.
(a) Prove that if $g$ is integrable on $E$, then there exists $a \in [m, M]$ such that
\[ \int_E f |g|\,d\lambda = a \int_E |g|\,d\lambda. \]
(b) Show that part (a) may be false if $|g|$ is replaced by $g$.
(c) Use (a) to show that at each point $x$ where $f$ is continuous,
\[ \lim_{y \to x} \frac{1}{y - x} \left[ \int_{[a,y]} f\,d\lambda - \int_{[a,x]} f\,d\lambda \right] = f(x). \]

6. (Cauchy–Schwarz inequality) Let $f$ and $g$ be Lebesgue measurable on $\mathbb{R}^n$. Prove that
\[ \left( \int |fg|\,d\lambda \right)^2 \le \int f^2\,d\lambda \cdot \int g^2\,d\lambda. \]
(See 5.7.19.)

7.$^S$ Prove that if $f$ is integrable on $[0, 1]$ and $\varepsilon > 0$, then there exists a polynomial $P$ on $[0, 1]$ such that $\int_0^1 |f - P|\,d\lambda < \varepsilon$.

8. (Absolute continuity of the integral). Let $f \ge 0$ be integrable on $\mathbb{R}^n$. Prove that for each $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[ \int_E f\,d\lambda < \varepsilon \quad\text{for all } E \in \mathcal{M}(\mathbb{R}^n) \text{ with } \lambda(E) < \delta. \]
Conclude that if $\{E_k\}$ is a sequence in $\mathcal{M}(\mathbb{R}^n)$ with $\lambda(E_k) \to 0$, then $\int_{E_k} f\,d\lambda \to 0$. Hint. Begin with simple functions.

9.$^S$ Let $f$ be integrable. Prove that $\lim_k \int_{[k,k+1]} f\,d\lambda = 0$. (A quick proof uses the dominated convergence theorem. For now, give a proof starting with simple functions.)

10. Suppose $f : I = [0, 1] \to [-1, 1]$ is integrable. Prove that
\[ \int_I f^2\,d\lambda \le \varepsilon^2 + \lambda\{x : |f(x)| > \varepsilon\} \quad\text{for every } \varepsilon > 0. \]

11.$^S$ Let $f : I = [0, 1] \to \mathbb{R}$ be integrable. Prove that
\[ \int_I f^2\,d\lambda \ge \varepsilon^2 \lambda\{x \in I : |f(x)| > \varepsilon\} \quad\text{for every } \varepsilon > 0. \]

12. Prove: If $f$ is integrable and $f < 1$ a.e. on $I$, then $\int_I f < 1$.


13. Let $f$ be Lebesgue integrable on $E \in \mathcal{M}$ with $0 < \lambda(E) < +\infty$ and $\int_E f\,d\lambda \ge \lambda(E)$. Prove that $\lambda\{x \in E : f(x) \ge 1\} > 0$.

14. Let $f$ be integrable on $\mathbb{R}^n$. Show that for each $r > 0$ the function $f_r(x) := f(rx)$ is integrable and $\int f_r\,d\lambda = r^{-n} \int f\,d\lambda$.

15.$^S$ Let $f : [0, 1] \to \mathbb{R}$ be a bounded Lebesgue measurable function such that $\int_{[0,1]} x^{2k} f(x)\,d\lambda(x) = 0$ for all $k \in \mathbb{Z}^+$. Prove that $f = 0$ a.e.

16. Let $f$ be Lebesgue integrable and $g$, $g'$ bounded and continuous on $\mathbb{R}$. Carry out the following steps to show that
\[ \lim_k \int f(x) g'(kx)\,d\lambda(x) = 0. \tag{11.5} \]
(a) Prove (11.5) for $f = 1_{[a,b]}$. (Use the fact that the Riemann and Lebesgue integrals of a continuous function on a closed bounded interval are equal (Section 11.4).)
(b) Use (a) to show that (11.5) holds for $f = 1_U$, where $U$ is bounded and open.
(c) Use (b) and 10.4.7 to show that (11.5) holds for $f = 1_E$, where $E \in \mathcal{M}$ is bounded.
(d) Use (c) and 11.2.17 to complete the proof.
If $g'(x) = \sin x$ or $\cos x$, then (11.5) is known as the Riemann–Lebesgue lemma.

11.3 Convergence Theorems

In this section we state and prove three pointwise convergence theorems for the Lebesgue integral. The first of these is a generalization of 11.2.10.

11.3.1 Monotone Convergence Theorem. Let $f_k : \mathbb{R}^n \to \overline{\mathbb{R}}$ be Lebesgue measurable with $f_k \uparrow f$ on $\mathbb{R}^n$ and let $\int f_1^-\,d\lambda < +\infty$. Then
\[ \int f\,d\lambda = \lim_k \int f_k\,d\lambda. \tag{11.6} \]

Proof. From $0 \le f^- \le f_k^- \le f_1^-$ we have $\int f_k^-\,d\lambda < +\infty$ and $\int f^-\,d\lambda < +\infty$, hence the integrals in the assertion of the theorem are defined. If $\int f_1^+\,d\lambda = +\infty$, then from $f_1^+ \le f_k^+ \le f^+$ we see that each side of (11.6) is $+\infty$. If $\int f_1^+\,d\lambda < +\infty$, then $f_1$ is integrable and we may apply 11.2.10 to $f_k - f_1$ ($\ge 0$) to obtain
\[ \int f_k\,d\lambda = \int (f_k - f_1)\,d\lambda + \int f_1\,d\lambda \to \int (f - f_1)\,d\lambda + \int f_1\,d\lambda = \int f\,d\lambda. \]


11.3.2 Remark. Equation (11.6) is still true if the inequalities $f_k \le f_{k+1} \le f$ and the convergence $f_k \uparrow f$ hold only almost everywhere. To see this, let $A$ denote the set on which $f_k \le f_{k+1}$ for all $k$ and $f_k \uparrow f$. Set $\tilde{f}_k = f_k 1_A$ and $\tilde{f} = f 1_A$. Then (11.6) holds for the new functions. Since $\lambda(A^c) = 0$, 11.2.5 shows that the equation holds for the original functions. Analogous remarks apply to the other convergence theorems in this section. ♦

11.3.3 Corollary. If $g_k$ is Lebesgue measurable and nonnegative for every $k$, then
\[ \int \sum_{k=1}^{\infty} g_k\,d\lambda = \sum_{k=1}^{\infty} \int g_k\,d\lambda. \]

Proof. Let $f_k = \sum_{j=1}^{k} g_j$ and $f = \sum_{j=1}^{\infty} g_j$. Then $0 \le f_k \uparrow f$ on $\mathbb{R}^n$, hence, by the theorem and linearity,
\[ \int f\,d\lambda = \lim_k \int f_k\,d\lambda = \lim_k \sum_{j=1}^{k} \int g_j\,d\lambda. \]

11.3.4 Corollary. Let $f \ge 0$ be Lebesgue measurable. Define a function $\mu$ on $\mathcal{M}(\mathbb{R}^n)$ by
\[ \mu(E) := \int_E f\,d\lambda, \qquad E \in \mathcal{M}(\mathbb{R}^n). \]
Then $\mu$ is a measure on $\mathcal{M}(\mathbb{R}^n)$.

Proof. For countable additivity, apply 11.3.3 to $g_k = f 1_{E_k}$, where the sets $E_k \in \mathcal{M}(\mathbb{R}^n)$ are pairwise disjoint.

11.3.5 Fatou's Lemma. If $f_k$ is nonnegative and Lebesgue measurable for every $k$, then
\[ \int \liminf_k f_k\,d\lambda \le \liminf_k \int f_k\,d\lambda. \tag{11.7} \]

Proof. Let $g_k = \inf_{j \ge k} f_j$ and $g = \liminf_k f_k$. Then $g_k \le f_k$, $g_k \uparrow g$, and $g_k$ and $g$ are Lebesgue measurable (10.5.3). By the monotone convergence theorem,
\[ \int \liminf_k f_k\,d\lambda = \int g\,d\lambda = \lim_k \int g_k\,d\lambda = \liminf_k \int g_k\,d\lambda \le \liminf_k \int f_k\,d\lambda. \]

The inequality in (11.7) may be strict. For example, if $f_k = k 1_{[0,1/k]}$, then the left side is zero while the right side is one.

11.3.6 Dominated Convergence Theorem. Let $g : \mathbb{R}^n \to [0, +\infty]$ be integrable and let $\{f_k : \mathbb{R}^n \to \mathbb{R}\}$ be a sequence of Lebesgue measurable functions such that $|f_k| \le g$ for all $k$. If $f_k \to f$ on $\mathbb{R}^n$, then $f$ is integrable and $\int f_k\,d\lambda \to \int f\,d\lambda$.


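Proof. Since $|f_k| \le g$ for all $k$ and $f_k \to f$, we also have $|f| \le g$, so $f_k$ and $f$ are integrable (11.2.12). Fatou's lemma applied to $g \pm f_k$ ($\ge 0$) shows that
\[ \int g\,d\lambda + \int f\,d\lambda \le \liminf_k \int (g + f_k)\,d\lambda = \int g\,d\lambda + \liminf_k \int f_k\,d\lambda \]
and
\[ \int g\,d\lambda - \int f\,d\lambda \le \liminf_k \int (g - f_k)\,d\lambda = \int g\,d\lambda - \limsup_k \int f_k\,d\lambda. \]
Subtracting $\int g\,d\lambda$ in each inequality yields
\[ \int f\,d\lambda \le \liminf_k \int f_k\,d\lambda \le \limsup_k \int f_k\,d\lambda \le \int f\,d\lambda. \]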
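The sequence $f_k = k 1_{[0,1/k]}$ mentioned after Fatou's lemma also shows why the dominating function in 11.3.6 cannot be dropped: $f_k \to 0$ on $(0,1]$, yet $\int f_k\,d\lambda = 1$ for every $k$. Any $g$ with $|f_k| \le g$ for all $k$ must satisfy $g(x) \ge \sup_k f_k(x) = \lfloor 1/x \rfloor$ on $(0,1]$, and this envelope is not integrable. The following sketch, with hypothetical function names and exact rational arithmetic, computes the integral of the envelope over $(1/(K+1), 1]$, which equals $\sum_{j=1}^{K} 1/(j+1)$ and grows without bound.

```python
from fractions import Fraction


def integral_f(k):
    """Integral of f_k = k * 1_[0, 1/k]: height k times length 1/k."""
    return Fraction(k) * Fraction(1, k)


def envelope_integral(K):
    """Integral over (1/(K+1), 1] of sup_k f_k(x) = floor(1/x).

    On (1/(j+1), 1/j] the envelope equals j, and that interval
    has length 1/j - 1/(j+1).
    """
    total = Fraction(0)
    for j in range(1, K + 1):
        total += j * (Fraction(1, j) - Fraction(1, j + 1))
    return total


for K in (10, 100, 1000):
    # The partial integrals behave like a harmonic sum and keep growing.
    print(K, float(envelope_integral(K)))
```

Since the partial integrals diverge, no integrable dominating function exists for this sequence, which is consistent with the failure of $\int f_k\,d\lambda \to \int \lim_k f_k\,d\lambda$.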

The following example illustrates that care must be taken when applying the dominated convergence theorem.

11.3.7 Example. Let $p > 0$ and define
\[ f_k(x) := \frac{k}{1 + k^2 x^{2p}}, \quad 0 < x \le 1, \qquad\text{and}\qquad I_k := \int_{(0,1]} f_k\,d\lambda, \quad k \in \mathbb{N}. \]
Clearly, $f_k \to 0$ for all $p > 0$. We show that $\lim_k I_k = 0$ iff $0 < p < 1$. By Section 11.4, below, the integrals are Riemann, hence, making the substitution $t = k x^p$ and setting $q = p^{-1} - 1$, we obtain
\[ I_k = \int_0^1 \frac{k}{1 + k^2 x^{2p}}\,dx = \frac{1}{p k^q} \int_0^k \frac{t^q}{1 + t^2}\,dt = \int g_k\,d\lambda, \qquad\text{where}\quad g_k(t) = \frac{1}{p k^q} \frac{t^q}{1 + t^2} 1_{[0,k]}(t). \]
If $p = 1$, then $q = 0$ and $I_k = \arctan k \to \pi/2$. If $0 < p < 1$, then $q > 0$, $g_k \to 0$, and $g_k(t) \le p^{-1} (1 + t^2)^{-1}$ for all $t \ge 0$ and all $k$, so $I_k \to 0$ by the dominated convergence theorem. Finally, if $p > 1$, then $-1 < q < 0$ and
\[ I_k \ge \frac{1}{p k^q} \int_0^1 \frac{1}{1 + t^2}\,dt \to +\infty. \] ♦

The following theorem gives general conditions under which one may "differentiate under the integral sign."

11.3.8 Theorem. Let $f(x, y)$ be Lebesgue measurable on $I := (a, b) \times (c, d)$ such that for each $y$ in $(c, d)$ the function $f(\cdot, y)$ is Lebesgue integrable on $(a, b)$ and the derivative $f_y$ exists on $I$. If there exists an integrable function $g$ on $(a, b)$ such that $|f_y(x, y)| \le g(x)$ for all $(x, y) \in I$, then
\[ \frac{d}{dy} \int_{(a,b)} f(x, y)\,d\lambda(x) = \int_{(a,b)} \frac{\partial f}{\partial y}(x, y)\,d\lambda(x). \]


Proof. We prove the right-hand derivative version. Let $y \in (c, d)$ and $y_k \downarrow y$. Set
\[ G(y) = \int_{(a,b)} f(x, y)\,d\lambda(x) \quad\text{and}\quad g_k(x) = \frac{f(x, y_k) - f(x, y)}{y_k - y}. \]
By the mean value theorem, $g_k(x) = f_y(x, t_k)$ for some $t_k \in (y, y_k)$, hence $|g_k| \le g$. Since $g_k(x) \to f_y(x, y)$, the dominated convergence theorem implies that
\[ \frac{G(y_k) - G(y)}{y_k - y} = \int_{(a,b)} g_k(x)\,d\lambda(x) \to \int_{(a,b)} f_y(x, y)\,d\lambda(x). \]
Since $\{y_k\}$ was arbitrary, $G_r'(y)$ exists and equals $\int_{(a,b)} f_y(x, y)\,d\lambda(x)$.
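As a numerical sanity check of 11.3.8, the sketch below uses an example that is not from the text: $f(x, y) = \sin(xy)$ on $(0,1) \times (0,2)$, for which $|f_y(x, y)| = |x \cos(xy)| \le x$, so $g(x) = x$ is an integrable dominating function. The integrals are only approximated by a midpoint rule, and the helper names are hypothetical. It compares the right-hand difference quotient of $G$ from the proof with the integral of $f_y$.

```python
import math


def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))


def G(y):
    """G(y) = integral over (0, 1) of f(x, y) = sin(x * y)."""
    return midpoint(lambda x: math.sin(x * y), 0.0, 1.0)


def integral_of_fy(y):
    """Integral over (0, 1) of f_y(x, y) = x * cos(x * y)."""
    return midpoint(lambda x: x * math.cos(x * y), 0.0, 1.0)


def difference_quotient(y, step=1e-5):
    """Right-hand difference quotient (G(y + step) - G(y)) / step."""
    return (G(y + step) - G(y)) / step


y = 1.0
print(difference_quotient(y), integral_of_fy(y))
```

Here $G(y) = (1 - \cos y)/y$ in closed form, so at $y = 1$ both printed numbers should be close to $\sin 1 + \cos 1 - 1$.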

Exercises 1.S Prove the following: Z k (a) lim sink x (1 − sin x) dλ = 0. k

(b) lim k

[0,π]

Z [0,+∞)

k sin x3/2 dλ(x) = 0. 1 + k 2 x2

2. Let $f : \mathbb{R}^n \to (0, +\infty)$ be integrable. Prove that
(a) $\int k \ln(1 + k^{-1} f)\,d\lambda \to \int f\,d\lambda$.
(b)$^S$ $\int k \ln(1 + k^{-2} f)\,d\lambda \to 0$.
(c)$^S$ $\int k \sin(k^{-1} f)\,d\lambda \to \int f\,d\lambda$.
(d) $\int_E f^{1/k}\,d\lambda \to \lambda(E)$, $E \in \mathcal{M}$.

3.$^S$ Let $f, g : \mathbb{R}^n \to (0, +\infty)$ be Lebesgue measurable with $g$ integrable. Prove:
\[ \int g (1 + k^{-1} f)^k \exp(-f)\,d\lambda \to \int g\,d\lambda. \]

4. Let $f : \mathbb{R}^n \to [1, +\infty)$ be Lebesgue measurable and $g : \mathbb{R}^n \to [0, +\infty)$ integrable. Prove that
\[ \int k^2 g \exp(-kf)\,d\lambda \to 0. \]

5.$^S$ Let $f$ be integrable on $(0, \infty)$. Show that for each $t \in \mathbb{R}$ the function $f(x) \sin(tx)/x$ is integrable on $(0, \infty)$ and prove that the integral $\int_{(0,\infty)} f(x) x^{-1} \sin(tx)\,d\lambda(x)$ is continuous in $t$.

6. Prove that the derivative of the gamma function (5.7.8) is
\[ \Gamma'(x) = \int_0^{\infty} t^{x-1} e^{-t} \ln t\,dt, \qquad x > 0. \]


(Use the fact, proved in the next section, that the improper Riemann integral and the Lebesgue integral of a nonnegative continuous function are equal.)

7. Let $f : [0, +\infty) \to \mathbb{R}$ be bounded and Lebesgue measurable and suppose that $\lim_{x \to +\infty} f(x) = r$. Show that
\[ \lim_k \int_{[0,a]} f(kx)\,d\lambda(x) = ar \quad\text{for every } a > 0. \]
Hint. Use Exercise 11.2.14.

8.$^S$ Let $f : \mathbb{R}^n \to \mathbb{R}$ be Lebesgue measurable and have countable range $\{a_1, a_2, \ldots\}$. Set $A_k = \{x \in \mathbb{R}^n : f(x) = a_k\}$. Prove that $f$ is integrable iff the series $\sum_{k=1}^{\infty} a_k \lambda(A_k)$ converges absolutely, in which case the value of the series is $\int f\,d\lambda$.

9. Let $p > 1$ and $f(x) := \lfloor x^{-1} \rfloor^{-p}$, $0 < x < 1$. Find $\int_{(0,1)} f\,d\lambda$.

10. Let $f_k$, $f$ be integrable and $E_k, E \in \mathcal{M}(\mathbb{R}^n)$ such that
\[ \lim_k \|f_k - f\|_1 = 0 \quad\text{and}\quad \lim_k \lambda(E_k \,\Delta\, E) = 0 \]
(see Exercise 10.5.10). Prove that
\[ \lim_k \int_{E_k} f_k\,d\lambda = \int_E f\,d\lambda. \]

11. Let $f : \mathbb{R}^n \to \mathbb{R}$ be integrable and $\varepsilon > 0$.
(a) Prove that the set $A = \{x : |f(x)| \ge \varepsilon\}$ has finite measure.
(b) Show that there exists $B \in \mathcal{M}$ with $\lambda(B) < +\infty$ such that
\[ \left| \int f\,d\lambda - \int_B f\,d\lambda \right| < \varepsilon. \]

12.$^S$ Let $\{f_k\}$ be a sequence of integrable functions on $\mathbb{R}^n$ such that $\sum_{k=1}^{\infty} \|f_k\|_1 < +\infty$. Prove that $\lim_k f_k(x) = 0$ a.e.

13. Let $f$ be Lebesgue integrable on $\mathbb{R}$ and $p > 0$. Prove that the series $\sum_{k=1}^{\infty} k^{-p} f(kx)$ converges absolutely a.e. on $\mathbb{R}$.

14. Let $T : C[a, b] \to C[a, b]$ be linear and continuous in the $L^1$ norm. If $f : [a, b] \times [c, d] \to \mathbb{R}$ is continuous, prove that
\[ T \int_c^d f(\cdot, x)\,dx = \int_c^d T f(\cdot, x)\,dx, \]
where the integrals may be taken to be Riemann.


15.$^S$ Prove the following extension of Fatou's lemma: If $f_k$, $g$ are Lebesgue integrable on $\mathbb{R}^n$ and $f_k \ge g$ for all $k$, then
\[ \int \liminf_k f_k\,d\lambda \le \liminf_k \int f_k\,d\lambda. \]

16. Let $g$ be integrable on $\mathbb{R}^n$ and let $\{f_k\}$ be a sequence of Lebesgue measurable functions on $\mathbb{R}^n$ such that $|f_k| \le g$. Show that
\[ \int \liminf_k f_k \le \liminf_k \int f_k \le \limsup_k \int f_k \le \int \limsup_k f_k. \]

17. Let $f$, $f_k$ be nonnegative Lebesgue integrable functions on $\mathbb{R}^n$ such that $f_k \to f$. Prove that
\[ \int f_k\,d\lambda \to \int f\,d\lambda \quad\text{iff}\quad \|f_k - f\|_1 \to 0. \]
Hint. For the necessity, note that $(f_k - f)^- \le f$.

18. Let $f$, $f_k$ be integrable on $\mathbb{R}^n$ with $f_k \to f$. Prove that
\[ \|f_k - f\|_1 \to 0 \quad\text{iff}\quad \int |f_k|\,d\lambda \to \int |f|\,d\lambda. \]
Hint. For the sufficiency use Fatou's lemma.

19.$^S$ Let $f$ be Lebesgue integrable on $\mathbb{R}$ such that $\int_{[a,b]} f\,d\lambda = 0$ for all intervals $[a, b]$. Prove that $f = 0$ a.e. Hint. Use 11.3.4, 10.4.4, and Exercise 11.2.8.

20. Let $f : \mathbb{R}^2 \to \mathbb{R}$ have the property that $f(x, y)$ is Lebesgue measurable in $y$ for each $x$ and continuous in $x$ for each $y$. Suppose there exists an integrable function $g : \mathbb{R} \to \mathbb{R}$ such that $|f(x, y)| \le g(y)$ for all $x$ and $y$. Prove that the function
\[ F(x) := \int f(x, y)\,d\lambda(y) \]
is continuous.

21. Let $f \ge 0$ be integrable on $[1, +\infty)$. Prove that $\sum_{k=1}^{\infty} f(x + k)$ is integrable on $[0, 1]$. Conclude that the series converges a.e. on $[1, +\infty)$.

22. Let $f$ be Lebesgue measurable on $I = [0, 1]$ and set $A_k = \{x \in I : |f(x)| \ge k\}$. Prove:
(a) $f$ is integrable on $I$ iff $\sum_{k=0}^{\infty} \lambda(A_k)$ converges.
(b) If $f$ is integrable on $I = [0, 1]$ then $\lim_k k \lambda(A_k) = 0$.

11.4 Connections with Riemann Integration

Throughout the section, $f$ denotes an arbitrary bounded real-valued function on a closed and bounded interval $[a, b]$.

In this section we show that $f$ is Riemann integrable if and only if its set of discontinuities has Lebesgue measure zero. The first step is to show that the upper and lower integrals of $f$ may be expressed as integrals of Borel measurable functions. By 5.2.1 there exists a sequence of partitions $\{P_k\}$ of $[a, b]$ such that $P_{k+1}$ is a refinement of $P_k$, $\|P_k\| \to 0$, and
\[ \underline{\int_a^b} f = \lim_k \underline{S}(f, P_k), \qquad \overline{\int_a^b} f = \lim_k \overline{S}(f, P_k). \]
Define
\[ h_k = \sum_j m_j 1_{[x_{j-1}, x_j]} \quad\text{and}\quad g_k = \sum_j M_j 1_{[x_{j-1}, x_j]}, \]
where
\[ m_j := \inf_{x_{j-1} \le x \le x_j} f(x), \qquad M_j := \sup_{x_{j-1} \le x \le x_j} f(x), \]
and the intervals $[x_{j-1}, x_j]$ are those generated by the partition $P_k$. Then $g_k$ and $h_k$ are Borel measurable simple functions and
\[ \underline{S}(f, P_k) = \int_{[a,b]} h_k\,d\lambda, \qquad \overline{S}(f, P_k) = \int_{[a,b]} g_k\,d\lambda. \]
Moreover, $h_1 \le h_2 \le \cdots \le f \le \cdots \le g_2 \le g_1$, hence $h(x) := \lim_k h_k(x)$ and $g(x) := \lim_k g_k(x)$ exist in $\mathbb{R}$ for each $x \in [a, b]$, $h \le f \le g$, and $h$ and $g$ are Borel measurable. If $M$ is a bound for $|f|$, then $|h_k|, |g_k| \le M$ a.e., hence by the dominated convergence theorem
\[ \underline{\int_a^b} f = \lim_k \underline{S}(f, P_k) = \lim_k \int_{[a,b]} h_k\,d\lambda = \int_{[a,b]} h\,d\lambda \tag{11.8} \]
and
\[ \overline{\int_a^b} f = \lim_k \overline{S}(f, P_k) = \lim_k \int_{[a,b]} g_k\,d\lambda = \int_{[a,b]} g\,d\lambda. \tag{11.9} \]

11.4.1 Lemma. $f \in \mathcal{R}_a^b$ iff $g = h$ a.e. In this case, $f$ is Lebesgue measurable and $\int_a^b f = \int_{[a,b]} f\,d\lambda$.

Proof. From (11.8) and (11.9), $f \in \mathcal{R}_a^b$ iff $\int_{[a,b]} (g - h)\,d\lambda = 0$, which, by 11.2.7, is equivalent to $g = h$ a.e. If this holds, then $h = f = g$ a.e., so $f$ is Lebesgue measurable and $\int_a^b f = \int_{[a,b]} f\,d\lambda$ by (11.8) and (11.9).

11.4.2 Lemma. Suppose that $x \in [a, b]$ is not a member of any of the partitions $P_k$. Then $f$ is continuous at $x$ iff $h(x) = g(x)$.

Proof. Suppose $f$ is continuous at $x$. Given $\varepsilon > 0$, choose $\delta > 0$ such that $y \in [a, b]$ and $|x - y| < \delta$ implies $|f(x) - f(y)| < \varepsilon$. Choose $N$ so that $\|P_k\| < \delta$ for all $k \ge N$ and fix $k \ge N$. Since $x$ is in some subinterval $(x_{j-1}, x_j)$ of $P_k$, $f(x) - \varepsilon < f(y) < f(x) + \varepsilon$ for all $y \in [x_{j-1}, x_j]$, hence
\[ f(x) - \varepsilon \le h_k(x) = m_j \le M_j = g_k(x) \le f(x) + \varepsilon. \]
Letting $k \to +\infty$ yields $f(x) - \varepsilon \le h(x) \le g(x) \le f(x) + \varepsilon$, and since $\varepsilon$ was arbitrary, $g(x) = h(x)$.

Conversely, let $g(x) = h(x)$. Given $\varepsilon > 0$, choose $k$ such that $|g_k(x) - g(x)| < \varepsilon$ and $|h_k(x) - h(x)| < \varepsilon$. Suppose that $x$ is in the open subinterval $(x_{i-1}, x_i)$ of $P_k$. Choose $\delta > 0$ so that $(x - \delta, x + \delta) \subseteq (x_{i-1}, x_i)$. Then for all $y \in (x - \delta, x + \delta)$,
\[ h(x) - \varepsilon < h_k(x) \le f(y) \le g_k(x) < g(x) + \varepsilon = h(x) + \varepsilon, \]
which implies that $|f(x) - f(y)| < 2\varepsilon$. Therefore, $f$ is continuous at $x$.

Here is the main result of the section.

11.4.3 Theorem. Let $f : [a, b] \to \mathbb{R}$ be bounded. Then $f \in \mathcal{R}_a^b$ iff the set $D$ of discontinuities of $f$ has Lebesgue measure zero. In this case, $f$ is Lebesgue measurable and
\[ \int_a^b f(x)\,dx = \int_{[a,b]} f\,d\lambda. \]

Proof. Let $A$ denote the union of the partitions $P_k$ and set $B = \{x : g(x) \ne h(x)\}$. By 11.4.2,
\[ B \cap A^c \subseteq D \subseteq A \cup B. \]
Since $A$ is countable, $\lambda(A) = 0$, hence $\lambda(B \cap A^c) = \lambda(A \cup B) = \lambda(B)$. It follows that $\lambda(B) = \lambda(D)$. Thus, by 11.4.1, $f \in \mathcal{R}_a^b$ iff $\lambda(D) = 0$.


11.4.4 Example. Let $A := (0, 1) \setminus E$, where $E$ is the Cantor ternary set (10.3.4). Since $A$ is open, the function $f(x) = 1_A(x) \sin(\pi x)$ is continuous on $A$, so its discontinuities in $[0, 1]$ lie in $E \cup \{0, 1\}$. Since $\lambda(E) = 0$, $f$ is both Riemann and Lebesgue integrable on $[0, 1]$ and
\[ \int_0^1 f(x)\,dx = \int_0^1 \sin(\pi x)\,dx = \frac{2}{\pi}. \] ♦

11.4.5 Remark. Theorem 11.4.3 readily extends to $n$-dimensional Riemann integrals; the statement and proof are essentially the same. Note that in this case, a Riemann integrable function $f$ may be discontinuous on $m$-dimensional hyperplanes, $m < n$, as these have Lebesgue measure zero (see 11.6.9). ♦

Here is the connection between improper integrals and Lebesgue integrals.

11.4.6 Corollary. Let $g$ be locally Riemann integrable on $[a, b)$ (where $b$ could be infinite). Then $g$ is Lebesgue measurable on $[a, b)$. Moreover:
(a) If $g \ge 0$, then $g$ is improperly integrable on $[a, b)$ iff $g$ is Lebesgue integrable on $[a, b)$, in which case
\[ \int_a^b g = \int_{[a,b)} g\,d\lambda. \tag{11.10} \]
(b) If $g$ is Lebesgue integrable on $[a, b)$, then $g$ is improperly integrable on $[a, b)$ and (11.10) holds.
(c) If $g$ is improperly integrable on $[a, b)$, then $g$ need not be Lebesgue integrable on $[a, b)$.

Proof. (a) Let $b_k \uparrow b$ and let $D$ denote the set of discontinuities of $g$ on $[a, b)$. Since $g$ is Riemann integrable on $[a, b_k]$, $\lambda([a, b_k] \cap D) = 0$. By the theorem, $1_{[a,b_k]} g$ is Lebesgue measurable for every $k$ and
\[ \int_a^{b_k} g = \int_{[a,b_k]} g\,d\lambda = \int 1_{[a,b_k]} g\,d\lambda. \]
Taking limits we see that $g$ is Lebesgue measurable and, by the monotone convergence theorem, (11.10) holds.
(b) If $g$ is Lebesgue integrable on $[a, b)$, then, by (a), $g^+$ and $g^-$ are improperly integrable on $[a, b)$, hence (b) holds.
(c) The function $g(x) = x^{-1} \sin x$ is improperly integrable but not absolutely improperly integrable on $[1, +\infty)$ (5.7.18). Since a Lebesgue integrable function is absolutely integrable, $g$ cannot be Lebesgue integrable on $[1, +\infty)$.
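Part (c) can be seen numerically. The sketch below (helper names hypothetical, midpoint-rule approximations only) compares $\int_1^b x^{-1} \sin x\,dx$ with $\int_1^b x^{-1} |\sin x|\,dx$ for increasing $b$: the first settles down while the second keeps growing, roughly like $(2/\pi) \ln b$.

```python
import math


def midpoint(func, a, b, step=0.005):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    n = max(1, round((b - a) / step))
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def signed_integral(b):
    """Approximates the integral of sin(x)/x over [1, b]."""
    return midpoint(lambda x: math.sin(x) / x, 1.0, b)


def absolute_integral(b):
    """Approximates the integral of |sin(x)|/x over [1, b]."""
    return midpoint(lambda x: abs(math.sin(x)) / x, 1.0, b)


for b in (50.0, 200.0, 800.0):
    print(b, signed_integral(b), absolute_integral(b))
```

This is only an illustration of the behavior established in 5.7.18; a finite computation cannot prove divergence.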


11.5 Iterated Integrals

For the remainder of the text we also use the notation $dx$ for $d\lambda(x)$ and $\int_a^b f(x)\,dx$ for $\int_{[a,b]} f(x)\,d\lambda(x)$, etc.

In this section we state and prove a result that gives general conditions under which the Lebesgue integral of a function on $\mathbb{R}^n$ may be expressed as an iterated integral, a useful tool for evaluating integrals.

11.5.1 Fubini–Tonelli Theorem. Let $f$ be Borel measurable on $\mathbb{R}^n$ and let $p, q \in \mathbb{N}$ with $p + q = n$.
(a) If $f \ge 0$, then the functions
\[ \int_{\mathbb{R}^q} f(x, z)\,dz \quad\text{and}\quad \int_{\mathbb{R}^p} f(z, y)\,dz \]
are Borel measurable in $x \in \mathbb{R}^p$ and $y \in \mathbb{R}^q$, respectively, and
\[ \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} f(x, z)\,dz\,dx = \int_{\mathbb{R}^q} \int_{\mathbb{R}^p} f(z, y)\,dz\,dy. \tag{11.11} \]
(b) If either of the iterated integrals
\[ \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} |f(x, z)|\,dz\,dx \quad\text{or}\quad \int_{\mathbb{R}^q} \int_{\mathbb{R}^p} |f(z, y)|\,dz\,dy \]
is finite, then both are finite, $f$ is integrable, and (11.11) holds.

By induction we have

11.5.2 Corollary. Let $f$ be Borel measurable on $\mathbb{R}^n$ such that
\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |f(x_1, \ldots, x_n)|\,dx_{i_1} \cdots dx_{i_n} < +\infty \]
for some permutation $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$. Then $f$ is integrable and
\[ \int_{\mathbb{R}^n} f\,d\lambda = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_{j_1} \cdots dx_{j_n} \]
for every permutation $(j_1, \ldots, j_n)$ of $(1, \ldots, n)$.

11.5.3 Example. We prove the Gaussian density formula
\[ \int_{-\infty}^{\infty} \varphi(t)\,dt = 1, \quad\text{where}\quad \varphi(t) := \frac{e^{-t^2/2}}{\sqrt{2\pi}}. \tag{11.12} \]


By 11.4.6, the integral may be interpreted either as a Lebesgue integral or as an improper Riemann integral. The function $\varphi$ is called the standard normal (or Gaussian) density. It plays an important role in probability and statistics. For example, $\sigma^{-1} \int_a^b \varphi\bigl((x - \mu)/\sigma\bigr)\,dx$ is the probability that randomly chosen data from a normally distributed population with mean $\mu$ and standard deviation $\sigma$ lies between $a$ and $b$, and $\sigma^{-1} \int_{-\infty}^{\infty} \varphi\bigl((x - \mu)/\sigma\bigr)\, x\,dx$ is the average of the data.

To verify (11.12) note that because the integrand is an even function, the left side is $2 \int_0^{\infty} \varphi(t)\,dt$. By a change of variable,
\[ 2 \int_0^{\infty} \varphi(t)\,dt = \frac{2}{\sqrt{2\pi}} \int_0^{\infty} e^{-t^2/2}\,dt = \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-t^2}\,dt. \]
Thus it suffices to show that
\[ \frac{2}{\sqrt{\pi}} \int_0^{\infty} e^{-t^2}\,dt = 1. \]
Let $I$ denote the integral on the left. Then
\begin{align*}
I^2 &= \int_0^{\infty} e^{-y^2} \int_0^{\infty} e^{-t^2}\,dt\,dy \\
&= \int_0^{\infty} e^{-y^2} \int_0^{\infty} y e^{-x^2 y^2}\,dx\,dy, && \text{by the substitution } t = xy \\
&= \int_0^{\infty} \int_0^{\infty} y e^{-y^2 (1 + x^2)}\,dy\,dx, && \text{by 11.5.1} \\
&= \frac{1}{2} \int_0^{\infty} (1 + x^2)^{-1} \int_0^{\infty} e^{-u}\,du\,dx, && \text{by the substitution } u = y^2 (1 + x^2) \\
&= \frac{1}{2} \arctan x \Big|_0^{\infty} && \text{because } \textstyle\int_0^{\infty} e^{-u}\,du = 1 \\
&= \frac{\pi}{4}.
\end{align*}
Hence $I = \sqrt{\pi}/2$, which gives the required identity. ♦

11.5.4 Example. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be Borel measurable and integrable. By Exercise 10.5.19, the function $F(x, y) := f(x - y) g(y)$ is Borel measurable in $(x, y)$. By the Fubini–Tonelli theorem and translation invariance of the integral,
\[ \int_{\mathbb{R}^n \times \mathbb{R}^n} |F(x, y)|\,d\lambda(x, y) = \int_{\mathbb{R}^n} |g(y)| \int_{\mathbb{R}^n} |f(x - y)|\,dx\,dy = \|g\|_1 \|f\|_1 < +\infty. \]
Therefore, $F$ is integrable, hence the function
\[ (f * g)(x) := \int_{\mathbb{R}^n} f(x - y) g(y)\,dy, \]
called the convolution of $f$ and $g$, is finite a.e. and integrable on $\mathbb{R}^n$. Convolutions are useful in calculating the probability distribution of a sum of independent random variables. ♦
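A concrete instance of 11.5.4, not taken from the text, is $f = g = 1_{[0,1]}$ on $\mathbb{R}$. Then $(f * g)(x)$ is the length of $[0,1] \cap [x-1, x]$, a tent function on $[0,2]$, and $\|f * g\|_1 = 1 = \|f\|_1 \|g\|_1$. The sketch below approximates both the convolution and its integral by midpoint sums; the function names are hypothetical.

```python
def indicator(t):
    """Indicator function of [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


def convolution(x, n=2000):
    """Midpoint estimate of (f * g)(x) for f = g = 1_[0,1].

    Since g vanishes off [0, 1], it suffices to integrate f(x - y) over y in [0, 1].
    """
    step = 1.0 / n
    return step * sum(indicator(x - (i + 0.5) * step) for i in range(n))


def tent(x):
    """Closed form: the length of the intersection of [0, 1] and [x - 1, x]."""
    return max(0.0, min(1.0, x) - max(0.0, x - 1.0))


def l1_norm_of_convolution(m=400, n=400):
    """Midpoint estimate of the integral of f * g over [0, 2]."""
    step = 2.0 / m
    return step * sum(convolution((j + 0.5) * step, n) for j in range(m))


for x in (0.25, 0.5, 1.0, 1.5):
    print(x, convolution(x), tent(x))
print(l1_norm_of_convolution())
```

Because both functions are nonnegative here, the bound $\|f * g\|_1 \le \|f\|_1 \|g\|_1$ implied by the displayed computation holds with equality.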


11.5.5 Example. (Volume of a simplex). Let $a > 0$ and let $e_j$, $1 \le j \le n$, be the standard basis in $\mathbb{R}^n$. Define the $n$-dimensional simplex in $\mathbb{R}^n$ by
\[ S(a, n) = \Bigl\{ x : \sum_{j=1}^{n} x_j \le a \text{ and } x_j \ge 0 \Bigr\}. \]

[FIGURE 11.2: Three-dimensional simplex, with vertices at distance $a$ along the $x_1$, $x_2$, and $x_3$ axes.]

We use the Fubini–Tonelli theorem and induction to show that
\[ \lambda_n\bigl(S(a, n)\bigr) = \frac{a^n}{n!}. \]
The formula holds for $n = 1$ since $S(a, 1) = [0, a]$. Assume the formula holds for $n - 1$ and all $a > 0$. Then
\begin{align*}
\lambda_n\bigl(S(a, n)\bigr) &= \int 1_{S(a,n)}(x_1, \ldots, x_n)\,d(x_1, \ldots, x_n) \\
&= \int_{[0,a]} \int 1_{S(a - x_n, n-1)}(x_1, \ldots, x_{n-1})\,d(x_1, \ldots, x_{n-1})\,dx_n \\
&= \frac{1}{(n-1)!} \int_{[0,a]} (a - x_n)^{n-1}\,dx_n.
\end{align*}
The last integral evaluates to $a^n/n$, completing the proof. ♦
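The formula $\lambda_n(S(a, n)) = a^n/n!$ can be checked crudely for $a = 1$ by counting grid cells. The sketch below is an illustration under stated assumptions, not part of the proof: it partitions $[0,1]^n$ into `grid**n` congruent cubes and reports the fraction whose midpoints lie in $S(1, n)$. The function name and the grid size are hypothetical choices, and the comparison is done in integer arithmetic to avoid rounding at the boundary.

```python
from itertools import product
from math import factorial


def simplex_volume_estimate(n, grid=60):
    """Fraction of cell midpoints of a grid**n partition of [0,1]^n lying in S(1, n)."""
    inside = 0
    for idx in product(range(grid), repeat=n):
        # Midpoint coordinates are (i + 1/2) / grid, so the condition
        # sum of coordinates <= 1 becomes 2 * sum(idx) + n <= 2 * grid.
        if 2 * sum(idx) + n <= 2 * grid:
            inside += 1
    return inside / grid ** n


for n in (1, 2, 3):
    print(n, simplex_volume_estimate(n), 1 / factorial(n))
```

The estimate converges only at rate $O(1/\text{grid})$, so agreement to two decimal places is all that should be expected at this resolution.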

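11.5.6 Example. Let $C_r^n(x)$ denote the closed ball in $\mathbb{R}^n$ with center $x$ and radius $r$. We show that $\lambda\bigl(C_r^n(x)\bigr) = r^n \alpha_n$, where
\[ \alpha_n = \begin{cases} \dfrac{(2\pi)^{n/2}}{n(n-2) \cdots 4 \cdot 2} & \text{if } n \text{ is even,} \\[2ex] \dfrac{2 (2\pi)^{(n-1)/2}}{n(n-2) \cdots 3 \cdot 1} & \text{if } n \text{ is odd.} \end{cases} \]
For ease of notation we write $C_r^n$ for $C_r^n(0)$ and denote by $1_r$ the indicator function of $C_r^n$. By the translation and dilation properties of Lebesgue measure, $\lambda\bigl(C_r^n(x)\bigr) = r^n \lambda\bigl(C_1^n\bigr)$, hence it suffices to establish the formula for $r = 1$ and $x = 0$.

If $n = 1$, then $C_1^n = [-1, 1]$ and $\alpha_n = 2$, so the formula holds in this case. By a simple integration, $\lambda\bigl(C_1^2\bigr) = \pi$, hence the formula holds for $n = 2$ as well. Now assume that $n > 2$. From
\begin{align*}
C_1^n &= \bigl\{ (x_1, \ldots, x_n) : x_1^2 + \cdots + x_n^2 \le 1 \bigr\} \\
&= \bigl\{ (x_1, \ldots, x_n) : x_3^2 + \cdots + x_n^2 \le 1 - x_1^2 - x_2^2,\ (x_1, x_2) \in C_1^2 \bigr\}
\end{align*}
we have
\[ 1_1(x_1, \ldots, x_n) = 1_{\sqrt{1 - x_1^2 - x_2^2}}(x_3, \ldots, x_n)\, 1_1(x_1, x_2), \]
hence, by the Fubini–Tonelli theorem,
\[ \lambda\bigl(C_1^n\bigr) = \int_{\mathbb{R}^2} 1_1(x_1, x_2) \int_{\mathbb{R}^{n-2}} 1_{\sqrt{1 - x_1^2 - x_2^2}}(x_3, \ldots, x_n)\,d\lambda(x_3, \ldots, x_n)\,dx_1\,dx_2. \]
The inner integral is
\[ \lambda\Bigl(C^{n-2}_{\sqrt{1 - x_1^2 - x_2^2}}\Bigr) = (1 - x_1^2 - x_2^2)^{(n-2)/2} \lambda\bigl(C_1^{n-2}\bigr), \]
hence, changing to polar coordinates,$^2$
\begin{align*}
\lambda\bigl(C_1^n\bigr) &= \lambda\bigl(C_1^{n-2}\bigr) \int_{x_1^2 + x_2^2 \le 1} (1 - x_1^2 - x_2^2)^{(n-2)/2}\,dx_1\,dx_2 \\
&= \lambda\bigl(C_1^{n-2}\bigr) \int_0^{2\pi} \int_0^1 (1 - r^2)^{(n-2)/2}\, r\,dr\,d\theta \\
&= \frac{2\pi}{n} \lambda\bigl(C_1^{n-2}\bigr).
\end{align*}
Iterating, we obtain
\[ \lambda\bigl(C_1^n\bigr) = \frac{2\pi}{n} \lambda\bigl(C_1^{n-2}\bigr) = \frac{(2\pi)^2}{n(n-2)} \lambda\bigl(C_1^{n-4}\bigr) = \cdots = \frac{(2\pi)^{m-1}}{n(n-2) \cdots (n - 2(m-2))} \lambda\bigl(C_1^{n - 2(m-1)}\bigr). \]
Thus
\[ \lambda\bigl(C_1^{2m}\bigr) = \frac{(2\pi)^{m-1}}{2m(2m-2) \cdots 4} \lambda\bigl(C_1^2\bigr) = \frac{(2\pi)^m}{2m(2m-2) \cdots 2} \]
and
\[ \lambda\bigl(C_1^{2m-1}\bigr) = \frac{(2\pi)^{m-1}}{(2m-1)(2m-3) \cdots 3} \lambda\bigl(C_1^1\bigr) = \frac{2 (2\pi)^{m-1}}{(2m-1)(2m-3) \cdots 3}. \] ♦

$^2$ The general change of variables theorem for Lebesgue integrals is proved in the next section.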
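The recursion $\lambda(C_1^n) = (2\pi/n)\,\lambda(C_1^{n-2})$ derived in 11.5.6 is easy to run. The sketch below evaluates it from the base cases $n = 1, 2$ and, as a cross-check that is not part of the text, compares the result with the standard closed form $\pi^{n/2}/\Gamma(n/2 + 1)$. The function names are hypothetical.

```python
import math


def unit_ball_volume(n):
    """Volume of the closed unit ball in R^n via the two-step recursion."""
    if n == 1:
        return 2.0          # C_1^1 = [-1, 1]
    if n == 2:
        return math.pi      # area of the unit disk
    return 2.0 * math.pi / n * unit_ball_volume(n - 2)


def gamma_formula(n):
    """Equivalent closed form pi^(n/2) / Gamma(n/2 + 1), used only as a cross-check."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


for n in range(1, 9):
    print(n, unit_ball_volume(n), gamma_formula(n))
```

For a ball of radius $r$, multiply by $r^n$, as in the dilation step at the start of the example.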


Proof of the Fubini–Tonelli theorem. We show first that part (b) of the theorem is a consequence of part (a). Indeed, if one of the iterated integrals in (b) is finite, then, by part (a) applied to $|f|$, so is the other and $f$ is integrable. Applying part (a) to $f^{\pm}$, we see that (11.11) holds.

Next, observe that if part (a) of the theorem holds for indicator functions then, by linearity of the integrals, it holds for nonnegative simple functions. By 10.5.8 and the monotone convergence theorem, (a) holds for all nonnegative Borel measurable functions.

It remains then to prove (a) for indicator functions. The proof consists of several lemmas, the first of which is a special case of a theorem due to E.B. Dynkin.

11.5.7 Lemma. Let $\mathcal{F}$ denote the intersection of all collections $\mathcal{G}$ of subsets of $\mathbb{R}^n$ with the following properties:
(a) If $A, B \in \mathcal{G}$ and $A \subseteq B$, then $B \setminus A \in \mathcal{G}$.
(b) If $A_k \in \mathcal{G}$ and $A_k \uparrow A$, then $A \in \mathcal{G}$.
(c) $\mathcal{G}$ contains every bounded interval.
Then $\mathcal{F}$ is a $\sigma$-field containing $\mathcal{B}(\mathbb{R}^n)$.

Proof. It is easy to see that $\mathcal{F}$ itself has properties (a)–(c). Moreover, from (b) and (c), $\mathcal{F}$ contains every interval. In particular, $\mathbb{R}^n \in \mathcal{F}$.

We show first that $\mathcal{F}$ is closed under finite intersections. To see this, fix $A \in \mathcal{F}$ and define
\[ \mathcal{F}_A := \{B \in \mathcal{F} : A \cap B \in \mathcal{F}\}. \]
One easily checks that $\mathcal{F}_A$ has properties (a) and (b). Furthermore, if $A$ is an interval, then $\mathcal{F}_A$ has property (c), so by minimality $\mathcal{F} \subseteq \mathcal{F}_A$. This shows that if $B \in \mathcal{F}$, then $A \cap B \in \mathcal{F}$ for all intervals $A$; in other words, $\mathcal{F}_B$ contains all intervals. Thus $\mathcal{F}_B$ has properties (a)–(c). By minimality, $\mathcal{F} \subseteq \mathcal{F}_B$, that is, $A, B \in \mathcal{F} \Rightarrow A \cap B \in \mathcal{F}$. By induction, $\mathcal{F}$ is closed under finite intersections.

Now observe that property (a), together with the fact that $\mathbb{R}^n \in \mathcal{F}$, implies that $\mathcal{F}$ is closed under complements. Thus if $\{E_k\}$ is a sequence in $\mathcal{F}$, then, by the result of the preceding paragraph,
\[ A_k := \bigcup_{j=1}^{k} E_j = \Bigl( \bigcap_{j=1}^{k} E_j^c \Bigr)^c \in \mathcal{F}. \]
By (b), $\bigcup_{k=1}^{\infty} E_k = \bigcup_{k=1}^{\infty} A_k \in \mathcal{F}$. This shows that $\mathcal{F}$ is a $\sigma$-field. Since $\mathcal{F}$ contains all intervals, it must contain $\mathcal{B}(\mathbb{R}^n)$.

11.5.8 Lemma. Let $p, q \in \mathbb{N}$ with $p + q = n$. If $A \in \mathcal{B}(\mathbb{R}^p)$ and $B \in \mathcal{B}(\mathbb{R}^q)$, then $A \times B \in \mathcal{B}(\mathbb{R}^n)$ and
\[ \lambda(A \times B) = \lambda(A) \lambda(B). \tag{11.13} \]


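Proof. For fixed bounded intervals $I \subseteq \mathbb{R}^p$ and $J \subseteq \mathbb{R}^q$, define
\[ \mathcal{G}_{I,J} = \bigl\{ B \in \mathcal{B}(\mathbb{R}^q) : I \times (B \cap J) \in \mathcal{B}(\mathbb{R}^n) \text{ and } \lambda\bigl(I \times (B \cap J)\bigr) = \lambda(I)\lambda(B \cap J) \bigr\}. \]
We show that $\mathcal{G}_{I,J}$ has properties (a)–(c) of 11.5.7. Clearly, (c) holds. If $B \in \mathcal{G}_{I,J}$, then
\[ I \times (B^c \cap J) = (I \times J) \setminus \bigl(I \times (B \cap J)\bigr) \in \mathcal{B}(\mathbb{R}^n) \]
and
\[ \lambda\bigl(I \times (B^c \cap J)\bigr) = \lambda(I \times J) - \lambda\bigl(I \times (B \cap J)\bigr) = \lambda(I)\bigl[\lambda(J) - \lambda(B \cap J)\bigr] = \lambda(I)\lambda(J \cap B^c), \]
hence $B^c \in \mathcal{G}_{I,J}$. Therefore, $\mathcal{G}_{I,J}$ is closed under complements. Now let $B_k \in \mathcal{G}_{I,J}$ and $B_k \uparrow B$. Then
\[ I \times (J \cap B) = \bigcup_{k=1}^{\infty} I \times (J \cap B_k) \in \mathcal{B}(\mathbb{R}^n) \]
and, by 10.1.6,
\[ \lambda\bigl(I \times (J \cap B)\bigr) = \lim_k \lambda\bigl(I \times (J \cap B_k)\bigr) = \lambda(I) \lim_k \lambda(J \cap B_k) = \lambda(I)\lambda(J \cap B), \]
which shows that $B \in \mathcal{G}_{I,J}$. Therefore, $\mathcal{G}_{I,J}$ has properties (a)–(c) of 11.5.7, so $\mathcal{B}(\mathbb{R}^q) = \mathcal{G}_{I,J}$.

We have shown that for all bounded intervals $I \subseteq \mathbb{R}^p$, $J \subseteq \mathbb{R}^q$ and all $B \in \mathcal{B}(\mathbb{R}^q)$, $I \times (B \cap J) \in \mathcal{B}(\mathbb{R}^n)$ and $\lambda\bigl(I \times (B \cap J)\bigr) = \lambda(I)\lambda(B \cap J)$. Taking a sequence of bounded intervals $J_k \uparrow \mathbb{R}^q$, we see that
\[ I \times B \in \mathcal{B}(\mathbb{R}^n) \quad\text{and}\quad \lambda(I \times B) = \lambda(I)\lambda(B). \tag{11.14} \]

Now fix $B \in \mathcal{B}(\mathbb{R}^q)$ and let $I \subseteq \mathbb{R}^p$ be a bounded interval. Define
\[ \mathcal{H}_{B,I} = \bigl\{ A \in \mathcal{B}(\mathbb{R}^p) : (A \cap I) \times B \in \mathcal{B}(\mathbb{R}^n) \text{ and } \lambda\bigl((A \cap I) \times B\bigr) = \lambda(A \cap I)\lambda(B) \bigr\}. \]
By (11.14), $\mathcal{H}_{B,I}$ contains all intervals. Arguing as above, we see that $\mathcal{H}_{B,I} = \mathcal{B}(\mathbb{R}^p)$. Thus for all $A \in \mathcal{B}(\mathbb{R}^p)$, $B \in \mathcal{B}(\mathbb{R}^q)$, and all bounded intervals $I \subseteq \mathbb{R}^p$, $(A \cap I) \times B \in \mathcal{B}(\mathbb{R}^n)$ and $\lambda\bigl((A \cap I) \times B\bigr) = \lambda(A \cap I)\lambda(B)$. Taking a sequence of bounded intervals $I_k \uparrow \mathbb{R}^p$ in the last equation yields (11.13).

The following lemma asserts that part (a) of the Fubini–Tonelli theorem holds for indicator functions of Borel sets and hence completes the proof of the theorem.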


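11.5.9 Lemma. Let $p, q \in \mathbb{N}$ with $p + q = n$ and let $C \in \mathcal{B}(\mathbb{R}^n)$. Then
\[ \int_{\mathbb{R}^q} 1_C(x, z)\,dz \quad\text{and}\quad \int_{\mathbb{R}^p} 1_C(z, y)\,dz \]
are Borel measurable functions of $x \in \mathbb{R}^p$ and $y \in \mathbb{R}^q$, respectively, and
\[ \lambda(C) = \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_C(x, z)\,dz\,dx = \int_{\mathbb{R}^q}\int_{\mathbb{R}^p} 1_C(z, y)\,dz\,dy. \]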

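Proof. Let $\mathcal{G}$ denote the collection of all $C \in \mathcal{B}(\mathbb{R}^n)$ for which the assertions of the lemma hold. We show that $\mathcal{G} = \mathcal{B}(\mathbb{R}^n)$. The first step is to show that $\mathcal{G}$ has properties (b) and (c) of 11.5.7.

For property (b), let $C_k \in \mathcal{G}$ and $C_k \uparrow C$. Then $1_{C_k}(x, z) \uparrow 1_C(x, z)$, hence, by the monotone convergence theorem,
\[ \int_{\mathbb{R}^q} 1_{C_k}(x, z)\,dz \uparrow \int_{\mathbb{R}^q} 1_C(x, z)\,dz, \quad x \in \mathbb{R}^p. \]
Thus $\int_{\mathbb{R}^q} 1_C(x, z)\,dz$ is Borel measurable in $x$. Applying the monotone convergence theorem again, we see that
\[ \lambda(C) = \lim_k \lambda(C_k) = \lim_k \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{C_k}(x, z)\,dz\,dx = \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_C(x, z)\,dz\,dx, \]
and similarly for the other iterated integral. Therefore, $\mathcal{G}$ has property (b).

For property (c), let $A \in \mathcal{B}(\mathbb{R}^p)$, $B \in \mathcal{B}(\mathbb{R}^q)$, and $C = A \times B$. Then
\[ \int_{\mathbb{R}^q} 1_C(x, z)\,dz = \int_{\mathbb{R}^q} 1_A(x) 1_B(z)\,dz = 1_A(x)\lambda(B), \]
which is Borel measurable in $x$ and, together with 11.5.8, implies that
\[ \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_C(x, z)\,dz\,dx = \lambda(A)\lambda(B) = \lambda(C). \]
Similar assertions hold for the other iterated integral. Thus, $\mathcal{G}$ contains Cartesian products of Borel sets and, in particular, all intervals.

Now let $I$ be a bounded interval in $\mathbb{R}^n$ and let $\mathcal{G}_I = \{B \in \mathcal{B}(\mathbb{R}^n) : B \cap I \in \mathcal{G}\}$. Since $\mathcal{G}$ has properties (b) and (c) of 11.5.7, so does $\mathcal{G}_I$. We claim that $\mathcal{G}_I$ also has property (a). To see this, let $C, D \in \mathcal{G}_I$ with $C \subseteq D$ and let $E = D \setminus C$. Since $1_{E \cap I} = 1_{D \cap I} - 1_{C \cap I}$,
\[ \int_{\mathbb{R}^q} 1_{E \cap I}(x, z)\,dz = \int_{\mathbb{R}^q} 1_{D \cap I}(x, z)\,dz - \int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz, \]
which, because $C \cap I$ and $D \cap I \in \mathcal{G}$, is Borel measurable in $x$ and implies that
\begin{align*}
\int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{E \cap I}(x, z)\,dz\,dx
&= \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{D \cap I}(x, z)\,dz\,dx - \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz\,dx \\
&= \lambda(D \cap I) - \lambda(C \cap I) = \lambda(E \cap I).
\end{align*}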


Here we have used the fact that, because $I$ is bounded, the calculations take place in $\mathbb{R}$, hence subtraction is legitimate. The other iterated integral is treated similarly. Therefore $E \in \mathcal{G}_I$, as required.

Since $\mathcal{G}_I$ has properties (a)–(c) of 11.5.7, $\mathcal{G}_I$ contains all Borel sets. This means that for any $C \in \mathcal{B}(\mathbb{R}^n)$ and bounded interval $I \subseteq \mathbb{R}^n$, the functions
\[ \int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz \quad\text{and}\quad \int_{\mathbb{R}^p} 1_{C \cap I}(z, y)\,dz \]
are Borel measurable in $x$ and $y$, respectively, and
\[ \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz\,dx = \lambda(C \cap I) = \int_{\mathbb{R}^q}\int_{\mathbb{R}^p} 1_{C \cap I}(z, y)\,dz\,dy. \]
Taking an increasing sequence of bounded intervals $I$ tending to $\mathbb{R}^n$ and using the monotone convergence theorem shows that $C \in \mathcal{G}$. Therefore, $\mathcal{G} = \mathcal{B}(\mathbb{R}^n)$, as required.
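The content of Lemma 11.5.9 can be illustrated numerically. The following is a minimal sketch, assuming $C$ is the closed unit disc in $\mathbb{R}^2$ (so $p = q = 1$): the inner integral of $1_C(x, \cdot)$ is the length of the vertical section of $C$ at $x$, and the outer integral of these section lengths should approximate $\lambda(C) = \pi$. The names `section_length` and `iterated_integral` and the use of a midpoint rule are choices made for this illustration only.

```python
import math


def section_length(x: float) -> float:
    """Length of the section {z : (x, z) in C} of the closed unit disc C."""
    return 2.0 * math.sqrt(1.0 - x * x) if abs(x) <= 1.0 else 0.0


def iterated_integral(num_cells: int = 100_000) -> float:
    """Midpoint-rule approximation of the outer integral over [-1, 1]."""
    h = 2.0 / num_cells
    return h * sum(section_length(-1.0 + (i + 0.5) * h) for i in range(num_cells))


# Compare the iterated integral with the measure of the disc.
print(iterated_integral(), math.pi)
```

The sections here are intervals, so their lengths are known in closed form; for a general Borel set only the measurability and the equality of the integrals are guaranteed by the lemma.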

Exercises

1.S Prove that $\lambda_n\{(x_1, \ldots, x_n) : x_j \in \mathbb{Q} \text{ for some } j\} = 0$.

2. Evaluate $\int_{[0,+\infty)^n} f$, where
(a)S $f(x) = x_1 \cdots x_n e^{-\|x\|^2}$.
(b) $f(x) = x_1 \cdots x_n (1 + \|x\|^2)^{-n-1}$.

3. (Cavalieri's principle). For $E \in \mathcal{M}(\mathbb{R}^n)$ and $t \in \mathbb{R}$, define
\[ E_t := \{x = (x_1, \ldots, x_{n-1}) \in \mathbb{R}^{n-1} : (x, t) \in E\}. \]
Suppose that $E_t \in \mathcal{M}(\mathbb{R}^{n-1})$ for all $t \in [a, b]$. Prove that
\[ \lambda_n\bigl[E \cap \bigl(\mathbb{R}^{n-1} \times [a, b]\bigr)\bigr] = \int_a^b \lambda_{n-1}(E_t)\,dt. \]
Thus the "volume" of the portion of $E$ between the hyperplanes $x_n = a$ and $x_n = b$ is the integral from $a$ to $b$ of the "cross-sectional areas" $\lambda_{n-1}(E_t)$.

4. Let $f$ and $g$ be Riemann integrable on $[0, 1]$. Prove that
\[ \int_0^1\int_0^x g(x - y) f(y)\,dy\,dx = \int_0^1\int_0^{1-y} g(x) f(y)\,dx\,dy = \int_0^1\int_0^{1-x} g(x) f(y)\,dy\,dx. \]

5.S Evaluate
\[ \int_{0 \le x \le x_1 \le \cdots \le x_m \le 1} x\,d\lambda(x, x_1, \ldots, x_m). \]


6. Show that
\[ \int_0^1\int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx = -\int_0^1\int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dx\,dy = \frac{\pi}{4}. \]
Why does this not contradict the Fubini–Tonelli theorem?

7.S Let $f$ be integrable on $(0, 1)$, $p > 0$, and define
\[ g(x) = \int_{[x^{1/p}, 1)} t^{-p} f(t)\,dt, \quad 0 < x < 1. \]
Prove that $g$ is integrable on $(0, 1)$ and that
\[ \int_{(0,1)} g\,d\lambda = \int_{(0,1)} f\,d\lambda. \]

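8. Let $f$ be continuous on $[-1, 1]$. Show that
(a) $\displaystyle \int_0^{2\pi}\int_0^1 f'(r\cos\theta)\, r\cos^2\theta\,dr\,d\theta = \int_0^{2\pi} f(\cos\theta)\cos\theta\,d\theta - \int_0^{2\pi}\int_0^{\cos\theta} f(x)\,dx\,d\theta.$
(b) $\displaystyle \int_0^{2\pi}\int_0^1 f'(r\cos\theta)\, r\sin^2\theta\,dr\,d\theta = \int_0^{2\pi}\int_0^{\cos\theta} f(x)\,dx\,d\theta.$
(c) $\displaystyle \int_0^{2\pi}\int_0^1 f'(r\cos\theta)\, r\,dr\,d\theta = \int_0^{2\pi} f(\cos\theta)\cos\theta\,d\theta.$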

9. Let $a, b > 0$. Use the Fubini–Tonelli theorem, the dominated convergence theorem, and the identity $1/x = \int_0^{\infty} e^{-xt}\,dt$, $x > 0$, to prove that
(a)S $\displaystyle \int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$
(b) $\displaystyle \int_0^{\infty} \frac{e^{-ax} - e^{-bx}}{x}\,dx = \ln b - \ln a.$

10. Show that $\displaystyle \varphi * \varphi(x) = \frac{1}{\sqrt{2}}\,\varphi\Bigl(\frac{x}{\sqrt{2}}\Bigr)$.

11. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be Borel measurable and integrable. Prove:
(a)S $f * g = g * f$.
(b) If $f$ and $g$ are continuous, then
\[ \frac{d}{dz}\iint_{A_z} f(x) g(y)\,dx\,dy = f * g(z), \quad\text{where } A_z = \{(x, y) : x + y \le z\}. \]

12. Let $f : [0, 1] \to (0, +\infty]$ be Lebesgue measurable. Use the Fubini–Tonelli theorem to prove that
\[ \int_{[0,1]} f\,d\lambda \int_{[0,1]} 1/f\,d\lambda \ge 1. \]
(A simpler but less interesting proof uses the Cauchy–Schwarz inequality.)


13. Let $f$ and $g$ be positive Lebesgue measurable functions on $[0, 1]$ such that $fg \ge 1$. Use the preceding exercise to prove that
\[ \int_{[0,1]} f\,d\lambda \int_{[0,1]} g\,d\lambda \ge 1. \]
(The Cauchy–Schwarz inequality may be used here as well.)

14.S Let $f$ and $g$ be Lebesgue integrable on $[a, b]$ and for $x \in [a, b]$ let
\[ F(x) = F(a) + \int_{[a,x]} f(t)\,d\lambda(t) \quad\text{and}\quad G(x) = G(a) + \int_{[a,x]} g(t)\,d\lambda(t), \]
where $F(a)$ and $G(a)$ are arbitrary. Prove that
\[ \int_{[a,b]} F(x) g(x)\,d\lambda(x) + \int_{[a,b]} G(x) f(x)\,d\lambda(x) = F(b)G(b) - F(a)G(a). \]

15. (a) Verify that the function
\[ \kappa(t, x) = \frac{1}{2\sqrt{\pi t}}\, e^{-x^2/4t} = \frac{1}{\sqrt{2t}}\,\varphi\Bigl(\frac{x}{\sqrt{2t}}\Bigr) \]
is a solution of the heat equation
\[ w_t(t, x) = w_{xx}(t, x), \quad x \in \mathbb{R},\ t > 0. \]
(b) Let $w_0(x)$ be integrable on $\mathbb{R}$ and define
\[ w(t, x) = \int_{-\infty}^{\infty} w_0(y)\,\kappa(t, x - y)\,dy, \]
the convolution of $\kappa$ with $w_0$. Show that $w(t, x)$ satisfies the heat equation.
(c) Verify that
\[ w(t, x) = \int_{-\infty}^{\infty} w_0\bigl(x + z\sqrt{2t}\bigr)\,\varphi(z)\,dz. \]
(d) Use (c) and the dominated convergence theorem to show that if $w_0$ is continuous and satisfies $|w_0(x)| \le a e^{b|x|}$ for some positive constants $a, b$ and for all $x$, then $\lim_{t \to 0^+} w(t, x) = w_0(x)$. Conclude that the solution $w(t, x)$ may be continuously extended to $[0, +\infty) \times \mathbb{R}$ and consequently satisfies the boundary condition $w(0, x) = w_0(x)$.

16. For a Borel measurable function $f : \mathbb{R} \to [0, +\infty)$, define
\[ A := \{(x, y) : 0 \le y \le f(x)\} \quad\text{and}\quad A_y := \{x : f(x) > y\}, \quad y \in \mathbb{R}. \]

Prove:
(a) $A \in \mathcal{B}(\mathbb{R}^2)$.
(b) The function $y \mapsto \lambda(A_y)$ is Borel measurable and
\[ \int f(x)\,d\lambda(x) = \int_{(0,+\infty)} \lambda(A_y)\,d\lambda(y) = \lambda(A). \]
(c) Part (b) holds if $A$ and $A_y$ are replaced, respectively, by $B = \{(x, y) : 0 \le y < f(x)\}$ and $B_y = \{x : f(x) \ge y\}$.
(d) $\lambda\{(x, y) : f(x) = y\} = 0$. (The graph of a Borel function has measure zero.)

11.6 Change of Variables

In Chapter 5 we proved that if $\varphi : [a, b] \to \mathbb{R}$ is continuously differentiable with everywhere nonzero derivative and if $f$ is Riemann integrable on $[c, d] := \varphi([a, b])$, then
\[ \int_c^d f(y)\,dy = \int_a^b f(\varphi(x))\,|\varphi'(x)|\,dx. \]
In this section we prove the following n-dimensional version of this result.

11.6.1 Change of Variables Theorem. Let $U$ and $V$ be open subsets of $\mathbb{R}^n$ and let $\varphi : U \to V$ be $C^1$ on $U$ with $C^1$ inverse $\varphi^{-1} : V \to U$. If $f$ is Lebesgue measurable on $V$ and either $f \ge 0$ or $f$ is integrable, then
\[ \int_V f(y)\,dy = \int_U (f \circ \varphi)(x)\,|J_\varphi(x)|\,dx, \tag{11.15} \]
where $J_\varphi$ is the Jacobian of $\varphi$ on $U$.

11.6.2 Example. Spherical coordinates $(r, \theta_1, \theta_2, \ldots, \theta_{n-1})$ in $\mathbb{R}^n$ are defined by the transformation formulas
\begin{align*}
x_1 &= r\cos\theta_1 \\
x_2 &= r\sin\theta_1\cos\theta_2 \\
x_3 &= r\sin\theta_1\sin\theta_2\cos\theta_3 \\
&\ \,\vdots \\
x_{n-1} &= r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1} \\
x_n &= r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1},
\end{align*}


where
\[ r > 0, \quad 0 < \theta_j < \pi,\ j = 1, \ldots, n - 2, \quad\text{and}\quad 0 < \theta_{n-1} < 2\pi. \]
Note that $\sin\theta_j > 0$ for $j \le n - 2$ and $\sum_{j=1}^{n} x_j^2 = r^2$. Let
\[ U := (0, +\infty) \times (0, \pi)^{n-2} \times (0, 2\pi) \quad\text{and}\quad V := \mathbb{R}^n \setminus \bigl(\mathbb{R}^{n-2} \times [0, +\infty) \times \{0\}\bigr) \]
and define $\varphi$ on $U$ by $\varphi(r, \theta_1, \ldots, \theta_{n-1}) = (x_1, \ldots, x_n)$, where the $x_j$ are as above. Clearly $U$ and $V$ are open and $\varphi$ is $C^\infty$ on $U$. We claim that $\varphi$ maps $U$ onto $V$ and has a $C^\infty$ inverse on $U$.

The inclusion $\varphi(U) \subseteq V$ is established as follows: If $(r, \theta_1, \ldots, \theta_{n-1}) \in U$ and $(x_1, \ldots, x_n) = \varphi(r, \theta_1, \ldots, \theta_{n-1}) \notin V$, then $x_{n-1} \ge 0$ and $x_n = 0$. But the latter implies that $\theta_{n-1} = \pi$, which gives the contradiction $x_{n-1} < 0$. For the reverse inclusion, we show that for each $(x_1, \ldots, x_n) \in V$ there exists a unique solution $(r, \theta_1, \theta_2, \ldots, \theta_{n-1})$ to the above system. Clearly, $r$ and $\theta_1$ have the unique solutions
\[ r = \Bigl(\sum_{j=1}^{n} x_j^2\Bigr)^{1/2} \quad\text{and}\quad \theta_1 = \arccos(x_1/r). \]
In particular, the system has a unique solution if $n = 2$. Now set $y_j = x_j/(r\sin\theta_1)$, $2 \le j \le n$. By induction, we may assume that the reduced system
\begin{align*}
y_2 &= \cos\theta_2 \\
y_3 &= \sin\theta_2\cos\theta_3 \\
&\ \,\vdots \\
y_{n-1} &= \sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1} \\
y_n &= \sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1}
\end{align*}
has a unique solution $(\theta_2, \ldots, \theta_{n-1})$. Then the original system has the unique solution $(r, \theta_1, \ldots, \theta_{n-1})$. Therefore, $\varphi$ is one-to-one and $\varphi(U) = V$. By standard properties of determinants and a reduction argument,
\[ J_\varphi(r, \theta_1, \theta_2, \ldots, \theta_{n-1}) = r^{n-1}\sin^{n-2}\theta_1\,\sin^{n-3}\theta_2\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}. \]
Since $J_\varphi > 0$ on $U$, the inverse function theorem implies that $\varphi$ has a global $C^\infty$ inverse on $U$. Hence, by the change of variables theorem, if $f$ is Lebesgue measurable on $\mathbb{R}^n$ and either $f \ge 0$ or $f$ is integrable, then
\[ \int_V f\,d\lambda = \int_U (f \circ \varphi)\,J_\varphi\,d\lambda. \]


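Since $V$ differs from $\mathbb{R}^n$ by a set of measure zero, we may write the last equation as
\begin{align}
&\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1\cdots dx_n \tag{11.16} \\
&\quad= \int_0^{\infty}\int_0^{\pi}\cdots\int_0^{\pi}\int_0^{2\pi} f\bigl(r\cos\theta_1, r\sin\theta_1\cos\theta_2, \ldots, r\sin\theta_1\cdots\sin\theta_{n-1}\bigr) \notag \\
&\qquad\qquad \cdot\, r^{n-1}\sin^{n-2}\theta_1\,\sin^{n-3}\theta_2\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-1}\,d\theta_{n-2}\cdots d\theta_1\,dr. \notag
\end{align}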
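As a sanity check of (11.16), the two sides can be compared numerically in a simple case. The sketch below assumes $n = 3$ and $f(x) = e^{-\|x\|^2}$, so that the left side factors into three one-dimensional Gaussian integrals and the right side factors into a radial integral of $r^2 e^{-r^2}$, a polar integral of $\sin\theta_1$, and the factor $2\pi$ from $\theta_2$. The helper `midpoint`, the truncation radius `R`, and the choice of $f$ are assumptions of this illustration.

```python
import math


def midpoint(f, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


R = 10.0  # truncation radius; exp(-R**2) is negligible

# Left side of (11.16): product of three 1-D Gaussian integrals.
one_dim = midpoint(lambda t: math.exp(-t * t), -R, R)
left = one_dim ** 3

# Right side of (11.16): radial part, polar angle theta_1, azimuth theta_2.
radial = midpoint(lambda r: r * r * math.exp(-r * r), 0.0, R)
polar = midpoint(math.sin, 0.0, math.pi)
right = radial * polar * 2.0 * math.pi

print(left, right)
```

Both numbers should agree up to quadrature error, since the Jacobian for $n = 3$ is $r^2 \sin\theta_1$.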

In particular, taking $f$ to be the indicator function of $C_1^n(0)$ and using 11.5.6, we see that the left side of (11.16) is $\lambda\bigl(C_1^n(0)\bigr) = \alpha_n$ and the right side is
\begin{align*}
&\int_0^1\int_0^{\pi}\cdots\int_0^{\pi}\int_0^{2\pi} r^{n-1}\sin^{n-2}\theta_1\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-1}\,d\theta_{n-2}\cdots d\theta_1\,dr \\
&\qquad= \frac{2\pi}{n}\int_0^{\pi}\cdots\int_0^{\pi} \sin^{n-2}\theta_1\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-2}\cdots d\theta_1.
\end{align*}
In particular,
\[ \int_0^{\pi}\cdots\int_0^{\pi} \sin^{n-2}\theta_1\,\sin^{n-3}\theta_2\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-2}\cdots d\theta_1 = \frac{n\alpha_n}{2\pi}. \qquad ♦ \]

Proof of the change of variables theorem. Before we begin the proof proper, we make some reductions. First, by considering $f^+$ and $f^-$, we need only prove the case $f \ge 0$. Second, since a Lebesgue measurable function is equal a.e. to a Borel measurable function, we may assume that $f$ is Borel measurable. Note that in this case $f \circ \varphi$ is also Borel measurable. To prove (11.15) it then suffices to verify that
\[ \int_V f\,d\lambda \le \int_U (f \circ \varphi)\,|J_\varphi|\,d\lambda \tag{11.17} \]

for all Borel measurable functions $f : V \to [0, +\infty]$. Indeed, if (11.17) holds for all $f$ and $\varphi$, then, switching the roles of $U$ and $V$, it must also be the case that
\[ \int_U g\,d\lambda \le \int_V (g \circ \varphi^{-1})\,|J_{\varphi^{-1}}|\,d\lambda \]
for all Borel measurable $g : U \to [0, +\infty]$. Taking $g = (f \circ \varphi)|J_\varphi|$ and recalling that $J_\varphi J_{\varphi^{-1}} = 1$, we obtain the reverse of inequality (11.17). Finally, by considering simple functions and using linearity, 10.5.8, and the monotone convergence theorem, it suffices to prove (11.17) for indicator functions $f = 1_B$, where $B \in \mathcal{B}(\mathbb{R}^n)$ and $B \subseteq V$. Equation (11.17) then reduces to
\[ \lambda(B) \le \int_{\varphi^{-1}(B)} |J_\varphi|\,d\lambda, \quad B \subseteq V,\ B \in \mathcal{B}(\mathbb{R}^n), \]
or, equivalently (taking $B = \varphi(E)$),
\[ \lambda\bigl(\varphi(E)\bigr) \le \int_E |J_\varphi|\,d\lambda, \quad E \subseteq U,\ E \in \mathcal{B}(\mathbb{R}^n). \tag{11.18} \]

The proof of (11.18) is a sequence of lemmas, the first of which treats the case of a linear change of variable. 11.6.3 Lemma. If T ∈ L(Rn , Rn ) is nonsingular, then λ(T (E)) = | det T |λ(E), E ∈ B(Rn ).

(11.19)

Proof. Since T is nonsingular, T (E) ∈ B(Rn ) so the left side of (11.19) is defined. Furthermore, if (11.19) holds for T1 and T2 , then it holds for T1 T2 : λ T1 T2 (E) = | det T1 |λ T2 (E) = | det T1 | | det T2 |λ(E) = | det(T1 T2 )|λ(E). Now observe that a nonsingular linear transformation T may be expressed as a product of elementary linear transformations, that is, linear transformations whose matrices are obtained from the identity matrix by one of the following operations: (a) Interchange of two rows. (b) Multiplication of a row by a nonzero constant. (c) Addition of one row to another. This is simply the assertion that a matrix may be put into reduced row echelon form by a sequence of elementary row operations. (See Appendix B.) We claim that (11.19) holds for elementary linear transformations T and bounded intervals E = I1 × . . . × In . In case (a), det T = −1 and T (E) is the interval obtained from E by interchanging a pair of intervals Ii and Ij , hence (11.19) holds in this case. In (b), T (E) is the interval obtained from E by multiplying one of the coordinate intervals by a nonzero constant a, hence λ(T (E)) = |a|λ(E). Since | det T | = |a|, (11.19) holds in this case as well. For case (c), assume for definiteness that the matrix of T is the result of adding row two of the identity matrix to row one, so T (x1 , x2 , x3 , . . . , xn ) = (x1 + x2 , x2 , x3 , . . . , xn ). Then det T = 1 and λ T (E) =

$\displaystyle \int 1_{T(E)}(x)\,dx = \int 1_E(x_1 - x_2, x_2, \ldots, x_n)\,dx.$

By the Fubini–Tonelli theorem and translation invariance, the last integral evaluates to
\begin{align*}
&\int\!\!\int\cdots\int 1_{I_1}(x_1 - x_2)\,1_{I_2}(x_2)\cdots 1_{I_n}(x_n)\,dx_n\cdots dx_2\,dx_1 \\
&\qquad= |I_n|\cdots|I_3| \int 1_{I_2}(x_2) \int 1_{I_1}(x_1 - x_2)\,dx_1\,dx_2 \\
&\qquad= |I_n|\cdots|I_3|\,|I_2|\,|I_1| = \lambda(E).
\end{align*}

Therefore, (c) holds. It now follows that (11.19) holds for all nonsingular T and all intervals E. To verify (11.19) for all Borel sets E, we use 11.5.7. For a fixed bounded interval I, let GI denote the collection of all E ∈ B(Rn ) for which λ(T (E ∩ I)) = | det T |λ(E ∩ I).

(11.20)

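By the first part of the proof, $\mathcal{G}_I$ contains all intervals. Let $A, B \in \mathcal{G}_I$ with $A \subseteq B$, and set $C = A \cap I$ and $D = B \cap I$. Then $(B \setminus A) \cap I = D \setminus C$ and
\[ \lambda\bigl(T(D \setminus C)\bigr) = \lambda\bigl(T(D)\bigr) - \lambda\bigl(T(C)\bigr) = |\det T|\,\bigl[\lambda(D) - \lambda(C)\bigr] = |\det T|\,\lambda(D \setminus C), \]
hence $B \setminus A \in \mathcal{G}_I$. (The operation of subtraction is legitimate because $C$ and $D$ are bounded.) Now let $A_k \in \mathcal{G}_I$, $A_k \uparrow A$. Letting $k \to +\infty$ in $\lambda(T(A_k \cap I)) = |\det T|\,\lambda(A_k \cap I)$ shows that $A \in \mathcal{G}_I$. Therefore, $\mathcal{G}_I$ satisfies (a)–(c) of 11.5.7, hence (11.20) holds for every $E \in \mathcal{B}(\mathbb{R}^n)$. Taking a sequence of bounded intervals $I_k \uparrow \mathbb{R}^n$ in (11.20) yields (11.19).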
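The identity $\lambda(T(E)) = |\det T|\,\lambda(E)$ is easy to check by hand in the plane. The following sketch assumes $n = 2$ and $E = [0, 1]^2$: the image $T(E)$ is a parallelogram whose area is computed with the shoelace formula and compared with $|\det T|$ for the three elementary types used in the proof. The names `apply`, `det`, `polygon_area`, and `image_area`, and the sample matrices, are choices made for this illustration.

```python
def apply(T, v):
    """Apply the 2x2 matrix T (given as a pair of rows) to the vector v."""
    (a, b), (c, d) = T
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])


def det(T):
    """Determinant of a 2x2 matrix given as a pair of rows."""
    (a, b), (c, d) = T
    return a * d - b * c


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def image_area(T):
    """Area of the image of the unit square under T."""
    return polygon_area([apply(T, v) for v in UNIT_SQUARE])


SWAP = ((0.0, 1.0), (1.0, 0.0))    # type (a): interchange of two rows
SCALE = ((3.0, 0.0), (0.0, 1.0))   # type (b): multiply a row by 3
SHEAR = ((1.0, 1.0), (0.0, 1.0))   # type (c): add row two to row one

for T in (SWAP, SCALE, SHEAR, ((2.0, 1.0), (1.0, 3.0))):
    print(image_area(T), abs(det(T)))
```

Each printed pair should coincide; the last matrix is not elementary and illustrates the general nonsingular case.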

FIGURE 11.3: Concentric cube $Q_r(y)$ and ball $B_{r\sqrt{n}/2}(y)$.

For the remaining lemmas, the following terminology and notation will be useful. The cube with center $y \in \mathbb{R}^n$ and edge $r > 0$ is the semi-closed interval
\[ Q = Q_r(y) := \{x \in \mathbb{R}^n : y_j - r/2 \le x_j < y_j + r/2,\ j = 1, \ldots, n\}. \]
Note that $|Q| = r^n$ and the diameter of $Q$ is $r\sqrt{n}$. Thus

Br/2 (y) ⊆ Qr (y) ⊆ Br√n/2 (y). A paving of a subset A of Rn is a finite collection Qr of pairwise disjoint cubes with edge r that covers A. Two pavings Qr = {Qr (xj ) : 1 ≤ j ≤ m} and Qs = {Qs (xj ) : 1 ≤ j ≤ m} with the same centers are said to be concentric. Any bounded set A has a paving Qr with arbitrarily small r. Indeed, if A ⊆ [a, b)n , one need only subdivide [a, b) into subintervals of size (b − a)/k for sufficiently large k and form Cartesian products of these subintervals. 11.6.4 Lemma. Let K ⊆ U be compact. (a) For each sufficiently small δ > 0, there exists a compact set Kδ with K ⊆ Kδ ⊆ U . (b) For each r < δ, there exists a paving Qr of K contained in Kδ . Proof. For subsets A, B ⊆ Rn , denote by d(A, B) the distance between A and B: d(A, B) = inf {ka − bk : a ∈ A, b ∈ B} . √ Since K is compact and U c is closed, δ0 := d(U c , K) > 0. For 0 < δ < δ0 / n, let √ Kδ = x : d(x, K) ≤ δ n . Then Kδ is compact and K ⊆ Kδ ⊆ U . Let Q be a cube with edge r. If

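$x \in Q \cap K$ and $y \in Q \cap K_\delta^c$, then
\[ \delta\sqrt{n} < d(y, K) \le \|x - y\| \le r\sqrt{n}. \]
Therefore, if $r < \delta$ and $Q \cap K \ne \emptyset$, then $Q \cap K_\delta^c = \emptyset$, that is, $Q \subseteq K_\delta$. Since $K$ is bounded, there exists a paving $\mathcal{Q}_r$ of $K$. Removing those members of $\mathcal{Q}_r$ that do not meet $K$ produces a paving of $K$ contained in $K_\delta$.

FIGURE 11.4: The paving $\mathcal{Q}_r$.

11.6.5 Corollary. Let $\psi : U \to \mathbb{R}^n$ be $C^1$ on $U$ and let $E \subseteq U$ with $\lambda(E) = 0$. Then $\lambda\bigl(\psi(E)\bigr) = 0$.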


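Proof. Suppose first that $E$ is bounded. Let $V \supseteq E$ be open with compact closure contained in $U$ and set
\[ c := \sup_{z \in \mathrm{cl}(V)} \|\psi'(z)\|. \]
By continuity of $\psi'$ and compactness of $\mathrm{cl}(V)$, $c < +\infty$. Given $\varepsilon > 0$, let $W \supseteq E$ be open with compact closure $K = \mathrm{cl}(W) \subseteq V$ such that $\lambda(K) < \varepsilon/2$. This is possible by 10.4.4, since $\lambda(E) = 0$. Now let $K_\delta$ be as in 11.6.4. Since $K_\delta \downarrow K$ as $\delta \downarrow 0$, we may take $\delta$ sufficiently small so that $\lambda(K_\delta) < \varepsilon$. According to the lemma, we may choose a paving $\mathcal{Q}_r = \{Q_1, \ldots, Q_k\}$ of $K$ contained in $K_\delta$ with $r < \varepsilon$. It follows that
\[ k r^n = \sum_{j=1}^{k} \lambda(Q_j) = \lambda\Bigl(\bigcup_{j=1}^{k} Q_j\Bigr) < \varepsilon. \tag{11.21} \]
Let $x_j$ denote the center of $Q_j$. Since $Q_j$ is convex, 9.3.6 implies that
\[ \|\psi(x) - \psi(x_j)\| \le c\,\|x - x_j\| \le c r\sqrt{n}, \quad x \in Q_j. \]
Therefore,
\[ \psi(Q_j) \subseteq B_{cr\sqrt{n}}\bigl(\psi(x_j)\bigr) \subseteq Q_{2cr\sqrt{n}}\bigl(\psi(x_j)\bigr) \]
and so
\[ \lambda\bigl(\psi(Q_j)\bigr) \le (2cr\sqrt{n})^n. \]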

Since the sets $\psi(Q_j)$ cover $\psi(K)$,
\[ \lambda\bigl(\psi(E)\bigr) \le \lambda\bigl(\psi(K)\bigr) \le k(2cr\sqrt{n})^n \le (2c\sqrt{n})^n\,\varepsilon, \]
the last inequality by (11.21). Since $\varepsilon$ was arbitrary, $\lambda\bigl(\psi(E)\bigr) = 0$. This proves the assertion of the corollary for bounded $E$. In the unbounded case, take a sequence of bounded Borel sets $E_k \uparrow E$.

11.6.6 Lemma. Let $\psi$ be $C^1$ on $U$, $Q$ a cube contained in $U$, and let $I_n$ denote the identity transformation on $\mathbb{R}^n$. If $\|d\psi_x - I_n\| \le c$ for all $x \in Q$, then
\[ \lambda\bigl(\psi(Q)\bigr) \le [(1 + c)n]^n\,\lambda(Q). \]

Proof. Let $\tilde\psi(x) = \psi(x) - x$. Then $d\tilde\psi_x = d\psi_x - I_n$. By 9.3.6,
\[ \|\tilde\psi(x) - \tilde\psi(y)\| \le c\,\|x - y\| \quad\text{for all } x, y \in Q. \]
Thus, if $Q$ has center $x_0$ and edge $r$, then for all $x \in Q$
\[ \|\psi(x) - \psi(x_0)\| \le \|\tilde\psi(x) - \tilde\psi(x_0)\| + \|x - x_0\| \le (c + 1)\|x - x_0\| \le (c + 1)\sqrt{n}\,r/2, \]
that is, $\psi(Q)$ is contained in the closed ball $C$ with center $\psi(x_0)$ and radius $\sqrt{n}(c + 1)r/2$. Since $C$ is contained in the cube with center $\psi(x_0)$ and edge $(c + 1)nr$,
\[ \lambda\bigl(\psi(Q)\bigr) \le [(c + 1)nr]^n = [(c + 1)n]^n\,\lambda(Q). \]


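11.6.7 Lemma. Let $\psi : U \to \mathbb{R}^n$ be $C^1$ on $U$ and let $K \subseteq U$ be compact. Then, for each $\varepsilon > 0$, there exists $\delta > 0$, a compact set $K_\delta$ with $K \subseteq K_\delta \subseteq U$, and concentric pavings $\mathcal{Q}_r$, $\mathcal{Q}_{nr}$ of $K$ contained in $K_\delta$ with arbitrarily small $r$ such that for any $Q_r(y) \in \mathcal{Q}_r$,
\[ \lambda\bigl(\varphi(Q_r(y))\bigr) \le (1 + \varepsilon)^n\,|J_\varphi(y)|\,\lambda\bigl(Q_{nr}(y)\bigr). \tag{11.22} \]
Moreover, $\delta$ may be chosen so that
\[ \int_{K_\delta} |J_\varphi(x)|\,dx < \int_K |J_\varphi(x)|\,dx + \varepsilon. \tag{11.23} \]

Proof. Let $M = \sup\bigl\{\|(d\varphi_y)^{-1}\| : y \in K_\delta\bigr\}$, where $K_\delta$ is chosen as in 11.6.4. For $x, y \in U$ define
\[ \psi^y(x) = (d\varphi_y)^{-1}\bigl(\varphi(x) - \varphi(y)\bigr) = (d\varphi_y)^{-1}\bigl(\varphi(x)\bigr) - (d\varphi_y)^{-1}\bigl(\varphi(y)\bigr). \]
Since $(d\varphi_y)^{-1}$ is linear, by the chain rule
\[ d(\psi^y)_x = (d\varphi_y)^{-1} \circ d\varphi_x. \]
Thus for all $x \in U$, $y \in K_\delta$, and $z \in \mathbb{R}^n$,
\[ \|d(\psi^y)_x(z) - z\| = \bigl\|(d\varphi_y)^{-1}\bigl(d\varphi_x(z) - d\varphi_y(z)\bigr)\bigr\| \le M\,\|d\varphi_x - d\varphi_y\|\,\|z\|. \]
Therefore, by definition of the operator norm,
\[ \|d(\psi^y)_x - I_n\| \le M\,\|d\varphi_x - d\varphi_y\|. \tag{11.24} \]
Now, by the uniform continuity of $d\varphi$ on $K_\delta$ there exists $0 < \delta_1 < \delta$ such that $\|d\varphi_x - d\varphi_y\| < \varepsilon/M$ for all $x, y \in K_\delta$ with $\|x - y\| < \delta_1\sqrt{n}$. Let $r < \delta_1/n$ and let $\mathcal{Q}_r$ and $\mathcal{Q}_{nr}$ be concentric pavings of $K$ contained in $K_\delta$. If $x \in Q := Q_r(y) \in \mathcal{Q}_r$, then $\|x - y\| < r\sqrt{n} < \delta_1\sqrt{n}$, hence, from (11.24), $\|d(\psi^y)_x - I_n\| < \varepsilon$. By 11.6.6,
\[ \lambda\bigl(\psi^y(Q)\bigr) \le [(1 + \varepsilon)n]^n\,\lambda(Q) = (1 + \varepsilon)^n\,\lambda\bigl(Q_{nr}(y)\bigr). \tag{11.25} \]
On the other hand, since $\psi^y(Q) = (d\varphi_y)^{-1}\bigl(\varphi(Q)\bigr) - (d\varphi_y)^{-1}\bigl(\varphi(y)\bigr)$, by translation invariance and 11.6.3,
\[ \lambda\bigl(\psi^y(Q)\bigr) = \lambda\bigl((d\varphi_y)^{-1}(\varphi(Q))\bigr) = |J_\varphi(y)|^{-1}\,\lambda\bigl(\varphi(Q)\bigr). \tag{11.26} \]
Inequality (11.22) now follows from (11.25) and (11.26). For (11.23), note that since $K_{1/k} \downarrow K$ and $\mu(A) := \int_A |J_\varphi|\,d\lambda$ is a measure on the Borel sets (11.3.4), $\mu\bigl(K_{1/k}\bigr) \downarrow \mu(K)$. Thus there exists $k$ such that $\mu\bigl(K_{1/k}\bigr) < \mu(K) + \varepsilon$. Taking $\delta < 1/k$ completes the proof.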


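11.6.8 Lemma. If $K \subseteq U$ is compact, then
\[ \lambda\bigl(\varphi(K)\bigr) \le \int_K |J_\varphi(y)|\,dy. \]

Proof. Let $\varepsilon > 0$ and choose $\delta > 0$ as in 11.6.7. By uniform continuity of $J_\varphi(x)$ on $K_\delta$, there exists $\delta_1 < \delta$ such that $|J_\varphi(x) - J_\varphi(y)| < \varepsilon$ for all $x, y \in K_\delta$ with $\|x - y\| < \delta_1$. Choose pavings $\mathcal{Q}_r = \{Q_r(y)\}_y$ and $\mathcal{Q}_{nr} = \{Q_{nr}(y)\}_y$ as in 11.6.7. Then for $x \in Q_{nr}(y)$
\[ |J_\varphi(y)| \le |J_\varphi(x) - J_\varphi(y)| + |J_\varphi(x)| < \varepsilon + |J_\varphi(x)|, \]
hence, by (11.22),
\[ (1 + \varepsilon)^{-n}\,\lambda\bigl(\varphi(Q_r(y))\bigr) \le |J_\varphi(y)|\,\lambda\bigl(Q_{nr}(y)\bigr) \le \int_{Q_{nr}(y)} \bigl(|J_\varphi(x)| + \varepsilon\bigr)\,dx, \]
so
\begin{align*}
(1 + \varepsilon)^{-n}\,\lambda\bigl(\varphi(K)\bigr)
&\le (1 + \varepsilon)^{-n} \sum_y \lambda\bigl(\varphi(Q_r(y))\bigr) \\
&\le \int_{K_\delta} \bigl(|J_\varphi(x)| + \varepsilon\bigr)\,dx \\
&\le \int_K |J_\varphi(x)|\,dx + \varepsilon\bigl(1 + \lambda(K_\delta)\bigr) \qquad\text{by (11.23).}
\end{align*}
Letting $\varepsilon \to 0$ verifies the lemma.

Now use 10.4.5 to obtain an increasing sequence of compact sets $K_k \subseteq E$ such that $\lambda(K_k) \uparrow \lambda(E)$. Then $\lambda\bigl(\varphi(K_k)\bigr) \uparrow \lambda\bigl(\varphi(E)\bigr)$ and, by 11.6.8,
\[ \lambda\bigl(\varphi(K_k)\bigr) \le \int_{K_k} |J_\varphi(y)|\,dy \le \int_E |J_\varphi(y)|\,dy. \]
Letting $k \to +\infty$ yields (11.18), completing the proof of the change of variables theorem.

11.6.9 Remark. If $\mathcal{V}$ is a linear subspace of $\mathbb{R}^n$ of dimension $m < n$, then $\lambda_n(\mathcal{V}) = 0$. To see this, let $v_1, \ldots, v_m, \ldots, v_n$ be an orthonormal basis for $\mathbb{R}^n$, where the first $m$ vectors form a basis for $\mathcal{V}$. (This is always possible by the Gram–Schmidt process.) Define $T_{\mathcal{V}} \in L(\mathbb{R}^n, \mathbb{R}^n)$ such that $T_{\mathcal{V}}(v_j) = e_j$, $1 \le j \le n$. Then $T_{\mathcal{V}}$ is an orthogonal transformation and $T_{\mathcal{V}}(\mathcal{V}) = \mathbb{R}^m \times \{0\}$. By 11.6.3
\[ \lambda_n(\mathcal{V}) = |\det(T_{\mathcal{V}})|\,\lambda_n\bigl(\mathbb{R}^m \times \{0\}\bigr) = 0, \]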


as claimed. This also shows that (11.19) holds for singular transformations T as well, since then both sides of that equation are zero. While the n-dimensional volume of a subset E of V is zero, E may still have positive m-dimensional measure. This is defined as λV (E) := λm TV (E) for E ∈ TV−1 B(Rm ) . From a geometric point of view, this is a reasonable definition, since an orthogonal transformation is either a rotation or a rotation combined with a reflection and therefore does not change volumes or areas. To see that the definition does not depend on the particular choice of the orthonormal basis, let w1 , . . . , wn be another orthonormal basis for Rn whose first m members form a basis for V and let T˜V ∈ L(Rn , Rn ) satisfy T˜V (wj ) = ej , 1 ≤ j ≤ n. Set T = T˜V TV−1 . Then, by (11.19), λm T˜V (E) = λm T TV (E) = | det T |λm TV (E) = λm TV (E) , the last equality because T is orthogonal and hence has determinant ±1.

♦

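Exercises

1. Define the n-dimensional ellipsoid
\[ E = \Bigl\{(x_1, \ldots, x_n) : \Bigl(\frac{x_1}{a_1}\Bigr)^2 + \cdots + \Bigl(\frac{x_n}{a_n}\Bigr)^2 \le 1\Bigr\}, \]
where $a_j > 0$. Prove that $\lambda_n(E) = a_1 \cdots a_n\,\lambda_n\bigl(C_1(0)\bigr)$.

2. Show that the volume of the solid with surface $\sqrt{|x|} + \sqrt{|y|} + \sqrt{|z|} = 1$ is given by
\[ 64\int_0^1\int_0^{1-u}\int_0^{1-u-v} uvw\,dw\,dv\,du. \]

3.S ⇓4 Let $h$ be Lebesgue integrable on $[0, +\infty)$. Use 11.6.2 to prove that
\[ \int_{\mathbb{R}^n} h(\|x\|)\,dx = n\alpha_n \int_0^{\infty} h(r)\,r^{n-1}\,dr. \]

4. Use Exercise 3 to show that for $n \ge 2$
(a) $\displaystyle \int_{\mathbb{R}^n} \exp(-\|x\|)\,dx = n!\,\alpha_n.$
(b) $\displaystyle \int_{\mathbb{R}^n} \exp(-\|x\|^2)\,dx = \pi^{n/2}.$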

5.S A hole of radius R ∈ (0, 1) is drilled in the (n + 1)-dimensional ball C1n+1 (0) from the north pole (0, 0, . . . , 1) to the south pole (0, 0, . . . , −1). Use Exercise 3 to show that the amount removed from the ball is p p nαn R 1 − R2 − arcsin 1 − R2 + π/2 . 4 This

exercise will be used in 13.2.5 and 13.4.2.


6.S (Theorem of Pappus) Let $E \in \mathcal{M}(\mathbb{R}^n)$ be bounded with positive n-dimensional Lebesgue measure such that $x_n > 0$ for all $x = (x_1, \ldots, x_n) \in E$. Define
\[ E_r = \{(x_1, \ldots, x_{n-1}, x_n\cos\theta, x_n\sin\theta) : x \in E,\ 0 < \theta < 2\pi\}. \]
Prove that
\[ \lambda_{n+1}(E_r) = 2\pi\,\bar{x}_n\,\lambda_n(E), \quad\text{where}\quad \bar{x}_n := \frac{1}{\lambda_n(E)} \int_E x_n\,d\lambda_n(x_1, \ldots, x_n), \]
the nth coordinate of the centroid $\bar{x}$ of $E$. Thus if $n = 2$, then $E_r$ is the rotation of $E$ about the $x_1$-axis, and the theorem of Pappus asserts that the volume of $E_r$ is equal to the area of $E$ times the distance the centroid of $E$ travels around the $x_1$-axis.

FIGURE 11.5: Theorem of Pappus.

Chapter 12 Curves and Surfaces in Rn

12.1 Parameterized Curves

A parameterized curve in $\mathbb{R}^n$ is a continuous function $\varphi : I \to \mathbb{R}^n$, where $I$ is an interval in $\mathbb{R}$. We shall usually refer to $\varphi$ as simply a curve. The range $\varphi(I)$ of $\varphi$ is called the trace of $\varphi$ and is denoted by $\mathrm{trace}(\varphi)$. The curve is said to lie in a set $E \subseteq \mathbb{R}^n$ if $\mathrm{trace}(\varphi) \subseteq E$. The curve is called simple if $\varphi$ is one-to-one. If $I = [a, b]$, the point $\varphi(a)$ is the initial point of the curve and $\varphi(b)$ the terminal point. The curve $\varphi$ is then said to be closed if $\varphi(a) = \varphi(b)$, and simple closed if it is closed and $\varphi$ is one-to-one on $(a, b)$, that is, the curve intersects itself only at the initial and terminal points. For example, the curve $(\cos(2k\pi t), \sin(2k\pi t))$, $t \in [0, 1]$, $k \in \mathbb{N}$, is a simple closed curve iff $k = 1$; its trace is the circle $x^2 + y^2 = 1$.

FIGURE 12.1: Curves in $\mathbb{R}^2$ (simple curve, non-simple curve, simple closed curve).

A curve $\varphi : I \to \mathbb{R}^n$ is said to be of class $C^r$ if $\varphi$ is $C^r$ on an open interval containing $I$. A $C^1$ curve $\varphi$ is smooth if $\varphi'(t) \ne 0$ for all $t \in I$. For example, on $[-1, 1]$ the curve $\varphi(t) = (t, t^2)$ is smooth but the curve $\psi(t) = (t^3, t^6)$, which has the same trace as $\varphi$, is not. A curve $\varphi : [a, b] \to \mathbb{R}^n$ is said to be piecewise smooth if, for some partition $a = a_0 < a_1 < \cdots < a_m = b$, $\varphi$ is smooth on each interval $[a_{j-1}, a_j]$. This implies that $\varphi'$ is uniformly continuous on each interval of smoothness $(a_{j-1}, a_j)$ and has right-hand and left-hand limits at the left and right endpoints, respectively. Thus a piecewise smooth curve may be viewed as a concatenation (sum)


of smooth curves, as shown in Figure 12.2. Note that at junctions that are corners there are two tangent vectors, and at junctions that are cusps there is one. A point on a smooth portion of the curve will be called a smooth point. A piecewise smooth curve therefore consists of smooth points and finitely many corner or cusp points.

FIGURE 12.2: A piecewise smooth curve with tangent vectors (corners, a cusp, and a smooth point).

A reparametrization of a curve $\varphi : I \to \mathbb{R}^n$ is a curve $\psi = \varphi \circ \alpha : J \to \mathbb{R}^n$, where $\alpha : J \to I$ is continuous, strictly increasing, and $\alpha(J) = I$ (hence $\mathrm{trace}(\psi) = \mathrm{trace}(\varphi)$). If $\varphi$ is smooth, then $\alpha$ is required to be smooth with positive Jacobian. If $\psi$ is a reparametrization of $\varphi$, then $\varphi$ and $\psi$ are said to be equivalent. For example, the smooth curve $(t, t^2, t^3)$ $(t > 0)$ is equivalent to the curve $(e^t, e^{2t}, e^{3t})$ $(t \in \mathbb{R})$.

A curve $\varphi : I \to \mathbb{R}^n$ has a positive direction, namely, the direction that $\varphi(t)$ moves as $t$ increases. An equivalent curve $\psi = \varphi \circ \alpha$ has the same direction since $\alpha$ is strictly increasing. The curve $-\varphi$, defined by $(-\varphi)(t) := \varphi(-t)$, $-t \in I$, has the opposite (negative) direction. If $\varphi$ is piecewise smooth, then the positive direction is given by the tangent vectors $\varphi'(t)$, defined at smooth points. At corners and cusps the tangent vectors are right- and left-hand limits. The set of tangent vectors to a curve is called the tangent vector field (defined more precisely later).

12.1.1 Proposition. Let $\varphi_j : [a_j, b_j] \to \mathbb{R}^n$, $j = 1, \ldots, k$, be piecewise $C^1$ curves such that $\varphi_j(b_j) = \varphi_{j+1}(a_{j+1})$, $j = 1, \ldots, k - 1$. Then there exists a piecewise $C^1$ curve $\varphi : [0, 1] \to \mathbb{R}^n$, denoted by $\varphi = \varphi_1 + \varphi_2 + \cdots + \varphi_k$ and called the sum of the curves $\varphi_j$, such that $\varphi|_{[(j-1)/k,\, j/k]}$ is equivalent to $\varphi_j$.

Proof. Define $\alpha_j : [(j - 1)/k, j/k] \to [a_j, b_j]$ by
\[ \alpha_j(t) = b_j + (b_j - a_j)(kt - j), \quad (j - 1)/k \le t \le j/k, \]
and $\varphi : [0, 1] \to \mathbb{R}^n$ by $\varphi = \varphi_j \circ \alpha_j$ on $[(j - 1)/k, j/k]$.
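The construction in the proof of 12.1.1 translates directly into code. The sketch below assumes each curve is supplied as a triple `(phi_j, a_j, b_j)` with matching endpoints; the function name `concatenate` and the two sample curves are hypothetical and serve only to illustrate the affine maps $\alpha_j$.

```python
def concatenate(curves):
    """Return the sum phi = phi_1 + ... + phi_k on [0, 1].

    `curves` is a list of triples (phi_j, a_j, b_j), where phi_j is a
    function on [a_j, b_j] and phi_j(b_j) equals phi_{j+1}(a_{j+1}).
    """
    k = len(curves)

    def phi(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        # Index j with (j - 1)/k <= t <= j/k (j = k when t = 1).
        j = min(int(t * k) + 1, k)
        phi_j, a_j, b_j = curves[j - 1]
        # alpha_j maps [(j - 1)/k, j/k] affinely onto [a_j, b_j].
        alpha = b_j + (b_j - a_j) * (k * t - j)
        return phi_j(alpha)

    return phi


# Two segments forming an "L": (0,0) -> (1,0), then (1,0) -> (1,2).
horizontal = (lambda s: (s, 0.0), 0.0, 1.0)
vertical = (lambda s: (1.0, s), 0.0, 2.0)
path = concatenate([horizontal, vertical])

print(path(0.25), path(0.5), path(0.75), path(1.0))
```

At a junction $t = j/k$ either neighboring piece may be used, since the endpoint condition makes the two values agree.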


Exercises 1.S Prove that the notion of equivalent smooth curves is an equivalence relation. 2. Show that if ϕ is smooth and ψ = ϕ ◦ α is an equivalent curve, then ϕ0 (t) ψ 0 (t) = . kϕ0 (t)k kψ 0 (t)k Thus the unit tangent vector field is invariant under a reparametrization. 3.S Sketch the trace of the curve ϕ(t) = (t2 , t3 − t) on the interval [−2, 2]. Find all points on the trace where there are two tangent vectors and express these vectors in terms of the standard basis. 4. Find the tangent vector field of the given curve ϕ on the interval [0, 2π]. Sketch the trace and find all points on the trace at which there are two tangent vectors. Express these vectors in terms of the standard basis. (a)S ϕ(t) = sin t, cos(2t) . (b) ϕ(t) = cos t, sin(2t) . (c) ϕ(t) = cos t, cos(2t) . (d) ϕ(t) = sin t, sin(2t) . 5. In (a)–(d) below, find a smooth simple curve or a smooth simple closed curve ϕ : I → C with trace C. x2 y2 (a) C is the intersection of the elliptic cylinder 2 + 2 = 1 and the a b plane x + y + z = 1. x2 y2 (b)S C is the intersection of the elliptic cylinder 2 + 2 = 1 and the a b surface z = 2xy. (c) C is the intersection in the first octant of the paraboloid z = x2 + y 2 and the plane x + y + z = 1. (d) C is the intersection in the first octant of the cone z = x2 + y 2 and the plane x + y + z = 1. 6.S Let ϕ : [a, b] → Rn be a C 1 curve with the property that for some x ∈ Rn , ϕ(t) = x for infinitely many t ∈ [a, b]. Prove that ϕ is not smooth. 1 7. Let f be C 1 on an open set U and let ϕ be a C curve in U . Suppose 0 that ϕ (t) = ∇f ϕ(t) for all t > a and that the limit x := limt→+∞ ϕ(t) exists in U . Prove that ∇f (x) = 0.

Hint. Assume ∇f (x) 6= 0. Let g = f ◦ ϕ and show that g 0 (t) > k∇f (x)k2 /2 for all sufficiently large t.

12.2 Integration on Curves

Rectifiable Curves

Let $\varphi : I \to \mathbb{R}^n$ be a parameterized curve. Assume first that $I = [a, b]$. For a partition $\mathcal{P} = \{t_0 = a < t_1 < \cdots < t_{k-1} < t_k = b\}$ of $[a, b]$ define
\[ L_{\mathcal{P}}(\varphi) = \sum_{j=1}^{k} \|\varphi(t_j) - \varphi(t_{j-1})\|, \]
which is the length of the inscribed polygonal line with segments joining the points $\varphi(t_{j-1})$ and $\varphi(t_j)$.

FIGURE 12.3: Inscribed polygonal line.

The (arc) length of $\varphi$ is defined as
\[ \mathrm{length}(\varphi) := \sup_{\mathcal{P}} L_{\mathcal{P}}(\varphi), \]

where the supremum is taken over all partitions P of [a, b]. If length(ϕ) < +∞, then ϕ is said to be rectifiable. Note that if ψ = ϕ ◦ α is equivalent to ϕ, then length(ψ) = length(ϕ), since α : [c, d] → [a, b] induces a one-to-one correspondence between partitions of [c, d] and [a, b]. If I = [a, b) (where b could be infinite), define length(ϕ) := sup length ϕ [a,t] . a 1. This follows from the inequalities k X

|y(tj ) − y(tj−1 )| ≤

k X

j=1

kϕ(tj ) − ϕ(tj−1 )k ≤ 2(b − a) + 2

j=1

k X

|y(tj ) − y(tj−1 )|

j=1

and 5.9.3.

♦

We prove in 12.2.4 below that piecewise $C^1$ curves on $[a, b]$ are rectifiable. For this, we require two lemmas. The proof of the first is similar to that of the corresponding result for lower Darboux sums and is left as an exercise.

12.2.2 Lemma. Let $\phi : [a, b] \to \mathbb{R}^n$ be a curve and let $P$ and $Q$ be partitions of $[a, b]$. If $P$ is a refinement of $Q$, then $L_Q(\phi) \le L_P(\phi)$.

12.2.3 Lemma. Let $\phi : [a, b] \to \mathbb{R}^n$ be a curve and $c \in (a, b)$. Then
\[ \operatorname{length}(\phi) = \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big). \]
In particular, $\phi$ is rectifiable iff $\phi|_{[a,c]}$ and $\phi|_{[c,b]}$ are rectifiable.

Proof. Let $P'$ and $P''$ be partitions of $[a, c]$ and $[c, b]$, respectively, and set $P = P' \cup P''$. Then $P$ is a partition of $[a, b]$ and
\[ \operatorname{length}(\phi) \ge L_P(\phi) = L_{P'}\big(\phi|_{[a,c]}\big) + L_{P''}\big(\phi|_{[c,b]}\big). \]
Taking suprema over $P'$ and then $P''$ yields $\operatorname{length}(\phi) \ge \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big)$. For the reverse inequality, let $P = \{t_0 = a < t_1 < \cdots < t_k = b\}$ be a partition of $[a, b]$ and suppose $c \in (t_{i-1}, t_i]$. If $P' = \{t_0 = a < t_1 < \cdots < t_{i-1} < c\}$ and $P'' = \{c \le t_i < \cdots < t_k = b\}$, then an application of the triangle inequality shows that
\[ L_P(\phi) \le L_{P'}\big(\phi|_{[a,c]}\big) + L_{P''}\big(\phi|_{[c,b]}\big) \le \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big). \]
Since $P$ was arbitrary, $\operatorname{length}(\phi) \le \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big)$.

12.2.4 Theorem. Let $\phi : [a, b] \to \mathbb{R}^n$ be piecewise $C^1$. Then $\phi$ is rectifiable and
\[ \operatorname{length}(\phi) = \sum_{j=1}^{m} \int_{a_{j-1}}^{a_j} \|\phi'(t)\|\,dt, \]
where $\phi$ is smooth on the intervals $[a_{j-1}, a_j]$, $a = a_0 < a_1 < \cdots < a_m = b$.


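Proof. By 12.2.3 we may assume that $\phi = (\phi_1, \dots, \phi_n)$ is $C^1$ on $[a, b]$. Given $\varepsilon > 0$, choose $\delta > 0$ so that
\[ \left| \int_a^b \|\phi'(t)\|\,dt - \sum_{k=1}^{m} \|\phi'(t_k)\|\,\Delta t_k \right| < \varepsilon, \qquad \Delta t_k := t_k - t_{k-1}, \tag{12.1} \]
for all partitions $P = \{t_0 = a < t_1 < \cdots < t_{m-1} < t_m = b\}$ with $\|P\| < \delta$. For such a partition $P$, use the mean value theorem to choose $s_{j,k} \in (t_{k-1}, t_k)$ such that
\[ \phi_j(t_k) - \phi_j(t_{k-1}) = \phi_j'(s_{j,k})\,\Delta t_k, \qquad k = 1, \dots, m,\ j = 1, \dots, n. \]
Then
\[ L_P(\phi) = \sum_{k=1}^{m} \|\phi(t_k) - \phi(t_{k-1})\| = \sum_{k=1}^{m} \Big( \sum_{j=1}^{n} |\phi_j'(s_{j,k})|^2 \Big)^{1/2} \Delta t_k, \]
hence
\[ \left| L_P(\phi) - \sum_{k=1}^{m} \|\phi'(t_k)\|\,\Delta t_k \right| = \left| \sum_{k=1}^{m} \Big\{ \Big( \sum_{j=1}^{n} |\phi_j'(s_{j,k})|^2 \Big)^{1/2} - \Big( \sum_{j=1}^{n} |\phi_j'(t_k)|^2 \Big)^{1/2} \Big\} \Delta t_k \right|. \]
Taking a smaller $\delta$ if necessary, we may assume that the absolute value of the term in braces is less than $\varepsilon/(b - a)$. This is possible by the uniform continuity of $\phi'$. It follows that
\[ \left| L_P(\phi) - \sum_{k=1}^{m} \|\phi'(t_k)\|\,\Delta t_k \right| < \varepsilon. \tag{12.2} \]
From (12.1) and (12.2) we now have
\[ \int_a^b \|\phi'(t)\|\,dt - 2\varepsilon < L_P(\phi) < \int_a^b \|\phi'(t)\|\,dt + 2\varepsilon \tag{12.3} \]
for all $P$ with $\|P\| < \delta$. Since $L_P(\phi) \le \operatorname{length}(\phi)$ and $\varepsilon$ was arbitrary, the first inequality in (12.3) implies that
\[ \int_a^b \|\phi'(t)\|\,dt \le \operatorname{length}(\phi). \]
For the reverse inequality, let $Q$ be any partition of $[a, b]$. Refine $Q$ to obtain a partition $P$ with $\|P\| < \delta$. Then, from 12.2.2 and the second inequality in (12.3),
\[ L_Q(\phi) \le L_P(\phi) < \int_a^b \|\phi'(t)\|\,dt + 2\varepsilon. \]
Since $Q$ and $\varepsilon$ are arbitrary, $\operatorname{length}(\phi) \le \int_a^b \|\phi'(t)\|\,dt$.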


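The proof of the following corollary is left to the reader.

12.2.5 Corollary. If $\phi : [a, b) \to \mathbb{R}^n$ is $C^1$, then $\operatorname{length}(\phi)$ is the improper integral $\int_a^b \|\phi'(t)\|\,dt$.

12.2.6 Example. Let $\phi(t) = (e^{-t}\cos t,\ e^{-t}\sin t)$, where $0 \le t < +\infty$. Then $\phi'(t) = e^{-t}(-\cos t - \sin t,\ \cos t - \sin t)$, so $\|\phi'(t)\| = \sqrt{2}\,e^{-t}$, hence $\operatorname{length}(\phi) = \sqrt{2}\int_0^\infty e^{-t}\,dt = \sqrt{2}$. ♦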
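As a numerical sanity check on Example 12.2.6, the following Python sketch approximates the length of the spiral by the inscribed polygonal sums $L_P(\phi)$ from the definition of arc length and compares the result with $\sqrt{2} = \int_0^\infty \sqrt{2}\,e^{-t}\,dt$. The truncation point `T = 40`, the uniform partitions, and the helper names `spiral` and `polygonal_length` are assumptions made for illustration only.

```python
import math

def spiral(t):
    """Curve of Example 12.2.6: phi(t) = (e^{-t} cos t, e^{-t} sin t)."""
    r = math.exp(-t)
    return (r * math.cos(t), r * math.sin(t))

def polygonal_length(curve, a, b, k):
    """Return L_P(curve) for the uniform partition P of [a, b] with k subintervals."""
    total = 0.0
    prev = curve(a)
    for j in range(1, k + 1):
        cur = curve(a + (b - a) * j / k)
        total += math.dist(prev, cur)  # ||phi(t_j) - phi(t_{j-1})||
        prev = cur
    return total

# Assumption: truncating [0, +inf) at T = 40 discards a tail of length
# about sqrt(2) * e^{-40}, which is negligible in double precision.
T = 40.0
coarse = polygonal_length(spiral, 0.0, T, 1_000)
fine = polygonal_length(spiral, 0.0, T, 200_000)  # this partition refines the coarse one
exact = math.sqrt(2.0)

print(coarse, fine, exact)
```

Because the finer uniform partition is a refinement of the coarser one, Lemma 12.2.2 predicts `coarse <= fine`, and both values stay below the supremum.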

Line Integrals

Let $\phi : [a, b] \to \mathbb{R}^n$ be a $C^1$ curve with trace $C$ and let $f : C \to \mathbb{R}$ be continuous. The line integral of $f$ over $\phi$ is defined by
\[ \int_\phi f\,ds = \int_C f\,ds = \int_a^b f(\phi(t))\,\|\phi'(t)\|\,dt. \]
Note that if $\psi = \phi \circ \alpha$ is an equivalent parametrization, where $\alpha : [c, d] \to [a, b]$ is $C^1$, then, by the chain rule and the change of variables theorem,
\[ \int_c^d f(\psi(t))\,\|\psi'(t)\|\,dt = \int_c^d f\big(\phi(\alpha(t))\big)\,\|\phi'(\alpha(t))\|\,\alpha'(t)\,dt = \int_a^b f(\phi(u))\,\|\phi'(u)\|\,du. \]
The value of a line integral is therefore independent of the choice of parametrization. If $\phi : [a, b] \to \mathbb{R}^n$ is piecewise $C^1$, then the line integral is defined as
\[ \int_\phi f\,ds = \sum_j \int_{\phi_j} f\,ds, \]
where $\phi_j$ is the restriction of $\phi$ to $[a_j, a_{j+1}]$ and $\phi$ is $C^1$ on $[a_j, a_{j+1}]$. If $\phi : I \to \mathbb{R}^n$ is $C^1$, where $I$ is an arbitrary interval, then the line integral is defined as an improper integral, as in the case of arc length.

12.2.7 Remark. Theorem 12.2.4 shows that arc length is the line integral of the constant function 1. Using techniques similar to those found in the proof of that theorem, one may show that if $\phi$ is $C^1$, then $\int_\phi f$ is the limit of sums of the form
\[ \sum_{j=1}^{k} (f \circ \phi)(t_j^*)\,\|\phi(t_j) - \phi(t_{j-1})\|, \qquad t_j^* \in (t_{j-1}, t_j), \]
as $\max_j \|\phi(t_j) - \phi(t_{j-1})\| \to 0$. This interpretation is useful in applications. For example, if $f(x)$ is the mass per unit length at the point $x$ of a wire $C$, then $(f \circ \phi)(t_j^*)\,\|\phi(t_j) - \phi(t_{j-1})\|$ is approximately the mass of a small piece of the wire. Summing and taking the limit gives the mass of the wire as the line integral $\int_C f\,ds$. ♦
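The sums in Remark 12.2.7 can be evaluated directly. The sketch below is a minimal illustration in Python, under the assumption that the wire is the upper unit semicircle $\phi(t) = (\cos t, \sin t)$, $0 \le t \le \pi$, with density $f(x, y) = y$, so that the line integral equals $\int_0^\pi \sin t\,dt = 2$. The function names and the choice of $t_j^*$ as the midpoint of $(t_{j-1}, t_j)$ are illustrative choices, not part of the text.

```python
import math

def line_integral_sum(f, curve, a, b, k):
    """Sum of f(phi(t_j*)) * ||phi(t_j) - phi(t_{j-1})|| over the uniform
    partition of [a, b] with k subintervals; t_j* is the midpoint of each one."""
    total = 0.0
    t_prev = a
    p_prev = curve(a)
    for j in range(1, k + 1):
        t_cur = a + (b - a) * j / k
        p_cur = curve(t_cur)
        t_star = 0.5 * (t_prev + t_cur)
        total += f(*curve(t_star)) * math.dist(p_prev, p_cur)
        t_prev, p_prev = t_cur, p_cur
    return total

def semicircle(t):
    """Assumed wire: upper half of the unit circle."""
    return (math.cos(t), math.sin(t))

def density(x, y):
    """Assumed mass per unit length at the point (x, y)."""
    return y

mass = line_integral_sum(density, semicircle, 0.0, math.pi, 2_000)
print(mass)  # close to 2, the value of the line integral
```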


Vector Fields

12.2.8 Definition. A vector field on a set $E \subseteq \mathbb{R}^n$ is a function $\vec{F} = (f_1, \dots, f_n) : E \to \mathbb{R}^n$. The vector field is said to be of class $C^r$ if each $f_j$ is $C^r$. ♦

Geometrically, a vector field assigns to each point of $E$ a unique vector in $\mathbb{R}^n$, as illustrated in Figure 12.4.

FIGURE 12.4: Vector field on $E$.

If $\phi$ is a simple smooth curve and $x = \phi(t)$, then
\[ \vec{v}_\phi(x) := \phi'(t) \quad\text{and}\quad \vec{T}_\phi(x) := \frac{\phi'(t)}{\|\phi'(t)\|} \]
denote, respectively, the tangent vector field and unit tangent vector field along $\phi$. If $\phi$ denotes the position of a particle at time $t$, then the tangent vector field is called the velocity vector field of the particle.

Vector fields that describe forces, such as gravitation or electromagnetism, are called force fields. Line integrals may then be used to calculate the work done by the force in moving a particle along a curve. Specifically, suppose the particle moves along a simple smooth curve $\phi : [a, b] \to \mathbb{R}^3$ under the action of a continuous force field $\vec{F} = (f_1, f_2, f_3)$ on $C := \operatorname{trace}(\phi)$. The work $\Delta_j W$ done by the force in moving the particle from a point $x_j = \phi(t_j)$ on $C$ to a nearby point $x_{j+1} = \phi(t_{j+1})$ is approximately the component of the force in the direction of the tangent to the curve at $x_j$ multiplied by the distance the particle travels:
\[ \Delta_j W \approx \big[\vec{F}(x_j) \cdot \vec{T}_\phi(x_j)\big]\,\|x_j - x_{j+1}\|. \]
The total work $W$ done by the force is then approximately $\sum_j \Delta_j W$. Since $\vec{F}$ is continuous, the approximation gets better by taking smaller intervals. It is therefore reasonable to define the total work done by the force in moving the particle along the curve as the limit of $\sum_j \Delta_j W$ as $\max_j \|x_j - x_{j+1}\| \to 0$. By 12.2.7, we are therefore led to the definition
\[ W := \int_\phi \vec{F} \cdot \vec{T}\,ds = \int_a^b \vec{F}(\phi(t)) \cdot \vec{T}_\phi(\phi(t))\,\|\phi'(t)\|\,dt. \]


Since $\vec{T}_\phi(\phi(t)) = \|\phi'(t)\|^{-1}\phi'(t)$, we see that
\[ W = \int_a^b \vec{F}(\phi(t)) \cdot \phi'(t)\,dt = \int_a^b \Big[ f_1(x)\frac{dx_1}{dt} + f_2(x)\frac{dx_2}{dt} + f_3(x)\frac{dx_3}{dt} \Big]\,dt, \]
where $x = \phi(t)$. The last integral is frequently written
\[ \int_\phi f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3. \]
The integrand is called a (differential) 1-form on $C$ in $\mathbb{R}^3$.
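To make the work formula $W = \int_a^b \vec{F}(\phi(t)) \cdot \phi'(t)\,dt$ concrete, here is a small Python sketch. The force field $\vec{F}(x, y, z) = (-y, x, z)$, the helix $\phi(t) = (\cos t, \sin t, t)$ on $[0, 2\pi]$, and the midpoint rule used for the integral are all assumptions chosen for illustration. For this choice $\vec{F}(\phi(t)) \cdot \phi'(t) = 1 + t$, so the work is $2\pi + 2\pi^2$.

```python
import math

def work(F, phi, dphi, a, b, k=10_000):
    """Midpoint-rule approximation of the integral of F(phi(t)) . phi'(t) over [a, b]."""
    h = (b - a) / k
    total = 0.0
    for j in range(k):
        t = a + (j + 0.5) * h
        total += sum(Fi * vi for Fi, vi in zip(F(*phi(t)), dphi(t))) * h
    return total

def helix(t):
    """Assumed path of the particle."""
    return (math.cos(t), math.sin(t), t)

def helix_velocity(t):
    """Derivative of the helix, computed by hand."""
    return (-math.sin(t), math.cos(t), 1.0)

def force(x, y, z):
    """Assumed continuous force field on the trace of the helix."""
    return (-y, x, z)

W = work(force, helix, helix_velocity, 0.0, 2.0 * math.pi)
expected = 2.0 * math.pi + 2.0 * math.pi ** 2  # integral of (1 + t) over [0, 2*pi]
print(W, expected)
```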

Differential 1-Forms in $\mathbb{R}^n$

Let $f_j$ be defined on a set $S \subseteq \mathbb{R}^n$. The symbol
\[ \omega := f_1\,dx_1 + \cdots + f_n\,dx_n \]
is called a (differential) 1-form on $S$. The form is said to be $C^r$ on $S$ if each $f_j$ is $C^r$ on $S$, where $r \in \mathbb{N} \cup \{+\infty\}$. Given another 1-form $\eta = g_1\,dx_1 + \cdots + g_n\,dx_n$ on $S$ and $a, b \in \mathbb{R}$, the 1-form $a\omega + b\eta$ on $S$ is defined by
\[ a\omega + b\eta := (af_1 + bg_1)\,dx_1 + \cdots + (af_n + bg_n)\,dx_n. \]
If $\vec{H} = (h_1, \dots, h_n)$ is a vector field on $S$, we define the inner product $\omega \cdot \vec{H}$ of $\omega$ and $\vec{H}$ on $S$ by
\[ (\omega \cdot \vec{H})(x) := \sum_{j=1}^{n} f_j(x)\,h_j(x), \qquad x \in S. \]
The integral of a continuous (that is, $C^0$) 1-form $\omega$ over a $C^1$ curve $\phi : [a, b] \to S$ is defined as
\[ \int_\phi \omega = \int_a^b \big[ f_1(\phi(t))\,\phi_1'(t) + \cdots + f_n(\phi(t))\,\phi_n'(t) \big]\,dt = \int_a^b \vec{F}(\phi(t)) \cdot \phi'(t)\,dt, \]
where $\vec{F} := (f_1, \dots, f_n)$. If $\phi$ is only piecewise $C^1$, then $\int_\phi \omega$ is defined to be the sum of the integrals over the intervals on which $\phi$ is $C^1$. The following properties of the integral are easily established:

• $\displaystyle \int_\phi (a\omega + b\eta) = a\int_\phi \omega + b\int_\phi \eta$,

• $\displaystyle \int_{-\phi} \omega = -\int_\phi \omega$, and

• $\displaystyle \int_{\phi_1 + \cdots + \phi_k} \omega = \sum_{j=1}^{k} \int_{\phi_j} \omega$.


A continuous 1-form $\omega = f_1\,dx_1 + \cdots + f_n\,dx_n$ on an open set $U \subseteq \mathbb{R}^n$ is said to be exact if there exists a $C^1$ function $f$ on $U$ such that $f_j = \partial_j f$ on $U$ for each $j$. We then write
\[ \omega = df = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\,dx_j. \]
The following proposition shows that the integral of an exact form over a curve depends only on $f$ and the endpoints of the curve.

12.2.9 Proposition. If $\phi : [a, b] \to U$ is piecewise $C^1$, then
\[ \int_\phi df = f(\phi(b)) - f(\phi(a)). \]

Proof. If $\phi$ is $C^1$, then, by the chain rule and the fundamental theorem of calculus,
\[ \int_\phi df = \int_a^b \sum_{j=1}^{n} (\partial_j f)(\phi(t))\,\phi_j'(t)\,dt = \int_a^b (f \circ \phi)'(t)\,dt = f(\phi(b)) - f(\phi(a)). \]
If $\phi$ is only piecewise $C^1$, subdivide the interval $[a, b]$ into intervals on which $\phi$ is smooth, apply the above result to each subinterval, and sum the results.

12.2.10 Theorem. Let $U \subseteq \mathbb{R}^n$ be open and connected and let $\omega$ be a continuous 1-form on $U$. The following statements are equivalent:

(a) $\omega$ is exact.

(b) $\int_\phi \omega = 0$ for every closed piecewise $C^1$ curve $\phi$ in $U$.

(c) $\int_\varphi \omega = \int_\psi \omega$ for every pair of piecewise $C^1$ curves $\varphi, \psi : [a, b] \to \mathbb{R}^n$ in $U$ with $\varphi(a) = \psi(a)$ and $\varphi(b) = \psi(b)$.

Proof. That (a) implies (b) follows from 12.2.9.

FIGURE 12.5: $\phi = \psi - \varphi$.

For (b) implies (c), define a closed, piecewise smooth curve $\phi : [a, b+1] \to \mathbb{R}^n$ by
\[ \phi(t) = \begin{cases} \psi(t), & a \le t \le b, \\ \varphi\big(b + (b - t)(b - a)\big), & b \le t \le b + 1. \end{cases} \]


(See Figure 12.5.) Then $\phi|_{[b,b+1]}$ is equivalent to $-\varphi$, hence if (b) holds,
\[ 0 = \int_\phi \omega = \int_\psi \omega - \int_\varphi \omega, \]
proving (c).

FIGURE 12.6: $\phi_{x+te_j} = \phi_x + \psi_t$.

Now assume that (c) holds and let $\omega = \sum_{j=1}^{n} f_j\,dx_j$. To establish (a), we construct a function $f$ on $U$ such that $\partial_j f = f_j$. Choose any point $a \in U$. By Exercise 8.7.8, for each $x \in U$ there exists a piecewise $C^1$ curve $\phi_x$ in $U$ with initial point $a$ and terminal point $x$. Define $f(x) = \int_{\phi_x} \omega$. By (c), $f(x)$ is independent of the path and hence is well-defined. Fix $j$, let $t > 0$, and denote by $\psi_t$ the line segment $x + ue_j$, $0 \le u \le t$. Then $\psi_t$ lies in $U$ for sufficiently small $t > 0$, and by continuity of $f_j$,
\[ \frac{1}{t}\big[f(x + te_j) - f(x)\big] = \frac{1}{t}\int_{\psi_t} \omega = \frac{1}{t}\int_0^t f_j(x + ue_j)\,du \to f_j(x) \]
as $t \to 0^+$. A similar argument works for the case $t \to 0^-$. Therefore, $\partial_j f(x) = f_j(x)$. Since each $f_j$ is continuous, $f$ is $C^1$, as required.
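The equivalence in Theorem 12.2.10 can be observed numerically. The Python sketch below is only an illustration under stated assumptions: it takes the exact form $\omega = df$ with $f(x, y) = x^2 y$, that is, $\omega = 2xy\,dx + x^2\,dy$, and integrates it along two hypothetical paths from $(0, 0)$ to $(1, 2)$, a line segment and a parabolic arc. Both integrals should agree with $f(1, 2) - f(0, 0) = 2$, as Proposition 12.2.9 predicts. For contrast, the form $-y\,dx + x\,dy$, which is not exact, gives different values on the two paths. The midpoint rule and all function names are assumptions of the sketch.

```python
def integrate_form(F, phi, dphi, a, b, k=20_000):
    """Midpoint-rule approximation of the integral over phi of
    f_1 dx_1 + ... + f_n dx_n, where F = (f_1, ..., f_n)."""
    h = (b - a) / k
    total = 0.0
    for j in range(k):
        t = a + (j + 0.5) * h
        total += sum(fi * vi for fi, vi in zip(F(*phi(t)), dphi(t))) * h
    return total

def exact_form(x, y):
    """Coefficients of df for f(x, y) = x^2 y."""
    return (2.0 * x * y, x * x)

def rotation_form(x, y):
    """Coefficients of -y dx + x dy, which is not exact."""
    return (-y, x)

# Two paths from (0, 0) to (1, 2), both parametrized on [0, 1].
def line(t):
    return (t, 2.0 * t)

def line_velocity(t):
    return (1.0, 2.0)

def parabola(t):
    return (t, 2.0 * t * t)

def parabola_velocity(t):
    return (1.0, 4.0 * t)

exact_on_line = integrate_form(exact_form, line, line_velocity, 0.0, 1.0)
exact_on_parabola = integrate_form(exact_form, parabola, parabola_velocity, 0.0, 1.0)
rot_on_line = integrate_form(rotation_form, line, line_velocity, 0.0, 1.0)
rot_on_parabola = integrate_form(rotation_form, parabola, parabola_velocity, 0.0, 1.0)

print(exact_on_line, exact_on_parabola)  # both close to 2
print(rot_on_line, rot_on_parabola)      # close to 0 and 2/3, respectively
```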

Exercises

1.S Determine which of the following curves are rectifiable.
(a) $\phi(t) = (t, t^{-p})$, $0 < t \le 1$, where $p > 0$.
(b) $\phi(t) = (e^{-t}, e^{-t^2}, e^{-t^3})$, $0 \le t < +\infty$.
(c) $\phi(t) = (t^{-1}, e^{-t})$, $t \ge 1$.
(d) $\phi(t) = (t^{-1}, e^{-t})$, $0 < t \le 1$.

2. Evaluate $\int_\phi f$ for
(a) $\phi(t) = (t^3/3,\ t^4/4)$, $1 \le t \le 2$, $f(x, y) = x/y$.
(b)S $\phi(t) = (t, \sin(2t), \cos(2t))$, $0 \le t \le \pi/4$, $f(x, y, z) = xz$.
(c) $\phi(t) = (t,\ t^2/2,\ t^3/3)$, $0 \le t \le 1$, $f(x, y, z) = x + 6z$.
(d) $\phi(t) = (\sin t, \sqrt{2}\cos t, \sin t)$, $0 \le t \le \pi/2$, $f(x, y, z) = xyz$.


3. Set up, but do not evaluate, the integral that gives the circumference of the ellipse $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$. (Your answer should involve $\sin^2 t$.)

4. In each case below, find a smooth simple curve or a smooth simple closed curve with trace $C$. Use the parametrization to find an integral that gives the length of the curve. (Do not evaluate the integral.)
(a) $C = \{(x, y) : x^3 - 7y^2 = 1,\ 1 < x < 2,\ y > 0\}$.
(b)S $C = \{(x, y) : 9(x - 1)^2 + 4(y - 2)^2 = 36\}$.
(c) $C = \{(x, y) : x^2 - y^2 = 4,\ x > 2,\ 0 < x + y < a\}$.

5. Let $\phi(x) = (x, g(x))$, $a \le x \le b$, where $g$ is continuously differentiable, and let $f(x, y)$ be continuous on the graph of $g$. Show that
\[ \int_\phi f = \int_a^b f\big(x, g(x)\big)\sqrt{1 + [g'(x)]^2}\,dx. \]
Use this to find
(a) $\int_\phi f$ if $g(x) = (2/5)x^{5/2}$ and $f(x, y) = x^2$, $0 \le x \le 1$.
(b)S the length of the graph of the equation $x^{2/3} + y^{2/3} = 1$.
(c) the length of the graph of the function $g(x) = \dfrac{x^p}{2p} + \dfrac{x^{2-p}}{2(p - 2)}$, where $0 < a \le x \le b$ and $p > 2$.

6. Prove 12.2.5.

7.S Let a smooth curve $\phi : [a, b] \to \mathbb{R}^2$ be described in polar coordinates by $\phi(t) = \big(r(t)\cos\theta(t),\ r(t)\sin\theta(t)\big)$, $r(t) \ge 0$. Show that
\[ \operatorname{length}(\phi) = \int_a^b \sqrt{\big[r(t)\,\theta'(t)\big]^2 + \big[r'(t)\big]^2}\,dt. \]

8. Let $\vec{F} = (F_1, F_2, F_3)$ be a force field in $\mathbb{R}^3$ that moves a particle of mass $m$ along a smooth curve $\phi : [a, b] \to \mathbb{R}^3$. The kinetic energy of the particle at time $t$ is defined as $\frac{1}{2}m\|\phi'(t)\|^2$. Use Newton's second law $\vec{F} = m\phi''$ to show that the work done by the force in moving the particle from $\phi(a)$ to $\phi(b)$ is the change in kinetic energy
\[ \tfrac{1}{2}m\|\phi'(b)\|^2 - \tfrac{1}{2}m\|\phi'(a)\|^2. \]


9. A force field $\vec{F}$ in $\mathbb{R}^3$ is said to be conservative if there exists a function $P(x, y, z)$ such that $\vec{F} = -\nabla P$. $P(x, y, z)$ is called the potential energy of an object at the point $(x, y, z)$.
(a)S Show that the work done by a conservative force in moving the object along a curve $\phi$ from $\phi(a)$ to $\phi(b)$ is $P(\phi(a)) - P(\phi(b))$.
(b) Deduce the Law of Conservation of Energy
\[ P(\phi(b)) + \tfrac{1}{2}m\|\phi'(b)\|^2 = P(\phi(a)) + \tfrac{1}{2}m\|\phi'(a)\|^2, \]
that is, the sum of the potential and kinetic energies is constant.
(c) Find a potential function for the gravitational force field $F(x) = -mMG\|x\|^{-3}x$, where $M$ is the mass of the earth (concentrated at the origin, the center of the earth), $m$ is the mass of the particle at point $x$, and $G$ is the gravitation constant.

10. For a smooth curve $\phi : [a, b] \to \mathbb{R}^n$, define the arc length function $s = s(t)$ by
\[ s(t) = \int_a^t \|\phi'(\tau)\|\,d\tau, \qquad a \le t \le b. \]
Show that $s$ has a smooth inverse $t = t(s)$, $0 \le s \le \ell := \operatorname{length}(\phi)$. The curve $\psi(s) = \phi(t(s))$ is called a reparametrization of $\phi$ by arc length. Show that, for a continuous vector field $\vec{F}$ on $\operatorname{trace}(\phi) = \operatorname{trace}(\psi)$,
\[ \int_\phi \vec{F} \cdot \vec{T}_\phi = \int_0^\ell \vec{F}(\psi(s)) \cdot \psi'(s)\,ds. \]

11. Let $P = \{a = t_0 < t_1 < \cdots < t_k = b\}$ be a partition of $[a, b]$. For $f : [a, b] \to \mathbb{R}$, define
\[ V_P(f) = \sum_{j=1}^{k} |f(t_j) - f(t_{j-1})|. \]
Then $f$ is said to have bounded variation on the interval $[a, b]$ if $\sup_P V_P(f) < +\infty$. (Section 5.9.) Show that a curve $\phi = (\phi_1, \dots, \phi_n) : [a, b] \to \mathbb{R}^n$ is rectifiable iff each component function $\phi_i$ has bounded variation on $[a, b]$.


12.3 Parameterized Surfaces

12.3.1 Definition. Let $1 \le m \le n$. A smooth parameterized m-surface in $\mathbb{R}^n$ is a $C^1$ function $\phi = (\phi_1, \dots, \phi_n) : U \to \mathbb{R}^n$, where $U \subseteq \mathbb{R}^m$ is open and the derivative $\phi'(u)$ has rank $m$ at each point $u \in U$. A reparametrization of $\phi$ is a smooth parameterized m-surface $\psi = \phi \circ \alpha : V \to \mathbb{R}^n$, where $V \subseteq \mathbb{R}^m$ is open and $\alpha : V \to U$ is $C^1$ with $C^1$ inverse $\alpha^{-1} : U \to V$ such that $J_\alpha > 0$ on $V$. In this case, $\phi$ and $\psi$ are said to be equivalent. ♦

We shall usually drop the qualifier “smooth” when referring to parameterized surfaces. Note that the parameter set $U$ is an m-parameterized surface in $\mathbb{R}^m$. Here, we take $\phi$ to be the identity map $\iota : U \to U$.

Tangent Spaces of a Parameterized Surface

Let $\phi : U \to \mathbb{R}^n$ be a parameterized m-surface and $u \in U$. For small $|t|$ the line segment $u + te_j$ is contained in $U$ and is mapped by $\phi$ onto a curve in $S := \phi(U)$ with tangent vector
\[ \frac{d}{dt}\Big|_{t=0} \phi(u + te_j) = d\phi_u(e_j) = \Big( \frac{\partial\phi_1}{\partial u_j}(u), \dots, \frac{\partial\phi_n}{\partial u_j}(u) \Big) =: \partial_j\phi(u), \]
where $e_1, \dots, e_m$ are the standard basis vectors in $\mathbb{R}^m$. Note that $\partial_j\phi(u)$ is just the $j$th column of $\phi'(u)$. Since $\phi'(u)$ has rank $m$, the vectors $d\phi_u(e_j)$ are linearly independent and hence form a basis for an m-dimensional subspace $T_{\phi(u)}$ of $\mathbb{R}^n$, called the tangent space of $\phi$ at $u$. Thus $d\phi_u$ is a linear isomorphism from $\mathbb{R}^m$ onto $T_{\phi(u)}$ mapping the frame $(e_1, \dots, e_m)$ onto the frame $(\partial_1\phi(u), \dots, \partial_m\phi(u))$.$^1$ Note that $\phi$ is not assumed to be one-to-one, and $\phi(u) = \phi(v)$ does not necessarily imply that $T_{\phi(u)} = T_{\phi(v)}$. (See Figure 12.7.)

FIGURE 12.7: Tangent spaces at $p = \phi(u) = \phi(v)$.

$^1$ A frame in a finite dimensional vector space is simply an ordered basis; see Appendix B.


Orientation of a Parameterized m-Surface

Tangent spaces may be used to assign an orientation to a parameterized m-surface, a notion that will be needed later to construct the integral of a differential form on a surface. First, we define orientation for the space $\mathbb{R}^m$. Two frames $(v^1, \dots, v^m)$ and $(w^1, \dots, w^m)$ in $\mathbb{R}^m$ are said to be orientation equivalent if the determinants of the matrices
\[ [\,v^1 \ \cdots \ v^m\,] \quad\text{and}\quad [\,w^1 \ \cdots \ w^m\,] \]
(where $v^j$ and $w^j$ are written as column vectors) have the same sign. Orientation equivalence is easily seen to be an equivalence relation. The collection of frames of $\mathbb{R}^m$ is therefore partitioned into two classes, one that contains $(e_1, \dots, e_m)$ and the other containing $(-e_1, e_2, \dots, e_m)$. An orientation is assigned to $\mathbb{R}^m$ by designating one of these equivalence classes to be positive and the other negative. Any frame in the former class is then said to have positive orientation, while a frame in the latter class is said to have negative orientation. For example, if $m = 3$ and $(v^1, v^2, v^3)$ has positive orientation, then so does $(v^2, v^3, v^1)$, while $(v^2, v^1, v^3)$ has negative orientation. By convention, the standard or positive orientation of $\mathbb{R}^m$ is the orientation obtained by designating the frame $(e_1, \dots, e_m)$ to be positive. For example, in the standard orientation, the sign of the frame $(e_m, e_1, \dots, e_{m-1})$ is $(-1)^{m-1}$. We shall always assume that the spaces $\mathbb{R}^m$ have the standard orientation.

A parameterized m-surface $\phi : U \to \mathbb{R}^n$ is said to be orientable if, whenever $\phi(u) = \phi(v)$,

• $T_{\phi(u)} = T_{\phi(v)}$ and

• the matrix of the linear transformation
\[ T_{uv} = (d\phi_v)^{-1} \circ d\phi_u : \mathbb{R}^m \to \mathbb{R}^m \tag{12.4} \]
has positive determinant.

Frames $(\xi^1, \dots, \xi^m)$ and $(\zeta^1, \dots, \zeta^m)$ in $T_{\phi(u)}$ are then declared to be orientation equivalent if the frames
\[ d\phi_u^{-1}(\xi^1, \dots, \xi^m) := \big( d\phi_u^{-1}(\xi^1), \dots, d\phi_u^{-1}(\xi^m) \big) \quad\text{and}\quad d\phi_u^{-1}(\zeta^1, \dots, \zeta^m) := \big( d\phi_u^{-1}(\zeta^1), \dots, d\phi_u^{-1}(\zeta^m) \big) \]
are orientation equivalent in $\mathbb{R}^m$. Since
\[ T_{uv} \circ (d\phi_u)^{-1}(\xi^1, \dots, \xi^m) = (d\phi_v)^{-1}(\xi^1, \dots, \xi^m) \]
and $\det T_{uv} > 0$, the frames $d\phi_u^{-1}(\xi^1, \dots, \xi^m)$ and $d\phi_v^{-1}(\xi^1, \dots, \xi^m)$ have the same sign, hence the notion of orientation equivalence in the common tangent space $T_{\phi(u)} = T_{\phi(v)}$ is well-defined. As with the vector space $\mathbb{R}^m$, orientation


equivalence on $T_{\phi(u)}$ is an equivalence relation with two equivalence classes, one containing $d\phi_u(e_1, \dots, e_m)$, the other containing $d\phi_u(-e_1, e_2, \dots, e_m)$. The positive (negative) orientation of $\phi$ is obtained by designating the equivalence class containing $d\phi_u(e_1, \dots, e_m)$ to be positive (negative) for every $u \in U$. We define the sign of $\phi$ by
\[ \operatorname{sign}(\phi) = \begin{cases} +1 & \text{if } \phi \text{ is positively oriented,} \\ -1 & \text{if } \phi \text{ is negatively oriented.} \end{cases} \]
Obviously, if $\phi$ is one-to-one, then it is orientable. For example, a simple smooth curve $\phi : I \to \mathbb{R}^n$ is orientable, and since
\[ d\phi_t(e_1) = \phi'(t) = \lim_{\Delta t \to 0^+} \frac{\phi(t + \Delta t) - \phi(t)}{\Delta t}, \]
the positive orientation is the one for which the tangent vector $d\phi_t(e_1)$ is in the direction of increasing $t$. By contrast, the curve in Figure 12.7 is not orientable.

12.3.2 Example. Let $a_1, \dots, a_m$ be linearly independent vectors in $\mathbb{R}^n$ and let $b \in \mathbb{R}^n$. Define m-dimensional parameterized affine space $\phi : \mathbb{R}^m \to \mathbb{R}^n$ by
\[ \phi(u) = \phi(u_1, \dots, u_m) = b + \sum_{i=1}^{m} u_i a_i. \]

FIGURE 12.8: Affine space.

Since $\phi$ is one-to-one, it is orientable. Since $\partial_i\phi = a_i$, the tangent space at each point is the subspace of $\mathbb{R}^n$ with frame $(a_1, \dots, a_m)$. ♦

12.3.3 Example. The Cartesian product of circles
\[ \phi(\theta_1, \dots, \theta_m) = (r_1\cos\theta_1,\ r_1\sin\theta_1,\ \dots,\ r_m\cos\theta_m,\ r_m\sin\theta_m), \qquad r_i > 0, \]
is a parameterized m-surface in $\mathbb{R}^{2m}$. Orientability follows from the periodicity of the sine and cosine functions. ♦
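The determinant test for orientation equivalence of frames in $\mathbb{R}^m$, introduced earlier in this section, is easy to try out. The following Python sketch is an illustration only: the helper names `det`, `frame_sign`, `standard_basis` and `cycled_sign` are hypothetical, and the determinant is computed by Laplace expansion, which is adequate for small $m$. It checks the statements that a cyclic permutation of a positive frame in $\mathbb{R}^3$ is positive, that a transposition reverses the sign, and that the frame $(e_m, e_1, \dots, e_{m-1})$ has sign $(-1)^{m-1}$.

```python
def det(rows):
    """Determinant by Laplace expansion along the first row (fine for small m)."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in rows[1:]]
        total += (-1) ** c * rows[0][c] * det(minor)
    return total

def frame_sign(frame):
    """Return +1 or -1 according to the orientation class of a frame in R^m.

    The frame vectors are stored as rows rather than columns; this does not
    matter because a matrix and its transpose have the same determinant."""
    d = det([list(v) for v in frame])
    if d == 0:
        raise ValueError("not a frame: the vectors are linearly dependent")
    return 1 if d > 0 else -1

def standard_basis(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def cycled_sign(m):
    """Sign of the frame (e_m, e_1, ..., e_{m-1}) in the standard orientation."""
    e = standard_basis(m)
    return frame_sign([e[m - 1]] + e[: m - 1])

e1, e2, e3 = standard_basis(3)
print(frame_sign([e1, e2, e3]), frame_sign([e2, e3, e1]), frame_sign([e2, e1, e3]))
print([cycled_sign(m) for m in range(2, 7)])
```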


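Orientation of a Parameterized (n − 1)-Surface

For $m = n - 1$, the notion of orientability may be formulated more concretely in terms of a normal vector field.

12.3.4 Lemma. Let $\phi : U \to \mathbb{R}^n$ be a parameterized (n − 1)-surface. Define $\partial\phi^\perp : U \to \mathbb{R}^n$ by
\[ \partial\phi^\perp := \sum_{i=1}^{n} (-1)^{i+n}\,\frac{\partial(\phi_1, \dots, \widehat{\phi_i}, \dots, \phi_n)}{\partial(u_1, \dots, u_{n-1})}\,e_i, \]
where the hat indicates that $\phi_i$ is omitted in the calculation, and let
\[ A := \begin{bmatrix} \partial_1\phi(u) \\ \vdots \\ \partial_{n-1}\phi(u) \\ \partial\phi^\perp(u) \end{bmatrix}_{n \times n}. \]
Then $\partial\phi^\perp(u)$ is perpendicular to the tangent space $T_{\phi(u)}$, and
\[ |A| = \|\partial\phi^\perp(u)\|^2 = \det\big( \phi'(u)^t \phi'(u) \big) > 0. \tag{12.5} \]

Proof. Let $m = n - 1$. For each $j$, the determinant
\[ D_j(u) := \begin{vmatrix} \partial_j\phi_1(u) & \cdots & \partial_j\phi_n(u) \\ \partial_1\phi_1(u) & \cdots & \partial_1\phi_n(u) \\ \vdots & & \vdots \\ \partial_m\phi_1(u) & \cdots & \partial_m\phi_n(u) \end{vmatrix} \]
has two identical rows and hence is zero. Expanding $D_j(u)$ along the first row and multiplying by $(-1)^m$ yields
\[ 0 = (-1)^m D_j(u) = (-1)^m \sum_{i=1}^{n} (-1)^{i+1}\,\partial_j\phi_i(u)\,\frac{\partial(\phi_1, \dots, \widehat{\phi_i}, \dots, \phi_n)}{\partial(u_1, \dots, u_{n-1})} = \partial_j\phi(u) \cdot \partial\phi^\perp(u). \]
Therefore, $\partial_j\phi(u) \cdot \partial\phi^\perp(u) = 0$ for each $j$, so $\partial\phi^\perp(u)$ is perpendicular to $T_{\phi(u)}$. To prove the first equality in (12.5), expand $|A|$ along the last row to obtain
\[ |A| = \sum_{i=1}^{n} \left[ \frac{\partial(\phi_1, \dots, \widehat{\phi_i}, \dots, \phi_n)}{\partial(u_1, \dots, u_m)} \right]^2 = \|\partial\phi^\perp(u)\|^2 > 0, \]
the positive inequality because $\phi'$ has rank $m$. For the second equality in (12.5), using what has already been established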


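we calculate
\[ \|\partial\phi^\perp(u)\|^4 = |A|^2 = |AA^t| = \begin{vmatrix} \partial_1\phi(u) \cdot \partial_1\phi(u) & \cdots & \partial_1\phi(u) \cdot \partial_m\phi(u) & 0 \\ \vdots & & \vdots & \vdots \\ \partial_m\phi(u) \cdot \partial_1\phi(u) & \cdots & \partial_m\phi(u) \cdot \partial_m\phi(u) & 0 \\ 0 & \cdots & 0 & \|\partial\phi^\perp(u)\|^2 \end{vmatrix} = \|\partial\phi^\perp(u)\|^2 \det\big( \phi'(u)^t \phi'(u) \big). \]
Dividing by $\|\partial\phi^\perp(u)\|^2 > 0$ gives the second equality.

12.3.5 Corollary. The frame $\big( d\phi_u(e_1), \dots, d\phi_u(e_{n-1}), \partial\phi^\perp(u) \big)$ is positively oriented in $\mathbb{R}^n$.

12.3.6 Theorem. Let $\phi : U \to \mathbb{R}^n$ be a parameterized (n − 1)-surface. The following statements are equivalent:

(a) $\phi$ is orientable.

(b) $\phi(u) = \phi(v) \Rightarrow \partial\phi^\perp(u) = c\,\partial\phi^\perp(v)$ for some $c > 0$.

(c) There exists a function $\vec{N}_\phi : \phi(U) \to \mathbb{R}^n$ (necessarily unique) such that
\[ \vec{N}_\phi(\phi(u)) = \|\partial\phi^\perp(u)\|^{-1}\,\partial\phi^\perp(u) = \frac{1}{\sqrt{\det\big( \phi'(u)^t \phi'(u) \big)}} \sum_{i=1}^{n} (-1)^{i+n}\,\frac{\partial(\phi_1, \dots, \widehat{\phi_i}, \dots, \phi_n)}{\partial(u_1, \dots, u_{n-1})}\,e_i. \tag{12.6} \]

Proof. For $u \in U$, let $T_u : \mathbb{R}^n \to \mathbb{R}^n$ denote the unique linear isomorphism such that $T_u(e_j) = d\phi_u(e_j)$, $1 \le j \le n - 1$, and $T_u(e_n) = \partial\phi^\perp(u)$. By Lemma 12.3.4, $\det T_u = \|\partial\phi^\perp(u)\|^2 > 0$. Suppose that $\phi(u) = \phi(v)$. Since $\partial\phi^\perp(u) \perp T_{\phi(u)}$ and $\partial\phi^\perp(v) \perp T_{\phi(v)}$,
\[ T_{\phi(v)} = T_{\phi(u)} \quad\text{iff}\quad \partial\phi^\perp(u) = c\,\partial\phi^\perp(v) \text{ for some } c \ne 0. \]
In this case, by (12.4),
\[ (T_v^{-1}T_u)(e_j) = d\phi_v^{-1}\big( d\phi_u(e_j) \big) = T_{uv}(e_j), \quad 1 \le j \le n - 1, \quad\text{and}\quad (T_v^{-1}T_u)(e_n) = T_v^{-1}\big( c\,\partial\phi^\perp(v) \big) = ce_n. \]
Thus the matrix of $T_v^{-1}T_u$ has columns $T_{uv}(e_1), \dots, T_{uv}(e_{n-1}), ce_n$. It follows that
\[ 0 < \det T_u / \det T_v = \det(T_v^{-1}T_u) = c \det T_{uv}. \tag{12.7} \]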


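With these preliminaries out of the way, assume that $\vec{N}_\phi$ exists and let $\phi(u) = \phi(v)$. Then
\[ \|\partial\phi^\perp(u)\|^{-1}\,\partial\phi^\perp(u) = \vec{N}_\phi(\phi(u)) = \vec{N}_\phi(\phi(v)) = \|\partial\phi^\perp(v)\|^{-1}\,\partial\phi^\perp(v), \]
hence $\partial\phi^\perp(u) = c\,\partial\phi^\perp(v)$ for some $c > 0$. By the first paragraph, $T_{\phi(u)} = T_{\phi(v)}$ and $\det T_{uv} > 0$. Therefore, $\phi$ is orientable. Conversely, assume that $\phi$ is orientable and let $\phi(u) = \phi(v)$. Then $T_{\phi(u)} = T_{\phi(v)}$, hence $\partial\phi^\perp(u) = c\,\partial\phi^\perp(v)$ for some $c \ne 0$. Since $\det T_{uv} > 0$, $c > 0$ by (12.7). Therefore,
\[ \|\partial\phi^\perp(u)\|^{-1}\,\partial\phi^\perp(u) = \|\partial\phi^\perp(v)\|^{-1}\,\partial\phi^\perp(v), \]
so $\vec{N}_\phi$ may be unambiguously defined by (12.6).

12.3.7 Special Cases.

(a) $n = 2$: Then $\partial\phi^\perp = (-\phi_2', \phi_1')$, the inward normal. (Figure 12.9.)

FIGURE 12.9: The inward unit normal.

(b) $n = 3$: Then
\[ \partial\phi^\perp = \left( \begin{vmatrix} \partial_1\phi_2 & \partial_1\phi_3 \\ \partial_2\phi_2 & \partial_2\phi_3 \end{vmatrix},\ -\begin{vmatrix} \partial_1\phi_1 & \partial_1\phi_3 \\ \partial_2\phi_1 & \partial_2\phi_3 \end{vmatrix},\ \begin{vmatrix} \partial_1\phi_1 & \partial_1\phi_2 \\ \partial_2\phi_1 & \partial_2\phi_2 \end{vmatrix} \right) = \partial_1\phi \times \partial_2\phi, \]
the familiar cross product of $\partial_1\phi$ and $\partial_2\phi$.

FIGURE 12.10: Normal vector to $S$ at $p$.

Thus the positively oriented frame $\big( d\phi_u(e_1), d\phi_u(e_2), \vec{N}_\phi(p) \big)$ is a right-handed system, as shown in Figure 12.10.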


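(c) Let $U \subseteq \mathbb{R}^{n-1}$ be open and let $g : U \to \mathbb{R}$ be $C^1$. Define
\[ \phi(u_1, \dots, u_{n-1}) = \big( u_1, \dots, u_{n-1}, g(u_1, \dots, u_{n-1}) \big). \]
Then $\phi(U)$ is the graph of $g$. Since $\phi$ is one-to-one, it is orientable. Also,
\[ \partial_j\phi = (0, \dots, 0, \underset{j}{1}, 0, \dots, 0, \partial_j g) \perp (-\partial_1 g, \dots, -\partial_j g, \dots, -\partial_{n-1} g, 1) \]
and, by elementary row operations,
\[ \begin{vmatrix} 1 & \cdots & 0 & \partial_1 g \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & \partial_{n-1} g \\ -\partial_1 g & \cdots & -\partial_{n-1} g & 1 \end{vmatrix} = (\partial_1 g)^2 + \cdots + (\partial_{n-1} g)^2 + 1. \]
Since this is positive, by uniqueness,
\[ \vec{N}_\phi \circ \phi = \frac{(-\partial_1 g, \dots, -\partial_{n-1} g, 1)}{\sqrt{(\partial_1 g)^2 + \cdots + (\partial_{n-1} g)^2 + 1}} = \frac{(-\nabla g, 1)}{\sqrt{\|\nabla g\|^2 + 1}}. \]
♦

12.3.8 Example. Let $r > 0$ and define
\[ \phi(\theta_1, \theta_2) = (r\sin\theta_1\cos\theta_2,\ r\sin\theta_1\sin\theta_2,\ r\cos\theta_1), \qquad \theta_1 \in (0, \pi),\ \theta_2 \in (0, 2\pi). \]
The image of $\phi$ is the sphere in $\mathbb{R}^3$ with radius $r$ and center $(0, 0, 0)$ and with the great circle $(r\sin\theta_1, 0, r\cos\theta_1)$ (that is, $\theta_2 = 0$) through the poles $(0, 0, \pm r)$ missing. Since
\[ \partial_1\phi(\theta_1, \theta_2) = r(\cos\theta_1\cos\theta_2,\ \cos\theta_1\sin\theta_2,\ -\sin\theta_1) \quad\text{and}\quad \partial_2\phi(\theta_1, \theta_2) = r(-\sin\theta_1\sin\theta_2,\ \sin\theta_1\cos\theta_2,\ 0), \]
by 12.3.7
\[ \partial\phi^\perp(\theta_1, \theta_2) = \partial_1\phi(\theta_1, \theta_2) \times \partial_2\phi(\theta_1, \theta_2) = r\sin\theta_1\,(r\sin\theta_1\cos\theta_2,\ r\sin\theta_1\sin\theta_2,\ r\cos\theta_1) = (r\sin\theta_1)\,\phi(\theta_1, \theta_2). \]
Therefore,
\[ (\vec{N}_\phi \circ \phi)(\theta_1, \theta_2) = \frac{\phi(\theta_1, \theta_2)}{\|\phi(\theta_1, \theta_2)\|} = r^{-1}\phi(\theta_1, \theta_2), \]
that is,
\[ \vec{N}_\phi(p) = \frac{p}{\|p\|}, \qquad p \in S. \]
♦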
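The computation in Example 12.3.8 and the identity $\|\partial\phi^\perp\|^2 = \det(\phi'^t\phi')$ of Lemma 12.3.4 can be spot-checked numerically. The Python sketch below does this at one sample point; the radius `r = 2.0`, the angles, and the helper names are arbitrary assumptions made for the illustration. For $n = 3$ the Gram determinant $\det(\phi'^t\phi')$ reduces to $(\partial_1\phi \cdot \partial_1\phi)(\partial_2\phi \cdot \partial_2\phi) - (\partial_1\phi \cdot \partial_2\phi)^2$.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere(r, t1, t2):
    """phi(theta_1, theta_2) from Example 12.3.8."""
    return (r * math.sin(t1) * math.cos(t2), r * math.sin(t1) * math.sin(t2), r * math.cos(t1))

def d1_sphere(r, t1, t2):
    """Partial derivative of phi with respect to theta_1."""
    return (r * math.cos(t1) * math.cos(t2), r * math.cos(t1) * math.sin(t2), -r * math.sin(t1))

def d2_sphere(r, t1, t2):
    """Partial derivative of phi with respect to theta_2."""
    return (-r * math.sin(t1) * math.sin(t2), r * math.sin(t1) * math.cos(t2), 0.0)

# Assumed sample point and radius.
r, t1, t2 = 2.0, 0.7, 1.9
p = sphere(r, t1, t2)
a, b = d1_sphere(r, t1, t2), d2_sphere(r, t1, t2)

normal = cross(a, b)                                      # d_1 phi x d_2 phi
scaled_position = tuple(r * math.sin(t1) * c for c in p)  # (r sin theta_1) phi
gram_det = dot(a, a) * dot(b, b) - dot(a, b) ** 2         # det(phi'^t phi')
unit_normal = tuple(c / math.sqrt(dot(normal, normal)) for c in normal)

print(normal, scaled_position)
print(dot(normal, normal), gram_det)
print(unit_normal, tuple(c / r for c in p))
```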


FIGURE 12.11: Surface of revolution.

12.3.9 Example. Let $I$ be an open interval and $\psi : I \to \mathbb{R}^2$ a smooth curve with $\psi_2(t) > 0$ for $t \in I$. The parameterized surface of revolution in $\mathbb{R}^3$ is defined by
\[ \phi(t, \theta) = (\psi_1(t),\ \psi_2(t)\cos\theta,\ \psi_2(t)\sin\theta), \qquad t \in I,\ \theta \in \mathbb{R}. \]
From 12.3.7 and the calculations
\[ \partial_1\phi(t, \theta) = (\psi_1'(t),\ \psi_2'(t)\cos\theta,\ \psi_2'(t)\sin\theta), \qquad \partial_2\phi(t, \theta) = (0,\ -\psi_2(t)\sin\theta,\ \psi_2(t)\cos\theta), \]
we have
\[ \partial\phi^\perp(t, \theta) = \psi_2(t)\,\big( \psi_2'(t),\ -\psi_1'(t)\cos\theta,\ -\psi_1'(t)\sin\theta \big) \tag{12.8} \]
and
\[ \partial\psi^\perp(t) = (-\psi_2'(t), \psi_1'(t)). \]
Now suppose that $\psi$ is orientable. We claim that $\phi$ is then orientable. To see this, suppose that $\phi(t_1, \theta_1) = \phi(t_2, \theta_2)$. Then $\psi_1(t_1) = \psi_1(t_2)$, and because $\psi_2(t) > 0$, $\psi_2(t_1) = \psi_2(t_2)$ and hence $\theta_2 = \theta_1 + 2k\pi$. By orientability of $\psi$,
\[ (-\psi_2'(t_2), \psi_1'(t_2)) = \partial\psi^\perp(t_2) = c\,\partial\psi^\perp(t_1) = c\,(-\psi_2'(t_1), \psi_1'(t_1)) \]
for some $c > 0$. It follows from (12.8) that
\[ \partial\phi^\perp(t_2, \theta_2) = \psi_2(t_2)\,\big( \psi_2'(t_2),\ -\psi_1'(t_2)\cos\theta_2,\ -\psi_1'(t_2)\sin\theta_2 \big) = c\,\psi_2(t_1)\,\big( \psi_2'(t_1),\ -\psi_1'(t_1)\cos\theta_1,\ -\psi_1'(t_1)\sin\theta_1 \big) = c\,\partial\phi^\perp(t_1, \theta_1), \]
which shows that $\phi$ is orientable. Moreover, from (12.8),
\[ \|\partial\phi^\perp(t, \theta)\| = \psi_2(t)\,\|\psi'(t)\|, \]


hence
\[ \vec{N}_\phi(\phi(t, \theta)) = \frac{\partial\phi^\perp(t, \theta)}{\|\partial\phi^\perp(t, \theta)\|} = \|\psi'(t)\|^{-1}\big( \psi_2'(t),\ -\psi_1'(t)\cos\theta,\ -\psi_1'(t)\sin\theta \big), \]
which is the rotation of the unit normal vector $-\vec{N}_\psi$ about the $x_1$ axis. For the special case $\psi(x) = (x, f(x))$,
\[ \phi(x, \theta) = (x,\ f(x)\cos\theta,\ f(x)\sin\theta) \quad\text{and}\quad \vec{N}_\phi(\phi(x, \theta)) = \big( [f'(x)]^2 + 1 \big)^{-1/2}\big( f'(x),\ -\cos\theta,\ -\sin\theta \big). \]
A point $(x, y, z)$ on the surface $S = \phi(U)$ and not on the graph of $f$ may be written uniquely as $\big( x,\ f(x)\cos\theta(y, z),\ f(x)\sin\theta(y, z) \big)$, where $0 < \theta(y, z) < 2\pi$ is the (continuous) argument of $(y, z)$ determined by $\theta_0 = 0$ (see 9.4.6). Therefore,
\[ \vec{N}_\phi(x, y, z) = \big( [f'(x)]^2 + 1 \big)^{-1/2}\big( f'(x),\ -\cos\theta(y, z),\ -\sin\theta(y, z) \big), \]
which is continuous on $S$ by the periodicity of sine and cosine.

♦
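Formula (12.8) for a surface of revolution can be compared with a finite-difference computation of $\partial_1\phi \times \partial_2\phi$. The Python sketch below assumes, purely for illustration, the profile curve $\psi(t) = (t, \cosh t)$, which generates a catenoid; the step size `h`, the sample point, and the helper names are likewise assumptions. It also checks the relation $\|\partial\phi^\perp\| = \psi_2\|\psi'\|$.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def psi(t):
    """Assumed profile curve (t, cosh t); its second component is positive."""
    return (t, math.cosh(t))

def dpsi(t):
    """Derivative of the profile curve, computed by hand."""
    return (1.0, math.sinh(t))

def phi(t, theta):
    """Surface of revolution generated by psi, as in Example 12.3.9."""
    p1, p2 = psi(t)
    return (p1, p2 * math.cos(theta), p2 * math.sin(theta))

def partial(fn, t, theta, index, h=1e-6):
    """Central-difference approximation of the partial derivative of fn
    with respect to its first (index 0) or second (index 1) argument."""
    if index == 0:
        hi, lo = fn(t + h, theta), fn(t - h, theta)
    else:
        hi, lo = fn(t, theta + h), fn(t, theta - h)
    return tuple((x - y) / (2.0 * h) for x, y in zip(hi, lo))

t0, theta0 = 0.4, 2.1
numeric = cross(partial(phi, t0, theta0, 0), partial(phi, t0, theta0, 1))

d1, d2 = dpsi(t0)
formula = tuple(psi(t0)[1] * c
                for c in (d2, -d1 * math.cos(theta0), -d1 * math.sin(theta0)))  # (12.8)

norm_numeric = math.sqrt(sum(c * c for c in numeric))
norm_formula = psi(t0)[1] * math.hypot(d1, d2)  # psi_2 * ||psi'||

print(numeric, formula)
print(norm_numeric, norm_formula)
```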

12.3.10 Example. The parameterized Möbius strip is defined by
\[ \phi(t, \theta) = \big( (2 + t\cos\tfrac{1}{2}\theta)\cos\theta,\ (2 + t\cos\tfrac{1}{2}\theta)\sin\theta,\ t\sin\tfrac{1}{2}\theta \big), \]
where $-1 < t < 1$ and $\theta \in \mathbb{R}$. The surface may be concretely realized by taking one end of a long strip of paper, giving it a half-twist, and gluing it to the other end.

FIGURE 12.12: Möbius strip.

The Möbius strip is not orientable. Indeed, $\phi(0, 0) = \phi(0, 2\pi)$, but since
\[ \partial_1\phi(0, 0) = -\partial_1\phi(0, 2\pi) = (1, 0, 0) \quad\text{and}\quad \partial_2\phi(0, 0) = \partial_2\phi(0, 2\pi) = (0, 2, 0), \]
we see that
\[ \partial_1\phi(0, 0) \times \partial_2\phi(0, 0) = (0, 0, 2) = -\partial_1\phi(0, 2\pi) \times \partial_2\phi(0, 2\pi). \]
Therefore, $\vec{N}_\phi$ cannot exist.

♦
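The failure of orientability in Example 12.3.10 can be seen by evaluating the normal $\partial_1\phi \times \partial_2\phi$ at the two parameter points $(0, 0)$ and $(0, 2\pi)$, which map to the same point of the strip. The Python sketch below uses partial derivatives computed by hand from the parametrization as written in the example (with the constant 2); the function names are illustrative assumptions. The two normals come out as negatives of each other, so condition (b) of Theorem 12.3.6 fails.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def mobius(t, theta):
    """phi(t, theta) from Example 12.3.10."""
    rho = 2.0 + t * math.cos(theta / 2.0)
    return (rho * math.cos(theta), rho * math.sin(theta), t * math.sin(theta / 2.0))

def d1_mobius(t, theta):
    """Partial derivative with respect to t."""
    c = math.cos(theta / 2.0)
    return (c * math.cos(theta), c * math.sin(theta), math.sin(theta / 2.0))

def d2_mobius(t, theta):
    """Partial derivative with respect to theta."""
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    rho = 2.0 + t * c
    return (-0.5 * t * s * math.cos(theta) - rho * math.sin(theta),
            -0.5 * t * s * math.sin(theta) + rho * math.cos(theta),
            0.5 * t * c)

same_point = (mobius(0.0, 0.0), mobius(0.0, 2.0 * math.pi))
normal_at_0 = cross(d1_mobius(0.0, 0.0), d2_mobius(0.0, 0.0))
normal_at_2pi = cross(d1_mobius(0.0, 2.0 * math.pi), d2_mobius(0.0, 2.0 * math.pi))

print(same_point)
print(normal_at_0, normal_at_2pi)  # approximately (0, 0, 2) and (0, 0, -2)
```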


Exercises

1. Assuming that $\mathbb{R}^3$ has the standard orientation, find the sign of the frames
(a)S $(e_1 + e_2,\ e_2 + e_3,\ e_3 + e_1)$.
(b) $(-e_1 + e_2 + e_3,\ e_1 - e_2 + e_3,\ e_1 + e_2 - e_3)$.

2. Show that the frames $(e_1 + e_2 + e_3,\ 2e_1 + e_2 + 3e_3)$ and $(e_1 + 3e_2 - e_3,\ e_1 + 4e_2 - 2e_3)$ in $\mathbb{R}^3$ span the same subspace but have opposite orientations.

3. Let $\phi : U \to \mathbb{R}^n$ be a parameterized m-surface and $\psi = \phi \circ \alpha : V \to \mathbb{R}^n$ a reparametrization of $\phi$. Show that $\phi$ is orientable iff $\psi$ is orientable.

4. Let $\phi : U \to \mathbb{R}^n$ be an orientable parameterized (n − 1)-surface and let $\psi = \phi \circ \alpha : V \to \mathbb{R}^n$ be a reparametrization of $\phi$. Find $\partial\psi^\perp$ in terms of $\partial\phi^\perp$. Use the result to show that $\vec{N}_\psi = \vec{N}_\phi$ on $S := \phi(U) = \psi(V)$.

5.S Use 12.3.9 to find $\vec{N}_\phi(x, y, z)$ for the torus
\[ \phi(\varphi, \theta) = \big( a\cos\varphi,\ (b + a\sin\varphi)\cos\theta,\ (b + a\sin\varphi)\sin\theta \big), \qquad 0 < \theta, \varphi < 2\pi, \]
where $0 < a < b$.

6. Find $\vec{N}_\phi(x, y, z)$ for the following orientable 2-surfaces in $\mathbb{R}^3$:
(a)S $\phi(t, \theta) = (t\cos\theta,\ t\sin\theta,\ t)$, $t > 0$, $\theta \in \mathbb{R}$.
(b) $\phi(t, \theta) = (\sinh t,\ \cosh t\cos\theta,\ \cosh t\sin\theta)$, $t, \theta \in \mathbb{R}$ (hyperboloid of one sheet).
(c) $\phi(t, \theta) = (\cosh t,\ \sinh t\cos\theta,\ \sinh t\sin\theta)$, $t, \theta \in \mathbb{R}$ (one sheet of a hyperboloid of two sheets).
(d) $\phi(t, \theta) = (t\cos\theta,\ t\sin\theta,\ \theta)$, $t > 0$, $\theta \in \mathbb{R}$ (helicoid).
(e) $\phi(t, \theta) = (t\cos\theta,\ t\sin\theta,\ \theta^2)$, $t > 0$, $\theta > 0$.
(f)S $\phi(t, s) = (1 - s)(a\cos t,\ a\sin t,\ 0) + s(b\cos t,\ b\sin t,\ 1)$, $0 < s < 1$, where $0 < a < b$.

7.S Let $V \subseteq \mathbb{R}^{n-2}$ be open and let $\psi : V \to \mathbb{R}^{n-1}$ be an (n−2)-parameterized surface in $\mathbb{R}^{n-1}$. Define the cylinder $\phi$ over $\psi$ by
\[ \phi(v, s) = (\psi(v), s), \qquad v \in V,\ s \in (a, b). \]
Show that
(a) $\phi$ is a parameterized (n − 1)-surface in $\mathbb{R}^n$.
(b) $\partial\phi^\perp(u_1, \dots, u_{n-1}) = \big( \partial\psi^\perp(u_1, \dots, u_{n-2}),\ 0 \big)$.
(c) $\phi$ is orientable iff $\psi$ is orientable, in which case $\vec{N}_\phi(x_1, \dots, x_n) = \big( \vec{N}_\psi(x_1, \dots, x_{n-1}),\ 0 \big)$.

8. Let $V \subseteq \mathbb{R}^{n-2}$ be open and let $\psi : V \to \mathbb{R}^{n-1}$ be an (n−2)-parameterized surface in $\mathbb{R}^{n-1}$. Define the cone over $\psi$ by
\[ \phi(v, s) = \big( (1 - s)\psi(v),\ s \big), \qquad v \in V,\ 0 < s < 1. \]
Show that
(a) $\phi$ is a parameterized (n − 1)-surface in $\mathbb{R}^n$.
(b) $\partial\phi^\perp(v, s) = \big( (1 - s)^{n-2}\,\partial\psi^\perp(v),\ D(v, s) \big)$, where
\[ D(v, s) = \begin{vmatrix} (1 - s)a_{1,1} & \cdots & (1 - s)a_{1,n-2} & -\psi_1(v) \\ (1 - s)a_{2,1} & \cdots & (1 - s)a_{2,n-2} & -\psi_2(v) \\ \vdots & & \vdots & \vdots \\ (1 - s)a_{n-1,1} & \cdots & (1 - s)a_{n-1,n-2} & -\psi_{n-1}(v) \end{vmatrix} \]
and $[a_{i,j}]_{(n-1) \times (n-2)} = \psi'(v)$.

12.4 m-Dimensional Surfaces

Let $1 \le m < n$ and let $V \subseteq \mathbb{R}^n$ be open. Suppose that the function $F = (F_1, \dots, F_{n-m}) : V \to \mathbb{R}^{n-m}$ is $C^1$ on $V$ such that the $(n - m) \times n$ matrix $F'(x)$ has rank $n - m$ at each point $x \in V$. A set of the form
\[ S = \{x \in V : F(x) = c\}, \]
where $c \in \mathbb{R}^{n-m}$, is called an m-dimensional level surface of $F$ or simply an m-surface in $\mathbb{R}^n$. By replacing $F$ by $F - c$, we may (and hereafter shall) take $c = 0$.

Local Parametrization of an m-Surface

The following theorem shows that an m-surface may be “patched together” from a collection of one-to-one parameterized m-surfaces. This will be an important tool in the development of a theory of integration on m-surfaces.


12.4.1 Theorem. Let $S = \{x \in V : F(x) = 0\}$ be an m-surface in $\mathbb{R}^n$.

(a) For each $a \in S$ there exist open sets $U_a \subseteq \mathbb{R}^m$ and $V_a \subseteq \mathbb{R}^n$ with $a \in V_a$, and a one-to-one parameterized m-surface $\phi_a$ from $U_a$ onto $S_a := S \cap V_a$.

(b) Each $\phi_a^{-1}$ is the restriction to $S_a$ of a $C^1$ map on $V_a$.

(c) If $S_a \cap S_b \ne \emptyset$, then the mapping
\[ \phi_{ab} := \phi_b^{-1} \circ \phi_a : \phi_a^{-1}(S_a \cap S_b) \to \phi_b^{-1}(S_a \cap S_b) \]
is $C^1$ with inverse $\phi_{ba}$.

(d) The mappings $\phi_a$ may be chosen so that $0 \in U_a$ and $\phi_a(0) = a$.

Proof. If (a)–(c) of the theorem hold and $a = \phi_a(u_0)$, then (d) may be achieved by replacing $U_a$ by $U_a - u_0$ and $\phi_a$ by $\phi_a(u + u_0)$, $u \in U_a - u_0$.

We prove (a)–(c) first for the case $m = n - 1$, that is, for $F$ real-valued, and then outline the proof for the general case. Since $F'(a)$ has rank 1, $\partial_i F(a) \ne 0$ for some index $i$ (which typically depends on $a$). Define a $C^1$ map $G_a : V \to \mathbb{R}^n$ by
\[ G_a(x_1, \dots, x_n) = \big( x_1, \dots, x_{i-1}, F(x_1, \dots, x_n), x_{i+1}, \dots, x_n \big). \]
Thus $G_a$ simply replaces the $i$th coordinate of its argument $x$ by $F(x)$. Note that $G_a'(x)$ is the identity matrix with row $i$ replaced by $\nabla F(x)$. A standard row reduction shows that $J_{G_a}(a) = \partial_i F(a)$. Since this is nonzero, by the inverse function theorem there exist open sets $V_a \subseteq V$ and $W_a = G_a(V_a)$ in $\mathbb{R}^n$ with $a \in V_a$ such that $G_a$ is one-to-one on $V_a$ and $G_a^{-1} : W_a \to V_a$ is $C^1$. Taking smaller $W_a$ and $V_a$ if necessary, we may suppose that
\[ W_a = (\alpha_1, \beta_1) \times \cdots \times (\alpha_n, \beta_n). \]
Note that $0 \in (\alpha_i, \beta_i)$, since $(a_1, \dots, a_{i-1}, 0, a_{i+1}, \dots, a_n) = G_a(a) \in W_a$. Now let $(u_1, \dots, u_n) \in W_a$ and set $(v_1, \dots, v_n) = G_a^{-1}(u_1, \dots, u_n)$. Then
\[ (u_1, \dots, u_i, \dots, u_n) = G_a(v_1, \dots, v_n) = \big( v_1, \dots, v_{i-1}, F(v_1, \dots, v_n), v_{i+1}, \dots, v_n \big) = \big( u_1, \dots, u_{i-1}, (F \circ G_a^{-1})(u_1, \dots, u_n), u_{i+1}, \dots, u_n \big), \]
hence
\[ (F \circ G_a^{-1})(u_1, \dots, u_n) = u_i \tag{12.9} \]
and, in particular,
\[ (F \circ G_a^{-1})(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_n) = 0. \tag{12.10} \]
Now set
\[ U_a := (\alpha_1, \beta_1) \times \cdots \times (\alpha_{i-1}, \beta_{i-1}) \times (\alpha_{i+1}, \beta_{i+1}) \times \cdots \times (\alpha_n, \beta_n) \]


and define $\phi_a : U_a \to \mathbb{R}^n$ by
\[ \phi_a(u_1, \dots, u_{n-1}) = G_a^{-1}(u_1, \dots, u_{i-1}, 0, u_i, \dots, u_{n-1}). \]
By (12.10), $F\big( \phi_a(u_1, \dots, u_{n-1}) \big) = 0$, hence $\phi_a(U_a) \subseteq S_a$. Conversely, let $(v_1, \dots, v_n) \in S_a$ and set $(u_1, \dots, u_n) = G_a(v_1, \dots, v_n) \in W_a$. By (12.9),
\[ u_i = (F \circ G_a^{-1})(u_1, \dots, u_n) = F(v_1, \dots, v_n) = 0, \]
hence
\[ (v_1, \dots, v_n) = G_a^{-1}(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_n) = \phi_a(u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n). \]
Therefore, $\phi_a(U_a) = S_a$.

FIGURE 12.13: The mapping $G_a^{-1}$.

Now define the injection mapping $\iota_a : U_a \to W_a$ and the projection mapping $\pi_a : W_a \to \mathbb{R}^{n-1}$, respectively, by
\[ \iota_a(u_1, \dots, u_{n-1}) = (u_1, \dots, u_{i-1}, 0, u_i, \dots, u_{n-1}) \quad\text{and}\quad \pi_a(v_1, \dots, v_n) = (v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n). \]
Then $\pi_a \circ \iota_a : U_a \to U_a$ is the identity function and $\phi_a = G_a^{-1} \circ \iota_a$. Since $G_a^{-1}$ has rank $n$ and $\iota_a$ has rank $n - 1$, $\phi_a$ has rank $n - 1$. Also, if $v = \phi_a(u)$, then
\[ (\pi_a \circ G_a)(v) = (\pi_a \circ G_a \circ \phi_a)(u) = (\pi_a \circ \iota_a)(u) = u = \phi_a^{-1}(v), \]
which shows that $\phi_a^{-1} : S_a \to U_a$ is the restriction to $S_a$ of the $C^1$ function $\pi_a \circ G_a : V_a \to U_a$.

Now let $b \in S$ and $S_a \cap S_b \ne \emptyset$. Then $G_b \circ G_a^{-1}$ maps the open set $G_a(V_a \cap V_b)$ onto the open set $G_b(V_a \cap V_b)$. Also, in the preceding notation, $\phi_a = G_a^{-1} \circ \iota_a$ on $U_a$ and $\phi_b^{-1} = \pi_b \circ G_b$ on $S_b$, hence
\[ \phi_b^{-1} \circ \phi_a = \pi_b \circ G_b \circ G_a^{-1} \circ \iota_a, \]
which maps the open set $\phi_a^{-1}(S_b \cap S_a) \subseteq U_a$ onto the open set $\phi_b^{-1}(S_b \cap S_a) \subseteq U_b$ and is $C^1$ with $C^1$ inverse $\phi_a^{-1} \circ \phi_b$. This verifies the theorem for the case $m = n - 1$.


In the general case, there exist indices $i_1 < \cdots < i_k$ in $\{1,\dots,n\}$ such that
\[ \frac{\partial(F_1,\dots,F_k)}{\partial(x_{i_1},\dots,x_{i_k})}(a) \ne 0, \]
where $k := n - m$. Let $i_1' < i_2' < \cdots < i_m'$ denote the complementary indices. (In the above case, these were the indices $1,\dots,i-1,i+1,\dots,n$.) Define $G_a(x_1,\dots,x_n)$ to be the $n$-tuple $(x_1,\dots,x_n)$ with the coordinates $x_{i_1},\dots,x_{i_k}$ replaced by $F_1(x),\dots,F_k(x)$. Then $J_{G_a}(a) \ne 0$, so the sets $V_a$ and $W_a$ may be obtained as before. Define
\[ U_a = (\alpha_{i_1'},\beta_{i_1'}) \times \cdots \times (\alpha_{i_m'},\beta_{i_m'}) \subseteq \mathbb{R}^m \]
and the injection mapping $\iota_a : U_a \to W_a$ by
\[ \iota_a(u_1,u_2,\dots,u_m) = (v_1,v_2,\dots,v_n), \quad\text{where } v_{i_j} = 0,\ 1 \le j \le k, \text{ and } v_{i_j'} = u_j,\ 1 \le j \le m. \]
Thus $\iota_a$ places zeros in the coordinate positions $i_1 < \cdots < i_k$ and fills the complementary positions by $u_1,\dots,u_m$. Finally, define the projection mapping $\pi_a : V_a \to \mathbb{R}^m$ by
\[ \pi_a(v_1,\dots,v_n) = (v_{i_1'},\dots,v_{i_m'}). \]
The proof then proceeds as before.

FIGURE 12.14: Transition mappings.

The functions $\varphi_a : U_a \to S_a$ in the theorem are called local parametrizations of $S$, and the $C^1$ functions $\varphi_{ab}$ are called transition mappings. The sets $S_a$ are called surface elements. A collection of local parameterizations of $S$ whose surface elements cover $S$ is called an atlas for $S$. Note that if $F$ is $C^r$ then, as an examination of the proof reveals, the local parameterizations and the transition maps are $C^r$ as well.


12.4.2 Example. Consider the $(n-1)$-sphere $S := \{y \in \mathbb{R}^n : \|y\| = 1\}$ with north and south poles $p := (0,\dots,0,1)$ and $q := (0,\dots,0,-1)$. Let the points $y = (y_1,\dots,y_n)$ and $x = (x_1,\dots,x_{n-1})$ be related as in Figure 12.15.

FIGURE 12.15: Stereographic projection from $p$.

Then for some $t$,
\[ (x_1,\dots,x_{n-1},-1) = (x,0) - p = t(y - p) = \bigl(ty_1,\dots,ty_{n-1},\, t(y_n - 1)\bigr), \]
hence
\[ (x_1,\dots,x_{n-1}) = \frac{1}{1-y_n}\,(y_1,\dots,y_{n-1}), \quad -1 \le y_n < 1. \]
The mapping
\[ x = \varphi^{-1}(y) = \frac{1}{1-y_n}\,(y_1,\dots,y_{n-1}), \quad y_n < 1, \]
from $S \setminus \{p\}$ onto $\mathbb{R}^{n-1}$, is called the stereographic projection from $p$ onto the equatorial hyperplane $x_n = 0$. One readily checks that the inverse of this mapping is given by
\[ y = \varphi(x) = \frac{1}{1+\|x\|^2}\,\bigl(2x_1,\dots,2x_{n-1},\, \|x\|^2 - 1\bigr), \quad x \in \mathbb{R}^{n-1}. \]
Similarly, the stereographic projection from $q$ is given by
\[ x = \tilde\varphi^{-1}(y) = \frac{1}{1+y_n}\,(y_1,\dots,y_{n-1}), \quad y_n > -1, \]
with inverse
\[ y = \tilde\varphi(x) = \frac{1}{1+\|x\|^2}\,\bigl(2x_1,\dots,2x_{n-1},\, 1 - \|x\|^2\bigr). \]
The set $\{\varphi, \tilde\varphi\}$ is an atlas for $S$. The transition mapping from $\mathbb{R}^{n-1}\setminus\{0\}$ to $\mathbb{R}^{n-1}\setminus\{0\}$ is the self-inverse mapping
\[ (\varphi^{-1}\circ\tilde\varphi)(x) = \frac{x}{\|x\|^2}. \qquad ♦ \]
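The formulas of this example are easy to check numerically. The following Python sketch (function names are ours, and the sample point is arbitrary) verifies at one point of $\mathbb{R}^3$ that $\varphi(x)$ lies on the sphere, that $\varphi^{-1}\circ\varphi$ is the identity, and that the transition mapping agrees with $x/\|x\|^2$ and is self-inverse.

```python
import math

def phi(x):
    """Inverse of the stereographic projection from the north pole p."""
    s = sum(t * t for t in x)
    return [2 * t / (1 + s) for t in x] + [(s - 1) / (1 + s)]

def phi_inv(y):
    """Stereographic projection from p; requires y_n < 1."""
    return [t / (1 - y[-1]) for t in y[:-1]]

def phi_tilde(x):
    """Inverse of the stereographic projection from the south pole q."""
    s = sum(t * t for t in x)
    return [2 * t / (1 + s) for t in x] + [(1 - s) / (1 + s)]

def transition(x):
    """The composite phi^{-1} o phi_tilde, defined for x != 0."""
    return phi_inv(phi_tilde(x))

x = [0.5, -2.0, 1.5]                        # sample point of R^3, so S is the 3-sphere in R^4
y = phi(x)
norm_y = math.sqrt(sum(t * t for t in y))   # should equal 1 up to rounding
s = sum(t * t for t in x)
expected = [t / s for t in x]               # x / ||x||^2
```

Comparing `transition(x)` with `expected`, and `transition(transition(x))` with `x`, confirms the last display of the example at this sample point.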


Tangent Space of an m-Surface

The local parameterizations $\varphi_a$ of an $m$-surface $S = \{x : F(x) = 0\}$ may be used to construct a tangent space at each point $a \in S$. Let $\varphi_a(u) = \varphi_b(v) \in S_a \cap S_b$. Then $v = \varphi_{ab}(u)$ and, by the chain rule applied to $\varphi_a = \varphi_b\circ\varphi_{ab}$,
\[ d(\varphi_a)_u = d(\varphi_b)_v \circ d(\varphi_{ab})_u. \]
Since $d(\varphi_{ab})_u : \mathbb{R}^m \to \mathbb{R}^m$ is an isomorphism, the vectors $d_j := d(\varphi_{ab})_u(e_j)$ form a basis of $\mathbb{R}^m$. Therefore, we have the mapping of $\mathbb{R}^m$-frames
\[ d(\varphi_a)_u(e_1,\dots,e_m) = d(\varphi_b)_v(d_1,\dots,d_m), \tag{12.11} \]

which shows that $T_{\varphi_b(v)} = T_{\varphi_a(u)}$ and hence makes the following definition meaningful.

12.4.3 Definition. The tangent space $T_x$ to $S$ at a point $x \in S$ is defined as $T_{\varphi_a(u)}$, where $\varphi_a$ is any local parametrization of $S$ with $\varphi_a(u) = x$. ♦

The next proposition gives an intrinsic characterization of tangent space.

12.4.4 Proposition. For $x \in S$ let $\Lambda_x$ denote the set of all vectors in $\mathbb{R}^n$ of the form $\alpha'(0)$, where $\alpha : (-r,r) \to S$ is a $C^1$ curve with $\alpha(0) = x$. Then
\[ T_x = \Lambda_x = \{z \in \mathbb{R}^n : dF_x(z) = 0\} = \bigcap_{i=1}^{n-m} \{z \in \mathbb{R}^n : \nabla F_i(x)\cdot z = 0\}. \]

Proof. Let $\varphi$ be a local parametrization of $S$ with $\varphi(u) = x$. A member of $T_x$ is of the form $z = \sum_{i=1}^m a_i\, d\varphi_u(e_i)$. For small $|t|$, the curve
\[ \alpha(t) = \varphi\Bigl(u + t\sum_{i=1}^m a_i e_i\Bigr) \]
lies in $S$, $\alpha(0) = x$, and, by the chain rule,
\[ \alpha'(0) = d\varphi_u\Bigl(\sum_{i=1}^m a_i e_i\Bigr) = \sum_{i=1}^m a_i\, d\varphi_u(e_i) = z. \]
Therefore, $z \in \Lambda_x$. On the other hand, if $\alpha'(0) \in \Lambda_x$, then differentiating the identity $(F\circ\alpha)(t) = 0$ at $t = 0$ yields $dF_x\bigl(\alpha'(0)\bigr) = 0$. We have shown that
\[ T_x \subseteq \Lambda_x \subseteq \{z : dF_x(z) = 0\}. \]
Since $T_x$ and the solution space $\{z : dF_x(z) = 0\}$ both have dimension $m$, the three spaces must be equal.
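As an illustration of the proposition, the sketch below takes the unit sphere in $\mathbb{R}^3$ with $F(y) = \|y\|^2 - 1$ and the parametrization $\varphi$ of 12.4.2 (our choice of example; the helper names and the finite-difference step are assumptions). It approximates $d\varphi_u(e_1)$ and $d\varphi_u(e_2)$ by central differences and checks that they, and a linear combination $z$ of them, are annihilated by $\nabla F(x)$, and that $z$ is the velocity at $t = 0$ of the curve $\alpha(t) = \varphi(u + ta)$.

```python
def phi(u):
    # Inverse stereographic projection from the north pole (Example 12.4.2, n = 3).
    s = sum(t * t for t in u)
    return [2 * t / (1 + s) for t in u] + [(s - 1) / (1 + s)]

def grad_F(x):
    # Gradient of F(y) = ||y||^2 - 1.
    return [2 * t for t in x]

def dphi(u, a, h=1e-6):
    # Central-difference approximation of d(phi)_u(a), the derivative of phi at u along a.
    up = [ui + h * ai for ui, ai in zip(u, a)]
    um = [ui - h * ai for ui, ai in zip(u, a)]
    return [(p - m) / (2 * h) for p, m in zip(phi(up), phi(um))]

def dot(v, w):
    return sum(p * q for p, q in zip(v, w))

u = [0.3, -0.7]
x = phi(u)
z1 = dphi(u, [1.0, 0.0])                   # approximates d(phi)_u(e_1)
z2 = dphi(u, [0.0, 1.0])                   # approximates d(phi)_u(e_2)
a = [2.0, -1.0]
z = [a[0] * p + a[1] * q for p, q in zip(z1, z2)]   # a member of T_x
alpha_prime = dphi(u, a)                   # alpha'(0) for alpha(t) = phi(u + t a)
residuals = [dot(grad_F(x), w) for w in (z1, z2, z)]
```

All three residuals are zero up to discretization error, and `alpha_prime` agrees with `z`, which is the inclusion $T_x \subseteq \Lambda_x \subseteq \{z : dF_x(z) = 0\}$ at this point.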


12.4.5 Remark. The proposition shows that if $S_1$ is an $m_1$-surface, $S_2$ is an $m_2$-surface, and $\psi : S_1 \to S_2$ is $C^1$, then for $x \in S_1$ and $y = \psi(x)$ the function $d\psi_x$ maps $T_x$ into $T_y$. Indeed, if $v \in T_x$, then there exists a smooth curve $\alpha_1 : (-1,1) \to S_1$ with $\alpha_1(0) = x$ and $\alpha_1'(0) = v$. Then $\alpha_2 := \psi\circ\alpha_1$ is a smooth curve in $S_2$ and
\[ d\psi_x(v) = (\psi\circ\alpha_1)'(0) = \alpha_2'(0) \in T_y. \qquad ♦ \]

FIGURE 12.16: The mapping $d\psi_x : T_x \to T_y$.

Orientation of an m-Surface

Let $S$ be an $m$-surface with local parameterizations $\varphi_a : U_a \to \mathbb{R}^n$. Since $\varphi_a$ is one-to-one, it is orientable. Suppose the parameterizations have the same orientation, that is, $\operatorname{sign}(\varphi_a) = \operatorname{sign}(\varphi_b)$ for all $a$ and $b$. If $u \in U_a$, $v \in U_b$, and $\varphi_a(u) = \varphi_b(v)$, then (12.11) shows that the orientation of $T_{\varphi_b(v)}$ agrees with that of $T_{\varphi_a(u)}$ iff $J_{\varphi_{ab}}(u) > 0$. Thus if $J_{\varphi_{ab}} > 0$ whenever $S_a \cap S_b \ne \emptyset$, then $S$ may be given a well-defined orientation via the orientations of the local parameterizations. In this case, $S$ is said to be orientable. The positive orientation is obtained if each local parametrization is positively oriented.

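Orientation of an (n − 1)-Surface

Orientability of an $(n-1)$-surface may be characterized in terms of the normal vector fields $\vec N_{\varphi_a}$. For this we need the following lemma, which relates $\vec N_{\varphi_a}$ and $\vec N_{\varphi_b}$ on overlapping surface elements.

12.4.6 Lemma. Let $x := \varphi_a(u) = \varphi_b(v) \in S_a \cap S_b$, where $u \in U_a$, $v \in U_b$. Then
\[ \vec N_{\varphi_a}(x) = |J_{\varphi_{ab}}(u)|^{-1} J_{\varphi_{ab}}(u)\, \vec N_{\varphi_b}(x) = \operatorname{sign}\bigl(J_{\varphi_{ab}}(u)\bigr)\, \vec N_{\varphi_b}(x). \]

Proof. Since $\varphi_a = \varphi_b\circ\varphi_{ab}$ and $v = \varphi_{ab}(u)$, the chain rule implies that $\varphi_a'(u) = \varphi_b'(v)\,\varphi_{ab}'(u)$ and
\[ \frac{\partial(\varphi_{a,1},\dots,\widehat{\varphi_{a,i}},\dots,\varphi_{a,n})}{\partial(u_1,\dots,u_{n-1})}(u) = \frac{\partial(\varphi_{b,1},\dots,\widehat{\varphi_{b,i}},\dots,\varphi_{b,n})}{\partial(v_1,\dots,v_{n-1})}(v)\, J_{\varphi_{ab}}(u). \]
From the first equation,
\[ \sqrt{\det\bigl(\varphi_a'(u)^t \varphi_a'(u)\bigr)} = \sqrt{\det\bigl(\varphi_{ab}'(u)^t \varphi_b'(v)^t \varphi_b'(v) \varphi_{ab}'(u)\bigr)} = |J_{\varphi_{ab}}(u)| \sqrt{\det\bigl(\varphi_b'(v)^t \varphi_b'(v)\bigr)}. \]
The assertion now follows by recalling that
\[ \vec N_{\varphi_a}(x) = \frac{1}{\sqrt{\det\bigl(\varphi_a'(u)^t \varphi_a'(u)\bigr)}} \sum_{i=1}^n (-1)^{i+n}\, \frac{\partial(\varphi_{a,1},\dots,\widehat{\varphi_{a,i}},\dots,\varphi_{a,n})}{\partial(u_1,\dots,u_{n-1})}(u)\, e_i \]
and
\[ \vec N_{\varphi_b}(x) = \frac{1}{\sqrt{\det\bigl(\varphi_b'(v)^t \varphi_b'(v)\bigr)}} \sum_{i=1}^n (-1)^{i+n}\, \frac{\partial(\varphi_{b,1},\dots,\widehat{\varphi_{b,i}},\dots,\varphi_{b,n})}{\partial(v_1,\dots,v_{n-1})}(v)\, e_i. \]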
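The lemma can be observed on the unit sphere in $\mathbb{R}^3$ with the two parametrizations of 12.4.2. In the sketch below (our own illustration, with hypothetical helper names) we assume that for $n = 3$ the sum defining the normal is the cross product of the two partial derivative vectors, and we approximate derivatives by central differences. The transition map $\tilde\varphi^{-1}\circ\varphi$ is $u \mapsto u/\|u\|^2$, whose Jacobian is negative, so the two unit normals should be opposite.

```python
import math

def phi(u):
    s = u[0] ** 2 + u[1] ** 2
    return [2 * u[0] / (1 + s), 2 * u[1] / (1 + s), (s - 1) / (1 + s)]

def phi_tilde(v):
    s = v[0] ** 2 + v[1] ** 2
    return [2 * v[0] / (1 + s), 2 * v[1] / (1 + s), (1 - s) / (1 + s)]

def transition(u):
    # phi_tilde^{-1} o phi, which works out to u / ||u||^2 for u != 0.
    s = u[0] ** 2 + u[1] ** 2
    return [u[0] / s, u[1] / s]

def partial(f, u, i, h=1e-6):
    # Central-difference approximation of the i-th partial derivative of f at u.
    up, um = list(u), list(u)
    up[i] += h
    um[i] -= h
    return [(p - m) / (2 * h) for p, m in zip(f(up), f(um))]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def unit_normal(f, u):
    n = cross(partial(f, u, 0), partial(f, u, 1))
    r = math.sqrt(sum(t * t for t in n))
    return [t / r for t in n]

def jacobian_det(f, u):
    c0, c1 = partial(f, u, 0), partial(f, u, 1)   # columns of the 2 x 2 Jacobian matrix
    return c0[0] * c1[1] - c1[0] * c0[1]

u = [0.5, 1.0]
v = transition(u)                 # phi(u) and phi_tilde(v) are the same point x
N_a = unit_normal(phi, u)
N_b = unit_normal(phi_tilde, v)
J = jacobian_det(transition, u)
sign_J = 1.0 if J > 0 else -1.0
```

At this point `J` is negative and `N_a` equals `sign_J` times `N_b` up to discretization error, as the lemma predicts.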

12.4.7 Theorem. An $(n-1)$-surface $S$ is orientable iff there exists a continuous vector field $\vec N$ on $S$ such that
\[ \vec N\big|_{S_a} = \vec N_{\varphi_a} \quad\text{for each } a \in S. \tag{12.12} \]

Proof. If $S$ is orientable, then $J_{\varphi_{ab}} > 0$, hence, by 12.4.6, $\vec N_{\varphi_a} = \vec N_{\varphi_b}$ on $S_a \cap S_b$. Therefore, (12.12) defines $\vec N$ unambiguously. Since $\vec N_{\varphi_a}$ is easily seen to be continuous on $S_a$ and $S_a$ is relatively open in $S$, $\vec N$ is continuous on $S$.

Conversely, assume there exists a continuous vector field $\vec N$ on $S$ that satisfies (12.12). If $x = \varphi_a(u) \in S_a \cap S_b$, then
\[ \vec N_{\varphi_a}(x) = \vec N(x) = \vec N_{\varphi_b}(x), \]
hence, by 12.4.6, $J_{\varphi_{ab}}(u) > 0$. Therefore, $S$ is orientable.

Let $S$ be orientable with positive orientation. Then, by definition, the frame
\[ \bigl(d(\varphi_a)_u(e_1),\dots,d(\varphi_a)_u(e_{n-1})\bigr) \]
in $T_a$ is designated as positive ($\operatorname{sign}(\varphi_a) > 0$) for each $a \in S$. Since the frame
\[ \bigl(d(\varphi_a)_u(e_1),\dots,d(\varphi_a)_u(e_{n-1}),\, \vec N(a)\bigr) \]
in $\mathbb{R}^n$ is positive (12.3.5), we say in this case that $S$ is oriented by $\vec N$. The notion of orientation by $-\vec N$ is defined analogously. For example, the sphere $S = \{(x_1,\dots,x_n) : \|x\| = r\}$ is locally parameterized by the mappings $\varphi$ and $\tilde\varphi$ of 12.3.8. The positive orientation is given by the unit normal vector field $\vec N(p) = \|p\|^{-1} p$, called the outward unit normal.

12.4.8 Corollary. If $S = \{x : F(x) = 0\}$ is connected, then $S$ is orientable and
\[ \vec N = \|\nabla F\|^{-1}\nabla F \quad\text{or}\quad \vec N = -\|\nabla F\|^{-1}\nabla F. \]

Proof. Since $\nabla F(x)$ is perpendicular to $S$ at $x$, the uniqueness of $\vec N$ implies that
\[ \vec N(x) = s(x)\,\frac{\nabla F(x)}{\|\nabla F(x)\|}, \quad x \in S, \]
where $s(x) = \pm 1$ is constant on each surface element. Since the surface elements are open in $S$, $s(x)$ is continuous. Since $S$ is connected, $s(x)$ must be constant on $S$.

(n − 1)-Surfaces-with-Boundary

To discuss surfaces-with-boundary, we shall need the following notation:
\[
\begin{aligned}
\mathbb{R}^{n-1}_+ &:= \{y \in \mathbb{R}^{n-1} : y_{n-1} > 0\},\\
\mathbb{H}^{n-1} &:= \{y \in \mathbb{R}^{n-1} : y_{n-1} \ge 0\},\\
\partial\mathbb{H}^{n-1} &:= \{y \in \mathbb{R}^{n-1} : y_{n-1} = 0\}.
\end{aligned}
\]

12.4.9 Definition. An $(n-1)$-surface-with-boundary is a subset of $\mathbb{R}^n$ of the form
\[ S = \{x \in W : F(x) = 0 \text{ and } g_i(x) \ge 0,\ i = 1,\dots,k\}, \]
where $W \subseteq \mathbb{R}^n$ is open and $F : W \to \mathbb{R}$ and $g_i : W \to \mathbb{R}$ are $C^1$ and satisfy the following conditions:

(a) $\nabla F(x) \ne 0$ for all $x \in S$.

(b) The sets $B_i := \{x \in S : g_i(x) = 0\}$ are pairwise disjoint.

(c) For each $i$ and $x \in B_i$, the vectors $\nabla F(x)$ and $\nabla g_i(x)$ are linearly independent.

The set $\partial S := \bigcup_{i=1}^k B_i$ is called the boundary of $S$ and $S \setminus \partial S$ is the interior. ♦

FIGURE 12.17: Cylinder-with-boundary: $x_1^2 + x_2^2 = 1$, $0 \le x_3 \le 1$. Here $F(x) := x_1^2 + x_2^2 - 1$, and the boundary components are $B_1 : g_1(x) := x_3 = 0$ and $B_2 : g_2(x) := 1 - x_3 = 0$.
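For the cylinder of Figure 12.17, conditions (a)–(c) of Definition 12.4.9 can be spot-checked at sample boundary points. The sketch below is only such a spot check, not a proof; the gradients are computed by hand from $F$, $g_1$, $g_2$ as given in the figure, and the helper names are ours.

```python
import math

def grad_F(x):    # F(x) = x1^2 + x2^2 - 1
    return (2.0 * x[0], 2.0 * x[1], 0.0)

def grad_g1(x):   # g1(x) = x3
    return (0.0, 0.0, 1.0)

def grad_g2(x):   # g2(x) = 1 - x3
    return (0.0, 0.0, -1.0)

def cross(v, w):
    return (v[1] * w[2] - v[2] * w[1], v[2] * w[0] - v[0] * w[2], v[0] * w[1] - v[1] * w[0])

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def independent(v, w, tol=1e-9):
    # Two vectors in R^3 are linearly independent iff their cross product is nonzero.
    return norm(cross(v, w)) > tol

angles = [2.0 * math.pi * k / 12 for k in range(12)]
B1 = [(math.cos(t), math.sin(t), 0.0) for t in angles]   # sample points with g1 = 0
B2 = [(math.cos(t), math.sin(t), 1.0) for t in angles]   # sample points with g2 = 0

cond_a = all(norm(grad_F(x)) > 0 for x in B1 + B2)       # (a) at the sample points
cond_b = not (set(B1) & set(B2))                         # (b) the samples of B1 and B2 are disjoint
cond_c = (all(independent(grad_F(x), grad_g1(x)) for x in B1)
          and all(independent(grad_F(x), grad_g2(x)) for x in B2))   # (c)
```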


If $V$ denotes the open set $\{x \in W : g_i(x) > 0,\ i = 1,\dots,k\}$, then
\[ S \setminus \partial S = \{x \in V : F(x) = 0\}. \]
Therefore, condition (a) implies that the interior of $S$ is an $(n-1)$-surface. Conditions (b) and (c) assert that the boundary of $S$ is made up of disjoint $(n-2)$-surfaces. Indeed, if $F_i := (F, g_i)$, then $B_i = \{x \in W : F_i(x) = 0\}$ and
\[ F_i' = \begin{bmatrix} \nabla F \\ \nabla g_i \end{bmatrix} \]
has rank 2. Also, because the $(n-2)$-surfaces $B_i$ are pairwise disjoint, a local parametrization of $B_i$ may be chosen to be disjoint from a local parametrization of $B_j$.

The following theorem shows that, as in the case of an $(n-1)$-surface, an $(n-1)$-surface-with-boundary may be described by a collection of local parameterizations.

12.4.10 Theorem. Let $S$ be an $(n-1)$-surface-with-boundary.

(a) If $a \in S \setminus \partial S$, then there exists a local parametrization $\varphi_a : U_a \to \mathbb{R}^n$ of $S \setminus \partial S$ at $a$ with $\varphi_a(0) = a$.

(b) If $a \in \partial S$, then there exists an open set $\tilde U_a \subseteq \mathbb{R}^{n-1}$ and a one-to-one parameterized $(n-1)$-surface $\tilde\varphi_a : \tilde U_a \to \mathbb{R}^n$ with $\tilde\varphi_a(0) = a$ such that if $U_a := \tilde U_a \cap \mathbb{H}^{n-1}$ and $\varphi_a := \tilde\varphi_a\big|_{U_a}$, then

(i) $\varphi_a(U_a)$ is open in $S$,

(ii) $\varphi_a\bigl(U_a \cap \mathbb{R}^{n-1}_+\bigr)$ is open in $S \setminus \partial S$, and

(iii) $\varphi_a\bigl(U_a \cap \partial\mathbb{H}^{n-1}\bigr)$ is open in $\partial S$.

FIGURE 12.18: Surface element $S_a = \varphi_a\bigl(\tilde U_a \cap \mathbb{H}^{n-1}\bigr)$.

Proof. Part (a) follows from 12.4.1, since $S \setminus \partial S$ is an $(n-1)$-surface without boundary. For part (b), we may assume without loss of generality that $\partial S = \{x \in S : g(x) = 0\}$. Choose a local parametrization $\psi_a : W_a \to \mathbb{R}^n$ of


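$S' := \{x \in W : F(x) = 0\}$ such that $\psi_a(0) = a$. Since $\psi_a$ has rank $n-1$ and $g$ has rank 1, $\partial_i(g\circ\psi_a)(0) \ne 0$ for some $i$. Define $H_a : W_a \to \mathbb{R}^{n-1}$ by
\[ H_a(w_1,\dots,w_{n-1}) = \bigl(w_1,\dots,w_{i-1},w_{i+1},\dots,w_{n-1},\, (g\circ\psi_a)(w_1,\dots,w_{n-1})\bigr). \]
Then $H_a$ has rank $n-1$ at 0, hence, by the inverse function theorem, there exist open sets $\tilde W_a \subseteq W_a$ and $\tilde U_a = H_a(\tilde W_a)$ in $\mathbb{R}^{n-1}$ with $0 \in \tilde W_a$ such that $H_a$ is one-to-one on $\tilde W_a$ and $H_a^{-1} : \tilde U_a \to \tilde W_a$ is $C^1$. Set
\[ \tilde\varphi_a = \psi_a\circ H_a^{-1} : \tilde U_a \to S'. \]
If $u = H_a(w) \in \tilde U_a$, then $(g\circ\psi_a)(w) = (g\circ\psi_a\circ H_a^{-1})(u) = (g\circ\tilde\varphi_a)(u)$, hence, by definition of $H_a$,
\[ (u_1,\dots,u_{n-1}) = \bigl(w_1,\dots,w_{i-1},w_{i+1},\dots,w_{n-1},\, (g\circ\tilde\varphi_a)(u)\bigr). \]
Therefore, $u_{n-1} = (g\circ\tilde\varphi_a)(u)$, so
\[ (g\circ\tilde\varphi_a)(u) > 0 \text{ iff } u \in \tilde U_a \cap \mathbb{R}^{n-1}_+ \quad\text{and}\quad (g\circ\tilde\varphi_a)(u) = 0 \text{ iff } u \in \tilde U_a \cap \partial\mathbb{H}^{n-1}. \]
It follows that
\[ \tilde\varphi_a\bigl(\tilde U_a \cap \mathbb{R}^{n-1}_+\bigr) = (S \setminus \partial S) \cap \psi_a(\tilde W_a) \quad\text{and}\quad \tilde\varphi_a\bigl(\tilde U_a \cap \partial\mathbb{H}^{n-1}\bigr) = \partial S \cap \psi_a(\tilde W_a). \]
Since $\psi_a(\tilde W_a)$ is open in $S'$ and $S' \supseteq S$, (i)–(iii) follow.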

Oriented (n − 1)-Surfaces-with-Boundary

As in the non-boundary case, orientation of an $(n-1)$-surface-with-boundary $S$ may be defined in terms of local parameterizations. By 12.4.4, the $(n-1)$-dimensional tangent space at $a \in S$ is
\[ T_a^S = \{z \in \mathbb{R}^n : z\cdot\nabla F(a) = 0\}. \]
The new feature here is that if $a \in \partial S$, say $a \in B_i$, then there is also an $(n-2)$-dimensional tangent space to $\partial S$ at $a$, namely,
\[ T_a^{\partial S} = \{z \in \mathbb{R}^n : z\cdot\nabla F(a) = z\cdot\nabla g_i(a) = 0\}. \]
The connection between $T_a^S$ and $T_a^{\partial S}$ is described as follows: Let $\varphi_a$ be a local parametrization of $S$ as described in part (b) of 12.4.10, where $\varphi_a(0) = a$. Since $\tilde\varphi_a(U_a \cap \partial\mathbb{H}^{n-1}) \subseteq \partial S$ and $(e_1,\dots,e_{n-2})$ is a frame for $\partial\mathbb{H}^{n-1}$, $d(\tilde\varphi_a)_0(e_1,\dots,e_{n-2})$ is a frame for $T_a^{\partial S}$. Since the vector $d(\tilde\varphi_a)_0(-e_{n-1})$ is not in the subspace $T_a^{\partial S}$,
\[ d(\tilde\varphi_a)_0(-e_{n-1}, e_1,\dots,e_{n-2}) \tag{12.13} \]
is a frame for $T_a^S$. The induced orientation of $\partial S$ is obtained by declaring the frame $d(\tilde\varphi_a)_0(e_1,\dots,e_{n-2})$ of $T_a^{\partial S}$ to have the sign of the frame (12.13). If $S$ is positively oriented, then this sign is $(-1)^{n-1}$.

FIGURE 12.19: Induced orientation of $T_a^{\partial S}$.

Figure 12.19 depicts the case $n = 3$. Here, $S$ is oriented by the normal $\vec N$ (pointing outward). Therefore, by definition, the frame $d(\varphi_a)_0(e_1, e_2)$ is positive in $T_a^S$, hence so is the frame $d(\tilde\varphi_a)_0(-e_2, e_1)$. Thus, again by definition, the frame $d(\tilde\varphi_a)_0(e_1)$ of $T_a^{\partial S}$ is positive in the induced orientation. Note that because the frame $\bigl(d(\varphi_a)_0(e_1), d(\varphi_a)_0(e_2), \vec N_\varphi\bigr)$ in $\mathbb{R}^3$ is positive (12.3.5), so is the frame $\bigl(d(\varphi_a)_0(-e_2), d(\varphi_a)_0(e_1), \vec N_\varphi\bigr)$. The latter therefore forms a right-handed system in $\mathbb{R}^3$. Thus if $d(\tilde\varphi_a)_0(-e_2)$ points upward, then $d(\tilde\varphi_a)_0(e_1)$ must point in the direction shown. Therefore, the induced orientation of $\partial S$ is the one for which the surface $S$ is on the left when $\partial S$ is traversed in the direction of the tangent vectors $d(\tilde\varphi_a)_0(e_1)$.

Exercises

1. Let $0 < a < b$. Show that the mapping
\[ \varphi(\phi,\theta) = \bigl(a\cos\phi,\ (b + a\sin\phi)\cos\theta,\ (b + a\sin\phi)\sin\theta\bigr), \quad 0 < \theta, \phi < 2\pi, \]
is a local parametrization of the torus $x^2 + \bigl(\sqrt{y^2 + z^2} - b\bigr)^2 = a^2$ with two circles missing.

2. Let $U = \{x \in \mathbb{R}^{n-1} : \|x\| < 1\}$ and define a local parametrization $\psi : U \to S^{n-1} = \{y \in \mathbb{R}^n : \|y\| = 1\}$ by
\[ \psi(x) = \bigl(x, \sqrt{1 - \|x\|^2}\bigr), \quad x \in U. \]
Give a geometric description of $\psi$. Referring to 12.4.2, find the transition mapping $\tilde\varphi^{-1}\circ\psi$.

3.$^S$ Consider the stereographic projection $\varphi_1^{-1}(y) = x$ from $p$ onto the hyperplane $x_n = -1$ shown in Figure 12.20, where $y = (y_1,\dots,y_n)$ and $x = (x_1,\dots,x_{n-1})$. Calculate $\varphi_1(x)$ and $\varphi_1^{-1}(y)$ and find the transition mapping $\varphi^{-1}\circ\varphi_1$, where $\varphi$ is the mapping of 12.4.2.


FIGURE 12.20: Stereographic projection $\varphi_1^{-1}(y)$ from $p$.

4. Replace the sphere in 12.4.2 by the elliptic paraboloid
\[ S = \Bigl\{(y_1,y_2,y_3) : y_3 = \Bigl(\frac{y_1}{a_1}\Bigr)^2 + \Bigl(\frac{y_2}{a_2}\Bigr)^2,\ y_3 < 1\Bigr\} \]
(with $p = (0,0,1)$) and find the corresponding maps $\varphi$ and $\varphi^{-1}$.

5.$^S$ Repeat Exercise 4 using the elliptic cone
\[ S = \Bigl\{(y_1,y_2,y_3) : y_3^2 = \Bigl(\frac{y_1}{a_1}\Bigr)^2 + \Bigl(\frac{y_2}{a_2}\Bigr)^2,\ 0 < y_3 < 1\Bigr\}. \]

6. Repeat Exercise 4 using the ellipsoid
\[ S = \Bigl\{(y_1,y_2,y_3) : \Bigl(\frac{y_1}{a_1}\Bigr)^2 + \Bigl(\frac{y_2}{a_2}\Bigr)^2 + y_3^2 = 1\Bigr\}. \]

7. Find the equation of the tangent plane $T_a$ at $a = (1,1,1)$ for each of the following surfaces:

(a)$^S$ $x_1^2 + 2x_2^2 + 3x_3^2 = 6$. (b) $x_1^2 + x_2^2 - 2x_3^2 = 0$. (c) $x_1^2 - x_2^2 + x_3 = 1$.

8. An $n \times n$ matrix $A$ is said to be orthogonal if $A^t A$ is the identity matrix. Identifying a $2 \times 2$ matrix $\begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix}$ with the point $(x_1,x_2,x_3,x_4)$, show that the collection of all $2 \times 2$ orthogonal matrices is a 1-surface $S$ in $\mathbb{R}^4$. Characterize the matrices in the tangent space to $S$ at each of the following points:
\[ \text{(a) } \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \quad \text{(b) } \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}. \quad \text{(c) } \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \quad \text{(d) } \begin{bmatrix} 1/\sqrt2 & -1/\sqrt2 \\ 1/\sqrt2 & 1/\sqrt2 \end{bmatrix}. \]
The matrices in the tangent space at the point in part (a) are the so-called $2 \times 2$ skew-symmetric matrices.


9. Referring to 12.4.2, let $y \in S$ and set $T := d(\varphi^{-1})_y : T_y \to \mathbb{R}^{n-1}$.

(a)$^S$ Prove that $\|T(v)\| = \dfrac{1}{1 - y_n}\,\|v\|$ for all $v \in T_y$.

(b) Use (a), the bilinearity of $v\cdot w$ and $T(v)\cdot T(w)$, and the identity $2v\cdot w = \|v + w\|^2 - \|v\|^2 - \|w\|^2$ to prove that
\[ \frac{v\cdot w}{\|v\|\|w\|} = \frac{T(v)\cdot T(w)}{\|T(v)\|\|T(w)\|}, \quad v, w \in T_y. \]
Thus, by 12.4.4, the stereographic projection preserves the angle at the intersection of a pair of simple smooth curves on $S$.

10. Let each of the following 2-surfaces-with-boundary be positively oriented. Find parametrizations of the boundary curves that are compatible with the induced orientation on the boundary.

(a) $S = \{(x_1,x_2,x_3) : x_1^2 + x_2^2 = 1,\ 0 \le x_3 \le 2 - x_2\}$.

(b) $S = \{(x_1,x_2,x_3) : x_3 = x_1^2 + x_2^2,\ 0 \le x_3 \le 1 - x_1 - x_2\}$.

(c) $S = \{(x_1,x_2,x_3) : x_1^2 + x_2^2 + x_3^2 = 4,\ -2 \le x_3 \le 3 - x_1 - x_2\}$.

Hint. For (c) the boundary is a circle on the plane $x_1 + x_2 + x_3 = 3$. Translate and rotate that plane into the plane $x_3 = 0$, find a parametric equation of the rotated circle with center 0, then reverse the procedure to find the parametrization of the original circle with appropriate orientation.

11. Let $S = \{x : F(x) = 0\}$ be an oriented 2-surface in $\mathbb{R}^3$, where $F$ is $C^2$.

(a)$^S$ The tangent bundle of $S$ is the set
\[ TS = \bigcup_{x \in S} \{x\} \times T_x. \]
Show that $TS = \{(x,v) \in \mathbb{R}^6 : F(x) = 0 \text{ and } v\cdot\nabla F(x) = 0\}$ and that $TS$ is a 4-surface in $\mathbb{R}^6$.

(b) The sphere bundle of $S$ is the subset $TS_1 := \{(x,v) \in TS : \|v\| = 1\}$. Show that $TS_1$ is a 3-surface in $\mathbb{R}^6$.

(c) Let $S = \{x \in \mathbb{R}^3 : \|x\|^2 = 3\}$. Show that the tangent space to the sphere bundle $TS_1$ at the point $(1, 1, 1, 1/\sqrt6, 1/\sqrt6, -2/\sqrt6)$ consists of all vectors $w \in \mathbb{R}^6$ satisfying the system
\[
\begin{aligned}
w_1 + w_2 + w_3 &= 0\\
-\tfrac{3}{\sqrt6}\, w_3 + w_4 + w_5 + w_6 &= 0\\
w_4 + w_5 - 2w_6 &= 0
\end{aligned}
\]

Chapter 13 Integration on Surfaces

Throughout the chapter $m$ and $n$ are fixed positive integers with $1 \le m \le n$. In this chapter we construct the integral of a differential $m$-form on an $m$-surface in $\mathbb{R}^n$, a generalization of the line integral of a 1-form on a curve. This will provide the necessary context for the divergence theorem and the theorems of Green and Stokes, far-reaching generalizations of the fundamental theorem of calculus.

13.1 Differential Forms

Alternating Multilinear Functionals

An $m$-multilinear functional on $\mathbb{R}^n$ is a real-valued function
\[ M(a^1,\dots,a^m), \quad a^1,\dots,a^m \in \mathbb{R}^n, \]
that is linear in each variable $a^i$ separately. (See Section 9.7.) Such a function is said to be alternating if interchanging two vectors changes the sign of $M$:
\[ M(a^1,\dots,a^i,\dots,a^j,\dots,a^m) = -M(a^1,\dots,a^j,\dots,a^i,\dots,a^m). \]
Thus if $a^i = a^j$ for some $i \ne j$, then $M(a^1,\dots,a^m) = 0$. Note that a linear combination of alternating $m$-multilinear functionals is an alternating $m$-multilinear functional.

A permutation of $(1,\dots,m)$ is a one-to-one function $\sigma$ mapping $\{1,\dots,m\}$ onto itself, frequently denoted by $(i_1,\dots,i_m)$, where $i_k = \sigma(k)$. The sign $(-1)^\sigma$ of $\sigma$ is positive (negative) if an even (odd) number of adjacent interchanges are required to transform $(i_1,\dots,i_m)$ back to $(1,\dots,m)$ (see Appendix B). It follows that if $M$ is an alternating $m$-multilinear functional, then
\[ M(a^{\sigma(1)},\dots,a^{\sigma(m)}) = (-1)^\sigma M(a^1,\dots,a^m). \]
An important example is the determinant of an $n \times n$ matrix, which is


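multilinear and alternating on its rows as well as its columns. To build on this, we introduce the following notation. Define
\[ J_m = \{j := (j_1,\dots,j_m) : 1 \le j_k \le n\} \quad\text{and}\quad I_m = \{i := (i_1,\dots,i_m) : 1 \le i_1 < i_2 < \cdots < i_m \le n\}. \]
Thus $J_m$ is the set of all $m$-tuples of (possibly repeated) indices in $\{1,\dots,n\}$ and $I_m$ the set of all strictly increasing $m$-tuples in $J_m$. In particular, $I_n = \{(1,\dots,n)\}$.

Now let $A$ be an $n \times m$ matrix with columns $a^1,\dots,a^m \in \mathbb{R}^n$ and $B$ an $m \times n$ matrix with rows $b_1,\dots,b_m \in \mathbb{R}^n$. For any member $j = (j_1,\dots,j_m)$ of $J_m$ define $A_j$ to be the $m \times m$ matrix whose $r$th row is row $j_r$ of $A$ and define $B^j$ to be the $m \times m$ matrix whose $c$th column is column $j_c$ of $B$, that is,
\[
A_j = \begin{bmatrix} a^1 & \cdots & a^m \end{bmatrix}_j
= \begin{bmatrix}
a^1_1 & a^2_1 & \cdots & a^m_1\\
a^1_2 & a^2_2 & \cdots & a^m_2\\
\vdots & \vdots & & \vdots\\
a^1_n & a^2_n & \cdots & a^m_n
\end{bmatrix}_j
= \begin{bmatrix}
a^1_{j_1} & a^2_{j_1} & \cdots & a^m_{j_1}\\
a^1_{j_2} & a^2_{j_2} & \cdots & a^m_{j_2}\\
\vdots & \vdots & & \vdots\\
a^1_{j_m} & a^2_{j_m} & \cdots & a^m_{j_m}
\end{bmatrix}
\]
and
\[
B^j = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}^j
= \begin{bmatrix}
b_1^1 & b_1^2 & \cdots & b_1^n\\
b_2^1 & b_2^2 & \cdots & b_2^n\\
\vdots & \vdots & & \vdots\\
b_m^1 & b_m^2 & \cdots & b_m^n
\end{bmatrix}^j
= \begin{bmatrix}
b_1^{j_1} & b_1^{j_2} & \cdots & b_1^{j_m}\\
b_2^{j_1} & b_2^{j_2} & \cdots & b_2^{j_m}\\
\vdots & \vdots & & \vdots\\
b_m^{j_1} & b_m^{j_2} & \cdots & b_m^{j_m}
\end{bmatrix}.
\]
Thus $j$ selects rows from $A$ and columns from $B$. For example, for $n = 4$ and $m = 3$,
\[
\begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\\ 10 & 11 & 12 \end{bmatrix}_{(4,2,1)}
= \begin{bmatrix} 10 & 11 & 12\\ 4 & 5 & 6\\ 1 & 2 & 3 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ 9 & 10 & 11 & 12 \end{bmatrix}^{(4,4,1)}
= \begin{bmatrix} 4 & 4 & 1\\ 8 & 8 & 5\\ 12 & 12 & 9 \end{bmatrix}.
\]
Finally, define the alternating $m$-multilinear functional $dx_j = dx_{j_1,\dots,j_m}$ on $\mathbb{R}^n$ by
\[ dx_j(a^1,\dots,a^m) = \det\begin{bmatrix} a^1 & \cdots & a^m \end{bmatrix}_j. \]
Note that if $m = 1$, the definition reduces to $dx_j(a) = a_j$, as defined in Section 9.7.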
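The row and column selections and the functional $dx_j$ are straightforward to compute. The following Python sketch is one possible implementation (the function names are ours; indices are 1-based to match the text, and the determinant uses the permutation formula, which is adequate only for small matrices). It reproduces the $n = 4$, $m = 3$ example above.

```python
from itertools import permutations

def det(M):
    """Determinant by the permutation (Leibniz) formula; fine for small matrices."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        # Sign of the permutation from its number of inversions.
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for r in range(n):
            term *= M[r][perm[r]]
        total += term
    return total

def select_rows(A, j):
    """A_j: the matrix whose r-th row is row j_r of A (1-based indices)."""
    return [list(A[jr - 1]) for jr in j]

def select_cols(B, j):
    """B^j: the matrix whose c-th column is column j_c of B (1-based indices)."""
    return [[row[jc - 1] for jc in j] for row in B]

def dx(j, vectors):
    """dx_j(a^1, ..., a^m) = det [a^1 ... a^m]_j for vectors a^k in R^n."""
    n = len(vectors[0])
    A = [[a[i] for a in vectors] for i in range(n)]   # n x m matrix with columns a^1..a^m
    return det(select_rows(A, j))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
B = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
A_sel = select_rows(A, (4, 2, 1))
B_sel = select_cols(B, (4, 4, 1))
```

For instance, with the standard basis vectors of $\mathbb{R}^3$, `dx((1, 2), [[1, 0, 0], [0, 1, 0]])` evaluates a $2 \times 2$ identity determinant, and interchanging the two vectors changes its sign.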


13.1.1 Lemma. If $i = (i_1,\dots,i_m)$ and $j = (j_1,\dots,j_m) \in I_m$, then
\[ dx_i(e^{j_1},\dots,e^{j_m}) = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{otherwise,} \end{cases} \]
where $e^1,\dots,e^n$ are the standard basis vectors in $\mathbb{R}^n$.

Proof. By definition,
\[
dx_i(e^{j_1},\dots,e^{j_m})
= \det\begin{bmatrix}
e^{j_1}_1 & \cdots & e^{j_m}_1\\
\vdots & & \vdots\\
e^{j_1}_n & \cdots & e^{j_m}_n
\end{bmatrix}_i
= \det\begin{bmatrix}
e^{j_1}_{i_1} & \cdots & e^{j_m}_{i_1}\\
\vdots & & \vdots\\
e^{j_1}_{i_m} & \cdots & e^{j_m}_{i_m}
\end{bmatrix},
\]
where $e^j_i = 1$ if $i = j$ and 0 otherwise. If $j_1 < i_1$, then $j_1 < i_\ell$ for every $\ell$, hence the first column is zero and the determinant is zero. Similarly, if $j_1 > i_1$, then the first row is zero and, again, the determinant is zero. If $j_1 = i_1$, then the determinant reduces to
\[
\det\begin{bmatrix}
e^{j_2}_{i_2} & \cdots & e^{j_m}_{i_2}\\
\vdots & & \vdots\\
e^{j_2}_{i_m} & \cdots & e^{j_m}_{i_m}
\end{bmatrix},
\]
and an induction argument completes the proof.

13.1.2 Lemma. Let $M$ and $M'$ be alternating $m$-multilinear functionals on $\mathbb{R}^n$. If
\[ M(e^{i_1},\dots,e^{i_m}) = M'(e^{i_1},\dots,e^{i_m}) \tag{13.1} \]
for all $(i_1,\dots,i_m) \in I_m$, then $M = M'$.

Proof. For $j = 1,\dots,m$, let $a^j = (a^j_1,\dots,a^j_n) = \sum_{i=1}^n a^j_i e^i$. By multilinearity,
\[
M(a^1,\dots,a^m) = M\Bigl(\sum_{i=1}^n a^1_i e^i, \dots, \sum_{i=1}^n a^m_i e^i\Bigr)
= \sum_{i_1=1}^n \cdots \sum_{i_m=1}^n a^1_{i_1}\cdots a^m_{i_m}\, M(e^{i_1},\dots,e^{i_m}),
\]
with the analogous equality holding for $M'$. It therefore suffices to show that $M(e^{i_1},\dots,e^{i_m}) = M'(e^{i_1},\dots,e^{i_m})$ for arbitrary indices $i_1,\dots,i_m$. This is clear if two of the indices $i_k$ are equal, since then both sides are zero. If the indices are distinct, then, by permuting the vectors $e^{i_1},\dots,e^{i_m}$ and attaching the appropriate signs, the indices may be brought into increasing order, and the desired equality then follows from the hypothesis.


13.1.3 Theorem. If $M$ is an alternating $m$-multilinear functional on $\mathbb{R}^n$, then
\[ M = \sum_{(i_1,\dots,i_m) \in I_m} M(e^{i_1},\dots,e^{i_m})\, dx_{i_1,\dots,i_m}. \]

Proof. Let $M'$ denote the alternating $m$-multilinear functional on the right. If $(j_1,\dots,j_m) \in I_m$, then
\[
M'(e^{j_1},\dots,e^{j_m}) = \sum_{(i_1,\dots,i_m) \in I_m} M(e^{i_1},\dots,e^{i_m})\, dx_{i_1,\dots,i_m}(e^{j_1},\dots,e^{j_m}) = M(e^{j_1},\dots,e^{j_m}),
\]
the second equality from 13.1.1. By 13.1.2, $M = M'$.

The following application of 13.1.3 will be needed later in connection with integration on surfaces.

13.1.4 Binet–Cauchy Product. Let $C$ be an $m \times n$ matrix and $D$ an $n \times m$ matrix. Then
\[ \det(CD) = \sum_{i \in I_m} \det C^i \det D_i. \]

Proof. Let $c_1,\dots,c_m \in \mathbb{R}^n$ denote the rows of $C$ and $d^1,\dots,d^m \in \mathbb{R}^n$ the columns of $D$, the latter considered as variables. Define
\[ M(d^1,\dots,d^m) = \det(CD) = \det\bigl[c_i\cdot d^j\bigr]_{m \times m}. \]
Then $M$ is an alternating $m$-multilinear form and, by 13.1.3,
\[ M(d^1,\dots,d^m) = \sum_{(i_1,\dots,i_m) \in I_m} M(e^{i_1},\dots,e^{i_m})\, dx_{i_1,\dots,i_m}(d^1,\dots,d^m). \]
Since $M(e^{i_1},\dots,e^{i_m}) = \det C^i$ and $dx_i(d^1,\dots,d^m) = \det D_i$, the conclusion follows.

13.1.5 Corollary. If $C$ and $D$ are $n \times n$ matrices, then $\det(CD) = (\det C)(\det D)$.

13.1.6 Corollary. If $A$ is an $n \times m$ matrix, then
\[ \det(A^t A) = \sum_{i \in I_m} [\det(A_i)]^2. \]

Proof. Take $C = A^t$ and $D = A$ in the theorem and note that $C^i = (A_i)^t$, so
\[ \det C^i \det D_i = \det (A_i)^t \det A_i = [\det A_i]^2. \]

From 13.1.6, we have

13.1.7 Corollary. Let $A$ be an $n \times m$ matrix. Then $A$ has rank $m$ iff $\det(A^t A) \ne 0$.
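The Binet–Cauchy formula and Corollary 13.1.6 can be checked on small integer matrices. The sketch below is our own illustration (the matrices $C$ and $D$ are arbitrary test data, and the helper names are not from the text); it sums $\det C^i \det D_i$ over the strictly increasing index tuples and compares the result with $\det(CD)$ computed directly.

```python
from itertools import combinations, permutations

def det(M):
    """Determinant by the permutation formula; adequate for small matrices."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for r in range(n):
            term *= M[r][perm[r]]
        total += term
    return total

def matmul(C, D):
    return [[sum(C[r][k] * D[k][c] for k in range(len(D))) for c in range(len(D[0]))]
            for r in range(len(C))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def binet_cauchy(C, D):
    """Sum of det(C^i) det(D_i) over strictly increasing m-tuples i (0-based here)."""
    m, n = len(C), len(C[0])
    total = 0
    for i in combinations(range(n), m):
        C_i = [[row[c] for c in i] for row in C]   # columns i of C
        D_i = [D[r] for r in i]                    # rows i of D
        total += det(C_i) * det(D_i)
    return total

C = [[1, 2, 0, -1], [3, 1, 4, 2]]        # 2 x 4
D = [[2, 1], [0, 3], [1, -1], [5, 2]]    # 4 x 2
lhs = det(matmul(C, D))
rhs = binet_cauchy(C, D)
gram = det(matmul(transpose(D), D))      # det(A^t A) with A = D
sum_sq = binet_cauchy(transpose(D), D)   # sum of squared 2 x 2 minors of A
```

Here `lhs` and `rhs` agree, and `gram` equals `sum_sq`, which is positive because $D$ has rank 2, in line with 13.1.7.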


Definition of a Differential Form

A differential $m$-form on a set $S \subseteq \mathbb{R}^n$ is a function $\omega$ that assigns to each $x \in S$ an alternating $m$-multilinear functional $\omega_x$ on $\mathbb{R}^n$. We shall usually drop the qualifier “differential” when referring to forms. The integer $m$ is called the degree of the form. A 0-form is simply a real-valued function on $S$. By 13.1.3, if $\omega$ is an $m$-form, then for each $i \in I_m$ there exists a unique function $g_i$ on $S$ such that
\[ \omega_x = \sum_{i \in I_m} g_i(x)\, dx_i, \quad x \in S. \]
Conversely, if $f_j$ is a real-valued function on $S$ for each $j \in J_m$, then
\[ \omega_x := \sum_{j \in J_m} f_j(x)\, dx_j, \quad x \in S, \tag{13.2} \]
defines an $m$-form on $S$. If each $f_j$ is of class $C^r$ on $S$ (that is, on an open set containing $S$), then $\omega$ is called a differential form of class $C^r$ or simply a $C^r$ form, where $r \in \mathbb{Z}^+ \cup \{+\infty\}$.

The Algebra of Differential Forms

For $a \in \mathbb{R}$ and $m$-forms
\[ \omega = \sum_{j \in J_m} f_j\, dx_j \quad\text{and}\quad \eta = \sum_{j \in J_m} g_j\, dx_j \]
on $S$, define $m$-forms $a\omega$ and $\omega + \eta$ on $S$ by
\[ a\omega := \sum_{j \in J_m} a f_j\, dx_j \quad\text{and}\quad \omega + \eta := \sum_{j \in J_m} (f_j + g_j)\, dx_j. \]
The collection of $m$-forms on $S$ is easily seen to be a vector space under these operations. It is also possible to multiply forms. For this, the notation
\[ dx_{j_1,\dots,j_m} = dx_{j_1} \wedge \cdots \wedge dx_{j_m} \tag{13.3} \]
will be useful. The right side may be interpreted as a product of differentials, called a wedge product and made precise below. Because $dx_{j_1,\dots,j_m}(a^1,\dots,a^m)$ is a determinant, interchanging a pair of differentials in (13.3) changes the sign of the product. Furthermore, if there are duplicate indices, then the product is zero. Thus we have the “rules”
\[ dx_j \wedge dx_i = -dx_i \wedge dx_j \quad\text{and}\quad dx_i \wedge dx_i = 0. \tag{13.4} \]
Using these rules, one can reduce any $m$-form to its unique canonical representation
\[ \omega = \sum_{(i_1,\dots,i_m) \in I_m} g_{i_1,\dots,i_m}\, dx_{i_1} \wedge \cdots \wedge dx_{i_m}. \]


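For example, the 3-form in $\mathbb{R}^4$
\[ \omega = f\, dx_2 \wedge dx_1 \wedge dx_2 + g\, dx_3 \wedge dx_2 \wedge dx_1 + h\, dx_2 \wedge dx_4 \wedge dx_1 \]
has canonical representation
\[ \omega = -g\, dx_1 \wedge dx_2 \wedge dx_3 + h\, dx_1 \wedge dx_2 \wedge dx_4. \]

13.1.8 Definition. Let $1 \le p, q \le n$. The wedge product or exterior product of the forms
\[ \omega = \sum_{(j_1,\dots,j_p) \in J_p} f_{j_1,\dots,j_p}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p} \quad\text{and}\quad \eta = \sum_{(k_1,\dots,k_q) \in J_q} g_{k_1,\dots,k_q}\, dx_{k_1} \wedge \cdots \wedge dx_{k_q} \]
is the form
\[ \omega \wedge \eta := \sum_{\substack{(j_1,\dots,j_p) \in J_p\\ (k_1,\dots,k_q) \in J_q}} f_{j_1,\dots,j_p}\, g_{k_1,\dots,k_q}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p} \wedge dx_{k_1} \wedge \cdots \wedge dx_{k_q}. \tag{13.5} \]
If $f$ is a 0-form on $S$, then the $p$-form $f\omega = f \wedge \omega$ is defined by
\[ f \wedge \omega := \sum_{(j_1,\dots,j_p) \in J_p} f f_{j_1,\dots,j_p}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p}. \qquad ♦ \]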

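Note that the right side of (13.5) may be obtained by formally multiplying the sums defining $\omega$ and $\eta$, where the product of forms $dx_{j_1} \wedge \cdots \wedge dx_{j_p}$ and $dx_{k_1} \wedge \cdots \wedge dx_{k_q}$ is defined as $dx_{j_1} \wedge \cdots \wedge dx_{j_p} \wedge dx_{k_1} \wedge \cdots \wedge dx_{k_q}$. The rules in (13.4) may then be used to obtain the canonical representation of $\omega \wedge \eta$. The resulting form has degree $\le n$, in compliance with our definition.

13.1.9 Example. In $\mathbb{R}^4$,

(a)
\[
\begin{aligned}
&(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3 + f_4\, dx_4) \wedge (g_1\, dx_1 + g_2\, dx_2)\\
&\quad = (f_1 g_2 - f_2 g_1)\, dx_1 \wedge dx_2 - f_3 g_1\, dx_1 \wedge dx_3 - f_3 g_2\, dx_2 \wedge dx_3 - f_4 g_2\, dx_2 \wedge dx_4 - f_4 g_1\, dx_1 \wedge dx_4.
\end{aligned}
\]

(b)
\[
\begin{aligned}
&(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3 + f_4\, dx_4) \wedge (h_1\, dx_1 \wedge dx_3 + h_2\, dx_2 \wedge dx_4)\\
&\quad = f_1 h_2\, dx_1 \wedge dx_2 \wedge dx_4 - f_2 h_1\, dx_1 \wedge dx_2 \wedge dx_3 - f_3 h_2\, dx_2 \wedge dx_3 \wedge dx_4 + f_4 h_1\, dx_1 \wedge dx_3 \wedge dx_4.
\end{aligned}
\]
♦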

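It must still be shown that the definition of $\omega \wedge \eta$ in (13.5) is independent of the particular representations of $\omega$ and $\eta$. To see this, apply the rules in (13.4), first on the indices $j_1,\dots,j_p$ and then on the indices $k_1,\dots,k_q$, to reduce the right side of (13.5) to
\[ \sum_{\substack{(i_1,\dots,i_p) \in I_p\\ (i_1',\dots,i_q') \in I_q}} \tilde f_{i_1,\dots,i_p}\, \tilde g_{i_1',\dots,i_q'}\, dx_{i_1} \wedge \cdots \wedge dx_{i_p} \wedge dx_{i_1'} \wedge \cdots \wedge dx_{i_q'}, \]
where
\[ \sum_{(i_1,\dots,i_p) \in I_p} \tilde f_{i_1,\dots,i_p}\, dx_{i_1} \wedge \cdots \wedge dx_{i_p} \quad\text{and}\quad \sum_{(i_1',\dots,i_q') \in I_q} \tilde g_{i_1',\dots,i_q'}\, dx_{i_1'} \wedge \cdots \wedge dx_{i_q'} \]
are the canonical representations of $\omega$ and $\eta$. Since the latter are unique, every version of $\omega \wedge \eta$ may be reduced to the same form, hence $\omega \wedge \eta$ is well-defined.

13.1.10 Proposition. Let $\omega$ be a $p$-form, $\eta$ a $q$-form, and $\nu$ an $r$-form, where $1 \le p, q, r \le n$. Then

(a) $\omega \wedge \eta$ is linear in each variable separately;

(b) $(\omega \wedge \eta) \wedge \nu = \omega \wedge (\eta \wedge \nu)$;

(c) $\eta \wedge \omega = (-1)^{pq}\, \omega \wedge \eta$.

Proof. The straightforward proofs of (a) and (b) are left to the reader. For the proof of (c), let $\omega$ and $\eta$ be as in 13.1.8. Then
\[
\begin{aligned}
\eta \wedge \omega &= \sum_{\substack{(k_1,\dots,k_q) \in J_q\\ (j_1,\dots,j_p) \in J_p}} g_{k_1,\dots,k_q}\, f_{j_1,\dots,j_p}\, dx_{k_1} \wedge \cdots \wedge dx_{k_q} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_p}\\
&= \sum_{\substack{(k_1,\dots,k_q) \in J_q\\ (j_1,\dots,j_p) \in J_p}} g_{k_1,\dots,k_q}\, f_{j_1,\dots,j_p}\, (-1)^{pq}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p} \wedge dx_{k_1} \wedge \cdots \wedge dx_{k_q}\\
&= (-1)^{pq}\, \omega \wedge \eta,
\end{aligned}
\]
the second equality because $pq$ adjacent interchanges are required to move the differentials $dx_{j_1},\dots,dx_{j_p}$ to the left of $dx_{k_1},\dots,dx_{k_q}$.

13.1.11 Proposition. Let $a^j = \sum_{i=1}^n a^j_i e^i$, $j = 1,\dots,n$. Then
\[ \Bigl(\sum_{i=1}^n a^1_i\, dx_i\Bigr) \wedge \cdots \wedge \Bigl(\sum_{i=1}^n a^n_i\, dx_i\Bigr) = \det\begin{bmatrix} a^1 & \cdots & a^n \end{bmatrix} dx_1 \wedge \cdots \wedge dx_n. \]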

Proof. By properties of the wedge product, the left side of the equation is
\[ \sum_{i_1=1}^n \cdots \sum_{i_n=1}^n a^1_{i_1} \cdots a^n_{i_n}\, dx_{i_1} \wedge \cdots \wedge dx_{i_n} = \sum_{i_1,\dots,i_n \text{ distinct}} a^1_{i_1} \cdots a^n_{i_n}\, dx_{i_1} \wedge \cdots \wedge dx_{i_n}. \]
If $\sigma = (i_1,\dots,i_n)$, then $dx_{i_1} \wedge \cdots \wedge dx_{i_n} = (-1)^\sigma\, dx_1 \wedge \cdots \wedge dx_n$, and the assertion follows from the definition of determinant.

The proposition provides an alternate method for evaluating determinants.

13.1.12 Example. Let
\[ A = \begin{bmatrix} 1 & 3 & 5\\ 2 & 4 & 6\\ 3 & -2 & 1 \end{bmatrix}. \]
By wedge product rules applied to the forms constructed from the columns,
\[
\begin{aligned}
&(1\, dx_1 + 2\, dx_2 + 3\, dx_3) \wedge (3\, dx_1 + 4\, dx_2 - 2\, dx_3) \wedge (5\, dx_1 + 6\, dx_2 + 1\, dx_3)\\
&\quad = (-2\, dx_{1,2} - 11\, dx_{1,3} - 16\, dx_{2,3}) \wedge (5\, dx_1 + 6\, dx_2 + dx_3)\\
&\quad = -2\, dx_{1,2,3} + 66\, dx_{1,2,3} - 80\, dx_{1,2,3}\\
&\quad = -16\, dx_{1,2,3},
\end{aligned}
\]
hence $\det(A) = -16$. ♦
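The reduction to canonical form by the rules (13.4) is mechanical, so it lends itself to a short program. In the sketch below, which is our own illustration rather than anything in the text, a form with constant coefficients is stored as a dictionary mapping an index tuple $(j_1,\dots,j_p)$ to its coefficient; `sort_sign` sorts an index tuple by adjacent interchanges while tracking the sign. The test data are arbitrary numerical values substituted for $f_i$ and $g_i$ in Example 13.1.9(a), followed by the forms of Example 13.1.12.

```python
def sort_sign(idx):
    """Return (sign, sorted tuple) for an index tuple, or (0, ()) if an index repeats."""
    idx = list(idx)
    if len(set(idx)) < len(idx):
        return 0, ()                      # dx_i ^ dx_i = 0
    sign = 1
    # Bubble sort: each adjacent interchange flips the sign, as in rule (13.4).
    for a in range(len(idx)):
        for b in range(len(idx) - 1 - a):
            if idx[b] > idx[b + 1]:
                idx[b], idx[b + 1] = idx[b + 1], idx[b]
                sign = -sign
    return sign, tuple(idx)

def wedge(omega, eta):
    """Wedge product per (13.5), returned in canonical (increasing-index) form."""
    prod = {}
    for j, f in omega.items():
        for k, g in eta.items():
            sign, key = sort_sign(j + k)
            if sign:
                prod[key] = prod.get(key, 0) + sign * f * g
    return {key: c for key, c in prod.items() if c != 0}

# Example 13.1.9(a) with f = (2, 3, 5, 7) and g = (11, 13).
omega = {(1,): 2, (2,): 3, (3,): 5, (4,): 7}
eta = {(1,): 11, (2,): 13}
product = wedge(omega, eta)
swapped = wedge(eta, omega)

# Example 13.1.12: 1-forms built from the columns of A.
c1 = {(1,): 1, (2,): 2, (3,): 3}
c2 = {(1,): 3, (2,): 4, (3,): -2}
c3 = {(1,): 5, (2,): 6, (3,): 1}
triple = wedge(wedge(c1, c2), c3)
```

With these values the coefficient of $dx_1 \wedge dx_2$ in `product` is $f_1 g_2 - f_2 g_1 = -7$, `swapped` is the negative of `product` (the case $p = q = 1$ of 13.1.10(c)), and `triple` has the single coefficient $-16$ on $dx_{1,2,3}$.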

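The Differential of a Form

13.1.13 Definition. The differential of a 0-form $f$ of class $C^1$ on $S \subseteq \mathbb{R}^n$ is its differential as a $C^1$ function, namely, the 1-form
\[ df = \sum_{j=1}^n (\partial_j f)\, dx_j. \]
The differential of an $m$-form
\[ \omega = \sum_{(j_1,\dots,j_m) \in J_m} f_{j_1,\dots,j_m}\, dx_{j_1} \wedge \cdots \wedge dx_{j_m} \]
of class $C^1$ on $S$ is the $(m+1)$-form $d\omega$ defined by
\[
\begin{aligned}
d\omega &= \sum_{(j_1,\dots,j_m) \in J_m} (df_{j_1,\dots,j_m}) \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_m}\\
&= \sum_{(j_1,\dots,j_m) \in J_m} \sum_{j=1}^n (\partial_j f_{j_1,\dots,j_m})\, dx_j \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_m}.
\end{aligned}
\tag{13.6}
\]
♦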

Note that if $m = n$, then $d\omega = 0$, since in the last expression every $dx_j$ is a $dx_{j_i}$ for some $i$. As in the case of wedge products, it must be verified that the definition of $d\omega$ does not depend on the particular representation of $\omega$. For this we use the rules in (13.4) to express $\omega$ canonically as
\[ \omega = \sum_{(i_1,\dots,i_m) \in I_m} g_{i_1,\dots,i_m}\, dx_{i_1} \wedge \cdots \wedge dx_{i_m}. \]
Here, each $g_{i_1,\dots,i_m}$ is a linear combination of the functions $f_{j_1,\dots,j_m}$ produced by combining these functions during the reduction process. Applying the same sequence of operations to the sum on the right in (13.6) results in
\[ \sum_{(i_1,\dots,i_m) \in I_m} \eta_{i_1,\dots,i_m} \wedge dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_m}, \]
where $\eta_{i_1,\dots,i_m}$ is precisely the same linear combination of the forms $df_{j_1,\dots,j_m}$. Since the differential is linear on 0-forms, $\eta_{i_1,\dots,i_m} = dg_{i_1,\dots,i_m}$. Therefore, all versions of $d\omega$ may be reduced to the same form and hence are equal.

For the next example, we introduce the following notation and terminology from classical vector analysis.

13.1.14 Definition. The curl of a $C^1$ vector field $\vec F = (f_1, f_2, f_3)$ on an open subset of $\mathbb{R}^3$ is the vector
\[ \operatorname{curl} \vec F = (\partial_2 f_3 - \partial_3 f_2)\, e^1 + (\partial_3 f_1 - \partial_1 f_3)\, e^2 + (\partial_1 f_2 - \partial_2 f_1)\, e^3. \]
The divergence of a $C^1$ vector field $\vec F = (f_1,\dots,f_n)$ on an open subset of $\mathbb{R}^n$ is defined by
\[ \operatorname{div} \vec F = \sum_{i=1}^n \partial_i f_i. \]
If $\omega = \sum_{j=1}^n f_j\, dx_j$ we define $\operatorname{div} \omega = \operatorname{div} \vec F$. ♦

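13.1.15 Example. In $\mathbb{R}^3$,

(a)
\[
\begin{aligned}
d(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3)
&= (\partial_1 f_1\, dx_1 + \partial_2 f_1\, dx_2 + \partial_3 f_1\, dx_3) \wedge dx_1\\
&\quad + (\partial_1 f_2\, dx_1 + \partial_2 f_2\, dx_2 + \partial_3 f_2\, dx_3) \wedge dx_2\\
&\quad + (\partial_1 f_3\, dx_1 + \partial_2 f_3\, dx_2 + \partial_3 f_3\, dx_3) \wedge dx_3\\
&= (\partial_2 f_3 - \partial_3 f_2)\, dx_{2,3} + (\partial_3 f_1 - \partial_1 f_3)\, dx_{3,1} + (\partial_1 f_2 - \partial_2 f_1)\, dx_{1,2}\\
&= \bigl(e^1 \cdot \operatorname{curl} \vec F\bigr)\, dx_{2,3} + \bigl(e^2 \cdot \operatorname{curl} \vec F\bigr)\, dx_{3,1} + \bigl(e^3 \cdot \operatorname{curl} \vec F\bigr)\, dx_{1,2}.
\end{aligned}
\]

(b)
\[
\begin{aligned}
d(f_3\, dx_1 \wedge dx_2 + f_1\, dx_2 \wedge dx_3 + f_2\, dx_3 \wedge dx_1)
&= (\partial_1 f_3\, dx_1 + \partial_2 f_3\, dx_2 + \partial_3 f_3\, dx_3) \wedge dx_1 \wedge dx_2\\
&\quad + (\partial_1 f_1\, dx_1 + \partial_2 f_1\, dx_2 + \partial_3 f_1\, dx_3) \wedge dx_2 \wedge dx_3\\
&\quad + (\partial_1 f_2\, dx_1 + \partial_2 f_2\, dx_2 + \partial_3 f_2\, dx_3) \wedge dx_3 \wedge dx_1\\
&= (\partial_1 f_1 + \partial_2 f_2 + \partial_3 f_3)\, dx_1 \wedge dx_2 \wedge dx_3\\
&= \operatorname{div} \vec F\; dx_1 \wedge dx_2 \wedge dx_3.
\end{aligned}
\]
♦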

13.1.16 Theorem. Let f be a 0-form, let ω and η be p-forms, and let ν be a q form, all of class C 1 on S ⊆ Rn . Then (a) d(aω + bη) = a dω + b dη, a, b ∈ R; (b) d2 ω := d(dω) = 0; (c) d(ω ∧ ν) = (dω) ∧ ν + (−1)p ω ∧ (dν); (d) d(f ν) = (df ) ∧ ν + f dν. Proof. Part (a) is clear from the definition of addition and scalar multiplication of m-forms and the linearity of the differential operator on 0-forms.

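For (b), it suffices by linearity to prove that
$$d\big[(df)\wedge dx_{j_1}\wedge dx_{j_2}\wedge\cdots\wedge dx_{j_p}\big] = 0.$$
The left side of this equation is
$$d\Big[\sum_{k=1}^n (\partial_k f)\,dx_k\wedge dx_{j_1}\wedge\cdots\wedge dx_{j_p}\Big]
= \Big[\sum_{j=1}^n\sum_{k=1}^n \partial_j\partial_k f\,dx_j\wedge dx_k\Big]\wedge dx_{j_1}\wedge\cdots\wedge dx_{j_p}.$$
Since $dx_k\wedge dx_j = -dx_j\wedge dx_k$ and $\partial_j\partial_k f = \partial_k\partial_j f$ (for $f$ of class $C^2$, so that the mixed partials agree), the terms in the square brackets on the right cancel pairwise, producing zero, as required.

To prove (c), let
$$\omega = \sum_{j\in J_p} f_j\,dx_j \quad\text{and}\quad \nu = \sum_{k\in J_q} g_k\,dx_k.$$
By the product rule for differentials of 0-forms,
$$\begin{aligned}
d(\omega\wedge\nu) &= \sum_{j\in J_p,\,k\in J_q} d(f_j g_k)\wedge dx_j\wedge dx_k\\
&= \sum_{j\in J_p,\,k\in J_q} g_k\,(df_j)\wedge dx_j\wedge dx_k + \sum_{j\in J_p,\,k\in J_q} f_j\,(dg_k)\wedge dx_j\wedge dx_k\\
&= (d\omega)\wedge\nu + (-1)^p\,\omega\wedge(d\nu),
\end{aligned}$$
the last equality because $p$ adjacent interchanges are needed to place the form $dg_k$ in the second sum to the immediate left of $dx_k$. Part (d) follows from (c) with $p = 0$.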
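By Example 13.1.15, applying $d$ twice to a 0-form or a 1-form in $\mathbb{R}^3$ amounts to taking the curl of a gradient or the divergence of a curl, so part (b) of the theorem contains the classical identity $\operatorname{div}(\operatorname{curl}\vec F) = 0$. The following Python sketch checks this numerically with central differences. The sample field `F`, the step size `H`, and the helper names are assumptions made for illustration only; because central difference operators commute, the computed divergence of the curl vanishes up to rounding error.

```python
import math

H = 1e-3  # finite-difference step (illustrative choice)

def F(x, y, z):
    # Hypothetical smooth sample field F = (f1, f2, f3).
    return (math.sin(y * z), x * x * z, math.exp(x) * y)

def partial(g, i, p):
    # Central-difference approximation of the i-th partial derivative of g at p.
    a, b = list(p), list(p)
    a[i] += H
    b[i] -= H
    return (g(*a) - g(*b)) / (2 * H)

def curl(p):
    # Components follow Definition 13.1.14.
    f = [lambda x, y, z, k=k: F(x, y, z)[k] for k in range(3)]
    return (partial(f[2], 1, p) - partial(f[1], 2, p),
            partial(f[0], 2, p) - partial(f[2], 0, p),
            partial(f[1], 0, p) - partial(f[0], 1, p))

def div_curl(p):
    # Divergence of the (numerically computed) curl field at p.
    c = [lambda x, y, z, k=k: curl((x, y, z))[k] for k in range(3)]
    return sum(partial(c[i], i, p) for i in range(3))

point = (0.3, -0.7, 1.1)
print(curl(point), div_curl(point))
```

The printed divergence should be zero to within rounding, while the curl itself is generally nonzero.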

The Pullback of a Form

Throughout this subsection, $U\subseteq\mathbb{R}^m$ and $W\subseteq\mathbb{R}^n$ are open and $\varphi: U\to W$ is a $C^1$ map.

13.1.17 Definition. The pullback by $\varphi$ of a $C^1$ function (0-form) $f$ on $W$ is the 0-form $\varphi^*(f)$ on $U$ defined by
$$\varphi^*(f)(u) := f(\varphi(u)),\quad u\in U.$$
The pullback by $\varphi$ of the 1-form $dx_j$ on $W$ is the 1-form $\varphi^*(dx_j)$ on $U$ defined by
$$\varphi^*(dx_j) := \sum_{i=1}^m \frac{\partial\varphi_j}{\partial u_i}\,du_i = d\varphi_j,\quad j = 1,\dots,n.$$

The pullback by $\varphi$ of the $C^1$ $p$-form
$$\omega = \sum_{(j_1,\dots,j_p)\in J_p} f_{j_1,\dots,j_p}\,dx_{j_1}\wedge\cdots\wedge dx_{j_p}$$
on $W$ is the $C^1$ $p$-form $\varphi^*\omega$ on $U$ defined by
$$\varphi^*\omega := \sum_{(j_1,\dots,j_p)\in J_p} \varphi^*(f_{j_1,\dots,j_p})\,\varphi^*(dx_{j_1})\wedge\cdots\wedge\varphi^*(dx_{j_p}).$$
♦

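Arguments similar to those used earlier show that the definition of $\varphi^*\omega$ is independent of the representation of $\omega$.

13.1.18 Example. Let $\varphi = (\varphi_1,\varphi_2,\varphi_3):\mathbb{R}^2\to\mathbb{R}^3$ be $C^1$. Then

(a)
$$\begin{aligned}
\varphi^*(f\,dx_1\wedge dx_2) &= \varphi^*(f)\,\varphi^*(dx_1)\wedge\varphi^*(dx_2)\\
&= (f\circ\varphi)\Big(\frac{\partial\varphi_1}{\partial u_1}\,du_1 + \frac{\partial\varphi_1}{\partial u_2}\,du_2\Big)\wedge\Big(\frac{\partial\varphi_2}{\partial u_1}\,du_1 + \frac{\partial\varphi_2}{\partial u_2}\,du_2\Big)\\
&= (f\circ\varphi)\Big(\frac{\partial\varphi_1}{\partial u_1}\frac{\partial\varphi_2}{\partial u_2} - \frac{\partial\varphi_2}{\partial u_1}\frac{\partial\varphi_1}{\partial u_2}\Big)\,du_1\wedge du_2.
\end{aligned}$$

(b)
$$\begin{aligned}
\varphi^*(f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3) &= \varphi^*(f_1)\varphi^*(dx_1) + \varphi^*(f_2)\varphi^*(dx_2) + \varphi^*(f_3)\varphi^*(dx_3)\\
&= (f_1\circ\varphi)\Big(\frac{\partial\varphi_1}{\partial u_1}\,du_1 + \frac{\partial\varphi_1}{\partial u_2}\,du_2\Big) + (f_2\circ\varphi)\Big(\frac{\partial\varphi_2}{\partial u_1}\,du_1 + \frac{\partial\varphi_2}{\partial u_2}\,du_2\Big)\\
&\quad + (f_3\circ\varphi)\Big(\frac{\partial\varphi_3}{\partial u_1}\,du_1 + \frac{\partial\varphi_3}{\partial u_2}\,du_2\Big)\\
&= \Big[(f_1\circ\varphi)\frac{\partial\varphi_1}{\partial u_1} + (f_2\circ\varphi)\frac{\partial\varphi_2}{\partial u_1} + (f_3\circ\varphi)\frac{\partial\varphi_3}{\partial u_1}\Big]du_1\\
&\quad + \Big[(f_1\circ\varphi)\frac{\partial\varphi_1}{\partial u_2} + (f_2\circ\varphi)\frac{\partial\varphi_2}{\partial u_2} + (f_3\circ\varphi)\frac{\partial\varphi_3}{\partial u_2}\Big]du_2.
\end{aligned}$$
♦

13.1.19 Theorem. If $\omega$ and $\eta$ are $C^1$ $p$-forms and $\nu$ is a $C^1$ $q$-form, then
(a) $\varphi^*(a\omega + b\eta) = a\varphi^*(\omega) + b\varphi^*(\eta)$, $a, b\in\mathbb{R}$;
(b) $\varphi^*(\omega\wedge\nu) = \varphi^*(\omega)\wedge\varphi^*(\nu)$;
(c) $\varphi^*(d\omega) = d\varphi^*(\omega)$;
(d) $(\varphi^*\omega)_u(a_1,\dots,a_p) = \omega_{\varphi(u)}\big(d\varphi_u(a_1),\dots,d\varphi_u(a_p)\big)$.

Proof. Part (a) follows directly from the definition of pullback. Part (b) is easily established for $\omega = f\,dx_{i_1}\wedge\cdots\wedge dx_{i_p}$ and $\nu = g\,dx_{j_1}\wedge\cdots\wedge dx_{j_q}$; bilinearity of the wedge product and linearity of $\varphi^*$ then imply that (b) holds generally. For (c) it suffices, by linearity of the differential and pullback, to verify that
$$\varphi^*\,d(f\,dx_{j_1}\wedge\cdots\wedge dx_{j_p}) = d\varphi^*(f\,dx_{j_1}\wedge\cdots\wedge dx_{j_p}),$$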

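that is,
$$\sum_{j=1}^n \big[(\partial_j f)\circ\varphi\big]\,\varphi^*(dx_j)\wedge\varphi^*(dx_{j_1})\wedge\cdots\wedge\varphi^*(dx_{j_p})
= \Big[\sum_{i=1}^m \partial_i(f\circ\varphi)\,du_i\Big]\wedge\varphi^*(dx_{j_1})\wedge\cdots\wedge\varphi^*(dx_{j_p}). \tag{13.7}$$
By the chain rule,
$$\partial_i(f\circ\varphi) = \sum_{j=1}^n \big[(\partial_j f)\circ\varphi\big]\frac{\partial\varphi_j}{\partial u_i},$$
hence the right side of (13.7) is
$$\Big(\sum_{j=1}^n \big[(\partial_j f)\circ\varphi\big]\sum_{i=1}^m \frac{\partial\varphi_j}{\partial u_i}\,du_i\Big)\wedge\varphi^*(dx_{j_1})\wedge\cdots\wedge\varphi^*(dx_{j_p}).$$
Recalling the definition of $\varphi^*(dx_j)$, we see that the last expression is precisely the left side of (13.7).

To prove (d), let $\omega$ have canonical representation
$$\omega = \sum_{(i_1,\dots,i_p)\in I_p} f_{i_1,\dots,i_p}\,dx_{i_1}\wedge\cdots\wedge dx_{i_p}.$$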

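By 13.1.2, it suffices to show that
$$(\varphi^*\omega)_u(e_{\ell_1},\dots,e_{\ell_p}) = \omega_{\varphi(u)}\big(d\varphi_u(e_{\ell_1}),\dots,d\varphi_u(e_{\ell_p})\big)$$
for any $(\ell_1,\dots,\ell_p)\in I_p$. The left side of this equation is
$$\sum_{i\in I_p} \varphi^*(f_i)(u)\,\big[\varphi^*(dx_{i_1})\wedge\cdots\wedge\varphi^*(dx_{i_p})\big](e_{\ell_1},\dots,e_{\ell_p})$$
and the right side is
$$\sum_{i\in I_p} f_i(\varphi(u))\,\big[dx_{i_1}\wedge\cdots\wedge dx_{i_p}\big]\big(d\varphi_u(e_{\ell_1}),\dots,d\varphi_u(e_{\ell_p})\big).$$
Hence it suffices to prove that
$$\big[\varphi^*(dx_{i_1})\wedge\cdots\wedge\varphi^*(dx_{i_p})\big](e_{\ell_1},\dots,e_{\ell_p}) = \big[dx_{i_1}\wedge\cdots\wedge dx_{i_p}\big]\big(d\varphi_u(e_{\ell_1}),\dots,d\varphi_u(e_{\ell_p})\big). \tag{13.8}$$
By multilinearity,
$$\varphi^*(dx_{i_1})\wedge\cdots\wedge\varphi^*(dx_{i_p}) = \sum_{(j_1,\dots,j_p)\in J_p} \frac{\partial\varphi_{i_1}}{\partial u_{j_1}}\cdots\frac{\partial\varphi_{i_p}}{\partial u_{j_p}}\,du_{j_1}\wedge\cdots\wedge du_{j_p}.$$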

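Now, $du_{j_1}\wedge\cdots\wedge du_{j_p}(e_{\ell_1},\dots,e_{\ell_p})\ne 0$ only if the $p$-tuple $(j_1,\dots,j_p)$ is a permutation of $(\ell_1,\dots,\ell_p)$. For each such $p$-tuple define a permutation $\sigma$ of $(1,\dots,p)$ such that $\ell_k = j_{\sigma(k)}$. Then
$$du_{j_1}\wedge\cdots\wedge du_{j_p}(e_{\ell_1},\dots,e_{\ell_p}) = (-1)^\sigma\,du_{\ell_1}\wedge\cdots\wedge du_{\ell_p}(e_{\ell_1},\dots,e_{\ell_p}) = (-1)^\sigma$$
and
$$\frac{\partial\varphi_{i_1}}{\partial u_{j_1}}\cdots\frac{\partial\varphi_{i_p}}{\partial u_{j_p}} = \frac{\partial\varphi_{i_1}}{\partial u_{\ell_{\tau(1)}}}\cdots\frac{\partial\varphi_{i_p}}{\partial u_{\ell_{\tau(p)}}} = \frac{\partial\varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}}\cdots\frac{\partial\varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}},$$
where $\tau = \sigma^{-1}$. Thus the left side of (13.8) is
$$\big[\varphi^*(dx_{i_1})\wedge\cdots\wedge\varphi^*(dx_{i_p})\big](e_{\ell_1},\dots,e_{\ell_p}) = \sum_\sigma (-1)^\sigma\,\frac{\partial\varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}}\cdots\frac{\partial\varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}}, \tag{13.9}$$
where the sum is taken over all permutations $\sigma$ of $(1,\dots,p)$. On the other hand, since
$$d\varphi_u(e_{\ell_j}) = \partial_{\ell_j}\varphi(u) = \sum_{i=1}^n \frac{\partial\varphi_i(u)}{\partial u_{\ell_j}}\,e_i,$$
the right side of (13.8) is
$$\begin{aligned}
&\big[dx_{i_1}\wedge\cdots\wedge dx_{i_p}\big]\Big(\sum_{j=1}^n \frac{\partial\varphi_j}{\partial u_{\ell_1}}\,e_j,\ \dots,\ \sum_{j=1}^n \frac{\partial\varphi_j}{\partial u_{\ell_p}}\,e_j\Big)\\
&\qquad = \sum_{(j_1,\dots,j_p)\in J_p} \frac{\partial\varphi_{j_1}}{\partial u_{\ell_1}}\cdots\frac{\partial\varphi_{j_p}}{\partial u_{\ell_p}}\,\big[dx_{i_1}\wedge\cdots\wedge dx_{i_p}\big](e_{j_1},\dots,e_{j_p}).
\end{aligned} \tag{13.10}$$
As above, $dx_{i_1}\wedge\cdots\wedge dx_{i_p}(e_{j_1},\dots,e_{j_p})\ne 0$ only if the $p$-tuple $(j_1,\dots,j_p)$ is a permutation of $(i_1,\dots,i_p)$. For each such $p$-tuple, define a permutation $\sigma$ of $(1,\dots,p)$ such that $j_k = i_{\sigma(k)}$. Then
$$dx_{i_1}\wedge\cdots\wedge dx_{i_p}(e_{j_1},\dots,e_{j_p}) = dx_{i_1}\wedge\cdots\wedge dx_{i_p}\big(e_{i_{\sigma(1)}},\dots,e_{i_{\sigma(p)}}\big) = (-1)^\sigma$$
and
$$\frac{\partial\varphi_{j_1}}{\partial u_{\ell_1}}\cdots\frac{\partial\varphi_{j_p}}{\partial u_{\ell_p}} = \frac{\partial\varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}}\cdots\frac{\partial\varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}},$$
so (13.10) reduces to
$$\sum_\sigma (-1)^\sigma\,\frac{\partial\varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}}\cdots\frac{\partial\varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}},$$
where the sum is taken over all permutations of $(1,\dots,p)$. As this is precisely (13.9), the proof is complete.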
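Part (d) of 13.1.19 can be tested numerically in the setting of Example 13.1.18(a). The Python sketch below compares the coefficient of $du_1\wedge du_2$ in $\varphi^*(f\,dx_1\wedge dx_2)$, computed from hand-derived partial derivatives, with the value of $f\,dx_1\wedge dx_2$ at $\varphi(u)$ on tangent vectors $d\varphi_u(e_1)$, $d\varphi_u(e_2)$ approximated by central differences. The map `phi`, the function `f`, and the step size are hypothetical choices made only for illustration.

```python
import math

def phi(u1, u2):
    # Hypothetical C^1 map from R^2 to R^3.
    return (u1 * u2, u1 + math.sin(u2), u1 ** 2 - u2)

def dphi(u1, u2):
    # Hand-computed partial derivatives of phi with respect to u1 and u2.
    d1 = (u2, 1.0, 2.0 * u1)
    d2 = (u1, math.cos(u2), -1.0)
    return d1, d2

def f(x1, x2, x3):
    # Hypothetical coefficient function of the 2-form f dx1 ^ dx2.
    return x1 + x2 * x3

def pullback_coeff(u1, u2):
    # Coefficient of du1 ^ du2 from Example 13.1.18(a).
    d1, d2 = dphi(u1, u2)
    return f(*phi(u1, u2)) * (d1[0] * d2[1] - d1[1] * d2[0])

def omega_on_tangents(u1, u2, h=1e-6):
    # (f dx1 ^ dx2) at phi(u), evaluated on finite-difference tangent vectors.
    a = [(p - q) / (2 * h) for p, q in zip(phi(u1 + h, u2), phi(u1 - h, u2))]
    b = [(p - q) / (2 * h) for p, q in zip(phi(u1, u2 + h), phi(u1, u2 - h))]
    return f(*phi(u1, u2)) * (a[0] * b[1] - a[1] * b[0])

print(pullback_coeff(0.4, 1.3), omega_on_tangents(0.4, 1.3))
```

The two printed values should agree up to the finite-difference error.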


Exercises

1. Let $T_j\in L(\mathbb{R}^n,\mathbb{R})$, $j = 1,\dots,m$. Which of the following functions is multilinear on $\mathbb{R}^n$?
(a) $M(x_1,\dots,x_m) := \sum_{i=1}^m T_i(x_i)$. (b) $M(x_1,\dots,x_m) := \prod_{i=1}^m T_i(x_i)$.

2. For fixed $c = (c_1, c_2)$, $d = (d_1, d_2)\in\mathbb{R}^2$ define $M(x, y) := (c\cdot x)(d\cdot y) - (c\cdot y)(d\cdot x)$, $x, y\in\mathbb{R}^2$.
(a) Show that $M$ is an alternating multilinear functional on $\mathbb{R}^2$.
(b) Express $M$ in terms of differentials, as in 13.1.3.

3.S Let $M(a_1,\dots,a_m)$ be a multilinear functional on $\mathbb{R}^n$ with the property that $M(a_1,\dots,a_m) = 0$ whenever two of the vectors $a_j$ are equal. Prove that $M$ is alternating.

4. Let $M$ be an alternating $m$-multilinear functional on $\mathbb{R}^n$. Show that if the vectors $a_1,\dots,a_m$ are linearly dependent, then $M(a_1,\dots,a_m) = 0$.

5. Let $M(a_1,\dots,a_m)$ be an $m$-multilinear functional on $\mathbb{R}^n$. Define
$$\operatorname{Alt}(M)(a_1,\dots,a_m) = \frac{1}{m!}\sum_\sigma (-1)^\sigma M\big(a_{\sigma(1)},\dots,a_{\sigma(m)}\big),$$
where the sum is taken over all permutations $\sigma$ of $(1,\dots,m)$. Show that $\operatorname{Alt}(M)$ is an alternating $m$-multilinear functional on $\mathbb{R}^n$ and that $\operatorname{Alt}(M) = M$ iff $M$ is alternating.

6. Prove that the vector space of $m$-forms on $S$ has dimension $\binom{n}{m}$.

7. Find the canonical representation of the following forms in $\mathbb{R}^3$:
(a)S $(f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3)\wedge(g_1\,dx_1 + g_2\,dx_2 + g_3\,dx_3)$.
(b) $(f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3)\wedge(g_1\,dx_1 + g_2\,dx_2 + g_3\,dx_3)\wedge(h_1\,dx_1 + h_2\,dx_2 + h_3\,dx_3)$.

8. Find the canonical representation of the following forms in $\mathbb{R}^5$:
(a) $(-dx_1 + dx_2 + dx_3)\wedge(dx_1 - 2\,dx_2 + 3\,dx_3)$.
(b)S $(dx_1 + dx_2)\wedge(dx_1 - dx_3)\wedge(dx_2 + 2\,dx_3)$.
(c) $dx_1\wedge(dx_1\wedge dx_3 + 3\,dx_5\wedge dx_4)$.
(d) $dx_1\wedge dx_2 + dx_1\wedge dx_3\wedge dx_4\wedge dx_3 + dx_2\wedge dx_5\wedge dx_3\wedge dx_1 + dx_4\wedge dx_1$.

9. Find the canonical representation of the following forms in $\mathbb{R}^n$:
(a)S $dx_2\wedge dx_4\wedge\cdots\wedge dx_{2k}\wedge dx_1\wedge dx_3\wedge\cdots\wedge dx_{2k-1}$, $2k\le n$.
(b) $dx_1\wedge dx_5\wedge\cdots\wedge dx_{4k-3}\wedge dx_3\wedge dx_7\wedge\cdots\wedge dx_{4k-1}\wedge dx_2\wedge dx_6\wedge\cdots\wedge dx_{4k-2}\wedge dx_4\wedge dx_8\wedge\cdots\wedge dx_{4k}$, $4k\le n$.


10. Show that if ω is an m-form and m is odd, then ω ∧ ω = 0. Find an example of a 2-form ω in R4 such that ω ∧ ω 6= 0. 11. Use the method of 13.1.12 to verify the determinants 1 −1 −1 3 1 0 2 1 1 = -4. (b)S 2 −1 0 = 9. (c) 0 (a) −1 1 0 −1 1 −1 2 1 12. Show directly that in Rn , d f ( dx1 ∧ · · · ∧ dxn ) = 0.

1 2 −1 1 = -6. 1 0

13. Let $f:\mathbb{R}\to\mathbb{R}$ be $C^1$ and define $g_j(x) = f(x_j)$. Find the canonical representation of
(a)S $d\Big(\sum_{j=1}^n g_j\,dx_j\Big)$. (b) $d\Big(\sum_{j=1}^n g_{n-j+1}\,dx_j\Big)$.

14. Find $d(f\,dg)$, where $f$ is $C^1$ and $g$ is $C^2$ on $W$.

15.S A form $\eta$ on $W$ is exact if $\eta = d\omega$ for some form $\omega$ on $W$. Prove that if $\eta$ is exact and $d\nu = 0$, then $\eta\wedge\nu$ is exact.

16. Let $f$ and $\omega := \sum_{i=1}^n f_i\,dx_i$ be $C^1$ on an open set $W\subseteq\mathbb{R}^n$. Show that if $d(f\omega) = 0$, then $f\,\omega\wedge d\omega = (df)\wedge\omega\wedge\omega$.

17.S Let $U\subseteq\mathbb{R}^k$, $V\subseteq\mathbb{R}^\ell$, and $W\subseteq\mathbb{R}^n$ be open and let $\varphi: U\to V$ and $\psi: V\to W$ be $C^1$. If $\omega$ is an $m$-form on $W$, prove that $(\psi\circ\varphi)^*\omega = \varphi^*(\psi^*\omega)$. Hint. Use 13.1.19(d).

18. Let $U, W\subseteq\mathbb{R}^n$ be open and let $\varphi: U\to W$ and $f: W\to\mathbb{R}^n$ be $C^1$. Show that $\varphi^*(dx_1\wedge\cdots\wedge dx_n) = \det(\varphi')\,du_1\wedge\cdots\wedge du_n$.

19.S Let $F = (f_1, f_2, f_3)$ be $C^1$ on $\mathbb{R}^3$ and homogeneous of degree $k\in\mathbb{N}$. (See Exercise 9.3.15.) Let $\omega = f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3$. Show that if $d\omega = 0$, then $\omega = df$ where $f(x) = (k+1)^{-1}F(x)\cdot x$.

13.2 Integrals on Parameterized Surfaces

Recall that the length of a parameterized curve $C$ in $\mathbb{R}^n$ is, by definition, a limit of lengths of inscribed polygonal lines. The proof of 12.2.4 shows that if the curve $C$ is $C^1$, then its length may also be approximated by tangent line segments. This idea may be extended to higher dimensions, using tangent parallelepipeds to approximate surface area. This leads ultimately to the definition of the integral of a function or a form on a surface.


Area of a Parallelepiped

13.2.1 Definition. The parallelepiped spanned by vectors $a^1,\dots,a^m\in\mathbb{R}^n$ is the set
$$P = P(a^1,\dots,a^m) := \Big\{\sum_{i=1}^m t_i a^i : 0\le t_i\le 1\Big\}.$$
The volume $\operatorname{vol}(P)$ of $P$ is its $n$-dimensional Lebesgue measure.

♦

For $m = n$, there is a simple formula for the volume:

13.2.2 Lemma. $\operatorname{vol}(P) = \big|\det[a^1\ \cdots\ a^n]\big|$.

Proof. Denote by $T\in L(\mathbb{R}^n,\mathbb{R}^n)$ the linear mapping with matrix $A := [a^1\ \cdots\ a^n]$. Since $T(e_j) = a^j$, a typical member of $P := P(a^1,\dots,a^n)$ may be expressed as
$$\sum_{i=1}^n t_i a^i = T\Big(\sum_{i=1}^n t_i e_i\Big) = T(t_1,\dots,t_n)\in T([0,1]^n).$$
By 11.6.3 and 11.6.9,
$$\lambda^n(P) = \lambda^n\big(T([0,1]^n)\big) = |\det A|\,\lambda^n([0,1]^n) = |\det A|.$$

If $m < n$, then $\lambda^n(P) = 0$ but $P$ may still have positive $m$-dimensional Lebesgue measure, as defined in 11.6.9. Specifically, let $V$ denote the linear span of the vectors $a^1,\dots,a^m$ and choose an orthonormal basis $v^1,\dots,v^n$ of $\mathbb{R}^n$ such that $v^1,\dots,v^m$ is a basis for $V$. Define $T\in L(V,\mathbb{R}^m)$ so that $T(v^j) = e_j$, $1\le j\le m$. Thus $T$ "rotates" and/or "reflects" $V$ onto $\mathbb{R}^m\times\{0\}$. The area of $P$ is then defined by
$$\operatorname{area}P(a^1,\dots,a^m) = \lambda^m\big(T(P(a^1,\dots,a^m))\big).$$
A concrete value for this area is given in the following theorem.

13.2.3 Theorem. Let $m < n$, $a^1,\dots,a^m\in\mathbb{R}^n$, and $A = [a^1\ \cdots\ a^m]$. Then
$$\operatorname{area}P(a^1,\dots,a^m) = \sqrt{\det(A^tA)} = \Big(\sum_{i\in I_m} (\det A_i)^2\Big)^{1/2}.$$

Proof. Set $b^j = T(a^j)$ and $B = [b^1\ \cdots\ b^m]$. By linearity of $T$,
$$T\big(P(a^1,\dots,a^m)\big) = P(b^1,\dots,b^m)\subseteq\mathbb{R}^m,$$
hence, by 13.2.2,
$$\operatorname{area}P(a^1,\dots,a^m) = \lambda^m\big(P(b^1,\dots,b^m)\big) = |\det B|.$$
Now, the $(i,j)$th entry of $B^tB$ is $b^i\cdot b^j$, and because $T$ preserves inner products this is the same as $a^i\cdot a^j$. Therefore, $B^tB = A^tA$, hence
$$|\det B| = \sqrt{(\det B^t)(\det B)} = \sqrt{\det(B^tB)} = \sqrt{\det(A^tA)}.$$
This proves the first equality in the theorem. The second equality is from 13.1.6.
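The two expressions for the area in 13.2.3 can be compared on a concrete example. The Python sketch below computes the area of the parallelogram spanned by two vectors in $\mathbb{R}^3$ first from the Gram determinant $\det(A^tA)$ and then from the sum of the squared $m\times m$ minors of $A$. The helper names (`det`, `gram_area`, `minors_area`) and the sample vectors are illustrative assumptions, and the determinant uses a simple Laplace expansion that is only suitable for small matrices.

```python
import itertools
import math

def det(M):
    # Laplace expansion along the first row (adequate for small matrices).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def gram_area(vectors):
    # sqrt(det(A^t A)), where the columns of A are the given vectors.
    gram = [[sum(x * y for x, y in zip(a, b)) for b in vectors] for a in vectors]
    return math.sqrt(det(gram))

def minors_area(vectors):
    # Square root of the sum of squared m x m minors of A.
    m, n = len(vectors), len(vectors[0])
    total = 0.0
    for rows in itertools.combinations(range(n), m):
        sub = [[v[r] for v in vectors] for r in rows]
        total += det(sub) ** 2
    return math.sqrt(total)

a1, a2 = [1.0, 2.0, 2.0], [0.0, 3.0, 4.0]
print(gram_area([a1, a2]), minors_area([a1, a2]))
```

For these vectors both values equal the length of the cross product $a^1\times a^2 = (2,-4,3)$, namely $\sqrt{29}$.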


Area of a Parameterized Surface

Let $\varphi: U\to\mathbb{R}^n$ be a parameterized $m$-surface in $\mathbb{R}^n$ with image $S$ and let $u = (u_1,\dots,u_m)\in U$ and $a = \varphi(u)\in S$. Choose a small $m$-dimensional interval
$$Q = [u_1, u_1+\Delta u_1]\times\cdots\times[u_m, u_m+\Delta u_m]\subseteq U,\quad \Delta u_j > 0.$$
As noted in Chapter 12, the line segments $u + te_j$ in $U$ map onto curves in $S$ with tangent vectors
$$d\varphi_u(e_j) = \partial_j\varphi(u),\quad 1\le j\le m,$$
at $\varphi(u)$. The matrix with columns $\partial_j\varphi(u)$ is $\varphi'(u)$, the Jacobian matrix of $\varphi$ at $u$. By 13.2.3, the parallelepiped spanned by the vectors $\Delta u_j\,\partial_j\varphi(u)$ therefore has area
$$\sqrt{\det\big(\varphi'(u)^t\varphi'(u)\big)}\;\Delta u_1\Delta u_2\cdots\Delta u_m,$$
which is taken as an approximation of the area of the surface element $\varphi(Q)$.

FIGURE 13.1: Parallelogram approximation to $\varphi(Q)$.

Partitioning $U$ into a grid $\mathcal{Q}$ of intervals $Q$ and summing these expressions, we obtain the Riemann sums
$$\sum_{Q}\sqrt{\det\big(\varphi'(u)^t\varphi'(u)\big)}\;\Delta u_1\Delta u_2\cdots\Delta u_m.$$
It is reasonable then to define the area of $S$ as the limit of these sums as the diameters of the intervals $Q$ tend to zero, that is,
$$\operatorname{area}(\varphi) := \int_U \sqrt{\det\big(\varphi'(u)^t\varphi'(u)\big)}\,du. \tag{13.11}$$
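The Riemann sums leading to (13.11) are easy to compute for $m = 2$. The Python sketch below does so for the standard spherical-coordinate parameterization of the unit sphere, which is an example chosen here and not taken from the text; the tangent vectors are approximated by central differences, and the sums are evaluated at midpoints of the grid cells rather than at corners. All function names are hypothetical.

```python
import math

def sphere(theta, phi):
    # Spherical-coordinate parameterization of the unit sphere in R^3.
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

def area_element(param, u1, u2, h=1e-5):
    # sqrt(det(phi'^t phi')) for m = 2, using finite-difference tangent vectors.
    p1 = [(a - b) / (2 * h) for a, b in zip(param(u1 + h, u2), param(u1 - h, u2))]
    p2 = [(a - b) / (2 * h) for a, b in zip(param(u1, u2 + h), param(u1, u2 - h))]
    e = sum(x * x for x in p1)
    g = sum(x * x for x in p2)
    f = sum(x * y for x, y in zip(p1, p2))
    return math.sqrt(max(e * g - f * f, 0.0))

def riemann_area(param, a1, b1, a2, b2, n=100):
    # Midpoint Riemann sum of the area element over [a1, b1] x [a2, b2].
    d1, d2 = (b1 - a1) / n, (b2 - a2) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += area_element(param, a1 + (i + 0.5) * d1, a2 + (j + 0.5) * d2) * d1 * d2
    return total

print(riemann_area(sphere, 0.0, 2 * math.pi, 0.0, math.pi), 4 * math.pi)
```

The sum approaches the familiar value $4\pi$ as the grid is refined.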

Integral of a Function on a Parameterized Surface

Let $f$ be a continuous, real-valued function on $S = \varphi(U)$. Motivated by (13.11) we define the surface integral of $f$ over $\varphi$ by
$$\int_\varphi f\,dS = \int_U (f\circ\varphi)(u)\sqrt{\det\big(\varphi'(u)^t\varphi'(u)\big)}\,du \tag{13.12}$$


whenever the right side exists. In particular,
$$\operatorname{area}(S) = \int_\varphi 1\,dS.$$
The integral on the right in (13.12) may be interpreted as a Lebesgue integral or (if $\varphi$ has compact support) as a Riemann integral. In the latter case, it is a limit of Riemann sums
$$\sum_{Q} (f\circ\varphi)(u)\sqrt{\det\big(\varphi'(u)^t\varphi'(u)\big)}\;\Delta u_1\cdots\Delta u_m. \tag{13.13}$$
This interpretation has important physical applications. For example, if $f$ is the density in mass per unit area of a curved sheet $S$ in $\mathbb{R}^3$, then each term of (13.13) approximates the mass of the surface element
$$\varphi\Big(\Big\{u + \sum_j t_j e_j : 0\le t_j\le\Delta u_j\Big\}\Big),$$
hence $\int_\varphi f\,dS$ gives the mass of $S$. For another example, let $f(x)$ denote the temperature of the sheet at the point $x\in S$. Then $[\operatorname{area}(S)]^{-1}\int_\varphi f\,dS$ gives the average temperature of the sheet.

To evaluate (13.12), it is useful to note that since $\varphi' = [\partial_1\varphi\ \cdots\ \partial_m\varphi]$, by 13.1.6
$$\det\big(\varphi'(u)^t\varphi'(u)\big) = \sum_{(i_1,\dots,i_m)\in I_m}\Big[\frac{\partial(\varphi_{i_1},\dots,\varphi_{i_m})}{\partial(u_1,\dots,u_m)}(u)\Big]^2. \tag{13.14}$$

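The following instances of (13.12) are of particular interest.

13.2.4 Special Cases.

(a) $m = 1$: Then $\det\big(\varphi'(u)^t\varphi'(u)\big) = \|\varphi'(u)\|^2$, hence
$$\int_\varphi f\,dS = \int_U (f\circ\varphi)(u)\,\|\varphi'(u)\|\,du = \int_\varphi f\,ds,$$
which is the line integral of Section 12.2.

(b) $m = 2$: In this case
$$\det\big(\varphi'(u)^t\varphi'(u)\big) = \det\Big([\partial_1\varphi\ \ \partial_2\varphi]^t\,[\partial_1\varphi\ \ \partial_2\varphi]\Big) = \det\begin{bmatrix}\partial_1\varphi\cdot\partial_1\varphi & \partial_1\varphi\cdot\partial_2\varphi\\ \partial_1\varphi\cdot\partial_2\varphi & \partial_2\varphi\cdot\partial_2\varphi\end{bmatrix},$$
hence
$$\int_\varphi f\,dS = \int_U (f\circ\varphi)\sqrt{\|\partial_1\varphi\|^2\,\|\partial_2\varphi\|^2 - (\partial_1\varphi\cdot\partial_2\varphi)^2}\;du.$$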

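(c) $m = n-1$: Here
$$\det\big(\varphi'(u)^t\varphi'(u)\big) = \sum_{i=1}^n\Big[\frac{\partial(\varphi_1,\dots,\widehat{\varphi_i},\dots,\varphi_n)}{\partial(u_1,\dots,u_{n-1})}(u)\Big]^2 = \|\partial\varphi^\perp(u)\|^2,$$
hence
$$\int_\varphi f\,dS = \int_U (f\circ\varphi)(u)\,\|\partial\varphi^\perp(u)\|\,du.$$

(d) $\varphi(u_1,\dots,u_{n-1}) = \big(u_1,\dots,u_{n-1},\,g(u_1,\dots,u_{n-1})\big)$ (the graph of $g$): Let $\mathbf{i} = (1,\dots,i-1,i+1,\dots,n)$. Then
$$\frac{\partial(\varphi_1,\dots,\widehat{\varphi_i},\dots,\varphi_n)}{\partial(u_1,\dots,u_{n-1})}
= \det\begin{bmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 1\\ \partial_1 g & \partial_2 g & \cdots & \partial_{n-1}g\end{bmatrix}_{\mathbf{i}}
= \begin{cases}(-1)^{n-1+i}\,\partial_i g, & i < n,\\ 1, & i = n,\end{cases} \tag{13.15}$$
hence
$$\det\big(\varphi'(u)^t\varphi'(u)\big) = 1 + \|\nabla g(u)\|^2$$
and
$$\int_\varphi f\,dS = \int_U (f\circ\varphi)(u)\sqrt{1 + \|\nabla g(u)\|^2}\;du.$$
♦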
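The graph formula in 13.2.4(d) lends itself to a quick numerical test. The Python sketch below approximates the area of the graph of the hypothetical function $g(x,y) = \tfrac23\big(x^{3/2} + y^{3/2}\big)$ over the unit square, for which $1 + \|\nabla g\|^2 = 1 + x + y$ and the integral has the closed form $\tfrac{4}{15}\big(3^{5/2} - 2\cdot 2^{5/2} + 1\big)$. The function $g$, the domain, and the midpoint rule are choices made here for illustration.

```python
import math

def grad_g(x, y):
    # Gradient of the sample function g(x, y) = (2/3)(x^(3/2) + y^(3/2)).
    return math.sqrt(x), math.sqrt(y)

def graph_area(grad, n=200):
    # Midpoint Riemann sum of sqrt(1 + |grad g|^2) over the unit square.
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            gx, gy = grad((i + 0.5) * h, (j + 0.5) * h)
            total += math.sqrt(1.0 + gx * gx + gy * gy) * h * h
    return total

# Closed form of the double integral of sqrt(1 + x + y) over the unit square.
exact = (4.0 / 15.0) * (3 ** 2.5 - 2 * 2 ** 2.5 + 1.0)
print(graph_area(grad_g), exact)
```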

13.2.5 Example. Let $S$ be the following portion of an $n$-dimensional cone:
$$S = \Big\{(x_1,\dots,x_{n+1}) : x_{n+1}^2 = \sum_{i=1}^n x_i^2,\ 0 < x_{n+1} < 1\Big\}.$$
Then $S$ is parameterized by
$$\varphi(x) = \big(x, g(x)\big),\quad g(x) := \|x\|,\quad x := (x_1,\dots,x_n),$$
where $\nabla g(x) = x/\|x\|$. If $f$ is of the form $f(x) = h(\|x\|)$, then, by Exercise 11.6.3,
$$\int_\varphi f\,dS = \sqrt{2}\,n\,\alpha_n\int_0^1 h(r)\,r^{n-1}\,dr.$$
In particular, taking $h = 1$,
$$\operatorname{area}(S) = \sqrt{2}\,\alpha_n.$$
♦

The following result will be needed later to construct the integral of a function on a general m-surface. It asserts that the integral over a parameterized surface ϕ is invariant under a change of parameter and hence may be viewed as a construct intrinsic to the image of ϕ.


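13.2.6 Proposition. Let $U$ and $V$ be open subsets of $\mathbb{R}^m$, $\alpha: V\to U$ a $C^1$ function with $C^1$ inverse, and $\varphi: U\to\mathbb{R}^n$ a parameterized $m$-surface. Then $\psi := \varphi\circ\alpha$ is a parameterized $m$-surface and $\int_\varphi f\,dS = \int_\psi f\,dS$.

Proof. By the chain rule, $\psi'(v) = \varphi'(u)\alpha'(v)$, where $u = \alpha(v)$, hence
$$\det\big(\psi'(v)^t\psi'(v)\big) = \det\big(\alpha'(v)^t\varphi'(u)^t\varphi'(u)\alpha'(v)\big) = J_\alpha(v)^2\,\det\big(\varphi'(u)^t\varphi'(u)\big).$$
Therefore, by the change of variables theorem,
$$\begin{aligned}
\int_\psi f\,dS &= \int_V (f\circ\psi)(v)\sqrt{\det\big(\psi'(v)^t\psi'(v)\big)}\,dv\\
&= \int_V (f\circ\varphi)(\alpha(v))\sqrt{\det\big(\varphi'(\alpha(v))^t\varphi'(\alpha(v))\big)}\;|J_\alpha(v)|\,dv\\
&= \int_U (f\circ\varphi)(u)\sqrt{\det\big(\varphi'(u)^t\varphi'(u)\big)}\,du\\
&= \int_\varphi f\,dS.
\end{aligned}$$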

13.2.7 Remark. The material in this section holds, in particular, for a local parametrization of an $m$-surface as well as a local parametrization of an $(n-1)$-surface-with-boundary. In the latter case, the domain of the parametrization at a boundary point is an open set in $\mathbb{H}^{n-1}$. ♦

Integration of a Form on a Parameterized m-Surface

13.2.8 Definition. Let $\varphi: U\to\mathbb{R}^n$ be a parameterized orientable $m$-surface in $\mathbb{R}^n$ and let
$$\omega = \sum_{(j_1,\dots,j_m)\in J_m} f_{j_1,\dots,j_m}\,dx_{j_1}\wedge\cdots\wedge dx_{j_m}$$
be a continuous $m$-form on $S := \varphi(U)$. The integral of $\omega$ over $\varphi$ is defined by
$$\int_\varphi\omega = \int_S\omega = \operatorname{sign}(\varphi)\int_U \omega_{\varphi(u)}\big(d\varphi_u(e_1),\dots,d\varphi_u(e_m)\big)\,du.$$
♦

The inclusion of $\operatorname{sign}(\varphi)$ corresponds to the familiar convention
$$\int_b^a f(t)\,dt = -\int_a^b f(t)\,dt$$
for Riemann integrals, which reflects the fact that the process of Riemann integration respects the natural orientation (ordering) of the interval $[a,b]$. Recalling that $d\varphi_u(e_j) = \partial_j\varphi(u)$ and
$$\big[dx_{j_1}\wedge\cdots\wedge dx_{j_m}\big]\big(\partial_1\varphi(u),\dots,\partial_m\varphi(u)\big) = \frac{\partial(\varphi_{j_1},\dots,\varphi_{j_m})}{\partial(u_1,\dots,u_m)}(u),$$

we obtain the formula
$$\int_\varphi\omega = \operatorname{sign}(\varphi)\int_U\sum_{(j_1,\dots,j_m)\in J_m}(f_{j_1,\dots,j_m}\circ\varphi)\,\frac{\partial(\varphi_{j_1},\dots,\varphi_{j_m})}{\partial(u_1,\dots,u_m)}\,du. \tag{13.16}$$

The following instances of (13.16) are of particular importance.

13.2.9 Special Cases. Let $\varphi$ be positively oriented.

(a) $m = 1$:
$$\int_\varphi\sum_{i=1}^n f_i\,dx_i = \sum_{i=1}^n\int_I f_i(\varphi(t))\,\varphi_i'(t)\,dt,$$
which is the integral of Section 12.2.

(b) $m = n-1$:
$$\int_\varphi\sum_{i=1}^n f_i\,dx_1\wedge\cdots\wedge\widehat{dx_i}\wedge\cdots\wedge dx_n = \int_U\sum_{i=1}^n (f_i\circ\varphi)\,\frac{\partial(\varphi_1,\dots,\widehat{\varphi_i},\dots,\varphi_n)}{\partial(u_1,\dots,u_{n-1})}\,du.$$
In particular, for the graph $\varphi(u_1,\dots,u_{n-1}) = \big(u_1,\dots,u_{n-1},\,g(u_1,\dots,u_{n-1})\big)$, we have from (13.15)
$$\int_\varphi\sum_{i=1}^n f_i\,dx_1\wedge\cdots\wedge\widehat{dx_i}\wedge\cdots\wedge dx_n = \int_U\Big[f_n\circ\varphi + \sum_{i=1}^{n-1}(-1)^{n-1+i}(f_i\circ\varphi)\,\partial_i g\Big]\,du.$$

(c) $m = 2$, $n = 3$: Let $D_{ij}(u) := \dfrac{\partial(\varphi_i,\varphi_j)}{\partial(u_1,u_2)}$. Then
$$\int_\varphi f_1\,dx_2\wedge dx_3 + f_2\,dx_1\wedge dx_3 + f_3\,dx_1\wedge dx_2 = \int_U\big[(f_1\circ\varphi)(u)D_{23}(u) + (f_2\circ\varphi)(u)D_{13}(u) + (f_3\circ\varphi)(u)D_{12}(u)\big]\,du.$$

(d) $m$-form on the parameterized surface $\iota: U\to U$:
$$\int_\iota\sum_{(j_1,\dots,j_m)\in J_m} g_{j_1,\dots,j_m}\,du_{j_1}\wedge\cdots\wedge du_{j_m} = \int_U\sum_{j\in J_m} g_j(u)\,du.$$
♦

13.2.10 Notation. For the integral on the left in (d) we write $\int_U$ instead of $\int_\iota$. In particular,
$$\int_U g\,du_{j_1}\wedge\cdots\wedge du_{j_m} = \int_U g(u)\,du.$$
♦


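13.2.11 Example. Let $S$ be the following portion of a paraboloid:
$$S = \big\{(x_1,x_2,x_3) : x_1 = x_2^2 + x_3^2,\ 0 < x_1 < 1\big\}.$$
For purposes of integration, we may consider $S$ to be the image of the parameterized 2-surface
$$\varphi(t,\theta) = \big(t,\ \sqrt{t}\cos\theta,\ \sqrt{t}\sin\theta\big),\quad 0 < t < 1,\ 0 < \theta < 2\pi,$$
since there are no contributions to an integral on the set where $\theta = 0$. By 13.2.9(c),
$$\begin{aligned}
\int_S x_2^2x_3\,dx_1\wedge dx_2 &= \int_0^1\!\!\int_0^{2\pi}\big[t^{3/2}\cos^2\theta\sin\theta\big]\,\frac{\partial(\varphi_1,\varphi_2)}{\partial(t,\theta)}\,d\theta\,dt\\
&= -\int_0^1\!\!\int_0^{2\pi} t^2\cos^2\theta\sin^2\theta\,d\theta\,dt\\
&= -\frac{\pi}{12}.
\end{aligned}$$
♦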
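The value $-\pi/12$ in Example 13.2.11 can be confirmed numerically from formula (13.16). The Python sketch below forms a midpoint Riemann sum of $(f\circ\varphi)\,\partial(\varphi_1,\varphi_2)/\partial(t,\theta)$ with $f = x_2^2x_3$, approximating the Jacobian by central differences. It assumes, as in 13.2.9, that $\varphi$ is positively oriented; the function names and grid size are illustrative.

```python
import math

def phi(t, th):
    # Parameterization of the paraboloid from Example 13.2.11.
    r = math.sqrt(t)
    return (t, r * math.cos(th), r * math.sin(th))

def jacobian_12(t, th, h=1e-6):
    # Finite-difference approximation of d(phi_1, phi_2)/d(t, theta).
    a = [(p - q) / (2 * h) for p, q in zip(phi(t + h, th), phi(t - h, th))]
    b = [(p - q) / (2 * h) for p, q in zip(phi(t, th + h), phi(t, th - h))]
    return a[0] * b[1] - a[1] * b[0]

def form_integral(n=200):
    # Midpoint Riemann sum for the integral of x2^2 x3 dx1 ^ dx2 over phi.
    dt, dth = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        for j in range(n):
            th = (j + 0.5) * dth
            x1, x2, x3 = phi(t, th)
            total += x2 * x2 * x3 * jacobian_12(t, th) * dt * dth
    return total

print(form_integral(), -math.pi / 12)
```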

The following proposition, the analog of 13.2.6 for differential forms, shows that the definition of integral of a form is invariant under reparametrizations.

13.2.12 Proposition. Let $U, V$ be open connected subsets of $\mathbb{R}^m$, $\alpha: V\to U$ a $C^1$ function with $C^1$ inverse and positive Jacobian, and $\varphi: U\to\mathbb{R}^n$ a parameterized orientable $m$-surface. If $\omega$ is a continuous $m$-form on $\varphi(U)$, then
$$\int_\varphi\omega = \int_{\varphi\circ\alpha}\omega.$$

Proof. Note first that $\operatorname{sign}(J_\alpha)$ is constant since $\alpha$ is $C^1$ and $V$ is connected. Let $\psi = \varphi\circ\alpha$. By the chain rule and the change of variables theorem,
$$\begin{aligned}
\int_V (f_{j_1,\dots,j_m}\circ\psi)\,\frac{\partial(\psi_{j_1},\dots,\psi_{j_m})}{\partial(v_1,\dots,v_m)}\,dv
&= \int_V (f_{j_1,\dots,j_m}\circ\varphi\circ\alpha)(v)\,\frac{\partial(\varphi_{j_1},\dots,\varphi_{j_m})}{\partial(u_1,\dots,u_m)}\big(\alpha(v)\big)\,J_\alpha(v)\,dv\\
&= \int_U (f_{j_1,\dots,j_m}\circ\varphi)\,\frac{\partial(\varphi_{j_1},\dots,\varphi_{j_m})}{\partial(u_1,\dots,u_m)}\,du.
\end{aligned}$$
The conclusion now follows from (13.16) and linearity of the integral.

The final result of this section expresses $\int_\varphi\omega$ as an integral of a form on $U$. It will be needed in the proof of Stokes's theorem.

13.2.13 Theorem. Let $U\subseteq\mathbb{R}^m$ be open and let $\varphi: U\to\mathbb{R}^n$ be an oriented parameterized surface. If $\omega$ is a $C^1$ $m$-form on $\varphi(U)$, then
$$\int_\varphi\omega = \operatorname{sign}(\varphi)\int_U\varphi^*\omega.$$


Proof. By (d) of 13.1.19, if $\iota: U\to U$ denotes the identity map then
$$\omega_{\varphi(u)}\big(d\varphi_u(e_1),\dots,d\varphi_u(e_m)\big) = (\varphi^*\omega)_u(e_1,\dots,e_m) = (\varphi^*\omega)_u\big(d\iota_u(e_1),\dots,d\iota_u(e_m)\big).$$
The result now follows directly from the definition of the integral of a form (13.2.8) and 13.2.10.

Exercises

1. Find the area of the following 2-surfaces in $\mathbb{R}^3$.
(a) $\varphi(t,\theta) = (t\cos\theta,\ t\sin\theta,\ t)$, $t\in(0,1)$, $\theta\in(0,2\pi)$.
(b)S $\varphi(t,\theta) = (t\cos\theta,\ t\sin\theta,\ \theta)$, $0 < t < 1$, $0 < \theta < 2\pi$.
(c) $\varphi(\theta,s) = (1-s)\,(a\cos\theta,\ a\sin\theta,\ 0) + s\,(b\cos\theta,\ b\sin\theta,\ 1)$, $0 < s < 1$, $0 < \theta < 2\pi$, $0 < a < b$.

2. Let $a^1,\dots,a^m\in\mathbb{R}^n$ be linearly independent and let $b\in\mathbb{R}^n$. Define $\varphi:\mathbb{R}^m\to\mathbb{R}^n$ by
$$\varphi(u_1,\dots,u_m) = b + \sum_{i=1}^m u_i a^i.$$
(See 12.3.2.) For a continuous function $f$ on $\mathbb{R}^n$, prove that
$$\int_\varphi f\,dS = \sqrt{\det(A^tA)}\int_{\mathbb{R}^m}(f\circ\varphi)(u)\,du,$$
where $A = [a^1\ \cdots\ a^m]_{n\times m}$.

3. Let $\varphi$ be as in Exercise 2. Show that
$$\int_\varphi\sum_{i\in I_m} f_i\,dx_i = \sum_{i\in I_m}\det(A_i)\int_U f_i\circ\varphi\,du.$$

4.S Show that the area of the Cartesian product of circles
$$\varphi(\theta_1,\dots,\theta_m) = \big(r_1\cos\theta_1,\ r_1\sin\theta_1,\ \dots,\ r_m\cos\theta_m,\ r_m\sin\theta_m\big),\quad r_i > 0,$$
is $(2\pi r_1)(2\pi r_2)\cdots(2\pi r_m)$.

5. Let $\varphi$ be the product of two circles:
$$\varphi(\theta_1,\theta_2) = \big(r_1\cos\theta_1,\ r_1\sin\theta_1,\ r_2\cos\theta_2,\ r_2\sin\theta_2\big),\quad r_i > 0,$$
and let
$$\omega = f_{12}\,dx_1\wedge dx_2 + f_{13}\,dx_1\wedge dx_3 + f_{14}\,dx_1\wedge dx_4 + f_{23}\,dx_2\wedge dx_3 + f_{24}\,dx_2\wedge dx_4 + f_{34}\,dx_3\wedge dx_4.$$

Show that
$$\int_\varphi\omega = r_1r_2\int_0^{2\pi}\!\!\int_0^{2\pi}\big[(f_{13}\circ\varphi)\sin\theta_1\sin\theta_2 - (f_{14}\circ\varphi)\sin\theta_1\cos\theta_2 - (f_{23}\circ\varphi)\cos\theta_1\sin\theta_2 + (f_{24}\circ\varphi)\cos\theta_1\cos\theta_2\big]\,d\theta_1\,d\theta_2.$$

6.S (Area of an $n$-dimensional simplex in $\mathbb{R}^{n+1}$). Use Example 11.5.5 to find the surface area of
$$S = \Big\{(x_1,\dots,x_{n+1}) : \sum_{j=1}^{n+1} x_j = 1 \text{ and } x_j\ge 0\Big\}.$$

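FIGURE 13.2: Two dimensional simplex $S$ in $\mathbb{R}^3$.

7.S Let $U\subseteq\mathbb{R}^{n-2}$ be open and let $\psi: U\to\mathbb{R}^{n-1}$ be a parameterized $(n-2)$-surface in $\mathbb{R}^{n-1}$. Let $\varphi: U\times[0,h]\to\mathbb{R}^n$ be the cylinder
$$\varphi(u,s) = \big(\psi(u), s\big),\quad u\in U,\ 0\le s\le h.$$
Show that $\operatorname{area}(\varphi) = h\cdot\operatorname{area}(\psi)$.

8.S Let $\varphi$ be the cylinder of Exercise 7 for $n = 3$ and $h = 1$. Show that
$$\int_\varphi f_1\,dx_2\wedge dx_3 + f_2\,dx_1\wedge dx_3 + f_3\,dx_1\wedge dx_2 = \int_0^1\big[(f_1\circ\varphi)\psi_1' + (f_2\circ\varphi)\psi_2'\big]\,dt.$$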

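9. Let $\psi: [a,b]\to\mathbb{R}^2$ be a $C^1$ curve in $\mathbb{R}^2$ and let $\varphi: [a,b]\times(0,h)\to\mathbb{R}^3$ be the cone
$$\varphi(t,s) = \big((1 - s/h)\psi(t),\ s\big),\quad a\le t\le b,\ 0 < s < h.$$
Show that the area of $\varphi$ is
$$\frac{h}{2}\int_a^b\sqrt{\psi_1'(t)^2 + \psi_2'(t)^2 + h^{-2}\big[\psi_1(t)\psi_2'(t) - \psi_2(t)\psi_1'(t)\big]^2}\;dt.$$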


Use this to show that the surface area of a right circular cone with radius $r$ and axis length $h$ is $\pi r\sqrt{r^2 + h^2}$.

10. Let $\varphi$ be the cone of Exercise 9 with $h = 1$. Show that
$$\int_\varphi f_1\,dx_2\wedge dx_3 + f_2\,dx_1\wedge dx_3 + f_3\,dx_1\wedge dx_2$$

=

1 2

Z

1

n o (f1 ◦ ϕ)ψ10 + (f2 ◦ ϕ)ψ20 + [ψ1 (t)ψ20 (t) − ψ2 (t)ψ10 (t)] dt.

0

11.S Let ψ : [a, b] → R2 a parameterized C 1 curve with ψ2 (t) > 0 for all t. Define ϕ(t, θ) = ψ1 (t), ψ2 (t) cos θ, ψ2 (t) sin θ , t ∈ I, θ ∈ (0, 2π), which is the parameterized surface of revolution of 12.3.9. Show that Z b area(ϕ) = 2π ψ2 (t)kψ 0 (t)k dt = (2πy)length(ψ), (13.17) a

Z 1 y ds, the y-coordinate of the length(ψ) ψ centroid of ψ. Use the first part of (13.17) to find the surface area of the torus ϕ(t, θ) = a cos θ, (b + a sin t) cos θ, (b + a sin t) sin θ , 0 < θ, t < 2π,

where (x, y) = ψ and y :=

where 0 < a < b. Show also that the area of the cone in Exercise 9 may be found from (13.17). 12. Let ϕ be the parameterized surface of revolution in Exercise 11 and let ω := f1 dx2 ∧ dx3 + f2 dx1 ∧ dx3 + f3 dx1 ∧ dx2 . Show that Z Z bZ 2π Z bZ 2π (f2 ◦ ϕ)ψ10 (t)ψ2 (t) cos θ dθ dt ω= (f1 ◦ ϕ)ψ2 (t)ψ20 (t) dθ dt + ϕ

a

a

0

Z bZ − a

0

2π

(f3 ◦ ϕ)ψ10 (t)ψ2 (t) sin θ dθ dt.

0

Show also that if ψ(t) = (t, g(t)) (the graph of g), then this reduces to Z b Z 2π g(t) (f1 ◦ ϕ)g 0 (t) + (f2 ◦ ϕ) cos θ − (f3 ◦ ϕ) sin θ dθ dt. a

0

13. Use Exercise 12 to evaluate Z Z Z (a)S x1 x3 dx1 ∧ dx2 , (b) x2 x3 dx1 ∧ dx2 , (c) x21 x22 dx2 ∧ dx3 , S

S

S

where S is the cone S = (x1 , x2 , x3 ) : x21 = x22 + x23 , 0 < x1 < 1 .

14. Repeat Exercise 13 using the portion of the hyperboloid
$$S = \big\{(x_1,x_2,x_3) : x_1^2 - x_2^2 - x_3^2 = 1,\ 1 < x_1 < \sqrt{2}\big\}.$$

15. Let $S$ be the torus given by $x_1^2 + \big(\sqrt{x_2^2 + x_3^2} - b\big)^2 = a^2$, where $0 < a < b$. Use Exercise 12 to evaluate
(a) $\displaystyle\int_S x_2\,dx_2\wedge dx_3$. (b)S $\displaystyle\int_S x_1\,dx_2\wedge dx_3$. (c) $\displaystyle\int_S x_2\,dx_1\wedge dx_3$.

13.3 Partitions of Unity

The theorem proved in this section will be used to extend the definition of the integral to functions and forms on $m$-surfaces. It will also be needed later in the proofs of Stokes's theorem and the divergence theorem.

13.3.1 Definition. The support of a continuous function $\psi:\mathbb{R}^n\to\mathbb{R}$ is defined by
$$\operatorname{supp}(\psi) = \operatorname{cl}\{x : \psi(x)\ne 0\}.$$
♦

Thus, by definition of closure, $\operatorname{supp}(\psi)$ is the smallest closed set outside of which $\psi$ is zero.

13.3.2 Partition of Unity. Let $K$ be a compact subset of $\mathbb{R}^n$ and let $\{U_i : i\in I\}$ be an open cover of $K$. Then there exists a finite subcover $\{U_1,\dots,U_p\}$ of $K$ and $C^\infty$ functions $\chi_i:\mathbb{R}^n\to[0,+\infty)$, $i = 1,\dots,p$, such that $\operatorname{supp}(\chi_i)$ is compact and contained in $U_i$ and $\sum_{i=1}^p\chi_i = 1$ on $K$.

FIGURE 13.3: A partition of unity subordinate to $U_1$ and $U_2$.

The functions $\chi_i$ are said to form a partition of unity subordinate to the open sets $U_i$. They are typically used to patch together local data to form a global construct such as a surface integral, or to reduce a global problem to a local one, as in the case of the proof of Stokes's theorem. The proof of 13.3.2 requires several lemmas which are of intrinsic interest.

13.3.3 Lemma. Let $a < b$. Then there exists a $C^\infty$ function $h:\mathbb{R}\to[0,+\infty)$ such that $h > 0$ on $(a,b)$, and $h = 0$ on $(a,b)^c$.

Proof. Define $h$ by
$$h(x) = \begin{cases}\exp\big[(x-a)^{-1}(x-b)^{-1}\big] & \text{if } a < x < b,\\ 0 & \text{otherwise.}\end{cases}$$
Clearly, $h^{(m)} = 0$ on $[a,b]^c$ for all $m\ge 0$. Moreover, if $x\in(a,b)$, then $h^{(m)}(x)$ is a sum of terms of the form
$$\frac{\pm h(x)}{(x-a)^p(x-b)^q},\quad p, q\in\mathbb{Z}^+.$$
Since the exponent $(x-a)^{-1}(x-b)^{-1}$ is negative on $(a,b)$, by l'Hospital's rule,
$$\lim_{x\to a^+}\frac{h(x)}{(x-a)^p(x-b)^q} = 0.$$
Therefore, $\lim_{x\to a}h^{(m)}(x) = 0$, and an induction argument then shows that $h^{(m)}(a) = 0$ for all $m$. A similar argument holds at the point $b$. Thus $h$ is $C^\infty$ on $\mathbb{R}$.

FIGURE 13.4: The functions $h$ and $g$.

13.3.4 Lemma. Let $a < b$. Then there exists a $C^\infty$ function $g:\mathbb{R}\to\mathbb{R}$ such that $0\le g\le 1$, $g = 0$ on $(-\infty,a]$, and $g = 1$ on $[b,+\infty)$.

Proof. Let $h$ be the function in 13.3.3. Then
$$g(x) := \Big[\int_a^b h\Big]^{-1}\int_a^x h$$
has the required properties.

13.3.5 Lemma. Let $I = (a_1,b_1)\times\cdots\times(a_n,b_n)$. Then there exists a $C^\infty$ function $f:\mathbb{R}^n\to\mathbb{R}$ such that $f > 0$ on $I$ and $f = 0$ on $I^c$.

Proof. For each $j$, let $h_j:\mathbb{R}\to[0,+\infty)$ be a $C^\infty$ function such that $h_j > 0$ on $(a_j,b_j)$ and $h_j = 0$ on $(a_j,b_j)^c$. The function $f(x_1,\dots,x_n) := h_1(x_1)\cdots h_n(x_n)$ then satisfies the requirements.
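The functions of Lemmas 13.3.3 and 13.3.4 can be evaluated directly. The Python sketch below implements the bump function $h$ from the proof of 13.3.3 and approximates the step function $g$ of 13.3.4 with a midpoint rule. The quadrature and the names `bump`, `integrate`, and `smooth_step` are illustrative assumptions, not part of the text; the quadrature introduces a small numerical error in the transition region $(a,b)$.

```python
import math

def bump(x, a, b):
    # h from Lemma 13.3.3: positive on (a, b), zero elsewhere.
    if a < x < b:
        return math.exp(1.0 / ((x - a) * (x - b)))
    return 0.0

def integrate(fn, lo, hi, n=2000):
    # Midpoint rule on [lo, hi].
    w = (hi - lo) / n
    return sum(fn(lo + (k + 0.5) * w) for k in range(n)) * w

def smooth_step(x, a, b):
    # g from Lemma 13.3.4: 0 on (-inf, a], 1 on [b, +inf), increasing in between.
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    total = integrate(lambda s: bump(s, a, b), a, b)
    return integrate(lambda s: bump(s, a, b), a, x) / total

print([round(smooth_step(x, 0.0, 1.0), 4) for x in (-0.5, 0.25, 0.5, 0.75, 1.5)])
```

By the symmetry of $h$ about the midpoint of $(a,b)$, the value of $g$ at $(a+b)/2$ is $1/2$.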


For the next lemma we define the open cube with center $x\in\mathbb{R}^n$ and edge $2r$ by
$$\{y\in\mathbb{R}^n : x_j - r < y_j < x_j + r,\ j = 1,\dots,n\}.$$

13.3.6 Lemma. Let $K\subseteq U\subseteq\mathbb{R}^n$, where $K$ is compact and $U$ is open. Then there exists a $C^\infty$ function $\psi:\mathbb{R}^n\to[0,1]$ such that $\operatorname{supp}(\psi)\subseteq U$ and $\psi = 1$ on $K$.

Proof. For each $x\in K$, let $V_x$ be an open cube with center $x$ and edge $2r$ such that $\operatorname{cl}V_x\subseteq U$ and let $W_x\subseteq V_x$ denote the concentric open cube with center $x$ and edge $r$. Since $K$ is compact, there exist finitely many cubes $W_x$ whose union contains $K$. Denote these cubes by $W_1,\dots,W_m$ and denote the corresponding cubes $V_x$ by $V_1,\dots,V_m$. (See Figure 13.5.) By 13.3.5, for each $i$ there exists a $C^\infty$ function $f_i:\mathbb{R}^n\to\mathbb{R}$ such that $f_i > 0$ on $W_i$ and $f_i = 0$ on $W_i^c$. Set
$$f = \sum_{i=1}^m f_i,\quad V = \bigcup_{i=1}^m V_i,\quad\text{and}\quad W = \bigcup_{i=1}^m W_i.$$
Then $f$ is nonnegative and $C^\infty$ on $\mathbb{R}^n$, $f > 0$ on $W\supseteq K$, and $\operatorname{supp}(f)\subseteq\operatorname{cl}(V)\subseteq U$. Now let $a = \min_{x\in K}f(x)$. Since $a > 0$, there exists a $C^\infty$ function $g:\mathbb{R}\to[0,1]$ such that $g = 0$ on $(-\infty,0]$ and $g = 1$ on $[a,+\infty)$ (13.3.4). The function $\psi := g\circ f$ then has the required properties.

FIGURE 13.5: The cubes $W_i$ and $V_i$.

Proof of the partition of unity theorem. For each x ∈ K, let i(x) be an index such that x ∈ Ui(x) . Choose a bounded open set Vx containing x such that cl Vx ⊆ Ui(x) . Since K is compact, finitely many of the sets Vx cover K. Denote these by V1 , . . . Vp and denote the corresponding sets Ui(x) by U1 , . . . , Up . Since Vi ⊆ Ki := cl(Vi ) ⊆ Ui , by 13.3.6 there exists a C ∞ function ψi : Rn → [0, 1] such that ψi = 1 on Ki and supp(ψi ) ⊆ Ui . Now set χ1 = ψ1 and χi = (1 − ψ1 )(1 − ψ2 ) · · · (1 − ψi−1 )ψi , i > 1. Then χi is C ∞ , 0 ≤ χi ≤ 1, and supp(χi ) ⊆ supp(ψi ) ⊆ Ui . Finally, let ηi = (1 − ψ1 )(1 − ψ2 ) · · · (1 − ψi ).

For $i > 1$,
$$\eta_{i-1} - \eta_i = (1-\psi_1)(1-\psi_2)\cdots(1-\psi_{i-1})\big[1 - (1-\psi_i)\big] = \chi_i,$$
hence
$$\sum_{i=1}^p\chi_i = \chi_1 + \sum_{i=2}^p(\eta_{i-1} - \eta_i) = \chi_1 + \eta_1 - \eta_p = 1 - \eta_p.$$
Since $K\subseteq\bigcup_i V_i\subseteq\bigcup_i K_i$ and $\psi_i = 1$ on $K_i$, $\eta_p = 0$ on $K$, hence $\sum_{i=1}^p\chi_i = 1$ on $K$, completing the proof.
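The telescoping construction $\chi_i = (1-\psi_1)\cdots(1-\psi_{i-1})\psi_i$ can be tried out in one dimension. In the Python sketch below, $K = [0,3]$ is covered by three intervals $K_i$, and each $\psi_i$ equals 1 on $K_i$ and vanishes a margin away from it. The step function used to build the $\psi_i$ is the common quotient $e(x)/(e(x)+e(1-x))$ with $e(t) = e^{-1/t}$ for $t > 0$, which is a different smooth step from the integral construction of Lemma 13.3.4; it, the intervals, and all names are assumptions made for illustration.

```python
import math

def smooth_step(x):
    # A C-infinity step: 0 for x <= 0, 1 for x >= 1 (quotient construction).
    def e(t):
        return math.exp(-1.0 / t) if t > 0 else 0.0
    return e(x) / (e(x) + e(1.0 - x))

def plateau(lo, hi, margin):
    # psi equal to 1 on [lo, hi] and 0 outside (lo - margin, hi + margin).
    def psi(x):
        return (smooth_step((x - (lo - margin)) / margin)
                * smooth_step(((hi + margin) - x) / margin))
    return psi

def partition(psis):
    # chi_i = (1 - psi_1) ... (1 - psi_{i-1}) psi_i, as in the proof of 13.3.2.
    def make(i):
        def chi(x):
            prod = 1.0
            for k in range(i):
                prod *= 1.0 - psis[k](x)
            return prod * psis[i](x)
        return chi
    return [make(i) for i in range(len(psis))]

psis = [plateau(-0.1, 1.1, 0.3), plateau(0.9, 2.1, 0.3), plateau(1.9, 3.1, 0.3)]
chis = partition(psis)
for x in (0.0, 1.0, 1.5, 2.05, 3.0):
    print(x, [round(c(x), 4) for c in chis], sum(c(x) for c in chis))
```

At every point of $K$ the printed sum should be 1 up to rounding, while each $\chi_i$ stays between 0 and 1.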

13.4 Integration on Compact m-Surfaces

In this section we define the integrals of a function and a form on a compact $m$-surface
$$S = \{x\in V : F(x) = 0\},$$
where $V\subseteq\mathbb{R}^n$ is open, $F: V\to\mathbb{R}^{n-m}$ is $C^1$, and $F'(x)$ has rank $n-m$ for all $x\in V$. To set the stage, let $\{(U_a,\varphi_a) : a\in S\}$ be an atlas for $S$. By the partition of unity theorem, there exist finitely many charts $(U_i,\varphi_i) := (U_{a_i},\varphi_{a_i})$ and $C^1$ functions $\chi_i:\mathbb{R}^n\to\mathbb{R}$ such that the sets $S_i := \varphi_i(U_i)$ cover $S$, $\operatorname{supp}(\chi_i)\subseteq S_i$, and $\sum_i\chi_i = 1$ on $S$.

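Integral of a Function

The (surface) integral of a continuous function $f$ on $S$ is defined by
$$\int_S f\,dS = \sum_i\int_{\varphi_i}\chi_i f\,dS = \sum_i\int_{U_i}\big((\chi_i f)\circ\varphi_i\big)(u)\sqrt{\det\big(\varphi_i'(u)^t\varphi_i'(u)\big)}\,du.$$
To see that the integral is independent of the system $\{(U_i,\varphi_i,\chi_i)\}_i$ and hence is well-defined, consider another such system $\{(\tilde U_j,\tilde\varphi_j,\tilde\chi_j)\}_j$. Since
$$\chi_i = \sum_j\chi_i\tilde\chi_j\quad\text{and}\quad\tilde\chi_j = \sum_i\tilde\chi_j\chi_i\quad\text{on } S,$$
we see that
$$\sum_i\int_{\varphi_i}f\chi_i\,dS = \sum_{i,j}\int_{\varphi_i}f\chi_i\tilde\chi_j\,dS\quad\text{and}\quad\sum_j\int_{\tilde\varphi_j}f\tilde\chi_j\,dS = \sum_{i,j}\int_{\tilde\varphi_j}f\tilde\chi_j\chi_i\,dS.$$
Set
$$\alpha_{ij} = \tilde\varphi_j^{-1}\circ\varphi_i : \varphi_i^{-1}(S_i\cap\tilde S_j)\to\tilde\varphi_j^{-1}(S_i\cap\tilde S_j).$$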

Since $f\chi_i\tilde\chi_j = 0$ outside $S_i\cap\tilde S_j$ and $\varphi_i = \tilde\varphi_j\circ\alpha_{ij}$ on $\varphi_i^{-1}(S_i\cap\tilde S_j)$, by 13.2.6
$$\int_{\varphi_i}f\chi_i\tilde\chi_j\,dS = \int_{\tilde\varphi_j}f\chi_i\tilde\chi_j\,dS.$$
Therefore,
$$\sum_i\int_{\varphi_i}f\chi_i\,dS = \sum_j\int_{\tilde\varphi_j}f\tilde\chi_j\,dS,$$
as required.

The definition of the integral is extended to a finite union $S$ of compact $m$-surfaces $S_1,\dots,S_p$ by defining
$$\int_S f\,dS = \sum_i\int_{S_i}f\,dS.$$

13.4.1 Definition. The area of $S$ is defined as
$$\operatorname{area}(S) = \int_S 1\,dS.$$
♦

13.4.2 Example. In 11.5.6 we found that the volume of the closed ball $C_r^n(0) = \{x\in\mathbb{R}^n : \|x\|\le r\}$ is $r^n\alpha_n$, where
$$\alpha_n = \begin{cases}\dfrac{2(2\pi)^{(n-1)/2}}{n(n-2)\cdots 3\cdot 1} & \text{if } n \text{ is odd},\\[2ex] \dfrac{(2\pi)^{n/2}}{n(n-2)\cdots 4\cdot 2} & \text{if } n \text{ is even}.\end{cases}$$
We now show that for the sphere $S := S_r^{n-1}(0) = \{x\in\mathbb{R}^n : \|x\| = r\}$,
$$\operatorname{area}(S) = n\,r^{n-1}\alpha_n = \frac{n}{r}\,\lambda^n\big(C_r^n(0)\big). \tag{13.18}$$
To this end, note that the upper hemisphere $H^u$ of $S$ is the graph of the function
$$g(x_1,\dots,x_{n-1}) = \sqrt{r^2 - (x_1^2 + \cdots + x_{n-1}^2)} = \sqrt{r^2 - \|x\|^2},\quad \|x\|\le r.$$
Let $0 < t < 1$ and consider the part of the hemisphere $H_t^u$ for which $\|x\| < rt$. Since
$$1 + \|\nabla g(x)\|^2 = 1 + \frac{\|x\|^2}{r^2 - \|x\|^2} = \frac{r^2}{r^2 - \|x\|^2},$$
by 13.2.4(c)
$$\operatorname{area}(H_t^u) = r\int_{\|x\| < rt}\frac{dx}{\sqrt{r^2 - \|x\|^2}}$$

||x|| 0}.

♦

Note that $S_a$ is an $(n-1)$-surface in $\mathbb{R}^n$ and hence has a local parametrization at each $x\in S_a$. Let $\vec n_a = \|\nabla F_a\|^{-1}\nabla F_a$, and let $x\in S_a$. For sufficiently small $|t|$, $h(t) := x + t\,\vec n_a(x)\in U_a$. Since
$$(F_a\circ h)'(0) = \nabla F_a(x)\cdot\vec n_a(x) = \|\nabla F_a(x)\| > 0,$$
$F_a\circ h$ is strictly increasing near $t = 0$. Because $(F_a\circ h)(0) = 0$ we therefore have
$$F_a\big(x + t\,\vec n_a(x)\big)\ \begin{cases}< 0 & \text{if } t < 0,\\ > 0 & \text{if } t > 0.\end{cases}$$
It follows from (ii) and (iii) that the normal vector $t\,\vec n_a(x)$ to $S_a$ at $x$ points into $E$ if $t < 0$ and away from $E$ (that is, toward $E^c$) if $t > 0$. The exterior unit normal vector on $\operatorname{bd}(E)$ is then defined by
$$\vec n(x) = \vec n_a(x),\quad x\in S_a.$$
Uniqueness and continuity of $\vec n_a$ show that $\vec n$ is well-defined and continuous on $\operatorname{bd}(E)$. (See Figure 13.6.)

FIGURE 13.6: Regular region $E$.

13.5.6 Example. The $n$-dimensional annulus $E = \{x\in\mathbb{R}^n : r_1 < \|x\| < r_2\}$ is a regular region in $\mathbb{R}^n$. Here, $\operatorname{bd}(E)$ has the components
$$S_i = \{x\in\mathbb{R}^n : \|x\| = r_i\},\quad i = 1, 2.$$
The conditions of regularity are met by defining
$$F_a(x) = \begin{cases}r_1 - \|x\| \ \text{ on } \{x\in\mathbb{R}^n : \|x\| < (r_1+r_2)/2\} & \text{if } a\in S_1,\\ \|x\| - r_2 \ \text{ on } \{x\in\mathbb{R}^n : \|x\| > (r_1+r_2)/2\} & \text{if } a\in S_2.\end{cases}$$
Figure 13.7 depicts the case $n = 2$. ♦

FIGURE 13.7: Annulus in $\mathbb{R}^2$ with exterior normal.

13.5.7 Divergence Theorem. If $E$ is a regular region in $\mathbb{R}^n$ and $\omega$ is a $C^1$ 1-form on $\operatorname{cl}(E)$, then
$$\int_{\operatorname{bd}(E)}\omega\cdot\vec n\,dS = \int_E\operatorname{div}\omega_x\,dx. \tag{13.25}$$


Proof. The proof uses ideas similar to those used in the proof of Stokes’s theorem. By hypothesis, $\omega$ is $C^1$ on an open set containing $\mathrm{cl}(E)$, which we may assume also contains the sets $U_a$ in 13.5.5. Since $\mathrm{cl}(E)$ is compact, by using a partition of unity as in the proof of Stokes’s theorem, we may assume that for any $a = (a_1,\dots,a_n) \in \mathrm{cl}(E)$ and the neighborhoods $W$ of $a$ constructed in the proof,
$$K := \bigcup_{i=1}^n \operatorname{supp}(f_i) \subseteq W.$$

Suppose first that $a \in E$. Choose an n-dimensional interval $W$ containing $a$ such that $\mathrm{cl}(W) \subseteq E$. If $K \subseteq W$, then $\omega = 0$ on $W^c \supseteq \mathrm{bd}(E)$, hence
$$\int_{\mathrm{bd}(E)} \omega\cdot\vec n\,dS = 0 \quad\text{and}\quad \int_E \operatorname{div}\omega_x\,dx = \int_W \sum_{i=1}^n \partial_i f_i(x)\,dx = 0,$$
the last equality by the Fubini–Tonelli theorem and the fundamental theorem of calculus. Therefore, (13.25) holds in this case.

FIGURE 13.8: The case $a \in E$.

Now let $a \in \mathrm{bd}(E)$ and let $U_a$ and $F_a$ be as in 13.5.5. We may assume that the components $a_i$ of $a$ and $n_i(a)$ of $\vec n(a)$ are positive, otherwise apply a rotation and translation; the change of variables theorem implies that (13.25) is invariant under such transformations. (See Exercise 11 below for a special case of this.) We show that for each $i = 1,\dots,n$, there exists a neighborhood $W_i$ of $a$ such that if $K \subseteq W_i$ then
$$\int_S f_i n_i\,dS = \int_E \partial_i f_i(x)\,dx. \tag{13.26}$$

For notational simplicity, we do this for the case $i = n$. Since $\partial_n F_a(a) \ne 0$, by the implicit function theorem there exists a neighborhood $V$ of $(a_1,\dots,a_{n-1})$, an open interval $I$ containing $a_n$, and a $C^1$ function $g : V \to \mathbb{R}$ such that $V \times I \subseteq U_a$, $a_n = g(a_1,\dots,a_{n-1})$, and
$$F_a\big(x_1,\dots,x_{n-1}, g(x_1,\dots,x_{n-1})\big) = 0 \ \text{ on } V.$$
By continuity, we may choose $V$ and $I$ sufficiently small so that $g(x_1,\dots,x_{n-1}) > 0$ and $\partial_n F_a(x) > 0$ for all $x \in V \times I$. Now let $x = (x_1,\dots,x_n) \in (V \times I)\cap E$. Since $F_a(x)$ is a strictly increasing function of $x_n \in I$ when the other coordinates are fixed and since $F_a(x) < 0$, it must be the case that $0 < x_n < g(x_1,\dots,x_{n-1})$. Thus
$$(V \times I)\cap E = \{x \in V \times I : 0 < x_n < g(x_1,\dots,x_{n-1})\}$$
and
$$(V \times I)\cap S_a = \{x \in V \times I : x_n = g(x_1,\dots,x_{n-1})\}.$$
(See Figure 13.9.)

FIGURE 13.9: The case $a \in \mathrm{bd}(E)$.

Note that the function $\varphi$ defined by
$$\varphi(v) := (v, g(v)), \quad v = (v_1,\dots,v_{n-1}) \in V,$$
is a local parametrization of $S_a$ with unit normal $(1 + \|\nabla g\|^2)^{-1/2}\,(-\nabla g, 1)$. Since this points outward it coincides with $\vec n$. In particular, the nth component of $\vec n$ is $n_n = (1 + \|\nabla g\|^2)^{-1/2}$ on $(V \times I)\cap S_a$. Therefore, if $K \subseteq V \times I$ then, by 13.2.4(d),
$$\int_{S_a} f_n n_n\,dS = \int_{(V\times I)\cap S_a} \frac{f_n}{\sqrt{1 + \|\nabla g\|^2}}\,dS = \int_V (f_n\circ\varphi)(v)\,dv. \tag{13.27}$$
On the other hand, since $f_n = \partial_n f_n = 0$ outside $K$, by the Fubini–Tonelli theorem,
$$\begin{aligned} \int_E \partial_n f_n(x)\,dx &= \int_{(V\times I)\cap E} \partial_n f_n(x)\,dx\\ &= \int_V \int_0^{g(v_1,\dots,v_{n-1})} (\partial_n f_n)(v_1,\dots,v_{n-1},x_n)\,dx_n\,dv_1\cdots dv_{n-1}\\ &= \int_V (f_n\circ\varphi)(v)\,dv, \end{aligned} \tag{13.28}$$
the last equality by the fundamental theorem of calculus. Setting $W_n = V \times I$ and comparing (13.27) and (13.28), we see that (13.26) holds for $i = n$. A similar proof works for $i < n$. Thus if $K \subseteq W_1 \cap\cdots\cap W_n$, then (13.26) holds for all $i$. Summing from 1 to $n$ we obtain (13.25).
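The identity (13.25) can be sanity-checked numerically in a simple case. The following is only a sketch under assumptions that are not part of the text: the region is taken to be the open unit disk in the plane, the 1-form has coefficients f1 = x^3 and f2 = y (so its divergence is 3x^2 + 1), and the helper names `boundary_flux` and `disk_integral` are hypothetical. Both sides should be close to 7π/4.

```python
import math

def boundary_flux(f1, f2, n=2000):
    """Approximate the integral of (f1, f2) . n over the unit circle.

    On the unit circle the outward unit normal at (cos t, sin t) is the
    point itself, and dS = dt.
    """
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = k * h
        c, s = math.cos(t), math.sin(t)
        total += f1(c, s) * c + f2(c, s) * s
    return total * h

def disk_integral(g, nr=300, nt=300):
    """Approximate the integral of g over the unit disk.

    Midpoint rule in polar coordinates; the factor r is the Jacobian.
    """
    dr, dt = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nt):
            t = (j + 0.5) * dt
            total += g(r * math.cos(t), r * math.sin(t)) * r
    return total * dr * dt

# Assumed example: omega = x^3 dx_1 + y dx_2, so div omega = 3x^2 + 1.
f1 = lambda x, y: x ** 3
f2 = lambda x, y: y
div = lambda x, y: 3 * x ** 2 + 1

lhs = boundary_flux(f1, f2)   # surface (here: curve) integral
rhs = disk_integral(div)      # volume (here: area) integral
print(lhs, rhs, 7 * math.pi / 4)
```

The quadrature rules are deliberately elementary; any convergent rule would serve the same purpose.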


Connection with Stokes’s Theorem

Let $E$ be a regular region in $\mathbb{R}^n$ whose boundary is a finite union of compact connected $(n-1)$-surfaces of the form $S = \{x : F(x) = 0\}$, where $F : U \to \mathbb{R}$ is a $C^1$ function with $\nabla F \ne 0$ such that $U_a = U$ and $F_a = F$ for all $a \in S$. Balls and annuli in $\mathbb{R}^n$ are simple examples. By 12.4.8, $S$ is oriented and, for each local parametrization $\varphi : V \to S$,
$$\vec n(\varphi(v)) = \frac{\pm 1}{\sqrt{\det \varphi'(v)^t\varphi'(v)}} \sum_{i=1}^n (-1)^{i-1}\, \frac{\partial(\varphi_1,\dots,\widehat{\varphi_i},\dots,\varphi_n)}{\partial(v_1,\dots,v_{n-1})}(v)\, e_i,$$
where the sign is chosen to be the same for all $v$. Let each $S$ have the orientation for which the sign is (+). We shall call the resulting orientation of $\mathrm{bd}(E)$ positive. In this setting we have the following consequence of the divergence theorem.

13.5.8 Theorem. Let $E$ be as described above and let
$$\omega = \sum_{i=1}^n f_i\, dx_1 \wedge\cdots\wedge \widehat{dx_i} \wedge\cdots\wedge dx_n$$
be an $(n-1)$-form on $\mathrm{cl}(E)$. If $\mathrm{bd}(E)$ is positively oriented, then
$$\int_{\mathrm{bd}(E)} \omega = \int_E d\omega.$$

Proof. Recalling the additive definition of $\int_{\mathrm{bd}(E)} \omega$, we may assume that $\mathrm{bd}(E)$ consists of a single compact connected $(n-1)$-surface $S$. Let
$$\eta := \sum_{i=1}^n (-1)^{i-1} f_i\, dx_i.$$
By the above,
$$(\vec n\cdot\eta)\circ\varphi(v) = \frac{1}{\sqrt{\det \varphi'(v)^t\varphi'(v)}} \sum_{i=1}^n (f_i\circ\varphi)(v)\, \frac{\partial(\varphi_1,\dots,\widehat{\varphi_i},\dots,\varphi_n)}{\partial(v_1,\dots,v_{n-1})}(v),$$
hence
$$\int_\varphi \vec n\cdot\eta\,dS = \sum_{i=1}^n \int_V (f_i\circ\varphi)\, \frac{\partial(\varphi_1,\dots,\widehat{\varphi_i},\dots,\varphi_n)}{\partial(v_1,\dots,v_{n-1})}\,dv = \int_\varphi \omega.$$
Using a partition of unity we obtain
$$\int_S \omega = \int_S \vec n\cdot\eta\,dS. \tag{13.29}$$

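On the other hand,
$$\begin{aligned} d\omega &= \sum_{i=1}^n \sum_{j=1}^n (\partial_j f_i)\, dx_j \wedge dx_1 \wedge\cdots\wedge \widehat{dx_i} \wedge\cdots\wedge dx_n\\ &= \sum_{i=1}^n (-1)^{i-1}\, \partial_i f_i\; dx_1 \wedge\cdots\wedge dx_n\\ &= \operatorname{div}\eta\; dx_1 \wedge\cdots\wedge dx_n, \end{aligned}$$
hence, recalling 13.2.10,
$$\int_E d\omega = \int_E \operatorname{div}\eta_x\; dx_1 \wedge\cdots\wedge dx_n = \int_E \operatorname{div}\eta_x\,dx. \tag{13.30}$$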

The conclusion now follows from (13.29), (13.30), and the divergence theorem.

13.5.9 Remark. The divergence theorem has an interesting application to fluid dynamics. Consider an incompressible fluid moving in space. Let $\rho(x,t)$ denote the density of the fluid in mass per unit volume at time $t$ and point $x$, and let $\vec v(x,t)$ denote its velocity. If $\vec n$ is normal to a small surface element of area $\Delta S$, then $(\rho\vec v\cdot\vec n)(\Delta S)(\Delta t)$ is approximately the mass of the fluid flowing across that surface element during a small time period $\Delta t$. The rate of flow is then $(\rho\vec v\cdot\vec n)\Delta S$. Adding these quantities and taking limits, we see that the rate of flow of the fluid across a surface $S$ in the direction of the normal is given by the integral
$$\int_S \rho\vec v\cdot\vec n\,dS.$$
Now let $E$ be a regular region with smooth boundary $S$. Applying the foregoing to a ball $B_\varepsilon$ in $E$ with boundary $S_\varepsilon$, center $y$, and outer normal $\vec n$, we see that the integral
$$\int_{S_\varepsilon} \rho\vec v\cdot\vec n\,dS$$
represents the rate of flow of the fluid out of the ball, that is, the negative of the rate of change of fluid in the ball. Since the amount of fluid in the ball at time $t$ is $\int_{B_\varepsilon} \rho(x,t)\,dx$,
$$\frac{d}{dt}\int_{B_\varepsilon} \rho(x,t)\,dx = -\int_{S_\varepsilon} \rho\vec v\cdot\vec n\,dS = -\int_{B_\varepsilon} \operatorname{div}(\rho\vec v)\,dx,$$
the last equality by the divergence theorem. Differentiating under the integral sign and dividing by $\operatorname{vol}(B_\varepsilon)$, we obtain
$$\frac{1}{\operatorname{vol}(B_\varepsilon)}\int_{B_\varepsilon} \partial_t\rho(x,t)\,dx = -\frac{1}{\operatorname{vol}(B_\varepsilon)}\int_{B_\varepsilon} \operatorname{div}(\rho\vec v)\,dx.$$
Letting $\varepsilon \to 0$, we obtain
$$\partial_t\rho(y,t) = -\operatorname{div}\big(\rho(y,t)\vec v(y,t)\big).$$
In particular, if $\rho$ is constant in time, then $\operatorname{div}(\rho\vec v)$ is zero throughout $E$, hence
$$\int_S \rho\vec v\cdot\vec n\,dS = \int_E \operatorname{div}(\rho\vec v)\,dx = 0,$$
that is, the amount of fluid flowing out of $E$ equals the amount flowing in. ♦

Green’s Theorem

Let $E$ be a regular region in $\mathbb{R}^2$ with boundary the union of finitely many smooth simple pairwise disjoint curves $C = \varphi(I)$. The boundary $\mathrm{bd}(E)$ is said to be positively oriented if, on each curve $C$, the vector obtained by rotating the unit tangent vector $\vec T$, which is in the direction of $(\varphi_1', \varphi_2')$, 90 degrees clockwise is the exterior normal $\vec n$, which is in the direction of $(\varphi_2', -\varphi_1')$. The region is then to the left as the boundary is traced in the direction of the tangent vector field on each curve $C$.

FIGURE 13.10: Regular region $E$ in $\mathbb{R}^2$.

Now let $\omega = Q\,dx - P\,dy$. Then
$$(\omega\cdot\vec n)\circ\varphi = (Q\circ\varphi, -P\circ\varphi)\cdot(\varphi_2', -\varphi_1')\,\|\varphi'\|^{-1} = \big[(P\circ\varphi)\varphi_1' + (Q\circ\varphi)\varphi_2'\big]\|\varphi'\|^{-1},$$
hence
$$\int_C \omega\cdot\vec n\,ds = \int_C (P\,dx + Q\,dy).$$
Summing over the curves $C$, we have
$$\int_{\mathrm{bd}(E)} \omega\cdot\vec n\,ds = \int_{\mathrm{bd}(E)} (P\,dx + Q\,dy).$$

Since
$$\operatorname{div}\omega = \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y},$$
we obtain the following important special case of the divergence theorem.

13.5.10 Green’s Theorem. Let $E$ be a region in $\mathbb{R}^2$, as described above. If $P$, $Q$ are $C^1$ functions on an open set containing $E$, then
$$\int_{\mathrm{bd}(E)} (P\,dx + Q\,dy) = \iint_E \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy. \tag{13.31}$$

13.5.11 Corollary. The area of $E$ is given by
$$\operatorname{area}(E) = \frac12 \int_{\mathrm{bd}(E)} (x\,dy - y\,dx).$$

Proof. Apply Green’s theorem to $P(x,y) = -y/2$, $Q(x,y) = x/2$, noting that $Q_x - P_y = 1$.

13.5.12 Example. The ellipse $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$ has parametrization $x = a\cos t$, $y = b\sin t$, $0 \le t \le 2\pi$. Therefore, the area inside the ellipse is
$$\frac12 \int_0^{2\pi} ab(\cos^2 t + \sin^2 t)\,dt = \pi ab. \qquad ♦$$
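The area formula of 13.5.11 lends itself to a quick numerical check. The sketch below is illustrative only: the function name `area_by_green`, the midpoint quadrature, and the sample semi-axes a = 3, b = 2 are assumptions, not part of the text. It evaluates half the line integral of x dy − y dx along a counterclockwise parametrization and compares the result with πab for the ellipse of 13.5.12.

```python
import math

def area_by_green(x, y, dx, dy, t0, t1, n=10000):
    """Approximate (1/2) * integral of (x dy - y dx) over t in [t0, t1].

    x, y are the coordinate functions of a counterclockwise parametrization;
    dx, dy are their derivatives with respect to t. Midpoint rule.
    """
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        total += x(t) * dy(t) - y(t) * dx(t)
    return 0.5 * total * h

# Assumed sample ellipse x = a cos t, y = b sin t.
a, b = 3.0, 2.0
ellipse_area = area_by_green(
    lambda t: a * math.cos(t),
    lambda t: b * math.sin(t),
    lambda t: -a * math.sin(t),
    lambda t: b * math.cos(t),
    0.0,
    2 * math.pi,
)
print(ellipse_area, math.pi * a * b)
```

Reversing the orientation of the parametrization changes the sign of the result, in line with the orientation convention for bd(E).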

The Piecewise Smooth Case

Both Stokes’s theorem and the divergence theorem may be extended to more general surfaces called piecewise smooth. In the case $n = 3$, these are finite unions of smooth surfaces $S_1, \dots, S_k$ that fit together so that
• no three surfaces meet in more than a single point, and
• the common boundary of two of these surfaces consists of finitely many disjoint piecewise smooth simple closed curves.

FIGURE 13.11: Piecewise smooth surfaces.

(See Figure 13.11.) If $S$ is such a surface, then the surface integral $\int_S f\,dS$ is defined as the sum $\sum_{j=1}^k \int_{S_j} f\,dS$. The integral of a form on $S$ has an analogous definition. These definitions are reasonable since, by cancelations, the common boundary of a pair of surfaces contributes nothing to the integral. We illustrate the basic idea with the simple example of a cube. Removing a face of the cube results in a surface-with-boundary $Q$, which we orient by the outward normal. If Stokes’s theorem is applied to each of the five faces and the results are added, the integrals along the boundaries cancel and one is left with Stokes’s theorem for $Q$:
$$\int_{\partial Q} \vec F\cdot d\vec r = \int_Q \operatorname{curl}\vec F\cdot\vec N\,dS.$$

FIGURE 13.12: Oriented cube without bottom face.

Similarly, Green’s theorem extends to regions in $\mathbb{R}^2$ whose boundaries are only piecewise smooth. This, of course, leads to extended versions of its corollaries. Here’s an application of the extended version of 13.5.11:

13.5.13 Example. Let $\partial S$ be a closed polygon consisting of $m$ line segments
$$L_i := [(a_i, b_i) : (a_{i+1}, b_{i+1})], \quad i = 1, 2, \dots, m,$$
where $(a_{m+1}, b_{m+1}) = (a_1, b_1)$ and the vertices are in counterclockwise order. (See Figure 13.13.)

FIGURE 13.13: Closed polygon.

Then $L_i$ has the parametrization
$$x = (1-t)a_i + t a_{i+1}, \quad y = (1-t)b_i + t b_{i+1}, \quad 0 \le t \le 1,$$
hence
$$\begin{aligned} \int_{L_i} (x\,dy - y\,dx) &= (b_{i+1} - b_i)\int_0^1 \big[(1-t)a_i + t a_{i+1}\big]\,dt - (a_{i+1} - a_i)\int_0^1 \big[(1-t)b_i + t b_{i+1}\big]\,dt\\ &= a_i b_{i+1} - a_{i+1} b_i. \end{aligned}$$
Therefore,
$$\operatorname{area}(S) = \frac12 \sum_{i=1}^m (a_i b_{i+1} - a_{i+1} b_i).$$

♦
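The polygon formula of 13.5.13 translates directly into a few lines of code. The following is a minimal sketch; the function name `polygon_area` and the sample vertex lists are illustrative assumptions. Vertices are given in order, and the index wraps around so that the vertex after the last one is the first, mirroring the convention $(a_{m+1}, b_{m+1}) = (a_1, b_1)$.

```python
def polygon_area(vertices):
    """Return (1/2) * sum of (a_i * b_{i+1} - a_{i+1} * b_i).

    vertices: list of (a_i, b_i) pairs in traversal order.
    The result is positive for counterclockwise order and
    negative for clockwise order.
    """
    m = len(vertices)
    total = 0.0
    for i in range(m):
        a_i, b_i = vertices[i]
        a_next, b_next = vertices[(i + 1) % m]  # wraps to the first vertex
        total += a_i * b_next - a_next * b_i
    return total / 2

# Assumed sample polygons.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]       # counterclockwise
right_triangle = [(0, 0), (4, 0), (0, 3)]            # counterclockwise

print(polygon_area(unit_square))
print(polygon_area(right_triangle))
```

Traversing the same vertices clockwise returns the negative of the area, so taking the absolute value gives an orientation-independent answer when only the size of the region matters.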

Exercises

1.S Verify directly the following version of Stokes’s theorem
$$\int_{\partial S} [f\,dx + g\,dy + h\,dz] = \int_S \big[(h_y - g_z)\,dy\wedge dz + (f_z - h_x)\,dz\wedge dx + (g_x - f_y)\,dx\wedge dy\big],$$
where $S$ is the cylinder $\{(x,y,z) : x^2 + y^2 = 1,\ 0 \le z \le 1\}$.

2. For $(x,y) \ne (0,0)$ define
$$P(x,y) = \frac{-y}{x^2 + y^2} \quad\text{and}\quad Q(x,y) = \frac{x}{x^2 + y^2}.$$
Show that
(a) $Q_x = P_y$.
(b) $\int_{\varphi_r} P\,dx + Q\,dy = 2\pi$, where $\varphi_r(t) = (r\cos t, r\sin t)$, $0 \le t \le 2\pi$.
(c) $\int_\psi P\,dx + Q\,dy = 2\pi$, where $\psi$ is any piecewise smooth, counterclockwise oriented, simple closed curve enclosing $(0,0)$.
(d) $\displaystyle\int_0^{2\pi} \frac{\cos^{2m} t\,\sin^{2m} t}{a^2\cos^{4m+2} t + b^2\sin^{4m+2} t}\,dt = \frac{2\pi}{(2m+1)ab}$, $m \in \mathbb{Z}^+$, $a, b > 0$.

3. Let $0 < r < R$ and let $S = \{(x,y) : r^2 \le x^2 + y^2 \le R^2\}$. Verify Green’s

theorem on $S$ for
(a)S $P(x,y) = \dfrac{-y}{\sqrt{x^2 + y^2}}$, $\quad Q(x,y) = \dfrac{x}{\sqrt{x^2 + y^2}}$.
(b) $P(x,y) = \dfrac{y}{\sqrt{x^2 + y^2}}$, $\quad Q(x,y) = \dfrac{x}{\sqrt{x^2 + y^2}}$.
(c) $P(x,y) = \dfrac{x}{x^2 + y^2}$, $\quad Q(x,y) = \dfrac{-y}{x^2 + y^2}$.

4. Use Green’s theorem to evaluate the following integrals, where the curves $C$ have counterclockwise orientation.
(a) $\displaystyle\int_C \sin(x - y)\,dx + \sin(x + y)\,dy$, $\ C = \mathrm{bd}\big([0, \pi/2]\times[0, \pi/2]\big)$.
(b) $\displaystyle\int_C e^{-xy}\,dx + e^{xy}\,dy$, $\ C = \mathrm{bd}\big([0,1]\times[0,1]\big)$.
(c) $\displaystyle\int_C \cos(xy)\,dx + \sin(xy)\,dy$, $\ C = \mathrm{bd}\big([0,1]\times[0,1]\big)$.
(d)S $\displaystyle\int_C f(x)\,dx + g(y)\,dy$, where $f$ and $g$ are $C^1$ and $C$ is simple, closed,

and piecewise C 1 . 5.S Use 13.5.11 to show that the area enclosed by the “elliptical astroid”

x2 a2

1/(2m+1) 2 1/(2m+1) y + = 1, a > 0, b > 0, m ∈ Z+ , b2

is given by Z

π/2

β

cos2m t + sin2m t) dt =

0

βπ (2m − 1)(2m − 3) · · · 5 · 3 , 2 2m(2m − 2) · · · 4 · 2

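where $\beta := 4^{-m}ab\,(m + \tfrac12)$. (See 5.3.4.)

6. Let $E$ be a regular region in $\mathbb{R}^n$ and let $f$ and $g$ be $C^2$ on $\mathrm{cl}(E)$. Prove Green’s formulas:
(a) $\displaystyle\int_{\mathrm{bd}(E)} f\,\nabla g\cdot\vec n\,dS = \int_E \big(\nabla f\cdot\nabla g + f\,\nabla^2 g\big)\,dx$.
(b) $\displaystyle\int_{\mathrm{bd}(E)} (f\,\nabla g - g\,\nabla f)\cdot\vec n\,dS = \int_E \big(f\,\nabla^2 g - g\,\nabla^2 f\big)\,dx$,
where $\nabla^2 f := \displaystyle\sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}$, the Laplacian of $f$.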

7. A $C^2$ function $f$ is said to be harmonic on a set $S \subseteq \mathbb{R}^n$ if $\nabla^2 f = 0$ on an open set containing $S$.
(a) Show that if $f$ is harmonic on the ball $C_r(0)$, then $\int_{S_r(0)} \nabla f\cdot\vec n\,dS = 0$.
(b) Show that if $f$ and $g$ are harmonic on the region $\mathrm{cl}(E)$ of 13.5.6 and $\vec n_t = \|x\|^{-1}x$ on $S_t := S_t(0)$, then
$$\int_{S_1} \nabla f\cdot\vec n_1\,dS = \int_{S_2} \nabla f\cdot\vec n_2\,dS$$
and
$$\int_{S_1} (g\,\nabla f - f\,\nabla g)\cdot\vec n_1\,dS = \int_{S_2} (g\,\nabla f - f\,\nabla g)\cdot\vec n_2\,dS.$$

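8. Let $E \subseteq \mathbb{R}^n$ be a regular region and let $f$ be harmonic on $\mathrm{cl}(E)$ (Exercise 7). Show that
$$\int_E \|\nabla f\|^2\,dx = \int_{\mathrm{bd}(E)} f\,\nabla f\cdot\vec n\,dS,$$
where $\vec n$ is the outer normal. Deduce that if $f = 0$ on $\mathrm{bd}(E)$ and $E$ is connected, then $f = 0$ on $E$.

9.S Let $E \subseteq \mathbb{R}^n$ be a regular region and let $f$ and $g$ be harmonic on $\mathrm{cl}(E)$ (Exercise 7). Show that
$$\int_{\mathrm{bd}(E)} (f\,\nabla g + g\,\nabla f)\cdot\vec n\,dS = 2\int_E \nabla f\cdot\nabla g\,dx,$$
where $\vec n$ is the outer normal.

10. Let $n > 2$. For $t > 0$, let $C_t = C_t(0)$, $S_t = S_t(0)$, and $\vec n_t(x) = \|x\|^{-1}x$, the outer normal to $S_t$. Suppose $f$ is harmonic on $C_r$ (Exercise 7). Prove the average value property of harmonic functions
$$f(0) = \frac{1}{\operatorname{area}(S_r)}\int_{S_r} f\,dS$$
by verifying (a)–(f) for $0 < t \le r$. (Refer to 13.4.2.)
(a) The function $g(x) := \|x\|^{2-n}$, $x \ne 0$, is harmonic.
(b) $\displaystyle\int_{S_t} f\,\nabla g\cdot\vec n_t\,dS = \frac{2-n}{t^{n-1}}\int_{S_t} f\,dS$.
(c) $\displaystyle\int_{S_t} g\,\nabla f\cdot\vec n_t\,dS = 0$.
(d) $\displaystyle\frac{1}{t^{n-1}}\int_{S_t} f\,dS = \frac{1}{r^{n-1}}\int_{S_r} f\,dS$.
(e) $\displaystyle\frac{1}{\operatorname{area}(S_r)}\int_{S_r} f\,dS = \frac{1}{\operatorname{area}(S_t)}\int_{S_t} f\,dS$.
(f) $\displaystyle\lim_{t\to 0}\frac{1}{\operatorname{area}(S_t)}\int_{S_t} f\,dS = f(0)$.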

11. Let $E$ be a region as in the statement of Green’s theorem. For the functions $\psi$ in (a) and (b) below, prove that if the conclusion of Green’s theorem holds for $\psi(E)$, then it holds for $E$. (This is a special case of the statement in the proof of the divergence theorem that the region $E$ may be rotated and translated without loss of generality.)
(a) $\psi$ is the translation $\psi(x,y) = (x + x_0, y + y_0)$.
(b) $\psi$ is the rotation $\psi(x,y) = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta)$.

FIGURE 13.14: Surfaces $S_1$ and $S_2$ with common boundary $C$, panels (a) and (b).

12.S Orient the surfaces $S_1$ and $S_2$ in (a) and (b) of Figure 13.14 by their outer normals $\vec n$. Show that in
(a), $\displaystyle\int_{S_1\cup S_2} \operatorname{curl}\vec F\cdot\vec n\,dS = 0$;
(b), $\displaystyle\int_{S_1} \operatorname{curl}\vec F\cdot\vec n\,dS = \int_{S_2} \operatorname{curl}\vec F\cdot\vec n\,dS$.

13. Let $a \in \mathbb{R}^n$, $n > 2$, and define an $(n-1)$-form $\omega$ on $\mathbb{R}^{n+1}\setminus\{a\}$ by
$$\omega_x = \|x - a\|^{-n}\sum_{i=1}^n (-1)^{i-1}(x_i - a_i)\,dx_1 \wedge\cdots\wedge \widehat{dx_i} \wedge\cdots\wedge dx_n.$$
Show that $d\omega = 0$. Conclude that if $S$ is a compact, oriented n-surface-with-boundary in $\mathbb{R}^{n+1}$ and $a \notin S$, then $\int_{\partial S}\omega = 0$.

14.S Use the divergence theorem and 11.5.6 to show that the area of the sphere $S_r(0)$ is $n r^{n-1}\alpha_n$, derived by another method in 13.4.2.

15. Let $E \subseteq \mathbb{R}^n$ be a regular region and $a \in E$. Define $f$ on $\mathbb{R}^n\setminus\{a\}$ by $f(x) = \|x - a\|^{2-n}$. Show that $\operatorname{div}\nabla f = 0$. Conclude that if $C_r(a) \subseteq E$, then
$$\int_{\mathrm{bd}(E)} (\nabla f)\cdot\vec n\,dS = \int_{S_r(a)} (\nabla f)\cdot\vec n\,dS = (2-n)\,n\,\alpha_n,$$
where $\vec n$ denotes the outer normals.

*13.6 Closed Forms in $\mathbb{R}^n$

13.6.1 Definition. A $C^1$ m-form $\omega$ on an open subset $W$ of $\mathbb{R}^n$ is said to be closed if $d\omega = 0$. The form $\omega$ is exact if there exists a $C^2$ $(m-1)$-form $\eta$ on $W$ such that $d\eta = \omega$. ♦

By 13.1.16(b), an exact form is closed. The converse is false (see Exercise 13.5.2). However, there is a general class of regions on which every closed m-form is exact. We consider first the case $m = 1$.
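The failure of the converse can be illustrated numerically with the 1-form of Exercise 13.5.2, whose coefficients are P = −y/(x² + y²) and Q = x/(x² + y²) on the punctured plane. The sketch below is an illustration only; the function names, the finite-difference step, and the sample points are assumptions. It checks that Q_x − P_y is numerically zero at a sample point (closedness), while the integral around a circle centered at the origin is 2π rather than 0, which rules out exactness on the punctured plane.

```python
import math

def P(x, y):
    return -y / (x * x + y * y)

def Q(x, y):
    return x / (x * x + y * y)

def closedness_defect(x, y, eps=1e-5):
    """Central-difference approximation of Q_x - P_y at (x, y)."""
    q_x = (Q(x + eps, y) - Q(x - eps, y)) / (2 * eps)
    p_y = (P(x, y + eps) - P(x, y - eps)) / (2 * eps)
    return q_x - p_y

def circle_integral(r, n=1000):
    """Approximate the integral of P dx + Q dy over the circle
    t -> (r cos t, r sin t), 0 <= t <= 2*pi, by the midpoint rule."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = r * math.cos(t), r * math.sin(t)
        dx, dy = -r * math.sin(t), r * math.cos(t)
        total += P(x, y) * dx + Q(x, y) * dy
    return total * h

print(closedness_defect(0.7, -1.3))          # close to 0
print(circle_integral(0.5), circle_integral(2.0), 2 * math.pi)
```

The value of the circle integral does not depend on the radius, as expected for a closed form integrated over homotopic curves in the punctured plane.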

Closed 1-Forms on Simply Connected Regions

13.6.2 Definition. An open connected subset $U$ of $\mathbb{R}^n$ is said to be simply connected if for each closed $C^2$ curve $\varphi : [a,b] \to \mathbb{R}^n$ in $U$ there exists a $C^2$ function $\Phi : [a,b]\times[0,1] \to U$ such that for all $s \in [0,1]$ and $t \in [a,b]$,
$$\Phi(t,1) = \varphi(t), \quad \Phi(t,0) = \varphi(a) = \varphi(b), \quad\text{and}\quad \Phi(a,s) = \Phi(b,s).$$

♦

The function $\Phi$ is called a ($C^2$) homotopy between $\varphi$ and the point $p := \varphi(a) = \varphi(b)$.

FIGURE 13.15: Curves contracting to $p$ must pass through $q$.

Note that, for each $s \in [0,1]$, $\Phi(\cdot, s)$ is a closed $C^2$ curve in $U$ such that

$\Phi(\cdot, 1) = \varphi$ and $\Phi(\cdot, 0)$ is a single point $p$. Thus a simply connected region $U$ has the property that every closed curve in $U$ may be contracted smoothly to a point while remaining in $U$ (see Figure 13.15). In $\mathbb{R}^2$ this means that there are no “holes” in $U$. In higher dimensions a simply connected set may have holes. For example, $\mathbb{R}^n\setminus C_1(0)$ is simply connected if $n \ge 3$. However, the holes may not be too large: the set $\mathbb{R}^3\setminus L$, where $L$ is a line, is not simply connected. To prove that every closed 1-form of class $C^2$ on a simply connected set is exact, we follow [5].

13.6.3 Lemma. Let $\omega$ be a closed 1-form on a simply connected subset $U$ of $\mathbb{R}^n$. Then $\int_\varphi \omega = 0$ for each closed $C^2$ curve $\varphi$ in $U$.

Proof. Let $\omega = \sum_{j=1}^n f_j\,dx_j$ and let $\Phi : [a,b]\times[0,1] \to U$ be a homotopy as in 13.6.2. By hypothesis,
$$0 = d\omega = \sum_{j=1}^n\sum_{i=1}^n \partial_i f_j\,dx_i\wedge dx_j = \sum_{1\le i<j\le n} (\partial_i f_j - \partial_j f_i)\,dx_i\wedge dx_j,$$
hence …

… $h > 0$ on $(-1,1)$ and $h = 0$ on $(-1,1)^c$. Multiplying $h$ by a positive constant, we may assume that $\int h = 1$. Let
$$h_k(x) = k\,h(kx), \quad k = 1, 2, \dots.$$
Then $h_k \ge 0$, $h_k(x) = 0$ for $|x| \ge 1/k$, and $\int h_k = 1$. Define a $C^\infty$ function $g_k$ on $\mathbb{R}$ by
$$g_k(x) = \int_{-\infty}^{\infty} \varphi'(y)\,h_k(x - y)\,dy = \int_{-1/k}^{1/k} \varphi'(x + y)\,h_k(y)\,dy.$$
The sequence $\{g_k\}$ is uniformly bounded since
$$|g_k(x)| \le \int_{-\infty}^{\infty} |\varphi'(x + y)|\,h_k(y)\,dy \le M\int_{-\infty}^{\infty} h_k(y)\,dy = M.$$
By periodicity,
$$\int_0^1 \varphi'(x + y)\,dx = \int_0^1 \varphi'(x)\,dx = \varphi(1) - \varphi(0) = 0$$
(Exercise 5.3.1), hence, by Fubini’s theorem,
$$\int_0^1 g_k(x)\,dx = \int_{-\infty}^{\infty} h_k(y)\int_0^1 \varphi'(x + y)\,dx\,dy = 0.$$
Now define $\varphi_k$ on $\mathbb{R}$ by
$$\varphi_k(x) = \varphi(0) + \int_0^x g_k(y)\,dy.$$
Then (a) and (d) hold and (b) follows from
$$\varphi_k'(x) - \varphi'(x) = g_k(x) - \varphi'(x) = \int_{-1/k}^{1/k} \big[\varphi'(x + y) - \varphi'(x)\big]\,h_k(y)\,dy,$$
which tends to 0 at continuity points $x$ as $k \to +\infty$. Finally, (c) follows from (b), the inequality
$$|\varphi_k(t) - \varphi(t)| \le \int_0^t |\varphi_k'(x) - \varphi'(x)|\,dx \le \int_0^1 |\varphi_k'(x) - \varphi'(x)|\,dx,$$
and Lebesgue’s dominated convergence theorem, noting that the set of discontinuity points of $\varphi'$ is finite and hence has measure zero.


13.6.5 Theorem. Let $\omega$ be a closed 1-form on a simply connected subset $U$ of $\mathbb{R}^n$. Then $\omega$ is exact.

Proof. By 12.2.10 it suffices to show that $\int_\varphi \omega = 0$ for every piecewise $C^1$ closed curve $\varphi : [0,1] \to U$. Let $\{\varphi_k\}$ be as in 13.6.4. Since $\varphi_k \to \varphi$ uniformly on $[0,1]$ and $\varphi([0,1]) \subseteq U$, it follows that $\varphi_k([0,1]) \subseteq U$ for all sufficiently large $k$ (Exercise 8.5.22). For such $k$, $\int_{\varphi_k}\omega = 0$ by 13.6.3. By (b) and (c) of 13.6.4, Lebesgue’s dominated convergence theorem, and the definition of integral of a form (13.16), $\int_{\varphi_k}\omega \to \int_\varphi \omega$. Therefore, $\int_\varphi \omega = 0$, as required.

Closed m-Forms on Star-Shaped Regions

13.6.6 Definition. A subset $W$ of $\mathbb{R}^n$ is said to be star-shaped with respect to $y \in W$ if the line segment from $y$ to any point $x \in W$ lies in $W$:
$$y + t(x - y) \in W, \quad 0 \le t \le 1.$$

♦

For example, a convex set is star-shaped with respect to every one of its points. In Figure 13.17, $W$ is star-shaped with respect to $y$ but not $z$, and $V$ is not star-shaped with respect to any of its points.

FIGURE 13.17: Star-shaped and non-star-shaped regions.

13.6.7 Poincaré’s Lemma. Let $W \subseteq \mathbb{R}^n$ be open and star-shaped with respect to some $y \in W$. If $\omega$ is a closed $C^1$ m-form on $W$, where $1 \le m \le n$, then $\omega$ is exact.

Proof. Define a function $\psi : [0,1]\times W \to W$ by $\psi(t,x) = y + t(x - y)$. For an r-form
$$\eta = \sum_{j\in J_r} g_j\,dx_j$$
on $W$, define the $(r-1)$-form $\tilde\eta$ on $W$ by
$$\tilde\eta_x = \sum_{j\in J_r}\left(\int_0^1 t^{r-1}(g_j\circ\psi)(t,x)\,dt\right)\eta_j, \quad\text{where}$$
$$\eta_j := \sum_{i=1}^r (-1)^{i-1}(x_{j_i} - y_{j_i})\,dx_{j_1}\wedge\cdots\wedge\widehat{dx_{j_i}}\wedge\cdots\wedge dx_{j_r}, \quad j = (j_1,\dots,j_r).$$

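A standard argument shows that the definition of $\tilde\eta$ is independent of the choice of representation of $\eta$. In particular, by putting $\eta$ in canonical form we see that $\eta = 0 \Rightarrow \tilde\eta = 0$. Furthermore,
$$d\eta_j = \sum_{i=1}^r (-1)^{i-1}\,d(x_{j_i} - y_{j_i})\wedge dx_{j_1}\wedge\cdots\wedge\widehat{dx_{j_i}}\wedge\cdots\wedge dx_{j_r} = r\,dx_j.$$
Now let
$$\omega = \sum_{j\in J_m} f_j\,dx_j.$$
Then
$$\tilde\omega = \sum_{j\in J_m}\left(\int_0^1 t^{m-1}(f_j\circ\psi)(t,x)\,dt\right)\omega_j,$$
and, by 13.1.16(d) (suppressing the variables $(t,x)$ in $f_j\circ\psi(t,x)$),
$$d\tilde\omega = \sum_{j\in J_m}\left\{ d\left(\int_0^1 t^{m-1} f_j\circ\psi\,dt\right)\wedge\omega_j + \left(\int_0^1 t^{m-1} f_j\circ\psi\,dt\right) d\omega_j \right\}.$$
Differentiating under the integral sign, applying the chain rule, and noting that $\psi_x = tI_n$, we have
$$d\left(\int_0^1 t^{m-1}(f_j\circ\psi)\,dt\right) = \sum_{i=1}^n\left(\int_0^1 t^m\,(\partial_i f_j)\circ\psi\,dt\right) dx_i.$$
Therefore, using $d\omega_j = m\,dx_j$,
$$d\tilde\omega = \sum_{j\in J_m}\left\{ \sum_{i=1}^n\left(\int_0^1 t^m\,(\partial_i f_j)\circ\psi\,dt\right) dx_i\wedge\omega_j + m\left(\int_0^1 t^{m-1} f_j\circ\psi\,dt\right) dx_j \right\}. \tag{13.33}$$
On the other hand,
$$d\omega = \sum_{j\in J_m}\left(\sum_{i=1}^n \partial_i f_j\,dx_i\right)\wedge dx_j = \sum_{j\in J_m}\sum_{i=1}^n \partial_i f_j\,dx_i\wedge dx_j,$$
hence, since $d\omega = 0$,
$$\sum_{j\in J_m}\sum_{i=1}^n\left(\int_0^1 t^m\,(\partial_i f_j)\circ\psi(t,x)\,dt\right)(d\omega)_{(i,j)} = \widetilde{d\omega} = 0.$$
By the above definition,
$$\begin{aligned}(d\omega)_{(i,j)} &= \sum_{\ell=1}^m (-1)^\ell (x_{j_\ell} - y_{j_\ell})\,dx_i\wedge dx_{j_1}\wedge\cdots\wedge\widehat{dx_{j_\ell}}\wedge\cdots\wedge dx_{j_m} + (x_i - y_i)\,dx_{j_1}\wedge\cdots\wedge dx_{j_m}\\ &= -\,dx_i\wedge\omega_j + (x_i - y_i)\,dx_j,\end{aligned}$$
hence
$$\sum_{j\in J_m}\sum_{i=1}^n\left(\int_0^1 t^m\,(\partial_i f_j)\circ\psi\,dt\right)\big[-\,dx_i\wedge\omega_j + (x_i - y_i)\,dx_j\big] = 0. \tag{13.34}$$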

Adding (13.33) and (13.34), we obtain
$$d\tilde\omega = \sum_{j\in J_m}\left\{ m\int_0^1 t^{m-1}(f_j\circ\psi)\,dt + \sum_{i=1}^n (x_i - y_i)\int_0^1 t^m\,(\partial_i f_j)\circ\psi\,dt \right\} dx_j.$$
The term in braces is simply
$$\int_0^1 \frac{d}{dt}\big[t^m f_j\circ\psi\big]\,dt = t^m f_j\circ\psi\,\Big|_0^1 = f_j.$$
Therefore, $d\tilde\omega = \omega$, which shows that $\omega$ is exact.

From Poincaré’s lemma we obtain the following results from classical vector analysis, where, in keeping with the spirit, we write $\operatorname{grad} f$ for $\nabla f$.

13.6.8 Corollary. Let $W$ be an open star-shaped subset of $\mathbb{R}^3$ and let $\vec F(x,y,z) = \big(P(x,y,z), Q(x,y,z), R(x,y,z)\big)$ be a $C^1$ vector field on $W$. Then
(a) $\operatorname{curl}\vec F = 0$ iff $\vec F = \operatorname{grad} f$ for some $C^2$ function $f : W \to \mathbb{R}$.
(b) $\operatorname{div}\vec F = 0$ iff $\vec F = \operatorname{curl}\vec G$ for some $C^2$ vector field $\vec G$ on $W$.

Proof. (a) If $\vec F = \operatorname{grad} f = (f_x, f_y, f_z)$, then
$$\operatorname{curl}\vec F = (f_{zy} - f_{yz},\ f_{xz} - f_{zx},\ f_{yx} - f_{xy}),$$
which is zero because $f$ is $C^2$. Conversely, assume that $\operatorname{curl}\vec F = 0$, that is,
$$R_y - Q_z = P_z - R_x = Q_x - P_y = 0.$$
Let $\omega = P\,dx + Q\,dy + R\,dz$. Then
$$\begin{aligned} d\omega &= (P_y\,dy + P_z\,dz)\wedge dx + (Q_x\,dx + Q_z\,dz)\wedge dy + (R_x\,dx + R_y\,dy)\wedge dz\\ &= (Q_x - P_y)\,dx\wedge dy + (R_x - P_z)\,dx\wedge dz + (R_y - Q_z)\,dy\wedge dz = 0, \end{aligned}$$
so $\omega$ is closed. By Poincaré’s lemma, there exists a 0-form $f$ of class $C^2$ on $W$ such that $df = \omega$, that is, $\operatorname{grad} f = \vec F$.

(b) If $\vec F = \operatorname{curl}\vec G$, where $\vec G = (f, g, h)$, then
$$P = h_y - g_z, \quad Q = f_z - h_x, \quad\text{and}\quad R = g_x - f_y,$$
hence, if $\vec G$ is $C^2$,
$$\operatorname{div}\vec F = P_x + Q_y + R_z = (h_{yx} - g_{zx}) + (f_{zy} - h_{xy}) + (g_{xz} - f_{yz}) = 0.$$
Conversely, assume $\operatorname{div}\vec F = 0$ and let $\omega = R\,dx\wedge dy + P\,dy\wedge dz + Q\,dz\wedge dx$. Then $d\omega = \operatorname{div}\vec F\,dx\wedge dy\wedge dz$, hence $\omega$ is closed. By Poincaré’s lemma,
$$\omega = d(f\,dx + g\,dy + h\,dz) = (g_x - f_y)\,dx\wedge dy + (h_x - f_z)\,dx\wedge dz + (h_y - g_z)\,dy\wedge dz$$
for some $C^2$ functions $f$, $g$, $h$ on $W$. Therefore,
$$P = h_y - g_z, \quad Q = f_z - h_x, \quad R = g_x - f_y,$$
that is, $\vec F = \operatorname{curl}(f, g, h)$.
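For $m = 1$ the construction in the proof of Poincaré’s lemma reduces to a familiar recipe: the 0-form $\tilde\omega$ at $x$ is the integral over $0 \le t \le 1$ of $\sum_j f_j(y + t(x - y))\,(x_j - y_j)$, which is a potential for the field in part (a) of 13.6.8. The sketch below evaluates this integral numerically. It is an illustration under stated assumptions: the function names, the use of composite Simpson quadrature, the base point $y = 0$, and the sample closed form $(2xy + 1)\,dx + (x^2 + 3y^2)\,dy$ on the plane are all choices made here, not taken from the text.

```python
def potential(f, x, y0, n=200):
    """Approximate the 0-form given by the Poincare construction for m = 1.

    f  : function mapping a point (sequence of floats) to the coefficient
         tuple (f_1, ..., f_n) of a closed 1-form
    x  : point at which to evaluate the potential
    y0 : point with respect to which the region is star-shaped
    n  : even number of Simpson subintervals on [0, 1]
    """
    d = [xi - yi for xi, yi in zip(x, y0)]

    def integrand(t):
        p = [yi + t * di for yi, di in zip(y0, d)]   # psi(t, x)
        return sum(fj * dj for fj, dj in zip(f(p), d))

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3

def omega(p):
    """Assumed sample closed 1-form: (2xy + 1) dx + (x^2 + 3y^2) dy."""
    x, y = p
    return (2 * x * y + 1, x * x + 3 * y * y)

point = (1.5, -2.0)
value = potential(omega, point, (0.0, 0.0))

# Compare with the closed-form potential x^2 y + x + y^3 of the sample form.
exact = point[0] ** 2 * point[1] + point[0] + point[1] ** 3
print(value, exact)
```

Differentiating `potential` numerically in each coordinate recovers the coefficients of the sample form, which is the statement $d\tilde\omega = \omega$ in this special case.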

Part III

Appendices

Appendix A Set Theory

In this appendix we give an overview of those aspects of elementary set theory that are used throughout the book. For details the reader may wish to consult [2, 8].

Notation for a Set

A set is simply a collection of objects, each of which is called a member or element of the set. Sets are usually denoted by capital letters, and members of a set by small letters. If $x$ is a member of the set $A$, we write $x \in A$; otherwise, we write $x \notin A$. The empty set, denoted by $\emptyset$, is the set with no members. A concrete set may be described either by listing its elements or by set-builder notation. The latter notation is of the form $\{x : P(x)\}$, which is read “the set of all $x$ such that $P(x)$,” where $P(x)$ is a well-defined property that $x$ must possess in order to belong to the set. For example, the set $A$ of all odd positive integers may be described as
$$A = \{1, 3, 5, \dots\} = \{n : n = 2m - 1 \text{ for some positive integer } m\}.$$
A set $A$ is a subset of a set $B$, written $A \subseteq B$, if every member of $A$ is a member of $B$. If $A \subseteq B$ and $A \ne B$, then $A$ is called a proper subset of $B$. The empty set is a subset of every set and a proper subset of every nonempty set. Sets $A$ and $B$ are said to be equal, written $A = B$, if each is a subset of the other. If all sets under consideration are subsets of the set $S$, then $S$ is called a universal set (of discourse).

Set Operations

Let $S$ be a universal set. The basic set operations are

$A \cup B = \{x : x \in A \text{ or } x \in B\}$, union of $A$ and $B$;
$A \cap B = \{x : x \in A \text{ and } x \in B\}$, intersection of $A$ and $B$;
$A \times B = \{(x,y) : x \in A \text{ and } y \in B\}$, Cartesian product of $A$ and $B$;
$A^c = \{x : x \in S \text{ and } x \notin A\}$, complement of $A$ in $S$;
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$, difference of $A$ and $B$.

More generally, if $\{A_i : i \in I\}$ is an arbitrary collection of sets indexed by a set $I$, then the union and intersection of the collection are defined, respectively, by
$$\bigcup_{i\in I} A_i = \{x : x \in A_i \text{ for some } i \in I\}, \qquad \bigcap_{i\in I} A_i = \{x : x \in A_i \text{ for every } i \in I\}.$$
If the index set is $\{1, 2, \dots, n\}$ or $\{1, 2, \dots, n, \dots\}$, we use the alternate notation
$$\bigcup_{j=1}^n A_j = A_1 \cup A_2 \cup\cdots\cup A_n, \qquad \bigcap_{j=1}^n A_j = A_1 \cap A_2 \cap\cdots\cap A_n$$
and
$$\bigcup_{j=1}^\infty A_j = A_1 \cup A_2 \cup\cdots, \qquad \bigcap_{j=1}^\infty A_j = A_1 \cap A_2 \cap\cdots.$$
A sequence of sets $A_n$ is said to be increasing if $A_1 \subseteq A_2 \subseteq\cdots$, in which case we write $A_n\uparrow$. Similarly, the sequence is decreasing if $A_1 \supseteq A_2 \supseteq\cdots$, written $A_n\downarrow$. In the first case we also write $A_n\uparrow A$, where $A = A_1 \cup A_2 \cup\cdots$, and in the second $A_n\downarrow A$, where $A = A_1 \cap A_2 \cap\cdots$.

For finitely many sets we extend the definition of Cartesian product by
$$\prod_{j=1}^n A_j = A_1 \times\cdots\times A_n = \{(a_1,\dots,a_n) : a_j \in A_j,\ j = 1,\dots,n\},$$
where $(a_1,\dots,a_n)$ is an (ordered) n-tuple. Also, we write
$$A^n = \underbrace{A \times A \times\cdots\times A}_{n}.$$
In particular, for an interval $[a,b]$ and the set of all real numbers $\mathbb{R}$,
$$[a,b]^n = \underbrace{[a,b]\times\cdots\times[a,b]}_{n} \quad\text{and}\quad \mathbb{R}^n = \underbrace{\mathbb{R}\times\cdots\times\mathbb{R}}_{n}.$$

The following propositions summarize the basic properties of set operations that will be needed in the text. As with many set equalities, they may be established directly by showing that an arbitrary member of the left side of an equation is a member of the right side, and vice versa.

Proposition. If $\{A_i : i \in I\}$ is a collection of subsets of a set $S$, then
(a) $\Big(\bigcup_{i\in I} A_i\Big)^c = \bigcap_{i\in I} A_i^c$.
(b) $A \cup \bigcap_{i\in I} A_i = \bigcap_{i\in I} (A \cup A_i)$.
(c) $\Big(\bigcap_{i\in I} A_i\Big)^c = \bigcup_{i\in I} A_i^c$.
(d) $A \cap \bigcup_{i\in I} A_i = \bigcup_{i\in I} (A \cap A_i)$.

Parts (a) and (c) of the above proposition are known as DeMorgan’s laws. Parts (b) and (d) are called distributive laws.

Proposition. The Cartesian product of sets has the following properties:
(a) $A \times (A_1 \cup\cdots\cup A_n) = (A \times A_1) \cup\cdots\cup (A \times A_n)$.
(b) $A \times (A_1 \cap\cdots\cap A_n) = (A \times A_1) \cap\cdots\cap (A \times A_n)$.
(c) $(A_1 \cap\cdots\cap A_n) \times (B_1 \cap\cdots\cap B_n) = (A_1 \times B_1) \cap\cdots\cap (A_n \times B_n)$.
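The set identities in the two propositions above can be spot-checked on small finite sets. The following sketch uses Python's built-in `set` type and `itertools.product`; the universal set, the family of subsets, and the variable names are illustrative assumptions. A check on one example is of course not a proof, only a sanity test of the statements.

```python
from itertools import product

# Assumed sample data: a universal set S, a set A, and a family {A_i}.
S = set(range(10))
A = {0, 3, 8}
family = [{1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7}]

def complement(X):
    """Complement of X in the universal set S."""
    return S - X

union_all = set().union(*family)
inter_all = set.intersection(*family)

# DeMorgan's laws, parts (a) and (c).
de_morgan_a = complement(union_all) == set.intersection(*[complement(Ai) for Ai in family])
de_morgan_c = complement(inter_all) == set().union(*[complement(Ai) for Ai in family])

# Distributive laws, parts (b) and (d).
distributive_b = (A | inter_all) == set.intersection(*[A | Ai for Ai in family])
distributive_d = (A & union_all) == set().union(*[A & Ai for Ai in family])

# Cartesian product over a union and over an intersection.
B1, B2 = {"p", "q"}, {"q", "r"}
cartesian_a = set(product(A, B1 | B2)) == set(product(A, B1)) | set(product(A, B2))
cartesian_b = set(product(A, B1 & B2)) == set(product(A, B1)) & set(product(A, B2))

print(de_morgan_a, de_morgan_c, distributive_b, distributive_d, cartesian_a, cartesian_b)
```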

Partitions and Equivalence Relations

A collection of sets is pairwise disjoint if $A \cap B = \emptyset$ for each pair of distinct members $A$ and $B$ in the collection. A partition of a set $S$ is a collection of nonempty pairwise disjoint sets whose union is $S$. An equivalence relation on a set $S$ is a subset $R$ of $S \times S$ with the following properties:
• (reflexivity) $xRx$ for every $x \in S$;
• (symmetry) $xRy \Rightarrow yRx$;
• (transitivity) $xRy$ and $yRz \Rightarrow xRz$.

Here, as is customary, we have written $xRy$ for $(x,y) \in R$. There is an important duality regarding partitions and equivalence relations: If $R$ is an equivalence relation on $S$, then the collection of sets of the form $[x] := \{y \in S : xRy\}$, each called an equivalence class of the relation, is a partition of $S$. Conversely, given a partition of $S$, define $xRy$ iff $x$ and $y$ are in the same partition member. Then $R$ is an equivalence relation on $S$ whose equivalence classes are precisely the members of the partition.
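The passage from an equivalence relation to its partition can be made concrete on a finite set. The sketch below is illustrative: the function name `equivalence_classes` and the sample relation (congruence modulo 3 on the integers 0 through 9) are assumptions introduced here. Because the relation is assumed to be an equivalence relation, it suffices to compare each new element with a single representative of each class found so far.

```python
def equivalence_classes(S, related):
    """Group the elements of the finite set S into equivalence classes.

    related(x, y) must be an equivalence relation on S; transitivity and
    symmetry justify testing against one representative per class.
    """
    classes = []
    for x in S:
        for cls in classes:
            representative = next(iter(cls))
            if related(x, representative):
                cls.add(x)
                break
        else:
            classes.append({x})   # x starts a new class
    return classes

# Assumed sample relation: x R y iff x - y is divisible by 3.
S = set(range(10))
same_remainder = lambda x, y: (x - y) % 3 == 0
classes = equivalence_classes(S, same_remainder)

print(sorted(sorted(c) for c in classes))
```

The classes returned are nonempty, pairwise disjoint, and have union S, which is exactly the definition of a partition given above.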

Functions

Let $A$ and $B$ be nonempty sets. A function or mapping from $A$ to $B$ is a rule $f$ that assigns to each member $x$ of $A$ a unique member $f(x)$ of $B$. We then write $f : A \to B$. The set $A$ is called the domain of $f$. The alternate notation $x \mapsto f(x) : A \to B$ is also used. If $A_0 \subseteq A$ and $B_0 \subseteq B$, then
$$f(A_0) = \{f(x) : x \in A_0\} \quad\text{and}\quad f^{-1}(B_0) = \{x \in A : f(x) \in B_0\}$$
are called, respectively, the image of $A_0$ and the pre-image of $B_0$ under $f$. The set $f(A)$ is called the range of $f$. A function $f : A \to B$ is said to be onto $B$ if $f(A) = B$, and one-to-one if $x_1 \ne x_2$ implies $f(x_1) \ne f(x_2)$.

Proposition. Let $f : A \to B$ be a function, $\{A_i : i \in I\}$ a collection of subsets of $A$, and $\{B_j : j \in J\}$ a collection of subsets of $B$. Then
(a) $f^{-1}\Big(\bigcup_{j\in J} B_j\Big) = \bigcup_{j\in J} f^{-1}(B_j)$.
(b) $f^{-1}\Big(\bigcap_{j\in J} B_j\Big) = \bigcap_{j\in J} f^{-1}(B_j)$.
(c) $f\Big(\bigcup_{i\in I} A_i\Big) = \bigcup_{i\in I} f(A_i)$.
(d) $f\Big(\bigcap_{i\in I} A_i\Big) \subseteq \bigcap_{i\in I} f(A_i)$, where equality holds if $f$ is one-to-one.
(e) $f^{-1}(B_j^c) = \big(f^{-1}(B_j)\big)^c$.
(f) $f(A_i^c) \subseteq \big(f(A_i)\big)^c$ if $f$ is one-to-one, where equality holds if $f$ is also onto $B$.
(g) $f\big(f^{-1}(B_j)\big) \subseteq B_j$, where equality holds if $f$ is onto $B$.
(h) $A_i \subseteq f^{-1}\big(f(A_i)\big)$, where equality holds if $f$ is one-to-one.

If $f : A \to B$ and $g : C \to D$ are functions with $B \subseteq C$, then the composition of $g$ and $f$ is the function $g \circ f : A \to D$ defined by
$$(g \circ f)(x) = g\big(f(x)\big), \quad x \in A.$$
If $D_0 \subseteq D$, then
$$(g \circ f)^{-1}(D_0) = f^{-1}\big(g^{-1}(D_0)\big).$$
If $f : A \to B$ is one-to-one and onto $B$, then the inverse $f^{-1} : B \to A$ is defined by the rule $x = f^{-1}(y)$ iff $y = f(x)$. One then has the identities
$$(f^{-1} \circ f)(x) = x \quad\text{and}\quad (f \circ f^{-1})(y) = y, \quad x \in A,\ y \in B.$$
Thus $f^{-1} \circ f$ and $f \circ f^{-1}$ are the identity functions on $A$ and $B$, respectively.
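Images and pre-images of finite sets are easy to compute directly, which makes it possible to see where the inclusions in the proposition on images and pre-images can be strict. The sketch below is illustrative; the helper names `image` and `preimage`, the domain, and the sample function $x \mapsto x^2$ (which is not one-to-one on the chosen domain) are assumptions made here.

```python
def image(f, A0):
    """The image f(A0) = {f(x) : x in A0}."""
    return {f(x) for x in A0}

def preimage(f, A, B0):
    """The pre-image {x in A : f(x) in B0}, where A is the domain of f."""
    return {x for x in A if f(x) in B0}

# Assumed sample data: the squaring map on A = {-2, ..., 2}.
A = {-2, -1, 0, 1, 2}
square = lambda x: x * x
A1, A2 = {-2, -1, 0}, {0, 1, 2}
B1, B2 = {0, 1}, {1, 4}

# The image of an intersection can be strictly smaller than the
# intersection of the images when f is not one-to-one.
lhs = image(square, A1 & A2)
rhs = image(square, A1) & image(square, A2)
print(lhs, rhs)

# Pre-images commute with unions and intersections.
union_ok = preimage(square, A, B1 | B2) == preimage(square, A, B1) | preimage(square, A, B2)
inter_ok = preimage(square, A, B1 & B2) == preimage(square, A, B1) & preimage(square, A, B2)

# A set is contained in the pre-image of its image, possibly strictly.
back = preimage(square, A, image(square, A1))
print(union_ok, inter_ok, back)
```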

Cardinality

Two sets $A$ and $B$ are said to have the same cardinality if there exists a one-to-one function from $A$ onto $B$. A set $A$ is finite if either $A$ is the empty set or $A$ has the same cardinality as $\{1, 2, \dots, n\}$ for some positive integer $n$. In the latter case, the members of $A$ may be labeled with the numbers $1, 2, \dots, n$, so $A$ may be written $\{a_1, a_2, \dots, a_n\}$. A set $A$ is countably infinite if it has the same cardinality as the set of natural numbers. In this case the members of $A$ may be labeled with the positive integers $1, 2, 3, \dots$. A set is countable if it is either finite or countably infinite; otherwise it is said to be uncountable. The set of all integers is countably infinite, as is the set of rational numbers. The set $\mathbb{R}$ of all real numbers is uncountable, as is any (nondegenerate) interval of real numbers.

Appendix B Linear Algebra

This appendix contains a brief review of the main ideas of linear algebra that will be needed in Part II of the text. For details and proofs the reader is referred to [9].

Vector Spaces. Bases

A vector space is a set $\mathcal{V}$ containing at least one member $0$, called the zero vector, together with two operations $u + v$ and $au$ ($u, v \in \mathcal{V}$, $a \in \mathbb{R}$), called vector addition and scalar multiplication, respectively, such that for all $u, v, w \in \mathcal{V}$ and $a, b \in \mathbb{R}$ the following axioms hold:
• Associativity of addition: $(u + v) + w = u + (v + w)$.
• Commutativity of addition: $u + v = v + u$.
• Additive identity: $v + 0 = v$.
• Existence of additive inverse: $u + (-u) = 0$.
• Associativity of scalar multiplication: $(ab)u = a(bu)$.
• Scalar distributivity: $a(u + v) = au + av$.
• Vector distributivity: $(a + b)u = au + bu$.
• Scalar multiplicative identity: $1u = u$.

A subset $\mathcal{W}$ of $\mathcal{V}$ containing the zero vector and closed under the operations of vector addition and scalar multiplication is called a subspace of $\mathcal{V}$. The set $\mathcal{W}$ is then a vector space under the operations it inherits from $\mathcal{V}$.

A linear combination of vectors $v_1, \dots, v_n \in \mathcal{V}$ is an expression of the form
$$c_1 v_1 + \cdots + c_n v_n, \quad c_j \in \mathbb{R}.$$
The set of all linear combinations of $v_1, \dots, v_n$ is called the linear span of $v_1, \dots, v_n$ or the subspace spanned by $v_1, \dots, v_n$. If this subspace is all of $\mathcal{V}$, the vectors $v_1, \dots, v_n$ are said to span $\mathcal{V}$. Vectors $v_1, \dots, v_n \in \mathcal{V}$ are linearly independent if an equation of the form
$$c_1 v_1 + \cdots + c_n v_n = 0$$
can hold only if $c_1 = \cdots = c_n = 0$. A basis for $\mathcal{V}$ is a finite set of linearly independent vectors that span $\mathcal{V}$. It follows that each member of $\mathcal{V}$ is uniquely expressible as a linear combination of the basis vectors. A vector space that has a basis is said to be finite dimensional; otherwise it is infinite dimensional. All bases in a finite dimensional vector space $\mathcal{V}$ have the same number of vectors. This number is called the dimension of the vector space and is denoted by $\dim\mathcal{V}$. A frame for a finite dimensional vector space is an ordered basis. If $\mathcal{V}$ is finite dimensional, then every set of linearly independent vectors may be extended to a basis, and every finite set of vectors that span $\mathcal{V}$ may be reduced to a basis.

An important example of a finite dimensional vector space is Euclidean space $\mathbb{R}^n$ (Section 1.6). The standard basis in $\mathbb{R}^n$ consists of the $n$ vectors
$$e_1 = (1, 0, \dots, 0), \quad e_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e_n = (0, 0, \dots, 0, 1).$$
An example of an infinite dimensional vector space is the set of all Riemann integrable functions on $[a,b]$ with the operations of pointwise addition and scalar multiplication.

A basis $\{w_1, \dots, w_m\}$ for a subspace $\mathcal{W}$ of $\mathbb{R}^n$ is orthonormal if
$$w_i\cdot w_j = \begin{cases} 0 & \text{if } i \ne j,\\ 1 & \text{if } i = j, \end{cases}$$
where $(\cdot)$ is the usual inner (= dot) product on $\mathbb{R}^n$. For example, the standard basis is orthonormal. Every subspace of $\mathbb{R}^n$ has an orthonormal basis.

Linear Transformations

Let U and V be vector spaces. A linear transformation from U to V is a function T : U → V with the properties T(u + v) = Tu + Tv and T(cu) = cTu, u, v ∈ U, c ∈ R. Here, we have used the convention for linear transformations of dropping the parentheses in the notation T(u) when there is no danger of ambiguity. The collection of all linear transformations from U to V is denoted by L(U, V). It is a vector space under the operations $T_1 + T_2$ and cT defined by
\[
(T_1 + T_2)(u) = T_1 u + T_2 u, \qquad (cT)u = c(Tu), \qquad u \in U,\ c \in R.
\]

If T ∈ L(U, V) and S ∈ L(V, W), then ST := S ◦ T is a member of L(U, W). Also, the subspace N (T ) := T −1 ({0}) of U is called the nullspace of T . The range of T , which is a subspace of V, is denoted by R(T ). If U and R(T ) are finite dimensional, then dim N (T ) + dim R(T ) = dim U.


If T ∈ L(U, V) is one-to-one and onto V, then $T^{-1}$ ∈ L(V, U). In this case T is said to be invertible. If U and V are finite dimensional with dim U = dim V, then T is invertible iff N(T) = {0} iff R(T) = V. In this case T maps a frame $(u_1, \dots, u_n)$ in U onto a frame $(v_1, \dots, v_n)$ in V, where $v_j = T u_j$. We indicate this by writing $T(u_1, \dots, u_n) = (v_1, \dots, v_n)$.

Matrices

An m × n matrix is a rectangular array of real numbers with m rows and n columns. It is written variously as
\[
A = [a_i^j]_{m\times n} =
\begin{bmatrix}
a_1^1 & a_1^2 & \cdots & a_1^n\\
a_2^1 & a_2^2 & \cdots & a_2^n\\
\vdots & \vdots & \cdots & \vdots\\
a_m^1 & a_m^2 & \cdots & a_m^n
\end{bmatrix}
= \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_m \end{bmatrix}
= \begin{bmatrix} a^1 & a^2 & \cdots & a^n \end{bmatrix},
\]
where $a_i = (a_i^1, \dots, a_i^n)$ is the ith row of A and $a^j = (a_1^j, \dots, a_m^j)$ is the jth column of A (written, of course, as a column). The number $a_i^j$ located in row i and column j of the matrix is also written $a_{ij}$ and is called the (i, j)th entry of A.

For a ∈ R and matrices $A = [a_i^j]_{m\times n}$, $B = [b_i^j]_{m\times n}$, and $C = [c_i^j]_{n\times p}$, the sum A + B, scalar multiple aA, and product AC are defined, respectively, by
\[
A + B = [x_i^j],\ x_i^j := a_i^j + b_i^j, \qquad aA = [y_i^j],\ y_i^j := a\,a_i^j, \qquad AC = [z_i^j],\ z_i^j := \sum_{k=1}^{n} a_i^k c_k^j .
\]
The product AC may also be written as
\[
\begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_m \end{bmatrix}_{m\times n}
\begin{bmatrix} c^1 & c^2 & \cdots & c^p \end{bmatrix}_{n\times p}
= [\,a_i \cdot c^j\,]_{m\times p} .
\]

The m × n matrix $O_{m\times n}$ with all entries equal to 0 is called a zero matrix. It has the property that $A + O_{m\times n} = A$ for all m × n matrices A. The collection of m × n matrices is a vector space under the operations A + B and aA and with zero $O_{m\times n}$.

The transpose of an m × n matrix A is the n × m matrix $A^t := [x_i^j]$, where $x_i^j = a_j^i$. For example,
\[
\begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{bmatrix}^t = \begin{bmatrix} 1 & 4\\ 2 & 5\\ 3 & 6 \end{bmatrix}.
\]
The transpose operation has the following properties:
\[
(A + B)^t = A^t + B^t, \qquad (aA)^t = aA^t, \qquad (AC)^t = C^t A^t .
\]
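The product and transpose rules above can be checked on a small case. The following is a minimal sketch, not part of the text: matrices are stored as lists of rows, and the helper names `mat_mul` and `transpose` and the sample matrix `C` are illustrative assumptions.

```python
def mat_mul(A, C):
    """Product AC of an m x n matrix A and an n x p matrix C (lists of rows).

    Entry (i, j) is the dot product of row i of A with column j of C.
    """
    n = len(C)
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    p = len(C[0])
    return [[sum(row[k] * C[k][j] for k in range(n)) for j in range(p)]
            for row in A]


def transpose(A):
    """Transpose: the rows of the result are the columns of A."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3], [4, 5, 6]]        # the 2 x 3 matrix from the transpose example
C = [[1, 0], [2, 1], [0, 3]]      # an arbitrary 3 x 2 matrix (illustrative)

lhs = transpose(mat_mul(A, C))                 # (AC)^t
rhs = mat_mul(transpose(C), transpose(A))      # C^t A^t
print(lhs == rhs)
```

Note the reversal of the factors on the right side: $A^t C^t$ is a 3 × 3 matrix here, so it cannot equal the 2 × 2 matrix $(AC)^t$.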

For each n, the matrix
\[
I_n := \begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \cdots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}
\]
is called the nth order identity matrix. It has the property that $AI_n = A$ and $I_n B = B$ for all m × n matrices A and all n × p matrices B.

An n × n matrix A is said to be nonsingular if there exists a matrix, denoted by $A^{-1}$ and called the inverse of A, such that $AA^{-1} = A^{-1}A = I_n$. The inverse operation has the property $(AB)^{-1} = B^{-1}A^{-1}$ for all nonsingular n × n matrices A and B.

An m × n matrix A is said to be in reduced row echelon form if the following conditions hold:

• Any nonzero row has its first nonzero entry equal to 1. This entry is then called the leading entry of the row.
• If rows i and k are nonzero and i < k, then the leading entry of row i is to the left of the leading entry of row k.
• Entries above and below a leading entry are zero.
• Any zero row is below all nonzero rows.

For example, the following matrix is in reduced row echelon form:
\[
\begin{bmatrix}
0 & 1 & 0 & 3 & 0\\
0 & 0 & 1 & 7 & 0\\
0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]
For a given n, $I_n$ is the only n × n matrix in reduced row echelon form without any zero rows.

An elementary row operation on an m × n matrix A is one of the following:

• Interchange a pair of rows.
• Multiply a row by a nonzero scalar.
• Add to one row a scalar multiple of another.


An elementary matrix is a matrix obtained from the identity matrix by an elementary row operation. Each elementary row operation on A may be achieved by multiplying A on the left by a suitable elementary matrix. For example, the multiplication
\[
\begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 4 & 5 & 6\\ 1 & 2 & 3\\ 7 & 8 & 9 \end{bmatrix}
\]
switches the first and second rows of A, and the multiplication
\[
\begin{bmatrix} 1 & 0 & 0\\ 2 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3\\ 6 & 9 & 12\\ 7 & 8 & 9 \end{bmatrix}
\]
adds twice row one to row two.

Using elementary operations, one may transform any m × n matrix A into reduced row echelon form R. It follows that there exists a sequence of elementary matrices $E_j$ such that $R = E_p E_{p-1} \cdots E_1 A$.

The row rank (column rank) of a matrix A is the maximum number of linearly independent rows (columns) of A. The row rank of a matrix is always equal to the column rank. (This is clear for the reduced row echelon form.) The rank of a matrix is its row (= column) rank.
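The reduction to reduced row echelon form by elementary row operations can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the text's own algorithm: it uses exact rational arithmetic from the standard `fractions` module, and the function name `rref` is hypothetical. The number of nonzero rows of the result is the rank.

```python
from fractions import Fraction


def rref(A):
    """Return (R, rank): the reduced row echelon form of A and its rank.

    Only the three elementary row operations are used: interchange,
    multiplication of a row by a nonzero scalar, and addition of a
    scalar multiple of one row to another.
    """
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivot_row = 0
    for col in range(n):
        # Look for a row at or below pivot_row with a nonzero entry in col.
        pr = next((r for r in range(pivot_row, m) if R[r][col] != 0), None)
        if pr is None:
            continue
        R[pivot_row], R[pr] = R[pr], R[pivot_row]          # interchange rows
        lead = R[pivot_row][col]
        R[pivot_row] = [x / lead for x in R[pivot_row]]    # make leading entry 1
        for r in range(m):
            if r != pivot_row and R[r][col] != 0:
                factor = R[r][col]
                # subtract a multiple of the pivot row to clear the column
                R[r] = [x - factor * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return R, pivot_row


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
R, rank = rref(A)
print(R, rank)
```

For the 3 × 3 matrix used in the elementary matrix examples, the third row is a linear combination of the first two, so the reduced form has one zero row and the rank is 2.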

The Matrix of a Linear Transformation

Let $T \in L(R^n, R^m)$. The matrix of T is defined by
\[
[T] = \begin{bmatrix} Te_1 & Te_2 & \cdots & Te_n \end{bmatrix}
\]
(where $Te_j$ is written as a column). If $Te_j = (a_1^j, a_2^j, \dots, a_m^j)$ and $x = (x_1, \dots, x_n) = \sum_{j=1}^{n} x_j e_j$, then, by linearity of T,
\[
T(x_1, x_2, \dots, x_n) = \sum_{j=1}^{n} x_j Te_j = \sum_{j=1}^{n} (a_1^j x_j, a_2^j x_j, \dots, a_m^j x_j)
= \Big( \sum_{j=1}^{n} a_1^j x_j,\ \sum_{j=1}^{n} a_2^j x_j,\ \dots,\ \sum_{j=1}^{n} a_m^j x_j \Big),
\]
which may be written in column matrix form as
\[
[T]x^t = \begin{bmatrix}
a_1^1 & a_1^2 & \cdots & a_1^n\\
a_2^1 & a_2^2 & \cdots & a_2^n\\
\vdots & \vdots & \cdots & \vdots\\
a_m^1 & a_m^2 & \cdots & a_m^n
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}.
\]
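To make the construction concrete, the sketch below builds [T] column by column from the images of the standard basis vectors and checks that multiplying by [T] reproduces T. The map `T`, the helper names `matrix_of` and `apply`, and the test vector are all hypothetical choices made for illustration.

```python
def matrix_of(T, n):
    """Matrix [T] whose jth column is T(e_j), returned as a list of rows.

    T is assumed to be a linear map taking a length-n tuple to a tuple.
    """
    columns = []
    for j in range(n):
        e_j = tuple(1 if i == j else 0 for i in range(n))   # jth standard basis vector
        columns.append(T(e_j))
    return [list(row) for row in zip(*columns)]             # columns -> rows


def apply(M, x):
    """Compute M x, with x treated as a column."""
    return tuple(sum(a * xj for a, xj in zip(row, x)) for row in M)


def T(x):
    """A hypothetical linear map from R^3 to R^2, chosen only for illustration."""
    x1, x2, x3 = x
    return (x1 + 2 * x2, 3 * x3 - x1)


M = matrix_of(T, 3)
x = (4, 5, 6)
print(M, apply(M, x), T(x))
```

The two printed vectors agree because T is determined by its values on the standard basis, which is exactly what the displayed computation expresses.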


Note that $a_i^j$ may be expressed as $(Te_j) \cdot e_i$. The operations of addition, scalar multiplication, and composition of linear transformations correspond to addition, scalar multiplication, and multiplication of matrices in the following way: If $T, T' \in L(R^n, R^m)$ and $S \in L(R^m, R^p)$, then
\[
[T + T'] = [T] + [T'], \qquad [tT] = t[T], \qquad [ST] = [S][T].
\]
In particular, if $T \in L(R^n, R^n)$, then T is invertible iff [T] is nonsingular.

An n × n matrix A is orthogonal if $AA^t = I_n$, that is, if $A^t = A^{-1}$. In this case det A = ±1. (See below.) A linear transformation $T \in L(R^n, R^n)$ is said to be orthogonal if [T] is orthogonal.

Determinants

A permutation of the n-tuple (1, ..., n) is a one-to-one function σ mapping {1, ..., n} onto itself. It is frequently denoted by $(i_1, \dots, i_n)$, where $i_k = \sigma(k)$. The permutation is said to be even or odd according as an even or odd number of adjacent interchanges are required to transform $(i_1, \dots, i_n)$ to (1, ..., n) (or vice versa). For example, (3, 2, 1) is odd and (4, 3, 2, 1) is even. The sign of a permutation σ is defined by
\[
(-1)^{\sigma} = \begin{cases} 1 & \text{if } \sigma \text{ is even},\\ -1 & \text{if } \sigma \text{ is odd}. \end{cases}
\]
We then have
\[
(-1)^{\sigma\tau} = (-1)^{\sigma}(-1)^{\tau} \quad\text{and}\quad (-1)^{\sigma^{-1}} = (-1)^{\sigma},
\]
where, as is customary, τσ stands for τ ∘ σ.

The determinant of an n × n matrix $A = [a_i^j]$ is defined by
\[
\det A = \begin{vmatrix}
a_1^1 & a_1^2 & \cdots & a_1^n\\
a_2^1 & a_2^2 & \cdots & a_2^n\\
\vdots & \vdots & \cdots & \vdots\\
a_n^1 & a_n^2 & \cdots & a_n^n
\end{vmatrix}
:= \sum_{\sigma} (-1)^{\sigma} a_1^{\sigma(1)} \cdots a_n^{\sigma(n)},
\]
where the sum is taken over all permutations σ of (1, ..., n). For example,
\[
\begin{vmatrix} a & b\\ c & d \end{vmatrix} = ad - bc,
\]
since $(-1)^{(1,2)} = 1$ and $(-1)^{(2,1)} = -1$. If $T \in L(R^n, R^n)$ we denote the determinant of the matrix of T by det T rather than by the more cumbersome det[T].

The following theorem summarizes the main properties of determinants. Parts (a)–(f) follow directly from the above definition; part (g) is proved in Chapter 13.
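The permutation definition translates directly into code, which may help in checking small cases by hand. The sketch below is illustrative only: it computes the sign by counting inversions, which has the same parity as the number of adjacent interchanges needed to sort the tuple, and it sums over all n! permutations, so it is practical only for small n. The names `sign` and `det` are assumptions of this example.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation given as a tuple of distinct integers.

    The number of inversions has the same parity as the number of
    adjacent interchanges needed to put the tuple in increasing order.
    """
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Determinant of a square matrix (list of rows) by the permutation sum."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]      # entry in row i, column sigma(i)
        total += term
    return total


print(sign((3, 2, 1)), sign((4, 3, 2, 1)))
print(det([[1, 2], [3, 4]]))
```

The first line of output reflects the examples in the text: (3, 2, 1) is odd and (4, 3, 2, 1) is even.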

Theorem. Let $A = [\,a^1 \cdots a^n\,]$ be an n × n matrix and t ∈ R. Then

(a) $\det[\,a^1 \cdots ta^j \cdots a^n\,] = t \det[\,a^1 \cdots a^j \cdots a^n\,]$.
(b) $\det[\,a^1 \cdots a^j + b \cdots a^n\,] = \det[\,a^1 \cdots a^j \cdots a^n\,] + \det[\,a^1 \cdots b \cdots a^n\,]$.
(c) Interchanging two rows of A changes the sign of the determinant.
(d) If A has a pair of duplicate rows, then det A = 0.
(e) Adding a multiple of one row to another does not change the value of the determinant.
(f) $\det A^t = \det A$. Thus any “row property” has a corresponding “column property.”
(g) If B is an n × n matrix, then det(AB) = (det A)(det B).

The following theorem is frequently useful in evaluating determinants.

Theorem. Let $A = [a_{ij}]$ be an n × n matrix, and for each (i, j), let $A_{ij}$ denote the matrix obtained by removing row i and column j from A. Then for each fixed i and j,
\[
\det A = \sum_{k=1}^{n} (-1)^{i+k} a_{ik} \det A_{ik} = \sum_{k=1}^{n} (-1)^{k+j} a_{kj} \det A_{kj}.
\]
The first equality is called expansion along row i and the second expansion along column j. For example, expanding along row 1,
\[
\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}\begin{vmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23}\\ a_{31} & a_{33} \end{vmatrix}
+ a_{13}\begin{vmatrix} a_{21} & a_{22}\\ a_{31} & a_{32} \end{vmatrix}.
\]
The formula $\begin{vmatrix} a & b\\ c & d \end{vmatrix} = ad - bc$ may then be used to complete the evaluation. For another example, consider
\[
\begin{vmatrix} I_p & C_{p\times q}\\ O_{q\times p} & D_{q\times q} \end{vmatrix} = \det D,
\]
obtained by successive expansion along the first column.

The preceding theorem may be used to prove the following result.

Theorem. Let $A = [a_{ij}]$ be an n × n matrix. Then $A^{-1}$ exists iff $\det A \ne 0$. In this case the (i, j) entry of $A^{-1}$ is
\[
(-1)^{i+j}\, \frac{\det A_{ji}}{\det A}.
\]


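The last theorem may be used to prove Cramer’s Rule: Consider a system of n equations in n unknowns, written in matrix form as Ax = b or explicitly as
\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \cdots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
= \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix}.
\]
If A is nonsingular, then the solution to the system is
\[
x_j = \frac{1}{\det A}
\begin{vmatrix}
a_{11} & \cdots & a_{1,j-1} & b_1 & a_{1,j+1} & \cdots & a_{1n}\\
a_{21} & \cdots & a_{2,j-1} & b_2 & a_{2,j+1} & \cdots & a_{2n}\\
\vdots & \cdots & \vdots & \vdots & \vdots & \cdots & \vdots\\
a_{n1} & \cdots & a_{n,j-1} & b_n & a_{n,j+1} & \cdots & a_{nn}
\end{vmatrix}.
\]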
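Cramer's Rule, together with expansion along the first row, can be sketched as follows. This is an illustrative implementation assuming integer entries (so that exact rational answers can be returned with `fractions.Fraction`); the helper names `minor`, `det`, and `cramer` and the sample system are assumptions of the example. Cofactor expansion is exponentially expensive, so this is meant for small systems only.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix obtained from A by removing row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[0][k] * det(minor(A, 0, k))
               for k in range(len(A)))


def cramer(A, b):
    """Solve Ax = b for a nonsingular integer matrix A by Cramer's Rule."""
    d = det(A)
    if d == 0:
        raise ValueError("A is singular; Cramer's Rule does not apply")
    solution = []
    for j in range(len(A)):
        # Replace column j of A by the column b.
        A_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(A_j), d))
    return solution


A = [[2, 1], [1, 3]]
b = [3, 5]
x = cramer(A, b)
print(x)
```

Substituting the computed values back into the two equations confirms the result: 2(4/5) + 7/5 = 3 and 4/5 + 3(7/5) = 5.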

Appendix C Solutions to Selected Problems

Section 1.2

1. (b) $(ab) + (-a)b = [a + (-a)]b = 0 \cdot b = 0$, so uniqueness of the additive inverse implies $-(ab) = (-a)b$. A similar argument works for the second equality. (d) By (b), (−1)a = 1(−a) = −a. (f) Using commutativity and associativity of multiplication and the distributive law and 1.2.1(i),
\[
a/b + c/d = ab^{-1}(dd^{-1}) + cd^{-1}(bb^{-1}) = ad(b^{-1}d^{-1}) + bc(b^{-1}d^{-1}) = ad(bd)^{-1} + bc(bd)^{-1} = (ad + bc)/(bd).
\]

3. If s := r/x ∈ Q, then, by Exercise 2, x = r/s ∈ Q, a contradiction. Therefore, r/x ∈ I. The remaining parts have similar proofs.

5. The left side of (a) is
\[
\frac{n!}{n^n} = \frac{n-1}{n}\,\frac{n-2}{n} \cdots \frac{1}{n}.
\]
For (b),
\[
(2n)! = [2n(2n-2)(2n-4) \cdots 4 \cdot 2]\,[(2n-1)(2n-3) \cdots 3 \cdot 1] = 2^n [n(n-1)(n-2) \cdots 2 \cdot 1]\,[(2n-1)(2n-3) \cdots 3 \cdot 1].
\]

8. $f(k) = k^3 - (k-1)^3 = 3k^2 - 3k + 1$.

Section 1.3

1. (c) Follows from a/b − c/d = (ad − bc)/bd.

4. If 0 < x < y, then multiplying the inequality by 1/(xy) and using (d) of 1.3.2 shows that 1/y < 1/x. If x < y < 0, then 0 < −y < −x, hence, by the first part, 1/(−x) < 1/(−y) so 1/x > 1/y.

6. (a) By Exercise 1.2.4,
\[
y^n - x^n = (y - x) \sum_{j=1}^{n} y^{n-j} x^{j-1}.
\]
Each term of the sum is positive and less than $y^{n-j} y^{j-1} = y^{n-1}$. Since there are n terms, part (a) follows.

8. a = ta + (1 − t)a < tb + (1 − t)b = b.


10. If a > b, then x := a − b > 0 and a > b + x, contradicting the hypothesis. 13. (b) 0 ≤ (x − y)2 + (y − z)2 + (z − x)2 = 2(x2 + y 2 + z 2 ) − 2(xy + yz + xz). 14. Expand (x − a)2 ≥ 0 and divide by x. 18. If a ≤ x ≤ b, then x ≤ |b| and −x ≤ −a ≤ |a|, hence |x| ≤ max{|a|, |b|}. 21. Assume without loss of generality that S1 = S \{a1 , . . . , ak }, so min S1 = ak+1 . Each of the remaining sets Sj contains at least one of a1 , . . . , ak , hence min Sj ≤ ak < ak+1 , verifying the assertion.

Section 1.4

2. (a) sup = 12, inf = −12. (b) sup = 1, inf = −1.

3. (c) sup = 10/3, inf = 3; (d) sup = $(3 + \sqrt{5})/2$, inf = −∞; (e) sup = +∞, inf = −∞; (h) sup = 3, inf = 0; (i) sup = $\frac{1}{2} + \frac{\sqrt{2}}{4}$, inf = $\frac{1}{2} - \frac{\sqrt{2}}{4}$; (m) sup = 4/3, inf = −1.

5. Let x, y ∈ A. Then ±(x − y) ≤ sup A − inf A, hence |x − y| ≤ sup A − inf A. Since |x| − |y| ≤ |x − y|, |x| − |y| ≤ sup A − inf A so |x| ≤ sup A − inf A + |y|. Since x was arbitrary, we have sup |A| ≤ sup A − inf A + |y|, hence sup |A| − sup A + inf A ≤ |y|. Since y was arbitrary, it follows that sup |A| − sup A + inf A ≤ inf |A|.

6. (b) Since x > 0, xa ≤ x sup A for all a ∈ A, hence sup (xA) ≤ x sup A. Replacing x by 1/x proves the inequality in the other direction. The infimum case is similar.

9. Let a < b and choose a rational r in $(a - \sqrt{2}, b - \sqrt{2})$. Then $r + \sqrt{2}$ is irrational and in (a, b).

12. (b) If n := ⌊x⌋ = −⌊−x⌋, then x − 1 < n ≤ x and x ≤ n < x + 1. This is possible only if x = n. The converse is trivial. (c) By definition −x − 1 < ⌊−x⌋ ≤ −x.

14. Let $x := (b^m)^{1/n}$ and $y := (b^{1/n})^m$. By definition, x is the unique positive solution of $x^n = b^m$. Since $y^n = [(b^{1/n})^m]^n = [(b^{1/n})^n]^m = b^m$, x = y.

17. Let ℓ ≤ x ≤ u for all x ∈ A. By the Archimedean principle, there exist positive integers m and n such that −m < ℓ ≤ u < n. Set N = max{m, n}.

20. For any a ∈ N, if $r := \sqrt{n + a} + \sqrt{n} \in Q$, then squaring both sides of $\sqrt{n + a} = r - \sqrt{n}$ shows that $\sqrt{n} \in Q$ and hence that $n = j^2$ for some j ∈ N (1.4.11). Then $\sqrt{n + a} \in Q$, hence $n + a = k^2$ for some k ∈ N. Therefore, $a = k^2 - j^2 = (k - j)(k + j)$. If a = 11, then k − j = 1 and j + k = 11 so n = 25. If a = 21, then either k − j = 1 and j + k = 21 or k − j = 3 and j + k = 7. The first choice leads to j = 10 and n = 100 and the second to j = 2 and n = 4.
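The values of n found in Solution 20 can be confirmed by a brute-force search. The sketch below relies on the argument above, which reduces the question to finding n such that both n and n + a are perfect squares; the function names and the search bound `limit` are assumptions of the example, so the search only confirms that no other solutions occur below that bound.

```python
from math import isqrt


def is_square(m):
    """True if the nonnegative integer m is a perfect square."""
    r = isqrt(m)
    return r * r == m


def solutions(a, limit=10_000):
    """All n in 1..limit for which both n and n + a are perfect squares."""
    return [n for n in range(1, limit + 1)
            if is_square(n) and is_square(n + a)]


print(solutions(11))
print(solutions(21))
```

The factorization $a = (k - j)(k + j)$ explains why the lists are so short: each solution corresponds to a way of writing a as a product of two factors of the same parity.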

Section 1.5

3. Let f(n) denote the sum on the left side of the equation and g(n) the sum on the right. Then f(1) = 1/2 = g(1). Now let n ≥ 1. Then
\[
f(n+1) - f(n) = \sum_{k=1}^{2n+2} \frac{(-1)^{k+1}}{k} - \sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \frac{1}{2n+1} - \frac{1}{2n+2},
\]
\[
g(n+1) - g(n) = \sum_{k=n+2}^{2n+2} \frac{1}{k} - \sum_{k=n+1}^{2n} \frac{1}{k} = \frac{1}{2n+2} + \frac{1}{2n+1} - \frac{1}{n+1}.
\]
Since the right sides are equal, f(n) = g(n) ⇒ f(n + 1) = g(n + 1).

5. $\frac{25}{3} n^3 - \frac{15}{2} n^2 + \frac{1}{6} n$.

6. (b)
\[
\sum_{k=1}^{500} (4k^2 - 1) = 4 \cdot \frac{500 \cdot 501 \cdot 1001}{6} - 500 = 167{,}166{,}500.
\]
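The arithmetic in 6(b) is easy to confirm by direct summation. The short check below compares the direct sum with the closed form $\sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6$; the helper name `sum_of_squares` is an assumption of the example.

```python
def sum_of_squares(n):
    """Closed form for 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6


direct = sum(4 * k * k - 1 for k in range(1, 501))   # term-by-term sum
closed = 4 * sum_of_squares(500) - 500               # formula used in the solution
print(direct, closed)
```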

7. For n ≥ 1, let Q(n) be the statement $P(n - 1 + n_0)$. Then $Q(1) = P(n_0)$ is true. Assume $Q(n) = P(n - 1 + n_0)$ is true. Then $Q(n + 1) = P(n + n_0)$ is true. By mathematical induction, $Q(n) = P(n - 1 + n_0)$ is true for all n ≥ 1, that is, P(n) is true for every $n \ge n_0$.

8. In each case, let f(n) be the left side of the inequality and g(n) the right side, and let P(n) : f(n) < g(n). Let $n_0$ be the base value of n for which P(n) is true. It is straightforward to check that $f(n_0) < g(n_0)$. Assume P(n) holds for some $n \ge n_0$, so that f(n)/g(n) < 1. Then

(a) $\dfrac{f(n+1)}{g(n+1)} = \dfrac{2n+3}{2^{n+1}} = \dfrac{f(n)}{2g(n)} + \dfrac{1}{2^n} < 1$.

(e) $\dfrac{f(n+1)}{g(n+1)} = \dfrac{2^{n+1}(n+1)!}{(n+1)^{n+1}} = \dfrac{f(n)}{g(n)} \cdot \dfrac{2}{(1 + 1/n)^n} < 1$.

9. Check that 6 < ln(6!). For the induction step, use (n + 1)! = (n + 1)n!.

13. Let $g_n$ denote the expression on the right in the assertion. One checks directly that $g_0 = g_1 = 1$. Let n ≥ 2 and assume that $f_j = g_j$ for all 2 ≤ j ≤ n. Then
\[
\begin{aligned}
g_{n+1} - f_{n+1} &= g_{n+1} - f_n - f_{n-1} = g_{n+1} - g_n - g_{n-1}\\
&= \frac{1}{\sqrt{5}}\big(a^{n+2} - a^{n+1} - a^n\big) + \frac{1}{\sqrt{5}}\big(b^{n+2} - b^{n+1} - b^n\big)\\
&= \frac{a^n}{\sqrt{5}}(a^2 - a - 1) + \frac{b^n}{\sqrt{5}}(b^2 - b - 1) = 0.
\end{aligned}
\]

15. The set of all nonnegative integers of the form m − qn, q ∈ Z, is nonempty (Archimedean principle), hence has a smallest member r = m − qn (well ordering principle). If r ≥ n, then 0 ≤ r − n = m − (q + 1)n < r, contradicting the minimal property of r. Therefore, m = qn + r has the required form. If also m = q′n + r′, q′ ∈ Z, and r′ ∈ {0, ..., n − 1}, then |q − q′|n = |r − r′| < n, hence q′ = q and r′ = r.

Section 1.6

1. $x = c - \dfrac{d \cdot e - (b \cdot c)(b \cdot d)}{1 - (a \cdot b)(b \cdot d)}\, a$, $\quad y = e - \dfrac{b \cdot c - (a \cdot b)(d \cdot e)}{1 - (a \cdot b)(b \cdot d)}\, d$.

2. (c) By the triangle inequality, $\|x\|_2 = \|x - y + y\|_2 \le \|x - y\|_2 + \|y\|_2$, hence $\|x\|_2 - \|y\|_2 \le \|x - y\|_2$. Similarly, $\|y\|_2 - \|x\|_2 \le \|x - y\|_2$.

3. By 1.6.3,
\[
\|x_1 + x_2 + \cdots + x_k\|_2^2 = \sum_{i,j=1}^{k} x_i \cdot x_j = \sum_{j=1}^{k} x_j \cdot x_j .
\]

7. The hypotheses imply that
\[
\sum_{j=1}^{n} x_j^2 = \sum_{j=1}^{n} y_j^2 = 1 \quad\text{and}\quad \sum_{j=1}^{n} (x_j + y_j)^2 = 4.
\]
It follows that $\sum_{j=1}^{n} x_j y_j = 1$ and $\sum_{j=1}^{n} (x_j - y_j)^2 = 0$. The same does not hold for $\|\cdot\|_\infty$ (take x = (−1, 1) and y = (1, 1)) or for $\|\cdot\|_1$ (take x = (1, 0) and y = (0, 1)).

Section 2.1

1. (a) $a_n = [a + b + (-1)^n (b - a)]/2$.

3. (b) If n ≥ 6, $|(2n^2 - n)/(n^2 + 3) - 2| = |n + 6|/(n^2 + 3) \le 2n/n^2 = 2/n$. Therefore, choose N ≥ max{6, 2/ε}. (e) $|(2 + 1/n)^3 - 8| = [(2 + 1/n)^2 + 2(2 + 1/n) + 4]/n \le 19/n$, so choose any integer N > 19/ε.


5. Let r = pq −1 , p, q ∈ Z, q > 0. For all n ≥ q, n!r ∈ Z, hence sin(n!rπ) = 0. 7. Let A = {x1 , . . . , xp } and Aj = {n : an = xj }. One of these sets, say A1 , must have infinitely many members. Since |x1 − a| ≤ |x1 − an | + |an − a| and an → a, letting n → +∞ through A1 shows that x1 = a. We may therefore choose ε > 0 so that I := (a − ε, a + ε) contains no xj for j ≥ 2. Let N ∈ N such that an ∈ I for all n ≥ N . For such n, an = a. 8. (a) bn = (3an + 2bn − 3an )/2 → (c − 3a)/2. √ 9. (a) 2. (d) b/2 a. (g) −kak−1 .

(k) 1/2.

11. Use $-r \le a_n - b_n \le r$ and 2.1.4.

14. (a) Suppose first that r > 1. Set $h_n = r^{1/n} - 1$. By the binomial theorem, $r = (1 + h_n)^n > nh_n$, hence, by the squeeze principle, $h_n \to 0$. If r < 1, consider 1/r.

17. $a_n < ra_{n-1} < r^2 a_{n-2} < \cdots < r^{n-1} a_1 \to 0$. For the example, take $a_n = 2^{1/n}$.

19. Choose N such that $a_n - a < \varepsilon$ for all n ≥ N. For such n, $0 \le \min\{a_1, \dots, a_n\} - a \le a_n - a < \varepsilon$. Therefore, $\min\{a_1, \dots, a_n\} \to a$. The converse is false: consider $a_n = 1 + (-1)^n$.

22. Suppose that c ≤ f(x) − x ≤ d for all x, so c + jx ≤ f(jx) ≤ d + jx. Summing and using Exercise 1.5.4,
\[
nc + xn(n+1)/2 \le \sum_{j=1}^{n} f(jx) \le nd + xn(n+1)/2,
\]
hence
\[
c/n + x(1 + 1/n)/2 \le (1/n^2) \sum_{j=1}^{n} f(jx) \le d/n + x(1 + 1/n)/2.
\]
Letting n → +∞, we obtain (a). Part (b) is proved similarly.

Section 2.2

1. Since
\[
\frac{a^{1/n}}{a^{1/(n+1)}} = a^{1/n(n+1)} < 1 < b^{1/n(n+1)} = \frac{b^{1/n}}{b^{1/(n+1)}},
\]
$a^{1/n}$ is increasing and $b^{1/n}$ is decreasing. Each tends to 1 by Exercise 2.1.14.


3. By results of Section 2.1, $a_n = a(1/n + nb)^{-1} \to 0$ and $na_n = a(1/n^2 + b)^{-1} \to ab^{-1}$. The condition $a_{n+1} < a_n$ is equivalent to $(n^2 + n)b > 1$, which holds eventually. The condition $(n+1)a_{n+1} > na_n$ is equivalent to the inequality $(n+1)^2 > n^2$.

7. Let
\[
f(x) = 1 + \frac{1}{2 + (1 + x)^{-1}} = \frac{3x + 4}{2x + 3}.
\]
Then f : [1, 2] → [1, 2], f is increasing and $f(a_m) = a_{m+2}$. Since $a_1, a_2 \in [1, 2]$, $a_n \in [1, 2]$ for all n. Since $a_1 = 1$, $a_2 = 3/2$, $a_3 = 7/5$ and $a_4 = 17/12$, the inequalities $a_{2n+2} < a_{2n}$ and $a_{2n+1} > a_{2n-1}$ hold for n = 1. Assume they hold for n = k. Then
\[
a_{2k+4} = f(a_{2k+2}) < f(a_{2k}) = a_{2k+2} \quad\text{and}\quad a_{2k+3} = f(a_{2k+1}) > f(a_{2k-1}) = a_{2k+1},
\]
hence the inequalities hold for n = k + 1. Since the sequences $\{a_{2n}\}$ and $\{a_{2n+1}\}$ are bounded and monotone, the monotone convergence theorem implies that $a_{2n} \to a$ and $a_{2n+1} \to b$ for some a, b ∈ R. Letting n → +∞ in $f(a_{2n}) = a_{2n+2}$ gives f(a) = a. Therefore, $a = \sqrt{2}$. Similarly, $b = \sqrt{2}$.

9. For x > 0, $x^2 + r \ge 2x\sqrt{r}$, hence $(x + r/x)/2 \ge \sqrt{r}$. Therefore, $a_n \ge \sqrt{r}$. For $x \ge \sqrt{r}$, $x^2 + r \le 2x^2$, hence $(x + r/x)/2 \le x$. Therefore, $a_n \ge a_{n+1}$. By the monotone convergence theorem, $a_n \to a$ for some $a \ge \sqrt{r}$. Letting n → +∞ in $a_n = (a_{n-1} + r/a_{n-1})/2$ yields a = (a + r/a)/2, which has positive solution $a = \sqrt{r}$.
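Both iterations can be explored numerically with exact rational arithmetic. The sketch below rests on two assumptions that are not stated in the solutions themselves: for Exercise 7 it takes the recursion to be $a_{n+1} = 1 + 1/(1 + a_n)$ with $a_1 = 1$, which reproduces the values 1, 3/2, 7/5, 17/12 quoted above and satisfies $f(a_m) = a_{m+2}$; for Exercise 9 it uses r = 2 and starting value 2, chosen only for illustration. The function names are hypothetical.

```python
from fractions import Fraction


def step(a):
    """One step of the assumed recursion a_{n+1} = 1 + 1/(1 + a_n)."""
    return 1 + 1 / (1 + a)


def f(x):
    """The map from Solution 7, f(x) = (3x + 4)/(2x + 3)."""
    return (3 * x + 4) / (2 * x + 3)


# Exercise 7: a[0] holds a_1, a[1] holds a_2, and so on.
a = [Fraction(1)]
for _ in range(9):
    a.append(step(a[-1]))


def babylonian(r, start, steps):
    """Terms of a_n = (a_{n-1} + r/a_{n-1})/2 from Solution 9."""
    seq = [start]
    for _ in range(steps):
        seq.append((seq[-1] + r / seq[-1]) / 2)
    return seq


# Exercise 9 with the illustrative choices r = 2, a_1 = 2.
b = babylonian(Fraction(2), Fraction(2), 5)

print([str(t) for t in a[:6]])
print(float(a[-1]), float(b[-1]))
```

In the first sequence the terms alternate around $\sqrt{2}$, which is why the solution treats the even and odd subsequences separately; the second sequence decreases to $\sqrt{2}$ from above.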

Section 2.3

1. (a) 0, ±3/8. (c) ±4, ±6, ±12, ±14.

3. (d) $a_n^{2/k} = \left(1 + \dfrac{1}{2n + k}\right)^{2n+k} \left(1 + \dfrac{1}{2n + k}\right)^{-k} \to e$.

5. If $\{a_n\}$ lies in the set $\{x_1, \dots, x_n\}$, then one of the sets $\{n : a_n = x_j\}$ must have infinitely many members and a subsequence may be constructed from these.

8. Given ε > 0, choose N so that $\sum_{n=N}^{\infty} |a_{n+k} - a_n| < \varepsilon$. For m > n ≥ N,
\[
|a_{mk} - a_{nk}| \le |a_{mk} - a_{(m-1)k}| + \cdots + |a_{(n+1)k} - a_{nk}| < \varepsilon.
\]
Therefore, $\{a_{nk}\}_{n=1}^{\infty}$ is Cauchy.


10. Clearly $a_n \to 0$ implies $b_n \to 0$. For the converse, suppose $a_n \not\to 0$. Choose ε > 0 and a subsequence such that $a_{n_k} \ge \varepsilon > 0$ for all k. Then
\[
\frac{1}{b_{n_k}} = \frac{1}{a_{n_k}^{q}} + \frac{1}{a_{n_k}^{q-p}} \le \frac{1}{\varepsilon^{q}} + \frac{1}{\varepsilon^{q-p}},
\]
hence $b_n \not\to 0$. If 0 < q < p, then the sufficiency is false: Take $a_n = n$, q = 1/2 and p = 1. Then $b_n = \sqrt{n}/(n + 1) \to 0$ but $a_n \to +\infty$.

Section 2.4

1. (a) lim inf = −5/3, lim sup = 5/3. (c) lim inf = −14, lim sup = 14. (h) lim inf = −∞, lim sup = +∞.

3. Follows from Exercise 1.4.6.

5. Follows from $\{a_{n_k} : k \ge n\} \subseteq \{a_k : k \ge n\}$.

7. $0 < b - \varepsilon < b_n < b + \varepsilon \Rightarrow a_n(b - \varepsilon) < a_n b_n < a_n(b + \varepsilon) \Rightarrow (b - \varepsilon) \limsup_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n b_n \le (b + \varepsilon) \limsup_{n\to\infty} a_n$. Now let ε → 0.

10. Choose r so that $\liminf_{n\to\infty} b_n > r > 0$. Then, given ε > 0, there exists N such that $a_n > a/2$ and $b_n > r$, and $c_n := (b_n - 3a_n)(b_n + 2a_n) = b_n^2 - a_n b_n - 6a_n^2 < \varepsilon$ for every n > N. Then $b_n - 3a_n = c_n/(b_n + 2a_n) < \varepsilon/(r + a)$, so $\limsup_{n\to\infty} b_n \le 3a$.

12. Suppose that $\liminf_n a_n^{1/n} < \liminf_n \dfrac{a_{n+1}}{a_n}$. Choose r strictly between these numbers and then choose N such that $a_n/a_{n-1} > r$ for all n > N. For such n,
\[
a_n > a_{n-1} r > a_{n-2} r^2 > \cdots > a_N r^{n-N},
\]
hence $\liminf_n a_n^{1/n} \ge \liminf_n \big(a_N^{1/n} r^{1 - N/n}\big) = r$, a contradiction. To evaluate $\lim_n n/(n!)^{1/n}$ take $a_n = n^n/n!$ and calculate
\[
\frac{a_{n+1}}{a_n} = \left(\frac{n+1}{n}\right)^{n} \to e.
\]
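The limit $n/(n!)^{1/n} \to e$ in Solution 12 converges slowly, which a quick numerical experiment makes visible. The sketch below is illustrative: it evaluates $\ln n!$ through the standard library function `math.lgamma` (using $\ln n! = \ln\Gamma(n+1)$) so that large factorials never have to be formed, and the function name `ratio` is an assumption of the example.

```python
from math import e, exp, lgamma, log


def ratio(n):
    """n / (n!)^(1/n), computed in logarithms to avoid overflow."""
    return exp(log(n) - lgamma(n + 1) / n)


for n in (10, 100, 10_000, 1_000_000):
    print(n, ratio(n), e - ratio(n))
```

The printed differences shrink toward 0 but only slowly, so the ratio test computation above is a far more convenient way to identify the limit than direct evaluation.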

Section 3.1

1. Let $x_1 < \cdots < x_n$ denote the points of E and let
\[
\delta = \frac{1}{2} \min\{x_j - x_i : 1 \le i < j \le n\}.
\]
Then for each j, $(x_j - \delta, x_j + \delta) \cap E = \{x_j\}$.


4. Let ε, M > 0. (b) The limit is 1. If |x − 1| < 1, then x > 0, hence
\[
\left| \frac{x + 3}{3x + 1} - 1 \right| = \frac{2|x - 1|}{3x + 1} < 2|x - 1|.
\]
Therefore, choose δ = min{1, ε/2}. (d) The limit is +∞: $x < -\sqrt{M} - 1 \Rightarrow -x > \sqrt{M}$ and $-x - 1 > \sqrt{M} \Rightarrow x^2 + x = (-x)(-x - 1) > M$.

6. (a) 2/3. (d) +∞. (g) 9/25.

7. (b) −1/2. (f) $\dfrac{\sqrt{b + x} - \sqrt{b - x}}{\sqrt{c + x} - \sqrt{c - x}} = \dfrac{\sqrt{c + x} + \sqrt{c - x}}{\sqrt{b + x} + \sqrt{b - x}} \to \sqrt{\dfrac{c}{b}}$. (h) $(a\sqrt{d})/(c\sqrt{b})$.

9. The limit exists at a iff lim{x→a, x∈Q} f (x) = lim{x→a, x∈I} f (x). By continuity of polynomials, this is equivalent to 4a2 + 2a − 11 = 3a2 + a − 5. Thus a = −3, 2. √ √ 11. (a) a. (e) (c a)/(2 b). 13. Proof for the case f increasing and L := limn f (an ) ∈ R: Given ε > 0, choose N such that L − ε < f (an ) < L + ε for all n ≥ N . Let x > aN and let n be the least integer > N such that x < an . Then an−1 ≤ x < an so L − ε < f (an−1 ) ≤ f (x) ≤ f (an ) < L + ε.

Section 3.2

1. (a) −1, 1. (c) −2/3, 2/3. (e) −1, 1. (i) −3, 1.

3. lim sup case: Assume a ∈ R. Set L = lim sup{x→a, x∈E} f (x) and Lj = lim sup{x→a, x∈Ej } f (x), j = 1, 2. By 3.2.1, there exists a sequence an ∈ E1 such that f (an ) → L1 . Since an ∈ E, by the same theorem, L1 ≤ L. Similarly, L2 ≤ L. Now let bn ∈ E such that f (bn ) → L. Then one of the sets, say E1 , contains infinitely many terms of the sequence. Therefore, L ≤ L1 , hence L = max{L1 , L2 }. 4. Let g(x) = 1/f (x). Then g(r) = 1/f (r).

Section 3.3 1. By continuity, f (2) = limx→2− (mx + 3) = limx→2+ (3x2 + 7), that is, 2m + 3 = 19. Therefore, m = 8.


4. This follows from lim{x→a, x∈Q} d(x)g(x) = g(a) and lim{x→a, x∈I} d(x)g(x) = 0. 8. The identity implies that f (nx) = nf (x), n ∈ N. Also, f (0) + f (0) = f (0) so f (0) = 0. Since f (−x) + f (x) = f (0), we see that f (−x) = −f (x), hence f (nx) = nf (x) for all n ∈ Z. Let m, n ∈ N. Then f (x) = f (nx/n) = nf (x/n). Replacing x by xm gives mf (x) = f (mx) = nf (mx/n). Thus, f (tx) = tf (x) for all x ∈ R and t ∈ Q. Since f is continuous at zero and f (x − y) = f (x) + f (−y) = f (x) − f (y), f is continuous on R. Thus, f (tx) = tf (x) for all x, t ∈ R. Setting x = 1 gives the desired result. P 9. (c) Let a ∈ R and ε > 0. Choose N so that n>N 2−n < ε and then choose δ > 0 so that (a, a+δ) contains none of the numbers c1 , c2 , . . . , cN . If a < x < a + δ, then X X 0 ≤ f (x) − f (a) = 2−n ≤ 2−n < ε. n:aN

Therefore, f is right continuous at a. If a 6∈ {cn }, then we may choose δ so that (a − δ, a] contains none of the numbers c1 , c2 , . . . , cN . If a − δ < x < a, then, as before, X 0 ≤ f (a) − f (x) = 2−n < ε. n:x 0. We show that the set Dε := {x ∈ [0, 1] : |f (x) − g(x)| ≥ ε} is finite. The desired

conclusion will follow on observing that the set of discontinuities of f is precisely $\bigcup_{n=1}^{\infty} D_{1/n}$. Suppose $D_\varepsilon$ is infinite. Then there exists a sequence of distinct terms such that $|f(x_n) - g(x_n)| \ge \varepsilon$ for all n. By the Bolzano–Weierstrass theorem, $\{x_n\}$ has a convergent subsequence, say $x_{n_k} \to x$. Because the terms of $\{x_n\}$ are distinct, $x_{n_k} \ne x$ for all large k, hence $f(x_{n_k}) \to g(x)$. Also, by continuity, $g(x_{n_k}) \to g(x)$. But this contradicts the inequality $|f(x_{n_k}) - g(x_{n_k})| \ge \varepsilon$.

Section 3.4 2. Let x0 > 0 and choose r > x0 such that f (x) < f (x0 ) for all x with |x| > r. Then the maximum M of f on [−r, r] is ≥ f (x0 ), hence M must be the maximum of f on R. 4. (e) Suppose f is not upper semicontinuous at x0 . Choose r such that f (x0 ) < r < lim supx→x0 f (x) and then i such that fi (x0 ) < r. For each δ > 0, r < sup0 0, choose δ > 0 such that |f (x) − f (y)| < ε for all x, y ∈ E with |x − y| < δ. Then choose N such that |an − am | < δ for all m, n ≥ N . For such m, n, |f (an ) − f (am )| < ε. 9. The inequality |x| − |y| ≤ |x − y| shows that |x| is uniformly continuous. The given functions are therefore compositions of uniformly continuous functions. 13. If 0 < p ≤ 1, then sin x = 0, and x→+∞ xp lim

lim+

x→0

sin x sin x = lim+ x1−p = 0 or 1. p x→0 x x

Therefore, (sin x)/xp has a uniformly continuous extension to [0, +∞). If p > 1, (sin x)/xp is continuous on (0, +∞) but has no continuous extension to [0, +∞). 15. Since f may be extended continuously to [a, b], it is bounded. The examples f (x) = x on (0, +∞) and f (x) = 1/x on (0, 1) show that the assumptions cannot be relaxed. 18. f (x) has unequal one-sided limits at 0 while those of g(x) are equal. Hence 0 is a removable discontinuity of g but not of f .

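Section 4.1

1. If f denotes the given function, then

(b) $\dfrac{f(x + h) - f(x)}{h} = \dfrac{2}{\sqrt{2x + 2h + 1} + \sqrt{2x + 1}} \to \dfrac{1}{\sqrt{2x + 1}}$.

(d) $\dfrac{f(x + h) - f(x)}{h} = \dfrac{-3}{\sqrt{3x + 3h + 2}\,\sqrt{3x + 2}\,\big(\sqrt{3x + 2} + \sqrt{3x + 3h + 2}\big)} \to \dfrac{-3}{2(3x + 2)^{3/2}}$.

3. (a) $\dfrac{3(5x + 7)^{1/3}}{5(3x + 2)^{4/5}} + \dfrac{5(3x + 2)^{1/5}}{3(5x + 7)^{2/3}}$. (c) $\dfrac{4x}{(x^2 + 1)^2} \cos\left(\dfrac{x^2 - 1}{x^2 + 1}\right)$.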

4. (b) −

1 y − y cos xy 2 2x


7. f is continuous at 1 iff 2a + b = 1. For such a and b, f is differentiable iff a + b = 3. Therefore, a = −2 and b = 5.

11. (b) The difference quotient is
\[
\frac{f(a + h^2) - f(a)}{h^2}\, h + \frac{f(a - h) - f(a)}{-h} \to f'(a).
\]

14. For all h ≠ 0, [f(x + h) − f(x)]/h ≥ 0, hence f′(x) ≥ 0.

16. Clear for n = 2. Suppose the assertion holds for n ≥ 2. Then
\[
\begin{aligned}
D^{n+1}(fg) &= D \sum_{k=0}^{n} \binom{n}{k} (D^k f)(D^{n-k} g)\\
&= \sum_{k=0}^{n} \binom{n}{k} \big[ (D^{k+1} f)(D^{n-k} g) + (D^k f)(D^{n+1-k} g) \big]\\
&= \sum_{k=1}^{n} \left[ \binom{n}{k-1} + \binom{n}{k} \right] (D^k f)(D^{n+1-k} g) + gD^{n+1} f + fD^{n+1} g\\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} (D^k f)(D^{n+1-k} g).
\end{aligned}
\]

18. (c) $(f' \circ g)g'' + (f'' \circ g)(g')^2$.

19. (a) $\dfrac{(-1)^n n!}{x^{n+1}}$.

21. If x ≠ 0, $f'(x) = x^{m+n-1} \left[ n \cos x^n + m\, \dfrac{\sin x^n}{x^n} \right]$. Also,
\[
f'(0) = \lim_{x \to 0} x^{n+m-1}\, \frac{\sin x^n}{x^n}
\begin{cases}
\text{does not exist} & \text{if } n + m < 1,\\
= 0 & \text{if } n + m > 1,\\
= 1 & \text{if } n + m = 1.
\end{cases}
\]
Therefore, f′ is continuous at 0 if n + m ≥ 1.

23. For the second order determinant use the expansion
\[
\begin{vmatrix} f_1 & f_2\\ g_1 & g_2 \end{vmatrix} = f_1 g_2 - f_2 g_1 .
\]
For the third order determinant, expand along a row or column and use the formula for the second order case. The same idea may be applied to nth order determinants.


Section 4.2

1. Set $f(x) = \cos x - \sqrt{x} + 1$. Since f(0) > 0 > f(π/2), f has at least one zero in (0, π/2), by the intermediate value theorem. Since f′ < 0 on (0, π/2), f is strictly decreasing so the zero is unique.

3. Since f′(x) = 4x(x − 1)(x − 2) < 0 on (1, 2), f has exactly one zero in the interval (1, 2) iff f(1) (= 1 + c) and f(2) (= c) have opposite signs, that is, iff c < 0 < c + 1, or −1 < c < 0.

7. The assertion is clear if n = 0. Suppose it holds for all polynomials with degree ≤ n. Let P(x) have degree n + 1 and suppose that the equation sin(ax) = P(x) has more than n + 2 solutions. Then f(x) := sin(ax) − P(x) has more than n + 2 zeros, hence, by Rolle’s theorem, $f''(x) = -a^2 \sin(ax) - P''(x)$ has more than n zeros. But this means that $\sin(ax) = -P''(x)/a^2$ has more than n solutions, contradicting the induction hypothesis.

9. By the Cauchy mean value theorem, |f(x) − f(y)| |g′(c)| = |g(x) − g(y)| |f′(c)| ≤ |g(x) − g(y)| |g′(c)|.

11. The derivative of $x^{-1} \sin x$ is negative since tan x > x, 0 < x < π/2.

17. Let $c_1 < \cdots < c_m$ be the distinct zeros of P′. By the intermediate value theorem, P′ has a constant sign on $(c_j, c_{j+1})$. Therefore, P(x) is strictly monotone on these intervals.

19. Let |f′| ≤ c < r. Then g′(x) = r + f′(x) ≥ r − c > 0, so g is strictly increasing, hence one-to-one. By the mean value theorem, |f(x) − f(0)| ≤ c|x| or f(0) − c|x| ≤ f(x) ≤ f(0) + c|x|. Therefore, f(0) + rx − c|x| ≤ g(x) ≤ f(0) + rx + c|x|. Thus x > 0 ⇒ g(x) ≥ f(0) + (r − c)x ⇒ $\lim_{x\to+\infty} g(x) = +\infty$, and x < 0 ⇒ g(x) ≤ f(0) + (r − c)x ⇒ $\lim_{x\to-\infty} g(x) = -\infty$. By the intermediate value theorem, g(R) = R.

22. g′(0) = 0, hence f′(0) > 0. Since f(±1/nπ) = ±1/nπ for all n ∈ N, f is not monotone on any neighborhood of 0.

25. Let a, b ∈ I with a < b and suppose that $f'(a) < y_0 < f'(b)$, so g′(a) < 0 < g′(b). Then [g(x) − g(a)]/(x − a) < 0 for x ∈ (a, a + δ), so the minimum of g cannot occur at a. Similarly, the minimum of g cannot occur at b. Thus, by the local extremum theorem, $g'(x_0) = 0$, that is, $f'(x_0) = y_0$, for some $x_0 \in (a, b)$.


28. Set q(x, y) = [f(x) − f(y)]/(x − y), x ≠ y. If f is uniformly differentiable on I, then |f′(x) − f′(y)| ≤ |f′(x) − q(x, y)| + |f′(y) − q(x, y)| shows that f′ is uniformly continuous. Conversely, assume that f′ is uniformly continuous. By the mean value theorem, for each x < y there exists a z ∈ (x, y) such that |q(x, y) − f′(y)| = |f′(z) − f′(y)|. It follows that f is uniformly differentiable.

31. If such a function exists, then
\[
\lim_{x \to y} \frac{f(x) - f(y)}{x - y} = \varphi(y, y)
\]
so f′(y) = ϕ(y, y), which is continuous in y. Conversely, assume f is continuously differentiable on an open interval I and define
\[
\varphi(x, y) = \begin{cases} \dfrac{f(x) - f(y)}{x - y} & \text{if } x \ne y,\\[1ex] f'(x) & \text{if } x = y. \end{cases}
\]
Clearly ϕ is continuous on {(x, y) ∈ I × I : x ≠ y}. By the mean value theorem, $\varphi(x, y) = f'(\xi_{xy})$ for x ≠ y, where $\xi_{xy}$ is between x and y. The continuity of ϕ on I × I now follows from the continuity of f′.

Section 4.4

1. (b) (2 − 3x)/(2x − 3), x ≠ 3/2. (f) $\cos^{-1}\left(\dfrac{3x - 2}{1 - x}\right)$, 1/2 < x < 3/4.

5. (b) Fix y > 0 and let f(x) = ln(xy) − ln x − ln y. Since f′(x) = 0, f(x) = f(1) = 0 for all x > 0.

7. (b) $a^{x+y} = \exp((x + y) \ln a) = \exp(x \ln a) \exp(y \ln a) = a^x a^y$.

9. $x^a = \exp(a \ln x)$, hence $(x^a)' = \exp(a \ln x)(a/x) = ax^{a-1}$.

13. The derivative of the left side of (c) is
\[
\frac{2}{x^2 + 1} - \frac{4x}{(x^2 + 1)^2 \sqrt{1 - y^2}}, \qquad y := \frac{x^2 - 1}{x^2 + 1},
\]
which reduces to 0. Therefore, the left side is constant.


14. Set c = f′(0). Since
\[
\frac{f(x + h) - f(x)}{h} = f(x)\, \frac{f(h) - 1}{h},
\]
f′(x) exists and equals cf(x). Therefore, $e^{-cx} f(x)$ has zero derivative, hence $e^{-cx} f(x) = f(0)$.

18. $(f^{-1})''(x) = -\dfrac{f''\big(f^{-1}(x)\big)}{\big[f'\big(f^{-1}(x)\big)\big]^3}$.

Section 4.5

1. (a) p − q. (d) −1. (g) −2. (j) 0. (m) −∞. (p) 0. (s) 1 if p > 1, +∞ if p ≤ 1. (v) 1.

2. (c) $f(0) = \lim_{x\to 0^+} f(x) = 5/3$.

3. (a) $\ln a_n = n^{-1} \ln \sin(1/n)$ is of the form $\frac{-\infty}{+\infty}$, hence has the same limit as
\[
\frac{\cos(1/n)}{\sin(1/n)} \cdot \frac{-1}{n^2} = -\frac{1}{n} \cdot \frac{\cos(1/n)}{n \sin(1/n)} \to 0.
\]
Therefore, $a_n \to 1$.

6. By logarithmic differentiation,
\[
f'(x) = \left( 1 + \frac{1}{x} - \frac{\ln x}{x} \right) x^{1/x}.
\]
By l’Hospital’s rule, $x^{1/x} \to 1$, hence $\lim_{x\to+\infty} f'(x) = 1$. Applying the mean value theorem to f on each of the intervals [n, n + 1] shows that f(n + 1) − f(n) → 1.

9. Let $L := \lim_{x\to+\infty} f'(x)/g'(x)$. By l’Hospital’s rule, $\lim_{x\to+\infty} g(x)/f(x)$ exists and equals 1/L. Another application of l’Hospital’s rule yields
\[
\lim_{x\to+\infty} \frac{\ln f(x)}{\ln g(x)} = \lim_{x\to+\infty} \frac{f'(x)}{g'(x)} \cdot \frac{g(x)}{f(x)} = 1.
\]

10. (a) By l’Hospital’s rule, the quotient has the same limit as
\[
\frac{\alpha\beta}{2} \cdot \frac{f'(a + \alpha h) - f'(a + \beta h)}{h} = \frac{\alpha\beta}{2} \left[ \alpha\, \frac{f'(a + \alpha h) - f'(a)}{\alpha h} - \beta\, \frac{f'(a + \beta h) - f'(a)}{\beta h} \right],
\]
which is αβ(α − β)f″(a)/2.


12. Apply l’Hospital’s rule n times to $f(x)/x^{-n}$ to obtain
\[
\lim_{x\to 0^+} x^n f(x) = \lim_{x\to 0^+} \frac{(-1)^n f^{(n)}(x)}{n(n+1) \cdots (2n-1)\, x^{-2n}} = \lim_{x\to 0^+} a x^{2n} f^{(n)}(x),
\]
where $a = \dfrac{(-1)^n (n-1)!}{(2n-1)!}$. Therefore, $\lim_{x\to 0^+} x^n f(x)$ exists and equals aL.

16. By l’Hospital’s rule,
\[
\lim_{x\to+\infty} \frac{f(g(x))}{g(x)} = \lim_{x\to+\infty} \frac{f'(g(x))\, g'(x)}{g'(x)} = L.
\]
For examples, take $f(x) = \sqrt{x}$, ln x, or x + 1/x, and $g(x) = x^n$, $e^x$, or ln x.

18. By l’Hospital’s rule,
\[
\lim_{x\to+\infty} f(x) = \lim_{x\to+\infty} \frac{x f(x)}{x} = \lim_{x\to+\infty} \big[ x f'(x) + f(x) \big] = \lim_{x\to+\infty} x f'(x) + \lim_{x\to+\infty} f(x).
\]
For the second part, consider f(x) = ln x.

Section 4.6

2. Apply Taylor’s theorem to the function between the inequalities to produce the number $c \in (0, x)$ in the remainder term:

(b) $f^{(k)}(x) = (-1)^{k}e^{-x}$, hence
\[ e^{-x} = \sum_{k=0}^{2n-1}\frac{(-1)^{k}}{k!}x^{k} + \frac{e^{-c}}{(2n)!}x^{2n}, \quad\text{where } e^{-c}\in(0,1). \]

3. Let
\[ I_n := \frac{1}{n!}\int_a^x (x-t)^{n}f^{(n+1)}(t)\,dt = -\frac{f^{(n)}(a)}{n!}(x-a)^{n} + I_{n-1}. \]
Iterating,
\[ I_n = -\sum_{k=1}^{n}\frac{f^{(k)}(a)}{k!}(x-a)^{k} + I_0 = -T_n(x,a) + f(x). \]

5. By Taylor’s theorem,
\[ b_k = \frac{P^{(k)}(b)}{k!} = \frac{1}{k!}\sum_{j=0}^{n-k}(j+1)(j+2)\cdots(j+k)\,(b-a)^{j}a_{k+j}. \]

Section 4.7

1. (a) $-1.52137970$. (d) $-1.42360584$. (g) $1.220924381$.

2. (a) $0.87672621$. (c) $1.55714559$.

4. $7.937253933$.


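Section 5.1

3. Since $M_j(-f) = -m_j(f)$, $\overline{S}(-f,\mathcal{P}) = -\underline{S}(f,\mathcal{P})$, hence
\[ \overline{\int_a^b}(-f) = \inf_{\mathcal{P}}\overline{S}(-f,\mathcal{P}) = \inf_{\mathcal{P}}\big(-\underline{S}(f,\mathcal{P})\big) = -\sup_{\mathcal{P}}\underline{S}(f,\mathcal{P}) = -\underline{\int_a^b} f. \]
Replacing $f$ by $-f$ shows that $\underline{\int_a^b}(-f) = -\overline{\int_a^b} f$.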

5. Since $g$ may be obtained from $f$ by changing one point at a time, we may assume that $f = g$ except at a single point $c \in (a, b)$. Let $\varepsilon > 0$ and let $M$ be a bound for both $|f|$ and $|g|$. The point $c$ is in at most two intervals of any partition $\mathcal{P}$, and each of these has width $\le \|\mathcal{P}\|$. Since $f = g$ on the remaining intervals, $|\underline{S}(f,\mathcal{P}) - \underline{S}(g,\mathcal{P})| \le 2M\|\mathcal{P}\|$. It follows from 5.1.15 that $\underline{\int_a^b} f = \underline{\int_a^b} g$. Similarly, $\overline{\int_a^b} f = \overline{\int_a^b} g$.

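6. (c) Let $g = \sin f$, $\varepsilon > 0$, and let $\mathcal{P}$ be any partition of $[a, b]$ such that $\overline{S}(f,\mathcal{P}) - \underline{S}(f,\mathcal{P}) < \varepsilon$. For fixed $j$, choose sequences $a_n, b_n \in [x_{j-1}, x_j]$ such that $g(a_n) \to M_j(g)$ and $g(b_n) \to m_j(g)$. Then $g(a_n) - g(b_n) \le |f(a_n) - f(b_n)| \le M_j(f) - m_j(f)$, hence $M_j(g) - m_j(g) \le M_j(f) - m_j(f)$. Therefore, $\overline{S}(g,\mathcal{P}) - \underline{S}(g,\mathcal{P}) \le \overline{S}(f,\mathcal{P}) - \underline{S}(f,\mathcal{P}) < \varepsilon$.

7. (a) Let $L = \lim_{\mathcal{P}} F(\mathcal{P})$ and $M = \lim_{\mathcal{P}} G(\mathcal{P})$. Given $\varepsilon > 0$, choose $\mathcal{P}'_\varepsilon$ and $\mathcal{P}''_\varepsilon$ such that $|F(\mathcal{P}) - L| < \eta$ for all partitions $\mathcal{P}$ refining $\mathcal{P}'_\varepsilon$ and $|G(\mathcal{P}) - M| < \eta$ for all partitions $\mathcal{P}$ refining $\mathcal{P}''_\varepsilon$, where $\eta = \varepsilon/(2|\alpha| + 2|\beta| + 2)$. Let $\mathcal{P}_\varepsilon$ denote the common refinement of $\mathcal{P}'_\varepsilon$ and $\mathcal{P}''_\varepsilon$. Then both inequalities hold for any partition $\mathcal{P}$ refining $\mathcal{P}_\varepsilon$, hence
\[ |(\alpha F(\mathcal{P}) + \beta G(\mathcal{P})) - (\alpha L + \beta M)| \le |\alpha||F(\mathcal{P}) - L| + |\beta||G(\mathcal{P}) - M| < \varepsilon. \]

(b) Given $\varepsilon > 0$, choose $\mathcal{P}_\varepsilon$ such that $\underline{\int_a^b} f - \varepsilon < \underline{S}(f,\mathcal{P}_\varepsilon) \le \underline{\int_a^b} f$. The inequality still holds if $\mathcal{P}_\varepsilon$ is replaced by a refinement. Therefore, $\underline{\int_a^b} f = \lim_{\mathcal{P}} \underline{S}(f,\mathcal{P})$.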

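Section 5.2

1. Assume $c_n \to c \in (a, b)$. Choose $\delta > 0$ so that $a < c - \delta < c + \delta < b$ and choose $N$ so that $c_n \in (c - \delta, c + \delta)$ for all $n > N$. Since $f$ has only finitely many discontinuities on $[a, c - \delta] \cup [c + \delta, b]$, $f$ is integrable on these intervals and the integrals are zero. Thus, given $\varepsilon > 0$, there exist partitions $\mathcal{P}_1$ of $[a, c - \delta]$ and $\mathcal{P}_2$ of $[c + \delta, b]$ such that $\overline{S}(f,\mathcal{P}_1) - \underline{S}(f,\mathcal{P}_1) < \varepsilon/3$ and $\overline{S}(f,\mathcal{P}_2) - \underline{S}(f,\mathcal{P}_2) < \varepsilon/3$. Define a partition $\mathcal{P}$ on $[a, b]$ by $\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_2$ and let $|f| \le M$ on $[a, b]$. If $\delta < \varepsilon/6M$, then
\[ \overline{S}(f,\mathcal{P}) - \underline{S}(f,\mathcal{P}) \le \overline{S}(f,\mathcal{P}_1) - \underline{S}(f,\mathcal{P}_1) + \overline{S}(f,\mathcal{P}_2) - \underline{S}(f,\mathcal{P}_2) + 2M\delta < \varepsilon. \]
Therefore, $f \in \mathcal{R}_a^b$. Moreover,
\[ \int_a^b f = \int_a^{c-\delta} f + \int_{c-\delta}^{c+\delta} f + \int_{c+\delta}^{b} f = \int_{c-\delta}^{c+\delta} f, \]
hence
\[ \left|\int_a^b f\right| \le \int_{c-\delta}^{c+\delta} |f| \le 2M\delta. \]
Since $\delta$ may be made arbitrarily small, $\int_a^b f = 0$.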

5. Set $M_n = \max\{f_1, \dots, f_n\}$. Then $M_2 = \big(f_1 + f_2 + |f_1 - f_2|\big)/2 \in \mathcal{R}_a^b$. Since $M_n = \max\{M_{n-1}, f_n\}$, the general result follows by induction. A similar argument holds for min.

6. Choose $x_0$ such that $f(x_0) = \sup_{a\le x\le b} f(x)$. Then $\int_a^b f \le f(x_0)(b - a) < M(b - a)$.

9. Let $|f| \le M$ on $[a, b]$. Then $|F(x, y) - F(x, y_0)| \le M|y - y_0|$, hence $\lim_{y\to y_0} F(x, y) = F(x, y_0)$.

12. (a) By the approximation property, choose $x_0$ such that $|f(x_0)| > M - \varepsilon$. By continuity, we may take $x_0 \in (a, b)$ and we may choose $\delta > 0$ such that $|f(x)| > M - \varepsilon$ for all $x \in (x_0 - \delta, x_0 + \delta)$. Then
\[ M(b - a) \ge \int_a^b |f| \ge \int_{x_0-\delta/2}^{x_0+\delta/2} |f| \ge \delta(M - \varepsilon). \]

(b) By (a), $|f(x)|^{p} > (M - \varepsilon)^{p}$ on $(x_0 - \delta, x_0 + \delta)$, hence, as in (a),
\[ \delta(M - \varepsilon)^{p} \le \int_a^b |f|^{p} \le M^{p}(b - a). \]
Therefore,
\[ \delta^{1/p}(M - \varepsilon) \le \left(\int_a^b |f|^{p}\right)^{1/p} \le M(b - a)^{1/p}, \]
hence
\[ M - \varepsilon \le \liminf_{p\to+\infty}\left(\int_a^b |f|^{p}\right)^{1/p} \le \limsup_{p\to+\infty}\left(\int_a^b |f|^{p}\right)^{1/p} \le M. \]
Since $\varepsilon$ was arbitrary,
\[ \liminf_{p\to+\infty}\left(\int_a^b |f|^{p}\right)^{1/p} = \limsup_{p\to+\infty}\left(\int_a^b |f|^{p}\right)^{1/p} = M. \]
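The limit in Exercise 12(b) can be observed numerically. The following is a minimal sketch, not part of the text: the test function $f(x) = x(1-x)$ on $[0,1]$ (so $M = 1/4$), the midpoint rule, and the name `lp_norm` are all assumptions chosen for illustration.

```python
def lp_norm(f, a, b, p, n=20000):
    """Approximate (integral_a^b |f|^p)^(1/p) with the midpoint rule (illustrative helper)."""
    h = (b - a) / n
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1.0 / p)

# Assumed example: f(x) = x(1 - x) on [0, 1], whose supremum is M = 1/4.
f = lambda x: x * (1.0 - x)
norms = [lp_norm(f, 0.0, 1.0, p) for p in (1, 10, 50, 200)]
for p, value in zip((1, 10, 50, 200), norms):
    print(p, value)
```

The printed values increase with $p$ and stay below $M(b-a)^{1/p} = 1/4$, in line with the two-sided estimate in (b).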

a

Section 5.3

1. By a change of variables and periodicity,
\[ \int_0^p f(x + y)\,dx = \int_y^{p+y} f(x)\,dx = \int_y^{p} f(x)\,dx + \int_p^{p+y} f(x)\,dx = \int_y^{p} f(x)\,dx + \int_p^{p+y} f(x - p)\,dx = \int_y^{p} f(x)\,dx + \int_0^{y} f(x)\,dx = \int_0^{p} f(x)\,dx. \]

3. (a) On $[0, 1]$, $2x/\pi \le \sin x \le x$. Since $\displaystyle\int_0^1 \frac{x}{\sqrt{x^{2}+1}}\,dx = \sqrt{2} - 1$, the inequalities follow.

5. (a) Substituting $y = x^{1/n}$ and integrating by parts $n - 1$ times yields
\[ \int_0^1 \exp\big(x^{1/n}\big)\,dx = n\int_0^1 y^{n-1}e^{y}\,dy = F(1) - F(0), \]
where
\[ F(y) = (-1)^{n+1}\,n!\,e^{y}\sum_{j=0}^{n-1}\frac{(-1)^{j}}{j!}\,y^{j}. \]

7. Let $I$ denote the integral. Successive integration by parts yields
\[ I = \frac{(k-1)(k-3)\cdots(k-2j+1)}{1\cdot 3\cdots(2j-1)}\,I_j, \qquad I_j := \int_0^1 x^{k-2j}(1 - x^{2})^{j-1/2}\,dx. \]
If $k$ is odd, take $j = (k - 1)/2$ so
\[ I = \frac{(k-1)(k-3)\cdots 4\cdot 2}{1\cdot 3\cdots(k-2)}\,I_j, \qquad I_j = \int_0^1 x(1 - x^{2})^{(k-2)/2}\,dx = k^{-1}. \]
If $k$ is even, take $j = k/2$ so
\[ I = \frac{(k-1)(k-3)\cdots 3\cdot 1}{3\cdot 5\cdots(k-1)}\,I_j = I_j = \int_0^1 (1 - x^{2})^{(k-1)/2}\,dx. \]
By trig. substitution and Exercise 6,
\[ I_j = \int_0^{\pi/2}\cos^{k}\theta\,d\theta = \frac{\pi}{2}\,\frac{(k-1)(k-3)\cdots 3\cdot 1}{k(k-2)\cdots 4\cdot 2}. \]

9. Substituting $s = f(t)$ and integrating by parts yields
\[ \int_0^{y} f^{-1}(s)\,ds = \int_0^{f^{-1}(y)} t f'(t)\,dt = y f^{-1}(y) - \int_0^{f^{-1}(y)} f, \]
hence
\[ \int_0^{x} f + \int_0^{y} f^{-1} = y f^{-1}(y) + \int_0^{x} f - \int_0^{f^{-1}(y)} f = y f^{-1}(y) + \int_{f^{-1}(y)}^{x} f. \]
If $f^{-1}(y) \le x$, then $f(t) \ge y$ for all $t \in [f^{-1}(y), x]$, hence
\[ \int_{f^{-1}(y)}^{x} f(t)\,dt + y f^{-1}(y) \ge \int_{f^{-1}(y)}^{x} y\,dt + y f^{-1}(y) = xy. \]
On the other hand, if $f^{-1}(y) \ge x$, then $f(t) \le y$ for all $t \in [x, f^{-1}(y)]$, hence
\[ \int_{f^{-1}(y)}^{x} f(t)\,dt + y f^{-1}(y) \ge -\int_{x}^{f^{-1}(y)} y\,dt + y f^{-1}(y) = xy. \]
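Young's inequality from Exercise 9 can be spot-checked on a grid. The sketch below is illustrative only: it assumes the particular choice $f(t) = \ln(t + 1)$ on $0 \le x \le 1$, $0 \le y \le \ln 2$ (so $f^{-1}(s) = e^{s} - 1$), uses the elementary antiderivatives of $f$ and $f^{-1}$, and the name `young_gap` is hypothetical.

```python
import math

def young_gap(x, y):
    """Integral of f over [0, x] plus integral of f^{-1} over [0, y], minus x*y,
    for the assumed choice f(t) = ln(1 + t)."""
    int_f = (x + 1.0) * math.log(x + 1.0) - x   # integral of ln(1 + t) over [0, x]
    int_f_inv = math.exp(y) - y - 1.0           # integral of e^s - 1 over [0, y]
    return int_f + int_f_inv - x * y

grid = [i / 20.0 for i in range(21)]
gaps = [young_gap(x, t * math.log(2.0)) for x in grid for t in grid]
print(min(gaps))                          # nonnegative up to rounding
print(young_gap(1.0, math.log(2.0)))      # equality case y = f(x), zero up to rounding
```

The gap vanishes exactly when $y = f(x)$, which matches the two cases treated in the solution.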

10. (b) Take $f(t) = \ln(t + 1)$, $0 \le x \le 1$, and $0 \le y \le \ln 2$ in Young’s inequality to obtain
\[ (x + 1)\ln(x + 1) - x + e^{y} - y - 1 = \int_0^{x}\ln(t + 1)\,dt + \int_0^{y}(e^{s} - 1)\,ds \ge xy. \]
Replace $x + 1$ by $x$, $1 \le x \le 2$.

13. Integrate by parts to obtain
\[ \int_a^b f(x)\sin(nx)\,dx = \frac{1}{n}\,f(x)\cos(nx)\Big|_{b}^{a} + \frac{1}{n}\int_a^b f'(x)\cos(nx)\,dx. \]

17. If $F$ is a primitive of $f$, then $\displaystyle\int_{u(x)}^{v(x)} f = F\big(v(x)\big) - F\big(u(x)\big)$. Now use the chain rule.


19. By l’Hospital’s rule,
\[ \lim_{x\to a}\frac{g(x)}{x - a}\int_a^x f = \lim_{x\to a}\Big[g(x)f(x) + g'(x)\int_a^x f\Big] = g(a)f(a). \]

20. (a) $s_n$ is a Riemann sum for $\int_0^1 x^{p}\,dx$, hence $\lim_{n\to+\infty} s_n = 1/(p + 1)$.

21. By the mean value theorem, $|f(x) - f(x_{k-1})| \le M|x - x_{k-1}| \le M(x_k - x_{k-1}) = Mh$, $x \in [x_{k-1}, x_k]$, hence
\[ \left|\int_a^b f - \sum_{k=1}^{n} f(x_{k-1})h\right| = \left|\sum_{k=1}^{n}\int_{x_{k-1}}^{x_k}\big[f(x) - f(x_{k-1})\big]\,dx\right| \le Mnh^{2}. \]
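The error estimate of Exercise 21 is easy to test numerically. A minimal sketch follows, assuming the example $f = \sin$ on $[0, \pi/2]$ (so $|f'| \le M = 1$ and the exact integral is $1$); the helper names are hypothetical.

```python
import math

def left_riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum with n equal subintervals of width h = (b - a)/n."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

def error_bound(M, a, b, n):
    """The bound M * n * h^2 from the solution, where M bounds |f'|."""
    h = (b - a) / n
    return M * n * h * h

# Assumed example: f = sin on [0, pi/2], exact integral 1, M = 1.
for n in (10, 100, 1000):
    err = abs(1.0 - left_riemann_sum(math.sin, 0.0, math.pi / 2, n))
    print(n, err, error_bound(1.0, 0.0, math.pi / 2, n))
```

Since $nh = b - a$, the bound is $M(b-a)h$, so the error decays at least linearly in $h$.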

Section 5.5

3. The substitution $t = \sin x$ yields
\[ \int_{\pi/6}^{\pi/3} f(\sin x)\,dx = \int_{1/2}^{\sqrt{3}/2}\frac{f(t)}{\sqrt{1 - t^{2}}}\,dt. \]
Now apply 5.5.3 with $g(t) = (1 - t^{2})^{-1/2}$.

7. $G(b) \le \int_a^b fg \le G(a)$. Now apply the intermediate value theorem to $G$.

9. Apply 5.5.3 to obtain $c \in [0, 1]$ such that
\[ \int_0^{\pi} g(x)\sin x\,dx = g(0)\int_0^{c}\sin x\,dx + g(1)\int_c^{\pi}\sin x\,dx = \cos c + 1. \]

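Section 5.7

1. Let $f(x)$ denote the integrand and $0 < \varepsilon < 1$. Then $\int_0^1 f = \int_0^{\varepsilon} f + \int_{\varepsilon}^1 f$. On $(0, \varepsilon]$, $2x/\pi \le \sin x \le x$ and $1 - \varepsilon \le 1 - x < 1$, hence
\[ \frac{1}{|x|^{p}} \le f(x) \le \frac{(\pi/2)^{p}}{(1 - \varepsilon)^{q}|x|^{p}}. \]
On $[\varepsilon, 1)$,
\[ \frac{1}{(1 - x)^{q}\sin^{p}1} \le f(x) \le \frac{1}{(1 - x)^{q}\sin^{p}\varepsilon}. \]
Therefore, $\int_0^1 f$ converges iff $p, q < 1$.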


5. Only (b) and (d) diverge.

8. Choose $r > 0$ so that $1/2 < \Big(\dfrac{\sin x}{x}\Big)^{p} < 1$ for $0 < x < r$. Then
\[ \frac{1}{2}\int_0^{\varepsilon} x^{p-q}\,dx \le \int_0^{\varepsilon}\frac{\sin^{p}x}{x^{q}}\,dx \le \int_0^{\varepsilon} x^{p-q}\,dx. \]
Now apply 5.7.3(a).

9. (a) all $p$. (c) all $p$. (h) $p > -2$. (k) $p > -1$.

11. Let $g(x) = x(1 + x^{2})^{-1}$, $h(x) = \sin x$ and $f := gh$. Then $\big|\int_1^x h\big|$ is bounded and $g' < 0$ so, by 5.7.17, $f$ is improperly integrable on $[1, +\infty)$. For every $n$,
\[ \int_0^{\infty} |f|\,dx \ge \sum_{j=2}^{n}\int_{(j-1)\pi}^{j\pi}\frac{x|\sin x|}{1 + x^{2}}\,dx \ge \sum_{j=2}^{n}\int_{(j-1)\pi}^{j\pi}\frac{\pi(j-1)|\sin x|}{1 + \pi^{2}j^{2}}\,dx = M\sum_{j=2}^{n}\frac{j-1}{1 + \pi^{2}j^{2}}, \]
where $M$ is a positive constant. The sums in the last equality are unbounded, hence $f$ is not improperly absolutely integrable in this case.

13. (a) Converges for all $p > 0$ if $0 < q < 1$; diverges for all $p > 0$ if $q \ge 1$. (b) Converges for all $p > 0$ if $0 < q < 1$; diverges for all $p > 0$ if $q \ge 1$. (c) Converges if $p > 2$ or $q > 2$ and diverges otherwise. (d) Converges if $p < 2$ or $q < 2$ and diverges otherwise. (e) Converges iff $q < 1$. (f) Converges iff $pq < 1$.

15. Integrate by parts:
\[ I_n := \int_{-\infty}^{\infty} x^{2n}e^{-x^{2}/2}\,dx = (2n - 1)\int_{-\infty}^{\infty} x^{2n-2}e^{-x^{2}/2}\,dx = (2n - 1)I_{n-1}. \]

20. Both integrals converge. The root test is inconclusive.

24. By the Cauchy–Schwarz inequality,
\[ \int_1^{\infty}\frac{\sqrt{f(x)}}{x}\,dx \le \left(\int_1^{\infty} f(x)\,dx\right)^{1/2}\left(\int_1^{\infty}\frac{dx}{x^{2}}\right)^{1/2} < +\infty. \]


26. Let $F(x) = \int_a^x fg$, $a \le x < b$, and let $b_n \uparrow b$. By the weighted mean value theorem, $F(b_m) - F(b_n) = f(c_{m,n})[G(b_m) - G(b_n)]$ for some $c_{m,n}$ between $b_m$ and $b_n$. Since $G$ is bounded and $f(c_{m,n}) \to 0$, $\{F(b_n)\}$ is a Cauchy sequence and hence converges. Since $\{b_n\}$ was arbitrary, $\int_a^b fg$ converges.

28. Let $F(t) = \int_0^t f\,dx$. Then
\[ \int_0^{t} f(x + c)\,dx = \int_c^{c+t} f(x)\,dx = \int_c^{t} f(x)\,dx + F(t + c) - F(t), \]
hence
\[ \int_0^{\infty} f(x + c)\,dx = \int_c^{\infty} f(x)\,dx. \]
Similarly,
\[ \int_{-\infty}^{0} f(x + c)\,dx = \int_{-\infty}^{c} f(x)\,dx. \]

Section 5.8

2. Given $\varepsilon > 0$, let $A_n$ be covered by intervals $I_{n,k}$, $k = 1, 2, \dots$, with total length $< \varepsilon/2^{n}$. Then the union is covered by the intervals $I_{n,k}$, $n, k = 1, 2, \dots$, with total length $< \varepsilon$.

6. The discontinuity set is countable, hence the integral exists. Since all lower sums are zero, the integral must be zero.

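Section 6.1

1. (a) $\dfrac{m^{3}(m + 1)}{2m + 1}$. (c) $\ln(3/2)$. (e) $\dfrac{1}{m}\displaystyle\sum_{k=1}^{m}\frac{(-1)^{k}}{k}$. (i) $-\ln(m + 1)$. (n) $\displaystyle\sum_{k=1}^{m}\frac{(-1)^{k}}{k}$.

2. (a) $\dfrac{23}{480}$. (g) $\dfrac{1}{1 + r^{2}}$.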

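3. (a) $193e$. (c) $(e - 1/e)/2$.

5. Let $s_n = \displaystyle\sum_{k=1}^{n}\frac{1}{k}$, $u_n = \displaystyle\sum_{k=1}^{n}\frac{1}{2k - 1}$, and $v_n = \displaystyle\sum_{k=1}^{n}\frac{4}{(2k - 1)(2k + 1)}$.

(a) $s_{2n} = s_n/2 + u_n$, hence, by 6.1.9,
\[ u_n - \tfrac{1}{2}\ln n = [s_{2n} - \ln(2n)] - \tfrac{1}{2}[s_n - \ln n] + \ln 2 \to \tfrac{1}{2}\gamma + \ln 2. \]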


8. Given $\varepsilon > 0$, choose $N$ such that $L - \varepsilon < a_k/b_k < L + \varepsilon$ for all $k \ge N$. Multiplying by $b_k$ and summing,
\[ (L - \varepsilon)\sum_{k=n}^{m} b_k < \sum_{k=n}^{m} a_k < (L + \varepsilon)\sum_{k=n}^{m} b_k, \quad m > n \ge N. \]
Letting $m \to +\infty$ and dividing,
\[ L - \varepsilon < \frac{\sum_{k=n}^{\infty} a_k}{\sum_{k=n}^{\infty} b_k} < L + \varepsilon, \quad n \ge N. \]

12. Let $s_n$ and $t_n$ denote the $n$th partial sums of $\sum a_n$ and $\sum b_n$, respectively. Then $t_k = s_{n_k}$ so $\{t_k\}$ is a subsequence of $\{s_n\}$. Therefore, if $\sum_n a_n$ converges, so does $\sum_k b_k$. If the terms $a_n$ are nonnegative, then, for each $n$, $s_n \le t_k$ for $k \ge n$, hence if $\sum_k b_k$ converges, then so does $\sum_n a_n$. The series $\sum_{n=0}^{\infty}(-1)^{n}$ shows that the latter assertion fails in general.

15. By summing a geometric series, a real number $x$ with representation $b_N b_{N-1}\cdots b_0.a_1 a_2\cdots a_n 999\cdots$, where $a_n \ne 9$, may be written as
\[ b_N b_{N-1}\cdots b_0.a_1 a_2\cdots a_n + 10^{-n} = b_N b_{N-1}\cdots b_0.a_1 a_2\cdots a_{n-1}a'_n, \]
where $a'_n := a_n + 1$. Therefore, a real number has at least one standard representation. Suppose that
\[ b_N b_{N-1}\cdots b_0.a_1 a_2\cdots = c_M c_{M-1}\cdots c_0.d_1 d_2\cdots \]
are standard representations. Then
\[ |b_N b_{N-1}\cdots b_0 - c_M c_{M-1}\cdots c_0| = |(.d_1 d_2\cdots) - (.a_1 a_2\cdots)| \le \sum_{j=1}^{\infty}\frac{|d_j - a_j|}{10^{j}}. \]
Since the representations are standard, $|d_j - a_j|$ cannot eventually equal 9, hence the right side is $< 1$. Therefore, since the left side is an integer, it must be zero. It follows from Exercise 1.5.16 that $M = N$ and $b_j = c_j$, $0 \le j \le N$. Then $a_1.a_2 a_3\cdots = d_1.d_2 d_3\cdots$, hence $a_1 = d_1$. An induction argument shows that $a_n = d_n$ for all $n$.

Section 6.2

1. By the ratio test, (a), (b), (e), and (f) converge; (c) and (d) diverge.

2. (a) Converges by ratio test. (d) Converges by ratio test. (g) Converges by integral test iff $p > 1$. (j) Diverges by limit comparison with $\sum 1/n$. (m) Converges by limit comparison with $\sum 1/n^{2}$. (p) Diverges by ratio test. (s) Diverges since $2^{\ln n} = n^{p}$, $p = \ln 2 < 1$. (v) Converges by limit comparison with $\sum 1/2^{n}$.

5. For all sufficiently large $n$, $a_n < a_n n^{1/n} < 2a_n$.

6. (a) Converges iff $p > 1$. (e) Converges iff $q > p$. (g) Converges iff $q > 1 + p$.

8. (a) Since $a_n \to 0$, $a_n^{2} < a_n$ for all large $n$. Therefore, $\sum b_n$ converges by the comparison test. (d) Converges by comparison test: $b_n \le a_n$. (h) Converges: For $n$ sufficiently large, say $n \ge N$, $a_n < 1$, hence $b_n = M a_N\cdots a_n \le M a_n$, where $M = a_1\cdots a_{N-1}$. (l) Converges by the Cauchy–Schwarz inequality.

11. The inequality implies that $\{a_n/b_n\}$ is a decreasing sequence and hence converges to $L < +\infty$. Now use the comparison test.

14. Since $\lim_{x\to\infty} f(g(x)) = \lim_{x\to\infty} g(x) = 0$, l’Hospital’s rule implies that $\lim_{x\to\infty} f(g(x))/g(x) = \lim_{x\to\infty} f'(g(x)) = f'(0)$. Now apply the limit comparison test to $\sum_n f(g(n))$ and $\sum_n g(n)$.

15. (a) If $\sum f(1/n^{p})$ converges, then $f(0) = \lim_n f(1/n^{p}) = 0$. Suppose $f'(0) \ne 0$. Then, by l’Hospital’s rule, $\lim_{x\to 0}\dfrac{f(x^{p})}{x^{2p}} = \infty$. Therefore, eventually $f(1/n^{p}) > 1/n^{2p}$ so the series diverges by the comparison test.

17. (a) $n!(e - s_n) = m(n - 1)! - \displaystyle\sum_{k=1}^{n}\frac{n!}{k!} \in \mathbb{N}$.

(b)
\[ e - s_n = \sum_{k=1}^{\infty}\frac{1}{(n + k)!} = \frac{1}{(n + 1)!}\left[1 + \frac{1}{n + 2} + \frac{1}{(n + 2)(n + 3)} + \cdots\right] < \frac{1}{(n + 1)!}\left[1 + \frac{1}{n + 1} + \frac{1}{(n + 1)^{2}} + \cdots\right] = \frac{1}{(n + 1)!}\,\frac{n + 1}{n}, \]
hence $n!(e - s_n) < 1/n$. By (a) and (b), $n!(e - s_n)$ is a positive integer $< 1/n$, which is impossible.
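The bound $0 < n!(e - s_n) < 1/n$ from 17(b) can be checked in exact rational arithmetic. The sketch below is illustrative and rests on two stated assumptions: $s_n$ is taken to be $\sum_{k=0}^{n} 1/k!$, and $e$ is replaced by the rational proxy $s_{60}$, which lies slightly below $e$, so the upper bound it verifies is marginally weaker than the true one.

```python
from fractions import Fraction
from math import factorial

def s(n):
    """Partial sum 1/0! + 1/1! + ... + 1/n! as an exact fraction (assumed definition of s_n)."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

E_PROXY = s(60)  # rational stand-in for e; an assumption, it slightly underestimates e

scaled = {n: factorial(n) * (E_PROXY - s(n)) for n in range(1, 21)}
for n in (1, 2, 5, 10):
    print(n, float(scaled[n]), 1.0 / n)
```

For each $n$ the scaled remainder lies strictly between $0$ and $1/n$, so it cannot be a positive integer, which is the contradiction used in the solution.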


Section 6.3

3. (a) and (c) diverge: $d_n \to 2/3$; (b) converges: $d_n \to 3/2$.

5. (a) By the ratio test, the series converges if $p < e$ and diverges if $p > e$. If $p = e$, the series diverges by Raabe’s test since then $d_n \to -1/2$.

6. The ratio test fails. Raabe: $d_n \to (1 + p)/2$, hence the series converges if $p > 1$ and diverges if $p < 1$. It also diverges if $p = 1$, since then $a_n = 1/(2n + 1)$.

10. (a) Diverges. (d) Converges iff $r > 1$.

13. $-\ln a_n/\ln n \to \ln b$.

16. (a) Converges iff $q > p$.

18. Let $c > 1$ and choose $r \in (1, c)$. Then, for sufficiently large $n$, $c_n > r$, hence
\[ \ln a_n^{-1} = \ln n + c_n\ln\ln n > \ln n + r\ln\ln n = \ln\big[n(\ln n)^{r}\big] \]
and therefore $a_n < \dfrac{1}{n(\ln n)^{r}}$. Since $\int_2^{\infty} 1/\big[x(\ln x)^{r}\big]\,dx < +\infty$, the integral and comparison tests complete the proof in this case. The case $c < 1$ is similar. The given series diverges.

21. Take $b_n = n\ln n$ in Kummer’s test. Then
\[ c_n = \Big(1 + \frac{1}{n} + \frac{\beta_n}{n\ln n}\Big)n\ln n - (n + 1)\ln(n + 1) = (n + 1)\ln\frac{n}{n + 1} + \beta_n. \]
Since the first term on the right side tends to $-1$, $\liminf_{n\to\infty}\beta_n > 1$ implies $\liminf_{n\to\infty} c_n > 0$, and $\limsup_{n\to\infty}\beta_n < 1$ implies $\limsup_{n\to\infty} c_n < 0$.

Section 6.4

2. Choose $r > 1$ and $N \in \mathbb{N}$ such that $|a_{n+1}|/|a_n| > r$ for all $n \ge N$. Then $|a_{N+k}| > r^{k}|a_N|$ for all $k$, hence $a_n \not\to 0$. Therefore, the series diverges.

4. (a) Diverges. (b) Converges conditionally. (c) Converges absolutely if $p > 1$, conditionally if $p \le 1$. (i) Converges absolutely if $p > 1/2$, conditionally if $p \le 1/2$. (m) Converges absolutely if $p > 1$, diverges if $p < 1$.

9. Let $b_n = \dfrac{n - 1/2}{n^{p} + (-1)^{n}}$. If $p \le 1$, then $b_n\sin n\theta$ need not tend to zero (see Example 8.3.10). For $p > 1$, it suffices by Dirichlet’s test to show that $\sum |b_{n+1} - b_n| < +\infty$. This follows by limit comparison with $\sum 1/n^{p}$.


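13. (a) For $n \in \mathbb{N}$, $n = qm_n + r_n$, where $r_n, m_n \in \mathbb{N}$ and $0 \le r_n \le q - 1$. Since $s_n - s_{qm_n}$ is a sum of terms of the form $a_{qm_n+j}$, $j = 1, \dots, q - 1$, each of which $\to 0$, $s_n - s_{qm_n} \to 0$. Therefore, $s_n \to s$.

(b) For $n \in \mathbb{N}$,
\[ \begin{aligned} s_{6n} &= \Big(1 + \frac{1}{2} + \frac{1}{3}\Big) - \Big(\frac{1}{4} + \frac{1}{5} + \frac{1}{6}\Big) + \Big(\frac{1}{7} + \frac{1}{8} + \frac{1}{9}\Big) - \Big(\frac{1}{10} + \frac{1}{11} + \frac{1}{12}\Big) \\ &\quad + \cdots + \Big(\frac{1}{6n-5} + \frac{1}{6n-4} + \frac{1}{6n-3}\Big) - \Big(\frac{1}{6n-2} + \frac{1}{6n-1} + \frac{1}{6n}\Big) \\ &= \Big(1 - \frac{1}{4}\Big) + \Big(\frac{1}{2} - \frac{1}{5}\Big) + \Big(\frac{1}{3} - \frac{1}{6}\Big) + \cdots + \Big(\frac{1}{6n-3} - \frac{1}{6n}\Big) \\ &= \frac{3}{1\cdot 4} + \frac{3}{2\cdot 5} + \frac{3}{3\cdot 6} + \cdots + \frac{3}{(6n-3)6n} = 3\sum_{k=1}^{6n-3}\frac{1}{k(k+3)}. \end{aligned} \]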

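The last expression converges to $(1 + 1/2 + 1/3) = 11/6$ by 6.1.5 with $m = 3$. By part (a), $s = 11/6$.

(c) Let $t_n$ be the $n$th partial sum of the series. Then
\[ \begin{aligned} t_{5n} &= 1 + \frac{1}{2} + \frac{1}{3} - \frac{1}{4} - \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} - \frac{1}{9} - \frac{1}{10} + \cdots - \frac{1}{5n} \\ &\ge \frac{1}{3} + \frac{1}{3} + \frac{1}{3} - \frac{1}{3} - \frac{1}{3} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} - \frac{1}{8} - \frac{1}{8} + \cdots \\ &= \frac{1}{3} + \frac{1}{8} + \frac{1}{13} + \cdots + \frac{1}{5n-2}. \end{aligned} \]
Thus $t_{5n} \to +\infty$, so the series diverges.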
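The comparison in 13(c) can be confirmed exactly for small $n$. The following sketch uses rational arithmetic; the function names `t` and `lower_bound` are hypothetical, and the sign pattern $+,+,+,-,-$ repeated with period 5 is read off from the displayed partial sum.

```python
from fractions import Fraction

def t(m):
    """m-th partial sum of 1 + 1/2 + 1/3 - 1/4 - 1/5 + 1/6 + ... (signs + + + - -, period 5)."""
    total = Fraction(0)
    for k in range(1, m + 1):
        sign = 1 if (k - 1) % 5 < 3 else -1
        total += Fraction(sign, k)
    return total

def lower_bound(n):
    """The comparison sum 1/3 + 1/8 + ... + 1/(5n - 2)."""
    return sum(Fraction(1, 5 * k - 2) for k in range(1, n + 1))

for n in (1, 5, 20):
    print(n, float(t(5 * n)), float(lower_bound(n)))
```

Because the lower bound is a constant multiple comparison with the harmonic series, it grows without bound, and so does $t_{5n}$.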

Section 6.5

3. (a), (b), (c): Double limit does not exist; only one iterated limit exists. (d), (g), (l): Iterated limits exist and are unequal. Hence double limit does not exist. (e), (h): Iterated limits exist and are equal. Double limit exists. (f), (i), (k): Iterated limits exist and are equal. Double limit does not exist. (j) If $a = b$, iterated limits exist and are equal, double limit exists. If $a \ne b$, iterated limits exist and are unequal.

9. Let $s_{m,n} = \sum_{j=1}^{m}\sum_{k=1}^{n} a_{j,k}$ and $s_n = \sum_{k=1}^{n} b_k$. Then for $m \ge n$, $s_n \le s_{n,n} \le s_{m,n} \le s_{m,m} \le s_{2m-1}$, hence the result follows from the squeeze principle.


10. (b) Let
\[ b_n = \sum_{j=1}^{n} a_{j,n+1-j} = \sum_{j=1}^{n}\frac{1}{[j^{2} + (n + 1 - j)^{2}]^{p/2}} \]
and let $s_n = \sum_{k=1}^{n} b_k$. The minimum of $x^{2} + (n + 1 - x)^{2}$ on $[1, n]$ occurs at $x = (n + 1)/2$ and the maximum at $x = 1$ and $x = n$, hence
\[ (n + 1)^{2}/2 \le j^{2} + (n + 1 - j)^{2} \le n^{2} + 1, \quad 1 \le j \le n, \]
and therefore
\[ \frac{n}{(n^{2} + 1)^{p/2}} \le b_n \le \frac{2^{p/2}\,n}{(n + 1)^{p}}, \]
so the double series converges iff $p > 2$.

11. If $|r| \ge 1$, then $a_{m,n} \not\to 0$, hence the double series diverges. Let $|r| < 1$ and set $c_m = |r|^{m}/(1 - |r|^{m})$. Choose $M$ such that $|r|^{m} < 1/2$ for $m > M$. Then
\[ \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} |r|^{mn} = \sum_{m=1}^{M} c_m + \sum_{m=M+1}^{\infty} c_m \le \sum_{m=1}^{M} c_m + 2\sum_{m=1}^{\infty} |r|^{m} < +\infty. \]
Therefore, the iterated series, and hence the double series, converges absolutely.

12. Let $L < 1$. Choose $r \in (L, 1)$ and then $N$ such that $a_{m,n}^{1/mn} < r$ for all $m, n \ge N$. For such $m, n$, $a_{m,n} < r^{mn}$, hence $\sum a_{m,n}$ converges by Exercises 6 and 11. If $L > 1$, choose $r \in (1, L)$ and then $N$ such that $a_{m,n}^{1/mn} > r$ for all $m, n \ge N$. For such $m, n$, $a_{m,n} > r^{mn} > 1$, hence $a_{m,n} \not\to 0$, so the series diverges.

Section 7.1

1. (b) Pointwise to 0 on $(-1, 1]$ for all $p \ge 0$, uniformly on intervals $[a, 1]$ for $a > -1$ and $p < 1$. Uniformly on $[-1, 1]$ if $p < 0$. (d) Pointwise to 0 on $\mathbb{R}$, uniformly on $|x| \ge a > 0$. (g) Uniformly to 0 on $\mathbb{R}$. (j) Pointwise on $\mathbb{R}$, uniformly on the sets $|x| \ge r > 1$ and $|x| \le s < 1$.

2. (a) Pointwise but not uniformly. (b) Uniformly.

6. For example, $f_n(x) = x + 1/n$, $f(x) = g_n(x) = g(x) = x$ on $[1, +\infty)$.

10. Given $\varepsilon > 0$, choose $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ for all $x, y \in \mathbb{R}$ with $|x - y| < \delta$. Then choose $N$ such that $|a_n - a| < \delta$ for all $n \ge N$. For such $n$ and for all $x$, $|f_n(x) - f(x + a)| = |f(x + a_n) - f(x + a)| < \varepsilon$.


13. If $x \in \mathbb{Q}$ has reduced form $x = k/m$, then $f_n(x) = 1$ for all $n \ge m$. Therefore, $f_n$ converges pointwise to the Dirichlet function $d(x)$. Suppose the convergence were uniform on $[0, 1]$. Then we could find $n$ such that $|f_n(x) - d(x)| < 1$ for all $x \in [0, 1]$. In particular, $|f_n(1/m) - 1| < 1$ for all $m > n$, which is impossible since $f_n(1/m) = 0$.

17. Let $M > |f_0(x)| + 1$ for all $x \in S$. Then
\[ |f_{n+1}(x) - f_n(x)| = \big|\sin\big(rf_n(x)\big) - \sin\big(rf_{n-1}(x)\big)\big| \le r|f_n(x) - f_{n-1}(x)| \le \cdots \le r^{n}|f_1(x) - f_0(x)| \le Mr^{n}. \]
Since $r < 1$, $\{f_n\}$ is uniformly Cauchy. Therefore, $f_n \to$ some $f$, uniformly on $S$. The generalization is proved in a similar manner, using the mean value theorem.
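The geometric decay in Exercise 17 can be watched directly. The sketch below is only an illustration under stated assumptions: the starting function $f_0(x) = x$, the sample set $S = [-3, 3]$ (discretized), $r = 0.8$, and $M = 4.5 > \sup_S |f_0| + 1$ are arbitrary choices, and `sup_differences` is a hypothetical helper.

```python
import math

def sup_differences(r, xs, steps):
    """Return sup over xs of |f_{n+1} - f_n| for n = 0..steps-1, where
    f_0(x) = x (assumed starting function) and f_{n+1} = sin(r * f_n)."""
    current = list(xs)
    diffs = []
    for _ in range(steps):
        nxt = [math.sin(r * v) for v in current]
        diffs.append(max(abs(u - v) for u, v in zip(nxt, current)))
        current = nxt
    return diffs

r = 0.8
xs = [-3.0 + 0.1 * i for i in range(61)]   # sample points of S = [-3, 3]
M = 4.5                                     # any M > sup|f_0| + 1 = 4 works
diffs = sup_differences(r, xs, 30)
for n in (0, 5, 10, 29):
    print(n, diffs[n], M * r ** n)
```

Each successive difference is at most $r$ times the previous one, which is the contraction estimate that makes $\{f_n\}$ uniformly Cauchy.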

Section 7.2

4. Let $x > 0$. By l’Hospital’s rule, $n^{2}xe^{-nx}$ has the same limit as $2ne^{-nx}$, namely, 0. The convergence is not uniform on $(0, 1)$, however, as may be seen by taking $b_n = 1/n$ in 7.1.5. An integration by parts shows that $\int_0^1 n^{2}xe^{-nx}\,dx = 1 - e^{-n}(1 + n) \to 1$.

5. (d) Let $L := \lim_n \int_0^1 f_n$. By the mean value theorem, $e^{-x/n} - 1 = (-x/n)e^{-\xi/n}$, hence
\[ \left|\frac{\sqrt{n}\,\big(e^{-x/n} - 1\big)}{x}\right| = \frac{e^{-\xi/n}}{\sqrt{n}} \le \frac{1}{\sqrt{n}}, \]
so $\sqrt{n}\,\big(e^{-x/n} - 1\big)/x$ converges uniformly to zero. Therefore, $L = 0$.

6. If $x \ge r > 0$, then $f_n(x) = \sqrt{n}/(1 + n^{2}x^{2}) \le \sqrt{n}/(1 + n^{2}r^{2})$, hence $f_n \to 0$ uniformly on $[r, +\infty)$. The convergence is not uniform on $(0, 1)$, as can be seen by taking $b_n = 1/n$ in 7.1.5. A substitution shows that $\int_0^1 f_n = n^{-1/2}\arctan n \to 0$.

8. (a) $n\sin f_n \to f'/f$ uniformly $\Rightarrow \int_a^b n\sin f_n \to \ln f(b) - \ln f(a)$.

9. This follows from the inequality
\[ \left|\int_a^x f_n(t)\,dt - \int_a^x f(t)\,dt\right| \le \int_a^x |f_n(t) - f(t)|\,dt \le \int_a^b |f_n - f|. \]
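The behaviour described in Exercise 4 (pointwise convergence to 0 without uniform convergence, with integrals tending to 1) can be illustrated numerically. This is a rough sketch; the midpoint rule and the function names are assumptions made for the illustration.

```python
import math

def f(n, x):
    """The function n^2 * x * e^{-n x} from Exercise 4."""
    return n * n * x * math.exp(-n * x)

def integral_closed_form(n):
    """Value 1 - e^{-n}(1 + n) obtained by integration by parts."""
    return 1.0 - math.exp(-n) * (1.0 + n)

def integral_midpoint(n, m=100000):
    """Midpoint-rule approximation of the integral of f(n, .) over [0, 1]."""
    h = 1.0 / m
    return h * sum(f(n, (i + 0.5) * h) for i in range(m))

for n in (1, 5, 20):
    print(n, integral_midpoint(n), integral_closed_form(n), f(n, 1.0 / n))
```

The last column is $f_n(1/n) = n/e$, which grows with $n$; this is the choice $b_n = 1/n$ that rules out uniform convergence on $(0, 1)$.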


Section 7.3

1. (a) Pointwise on $(1, +\infty)$, uniformly on $[r, +\infty)$, $r > 1$. (d) Uniformly on $[0, +\infty)$. (g) Pointwise on $(0, +\infty)$, uniformly on $[r, +\infty)$, $r > 0$. (i) If $p > 1$, pointwise on $[0, +\infty)$, uniformly on $[0, r]$; if $p = 1$, converges only at $x = 0$.

2. (b) $s(x) = \dfrac{1}{1 + \ln x}$. Pointwise on $(1/e, e)$, uniformly on closed subintervals.

4. Both $s(x)$ and $c(x)$ converge uniformly on $\mathbb{R}$ by the $M$-test. Therefore, term by term integration is justified so
\[ \int_x^{\pi/2} s(t)\,dt = \sum_n \frac{a_n}{2n + 1}\cos\big((2n + 1)x\big), \qquad \int_0^x c(t)\,dt = \sum_n \frac{a_n}{n}\sin(nx). \]

6. (a) Let $p \le 1/2$ and $x \ne 0$. By l’Hospital’s rule, $[1 - \cos(x/n^{p})]/n^{-1}$ has the same limit as $n \to +\infty$ as
\[ \frac{-pxn^{-p-1}\sin(x/n^{p})}{-1/n^{2}} = px^{2}n^{1-2p}\,\frac{\sin(x/n^{p})}{x/n^{p}}. \]
Since this limit is positive, (a) follows from the limit comparison test.

(b) Since cosine is an even function, to show uniform convergence on intervals $[a, b]$ we may assume $a = 0$. By the mean value theorem, for each $n \in \mathbb{N}$ and $x \in [0, b]$ there exists $x_n \in [0, b]$ such that
\[ |1 - \cos(x/n^{p})| = (x/n^{p})|\sin(x_n/n^{p})| \le b^{2}/n^{2p}. \]
Therefore, uniform convergence on $[0, b]$ follows from the $M$-test. Since $1 - \cos(x/n^{p})$ does not converge uniformly to 0 on any unbounded interval, $s(x)$ does not converge uniformly on $\mathbb{R}$.

9. Let $|f'| \le M$ on $I$. By the mean value theorem, for each $x \in I$ and $n \in \mathbb{N}$ there exists $\xi$ between $x/(n + 1)$ and 0 such that
\[ \left|\frac{1}{n}\,f\Big(\frac{x}{n + 1}\Big)\right| = \frac{|xf'(\xi)|}{n(n + 1)} \le \frac{rM}{n(n + 1)}. \]
Therefore, $s(x)$ converges uniformly on $I$ by the Weierstrass $M$-test. Since $f'$ is bounded, the derived series
\[ s'(x) = \sum_{n=1}^{\infty}\frac{1}{n(n + 1)}\,f'\Big(\frac{x}{n + 1}\Big) \]
converges uniformly on $I$ and $s'(0) = f'(0)$.


11. Since $f_n \ge 0$, the partial sums of the series increase, so the conclusion follows from Dini’s theorem (7.1.12).

13. For $x \in [a, b]$, either $f_n(a) \le f_n(x) \le f_n(b)$ or $f_n(b) \le f_n(x) \le f_n(a)$, hence $|f_n(x)| \le M_n := \max\{|f_n(a)|, |f_n(b)|\} \le |f_n(a)| + |f_n(b)|$. Since $\sum M_n < +\infty$, $s = \sum_n f_n$ converges uniformly on $[a, b]$. Since each $f_n \in \mathcal{R}\big([a, b]\big)$, $s \in \mathcal{R}\big([a, b]\big)$ and $\int_a^b s = \sum_n \int_a^b f_n$.

15. By Dini’s theorem, the convergence of $\{g_n\}$ is uniform. Therefore, the result follows from 7.3.9.

18. Since $g$ is continuous and $n^{-2}[g + n] \downarrow 0$, the convergence is uniform on closed bounded intervals $I$. By 7.3.9, $s(x)$ converges uniformly on $I$. The convergence is not absolute for any $x$ (compare with $\sum_n 1/n$).

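Section 7.4

1. (a) $(-1, 3)$. (d) $(-1, 1]$. (g) $(-1/4, 1/4)$. (i) $(-1, 1)$.

2. (b) $\displaystyle\sum_{n=3}^{\infty}\frac{3^{n-3}}{2^{n-2}}\,x^{n}$, $-2/3 < x < 2/3$.

3. (a) Replace $x$ by $x - 1$ in (7.12), where $|x - 1| < 1$, to obtain
\[ \begin{aligned} x\ln x &= (x - 1)\ln x + \ln x \\ &= \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(x - 1)^{n+1} + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(x - 1)^{n} \\ &= \sum_{n=2}^{\infty}\frac{(-1)^{n}}{n - 1}(x - 1)^{n} + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}(x - 1)^{n} \\ &= (x - 1) + \sum_{n=2}^{\infty}\left[\frac{(-1)^{n}}{n - 1} + \frac{(-1)^{n+1}}{n}\right](x - 1)^{n} \\ &= (x - 1) + \sum_{n=2}^{\infty}\frac{(-1)^{n}}{n(n - 1)}(x - 1)^{n}. \end{aligned} \]

4. (a) $\displaystyle\sum_{n=1}^{\infty}(-1)^{n+1}\,\frac{2^{n} + 3^{n}}{n}\,x^{n}$, $|x| < 1/3$. (g) $\displaystyle\sum_{n=0}^{\infty}\frac{(-1)^{n}4^{n}}{(2n + 1)!}\,x^{2n+1}$, $x \in \mathbb{R}$.

5. Use $\arccos x = \pi/2 - \arcsin x$ and (7.20).

9. (a) $\displaystyle\sum_{n=1}^{\infty}(-1)^{n}\,\frac{x^{2n-1}}{(2n - 1)(2n + 1)!}$. (e) $\displaystyle\sum_{n=0}^{\infty}\frac{(-1)^{n}}{(2n + 1)!}\,x^{n}$, $x > 0$.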


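10. (b) $\dfrac{x(1 - x^{2})}{(1 + x^{2})^{2}}$.

11. $27/4$.

12. (a) $\displaystyle\sum_{n=1}^{\infty}(-1)^{n+1}c_n x^{n}$, $|x| < 1$, $c_n := \displaystyle\sum_{k=1}^{n}\frac{(-1)^{k}}{k}$. (d) $\displaystyle\sum_{n=0}^{\infty}c_n x^{2n+1}$, $x \in \mathbb{R}$, $c_n := \displaystyle\sum_{k=0}^{n}\frac{(-1)^{k}}{(2k + 1)!\,(n - k)!}$.

16. For $|x| < (\sqrt{5} - 1)/2$,
\[ \begin{aligned} (1 - x - x^{2})s(x) &= \sum_{n=0}^{\infty}c_n x^{n} - \sum_{n=0}^{\infty}c_n x^{n+1} - \sum_{n=0}^{\infty}c_n x^{n+2} \\ &= c_0 + c_1 x - c_0 x + \sum_{n=2}^{\infty}(c_n - c_{n-1} - c_{n-2})x^{n} \\ &= 1. \end{aligned} \]

18. Replace $x$ by $-t^{2}$ in (7.19) to obtain
\[ \frac{1}{\sqrt{1 + t^{2}}} = \sum_{n=0}^{\infty}\frac{(-1)^{n}(2n)!}{(n!)^{2}4^{n}}\,t^{2n}, \quad |t| < 1. \]
Integrating from 0 to $x$ yields the desired representation.

21. (a) Choose $r$ such that $R_s^{-1} = \limsup_n |c_n|^{1/n} < r < 1$. Then
\[ |c_{n^{2}}|^{1/n} = \big(|c_{n^{2}}|^{1/n^{2}}\big)^{n} < r^{n} \to 0, \]
hence $R_t = +\infty$.

(b) If $c_n = (1 + a/n^{p})^{n}$, $p > 0$, then $R_s = 1$ and
\[ R_t = \lim_n\Big(1 + \frac{a}{n^{2p}}\Big)^{-n} = \begin{cases} e^{-a} & \text{if } p = 1/2, \\ 0 & \text{if } p < 1/2 \text{ and } a > 0, \\ +\infty & \text{if } p < 1/2 \text{ and } a < 0, \\ 1 & \text{if } p > 1/2. \end{cases} \]

22. (a) If $0 < R_s < +\infty$, choose $N$ such that $|c_n|^{1/n} < 2R_s^{-1}$ for all $n \ge N$. For such $n$, $|c_n|^{1/n^{2}} < (2R_s^{-1})^{1/n} \to 1$, hence $R_t \ge 1$. Similarly, $|c_n|^{1/n^{2}} > (R_s^{-1}/2)^{1/n}$ for infinitely many $n$, hence $R_t \le 1$.

27. By the alternating series test, $\sum_{n=0}^{\infty} c_n x^{n}$ converges at $x = -1$, hence the result follows from Abel’s continuity theorem.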

28. (a) Let $s_n(x) = \displaystyle\sum_{k=1}^{n} kx^{k}$ and $s(x) = \dfrac{x}{(1 - x)^{2}}$, $x \in [0, 1)$. By 7.4.6 and the boundedness of $f$, $s_n(x) \to s(x)$ uniformly on $[0, r]$, $0 < r < 1$.

30. Define $h$ on $I \cup J$ by
\[ h(x) = \begin{cases} f(x) & \text{if } x \in I, \\ g(x) & \text{if } x \in J. \end{cases} \]
By 7.4.19, $f = g$ on $I \cap J$, hence $h$ is well-defined and analytic on $I \cup J$.

33. (a) By 7.4.13, if the series $g(x)$ converges for $|x - a| < r_1$, then $f(x)g(x) = \sum c_n(x - a)^{n}$, where $c_0 = a_0 b_0 = a_0 = 1$ and
\[ c_n = \sum_{k=0}^{n} a_k b_{n-k} = a_0 b_n - b_n = 0, \quad n \ge 1. \]
Therefore, $f(x)g(x) = 1$ for $|x - a| < r_1$.

(b) Suppose $|a_n| \le M^{n}$ for all $n$. If $|b_j| \le (2M)^{j}$ for $1 \le j \le n - 1$, then
\[ |b_n| \le \sum_{k=1}^{n}|a_k||b_{n-k}| \le \sum_{k=1}^{n} 2^{n-k}M^{k}M^{n-k} < (2M)^{n}. \]
By induction, $|b_n| \le (2M)^{n}$ for all $n$.

(c) By 7.4.16, there exists a constant $M > 0$ such that $|a_n| \le M^{n}$ for all $n$, hence (b) holds. By 7.4.16, $g$ is analytic at $a$.
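The recursion behind Exercise 33 can be run on a concrete series. The sketch below is illustrative: it assumes the recursion $b_0 = 1$, $b_n = -\sum_{k=1}^{n} a_k b_{n-k}$ implied by $c_n = 0$ in part (a), takes the coefficients $a_n = 1/n!$ of $e^{x}$ as an example (so $M = 1$), and uses hypothetical helper names.

```python
from fractions import Fraction
from math import factorial

def reciprocal_coefficients(a, count):
    """b_0 = 1 and b_n = -sum_{k=1}^n a_k b_{n-k}, assuming a[0] == 1."""
    b = [Fraction(1)]
    for n in range(1, count):
        b.append(-sum(a[k] * b[n - k] for k in range(1, n + 1)))
    return b

def cauchy_product(a, b, count):
    """Coefficients c_n = sum_{k=0}^n a_k b_{n-k} of the product series."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(count)]

# Assumed example: a_n = 1/n! (the series of e^x), for which |a_n| <= M^n with M = 1.
a = [Fraction(1, factorial(n)) for n in range(12)]
b = reciprocal_coefficients(a, 12)
c = cauchy_product(a, b, 12)
print(b[:5])   # coefficients of e^{-x}: 1, -1, 1/2, -1/6, 1/24
print(c[:5])   # 1 followed by zeros
```

In this example the $b_n$ are the coefficients of $e^{-x}$, and the bound $|b_n| \le (2M)^{n} = 2^{n}$ from part (b) holds with plenty of room.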

Section 8.1

1. Only (b) and (d) are not metrics.

3. Symmetry and coincidence are clear. To verify the triangle inequality $d(x, y) \le d(x, z) + d(y, z)$, simply note that if $x_j \ne y_j$ then either $x_j \ne z_j$ or $y_j \ne z_j$, so that every index $j$ contributing to $d(x, y)$ also contributes to $d(x, z) + d(y, z)$.

5. By the triangle inequality, $d(x, y) \le d(x, a) + d(a, y) \le d(x, a) + d(a, b) + d(b, y)$, hence $d(x, y) - d(a, b) \le d(x, a) + d(b, y)$. Similarly, $d(a, b) - d(x, y) \le d(x, a) + d(b, y)$.

10. Let $\{x_n\}$ be a Cauchy sequence in $E$. Some $E_j$ must contain a subsequence of $\{x_n\}$, and since $E_j$ is complete, the subsequence converges to a member of $E_j$. By Exercise 9, $\{x_n\}$ converges. The assertion is false for infinitely many sets. For example, let $\{r_1, r_2, \dots\}$ be an enumeration of the rationals, and take $E_n = \{r_n\}$ (or $\{r_1, \dots, r_n\}$).


13. The proof of (a) is straightforward. For the necessity in (b), let $\{(x_n, y_n)\}$ be Cauchy in $Z$. Since $d(x_n, x_m) \le \eta\big((x_n, y_n), (x_m, y_m)\big)$, $\{x_n\}$ is Cauchy in $X$. Similarly, $\{y_n\}$ is Cauchy in $Y$. The converse is clear. Part (c) is proved in a similar manner, and (d) follows from (b) and (c).

15. Part (a) is straightforward. For example, if $\rho(x, y) = 0$, then $\rho(x, y) = d(x, y)$, hence $x = y$. Parts (b) and (c) follow from the observation that $\rho(x, y) = d(x, y)$ if either term is less than $a$. Part (d) follows from (b) and (c). The metrics need not be metrically equivalent: Take $d$ to be the usual metric on $\mathbb{R}$. The function $\sigma$ does not define a metric on $X$ since $\sigma(x, x) = a > 0$.

18. (a) The triangle inequality follows from the observation that the function $t(1 + t)^{-1}$ is increasing on $[0, +\infty)$. The remaining properties of a metric are easily established. Parts (b) and (c) follow from the definition of $\rho$ and the equation
\[ d(x, y) = \frac{\rho(x, y)}{1 - \rho(x, y)}, \]
noting that $\rho < 1$. The metrics $|x - y|$ and $|x - y|/(1 + |x - y|)$ are not metrically equivalent.

20. By Exercise 18, each $\rho_k$ is a metric on $X$. It follows easily that $\rho$ is a metric on $X$. For (b), suppose $\rho(x_n, x) \to 0$. Since $\rho_k \le 2^{k}\rho$, $\rho_k(x_n, x) \to 0$. By Exercise 18, $d_k(x_n, x) \to 0$. Conversely, suppose $d_k(x_n, x) \to 0$, hence $\rho_k(x_n, x) \to 0$, for each $k$. Given $\varepsilon > 0$, choose $M \in \mathbb{N}$ such that $\sum_{k>M} 2^{-k} < \varepsilon/2$ and choose $N > M$ so that
\[ \rho_1(x_n, x) + \rho_2(x_n, x) + \cdots + \rho_M(x_n, x) < \varepsilon/2 \]
for all $n \ge N$. For such $n$, $\rho(x_n, x) < \varepsilon$.

23. For $x, y \in [1, b]$,
\[ \begin{aligned} |f_n(x, y) - f(x, y)| &= \frac{\big|y(1 + x^{n})^{1/n} - x(1 + y^{n})^{1/n}\big|}{y(1 + y^{n})^{1/n}} \\ &= \frac{\big|(1 + x^{n})^{1/n} - x(1 + y^{-n})^{1/n}\big|}{(1 + y^{n})^{1/n}} \\ &\le \frac{\big|(1 + x^{n})^{1/n} - x\big| + \big|x - x(1 + y^{-n})^{1/n}\big|}{(1 + y^{n})^{1/n}} \\ &\le x\big[(1 + x^{-n})^{1/n} - 1\big] + x\big[(1 + y^{-n})^{1/n} - 1\big] \\ &\le b\big[(2^{1/n} - 1) + (2^{1/n} - 1)\big] \to 0. \end{aligned} \]
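The uniform estimate in Exercise 23 can be compared with the actual error on a grid. This is a sketch under assumptions read off from the displayed computation: $f_n(x, y) = \big((1 + x^{n})/(1 + y^{n})\big)^{1/n}$ and $f(x, y) = x/y$ on $[1, b]^{2}$; the value $b = 3$, the grid size, and the helper names are arbitrary choices.

```python
def f_n(n, x, y):
    """Assumed form of f_n, inferred from the solution: ((1 + x^n)/(1 + y^n))^(1/n)."""
    return ((1.0 + x ** n) / (1.0 + y ** n)) ** (1.0 / n)

def sup_error(n, b, m=40):
    """Largest |f_n(x, y) - x/y| over an (m+1) x (m+1) grid on [1, b]^2."""
    pts = [1.0 + (b - 1.0) * i / m for i in range(m + 1)]
    return max(abs(f_n(n, x, y) - x / y) for x in pts for y in pts)

b = 3.0
for n in (1, 5, 25, 100):
    print(n, sup_error(n, b), 2 * b * (2 ** (1.0 / n) - 1))
```

The grid error stays below $2b(2^{1/n} - 1)$ and both columns tend to 0, consistent with uniform convergence on $[1, b]^{2}$.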


Section 8.2

1. [Figure C.1: Open balls $B_1^{d_1}(0, 0)$ and $B_1^{d_\infty}(0, 0)$ for Exercise 1.]

3. $r = d(x, y)/2$.

5. If $x, y \in B_r(a)$ and $0 < t < 1$, then
\[ \|tx + (1 - t)y - a\| = \|t(x - a) + (1 - t)(y - a)\| \le \|t(x - a)\| + \|(1 - t)(y - a)\| < tr + (1 - t)r = r. \]
In general, spheres are not convex. (Consider $(\mathbb{R}^{2}, d_2)$.)

8. By Exercise 8.1.6, $\rho$ is a metric. Since $e^{x}$ is a continuous function on $\mathbb{R}$ with continuous inverse, $\rho(x_n, x) \to 0$ iff $|x_n - x| \to 0$. Therefore, $\rho$ is topologically equivalent to the usual metric of $\mathbb{R}$. $(\mathbb{R}, \rho)$ is not complete. For example, $\{-n\}_{n=1}^{\infty}$ is a Cauchy sequence in $(\mathbb{R}, \rho)$ with no limit. Therefore, $\rho$ cannot be metrically equivalent to the usual metric of $\mathbb{R}$.

12. Let $\{f_n\}$ be a sequence in $C$ converging uniformly to $f$. Then $f_n(x) = f_n(1 - x)$ for all $n$ and $x$. Taking limits yields $f(x) = f(1 - x)$ for all $x$. To see that $C$ is not closed in the metric of Exercise 8.1.22, define $f_n \in C$ by $f_n(1/2) = 1$, $f_n(x) = 0$ if $x \in [0, 1/2 - 1/n] \cup [1/2 + 1/n, 1]$ and linear on $[1/2 - 1/n, 1/2 + 1/n]$.

Section 8.3 1. (a) cl(A) ∪ cl(B) is closed and ⊇ A ∪ B, so cl(A) ∪ cl(B) ⊇ cl(A ∪ B). Similarly, cl(A ∪ B) ⊇ cl(A) and cl(A ∪ B) ⊇ cl(B). (d) int(A)∪int(B) is open and ⊆ A∪B, hence int(A)∪int(B) ⊆ int(A∪B). The example A = (0, 1], B = (1, 2) in R produces strict inclusion. (f) bd(cl(A)) = cl(cl(A)) \ int(cl(A)) ⊆ cl(A) \ int(A) = bd(A). The example A = Q in R produces strict inclusion.

3. (b) $\{(x, y, 0) : x^{2} + y^{2} = 1\}$. (e) $\{(1, 0), (0, 0)\}$. (f) The circle $\{(x, y) : x^{2} + y^{2} = 1\}$ together with the point $(0, 0)$.

6. (a) By 8.3.6, $y \in \mathrm{cl}_Y(A)$ iff for any sequence $\{a_n\}$ in $A$ with $a_n \to y$, $y \in A$. The same characterization can be given for $y \in \mathrm{cl}_X(A) \cap Y$.

8. The sequence $\{f_n\}$ has no cluster points in $\big(C([0, 1]), \|\cdot\|_\infty\big)$, hence the set $\{f_1, f_2, \dots\}$ is closed. The identically zero function is a cluster point of the sequence in $\big(C([0, 1]), \|\cdot\|_1\big)$, hence the set is not closed in this space.

9. (a) $B$ is open and $B \subseteq C$, hence $B \subseteq \mathrm{int}(C)$. The example $B_1(x) = \{x\}$ and $C_1(x) = X$ in a nontrivial discrete space gives strict inclusion.

12. (b) By 8.3.9, for any $y \in \mathbb{R}$ there exist integers $n_k > 0$, $m_k$ such that $n_k/(2\pi) + m_k \to y - x/(2\pi)$, hence $\sin(n_k + x) = \sin\big(2\pi\big[(n_k + x)/(2\pi) + m_k\big]\big) \to \sin(2\pi y)$. Therefore, the set is dense in $[-1, 1]$.

16. Let $u \in U$ and choose $\varepsilon > 0$ such that $B_\varepsilon(u) \subseteq U$. Since $Y$ is dense in $X$, $B_\varepsilon(u) \cap U \cap Y = B_\varepsilon(u) \cap Y \ne \emptyset$. If $U$ is not open, then the assertion may not hold. For example, take $X = [0, 1]$, $Y = (0, 1]$, and $U = \{0\}$.

20. (a) Let $u, v \in I := \bigcup_{i\in\mathfrak{I}} I_i$ and $t \in (0, 1)$. Then $u \in I_i$ and $v \in I_j$ for some $i, j \in \mathfrak{I}$. Since $I_i \cap I_j \ne \emptyset$, $I_i \cup I_j$ is an interval. Therefore, $tu + (1 - t)v \in I_i \cup I_j \subseteq I$, hence $I$ is an interval. Since each $I_i$ is open, $I$ is open.

Section 8.4

1. (b), (k), (o), (r) Limit and iterated limits are 0. (e) Limit does not exist. One iterated limit is 0, the other is 1. (i) Limit and iterated limits exist and equal $1/2$.

2. (a) The limit is 1 since
\[ \left|\frac{x^{2} - 5y^{2}}{x^{2} + 3y^{2}} - 1\right| = \frac{8y^{2}}{x^{2} + 3y^{2}} < \frac{8y^{2}}{(|y|/a)^{2/p} + 3y^{2}} \le 8a^{2/p}|y|^{2(1 - 1/p)} \to 0. \]
(b) The limit does not exist, as may be seen by converting to polar coordinates.


3. By the Cauchy mean value theorem, for each $x \ne y$ there exists a number $\theta = \theta(x, y)$ between $x$ and $y$ such that
\[ g(x, y) = \frac{f'(\theta)}{\cos\theta}. \]
Since $\lim_{y\to x}\theta(x, y) = x$, define $g(x, x) = f'(x)/\cos x$.

6. This follows from Exercise 8.1.5.

7. Given $\varepsilon > 0$, choose $\delta > 0$ such that $|f(x) - f(a)| < \varepsilon$ for all $x, a$ with $|x - a| < \delta$. Let $\sqrt{(x - a)^{2} + (y - b)^{2}} < \delta/2$. Then
\[ \begin{aligned} \Big|\sqrt{x^{2} + y^{2}} - \sqrt{a^{2} + b^{2}}\Big| &= \frac{|x^{2} + y^{2} - a^{2} - b^{2}|}{\sqrt{x^{2} + y^{2}} + \sqrt{a^{2} + b^{2}}} \\ &\le \frac{|x - a|(|x| + |a|) + |y - b|(|y| + |b|)}{\sqrt{x^{2} + y^{2}} + \sqrt{a^{2} + b^{2}}} \\ &\le |x - a| + |y - b| \\ &\le 2\sqrt{(x - a)^{2} + (y - b)^{2}} < \delta, \end{aligned} \]
hence $|g(x, y) - g(a, b)| < \varepsilon$.

8. For a proof using the sequential criterion for uniform continuity, let $x_n - a_n,\ y_n - b_n \to 0$. Then $\alpha x_n + \beta y_n - (\alpha a_n + \beta b_n) \to 0$, hence $g(x_n, y_n) - g(a_n, b_n) = f(\alpha x_n + \beta y_n) - f(\alpha a_n + \beta b_n) \to 0$. The functions $xy$ and $\sin(xy)$ are not uniformly continuous on $\mathbb{R}^{2}$. (For the former take $x_n = y_n = n + 1/n$ and $a_n = b_n = n$. For the latter take $x_n = y_n = \sqrt{2\pi}\,[n + 1/(3n)]$ and $a_n = b_n = \sqrt{2\pi}\,n$.)

11. This follows from the inequalities
\[ |f_j(x) - f_j(a)| \le \|f(x) - f(a)\| \le \sum_{j=1}^{n}|f_j(x) - f_j(a)|. \]

12. We prove the uniform continuity part. Given $\varepsilon > 0$, choose a fixed $n$ such that $\rho(f_n(x), f(x)) < \varepsilon/3$ for all $x \in X$. Then choose $\delta > 0$ such that $\rho(f_n(x), f_n(a)) < \varepsilon/3$ for all $x, a \in X$ with $d(x, a) < \delta$. The triangle inequality then shows that $\rho(f(x), f(a)) < \varepsilon$ for all $x, a \in X$ with $d(x, a) < \delta$.

Section 8.5

1. (a) compact. (b) closed, not bounded. (f) bounded, not closed. (h) neither bounded nor closed.


A Course in Real Analysis

3. Compact case: Let $\{U_i : i \in I\}$ be an open cover of $C := C_1 \cup \cdots \cup C_k$, where each $C_j$ is compact. For each $j$ there exists a finite set $I_j \subseteq I$ such that $\{U_i : i \in I_j\}$ covers $C_j$. If $I_0$ is the union of the $I_j$, then $\{U_i : i \in I_0\}$ is a finite subcover of $C$.

4. Such an intersection is closed and contained in a compact set and is therefore compact.

7. If $E$ is totally bounded, then $\mathrm{cl}(E)$ is totally bounded. Since $X$ is complete, $\mathrm{cl}(E)$ is complete. Therefore, by 8.5.8, $\mathrm{cl}(E)$ is sequentially compact. In particular, every sequence in $E$ has a cluster point in $X$. Conversely, assume every sequence in $E$ has a subsequence that converges in $X$. Let $\{y_n\}$ be a sequence in $\mathrm{cl}(E)$. For each $n$, choose $x_n \in E$ such that $d(x_n, y_n) < 1/n$. By hypothesis, a subsequence $x_{n_k}$ converges to some $x \in X$, hence $y_{n_k} \to x$. Therefore, $\mathrm{cl}(E)$ is sequentially compact, hence totally bounded.

11. Suppose $\bigcap_{n=1}^{\infty} C_n = \emptyset$. Then $\bigcup_{n=1}^{\infty} C_n^c = X$, hence $\{C_n^c : n \in \mathbb{N}\}$ is an open cover of $X$ and therefore also of $C_1$. Choose $n \in \mathbb{N}$ such that $C_1 \subseteq C_1^c \cup \cdots \cup C_n^c$. Taking complements, $C_n = C_1 \cap \cdots \cap C_n \subseteq C_1^c \subseteq C_n^c$, which is impossible.

13. By the approximation property of suprema, there exist sequences $\{a_n\}$ and $\{b_n\}$ in $A$ such that $d(a_n, b_n) \to d(A)$. Since $A$ is compact, there exists a subsequence $\{a_n'\}$ of $\{a_n\}$ converging to some $a \in A$. Similarly, there exists a subsequence $\{b_n''\}$ of the corresponding subsequence $\{b_n'\}$ that converges to some $b \in A$. It follows that $d(a, b) = \lim_n d(a_n'', b_n'') = d(A)$. For the example, take $A = \{f_n\}$ in $C[0, 1]$ with the sup metric, where $f_n(x) = x^n$. Then $d(A) = 1 > d(f_n, f_m)$ for all $m, n$.

15. (a) For any $a \in A$, $d(A, x) \le d(a, x) \le d(a, y) + d(y, x)$, hence $d(A, x) - d(y, x) \le d(a, y)$. Taking the infimum over $a$ yields $d(A, x) - d(y, x) \le d(A, y)$ or $d(A, x) - d(A, y) \le d(y, x)$. Interchanging $x$ and $y$ yields (a).

(b) If $x \notin \mathrm{cl}(A)$ there exists $r > 0$ such that $B_r(x) \cap \mathrm{cl}(A) = \emptyset$. Then $d(a, x) \ge r$ for all $a \in A$, hence $d(A, x) > 0$. Conversely, assume $x \in \mathrm{cl}(A)$ and let $a_n \in A$ with $a_n \to x$. Since $d(A, x) \le d(a_n, x) \to 0$, $d(A, x) = 0$.

(c) By (b), the denominator of $F_{AB}(x)$ is positive, hence $F_{AB}$ is well defined. Continuity follows from (a), and clearly $0 \le F_{AB} \le 1$. The last assertions follow from (b).

(d) $U = \{x \in X : F_{AB}(x) < 1/2\}$, $V = \{x \in X : F_{AB}(x) > 1/2\}$.

19. Let $x_n := f(1/n)$ and $y_n := f(2\pi - 1/n)$. Then $\lim_n x_n = \lim_n y_n = (1, 0)$ but $f^{-1}(x_n) = 1/n \to 0$ and $f^{-1}(y_n) = 2\pi - 1/n \to 2\pi$.

21. Each set is a continuous image of the compact set $A \times B$.


Section 8.6

3. Suppose that $\mathcal{F}$ is equicontinuous at $a$. Given $\varepsilon > 0$, choose $\delta > 0$ such that $\rho(f(x), f(a)) < \varepsilon$ for all $x \in X$ with $d(x, a) < \delta$ and all $f \in \mathcal{F}$. Given sequences $\{f_n\}$ in $\mathcal{F}$ and $\{x_n\}$ in $E$ with $x_n \to a$, choose $N$ such that $d(x_n, a) < \delta$ for all $n \ge N$. For such $n$, $\rho\big(f_n(x_n), f_n(a)\big) < \varepsilon$. Conversely, suppose that $\mathcal{F}$ is not equicontinuous at $a$. Then there exist $\varepsilon > 0$ and members $x_n$ of $E$ and $f_n$ of $\mathcal{F}$ such that $d(x_n, a) < 1/n$ but $\rho\big(f_n(x_n), f_n(a)\big) \ge \varepsilon$. Therefore, the sequential condition does not hold.

7. Let $x > a \ge c$. By the mean value theorem applied to the function $f(z) = z^{-p}$ on $(na, nx)$,
$$\left|\frac{1}{(nx)^p} - \frac{1}{(na)^p}\right| = \frac{pn|x - a|}{y_n^{p+1}} \quad \text{for some } y_n \in (na, nx).$$
Since $y_n^{p+1} \ge (nc)^{p+1} \ge n c^{p+1}$, $|(nx)^{-p} - (na)^{-p}| \le p|x - a|c^{-(p+1)}$, which shows equicontinuity.

9. Take $x_n = a + \pi/n$ in Exercise 3. Then $x_n \to a$ but $\sin(nx_n) - \sin(na) = -2\sin(na)$, which has no limit if $a$ is a nonzero rational number.

11. By the mean value theorem, $|f(x) - f(y)| \le M|x - y|$.

14. Let $\|f_i\|_\infty \le M$ for all $i$. Then $|F_i(x) - F_i(y)| \le M|x - y|$, hence $\mathcal{F}$ is uniformly equicontinuous on $[a, b]$. It follows that the uniform closure $\mathcal{G}$ of $\mathcal{F}$ in $C([a, b])$ is uniformly equicontinuous on $[a, b]$ (Exercise 6). Since $\mathcal{G}$ is also closed and bounded, it is compact (Arzelà–Ascoli Theorem), hence totally bounded.

Section 8.7

1. (c) not connected. (d) path connected, hence connected. (e) connected iff $-1 \le a \le 1$.

5. Then $f(u)$ and $f(v)$ have opposite signs, say $f(u) < 0 < f(v)$. Since the range of $f$ is connected, it contains the interval $(f(u), f(v))$.

7. Let $f = (g, h) : X \to \mathbb{R}^2$ and $L := \{(x, x) : x \in \mathbb{R}\}$. Then $L$ separates $\mathbb{R}^2$ into two open half-planes $H_1$ and $H_2$. Choose any $x_0 \in X$ and suppose $f(x_0) \in H_1$. Then $E := f^{-1}(H_1^c) = f^{-1}(H_2)$ is both open and closed. Since $X$ is connected, $E = \emptyset$. Therefore, $f(X) \subseteq H_1$.


9. Consider the case $B := B_1(0)$. Any point in $B^c$ may be connected to the sphere $S := S_2(0)$ by a radial line segment. Since $S$ is path connected (8.7.10), $B^c$ is path connected.

12. Denote the union by $A$. Let $f : A \to \{0, 1\}$ be continuous. Since $A_n$ is connected, $f(A_n)$ is a single point. Since $A_n \cap A_{n+1} \neq \emptyset$, an induction argument shows that $f$ is constant.

16. Suppose that $f : L \to C$ is such a function. Then $f^{-1} : C \to L$ is continuous (8.5.11). Remove a point $p$ from the interior of $L$. Then $f^{-1}$ maps the connected set $C \setminus \{f(p)\}$ onto the disconnected set $L \setminus \{p\}$. The function $f(t) = (\cos t, \sin t)$ maps $[0, 2\pi]$ continuously onto the circle $x^2 + y^2 = 1$.

20. Let $x \in \mathrm{bd}(A)$ and $\varepsilon > 0$. Then there exist $u, v \in B_\varepsilon(x)$ such that $f(u) \ge c$ and $f(v) < c$. Since $B_\varepsilon(x)$ is convex, it is connected, hence $f\big(B_\varepsilon(x)\big)$ is an interval and so must contain $c$. Taking $\varepsilon = 1/n$, we may construct a sequence $x_n \to x$ with $f(x_n) = c$ for each $n$. Therefore, $f(x) = c$. This shows that $\mathrm{bd}(A) \subseteq B$. The example $f(x) = x^2$ on $\mathbb{R}$ with $c = 0$ shows that the inclusion may be strict.

22. (a) $C_x$ is connected by Exercise 13. Let $u \in C_x$ and choose $\varepsilon > 0$ such that $B_\varepsilon(u) \subseteq U$. Since $B_\varepsilon(u)$ is connected, $B_\varepsilon(u) \cup C_x$ is connected, hence $B_\varepsilon(u) \subseteq C_x$. Therefore, $C_x$ is open. If $C_x \cap C_y \neq \emptyset$, then $C_x \cup C_y$ is connected, hence $C_x = C_y$. Therefore, $U$ is a union of pairwise disjoint components.

(b) Choose a point with rational coordinates in each component in (a). Since these points form a countable set, the union is countable.

Section 8.8

3. Choose a sequence of polynomials $P_n$ converging uniformly to $f$ on $[a, b]$. By hypothesis, $\int_a^b f P_n = 0$ for all $n$, hence $\int_a^b f^2 = 0$. Since $f$ is continuous, $f = 0$. If $a \ge 0$, then the polynomials with even powers form a separating algebra, hence the result follows as before.

6. By the Stone–Weierstrass theorem, there exists a sequence of functions $g_n$ in $\mathcal{A}$ converging uniformly to $f$. Set $f_n = g_n - g_n(x_0)$. Then $f_n \in \mathcal{A}$ and $g_n(x_0) \to 0$, hence
$$\|f_n - f\|_\infty \le \|f_n - g_n\|_\infty + \|g_n - f\|_\infty = |g_n(x_0)| + \|g_n - f\|_\infty \to 0.$$

9. By 8.8.8, there exists a sequence $\{T_n\}$ of trigonometric polynomials converging uniformly to $f$ on $[0, 2\pi]$. For any $j$, $\sin(jx)$ and $\cos(jx)$ are linear combinations of products $\sin^m x \cos^n x$, hence, by hypothesis, $\int_0^{2\pi} f(x) T_n(x)\,dx = 0$ for all $n$. Therefore, $\int_0^{2\pi} f^2 = 0$, so $f = 0$.


11. The set of all functions of the form $T(x) := b_0 + \sum_{j=1}^{m} b_j \sin(jx)$ on $[-\pi/2, \pi/2]$ is an algebra $\mathcal{A}$ containing the constant functions. Since $\sin x$ separates points, so does $\mathcal{A}$. Therefore, given $\varepsilon > 0$, $\|f - T\|_\infty < \varepsilon/2$ for some $T$. Since $f(0) = 0$, $|b_0| < \varepsilon/2$. Therefore, $\|f - (T - b_0)\|_\infty < \varepsilon$.

15. The functions $\sum_{i=1}^{n} g_i(x) h_i(y)$ form an algebra and separate points of $X \times Y$.

Section 8.9

1. Assume that $X$ has the decreasing sequence property and let $\{x_n\}$ be a Cauchy sequence in $X$. Take $C_n = \mathrm{cl}\{x_n, x_{n+1}, \ldots\}$. Then $C_n$ is closed, $C_{n+1} \subseteq C_n$ and $d(C_n) \to 0$ (because $\{x_n\}$ is Cauchy). By assumption, there exists $x \in X$ such that $x \in C_n$ for all $n$. It follows that some subsequence of $\{x_n\}$ converges to $x$. Therefore $x_n \to x$ (Exercise 8.1.9).

3. Let $\{r_1, r_2, \ldots\}$ be an enumeration of $\mathbb{Q}$. Then $U_n := \{r_n, r_{n+1}, \ldots\}$ is open and dense in $\mathbb{Q}$ but $\bigcap_n U_n = \emptyset$.

Section 9.1

1. (a) $\dfrac{2y\,dx - 2x\,dy}{(x + y)^2}$. (e) $\cos(x^2 y)(2xy\,dx + x^2\,dy)$. (h) $e^{xy^2}(y^2\,dx + 2xy\,dy)$.

2. (b) $\begin{bmatrix} e^x \sin y & e^x \cos y \\ e^y \cos x & e^y \sin x \end{bmatrix}$. (c) $\dfrac{1}{(x^2 + y^2)^2} \begin{bmatrix} y^3 - x^2 y & x^3 - xy^2 \\ 4xy^2 & -4y^3 \end{bmatrix}$.

3. Let $\Delta = \{(x, x) : x \in \mathbb{R}\}$. (a) Differentiable on $\mathbb{R}^2$ iff $p, q > 3$, in which case partials are continuous. (d) Differentiable on $\mathbb{R}^2$ iff $p, q > 1$. Partials are continuous iff $p > 2$.

4. (a) Differentiable and partials continuous iff $p + q > 1$. (d) Differentiable and partials continuous iff $p + q > 2s + 1$.

8. $x \cdot \nabla f(x) = a \cdot f(x)$, $x \cdot \nabla g(x) = g(x)$.

10. $e^{-f(x)}(e^{x_1}, e^{x_2}, \ldots, e^{x_n})$.

12. (a) $\dfrac{x_i}{\|x\|}$. (c) $\dfrac{\|x\|^2 - x_i^2}{\|x\|^3}$.

Section 9.2

1. Let $\alpha$ denote the right side of the inequality. Clearly, $\|T\| \le \alpha$. If $\|x\| \le 1$, then $\|Tx\| \le \|T\|\|x\| \le \|T\|$, hence $\alpha \le \|T\|$.


3. Since $\nabla(\psi^{-1}) = -\psi^{-2}\nabla\psi$, the assertion follows from the scalar product rule.

4. (a) Let $f(x) = x$ and $\psi(x) = \|x\|$ in the product rule (9.2.6). Since $df_x$ is the identity transformation and $\nabla\psi(x) = x/\|x\|$, $dg_x(h) = \|x\|h + \|x\|^{-1}(x \cdot h)x$. Therefore, $dg_x(x) = \|x\|x + \|x\|^{-1}(x \cdot x)x = 2\|x\|x$.

6. Let $\eta(h)$, $\mu(k)$ be such that
$$f(a + h) = f(a) + df_a(h) + \|h\|\eta(h), \qquad g(b + k) = g(b) + dg_b(k) + \|k\|\mu(k)$$
for all $h \in \mathbb{R}^p$, $k \in \mathbb{R}^q$ with $\|h\|$, $\|k\|$ sufficiently small, and $\lim_{h \to 0} \eta(h) = \lim_{k \to 0} \mu(k) = 0$. Let $T(h, k) = \alpha\,df_a(h) + \beta\,dg_b(k)$. Then $T$ is linear in $(h, k)$ and
$$\varepsilon(h, k) := F(a + h, b + k) - F(a, b) - T(h, k) = \alpha\|h\|\eta(h) + \beta\|k\|\mu(k).$$
Since $\|(h, k)\| = \sqrt{\|h\|^2 + \|k\|^2} \ge \|h\|, \|k\|$,
$$\frac{\|\varepsilon(h, k)\|}{\|(h, k)\|} \le \frac{|\alpha|\|h\|\|\eta(h)\| + |\beta|\|k\|\|\mu(k)\|}{\|(h, k)\|} \le |\alpha|\|\eta(h)\| + |\beta|\|\mu(k)\|.$$

10. Part (a) follows from Exercise 8.5.14. For (b), set $g(t) = \|f(t) - v\|^2$. Then $g'(t) = 2(f(t) - v) \cdot f'(t)$, and since $g(t_0)$ is the minimum value of $g$, $g'(t_0) = 0$.

Section 9.3

1. $g'\big(\varphi(x)\psi(y)\big)\big(\varphi'(x)\psi(y),\ \varphi(x)\psi'(y)\big)$.

3. $g_x(a \cdot x, b \cdot x)\,a + g_y(a \cdot x, b \cdot x)\,b$.

7. $T f'(x)$.

10. (a) Let $g(t) = f(a + tu)$. By definition, $D_u f(a) = g'(0)$. On the other hand, by the chain rule, $g'(t) = u \cdot \nabla f(a + tu)$. Setting $t = 0$ yields (a).

(c) $\displaystyle\lim_{t \to 0} \frac{f(tu) - f(0)}{t} = \lim_{t \to 0} \frac{ab^2}{a^2 + b^4 t^2}$ exists for all $u = (a, b)$. $f$ is not continuous at $(0, 0)$, since $f \to 0$ along $y = 0$ but $f = 1/2$ along $y = \sqrt{x}$, $x > 0$.

12. Let $F(x) = \int_a^b f(t, x)\,dt$. By the mean value theorem,
$$\frac{F(x + h) - F(x)}{h} = \int_a^b f_x(t, x + rh)\,dt, \quad \text{for some } r = r(t, x, h) \in (0, 1).$$
Since $f_x$ is uniformly continuous, $f_x(t, x + rh) \to f_x(t, x)$ uniformly in $t$ on $[a, b]$ as $h \to 0$. Therefore, $F'(x) = \int_a^b f_x(t, x)\,dt$.

15. Let $\varphi(t) = t^{-p} f(tx)$. By the product rule and the chain rule,
$$\varphi'(t) = \frac{-p}{t^{p+1}} f(tx) + \frac{1}{t^p} \nabla f(tx) \cdot x.$$
If $f$ is homogeneous of degree $p$, then $\varphi$ is a constant function, hence
$$\frac{p}{t^{p+1}} f(tx) = \frac{1}{t^p} \nabla f(tx) \cdot x.$$
Setting $t = 1$ produces the desired identity. On the other hand, if the identity holds, then $tx \cdot \nabla f(tx) = p f(tx)$ for all $t$ and $x$, hence $\varphi'(t) = 0$. Therefore, $\varphi(t) = \varphi(1)$, which shows that $f$ is homogeneous of degree $p$.

17. Fix $y \in C$ and define $g$ on $U$ by $g(x) = f(x) - f(y) - df_y(x)$. Then $g(x) - g(y) = f(x) - f(y) - df_y(x - y)$ and $dg_z = df_z - df_y$ (9.1.7), hence the result follows from 9.3.6 applied to $g$.
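The behavior described in Exercise 10(c) of this section can be illustrated numerically. The sketch below assumes the function is $f(x, y) = xy^2/(x^2 + y^4)$ with $f(0, 0) = 0$; this choice is an inference from the difference quotient $ab^2/(a^2 + b^4 t^2)$ appearing in the solution, not something stated in this section. The helper names are hypothetical.

```python
import math

def f(x, y):
    """Assumed function for Exercise 10(c): x*y^2 / (x^2 + y^4), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * y / (x * x + y ** 4)

def directional_quotient(a, b, t):
    """Difference quotient (f(t*u) - f(0)) / t in the direction u = (a, b)."""
    return (f(t * a, t * b) - f(0.0, 0.0)) / t

def limit_formula(a, b):
    """Limit of a*b^2 / (a^2 + b^4 t^2) as t -> 0: b^2/a if a != 0, else 0."""
    return b * b / a if a != 0 else 0.0

# Directional derivatives exist in every direction ...
for a, b in [(1.0, 2.0), (0.0, 1.0), (-3.0, 1.0)]:
    print((a, b), directional_quotient(a, b, 1e-6), limit_formula(a, b))

# ... yet f is not continuous at the origin: f = 1/2 along y = sqrt(x).
for x in (1e-2, 1e-4, 1e-8):
    print(x, f(x, math.sqrt(x)), f(x, 0.0))
```

The second loop shows values near 1/2 along the curve $y = \sqrt{x}$ and 0 along $y = 0$, so no limit exists at the origin.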

Section 9.4

1. (a) $\{(x, y) : x \neq y\}$. (b) $\{(x, y) : x + y \neq (2n + 1)\pi/2,\ n \in \mathbb{Z}\}$. (e) $\{(x, y) : xy \neq 0\}$. (f) $\{(x, y) : x, y > 0,\ y \neq x\}$. (i) $\{(x, y) : y \neq \pm x\}$. (j) $\{(x, y, z) : xyz \neq 0\}$.

2. (i) $x = \tfrac{1}{2}\big(u + \sqrt{u^2 - 4v}\big)$, $y = \tfrac{1}{2}\big(u - \sqrt{u^2 + 4v}\big)$.

(v) $x = \tfrac{1}{\sqrt{2}}\big(u - \sqrt{u^2 - 4v^2}\big)^{1/2}$, $y = \tfrac{1}{\sqrt{2}}\big(u + \sqrt{u^2 - 4v^2}\big)^{1/2}$.

4. Set $u = x(x^2 + y^2)^{-1}$ and $v = y(x^2 + y^2)^{-1}$. Square and add. $J_f = -1$.

Section 9.5

1. Let $f(x, y) = x + y^2 + e^{xy} - 1$. Then $f_x(0, 0) = 1$ and $f_y(0, 0) = 0$, so the implicit function theorem guarantees a local solution $x = x(y)$ but says nothing about a solution $y = y(x)$.

5. Let $F(x, y, z) = \sin(x + z) + \ln(y + z) - \sqrt{2}/2$ and $G(x, y, z) = e^{xz} + \sin(\pi y + z) - 1$. Then, at $(\pi/4, 1, 0)$, $F = G = 0$ and
$$\frac{\partial(F, G)}{\partial(x, y)}\,\frac{\partial(F, G)}{\partial(y, z)}\,\frac{\partial(F, G)}{\partial(x, z)} \neq 0.$$

8. Let $F = x - y + z + u^2 - 2$, $G = -x + 2z + u^3 - 2$, $H = -y + 3z + u^4 - 3$. Then, at $(1, 1, 1, 1)$, $F = G = H = 0$ and
$$\frac{\partial(F, G, H)}{\partial(x, y, u)}\,\frac{\partial(F, G, H)}{\partial(y, z, u)}\,\frac{\partial(F, G, H)}{\partial(x, z, u)} \neq 0.$$

9. (b) Let $a := f_x(0, 0)$ and $b := f_y(0, 0)$. The condition is $b(a + 1) \neq 0$. The derivative is
$$\frac{-f_x(x, y)\,f_x\big(f(x, y), y\big)}{f_y(x, y)\,f_x\big(f(x, y), y\big) + f_y\big(f(x, y), y\big)}.$$

11. (a) The condition is $a(a^3 - ab^2 - b^3) \neq 0$ where $a := f_x(0, 0)$, $b := f_y(0, 0)$.

13. $f'(1) + g'(1) + h'(1) \neq 0$.

15. Let $y = F(x_1, \ldots, x_n)$. If $x_1$ is a function of $x_2, \ldots, x_n$, then, assuming the necessary differentiability,
$$0 = \frac{\partial y}{\partial x_n} = F_{x_1}\frac{\partial x_1}{\partial x_n} + F_{x_n}, \quad \text{hence} \quad \frac{\partial x_1}{\partial x_n} = -\frac{F_{x_n}}{F_{x_1}}.$$
In this manner we obtain
$$\frac{\partial x_2}{\partial x_1}\,\frac{\partial x_3}{\partial x_2} \cdots \frac{\partial x_n}{\partial x_{n-1}}\,\frac{\partial x_1}{\partial x_n} = (-1)^n\,\frac{F_{x_1}}{F_{x_2}}\,\frac{F_{x_2}}{F_{x_3}} \cdots \frac{F_{x_{n-1}}}{F_{x_n}}\,\frac{F_{x_n}}{F_{x_1}} = (-1)^n.$$
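The three Jacobian determinants in Exercise 8 of this section can be evaluated directly. The following sketch hard-codes the partial derivatives of $F$, $G$, $H$ as read off from their definitions above; the helper names (`det3`, `gradients`, `jacobian`, `residuals`) are hypothetical and the code is only a check, not the text's method.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def residuals(x, y, z, u):
    """Values of F, G, H from Exercise 8."""
    return (x - y + z + u ** 2 - 2,
            -x + 2 * z + u ** 3 - 2,
            -y + 3 * z + u ** 4 - 3)

def gradients(x, y, z, u):
    """Rows: gradients of F, G, H. Columns: partials in x, y, z, u."""
    return [
        [1, -1, 1, 2 * u],
        [-1, 0, 2, 3 * u ** 2],
        [0, -1, 3, 4 * u ** 3],
    ]

def jacobian(cols, point=(1, 1, 1, 1)):
    """Determinant of the 3x3 submatrix of partials in the chosen variables."""
    rows = gradients(*point)
    return det3([[row[c] for c in cols] for row in rows])

X, Y, Z, U = 0, 1, 2, 3
dets = {
    "x,y,u": jacobian((X, Y, U)),
    "y,z,u": jacobian((Y, Z, U)),
    "x,z,u": jacobian((X, Z, U)),
}
print(residuals(1, 1, 1, 1), dets)
```

All three determinants are nonzero at $(1, 1, 1, 1)$, which is what the product condition in the solution requires.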

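Section 9.6

1. (b) $z_{rr} = t^2 z_{xx} + 2t z_{xy} + z_{yy}$, $z_{tt} = r^2 z_{xx} + 2r z_{xy} + z_{yy}$.

(e) $z_{rr} = (e^{2r}\sin^2 t) z_{xx} + (e^{2r}\cos^2 t) z_{yy} + (2e^{2r}\sin t\cos t) z_{xy} + (e^r \sin t) z_x + (e^r \cos t) z_y$,
$z_{tt} = (e^{2r}\cos^2 t) z_{xx} + (e^{2r}\sin^2 t) z_{yy} - (2e^{2r}\sin t\cos t) z_{xy} - (e^r \sin t) z_x - (e^r \cos t) z_y$.

(f) $z_r = ax z_x$, $z_t = by z_y$, $z_{rr} = a^2 x^2 z_{xx} + a^2 x z_x$, $z_{tt} = b^2 y^2 z_{yy} + b^2 y z_y$.

4. $F_x + z_x F_z = 0$, hence
$$0 = F_{xx} + 2z_x F_{xz} + z_x^2 F_{zz} + z_{xx} F_z = F_{xx} - 2\frac{F_x}{F_z} F_{xz} + \frac{F_x^2}{F_z^2} F_{zz} + z_{xx} F_z$$
and so
$$z_{xx} = -\frac{1}{F_z} F_{xx} + 2\frac{F_x}{F_z^2} F_{xz} - \frac{F_x^2}{F_z^3} F_{zz}.$$

5. (a) $u_t = -k^2 u$, $u_{xx} = -u$.

(b) By logarithmic differentiation,
$$u_t = \left(\frac{x^2}{4k^2 t^2} - \frac{1}{2t}\right) u, \qquad u_{xx} = \left(\frac{x^2}{4k^4 t^2} - \frac{2}{4k^2 t}\right) u.$$

7. The second order partial derivatives are
$$\begin{aligned}
w_{\rho\rho} &= (\sin\varphi\cos\theta)^2 w_{xx} + (\sin\varphi\sin\theta)^2 w_{yy} + (\cos\varphi)^2 w_{zz} \\
&\quad + (2\sin\varphi)\big[(\sin\varphi\sin\theta\cos\theta) w_{xy} + (\cos\varphi\cos\theta) w_{xz} + (\cos\varphi\sin\theta) w_{yz}\big], \\
w_{\theta\theta} &= (\rho\sin\varphi)^2\big[(\sin^2\theta) w_{xx} + (\cos^2\theta) w_{yy} - 2(\sin\theta\cos\theta) w_{xy}\big] - (\rho\sin\varphi)\big[(\cos\theta) w_x + (\sin\theta) w_y\big], \\
w_{\varphi\varphi} &= \rho^2\big[(\cos\varphi\cos\theta)^2 w_{xx} + (\cos\varphi\sin\theta)^2 w_{yy} + (\sin\varphi)^2 w_{zz}\big] \\
&\quad + 2\rho^2\big[(\cos^2\varphi\sin\theta\cos\theta) w_{xy} - (\cos\varphi\sin\varphi\cos\theta) w_{xz} - (\cos\varphi\sin\varphi\sin\theta) w_{yz}\big] \\
&\quad - \rho\big[(\sin\varphi\cos\theta) w_x + (\sin\varphi\sin\theta) w_y + (\cos\varphi) w_z\big].
\end{aligned}$$

9. $f_{x_i} = p x_i \|x\|^{p-2} g'(\|x\|^p)$, hence
$$\begin{aligned}
f_{x_i x_i} &= p\big[\|x\|^{p-2} + (p - 2) x_i^2 \|x\|^{p-4}\big] g'(\|x\|^p) + p^2 x_i^2 \|x\|^{2(p-2)} g''(\|x\|^p), \\
f_{x_i x_j} &= p x_i x_j\big[(p - 2)\|x\|^{p-4} g'(\|x\|^p) + p\|x\|^{2(p-2)} g''(\|x\|^p)\big] \quad (i \neq j).
\end{aligned}$$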
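The formulas in Exercise 5(b) of this section can be sanity-checked with finite differences. The sketch below assumes $u(x, t) = t^{-1/2} e^{-x^2/(4k^2 t)}$, a function whose logarithmic derivatives match the displayed expressions; the exercise statement itself is not reproduced in this appendix, so that choice is an assumption, as are the helper names.

```python
import math

def u(x, t, k):
    """Assumed solution: t^(-1/2) * exp(-x^2 / (4 k^2 t))."""
    return math.exp(-x * x / (4 * k * k * t)) / math.sqrt(t)

def u_t_formula(x, t, k):
    """u_t from logarithmic differentiation."""
    return (x * x / (4 * k * k * t * t) - 1 / (2 * t)) * u(x, t, k)

def u_xx_formula(x, t, k):
    """u_xx from logarithmic differentiation."""
    return (x * x / (4 * k ** 4 * t * t) - 2 / (4 * k * k * t)) * u(x, t, k)

def central_first(g, s, h=1e-5):
    """Central difference approximation of g'(s)."""
    return (g(s + h) - g(s - h)) / (2 * h)

def central_second(g, s, h=1e-4):
    """Central difference approximation of g''(s)."""
    return (g(s + h) - 2 * g(s) + g(s - h)) / (h * h)

x0, t0, k0 = 0.7, 1.3, 0.9
print(central_first(lambda s: u(x0, s, k0), t0), u_t_formula(x0, t0, k0))
print(central_second(lambda s: u(s, t0, k0), x0), u_xx_formula(x0, t0, k0))
# Residual of u_t = k^2 u_xx, which the two formulas imply.
print(u_t_formula(x0, t0, k0) - k0 ** 2 * u_xx_formula(x0, t0, k0))
```

Multiplying the $u_{xx}$ formula by $k^2$ reproduces the $u_t$ formula term by term, so under the stated assumption $u$ satisfies $u_t = k^2 u_{xx}$.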

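Section 9.7

1. (b) $\dfrac{\partial^3 f}{\partial x^3}(dx)^3 + 3\dfrac{\partial^3 f}{\partial x^2 \partial y}(dx)^2\,dy + 3\dfrac{\partial^3 f}{\partial x \partial y^2}\,dx\,(dy)^2 + \dfrac{\partial^3 f}{\partial y^3}(dy)^3$.

2. (a) $2y^2(3x + y)(dx)^2 + 12xy(x + y)\,dx\,dy + 2x^2(x + 3y)(dy)^2$.

(b) $\dfrac{6}{x^4 y}(dx)^2 + \dfrac{4}{x^3 y^2}\,dx\,dy + \dfrac{2}{x^2 y^3}(dy)^2$.

(c) $-y^2 \sin(xy)(dx)^2 + 2\big[\cos(xy) - xy\sin(xy)\big]\,dx\,dy - x^2 \sin(xy)(dy)^2$.

(d) $2f(x, y)\big[(2x^2 + 1)(dx)^2 + 4xy\,dx\,dy + (2y^2 + 1)(dy)^2\big]$.

(e) $\dfrac{1}{(x^2 + y)^2}\big[2(y - x^2)(dx)^2 - 4x\,dx\,dy - (dy)^2\big]$.

3. zero.

5. (a) $f + h_1 \dfrac{\partial f}{\partial x_1} + h_2 \dfrac{\partial f}{\partial x_2} + h_3 \dfrac{\partial f}{\partial x_3}$. The terms are evaluated at $a$.

8. By induction,
$$\frac{\partial^p f(x)}{\partial x_1^{p_1} \partial x_2^{p_2} \cdots \partial x_n^{p_n}} = b_1^{p_1} b_2^{p_2} \cdots b_n^{p_n}\,\varphi^{(p)}(b \cdot x),$$
hence
$$(x \cdot \nabla)^p f(0) = \varphi^{(p)}(0) \sum \binom{p}{p_1, p_2, \ldots, p_n} (b_1 x_1)^{p_1} (b_2 x_2)^{p_2} \cdots (b_n x_n)^{p_n} = \varphi^{(p)}(0)(b \cdot x)^p,$$
where the second equality follows from the multinomial theorem.

11. (a) $x + y - \tfrac{1}{6}(x + y)^3$. (d) $x + y - \tfrac{1}{3}(x + y)^3$.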

Section 9.8

2. $x^2 + 2y^2 + 3z^2 - xy - yz - xz = \tfrac{1}{2}\big[(x - y)^2 + (y - z)^2 + (x - z)^2\big] + y^2 + 2z^2 \ge 0$.

3. (a) (0, 0): local min; (−4/3, 4/3): saddle. (d) (1, 1), (−1, −1): local max; (0, 0): saddle. (f) (2, −2): saddle. (i) (1/3, 1/3): local max; (0, 0), (0, 1), (1, 0): saddle. 4. (a) Use polar coordinates to optimize the resulting single variable function g(θ) = cos θ +sin θ, g 0 (θ) = − sin θ +cos θ, 0 ≤ θ ≤ 2π. The √ critical points of g occur at values of θ that satisfy sin θ = cos θ = ± 2/2. At these √ values, g(θ) = ± 2. Also, g(0) √ = g(2π) = 1. Therefore, the maximum and minimum values of f are ± 2. 2 6. (b) The only critical point is (2/3, −1/3). On √bd(D),√f = x − x + 2, −1 ≤ x ≤ 1, which has critical point (±1/ 2, ±1/ 2). Checking the values of f at these√points and √ at (±1, 0) shows that the maximum of f is f (−1, 0) = f (−1/ 2, −1/ 2) = 2 and the minimum is f (2/3, −1/3) = −1/3. √ (d) The only critical point is (0, 0). On bd(D), f = ± sin x 1 − x2 , √ √ −1 ≤ x ≤ 1, which has critical points (±1/ 2, ±1/ 2). Checking the values of f at these points and at (±1, 0) shows that the extreme values of f are ± sin(1/2).

10. Since
$$\lim_{(x, y) \to (0^+, 0^+)} f(x, y) = \lim_{(x, y) \to (+\infty, +\infty)} f(x, y) = +\infty,$$
$f$ has a minimum on $(0, c) \times (0, d)$ for suitable $c, d > 0$, and the minimum must occur at a critical point. The unique critical point is $(a^{2/3} b^{-1/3}, a^{-1/3} b^{2/3})$, which gives the minimum $3(ab)^{1/3}$.

11. Let $f(m, b) = \sum_{i=1}^{n} (y_i - m x_i - b)^2$. Since not all $x$ coordinates are the same, $m$ must be bounded. Since the data is bounded, $b$ must be bounded. Therefore, the minimum exists and must occur at the unique critical point $(m, b)$ of $f$, which is determined by the system
$$\sum_{i=1}^{n} (y_i - m x_i - b)(-x_i) = \sum_{i=1}^{n} (y_i - m x_i - b)(-1) = 0.$$
It follows that $x \cdot y - m\|x\|^2 - nb\bar{x} = m\bar{x} - \bar{y} + b = 0$.

15. Let $f(x, y) = ax^2 + 2bxy + y^2$ and $g(x, y) = x^2 + y^2 - c^2$. The equation $\nabla f = \lambda \nabla g$ yields $ax + by = \lambda x$ and $bx + y = \lambda y$. Multiplying the first equation by $x$ and the second by $y$ and then adding yields $f(x, y) = \lambda(x^2 + y^2) = \lambda c^2$. Since the system $(a - \lambda)x + by = bx + (1 - \lambda)y = 0$ has a nontrivial solution iff the determinant of the coefficient matrix is zero, we obtain $\lambda^2 - (a + 1)\lambda + a - b^2 = 0$. Solving for $\lambda$ we see that the maximum and minimum values of $f$ on the circle are
$$\lambda c^2 = \Big[a + 1 \pm \sqrt{(a + 1)^2 - 4(a - b^2)}\Big](c^2/2).$$

17. We minimize $f(x, y, z) := (x - 1)^2 + (y - 2)^2 + (z - 3)^2$ subject to the constraint $g(x, y, z) := x^2 + y^2 - z = 0$. From $\nabla f = \lambda \nabla g$ we have $x - 1 = \lambda x$, $y - 2 = \lambda y$, $z - 3 = -\lambda/2$, from which it follows that $y = 2x$ and $z = 3 - (x - 1)/(2x)$. From $z = x^2 + y^2$ we then have $3 - (x - 1)/(2x) = 5x^2$, or $10x^3 - 5x - 1 = 0$.

19. We minimize $f(x, y, z) := (x - 1)^2 + (y - 2)^2 + (z - 3)^2$ subject to the constraint $g(x, y, z) := z^2 - x^2 - y^2 - 1 = 0$. From $\nabla f = \lambda \nabla g$ we have
$$x = \frac{1}{1 + \lambda}, \quad y = \frac{2}{1 + \lambda}, \quad z = \frac{3}{1 - \lambda},$$
hence $y = 2x$ and $z = 3x/(2x - 1)$. Substituting into $z^2 - x^2 - y^2 = 1$ yields the desired polynomial.

22. Let $f(x, y, z) = x + 2y + 3z$, $g_1(x, y, z) = x + y + z - 1$, and $g_2(x, y, z) = x^2 + y^2 + z^2 - 1$. From $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2$,
$$1 = \lambda_1 + 2\lambda_2 x, \quad 2 = \lambda_1 + 2\lambda_2 y, \quad 3 = \lambda_1 + 2\lambda_2 z.$$
Subtracting yields $1 = 2\lambda_2(y - x) = 2\lambda_2(z - y)$, so $y - x = z - y$ or $x - 2y + z = 0$. Combining this with the constraint $x + y + z = 1$ yields $y = 1/3$ and $z = 2/3 - x$. From the constraint $x^2 + y^2 + z^2 = 1$ we obtain $x^2 - 2x/3 - 2/9 = 0$, so $x = (1 \pm \sqrt{3})/3$. The maximum value of $f$ ($= 2 + 2\sqrt{3}/3 \approx 3.154701$) occurs when $x = (1 - \sqrt{3})/3$, the minimum ($= 2 - 2\sqrt{3}/3 \approx 0.845299$) when $x = (1 + \sqrt{3})/3$.
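The extreme values in Exercise 22 can be cross-checked by sampling the constraint set directly. The sketch below is an independent check, not the Lagrange multiplier argument itself: it parameterizes the circle where the plane $x + y + z = 1$ meets the unit sphere (center $(1/3, 1/3, 1/3)$, radius $\sqrt{2/3}$) using an orthonormal basis of the plane $x + y + z = 0$. The basis vectors and function names are choices made here for illustration.

```python
import math

def f(x, y, z):
    return x + 2 * y + 3 * z

def candidates():
    """Critical points from the solution: x = (1 +/- sqrt(3))/3, y = 1/3, z = 2/3 - x."""
    pts = []
    for sign in (1, -1):
        x = (1 + sign * math.sqrt(3)) / 3
        pts.append((x, 1 / 3, 2 / 3 - x))
    return pts

def circle_point(theta):
    """Point on the intersection of x + y + z = 1 with the unit sphere."""
    r = math.sqrt(2 / 3)
    e1 = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    e2 = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    return tuple(1 / 3 + r * (math.cos(theta) * p + math.sin(theta) * q)
                 for p, q in zip(e1, e2))

values = sorted(f(*p) for p in candidates())
samples = [f(*circle_point(2 * math.pi * i / 20000)) for i in range(20000)]
print(values, min(samples), max(samples))
```

The sampled minimum and maximum agree with the two critical values $2 \mp 2\sqrt{3}/3$ (about 0.8453 and 3.1547) to within the sampling error.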

24. We minimize $f(x) := \sum_{i=1}^{n} (x_i - b_i)^2$ subject to the constraint $g(x) := a \cdot x - c = 0$. From $\nabla f = \lambda \nabla g$ we have $(x_j - b_j) = \lambda a_j/2$, hence
$$(x_j - b_j)^2 = \frac{\lambda^2 a_j^2}{4} = \frac{\lambda(a_j x_j - a_j b_j)}{2}, \quad 1 \le j \le n.$$
Adding and using the constraint,
$$f(x) = \frac{\lambda^2}{4}\|a\|^2 \quad \text{and} \quad f(x) = \frac{\lambda}{2}(a \cdot x - a \cdot b) = \frac{\lambda}{2}(c - a \cdot b).$$
Therefore, $\lambda = 2(c - a \cdot b)\|a\|^{-2}$, which gives the desired conclusion.

26. Let $f(x) = \|x - a\|^2$ and $g(x) = \|x\|^2 - 1$. The equation $\nabla f = \lambda \nabla g$ leads to the system $x_j - a_j = \lambda x_j$, or $x_j(1 - \lambda) = a_j$, $j = 1, \ldots, n$. Therefore, $x_j = a_j/(1 - \lambda)$, so by the constraint $\|a\|^2 = \sum_{j=1}^{n} a_j^2 = (1 - \lambda)^2$, hence $x = \pm a/\|a\|$. The distance to the sphere is then the smaller of
$$\big\|\pm a\|a\|^{-1} - a\big\| = \big|1 \pm \|a\|^{-1}\big|\,\|a\|, \quad \text{namely} \quad \big|1 - \|a\|^{-1}\big|\,\|a\|.$$

27. (a) Let $f(x) = a \cdot x$ and $g(x) = \sum_{i=1}^{n} b_i/x_i - 1$. From $\nabla f = \lambda \nabla g$ we have $a_i = -\lambda b_i/x_i^2$, hence
$$x_i = \sqrt{\mu}\sqrt{b_i/a_i} \quad \text{and} \quad b_i x_i^{-1} = \sqrt{a_i b_i}/\sqrt{\mu}, \quad \mu := -\lambda.$$
The constraint implies that $\sqrt{\mu} = \sum_{i=1}^{n} \sqrt{a_i b_i}$. Since $a_i x_i = \sqrt{\mu}\sqrt{a_i b_i}$, the minimum is
$$\Big(\sum_{i=1}^{n} \sqrt{a_i b_i}\Big)^2.$$
That the value is indeed the minimum may be argued as follows. If $x$ is any point satisfying the constraint, then
$$f(x) = a_1 x_1 + a_2 x_2 + \cdots + a_{n-1} x_{n-1} + \frac{a_n b_n}{1 - \sum_{i=1}^{n-1} b_i/x_i},$$
where $x_i > b_i$. Thus $|f| \to +\infty$ as the variables $x_1, x_2, \ldots, x_{n-1}$ become large or as $\sum_{i=1}^{n-1} b_i/x_i$ nears 1. Therefore, the minimum occurs in the interior of a compact set, hence at the point obtained above.

30. Since $\mathrm{cl}(U)$ is compact, there exist points $u, v \in \mathrm{cl}(U)$ such that $f(u) \le f(x) \le f(v)$ for all $x \in \mathrm{cl}(U)$. If $f(u) = f(v)$, then $f$ is a constant function and the result follows. If $f(u) < f(v)$, then one of the points, say $u$, must lie in $U$. By 9.8.2, $f'(u) = 0$.
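The multiplier found in Exercise 24 of this section gives an explicit closest point on the hyperplane $a \cdot x = c$, and substituting it into $f(x) = \lambda^2\|a\|^2/4$ gives the squared distance $(c - a \cdot b)^2/\|a\|^2$. The sketch below checks this on sample data; the function names and the test data are hypothetical, and the closed form for the squared distance is derived here from the two displayed equations rather than quoted from the exercise.

```python
import random

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def closest_point(a, b, c):
    """Closest point to b on {x : a.x = c}, using lambda = 2(c - a.b)/|a|^2
    and x_j = b_j + lambda * a_j / 2."""
    lam = 2 * (c - dot(a, b)) / dot(a, a)
    return [bj + lam * aj / 2 for aj, bj in zip(a, b)], lam

def squared_distance(a, b, c):
    """f at the critical point: lambda^2 |a|^2 / 4 = (c - a.b)^2 / |a|^2."""
    return (c - dot(a, b)) ** 2 / dot(a, a)

def project_onto_hyperplane(a, y, c):
    """Move an arbitrary point y onto the hyperplane a.x = c along a."""
    shift = (c - dot(a, y)) / dot(a, a)
    return [yi + shift * ai for ai, yi in zip(a, y)]

a, b, c = [1.0, 2.0, 2.0], [3.0, 0.0, -1.0], 4.0
x_star, lam = closest_point(a, b, c)
print(x_star, lam, squared_distance(a, b, c))
```

For the sample data, $a \cdot b = 1$ and $\|a\|^2 = 9$, so $\lambda = 2/3$ and the squared distance is 1.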


Section 10.1 1. Let µ be as in 10.1.5 with pk = 1/k, or let µ be as in ??, and take Ak = {k, k + 1, . . .}. 3. By the inclusion-exclusion principle and additivity, µ(A ∪ B) = µ(A) + µ(B) − µ(A ∩ B) = µ(A), and µ(A) = µ(A \ B) + µ(A ∩ B) = µ(A \ B). 5. Let B = A1 ∪ · · · ∪ An . By 10.1.6(c), µ(A1 ∪ · · · ∪ An+1 ) = µ(B ∪ An+1 ) = µ(B) + µ(An+1 ) − µ(B ∩ An+1 ). By the induction hypothesis, µ(B) + µ(An+1 ) =

n+1 X i=1

µ(Ai ) −

n X

µ(Ai ∩ Aj ) + · · · + (−1)n−1 µ(A1 ∩ · · · ∩ An )

1≤i 0} ∈ F. Conversely, if E ∈ F and t ∈ R, then if t < 0, ∅ c {x : 1E (x) ≤ t} = E if 0 ≤ t < 1, and S if t ≥ 1. In each case, {x : 1E (x) ≤ t} ∈ F, hence 1E is measurable. 10. 1A∆B (x) = 1 iff 1A (x) − 1B (x) = 1 or 1B (x) − 1A (x) = 1 iff x ∈ A \ B or x ∈ B \ A. 14. The range of f is {1/k : k ∈ N}. Since f (x) = 1/k iff 1/x − 1 < k ≤ 1/x iff 1/(k + 1) < x ≤ 1/k, the assertion follows from Exercise 7. 17. Let ε > 0 and choose N ∈ N such that 2N > 1/ε and f ≤ N on S. Let k > N , so 0 ≤ f ≤ k. Then, in the notation of the proof of 10.5.8, k

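$$f_k = \sum_{j=1}^{k2^k} \frac{j - 1}{2^k}\,\mathbf{1}_{A_{k,j}}, \quad \text{where} \quad A_{k,j} = \big\{x \in S : (j - 1)2^{-k} \le f(x) < j2^{-k}\big\}, \quad j = 1, 2, \ldots, k2^k.$$
For any $x \in S$ there exists $j \in \{1, 2, \ldots, k2^k\}$ such that $x \in A_{k,j}$, hence $0 \le f(x) - f_k(x) = f(x) - (j - 1)2^{-k} \le 1/2^k < \varepsilon$.

19. (a) That $\mathcal{F}$ is a $\sigma$-field follows from properties of preimages.

(b) Since $f^{-1}(I_1 \times \cdots \times I_m) = \bigcap_{j=1}^{m} f_j^{-1}(I_j)$, $\mathcal{F}$ contains all intervals, hence, by minimality, $\mathcal{F} = \mathcal{B}(\mathbb{R}^m)$.

(c) If $A \in \mathcal{B}(\mathbb{R})$ and $B := F^{-1}(A)$, then $B \in \mathcal{B}(\mathbb{R}^m)$, hence, from (b), $g^{-1}(A) = f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)$.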


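Section 11.2

3. Since
$$f^{-1}\{d^2\} = \bigcup_{k=1}^{\infty} \big[d/10^k, (d + 1)/10^k\big) \cap I,$$
$$\int_{[0,1]} f\,d\lambda = \sum_{d=1}^{9} d^2\,\lambda\big(f^{-1}\{d^2\}\big) = \frac{1}{9}\sum_{d=1}^{9} d^2.$$

5. (a) If $\int_E |g|\,d\lambda = 0$, then $g = 0$ a.e., hence both integrals in (a) are zero. Suppose $\int_E |g|\,d\lambda \neq 0$. Since $m\int_E |g| \le \int_E f|g| \le M\int_E |g|$ on $E$, $a := \big(\int_E |g|\,d\lambda\big)^{-1} \int_E f|g|\,d\lambda$ satisfies the requirement.

(b) For example, take $E = (-1, 1)$ and $f = g = \mathbf{1}_{(-1,0)} - \mathbf{1}_{(0,1)}$, so
$$\int_E fg = \int_E \big(\mathbf{1}_{(-1,0)} + \mathbf{1}_{(0,1)}\big) = 2, \qquad \int_E g = 0.$$

(c) Given $\varepsilon > 0$, choose $\delta > 0$ such that $-\varepsilon < f(t) - f(x) < \varepsilon$ for all $t \in (x - \delta, x + \delta)$. If $y \in (x, x + \delta)$, then, by (a),
$$\int_{[a,y]} f\,d\lambda - \int_{[a,x]} f\,d\lambda - f(x)(y - x) = \int [f(t) - f(x)]\,\mathbf{1}_{[x,y]}(t)\,dt = a_y \int \mathbf{1}_{[x,y]}\,d\lambda = a_y(y - x),$$
where $|a_y| \le \varepsilon$. Dividing by $y - x$ proves t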


The first part of the text presents the calculus of functions of one variable. This part covers traditional topics, such as sequences, continuity, differentiability, Riemann integrability, numerical series, and the convergence of sequences and series of functions. It also includes optional sections on Stirling’s formula, functions of bounded variation, Riemann–Stieltjes integration, and other topics.


A COURSE IN REAL ANALYSIS

HUGO D. JUNGHENN

The George Washington University Washington, D.C., USA


CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2015 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Version Date: 20150109 International Standard Book Number-13: 978-1-4822-1928-9 (eBook - PDF) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. 
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

TO THE MEMORY OF MY PARENTS Rita and Hugo

Contents

Preface
List of Figures
List of Tables
List of Symbols

I Functions of One Variable

1 The Real Number System
1.1 From Natural Numbers to Real Numbers
1.2 Algebraic Properties of $\mathbb{R}$
1.3 Order Structure of $\mathbb{R}$
1.4 Completeness Property of $\mathbb{R}$
1.5 Mathematical Induction
1.6 Euclidean Space

2 Numerical Sequences
2.1 Limits of Sequences
2.2 Monotone Sequences
2.3 Subsequences and Cauchy Sequences
2.4 Limits Inferior and Superior

3 Limits and Continuity on $\mathbb{R}$
3.1 Limit of a Function
*3.2 Limits Inferior and Superior
3.3 Continuous Functions
3.4 Properties of Continuous Functions
3.5 Uniform Continuity

4 Differentiation on $\mathbb{R}$
4.1 Definition of Derivative and Examples
4.2 The Mean Value Theorem
*4.3 Convex Functions
4.4 Inverse Functions
4.5 L'Hospital's Rule
4.6 Taylor's Theorem on $\mathbb{R}$
*4.7 Newton's Method

5 Riemann Integration on $\mathbb{R}$
5.1 The Riemann–Darboux Integral
5.2 Properties of the Integral
5.3 Evaluation of the Integral
*5.4 Stirling's Formula
5.5 Integral Mean Value Theorems
*5.6 Estimation of the Integral
5.7 Improper Integrals
5.8 A Deeper Look at Riemann Integrability
*5.9 Functions of Bounded Variation
*5.10 The Riemann–Stieltjes Integral

6 Numerical Infinite Series
6.1 Definition and Examples
6.2 Series with Nonnegative Terms
6.3 More Refined Convergence Tests
6.4 Absolute and Conditional Convergence
*6.5 Double Sequences and Series

7 Sequences and Series of Functions
7.1 Convergence of Sequences of Functions
7.2 Properties of the Limit Function
7.3 Convergence of Series of Functions
7.4 Power Series

II Functions of Several Variables

8 Metric Spaces
8.1 Definitions and Examples
8.2 Open and Closed Sets
8.3 Closure, Interior, and Boundary
8.4 Limits and Continuity
8.5 Compact Sets
*8.6 The Arzelà–Ascoli Theorem
8.7 Connected Sets
8.8 The Stone–Weierstrass Theorem
*8.9 Baire's Theorem

9 Differentiation on $\mathbb{R}^n$
9.1 Definition of the Derivative
9.2 Properties of the Differential
9.3 Further Properties of the Differential
9.4 Inverse Function Theorem
9.5 Implicit Function Theorem
9.6 Higher Order Partial Derivatives
9.7 Higher Order Differentials and Taylor's Theorem
*9.8 Optimization

10 Lebesgue Measure on $\mathbb{R}^n$
10.1 General Measure Theory
10.2 Lebesgue Outer Measure
10.3 Lebesgue Measure
10.4 Borel Sets
10.5 Measurable Functions

11 Lebesgue Integration on $\mathbb{R}^n$
11.1 Riemann Integration on $\mathbb{R}^n$
11.2 The Lebesgue Integral
11.3 Convergence Theorems
11.4 Connections with Riemann Integration
11.5 Iterated Integrals
11.6 Change of Variables

12 Curves and Surfaces in $\mathbb{R}^n$
12.1 Parameterized Curves
12.2 Integration on Curves
12.3 Parameterized Surfaces
12.4 m-Dimensional Surfaces

13 Integration on Surfaces
13.1 Differential Forms
13.2 Integrals on Parameterized Surfaces
13.3 Partitions of Unity
13.4 Integration on Compact m-Surfaces
13.5 The Fundamental Theorems of Calculus
*13.6 Closed Forms in $\mathbb{R}^n$

III Appendices

A Set Theory
B Linear Algebra
C Solutions to Selected Problems

Bibliography
Index

Preface

The purpose of this text is to provide a rigorous treatment of the foundations of differential and integral calculus at the advanced undergraduate level. It is assumed that the reader has had the traditional three-semester calculus sequence and some exposure to elementary set theory and linear algebra. As regards the last two subjects, appendices provide a summary of most of the results used in the text. Linear algebra will not be needed until Part II.

The book consists of three parts. Part I treats the calculus of functions of one variable. Here, one can find the traditional topics: sequences, continuity, differentiability, Riemann integrability, numerical series, and convergence of sequences and series of functions. Optional sections on Stirling’s formula, Riemann–Stieltjes integration, and other topics are also included. As the ideas inherent in these subjects ultimately rest on properties of real numbers, the book begins with a careful treatment of the real number system. For this we take an axiomatic rather than a constructive approach, guided as much by the need for efficiency of exposition as by pedagogical preference. Of course, presenting the real number system in this way begs the excellent question as to whether such a system exists. It is a question we do not answer, but the interested reader may wish to consult a text on the construction of the real number system from the natural numbers, or even on the philosophy of mathematics.

Part II treats functions of several variables. Many of the results in Part I, such as the chain rule, the inverse function theorem, and the change of variables theorem, have counterparts in Part II. The reader’s exposure to the one-variable results should make the multivariable versions more meaningful and accessible. As might be expected, however, some results in Part II have no counterparts in Part I, the implicit function theorem and the iterated integral (Fubini–Tonelli) theorem being obvious examples.
Part II begins with a chapter on metric spaces. Here we introduce the topological ideas needed to describe some of the analytical properties of multivariable functions. Primary among these are the notions of compact set and connected set, which, for example, allow the extension to higher dimensions of the extreme value and intermediate value theorems. The remainder of Part II covers differentiability and integrability of multivariable functions. As regards integrability, we have chosen to develop from the beginning the Lebesgue integral rather than to extend the Riemann integral to higher dimensions. The additional time required for this approach is, in my view, more than offset by the enormous added utility of the Lebesgue integral. The last chapter of Part II develops the theory of differential forms on surfaces in $\mathbb{R}^n$. The chapter culminates with proofs of Stokes’s theorem and the divergence theorem for compact surfaces. It is hoped that exposure to these topics at the concrete level of surfaces in $\mathbb{R}^n$ will ease the transition to more advanced courses such as calculus on differentiable manifolds.

Part III consists of the aforementioned appendices on set theory and linear algebra, as well as solutions to some of the over 1600 exercises found in the text. For convenience, exercises with solutions that appear in the appendix are marked with a superscript S. Exercises that will find important uses later are marked with a downward arrow ⇓. Instructors with suitable bona fides may obtain from the publisher a manual of complete solutions to all of the exercises.

The book is an outgrowth of notes developed over many years of teaching real analysis to undergraduates at George Washington University. The more recent versions of these notes have been specifically tested in classes over the last three years. During this period, the typical two-semester course closely followed the non-starred sections of this text: Chapters 1–7 for the first semester and 8–13 for the second. Given the wealth of material, it was necessary to leave some proofs for students to read on their own, a not wholly unfortunate compromise. Material in some starred sections was assigned as optional reading.

I would like to express my gratitude to the many students whose critical eyes caught errors before they made their way into these pages. Of course, any remaining errors are my complete responsibility. Special thanks are due to Zehua Zhang, whose enlightened comments have improved the exposition of several topics. Finally, to my wife Mary for her support and understanding during the writing of this book: thank you!

Hugo D. Junghenn
Washington, D.C.
September 2014

List of Figures

1.1 Supremum and infimum of $A$
1.2 Greatest integer function

2.1 Convergence of a sequence
2.2 Squeeze principle
2.3 Interval halving process
2.4 Limits supremum and infimum

3.1 Limit of a function
3.2 $L$ can’t be greater than $M$
3.3 One-to-one correspondence between $D$ and $\mathbb{Q}$
3.4 Intermediate value property

4.1 Trigonometric inequality
4.2 Local extrema
4.3 Mean value theorems
4.4 Convex function
4.5 Convex function inequalities
4.6 Intermediate value property implies monotonicity
4.7 Intermediate value property implies continuity
4.8 Newton’s method

5.1 Upper and lower sums
5.2 The partitions $P$ and $Q$
5.3 The partition $P_n$
5.4 The partitions $P'$, $P$, and $P''$
5.5 Riemann sum
5.6 The partitions P x and P y
5.7 Trapezoidal rule approximation
5.8 Midpoint rule approximation
5.9 Simpson’s rule approximation
5.10 The partition $Q$

7.1 Uniform convergence
7.2 Pointwise convergence insufficient

8.1 An open ball is open
8.2 The functions $g_n$ and $g$
8.3 Convex and non-convex sets
8.4 The neighborhoods $U_x$ and $V_x$
8.5 A $2\varepsilon$ net
8.6 A bounded set in $\mathbb{R}^n$ is totally bounded
8.7 A separation $(U, V)$ of $E$
8.8 $C_1(-1, 0) \cup C_1(1, 0)$ is path connected
8.9 $E$ is path connected
8.10 A piecewise linear function
8.11 Sawtooth function

9.1 The domain of $\arg_{\theta_0}$
9.2 Saddle point

10.1 Interval grid
10.2 Coverings
10.3 Middle thirds
10.4 Ternary expansion algorithm
10.5 Decomposition into half-open intervals
10.6 $K = \mathrm{cl}(E) \setminus U$
10.7 The components of $f_k$
10.8 The components of $f_{k+1}$

11.1 Partition of an n-dimensional interval
11.2 Three-dimensional simplex
11.3 Concentric cube and ball
11.4 The paving $Q_r$
11.5 Theorem of Pappus

12.1 Curves in $\mathbb{R}^2$
12.2 A piecewise smooth curve with tangent vectors
12.3 Inscribed polygonal line
12.4 Vector field on $E$
12.5 Closed curve $\varphi$
12.6 Concatenation of curves
12.7 Tangent spaces at $p$
12.8 Affine space
12.9 The inward unit normal
12.10 Normal vector to $S$ at $p$
12.11 Surface of revolution
12.12 Möbius strip
12.13 The mapping $G_a^{-1}$
12.14 Transition mappings
12.15 Stereographic projection
12.16 The mapping $d\psi_x$
12.17 Cylinder-with-boundary
12.18 Surface element
12.19 Induced orientation of $T_a \partial S$
12.20 Stereographic projection from $p$

13.1 Parallelogram approximation to $\varphi(Q)$
13.2 Two dimensional simplex
13.3 A partition of unity subordinate to $U_1$ and $U_2$
13.4 The functions $h$ and $g$
13.5 The cubes $W_i$ and $V_i$
13.6 Regular region $E$
13.7 Annulus in $\mathbb{R}^2$ with exterior normal
13.8 The case $a \in E$
13.9 The case $a \in \mathrm{bd}(E)$
13.10 Regular region in $\mathbb{R}^2$
13.11 Piecewise smooth surfaces
13.12 Oriented cube without bottom face
13.13 Closed polygon
13.14 Surfaces $S_1$ and $S_2$ with common boundary $C$
13.15 Curves contracting to $p$ must pass through $q$
13.16 Boundary parametrization
13.17 Star-shaped and non-star-shaped regions

C.1 Open balls for Exercise 1

List of Tables

4.1 Newton’s method for $e^x + x - 2 = 0$
5.1 Table for evaluating $\int f h$ by parts
5.2 Table for evaluating $\int (x + 1)^3 e^{5x}\,dx$ by parts
5.3 A comparison of the methods
9.1 Values of $\Delta$
9.2 Values of $\Delta$

List of Symbols

$\mathbb{R}$: real number system
$\sum$: summation symbol
$\prod$: product symbol
$\mathbb{N}$: set of natural numbers
$\mathbb{Z}$: set of integers
$\mathbb{Q}$: set of rational numbers
$\mathbb{I}$: set of irrational numbers
$n!$: n factorial
$a < b$: less than
$b > a$: greater than
$a \le b$: less than or equal
$b \ge a$: greater than or equal
$|x|$: absolute value of $x$
$\max S$: maximum of $S$
$\min S$: minimum of $S$
$x^+$: positive part of $x$
$x^-$: negative part of $x$
$\sup A$: supremum of $A$
$\inf A$: infimum of $A$
$\lfloor x \rfloor$: greatest integer in $x$
$\overline{\mathbb{R}}$: extended real number system
$+\infty, -\infty$: positive infinity, negative infinity
$(a, b)$: open interval
$(a, b]$: left-open interval
$[a, b)$: right-open interval
$[a, b]$: closed interval
$\binom{n}{k}$: binomial coefficient
$\mathbb{R}^n$: Euclidean space
$x \cdot y$: Euclidean inner product
$\|x\|_2$: Euclidean norm
$\|x\|_1$: $\ell^1$ norm
$\|x\|_\infty$: max norm
$a \times b$: cross product
$\lim_n a_n$: limit of a sequence
$a_n \uparrow$: increasing sequence of real numbers
$a_n \downarrow$: decreasing sequence of real numbers
$a_n \uparrow a$: sequence increases to $a$
$a_n \downarrow a$: sequence decreases to $a$
$\liminf_n a_n$: limit infimum of a sequence
$\limsup_n a_n$: limit supremum of a sequence
$N(a) = N_r(a)$: neighborhood of $a$
$\lim_{x \to a,\ x \in E} f(x)$: limit of $f$ along $E$
$\lim_{x \to a^-} f(x)$: left-hand limit
$\lim_{x \to a^+} f(x)$: right-hand limit
$\lim_{x \to a} f(x)$: two-sided limit
$\lim_{x \to +\infty} f(x)$: limit at $+\infty$
$\lim_{x \to -\infty} f(x)$: limit at $-\infty$
$\liminf_{x \to a,\ x \in E} f(x)$: limit inferior of $f$ along $E$
$\limsup_{x \to a,\ x \in E} f(x)$: limit superior of $f$ along $E$
$f' = Df = \frac{df}{dx}$: derivative of $f$
$D_\ell f(a) = f'_\ell(a)$: left-hand derivative at $a$
$D_r f(a) = f'_r(a)$: right-hand derivative at $a$
$f^{(n)}$: nth derivative of $f$
$T_n(x, a)$: Taylor polynomial
$R_n(x, a)$: Taylor remainder
$\|\mathcal{P}\|$: mesh of partition $\mathcal{P}$
$\underline{S}(f, \mathcal{P})$: lower Darboux sum
$\overline{S}(f, \mathcal{P})$: upper Darboux sum
$\underline{\int_a^b} f$: lower Darboux integral
$\overline{\int_a^b} f$: upper Darboux integral
$\int_a^b f$: Riemann–Darboux integral
$\mathcal{R}_a^b$: set of Riemann integrable functions on $[a, b]$
$S(f, \mathcal{P}, \xi)$: Riemann sum
$\int f$: indefinite integral of $f$
$V_a^b(f)$: total variation of $f$ on $[a, b]$
$S_w(f, \mathcal{P}, \xi)$: Riemann–Stieltjes sum
$\int_a^b f\,dw$: Riemann–Stieltjes integral
$\overline{S}_w(f, \mathcal{P})$: upper Darboux–Stieltjes sum
$\underline{S}_w(f, \mathcal{P})$: lower Darboux–Stieltjes sum
$\overline{\int_a^b} f\,dw$: upper Darboux–Stieltjes integral
$\underline{\int_a^b} f\,dw$: lower Darboux–Stieltjes integral
$\sum a_n = \sum_{n=1}^\infty a_n$: infinite series of real numbers
$\lim_m \lim_n a_{m,n}$: iterated limit
$\lim_{m,n} a_{m,n}$: double limit
$\sum_{m,n} a_{m,n}$: double infinite series
$\sum_{j=1}^\infty \sum_{k=1}^\infty a_{j,k}$: iterated series
$\sum_n f_n = \sum_{n=1}^\infty f_n$: infinite series of functions
$\sum_{n=0}^\infty c_n (x - a)^n$: power series in $x$ about $a$
$R = \rho^{-1}$: radius of convergence
$\binom{a}{n}$: generalized binomial coefficient
$(X, d)$: metric space
$\|x\|$: norm of $x$
$(\mathcal{X}, \|\cdot\|)$: normed vector space
$d_2$: Euclidean metric on $\mathbb{R}^n$
$d_1$: $\ell^1$ metric on $\mathbb{R}^n$
$d_\infty$: max metric on $\mathbb{R}^n$
$B(S)$: space of bounded $f : S \to \mathbb{R}$
$\|f\|_\infty$: supremum norm of $f$
$\ell^\infty$: set of bounded sequences
$\ell^1$: set of summable sequences
$\|a\|_1$: $\ell^1$ norm of $a$
$d \times \rho$: product metric
$B_r(x)$: open ball
$C_r(x)$: closed ball
$S_r(x)$: sphere
$C([a, b])$: space of cont. $f$ on $[a, b]$
$D([a, b])$: space of diff. $f$ on $[a, b]$
$[a : b]$: line segment from $a$ to $b$
$\mathrm{cl}(E)$: closure of $E$
$\mathrm{int}(E)$: interior of $E$
$\mathrm{bd}(E)$: boundary of $E$
$\lim_{x \to a,\ x \in E} f(x)$: limit of $f$ along $E$
$\lim_{(x,y) \to (a,b)} f(x, y)$: double limit
$\lim_{x \to a} \lim_{y \to b} f(x, y)$: iterated limit
$d(A)$: diameter of $A$
$d(A, B)$: distance between $A$ and $B$
$C(X, Y)$: set of cont. $f : X \to Y$
$\mathrm{ext}(E)$: exterior of $E$
$C(X)$: space of cont. $f : X \to \mathbb{R}$
$\partial_j f = f_{x_j} = \frac{\partial f}{\partial x_j}$: partial derivative of $f$
$\nabla f$ or $\mathrm{grad}\, f$: gradient of $f$
$df_a : \mathbb{R}^n \to \mathbb{R}^m$: differential of $f$ at $a$
$f'(a)$: Jacobian matrix of $f$ at $a$
$J_f(a) = \frac{\partial(f_1, \ldots, f_n)}{\partial(x_1, \ldots, x_n)}(a)$: Jacobian of $f$
$\frac{\partial^m f}{\partial x_{i_1}^{m_1} \cdots \partial x_{i_n}^{m_n}}$: higher order partial derivative
$D^m f_x$: mth total differential of $f$
$\binom{m}{m_1, m_2, \ldots, m_n}$: multinomial coefficient
$T_m(x, a)$: Taylor polynomial
$R_m(x, a)$: Taylor remainder term
$\lambda^*$: Lebesgue outer measure
$\mathcal{M} = \mathcal{M}(\mathbb{R}^n)$: Lebesgue measurable sets
$\lambda = \lambda_n$: Lebesgue measure on $\mathbb{R}^n$
$\mathcal{B} = \mathcal{B}(\mathbb{R}^n)$: Borel measurable sets
$1_A$: indicator function of $A$
$S^+(\mathcal{F})$: set of $\mathcal{F}$-measurable simple functions $\ge 0$
$\int f\,d\lambda$: Lebesgue integral of $f$
$\int_E f\,d\lambda$: Lebesgue integral of $f$ on $E$
$L^1(E)$: space of integrable functions on $E$
$\int_{\mathbb{R}^p} \int_{\mathbb{R}^q} f(x, z)\,dz\,dx$: iterated integral
$f * g$: convolution of $f$ and $g$
$\alpha_n$: volume of unit ball in $\mathbb{R}^n$
$\mathrm{length}(\varphi)$: length of curve $\varphi$
$\int_\varphi f\,ds$: line integral over $\varphi$
$\vec{F}$: vector field
$\vec{T}_\varphi$: unit tangent vector field along $\varphi$
$f_1\,dx_1 + \cdots + f_n\,dx_n$: 1-form in $\mathbb{R}^n$
$\omega \cdot \vec{H}$: inner product of a form and vector field
$\varphi$: parameterized m-surface
$T_{\varphi(u)}$: tangent space of $\varphi$
$\mathrm{sign}(\varphi)$: sign of parametrization $\varphi$
$\partial\varphi^\perp$: normal vector to surface $\varphi$
$\vec{N}_\varphi$: normal vector field
$\varphi_a : U_a \to S_a$: local parametrization of $S$
$\varphi_{ab}$: transition mapping
$S_a$: surface element
$\mathbb{R}^{n-1}_+$: $\mathbb{R}^{n-1}$ with $x_{n-1} > 0$
$\mathbb{H}^{n-1}$: $\mathbb{R}^{n-1}$ with $x_{n-1} \ge 0$
$\partial\mathbb{H}^{n-1}$: boundary of $\mathbb{H}^{n-1}$
$\partial S$: boundary of $S$
$S \setminus \partial S$: interior of $S$
$dx_j$: multidifferential
$\omega_x$: differential form
$\omega \wedge \eta$: wedge product
$d\omega$: differential of $\omega$
$\varphi^*\omega$: pullback of $\omega$ by $\varphi$
$\mathrm{area}(\varphi)$: area of $\varphi$
$\int_\varphi f\,dS$: integral of $f$ on a para. surface
$\int_\varphi \omega$: integral of a form on a para. surface
$\mathcal{L}(U, V)$: set of linear transformations $T : U \to V$
$A^t$: transpose of $A$
$[T]$: matrix of $T$
$\det A$: determinant of $A$

Part I

Functions of One Variable

Chapter 1 The Real Number System

If the notion of limit is the cornerstone of analysis, then the real number system is the bedrock. In this chapter we provide a description of the real number system that is sufficiently detailed to allow a careful development of limit in the various forms that appear in this book. The real number system is defined as a nonempty set $\mathbb{R}$ together with two algebraic operations, called addition and multiplication, and an ordering, called less than, that collectively satisfy three sets of axioms: the algebraic or field axioms, the order axioms, and the completeness axiom. These are discussed in Sections 1.2–1.4. We begin, however, with a brief description of how the real number system may be constructed from a more fundamental number system.

1.1 From Natural Numbers to Real Numbers

A rigorous construction of the real number system starts with the set of natural numbers (positive integers) $\mathbb{N}$ and then proceeds to the set of integers $\mathbb{Z}$, the rational number system $\mathbb{Q}$, and, finally, the real number system $\mathbb{R}$. In this approach the natural numbers are assumed to satisfy a set of axioms called the Peano Axioms. These are used to define the operations of addition and multiplication in $\mathbb{N}$. Subtraction is introduced by enlarging the system of natural numbers to $\mathbb{Z}$, thereby allowing solutions of all equations of the form $x + m = n$, $m, n \in \mathbb{Z}$. To obtain division, $\mathbb{Z}$ is enlarged to $\mathbb{Q}$ by forming all quotients $m/n$, where $m, n \in \mathbb{Z}$, $n \neq 0$. In this system, one may solve all equations of the form $ax + b = c$, $a \neq 0$. The final step, the construction of $\mathbb{R}$ from $\mathbb{Q}$, may be viewed as “filling in the gaps” of the rational number line, these gaps corresponding to the so-called irrational numbers.$^1$ For the details of this “bottom up” approach, the interested reader is referred to [7] or [10]. We shall instead take a “top down” approach, describing the real number system axiomatically.

$^1$ This step results in a system that, while having the structure necessary to formulate a robust theory of limits, does not allow solutions of all polynomial equations. This shortcoming is removed by introducing complex numbers, a subject outside the scope of this book.
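The successive enlargements described in this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, Python's built-in integers and `fractions.Fraction` stand in for $\mathbb{Z}$ and $\mathbb{Q}$, and the last function merely searches a finite range of denominators, so it suggests (but does not prove) that $x^2 = 2$ has no rational solution.

```python
from fractions import Fraction


def solve_additive(m: int, n: int) -> int:
    """Solve x + m = n over the integers; the solution is n - m."""
    return n - m


def solve_linear(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Solve a*x + b = c over the rationals, assuming a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return (c - b) / a


def has_rational_square_root_of_two(max_denominator: int) -> bool:
    """Search for p/q with 1 <= q <= max_denominator and (p/q)**2 == 2."""
    for q in range(1, max_denominator + 1):
        # Any positive candidate satisfies 1 < p/q < 2, that is, q < p < 2q.
        for p in range(q + 1, 2 * q):
            if p * p == 2 * q * q:
                return True
    return False
```

For instance, `solve_additive(5, 2)` returns a negative integer, which is why $\mathbb{N}$ alone does not suffice for subtraction, and the search for a rational square root of 2 comes back empty for every bound tried, which is one of the "gaps" that $\mathbb{R}$ fills.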


1.2 Algebraic Properties of $\mathbb{R}$

In this section we list the axioms that govern the use of addition ($+$) and multiplication ($\cdot$) in the real number system. These axioms lead to all of the familiar algebraic properties of real numbers. The operations of addition and multiplication satisfy the following field axioms, where $a$, $b$, $c$ denote arbitrary members of $\mathbb{R}$:

• Closure under addition: $a + b \in \mathbb{R}$.
• Associative law of addition: $(a + b) + c = a + (b + c)$.
• Commutative law of addition: $a + b = b + a$.
• Existence of an additive identity: There exists a member $0$ of $\mathbb{R}$ such that $a + 0 = a$ for all $a \in \mathbb{R}$.
• Existence of additive inverses: For each $a \in \mathbb{R}$ there exists a member $-a$ of $\mathbb{R}$ such that $a + (-a) = 0$.
• Closure under multiplication: $a \cdot b \in \mathbb{R}$.
• Associative law of multiplication: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
• Commutative law of multiplication: $a \cdot b = b \cdot a$.
• Existence of a multiplicative identity: There exists a real number $1 \neq 0$ such that $a \cdot 1 = a$ for all $a \in \mathbb{R}$.
• Existence of multiplicative inverses: For each $a \neq 0$ there exists a member $a^{-1}$ of $\mathbb{R}$ such that $a \cdot a^{-1} = 1$.
• Distributive law: $a \cdot (b + c) = a \cdot b + a \cdot c$.

We use the following standard notation:
\[ \frac{a}{b} = a/b = ab^{-1}, \quad a + b + c = (a + b) + c = a + (b + c), \quad abc = (ab)c = a(bc), \quad a - b = a + (-b), \quad ab = a \cdot b, \]
\[ a^n = \underbrace{aa \cdots a}_{n}, \quad a^{-n} = 1/a^n \ (a \neq 0), \quad \text{and} \quad a^0 = 1. \]
We also use the summation and product symbols $\sum$ and $\prod$ defined by
\[ \sum_{j=m}^{n} a_j = a_m + a_{m+1} + \cdots + a_n \quad \text{and} \quad \prod_{j=m}^{n} a_j = a_m a_{m+1} \cdots a_n. \]
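The notation just introduced can be mirrored in a few lines of Python. This is a minimal sketch, not part of the text: the names `summation`, `product`, and `power` are hypothetical, the terms $a_j$ are supplied as a function of the index $j$, and `fractions.Fraction` is assumed as a stand-in for exact arithmetic with real numbers.

```python
from fractions import Fraction
from math import prod


def summation(a, m: int, n: int):
    """Return a(m) + a(m+1) + ... + a(n); an empty range gives 0."""
    return sum(a(j) for j in range(m, n + 1))


def product(a, m: int, n: int):
    """Return a(m) * a(m+1) * ... * a(n); an empty range gives 1."""
    return prod(a(j) for j in range(m, n + 1))


def power(a: Fraction, n: int):
    """Integer powers following the conventions a^0 = 1 and a^(-n) = 1/a^n."""
    if n >= 0:
        # a^n is the product of n copies of a (the empty product when n = 0).
        return product(lambda j: a, 1, n)
    if a == 0:
        raise ZeroDivisionError("a^(-n) requires a != 0")
    return 1 / power(a, -n)
```

Here `summation(lambda j: j, 1, 10)` adds the integers from 1 to 10, and `power(Fraction(2), -3)` applies the convention $a^{-n} = 1/a^n$.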

The field axioms may be used to derive the standard rules of algebra. Some of these are given in the following proposition; others may be found in Exercise 1.


1.2.1 Proposition. The following algebraic properties hold in $\mathbb{R}$:

(a) The additive identity is unique; that is, if $0'$ is a real number such that $a + 0' = a$ for all $a \in \mathbb{R}$, then $0' = 0$.
(b) The additive inverse of a real number is unique; that is, if $a + b = 0$, then $b = -a$.
(c) The multiplicative identity is unique; that is, if $1'$ is a real number such that $a \cdot 1' = a$ for all $a \in \mathbb{R}$, then $1' = 1$.
(d) $a \cdot 0 = 0$ for all $a \in \mathbb{R}$.
(e) The multiplicative inverse of a nonzero real number is unique; that is, if $ab = 1$, then $b = 1/a$.
(f) If $ab = 0$, then either $a = 0$ or $b = 0$.
(g) If $ab = ac$ and $a \neq 0$, then $b = c$.
(h) If $b \neq 0$ and $d \neq 0$, then $a/b = c/d$ if and only if $ad = bc$.
(i) If $a \neq 0$ and $b \neq 0$, then $(ab)^{-1} = a^{-1} b^{-1}$, or
\[ \frac{1}{ab} = \frac{1}{a} \cdot \frac{1}{b}. \]

Proof. (a) If $a + 0' = a$ for all $a$ then, in particular, $0 + 0' = 0$. But, by definition of $0$ and commutativity of addition, $0 + 0' = 0'$. Therefore $0' = 0$.

(b) By associativity and commutativity of addition, $b = b + 0 = 0 + b = (-a + a) + b = -a + (a + b) = -a + 0 = -a$.

(c) If $a \cdot 1' = a$ for all $a$ then, in particular, $1 \cdot 1' = 1$. But, by definition of the multiplicative identity and commutativity of multiplication, $1 \cdot 1' = 1'$. Therefore $1' = 1$.

(d) By the distributive property, $a \cdot 0 = a(0 + 0) = a \cdot 0 + a \cdot 0$. Adding $-(a \cdot 0)$ to both sides of this equation and using associativity of addition produces the desired equation.

(e) By associativity and commutativity of multiplication, $b = 1 \cdot b = (a^{-1} a) b = a^{-1} (ab) = a^{-1} \cdot 1 = a^{-1}$.

(f) Assume $a \neq 0$. By (d) and commutativity and associativity of multiplication, $0 = a^{-1} \cdot 0 = (a^{-1})(ab) = (a^{-1} a) b = 1 \cdot b = b$.


(g) By commutativity and associativity of multiplication, $b = 1 \cdot b = (a^{-1} a) b = a^{-1} (ab) = a^{-1} (ac) = (a^{-1} a) c = c$.

(h) If $a/b = c/d$, then multiplying both sides by $bd$ and using the commutativity and associativity of multiplication we obtain $ad = bc$. Conversely, if $ad = bc$, then multiplying both sides by $1/(bd)$ yields $a/b = c/d$.

(i) By associativity and commutativity of multiplication, $(ab)(a^{-1} b^{-1}) = (a a^{-1})(b b^{-1}) = 1$. Now apply (e).

The reader will notice that the assertions in the proposition are implications, that is, statements of the form $p$ implies $q$, frequently written $p \Rightarrow q$. Such assertions may be proved directly by assuming $p$ and then deducing $q$, or indirectly by assuming the negation of $q$ and arguing to a contradiction or to the negation of $p$. Part (h) also contains an assertion of the form $p$ if and only if $q$ (hereafter, shortened to $p$ iff $q$). Such an assertion is established by proving both $p \Rightarrow q$ and $q \Rightarrow p$. Throughout the text, we shall encounter many examples of such proofs. The reader is advised that a careful proof requires that each (nontrivial) step be justified by citing hypotheses, appropriate axioms, or previously proved results.

One more point of logic: To prove that a general statement involving the universal quantifier “for every” (or “for all”) is false, one must construct a counterexample. For example, the assertion that $xy = x + y$ for all real numbers $x$ and $y$ is clearly false. For a proof, one need only find a single pair of numbers $x$ and $y$ such that $xy \neq x + y$, for example $x = y = 1$. On the other hand, to prove that $x^2 - y^2 = (x - y)(x + y)$ for all real numbers $x$ and $y$, it is not sufficient to verify the statement for a specific pair of numbers; a general proof is needed here. For details on constructing proofs in mathematics, the reader is referred to [2].

The number systems described in Section 1.1 are summarized as follows:

• $\mathbb{N} = \{1,\ 2 := 1 + 1,\ 3 := 2 + 1,\ \ldots\}$ (positive integers),
• $\mathbb{Z} = \{0, \pm 1, \pm 2, \pm 3, \ldots\}$ (integers),
• $\mathbb{Q} = \{m/n : m, n \in \mathbb{Z},\ n \neq 0\}$ (rational numbers),
• $\mathbb{I} = \mathbb{R} \setminus \mathbb{Q}$ (irrational numbers).

An integer $n$ is said to be even (odd) if $n = 2k$ ($n = 2k + 1$) for some $k \in \mathbb{Z}$. A precise definition of $\mathbb{N}$ is given in Section 1.5. From this it is possible to argue rigorously that the number system $\mathbb{N}$ is closed under addition and multiplication. As a consequence, $\mathbb{Z}$ is closed under addition, subtraction, and multiplication, and $\mathbb{Q}$ is closed under addition, subtraction, multiplication, and division (Exercise 2).
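The closure of $\mathbb{Q}$ can be made concrete by representing a rational number as a pair of integers $(m, n)$ with $n \neq 0$ and carrying out each operation using integer arithmetic only. The sketch below is a numerical illustration, not a proof: the pair representation and the function names are assumptions made for this example, and `fractions.Fraction` is used only to compare results.

```python
from fractions import Fraction


def add(r, s):
    """(m/n) + (p/q) = (mq + np)/(nq), using only integer operations."""
    (m, n), (p, q) = r, s
    return (m * q + n * p, n * q)


def sub(r, s):
    """(m/n) - (p/q) = (mq - np)/(nq)."""
    (m, n), (p, q) = r, s
    return (m * q - n * p, n * q)


def mul(r, s):
    """(m/n)(p/q) = (mp)/(nq)."""
    (m, n), (p, q) = r, s
    return (m * p, n * q)


def div(r, s):
    """(m/n)/(p/q) = (mq)/(np), defined only when p != 0."""
    (m, n), (p, q) = r, s
    if p == 0:
        raise ZeroDivisionError("cannot divide by a zero rational")
    return (m * q, n * p)


def to_fraction(r) -> Fraction:
    """Convert an integer pair (m, n), n != 0, to a Fraction for comparison."""
    return Fraction(r[0], r[1])
```

In each case the result is again a pair of integers whose second entry is a product of nonzero integers, hence nonzero by Proposition 1.2.1(f); this is the observation behind Exercise 2.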


Exercises 1. Prove the following properties of addition and multiplication in R: (a) −(−a) = a.

(b)S −(ab) = (−a)b = a(−b).

(c)⇓2 (−a)(−b) = ab.

(d)S (−1)a = −a.

a/b ad ad = = . c/d bc bc c ad + bc a . (f)S If b 6= 0 and d 6= 0, then + = b d bd (e) If b, d 6= 0, then

2. Let r, s ∈ Q. Assuming that Z is closed under addition and multiplication, prove that r ± s, rs, r/s ∈ Q, the last provided that s 6= 0. 3.S If r 6= 0 ∈ Q and x ∈ I, prove that r ± x, rx, r/x ∈ I. 4. Let n ∈ N. Prove the following identities without using mathematical induction: n X n 3 n (a) ⇓ x − y = (x − y) xn−j y j−1 . j=1

(b)

xn + y n = (x + y)

n X (−1)j−1 xn−j y j−1 if n is odd. j=1

(c)

x−n − y −n = (y − x)

n X

xj−n−1 y −j if x 6= 0 and y 6= 0.

j=1

5.S Define $0! = 1$ and, for $n \in \mathbb{N}$, define $n! = n(n - 1) \cdots 2 \cdot 1$ (n factorial). Prove the following:

(a) $(1 - 1/n)(1 - 2/n) \cdots \bigl(1 - (n - 1)/n\bigr) = \dfrac{n!}{n^n}$.

(b) $1 \cdot 3 \cdot 5 \cdots (2n - 1) = \dfrac{(2n)!}{2^n\, n!}$.

6.⇓4 For $n \in \mathbb{Z}^+$ and $k = 0, 1, \ldots, n$, define the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$
(read “n choose k”). Prove that
$$\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}.$$

2 This exercise will be used in 1.3.2.
3 This exercise will be used in 4.1.2.
4 This exercise will be used in 1.5.5.

7. Without using mathematical induction, prove that for any $n \in \mathbb{N}$,

(a) $\displaystyle \sum_{k=0}^{n} \frac{1}{(k + 1)(n - k + 1)} = \frac{2}{n + 2} \sum_{k=0}^{n} \frac{1}{k + 1}$.

(b) $\displaystyle \sum_{k=0}^{n} \frac{1}{(2k + 1)(2n - 2k + 1)} = \frac{1}{n + 1} \sum_{k=0}^{n} \frac{1}{2k + 1}$.

8.S Find a polynomial $f(x)$ of degree 2 such that $\sum_{k=1}^{n} f(k) = n^3$ for all n.

1.3 Order Structure of R

The order relation on $\mathbb{R}$ is derived from the following order axiom.

There exists a nonempty subset $P$ of $\mathbb{R}$, closed under addition and multiplication, such that for each $x \in \mathbb{R}$ exactly one of the following holds: $x \in P$, $-x \in P$, or $x = 0$.

The last part of the axiom is known as the trichotomy property. A real number x is called positive if $x \in P$ and negative if $-x \in P$.

1.3.1 Definition. Let a and b be real numbers. If $b - a \in P$, we write $a < b$ or $b > a$ and say that a is less than b or that b is greater than a. ♦

1.3.2 Proposition. The order relation < on $\mathbb{R}$ has the following properties:

(a) $a < b$ iff $-a > -b$ (reflection property).

(b) If $a < b$ and $b < c$, then $a < c$ (transitive property).

(c) If $a < b$ and $c < d$, then $a + c < b + d$ (addition property).

(d) If $a < b$ and $c > 0$, then $ac < bc$ (multiplication property).

(e) For $a, b \in \mathbb{R}$, exactly one of the following is true: $a = b$, $a < b$, or $b < a$ (trichotomy property).

(f) If $x \ne 0$, then $x^2 > 0$. In particular, $1 > 0$.

Proof. (a) $a < b$ iff $(-a) - (-b) = b - a \in P$ iff $-a > -b$.

(b) By hypothesis, $b - a \in P$ and $c - b \in P$, hence, by closure under addition, $c - a = (b - a) + (c - b) \in P$, that is, $a < c$.

(c) Similar to (b).

(d) Since $b - a,\ c \in P$, $bc - ac = (b - a)c \in P$, that is, $ac < bc$.

(e) This follows by applying the trichotomy property of $P$ to $a - b$.

(f) If $x > 0$, then, by closure of $P$ under multiplication, $x^2 > 0$. If $x < 0$, then $-x > 0$ so, by Exercise 1.2.1(c), $x^2 = (-x)(-x) > 0$.


1.3.3 Definition. Let a and b be real numbers. If either $a < b$ or $a = b$, we write $a \le b$ or $b \ge a$ and say that a is less than or equal to b or that b is greater than or equal to a. If $A \subseteq \mathbb{R}$, we define $A^+ = \{x \in A : x \ge 0\}$. ♦

Note that by the trichotomy property,
$$a \le b \ \text{ and } \ b \le a \ \Rightarrow\ a = b. \tag{1.1}$$

The inequality $a \le b$ is sometimes called weak inequality in contrast to strict inequality $a < b$. The reader may check that parts (a)–(d) of the above proposition are valid if strict inequality is replaced by weak inequality.

1.3.4 Definition. The absolute value of a real number x is defined by
$$|x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$$
♦

For example, $|0| = 0$ and $|2| = |-2| = 2$.

1.3.5 Proposition. Absolute value has the following properties:

(a) $|x| \ge 0$.

(b) $|x| = 0$ iff $x = 0$.

(c) $|-x| = |x|$.

(d) $-|x| \le x \le |x|$.

(e) $|xy| = |x|\,|y|$.

(f) $\left|\dfrac{x}{y}\right| = \dfrac{|x|}{|y|}$ $(y \ne 0)$.

(g) $|x + y| \le |x| + |y|$.

(h) $\bigl|\,|x| - |y|\,\bigr| \le |x - y|$.

Properties (g) and (h) are the triangle inequalities.

Proof. Properties (a)–(e) are easily established by considering cases. For example, in (e), if $x \ge 0$ and $y \le 0$, then $xy \le 0$, hence $|xy| = -(xy) = x(-y) = |x|\,|y|$. For part (f), use (e) to obtain
$$|x| = \left|\frac{x}{y}\, y\right| = \left|\frac{x}{y}\right| |y|,$$
and then divide both sides by $|y|$. For (g), we have $\pm x \le |x|$ and $\pm y \le |y|$ by (d), hence $\pm(x + y) \le |x| + |y|$. Since one of the signed quantities on the left is $|x + y|$, the assertion follows. From (g) we have $|x| = |(x - y) + y| \le |x - y| + |y|$, hence $|x| - |y| \le |x - y|$. Switching x and y and using (c) yields (h).


1.3.6 Definition. Let S be a nonempty set of real numbers. The largest element or maximum of S is a member max S of S that satisfies max S ≥ s for all s ∈ S. The smallest element or minimum of S, denoted by min S, is defined analogously. A set may not have a largest or smallest member. The existence of max S and min S for a nonempty finite set may be established by mathematical induction. (See Exercise 1.5.2.) 1.3.7 Definition. The positive and negative parts of a real number x are defined by x+ = max{x, 0} and x− = max{−x, 0}. ♦

Exercises

Prove the following:

1. (a) If $ab > 0$, then a and b have the same sign. (b) $a > 0$ iff $1/a > 0$. (c)S Suppose either $b, d < 0$ or $b, d > 0$. Then $a/b > c/d$ iff $ad > bc$.

2. If $x > 1$, then $x^2 > x$. If $0 < x < 1$, then $x^2 < x$.

3. (a) If $0 < x < y$ and $0 < a < b$, then $0 < ax < by$. (b) If $x < y < 0$ and $a < b < 0$, then $0 < by < ax$. (c) Let $x, y > 0$. Then $x < y$ iff $x^2 < y^2$.

4.S If either $0 < x < y$ or $x < y < 0$, then $1/y < 1/x$.

5. If $-1 < x < y$ or $x < y < -1$, then $x/(x + 1) < y/(y + 1)$. What if $x < -1 < y$?

6. If $0 < x < y$ and $n \in \mathbb{N}$, then

(a)S $0 < y^n - x^n \le n(y - x)y^{n-1}$,

(b) $\dfrac{ny + 1}{nx + 1} < \dfrac{(n + 1)y + 1}{(n + 1)x + 1}$.

7. If $x > 1$, $m, n \in \mathbb{N}$, and $\dfrac{x - 1}{x} < \dfrac{m}{n} < 1$, then $n > x$.

8.S If $a < b$ and $0 < t < 1$, then $a < ta + (1 - t)b < b$. In particular, $a < (a + b)/2 < b$.

9. $x^2 + y^2 + axy \ge 0$ for all $x, y \in \mathbb{R}$ iff $|a| \le 2$.

10.S If $a \le b + x$ for every $x > 0$, then $a \le b$.


11. If $0 < a \le bx$ for every $x > 1$, then $a \le b$.

12. If $a/x \le x + 1$ for every $x > 0$, then $a \le 0$.

13. For all $x, y, z, w \in \mathbb{R}$,

(a) $2xy \le x^2 + y^2$.

(b)S $xy + yz + xz \le x^2 + y^2 + z^2$.

(c) $(xy + zw)^2 \le (x^2 + z^2)(y^2 + w^2)$.

(d) $(x + y)^2 \le 2(x^2 + y^2)$.

14.S If $x, a > 0$, then $x + a^2/x \ge 2a$. Equality holds iff $x = a$.

15. (a) $|x - y| \le |x - z| + |z - y|$. (b) $|x - L| < \varepsilon$ iff $L - \varepsilon < x < L + \varepsilon$.

16. Let $S, T \subseteq \mathbb{R}$ be finite and nonempty. Define $-S := \{-s : s \in S\}$. Then

(a) $\max(-S) = -\min S$.

(b) $\min(-S) = -\max S$.

(c) $\max(S \cup T) = \max\{\max S, \max T\}$.

(d) $\min(S \cup T) = \min\{\min S, \min T\}$.

17. For any $x, y \in \mathbb{R}$,

(a) $x^+ \ge 0$, $x^- \ge 0$, $x = x^+ - x^-$, and $|x| = x^+ + x^-$.

(b) $x^+ = \bigl(|x| + x\bigr)/2$ and $x^- = \bigl(|x| - x\bigr)/2$.

(c) $x = y - z$ and $|x| = y + z$ imply $y = x^+$ and $z = x^-$.

(d) $(x + y)^+ \le x^+ + y^+$ and $(x + y)^- \le x^- + y^-$.

(e) $(x - y)^- \le y$, if $x, y \ge 0$.

18.S If $a \le x \le b$, then $|x| \le \max\{|a|, |b|\}$.

19. (a) $\max\{x, y\} = \bigl(x + y + |x - y|\bigr)/2$. (b) $\min\{x, y\} = \bigl(x + y - |x - y|\bigr)/2$.

20. (a) $\max\{a, b, c\} = \tfrac{1}{4}\Bigl(a + b + 2c + |a - b| + \bigl|a + b - 2c + |a - b|\bigr|\Bigr)$.

(b) $\min\{a, b, c\} = \tfrac{1}{4}\Bigl(a + b + 2c - |a - b| - \bigl|a + b - 2c - |a - b|\bigr|\Bigr)$.

21.S Let $S = \{a_1, \ldots, a_n\}$, where $a_1 < \cdots < a_n$. Let $1 \le k < n$ and denote by $S_1, \ldots, S_m$ the subsets obtained by removing exactly k members from S, where $m = \binom{n}{k}$ is the binomial coefficient (see Theorem 1.5.5). Then $\max\{\min S_1, \ldots, \min S_m\} = a_{k+1}$.
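The closed formulas for max and min in Exercises 19 and 20 can be spot-checked numerically before attempting a proof. The sketch below is only a sanity check on a small grid of integers, not a proof; the function names `max2`, `min2`, `max3`, `min3` and the sample values are illustrative choices that do not appear in the text.

```python
from itertools import product

def max2(x, y):
    # Exercise 19(a): max{x, y} = (x + y + |x - y|) / 2
    return (x + y + abs(x - y)) / 2

def min2(x, y):
    # Exercise 19(b): min{x, y} = (x + y - |x - y|) / 2
    return (x + y - abs(x - y)) / 2

def max3(a, b, c):
    # Exercise 20(a), written as max{max{a, b}, c} expanded via 19(a)
    return (a + b + 2 * c + abs(a - b) + abs(a + b - 2 * c + abs(a - b))) / 4

def min3(a, b, c):
    # Exercise 20(b), written as min{min{a, b}, c} expanded via 19(b)
    return (a + b + 2 * c - abs(a - b) - abs(a + b - 2 * c - abs(a - b))) / 4

# Small integer samples keep the floating-point divisions exact.
samples = [-3, -1, 0, 2, 5]
pairs_ok = all(
    max2(x, y) == max(x, y) and min2(x, y) == min(x, y)
    for x, y in product(samples, repeat=2)
)
triples_ok = all(
    max3(a, b, c) == max(a, b, c) and min3(a, b, c) == min(a, b, c)
    for a, b, c in product(samples, repeat=3)
)
print(pairs_ok, triples_ok)
```

Agreement on the samples only suggests that the formulas are stated correctly; the exercises still require an argument by cases.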

1.4 Completeness Property of R

A system (F, +, ·, 0 there exists n ∈ N such that na > b.

Proof. Suppose, for a contradiction, that na ≤ b for all n ∈ N. The set S = {na : n ∈ N} is then bounded above and hence has a least upper bound u. Since u − a < u, the approximation property for suprema implies that u − a < na for some n ∈ N. But then u < (n + 1)a ∈ S, contradicting that u is an upper bound for S.

1.4.5 Example. Let
$$A = \left\{ (-1)^n \frac{n}{n+1} : n \in \mathbb{N} \right\} = \left\{ -\frac{1}{2},\ \frac{2}{3},\ -\frac{3}{4},\ \ldots \right\}.$$

Since A is bounded above by 1 and below by −1, $-1 \le \inf A \le \sup A \le 1$. Let $0 < r < 1$. By the Archimedean principle we may choose an even integer n such that $n > r/(1 - r)$. Then $r < n/(n + 1) \in A$, which shows that r cannot be an upper bound of A. Therefore, $\sup A = 1$. Similarly, $\inf A = -1$. ♦

1.4.6 Well-Ordering Principle. Every nonempty subset A of $\mathbb{N}$ has a smallest member.

Proof. Since A is bounded below by 1, it has a greatest lower bound $\ell$. The theorem will follow if we show that $\ell \in A$. Suppose, for a contradiction, that $\ell \notin A$. By the approximation property for infima, there exists $a \in A$ such that $\ell < a < \ell + 1$. Choose any real number r with $\ell < r < a$, for example, $r = (a + \ell)/2$. By the approximation property again, there exists $a' \in A$ such that $\ell < a' < r$. We now have $\ell < a' < a < \ell + 1$, which implies that $a - a'$ is an integer strictly between 0 and 1. As this is impossible,$^5$ it follows that $\ell$ must be a member of A.

1.4.7 Greatest Integer Function. For each $x \in \mathbb{R}$ there exists a unique integer $\lfloor x \rfloor$ such that $x - 1 < \lfloor x \rfloor \le x$.

Proof. The uniqueness is clear. To prove existence, apply the Archimedean principle twice: first to obtain an integer k such that $x + k \ge 1$ and then to conclude that the set $A := \{n \in \mathbb{N} : n > x + k\}$ is nonempty. By the well-ordering principle, A has a least member a. Since $1 \le x + k < a$, $a - 1$ is a positive integer. Since $a - 1 < a$, $a - 1$ cannot be in A so $x + k \ge a - 1$. Therefore, $x - 1 < a - 1 - k \le x$, hence the integer $\lfloor x \rfloor := a - 1 - k$ has the required property.

FIGURE 1.2: Greatest integer function.

The integer $\lfloor x \rfloor$ is called the greatest integer in x or the floor of x. The greatest integer function allows a simple proof of the following important result:

5 This is intuitively clear. The abstract definition of N given in Section 1.5 may be used to give a rigorous proof.

1.4.8 Density of the Rationals. Between any pair of distinct real numbers there is a rational number.

Proof. Let $a < b$. By the Archimedean principle, $n(b - a) > 1$ for some $n \in \mathbb{N}$. Let $m := \lfloor na \rfloor + 1$. Then $na < m \le na + 1 < nb$, hence $a < m/n < b$.

1.4.9 Definition. (nth roots). Let n be a positive integer and let $b > 0$. The unique positive solution of the equation $x^n = b$ is called the positive nth root of b and is denoted by $b^{1/n}$. For $m \in \mathbb{Z}$ we define $b^{m/n} = (b^{1/n})^m$. As usual we write $\sqrt{b}$ for $b^{1/2}$. ♦

The existence of $b^{1/n}$ is an easy consequence of the intermediate value theorem, proved in Chapter 3. Uniqueness follows from Exercise 1.2.4(a). We omit the straightforward (but admittedly tedious) proof of the following theorem that summarizes the familiar rules of rational exponentiation.

1.4.10 Theorem. For $r, s \in \mathbb{Q}$ and positive real numbers a, b,
$$b^r b^s = b^{r+s}, \quad \frac{b^r}{b^s} = b^{r-s}, \quad (b^r)^s = b^{rs}, \quad \text{and} \quad (ab)^r = a^r b^r.$$

The following proposition gives a simple way to generate irrational numbers.

1.4.11 Proposition. If n is a positive integer that is not a perfect square, then $\sqrt{n}$ is irrational.

Proof. By definition of the greatest integer function, $\sqrt{n} - 1 < \lfloor \sqrt{n} \rfloor \le \sqrt{n}$. Since $\sqrt{n}$ is assumed not to be an integer, the second inequality is strict, hence $0 < \sqrt{n} - \lfloor \sqrt{n} \rfloor < 1$. Suppose, for a contradiction, that $\sqrt{n}$ is rational. Then the set $A := \{m \in \mathbb{N} : m\sqrt{n} \in \mathbb{N}\}$ is nonempty. By the well-ordering principle, A has a least member $m_0$. In particular, $m_0\sqrt{n} \in \mathbb{N}$, hence both of the quantities
$$m := m_0\bigl(\sqrt{n} - \lfloor \sqrt{n} \rfloor\bigr) \quad \text{and} \quad m\sqrt{n} = m_0\bigl(n - \sqrt{n}\,\lfloor \sqrt{n} \rfloor\bigr)$$
are positive integers. But then $m \in A$, which is impossible since $m < m_0$. Therefore, $\sqrt{n}$ must be irrational.

In later chapters, we shall see other important examples of irrational numbers, notably the base e of the natural logarithm.

1.4.12 Definition. The extended real number system is the set $\overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$, where $+\infty$, $-\infty$ are symbols with the following prescribed properties:

$-\infty < x < +\infty$ for all $x \in \mathbb{R}$,

$x + \infty = +\infty$ if $-\infty < x \le +\infty$,

$x - \infty = -\infty$ if $-\infty \le x < +\infty$,

$x \cdot (+\infty) = +\infty$ if $0 < x \le +\infty$,

$x \cdot (+\infty) = -\infty$ if $-\infty < x < 0$,

$x \cdot (-\infty) = -\infty$ if $0 < x < +\infty$,

$x \cdot (-\infty) = +\infty$ if $-\infty \le x < 0$,

$\dfrac{x}{+\infty} = \dfrac{x}{-\infty} = 0$ if $-\infty < x < +\infty$.

♦

The above algebraic conventions are derived from limit considerations. Note that the operations
$$\pm\infty \mp \infty, \quad (\pm\infty) \cdot (\mp\infty), \quad \frac{\pm\infty}{\pm\infty}, \quad \frac{\pm\infty}{\mp\infty}, \quad \text{and} \quad 0 \cdot (\pm\infty) \tag{1.2}$$
are not defined.

1.4.13 Definition. If $A \ne \emptyset$ is not bounded above, we set $\sup A = +\infty$. Similarly, if A is not bounded below, we set $\inf A = -\infty$. We also define $\sup \emptyset = -\infty$ and $\inf \emptyset = +\infty$. ♦

The reader may verify that the approximation properties for suprema and infima given in 1.4.3 hold in the extended system $\overline{\mathbb{R}}$.

1.4.14 Definition. An interval in $\mathbb{R}$ is a nonempty set I with the property that $a, b \in I$ and $a < x < b$ imply that $x \in I$. An interval containing more than one point is said to be nondegenerate. ♦

Arguing cases, one may show that the definition of interval reduces to the following familiar subsets of $\mathbb{R}$:

$(a, b) := \{x : a < x < b\}$,

$(a, b] := \{x : a < x \le b\}$,

$[a, b) := \{x : a \le x < b\}$,

$[a, b] := \{x : a \le x \le b\}$.

For example, if I is unbounded below and bounded above with $b := \sup I \in I$, then $I = (-\infty, b]$. If, instead, I is bounded below and above such that $a := \inf I \in I$ and $b := \sup I \notin I$, then $I = [a, b)$. Intervals that contain their endpoints are said to be closed; those that don’t contain any endpoints are called open. The length $|a - b|$ of a finite interval I with endpoints a, b will be denoted by $|I|$. Note that the length of a degenerate interval is zero.
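The conventions of 1.4.12 and 1.4.13 can be compared with floating-point infinities, which follow similar but not identical rules. The sketch below uses Python's `math.inf`; the helper names `sup_finite` and `inf_finite` are illustrative and apply only to finite collections, and the floating-point behavior shown is that of IEEE arithmetic, not a definition taken from the text.

```python
import math

inf = math.inf

# Operations that the conventions of 1.4.12 define.
defined = {
    "5 + inf": 5 + inf,      # +inf
    "5 - inf": 5 - inf,      # -inf
    "2 * inf": 2 * inf,      # +inf
    "-2 * inf": -2 * inf,    # -inf
    "5 / inf": 5 / inf,      # 0.0
}

# Operations listed in (1.2) as undefined; floats return NaN for these.
undefined = {
    "inf - inf": inf - inf,
    "inf / inf": inf / inf,
    "0 * inf": 0 * inf,
}

# One difference: floats assign a value to (+inf) * (-inf),
# while (1.2) leaves that product undefined.
float_only = inf * (-inf)

def sup_finite(values):
    # sup of a finite collection, with the convention sup(empty) = -inf
    return max(values, default=-inf)

def inf_finite(values):
    # inf of a finite collection, with the convention inf(empty) = +inf
    return min(values, default=inf)

print(defined)
print({name: math.isnan(value) for name, value in undefined.items()})
print(float_only, sup_finite([]), inf_finite([]))
```

The empty-set conventions make the inequality sup A ≤ sup B for A ⊆ B hold even when A is empty, which is one reason they are convenient.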

Exercises

1. Prove that $\inf(-A) = -\sup A$, where $-A := \{-a : a \in A\}$. Conclude that every nonempty subset of $\mathbb{R}$ that is bounded below has a greatest lower bound.

2. Find the supremum and infimum of the following sets, where $r_n$ denotes the remainder on division of $n \in \mathbb{N}$ by 3.$^6$

(a)S $\bigl\{(-1)^n (r_n^2 + 3r_n + 2) : n \in \mathbb{N}\bigr\}$.

(b)S $\left\{(-1)^n \dfrac{n(r_n - 1)}{(n + 1)(r_n + 1)} : n \in \mathbb{N}\right\}$.

(c) $\left\{(-1)^n \dfrac{3n + 2}{2n + 3} : n \in \mathbb{N}\right\}$.

(d) $\left\{\dfrac{\bigl((-1)^{\lfloor n/3 \rfloor} - 1\bigr)\, n}{n + 1} : n \in \mathbb{N}\right\}$.

6 For the existence of $r_n$, see Exercise 1.5.15.

3. Find the supremum and infimum of the following sets.

(a) $\{x : x^2 - 5x + 6 < 0\}$.

(b) $\{x : (x + 3)(x - 4) < -6\}$.

(c) $\{x : (x - 4)/(x - 3) < -2\}$.

(d)S $\{x : x - 2 < 1/(x - 1)\}$.

(e)S $\{x : (x - 1)/x < 4\}$.

(f)S $\{x > 0 : x/(2 - x) > 3\}$.

(g) $\{x : |x^2 - 3x + 2| \le 1/4\}$.

(h) $\{x : |x - 1| + |x - 2| \le 3\}$.

(i)S $\bigl\{x : \sqrt{x - 1/8} > x\bigr\}$.

(j) $\bigl\{x : \sqrt{x + 1/8} > x\bigr\}$.

(k)S $\{x : 2|x - 1| + 3|y - 2| < 6 \text{ for some } y \in \mathbb{R}\}$.

(l) $\{x : 2|x^2 - 1| + 3|y^2 - 2| < 6 \text{ for some } y \in \mathbb{R}\}$.

(m)S $\{(-1)^n \sin(n\pi/2) - n^{-1} : n \in \mathbb{N}\}$.

(n) $\{(-1)^n \sin(m\pi/2) - n^{-1} : m, n \in \mathbb{N}\}$.

4. Let $A \subseteq B$ be nonempty subsets of $\mathbb{R}$. Prove that $\sup A \le \sup B$ and $\inf A \ge \inf B$.

5.S ⇓7 For a nonempty bounded set A define $|A| := \{|a| : a \in A\}$. Prove that $\sup |A| - \inf |A| \le \sup A - \inf A$. Hint. Use $\bigl|\,|x| - |y|\,\bigr| \le |x - y|$.

6. For $r \in \mathbb{Q}$, $x \in \mathbb{R}$, and nonempty subsets A and B of $\mathbb{R}$, define

$xA = \{xa : a \in A\}$,

$A + B = \{a + b : a \in A,\ b \in B\}$,

$AB = \{ab : a \in A,\ b \in B\}$,

$A^r = \{a^r : a \in A\}$, $A \subseteq (0, +\infty)$.

Under the conventions described in 1.4.12, prove that

(a) $\sup(A + B) \le \sup A + \sup B$, $\inf(A + B) \ge \inf A + \inf B$.

(b)S $\sup(xA) = x \sup A$, $\inf(xA) = x \inf A$ if $x \ge 0$.

(c) $\sup(AB) \le (\sup A)(\sup B)$ and $\inf(AB) \ge (\inf A)(\inf B)$ if $A, B \subseteq (0, \infty)$.

(d) $\sup A^r = (\sup A)^r$, $\inf A^r = (\inf A)^r$ if $A \subseteq (0, \infty)$ and $r > 0$.

(e) $\sup A^{-1} = 1/\inf A$, $\inf A^{-1} = 1/\sup A$ if $A \subseteq (0, \infty)$.

7. Let $A \subseteq \mathbb{R}$ be nonempty such that $\inf\{|x - y| : x, y \in A,\ x \ne y\} > 0$ (for example, any set of integers). If A is bounded above, prove that $\sup A \in A$, that is, A has a maximum.

8. Let A be a nonempty bounded set and let $r \in \mathbb{R}$ such that $x - y < r$ for all $x, y \in A$. Show that $\sup A - \inf A \le r$.

9.S Prove that between any pair of distinct real numbers there is an irrational number.

7 This exercise will be used in 5.2.6.

10. Prove that between any pair of real numbers $a < b$ there exist infinitely many rational numbers and infinitely many irrational numbers.

11. (Density of the dyadic rationals). Prove that for each pair of real numbers $a < b$ there exist $m \in \mathbb{Z}$ and $n \in \mathbb{N}$ such that $a < m/2^n < b$. (Suggestion. You might want to use the fact that $2^n > n$, a consequence of the binomial theorem, proved in the next section.) A number of the form $m/2^n$ is called a dyadic rational.

12. Prove:

(a) $\lfloor x \rfloor = \lfloor -x \rfloor$ iff $x = 0$.

(b)S $\lfloor x \rfloor = -\lfloor -x \rfloor$ iff $x \in \mathbb{Z}$.

(c)S $-1 < x + \lfloor -x \rfloor \le 0$.

(d) $\lfloor x \rfloor + \lfloor m - x \rfloor = m$ or $m - 1$.

13. Let $m \in \mathbb{Z}$, $n \in \mathbb{N}$, $x_j \in \mathbb{R}$, and define
$$s := \sum_{j=0}^{n} x_j \quad \text{and} \quad t := \sum_{j=0}^{n} \lfloor x_j \rfloor.$$
Prove:

(a) $0 \le \lfloor s \rfloor - t \le n$.

(b) $k \le s - t < k + 1$ for some $k = 0, 1, \ldots, n$.

14.S Let $b > 0$. Prove that $b^{m/n} = (b^m)^{1/n}$.

15. ⇓8 Prove that for $a, b > 0$ and $n \in \mathbb{N}$,
$$a^{1/n} - b^{1/n} = (a - b) \left( \sum_{j=1}^{n} a^{1 - j/n}\, b^{(j-1)/n} \right)^{-1}.$$

16. Show that if $0 \le a < b$ and $n \in \mathbb{N}$, then $a^{1/n} < b^{1/n}$.

17.S Prove that if A is a bounded set, then there exists an integer N such that $|x| \le N$ for all $x \in A$.

18. Let $a, b \in \mathbb{Q} \setminus \{0\}$ and $n \in \mathbb{N}$. Prove that $x := a + b\sqrt{n}$ is irrational iff n is not a perfect square.

19. Show that if $x, y \in \mathbb{Q}(\sqrt{2})$, then $x \pm y$, $xy$, $x/y \in \mathbb{Q}(\sqrt{2})$, the last provided that $y \ne 0$. Conclude that $\mathbb{Q}(\sqrt{2})$ is an ordered subfield of $\mathbb{R}$. Show that $\mathbb{Q}(\sqrt{2})$ is not complete.

20.S (a) Find all $n \in \mathbb{N}$ such that $\sqrt{n + 11} + \sqrt{n} \in \mathbb{Q}$. (b) Same question for $\sqrt{n + 21} + \sqrt{n}$.

21. Let $p \in \mathbb{N}$ be prime, that is, divisible only by 1 and itself. Prove that $(\sqrt{n} + 1)(\sqrt{n + p} + 1)^{-1} \in \mathbb{Q}$ iff $n = (p - 1)^2/4$.

8 This exercise will be used in 4.1.2.

1.5 Mathematical Induction

In this section we give an abstract characterization of the natural number system. This will lead directly to the principle of mathematical induction. 1.5.1 Definition. A set S of real numbers is said to be inductive if • 1 ∈ S, • x ∈ S implies x + 1 ∈ S. The set N of natural numbers is then defined as the intersection of all inductive subsets of R. ♦ The sets (a, +∞), and (a, +∞) ∩ Q, a < 1, are clearly inductive. More importantly, N itself is inductive. Indeed, since 1 is common to all inductive sets, 1 ∈ N, and if n is common to all inductive sets, then so is n + 1. We may therefore characterize N as the smallest inductive set (in the sense of set inclusion). The principle of mathematical induction follows immediately from this characterization: 1.5.2 Principle of Mathematical Induction. For each n ∈ N, let P (n) be a statement depending on n. Suppose that (a) P (1) is true, (b) P (n + 1) is true whenever P (n) is true. Then P (n) is true for all n. Proof. Let S denote the set of n ∈ N for which P (n) is true. Then (a) and (b) imply that S is inductive and hence, as a subset of N, must in fact equal N. In a particular application of 1.5.2, part (a) is called the base step and part (b) the inductive step. The assumption in (b) that P (n) is true is called the induction hypothesis. The principle of mathematical induction has been loosely described as the “domino principle”: If dominoes are lined up vertically in such a way that the (n + 1)st domino will fall if the nth one falls, then, if the first domino is tipped, all the dominoes will fall. Mathematical induction may be used to give a rigorous proof that N is closed under addition: Let P (n) be the statement that n + m ∈ N for all m ∈ N. Then P (1) is true because N is inductive, and if, for some n, P (n) is true, that is, if n + m ∈ N for all m, then clearly P (n + 1) is true. A similar argument shows that N is closed under multiplication. 
Mathematical induction is indispensable in proving many useful inequalities and formulas. We offer two examples; others may be found in the exercises.


1.5.3 Example. We prove by induction that $3^n n! > n^n$ for all $n \in \mathbb{N}$. This is obvious for $n = 1$. For the induction step, we need the fact (verified in Example 2.2.4) that $(1 + 1/n)^n < 3$, or equivalently, $(n + 1)^n < 3n^n$, for all n. Assuming this, we see that if $3^n n! > n^n$, then
$$3^{n+1}(n + 1)! = 3(n + 1)\,3^n n! > 3(n + 1)n^n > (n + 1)^{n+1}.$$
♦

1.5.4 Example. We derive a closed formula for $f(n) := \sum_{k=1}^{n} (3k - 1)^2$ and then verify the result by induction. A little experimentation suggests that we should try a polynomial in n of degree 3, say $g(n) := An^3 + Bn^2 + Cn + D$. Then
$$g(n + 1) - g(n) = A\bigl[(n + 1)^3 - n^3\bigr] + B\bigl[(n + 1)^2 - n^2\bigr] + C\bigl[(n + 1) - n\bigr] = 3An^2 + (3A + 2B)n + A + B + C$$
and
$$f(n + 1) - f(n) = \sum_{k=1}^{n+1} (3k - 1)^2 - \sum_{k=1}^{n} (3k - 1)^2 = \bigl(3(n + 1) - 1\bigr)^2 = 9n^2 + 12n + 4.$$
Assuming that $f(n) = g(n)$ for all n, we may equate coefficients to obtain $A = 3$, $B = 3/2$, and $C = -1/2$. Since $f(1) = 4$, we see that $D = 0$. Thus, under the assumption that the sum has a closed form that is a cubic polynomial, we obtain the formula
$$\sum_{k=1}^{n} (3k - 1)^2 = 3n^3 + \tfrac{3}{2} n^2 - \tfrac{1}{2} n.$$
To prove the validity of the formula we use induction. When $n = 1$, each side equals 4. Assuming the formula holds for n, we have
$$\sum_{k=1}^{n+1} (3k - 1)^2 = \sum_{k=1}^{n} (3k - 1)^2 + \bigl(3(n + 1) - 1\bigr)^2 = \bigl(3(n + 1) - 1\bigr)^2 + 3n^3 + \tfrac{3}{2} n^2 - \tfrac{1}{2} n.$$
A little algebra shows that the last expression reduces to $3(n + 1)^3 + \tfrac{3}{2}(n + 1)^2 - \tfrac{1}{2}(n + 1)$. Thus the formula holds for $n + 1$, completing the induction. ♦

The stalwart reader may wish to use the methods of the last example to derive and then verify by induction the formula
$$\sum_{k=1}^{n} k^4 = \frac{n\bigl(6n^4 + 15n^3 + 10n^2 - 1\bigr)}{30}.$$
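The coefficient matching in Example 1.5.4 can be replayed with exact rational arithmetic. The following sketch solves the three equations $3A = 9$, $3A + 2B = 12$, $A + B + C = 4$ in order, fixes $D$ from $f(1)$, and compares the resulting cubic with the sum for a finite range of n. It is a numerical check under those assumptions, not a substitute for the induction; the range of 50 values is an arbitrary choice.

```python
from fractions import Fraction

def f(n):
    # f(n) = sum_{k=1}^{n} (3k - 1)^2, computed directly
    return sum((3 * k - 1) ** 2 for k in range(1, n + 1))

# Equate coefficients of g(n+1) - g(n) = 3A n^2 + (3A + 2B) n + (A + B + C)
# with f(n+1) - f(n) = 9 n^2 + 12 n + 4.
A = Fraction(9, 3)
B = (Fraction(12) - 3 * A) / 2
C = Fraction(4) - A - B
D = Fraction(f(1)) - (A + B + C)   # forces g(1) = f(1)

def g(n):
    return A * n**3 + B * n**2 + C * n + D

formula_holds = all(f(n) == g(n) for n in range(1, 51))

# The quartic formula stated after the example, checked the same way.
quartic_holds = all(
    30 * sum(k**4 for k in range(1, n + 1))
    == n * (6 * n**4 + 15 * n**3 + 10 * n**2 - 1)
    for n in range(1, 51)
)
print(A, B, C, D, formula_holds, quartic_holds)
```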

There are many other types of applications of the principle of mathematical induction, some of which are given in the exercises. The following has important consequences in combinatorics, probability theory, and infinite series.

1.5.5 Binomial Theorem. Let $a, b \in \mathbb{R}$ and $n \in \mathbb{N}$. Then
$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}, \quad \text{where} \quad \binom{n}{k} := \frac{n!}{k!\,(n - k)!}.$$

Proof. For $n = 1$ the formula asserts that
$$a + b = \binom{1}{0} a^0 b^1 + \binom{1}{1} a^1 b^0,$$
which follows from the convention $0! = 1$. Suppose that the formula holds for some $n \ge 1$. Writing $(a + b)^{n+1}$ as $(a + b)(a + b)^n$ and using the induction hypothesis, we have
$$\begin{aligned}
(a + b)^{n+1} &= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} \\
&= \sum_{k=1}^{n} \binom{n}{k-1} a^k b^{n+1-k} + \sum_{k=1}^{n} \binom{n}{k} a^k b^{n+1-k} + a^{n+1} + b^{n+1} \\
&= \sum_{k=1}^{n} \left[ \binom{n}{k-1} + \binom{n}{k} \right] a^k b^{n+1-k} + a^{n+1} + b^{n+1} \\
&= \sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k},
\end{aligned}$$
where, for the last step, we used Exercise 1.2.6. By induction, the formula holds for all n.
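The two ingredients of the proof, the identity of Exercise 1.2.6 and the expansion itself, can be checked on small integers. The sketch below relies on `math.comb` from the Python standard library; the helper name `binomial_expansion` and the tested ranges are illustrative choices, and the check is finite, so it complements rather than replaces the induction.

```python
from math import comb

def binomial_expansion(a, b, n):
    # Right-hand side of 1.5.5: sum_{k=0}^{n} C(n, k) a^k b^(n-k)
    return sum(comb(n, k) * a**k * b ** (n - k) for k in range(n + 1))

# Exercise 1.2.6: C(n+1, k) = C(n, k-1) + C(n, k) for 1 <= k <= n
pascal_ok = all(
    comb(n + 1, k) == comb(n, k - 1) + comb(n, k)
    for n in range(1, 20)
    for k in range(1, n + 1)
)

# The expansion compared with direct exponentiation on a small integer grid
expansion_ok = all(
    binomial_expansion(a, b, n) == (a + b) ** n
    for a in range(-3, 4)
    for b in range(-3, 4)
    for n in range(1, 8)
)
print(pascal_ok, expansion_ok)
```

Integer inputs keep the comparison exact; note that Python evaluates `0**0` as 1, which matches the convention used in the $k = 0$ and $k = n$ terms.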

Exercises

1. ⇓9 Let $0 < a < x_1,\ y_1 < b := a + 1$ and define
$$x_{n+1} = a + \sqrt{|x_n - a|} \quad \text{and} \quad y_{n+1} = b - \sqrt{|b - y_n|}.$$
Prove that $a < x_n < x_{n+1} < b$ and $a < y_{n+1} < y_n < b$ for all $n \in \mathbb{N}$.

2. Use induction to prove that a nonempty finite set has a maximum and a minimum.

3.S ⇓10 Verify by induction that
$$\sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \sum_{k=n+1}^{2n} \frac{1}{k} \quad \text{for all } n \ge 1.$$

9 This exercise will be used in 2.2.3.
10 This exercise will be used in 6.4.8.

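4. Establish the following formulas by mathematical induction:

(a) $\displaystyle \sum_{k=1}^{n} k = n(n + 1)/2$.

(b) $\displaystyle \sum_{k=1}^{n} k^2 = n(n + 1)(2n + 1)/6$.

(c) $\displaystyle \sum_{k=1}^{n} k^3 = [n(n + 1)/2]^2$.

(d) $\displaystyle \sum_{k=1}^{n} (2k - 1)^2 = n(4n^2 - 1)/3$.

(e) $\displaystyle \sum_{k=1}^{n} (2k - 1)^3 = n^2(2n^2 - 1)$.

(f) $\displaystyle \sum_{k=1}^{n} \frac{1}{\sqrt{k - 1} + \sqrt{k}} = \sqrt{n}$.

(g) $\displaystyle \sum_{k=1}^{n} (4k^3 - 6k^2 + 4k - 1) = n^4$.

(h) $\displaystyle \sum_{k=1}^{n} \frac{2k + \sqrt{k(k - 1)} - 1}{\sqrt{k} + \sqrt{k - 1}} = n\sqrt{n}$.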

5.S Use the methods of 1.5.4 to derive and verify a closed formula for $\sum_{k=1}^{n} (5k - 4)^2$.

6. Use known formulas to calculate

(a) $1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 + \cdots + 999 \cdot 1000$.

(b)S $1 \cdot 3 + 3 \cdot 5 + 5 \cdot 7 + \cdots + 999 \cdot 1001$.

(c) $1 \cdot 3 + 5 \cdot 7 + 9 \cdot 11 + \cdots + 1001 \cdot 1003$.

7.S Use the principle of mathematical induction to prove the following variant: Let $n_0 \in \mathbb{Z}$ and let $P(n)$ be a statement depending on integers $n \ge n_0$ such that

(a) $P(n_0)$ is true,

(b) if $n \ge n_0$ and $P(n)$ is true, then $P(n + 1)$ is true.

Then $P(n)$ is true for every $n \ge n_0$.

8. Use the variant of mathematical induction in Exercise 7 to verify the following inequalities. (For (e) use $(1 + 1/n)^n > 2$, an easy consequence of the binomial theorem.)

(a)S $2n + 1 < 2^n$, $n \ge 3$.

(b) $n^2 < 2^n$, $n \ge 5$.

(c) $2^n < n!$, $n \ge 4$.

(d) $3^n < n!$, $n \ge 7$.

(e)S $2^n n! < n^n$, $n \ge 6$.

(f) $8^n n! < (2n)!$, $n \ge 6$.

9.S Use the variant of mathematical induction in Exercise 7 to prove that $n < \ln(n!)$, $n \ge 6$.

10. Prove Bernoulli’s inequality: $(1 + x)^n \ge 1 + nx$, $n \in \mathbb{Z}^+$, $x \ge -1$.

11. Use the principle of mathematical induction to prove the following variant: Let $n_0 \in \mathbb{Z}$ and let $P(n)$ be a statement depending on integers $n \ge n_0$ such that

(a) $P(n_0)$ is true,

(b) $P(n + 1)$ is true whenever $P(j)$ is true for all $n_0 \le j \le n$.

Then $P(n)$ is true for every $n \ge n_0$.

12. (Prime Factorization). Use the variant of induction in Exercise 11 to prove that every integer $n \ge 2$ may be written as a product of powers of prime numbers (for example, $72 = 2^3 \cdot 3^2$).

13.S The Fibonacci numbers $f_n$ are defined recursively by $f_0 = f_1 = 1$ and $f_{n+1} = f_n + f_{n-1}$, $n \ge 1$. Use the variant of induction in Exercise 11 to prove that
$$f_n = \frac{1}{\sqrt{5}} \bigl( a^{n+1} - b^{n+1} \bigr), \quad a := \frac{1 + \sqrt{5}}{2}, \quad b := \frac{1 - \sqrt{5}}{2},$$
where a, b are the zeros of $x^2 - x - 1$.

14. Let $a_0$ and $a_1$ be arbitrary and define $a_{n+1} = \tfrac{1}{2}(a_n + a_{n-1})$, $n \ge 1$. Use the variant of induction in Exercise 11 to prove that for all $n \ge 0$,
$$a_n = \frac{(-1)^n}{3 \cdot 2^{n-1}} (a_0 - a_1) + \frac{1}{3} (a_0 + 2a_1).$$

15.S (Division algorithm). Prove that for each pair of integers m and n with $n > 0$ there exist unique integers q and r such that $m = qn + r$ and $0 \le r \le n - 1$. (The integer q is called the quotient and r the remainder on division of m by n.)

16. Use the variant of induction in Exercise 11 to prove that each $n \in \mathbb{N}$ may be uniquely expressed in the form $\sum_{k=0}^{p} d_k 10^k$ for some $p \in \mathbb{N}$ and $d_k \in \{0, 1, \ldots, 9\}$. The representation $n = d_p d_{p-1} \ldots d_0$ is called the decimal positional notation for n.

1.6 Euclidean Space

The real number system may be used to construct other important mathematical systems, such as n-dimensional Euclidean space and the complex number system. In this section we construct the former. The reader may delay reading this section, as the material will not be needed until Chapter 8.

For $n \in \mathbb{N}$, let $\mathbb{R}^n$ denote the set of all n-tuples $x := (x_1, x_2, \ldots, x_n)$, where $x_j \in \mathbb{R}$. Each such n-tuple is called a point or vector, depending on context. The distinction between points and vectors is important in physics and geometry, as it allows one to refer to a vector at a point, a notion useful in describing, say, forces or tangent vectors.

The set $\mathbb{R}^n$ has an algebraic structure which is defined as follows: Let $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$, and $t \in \mathbb{R}$. The operations of addition $x + y$ and scalar multiplication $tx$ in $\mathbb{R}^n$ are then defined by
$$x + y = (x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n)$$
and
$$tx = t(x_1, \ldots, x_n) = (tx_1, \ldots, tx_n).$$
We also define
$$-x := (-x_1, \ldots, -x_n) \quad \text{and} \quad 0 := (0, \ldots, 0).$$

The following theorem asserts that $\mathbb{R}^n$ is a vector space under these operations (see Appendix B). The straightforward proof is left to the reader.

1.6.1 Theorem. Addition and scalar multiplication on $\mathbb{R}^n$ have the following properties:

• associativity of addition: $(x + y) + z = x + (y + z)$;

• commutativity of addition: $x + y = y + x$;

• existence of an additive identity: $x + 0 = x$;

• existence of additive inverses: $x + (-x) = 0$;

• associativity of scalar multiplication: $(st)x = s(tx)$;

• distributivity of a scalar over vector addition: $s(x + y) = sx + sy$;

• distributivity of a vector over scalar addition: $(s + t)x = sx + tx$;

• existence of a scalar multiplicative identity: $1x = x$.

1.6.2 Definition. Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$. The Euclidean inner product $x \cdot y$ of x and y and the Euclidean norm $\|x\|_2$ of x are defined by
$$x \cdot y = \sum_{j=1}^{n} x_j y_j \quad \text{and} \quad \|x\|_2 = \left( \sum_{j=1}^{n} x_j^2 \right)^{1/2} = \sqrt{x \cdot x}.$$
The set $\mathbb{R}^n$ with its vector space structure and the Euclidean inner product is called n-dimensional Euclidean space. ♦

The structure of Euclidean space allows one to define lines, planes, length, perpendicularity, angle between vectors, etc. These ideas will be useful in later chapters.

1.6.3 Theorem. The inner product in $\mathbb{R}^n$ has the following properties:

(a) $x \cdot x = \|x\|_2^2$.

(b) $x \cdot y = y \cdot x$ (commutativity).

(c) $t(x \cdot y) = (tx) \cdot y = x \cdot (ty)$ (associativity).

(d) $x \cdot (y + z) = (x \cdot y) + (x \cdot z)$ (additivity).

(e) $|x \cdot y| \le \|x\|_2 \|y\|_2$ (Cauchy–Schwarz inequality).

Proof. Properties (a) and (b) are immediate and parts (c) and (d) follow respectively from the calculations
$$t \sum_{j=1}^{n} x_j y_j = \sum_{j=1}^{n} (tx_j) y_j = \sum_{j=1}^{n} x_j (ty_j) \quad \text{and} \quad \sum_{j=1}^{n} x_j (y_j + z_j) = \sum_{j=1}^{n} x_j y_j + \sum_{j=1}^{n} x_j z_j.$$
The inequality in (e) holds trivially if $y = 0$. Suppose $y \ne 0$, so $\|y\|_2 \ne 0$. By properties (a)–(d),
$$0 \le \|x - ty\|_2^2 = (x - ty) \cdot (x - ty) = \|x\|_2^2 - 2t(x \cdot y) + t^2 \|y\|_2^2.$$
Setting $t = (x \cdot y)/\|y\|_2^2$, we obtain
$$0 \le \|x\|_2^2 - 2(x \cdot y)^2/\|y\|_2^2 + (x \cdot y)^2/\|y\|_2^2 = \|x\|_2^2 - (x \cdot y)^2/\|y\|_2^2,$$
which implies that $(x \cdot y)^2 \le \|x\|_2^2 \|y\|_2^2$. Taking square roots yields (e).

1.6.4 Theorem. The Euclidean norm on $\mathbb{R}^n$ has the following properties:

(a) $\|x\|_2 \ge 0$ (nonnegativity).

(b) $\|x\|_2 = 0$ iff $x = 0$ (coincidence).

(c) $\|tx\|_2 = |t|\, \|x\|_2$ (absolute homogeneity).

(d) $\|x + y\|_2 \le \|x\|_2 + \|y\|_2$ (triangle inequality).

Proof. Parts (a) and (b) are clear, and (c) follows from
$$\|tx\|_2^2 = \sum_{j=1}^{n} (tx_j)^2 = t^2 \sum_{j=1}^{n} x_j^2 = t^2 \|x\|_2^2.$$
For (d) we use 1.6.3:
$$\|x + y\|_2^2 = (x + y) \cdot (x + y) = \|x\|_2^2 + \|y\|_2^2 + 2(x \cdot y) \le \|x\|_2^2 + \|y\|_2^2 + 2\|x\|_2 \|y\|_2 = (\|x\|_2 + \|y\|_2)^2.$$
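The inequalities in 1.6.3(e) and 1.6.4(d) are easy to test on sample vectors. The sketch below represents vectors as Python lists, with helper names `dot` and `norm2` chosen here for illustration; the random seed, dimension, and the small tolerance `tol` (added to absorb floating-point rounding) are assumptions of the sketch rather than anything from the text.

```python
import math
import random

def dot(x, y):
    # Euclidean inner product: sum of x_j * y_j
    return sum(xj * yj for xj, yj in zip(x, y))

def norm2(x):
    # Euclidean norm: sqrt(x . x)
    return math.sqrt(dot(x, x))

random.seed(0)   # fixed seed so the run is reproducible
tol = 1e-9       # slack for floating-point rounding
vectors = [[random.uniform(-10, 10) for _ in range(4)] for _ in range(200)]
pairs = list(zip(vectors[::2], vectors[1::2]))

# 1.6.3(e): |x . y| <= ||x|| ||y||
cauchy_schwarz_ok = all(
    abs(dot(x, y)) <= norm2(x) * norm2(y) + tol for x, y in pairs
)

# 1.6.4(d): ||x + y|| <= ||x|| + ||y||
triangle_ok = all(
    norm2([xj + yj for xj, yj in zip(x, y)]) <= norm2(x) + norm2(y) + tol
    for x, y in pairs
)
print(cauchy_schwarz_ok, triangle_ok)
```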

Exercises

1.S Solve the following system of vector equations for x and y in terms of a, b, c, d, and e, assuming that $(a \cdot b)(d \cdot b) \ne 1$.
$$x + (y \cdot b)a = c, \qquad y + (x \cdot b)d = e.$$

2. Prove the following:

(a) $\|x + y\|_2^2 - \|x - y\|_2^2 = 4(x \cdot y)$ (polarization identity).

(b) $\|x + y\|_2^2 + \|x - y\|_2^2 = 2\bigl(\|x\|_2^2 + \|y\|_2^2\bigr)$ (parallelogram rule).

(c)S $\bigl|\, \|x\|_2 - \|y\|_2 \,\bigr| \le \|x - y\|_2$.

(d) $\|x_1 + \cdots + x_n\|_2 \le \sum_{j=1}^{n} \|x_j\|_2$ (generalized triangle inequality).

3.S Suppose that $x_i \cdot x_j = 0$ for $i \ne j$. Prove that
$$\|x_1 + \cdots + x_k\|_2^2 = \sum_{j=1}^{k} \|x_j\|_2^2.$$

4. ⇓11 For $x = (x_1, \ldots, x_n)$ define
$$\|x\|_1 = \sum_{j=1}^{n} |x_j| \quad \text{and} \quad \|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}.$$
Verify that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ have the properties (a)–(d) of 1.6.4.

5. A nonempty subset C of $\mathbb{R}^n$ is said to be convex if $x, y \in C$ and $t \in [0, 1]$ imply that $tx + (1 - t)y \in C$. Let $r > 0$. Prove that $\{x \in \mathbb{R}^n : \|x\|_2 \le r\}$ is convex. Is the set $\{x \in \mathbb{R}^n : \|x\|_2 = r\}$ convex? What about the sets $\{x \in \mathbb{R}^n : \|x\|_1 \le r\}$ and $\{x \in \mathbb{R}^n : \|x\|_\infty \le r\}$?

6. Find positive constants a, b, c such that for all $x \in \mathbb{R}^n$,
$$\|x\|_2 \le a\|x\|_1, \quad \|x\|_1 \le b\|x\|_\infty, \quad \text{and} \quad \|x\|_\infty \le c\|x\|_2.$$

7.S Prove that $\|x\|_2 = \|y\|_2 = \|(x + y)/2\|_2 = 1 \Rightarrow x = y$. Is the same true for $\|\cdot\|_\infty$ or $\|\cdot\|_1$?

8. Show that in $\mathbb{R}^3$, $a \cdot b = \|a\|\, \|b\| \cos\theta$, where $\theta$ is the (smaller) angle between a and b.

9. The cross product of vectors $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ in $\mathbb{R}^3$ is defined by
$$a \times b = \left( \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix},\ -\begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix},\ \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \right) = \langle a_2 b_3 - a_3 b_2,\ a_3 b_1 - a_1 b_3,\ a_1 b_2 - a_2 b_1 \rangle.$$
Let $\theta$ be the (smaller) angle between a and b. Verify the following:

(a) $(a \times b) \cdot a = (a \times b) \cdot b = 0$.

(b) $b \times a = -a \times b$.

(c) $a \times (tx + sy) = t(a \times x) + s(a \times y)$.

(d) $(a \times b) \cdot c = a \cdot (b \times c)$.

(e) $a \times (b \times c) = (a \cdot c)b - (a \cdot b)c$.

(f) $\|a \times b\| = \|a\|\, \|b\| \sin\theta$.

11 This exercise will be used in Section 8.1.

Chapter 2 Numerical Sequences

2.1 Limits of Sequences

Simply stated, a sequence in a set E is a function from $\mathbb{N}$ to E. It is more instructive, however, to think of a sequence as an infinite ordered list of members of E. The list may be written out, for example, as $a_1, a_2, \ldots, a_n, \ldots$ or abbreviated by $\{a_n\}_{n=1}^{\infty}$ or simply by $\{a_n\}$. A sequence usually starts with the index 1, although this is not necessary, 0 being a common alternative.

The set E in the definition of sequence is arbitrary. However, for Part I of the book, we consider only numerical sequences, that is, sequences contained in $\mathbb{R}$. Sequences may be defined by a closed formula, such as $a_n = (-1)^n$, or recursively, such as the Fibonacci sequence, defined by $a_0 = a_1 = 1$ and $a_{n+1} = a_n + a_{n-1}$, $n \ge 1$ (see Exercise 1.5.13).

The following notion will occasionally be useful. A property P of a sequence $\{a_n\}$ is said to hold eventually if there exists an index N such that $a_n$ has property P for all $n \ge N$. For example, by the Archimedean principle, the sequence $\{1/n\}$ is eventually less than 0.001. Or, consider the sequence defined by $a_n = n^2 + 100(-1)^n$; the reader may verify that eventually $a_n < a_{n+1}$.

Convergence of a sequence to a number a expresses the idea that eventually the terms of the sequence will be as close to a as desired. The following definition makes this precise.

2.1.1 Definition. A sequence $\{a_n\}$ in $\mathbb{R}$ is said to converge to a real number a, written $a_n \to a$ or
$$\lim_{n} a_n = \lim_{n \to +\infty} a_n = a,$$
if for each $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that
$$|a_n - a| < \varepsilon \quad (\text{that is, } a - \varepsilon < a_n < a + \varepsilon) \quad \text{for all } n \ge N.$$
If no such real number a exists, then the sequence is said to diverge. ♦
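The notion of a property holding eventually can be explored for the sequence $a_n = n^2 + 100(-1)^n$ mentioned above. The sketch below searches a finite range of indices for failures of $a_n < a_{n+1}$ and reports the index after the last failure it finds. The bound `LIMIT` is an arbitrary assumption of the sketch, so the search only suggests a candidate for N; the claim for all larger n still needs the short algebraic argument based on $a_{n+1} - a_n = 2n + 1 \pm 200$.

```python
def a(n):
    # a_n = n^2 + 100 * (-1)^n
    return n * n + 100 * (-1) ** n

LIMIT = 10_000  # arbitrary search bound; a finite search proves nothing beyond it

# Indices n in the searched range where a_n < a_{n+1} fails
failures = [n for n in range(1, LIMIT) if not a(n) < a(n + 1)]
last_failure = max(failures)
N = last_failure + 1  # candidate index from which a_n < a_{n+1} holds

print(last_failure, N)
```

The failures occur only at even indices, where the term $100(-1)^n$ drops by 200 at the next step and the gain $2n + 1$ is not yet large enough to compensate.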


a+ a a− 1 2 3 4 5

N −2

N N +2

FIGURE 2.1: Convergence of a sequence to $a$.

It follows immediately from the definition that $a_n \to a$ iff the terms of the sequence eventually lie in any open interval containing $a$. The definition also implies that $a_n \to a$ iff $|a_n - a| \to 0$.

Limits, if they exist, are unique. Indeed, if $a_n \to a$ and $a_n \to b$, then by the triangle inequality $|a - b| \le |a - a_n| + |b - a_n| \to 0$, hence $a = b$.

Examples. (a) The sequence $\{(-1)^n\}$ oscillates between $-1$ and $1$ and so cannot converge. For a rigorous proof, suppose $(-1)^n \to a$ for some $a \in \mathbb{R}$. Choose $N$ such that $a - 1 < (-1)^n < a + 1$ for all $n \ge N$. Thus, if $n \ge N$ is even, then $1 < a + 1$, and if $n \ge N$ is odd, then $a - 1 < -1$. Adding these inequalities produces the absurdity $a < a$.

(b) To show that
$$\lim_n \frac{(-1)^n}{n} = 0,$$
let $\varepsilon > 0$ and choose an integer $N > 1/\varepsilon$ (Archimedean principle). Then $|(-1)^n/n - 0| = 1/n < \varepsilon$ for all $n \ge N$.

(c) To verify that
$$\lim_n \frac{2n + 1}{3n + 5} = \frac{2}{3},$$
note that
$$\left| \frac{2n + 1}{3n + 5} - \frac{2}{3} \right| = \frac{7}{3(3n + 5)} < \frac{7}{n},$$
so any index $N > 7/\varepsilon$ satisfies the condition in 2.1.1.

♦
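As a numerical companion to Example (c), the following sketch computes the index $N > 7/\varepsilon$ for a sample tolerance and spot-checks the inequality of 2.1.1 on a finite window of indices. The function names `a` and `index_for` and the choice $\varepsilon = 1/100$ are ours, made for illustration only; a finite check of this kind illustrates the definition but does not replace the proof.

```python
from fractions import Fraction
import math

def a(n):
    """General term a_n = (2n + 1)/(3n + 5), kept exact with Fraction."""
    return Fraction(2 * n + 1, 3 * n + 5)

def index_for(eps):
    """Smallest integer N with N > 7/eps, the choice made in Example (c)."""
    return math.floor(Fraction(7) / eps) + 1

limit = Fraction(2, 3)
eps = Fraction(1, 100)          # illustrative tolerance (an assumption)
N = index_for(eps)

# Spot-check |a_n - 2/3| < eps for n = N, ..., N + 999 (a finite window only).
ok = all(abs(a(n) - limit) < eps for n in range(N, N + 1000))
print(N, ok)
```

The bound $7/n$ is deliberately crude, so the index produced is far from the smallest one that works; the definition only asks that some $N$ exist.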

2.1.2 Definition. A sequence {an } is said to be bounded (above, below ) if the set of its terms is bounded (above, below). ♦ 2.1.3 Proposition. A convergent sequence in R is bounded. Proof. Assume that an → a ∈ R. Choose N such that |an − a| < 1 for all n > N . Since |an | − |a| ≤ |an − a|, we see that |an | ≤ |an − a| + |a| < 1 + |a| for all n > N . Thus |an | ≤ max{1 + |a|, |a1 |, . . . , |aN |} for all n ∈ N.


2.1.4 Theorem. Let {an } and {bn } be sequences with an → a and bn → b. If an ≤ bn for infinitely many n, then a ≤ b. Proof. Suppose b < a. Then b < (a + b)/2 < a, hence we may choose indices N1 and N2 such that bn < (a + b)/2 for all n ≥ N1 and an > (a + b)/2 for all n ≥ N2 . But then bn < an for all n ≥ max{N1 , N2 }, contradicting the hypothesis. Note that, as a consequence of the preceding theorem, a convergent sequence in a closed interval I must have its limit in I. 2.1.5 Theorem (Squeeze principle). Let {an }, {bn }, and {cn } be sequences in R such that an ≤ bn ≤ cn for all n. If limn an = limn cn = x ∈ R, then limn bn = x. Proof. Given ε > 0, choose N1 , N2 ∈ N such that |an − x| < ε for all n ≥ N1 and |cn − x| < ε for all n ≥ N2 . For n ≥ max{N1 , N2 }, the inequalities −ε < an − x ≤ bn − x ≤ cn − x < ε imply that |bn − x| < ε.

FIGURE 2.2: The squeeze principle.

2.1.6 Example. We show that $\lim_n n r^n = 0$ for any $r \in (0, 1)$. Let $h = r^{-1} - 1$. Then $h > 0$ and, by the binomial theorem,
$$r^{-n} = (1 + h)^n = 1 + nh + \tfrac{1}{2} n(n - 1)h^2 + \cdots > \tfrac{1}{2} n(n - 1)h^2,$$
hence
$$0 < n r^n < \frac{2}{(n - 1)h^2}, \quad n > 1.$$
Since the term on the right tends to 0 as $n \to +\infty$, the squeeze principle shows that $n r^n \to 0$. (See Exercise 16 for an extension of this result.) ♦

For another illustration of the squeeze principle we prove

2.1.7 Proposition. For any real number $x$ there exist sequences $\{a_n\}$ in $\mathbb{Q}$ and $\{b_n\}$ in $\mathbb{I}$ such that $\lim_n a_n = \lim_n b_n = x$.

Proof. By 1.4.8 and Exercise 1.4.9, for each $n \in \mathbb{N}$ we may choose points $a_n \in (x - 1/n, x + 1/n) \cap \mathbb{Q}$ and $b_n \in (x - 1/n, x + 1/n) \cap \mathbb{I}$. The squeeze principle then implies that $a_n, b_n \to x$.
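The two-sided estimate $0 < n r^n < 2/((n-1)h^2)$ from Example 2.1.6 can be checked exactly for sample values of $r$. The sketch below uses rational arithmetic so that no rounding enters; the helper name `bound_holds` and the cutoff `n_max` are our own illustrative choices, and the check covers only finitely many indices.

```python
from fractions import Fraction

def bound_holds(r, n_max=200):
    """Check 0 < n*r**n < 2/((n-1)*h**2) for 2 <= n <= n_max, with h = 1/r - 1.

    r is expected to be a Fraction in (0, 1); all arithmetic is exact.
    """
    h = 1 / r - 1
    for n in range(2, n_max + 1):
        term = n * r**n
        upper = Fraction(2) / ((n - 1) * h * h)
        if not (0 < term < upper):
            return False
    return True

print(bound_holds(Fraction(1, 2)), bound_holds(Fraction(9, 10)))
```

For $r$ close to 1 the upper bound is very loose for small $n$, which is harmless: the squeeze principle only needs the bound to tend to 0.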


2.1.8 Definition. (Infinite limits) A sequence $\{a_n\}$ in $\mathbb{R}$ is said to diverge to $+\infty$, written $a_n \to +\infty$ or
$$\lim_n a_n = \lim_{n\to+\infty} a_n = +\infty,$$
if for each real number $M$ there exists an index $N$ such that $a_n \ge M$ for all $n \ge N$. Divergence to $-\infty$ is defined analogously. ♦

2.1.9 Example. If $r > 1$, then $r^n/n \to +\infty$. This follows from 2.1.6: Given $M > 0$ there exists $N \in \mathbb{N}$ such that $n/r^n < 1/M$, hence $r^n/n > M$, for all $n \ge N$. ♦

2.1.10 Example. If $r > 0$, then $a_n := r^n n! \to +\infty$. Indeed, since
$$\frac{a_n}{a_{n-1}} = rn \to +\infty,$$
there exists $N \in \mathbb{N}$ such that $a_n > 2a_{n-1}$ for all $n > N$. Iterating, we see that $a_n > 2^k a_{n-k} \ge k a_{n-k}$, so taking $k = n - N$ we have $a_n > (n - N)a_N$ for all $n > N$. Since $\lim_n (n - N)a_N = +\infty$ (Archimedean principle), the assertion follows. ♦

For the following theorem, recall the conventions regarding addition and multiplication in the extended real number system $\overline{\mathbb{R}}$ (1.4.12).

2.1.11 Theorem. Let $\{a_n\}$ and $\{b_n\}$ be sequences in $\mathbb{R}$. The following limit properties hold in $\overline{\mathbb{R}}$ in the sense that if the expression on the right side of the equation exists in $\overline{\mathbb{R}}$, then the limit on the left side exists and equality holds.

(a) $\lim_n (s a_n + t b_n) = s \lim_n a_n + t \lim_n b_n$, $s, t \in \mathbb{R}$.

(b) $\lim_n a_n b_n = \lim_n a_n \lim_n b_n$.

(c) $\lim_n a_n/b_n = \lim_n a_n / \lim_n b_n$, if $\lim_n b_n \ne 0$.

(d) $\lim_n |a_n| = |\lim_n a_n|$.

(e) $\lim_n \sqrt{a_n} = \sqrt{\lim_n a_n}$ if $a_n \ge 0$ for all $n$.

Proof. Let $a_n \to a$, $b_n \to b$. We prove the theorem first for the case $a, b \in \mathbb{R}$. Let $\varepsilon > 0$. For (a) choose $N_1$ and $N_2$ so that
$$|a_n - a| < \frac{\varepsilon}{2(|s| + 1)} \text{ for all } n \ge N_1 \quad\text{and}\quad |b_n - b| < \frac{\varepsilon}{2(|t| + 1)} \text{ for all } n \ge N_2.$$
If $n \ge N := \max\{N_1, N_2\}$, then both of these inequalities hold, hence, by the triangle inequality,
$$|s a_n + t b_n - (sa + tb)| \le |s|\,|a_n - a| + |t|\,|b_n - b| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$


To prove (b), choose $M \ge |a|$ so that $|b_n| \le M$ for all $n$ (2.1.3) and choose $N$ so that $|a_n - a| < \varepsilon/(2M)$ and $|b_n - b| < \varepsilon/(2M)$ for all $n \ge N$. For such $n$,
$$|a_n b_n - ab| = |(a_n - a)b_n + a(b_n - b)| \le |a_n - a|\,|b_n| + |a|\,|b_n - b| \le M|a_n - a| + M|b_n - b| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

For (c) it suffices to show that $1/b_n \to 1/b$. Choose $N$ such that
$$|b_n - b| < \min\{|b|/2,\ \varepsilon b^2/2\} \quad \text{for all } n \ge N.$$
For such $n$, $|b_n| \ge |b| - |b_n - b| > |b|/2$, hence
$$\left| \frac{1}{b_n} - \frac{1}{b} \right| = \frac{|b_n - b|}{|b\,b_n|} \le \frac{2|b_n - b|}{b^2} < \varepsilon.$$

Part (d) follows from the inequality $\big|\,|a_n| - |a|\,\big| \le |a_n - a|$.

For (e), observe first that $a \ge 0$ (2.1.4). If $a = 0$, choose $N$ such that $a_n < \varepsilon^2$ for all $n \ge N$. If $a > 0$, choose $N$ such that $|a_n - a| < \varepsilon\sqrt{a}$ for all $n \ge N$. For such $n$,
$$|\sqrt{a_n} - \sqrt{a}| = \frac{|a_n - a|}{\sqrt{a_n} + \sqrt{a}} \le \frac{|a_n - a|}{\sqrt{a}} < \varepsilon.$$

To illustrate the remaining cases $a = \pm\infty$ or $b = \pm\infty$, we prove part (b) for the case $-\infty < a < 0$ and $b_n \to +\infty$. To show that $a_n b_n \to -\infty$, let $M < 0$ and choose $N$ so that
$$a_n < a/2 \quad\text{and}\quad b_n > 2M/a \quad \text{for all } n \ge N.$$
For such $n$,
$$-a_n b_n > (-a/2)(2M/a) = -M,$$
hence $a_n b_n < M$.

2.1.12 Example. To find
$$\lim_n \frac{\sqrt{4n^6 - 3n^2 + 5}}{2n^3 + 7n + 3},$$
divide the numerator and denominator of the general term $a_n$ by $n^3$, the highest power of $n$ occurring in the denominator, to obtain
$$a_n = \frac{\sqrt{4 - 3/n^4 + 5/n^6}}{2 + 7/n^2 + 3/n^3}.$$
The quotients in the numerator and denominator tend to 0, hence, by 2.1.11, $a_n \to \sqrt{4}/2 = 1$. ♦
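The limit in Example 2.1.12 can be made plausible numerically by evaluating the general term at a few large indices. This is only a sanity check under floating-point arithmetic, not a proof; the function name `a` and the sample indices are our own choices.

```python
import math

def a(n):
    """General term sqrt(4n^6 - 3n^2 + 5) / (2n^3 + 7n + 3) of Example 2.1.12."""
    return math.sqrt(4 * n**6 - 3 * n**2 + 5) / (2 * n**3 + 7 * n + 3)

# The printed values should approach 1 as n grows.
for n in (10, 100, 1000, 10000):
    print(n, a(n))
```

The slow approach to 1 reflects the $7/n^2$ term in the denominator, which dominates the error for large $n$.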


Exercises

1. Let a, b ∈ R. Find a closed formula for the nth term an of the sequences

(a)S a, b, a, b, . . .

(b) a, a, b, b, a, a, . . .

(c) a, a, a, b, b, b, a, a, a, . . .

(d) a, b, a, c, a, b, a, c, . . .

(e) 1, 2, 3, 4, 1, 2, 3, 4, . . .

2. Find a recursive formula for the sequence a, b, a, b, . . . 3. Use the ε, N definition of limit to prove that 4n − 1 (a) lim = 2. n 2n + 7

(b)

S

n−1 = +∞. (e) S (d) lim √ n n+1

√ 5 2n2 − n 5 n+7 √ = . lim 2 = 2. (c) lim n n +3 n 3 n+2 3 r 1 3 n+2 = 8. (f) limn lim 2 + = 1. n n n+1

4. Prove rigorously that the sequence {(−1)n n/(n + 1)} has no limit. 5.S Find limn sin (n!rπ) for r ∈ Q. 1 1 p 6. Find limn n+ for all p ∈ R. n n 7.S Let {an } be contained in a finite set A. Prove that if an → a, then there exists an index N such that an = a for all n ≥ N . In particular, a ∈ A. 8. Find limn bn if (a)S an → a and 3an + 2bn → c. (b) an → 2 and 3an bn + 5a2n − 2bn → 1. 9. Let k ∈ N and a, b > 0. Evaluate limn an if an =

(n + k)! . n!(n + k)k (g) S n (a − 1/n)k − ak .

1/2 an − 1 . (b) bn + 1 q √ √ (d) S an + b n − an. p (f) nk a2 + n−k − a . h i (h) n 1 − (1 − a/n)1/k .

(i) (1 − 1/2)(1 − 1/3) · · · (1 − 1/n).

(j)

2n + 1 (a) . k (n + 3n + 1)1/k p (c) n2 + kn − n. S

(e)

(k) S (1 − 1/22 )(1 − 1/32 ) · · · (1 − 1/n2 ). (l)

n X

(n2 + j)−1 .

j=1 n X

(nk + j)−1/k , k > 1.

j=1

10. Let {an } be bounded and bn → 0. Prove that an bn → 0.


11.S Let $a_n \to a \in \mathbb{R}$, $b_n \to b \in \mathbb{R}$, and $r > 0$ such that $|a_n - b_n| \le r$ for all $n$. Prove that $|a - b| \le r$.

12. Prove that if $n a_n \to a \in \mathbb{R}$, then $\sqrt{n}\, a_n \to 0$. Show that the converse is false.

13. Let $a_n \ge 0$ for all $n$ and $a_n \to a$. Prove that $a_n^{1/k} \to a^{1/k}$, $k \in \mathbb{N}$.

14. Let $r > 0$ and $k \in \mathbb{N}$. Prove in each case that $a_n \to 1$:

(a)S $a_n = r^{1/n}$. (b) $a_n = n^{1/n}$. (c) $a_n = (r + n^k)^{1/n}$. (d) $a_n = [\sin(1/n)]^{1/n}$.

15. Prove that $a_n \to a$ iff $a_n^+ \to a^+$ and $a_n^- \to a^-$. (See 1.3.7.)

16. Let m ∈ N and r ∈ (−1, 1). Prove that limn nm rn = 0. 17.S Let 0 < r < 1, an > 0, and an+1 /an < r for all n. Prove that an → 0. Construct a sequence {an } such that an > 0 and an+1 /an < 1 for all n but an 6→ 0. 18. Suppose that an → a ∈ R. Prove that lim(a1 + · · · + an )/n = a. n

Is the converse true? 19.S Let an → a ∈ R and let an ≥ a for all n. Prove that lim min{a1 , · · · , an } = a. n

Does min{a1 , · · · , an } → a imply that an → a? 20. Show that if n−1 an → 0, then n−1 max{a1 , · · · , an } → 0. Prove that the converse holds if {an } is bounded below. Give an example to show that the converse is not generally true. 21. Let 0 < x1 ≤ · · · ≤ xk . Prove that lim(xn1 + · · · + xnk )1/n = xk . n

22.S Let f (x) be any real-valued function on R such that f (x) − x is bounded for all x (for example, f (x) = bxc). Use Exercise 1.5.4 to prove that Pn Pn (a) (1/n2 ) j=1 f (jx) → x/2. (b) (1/n3 ) j=1 f (j 2 x) → x/3. √ 23. Let a0 , a1 > 0 and an = an−1 an−2 , n ≥ 2. Find limn an . 24. Let k ∈ N and let {an } be a sequence such that an+k − an → c ∈ R. Prove that an /n → c/k. Suggestion. Consider first the case k = 1 to get the general idea.

2.2 Monotone Sequences

2.2.1 Definition. A sequence $\{a_n\}$ in $\mathbb{R}$ is said to be increasing (strictly increasing) if $a_n \le a_{n+1}$ ($a_n < a_{n+1}$) for all $n$. Decreasing and strictly decreasing sequences are defined analogously. A sequence that is either increasing or decreasing is called monotone. If $\{a_n\}$ is increasing (decreasing), we write $a_n \uparrow$ ($a_n \downarrow$). If $a_n \uparrow$ ($a_n \downarrow$) and $a_n \to a \in \overline{\mathbb{R}}$, we write $a_n \uparrow a$ ($a_n \downarrow a$). ♦

2.2.2 Monotone Sequence Theorem. If $\{a_n\}$ is increasing (decreasing), then $a_n \uparrow \sup_k a_k$ ($a_n \downarrow \inf_k a_k$). In particular, every bounded monotone sequence converges in $\mathbb{R}$.

Proof. Assume $\{a_n\}$ is increasing and let $r < \sup_k a_k$. By the approximation property of suprema, $r < a_N \le \sup_k a_k$ for some $N$. Since $\{a_n\}$ is increasing, $r < a_n \le \sup_k a_k$ for all $n \ge N$. Therefore, $a_n \uparrow \sup_k a_k$. The proof for the decreasing case is similar.

2.2.3 Example. Let $0 < a < x_1, y_1 < b := a + 1$ and define $\{x_n\}$ and $\{y_n\}$ recursively by
$$x_{n+1} = a + \sqrt{|x_n - a|} \quad\text{and}\quad y_{n+1} = b - \sqrt{|b - y_n|}.$$
By Exercise 1.5.1, $\{x_n\}$ is strictly increasing, $\{y_n\}$ is strictly decreasing, and $a < x_n, y_n < b$ for all $n$. By 2.2.2, $x_n \uparrow x$ and $y_n \downarrow y$ for some $x, y \in \mathbb{R}$. To find $x$, let $n \to \infty$ in the equation $x_{n+1} = a + \sqrt{x_n - a}$ to obtain $x = a + \sqrt{x - a}$. This has solutions $x = a$ and $x = b$. Since $\{x_n\}$ is increasing, $x = b$. Similarly, $y = a$. ♦

2.2.4 Example. We use the monotone sequence theorem to show that the sequence $\{(1 + 1/n)^n\}$ converges. By the binomial theorem (1.5.5) and the inequality $k! \ge 2^{k-1}$ (easily established by induction),
$$(1 + 1/n)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{1}{n^k} = 2 + \sum_{k=2}^{n} \frac{(1 - 1/n)(1 - 2/n) \cdots (1 - (k-1)/n)}{k!} \le 2 + \sum_{k=2}^{n} \frac{1}{2^{k-1}}.$$
Since the sum in the last inequality is $\le 1$, $\{(1 + 1/n)^n\}$ is bounded above by 3. Now let $m = n + 1$. Then
$$1 - k/m \ge 1 - k/n \ge 0, \quad k = 1, \ldots, n - 1,$$
hence
$$(1 + 1/m)^m \ge 2 + \sum_{k=2}^{n} \frac{(1 - 1/m)(1 - 2/m) \cdots (1 - (k-1)/m)}{k!} > 2 + \sum_{k=2}^{n} \frac{(1 - 1/n)(1 - 2/n) \cdots (1 - (k-1)/n)}{k!} = (1 + 1/n)^n.$$
Thus $\{(1 + 1/n)^n\}$ is increasing. By 2.2.2, the sequence has a limit in $\mathbb{R}$, which is denoted by the letter $e$:
$$e := \lim_n (1 + 1/n)^n = 2.71828182845905\ldots$$

♦
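The two facts established in Example 2.2.4, that $(1 + 1/n)^n$ is increasing and bounded above by 3, can be observed on an initial segment of the sequence. The sketch below uses exact rational arithmetic so that the strict inequalities are not blurred by rounding; the names `term`, `increasing`, and `bounded` and the range $1 \le n \le 50$ are illustrative assumptions.

```python
from fractions import Fraction
import math

def term(n):
    """Exact value of (1 + 1/n)**n as a Fraction."""
    return (1 + Fraction(1, n)) ** n

values = [term(n) for n in range(1, 51)]

# Strictly increasing and bounded above by 3 on this finite range.
increasing = all(x < y for x, y in zip(values, values[1:]))
bounded = all(v < 3 for v in values)

print(increasing, bounded, float(term(1000)), math.e)
```

Convergence to $e$ is slow: the gap $e - (1 + 1/n)^n$ is roughly $e/(2n)$ for large $n$, so $n = 1000$ gives only about three correct digits.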

Exercises

1.S Let $0 < a < 1 < b$. Prove that $a^{1/n} \uparrow 1$ and $b^{1/n} \downarrow 1$.

2. Let $a_n = a^n/n^k$ and $b_n = b^n/n^k$, where $0 < a < 1 < b$ and $k \in \mathbb{Z}^+$. Prove that $\{a_n\}$ is strictly decreasing and that $\{b_n\}$ is eventually strictly increasing.

3.S Let
$$a_n = \frac{na}{1 + n^2 b}, \quad a, b > 0.$$
Prove: $a_n \downarrow 0$ (eventually) and $n a_n \uparrow a/b$.

4. Let $x_n > 0$ and $x_n \uparrow x$. Prove that $(x_1^n + \cdots + x_n^n)^{1/n} \to x$.

5. Prove that for any nonempty set $A$ of real numbers there exist sequences $\{a_n\}$ and $\{b_n\}$ in $A$ such that $a_n \uparrow \sup A$ and $b_n \downarrow \inf A$.

6. Let $\{a_n\}$ be monotone and set $b_n := (a_1 + a_2 + \cdots + a_n)/n$. Prove that $\{b_n\}$ is monotone. (Compare with Exercise 2.1.18.)

7.S Define $a_1 = 1$ and $a_n = 1 + (1 + a_{n-1})^{-1}$. Find $\lim_n a_n$ by first showing that $1 \le a_n \le 2$, $\{a_{2n}\}$ is decreasing, and $\{a_{2n+1}\}$ is increasing.

8. Let $r > 0$, $a_0 = \sqrt{r}$, and $a_n = \sqrt{r + a_{n-1}}$, $n \ge 1$. Find $\lim_n a_n$.

9.S Let $r > 0$, $a_1 > 0$ and define $a_n = \tfrac{1}{2}(a_{n-1} + r/a_{n-1})$, $n > 1$. Show that $a_n \ge a_{n+1} \ge \sqrt{r}$ and find $\lim_n a_n$.

10. Prove that $e = \lim_n (1 - 1/n)^{-n}$.

11. Let $0 < x_0 < y_0$ and define
$$x_{n+1} = \sqrt{x_n y_n} \quad\text{and}\quad y_{n+1} = (x_n + y_n)/2.$$
Prove that $0 < x_n < x_{n+1} < y_{n+1} < y_n$ and that $\lim_n x_n = \lim_n y_n$.

2.3 Subsequences and Cauchy Sequences

2.3.1 Definition. A subsequence of a sequence $\{a_n\}_{n=1}^{\infty}$ in $\mathbb{R}$ is a sequence $\{a_{n_k}\}_{k=1}^{\infty}$, where the indices satisfy $1 \le n_1 < n_2 < \cdots$. The limit in $\overline{\mathbb{R}}$ of a subsequence is called a cluster point of $\{a_n\}$. ♦

For example, in the following sequence the bracketed terms define the beginning of a subsequence $\{a_{n_k}\}$ with $n_1 = 3$, $n_2 = 4$, $n_3 = 6$, etc.
$$a_1,\ a_2,\ [a_3],\ [a_4],\ a_5,\ [a_6],\ \ldots$$
Note that the indices $n_k$ of a subsequence satisfy $n_k \ge k$.

Examples. (a) The sequence
$$\left\{ 1 - (-1)^{\lfloor (n-1)/2 \rfloor} \right\} = \{0, 0, 2, 2, 0, 0, 2, 2, \ldots\}$$
is a subsequence of
$$\{1 - (-1)^n\} = \{2, 0, 2, 0, \ldots\},$$
which has cluster points 0 and 2.

(b) The sequence $\{n \sin(n\pi/2)\}$ has cluster points 0 and $\pm\infty$.

(c) Let $\{r_1, r_2, \ldots\}$ be an arbitrary enumeration of the rational numbers (see Appendix A). Then every real number $x$ is a cluster point of $\{r_n\}$. Indeed, since every interval of the form $(x - 1/n, x + 1/n)$ contains infinitely many terms of the sequence, we may choose $n_1 \ge 1$ such that $|x - r_{n_1}| < 1$, $n_2 > n_1$ such that $|x - r_{n_2}| < 1/2$, etc. In this way we may construct a subsequence inductively such that $|x - r_{n_k}| < 1/k$ for all $k$, hence $r_{n_k} \to x$. ♦

Notation. It is occasionally convenient to use the following alternate method to describe a subsequence: If we set $b_k = a_{n_k}$ and then change the index in $\{b_k\}_{k=1}^{\infty}$ to $n$, then $\{b_n\}$ may be used to denote the subsequence $\{a_{n_k}\}$. This provides a convenient way to denote a subsequence of a subsequence. In this regard, note that if $\{c_n\}$ is a subsequence of $\{b_n\}$ and $\{b_n\}$ is a subsequence of $\{a_n\}$, then $\{c_n\}$ is a subsequence of $\{a_n\}$.

The following proposition shows that a convergent sequence has a single cluster point.

2.3.2 Proposition. If $\{a_n\}$ is a sequence in $\mathbb{R}$ and $a_n \to a \in \overline{\mathbb{R}}$, then $a_{n_k} \to a$ for any subsequence $\{a_{n_k}\}$ of $\{a_n\}$.

Proof. We prove the proposition for the case $a \in \mathbb{R}$ and leave the other cases for the reader. Given $\varepsilon > 0$, choose $N$ such that $|a_n - a| < \varepsilon$ for all $n \ge N$. Since $n_k \ge k$, $|a_{n_k} - a| < \varepsilon$ for all $k \ge N$. Therefore, $a_{n_k} \to a$.

2.3.3 Example. We calculate $\lim_n (1 + 1/n^2)^{3n^2+5}$ by writing
$$\left(1 + \frac{1}{n^2}\right)^{3n^2+5} = \left[\left(1 + \frac{1}{n^2}\right)^{n^2}\right]^3 \left(1 + \frac{1}{n^2}\right)^5.$$
The term in the square brackets is a subsequence of $(1 + 1/n)^n$ and hence converges to $e$ (see 2.2.4). It follows that $(1 + 1/n^2)^{3n^2+5} \to e^3$. ♦

The following result will have important consequences in later chapters.

2.3.4 Bolzano–Weierstrass Theorem. Every bounded sequence in $\mathbb{R}$ has a convergent subsequence.

Proof. The proof is based on the observation that if a union of two sets contains infinitely many terms of a sequence, then at least one of the sets must contain infinitely many of the terms of the sequence. Let $\{a_n\}$ be a bounded sequence, say $c_0 \le a_n \le d_0$ for all $n$. Bisect the interval $I_0 := [c_0, d_0]$. By the preceding observation, one of the resulting subintervals, call it $I_1$, contains infinitely many terms of the sequence. Choose one such term, say $a_{n_1}$. Now bisect $I_1$. Again, one of the resulting subintervals, call it $I_2$, contains infinitely many terms of the sequence. Choose one such term $a_{n_2}$ with $n_2 > n_1$. By repeating this procedure, we produce a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ of $\{a_n\}$ and a sequence of intervals $I_k = [c_k, d_k]$, $k = 0, 1, \ldots$, such that
$$c_0 \le c_{k-1} \le c_k \le a_{n_k} \le d_k \le d_{k-1} \le d_0 \quad\text{and}\quad d_{k+1} - c_{k+1} = \tfrac{1}{2}(d_k - c_k).$$
Since $\{c_k\}$ and $\{d_k\}$ are monotone and bounded, $c_k \to c$ and $d_k \to d$ for some $c, d \in \mathbb{R}$. Since $d_k - c_k = 2^{-k}(d_0 - c_0) \to 0$, $c = d$. By the squeeze principle, $a_{n_k} \to c$.

FIGURE 2.3: Interval halving process.

The Bolzano–Weierstrass theorem may be extended as follows:

2.3.5 Theorem. Every sequence in $\mathbb{R}$ has a subsequence that converges in $\overline{\mathbb{R}}$.

Proof. If $\{a_n\}$ is bounded, then the Bolzano–Weierstrass theorem applies. Suppose that $\{a_n\}$ is unbounded above. Then for each $k \in \mathbb{N}$ there exist infinitely many indices $n$ such that $a_n > k$. We may then construct a subsequence $\{a_{n_k}\}$ with $a_{n_k} > k$ for all $k$, so $a_{n_k} \to +\infty$. If $\{a_n\}$ is unbounded below, a similar construction yields a subsequence tending to $-\infty$.


2.3.6 Corollary. A sequence $\{a_n\}$ in $\mathbb{R}$ has a limit in $\overline{\mathbb{R}}$ iff it has exactly one cluster point in $\overline{\mathbb{R}}$.

Proof. The necessity is 2.3.2. For the sufficiency, suppose that $\{a_n\}$ has exactly one cluster point $a \in \overline{\mathbb{R}}$. Consider first the case $a = +\infty$. We claim that $a_n \to +\infty$. If not, then there exists $M \in \mathbb{R}$ such that $a_n \le M$ for infinitely many $n$, hence there exists a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ with $a_{n_k} \le M$ for all $k$. By 2.3.5, $\{a_{n_k}\}$ has a cluster point $b \in \overline{\mathbb{R}}$. But $b \le M < a$, so $\{a_n\}$ has more than one cluster point, a contradiction. Therefore, $a_n \to +\infty$, as claimed. The case $a = -\infty$ is treated similarly.

Now suppose $a \in \mathbb{R}$. Then $a_n \to a$. If not, then there exists $\varepsilon > 0$ such that $|a_n - a| \ge \varepsilon$ for infinitely many $n$, so there is a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ with $|a_{n_k} - a| \ge \varepsilon$ for all $k$. By 2.3.5, $\{a_{n_k}\}$ has a cluster point $b$ in $\overline{\mathbb{R}}$. But then either $b = \pm\infty$ or $|b - a| \ge \varepsilon$, so again $\{a_n\}$ has more than one cluster point.

2.3.7 Definition. A sequence $\{a_n\}$ is said to be Cauchy if for each $\varepsilon > 0$ there exists an index $N$ such that $|a_n - a_m| < \varepsilon$ for all $m, n \ge N$. We express this condition by writing
$$\lim_{m,n} (a_n - a_m) = 0. \quad ♦$$

The definition asserts that the terms of a Cauchy sequence get closer to one another. Thus the following result is not surprising.

2.3.8 Proposition. Every convergent sequence is Cauchy.

Proof. Let $a_n \to a$. Given $\varepsilon > 0$, choose $N$ such that $|a_n - a| < \varepsilon/2$ for all $n \ge N$. Then for $n, m \ge N$,
$$|a_n - a_m| = |(a_n - a) + (a - a_m)| \le |a_n - a| + |a_m - a| < \varepsilon.$$

It is of fundamental importance that the converse of 2.3.8 is true. To prove this, we need the following lemma.

2.3.9 Lemma. A Cauchy sequence is bounded.

Proof. Let $\{a_n\}$ be a Cauchy sequence. Choose $N$ such that $|a_n - a_m| < 1$ for all $m, n \ge N$. Then $|a_n| \le |a_n - a_N| + |a_N| < 1 + |a_N|$ for all $n \ge N$, hence $|a_n| \le \max\{1 + |a_N|, |a_1|, |a_2|, \ldots, |a_{N-1}|\}$ for all $n$.

2.3.10 Cauchy Criterion. Every Cauchy sequence in $\mathbb{R}$ converges.

Proof. By 2.3.9 and the Bolzano–Weierstrass theorem, a Cauchy sequence $\{a_n\}$ has a convergent subsequence, say $a_{n_k} \to a \in \mathbb{R}$. We claim that $a_n \to a$. Let $\varepsilon > 0$ and choose $N$ such that $|a_n - a_m| < \varepsilon$ for all $m, n \ge N$. In particular, $|a_n - a_{n_k}| < \varepsilon$ for $n, k \ge N$. Fixing $n \ge N$ and letting $k \to \infty$ in the last inequality yields $|a_n - a| \le \varepsilon$, verifying the claim.


Exercises 1. Find all cluster points of {an }, where an = nπ 2n + 1 2n + 1 2 nπ n S n . (b) (−1) . (a) (−1) sin cos2 4n + 3 3 n+5 4 (c)S (−1)bn/3c (1 + 1/n)2 + (−1)bn/4c (2 + 1/n)2 + (−1)bn/5c (3 + 1/n)2 . (d) (−1)n rn + r2n , where rk is the remainder on division of k by 3. 2. Construct a sequence with precisely the cluster points 1, 2, 3, +∞. 3. Let k ∈ N. Use the fact that limn (1 + 1/n)n = e (2.2.4) to find limn an for an = n n n 1 1 1 1 . (b) 1+ . (c) + . (a) 1+ kn k+n k n kn 7n3 −4 1 1 (d)S 1 + . (e) 1+ 3 . 2n + k 3n + 5 4. Let {an } and {bn } be bounded sequences. Show that there exist convergent subsequences of {an } and {bn } with the same indices. 5.S Prove that a sequence contained in a finite set has a constant subsequence. 6. Let −∞ < an < r ≤ +∞ with an → r. Show that {an } has a strictly increasing subsequence. 7. Show that every sequence of distinct real numbers has a strictly monotone subsequence. P∞ 8.S Let k ∈ N and suppose that the series n=1 |an+k − an | converges (see Chapter 6). Prove that {an } has a convergent subsequence. 9. Let a0 , a1 be arbitrary and define an+1 = (an + an−1 )/2, n ≥ 1. Show directly that {an } is a Cauchy sequence. (Its limit may be found from Exercise 1.5.14.) 10.S Let 0 < p ≤ q and an > 0 for all n. Set bn = aqn /(1 + apn ). Show that an → 0 iff bn → 0. Is the assertion true if 0 < q < p? 11. Let I be an open interval and let {an } have the property that each open subinterval J of I contains an for infinitely many n. Prove that every point of I is a cluster point of {an }. Give an example of such a sequence. 12. Suppose that the cluster points of {an } form a sequence {bn }. Show that every cluster point b of {bn } is a cluster point of {an }. Hint. Choose a subsequence {bnk } such that |bnk − b| < 1/k.

2.4 Limits Inferior and Superior

For an arbitrary sequence $\{a_n\}$ in $\mathbb{R}$, define
$$\underline{a}_n = \inf_{k \ge n} a_k \quad\text{and}\quad \overline{a}_n = \sup_{k \ge n} a_k, \quad n = 1, 2, \ldots.$$
Then $\{\underline{a}_n\}$ is increasing and $\{\overline{a}_n\}$ is decreasing, hence the limits
$$\liminf_n a_n := \lim_n \underline{a}_n \quad\text{and}\quad \limsup_n a_n := \lim_n \overline{a}_n$$
exist in $\overline{\mathbb{R}}$. These limits are called, respectively, the limit inferior and limit superior of the sequence $\{a_n\}$.

FIGURE 2.4: $\underline{a} = \liminf_n a_n$ and $\overline{a} = \limsup_n a_n$.

Clearly,
$$\underline{a}_n \le a_n \le \overline{a}_n \quad\text{and}\quad \liminf_n a_n \le \limsup_n a_n.$$
Furthermore, if $\{a_n\}$ is unbounded below, then $\liminf_n a_n = -\infty$, and if $\{a_n\}$ is unbounded above, then $\limsup_n a_n = +\infty$. Here are some examples:

(a) $\liminf_n \dfrac{(-1)^n n}{n + 1} = -1$, $\quad \limsup_n \dfrac{(-1)^n n}{n + 1} = 1$,

(b) $\liminf_n\, [(-1)^n + 1]\,n = 0$, $\quad \limsup_n\, [(-1)^n + 1]\,n = +\infty$,

(c) $\liminf_n \sin n = -1$, $\quad \limsup_n \sin n = 1$.

Example (c) follows from Example 8.3.10. (See Exercise 8.3.15.)

The next proposition shows that lim sup and lim inf have properties similar to those of limits. Their usefulness derives from this fact together with the property that, in contrast to ordinary limits, the limits inferior and superior of a sequence always exist (in $\overline{\mathbb{R}}$).

2.4.1 Proposition. For any sequences $\{a_n\}$ and $\{b_n\}$ in $\mathbb{R}$,

(a) $\limsup_n (-a_n) = -\liminf_n a_n$.

(b) $\limsup_n (a_n + b_n) \le \limsup_n a_n + \limsup_n b_n$ if the right side is defined.

(c) $\liminf_n (a_n + b_n) \ge \liminf_n a_n + \liminf_n b_n$ if the right side is defined.

(d) $\limsup_n c a_n = c \limsup_n a_n$, if $c \ge 0$.


(e) $\liminf_n c a_n = c \liminf_n a_n$, if $c \ge 0$.

(f) $\limsup_n (a_n b_n) \le (\limsup_n a_n)(\limsup_n b_n)$ if $a_n, b_n \ge 0$ for all $n$.

(g) $\liminf_n (a_n b_n) \ge (\liminf_n a_n)(\liminf_n b_n)$ if $a_n, b_n \ge 0$ for all $n$.

(h) $\liminf_n a_n \le \liminf_n b_n$, $\limsup_n a_n \le \limsup_n b_n$ if $a_n \le b_n$ for all $n$.

Proof. Part (a) follows from $\sup_{k \ge n} (-a_k) = -\inf_{k \ge n} a_k$ and part (h) is a direct consequence of the definitions. Part (b) follows by taking limits in the inequality
$$\sup_{k \ge n} (a_k + b_k) \le \sup_{k \ge n} a_k + \sup_{k \ge n} b_k.$$
Part (f) follows similarly from
$$\sup_{k \ge n} a_k b_k \le \sup_{k \ge n} a_k \, \sup_{k \ge n} b_k.$$
Part (d) is a consequence of
$$\sup_{k \ge n} c a_k = c \sup_{k \ge n} a_k, \quad c \ge 0.$$
Parts (c), (e), and (g) are proved in a similar manner.

2.4.2 Theorem. For any sequence $\{a_n\}$ in $\mathbb{R}$, the extended real numbers $\underline{a} := \liminf_n a_n$ and $\overline{a} := \limsup_n a_n$ are cluster points of $\{a_n\}$. All other cluster points of $\{a_n\}$ in $\overline{\mathbb{R}}$ lie between these.

Proof. We leave the case $\overline{a} = -\infty$ to the reader. Assume then that $\overline{a} > -\infty$ and recall that $\overline{a}_n \downarrow \overline{a}$. Choose a strictly increasing sequence of real numbers $r_n$ tending to $\overline{a}$. Since $r_1 < \overline{a}_1$, by the approximation property of suprema there exists an index $n_1$ such that $r_1 < a_{n_1} \le \overline{a}_{n_1}$. Similarly, since $r_2 < \overline{a}_{n_1 + 1}$, there exists an index $n_2 > n_1$ such that $r_2 < a_{n_2} \le \overline{a}_{n_2}$. In this way we may construct inductively a subsequence $\{a_{n_k}\}$ such that $r_k < a_{n_k} \le \overline{a}_{n_k}$. By the squeeze principle, $a_{n_k} \to \overline{a}$. The limit infimum case is treated similarly.

Now let $\{a_{n_k}\}$ be any subsequence of $\{a_n\}$ with $a_{n_k} \to a \in \overline{\mathbb{R}}$. Then, for any $m$ and $k \ge m$, $\underline{a}_m \le a_{n_k} \le \overline{a}_m$. Letting $k \to \infty$ yields $\underline{a}_m \le a \le \overline{a}_m$. Letting $m \to \infty$ we obtain $\underline{a} \le a \le \overline{a}$.

Since $\lim_n a_n$ exists in $\overline{\mathbb{R}}$ iff $\{a_n\}$ has exactly one cluster point (2.3.6), the following result is immediate.

2.4.3 Corollary. For any sequence $\{a_n\}$ in $\mathbb{R}$, $\lim_n a_n$ exists in $\overline{\mathbb{R}}$ iff $\liminf_n a_n = \limsup_n a_n$. In this case, all three limits are equal.
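To see the tail infima $\inf_{k \ge n} a_k$ and tail suprema $\sup_{k \ge n} a_k$ at work, the sketch below evaluates them for the sequence $b_n = (-1)^n (1 + 1/n)$, which is our own illustrative choice and does not appear in the text. For this particular sequence both extremes of each tail are attained within its first two terms, so a finite window returns the exact tail values; for a general sequence a finite window gives only an approximation.

```python
def b(n):
    """Illustrative sequence b_n = (-1)**n * (1 + 1/n); lim inf = -1, lim sup = 1."""
    return (-1) ** n * (1 + 1 / n)

def tail_inf_sup(seq, n, width=100):
    """Return (min, max) of seq(k) for n <= k <= n + width.

    This is a finite stand-in for inf and sup over k >= n; it is exact for b
    because the extreme values of each tail of b occur among its first two terms.
    """
    window = [seq(k) for k in range(n, n + width + 1)]
    return min(window), max(window)

tails = [tail_inf_sup(b, n) for n in range(1, 51)]
infs = [lo for lo, _ in tails]
sups = [hi for _, hi in tails]
print(tails[0], tails[-1])
```

The computed tail infima increase toward $-1$ and the tail suprema decrease toward $1$, matching the monotonicity used to define the limits inferior and superior.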


Exercises 1. Find lim inf n an and lim supn an if (−1)n 5n + 7 (a)S an = . 3n + 5 (b) an = nsin(nπ/2) + (1/n) cos(n). (c)S an = (−1)bn/3c (1+1/n)2 +(−1)bn/4c (2+1/n)2 +(−1)bn/5c (3+1/n)2 . 2nrn + 1 , rk the remainder on division of k ∈ N by 3. (d) an = nr2n + 1 (e) an = (−1)rn xn + (−1)rn+1 yn + (−1)rn+2 zn , where xn → x, yn → y, zn → z, and x < y < z. (f) a1 = 1, a2n = ra2n−1 , a2n+1 = ar + a2n , 0 < r < 1, a > 0. (g) an = 2n + 2−n + (−1)n (2n − 2−n ). 3n cos (nπ/4) + 2 (h)S an = . 2n sin (nπ/4) + 3 2. Show by example that the inequalities (b), (c), (f), and (g) in 2.4.1 may be strict. 3.S Let an > 0 for all n. Prove that lim sup(1/an ) = 1/ lim inf an and lim inf (1/an ) = 1/ lim sup an . n

n

n

n

4. Let {an } be bounded and nonnegative and let r ∈ Q+ . Prove that r r lim sup arn = lim sup an and lim inf arn = lim inf an . n

n

n

n

5.S Show that for any subsequence {ank } of {an }, lim sup ank ≤ lim sup an and lim inf ank ≥ lim inf an . k→∞

n

k→∞

n

6. Let bn → b ∈ (0, +∞). Prove that lim sup(an + bn ) = b + lim sup an and n

n

lim inf (an + bn ) = b + lim inf an . n

n

7.S Let an ≥ 0 for all n and bn → b ∈ (0, +∞). Prove that lim sup an bn = b lim sup an n

n

and

lim inf an bn = b lim inf an . n

n

8. Prove that lim sup an ≤ lim sup |an | and lim inf an ≥ lim inf |an |. n

n

n

Show by examples that the inequalities may be strict.

n


9. Let {nk } be a sequence of positive integers that contains each positive integer exactly once. Show that lim sup ank = lim sup an and lim inf ank = lim inf an . n

k

n

k

In particular, if an → a, then ank → a. Note: {ank }∞ k=1 is not necessarily a subsequence {an }. 10.S Let an → a > 0 and lim inf n bn > 0. If b2n − an bn − 6a2n → 0, prove that lim supn→∞ bn ≤ 3a. 11. Prove that for any sequence {an }, n

lim inf an ≤ lim inf n

n

n

1X 1X aj ≤ lim sup aj ≤ lim sup an . n j=1 n j=1 n n

12.S ⇓1 Let an > 0 for all n. Prove that lim inf n

an+1 an+1 ≤ lim inf a1/n ≤ lim sup a1/n ≤ lim sup . n n n an an n n

Use this to calculate limn n/(n!)1/n .

1 This exercise will be used in 7.4.2.

Chapter 3 Limits and Continuity on R

3.1 Limit of a Function

The definition of limit of a function $f$ given in 3.1.3 below is a precise formulation of the intuitive idea that as $x$ gets closer to a number $a$, the function value $f(x)$ approaches some fixed number $L$. This notion is conveniently described in terms of certain subsets of $\mathbb{R}$ called neighborhoods.

3.1.1 Definition. Let $r > 0$. A neighborhood of $a \in \overline{\mathbb{R}}$ is an interval of the form
$$N(a) = N_r(a) := \begin{cases} (a - r, a + r) & \text{if } a \in \mathbb{R}, \\ (r, +\infty) & \text{if } a = +\infty, \\ (-\infty, -r) & \text{if } a = -\infty. \end{cases}$$
If $a \in \mathbb{R}$, the set $N(a) \setminus \{a\} := (a - r, a) \cup (a, a + r)$ is called a deleted neighborhood of $a$. ♦

The reader should verify that the intersection of finitely many neighborhoods of $a$ is again a neighborhood of $a$ and that neighborhoods separate points, that is, if $a \ne b$ are extended real numbers, then there exist neighborhoods $N(a)$ and $N(b)$ such that $N(a) \cap N(b) = \emptyset$.

3.1.2 Definition. An accumulation point of a nonempty set $E$ of real numbers is an extended real number $a$ such that every neighborhood of $a$ contains a point of $E$ not equal to $a$. A member of $E$ that is not an accumulation point of $E$ is called an isolated point of $E$. ♦

For example, the set of accumulation points of $E := (\mathbb{Q} \cap (-1, 0)) \cup \mathbb{N}$ is $[-1, 0] \cup \{+\infty\}$, and the set of isolated points of $E$ is $\mathbb{N}$.

The following definition of limit is sufficiently general to include the usual limits encountered in calculus: one-sided limits, two-sided limits, limits at infinity, and infinite limits.

3.1.3 Definition. Let $E \subseteq \mathbb{R}$, let $f$ be a real-valued function whose domain includes $E$, and let $a, L \in \overline{\mathbb{R}}$, where either $a \in E$ or $a$ is an accumulation point of $E$ (not necessarily in the domain of $f$). We write
$$L = \lim_{x \to a,\ x \in E} f(x)$$
if, for each neighborhood $N(L)$ of $L$, there is a neighborhood $N(a)$ of $a$ such that
$$x \in E \cap N(a) \text{ implies } f(x) \in N(L). \tag{3.1}$$
In this case we say that $f(x)$ approaches $L$ as $x$ tends to $a$ along $E$.

♦

The restrictions on $a$ guarantee that $E \cap N(a) \ne \emptyset$, hence condition (3.1) is not vacuously satisfied. Note that if $a \in E$ is not an accumulation point of $E$, then it must be an isolated point, in which case $\lim_{x \to a,\ x \in E} f(x)$ trivially exists and equals $f(a)$.

We single out the following important special cases, where $a \in \mathbb{R}$ and $s > 0$:

(a) left-hand limit: $\lim_{x \to a^-} f(x) := \lim_{x \to a,\ x \in E} f(x)$, $E = (a - s, a)$.

(b) right-hand limit: $\lim_{x \to a^+} f(x) := \lim_{x \to a,\ x \in E} f(x)$, $E = (a, a + s)$.

(c) two-sided limit: $\lim_{x \to a} f(x) := \lim_{x \to a,\ x \in E} f(x)$, $E = (a - s, a + s) \setminus \{a\}$.

(d) limit at $+\infty$: $\lim_{x \to +\infty} f(x) := \lim_{x \to +\infty,\ x \in E} f(x)$, $E = (s, +\infty)$.

(e) limit at $-\infty$: $\lim_{x \to -\infty} f(x) := \lim_{x \to -\infty,\ x \in E} f(x)$, $E = (-\infty, -s)$.

FIGURE 3.1: δ works for ε1 but not for ε2.

Applying the definition of limit to the cases (a)–(e) above produces the standard limit definitions encountered in beginning calculus. For example, if the limit $L$ in (c) is finite, then, in the context of (c), 3.1.3 asserts that for each $\varepsilon > 0$ there exists a $\delta \in (0, s)$ such that $|f(x) - L| < \varepsilon$ for all $x$ with $0 < |x - a| < \delta$. (See Figure 3.1.) For (e) and the case $L = +\infty$, the definition asserts that for each $M \in \mathbb{R}$ there exists an $r > s$ such that $f(x) > M$ for all $x$ with $x < -r$.


The advantage of having a single definition of limit is that it provides a unified theory and allows for economy of thought and presentation.

As in the case of sequences, limits of functions are unique. Indeed, if $L_1 \ne L_2$ both satisfy criterion (3.1), then, given neighborhoods $N(L_1)$ and $N(L_2)$, there would exist a neighborhood $N(a)$ such that $x \in E \cap N(a) \Rightarrow f(x) \in N(L_1) \cap N(L_2)$. However, $N(L_1)$ and $N(L_2)$ may be taken to be disjoint, and choosing any $x \in E \cap N(a)$ then results in a contradiction.

In any discussion of limits we shall tacitly assume that $a$ and $E$ satisfy the conditions of 3.1.3.

3.1.4 Example. Let $f(x) = (3x + 2)/(2x - 1)$. Then

(a) $\lim_{x \to \infty} f(x) = \lim_{x \to -\infty} f(x) = 3/2$.

(b) $\lim_{x \to a} f(x) = f(a)$, $(a \ne 1/2)$.

(c) $\lim_{x \to 1/2^+} f(x) = +\infty$.

(d) $\lim_{x \to 1/2^-} f(x) = -\infty$.

To verify (a), let $\varepsilon > 0$ and note that the quantity
$$\left| f(x) - \frac{3}{2} \right| = \frac{7}{2|2x - 1|}$$
will be less than $\varepsilon$ if $|2x - 1| > 7/(2\varepsilon)$. The latter inequality is satisfied if either $x > (1 + 7/(2\varepsilon))/2$ or $x < (1 - 7/(2\varepsilon))/2$.

For (b), observe first that
$$|f(x) - f(a)| = \left| \frac{3x + 2}{2x - 1} - \frac{3a + 2}{2a - 1} \right| = \frac{7|x - a|}{|2x - 1|\,|2a - 1|}.$$
By the triangle inequality,
$$|2x - 1| \ge |2a - 1| - |(2a - 1) - (2x - 1)| = |2a - 1| - 2|a - x|.$$
Hence if $|a - x| < |2a - 1|/4$, then $|2x - 1| > |2a - 1|/2$ and therefore
$$|f(x) - f(a)| < \frac{14|x - a|}{|2a - 1|^2}.$$
It follows that $|f(x) - f(a)|$ will be less than $\varepsilon$ if we require additionally that $|x - a| < \varepsilon|2a - 1|^2/14$. Therefore, any positive $\delta < \min\{|2a - 1|/4,\ \varepsilon|2a - 1|^2/14\}$ will satisfy criterion (3.1).

To prove (c), note that if $0 < x - 1/2 < 1/2$, then $x > 0$, hence
$$f(x) = \frac{1}{2} \cdot \frac{3x + 2}{x - 1/2} > \frac{1}{x - 1/2}.$$
Given $M > 2$, let $\delta = 1/M$. Then $0 < x - 1/2 < \delta \Rightarrow 0 < x - 1/2 < 1/M \Rightarrow f(x) > M$, proving (c). The proof of part (d) is similar. ♦
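The choice of $\delta$ in part (b) of Example 3.1.4 can be tested numerically. The sketch below takes $\delta$ to be half of $\min\{|2a - 1|/4,\ \varepsilon|2a - 1|^2/14\}$, which leaves a margin for floating-point rounding, and samples points on both sides of $a$. The helper names `delta_for` and `check`, the factor $1/2$, and the sample values of $a$ and $\varepsilon$ are our own assumptions; sampling can support the estimate but does not prove it.

```python
def f(x):
    """The function f(x) = (3x + 2)/(2x - 1) of Example 3.1.4 (x != 1/2)."""
    return (3 * x + 2) / (2 * x - 1)

def delta_for(a, eps):
    """Half of min{|2a-1|/4, eps*|2a-1|^2/14}, a valid delta for a != 1/2."""
    c = abs(2 * a - 1)
    return 0.5 * min(c / 4, eps * c * c / 14)

def check(a, eps, samples=1000):
    """Sample x with 0 < |x - a| < delta and test |f(x) - f(a)| < eps."""
    d = delta_for(a, eps)
    for i in range(1, samples + 1):
        t = d * i / (samples + 1)       # 0 < t < d
        for x in (a - t, a + t):
            if abs(f(x) - f(a)) >= eps:
                return False
    return True

print(check(2.0, 0.01), check(0.0, 0.5))
```

Note how $\delta$ shrinks like $|2a - 1|^2$ as $a$ approaches $1/2$, reflecting the steepness of $f$ near its vertical asymptote.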


3.1.5 Theorem. Let $f$ be a function with domain $D$ and let $E = E_1 \cup E_2 \subseteq D$. Suppose that one of the following holds:

• $a$ is an accumulation point of both $E_1$ and $E_2$.

• $a$ is an isolated point of both $E_1$ and $E_2$.

• $a$ is an accumulation point of $E_1$ and an isolated point of $E_2$.

• $a$ is an accumulation point of $E_2$ and an isolated point of $E_1$.

Then $\lim_{x \to a,\ x \in E} f(x)$ exists in $\overline{\mathbb{R}}$ iff both limits $\lim_{x \to a,\ x \in E_1} f(x)$ and $\lim_{x \to a,\ x \in E_2} f(x)$ exist in $\overline{\mathbb{R}}$ and are equal. In this case all three limits are equal.

Proof. If $a$ is an accumulation point of $E_1$ or $E_2$, then $a$ is an accumulation point of $E$. If $a$ is an isolated point of $E_1$ and $E_2$, then $a$ is an isolated point of $E$. This shows that in each case $\lim_{x \to a,\ x \in E} f(x)$ is at least defined.

Now suppose that $L := \lim_{x \to a,\ x \in E} f(x)$ exists. Then (3.1) holds for $E$, so it must hold for each of the subsets $E_1$ and $E_2$ as well. Therefore, $\lim_{x \to a,\ x \in E_1} f(x)$ and $\lim_{x \to a,\ x \in E_2} f(x)$ exist and equal $L$.

Conversely, suppose that the limits along $E_1$ and $E_2$ exist and equal $K \in \overline{\mathbb{R}}$. Then, given a neighborhood $N(K)$, there exists a neighborhood $N(a)$ of $a$ such that $x \in E_j \cap N(a)$ implies $f(x) \in N(K)$, $j = 1, 2$. Thus $x \in E \cap N(a)$ implies $f(x) \in N(K)$, proving that $\lim_{x \to a,\ x \in E} f(x) = K$.

3.1.6 Example. Take $E_1 = \mathbb{N}$ and $E_2 = (0, 2)$. Then 2 is an isolated point of $E_1$ and an accumulation point of $E_2$, and $\lim_{x \to 2,\ x \in E_1} f(x) = f(2)$. Therefore, by the theorem, $\lim_{x \to 2,\ x \in E} f(x)$ exists iff $\lim_{x \to 2^-} f(x) = f(2)$. ♦

3.1.7 Example. (Dirichlet function). Let
$$d(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{otherwise.} \end{cases}$$
Since $\lim_{x \to a,\ x \in \mathbb{Q}} d(x) = 1$ and $\lim_{x \to a,\ x \in \mathbb{I}} d(x) = 0$, $\lim_{x \to a} d(x)$ cannot exist. ♦

The following is an immediate consequence of 3.1.5.

3.1.8 Corollary. $\lim_{x \to a} f(x)$ exists iff $\lim_{x \to a^-} f(x)$ and $\lim_{x \to a^+} f(x)$ exist and are equal. In this case all three limits are equal.

The next result shows that function limits may be characterized in terms of limits of sequences.

3.1.9 Sequential Characterization of Limit. Let $f$ be a function whose domain includes $E$ and let $a \in \overline{\mathbb{R}}$ be an accumulation point of $E$. Then $\lim_{x \to a,\ x \in E} f(x)$ exists in $\overline{\mathbb{R}}$ and equals $L$ iff $f(a_n) \to L$ for all sequences $\{a_n\}$ in $E$ with $a_n \to a$.


Proof. Assume that lim_{x→a, x∈E} f(x) = L and let {a_n} be a sequence in E with a_n → a. Given a neighborhood N(L), choose N(a) as in (3.1) and then choose N such that a_n ∈ N(a) for all n ≥ N. For such n, f(a_n) ∈ N(L). Therefore, f(a_n) → L.

Now suppose that lim_{x→a, x∈E} f(x) ≠ L. Then there is a neighborhood N(L) of L such that (3.1) fails for each neighborhood N(a) of a. Consider the case a, L ∈ R. Then N(L) is of the form (L − r, L + r) for some r > 0. Taking N(a) = (a − 1/n, a + 1/n) we see that for each n ∈ N there exists a_n ∈ E with |a_n − a| < 1/n and |f(a_n) − L| ≥ r. Thus a_n → a and f(a_n) ↛ L, so the sequential condition does not hold. A similar argument works if either a or L is infinite.

3.1.10 Example. Let f(x) = sin(1/x), x ≠ 0. Since f(1/(nπ)) = 0 and f(2/((4n + 1)π)) = 1 for every n ∈ N, lim_{x→0+} f(x) does not exist. ♦

3.1.11 Cauchy Criterion for Functions. Let a ∈ R be an accumulation point of E. Then lim_{x→a, x∈E} f(x) exists in R iff given ε > 0 there exists δ > 0 such that |f(x) − f(y)| < ε for all x, y ∈ E with |x − a| < δ and |y − a| < δ.

Proof. If lim_{x→a, x∈E} f(x) exists in R, then an application of the triangle inequality shows that the ε, δ-condition of the theorem holds. Conversely, assume that the ε, δ-condition holds and let {a_n} be a sequence in E with a_n → a. By the hypothesis, {f(a_n)} is a Cauchy sequence and so converges to some real number L. Suppose {b_n} is another sequence in E converging to a. Then a_n and b_n both lie within δ of a for all sufficiently large n so, by the ε, δ-condition, f(a_n) − f(b_n) → 0. Therefore, f(b_n) → L. By 3.1.9, lim_{x→a, x∈E} f(x) = L.

3.1.12 Theorem. Let f and g be functions whose domains include E and let a ∈ R be an accumulation point of E. Then the following properties hold in the sense that if the expressions on the right exist in R, then the limits on the left exist and the equality holds.
(a) lim_{x→a, x∈E} [s f(x) + t g(x)] = s lim_{x→a, x∈E} f(x) + t lim_{x→a, x∈E} g(x), s, t ∈ R.
(b) lim_{x→a, x∈E} f(x)g(x) = lim_{x→a, x∈E} f(x) · lim_{x→a, x∈E} g(x).
(c) lim_{x→a, x∈E} f(x)/g(x) = lim_{x→a, x∈E} f(x) / lim_{x→a, x∈E} g(x) if lim_{x→a, x∈E} g(x) ≠ 0.
(d) lim_{x→a, x∈E} |f(x)| = |lim_{x→a, x∈E} f(x)|.

Proof. The assertions follow immediately from 2.1.11 and 3.1.9. However, it is instructive to formulate direct proofs. We do this for the finite version of part (c). Assume that the limits L := lim_{x→a, x∈E} f(x) and M := lim_{x→a, x∈E} g(x) ≠ 0

are finite and let ε > 0. Choose N_1(a) such that |g(x) − M| < |M|/2 for all x ∈ E ∩ N_1(a). For such x, |g(x)| ≥ |M| − |M − g(x)| ≥ |M|/2, hence
\[
\begin{aligned}
\left| \frac{f(x)}{g(x)} - \frac{L}{M} \right|
&= \frac{|M f(x) - L g(x)|}{|M g(x)|}
 = \frac{|M (f(x) - L) + L (M - g(x))|}{|M g(x)|} \\
&\le \frac{|M|\,|f(x) - L| + |L|\,|M - g(x)|}{|M|^2/2} \\
&= \frac{2}{|M|}\,|f(x) - L| + \frac{2|L|}{M^2}\,|M - g(x)| \\
&\le K \bigl( |f(x) - L| + |M - g(x)| \bigr), \qquad K := 2/|M| + 2|L|/M^2 .
\end{aligned}
\]
Now choose N_2(a) so that |f(x) − L| < ε/(2K) and |M − g(x)| < ε/(2K) for all x ∈ E ∩ N_2(a). Then x ∈ E ∩ N_1(a) ∩ N_2(a) ⇒ |f(x)/g(x) − L/M| < ε.

3.1.13 Example (Limits of rational functions at infinity). Let f(x) = P(x)/Q(x), where P(x) = a_0 + a_1 x + · · · + a_n x^n and Q(x) = b_0 + b_1 x + · · · + b_m x^m, a_n, b_m ≠ 0. For any a, c ∈ R, lim_{x→c} a = a and lim_{x→c} x = c, hence, by 3.1.12, lim_{x→c} f(x) = f(c), provided Q(c) ≠ 0. To calculate limits at +∞, write
\[
f(x) = \frac{a_0 x^{-n} + a_1 x^{-n+1} + \cdots + a_{n-1} x^{-1} + a_n}{b_0 x^{-m} + b_1 x^{-m+1} + \cdots + b_{m-1} x^{-1} + b_m}\; x^{\,n-m} .
\]
Since lim_{x→+∞} x^{−j} = 0 for j ∈ N, we see that
\[
\lim_{x \to +\infty} f(x) =
\begin{cases}
0 & \text{if } m > n, \\
a_n / b_m & \text{if } m = n, \text{ and} \\
\pm\infty & \text{if } m < n,
\end{cases}
\]
where the sign in the last case is that of a_n/b_m.

♦
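The three cases of 3.1.13 depend only on the degrees n, m and the leading coefficients a_n, b_m, so they can be mechanized directly. The following is a minimal Python sketch and not part of the text; the function name `rational_limit_at_plus_infinity` and the convention of passing coefficients as lists in increasing degree are assumptions made for illustration.

```python
from fractions import Fraction


def rational_limit_at_plus_infinity(p_coeffs, q_coeffs):
    """Limit of P(x)/Q(x) as x -> +infinity, following the cases of 3.1.13.

    p_coeffs = [a0, a1, ..., an] and q_coeffs = [b0, b1, ..., bm] list the
    coefficients in increasing degree (an illustrative convention), with
    an != 0 and bm != 0.  Returns a Fraction for a finite limit and
    float('inf') or float('-inf') otherwise.
    """
    n, m = len(p_coeffs) - 1, len(q_coeffs) - 1
    a_n, b_m = p_coeffs[-1], q_coeffs[-1]
    if a_n == 0 or b_m == 0:
        raise ValueError("leading coefficients must be nonzero")
    if m > n:
        # Denominator degree dominates: the factor x^(n-m) tends to 0.
        return Fraction(0)
    if m == n:
        # Equal degrees: the limit is the ratio of leading coefficients.
        return Fraction(a_n) / Fraction(b_m)
    # m < n: x^(n-m) -> +infinity, and the sign is that of a_n / b_m.
    return float("inf") if (a_n > 0) == (b_m > 0) else float("-inf")


if __name__ == "__main__":
    # (3x^2 + 1) / (6x^2 + 5x + 2): equal degrees.
    print(rational_limit_at_plus_infinity([1, 0, 3], [2, 5, 6]))
```

Only the leading terms enter the answer, which mirrors the rewriting of f(x) used in the example: every lower-order term is multiplied by a negative power of x and vanishes in the limit.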

3.1.14 Theorem. Let f and g be functions whose domains include E and let a ∈ R be an accumulation point of E. If f(x) ≤ g(x) for all x ∈ E and if L := lim_{x→a, x∈E} f(x) and M := lim_{x→a, x∈E} g(x) exist in R, then L ≤ M.

Proof. Assume, for a contradiction, that M < L. Choose any K ∈ (M, L) and then choose neighborhoods N(L) ⊆ (K, +∞) and N(M) ⊆ (−∞, K) (see Figure 3.2). Then there exists a neighborhood N(a) such that f(x) ∈ N(L) and g(x) ∈ N(M) for all x ∈ E ∩ N(a). But for any such x, g(x) < K < f(x), contradicting the hypothesis.

FIGURE 3.2: L can't be greater than M. (Figure labels, left to right: M and g(x) inside N(M), then K, then f(x) and L inside N(L).)

3.1.15 Theorem (Squeeze principle for functions). Let f, g, and h be functions whose domains contain E and let a ∈ R be an accumulation point of E. If f(x) ≤ g(x) ≤ h(x) for all x ∈ E and if the limits lim_{x→a, x∈E} f(x) and lim_{x→a, x∈E} h(x) exist in R and are equal, then lim_{x→a, x∈E} g(x) exists in R and all three limits are equal.

Proof. Let L denote the common limit. For the case L ∈ R, given ε > 0 there exists a neighborhood N(a) of a such that L − ε < f(x) ≤ g(x) ≤ h(x) < L + ε for all x ∈ E ∩ N(a). The cases L = ±∞ are proved similarly.

3.1.16 Definitions. A function f is said to be strictly increasing on E if f(x) < f(y) for all x, y ∈ E with x < y. Similarly, f is increasing on E if f(x) ≤ f(y) for all x, y ∈ E with x < y. The notions of strictly decreasing and decreasing are defined analogously. If f is either (strictly) increasing or (strictly) decreasing on E, then f is said to be (strictly) monotone on E. Finally, f is bounded on E if there exists a real number M such that |f(x)| ≤ M for all x ∈ E. ♦

The reader should compare the following theorem with the monotone sequence theorem (2.2.2).

3.1.17 Monotone Function Theorem. Let a, b, c ∈ R with a < c < b. If f is monotone on (a, b), then lim_{x→a+} f(x), lim_{x→b−} f(x) exist in R and lim_{x→c−} f(x), lim_{x→c+} f(x) exist in R.

Proof. Assume that f is increasing. Let s := sup […]

[…] 0 for all x ∈ E. Prove that
\[
\limsup_{\substack{x \to a \\ x \in E}} \frac{1}{f(x)} = \frac{1}{\liminf_{\substack{x \to a \\ x \in E}} f(x)} .
\]

5. Prove that
\[
\limsup_{\substack{x \to a \\ x \in E}} f(x) \le \limsup_{\substack{x \to a \\ x \in E}} |f(x)|
\quad\text{and}\quad
\Bigl| \liminf_{\substack{x \to a \\ x \in E}} f(x) \Bigr| \ge \liminf_{\substack{x \to a \\ x \in E}} |f(x)| .
\]
Show by examples that the inequalities may be strict.

6. Let f : [a, b) → R and g(x) = sup_{a≤t≤x} f(t), a ≤ x < b. Prove that g(x_0) ≤ lim_{x→x_0+} g(x) for every x_0 ∈ [a, b).

3.3 Continuous Functions

3.3.1 Definition. A function f with domain D is said to be continuous at a point a ∈ D if lim_{x→a, x∈D} f(x) = f(a); that is, for each ε > 0 there exists a δ > 0 such that |f(x) − f(a)| < ε for all x ∈ D with |x − a| < δ. If f is continuous at each point of a subset E of D, then f is said to be continuous on E. If f is continuous on D, then f is simply said to be continuous. A point in D at which f is not continuous is called a discontinuity of f. ♦

The definition of continuity implies that any function f : D → R is continuous at an isolated point of D. For example, if D is a finite set or a set of integers, then every function f : D → R is continuous.

Continuity of f on E is not the same as continuity of the restriction f|_E. For example, the function on R that is identically equal to one on Z and zero elsewhere is not continuous on Z, yet its restriction to Z is continuous (as a function with domain Z).

From the sequential characterization of limit we have

3.3.2 Sequential Characterization of Continuity. A function f with domain D is continuous at a ∈ D iff f(a_n) → f(a) for all sequences {a_n} in D with a_n → a.

3.3.3 Example. Let {r_1, r_2, . . .} be an enumeration of the rationals in (0, 1). Define f on (0, 1) by f(r_n) = 1/n and f(x) = 0 if x is irrational. We use the sequential characterization of continuity to show that f is continuous precisely at the irrational numbers in (0, 1).

Let x ∈ (0, 1) be rational. Choose a sequence {x_n} of irrational numbers converging to x. Since f(x_n) = 0 for all n and f(x) ≠ 0, f(x_n) ↛ f(x). Therefore, f is not continuous at any rational.

Now let x ∈ (0, 1) be irrational and let {x_n} be any sequence in (0, 1) converging to x. If f(x_n) ↛ f(x), then there exists an N ∈ N and a subsequence {y_n} of {x_n} such that f(y_n) ≥ 1/N for all n. By definition of f, y_n ∈ {r_1, r_2, . . . , r_N}. But this implies that x ∈ {r_1, r_2, . . . , r_N}, contradicting that x is irrational. (For a variation of this example, see Exercise 10.) ♦

The following is an immediate consequence of 3.1.12.

3.3.4 Theorem. Let f and g be functions with domain D, let α, β ∈ R and let a ∈ D. If f and g are continuous at a, then so are αf + βg, fg, and f/g (the last provided that g(a) ≠ 0).

3.3.5 Theorem. Let g : D → R and f : E → R with g(D) ⊆ E. If g is continuous at a ∈ D and f is continuous at g(a), then f ◦ g is continuous at a.

Proof. Let b := g(a). Given ε > 0, choose η > 0 such that |f(y) − f(b)| < ε for all y ∈ E with |y − b| < η. Next, choose δ > 0 such that |g(x) − b| < η for all x ∈ D with |x − a| < δ. Then |x − a| < δ implies |f(g(x)) − f(b)| < ε.

A more succinct proof uses the sequential characterization of continuity: a_n → a in D ⇒ g(a_n) → g(a) ⇒ f(g(a_n)) → f(g(a)).

Constant functions and the function f(x) = x are clearly continuous. It follows from 3.3.4 that polynomials and rational functions are continuous. Continuity of trigonometric, logarithmic, and exponential functions will follow from results in Chapter 4. Power functions x^α := e^{α ln x} are continuous as they are compositions of continuous functions. Of course, in each case the domain of the function must be carefully specified.

It is possible that a function is nowhere continuous. The Dirichlet function (3.1.7) is an example. By contrast, we have

3.3.6 Theorem. A monotone function on an open interval I has at most countably many discontinuities.

Proof.
Assume without loss of generality that f is increasing on I. Let D denote the set of discontinuities of f on I. For each t ∈ I, let
\[
a_t = \lim_{x \to t^-} f(x) \quad\text{and}\quad b_t = \lim_{x \to t^+} f(x)
\]
and let I_t = (a_t, b_t). Clearly, I_t ≠ ∅ iff t ∈ D (see Figure 3.3). Furthermore, by monotonicity, s < t ⇒ b_s ≤ a_t. Therefore, the sets I_t are pairwise disjoint. For each t ∈ D, choose a rational number r_t in I_t. Since the correspondence t → r_t is one-to-one and the set of rationals is countable, D is countable.

FIGURE 3.3: One-to-one correspondence between t ∈ D and r_t ∈ Q. (Figure labels: a_s, r_s, b_s and a_t, r_t, b_t on the vertical axis; s and t on the horizontal axis.)

Exercises

1.S Define
\[
f(x) = \begin{cases} mx + 3 & \text{if } x < 2, \\ 3x^2 + 7 & \text{if } x > 2. \end{cases}
\]
If f is continuous at x = 2, find the values of f(2) and m.

2. Find all values of a for which the following function is continuous on R.
\[
f(x) = \begin{cases} 3x^2 + 5x - 7 & \text{if } x < a, \\ 2x^2 + 2x + 3 & \text{if } x \ge a. \end{cases}
\]

3. Let f : (a, b) → R and g : (b, c) → R be continuous and suppose that
\[
\lim_{x \to b^-} f(x) = \lim_{x \to b^+} g(x).
\]
Show that there exists a continuous function h : (a, c) → R such that h = f on (a, b) and h = g on (b, c).

4.S Let g be continuous on R and let d(x) be the Dirichlet function. Show that f(x) := g(x)d(x) is continuous precisely at the zeros of g.

5. Let f be defined on an open interval I and let c ∈ I. Show that f is continuous at c iff for each strictly increasing sequence {a_n} converging to c and each strictly decreasing sequence {b_n} converging to c, f(a_n) → f(c) and f(b_n) → f(c).

6. Let f be a continuous function on [a, b] and let {a_n} be a sequence in [a, b]. Prove:
(a) f(lim sup_{n→∞} a_n) ≤ lim sup_{n→∞} f(a_n).
(b) f(lim inf_{n→∞} a_n) ≥ lim inf_{n→∞} f(a_n).
Show that equality holds in each case if f is increasing. Give examples to show that the inequalities may be strict.

7. Let f_1, . . . , f_n be continuous at x_0. Prove that the functions
\[
M_n(x) := \max_{1 \le j \le n} f_j(x) \quad\text{and}\quad m_n(x) := \min_{1 \le j \le n} f_j(x)
\]
are continuous at x_0. Give examples to show that the corresponding result is not true for infinitely many functions, where max is replaced by sup and min by inf.

8.S Let f : R → R be continuous at zero and satisfy f(x + y) = f(x) + f(y) for all x, y ∈ R. Prove that f(tx) = tf(x) for all t, x ∈ R. Conclude that f(x) = f(1)x for all x ∈ R.

9. A function f is right continuous at a if lim_{x→a+} f(x) = f(a) and left continuous at a if lim_{x→a−} f(x) = f(a).
(a) Prove that f is continuous at a iff f is both right and left continuous at a.
(b) Prove that the greatest integer function ⌊x⌋ is right continuous on R but not left continuous at any integer.
(c)S Let {c_n} be any sequence in R. For x ∈ R define
\[
f(x) = \sum_{n \,:\, c_n \le x} 2^{-n},
\]
where the notation indicates that the sum, possibly infinite, is taken over all indices n for which c_n ≤ x. (If there are no such indices, the sum is defined to be 0.) Prove that f is right continuous everywhere. Prove also that f is left continuous at a iff a is not equal to any c_n. (Note that, because the series \sum_{n=1}^{\infty} 2^{-n} converges, the order of summation is irrelevant (6.4.10). Thus f(x) is well-defined.)
(d) Let f be increasing on an interval I. Define g on I by
\[
g(x) = \lim_{t \to x^+} f(t) = \inf_{t > x} f(t).
\]
Prove that g is increasing and right continuous on I and that g is continuous at a iff f is continuous at a.

10. Define f : (0, 1) → R by
\[
f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational,} \\ 1/n & \text{if } x = m/n, \text{ reduced.} \end{cases}
\]
Use the sequential characterization of continuity to show that f is continuous precisely at the irrational numbers in (0, 1).

11.S Let f : [0, 1] → R have the property that the limit g(x) := lim_{t→x} f(t) exists in R for all x ∈ [0, 1]. Prove that
(a) g is continuous.
(b) f has at most countably many discontinuities.
Hint. For (a), use the sequential criterion. For (b), use ideas similar to those used in the proof of 3.3.6.

3.4 Properties of Continuous Functions

3.4.1 Extreme Value Theorem. If f is continuous on a closed bounded interval [a, b], then f has a maximum and a minimum; that is, there exist x_m, x_M ∈ [a, b] such that f(x_m) ≤ f(x) ≤ f(x_M) for all x ∈ [a, b].

Proof. We show first that f is bounded. Suppose, for instance, that f is not bounded above. Then for each n ∈ N there exists a_n ∈ [a, b] such that f(a_n) > n. On the other hand, by the Bolzano–Weierstrass theorem, {a_n} has a convergent subsequence, say a_{n_k} → x_0. But then, by continuity, n_k < f(a_{n_k}) → f(x_0) < +∞, impossible. Thus f must be bounded above. Similarly, f is bounded below.

Now let M := sup{f(x) : x ∈ [a, b]}. By the first paragraph, M is finite. By the approximation property for suprema, there exists a sequence x_n ∈ [a, b] such that f(x_n) → M. By the Bolzano–Weierstrass theorem again, there exists a subsequence x_{n_k} converging to some x_M ∈ [a, b]. By continuity, f(x_M) = M. Therefore, f(x_M) is the maximum of f. The proof for the minimum case is similar.

The examples f(x) = 1/x on (0, 1) and f(x) = x on [0, +∞) show that the interval in the theorem must be both closed and bounded.

3.4.2 Definition. A function f is said to have the intermediate value property on an interval I if, for each a, b ∈ I with a < b and each y_0 between f(a) and f(b), there exists an x_0 ∈ (a, b) such that f(x_0) = y_0. ♦

The intermediate value property simply asserts that f(I) is an interval whenever I is an interval.

3.4.3 Intermediate Value Theorem. A continuous function f on an interval I has the intermediate value property.

Proof. Let a, b ∈ I with a < b and suppose that f(a) < y_0 < f(b). The set E := {x ∈ [a, b] : f(x) < y_0} contains a and is bounded above by b, hence x_0 := sup E exists and lies in [a, b]. By continuity of f at a, E contains an interval [a, a + δ), hence x_0 > a. Since f(x) < y_0 for all x ∈ E, 3.1.14 and the continuity of f at x_0 imply that
\[
f(x_0) = \lim_{\substack{x \to x_0 \\ x \in E}} f(x) \le y_0 .
\]
In particular, x_0 ≠ b. Similarly, since f(x) ≥ y_0 for all x ∈ (x_0, b),
\[
f(x_0) = \lim_{x \to x_0^+} f(x) \ge y_0 .
\]
Therefore, y_0 = f(x_0). Figure 3.4 illustrates the proof.

FIGURE 3.4: y_0 = f(x_0). (Figure labels: f(a), y_0, f(b) on the vertical axis; a, the set E, x_0, and b on the horizontal axis.)

Simple examples show that the continuity hypothesis is essential. Of course, there are many discontinuous functions that have the intermediate value property (see Exercise 5). Interestingly, all derivatives have the intermediate value property, whether they are continuous or not (Exercise 4.2.25). Thus a function without the intermediate value property cannot have an antiderivative.

Combining the extreme and intermediate value theorems we obtain

3.4.4 Corollary. If f is continuous on [a, b], then f([a, b]) = [f(x_m), f(x_M)].

3.4.5 Corollary (Existence of nth roots). For each b > 0 and n ∈ N, the equation x^n = b has a unique positive solution.

Proof. Let f(x) = x^n. Since lim_{x→+∞} x^n = +∞, we may choose c > 0 such that f(c) > b > f(0) = 0. By the intermediate value theorem, the equation f(x) = b has a positive solution. By Exercise 3.1.12, x^n is strictly increasing on (0, +∞), hence the solution is unique.

Here is another application of the intermediate value theorem.

3.4.6 Example. The equation
\[
f(x) := \frac{2\sqrt{x} + \sin(3x^2)}{(x - 1)^3} + \frac{5x^2 + e^{2x+7}}{(x - 2)^5} = 0
\]
has a solution x = x_0 between 1 and 2. Indeed, since
\[
\lim_{x \to 1^+} f(x) = +\infty \quad\text{and}\quad \lim_{x \to 2^-} f(x) = -\infty,
\]

there must exist 1 < a < b < 2 such that f (a) > 0 > f (b). By the intermediate value theorem, f (x0 ) = 0 for some x0 ∈ (a, b). ♦ Remark. The zeros of a continuous function f may be approximated using the interval halving method, reminiscent of the proof of the Bolzano–Weierstrass theorem: Suppose f (a) < 0 < f (b) so that a zero of f lies in (a, b). Bisect the interval [a, b] and compute the values of f at the endpoints of the resulting two intervals. If one of these values is zero, stop. If neither is zero, then for one of the intervals, denote it by [a1 , b1 ], the values of f at the endpoints have opposite signs. The intermediate value theorem then implies that a zero of f lies in (a1 , b1 ), and we may approximate the zero by either a1 or b1 . Continuing this process, we may (theoretically) approximate a zero of f to any desired degree of accuracy. The procedure is easily programmable. ♦
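The interval halving method described in the remark can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the text: the function name `bisect_zero`, the stopping tolerance `tol`, and the iteration cap `max_iter` are assumptions introduced here, and the routine accepts either sign pattern at the endpoints.

```python
def bisect_zero(f, a, b, tol=1e-10, max_iter=200):
    """Approximate a zero of a continuous f on [a, b] by interval halving.

    Requires that f(a) and f(b) have opposite signs (or that one of them
    is zero).  The returned point lies within tol of an actual zero,
    unless max_iter halvings are exhausted first.
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        mid = (a + b) / 2
        fm = f(mid)
        # Stop if the midpoint is a zero or the bracket is short enough.
        if fm == 0 or (b - a) / 2 < tol:
            return mid
        if (fa < 0) != (fm < 0):
            # Sign change on [a, mid]: keep the left half.
            b, fb = mid, fm
        else:
            # Otherwise the sign change is on [mid, b]: keep the right half.
            a, fa = mid, fm
    return (a + b) / 2


if __name__ == "__main__":
    # The positive solution of x^2 = 2 (compare 3.4.5 with n = 2, b = 2).
    print(bisect_zero(lambda x: x * x - 2, 1.0, 2.0))
```

Each pass halves the length of the bracketing interval, so after k passes a zero is located to within (b − a)/2^k; the intermediate value theorem guarantees that every bracket kept by the loop still contains a zero.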

Exercises

1. Find an example of a bounded function on [0, 1] with a single discontinuity that has no maximum or minimum.

2.S Let f be continuous and positive on R with lim_{x→±∞} f(x) = 0. Prove that f has a maximum value on R.

3. Let f be continuous on R with lim_{x→±∞} f(x) = +∞. Prove that f has a minimum value on R.

4. A function f defined on an interval J and taking values in R is said to be upper (lower) semicontinuous at x_0 ∈ J if
\[
f(x_0) \ge \limsup_{x \to x_0} f(x) \qquad \Bigl( f(x_0) \le \liminf_{x \to x_0} f(x) \Bigr),
\]
where the limits are one-sided if x_0 is an endpoint of J. If f is upper (lower) semicontinuous at each point of J, then f is said to be upper (lower) semicontinuous on J.
(a) Prove that f is upper semicontinuous at x_0 iff −f is lower semicontinuous at x_0.
(b) Prove that f is continuous at x_0 iff it is both upper and lower semicontinuous at x_0.

(c) Show that, at any integer n, ⌊x⌋ is upper semicontinuous but not lower semicontinuous.
(d) Let f(x) = sin(1/x), x ≠ 0, and f(0) = a. Show that f is upper (lower) semicontinuous at 0 iff a ≥ 1 (a ≤ −1).
(e)S Let f_i be defined on J and upper semicontinuous at x_0 for every i in some index set I. Define f(x) = inf_{i∈I} f_i(x), x ∈ J. Show that f is upper semicontinuous at x_0. Give an example to show that f may not be continuous at x_0 even if each f_i is continuous on J.
(f) (Semi-extreme value property) Prove: If f is upper (lower) semicontinuous at each point of [a, b], then f is bounded above (below) on [a, b] and there exists x_0 ∈ [a, b] such that f(x_0) ≥ f(x) (f(x_0) ≤ f(x)) for all x ∈ [a, b].

5. Give an example of a function on [0, 1] with the intermediate value property that is
(a) discontinuous at precisely the points 1/n, n = 1, 2, . . . .
(b)S discontinuous everywhere.

6. Prove that a polynomial P of odd degree maps R onto R. In particular, P has a real zero.

7. Use the intermediate value theorem to show that each of the following equations has a solution in the indicated interval I.
(a) ln x + x = e, I = (1, e).
(b) sin x = ax, I = (π/2, π), 0 < a < 2/π.
(c)S tan x = x, I = (nπ, (n + 1/2)π), n ∈ N.
(d) e^x = 4.82 sin x, I = (0, π/2) and I = (π/2, π).
(e)
\[
\frac{x^4 + x^2 + 1}{x + 1} + \frac{x^3 + 1}{x} + \frac{e^{-x} + x}{x - 1} = 0, \quad I = (-1, 0) \text{ and } I = (0, 1).
\]

(f)S

e1−x − x2 2x2 − 5 = , I = (0, π/2). sin x cos x

2

8. Prove that the equation e^x = x^n (n ∈ N) has a solution in R iff n ≥ 3. Hint. Find the minimum of e^x/x^n on (0, +∞).

9.S Let f : [a, b] → [a, b] be continuous. Prove that there exists x ∈ [a, b] such that f(x) = x.

10. Prove that if n ∈ N is odd, then every real number has a unique nth root.

11. Let f be continuous and nonzero on R. Let a_0 be arbitrary and define {a_n} recursively by a_n = a_{n−1} + f(a_{n−1}), n ≥ 1. Show that either a_n ↑ +∞ or a_n ↓ −∞.

3.5 Uniform Continuity

Recall that a function f is continuous on a set E if for each y ∈ E and each ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all x in the domain of f with |x − y| < δ. The number δ typically depends on both ε and y. Removing the dependence on y results in the notion of uniform continuity: 3.5.1 Definition. A function f is said to be uniformly continuous on a subset E of the domain of f if for each ε > 0 there exists δ > 0 such that |f (x) − f (y)| < ε for all x, y ∈ E with |x − y| < δ.

♦
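The dependence of δ on y can be made concrete for f(x) = 1/x on (0, +∞). The computation below is an illustration added here, not part of the text. For fixed y > 0 and ε > 0, the binding constraint comes from points to the left of y: 1/x − 1/y < ε iff x > y/(1 + εy), so the largest admissible δ is y − y/(1 + εy) = εy²/(1 + εy). The names `largest_delta` and `check_delta` are hypothetical helpers.

```python
def largest_delta(y, eps):
    """Largest delta with |1/x - 1/y| < eps whenever x > 0 and |x - y| < delta.

    Derived for f(x) = 1/x on (0, +infinity); see the lead-in for the
    short calculation.  Assumes y > 0 and eps > 0.
    """
    if y <= 0 or eps <= 0:
        raise ValueError("y and eps must be positive")
    return eps * y * y / (1 + eps * y)


def check_delta(y, eps, delta, samples=1000):
    """Sample points x with |x - y| < delta and test |1/x - 1/y| < eps."""
    for k in range(1, samples):
        t = -1 + 2 * k / samples  # t runs through (-1, 1)
        x = y + t * delta
        if x > 0 and abs(1 / x - 1 / y) >= eps:
            return False
    return True


if __name__ == "__main__":
    eps = 0.1
    for y in (1.0, 0.1, 0.01):
        # The admissible delta shrinks roughly like eps * y^2 as y -> 0.
        print(y, largest_delta(y, eps))
```

Since the admissible δ tends to 0 as y → 0+, no single δ serves every y in (0, +∞) for a given ε, and this is exactly the dependence that uniform continuity rules out.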

The following result is frequently useful in determining whether or not a function is uniformly continuous.

3.5.2 Sequential Characterization of Uniform Continuity. A function f is uniformly continuous on E iff f(x_n) − f(y_n) → 0 for all sequences {x_n} and {y_n} in E with x_n − y_n → 0.

Proof. Let f be uniformly continuous on E and let {x_n} and {y_n} be sequences in E with x_n − y_n → 0. Given ε > 0, choose δ > 0 so that |f(x) − f(y)| < ε for all x, y ∈ E with |x − y| < δ. Next, choose N ∈ N such that |x_n − y_n| < δ for all n ≥ N. For such n, |f(x_n) − f(y_n)| < ε. Thus f(x_n) − f(y_n) → 0.

Now assume that f is not uniformly continuous on E. Then there exists an ε > 0 and sequences x_n, y_n ∈ E with |x_n − y_n| < 1/n and |f(x_n) − f(y_n)| ≥ ε. Then x_n − y_n → 0 but f(x_n) − f(y_n) ↛ 0, so f does not satisfy the sequential condition.

3.5.3 Example. The function f(x) = 1/x, x > 0, is uniformly continuous on intervals of the form [r, +∞), r > 0, as may be seen from the inequality
\[
|f(x) - f(y)| = \frac{|x - y|}{xy} \le \frac{|x - y|}{r^2}, \quad x, y \ge r .
\]
However, f is not uniformly continuous on (0, +∞). Indeed, if x_n = 1/(2n) and y_n = 1/n, then x_n − y_n → 0 yet f(x_n) − f(y_n) = n → +∞. ♦

3.5.4 Theorem. Let f, g be uniformly continuous on E and let α, β ∈ R. Then
(a) αf + βg is uniformly continuous on E.
(b) If f and g are bounded, then fg is uniformly continuous on E.
(c) If g ≠ 0 and 1/g is bounded on E, then 1/g is uniformly continuous on E.

Proof. Part (a) follows easily from the sequential characterization of uniform continuity. For (b), let M > 0 be such that |f(x)|, |g(x)| ≤ M for all x ∈ E. Uniform continuity of fg then follows from the inequalities
\[
\begin{aligned}
|f(x)g(x) - f(y)g(y)| &\le |f(x)g(x) - f(y)g(x)| + |f(y)g(x) - f(y)g(y)| \\
&\le M |f(x) - f(y)| + M |g(x) - g(y)| .
\end{aligned}
\]
For (c), choose K > 0 such that 1/|g(x)| < K for all x ∈ E. Uniform continuity of 1/g then follows from
\[
\left| \frac{1}{g(x)} - \frac{1}{g(y)} \right| = \frac{|g(x) - g(y)|}{|g(x)g(y)|} \le K^2 |g(x) - g(y)|, \quad x, y \in E .
\]

The following theorem may be given a short proof based on the sequential criterion for uniform continuity. We leave the details to the reader.

3.5.5 Theorem. Suppose that g is uniformly continuous on D, f is uniformly continuous on E, and g(D) ⊆ E. Then f ◦ g is uniformly continuous on D.

The next theorem shows that on closed and bounded intervals the notions of continuity and uniform continuity coincide.

3.5.6 Theorem. If f is continuous on a closed bounded interval [a, b], then f is uniformly continuous there.

Proof. We use the sequential characterization of uniform continuity. Let {x_n} and {y_n} be sequences in [a, b] with x_n − y_n → 0. Suppose, for a contradiction, that f(x_n) − f(y_n) ↛ 0. Then |f(x_n) − f(y_n)| > ε for some ε > 0 and infinitely many n and hence for a subsequence of {n}. Changing notation if necessary, we may suppose that the inequality holds for all n. By the Bolzano–Weierstrass theorem, {x_n} has a convergent subsequence, say x_{n_k} → x_0. Since x_{n_k} − y_{n_k} → 0, y_{n_k} → x_0. But then, by continuity, |f(x_{n_k}) − f(y_{n_k})| → 0, which is impossible.

The connection between continuity and uniform continuity on open intervals is more complicated. For this, we need the following definitions.

3.5.7 Definition. A continuous function f on D is said to have a continuous extension to a set D_1 ⊇ D if there exists a continuous function f_1 : D_1 → R such that f_1|_D = f. In the special case D_1 = D ∪ {a}, where a ∉ D, f(x) is said to have a removable discontinuity at x = a. ♦

3.5.8 Proposition. Let f be defined and continuous on D and let a be an accumulation point of D, a ∉ D. Then f has a removable discontinuity at x = a iff L := lim_{x→a, x∈D} f(x) exists in R.

Proof. The necessity is clear. For the sufficiency, simply set f(a) = L to obtain a continuous extension of f to D ∪ {a}.

For example, the functions
\[
x \sin\frac{1}{x}, \qquad \frac{\sin x}{x}, \qquad\text{and}\qquad \frac{x}{\sqrt{|x|}},
\]
defined for x ≠ 0, have removable discontinuities at x = 0 and hence have unique continuous extensions to R. On the other hand, since lim_{x→0+} sin(1/x) does not exist, the function sin(1/x) does not have a removable discontinuity at x = 0.

The following theorem is the main result regarding uniform continuity of functions on bounded open intervals.

3.5.9 Theorem. Let f be continuous on the bounded interval (a, b). The following statements are equivalent:
(a) lim_{x→a+} f(x) and lim_{x→b−} f(x) exist in R.
(b) f has a continuous extension to [a, b].
(c) f is uniformly continuous on (a, b).

Proof. (a) ⇒ (b) is immediate from 3.5.8.

(b) ⇒ (c): By 3.5.6, a continuous extension g of f to [a, b] is uniformly continuous. Therefore, f = g|_{(a,b)} is uniformly continuous.

(c) ⇒ (a): Let {a_n} be any sequence in (a, b) converging to a. Then {a_n} is Cauchy and since f is uniformly continuous, {f(a_n)} is Cauchy (Exercise 7). Therefore, L := lim_{n→∞} f(a_n) exists. We claim that lim_{x→a+} f(x) exists and equals L. To see this, let {a'_n} be any sequence in (a, b) converging to a. Then a_n − a'_n → 0, hence, by uniform continuity, f(a_n) − f(a'_n) → 0, so f(a'_n) → L. By the sequential characterization of limit (3.1.9), lim_{x→a+} f(x) = L. A similar argument shows that lim_{x→b−} f(x) exists.

For example, since sin(1/x) has no continuous extension to [0, 1], it is not uniformly continuous on (0, 1]. On the other hand, for any p > 0, lim_{x→0+} x^p sin(1/x) = 0, hence x^p sin(1/x) is uniformly continuous on (0, 1]. For another example, consider f(x) = (1 − cos x)/x on R \ {0}. By l'Hospital's rule, proved in the next chapter, lim_{x→0} f(x) = lim_{x→0} sin x = 0, hence f has a continuous extension to R. Moreover, since lim_{x→±∞} f(x) = 0, f is uniformly continuous on R (Exercise 5).

3.5.10 Corollary. A bounded, continuous, monotone function f on a bounded interval (a, b) is uniformly continuous there.

Proof. By 3.1.17, lim_{x→a+} f(x) and lim_{x→b−} f(x) exist in R.

The following result relies on the mean value theorem proved in the next chapter.

3.5.11 Theorem. If f has a bounded derivative on an interval I, then f is uniformly continuous on I.

Proof. Let M be a bound for |f′| on I. By the mean value theorem, for any x, y ∈ I there exists a z between x and y such that f(x) − f(y) = f′(z)(x − y). Thus |f(x) − f(y)| ≤ M|x − y|, which implies uniform continuity.

For example, sin^n x and cos^n x have bounded derivatives for every n ∈ N, hence are uniformly continuous on R. This also follows from periodicity (see Exercise 11). On the other hand, x^p is not uniformly continuous on (0, +∞) for p > 1. Indeed, if x_n = n + n^{(1−p)/2} and y_n = n, then, by the mean value theorem, for each n there exists z_n ∈ (y_n, x_n) such that
\[
x_n^p - y_n^p = p z_n^{p-1} (x_n - y_n) = \frac{p z_n^{p-1}}{n^{(p-1)/2}} \ge p\, n^{(p-1)/2} \to +\infty .
\]
Since x_n − y_n → 0, 3.5.2 implies that x^p is not uniformly continuous.
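The behavior of the two sequences can be checked numerically. The sketch below is an illustration added here, not part of the text; the helper name `gap_and_difference` is hypothetical, and floating-point arithmetic limits how large n can usefully be taken.

```python
def gap_and_difference(n, p):
    """For x_n = n + n**((1 - p) / 2) and y_n = n, with p > 1, return

        (x_n - y_n,  x_n**p - y_n**p,  p * n**((p - 1) / 2)),

    that is, the gap between the points, the gap between the values of
    x**p, and the lower bound for the latter obtained from the mean
    value theorem.
    """
    if p <= 1 or n < 1:
        raise ValueError("need p > 1 and n >= 1")
    y = float(n)
    x = y + y ** ((1 - p) / 2)
    return x - y, x ** p - y ** p, p * y ** ((p - 1) / 2)


if __name__ == "__main__":
    # With p = 2 the points approach each other like 1 / sqrt(n) while
    # the values of x**2 separate like 2 * sqrt(n).
    for n in (1, 100, 10_000):
        print(n, gap_and_difference(n, 2))
```

For p = 2 the exact values are x_n − y_n = n^{−1/2} and x_n² − y_n² = 2√n + 1/n, which agrees with the bound p n^{(p−1)/2} = 2√n from the display above.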

Exercises

1.S Find functions f and g with f continuous and g uniformly continuous such that neither f ◦ g nor g ◦ f is uniformly continuous.

2. Let r > 0. Show that the function f(x) = (3x + 2)/(2x − 1) in 3.1.4 is uniformly continuous on D_r but not on its domain D, where D_r := (−∞, 1/2 − r] ∪ [1/2 + r, +∞) and D = (−∞, 1/2) ∪ (1/2, +∞).

3. Let a, b > 0. Give a careful ε, δ proof that each of the following functions is uniformly continuous on R.
(a)S √(ax² + b).
(b) 1/√(ax² + b).
(c) |ax + b|.

4. Show that ln x is uniformly continuous on (r, +∞) for every r > 0 but is not uniformly continuous on (0, 1).

5. Let f be continuous on [0, +∞). Prove that if lim_{x→+∞} f(x) exists and is finite, then f is uniformly continuous on [0, +∞). Give an example of a bounded continuous function on [0, +∞) that is not uniformly continuous.

6. Prove that each of the following functions is uniformly continuous on the indicated interval, where n ∈ N:
(a) sin(1/x), [r, +∞), r > 0.
(b) x sin(1/x), [0, +∞).
(c) arctan x, (−∞, +∞).
(d) x^n e^{−x}, [0, +∞).
(e) cos √(x² + 1), (−∞, +∞).
(f) x^p, 0 < p ≤ 1, [0, +∞).
(g) (1 + x^n)^{1/n}, [0, +∞).
(h) (1 + x^n)^{−1/n}, [0, +∞).

7.S Let f be uniformly continuous on E and let {a_n} be a Cauchy sequence in E. Prove that {f(a_n)} is Cauchy.

8. Suppose that f(x) is uniformly continuous on [0, +∞). Prove that the function g is uniformly continuous on R, where
\[
g(x) := \begin{cases} f(x) & \text{if } x \ge 0, \\ f(-x) & \text{if } x < 0. \end{cases}
\]

9.S Let f be uniformly continuous on R. Prove that f(|x|), |f(x)|, and |f(|x|)| are uniformly continuous on R.

10. Let f be uniformly continuous on each of the intervals (a, b) and (c, d), where a < b < c < d. Prove that f is uniformly continuous on the set (a, b) ∪ (c, d). What if b = c?

11. Let f : R → R be periodic with period p > 0, that is, f(x + p) = f(x) for all x ∈ R. If f is continuous on [0, p], prove that f is uniformly continuous and bounded on R.

12. Let f_1, . . . , f_n be uniformly continuous on E. Prove that the functions
\[
M(x) := \max_{1 \le j \le n} f_j(x) \quad\text{and}\quad m(x) := \min_{1 \le j \le n} f_j(x)
\]
are uniformly continuous on E.

13.S Find all values of p > 0 for which the function f(x) = x^{−p} sin x, x > 0, has a continuous extension to [0, +∞). Prove that for all such p the extension is uniformly continuous.

14. Let r > 0. Prove that f(x) := sin(x^p) is uniformly continuous on (r, +∞) iff p ≤ 1.

15.S Prove that a uniformly continuous function f on a bounded interval (a, b) is bounded. Give examples to show that the result is not true if (a, b) is unbounded or if f is merely continuous.

16. Give examples to show that parts (b) and (c) of 3.5.4 are not necessarily true if the boundedness conditions are removed.

17. Let f be continuous on [a, b]. Prove that
\[
g(x) := \sup_{a \le t \le x} f(t)
\]
is continuous on [a, b].

18.S Let
\[
f(x) = (1 - e^{1/x})^{-1}, \quad x \ne 0 .
\]
Is it possible to define f(0) so that f is continuous on R? What about for the function g(x) = x(1 − e^{1/x})^{−1}, x ≠ 0?

Chapter 4 Differentiation on R

The notion of rate of change of one quantity with respect to another is fundamental to many disciplines. It is expressed mathematically as the derivative of a function. In this chapter we establish the main properties of this important construct.

4.1 Definition of Derivative and Examples

4.1.1 Definition. A real-valued function f defined in a neighborhood of a ∈ R is said to be differentiable at a if the limit
\[
f'(a) = Df(a) = \left. \frac{df}{dx} \right|_{a} := \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}
\]
exists in R. The limit is then called the derivative of f at a. If f is differentiable at each member of a set E, then f is said to be differentiable on E and the function
\[
f' = Df = \frac{df}{dx}
\]
is called the derivative of f on E. If f′ is continuous on E, then f is said to be continuously differentiable on E. ♦

It follows immediately from the definition that the derivative of a constant function is 0. Here are some nontrivial examples.

4.1.2 Example. We prove the following special cases of the power rule (the general power rule will be proved later): Let n ∈ N and r = n or 1/n. Then Dx^r = rx^{r−1}. (In the second case x ≠ 0, and x > 0 if n is even.) The case r = n is obtained by letting h → 0 in the identity
\[
\frac{(x + h)^n - x^n}{h} = \sum_{j=1}^{n} (x + h)^{n-j} x^{j-1}
\]
(Exercise 1.2.4). Each term in the sum tends to x^{n−1}, and since there are n terms the formula follows. For the case r = 1/n we use the identity
\[
\frac{(x + h)^{1/n} - x^{1/n}}{h} = \left[ \sum_{j=1}^{n} (x + h)^{1 - j/n} x^{(j-1)/n} \right]^{-1}
\]
(Exercise 1.4.15). As h → 0, the term in square brackets tends to nx^{1−1/n}, verifying the formula. ♦

For the next example, and indeed for the remainder of the book, we shall use the standard definitions of cosine and sine as coordinates of points on the unit circle.¹ From this one can derive the usual trigonometric identities, which we shall invoke as needed.

FIGURE 4.1: $\sin h < h < \tan h$.

¹ A more rigorous approach to the calculus of trigonometric functions may be based on the inverse sine function. This approach is described briefly in Section 4.4.

4.1.3 Example. $D \sin x = \cos x$. From the identity $\sin^2 h + \cos^2 h = 1$ and the inequalities $\sin h < h < \tan h$, $0 < h < \pi/2$, which may be derived with the help of Figure 4.1, we see that
\[ \sqrt{1 - h^2} < \sqrt{1 - \sin^2 h} = \cos h < \frac{\sin h}{h} < 1, \quad 0 < h < \pi/2. \tag{4.1} \]
Since $\sin(-h) = -\sin h$ and $\cos(-h) = \cos h$, (4.1) holds for $-\pi/2 < h < 0$ as well. By the squeeze principle,
\[ \lim_{h \to 0} \cos h = \lim_{h \to 0} \frac{\sin h}{h} = 1. \]
From this and the calculation
\[ \frac{\cos h - 1}{h} = \frac{\cos^2 h - 1}{h(\cos h + 1)} = -\left( \frac{\sin h}{h} \right)^{2} \frac{h}{\cos h + 1} \]
we see that
\[ \lim_{h \to 0} \frac{\cos h - 1}{h} = 0. \]
Therefore,
\[ \frac{\sin(x + h) - \sin x}{h} = \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} = \sin x \, \frac{\cos h - 1}{h} + \cos x \, \frac{\sin h}{h} \to \cos x \quad \text{as } h \to 0. \]

♦
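The two limits used in Example 4.1.3 can be observed numerically. The following is a minimal sketch, not part of the proof; the helper names `sinc_ratio` and `difference_quotient` are hypothetical and chosen only for this illustration.

```python
import math

def sinc_ratio(h):
    """Return sin(h) / h for h != 0; Example 4.1.3 shows this tends to 1."""
    return math.sin(h) / h

def difference_quotient(f, x, h):
    """Return (f(x + h) - f(x)) / h, the quotient whose limit defines f'(x)."""
    return (f(x + h) - f(x)) / h

# Tabulate both quantities for shrinking h at the sample point x = 1.
x = 1.0
for k in range(1, 6):
    h = 10.0 ** (-k)
    print(h, sinc_ratio(h), difference_quotient(math.sin, x, h), math.cos(x))
```

As $h$ shrinks, the second column approaches 1 and the third approaches $\cos 1$, in agreement with the squeeze argument. Floating-point roundoff limits how small $h$ can usefully be taken.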

It is occasionally necessary to consider one-sided derivatives, which are defined by using one-sided limits in 4.1.1. Specifically, the left-hand and right-hand derivatives are, respectively,
\[ D_\ell f(a) = f'_\ell(a) := \lim_{x \to a^-} \frac{f(x) - f(a)}{x - a} \quad \text{and} \quad D_r f(a) = f'_r(a) := \lim_{x \to a^+} \frac{f(x) - f(a)}{x - a}. \]
From the general theory of limits, a function is differentiable at $a$ iff it has equal right-hand and left-hand derivatives at $a$. For example, at $x = 0$ the function $f(x) = |x|$ has right-hand derivative 1 and left-hand derivative $-1$ and so is not differentiable there. Although we shall have no need to do so, one may even consider the more general expressions
\[ \liminf_{\substack{x \to a \\ x \in E}} \frac{f(x) - f(a)}{x - a} \quad \text{and} \quad \limsup_{\substack{x \to a \\ x \in E}} \frac{f(x) - f(a)}{x - a}, \]

where $a$ is an accumulation point of $E$. The so-called Dini derivates are obtained by taking $E$ to be intervals of the form $(c, a)$ and $(a, c)$.

The following proposition provides a useful characterization of differentiability. It asserts that for $x$ near $a$, $f(x)$ is approximated by the linear function $y = f(a) + f'(a)(x - a)$, the equation of the tangent line at $a$.

4.1.4 Proposition. Let $f$ be defined in a neighborhood $N(a)$ of $a$. Then $f$ is differentiable at $a$ iff there exists a function $\eta$ on $N(a)$, continuous at $a$, such that $f(x) = f(a) + \eta(x)(x - a)$ for all $x \in N(a)$. In this case, $f'(a) = \eta(a)$.

Proof. If such a function $\eta$ exists, then
\[ \frac{f(x) - f(a)}{x - a} = \eta(x) \to \eta(a) \quad \text{as } x \to a, \]
hence $f'(a)$ exists and equals $\eta(a)$. Conversely, if $f$ is differentiable at $a$, define
\[ \eta(x) = \begin{cases} \dfrac{f(x) - f(a)}{x - a} & \text{if } x \in N(a) \setminus \{a\}, \\[1ex] f'(a) & \text{if } x = a. \end{cases} \]
Then $\eta$ has the required properties.

4.1.5 Corollary. If $f$ is differentiable at $a$, then $f$ is continuous there.

Proof. Simply note that $f(x) = f(a) + \eta(x)(x - a) \to f(a)$ as $x \to a$.

The example $|x|$ considered above shows that the converse of the corollary is false: $|x|$ is continuous at 0 but not differentiable there. It is a remarkable fact that there are continuous functions on $\mathbb{R}$ that are nowhere differentiable (see 8.9.7).

4.1.6 Theorem. If $c \in \mathbb{R}$ and $f$ and $g$ are differentiable at $a$, then so are $f + g$, $cf$, $fg$, and $f/g$, the last provided that $g(a) \ne 0$. Moreover, in this case,

(a) $(f + g)'(a) = f'(a) + g'(a)$,
(b) $(cf)'(a) = cf'(a)$,
(c) $(fg)'(a) = f(a)g'(a) + f'(a)g(a)$,
(d) $\left( \dfrac{f}{g} \right)'(a) = \dfrac{g(a)f'(a) - f(a)g'(a)}{g^2(a)}$.

Proof. We prove only (d). Let $h = f/g$. Since $g$ is continuous at $a$ and $g(a) \ne 0$, $h$ is defined in a neighborhood $N(a)$ on which $g$ is not 0. For $x \in N(a) \setminus \{a\}$, a little algebra shows that
\[ \frac{h(x) - h(a)}{x - a} = \frac{ g(a) \dfrac{f(x) - f(a)}{x - a} - f(a) \dfrac{g(x) - g(a)}{x - a} }{ g(x)g(a) }. \]
Letting $x \to a$, using the continuity of $g$ at $a$, yields (d).

The preceding theorem, together with 4.1.2 and 4.1.3, shows that polynomials, rational functions, and trigonometric functions are differentiable. (See Exercise 2.) The following important result will yield additional examples.

4.1.7 Chain Rule. Let $g$ be differentiable at $a$ and let $f$ be differentiable at $g(a)$. Then $f \circ g$ is differentiable at $a$ and $(f \circ g)'(a) = f'(g(a))g'(a)$.

Proof. Set $b := g(a)$. By 4.1.4, there exists a function $\eta$, defined in a neighborhood $N(b)$ of $b$ and continuous at $b$ with $\eta(b) = f'(b)$, such that
\[ f(y) = f(b) + \eta(y)(y - b), \quad y \in N(b). \tag{4.2} \]
Since $g$ is continuous at $a$, we may choose a neighborhood $N(a)$ of $a$ such that $g(N(a)) \subseteq N(b)$. Then $f \circ g$ is defined on $N(a)$, and by (4.2)
\[ \frac{f(g(x)) - f(g(a))}{x - a} = \eta(g(x)) \, \frac{g(x) - g(a)}{x - a}, \quad x \in N(a) \setminus \{a\}. \]
Letting $x \to a$ produces the desired result.

The formula $(f \circ g)'(x) = f'(g(x))g'(x)$ is sometimes easier to apply when written in Leibniz notation as
\[ \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}, \quad \text{where } y = f(u) \text{ and } u = g(x). \]

4.1.8 Example. The power rule $Dx^r = rx^{r-1}$, $r \in \mathbb{Q}$, follows from 4.1.2 and the chain rule: Let $r = m/n$, $m, n \in \mathbb{N}$, and set $u = x^{1/n}$ and $y = u^m$. Then $y = x^r$ and
\[ \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} = m u^{m-1} \cdot \frac{1}{n} x^{1/n - 1} = \frac{m}{n} x^{m/n - 1} = r x^{r-1}. \]
The case $r < 0$ may be verified using the quotient rule.

♦
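The computation in Example 4.1.8 can be checked numerically. The sketch below is illustrative only: it compares a central difference quotient for $x^{m/n}$ with the closed form $rx^{r-1}$ and with the chain-rule product $mu^{m-1} \cdot \frac{1}{n}x^{1/n-1}$. The function names are hypothetical, and $x > 0$ is assumed so that fractional powers are defined.

```python
def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def power_rule_value(m, n, x):
    """Closed form r * x**(r - 1) with r = m / n, assuming x > 0."""
    r = m / n
    return r * x ** (r - 1.0)

def chain_rule_value(m, n, x):
    """Product (dy/du)(du/dx) for u = x**(1/n), y = u**m, assuming x > 0."""
    u = x ** (1.0 / n)
    dy_du = m * u ** (m - 1)
    du_dx = (1.0 / n) * x ** (1.0 / n - 1.0)
    return dy_du * du_dx

def power_rule_gap(m, n, x):
    """Absolute gap between the numerical derivative and r * x**(r - 1)."""
    r = m / n
    numeric = central_difference(lambda t: t ** r, x)
    return abs(numeric - power_rule_value(m, n, x))
```

For instance, `power_rule_gap(2, 3, 1.7)` should be very small, and `chain_rule_value(2, 3, 1.7)` should agree with `power_rule_value(2, 3, 1.7)` up to roundoff.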

Higher order derivatives of $y = f(x)$ are defined inductively by
\[ f'' = D^2 f = \frac{d^2 y}{dx^2} := \frac{d}{dx} \frac{dy}{dx}, \quad \ldots, \quad f^{(n)} = D^n f = \frac{d^n f}{dx^n} := \frac{d}{dx} \frac{d^{n-1} f}{dx^{n-1}}. \]
By convention, we set $f^{(0)} = D^0 f := f$.

Exercises

1. Use the limit definition to find the derivative of

(a) $x^2 + x + 1$. (b)S $\sqrt{2x + 1}$. (c) $\dfrac{1}{x^2 + 1}$. (d)S $\dfrac{1}{\sqrt{3x + 2}}$.

2. Use the techniques of 4.1.3 to find the derivative of cos x. Use rules of differentiation to obtain the derivatives of tan x, cot x, sec x, and csc x. 3. Use rules of differentiation to find f 0 for each of the functions f : 2/3 2 √ √ 2x + 5 x −1 5 S 3 S (a) 5x + 7 3x + 2. (b) . (c) sin . 7x + 2 x2 + 1 q √ sin2 x − 1 (d) . (e) tan cos(1/x) . (f) ax + bx + c. 2 sin x + 1 4. Assuming that y is a differentiable function of x that satisfies the given dy equation, use the rules of differentiation to find : dx (a) x3 + y 3 − xy = 1. (b) S sin(xy 2 ) + x2 = 1. (c) tan(x + y) + y 2 = x.

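5. Let $f(x) = x^n |x|$, $n \in \mathbb{N}$. Find $f^{(n-1)}$ and $f^{(n)}$.

6. Let $f(x) = x^m \lfloor x \rfloor$, $m \in \mathbb{N}$. Find $f'_\ell(n)$ and $f'_r(n)$, $n \in \mathbb{Z}$.

7.S Find all values of $a$, $b$ such that $f'$ exists on $\mathbb{R}$, where
\[ f(x) = \begin{cases} ax^2 + bx + a/x & \text{if } x > 1, \\ x^3 & \text{if } x \le 1. \end{cases} \]

8. Find all values of $a$, $b$, and $c$ such that $f'$ is continuous on $(0, +\infty)$, where
\[ f(x) = \begin{cases} ax^2 + bx & \text{if } x > 1, \\ c\sqrt{x} & \text{if } 0 < x \le 1. \end{cases} \]

9. Let
\[ f(x) = \begin{cases} ax^2 + bx + c & \text{if } x > 1, \\ x^3 & \text{if } x \le 1. \end{cases} \]
Find all values of $a$, $b$, and $c$ such that

(a) $f$ is continuous on $\mathbb{R}$. (b) $f$ is differentiable on $\mathbb{R}$. (c) $f'$ is continuous on $\mathbb{R}$. (d) $f''$ exists on $\mathbb{R}$.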

10. Find all values of $c$ such that $f'(c)$ exists, where
\[ f(x) = \begin{cases} ax - 4 & \text{if } x > c, \\ 9x^2 & \text{if } x \le c. \end{cases} \]
Is $f'$ continuous at these values?

11. Let $f$ be differentiable at $a$. Use the limit definition of derivative to calculate

(a) $\displaystyle \lim_{h \to 0} \frac{f(a + 5 \sin h) - f(a + 2 \sin h)}{h}$. (b)S $\displaystyle \lim_{h \to 0} \frac{f(a + h^2) - f(a - h)}{h}$.

12. Let $g$ be differentiable on an open interval $I$ and let $f(x) = g(x)d(x)$, where $d(x)$ is the Dirichlet function (3.1.7). Let $a$ be a zero of $g$. Prove that $f'(a)$ exists iff $a$ is a zero of $g'$.

13. Let $f$ be differentiable at $c$ and let $\{a_n\}$ and $\{b_n\}$ be sequences such that $a_n < c < b_n$ and $a_n, b_n \to c$. Prove that
\[ \lim_{n \to \infty} \frac{f(b_n) - f(a_n)}{b_n - a_n} = f'(c). \]

14.S Let $f$ be differentiable and increasing on $(a, b)$. Prove that $f'(x) \ge 0$ for all $x \in (a, b)$.

15. Let $f$ be differentiable at $a$ and nonnegative in a neighborhood of $a$ with $f(a) = 0$. Prove that $f'(a) = 0$.

16.S Prove Leibniz's rule: If $f$ and $g$ are $n$ times differentiable, then
\[ D^n(fg) = \sum_{k=0}^{n} \binom{n}{k} (D^k f)(D^{n-k} g). \]

17. Prove that if $f$ has right-hand and left-hand derivatives at $a$ (not necessarily equal), then $f$ is continuous at $a$.

18. Assuming that $f$, $g$, and $h$ have the necessary differentiability, find general formulas for

(a) $D\big[f \circ (gh)\big]$. (b) $D\big[f \circ (g/h)\big]$. (c)S $D^2\big[f \circ g\big]$. (d) $D\big[f \circ g \circ h\big]$.

19. Find a formula for the $n$th derivative of

(a)S $1/x$. (b) $1/\sqrt{x}$. (c) $xe^x$. (d) $xe^{-x}$.

20. Find all values of $p \in \mathbb{R}$ for which the function
\[ f(x) = \begin{cases} |x|^p \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{otherwise} \end{cases} \]
is (a) continuous, (b) differentiable, (c) continuously differentiable on $\mathbb{R}$.

21.S Define $f(0) = 0$ and $f(x) = x^m \sin x^n$, $x \ne 0$, where $m \in \mathbb{Z}$, $n \in \mathbb{N}$. For what values of $m$ and $n$ does $f'(0)$ exist? For which of these values is $f'$ continuous on $\mathbb{R}$?

22. A function $f$ defined on a symmetric neighborhood $(-a, a)$ of 0 is said to be odd if $f(-x) = -f(x)$ and even if $f(-x) = f(x)$.

(a) Prove that any function $h : (-a, a) \to \mathbb{R}$ is the sum of an even function $f$ and an odd function $g$.
(b) Prove that if $f$ is differentiable and odd (even), then $f'$ is even (odd).
(c) Is the converse true? That is, if $f'$ is even (odd), is $f$ odd (even)?

23.S Let $f_j$, $g_j$, and $h_j$ be differentiable, $j = 1, 2, 3$. Prove that
\[ \begin{vmatrix} f_1 & f_2 \\ g_1 & g_2 \end{vmatrix}' = \begin{vmatrix} f_1' & f_2' \\ g_1 & g_2 \end{vmatrix} + \begin{vmatrix} f_1 & f_2 \\ g_1' & g_2' \end{vmatrix} \]
and
\[ \begin{vmatrix} f_1 & f_2 & f_3 \\ g_1 & g_2 & g_3 \\ h_1 & h_2 & h_3 \end{vmatrix}' = \begin{vmatrix} f_1' & f_2' & f_3' \\ g_1 & g_2 & g_3 \\ h_1 & h_2 & h_3 \end{vmatrix} + \begin{vmatrix} f_1 & f_2 & f_3 \\ g_1' & g_2' & g_3' \\ h_1 & h_2 & h_3 \end{vmatrix} + \begin{vmatrix} f_1 & f_2 & f_3 \\ g_1 & g_2 & g_3 \\ h_1' & h_2' & h_3' \end{vmatrix}. \]

4.2 The Mean Value Theorem

The mean value theorem relates the average rate of change of a function to its instantaneous rate of change. It is one of the most useful theorems in analysis and will play a central role in the proof of the fundamental theorem of calculus in Chapter 5. The proof of the mean value theorem is based on the existence of local extrema.

4.2.1 Definition. A function $f$ is said to have a local maximum (local minimum) at $c$ if $f$ is defined on an open interval $I$ containing $c$ and $f(x) \le f(c)$ ($f(x) \ge f(c)$) for all $x \in I$. In either case, $f$ is said to have a local extremum at $c$. ♦

FIGURE 4.2: Local extrema of $f$.

4.2.2 Local Extremum Theorem. If $f$ has a local extremum at $c$ and if $f$ is differentiable at $c$, then $f'(c) = 0$.

Proof. Suppose that $f$ has a local maximum at $c$. Let $I$ be an open interval containing $c$ such that $f(x) \le f(c)$ for all $x \in I$. Then
\[ \frac{f(x) - f(c)}{x - c} \; \begin{cases} \ge 0 & \text{if } x \in I \text{ and } x < c, \\ \le 0 & \text{if } x \in I \text{ and } x > c. \end{cases} \]
It follows that the left-hand derivative of $f$ at $c$ is $\ge 0$ and the right-hand derivative is $\le 0$, hence $f'(c) = 0$. The proof for the local minimum case is similar.

4.2.3 Rolle's Theorem. Let $f$ be continuous on $[a, b]$ and differentiable on $(a, b)$. If $f(a) = f(b)$, then there exists a point $c \in (a, b)$ such that $f'(c) = 0$.

Proof. By the extreme value theorem there exist $x_m, x_M \in [a, b]$ such that $f(x_m) \le f(x) \le f(x_M)$ for all $x \in [a, b]$. If $f(x_m) = f(x_M)$, then $f$ is a constant function and the assertion of the theorem holds trivially. If $f(x_m) \ne f(x_M)$, then either $x_m \in (a, b)$ or $x_M \in (a, b)$, and the conclusion follows from the local extremum theorem.

The following result is the key ingredient in the proof of l'Hospital's rule in Section 4.5.

4.2.4 Cauchy Mean Value Theorem. Let $f$ and $g$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists a point $c \in (a, b)$ such that
\[ [f(b) - f(a)] \, g'(c) = [g(b) - g(a)] \, f'(c). \]

Proof. The function $h(x) := [f(b) - f(a)]g(x) - [g(b) - g(a)]f(x)$ is continuous on $[a, b]$, differentiable on $(a, b)$, and satisfies $h(a) = h(b)$. By Rolle's theorem, $h'(c) = 0$ for some $c \in (a, b)$, which is the assertion of the theorem.

FIGURE 4.3: (a) Cauchy mean value theorem. (b) Mean value theorem.

If $f(a) \ne f(b)$ and $f'(x) \ne 0$ on $(a, b)$, then the conclusion of 4.2.4 may be written
\[ \frac{g(b) - g(a)}{f(b) - f(a)} = \frac{g'(c)}{f'(c)}. \]
For smooth functions $f$ and $g$, this equation asserts that at some point $(f(c), g(c))$ on the curve given parametrically by the equations $x = f(t)$ and $y = g(t)$, the line through the endpoints $(f(a), g(a))$ and $(f(b), g(b))$ is parallel to the line tangent to the curve at $(f(c), g(c))$. See Figure 4.3(a).

Taking $g(x) = x$ in the Cauchy mean value theorem yields the standard mean value theorem (Figure 4.3(b)):

4.2.5 Mean Value Theorem. If $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists $c \in (a, b)$ such that
\[ \frac{f(b) - f(a)}{b - a} = f'(c). \]

4.2.6 Corollary. Let $f(x)$ and $g(x)$ be differentiable on an open interval $I$ such that $f'(x) = g'(x)$ for all $x \in I$. Then there exists a constant $k$ such that $f = g + k$ on $I$.

Proof. Let $a, b \in I$. By the mean value theorem applied to $h := f - g$, there exists $c \in (a, b)$ such that $h(a) - h(b) = h'(c)(a - b)$. Since $h' = 0$, $h(a) = h(b)$. Since $a$ and $b$ were arbitrary, $h$ must be constant.

4.2.7 Corollary. Let $f$ be differentiable on an open interval $I$.

(a) If $f' \ge 0$ ($f' > 0$) on $I$, then $f$ is increasing (strictly increasing) on $I$.
(b) If $f' \le 0$ ($f' < 0$) on $I$, then $f$ is decreasing (strictly decreasing) on $I$.

Proof. We prove (a) for the strictly increasing case. Let $a, b \in I$, $a < b$. By the mean value theorem, $f(b) - f(a) = f'(c)(b - a)$ for some $c \in (a, b)$. Since $f'(c) > 0$, $f(b) > f(a)$.
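The mean value theorem asserts only that a point $c$ exists. When $f'$ is available in closed form and is continuous, one such point can be located numerically. The sketch below is an illustration under these extra assumptions, not a general algorithm: it bisects on $f'(x) - [f(b) - f(a)]/(b - a)$ and so additionally requires that this quantity change sign between the endpoints. The name `mean_value_point` is hypothetical.

```python
def mean_value_point(f, df, a, b, tol=1e-12):
    """Locate c in (a, b) with df(c) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes df is continuous on [a, b] and that df(x) - slope has opposite
    signs at a and b; raises ValueError otherwise.
    """
    slope = (f(b) - f(a)) / (b - a)
    lo, hi = a, b
    g_lo = df(lo) - slope
    g_hi = df(hi) - slope
    if g_lo * g_hi > 0:
        raise ValueError("df - slope does not change sign on [a, b]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = df(mid) - slope
        # Keep the half-interval on which the sign change persists.
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

# Example: f(x) = x**3 on [0, 2] has average slope 4, so 3c**2 = 4.
c = mean_value_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
print(c)
```

For $f(x) = x^3$ on $[0, 2]$ the returned value approximates $c = 2/\sqrt{3}$, the unique point of $(0, 2)$ at which the tangent is parallel to the chord.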

Exercises 1.S Show that cos x = (0, π/2).

√

x − 1 has exactly one solution x in the interval

2. Find an interval $I$ such that for each $c \in I$, $\sin x = x^2/2 + x + c$ has exactly one solution $x$ in the interval $(0, \pi/2)$.

3.S Show that $f(x) = x^4 - 4x^3 + 4x^2 + c$ has at most one zero in the interval $(1, 2)$. For what interval of values of $c$ does $f$ have exactly one zero in $(1, 2)$?

4. Let $f$ have $k$ derivatives and $n$ distinct zeros on an interval $I$. Prove that $f^{(k)}$ has at least $n - k$ distinct zeros in $I$.

5. Let $f$ have a continuous second derivative on $[-1, 3]$, $f(1) = 0$, and set $g(x) = x^2 f(x)$. Prove that $g''$ has at least one zero in $[-1, 2]$. Hint. Consider the function $g_n(x) := x(x + 1/n)f(x)$.

6. Let $P(x)$ be a polynomial of degree $n$ and let $a \ne 0$. Prove that the equation $e^{ax} = P(x)$ has at most $n + 1$ solutions.

7.S Let $P(x)$ be a polynomial of degree $n$ and let $a \ne 0$. Prove that the equation $\sin(ax) = P(x)$ has at most $n + 1$ solutions.

8. Prove Bernoulli's inequality: $(1 + x)^r \ge 1 + rx$ for all $x \ge -1$ and all rational numbers $r \ge 1$. (Cf. Exercise 1.5.10.)

9.S Let $f$ and $g$ be continuous on $[a, b]$ and differentiable on $(a, b)$ such that $|f'| \le |g'|$. If $g'$ is never zero on $(a, b)$, prove that $|f(x) - f(y)| \le |g(x) - g(y)|$ for all $x, y \in [a, b]$.

10. Let $f$ and $g$ be differentiable on an open interval $I$ and let $a, b \in I$ with $a < b$. Prove that if $f(a) = g(a)$ and $f' > g'$ on $(a, b)$, then $f > g$ on $(a, b)$. Use this to show that

(a) $\ln x < x - 1$ on the interval $(1, +\infty)$.
(b) $\sin x < x$ on the interval $(0, \pi/2)$.
(c) $\cos x > 1 - x$ on the interval $(0, \pi/2)$.
(d) $\tan x > x$ on the interval $(0, \pi/2)$.
(e) $e^x > 1 + x + x^2/2! + \cdots + x^n/n!$ on the interval $(0, +\infty)$. (Use induction.)

11.S Show that $\dfrac{\sin x}{x}$ is a decreasing function on $(0, \pi/2)$.

12. Show that on $(0, \pi/2)$

(a) $x \sin x + \cos x > 1$.
(b) $x \sin x + p \cos x < p$, $p \ge 2$.
(c) $x^{-1}(1 - \cos x)$ is increasing.
(d) $x^{-2}(1 - \cos x)$ is decreasing.

13. Let $a, b, p > 0$, and for $x \ge 0$ define $f(x) = a^p + x^p - (a + x)^p$. Show that for $x > 0$,
\[ f'(x) \; \begin{cases} > 0 & \text{if } 0 < p < 1, \\ < 0 & \text{if } p > 1. \end{cases} \]
Conclude that
\[ (a + b)^p \; \begin{cases} < a^p + b^p & \text{if } 0 < p < 1, \\ > a^p + b^p & \text{if } p > 1. \end{cases} \]

14. Let $f$ and $g$ have derivatives of order $n$ on an open interval $I$ and let $a \in I$. Suppose that $f^{(j)}(a) = g^{(j)}(a) = 0$, $j = 0, \ldots, n - 1$, and $f^{(j)}(x)g^{(j)}(x) \ne 0$ for $x > a$ and $j = 0, \ldots, n$. Prove that for any $b \in I$ with $b > a$ there exists $c \in (a, b)$ such that
\[ \frac{f(b)}{g(b)} = \frac{f^{(n)}(c)}{g^{(n)}(c)}. \]

15. Suppose that $f$ has a local maximum at $c$. Prove that
\[ \liminf_{x \to c^-} \frac{f(x) - f(c)}{x - c} \ge 0 \ge \limsup_{x \to c^+} \frac{f(x) - f(c)}{x - c}. \]

16. Let $f$ and $g$ be continuous on $[a, b]$, differentiable on $(a, b)$ and let $f(a) = f(b) = 0$. Show that there exists $c \in (a, b)$ such that $f'(c) = g'(c)f(c)$.

17.S Show that for any polynomial $P(x)$ there exist finitely many intervals with union $\mathbb{R}$ such that $P$ is strictly monotone on each interval.

18. Suppose that $f$ has the property $|f(x) - f(y)| \le c|x - y|^{1 + \varepsilon}$ for all $x, y \in \mathbb{R}$, where $c, \varepsilon > 0$. Prove that $f$ is constant.

19.S Let $f$ have a bounded derivative on $\mathbb{R}$. Prove that for sufficiently large $r$ the function $g(x) := rx + f(x)$ is one-to-one and maps $\mathbb{R}$ onto $\mathbb{R}$.

20. Suppose $f > 0$ on $(1, +\infty)$ and $\lim_{x \to +\infty} xf'(x)/f(x) \in (1, +\infty)$. Prove that $x/f(x)$ is decreasing on $(b, +\infty)$ for some $b > 1$.

21. Let $f$ be twice differentiable on $(0, a)$, $f'' \ge 0$, and $\lim_{x \to 0^+} f(x) = 0$. Prove that $f(x)/x$ is increasing on $(0, a)$. Show that the conclusion is false if the hypothesis $f'' \ge 0$ is dropped.

22.S Let $g(x) = x^2 \sin(1/x)$ if $x \ne 0$ and $g(0) = 0$. Set $f(x) = x + g(x)$. Show that $f'(0) > 0$ but $f$ is not monotone on any neighborhood of 0.

23. Let $\lim_{x \to +\infty} f'(x) = 0$. Prove that if $g \ge c > 0$ on $(a, +\infty)$, then
\[ \lim_{x \to +\infty} \big[ f\big(x + g(x)\big) - f(x) \big] = 0. \]

24. Let $f$ be differentiable on $\mathbb{R}$ with $\sup_{x \in \mathbb{R}} |f'(x)| < 1$. Prove that the sequence $\{x_n\}$ defined by $x_{n+1} = f(x_n)$ converges, where $x_1$ is arbitrary. Conclude that $f$ has a unique fixed point; that is, there exists a unique $x \in \mathbb{R}$ such that $f(x) = x$.

25.S Suppose $f$ is differentiable on an open interval $I$. Show that $f'$ has the intermediate value property. Conclude that if $f'(x) \ne 0$ on $I$, then $f$ is strictly monotone on $I$. Hint. Apply the extreme value theorem to the function $g(x) = f(x) - y_0(x - a)$, $a \le x \le b$.

26. Let $f$ be differentiable on $I := (1, +\infty)$. Prove that if $f'$ has finitely many zeros in $I$, then $\lim_{x \to +\infty} f(x)$ exists in $\overline{\mathbb{R}}$.

27. Let $f$ and $g$ have continuous derivatives on an interval $I$ with $g' \ne 0$ and let $a_j, b_j \in I$ with $a_j < b_j$, $j = 1, \ldots, n$. Prove that there exists $c \in I$ such that
\[ \sum_{j=1}^{n} [f(b_j) - f(a_j)] \, g'(c) = \sum_{j=1}^{n} [g(b_j) - g(a_j)] \, f'(c). \]


28.S A function f is said to be uniformly differentiable on an open interval I if, given ε > 0, there exists δ > 0 such that f (x) − f (y) 0 0, there exists a δ > 0 such that f (x) − f (y) f 0 (y) g(x) − g(y) − g 0 (y) < ε, for all x and y in I with 0 < |x − y| < δ. 30. Let f be differentiable on [a, +∞) and suppose that the zeros of f 0 form a strictly increasing sequence an ↑ +∞. Prove that if L := limn f (an ) exists in R, then limx→+∞ f (x) = L. 31.S Prove that a function f is continuously differentiable on an open interval I iff there exists a continuous function ϕ on I 2 such that f (x) − f (y) = ϕ(x, y)(x − y) for all x, y ∈ I. 32. Let f be continuous on (−r, r) and differentiable on (−r, 0) ∪ (0, r). If limx→0 f 0 (x) exists, prove that f 0 (0) exists and f 0 is continuous at 0.

*4.3 Convex Functions

4.3.1 Definition. A function $f$ is said to be convex on an interval $(a, b)$ if
\[ f\big((1 - t)u + tv\big) \le (1 - t)f(u) + tf(v) \]
for all $a < u < v < b$ and all $t \in [0, 1]$. $f$ is concave if $-f$ is convex.

♦

For example, $|x|$ is convex on $\mathbb{R}$, as is easily established using the triangle inequality.

To see the geometric significance of convexity, let $L_{uv} : [u, v] \to \mathbb{R}$ denote the function whose graph is the line segment from $(u, f(u))$ to $(v, f(v))$. Since a typical point on the line segment may be written
\[ (1 - t)\big(u, f(u)\big) + t\big(v, f(v)\big) = \big((1 - t)u + tv, \; (1 - t)f(u) + tf(v)\big), \quad t \in [0, 1], \]
we see that
\[ L_{uv}\big((1 - t)u + tv\big) = (1 - t)f(u) + tf(v). \]
This shows that $f$ is convex iff the line segment connecting any two points on the graph of $f$ lies above the part of the graph between the two points. (See Figure 4.4.)

FIGURE 4.4: Convex function.

Now let $x \in (u, v)$. Then for some $t \in (0, 1)$,
\[ x = (1 - t)u + tv = t(v - u) + u = (1 - t)(u - v) + v, \]
hence
\[ t = (x - u)/(v - u) \quad \text{and} \quad 1 - t = (v - x)/(v - u). \]
It follows that $f$ is convex on $(a, b)$ iff
\[ f(x) \le L_{uv}(x) = f(u) \frac{v - x}{v - u} + f(v) \frac{x - u}{v - u} \quad \text{for all } a < u < x < v < b. \tag{4.3} \]

4.3.2 Theorem. If $f : (a, b) \to \mathbb{R}$ has an increasing derivative, then $f$ is convex. In particular, $f$ is convex if $f'' \ge 0$.

Proof. Let $a < u < x < v < b$. By the mean value theorem applied to $f$ on each of the intervals $[u, x]$ and $[x, v]$, there exist points $y \in (u, x)$ and $z \in (x, v)$ such that
\[ \frac{f(x) - f(u)}{x - u} = f'(y) \le f'(z) = \frac{f(v) - f(x)}{v - x}. \]
Solving the inequality for $f(x)$ yields (4.3).

Thus $x^{2n}$ is convex on $\mathbb{R}$ for any $n \in \mathbb{N}$, $\ln(x)$ is concave on $(0, +\infty)$, and $x^p$ is convex on $(0, +\infty)$ if $p \ge 1$ and concave if $0 < p < 1$. There is a partial converse to 4.3.2. For this we need the following lemma.

4.3.3 Lemma. If $f$ is convex and $a < u < x \le y < v < b$, then

(a) $\dfrac{f(x) - f(u)}{x - u} \le \dfrac{f(y) - f(u)}{y - u} \le \dfrac{f(v) - f(y)}{v - y}$, and

(b) $\dfrac{f(v) - f(x)}{v - x} \le \dfrac{f(v) - f(y)}{v - y}$.

Proof. Referring to Figure 4.5, for (a) we have
\begin{align*}
\frac{f(x) - f(u)}{x - u} &\le \frac{L_{uy}(x) - f(u)}{x - u} && \text{by convexity, since } u < x < y, \\
&= \frac{f(y) - f(u)}{y - u} && \text{by equality of slopes on } L_{uy}, \\
&\le \frac{L_{uv}(y) - f(u)}{y - u} && \text{by convexity, since } u < y < v, \\
&= \frac{L_{uv}(v) - L_{uv}(y)}{v - y} && \text{by equality of slopes on } L_{uv}, \\
&\le \frac{f(v) - f(y)}{v - y} && \text{by convexity, since } u < y < v.
\end{align*}

FIGURE 4.5: Convex function inequalities.

A similar calculation verifies (b):
\[ \frac{f(v) - f(y)}{v - y} \ge \frac{L_{xv}(v) - L_{xv}(y)}{v - y} = \frac{L_{xv}(v) - L_{xv}(x)}{v - x} = \frac{f(v) - f(x)}{v - x}. \]

4.3.4 Theorem. If $f$ is convex, then $f'_r$ and $f'_\ell$ exist, are increasing, and $f'_\ell(x) \le f'_r(x)$.

Proof. Let $a < u < x \le y < v < b$. By (a) of the lemma, the difference quotients $[f(x) - f(u)]/(x - u)$ decrease as $x \to u^+$, so $f'_r(u)$ exists in $\overline{\mathbb{R}}$ and
\[ f'_r(u) \le \frac{f(v) - f(y)}{v - y} < +\infty. \]
Letting $v \to y^+$ shows that $f'_r(u) \le f'_r(y)$. Therefore, $f'_r$ is increasing. Similarly, by (b) the difference quotients $[f(v) - f(y)]/(v - y)$ increase as $y \to v^-$, so $f'_\ell(v)$ exists in $\overline{\mathbb{R}}$ and
\[ f'_\ell(v) \ge \frac{f(v) - f(x)}{v - x} > -\infty. \]
Taking $x = y$ in (a) of the lemma, we have
\[ \frac{f(x) - f(u)}{x - u} \le \frac{f(v) - f(x)}{v - x}. \]
Letting $u \uparrow x$ and $v \downarrow x$, we obtain $f'_\ell(x) \le f'_r(x)$. In particular, $f'_\ell(x)$ and $f'_r(x)$ are finite.

4.3.5 Corollary. A convex function $f$ is continuous.

Proof. By the theorem, $f$ has finite left-hand and right-hand derivatives and hence is left and right continuous.

4.3.6 Theorem. If a convex function $f$ is differentiable at $x \in (u, v)$, then $f'(x)(t - x) + f(x) \le f(t)$ for all $t \in (u, v)$. That is, the tangent line at $(x, f(x))$ lies below the graph of $f$ on $(u, v)$.

Proof. Since the difference quotients $[f(t) - f(x)]/(t - x)$ decrease as $t \downarrow x$,
\[ f'_r(x) \le \frac{f(t) - f(x)}{t - x}, \quad t > x. \]
The same difference quotients increase as $t \uparrow x$, hence
\[ f'_\ell(x) \ge \frac{f(t) - f(x)}{t - x}, \quad t < x. \]
Therefore, if $f'(x)$ exists, then $f'(x)(t - x) + f(x) \le f(t)$ for all $t$.
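Inequality (4.3) and Theorem 4.3.6 together say that, on $(u, v)$, the graph of a differentiable convex function lies between a tangent line and the chord $L_{uv}$. The sketch below spot-checks this on a finite grid. It is an illustration only and proves nothing; the helper names and the small `slack` tolerance for roundoff are assumptions of the example.

```python
import math

def chord_value(f, u, v, x):
    """Value at x of the chord L_uv through (u, f(u)) and (v, f(v))."""
    return f(u) * (v - x) / (v - u) + f(v) * (x - u) / (v - u)

def tangent_value(f, df, x, t):
    """Value at t of the tangent line to the graph of f at (x, f(x))."""
    return df(x) * (t - x) + f(x)

def sandwich_holds(f, df, u, v, x, samples=200, slack=1e-12):
    """Check tangent(t) <= f(t) <= chord(t) at interior grid points of (u, v)."""
    for k in range(1, samples):
        t = u + (v - u) * k / samples
        if tangent_value(f, df, x, t) > f(t) + slack:
            return False
        if f(t) > chord_value(f, u, v, t) + slack:
            return False
    return True

print(sandwich_holds(math.exp, math.exp, -1.0, 2.0, 0.5))  # convex function
print(sandwich_holds(math.sin, math.cos, 0.0, 3.0, 1.0))   # concave on (0, 3)
```

The exponential function passes the check on $(-1, 2)$, while $\sin x$, which is concave on $(0, 3)$, fails it.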

4.4 Inverse Functions

In this section we prove that under suitable conditions the inverse of a one-to-one continuous (differentiable) function is continuous (differentiable). For this we need the following two lemmas. The proof of the first is illustrated in Figures 4.6 and 4.7.

4.4.1 Lemma. Let $f$ be one-to-one on an interval $I$. If $f$ has the intermediate value property, then $f$ is strictly monotone and continuous on $I$.

Proof. Let $a, b$ be arbitrary points in $I$ with $a < b$. Assume, for definiteness, that $f(a) < f(b)$. We claim that $f(a) < f(x) < f(b)$ for all $a < x < b$. Indeed, if, say, $f(x) < f(a)$, then $f(a)$ lies between $f(x)$ and $f(b)$, hence, by the intermediate value property, there exists $c \in (x, b)$ such that $f(c) = f(a)$, contradicting that $f$ is one-to-one.

Next we show that $f$ is strictly increasing on $[a, b]$. Let $a < x_1 < x_2 < b$ and suppose that $f(x_2) < f(x_1)$. Then $f(x_2)$ lies between $f(a)$ and $f(x_1)$, hence there exists $d \in (a, x_1)$ such that $f(d) = f(x_2)$, again contradicting that $f$ is one-to-one. Thus $f$ is strictly increasing on $[a, b]$. It follows that $f$ must be strictly increasing on any closed and bounded subinterval of $I$ containing $[a, b]$. Since every pair of points in $I$ lies in such a subinterval, $f$ is strictly increasing on $I$.

FIGURE 4.6: $f(x) < f(a)$ or $f(x_1) > f(x_2)$ violates one-to-one hypothesis.

FIGURE 4.7: Intermediate value property implies continuity.

Now let $x_0 \in I$. To verify continuity of $f$ at $x_0$, note that by monotonicity
\[ \alpha := \lim_{x \to x_0^-} f(x) \le f(x_0) \le \beta := \lim_{x \to x_0^+} f(x). \]
(If $x_0$ is an endpoint, only one of these inequalities holds.) Continuity of $f$ at $x_0$ will then follow if we show that $\alpha = f(x_0) = \beta$. Suppose, for example, that $\alpha < f(x_0)$. Choose any $x_1 < x_0$ in $I$. Since $f(x_1) < \alpha < f(x_0)$, there exists some $x \in (x_1, x_0)$ such that $f(x) = \alpha$. But choosing $x_2 \in (x, x_0)$ then produces the contradiction $f(x) = \alpha < f(x_2) < \alpha$.

4.4.2 Lemma. If $f$ is strictly increasing (decreasing) on an interval $I$, then $f^{-1}$ is strictly increasing (decreasing) on $f(I)$.

Proof. Assume that $f$ is strictly increasing. If $y_1 = f(x_1) < y_2 = f(x_2)$, then $x_1 < x_2$ (that is, $f^{-1}(y_1) < f^{-1}(y_2)$), since otherwise $f(x_1) \ge f(x_2)$. Therefore, $f^{-1}$ is strictly increasing on $f(I)$.

The next two theorems are the main results on inverse functions. They assert that the properties of continuity or differentiability of a one-to-one function are inherited by the inverse function.

4.4.3 Theorem. Let $f$ be continuous and one-to-one on an interval $I$. Then $J := f(I)$ is an interval and $f^{-1} : J \to I$ is continuous. Moreover, $f$ and $f^{-1}$ are strictly monotone.

Proof. Since $f$ is continuous, it has the intermediate value property, hence $J$ is an interval. Moreover, by 4.4.1 and 4.4.2, $f$ and $f^{-1}$ are strictly monotone. Since $I = f^{-1}(J)$ is an interval, $f^{-1}$ has the intermediate value property. The continuity of $f^{-1}$ now follows from 4.4.1.

4.4.4 Theorem. Let $I$ be an open interval and let $f : I \to \mathbb{R}$ be continuous and one-to-one on $I$. If $f$ is differentiable at $a \in I$ and $f'(a) \ne 0$, then $f^{-1}$ is differentiable at $f(a)$, and
\[ \big(f^{-1}\big)'(f(a)) = \frac{1}{f'(a)}. \]

Proof. Let $y = f(x)$ and $b = f(a)$. For $x$ near $a$,
\[ \frac{f^{-1}(y) - f^{-1}(b)}{y - b} = \frac{x - a}{f(x) - f(a)} = \left[ \frac{f(x) - f(a)}{x - a} \right]^{-1}. \]
Since $f^{-1}$ is continuous, $x = f^{-1}(y) \to f^{-1}(b) = a$ as $y \to b$ and the conclusion follows.

If $f$ is differentiable with nonzero derivative on $I$ and $y = f^{-1}(x)$, then $x = f(y)$ and the assertion of the theorem may be written in Leibniz notation as
\[ \frac{dy}{dx} = \frac{1}{\dfrac{dx}{dy}}. \]

From 4.4.3 we obtain the following result, which will be generalized in Chapter 9 to functions on open subsets of $\mathbb{R}^n$.

4.4.5 Inverse Function Theorem. Let $f$ be continuously differentiable on an open interval $I$. If $f'(a) \ne 0$, then there exist open intervals $I_a \subseteq I$ and $J_a = f(I_a)$ with $a \in I_a$ such that $f$ is one-to-one on $I_a$ and $f^{-1} : J_a \to I_a$ is continuously differentiable.

Proof. Since $f'$ is continuous and $f'(a) \ne 0$, there exists an open interval $I_a$ containing $a$ on which $f' \ne 0$. By the mean value theorem, $f$ is one-to-one on $I_a$, hence, by 4.4.3, $J_a = f(I_a)$ is an interval, and, by 4.4.4, $f^{-1} : J_a \to I_a$ is continuously differentiable.
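The formula of Theorem 4.4.4 can be tested numerically even when $f^{-1}$ has no convenient closed form. The sketch below is a hedged illustration: it inverts a strictly increasing continuous function by bisection and then forms a central difference quotient of the inverse. The function names, the bracketing interval, and the step sizes are assumptions of the example.

```python
def inverse_by_bisection(f, y, lo, hi, tol=1e-13):
    """Solve f(x) = y for a strictly increasing continuous f on [lo, hi].

    Assumes f(lo) <= y <= f(hi), so that a solution exists in [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def inverse_derivative_numeric(f, b, lo, hi, h=1e-5):
    """Central difference quotient of the inverse of f at the point b."""
    upper = inverse_by_bisection(f, b + h, lo, hi)
    lower = inverse_by_bisection(f, b - h, lo, hi)
    return (upper - lower) / (2.0 * h)

# f(x) = x**3 + x is strictly increasing with f(1) = 2 and f'(1) = 4,
# so Theorem 4.4.4 predicts that the inverse has derivative 1/4 at 2.
f = lambda x: x ** 3 + x
print(inverse_derivative_numeric(f, 2.0, 0.0, 2.0))
```

The printed value should be close to $1/f'(1) = 0.25$, up to discretization and roundoff error.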

4.4.6 Global Inverse Function Theorem. Let $f$ be continuously differentiable with $f'$ nonzero on an open interval $I$. Then $f$ is one-to-one on $I$, $J := f(I)$ is an open interval, and $f^{-1} : J \to I$ is continuously differentiable.

Proof. That $f$ is one-to-one follows from the mean value theorem. By 4.4.3, $J$ is an interval and $f^{-1} : J \to I$ is continuous. Since continuous differentiability is a local property, 4.4.5 implies that $f^{-1}$ is continuously differentiable.

The following examples, as well as exercises below, establish the existence and basic properties of several well-known functions.

4.4.7 Example. Since $x = \sin y$ is strictly increasing on $[-\pi/2, \pi/2]$, the inverse function $y = \sin^{-1} x$ exists, is strictly increasing on $[-1, 1]$, and
\[ \frac{dy}{dx} = \left( \frac{dx}{dy} \right)^{-1} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - x^2}}, \quad -1 < x < 1. \]
Similarly, $x = \cos y$ is strictly decreasing on $[0, \pi]$, hence $y = \cos^{-1} x$ exists, is strictly decreasing on $[-1, 1]$, and
\[ \frac{dy}{dx} = \left( \frac{dx}{dy} \right)^{-1} = \frac{-1}{\sin y} = \frac{-1}{\sqrt{1 - x^2}}, \quad -1 < x < 1. \]
♦

An alternate approach to the preceding example is to define the inverse sine by
\[ \sin^{-1} x = \int_0^x \frac{dt}{\sqrt{1 - t^2}}, \quad -1 < x < 1, \]
and then obtain the sine function as the inverse of $\sin^{-1}$. This allows the derivation of the standard properties of $\sin x$, and ultimately of the other trig functions, without relying on geometric arguments. The disadvantage of this approach is that verification of these properties is detailed and lengthy. Still another approach is based on complex infinite series. For the latter, the reader may wish to consult [7]. The following example illustrates the integral approach for the exponential function. Some of the assertions in the example rely on results from Chapters 5 and 6 but should be familiar to the reader.

4.4.8 Example. The natural logarithm function is defined by
\[ \ln x := \int_1^x \frac{1}{t} \, dt, \quad x > 0. \]
One may show that all the familiar algebraic properties of the natural log follow from this definition. (See Exercise 5.) Since $\ln x$ is strictly increasing on $(0, +\infty)$, the inverse function
\[ \exp x := \ln^{-1} x \]
exists and is strictly increasing. Since $\ln 2 > 0$, $\ln 2^n = n \ln 2 \to +\infty$ and $\ln 2^{-n} = -n \ln 2 \to -\infty$, hence
\[ \lim_{x \to +\infty} \ln x = +\infty \quad \text{and} \quad \lim_{x \to 0^+} \ln x = -\infty. \]
It follows from these limits and the intermediate value theorem that the range of $\ln x$, that is, the domain of $\exp x$, must be $\mathbb{R}$. Thus, by Exercise 4,
\[ \lim_{x \to -\infty} \exp x = 0 \quad \text{and} \quad \lim_{x \to +\infty} \exp x = +\infty. \]
From the fundamental theorem of calculus, proved in the next chapter,
\[ \frac{d \ln y}{dy} = \frac{1}{y}, \quad \text{hence} \quad \frac{d \exp x}{dx} = \left( \frac{d \ln y}{dy} \right)^{-1} = y = \exp x. \]
Moreover, since
\[ 1 = \frac{d \ln y}{dy} \bigg|_{y=1} = \lim_{n \to +\infty} \frac{\ln(1 + 1/n) - \ln 1}{1/n} = \lim_{n \to +\infty} \ln (1 + 1/n)^n, \]
continuity of $\exp$ and 2.2.4 imply that
\[ \exp 1 = \lim_{n \to +\infty} \exp \ln (1 + 1/n)^n = \lim_{n \to +\infty} (1 + 1/n)^n = e. \]
Additional properties of $\exp x$ may be found in the exercises, including the identity $\exp r = e^r$, $r \in \mathbb{Q}$. Because of this identity, we frequently write $e^x$ for $\exp x$. Indeed, the function $\exp$ is the basis for rigorous definitions of the general exponential function $a^x$, $a > 0$, and the power function $x^a$, $x \ge 0$. (See Exercises 8 and 9.) ♦
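The integral definition in Example 4.4.8 lends itself to a direct numerical experiment. The following sketch approximates $\ln x = \int_1^x dt/t$ by a composite midpoint rule and evaluates $(1 + 1/n)^n$ for large $n$. The midpoint rule, the step count, and the function names are choices made for this illustration and are not taken from the text; the standard library values `math.log` and `math.e` serve only as reference points.

```python
import math

def ln_by_quadrature(x, steps=20000):
    """Approximate the integral of 1/t from 1 to x (x > 0) by the midpoint rule.

    For 0 < x < 1 the step width is negative, which yields the oriented
    integral and hence a negative value, as the definition requires.
    """
    width = (x - 1.0) / steps
    total = 0.0
    for k in range(steps):
        t = 1.0 + (k + 0.5) * width  # midpoint of the k-th subinterval
        total += width / t
    return total

def compound_limit(n):
    """Return (1 + 1/n)**n, which tends to e as n grows."""
    return (1.0 + 1.0 / n) ** n

print(ln_by_quadrature(2.0), math.log(2.0))
print(ln_by_quadrature(8.0), 3 * ln_by_quadrature(2.0))
print(compound_limit(10 ** 6), math.e)
```

The second printed line illustrates the identity $\ln 2^n = n \ln 2$ used in the example (here with $n = 3$), and the third shows $(1 + 1/n)^n$ approaching $e$.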

Exercises

1. Find $f^{-1}$ and its domain for each of the following functions $f$ with the given domain:

(a) $x^2 - 4x + 5$, $[2, +\infty)$.
(b)S $\dfrac{3x + 2}{2x + 3}$, $\mathbb{R} \setminus \{-3/2\}$.
(c) $\dfrac{5e^{-x} + 2}{3e^{-x} + 7}$, $(-\infty, +\infty)$.
(d) $\sin^2 x - 4 \sin x + 3$, $[-\pi/2, \pi/2]$.
(e) $e^x - 2e^{-x}$, $(-\infty, +\infty)$.
(f)S $\dfrac{2 + \cos x}{3 + \cos x}$, $(0, \pi)$.

2. Let $f(x) = ax + |x| + |x - 1|$. Find all values of $a$ for which $f^{-1}$ exists on $\mathbb{R}$. For these values, find $f^{-1}$.

3. Give an example of a one-to-one continuous function on the union of two intervals that is (a) not monotone, (b) strictly monotone but with discontinuous inverse.

4. Let $f$ be defined, continuous, and strictly increasing on $(a, b)$, so the limits
\[ c := \lim_{x \to a^+} f(x) \quad \text{and} \quad d := \lim_{x \to b^-} f(x) \]
exist in $\overline{\mathbb{R}}$. Show that the domain of $f^{-1}$ is $(c, d)$ and that
\[ \lim_{x \to c^+} f^{-1}(x) = a \quad \text{and} \quad \lim_{x \to d^-} f^{-1}(x) = b. \]

5. Verify the following properties of $\ln x$, as defined in 4.4.8:

(a) $\ln 1 = 0$, $\ln e = 1$.
(b)S $\ln(xy) = \ln x + \ln y$.
(c) $\ln(x/y) = \ln x - \ln y$.
(d) $\ln x^r = r \ln x$, $r \in \mathbb{Q}$.

6. Prove that $\exp(x + y) = \exp(x) \exp(y)$.

7. For $c, d \in \mathbb{R}$ with $c > 0$, define $c^d = \exp(d \ln c)$. Show that this definition agrees with the usual one if $d$ is rational and verify the following properties, where $x, y \in \mathbb{R}$ and $a, b > 0$.

(a) $\ln a^x = x \ln a$.
(b)S $a^x a^y = a^{x+y}$.
(c) $a^x / a^y = a^{x-y}$.
(d) $(a^x)^y = a^{xy}$.
(e) $(ab)^x = a^x b^x$.
(f) $a^{\ln b} = b^{\ln a}$.

8. Let $a > 0$, $a \ne 1$, and define $a^x$ as in Exercise 7. Find $\lim_{x \to -\infty} a^x$, $\lim_{x \to +\infty} a^x$, and $(a^x)'$.

9.S Let $a \in \mathbb{R}$ and for $x > 0$ define $x^a$ as in Exercise 7. Prove the power rule $(x^a)' = ax^{a-1}$.

10. Prove that $\tan x$ restricted to $(-\pi/2, \pi/2)$ has a differentiable inverse defined on $\mathbb{R}$. Find $\lim_{x \to -\infty} \tan^{-1} x$, $\lim_{x \to +\infty} \tan^{-1} x$, and $(\tan^{-1} x)'$.

11. Prove that $\sec x$ restricted to $[0, \pi/2) \cup [\pi, 3\pi/2)$ has a continuous inverse defined on $(-\infty, -1] \cup [1, +\infty)$. Show that $\sec^{-1} x$ is differentiable on $(-\infty, -1) \cup (1, +\infty)$ and compute its derivative. Also, find $\lim_{x \to -\infty} \sec^{-1} x$ and $\lim_{x \to +\infty} \sec^{-1} x$.

12. Verify the inequalities

(a) $\dfrac{x - 1}{x} < \ln x < x - 1$, $x > 1$.
(b) $|\tan^{-1} x - \tan^{-1} y| \le |x - y|$.
(c) $\dfrac{y - x}{\sqrt{1 - x^2}} < |\sin^{-1} y - \sin^{-1} x| < \dfrac{y - x}{\sqrt{1 - y^2}}$, $-1 < x < y < 1$.


13. Verify the identities
(a) $\tan(\sin^{-1} x) = \dfrac{x}{\sqrt{1 - x^2}}$, $-1 < x < 1$.
(b) $\sin^{-1} x + \cos^{-1} x = \pi/2$, $-1 \le x \le 1$.
(c)S $\cos^{-1}\!\left(\dfrac{x^2 - 1}{x^2 + 1}\right) + 2\tan^{-1} x = \pi$, $x \ge 0$.
(d) $\cos^{-1} x = 2\sin^{-1}\sqrt{\dfrac{1 - x}{2}}$, $-1 \le x \le 1$.
(e) $\tan^{-1} x + \tan^{-1}(2/x) + \tan^{-1}(x + 2/x) = \pi$, $x \ne 0$.

14.S Suppose $f$ satisfies $f(x + y) = f(x)f(y)$ for all $x, y \in \mathbb{R}$. Show that if $a := f'(0)$ exists, then $f(x) = f(0)e^{ax}$.

15. Suppose that $f : [0, 1] \to [0, 1]$ is continuous, one-to-one, onto, and $f = f^{-1}$. Prove that either $f(x) = x$ for all $x$ or $f$ is monotone decreasing.

16. Suppose $f'$ is one-to-one on an open interval $I$. Show that $f'$ is continuous and strictly monotone on $I$. (See Exercise 4.2.25.)

17. Let $f$ be differentiable on an open interval $I$ with $f' \ne 0$. Let $a, b \in I$ with $a < b$ and suppose that $f : [a, b] \to [a, b]$ is one-to-one and onto. Prove that there exists $c \in (a, b)$ such that
$$\frac{f(b) - f(a)}{f^{-1}(b) - f^{-1}(a)} = f'(c)\, f'\big(f^{-1}(c)\big).$$

18.S Let $f$ be twice differentiable and $f' \ne 0$ on an open interval $I$. Show that $(f^{-1})''(x)$ exists on $f(I)$ and find a formula.

4.5

L’Hospital’s Rule

The rule for calculating the limit of a quotient of functions, namely,
$$\lim_{\substack{x \to a \\ x \in E}} \frac{f(x)}{g(x)} = \frac{\lim_{x \to a,\, x \in E} f(x)}{\lim_{x \to a,\, x \in E} g(x)}, \tag{4.4}$$
requires that the limits on the right are finite and the denominator is not 0. If, instead, the limits in the quotient are both zero or $\pm\infty$, then the expression on the left in (4.4) is called an indeterminate form of type $\frac{0}{0}$ or $\frac{\pm\infty}{\pm\infty}$, respectively. There are other types of indeterminate forms, but all may be converted to one of these. The following theorem describes a method for evaluating these limits.


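4.5.1 l’Hospital’s Rule. Let $J$ be an open interval, finite or infinite, and let $a \in \overline{\mathbb{R}}$ be an accumulation point of $J$. Suppose that $f$ and $g$ are differentiable on $E := J \setminus \{a\}$ and that $g(x)g'(x) \ne 0$ for every $x \in E$. If the limits
$$A := \lim_{\substack{x \to a \\ x \in E}} f(x), \quad B := \lim_{\substack{x \to a \\ x \in E}} g(x), \quad\text{and}\quad L := \lim_{\substack{x \to a \\ x \in E}} \frac{f'(x)}{g'(x)}$$
exist in $\overline{\mathbb{R}}$ and either $A = B = 0$ or $B = \pm\infty$, then
$$\lim_{\substack{x \to a \\ x \in E}} \frac{f(x)}{g(x)} = L.$$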

Proof. There are a number of cases to consider, but the proofs of many of these are essentially the same. We prove the theorem for four fundamentally different cases and for one-sided limits, so $E = (a, c)$ or $(c, a)$ for some $c$. As a first step, we use the Cauchy mean value theorem to obtain, for every pair of distinct numbers $x, b \in E$, a number $\xi = \xi(x, b)$ between $x$ and $b$ such that
$$[f(x) - f(b)]\,g'(\xi) = [g(x) - g(b)]\,f'(\xi). \tag{4.5}$$
Now set $h(x) = \dfrac{f(x)}{g(x)}$.

Case 1: $A = B = 0$, $a$ and $L$ are finite, and $E = (a, c)$. Extend $f$ and $g$ continuously to $[a, c)$ by defining $f(a) = g(a) = 0$. Taking $b = a$ and $x \in (a, c)$ in (4.5) we see that
$$h(x) = \frac{f'(\xi)}{g'(\xi)}.$$
Since $\xi \to a$ as $x \to a$, $\lim_{x \to a^+} h(x) = L$, as required.

For the remaining cases, we use the Cauchy mean value theorem in the following form. Divide (4.5) by $g'(\xi)g(x)$ and solve the resulting equation for $h = f/g$ to obtain
$$h(x) = \frac{f(b)}{g(x)} + \left[1 - \frac{g(b)}{g(x)}\right] \frac{f'(\xi)}{g'(\xi)}, \quad x, b \in E. \tag{4.6}$$

Case 2: $A = B = 0$, $a = L = +\infty$, and $E = (c, +\infty)$. Let $M > 0$ and choose $x_0 \in E$ such that
$$\frac{f'(x)}{g'(x)} > 2M \quad\text{for } x > x_0.$$
Let $b > x > x_0$. For large $b$, $g(b)/g(x) < 1/2$, hence from (4.6)
$$h(x) \ge \frac{f(b)}{g(x)} + \frac{1}{2}(2M) = \frac{f(b)}{g(x)} + M.$$
Letting $b \to +\infty$ we see that $h(x) \ge M$. Therefore, $\lim_{x \to +\infty} h(x) = +\infty$.

Case 3: $B = +\infty$, $a$ and $L$ are finite, and $E = (c, a)$. Given $\varepsilon > 0$, choose $b \in E$ such that
$$\left|\frac{f'(t)}{g'(t)} - L\right| < \varepsilon/2 \quad\text{for all } t \in (b, a).$$
Let $x \in (b, a)$. By (4.6),
$$h(x) - \frac{f'(\xi)}{g'(\xi)} = \frac{f(b)}{g(x)} - \frac{g(b)}{g(x)} \cdot \frac{f'(\xi)}{g'(\xi)}.$$

Since the right side tends to 0 as $x \to a$,
$$|h(x) - L| \le \left|h(x) - \frac{f'(\xi)}{g'(\xi)}\right| + \left|\frac{f'(\xi)}{g'(\xi)} - L\right| < \varepsilon/2 + \varepsilon/2 = \varepsilon$$
for all $x$ near $a$. Therefore, $\lim_{x \to a^-} h(x) = L$.

Case 4: $B = +\infty$, $a = L = +\infty$, and $E = (c, +\infty)$. Given $M > 0$, choose $b > c$ such that
$$\frac{f'(t)}{g'(t)} > 3M \quad\text{for all } t > b.$$
Let $x > b$ such that $g(x) > g(b)$. By (4.6),
$$h(x) \ge \frac{f(b)}{g(x)} + \left[1 - \frac{g(b)}{g(x)}\right] 3M.$$
Since the quotients on the right side tend to zero, for all sufficiently large $x$ we have
$$h(x) \ge -\frac{M}{2} + \left[1 - \frac{1}{2}\right](3M) = M.$$
Therefore, $\lim_{x \to +\infty} h(x) = +\infty$.

The following examples illustrate typical applications of l’Hospital’s rule.

Examples. (a) The limit
$$L := \lim_{x \to 0} \frac{x - \tan x}{x^3}$$
is of the form $\frac{0}{0}$, hence
$$L = \lim_{x \to 0} \frac{1 - \sec^2 x}{3x^2} = \lim_{x \to 0} \frac{2\sec^2 x \tan x}{-6x} = \lim_{x \to 0} \frac{\sec^4 x + 2\sec^2 x \tan^2 x}{-3} = -\frac{1}{3}.$$
Note that each step except the last produces a limit of the form $\frac{0}{0}$, allowing another application of l’Hospital’s rule. The validity of each step is ultimately justified by the existence of the final limit.

(b) The limit
$$L := \lim_{x \to +\infty} \frac{\sin(1/x)}{e^{1/x} - 1}$$
is of the form $\frac{0}{0}$; however, it is complicated to apply l’Hospital’s rule directly. Making the substitution $y = 1/x$ produces a more tractable problem:
$$L = \lim_{y \to 0^+} \frac{\sin y}{e^y - 1} = \lim_{y \to 0^+} \frac{\cos y}{e^y} = 1.$$

(c) The limit
$$L := \lim_{x \to +\infty} x \sin(1/x)$$
is of the form $\infty \cdot 0$, but a simple algebraic manipulation produces the form $\frac{0}{0}$:
$$L = \lim_{x \to +\infty} \frac{\sin(1/x)}{1/x} = \lim_{y \to 0^+} \frac{\sin y}{y} = 1.$$
Here, l’Hospital’s rule was unnecessary, since we could use a known limit.

(d) The limit $L := \lim_{x \to 1^+} x^{1/(x^p - 1)}$, $p > 0$, is of the form $1^\infty$, so we take logarithms to obtain the form $\frac{0}{0}$:
$$\lim_{x \to 1^+} \ln\!\left[x^{1/(x^p - 1)}\right] = \lim_{x \to 1^+} \frac{\ln x}{x^p - 1} = \lim_{x \to 1^+} \frac{1/x}{p x^{p-1}} = \frac{1}{p}.$$
Thus $L = e^{1/p}$.

(e) The technique used in (d) shows that
$$\lim_{x \to +\infty} \left(1 + \frac{t}{x}\right)^x = e^t,$$
since
$$\lim_{x \to +\infty} \ln\left(1 + \frac{t}{x}\right)^x = \lim_{y \to 0^+} \frac{\ln(1 + ty)}{y} = \lim_{y \to 0^+} \frac{t}{1 + ty} = t.$$

(f) The limit
$$L := \lim_{x \to \pi/2^+} \left[\frac{1}{x - \pi/2} + \sec x\right]$$
is of the form $\infty - \infty$. Combining fractions we obtain a limit of the form $\frac{0}{0}$. Thus
$$L = \lim_{x \to \pi/2^+} \frac{\cos x + x - \pi/2}{(x - \pi/2)\cos x} = \lim_{x \to \pi/2^+} \frac{1 - \sin x}{(\pi/2 - x)\sin x + \cos x} = \lim_{x \to \pi/2^+} \frac{-\cos x}{(\pi/2 - x)\cos x - 2\sin x} = 0. \quad ♦$$
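
The limits found in Examples (a), (b), (d), (e), and (f) can be sanity-checked numerically by evaluating each quotient at a point close to the limit point. The following Python sketch does only that; the function names are illustrative and not part of the text, and a numerical check of this kind supports the computed values but proves nothing.

```python
import math

def h_a(x):
    # Example (a): (x - tan x) / x^3, with limit -1/3 as x -> 0
    return (x - math.tan(x)) / x**3

def h_b(x):
    # Example (b): sin(1/x) / (e^(1/x) - 1), with limit 1 as x -> +infinity
    return math.sin(1 / x) / math.expm1(1 / x)

def h_d(x, p):
    # Example (d): x^(1/(x^p - 1)), with limit e^(1/p) as x -> 1+
    return x ** (1 / (x**p - 1))

def h_e(x, t):
    # Example (e): (1 + t/x)^x, with limit e^t as x -> +infinity
    return (1 + t / x) ** x

def h_f(x):
    # Example (f): 1/(x - pi/2) + sec x, with limit 0 as x -> pi/2+
    return 1 / (x - math.pi / 2) + 1 / math.cos(x)

if __name__ == "__main__":
    print("(a)", h_a(1e-2), "vs", -1 / 3)
    print("(b)", h_b(1e4), "vs", 1.0)
    print("(d)", h_d(1 + 1e-6, 2), "vs", math.exp(0.5))
    print("(e)", h_e(1e6, 2.0), "vs", math.exp(2.0))
    print("(f)", h_f(math.pi / 2 + 1e-3), "vs", 0.0)
```

The evaluation points are deliberately not pushed too close to the limit point, since quotients of the form $\frac{0}{0}$ lose accuracy to floating-point cancellation there.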


Exercises 1. Evaluate the following limits, where p, q > 0: epx − eqx epx − ep (b) lim x→0 x→1 tan(x − 1) sin x ln(sin px) (d) S lim x 1 − e1/x (e) lim+ x→+∞ x→0 ln(sin qx) −1 x − tan x 1 1 S (g) lim (h) lim+ − x→0 x − sin−1 x x→0 x sin x (a) S lim

(j)

(xp )

lim (sin x)(ln x) (k) lim+ x x→0+ x→0 x−1 1 x x+1 − (n) lim+ (m) S lim+ x→1 x→0 tan x x x−1 2 1 − cos x sin x + cos x − 1 (p) S lim 2 (q) lim x→0 x + x3 sin x x→0 ln(1 + x) −x 1 S 1/(ln ln x)p (s) lim x (t) lim 1− √ x→+∞ x→+∞ x S

(v) S lim+ xsin x x→0

x cos x − sin x x→0 x2 sin x

(w) lim

ln(3x2 − 1) x→+∞ ln(5x2 − 1) p sin(px) − p2 x (f) lim x→0 x3 (c) lim

(i) lim+ ln(x − 1) ln x x→1

1/x2

(l) lim (cos x) x→0 √ x ln x (o) lim x→1 x − 1 x x−1 (r) lim x→+∞ x + 1 (u) lim+ (sin x)

x

x→0

(x) lim+ x→0

(1 + x)1/x − e x

2. For each function $f : (0, 1] \to \mathbb{R}$ below, define $f(0)$ so that $f$ is continuous on $[0, 1]$.
(a) $\dfrac{1 - e^x}{x}$.
(b) $\dfrac{\ln(1 + x)}{x}$.
(c)S $\dfrac{\sin 5x}{\sin 3x}$.
(d) $\dfrac{x}{\tan x}$.
(e) $x \ln x$.
(f) $\dfrac{1 - \cos 2x}{1 - \cos 3x}$.

3. Find limn an , where an = (a)S sin1/n (1/n). 4. Show that

(b) n − n2 ln(1 + 1/n).

(c) n [(1 + 1/n)n − e] .

if p < 2, 0 p p n + 1/n − n → 2 if p = 2 +∞ if p > 2

5. By considering the sequences $\{n\}$ and $\{n + 1/n\}$, use l’Hospital’s rule to prove that $e^x$ is not uniformly continuous on $[0, +\infty)$.

6.S Let $f(x) = x^{1 + 1/x}$. Evaluate $\lim_n \big[f(n + 1) - f(n)\big]$.

7. Let $f$ be differentiable on $(a, b)$ and suppose that $\lim_{x \to a^+} f(x)$ and $\lim_{x \to a^+} f'(x)$ exist in R. Find a continuous extension of $f$ to $[a, b)$ such that $f'$ exists and is continuous at $a$.


8. Let $g$ be differentiable on $(1, +\infty)$ and $h$ differentiable on $(-\infty, 1]$ with
$$\lim_{x \to 1^+} g(x) = h(1) \quad\text{and}\quad \lim_{x \to 1^+} g'(x) = h'_\ell(1). \tag{†}$$
Define
$$f(x) = \begin{cases} g(x) & \text{if } x > 1, \\ h(x) & \text{if } x \le 1. \end{cases}$$
Show that $f$ is differentiable at $x = 1$ and hence on $\mathbb{R}$. Conversely, suppose that $f'(1)$ exists. Do the limit equations in (†) hold?

9.S Let $f$ and $g$ be differentiable on $(0, +\infty)$ with
$$\lim_{x \to +\infty} f(x) = \lim_{x \to +\infty} g(x) = +\infty \quad\text{and}\quad \lim_{x \to +\infty} \frac{f'(x)}{g'(x)} \in (0, +\infty).$$
Evaluate
$$\lim_{x \to +\infty} \frac{\ln f(x)}{\ln g(x)}.$$

10. Let $f$ be differentiable in a neighborhood of $a$ and suppose that $f''(a)$ exists. For $\alpha, \beta \in \mathbb{R}$ calculate
(a)S $\displaystyle \lim_{h \to 0} \frac{\beta f(a + \alpha h) - \alpha f(a + \beta h) + (\alpha - \beta) f(a)}{h^2}$.
(b) $\displaystyle \lim_{h \to 0} \frac{f(a + \alpha h) + f(a + \beta h) - 2f(a)}{h^2}$ if $f'(a) = 0$.

11. Suppose that $f$ has $n$ derivatives on $[a, +\infty)$ and that $\lim_{x \to +\infty} f^{(n)}(x)$ exists in R. Prove that $\lim_{x \to +\infty} f(x)/x^n$ exists in R.

12.S Suppose that $f$ has $n$ derivatives on $(0, a)$ and $L := \lim_{x \to 0^+} x^{2n} f^{(n)}(x)$ exists in R. Find $\lim_{x \to 0^+} x^n f(x)$ in terms of $L$.

13. Let $f$ be differentiable on $(1, +\infty)$ and $\lim_{x \to +\infty} f(x) = 0$. Prove that if $\lim_{x \to +\infty} x^2 f'(x)$ exists in R, then $\lim_{x \to +\infty} x f(x)$ also exists in R. Is the converse true?

14. Suppose that, in a deleted neighborhood of 0, $f$ is differentiable with $f' \ne 0$ and that $\lim_{x \to 0} f(x) = 0$. Prove that if $\lim_{x \to 0} f(x)/f'(x)$ exists, then it must equal 0.

15. Let $g(x)$ be differentiable on $(1, \infty)$ with $g$ and $g'$ nonzero and let $f(x)$ be differentiable in a neighborhood of 0. Suppose that $\lim_{x \to +\infty} g(x) = 0$, $f(0) = 0$ and $f'$ is continuous at 0. Find
$$\lim_{x \to +\infty} \frac{f(g(x))}{g(x)}.$$
Give nontrivial examples of functions $f$ and $g$ that satisfy these conditions.


16.S Let $f$ and $g$ be differentiable on $(1, +\infty)$ with $g' \ne 0$ and suppose that $\lim_{x \to +\infty} f(x) = \lim_{x \to +\infty} g(x) = +\infty$ and that the limit $L := \lim_{x \to +\infty} f'(x)$ exists in R. Find
$$\lim_{x \to +\infty} \frac{f(g(x))}{g(x)}.$$
Give nontrivial examples that satisfy these conditions with $L$ finite.

17. Let $f$ be differentiable on $(1, +\infty)$ and suppose that $\lim_{x \to +\infty} f(x)$ and $\lim_{x \to +\infty} f'(x)$ exist in R. Prove that the second limit must be zero. Does the assertion still hold if $\lim_{x \to +\infty} f(x)$ is infinite?

18.S Let $f$ be differentiable on $(1, +\infty)$ and suppose that $\lim_{x \to +\infty} f(x)$ and $\lim_{x \to +\infty} x f'(x)$ exist in R. Prove that the second limit is zero. Does the assertion still hold if $\lim_{x \to +\infty} f(x)$ is infinite?

19. Let $f$ be differentiable in a deleted neighborhood of 0 and suppose that $\lim_{x \to 0} f(x)$ and $\lim_{x \to 0} f'(x)\tan x$ exist in R. Prove that the second limit must be 0. Does the assertion still hold if $\lim_{x \to 0} f(x)$ is infinite?

20. Let $f$ be differentiable on $(0, b)$ and suppose that the limits $\lim_{x \to 0^+} f(x)$ and $\lim_{x \to 0^+} x^2 f'(x)$ exist in R. Prove that one of these limits must be zero. Does the assertion still hold if $\lim_{x \to 0^+} f(x)$ is infinite?

21. Let $f''$ exist and be continuous on $(-1, 1)$ and $f(0) = f'(0) = 0$. Prove that there exists a continuous function $g$ on $(0, 1)$ such that $f(x) = x^2 g(x)$. Must $g$ be differentiable at 0?

4.6

Taylor’s Theorem on R

Taylor’s theorem may be viewed as a generalization of the mean value theorem. Its importance derives from its use in establishing various inequalities and from its fundamental connection with power series.

4.6.1 Taylor’s Theorem. Let $f$ have $n + 1$ derivatives in an open interval $I$. Then, for each $x, a \in I$ with $x \ne a$, there exists a number $c$ between $x$ and $a$ such that
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}. \tag{4.7}$$

Proof. Assume for definiteness that $a < x$. Define a function $g$ on $[a, x]$ by
$$g(t) = \sum_{k=0}^{n} \frac{f^{(k)}(t)}{k!}(x - t)^k + \alpha\,\frac{(x - t)^{n+1}}{(n + 1)!} - f(x), \tag{4.8}$$


where $\alpha$ is chosen so that $g(a) = 0$. Since $g$ is continuous on $[a, x]$, differentiable on $(a, x)$ and $g(x) = g(a)$, there exists, by Rolle’s theorem, $c \in (a, x)$ such that $g'(c) = 0$. From the calculations
$$\frac{d}{dt}\left[\frac{f^{(k)}(t)}{k!}(x - t)^k\right] = \begin{cases} \dfrac{f^{(k+1)}(t)}{k!}(x - t)^k - \dfrac{f^{(k)}(t)}{(k - 1)!}(x - t)^{k-1} & \text{if } k \ge 1, \\[2ex] f'(t) & \text{if } k = 0, \end{cases}$$
we have
$$g'(t) = \sum_{k=0}^{n} \frac{f^{(k+1)}(t)}{k!}(x - t)^k - \sum_{k=0}^{n-1} \frac{f^{(k+1)}(t)}{k!}(x - t)^k - \alpha\,\frac{(x - t)^n}{n!} = \frac{f^{(n+1)}(t)}{n!}(x - t)^n - \alpha\,\frac{(x - t)^n}{n!}.$$
In particular,
$$0 = g'(c) = \frac{f^{(n+1)}(c)}{n!}(x - c)^n - \alpha\,\frac{(x - c)^n}{n!},$$
hence $\alpha = f^{(n+1)}(c)$. Thus from (4.8),
$$0 = g(a) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1} - f(x),$$
which is (4.7).

Equation (4.7) is frequently written $f(x) = T_n(x, a) + R_n(x, a)$, where
$$T_n(x, a) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k \quad\text{and}\quad R_n(x, a) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}.$$
The expression $T_n(x, a)$ is called the $n$th Taylor polynomial of $f$ about $a$, and $R_n(x, a)$ is called the remainder. It may be shown that $T_n(x, a)$ is the unique polynomial of degree $\le n$ that best approximates $f$ near $a$ in the sense that
$$\lim_{x \to a} \frac{f(x) - T_n(x, a)}{(x - a)^n} = 0.$$
(See Exercise 4.) The remainder term $R_n(x, a)$ has other forms, one of which is given in Exercise 3. Observe that if $R_n(x, a) \to 0$ as $n \to +\infty$, then $T_n(x, a) \to f(x)$, which implies that $f(x)$ is expressible as a power series about $a$. We exploit this idea in Section 7.4. The following application of Taylor’s theorem is a generalization of the second derivative test.
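
As an illustration of (4.7), the following Python sketch compares $T_n(x, a)$ with $f(x)$ for the particular choice $f = \exp$, which is an assumption made here only because every derivative of $\exp$ is again $\exp$. The remainder formula then gives the bound $|R_n(x, a)| \le e^{\max(a, x)} |x - a|^{n+1}/(n + 1)!$, since $c$ lies between $a$ and $x$. The function names are illustrative.

```python
import math

def taylor_exp(x, n, a=0.0):
    # T_n(x, a) for f = exp: every derivative f^(k)(a) equals exp(a)
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))

def remainder_bound_exp(x, n, a=0.0):
    # Bound on |R_n(x, a)| from (4.7): |f^(n+1)(c)| <= exp(max(a, x))
    # because c lies between a and x and exp is increasing.
    return math.exp(max(a, x)) * abs(x - a) ** (n + 1) / math.factorial(n + 1)

if __name__ == "__main__":
    x = 1.0
    for n in range(0, 9, 2):
        error = abs(math.exp(x) - taylor_exp(x, n))
        print(n, error, remainder_bound_exp(x, n))
```

In each printed row the actual error should not exceed the bound, and both shrink rapidly with $n$, which is the behavior behind the remark that $R_n(x, a) \to 0$ yields a power series representation.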


4.6.2 nth Derivative Test. Let $f$ have $n$ continuous derivatives on an open interval $I$ and let $a \in I$ with $f^{(j)}(a) = 0$, $1 \le j \le n - 1$, and $f^{(n)}(a) \ne 0$.
(a) If $n$ is even and $f^{(n)}(a) > 0$ ($f^{(n)}(a) < 0$), then $f$ has a local minimum (local maximum) at $a$.
(b) If $n$ is odd, then $f$ has neither a local minimum nor a local maximum at $a$.

Proof. Assume $f^{(n)}(a) > 0$. By continuity, $f^{(n)}(x) > 0$ for all $x$ in an open interval $J$ containing $a$. Let $x \in J$, $x \ne a$. By Taylor’s theorem, there exists $c$ between $a$ and $x$ such that
$$f(x) = f(a) + f^{(n)}(c)\,\frac{(x - a)^n}{n!}.$$
Thus if $n$ is even, then $f(x) > f(a)$, hence $f$ has a local minimum at $a$. If $n$ is odd, then $f(x) > f(a)$ if $x > a$ and $f(x) < f(a)$ if $x < a$, so $f$ has neither a local maximum nor a local minimum at $a$. A similar argument works for the case $f^{(n)}(a) < 0$.

Note that the familiar second derivative test, obtained by taking $n = 2$ in the theorem, is inconclusive for the function $f(x) = x^4$ at $a = 0$. Here, one must take $n = 4$.
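
For polynomials the test in 4.6.2 can be carried out mechanically, since derivatives are computed exactly from the coefficients. The sketch below is restricted to polynomials with exact (integer or rational) coefficients, an assumption made so that the comparison with zero is reliable; the helper names are illustrative.

```python
def derivative(coeffs):
    # coeffs[k] is the coefficient of x^k; return the coefficients of the derivative
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, a):
    # Horner evaluation of the polynomial at a
    result = 0
    for c in reversed(coeffs):
        result = result * a + c
    return result

def nth_derivative_test(coeffs, a):
    # Find the least n >= 1 with f^(n)(a) != 0, then apply 4.6.2.
    d, n = derivative(coeffs), 1
    while d and evaluate(d, a) == 0:
        d, n = derivative(d), n + 1
    if not d:
        return None, "all derivatives vanish at a (constant polynomial)"
    if n % 2 == 1:
        return n, "neither"
    return n, "local minimum" if evaluate(d, a) > 0 else "local maximum"

if __name__ == "__main__":
    print(nth_derivative_test([0, 0, 0, 0, 1], 0))   # f(x) = x^4 at a = 0
    print(nth_derivative_test([0, 0, 0, 1], 0))      # f(x) = x^3 at a = 0
```

For $f(x) = x^4$ at $a = 0$ the loop passes over the vanishing second derivative and stops at $n = 4$, as in the remark following the proof.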

Exercises

1. Define $f(0) = 0$ and $f(x) = e^{-1/x^2}$, $x \ne 0$. Prove that $f^{(n)}$ exists on R and $f^{(n)}(0) = 0$ for all $n$. Conclude that every Taylor polynomial for $f$ about 0 is identically 0.

2. Verify the following inequalities:
(a) $\displaystyle \sum_{k=0}^{2n-1} (-1)^k x^k < \frac{1}{1 + x} < \sum_{k=0}^{2n} (-1)^k x^k$, $x > 0$.
(b)S $\displaystyle \sum_{k=0}^{2n-1} \frac{(-1)^k}{k!} x^k < e^{-x} < \sum_{k=0}^{2n} \frac{(-1)^k}{k!} x^k$, $x > 0$.
(c) $\displaystyle \sum_{k=1}^{2n} \frac{(-1)^{k+1}}{(2k - 1)!} x^{2k-1} < \sin x < \sum_{k=1}^{2n+1} \frac{(-1)^{k+1}}{(2k - 1)!} x^{2k-1}$, $0 < x < \pi$.
(d) $\displaystyle \sum_{k=0}^{2n-1} \frac{(-1)^k}{(2k)!} x^{2k} < \cos x < \sum_{k=0}^{2n} \frac{(-1)^k}{(2k)!} x^{2k}$, $0 < x < \pi$.
(e) $\displaystyle \sum_{k=1}^{n-1} \frac{(-1)^{k-1}}{k} x^k < \ln(1 + x) < \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k} x^k$ if $n$ is odd, the reverse inequalities if $n$ is even.


3.S ⇓2 Show that if $f^{(n+1)}$ is continuous on $I$, then
$$R_n(x, a) = \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t)\, dt.$$
Hint. Integrate by parts $n$ times.

4. Prove that a polynomial $P_n(x) = \sum_{k=0}^{n} a_k (x - a)^k$ satisfies
$$\lim_{x \to a} \frac{f(x) - P_n(x)}{(x - a)^n} = 0$$
iff $P_n = T_n$, the $n$th Taylor polynomial of $f$ about $a$.

5.S Let $P(x) = \sum_{k=0}^{n} a_k (x - a)^k = \sum_{k=0}^{n} b_k (x - b)^k$. Show that
$$b_k = \sum_{j=0}^{n-k} \binom{j + k}{k} (b - a)^j a_{k+j}.$$

6. Let $P$ be a polynomial of degree $n$. Prove that the polynomials $P(x \pm 1)$ may be written as linear combinations of $P^{(k)}(x)$, $k = 0, \ldots, n$. Find simplified expressions for $P(x + 1) \pm P(x - 1)$.

7. Let $f$ have $n$ derivatives on $[0, 1]$. Show that for each $y \ne f(1)$ there exists an extension $g$ of $f$ to $[0, +\infty)$ with $n$ derivatives such that $g(b) = y$ for some $b > 1$.

*4.7

Newton’s Method

A simple zero of a differentiable function $f$ is a number $z$ such that $f(z) = 0$ and $f'(z) \ne 0$. Newton’s method is a rapidly converging recursion scheme for approximating such a zero. The idea is to choose $x_1$ near $z$ and then define a sequence $\{x_n\}$ recursively by
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n = 1, 2, \ldots, \tag{4.9}$$
as illustrated in Figure 4.8. Under suitable conditions, the sequence is well-defined and converges to $z$, hence may be used to approximate $z$ to (theoretically) any desired degree of accuracy.

4.7.1 Newton’s Method. Let $f''$ be continuous on an open interval $I$ and let $z$ be a simple zero of $f$ in $I$. If $x_1$ is chosen sufficiently near $z$, then the sequence $\{x_n\}$ lies in $I$ and converges to $z$.

2 This exercise will be used in 5.6.3.


FIGURE 4.8: Newton’s method. (The tangent line $y = f(x_n) + f'(x_n)(x - x_n)$ to the graph $y = f(x)$ at $x_n$ meets the $x$-axis at $x_{n+1}$.)

Proof. Since $f'(z) \ne 0$, there exists a neighborhood $I_z$ of $z$ contained in $I$ on which $|f'| \ge c > 0$. Suppose that $x_n \in I_z$. By Taylor’s theorem, for each $x \in I$ there exists $\xi$ between $x$ and $x_n$ such that
$$f(x) = f(x_n) + f'(x_n)(x - x_n) + \tfrac{1}{2} f''(\xi)(x - x_n)^2.$$
In particular,
$$0 = f(z) = f(x_n) + f'(x_n)(z - x_n) + \tfrac{1}{2} f''(\xi)(z - x_n)^2.$$
Dividing by $f'(x_n)$, we have
$$x_{n+1} - z = x_n - z - \frac{f(x_n)}{f'(x_n)} = \frac{f''(\xi)}{2 f'(x_n)}(z - x_n)^2.$$
Thus if $d$ is the maximum of $|f''|$ on $I_z$, then
$$|x_{n+1} - z| \le \alpha |x_n - z|^2, \quad \alpha := \frac{d}{2c}.$$
Iterating, we have
$$|x_{n+1} - z| \le \cdots \le \alpha^{2^{k+1} - 1} |x_{n-k} - z|^{2^{k+1}} \le \cdots \le \alpha^{2^n - 1} |x_1 - z|^{2^n}.$$
Thus if $x_1$ is sufficiently near $z$, and in particular if $\alpha |x_1 - z| < 1$, then $x_n \in I_z$ for all $n$ and $x_n \to z$.

4.7.2 Example. Let $f(x) = \sin x - x/3$. Since
$$f(3\pi/4) = 1/\sqrt{2} - \pi/4 < 0 < 1 - \pi/6 = f(\pi/2),$$
$f$ has a zero in $[\pi/2, 3\pi/4]$ by the intermediate value theorem. Taking $x_1 = 3\pi/4$ yields the zero 2.27886266, accurate to eight decimal places. Taking $x_1 = 1$ produces the symmetric zero $-2.27886266$, while $x_1 = \pi/4$ produces 0. ♦
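
The dependence on the starting value in Example 4.7.2 can be observed directly. The following Python sketch iterates (4.9) for $f(x) = \sin x - x/3$ from the three starting values named in the example; the function `newton`, its stopping tolerance, and its iteration cap are illustrative choices, not part of the text.

```python
import math

def newton(f, fprime, x1, max_steps=50, tol=1e-14):
    # Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until the step is negligible.
    x = x1
    for _ in range(max_steps):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

def f(x):
    return math.sin(x) - x / 3

def fprime(x):
    return math.cos(x) - 1 / 3

if __name__ == "__main__":
    for x1 in (3 * math.pi / 4, 1.0, math.pi / 4):
        print(f"x1 = {x1:.8f}  ->  {newton(f, fprime, x1):.8f}")
```

Only the first starting value lies close to the zero it converges to; the other two runs show that a starting value far from a zero may be carried to a different zero altogether.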


If $x_1$ is not sufficiently near $z$, then the sequence $\{x_n\}$ may converge more slowly to $z$ or may not converge at all (see Exercise 6).

4.7.3 Example. For an approximate solution of $e^x = 2 - x$ we apply Newton’s method to $f(x) = e^x + x - 2$. By the intermediate value theorem, $f$ has a zero in $(0, 1)$. The recursion formula for $f$ is
$$x_{n+1} = x_n - (e^{x_n} + x_n - 2)(e^{x_n} + 1)^{-1}.$$
Table 4.1 gives the first few terms of the sequence $\{x_n\}$ and the corresponding values of $f$ (accurate up to seven decimal places) for the initial values $x_1 = 1$ and $x_1 = 5$.

TABLE 4.1: Newton’s method for $e^x + x - 2 = 0$.

n    x_n (x_1 = 1)    f(x_n)       x_n (x_1 = 5)    f(x_n)
1    1                1.7182818    5                151.4131591
2    .5378828         .2502604     3.9866142        55.8587993
3    .4456167         .0070696     2.9686340        20.4339472
4    .4428567         .0000059     1.9701667        7.1420387
5    .4428544         .0000000     1.0961884        2.0889256

The convergence is significantly slower for the larger value. The solution, accurate to 10 decimal places, is .4428544010. ♦
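
The entries of Table 4.1 can be regenerated with a few lines of Python. This is a minimal sketch of the recursion in Example 4.7.3; the function names are illustrative, and the printed values may differ from the table in the last digit because of rounding.

```python
import math

def f(x):
    return math.exp(x) + x - 2

def newton_iterates(x1, count):
    # Return [x_1, ..., x_count] for x_{n+1} = x_n - (e^x_n + x_n - 2)/(e^x_n + 1)
    xs = [x1]
    for _ in range(count - 1):
        x = xs[-1]
        xs.append(x - (math.exp(x) + x - 2) / (math.exp(x) + 1))
    return xs

if __name__ == "__main__":
    for x1 in (1.0, 5.0):
        print(f"x1 = {x1}")
        for n, x in enumerate(newton_iterates(x1, 5), start=1):
            print(f"  n = {n}   x_n = {x:.7f}   f(x_n) = {f(x):.7f}")
```

Starting from $x_1 = 5$ each early step reduces $x_n$ by roughly 1, because $f(x)/f'(x) \approx 1$ when $e^x$ dominates; the rapid convergence sets in only once the iterates are near the zero.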

Exercises

1. Find a zero, accurate to eight decimal places, of the given polynomial in the indicated interval.
(a)S $x^3 - x + 2$, $[-2, -1]$.
(b) $x^3 + x + 1$, $[-1, 0]$.
(c) $x^3 - 2x + 2$, $[-2, -1]$.
(d)S $x^5 - 2x + 3$, $[-2, -1]$.
(e) $x^7 - x - 1$, $[1, 2]$.
(f) $x^4 - 2x^3 + 5x^2 - 8x - 6$, $[2, 3]$.
(g)S $20x^4 - 20x^3 - 8x^2 + 4x - 1$, $[1, 2]$.
(h) $20x^4 - 20x^3 - 4x + 1$, $[1, 2]$.

2. Find a solution of the given equation in the indicated interval, correct to eight decimal places.
(a)S $\sin x = x^2$, $[.5, 1]$.
(b) $\sin x = x^3$, $[.5, 1]$.
(c) $\ln x + x = 2$, $[1, 2]$.
(d) $2\cos x = e^x$, $[0, 1]$.
(e)S $\ln x = e^{-x}$, $[1, 2]$.
(f) $\tan x + x = 1$, $[0, 1]$.

3. Show that Newton’s method applied to the function $x^{-1} - c$ produces the equation $x_{n+1} = 2x_n - c x_n^2$. Use this to find $1/2.34567$, correct to eight decimal places. Check your answer with a calculator.

4.S Use Newton’s method to find $\sqrt{63}$ correct to eight decimal places. Check your answer with a calculator.

5. What happens when you apply Newton’s method with $x_1 = 1$ to the polynomial in part (c) of Exercise 1?

6. Show that the sequence generated by Newton’s method applied to $f(x) = x^{1/3}$ cannot converge for any value of $x_1 \ne 0$.

Chapter 5 Riemann Integration on R

5.1

The Riemann–Darboux Integral

Throughout this section, $f$ denotes an arbitrary bounded, real-valued function on a closed and bounded interval $[a, b]$.

The first step in the development of the Riemann–Darboux integral is to partition the interval $[a, b]$ into finitely many subintervals, which are used to form upper and lower sums of $f$. Under suitable conditions, the sums converge to the integral.

5.1.1 Definition. A partition of $[a, b]$ is a set $\mathcal{P} = \{x_0, x_1, \ldots, x_{n-1}, x_n\}$, where $x_0 := a < x_1 < \cdots < x_{n-1} < x_n := b$. The points $x_1, \ldots, x_{n-1}$ are called the interior points of the partition. The mesh of the partition is defined as
$$\|\mathcal{P}\| := \max_{1 \le j \le n} \Delta x_j, \quad\text{where } \Delta x_j := x_j - x_{j-1},\ 1 \le j \le n.$$
A refinement of $\mathcal{P}$ is a partition containing $\mathcal{P}$. The common refinement of partitions $\mathcal{P}$ and $\mathcal{Q}$ is the partition $\mathcal{P} \cup \mathcal{Q}$. ♦

5.1.2 Example. Let $p \in \mathbb{N}$. Then, for each $n \in \mathbb{N}$, $\mathcal{P}_n := \{j/p^n : j = 0, 1, \ldots, p^n\}$ is a partition of $[0, 1]$, $\|\mathcal{P}_n\| = p^{-n}$, and $\mathcal{P}_{n+1}$ is a refinement of $\mathcal{P}_n$. ♦
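
The claims of Example 5.1.2 are easy to confirm for specific values of $p$ and $n$. The sketch below represents a partition as a sorted list of exact rationals; the names `uniform_partition`, `mesh`, and `is_refinement` are illustrative and not notation from the text.

```python
from fractions import Fraction

def uniform_partition(p, n):
    # P_n = { j / p^n : j = 0, 1, ..., p^n } as a sorted list of exact rationals
    return [Fraction(j, p**n) for j in range(p**n + 1)]

def mesh(partition):
    # Largest subinterval length max_j (x_j - x_{j-1})
    return max(b - a for a, b in zip(partition, partition[1:]))

def is_refinement(q, p):
    # Q refines P when every point of P is also a point of Q
    return set(p) <= set(q)

if __name__ == "__main__":
    P2, P3 = uniform_partition(3, 2), uniform_partition(3, 3)
    print(mesh(P2), mesh(P3), is_refinement(P3, P2))
```

Exact rational arithmetic is used so that membership tests between partitions are not affected by floating-point rounding.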

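5.1.3 Definition. The lower and upper (Darboux) sums of $f$ over a partition $\mathcal{P}$ of $[a, b]$ are defined, respectively, by
$$\underline{S}(f, \mathcal{P}) := \sum_{j=1}^{n} m_j \Delta x_j \quad\text{and}\quad \overline{S}(f, \mathcal{P}) := \sum_{j=1}^{n} M_j \Delta x_j,$$
where
$$m_j = m_j(f) := \inf_{x_{j-1} \le x \le x_j} f(x) \quad\text{and}\quad M_j = M_j(f) := \sup_{x_{j-1} \le x \le x_j} f(x).$$
♦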


A geometric interpretation of the upper and lower sums for a positive continuous function is given in Figure 5.1. The lower (upper) sum is the total area of the smaller (larger) rectangles.

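FIGURE 5.1: Upper and lower sums of f.

The following proposition asserts that refinements increase lower sums and decrease upper sums.

5.1.4 Proposition. If $\mathcal{Q}$ is a refinement of $\mathcal{P}$, then
$$\underline{S}(f, \mathcal{P}) \le \underline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{P}).$$

Proof. The middle inequality is clear. To prove the rightmost inequality, let $\mathcal{P} = \{x_0 = a < x_1 < \cdots < x_{n-1} < x_n = b\}$ and assume first that $\mathcal{Q} = \mathcal{P} \cup \{c\}$. Choose $k$ so that $x_{k-1} < c < x_k$ and set
$$M_k' = \sup_{x_{k-1} \le x \le c} f(x) \quad\text{and}\quad M_k'' = \sup_{c \le x \le x_k} f(x).$$
Then $M_k', M_k'' \le M_k$, hence
$$\overline{S}(f, \mathcal{Q}) = \sum_{j=1}^{k-1} M_j \Delta x_j + \sum_{j=k+1}^{n} M_j \Delta x_j + M_k'(c - x_{k-1}) + M_k''(x_k - c)$$
$$\le \sum_{j=1}^{k-1} M_j \Delta x_j + \sum_{j=k+1}^{n} M_j \Delta x_j + M_k(c - x_{k-1}) + M_k(x_k - c) = \overline{S}(f, \mathcal{P}).$$
For the general case, observe that any refinement $\mathcal{Q}$ of $\mathcal{P}$ may be obtained by successively adding points to $\mathcal{P}$. At each step, the upper sum is decreased so that ultimately one obtains the desired inequality. The proof for lower sums is similar.

5.1.5 Corollary. For any partitions $\mathcal{P}$ and $\mathcal{Q}$ of $[a, b]$, $\underline{S}(f, \mathcal{Q}) \le \overline{S}(f, \mathcal{P})$.

Proof. By 5.1.4,
$$\underline{S}(f, \mathcal{Q}) \le \underline{S}(f, \mathcal{P} \cup \mathcal{Q}) \le \overline{S}(f, \mathcal{P} \cup \mathcal{Q}) \le \overline{S}(f, \mathcal{P}). \tag{5.1}$$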
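
The chain of inequalities in 5.1.4 can be checked on a concrete case. The sketch below assumes that $f$ is increasing, so that the infimum and supremum on each subinterval are attained at the left and right endpoints; for a general bounded function these values cannot be read off from finitely many samples. The function name is illustrative.

```python
from fractions import Fraction

def darboux_sums_increasing(f, partition):
    # For increasing f: m_j = f(x_{j-1}) and M_j = f(x_j) on [x_{j-1}, x_j].
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(a) * (b - a) for a, b in pairs)
    upper = sum(f(b) * (b - a) for a, b in pairs)
    return lower, upper

if __name__ == "__main__":
    square = lambda x: x * x                      # increasing on [0, 1]
    P = [Fraction(0), Fraction(1, 2), Fraction(1)]
    Q = sorted(set(P) | {Fraction(1, 4), Fraction(3, 4)})   # a refinement of P
    print(darboux_sums_increasing(square, P))
    print(darboux_sums_increasing(square, Q))
```

With these partitions the lower sum rises from 1/8 to 7/32 and the upper sum falls from 5/8 to 15/32, with the value 1/3 of the integral trapped between them.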


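5.1.6 Definition. The lower and upper (Darboux) integrals of $f$ on $[a, b]$ are defined, respectively, by
$$\underline{\int_a^b} f = \underline{\int_a^b} f(x)\, dx := \sup_{\mathcal{P}} \underline{S}(f, \mathcal{P}) \quad\text{and}\quad \overline{\int_a^b} f = \overline{\int_a^b} f(x)\, dx := \inf_{\mathcal{P}} \overline{S}(f, \mathcal{P}),$$
where the supremum and infimum are taken over all partitions $\mathcal{P}$ of $[a, b]$. In each case, $f$ is called the integrand and $x$ the integration variable. ♦

5.1.7 Proposition. For any partition $\mathcal{P}$ of $[a, b]$,
$$\underline{S}(f, \mathcal{P}) \le \underline{\int_a^b} f \le \overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}).$$

Proof. The left and right inequalities are immediate from the definition of lower and upper integrals. The middle inequality follows by taking the supremum over $\mathcal{Q}$ and then the infimum over $\mathcal{P}$ in (5.1).

5.1.8 Proposition. The following statements are equivalent:
(a) $\displaystyle \underline{\int_a^b} f = \overline{\int_a^b} f$.
(b) For each $\varepsilon > 0$ there exists a partition $\mathcal{P}_\varepsilon$ of $[a, b]$ such that $\overline{S}(f, \mathcal{P}_\varepsilon) - \underline{S}(f, \mathcal{P}_\varepsilon) < \varepsilon$.

Proof. (a) ⇒ (b): Given $\varepsilon > 0$, there exist partitions $\mathcal{P}'$ and $\mathcal{P}''$ such that
$$\underline{\int_a^b} f - \varepsilon/2 < \underline{S}(f, \mathcal{P}') \quad\text{and}\quad \overline{S}(f, \mathcal{P}'') < \overline{\int_a^b} f + \varepsilon/2.$$
By 5.1.4, the inequalities still hold if $\mathcal{P}'$ and $\mathcal{P}''$ are each replaced by their common refinement $\mathcal{P}_\varepsilon := \mathcal{P}' \cup \mathcal{P}''$. Subtracting the resulting inequalities and applying (a) yields (b).
(b) ⇒ (a): If the inequality in (b) holds then, by 5.1.7,
$$0 \le \overline{\int_a^b} f - \underline{\int_a^b} f \le \overline{S}(f, \mathcal{P}_\varepsilon) - \underline{S}(f, \mathcal{P}_\varepsilon) < \varepsilon.$$
Since $\varepsilon$ is arbitrary, the integrals must be equal.

5.1.9 Definition. The function $f$ is said to be (Darboux) integrable on $[a, b]$ if one (hence both) of the conditions (a), (b) of 5.1.8 holds. In this case, the common value of the integrals in (a) is called the (Riemann–Darboux) integral of $f$ on $[a, b]$ and is denoted by
$$\int_a^b f = \int_a^b f(x)\, dx.$$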


Also, define
$$\int_b^a f = -\int_a^b f \quad\text{and}\quad \int_a^a f = 0.$$
The collection of all integrable functions on $[a, b]$ is denoted by $\mathcal{R}_a^b$. ♦

The following theorem guarantees a rich supply of integrable functions.

5.1.10 Theorem. If $f$ is continuous on $[a, b]$ except possibly at finitely many points, then $f \in \mathcal{R}_a^b$.

Proof. Denote the points of discontinuity of $f$, if any, by $d_1 < \cdots < d_n$. For convenience, we assume that these lie in $(a, b)$; only a minor modification of the proof is needed if $d_1 = a$ or $d_n = b$. Let $\varepsilon > 0$. For each $j$, remove an open interval of width $r$ centered at $d_j$, the value of $r$ to be determined. Since $f$ is continuous on each of the resulting $n + 1$ closed intervals $I_0, \ldots, I_n$, it is uniformly continuous there. (If $f$ is continuous on $[a, b]$, then $n = 0$ and $I_0 = [a, b]$.) Thus there exists a $\delta > 0$ such that for each $j$,
$$|f(x) - f(y)| < \frac{\varepsilon}{2(b - a)} \quad\text{for all } x, y \in I_j \text{ with } |x - y| < \delta.$$
Now, the endpoints of the intervals $I_j$ form a partition $\mathcal{P}$ of $[a, b]$. If necessary, refine $\mathcal{P}$ by inserting points (marked by ∗ in Figure 5.2) into these intervals so that the distance between consecutive points is less than $\delta$. The subintervals of the resulting partition $\mathcal{Q}$ are of two types: those that contain some $d_j$, which we mark by $\alpha$, and those that do not, which we mark by $\beta$.

FIGURE 5.2: The partitions P and Q.

Thus, in the obvious notation,
$$\overline{S}(f, \mathcal{Q}) - \underline{S}(f, \mathcal{Q}) = \sum_{\alpha} (M_j - m_j)\Delta x_j + \sum_{\beta} (M_j - m_j)\Delta x_j.$$
In the first sum, $\Delta x_j < r$ and in the second, $M_j - m_j \le \varepsilon/2(b - a)$. Since the first sum has $n$ terms (corresponding to the $n$ discontinuities $d_j$),
$$\overline{S}(f, \mathcal{Q}) - \underline{S}(f, \mathcal{Q}) < 2Mnr + \varepsilon/2,$$
where $M$ is a bound for $|f|$ on $[a, b]$. Choosing $r < \varepsilon/4Mn$, we then have $\overline{S}(f, \mathcal{Q}) - \underline{S}(f, \mathcal{Q}) < \varepsilon$, which shows that $f$ is integrable on $[a, b]$.


The set of discontinuities of an integrable function can be infinite but may not be too large. We make this precise in Section 5.8. In the meantime, we offer the following examples to illustrate the basic idea. In the first example, the function is discontinuous only on a countably infinite set, while in the second the function is discontinuous everywhere.

FIGURE 5.3: The partition Pn of Example 5.1.11.

5.1.11 Example. Let $f$ be any bounded function on $[0, 1]$ such that $f(x) = 0$ if $x \notin \{1/n : n = 2, 3, \ldots\}$. We claim that $f$ is integrable and that $\int_0^1 f = 0$. The idea is to enclose the points of discontinuity of $f$ in small intervals, as in the proof of 5.1.10. Fix $n$ and let
$$\mathcal{P}_n = \{x_0 = 0, x_1, x_2, \ldots, x_{2n-2}, x_{2n-1} = 1\},$$
where
$$x_{2j-1} < 1/(n - j + 1) < x_{2j} < x_{2j+1}, \quad j = 1, 2, \ldots, n - 1,$$
and
$$\Delta x_{2j} = x_{2j} - x_{2j-1} < 1/n^2, \quad j = 1, 2, \ldots, n - 1.$$
(See Figure 5.3.) Let $|f| \le M$ on $[0, 1]$. Since $f = 0$ on $[x_{2j}, x_{2j+1}]$ and $m_j \ge -M$,
$$\underline{S}(f, \mathcal{P}_n) = m_1 x_1 + m_2(x_2 - x_1) + \cdots + m_{2n-2}(x_{2n-2} - x_{2n-3})$$
$$\ge -M\big[x_1 + (x_2 - x_1) + (x_4 - x_3) + \cdots + (x_{2n-2} - x_{2n-3})\big]$$
$$\ge -M\big(1/n + (n - 1)/n^2\big) = -M(2/n - 1/n^2).$$
A similar calculation shows that $\overline{S}(f, \mathcal{P}_n) \le M(2/n - 1/n^2)$. Therefore,
$$\lim_n \underline{S}(f, \mathcal{P}_n) = \lim_n \overline{S}(f, \mathcal{P}_n) = 0,$$
hence $f$ is integrable with zero integral. ♦

5.1.12 Example. The Dirichlet function $d(x)$ (3.1.7) is not integrable on any (nondegenerate) interval $[a, b]$. Indeed, every upper sum of $d(x)$ has the value $b - a$ and every lower sum has the value 0. ♦

A useful characterization of integrability may be given in terms of the limits of $\underline{S}(f, \mathcal{P})$ and $\overline{S}(f, \mathcal{P})$ as $\|\mathcal{P}\| \to 0$.

5.1.13 Definition. Let $L \in \mathbb{R}$. We write $L = \lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P})$ if, given $\varepsilon > 0$, there exists $\delta > 0$ such that $|\underline{S}(f, \mathcal{P}) - L| < \varepsilon$ for all partitions $\mathcal{P}$ with $\|\mathcal{P}\| < \delta$. The limit $\lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P})$ is defined analogously. ♦


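5.1.14 Lemma. Let $\mathcal{P}' = \{x_0' = a < x_1' < \cdots < x_n' < x_{n+1}' = b\}$ be a partition of $[a, b]$ and let $|f| \le M$ on $[a, b]$. Then
$$\overline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}') + 3nM\|\mathcal{P}\|$$
for all partitions $\mathcal{P}$ of $[a, b]$ with $\|\mathcal{P}\| < \delta' := \min_j \Delta x_j'$.

FIGURE 5.4: The partitions P′, P, and P″.

Proof. Since $\|\mathcal{P}\| < \Delta x_j'$, no interval of $\mathcal{P}$ can contain more than one interior point of $\mathcal{P}'$. Mark the intervals of $\mathcal{P}$ that contain exactly one interior point of $\mathcal{P}'$ by $\alpha$ and mark those that contain no interior point of $\mathcal{P}'$ by $\gamma$. Consider the common refinement $\mathcal{P}'' = \mathcal{P} \cup \mathcal{P}'$ of $\mathcal{P}$ and $\mathcal{P}'$. Some of the intervals of $\mathcal{P}''$ were formed from an interval of $\mathcal{P}$ of type $\alpha$; we mark those by $\beta$. The remaining intervals of $\mathcal{P}''$, intervals that were not formed from an interval of $\mathcal{P}$ of type $\alpha$, are precisely the intervals marked $\gamma$ in $\mathcal{P}$. Thus the terms of $\overline{S}(f, \mathcal{P})$ and $\overline{S}(f, \mathcal{P}'')$ corresponding to intervals of type $\gamma$ are identical, hence cancel under subtraction of upper sums. Therefore, in the obvious notation,
$$\overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}'') = \sum_{\alpha} M_j(f)\Delta x_j - \sum_{\beta} M_j''(f)\Delta x_j'' \le M\Big[\sum_{\alpha} \Delta x_j + \sum_{\beta} \Delta x_j''\Big] \le M\big[n\|\mathcal{P}\| + 2n\|\mathcal{P}''\|\big],$$
the last inequality because there are at most $n$ intervals of type $\alpha$ and at most $2n$ intervals of type $\beta$. Since $\mathcal{P}''$ is a refinement of $\mathcal{P}'$ and $\mathcal{P}$,
$$\overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}') \le \overline{S}(f, \mathcal{P}) - \overline{S}(f, \mathcal{P}'') \le 3nM\|\mathcal{P}\|.$$

5.1.15 Theorem. For any bounded function $f$ on $[a, b]$,
$$\overline{\int_a^b} f = \lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P}) \quad\text{and}\quad \underline{\int_a^b} f = \lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P}). \tag{5.2}$$
Thus $f$ is integrable on $[a, b]$ iff the limits in (5.2) are equal, in which case
$$\int_a^b f = \lim_{\|\mathcal{P}\| \to 0} \underline{S}(f, \mathcal{P}) = \lim_{\|\mathcal{P}\| \to 0} \overline{S}(f, \mathcal{P}). \tag{5.3}$$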


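Proof. Given $\varepsilon > 0$, choose a partition $\mathcal{P}'$ such that
$$\overline{S}(f, \mathcal{P}') < \overline{\int_a^b} f + \varepsilon/2.$$
In the notation of 5.1.14, for any partition $\mathcal{P}$ with $\|\mathcal{P}\| < \delta'$,
$$\overline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}') + 3nM\|\mathcal{P}\| < \overline{\int_a^b} f + \varepsilon/2 + 3nM\|\mathcal{P}\|.$$
Hence if $\|\mathcal{P}\| < \min\{\delta', \varepsilon/6nM\}$, then
$$\overline{\int_a^b} f \le \overline{S}(f, \mathcal{P}) < \overline{\int_a^b} f + \varepsilon.$$
Since $\varepsilon$ was arbitrary, the first limit in (5.2) is established. The second follows from the first by considering $-f$ and using Exercise 5.1.3.

Equation (5.3) represents the integral as a limit of upper and lower sums. It is also possible to represent the integral as a limit of intermediate sums, called Riemann sums.

5.1.16 Definition. Let $\mathcal{P} = \{x_0 = a < x_1 < \cdots < x_n = b\}$ be a partition of $[a, b]$ and let $\xi = (\xi_1, \ldots, \xi_n)$, where $\xi_j \in [x_{j-1}, x_j]$. The sum
$$S(f, \mathcal{P}, \xi) := \sum_{j=1}^{n} f(\xi_j)\Delta x_j$$
is called the Riemann sum of $f$ determined by $\mathcal{P}$ and $\xi$. ♦

Figure 5.5 illustrates a Riemann sum for a positive continuous function $f$. In this case $S(f, \mathcal{P}, \xi)$ is the total area of the rectangles with heights $f(\xi_j)$ and bases $\Delta x_j$.

FIGURE 5.5: A Riemann sum.

5.1.17 Definition. Let $\mathcal{P} = \{x_0 = a < x_1 < \cdots < x_n = b\}$ be a partition of $[a, b]$ and let $\xi = (\xi_1, \ldots, \xi_n)$, where $\xi_j \in [x_{j-1}, x_j]$. We write
$$L = \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi)$$
if for each $\varepsilon > 0$ there exists $\delta > 0$ such that $|S(f, \mathcal{P}, \xi) - L| < \varepsilon$ for all partitions $\mathcal{P}$ with $\|\mathcal{P}\| < \delta$ and all choices of $\xi$. Similarly, we write
$$L = \lim_{\mathcal{P}} S(f, \mathcal{P}, \xi)$$
if for each $\varepsilon > 0$ there exists a partition $\mathcal{P}_\varepsilon$ such that $|S(f, \mathcal{P}, \xi) - L| < \varepsilon$ for all refinements $\mathcal{P}$ of $\mathcal{P}_\varepsilon$ and all choices of $\xi$. ♦

We may now give Riemann’s characterization of integrability.

5.1.18 Theorem. The following statements are equivalent:
(a) $f \in \mathcal{R}_a^b$.
(b) $\lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi)$ exists in $\mathbb{R}$.
(c) $\lim_{\mathcal{P}} S(f, \mathcal{P}, \xi)$ exists in $\mathbb{R}$.
If these conditions hold, then
$$\int_a^b f = \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi) = \lim_{\mathcal{P}} S(f, \mathcal{P}, \xi).$$

Proof. (a) ⇒ (b): Let $L = \int_a^b f$. For any partition $\mathcal{P}$ and any $\xi$, we have
$$\underline{S}(f, \mathcal{P}) - L \le S(f, \mathcal{P}, \xi) - L \le \overline{S}(f, \mathcal{P}) - L,$$
hence (b) follows from 5.1.15.
(b) ⇒ (c): Let $L := \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \xi)$. Given $\varepsilon > 0$, choose $\delta > 0$ such that
$$|S(f, \mathcal{P}, \xi) - L| < \varepsilon \quad\text{for all partitions } \mathcal{P} \text{ with } \|\mathcal{P}\| < \delta \text{ and all } \xi. \tag{5.4}$$
Choose any partition $\mathcal{P}_\varepsilon$ with $\|\mathcal{P}_\varepsilon\| < \delta$. If $\mathcal{P}$ is any refinement of $\mathcal{P}_\varepsilon$, then $\|\mathcal{P}\| \le \|\mathcal{P}_\varepsilon\| < \delta$, hence (5.4) holds for $\mathcal{P}$.
(c) ⇒ (a): Let $L := \lim_{\mathcal{P}} S(f, \mathcal{P}, \xi)$. Given $\varepsilon > 0$, choose a partition $\mathcal{P}_\varepsilon$ such that
$$|S(f, \mathcal{P}, \xi) - L| < \varepsilon \quad\text{for all refinements } \mathcal{P} \text{ of } \mathcal{P}_\varepsilon \text{ and all } \xi. \tag{5.5}$$
For such a partition $\mathcal{P}$, by the approximation property of suprema there exists for each $j$ a sequence $\{\xi_{j,k}\}_{k=1}^{\infty}$ in $[x_{j-1}, x_j]$ such that $\lim_k f(\xi_{j,k}) = M_j(f)$. It follows that
$$\lim_k S(f, \mathcal{P}, \xi^k) = \overline{S}(f, \mathcal{P}), \quad\text{where } \xi^k = (\xi_{1,k}, \xi_{2,k}, \ldots, \xi_{n,k}).$$
From (5.5), $\overline{\int_a^b} f - L \le \overline{S}(f, \mathcal{P}) - L \le \varepsilon$. Since $\varepsilon$ was arbitrary, $\overline{\int_a^b} f \le L$. Similarly, $\underline{\int_a^b} f \ge L$. Therefore $\underline{\int_a^b} f = \overline{\int_a^b} f$.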
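
The content of 5.1.18 is that the choice of the points $\xi_j$ does not matter in the limit. The following Python sketch illustrates this for $f(x) = x^2$ on $[0, 1]$ with uniform partitions and randomly chosen tags; the function names, the seed, and the use of uniform partitions are assumptions made for the illustration. Since this $f$ is increasing, the upper and lower sums over the uniform partition with $n$ subintervals differ by exactly $(f(1) - f(0))/n = 1/n$, so every Riemann sum over that partition lies within $1/n$ of the integral $1/3$.

```python
import random

def riemann_sum(f, partition, tags):
    # S(f, P, xi) = sum of f(xi_j) * (x_j - x_{j-1})
    widths = [b - a for a, b in zip(partition, partition[1:])]
    return sum(f(t) * w for t, w in zip(tags, widths))

def random_tags(partition, rng):
    # One arbitrary point xi_j in each subinterval [x_{j-1}, x_j]
    return [rng.uniform(a, b) for a, b in zip(partition, partition[1:])]

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        partition = [j / n for j in range(n + 1)]
        s = riemann_sum(lambda x: x * x, partition, random_tags(partition, rng))
        print(n, s, abs(s - 1 / 3))
```

The printed errors shrink at least as fast as $1/n$ regardless of how the tags fall, in line with part (b) of the theorem.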


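Exercises

1. Prove that if $k$ is a constant, then $\underline{\int_a^b} k = \overline{\int_a^b} k = k(b - a)$.

2. Let $a \le c < d \le b$. Define $f$ on $[a, b]$ by $f(x) = 1$ if $x \in [c, d]$ and $f(x) = 0$ otherwise. Show that $f \in \mathcal{R}_a^b$ and evaluate $\int_a^b f$.

3.S ⇓1 Prove that
(a) $\underline{S}(-f, \mathcal{P}) = -\overline{S}(f, \mathcal{P})$ and $\underline{\int_a^b} (-f) = -\overline{\int_a^b} f$.
(b) $f \in \mathcal{R}_a^b \Rightarrow -f \in \mathcal{R}_a^b$ and $\int_a^b (-f) = -\int_a^b f$.

4. ⇓2 Prove that a monotone function is integrable.

5.S Let $f \in \mathcal{R}_a^b$ and let $g : [a, b] \to \mathbb{R}$ be any function that differs from $f$ at finitely many points in $[a, b]$. Prove that $g \in \mathcal{R}_a^b$ and that $\int_a^b f = \int_a^b g$. Does the same result hold if $g$ differs from $f$ at countably many points?

6. Let $f \in \mathcal{R}_a^b$. Prove:
(a) If $\inf_{a \le x \le b} f(x) > 0$, then $1/f \in \mathcal{R}_a^b$.
(b) If $f(x) \ge 0$ for all $x \in [a, b]$, then $\sqrt{f} \in \mathcal{R}_a^b$.
(c)S $\sin(f) \in \mathcal{R}_a^b$.

7.S Let $F(\mathcal{P})$ be a real-valued function of partitions $\mathcal{P}$ on an interval $[a, b]$. Write
$$L = \lim_{\mathcal{P}} F(\mathcal{P})$$
if, given $\varepsilon > 0$, there exists a partition $\mathcal{P}_\varepsilon$ such that $|F(\mathcal{P}) - L| < \varepsilon$ for all partitions $\mathcal{P}$ refining $\mathcal{P}_\varepsilon$.
(a) Show that the limit is linear, that is,
$$\lim_{\mathcal{P}} \big[\alpha F(\mathcal{P}) + \beta G(\mathcal{P})\big] = \alpha \lim_{\mathcal{P}} F(\mathcal{P}) + \beta \lim_{\mathcal{P}} G(\mathcal{P}),$$
provided the right side exists.
(b) Let $f$ be a bounded function on $[a, b]$. With this definition, show that
$$\underline{\int_a^b} f = \lim_{\mathcal{P}} \underline{S}(f, \mathcal{P}) \quad\text{and}\quad \overline{\int_a^b} f = \lim_{\mathcal{P}} \overline{S}(f, \mathcal{P}).$$

8. Let $f \in \mathcal{R}_0^1$ and set $g(x) = x^q$, where $q > 0$. Prove that $f \circ g \in \mathcal{R}_0^1$.

1 This exercise will be used in 5.2.2.
2 This exercise will be used in 5.9.8.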


5.2

Properties of the Integral

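The following lemma will be useful in proving certain properties of integrals.

5.2.1 Lemma. Let $f : [a, b] \to \mathbb{R}$ be bounded. Then there exists a sequence of partitions $\{\mathcal{P}_n\}$ of $[a, b]$ such that
$$\lim_{n\to\infty} \underline{S}(f,\mathcal{P}_n) = \underline{\int_a^b} f \quad \text{and} \quad \lim_{n\to\infty} \overline{S}(f,\mathcal{P}_n) = \overline{\int_a^b} f.$$
Moreover, the limits still hold if each $\mathcal{P}_n$ is replaced by a refinement.

Proof. By the approximation property of infima and suprema, for each $n$ there exist partitions $\mathcal{P}_n'$ and $\mathcal{P}_n''$ of $[a, b]$ such that
$$\underline{\int_a^b} f - 1/n < \underline{S}(f,\mathcal{P}_n') \le \underline{\int_a^b} f \quad \text{and} \quad \overline{\int_a^b} f \le \overline{S}(f,\mathcal{P}_n'') < \overline{\int_a^b} f + 1/n.$$
Since refinements decrease upper sums and increase lower sums, the inequalities still hold if $\mathcal{P}_n'$ and $\mathcal{P}_n''$ are replaced by their common refinement $\mathcal{P}_n$ or by any refinement of $\mathcal{P}_n$. Letting $n \to +\infty$ completes the proof.

5.2.2 Theorem. If $f, g \in \mathcal{R}_a^b$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in \mathcal{R}_a^b$ and
$$\int_a^b (\alpha f + \beta g) = \alpha \int_a^b f + \beta \int_a^b g.$$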

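Proof. By 5.2.1, we may choose a sequence of partitions $\mathcal{P}_n$ such that
$$\lim_n \overline{S}(f,\mathcal{P}_n) = \int_a^b f \quad \text{and} \quad \lim_n \overline{S}(g,\mathcal{P}_n) = \int_a^b g.$$
(There exists one such sequence for $f$, another for $g$; the sequence of common refinements then works for both functions.) Letting $n \to \infty$ in
$$\overline{\int_a^b} (f + g) \le \overline{S}(f + g,\mathcal{P}_n) \le \overline{S}(f,\mathcal{P}_n) + \overline{S}(g,\mathcal{P}_n)$$
yields
$$\overline{\int_a^b} (f + g) \le \int_a^b f + \int_a^b g.$$
Similarly,
$$\underline{\int_a^b} (f + g) \ge \int_a^b f + \int_a^b g.$$
It follows that $f + g$ is integrable and
$$\int_a^b (f + g) = \int_a^b f + \int_a^b g.$$
It remains to prove that $\alpha f$ is integrable and that $\int_a^b \alpha f = \alpha \int_a^b f$. If $\alpha > 0$, then $\overline{S}(\alpha f,\mathcal{P}) = \alpha \overline{S}(f,\mathcal{P})$ and $\underline{S}(\alpha f,\mathcal{P}) = \alpha \underline{S}(f,\mathcal{P})$. Taking the infimum and supremum over $\mathcal{P}$ yields
$$\overline{\int_a^b} \alpha f = \alpha \int_a^b f = \underline{\int_a^b} \alpha f.$$
If $\alpha < 0$, then $-\alpha > 0$, hence
$$\int_a^b \alpha f = \int_a^b (-\alpha)(-f) = (-\alpha) \int_a^b (-f) = \alpha \int_a^b f,$$
the last equality by Exercise 5.1.3.

5.2.3 Proposition. If $f \in \mathcal{R}_a^b$ and $a \le c < d \le b$, then $f|_{[c,d]} \in \mathcal{R}_c^d$.

Proof. Given ε > 0, let $\mathcal{P}$ be a partition of $[a, b]$ with $\overline{S}(f,\mathcal{P}) - \underline{S}(f,\mathcal{P}) < \varepsilon$. We may assume that $c, d \in \mathcal{P}$, otherwise replace $\mathcal{P}$ by the refinement $\mathcal{P} \cup \{c, d\}$. If $\mathcal{Q} = \mathcal{P} \cap [c, d]$, then clearly
$$\overline{S}\big(f|_{[c,d]},\mathcal{Q}\big) - \underline{S}\big(f|_{[c,d]},\mathcal{Q}\big) \le \overline{S}(f,\mathcal{P}) - \underline{S}(f,\mathcal{P}) < \varepsilon,$$
hence $f|_{[c,d]} \in \mathcal{R}_c^d$.

The following is a converse of 5.2.3.

5.2.4 Theorem. Let $a < c < b$. If $f|_{[a,c]} \in \mathcal{R}_a^c$ and $f|_{[c,b]} \in \mathcal{R}_c^b$, then $f \in \mathcal{R}_a^b$ and
$$\int_a^b f = \int_a^c f + \int_c^b f.$$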

Proof. By 5.2.1, we may choose sequences of partitions $\mathcal{P}_n'$ of $[a, c]$ and $\mathcal{P}_n''$ of $[c, b]$ such that
$$\lim_n \overline{S}(f|_{[a,c]},\mathcal{P}_n') = \int_a^c f \quad \text{and} \quad \lim_n \overline{S}(f|_{[c,b]},\mathcal{P}_n'') = \int_c^b f.$$
Then $\mathcal{P}_n := \mathcal{P}_n' \cup \mathcal{P}_n''$ is a partition of $[a, b]$ and
$$\overline{\int_a^b} f \le \overline{S}(f,\mathcal{P}_n) = \overline{S}(f|_{[a,c]},\mathcal{P}_n') + \overline{S}(f|_{[c,b]},\mathcal{P}_n'').$$
Letting $n \to \infty$, we obtain
$$\overline{\int_a^b} f \le \int_a^c f + \int_c^b f.$$
Replacing $f$ by $-f$ produces the reverse inequality for the lower integral of $f$, proving the theorem.

5.2.5 Theorem. If $f, g \in \mathcal{R}_a^b$ and $f \le g$ on $[a, b]$, then
$$\int_a^b f \le \int_a^b g.$$
In particular, if $m \le f(x) \le M$ for all $x \in [a, b]$, then
$$m(b - a) \le \int_a^b f \le M(b - a).$$

Proof. Let $\mathcal{P}$ be a partition of $[a, b]$. By hypothesis, $M_j(f) \le M_j(g)$ for each $j$, hence $\overline{S}(f,\mathcal{P}) \le \overline{S}(g,\mathcal{P})$. Taking the infimum over $\mathcal{P}$ yields the first inequality. The second inequality follows from the first and Exercise 5.1.1.

5.2.6 Theorem. If $f \in \mathcal{R}_a^b$, then $|f| \in \mathcal{R}_a^b$ and
$$\left|\int_a^b f\right| \le \int_a^b |f|.$$

Proof. By Exercise 1.4.5, for any partition $\mathcal{P}$ of $[a, b]$,
$$M_j(|f|) - m_j(|f|) \le M_j(f) - m_j(f).$$
Summing over $j$,
$$\overline{S}(|f|,\mathcal{P}) - \underline{S}(|f|,\mathcal{P}) \le \overline{S}(f,\mathcal{P}) - \underline{S}(f,\mathcal{P}).$$
Since the right side can be made arbitrarily small, $|f| \in \mathcal{R}_a^b$. Applying 5.2.5 to $\pm f \le |f|$ we obtain
$$\pm \int_a^b f \le \int_a^b |f|,$$
which gives the desired inequality.

5.2.7 Theorem. If $f, g \in \mathcal{R}_a^b$, then $fg \in \mathcal{R}_a^b$.

Proof. Since $fg = \tfrac12\big[(f + g)^2 - f^2 - g^2\big]$, it suffices to prove that $f^2 \in \mathcal{R}_a^b$. To this end, let $\mathcal{P}$ be any partition of $[a, b]$ and let $|f| \le M$. Then
$$M_j(f^2) - m_j(f^2) = M_j^2(|f|) - m_j^2(|f|) \le 2M\big[M_j(|f|) - m_j(|f|)\big].$$
Summing over $j$,
$$\overline{S}(f^2,\mathcal{P}) - \underline{S}(f^2,\mathcal{P}) \le 2M\big[\overline{S}(|f|,\mathcal{P}) - \underline{S}(|f|,\mathcal{P})\big].$$
Since $|f| \in \mathcal{R}_a^b$, the right side of the last inequality may be made arbitrarily small. Therefore, $f^2 \in \mathcal{R}_a^b$.


Exercises

1.S Let $\{c_n\}$ be a convergent sequence in $[a, b]$ and let $f$ be a bounded function on $[a, b]$ with $f(x) = 0$ for all $x \notin \{c_n\}$. Prove that $f \in \mathcal{R}_a^b$ and find $\int_a^b f$.

2. Define $f$ on $[0, 1]$ by $f(0) = 0$ and
$$f(x) = 2^{-n} \quad \text{if } 2^{-n-1} < x \le 2^{-n},\ n \ge 0.$$
Prove that $f \in \mathcal{R}_0^1$ and evaluate $\int_0^1 f$.

3. Prove or disprove: $|f| \in \mathcal{R}_a^b$ implies $f \in \mathcal{R}_a^b$.

4. A function $s$ on $[a, b]$ is called a step function if there exists a partition of $[a, b]$ such that $s$ is constant on the interior of each partition interval. Show that a step function is integrable. Prove that a bounded function $f$ is integrable on $[a, b]$ iff for each ε > 0 there exist step functions $s_\ell$ and $s_u$ such that $s_\ell \le f \le s_u$ and $\int_a^b (s_u - s_\ell) < \varepsilon$.

5.S Prove that if $f_j \in \mathcal{R}_a^b$, $1 \le j \le n$, then $\max\{f_1, \ldots, f_n\} \in \mathcal{R}_a^b$ and $\min\{f_1, \ldots, f_n\} \in \mathcal{R}_a^b$.

6.S Let $f$ be continuous and $f(x) < M$ for all $x \in [a, b]$. Prove that
$$\int_a^b f < M(b - a).$$
(Compare with 5.2.5.)

7. Let $f \in \mathcal{R}_a^b$ be nonnegative. Prove that if $f$ is continuous at some point $x_0 \in [a, b]$ and $f(x_0) \ne 0$, then $\int_a^b f > 0$.

8. Let $f \in \mathcal{R}_a^b$ such that either
(a) $\int_a^b fg = 0$ for every continuous function $g$, or
(b) $\int_a^b fg = 0$ for every step function $g$.
Prove that $f$ is zero at each point of continuity of $f$.

9.S Let $f \in \mathcal{R}_a^b$ and for $x, y \in [a, b]$ define $F(x, y) = \int_x^y f$. Prove that $F(x, y)$ is continuous in $y$ for each $x$ and continuous in $x$ for each $y$.

10. Let $f$ be bounded on $[a, b]$ and integrable on $[c, b]$ for every $a < c < b$. Prove that the following statements are equivalent:
(a) $\lim_{x\to a^+} \int_x^b f$ exists in $\mathbb{R}$.
(b) $\lim_{n\to+\infty} \int_{a_n}^b f$ exists in $\mathbb{R}$ for some sequence $a_n \downarrow a$.
(c) $f \in \mathcal{R}_a^b$.
Conclude from Exercise 9 that if $f \in \mathcal{R}_a^b$, then the limit in (a) is $\int_a^b f$.


11. Let $f$ be integrable on $[0, x]$ for all $x > 0$. Prove that
$$\liminf_{x\to+\infty} f(x) \le \liminf_{x\to+\infty} \frac1x \int_0^x f \le \limsup_{x\to+\infty} \frac1x \int_0^x f \le \limsup_{x\to+\infty} f(x).$$
Conclude that if $L := \lim_{x\to+\infty} f(x)$ exists in $\mathbb{R}$, then
$$\lim_{x\to+\infty} \frac1x \int_0^x f(t)\,dt = L.$$

12.S Let $f$ be continuous on $[a, b]$ and let $M = \sup_{a \le x \le b} |f(x)|$. Prove:
(a) For each ε > 0 there exists δ > 0 such that
$$\delta(M - \varepsilon) \le \int_a^b |f(x)|\,dx \le M(b - a).$$
(b) $M = \lim_{p\to+\infty} \left(\int_a^b |f|^p\right)^{1/p}$.

13. ⇓3 Let $f, g : [a, b] \to \mathbb{R}$ be continuous. Supply the details in the following outline of a proof of the Cauchy–Schwarz inequality
$$\left(\int_a^b fg\right)^2 \le \int_a^b f^2 \int_a^b g^2.$$
(a) The inequality holds if $\int_a^b g^2 = 0$.
(b) For any real number $t$,
$$0 \le \int_a^b (f - tg)^2 = \int_a^b f^2 - 2t \int_a^b fg + t^2 \int_a^b g^2.$$
(c) Let $t = \left(\int_a^b fg\right)\left(\int_a^b g^2\right)^{-1}$ in (b).

3 This exercise will be used in 5.7.19.

5.3

Evaluation of the Integral

The theorems in this section describe standard methods for evaluating integrals. The first of these expresses the integral of a function $f$ in terms of a primitive or antiderivative, that is, a function whose derivative is $f$. It also shows that the process of integration is the inverse of that of differentiation.


5.3.1 Fundamental Theorem of Calculus. Let $f : [a, b] \to \mathbb{R}$ be continuous.

(a) The function $G(x) := \int_a^x f(t)\,dt$, $x \in [a, b]$, is a primitive of $f$.

(b) For any primitive $F$ of $f$, $\int_a^b f = F(x)\Big|_a^b := F(b) - F(a)$.

(c) If $f' \in \mathcal{R}_a^b$, then $\int_a^b f' = f(b) - f(a)$. In particular, $f(x) = f(a) + \int_a^x f'$.

Proof. (a) We assume that $a \le x < b$ and prove that
$$\lim_{h\to 0^+} \frac{G(x + h) - G(x)}{h} = f(x). \tag{5.6}$$
By 5.2.4 and 5.2.6, if $h > 0$ and $x + h < b$, then
$$\left|\frac{G(x + h) - G(x)}{h} - f(x)\right| = \frac1h \left|\int_x^{x+h} \big[f(t) - f(x)\big]\,dt\right| \le \frac1h \int_x^{x+h} |f(t) - f(x)|\,dt.$$
By continuity of $f$ at $x$, given ε > 0 we may choose δ > 0 such that $|t - x| < \delta$ implies $|f(t) - f(x)| < \varepsilon$. Thus if $h < \delta$, then the term on the right in the above inequality is ≤ ε, proving (5.6). The left-hand limit for $a < x \le b$ is handled in the same way.

(b) Let $F$ be any primitive of $f$. Then $F' = f = G'$, hence $F = G + c$ for some constant $c$. Thus from (a),
$$\int_a^b f = G(b) - G(a) = F(b) - F(a).$$

(c) For any partition $\mathcal{P}$, by the mean value theorem
$$f(x_j) - f(x_{j-1}) = f'(\xi_j)\Delta x_j$$
for some $\xi_j \in [x_{j-1}, x_j]$, $j = 1, \ldots, n$. For this choice of $\xi_j$,
$$S(f',\mathcal{P},\xi) = \sum_{j=1}^n f'(\xi_j)\Delta x_j = \sum_{j=1}^n \big[f(x_j) - f(x_{j-1})\big] = f(b) - f(a).$$
Since we may choose $\mathcal{P}$ so that $S(f',\mathcal{P},\xi)$ is arbitrarily near $\int_a^b f'$, (c) follows.

The general primitive of a continuous function $f$ is denoted by $\int f$ and is called the indefinite integral of $f$. (In this context, $\int_a^b f$ is called a definite integral.) For example, one writes $\int 3x^2\,dx = x^3 + c$, where $c$ is the so-called constant of integration. In general, since primitives of a function differ only by a constant, we write
$$\int f(x)\,dx = F(x) + c,$$
where $F$ is any particular primitive of $f$.
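Parts (a) and (b) of 5.3.1 can be checked numerically. The sketch below is illustrative only: it assumes f = cos on [0, π/2] with primitive F = sin, and it approximates integrals by a midpoint Riemann sum (the helper names `integrate`, `G`, and `right_quotient` are not from the text). The difference quotient from (5.6) should approach f(x), and the integral should agree with F(b) − F(a) up to discretization error.

```python
import math

def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of f over [a, b] (n equal subintervals)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def G(f, a, x):
    """G(x) = integral of f from a to x, as in 5.3.1(a)."""
    return integrate(f, a, x)

def right_quotient(f, a, x, h):
    """Difference quotient (G(x + h) - G(x)) / h appearing in (5.6)."""
    return (G(f, a, x + h) - G(f, a, x)) / h

# (a): the quotient approaches f(x) as h -> 0+ (here f = cos, a = 0, x = 1).
for h in (1e-1, 1e-2, 1e-3):
    print(h, right_quotient(math.cos, 0.0, 1.0, h), math.cos(1.0))

# (b): the integral agrees with F(b) - F(a) for the primitive F = sin.
print(integrate(math.cos, 0.0, math.pi / 2), math.sin(math.pi / 2) - math.sin(0.0))
```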


5.3.2 Change of Variables Theorem. Let $\varphi : [a, b] \to \mathbb{R}$ be continuously differentiable with $\varphi'$ never zero and let $f$ be integrable on $[c, d] := \varphi([a, b])$. Then $(f \circ \varphi)|\varphi'| \in \mathcal{R}_a^b$ and
$$\int_a^b f(\varphi(x))|\varphi'(x)|\,dx = \int_c^d f(y)\,dy. \tag{5.7}$$

Proof. By the intermediate value theorem, we may assume that $\varphi'(x) > 0$ for all $x$, so $\varphi$ is strictly increasing, $c = \varphi(a)$, and $d = \varphi(b)$.

FIGURE 5.6: The partitions $\mathcal{P}^x$ and $\mathcal{P}^y$.

We show first that $f \circ \varphi \in \mathcal{R}_a^b$. For this we use the fact that $\varphi$ induces a one-to-one correspondence between partitions $\mathcal{P}^x = \{x_0, \ldots, x_n\}$ of $[a, b]$ and partitions $\mathcal{P}^y = \{y_0, \ldots, y_n\}$ of $[c, d]$, where $y_j = \varphi(x_j)$ ($x_j = \varphi^{-1}(y_j)$) (see Figure 5.6). Since $\varphi([x_{j-1}, x_j]) = [y_{j-1}, y_j]$,
$$M_j^x(f \circ \varphi) = \sup_{x_{j-1} \le x \le x_j} f(\varphi(x)) = \sup_{y_{j-1} \le y \le y_j} f(y) = M_j^y(f). \tag{5.8}$$
Moreover, by the mean value theorem, there exists $z_j \in [y_{j-1}, y_j]$ such that
$$\Delta x_j = \varphi^{-1}(y_j) - \varphi^{-1}(y_{j-1}) = (\varphi^{-1})'(z_j)\Delta y_j \le C\Delta y_j, \tag{5.9}$$
where $C$ is a bound for $|(\varphi^{-1})'|$ on $[c, d]$. From (5.8) and (5.9),
$$\overline{S}(f \circ \varphi,\mathcal{P}^x) \le C\,\overline{S}(f,\mathcal{P}^y).$$
The same inequality evidently holds for $-f$, hence
$$-\underline{S}(f \circ \varphi,\mathcal{P}^x) \le -C\,\underline{S}(f,\mathcal{P}^y).$$
Adding these inequalities,
$$\overline{S}(f \circ \varphi,\mathcal{P}^x) - \underline{S}(f \circ \varphi,\mathcal{P}^x) \le C\big[\overline{S}(f,\mathcal{P}^y) - \underline{S}(f,\mathcal{P}^y)\big].$$


Since the right side may be made arbitrarily small, $f \circ \varphi \in \mathcal{R}_a^b$, hence also $(f \circ \varphi)\varphi' \in \mathcal{R}_a^b$. To prove (5.7), we argue as in the first part of the proof, but now compare the Riemann sums $S((f \circ \varphi)\varphi',\mathcal{P}^x,\xi)$ and $S(f,\mathcal{P}^y,\zeta)$, where the intermediate points in each case are taken to be left endpoints:
$$\xi := (x_0, \ldots, x_{n-1}), \qquad \zeta := (y_0, \ldots, y_{n-1}) = (\varphi(x_0), \ldots, \varphi(x_{n-1})),$$
so that $\xi_j = x_{j-1}$ and $\zeta_j = \varphi(\xi_j)$. Then
$$S\big((f \circ \varphi)\varphi',\mathcal{P}^x,\xi\big) = \sum_{j=1}^n f(\zeta_j)\varphi'(\xi_j)\Delta x_j$$
and, by the mean value theorem,
$$S(f,\mathcal{P}^y,\zeta) = \sum_{j=1}^n f(\zeta_j)\Delta\varphi(x_j) = \sum_{j=1}^n f(\zeta_j)\varphi'(t_j)\Delta x_j,$$
for some $t_j \in [x_{j-1}, x_j]$. Subtracting these equations and using the triangle inequality, we obtain
$$\Big|S\big((f \circ \varphi)\varphi',\mathcal{P}^x,\xi\big) - S(f,\mathcal{P}^y,\zeta)\Big| \le \sum_{j=1}^n |f(\zeta_j)|\,|\varphi'(\xi_j) - \varphi'(t_j)|\,\Delta x_j \le M \sum_{j=1}^n |\varphi'(\xi_j) - \varphi'(t_j)|\,\Delta x_j,$$
where $M$ is a bound for $|f|$ on $[c, d]$. By the uniform continuity of $\varphi'$ on $[a, b]$, given ε > 0 there exists a δ > 0 such that $|\varphi'(s) - \varphi'(t)| < \varepsilon / \big(M(b - a)\big)$ for all $s, t$ with $|s - t| < \delta$. Hence if $\|\mathcal{P}^x\| < \delta$, then
$$\Big|S\big((f \circ \varphi)\varphi',\mathcal{P}^x,\xi\big) - S(f,\mathcal{P}^y,\zeta)\Big| < \varepsilon.$$
Letting $\|\mathcal{P}^x\| \to 0$ and noting that then also $\|\mathcal{P}^y\| \to 0$ (because $\Delta y_j = \varphi'(c_j)\Delta x_j \le B\|\mathcal{P}^x\|$, where $B$ is a bound for $|\varphi'|$), we see that
$$\left|\int_a^b f(\varphi(x))\varphi'(x)\,dx - \int_c^d f(y)\,dy\right| \le \varepsilon.$$
Since ε was arbitrary, the two integrals are equal, completing the proof.

Remark. Whether $\varphi$ is increasing or decreasing, (5.7) may be written as
$$\int_a^b f\big(\varphi(x)\big)\varphi'(x)\,dx = \int_{\varphi(a)}^{\varphi(b)} f(y)\,dy.$$
This formula has an easy proof if $f$ is continuous. Indeed, in this case $f$ has a primitive $F$ on $[c, d]$, hence, by the chain rule, $F \circ \varphi$ is a primitive for $(f \circ \varphi)\varphi'$. The desired formula now follows from the fundamental theorem of calculus:
$$\int_a^b f\big(\varphi(x)\big)\varphi'(x)\,dx = F\big(\varphi(b)\big) - F\big(\varphi(a)\big) = \int_{\varphi(a)}^{\varphi(b)} f(y)\,dy.$$
Note that in this case it is not necessary to assume that $\varphi' \ne 0$. ♦
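Formula (5.7) can be sanity-checked numerically for a decreasing substitution, where the absolute value matters. The example below is an assumption chosen for illustration: φ(x) = 1/x on [1, 2] (so [c, d] = [1/2, 1]) and f = exp, for which the right side equals e − √e. Integrals are approximated by a midpoint Riemann sum, so agreement is only up to discretization error.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint Riemann sum for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def phi(x):
    """Decreasing change of variables on [1, 2]; its image is [1/2, 1]."""
    return 1.0 / x

def dphi(x):
    """Derivative of phi; negative and never zero on [1, 2]."""
    return -1.0 / (x * x)

# Left side of (5.7): integral of f(phi(x)) |phi'(x)| over [a, b] = [1, 2].
lhs = integrate(lambda x: math.exp(phi(x)) * abs(dphi(x)), 1.0, 2.0)
# Right side of (5.7): integral of f over [c, d] = [1/2, 1].
rhs = integrate(math.exp, 0.5, 1.0)
exact = math.e - math.sqrt(math.e)
print(lhs, rhs, exact)
```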

5.3.3 Integration by Parts Formula. Let $f$ and $g$ be differentiable on $[a, b]$ with $f', g' \in \mathcal{R}_a^b$. Then
$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f'(x)g(x)\,dx. \tag{5.10}$$

Proof. Since $(fg)' = f'g + fg' \in \mathcal{R}_a^b$, 5.3.1(c) implies that
$$f(x)g(x)\Big|_a^b = \int_a^b (fg)' = \int_a^b f'g + \int_a^b fg'.$$

5.3.4 Example. We show that
$$\int_0^{\pi/2} \sin^k x\,dx = \begin{cases} \dfrac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 5\cdot 3}, & k \text{ odd}, \\[2ex] \dfrac{\pi}{2}\,\dfrac{(k-1)(k-3)\cdots 5\cdot 3}{k(k-2)\cdots 4\cdot 2}, & k \text{ even}. \end{cases}$$
Let $I_k = \int_0^{\pi/2} \sin^k x\,dx$. Integrating by parts,
$$I_k = \int_0^{\pi/2} \sin^{k-1} x \sin x\,dx = (k - 1)\int_0^{\pi/2} \sin^{k-2} x \cos^2 x\,dx.$$
Since $\cos^2 x = 1 - \sin^2 x$, $I_k = (k - 1)(I_{k-2} - I_k)$, hence
$$I_k = \frac{k-1}{k}\,I_{k-2}.$$
Iterating, we obtain
$$I_k = \frac{(k-1)(k-3)\cdots(k-2j+1)}{k(k-2)\cdots(k-2j+2)}\,I_{k-2j}.$$
If $k$ is odd, take $j = (k - 1)/2$ so
$$I_k = \frac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 5\cdot 3}\,I_1.$$
If $k$ is even, take $j = (k - 2)/2$ so
$$I_k = \frac{(k-1)(k-3)\cdots 5\cdot 3}{k(k-2)\cdots 6\cdot 4}\,I_2.$$
Since $I_1 = 1$ and $I_2 = \pi/4$, the formula follows. ♦
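The recurrence $I_k = \frac{k-1}{k} I_{k-2}$ in Example 5.3.4 lends itself to exact computation. The sketch below is illustrative: it assumes the additional base case $I_0 = \pi/2$ (the example itself starts from $I_1$ and $I_2$), tracks the rational factor with `fractions.Fraction`, and compares the result with a midpoint Riemann sum. The function names are hypothetical.

```python
from fractions import Fraction
import math

def sine_power_coefficient(k):
    """Return q with I_k = q (k odd) or I_k = q * pi / 2 (k even), for k >= 0.

    Uses the recurrence I_k = ((k - 1) / k) * I_{k-2} with I_0 = pi/2 and I_1 = 1.
    """
    q = Fraction(1)
    while k >= 2:
        q *= Fraction(k - 1, k)
        k -= 2
    return q

def sine_power_integral(k):
    """Floating-point value of the integral of sin^k over [0, pi/2]."""
    q = float(sine_power_coefficient(k))
    return q if k % 2 == 1 else q * math.pi / 2

def midpoint(f, a, b, n=20_000):
    """Midpoint Riemann sum, used only as an independent numerical check."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))

for k in (1, 2, 5, 6):
    numeric = midpoint(lambda x: math.sin(x) ** k, 0.0, math.pi / 2)
    print(k, sine_power_coefficient(k), sine_power_integral(k), numeric)
```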


If $f'$ and $g'$ are continuous, then (5.10) has the following analog for indefinite integrals:
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx. \tag{5.11}$$
Setting $h = g'$ and using the symbols $D$ for differentiation and $I$ for integration, we may write (5.11) as
$$I(fh) = f \cdot Ih - I(Df \cdot Ih).$$
By induction we obtain
$$I(fh) = \sum_{k=1}^n (-1)^{k-1} D^{k-1}f \cdot I^k h + (-1)^n I\big(D^n f \cdot I^n h\big). \tag{5.12}$$
The fundamental theorem of calculus may then be used to calculate $\int_a^b fh$. Formula (5.12) may be expressed in tabular form as shown in Table 5.1. For each $k$, the entries in column $k$ are multiplied and the resulting products are added. The exception is in column $n + 1$, where the product must be integrated before adding. The process terminates if and when $D^n f = 0$.

TABLE 5.1: Table for evaluating $\int fh$ by parts.

k             1      2      3       ···   n             n+1
(−1)^(k−1)    +1     −1     +1      ···   (−1)^(n−1)    (−1)^n
D^(k−1) f     f      Df     D^2 f   ···   D^(n−1) f     D^n f
I^k h         Ih     I^2 h  I^3 h   ···   I^n h         I^n h

5.3.5 Example. Using Table 5.1 with $f(x) = (x + 1)^3$ and $h(x) = e^{5x}$, we have
$$\int (x + 1)^3 e^{5x}\,dx = e^{5x}\left[\frac{(x+1)^3}{5} - \frac{3(x+1)^2}{5^2} + \frac{6(x+1)}{5^3} - \frac{6}{5^4}\right] + c.$$

TABLE 5.2: Table for evaluating $\int (x + 1)^3 e^{5x}\,dx$ by parts.

k             1           2            3            4           5
(−1)^(k−1)    +1          −1           +1           −1          +1
D^(k−1) f     (x+1)^3     3(x+1)^2     6(x+1)       6           0
I^k h         e^(5x)/5    e^(5x)/5^2   e^(5x)/5^3   e^(5x)/5^4  e^(5x)/5^5
♦
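The tabular scheme of Table 5.1 can be sketched in code for the special case f a polynomial and h(x) = e^{ax}, where $I^k h = e^{ax}/a^k$ and the process terminates. This is a hedged illustration, not the book's procedure verbatim: polynomials are represented as coefficient lists `[c0, c1, ...]`, the function names are hypothetical, and exact rational arithmetic is used so that the result can be verified by differentiation, since $(e^{ax}Q)' = e^{ax}(aQ + Q')$ must equal $e^{ax}p$.

```python
from fractions import Fraction

def derivative(p):
    """Derivative of a polynomial given by coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:]

def tabular_poly_exp(p, a):
    """Coefficients of Q, where the integral of p(x) e^{a x} is e^{a x} Q(x) + c.

    Follows (5.12): Q = sum over k of (-1)^(k-1) D^(k-1) p / a^k,
    stopping once the repeated derivative of p vanishes.
    """
    q = [Fraction(0)] * len(p)
    sign, power = 1, Fraction(a)           # current sign and a^k
    d = [Fraction(c) for c in p]           # current derivative D^(k-1) p
    while d:
        for i, c in enumerate(d):
            q[i] += sign * c / power
        d = derivative(d)
        sign, power = -sign, power * a
    return q

def evaluate(p, x):
    """Value of the polynomial with coefficient list p at x."""
    return sum(c * x ** k for k, c in enumerate(p))

# Example 5.3.5: p(x) = (x + 1)^3 = 1 + 3x + 3x^2 + x^3 and a = 5.
p = [1, 3, 3, 1]
Q = tabular_poly_exp(p, 5)
dQ = derivative(Q) + [Fraction(0)]
print(Q)
print([5 * c + d for c, d in zip(Q, dQ)])  # should reproduce the coefficients of p
```

Expanding the bracket in Example 5.3.5 at x = 0 gives 1/5 − 3/25 + 6/125 − 6/625, which should match `evaluate(Q, 0)`.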


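Exercises

1.S ⇓4 Let $f : \mathbb{R} \to \mathbb{R}$ be continuous and periodic with period $p > 0$, that is, $f(x + p) = f(x)$ for all $x$. Prove that
$$\int_0^p f(x + y)\,dx = \int_0^p f(x)\,dx \quad \text{for all } y \in \mathbb{R}.$$

2. Let $f : (a, b) \to \mathbb{R}$ have a uniformly continuous derivative. Prove that $f' \in \mathcal{R}_a^b$ and
$$\int_a^b f' = \lim_{\varepsilon\to 0^+} \big[f(b - \varepsilon) - f(a + \varepsilon)\big].$$

3. Verify the following inequalities:
(a)S $\dfrac{2}{\pi}\big(\sqrt2 - 1\big) \le \displaystyle\int_0^1 \frac{\sin x\,dx}{\sqrt{1 + x^2}} \le \sqrt2 - 1$.
(b) $\dfrac{1}{2^q(p + 1)} \le \displaystyle\int_0^1 \frac{x^p\,dx}{(1 + x^p)^q} \le \dfrac{2^{1-q} - 1}{(p + 1)(1 - q)}$, $\ p, q > 0$, $q \ne 1$.

4. Establish the formula
$$\int_0^1 (1 - x)^m x^n\,dx = \frac{m!}{(n + 1)(n + 2)\cdots(n + m + 1)}.$$

5. Let $n \in \mathbb{N}$. Evaluate
(a)S $\displaystyle\int_0^1 \exp(x^{1/n})\,dx$.
(b) $\displaystyle\int_1^e \ln^n x\,dx$.

6. Let $k \in \mathbb{N}$. Show that
$$\int_0^{\pi/2} \cos^k x\,dx = \begin{cases} \dfrac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 5\cdot 3}, & k \text{ odd}, \\[2ex] \dfrac{\pi}{2}\,\dfrac{(k-1)(k-3)\cdots 5\cdot 3}{k(k-2)\cdots 4\cdot 2}, & k \text{ even}. \end{cases}$$

7.S ⇓5 Let $k \in \mathbb{N}$. Show that
$$\int_0^1 \frac{x^k}{\sqrt{1 - x^2}}\,dx = \begin{cases} \dfrac{(k-1)(k-3)\cdots 4\cdot 2}{k(k-2)\cdots 3\cdot 1} & \text{if } k \text{ is odd}, \\[2ex] \dfrac{\pi}{2}\,\dfrac{(k-1)(k-3)\cdots 3\cdot 1}{k(k-2)\cdots 4\cdot 2} & \text{if } k \text{ is even}. \end{cases}$$
N.B. The integral is improper but converges by Exercise 5.7.7. For the even case, use Exercise 6.

4 This exercise will be used in 13.6.4.
5 This exercise will be used in 13.4.2.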


8. Let $f'$ be continuous and positive on $[a, b]$. Prove that
$$\int_a^b f(x)\,dx + \int_{f(a)}^{f(b)} f^{-1}(y)\,dy = b f(b) - a f(a).$$
Interpret geometrically for $f > 0$ and $a > 0$.

9.S (Young's inequality). Let $f$ be continuous and strictly increasing on $[0, a]$ with $f(0) = 0$. Prove that
$$\int_0^x f + \int_0^y f^{-1} = y f^{-1}(y) + \int_{f^{-1}(y)}^x f.$$
Deduce that
$$\int_0^x f + \int_0^y f^{-1} \ge xy, \quad 0 \le x \le a,\ 0 \le y \le f(a).$$

10. Use Young's inequality to verify the following inequalities:
(a) $\sqrt{1 - y^2} + y \sin^{-1} y \ge xy + \cos x$, $\ 0 \le x \le \pi/2$, $0 \le y \le 1$.
(b)S $x \ln x + e^y \ge xy + x$, $\ 1 \le x \le 2$, $0 \le y \le \ln 2$.

11. Give an example of a discontinuous function that (a) has a primitive, (b) has no primitive.

12. Let $f$ and $g$ be continuously differentiable with $g > 0$. Prove that
$$\int \frac{f(x)g'(x)}{g^2(x)}\,dx = \int \frac{f'(x)}{g(x)}\,dx - \frac{f(x)}{g(x)}.$$

13.S Let $f' \in \mathcal{R}_a^b$. Prove that
$$\lim_n \int_a^b f(x)\sin(nx)\,dx = 0.$$

14. Let $f$ be continuous on $[0, +\infty)$ such that $\lim_{x\to+\infty} f(x)$ exists in $\mathbb{R}$ and let $a > 0$. Find
$$\lim_{n\to+\infty} \int_0^a f(nx)\,dx.$$

15. Let $h'$ be continuous and positive on $[a, b]$ and let $g'$ be continuous on $[c, d] = [h(a), h(b)]$. Prove that
$$\int_a^b g\big(h(x)\big)\,dx = g(d)b - g(c)a - \int_c^d g'(t)h^{-1}(t)\,dt.$$


16. Let $f \in \mathcal{R}_{-a}^a$, $a > 0$. Show that
$$\int_{-a}^a f = \begin{cases} 0 & \text{if } f \text{ is an odd function}, \\ 2\int_0^a f & \text{if } f \text{ is an even function}. \end{cases}$$

17.S Let $f : [a, b] \to \mathbb{R}$ be continuous and let $u, v$ be differentiable functions with range contained in $[a, b]$. Prove that
$$\frac{d}{dx} \int_{u(x)}^{v(x)} f = f\big(v(x)\big)v'(x) - f\big(u(x)\big)u'(x).$$

18. Let functions $a, b, c, d : [0, 1] \to [0, 1]$ have continuous derivatives and let $f : [0, 1] \to \mathbb{R}$ be continuous. Suppose that
$$\int_{a(x)}^{b(x)} f = \int_{c(x)}^{d(x)} f \quad \text{for all } x \in [0, 1].$$
Prove that
$$\int_{b(0)}^{b(1)} f + \int_{c(0)}^{c(1)} f = \int_{a(0)}^{a(1)} f + \int_{d(0)}^{d(1)} f.$$

19.S Let $f$ be continuous and $g$ differentiable with bounded derivative on $[a, b]$. Evaluate
$$\lim_{x\to a} \frac{g(x)}{x - a} \int_a^x f.$$

20. Let $p > 0$, $q > 1$, and $m, k \in \mathbb{N}$ with $m > k$. Evaluate $\lim_{n\to+\infty} s_n$ if $s_n =$

(a) S

n X k q−1 . (c) q n + kq

n X kp . (b) np+1

k=1

k=1

n X k=1

(mn)! nkn [(m − k)n]!

1/n .

21.S Let $|f'| \le M$ on $[a, b]$. For $n \in \mathbb{N}$ set $h = (b - a)/n$ and $x_k = a + kh$, $k = 0, 1, \ldots, n - 1$. Prove that
$$\left|\int_a^b f - h \sum_{k=1}^n f(x_{k-1})\right| \le hM(b - a).$$

22. Let $f$ be continuous on $[0, 1]$. Prove that
$$\int_0^1 \int_0^x f(t)\,dt\,dx = \int_0^1 (1 - x)f(x)\,dx.$$

23. Let $f, g : [0, 1] \to \mathbb{R}$ be continuously differentiable, $f$ monotone, and $g(x) > g(0) = g(1)$ on $(0, 1)$. Prove that $\int_0^1 f g' = 0$ iff $f$ is constant.

*5.4

Stirling’s Formula

Stirling's formula gives an estimate for $n!$ when $n$ is large. The proof relies on material from Section 4.3. We begin with the following lemma, which provides the fundamental inequality needed to establish the formula.

5.4.1 Lemma. If $f$ is concave and differentiable on $(a, b)$, then
$$\frac{f(u) + f(v)}{2} \le \frac{1}{v - u} \int_u^v f(t)\,dt \le f\left(\frac{u + v}{2}\right), \quad a < u < v < b.$$

Proof. By the concave versions of 4.3.6 and (4.3),
$$f(u)\,\frac{v - t}{v - u} + f(v)\,\frac{t - u}{v - u} \le f(t) \le f'(x)(t - x) + f(x)$$
for all $a < u < v < b$ and all $x, t \in [u, v]$. Integrating with respect to $t$,
$$\big[f(u) + f(v)\big]\frac{v - u}{2} \le \int_u^v f(t)\,dt \le f'(x)\,\frac{(v - x)^2 - (x - u)^2}{2} + f(x)(v - u).$$
Taking $x = (u + v)/2$ and dividing by $v - u$ produces the desired inequalities.

5.4.2 Stirling's Inequalities. For all $n$,
$$e^{7/8} \le \frac{e^n\,n!}{n^n \sqrt{n}} \le e, \tag{5.13}$$
where the middle term is decreasing in $n$.

Proof. Taking $f(x) = \ln x$, $u = k \in \mathbb{N}$, and $v = k + 1$ in the lemma, we have
$$\tfrac12 \ln(k^2 + k) = \tfrac12\big[\ln(k) + \ln(k + 1)\big] \le \int_k^{k+1} \ln(t)\,dt \le \ln\big(k + \tfrac12\big).$$
Rearranging,
$$0 \le \int_k^{k+1} \ln(t)\,dt - \tfrac12 \ln(k^2 + k) \le \ln\big(k + \tfrac12\big) - \tfrac12 \ln(k^2 + k). \tag{5.14}$$
Now observe that
$$\sum_{k=1}^{n-1} \int_k^{k+1} \ln t\,dt = \int_1^n \ln t\,dt = n \ln n - n + 1,$$
$$\sum_{k=1}^{n-1} \ln(k^2 + k) = \sum_{k=1}^{n-1} \big[\ln(k + 1) + \ln k\big] = 2\sum_{k=2}^{n} \ln k - \ln n = 2 \ln n! - \ln n, \quad \text{and}$$
$$\ln\big(k + \tfrac12\big) - \tfrac12 \ln(k^2 + k) = \tfrac12 \ln\left(1 + \frac{1}{4(k^2 + k)}\right) \le \frac{1}{8(k^2 + k)},$$
where, for the last inequality, we used the fact that $\ln(1 + x) < x$ for $x > 0$, which follows directly from the integral definition of $\ln(x + 1)$. Summing in (5.14) and using the above inequalities, we obtain
$$0 \le \big(n + \tfrac12\big) \ln n - n + 1 - \ln n! \le \sum_{k=1}^{n-1} \frac{1}{8(k^2 + k)} = \frac18 \sum_{k=1}^{n-1} \left(\frac1k - \frac{1}{k + 1}\right) \le \frac18.$$
Note that the term $\big(n + \tfrac12\big) \ln n - n + 1 - \ln n!$ is increasing in $n$ since it was obtained as a sum of nonnegative terms in (5.14). Rearranging, we have
$$\frac78 \le -\big(n + \tfrac12\big) \ln n + n + \ln n! \le 1,$$
where the middle term is decreasing in $n$. Exponentiating yields the desired inequalities.

5.4.3 Stirling's Formula.
$$\lim_n \frac{e^n\,n!}{n^n \sqrt{n}} = \sqrt{2\pi}.$$

Proof. By 5.4.2, the limit $L$ in the formula exists in $\mathbb{R}$. Set $I_n = \int_0^{\pi/2} \sin^n x\,dx$. By 5.3.4,
$$I_{2n+1} = \frac{(2n)(2n - 2)\cdots 4\cdot 2}{(2n + 1)(2n - 1)\cdots 5\cdot 3} \quad \text{and} \quad I_{2n} = \frac{\pi}{2}\,\frac{(2n - 1)(2n - 3)\cdots 5\cdot 3}{2n(2n - 2)\cdots 4\cdot 2}.$$
For $x \in [0, \pi/2]$ and $n \ge m$, $\sin^n x \le \sin^m x$, hence
$$\frac{I_{2n+2}}{I_{2n}} \le \frac{I_{2n+1}}{I_{2n}} \le \frac{I_{2n}}{I_{2n}} = 1.$$
It follows that
$$\frac{2n + 1}{2n + 2}\,\frac{\pi}{2} \le \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n - 2)^2 \cdot (2n)^2}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2n - 1)^2 (2n + 1)} \le \frac{\pi}{2},$$
from which we obtain Wallis's product
$$\lim_n \frac{2^2 \cdot 4^2 \cdot 6^2 \cdots (2n - 2)^2 \cdot (2n)^2}{1^2 \cdot 3^2 \cdot 5^2 \cdots (2n - 1)^2 (2n + 1)} = \frac{\pi}{2}.$$
Denote the general term in Wallis's product by $\alpha_n$. Since
$$2 \cdot 4 \cdots (2n - 2) \cdot (2n) = 2^n n! \quad \text{and} \quad 3 \cdot 5 \cdots (2n - 1) = \frac{(2n)!}{2^n n!},$$
we see that
$$\sqrt{\alpha_n} = \frac{2^{2n}(n!)^2}{(2n)!\,\sqrt{2n + 1}}.$$
Now set $\beta_n = \dfrac{e^n\,n!}{n^n \sqrt{n}}$ and note that
$$\frac{\beta_n^2}{\beta_{2n}} = \frac{e^{2n}(n!)^2}{n^{2n+1}} \cdot \frac{(2n)^{2n}\sqrt{2n}}{e^{2n}(2n)!} = \frac{(n!)^2\,2^{2n}\sqrt2}{(2n)!\,\sqrt{n}}.$$
Dividing by $\sqrt{\alpha_n}$,
$$\frac{\beta_n^2}{\beta_{2n}\sqrt{\alpha_n}} = \frac{(n!)^2\,2^{2n}\sqrt2}{(2n)!\,\sqrt{n}} \cdot \frac{(2n)!\,\sqrt{2n + 1}}{2^{2n}(n!)^2} = \sqrt2\,\sqrt{2 + 1/n} \to 2.$$
Since $\sqrt{\alpha_n} \to \sqrt{\pi/2}$ and $\beta_n \to L$, we also have
$$\lim_n \frac{\beta_n^2}{\beta_{2n}\sqrt{\alpha_n}} = L\sqrt{\frac{2}{\pi}}.$$
Therefore, $L\sqrt{2/\pi} = 2$, hence $L = \sqrt{2\pi}$.
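The quantities in 5.4.2 and 5.4.3 are easy to tabulate. The sketch below is illustrative: it computes $\beta_n = e^n n!/(n^n\sqrt{n})$ through logarithms (using `math.lgamma(n + 1)` for $\ln n!$, an implementation choice made here to avoid overflow) and the partial Wallis products $\alpha_n$ as a running product. The function names `beta` and `wallis` are hypothetical.

```python
import math

def beta(n):
    """beta_n = e^n n! / (n^n sqrt(n)), computed via logarithms to avoid overflow."""
    return math.exp(n + math.lgamma(n + 1) - (n + 0.5) * math.log(n))

def wallis(n):
    """alpha_n = (2*4*...*2n)^2 / ((1*3*...*(2n-1))^2 * (2n+1)) as a running product."""
    prod = 1.0
    for k in range(1, n + 1):
        prod *= (2 * k) ** 2 / ((2 * k - 1) * (2 * k + 1))
    return prod

# beta_n decreases from e toward sqrt(2*pi) and stays above e^(7/8).
for n in (1, 2, 10, 100, 10_000):
    print(n, beta(n))
print("e^(7/8) =", math.exp(7 / 8), " sqrt(2*pi) =", math.sqrt(2 * math.pi))

# alpha_n approaches pi/2.
print(wallis(1000), math.pi / 2)
```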

5.5

Integral Mean Value Theorems

The following theorem asserts that the average value of a continuous function over an interval $[a, b]$ is actually assumed by the function at some intermediate point $c$.

5.5.1 First Mean Value Theorem for Integrals. If $f$ is continuous on $[a, b]$, then there exists $c \in (a, b)$ such that
$$\frac{1}{b - a} \int_a^b f = f(c).$$

Proof. Apply the mean value theorem for derivatives and the fundamental theorem of calculus to the function $G(x) := \int_a^x f(t)\,dt$.

The next theorem is a weighted average generalization of 5.5.1.

5.5.2 Weighted Mean Value Theorem for Integrals. Let $f$ be continuous on $[a, b]$ and let $g \in \mathcal{R}_a^b$. If $g$ does not change sign in $[a, b]$, then there exists $c \in [a, b]$ such that
$$\int_a^b fg = f(c) \int_a^b g. \tag{5.15}$$

Proof. We may assume that $g \ge 0$ on $[a, b]$, so $\int_a^b g \ge 0$. Suppose first that $\int_a^b g = 0$. If $C$ is an upper bound for $|f|$ on $[a, b]$, then
$$\left|\int_a^b fg\right| \le \int_a^b |f|g \le C \int_a^b g = 0,$$
hence both sides of (5.15) are zero. Now assume that $\int_a^b g > 0$. Let $m = f(x_m)$ and $M = f(x_M)$ denote the minimum and maximum values of $f$ on $[a, b]$. Since $mg \le fg \le Mg$,
$$m \int_a^b g \le \int_a^b fg \le M \int_a^b g,$$
hence
$$m \le \frac{\int_a^b fg}{\int_a^b g} \le M.$$
An application of the intermediate value theorem completes the proof.

5.5.3 Second Mean Value Theorem for Integrals. Let $f$ be continuous and $g$ differentiable and monotone on $[a, b]$ with $g' \in \mathcal{R}_a^b$. Then there exists $c \in [a, b]$ such that
$$\int_a^b fg = g(a) \int_a^c f + g(b) \int_c^b f.$$

Proof. Let $F(x) = \int_a^x f$. Integrating by parts,
$$\int_a^b fg = \int_a^b F'g = F(b)g(b) - \int_a^b g'F.$$
Since $g$ is monotone, the sign of $g'$ does not change, hence, by 5.5.2, there exists $c \in [a, b]$ such that
$$\int_a^b g'F = F(c) \int_a^b g' = F(c)\big[g(b) - g(a)\big].$$
Therefore,
$$\int_a^b fg = F(b)g(b) - F(c)\big[g(b) - g(a)\big] = g(a)F(c) + g(b)\big[F(b) - F(c)\big],$$
which is the assertion of the theorem.

Remarks. (a) Because derivatives have the intermediate value property (Exercise 4.2.25), the monotonicity requirement on $g$ will be satisfied if $g' \ne 0$ on $[a, b]$.
(b) The second mean value theorem for integrals holds under the less restrictive hypotheses that $f$ is integrable and $g$ is monotone. A proof may be found in [3]. ♦
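The point $c$ in 5.5.1 can be located numerically for a concrete function. The following sketch is only an illustration under stated assumptions: the average is approximated by a midpoint Riemann sum, a sign change of $f - \text{average}$ is found on a sample grid, and bisection refines it. The names `integrate` and `mean_value_point` are hypothetical, and the returned point is one of possibly several valid choices of $c$.

```python
import math

def integrate(f, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def mean_value_point(f, a, b, samples=1000):
    """Find c in [a, b] with f(c) equal to the (approximate) average of f on [a, b]."""
    avg = integrate(f, a, b) / (b - a)
    xs = [a + (b - a) * k / samples for k in range(samples + 1)]
    for lo, hi in zip(xs, xs[1:]):
        g_lo, g_hi = f(lo) - avg, f(hi) - avg
        if g_lo == 0.0:
            return lo
        if g_hi == 0.0:
            return hi
        if g_lo * g_hi < 0.0:              # sign change: bisect on [lo, hi]
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if (f(lo) - avg) * (f(mid) - avg) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
    raise ValueError("no sign change found; try more samples")

# f(x) = x^2 on [0, 3] has average 3, attained at c = sqrt(3).
c = mean_value_point(lambda x: x * x, 0.0, 3.0)
print(c, math.sqrt(3.0))
```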


Exercises

1. Let $0 \le a < b$ and let $f$ be continuous on $[\sqrt{a}, \sqrt{b}]$. Prove that there exists $c \in [a, b]$ such that
$$\frac12 \int_a^b f(\sqrt{x})\,dx = \sqrt{a} \int_{\sqrt{a}}^{\sqrt{c}} f(x)\,dx + \sqrt{b} \int_{\sqrt{c}}^{\sqrt{b}} f(x)\,dx.$$

2. Let $0 < a < b$ and let $f$ be continuous on $[b^{-1}, a^{-1}]$. Prove that there exists $c \in [a, b]$ such that
$$\int_a^b f(1/x)\,dx = b^2 \int_{1/b}^{1/c} f(x)\,dx + a^2 \int_{1/c}^{1/a} f(x)\,dx.$$

3.S Let $f$ be continuous on $[0, 1]$. Prove that there exists $c \in [1/2, \sqrt3/2]$ such that
$$\int_{\pi/6}^{\pi/3} f(\sin x)\,dx = \frac{2}{\sqrt3} \int_{1/2}^{c} f(x)\,dx + 2 \int_{c}^{\sqrt3/2} f(x)\,dx.$$

4. Let $f$ be continuous on $[0, 1]$. Prove that there exists $c \in [0, 1]$ such that
$$\int_0^{\pi/4} f(\tan x)\,dx = \int_0^c f(x)\,dx + \frac12 \int_c^1 f(x)\,dx.$$

5. Let $f$ and $g$ be continuous on $[a, b]$. Show that there exists $c \in (a, b)$ such that
$$g(c) \int_a^b f = f(c) \int_a^b g.$$

6. Prove: If $f$ is continuous, $g \in \mathcal{R}_a^b$, and $m$ is a lower bound for $g$, then there exist $c, d \in [a, b]$ such that
$$\int_a^b fg = f(c) \int_a^b g + m(b - a)\big[f(d) - f(c)\big].$$

7.S Prove the following variant of the second mean value theorem for integrals: Let $f, g \in \mathcal{R}_a^b$ with $g \ge 0$. If $m \le f \le M$ on $[a, b]$, then there exists $c \in [a, b]$ such that
$$\int_a^b fg = m \int_a^c g + M \int_c^b g.$$
Hint. Consider $G(x) := m \int_a^x g + M \int_x^b g$.


8. Let $g$ have a nonnegative integrable derivative on $[0, 1]$ with $g(0) = 0$ and $g(1) = 1$. Show that there exists $c \in [0, 1]$ such that
$$\int_0^1 x^n g(x)\,dx = \frac{1 - c^{n+1}}{n + 1}.$$

9.S Let $g$ have a nonnegative integrable derivative on $[0, \pi]$ with $g(0) = 0$ and $g(\pi) = 1$. Show that there exists $c \in [0, \pi]$ such that
$$\int_0^\pi g(x) \sin x\,dx = \cos c + 1.$$

10. Let $g$ be twice differentiable on $[a, b]$ with $g'' < 0$ and $g'' \in \mathcal{R}_a^b$, and let $f$ be continuous on $g([a, b])$. Show that if $g' \ge m > 0$ and $|f| \le M$, then
$$\left|\int_a^b f' \circ g\right| \le \frac{2M}{m}.$$
Hint. Use the second mean value theorem for integrals.

*5.6

Estimation of the Integral

Integrals that cannot be evaluated exactly may be approximated by various numerical methods. Of course, an integral may always be approximated by a Riemann sum; however, unless the intermediate points of the subintervals are chosen judiciously, a Riemann sum usually offers only a coarse approximation of the integral. In this section we discuss three techniques, the trapezoidal rule, the midpoint rule, and Simpson's rule, that yield good numerical estimates of an integral. The approximation techniques are given in order of increasing precision. For each of these, we use partitions of the form
$$x_k = a + k h_n,\ k = 0, 1, \ldots, n, \quad \text{where } h_n := \frac{b - a}{n}. \tag{5.16}$$
The integral $\int_a^b f$ is then estimated by replacing $f$ on the interval $[x_k, x_{k+1}]$ by a simpler function $f_k$. The approximation is therefore
$$\int_a^b f(x)\,dx \approx \sum_{k=0}^{n-1} \int_{x_k}^{x_{k+1}} f_k(x)\,dx.$$
The error in the approximation is simply the difference between the left and right sides. The main goal in the approximation schemes described below is to obtain, for a given class of functions, the sharpest upper bound for the magnitude of the error. The reader may wish to compare the error bounds in the three approximation techniques described below with the error bound for the approximation given by the Riemann sum
$$R_n = \frac{b - a}{n}\big[f(x_0) + f(x_1) + \cdots + f(x_{n-1})\big]. \tag{5.17}$$
By Exercise 5.3.21, for functions $f$ with a bounded derivative one has in general only the first order error bound
$$\left|\int_a^b f - R_n\right| \le h_n (b - a)\|f'\|_\infty,$$
implying that a good estimate requires a large $n$. Here, for a bounded function $g$ on $[a, b]$,
$$\|g\|_\infty := \sup\{|g(x)| : a \le x \le b\}.$$

Trapezoidal Rule

Let
$$P_k := (x_k, f(x_k)) = (x_k, y_k),\ k = 0, 1, \ldots, n, \tag{5.18}$$
where the points $x_k$ are given in (5.16). The trapezoidal rule uses the line segment from $P_k$ to $P_{k+1}$ to approximate $f$ on $[x_k, x_{k+1}]$, $k = 0, 1, \ldots, n - 1$. Thus the approximating function $f_k$ is given by
$$f_k(x) = y_k + m_k(x - x_k),\ x_k \le x \le x_{k+1}, \quad m_k := \frac{y_{k+1} - y_k}{x_{k+1} - x_k}.$$
A simple calculation shows that
$$\int_{x_k}^{x_{k+1}} f_k = \frac{h_n}{2}(y_{k+1} + y_k).$$
The sum
$$T_n := \sum_{k=0}^{n-1} \int_{x_k}^{x_{k+1}} f_k = \frac{h_n}{2}\big[y_0 + 2y_1 + \cdots + 2y_{n-1} + y_n\big]$$
is then used to approximate $\int_a^b f$. If $f > 0$, $T_n$ may be realized as the sum of areas of trapezoids. (See Figure 5.7.)

5.6.1 Trapezoidal Rule. If $f \in \mathcal{R}_a^b$, then $\lim_n T_n = \int_a^b f$. Moreover, if $f''$ exists and is continuous on $[a, b]$, then the following error estimate holds:
$$\left|\int_a^b f - T_n\right| \le \frac{h_n^2}{12}(b - a)\|f''\|_\infty.$$


FIGURE 5.7: Trapezoidal rule approximation.

Proof. For the Riemann sum $R_n$ in (5.17),
$$R_n - T_n = \frac{b - a}{2n}\big[f(x_0) - f(x_n)\big] = \frac{b - a}{2n}\big[f(a) - f(b)\big] \to 0,$$
hence $T_n = (T_n - R_n) + R_n \to \int_a^b f$. To obtain the error estimate, consider the function
$$g_k(x) := \frac{f(x) - f_k(x)}{(x - x_k)(x - x_{k+1})} = \frac{f(x) - y_k - m_k(x - x_k)}{(x - x_k)(x - x_{k+1})},$$
which has singularities at $x_k$ and $x_{k+1}$. Since both the numerator and the denominator vanish at these points, the singularities may be removed using l'Hospital's rule. Therefore, $g_k(x)$ has a continuous extension to $[x_k, x_{k+1}]$. Since $(x - x_k)(x - x_{k+1})$ does not change sign on $[x_k, x_{k+1}]$, by the weighted mean value theorem for integrals (5.5.2) there exists a point $z_k \in [x_k, x_{k+1}]$ such that
$$\int_{x_k}^{x_{k+1}} \big[f(x) - f_k(x)\big]\,dx = \int_{x_k}^{x_{k+1}} g_k(x)(x - x_k)(x - x_{k+1})\,dx = g_k(z_k) \int_{x_k}^{x_{k+1}} (x - x_k)(x - x_{k+1})\,dx = -g_k(z_k)\,\frac{h_n^3}{6}.$$
It follows that
$$\int_a^b f(t)\,dt - T_n = \sum_{k=0}^{n-1} \int_{x_k}^{x_{k+1}} \big[f(x) - f_k(x)\big]\,dx = -\frac{h_n^3}{6} \sum_{k=0}^{n-1} g_k(z_k). \tag{5.19}$$
Now fix $x \in (x_k, x_{k+1})$ and define $\psi(z)$ on $[x_k, x_{k+1}]$ by
$$\psi(z) = f(z) - f_k(z) - g_k(x)(z - x_k)(z - x_{k+1}).$$
Since $\psi$ has distinct zeros $x$, $x_k$, and $x_{k+1}$, Rolle's theorem applied twice shows that $\psi''$ has a zero $v_k \in (x_k, x_{k+1})$. It follows that $f''(v_k) = 2g_k(x)$. Since $x$ was arbitrary, $|g_k(x)| \le \tfrac12\|f''\|_\infty$ for all $x \in [x_k, x_{k+1}]$. From this and (5.19) we see that
$$\left|\int_a^b f(t)\,dt - T_n\right| \le \frac{n h_n^3}{12}\|f''\|_\infty = \frac{h_n^2}{12}(b - a)\|f''\|_\infty.$$
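The trapezoidal sum $T_n$ and its error estimate can be tried on a concrete integrand. The sketch below is illustrative and assumes f = exp on [0, 1], for which $\|f'\|_\infty = \|f''\|_\infty = e$ and the exact integral is $e - 1$; the function names are hypothetical. It also evaluates the left Riemann sum $R_n$ of (5.17) so the first-order and second-order bounds can be compared.

```python
import math

def trapezoid(f, a, b, n):
    """T_n = (h/2) [y_0 + 2 y_1 + ... + 2 y_{n-1} + y_n] with h = (b - a) / n."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

def left_riemann(f, a, b, n):
    """R_n from (5.17): left-endpoint Riemann sum on the uniform partition."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

exact = math.e - 1.0
for n in (4, 16, 64):
    h = 1.0 / n
    err_t = abs(trapezoid(math.exp, 0.0, 1.0, n) - exact)
    err_r = abs(left_riemann(math.exp, 0.0, 1.0, n) - exact)
    # Compare each error with its bound: h^2/12 * ||f''|| and h * ||f'||.
    print(n, err_t, h * h / 12 * math.e, err_r, h * math.e)
```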

Midpoint Rule

Let
$$\bar{x}_k := \frac{x_k + x_{k+1}}{2} = a + \big(k + \tfrac12\big)h_n,\ k = 0, 1, \ldots, n - 1,$$
where the points $x_k$ are given in (5.16). The midpoint rule uses the constant function
$$f_k(x) = f(\bar{x}_k),\ x_k \le x \le x_{k+1},$$
to approximate $f$ on $[x_k, x_{k+1}]$. This amounts to approximating $\int_a^b f$ by Riemann sums $M_n$, where the intermediate points are the midpoints of the intervals:
$$M_n = \frac{b - a}{n}\big[f(\bar{x}_0) + f(\bar{x}_1) + \cdots + f(\bar{x}_{n-1})\big].$$

FIGURE 5.8: Midpoint rule approximation.

5.6.2 Midpoint Rule. If $f''$ exists and is continuous on $[a, b]$, then the following error estimate holds:
$$\left|\int_a^b f - M_n\right| \le \frac{h_n^2}{24}(b - a)\|f''\|_\infty.$$


Proof. The function
$$g_k(x) = \frac{f(x) - f(\bar{x}_k) - f'(\bar{x}_k)(x - \bar{x}_k)}{(x - \bar{x}_k)^2}$$
has a double singularity at $\bar{x}_k$, which may be removed by applying l'Hospital's rule twice and defining $g_k(\bar{x}_k)$ to be the resulting limit. Since
$$f(x) - f(\bar{x}_k) - f'(\bar{x}_k)(x - \bar{x}_k) = g_k(x)(x - \bar{x}_k)^2 \quad \text{and} \quad \int_{x_k}^{x_{k+1}} (x - \bar{x}_k)\,dx = 0,$$
we see that
$$\int_{x_k}^{x_{k+1}} \big[f(x) - f(\bar{x}_k)\big]\,dx = \int_{x_k}^{x_{k+1}} g_k(x)(x - \bar{x}_k)^2\,dx.$$
Since $(x - \bar{x}_k)^2$ has constant sign on $[x_k, x_{k+1}]$, the weighted mean value theorem for integrals implies that the integral on the right equals
$$g_k(z_k) \int_{x_k}^{x_{k+1}} (x - \bar{x}_k)^2\,dx = g_k(z_k)\,\frac{h_n^3}{12}$$
for some point $z_k \in [x_k, x_{k+1}]$. Therefore,
$$\int_{x_k}^{x_{k+1}} \big[f(x) - f(\bar{x}_k)\big]\,dx = g_k(z_k)\,\frac{h_n^3}{12}. \tag{5.20}$$
Now fix $x \in [x_k, \bar{x}_k) \cup (\bar{x}_k, x_{k+1}]$. By Taylor's theorem, there exists a point $\xi_k$ between $x$ and $\bar{x}_k$ such that
$$f(x) = f(\bar{x}_k) + f'(\bar{x}_k)(x - \bar{x}_k) + \frac{f''(\xi_k)}{2}(x - \bar{x}_k)^2.$$
Solving for $f''(\xi_k)$ we see that $f''(\xi_k) = 2g_k(x)$. Therefore, $|g_k(x)| \le \|f''\|_\infty / 2$ for all $x \in [x_k, x_{k+1}]$, hence from (5.20),
$$-\frac{h_n^3}{24}\|f''\|_\infty \le \int_{x_k}^{x_{k+1}} \big[f(x) - f(\bar{x}_k)\big]\,dx \le \frac{h_n^3}{24}\|f''\|_\infty.$$
Summing, we obtain
$$-\frac{n h_n^3}{24}\|f''\|_\infty \le \int_a^b f(x)\,dx - M_n \le \frac{n h_n^3}{24}\|f''\|_\infty,$$
which is the assertion of the theorem.

Note that the estimates in both the trapezoidal rule and the midpoint rule are exact for all linear functions $f$, since then $f'' = 0$.
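The midpoint sum $M_n$ can be checked against its error estimate in the same spirit. The sketch below is illustrative and again assumes f = exp on [0, 1] (so $\|f''\|_\infty = e$ and the exact integral is $e - 1$); the function name `midpoint_rule` is hypothetical. It also confirms, for one linear function chosen as an example, that the rule is exact up to rounding.

```python
import math

def midpoint_rule(f, a, b, n):
    """M_n = h [f(xbar_0) + ... + f(xbar_{n-1})], xbar_k = a + (k + 1/2) h, h = (b - a) / n."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

exact = math.e - 1.0
for n in (4, 16, 64):
    h = 1.0 / n
    err = abs(midpoint_rule(math.exp, 0.0, 1.0, n) - exact)
    print(n, err, h * h / 24 * math.e)   # error versus the bound h^2/24 * ||f''||

# Exactness for a linear function: the integral of 3x + 1 over [0, 2] is 8.
print(midpoint_rule(lambda x: 3 * x + 1, 0.0, 2.0, 5))
```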


Simpson's Rule

Simpson's rule assumes $n = 2m$ in (5.16) and uses a parabola through each triple of points
$$(P_{k-1}, P_k, P_{k+1}),\ k = 2j + 1,\ j = 0, \ldots, m - 1, \quad P_k := (x_k, f(x_k)) = (x_k, y_k),$$
to approximate $f$.

FIGURE 5.9: Simpson's rule approximation.

To obtain the rule, observe that any polynomial $p(x)$ of degree ≤ 2 may be written in the form
$$p(x) = b_k(x - x_{k-1})(x - x_k) + c_k(x - x_{k-1}) + d_k, \tag{5.21}$$
where
$$b_k = \frac{p(x_{k+1}) - 2p(x_k) + p(x_{k-1})}{2h_n^2}, \quad c_k = \frac{p(x_k) - p(x_{k-1})}{h_n}, \quad \text{and} \quad d_k = p(x_{k-1}).$$
It follows that the unique polynomial $p_k$ of degree ≤ 2 that passes through the points $P_{k-1}$, $P_k$, and $P_{k+1}$ is obtained by choosing
$$b_k = b_k(f) := \frac{f(x_{k+1}) - 2f(x_k) + f(x_{k-1})}{2h_n^2}, \quad c_k = c_k(f) := \frac{f(x_k) - f(x_{k-1})}{h_n}, \quad \text{and} \quad d_k = d_k(f) := f(x_{k-1}). \tag{5.22}$$
With this choice, one readily calculates
$$S_{n,k} := \int_{x_{k-1}}^{x_{k+1}} p_k(x)\,dx = \frac{h_n}{3}\big[y_{k-1} + y_{k+1} + 4y_k\big],\ k = 2j + 1,\ j = 0, \ldots, m - 1,$$
which is taken as an approximation of $\int_{x_{k-1}}^{x_{k+1}} f$. Note that the approximation is exact for all polynomials $f$ of degree ≤ 2, since such a polynomial may be written in the form (5.21). Summing this result, we see that the integral of the approximating function on $[a, b]$ is
$$S_n := \frac{b - a}{3n}\big[y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + \cdots + 2y_{n-2} + 4y_{n-1} + y_n\big].$$

5.6.3 Simpson's Rule. If $f \in \mathcal{R}_a^b$, then $\lim_n S_n = \int_a^b f$. Moreover, if $f^{(4)}$ exists and is continuous on $[a, b]$, then the following error estimate holds:
$$\left|\int_a^b f - S_n\right| \le \frac{h_n^4 (b - a)\|f^{(4)}\|_\infty}{180}.$$

Proof. Set
$$R_n' := \bigl[y_0 + y_2 + \cdots + y_{n-2}\bigr](2h_n) \quad\text{and}\quad R_n'' := \bigl[y_1 + y_3 + \cdots + y_{n-1}\bigr](2h_n).$$
These are Riemann sums for $f$ on $[a, b]$ and
$$6S_n = 2R_n' + 4R_n'' + \bigl[f(b) - f(a)\bigr](2h_n).$$
Since $h_n \to 0$, it follows that $S_n \to \int_a^b f$.

To obtain the error estimate, let $f^{(4)}$ be continuous on $[a, b]$ and denote the errors by
$$E_{n,k} = \int_{x_{k-1}}^{x_{k+1}} f(x)\,dx - S_{n,k} \quad\text{and}\quad E_n = \sum_{j=0}^{m-1} E_{n,2j+1} = \int_a^b f(x)\,dx - S_n.$$
We show that there exists a point $\xi_k \in [x_{k-1}, x_{k+1}]$ such that
$$E_{n,k} = -\frac{h_n^5 f^{(4)}(\xi_k)}{90}. \tag{5.23}$$
Since $2mh_n = b - a$, it will follow that
$$|E_n| \le \frac{m h_n^5 \|f^{(4)}\|_\infty}{90} = \frac{h_n^4 (b-a)\|f^{(4)}\|_\infty}{180},$$
proving the theorem.

To verify (5.23), fix $k$ and choose a point $x_k^* \in (x_{k-1}, x_k) \cup (x_k, x_{k+1})$. For any function $g$, define a function $Lg$ on $[x_{k-1}, x_{k+1}]$ by
$$(Lg)(x) = a_k(g)(x - x_{k-1})(x - x_k)(x - x_{k+1}) + b_k(g)(x - x_{k-1})(x - x_k) + c_k(g)(x - x_{k-1}) + d_k(g),$$
where $b_k(g)$, $c_k(g)$, and $d_k(g)$ are defined as in (5.22) and $a_k(g)$ is chosen so that $(Lg)(x_k^*) = g(x_k^*)$. Then $Lg$ is the unique polynomial of degree $\le 3$ passing through the four points
$$\bigl(x_{k-1}, g(x_{k-1})\bigr), \quad \bigl(x_k, g(x_k)\bigr), \quad \bigl(x_{k+1}, g(x_{k+1})\bigr), \quad\text{and}\quad \bigl(x_k^*, g(x_k^*)\bigr).$$

Note that the coefficients in the definition of $L$ are linear functions of $g$, hence $L$ itself is a linear function. Furthermore, $Lg = g$ for all polynomials $g$ of degree $\le 3$. Since
$$(Lf)(x) = a_k(f)(x - x_{k-1})(x - x_k)(x - x_{k+1}) + p_k(x) \quad\text{and}\quad \int_{x_{k-1}}^{x_{k+1}} (x - x_{k-1})(x - x_k)(x - x_{k+1})\,dx = 0,$$
we see that
$$\int_{x_{k-1}}^{x_{k+1}} Lf = \int_{x_{k-1}}^{x_{k+1}} p_k = S_{n,k}.$$
By Taylor's formula with integral remainder (Exercise 4.6.3), there exists a polynomial $T_3(x)$ of degree $\le 3$ such that
$$f(x) = T_3(x) + R_3(x), \quad\text{where}\quad R_3(x) := \frac{1}{3!}\int_{x_{k-1}}^{x} (x - t)^3 f^{(4)}(t)\,dt.$$
The remainder may be written
$$R_3(x) = \frac{1}{3!}\int_{x_{k-1}}^{x_{k+1}} q_t(x) f^{(4)}(t)\,dt, \quad\text{where}\quad q_t(x) := \begin{cases} (x - t)^3 & \text{if } t \le x, \\ 0 & \text{if } t > x. \end{cases}$$
Since
$$Lf = LT_3 + LR_3 = T_3 + LR_3 = f - R_3 + LR_3,$$
we see that
$$E_{n,k} = \int_{x_{k-1}}^{x_{k+1}} (f - Lf) = \int_{x_{k-1}}^{x_{k+1}} (R_3 - LR_3).$$
In the remaining calculations, for ease of notation we assume that $[x_{k-1}, x_{k+1}] = [-h, h]$. By Fubini's theorem for continuous functions,
$$\int_{-h}^{h} R_3(x)\,dx = \frac{1}{3!}\int_{-h}^{h} f^{(4)}(t)\int_{-h}^{h} q_t(x)\,dx\,dt = \frac{1}{4!}\int_{-h}^{h} f^{(4)}(t)(h - t)^4\,dt. \tag{5.24}$$
Also, because $L$ is linear,
$$(LR_3)(x) = \frac{1}{3!}\int_{-h}^{h} f^{(4)}(t)(Lq_t)(x)\,dt.^{6}$$
Therefore, by Fubini's theorem,
$$\int_{-h}^{h} (LR_3)(x)\,dx = \frac{1}{3!}\int_{-h}^{h} f^{(4)}(t)\int_{-h}^{h} (Lq_t)(x)\,dx\,dt. \tag{5.25}$$

⁶ This may be proved using the dominated convergence theorem. (See Exercise 11.3.??)

Now, by definition of $L$,
$$(Lq_t)(x) = a_t(x + h)x(x - h) + b_t(x + h)x + c_t(x + h) + d_t,$$
where
$$b_t = \frac{q_t(h) - 2q_t(0) + q_t(-h)}{2h^2}, \quad c_t = \frac{q_t(0) - q_t(-h)}{h}, \quad\text{and}\quad d_t = q_t(-h).$$
Since $q_t(-h) = 0$ and $q_t(h) = (h - t)^3$, $t \in [-h, h]$,
$$\int_{-h}^{h} (Lq_t)(x)\,dx = \tfrac{2}{3}h^3 b_t + 2h^2 c_t = \tfrac{1}{3}h\bigl[(h - t)^3 + 4q_t(0)\bigr]. \tag{5.26}$$
From (5.24), (5.25), and (5.26),
$$\int_{-h}^{h} (f - Lf) = \int_{-h}^{h} (R_3 - LR_3) = \frac{1}{72}\int_{-h}^{h} f^{(4)}(t)\alpha(t)\,dt, \tag{5.27}$$
where
$$\alpha(t) := 3(h - t)^4 - 4h\bigl[(h - t)^3 + 4q_t(0)\bigr].$$
Recalling the definition of $q_t(0)$, we see that
$$\alpha(t) = \begin{cases} (t - h)^3(3t + h) + 16ht^3 & \text{if } -h \le t \le 0, \\ (t - h)^3(3t + h) & \text{if } 0 \le t \le h. \end{cases} \tag{5.28}$$
Thus if $t \ge 0$,
$$\alpha(-t) = (t + h)^3(3t - h) - 16ht^3 \quad\text{and}\quad \alpha(t) = (t - h)^3(3t + h). \tag{5.29}$$
The two polynomials in (5.29) have degree 4 and the same leading coefficient, so their difference has degree $\le 3$. They are easily seen to be equal at the four conveniently chosen points $t = 0, \pm h, 2h$ and therefore must be identical. Thus $\alpha$ is an even function of $t$, so (5.28) may be rewritten
$$\alpha(t) = \begin{cases} (t + h)^3(3t - h) & \text{if } -h \le t \le 0, \\ (t - h)^3(3t + h) & \text{if } 0 \le t \le h. \end{cases}$$
Taking derivatives, we see that $\alpha$ is decreasing on $[-h, 0]$ and increasing on $[0, h]$. Since $\alpha(-h) = \alpha(h) = 0$, it follows that $\alpha \le 0$ on $[-h, h]$. By (5.27) and the weighted mean value theorem for integrals, for some point $\xi \in [-h, h]$ we have
$$\int_{-h}^{h} (f - Lf) = \frac{f^{(4)}(\xi)}{72}\int_{-h}^{h} \alpha(t)\,dt = \frac{f^{(4)}(\xi)}{36}\int_{0}^{h} (t - h)^3(3t + h)\,dt = -\frac{h^5 f^{(4)}(\xi)}{90}.$$
The same result holds for $E_{n,k}$, with the point $\xi$ depending on $k$. This completes the proof of the theorem.
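The rules of this section are easy to try out numerically. The sketch below is illustrative only: the function names are hypothetical, the grid is the uniform one with $h_n = (b-a)/n$, and the test integrand is $f(x) = 1/x$ on $[1, 2]$, for which $f^{(4)}(x) = 24/x^5$ and hence $\|f^{(4)}\|_\infty = 24$.

```python
import math

def left_point_rule(f, a, b, n):
    """Riemann sum using the left endpoints x_0, ..., x_{n-1}."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

def trapezoid_rule(f, a, b, n):
    """Trapezoidal sum: h [y_0/2 + y_1 + ... + y_{n-1} + y_n/2]."""
    h = (b - a) / n
    interior = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

def simpson_rule(f, a, b, n):
    """S_n = h/3 [y_0 + 4y_1 + 2y_2 + ... + 4y_{n-1} + y_n]; n must be even."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return h / 3 * (f(a) + 4 * odd + 2 * even + f(b))

def simpson_error_bound(a, b, n, sup_f4):
    """Right side of the estimate in 5.6.3: h^4 (b - a) ||f^(4)|| / 180."""
    h = (b - a) / n
    return h ** 4 * (b - a) * sup_f4 / 180

f = lambda x: 1.0 / x          # illustrative integrand (assumption of this sketch)
exact = math.log(2.0)
for n in (4, 8):
    print(n,
          f"{exact - left_point_rule(f, 1.0, 2.0, n):+.10f}",
          f"{exact - trapezoid_rule(f, 1.0, 2.0, n):+.10f}",
          f"{exact - simpson_rule(f, 1.0, 2.0, n):+.10f}",
          f"bound={simpson_error_bound(1.0, 2.0, n, 24.0):.10f}")
```

Under these assumptions the trapezoidal and Simpson errors agree with the corresponding rows of Table 5.3 to the displayed precision, and Simpson's rule integrates cubics exactly, as (5.23) predicts when $f^{(4)} = 0$. The left-endpoint sum defined here gives an error of about $-0.0663766$ for $n = 4$, which differs from the first row of the table, so that row presumably reflects a different convention for $R_n$.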

Comparison of the Approximations

Table 5.3 below gives the errors $\int_1^2 x^{-1}\,dx - A_n$, rounded to 10 decimal places, where $A_n$ is the approximation. The left point rule simply refers to approximation by the Riemann sum $R_n$. The exact value of the integral, up to 10 decimal places, is $\ln 2 = .6931471805\ldots$

TABLE 5.3: A comparison of the methods.

Method              n = 4            n = 8
Left Point Rule      .1836233710      .0927753302
Trapezoidal Rule    -.0038766290     -.0009746698
Midpoint Rule        .0019272893      .0004866265
Simpson's Rule      -.0001067877     -.00000735011

5.7 Improper Integrals

In this section, the Riemann integral is extended in two ways: first, the integrand is allowed to be unbounded and second, the integration interval can be infinite.

5.7.1 Definition. A function $f$ is said to be locally integrable on an interval $I$ if $f \in \mathcal{R}_c^d$ for every interval $[c, d]$ contained in $I$. ♦

For example, a continuous function is locally integrable on any interval.

5.7.2 Definition. Each expression in (a)–(c) below is called an improper integral. The integral is said to converge if the limit exists in $\mathbb{R}$ and to diverge otherwise. In the former case, $f$ is said to be improperly integrable on $I$.

(a) $\displaystyle\int_a^b f := \lim_{t \to b^-} \int_a^t f$, where $f$ is locally integrable on $[a, b)$.

(b) $\displaystyle\int_a^b f := \lim_{t \to a^+} \int_t^b f$, where $f$ is locally integrable on $(a, b]$.

(c) $\displaystyle\int_a^b f := \int_a^c f + \int_c^b f$, where $f$ is locally integrable on $(a, c) \cup (c, b)$. ♦

Note that the limits of integration in these definitions, where appropriate, may be infinite.

It is easy to see that if $f$ is locally integrable on $(a, b]$, then $\int_a^b f$ converges iff $\int_a^c f$ converges for some (every) $c \in (a, b)$. In this case,
$$\int_a^b f = \int_a^c f + \int_c^b f.$$
The first integral on the right is improper while the second is a Riemann integral. Moreover, if $f$ is also bounded and $a, b \in \mathbb{R}$, then, by Exercise 5.2.10, the improper integral $\int_a^b f$ is simply the Riemann integral. Analogous remarks apply to the other cases.

5.7.3 Examples. (a) Let $p \in \mathbb{R}$. For $0 < s < t$,
$$\int_s^t \frac{dx}{x^p} = \begin{cases} (1 - p)^{-1}\bigl(t^{1-p} - s^{1-p}\bigr) & \text{if } p \ne 1, \\ \ln t - \ln s & \text{if } p = 1. \end{cases}$$
It follows that
$$\int_1^\infty \frac{dx}{x^p} \text{ converges iff } p > 1 \quad\text{and}\quad \int_0^1 \frac{dx}{x^p} \text{ converges iff } p < 1.$$

(b) Let $r > 0$, $r \ne 1$. For $t > 1$,
$$\int_1^t r^x\,dx = \frac{r^t - r}{\ln r}, \quad\text{hence}\quad \int_1^\infty r^x\,dx \text{ converges iff } r < 1.$$

(c) Since
$$\int_s^t (1 + x^2)^{-1}\,dx = \tan^{-1} t - \tan^{-1} s,$$
we have
$$\int_0^\infty \frac{dx}{1 + x^2} = \int_{-\infty}^0 \frac{dx}{1 + x^2} = \frac{\pi}{2}, \quad\text{hence}\quad \int_{-\infty}^\infty \frac{dx}{1 + x^2} = \pi.$$

(d) $\displaystyle\int_{-1}^1 \frac{dx}{\sqrt{|x|}} = \int_{-1}^0 \frac{dx}{\sqrt{-x}} + \int_0^1 \frac{dx}{\sqrt{x}} = 2\lim_{t \to 0^+}\int_t^1 \frac{dx}{\sqrt{x}} = 4.$ ♦
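The dichotomy in 5.7.3(a) can be seen directly from the closed form. The following is a small sketch; the function name `power_integral` and the sample exponents are chosen for this illustration only.

```python
import math

def power_integral(p, s, t):
    """Closed form of the integral of x**(-p) over [s, t], 0 < s < t (5.7.3(a))."""
    if p == 1:
        return math.log(t) - math.log(s)
    return (t ** (1 - p) - s ** (1 - p)) / (1 - p)

# Upper limit t = 10, 100, ..., 10**6 on [1, t].
tail_p2 = [power_integral(2.0, 1.0, 10.0 ** k) for k in range(1, 7)]  # p > 1: tends to 1/(p-1) = 1
tail_p1 = [power_integral(1.0, 1.0, 10.0 ** k) for k in range(1, 7)]  # p = 1: grows like ln t

# Lower limit s = 0.1, 0.01, ..., 10**-6 on [s, 1].
head_half = [power_integral(0.5, 10.0 ** -k, 1.0) for k in range(1, 7)]  # p < 1: tends to 1/(1-p) = 2

print(tail_p2)
print(tail_p1)
print(head_half)
```

The first and third lists settle down to the limits $1$ and $2$, while the second keeps growing without bound, in line with the convergence criteria stated in the example.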

For ease of exposition, for the remainder of the section we consider only integrals that are improper at the upper limit. Analogous discussions hold for the other types of improper integrals. The proof of the following theorem is left to the reader.

5.7.4 Theorem. Let $f$ and $g$ be locally integrable on $[a, b)$ and let $\alpha, \beta \in \mathbb{R}$. If the improper integrals $\int_a^b f$ and $\int_a^b g$ converge, then so does $\int_a^b (\alpha f + \beta g)$, and
$$\int_a^b (\alpha f + \beta g) = \alpha\int_a^b f + \beta\int_a^b g.$$

In contrast to the Riemann integral, the product of improperly integrable functions may not be improperly integrable. For example, $f(x) := 1/\sqrt{1 - x}$ is improperly integrable on the interval $[0, 1)$ but $f^2$ is not. The following example illustrates the same phenomenon, but on an unbounded interval. It is the first of several examples in this section that use the fact, proved in Chapter 6, that a series of the form $\sum_{n=1}^\infty 1/n^p$ converges iff $p > 1$.

5.7.5 Example. Define $f$ on $[1, +\infty)$ by $f(x) = n$ if $n \le x < n + 1/n^{5/2}$, $n = 1, 2, \ldots$, and $f(x) = 0$ otherwise. Then
$$\int_1^{n+1} f = \sum_{k=1}^n \frac{1}{k^{3/2}} \quad\text{and}\quad \int_1^{n+1} f^2 = \sum_{k=1}^n \frac{1}{k^{1/2}},$$
hence $\int_1^\infty f$ converges, whereas $\int_1^\infty f^2$ diverges. ♦

We now have examples, on both bounded and unbounded intervals, of nonnegative improperly integrable functions whose squares are not improperly integrable. Conversely, there exist locally integrable nonnegative functions on unbounded intervals, for example, $f(x) = 1/x$ on $[1, +\infty)$, such that $f^2$ is improperly integrable but $f$ is not. However, for bounded intervals this is not possible: If $f^2$ is improperly integrable on a bounded interval, then so is $|f|$. (Exercise 25.)

The remainder of this section describes various convergence tests for improper integrals. Many of these are analogs of convergence tests for infinite series, discussed in Chapter 6.

5.7.6 Comparison Test for Integrals. Let $f$ and $g$ be locally integrable on $[a, b)$ such that $0 \le f \le g$. If $\int_a^b g$ converges, then so does $\int_a^b f$.

Proof. Let $F(x) = \int_a^x f$ and $G(x) = \int_a^x g$, $a \le x < b$. Since $f$ and $g$ are nonnegative, $F$ and $G$ are increasing, hence, by the monotone function theorem (3.1.17),
$$\int_a^b f = \lim_{x \to b^-} F(x) \quad\text{and}\quad \int_a^b g = \lim_{x \to b^-} G(x)$$
exist in the extended real number system $\overline{\mathbb{R}}$. Since $F \le G$, the conclusion follows.

5.7.7 Example. Let $f(x) = \dfrac{1 + \sin x}{\sqrt{x}\,(x + 1)^2}$, $x > 0$. By definition,
$$\int_0^\infty f = \int_0^1 f + \int_1^\infty f,$$
provided the integrals on the right converge. That this is indeed the case follows from 5.7.3(a), 5.7.6, and the inequalities $f(x) \le 2/\sqrt{x}$ on $(0, 1]$ and $f(x) \le 2/(x + 1)^2$ on $[1, +\infty)$. ♦

5.7.8 Example. Define the gamma function $\Gamma$ by
$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt, \quad x > 0.$$
To see that the integral converges for all $x > 0$, note that $t^{x-1}e^{-t} \le t^{x-1}$ for $t \in (0, 1]$, hence $\int_0^1 t^{x-1}e^{-t}\,dt$ converges by comparison with $\int_0^1 t^{x-1}\,dt$ (see 5.7.3(a)). Furthermore, by l'Hospital's rule applied sufficiently many times,
$$\lim_{t \to +\infty} t^{x+1}e^{-t} = 0,$$
so there exists $t_0 > 1$ such that $t^{x+1}e^{-t} \le 1$, or $t^{x-1}e^{-t} \le t^{-2}$, for all $t \ge t_0$. Therefore, $\int_1^\infty t^{x-1}e^{-t}\,dt$ converges by comparison with $\int_1^\infty t^{-2}\,dt$.

The gamma function has the following recursive property:
$$\Gamma(x + 1) = x\Gamma(x).$$
To see this, integrate $\Gamma(x + 1)$ by parts to obtain
$$\int_a^b t^x e^{-t}\,dt = \Bigl[-t^x e^{-t}\Bigr]_{t=a}^{t=b} + x\int_a^b t^{x-1}e^{-t}\,dt,$$
and then let $a \to 0$ and $b \to +\infty$. In particular, for $n \in \mathbb{N}$,
$$\Gamma(n + 1) = n\Gamma(n) = n(n - 1)\Gamma(n - 1) = \cdots = n(n - 1)\cdots 1 \cdot \Gamma(1).$$
Since
$$\Gamma(1) = \int_0^\infty e^{-t}\,dt = 1,$$
we see that $\Gamma(n + 1) = n!$. Thus $\Gamma(x)$ is a continuous (indeed, differentiable) extension of the factorial function on $\mathbb{N}$. ♦

5.7.9 Limit Comparison Test for Integrals. Let $f$ and $g$ be locally integrable on $[a, b)$ with $f \ge 0$ and $g > 0$. If $L := \lim_{x \to b^-} f(x)/g(x)$ exists and $0 < L < +\infty$, then $\int_a^b g$ converges iff $\int_a^b f$ converges.

Proof. Since $f, g \ge 0$, $\int_a^b f$ and $\int_a^b g$ exist in $\overline{\mathbb{R}}$. Choose $c \in (a, b)$ such that $L/2 < f(x)/g(x) < 2L$ for all $x \in [c, b)$. For such $x$, $g(x) < 2f(x)/L$ and $f(x) < 2Lg(x)$. The assertion then follows from the inequalities
$$\int_c^b g \le \frac{2}{L}\int_c^b f \quad\text{and}\quad \int_c^b f \le 2L\int_c^b g.$$

5.7.10 Example. Let $f(x) = \dfrac{\sqrt{2x^5 - x^2 + 1}}{x^4 + 3x + 5}$, $x \ge 1$. For $g(x) = x^{-3/2}$,
$$\lim_{x \to +\infty} \frac{f(x)}{g(x)} = \lim_{x \to +\infty} \frac{\sqrt{2x^8 - x^5 + x^3}}{x^4 + 3x + 5} = \sqrt{2}.$$
Since $\int_1^\infty g$ converges, so does $\int_1^\infty f$. ♦
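The identities $\Gamma(x + 1) = x\Gamma(x)$ and $\Gamma(n + 1) = n!$ from Example 5.7.8 can be spot-checked numerically. The sketch below uses Python's standard `math.gamma`; the truncation point 40, the number of subintervals, and the sample arguments are arbitrary choices made for this illustration.

```python
import math

def gamma_by_quadrature(x, upper=40.0, n=100_000):
    """Crude midpoint-rule approximation of the integral of t**(x-1) * exp(-t) over [0, upper].

    Illustrative only: the tail beyond `upper` is simply dropped.
    """
    h = upper / n
    return h * sum(((k + 0.5) * h) ** (x - 1) * math.exp(-(k + 0.5) * h) for k in range(n))

# Gamma(n + 1) = n! for small n.
factorial_matches = all(
    math.isclose(math.gamma(n + 1), math.factorial(n)) for n in range(1, 15)
)

# Gamma(x + 1) = x * Gamma(x) at a few non-integer points.
recursion_matches = all(
    math.isclose(math.gamma(x + 1), x * math.gamma(x)) for x in (0.3, 1.7, 4.25)
)

approx = gamma_by_quadrature(2.5)
print(factorial_matches, recursion_matches, approx, math.gamma(2.5))
```

The quadrature value for $x = 2.5$ agrees with the library value to several decimal places, which is consistent with the rapid decay of $t^{x-1}e^{-t}$ used in the convergence argument.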

5.7.11 Root Test for Integrals. Let $f$ be locally integrable and nonnegative on $[a, b)$, where $b > 0$, and suppose that $L := \lim_{x \to b^-} [f(x)]^{1/x}$ exists in $\mathbb{R}$. Then $\int_a^b f$ converges if $L < 1$ and, when $b = +\infty$, diverges if $L > 1$.

Proof. Suppose $L < 1$. Choose $r \in (L, 1)$ and $x_0 \in (a, b) \cap (0, b)$ such that $[f(x)]^{1/x} < r$ for all $x \ge x_0$. For such $x$, $f(x) < r^x$, hence, by the comparison theorem and 5.7.3(b), $\int_{x_0}^b f$ converges. Therefore, $\int_a^b f$ converges. A similar argument shows that $\int_a^{+\infty} f$ diverges if $L > 1$.

5.7.12 Example. For $p \in \mathbb{R}$ and $x \ge 1$, let
$$f(x) = \left(\frac{2x + \cos x}{3x + \sin x}\right)^{px}.$$
Since $\lim_{x \to +\infty} [f(x)]^{1/x} = (2/3)^p$, $\int_1^{+\infty} f$ converges iff $p > 0$. ♦

There are examples of convergent integrals and divergent integrals with $L = 1$, so the root test is inconclusive in this case (see Exercise 3).

5.7.13 Definition. Let $f$ be locally integrable on $[a, b)$. The improper integral $\int_a^b f$ is said to converge absolutely if $\int_a^b |f|$ converges. In this case $f$ is said to be improperly absolutely integrable on $[a, b)$. If $\int_a^b f$ converges but not absolutely, then the integral is said to converge conditionally. ♦

5.7.14 Proposition. If $f$ is improperly absolutely integrable on $[a, b)$, then $\int_a^b f$ converges and
$$\left|\int_a^b f\right| \le \int_a^b |f|.$$

Proof. Set $g(x) := |f(x)| + f(x)$, so $0 \le g \le 2|f|$ on $[a, b)$. By the comparison test, $\int_a^b g$ converges. Since $f = g - |f|$ is the difference of two improperly integrable functions, $f$ is improperly integrable. The inequality follows on letting $t \to b^-$ in the inequality $\bigl|\int_a^t f\bigr| \le \int_a^t |f|$.

5.7.15 Example. For $p > 0$, define
$$f(x) = \frac{(-1)^{n+1}}{n^p}, \quad n \le x < n + 1, \quad n = 1, 2, \ldots.$$
Then
$$\int_1^{n+1} |f| = \sum_{k=1}^n \frac{1}{k^p} \quad\text{and}\quad \int_1^{n+1} f = \sum_{k=1}^n \frac{(-1)^{k+1}}{k^p}.$$
The first sum has a finite limit iff $p > 1$, while the second sum has a finite limit iff $p > 0$ (see Chapter 6). Therefore, $\int_1^\infty f$ converges absolutely iff $p > 1$ and conditionally iff $0 < p \le 1$. ♦
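The contrast between absolute and conditional convergence in Example 5.7.15 shows up clearly in the partial integrals. A minimal sketch, with a hypothetical helper `integral_to` that evaluates $\int_1^{n+1} f$ or $\int_1^{n+1} |f|$ as the finite sums given in the example:

```python
import math

def integral_to(n, p, absolute=False):
    """Integral over [1, n + 1] of f (or of |f|), where f = (-1)**(k+1) / k**p on [k, k+1)."""
    total = 0.0
    for k in range(1, n + 1):
        sign = 1.0 if (absolute or k % 2 == 1) else -1.0  # (-1)**(k+1) is +1 for odd k
        total += sign / k ** p
    return total

p = 1.0  # boundary case: conditional convergence
signed = [integral_to(10 ** j, p) for j in range(1, 6)]
unsigned = [integral_to(10 ** j, p, absolute=True) for j in range(1, 6)]

print(signed)    # approaches ln 2
print(unsigned)  # grows roughly like ln n
```

For $p = 1$ the signed integrals approach $\ln 2$ (the sum of the alternating harmonic series), while the integrals of $|f|$ keep growing, so the convergence is conditional.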

The following theorem is useful in establishing conditional convergence of improper integrals.

5.7.16 Dirichlet's Test for Integrals. Let $f$ be continuous and $g'$ improperly absolutely integrable on $[a, b)$. If the function $F(t) := \int_a^t f$ is bounded on $[a, b)$ and $\lim_{x \to b^-} g(x) = 0$, then $\int_a^b fg$ converges.

Proof. Let $M$ be a bound for $|F|$ on $[a, b)$. Then $|Fg'| \le M|g'|$, hence, by the comparison test, $Fg'$ is absolutely integrable on $[a, b)$. Integrating by parts yields
$$\int_a^t fg = F(t)g(t) - \int_a^t Fg'.$$
Since $\int_a^b Fg'$ converges and $\lim_{t \to b^-} F(t)g(t) = 0$, $\int_a^b fg$ converges.

5.7.17 Corollary. Let $f$ be continuous and $g'$ locally integrable on $[a, b)$ with $\lim_{x \to b^-} g(x) = 0$. If the function $F(t) := \int_a^t f$ is bounded on $[a, b)$ and if $g'$ has constant sign, then $\int_a^b fg$ converges.

Proof. By the fundamental theorem of calculus, $\int_a^t g' = g(t) - g(a)$, hence $g'$ is absolutely integrable on $[a, b)$ and Dirichlet's test applies.

5.7.18 Example. Let $h(x) = x^{-p}\sin x$, $x \ge 1$, where $p > 0$. Taking $f(x) = \sin x$ and $g(x) = x^{-p}$ in 5.7.17 shows that $\int_1^\infty h$ converges. Since $|h(x)| \le 1/x^p$, $h$ is improperly absolutely integrable on $[1, +\infty)$ if $p > 1$. If $0 < p \le 1$, the sums on the right in the inequality
$$\int_\pi^{n\pi} |h| = \sum_{k=2}^n \int_{(k-1)\pi}^{k\pi} |h| > \sum_{k=2}^n (k\pi)^{-p}\int_{(k-1)\pi}^{k\pi} |\sin x|\,dx = 2\pi^{-p}\sum_{k=2}^n k^{-p}$$
are unbounded (see Example 6.2.5), hence $h$ is not improperly absolutely integrable in this case. ♦

5.7.19 Cauchy–Schwarz Inequality for Improper Integrals. Let $f$ and $g$ be continuous with $f^2$ and $g^2$ improperly integrable on $[a, b)$. Then $fg$ is improperly absolutely integrable on $[a, b)$ and
$$\left(\int_a^b |fg|\right)^2 \le \int_a^b f^2 \cdot \int_a^b g^2.$$

Proof. By Exercise 5.2.13, for all $t \in [a, b)$,
$$\left(\int_a^t |fg|\right)^2 \le \int_a^t f^2 \cdot \int_a^t g^2.$$
Now let $t \to b^-$.


Exercises Z

1

dx converges. p (1 − x)q (sin x) 0 Z ∞ Z ∞ 2. Let p > 0. Show that x−px dx converges and x−p/x dx diverges. 1 1 Z 1 Z 1 −px Show that the same behavior holds for x dx and x−p/x dx. 1.S Determine all values of p, q > 0 for which

0

0

3. Find examples for which limx→+∞ [f (x)] = 1 and Z ∞ Z ∞ (a) f converges. (b) f diverges. 1/x

1

1

4. Let f and g be positive and continuous on [1,R +∞). Prove that if ∞ f f (x) L := limx→+∞ exists in R, then lim Rx∞ = L. x→+∞ g(x) g x 5.S Determine if the integrals converge or diverge: Z 1 Z 1 Z 1√ sin x sin x sin x − x dx. (b) dx. (c) dx. (a) 3 2 x x x 0 0 0 Z ∞ Z ∞ Z ∞ (ln x)(sin x) (sin x)(cos x−1 ) 1 (d) dx. (e) dx. (f) dx. sin2 ln x x x 2 2 1 π/2

Z

6. Prove that

cos(secp x) dx converges for all p > 0.

0 1

7. Show that

Z

8.S Show that

Z

√

0

0

1

xp dx converges iff p > −1. 1 − x2

sinp x dx converges iff p < 1 + q. xq

9. Find all values of p for which the integral converges: Z ∞ Z 1 Z (a) S xp e−x dx. (b) xp e−x dx. (c) S 1

(d) (g) (j)

Z

1

xp sin xp dx. (e)

0 Z π/2 0 Z π/2 0

0 1

Z

xp ln x dx.

(f)

0

sinp x dx.

(h) S

Z

π/2

x sinp x dx.

(i)

0

tanp x dx.

(k) S

Z

Z

1

sin xp dx.

0 ∞

xp ln x dx.

1 Z π/2

(1 − sin x)p dx.

0 π/2

xp cos x dx. (l)

0

10. Find all values of p > 0 for which

Z

π/2

xp sin x dx.

0

Z 0

1

x−p sin ex dx converges absolutely.


11.S Prove that

∞

Z 1

12. Prove that

x sin x dx converges conditionally. 1 + x2

∞

Z

xp sin ex dx converges for all p. For what values of p does

1

the integral converge conditionally? (See 5.7.18.) 13.S Find all values of p, q > 0 for which the integral converges: Z 1 Z 1 Z ∞ xp dx dx √ p . (b) dx. (c) . (a) p )q 2p )q (1 − x (1 − x x + xq 0 1 0 Z 1 Z π/2 Z π/2 sinp x 1 dx √ p (d) . (e) dx. (f) p q dx. qx q cos sin x x + x 0 0 0 Z ∞ 14. Prove by induction that xn e−x dx = n!. 0

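15.S Given that $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$ (to be established in 11.5.3), show that
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2n}e^{-x^2/2}\,dx = (2n - 1)(2n - 3)\cdots 3 \cdot 1 = \frac{(2n)!}{n!\,2^n}.$$

16. Given that $\int_0^\infty e^{-s^2}\,ds = \sqrt{\pi}/2$, show that
$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}, \quad \Gamma\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}, \quad\text{and}\quad \Gamma\left(\tfrac{5}{2}\right) = \frac{3\sqrt{\pi}}{4}.$$

17. The formula $\Gamma(x) = x^{-1}\Gamma(x + 1)$ may be used to extend the gamma function to non-integer values $x < 0$. Use this to find $\Gamma\left(-\tfrac{1}{2}\right)$ and $\Gamma\left(-\tfrac{3}{2}\right)$.

18. Prove that if $f$ is absolutely integrable on $[1, \infty)$, then
$$\lim_{n \to \infty} \int_1^\infty f(x^n)\,dx = 0.$$

19. (Log test for integrals). Let $f$ be locally integrable and positive on $[0, +\infty)$ such that
$$L := \lim_{x \to \infty} \frac{-\ln f(x)}{\ln x}$$
exists in $\mathbb{R}$. Prove that $\int_0^\infty f$ converges if $L > 1$ and diverges if $L < 1$.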

20. Use Exercise 19 to determine the convergence behavior of Z ∞ Z ∞ − ln x −√x (a) ln x dx. (b) ln x dx. S

1

What does the root test reveal?

1

21. Prove that $L(a) := \displaystyle\lim_{t \to +\infty} \int_{1/t}^{t} \frac{\sin ax}{x}\,dx$ converges for all $a \in \mathbb{R}$ and that $L(a) = L(1)$ for all $a > 0$.

22. Let $f$ be differentiable and nonzero on $[1, +\infty)$. If $\lim_{x \to +\infty} x f'(x)/f(x)$ exists in $\mathbb{R}$ and is less than $-1$, prove that $\int_1^\infty f$ converges.

23. Prove that if $\int_0^\infty f(x)\,dx$ converges, then $\lim_n \int_0^1 f(nx)\,dx = 0$.

24.S Prove that if $f \ge 0$ and $\int_1^\infty f$ converges, then $\int_1^\infty \sqrt{f(x)}/x\,dx$ converges.

25. Prove that if $[a, b)$ is finite and $f^2$ is improperly integrable on $[a, b)$, then $|f|$ is improperly integrable on $[a, b)$.

26.S Let $f$ be continuous and $g$ locally integrable and positive on $[a, b)$. Suppose that the function $G(x) := \int_a^x g$ is bounded on $[a, b)$ and that $\lim_{x \to b^-} f(x) = 0$. Prove that $\int_a^b fg$ converges.

27. Let $f$ be continuous on $[a, b)$ such that $\int_a^b f$ converges. If $g'$ is locally integrable and has constant sign on $[a, b)$, prove that $\int_a^b fg$ converges.

28.S Let $f$ be improperly integrable on $(-\infty, +\infty)$ and $c \in \mathbb{R}$. Prove that
$$\int_{-\infty}^{\infty} f(x + c)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx.$$

5.8 A Deeper Look at Riemann Integrability

In this section we characterize Riemann integrability of a function in terms of the size of its set of discontinuities.

5.8.1 Definition. A set $A$ of real numbers is said to have (Lebesgue) measure zero if for each $\varepsilon > 0$ there exists a finite or infinite sequence of intervals $I_n$ with total length $\sum_n |I_n| < \varepsilon$ such that the sequence covers $A$, that is, every member of $A$ is contained in some $I_n$. ♦

Any countable set has measure zero. Indeed, if $A = \{a_1, a_2, \ldots\}$ and $\varepsilon > 0$, then the intervals $I_n = (a_n - \varepsilon/2^{n+2},\ a_n + \varepsilon/2^{n+2})$ obviously cover $A$ and have total length $< \varepsilon$. In particular, the set of rational numbers has measure zero. An uncountable set of measure zero is constructed in Example 10.3.4.

The following result will be proved in Chapter 11.

5.8.2 Theorem. Let $f$ be bounded on $[a, b]$. Then $f \in \mathcal{R}_a^b$ iff its set of discontinuities has measure zero.
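The covering argument for countable sets can be made concrete. The sketch below builds the intervals $I_n = (a_n - \varepsilon/2^{n+2},\ a_n + \varepsilon/2^{n+2})$ for a finite initial segment of an enumeration; the particular enumeration of rationals in $[0, 1]$ (by increasing denominator, with repetitions) and the value of $\varepsilon$ are assumptions of this illustration. Exact rational arithmetic is used so that the total length can be compared without rounding.

```python
from fractions import Fraction

def cover(points, eps):
    """Intervals I_n = (a_n - eps/2**(n+2), a_n + eps/2**(n+2)), n = 1, 2, ..."""
    return [
        (a - eps / 2 ** (n + 2), a + eps / 2 ** (n + 2))
        for n, a in enumerate(points, start=1)
    ]

def total_length(intervals):
    """Sum of the lengths of the intervals."""
    return sum(right - left for left, right in intervals)

# Hypothetical enumeration: p/q for q = 1..7, p = 0..q (repetitions are harmless).
points = [Fraction(p, q) for q in range(1, 8) for p in range(q + 1)]
eps = Fraction(1, 100)
intervals = cover(points, eps)

print(len(points), float(total_length(intervals)), float(eps))
```

Each interval has length $\varepsilon/2^{n+1}$, so the first $N$ intervals have total length $(\varepsilon/2)(1 - 2^{-N})$, which stays below $\varepsilon$ however many points are covered.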


Examples 5.1.11 and 5.1.12 are relevant here: The function in the first example, shown to be integrable, has a countable set of discontinuities. The function in the second example, shown not to be integrable, has [0, 1] as its set of discontinuities, certainly not a set of measure zero. Theorem 5.8.2 allows simple proofs of many of the properties discussed in this chapter. For example, if f and g are integrable with sets of discontinuity A and B, respectively, then f + g and f g have sets of discontinuity contained in A ∪ B, a set of measure zero (Exercise 2), and hence are integrable.

Exercises

1. Show that if $B$ has measure zero and $A \subseteq B$, then $A$ has measure zero.

2.S Prove: If $A_n$ has measure zero for every $n \in \mathbb{N}$, then so does $A_1 \cup A_2 \cup \cdots$.

3. Let $A$ have measure zero. Prove that $A + \mathbb{Q}$ has measure zero.

4. Let $f : [a, b] \to [c, d]$ be integrable and $g : [c, d] \to \mathbb{R}$ continuous. Prove that $g \circ f$ is integrable.

5. A set $A$ of real numbers has (Jordan) content zero if for each $\varepsilon > 0$ there exist finitely many intervals of total length $< \varepsilon$ that cover $A$. Show that
(a) a convergent sequence has content zero.
(b) $[0, 1] \cap \mathbb{Q}$ does not have content zero.

6.S Prove that the function $f$ in Exercise 3.3.10 is integrable on $[a, b]$ and find its integral.

*5.9 Functions of Bounded Variation

5.9.1 Definition. Let $\mathcal{P} = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be a partition of $[a, b]$. For $f : [a, b] \to \mathbb{R}$ define
$$V_{\mathcal{P}}(f) = \sum_{j=1}^n |f(x_j) - f(x_{j-1})|.$$
The total variation of $f$ on $[a, b]$ is the extended real number
$$V_a^b(f) := \sup_{\mathcal{P}} V_{\mathcal{P}}(f).$$
The function $f$ is said to have bounded variation on $[a, b]$ if $V_a^b(f) < +\infty$. The set of all functions with bounded variation on $[a, b]$ is denoted by $\mathcal{BV}_a^b$. ♦
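The quantity $V_{\mathcal{P}}(f)$ is straightforward to compute for a given partition. The following sketch is illustrative only; the helper names, the uniform partitions, and the test functions ($\sin$ on $[0, 2\pi]$ and $\exp$ on $[0, 1]$) are choices made for this example.

```python
import math

def variation(f, partition):
    """V_P(f): sum of |f(x_j) - f(x_{j-1})| over consecutive partition points."""
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(partition, partition[1:]))

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * j / n for j in range(n + 1)]

# sin on [0, 2*pi]: a coarse partition and a refinement of it (300 is a multiple of 3).
coarse = uniform_partition(0.0, 2 * math.pi, 3)
fine = uniform_partition(0.0, 2 * math.pi, 300)
v_sin_coarse = variation(math.sin, coarse)
v_sin_fine = variation(math.sin, fine)

# exp is increasing on [0, 1], so the sum telescopes to exp(1) - exp(0).
v_exp = variation(math.exp, uniform_partition(0.0, 1.0, 7))

print(v_sin_coarse, v_sin_fine, v_exp)
```

In this sketch the coarse partition gives $2\sqrt{3} \approx 3.46$, while the refinement, which contains the points where $\sin$ attains its extreme values, gives approximately $4$. For the monotone function the value does not depend on the partition.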

5.9.2 Proposition. Let $f : [a, b] \to \mathbb{R}$.

(a) If $f \in \mathcal{BV}_a^b$, then $f$ is bounded.

(b) If $f$ has a bounded derivative on $[a, b]$, then $f \in \mathcal{BV}_a^b$.

(c) If $f$ is monotone on $[a, b]$, then $V_a^b(f) = |f(b) - f(a)|$.

(d) If $g \in \mathcal{R}_a^b$ and $f(x) = \int_a^x g(t)\,dt$, then $V_a^b(f) \le (b - a)\sup_{[a,b]} |g|$.

(e) If $\mathcal{P}$ is a partition of $[a, b]$ and $\mathcal{Q}$ is a refinement of $\mathcal{P}$, then $V_{\mathcal{P}}(f) \le V_{\mathcal{Q}}(f)$.

(f) If $f, g \in \mathcal{BV}_a^b$ and $c \in \mathbb{R}$, then $f + g$, $cf$, $fg \in \mathcal{BV}_a^b$.

Proof. (a) Let $a < x < b$ and $\mathcal{P} = \{a, x, b\}$. Then
$$2|f(x)| \le |f(x) - f(a)| + |f(x) - f(b)| + |f(a)| + |f(b)| = V_{\mathcal{P}}(f) + |f(a)| + |f(b)| \le V_a^b(f) + |f(a)| + |f(b)|.$$

(b) Let $|f'| \le C$ on $[a, b]$. By the mean value theorem, given a partition $\mathcal{P}$, there exists for each $j$ a point $t_j \in (x_{j-1}, x_j)$ such that
$$V_{\mathcal{P}}(f) = \sum_{\mathcal{P}} |f(x_j) - f(x_{j-1})| = \sum_{\mathcal{P}} |f'(t_j)|(x_j - x_{j-1}) \le C(b - a).$$
Therefore, $V_a^b(f) \le C(b - a)$.

(c) If $f$ is increasing, then
$$\sum_{\mathcal{P}} |f(x_j) - f(x_{j-1})| = \sum_{\mathcal{P}} \bigl(f(x_j) - f(x_{j-1})\bigr) = f(b) - f(a).$$

(d) Let $M := \sup_{a \le t \le b} |g(t)|$. Then, for any partition $\mathcal{P}$,
$$V_{\mathcal{P}}(f) \le \sum_{\mathcal{P}} \int_{x_{j-1}}^{x_j} |g(t)|\,dt \le \sum_{\mathcal{P}} M(x_j - x_{j-1}) = M(b - a).$$

(e) Let $\mathcal{P} = \{a = x_0 < x_1 < \cdots < x_n = b\}$ and $\mathcal{P}' = \mathcal{P} \cup \{c\}$, where $c \in [x_{i-1}, x_i]$. Then
$$V_{\mathcal{P}}(f) = \sum_{j \ne i} |f(x_j) - f(x_{j-1})| + |f(x_i) - f(x_{i-1})| \le \sum_{j \ne i} |f(x_j) - f(x_{j-1})| + |f(x_i) - f(c)| + |f(c) - f(x_{i-1})| = V_{\mathcal{P}'}(f).$$
Adding points successively yields (e).

(f) Let $|f|, |g| \le M$ on $[a, b]$. The inequality
$$|(fg)(x_j) - (fg)(x_{j-1})| \le M|g(x_j) - g(x_{j-1})| + M|f(x_j) - f(x_{j-1})|$$
shows that $fg \in \mathcal{BV}_a^b$. The proofs of the remaining parts of (f) are similar.

5.9.3 Example. For $\alpha > 0$, define a continuous function $f_\alpha$ on $[0, 1]$ by
$$f_\alpha(x) := \begin{cases} x^\alpha \sin(1/x) & \text{if } 0 < x \le 1, \\ 0 & \text{if } x = 0. \end{cases}$$
We show that if $\alpha \le 1$, then $f_\alpha$ does not have bounded variation on $[0, 1]$. Set
$$a_k := \frac{1}{2k\pi + \pi/2} = \frac{2}{(4k + 1)\pi} \quad\text{and}\quad b_k := \frac{1}{2k\pi}$$
and note that
$$f_\alpha(b_k) = 0 \quad\text{and}\quad f_\alpha(a_k) = a_k^\alpha = \frac{c}{(4k + 1)^\alpha}, \quad\text{where } c := \frac{2^\alpha}{\pi^\alpha}.$$

Since bk+1 < ak < bk , for sufficiently small ε > 0 we may form the partition Pε = {ε < ap < bp < ap−1 < · · · < ak < bk < · · · < bq+1 < aq < bq < 1} of [ε, 1], where p and q are, respectively, the largest and smallest integers satisfying ε < ap < bq < 1, equivalently, 1 2 − πε s. It follows that lim+ Vtb (f ) ≥ s. Since s was arbitrary, the assertion follows. t→a

5.9.6 Example. We use 5.9.5 to show that the function $f_\alpha$ in 5.9.3 has bounded variation on $[0, 1]$ if $\alpha > 1$. We have
$$|f_\alpha'(x)| = |\alpha x^{\alpha - 1}\sin(1/x) - x^{\alpha - 2}\cos(1/x)| \le \alpha x^{\alpha - 1} + x^{\alpha - 2}.$$
If $\alpha > 1$, the integral $\int_0^1 x^{\alpha - 2}\,dx$ converges, hence $\int_0^1 |f_\alpha'|$ converges. ♦

5.9.7 Theorem. If $f \in \mathcal{BV}_a^b$, then there exist monotone increasing functions $g$ and $h$ on $[a, b]$ such that $f = g - h$.

Proof. For $x \in [a, b]$, define $g(x) := V_a^x(f)$ and $h(x) := g(x) - f(x)$. Clearly, $g$ is increasing. To see that $h$ is increasing, let $x < y$, let $\mathcal{P}_x$ be an arbitrary partition of $[a, x]$, and let $\mathcal{P}_y = \mathcal{P}_x \cup \{y\}$. Then
$$V_{\mathcal{P}_x}(f) + f(y) - f(x) \le V_{\mathcal{P}_x}(f) + |f(y) - f(x)| = V_{\mathcal{P}_y}(f) \le g(y).$$
Taking suprema over all partitions $\mathcal{P}_x$ yields $g(x) + f(y) - f(x) \le g(y)$, that is, $h(x) \le h(y)$.

From Exercise 5.1.4 we have

5.9.8 Corollary. $\mathcal{BV}_a^b \subseteq \mathcal{R}_a^b$.

*5.10 The Riemann–Stieltjes Integral

In this section we describe the main features of the Riemann-Stieltjes integral, a generalization of the Riemann integral. These integrals have many of the properties of Riemann integrals; however, as we shall see, there are some striking differences.

Definition and General Properties

5.10.1 Definition. Let $f$ and $w$ be bounded, real-valued functions on an interval $[a, b]$. If $\mathcal{P} = \{x_0 = a < x_1 < \cdots < x_n = b\}$ and $\xi_j \in [x_{j-1}, x_j]$, then
$$S_w(f, \mathcal{P}, \xi) := \sum_{j=1}^n f(\xi_j)\Delta w_j, \quad \Delta w_j := w(x_j) - w(x_{j-1}), \quad \xi := (\xi_1, \ldots, \xi_n),$$
is called a Riemann-Stieltjes sum of $f$ with respect to $w$. The function $f$ is said to be Riemann-Stieltjes integrable with respect to $w$ if for some $I \in \mathbb{R}$ and each $\varepsilon > 0$, there exists a partition $\mathcal{P}_\varepsilon$ such that $|S_w(f, \mathcal{P}, \xi) - I| < \varepsilon$ for all refinements $\mathcal{P}$ of $\mathcal{P}_\varepsilon$ and all choices of $\xi$. In this case $I$ is called the Riemann-Stieltjes integral with respect to $w$ and is denoted by
$$\int_a^b f\,dw = \int_a^b f(x)\,dw(x) = \lim_{\mathcal{P}} S_w(f, \mathcal{P}, \xi). \tag{5.32}$$
The function $f$ is called the integrand and $w$ the integrator. The collection of all functions that are Riemann-Stieltjes integrable with respect to $w$ is denoted by $\mathcal{R}_a^b(w)$. ♦

It follows from 5.1.18 that, for the integrator $w(x) = x$, the Riemann-Stieltjes integral reduces to the Riemann integral.

It is clear that constant functions are Riemann-Stieltjes integrable. The following example shows that, in contrast to the Riemann integral, if $f$ has a simple discontinuity, then $\int_a^b f\,dw$ may not exist.

5.10.2 Example. Let $f : [0, 1] \to \mathbb{R}$ and define
$$w(x) := \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1. \end{cases}$$
We show that $f \in \mathcal{R}_0^1(w)$ iff $f$ is continuous at 1. Let $\mathcal{P} = \{x_0 = 0 < x_1 < \cdots < x_n = 1\}$ be any partition of $[0, 1]$. Then
$$S_w(f, \mathcal{P}, \xi) = f(\xi_n)[w(1) - w(x_{n-1})] = f(\xi_n).$$

Hence if $f \in \mathcal{R}_0^1(w)$ and $\xi$ is chosen so that first $\xi_n = 1$ and second $\xi_n < 1$, we see that $f$ is continuous at 1 and $\int_0^1 f\,dw = f(1)$. Conversely, if $f$ is not continuous at 1, then there exists a sequence $\{a_m\}$ and $r > 0$ such that $a_m \uparrow 1$ and $|f(a_m) - f(1)| \ge r$ for every $m$. Let $\mathcal{P}_m$ denote the refinement $\mathcal{P} \cup \{a_m\}$ of $\mathcal{P}$, where $a_m \in (x_{n-1}, 1)$. If $\xi$ consists of the left endpoints of the intervals of $\mathcal{P}_m$, then $S_w(f, \mathcal{P}_m, \xi) = f(a_m)$, hence
$$|S_w(f, \mathcal{P}_m, \xi) - f(1)| = |f(a_m) - f(1)| \ge r.$$
Since $\mathcal{P}$ was arbitrary, $f \notin \mathcal{R}_0^1(w)$. ♦

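5.10.3 Theorem. If $f, g \in \mathcal{R}_a^b(w)$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in \mathcal{R}_a^b(w)$ and
$$\int_a^b (\alpha f + \beta g)\,dw = \alpha\int_a^b f\,dw + \beta\int_a^b g\,dw.$$

Proof. This follows from the identity
$$S_w(\alpha f + \beta g, \mathcal{P}, \xi) = \alpha S_w(f, \mathcal{P}, \xi) + \beta S_w(g, \mathcal{P}, \xi)$$
and the linearity of the limit in (5.32), as is readily established by a standard argument.

5.10.4 Theorem. Let $w := \alpha u + \beta v$, where $\alpha, \beta \in \mathbb{R}$ and $u, v : [a, b] \to \mathbb{R}$ are bounded. If $f \in \mathcal{R}_a^b(u) \cap \mathcal{R}_a^b(v)$, then $f \in \mathcal{R}_a^b(w)$ and
$$\int_a^b f\,dw = \alpha\int_a^b f\,du + \beta\int_a^b f\,dv.$$

Proof. This follows from
$$S_w(f, \mathcal{P}, \xi) = \alpha S_u(f, \mathcal{P}, \xi) + \beta S_v(f, \mathcal{P}, \xi)$$
and the linearity of the limit in (5.32).

5.10.5 Theorem. Let $a < c < b$. If $f|_{[a,c]} \in \mathcal{R}_a^c(w)$ and $f|_{[c,b]} \in \mathcal{R}_c^b(w)$, then $f \in \mathcal{R}_a^b(w)$ and
$$\int_a^b f\,dw = \int_a^c f\,dw + \int_c^b f\,dw.$$

Proof. Given $\varepsilon > 0$, choose partitions $\mathcal{P}_\varepsilon'$ of $[a, c]$ and $\mathcal{P}_\varepsilon''$ of $[c, b]$ such that the following hold:
$$\left|\int_a^c f\,dw - S_w(f, \mathcal{P}', \xi')\right| < \varepsilon/2 \quad\text{for all refinements } \mathcal{P}' \text{ of } \mathcal{P}_\varepsilon' \text{ and all } \xi',$$
$$\left|\int_c^b f\,dw - S_w(f, \mathcal{P}'', \xi'')\right| < \varepsilon/2 \quad\text{for all refinements } \mathcal{P}'' \text{ of } \mathcal{P}_\varepsilon'' \text{ and all } \xi''.$$
Then $\mathcal{P}_\varepsilon := \mathcal{P}_\varepsilon' \cup \mathcal{P}_\varepsilon''$ is a partition of $[a, b]$ containing $c$. Moreover, if $\mathcal{P}$ is a refinement of $\mathcal{P}_\varepsilon$, then $\mathcal{P}' := \mathcal{P} \cap [a, c]$ and $\mathcal{P}'' := \mathcal{P} \cap [c, b]$ are refinements of $\mathcal{P}_\varepsilon'$ and $\mathcal{P}_\varepsilon''$, respectively. From
$$S_w(f, \mathcal{P}, \xi) = S_w(f, \mathcal{P}', \xi') + S_w(f, \mathcal{P}'', \xi'')$$
and the above inequalities we see that
$$\left|\int_a^c f\,dw + \int_c^b f\,dw - S_w(f, \mathcal{P}, \xi)\right| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
This establishes the existence of $\int_a^b f\,dw$ as well as the desired equality.

5.10.6 Example. Consider the floor function integrator $w(x) = \lfloor x \rfloor$. A slight modification of the argument in 5.10.2 shows that $\int_0^n f(x)\,d\lfloor x \rfloor$ exists iff $f$ is left continuous at the integers $1, 2, \ldots, n$, in which case $\int_{k-1}^k f(x)\,d\lfloor x \rfloor = f(k)$. For such a function, 5.10.5 implies that
$$\int_0^n f(x)\,d\lfloor x \rfloor = \sum_{k=1}^n \int_{k-1}^k f(x)\,d\lfloor x \rfloor = \sum_{k=1}^n f(k). \quad ♦$$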
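The reduction of $\int_0^n f\,d\lfloor x \rfloor$ to a finite sum in Example 5.10.6 can be observed on Riemann-Stieltjes sums directly. The sketch below is illustrative: the helper names, the uniform partitions (chosen so that every integer is a partition point), and the continuous integrand $f(x) = x^2$ are assumptions of this example.

```python
import math

def rs_sum(f, w, partition, tags):
    """Riemann-Stieltjes sum S_w(f, P, xi) = sum of f(xi_j) * (w(x_j) - w(x_{j-1}))."""
    return sum(
        f(t) * (w(x1) - w(x0))
        for x0, x1, t in zip(partition, partition[1:], tags)
    )

def floor_integrator_sums(f, n, steps):
    """Sums over [0, n] with `steps` subintervals per unit, tagged at left and right endpoints."""
    partition = [j / steps for j in range(n * steps + 1)]
    left = rs_sum(f, math.floor, partition, partition[:-1])
    right = rs_sum(f, math.floor, partition, partition[1:])
    return left, right

f = lambda x: x * x
left, right = floor_integrator_sums(f, n=5, steps=1000)
target = sum(k * k for k in range(1, 6))  # f(1) + ... + f(5) = 55

print(left, right, target)
```

Only the subintervals ending at an integer contribute, since $\lfloor x \rfloor$ is constant elsewhere. With right-endpoint tags the sum equals $\sum f(k)$, and with left-endpoint tags it differs by a small amount that shrinks as the partition is refined, reflecting the role of left continuity at the integers.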

The preceding example suggests that improper Riemann-Stieltjes integration could be used to provide a unified theory that includes both improper Riemann integrals and infinite series. This is indeed possible; however, it turns out that Lebesgue integration is a more efficient approach. Lebesgue theory on $\mathbb{R}^n$ is developed in Chapter 11.

The following theorem reveals a remarkable symmetry between integrand and integrator.

5.10.7 Integration by Parts Formula. If $f \in \mathcal{R}_a^b(w)$, then $w \in \mathcal{R}_a^b(f)$ and
$$\int_a^b f\,dw + \int_a^b w\,df = f(b)w(b) - f(a)w(a).$$

Proof. For any partition $\mathcal{P} = \{x_0 = a, x_1, \ldots, x_{n-1}, x_n = b\}$,
$$f(b)w(b) - f(a)w(a) = \sum_{j=1}^n f(x_j)w(x_j) - \sum_{j=1}^n f(x_{j-1})w(x_{j-1}) \quad\text{and}$$
$$S_f(w, \mathcal{P}, \xi) = \sum_{j=1}^n w(\xi_j)f(x_j) - \sum_{j=1}^n w(\xi_j)f(x_{j-1}).$$
Subtracting we obtain
$$f(b)w(b) - f(a)w(a) - S_f(w, \mathcal{P}, \xi) = \sum_{j=1}^n f(x_{j-1})[w(\xi_j) - w(x_{j-1})] + \sum_{j=1}^n f(x_j)[w(x_j) - w(\xi_j)] = S_w(f, \mathcal{Q}, \zeta),$$
where $\zeta = (a, x_1, x_1, x_2, x_2, \ldots, x_{n-1}, x_{n-1}, b)$ and $\mathcal{Q}$ is the refinement of $\mathcal{P}$ obtained by adding the coordinates of $\xi$ to $\mathcal{P}$.

FIGURE 5.10: The partition $\mathcal{Q}$.

Therefore,
$$\left|f(b)w(b) - f(a)w(a) - \int_a^b f\,dw - S_f(w, \mathcal{P}, \xi)\right| = \left|S_w(f, \mathcal{Q}, \zeta) - \int_a^b f\,dw\right|.$$
Since $f \in \mathcal{R}_a^b(w)$, the right side may be made arbitrarily small. Therefore, $\int_a^b w\,df$ exists and equals $f(b)w(b) - f(a)w(a) - \int_a^b f\,dw$.

The next result shows that under certain general conditions the Riemann-Stieltjes integral reduces to a Riemann integral.

5.10.8 Theorem. Let $f \in \mathcal{R}_a^b(w)$. If $w$ is continuously differentiable, then $fw' \in \mathcal{R}_a^b$ and
$$\int_a^b f\,dw = \int_a^b f(x)w'(x)\,dx.$$

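Proof. For any partition $\mathcal{P}$ of $[a, b]$ and any $\xi$,
\[
S_w(f, \mathcal{P}, \xi) - S(f w', \mathcal{P}, \xi) = \sum_{j=1}^n f(\xi_j)\,\Delta w_j - \sum_{j=1}^n f(\xi_j)\,w'(\xi_j)\,\Delta x_j.
\]
By the mean value theorem, for each $j$ there exists $t_j \in (x_{j-1}, x_j)$ such that $\Delta w_j = w(x_j) - w(x_{j-1}) = w'(t_j)\,\Delta x_j$. Therefore,
\[
S_w(f, \mathcal{P}, \xi) - S(f w', \mathcal{P}, \xi) = \sum_{j=1}^n f(\xi_j)\,[w'(t_j) - w'(\xi_j)]\,\Delta x_j. \tag{5.33}
\]
Let $|f| \le M$ on $[a, b]$. By uniform continuity of $w'$, given $\varepsilon > 0$, there exists a $\delta > 0$ such that
\[
|w'(x) - w'(y)| < \frac{\varepsilon}{2M(b - a)} \quad \text{whenever } |x - y| < \delta. \tag{5.34}
\]
Let $\mathcal{P}'_\varepsilon$ be a partition of $[a, b]$ with $\|\mathcal{P}'_\varepsilon\| < \delta$. From (5.33) and (5.34),
\[
|S_w(f, \mathcal{P}, \xi) - S(f w', \mathcal{P}, \xi)| \le \frac{\varepsilon}{2(b - a)} \sum_{j=1}^n \Delta x_j = \frac{\varepsilon}{2} \tag{5.35}
\]
for all refinements $\mathcal{P}$ of $\mathcal{P}'_\varepsilon$ and all $\xi$. Next, choose a partition $\mathcal{P}''_\varepsilon$ such that
\[
\left| \int_a^b f\,dw - S_w(f, \mathcal{P}, \xi) \right| < \varepsilon/2 \quad \text{for all } \xi \text{ and all refinements } \mathcal{P} \text{ of } \mathcal{P}''_\varepsilon. \tag{5.36}
\]
If $\mathcal{P}$ is a refinement of $\mathcal{P}'_\varepsilon \cup \mathcal{P}''_\varepsilon$, then both (5.35) and (5.36) hold, hence, by the triangle inequality,
\[
\left| \int_a^b f\,dw - S(f w', \mathcal{P}, \xi) \right| < \varepsilon.
\]
This shows that $f w' \in \mathcal{R}_a^b$ and establishes the equality.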
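The statements of 5.10.7 and 5.10.8 can be checked numerically on a simple case. The following Python sketch is only an illustration: the helper name `rs_sum`, the uniform partition with midpoint tags, and the choice $f(x) = x$, $w(x) = x^2$ on $[0, 1]$ are assumptions made here, not part of the text.

```python
def rs_sum(f, w, a, b, n):
    """Riemann-Stieltjes sum of f with respect to w on a uniform partition
    of [a, b] into n subintervals, tagged at the midpoints (an assumption)."""
    h = (b - a) / n
    total = 0.0
    for j in range(n):
        x0, x1 = a + j * h, a + (j + 1) * h
        xi = 0.5 * (x0 + x1)              # tag in [x0, x1]
        total += f(xi) * (w(x1) - w(x0))  # f(xi) * (increment of w)
    return total

f = lambda x: x          # integrand
w = lambda x: x * x      # continuously differentiable integrator
dw = lambda x: 2 * x     # derivative w'

lhs = rs_sum(f, w, 0.0, 1.0, 1000)                                 # approximates the integral of f dw
rhs = rs_sum(lambda x: f(x) * dw(x), lambda x: x, 0.0, 1.0, 1000)  # approximates the integral of f w' dx
parts = rs_sum(w, f, 0.0, 1.0, 1000)                               # approximates the integral of w df
```

With these choices `lhs` and `rhs` should both be close to $2/3$, and `lhs + parts` should be close to $f(1)w(1) - f(0)w(0) = 1$, in line with the integration by parts formula.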

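Monotone Increasing Integrators

If $w : [a, b] \to \mathbb{R}$ is monotone increasing, then the Riemann-Stieltjes integral may be characterized in terms of upper and lower sums, as in the Darboux theory. This fact will lead to an important existence theorem for integrators of bounded variation and continuous integrands.

Let $f : [a, b] \to \mathbb{R}$ be bounded and let $\mathcal{P}$ be a partition of $[a, b]$. Define the upper and lower Darboux–Stieltjes sums of $f$ with respect to $w$ by
\[
\overline{S}_w(f, \mathcal{P}) = \sum_{j=1}^n M_j\,\Delta w_j \quad \text{and} \quad \underline{S}_w(f, \mathcal{P}) = \sum_{j=1}^n m_j\,\Delta w_j,
\]
where
\[
M_j = M_j(f) := \sup_{x_{j-1} \le x \le x_j} f(x) \quad \text{and} \quad m_j = m_j(f) := \inf_{x_{j-1} \le x \le x_j} f(x).
\]
The upper and lower Darboux–Stieltjes integrals of $f$ with respect to $w$ are defined, respectively, by
\[
\overline{\int_a^b} f\,dw := \inf_{\mathcal{P}} \overline{S}_w(f, \mathcal{P}) \quad \text{and} \quad \underline{\int_a^b} f\,dw := \sup_{\mathcal{P}} \underline{S}_w(f, \mathcal{P}).
\]
As in the Darboux theory, if $\mathcal{Q}$ is a refinement of $\mathcal{P}$ then, because $w$ is increasing,
\[
\underline{S}_w(f, \mathcal{P}) \le \underline{S}_w(f, \mathcal{Q}) \le \underline{\int_a^b} f\,dw \le \overline{\int_a^b} f\,dw \le \overline{S}_w(f, \mathcal{Q}) \le \overline{S}_w(f, \mathcal{P}).
\]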

Here is the analog of 5.1.8 for Riemann–Stieltjes integrals.


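5.10.9 Theorem. The following statements are equivalent:

(a) $f \in \mathcal{R}_a^b(w)$.

(b) For each $\varepsilon > 0$, there exists a partition $\mathcal{P}_\varepsilon$ such that $\overline{S}_w(f, \mathcal{P}) - \underline{S}_w(f, \mathcal{P}) < \varepsilon$.

(c) $\displaystyle \underline{\int_a^b} f\,dw = \overline{\int_a^b} f\,dw$.

If these conditions hold, then
\[
\int_a^b f\,dw = \underline{\int_a^b} f\,dw = \overline{\int_a^b} f\,dw.
\]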

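Proof. That (b) and (c) are equivalent is proved exactly as in 5.1.8. Assume that (a) holds. Given $\varepsilon > 0$, choose a partition $\mathcal{P}_\varepsilon$ such that
\[
\left| \int_a^b f\,dw - S_w(f, \mathcal{P}, \xi) \right| < \varepsilon/3 \quad \text{for all refinements } \mathcal{P} \text{ of } \mathcal{P}_\varepsilon \text{ and all } \xi. \tag{5.37}
\]
For such a partition $\mathcal{P}$ and for each $j$, there exists a sequence $\{\xi_{j,k}\}_{k=1}^\infty$ in $[x_{j-1}, x_j]$ such that $\lim_k f(\xi_{j,k}) = M_j(f)$. It follows that
\[
\lim_k S_w(f, \mathcal{P}, \xi^k) = \overline{S}_w(f, \mathcal{P}), \quad \text{where } \xi^k = (\xi_{1,k}, \dots, \xi_{n,k}).
\]
From (5.37),
\[
\left| \int_a^b f\,dw - \overline{S}_w(f, \mathcal{P}) \right| \le \varepsilon/3.
\]
Similarly,
\[
\left| \int_a^b f\,dw - \underline{S}_w(f, \mathcal{P}) \right| \le \varepsilon/3.
\]
Part (b) now follows from the triangle inequality.

Now assume that (c) holds. Let $I$ denote the common value of the integrals in (c). Given $\varepsilon > 0$, choose partitions $\mathcal{P}'_\varepsilon$ and $\mathcal{P}''_\varepsilon$ such that
\[
I - \varepsilon < \underline{S}_w(f, \mathcal{P}'_\varepsilon) \quad \text{and} \quad \overline{S}_w(f, \mathcal{P}''_\varepsilon) < I + \varepsilon.
\]
The inequalities still hold if $\mathcal{P}'_\varepsilon$ and $\mathcal{P}''_\varepsilon$ are replaced by any refinement $\mathcal{P}$ of $\mathcal{P}_\varepsilon := \mathcal{P}'_\varepsilon \cup \mathcal{P}''_\varepsilon$. Thus
\[
-\varepsilon < \underline{S}_w(f, \mathcal{P}) - I \le S_w(f, \mathcal{P}, \xi) - I \le \overline{S}_w(f, \mathcal{P}) - I < \varepsilon.
\]
This shows that $f \in \mathcal{R}_a^b(w)$ and $\int_a^b f\,dw = I$.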


Integrators of Bounded Variation

Recall that a function of bounded variation may be expressed as the difference of two monotone increasing functions (5.9.7). This, together with 5.10.9, allows for a simple proof of the following existence theorem.

5.10.10 Theorem. If $f : [a, b] \to \mathbb{R}$ is continuous and $w : [a, b] \to \mathbb{R}$ has bounded variation, then $f \in \mathcal{R}_a^b(w)$.

Proof. By the remark preceding the theorem and by 5.10.4, we may assume that $w$ is increasing. By uniform continuity of $f$, given $\varepsilon > 0$, there exists a $\delta > 0$ such that
\[
|f(x) - f(y)| < \frac{\varepsilon}{w(b) - w(a) + 1} \quad \text{for all } x, y \text{ with } |x - y| < \delta.
\]
Let $\mathcal{P}_\varepsilon$ be a partition with $\|\mathcal{P}_\varepsilon\| < \delta$. For any refinement $\mathcal{P}$ of $\mathcal{P}_\varepsilon$, $\|\mathcal{P}\| < \delta$, hence
\[
M_j(f) - m_j(f) \le \frac{\varepsilon}{w(b) - w(a) + 1}.
\]
Therefore,
\[
\overline{S}_w(f, \mathcal{P}) - \underline{S}_w(f, \mathcal{P}) = \sum_{j=1}^n [M_j(f) - m_j(f)]\,\Delta w_j \le \varepsilon,
\]
which shows that $f \in \mathcal{R}_a^b(w)$.

The conclusion of the theorem does not necessarily hold if $w$ fails to have bounded variation, even if $w$ is continuous:

5.10.11 Example. Let $f = w = f_{1/2}$, where $f_\alpha$ is defined as in Example 5.9.3. We show that $\int_0^1 f\,dw$ does not exist. Referring to that example, let $\mathcal{P}_\varepsilon$ be the partition
\[
\varepsilon < a_p < b_p < a_{p-1} < \cdots < b_{k+1} < a_k < b_k < \cdots < b_{q+1} < a_q < b_q < 1
\]
of $[\varepsilon, 1]$, and let $\xi$ consist of the left endpoints of $\mathcal{P}_\varepsilon$. The first subinterval is $[\varepsilon, a_p]$, so
\[
S_w(f, \mathcal{P}_\varepsilon, \xi) = f(\varepsilon)[w(a_p) - w(\varepsilon)] + f(b_q)[w(1) - w(b_q)] + \sum_{k=q}^{p} f(a_k)[w(b_k) - w(a_k)] + \sum_{k=q}^{p-1} f(b_{k+1})[w(a_k) - w(b_{k+1})].
\]
Since $f_{1/2}(b_k) = 0$ and $f_{1/2}(a_k) = \sqrt{a_k}$,
\[
S_w(f, \mathcal{P}_\varepsilon, \xi) = f(\varepsilon)\left[\sqrt{a_p} - w(\varepsilon)\right] - \sum_{k=q}^{p} a_k.
\]
Since the sums diverge as $\varepsilon \to 0$, $\lim_{\varepsilon \to 0} S_w(f, \mathcal{P}_\varepsilon, \xi) = -\infty$. ♦

Chapter 6 Numerical Infinite Series

An infinite series is the limit of a sequence of expanding finite sums. The terms of these sums may be real numbers or functions. In this chapter we examine the convergence behavior of series of the former type; series whose terms are functions are treated in the next chapter. In the first section, we give examples of series that may be summed, that is, for which an explicit numerical value may be calculated. The remaining sections describe various tests for convergence of general series. Additional methods of summing series may be found in Section 7.4.

6.1 Definition and Examples

6.1.1 Definition. Let $\{a_n\}$ be a sequence of real numbers. The various symbols
\[
\sum_{n=1}^\infty a_n = \sum_n a_n = \sum a_n = a_1 + a_2 + \cdots + a_n + \cdots
\]
represent what is called an infinite series with $n$th term $a_n$ or, simply, a series. The $n$th partial sum of the series is defined by
\[
s_n = \sum_{k=1}^n a_k.
\]
The series is said to converge if the sequence of partial sums converges, in which case we write
\[
\sum a_n = \lim_n s_n
\]
and call $\sum a_n$ the sum of the series. If the sequence $\{s_n\}$ diverges, then the series is said to diverge. ♦

6.1.2 Remark. A series may begin with an index other than 1. In this regard, note that, because
\[
s_n = s_{m-1} + \sum_{k=m}^n a_k, \quad n \ge m > 1,
\]
the series $s := \sum_{n=1}^\infty a_n$ converges iff $\sum_{n=m}^\infty a_n$ converges. In this case the "tail end" of the series tends to zero:
\[
\lim_{m \to +\infty} \sum_{n=m}^\infty a_n = \lim_{m \to +\infty} (s - s_{m-1}) = 0. \qquad ♦
\]

6.1.3 Example. Using the definition $e := \lim_{n \to \infty} (1 + 1/n)^n$ (see 2.2.4), we show that
\[
e = \sum_{n=0}^\infty \frac{1}{n!}.
\]
First, since the partial sums $s_n := \sum_{k=0}^n 1/k!$ increase, the limit $s := \lim_n s_n$ exists in $\overline{\mathbb{R}}$. From the calculations in 2.2.4,
\[
(1 + 1/n)^n = 2 + \sum_{k=2}^n \frac{1}{k!}(1 - 1/n)(1 - 2/n) \cdots (1 - (k-1)/n) \le s_n.
\]
Letting $n \to \infty$, we obtain $e \le s$. On the other hand, if $n > m$, then
\[
(1 + 1/n)^n > 2 + \sum_{k=2}^m \frac{1}{k!}(1 - 1/n)(1 - 2/n) \cdots (1 - (k-1)/n).
\]
Letting $n \to \infty$, we see that $e \ge s_m$. Letting $m \to \infty$ yields $e \ge s$. ♦

6.1.4 Example. The geometric series
\[
\sum_{n=0}^\infty a r^n, \quad \text{where } a, r \in \mathbb{R} \text{ and } a \ne 0,
\]
converges iff $|r| < 1$, in which case
\[
\sum_{n=0}^\infty a r^n = \frac{a}{1 - r}.
\]
This follows from the calculation
\[
s_n = \sum_{k=0}^n a r^k = a\,\frac{1 - r^{n+1}}{1 - r}, \quad r \ne 1. \qquad ♦
\]
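The closed form for the partial sums in 6.1.4 is easy to check by direct summation. The following Python sketch is only an illustration; the function names and the sample values $a = 3$, $r = 1/2$ are assumptions chosen here.

```python
def geometric_partial_sum(a, r, n):
    """Return s_n = sum_{k=0}^{n} a * r**k by direct summation."""
    return sum(a * r**k for k in range(n + 1))

def geometric_closed_form(a, r, n):
    """Closed form a * (1 - r**(n+1)) / (1 - r), valid only for r != 1."""
    return a * (1 - r ** (n + 1)) / (1 - r)

a, r = 3.0, 0.5                          # sample values with |r| < 1
s_50 = geometric_partial_sum(a, r, 50)   # partial sum s_50
limit = a / (1 - r)                      # the claimed sum a / (1 - r)
```

For $|r| < 1$ the partial sums approach `limit` geometrically fast; for $|r| \ge 1$ the terms $ar^n$ do not tend to zero, so no such limit exists.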

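6.1.5 Example. For $m \in \mathbb{N}$,
\[
\sum_{n=1}^\infty \frac{1}{n(m + n)} = \frac{1}{m} \sum_{k=1}^m \frac{1}{k}. \tag{6.1}
\]
To see this, we use partial fractions: For $n > m$,
\[
m s_n = \sum_{k=1}^n \frac{m}{k(m + k)} = \sum_{k=1}^n \left( \frac{1}{k} - \frac{1}{k + m} \right) = \sum_{k=1}^m \frac{1}{k} - \sum_{k=n+1}^{n+m} \frac{1}{k}. \tag{6.2}
\]
The second sum on the extreme right in (6.2) is less than $m/(n + 1)$ and hence tends to zero as $n \to \infty$. ♦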
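Identity (6.2) can be verified exactly with rational arithmetic. The sketch below is an illustration only; the function names `partial_sum`, `rhs_62`, and `limit_61` are hypothetical labels introduced here.

```python
from fractions import Fraction

def partial_sum(m, n):
    """s_n = sum_{k=1}^{n} 1 / (k (m + k)), computed exactly."""
    return sum(Fraction(1, k * (m + k)) for k in range(1, n + 1))

def rhs_62(m, n):
    """Right side of (6.2): sum_{k=1}^{m} 1/k - sum_{k=n+1}^{n+m} 1/k."""
    head = sum(Fraction(1, k) for k in range(1, m + 1))
    tail = sum(Fraction(1, k) for k in range(n + 1, n + m + 1))
    return head - tail

def limit_61(m):
    """Right side of (6.1): (1/m) * sum_{k=1}^{m} 1/k."""
    return sum(Fraction(1, k) for k in range(1, m + 1)) / m
```

For $n > m$, `m * partial_sum(m, n)` should equal `rhs_62(m, n)` exactly, and the gap `limit_61(m) - partial_sum(m, n)` is positive and bounded by $1/(n + 1)$, matching the tail estimate in the example.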


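The series in (6.1) is an example of a telescopic series, the name referring to the cancellations taking place in (6.2).

6.1.6 Theorem. Let $\{a_n\}$ and $\{b_n\}$ be sequences and let $\alpha, \beta \in \mathbb{R}$. If $\sum a_n$ and $\sum b_n$ converge, then $\sum (\alpha a_n + \beta b_n)$ converges and
\[
\sum (\alpha a_n + \beta b_n) = \alpha \sum a_n + \beta \sum b_n. \tag{6.3}
\]
Proof. Let $s_n = \sum_{k=1}^n a_k$ and $t_n = \sum_{k=1}^n b_k$. Then $\alpha s_n + \beta t_n$ is the $n$th partial sum of the series $\sum (\alpha a_n + \beta b_n)$ and
\[
\lim_{n \to \infty} (\alpha s_n + \beta t_n) = \alpha \lim_{n \to \infty} s_n + \beta \lim_{n \to \infty} t_n,
\]
which is (6.3).

6.1.7 Example. By 6.1.6 and 6.1.4,
\[
\sum_{n=1}^\infty \frac{2 \cdot 3^{n+1} + 3 \cdot 2^{n-1}}{6^n} = 6 \sum_{n=1}^\infty \frac{1}{2^n} + \frac{3}{2} \sum_{n=1}^\infty \frac{1}{3^n} = 6 \cdot \frac{1/2}{1 - 1/2} + \frac{3}{2} \cdot \frac{1/3}{1 - 1/3} = 6.75. \qquad ♦
\]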

The following result is a test for divergence. It implies that a series whose $n$th term does not tend to zero must diverge. For example, the series $\sum \sin n$ and $\sum 2^{-1/n}$ diverge.

6.1.8 Proposition. If $\sum a_n$ converges, then $a_n \to 0$.

Proof. $a_n = s_n - s_{n-1} \to s - s = 0$.

The converse of 6.1.8 is false:

6.1.9 Example. The harmonic series $\sum_{n=1}^\infty 1/n$ diverges. Indeed, if $s_n$ is the $n$th partial sum of the series, then for all $n$
\[
s_{2^n} - s_{2^{n-1}} = \frac{1}{2^{n-1} + 1} + \cdots + \frac{1}{2^{n-1} + 2^{n-1}} > \frac{2^{n-1}}{2^{n-1} + 2^{n-1}} = \frac{1}{2},
\]
hence $\{s_n\}$ is not a Cauchy sequence.

It is of interest to note that, while the sequence $s_n$ diverges, the sequence $t_n := s_n - \ln n$ converges. To see this, observe first that
\[
\ln n = \int_1^n \frac{dx}{x} = \sum_{k=1}^{n-1} \int_k^{k+1} \frac{dx}{x} < \sum_{k=1}^{n} \int_k^{k+1} \frac{dx}{k} = \sum_{k=1}^{n} \frac{1}{k} = s_n,
\]
so $t_n > 0$. Furthermore,
\[
\ln(n + 1) - \ln n = \int_n^{n+1} \frac{dx}{x} > \int_n^{n+1} \frac{1}{n + 1}\,dx = \frac{1}{n + 1},
\]
hence
\[
t_n - t_{n+1} = \ln(n + 1) - \ln n + s_n - s_{n+1} = \ln(n + 1) - \ln n - \frac{1}{n + 1} > 0.
\]
Therefore, $\{t_n\}$ is bounded below and decreasing, hence converges. The number
\[
\gamma := \lim_{n \to \infty} t_n = \lim_{n \to \infty} \left( \sum_{k=1}^n \frac{1}{k} - \ln n \right)
\]
is known as Euler's constant. Its value to eleven decimal places is $.57721566490\ldots$. As of this writing, it is not known whether $\gamma$ is irrational. Note that since $s_n = t_n + \ln n$, the convergence of $\{t_n\}$ provides another proof that the harmonic series diverges. ♦
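The monotone convergence of $t_n = s_n - \ln n$ in 6.1.9 can be observed numerically. The following Python sketch is an illustration only; the function name `t` and the sample indices are choices made here, and floating-point summation is assumed to be accurate enough at this scale.

```python
import math

def t(n):
    """t_n = s_n - ln n, where s_n is the nth harmonic partial sum."""
    s_n = sum(1.0 / k for k in range(1, n + 1))
    return s_n - math.log(n)

# Sample values of t_n; by 6.1.9 they decrease toward Euler's constant.
values = [t(n) for n in (10, 100, 1000, 10000)]
```

The computed values decrease and stay above $.5772156649$, consistent with $\{t_n\}$ being decreasing and bounded below; the convergence is slow, so many terms are needed for even a few correct digits.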

Exercises 1. Let m ∈ N. Sum the series (a) S (c) S

P∞

n=1

m2n+1 . (m + 1)2n−1 n+1 2 +2 ln n+1 . 2 +1

an , where an = (−1)n+1 m3n+1 . (m + 1)3n−1 1 (d) p √ √ . n(n + 1)( n + 1 + n) 12 (f) . (n + 1)(n + 2)(n + 3) (−1)n (h) . (n + 1)(n + 3)(n + 5) (b)

(−1)n , m even. n(n + m) 1 . (g) S (n + 1)(n + 3)(n + 5) p √ m n + n(n + 1) S p (i) ln √ . m n + 1 + n(n + 1) 1 (k) . √ √ √ (n + m) n + n n + m (−1)n (n + m + 1) (m) . (2n + 1)(2n + 4m + 3) (e) S

n2 + 4n + 4 . n2 + 4n + 3 18 . (l) n(n + 1)(n + 2)(n + 3) (−1)n (2n + 2m + 1) (n) S . n(n + 2m + 1) (j)

ln

P∞ 2. Let 0 < r < 1 and m ∈ N. Sum the series n=0 an if an = (a)S rn cos (nπ)/2 . (b) (−1)bn/3c rn . (c) (−1)bn/mc rn . P∞ P∞ 3. Given thatPe = n=0 1/n! and e−1 = n=0 (−1)n 1/n!, find the value of ∞ the series n=0 an if an = (a) S

(2n + 3)3 . n!

(b)

4. Let p > 0 and sn = sn / ln ln n → +∞.

1 1 n . (c) S . (d) . (2n)! (2n + 1)! (2n + 1)!

Pn

k=1

(e)

n . (2n)!

1/k. Prove that sn /np → 0, sn / ln n → 1, and


5. Let γ denote Euler’s constant (6.1.9). Prove that n X √ γ 1 S (a) − ln n → ln 2 + . 2k − 1 2 (b)

k=1 n X

k=1

4k − ln n → ln 4 + γ − 1. (2k − 1)(2k + 1)

∞ X

1 = ln 4 − 1. n(2n − 1)(2n + 1) n=1 P an converges iff for each ε > 0 there exists an index N 6. Prove that such that n+p X an < ε for all n ≥ N and p ≥ 1. (c)

k=n

7. Suppose that an tends monotonically to 0 and that s := converges.

P∞

n=1

an

(a) Prove that nan → 0. (b) Let p ∈ N. Show that t := in terms of s.

P∞

n=1

n(an −an+p ) converges and express t

Suggestion. For (b), consider first the case p = 1. P P 8.S Let an and bn be convergent series with bn > 0 for all n. Suppose that L := limn (an /bn ) exists in R. Prove that P∞ k=n ak lim P∞ = L. n k=n bk P∞ P∞ 2 2 −1 Use this to calculate limn . k=n sin(3/k ) k=n 1/k 9. For a sequence {cn }, define ∆cn = cn+1 − cn . Prove the following discrete analog of l’Hospital’s rule: Let {an } and {bn } be sequences with {bn } strictly monotone. Suppose that either (a) an → 0 and bn → 0, or (b) bn → ±∞. Then an ∆an lim = lim , n bn n ∆bn provided that the limit on the right exists in R. P 10. Let {an } and Pn {bn } be sequences Pnwith bn > 0 for all n and n bn = +∞. Set An = k=1 ak and Bn = k=1 bk . Use Exercise 9 to prove that lim n

an An = lim , n bn Bn

provided that the limit on the right exists in R. Use this to calculate the limits of

168

A Course in Real Analysis n X

(a)

n X

sin(1/k)

k=1 n X

,

(b)

1/k

k=1

n X

ln k

k=1 n X

,

(c)

kp

k=1

rk

k=1 n X

, kp

k=1

where r, p > 0. 11. Let {bn }∞ n=1 be a sequence obtained P by rearranging finitely P many terms of a sequence {an }∞ bn converges in R iff an converges n=1 . Show that in R, in which case the series are equal. 12.S Let {bk } be a sequence obtained from a sequence {an } by grouping, that is, bk = ank−1 +1 + ank−1 +2 + · · · + ank , k = 1, 2, . . . , where {nk }k is a strictly increasing sequence of nonnegative integers P P and n0 = 0. Show that if n an converges in R, then so does k bk and the series are equal. Show that the converse is true if an ≥ 0 for all sufficiently large n. What if the terms an change sign infinitely often? P∞ 13. LetP{an } be decreasing and nonnegative. Prove that n=1 an converges ∞ iff k=0 2k a2k converges. Hint. Set sn =

n X

aj and tk =

j=1

k X

2j a2j .

j=0

Show that sn ≤ tk if n ≤ 2k+1 − 1 and sn ≥ tk /2 if n ≥ 2k . 14. (Decimal representation of real numbers). Prove that every real number x ≥ 0 has a decimal representation x = bN bN −1 · · · b0 .a1 a2 · · · :=

N X

bn 10n +

n=0

∞ X

an 10−n ,

n=1

where the digits bn , an are integers from 0 to 9. Hint. By Exercise 1.5.16, it may be assumed that x ∈ [0, 1). Prove by induction that for each n there exist aj ∈ {0, 1, . . . 9} and xn ∈ [0, 10−n ) such that n X x = xn + aj 10−j = xn + (.a1 a2 · · · an ). j=1

15.S Call a decimal representation bN bN −1 · · · b0 .a1 a2 · · · standard if no index n exists such that ak = 9 for all k ≥ n. Prove that every real number has a unique standard decimal representation.


16. A real number x ≥ 0 is a repeating decimal if it has decimal representation of the form x = bN bN −1 · · · b0 .a1 a2 · · · am am+1 am+2 · · · am+k , where the upper bar indicates that the block repeats forever. (For example, 61/495 = .12323 · · · = .123.) Prove that every repeating decimal is rational. 17. Prove the converse of Exercise 16, that is, every rational number p/q is a repeating Conclude that if f : N 7→ N is strictly increasing, P −fdecimal. then 10 (n) is irrational. Hint. By the division algorithm you may assume that 1 ≤ p < q. Begin by showing that if p/q = .a1 a2 · · · , then for each n p rn = .a1 a2 · · · an + n , where rn ∈ {0, 1, . . . , q − 1}, q 10 q and use this to show that qan = 10rn−1 − rn , where r0 := p.

6.2 Series with Nonnegative Terms

There are a variety of tests for the convergence of series with nonnegative terms. The most basic of these is the following theorem.

6.2.1 Theorem. If $a_n \ge 0$ for all $n$, then the series $\sum a_n$ converges in $\mathbb{R}$ iff its partial sums are bounded.

Proof. Since the terms of the series are nonnegative, the sequence of partial sums is increasing. The assertion therefore follows from the monotone sequence theorem (2.2.2).

6.2.2 Remark. By 6.1.2, the theorem is still valid if the inequality $a_n \ge 0$ holds only eventually, that is, for all $n \ge$ some $m$. Many of the results in this chapter have similar extensions. Rather than make these explicit, we leave the straightforward formulations to the reader. ♦

6.2.3 Example. Let $a_n, b_n \ge 0$ for all $n$ and suppose that $\sum a_n$ and $\sum b_n$ converge. By the Cauchy–Schwarz inequality (1.6.3(e)),
\[
\sum_{k=1}^n \sqrt{a_k b_k} \le \left( \sum_{k=1}^n a_k \right)^{1/2} \left( \sum_{k=1}^n b_k \right)^{1/2}.
\]
Since the sums on the right are bounded, so are the sums on the left. Therefore, $\sum \sqrt{a_n b_n}$ converges. ♦


The following test relates the convergence of a series to that of an improper integral.

6.2.4 Integral Test. Let $f$ be decreasing, positive, and locally integrable on the interval $[1, \infty)$. Then the series $\sum_{n=1}^\infty f(n)$ converges iff the improper integral $\int_1^\infty f$ converges. Moreover, for every $n \in \mathbb{N}$
\[
0 \le s - s_n \le \int_n^\infty f(x)\,dx. \tag{6.4}
\]
Proof. For each $n \in \mathbb{N}$ let
\[
s_n = \sum_{k=1}^n f(k) \quad \text{and} \quad t_n = \int_1^n f.
\]
For each $k \in \mathbb{N}$ and $x \in [k, k + 1]$, $f(k + 1) \le f(x) \le f(k)$, hence
\[
f(k + 1) \le \int_k^{k+1} f \le f(k)
\]
and so
\[
s_n - f(1) = \sum_{k=2}^n f(k) = \sum_{k=1}^{n-1} f(k + 1) \le \sum_{k=1}^{n-1} \int_k^{k+1} f = t_n \le \sum_{k=1}^{n-1} f(k) = s_{n-1}.
\]
Therefore, $\{s_n\}$ is bounded iff $\{t_n\}$ is bounded. The first assertion of the theorem now follows from 6.2.1. Now observe that for $m > n$,
\[
0 \le s_m - s_n = \sum_{k=n+1}^m f(k) = \sum_{k=n}^{m-1} f(k + 1) \le \sum_{k=n}^{m-1} \int_k^{k+1} f = \int_n^m f.
\]
Letting $m \to +\infty$ yields (6.4).

Inequality (6.4) allows one to estimate the error made by approximating $s$ by a partial sum $s_n$.

6.2.5 Example. ($p$-series). By 5.7.3(a), $\int_1^\infty 1/x^p\,dx$ converges iff $p > 1$. Therefore, the same is true of the series $s := \sum_{n=1}^\infty 1/n^p$. Furthermore, if $p > 1$, then
\[
0 \le s - s_n \le \int_n^\infty x^{-p}\,dx = \frac{1}{(p - 1)n^{p-1}}.
\]
Thus if the partial sum $s_n$ is to agree with $s$ in, say, the first 10 decimal places, then $n$ should be chosen so that $(p - 1)n^{p-1} > 10^{10}$. ♦

6.2.6 Comparison Test. Let $0 \le a_n \le b_n$ for all $n$. If $\sum b_n$ converges, then so does $\sum a_n$.

Proof. The partial sums of $\sum b_n$ are bounded and dominate those of $\sum a_n$, hence the assertion follows from 6.2.1.

6.2.7 Limit Comparison Test. Let $a_n, b_n > 0$ for all $n$.

(a) If $\overline{r} := \limsup (a_n/b_n) < +\infty$ and $\sum b_n$ converges, then $\sum a_n$ converges.

(b) If $\underline{r} := \liminf (a_n/b_n) > 0$ and $\sum a_n$ converges, then $\sum b_n$ converges.

(c) If $r := \lim (a_n/b_n)$ exists and $r \in (0, +\infty)$, then $\sum b_n$ converges iff $\sum a_n$ converges.

Proof. For (a), let $r \in (\overline{r}, +\infty)$ and choose $N$ so that $\sup_{n \ge N} a_n/b_n < r$. Then $a_n < b_n r$ for every $n \ge N$, hence the conclusion follows from the comparison test and 6.2.2. Part (b) follows similarly by choosing $r \in (0, \underline{r})$ and then $N$ so that $\inf_{n \ge N} a_n/b_n > r$. Part (c) follows from (a) and (b).

6.2.8 Examples. (a) The series
\[
\sum_n \frac{2^n + n^3}{3^n + n^2}
\]
converges by comparison with the convergent series $\sum_n (2/3)^n$, since
\[
\frac{2^n + n^3}{3^n + n^2}\,(2/3)^{-n} = \frac{1 + n^3/2^n}{1 + n^2/3^n} \to 1.
\]
(b) The series
\[
\sum_n \frac{\sqrt{cn + d} - \sqrt{cn}}{n + 1}, \quad c, d > 0,
\]
converges by comparison with the convergent series $\sum_n n^{-3/2}$, since
\[
\frac{\sqrt{cn + d} - \sqrt{cn}}{n + 1}\,n^{3/2} = \frac{d\,n^{3/2}}{(n + 1)(\sqrt{cn + d} + \sqrt{cn})} \to \frac{d}{2\sqrt{c}}.
\]
♦
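The limit used in 6.2.8(a) can be observed numerically. The following Python sketch is an illustration only; the function names `a` and `b` and the sample indices are choices made here.

```python
def a(n):
    """General term (2^n + n^3) / (3^n + n^2) of the series in 6.2.8(a)."""
    return (2**n + n**3) / (3**n + n**2)

def b(n):
    """General term (2/3)^n of the comparison series."""
    return (2 / 3) ** n

# Ratios a_n / b_n, which by 6.2.8(a) tend to 1.
ratios = [a(n) / b(n) for n in (10, 20, 40, 80)]
```

The ratios approach 1 quickly, so part (c) of the limit comparison test applies with $r = 1$ and the two series converge or diverge together.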

6.2.9 Ratio Test. Let $a_n > 0$ for all $n$.

(a) If $\overline{r} := \limsup_n \dfrac{a_{n+1}}{a_n} < 1$, then $\sum a_n$ converges.

(b) If $\underline{r} := \liminf_n \dfrac{a_{n+1}}{a_n} > 1$, then $\sum a_n$ diverges.

Proof. (a) Let $r \in (\overline{r}, 1)$ and choose $N$ so that $\sup_{n \ge N} a_{n+1}/a_n < r$. For $n > N$
\[
a_n < a_{n-1} r < a_{n-2} r^2 < \cdots < a_N r^{n-N},
\]
so $\sum a_n$ converges by the comparison test.

(b) If $\underline{r} > 1$ there exists $N$ such that $\inf_{n \ge N} a_{n+1}/a_n > 1$. Therefore,
\[
a_n > a_{n-1} > a_{n-2} > \cdots > a_N > 0, \quad n > N,
\]
so $a_n$ cannot converge to zero. Therefore, $\sum a_n$ diverges.

6.2.10 Examples. (a) Let $a_n$ denote the general term of the series
\[
\sum_{n=1}^\infty \frac{8 \cdot 14 \cdot 20 \cdots (6n + 2)}{6 \cdot 11 \cdot 16 \cdots (5n + 1)}\,c^n,
\]
where $c > 0$. Then
\[
\frac{a_{n+1}}{a_n} = \frac{6n + 8}{5n + 6}\,c \to \frac{6}{5}\,c,
\]
hence the series converges if $c < 5/6$ and diverges if $c > 5/6$. If $c = 5/6$, then
\[
a_n = \frac{8 \cdot 14 \cdot 20 \cdots (6n + 2)\,5^n}{6 \cdot 11 \cdot 16 \cdots (5n + 1)\,6^n} = \frac{(1 + 1/3)(2 + 1/3) \cdots (n + 1/3)}{(1 + 1/5)(2 + 1/5) \cdots (n + 1/5)} > 1,
\]
so the series diverges in this case as well.

(b) For the series
\[
\sum_{n=1}^\infty (n!)^p\,r^{n^2}, \quad r > 0,\ p \in \mathbb{R},
\]
the ratios are
\[
\frac{a_{n+1}}{a_n} = (n + 1)^p\,r^{2n+1},
\]
hence the series converges if $r < 1$ and diverges if $r > 1$. (If $r = 1$, the ratios reduce to $(n + 1)^p$, and the series converges iff $p < 0$.)

(c) For the series
\[
\sum_{n=2}^\infty \frac{2^n \ln^2 n}{n!},
\]
we have, for $n \ge 3$,
\[
\frac{a_{n+1}}{a_n} = \frac{2 \ln^2(n + 1)}{(n + 1) \ln^2 n} \le \frac{2 \ln^2(n + 1)}{n + 1} \to 0,
\]
hence the series converges.

♦

6.2.11 Root Test. Let $a_n \ge 0$ for all $n$ and set $\rho := \limsup_n a_n^{1/n}$.

(a) If $\rho < 1$, then $\sum a_n$ converges.

(b) If $\rho > 1$, then $\sum a_n$ diverges.

Proof. (a) Let $r \in (\rho, 1)$ and choose $N$ such that $\sup_{n \ge N} a_n^{1/n} < r$. Then $a_n < r^n$ for all $n \ge N$, hence $\sum a_n$ converges by the comparison test.

(b) By 2.4.2, there exists a subsequence $a_{n_k}^{1/n_k} \to \rho$. Then, for all sufficiently large $k$, $a_{n_k} > 1$, hence the series diverges.

6.2.12 Example. For the series
\[
\sum_{n=1}^\infty \left( a + (-1)^n b \right)^n, \quad \text{where } a > b > 0,
\]
$\limsup_n a_n^{1/n} = a + b$, hence the series converges if $a + b < 1$ and diverges if $a + b > 1$. If $a + b = 1$, $a_n \not\to 0$ so the series diverges in this case as well. ♦

6.2.13 Remark. No conclusion regarding the convergence of the series $\sum a_n$ in 6.2.9 and 6.2.11 can be inferred from the relations $\overline{r} \ge 1$, $\underline{r} \le 1$, or $\rho = 1$. Indeed, the series $\sum 1/n^2$ and $\sum 1/n$ satisfy $\underline{r} = \overline{r} = \rho = 1$, yet the first series converges while the second diverges. ♦

In Section 6.3, we consider more refined tests that can detect convergence or divergence in cases where the ratio or root test fails. Here's an example:

6.2.14 Example. Let $a_n$ denote the $n$th term of the series
\[
\sum_{n=1}^\infty \left( \sqrt{an + b\sqrt{n}} - \sqrt{an} \right)^n,
\]
where $a, b > 0$. Then
\[
a_n^{1/n} = \sqrt{an + b\sqrt{n}} - \sqrt{an} = \frac{b\sqrt{n}}{\sqrt{an + b\sqrt{n}} + \sqrt{an}} \to \frac{b}{2\sqrt{a}},
\]
hence the series $\sum a_n$ converges if $b^2 < 4a$ and diverges if $b^2 > 4a$. If $b^2 = 4a$, the root test fails but the log test (6.3.4) shows that the series converges in this case. (Exercise 6.3.11.) ♦

6.2.15 Remark. By Exercise 2.4.12, if $a_n > 0$ for all $n$, then
\[
\liminf_n \frac{a_{n+1}}{a_n} \le \liminf_n a_n^{1/n} \le \limsup_n a_n^{1/n} \le \limsup_n \frac{a_{n+1}}{a_n}.
\]
This shows that if the ratio test determines convergence or divergence conclusively, then so does the root test. It also suggests that the root test may be effective when the ratio test fails. ♦

6.2.16 Example. Let $a_n = s^n \delta_{n-1} + t^n \delta_n$, where $\delta_n = \frac{1}{2}[1 + (-1)^n]$ and $0 < s < t < 1$. Then
\[
a_n = \begin{cases} s^n & \text{if } n \text{ is odd,} \\ t^n & \text{if } n \text{ is even,} \end{cases}
\]
so the ratios $a_{n+1}/a_n$ are $s^{n+1}/t^n$ or $t^{n+1}/s^n$, and the roots $a_n^{1/n}$ are $s$ or $t$, depending on the parity of $n$. Therefore, $\underline{r} = 0$, $\overline{r} = +\infty$ and $\rho = t$, which shows that the root test detects the convergence of $\sum a_n$ while the ratio test does not. ♦
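The contrast between the two tests in 6.2.16 shows up clearly in a short computation. The following Python sketch is an illustration only; the function name `a` and the sample values $s = 0.3$, $t = 0.6$ are assumptions chosen here.

```python
def a(n, s=0.3, t=0.6):
    """Term of Example 6.2.16: s**n for odd n, t**n for even n."""
    return s**n if n % 2 == 1 else t**n

ns = range(1, 41)
ratios = [a(n + 1) / a(n) for n in ns]   # a_{n+1} / a_n: oscillates wildly
roots = [a(n) ** (1.0 / n) for n in ns]  # a_n^{1/n}: alternates between s and t
```

The ratios swing between very small and very large values, so neither hypothesis of the ratio test holds, while the roots never exceed $t < 1$, so the root test gives convergence.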

Exercises 1.S Determine whether the series n! . 3 · 5 · · · (2n + 1) 4n n! (d) . 5 · 8 · · · (3n + 2) (a)

P

an converges or diverges, where an =

3 · 5 · · · (2n + 1) . (2n + 1)! 2 · 4 · · · (2n) (e) . 4 · 7 · · · (3n + 1) (b)

3 · 6 · · · (3n) . 3 · 5 · · · (2n + 1) 4 · 7 · · · (3n + 1) (f) . 5 · 9 · · · (4n + 1) (c)

174

A Course in Real Analysis P 2. Determine whether the series an converges or diverges, where an = (a) S (d) S

n3 . 2n n! . nn

2

ln n . n1.1 1 (f) 1+1/n . n

(b) (1 + r/n)n , r > 0. (c) (e) (n1/n − r)n .

2n 1 . (h) . n(ln n)(ln ln n)p n! √ rn , r 6= ±1. (k) (j) S sin2 (1/ n). 1 − rn n + sin n n + ln n (m) S 3 . (n) r . n + sin n n ln n n! 3n n! (q) . (p) S n . n (1.1)n3 1 1 (s) S ln n . (t) ln n . 2 3 3 3n + 4n . (w) (1 − r/n)n , r > 0. (v) S n 8 − 6n

(g) S

(i) sin2 (1/n). 1 , r 6= ±1. (1 − rn )2 1 (o) r . n ln n n 1 + an (r) , a, b > 0. 1 + bn (l)

(u) rsin n , r > 0. (x) 1/rln n , r > 0.

P P 3. Let an > 0 for all n and suppose that an diverges. Prove that an bn diverges for all sequences {bn } with lim inf n bn > 0. P∞ 4. Let bn → p > 0. Prove that n=1 n−bn converges if p > 1 and diverges if p < 1. Give anP example of a sequence {bn } with bn > 1 for all n and bn ↓ 1 such that n−bn diverges. P P 1/n 5.S Let an > 0 for all n. Prove that an converges iff n an converges. P∞ 6. Find all values of a, b, p, q > 0 for which n=1 an converges if an = lnp n 1 . (c) . q q n n lnp n −1 n Y (n + 1)p − np qp 1/q p S n/2 (d) (n + 1) − n . (e) . (f) p n! pj + 1 . nq j=1 n a + np 1 + anp 1 + anp (g) S . (h) . (i) . b + nq 1 + bnq 1 + bnq (a) S

1 . lnp n

(b)

P P 7. Let {an } be positive and decreasing. Prove that an converges iff a2n converges. P∞ 8. Let an > 0 for all n. Prove or disprove: If n=1 an converges, then

Numerical Infinite Series P∞

n=1 bn

175

converges, where bn =

(a) S a2n .

(b)

√

X

(c)

an .

(d) S min aj .

aj .

n≤j≤2n

n≤j≤n+m

(e) max aj . (f) n≤j≤2n

(i)

n X

an aj .

j=1

X

(g)

aj .

1 an

Y

aj . (k)

n 0. Prove that if

(h) S

aj .

n≤j≤2n

n≤j≤2n

(j)

Y 1 an

X

Y

aj .

1≤j≤n

aj . (l) S

n 0 and p > 3/r. Prove that bn /(rp − 3) converges for all sequences {bn } with bn → r. P 11.S Let an , bn > 0 and an+1 bn P/an ≤ bn+1 /bn for all n. Prove that if converges, then so does an . P P 12. Let an > 0. Show that an converges iff f (an ) converges, where f (x) = (a) sin x. x (e) . 1 + ax

P

(b) tan x.

(c) sin−1 x.

(d) tan−1 x.

(f) ln(1 + x).

(g) ex − 1.

(h) x3 + x2 + x.

13. Let {pn } be a sequence in Z+ and {an } a sequence of positive reals. P P √ (a) Prove that if n an converges, then n an an+pn converges, provided that either {pn } is bounded or an is decreasing. P √ (b) Suppose {pnP } is bounded and an ↓ 0. Prove that if n an an+pn converges, then n an converges. Does (b) hold if {an } is not monotone or {pn } is not bounded? 14.S Let g be positive and differentiable on [1, ∞) such that limx→∞ g(x) = 0, and let f be differentiable in a neighborhood of 0 such that f (0) = 0, fP(x) > 0 for x > 0, f 0 is continuous at 0, and f 0 (0) > 0. Prove that P∞ ∞ n=1 f (g(n)) converges iff n=1 g(n) converges. 15. Let f : R → [0, +∞) be twice differentiable and p > 0. Prove: P (a)S If p ≤ 1 and f (1/np ) converges, then f (0) = f 0 (0) = 0. P (b) If p ≥ 1 and f (0) = f 0 (0) = 0, then f (1/np ) converges. P 16. Let an ≥ 0 forPall n and suppose that an converges. Prove that if √ n−α an converges. Give an example which shows that α > 1/2, then the assertion is false if α = 1/2.


Assume, for a contradiction, that 17.S This exercise shows that e is irrational. Pn e = m/n, m, n ∈ N. Let s = 1/k!. Using the series representation n k=0 P∞ e = k=0 1/k!, show that (a) n!(e − sn ) ∈ N. P∞ (b) n!(e − sn ) < k=1 (n + 1)−k = 1/n. Conclude that e must be irrational. Pn 18. Let sn = k=1 k −p , 0 < p < 1. Show that {sn −(1−p)−1 n1−p } converges. Conclude that if p + q > 1, 0 n 1 1 X 1 if p + q = 1, = lim n→+∞ nq 1−p kp k=1 +∞ if p + q < 1. Pn P 2 −p an n < +∞, where an , p > 0. 19. Let sn = k=1 ak and suppose that Prove that limn sn n−q = 0 for all q > (p + 1)/2. P cn diverges, 20. Let {an }, {bn }, and {cn } be positive sequences such that bn → b ∈ (0, +∞], and an /an+1 = 1 + bn cn . Prove that an → 0. Hint. Let r ∈ (0, b) and choose m so that bn > r for all n ≥ m. Then am+k /am+k+1 > 1 + rcm+k for all k ≥ 0.

6.3 More Refined Convergence Tests

The tests in this section are frequently useful when the root and ratio tests fail. The first is a generalization of the ratio test.

6.3.1 Kummer's Test. Let $a_n, b_n > 0$ for all $n$ and set
\[
c_n := \frac{a_n}{a_{n+1}}\,b_n - b_{n+1}.
\]
(a) If $\underline{c} := \liminf_n c_n > 0$, then $\sum_n a_n$ converges.

(b) If $\overline{c} := \limsup_n c_n < 0$ and $\sum_n b_n^{-1}$ diverges, then $\sum_n a_n$ diverges.

Proof. (a) Set $s_n = \sum_{k=1}^n a_k$ and let $r \in (0, \underline{c})$. Choose $N$ so that $c_n \ge r$ for all $n \ge N$. Since $a_n b_n - a_{n+1} b_{n+1} = c_n a_{n+1}$, for all $m > N$ we have
\[
a_N b_N \ge a_N b_N - a_m b_m = \sum_{n=N}^{m-1} (a_n b_n - a_{n+1} b_{n+1}) \ge r \sum_{n=N}^{m-1} a_{n+1} = r(s_m - s_N),
\]
hence $s_m \le s_N + a_N b_N / r$. The partial sums of $\sum a_n$ are therefore bounded so the series converges.

(b) If $\overline{c} < 0$, there exists an $N$ such that $a_k b_k - a_{k+1} b_{k+1} < 0$ for all $k \ge N$. Then
\[
a_N b_N - a_n b_n = \sum_{k=N}^{n-1} (a_k b_k - a_{k+1} b_{k+1}) < 0,
\]
so $a_n > (a_N b_N)/b_n$ for all $n > N$. Since $\sum 1/b_n$ diverges, $\sum a_n$ diverges by the comparison test.

A simple but important consequence of Kummer's test is

6.3.2 Raabe's Test. Let $a_n > 0$ for all $n$ and set
\[
d_n := n \left( \frac{a_n}{a_{n+1}} - 1 \right).
\]
(a) If $\underline{d} := \liminf_n d_n > 1$, then $\sum a_n$ converges.

(b) If $\overline{d} := \limsup_n d_n < 1$, then $\sum a_n$ diverges.

Proof. Take $b_n = n$ in Kummer's test, so
\[
c_n = \frac{a_n}{a_{n+1}}\,n - (n + 1) = d_n - 1.
\]
Then $\underline{c} = \underline{d} - 1$ and $\overline{c} = \overline{d} - 1$ and the assertions follow.

6.3.3 Example. We use Raabe's test to show that the series
\[
\sum_n \frac{1}{(n + m)!} \prod_{k=1}^n (k + a), \quad \text{where } a > 0 \text{ and } m \in \mathbb{N},
\]
converges iff $m > 1 + a$. Indeed, since
\[
n \left( \frac{a_n}{a_{n+1}} - 1 \right) = n \left( \frac{n + m + 1}{n + 1 + a} - 1 \right) = \frac{n(m - a)}{n + 1 + a} \to m - a,
\]
the series converges if $m - a > 1$ and diverges if $m - a < 1$. If $m - a = 1$, then the general term reduces to
\[
\frac{1}{(n + m)!} \prod_{k=1}^n (m - 1 + k) = \frac{m(m + 1) \cdots (m + n - 1)}{(n + m)!} = \frac{1}{(m + n)(m - 1)!},
\]
hence the series diverges in this case as well. Note that the ratio test is inconclusive in this example since $a_{n+1}/a_n \to 1$. ♦
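The Raabe quantity $d_n$ of 6.3.3 can be computed both from the terms themselves and from the simplified ratio. The following Python sketch is an illustration only; the function names `term` and `raabe_d` and the sample values $a = 0.5$, $m = 3$ are assumptions chosen here.

```python
import math

def term(n, a, m):
    """General term (1 / (n + m)!) * prod_{k=1}^{n} (k + a) of Example 6.3.3."""
    prod = 1.0
    for k in range(1, n + 1):
        prod *= k + a
    return prod / math.factorial(n + m)

def raabe_d(n, a, m):
    """d_n = n * (a_n / a_{n+1} - 1), using a_n / a_{n+1} = (n + m + 1) / (n + 1 + a)."""
    return n * ((n + m + 1) / (n + 1 + a) - 1)

# d_50 computed directly from two consecutive terms (moderate n avoids overflow).
direct = 50 * (term(50, 0.5, 3) / term(51, 0.5, 3) - 1)
```

For these sample values $d_n \to m - a = 2.5 > 1$, so Raabe's test gives convergence even though $a_{n+1}/a_n \to 1$ leaves the ratio test inconclusive.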

The following test is sometimes useful when the root test fails.

6.3.4 Log Test. Let $a_n > 0$ for all $n$ and set $c_n := \ln(a_n^{-1})/\ln n$.

(a) If $\underline{c} := \liminf_n c_n > 1$, then $\sum a_n$ converges.

(b) If $\overline{c} := \limsup_n c_n < 1$, then $\sum a_n$ diverges.

Proof. (a) Let $p \in (1, \underline{c})$. Then there exists $N$ such that $c_n > p$ for all $n \ge N$. For such $n$, $\ln(a_n^{-1}) > \ln n^p$, hence $a_n < 1/n^p$. Since $p > 1$, $\sum a_n$ converges by the comparison test. The proof of (b) is similar.

6.3.5 Example. Let $a_n$ denote the general term of the series
\[
\sum_{n=1}^\infty \left( \frac{a + n^p}{b + n^q} \right)^n,
\]
where $a, b, p, q > 0$. The root test shows that the series converges if $p < q$ and diverges if $p > q$. If $p = q$, the test is inconclusive, so we consider cases. If $a \ge b$, then $a_n \ge 1$ and the series diverges. If $a < b$, we use the log test: By l'Hospital's rule, the sequence
\[
c_n = \frac{-\ln a_n}{\ln n} = \frac{\ln(b + n^p) - \ln(a + n^p)}{(\ln n)/n}
\]
has the same limit as
\[
\frac{\dfrac{p n^{p-1}}{b + n^p} - \dfrac{p n^{p-1}}{a + n^p}}{\dfrac{1 - \ln n}{n^2}}
= p n^{p+1}\,\frac{(a + n^p) - (b + n^p)}{(1 - \ln n)(a + n^p)(b + n^p)}
= \frac{p(a - b)}{b/n^p + 1} \cdot \frac{n}{(1 - \ln n)(a + n^p)}.
\]
The first quotient in the last expression tends to $p(a - b) < 0$. By l'Hospital's rule, the second quotient has the same limit as
\[
\frac{1}{(1 - \ln n)(p n^{p-1}) - (a + n^p)/n} = \frac{-n^{1-p}}{p(\ln n - 1) + (a/n^p + 1)},
\]
which converges to 0 if $p \ge 1$ and to $-\infty$ if $p < 1$. Thus if $p = q$ and $a < b$, then
\[
\lim_n c_n = \begin{cases} 0 & \text{if } p \ge 1, \\ +\infty & \text{if } p < 1, \end{cases}
\]
hence $\sum a_n$ converges iff $p < 1$. ♦


Exercises 1. Show that the ratio test is a consequence of Kummer’s test. 2. Show that Raabe’s test detects the convergence properties of the p-series P 1/np for p 6= 1, whereas the ratio and root tests do not. P 3.S Use Raabe’s test to determine the convergence of an if an = n n n Y Y 3k − 1 1 Y 2k − 1 1 . (b) . (c) n (3k + 1). (a) 3k + 1 2n 2k 3 (n + 1)! k=1

k=1

k=1

Show that the ratio test is inconclusive in each case. 4. Let a, b > 0 and m ∈ N. Use Raabe’s test to show that the following series converges iff b − a > m: n X Y n

Y −1 n mk + a mk + b .

k=1

k=1

5. Find all values of p > 0 for which the series converge: X pn n! X pn n! . (a)S . (b) n n (p + 1)(2p + 1) · · · (np + 1) n n What does the ratio test reveal? 6.S Show that the series ∞ X

1 · 3 · · · (2n − 1) (2 + p) · (4 + p) · · · (2n + p) n=1 converges iff p > 1. 7. Let p ∈ N. Use Raabe’s test to show that the series X (pn)! ppn (n!)p converges if p > 3 and diverges if p < 3. What does the ratio test tell us for these values of p? 8. Let a, b, c > 0 and m ∈ Z+ . Use Raabe’s test to show that the series ∞ n X 1 Y ak + b nm ak + c n=1 k=1

converges iff c > (m + 1)a + b.


9. Let b > 0 and m ∈ N. Use Raabe’s test to show that !m ∞ n X Y kb kb + 1 n=1 k=1

converges if m > b and diverges if m < b. What does the ratio test reveal? What happens if m = b = 1? 10. Let P r > 0. Use the log test to determine the convergence behavior of an if an = 1 1 1 (a)S rln ln n . (b) . (c) . (d)S . r ln n r ln ln n n n (ln n)rn P 11. Let an be as in 6.2.14. Use the log test to verify that an converges if b2 = 4a. P (np ) 12. Let p, r > 0. Use the log test to verify that r converges iff r < 1. P ln n 13.S Let bn → b > 0. Use the log test to verify that b− converges if n b > e and diverges if b < e. P (np ) 14. Use the log test to show that the series (1 − 1/n) converges iff p > 1. P 15. Let p > 0 and a 6= 0. Use the log test to verify that (1 − a/np )n diverges if p ≥ 1, converges if 0 < p < 1 and a > 0, and diverges if 0 < p < 1 and a < 0. What does the root test reveal? P 16. Let a, b, p, q > 0. Determine the convergence behavior of an if an = a + np ln n a + np ln ln n 1 + anp ln ln n (a)S . (b) . (c) . b + nq b + nq 1 + bnq P 17. Show that (ln n)bn diverges if {bn } is bounded. What happens in the unbounded special cases (a) bn = − ln n and (b) bn = −np , p > 0? What does the root test reveal in (b)? 18.S (Loglog test) Let an > 0 for all n and set cn = −

ln (nan ) , c := lim inf cn , and c := lim sup cn . n ln ln n n

P Prove that an converges if c > 1 and diverges if c < 1. Use the test to determine the convergence behavior of ln ln n ∞ X 1 + an , a, b > 0. 1 + bn n=2


19. Let a, b > 0. Use the log test to show that X 1 + anp ln n n

1 + bnq

diverges if p > q; converges if p < q; and if p = q, then converges if b/a > e and diverges if b/a < e. Use the log log test to show that the series also diverges if p = q and b/a = e. 20. Use Kummer’s test to prove Gauss’s test: Let an > 0 for all n and let {αn } be a bounded sequence such that αn an r =1+ − s, an+1 n n P where r, s ∈ R, s > 1. Then an converges iff r > 1. 21.S Use Kummer’s test to prove Bertrand’s test: Let an > 0 for all n and let {βn } be a sequence such that βn an 1 . =1+ − an+1 n n ln n Then

$\sum a_n$ converges if $\liminf_n \beta_n > 1$ and diverges if $\limsup_n \beta_n < 1$.

6.4 Absolute and Conditional Convergence

The convergence tests in Sections 6.2 and 6.3 apply only to series with nonnegative terms. In this section we consider tests applicable to general series.

6.4.1 Definition. A series $\sum a_n$ is said to converge absolutely if $\sum |a_n|$ converges. A convergent series that does not converge absolutely is said to converge conditionally. ♦

6.4.2 Theorem. (a) If $\sum a_n$ converges absolutely, then the series $\sum a_n$, $\sum a_n^{+}$, and $\sum a_n^{-}$ converge and
\[ \sum a_n = \sum a_n^{+} - \sum a_n^{-}, \qquad \sum |a_n| = \sum a_n^{+} + \sum a_n^{-}. \]
(b) If $\sum a_n$ converges conditionally, then $\sum a_n^{+}$ and $\sum a_n^{-}$ diverge.


Proof. (a) If $\sum |a_n|$ converges, then the inequalities
\[ 0 \le a_n^{\pm} = \tfrac{1}{2}\big(|a_n| \pm a_n\big) \le |a_n| \]
and the comparison test show that $\sum a_n^{+}$ and $\sum a_n^{-}$ converge. The remaining assertions in (a) follow from the identities $a_n = a_n^{+} - a_n^{-}$ and $|a_n| = a_n^{+} + a_n^{-}$.

(b) If $\sum a_n$ and $\sum a_n^{-}$ converge, then $\sum |a_n| = \sum a_n + 2 \sum a_n^{-}$ converges. The same conclusion holds if $\sum a_n$ and $\sum a_n^{+}$ converge. Hence if $\sum a_n$ converges conditionally, then neither $\sum a_n^{+}$ nor $\sum a_n^{-}$ can converge.

All series of the form $\sum_{n=1}^{\infty} (-1)^{n+1}/n^{p}$, $0 < p \le 1$, converge conditionally. This follows from the alternating series test given below. The following example is somewhat more interesting.

6.4.3 Example. We show that the series
\[ s := \sum_{n=2}^{\infty} \big[ (-1)^{n} n^{p} - 1 \big]^{-1} \]
converges conditionally iff $1/2 < p \le 1$ and absolutely iff $p > 1$. To see this, note first that if $p < 0$, then the $n$th term of the series does not tend to zero, and if $p = 0$ the series is undefined. So assume $p > 0$. If $s_n$ denotes the $n$th partial sum of the series, then
\[ s_{2n+1} = \sum_{k=1}^{n} \left[ \frac{1}{(2k)^{p} - 1} - \frac{1}{(2k+1)^{p} + 1} \right] = \sum_{k=1}^{n} (\alpha_k + \beta_k), \tag{6.5} \]
where
\[ \alpha_k := \frac{(2k+1)^{p} - (2k)^{p}}{\big[(2k)^{p} - 1\big]\big[(2k+1)^{p} + 1\big]} \quad\text{and}\quad \beta_k := \frac{2}{\big[(2k)^{p} - 1\big]\big[(2k+1)^{p} + 1\big]}. \]
By the mean value theorem applied to $x^{p}$ on the interval $[2k, 2k+1]$,
\[ \alpha_k = \frac{p\, x_k^{p-1}}{\big[(2k)^{p} - 1\big]\big[(2k+1)^{p} + 1\big]} \quad\text{for some } x_k \in (2k, 2k+1). \]
If $0 < p \le 1$, then
\[ \alpha_k \le \frac{p}{(2k)^{1-p}\big[(2k)^{p} - 1\big]\big[(2k)^{p} + 1\big]} = \frac{p}{(2k)^{1-p}\big[(2k)^{2p} - 1\big]} \le \frac{1}{k^{p+1}}, \]
the last inequality for sufficiently large $k$. Therefore, $\sum_{k=1}^{n} \alpha_k$ converges by comparison with $\sum_k 1/k^{p+1}$. Also, since
\[ \frac{\beta_k}{k^{-2p}} = \frac{2}{\big[2^{p} - k^{-p}\big]\big[(2 + 1/k)^{p} + k^{-p}\big]} \to \frac{1}{2^{2p-1}}, \]
the limit comparison test shows that $\sum_{k=1}^{n} \beta_k$ converges iff $p > 1/2$. Therefore the partial sum (6.5) has a finite limit iff $p > 1/2$. Since $s_{2n+1} - s_{2n} \to 0$, the series $s$ converges iff $p > 1/2$. Since
\[ n^{p} - 1 \le \big| (-1)^{n} n^{p} - 1 \big| \le n^{p} + 1, \]
$s$ converges absolutely iff $p > 1$. ♦


The tests of Sections 6.2 and 6.3 for positive-term series may be used in conjunction with 6.4.2 to test series with terms of mixed sign. For example, the inequality $|n^{-2} \sin n| \le n^{-2}$, together with the comparison test, shows that the series $\sum n^{-2} \sin n$ converges absolutely and hence converges. The remainder of the section describes tests that are useful for establishing conditional convergence. They rely on the following discrete analog of the integration-by-parts formula, due to Abel.

6.4.4 Summation by Parts. Let $\{a_n\}$, $\{b_n\}$, and $\{s_n\}$ be sequences such that $s_0 = 0$ and $s_k - s_{k-1} = a_k$, $k \ge 1$. Then, for $m > n \ge 1$,
\[ \sum_{k=n}^{m} a_k b_k = \sum_{k=n}^{m-1} s_k (b_k - b_{k+1}) + s_m b_m - s_{n-1} b_n. \]

Proof. Since $a_k = s_k - s_{k-1}$,
\[ \sum_{k=n}^{m} a_k b_k = \sum_{k=n}^{m} s_k b_k - \sum_{k=n}^{m} s_{k-1} b_k = \sum_{k=n}^{m} s_k b_k - \sum_{k=n-1}^{m-1} s_k b_{k+1}. \]
Combining the last two sums yields the desired formula.

6.4.5 Dirichlet's Test. Let $\{a_n\}$ and $\{b_n\}$ be sequences such that the following conditions hold:
(a) The partial sums of $\sum a_n$ are bounded,
(b) $\lim_n b_n = 0$, and
(c) The series $\sum |b_{n+1} - b_n|$ converges, which is the case, for example, if $\{b_n\}$ is monotone.
Then $\sum a_n b_n$ converges.

Proof. Let
\[ s_n := \sum_{k=1}^{n} a_k \quad\text{and}\quad t_n := \sum_{k=1}^{n} a_k b_k. \]
If $|s_n| \le M$ for every $n$, then, by 6.4.4,
\[ |t_m - t_{n-1}| = \left| \sum_{k=n}^{m} a_k b_k \right| \le M \sum_{k=n}^{m} |b_k - b_{k+1}| + M \big( |b_n| + |b_m| \big), \quad m \ge n > 1. \]
Since the right side of the inequality tends to 0 as $m, n \to \infty$, $\{t_n\}$ is a Cauchy sequence and hence converges. If $\{b_n\}$ is monotone, say decreasing, then
\[ \sum_{k=1}^{n} |b_{k+1} - b_k| = \sum_{k=1}^{n} (b_k - b_{k+1}) = b_1 - b_{n+1}, \]
which converges.
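The summation-by-parts formula 6.4.4 is easy to check numerically. The following is a minimal sketch, assuming randomly generated test sequences; the helper name `summation_by_parts_sides` and the chosen index range are illustrative only and do not come from the text.

```python
import random

def summation_by_parts_sides(a, b, n, m):
    """Return (lhs, rhs) of the summation-by-parts formula for indices n..m.

    a and b are lists read with 1-based indices; entry 0 is an unused placeholder.
    """
    # Partial sums: s[0] = 0 and s[k] - s[k-1] = a[k].
    s = [0.0] * (m + 1)
    for k in range(1, m + 1):
        s[k] = s[k - 1] + a[k]
    lhs = sum(a[k] * b[k] for k in range(n, m + 1))
    rhs = (sum(s[k] * (b[k] - b[k + 1]) for k in range(n, m))
           + s[m] * b[m] - s[n - 1] * b[n])
    return lhs, rhs

random.seed(0)  # fixed seed so the run is reproducible
size = 20
a = [0.0] + [random.uniform(-1.0, 1.0) for _ in range(size)]
b = [0.0] + [random.uniform(-1.0, 1.0) for _ in range(size)]
lhs, rhs = summation_by_parts_sides(a, b, n=3, m=15)
print(abs(lhs - rhs) < 1e-12)  # the two sides agree up to rounding
```

The two sides are computed independently, so agreement up to rounding error is a genuine check of the identity rather than a restatement of it.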


6.4.6 Example. We apply Dirichlet's test to the series $\sum_{n=1}^{\infty} b_n \sin(n\theta)$, where $\{b_n\}$ is monotone and $b_n \to 0$. To establish the boundedness of the sequence of partial sums $s_n := \sum_{k=1}^{n} \sin(k\theta)$, we use the identity
\[ 2 \sin(\theta/2) \sin(k\theta) = \cos\big((k - 1/2)\theta\big) - \cos\big((k + 1/2)\theta\big). \]
Summing, the right side telescopes:
\[ 2 \sin(\theta/2) \sum_{k=1}^{n} \sin(k\theta) = \cos(\theta/2) - \cos\big((n + 1/2)\theta\big). \]
Thus if $\theta$ is not a multiple of $2\pi$, then $|s_n| \le |\sin(\theta/2)|^{-1}$. By 6.4.5, $\sum_{n=1}^{\infty} b_n \sin(n\theta)$ converges for all $\theta$. (If $\theta$ is a multiple of $2\pi$, every term is zero.) Note that if, for example, $\theta = \pi/2$ and $b_n = 1/n$, then the convergence is conditional. ♦

6.4.7 Alternating Series Test. If $b_n \downarrow 0$, then $\sum_{n=1}^{\infty} (-1)^{n+1} b_n$ converges.

Proof. The partial sums of $\sum_{n=1}^{\infty} (-1)^{n+1}$ are clearly bounded, hence the assertion follows from 6.4.5.
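The bound on the partial sums of $\sin(k\theta)$ used in Example 6.4.6 can be observed numerically. This is only a sketch: the value $\theta = 1$, the number of terms, and the function name `sine_partial_sums` are arbitrary choices made for illustration.

```python
import math

def sine_partial_sums(theta, n_max):
    """Return the list [s_1, ..., s_n_max] with s_n = sin(theta) + ... + sin(n*theta)."""
    sums, total = [], 0.0
    for k in range(1, n_max + 1):
        total += math.sin(k * theta)
        sums.append(total)
    return sums

theta = 1.0                                 # any theta that is not a multiple of 2*pi
bound = 1.0 / abs(math.sin(theta / 2.0))    # the bound |sin(theta/2)|^(-1)
sums = sine_partial_sums(theta, 10000)
max_abs = max(abs(s) for s in sums)

# Closed form obtained by telescoping, evaluated at n = 50.
n = 50
closed = (math.cos(theta / 2.0) - math.cos((n + 0.5) * theta)) / (2.0 * math.sin(theta / 2.0))

print(max_abs <= bound)
print(abs(sums[n - 1] - closed) < 1e-9)
```

The partial sums stay inside the bound no matter how many terms are added, which is exactly hypothesis (a) of Dirichlet's test.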

6.4.8 Example (Alternating Harmonic Series). By 6.4.7, the series $\sum_{n=1}^{\infty} (-1)^{n+1} n^{-1}$ converges. We show that its value is $\ln 2$. Let
\[ s_n = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k} \quad\text{and}\quad t_n = \sum_{k=1}^{n} \frac{1}{k} - \ln n. \]
By 6.1.9, the sequence $\{t_n\}$ converges. Also, by Exercise 1.5.3,
\[ s_{2n} = \sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \sum_{k=n+1}^{2n} \frac{1}{k} = t_{2n} - t_n + \ln 2. \]
It follows that $s_{2n} \to \ln 2$. Since $s_{2n+1} - s_{2n} \to 0$, $s_n \to \ln 2$. ♦
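The identity $s_{2n} = t_{2n} - t_n + \ln 2$ and the resulting limit can be checked with a few lines of code. A minimal sketch, in which the function names and the choice $n = 1000$ are assumptions made for illustration:

```python
import math

def alternating_harmonic(n):
    """Partial sum s_n = sum_{k=1}^{n} (-1)^(k+1) / k."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

def t(n):
    """t_n = (1 + 1/2 + ... + 1/n) - ln n."""
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)

n = 1000
s_2n = alternating_harmonic(2 * n)
identity_gap = abs(s_2n - (t(2 * n) - t(n) + math.log(2)))  # rounding error only
error = abs(s_2n - math.log(2))                              # slowly shrinking with n

print(identity_gap < 1e-9)
print(error < 1e-3)
```

The identity holds to rounding error for every $n$, while the distance from $\ln 2$ shrinks only slowly, as expected for a conditionally convergent series.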

The contrast between absolutely convergent and conditionally convergent series is strikingly displayed in the context of rearrangements.

6.4.9 Definition. A rearrangement of a series $\sum_{n=1}^{\infty} a_n$ is a series $\sum_{k=1}^{\infty} a_{m_k}$, where $\{m_k\}$ is a sequence of positive integers that contains every positive integer exactly once.¹ ♦

¹ In other words, $k \mapsto m_k$ is a one-to-one mapping of $\mathbb{N}$ onto itself.

6.4.10 Theorem. If $\sum_{n=1}^{\infty} a_n$ converges absolutely to $s$, then any rearrangement $\sum_{k=1}^{\infty} a_{m_k}$ converges absolutely to $s$.

Proof. Assume first that $a_n \ge 0$ for all $n$. Let
\[ t_n = \sum_{k=1}^{n} a_{m_k} \quad\text{and}\quad s_n = \sum_{k=1}^{n} a_k. \]
For each $N$, choose $K$ so large that the terms $a_k$, $1 \le k \le N$, are included among the terms $a_{m_k}$, $1 \le k \le K$. Then $s_N \le t_K \le \sum_{k=1}^{\infty} a_{m_k}$. Letting $N \to \infty$ shows that $\sum_n a_n \le \sum_k a_{m_k}$. Since $\sum_n a_n$ is a rearrangement of $\sum_k a_{m_k}$, the reverse inequality holds as well. The general case follows by considering $\sum a_n^{+}$ and $\sum a_n^{-}$ and using 6.4.2.

6.4.11 Example. Consider the series
\[ t := 1 - \frac{1}{2^{p}} - \frac{1}{4^{p}} + \frac{1}{3^{p}} - \frac{1}{6^{p}} - \frac{1}{8^{p}} + \frac{1}{5^{p}} - \frac{1}{10^{p}} - \frac{1}{12^{p}} + \cdots, \]
which is a rearrangement of the alternating series
\[ s := 1 - \frac{1}{2^{p}} + \frac{1}{3^{p}} - \frac{1}{4^{p}} + \frac{1}{5^{p}} - \frac{1}{6^{p}} + \frac{1}{7^{p}} - \frac{1}{8^{p}} + \frac{1}{9^{p}} - \frac{1}{10^{p}} + \cdots. \]
If $p > 1$, then both series converge absolutely and $t = s$. If $p = 1$, then the two series converge to different values. Indeed, if $s_n$ and $t_n$ denote the $n$th partial sums of $s$ and $t$, respectively, then
\[ t_{3n} = \sum_{k=1}^{n} \left( \frac{1}{2k-1} - \frac{1}{4k-2} - \frac{1}{4k} \right) = \frac{1}{2} \sum_{k=1}^{n} \left( \frac{1}{2k-1} - \frac{1}{2k} \right) = \frac{s_{2n}}{2} \to \frac{s}{2}. \]
Since
\[ t_{3n+1} = t_{3n} + \frac{1}{2n+1} \quad\text{and}\quad t_{3n+2} = t_{3n+1} - \frac{1}{4n+2}, \]
we see that $t_n \to s/2$. ♦
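For $p = 1$ the effect of the rearrangement in Example 6.4.11 can be seen numerically: the rearranged partial sums approach $(\ln 2)/2$ rather than $\ln 2$. The sketch below sums the series in blocks of three terms; the function name and the number of blocks are illustrative assumptions.

```python
import math

def rearranged_partial_sum(n_blocks):
    """t_{3n} for p = 1: the first 3n terms of 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ..."""
    total = 0.0
    for k in range(1, n_blocks + 1):
        # k-th block: one positive odd-denominator term, two negative even ones.
        total += 1.0 / (2 * k - 1) - 1.0 / (4 * k - 2) - 1.0 / (4 * k)
    return total

t_3n = rearranged_partial_sum(100000)
print(abs(t_3n - math.log(2) / 2) < 1e-5)   # close to (ln 2)/2
print(abs(t_3n - math.log(2)) > 0.3)        # far from the original sum ln 2
```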

The phenomenon illustrated in the last example holds generally, as shown by the following remarkable result due to Riemann.

6.4.12 Theorem. If $s := \sum_{n=1}^{\infty} a_n$ converges conditionally, then, for any real number $x$, some rearrangement of $s$ converges to $x$.

Proof. We may assume that $x \ge 0$. For $n \in \mathbb{N}$ let
\[ s_n := \sum_{j=1}^{n} a_j, \quad s_n^{+} := \sum_{j=1}^{n} a_j^{+}, \quad s_n^{-} := \sum_{j=1}^{n} a_j^{-}, \quad\text{and}\quad s_0^{\pm} := 0. \]
Since $s_n^{+} \to +\infty$ (6.4.2), there exists a smallest integer $m_1$ such that $s_{m_1}^{+} > x$. Since $x \ge 0$, $m_1 \ne 0$. Because $s_n^{-} \to +\infty$, there exists a smallest positive integer $n_1$ such that $s_{m_1}^{+} - s_{n_1}^{-} < x$ and then a smallest integer $m_2$ such that $s_{m_2}^{+} - s_{n_1}^{-} > x$. Obviously, $m_2 > m_1$. Continuing in this manner, we obtain strictly increasing sequences $\{m_k\}$ and $\{n_k\}$ (with $n_0 := 0$) with the following properties:

• $m_k$ is the smallest integer such that
\[ t_k := s_{m_k}^{+} - s_{n_{k-1}}^{-} = (a_1^{+} + \cdots + a_{m_k}^{+}) - (a_1^{-} + \cdots + a_{n_{k-1}}^{-}) > x, \]

• $n_k$ is the smallest integer such that
\[ r_k := s_{m_k}^{+} - s_{n_k}^{-} = (a_1^{+} + \cdots + a_{m_k}^{+}) - (a_1^{-} + \cdots + a_{n_k}^{-}) < x. \]

Now consider the series
\[ s' := a_1^{+} + \cdots + a_{m_1}^{+} - a_1^{-} - \cdots - a_{n_1}^{-} + a_{m_1+1}^{+} + \cdots + a_{m_2}^{+} - a_{n_1+1}^{-} - \cdots. \]
The terms of $s'$ are either $a_j$ or 0, and $s'$ contains each term of the series $s$ exactly once. Thus $s'$ is a rearrangement of $s$. We show that $s' = x$. By the minimality properties of the sequences $\{m_k\}$ and $\{n_k\}$,
\[ t_k - a_{m_k}^{+} \le x < t_k \quad\text{and}\quad r_k < x \le r_k + a_{n_k}^{-}, \]
hence
\[ x - a_{n_k}^{-} \le r_k < x < t_k \le x + a_{m_k}^{+}. \]
Since $a_n \to 0$,
\[ \lim_k r_k = \lim_k t_k = x. \tag{6.6} \]
Now let $s'_k$ denote the $k$th partial sum of the series $s'$ and consider the partial sums
\[ r_1 = (a_1^{+} + \cdots + a_{m_1}^{+}) - (a_1^{-} + \cdots + a_{n_1}^{-}), \]
\[ t_2 = (a_1^{+} + \cdots + a_{m_2}^{+}) - (a_1^{-} + \cdots + a_{n_1}^{-}), \]
\[ r_2 = (a_1^{+} + \cdots + a_{m_2}^{+}) - (a_1^{-} + \cdots + a_{n_2}^{-}). \]
If $m_1 + n_1 \le k \le m_2 + n_1$, then $s'_k$ includes the terms of $r_1$, additional terms from $a_{m_1+1}^{+} + \cdots + a_{m_2}^{+}$, and no others, hence $r_1 \le s'_k \le t_2$. Similarly, if $m_2 + n_1 \le k \le m_2 + n_2$, then $s'_k$ includes the terms of $t_2$, additional terms from $-a_{n_1+1}^{-} - \cdots - a_{n_2}^{-}$, and no others, so $r_2 \le s'_k \le t_2$. In general, for $j \ge 1$,
\[ m_j + n_j \le k \le m_{j+1} + n_j \ \Rightarrow\ r_j \le s'_k \le t_{j+1} \]
and
\[ m_{j+1} + n_j \le k \le m_{j+1} + n_{j+1} \ \Rightarrow\ r_{j+1} \le s'_k \le t_{j+1}. \]
From (6.6), $s'_k \to x$.
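The construction in the proof is effectively an algorithm. The sketch below applies a simplified greedy variant of it to the alternating harmonic series: take unused positive terms while the running sum is at most $x$, and unused negative terms otherwise. The function name, the target $x = 1.5$, and the number of terms are assumptions chosen for illustration; the proof itself works with the sequences $\{m_k\}$ and $\{n_k\}$ rather than with this loop.

```python
def rearrange_to_target(x, n_terms):
    """Greedy rearrangement of sum (-1)^(n+1)/n whose partial sums approach x.

    Positive terms are 1, 1/3, 1/5, ...; negative terms are -1/2, -1/4, ...
    Each term of the original series is used at most once, in its original
    relative order within its own sign class.
    """
    pos = 1      # denominator of the next unused positive term
    neg = 2      # denominator of the next unused negative term
    total = 0.0
    order = []   # the terms in the order they were taken
    for _ in range(n_terms):
        if total <= x:
            term = 1.0 / pos
            pos += 2
        else:
            term = -1.0 / neg
            neg += 2
        total += term
        order.append(term)
    return total, order

target = 1.5
total, order = rearrange_to_target(target, 100000)
print(abs(total - target) < 1e-3)
```

Once the running sum has crossed the target for the first time, its distance from the target never exceeds the size of the term that caused the most recent crossing, and those terms tend to zero. This mirrors the role of (6.6) in the proof.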

Exercises

1. Suppose that $\sum a_n$ converges absolutely. Prove that $\sum a_n b_n$ converges absolutely for all sequences $\{b_n\}$ with $\limsup_{n \to \infty} |b_n| < +\infty$.

2.S Suppose the ratio test shows that $\sum a_n$ does not converge absolutely. Can $\sum a_n$ still converge conditionally?

3. For an alternating series $s = \sum_{n=1}^{\infty} (-1)^{n+1} b_n$, prove the inequality $|s - s_n| \le b_n$. This result is useful in estimating the error made by using $s_n$ to approximate $s$. For example, use the estimate to determine how large $n$ should be so that the partial sum $s_n$ agrees with $s = \sum_{n=1}^{\infty} (-1)^{n+1}/n^{4}$ in nine decimal places.

Numerical Infinite Series 4. Let p > 0. Determine whether the series conditionally, or diverges, where an = (−1)n . n1/n (c) S (−1)n sin(1/np ).

187 P

an converges absolutely,

(b) S (−1)n (n1/n − 1).

(a) S

(d) (−1)n sin−1 (1/np ).

(e) (−1)n tan(1/np ).

(f) (−1)n tan−1 (1/np ).

sin[(2n + 1)π/2] . ln n √ √ n+1− n . (i) S (−1)n np 3n . (k) (−1)n √ n3 + 2 (−1)n , (p 6= 1). (m) S n p + (−1)n (−1)n n! (o) . 3 · 5 · · · (2n + 1) (−1)n en n! . (q) 5 · 8 · · · (3n + 2) (g)

(h) (j) (l) (n) (p)

(−2)n . n! (−1)n . n lnp (n + 1) (n!)2 (−1)n pn . (2n)! (−1)n , (n ≥ 2). np + (−1)n (−1)n 3 · 6 · · · (3n) . 3 · 5 · · · (2n + 1)

(r) (−1)n+1 n[(1)

n

−3]/2

.

5. Suppose that $\{b_n\}$ is monotone and $b_n \to 0$. Use the identity
\[ 2 \sin(\theta/2) \cos(n\theta) = \sin\big((n + 1/2)\theta\big) - \sin\big((n - 1/2)\theta\big) \]
to verify that the series $\sum_{n=1}^{\infty} b_n \cos n\theta$ converges if $\theta/(2\pi) \notin \mathbb{Z}$.

6. Let $b_n \downarrow 0$ and $m \in \mathbb{N}$. Show that $\sum_{n=0}^{\infty} (-1)^{\lfloor n/m \rfloor} b_n$ converges.

7. Let $m \in \mathbb{N}$. Show that
\[ m \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n(n+m)} = \sum_{n=1}^{m} \frac{(-1)^{n+m+1}}{n} + \delta_m \ln 2, \]
where $\delta_m = 0$ or $2$ according as $m$ is even or odd.

8. (Abel) Prove that if $\sum_{n=1}^{\infty} a_n$ converges and $\{b_n\}$ is bounded and monotone, then $\sum_{n=1}^{\infty} a_n b_n$ converges.

9.S Prove that
\[ \sum_{n=2}^{\infty} \frac{(n - 1/2) \sin(n\theta)}{n^{p} + (-1)^{n}} \]
converges for all real $\theta$ iff $p > 1$.

10. Let $p > 1$. Express each of the series
\[ \sum_{n=1}^{\infty} \frac{1}{(2n-1)^{p}} \quad\text{and}\quad \sum_{n=1}^{\infty} \frac{(-1)^{n}}{n^{p}} \]
in terms of $\sum_{n=1}^{\infty} \dfrac{1}{n^{p}}$.

11. Prove that if $\sum n a_n$ converges, then $\sum a_n$ converges and, moreover, $\sum |a_n|^{p}$ converges for every $p > 1$. What if $p = 1$?

12. Prove that if $\sum n^{-p} a_n$ converges, then $\sum n^{-q} a_n$ converges for all $q > p$.

13.S (a) Let $s_n = \sum_{k=1}^{n} a_k$, where $a_n \to 0$. Suppose there exists a positive integer $q$ such that $s_{nq} \to s \in \mathbb{R}$. Prove that $\sum a_n$ converges to $s$.
(b) Use (a) to sum the series
\[ s := 1 + \frac{1}{2} + \frac{1}{3} - \frac{1}{4} - \frac{1}{5} - \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} - \cdots, \]
where sums of length three alternate signs. Generalize your result to alternating sums of length $p > 1$.
(c) Show that in contrast to (b), the following series diverges, where sums of lengths $p = 3$ and $q = 2$ alternate signs.
\[ t := 1 + \frac{1}{2} + \frac{1}{3} - \frac{1}{4} - \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} - \frac{1}{9} - \cdots. \]

*6.5 Double Sequences and Series

A double sequence is a doubly indexed infinite array $\{a_{m,n}\} = \{a_{m,n}\}_{m,n=1}^{\infty}$ of real numbers $a_{m,n}$.² Associated with each double sequence are the so-called iterated limits
\[ \lim_m \lim_n a_{m,n} \quad\text{and}\quad \lim_n \lim_m a_{m,n}. \]
For the first iterated limit to exist, each inner limit $b_m := \lim_n a_{m,n}$, as well as the outer limit $\lim_m b_m$, must exist. Similar remarks apply to the second iterated limit. The following scheme illustrates the case when the iterated limits exist and equal $L$.
\[
\begin{array}{ccccccc}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} & \cdots & \to & b_1 \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} & \cdots & \to & b_2 \\
\vdots & \vdots & & \vdots & & & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} & \cdots & \to & b_m \\
\downarrow & \downarrow & & \downarrow & & & \downarrow \\
c_1 & c_2 & \cdots & c_n & \cdots & \to & L
\end{array}
\]
In addition to iterated limits, a double sequence gives rise to a third type of limit, frequently called a double limit to distinguish it from iterated limits.

² More precisely, a double sequence is a function $(m, n) \mapsto a_{m,n}$ from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{R}$.


6.5.1 Definition. Let $L \in \mathbb{R}$. We write
\[ L = \lim_{m,n} a_{m,n} \]
and say that $a_{m,n}$ converges to $L$ or has limit $L$ if for each $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $|a_{m,n} - L| < \varepsilon$ for all $n, m \ge N$. We also write
\[ \lim_{m,n} a_{m,n} = +\infty \ (-\infty) \]
if for each $r \in \mathbb{R}$ there exists $N \in \mathbb{N}$ such that $a_{m,n} > r$ $(< r)$ for all $n, m \ge N$. ♦

Double limits have properties similar to limits of single sequences. For example, double limit analogs of 2.1.3, 2.1.4, 2.1.5, and 2.1.11 are readily formulated and proved.

It is easy to find examples of iterated limits that exist but are unequal; $a_{m,n} = (1 - 1/n)^{m}$ is one such. When this happens, the double limit cannot exist, as shown in 6.5.2 below. However, even if the iterated limits are equal, the double limit may fail to exist. This is the case for the sequence defined by
\[ a_{m,n} = \begin{cases} 1 & \text{if } m = n, \\ 0 & \text{otherwise,} \end{cases} \]
which has zero iterated limits. Finally, the example $a_{m,n} = (-1)^{m+n}(1/m + 1/n)$ shows that a double limit may exist even if both iterated limits fail to exist. The following theorem gives the basic connection between double limits and iterated limits.

6.5.2 Iterated Limit Theorem. Let $\{a_{m,n}\}$ be a double sequence such that $\lim_n a_{m,n}$ exists for each $m$ and $\lim_m a_{m,n}$ exists for each $n$. If the double limit $\lim_{m,n} a_{m,n}$ exists, then the iterated limits $\lim_m \lim_n a_{m,n}$ and $\lim_n \lim_m a_{m,n}$ exist and equal the double limit.

Proof. Let $L := \lim_{m,n} a_{m,n}$, $b_m := \lim_n a_{m,n}$, and $c_n := \lim_m a_{m,n}$. Given $\varepsilon > 0$, choose $N \in \mathbb{N}$ such that $|a_{m,n} - L| < \varepsilon$ for all $m, n \ge N$. Letting $n \to +\infty$ yields $|b_m - L| \le \varepsilon$ for all $m \ge N$. Therefore, $b_m \to L$. Similarly, $c_n \to L$.

6.5.3 Definition. Given a double sequence $\{a_{m,n}\}$, form the partial sums
\[ s_{m,n} = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k}, \quad m, n \in \mathbb{N}. \]
The double infinite series
\[ \sum a_{m,n} = \sum_{m,n} a_{m,n} = \sum_{m,n=1}^{\infty} a_{m,n} \]
is said to converge to $s \in \mathbb{R}$ if $\{s_{m,n}\}$ converges to $s$ in the sense of 6.5.1. The series converges absolutely if $\sum |a_{m,n}|$ converges, and converges conditionally if $\sum a_{m,n}$ converges but not absolutely. ♦

As in the case of single series, an absolutely convergent double series converges (Exercise 7). Moreover, a double series with nonnegative terms converges absolutely iff the partial sums $\sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k}$ are bounded (Exercise 5). The iterated limits
\[ \lim_m \lim_n s_{m,n} = \lim_m \lim_n \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k} = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} a_{j,k} \]
and
\[ \lim_n \lim_m s_{m,n} = \lim_n \lim_m \sum_{k=1}^{n} \sum_{j=1}^{m} a_{j,k} = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} a_{j,k} \]
are called iterated series. The following result, a special case of the Fubini–Tonelli theorem, establishes a connection between double and iterated series.

6.5.4 Fubini–Tonelli Theorem for Series. A double series $\sum a_{m,n}$ is absolutely convergent iff one (hence both) of the following conditions hold:
\[ \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} |a_{m,n}| < +\infty \quad\text{and}\quad \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} |a_{m,n}| < +\infty. \tag{6.7} \]
In this case,
\[ \sum_{m,n} a_{m,n} = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{m,n} = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{m,n}. \tag{6.8} \]

Proof. Set $s_{m,n} = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k}$ and $t_{m,n} = \sum_{j=1}^{m} \sum_{k=1}^{n} |a_{j,k}|$. The first assertion of the theorem is clear, since each condition in (6.7) implies that
\[ T := \sup_{m,n} t_{m,n} < +\infty, \]
and conversely. Now suppose that $\sum a_{m,n}$ is absolutely convergent. Let $s := \lim_{m,n} s_{m,n}$. For each $j$,
\[ \sum_{k=1}^{n} |a_{j,k}| \le t_{j,n} \le T \quad\text{for all } n, \]
hence $\sum_{k=1}^{\infty} a_{j,k}$ converges. Set $r_m := \sum_{j=1}^{m} \sum_{k=1}^{\infty} a_{j,k}$. Given $\varepsilon > 0$, choose $N$ such that
\[ \left| \sum_{j=1}^{m} \sum_{k=1}^{n} a_{j,k} - s \right| < \varepsilon \quad\text{for all } m, n \ge N. \]
Fixing $m \ge N$ and letting $n \to +\infty$ in this inequality yields $|r_m - s| \le \varepsilon$. This shows that $r_m \to s$, which is the first equality in (6.8). The proof of the second equality is similar.
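The equality (6.8) can be observed on a concrete absolutely convergent double series. The sketch below uses the sample terms $a_{j,k} = (-1)^{j+k}/(2^{j} 3^{k})$, chosen here only for illustration because both geometric factors can be summed in closed form, giving $(-1/3)(-1/4) = 1/12$. The truncation level `N` and the function name `term` are likewise assumptions.

```python
def term(j, k):
    """Sample absolutely convergent double sequence a_{j,k} = (-1)^(j+k) / (2^j 3^k)."""
    return (-1) ** (j + k) / (2 ** j * 3 ** k)

N = 60  # truncation level; the neglected tails are far below 1e-12

# Truncated iterated series: rows first, then columns first.
rows_first = sum(sum(term(j, k) for k in range(1, N + 1)) for j in range(1, N + 1))
cols_first = sum(sum(term(j, k) for j in range(1, N + 1)) for k in range(1, N + 1))

exact = 1.0 / 12.0  # product of the two geometric sums
print(abs(rows_first - exact) < 1e-12)
print(abs(cols_first - exact) < 1e-12)
```

Both orders of summation agree with the closed-form value, as the theorem predicts for absolutely convergent double series.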

Exercises 1. Let α : N → N be strictly increasing. Show that if L := limm,n am,n exists in R, then limm,n aα(n),n exists and equals L. 2. A double sequence {am,n } is said to be Cauchy if, given ε > 0, there exists N ∈ N such that |am,n − am+p,n+q | < ε for all m, n ≥ N and all p, q ≥ 0. Prove that {am,n } converges iff it is Cauchy. Hint. Show that {an,n } converges. 3.S Determine the convergence behavior, double and iterated, of the following sequences, where a, b > 0: (a) sin(m/n). m−n . m+n 1 (g) 1/n . m n + nm sin(1/n) (j) . am + bn (d)

4. Show that if

ln(mn) . n mn (e) . (m + n)2 n (h) . m + n2 m2 n (k) 2 . an + bm4 (b)

(−1)m m . m+n mn (f) 2 . m + n2 n3 m (i) 4 . m + n4 n2 sin(1/n) (l) . m+n

(c)

$\sum a_{m,n}$ converges, then $\lim_{m,n} a_{m,n} = 0$.

5. Let $a_{m,n} \ge 0$ for all $m, n \in \mathbb{N}$. Prove that $\sum_{m,n} a_{m,n}$ converges iff $s := \sup_{m,n} s_{m,n} < +\infty$, in which case the series sums to $s$.

6. State and prove a comparison test for double series with nonnegative terms.

7. Prove that an absolutely convergent double series converges.

8. For sequences $\{a_n\}$ and $\{b_n\}$, set $c_{m,n} = a_n b_m$. Prove that $c := \sum_{m,n} c_{m,n}$ converges absolutely iff $a := \sum_n a_n$ and $b := \sum_n b_n$ converge absolutely, in which case $c = ab$. Conclude that $\sum_{m,n} m^{-q} n^{-p}$ converges iff $p, q > 1$.

9.S Given a double sequence $\{a_{m,n}\}$ with $a_{m,n} \ge 0$, let $\{b_n\}$ be the sequence obtained by summing $a_{m,n}$ along the diagonals $j + k = n + 1$, that is, $b_n := \sum_{j=1}^{n} a_{j,n+1-j}$. Prove that $\sum a_{m,n}$ converges iff $\sum_n b_n$ converges, in which case the two series are equal.


10. Use Exercise 9 to show that the double series X X 1 1 S , and (c) (a) , (b) p 2 + n2 )p/2 (m + n) (m m,n m,n

1 p + np m m,n

X

converge iff p > 2. Show that for p > 2, ∞ ∞ X X 1 1 1 = − . p p−1 (m + n) n np n=2 n=2 m,n=1 ∞ X

P 11.S Prove that m,n rmn converges iff |r| < 1, in which case the iterated P∞ P∞ mn series m=1 n=1 r converges. 12.S Prove the root test for double series with Pnonnegative terms: Suppose that L := limm,n am,n 1/mn exists. Then m,n am,n converges if L < 1 and diverges if L > 1. 13. Let am,n = (−1)m n−m−2 . Prove that X |am,n | = 1 and m≥0,n≥2

X m≥0,n≥2

am,n = 1/2.

Chapter 7
Sequences and Series of Functions

7.1 Convergence of Sequences of Functions

Unlike numerical sequences, sequences of functions have several modes of convergence. In this chapter we consider the two most common types: pointwise and uniform. Other types of convergence will be examined in Chapter 11.

7.1.1 Definition. Let $S$ be a nonempty set. A sequence of real-valued functions $f_n$ on $S$ is said to converge pointwise on $S$ to a function $f : S \to \mathbb{R}$ if $f_n(x) \to f(x)$ for each $x \in S$. We then write $f = \lim_n f_n$ or $f_n \to f$ (on $S$). ♦

The following theorem is an immediate consequence of 2.1.11 and 3.1.9.

7.1.2 Theorem. Let $f_n \to f$ and $g_n \to g$ pointwise on $S$ and let $h$ be continuous such that $h \circ f_n$ and $h \circ f$ are defined on $S$. Then, for $\alpha, \beta \in \mathbb{R}$,
\[ \alpha f_n + \beta g_n \to \alpha f + \beta g, \quad f_n g_n \to f g, \quad \frac{f_n}{g_n} \to \frac{f}{g} \ (\text{if } g \ne 0), \quad\text{and}\quad h \circ f_n \to h \circ f \]
pointwise on $S$.

FIGURE 7.1: Uniform convergence of $f_n$ to $f$.

The definition of pointwise convergence may be phrased as follows: For each $x \in S$ and $\varepsilon > 0$ there exists an index $N$ such that $|f_n(x) - f(x)| < \varepsilon$ for all $n \ge N$. Here, the index $N$ usually depends on both $\varepsilon$ and $x$. Removing the dependence on $x$ results in the stronger property of uniform convergence:


7.1.3 Definition. A sequence of functions $f_n : S \to \mathbb{R}$ is said to converge uniformly on $S$ to a function $f : S \to \mathbb{R}$ if, for each $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that $|f_n(x) - f(x)| < \varepsilon$ for all $n \ge N$ and all $x \in S$. (See Figure 7.1.) ♦

Clearly, uniform convergence implies pointwise convergence. The examples below show that the converse is not generally true. For these examples and for the exercises at the end of the section, the following propositions are useful.

7.1.4 Proposition. Let $f_n, f : S \to \mathbb{R}$. Suppose that there exists a sequence $\{a_n\}$ of positive real numbers such that $a_n \to 0$ and $|f_n(x) - f(x)| \le a_n$ for all $x \in S$ and all $n$. Then $f_n$ converges uniformly to $f$ on $S$.

Proof. One need only choose $N$ in the definition of uniform convergence so that $a_n < \varepsilon$ for all $n \ge N$.

7.1.5 Proposition. Let $f_n, f : S \to \mathbb{R}$. Then $f_n$ converges uniformly to $f$ on $S$ iff
\[ \lim_n \big[ f_n(b_n) - f(b_n) \big] = 0 \]
for any sequence $\{b_n\}$ in $S$.

Proof. If $f_n$ converges uniformly to $f$ on $S$, choose $N$ so that $|f_n(x) - f(x)| < \varepsilon$ for all $n \ge N$ and all $x \in S$. For such $n$, $|f_n(b_n) - f(b_n)| < \varepsilon$. Conversely, suppose $f_n$ does not converge uniformly to $f$ on $S$. Then there exists an $\varepsilon > 0$ and points $b_n \in S$ such that $|f_n(b_n) - f(b_n)| \ge \varepsilon$ for infinitely many $n$. Thus the sequential condition fails.

7.1.6 Examples. (a) The sequence $\{x^{n}\}$ converges pointwise but not uniformly to zero on $(-1, 1)$. (Take $b_n = 1/2^{1/n}$ in 7.1.5.) The convergence is uniform on intervals $[-r, r]$, $0 < r < 1$, since on such an interval $|x^{n}| \le r^{n}$ and $r^{n} \to 0$.

(b) The sequence $\{n/x^{n}\}$ converges pointwise to zero on $(1, +\infty)$ but the convergence is not uniform there, as can be seen by taking $b_n = 2^{1/n}$ in 7.1.5. The convergence is uniform for $x \in [r, +\infty)$, $r > 1$, since then $|n/x^{n}| \le n/r^{n} \to 0$.

(c) The sequence $\{x^{n} e^{-nx}\}$ converges uniformly to zero on $[0, +\infty)$ since $x^{n} e^{-nx} \le e^{-n}$ for $x \ge 0$.

(d) The sequence $\{n^{-1} \sin nx\}$ converges uniformly to zero on $\mathbb{R}$ since $|n^{-1} \sin nx| \le 1/n$ for all $x$.

(e) The sequence $\{\sin(x/n)\}$ converges pointwise to zero on $\mathbb{R}$, but the convergence is not uniform, as can be seen, for example, by taking $b_n = \pi n/2$ in 7.1.5. The convergence is uniform on bounded intervals $[a, b]$ since on such an interval $|\sin(x/n)| \le |x|/n \le \max\{|a|, |b|\}/n$. ♦

There is an analog of 7.1.2 for uniform convergence; however, it is more restrictive and requires the notion of uniform boundedness.
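Returning to Example 7.1.6(a), the difference between pointwise and uniform convergence of $x^{n}$ can be made concrete by evaluating the functions along the witness sequence $b_n = 1/2^{1/n}$ from 7.1.5 and comparing with the supremum on a closed subinterval. The following sketch uses illustrative helper names and the radius $r = 0.9$, none of which appear in the text.

```python
def witness_value(n):
    """Value of x^n at the witness point b_n = 2^(-1/n) from Example 7.1.6(a)."""
    b_n = 0.5 ** (1.0 / n)
    return b_n ** n            # equals 1/2 up to rounding, for every n

def sup_on_closed_interval(n, r):
    """Supremum of |x^n| over [-r, r] with 0 < r < 1, attained at x = r."""
    return r ** n

indices = (1, 10, 100, 1000)
witness_gaps = [witness_value(n) for n in indices]              # stays near 0.5
closed_sups = [sup_on_closed_interval(n, 0.9) for n in indices]  # tends to 0

print(witness_gaps)
print(closed_sups)
```

Along the witness sequence the values do not shrink, so the convergence on $(-1, 1)$ cannot be uniform, while on $[-0.9, 0.9]$ the supremum decays geometrically.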


7.1.7 Definition. A sequence of functions $f_n$ is said to be uniformly bounded on $S$ with uniform bound $M$ if $|f_n(x)| \le M$ for all $x \in S$ and all $n$. ♦

7.1.8 Proposition. Let $f_n \to f$ pointwise on a set $S$.
(a) If $\{f_n\}$ is uniformly bounded on $S$, then $f$ is bounded on $S$.
(b) If each $f_n$ is bounded on $S$ and $f_n \to f$ uniformly on $S$, then $\{f_n\}$ is uniformly bounded on $S$, hence $f$ is bounded.
(c) If $f_n \to f$ uniformly on $S$ and $f$ is bounded, then $\{f_n\}_{n=N}^{\infty}$ is uniformly bounded for some $N$.

Proof. (a) This follows by letting $n \to +\infty$ in the inequality $|f_n(x)| \le M$.
(b) Choose $N$ such that $|f_n(x) - f(x)| \le 1$ for all $n \ge N$ and $x \in S$. For such $n$ and for all $x \in S$,
\[ |f_n(x)| \le |f_n(x) - f(x)| + |f(x) - f_N(x)| + |f_N(x)| \le 2 + M_N, \]
where $M_N$ is a bound for $f_N$ on $S$. Since the functions $f_1, \ldots, f_{N-1}$ are bounded, $\{f_n\}_{n=1}^{\infty}$ is uniformly bounded.
(c) Let $|f(x)| \le M$ for all $x$. Choose $N$ such that $|f_n(x) - f(x)| \le 1$ for all $n \ge N$ and $x \in S$. For such $n$, $|f_n(x)| \le 1 + M$ for all $x \in S$.

The sequence $\{f_n\}$ on $(0, 1)$ defined by
\[ f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n, \\ 0 & \text{otherwise} \end{cases} \]
shows that the first assertion in (b) may be false if the convergence is merely pointwise.

7.1.9 Theorem. Let $f_n \to f$ and $g_n \to g$ uniformly on $S$ and let $h$ be uniformly continuous such that $h \circ f$ and $h \circ f_n$ are defined on $S$. Then
(a) $\alpha f_n + \beta g_n \to \alpha f + \beta g$ uniformly on $S$, $\alpha, \beta \in \mathbb{R}$.
(b) $h \circ f_n \to h \circ f$ uniformly on $S$.
(c) $f_n g_n \to f g$ uniformly on $S$ if $\{f_n\}$ and $\{g_n\}$ are uniformly bounded on $S$.
(d) $\dfrac{1}{g_n} \to \dfrac{1}{g}$ uniformly on $S$ if $\left\{ \dfrac{1}{g_n} \right\}$ is uniformly bounded on $S$.


Proof. The proof of (a) is left to the reader. To prove (b), let $\varepsilon > 0$, choose $\delta > 0$ such that $|h(u) - h(v)| < \varepsilon$ for all $u, v$ with $|u - v| < \delta$, and choose $N$ such that $|f_n(x) - f(x)| < \delta$ for all $x \in S$ and $n \ge N$. For such $n$, $|h \circ f_n(x) - h \circ f(x)| < \varepsilon$.

For (c), let $M > 0$ be a common uniform bound for the sequences $\{|f_n|\}$ and $\{|g_n|\}$, so that $|f| \le M$ as well, and let $\varepsilon > 0$. Choose $N$ such that
\[ |f_n(x) - f(x)| < \varepsilon/(2M) \quad\text{and}\quad |g_n(x) - g(x)| < \varepsilon/(2M) \]
for all $x \in S$ and $n \ge N$. For such $n$ and $x$,
\begin{align*}
|f_n(x) g_n(x) - f(x) g(x)| &\le |f_n(x) g_n(x) - f(x) g_n(x)| + |f(x) g_n(x) - f(x) g(x)| \\
&= |g_n(x)|\, |f_n(x) - f(x)| + |f(x)|\, |g_n(x) - g(x)| \\
&\le M |f_n(x) - f(x)| + M |g_n(x) - g(x)| < \varepsilon.
\end{align*}
For (d), let $1/|g_n(x)| \le M$ for all $n$ and $x$. Then the same inequality holds for $g$, and
\[ \left| \frac{1}{g_n(x)} - \frac{1}{g(x)} \right| = \frac{|g_n(x) - g(x)|}{|g_n(x) g(x)|} \le M^{2} |g_n(x) - g(x)|. \]

The hypothesis of uniform boundedness in parts (c) and (d) of the theorem cannot be relaxed. (See Exercises 6 and 7.)

There are versions of the Cauchy criterion for pointwise and uniform convergence of sequences of functions. For the pointwise version, consider a sequence of functions $f_n$ on $S$ such that $\lim_{m,n} |f_n(x) - f_m(x)| = 0$ for each $x \in S$. Then $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence of real numbers and hence converges to a unique real number $f(x)$. Thus $f_n \to f$ on $S$. Here is the analogous result for uniform convergence:

7.1.10 Uniform Cauchy Criterion. A sequence of functions $f_n$ converges uniformly on a set $S$ iff for each $\varepsilon > 0$ there exists an index $N$ such that
\[ |f_n(x) - f_m(x)| < \varepsilon \quad\text{for all } x \in S \text{ and all } m, n \ge N. \tag{7.1} \]

Proof. If $f_n \to f$ uniformly on $S$, then, given $\varepsilon > 0$, there exists an index $N$ such that $|f_n(x) - f(x)| < \varepsilon/2$ for all $x \in S$ and all $n \ge N$. An application of the triangle inequality yields (7.1). Conversely, assume that the condition holds. Then, in particular, $\lim_{m,n} |f_n(x) - f_m(x)| = 0$ for every $x \in S$, hence, by the observation preceding the theorem, there exists a function $f$ such that $f_n \to f$ pointwise on $S$. We claim that the convergence is in fact uniform. To see this, let $\varepsilon > 0$ and choose $N$ as in (7.1). Letting $m \to +\infty$ in that inequality then yields $|f_n(x) - f(x)| \le \varepsilon$ for all $x \in S$ and all $n \ge N$. This shows that $f_n \to f$ uniformly on $S$.

7.1.11 Definition. Let $S$ be an arbitrary set and let $f_n : S \to \mathbb{R}$. If the sequence $\{f_n(x)\}$ is increasing (decreasing) for each $x \in S$ and $f_n \to f$ on $S$, we write $f_n \uparrow f$ ($f_n \downarrow f$). In either case we say that $\{f_n\}$ is monotone. ♦


The following theorem gives general conditions under which pointwise convergence implies uniform convergence.

7.1.12 Dini's Theorem. Let $f$ and $f_n$ be continuous on $[a, b]$ for each $n$ and suppose that either $f_n \downarrow f$ or $f_n \uparrow f$ on $[a, b]$. Then $f_n \to f$ uniformly.

Proof. We may assume that $f_n \downarrow f$. Let $g_n = f_n - f$, so $g_n \downarrow 0$. Suppose the assertion of the theorem is false. Then there exists an $\varepsilon > 0$, a subsequence $\{h_n\}$ of $\{g_n\}$, and a sequence $\{x_n\}$ in $[a, b]$ such that $h_n(x_n) \ge \varepsilon$ for all $n$. (Why?) By the Bolzano–Weierstrass theorem, there exists a subsequence $\{x_{n_k}\}$ converging to some $x \in [a, b]$. Since $h_n \downarrow$, for any fixed $n$ and all sufficiently large $k$, $h_n(x_{n_k}) \ge h_{n_k}(x_{n_k})$, hence $h_n(x_{n_k}) \ge \varepsilon$. Letting $k \to +\infty$ in the last inequality yields $h_n(x) \ge \varepsilon$ for all $n$, contradicting that $h_n(x) \to 0$.

The examples $x^{n}$ on $[0, 1)$ and $x^{-n}$ on $[2, +\infty)$ show that Dini's theorem is false if the interval is not closed and bounded. The decreasing sequence defined by
\[ f_n(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 1 + n(1 - x) & \text{if } 1 \le x \le 1 + 1/n, \\ 0 & \text{if } 1 + 1/n \le x \le 2 \end{cases} \tag{7.2} \]
shows that continuity of the limit function in Dini's theorem is essential.

Exercises 1. Find the largest subset of R on which the given sequence converges pointwise, and determine the intervals on which the convergence is uniform. (a) xn (1 − x)n . nx2 . enx2 √ 2 nx (g) S . 1 + nx2 x2n (j) S . 2 + x2n (d) S

(b) S np xn (1 − x).

(c) ex/n .

1 . 2n 1 + x (1 − x)2 nx2 (h) . 1 + nx2 1 (k) . 1 + |x|n

x (f) n1/2 sin 2/3 . n n x (i) . 2+x n sin x2 (l) . 1 + nx2

(e)

2. Describe the convergence behavior of the following sequences on $[0, 1]$:
(a)S $\dfrac{x}{nx + 1}$. (b)S $\dfrac{nx}{nx + 1}$. (c) $\dfrac{nx}{n^{2}x + 1}$. (d) $\dfrac{1}{n^{2}x^{2} + 1}$.

3. Describe the convergence behavior of the sequences on $(0, 1)$:
(a) $\{x^{1/n}\}$. (b) $\{x^{1+1/n}\}$. (c) $\{x^{-1/n}\}$. (d) $\{x^{1-1/n}\}$.

4. Show directly that the sequence defined in (7.2) does not converge uniformly.


5. Let $p, q > 0$. Prove that the sequence of functions $\dfrac{x^{p}}{n + x^{q}}$ converges uniformly to zero on $[0, +\infty)$ iff $p < q$.

6.S Give an example of sequences $\{f_n\}$, $\{g_n\}$ and functions $f$, $g$ such that $f_n \to f$ and $g_n \to g$ uniformly, and $f_n g_n \to f g$ pointwise but not uniformly.

7. Give an example of a sequence $\{g_n\}$ and a function $g$ such that $g_n \to g$ uniformly and $1/g_n \to 1/g$ pointwise but not uniformly.

8. Let $-\infty < a < b \le +\infty$. Suppose that $f_n \to f$ uniformly on $[a, r]$ for every $r \in (a, b)$. Prove that $f_n \to f$ uniformly on $[a, b)$ iff for each sequence $\{b_n\}$ with $b_n \uparrow b$, $f_n(b_n) - f(b_n) \to 0$. Use this to show that $f_n(x) := x^{-n}$ does not converge uniformly on $[2, +\infty)$.

9. Let $f_n$ be bounded for each $n$ and let $f_n \to f$ uniformly on a set $S$. Prove that $\sup_S f_n \to \sup_S f$ and $\inf_S f_n \to \inf_S f$.

10.S Let $f$ be uniformly continuous on $\mathbb{R}$ and $a_n \to a$. Set $f_n(x) = f(x + a_n)$. Show that $\{f_n\}$ converges uniformly on $\mathbb{R}$.

11. Let $f_n$ be continuous on $[a, b]$ for each $n$ and let $f_n$ converge uniformly on $(a, b) \cap \mathbb{Q}$. Prove that $f_n$ converges uniformly on $[a, b]$.

12. Prove: If $f_n \to f$ uniformly on each of the sets $S_1, \ldots, S_m$, then $f_n \to f$ uniformly on $S_1 \cup \cdots \cup S_m$. Show that the corresponding statement for a union of infinitely many sets is false.

13.S For $x \in [0, 1]$ define
\[ f_n(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \text{ and } x = k/m \text{ in reduced form with } m \le n, \\ 0 & \text{otherwise.} \end{cases} \]
Show that $\{f_n\}$ converges pointwise but not uniformly to the Dirichlet function.

14. Let $p \in \mathbb{N}$. For $x \in [0, 1]$ define
\[ g_n(x) = \begin{cases} (m + 1/n)^{p} & \text{if } x \in \mathbb{Q},\ x = k/m \text{ in reduced form,} \\ 0 & \text{if } x \text{ is irrational.} \end{cases} \]
Show that $g_n$ converges uniformly on $[0, 1]$ iff $p = 1$.

15. Let $\{f_n\}$ be uniformly bounded, let $f, g$ be bounded on $[0, 1]$, and suppose that $f_n \to f$ pointwise (uniformly) on $[r, 1]$ for each $0 < r < 1$. If $g$ is continuous at 0 and $g(0) = 0$, prove that $f_n g \to f g$ pointwise (uniformly) on $[0, 1]$.


16. Let $\{f_n\}$ be uniformly bounded and $f_n \to f$ uniformly on $S$.
(a) Prove that $(f_1 + f_2 + \cdots + f_n)/n \to f$ uniformly on $S$.
(b)S Suppose for some $r > 0$ that $f_n(x) \ge r$ for all $n$ and all $x \in S$. Prove that $(f_1 f_2 \cdots f_n)^{1/n} \to f$ uniformly on $S$.

17.S Let $f_0$ be a bounded function on a set $S$ and $0 < r < 1$. Define a sequence $\{f_n\}$ recursively by
\[ f_n(x) = \sin\big( r f_{n-1}(x) \big), \quad x \in S,\ n \ge 1. \]
Prove that $\{f_n\}$ converges uniformly on $S$. Show that a similar result holds if $S$ is an interval and $\sin x$ is replaced by any function $g$ such that $\sup_x |g'(x)| < 1/r$, where $r$ is any positive number.

18. Let $g$ and $h$ be positive and continuous on $[a, b]$ and define
\[ f_n(x) := \frac{n g(x)}{1 + n^{2} h(x)}. \]
Prove that the following convergence is uniform on $[a, b]$:
(a) $n \sin f_n \to \dfrac{g}{h}$. (b) $n \big( 1 - \cos f_n \big) \to 0$. (c) $n^{2} \big( 1 - \cos f_n \big) \to \dfrac{g^{2}}{2h^{2}}$.

7.2 Properties of the Limit Function

The theorems in this section give conditions under which the properties of continuity, integrability, or differentiability of functions in a sequence are passed along to the limit function. We shall see that pointwise convergence is generally insufficient for this; the stronger property of uniform convergence is needed.

The following theorem asserts that under suitable conditions two limit processes may be interchanged. It is one of several such results to be found in the text.

7.2.1 Interchange of Limits. Let $f_n \to f$ uniformly on a subset $E$ of $\mathbb{R}$ and let $a$ be an accumulation point of $E$ such that $L_n := \lim_{x \to a,\ x \in E} f_n(x)$ exists in $\mathbb{R}$ for each $n$. Then $L := \lim_n L_n$ exists in $\mathbb{R}$ and $\lim_{x \to a,\ x \in E} f(x) = L$. In other words, the equality
\[ \lim_n \lim_{\substack{x \to a \\ x \in E}} f_n(x) = \lim_{\substack{x \to a \\ x \in E}} \lim_n f_n(x) \]
holds provided that each inner limit exists in $\mathbb{R}$ and the convergence in the inner limit on the right is uniform.


Proof. Given $\varepsilon > 0$, for each $n$ choose $\delta_n > 0$ such that $|f_n(x) - L_n| < \varepsilon/3$ for all $x \in E$ with $|x - a| < \delta_n$. Next, choose $N \in \mathbb{N}$ such that $|f_n(x) - f(x)| < \varepsilon/6$ for all $x \in E$ and all $n \ge N$. For $n, m \ge N$, choose $x \in E$ such that $|x - a| < \min\{\delta_n, \delta_m\}$. Then
\[ |L_n - L_m| \le |L_n - f_n(x)| + |f_n(x) - f_m(x)| + |L_m - f_m(x)| < \varepsilon. \]
This shows that $\{L_n\}$ is a Cauchy sequence and hence converges to some $L \in \mathbb{R}$. Let $n \ge N$ be sufficiently large so that $|L_n - L| < \varepsilon/6$. If $x \in E$ and $|x - a| < \delta_n$, then
\[ |f(x) - L| \le |f(x) - f_n(x)| + |f_n(x) - L_n| + |L_n - L| < \varepsilon/6 + \varepsilon/3 + \varepsilon/6 < \varepsilon. \]
Therefore, $\lim_{x \to a,\ x \in E} f(x) = L$.

7.2.2 Corollary. If $f_n \to f$ uniformly on an interval $I$ and if each $f_n$ is continuous at some $a \in I$, then $f$ is continuous at $a$.

Proof. Take $L_n = f_n(a)$ in the theorem.

The corollary is false if the convergence is only pointwise. For example, the sequence of continuous functions $x^{n}$ converges pointwise on $[0, 1]$ to a function that is discontinuous at $x = 1$.

7.2.3 Theorem. If $f_n \to f$ uniformly on $[a, b]$ and $f_n \in \mathcal{R}_a^b$ for all $n$, then $f \in \mathcal{R}_a^b$ and
\[ \lim_n \int_a^b f_n(t)\, dt = \int_a^b f(t)\, dt. \tag{7.3} \]

Proof. By 7.1.8, $f$ is bounded. By uniform convergence, given $\varepsilon > 0$, there exists an $N$ such that
\[ f_n(x) - \frac{\varepsilon}{4(b - a)} < f(x) < f_n(x) + \frac{\varepsilon}{4(b - a)} \]
for all $x \in [a, b]$ and $n \ge N$. It follows that for fixed $n \ge N$ and any partition $\mathcal{P}$,
\[ \underline{S}(f_n, \mathcal{P}) - \frac{\varepsilon}{4} \le \underline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}) \le \overline{S}(f_n, \mathcal{P}) + \frac{\varepsilon}{4}, \]
hence
\[ \overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) \le \overline{S}(f_n, \mathcal{P}) - \underline{S}(f_n, \mathcal{P}) + \frac{\varepsilon}{2}. \]
Since $f_n$ is integrable, $\mathcal{P}$ may be chosen so that the right side of this inequality is less than $\varepsilon$. Therefore, $f$ is integrable. Since $|f_n(t) - f(t)| < \varepsilon/(4(b - a))$ for $n \ge N$ and all $t$,
\[ \left| \int_a^b f_n(t)\, dt - \int_a^b f(t)\, dt \right| \le \int_a^b |f_n(t) - f(t)|\, dt \le \frac{\varepsilon}{4}, \]
which shows that $\int_a^b f_n \to \int_a^b f$.


The following examples show that the hypothesis of uniform convergence in 7.2.3 cannot be relaxed.

7.2.4 Example. Define $f_n : [0, \pi] \to \mathbb{R}$ by

$$f_n(x) = \begin{cases} n \sin(nx) & \text{if } 0 \le x \le \pi/n, \\ 0 & \text{if } \pi/n \le x \le \pi. \end{cases}$$

FIGURE 7.2: Pointwise convergence insufficient.

Each $f_n$ is continuous and $\{f_n\}$ converges pointwise on $[0, \pi]$ to the zero function, yet $\int_0^\pi f_n = 2$ for all $n$. ♦

7.2.5 Example. Let $r_1, r_2, \ldots$ be an enumeration of the rationals in $[0, 1]$ and let

$$f_n(x) = \begin{cases} 1 & \text{if } x \in \{r_1, \ldots, r_n\}, \\ 0 & \text{otherwise.} \end{cases}$$

Then $f_n$ is integrable with zero integral and $f_n$ converges pointwise to the Dirichlet function, which is not Riemann integrable. ♦

In the two preceding examples, either the sequence was not uniformly bounded or the limit function was not integrable. It will follow from results in Chapter 11 that if $\{f_n\}$ is uniformly bounded, $f_n, f \in \mathcal{R}_a^b$, and $f_n \to f$ merely pointwise on $[a, b]$, then (7.3) holds.

7.2.6 Theorem. Let $f_n$ be differentiable on $(a, b)$ for each $n$ and let $\{f_n'\}$ converge uniformly on $(a, b)$. If $\{f_n(x_0)\}$ converges for some $x_0 \in (a, b)$, then $\{f_n\}$ converges uniformly to a differentiable function $f$ on $(a, b)$ and $f_n' \to f'$ on $(a, b)$.

Proof. Given $\varepsilon > 0$, choose $N$ such that, for all $m, n \ge N$ and $x \in (a, b)$,

$$|f_n(x_0) - f_m(x_0)| < \frac{\varepsilon}{2} \quad\text{and}\quad |f_n'(x) - f_m'(x)| < \frac{\varepsilon}{2(b-a)}.$$

Fix $m, n \ge N$. By the mean value theorem applied to $f_n - f_m$, for each pair $x, y \in (a, b)$ there exists $\xi_{m,n} \in (a, b)$ such that

$$\bigl|f_n(x) - f_m(x) - [f_n(y) - f_m(y)]\bigr| = |f_n'(\xi_{m,n}) - f_m'(\xi_{m,n})|\,|x - y| \le \frac{\varepsilon |x - y|}{2(b-a)} \le \frac{\varepsilon}{2}. \tag{7.4}$$

In particular, for all $x \in (a, b)$,

$$|f_n(x) - f_m(x)| \le \bigl|f_n(x) - f_m(x) - [f_n(x_0) - f_m(x_0)]\bigr| + |f_n(x_0) - f_m(x_0)| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

By the uniform Cauchy criterion, $\{f_n\}$ converges uniformly on $(a, b)$ to some function $f$. Also, from (7.4), for fixed $y$ and for all $x \ne y$,

$$\left|\frac{f_n(x) - f_n(y)}{x - y} - \frac{f_m(x) - f_m(y)}{x - y}\right| \le \frac{\varepsilon}{2(b-a)}.$$

Therefore, the sequence of functions $[f_n(x) - f_n(y)]/(x - y)$ converges uniformly in $x$ on the set $E_y := (a, y) \cup (y, b)$. Since $f_n$ converges to $f$,

$$\frac{f_n(x) - f_n(y)}{x - y} \to \frac{f(x) - f(y)}{x - y} \quad\text{uniformly in } x \text{ on } E_y.$$

By 7.2.1 with $E = E_y$,

$$\lim_n f_n'(y) = \lim_n \lim_{x\to y} \frac{f_n(x) - f_n(y)}{x - y} = \lim_{x\to y} \frac{f(x) - f(y)}{x - y} = f'(y).$$

The sequence given by $f_n(x) = x^n/n$, $0 < x < 1$, shows that uniform convergence of a sequence of functions does not guarantee that the derivatives converge uniformly.
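The behavior in Example 7.2.4 can be checked numerically. The following is a minimal sketch, assuming a midpoint Riemann sum is an adequate approximation of $\int_0^\pi f_n$; the helper names `f` and `integral_0_pi` and the step count are illustrative choices made here. The integrals stay near 2, while at any fixed $x > 0$ the value $f_n(x)$ is 0 as soon as $\pi/n < x$.

```python
import math

def f(n: int, x: float) -> float:
    """f_n from Example 7.2.4: n*sin(n*x) on [0, pi/n], zero on the rest of [0, pi]."""
    return n * math.sin(n * x) if 0.0 <= x <= math.pi / n else 0.0

def integral_0_pi(n: int, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f_n over [0, pi]."""
    h = math.pi / steps
    return h * sum(f(n, (i + 0.5) * h) for i in range(steps))

for n in (1, 5, 50):
    print(n, integral_0_pi(n), f(n, 1.0))
```

For $n = 5$ and $n = 50$ the sample point $x = 1$ already lies outside $[0, \pi/n]$, so the printed function value is 0 even though the integral has not moved.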

Exercises

1. Prove: If $f_n \to f$ uniformly on an interval $I$ and each $f_n$ is continuous at $a \in I$, then, for any sequence $\{a_n\}$ in $I$ with $a_n \to a$, $\lim_n f_n(a_n) = f(a)$.

2. Show that if $f_n \to f$ uniformly on a subset $E$ of $\mathbb{R}$ and each $f_n$ is uniformly continuous on $E$, then $f$ is uniformly continuous on $E$.

3. Prove that $(1 + x/n)^n \to e^x$ uniformly on any bounded interval of $\mathbb{R}$. Conclude that $\int_0^1 (1 + x/n)^n \to e - 1$.

4.S Show that $n^2 x e^{-nx} \to 0$ for all $x \ge 0$, yet $\int_0^1 n^2 x e^{-nx}\,dx \not\to 0$. Why does this not contradict 7.2.3?

5. Evaluate $\lim_n \int_0^1 f_n$ if $f_n(x) =$

1 x n(ex/n − 1) . (b) . (c) . cos(x/n) n sin(x/n) x √ n e−x/n − 1 ax(x + 1)n + 1 (d)S . (e) arctan , a > 0. x nx + 1 √ 6.S Prove that fn (x) := n/(1 + n2 x2 ) converges to 0 pointwise on (0, +∞), uniformly on [r, +∞) for every r > 0, but not uniformly on (0, 1). Show R1 that, nonetheless, 0 fn → 0. (a)

7. Let $\{a_n\}$ be a positive, strictly increasing sequence. Prove that

$$\lim_n \int_0^1 \frac{a_n x}{1 + a_n x}\,dx = \int_0^1 \lim_n \frac{a_n x}{1 + a_n x}\,dx.$$

8. Let f and f 0 be positive and continuous on [a, b]. Define p 2n f 0 (x) nf 0 (x) and gn (x) := . fn (x) := 1 + n2 f (x) 1 + n2 f (x) Use Exercise 7.1.18 to find Z b Z b Z b (a)S lim n sin fn . (b) lim n(1 − cos fn ). (c) lim n(1 − cos gn ). n

n

a

n

a

a

9. Show that if $f_n \to f$ uniformly on $[a, b]$ and $f_n$ is integrable for each $n$, then

$$\int_a^x f_n(t)\,dt \to \int_a^x f(t)\,dt$$

uniformly in $x$ on $[a, b]$.

10. Suppose that $f_n$ is improperly integrable on $[a, c)$, $f_n \to f$ uniformly on $[a, t]$ for all $t \in [a, c)$, and $|f_n| \le g$ on $[a, c)$ for all $n$, where $g$ is improperly integrable on $[a, c)$. Prove that $f$ is improperly integrable on $[a, c)$ and

$$\lim_n \int_a^c f_n = \int_a^c f.$$

11. Prove that if $f$ is continuous on $[0, 1]$, then

$$\lim_n \int_0^1 f(x^n)\,dx = f(0).$$

12. For each $n$, let $f_n$ be continuous on $[a, +\infty)$, $a > 0$, and suppose that $c_n := \lim_{x\to+\infty} f_n(x)$ exists in $\mathbb{R}$. Prove that if $f_n \to f$ uniformly on $[a, +\infty)$, then $\lim_n c_n$ and $\lim_{x\to+\infty} f(x)$ exist and are equal. Show also that

$$\lim_n \int_0^{1/a} f_n(x)\,dx = \int_0^{1/a} f(x)\,dx.$$

Hint. Let $g_n(x) = f_n(1/x)$, $0 < x \le 1/a$, and apply 7.2.1.

13. Let $f_n$ be as in 7.2.4 and define

$$g_n(x) = \int_0^x f_n(t)\,dt, \qquad h_n(x) = x g_n(x), \qquad 0 \le x \le \pi.$$

Show that
(a) $\{g_n\}$ converges pointwise and monotonically on $[0, \pi]$ but not uniformly.
(b) $\{h_n\}$ converges uniformly on $[0, \pi]$.
(c) $\{h_n'\}$ does not converge uniformly on $[0, \pi]$.

7.3 Convergence of Series of Functions

7.3.1 Definition. Let $\{f_n\}$ be a sequence of real-valued functions on a set $S$. For each $x \in S$ and $n \in \mathbb{N}$ form the $n$th partial sums

$$s_n(x) = \sum_{k=1}^n f_k(x) \quad\text{and}\quad t_n(x) = \sum_{k=1}^n |f_k(x)|.$$

The infinite series of functions $\sum_n f_n = \sum_{n=1}^\infty f_n$ is said to converge
• pointwise on $S$ if $\sum_n f_n(x)$ converges for each $x \in S$;
• absolutely pointwise on $S$ if $\sum_n f_n(x)$ converges absolutely for every $x \in S$;
• uniformly on $S$ if $\{s_n\}$ converges uniformly on $S$;
• absolutely uniformly on $S$ if $\{t_n\}$ converges uniformly on $S$. ♦

The methods of Chapter 6 may be applied at each $x$ to test pointwise convergence of a series of functions. For uniform convergence, additional tests are required. The following result is an immediate consequence of 7.1.9.

7.3.2 Theorem. Let $\sum_n f_n$ and $\sum_n g_n$ converge uniformly on a set $S$ and let $\alpha, \beta \in \mathbb{R}$. Then $\sum_n (\alpha f_n + \beta g_n)$ converges uniformly on $S$ and

$$\sum_n (\alpha f_n + \beta g_n) = \alpha \sum_n f_n + \beta \sum_n g_n.$$


The next theorem is a useful test for nonuniform convergence of a series. The proof is immediate from the identity $f_n = s_n - s_{n-1}$.

7.3.3 Theorem. If $\sum_n f_n$ converges uniformly on a set $S$, then $f_n \to 0$ uniformly on $S$.

For example, the geometric series

$$\sum_{n=0}^\infty x^n = \frac{1}{1 - x}, \quad |x| < 1, \tag{7.5}$$

converges pointwise but not uniformly on $(-1, 1)$, since $x^n$ does not tend to zero uniformly on $(-1, 1)$. We show below that the series converges uniformly on all closed subintervals of $(-1, 1)$.

The comparison test for uniform convergence of a series of functions takes the following form:

7.3.4 Uniform Comparison Test. If $|f_n(x)| \le g_n(x)$ for all $n$ and all $x \in S$ and if $\sum_n g_n$ converges uniformly on $S$, then $\sum_n f_n$ converges absolutely uniformly on $S$.

Proof. Since $\sum_{k=m}^n |f_k(x)| \le \sum_{k=m}^n g_k(x)$, the assertion follows from the uniform Cauchy criterion.

7.3.5 Corollary. If $\sum_n f_n$ converges absolutely uniformly on a set $S$, then $\sum_n f_n$ converges uniformly on $S$.

Proof. $0 \le f_n + |f_n| \le 2|f_n|$, hence, by 7.3.4, $\sum_n (f_n + |f_n|)$ converges uniformly on $S$ and therefore so must

$$\sum_n f_n = \sum_n (f_n + |f_n|) - \sum_n |f_n|.$$
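The contrast between pointwise and uniform convergence of the geometric series (7.5) can be illustrated numerically. The sketch below is a rough check under stated assumptions: the supremum of the error $|1/(1-x) - s_n(x)|$ is approximated on finite sets of sample points, and the names `partial_sum`, `sup_error`, `closed`, and `near_one` are illustrative. On $[-r, r]$ with $r = 0.5$ the error is tiny, while on points approaching 1 it is enormous for the same $n$.

```python
def partial_sum(x: float, n: int) -> float:
    """s_n(x) = 1 + x + ... + x**n, the nth partial sum of the geometric series."""
    return sum(x ** k for k in range(n + 1))

def sup_error(points, n: int) -> float:
    """Largest value of |1/(1-x) - s_n(x)| over the given sample points."""
    return max(abs(1.0 / (1.0 - x) - partial_sum(x, n)) for x in points)

r = 0.5
closed = [-r + 2 * r * k / 200 for k in range(201)]   # sample of [-r, r]
near_one = [1.0 - 10.0 ** (-j) for j in range(1, 7)]  # points in (-1, 1) approaching 1

print(sup_error(closed, 30))    # bounded by r**31 / (1 - r)
print(sup_error(near_one, 30))  # large: the tail is not uniformly small on (-1, 1)
```

The exact tail is $x^{n+1}/(1-x)$, so no single $n$ makes the error small on all of $(-1, 1)$.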

7.3.6 Weierstrass M-test. If there exist positive constants $M_n$ such that $\sum_n M_n < +\infty$ and $|f_n| \le M_n$ on $S$ for all $n$, then $\sum_n f_n$ converges absolutely uniformly on $S$.

Proof. Take $g_n$ to be the constant function $M_n$ in 7.3.4.

For example, taking $M_n = r^n$, we see that the geometric series (7.5) converges uniformly in every interval $[-r, r]$, $0 < r < 1$.

The next results are uniform convergence analogs of Dirichlet's and Abel's tests for numerical series.

7.3.7 Theorem. If $\sum_n f_n$ converges uniformly on a set $S$ and if there exists a constant $M$ such that

$$|g_1(x)| + \sum_{n=1}^\infty |g_{n+1}(x) - g_n(x)| \le M \quad\text{for all } x \in S,$$

then $\sum_n f_n g_n$ converges uniformly on $S$.

Proof. Let $s_n = \sum_{k=1}^n f_k - \sum_{n=1}^\infty f_n$ and $t_n = \sum_{k=1}^n f_k g_k$. For each $n > 1$,

$$g_n = \sum_{k=1}^{n-1} (g_{k+1} - g_k) + g_1,$$

hence $|g_n| \le M$ on $S$. Given $\varepsilon > 0$, choose $N$ so that $|s_n(x)| < \varepsilon$ for all $n \ge N$ and $x \in S$. By 6.4.4, for $m > n > N$ and $x \in S$,

$$|t_m(x) - t_{n-1}(x)| \le \sum_{k=n}^m |s_k(x)|\,|g_k(x) - g_{k+1}(x)| + |s_m(x)|\,|g_m(x)| + |s_{n-1}(x)|\,|g_n(x)| \tag{7.6}$$

$$\le M\varepsilon + M\varepsilon + M\varepsilon = 3M\varepsilon.$$

Therefore, $\{t_n\}$ is uniformly Cauchy on $S$ and hence converges uniformly.

7.3.8 Theorem. If, on a set $S$, the partial sums of $\sum_n f_n$ are uniformly bounded, $\sum_n |g_{n+1} - g_n|$ converges uniformly, and $g_n \to 0$ uniformly, then $\sum_n f_n g_n$ converges uniformly on $S$.

Proof. Let $t_n$ be as in the proof of 7.3.7, $s_n := \sum_{k=1}^n f_k$, and let $M$ be a uniform bound for $\{s_n\}$ on $S$. Given $\varepsilon > 0$, choose $N$ such that

$$|g_n(x)| < \varepsilon \quad\text{and}\quad \sum_{k=n}^m |g_k(x) - g_{k+1}(x)| < \varepsilon, \quad m > n > N,\ x \in S. \tag{7.7}$$

Since (7.6) holds in the current setting, (7.7) implies that

$$|t_m(x) - t_{n-1}(x)| \le 3M\varepsilon, \quad m > n > N,\ x \in S.$$

Therefore, $\{t_n\}$ converges uniformly on $S$.

7.3.9 Corollary. If the partial sums of $\sum_n f_n$ are uniformly bounded and if $g_n \downarrow 0$ or $g_n \uparrow 0$ uniformly on $S$, then $\sum_n f_n g_n$ converges uniformly on $S$.

Proof. Assume that $\{g_n\}$ is decreasing. Then

$$\sum_{k=1}^n |g_{k+1} - g_k| = \sum_{k=1}^n (g_k - g_{k+1}) = g_1 - g_{n+1},$$

hence $\sum_{n=1}^\infty |g_{n+1} - g_n|$ converges uniformly.

7.3.10 Example. Let $g_n$ be continuous and $g_n \downarrow 0$ or $g_n \uparrow 0$ on $\mathbb{R}$. We apply the preceding corollary to the series

$$s(x) := \sum_n g_n(x) \sin nx$$

on closed bounded intervals $I$ not containing any integer multiple of $2\pi$. By Dini's theorem, $g_n \to 0$ uniformly on $I$. Also, by 6.4.6, $s(x)$ converges pointwise on $\mathbb{R}$. Moreover, if $x$ is not a multiple of $2\pi$, then

$$\left|\sum_{k=1}^n \sin(kx)\right| \le \frac{1}{|\sin(x/2)|}.$$

Since $\inf_I |\sin(x/2)| > 0$, the sums $\sum_{k=1}^n \sin(kx)$ are uniformly bounded on $I$. By 7.3.9, $s(x)$ converges uniformly on $I$. By 7.3.8, the same result holds if, instead of monotonicity of the sequence $\{g_n\}$, we require that $\sum_n |g_{n+1} - g_n|$ converges and $g_n \to 0$, both uniformly on $I$. Analogous results hold for series of the form $\sum_n g_n(x) \cos nx$. ♦

7.3.11 Uniform Alternating Series Test. If $g_n \downarrow 0$ or $g_n \uparrow 0$ uniformly on a set $S$, then $\sum_{n=1}^\infty (-1)^{n+1} g_n$ converges uniformly on $S$.

Proof. Take $f_n = (-1)^{n+1}$ in 7.3.9.

7.3.12 Example. Let $f$ be continuous on $\mathbb{R}$ and monotone in some neighborhood $\mathcal{N}$ of 0 with $f(0) = 0$. If $a_n \downarrow 0$, then the series

$$\sum_{n=1}^\infty (-1)^{n+1} f(a_n x)$$

converges uniformly on any closed, bounded interval $I$. We verify this for the case $I \subseteq [0, +\infty)$ and $f$ increasing. Choose $N$ so that $a_n x \in \mathcal{N}$ for all $n \ge N$ and $x \in I$. Then $f(a_n x) \downarrow 0$ on $I$, hence, by Dini's theorem and 7.3.11, $\sum_{n=1}^\infty (-1)^{n+1} f(a_n x)$ converges uniformly on $I$. For example, taking $a_n = 1/n$ we see that the series

$$\sum_{n=1}^\infty (-1)^{n+1} \sin(x/n), \qquad \sum_{n=1}^\infty (-1)^{n+1} n^{-1} x e^{x/n}, \qquad\text{and}\qquad \sum_{n=1}^\infty (-1)^{n+1} \bigl[1 - e^{-n^{-2} x^2}\bigr]$$

all converge uniformly on closed bounded intervals. ♦

The following theorem is an immediate consequence of 7.2.2, 7.2.3, and 7.2.6 applied to the sequence of partial sums of the series.

7.3.13 Theorem. Let $f_n : [a, b] \to \mathbb{R}$ and $s := \sum_n f_n$.
(a) If $s$ converges uniformly on $[a, b]$ and each $f_n$ is continuous, then $s$ is continuous.
(b) If $s$ converges uniformly on $[a, b]$ and $f_n \in \mathcal{R}_a^b$ for all $n$, then $s \in \mathcal{R}_a^b$ and $\int_a^b s = \sum_n \int_a^b f_n$.
(c) Let $f_n$ be differentiable on $(a, b)$ and suppose that the derived series $\sum_n f_n'$ converges uniformly on $(a, b)$ and that $\sum_n f_n(x_0)$ converges for some $x_0 \in (a, b)$. Then $s$ converges uniformly on $(a, b)$ and $s' = \sum_n f_n'$.

7.3.14 Example. (a) By 7.3.10, $\sum_n n^{-1} \sin(nx)$ converges uniformly on intervals $[a, b] \subseteq (0, 2\pi)$, hence

$$\int_a^b \sum_n n^{-1} \sin(nx)\,dx = \sum_n \int_a^b n^{-1} \sin(nx)\,dx = \sum_n \frac{\cos(na) - \cos(nb)}{n^2}.$$

On the other hand, the derived series $\sum_n \cos(nx)$ does not converge.

(b) Both $s(x) := \sum_n n^{-1} \sin(x/n)$ and its derived series $\sum_n n^{-2} \cos(x/n)$ converge uniformly on $\mathbb{R}$, hence the latter equals $s'(x)$. ♦

A closed form for a series $s := \sum_n f_n$ on a subset $E$ of $\mathbb{R}$ is a "standard function" that equals $s$ on $E$. Closed forms are typically combinations of rational, power, exponential, logarithmic, trigonometric, or inverse trigonometric functions.

7.3.15 Example. Since $1/(1 - x)$ is a closed form for the geometric series (7.5) on $(-1, 1)$, the function

$$\frac{2 + \sin x}{1 + \sin x} = \frac{1}{1 - \dfrac{1}{2 + \sin x}}$$

is a closed form for the series $\sum_{n=0}^\infty \left(\dfrac{1}{2 + \sin x}\right)^n$ on intervals $I$ not containing $(4n - 1)\pi/2$ or $-(4n + 1)\pi/2$, $n = 0, 1, 2, \ldots$. By the Weierstrass M-test, the series converges absolutely uniformly on closed subintervals of $I$, since on such a subinterval $0 < 1/(2 + \sin x) < 1/(1 + \varepsilon)$ for some $\varepsilon > 0$. ♦

Exercises

1. For the functions $f_n$ below, determine all subintervals of $[0, +\infty)$ on which $\sum_{n=0}^\infty f_n(x)$ converges pointwise or uniformly, where $p \in \mathbb{N}$.
(a)S $\dfrac{1}{1 + x^n}$. (b) $\dfrac{x^n}{1 + x^n}$. (c) $\dfrac{x}{n^2 + x}$.
(d)S $\dfrac{x}{1 + n^2 x}$. (e) $\left(\dfrac{x}{x - 2}\right)^n$. (f) $\dfrac{\sin(nx)}{1 + n^2 x^2}$.
(g)S $n^p e^{-nx}$. (h) $n^{-x}$. (i)S $\sin(x/n^p)$.
(j) $x^n (1 - x)^n$. (k) $\left(\dfrac{1 - x}{1 + x}\right)^n$. (l) $x^n e^{-nx}$.

2. Find the largest intervals of pointwise convergence and uniform convergence and a closed form for the series $\sum_{n=0}^\infty f_n(x)$, where $f_n(x) =$
(a) $\cos^n(\pi x/2)$, $x \in [0, 1]$. (b)S $\ln^n(1/x)$. (c) $\dfrac{(-1)^n}{e^{nx}}$. (d) $(x^2 \ln x)^n$.

3. Prove that if $\sum_n f_n$ converges absolutely uniformly on a set $S$, then $\sum_n f_n^+$ and $\sum_n f_n^-$ converge uniformly on $S$, where, for each $x \in S$, $f_n^+(x)$ and $f_n^-(x)$ are, respectively, the positive and negative parts of $f_n(x)$.

4.S Suppose that the numerical series $\sum_n a_n$ converges absolutely. Let

$$s(t) = \sum_n a_n \sin\bigl((2n + 1)t\bigr) \quad\text{and}\quad c(t) = \sum_n a_n \cos(nt).$$

Find series expansions for

$$\int_x^{\pi/2} s(t)\,dt \quad\text{and}\quad \int_0^x c(t)\,dt.$$

5. Let $p > 0$ and $s(x) = \sum_{n=1}^\infty \sin(x/n^p)$. Prove:
(a) If $p \le 1$, then $s(x)$ diverges for all $x \ne 0$.
(b) If $p > 1$, then $s(x)$ converges absolutely uniformly on bounded intervals (hence pointwise on $\mathbb{R}$) but not uniformly on $\mathbb{R}$.

6.S Let $p > 0$ and $s(x) = \sum_{n=1}^\infty [1 - \cos(x/n^p)]$. Prove:
(a) If $p \le 1/2$, then $s(x)$ diverges for all $x \ne 0$.
(b) If $p > 1/2$, then $s(x)$ converges absolutely uniformly on bounded intervals (hence pointwise on $\mathbb{R}$) but not uniformly on $\mathbb{R}$.

7. Let $f(x)$ be bounded on $[0, 1]$ and

$$t(x) := \sum_{n=0}^\infty x^n f(x), \quad x \in [0, 1].$$

(a) Prove that $t(x)$ converges pointwise on $[0, 1)$ and uniformly on $[0, r]$ for $0 < r < 1$.
(b) Prove that if $f(1) \ne 0$, then the convergence of $t(x)$ is not uniform on $[0, 1)$.
(c) Suppose that $L := \lim_{x\to 1^-} (1 - x)^{-1} f(x)$ exists. Prove that the convergence of $t(x)$ is uniform on $[0, 1)$ iff $L = 0$.
(d) Let $m \in \mathbb{N}$. Determine whether the convergence of $t(x)$ is uniform on $[0, 1)$ for $f(x) =$
(i) $(1 - x)^m$. (ii) $1 - x^m$. (iii) $1 - \sin(\pi x/2)$. (iv) $\cos(\pi x/2)$.


8. (Uniform limit comparison test). Let $f_n \ge 0$ and $g_n > 0$ on a set $S$ and let $f_n/g_n \to h$ uniformly on $S$, where $h : S \to \mathbb{R}$ satisfies

$$0 < \inf_S h \le \sup_S h < +\infty.$$

Prove that $\sum_n f_n$ converges uniformly on $S$ iff $\sum_n g_n$ converges uniformly on $S$.

9.S Suppose that $f'$ exists, is bounded on $I := (-r, r)$, and $f(0) = 0$. Prove that the series

$$s(x) := \sum_{n=1}^\infty \frac{1}{n}\, f\!\left(\frac{x}{n + 1}\right)$$

converges uniformly on $I$ and that $s'(0) = f'(0)$.

10. Suppose that $|f(x)| \le |x|$ on $I = (-r, r)$, $r > 0$. If $f$ is differentiable on $I$ and $f'$ is continuous at 0, show that the series $s(x)$ in Exercise 9 converges uniformly on $I$ and that $|s'(0)| \le 1$.

11.S Let $f_n(x)$ be continuous and nonnegative on $[a, b]$. Prove that if $\sum_n f_n$ converges pointwise on $[a, b]$ to a continuous function, then the convergence is uniform.

12. Let $\{a_n\}$ be a sequence such that $\sum_{n=1}^\infty a_n^{-1}$ converges absolutely. Prove that $\sum_{n=1}^\infty |x - a_n|^{-1}$ converges uniformly on bounded intervals not containing any $a_n$.

13.S Suppose that $f_n$ is monotone on $[a, b]$ for each $n$. Prove that if $\sum_n f_n(a)$ and $\sum_n f_n(b)$ converge absolutely, then $\sum_n f_n \in \mathcal{R}_a^b$ and

$$\int_a^b \sum_n f_n = \sum_n \int_a^b f_n.$$

14. Let $\sum_n f_n$ converge uniformly on $S$ and let $\{g_n\}$ be a uniformly bounded sequence of functions on a set $S$ such that either $\{g_n\}$ is monotone increasing or monotone decreasing on $S$. Prove that $\sum_n f_n g_n$ converges uniformly on $S$.

15.S Suppose that the partial sums of $\sum_n f_n$ are uniformly bounded on $I = [a, b]$, $g_n$ is continuous for each $n$, and $g_n \downarrow 0$ or $g_n \uparrow 0$ on $I$. Prove that $\sum_n f_n g_n$ converges uniformly on $I$.

16. Suppose that $\sum_n f_n$ converges uniformly on $I = [a, b]$, $g_n$ is continuous for each $n$, and $g_n \downarrow g$ or $g_n \uparrow g$ on $I$, where $g$ is continuous. Prove that $\sum_n f_n g_n$ converges uniformly on $I$.

17. Suppose that $g_n$ is continuous on $I = [a, b]$ for each $n$, $\{g_n\}$ is monotone, and $s(x) := \sum_n (-1)^n g_n(x)$ converges for each $x \in I$. Prove that $s(x)$ is continuous on $I$.

18.S Let $g$ be continuous and nonnegative on $\mathbb{R}$. Prove that the series

$$s(x) := \sum_{n=1}^\infty (-1)^n\, \frac{g(x) + n}{n^2}$$

converges uniformly on bounded intervals, hence pointwise on $\mathbb{R}$, but does not converge absolutely for any $x$.

19. Let $g_n$ be continuous and $g_n \downarrow 0$ on $\mathbb{R}$. Show that if $[a, b]$ does not contain any odd multiple of $\pi$, then $\sum_n (-1)^n g_n(x) \cos nx$ converges uniformly on $[a, b]$.

7.4 Power Series

A power series in $x$ about $a$ is an infinite series of the form

$$s(x) = \sum_{n=0}^\infty c_n (x - a)^n, \quad a, c_n \in \mathbb{R}.$$

In the following four subsections we examine the properties of these important series. The first step is to determine the convergence set of a power series.

Radius of Convergence of a Power Series

7.4.1 Radius of Convergence Theorem. Given a power series $s(x) := \sum_{n=0}^\infty c_n (x - a)^n$, define the extended real number $R \in [0, +\infty]$ by

$$R = \rho^{-1}, \quad\text{where}\quad \rho := \limsup_n |c_n|^{1/n}.$$

Then $s(x)$
(a) converges absolutely pointwise for $|x - a| < R$;
(b) converges absolutely uniformly for $|x - a| \le r < R$;
(c) diverges for $|x - a| > R$.

Proof. For the case $R = 0$ ($\rho = +\infty$), the theorem asserts that $s(x)$ diverges for all $x \ne a$. This is immediate from the root test. A similar application of the root test proves (c): If $|x - a| > R$, then

$$\limsup_n |c_n (x - a)^n|^{1/n} = \rho |x - a| > 1.$$

To prove (a) and (b), assume $R > 0$ ($\rho < +\infty$) and let $0 < r < s < R$.


Then $\rho < 1/s$, so there exists an index $N$ such that $|c_n|^{1/n} < 1/s$ for all $n \ge N$. For such $n$ and for all $x$ with $|x - a| \le r$, $|c_n (x - a)^n| \le (r/s)^n$. Since $r/s < 1$, the series converges uniformly on $[a - r, a + r]$ by the Weierstrass M-test. Since $r$ is arbitrary, part (a) follows.

The number $R = 1/\rho$ is called the radius of convergence of the series. The set $I$ of all $x$ for which the series $\sum_{n=0}^\infty c_n (x - a)^n$ converges is called the interval of convergence. By 7.4.1, $I$ is one of the intervals $\{a\}$, $(a - R, a + R)$, $(a - R, a + R]$, $[a - R, a + R)$, or $[a - R, a + R]$. The theorem gives no further information regarding $I$. The methods of Chapter 6 may be applied to determine convergence behavior at the endpoints $a \pm R$ if $R$ is finite.

The following characterization of $R$ is frequently useful.

7.4.2 Theorem. If $c_n > 0$ for all sufficiently large $n$, then

$$R = \lim_n \frac{|c_n|}{|c_{n+1}|},$$

provided the limit exists in $\mathbb{R}$.

Proof. Let $L$ denote the limit and set $a_n = |c_n| > 0$ for all $n \ge N$. The assertion then follows from the inequalities

$$\frac{1}{L} = \liminf_n \frac{a_{n+1}}{a_n} \le \liminf_n a_n^{1/n} \le \rho = \limsup_n a_n^{1/n} \le \limsup_n \frac{a_{n+1}}{a_n} = \frac{1}{L}$$

(Exercise 2.4.12).

Here are some typical examples using 7.4.2, where $I$ is the convergence interval.

Examples.
(a) $\sum_{n=1}^\infty n^n x^n$, $I = \{0\}$.
(b) $\sum_{n=1}^\infty \dfrac{x^n}{n!}$, $I = (-\infty, +\infty)$.
(c) $\sum_{n=1}^\infty \dfrac{x^n}{\sqrt{n}}$, $I = [-1, 1)$, conditional convergence at $-1$.
(d) $\sum_{n=1}^\infty \dfrac{x^n}{n^2}$, $I = [-1, 1]$, absolute convergence at $\pm 1$. ♦

The following example is somewhat more interesting.


7.4.3 Example. The Fibonacci sequence $\{c_n\}$ is defined by

$$c_0 = c_1 = 1, \quad c_n = c_{n-1} + c_{n-2}, \quad n \ge 2.$$

The Fibonacci power series is the series $\sum_{n=0}^\infty c_n x^n$. We use 7.4.2 to show that the radius of convergence of the series is $(\sqrt{5} - 1)/2$. Set $r_n = c_{n+1}/c_n$. Note that the first few terms of the sequence $\{r_n\}$ are $1, 2, 3/2, 5/3, 8/5$ and that

$$r_n = \frac{c_n + c_{n-1}}{c_n} = 1 + \frac{1}{r_{n-1}}, \quad n \ge 2. \tag{7.8}$$

An induction argument then shows that

$$3/2 \le r_n \le 5/3, \quad n \ge 2. \tag{7.9}$$

Now, from (7.8),

$$r_n - r_m = \frac{1}{r_{n-1}} - \frac{1}{r_{m-1}} = \frac{r_{m-1} - r_{n-1}}{r_{m-1}\, r_{n-1}}. \tag{7.10}$$

In particular,

$$r_{2k+1} - r_{2k-1} = \frac{r_{2k-2} - r_{2k}}{r_{2k}\, r_{2k-2}} \quad\text{and}\quad r_{2k} - r_{2k-2} = \frac{r_{2k-3} - r_{2k-1}}{r_{2k-1}\, r_{2k-3}},$$

hence

$$r_{2k+1} - r_{2k-1} = \frac{r_{2k-1} - r_{2k-3}}{r_{2k}\, r_{2k-2}\, r_{2k-1}\, r_{2k-3}}.$$

Iterating, we obtain $r_{2k+1} - r_{2k-1} = (r_3 - r_1)/a_k$ for some $a_k > 0$. Since $r_3 - r_1 = 5/3 - 2 < 0$, the sequence $\{r_{2k+1}\}$ is decreasing. A similar argument, using $r_2 - r_0 = 1/2 > 0$, shows that $\{r_{2k}\}$ is increasing. Therefore, by (7.9), the sequences converge, say, $r_{2k+1} \to L$ and $r_{2k} \to M$. From (7.10),

$$|r_n - r_{n-1}| = \frac{|r_{n-1} - r_{n-2}|}{r_{n-1}\, r_{n-2}} = \cdots = \frac{|r_2 - r_1|}{b_n},$$

where $b_n$ is a product of $2n - 4$ terms, each of which is an $r_k$. From (7.9), $b_n \to +\infty$, hence $r_n - r_{n-1} \to 0$. Therefore,

$$\frac{c_{n+1}}{c_n} = r_n \to L = M = 1/R,$$

where $R$ is the radius of convergence of the series. Taking limits in (7.8) shows that $1/R = 1 + R$, which has positive solution $R = (\sqrt{5} - 1)/2$. ♦

Since a power series converges uniformly on closed bounded subintervals of $(a - R, a + R)$, 7.3.13 implies that the series is continuous on the entire interval. The following theorem extends continuity to the endpoints.
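The radius of convergence found in Example 7.4.3 can be checked numerically. The sketch below is illustrative only: it computes the ratio estimate $c_n/c_{n+1}$ from 7.4.2 and the root estimate $1/c_n^{1/n}$ from 7.4.1 for the Fibonacci coefficients; the function names are assumptions made here. The ratio estimate converges very quickly to $(\sqrt{5} - 1)/2$, while the root estimate approaches the same value more slowly.

```python
import math

def fibonacci_coefficients(count: int) -> list[int]:
    """First `count` coefficients c_0, c_1, ... with c_0 = c_1 = 1."""
    c = [1, 1]
    while len(c) < count:
        c.append(c[-1] + c[-2])
    return c[:count]

def ratio_estimate(n: int) -> float:
    """c_n / c_{n+1}, the quantity whose limit is R in 7.4.2."""
    c = fibonacci_coefficients(n + 2)
    return c[n] / c[n + 1]

def root_estimate(n: int) -> float:
    """1 / c_n**(1/n), an approximation of 1/limsup |c_n|**(1/n) from 7.4.1."""
    c = fibonacci_coefficients(n + 1)
    return 1.0 / c[n] ** (1.0 / n)

R = (math.sqrt(5) - 1) / 2
print(R, ratio_estimate(60), root_estimate(400))
```

Both estimates agree with the closed-form value, which is the reciprocal of the golden ratio.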


7.4.4 Abel's Continuity Theorem. Let $s(x) := \sum_{n=0}^\infty c_n (x - a)^n$ have radius of convergence $R$ with $0 < R < +\infty$. If $s(x)$ converges at $x = a + R$, then $s(x)$ converges uniformly on $[b, a + R]$ for any $b \in (a - R, a + R)$. In particular, $s$ is continuous on $(a - R, a + R]$.

Proof. The transformation $x = Ry + a$ produces a power series in $y = (x - a)/R$ that converges on $(-1, 1]$. Hence we may assume in the original series that $a = 0$ and $s(x)$ converges on $(-1, 1]$. It suffices then to show that $s(x)$ converges uniformly on $[0, 1]$.

Let $s_n(x) = \sum_{k=0}^n c_k x^k$, $0 \le x \le 1$. For $n > m > 1$, define

$$C_{m,n} = \sum_{k=m}^n c_k = s_n(1) - s_{m-1}(1).$$

By 6.4.4,

$$s_n(x) - s_{m-1}(x) = \sum_{k=m}^{n-1} C_{m,k} (x^k - x^{k+1}) + C_{m,n} x^n - C_{m-1,n} x^m.$$

Since $\sum_n c_n$ converges, given $\varepsilon > 0$, we may choose $N$ such that $|C_{m,n}| < \varepsilon/3$ for all $n > m \ge N$. Then for all $n > m \ge N$,

$$|s_n(x) - s_{m-1}(x)| \le \sum_{k=m}^{n-1} |C_{m,k}| (x^k - x^{k+1}) + |C_{m,n}| + |C_{m-1,n}| \le \frac{\varepsilon}{3} \sum_{k=m}^{n-1} (x^k - x^{k+1}) + \frac{2\varepsilon}{3}.$$

Since $\sum_{k=m}^{n-1} (x^k - x^{k+1}) = x^m - x^n \le 1$, the last expression is $\le \varepsilon$. This shows that $\{s_n\}$ is uniformly Cauchy on $[0, 1]$, hence converges uniformly.

The next result shows that a power series may be differentiated or integrated term by term over the interior of the interval of convergence.

7.4.5 Theorem. Let $s(x) := \sum_{n=0}^\infty c_n (x - a)^n$ have radius of convergence $R > 0$. Then the derived series and the integrated series

$$D(x) := \sum_{n=1}^\infty n c_n (x - a)^{n-1} \quad\text{and}\quad I(x) := \sum_{n=0}^\infty \frac{c_n}{n + 1} (x - a)^{n+1}$$

have radius of convergence $R$. Moreover, $s(x)$ is differentiable on the interval $(a - R, a + R)$, and for $x \in (a - R, a + R)$

$$s'(x) = D(x) \quad\text{and}\quad \int_a^x s(t)\,dt = I(x).$$


Proof. Since $\lim_n n^{1/n} = \lim_n 1/(n + 1)^{1/n} = 1$,

$$\limsup_n |n c_n|^{1/n} = \limsup_n |c_n/(n + 1)|^{1/n} = \limsup_n |c_n|^{1/n}.$$

Therefore, the series $s(x)$, $D(x)$, and $I(x)$ have the same radius of convergence. Since the differentiation and integration take place on closed subintervals where the convergence of each of the three series is uniform, the remaining assertions follow from 7.3.13.

Representation of Functions by Power Series

A power series $s(x) = \sum_{n=0}^\infty c_n (x - a)^n$ is said to represent a function $f$ on an interval $I$ if $f = s$ on $I$. The largest interval for which the representation is valid is called the representation interval. Note that the representation interval may be smaller than the convergence interval. (See the examples below.)

Power series representations are unique. Indeed, if $f$ is represented by $\sum_{n=0}^\infty c_n (x - a)^n$ on $I_a := (a - r, a + r)$, $r > 0$, then, by 7.4.5, $f$ has derivatives of all orders on $I_a$, and repeated differentiation of the identity

$$f(x) = \sum_{n=0}^\infty c_n (x - a)^n, \quad x \in I_a,$$

shows that $f^{(n)}(a) = c_n\, n!$. Therefore, if $f$ has a power series representation about $a$, then

$$f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x - a)^n, \quad x \in I_a.$$

The last series is called the Taylor series expansion of $f$ about $a$. For $a = 0$ it is called a Maclaurin series.

The following examples show how various power series representations may be obtained from the geometric series representation of $(1 - x)^{-1}$ given in (7.5).

7.4.6 Examples. (a) Differentiating (7.5) term by term and multiplying the result by $x$ yields the representation

$$\frac{x}{(1 - x)^2} = \sum_{n=1}^\infty n x^n, \quad |x| < 1. \tag{7.11}$$

(b) Replacing $x$ in (7.5) by $-t$ and integrating produces

$$\ln(x + 1) = \int_0^x \frac{1}{1 + t}\,dt = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} x^n, \quad |x| < 1. \tag{7.12}$$

Since the series converges at $x = 1$, Abel's continuity theorem shows that

$$\ln 2 = \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n},$$

a result obtained in 6.4.8 by another method.

(c) Replacing $x$ in (7.5) by $-t^2$ and integrating produces

$$\arctan x = \int_0^x \frac{1}{1 + t^2}\,dt = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{2n + 1}, \quad |x| < 1. \tag{7.13}$$

(d) For an example with $a \ne 0$, consider

$$\frac{3}{5 - 2x} = \frac{3}{3 - 2(x - 1)} = \frac{1}{1 - 2(x - 1)/3} = \sum_{n=0}^\infty \frac{2^n}{3^n} (x - 1)^n, \quad |x - 1| < \frac{3}{2}. \ ♦$$
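The representations (7.11), (7.12), and (7.13) lend themselves to a quick numerical sanity check. The sketch below is a minimal illustration, assuming plain partial sums and comparison against Python's `math` functions; the function names and the numbers of terms are choices made here. Inside the interval of convergence the partial sums agree with the closed forms to near machine precision, whereas at the endpoint $x = 1$ the series for $\ln 2$ converges, but slowly.

```python
import math

def log1p_series(x: float, terms: int) -> float:
    """Partial sum of (7.12): sum of (-1)**(n+1) * x**n / n for n = 1..terms."""
    return sum((-1) ** (n + 1) * x ** n / n for n in range(1, terms + 1))

def arctan_series(x: float, terms: int) -> float:
    """Partial sum of (7.13): sum of (-1)**n * x**(2n+1) / (2n+1) for n = 0..terms-1."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

def derived_geometric_series(x: float, terms: int) -> float:
    """Partial sum of (7.11): sum of n * x**n for n = 1..terms."""
    return sum(n * x ** n for n in range(1, terms + 1))

x = 0.5
print(log1p_series(x, 60), math.log(1 + x))
print(arctan_series(x, 40), math.atan(x))
print(derived_geometric_series(x, 80), x / (1 - x) ** 2)
print(log1p_series(1.0, 1000), math.log(2))  # endpoint value given by Abel's theorem
```

The endpoint comparison reflects the alternating series error bound: after 1000 terms the partial sum is within $1/1001$ of $\ln 2$.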

The next example and the theorem thereafter show that differentiation can be a powerful tool for finding a closed form for a power series.

7.4.7 Example. We show that

$$e^x = \sum_{n=0}^\infty \frac{x^n}{n!}, \quad -\infty < x < +\infty. \tag{7.14}$$

Let $s(x)$ denote the series. By 7.4.2, the radius of convergence of $s$ is

$$\lim_n \frac{(n + 1)!}{n!} = \lim_n (n + 1) = +\infty,$$

so $s(x)$ converges for all $x$. Differentiating the series term by term yields $s'(x) = s(x)$. Now set $g(x) = e^{-x} s(x)$. Then $g'(x) = e^{-x}[s'(x) - s(x)] = 0$, hence $g$ is constant. Since $g(0) = 1$, $s(x) = e^x$. ♦

The following result is an extension of the binomial theorem. The coefficient $\binom{a}{n}$ in (7.15) is called a generalized binomial coefficient.

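7.4.8 Binomial Series. For any $a \in \mathbb{R}$ and $|x| < 1$,

$$(1 + x)^a = \sum_{n=0}^\infty \binom{a}{n} x^n, \qquad \binom{a}{n} := \frac{a(a - 1) \cdots (a - n + 1)}{n!}, \qquad \binom{a}{0} := 1. \tag{7.15}$$

Proof. Let $s(a, x)$ denote the series in (7.15). A simple calculation shows that

$$\left|\binom{a}{n}\right| \left|\binom{a}{n + 1}\right|^{-1} = \frac{n + 1}{|a - n|} \to 1.$$

Therefore, by 7.4.2, $s(a, x)$ converges for $|x| < 1$. For such $x$,

$$\begin{aligned}
(1 + x)\, s(a - 1, x) &= \sum_{n=0}^\infty \binom{a - 1}{n} x^n + \sum_{n=0}^\infty \binom{a - 1}{n} x^{n+1} \\
&= 1 + \sum_{n=0}^\infty \left[\binom{a - 1}{n + 1} + \binom{a - 1}{n}\right] x^{n+1} \\
&= 1 + \sum_{n=0}^\infty \binom{a}{n + 1} x^{n+1} \\
&= s(a, x),
\end{aligned} \tag{7.16}$$

where for the third equality we used the identity (Exercise 6)

$$\binom{a - 1}{n} + \binom{a - 1}{n + 1} = \binom{a}{n + 1}, \quad n \in \mathbb{Z}^+. \tag{7.17}$$

Now differentiate the series $s(a, x)$ term by term to obtain

$$s'(a, x) = \sum_{n=1}^\infty \binom{a}{n} n x^{n-1} = \sum_{n=0}^\infty \binom{a}{n + 1} (n + 1) x^n = a \sum_{n=0}^\infty \binom{a - 1}{n} x^n = a\, s(a - 1, x). \tag{7.18}$$

Set $g(x) = (1 + x)^{-a} s(a, x)$, $|x| < 1$. By (7.18) and (7.16),

$$g'(x) = -a(1 + x)^{-a-1} s(a, x) + a(1 + x)^{-a} s(a - 1, x) = a(1 + x)^{-a-1} \bigl[-s(a, x) + (1 + x)\, s(a - 1, x)\bigr] = 0.$$

Therefore, $g(x) = g(0) = 1$, hence $s(a, x) = (1 + x)^a$, as claimed.

7.4.9 Example. Replacing $x$ in (7.15) by $-x$, we have

$$\frac{1}{\sqrt{1 - x}} = \sum_{n=0}^\infty \binom{-1/2}{n} (-1)^n x^n, \quad |x| < 1.$$

Since

$$\begin{aligned}
\binom{-1/2}{n} &= \frac{1}{n!} \left(-\frac{1}{2}\right) \left(-\frac{3}{2}\right) \cdots \left(-\frac{2n - 1}{2}\right) = \frac{(-1)^n}{n!} \cdot \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n} \\
&= \frac{(-1)^n}{n!} \cdot \frac{1 \cdot 2 \cdot 3 \cdot 4 \cdots (2n - 1) \cdot 2n}{2^n \cdot 2 \cdot 4 \cdots 2n} = \frac{(-1)^n (2n)!}{(n!)^2\, 4^n},
\end{aligned}$$

we see that

$$\frac{1}{\sqrt{1 - x}} = \sum_{n=0}^\infty \frac{(2n)!}{(n!)^2\, 4^n} x^n, \quad |x| < 1. \tag{7.19}$$

Replacing $x$ by $t^2$ and integrating term by term from 0 to $x$ yields the Maclaurin series for $\arcsin x$:

$$\arcsin x = \sum_{n=0}^\infty \frac{(2n)!}{(2n + 1)\, 4^n\, (n!)^2} x^{2n+1}, \quad |x| < 1. \tag{7.20}$$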
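The generalized binomial coefficient and the series (7.15) and (7.20) can be evaluated directly. The following sketch is an illustration under stated assumptions: the coefficient is built from the product formula in (7.15), partial sums are truncated at a fixed number of terms, and the function names are hypothetical. It compares the partial sums with $(1 + x)^a$ and $\arcsin x$ at a point inside $(-1, 1)$.

```python
import math

def gen_binom(a: float, n: int) -> float:
    """Generalized binomial coefficient a(a-1)...(a-n+1)/n!, with value 1 for n = 0."""
    result = 1.0
    for j in range(n):
        result *= (a - j) / (j + 1)
    return result

def binomial_series(a: float, x: float, terms: int) -> float:
    """Partial sum of (7.15) with the given number of terms."""
    return sum(gen_binom(a, n) * x ** n for n in range(terms))

def arcsin_series(x: float, terms: int) -> float:
    """Partial sum of (7.20) with the given number of terms."""
    return sum(
        math.factorial(2 * n) / ((2 * n + 1) * 4 ** n * math.factorial(n) ** 2)
        * x ** (2 * n + 1)
        for n in range(terms)
    )

print(binomial_series(0.5, 0.3, 80), math.sqrt(1.3))
print(arcsin_series(0.5, 40), math.asin(0.5))
print(gen_binom(-0.5, 3), -math.factorial(6) / (math.factorial(3) ** 2 * 4 ** 3))
```

The last line checks the closed form for $\binom{-1/2}{n}$ derived in Example 7.4.9 at $n = 3$; for a nonnegative integer $a$ the coefficients vanish for $n > a$ and the series reduces to the ordinary binomial theorem.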


7.4.10 Remark. If $a > 0$ and is not an integer, then the binomial series converges absolutely uniformly on $[-1, 1]$. Indeed, if $a_n = \left|\binom{a}{n}\right|$, then

$$\frac{a_n}{a_{n+1}} = \frac{|a(a - 1) \cdots (a - n + 1)|}{n!} \cdot \frac{(n + 1)!}{|a(a - 1) \cdots (a - n)|} = \frac{n + 1}{|a - n|},$$

hence, for sufficiently large $n$,

$$n \left(\frac{a_n}{a_{n+1}} - 1\right) = n \left(\frac{n + 1}{n - a} - 1\right) = \frac{n(1 + a)}{n - a} \to 1 + a > 1.$$

By Raabe's test (6.3.2) the series converges absolutely at $x = \pm 1$, hence, by Abel's continuity theorem (7.4.4), the series converges absolutely uniformly on the interval $[-1, 1]$. ♦

Multiplication of Power Series

7.4.11 Definition. The Cauchy product of the power series $\sum_{n=0}^\infty a_n x^n$ and $\sum_{n=0}^\infty b_n x^n$ is the power series $\sum_{n=0}^\infty c_n x^n$, where

$$c_n = \sum_{k=0}^n a_k b_{n-k}. \ ♦$$

Note that $\sum_n c_n x^n$ is precisely the series one obtains by formally carrying out the multiplication

$$(a_0 + a_1 x + a_2 x^2 + \cdots)(b_0 + b_1 x + b_2 x^2 + \cdots)$$

and collecting like powers. We show below that if the power series $\sum_{n=0}^\infty a_n x^n$ and $\sum_{n=0}^\infty b_n x^n$ converge for $|x| < R$, then so does the Cauchy product. For this we need the following result due to Mertens.

7.4.12 Lemma. If the numerical series $A := \sum_{n=0}^\infty \alpha_n$ and $B := \sum_{n=0}^\infty \beta_n$ both converge, and if at least one of the series converges absolutely, then the Cauchy product

$$C := \sum_{n=0}^\infty \gamma_n, \qquad \gamma_n = \sum_{k=0}^n \alpha_k \beta_{n-k},$$

converges and $C = AB$.

Proof. Assume that $\sum_{n=0}^\infty \alpha_n$ converges absolutely. Let

$$A_n = \sum_{k=0}^n \alpha_k, \quad B_n = \sum_{k=0}^n \beta_k, \quad C_n = \sum_{k=0}^n \gamma_k, \quad\text{and}\quad A' = \sum_{n=0}^\infty |\alpha_n|.$$

Then

$$\begin{aligned}
C_n &= \alpha_0 \beta_0 + (\alpha_0 \beta_1 + \alpha_1 \beta_0) + \cdots + (\alpha_0 \beta_n + \alpha_1 \beta_{n-1} + \cdots + \alpha_n \beta_0) \\
&= \alpha_0 B_n + \alpha_1 B_{n-1} + \cdots + \alpha_n B_0 \\
&= \alpha_0 (B_n - B + B) + \alpha_1 (B_{n-1} - B + B) + \cdots + \alpha_n (B_0 - B + B) \\
&= \alpha_0 (B_n - B) + \alpha_1 (B_{n-1} - B) + \cdots + \alpha_n (B_0 - B) + A_n B.
\end{aligned}$$


Thus to show that $C_n \to AB$ it suffices to verify that

$$\alpha_0 (B_n - B) + \alpha_1 (B_{n-1} - B) + \cdots + \alpha_n (B_0 - B) \to 0.$$

Given $\varepsilon > 0$, choose $N$ such that $|B_n - B| < \varepsilon/(2A')$ for all $n > N$. Since $\alpha_n \to 0$, we may choose $N' > N$ so that for all $n > N'$

$$|\alpha_n (B_0 - B) + \alpha_{n-1} (B_1 - B) + \cdots + \alpha_{n-N} (B_N - B)| < \varepsilon/2.$$

For such $n$,

$$\begin{aligned}
|\alpha_0 (B_n - B) + \alpha_1 (B_{n-1} - B) + \cdots + \alpha_n (B_0 - B)| &\le |\alpha_n (B_0 - B) + \alpha_{n-1} (B_1 - B) + \cdots + \alpha_{n-N} (B_N - B)| \\
&\quad + |\alpha_{n-N-1}|\,|B_{N+1} - B| + |\alpha_{n-N-2}|\,|B_{N+2} - B| + \cdots + |\alpha_0|\,|B_n - B| \\
&< \varepsilon/2 + \varepsilon/2 = \varepsilon.
\end{aligned}$$

7.4.13 Cauchy Product Theorem. For each $x$, let

$$C(x) = \sum_{n=0}^\infty c_n x^n, \qquad c_n := \sum_{k=0}^n a_k b_{n-k},$$

be the Cauchy product of series $A(x) = \sum_{n=0}^\infty a_n x^n$ and $B(x) = \sum_{n=0}^\infty b_n x^n$. If $A(x)$ and $B(x)$ have radii of convergence $R_a$ and $R_b$, respectively, then $C(x)$ has radius of convergence $R_c \ge \min\{R_a, R_b\}$ and

$$C(x) = A(x) B(x), \quad |x| < \min\{R_a, R_b\}. \tag{7.21}$$

Moreover, if, say, $R_b < R_a$ and $B(R_b)$ converges, then $C(R_b)$ converges and $C(R_b) = A(R_b) B(R_b)$.

Proof. Assume that $R_b \le R_a$ and let $|x| < R_b$. By 7.4.12 applied to $\alpha_n = a_n x^n$ and $\beta_n = b_n x^n$, the series $C(x)$ converges, hence $R_c \ge |x|$ and (7.21) holds. Since $|x|$ was arbitrary, $R_c \ge R_b = \min\{R_a, R_b\}$. The last assertion of the theorem follows from 7.4.4 by letting $x \uparrow R_b$ in (7.21).

7.4.14 Example. By (7.5) and (7.14), for $|x| < 1$

$$\frac{e^x}{1 + x} = \sum_{n=0}^\infty c_n x^n, \quad\text{where}\quad c_n = \sum_{k=0}^n \frac{(-1)^{n-k}}{k!} = (-1)^n \sum_{k=0}^n \frac{(-1)^k}{k!}. \ ♦$$
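The Cauchy product in Example 7.4.14 can be computed coefficient by coefficient. The sketch below is a minimal illustration, assuming truncated coefficient lists of length `N`; the helper names `cauchy_product` and `evaluate` are introduced here and are not from the text. It forms $c_n = \sum_{k=0}^n a_k b_{n-k}$ with $a_n = 1/n!$ and $b_n = (-1)^n$, then compares the resulting power series with $e^x/(1 + x)$ at a point with $|x| < 1$.

```python
import math

def cauchy_product(a: list[float], b: list[float]) -> list[float]:
    """Coefficients c_n = sum_{k=0}^{n} a[k] * b[n-k] for n < min(len(a), len(b))."""
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

def evaluate(coeffs: list[float], x: float) -> float:
    """Value of the truncated power series sum of coeffs[n] * x**n."""
    return sum(cn * x ** n for n, cn in enumerate(coeffs))

N = 40
a = [1 / math.factorial(n) for n in range(N)]  # coefficients of e^x, from (7.14)
b = [(-1.0) ** n for n in range(N)]            # coefficients of 1/(1+x), from (7.5)
c = cauchy_product(a, b)

x = 0.3
print(evaluate(c, x), math.exp(x) / (1 + x))
print(c[3], (-1) ** 3 * sum((-1) ** k / math.factorial(k) for k in range(4)))
```

The second printed pair checks the closed form for $c_n$ stated in the example at $n = 3$.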

Remark. If $R_a = R_b$ and both $A(R_a)$ and $B(R_b)$ in 7.4.13 converge, it does not necessarily follow that $C(R_a)$ converges. Consider, for example, $A(x) = B(x) = \sum_{n=1}^\infty (-1)^n x^n/\sqrt{n}$, which has radius of convergence 1 and converges conditionally at $x = 1$. The Cauchy product at $x = 1$ is $\sum_{n=1}^\infty c_n$, where

$$c_n = (-1)^n \sum_{k=1}^{n-1} \frac{1}{\sqrt{k(n - k)}}.$$

However, for odd $n$,

$$|c_n| = \sum_{k=1}^{n-1} \frac{1}{\sqrt{k(n - k)}} \ge \sum_{k=1}^{(n-1)/2} \frac{1}{\sqrt{k(n - k)}} \ge \sum_{k=1}^{(n-1)/2} \frac{1}{\sqrt{(n - 1)^2/2}} = \frac{\sqrt{2}}{2},$$

hence $\sum_n c_n$ diverges. ♦
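The lower bound in the preceding Remark is easy to observe numerically. The sketch below is illustrative only, with a hypothetical helper `cauchy_term_abs` that evaluates $|c_n| = \sum_{k=1}^{n-1} 1/\sqrt{k(n - k)}$ directly. The values stay above $\sqrt{2}/2$ for odd $n$, so the terms $c_n$ cannot tend to 0.

```python
import math

def cauchy_term_abs(n: int) -> float:
    """|c_n| for the Cauchy product of sum (-1)^n / sqrt(n) with itself at x = 1."""
    return sum(1.0 / math.sqrt(k * (n - k)) for k in range(1, n))

lower_bound = math.sqrt(2) / 2
for n in (3, 11, 101, 1001):
    print(n, cauchy_term_abs(n), cauchy_term_abs(n) >= lower_bound)
```

Since the terms of a convergent series must tend to 0, this is enough to rule out convergence of the product series at $x = 1$.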

Analytic Functions

7.4.15 Definition. A function $f$ is said to be (real) analytic at a point $a$ if, for some $r > 0$, $f$ has derivatives of all orders on $(a - r, a + r)$ and is represented there by its Taylor series at $a$, that is,

$$f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x - a)^n, \quad |x - a| < r.$$

If $f$ is analytic at each point of a set $E$, then $f$ is said to be analytic on $E$. ♦

A function that has derivatives of all orders on an interval may not be analytic there. This is the case for the function in Exercise 29 below. The following theorem gives a necessary and sufficient condition for analyticity at a point.

7.4.16 Taylor Series Representation. Let $f$ have derivatives of all orders on an open interval $I$ containing $a$. Then $f$ is analytic at $a$ iff there exist positive constants $M$ and $r$ such that

$$|f^{(k)}(x)| \le k!\, M^k \quad\text{for all } k \in \mathbb{N} \text{ and } x \in (a - r, a + r). \tag{7.22}$$

Proof. Assume condition (7.22) holds. To prove that $f$ is analytic at $a$ we use Taylor's theorem (Section 4.6), which asserts that for each $n \in \mathbb{N}$ and $x \in (a - r, a + r)$ there exists a number $c = c(n, x)$ between $x$ and $a$ such that $f(x) = T_n(x) + R_n(x)$, where

$$T_n(x) := \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!} (x - a)^k \quad\text{and}\quad R_n(x) := \frac{f^{(n)}(c)}{n!} (x - a)^n.$$

Now let $r \in (0, 1/M)$ and $|x - a| < r$. By hypothesis,

$$|R_n(x)| \le M^n |x - a|^n \le (Mr)^n.$$

Since $Mr < 1$, $R_n(x) \to 0$, hence $T_n(x) \to f(x)$. Therefore, $f$ is analytic at $a$.


Conversely, let f be analytic at a. Then there exist constants r_1 ∈ (0, 1) and c_n such that

$$f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n, \quad |x-a| \le r_1. \tag{7.23}$$

In particular, |c_n r_1^n| → 0. Choose M_1 > 1 so that |c_n r_1^n| < M_1 for all n. Termwise differentiation of (7.23) yields

$$f^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\,c_n (x-a)^{n-k},$$

hence for |x − a| ≤ r_1/2,

$$|f^{(k)}(x)| \le \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\,|c_n|\,(r_1/2)^{n-k} \le M_1 r_1^{-k} \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\,(1/2)^{n-k}.$$

The last series is the kth derivative of the geometric series for (1 − x)^{−1} evaluated at 1/2 and therefore equals

$$\frac{d^k}{dx^k}(1-x)^{-1}\Big|_{x=1/2} = k!\,(1-1/2)^{-k-1} = k!\,2^{k+1}.$$

Thus

$$|f^{(k)}(x)| \le M_1 r_1^{-k}\, k!\, 2^{k+1}, \quad |x-a| \le r_1/2.$$

To obtain (7.22), take r = r_1/2 and choose M > 4M_1/r_1, so that M^k > M_1 r_1^{−k} 2^{k+1} for all k.

7.4.17 Example. Let f(x) = sin x. Then f^{(2k)}(0) = 0 and f^{(2k+1)}(0) = (−1)^k. Since the derivatives of f are bounded, (7.22) holds for all x. Therefore,

$$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots, \quad -\infty < x < +\infty.$$

Similarly,

$$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots, \quad -\infty < x < +\infty.$$

♦
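To illustrate Example 7.4.17, here is a small sketch that compares partial sums of the sine series with `math.sin`. Since every derivative of sin is bounded by 1, Taylor's theorem gives the remainder bound |x|^m/m! after the terms of degree less than m; the function name and the test point x = 2 are illustrative choices.

```python
import math

def sin_taylor(x, terms):
    """Sum of the first `terms` nonzero terms of the Maclaurin series of sin."""
    return sum(
        (-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
        for n in range(terms)
    )

if __name__ == "__main__":
    x, terms = 2.0, 10
    m = 2 * terms  # the partial sum contains all powers of degree < m
    error = abs(sin_taylor(x, terms) - math.sin(x))
    print(error, "<=", abs(x) ** m / math.factorial(m))
```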

It is clear from 7.3.2 and 7.4.13 that the sum and product of functions analytic at a are analytic at a. In Exercise 33 the reader is asked to show that the reciprocal of a nonzero analytic function is analytic. It follows that the ratio of two analytic functions, if defined, is analytic. The next result extends the property of analyticity to nearby points.

7.4.18 Theorem. If the series $f(x) = \sum_{n=0}^{\infty} a_n (x-a)^n$ converges on I := (a − r, a + r), then f is analytic on I.

Proof. By considering g(x) = f(x + a), we may suppose that a = 0. Let |b| < r, 0 < s < r − |b|, and |x − b| < s. We show that f has a power series expansion about b on the interval |x − b| < s. Since b is arbitrary, it will follow that f is analytic on I. By the binomial theorem applied to ((x − b) + b)^n,

$$f(x) = \sum_{n=0}^{\infty} a_n \sum_{k=0}^{n} \binom{n}{k} (x-b)^k b^{n-k} = \sum_{n=0}^{\infty} \sum_{k=0}^{\infty} a_n d_{k,n} (x-b)^k b^{n-k}, \tag{7.24}$$

where $d_{k,n} = \binom{n}{k}$ for k = 0, 1, ..., n and d_{k,n} = 0 for k > n. Now,

$$\sum_{n=0}^{\infty} \sum_{k=0}^{\infty} |a_n (x-b)^k b^{n-k} d_{k,n}| = \sum_{n=0}^{\infty} |a_n| \sum_{k=0}^{n} \binom{n}{k} |x-b|^k |b|^{n-k} = \sum_{n=0}^{\infty} |a_n| (|x-b| + |b|)^n.$$

If |x − b| < s then |x − b| + |b| < s + |b| < r and the last series converges. Therefore, (7.24) converges uniformly for |x − b| < s. By 6.5.4, the order of summation may be interchanged, so

$$f(x) = \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} a_n d_{k,n} (x-b)^k b^{n-k} = \sum_{k=0}^{\infty} \sum_{n=k}^{\infty} a_n \binom{n}{k} (x-b)^k b^{n-k} = \sum_{k=0}^{\infty} b_k (x-b)^k, \quad \text{where } b_k := \sum_{n=k}^{\infty} a_n \binom{n}{k} b^{n-k}.$$

This shows that f has a power series expansion about b on (b − s, b + s).

7.4.19 Theorem. Let f be analytic on an open interval I and let f = 0 on a subinterval (a, b) of I. Then f = 0 on I.

Proof. Let c ∈ I, c > b, and define A = {t ∈ (a, c) | f^{(n)} = 0 on (a, t] for all n ≥ 0}. Then A ≠ ∅ and t_0 := sup A ≤ c. Suppose, for a contradiction, that t_0 < c. Since f is analytic at t_0, f has a Taylor series representation about t_0 on J := (t_0 − r, t_0 + r) for some r > 0. By continuity and the approximation property of suprema, f^{(n)}(t_0) = 0 for each n. It follows that f is identically zero on J, contradicting the definition of t_0. Therefore, t_0 = c, hence f = 0 on (a, c). Since c was arbitrary, f(x) = 0 for all x ∈ I with x ≥ a. Similarly, f(x) = 0 for all x ∈ I with x ≤ b.

The proof of the following corollary is left to the reader.

7.4.20 Corollary. Let f and g be analytic on the open intervals I and J, respectively. If I ∩ J ≠ ∅ and f = g on an open subinterval of I ∩ J, then there exists an analytic function h on I ∪ J such that h|I = f and h|J = g.

The preceding corollary is known as analytic continuation, as it may be used to extend an analytic function to a larger interval.

Exercises 1. Find the interval of convergence of

P∞

n=1

fn (x), where fn (x) =

(−1)n n 23n n3 xn n2 n! (x − 1)n . (b) √ (x − 2)n . . (c) n 2 (2n)! n! (1 + 2/n)n (−1)n nxn n!xn (d) S . (e) n+2−1/n . (f) (x + 1)n . (n + 1) ln(n + 2) 3n n (1.5)(2.5) · · · (n + .5) n 1 n 2n + 5n 2n (g) S [3 + (−1)n ]n sin x . (i) S x . x . (h) n n n 3 +4 n! (a) S

2. Use (7.5) to represent the following functions as power series about the given point a. In each case, find the representation interval. x3 x x (a) , a = 0. (b)S , a = 0. (c) , a = 1. 2 (x + 1) 2 − 3x 3 + 2x 3. Use (7.12) to find power series representations for (a)S x ln x, (b) x2 ln x about the point a = 1. 4. Without using 7.4.16, find the Maclaurin series and representation interval for the following functions. 2 1 + 2x S (a) ln . (b) (1 + x2 ) arctan x. (c) x3 e−3x . 1 − 3x √ ex − 1 sin x cos x − 1 S √ . . (e) (d) (f) . x x2 x 1 (g) S sin x cos x. (h) √ . (i) sin(x + π/3). 9 − x2 5.S Use an identity and 7.4.9 to find the Maclaurin series for arccos x. 6. Verify the identity (7.17). 7. Without using 7.4.16, show that (a) sin x = (b) cos x =

∞ X n=0 ∞ X n=0

an (x − a)n , a2n =

(−1)n sin a (−1)n cos a , a2n+1 = . (2n)! (2n + 1)!

bn (x − a)n , b2n =

(−1)n cos a (−1)n+1 sin a , b2n+1 = . (2n)! (2n + 1)!

8. Prove that

$$\sum_{n=0}^{\infty} \frac{4(-1)^n}{2n+1} = \sum_{n=0}^{\infty} \frac{2\,(2n)!}{(2n+1)\,4^n\,(n!)^2} = \pi.$$

9. Find a power series representation for (a)

S

sin t − t . t3

√

(b)

cos t t.

10. Find a closed form for the series (a) n2 xn .

Rx

P∞

n=0 (n

2

f (t) dt if f (t) =

(c)

P∞

cos t − 1 . t

2

(d)

et − 1 . t2

fn (x), |x| < 1, where fn (x) =

n=0

(b)S (−1)n (2n + 1)x2n+1 .

11.S Sum the series

0

(c)

xn+1 n2 xn . (d) . (n + 1)(n + 2) n+1

+ n + 1)3−n .

12. Use 7.4.13 to find a series representation and representation interval for sin x ln(1 − x) e−x arctan x 2 . (c) (a)S . (b) √ . (d)S ex sin x. (e) . 2 1+x 1 − x x(1 + x2 ) 1−x 13. By calculating the Maclaurin series of the function sin2 x in two ways, establish the identity n

X 22n+1 1 = . (2n + 2)! (2k + 1)!(2n − 2k + 1)! k=0

14. By calculating the Maclaurin series of the function cos2 x in two ways, establish the identity n

X 22n−1 1 = . (2n)! (2k)!(2n − 2k)! k=0

15. By calculating the Maclaurin series of the function (1 − x)−3/2 in two ways, establish the identity n

(2n + 1)! X (2k)! = . (n!)2 4n (k!)2 4k k=0

16.S Show that the Fibonacci power series s(x) (7.4.3) has the closed form (1 − x − x^2)^{−1}, |x| < (√5 − 1)/2. Conclude from Abel's continuity theorem (7.4.4) that s(x) cannot converge at the endpoint (√5 − 1)/2.

17. Let a_n → L ∈ R and set $s(x) := \sum_{n=0}^{\infty} a_n x^n$, |x| < 1. For m ∈ N, define

$$\varphi_m(x) := \sum_{k=0}^{2m-1} (-1)^k x^k.$$

Prove that lim_{x→1−} φ_m(x)s(x) = mL. Hint. Use Abel's continuity theorem.

18.S Use the method of 7.4.9 to establish the representation

$$\ln\left(\sqrt{1+x^2} + x\right) = \sum_{n=0}^{\infty} \frac{(-1)^n (2n)!}{4^n (2n+1)(n!)^2}\, x^{2n+1}, \quad |x| < 1.$$

19. Let R be the radius of convergence of $\sum_n c_n (x-a)^n$ and let p ≥ 0. Prove:

(a) If lim inf_n |c_n| n^p > 0, then R ≤ 1.
(b) If lim sup_n |c_n|/n^p < +∞, then R ≥ 1.

20. Let R_a and R_b denote the radii of convergence of $A(x) := \sum_n a_n x^n$ and $B(x) := \sum_n b_n x^n$, respectively. Suppose that lim sup_n (|a_n|/|b_n|) < +∞. Prove that R_a ≥ R_b.

21.S Let R_s and R_t denote the radii of convergence of

$$s(x) := \sum_n c_n (x-a)^n \quad \text{and} \quad t(x) := \sum_n c_{n^2} (x-a)^n,$$

respectively. Prove:
(a) If R_s > 1, then R_t = +∞.
(b) If R_s ≤ 1, then no conclusion is possible.

22. Let R_s and R_t denote the radii of convergence of

$$s(x) := \sum_n c_n (x-a)^n \quad \text{and} \quad t(x) := \sum_n c_n (x-a)^{n^2},$$

respectively. Prove:
(a)S If 0 < R_s < +∞, then R_t = 1.
(b) If R_s = 0, then R_t ≤ 1, and any value of R_t ≤ 1 is possible.
(c) If R_s = +∞, then R_t ≥ 1, and any value of R_t ≥ 1 is possible.

23. Suppose that the series

$$A := \sum_{n=0}^{\infty} a_n, \quad B := \sum_{n=0}^{\infty} b_n, \quad \text{and} \quad C := \sum_{n=0}^{\infty} c_n$$

converge, where $c_n = \sum_{k=0}^{n} a_k b_{n-k}$. Use 7.4.12 and 7.4.4 to prove that AB = C.

24. Prove that for any a, b ∈ R and n ∈ N,

$$\binom{a}{0}\binom{b}{n} + \binom{a}{1}\binom{b}{n-1} + \cdots + \binom{a}{n}\binom{b}{0} = \binom{a+b}{n}.$$

25. Let n ∈ Z+. The Bessel function of order n may be defined as the power series

$$J_n(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{(n+k)!\,k!} \left(\frac{x}{2}\right)^{n+2k}.$$

Prove:
(a) The radius of convergence of J_n(x) is +∞.
(b) J_n satisfies Bessel's differential equation x^2 y'' + x y' + (x^2 − n^2) y = 0.
(c) $\frac{d}{dx}\left[x^n J_n(x)\right] = x^n J_{n-1}(x)$, n ≥ 1.

26. Prove that

$$\lim_{x \to 1^-} \sum_{n=1}^{\infty} \frac{x^n}{n^p} \; \begin{cases} = +\infty & \text{if } p \le 1, \\ < +\infty & \text{if } p > 1. \end{cases}$$

27.S Let {c_n} tend monotonically to 0. Prove that $\sum_n c_n x^n$ is continuous on [−1, 1).

28. Let f(x) be bounded on [0, 1].
(a)S Prove that $t(x) := \sum_n n x^n f(x)$ converges pointwise on [0, 1) and uniformly on [0, r] for 0 < r < 1.
(b) Suppose that L := lim_{x→1−} (1 − x)^{−2} f(x) exists. Prove that the convergence of t(x) in (a) is uniform on [0, 1) iff L = 0. (Compare with Exercise 7.3.7.)

29. Show that the function

$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0, \\ 0 & \text{otherwise} \end{cases}$$

is not analytic at 0. (See Exercise 4.6.1.)

30.S Prove 7.4.20.

31. Prove: If f(x) is analytic at a, then f'(x) and $g(x) := \int_a^x f(t)\,dt$ are analytic at a.

32. Let f be analytic at a and let {a_n} be a sequence of distinct real numbers such that a_n → a and f(a_n) = 0 for all n. Prove that f is identically zero in a neighborhood of a. Hint. Assume that a_n ↑ a (how?). Construct, by induction, sequences $\{a_n^{(k)}\}_n$ such that $\lim_n a_n^{(k)} = a$ and $f^{(k)}(a_n^{(k)}) = 0$ for all n and k.

33.S Let f be analytic at a and f(a) ≠ 0. Carry out the following steps to show that 1/f is analytic at a.
(a) Assume that f(a) = 1 and that

$$f(x) = \sum_{n=0}^{\infty} a_n (x-a)^n \ne 0, \quad |x-a| < r$$

for some r. Define a series g formally by

$$g(x) = \sum_{n=0}^{\infty} b_n (x-a)^n,$$

where the sequence {b_n} is given recursively by b_0 = 1 and

$$b_n = -\sum_{k=1}^{n} a_k b_{n-k}, \quad n \ge 1.$$

Show that if g(x) converges for |x − a| < r_1 for some 0 < r_1 < r, then f(x)g(x) = 1 for |x − a| < r_1.
(b) Show that if |a_n| ≤ M^n for all n, then |b_n| ≤ (2M)^n for all n.
(c) Conclude that g is analytic at a and that g = 1/f.

Part II

Functions of Several Variables

Chapter 8 Metric Spaces

The essential feature in the notion of limit of a function is the idea of nearness. This is made precise by a distance function, which, in the case of limits on R, is derived from the absolute value function. It turns out that there are many other important mathematical structures equipped with a distance function and therefore admitting a definition of limit. In this chapter, we examine the general properties of these structures.

8.1

Definitions and Examples

8.1.1 Definition. A metric on a nonempty set X is a function d : X × X → R such that, for all x, y, z ∈ X,
(a) d(x, y) ≥ 0 (nonnegativity),
(b) d(x, y) = 0 iff x = y (coincidence),
(c) d(x, y) = d(y, x) (symmetry), and
(d) d(x, y) ≤ d(x, z) + d(y, z) (triangle inequality).
The ordered pair (X, d) is called a metric space. A nonempty subset E of X with the restricted metric d|E×E is called a subspace of X and is denoted by (E, d). ♦

The real number system is a metric space under the usual metric d(x, y) = |x − y|. The following example shows that any nonempty set may be given a metric.

8.1.2 Example. (Discrete metric space). On a nonempty set X define d(x, x) = 0 for all x ∈ X, and d(x, y) = 1 if x ≠ y. Then d is easily seen to be a metric, called the discrete metric on X. For example, the triangle inequality d(x, y) ≤ d(x, z) + d(y, z) holds because the left side of the inequality is at most 1, and if it equals 1 then x ≠ y, in which case either x ≠ z or y ≠ z, implying that the right side must be at least 1. ♦

8.1.3 Definition. A subset E of a metric space X is said to be bounded if for some x_0 ∈ X and M > 0, d(x, x_0) ≤ M for all x ∈ E. ♦
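On a finite set the four axioms of Definition 8.1.1 can be verified by brute force. The sketch below does this for the discrete metric of Example 8.1.2; the names `discrete_metric` and `is_metric` are hypothetical helpers introduced for illustration, and a passing check on a finite sample is of course not a proof for infinite sets.

```python
from itertools import product

def discrete_metric(x, y):
    """Discrete metric of Example 8.1.2: 0 on the diagonal, 1 elsewhere."""
    return 0 if x == y else 1

def is_metric(d, points):
    """Check axioms (a)-(d) of Definition 8.1.1 on a finite list of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                      # (a) nonnegativity
            return False
        if (d(x, y) == 0) != (x == y):       # (b) coincidence
            return False
        if d(x, y) != d(y, x):               # (c) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(y, z):      # (d) triangle inequality
            return False
    return True

if __name__ == "__main__":
    print(is_metric(discrete_metric, ["a", "b", "c"]))
    print(is_metric(lambda x, y: (x - y) ** 2, [0, 1, 2]))
```

The second call uses squared distance on integers, which fails the triangle inequality and so serves as a contrast to the usual metric |x − y|.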


The point x_0 in the preceding definition may be replaced by any other point y_0 ∈ X since for x ∈ E, d(x, y_0) ≤ d(x, x_0) + d(x_0, y_0) ≤ M + d(x_0, y_0).

The notions of convergence and completeness readily carry over to general metric spaces:

8.1.4 Definition. A sequence {x_n} in a metric space (X, d) is said to converge to a member x of X if lim_n d(x_n, x) = 0. In this case we write x_n → x or lim_n x_n = x. A cluster point of a sequence in X is the limit of a convergent subsequence. ♦

The limit of a sequence {x_n} in X, if it exists, must be unique. Indeed, if x_n → x and x_n → y, then, by the triangle inequality, 0 ≤ d(x, y) ≤ d(x, x_n) + d(y, x_n) → 0, hence d(x, y) = 0 and so x = y.

8.1.5 Definition. A sequence {x_n} in a metric space (X, d) is said to be Cauchy if lim_{m,n} d(x_m, x_n) = 0. A metric space (X, d) is said to be complete if every Cauchy sequence in X converges to a member of X. A subset E of X is complete if it is complete as a subspace of X, that is, every Cauchy sequence in E converges to a member of E. ♦

The real number system is complete under the usual metric. The subspace Q of R is not complete: a sequence of rational numbers converging to an irrational number is Cauchy but has no limit in Q. A discrete metric space is complete, since every Cauchy sequence is eventually constant and therefore trivially converges.

8.1.6 Proposition. (a) Every Cauchy sequence is bounded. (b) Every convergent sequence is Cauchy, hence bounded.

Proof. (a) If {x_n} is Cauchy, choose an index N such that d(x_m, x_n) < 1 for all m, n ≥ N. Then, for all n ∈ N, d(x_N, x_n) < 1 + max{d(x_N, x_1), d(x_N, x_2), ..., d(x_N, x_{N−1})}.
(b) If x_n → x, then the inequality d(x_m, x_n) ≤ d(x_m, x) + d(x_n, x) implies that {x_n} is Cauchy.

The notions of pointwise convergence and uniform convergence of a sequence of real-valued functions easily extend to general metric spaces:

8.1.7 Definition. Let S be a nonempty set and let (X, d) be a metric space. A sequence of functions f_n : S → X is said to converge pointwise to a function f : S → X if f_n(s) → f(s) for each s ∈ S. In this case we write f = lim_n f_n or f_n → f (on S). The sequence converges uniformly to f on S if for each ε > 0 there exists N ∈ N such that d(f_n(s), f(s)) < ε for all n ≥ N and s ∈ S. ♦


8.1.8 Definition. Let X be a vector space. A norm on X is a function ‖·‖ from X to R such that for all x, y ∈ X and t ∈ R
(a) ‖x‖ ≥ 0 (nonnegativity),
(b) ‖x‖ = 0 iff x = 0 (coincidence),
(c) ‖tx‖ = |t| ‖x‖ (absolute homogeneity),
(d) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).
The pair (X, ‖·‖) is then called a normed vector space. ♦

The proof of the following proposition is left to the reader.

8.1.9 Proposition. If (X, ‖·‖) is a normed vector space, then the function d(x, y) := ‖x − y‖ is a metric on X.

From 1.6.4 and Exercise 1.6.4 we see that ‖·‖_2, ‖·‖_1, and ‖·‖_∞ are norms on R^n, hence, according to 8.1.9, give rise to metrics. We denote these, respectively, by d_2, d_1, and d_∞. In Exercise 17 the reader is asked to show that R^n is complete in each of these metrics. The metric d_2 is called the Euclidean metric on R^n. The metric d_1 is the ℓ^1 metric on R^n and d_∞ the max metric on R^n. Clearly, for n = 1, all three metrics reduce to absolute value on R.

8.1.10 Example. Let S be a nonempty set and let B(S) denote the set of all bounded real-valued functions on S. Then B(S) is a vector space under the operations of addition f + g and scalar multiplication cf defined by (f + g)(s) = f(s) + g(s) and (cf)(s) = cf(s), s ∈ S. The supremum norm of f ∈ B(S) is defined by

‖f‖_∞ = sup{|f(s)| : s ∈ S}.

It is easy to check that ‖·‖_∞ is indeed a norm. For example, the triangle inequality follows by taking the supremum over s ∈ S in the inequality |f(s) + g(s)| ≤ |f(s)| + |g(s)| ≤ ‖f‖_∞ + ‖g‖_∞. Note that convergence of a sequence of functions in B(S) is simply uniform convergence on S. For this reason, ‖·‖_∞ is also called the uniform norm.

The space B(S) is complete in the metric d_∞(f, g) := ‖f − g‖_∞ induced by the norm. To see this, let {f_n} be a Cauchy sequence in B(S) and let ε > 0. Choose N such that d_∞(f_n, f_m) < ε for all m, n ≥ N. For such m, n

|f_n(s) − f_m(s)| < ε for all s ∈ S, (8.1)

hence {f_n(s)} is a Cauchy sequence in R for every s ∈ S. Since R is complete, f_n(s) → f(s) for some f(s) ∈ R. Fixing n in (8.1) and letting m → +∞ yields |f_n(s) − f(s)| ≤ ε for all s ∈ S and all n ≥ N. This shows that f is bounded and that f_n → f in B(S). ♦
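The passage from a norm to a metric in 8.1.9 is easy to make concrete on R^n. The following sketch implements ‖·‖_1, ‖·‖_2, and ‖·‖_∞ and the metric induced by each; when S is a finite set with n elements, a function in B(S) is just a vector in R^n and its supremum norm is the max norm. The function names are illustrative, not taken from the text.

```python
import math

def norm1(x):
    """l^1 norm: sum of absolute values of the coordinates."""
    return sum(abs(t) for t in x)

def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    """Max norm, i.e. the supremum norm on a finite index set."""
    return max(abs(t) for t in x)

def induced_metric(norm):
    """Return d(x, y) = norm(x - y), the metric of Proposition 8.1.9."""
    def d(x, y):
        return norm([a - b for a, b in zip(x, y)])
    return d

if __name__ == "__main__":
    x, y = (3, -4), (0, 0)
    for name, norm in [("d1", norm1), ("d2", norm2), ("dinf", norm_inf)]:
        print(name, induced_metric(norm)(x, y))
```

For the sample points the three distances are 7, 5, and 4, illustrating the general ordering ‖x‖_∞ ≤ ‖x‖_2 ≤ ‖x‖_1.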

In the case S = N, B(S) may be identified with the set of all bounded sequences and as such is denoted by ℓ^∞.

8.1.11 Example. Let ℓ^1 denote the set of all sequences a = {a_n} in R such that $\sum_n |a_n| < +\infty$. Clearly, ℓ^1 is a vector subspace of ℓ^∞. It is easy to check that $\|a\|_1 := \sum_n |a_n|$ defines a norm on ℓ^1. We show that (ℓ^1, ‖·‖_1) is complete in this norm.

Let $\{a_n := (a_{1,n}, a_{2,n}, \ldots)\}_{n=1}^{\infty}$ be a Cauchy sequence in ℓ^1, and let ε > 0. Choose N so that

$$\|a_n - a_m\|_1 = \sum_{k=1}^{\infty} |a_{k,n} - a_{k,m}| < \varepsilon \quad \text{for all } n, m \ge N. \tag{8.2}$$

Since |a_{k,n} − a_{k,m}| ≤ ‖a_n − a_m‖_1, the sequence {a_{k,n}}_n is Cauchy for each k, hence converges. Let a_k = lim_n a_{k,n}. Fix K ∈ N and n ≥ N. From (8.2),

$$\sum_{k=1}^{K} |a_{k,n} - a_{k,m}| < \varepsilon \quad \text{for all } m \ge N.$$

Letting m → +∞, we obtain $\sum_{k=1}^{K} |a_{k,n} - a_k| \le \varepsilon$. Since K was arbitrary,

$$\|a_n - a\|_1 = \sum_{k=1}^{\infty} |a_{k,n} - a_k| \le \varepsilon \quad \text{for all } n \ge N.$$

It follows that a ∈ ℓ^1 and a_n → a. ♦

8.1.12 Definition. Let (X, d) and (Y, ρ) be metric spaces. The product metric d × ρ on X × Y is defined by

(d × ρ)((x, y), (a, b)) := d(x, a) + ρ(y, b), x, a ∈ X, y, b ∈ Y.

The pair (X × Y, d × ρ) is called the product of the metric spaces X and Y. ♦

In Exercise 13 the reader is asked to prove, among other things, that d × ρ is indeed a metric and that a sequence {(x_n, y_n)} converges to (a, b) in X × Y in this metric iff x_n → a in X and y_n → b in Y.

Exercises

1.S Determine whether d is a metric on R^2, where d((x_1, x_2), (y_1, y_2)) =
(a) 2|x_1 − y_1| + 3|x_2 − y_2|.
(b) |x_1^2 − y_1^2| + |x_2^2 − y_2^2|.
(c) |x_1^3 − y_1^3| + |x_2^3 − y_2^3|.
(d) |x_1 − x_2| + |y_1 − y_2|.
(e) (|x_1 − y_1| + |x_2 − y_2|) / (2 + |x_1 − y_1| + |x_2 − y_2|).
(f) |e^{x_1} − e^{y_1}| + |e^{x_2} − e^{y_2}|.

2. (p-adic metric). Let p be a fixed prime number. Define ρ_p(n, n) = 0, and for m ≠ n ∈ Z define ρ_p(m, n) = 1/p^α, where α is the power of p in the unique prime factorization of |m − n|. (For example, ρ_2(42, 2) = 1/8, ρ_5(42, 2) = 1/5, and ρ_3(42, 2) = 1.) Show that ρ_p is a metric on Z.

3.S (Hamming distance). Let A be a nonempty set and X := A^n. For x = (x_1, ..., x_n), y = (y_1, ..., y_n) ∈ X define d(x, y) to be the number of indices j for which x_j ≠ y_j. Show that d is a metric on X. (The metric is named after Richard Hamming, who pioneered the field of error correcting codes.)

4. Let X be as in Exercise 3. Define ρ(x, x) = 0, and for distinct points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) in X define ρ(x, y) = 2^{−j}, where j is the smallest index for which x_j ≠ y_j. Show that ρ is a metric on X.

5.S Prove that a metric d satisfies |d(x, y) − d(a, b)| ≤ d(x, a) + d(y, b). Conclude that if x_n → a and y_n → b, then d(x_n, y_n) → d(a, b).

6. Let X and Y be nonempty sets and let f : X → Y be one-to-one. Show that if ρ is a metric on Y, then d(x, y) := ρ(f(x), f(y)) defines a metric on X.

7. Prove that a finite union of bounded sets in a metric space is bounded.

8. Prove 8.1.9.

9. Prove that a Cauchy sequence with a cluster point converges. (This exercise will be used in 8.5.8.)

10.S Let E_1, ..., E_m be complete subspaces of (X, d). Prove that the finite union E_1 ∪ ··· ∪ E_m is complete. Does the analogous assertion hold for a countable union of complete subspaces?

11. Let X := [1, +∞) have the metric d(x, y) = |x^{−1} − y^{−1}| (see Exercise 6). Show that x_n → x with respect to the usual metric on X iff x_n → x with respect to d. Is (X, d) complete?

12. Same as Exercise 11 but with the metric ρ(x, y) = |x(1 + x^2)^{−1} − y(1 + y^2)^{−1}|.

13.S Let (X, d) and (Y, ρ) be metric spaces and let (Z, η) = (X × Y, d × ρ) be the product space. Prove:
(a) η is a metric on Z.
(b) A sequence {(x_n, y_n)} is Cauchy in Z iff {x_n} is Cauchy in X and {y_n} is Cauchy in Y.
(c) A sequence {(x_n, y_n)} converges to (x, y) in Z iff x_n → x in X and y_n → y in Y.
(d) Z is complete iff X and Y are complete.

14. Metrics d, ρ on a set X are said to be metrically equivalent if there exist positive constants a and b such that d(x, y) ≤ a ρ(x, y) and ρ(x, y) ≤ b d(x, y) for all x, y ∈ X. For example, by Exercise 1.6.6, the metrics d_1, d_2, and d_∞ are metrically equivalent. Suppose that d and ρ are metrically equivalent. Let {x_n} be a sequence in X and let x ∈ X. Prove the following:
(a) x_n → x in (X, d) iff x_n → x in (X, ρ).
(b) {x_n} is Cauchy in (X, d) iff {x_n} is Cauchy in (X, ρ).
(c) (X, d) is complete iff (X, ρ) is complete.

15.S Let d be a metric on a set X and a > 0. Define ρ(x, y) := min{d(x, y), a}. Prove:
(a) ρ is a metric on X.
(b) A sequence is Cauchy in (X, ρ) iff it is Cauchy in (X, d).
(c) A sequence converges in (X, ρ) iff it converges in (X, d).
(d) (X, ρ) is complete iff (X, d) is complete.
Are d and ρ metrically equivalent? Does σ(x, y) := max{d(x, y), a} define a metric on X?

16. Let ρ_1 and ρ_2 be metrics on X. Prove that max{ρ_1, ρ_2} is a metric. Is min{ρ_1, ρ_2} a metric?

17. Let x := (x_1, ..., x_n), x_k := (x_{1,k}, ..., x_{n,k}) ∈ R^n, k = 1, 2, .... Prove:
(a) x_k → x in (R^n, d_2) iff x_{j,k} → x_j for j = 1, ..., n.
(b) {x_k} is Cauchy in (R^n, d_2) iff {x_{j,k}}_{k=1}^∞ is Cauchy in R for each j = 1, ..., n.
(c) R^n is complete in each of the metrics d_1, d_2, d_∞. (Use Exercise 14.)

18.S Let d be a metric on a set X and define

$$\rho(x, y) := \frac{d(x, y)}{1 + d(x, y)}.$$

Verify that (a)–(d) of Exercise 15 hold. Are d and ρ metrically equivalent?

19. Let ρ_1 and ρ_2 be metrics on a set X and let α, β > 0. Define ρ(x, y) := αρ_1(x, y) + βρ_2(x, y). Prove:
(a) ρ is a metric on X.
(b) A sequence {x_n} converges to x in (X, ρ) iff it converges to x in both (X, ρ_1) and (X, ρ_2).

20.S Let {d_k}_{k=1}^∞ be a sequence of metrics on a set X. For x, y ∈ X define

$$\rho_k(x, y) = \frac{d_k(x, y)}{1 + d_k(x, y)} \quad \text{and} \quad \rho(x, y) = \sum_{k=1}^{\infty} 2^{-k} \rho_k(x, y).$$

Prove:
(a) ρ is a metric on X. (See Exercise 18.)
(b) ρ(x_n, x) → 0 iff d_k(x_n, x) → 0 for every k.

21. Let C(R) denote the set of continuous, real-valued functions on R. For f, g ∈ C(R) define

$$\rho(f, g) := \sum_{k=1}^{\infty} 2^{-k} \rho_k(f, g),$$

where

$$d_k(f, g) = \sup_{-k \le x \le k} |f(x) - g(x)| \quad \text{and} \quad \rho_k(f, g) = \frac{d_k(f, g)}{1 + d_k(f, g)}.$$

Prove:
(a) ρ is a metric on C(R).
(b) f_n → f in this metric iff f_n → f uniformly on each bounded subset of R.
(c) C(R) is complete in this metric.

22. For f ∈ C([a, b]) define $\|f\|_1 = \int_a^b |f|$. Show that ‖·‖_1 is a norm on C([a, b]) and that C([a, b]) is not complete in the metric induced by this norm.

23.S Show that the sequence of functions f_n(x, y) = (1 + x^n)^{1/n} (1 + y^n)^{−1/n} converges uniformly to f(x, y) = x/y on [1, b] × [1, b] for any b > 1.
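For readers who wish to experiment with the p-adic metric of Exercise 2, the following sketch computes ρ_p(m, n) exactly and reproduces the three sample values quoted there. It does not address the proof asked for in the exercise; the function name `p_adic_distance` is a hypothetical helper, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def p_adic_distance(m, n, p):
    """rho_p(m, n) = p^(-alpha), alpha = exponent of p in |m - n|; 0 if m == n."""
    if m == n:
        return Fraction(0)
    diff = abs(m - n)
    alpha = 0
    while diff % p == 0:   # strip factors of p from |m - n|
        diff //= p
        alpha += 1
    return Fraction(1, p ** alpha)

if __name__ == "__main__":
    for p in (2, 5, 3):
        print(p, p_adic_distance(42, 2, p))
```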

8.2

Open and Closed Sets

Throughout this section, (X, d) denotes an arbitrary metric space.

It is frequently useful to formulate assertions regarding a metric space X in terms of certain subsets of X rather than the metric. The subsets of most interest in this regard are described in the next two definitions.

8.2.1 Definition. Let x ∈ X and r > 0. The sets

B_r(x) := {y ∈ X : d(x, y) < r} and C_r(x) := {y ∈ X : d(x, y) ≤ r}

are called, respectively, the open and closed balls with center x and radius r. The set

S_r(x) := C_r(x) \ B_r(x) = {y ∈ X : d(x, y) = r}

is called the sphere with center x and radius r. The ball B_r(x) is also called a neighborhood of x. ♦

The open (closed) balls in R with the usual metric are simply the bounded open (closed) intervals. The spheres are the endpoints of these intervals. The open (closed) balls in Euclidean space R^2 are open (closed) disks and the spheres are circles. The open and closed balls in a discrete metric space X are the sets X and {x}; the spheres are X \ {x} and the empty set.

8.2.2 Definition. A subset U of X is said to be open if either U = ∅ or else U has the following property: For each x ∈ U there exists ε > 0 such that B_ε(x) ⊆ U. A subset of X is closed if its complement is open. The collection of all open sets is called the (metric) topology of (X, d). ♦

In any metric space, X and ∅ are both open and closed. There are many metric spaces for which these are the only subsets that are both open and closed; Euclidean space R^n is an important example (see Section 8.7). The sets Q and I are neither open nor closed in R since every open ball (= open interval) contains members of both sets. A finite set F is always closed. Indeed, if x ∈ F^c, then B_r(x) ⊆ F^c, where r = min{d(x, y) : y ∈ F}, hence F^c is open.

8.2.3 Proposition. An open ball is open, a closed ball is closed, and a sphere is closed.

Proof. Let x ∈ B_r(x_0). We claim that B_ε(x) ⊆ B_r(x_0), where ε := r − d(x, x_0). Indeed, if y ∈ B_ε(x) then

d(y, x_0) ≤ d(y, x) + d(x, x_0) < ε + d(x, x_0) = r,

hence y ∈ B_r(x_0) (Figure 8.1). Since x was arbitrary, B_r(x_0) is open. A similar argument shows that C_r(x_0)^c and S_r(x_0)^c are open, hence C_r(x_0) and S_r(x_0) are closed. (See Exercise 2.) That S_r(x_0) is closed also follows from 8.2.6 below.

[FIGURE 8.1: An open ball is open.]

8.2.4 Theorem. Open sets in (X, d) have the following properties:
(a) If U_i is open for each i in an index set I, then ⋃_{i∈I} U_i is open.
(b) If V_1, ..., V_n are open, then V_1 ∩ ··· ∩ V_n is open.

Proof. (a) Let U denote the union. If x ∈ U, then x ∈ U_i for some i, hence there exists r > 0 such that B_r(x) ⊆ U_i ⊆ U. Therefore, U is open.
(b) Let V denote the intersection and let x ∈ V. For each j = 1, ..., n there exists r_j > 0 such that B_{r_j}(x) ⊆ V_j. Then B_r(x) ⊆ V, where r = min{r_1, ..., r_n}. Therefore, V is open.

8.2.5 Corollary. A nonempty subset U is open iff it is the union of open balls.

For example, in a discrete metric space, every subset is a union of open balls {x} = B_1(x) and hence is open. It follows that every subset is also closed.

8.2.6 Corollary. Closed sets in (X, d) have the following properties:
(a) If C_i is closed for each i in an index set I, then ⋂_{i∈I} C_i is closed.
(b) If C_1, ..., C_n are closed, then C_1 ∪ ··· ∪ C_n is closed.

Proof. In (a), each C_i^c is open, hence, using DeMorgan's law and 8.2.4,

$$\Big(\bigcap_{i \in I} C_i\Big)^c = \bigcup_{i \in I} C_i^c$$

is open, that is, ⋂_{i∈I} C_i is closed. Part (b) is proved in a similar manner, using DeMorgan's law for complements of finite unions.

8.2.7 Theorem. A subset C of X is closed iff C contains the limit of each convergent sequence in C.

Proof. Assume that C is closed and let {x_n} be a sequence in C with x_n → x. If x ∉ C, then, because C^c is open, there exists ε > 0 such that B_ε(x) ∩ C = ∅. But then x_n is eventually in B_ε(x) ⊆ C^c, impossible. Therefore, x ∈ C.

Now suppose C is not closed. Then C^c is not open, hence there exists x ∈ C^c such that B_{1/n}(x) ⊄ C^c, that is, B_{1/n}(x) ∩ C ≠ ∅, for every n ∈ N. Choosing a point x_n in this intersection, we then obtain a sequence {x_n} in C that converges to a point not in C.

8.2.8 Corollary. Let (X, d) be a metric space and let Y be a subspace of X.
(a) If X is complete and Y is closed, then Y is complete.
(b) If Y is complete, then Y is closed.

Proof. (a) Let {y_n} be a Cauchy sequence in Y. Since X is complete, there exists x ∈ X such that y_n → x. Since Y is closed, x ∈ Y. Therefore, Y is complete.
(b) Let {y_n} be a sequence in Y such that y_n → x ∈ X. Then {y_n} is Cauchy and hence converges to some y ∈ Y. Since limits are unique, x = y. Therefore, x ∈ Y, hence Y is closed.

8.2.9 Example. Let C([a, b]) denote the set of all continuous real-valued functions on the interval [a, b]. Each such function is bounded, hence C([a, b]) is a vector subspace of B([a, b]) (8.1.10). Since the uniform limit of continuous functions is continuous (7.2.2), C([a, b]) is closed in the uniform metric. Since B([a, b]) is complete, 8.2.8(a) shows that C([a, b]) is complete. ♦

8.2.10 Example. The subspace D([a, b]) of C([a, b]) consisting of all differentiable functions is not complete in the uniform metric. To see this take [a, b] = [0, 1] and define a sequence of continuous functions g_n(x), n ≥ 2, on [0, 1] such that g_n = 1 on [0, 1/2], g_n = 0 on [1/2 + 1/n, 1], and g_n is linear on [1/2, 1/2 + 1/n]. Also, define g(t) on [0, 1] by g = 1 on [0, 1/2] and g = 0 on (1/2, 1]. (See Figure 8.2.)

[FIGURE 8.2: The functions g_n and g.]

Now set

$$f_n(x) = \int_0^x g_n(t)\,dt \quad \text{and} \quad f(x) = \int_0^x g(t)\,dt, \quad x \in [0, 1].$$

Then f_n ∈ D([0, 1]), f ∈ C([0, 1]), and

$$|f_n(x) - f(x)| \le \int_0^1 |g_n - g| = \int_{1/2}^{1/2 + 1/n} g_n = \frac{1}{2n}.$$

Therefore, f_n → f uniformly on [0, 1]. Since f is not differentiable at 1/2, D([0, 1]) is not closed. ♦

8.2.11 Definition. Let Y be a subset of X. A subset A ⊆ Y is said to be relatively open (relatively closed) in Y if A is open (closed) in the subspace (Y, d) of (X, d). ♦

8.2.12 Theorem. Let A ⊆ Y ⊆ X. Then A is relatively open (relatively closed) in Y iff A = Y ∩ B for some open (closed) subset B of X.

Proof. By definition, a nonempty open set A in the subspace Y is a union of open balls in Y. The latter are of the form Y ∩ B_r(y), where y ∈ Y and B_r(y) is an open ball of X. Therefore, A = Y ∩ B, where B is the corresponding union of the open balls B_r(y).

From the first paragraph, the closed sets of Y are of the form Y \ A = Y ∩ B^c, where B is open in X. Since B^c is closed in X, the assertion regarding closed sets follows.

8.2.13 Definition. Let X be a vector space and let a, b ∈ X. The line segment from a to b is defined by

[a : b] = {(1 − t)a + tb : 0 ≤ t ≤ 1}.

A subset E of X is said to be convex if a, b ∈ E implies [a : b] ⊆ E. ♦

[FIGURE 8.3: Convex and non-convex sets.]

Recall that, by definition, the convex subsets of R are the intervals. The reader may easily check that if D ⊆ R^p and E ⊆ R^q are convex, then D × E, as a subset of R^{p+q}, is convex. In particular, Cartesian products I_1 × ··· × I_n of intervals I_j are convex in R^n. Other examples are given in Exercise 5.
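Returning to Example 8.2.10, the estimate sup |f_n − f| ≤ 1/(2n) can be checked numerically. The sketch below uses closed forms for the integrals, f(x) = min(x, 1/2) and f_n(x) = 1/2 + u − nu²/2 with u = min(x − 1/2, 1/n) for x > 1/2; these formulas are worked out here from the definitions of g and g_n and are not stated in the text, and the grid size is an arbitrary choice.

```python
def f(x):
    """Integral of g from 0 to x, where g = 1 on [0, 1/2] and 0 on (1/2, 1]."""
    return min(x, 0.5)

def f_n(x, n):
    """Integral of g_n from 0 to x (g_n ramps linearly from 1 to 0 on [1/2, 1/2 + 1/n])."""
    if x <= 0.5:
        return x
    u = min(x - 0.5, 1.0 / n)       # length of the ramp already integrated
    return 0.5 + u - n * u * u / 2.0

def sup_distance(n, samples=10001):
    """Approximate the uniform distance between f_n and f on a grid in [0, 1]."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(f_n(x, n) - f(x)) for x in grid)

if __name__ == "__main__":
    for n in (2, 5, 50):
        print(n, sup_distance(n), 1 / (2 * n))
```

Each f_n is differentiable while the limit f has a corner at 1/2, which is the point of the example.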

242

A Course in Real Analysis

Exercises

1.S Sketch B_1(0, 0) ⊆ R^2 for the metrics d_1 and d_∞ derived from the norms ‖·‖_1 and ‖·‖_∞.
2. Prove that a closed ball is closed.
3.S Let x, y be distinct points in a metric space (X, d). Find the largest number r such that B_r(x) ∩ B_r(y) = ∅.
4. Show that every open subset U of R^n is a countable union of open balls as well as a countable union of bounded open n-dimensional intervals (a_1, b_1) × · · · × (a_n, b_n).
5.S Prove that open and closed balls in a normed vector space are convex. Are spheres convex?
6. Show by example that arbitrary intersections of open sets may not be open and that arbitrary unions of closed sets may not be closed.
7. Metrics d and ρ on a set X are said to be topologically equivalent if they have the property that a sequence {x_n} converges to x in (X, d) iff it converges to x in (X, ρ).
   (a) Prove that metrically equivalent metrics are topologically equivalent. (See Exercise 8.1.14.)
   (b) Prove that d and ρ are topologically equivalent iff (X, d) and (X, ρ) have the same topologies, that is, the metrics produce the same open sets.
   (c) Are topologically equivalent metrics necessarily metrically equivalent?
8.S Prove that the metric ρ(x, y) = |e^x − e^y| on R is topologically equivalent to the usual metric. Is R complete in this metric? Is ρ metrically equivalent to the usual metric on R?
9. Let Y be a subspace of (X, d) with the property that for some r > 0, d(x, y) ≥ r for all x, y ∈ Y with x ≠ y. Prove that Y is complete, hence closed. Conclude that finite metric spaces, discrete metric spaces, and the subspaces N and Z of R are complete.
10. Let x_n → x_0 in (X, d). Prove that the set C := {x_0, x_1, x_2, . . .} is closed in X.
11. Let Y be open (closed) in (X, d). Prove that a subset U of Y is relatively open (relatively closed) in Y iff it is open (closed) in X.
12.S Prove that the set C := {f ∈ C([0, 1]) : f(x) = f(1 − x) for all x ∈ [0, 1]} is closed in the supremum metric (8.1.10) but not in the metric of Exercise 8.1.22.

Metric Spaces


13. Prove that the subspaces
$$V := \Bigl\{ f \in B([0, +\infty)) : \lim_{x\to+\infty} f(x) \text{ exists in } \mathbb{R} \Bigr\} \quad\text{and}\quad W := \Bigl\{ f \in V : \lim_{x\to+\infty} f(x) = 0 \Bigr\}$$
are closed in the supremum metric.

8.3 Closure, Interior, and Boundary

Throughout this section, (X, d) denotes an arbitrary metric space.

8.3.1 Definition. Let E ⊆ X.
• The closure cl(E) = cl_X(E) of E in X is the intersection of all closed subsets of X containing E.
• The interior int(E) = int_X(E) of E is the union of all open subsets of X contained in E.
• The boundary bd(E) = bd_X(E) of E is the set cl(E) \ int(E). ♦

8.3.2 Examples.
(a) Since every nonempty open set of R (with the usual metric) contains rational and irrational points, int(Q) = int(I) = ∅ and cl(Q) = cl(I) = R, hence bd(Q) = bd(I) = R. For bounded intervals we have cl((a, b)) = [a, b], int([a, b]) = (a, b), and bd((a, b)) = bd([a, b]) = {a, b}.
(b) In a discrete metric space every subset E is both open and closed, hence cl(E) = int(E) = E and bd(E) = ∅. ♦

By 8.2.4 and 8.2.6, int(E) is open and cl(E) is closed, hence bd(E) is closed. The following proposition asserts that int(E) is the largest open set contained in E and cl(E) is the smallest closed set containing E.

8.3.3 Proposition. If U is open, C is closed, and U ⊆ E ⊆ C, then U ⊆ int(E) ⊆ E ⊆ cl(E) ⊆ C.

Proof. Simply note that U is one of the open sets in the definition of int(E) and that C is one of the closed sets in the definition of cl(E).


8.3.4 Corollary. Let E ⊆ X.
(a) E is open in X iff int(E) = E.
(b) int(int(E)) = int(E).
(c) E is closed in X iff cl(E) = E.
(d) cl(cl(E)) = cl(E).

Proof. If E is open, take U = E in the proposition. If E is closed, take C = E. This proves (a) and (c). Parts (b) and (d) follow from these.

8.3.5 Proposition. For any subset E of X,
(a) cl(E^c) = (int(E))^c,
(b) int(E^c) = (cl(E))^c,
(c) bd(E) = cl(E) ∩ cl(E^c) = bd(E^c).

Proof. For (a) we have
$$\bigl(\operatorname{int}(E)\bigr)^c = \Bigl(\bigcup_{\substack{U \subseteq E \\ U \text{ open}}} U\Bigr)^{c} = \bigcap_{\substack{C \supseteq E^c \\ C \text{ closed}}} C = \operatorname{cl}(E^c).$$

Parts (b) and (c) follow from (a).

8.3.6 Proposition. Let E ⊆ X. Then x ∈ cl(E) iff there exists a sequence {a_n} in E such that a_n → x.

Proof. Let C be the set of all limits of convergent sequences in E, including constant sequences, so E ⊆ C. We show that C = cl(E), which will establish the proposition. First, C is closed. If not, then C^c is not open, hence there exists y ∈ C^c and for each n a point y_n ∈ B_{1/n}(y) ∩ C. By definition of C, each y_n is the limit of a sequence in E, hence there exists a_n ∈ E such that d(y_n, a_n) < 1/n. By the triangle inequality, d(a_n, y) < 2/n, hence a_n → y. But then y ∈ C, a contradiction. Therefore C must be closed. It follows that cl(E) ⊆ C. Since cl(E) contains the limit of every convergent sequence in E (8.2.7), C ⊆ cl(E). Therefore, C = cl(E).

8.3.7 Example. (Topologist's sine curve). Let A = {(x, sin(1/x)) : 0 < x ≤ 2/π} and B = {0} × [−1, 1]. We show that cl(A) = A ∪ B. For the inclusion A ∪ B ⊆ cl(A), note first that
$$\Bigl\{ \sin\frac{1}{x} : \frac{2}{(4n+3)\pi} \le x \le \frac{2}{(4n+1)\pi} \Bigr\} = [-1, 1], \qquad n \in \mathbb{Z}^+.$$
It follows from the intermediate value theorem that for each y ∈ [−1, 1] and n ∈ N there exists x_n ∈ R such that
$$0 < x_n \le \frac{2}{(4n+1)\pi} \quad\text{and}\quad \sin(1/x_n) = y.$$


Since (x_n, y) ∈ A and (x_n, y) → (0, y), (0, y) ∈ cl(A). Therefore, B ⊆ cl(A), hence A ∪ B ⊆ cl(A).

The reverse inclusion will follow if we show that A ∪ B is closed. For this we use 8.2.7. Let {(x_n, y_n)} be a sequence in A ∪ B with (x_n, y_n) → (x, y).

Case 1. There exists a subsequence {(x_{n_k}, y_{n_k})} that lies in B. Then, since B is closed, (x, y) ∈ B.

Case 2. {(x_n, y_n)} eventually lies in A, so y_n = sin(1/x_n) for all sufficiently large n. If x = 0, then (x, y) ∈ B because |y| ≤ 1. Otherwise x > 0 and, by continuity of t ↦ sin(1/t) at x, y = sin(1/x), that is, (x, y) ∈ A.

In each case (x, y) ∈ A ∪ B, hence A ∪ B is closed. ♦

8.3.8 Definition. A subset E of X is said to be dense in X if cl(E) = X. Equivalently, every x ∈ X is the limit of a sequence in E. ♦

By 8.3.2, Q and I are dense in R. The set of all points in R^2 with rational coordinates is dense in R^2. A discrete space has no proper dense subsets. In Section 8.8 we show that the set of polynomials on [a, b] is dense in C([a, b]) in the uniform norm.

8.3.9 Example. (Dirichlet). If ξ is irrational, then the set E := {nξ + m : m ∈ Z, n ∈ N} is dense in R. To verify this we show that for any x ∈ R and k ∈ N there exists z ∈ E such that |z − x| < 1/k. To this end, let y_j = jξ − ⌊jξ⌋, j = 1, . . . , k + 1. Because ξ is irrational, 0 < y_j < 1, hence y_j must be in one of the intervals (0, 1/k), (1/k, 2/k), . . . , ((k − 1)/k, 1). Since there are only k intervals, one of these must contain y_i and y_j for some i ≠ j.² By the irrationality of ξ, y_j ≠ y_i. Hence one of the quantities ±(y_j − y_i), call it y, is in E and |y| < 1/k. We consider two cases.

If y > 0, choose m ∈ Z such that x + m > 0 and let n be the smallest integer such that ny > x + m. Then n ∈ N and (n − 1)y ≤ x + m, hence z := ny − m ∈ E and
0 < z − x = ny − m − x ≤ y < 1/k.

On the other hand, if y < 0, choose m ∈ Z such that x + m < 0 and let n be the smallest integer such that n(−y) > −(x + m), that is, ny < x + m. Again, z := ny − m ∈ E, and in this case, since (n − 1)y ≥ x + m,
−1/k < y ≤ ny − m − x = z − x < 0.

In either case, |z − x| < 1/k, as required. ♦

² This is an instance of the so-called pigeonhole principle.
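The proof of Example 8.3.9 is constructive, and its two steps (the pigeonhole search for a small element y of E, followed by stepping multiples of y past x + m) can be traced numerically. The following is a minimal sketch, assuming ξ is supplied as a floating-point approximation of an irrational number; the function names `small_element` and `approximate` are illustrative and not taken from the text.

```python
import math

def small_element(xi, k):
    """Pigeonhole step: return (n0, m0) with n0 >= 1 and |n0*xi + m0| < 1/k."""
    seen = {}  # interval index -> first j whose fractional part fell there
    for j in range(1, k + 2):
        frac = j * xi - math.floor(j * xi)      # y_j in (0, 1)
        bucket = min(int(frac * k), k - 1)      # which of the k intervals holds y_j
        if bucket in seen:
            i = seen[bucket]                    # i < j, same interval
            # y_j - y_i = (j - i)*xi + (floor(i*xi) - floor(j*xi))
            return j - i, math.floor(i * xi) - math.floor(j * xi)
        seen[bucket] = j
    raise AssertionError("k + 1 points cannot avoid a collision in k intervals")

def approximate(xi, x, k):
    """Return (n, m), n >= 1, with |n*xi + m - x| < 1/k, following the two cases."""
    n0, m0 = small_element(xi, k)
    y = n0 * xi + m0
    if y > 0:
        m = math.floor(-x) + 1                  # guarantees x + m > 0
    else:
        m = math.ceil(-x) - 1                   # guarantees x + m < 0
    steps = math.floor((x + m) / y) + 1         # smallest integer exceeding (x + m)/y
    return steps * n0, steps * m0 - m           # z = steps*y - m = n*xi + m'

if __name__ == "__main__":
    xi = math.sqrt(2)
    n, m = approximate(xi, 0.3, 100)
    print(n, m, n * xi + m)
```

Because the computation uses floating-point arithmetic, it only illustrates the argument for moderate k; it is not a substitute for the proof.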


8.3.10 Example. We show that the set S := {sin n : n ∈ N} is dense in the interval [−1, 1]. Let x ∈ R and take ξ = 1/(2π) in the preceding example. Then n_k/(2π) + m_k → x for some integer sequences {n_k} and {m_k} with n_k > 0, hence
sin n_k = sin(2π(n_k/(2π) + m_k)) → sin(2πx).
Since x was arbitrary, every member of [−1, 1] is the limit of a sequence in S. A similar argument shows that {cos n : n ∈ N} is dense in [−1, 1]. ♦

8.3.11 Definition. A metric space is said to be separable if it has a countable dense subset. ♦

For example, R^n is separable (consider all points with rational coordinates). An uncountable discrete space is not separable. The space C([a, b]) is separable in the supremum norm (Exercise 19).
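The density of {sin n : n ∈ N} asserted in Example 8.3.10 can be probed by brute force: for a target value in [−1, 1] and a tolerance, search for the first n with sin n within the tolerance. This is only a numerical sketch; the function name and the search limit are assumptions made for illustration, and a finite search proves nothing about density.

```python
import math

def first_n_with_sin_near(target, tol, limit=1_000_000):
    """Return the first n in 1..limit with |sin(n) - target| < tol, or None."""
    for n in range(1, limit + 1):
        if abs(math.sin(n) - target) < tol:
            return n
    return None

if __name__ == "__main__":
    for target in (-1.0, -0.5, 0.0, 0.3, 1.0):
        n = first_n_with_sin_near(target, 1e-3)
        print(target, n, math.sin(n))
```

The search is slowest near targets where few integers land early; shrinking the tolerance pushes the first hit further out, in line with the fact that the approximating sequence in the example is produced by a limiting argument rather than by a bound on n.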

Exercises

1. Let (X, d) be a metric space and A, B ⊆ X. Prove the following:
   (a)S cl(A ∪ B) = cl(A) ∪ cl(B).
   (b) cl(A ∩ B) ⊆ cl(A) ∩ cl(B).
   (c) int(A ∩ B) = int(A) ∩ int(B).
   (d)S int(A ∪ B) ⊇ int(A) ∪ int(B).
   (e) bd(A ∪ B) ⊆ bd(A) ∪ bd(B).
   (f)S bd(cl(A)) ⊆ bd(A).
   (g) bd(int(A)) ⊆ bd(A).
   (h) cl(A) = A ∪ bd(A).
   Show by examples that the inclusions may be strict.
2. Prove: bd(A ∩ B) ⊆ (A ∩ bd(B)) ∪ (B ∩ bd(A)) ∪ (bd(A) ∩ bd(B)). Show that the inclusion may be strict.
3. Find cl(A) \ A for A =
   (a) {(1/n, 1/m) : m, n ∈ N}.
   (b)S {(cos t, sin t, e^{−t}) : t > 0}.
   (c) {(cos t, sin t, t/(1 + |t|)) : t ∈ R}.
   (d) {(t cos t, t sin t, t) : t > 0}.
   (e)S {(cos t/(1 + t), sin t/(1 + t)) : t > 0}.
   (f) {(t cos t/(1 + t), t sin t/(1 + t)) : t > 0}.
4. An induction argument shows that parts (a) and (c) of Exercise 1 hold for any finite number of sets. Show, by example, that the analogous statements for infinitely many sets are false.
5. Prove that if cl(A) ∩ cl(B) = ∅, then int(A ∪ B) = int(A) ∪ int(B).
6. Let Y be a subspace of (X, d) and A ⊆ Y. Prove that
   (a)S cl_Y(A) = cl_X(A) ∩ Y.
   (b) int_X(A) ∩ Y ⊆ int_Y(A).
   (c) bd_Y(A) ⊆ bd_X(A).
   Show by examples that the inclusions in (b) and (c) may be strict.


7. Let x_n → x_0 in X. Show that cl({x_1, x_2, . . .}) = {x_0, x_1, x_2, . . .}.
8.S Let f_n(x) = x^n, 0 ≤ x ≤ 1. Show that the set {f_1, f_2, . . .} is closed in (C([0, 1]), ‖·‖_∞). Is it closed in (C([0, 1]), ‖·‖_1)?
9. Let B = B_r(x_0) and C = C_r(x_0). Prove that
   (a)S B ⊆ int(C).
   (b) cl(B) ⊆ C.
   (c) bd(B) ⊆ C \ B.
   Show, by example, that the inclusions may be strict.
10. Prove that in a normed vector space the inclusions in Exercise 9 are equalities.
11. Prove that the set E = {(x, y) : x, y ∈ Q and x ≠ y} is neither open nor closed and is dense in Euclidean space R^2.
12. Let x ∈ R, r ∈ Q, r ≠ 0. In each case, find the largest interval in which the given set is dense.
   (a) {sin(rn) : n ∈ N}.
   (b)S {sin(x + n) : n ∈ N}.
   (c) {sin n cos n : n ∈ N}.
   (d) {tan^2 n : n ∈ N}.
13. Show that lim_n sin(πnx) does not exist for any irrational number x. Conclude that lim_n sin(nr) does not exist for any nonzero rational number r.
14. (a) Let E be dense in X and let F be a proper finite subset of E. Show that E \ F is dense in X \ F. Is E \ F necessarily dense in X?
   (b) Let X be a normed vector space with {a_1, a_2, . . .} dense in X. Show that {a_n : n ≥ N} is dense in X for every N ∈ N. Conclude that {sin n : n ≥ N} is dense in [−1, 1].
15. Show that lim inf_n sin n = −1 and lim sup_n sin n = 1.
16.S Let Y be dense in X and U ⊆ X open. Show that U ∩ Y is dense in U. What if U is not open?
17. Let X = R^n with the Euclidean metric and let Y ⊆ X have the property of Exercise 8.2.9. Prove that Y^c is open and dense in X. Conclude that N^c and Z^c are open and dense in R.
18. Show that in a separable space, every nonempty open set U is a countable union of open balls.
19. Use the Weierstrass approximation theorem (8.8.5, below) to show that (C([a, b]), ‖·‖_∞) is separable.
20. (a)S Let {I_i : i ∈ I} be a family of open intervals in R with the property that each pair has a nonempty intersection. Show that ⋃_{i∈I} I_i is an open interval.
   (b) Prove that every nonempty open set in R is a countable union of disjoint open intervals.


8.4 Limits and Continuity

In this section, (X, d), (Y, ρ), and (Z, µ) denote arbitrary metric spaces.

8.4.1 Definition. Let E ⊆ X. A member a ∈ X is said to be an accumulation point of E if (E ∩ B_r(a)) \ {a} ≠ ∅ for each r > 0. A member of E that is not an accumulation point is called an isolated point of E. ♦

It follows from the definition that a is an accumulation point of E iff there exists a sequence of distinct points of E converging to a. No subset of a discrete metric space has an accumulation point. The set of functions x ↦ x^n in C([0, 1]), n ∈ N, has no accumulation points in the uniform norm, but the identically zero function is an accumulation point in the norm ‖·‖_1.

8.4.2 Definition. Let E ⊆ X, f : E → Y, and let a ∈ X be either a member of E or an accumulation point of E. If b ∈ Y, we write b = lim_{x→a, x∈E} f(x) if for each ε > 0 there exists δ > 0 such that

    x ∈ E and d(x, a) < δ implies ρ(f(x), b) < ε.    (8.3)

In the special case E = X \ {a}, we write simply b = lim_{x→a} f(x). ♦

Note that condition (8.3) may be written f(E ∩ B_δ(a)) ⊆ B_ε(b). This observation will be useful later in proving a global characterization of continuity.

Many of the results in Chapter 3 on limits of functions on subsets of R hold for real-valued functions defined on a metric space. These include the theorems on limits of sums, products, and quotients of functions, the comparison theorem, the squeeze principle, and the sequential characterization of limit. The statements and proofs are essentially the same: simply replace |x − y| by the metric d(x, y). For future reference, we explicitly state:

8.4.3 Sequential Characterization of Limit. Let a be an accumulation point of E ⊆ X and let f : E → Y. Then lim_{x→a, x∈E} f(x) exists and equals b ∈ Y iff f(a_n) → b for all sequences {a_n} in E with a_n → a.

The following theorem gives sufficient conditions for a double limit to equal an iterated limit.


8.4.4 Iterated Limit Theorem. Let X × Y have the product metric η := d × ρ, and let a and b be accumulation points of X \ {a} and Y \ {b}, respectively. If f : X × Y \ {(a, b)} → Z has the properties
(a) g(x) := lim_{y→b} f(x, y) exists in Z for each x ∈ X, and
(b) z := lim_{(x,y)→(a,b)} f(x, y) exists in Z,
then lim_{x→a} g(x) exists and equals z.

Proof. Given ε > 0, by (b) choose δ > 0 such that µ(f(x, y), z) < ε for all (x, y) ∈ X × Y with 0 < η((x, y), (a, b)) < δ. Let 0 < d(x, a) < δ. Then, for all y sufficiently near b, η((x, y), (a, b)) < δ, hence
µ(g(x), z) ≤ µ(g(x), f(x, y)) + µ(f(x, y), z) < µ(g(x), f(x, y)) + ε.
Letting y → b in this inequality, noting that f(x, y) → g(x), we obtain µ(g(x), z) ≤ ε. This shows that lim_{x→a} g(x) = z.

The theorem implies that
$$\lim_{(x,y)\to(a,b)} f(x,y) = \lim_{x\to a}\lim_{y\to b} f(x,y) = \lim_{y\to b}\lim_{x\to a} f(x,y)$$
provided the limit on the left exists and the inner limits on the right exist for each x and y, respectively. The limits on the right are called iterated limits and the limit on the left is sometimes called a double limit. In particular, if the iterated limits exist and are unequal, then the double limit cannot exist.

In many cases, the iterated limit theorem (suitably modified) still holds if f is defined on subsets E of X × Y more general than X × Y \ {(a, b)}. This is the case in Examples (c) and (d) that follow.

8.4.5 Examples. In (a)–(e), X = Y = Z = R. Note that in this case the product metric η is equivalent to the Euclidean metric on R^2.

(a) Let E = (0, +∞) × (0, +∞). To calculate the limit
$$\lim_{\substack{(x,y)\to(0,0) \\ (x,y)\in E}} \frac{\sin(x+2y)}{2x+y}$$
we write the function as
$$\frac{\sin(x+2y)}{x+2y}\cdot\frac{x+2y}{2x+y}.$$
As (x, y) → (0, 0) along E, the first factor tends to 1 but the second factor has no limit. Indeed, along a path y = mx, m > 0, x > 0,
$$\frac{x+2y}{2x+y} = \frac{x+2mx}{2x+mx} = \frac{1+2m}{2+m}.$$


Therefore, the double limit does not exist. The iterated limits exist and are unequal:
$$\lim_{y\to 0^+}\lim_{x\to 0^+} f(x,y) = \lim_{y\to 0^+}\frac{\sin(2y)}{y} = 2, \qquad \lim_{x\to 0^+}\lim_{y\to 0^+} f(x,y) = \lim_{x\to 0^+}\frac{\sin x}{2x} = \frac{1}{2}.$$

(b) Let E be as in (a) and let p, q > 0. The limit
$$L := \lim_{\substack{(x,y)\to(0,0) \\ (x,y)\in E}} \frac{x^p + y^q}{x^2 + y^2}$$
exists iff p, q > 2 or p = q = 2. In the former case, L = 0 and in the latter, L = 1. This is best seen by converting to polar coordinates x = r cos θ, y = r sin θ, 0 < θ < π/2:
$$L = \lim_{r\to 0^+}\bigl(r^{p-2}\cos^p\theta + r^{q-2}\sin^q\theta\bigr).$$
Both iterated limits exist iff p, q ≥ 2.

(c) Let E = {(x, y) : x > 0, y > 0, x ≠ y}. Then
$$\lim_{\substack{(x,y)\to(0,0) \\ (x,y)\in E}} \frac{x^p - y^p}{x - y}$$
exists iff p ≥ 1 and has zero limit if p > 1. Indeed, if 0 < x < y, then, by the mean value theorem, there exists t ∈ (x, y) such that x^p − y^p = p t^{p−1}(x − y), hence, for p > 1,
$$p\,x^{p-1} < \frac{x^p - y^p}{x - y} < p\,y^{p-1},$$
and the assertion follows from the squeeze principle. Clearly, the iterated limits exist (hence equal the double limit) iff p ≥ 1.

(d) Let E be as in (c). Then
$$\lim_{\substack{(x,y)\to(0,0) \\ (x,y)\in E}} \frac{x^p + y^p}{y - x}$$
does not exist for any value of p. Indeed, along the path y = mx, m, x > 0, m ≠ 1, the function has values
$$\frac{x^p + (mx)^p}{mx - x} = \frac{x^{p-1}(1 + m^p)}{m - 1},$$
so the limit cannot exist if p ≤ 1. Let p > 1 and set θ_r = m r^{p−1} + π/4. Along the path given by
$$x = r\cos\theta_r = \frac{r}{\sqrt{2}}\bigl(\cos(m r^{p-1}) - \sin(m r^{p-1})\bigr), \qquad y = r\sin\theta_r = \frac{r}{\sqrt{2}}\bigl(\cos(m r^{p-1}) + \sin(m r^{p-1})\bigr),$$


where r ↓ 0, the function has values
$$\frac{x^p + y^p}{y - x} = \frac{1}{\sqrt{2}\,m}\cdot\frac{m r^{p-1}}{\sin(m r^{p-1})}\bigl(\cos^p\theta_r + \sin^p\theta_r\bigr),$$
which tends to 2^{(1−p)/2}/m as r → 0. Neither of the iterated limits exists if p < 1. If p > 1, then clearly
$$\lim_{y\to 0}\lim_{x\to 0}\frac{x^p + y^p}{y - x} = \lim_{x\to 0}\lim_{y\to 0}\frac{x^p + y^p}{y - x} = 0,$$
and if p = 1, then
$$\lim_{y\to 0}\lim_{x\to 0}\frac{x^p + y^p}{y - x} = 1, \quad\text{while}\quad \lim_{x\to 0}\lim_{y\to 0}\frac{x^p + y^p}{y - x} = -1.$$

(e) Let E = {(x, y) : x > 0, y > 0}. Then
$$\lim_{\substack{(x,y)\to(0,0) \\ (x,y)\in E}} \frac{x^p\,y}{x + y}$$
exists iff p > 0, in which case the limit is zero. Indeed, along the path y = mx the function has values m x^p/(1 + m), so the limit cannot exist if p ≤ 0. If p > 0, one can introduce polar coordinates as in (b). Both iterated limits exist iff p ≥ 0, but are unequal if p = 0. ♦

8.4.6 Definition. A function f : X → Y is said to be continuous at a point a ∈ X if lim_{x→a} f(x) = f(a). Also, f is said to be continuous on a set E ⊆ X if f is continuous at each point of E. If E = X, then f is simply said to be continuous. If, in addition, f is one-to-one and onto Y and if f^{-1} : Y → X is continuous, then f is called a homeomorphism. ♦

From the sequential characterization of limit we have

8.4.7 Sequential Characterization of Continuity. Let f : X → Y and a ∈ X. Then f is continuous at a iff f(a_n) → f(a) for all sequences {a_n} in X with a_n → a.

The next theorem gives an important global characterization of continuity.

8.4.8 Theorem. Let f : X → Y. The following statements are equivalent:
(a) f is continuous.
(b) f^{-1}(V) is open in X for each open subset V of Y.
(c) f^{-1}(C) is closed in X for each closed subset C of Y.


Proof. That (b) and (c) are equivalent follows from the general set-theoretic identity f^{-1}(B^c) = (f^{-1}(B))^c.

(a) ⇒ (b): Let V ⊆ Y be open. If x ∈ f^{-1}(V), then f(x) ∈ V, so there exists ε > 0 such that B_ε(f(x)) ⊆ V. By continuity there exists δ > 0 such that f(B_δ(x)) ⊆ B_ε(f(x)). Therefore, f(B_δ(x)) ⊆ V, hence B_δ(x) ⊆ f^{-1}(V).

(b) ⇒ (a): Let x ∈ X and ε > 0. Since U := f^{-1}(B_ε(f(x))) is open in X and contains x, we may choose δ > 0 such that B_δ(x) ⊆ U. Then f(B_δ(x)) ⊆ f(U) ⊆ B_ε(f(x)), which shows that f is continuous at x.

8.4.9 Definition. A function f : X → Y is said to be uniformly continuous on a set E ⊆ X if, given ε > 0, there exists δ > 0 such that ρ(f(u), f(v)) < ε for all u, v ∈ E with d(u, v) < δ. ♦

8.4.10 Example. The function
$$f(x,y) = \frac{1}{2.1 + \sin x + \sin y}$$
is uniformly continuous on R^2. Indeed, for all (x, y), (a, b) ∈ R^2,
$$\begin{aligned}
|f(x,y) - f(a,b)| &= \frac{|(\sin x + \sin y) - (\sin a + \sin b)|}{(2.1 + \sin x + \sin y)(2.1 + \sin a + \sin b)} \\
&\le \frac{|\sin x - \sin a| + |\sin y - \sin b|}{(2.1 + \sin x + \sin y)(2.1 + \sin a + \sin b)} \\
&\le 100\,|\sin x - \sin a| + 100\,|\sin y - \sin b| \\
&\le 100\bigl(|x - a| + |y - b|\bigr) \\
&\le 200\sqrt{(x-a)^2 + (y-b)^2}.
\end{aligned}$$
♦
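The estimate in Example 8.4.10 says that f is Lipschitz with constant at most 200 in the Euclidean metric. A quick randomized check can make the bound concrete; this is a sketch only, in which the sampling window, the number of trials, and the function name `max_difference_quotient` are arbitrary choices made here and not part of the text.

```python
import math
import random

def f(x, y):
    # The function of Example 8.4.10; the denominator is at least 0.1.
    return 1.0 / (2.1 + math.sin(x) + math.sin(y))

def max_difference_quotient(trials=20_000, seed=0):
    """Largest sampled value of |f(p) - f(q)| / |p - q| over random pairs p, q."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        a, b = x + rng.uniform(-1, 1), y + rng.uniform(-1, 1)
        dist = math.hypot(x - a, y - b)
        if dist > 0:
            worst = max(worst, abs(f(x, y) - f(a, b)) / dist)
    return worst

if __name__ == "__main__":
    print(max_difference_quotient())  # stays below the bound 200 from the example
```

The sampled quotient is typically far below 200, which is consistent with the example: the chain of inequalities there is chosen for simplicity, not sharpness.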

The proof of the following theorem is entirely analogous to that of 3.5.2. The details are left to the reader.

8.4.11 Sequential Characterization of Uniform Continuity. A function f : X → Y is uniformly continuous on E ⊆ X iff ρ(f(u_n), f(v_n)) → 0 for all sequences {u_n} and {v_n} in E with d(u_n, v_n) → 0.

For example, every function on a discrete metric space is uniformly continuous, since eventually u_n = v_n. The indefinite integral function F(f)(x) = ∫_a^x f(t) dt on the space C([a, b]) is uniformly continuous with respect to the uniform norm, since ‖f_n − g_n‖_∞ → 0 ⇒ ‖F(f_n) − F(g_n)‖_∞ → 0. The addition function (x, y) ↦ x + y is uniformly continuous on R^2, since (x_n, y_n) − (a_n, b_n) → (0, 0) clearly implies that x_n + y_n − (a_n + b_n) → 0. On the other hand, the multiplication function (x, y) ↦ xy is not uniformly continuous on R^2, since (n + 1/n, n + 1/n) − (n, n) → (0, 0) but (n + 1/n)^2 − n^2 = 2 + 1/n^2 → 2.
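The contrast between addition and multiplication under criterion 8.4.11 can be tabulated directly. The sketch below evaluates, for u_n = (n + 1/n, n + 1/n) and v_n = (n, n), the Euclidean distance between the points, the gap between their sums, and the gap between their products; the helper name `gaps` is hypothetical.

```python
import math

def gaps(n):
    """Distance between u_n and v_n, and the gaps in their sums and products."""
    u = (n + 1.0 / n, n + 1.0 / n)
    v = (float(n), float(n))
    distance = math.dist(u, v)                       # equals sqrt(2)/n
    sum_gap = abs((u[0] + u[1]) - (v[0] + v[1]))     # equals 2/n
    product_gap = abs(u[0] * u[1] - v[0] * v[1])     # equals 2 + 1/n^2
    return distance, sum_gap, product_gap

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, gaps(n))
```

As n grows the first two quantities shrink to 0 while the third never drops below 2, so the pair of sequences witnesses the failure of uniform continuity for multiplication but not for addition.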


Exercises 1. For each of the functions f (x) below, find lim{x→0, x∈E} f (x) and the corresponding iterated limits or show that the limits fail to exist. In each case take E to be the natural domain of the function. (a) (d) (g) (j) (m) (p)

y 2 + sin2 x . 3x2 + 2y 2 sin x sin y p . x2 + y 2

x2 y 2 x2 y . (c) . 2 4 + 2y x + 7y 4 x4 1 (e) S 4 . (f) (x + y) sin 2 . 2 4 x − xy + y x + y2 p (1 + x2 )(1 + y 2 ) − 1 sin(3xy 2 + 2xy 3 ) xy 2 cos(xy) S . (h) . (i) . 2 2 2 xy x +y x2 + y 2 3x + 2y x2 + |y|2.1 1 − cos(xy) S . (l) . . (k) sin x sin y x2 + y 2 (x2 + y 2 )1/3 p 1 − cos |xy| sin x ± sin y x−y . (n) . (o) S . |x|p x−y ln x − ln y xy + yz + xz x|y|1.1 3x2 + 2y 2 + z 2 p . (r) S p . (q) . x2 + y 2 + z 2 sin2 x2 + y 2 x2 + y 2 + z 2 (b) S

5x2

2.S Let a > 0, p > 1. Evaluate the limit
$$\lim_{\substack{(x,y)\to(0,0) \\ (x,y)\in E}} \frac{x^2 - 5y^2}{x^2 + 3y^2}$$
for the sets
   (a) E = {(x, y) : |y| ≤ a|x|^p}.
   (b) E = {(x, y) : |y| < |x|}.

3.S Let f be continuously differentiable on (−π/2, π/2). Define g on the set E := {(x, y) ∈ (−π/2, π/2)^2 : x ≠ y} by
$$g(x,y) = \frac{f(x) - f(y)}{\sin x - \sin y}.$$
Show that g has a continuous extension to (−π/2, π/2)^2.

4. Let f and g be continuously differentiable on some open interval (a, b) and suppose that g′ ≠ 0. Define h on the set E := {(x, y) ∈ (a, b)^2 : x ≠ y} by
$$h(x,y) = \frac{f^2(x) - f^2(y)}{g(x) - g(y)}.$$
Prove that h has a continuous extension to (a, b)^2.


5. Let f : X → Y. Prove that the following statements are equivalent:
   (a) f is continuous.
   (b) f(cl(A)) ⊆ cl(f(A)) for each subset A of X.
   (c) cl(f^{-1}(B)) ⊆ f^{-1}(cl(B)) for each subset B of Y.
   (d) f^{-1}(int(B)) ⊆ int(f^{-1}(B)) for each subset B of Y.
6.S Show that d : X × X → R is uniformly continuous with respect to the product metric η := d × d on X × X.
7.S Let f : [0, a) → R and
$$g(x,y) := f\bigl(\sqrt{x^2 + y^2}\bigr), \qquad \sqrt{x^2 + y^2} < a.$$
   (a) Prove that g is uniformly continuous iff f is uniformly continuous.
   (b) Use (a) to show that the functions
$$\sqrt{x^2 + y^2}, \qquad \frac{1}{\sqrt{x^2 + y^2 + 1}}, \qquad\text{and}\qquad \sin\sqrt{x^2 + y^2}$$
   are uniformly continuous on R^2 but sin(x^2 + y^2) is not.
8.S Let f(x) be uniformly continuous on R. Prove that the function g(x, y) := f(αx + βy) is uniformly continuous on R^2. Give an example of a bounded uniformly continuous function f on R such that the function h(x, y) := f(xy) is not uniformly continuous on R^2.
9. Show that the function
$$f(x,y) = \frac{1}{1 - \sin x \sin y}$$
is uniformly continuous on the set E_r := [−π/2 + r, π/2 − r] × [−π/2 + r, π/2 − r] for any 0 < r < π/2, but is not uniformly continuous on E := (−π/2, π/2) × (−π/2, π/2).
10. Let f : (X, d) → (Y, ρ) and g : (Y, ρ) → (Z, µ) be (uniformly) continuous. Prove that g ◦ f : (X, d) → (Z, µ) is (uniformly) continuous.
11.S Let f : X → R^k, say f(x) = (f_1(x), . . . , f_k(x)). Prove that f is (uniformly) continuous iff each f_j is (uniformly) continuous.
12.S Let f_n : (X, d) → (Y, ρ) converge uniformly to f on X. Prove that if each f_n is (uniformly) continuous, then f is (uniformly) continuous.

8.5 Compact Sets

Throughout this section, (X, d) and (Y, ρ) denote arbitrary metric spaces. Compactness is one of the most important concepts in analysis. For example, it allows the formulation of results such as the extreme value theorem and the uniform continuity theorem in the context of general metric spaces. It is also the key feature that distinguishes the finite dimensional space R^n from its infinite dimensional counterparts ℓ^∞ and ℓ^1.

8.5.1 Definition. Let E ⊆ X. A collection U = {U_i : i ∈ I} of subsets of X is called a cover of E if E is contained in the union of the sets U_i. If each U_i is open, then U is called an open cover of E. A cover U of E is said to have a finite subcover if there exists a finite subset I_0 of I such that {U_i : i ∈ I_0} is a cover of E. If every open cover of E has a finite subcover, then E is said to be compact. ♦

Finite subsets of a metric space are compact. In a discrete metric space, these are the only compact sets. Indeed, if E is an infinite subset of a discrete space, then {{x} : x ∈ E} is an open cover of E with no finite subcover.

8.5.2 Proposition. A compact subset of a metric space is closed and bounded.

Proof. Let E be compact and let a ∈ E^c. For each x ∈ E let U_x and V_x denote disjoint open balls with centers x and a, respectively (see Figure 8.4). Then {U_x : x ∈ E} is an open cover of E, hence there exists a finite subset E_0 of E such that {U_x : x ∈ E_0} covers E. Set V = ⋂_{x∈E_0} V_x. Then V is an open ball with center a, and since V ∩ U_x = ∅ for each x ∈ E_0, V ⊆ E^c. Therefore E^c is open.

FIGURE 8.4: The neighborhoods U_x and V_x.

To show that E is bounded, choose any x ∈ X and consider the open cover {B_n(x) : n ∈ N} of E. Let F be a finite subset of N such that {B_n(x) : n ∈ F} covers E. Then E ⊆ B_m(x), where m is the largest member of F.


The converse of 8.5.2 is false. For example, the set Q ∩ [−√2, √2] is closed and bounded in Q but not compact. Indeed, if {r_n} is a sequence in Q with r_n ↑ √2, then {(−r_n, r_n) : n ∈ N} is an open cover of Q ∩ [−√2, √2] with no finite subcover. For another example, consider a discrete metric space. Here, the entire metric space is closed and bounded but only finite sets are compact.

8.5.3 Proposition. A closed subset of a compact metric space is compact.

Proof. Let X be compact, E ⊆ X closed, and let U = {U_i : i ∈ I} be an open cover of E. Then U ∪ {E^c} is an open cover of X, hence there exists a finite subset I_0 of I such that X = E^c ∪ ⋃_{i∈I_0} U_i. It follows that E ⊆ ⋃_{i∈I_0} U_i.

Closely related to compactness is the notion of total boundedness.

8.5.4 Definition. Let E ⊆ X and ε > 0. An ε-net for E is a set F ⊆ X such that {B_ε(x) : x ∈ F} covers E. E is said to be totally bounded if for each ε > 0 there exists a finite ε-net for E. ♦

An ε-net F for E has the property that every member of E is within ε of a member of F. For example, Q is an ε-net for R for every ε > 0, and Z is a 1-net for R. The following proposition shows that the set F in the definition of total boundedness may be taken to be a subset of E.

8.5.5 Proposition. If E has a finite ε-net F, then E has a finite 2ε-net contained in E.

Proof. For each x ∈ F, apply the following procedure: If E ∩ B_ε(x) = ∅, remove x from F. Otherwise, choose any a ∈ E ∩ B_ε(x) and replace B_ε(x) by B_{2ε}(a) and x in F by a. Since B_ε(x) ⊆ B_{2ε}(a), the revised set is a finite 2ε-net for E contained in E.

FIGURE 8.5: A 2ε-net.

Since a finite union of open balls is bounded (Exercise 8.1.3), every totally bounded set is bounded. The converse is false. For example, in a discrete space all sets are bounded but no infinite set can be totally bounded. Open and closed balls in C([0, 1]) with the supremum norm are bounded but not totally bounded (Exercise 8). Contrast this with the following example:


8.5.6 Example. Every bounded subset E of R^n is totally bounded. To see this, let ε > 0 and choose k ∈ N so large that E ⊆ [−kδ, kδ]^n, where 0 < δ < 2ε/√n. Subdividing, we see that [−kδ, kδ]^n is a finite union of sets of the form
J := [a_1, b_1] × [a_2, b_2] × · · · × [a_n, b_n], where b_j − a_j = δ.
(See Figure 8.6.) The largest diagonal in J has length √n δ < 2ε, hence J may be enclosed in an open ball with radius ε and center c = (c_1, . . . , c_n), where c_j = (a_j + b_j)/2. The centers of the resulting collection of balls form a finite ε-net for E. ♦

FIGURE 8.6: A bounded set in R^n is totally bounded.

8.5.7 Definition. A subset E of X is said to be sequentially compact if every sequence in E has a cluster point in E. ♦

By the Bolzano–Weierstrass theorem, closed and bounded intervals in R are sequentially compact. The same is not true in Q; for example, Q ∩ [0, √2] is not sequentially compact. In a discrete space, no infinite set can be sequentially compact since sequences with distinct terms cannot converge.

8.5.8 Heine–Borel Theorem. The following statements are equivalent:
(a) X is compact.
(b) X is sequentially compact.
(c) X is complete and totally bounded.

Proof. (a) ⇒ (b): We prove the contrapositive ∼(b) ⇒ ∼(a). Let {a_n} be a sequence in X with no cluster point. Then for each x ∈ X there must exist an open ball B(x) with center x that contains a_n for only finitely many n. This implies that the union of any finite subcollection of the open cover {B(x) : x ∈ X} of X contains a_n for only finitely many n, and hence the subcollection cannot cover X. Therefore, X is not compact.

(b) ⇒ (c): Let X be sequentially compact and let {a_n} be a Cauchy sequence in X. By hypothesis, {a_n} has a convergent subsequence, say a_{n_k} → a ∈ X. By Exercise 8.1.9, a_n → a. Therefore, X is complete. Suppose that X is not totally bounded. Then there exists ε > 0 such that no finite collection of open balls of radius ε covers X. Choose any a_1 ∈ X. Since B_ε(a_1) does not cover X, there exists a_2 ∈ X \ B_ε(a_1). Since B_ε(a_1) ∪ B_ε(a_2) does not cover X, there exists a_3 ∈ X \ (B_ε(a_1) ∪ B_ε(a_2)). Continuing in this fashion, we construct a sequence {a_n} in X such that
a_n ∈ X \ (B_ε(a_1) ∪ B_ε(a_2) ∪ · · · ∪ B_ε(a_{n−1})).
It follows that d(a_n, a_m) ≥ ε for all m ≠ n. But then no subsequence of {a_n} can converge. Therefore, X must be totally bounded.

(c) ⇒ (a): Assume that X is complete and totally bounded but not compact. Then X has an open cover U = {U_i : i ∈ I} with no finite subcover. For each k let F_k be a finite set of points in X such that {B_{1/k}(x) : x ∈ F_k} is a cover of X. Consider the case k = 1. If for each x ∈ F_1 the ball B_1(x) could be covered by finitely many members of U, then X itself would have such a cover, contradicting our assumption. Thus there exists x_1 ∈ F_1 such that E_1 := B_1(x_1) cannot be covered by finitely many members of U. Since {B_{1/2}(x) : x ∈ F_2} covers X, {E_1 ∩ B_{1/2}(x) : x ∈ F_2} covers E_1, so by similar reasoning there exists x_2 ∈ F_2 such that E_2 := E_1 ∩ B_{1/2}(x_2) cannot be covered by finitely many members of U. In this manner we construct a sequence of points x_n in X and decreasing sets
$$E_n = B_1(x_1) \cap B_{1/2}(x_2) \cap \cdots \cap B_{1/n}(x_n) = E_{n-1} \cap B_{1/n}(x_n) \tag{8.4}$$
that cannot be covered by finitely many members of U. In particular, E_n ≠ ∅. Choose a point y_n ∈ E_n. If n > m, then y_n ∈ E_m, hence from (8.4)
d(x_m, x_n) ≤ d(x_m, y_n) + d(y_n, x_n) < 1/m + 1/n.
It follows that {x_n} is a Cauchy sequence. Since X is complete, x_n → x for some x ∈ X. Choose i ∈ I such that x ∈ U_i. Since U_i is open, there exists r > 0 such that B_r(x) ⊆ U_i. Next, choose n > 2/r such that d(x_n, x) < r/2. By the triangle inequality, B_{1/n}(x_n) ⊆ B_r(x). But then E_n ⊆ U_i, contradicting the noncovering property of E_n. Therefore, X must be compact, completing the proof.

8.5.9 Corollary. A subset of R^n is compact iff it is closed and bounded.

Proof. We have already seen that a compact set in a metric space is closed and bounded. Conversely, let C ⊆ R^n be closed and bounded. Since R^n is complete (Exercise 8.1.17), C is complete (8.2.8). Since C is bounded, it is totally bounded (8.5.6). By the theorem, C is compact.
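The proof of Corollary 8.5.9 rests on the grid construction of Example 8.5.6, and the step (b) ⇒ (c) of the Heine–Borel theorem picks points that stay at distance at least ε from those already chosen. Both constructions are easy to sketch for finite point samples in R^n. The helper names below (`grid_net`, `greedy_net`, `is_eps_net`) are assumptions made for illustration; `greedy_net` works on a finite list of sample points, so it always terminates, unlike the infinite selection in the proof.

```python
import itertools
import math

def grid_net(k, delta, n):
    """Centers of the cubes of side delta that tile [-k*delta, k*delta]^n."""
    centers_1d = [(i + 0.5) * delta for i in range(-k, k)]
    return list(itertools.product(centers_1d, repeat=n))

def greedy_net(points, eps):
    """Keep a point only if it is at distance >= eps from every point kept so far."""
    chosen = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in chosen):
            chosen.append(p)
    return chosen

def is_eps_net(net, points, eps):
    """True if every point lies in some open ball of radius eps centered in net."""
    return all(any(math.dist(p, c) < eps for c in net) for p in points)

if __name__ == "__main__":
    n, eps = 2, 0.25
    delta = 0.3                       # satisfies delta < 2*eps/sqrt(n)
    sample = [(x / 10, y / 10) for x in range(-10, 11) for y in range(-10, 11)]
    net = grid_net(4, delta, n)       # covers [-1.2, 1.2]^2, which contains the sample
    print(len(net), is_eps_net(net, sample, eps))
    print(len(greedy_net(sample, eps)))
```

The grid net has (2k)^n centers and may lie outside the set being covered, whereas the greedy net is a subset of the sample, in the spirit of Proposition 8.5.5.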


The validity of the preceding corollary ultimately rests on the finite dimensionality of R^n. For infinite dimensional normed spaces such as C([0, 1]), a closed and bounded set need not be compact (Exercise 8). In the next section, we characterize the compact subsets of spaces like C([0, 1]).

8.5.10 Theorem. If f : X → Y is continuous and X is compact, then f(X) is compact.

Proof. Let {V_i : i ∈ I} be an open cover of f(X) in Y. For each i ∈ I, set U_i = f^{-1}(V_i). Then {U_i : i ∈ I} is an open cover of X, hence there exists a finite subset I_0 of I such that {U_i : i ∈ I_0} is a cover of X. It follows that {V_i : i ∈ I_0} is a finite cover of f(X).

8.5.11 Corollary. Let f : X → Y be continuous, one-to-one, and onto Y. If X is compact, then f^{-1} : Y → X is continuous, hence f is a homeomorphism.

Proof. Let g = f^{-1} and let C be a closed subset of X. Then C is compact (8.5.3), hence, by the theorem, g^{-1}(C) = f(C) is compact and therefore closed in Y (8.5.2). By 8.4.8, g is continuous.

Corollary 8.5.11 is false for noncompact X (Exercise 19).

8.5.12 Extreme Value Theorem. If f : X → R is continuous and X is compact, then there exist points x_m and x_M in X such that f(x_m) ≤ f(x) ≤ f(x_M) for all x ∈ X.

Proof. By 8.5.10 and 8.5.2, f(X) is closed and bounded in R and therefore contains its supremum and infimum.

8.5.13 Theorem. If f : X → Y is continuous and X is compact, then f is uniformly continuous.

Proof. Let ε > 0. By continuity, for each x ∈ X there exists γ_x > 0 such that

    f(B_{γ_x}(x)) ⊆ B_{ε/2}(f(x)).    (8.5)

Set δ_x = γ_x/2. The collection {B_{δ_x}(x) : x ∈ X} is an open cover of X, hence there exists a finite set F ⊆ X such that the collection {B_{δ_x}(x) : x ∈ F} covers X. Let δ := min_{x∈F} δ_x and let a, b ∈ X with d(a, b) < δ. Choose x ∈ F such that a ∈ B_{δ_x}(x). Then
d(x, a) < δ_x < γ_x and d(x, b) ≤ d(a, b) + d(x, a) < δ_x + δ_x = γ_x,
so a, b ∈ B_{γ_x}(x). By (8.5),
ρ(f(a), f(b)) ≤ ρ(f(a), f(x)) + ρ(f(x), f(b)) < ε/2 + ε/2 = ε.
Therefore, f is uniformly continuous.
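The role of compactness in Theorem 8.5.13 can be seen numerically by comparing a continuous function on a compact interval with the same function on an interval that approaches a point where it blows up. The sketch below samples |f(u) − f(u + δ)| for f(x) = 1/x; the helper `oscillation`, the sample count, and the chosen intervals are illustrative assumptions, and the sampled maximum is only a lower bound for the true supremum.

```python
def oscillation(f, a, b, delta, samples=10_000):
    """Largest sampled |f(u) - f(u + delta)| with u and u + delta both in [a, b]."""
    step = (b - delta - a) / samples
    return max(
        abs(f(a + i * step) - f(a + i * step + delta))
        for i in range(samples + 1)
    )

def reciprocal(x):
    return 1.0 / x

if __name__ == "__main__":
    delta = 1e-3
    print(oscillation(reciprocal, 0.1, 1.0, delta))    # small on the compact set [0.1, 1]
    print(oscillation(reciprocal, 1e-4, 1.0, delta))   # large as the left endpoint nears 0
```

On [r, 1] the worst case is at u = r, where the gap is δ/(r(r + δ)); for fixed δ this grows without bound as r ↓ 0, so no single δ serves every ε on (0, 1], while on each compact [r, 1] a suitable δ exists.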

The following is a generalization of 3.5.9.

8.5.14 Corollary. Let X be compact, Y complete, E a dense subset of X, and f : E → Y continuous. The following statements are equivalent:
(a) lim_{x→a, x∈E} f(x) exists for each a ∈ X.
(b) f has a continuous extension to X; that is, there exists a continuous function g : X → Y such that g|_E = f.
(c) f is uniformly continuous on E.

Proof. (a) ⇒ (b): For each a ∈ X define g(a) = lim_{x→a, x∈E} f(x). Since f is continuous, g|_E = f. If g is not continuous at a ∈ X, then there exist ε > 0 and a sequence {x_n} in X such that x_n → a and ρ(g(x_n), g(a)) ≥ ε for all n. By definition of g(x_n), for each n we may choose a_n ∈ E such that d(x_n, a_n) < 1/n and ρ(g(x_n), f(a_n)) < ε/2. Then a_n → a but
ρ(f(a_n), g(a)) ≥ ρ(g(x_n), g(a)) − ρ(g(x_n), f(a_n)) > ε/2,
contradicting the definition of g(a). Therefore, g is continuous.

(b) ⇒ (c): By 8.5.13, g is uniformly continuous on X, hence f is uniformly continuous on E.

(c) ⇒ (a): Let a ∈ X and let {x_n} be a sequence in E such that x_n → a. Since f is uniformly continuous, {f(x_n)} is Cauchy and therefore converges to some b ∈ Y. If {y_n} is another sequence in E such that y_n → a, then d(y_n, x_n) → 0 so, by uniform continuity again, ρ(f(y_n), f(x_n)) → 0, hence f(y_n) → b. By the sequential criterion for limits, lim_{x→a, x∈E} f(x) exists and equals b.

Exercises

1. Determine which of the following subsets of $\mathbb{R}^2$ are closed, bounded, or compact.
(a)S $\{(x, y) : 2x^2 + y^2 + 6y \le 8x\}$.
(b)S $\{(x, y) : 3x^2 + 2y \le 6x\}$.
(c) $\{(x, y) : xy = 1\}$.
(d) $\{(x, y) : x^{1/3} + y^{1/3} = 1\}$.
(e) $\{(x, y) : x^{2/3} + y^{2/3} = 1\}$.
(f)S $\left\{\left(\dfrac{x \cos x}{1 + x}, \dfrac{x \sin x}{1 + x}\right) : x \ge 0\right\}$.
(g) $\{(e^{-x} \cos x, e^{-x} \sin x) : x \ge 0\}$.
(h)S $\{(x, y) : x^3/y + y^3/x > 0\}$.

2. Let {xn } be a convergent sequence in X with xn → x0 . Prove that the set {x0 , x1 , x2 , . . .} is compact. 3.S Prove that a finite union of totally bounded (compact) sets is totally bounded (compact).


4.S Prove that the intersection of an arbitrary family of compact subsets of a metric space $X$ is compact.

5. Prove that $X \times Y$ is compact in the product metric $\eta := d \times \rho$ iff $X$ and $Y$ are compact.

6. Prove that the closure of a totally bounded subset of a metric space is totally bounded.

7.S Prove that a subset $E$ of a complete metric space $X$ is totally bounded iff every sequence in $E$ has a cluster point in $X$.

8. Prove that in $\bigl(C([0, 1]), \|\cdot\|_\infty\bigr)$, the closed ball with radius 1 and center the zero function is not compact.

9. Let $C_0([0, +\infty))$ be the vector subspace of $B([0, +\infty))$ consisting of all real-valued continuous functions $f$ on $[0, +\infty)$ such that $\lim_{x \to +\infty} f(x) = 0$. Prove that $C_0([0, +\infty))$ is closed in the uniform norm and that the closed ball $C_1(0)$ in $C_0([0, +\infty))$ with radius 1 and center the zero function is not compact and therefore is not totally bounded.

10. For $n \in \mathbb{N}$, define $f_n \in B([0, +\infty))$ by $f_n = 1$ on $[n, n + 1]$ and zero elsewhere. Prove that the set $E := \{f_1, f_2, \ldots\}$ is bounded but not totally bounded in the sup metric.

11.S (Cantor's intersection theorem). Let $C_1, C_2, \ldots$ be a sequence of nonempty compact subsets of a metric space $X$ such that $C_{n+1} \subseteq C_n$ for all $n$. Prove that $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$.

12. A collection of subsets of a metric space $X$ is said to have the finite intersection property if every finite subcollection has a nonempty intersection. Prove that $X$ is compact iff every collection of closed subsets of $X$ with the finite intersection property has a nonempty intersection.

13.S The diameter of a nonempty subset $A$ of $(X, d)$ is defined by $d(A) := \sup\{d(a, b) : a, b \in A\}$.
(a) Prove that if $A$ is compact, then there exist points $a, b \in A$ such that $d(A) = d(a, b)$.
(b) Give an example of a closed and bounded set $A$ in a metric space such that $d(A) > d(a, b)$ for all $a, b \in A$.

14. ⇓³ The distance between nonempty subsets $A$ and $B$ of $(X, d)$ is defined as $d(A, B) := \inf\{d(a, b) : a \in A,\ b \in B\}$.
(a) Prove that if $A$ and $B$ are disjoint with $A$ closed and $B$ compact, then $d(A, B) > 0$.
(b) Show by example that the conclusion in (a) is false if $B$ is merely closed.
(c) Show that if both sets are compact, then there exist $a \in A$ and $b \in B$ such that $d(A, B) = d(a, b)$.

³ This exercise will be used in 8.7.2.

15.S ⇓⁴ Let $A$ be a nonempty subset of $X$ and define $d(A, \cdot) : X \to \mathbb{R}$ by $d(A, x) = d(A, \{x\})$ (see Exercise 14). Prove the following:
(a) $|d(A, x) - d(A, y)| \le d(x, y)$, hence $d(A, \cdot)$ is uniformly continuous.
(b) $d(A, x) = 0$ iff $x \in \mathrm{cl}(A)$.
(c) If $A$ and $B$ are disjoint closed sets, then the function
$$F_{AB}(x) = \frac{d(x, A)}{d(x, A) + d(x, B)}, \quad x \in X,$$
is well-defined and continuous, $0 \le F_{AB} \le 1$ on $X$, and $A = \{x : F_{AB}(x) = 0\}$, $B = \{x : F_{AB}(x) = 1\}$.
(d) If $A$ and $B$ are disjoint closed sets of $X$, then there exist disjoint open sets $U$ and $V$ such that $A \subseteq U$ and $B \subseteq V$. ($U$ and $V$ are then said to separate $A$ and $B$.)

⁴ This exercise will be used in 11.2.17.

16. Referring to 8.1.10, show that the set $\{f \in \ell^\infty : |f(n)| \le e^{-n}\}$ is compact. Is $\{f \in B([1, +\infty)) : |f(x)| \le e^{-x}\}$ compact?

17. (Lebesgue's number). Let $X$ be compact and let $\mathcal{U} = \{U_i : i \in I\}$ be an open cover of $X$. Prove that there exists a number $r > 0$ such that every set with diameter $< r$ (Exercise 13) is contained in some $U_i$.

18. (Dini's Theorem). Let $X$ be compact and let $f_n, g : X \to \mathbb{R}$ be continuous such that either $f_n \downarrow g$ or $f_n \uparrow g$ on $X$. Prove that the convergence is uniform. (See 7.1.12.)

19.S Let $f : [0, 2\pi) \to \mathbb{R}^2$ be defined by $f(t) = (\cos t, \sin t)$. Show that $f$ is continuous, one-to-one, and maps $[0, 2\pi)$ onto the circle $x^2 + y^2 = 1$ but has a discontinuous inverse.

20. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = x$. Prove or disprove:
(a) If $E \subseteq \mathbb{R}^2$ is closed, then $f(E)$ is closed.
(b) If $E \subseteq \mathbb{R}^2$ is open, then $f(E)$ is open.


21.S Let $A$ and $B$ be compact subsets of $\mathbb{R}$. Prove that the sets $AB := \{ab : a \in A,\ b \in B\}$ and $A + B := \{a + b : a \in A,\ b \in B\}$ are compact.

22. ⇓⁵ Let a sequence of continuous functions $f_n : (X, d) \to (Y, \rho)$ converge uniformly to $f$ on $X$, let $C \subseteq X$ be compact, and let $U \subseteq Y$ be open. Prove that if $f(C) \subseteq U$, then $f_n(C) \subseteq U$ for all sufficiently large $n$.

⁵ This exercise will be used in 13.6.5.

*8.6 The Arzelà–Ascoli Theorem

Throughout this section, $(X, d)$ and $(Y, \rho)$ denote arbitrary metric spaces and $C(X, Y)$ denotes the set of all continuous functions from $X$ to $Y$. As noted in the previous section, closed and bounded subsets in infinite dimensional spaces such as $C([0, 1])$ need not be compact. The additional property of equicontinuity is needed to characterize compact subsets of such spaces.

8.6.1 Definition. A family $\mathcal{F}$ of functions in $C(X, Y)$ is said to be
• equicontinuous at a point $a \in X$ if, for each $\varepsilon > 0$, there exists $\delta > 0$ such that $\rho\bigl(f(x), f(a)\bigr) < \varepsilon$ for all $x \in X$ with $d(x, a) < \delta$ and all $f \in \mathcal{F}$;
• equicontinuous on $E \subseteq X$ if $\mathcal{F}$ is equicontinuous at each point of $E$;
• uniformly equicontinuous on $E$ if, for each $\varepsilon > 0$, there exists $\delta > 0$ such that $\rho\bigl(f(x), f(y)\bigr) < \varepsilon$ for all $f \in \mathcal{F}$ and all $x, y \in E$ with $d(x, y) < \delta$. ♦

The distinguishing feature of equicontinuity is that, while $\delta$ may vary with the point $a$, it is independent of the functions $f \in \mathcal{F}$. With uniform equicontinuity, $\delta$ is independent of both $f$ and $a$.

8.6.2 Example. For each $x, t \in \mathbb{R}$, define $f_t(x) = tx$. Let $I = (c, d)$ be a bounded interval and set $M = \max\{|c|, |d|\}$. The inequality
$$|f_t(x) - f_t(y)| = |t|\,|x - y| \le M|x - y|, \quad t \in I,$$
shows that the collection of functions $\{f_t : t \in I\}$ is uniformly equicontinuous on $\mathbb{R}$. On the other hand, the larger collection $\{f_t : t \in \mathbb{R}\}$ is not equicontinuous at any $a \in \mathbb{R}$. Indeed, no $\delta$ can be chosen so that $|tx - ta| < 1$ for all $t \in \mathbb{R}$ and all $x \in \mathbb{R}$ with $|x - a| < \delta$. ♦
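The contrast in Example 8.6.2 can be checked numerically. The following is a minimal sketch, not part of the text: the helper names `delta_for`, `worst_gap`, and `witness_t` are hypothetical, and the sampling grid only approximates the supremum over all admissible $x$.

```python
def delta_for(eps, t_bound):
    """A delta that serves every f_t(x) = t*x with |t| <= t_bound (t_bound > 0)."""
    return eps / t_bound


def worst_gap(ts, a, delta, samples=200):
    """Largest |f_t(x) - f_t(a)| over the sampled t and sampled x with |x - a| < delta."""
    xs = [a + delta * (2 * i / samples - 1) * 0.999 for i in range(samples + 1)]
    return max(abs(t * x - t * a) for t in ts for x in xs)


def witness_t(delta):
    """A parameter t for which |t*x - t*a| = 1 at x = a + delta/2 (no bound on t)."""
    return 2.0 / delta


if __name__ == "__main__":
    M, eps = 5.0, 0.1                      # I = (-3, 5), so M = max(|-3|, |5|) = 5
    delta = delta_for(eps, M)
    ts = [-3 + 8 * i / 100 for i in range(1, 100)]
    print("bounded family, worst gap:", worst_gap(ts, 2.0, delta))
    t = witness_t(delta)
    print("unbounded family, gap at t =", t, ":", abs(t * (2.0 + delta / 2) - t * 2.0))
```

For the bounded parameter set a single $\delta = \varepsilon/M$ works at every point, which is the uniform equicontinuity claimed in the example; for the full family, any proposed $\delta$ is defeated by taking $t$ of size $2/\delta$.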

A straightforward modification of the proof of 8.5.13 yields

8.6.3 Theorem. If $X$ is compact and $\mathcal{F}$ is equicontinuous on $X$, then $\mathcal{F}$ is uniformly equicontinuous.

8.6.4 Definition. A metric space is said to have the Bolzano–Weierstrass property if every bounded sequence has a cluster point. ♦

A compact metric space and the space $\mathbb{R}^n$ have the Bolzano–Weierstrass property, while infinite discrete metric spaces, the space $\mathbb{Q}$, and the infinite dimensional space $\bigl(C([0, 1]), \|\cdot\|_\infty\bigr)$ do not.

8.6.5 Proposition. (a) A metric space with the Bolzano–Weierstrass property is complete. (b) A metric space has the Bolzano–Weierstrass property iff every closed and bounded set is compact.

Proof. For (a), use the fact that a Cauchy sequence is bounded and apply Exercise 8.1.9. Part (b) follows from 8.5.8.

The following lemma may be proved using familiar ideas such as those found in 8.1.10. The details are left to the reader.

8.6.6 Lemma. Let $(X, d)$ be compact and $(Y, \rho)$ complete. For $f, g \in C(X, Y)$ define
$$\sigma(f, g) = \sup_{x \in X} \rho\bigl(f(x), g(x)\bigr).$$
Then $\sigma$ is a metric on $C(X, Y)$, and $C(X, Y)$ is complete in this metric.

8.6.7 Lemma. A compact metric space $X$ has a countable dense subset of the form $D = \bigcup_{k=1}^{\infty} F_k$, where $F_k$ is a finite $(1/k)$-net for $X$.

Proof. For each $k \in \mathbb{N}$, the collection $\{B_{1/k}(x) : x \in X\}$ is an open cover of $X$, hence has a finite subcover $\{B_{1/k}(x) : x \in F_k\}$. By definition of $\varepsilon$-net, $\bigcup_{k=1}^{\infty} F_k$ is dense in $X$.

8.6.8 Arzelà–Ascoli Theorem. Let $X$ be compact and let $Y$ have the Bolzano–Weierstrass property. Then a set $\mathcal{F}$ is compact in $\bigl(C(X, Y), \sigma\bigr)$ iff it is closed, bounded, and equicontinuous.

Proof. Suppose $\mathcal{F}$ is compact in $C(X, Y)$, hence closed and bounded. If $\mathcal{F}$ is not equicontinuous at some $a \in X$, then there exist $\varepsilon > 0$ and, for every $n$, members $x_n$ of $X$ and $f_n$ of $\mathcal{F}$ such that
$$d(x_n, a) < 1/n \quad \text{and} \quad \rho\bigl(f_n(x_n), f_n(a)\bigr) \ge \varepsilon. \tag{8.6}$$
By compactness of $\mathcal{F}$, we may assume that $\{f_n\}$ converges uniformly to some $f \in \mathcal{F}$ (otherwise, take a subsequence). Since $x_n \to a$, the uniform convergence of $\{f_n\}$ implies that $f_n(x_n) \to f(a)$. Since also $f_n(a) \to f(a)$, it follows that $\rho\bigl(f_n(x_n), f_n(a)\bigr) \to 0$. But this contradicts (8.6). Therefore, $\mathcal{F}$ is equicontinuous.


Conversely, assume that $\mathcal{F}$ is closed, bounded, and equicontinuous and let $\{f_n\}$ be any sequence in $\mathcal{F}$. We show that $\{f_n\}$ has a convergent subsequence. The compactness of $\mathcal{F}$ will then follow from 8.5.8.

Let $F_k$ and $D = \{x_1, x_2, \ldots\}$ be as in 8.6.7. We show first that $\{f_n\}$ has a subsequence that converges pointwise on $D$. For this we use the Bolzano–Weierstrass property of $Y$ and the following diagonalization argument: Because $\{f_n\}$ is bounded, we may choose a subsequence $\{f_n^{(1)}\}$ of $\{f_n^{(0)} := f_n\}$ such that the sequence $\{f_n^{(1)}(x_1)\}$ converges to some $y_1 \in Y$. We may then choose a subsequence $\{f_n^{(2)}\}$ of $\{f_n^{(1)}\}$ such that $\{f_n^{(2)}(x_2)\}$ converges to some $y_2 \in Y$. Continuing in this way, we obtain for each $k$ a sequence $\{f_n^{(k)}\}$ such that $\{f_n^{(k+1)}\}$ is a subsequence of $\{f_n^{(k)}\}$ and $\lim_n f_n^{(k)}(x_k) = y_k$. Now take the diagonal sequence $\{g_n := f_n^{(n)}\}$, which is a subsequence of $\{f_n\}$ and for each $k$, except for the first $k - 1$ terms, is a subsequence of $\{f_n^{(k)}\}$. It follows that $\lim_n g_n(x_k) = y_k$ for each $k$. The scheme may be depicted as follows:
$$\begin{array}{ll}
f_1^{(1)},\ f_2^{(1)},\ \ldots,\ f_n^{(1)},\ \ldots & \to y_1 \text{ at } x_1 \\
f_1^{(2)},\ f_2^{(2)},\ \ldots,\ f_n^{(2)},\ \ldots & \to y_2 \text{ at } x_2 \\
\quad\vdots & \\
f_1^{(n)},\ f_2^{(n)},\ \ldots,\ f_n^{(n)},\ \ldots & \to y_n \text{ at } x_n \\
\quad\vdots & \\
\text{diagonal } f_1^{(1)},\ f_2^{(2)},\ \ldots,\ f_n^{(n)},\ \ldots & \searrow\ y_k \text{ at each } x_k
\end{array}$$

Having obtained a subsequence $\{g_n\}$ of $\{f_n\}$ that converges pointwise on the dense set $D$, we now show that $\{g_n\}$ converges uniformly on $X$, which will complete the proof. By the uniform equicontinuity of $\{g_n\}$, given $\varepsilon > 0$, we may choose $\delta > 0$ such that
$$\rho\bigl(g_n(x), g_n(y)\bigr) < \varepsilon/3 \quad \text{for all } n \in \mathbb{N} \text{ and } x, y \in X \text{ with } d(x, y) < \delta. \tag{8.7}$$
Let $k > 1/\delta$. Since $\{g_n\}$ converges pointwise on $F_k$ and $F_k$ is finite, we may choose $N_k$ so that
$$\rho\bigl(g_n(y), g_m(y)\bigr) < \varepsilon/3 \quad \text{for all } n, m \ge N_k \text{ and all } y \in F_k. \tag{8.8}$$
Since $F_k$ is a $\delta$-net, given $x \in X$, there exists $y \in F_k$ such that $d(x, y) < \delta$. It follows from (8.7) and (8.8) that for $m, n \ge N_k$,
$$\rho\bigl(g_n(x), g_m(x)\bigr) \le \rho\bigl(g_n(x), g_n(y)\bigr) + \rho\bigl(g_n(y), g_m(y)\bigr) + \rho\bigl(g_m(y), g_m(x)\bigr) < \varepsilon/3 + \varepsilon/3 + \varepsilon/3 = \varepsilon.$$
Since $x$ was arbitrary, $\{g_n\}$ is a Cauchy sequence in $C(X, Y)$. Since $C(X, Y)$ is complete, $\{g_n\}$ converges in $C(X, Y)$.


Remark. The proof of the sufficiency of the theorem did not require that $\mathcal{F}$ be uniformly bounded. All that was used was the property of pointwise boundedness, that is, $\{f(x) : f \in \mathcal{F}\}$ bounded in $Y$ for each $x \in X$. Uniform boundedness is then a consequence of equicontinuity. ♦

8.6.9 Example. Let $X$ be compact. Then any convergent sequence of functions $f_n$ in $C(X, \mathbb{R})$, say $f_n \to f$, is equicontinuous. This may be verified directly, but a quick proof uses 8.6.8 applied to the set $\{f, f_1, f_2, \ldots\}$, whose compactness is readily established. ♦

Exercises

1. Let $X \times Y$ have the product metric $\eta := d \times \rho$ and let $f : X \to Y$. The graph of $f$ is the set $G(f) = \{(x, y) : x \in X \text{ and } y = f(x)\}$. Prove that if $f$ is continuous, then $G(f)$ is closed in $X \times Y$. Conversely, prove that if $G(f)$ is closed, $f(X)$ is bounded, and $Y$ has the Bolzano–Weierstrass property, then $f$ is continuous. Give an example of a real-valued discontinuous function on $[0, 1]$ with a closed graph.

2. Let $X$ have the Bolzano–Weierstrass property and let $\{x_n\}$ be a bounded sequence in $X$ with only finitely many cluster points $y_1, \ldots, y_k$. Prove that the set $C := \{y_1, \ldots, y_k, x_1, x_2, \ldots\}$ is compact.

3.S Prove that a subset $\mathcal{F}$ of $C(X, Y)$ is equicontinuous at $a \in X$ iff for any sequences $\{f_n\}$ in $\mathcal{F}$ and $\{x_n\}$ in $X$ with $x_n \to a$, $\rho\bigl(f_n(x_n), f_n(a)\bigr) \to 0$.

4. Prove that a subset $\mathcal{F}$ of $C(X, Y)$ is uniformly equicontinuous on $E \subseteq X$ iff for any sequences $\{f_n\}$ in $\mathcal{F}$ and $\{x_n\}$, $\{a_n\}$ in $E$ with $d(x_n, a_n) \to 0$, $\rho\bigl(f_n(x_n), f_n(a_n)\bigr) \to 0$.

5. Prove that a finite set of uniformly continuous functions $f : X \to Y$ is uniformly equicontinuous.

6. Prove that the uniform closure of a set $\mathcal{F} \subseteq C(X, Y)$ of uniformly equicontinuous functions is uniformly equicontinuous.

7.S Let $c, p > 0$ and define $f_n(x) = (nx)^{-p}$, $x \ge c$. Show that the sequence $\{f_n\}$ is uniformly equicontinuous.

8. Define $f_n(x) = \ln(n + x)$. Show that the sequence $\{f_n\}$ is uniformly equicontinuous on $(0, +\infty)$.

9.S Define $f_n(x) = \sin(nx)$. Use Exercise 3 and Exercise 8.3.13 to show that the sequence $\{f_n\}$ is not equicontinuous at any nonzero rational number $r$.


10. Let $M > 0$ and define $R_M := \{f : f \text{ is locally integrable on } [0, +\infty) \text{ and } \|f\|_\infty \le M\}$. For $f \in R_M$ define
$$F_f(x) = \int_0^x f, \quad x \ge 0.$$
Prove that the set $\mathcal{F} := \{F_f : f \in R_M\}$ is uniformly equicontinuous on $[0, +\infty)$.

11.S Let $M > 0$ and define $D_M := \{f : (a, b) \to \mathbb{R} : |f'(x)| \le M \text{ for all } a < x < b\}$. Show that $D_M$ is uniformly equicontinuous. Conclude that if $g$ has a bounded derivative on $\mathbb{R}$, then the set of functions $\{g_t : t \in \mathbb{R}\}$ is uniformly equicontinuous on $I$, where $g_t(x) = g(t + x)$.

12. Let $f : X \times Y \to \mathbb{R}$ have the property that $f(x, y)$ is continuous in $y$ for each fixed $x$ and continuous in $x$ for each fixed $y$. Define $\mathcal{F} := \{f(\,\cdot\,, y) : y \in Y\}$. Prove:
(a) If $\mathcal{F}$ is equicontinuous, then $f$ is continuous.
(b) If $f$ is continuous and $Y$ is compact, then $\mathcal{F}$ is equicontinuous.

13. Let $X$ be compact. Show that a totally bounded subset of $C(X, Y)$ is uniformly equicontinuous.

14.S Let $\{f_i : i \in I\}$ be a uniformly bounded subset of $\mathcal{R}_a^b$. Define
$$F_i(x) := \int_a^x f_i(t)\,dt, \quad a \le x \le b.$$
Show that $\{F_i : i \in I\}$ is a totally bounded subset of $C([a, b])$.

15. Let $f(t, x, y)$ be continuous on $[a, b]^3$ and define $f_t(x, y) = f(t, x, y)$. Prove that the family $\{f_t : t \in [a, b]\}$ is uniformly equicontinuous on $[a, b]^2$. Apply this to the function
$$f(t, x, y) = \frac{1 + t \sin x}{2 + t \sin y} \quad \text{on } [0, 1]^3.$$

8.7 Connected Sets

Throughout this section, $(X, d)$ and $(Y, \rho)$ denote arbitrary metric spaces.

8.7.1 Definition. A pair $(U, V)$ of open sets in $X$ is said to separate $X$ if $X = U \cup V$, $U \ne \emptyset$, $V \ne \emptyset$, and $U \cap V = \emptyset$. The pair $(U, V)$ is then called a separation of $X$. The space $X$ is said to be connected if it has no separation, and disconnected otherwise. A subset $E$ of $X$ is connected if it is connected as a subspace of $X$. ♦

It follows from the definition that if $E$ is disconnected, then there exist sets $U$, $V$ open in $X$ such that $(E \cap U, E \cap V)$ is a separation of $E$. The sets $U$ and $V$ need not be disjoint in this definition; however the next theorem shows that this useful state of affairs may always be achieved. In this case we shall call $(U, V)$ a separation of $E$.

8.7.2 Theorem. A subset $E$ of $X$ is disconnected iff there exists a separation $(E \cap U, E \cap V)$ of $E$ such that $U \cap V = \emptyset$.

FIGURE 8.7: A separation $(U, V)$ of $E$.

Proof. The sufficiency is clear. For the necessity, assume that $E$ is disconnected and that $(E \cap U_1, E \cap V_1)$ is a separation of $E$. Here, $U_1$ and $V_1$ are open in $X$ but may not be disjoint. However, since $E \cap U_1$ and $E \cap V_1$ are disjoint, $\mathrm{cl}_E(E \cap U_1) \cap V_1 = \emptyset$. Indeed, if, to the contrary, $x \in \mathrm{cl}_E(E \cap U_1) \cap V_1$ for some $x$, then there would be a sequence $\{x_n\}$ in $E \cap U_1$ converging to $x$, which would imply that eventually $x_n \in E \cap V_1$, impossible. Recalling that $\mathrm{cl}_E(E \cap U_1) = E \cap \mathrm{cl}_X(E \cap U_1)$, we now see that
$$v \notin \mathrm{cl}_X(E \cap U_1) \text{ for each } v \in E \cap V_1. \quad \text{Similarly,} \quad u \notin \mathrm{cl}_X(E \cap V_1) \text{ for each } u \in E \cap U_1.$$
By Exercise 8.5.14 it follows that for $u \in E \cap U_1$ and $v \in E \cap V_1$ the distances
$$r(u) := \inf\{d(u, x) : x \in \mathrm{cl}_X(E \cap V_1)\} \quad \text{and} \quad s(v) := \inf\{d(v, x) : x \in \mathrm{cl}_X(E \cap U_1)\}$$
are positive. Define
$$U = \bigcup_{u \in E \cap U_1} B_{r(u)/2}(u) \quad \text{and} \quad V = \bigcup_{v \in E \cap V_1} B_{s(v)/2}(v).$$
Clearly, $U$ and $V$ are open in $X$ and contain $E \cap U_1$ and $E \cap V_1$, respectively. To prove that $(U, V)$ is a separation of $E$, it remains to show that $U \cap V = \emptyset$. Suppose to the contrary that there exists a point $x \in U \cap V$. Then, by the above, $d(x, u) < r(u)/2$ for some $u \in E \cap U_1$ and $d(x, v) < s(v)/2$ for some $v \in E \cap V_1$. Adding and using the triangle inequality we have $d(u, v) < r(u)/2 + s(v)/2$. On the other hand, by definition of $r(u)$ and $s(v)$, $d(u, v) \ge r(u)$ and $d(u, v) \ge s(v)$, hence
$$d(u, v) \ge \bigl(r(u) + s(v)\bigr)/2.$$

This contradiction shows that $U \cap V = \emptyset$ and completes the proof of the theorem.

In any metric space, the empty set and the singletons $\{x\}$ are trivially connected, but no other finite subsets are connected. In a discrete space the only connected sets are the empty set and the singletons. The set $\mathbb{Q}$ is not connected in $\mathbb{R}$, since the open sets $(-\infty, \sqrt{2})$ and $(\sqrt{2}, +\infty)$ separate $\mathbb{Q}$.

8.7.3 Theorem. $X$ is not connected iff there exists a continuous function from $X$ onto $\{0, 1\}$. Equivalently, $X$ is connected iff every continuous function from $X$ into $\{0, 1\}$ is constant.

Proof. Assume that $X$ is not connected and let $(U, V)$ separate $X$. Define
$$g(x) = \begin{cases} 0 & \text{if } x \in U, \\ 1 & \text{if } x \in V. \end{cases}$$
Then $g$ maps $X$ onto $\{0, 1\}$. Let $W$ be any open set in $\mathbb{R}$. Then $g^{-1}(W)$ is one of the sets $\emptyset$, $U$, $V$, or $X$, each of which is open in $X$. Therefore, $g$ is continuous.

Conversely, if a continuous function $g$ from $X$ onto $\{0, 1\}$ exists, then the open sets $g^{-1}\bigl((-1, 1/2)\bigr)$ and $g^{-1}\bigl((1/2, 2)\bigr)$ separate $X$.

8.7.4 Corollary. The nonempty connected subsets of $\mathbb{R}$ are the intervals.


Proof. By the intermediate value theorem, there can be no continuous function from an interval onto $\{0, 1\}$. Hence intervals must be connected. Now let $E$ be a nonempty subset of $\mathbb{R}$ that is not an interval. Choose real numbers $a < c < b$ with $a, b \in E$ but $c \notin E$. Then $(-\infty, c)$ and $(c, +\infty)$ separate $E$, hence $E$ is not connected.

The following is a generalization of the intermediate value theorem.

8.7.5 Corollary. If $f : X \to Y$ is continuous and $X$ is connected, then $f(X)$ is connected.

Proof. Let $g : f(X) \to \{0, 1\}$ be continuous. Then $g \circ f : X \to \{0, 1\}$ is continuous and hence must be constant. It follows that $g$ itself must be constant.

8.7.6 Corollary. If $A \subseteq X$ is connected and $A \subseteq B \subseteq \mathrm{cl}(A)$, then $B$ is connected. In particular, the closure of a connected set is connected.

Proof. Let $g : B \to \{0, 1\}$ be continuous. Then $g|_A$ is continuous, hence must be constant. Since $B \subseteq \mathrm{cl}(A)$, $g$ itself must be constant. Therefore, $B$ is connected.

The converse of 8.7.6 is false. For example, $\mathrm{cl}(\mathbb{Q}) = \mathbb{R}$ is connected but $\mathbb{Q}$ is not.

8.7.7 Definition. A path in $X$ from $x$ to $y$ is a continuous function $\varphi$ from an interval $[a, b]$ to $X$ such that $\varphi(a) = x$, the initial point of the path, and $\varphi(b) = y$, the terminal point. $X$ is said to be path connected if for each pair of points $x, y \in X$ there exists a path in $X$ from $x$ to $y$. A subset $E$ of $X$ is path connected if it is path connected as a subspace of $X$. ♦

Note that if $\varphi : [a, b] \to X$ is a path from $x$ to $y$, then $-\varphi(t) := \varphi(-t)$, $-b \le t \le -a$, defines a path from $y$ to $x$. Also, if $\vartheta : [c, d] \to X$ is a path from $y$ to $z$, then the sum or concatenation $\varphi + \vartheta : [0, 2] \to X$ of the paths $\varphi$ and $\vartheta$ is a path from $x$ to $z$, where
$$(\varphi + \vartheta)(t) = \begin{cases} \varphi\bigl(a + (b - a)t\bigr) & \text{if } 0 \le t \le 1, \\ \vartheta\bigl(c + (d - c)(t - 1)\bigr) & \text{if } 1 \le t \le 2. \end{cases}$$
A convex subset $C$ of a normed vector space $X$ is path connected. Indeed, if $x, y \in C$, then the line segment
$$\varphi(t) := (1 - t)x + ty, \quad 0 \le t \le 1,$$
joins $x$ to $y$ and lies in $C$. In particular, open and closed balls in $X$ are path connected.
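The reversal and concatenation of paths described above translate directly into code. This is an illustrative sketch only: paths are modeled as Python callables, and the names `concatenate` and `reverse` are hypothetical, not notation from the text.

```python
def concatenate(phi, a, b, theta, c, d):
    """Return phi + theta on [0, 2], built from phi on [a, b] and theta on [c, d].

    Assumes phi(b) == theta(c), so the two pieces agree at t = 1.
    """
    def path(t):
        if not 0 <= t <= 2:
            raise ValueError("t must lie in [0, 2]")
        if t <= 1:
            return phi(a + (b - a) * t)
        return theta(c + (d - c) * (t - 1))
    return path


def reverse(phi):
    """Return -phi, defined by (-phi)(t) = phi(-t) for -b <= t <= -a."""
    return lambda t: phi(-t)


if __name__ == "__main__":
    phi = lambda t: (t, 0)          # path in the plane from (0, 0) to (1, 0) on [0, 1]
    theta = lambda s: (1, s - 2)    # path from (1, 0) to (1, 3) on [2, 5]
    path = concatenate(phi, 0, 1, theta, 2, 5)
    print(path(0), path(1), path(2))
    print(reverse(phi)(-1), reverse(phi)(0))
```

Both branches of `concatenate` give the same point at $t = 1$ precisely because the terminal point of the first path is the initial point of the second, which is what makes the sum continuous.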


8.7.8 Theorem. If $X$ is path connected, then it is connected.

Proof. Let $g : X \to \{0, 1\}$ be a continuous function, let $x, y \in X$, and let $\varphi : [a, b] \to X$ be a path from $x$ to $y$. Then $g \circ \varphi : [a, b] \to \{0, 1\}$ is continuous and, because $[a, b]$ is connected, must be constant. In particular, $g(x) = (g \circ \varphi)(a) = (g \circ \varphi)(b) = g(y)$. Since $x$ and $y$ were arbitrary, $g$ is constant.

8.7.9 Example. The subset $B_1(-1, 0) \cup B_1(1, 0)$ of $\mathbb{R}^2$ is not connected, hence not path connected. However, its closure $C_1(-1, 0) \cup C_1(1, 0)$ is path connected, as can be seen from the figure, hence is connected. ♦

FIGURE 8.8: $C_1(-1, 0) \cup C_1(1, 0)$ is path connected.

8.7.10 Example. A sphere in $\mathbb{R}^n$, $n > 1$, is path connected, hence connected. For example, consider the sphere $S = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$. We show that there is a path from the point $a = (1, 0, \ldots, 0)$ to any point $b = (b_1, b_2, \ldots, b_n)$ of $S$. It will then follow that any pair of points in $S$ may be joined by a path in $S$ through $a$. If $b = (-1, 0, \ldots, 0)$, then $(\cos t, \sin t, 0, \ldots, 0)$, $0 \le t \le \pi$, is such a path. Suppose $b \ne (-1, 0, \ldots, 0)$. Then the line segment
$$\varphi(t) = (1 - t)a + tb = (1 - t + tb_1, tb_2, \ldots, tb_n), \quad 0 \le t \le 1,$$
is never zero, hence $\|\varphi(t)\|_2^{-1}\varphi(t)$ is a path from $a$ to $b$ in $S$. ♦
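The normalized line segment of Example 8.7.10 can be evaluated directly. The sketch below is illustrative: the function name `sphere_path` is hypothetical, points are plain Python sequences, and the case $b = (-1, 0, \ldots, 0)$ is excluded as in the example (there the segment passes through the origin and the normalization fails).

```python
import math


def sphere_path(b):
    """Path t -> phi(t)/||phi(t)|| from a = (1, 0, ..., 0) to b on the unit sphere.

    Assumes ||b|| = 1 and b != (-1, 0, ..., 0), so phi(t) = (1 - t)a + tb is never zero.
    """
    a = [1.0] + [0.0] * (len(b) - 1)

    def path(t):
        phi = [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]
        norm = math.sqrt(sum(c * c for c in phi))
        return [c / norm for c in phi]

    return path


if __name__ == "__main__":
    p = sphere_path((0.0, 0.6, 0.8))
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        point = p(t)
        print(t, point, math.sqrt(sum(c * c for c in point)))
```

Every printed point has Euclidean norm 1 up to rounding, and the endpoints are $a$ and $b$.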

The converse of 8.7.8 is false, as the following example—the topologist’s sine curve (8.3.7)—demonstrates.


8.7.11 Example. Let $A = \{(x, \sin(1/x)) : 0 < x < 2/\pi\}$, $B = \{0\} \times [-1, 1]$, and $E = A \cup B$. Since $A$ is connected and $E = \mathrm{cl}(A)$, 8.7.6 shows that $E$ is connected. However, $E$ is not path connected. Indeed, no point in $A$ can be joined to a point in $B$ by a path in $E$. Suppose such a path existed, say $\varphi : [a, b] \to E$, where $\varphi(t) = \bigl(x(t), y(t)\bigr)$, $\varphi(a) \in A$, and $\varphi(b) \in B$. Let
$$S := \bigl\{t \in [a, b] : \varphi\bigl([a, t]\bigr) \subseteq A\bigr\}.$$
Since $S$ is nonempty and bounded, $c := \sup S$ exists and $c \in [a, b]$. Note that $x(t) > 0$ on $S$. If $x(c) > 0$, then $c < b$, hence, by continuity, $x(s)$ is positive on $[a, c + \delta]$ for some $\delta > 0$, contradicting the definition of $c$. Therefore, $x(c) = 0$ and $x(t) > 0$ on $[a, c)$. This implies that $\varphi(t) = \bigl(x(t), \sin(1/x(t))\bigr)$ on $[a, c)$ and $\lim_{t \to c^-} x(t) = 0$. By continuity, for each $\delta > 0$ the set $x([c - \delta, c])$ is an interval of the form $[0, d]$, $d > 0$. Therefore, $y(t) = \sin(1/x(t))$ takes on all values in $[-1, 1]$ on each interval $[c - \delta, c)$, which implies that $\lim_{t \to c^-} y(t)$ cannot exist. But this contradicts the continuity of $\varphi$ at $c$. ♦

While there is no strict converse to 8.7.8, the next theorem provides a partial converse.

8.7.12 Theorem. An open connected subset $E$ of a normed vector space $X$ is path connected.

Proof. Fix a point $x \in E$ and let $U$ denote the set of all points $u \in E$ for which there exists a path in $E$ from $x$ to $u$. We claim that $U$ is open. Let $u_0 \in U$ and choose $r > 0$ such that $B_r(u_0) \subseteq E$. By definition of $U$, there exists a path in $E$ from $x$ to $u_0$. Since $B_r(u_0)$ is convex, there exists a line segment in $B_r(u_0)$ from $u_0$ to any point $u \in B_r(u_0)$. The sum of these paths is then a path in $E$ from $x$ to $u$. Therefore, $B_r(u_0) \subseteq U$, which shows that $U$ is open. A similar argument shows that $V := E \setminus U$ is open. Since $E$ is connected and $x \in U$, $V = \emptyset$. Therefore, $E = U$.

FIGURE 8.9: $E$ is path connected.


Exercises

1. Determine which sets are connected in $\mathbb{R}^2$:
(a) $B_1(-1, 0) \cup \{(0, 0)\} \cup B_1(1, 0)$.
(b) $\mathbb{R}^2 \setminus \{(1/m, 1/n) : m, n \in \mathbb{N}\}$.
(c)S $\mathbb{Q}^2$.
(d)S $\mathbb{R}^2 \setminus \mathbb{Q}^2$.
(e)S $\{(x, \sin(1/x)) : x \ne 0\} \cup \{(0, a)\}$.
(f) $\mathbb{R}^2 \setminus G$, where $G$ is the graph of a bounded function $f : [a, b] \to \mathbb{R}$.
(g) $\mathbb{R}^2 \setminus G$, where $G$ is the graph of an equation $F(x, y) = 0$.
(h) $\{(x, y, z) : x^2 + y^2 - z^2 = 1\}$.
(i) $\{(x, y, z) : x^2 + y^2 - z^2 = -1\}$.
(j) $\{(x, y, z) : x^2 + y^2 - z^2 = 0,\ 0 < x^2 + y^2 \le 1\}$.

2. Prove that a metric space $X$ is connected iff it has no proper nonempty subset that is both open and closed.

3. Prove that $X$ is connected iff it cannot be expressed as the union of nonempty sets $A$ and $B$ such that $A \cap \mathrm{cl}_X(B) = \mathrm{cl}_X(A) \cap B = \emptyset$. Hint. Use 8.7.3 and the sequential characterization of continuity.

4. Prove that $X \times Y$ is connected in the product metric $d \times \rho$ iff $X$ and $Y$ are connected.

5.S Let $X$ be connected and $f : X \to \mathbb{R}$ continuous. Suppose there exist $u, v \in X$ such that $f(u)f(v) < 0$. Show that the equation $f(x) = 0$ has a solution.

6. Let $X$ be connected and $f : X \to Y$ continuous. Suppose $f$ has the property that for each $x \in X$ there exists $\varepsilon > 0$, possibly depending on $x$, such that $f$ is constant on $B_\varepsilon(x)$. Prove that $f$ is constant on $X$.

7.S Let $X$ be connected and let $g, h : X \to \mathbb{R}$ be continuous such that $g(x) \ne h(x)$ for all $x \in X$. Prove that $g > h$ or $h > g$ on $X$.

8. ⇓⁶ Let $X$ be a normed vector space and $u, v \in X$. A polygonal path $P$ from $u$ to $v$ is a finite sequence of line segments $L_k = [x_k : x_{k+1}]$, $k = 1, \ldots, n - 1$, where $x_1 = u$ and $x_n = v$. The path $P$ is nonoverlapping if $L_j \cap L_k = \emptyset$ unless $j = k - 1$, in which case $L_j \cap L_k = \{x_k\}$. A subset $E$ of a normed vector space $X$ is polygonally connected if for each pair of points $u$ and $v$ in $E$ there exists a polygonal path from $u$ to $v$ contained in $E$. For example, a convex set is polygonally connected. Prove that every open connected subset $E$ of $X$ is polygonally connected. Show also that it is always possible to choose $P$ to be nonoverlapping.

⁶ This exercise will be used in 12.2.10.

9.S Show that for $n > 1$ the complement of an open ball or a closed ball in $\mathbb{R}^n$ is path connected, hence connected.

10. Suppose $A \subseteq X$ is connected. By 8.7.6, $\mathrm{cl}(A)$ is connected. Prove or disprove: (a) $\mathrm{int}(A)$ is connected, (b) $\mathrm{bd}(A)$ is connected.

11. The exterior $\mathrm{ext}(E)$ of a subset $E$ of a metric space $X$ is defined as the interior of $E^c$. Show that $X = \mathrm{int}(E) \cup \mathrm{bd}(E) \cup \mathrm{ext}(E)$. Conclude that $X$ is connected iff every subset of $X$ with nonempty interior and nonempty exterior also has a nonempty boundary.

12.S Let $\{A_n\}$ be a finite or infinite sequence of connected subsets of $X$ such that $A_n \cap A_{n+1} \ne \emptyset$ for each $n$. Prove that $\bigcup_n A_n$ is connected.

13. Let $\{A_i : i \in I\}$ be a collection of nonempty connected sets and $i_0 \in I$ such that $A_i \cap A_{i_0} \ne \emptyset$ for all $i$. Prove that $\bigcup_i A_i$ is connected.

14. Let $\{A_n\}$ be an infinite sequence of compact connected subsets of $X$ such that $A_{n+1} \subseteq A_n$. Prove that $\bigcap_n A_n$ is connected.

15. Let $X = A_1 \cup \cdots \cup A_p$ and $Y = B_1 \cup \cdots \cup B_q$, $p < q$, where $A_j$ and $B_j$ are connected, the $A_j$'s are pairwise disjoint, and the $B_j$'s are pairwise disjoint and closed. Show that no continuous function $f : X \to Y$ can map $X$ onto $Y$.

16.S Prove that no one-to-one continuous function can map a closed line segment $L$ onto a circle $C$. Show, however, that there are continuous functions that can do this.

17. Suppose closed line segments $L_1$, $L_2$, $L_3$ in the plane meet at a single endpoint $P$. Show that no one-to-one continuous function can map a closed line segment $L$ onto $L_1 \cup L_2 \cup L_3$. Show, however, that there are continuous functions that can do this.

18. Let $C_1$ and $C_2$ be tangent circles in the plane. Show that no one-to-one continuous function can map $C_1 \cup C_2$ onto a circle $C$. Show, however, that there are continuous functions that can do this.

19. Show that no one-to-one continuous function can map the set $E := \{(x, y, z) : x^2 + y^2 = z^2,\ x^2 + y^2 \le 1\}$ onto a closed disk $D$. Show, however, that there are continuous functions that can do this.


20.S Let $X$ be a normed vector space and $f : X \to \mathbb{R}$ continuous. Let $A := \{x \in X : f(x) \ge c\}$ and $B := \{x \in X : f(x) = c\}$. Prove that $\mathrm{bd}(A) \subseteq B$ and that the inclusion may be strict.

21. Let $X$ be connected and have at least two points. Show that $X$ is uncountable. Hint. For all sufficiently small $r > 0$, $X \ne B_r(x) \cup C_r^c(x)$.

22.S Let $U$ be an open subset of a normed vector space $X$ and let $x \in U$. The component of $U$ containing $x$ is the union $C_x$ of all connected subsets of $U$ containing $x$.
(a) Prove that $C_x$ is open and connected and that $U$ is a union of pairwise disjoint components.
(b) Show that the number of components is countable if $X$ is a Euclidean space $\mathbb{R}^n$.

23. Let $(X, d)$ be complete, $(Y, \rho)$ connected, $c > 0$, and let $f : X \to Y$ be a continuous mapping such that $f(X)$ is open and $\rho\bigl(f(u), f(v)\bigr) \ge c\,d(u, v)$ for all $u, v \in X$. Prove that $Y$ is complete.

8.8 The Stone–Weierstrass Theorem

Let $(X, d)$ be a compact metric space and let $C(X)$ denote the space of all continuous real-valued functions on $X$ with the supremum norm $\|f\|_\infty = \sup_{x \in X} |f(x)|$. A member $f$ of $C(X)$ is said to be uniformly approximated by members of a subset $S$ of $C(X)$ if $f \in \mathrm{cl}(S)$. This is equivalent to the existence of a sequence $\{f_n\}$ in $S$ converging uniformly to $f$ on $X$. Weierstrass's approximation theorem asserts that any function in $C([a, b])$ may be uniformly approximated by polynomials. Stone's generalization of Weierstrass's theorem replaces $[a, b]$ by a compact metric space⁷ and the set of polynomials by a more general class of functions.

The proof of Weierstrass's theorem given below is due to Lebesgue. The basic idea is to show that every continuous function may be uniformly approximated by piecewise linear functions and that these in turn may be uniformly approximated by polynomials.

⁷ More generally, by a compact Hausdorff topological space.

8.8.1 Definition. Let $a = x_0 < x_1 < \cdots < x_k = b$. A function $g$ on $[a, b]$ is said to be piecewise linear with vertices $(x_j, y_j)$ if, for $j = 0, 1, \ldots, k - 1$,
$$g(x) = y_j + m_j(x - x_j), \quad m_j = \frac{y_{j+1} - y_j}{x_{j+1} - x_j}, \quad x_j \le x \le x_{j+1}.$$
♦

Note that a piecewise linear function is necessarily continuous and that its graph consists of a sequence of line segments joined at the vertices. (See Figure 8.10.)

FIGURE 8.10: A piecewise linear function.

8.8.2 Lemma. Every continuous function $f$ on $[a, b]$ may be uniformly approximated by a piecewise linear function.

Proof. Given $\varepsilon > 0$, choose $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon/2$ whenever $|x - y| \le \delta$. Let $x_0 = a < x_1 < \cdots < x_k = b$ be a partition of $[a, b]$ with mesh $< \delta$ and let $g$ be as in 8.8.1 with $y_j = f(x_j)$. If $x_j \le x \le x_{j+1}$, then
$$|m_j|(x - x_j) = |f(x_{j+1}) - f(x_j)|\,\frac{x - x_j}{x_{j+1} - x_j} \le |f(x_{j+1}) - f(x_j)| < \varepsilon/2,$$
hence
$$|f(x) - g(x)| \le |f(x) - f(x_j)| + |m_j|(x - x_j) < \varepsilon.$$
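The construction in Lemma 8.8.2 can be tried on a concrete function. The following sketch is an illustration under stated assumptions: the test function $f(x) = x^2$ on $[0, 1]$, the choice $\delta = \varepsilon/4$ (valid because $|x^2 - y^2| \le 2|x - y|$ there), and the helper names `piecewise_linear` and `sup_error` are not from the text, and the grid maximum only approximates the supremum norm.

```python
def piecewise_linear(xs, ys):
    """Return g as in 8.8.1: g(x) = y_j + m_j (x - x_j) on [x_j, x_{j+1}]."""
    def g(x):
        for j in range(len(xs) - 1):
            if xs[j] <= x <= xs[j + 1]:
                m_j = (ys[j + 1] - ys[j]) / (xs[j + 1] - xs[j])
                return ys[j] + m_j * (x - xs[j])
        raise ValueError("x lies outside [a, b]")
    return g


def sup_error(f, g, a, b, samples=2000):
    """Largest |f - g| over an evenly spaced grid (a stand-in for the sup norm)."""
    grid = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - g(x)) for x in grid)


if __name__ == "__main__":
    eps = 0.01
    k = 500                                   # mesh 1/500 < delta = eps/4
    xs = [j / k for j in range(k + 1)]
    g = piecewise_linear(xs, [x * x for x in xs])
    print("grid sup error:", sup_error(lambda x: x * x, g, 0.0, 1.0), "target:", eps)
```

The observed error is far smaller than $\varepsilon$; the lemma only uses the modulus of continuity of $f$, so its bound is not sharp for smooth functions.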

8.8.3 Lemma. The function $g$ in 8.8.1 may be written
$$g(x) = y_0 + \sum_{j=0}^{k-1} c_j (x - x_j)^+, \quad a \le x \le b,$$
for suitably chosen constants $c_j$.

Proof. For $0 \le j \le k - 1$ and $x_j \le x \le x_{j+1}$, the desired equation reduces to
$$y_j + m_j(x - x_j) = y_0 + \sum_{i=0}^{j} c_i (x - x_i) = y_0 - \sum_{i=0}^{j} c_i x_i + x \sum_{i=0}^{j} c_i.$$


This holds iff
$$m_j = \sum_{i=0}^{j} c_i \quad \text{and} \quad y_j - m_j x_j = y_0 - \sum_{i=0}^{j} c_i x_i. \tag{8.9}$$
The first equation in (8.9) is satisfied by taking $c_0 = m_0$ and $c_j = m_j - m_{j-1}$, $j \ge 1$. For this choice,
$$\begin{aligned}
y_0 - \sum_{i=0}^{j} c_i x_i &= y_0 + \sum_{i=1}^{j} m_{i-1} x_i - \sum_{i=0}^{j} m_i x_i \\
&= y_0 - m_j x_j + \sum_{i=0}^{j-1} m_i (x_{i+1} - x_i) \\
&= y_0 - m_j x_j + \sum_{i=0}^{j-1} (y_{i+1} - y_i) \\
&= y_j - m_j x_j,
\end{aligned}$$
which shows that the second equation in (8.9) is also satisfied.

8.8.4 Lemma. The functions $|x|$ and $x^+$ may be uniformly approximated by polynomials on any bounded interval $I$.

Proof. By 7.4.10, the binomial series
$$\sum_{n=0}^{\infty} \binom{1/2}{n} (-t)^n$$
converges uniformly to $\sqrt{1 - t}$ on $[-1, 1]$. Setting $t = 1 - x^2$ we see that
$$\sum_{n=0}^{\infty} \binom{1/2}{n} (x^2 - 1)^n$$
converges uniformly to $\sqrt{x^2} = |x|$ on $[-1, 1]$. Thus if $s_n(x)$ denotes the $n$th partial sum of the last series and $m$ is chosen so that $I \subseteq [-m, m]$, then $Q_n(x) := m\,s_n(x/m)$ defines a sequence of polynomials converging uniformly to $|x|$ on $I$. Since $x^+ = \tfrac{1}{2}(x + |x|)$, the polynomials $P_n(x) := \tfrac{1}{2}\bigl(x + Q_n(x)\bigr)$ converge uniformly to $x^+$ on $I$.

8.8.5 Weierstrass Approximation Theorem. The set of all polynomials on $[a, b]$ is dense in $C([a, b])$. That is, every member of $C([a, b])$ may be uniformly approximated by polynomials.

Proof. Let $f \in C([a, b])$ and $\varepsilon > 0$. By 8.8.2, there exists a piecewise linear function $g$ on $[a, b]$ such that $\|f - g\|_\infty < \varepsilon/2$. By 8.8.3 and 8.8.4, there exists a polynomial $P$ such that $\|P - g\|_\infty < \varepsilon/2$. Then, by the triangle inequality, $\|f - P\|_\infty < \varepsilon$.
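The polynomials $Q_n$ and $P_n$ of Lemma 8.8.4 are easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, the binomial coefficients are generated by the standard recurrence $\binom{1/2}{k+1} = \binom{1/2}{k}\,\frac{1/2 - k}{k + 1}$, and the grid maximum stands in for the supremum norm.

```python
def abs_partial_sum(x, n):
    """n-th partial sum s_n(x) of sum_k C(1/2, k) (x^2 - 1)^k, intended for |x| <= 1."""
    total, coeff, power = 0.0, 1.0, 1.0   # coeff = C(1/2, k), power = (x^2 - 1)^k
    for k in range(n + 1):
        total += coeff * power
        coeff *= (0.5 - k) / (k + 1)      # recurrence for the next binomial coefficient
        power *= x * x - 1
    return total


def Q(x, n, m):
    """Q_n(x) = m * s_n(x / m), approximating |x| on [-m, m]."""
    return m * abs_partial_sum(x / m, n)


def P(x, n, m):
    """P_n(x) = (x + Q_n(x)) / 2, approximating x^+ on [-m, m]."""
    return 0.5 * (x + Q(x, n, m))


def sup_error_abs(n, m, samples=400):
    """Largest |Q_n(x) - |x|| over an evenly spaced grid on [-m, m]."""
    grid = [-m + 2 * m * i / samples for i in range(samples + 1)]
    return max(abs(Q(x, n, m) - abs(x)) for x in grid)


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, sup_error_abs(n, 3))
```

The error decreases with $n$ but slowly: the worst point is $x = 0$, where the corner of $|x|$ sits, so this route is convenient for the proof rather than for practical approximation.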


For the statement of the Stone–Weierstrass theorem, we need the following definitions.

8.8.6 Definition. A collection $\mathcal{A}$ of real-valued functions on a set $S$ is said to be an algebra if $\mathcal{A}$ is closed under addition, multiplication, and scalar multiplication; that is, $f, g \in \mathcal{A}$ and $\alpha \in \mathbb{R}$ ⇒ $f + g,\ fg,\ \alpha f \in \mathcal{A}$. $\mathcal{A}$ is said to separate points of $S$ if for each pair of distinct points $s$ and $t$ in $S$ there exists $f \in \mathcal{A}$ such that $f(s) \ne f(t)$. ♦

For example, the collection of all polynomials on $[a, b]$ is an algebra that separates points of $[a, b]$.

8.8.7 Stone–Weierstrass Theorem. Let $X$ be a compact metric space and let $\mathcal{A}$ be an algebra in $C(X)$ that contains the constant functions and separates points of $X$. Then $\mathcal{A}$ is dense in $C(X)$.

Proof. Set $\mathcal{B} := \mathrm{cl}(\mathcal{A})$. The proof that $\mathcal{B} = C(X)$ consists of the following sequence of steps.

I. $\mathcal{B}$ is an algebra in $C(X)$.

⟦ If $f_n, g_n \in \mathcal{A}$, $f_n \to f$, $g_n \to g$, and $\alpha \in \mathbb{R}$, then
(a) $\|\alpha f_n - \alpha f\|_\infty = |\alpha|\,\|f_n - f\|_\infty \to 0$,
(b) $\|(f_n + g_n) - (f + g)\|_\infty \le \|f_n - f\|_\infty + \|g_n - g\|_\infty \to 0$, and
(c) $\|f_n g_n - fg\|_\infty \le \|f_n g_n - f g_n\|_\infty + \|f g_n - fg\|_\infty \le \|g_n\|_\infty \|f_n - f\|_\infty + \|f\|_\infty \|g_n - g\|_\infty \to 0$,
the convergence in (c) holding because $\{g_n\}$ is uniformly bounded. (Each $g_n$ is bounded and $g_n$ converges uniformly to a bounded function.) Thus $\mathcal{B}$ is closed under addition, multiplication, and scalar multiplication. ⟧

II. $f \in \mathcal{B}$ ⇒ $|f| \in \mathcal{B}$.

⟦ Let $M = \|f\|_\infty$. By 8.8.4 there exists a sequence of polynomials $P_n(x)$ converging uniformly to $|x|$ on $[-M, M]$. It follows that $P_n \circ f$ converges uniformly to $|f|$ on $X$. Because $\mathcal{B}$ is an algebra containing the constants, $P_n \circ f \in \mathcal{B}$. Indeed, if $P_n(x) = \sum_{j=0}^{k} a_j x^j$, then $P_n \circ f = \sum_{j=0}^{k} a_j f^j$. Since $\mathcal{B}$ is closed, $|f| \in \mathcal{B}$. ⟧

III. $f_1, \ldots, f_k \in \mathcal{B}$ ⇒ $\max\{f_1, \ldots, f_k\},\ \min\{f_1, \ldots, f_k\} \in \mathcal{B}$.

⟦ By induction, it suffices to consider the case $k = 2$. This follows from step II and the identities
$$\max\{f_1, f_2\} = \tfrac{1}{2}\bigl(f_1 + f_2 + |f_1 - f_2|\bigr), \quad \min\{f_1, f_2\} = \tfrac{1}{2}\bigl(f_1 + f_2 - |f_1 - f_2|\bigr).$$
⟧

Metric Spaces

279

IV. Let f ∈ C(X). Then for each pair of distinct points x, y in X there exists a function gxy ∈ A such that gxy (x) = f (x) and gxy (y) = f (y). J Choose a function h ∈ A such that h(x) 6= h(y) (A separates points). Define gxy (z) = f (x) +

f (x) − f (y) h(z) − h(x) , z ∈ X. h(x) − h(y)

Because A contains the constant functions, gxy ∈ A. Clearly, gxy (x) = f (x) and gxy (y) = f (y). K

V. If f ∈ C(X), x ∈ X, and ε > 0, then there exists a function gx ∈ B such that gx (x) = f (x) and gx (z) < f (z) + ε for all z ∈ X. J By continuity, for each y ∈ X the set Uy := {z ∈ X : gxy (z) < f (z) + ε} is open in X, where gxy is the function in step IV. Moreover, Uy contains both x and y. Since X is compact, there exist y1 , . . . yk ∈ X such that X = Uy1 ∪ · · · ∪ Uyk . Set gx := min{gxy1 , . . . , gxyk }. Then gx clearly has the required properties and, by step III, gx ∈ B. K

VI. If f ∈ C(X) and ε > 0, then there exists a function g ∈ B such that f (z) − ε < g(z) < f (z) + ε, for all z ∈ X. J By continuity, for each x ∈ X the set Vx := {z ∈ X : gx (z) > f (z) − ε}

is open in X, where gx is the function in step V. Moreover, Vx clearly contains x and f (z) − ε < gx (z) < f (z) + ε, for all z ∈ Vx . Since X is compact, there exist x1 , . . . , xm ∈ X such that X = V x1 ∪ · · · ∪ V xk . Set g := max{gx1 , . . . , gxm }. By step III, g ∈ B, and g clearly satisfies the desired inequality. K

To complete the proof of the theorem, observe that step VI asserts that C(X) = cl(B). Since B is closed, C(X) = B.
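Two of the ingredients of the proof, the two-point interpolant of step IV and the lattice identities of step III, are simple enough to try on concrete functions. The sketch below is only an illustration: the helper names and the choice $f = \exp$, $h(z) = z$ on an interval are assumptions made here, not taken from the text.

```python
import math


def interpolate_two_points(f, h, x, y):
    """Step IV: return g_xy = f(x) + (f(x) - f(y)) / (h(x) - h(y)) * (h - h(x))."""
    slope = (f(x) - f(y)) / (h(x) - h(y))   # requires h(x) != h(y)
    return lambda z: f(x) + slope * (h(z) - h(x))


def pointwise_max(f1, f2):
    """Step III: max{f1, f2} = (f1 + f2 + |f1 - f2|) / 2."""
    return lambda z: 0.5 * (f1(z) + f2(z) + abs(f1(z) - f2(z)))


def pointwise_min(f1, f2):
    """Step III: min{f1, f2} = (f1 + f2 - |f1 - f2|) / 2."""
    return lambda z: 0.5 * (f1(z) + f2(z) - abs(f1(z) - f2(z)))


if __name__ == "__main__":
    identity = lambda z: z                      # separates points of an interval
    g = interpolate_two_points(math.exp, identity, 0.2, 0.9)
    print(g(0.2) - math.exp(0.2), g(0.9) - math.exp(0.9))
    upper = pointwise_max(math.sin, math.cos)
    print(upper(1.0), max(math.sin(1.0), math.cos(1.0)))
```

The interpolant is built only from $h$ and constants, which is why it stays inside the algebra, while the max and min require the absolute value and hence the closure $B$.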


8.8.8 Example. A trigonometric polynomial is a function on $\mathbb{R}$ of the form
\[ T(x) = a_0 + \sum_{j=1}^{m} \bigl(a_j \cos(jx) + b_j \sin(jx)\bigr), \quad a_j, b_j \in \mathbb{R}. \]
The collection $\mathcal{T}([a,b])$ of all trigonometric polynomials on the interval $[a,b]$ clearly contains the constant functions and is closed under addition and scalar multiplication. Since
\[ \sin jx \, \sin kx = \tfrac12\bigl[\cos(j-k)x - \cos(j+k)x\bigr], \]
with similar identities holding for $\sin jx \cos kx$ and $\cos jx \cos kx$, $\mathcal{T}([a,b])$ is an algebra. If $0 < b - a < 2\pi$, then $\{\cos x, \sin x\}$, and hence $\mathcal{T}([a,b])$, separate points of $[a,b]$. By the Stone–Weierstrass theorem, every member of $C([a,b])$ may be uniformly approximated by trigonometric polynomials on $[a,b]$.

If $b - a = 2\pi$, then $\mathcal{T}([a,b])$ no longer separates points of $[a,b]$. However, in this case every member $f$ of $C([a,b])$ with $f(a) = f(b)$ may be uniformly approximated by trigonometric polynomials. We verify this for the interval $[0, 2\pi]$. Let $E$ denote the algebra of continuous functions $f : [0, 2\pi] \to \mathbb{R}$ with $f(0) = f(2\pi)$, and let $X$ denote the circle $x^2 + y^2 = 1$ with the Euclidean $\mathbb{R}^2$ metric. For each $f \in E$, define $F_f : X \to \mathbb{R}$ by
\[ F_f(\cos t, \sin t) = f(t), \quad 0 \le t \le 2\pi. \]
It is straightforward to verify that $F_f$ is continuous. For example, if $(\cos t_n, \sin t_n) \to (1, 0)$, then every convergent subsequence $\{t_{n_k}\}$ converges either to $0$ or to $2\pi$, hence $F_f(\cos t_{n_k}, \sin t_{n_k}) = f(t_{n_k}) \to f(0) = f(2\pi) = F_f(1, 0)$. The set
\[ A := \{F_T : T \in \mathcal{T}([0, 2\pi])\} \]
is easily seen to be an algebra that contains the constant functions. Moreover, $A$ separates points of $X$. Indeed, if $x := (\cos s, \sin s)$ and $y := (\cos t, \sin t)$ with $x \ne y$, then, say, $\cos s \ne \cos t$, hence $F_T(x) \ne F_T(y)$, where $T(x) = \cos x$. Therefore, each $F_f$ may be uniformly approximated on $X$ by members of $A$. It follows that each member of $E$ may be uniformly approximated on $[0, 2\pi]$ by trigonometric polynomials. ♦

Exercises

1. Give an example of a bounded continuous function that cannot be approximated uniformly by polynomials on $(0, 1)$.

2. Let $f$ be continuous on $[a, +\infty)$ such that $\lim_{x \to +\infty} f^{(m)}(x) \ne 0$ for all sufficiently large $m \in \mathbb{N}$. Prove that $f$ cannot be uniformly approximated by polynomials on $[a, +\infty)$. Give an example of such a function.

3.S Let $f \in C([a,b])$ have the property that $\int_a^b x^n f(x)\,dx = 0$ for all $n \in \mathbb{Z}^+$. Prove that $f = 0$ on $[a,b]$. Show that if $a \ge 0$, then it is enough that the given property holds for even integers $n$ in $\mathbb{Z}^+$.

4. Let $f : [a,b] \to \mathbb{R}$ have continuous derivatives up to order $k$ such that
\[ \int_a^b x^n f^{(k)}(x)\,dx = 0 \quad \text{for all } n \in \mathbb{Z}^+. \]
Prove that $f$ is a polynomial.

5. Let $f : [a,b] \to \mathbb{R}$ have continuous derivatives up to order $k$. Prove that there exists a sequence of polynomials $P_n$ such that $\lim_n P_n^{(j)} = f^{(j)}$ uniformly on $[a,b]$ for $j = 0, 1, \dots, k$.

6.S Let $X$ be compact and let $A$ be an algebra in $C(X)$ that contains the constant functions and separates the points of $X$. Let $x_0 \in X$ and let $f \in C(X)$ satisfy $f(x_0) = 0$. Prove that there exists a sequence $f_n \in A$ converging uniformly to $f$ such that $f_n(x_0) = 0$ for all $n$.

7. Show that there exists a sequence of polynomials $P_n$ converging uniformly to $\sin x$ on $[0, \pi]$ such that $P_n(0) = P_n(\pi) = 0$ for all $n$.

8. Let $f$ be an odd (even) continuous function on $[-a, a]$, $a > 0$. Prove that there is a sequence of odd (even) polynomials that converges uniformly to $f$ on $[-a, a]$.

9.S Let $f \in C([0, 2\pi])$ have the properties $f(0) = f(2\pi)$ and
\[ \int_0^{2\pi} f(x) \sin^m x \cos^n x\,dx = 0 \quad \text{for all } m, n \in \mathbb{Z}^+. \]
Prove that $f$ is identically zero on $[0, 2\pi]$.

10. Let $f : \mathbb{R} \to \mathbb{R}$ be continuous and periodic with period $2\pi$. Prove that there exists a sequence of trigonometric polynomials that converges uniformly to $f$ on $\mathbb{R}$.

11.S Let $f \in C([-\pi/2, \pi/2])$ with $f(0) = 0$. Prove that $f$ can be uniformly approximated on $[-\pi/2, \pi/2]$ by functions of the form $\sum_{j=1}^{m} b_j \sin(jx)$.

12. Let $g$ be continuous and one-to-one on $[a,b]$. Prove that any function in $C([a,b])$ may be uniformly approximated by functions of the form $\sum_{j=0}^{m} a_j g^j$.

13. Prove the following version of the Stone–Weierstrass theorem: If $V$ is a linear subspace of $C(X)$ that contains the constant functions, separates points of $X$, and contains $|f|$ for all $f \in V$, then $V$ is dense in $C(X)$.

14. Show that for any $f \in C([0, 2\pi])$ there exists a sequence of trigonometric polynomials $T_n$ such that $\int_0^{2\pi} |f - T_n| \to 0$.

15.S Let $X$ and $Y$ be compact metric spaces and let $f(x, y) \in C(X \times Y)$ be a continuous real-valued function on $X \times Y$. Show that for every $\varepsilon > 0$ there exist $g_1, \dots, g_n \in C(X)$ and $h_1, \dots, h_n \in C(Y)$ such that
\[ \Bigl| f(x, y) - \sum_{i=1}^{n} g_i(x) h_i(y) \Bigr| < \varepsilon \quad \text{for all } (x, y) \in X \times Y. \]

16. Let $E_0$ denote the algebra of all continuous functions $f : [a,b] \to \mathbb{R}$ such that $f(a) = f(b) = 0$. If $A_0$ is an algebra in $E_0$ that separates points of $(a, b)$, show that $A_0$ is dense in $E_0$ in the uniform norm. Hint. Use ideas of 8.8.8 by considering the algebra generated by $A_0$ and the constant functions.

17. Let $C_0(\mathbb{R})$ denote the algebra of all continuous functions $f$ on $\mathbb{R}$ such that $\lim_{t \to \pm\infty} f(t) = 0$. Let $B_0$ be an algebra in $C_0(\mathbb{R})$ that separates points of $\mathbb{R}$. Show that $B_0$ is dense in $C_0(\mathbb{R})$ in the uniform norm. Hint. Consider $\theta(t) = \tan^{-1}[(t - \pi)/2]$, $0 < t < 2\pi$, and use Exercise 16.

*8.9 Baire's Theorem

Let $(X, d)$ be a metric space. The diameter $d(E)$ of a nonempty subset $E$ of $X$ is defined by
\[ d(E) = \sup_{x, y \in E} d(x, y). \]

8.9.1 Lemma. If $X$ is complete, then the intersection $C$ of any decreasing sequence of nonempty closed sets $C_n$ in $X$ with $d(C_n) \to 0$ contains a single point.

Proof. For each $n$ choose a point $x_n \in C_n$. If $m > n$, then $x_m \in C_n$, hence $d(x_m, x_n) \le d(C_n)$. Since $d(C_n) \to 0$, $\{x_n\}$ is Cauchy. Let $x_n \to x$. Since $x_n, x_{n+1}, \dots \in C_n$ and $C_n$ is closed, $x \in C_n$ for all $n$, that is, $x \in C$. Since $d(C) \le d(C_n) \to 0$, $C = \{x\}$.

8.9.2 Baire Category Theorem. Let $X$ be a complete metric space. Then the following statements hold:
(a) If $U_n \subseteq X$ is open and dense in $X$ for all $n$, then $G := \bigcap_n U_n$ is dense in $X$.
(b) If $C_n \subseteq X$ is closed and has empty interior for all $n$, then $F := \bigcup_n C_n$ has empty interior.

Proof. To prove (a), we show that $B \cap G \ne \emptyset$ for any open ball $B$. Since $B \cap U_1$ is open and nonempty, $C_1 := C_{r_1}(x_1) \subseteq B \cap U_1$ for some $x_1 \in X$ and $0 < r_1 \le 1$. Since $B_{r_1}(x_1) \cap U_2$ is open and nonempty, $C_2 := C_{r_2}(x_2) \subseteq B_{r_1}(x_1) \cap U_2$ for some $x_2 \in X$ and $0 < r_2 < 1/2$. Continuing in this manner, we obtain a decreasing sequence of closed balls $C_n \subseteq B \cap U_n$ with diameters tending to zero. By 8.9.1, $\bigcap_n C_n$ contains a point $x$. Then $x \in B \cap U_n$ for all $n$, hence $x \in B \cap G$.

Part (b) follows from (a). Indeed, suppose $\operatorname{int}(C_n) = \emptyset$ for all $n$. Then $U_n := C_n^c$ is open and dense in $X$, hence $\bigcap_n U_n$ is dense in $X$. It follows that the interior of $\bigl(\bigcap_n U_n\bigr)^c = F$ is empty.

We give three applications of Baire's theorem. The first is known as the principle of uniform boundedness.

8.9.3 Theorem. Let $X$ and $Y$ be complete normed vector spaces and let $L$ be a family of continuous linear transformations from $X$ to $Y$ such that
\[ \sup_{T \in L} \|Tx\| < \infty \quad \text{for each } x \in X. \]
Then there exists $M > 0$ such that $\|Tx\| \le M\|x\|$ for all $x \in X$ and $T \in L$.

Proof. For each $n$, set
\[ C_n = \{x \in X : \|Tx\| \le n \text{ for all } T \in L\}. \]
By hypothesis, $X = \bigcup_n C_n$. By continuity of the transformations $T$, each $C_n$ is closed. Therefore, Baire's theorem shows that $\operatorname{int}(C_n) \ne \emptyset$ for some $n$. Thus there exist $x_0$ and $r > 0$ such that $\|Ty\| \le n$ for all $T \in L$ and $y \in X$ with $\|y - x_0\| \le r$. If $\|x\| \le r$, then, taking $y = x + x_0$, we have
\[ \|Tx\| \le \|Tx + Tx_0\| + \|Tx_0\| = \|Ty\| + \|Tx_0\| \le n + \|Tx_0\|. \]
It follows that for all $x \ne 0$ and $T \in L$,
\[ \Bigl\| T\Bigl(\frac{r x}{\|x\|}\Bigr) \Bigr\| \le n + \|Tx_0\|, \]
hence
\[ \|Tx\| \le r^{-1}\bigl(n + \|Tx_0\|\bigr)\|x\|. \]
Since $x_0 \in C_n$, we have $\|Tx_0\| \le n$ for all $T \in L$, so the assertion holds with $M = 2n/r$.

The following corollary is one of the few instances in analysis (Dini's theorem being another) in which pointwise convergence of a sequence of continuous functions is sufficient to convey the property of continuity to the limit function.

8.9.4 Corollary. Let $X$ and $Y$ be complete normed vector spaces and let $\{T_n\}$ be a sequence of continuous linear transformations from $X$ to $Y$ converging pointwise on $X$ to a function $T$. Then $T$ is linear and continuous.

Proof. Linearity of $T$ is clear. For continuity, note that $\sup_n \|T_n x\| < +\infty$ for each $x \in X$, hence, by the theorem, there exists $M > 0$ such that $\|T_n x\| \le M\|x\|$ for all $n$ and $x$. Letting $n \to +\infty$ yields $\|Tx\| \le M\|x\|$, hence $T$ is continuous.

For the second application of Baire's theorem, recall that there exist functions $f : \mathbb{R} \to \mathbb{R}$ whose set of discontinuity points is precisely $\mathbb{Q}$ (3.3.3). The obvious question raised by this fact is answered in the following theorem.

8.9.5 Theorem. There is no function $f : \mathbb{R} \to \mathbb{R}$ whose set of continuity points is precisely $\mathbb{Q}$.

Proof. For each $n$, let $U_n$ denote the union of all intervals $(a, b)$ such that $|f(x) - f(y)| < 1/n$ for all $x, y \in (a, b)$. Then $U_n$ is open and the set of continuity points of $f$ is precisely $C := \bigcap_{n=1}^{\infty} U_n$. Suppose that $C = \mathbb{Q}$. Then each $U_n$ contains $\mathbb{Q}$ and hence is dense in $\mathbb{R}$. Let $\{r_1, r_2, \dots\}$ be an enumeration of $\mathbb{Q}$. Then the open sets $V_m := \mathbb{R} \setminus \{r_m\}$ are also dense in $\mathbb{R}$ and have intersection $\mathbb{I}$, the set of irrationals. By Baire's theorem, the collection of sets $\{U_n, V_m : m, n \in \mathbb{N}\}$ has a nonempty intersection. But this intersection is $\mathbb{Q} \cap \mathbb{I} = \emptyset$. Therefore, $C$ cannot equal $\mathbb{Q}$.

The last application of Baire's theorem shows that there is a rich supply of continuous, nowhere differentiable functions. For the proof we need the following lemma.

8.9.6 Lemma. If $g$ is piecewise linear on $[a,b]$, then there exists $M > 0$ such that $|g(x) - g(y)| \le M|x - y|$ for all $x, y \in [a,b]$.

Proof. Let $g$ be as in 8.8.1 and set $M = \max_j\{|m_j|\}$. If $x_i \le x \le x_{i+1} \le x_j \le y \le x_{j+1}$, then
\[ |g(x) - g(y)| \le |g(x) - g(x_{i+1})| + |g(x_{i+1}) - g(x_{i+2})| + \dots + |g(x_j) - g(y)| \]
\[ \le |m_i|(x_{i+1} - x) + |m_{i+1}|(x_{i+2} - x_{i+1}) + \dots + |m_j|(y - x_j) \le M(y - x). \]

8.9.7 Theorem. The set of all continuous, nowhere differentiable functions on an interval $[a,b]$ is dense in $C([a,b])$ in the uniform norm.

Proof. For each $n \in \mathbb{N}$ and $f \in C([a,b])$ define
\[ E_n(f) = \{x \in [a,b] : |f(y) - f(x)| \le n|x - y| \text{ for all } y \in [a,b]\}. \]
We break the proof into several steps:

I. $\bigcup_{n=1}^{\infty} E_n(f)$ contains all points at which $f$ is differentiable.

◁ Let $x$ be such a point and choose $\delta > 0$ such that
\[ \Bigl| \frac{f(y) - f(x)}{y - x} - f'(x) \Bigr| < 1 \quad \text{for all } y \in [a,b] \text{ with } 0 < |x - y| < \delta. \]
Then
\[ |f(y) - f(x)| \le \begin{cases} \bigl(1 + |f'(x)|\bigr)|y - x| & \text{if } |x - y| < \delta, \\ 2\|f\|_\infty \le 2\delta^{-1}\|f\|_\infty |y - x| & \text{if } |x - y| \ge \delta, \end{cases} \]
which shows that $x \in E_n(f)$ for all $n > 1 + |f'(x)| + 2\delta^{-1}\|f\|_\infty$. ▷

II. $\mathcal{E}_n := \{f \in C([a,b]) : E_n(f) \ne \emptyset\}$ is closed in $C([a,b])$.

◁ Let $\{f_k\}$ be a sequence in $\mathcal{E}_n$ converging uniformly to $f \in C([a,b])$. For each $k$, choose a point $x_k \in E_n(f_k)$. We may assume that $x_k \to x$ for some $x \in [a,b]$ (otherwise, take a subsequence). Then for all $y \in [a,b]$,
\[ |f(y) - f(x)| \le |f(y) - f_k(y)| + |f_k(y) - f_k(x_k)| + |f_k(x_k) - f_k(x)| + |f_k(x) - f(x)| \]
\[ \le 2\|f - f_k\|_\infty + n|y - x_k| + n|x_k - x|. \]
Letting $k \to \infty$ shows that $|f(y) - f(x)| \le n|y - x|$, that is, $x \in E_n(f)$. Therefore, $f \in \mathcal{E}_n$. ▷

III. $\mathcal{E}_n^c$ is dense in $C([a,b])$.

◁ Let $f \in C([a,b])$ and $\varepsilon > 0$. We construct a function $h \in B_\varepsilon(f) \cap \mathcal{E}_n^c$. By 8.8.2, there exists a piecewise linear function $g$ such that $\|f - g\|_\infty < \varepsilon/2$. By 8.9.6, there exists $M > 0$ such that $|g(x) - g(y)| \le M|x - y|$ for all $x, y \in [a,b]$. Let $r > 0$ and let $x_0 = a < x_1 < \dots < x_{2p} = b$ be a partition of $[a,b]$ with mesh $< r$. Construct a "sawtooth" piecewise linear function $h_r$ with vertices
\[ (x_0, 1), (x_2, 1), \dots, (x_{2p}, 1) \quad \text{and} \quad (x_1, -1), (x_3, -1), \dots, (x_{2p-1}, -1), \]
and set $h := g + \varepsilon h_r/2$.

FIGURE 8.11: The sawtooth function $h_r$.

Then
\[ \|h - f\|_\infty \le \|h - g\|_\infty + \|g - f\|_\infty = \frac{\varepsilon}{2}\|h_r\|_\infty + \|g - f\|_\infty < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]
so $h \in B_\varepsilon(f)$. To show that $h \in \mathcal{E}_n^c$, let $x$ be an arbitrary member of $[a,b]$. If $h_r(x) \le 0$ ($\ge 0$) choose $x_j$ such that $|x - x_j| < r$ and $h_r(x_j) = 1$ ($= -1$) (see Figure 8.11). Then $|h_r(x) - h_r(x_j)| \ge 1$, hence
\[ |h(x) - h(x_j)| \ge \frac{\varepsilon}{2}|h_r(x) - h_r(x_j)| - |g(x) - g(x_j)| \ge \frac{\varepsilon}{2} - M|x - x_j| \ge \Bigl(\frac{\varepsilon}{2r} - M\Bigr)|x - x_j|. \]
If $r$ is chosen so that $\dfrac{\varepsilon}{2r} - M > n$, then $x \notin E_n(h)$. Since $x$ was arbitrary, $E_n(h) = \emptyset$, hence $h \notin \mathcal{E}_n$. ▷

To complete the proof note that by step III and Baire's theorem, $F := \bigcap_{n=1}^{\infty} \mathcal{E}_n^c$ is dense in $C([a,b])$. Since $f \in F$ implies that $E_n(f) = \emptyset$ for every $n$, and since a point at which $f$ is differentiable must lie in some $E_n(f)$, no member of $F$ can be differentiable at any point of $[a,b]$.
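The sawtooth perturbation of step III can be tried on a concrete case. The sketch below is an illustration under assumptions chosen here, not data from the text: the interval $[0,1]$, the uniform partition with $2p = 100$ subintervals, $g(x) = x$ (so $M = 1$), $\varepsilon = 0.1$, and the helper names `sawtooth` and `opposite_node`. For each sample point $x$ it picks the node $x_j$ described in the proof and records the difference quotient of $h = g + \varepsilon h_r/2$.

```python
def sawtooth(nodes):
    """Return h_r: piecewise linear, +1 at even-indexed nodes, -1 at odd-indexed nodes."""
    def h_r(x):
        for i in range(len(nodes) - 1):
            left, right = nodes[i], nodes[i + 1]
            if left <= x <= right:
                value_left = 1.0 if i % 2 == 0 else -1.0
                # Linear interpolation from value_left at `left` to -value_left at `right`.
                return value_left * (1.0 - 2.0 * (x - left) / (right - left))
        raise ValueError("x lies outside the partition")
    return h_r


def opposite_node(nodes, h_r, x):
    """Endpoint x_j of the subinterval containing x with h_r(x_j) = 1 if h_r(x) <= 0, else -1."""
    for i in range(len(nodes) - 1):
        if nodes[i] <= x <= nodes[i + 1]:
            if i % 2 == 0:
                even_node, odd_node = nodes[i], nodes[i + 1]
            else:
                even_node, odd_node = nodes[i + 1], nodes[i]
            return even_node if h_r(x) <= 0 else odd_node
    raise ValueError("x lies outside the partition")


p = 50
nodes = [k / (2 * p) for k in range(2 * p + 1)]   # uniform partition of [0, 1]
eps = 0.1
g = lambda x: x                                   # piecewise linear, Lipschitz with M = 1
h_r = sawtooth(nodes)
h = lambda x: g(x) + eps * h_r(x) / 2

samples = [(k + 0.37) / 1000 for k in range(999)]  # points of [0, 1] that are not nodes
quotients = []
for x in samples:
    x_j = opposite_node(nodes, h_r, x)
    quotients.append(abs(h(x) - h(x_j)) / abs(x - x_j))
smallest_quotient = min(quotients)

if __name__ == "__main__":
    print(smallest_quotient)
```

Refining the partition (larger `p`) makes the smallest quotient grow while $\|h - g\|_\infty$ stays at $\varepsilon/2$, which is the mechanism that pushes $h$ out of every $\mathcal{E}_n$.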

Exercises

1.S Prove the converse of 8.9.1: If the intersection of any decreasing sequence of nonempty closed sets $C_n$ in $X$ with $d(C_n) \to 0$ contains a single point, then $X$ is complete.

2. Let $\mathbb{Q}$ have the usual metric. Find a decreasing sequence of closed sets $C_n$ in $\mathbb{Q}$ with $d(C_n) \to 0$ and $\bigcap_n C_n = \emptyset$.

3.S Show that 8.9.2 does not hold in $\mathbb{Q}$ with the usual metric.

4. Let $D = \{x_1, x_2, \dots\}$ be a proper subset of a complete metric space $X$. Show that (a) and (b) of 8.9.2 hold for $Y := X \setminus D$. Conclude that the set of irrationals $\mathbb{I}$ with the usual metric satisfies (a) and (b) of the theorem.

Chapter 9 Differentiation on $\mathbb{R}^n$

For the remainder of the book, the Euclidean norm $\|\cdot\|_2$ on the spaces $\mathbb{R}^n$ will be denoted simply by $\|\cdot\|$. In this chapter we extend the ideas of Chapter 4 to vector-valued functions of several variables. This will require some notions from linear algebra, a brief review of which may be found in Appendix B.

9.1 Definition of the Derivative

To motivate the general definition of the derivative of a function on $\mathbb{R}^n$, we begin with two important special cases.

Derivative of a Vector-Valued Function of a Real Variable

The definition of derivative in this case is a natural extension of the definition of the derivative of a scalar-valued function:

9.1.1 Definition. Let $I \subseteq \mathbb{R}$ be an interval and $a \in I$. A function $f : I \to \mathbb{R}^m$ is said to be differentiable at $a$ if the (vector) limit
\[ f'(a) := \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{t \to a} \frac{f(t) - f(a)}{t - a} \]
exists in $\mathbb{R}^m$. (The limit is one-sided if $a$ is an endpoint of $I$.) The vector $f'(a)$ is called the derivative of $f$ at $a$. If $f$ is differentiable at each point in $I$, then $f$ is said to be differentiable on $I$ and the resulting function $f' : I \to \mathbb{R}^m$ is called the derivative of $f$ on $I$. ♦

The function $f$ may be viewed as a parametrization of a curve $C$ in $\mathbb{R}^m$. The vector $f'(a)$ is then called the tangent vector to $C$ at the point $f(a)$. If the variable $t$ is interpreted as time, then $C$ may be viewed as the path of a particle in $\mathbb{R}^m$. In this context, $f'(a)$ is called the velocity of the particle and $\|f'(a)\|$ the speed. The curve is said to be smooth if $f'$ is continuous and nonzero on $I$. Parameterized curves will be examined in detail in Chapter 12.

Note that the function $f : I \to \mathbb{R}^m$ may be written $f = (f_1, \dots, f_m)$, where $f_j : I \to \mathbb{R}$ is the $j$th component function of $f$.

9.1.2 Proposition. Let $I$ be an interval and $f = (f_1, \dots, f_m) : I \to \mathbb{R}^m$. Then $f$ is differentiable at $a \in I$ iff each $f_j$ is differentiable at $a$, in which case $f'(a) = (f_1'(a), \dots, f_m'(a))$. In particular, if $f$ is differentiable at $a$, then $f$ is continuous at $a$.

Proof. The assertions follow directly from the inequalities
\[ \Bigl| \frac{f_j(a + h) - f_j(a)}{h} - x_j \Bigr|^2 \le \Bigl\| \frac{f(a + h) - f(a)}{h} - (x_1, \dots, x_m) \Bigr\|^2 \le \sum_{i=1}^{m} \Bigl| \frac{f_i(a + h) - f_i(a)}{h} - x_i \Bigr|^2, \]
which hold for every $(x_1, \dots, x_m) \in \mathbb{R}^m$.

The differential of $f$ at $a$ is the linear transformation $df_a : \mathbb{R} \to \mathbb{R}^m$ that takes a real number $h$ to the vector $h f'(a)$:
\[ df_a(h) = h f'(a), \quad h \in \mathbb{R}. \]
Definition 9.1.1 may then be rephrased as follows: $f$ is differentiable at $a$ iff there exists a linear transformation $T : \mathbb{R} \to \mathbb{R}^m$ such that
\[ \lim_{h \to 0} \frac{f(a + h) - f(a) - Th}{|h|} = 0, \]
in which case $T = df_a$.
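As a small numerical illustration of Definition 9.1.1 and Proposition 9.1.2, the sketch below forms the vector difference quotient of a curve componentwise and compares it with the componentwise derivative. The curve $f(t) = (\cos t, \sin t)$ and the function names are assumptions chosen here for illustration.

```python
import math


def curve(t):
    """The unit circle f(t) = (cos t, sin t), viewed as a function from R to R^2."""
    return (math.cos(t), math.sin(t))


def difference_quotient(f, a, h):
    """The vector (f(a + h) - f(a)) / h, computed component by component."""
    return tuple((u - v) / h for u, v in zip(f(a + h), f(a)))


def exact_derivative(a):
    """f'(a) = (f_1'(a), f_2'(a)) = (-sin a, cos a)."""
    return (-math.sin(a), math.cos(a))


if __name__ == "__main__":
    a = 0.8
    for h in (1e-2, 1e-4, 1e-6):
        q = difference_quotient(curve, a, h)
        err = math.hypot(q[0] - exact_derivative(a)[0], q[1] - exact_derivative(a)[1])
        print(h, err)
    print("speed:", math.hypot(*exact_derivative(a)))
```

The error of the quotient shrinks roughly in proportion to $h$, and the speed $\|f'(a)\|$ of this particular parametrization is constant.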

Derivative of a Real-Valued Function of Several Variables

The derivative of a scalar-valued function of $n$ variables is defined as follows:

9.1.3 Definition. Let $U \subseteq \mathbb{R}^n$ be open and $a \in U$. Then $f : U \to \mathbb{R}$ is said to be differentiable at $a$ if there exists a vector $f'(a)$ in $\mathbb{R}^n$ such that
\[ \lim_{h \to 0} \frac{f(a + h) - f(a) - f'(a) \cdot h}{\|h\|} = 0. \tag{9.1} \]
The vector $f'(a)$ is called the derivative of $f$ at $a$. The differential of $f$ at $a$ is the linear transformation $df_a \in L(\mathbb{R}^n, \mathbb{R})$ defined by
\[ df_a(h) = f'(a) \cdot h, \quad h \in \mathbb{R}^n. \] ♦

Now let
\[ e_j = (0, \dots, 0, 1, 0, \dots, 0), \quad j = 1, \dots, n, \]
with the $1$ in the $j$th position, denote the standard basis vectors in $\mathbb{R}^n$. If $f'(a)$ exists, then, taking $h = t e_j$ in (9.1), we have
\[ \lim_{t \to 0} \frac{f(a + t e_j) - f(a) - t f'(a) \cdot e_j}{t} = 0, \]
or, equivalently,
\[ \lim_{t \to 0} \frac{f(a + t e_j) - f(a)}{t} = f'(a) \cdot e_j. \tag{9.2} \]
The expression on the right is just the $j$th component of $f'(a)$. The limit on the left is called the $j$th partial derivative of $f$ at $a$ and is denoted variously by
\[ \partial_j f = f_{x_j} = \frac{\partial f}{\partial x_j}. \]
We have proved the following result.

9.1.4 Proposition. If $f$ is differentiable at $a$, then the partial derivatives $\partial_j f(a)$ of $f$ exist at $a$ and
\[ f'(a) = \bigl(\partial_1 f(a), \partial_2 f(a), \dots, \partial_n f(a)\bigr). \tag{9.3} \]
In particular, the derivative is unique.

The vector on the right in (9.3) is called the gradient of $f$ at $a$ and is denoted by $\nabla f(a)$ or $\operatorname{grad} f(a)$. The linear transformation $df_a \in L(\mathbb{R}^n, \mathbb{R})$ may now be written
\[ df_a(h) = \nabla f(a) \cdot h, \quad h \in \mathbb{R}^n. \tag{9.4} \]
For an alternate notation, let $dx_j : \mathbb{R}^n \to \mathbb{R}$ be the linear function defined by $dx_j(h) = h_j$, $h = (h_1, \dots, h_n)$. Then $df_a$ may be expressed as
\[ df_a(h) = \sum_{j=1}^{n} \frac{\partial f(a)}{\partial x_j}\, dx_j(h). \]
If the partial derivatives of $f$ exist at each point of $U$, we write simply
\[ df = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\, dx_j. \]
For example,
\[ d \sin(x^2 y) = 2xy \cos(x^2 y)\, dx + x^2 \cos(x^2 y)\, dy. \]
We show below that if $f$ has continuous partial derivatives on $U$, then $f$ is differentiable on $U$. The continuity hypothesis cannot be removed: There are functions $f$ that are not differentiable on $U$ but whose partial derivatives exist throughout $U$. This is the case for the function in the following example.
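The differential of $\sin(x^2 y)$ given above can be checked against central difference quotients. The following is a minimal sketch; the function names, the step size, and the test point are assumptions made for illustration.

```python
import math


def f(x, y):
    return math.sin(x * x * y)


def gradient_formula(x, y):
    """Coefficients of dx and dy in d sin(x^2 y) = 2xy cos(x^2 y) dx + x^2 cos(x^2 y) dy."""
    c = math.cos(x * x * y)
    return (2 * x * y * c, x * x * c)


def central_difference_gradient(func, x, y, step=1e-6):
    """Approximate (f_x, f_y) by symmetric difference quotients in each variable."""
    fx = (func(x + step, y) - func(x - step, y)) / (2 * step)
    fy = (func(x, y + step) - func(x, y - step)) / (2 * step)
    return (fx, fy)


if __name__ == "__main__":
    point = (0.7, -1.3)
    print(gradient_formula(*point))
    print(central_difference_gradient(f, *point))
```

Each partial derivative is obtained by varying one coordinate while the other is held fixed, exactly as in (9.2).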

9.1.5 Example. Let $m \in \mathbb{N}$. The function
\[ f(x, y) = \begin{cases} \dfrac{x^m y}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{otherwise} \end{cases} \]
exhibits a variety of behavior depending on the values of $m$. The partial derivatives of $f$ are
\[ f_x(x, y) = \begin{cases} \dfrac{m x^{m+1} y + m x^{m-1} y^3 - 2 x^{m+1} y}{(x^2 + y^2)^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{otherwise,} \end{cases} \]
\[ f_y(x, y) = \begin{cases} \dfrac{x^m (x^2 - y^2)}{(x^2 + y^2)^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{otherwise.} \end{cases} \]
If $m = 1$, $f$ is not continuous at $(0, 0)$, hence is not differentiable there (see 9.1.11, below). If $m = 1$ or $2$, the partial derivatives exist at $(0, 0)$ but are not continuous there. If $m = 2$, the function is continuous at $(0, 0)$, with zero partial derivatives at $(0, 0)$, but is not differentiable there since in this case the limit
\[ \lim_{\mathbf{x} \to 0} \frac{f(\mathbf{x}) - f(0) - 0 \cdot \mathbf{x}}{\|\mathbf{x}\|} = \lim_{(x, y) \to (0, 0)} \frac{x^2 y}{(x^2 + y^2)^{3/2}} \]
fails to exist. If $m \ge 3$, $f$ has continuous partial derivatives and is differentiable on $\mathbb{R}^2$. ♦

The definition of the $j$th partial derivative of $f$ at $a$ may be written explicitly as
\[ \partial_j f(a) = \lim_{h \to 0} \frac{f(a_1, \dots, a_j + h, \dots, a_n) - f(a_1, \dots, a_j, \dots, a_n)}{h}. \]
This is simply the derivative at $a_j$ of the one-variable function
\[ t \mapsto f(a_1, \dots, a_{j-1}, t, a_{j+1}, \dots, a_n). \]
Thus to find the $j$th partial derivative of $f(x_1, \dots, x_j, \dots, x_n)$, one simply differentiates $f$ with respect to $x_j$ while holding the other variables fixed. It follows that the standard formulas for derivatives of functions of one variable hold for partial derivatives of functions of several variables. For example, the product rule takes the form
\[ \partial_j (fg)(a) = f(a)\,\partial_j g(a) + g(a)\,\partial_j f(a), \]
and the quotient rule becomes
\[ \partial_j \Bigl(\frac{f}{g}\Bigr)(a) = \frac{g(a)\,\partial_j f(a) - f(a)\,\partial_j g(a)}{g^2(a)}, \quad g(a) \ne 0. \]
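The failure of differentiability in Example 9.1.5 for $m = 2$ can be seen by evaluating the quotient $x^m y/(x^2 + y^2)^{3/2}$ along two different lines through the origin. The sketch below is illustrative only; the function names and the choice of lines are assumptions made here.

```python
import math


def f(x, y, m=2):
    """The function of Example 9.1.5: x^m y / (x^2 + y^2), with value 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** m * y / (x * x + y * y)


def remainder_quotient(x, y, m=2):
    """(f(x, y) - f(0, 0) - 0) / ||(x, y)||, using the zero partial derivatives at the origin."""
    return f(x, y, m) / math.hypot(x, y)


if __name__ == "__main__":
    for t in (1e-1, 1e-3, 1e-6):
        # m = 2: constant along the diagonal, zero along the x-axis, so no limit exists.
        print(t, remainder_quotient(t, t), remainder_quotient(t, 0.0))
        # m = 3: the quotient along the diagonal now tends to zero.
        print(t, remainder_quotient(t, t, m=3))
```

Along the diagonal $y = x$ with $x > 0$ the quotient for $m = 2$ equals $1/(2\sqrt{2})$ for every $t$, while along the $x$-axis it is $0$, so the two-variable limit cannot exist.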

Derivative of a Vector-Valued Function of Several Variables

We now consider the general case. The following definition includes the two special cases discussed before.

9.1.6 Definition. Let $U \subseteq \mathbb{R}^n$ be open. A function $f : U \to \mathbb{R}^m$ is said to be differentiable at $a \in U$ if there exists a linear transformation $df_a : \mathbb{R}^n \to \mathbb{R}^m$, called the differential of $f$ at $a$, such that
\[ \lim_{h \to 0} \frac{f(a + h) - f(a) - df_a(h)}{\|h\|} = 0. \]
The $m \times n$ matrix $[df_a]$ is called the derivative of $f$ at $a$, or the Jacobian matrix of $f$ at $a$, and is denoted by $f'(a)$. ♦

9.1.7 Example. If $T \in L(\mathbb{R}^n, \mathbb{R}^n)$, then, by the linearity of $T$, $T(x + h) - T(x) - Th = 0$ for all $h$. It follows that $dT_x = T$ for all $x$. This is the $n$-dimensional version of the familiar result that the derivative of the function $x \mapsto tx$ is the constant $t$. ♦

9.1.8 Theorem. Let $U \subseteq \mathbb{R}^n$ be open, $f = (f_1, \dots, f_m) : U \to \mathbb{R}^m$, and let $a \in U$. Then $f$ is differentiable at $a$ iff each function $f_i : U \to \mathbb{R}$ is differentiable at $a$. In this case, $\partial_j f_i(a)$ exists and equals $df_a(e_j) \cdot e_i$, and
\[ df_a(h) = \bigl(\nabla f_1(a) \cdot h, \dots, \nabla f_m(a) \cdot h\bigr), \quad h \in \mathbb{R}^n. \tag{9.5} \]
In particular, if the differential exists, it is unique.

Proof. Let $f$ be differentiable at $a$. For $i = 1, \dots, m$ and $j = 1, \dots, n$, let $b_{ij} = df_a(e_j) \cdot e_i$, the $i$th component of $df_a(e_j)$ and the $(i, j)$th entry of the matrix $[df_a]$. Then
\[ df_a(h) = \bigl(b_1 \cdot h, \dots, b_m \cdot h\bigr), \quad \text{where } b_i := (b_{i1}, \dots, b_{in}). \]
Thus for each $i$,
\[ |f_i(a + h) - f_i(a) - b_i \cdot h| \le \|f(a + h) - f(a) - df_a(h)\|, \]
from which it follows that
\[ \lim_{h \to 0} \frac{f_i(a + h) - f_i(a) - b_i \cdot h}{\|h\|} = 0. \]
Therefore, the derivative of $f_i$ at $a$ exists and equals $b_i$. By 9.1.4, $b_i = \nabla f_i(a)$, that is, $b_{ij} = \partial_j f_i(a)$.

Conversely, suppose each $f_i$ is differentiable at $a$. Then $\nabla f_i(a)$ exists and by (9.4),
\[ \lim_{h \to 0} \frac{|f_i(a + h) - f_i(a) - \nabla f_i(a) \cdot h|}{\|h\|} = 0, \quad i = 1, \dots, m. \]

Let $T(h)$ denote the right side of (9.5). Then $T$ is linear and
\[ \frac{\|f(a + h) - f(a) - T(h)\|^2}{\|h\|^2} = \sum_{i=1}^{m} \frac{|f_i(a + h) - f_i(a) - \nabla f_i(a) \cdot h|^2}{\|h\|^2} \to 0 \]
as $h \to 0$. Therefore, $df_a$ exists and equals $T$.

By the theorem, the $(i, j)$ entry of $f'(a)$ is $\partial_j f_i(a)$. The effect of $df_a$ on a vector $h \in \mathbb{R}^n$ may therefore be expressed in matrix form as
\[ f'(a) h^t = \begin{bmatrix} \partial_1 f_1(a) & \cdots & \partial_n f_1(a) \\ \vdots & & \vdots \\ \partial_1 f_m(a) & \cdots & \partial_n f_m(a) \end{bmatrix} \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix} = \begin{bmatrix} \nabla f_1(a) \cdot h \\ \vdots \\ \nabla f_m(a) \cdot h \end{bmatrix}, \]
where $h^t$ denotes the transpose of the vector $h$. In the special case $m = n$, the determinant of $f'(a)$ is called the Jacobian of $f$ at $a$ and is denoted variously by
\[ \det f'(a) = J_f(a) = \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}(a). \]

9.1.9 Example. The transformation $(x, y, z) = (r \cos\theta, r \sin\theta, z)$ from cylindrical coordinates to rectangular coordinates in $\mathbb{R}^3$ has Jacobian
\[ \frac{\partial(x, y, z)}{\partial(r, \theta, z)} = \begin{vmatrix} \cos\theta & \sin\theta & 0 \\ -r \sin\theta & r \cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix} = r. \] ♦

The following characterization of differentiability will be useful.

9.1.10 Theorem. Let $f : U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is open. Then $f$ is differentiable at $a \in U$ iff there exists $T \in L(\mathbb{R}^n, \mathbb{R}^m)$ and, for sufficiently small $r$, a function $\eta : B_r(0) \to \mathbb{R}^m$ such that
\[ f(a + h) = f(a) + Th + \|h\|\,\eta(h), \quad \text{and} \quad \lim_{h \to 0} \eta(h) = 0. \tag{9.6} \]
In this case, $T = df_a$.

Proof. Assume that $f$ is differentiable at $a$. Choose $r > 0$ such that $B_r(a) \subseteq U$ and define $\eta : B_r(0) \to \mathbb{R}^m$ by $\eta(0) = 0$ and
\[ \eta(h) = \frac{f(a + h) - f(a) - df_a(h)}{\|h\|} \quad \text{if } h \ne 0. \]
Then (9.6) holds with $T = df_a$. Conversely, if (9.6) holds for some $\eta$ and $T$, then
\[ \lim_{h \to 0} \frac{\|f(a + h) - f(a) - Th\|}{\|h\|} = \lim_{h \to 0} \|\eta(h)\| = 0, \]
hence $f$ is differentiable at $a$ with $df_a = T$.
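The remainder function $\eta$ of Theorem 9.1.10 can be computed for the cylindrical coordinate map of Example 9.1.9. The sketch below is an illustration under assumptions made here: the helper names, the base point, and the direction of $h$ are arbitrary, and the Jacobian matrix is arranged with rows indexed by the component functions, as in Theorem 9.1.8.

```python
import math


def cylindrical_to_rectangular(r, theta, z):
    return (r * math.cos(theta), r * math.sin(theta), z)


def jacobian_matrix(r, theta, z):
    """Entry (i, j) is the j-th partial derivative of the i-th component (x, y, z)."""
    return [
        [math.cos(theta), -r * math.sin(theta), 0.0],
        [math.sin(theta), r * math.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def eta_norm(a, h):
    """||f(a + h) - f(a) - T h|| / ||h|| with T the Jacobian matrix at a."""
    fa = cylindrical_to_rectangular(*a)
    fah = cylindrical_to_rectangular(*(ai + hi for ai, hi in zip(a, h)))
    T = jacobian_matrix(*a)
    Th = [sum(T[i][j] * h[j] for j in range(3)) for i in range(3)]
    residual = [fah[i] - fa[i] - Th[i] for i in range(3)]
    return math.sqrt(sum(c * c for c in residual)) / math.sqrt(sum(c * c for c in h))


if __name__ == "__main__":
    a = (2.0, 0.5, 1.0)
    print("Jacobian determinant:", det3(jacobian_matrix(*a)))
    for s in (1e-1, 1e-2, 1e-3):
        print(s, eta_norm(a, (s, s, s)))
```

The printed values of $\|\eta(h)\|$ shrink with $\|h\|$, as (9.6) requires, and the determinant agrees with the value $r$ at the chosen base point.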

9.1.11 Corollary. If $f$ is differentiable at $a$, then $f$ is continuous at $a$.

Proof. By (9.6) and the continuity of linear transformations,
\[ \lim_{h \to 0} \bigl[f(a + h) - f(a)\bigr] = \lim_{h \to 0} \|h\|\,\eta(h) + \lim_{h \to 0} df_a(h) = 0. \]

Exercises 1. Find the differential df for each of the functions f (x, y): (a) S (d) (g)

x−y . x+y x cos . y

(b)

sec (yex ).

(h) S exy .

ln(x2 + y 3 ).

(e) S sin (x2 y). 2

(c)

arctan (xy 2 ).

(f)

y arcsin , 0 < y < x. x 3x + 2y tan . 2x + 3y

(i)

2. Find f 0 (x) where f (x) = xy x2 − y 2 3 3 2 2 S x y S (a) x − y , x y . (b) e sin y, e sin x . (c) , . x2 + y 2 x2 + y 2 (d) ln(x2 + y 2 + z 2 + 1), xyz . (e) arctan(x − y), exy , x/y . 3. For each of the functions f (x, y) below, find all values of p, q ∈ N for which on R2 (i) fx , fy exist, (ii) fx , fy are continuous, (iii) f 0 exists. p p q q x + y if (x, y) 6= 0, x y if (x, y) 6= 0, S 2 2 x +y x2 + y 2 (a) (b) 0 0 otherwise. otherwise. ( xp sin 1 + y q if x 6= 0, (x − y)p sin(x − y)−1 if x 6= y, (c) (d) S x y q 0 otherwise. otherwise. p p q q x + y x y if x 6= y, if (x, y) 6= 0, x−y x−y (e) (f) 0 0 otherwise. otherwise. 4. Find all values of p, q, s ∈ (0, +∞) for which on R2 (i) fx , fy exist,

(ii) fx , fy are continuous,

(iii) f 0 exists,

where f (0, 0) = 0 and, for (x, y) 6= (0, 0), f (x, y) = (a)S |x|p |y|q ln(x2 + y 2 ). (d)S

sin |x|p |y|q . (x2 + y 2 )s

(b)

sin(x2 + y 2 )p . (x2 + y 2 )q

(e)

sin−1 |x|p |y|q . (x2 + y 2 )s

(c)

tan(x2 + y 2 )p . (x2 + y 2 )q


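5. Spherical coordinates $(\rho, \phi, \theta)$ in $\mathbb{R}^3$ are defined by
\[ x = \rho \sin\phi \cos\theta, \quad y = \rho \sin\phi \sin\theta, \quad z = \rho \cos\phi, \]
where $\rho \ge 0$, $0 \le \phi \le \pi$, and $0 \le \theta < 2\pi$. Show that
\[ \frac{\partial(x, y, z)}{\partial(\rho, \phi, \theta)} = \rho^2 \sin\phi. \]

6. Let $(u, v) = \bigl(\sin f(x, y), \cos f(x, y)\bigr)$. Find $\dfrac{\partial(u, v)}{\partial(x, y)}$.

7. Let $(u, v, w) = (y/z, z/x, x/y)$, where $xyz \ne 0$. Find $\dfrac{\partial(u, v, w)}{\partial(x, y, z)}$.

8.S Let
\[ f(x) = \sum_{i=1}^{n} x_i^{a_i} \quad \text{and} \quad g(x) = \prod_{i=1}^{n} x_i^{a_i}, \]
where $x_i, a_i > 0$ and $\sum_i a_i = 1$. Find
(a) $x \cdot \nabla f(x)$. (b) $x \cdot \nabla g(x)$.

9. Let $f(x)$ be defined implicitly by the equation
\[ \frac{1}{f(x)} = \sum_{i=1}^{n} \frac{1}{x_i}. \]
Express $\nabla f(x)$ in terms of $f$.

10.S Let $f(x) = \ln\Bigl(\sum_{i=1}^{n} e^{x_i}\Bigr)$. Express $\nabla f(x)$ in terms of $f$.

11. Let the equation $\alpha x_n - x_1 x_2 \cdots x_{n-1} = 0$, $\alpha \ne 0$, define each of the variables $x_1, \dots, x_{n-1}$ as a differentiable function of $x_n$. Show that
\[ x_n^{n-2}\, \frac{\partial x_1}{\partial x_n} \frac{\partial x_2}{\partial x_n} \cdots \frac{\partial x_{n-1}}{\partial x_n} = \alpha. \]

12. Let $x = (x_1, \dots, x_n)$. Find $\partial_i$ of
(a)S $\|x\|$. (b) $\dfrac{1}{\|x\|}$. (c)S $\dfrac{x_i}{\|x\|}$. (d) $\dfrac{x_i}{\|x\|^2}$.

13. Let $f : \mathbb{R} \to \mathbb{R}$ be differentiable and $p > 0$. Show that for $x \ne 0$,
\[ x \cdot \nabla \|x\|^p = p\|x\|^p \quad \text{and} \quad x \cdot \nabla f\bigl(\|x\|^p\bigr) = p f'\bigl(\|x\|^p\bigr)\|x\|^p. \]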

9.2 Properties of the Differential

In this section we consider analogs of differentiation rules for single variable functions. Deeper properties of the differential are taken up in later sections.

Linearity of the Differential

9.2.1 Theorem. Let $U \subseteq \mathbb{R}^n$ be open, let $f, g : U \to \mathbb{R}^m$ be differentiable at $a \in U$, and let $\alpha, \beta \in \mathbb{R}$. Then $\alpha f + \beta g$ is differentiable at $a$ and
\[ d(\alpha f + \beta g)_a = \alpha\, df_a + \beta\, dg_a. \]

Proof. By 9.1.10, there exist functions $\eta(h)$, $\mu(h)$, defined for $h \in \mathbb{R}^n$ with sufficiently small norm, such that
\[ f(a + h) = f(a) + df_a(h) + \|h\|\,\eta(h), \quad \lim_{h \to 0} \eta(h) = 0, \quad \text{and} \]
\[ g(a + h) = g(a) + dg_a(h) + \|h\|\,\mu(h), \quad \lim_{h \to 0} \mu(h) = 0. \]
Then
\[ (\alpha f + \beta g)(a + h) = (\alpha f + \beta g)(a) + (\alpha\, df_a + \beta\, dg_a)(h) + \|h\|\,(\alpha\eta + \beta\mu)(h) \]
and
\[ \lim_{h \to 0} (\alpha\eta + \beta\mu)(h) = 0. \]
Another application of 9.1.10 completes the proof.

The Norm of a Linear Transformation

For additional properties of the differential, including product rules, we need the notion of operator norm on the space $L(\mathbb{R}^n, \mathbb{R}^m)$ of linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$.

9.2.2 Definition. Let $T \in L(\mathbb{R}^n, \mathbb{R}^m)$. The operator norm of $T$ is defined as
\[ \|T\| = \sup\bigl\{\|Tx\| : x \in \mathbb{R}^n,\ \|x\| = 1\bigr\}. \] ♦

The following proposition justifies the use of the term "norm."

9.2.3 Proposition. $\|T\|$ defines a norm on $L(\mathbb{R}^n, \mathbb{R}^m)$ such that
\[ \|Tx\| \le \|T\|\,\|x\| \quad \text{for all } x \in \mathbb{R}^n. \tag{9.7} \]
Moreover, if $[a_{ij}]_{m \times n}$ is the matrix of $T$, then for all $k, \ell$,
\[ |a_{k\ell}| \le \|T\| \le \Bigl(\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2\Bigr)^{1/2}. \tag{9.8} \]

Proof. Inequality (9.7) is clear if $x = 0$. If $x \ne 0$, then $\|x\|^{-1} x$ has norm $1$, hence
\[ \|x\|^{-1}\|Tx\| = \bigl\|T(\|x\|^{-1} x)\bigr\| \le \|T\|. \]
To verify (9.8), let $a_i = (a_{i1}, \dots, a_{in})$. Since $Tx = (a_1 \cdot x, \dots, a_m \cdot x)$, by the Cauchy–Schwarz inequality,
\[ \|Tx\|^2 = \sum_{i=1}^{m} (a_i \cdot x)^2 \le \sum_{i=1}^{m} \|a_i\|^2 \|x\|^2 = \sum_{i,j} a_{ij}^2, \quad \|x\| = 1, \]
which verifies the second inequality in (9.8). The first inequality follows from
\[ |a_{k\ell}|^2 \le \sum_{i=1}^{m} |a_{i\ell}|^2 = \|Te_\ell\|^2 \le \|T\|^2. \]
To see that $\|T\|$ defines a norm, note that homogeneity follows directly from the definition, and the triangle inequality $\|T_1 + T_2\| \le \|T_1\| + \|T_2\|$ is a consequence of
\[ \|(T_1 + T_2)x\| \le \|T_1 x\| + \|T_2 x\| \le \|T_1\| + \|T_2\|, \quad \|x\| = 1. \]
The property of coincidence follows directly from (9.7).

9.2.4 Corollary. A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is uniformly continuous.

Proof. This follows from $\|Tx - Ty\| = \|T(x - y)\| \le \|T\|\,\|x - y\|$, using the linearity of $T$.

Since $L(\mathbb{R}^n, \mathbb{R}^m)$ is a normed vector space, it is a metric space under the distance function $\rho(T_1, T_2) := \|T_1 - T_2\|$. Thus the methods of Chapter 8 apply. In particular, we have the following consequence of 9.2.3.

9.2.5 Corollary. Let $(X, d)$ be a metric space and let $F$ be a function from $X$ to $L(\mathbb{R}^n, \mathbb{R}^m)$. For each $x \in X$, let $[a_{ij}(x)]_{m \times n}$ denote the matrix of $F(x)$. Then $F$ is (uniformly) continuous with respect to the metric $\rho$ iff each function $a_{ij}(x)$ is (uniformly) continuous on $X$.

Proof. The matrix of $F(x) - F(y)$ is $[a_{ij}(x) - a_{ij}(y)]_{m \times n}$, hence, by (9.8),
\[ |a_{k\ell}(x) - a_{k\ell}(y)|^2 \le \|F(x) - F(y)\|^2 \le \sum_{i,j} \bigl[a_{ij}(x) - a_{ij}(y)\bigr]^2. \]
The assertion follows.
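The two-sided estimate (9.8) can be checked on a small matrix. The sketch below estimates the operator norm by power iteration on $A^tA$, a standard numerical technique that is not part of the text; the helper names, the iteration count, and the sample matrix are assumptions made for illustration.

```python
import math


def apply(matrix, x):
    """Matrix-vector product Tx for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def operator_norm(matrix, iterations=500):
    """Estimate sup{||Tx|| : ||x|| = 1} by power iteration on A^t A (a lower estimate)."""
    at = transpose(matrix)
    x = [1.0] * len(matrix[0])
    for _ in range(iterations):
        y = apply(at, apply(matrix, x))
        size = norm(y)
        if size == 0.0:
            return 0.0
        x = [c / size for c in y]
    return norm(apply(matrix, x))


def largest_entry(matrix):
    return max(abs(a) for row in matrix for a in row)


def frobenius(matrix):
    return math.sqrt(sum(a * a for row in matrix for a in row))


if __name__ == "__main__":
    A = [[1.0, 2.0, 0.0], [0.0, -3.0, 1.0]]   # a map from R^3 to R^2
    print(largest_entry(A), operator_norm(A), frobenius(A))
```

For this sample matrix the three printed numbers appear in increasing order, in agreement with (9.8). Power iteration can stall if the starting vector happens to be orthogonal to the dominant direction, so the estimate should be read as a sketch rather than a robust routine.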

Differentiation on Rn

297

Product Rules

We consider two product rules; additional product rules, as well as a quotient rule, are given in the exercises.

9.2.6 Theorem (Scalar Product Rule). Let $U$ be open in $\mathbb{R}^n$ and let $f : U \to \mathbb{R}^m$ and $\psi : U \to \mathbb{R}$ be differentiable at $a \in U$. Then
$$d(\psi f)_a(h) = \psi(a)\,df_a(h) + [\nabla\psi(a) \cdot h]\,f(a), \qquad h \in \mathbb{R}^n. \tag{9.9}$$

Proof. By 9.1.10, there exist functions $\eta(h)$ and $\mu(h)$, defined for $h \in \mathbb{R}^n$ with sufficiently small norm, such that
$$f(a + h) - f(a) - df_a(h) = \|h\|\eta(h), \qquad \lim_{h \to 0} \eta(h) = 0,$$
$$\psi(a + h) - \psi(a) - \nabla\psi(a) \cdot h = \|h\|\mu(h), \qquad \lim_{h \to 0} \mu(h) = 0.$$
Let $Th$ denote the right side of (9.9) and set
$$\nu(h) := (\psi f)(a + h) - (\psi f)(a) - Th = \psi(a + h)f(a + h) - \psi(a)f(a) - \psi(a)\,df_a(h) - [\nabla\psi(a) \cdot h]\,f(a).$$
Then $T$ is linear and
$$\begin{aligned}
\nu(h) &= \psi(a + h)\big[f(a + h) - f(a) - df_a(h)\big] + \big[\psi(a + h) - \psi(a) - \nabla\psi(a) \cdot h\big] f(a) + \big[\psi(a + h) - \psi(a)\big]\,df_a(h) \\
&= \psi(a + h)\|h\|\eta(h) + \|h\|\mu(h)f(a) + \big[\psi(a + h) - \psi(a)\big]\,df_a(h).
\end{aligned}$$
Since $\|df_a(h)\| \le \|df_a\|\,\|h\|$,
$$\frac{\|\nu(h)\|}{\|h\|} \le |\psi(a + h)|\,\|\eta(h)\| + |\mu(h)|\,\|f(a)\| + |\psi(a + h) - \psi(a)|\,\|df_a\|.$$
By continuity of $\psi$ at $a$, the right side of the last inequality tends to zero as $h \to 0$, proving the theorem.

9.2.7 Theorem (Dot Product Rule). Let $U$ be open in $\mathbb{R}^n$ and let $f, g : U \to \mathbb{R}^m$ be differentiable at $a \in U$. Then
$$d(f \cdot g)_a(h) = f(a) \cdot dg_a(h) + g(a) \cdot df_a(h), \qquad h \in \mathbb{R}^n. \tag{9.10}$$

Proof. Let $\eta(h)$ and $\mu(h)$ be functions defined for sufficiently small $\|h\|$ such that
$$f(a + h) - f(a) - df_a(h) = \|h\|\eta(h), \qquad \lim_{h \to 0} \eta(h) = 0,$$
$$g(a + h) - g(a) - dg_a(h) = \|h\|\mu(h), \qquad \lim_{h \to 0} \mu(h) = 0.$$


Let $Th$ denote the right side of (9.10) and define
$$\nu(h) := (f \cdot g)(a + h) - (f \cdot g)(a) - Th = f(a + h) \cdot g(a + h) - f(a) \cdot g(a) - f(a) \cdot dg_a(h) - g(a) \cdot df_a(h), \qquad h \in \mathbb{R}^n.$$
Then $T$ is linear and
$$\begin{aligned}
\nu(h) &= f(a + h) \cdot \big[g(a + h) - g(a) - dg_a(h)\big] + g(a) \cdot \big[f(a + h) - f(a) - df_a(h)\big] + \big[f(a + h) - f(a)\big] \cdot dg_a(h) \\
&= \|h\|\, f(a + h) \cdot \mu(h) + \|h\|\, g(a) \cdot \eta(h) + \big[df_a(h) + \|h\|\eta(h)\big] \cdot dg_a(h).
\end{aligned}$$
By the Cauchy–Schwarz and operator norm inequalities,
$$\frac{|\nu(h)|}{\|h\|} \le \|f(a + h)\|\,\|\mu(h)\| + \|g(a)\|\,\|\eta(h)\| + \|df_a\|\,\|dg_a\|\,\|h\| + \|\eta(h)\|\,\|dg_a\|\,\|h\|.$$
Since the right side of this inequality tends to zero as $h \to 0$, so does the left, completing the proof.
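The dot product rule (9.10) can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the maps `f` and `g`, the base point `a`, and the direction of `h` are hypothetical choices that do not appear in the text. It evaluates the ratio $|\nu(h)|/\|h\|$ from the proof for shrinking $h$ and prints the values, which should decrease toward zero.

```python
import math

def f(x):
    # hypothetical test map f : R^2 -> R^2 (not from the text)
    return (x[0] * x[1], math.sin(x[0]))

def g(x):
    # hypothetical test map g : R^2 -> R^2 (not from the text)
    return (x[0] + x[1] ** 2, math.exp(x[1]))

def jac_f(x):
    # Jacobian matrix of f, one row per component
    return ((x[1], x[0]), (math.cos(x[0]), 0.0))

def jac_g(x):
    # Jacobian matrix of g, one row per component
    return ((1.0, 2.0 * x[1]), (0.0, math.exp(x[1])))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def apply(matrix, h):
    # matrix-vector product, i.e. the differential applied to h
    return tuple(dot(row, h) for row in matrix)

def rule_rhs(a, h):
    # right side of (9.10): f(a) . dg_a(h) + g(a) . df_a(h)
    return dot(f(a), apply(jac_g(a), h)) + dot(g(a), apply(jac_f(a), h))

def remainder_ratio(a, h):
    # |nu(h)| / ||h|| with nu as in the proof of 9.2.7
    ah = tuple(p + q for p, q in zip(a, h))
    nu = dot(f(ah), g(ah)) - dot(f(a), g(a)) - rule_rhs(a, h)
    return abs(nu) / math.sqrt(dot(h, h))

a = (0.7, -0.3)
ratios = [remainder_ratio(a, (t, -2.0 * t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The ratio shrinks roughly in proportion to $\|h\|$ here because the chosen maps are smooth; the theorem itself only guarantees that it tends to zero.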

Continuity of the Differential

If $U$ is an open subset of $\mathbb{R}^n$ and $f : U \to \mathbb{R}^m$ is differentiable, then the mapping $x \mapsto df_x$ is a function from $U$ to $L(\mathbb{R}^n, \mathbb{R}^m)$. Since $L(\mathbb{R}^n, \mathbb{R}^m)$ is a metric space in the operator norm, the notion of continuity of this mapping is meaningful.

9.2.8 Definition. Let $U \subseteq \mathbb{R}^n$ be open. A function $f : U \to \mathbb{R}^m$ is said to be continuously differentiable on $U$ if $df_x$ exists and is continuous as a function of $x$ on $U$. In this case, $f$ is also said to be of class $C^1$ on $U$. A function $g$ is continuously differentiable on a subset $E$ of $\mathbb{R}^n$ if $g$ is the restriction to $E$ of a continuously differentiable function $f$ on an open set $U \supseteq E$. ♦

9.2.9 Theorem. Let $f = (f_1, \ldots, f_m) : U \to \mathbb{R}^m$, where $U \subseteq \mathbb{R}^n$ is open. Then $f$ is continuously differentiable on $U$ iff the partial derivatives $\partial_j f_i$, $1 \le i \le m$, $1 \le j \le n$, exist and are continuous on $U$.

Proof. If $f$ is continuously differentiable on $U$ then, by 9.2.5, the matrix $f'(x)$ has continuous entries. By 9.1.8, these entries are the partial derivatives of the components of $f$.

For the sufficiency, by 9.1.8 we may assume that $m = 1$, that is, $f$ is real-valued. Suppose then that the partial derivatives $\partial_j f$ exist and are continuous on $U$. Let $a \in U$ and $\varepsilon > 0$. Choose $r > 0$ such that $B_r(a) \subseteq U$ and fix $h = (h_1, \ldots, h_n)$ such that $\|h\| < r$. For $1 \le j \le n$ set
$$g_j(t) := f\big(a + h^j(t)\big), \qquad h^j(t) := (h_1, \ldots, h_{j-1}, t h_j, 0, \ldots, 0), \qquad 0 \le t \le 1.$$
Then
$$g_j(1) - g_j(0) = f\big(a + h^j(1)\big) - f\big(a + h^j(0)\big).$$
Also, by the mean value theorem and the chain rule, there exists $t_j \in (0, 1)$ such that
$$g_j(1) - g_j(0) = g_j'(t_j) = h_j\, \partial_j f\big(a + h^j(t_j)\big).$$
Since $h^j(0) = h^{j-1}(1)$ for $j \ge 2$, the sum over $j$ telescopes. Therefore,
$$f(a + h) - f(a) = \sum_{j=1}^{n} \big[g_j(1) - g_j(0)\big] = \sum_{j=1}^{n} h_j\, \partial_j f\big(a + h^j(t_j)\big),$$
hence
$$f(a + h) - f(a) - \nabla f(a) \cdot h = \sum_{j=1}^{n} \big[\partial_j f\big(a + h^j(t_j)\big) - \partial_j f(a)\big] h_j = \nu(h) \cdot h,$$
where
$$\nu(h) := \sum_{j=1}^{n} \big[\partial_j f\big(a + h^j(t_j)\big) - \partial_j f(a)\big] e^j.$$
Since $\lim_{h \to 0} h^j(t_j) = 0$, the continuity of $\partial_j f$ at $a$ implies that $\lim_{h \to 0} \nu(h) = 0$. Since $|\nu(h) \cdot h| \le \|\nu(h)\|\,\|h\|$,
$$\frac{|f(a + h) - f(a) - \nabla f(a) \cdot h|}{\|h\|} \le \|\nu(h)\| \to 0,$$
completing the proof.

Exercises

1.S Prove that for $T \in L(\mathbb{R}^n, \mathbb{R}^m)$, $\|T\| = \sup\{\|Tx\| : x \in \mathbb{R}^n,\ \|x\| \le 1\}$.

2. Let $T_1 \in L(\mathbb{R}^m, \mathbb{R}^k)$ and $T_2 \in L(\mathbb{R}^n, \mathbb{R}^m)$. Prove that $\|T_1 T_2\| \le \|T_1\|\,\|T_2\|$. (We use the standard notation $T_1 T_2$ for composition of linear operators.)

3.S (Quotient rule) Let $U$, $f$, and $\psi$ be as in 9.2.6. If $\psi(a) \ne 0$, prove that
$$d\left(\frac{f}{\psi}\right)_a(h) = \frac{\psi(a)\,df_a(h) - [\nabla\psi(a) \cdot h]\,f(a)}{\psi^2(a)}.$$

4. Find $dg_x(x)$, $\|x\| \ne 0$, if $g(x) =$

(a)S $\|x\|x$.

(b) $\|x\|^{-2} x$.

(c) $\|x\|^{-1} x$.


5. The cross product of vectors $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$ is defined by
$$a \times b = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} e^1 - \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} e^2 + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} e^3.$$
(See Exercise 1.6.9.) Let $f : U \to \mathbb{R}^3$ and $g : U \to \mathbb{R}^3$, where $U \subseteq \mathbb{R}^n$ is open. Define $f \times g$ on $U$ by $(f \times g)(x) = f(x) \times g(x)$. Prove that
$$d(f \times g)_a(h) = f(a) \times dg_a(h) + df_a(h) \times g(a).$$

6.S Let $V \subseteq \mathbb{R}^p$ and $W \subseteq \mathbb{R}^q$ be open, $f : V \to \mathbb{R}^k$, $g : W \to \mathbb{R}^k$, and $\alpha, \beta \in \mathbb{R}$. Define $F$ on $V \times W \subseteq \mathbb{R}^{p+q}$ by
$$F(x, y) = \alpha f(x) + \beta g(y), \qquad x \in V,\ y \in W.$$
If $f$ is differentiable at $a \in V$ and $g$ is differentiable at $b \in W$, prove that $F$ is differentiable at $c := (a, b)$ and
$$dF_c(h, k) = \alpha\, df_a(h) + \beta\, dg_b(k), \qquad h \in \mathbb{R}^p,\ k \in \mathbb{R}^q.$$

7. Let $V \subseteq \mathbb{R}^p$ and $W \subseteq \mathbb{R}^q$ be open and $f : V \to \mathbb{R}^k$, $g : W \to \mathbb{R}^k$. Define $F$ on $V \times W \subseteq \mathbb{R}^{p+q}$ by
$$F(x, y) = f(x) \cdot g(y), \qquad x \in V,\ y \in W.$$
If $f$ is differentiable at $a \in V$ and $g$ is differentiable at $b \in W$, prove that $F$ is differentiable at $c := (a, b)$ and
$$dF_c(h, k) = g(b) \cdot df_a(h) + f(a) \cdot dg_b(k), \qquad h \in \mathbb{R}^p,\ k \in \mathbb{R}^q.$$

8. Formulate and prove the analog of Exercise 7 for cross products.

9. Let $f : I \to \mathbb{R}^m$ be differentiable and $\|f\| = 1$ on an open interval $I$. Prove that $f(t)$ and $f'(t)$ are perpendicular for all $t$, that is, $f \cdot f' = 0$ on $I$.

10.S Let $f : [a, b] \to \mathbb{R}^m$ be differentiable and $v \notin A := f([a, b])$. Referring to Exercise 8.5.15 with $d(x, y) = \|x - y\|$, show that

(a) $d(A, v) = \|f(t_0) - v\|$ for some $t_0 \in [a, b]$.

(b) $\big(f(t_0) - v\big) \cdot f'(t_0) = 0$ if $t_0 \in (a, b)$.

11. A path $\varphi : [a, b] \to \mathbb{R}^n$ is piecewise smooth if there exists a partition $a_0 = a < a_1 < \cdots < a_n = b$ of $[a, b]$ such that $\varphi'$ exists and is continuous on each subinterval $[a_{j-1}, a_j]$. Let $U \subseteq \mathbb{R}^n$ be nonempty, open, and connected. Show that if $\varepsilon > 0$, then any pair of points of $U$ can be joined by a piecewise smooth path $\varphi$ in $U$ such that $\sup_{a_{j-1} \le t \le a_j} \|\varphi'(t)\| < \varepsilon$ for each $j$.

9.3 Further Properties of the Differential

In this section we prove two important theorems, the first of which is an n-dimensional version of the chain rule.

9.3.1 Chain Rule. Let $U \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^m$ be open and $f : U \to \mathbb{R}^m$, $g : V \to \mathbb{R}^k$ with $f(U) \subseteq V$. If $f$ is differentiable at $a \in U$ and $g$ is differentiable at $b := f(a)$, then $g \circ f : U \to \mathbb{R}^k$ is differentiable at $a$ and the linear transformation $d(g \circ f)_a : \mathbb{R}^n \to \mathbb{R}^k$ is the composition of the linear transformations $dg_b : \mathbb{R}^m \to \mathbb{R}^k$ and $df_a : \mathbb{R}^n \to \mathbb{R}^m$:
$$d(g \circ f)_a = dg_b \circ df_a.$$

Proof. Choose $r, s > 0$ such that $B_r(a) \subseteq U$ and $B_s(b) \subseteq V$. By 9.1.10 there exist functions $\eta : B_r(0) \to \mathbb{R}^m$ and $\nu : B_s(0) \to \mathbb{R}^k$ such that
$$f(a + h) = f(a) + df_a(h) + \|h\|\eta(h), \qquad \lim_{h \to 0} \eta(h) = 0, \tag{9.11}$$
and
$$g(b + k) = g(b) + dg_b(k) + \|k\|\nu(k), \qquad \lim_{k \to 0} \nu(k) = 0. \tag{9.12}$$
Set
$$k = f(a + h) - f(a) = df_a(h) + \|h\|\eta(h). \tag{9.13}$$
By the continuity of $f$ at $a$, $k \in B_s(0)$ for all sufficiently small $\|h\|$. For such $h$ set
$$\mu(h) = (g \circ f)(a + h) - (g \circ f)(a) - (dg_b \circ df_a)(h).$$
To complete the proof we show that
$$\lim_{h \to 0} \frac{\mu(h)}{\|h\|} = 0. \tag{9.14}$$
From (9.11), (9.12), and (9.13),
$$\mu(h) = g(b + k) - g(b) - dg_b(k) + dg_b\big[k - df_a(h)\big] = \|k\|\nu(k) + \|h\|\,dg_b(\eta(h)),$$
hence
$$\frac{\|\mu(h)\|}{\|h\|} \le \frac{\|k\|}{\|h\|}\,\|\nu(k)\| + \|dg_b(\eta(h))\|.$$
Since
$$\|k\| = \big\|df_a(h) + \|h\|\eta(h)\big\| \le \big[\|df_a\| + \|\eta(h)\|\big]\,\|h\|,$$
we have
$$\frac{\|\mu(h)\|}{\|h\|} \le \big[\|df_a\| + \|\eta(h)\|\big]\,\|\nu(k)\| + \|dg_b(\eta(h))\|.$$
Since $h \to 0$ implies $k \to 0$, (9.14) follows.


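9.3.2 Remark. Let $f$ be differentiable on $U$ and $g$ differentiable on $V$. Set $y = f(x)$ and $z = (g \circ f)(x) = g(y)$. Then the chain rule may be written in matrix form as $(g \circ f)'(x) = g'(y) f'(x)$ or
$$\begin{pmatrix} \dfrac{\partial z_1}{\partial x_1} & \cdots & \dfrac{\partial z_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial z_k}{\partial x_1} & \cdots & \dfrac{\partial z_k}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial z_1}{\partial y_1} & \cdots & \dfrac{\partial z_1}{\partial y_m} \\ \vdots & & \vdots \\ \dfrac{\partial z_k}{\partial y_1} & \cdots & \dfrac{\partial z_k}{\partial y_m} \end{pmatrix} \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{pmatrix}.$$
From this we obtain the familiar formulas
$$\frac{\partial z_\ell}{\partial x_j} = \sum_{i=1}^{m} \frac{\partial z_\ell}{\partial y_i} \frac{\partial y_i}{\partial x_j}, \qquad j = 1, \ldots, n, \quad \ell = 1, \ldots, k.$$
♦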

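9.3.3 Example. Let the partial derivatives of $u = f(x, y)$ and $v = g(x, y)$ exist on $\mathbb{R}^2$. If $x = r\cos\theta$ and $y = r\sin\theta$, we may use the chain rule to find $f_x$, $f_y$, $g_x$, and $g_y$ in terms of $u_r$, $v_r$, $u_\theta$, and $v_\theta$. Indeed, from 9.3.2,
$$\begin{pmatrix} u_r & u_\theta \\ v_r & v_\theta \end{pmatrix} = \begin{pmatrix} f_x & f_y \\ g_x & g_y \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},$$
hence
$$\begin{pmatrix} f_x & f_y \\ g_x & g_y \end{pmatrix} = \begin{pmatrix} u_r & u_\theta \\ v_r & v_\theta \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}^{-1} = \frac{1}{r} \begin{pmatrix} u_r & u_\theta \\ v_r & v_\theta \end{pmatrix} \begin{pmatrix} r\cos\theta & r\sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$
Thus, for example, $f_x = (\cos\theta)u_r - r^{-1}(\sin\theta)u_\theta$. ♦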
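The formula $f_x = (\cos\theta)u_r - r^{-1}(\sin\theta)u_\theta$ can be tested numerically. The sketch below is a minimal illustration under stated assumptions: the test function `f`, the sample point, and the finite-difference step are hypothetical choices, and the polar partial derivatives are approximated by central differences rather than computed exactly.

```python
import math

def f(x, y):
    # hypothetical smooth test function (not from the text)
    return x * x * y + math.sin(y)

def f_x(x, y):
    # exact partial derivative of the test function with respect to x
    return 2.0 * x * y

def u(r, theta):
    # u(r, theta) = f(r cos(theta), r sin(theta))
    return f(r * math.cos(theta), r * math.sin(theta))

def central_diff(func, t, step=1e-6):
    # symmetric difference quotient approximating func'(t)
    return (func(t + step) - func(t - step)) / (2.0 * step)

def fx_from_polar(r, theta):
    # f_x = cos(theta) u_r - sin(theta) u_theta / r, as in Example 9.3.3
    u_r = central_diff(lambda s: u(s, theta), r)
    u_theta = central_diff(lambda s: u(r, s), theta)
    return math.cos(theta) * u_r - math.sin(theta) * u_theta / r

r, theta = 1.5, 0.8
exact = f_x(r * math.cos(theta), r * math.sin(theta))
approx = fx_from_polar(r, theta)
print(exact, approx)
```

The two printed values should agree up to the finite-difference error; the check requires $r > 0$, matching the invertibility of the polar Jacobian matrix used in the example.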

9.3.4 Remark. The chain rule may be used to suggest a definition of tangent plane to a smooth surface. Let $f : U \to \mathbb{R}$ be differentiable on the open subset $U$ of $\mathbb{R}^n$ and let $c \in \mathbb{R}$. The set
$$S = \{x \in U : f(x) = c \text{ and } \nabla f(x) \ne 0\}$$
is called a level surface of $f$ in $\mathbb{R}^n$. Let $a \in S$ and let $\varphi : (-r, r) \to \mathbb{R}^n$ be a smooth path in $S$ such that $\varphi(0) = a$. The existence of such paths may be justified by the implicit function theorem, proved in the next section. Applying the chain rule to the identity $f(\varphi(t)) = c$, we see that
$$0 = (f \circ \varphi)'(0) = \nabla f(a) \cdot \varphi'(0).$$
Since $\varphi'(0)$ is tangent to the curve at $a$, $\nabla f(a)$ is perpendicular to $S$ at $a$. The tangent hyperplane to $S$ at $a$ is then defined as the set of all points $x \in \mathbb{R}^n$ such that $x - a$ is perpendicular to $\nabla f(a)$, that is,
$$(x - a) \cdot \nabla f(a) = 0.$$
For example, the hyperplane tangent at $a$ to the $(n - 1)$-dimensional sphere $\{x \in \mathbb{R}^n : \|x\|^2 = 1\}$ is the set of all $x$ such that
$$\sum_{i=1}^{n} 2a_i(x_i - a_i) = 0 \quad \text{or} \quad a \cdot x = 1.$$

The tangent hyperplane at $a$ to a surface $S$ may be seen as the best linear approximation to $S$ near $a$. ♦

The second main result of this section is an n-dimensional version of the mean value theorem of Chapter 4. While such a theorem is not generally available for vector-valued functions (Exercise 14), there is a version for scalar-valued functions. For its statement, we recall that the line segment in $\mathbb{R}^n$ from $a$ to $b$ is defined by
$$[a : b] = \{(1 - t)a + tb : 0 \le t \le 1\}.$$

9.3.5 Mean Value Theorem. Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}$ be differentiable on $U$. For each pair of points $a, b \in U$ with $[a : b] \subseteq U$ there exists $c \in [a : b]$ such that
$$f(b) - f(a) = df_c(b - a) = \nabla f(c) \cdot (b - a).$$

Proof. Set $\varphi(t) = (1 - t)a + tb$, $0 \le t \le 1$, and $g = f \circ \varphi$. Since $\varphi'(t) = b - a$, the chain rule and one-variable mean value theorem imply that
$$f(b) - f(a) = g(1) - g(0) = g'(t_0) = df_{\varphi(t_0)}(b - a)$$
for some $t_0 \in (0, 1)$. Setting $c = \varphi(t_0)$ completes the proof.

We conclude this section with two applications of the mean value theorem.

9.3.6 Theorem. Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}^m$ be continuously differentiable on $U$. Let $C \subseteq U$ be compact and convex and define $c := \sup_{z \in C} \|df_z\|$. Then $c < +\infty$ and
$$\|f(x) - f(y)\| \le c\,\|x - y\|, \qquad x, y \in C.$$

Proof. Since $z \mapsto df_z$ is continuous and $C$ is compact, $c < +\infty$. Let $x, y \in C$ and $u \in \mathbb{R}^m$. By 9.3.5 applied to the scalar function $g := u \cdot f$, there exists a point $w \in [x : y] \subseteq C$ such that
$$u \cdot \big[f(x) - f(y)\big] = g(x) - g(y) = dg_w(x - y) = u \cdot df_w(x - y).$$
Taking $u = f(x) - f(y)$ and using the Cauchy–Schwarz and the operator norm inequalities, we have
$$\|f(x) - f(y)\|^2 = \big[f(x) - f(y)\big] \cdot df_w(x - y) \le c\,\|f(x) - f(y)\|\,\|x - y\|.$$
Dividing by $\|f(x) - f(y)\|$ (when it is nonzero) completes the proof.


9.3.7 Corollary. Let $U \subseteq \mathbb{R}^n$ be open and connected and let $f : U \to \mathbb{R}^m$ be differentiable on $U$. If $df_x = 0$ for all $x \in U$, then $f$ is constant.

Proof. Let $x \in U$ and choose $r > 0$ such that $C_r(x) \subseteq U$. Since $C_r(x)$ is compact and convex, 9.3.6 implies that
$$\|f(x) - f(y)\| \le c\,\|x - y\|, \qquad y \in C_r(x), \qquad c := \sup_{z \in C_r(x)} \|df_z\|.$$
By hypothesis, $c = 0$, hence $f(y) = f(x)$ for all $y \in C_r(x)$. Thus $f$ is constant on any ball contained in $U$.

Now let $a \in U$ and define
$$U_a = \{x \in U : f(x) = f(a)\} \quad \text{and} \quad V_a = \{x \in U : f(x) \ne f(a)\}.$$
By the first paragraph, if $x \in U_a$, then a ball with center $x$ is contained in $U_a$. Therefore, $U_a$ is open. A similar argument shows that $V_a$ is open. Since $U$ is connected and $U_a \ne \emptyset$, $U_a = U$, that is, $f(x) = f(a)$ for all $x \in U$.

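Exercises

1.S Let $g, \varphi, \psi : \mathbb{R} \to \mathbb{R}$ be differentiable and let $f(x, y) = g\big(\varphi(x)\psi(y)\big)$. Find $\nabla f(x, y)$ in terms of $g$, $\varphi$, and $\psi$.

2. Let $\varphi : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R}^3 \to \mathbb{R}$ be differentiable and set $f(x, y) := g\big(x, \varphi(x + 2y), \varphi(x - 3y)\big)$. Find $f_y$ in terms of $g$ and $\varphi$.

3.S Let $g : \mathbb{R}^2 \to \mathbb{R}$ be differentiable, $a, b \in \mathbb{R}^n$, and set $f(x) = g(a \cdot x, b \cdot x)$. Find $\nabla f$.

4. Let the partial derivatives of $f : \mathbb{R}^2 \to \mathbb{R}$ exist and let $z = f(x, y) = f(r\cos\theta, r\sin\theta)$. Prove that
$$\left(r\,\frac{\partial z}{\partial r}\right)^2 + \left(\frac{\partial z}{\partial \theta}\right)^2 = \left[\left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2\right] r^2.$$

5. Let $F : \mathbb{R}^n \to \mathbb{R}$ be differentiable and set $f(x) = F(x, \ldots, x)$. Prove that $f'(x) = (1, \ldots, 1) \cdot \nabla F(x, \ldots, x)$.

6. Let $f(x, y)$ be continuously differentiable. Prove that
$$f(x, y) = \int_0^1 (x, y) \cdot \nabla f(tx, ty)\, t\, dt + \int_0^1 f(tx, ty)\, dt.$$

7.S Let $f : U \to \mathbb{R}^m$ be differentiable on an open set $U \subseteq \mathbb{R}^n$. Find $(T \circ f)'(x)$ for $T \in L(\mathbb{R}^m, \mathbb{R}^k)$.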


8. Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable and $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ with $a_n \ne 0$. Prove that $a \cdot \nabla f(x) = 0$ for all $x \in \mathbb{R}^n$ iff there exists a differentiable function $g : \mathbb{R}^{n-1} \to \mathbb{R}$ such that
$$f(x_1, x_2, \ldots, x_n) = g\big(x_1 - b_1 x_n,\ x_2 - b_2 x_n,\ \ldots,\ x_{n-1} - b_{n-1} x_n\big),$$
where $b_j = a_j / a_n$, $1 \le j \le n - 1$.

9. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ smooth. Let $\alpha, \beta : I \to \mathbb{R}^n$ be smooth paths in $U$ such that $(\nabla f) \circ \alpha = \alpha'$, $\alpha(t_1) = \beta(t_1)$, and $\|\alpha'(t_1)\| = \|\beta'(t_1)\| = 1$ for some $t_1 \in I$ (that is, $\alpha$ and $\beta$ both have unit speed at the intersection). Show that $(f \circ \alpha)'(t_1) \ge (f \circ \beta)'(t_1)$.

10. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$. If $u \in \mathbb{R}^n$ with $\|u\| = 1$, define the directional derivative of $f$ at $a \in U$ in the direction of $u$ by
$$D_u f(a) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t}.$$

(a)S Show that if $f$ is differentiable at $a$, then $D_u f(a)$ exists and equals $u \cdot \nabla f(a)$.

(b) Show that if $D_u f$ exists, then $D_{-u} f$ exists and $D_{-u} f = -D_u f$.

(c)S Define
$$f(x, y) = \begin{cases} \dfrac{xy^2}{x^2 + y^4} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{otherwise.} \end{cases}$$
Show that $D_u f(0, 0)$ exists for each $u$ but $f$ is not even continuous at $(0, 0)$.

(d) Find all unit vectors $u$ such that $D_u (xy)^{1/3}$ exists at $(0, 0)$.

(e) Find all unit vectors $u$ such that $D_u |x + y|$ exists at $(x_0, -x_0)$.

(f) Find all unit vectors $u$ such that $D_u (x + y)^{1/3}$ exists at $(0, 0)$.

11. Let $z = F(x, y)$, where $x = x(u, v)$, $y = y(u, v)$, $z = z(u, v)$, and the partial derivatives of these functions exist on $\mathbb{R}^2$. Suppose that $x_u y_v - y_u x_v \ne 0$. Find $z_x$ and $z_y$ in terms of $z_u$, $z_v$, $x_u$, $x_v$, $y_u$, and $y_v$.

12.S Let $f$ and $f_x$ be continuous on $[a, b] \times [c, d]$. Use the mean value theorem to prove that
$$\frac{d}{dx} \int_a^b f(t, x)\, dt = \int_a^b f_x(t, x)\, dt, \qquad c \le x \le d.$$

13. Let $f$ and $f_x$ be continuous on $\mathbb{R}^2$ and $u(x)$, $v(x)$ differentiable on $\mathbb{R}$. Use Exercises 5 and 12 to prove that
$$\frac{d}{dx} \int_{u(x)}^{v(x)} f(t, x)\, dt = \int_{u(x)}^{v(x)} f_x(t, x)\, dt + f\big(v(x), x\big) v'(x) - f\big(u(x), x\big) u'(x).$$


14. Show that the mean value theorem does not generally hold for vector-valued functions.

15.S A function $f : \mathbb{R}^n \setminus \{0\} \to \mathbb{R}$ is homogeneous of degree $p > 0$ if $f(tx) = t^p f(x)$ for all $t > 0$ and all $x \ne 0$. Prove that a differentiable function $f$ is homogeneous of degree $p$ iff $x \cdot \nabla f(x) = p f(x)$ for every $x \ne 0$.

16. Prove the following generalization of the Cauchy mean value theorem: Let $U \subseteq \mathbb{R}^n$ be open and convex and let $f, g : U \to \mathbb{R}$ be differentiable on $U$. Then, for each pair of points $a, b \in U$, there exists $c \in [a : b]$ such that
$$\big[f(b) - f(a)\big]\, \nabla g(c) \cdot (b - a) = \big[g(b) - g(a)\big]\, \nabla f(c) \cdot (b - a).$$

17.S Let $f : U \to \mathbb{R}^m$ be continuously differentiable on the open set $U \subseteq \mathbb{R}^n$ and let $C$ be a compact convex subset of $U$. Prove that
$$\|f(x) - f(y) - df_y(x - y)\| \le \sup_{z \in C} \|df_z - df_y\|\, \|x - y\|, \qquad x, y \in C,$$
and that the supremum is finite.

18. Let $f(x, y) = (x^2 - y^2, 2xy)$ and $(a, b) \ne (0, 0)$. Show that if the functions $\varphi, \psi : (-1, 1) \to \mathbb{R}^2$ are differentiable and $\varphi(0) = \psi(0) = (a, b)$, then
$$\frac{(f \circ \varphi)'(0) \cdot (f \circ \psi)'(0)}{\|(f \circ \varphi)'(0)\|\, \|(f \circ \psi)'(0)\|} = \frac{\varphi'(0) \cdot \psi'(0)}{\|\varphi'(0)\|\, \|\psi'(0)\|},$$
that is, the angle between the curves $\varphi$ and $\psi$ at their intersection is preserved under the transformation $f$.

9.4 Inverse Function Theorem

The one-dimensional inverse function theorem of Section 4.4 has the following n-dimensional extension.

9.4.1 Inverse Function Theorem. Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}^n$ be continuously differentiable on $U$. If $J_f(a) \ne 0$ for some $a \in U$, then there exist open sets $U_a \subseteq U$ and $V_a = f(U_a)$ with $a \in U_a$ such that $f$ is one-to-one on $U_a$ and $f^{-1} : V_a \to U_a$ is continuously differentiable. Moreover,
$$(df_x)^{-1} = d(f^{-1})_y, \qquad x \in U_a,\ y := f(x). \tag{9.15}$$


The conclusion of the theorem may be summarized by saying that $f$ has a continuously differentiable local inverse at $a$. Of course, since $f$ need not be one-to-one on $U$, $f$ may not have a "global" inverse. The proof of the theorem requires two lemmas. The first is of some independent interest.

9.4.2 Lemma (Contraction Mapping Principle). Let $(X, d)$ be a complete metric space and let $\varphi : X \to X$ be a continuous function such that, for some $0 \le c < 1$,
$$d\big(\varphi(x), \varphi(y)\big) \le c\, d(x, y) \quad \text{for all } x, y \in X.$$
Then there exists a unique point $x \in X$ such that $\varphi(x) = x$.

Proof. Choose any point $x_0$ in $X$ and define a sequence $\{x_n\}$ recursively by $x_n = \varphi(x_{n-1})$, $n \ge 1$. By hypothesis,
$$d(x_{k+1}, x_k) \le c\, d(x_k, x_{k-1}) \le c^2 d(x_{k-1}, x_{k-2}) \le \cdots \le c^k d(x_1, x_0).$$
Thus, by the triangle inequality, for $m > n$
$$d(x_n, x_m) \le \sum_{k=n}^{m-1} d(x_k, x_{k+1}) \le d(x_1, x_0) \sum_{k=n}^{\infty} c^k.$$
Since $c < 1$, the series $\sum_{k=1}^{\infty} c^k$ converges, hence the sum on the right tends to zero as $n \to \infty$. It follows that $\{x_n\}$ is a Cauchy sequence and therefore converges to some $x \in X$. Letting $n \to +\infty$ in the equation $x_n = \varphi(x_{n-1})$ yields $\varphi(x) = x$. If also $\varphi(y) = y$, then $d(x, y) = d\big(\varphi(x), \varphi(y)\big) \le c\, d(x, y)$, which is possible only if $x = y$.

9.4.3 Lemma. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}^n$ continuously differentiable. If $a \in U$ with $J_f(a) \ne 0$, then there exists $r > 0$ such that the linear transformation $df_x$ is invertible for each $x \in B_r(a)$.

Proof. Since $f'$ is continuous, its entries are continuous, hence $J_f(x)$ is a continuous function of $x$. Since $J_f(a) \ne 0$, there exists $r > 0$ such that $J_f(x) \ne 0$ on $B_r(a) \subseteq U$. Since a linear transformation on $\mathbb{R}^n$ is invertible iff the determinant of its matrix is not zero, $df_x$ is invertible for $x \in B_r(a)$.

Proof of the inverse function theorem. By 9.4.3, there exists an $r > 0$ such that $C_r(a) \subseteq U$ and $df_x$ is invertible for each $x$ in an open set $W_r$ containing $C_r(a)$. Let $T = df_a$ and define $g = T^{-1} \circ f$ on $W_r$. Then $dg_a = T^{-1} \circ df_a = I_n$, the identity transformation on $\mathbb{R}^n$. Now apply 9.3.6 to the function $g(x) - x$ on $C_r(a)$. The constant $c$ in that theorem is
$$\sup\{\|dg_z - dg_a\| : z \in C_r(a)\},$$


which we can make less than $1/2$ by taking $r$ sufficiently small, using the continuity of the function $z \mapsto dg_z$ at $a$. Thus
$$\|g(x) - g(y) - (x - y)\| \le \tfrac{1}{2}\|x - y\|, \qquad x, y \in C_r(a). \tag{9.16}$$
Since
$$\|x - y\| - \|g(x) - g(y)\| \le \|g(x) - g(y) - (x - y)\|,$$
we see from (9.16) that
$$\tfrac{1}{2}\|x - y\| \le \|g(x) - g(y)\|, \qquad x, y \in B_r(a).$$
In particular, $g$ is one-to-one on $B_r(a)$.

Next, we use 9.4.2 to show that $g(B_r(a))$ is open. Let $c \in B_r(a)$, $d = g(c)$, and choose $s > 0$ so that $C_s(c) \subseteq B_r(a)$. We claim that
$$B_{s/2}(d) \subseteq g\big(C_s(c)\big) \subseteq g\big(B_r(a)\big). \tag{9.17}$$
The second inclusion is clear. For the first, let $u \in B_{s/2}(d)$. To show that $u \in g\big(C_s(c)\big)$, define
$$\varphi(x) = x - g(x) + u, \qquad x \in C_s(c).$$
Then
$$\begin{aligned}
\|c - \varphi(x)\| &= \|g(x) - g(c) - (x - c) + d - u\| \\
&\le \|g(x) - g(c) - (x - c)\| + \|d - u\| \\
&\le \tfrac{1}{2}\|x - c\| + \|d - u\| \qquad \text{by (9.16)} \\
&< s/2 + s/2 = s,
\end{aligned}$$
so $\varphi\big(C_s(c)\big) \subseteq B_s(c)$. Moreover, using (9.16) again we have
$$\|\varphi(x) - \varphi(y)\| \le \tfrac{1}{2}\|x - y\|, \qquad x, y \in C_s(c).$$
By Lemma 9.4.2, $\varphi(x) = x$ for some $x \in B_s(c)$, hence $u = g(x) \in g\big(B_s(c)\big)$. Since $u$ was arbitrary, (9.17) holds. Since $d \in g\big(B_r(a)\big)$ was arbitrary, $g\big(B_r(a)\big)$ is open.

Next, we show that $g^{-1} : g\big(B_r(a)\big) \to B_r(a)$ is differentiable at $b := g(a)$. Since $b \in g\big(B_r(a)\big)$ and $g\big(B_r(a)\big)$ is open, $b + k \in g\big(B_r(a)\big)$ for sufficiently small $\|k\|$, that is, for each such $k$, $b + k = g(a + h)$ for some $\|h\| < r$. By (9.16),
$$\|h\| - \|k\| \le \|h - k\| = \|g(a + h) - g(a) - h\| \le \tfrac{1}{2}\|h\|,$$


hence $\|k\| \ge \tfrac{1}{2}\|h\|$. Since $g^{-1}(b + k) = a + h$ and $g^{-1}(b) = a$, recalling that $dg_a = I_n$ we have
$$\frac{\|g^{-1}(b + k) - g^{-1}(b) - I_n k\|}{\|k\|} = \frac{\|h - k\|}{\|k\|} \le 2\, \frac{\|g(a + h) - g(a) - dg_a(h)\|}{\|h\|}.$$
Since $k \to 0$ implies that $h \to 0$, which in turn implies that the right side of the above inequality tends to zero, we see that $g^{-1}$ is differentiable at $b$ with derivative $I_n$.

Now set $U_a = B_r(a)$ and $V_a = (T \circ g)(U_a)$. Since $T$ is invertible, it is a homeomorphism, hence $V_a$ is open. Moreover, since $g$ is one-to-one on $U_a$ and maps $U_a$ onto $g(U_a)$, $f = T \circ g$ is one-to-one on $U_a$ and maps $U_a$ onto $V_a$. Since $f^{-1} = g^{-1} \circ T^{-1}$, the chain rule implies that $f^{-1}$ is differentiable at $f(a) = Tb$.

Now observe that the entire above argument may be used at any point $x$ of $U_a$, since all that is needed is the invertibility of $df_x$. Therefore, $f^{-1}$ is differentiable on $V_a$. To verify (9.15) apply the chain rule to $f^{-1} \circ f = I_n$:
$$d(f^{-1})_y \circ df_x = d(f^{-1} \circ f)_x = d(I_n)_x = I_n, \qquad y = f(x) \in V_a.$$

9.4.4 Corollary. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}^n$ continuously differentiable with $J_f(x) \ne 0$ for each $x \in U$. Then $f$ is an open map, that is, if $E \subseteq U$ is open, then $f(E)$ is open. In particular, $f(U)$ is open.

Proof. In the notation of the theorem, $f(E)$ is the union of the open sets $f(U_a \cap E)$, $a \in E$.

Since continuous differentiability is a local property, we have

9.4.5 Global Inverse Function Theorem. Under the conditions of the preceding corollary, if $f$ is also one-to-one on $U$, then $f^{-1} : f(U) \to U$ is continuously differentiable.

9.4.6 Example. The function
$$(x, y) = f(r, \theta) = (r\cos\theta, r\sin\theta), \qquad r > 0,\ \theta \in \mathbb{R},$$
has Jacobian $r$, hence is locally invertible at each point of its domain. Since the function is not one-to-one, it has no global inverse. However, if the domain of $f$ is suitably restricted, say by requiring $\theta_0 < \theta < \theta_0 + 2\pi$, then $f$ is one-to-one on the resulting open set
$$U_{\theta_0} := (0, +\infty) \times (\theta_0, \theta_0 + 2\pi).$$
By 9.4.5, the restriction $g$ of $f$ to $U_{\theta_0}$ has a continuously differentiable inverse $\big(r(x, y), \theta(x, y)\big) = g^{-1}(x, y)$ on the open set $V_{\theta_0} = f(U_{\theta_0})$, obtained by removing the ray $(r, \theta_0)$, $r \ge 0$, from $\mathbb{R}^2$. Clearly, $r(x, y) = \sqrt{x^2 + y^2}$. The function $\theta(x, y)$ is called the argument of $(x, y)$ (determined by $\theta_0$) and is denoted by $\arg_{\theta_0}(x, y)$. Thus
$$g^{-1}(x, y) = \Big(\sqrt{x^2 + y^2},\ \arg_{\theta_0}(x, y)\Big) \quad \text{on } V_{\theta_0}.$$
For example, if $\theta_0 = -\pi$, then $\arg_{\theta_0}(x, y) = \arctan(y/x)$ for $x > 0$.

♦
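The branch $g^{-1}$ of Example 9.4.6 can be computed directly. The Python sketch below is one possible way to do so, not a construction taken from the text: it assumes the standard library function `math.atan2` (which returns an angle in $(-\pi, \pi]$) and shifts that angle into the interval $(\theta_0, \theta_0 + 2\pi)$. The names `arg_branch`, `g`, and `g_inverse` are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi

def arg_branch(x, y, theta0):
    # angle of (x, y) chosen in (theta0, theta0 + 2*pi); assumes (x, y) != (0, 0)
    t = math.atan2(y, x)
    shift = (t - theta0) % TWO_PI
    if shift == 0.0:
        # the ray at angle theta0 is removed from the domain V_theta0
        raise ValueError("point lies on the excluded ray")
    return theta0 + shift

def g(r, theta):
    # polar coordinate map restricted to U_theta0
    return (r * math.cos(theta), r * math.sin(theta))

def g_inverse(x, y, theta0):
    # (r(x, y), theta(x, y)) as in Example 9.4.6
    return (math.hypot(x, y), arg_branch(x, y, theta0))

theta0 = math.pi / 2
x, y = g(2.0, 5.0)            # 5.0 lies in (theta0, theta0 + 2*pi)
print(g_inverse(x, y, theta0))
```

With $\theta_0 = -\pi$ and $x > 0$ the function `arg_branch` agrees with $\arctan(y/x)$, as stated at the end of the example.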

FIGURE 9.1: The domain of $\arg_{\theta_0}$.

If a function $f$ has a nonzero Jacobian on an open set $U$ and if $f$ is one-to-one on an open subset $U_0$ of $U$, then the inverse of the restriction of $f$ to $U_0$ is called a branch of $f^{-1}$ (even though a global $f^{-1}$ may not exist). In the preceding example, $g^{-1}$ is one of infinitely many branches of $f^{-1}$.

9.4.7 Example. The function $(x, y) = f(u, \theta) = (e^u \cos\theta, e^u \sin\theta)$, where $(u, \theta) \in \mathbb{R}^2$, has Jacobian $e^{2u}$, hence is locally invertible at each point of $\mathbb{R}^2$. The set $U_{\theta_0} = \mathbb{R} \times (\theta_0, \theta_0 + 2\pi)$ is open, and $f$ restricted to $U_{\theta_0}$ is one-to-one. Therefore, the corresponding branch of $f^{-1}$ is continuously differentiable on $f(U_{\theta_0})$, which is the set $V_{\theta_0}$ of 9.4.6. The inverse may be given explicitly by
$$u = \ln\sqrt{x^2 + y^2}, \qquad \theta = \arg_{\theta_0}(x, y).$$
♦

9.4.8 Example. Let $(u, v) = f(x, y) = (2x^2 - 3y^2,\ 3x^2 - 2y^2)$. The Jacobian is nonzero on the open set $U = \{(x, y) : xy \ne 0\}$. Solving the equations for $x^2$ and $y^2$ yields
$$x^2 = \frac{3v - 2u}{5} \quad \text{and} \quad y^2 = \frac{2v - 3u}{5}.$$
Restricting $f$ to each of the open quadrants of $\mathbb{R}^2$, we obtain four natural branches of $f^{-1}$, each defined on the open set
$$V := \{(u, v) : 3v > 2u \text{ and } 2v > 3u\} = \{(u, v) : v > \max\{2u/3,\ 3u/2\}\},$$
and each of the form
$$f^{-1}(u, v) = \left(\pm\sqrt{\frac{3v - 2u}{5}},\ \pm\sqrt{\frac{2v - 3u}{5}}\right), \qquad (u, v) \in V.$$
For example, in the open second quadrant of the $x, y$ plane, one chooses the minus sign in the first coordinate and the plus sign in the second. ♦
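The fixed-point map $\varphi(x) = x - g(x) + u$ from the proof of the inverse function theorem can be tried on the function of Example 9.4.8. The following Python sketch is an illustration under stated assumptions rather than part of the text's argument: the base point $(1, 1)$, the target value `w`, and the iteration count are hypothetical choices. With $T = f'(1, 1)$ and $g = T^{-1} \circ f$, iterating $p \mapsto p - T^{-1}(f(p) - w)$ is the same as iterating $\varphi$ with $u = T^{-1}w$, and the limit is compared with the explicit first-quadrant branch.

```python
import math

def f(p):
    # the map of Example 9.4.8
    x, y = p
    return (2.0 * x * x - 3.0 * y * y, 3.0 * x * x - 2.0 * y * y)

def T_inv(w):
    # inverse of T = f'(1, 1) = [[4, -6], [6, -4]], whose determinant is 20
    u, v = w
    return ((-4.0 * u + 6.0 * v) / 20.0, (-6.0 * u + 4.0 * v) / 20.0)

def local_inverse(w, start=(1.0, 1.0), steps=50):
    # iterate phi(p) = p - T^{-1}(f(p) - w); a fixed point satisfies f(p) = w
    p = start
    for _ in range(steps):
        fu, fv = f(p)
        cx, cy = T_inv((fu - w[0], fv - w[1]))
        p = (p[0] - cx, p[1] - cy)
    return p

def explicit_branch(w):
    # first-quadrant branch of f^{-1} from Example 9.4.8
    u, v = w
    return (math.sqrt((3.0 * v - 2.0 * u) / 5.0),
            math.sqrt((2.0 * v - 3.0 * u) / 5.0))

w = (-0.9, 1.1)               # a point of V near f(1, 1) = (-1, 1)
print(local_inverse(w), explicit_branch(w))
```

For this particular $f$ one finds $g(x, y) = (x^2/2,\ y^2/2)$, so $\varphi$ has derivative $\operatorname{diag}(1 - x, 1 - y)$ and is a contraction near $(1, 1)$; for target values far from $f(1, 1)$ the iteration need not converge.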


Exercises

1. Find the largest set at each point of which the inverse function theorem guarantees a local $C^1$ inverse of $f$, where $f(x) =$

(a)S $(x + y,\ xy)$.

(b)S $(\sin x + \cos y,\ \cos x + \sin y)$.

(c) $\big(y e^{-x},\ x e^{-y}\big)$.

(d) $(\sin x + \sin y,\ \cos x - \cos y)$.

(e)S $\big(y e^{-2x^2},\ y e^{3x^2}\big)$.

(f)S $\left(\ln\sqrt{xy},\ \dfrac{1}{x^2 + y^2}\right)$, $x, y > 0$.

(g) $(xy,\ x^2 - y^2)$.

(h) $\left(\dfrac{x}{1 + x^2 + y^2},\ \dfrac{y}{1 + x^2 + y^2}\right)$.

(i)S $(x^2 + y^2,\ xy)$.

(j)S $(xy^2,\ x^2 z,\ yz^2)$.

(k) $\big(y e^{-x^2},\ y e^{x^2}\big)$.

(l) $(x/y,\ y/z,\ z/x)$, $xyz \ne 0$.

2. Find a local inverse of the function in the specified part below of Exercise 1 about the point $(a, b)$ and find $d(f^{-1})_{(u,v)}$.

(i)S (a), $a > b > 0$.

(ii) (e), $ab \ne 0$.

(iii)S (f), $a > b > 0$.

(iv) (g), $a > b > 0$.

(v) (i), $a, b > 0$.

(vi) (k), $a, b > 0$.

Show that for part (a) in Exercise 1, no inverse is possible on $(0, +\infty)^2$.

3. Let $f(\rho, \phi, \theta) = \big(x(\rho, \phi, \theta),\ y(\rho, \phi, \theta),\ z(\rho, \phi, \theta)\big)$ be the spherical coordinate transformation of Exercise 9.1.5. Find an explicit formula for the branch of $f^{-1}$ on the set $\{(\rho, \phi, \theta) : \rho > 0,\ 0 < \phi < \pi,\ 0 < \theta < \pi\}$.

4.S Let
$$f(x, y) := \left(\frac{y}{x^2 + y^2},\ \frac{x}{x^2 + y^2}\right), \qquad (x, y) \ne (0, 0).$$
Show that $f = f^{-1}$ and find $J_f$.

5. By considering the function
$$f(x) = \begin{cases} x + x^2 \sin(1/x) & \text{if } x \ne 0, \\ 0 & \text{otherwise,} \end{cases}$$
show that the hypothesis in the statement of the inverse function theorem that $df$ be continuous on $U$ cannot be removed.

6. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}^n$ of class $C^1$ such that for some $c > 0$, $\|f(x) - f(y)\| \ge c\,\|x - y\|$ for all $x, y \in U$. Prove that $df_x$ is invertible for each $x \in U$. Conclude that $f : U \to f(U)$ is a homeomorphism.

9.5 Implicit Function Theorem

The implicit function theorem is one of the most important applications of the inverse function theorem. The theorem gives conditions under which an equation of the form $F(x, y) = 0$ may be solved locally for $y$ in terms of $x$. The resulting function is then said to be implicitly defined by the equation $F(x, y) = 0$. The following simple example illustrates the basic idea.

9.5.1 Example. Let $F(x, y, z) = x^2 + y^2 + z^2 - 1$. Consider the problem of finding all points $(a, b, c)$ with $F(a, b, c) = 0$ such that the equation $F(x, y, z) = 0$ has a continuously differentiable solution $z = z(x, y)$ satisfying $z(a, b) = c$. The key fact here is that such a solution is possible if $F_z(a, b, c)\ (= 2c) \ne 0$. Indeed, in this case $a^2 + b^2 = 1 - c^2 < 1$, hence $x^2 + y^2 < 1$ for all $(x, y, z)$ sufficiently near $(a, b, c)$ that satisfy $F(x, y, z) = 0$. For such points the solution
$$z(x, y) = \pm\sqrt{1 - x^2 - y^2}$$
is continuously differentiable, and if the sign chosen is that of $c$, then $z(x, y)$ is the unique solution satisfying $z(a, b) = c$. ♦

Notation. For the statement and proof of the implicit function theorem we use the following conventions: For points $z \in \mathbb{R}^{n+m}$ we write
$$z = (x, y) = (x_1, \ldots, x_n, y_1, \ldots, y_m), \qquad x \in \mathbb{R}^n,\ y \in \mathbb{R}^m.$$
For a differentiable function $F(z) = F(x, y) = \big(F_1(x, y), \ldots, F_m(x, y)\big)$, we denote by $F_y(x, y)$ the $m \times m$ matrix with $(i, j)$th entry
$$\frac{\partial F_i}{\partial y_j}(x, y).$$
♦

9.5.2 Implicit Function Theorem. Let $U$ be an open subset of $\mathbb{R}^{n+m}$, let $F = (F_1, \ldots, F_m) : U \to \mathbb{R}^m$ be continuously differentiable, and let $F(a, b) = 0$ for some $(a, b) \in U$. If
$$\frac{\partial(F_1, \ldots, F_m)}{\partial(y_1, \ldots, y_m)} = \det F_y(a, b) \ne 0,$$
then there is an open set $V_a \subseteq \mathbb{R}^n$ containing $a$ and a unique continuously differentiable mapping $f : V_a \to \mathbb{R}^m$ such that $f(a) = b$ and $F\big(x, f(x)\big) = 0$ for every $x \in V_a$.

Proof. Define $G : U \to \mathbb{R}^{n+m}$ by
$$G(x, y) = \big(x, F(x, y)\big) = \big(x, F_1(x, y), \ldots, F_m(x, y)\big).$$


Then $G$ is continuously differentiable, and
$$G'(x, y) = \begin{pmatrix} I_{n \times n} & O_{n \times m} \\ A & F_y \end{pmatrix},$$
where $I_{n \times n}$ is the $n \times n$ identity matrix, $O_{n \times m}$ is the $n \times m$ zero matrix, and $A$ is an $m \times n$ matrix of partial derivatives of the components of $F$ with respect to $x$. Therefore, $J_G = \det F_y$. Since $\det F_y(a, b) \ne 0$, by the inverse function theorem there exists an open set $W \subseteq U$ containing $(a, b)$ and an open set $V \subseteq \mathbb{R}^{n+m}$ containing $G(a, b) = (a, 0)$ such that $G(W) = V$ and
$$H = (H_1, \ldots, H_n, H_{n+1}, \ldots, H_{n+m}) := G^{-1} : V \to W$$
is continuously differentiable. Note that the identities $H\big(G(x, y)\big) = (x, y)$ and $G\big(H(x, y)\big) = (x, y)$ imply, respectively, that
$$\big(H_{n+1}(G(x, y)), \ldots, H_{n+m}(G(x, y))\big) = y, \qquad (x, y) \in W, \tag{9.18}$$
and
$$F\big(x, H_{n+1}(x, y), \ldots, H_{n+m}(x, y)\big) = y, \qquad (x, y) \in V. \tag{9.19}$$
Now let $V_a = \{x \in \mathbb{R}^n : (x, 0) \in V\}$. Then $V_a$ is open and contains $a$. Define $f$ on $V_a$ by
$$f(x) = \big(H_{n+1}(x, 0), \ldots, H_{n+m}(x, 0)\big).$$
Then $f$ is continuously differentiable, and since $(a, 0) = G(a, b)$, (9.18) implies that
$$f(a) = \big(H_{n+1}(a, 0), \ldots, H_{n+m}(a, 0)\big) = b.$$
Furthermore, (9.19) implies that $F(x, f(x)) = 0$ on $V_a$. This establishes the existence of $f$.

To show uniqueness, assume that $F(x, g(x)) = 0$ for some function $g : V_a \to \mathbb{R}^m$. Then
$$G(x, f(x)) = \big(x, F(x, f(x))\big) = \big(x, F(x, g(x))\big) = G(x, g(x)).$$
Since $G$ is one-to-one, $f(x) = g(x)$.

9.5.3 Example. The point $(x, y, u, v) = (-1, 1, 1, 1)$ is a solution of the system
$$\begin{aligned} F(x, y, u, v) &:= xu^2 + y = 0 \\ G(x, y, u, v) &:= xy^2 + u^2 v^2 = 0, \end{aligned}$$
and at that point
$$\frac{\partial(F, G)}{\partial(u, v)} = 4xu^3 v \ne 0.$$
By the implicit function theorem, there are $C^1$ functions $u(x, y)$ and $v(x, y)$ defined on a ball $B_r(-1, 1)$ that satisfy the above system with $u(-1, 1) = v(-1, 1) = 1$. If $r < 1$, then $(x, y) \in B_r(-1, 1)$ implies that $x < 0 < y$ and we have the explicit solution
$$u = \sqrt{-y/x}, \qquad v = -x\sqrt{y}.$$
♦


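9.5.4 Remark. Let $f = (f_1, \ldots, f_m)$ be the function in the statement of the implicit function theorem. Set $y = f(x)$ and $w = F(x, y) = F(z)$. Applying the chain rule to the identity $F\big(x, f(x)\big) = 0$ yields
$$\frac{\partial w_i}{\partial x_j} + \frac{\partial w_i}{\partial y_1}\frac{\partial y_1}{\partial x_j} + \cdots + \frac{\partial w_i}{\partial y_m}\frac{\partial y_m}{\partial x_j} = 0, \qquad i = 1, \ldots, m, \quad j = 1, \ldots, n.$$
This may be written in matrix form as
$$\begin{pmatrix} \dfrac{\partial w_1}{\partial y_1} & \cdots & \dfrac{\partial w_1}{\partial y_m} \\ \vdots & & \vdots \\ \dfrac{\partial w_m}{\partial y_1} & \cdots & \dfrac{\partial w_m}{\partial y_m} \end{pmatrix} \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{pmatrix} = - \begin{pmatrix} \dfrac{\partial w_1}{\partial x_1} & \cdots & \dfrac{\partial w_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial w_m}{\partial x_1} & \cdots & \dfrac{\partial w_m}{\partial x_n} \end{pmatrix},$$
or, in the above notation, as $F_y(z) f'(x) = -F_x(z)$. Therefore,
$$f'(x) = -F_y(z)^{-1} F_x(z),$$
which shows that the partial derivatives of the solution $f$ in the implicit function theorem may be calculated by carrying out a matrix inversion. However, this is practical only for small dimensions, and even in this case it is often easier to apply the chain rule directly and then use Cramer's rule. The next example illustrates the latter approach. ♦

9.5.5 Example. Suppose $(x_0, y_0, u_0, v_0)$ satisfies the system
$$F(x, y, u, v) = G(x, y, u, v) = 0, \tag{9.20}$$
where $F$ and $G$ are $C^1$ in a neighborhood of $(x_0, y_0, u_0, v_0)$ and
$$\frac{\partial(F, G)}{\partial(u, v)}(x_0, y_0, u_0, v_0) \ne 0.$$
Then (9.20) has a $C^1$ solution $u = u(x, y)$, $v = v(x, y)$ near $(x_0, y_0)$ such that $u_0 = u(x_0, y_0)$, $v_0 = v(x_0, y_0)$. Differentiating each equation in (9.20) with respect to $x$ and $y$, we obtain the two systems
$$\begin{aligned} F_u u_x + F_v v_x &= -F_x \\ G_u u_x + G_v v_x &= -G_x \end{aligned} \qquad\qquad \begin{aligned} F_u u_y + F_v v_y &= -F_y \\ G_u u_y + G_v v_y &= -G_y \end{aligned}$$
Cramer's rule gives the following solutions near $(x_0, y_0, u_0, v_0)$:
$$u_x = -\frac{\partial(F, G)/\partial(x, v)}{\partial(F, G)/\partial(u, v)}, \qquad v_x = -\frac{\partial(F, G)/\partial(u, x)}{\partial(F, G)/\partial(u, v)},$$
$$u_y = -\frac{\partial(F, G)/\partial(y, v)}{\partial(F, G)/\partial(u, v)}, \qquad v_y = -\frac{\partial(F, G)/\partial(u, y)}{\partial(F, G)/\partial(u, v)}.$$
♦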
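The Cramer's rule formula for $u_x$ can be compared against the explicit solution of Example 9.5.3. The Python sketch below is a minimal check under stated assumptions: the sample point $(x, y) = (-1.2, 0.9)$, the finite-difference step, and all function names are hypothetical, and the partial derivatives of $F = xu^2 + y$ and $G = xy^2 + u^2v^2$ are entered by hand.

```python
import math

def partials(x, y, u, v):
    # first partial derivatives of F = x u^2 + y and G = x y^2 + u^2 v^2
    F = {"x": u * u, "y": 1.0, "u": 2.0 * x * u, "v": 0.0}
    G = {"x": y * y, "y": 2.0 * x * y, "u": 2.0 * u * v * v, "v": 2.0 * u * u * v}
    return F, G

def jac(F, G, s, t):
    # Jacobian determinant d(F, G)/d(s, t)
    return F[s] * G[t] - F[t] * G[s]

def u_x_cramer(x, y, u, v):
    # u_x = -[d(F, G)/d(x, v)] / [d(F, G)/d(u, v)], as in Example 9.5.5
    F, G = partials(x, y, u, v)
    return -jac(F, G, "x", "v") / jac(F, G, "u", "v")

def u_explicit(x, y):
    # explicit solution from Example 9.5.3, valid for x < 0 < y
    return math.sqrt(-y / x)

def v_explicit(x, y):
    return -x * math.sqrt(y)

x, y = -1.2, 0.9
u, v = u_explicit(x, y), v_explicit(x, y)
step = 1e-6
finite_diff = (u_explicit(x + step, y) - u_explicit(x - step, y)) / (2.0 * step)
print(u_x_cramer(x, y, u, v), finite_diff)
```

For this system the Cramer's rule expression simplifies to $u_x = -u/(2x)$, which equals $1/2$ at the point $(-1, 1, 1, 1)$.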


Exercises

1.S What does the implicit function theorem tell us about solving the equation $x + y^2 + e^{xy} = 1$ near $(0, 0)$ for one of the variables in terms of the other?

2. Suppose $(x_0, y_0, z_0)$ satisfies the equation $F(x, y, z) = 0$, where $F$ is $C^1$ in a neighborhood of $(x_0, y_0, z_0)$ and $F_z(x_0, y_0, z_0) \ne 0$. By the implicit function theorem, $F(x, y, z) = 0$ has a $C^1$ solution $z = z(x, y)$ near $(x_0, y_0)$ with $z_0 = z(x_0, y_0)$. Show that near $(x_0, y_0, z_0)$,
$$z_x = -\frac{F_x}{F_z} \quad \text{and} \quad z_y = -\frac{F_y}{F_z}.$$

3. Show that for each of the functions $F$ below the equation $F(x, y, z) = 0$ has a local $C^1$ solution $z = z(x, y)$ on some ball $B_r(a, b)$ such that $z(a, b) = c$. Calculate $z_x$ in a neighborhood of $(a, b, c)$.

(a) $\sin(xyz) + \cos(xyz) - 1$, $(a, b, c) = (1, \pi, 0)$.

(b) $e^{xyz} + x + y + z - 1$, $(a, b, c) = (0, 0, 0)$.

(c) $z\sin(x + y + z) - \pi\sqrt{3}/6$, $(a, b, c) = (\pi/6, \pi/6, \pi/3)$.

(d) $xyz + \ln(x + y + z) - 1 - \ln 3$, $(a, b, c) = (1, 1, 1)$.

(e) $x\ln z + y\ln x + z\ln y$, $(a, b, c) = (1, 1, 1)$.

(f) $x\sin z + y\sin x + z\sin y - 3\pi/2$, $(a, b, c) = (\pi/2, \pi/2, \pi/2)$.

(g) $z^{2n} + xz^{2n-1} + xy - 1$, $n \in \mathbb{N}$, $(a, b, c) = (1, 1, -1)$.

(h) $\cos(xyz) + \cos(xz) + \cos(yz)$, $(a, b, c) = (0, 1, \pi/2)$.

4. Suppose $(x_0, y_0, z_0)$ satisfies the system $F(x, y, z) = G(x, y, z) = 0$, where $F$ and $G$ are $C^1$ in a neighborhood of $(x_0, y_0, z_0)$ and
$$\frac{\partial(F, G)}{\partial(x, y)}(x_0, y_0, z_0) \ne 0.$$
By the implicit function theorem, the system has a $C^1$ solution $(x, y) = \big(x(z), y(z)\big)$ near $(x_0, y_0)$ with $(x_0, y_0) = \big(x(z_0), y(z_0)\big)$. Show that near $(x_0, y_0, z_0)$,
$$x'(z) = -\frac{\partial(F, G)/\partial(z, y)}{\partial(F, G)/\partial(x, y)} \quad \text{and} \quad y'(z) = -\frac{\partial(F, G)/\partial(x, z)}{\partial(F, G)/\partial(x, y)}.$$

5.S Show that each pair of variables in the system
$$\begin{aligned} \sin(x + z) + \ln(y + z) &= \sqrt{2}/2 \\ e^{xz} + \sin(\pi y + z) &= 1 \end{aligned}$$

are $C^1$ functions of the other variable near $(x, y, z) = (\pi/4, 1, 0)$. In the case $(x, y) = \big(x(z), y(z)\big)$, calculate $x'(z)$ and $y'(z)$ in a neighborhood of $(\pi/4, 1, 0)$.

6. Show that each pair of variables in the system
$$\begin{aligned} xy + yz + xz &= 11 \\ xyz + x + y &= 9 \end{aligned}$$
are $C^1$ functions of the other variable near $(x, y, z) = (1, 2, 3)$. In the case $(x, y) = \big(x(z), y(z)\big)$, calculate $x'(z)$ and $y'(z)$ in a neighborhood of $(1, 2, 3)$.

7. Show that each pair of the variables $(u, v)$, $(x, y)$, and $(x, v)$ in the system
$$\begin{aligned} x^2 - y^2 + uv - v^2 &= 0 \\ x^2 + y^2 + uv + u^2 &= 4 \end{aligned}$$
are $C^1$ functions of the remaining variables near $(x, y, u, v) = (1, 1, 1, 1)$. In the case $u(x, y)$, $v(x, y)$, calculate $u_x$ in a neighborhood of $(1, 1)$.

8.S Show that the system
$$\begin{aligned} x - y + z + u^2 &= 2 \\ -x + 2z + u^3 &= 2 \\ -y + 3z + u^4 &= 3 \end{aligned}$$
cannot be solved for $x$, $y$, and $z$ in terms of $u$ near the point $(x, y, z, u) = (1, 1, 1, 1)$, but for any other group of three variables a local $C^1$ solution in terms of the fourth variable is possible.

9. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. Give conditions on $f_x$ and $f_y$ such that each of the equations below has a $C^1$ solution $y = y(x)$ on some interval $(-r, r)$ with $y(0) = 0$. Calculate $y'(x)$ in each case.

(a) $f(2y,\ 2x - 3y) = 0$.

(b)S $f\big(f(x, y),\ y\big) = 0$.

(c) $f\big(f(x, y),\ f(x, y)\big) = 0$.

10. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. Give conditions on $f_x$ and $f_y$ under which each of the equations below has a $C^1$ solution $z = z(x, y)$ on some open ball $B_r(0, 0)$ with $z(0, 0) = 0$.

(a) $f(2y + 3z,\ 3x - 2z) = 0$.

(b) $f\big(f(x, -z),\ z\ln(e^2 + x + y)\big) = 0$.

(c) $f\big(e^{2z} f(x, 2z),\ f(y, \sin 3z)\big) = 0$.

(d) $f\big(f(z, x),\ f(y, z)\big) = 0$.


11. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. For each system below, give conditions on $f_x$ and $f_y$ under which the system has a $C^1$ solution $x = g(z)$, $y = h(z)$ on some interval $(-r, r)$ with $g(0) = h(0) = 0$.
(a)S $f\big(f(x, y), f(z, y)\big) = 0$, $\; f\big(f(y, z), f(x, z)\big) = 0$.
(b) $f\big(f(z, z), f(x, y)\big) = 0$, $\; f\big(f(x, y), f(y, z)\big) = 0$.
For each system, calculate $g'(z)$.
12. Let $f(x, y)$ be continuously differentiable with $f(0, 0) = 0$. What does the implicit function theorem tell us about the possibility of solving the system
\[ f\big(f(u, x), f(v, y)\big) = f\big(f(y, u), f(x, v)\big) = 0 \]
(a) for $(x, y)$ in terms of $(u, v)$ such that $x(0, 0) = y(0, 0) = 0$?
(b) for $(u, v)$ in terms of $(x, y)$ such that $u(0, 0) = v(0, 0) = 0$?
13.S Let $f$, $g$, and $h$ be continuously differentiable and $f(1) = g(1) = h(1) = 0$. Give conditions on $f'$, $g'$, and $h'$ so that the system
\[ f(xu) + g(yu) + h(zu) = 0, \qquad f(xv) + g(yv) + h(zv) = 0 \]
has a $C^1$ solution $u = u(x, y, z)$, $v = v(x, y, z)$ on some ball $B_r(1, 1, 1)$ such that $u(1, 1, 1) = v(1, 1, 1) = 1$. Calculate $u_x$.

14. Let $D \subseteq \mathbb{R}^2$ be compact and let $F(x, y, z)$ be continuous on the set $E := D \times [a, b]$ such that for each $(x, y) \in D$ there exists a unique $z = z(x, y) \in [a, b]$ for which $F\big(x, y, z(x, y)\big) = 0$. Prove that $z(x, y)$ is continuous on $D$.
15.S Suppose the equation $F(x_1, \dots, x_n) = 0$ may be solved for each variable $x_j$ in terms of the others. Show that under suitable conditions
\[ \frac{\partial x_2}{\partial x_1}\,\frac{\partial x_3}{\partial x_2}\cdots\frac{\partial x_n}{\partial x_{n-1}}\,\frac{\partial x_1}{\partial x_n} = (-1)^n. \]
Verify this for each of the functions
(a) $F(x_1, x_2, x_3) = x_1 x_2 x_3 - 1$,
(b) $F(x_1, x_2, x_3, x_4) = x_1 x_2 x_3 x_4 - 1$.
16. Let $p(x, y)$ and $q(x, y)$ be $C^1$ on an open set $U$ containing $(0, 0)$ such that $p(0, 0) = q(0, 0) = 0$ and, for $(x, y) \in U \setminus \{(0, 0)\}$, $p(x, y) > 0$ and $-1 \le q(x, y) \le 1$. Let
\[ f(x, y, z) = z^3 + p(x, y)z + q(x, y), \quad (x, y) \in U,\ z \in \mathbb{R}. \]
Prove that there is a unique solution $z = z(x, y)$ to $f(x, y, z) = 0$ on all of $U$ which is $C^1$ on $U \setminus \{(0, 0)\}$ and satisfies $z(0, 0) = 0$.

9.6 Higher Order Partial Derivatives

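Let $f$ be a real-valued function defined on an open subset of $\mathbb{R}^2$ with first partial derivatives $f_x$ and $f_y$. The higher order partial derivatives are defined inductively by
\[ f_{xx} = \frac{\partial^2 f}{\partial x^2} := \frac{\partial}{\partial x}\frac{\partial f}{\partial x}, \qquad f_{yy} = \frac{\partial^2 f}{\partial y^2} := \frac{\partial}{\partial y}\frac{\partial f}{\partial y}, \]
\[ f_{xy} = \frac{\partial^2 f}{\partial y\,\partial x} := \frac{\partial}{\partial y}\frac{\partial f}{\partial x}, \qquad f_{yx} = \frac{\partial^2 f}{\partial x\,\partial y} := \frac{\partial}{\partial x}\frac{\partial f}{\partial y}, \]
\[ f_{xxx} = \frac{\partial^3 f}{\partial x^3} := \frac{\partial}{\partial x}\frac{\partial^2 f}{\partial x^2}, \qquad f_{xxy} = \frac{\partial^3 f}{\partial y\,\partial x^2} := \frac{\partial}{\partial y}\frac{\partial^2 f}{\partial x^2}, \]
and so on. Analogous definitions are given for functions of $n$ variables. For such a function $f$, integers $m_i \in \mathbb{Z}^+$ and a permutation $(i_1, \dots, i_n)$ of $(1, \dots, n)$,
\[ \frac{\partial^m f}{\partial x_{i_1}^{m_1} \cdots \partial x_{i_n}^{m_n}}, \qquad m := m_1 + \cdots + m_n, \]
is called a partial derivative of order $m$. The following result will allow some simplifications in calculating higher order partial derivatives.

9.6.1 Theorem. Let $U \subseteq \mathbb{R}^2$ be open and let $f : U \to \mathbb{R}$ have continuous first partial derivatives $f_x$ and $f_y$ on $U$. If $f_{xy}$ exists on $U$ and is continuous at $(a, b) \in U$, then $f_{yx}(a, b)$ exists and equals $f_{xy}(a, b)$.

Proof. Choose $r > 0$ such that $(a - r, a + r) \times (b - r, b + r) \subseteq U$. For $|h|, |k| < r$, define
\[ \varphi_k(x) = f(x, b + k) - f(x, b), \quad x \in (a - r, a + r), \]
\[ \psi_h(y) = f(a + h, y) - f(a, y), \quad y \in (b - r, b + r), \]
\[ \Delta(h, k) = \varphi_k(a + h) - \varphi_k(a) = \psi_h(b + k) - \psi_h(b) = f(a + h, b + k) - f(a, b + k) + f(a, b) - f(a + h, b). \]
By the mean value theorem applied twice, there exist $s, t \in (0, 1)$ such that
\[ \Delta(h, k) = \varphi_k'(a + sh)\,h = \big[f_x(a + sh, b + k) - f_x(a + sh, b)\big]h = f_{xy}(a + sh, b + tk)\,hk. \]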


By continuity of $f_{xy}$ at $(a, b)$,
\[ \lim_{(h,k)\to(0,0)} \frac{\Delta(h, k)}{hk} = \lim_{(h,k)\to(0,0)} f_{xy}(a + sh, b + tk) = f_{xy}(a, b). \]
On the other hand, for each $h$,
\[ \lim_{k\to 0} \frac{\Delta(h, k)}{k} = \lim_{k\to 0} \frac{\psi_h(b + k) - \psi_h(b)}{k} = \psi_h'(b) = f_y(a + h, b) - f_y(a, b), \]
so by the iterated limit theorem (8.4.4),
\[ \lim_{h\to 0} \frac{f_y(a + h, b) - f_y(a, b)}{h} = \lim_{h\to 0}\lim_{k\to 0} \frac{\Delta(h, k)}{hk} = \lim_{(h,k)\to(0,0)} \frac{\Delta(h, k)}{hk}. \]
Therefore, $f_{yx}(a, b) = f_{xy}(a, b)$.

The following example shows that continuity of at least one of the second partial derivatives in the theorem is essential.

9.6.2 Example. Let $f(0, 0) = 0$ and define
\[ f(x, y) = \frac{x^3 y - y^3 x}{x^2 + y^2} \quad\text{if } (x, y) \ne (0, 0). \]
Then the first partial derivatives exist and are continuous on $\mathbb{R}^2$, the second partial derivatives exist on $\mathbb{R}^2$, but $f_{xy}(0, 0) \ne f_{yx}(0, 0)$. Indeed, since $f_x(0, 0) = 0$,
\[ f_x(0, y) = \lim_{h\to 0} \frac{f(h, y) - f(0, y)}{h} = \lim_{h\to 0} \frac{h^2 y - y^3}{h^2 + y^2} = -y, \]
and similarly $f_y(x, 0) = x$. Therefore, $f_{xy}(0, 0) = -1$ and $f_{yx}(0, 0) = 1$.

♦
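As a numerical illustration of Example 9.6.2, the following Python sketch approximates the two mixed partials at the origin by central differences. The helper names (`fx`, `fy`, `fxy_at_origin`, `fyx_at_origin`) and the step sizes are our own choices and are not part of the text; the computed values only approximate the exact values $-1$ and $1$.

```python
def f(x, y):
    """The function of Example 9.6.2, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x**3 * y - y**3 * x) / (x**2 + y**2)


def fx(x, y, h=1e-6):
    """Central-difference approximation to f_x(x, y)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def fy(x, y, h=1e-6):
    """Central-difference approximation to f_y(x, y)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def fxy_at_origin(k=1e-3):
    """Approximates f_xy(0, 0), i.e. the y-derivative of f_x at the origin."""
    return (fx(0.0, k) - fx(0.0, -k)) / (2 * k)


def fyx_at_origin(k=1e-3):
    """Approximates f_yx(0, 0), i.e. the x-derivative of f_y at the origin."""
    return (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)


if __name__ == "__main__":
    print("f_xy(0,0) is approximately", fxy_at_origin())
    print("f_yx(0,0) is approximately", fyx_at_origin())
```

The inner step `h` is taken much smaller than the outer step `k`, so that the first derivative is resolved accurately before it is differenced a second time.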

Theorem 9.6.1 may be extended to functions $f$ of $n$ variables. Indeed, if $1 \le i < j \le n$, then under suitable continuity conditions one has
\[ \frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}, \]
since the only “active” variables in this identity are $x_i$ and $x_j$. Combining this observation with an induction argument leads to the following result.

9.6.3 Corollary. Let $f$ be a real-valued function defined on an open subset $U$ of $\mathbb{R}^n$ and let $m = m_1 + m_2 + \cdots + m_n$, $m_i \in \mathbb{Z}^+$. Then, for any permutation $(i_1, \dots, i_n)$ of $(1, \dots, n)$,
\[ \frac{\partial^m f}{\partial x_{i_1}^{m_{i_1}} \cdots \partial x_{i_n}^{m_{i_n}}} = \frac{\partial^m f}{\partial x_1^{m_1} \cdots \partial x_n^{m_n}}, \]
provided that all partial derivatives of $f$ up to order $m$ are continuous on $U$.


9.6.4 Definition. Let $r \in \mathbb{N}$. A real-valued function $f$ on an open set $U \subseteq \mathbb{R}^n$ is said to be of class $C^r$ on $U$ (or simply $C^r$ on $U$) if all partial derivatives up to order $r$ exist and are continuous on $U$. Also, $f$ is of class $C^\infty$ on $U$ if it is of class $C^r$ on $U$ for every $r \in \mathbb{N}$. A vector-valued function is $C^r$ if each component function is $C^r$. Continuous functions are said to be of class $C^0$. A function is of class $C^r$ on a set if it is the restriction of a $C^r$ function on a larger open set. ♦

9.6.5 Remarks. (a) A function of class $C^{r+1}$ is of class $C^r$. The function
\[ f(x_1, \dots, x_n) = \begin{cases} x_1^{r+1} & \text{if } x_1 \ge 0, \\ 0 & \text{otherwise} \end{cases} \]
is $C^r$ on $\mathbb{R}^n$ but not $C^{r+1}$.
(b) The standard rules of differentiation show that if $f$ and $g$ are real-valued functions of class $C^r$, then so are $\alpha f$, $f + g$, $fg$, and $f/g$. For example, if $f(x, y)$ and $g(x, y)$ are of class $C^2$, then
\[ (fg)_{xx} = f_{xx}\,g + f\,g_{xx} + 2 f_x g_x, \]
with similar formulas holding for $(fg)_{xy}$ and $(fg)_{yy}$. Since the terms on the right are continuous, $fg$ is $C^2$. In particular, polynomials and rational functions of several variables are of class $C^\infty$.
(c) The composite $f = g \circ h$ of $C^r$ functions is again $C^r$. This follows from the chain rule: The matrix equation $f'(x) = g'\big(h(x)\big)\,h'(x)$ shows that the entries of $f'(x)$ are sums of products of $C^{r-1}$ functions, hence the entries of $f$ are $C^r$.
(d) If the function $f$ in the statement of the inverse function theorem is $C^r$ on $U$, then the local inverse of $f$ is also $C^r$. This is proved by induction on $r$ as follows. Assume that the assertion holds for $r - 1$, and let $f$ be $C^r$ on $U$. Then the entries of the matrix $f'(x)$ are $C^{r-1}$, hence, near $a$, the entries of
\[ (f^{-1})'(y) = \big[f'\big(f^{-1}(y)\big)\big]^{-1} \]
are $C^{r-1}$, as these are rational functions of the entries of $f'$. Therefore, the entries of $f^{-1}$ are $C^r$.
(e) If the function $F$ in the statement of the implicit function theorem is $C^r$, then the solution $y = f(x)$ to the equation $F(x, y) = 0$ is $C^r$. This follows from (d), since $f$ is constructed using the inverse function theorem. ♦

The following example illustrates how the chain rule may be used to calculate higher order partial derivatives of composite functions.


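9.6.6 Example. Let $u = f(x, y)$ be $C^2$ on $\mathbb{R}^2$ and let $x = r\cos\theta$, $y = r\sin\theta$. Then
\[ u_r = (\cos\theta)u_x + (\sin\theta)u_y, \qquad u_\theta = -(r\sin\theta)u_x + (r\cos\theta)u_y, \]
\[ u_{rr} = (\cos\theta)u_{xr} + (\sin\theta)u_{yr} = (\cos\theta)^2 u_{xx} + (2\sin\theta\cos\theta)u_{xy} + (\sin\theta)^2 u_{yy}, \]
\[ u_{\theta\theta} = -(r\cos\theta)u_x - (r\sin\theta)u_{x\theta} - (r\sin\theta)u_y + (r\cos\theta)u_{y\theta} = (r\sin\theta)^2 u_{xx} - (2r^2\sin\theta\cos\theta)u_{xy} + (r\cos\theta)^2 u_{yy} - r u_r. \]
Calculations like these are useful for changing coordinates in differential operators. For example, the above equations imply that
\[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}. \tag{9.21} \]
The operator on the left is called the Laplacian. The equation expresses the Laplacian in polar coordinates. ♦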
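Formula (9.21) can be checked numerically. The Python sketch below compares the two sides for the sample function $f(x, y) = x^3 y$, whose Laplacian is $6xy$; the sample function, the finite-difference step, and the function names are assumptions made for this illustration only.

```python
import math


def f(x, y):
    """Sample C^2 function (an assumption of this sketch); its Laplacian is 6xy."""
    return x**3 * y


def u(r, theta):
    """f expressed in polar coordinates."""
    return f(r * math.cos(theta), r * math.sin(theta))


def polar_laplacian(r, theta, h=1e-4):
    """Right side of (9.21), with derivatives replaced by central differences."""
    u_r = (u(r + h, theta) - u(r - h, theta)) / (2 * h)
    u_rr = (u(r + h, theta) - 2 * u(r, theta) + u(r - h, theta)) / h**2
    u_tt = (u(r, theta + h) - 2 * u(r, theta) + u(r, theta - h)) / h**2
    return u_rr + u_r / r + u_tt / r**2


def cartesian_laplacian(x, y):
    """Left side of (9.21) for the sample function, computed by hand."""
    return 6 * x * y


if __name__ == "__main__":
    r, theta = 1.3, 0.7
    x, y = r * math.cos(theta), r * math.sin(theta)
    print(polar_laplacian(r, theta), cartesian_laplacian(x, y))
```

The two printed values should agree up to the discretization and rounding error of the difference quotients.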

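Exercises
1. Let $z = f(x, y)$ be $C^2$ on $\mathbb{R}^2$. Show that the following equations hold for the given functions $x = x(r, t)$ and $y = y(r, t)$.
(a) $z_{xx} + z_{yy} = \dfrac{z_{rr} + z_{tt}}{a^2 + b^2}$, $\quad x = ar + bt$, $y = at - br$.
(b)S $x z_{xx} - z_{yy} = \dfrac{r z_{rr} - t z_{tt}}{t - r}$, $\quad x = rt$, $y = r + t$.
(c) $z_{xx} + 4z_{yy} = \dfrac{z_{rr} + z_{tt}}{r^2 + t^2}$, $\quad x = rt$, $y = r^2 - t^2$.
(d) $x^2 z_{xx} + y^2 z_{yy} = \tfrac{1}{2}\big[r^2 z_{rr} + z_{tt} - r z_r\big]$, $\quad x = re^t$, $y = re^{-t}$.
(e)S $z_{xx} + z_{yy} = e^{-2r}\big[z_{rr} + z_{tt}\big]$, $\quad x = e^r\sin t$, $y = e^r\cos t$.
(f)S $a^2 x^2 z_{xx} + b^2 y^2 z_{yy} = z_{rr} + z_{tt} - a z_r - b z_t$, $\quad x = e^{ar}$, $y = e^{bt}$.
2. Let $z = f(x, y)$ be $C^2$ on $\mathbb{R}^2$, $x = ar + bs$, and $y = cr + ds$. Show that
\[ \begin{pmatrix} z_{rr} \\ z_{ss} \\ z_{rs} \end{pmatrix} = \begin{pmatrix} a^2 & c^2 & 2ac \\ b^2 & d^2 & 2bd \\ ab & cd & ad + bc \end{pmatrix} \begin{pmatrix} z_{xx} \\ z_{yy} \\ z_{xy} \end{pmatrix}. \]
In particular, if $x = r - s$ and $y = r + s$, show that
\[ \begin{pmatrix} z_{xx} \\ z_{yy} \\ z_{xy} \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 1 & 1 & -2 \\ 1 & 1 & 2 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} z_{rr} \\ z_{ss} \\ z_{rs} \end{pmatrix}. \]
3. Let $z = f(x, y)$, $x = g(r, s)$, $y = h(r, s)$ be $C^2$ on $\mathbb{R}^2$. Show that
\[ \frac{\partial^2 z}{\partial r^2} = \frac{\partial z}{\partial x}\frac{\partial^2 x}{\partial r^2} + \frac{\partial z}{\partial y}\frac{\partial^2 y}{\partial r^2} + \frac{\partial^2 z}{\partial x^2}\left(\frac{\partial x}{\partial r}\right)^2 + \frac{\partial^2 z}{\partial y^2}\left(\frac{\partial y}{\partial r}\right)^2 + 2\frac{\partial^2 z}{\partial x\,\partial y}\frac{\partial x}{\partial r}\frac{\partial y}{\partial r}. \]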


4.S Let $F(x, y, z)$ be $C^2$ on an open set $U$ and assume that the equation $F(x, y, z) = 0$ defines $z$ implicitly as a function of $x$ and $y$. Express $z_{xx}$ in terms of partial derivatives of $F$.
5.S Show that each of the following functions $u = u(t, x)$ satisfies the one dimensional heat equation $u_t = k^2 u_{xx}$.
(a) $u = (a\sin x + b\cos x)\exp(-k^2 t)$.
(b) $u = t^{-1/2}\exp\big(-x^2/(4k^2 t)\big)$.

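6. Let $f(x)$ and $g(x)$ be twice differentiable.
(a) Show that the function $u(t, x) = f(x - ct) + g(x + ct)$ satisfies the one dimensional wave equation $u_{tt} = c^2 u_{xx}$.
(b) Show that the function $v(t, x) = \dfrac{1}{x}\big[f(x - ct) + g(x + ct)\big]$, $x > 0$, satisfies the equation $v_{tt} = c^2\Big(v_{xx} + \dfrac{2}{x}\,v_x\Big)$.
7.S (Spherical coordinate analog of (9.21)). Let $w = f(x, y, z)$ be of class $C^2$ on $\mathbb{R}^3$, where $x = \rho\sin\varphi\cos\theta$, $y = \rho\sin\varphi\sin\theta$, and $z = \rho\cos\varphi$. Show that
\[ \frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} + \frac{\partial^2 w}{\partial z^2} = \frac{\partial^2 w}{\partial\rho^2} + \frac{2}{\rho}\frac{\partial w}{\partial\rho} + \frac{1}{\rho^2}\frac{\partial^2 w}{\partial\varphi^2} + \frac{\cos\varphi}{\rho^2\sin\varphi}\frac{\partial w}{\partial\varphi} + \frac{1}{\rho^2\sin^2\varphi}\frac{\partial^2 w}{\partial\theta^2}. \]
8. Show that if $f(x, y)$ is $C^2$ and homogeneous of degree $n \ge 2$ (Exercise 9.3.15), then
\[ x^2 f_{xx} + 2xy f_{xy} + y^2 f_{yy} = n(n - 1)f(x, y). \]
9.S Let $g$ be $C^2$ on $(0, +\infty)$, $p \ne 0$, and $f(x) = g(\|x\|^p)$, $x \in \mathbb{R}^n \setminus \{0\}$. Show that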

1X fx x = (n + p − 2)kxkp−2 g 0 (kxkp ) + pkxk2(p−1) g 00 (kxkp ) and p i=1 i i h iX 1X fxi xj = (p − 2)kxkp−4 g 0 (kxkp ) + kxk2(p−2) g 00 (kxkp ) xi xj . p i 0, 0 ≤ t ≤ T


into $u_\tau(\tau, x) = (k - 1)u_x(\tau, x) + u_{xx}(\tau, x) - k\,u(\tau, x)$, $k := 2r/\sigma^2$. The first equation arises in the Black–Scholes theory of option pricing. The second is an example of a diffusion equation.
11. Show that the substitutions
\[ u(\tau, x) = e^{ax + b\tau} w(\tau, x), \quad a := \tfrac{1}{2}(1 - k), \quad b := a(k - 1) + a^2 - k = -\tfrac{1}{4}(k + 1)^2, \]
reduce the diffusion equation in Exercise 10 to the heat equation $w_\tau(\tau, x) = w_{xx}(\tau, x)$.

9.7 Higher Order Differentials and Taylor’s Theorem

Higher order differentials of a function $f$ of several variables are analogs of higher order derivatives of functions of a single variable. These may be conveniently expressed in terms of higher order partial derivatives of $f$. An important consequence is Taylor’s theorem in $n$ dimensions, which is used to establish convergence of power series in several variables.

We begin by giving an alternate description of the space $L\big(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R})\big)$. For a member $B$ of this space and each $h \in \mathbb{R}^n$, $Bh \in L(\mathbb{R}^n, \mathbb{R})$ has matrix $\big[(Bh)e^1 \ \cdots \ (Bh)e^n\big]$, which we identify with the vector $\big((Bh)e^1, \dots, (Bh)e^n\big)$, so that $(Bh)k$ may be written $(Bh)\cdot k$. Now define
\[ \tilde{B}(h, k) := (Bh)\cdot k, \quad h, k \in \mathbb{R}^n. \]
Clearly, $\tilde{B}$ is linear in $h$ for each fixed $k$ and linear in $k$ for each fixed $h$. Such a function is called a bilinear functional on $\mathbb{R}^n$. Using the bilinearity, we have
\[ \tilde{B}(h, k) = \tilde{B}\Big(\sum_{i=1}^n h_i e^i, \sum_{j=1}^n k_j e^j\Big) = \sum_{i=1}^n\sum_{j=1}^n B_{ij}\,h_i k_j, \tag{9.22} \]
where $B_{ij} := \tilde{B}(e^i, e^j) = (Be^i)\cdot e^j$. In matrix notation,
\[ \tilde{B}(h, k) = \begin{bmatrix} h_1 & \cdots & h_n \end{bmatrix} \big[B_{ij}\big] \begin{bmatrix} k_1 \\ \vdots \\ k_n \end{bmatrix}. \]


Conversely, given any bilinear functional $\tilde{B}$ on $\mathbb{R}^n$, the equation $(Bh)k := \tilde{B}(h, k)$ defines a member $B$ of $L\big(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R})\big)$. Thus, identifying $B$ with $\tilde{B}$, we see that $L\big(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R})\big)$ may be viewed as the vector space of all bilinear functionals on $\mathbb{R}^n$.

Now let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}$ be $C^2$ on $U$. Then $df$ is a function on $U$ taking values in $L(\mathbb{R}^n, \mathbb{R})$. Identifying $df$ with the vector function $\nabla f = (\partial_1 f, \dots, \partial_n f)$, we define $d^2 f_x \in L\big(\mathbb{R}^n, L(\mathbb{R}^n, \mathbb{R})\big)$ by $d^2 f_x = d(df)_x = d(\nabla f)_x$, that is, by the above identification,
\[ d^2 f_x(h, k) = d(\nabla f)_x(h)\cdot k, \quad x \in U,\ h, k \in \mathbb{R}^n. \]
The matrix of $d(\nabla f)_x$ has $(i, j)$ entry $\partial_j\partial_i f(x) = \partial_i\partial_j f(x)$, since $f$ is $C^2$. Thus
\[ d^2 f_x(h, k) = \sum_{i,j=1}^n \frac{\partial^2 f(x)}{\partial x_i\,\partial x_j}\,h_i k_j, \quad h, k \in \mathbb{R}^n. \]
The bilinear function $d^2 f_x$ is called the second order differential of $f$ at $x$. For higher order differentials, we need the following generalization of a bilinear functional:

9.7.1 Definition. An $m$-multilinear functional on $\mathbb{R}^n$ is a real-valued function $M(h^1, \dots, h^m)$ of vectors $h^j = (h^j_1, \dots, h^j_n) \in \mathbb{R}^n$ that is linear in each variable $h^j$ when the other variables are held fixed. ♦

Analogous to (9.22) we have
\[ M(h^1, \dots, h^m) = \sum_{j_1=1}^n \cdots \sum_{j_m=1}^n M_{j_1,\dots,j_m}\,h^1_{j_1}\cdots h^m_{j_m}, \tag{9.23} \]
where $M_{j_1,\dots,j_m} := M(e^{j_1}, \dots, e^{j_m})$. Now let $f$ be $C^m$ on $U$, $m \ge 2$. The $m$th order differential of $f$ at $x$ is defined inductively by $d^m f_x = d(d^{m-1} f)_x$. As in the case $m = 2$, we may interpret $d^m f_x$ as the $m$-multilinear functional
\[ d^m f_x(h^1, \dots, h^m) = \sum_{j_1=1}^n \cdots \sum_{j_m=1}^n \frac{\partial^m f(x)}{\partial x_{j_1}\cdots\partial x_{j_m}}\,h^1_{j_1}\cdots h^m_{j_m}. \]
The $m$th total differential $D^m f_x$ of $f$ at $x$ is then defined by
\[ D^m f_x(h) := d^m f_x(h, \dots, h) = \sum_{j_1=1}^n \cdots \sum_{j_m=1}^n \frac{\partial^m f(x)}{\partial x_{j_1}\cdots\partial x_{j_m}}\,h_{j_1}\cdots h_{j_m}, \quad h := (h_1, \dots, h_n), \tag{9.24} \]


which is frequently written
\[ D^m f_x = \sum_{j_1, j_2, \dots, j_m = 1}^n \frac{\partial^m f(x)}{\partial x_{j_1}\,\partial x_{j_2}\cdots\partial x_{j_m}}\,dx_{j_1}\,dx_{j_2}\cdots dx_{j_m}, \qquad dx_j(h) := h_j. \]
By 9.6.3, each partial derivative in (9.24) may be expressed as
\[ \frac{\partial^m f}{\partial x_1^{m_1}\cdots\partial x_n^{m_n}}, \qquad m := m_1 + \cdots + m_n,\ m_j \in \mathbb{Z}^+. \]
Similarly, the corresponding product of $h$’s in (9.24) may be written in the form $h_1^{m_1}\cdots h_n^{m_n}$. For a fixed multi-index $(m_1, \dots, m_n) \in \mathbb{Z}^+ \times \cdots \times \mathbb{Z}^+$, the number of terms in (9.24) of the form
\[ \frac{\partial^m f}{\partial x_1^{m_1}\cdots\partial x_n^{m_n}}\,h_1^{m_1}\cdots h_n^{m_n} \]
is given by the multinomial coefficient
\[ \binom{m}{m_1, m_2, \dots, m_n} = \frac{m!}{m_1!\,m_2!\cdots m_n!}, \tag{9.25} \]
which is the number of distinct ways of arranging $m$ objects, where $m_1$ are alike, $m_2$ are alike, etc. With this notation, (9.24) may be written
\[ D^m f_x(h) = \sum \binom{m}{m_1, \dots, m_n} \frac{\partial^m f(x)}{\partial x_1^{m_1}\cdots\partial x_n^{m_n}}\,h_1^{m_1}\cdots h_n^{m_n}, \tag{9.26} \]
or, in differential notation,
\[ D^m f_x = \sum \binom{m}{m_1, \dots, m_n} \frac{\partial^m f(x)}{\partial x_1^{m_1}\cdots\partial x_n^{m_n}}\,(dx_1)^{m_1}\cdots(dx_n)^{m_n}, \]
where the sums are taken over all multi-indices $(m_1, \dots, m_n) \in \mathbb{Z}^+ \times \cdots \times \mathbb{Z}^+$ for which $m_1 + \cdots + m_n = m$. We may go a step further by appealing to the following generalization of the binomial theorem.

9.7.2 Multinomial Theorem. Let $h_1, \dots, h_n \in \mathbb{R}$ and $m \in \mathbb{N}$. Then
\[ (h_1 + \cdots + h_n)^m = \sum \binom{m}{m_1, \dots, m_n} h_1^{m_1}\cdots h_n^{m_n}, \tag{9.27} \]
where the summation is taken over all multi-indices $(m_1, \dots, m_n)$ for which $m_1 + \cdots + m_n = m$.

Proof. The theorem may be proved by induction, but we give a combinatorial argument instead. The left side of (9.27) expands into a sum of products of the form $x_1\cdots x_m$, where each $x_i$ is one of the terms in the sum $h_1 + \cdots + h_n$.


Each such product may be written uniquely as $h_1^{m_1}\cdots h_n^{m_n}$, where $m_j \ge 0$ and $m_1 + \cdots + m_n = m$. For each fixed $(m_1, \dots, m_n)$, the number of products of this form is the number of ways $m_1$ factors in the product $x_1\cdots x_m$ may be chosen to be $h_1$, $m_2$ factors may be chosen to be $h_2$, etc. This number is precisely the multinomial coefficient (9.25).
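The identity (9.27) is easy to test by direct enumeration. The following Python sketch lists all multi-indices with $m_1 + \cdots + m_n = m$, forms the right side of (9.27), and compares it with $(h_1 + \cdots + h_n)^m$ for integer inputs; the function names are hypothetical and chosen for this illustration only.

```python
from itertools import product
from math import factorial


def multinomial(ms):
    """Multinomial coefficient (9.25): m! / (m1! ... mn!) with m = sum(ms)."""
    result = factorial(sum(ms))
    for mj in ms:
        result //= factorial(mj)  # each intermediate quotient is an integer
    return result


def multi_indices(n, m):
    """All (m1, ..., mn) of nonnegative integers with m1 + ... + mn = m."""
    return [ms for ms in product(range(m + 1), repeat=n) if sum(ms) == m]


def multinomial_sum(h, m):
    """Right side of (9.27) for the tuple h = (h1, ..., hn)."""
    total = 0
    for ms in multi_indices(len(h), m):
        term = multinomial(ms)
        for hj, mj in zip(h, ms):
            term *= hj**mj
        total += term
    return total


if __name__ == "__main__":
    h, m = (2, -3, 5), 4
    print(multinomial_sum(h, m), sum(h) ** m)
```

Integer inputs are used so that the comparison is exact; with floating-point inputs the two sides would agree only up to rounding.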

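Now consider the operator $h_i\dfrac{\partial}{\partial x_i}$, which takes a $C^1$ function $f$ to the function $h_i\dfrac{\partial f}{\partial x_i}$. If multiplication of such operators is defined as operator composition, then the usual laws of algebra hold. For example, the operator
\[ \Big(h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2}\Big)\,h_2\frac{\partial}{\partial x_2} \]
applied to a $C^2$ function $f$ yields
\[ h_1 h_2\frac{\partial^2 f}{\partial x_1\,\partial x_2} + h_2^2\frac{\partial^2 f}{\partial x_2^2} = \Big(h_1 h_2\frac{\partial^2}{\partial x_1\,\partial x_2} + h_2^2\frac{\partial^2}{\partial x_2^2}\Big)f, \]
hence we may write
\[ \Big(h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2}\Big)\,h_2\frac{\partial}{\partial x_2} = h_1 h_2\frac{\partial^2}{\partial x_1\,\partial x_2} + h_2^2\frac{\partial^2}{\partial x_2^2}. \]
Similarly,
\[ \Big(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\Big)^2 = h^2\frac{\partial^2}{\partial x^2} + 2hk\frac{\partial^2}{\partial x\,\partial y} + k^2\frac{\partial^2}{\partial y^2}. \]
The last example suggests that the multinomial theorem is valid in this setting. This is indeed the case (a similar proof works). It follows from (9.26) that the $m$th total differential may be written in operator form as
\[ D^m f_x(h) = \Big(h_1\frac{\partial}{\partial x_1} + h_2\frac{\partial}{\partial x_2} + \cdots + h_n\frac{\partial}{\partial x_n}\Big)^m f(x) = (h\cdot\nabla)^m f(x). \]
We may now state the $n$-dimensional version of Taylor’s theorem.

9.7.3 Taylor’s Theorem. Let $U \subseteq \mathbb{R}^n$ be open, $m \in \mathbb{N}$, and let $f : U \to \mathbb{R}$ be $C^{m+1}$ on $U$. Then for each pair of distinct points $a, x \in U$ for which $[a : x] \subseteq U$, there exists a point $c \in [a : x]$ depending on $x$ and $a$ such that
\[ f(x) = \sum_{p=0}^m \frac{1}{p!}(h\cdot\nabla)^p f(a) + \frac{1}{(m+1)!}(h\cdot\nabla)^{m+1} f(c), \qquad h := x - a. \tag{9.28} \]
Proof. The line segment $[a : x]$ is described by
\[ \varphi(t) := (1 - t)a + tx = a + th, \quad 0 \le t \le 1. \]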


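Since $U$ is open, there exists an $r > 0$ such that $\varphi\big((-r, 1 + r)\big) \subseteq U$. Let $F = f \circ \varphi$. By the chain rule,
\[ F'(t) = \sum_{j=1}^n \frac{\partial f\big(\varphi(t)\big)}{\partial x_j}\,h_j \quad\text{and}\quad \frac{d}{dt}\,\frac{\partial f\big(\varphi(t)\big)}{\partial x_j} = \sum_{i=1}^n \frac{\partial^2 f\big(\varphi(t)\big)}{\partial x_i\,\partial x_j}\,h_i, \]
hence
\[ F''(t) = \sum_{i,j=1}^n \frac{\partial^2 f\big(\varphi(t)\big)}{\partial x_i\,\partial x_j}\,h_i h_j. \]
An induction argument shows that
\[ F^{(p)}(t) = \sum_{j_1,\dots,j_p=1}^n \frac{\partial^p f\big(\varphi(t)\big)}{\partial x_{j_1}\cdots\partial x_{j_p}}\,h_{j_1}\cdots h_{j_p} = (h\cdot\nabla)^p f\big(\varphi(t)\big). \]
By Taylor’s theorem in one variable, there exists $t_0 \in (0, 1)$ such that
\[ f(x) = F(1) = \sum_{p=0}^m \frac{F^{(p)}(0)}{p!} + \frac{F^{(m+1)}(t_0)}{(m+1)!}. \]
Setting $c = \varphi(t_0)$ completes the proof.

The summation in (9.28) is called an $m$th order Taylor polynomial about $a$ and is denoted by $T_m(x, a)$. For example, the second order Taylor polynomial of a $C^2$ function $f(x_1, x_2)$ is
\[ f + h_1\frac{\partial f}{\partial x_1} + h_2\frac{\partial f}{\partial x_2} + \frac{1}{2}h_1^2\frac{\partial^2 f}{\partial x_1^2} + h_1 h_2\frac{\partial^2 f}{\partial x_1\,\partial x_2} + \frac{1}{2}h_2^2\frac{\partial^2 f}{\partial x_2^2}, \]
where $h_j = x_j - a_j$ and the terms are evaluated at $(a_1, a_2)$. The last term in (9.28) is called the remainder term and is denoted by $R_m(x, a)$.

The following theorem gives a sufficient condition for a $C^\infty$ function to be expressed as a multi-variable Taylor series.

9.7.4 Taylor Series Representation. Let $U \subseteq \mathbb{R}^n$ be open and convex and let $f : U \to \mathbb{R}$ be $C^\infty$ on $U$. Suppose that for some $M < +\infty$
\[ \left|\frac{\partial^p f(x)}{\partial x_1^{p_1}\,\partial x_2^{p_2}\cdots\partial x_n^{p_n}}\right| \le M \]
for all $x \in U$, $p \in \mathbb{N}$, and all $p_j \in \mathbb{Z}^+$, where $p = p_1 + \cdots + p_n$. Then
\[ f(x) = \sum_{p=0}^\infty \frac{1}{p!}(h\cdot\nabla)^p f(a), \qquad a, x \in U,\ h := x - a. \tag{9.29} \]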


Proof. By 9.7.3, the theorem will follow if we show that the remainder term
\[ R_m(x, a) = \frac{1}{(m+1)!}(h\cdot\nabla)^{m+1} f(c) \]
tends to zero as $m \to \infty$. By (9.26),
\[ |R_m(x, a)| \le \frac{M}{(m+1)!}\sum \binom{m+1}{m_1, \dots, m_n}|h_1|^{m_1}|h_2|^{m_2}\cdots|h_n|^{m_n}, \]
where the summation is taken over all multi-indices $(m_1, \dots, m_n)$ for which $m_1 + \cdots + m_n = m + 1$. By the multinomial theorem, this sum is $s^{m+1}$, where $s := |h_1| + |h_2| + \cdots + |h_n|$. Therefore,
\[ |R_m(x, a)| \le \frac{M s^{m+1}}{(m+1)!}, \]
which implies $\lim_m R_m(x, a) = 0$.

The series on the right in (9.29) is called the Taylor series for $f$ about $a$. While the theorem may be applied directly, in many cases it is easier to make use of single variable series. For example, from the series expansion for $e^x$ we have
\[ e^{xy} = \sum_{n=0}^\infty \frac{x^n y^n}{n!} \quad\text{and}\quad e^{x+y} = e^x e^y = \sum_{n=0}^\infty\sum_{j=0}^n \frac{x^j y^{n-j}}{j!\,(n-j)!}. \]
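To see Taylor’s formula (9.28) at work, the following Python sketch evaluates the Taylor polynomial $T_m(x, a)$ for the sample function $f(x, y) = e^x\cos y$, using the two-variable (binomial) case of (9.26). The sample function and the names `partial` and `taylor_polynomial` are assumptions of this sketch; the closed form used for the partial derivatives, $\partial_x^{m_1}\partial_y^{m_2} f = e^x\cos(y + m_2\pi/2)$, is specific to this $f$.

```python
from math import comb, cos, exp, factorial, pi


def partial(m1, m2, x, y):
    """Partial derivative of order (m1, m2) of f(x, y) = exp(x) * cos(y)."""
    return exp(x) * cos(y + m2 * pi / 2)


def taylor_polynomial(m, a, x):
    """T_m(x, a) from (9.28), with (h . grad)^p f(a) expanded as in (9.26)."""
    h1, h2 = x[0] - a[0], x[1] - a[1]
    total = 0.0
    for p in range(m + 1):
        term = sum(
            comb(p, m1) * partial(m1, p - m1, a[0], a[1]) * h1**m1 * h2**(p - m1)
            for m1 in range(p + 1)
        )
        total += term / factorial(p)
    return total


if __name__ == "__main__":
    a, x = (0.0, 0.0), (0.3, -0.2)
    exact = exp(x[0]) * cos(x[1])
    for m in (1, 2, 5, 10):
        print(m, taylor_polynomial(m, a, x), exact)
```

On a bounded convex set the partial derivatives of this $f$ are uniformly bounded, so the hypothesis of 9.7.4 holds there and the printed values should approach the exact value as $m$ grows.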

Exercises
1. Let $f$ be of class $C^3$. Write out explicitly
(a) $D^2 f(x, y)$. (b)S $D^3 f(x, y)$. (c) $D^2 f(x, y, z)$.
2.S Calculate $D^2 f$ for the functions $f(x, y) =$
(a) $x^3 y^2 + x^2 y^3$. (b) $\dfrac{1}{x^2 y}$. (c) $\sin(xy)$. (d) $e^{x^2 + y^2}$. (e) $\ln(x^2 + y)$.
3.S Find $D^{m+n+1}(x^m y^n)$.
4. Let $f(x, y)$ be $C^n$. Show that for $1 \le k \le n$,
\[ \frac{\partial^k}{\partial t^k} f(tx, ty)\Big|_{t=1} = \big((x, y)\cdot\nabla\big)^k f(x, y). \]
Conclude that if $f$ is homogeneous of degree $n$ (Exercise 9.3.15), then
\[ \big((x, y)\cdot\nabla\big)^k f(x, y) = n(n - 1)\cdots(n - k + 1)f(x, y). \]


5. Write out explicitly
(a)S the first order Taylor polynomial for a $C^1$ function $f(x_1, x_2, x_3)$.
(b) the third order Taylor polynomial for a $C^3$ function $f(x_1, x_2)$.
6. A polynomial of degree $m + n$ in two variables $x$ and $y$ is a function of the form
\[ \sum_{i=0}^m\sum_{j=0}^n a_{ij}x^i y^j, \quad\text{where } a_{ij} \in \mathbb{R} \text{ and } a_{mn} \ne 0. \]
Prove that $f(x, y)$ is a polynomial of degree $\le p$ on $B_r(a, b)$ iff $D^{p+1} f(x, y) = 0$ for all $(x, y) \in B_r(a, b)$.
7. Let $P(x, y)$ be a polynomial in $x$, $y$. Prove that the polynomials $P(x \pm 1)$ may be written as linear combinations of derivatives
\[ \frac{\partial^k P(x, y)}{\partial x^i\,\partial y^j}, \quad k = i + j. \]
8.S Let $\varphi(t)$ be of class $C^m$ on an interval $(-r, r)$ and let $f(x) = \varphi(b\cdot x)$, where $b, x \in \mathbb{R}^n$. Show that the Taylor polynomial for $f$ of order $m$ about $0$ is
\[ \sum_{p=0}^m \frac{\varphi^{(p)}(0)}{p!}(b\cdot x)^p. \]
9. Let $U \subseteq \mathbb{R}^2$ be open and connected and let $f$ be $C^\infty$ on $U$ such that for each $(x, y) \in U$ there exist $r > 0$ and $p \in \mathbb{N}$ depending on $(x, y)$ such that $D^p f = 0$ on $B_r(x, y)$. Prove that there exists a single $p \in \mathbb{N}$ such that $D^p f = 0$ on $U$. Hint. Use Exercise 6.
10. Let $U \subseteq \mathbb{R}^n$ be open and let $f$ be $C^p$ on $U$ such that all partial derivatives of $f$ of order $r < p$ vanish throughout $U$. Let $C$ be a compact convex subset of $U$. Prove that there exists $c < +\infty$ such that
\[ \|f(x) - f(y)\| \le c\,\|x - y\|^p, \quad x, y \in C. \]
11. Use the one variable Taylor series to find third order Taylor polynomials with $a = (0, 0)$ for the functions
(a)S $\sin(x + y)$. (b) $\sqrt{\cos xy}$. (c) $\ln(1 - x - y)^{-1}$.
(d)S $\arctan(x + y)$. (e) $e^{2x + 3y}$. (f) $\dfrac{y}{1 + xy}$.

*9.8 Optimization

Throughout the section, $f : U \to \mathbb{R}$ denotes a $C^1$ function on an open subset $U$ of $\mathbb{R}^n$.

In this section we use differential theory to find the maximum and minimum values of f on subsets E of U . The first step is to find all local extrema.

Local Extrema and Critical Points

9.8.1 Definition. Let $a \in U$. If $f(a)$ is the maximum (minimum) value of $f$ on some ball in $U$ with center $a$, then $f$ is said to have a local maximum (local minimum) at $a$. In either case, $f$ is said to have a local extremum at $a$. ♦

The following theorem gives a necessary condition for the existence of a local extremum.

9.8.2 Local Extremum Theorem. If $f$ has a local extremum at $a$, then $df_a = 0$.

Proof. For each $j$, the function $g(t) := f(a_1, \dots, a_{j-1}, t, a_{j+1}, \dots, a_n)$ has a local extremum at $t = a_j$, hence, by the single variable local extremum theorem (4.2.2), $\partial_j f(a_1, \dots, a_n) = g'(a_j) = 0$.

9.8.3 Definition. A point $a \in U$ is called a critical point of $f$ if $df_a = 0$. A critical point $a$ is a local maximum (local minimum) point if $f$ has a local maximum (local minimum) at $a$. If $a$ is neither a local maximum nor a local minimum point, then $a$ is called a saddle point. ♦

FIGURE 9.2: Saddle point.

By definition, a critical point $a$ is a saddle point iff in each ball $B_r(a)$ there exist points $x$ and $y$ such that $f(x) < f(a) < f(y)$. This means that the graph of $f$ rises in some directions from $a$ and falls in others. A familiar example is $f(x, y) = y^2 - x^2$ at $(0, 0)$ (Figure 9.2).


Second Derivative Test

The following theorem gives sufficient conditions for a critical point of a function $f$ to be a local maximum point, a local minimum point, or a saddle point. It may be seen as an extension of the second derivative test for functions of one variable.

9.8.4 Second Derivative Test. Let $f$ be $C^2$ on $U$ and let $a \in U$ be a critical point of $f$.
(a) If $D^2 f_a(h) > 0$ for all $h \ne 0$, then $a$ is a local minimum point.
(b) If $D^2 f_a(h) < 0$ for all $h \ne 0$, then $a$ is a local maximum point.
(c) If $D^2 f_a(h) > 0$ for some $h$ and $D^2 f_a(k) < 0$ for some $k$, then $a$ is a saddle point of $f$.

Proof. Choose $r > 0$ such that $B_r(a) \subseteq U$. By (9.28) with $m = 1$, for each $h$ with $\|h\| < r$ there exists $c \in [a : a + h]$ such that
\[ f(a + h) - f(a) = \tfrac{1}{2}D^2 f_c(h) = \tfrac{1}{2}\big[D^2 f_a(h) + \eta(h)\big], \tag{9.30} \]
where
\[ \eta(h) = D^2 f_c(h) - D^2 f_a(h) = \sum_{i,j=1}^n h_i h_j\left[\frac{\partial^2 f(c)}{\partial x_i\,\partial x_j} - \frac{\partial^2 f(a)}{\partial x_i\,\partial x_j}\right]. \]
Set
\[ \varepsilon(h) = \begin{cases} \|h\|^{-2}\eta(h) & \text{if } \|h\| \ne 0, \\ 0 & \text{if } \|h\| = 0. \end{cases} \]
Since $|h_i h_j| \le \|h\|^2$,
\[ |\varepsilon(h)| \le \sum_{i,j=1}^n \left|\frac{\partial^2 f(c)}{\partial x_i\,\partial x_j} - \frac{\partial^2 f(a)}{\partial x_i\,\partial x_j}\right|. \]
Since $f$ is $C^2$, $\lim_{h\to 0}\varepsilon(h) = 0$. With these preliminaries out of the way, assume that the hypothesis in (a) holds. Since the function $D^2 f_a(h)$ is continuous in $h$, it has a positive minimum $m$ on the sphere $S_1(0)$ in $\mathbb{R}^n$. Thus
\[ D^2 f_a(h) = \|h\|^2 D^2 f_a\Big(\frac{h}{\|h\|}\Big) \ge m\|h\|^2, \quad h \ne 0, \]
so from (9.30)
\[ f(a + h) - f(a) \ge \tfrac{1}{2}\big[m\|h\|^2 + \eta(h)\big] = \tfrac{1}{2}\big[m + \varepsilon(h)\big]\|h\|^2. \]
Since $m > 0$ and $\varepsilon(h) \to 0$, $f(a + h) - f(a) > 0$ for all $h \ne 0$ with sufficiently small norm. This proves (a). Part (b) follows from (a) by considering $-f$.


To prove (c), suppose for some $h$, $k$ that $D^2 f_a(h) > 0$ and $D^2 f_a(k) < 0$. By (9.30),
\[ f(a + th) - f(a) = \frac{t^2}{2}\big[D^2 f_a(h) + \|h\|^2\varepsilon(th)\big] \]
for all $t > 0$ with $t\|h\| < r$. Therefore, $f(a + th) - f(a) > 0$ for all sufficiently small $t > 0$. Similarly, $f(a + tk) - f(a) < 0$ for all sufficiently small $t > 0$.

9.8.5 Example. Let $f(x, y, z) = x^2 + y^2 + xy + 3x + \sin^2 z$. The system
\[ f_x = 2x + y + 3 = 0, \quad f_y = x + 2y = 0, \quad f_z = \sin(2z) = 0 \]
has solutions $a_n = (-2, 1, n\pi/2)$, $n \in \mathbb{Z}$. From $f_{xx} = f_{yy} = 2$, $f_{zz} = 2\cos(2z)$, $f_{xy} = 1$, and $f_{xz} = f_{yz} = 0$, we have
\[ D^2 f(h, k, \ell) = \Big(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} + \ell\frac{\partial}{\partial z}\Big)^2 f = h^2 f_{xx} + k^2 f_{yy} + \ell^2 f_{zz} + 2(hk f_{xy} + h\ell f_{xz} + k\ell f_{yz}) = 2\big[h^2 + k^2 + hk + \ell^2\cos(2z)\big]. \]
Therefore,
\[ D^2 f_{a_n}(h, k, \ell) = \begin{cases} 2(h^2 + k^2 + hk + \ell^2) & \text{if } n \text{ is even}, \\ 2(h^2 + k^2 + hk - \ell^2) & \text{if } n \text{ is odd}. \end{cases} \]
Since $h^2 + k^2 + hk \ge 0$ for all $h$, $k$, with equality only when $h = k = 0$, $a_n$ is a local minimum point if $n$ is even and a saddle point if $n$ is odd. ♦

The second derivative test gives no information if $D^2 f_a$ does not change sign but $D^2 f_a(h) = 0$ for some $h \ne 0$. For example, the critical point $(0, 0)$ of the function $f(x, y) = x^n + y^2$, $n \ge 3$, is a saddle point if $n$ is odd and a local minimum point if $n$ is even.

For functions of two variables, there is a simpler version of the second derivative test:

9.8.6 Corollary. Let $U \subseteq \mathbb{R}^2$ be open and let $f : U \to \mathbb{R}$ be $C^2$ on $U$. For a critical point $(a, b)$ of $f$, set
\[ \Delta = \Delta(a, b) = \begin{vmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{vmatrix} = f_{xx}(a, b)f_{yy}(a, b) - f_{xy}^2(a, b). \]
(a) If $\Delta > 0$ and $f_{xx}(a, b) > 0$, then $(a, b)$ is a local minimum point.
(b) If $\Delta > 0$ and $f_{xx}(a, b) < 0$, then $(a, b)$ is a local maximum point.
(c) If $\Delta < 0$, then $(a, b)$ is a saddle point of $f$.


Proof. Let $\alpha = f_{xx}(a, b)$, $\beta = f_{xy}(a, b)$, and $\gamma = f_{yy}(a, b)$. Then $\Delta = \alpha\gamma - \beta^2$ and
\[ D^2 f_{(a,b)}(h, k) = \alpha h^2 + 2\beta hk + \gamma k^2, \quad h, k \in \mathbb{R}. \tag{9.31} \]
If $\alpha \ne 0$, completing the square yields
\[ D^2 f_{(a,b)}(h, k) = \alpha\Big(h + \frac{k\beta}{\alpha}\Big)^2 + \frac{k^2(\alpha\gamma - \beta^2)}{\alpha} = \alpha\Big(h + \frac{k\beta}{\alpha}\Big)^2 + \frac{k^2\Delta}{\alpha}. \]
Thus if $\Delta > 0$, $\alpha > 0$, and $(h, k) \ne (0, 0)$, then $D^2 f_{(a,b)}(h, k) > 0$, hence, by the theorem, (a) holds. A similar argument proves (b).

Now suppose $\Delta < 0$. If $\alpha \ne 0$, then from (9.31)
\[ D^2 f_{(a,b)}(1, 0) = \alpha \quad\text{and}\quad D^2 f_{(a,b)}(-\beta\alpha^{-1}, 1) = \frac{\Delta}{\alpha}, \]
which have opposite signs. If $\gamma \ne 0$, then completing the square yields
\[ D^2 f_{(a,b)}(h, k) = \gamma\Big(k + \frac{h\beta}{\gamma}\Big)^2 + \frac{h^2\Delta}{\gamma}, \]
and one may argue similarly. (This also shows that (a) and (b) hold with $f_{xx}$ in the statement replaced by $f_{yy}$.) Finally, if $\alpha = \gamma = 0$, then $\beta \ne 0$, and (9.31) shows that, again, $D^2 f_{(a,b)}(h, k)$ has positive and negative values. This proves (c).

9.8.7 Example. Let $f(x, y) = 3x^2 y + 2xy^2 - 6xy$. Since
\[ f_x(x, y) = 2y(3x + y - 3) \quad\text{and}\quad f_y(x, y) = x(3x + 4y - 6), \]
the critical points are $(0, 0)$, $(2, 0)$, $(0, 3)$, and $(2/3, 1)$.

TABLE 9.1: Values of ∆.
(a, b)      fxx(a, b)   fyy(a, b)   fxy(a, b)   ∆(a, b)
(0, 0)      0           0           −6          −36
(2, 0)      0           8           6           −36
(0, 3)      18          0           6           −36
(2/3, 1)    6           8/3         2           12

Table 9.1 shows that $f$ has three saddle points and one local minimum point. ♦
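The entries of Table 9.1 can be reproduced mechanically. The Python sketch below applies Corollary 9.8.6 to the four critical points of Example 9.8.7, using exact rational arithmetic; the function names and the label "inconclusive" for the case $\Delta = 0$ are our own conventions for this illustration.

```python
from fractions import Fraction


def second_partials(x, y):
    """(f_xx, f_yy, f_xy) for f(x, y) = 3x^2 y + 2x y^2 - 6xy, computed by hand."""
    return 6 * y, 4 * x, 6 * x + 4 * y - 6


def classify(x, y):
    """Return (Delta, type of critical point) as in Corollary 9.8.6."""
    fxx, fyy, fxy = second_partials(x, y)
    delta = fxx * fyy - fxy**2
    if delta > 0:
        kind = "local minimum" if fxx > 0 else "local maximum"
    elif delta < 0:
        kind = "saddle"
    else:
        kind = "inconclusive"  # the corollary gives no information here
    return delta, kind


critical_points = [(0, 0), (2, 0), (0, 3), (Fraction(2, 3), 1)]
results = {pt: classify(*pt) for pt in critical_points}

if __name__ == "__main__":
    for pt, (delta, kind) in results.items():
        print(pt, delta, kind)
```

Using `Fraction` for the point $(2/3, 1)$ keeps the computation exact, so the value $\Delta = 12$ is obtained without rounding.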


9.8.8 Example. Let
\[ f(x, y) = (cx^2 + y^2)e^{-x^2 - y^2}, \quad c \ne 0, 1. \]
The system
\[ f_x = 2xe^{-x^2 - y^2}(c - cx^2 - y^2) = 0, \qquad f_y = 2ye^{-x^2 - y^2}(1 - cx^2 - y^2) = 0 \]
has solutions $(0, 0)$, $(0, \pm 1)$, and $(\pm 1, 0)$. The second partial derivatives are
\[ f_{xx} = 2e^{-x^2 - y^2}\big[c - 3cx^2 - y^2 + 2x^2(cx^2 + y^2 - c)\big], \]
\[ f_{yy} = 2e^{-x^2 - y^2}\big[1 - cx^2 - 3y^2 + 2y^2(cx^2 + y^2 - 1)\big], \quad\text{and} \]
\[ f_{xy} = 4xye^{-x^2 - y^2}\big[cx^2 + y^2 - c - 1\big]. \]

TABLE 9.2: Values of ∆.
(a, b)      fxx(a, b)    fyy(a, b)    fxy(a, b)   ∆(a, b)
(0, 0)      2c           2            0           4c
(0, 1)      2(c − 1)/e   −4/e         0           8(1 − c)/e^2
(0, −1)     2(c − 1)/e   −4/e         0           8(1 − c)/e^2
(1, 0)      −4c/e        2(1 − c)/e   0           8c(c − 1)/e^2
(−1, 0)     −4c/e        2(1 − c)/e   0           8c(c − 1)/e^2

The values of ∆ at the critical points $(a, b)$ are given in Table 9.2. Assigning values to $c$ produces a variety of local extreme points. For example, if $c > 1$, then $(0, \pm 1)$ are saddle points, $(0, 0)$ is a local minimum point, and $(\pm 1, 0)$ are local maximum points of $f$. ♦
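The dependence on $c$ in Example 9.8.8 can be explored with a short Python sketch that evaluates the second partial derivatives displayed in the example at each critical point and applies Corollary 9.8.6. The sample values $c = 2$ and $c = 1/2$ and the function names are assumptions chosen for illustration.

```python
from math import exp


def second_partials(x, y, c):
    """(f_xx, f_yy, f_xy) for f = (c x^2 + y^2) exp(-x^2 - y^2), as in the example."""
    e = exp(-x**2 - y**2)
    fxx = 2 * e * (c - 3 * c * x**2 - y**2 + 2 * x**2 * (c * x**2 + y**2 - c))
    fyy = 2 * e * (1 - c * x**2 - 3 * y**2 + 2 * y**2 * (c * x**2 + y**2 - 1))
    fxy = 4 * x * y * e * (c * x**2 + y**2 - c - 1)
    return fxx, fyy, fxy


def classify(x, y, c):
    """Return (Delta, type of critical point) as in Corollary 9.8.6."""
    fxx, fyy, fxy = second_partials(x, y, c)
    delta = fxx * fyy - fxy**2
    if delta > 0:
        kind = "local minimum" if fxx > 0 else "local maximum"
    elif delta < 0:
        kind = "saddle"
    else:
        kind = "inconclusive"
    return delta, kind


CRITICAL_POINTS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

if __name__ == "__main__":
    for c in (2.0, 0.5):
        for pt in CRITICAL_POINTS:
            print(c, pt, classify(pt[0], pt[1], c))
```

For $0 < c < 1$ the roles of the points $(0, \pm 1)$ and $(\pm 1, 0)$ are interchanged: the former become local maximum points and the latter saddle points.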

Global Extrema

We now turn to the problem described at the beginning of the section, namely, to find the points in a subset $E$ of $U$ at which $f$ has a maximum or a minimum. Such points, called global extrema, will always exist if $E$ is closed and bounded. The following examples illustrate a common technique for finding them.

9.8.9 Example. Let
\[ f(x, y) = 2x^3 - x^2 + 3y^2, \qquad E := \{(x, y) : x^2 + y^2 \le 1\}. \]
By 9.8.2, the extreme values of $f$ occur at points on $\mathrm{bd}(E)$ or at critical points of $f$ in $\mathrm{int}(E)$. Solving the system $f_x = 6x^2 - 2x = 0$, $f_y = 6y = 0$ yields the critical points $(0, 0)$ and $(1/3, 0)$, which are candidates for extrema


in $\mathrm{int}(E)$. To find possible extrema on $\mathrm{bd}(E)$ we substitute $1 - x^2$ for $y^2$ in the expression for $f$ to obtain the function $F(x) = 2x^3 - 4x^2 + 3$, $-1 \le x \le 1$. Since the only zero of $F'(x)$ in $[-1, 1]$ is $x = 0$, single variable optimization theory gives us the additional extrema candidates $(0, \pm 1)$ and $(\pm 1, 0)$. Calculating the values of $f$ at these six points shows that $f(0, \pm 1) = 3$ is the maximum value of $f$ on $E$ and $f(-1, 0) = -3$ is the minimum. ♦

9.8.10 Example. Let
\[ f(x, y, z) = (x - 1)^2 + (y - 2)^2 + z^2, \qquad E := \{(x, y, z) : x^2 + y^2 + z^2 \le 6\}. \]
The solution of the system $f_x = f_y = f_z = 0$ is $(1, 2, 0)$, at which $f$ has minimum value zero. The maximum of $f$ must then occur on $\mathrm{bd}(E)$. Substituting the expression $6 - x^2 - y^2$ for $z^2$ in the definition of $f$, we obtain the function
\[ F(x, y) = (x - 1)^2 + (y - 2)^2 + 6 - x^2 - y^2 = 11 - 2x - 4y, \quad x^2 + y^2 \le 6. \]
The system $F_x = F_y = 0$ has no solution, hence the extreme values of $F$ must lie on the boundary $x^2 + y^2 = 6$. To find these values, let $x = \sqrt{6}\cos\theta$ and $y = \sqrt{6}\sin\theta$, so
\[ F(x, y) = G(\theta) := 11 - 2\sqrt{6}\cos\theta - 4\sqrt{6}\sin\theta, \quad 0 \le \theta \le 2\pi. \]
Applying single variable optimization techniques to $G$, we see that possible extreme values of $F$ on $x^2 + y^2 = 6$ occur at points $(x, y)$ for which $\theta = 0$ or $\tan\theta = 2$, that is, $(x, y) = (\sqrt{6}, 0)$ and $\pm\big(\sqrt{6/5}, 2\sqrt{6/5}\big)$. Calculating the values of $F$ at these points shows that the maximum value of $f$ on $E$ is
\[ f\big(-\sqrt{6/5}, -2\sqrt{6/5}, 0\big) \approx 22. \quad ♦ \]

In the above examples, $E$ was the closure of an open set whose boundary is a smooth surface. In many important cases, however, $E$ itself is a surface. The surfaces we shall consider are of the form
\[ E = \{x \in U : g_1(x) = \cdots = g_m(x) = 0\}, \]
where $U \subseteq \mathbb{R}^n$ is open, $m < n$, and the functions $g_j$ are $C^1$ on $U$. The equations $g_j(x) = 0$ are then called constraints and $E$ is the constraint set. If $f(a)$ is the maximum or minimum value of $f$ on $E$, then $f$ is said to have an extremum at $a$ subject to the constraints $g_j = 0$.

9.8.11 Example. We find the points on the surface $z^2 - x^2 y = 1$ closest to the origin. This is equivalent to minimizing $f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $z^2 - x^2 y - 1 = 0$. Since the surface is unbounded, it suffices to consider


A Course in Real Analysis

that part of the surface inside a ball with center 0. To find the minimum, we substitute $z^2 = x^2 y + 1$ into $f$ to obtain a function
\[ F(x, y) = x^2(1 + y) + y^2 + 1 \]
defined on an open disk containing a point at which $f$ is minimum. The critical points of $F$, solutions of the system
\[ F_x = 2x(1 + y) = 0, \qquad F_y = x^2 + 2y = 0, \]
are $(0, 0)$ and $(\pm\sqrt{2}, -1)$. The last two are easily seen to be saddle points, while $(0, 0)$ is a local minimum point. Therefore, the minimum of $f$ occurs at $(0, 0, \pm 1)$, hence the distance from the surface to the origin is 1. ♦
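The classification of the critical points in 9.8.11 can be checked numerically. The following is a minimal sketch, not part of the text's argument; the function names `F`, `grad_F`, `hessian_det`, and `classify` are illustrative choices, and the second derivative test is applied with $F_{xx} = 2(1 + y)$, $F_{xy} = 2x$, $F_{yy} = 2$.

```python
import math

def F(x, y):
    """Reduced objective of Example 9.8.11: F(x, y) = x^2 (1 + y) + y^2 + 1."""
    return x * x * (1 + y) + y * y + 1

def grad_F(x, y):
    """Gradient (F_x, F_y) = (2x(1 + y), x^2 + 2y)."""
    return (2 * x * (1 + y), x * x + 2 * y)

def hessian_det(x, y):
    """Determinant F_xx * F_yy - F_xy^2 of the Hessian of F."""
    return 2 * (1 + y) * 2 - (2 * x) ** 2

def classify(x, y):
    """Second derivative test at a critical point of F."""
    d = hessian_det(x, y)
    if d < 0:
        return "saddle"
    if d > 0:
        return "local min" if 2 * (1 + y) > 0 else "local max"
    return "inconclusive"

critical_points = [(0.0, 0.0), (math.sqrt(2), -1.0), (-math.sqrt(2), -1.0)]
labels = [classify(x, y) for x, y in critical_points]

# Distance from the surface to the origin: sqrt of the minimum of x^2 + y^2 + z^2.
min_distance = math.sqrt(F(0.0, 0.0))
```

Here `labels` records the outcome of the test at each critical point, and `min_distance` is the square root of the minimum value $F(0, 0)$.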

Lagrange Multipliers

In 9.8.11, it was possible to solve the constraint equation for one of the variables in terms of the others, reducing the dimension by one, thereby simplifying the problem. This is not always possible, but the implicit function theorem may be used to solve the constraint equation locally. This is the method used in the proof of the next theorem. For its statement, we use the following notational conventions, similar to those used in the proof of the implicit function theorem.

Notation. Let $m < n$ and $p := n - m$. For points $z \in \mathbb{R}^n = \mathbb{R}^{m+p}$ we write
\[ z = (x, y) = (x_1, \dots, x_m, y_1, \dots, y_p), \qquad x \in \mathbb{R}^m,\ y \in \mathbb{R}^p. \]
If $G := (g_1, \dots, g_m) : U \to \mathbb{R}^m$, then $G(z)$ may be written as $G(x, y)$. If $G$ is differentiable, we define
\[
G_x = \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_m} \\ \vdots & & \vdots \\ \dfrac{\partial g_m}{\partial x_1} & \cdots & \dfrac{\partial g_m}{\partial x_m} \end{bmatrix}
\quad\text{and}\quad
G_y = \begin{bmatrix} \dfrac{\partial g_1}{\partial y_1} & \cdots & \dfrac{\partial g_1}{\partial y_p} \\ \vdots & & \vdots \\ \dfrac{\partial g_m}{\partial y_1} & \cdots & \dfrac{\partial g_m}{\partial y_p} \end{bmatrix}.
\]
♦

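9.8.12 Lagrange Multipliers. Let $U \subseteq \mathbb{R}^n$ be open and let $f, g_j : U \to \mathbb{R}$, $j = 1, \dots, m < n$, be $C^1$ functions. Set $G := (g_1, \dots, g_m)$. Suppose that $f$ has a global extremum at $c = (a, b) \in U$ subject to the constraint $G = 0$. If $\det G_x(c) \ne 0$, then there exist constants $\lambda_1, \dots, \lambda_m$ such that
\begin{equation}
\nabla f(c) = \sum_{i=1}^{m} \lambda_i \nabla g_i(c). \tag{9.32}
\end{equation}

Proof. Equation (9.32) is the system
\[
\begin{aligned}
\partial_j f(c) &= \lambda_1 \frac{\partial g_1}{\partial x_j} + \cdots + \lambda_m \frac{\partial g_m}{\partial x_j}, & j &= 1, \dots, m, \\
\partial_{j+m} f(c) &= \lambda_1 \frac{\partial g_1}{\partial y_j} + \cdots + \lambda_m \frac{\partial g_m}{\partial y_j}, & j &= 1, \dots, p,
\end{aligned}
\]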

Differentiation on Rn


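which may be written in matrix form as
\begin{align}
\begin{bmatrix} \lambda_1 & \cdots & \lambda_m \end{bmatrix} G_x(c) &= \begin{bmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{bmatrix}, \tag{9.33} \\
\begin{bmatrix} \lambda_1 & \cdots & \lambda_m \end{bmatrix} G_y(c) &= \begin{bmatrix} \partial_{m+1} f(c) & \cdots & \partial_n f(c) \end{bmatrix}. \tag{9.34}
\end{align}
Equation (9.33) is satisfied by defining
\begin{equation}
\begin{bmatrix} \lambda_1 & \cdots & \lambda_m \end{bmatrix} := \begin{bmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{bmatrix} G_x^{-1}(c). \tag{9.35}
\end{equation}
It remains to show that (9.34) is satisfied for this choice of $\lambda_1, \dots, \lambda_m$. By the implicit function theorem applied to $G$, there is an open set $V_b \subseteq \mathbb{R}^p$ containing $b$ and a continuously differentiable mapping $h = (h_1, \dots, h_m) : V_b \to \mathbb{R}^m$ such that $h(b) = a$ and $G\big(h(y), y\big) = 0$ for every $y \in V_b$. Applying the chain rule to each component equation $g_i\big(h(y), y\big) = 0$ yields
\[
\frac{\partial g_i}{\partial x_1}\frac{\partial h_1}{\partial y_j} + \cdots + \frac{\partial g_i}{\partial x_m}\frac{\partial h_m}{\partial y_j} + \frac{\partial g_i}{\partial y_j} = 0, \qquad i = 1, \dots, m,\ j = 1, \dots, p,
\]
which may be written in matrix form as
\[
\begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_m} \\ \vdots & & \vdots \\ \dfrac{\partial g_m}{\partial x_1} & \cdots & \dfrac{\partial g_m}{\partial x_m} \end{bmatrix}
\begin{bmatrix} \dfrac{\partial h_1}{\partial y_1} & \cdots & \dfrac{\partial h_1}{\partial y_p} \\ \vdots & & \vdots \\ \dfrac{\partial h_m}{\partial y_1} & \cdots & \dfrac{\partial h_m}{\partial y_p} \end{bmatrix}
= -\begin{bmatrix} \dfrac{\partial g_1}{\partial y_1} & \cdots & \dfrac{\partial g_1}{\partial y_p} \\ \vdots & & \vdots \\ \dfrac{\partial g_m}{\partial y_1} & \cdots & \dfrac{\partial g_m}{\partial y_p} \end{bmatrix}
\]
or in the above notation as $G_x(c)\, h'(b) = -G_y(c)$. Multiplying the last equation on the left by $\begin{bmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{bmatrix} G_x^{-1}(c)$ and using (9.35), we obtain
\begin{equation}
\begin{bmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{bmatrix} h'(b) = -\begin{bmatrix} \lambda_1 & \cdots & \lambda_m \end{bmatrix} G_y(c). \tag{9.36}
\end{equation}
Since $f\big(h(y), y\big)$ has a local extremum at $b$, its partial derivatives must vanish there:
\[
\frac{\partial f(c)}{\partial x_1}\frac{\partial h_1(b)}{\partial y_j} + \cdots + \frac{\partial f(c)}{\partial x_m}\frac{\partial h_m(b)}{\partial y_j} + \frac{\partial f(c)}{\partial y_j} = 0, \qquad j = 1, 2, \dots, p.
\]
In matrix form,
\begin{equation}
\begin{bmatrix} \partial_1 f(c) & \cdots & \partial_m f(c) \end{bmatrix} h'(b) = -\begin{bmatrix} \partial_{m+1} f(c) & \cdots & \partial_n f(c) \end{bmatrix}. \tag{9.37}
\end{equation}
Equation (9.34) now follows from (9.36) and (9.37).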


9.8.13 Example. Let $c, x \in \mathbb{R}^n$, $c \ne 0$. We find the extreme values of $f(x) := c \cdot x$ on the sphere $\|x\| = 1$, that is, subject to the constraint $g(x) := \|x\|^2 - 1 = 0$. By Lagrange multipliers, the extreme values occur at points $x$ for which $\nabla f(x) = \lambda \nabla g(x)$ for some $\lambda \in \mathbb{R}$. This leads to the system $c_i = 2\lambda x_i$, $1 \le i \le n$. Squaring and adding yields $\|c\|^2 = 4\lambda^2 \|x\|^2 = 4\lambda^2$, hence $2\lambda = \pm\|c\|$ and $x = c/2\lambda = \pm c/\|c\|$. Therefore, the extreme values of $f$ are $f\big(\pm c/\|c\|\big) = \pm\|c\|$. ♦

The last example has an important application to directional derivatives: Let $h$ be differentiable on $B_r(a)$. From Exercise 9.3.10, the directional derivative $D_x h(a)$ of $h$ at $a$ in the direction of a unit vector $x$ is $c \cdot x$, where $c = \nabla h(a)$. Thus, by the example, $D_x h(a)$ is maximum when $x = c/\|c\|$, that is, when $x$ is in the direction of the gradient of $h$.

9.8.14 Example. Let $x = (x_1, \dots, x_n)$, $a = (a_1, \dots, a_n)$, and $c = (c_1, \dots, c_n)$, where $x_j \ge 0$, $a_j > 0$, and $c_j > 0$. We find the maximum value of
\[ f(x) = x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \]
subject to the constraint $c \cdot x = 1$. Note that the conditions $x_j \ge 0$ and $c_j > 0$ imply that the constraint set is closed and bounded. Set $g(x) = c \cdot x - 1$. The maximum of $f$ occurs at points $x$ for which $\nabla f(x) = \lambda \nabla g(x)$ for some $\lambda \in \mathbb{R}$. This leads to the equations
\begin{equation}
a_j f(x) = \lambda c_j x_j, \qquad j = 1, \dots, n. \tag{9.38}
\end{equation}
Adding and using the constraint yields $a f(x) = \lambda$, or $f(x) = \lambda/a$, where $a = \sum_{j=1}^{n} a_j$. From (9.38), $a_j = a c_j x_j$, so the maximum occurs at the point
\[ \left( \frac{a_1}{a c_1}, \frac{a_2}{a c_2}, \dots, \frac{a_n}{a c_n} \right). \]
In particular, if $a_1 = \cdots = a_n = 1$ and $c_1 = \cdots = c_n = 1/c$, $c > 0$, then $f(x_1, x_2, \dots, x_n) = x_1 x_2 \cdots x_n$ has maximum $f(c/n, \dots, c/n) = (c/n)^n$. Thus $x_1 x_2 \cdots x_n \le (c/n)^n$, or equivalently $(x_1 x_2 \cdots x_n)^{1/n} \le c/n$, for all $x_j > 0$ satisfying $x_1 + \cdots + x_n = c$. Since $c$ is arbitrary, we obtain the classic result
\[ (x_1 x_2 \cdots x_n)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad x_j \ge 0, \]
which asserts that the geometric mean of nonnegative data does not exceed the arithmetic mean. ♦
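The maximizer found in 9.8.14 can be spot-checked by sampling. The sketch below is only an illustration under stated assumptions: the helper names `maximizer`, `f`, and `random_feasible`, the sample weights `a` and `c`, and the sampling scheme are all hypothetical choices, and random sampling gives evidence rather than a proof.

```python
import math
import random

def maximizer(a, c):
    """Point (a_j / (A c_j))_j from Example 9.8.14, where A is the sum of the a_j."""
    total = sum(a)
    return [aj / (total * cj) for aj, cj in zip(a, c)]

def f(x, a):
    """f(x) = x_1^{a_1} * ... * x_n^{a_n}."""
    return math.prod(xj ** aj for xj, aj in zip(x, a))

def random_feasible(c, rng):
    """Random point with x_j > 0 and c . x = 1 (the sampling scheme is arbitrary)."""
    w = [rng.random() + 1e-9 for _ in c]
    s = sum(w)
    return [wj / (s * cj) for wj, cj in zip(w, c)]

rng = random.Random(0)
a = [1.0, 2.0, 0.5]   # illustrative exponents a_j > 0
c = [2.0, 1.0, 3.0]   # illustrative coefficients c_j > 0

x_star = maximizer(a, c)
best = f(x_star, a)

# No sampled feasible point should exceed the value at the Lagrange point.
samples = [random_feasible(c, rng) for _ in range(2000)]
never_beaten = all(f(x, a) <= best + 1e-12 for x in samples)
```

Setting all `a` entries to 1 and all `c` entries to `1/n` in this sketch corresponds to the special case that yields the arithmetic-geometric mean inequality.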

Exercises 1. In each case classify the critical point a := (π/2, π/2, π/2) of the function. (a) (sin x)(sin y)(sin z).

(b) (sin x)(cos y)(cos z).

2.S Show that the function x2 + 2y 2 + 3z 2 − xy − yz − xz on R3 has minimum value zero.


3. Find and classify the critical points of the following functions. (a) S x3 + 2xy + 3x2 + y 2 .

(b) x3 + 3x2 y 2 − 6x2 − 12y 2 .

(c) x2 y 2 + 2/x + 2/y.

(d) S x4 + 2y 2 − 4xy.

(e) x−1 + y −1 + ln(x2 + y 2 ).

(f) S x−1 + y −1 + arctan(y/x).

(g) x3 − xy 2 + x2 − y 2 .

(h) x4 − 2x2 + 4y 3 − 12y.

(i) S xy − x2 y − xy 2 .

(j) x4 − 4x3 + 4x2 + y 2 .

4. Find the maximum and minimum values of each of the following functions f on R2 \ {(0, 0)}. √ x+y x + 3y x + 2y x2 + xy (a)S p . . (b) p . (c) p . (d) 2 x + y2 x2 + y 2 x2 + y 2 x2 + y 2 5. Show that the point (x, x2 ) on the curve y = x2 nearest the point (1, 2) satisfies the equation 2x3 − 3x − 1 = 0. In Exercises 6–9, use the method of 9.8.9 and 9.8.10. 6. Find the extreme values of the following functions on the disk D := (x, y) : x2 + y 2 ≤ 1 : (a) 3x2 + 2y 2 − x. (b)S x2 + xy − x + y 2 . (c) cos(xy). (d)S sin(xy). 7.S Prove that the maximum of f (x, y) = x2 + ay 2 + (a − 1)y on the disk D := (x, y) : x2 + y 2 ≤ 1 occurs on bd(D). 8. Let f (x, y) = x2 + y 2 + axy on the disk D := (x, y) : x2 + y 2 ≤ 1 . Prove that a maximum of f occurs on bd(D), and that a minimum of f occurs on bd(D) iff |a| ≥ 2. Pn 9. Show that the (minimum) value of fn (x1 , . . . , xn ) = i=1 xi √ maximum √ on C1 (0) is n (− n). 10.S Let f (x, y) = ax−1 + by −1 + xy, a, b > 0. Prove that f has a minimum on (0, +∞) × (0, +∞) and that the minimum value is 3(ab)1/3 . 11.S Consider the data points (xi , yi ), 1 ≤ i ≤ n, where xi = 6 xj for at least one pair of points. The linear least squares fit is the line y = mx + b with the property that the sum of squares of the vertical distances from 2 Pn the data points to the line, namely, i=1 yi − mxi − b , is minimum. Show that x · y − nx y , and b = y − mx, where kxk2 − nx2 n n 1X 1X x = (x1 , . . . , xn ), y = (y1 , . . . , yn ), x := xi , and y := yi . n i=1 n i=1 m=


12. Let $x, a, b, c \in \mathbb{R}^n$. Prove that $f(x) := \|x - a\|^2 + \|x - b\|^2 + \|x - c\|^2$ has a minimum value and find the point at which it occurs.

In Exercises 13–28, use Lagrange multipliers.

13. Show that the maximum value of $f(x) = x_1 x_2 \cdots x_n$ on the set $E := \{x : x_i \ge 0 \text{ and } \sum_{i=1}^{n} x_i \le 1\}$ is $n^{-n}$.

14. Find the maximum and minimum of $2x - 3y$ subject to the constraint $(x + 1)^2 + (y - 1)^2 = 1$.

15.S Find the maximum and minimum of $ax^2 + 2bxy + y^2$ subject to the constraint $x^2 + y^2 = c^2$, where $abc \ne 0$ and $(a + 1)^2 + 4(a - b^2) \ge 0$.

16. Show that the point $(x, y, z)$ on the surface $x^2 + y^2 + z^2 = 1$ nearest the point $(1, 2, 3)$ is $(1/\sqrt{14},\, 2/\sqrt{14},\, 3/\sqrt{14})$.

17.S Show that the point $(x, y, z)$ on the surface $z = x^2 + y^2$ nearest the point $(1, 2, 3)$ satisfies the equations $10x^3 - 5x - 1 = 0$, $y = 2x$, and $z = x^2 + y^2 = 5x^2$.

18. Show that the point on the surface $x^2 + y^2 - z^2 = 1$ nearest to the point $(1, 2, 3)$ is $\big(x, 2x, 3x/(2x - 1)\big)$, where $20x^4 - 20x^3 - 8x^2 + 4x - 1 = 0$.

19.S Show that the point on the surface $z^2 - x^2 - y^2 = 1$ nearest $(1, 2, 3)$ is $\big(x, 2x, 3x/(2x - 1)\big)$, where $20x^4 - 20x^3 - 4x + 1 = 0$.

20. The intersection of the surfaces $z = x^2 + y^2$ and $x + y + z = 1$ is an ellipse lying above the $xy$ plane. Find the highest and lowest points of the ellipse.

21. Let $a, b, c > 0$. Show that the maximum and minimum values of the function $f(x, y, z) = ax + by + cz$ subject to the constraints $x^2 + z^2 = 1$, $y^2 + z^2 = 1$, $x, y, z \ge 0$, are, respectively, the maximum and minimum of the quantities $c$, $a + b$, and $(a + b + cd)/\sqrt{1 + d^2}$, where $d := c/(a + b)$.

22.S Find the maximum and minimum values of $x + 2y + 3z$ subject to the constraints $x + y + z = 1$ and $x^2 + y^2 + z^2 = 1$.

23. Let $a > 1/3$. Show that the maximum value of $xyz$ subject to the constraints $x + y + z = 1$ and $x^2 + y^2 + z^2 = a$ is
\[ xyz = \frac{1}{27}\big(1 - 3t^2 + 2t^3\big), \quad\text{where } t = \sqrt{\frac{3a - 1}{2}}. \]


24.S Let $x = (x_1, \dots, x_n)$, $a = (a_1, \dots, a_n) \ne 0$, and $b = (b_1, \dots, b_n)$. Show that the shortest distance from $b$ to the hyperplane $a \cdot x = c$ is $|c - a \cdot b|\,\|a\|^{-1}$.

25. Let $p \ge 2$. Show that the largest distance from the surface $\sum_{i=1}^{n} |x_i|^p = 1$ to the origin is $n^{(p-2)/2p}$ and the smallest distance is 1.

26.S Find the distance from the point $a = (a_1, \dots, a_n)$ to the $(n - 1)$-dimensional sphere $\|x\| = 1$ in $\mathbb{R}^n$, where $a_j > 0$ and $\|a\| \ne 1$.

27. Let $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$, where $a_i, b_i > 0$.
(a)S Show that the minimum value of the function $a \cdot x$ subject to the constraint $\sum_{i=1}^{n} b_i/x_i = 1$, where $x_i > 0$, is $\Big(\sum_{i=1}^{n} \sqrt{a_i b_i}\Big)^2$.
(b) Show that the minimum value of $\sum_{i=1}^{n} a_i x_i^2$ subject to the constraint in (a) is $\Big(\sum_{i=1}^{n} a_i^{1/3} b_i^{2/3}\Big)^3$.

28. Let $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$, where $a_i, b_i > 0$ and $\sum_{i=1}^{n} b_i = 1$. Find the minimum value of $a \cdot x$ subject to the constraint $\prod_{j=1}^{n} x_j^{b_j} = 1$, where $x_i > 0$.

29. Let $U \subseteq \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}$ be $C^2$ on $U$. Show that if $f$ has a local maximum (minimum) at $a \in U$, then $D^2 f_a(h) \le 0$ ($\ge 0$) for all $h \in \mathbb{R}^n$.

30.S Prove the following generalization of Rolle's theorem: Let $U \subseteq \mathbb{R}^n$ be bounded and open and let $f : U \to \mathbb{R}$ be differentiable on $U$, continuous on cl(U), and constant on bd(U). Then $f'(u) = 0$ for some $u \in U$.

31. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}^n$ be $C^1$ on $U$ such that $J_f \ne 0$ on $U$. Let $a \in U$ and let $C := C_r(a) \subseteq U$, $r > 0$. Prove that if $\sup_C \|f(x) - x\| < r/2$, then the equation $f(x) = a$ has a solution in $C$.

Chapter 10 Lebesgue Measure on Rn

The methods of Chapter 5 may be modified in a natural way to construct the Riemann integral of a function of several variables. In Section 11.1, we briefly describe how this is done. However, the main goal of the present chapter and the next is to construct the more general Lebesgue integral. The choice to develop the n-dimensional Lebesgue integral rather than the n-dimensional Riemann integral is motivated by the fact that, as an analytical tool, the former has several distinct advantages over the latter. For example, the Lebesgue theory allows the interchange of limit and integral in more general settings. Furthermore, the collection of Lebesgue integrable functions, which includes unbounded functions on unbounded domains, is significantly larger than the set of Riemann integrable functions. These advantages make the Lebesgue theory better suited for applications based on, for example, probability theory and, in particular, stochastic processes. The key idea in Riemann integration on Rn is the partitioning of the domain of the integrand f into n-dimensional subintervals. The Riemann integral is then obtained as a limit of Riemann sums, that is, sums of function values times the volumes of the subintervals. In Lebesgue integration, it is the range of f rather than the domain that is partitioned into subintervals (see Figure 10.7). This still produces a partition of the domain of f ; however, the sets in this partition are generally more complicated than subintervals. The Lebesgue integral is constructed by multiplying the measure of these sets by function values, adding the results, and then taking limits. In this chapter we construct the measure and in the next chapter we construct the integral. The precise connection between the Riemann and Lebesgue integrals is made in Section 11.4.

10.1 General Measure Theory

In this section we give a brief description of those aspects of measure theory that will be needed to construct Lebesgue measure on $\mathbb{R}^n$. For a comprehensive treatment see, for example, [4].


Sigma Fields

10.1.1 Definition. A σ-field on a nonempty set $S$ is a collection $\mathcal{F}$ of subsets of $S$ such that
(a) $S, \emptyset \in \mathcal{F}$;
(b) $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$;
(c) $A_k \in \mathcal{F}$, $k \in \mathbb{N}$, implies $\bigcup_k A_k \in \mathcal{F}$. ♦

Part (c) of the definition says that $\mathcal{F}$ is closed under countable unions. By De Morgan's law,
\[ \Big(\bigcup_k A_k\Big)^c = \bigcap_k A_k^c, \]
hence part (b) implies that $\mathcal{F}$ is also closed under countable intersections. The collection of all subsets of $S$ and the collection $\{\emptyset, S\}$ are simple examples of σ-fields. The following examples are somewhat more interesting.

10.1.2 Example. If $\mathcal{A}$ is an arbitrary collection of subsets of $S$, then the σ-field generated by $\mathcal{A}$ is the intersection $\sigma(\mathcal{A})$ of all σ-fields containing $\mathcal{A}$. It is the smallest σ-field containing $\mathcal{A}$ in the sense that if $\mathcal{F}$ is a σ-field containing $\mathcal{A}$ then $\mathcal{F}$ contains $\sigma(\mathcal{A})$. In the special case where $\mathcal{A} = \{A_1, A_2, \dots\}$ is a countable partition of $S$, $\sigma(\mathcal{A})$ is simply the collection $\mathcal{F}$ of all unions of members of $\mathcal{A}$. Indeed, $\mathcal{F}$ is clearly closed under countable unions, and the calculation
\[ \Big(\bigcup_{k \in F} A_k\Big)^c = \bigcup_{k \in F^c} A_k, \qquad F \subseteq \mathbb{N}, \]
shows that $\mathcal{F}$ is closed under complements. Thus, by minimality, $\sigma(\mathcal{A}) = \mathcal{F}$. ♦

10.1.3 Example. If $\mathcal{F}$ is a σ-field on $S$ and $E \subseteq S$, then the collection $\mathcal{F}_E := \{A \cap E : A \in \mathcal{F}\}$ is a σ-field of subsets of $E$. Moreover, $\mathcal{F}_E \subseteq \mathcal{F}$ iff $E \in \mathcal{F}$. (See Exercise 2.) ♦
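For a finite partition, the description of the generated σ-field in 10.1.2 can be enumerated directly. The following sketch is a finite analogue only (the text treats countable partitions); the function name `sigma_field_from_partition` and the sample set and partition are illustrative assumptions.

```python
from itertools import combinations

def sigma_field_from_partition(blocks):
    """All unions of blocks of a finite partition (finite analogue of Example 10.1.2)."""
    blocks = [frozenset(b) for b in blocks]
    members = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            # The union over an empty selection is the empty set.
            members.add(frozenset().union(*chosen))
    return members

S = frozenset(range(6))
partition = [{0, 1}, {2}, {3, 4, 5}]
field = sigma_field_from_partition(partition)

# Closure properties required by Definition 10.1.1 (finite unions suffice here).
closed_under_complement = all(S - A in field for A in field)
closed_under_union = all(A | B in field for A in field for B in field)
```

With three blocks the collection has $2^3$ members, one for each subset of block indices, which mirrors the index sets $F \subseteq \mathbb{N}$ in the calculation of the example.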

Measure on a Sigma Field

10.1.4 Definition. A measure on a σ-field $\mathcal{F}$ of subsets of $S$ is a function $\mu : \mathcal{F} \to [0, +\infty]$ such that $\mu(\emptyset) = 0$ and $\mu$ has the additivity property
\[ \mu\Big(\bigcup_k A_k\Big) = \sum_k \mu(A_k) \]
for any finite or infinite sequence of pairwise disjoint sets $A_k \in \mathcal{F}$. The extended real number $\mu(A)$ is called the measure of $A$. ♦


10.1.5 Example. Let $\{p_k\}$ be a sequence of nonnegative real numbers. Define
\[ \mu(E) = \sum_{k \in E} p_k, \qquad E \subseteq \mathbb{N}, \]
where the sum may be infinite. (By convention, the sum over the empty set is zero.) It is not difficult to show that $\mu$ is a measure on the σ-field of all subsets of $\mathbb{N}$. In the special case $p_k = 1$ for all $k$, $\mu(E)$ counts the number of elements in $E$ if $E$ is a finite set, and $\mu(E) = +\infty$ otherwise. In this case, $\mu$ is called a counting measure. ♦

10.1.6 Proposition. Let $\mu$ be a measure on a σ-field $\mathcal{F}$ and $A_1, A_2, \dots \in \mathcal{F}$.
(a) If $A_1 \subseteq A_2$, then $\mu(A_1) \le \mu(A_2)$ (monotonicity).
(b) $\mu\big(\bigcup_k A_k\big) \le \sum_k \mu(A_k)$ (subadditivity).
(c) $\mu(A_1) + \mu(A_2) = \mu(A_1 \cup A_2) + \mu(A_1 \cap A_2)$ (inclusion-exclusion).
(d) If $A_k \uparrow A$, then $\mu(A_k) \uparrow \mu(A)$ (continuity from below).
(e) If $A_k \downarrow A$ and $\mu(A_1) < +\infty$, then $\mu(A_k) \downarrow \mu(A)$ (continuity from above).

Proof. (a) By additivity, $\mu(A_2) = \mu(A_2 \setminus A_1) + \mu(A_1) \ge \mu(A_1)$.

(b) Write
\[ \bigcup_k A_k = A_1 \cup (A_2 \cap A_1^c) \cup \cdots \cup (A_m \cap A_1^c \cap \cdots \cap A_{m-1}^c) \cup \cdots. \]
Since the sets in the union on the right are pairwise disjoint, by countable additivity and monotonicity
\[ \mu\Big(\bigcup_k A_k\Big) = \mu(A_1) + \sum_{m \ge 2} \mu\big(A_1^c \cap \cdots \cap A_{m-1}^c \cap A_m\big) \le \sum_{m \ge 1} \mu(A_m). \]

(c) Since $A_1 \cup A_2$ is the union of the pairwise disjoint sets $A_1 \cap A_2^c$, $A_1 \cap A_2$, and $A_2 \cap A_1^c$, additivity implies that
\[ \mu(A_1 \cup A_2) = \mu(A_1 \cap A_2^c) + \mu(A_1 \cap A_2) + \mu(A_2 \cap A_1^c). \]
Similarly,
\[ \mu(A_1) + \mu(A_2) = \mu(A_1 \cap A_2^c) + 2\mu(A_2 \cap A_1) + \mu(A_2 \cap A_1^c). \]
It follows that $\mu(A_1 \cup A_2) = +\infty$ iff $\mu(A_1) + \mu(A_2) = +\infty$, which proves (c) in the infinite case. In the finite case, simply subtract the above equations to get (c).

(d) This is clear if some $A_k$ has infinite measure, so assume $\mu(A_k) < +\infty$


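for all $k$. Set $A_0 = \emptyset$ and $E_k = A_k \setminus A_{k-1}$. The sets $E_k$ are pairwise disjoint, $A = \bigcup_{k=1}^{\infty} E_k$, and $\mu(E_k) = \mu(A_k) - \mu(A_{k-1})$, hence by additivity
\[ \mu(A) = \sum_{k=1}^{\infty} \mu(E_k) = \lim_n \sum_{k=1}^{n} \big[\mu(A_k) - \mu(A_{k-1})\big] = \lim_n \mu(A_n). \]

(e) Note that $A_1 \setminus A_k \uparrow A_1 \setminus A$, hence, by (d),
\[ \mu(A_1) - \mu(A) = \mu(A_1 \setminus A) = \lim_k \mu(A_1 \setminus A_k) = \mu(A_1) - \lim_k \mu(A_k). \]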
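The properties in 10.1.6 can be checked exactly for a weighted measure of the kind in 10.1.5. The sketch below is an illustration under assumptions: it restricts to the finite index set $\{1, \dots, 10\}$ with the hypothetical weights $p_k = 2^{-k}$, and the names `weights`, `mu`, `A1`, `A2` are illustrative.

```python
from fractions import Fraction

# Illustrative weights p_k = 2^{-k} on the finite index set {1, ..., 10}.
weights = {k: Fraction(1, 2 ** k) for k in range(1, 11)}

def mu(E):
    """mu(E) = sum of p_k over k in E; the empty sum is 0, as in Example 10.1.5."""
    return sum((weights[k] for k in E), Fraction(0))

A1 = {1, 2, 3, 4}
A2 = {3, 4, 5, 6, 7}

monotone = mu({1, 2}) <= mu(A1)                                # 10.1.6(a)
subadditive = mu(A1 | A2) <= mu(A1) + mu(A2)                   # 10.1.6(b)
incl_excl = mu(A1) + mu(A2) == mu(A1 | A2) + mu(A1 & A2)       # 10.1.6(c)

# Increasing sets A_n = {1, ..., n}: the measures increase to mu({1, ..., 10}).
increasing = [mu(set(range(1, n + 1))) for n in range(1, 11)]
```

Exact rational arithmetic is used so that the inclusion-exclusion identity can be tested with equality rather than a floating-point tolerance.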

Exercises For the following exercises, F is a σ-field of subsets of a set S and µ is a measure on F. 1.S Find an example which shows that the hypothesis µ(A1 ) < +∞ in 10.1.6(e) cannot be removed. 2. Verify that the collection FE in 10.1.3 is a σ-field. 3.S Let A, B ∈ F with µ(B) = 0. Show that µ(A ∪ B) = µ(A \ B) = µ(A). P 4. Let Ak , Bk ∈ F and let s denote the sum k µ(Ak \ Bk ). Prove that [ \ [ \ (a) µ Ak \ Bk ≤ s. (b) µ Ak \ Bk ≤ s. k

k

k

k

5.S (General inclusion-exclusion principle). Let µ A1 ∪ · · · ∪ An < +∞. Prove that for n ≥ 2 n X µ A1 ∪ · · · ∪ An = µ(Ai ) − i=1

+

n X

n X

µ(Ai ∩ Aj )

1≤i |I| − ε. Let {Ik } be any sequence of intervals covering I. By


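10.2.4, we may take $I_k \in \mathcal{O}$. Let $\{J_k\}$ be a sequence in $\mathcal{H}$ such that $I_k \subseteq J_k$ and $|J_k| < |I_k| + \varepsilon/2^k$ (Figure 10.2). Since $J$ is compact, there exists an $m$ such that $J \subseteq I_1 \cup \cdots \cup I_m \subseteq J_1 \cup \cdots \cup J_m$. Therefore,
\[ |I| - \varepsilon < |J| \le |J_1| + \cdots + |J_m| \le \varepsilon + \sum_{k=1}^{\infty} |I_k|, \]
the second inequality by 10.2.2(b). Letting $\varepsilon \to 0$, we have $|I| \le \sum_{k=1}^{\infty} |I_k|$. Therefore, $|I| \le \lambda^*(I)$.

For (e), we may assume that $\lambda^*(A_k) < +\infty$ for all $k$. Let $\varepsilon > 0$ and for each $k$ choose a sequence $\{I_{k,j}\}_{j=1}^{\infty}$ in $\mathcal{I}$ such that
\[ A_k \subseteq \bigcup_{j=1}^{\infty} I_{k,j} \quad\text{and}\quad \sum_{j=1}^{\infty} \lambda^*(I_{k,j}) \le \lambda^*(A_k) + \frac{\varepsilon}{2^k}. \]
Since the countable collection $\{I_{k,j} : k, j = 1, 2, \dots\}$ covers $\bigcup_{k=1}^{\infty} A_k$,
\[ \lambda^*\Big(\bigcup_{k=1}^{\infty} A_k\Big) \le \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} \lambda^*(I_{k,j}) \le \sum_{k=1}^{\infty} \lambda^*(A_k) + \varepsilon. \]
Since $\varepsilon$ was arbitrary, (e) follows.

For (f), let $I \cup J \subseteq \bigcup_{k=1}^{m} I_k$, where $I_k \in \mathcal{H}$. Since $I_k \supseteq (I_k \cap I) \cup (I_k \cap J)$, 10.2.2(c) shows that $|I_k| \ge |I_k \cap I| + |I_k \cap J|$. Therefore, by (c),
\[ \sum_{k=1}^{m} |I_k| \ge \sum_{k=1}^{m} |I_k \cap I| + \sum_{k=1}^{m} |I_k \cap J| \ge \lambda^*(I) + \lambda^*(J) = |I| + |J|. \]
Taking the infimum we have $\lambda^*(I \cup J) \ge |I| + |J|$. The reverse inequality follows from (e).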

Exercises

1.S Prove the assertions in 10.2.4. More generally prove the following: Let $\mathcal{J}$ be a collection of bounded intervals with the property that for each bounded interval $I$ and each $\varepsilon > 0$ there exists $J \in \mathcal{J}$ containing $I$ such that $|J| < |I| + \varepsilon$. For $A \subseteq \mathbb{R}^n$, define
\[ \alpha(A) := \inf\Big\{ \sum_k |J_k| : J_k \in \mathcal{J} \text{ and } \bigcup_k J_k \supseteq A \Big\}. \]
Then $\lambda^*(A) = \alpha(A)$.

2. Prove that in the definition of $\lambda^*(A)$, $\mathcal{I}$ may be replaced by the collection $\mathcal{I}_r$ of all bounded intervals $I$ whose coordinate intervals have rational endpoints.


3. Prove that in the definition of λ∗ (A), I may be replaced by the collection U of all bounded open subsets of R and also by the collection K of all compact sets. 4.S Show that Lebesgue outer measure is translation invariant, that is, λ∗ (A + x) = λ∗ (A) for every A ⊆ Rn and x ∈ Rn , where A + x := {a + x : a ∈ A}. 5. Show that Lebesgue outer measure has the reflection property λ∗ (−A) = λ∗ (A) for every A ⊆ Rn , where −A := {x : −x ∈ A}. 6. Show that Lebesgue outer measure has the dilation property λ∗ (rA) = |r|n λ∗ (A) for every A ⊆ Rn and r ∈ R, where rA := {rx : x ∈ A}.

10.3 Lebesgue Measure

By subadditivity of outer measure,
\[ \lambda^*(C) \le \lambda^*(C \cap E) + \lambda^*(C \cap E^c) \]
for all subsets $E$ and $C$ of $\mathbb{R}^n$. The following definition singles out those sets $E$ that also satisfy the reverse inequality for all sets $C$.

10.3.1 Definition. A subset $E$ of $\mathbb{R}^n$ is said to be Lebesgue measurable if
\begin{equation}
\lambda^*(C) \ge \lambda^*(C \cap E) + \lambda^*(C \cap E^c) \tag{10.3}
\end{equation}
for all subsets $C$ of $\mathbb{R}^n$. The collection of all Lebesgue measurable subsets of $\mathbb{R}^n$ is denoted by $\mathcal{M} = \mathcal{M}(\mathbb{R}^n)$. The restriction of $\lambda^*$ to $\mathcal{M}$ is called Lebesgue measure on $\mathbb{R}^n$ and is denoted by $\lambda = \lambda_n$. Any particular set $C$ satisfying (10.3) is called a test set for $E$. ♦

If $C$ is a test set for $E$, then $\lambda^*(C) = \lambda^*(C \cap E) + \lambda^*(C \cap E^c)$; the set $E$ splits the outer measure of $C$.

10.3.2 Theorem. $\mathcal{M}$ is a sigma field containing all sets of outer measure zero and $\lambda$ is a measure on $\mathcal{M}$.


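Proof. Clearly, $\emptyset, \mathbb{R}^n \in \mathcal{M}$, and since $E$ and $E^c$ appear symmetrically in (10.3), $E^c \in \mathcal{M}$ iff $E \in \mathcal{M}$. If $\lambda^*(E) = 0$, then, by monotonicity,
\[ \lambda^*(C \cap E) + \lambda^*(C \cap E^c) \le \lambda^*(E) + \lambda^*(C \cap E^c) = \lambda^*(C \cap E^c) \le \lambda^*(C), \]
hence $E \in \mathcal{M}$. Therefore, $\mathcal{M}$ contains all sets of Lebesgue outer measure 0. It remains to show that, for a sequence $\{E_k\}$ in $\mathcal{M}$, $\bigcup_{k=1}^{\infty} E_k \in \mathcal{M}$ and furthermore that $\lambda^*\big(\bigcup_{k=1}^{\infty} E_k\big) = \sum_{k=1}^{\infty} \lambda^*(E_k)$ if the sets $E_k$ are pairwise disjoint. This is accomplished in the following four steps:

I. If $E, F \in \mathcal{M}$, then $E \cup F,\ E \cap F \in \mathcal{M}$.

To show that $E \cup F \in \mathcal{M}$, take any set $C$ as a test set for $E$ and take $C \cap E^c$ as a test set for $F$ to obtain
\[ \lambda^*(C) = \lambda^*(C \cap E) + \lambda^*(C \cap E^c) \quad\text{and}\quad \lambda^*(C \cap E^c) = \lambda^*(C \cap E^c \cap F) + \lambda^*(C \cap E^c \cap F^c). \]
Combining these and using subadditivity,
\begin{align}
\lambda^*(C) &= \lambda^*(C \cap E) + \lambda^*(C \cap E^c \cap F) + \lambda^*(C \cap E^c \cap F^c) \notag \\
&\ge \lambda^*\big((C \cap E) \cup (C \cap E^c \cap F)\big) + \lambda^*(C \cap E^c \cap F^c). \tag{10.4}
\end{align}
Since $(C \cap E) \cup (C \cap E^c \cap F) \supseteq C \cap (E \cup F)$, by monotonicity and (10.4),
\[ \lambda^*(C) \ge \lambda^*\big(C \cap (E \cup F)\big) + \lambda^*\big(C \cap E^c \cap F^c\big) = \lambda^*\big(C \cap (E \cup F)\big) + \lambda^*\big(C \cap (E \cup F)^c\big). \]
This shows that $E \cup F \in \mathcal{M}$. That $E \cap F \in \mathcal{M}$ follows from De Morgan's law $E \cap F = (E^c \cup F^c)^c$.

II. If $C \subseteq \mathbb{R}^n$ and $E, F \in \mathcal{M}$ with $E \cap F = \emptyset$, then $\lambda^*\big(C \cap (E \cup F)\big) = \lambda^*(C \cap E) + \lambda^*(C \cap F)$.

Use $C \cap (E \cup F)$ as a test set for $E$ to obtain
\[ \lambda^*\big(C \cap (E \cup F)\big) = \lambda^*\big(C \cap (E \cup F) \cap E\big) + \lambda^*\big(C \cap (E \cup F) \cap E^c\big) = \lambda^*(C \cap E) + \lambda^*(C \cap F). \]

III. If the sets $E_k$ are pairwise disjoint and $F := \bigcup_k E_k$, then $F \in \mathcal{M}$ and $\lambda(F) = \sum_k \lambda(E_k)$.

Set $F_k = \bigcup_{j=1}^{k} E_j$ and let $C \subseteq \mathbb{R}^n$. By steps I and II and induction, $F_k \in \mathcal{M}$ and
\[ \lambda^*(C \cap F_k) = \sum_{j=1}^{k} \lambda^*(C \cap E_j). \]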


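Thus, by monotonicity,
\[ \lambda^*(C) = \lambda^*(C \cap F_k) + \lambda^*(C \cap F_k^c) \ge \sum_{j=1}^{k} \lambda^*(C \cap E_j) + \lambda^*(C \cap F^c). \]
Since $k$ was arbitrary, by subadditivity
\[ \lambda^*(C) \ge \sum_{j=1}^{\infty} \lambda^*(C \cap E_j) + \lambda^*(C \cap F^c) \ge \lambda^*(C \cap F) + \lambda^*(C \cap F^c) \ge \lambda^*(C). \]
The inequalities are therefore equalities, which shows that $F \in \mathcal{M}$. Taking $C = F$ verifies the second assertion of III.

IV. $\bigcup_{k=1}^{\infty} E_k \in \mathcal{M}$.

Use I, III and
\[ \bigcup_{k=1}^{\infty} E_k = E_1 \cup (E_2 \cap E_1^c) \cup (E_3 \cap E_1^c \cap E_2^c) \cup \cdots. \]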

10.3.3 Definition. A set $E$ is said to have (Lebesgue) measure zero if $\lambda(E) = 0$. A property $P(x)$ depending on points $x \in \mathbb{R}^n$ is said to hold almost everywhere (a.e.) or for almost all $x$ if the set of all $x$ for which $P(x)$ is false has measure zero. ♦

For example, the Dirichlet function is zero a.e. More generally, if $E \in \mathcal{M}$ then $1_E = 0$ a.e. iff $\lambda(E) = 0$. By subadditivity, a countable union of sets of measure zero has measure zero. Since a point has measure zero, it follows that every countable set has measure zero. In particular, $\mathbb{Q}^n$ has measure zero. The following is an example of an uncountable set with measure zero.

10.3.4 Example. (Cantor ternary set). Remove from $I_{0,1} := [0, 1]$ the "middle third" open interval $(1/3, 2/3)$, leaving closed intervals $I_{1,1}$ and $I_{1,2}$ with union $E_1$ and total length $2/3$. Next, remove from each of $I_{1,1}$ and $I_{1,2}$ the middle third open interval, leaving closed intervals $I_{2,1}$, $I_{2,2}$, $I_{2,3}$, and $I_{2,4}$ with union $E_2$ and total length $4/9 = (2/3)^2$.

[FIGURE 10.3: Middle thirds construction, showing the intervals $I_{k,j}$ making up $E_1$, $E_2$, $E_3$ and the points .00220… and .22202….]

By induction, one obtains a decreasing sequence of closed sets $E_k = \bigcup_{j=1}^{2^k} I_{k,j}$ such that, by subadditivity, $\lambda^*(E_k) \le (2/3)^k$. If


$E$ denotes the intersection of these sets, then $E$ is closed and, by monotonicity, $\lambda^*(E) \le (2/3)^k$ for all $k$. Therefore, $\lambda^*(E) = 0$. To show that $E$ is uncountable, we use the fact that every real number $x \in [0, 1]$ has both ternary and binary representations
\[
\begin{aligned}
x &= .d_1 d_2 \ldots \text{ (ternary)} = \sum_{k=1}^{\infty} d_k 3^{-k}, & &\text{where } d_k \in \{0, 1, 2\}, \\
x &= .e_1 e_2 \ldots \text{ (binary)} = \sum_{k=1}^{\infty} e_k 2^{-k}, & &\text{where } e_k \in \{0, 1\}.
\end{aligned}
\]
These are obvious analogs of the decimal representation of a real number (see Exercise 6.1.14). As with decimal representations, there is some ambiguity; for example, $1/3 = .1000\ldots = .0222\ldots$ (ternary). Now observe that if $d_k = 0$ or 2 for all $k$ in the above ternary representation, then $x \in E$. For example, $.00220\ldots \in I_{1,1} \cap I_{2,2} \cap I_{3,4} \cap I_{4,7} \cap \cdots$ and $.22202\ldots \in I_{1,2} \cap I_{2,4} \cap I_{3,7} \cap I_{4,14} \cap \cdots$ (see Figure 10.3). In general, if $x \in I_{k-1,j}$, then $x \in I_{k,\,2j-1+d_k/2}$.

[FIGURE 10.4: $x \in I_{k-1,j} \Rightarrow x \in I_{k,\,2j-1+d_k/2}$; the digit $d_k = 0$ selects $I_{k,2j-1}$ and $d_k = 2$ selects $I_{k,2j}$.]

Conversely, let $x \in E$. Since $x \in E_1$, we may choose $d_1 = 0$ or 2. Similarly, since $x \in E_2$, we may choose $d_2 = 0$ or 2, etc. Continuing in this manner, we see that every member of $E$ has a (unique) ternary representation with digits 0 or 2. Now define $\varphi : E \to [0, 1]$ by
\[ \varphi\big(.d_1 d_2 \ldots \text{ (ternary)}\big) = .e_1 e_2 \ldots \text{ (binary)}, \]
where $d_k \in \{0, 2\}$ and $e_k = d_k/2$. The function $\varphi$ is not one-to-one; for example, $\varphi(.0222\ldots) = .0111\ldots = .1000\ldots = \varphi(.2000\ldots)$. However, by removing from $E$ the countable set of all numbers with ternary representations having a tail end of zeros, these being necessarily rational, we obtain a set $F$ on which $\varphi$ is one-to-one. Since $\varphi(F) = (0, 1)$, it follows that $E$ is uncountable. ♦
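The middle thirds construction and the index rule $x \in I_{k-1,j} \Rightarrow x \in I_{k,\,2j-1+d_k/2}$ can be traced with exact arithmetic. The sketch below is illustrative only; the function names `cantor_intervals`, `total_length`, `interval_indices`, and `phi` are assumptions, and `phi` acts on finite digit lists rather than on infinite ternary expansions.

```python
from fractions import Fraction

def cantor_intervals(k):
    """Closed intervals I_{k,1}, ..., I_{k,2^k} of the k-th stage E_k, left to right."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))   # keep the left third
            nxt.append((b - third, b))   # keep the right third
        intervals = nxt
    return intervals

def total_length(k):
    """Total length of E_k, which equals (2/3)^k."""
    return sum(b - a for a, b in cantor_intervals(k))

def interval_indices(digits):
    """Indices j_1, j_2, ... with x in I_{k,j_k}, via j_k = 2 j_{k-1} - 1 + d_k / 2."""
    j, out = 1, []
    for d in digits:                     # each digit is assumed to be 0 or 2
        j = 2 * j - 1 + d // 2
        out.append(j)
    return out

def phi(digits):
    """Send ternary digits d_k in {0, 2} to the binary number with digits e_k = d_k / 2."""
    return sum(Fraction(d // 2, 2 ** k) for k, d in enumerate(digits, start=1))
```

For instance, `interval_indices([0, 2, 2, 0])` follows the rule through four stages, and `total_length(k)` shrinks geometrically, which is the reason the intersection has outer measure zero.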


We show in the next section that intervals, open sets, and closed sets are Lebesgue measurable. It follows that countable unions and intersections of these sets are also Lebesgue measurable. The reader may well ask if there are any subsets of $\mathbb{R}^n$ that are not Lebesgue measurable. The answer is that there are many, but their construction is surprisingly intricate. The following is an example for the case $n = 1$.

10.3.5 Example. (A non-measurable set). Consider sets of the form $x + \mathbb{Q}$, $x \in \mathbb{R}$. We claim that if $(x + \mathbb{Q}) \cap (y + \mathbb{Q}) \ne \emptyset$, then $x + \mathbb{Q} = y + \mathbb{Q}$. To see this, choose $z \in (x + \mathbb{Q}) \cap (y + \mathbb{Q})$, say $z = x + r_1 = y + r_2$, $r_1, r_2 \in \mathbb{Q}$. Then, for any $r \in \mathbb{Q}$, $x + r = y + r_2 - r_1 + r \in y + \mathbb{Q}$ and $y + r = x + r_1 - r_2 + r \in x + \mathbb{Q}$, hence $x + \mathbb{Q} = y + \mathbb{Q}$. It follows that every real number is in exactly one of the sets $x + \mathbb{Q}$. Now form a set $E$ by choosing exactly one number in each of the distinct sets $x + \mathbb{Q}$.¹ For each $x \in \mathbb{R}$, the set $E \cap (x + \mathbb{Q})$ has a single member, hence $x = y + r$ for unique $y \in E$ and $r \in \mathbb{Q}$. Thus $\mathbb{R}$ may be expressed as a disjoint union
\begin{equation}
\mathbb{R} = \bigcup_{k=1}^{\infty} (r_k + E), \tag{10.5}
\end{equation}
where $\{r_1, r_2, \dots\}$ is an enumeration of $\mathbb{Q}$. Suppose, for a contradiction, that $E$ is Lebesgue measurable. Then $\lambda(E) > 0$, otherwise, by (10.5), translation invariance (Exercise 1), and countable additivity, $\mathbb{R}$ would have measure zero. On the other hand, let $I$ be an arbitrary bounded interval and set $J = \mathbb{Q} \cap (0, 1)$. Since $I$ is measurable (Section 10.4, below), the set
\[ F := \bigcup_{r \in J} \big(r + E \cap I\big) \]
is measurable. Also, since $I$ and $J$ are bounded so is $F$. Thus, by countable additivity and translation invariance,
\[ +\infty > \lambda(F) = \sum_{r \in J} \lambda\big(r + E \cap I\big) = \sum_{r \in J} \lambda\big(E \cap I\big). \]
Since $J$ is an infinite set, $\lambda(E \cap I) = 0$. But then
\[ \lambda(E) = \sum_{k=0}^{\infty} \lambda\big(E \cap [k, k + 1)\big) + \sum_{k=0}^{\infty} \lambda\big(E \cap [-k - 1, -k)\big) = 0. \]
This contradiction shows that $E$ cannot be Lebesgue measurable. ♦

¹ The existence of $E$ requires the axiom of choice, which is adjoined to the axioms of Zermelo–Fraenkel set theory to form ZFC.


Exercises

1. ⇓² Show that $E \in \mathcal{M}$ and $x \in \mathbb{R}^n$ imply that $x + E \in \mathcal{M}$. Conclude from Exercise 10.2.4 that $\lambda(x + E) = \lambda(E)$.

2.S Show that $E \in \mathcal{M}$ implies that $-E \in \mathcal{M}$. Conclude from Exercise 10.2.5 that $\lambda(-E) = \lambda(E)$.

3. Show that $E \in \mathcal{M}$ and $r \ne 0$ imply that $rE \in \mathcal{M}$. Conclude from Exercise 10.2.6 that $\lambda(rE) = |r|^n \lambda(E)$.

4.S Show that for any $\varepsilon > 0$ there exists an open set $D$ dense in $\mathbb{R}^n$ such that $\lambda(D) < \varepsilon$.

5. Prove that if $f$ and $g$ are continuous real-valued functions on $\mathbb{R}^n$ which are equal a.e., then $f = g$. Does the same result hold if only one of the functions is continuous?

6. Let $A$ be the subset of $[0, 1]$ whose members are missing the digit three in their decimal expansions. Prove that $A$ is uncountable and $\lambda(A) = 0$.

10.4 Borel Sets

Recall that the σ-field generated by a collection $\mathcal{A}$ of sets is the intersection of all σ-fields containing $\mathcal{A}$ (10.1.2). The following special case is of particular importance.

10.4.1 Definition. The Borel σ-field $\mathcal{B} = \mathcal{B}(\mathbb{R}^n)$ is the σ-field generated by the open sets of $\mathbb{R}^n$. A member of $\mathcal{B}$ is called a Borel set. ♦

10.4.2 Remark. Since open sets and closed sets are complements of one another, $\mathcal{B}$ is also generated by the closed sets. Furthermore, since an open set is a countable union of $n$-dimensional open intervals (Exercise 8.2.4), $\mathcal{B}$ is also generated by $\mathcal{O}$. Since every open interval is a countable union of closed and bounded intervals and every closed interval is a countable intersection of open intervals, $\mathcal{B}$ is also generated by $\mathcal{C}$. Similar considerations show that $\mathcal{B}$ is generated by $\mathcal{H}$ as well. ♦

10.4.3 Theorem. $\mathcal{B}(\mathbb{R}^n) \subseteq \mathcal{M}(\mathbb{R}^n)$.

Proof. By 10.4.2, it suffices to show that $\mathcal{H} \subseteq \mathcal{M}$. Note first that if $I, J \in \mathcal{H}$ then, using partitions as in the proof of 10.2.2, $I \setminus J$ may be expressed (usually in several ways) as a disjoint union of members of $\mathcal{H}$. (See Figure 10.5.)

² This exercise will be used in 11.2.18.

[FIGURE 10.5: $I \setminus J = I_1 \cup I_2 \cup I_3 \cup I_4 \cup I_5$.]

Now let $I \in \mathcal{H}$, $C \subseteq \mathbb{R}^n$, and let $\{I_k\}$ be any sequence in $\mathcal{H}$ that covers $C$. We show that
\begin{equation}
\lambda^*(C \cap I) + \lambda^*(C \cap I^c) \le \sum_k \lambda^*(I_k). \tag{10.6}
\end{equation}
Taking the infimum over all such sequences $\{I_k\}$ produces the inequality $\lambda^*(C \cap I) + \lambda^*(C \cap I^c) \le \lambda^*(C)$, proving that $I \in \mathcal{M}$.

To verify (10.6), we may assume that $\sum_{k=1}^{\infty} \lambda^*(I_k) < +\infty$. For each $k$ there exist, according to the observation at the beginning of the proof, intervals $J_{j,k} \in \mathcal{H}$ such that $I_k \setminus I = \bigcup_{j=1}^{m_k} J_{j,k}$ (disjoint union). Then
\[ I_k = (I_k \cap I) \cup (I_k \setminus I) = (I_k \cap I) \cup \bigcup_{j=1}^{m_k} J_{j,k} \quad \text{(disjoint union)}, \]
hence, by 10.2.5(f) and induction,
\[ \lambda^*(I_k) = \lambda^*(I_k \cap I) + \sum_{j=1}^{m_k} \lambda^*(J_{j,k}). \]
Since $\{I_k \cap I\}_k$ covers $C \cap I$ and $\{J_{j,k}\}_{j,k}$ covers $C \cap I^c$,
\[ \sum_k \lambda^*(I_k) = \sum_k \lambda^*(I_k \cap I) + \sum_k \sum_{j=1}^{m_k} \lambda^*(J_{j,k}) \ge \lambda^*(C \cap I) + \lambda^*(C \cap I^c). \]

It may be shown that the inclusion $\mathcal{B} \subseteq \mathcal{M}$ is proper.³ The importance of Borel sets is that they are closely linked to the topology of $\mathbb{R}^n$ and hence are better suited for contexts involving continuous functions. The remainder of the section demonstrates the precise connection between $\mathcal{B}$ and $\mathcal{M}$.

³ See, for example, [4].


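10.4.4 Lemma. For any bounded $E \in \mathcal{M}$, there exists a decreasing sequence of bounded open sets $U_k \supseteq E$ such that
\[ \lim_k \lambda(U_k) = \lim_k \lambda\big(\mathrm{cl}(U_k)\big) = \lambda(E). \]

Proof. By definition of $\lambda(E)$, for each $k$ we may choose a sequence of open intervals $I_{j,k}$ with union $V_k$ containing $E$ such that
\[ \lambda(E) \le \lambda(V_k) \le \lambda\big(\mathrm{cl}(V_k)\big) \le \sum_j |\mathrm{cl}(I_{j,k})| < \lambda(E) + 1/k. \]
The sequence of open sets $U_k := V_1 \cap \cdots \cap V_k$ is decreasing, contains $E$, and satisfies
\[ \lambda(E) \le \lambda(U_k) \le \lambda\big(\mathrm{cl}(U_k)\big) \le \lambda\big(\mathrm{cl}(V_k)\big) \le \lambda(E) + 1/k. \]
Letting $k \to +\infty$ proves the assertion.

10.4.5 Lemma. For any $E \in \mathcal{M}$, there exists an increasing sequence of compact sets $C_k \subseteq E$ such that $\lim_k \lambda(C_k) = \lambda(E)$.

Proof. Suppose first that $E$ is bounded. Let $I$ be a bounded open interval containing cl(E) and let $\varepsilon > 0$. Choose a sequence of open intervals $I_k$ with union $U \supseteq I \setminus E$ such that $\sum_{k=1}^{\infty} |I_k| < \lambda(I \setminus E) + \varepsilon$. Since $I$ is open, we may assume that $I_k \subseteq I$ (otherwise, replace $I_k$ by $I_k \cap I$). Then $I \setminus E \subseteq U \subseteq I$ and
\[ \lambda(U) \le \lambda(I \setminus E) + \varepsilon = \lambda(I) - \lambda(E) + \varepsilon. \]
Set $K = I \setminus U$. Then $K \subseteq E \subseteq \mathrm{cl}(E) \subseteq I$, hence $K = \mathrm{cl}(E) \setminus U$. Therefore, $K$ is compact and
\[ \lambda(K) = \lambda(I) - \lambda(U) \ge \lambda(I) - \big[\lambda(I) - \lambda(E) + \varepsilon\big] = \lambda(E) - \varepsilon. \]

[FIGURE 10.6: $K = \mathrm{cl}(E) \setminus U$, where $I \setminus E \subseteq U := \bigcup_k I_k$ and $K = I \setminus U$.]

Now let $E \in \mathcal{M}$ be arbitrary and let $\{E_k\}$ be a sequence of bounded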


measurable sets such that $E_k \uparrow E$. By the first paragraph, for each $k$ we may choose a compact set $K_k \subseteq E_k$ such that $\lambda(K_k) > \lambda(E_k) - 1/k$. These conditions still hold if $K_k$ is replaced by the compact set $C_k = K_1 \cup \cdots \cup K_k$. The sequence $\{C_k\}$ is increasing, contained in $E$, and $\lambda(C_k) \to \lambda(E)$.

10.4.6 Lemma. If $E \in \mathcal{M}$ is bounded, then there exists an increasing sequence of compact sets $C_k$ and a decreasing sequence of bounded open sets $U_k$ such that $C_k \subseteq E \subseteq U_k$ and $\lim_k \lambda(U_k \setminus C_k) = 0$.

Proof. If $U_k$ and $C_k$ are as in 10.4.4 and 10.4.5, respectively, then

$$\lambda(U_k \setminus C_k) = \lambda(U_k \setminus E) + \lambda(E \setminus C_k) \to 0.$$

10.4.7 Theorem. If $E \in \mathcal{M}$, then there exist Borel sets $F$ and $G$ such that $F \subseteq E \subseteq G$ and $\lambda(G \setminus F) = 0$.

Proof. Suppose first that $E$ is bounded. Set $F = \bigcup_{k=1}^{\infty} C_k$ and $G = \bigcap_{k=1}^{\infty} U_k$, where $C_k$ and $U_k$ are the sets in 10.4.6. Then $F \subseteq E \subseteq G$ and $G \setminus F \subseteq U_k \setminus C_k$ for all $k$, hence $\lambda(G \setminus F) \le \lambda(U_k \setminus C_k) \to 0$.

In the general case, there exists a sequence of bounded measurable sets $E_k \uparrow E$. By the first paragraph, there exist Borel sets $F_k$ and $G_k$ such that $F_k \subseteq E_k \subseteq G_k$ and $\lambda(G_k \setminus F_k) = 0$. Let

$$F = \bigcup_{k=1}^{\infty} F_k \quad \text{and} \quad G = \bigcup_{k=1}^{\infty} G_k.$$

Then $F$ and $G$ are Borel sets, $F \subseteq E \subseteq G$, and $G \setminus F \subseteq \bigcup_{k=1}^{\infty} (G_k \setminus F_k)$. By countable subadditivity, $\lambda(G \setminus F) = 0$.

10.4.8 Corollary. Every E ∈ M is the disjoint union of a Borel set and a set of Lebesgue measure zero. Proof. By the theorem, E = F ∪ (E \ F ), where F ∈ B and λ(E \ F ) = 0.

Exercises

1.S Let $\varepsilon > 0$. Construct an explicit compact subset $C \subseteq E := [0, 1] \cap \mathbb{I}$ such that $\lambda(E \setminus C) < \varepsilon$.

2. Show that the graph $G := \{(x, y) : y = f(x)\}$ of a continuous function $f : \mathbb{R} \to \mathbb{R}$ is a Borel set with two-dimensional Lebesgue measure zero.

3. Let $E$ denote the Cantor set (10.3.4). Show that $E + \mathbb{Q}$ and $E + E$ are Borel sets and find their measures.

4.S Let $B \in \mathcal{B}(\mathbb{R}^n)$, $y \in \mathbb{R}^n$, and $r \in \mathbb{R}$. Prove that $B + y := \{x + y : x \in B\}$, $rB := \{rx : x \in B\}$ and $-B := \{x : -x \in B\}$ are Borel sets.

10.5 Measurable Functions

In this section, $\mathcal{F}$ denotes a σ-field of subsets of a set $S$.

Definition and Basic Properties

10.5.1 Lemma. Let $f : S \to \overline{\mathbb{R}}$. The following statements are equivalent:

(a) $f^{-1}(\{+\infty\}),\ f^{-1}(\{-\infty\}) \in \mathcal{F}$, and $f^{-1}(U) \in \mathcal{F}$ for all open sets $U \subseteq \mathbb{R}$.

(b) $f^{-1}(\{+\infty\}),\ f^{-1}(\{-\infty\}) \in \mathcal{F}$, and $f^{-1}(F) \in \mathcal{F}$ for all closed sets $F \subseteq \mathbb{R}$.

(c) $f^{-1}(\{+\infty\}),\ f^{-1}(\{-\infty\}) \in \mathcal{F}$, and $f^{-1}(B) \in \mathcal{F}$ for all Borel sets $B \subseteq \mathbb{R}$.

(d) $\{x : f(x) \le t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.

(e) $\{x : f(x) < t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.

(f) $\{x : f(x) \ge t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.

(g) $\{x : f(x) > t\} \in \mathcal{F}$ for all $t \in \mathbb{R}$.

Proof. The equivalence of (a) and (b) follows from the general set-theoretic relation $f^{-1}(A^c) = \big(f^{-1}(A)\big)^c$. Clearly, (c) implies (b). For the converse, denote by $\mathcal{G}$ the collection of all Borel subsets $B$ of $\mathbb{R}$ such that $f^{-1}(B) \in \mathcal{F}$. Then $\mathcal{G}$ is a σ-field. If (b) holds, then $\mathcal{G}$ contains the closed sets, hence, by minimality, $\mathcal{G} = \mathcal{B}$. This proves (c) and hence shows that (a)–(c) are equivalent.

The implications (c) ⇒ (d) ⇒ (e) ⇒ (f) ⇒ (g) ⇒ (d) are proved using the following set relations:

(c) ⇒ (d): $\{x : f(x) \le t\} = f^{-1}(\{-\infty\}) \cup f^{-1}\big((-\infty, t]\big)$.

(d) ⇒ (e): $\{x : f(x) < t\} = \bigcup_{n=1}^{\infty} \{x : f(x) \le t - 1/n\}$.

(e) ⇒ (f): $\{x : f(x) \ge t\} = \{x : f(x) < t\}^c$.

(f) ⇒ (g): $\{x : f(x) > t\} = \bigcup_{n=1}^{\infty} \{x : f(x) \ge t + 1/n\}$.

(g) ⇒ (d): $\{x : f(x) \le t\} = \{x : f(x) > t\}^c$.

Thus (d)–(g) are equivalent and are implied by (a)–(c). Now assume that (d)–(g) hold. Then the sets

$$f^{-1}(+\infty) = \bigcap_{k=1}^{\infty} \{x : f(x) > k\}, \qquad f^{-1}(-\infty) = \bigcap_{k=1}^{\infty} \{x : f(x) < -k\}$$

are members of $\mathcal{F}$, and for $-\infty < a < b < +\infty$,

$$f^{-1}\big((a, b)\big) = \{x : f(x) > a\} \cap \{x : f(x) < b\} \in \mathcal{F}.$$

Since every open subset of $\mathbb{R}$ is a countable union of open intervals, (a) holds, completing the proof.

10.5.2 Definition. A function $f : S \to \overline{\mathbb{R}}$ is said to be measurable with respect to $\mathcal{F}$, or simply $\mathcal{F}$-measurable, if any (hence all) of the conditions in Lemma 10.5.1 hold. ♦

The following theorem shows that the collection of all measurable functions is closed under the standard ways of combining functions. The functions $f^+$, $f^-$, $\sup_n f_n$, $\inf_n f_n$, $\limsup_n f_n$, and $\liminf_n f_n$ in the statement of the theorem are defined by

$$f^+(x) := \max\{f(x), 0\}, \qquad f^-(x) := \max\{-f(x), 0\},$$

$$(\sup_k f_k)(x) := \sup_k f_k(x), \qquad (\inf_k f_k)(x) := \inf_k f_k(x),$$

$$(\limsup_k f_k)(x) := \limsup_k f_k(x), \qquad (\liminf_k f_k)(x) := \liminf_k f_k(x).$$

10.5.3 Theorem. Let $f$, $g$, $f_k$ be measurable with respect to a σ-field $\mathcal{F}$ on $S$. If $\alpha \in \mathbb{R}$ and $p > 0$, then $f + g$, $\alpha f$, $f^2$, $fg$, $|f|^p$, $f^+$, $f^-$, $\sup_k f_k$, $\inf_k f_k$, $\limsup_k f_k$, and $\liminf_k f_k$ are measurable.

Proof. The proof is based on the following equalities. The details are left to the reader.

• $\{x : (f + g)(x) < t\} = \bigcup_{r \in \mathbb{Q}} \{x : f(x) < r\} \cap \{x : g(x) < t - r\}$.

• $\{x : \alpha f(x) < t\} = \{x : f(x) < t/\alpha\}$ for $\alpha > 0$.

• $\{x : f^2(x) < t\} = \{x : -\sqrt{t} < f(x) < \sqrt{t}\}$ for $t > 0$.

• $fg = \tfrac{1}{2}\big[(f + g)^2 - f^2 - g^2\big]$.

• $\{x : |f|^p(x) < t\} = \{x : -t^{1/p} < f(x) < t^{1/p}\}$ for $t > 0$.

• $f^+ = \tfrac{1}{2}(|f| + f)$, $f^- = \tfrac{1}{2}(|f| - f)$.

• $\{x : \sup_k f_k(x) \le t\} = \bigcap_k \{x : f_k(x) \le t\}$.

• $\inf_k f_k = -\sup_k (-f_k)$.

• $\liminf_k f_k = \sup_k \inf_{j \ge k} f_j$; $\limsup_k f_k = -\liminf_k (-f_k)$.
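Several of the equalities in the proof of 10.5.3 are pointwise identities between real numbers, so they can be spot-checked numerically. The following Python sketch does this on random finite samples; the helper names `positive_part`, `negative_part`, and `check_identities` are hypothetical and chosen only for this illustration. It is a sanity check, not a substitute for the proof.

```python
import random


def positive_part(v):
    """f^+ evaluated at a point where f takes the value v."""
    return max(v, 0.0)


def negative_part(v):
    """f^- evaluated at a point where f takes the value v."""
    return max(-v, 0.0)


def check_identities(pairs, tol=1e-9):
    """Check the product and positive/negative part identities on sample values."""
    for a, b in pairs:
        # fg = (1/2)[(f + g)^2 - f^2 - g^2]
        if abs(a * b - 0.5 * ((a + b) ** 2 - a ** 2 - b ** 2)) > tol:
            return False
        # f^+ = (|f| + f)/2 and f^- = (|f| - f)/2
        if abs(positive_part(a) - 0.5 * (abs(a) + a)) > tol:
            return False
        if abs(negative_part(a) - 0.5 * (abs(a) - a)) > tol:
            return False
    values = [a for a, _ in pairs]
    # inf_k f_k = -sup_k(-f_k), here for a finite family of values
    return min(values) == -max(-v for v in values)


random.seed(0)  # fixed seed so the run is reproducible
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
print(check_identities(samples))
```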

10.5.4 Corollary. If fk : S → R is F-measurable for every k and if fk → f on S, then f is F-measurable.


Simple Functions

10.5.5 Definition. The indicator function of a set $A \subseteq S$ is the function $\mathbf{1}_A$ on $S$ defined by

$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \text{ and} \\ 0 & \text{if } x \notin A. \end{cases}$$ ♦

For example, the Dirichlet function may be expressed as $\mathbf{1}_{\mathbb{Q}}$.

10.5.6 Definition. A function $f : S \to \mathbb{R}$ with finite range is called a simple function. The collection of all nonnegative $\mathcal{F}$-measurable simple functions is denoted by $\mathcal{S}_+(\mathcal{F})$. ♦

10.5.7 Remarks. (a) A linear combination of indicator functions is a simple function. Conversely, a simple function $f$ may be expressed in many ways as a linear combination of indicator functions. The most important of these is the standard form

$$f = \sum_{j=1}^{m} a_j \mathbf{1}_{A_j}, \qquad A_j := \{x \in S : f(x) = a_j\}, \tag{10.7}$$

where $a_1, \ldots, a_m \in \mathbb{R}$ are the distinct values of $f$. Note that the sets $A_j$ form a partition of $S$. By 10.5.3 and Exercise 8, $f$ is $\mathcal{F}$-measurable iff $A_j \in \mathcal{F}$ for each $j$.

(b) If $f_1, f_2 \in \mathcal{S}_+(\mathcal{F})$, $\alpha \ge 0$, and $p > 0$, then the functions $\alpha f_1$, $f_1 + f_2$, $f_1 f_2$, $f_1^p$, $\max\{f_1, f_2\}$, $\min\{f_1, f_2\}$ are nonnegative, measurable, and have finite ranges, hence are in $\mathcal{S}_+(\mathcal{F})$. ♦

The following theorem shows that the collection $\mathcal{S}_+(\mathcal{F})$ generates all measurable functions. It is a key ingredient in the development of the Lebesgue theory.

10.5.8 Theorem. For each nonnegative $\mathcal{F}$-measurable function $f$ on $S$, there exists a sequence $\{f_k\}$ in $\mathcal{S}_+(\mathcal{F})$ such that $f_k \uparrow f$ on $S$.

Proof. Let $f_0 = 0$, and for each $k \in \mathbb{N}$ define

$$f_k = \sum_{j=1}^{k2^k} \frac{j-1}{2^k}\, \mathbf{1}_{A_{k,j}} + k\, \mathbf{1}_{A_k},$$

where $A_k = \{x : f(x) \ge k\}$ and

$$A_{k,j} = \{x : (j-1)2^{-k} \le f(x) < j2^{-k}\}, \qquad j = 1, 2, \ldots, k2^k.$$

(See Figure 10.7.)

[Figure 10.7: The components of $f_k$.]

We show that $f_k(x) \uparrow f(x)$ for each $x \in S$. This is clear if $f(x) = +\infty$, since then $f_k(x) = k$ for all $k$. Suppose $f(x) \in \mathbb{R}$ and let $k \in \mathbb{N}$. If $f(x) \ge k + 1$, then $f_{k+1}(x) = k + 1 > k = f_k(x)$. If $k \le f(x) < k + 1$, then $f_{k+1}(x) \ge k = f_k(x)$. Finally, suppose that $f(x) < k$. Then $(j-1)2^{-k} \le f(x) < j2^{-k}$ for some $1 \le j \le k2^k$, hence

$$\frac{2j-2}{2^{k+1}} \le f(x) < \frac{2j-1}{2^{k+1}} \quad \text{or} \quad \frac{2j-1}{2^{k+1}} \le f(x) < \frac{2j}{2^{k+1}}.$$

(See Figure 10.8.)

[Figure 10.8: The components of $f_{k+1}$.]

In either case,

$$f_{k+1}(x) \ge \frac{2j-2}{2^{k+1}} = \frac{j-1}{2^k} = f_k(x).$$

Thus $f_k \uparrow$ on $S$. Since $0 \le f(x) - f_k(x) < 2^{-k}$ for all sufficiently large $k$, $f_k(x) \to f(x)$.
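The construction in the proof of 10.5.8 is easy to evaluate pointwise: given the value $f(x)$, the value $f_k(x)$ is $k$ when $f(x) \ge k$ and otherwise the left endpoint of the dyadic interval of length $2^{-k}$ containing $f(x)$. The Python sketch below computes this; the function name `dyadic_approx` and the sample value 1.69 are assumptions made for the illustration.

```python
import math


def dyadic_approx(value, k):
    """Return f_k(x), given value = f(x) >= 0 (math.inf is allowed)."""
    if value >= k:
        # x lies in A_k = {f >= k}, where f_k equals k
        return float(k)
    # x lies in A_{k,j} with (j-1)/2^k <= value < j/2^k; f_k equals (j-1)/2^k
    return math.floor(value * 2 ** k) / 2 ** k


# Successive approximations of the sample value f(x) = 1.69
approximations = [dyadic_approx(1.69, k) for k in range(1, 6)]
print(approximations)
```

For this sample value the first approximations are 1.0, 1.5, 1.625, 1.6875, which increase toward 1.69 and are within $2^{-k}$ of it once $k$ exceeds the value, as in the last step of the proof.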

Lebesgue and Borel Measurable Functions

10.5.9 Definition. A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is said to be Borel (Lebesgue) measurable if $f$ is measurable with respect to the σ-field $\mathcal{B}(\mathbb{R}^n)$ ($\mathcal{M}(\mathbb{R}^n)$). ♦

10.5.10 Proposition. If $f$ is Lebesgue measurable and $f = g$ a.e., then $g$ is Lebesgue measurable.

Proof. Let $A = \{x : f(x) \ne g(x)\}$. By hypothesis, $A$ has Lebesgue measure zero, hence $A^c$ and $\{x : g(x) < t\} \cap A$ are in $\mathcal{M}$. Therefore,

$$\{x : g(x) < t\} = \big(\{x : f(x) < t\} \cap A^c\big) \cup \big(\{x : g(x) < t\} \cap A\big) \in \mathcal{M}.$$

If $f$ is Borel measurable and $f = g$ a.e., then $g$ need not be Borel measurable. Indeed, there exist sets $E \in \mathcal{M} \setminus \mathcal{B}$ with measure zero, hence $\mathbf{1}_E = 0$ a.e. but $\mathbf{1}_E$ is not Borel measurable.⁴

Clearly, a Borel measurable function is Lebesgue measurable. The preceding paragraph shows that the converse is false. However, we have

10.5.11 Proposition. If $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is Lebesgue measurable, then there exists a Borel measurable function $g : \mathbb{R}^n \to \overline{\mathbb{R}}$ such that $g = f$ a.e.

Proof. Consider first the case $f = \mathbf{1}_E$, $E \in \mathcal{M}$. By 10.4.8, $E$ is the disjoint union of a Borel set $F$ and a set $A$ of Lebesgue measure zero. Thus $g := \mathbf{1}_F$ is Borel measurable and $f = g + \mathbf{1}_A = g$ a.e. The assertion therefore holds for indicator functions. If $f$ is a simple function, then each term in the standard form of $f$ is a.e. equal to a Borel function. Therefore, the assertion holds for simple functions.

If $f \ge 0$, then, by 10.5.8, there exists a sequence of nonnegative Lebesgue measurable simple functions $f_k$ such that $f_k \to f$ on $\mathbb{R}^n$. By the previous paragraph, for each $k$ there exists a Borel measurable function $g_k$ such that $f_k = g_k$ a.e. Let

$$A_k := \{x : f_k(x) \ne g_k(x)\} \quad \text{and} \quad A := \bigcup_{k=1}^{\infty} A_k.$$

Then $A \in \mathcal{M}$, $\lambda(A) = 0$ and $f_k(x) = g_k(x)$ for all $x \in A^c$ and all $k$. Let $B$ denote the set of all $x$ such that the sequence $\{g_k(x)\}$ does not converge. Then $B \subseteq A$ and, by 10.5.3, $B \in \mathcal{B}$. Let $g = \lim_k g_k \mathbf{1}_{B^c}$. Then $g$ is Borel measurable and $\{x : g(x) \ne f(x)\} \subseteq A$, so $g = f$ a.e. Therefore, the assertion holds for nonnegative $f$. The general case follows from the identity $f = f^+ - f^-$.

Part (a) of 10.5.1 implies that a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ is Borel measurable. In a similar vein,

10.5.12 Proposition. If $f : \mathbb{R}^n \to \mathbb{R}$ is continuous except on a set $E$ of Lebesgue measure zero, then $f$ is Lebesgue measurable.

Proof. Let $U \subseteq \mathbb{R}$ be open. Then $f^{-1}(U) = A \cup B$, where $A := f^{-1}(U) \cap E$ and $B := f^{-1}(U) \cap E^c$. Since $A \subseteq E$ and $\lambda(E) = 0$, $A \in \mathcal{M}$. Since $f$ is continuous on $E^c$, $B$ is open in $E^c$, hence $B = V \cap E^c$ for some open subset $V$ of $\mathbb{R}^n$. Therefore, $B \in \mathcal{M}$, so $f^{-1}(U) \in \mathcal{M}$. By 10.5.1, $f$ is Lebesgue measurable.

⁴ See, for example, [4].


Proposition 10.5.12 implies that a function with at most countably many discontinuities is Lebesgue measurable. An examination of the proof shows that such a function is in fact Borel measurable. In particular, monotone functions on R, hence also functions of bounded variation, are Borel measurable (see 3.3.6 and 5.9.7). Note that a function that is continuous except on a set of measure zero is not necessarily equal a.e. to a continuous function (Exercise 12). Conversely, a function equal a.e. to a continuous function need not be continuous anywhere; the Dirichlet function is an obvious example.

Exercises

In Exercises 1–8, $\mathcal{F}$ denotes a σ-field of subsets of a set $S$.

1.S Let $f : S \to \mathbb{R}$ have the property that $\mathbf{1}_{A_k} f$ is $\mathcal{F}$-measurable for every $k$, where $A_k \in \mathcal{F}$ and $\bigcup_k A_k = S$. Prove that $f$ is $\mathcal{F}$-measurable.

2. Prove that if $f : S \to \mathbb{R}$ is $\mathcal{F}$-measurable and never zero, then $1/f$ is $\mathcal{F}$-measurable.

3. Let $f : S \to \mathbb{R}$ have the property that $\{x : f(x) < r\} \in \mathcal{F}$ for all $r \in \mathbb{Q}$. Prove that $f$ is $\mathcal{F}$-measurable.

4. Let $f : S \to \mathbb{R}$ be $\mathcal{F}$-measurable and let $g : \mathbb{R} \to \mathbb{R}$ be continuous. Show that $g \circ f$ is $\mathcal{F}$-measurable.

5. Let $g, h : S \to \mathbb{R}$ be $\mathcal{F}$-measurable functions. Prove that the following sets are $\mathcal{F}$-measurable:
(a)S $\{x \in S : g(x) > h(x)\}$,
(b) $\{x \in S : g(x) \ge h(x)\}$,
(c) $\{x \in S : g(x) = h(x)\}$,
(d)S $\{x \in S : g(x)h(x) = 1\}$.

6. Let $\{f_k : S \to \mathbb{R}\}$ be a sequence of $\mathcal{F}$-measurable functions. Prove that the set $\{x \in S : \lim_k f_k(x) \text{ exists in } \mathbb{R}\}$ is $\mathcal{F}$-measurable.

7. Let $f : S \to \mathbb{R}$ have range consisting of the distinct values $a_k$, $k \in \mathbb{N}$. Show that $f$ is $\mathcal{F}$-measurable iff $\{x \in S : f(x) = a_k\} \in \mathcal{F}$ for every $k$.

8.S Let $E \subseteq S$. Prove that $\mathbf{1}_E$ is $\mathcal{F}$-measurable iff $E \in \mathcal{F}$.

9. Let $A$, $B$, and $C$ be subsets of $S$. Prove:
(a) $\mathbf{1}_{A \cap B} = \mathbf{1}_A \mathbf{1}_B$.
(b) $\mathbf{1}_{A \cup B} = \mathbf{1}_A + \mathbf{1}_B - \mathbf{1}_A \mathbf{1}_B$.
(c) $\mathbf{1}_{A^c} = 1 - \mathbf{1}_A$.
(d) $\mathbf{1}_A \le \mathbf{1}_B$ iff $A \subseteq B$.

10.S Define the symmetric difference $A \Delta B$ of sets $A$ and $B$ by $A \Delta B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)$. Prove that $\mathbf{1}_{A \Delta B} = |\mathbf{1}_A - \mathbf{1}_B|$.

11. Let $A_k \subseteq S$ and set $B = \liminf_k A_k$ and $C = \limsup_k A_k$ (see Exercise 10.1.6). Prove that
(a) $\mathbf{1}_B = \liminf_k \mathbf{1}_{A_k}$.
(b) $\mathbf{1}_C = \limsup_k \mathbf{1}_{A_k}$.

12. Prove that $\mathbf{1}_{[0,1]}$ is not equal a.e. to a continuous function on $\mathbb{R}$.

13. Let $f : \mathbb{R} \to \mathbb{R}$. Prove that if $f'$ exists on $\mathbb{R}$, then $f'$ is Borel measurable.

14.S Let $f(x) = \lfloor x^{-1} \rfloor^{-1}$, $0 < x \le 1$. Show that $f$ is Borel measurable on $(0, 1]$.

15. Let $f(x) = 1 + r\big(\lfloor x^{-1} \rfloor\big)$, $0 < x \le 1$, where $r(k)$ denotes the remainder on division of an integer $k$ by 3. Show that $f$ is Borel measurable.

16. Define $f : [0, 1] \to \mathbb{R}$ by $f(x) = 0$ if $x$ is rational and $f(x) = 1/\sqrt{d}$ if $x$ is irrational, where $d$ is the first nonzero digit in the decimal expansion of $x$. Prove that $f$ is Borel measurable.

17.S Prove that if the function $f$ in 10.5.8 is bounded, then the convergence of the sequence is uniform.

18. Let $f : \mathbb{R}^2 \to \mathbb{R}$ have the property that $f(x, y)$ is continuous in $x$ for each $y$ and Borel measurable in $y$ for each $x$. Let $g : \mathbb{R} \to \mathbb{R}$ be Borel measurable. Prove that the function $h(y) := f\big(g(y), y\big)$ is Borel measurable. Hint. Start with indicator functions $g$.

19.S ⇓⁵ Let $f = (f_1, \ldots, f_m) : \mathbb{R}^n \to \mathbb{R}^m$, where each $f_j : \mathbb{R}^n \to \mathbb{R}$ is Borel measurable. Prove:
(a) $\mathcal{F} := \{B \in \mathcal{B}(\mathbb{R}^m) : f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)\}$ is a σ-field.
(b) $\mathcal{F} = \mathcal{B}(\mathbb{R}^m)$, that is, $f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)$ for every $B \in \mathcal{B}(\mathbb{R}^m)$.
(c) If $F : \mathbb{R}^m \to \mathbb{R}$ is Borel measurable, then the function $g := F \circ f$ is Borel measurable.

20. (a) Show that $B \times \mathbb{R} \in \mathcal{B}(\mathbb{R}^2)$ for all $B \in \mathcal{B}(\mathbb{R})$.
(b) Let $f : \mathbb{R} \to \mathbb{R}$ be Borel measurable and define $g : \mathbb{R}^2 \to \mathbb{R}$ by $g(x, y) = f(x)$. Show that $g$ is Borel measurable.

21. Let $0 \in A \subseteq \mathbb{R}^n$. Define the "radius function" $f_A : \mathbb{R}^n \to \overline{\mathbb{R}}$ by

$$f_A(x) := \sup\{t \ge 0 : tx \in A\}, \qquad x \in \mathbb{R}^n.$$

(a) Let $0 \in A_k$ for all $k$ and $A_k \uparrow A$. Show that $f_{A_k} \uparrow f_A$.
(b) Show that if $A$ is open, then $f_A$ is positive and Borel measurable.
(c) Use (b) to show that if $A$ is compact, then $f_A$ is Borel measurable.
(d) Conclude from 10.4.5 that $f_A$ is Borel measurable for any Borel set $A$ containing 0.

⁵ This exercise will be used in 11.5.4.

Chapter 11 Lebesgue Integration on Rn

In this chapter we use the measure theory developed in Chapter 10 to construct the Lebesgue integral of a measurable function of several variables. For comparison purposes, we begin with a brief description of the Riemann integral on compact subintervals of Rn .

11.1

Riemann Integration on Rn

The n-dimensional Riemann integral is constructed in essentially the same way as the one-dimensional integral: Let $f$ be a bounded real-valued function on an n-dimensional interval

$$[a, b] := [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n],$$

where $a := (a_1, \ldots, a_n)$ and $b := (b_1, \ldots, b_n)$.

[Figure 11.1: Partition of $[a, b] \times [c, d]$.]

For each $j$, let $\mathcal{P}_j$ be a partition of the coordinate interval $[a_j, b_j]$. The collection of all Cartesian products of the resulting coordinate subintervals produces a partition $\mathcal{P}$ of $[a, b]$ consisting of n-dimensional subintervals $I = I_1 \times I_2 \times \cdots \times I_n$ with volume $\Delta V_I := |I_1|\,|I_2| \cdots |I_n|$ (see Figure 11.1). The lower and upper


sums of $f$ over $\mathcal{P}$ are defined by

$$\underline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} m_I\, \Delta V_I, \qquad m_I := \inf_{x \in I} f(x),$$

and

$$\overline{S}(f, \mathcal{P}) = \sum_{I \in \mathcal{P}} M_I\, \Delta V_I, \qquad M_I := \sup_{x \in I} f(x).$$

The lower and upper integrals on $[a, b]$ are defined by

$$\underline{\int_a^b} f := \sup_{\mathcal{P}} \underline{S}(f, \mathcal{P}) \quad \text{and} \quad \overline{\int_a^b} f := \inf_{\mathcal{P}} \overline{S}(f, \mathcal{P}),$$

where the supremum and infimum are taken over all partitions $\mathcal{P}$ of $[a, b]$. If the two integrals are equal, then $f$ is said to be Riemann–Darboux integrable on $[a, b]$. The common value of these integrals is then denoted by $\int_a^b f$. As in the one-variable case,

$$\int_a^b f = \lim_{\|\mathcal{P}\| \to 0} S(f, \mathcal{P}, \{\xi_I\}_I),$$

where $\|\mathcal{P}\| = \max_j \|\mathcal{P}_j\|$ and $S(f, \mathcal{P}, \{\xi_I\}_I)$ is the Riemann sum

$$S(f, \mathcal{P}, \{\xi_I\}_I) := \sum_{I} f(\xi_I)\, \Delta V_I, \qquad \xi_I \in I,\ I \in \mathcal{P}.$$

The n-dimensional Riemann integral has properties analogous to those of the one-dimensional integral. Moreover, as is shown in Section 11.5, if $f$ is continuous, then $\int_a^b f$ may be expressed as an iterated integral

$$\int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\, dx_n \cdots dx_1,$$

effectively reducing the theory to the one-dimensional case. Integrals over regions bounded by "nice" surfaces may be similarly evaluated.
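As a small numerical illustration of the lower and upper sums, the Python sketch below evaluates both over a uniform partition of a two-dimensional interval. To keep the infimum and supremum on each subinterval exact, it assumes that $f$ is nondecreasing in each variable, so that they are attained at the lower-left and upper-right corners; the function name `darboux_sums` and the test function $f(x, y) = xy$ are choices made for this example only.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper sums of f over the uniform n-by-n partition of
    [a1, b1] x [a2, b2].

    Assumption: f is nondecreasing in each variable, so on every
    subinterval the infimum is at the lower-left corner and the
    supremum is at the upper-right corner.
    """
    (a1, a2), (b1, b2) = a, b
    h1 = (b1 - a1) / n
    h2 = (b2 - a2) / n
    volume = h1 * h2  # Delta V_I is the same for every subinterval
    lower = 0.0
    upper = 0.0
    for i in range(n):
        for j in range(n):
            x0 = a1 + i * h1
            y0 = a2 + j * h2
            lower += f(x0, y0) * volume            # m_I * Delta V_I
            upper += f(x0 + h1, y0 + h2) * volume  # M_I * Delta V_I
    return lower, upper


lower, upper = darboux_sums(lambda x, y: x * y, (0.0, 0.0), (1.0, 1.0), 100)
print(lower, upper)
```

For $f(x, y) = xy$ on $[0, 1]^2$ the two sums bracket the iterated integral $\int_0^1 \int_0^1 xy\, dy\, dx = 1/4$, and their difference is $1/n$, which shrinks as the partition is refined.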

11.2

The Lebesgue Integral

The Lebesgue integral on Rn is defined first for nonnegative Lebesgue measurable simple functions and is then extended to a larger class of functions, including all nonnegative Lebesgue measurable functions. The identity f = f + − f − is then used to define the integral for general measurable functions.


The Integral of a Simple Function

11.2.1 Definition. Let $f \in \mathcal{S}_+(\mathcal{M})$ have standard form

$$f = \sum_{j=1}^{m} a_j \mathbf{1}_{A_j}, \qquad A_j := \{x : f(x) = a_j\},$$

where $\{A_1, \ldots, A_m\}$ is a (measurable) partition of $\mathbb{R}^n$. The Lebesgue integral of $f$ on $\mathbb{R}^n$ is defined by

$$\int f\, d\lambda := \sum_{j=1}^{m} a_j \lambda(A_j).$$ ♦
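A minimal Python sketch of this definition follows. A simple function is represented by a list of pairs (value $a_j$, measure $\lambda(A_j)$); this representation and the name `integrate_simple` are assumptions made for the illustration. Since the set on which a simple function vanishes may have infinite measure, and the floating-point product `0.0 * math.inf` is `nan`, the code treats such terms explicitly as contributing 0.

```python
import math


def integrate_simple(terms):
    """Integral of a nonnegative simple function in standard form.

    terms: list of (a_j, lambda(A_j)) with a_j >= 0 and the measure in
    [0, +inf]. A term with a_j = 0 or lambda(A_j) = 0 contributes 0,
    even when the other factor is +inf.
    """
    total = 0.0
    for value, measure in terms:
        if value == 0 or measure == 0:
            continue  # convention: 0 * (+inf) = 0
        total += value * measure
    return total


# f = 2 on [0, 1), 5 on [1, 3), and 0 on the rest of the real line
print(integrate_simple([(2.0, 1.0), (5.0, 2.0), (0.0, math.inf)]))
```

Here the integral is $2 \cdot 1 + 5 \cdot 2 = 12$; the term $0 \cdot (+\infty)$ coming from the zero set is skipped rather than multiplied out.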

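Note that the above sum may contain a term of the form $0 \cdot (+\infty)$. While this expression was heretofore undefined, it is now necessary to make the definition $0 \cdot (+\infty) := 0$. In particular, the integral of the identically zero function is $0 \cdot \lambda(\mathbb{R}^n) = 0$.

11.2.2 Lemma. If $f, g \in \mathcal{S}_+(\mathcal{M})$ and $\alpha \ge 0$, then

(a) $\int \alpha f\, d\lambda = \alpha \int f\, d\lambda$;

(b) $\int (f + g)\, d\lambda = \int f\, d\lambda + \int g\, d\lambda$;

(c) $\int f\, d\lambda \le \int g\, d\lambda$ if $f \le g$ a.e.;

(d) $\int f\, d\lambda = \int g\, d\lambda$ if $f = g$ a.e.

Proof. Part (a) is immediate from the definition, and (d) follows from (c). To prove (b) and (c), let $f$ and $g$ have standard representations

$$f = \sum_{i=1}^{m} a_i \mathbf{1}_{A_i} \quad \text{and} \quad g = \sum_{j=1}^{k} b_j \mathbf{1}_{B_j},$$

so

$$\int f = \sum_{i=1}^{m} a_i \lambda(A_i) \quad \text{and} \quad \int g = \sum_{j=1}^{k} b_j \lambda(B_j).$$

Since $\mathbb{R}^n = \bigcup_{i=1}^{m} A_i = \bigcup_{j=1}^{k} B_j$ and the unions are disjoint,

$$\lambda(A_i) = \sum_{j=1}^{k} \lambda(A_i \cap B_j) \quad \text{and} \quad \lambda(B_j) = \sum_{i=1}^{m} \lambda(A_i \cap B_j),$$

hence

$$\int f\, d\lambda = \sum_{i=1}^{m} \sum_{j=1}^{k} a_i \lambda(A_i \cap B_j) \quad \text{and} \quad \int g\, d\lambda = \sum_{j=1}^{k} \sum_{i=1}^{m} b_j \lambda(A_i \cap B_j). \tag{11.1}$$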


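Now let $c_1, \ldots, c_p$ be the distinct values of $f + g$, and set $C_\ell = \{x : (f + g)(x) = c_\ell\}$, $\ell = 1, \ldots, p$. Then

$$f + g = \sum_{\ell=1}^{p} c_\ell \mathbf{1}_{C_\ell} \quad \text{and} \quad C_\ell = \bigcup_{\{(i,j)\,:\,a_i + b_j = c_\ell\}} A_i \cap B_j \quad \text{(disjoint)},$$

so

$$\int (f + g)\, d\lambda = \sum_{\ell=1}^{p} c_\ell \lambda(C_\ell) = \sum_{\ell=1}^{p} c_\ell \sum_{\{(i,j)\,:\,a_i + b_j = c_\ell\}} \lambda(A_i \cap B_j) = \sum_{i=1}^{m} \sum_{j=1}^{k} (a_i + b_j) \lambda(A_i \cap B_j).$$

By (11.1), the last sum is $\int f\, d\lambda + \int g\, d\lambda$, proving (b).

For (c), suppose $f \le g$ a.e. and let $E = \{x : f(x) \le g(x)\}$. Then $\lambda(E^c) = 0$ and $a_i \le b_j$ for all $i, j$ for which $A_i \cap B_j \cap E \ne \emptyset$. From

$$\lambda(A_i \cap B_j) = \lambda(A_i \cap B_j \cap E) + \lambda(A_i \cap B_j \cap E^c) = \lambda(A_i \cap B_j \cap E)$$

and (11.1), we have

$$\int f\, d\lambda = \sum_{i=1}^{m} \sum_{j=1}^{k} a_i \lambda(A_i \cap B_j \cap E) \le \sum_{j=1}^{k} \sum_{i=1}^{m} b_j \lambda(A_i \cap B_j \cap E) = \int g\, d\lambda.$$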

The Integral of a Measurable Function

11.2.3 Definition. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be Lebesgue measurable. If $f \ge 0$, define

$$\int f\, d\lambda = \int f(x)\, d\lambda(x) := \sup\Big\{ \int f_s\, d\lambda : f_s \le f,\ f_s \in \mathcal{S}_+(\mathcal{M}) \Big\}. \tag{11.2}$$

In general, define the Lebesgue integral on $\mathbb{R}^n$ by

$$\int f\, d\lambda := \int f^+\, d\lambda - \int f^-\, d\lambda,$$

provided at least one of the terms on the right is finite. For $E \in \mathcal{M}$ define the Lebesgue integral on $E$ by

$$\int_E f\, d\lambda := \int f \cdot \mathbf{1}_E\, d\lambda$$

whenever the right side is defined. If both $\int_E f^+\, d\lambda$ and $\int_E f^-\, d\lambda$ are finite, then $f$ is said to be (Lebesgue) integrable on $E$. The collection of all integrable functions on $E$ is denoted by $L^1(E)$. Finally, $f$ is said to be integrable if it is integrable on $\mathbb{R}^n$. ♦

Note that from the definition, $f \ge 0 \Rightarrow \int f \ge 0$. More generally,

11.2.4 Proposition. If $f, g : \mathbb{R}^n \to \overline{\mathbb{R}}$ are Lebesgue measurable, $f \le g$ a.e., and $\int f\, d\lambda$, $\int g\, d\lambda$ are defined, then $\int f\, d\lambda \le \int g\, d\lambda$. In particular, if $f \ge 0$ and $g$ is integrable, then $f$ is integrable.

Proof. Assume first that $f, g \ge 0$. Let $f_s \in \mathcal{S}_+(\mathcal{M})$ with $f_s \le f$ and set $g_s := \mathbf{1}_E f_s$, where $E := \{x : f(x) \le g(x)\}$. Then $g_s \in \mathcal{S}_+(\mathcal{M})$, $f_s = g_s$ a.e., and $g_s \le \mathbf{1}_E f \le \mathbf{1}_E g \le g$. By 11.2.2, $\int f_s\, d\lambda = \int g_s\, d\lambda \le \int g\, d\lambda$. Since $f_s$ was arbitrary, $\int f\, d\lambda \le \int g\, d\lambda$.

In the general case, $f^+ \le g^+$ and $f^- \ge g^-$ a.e., hence, by the first part of the proof,

$$\int f\, d\lambda = \int f^+\, d\lambda - \int f^-\, d\lambda \le \int g^+\, d\lambda - \int g^-\, d\lambda = \int g\, d\lambda.$$

11.2.5 Corollary. If $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is integrable and $f = g$ a.e., then $g$ is integrable and $\int f\, d\lambda = \int g\, d\lambda$.

Proof. By 10.5.10, $g$ is Lebesgue measurable. Moreover, $f^+ = g^+$ a.e., so by 11.2.4, $\int f^+\, d\lambda = \int g^+\, d\lambda$. Similarly, $\int f^-\, d\lambda = \int g^-\, d\lambda$.

11.2.6 Proposition. If $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is integrable, then $f$ is finite a.e.

Proof. Suppose first that $f \ge 0$. Let $A = \{x : f(x) = +\infty\}$ and $A_k = \{x : f(x) \ge k\}$. Since $f \ge f \mathbf{1}_{A_k} \ge k \mathbf{1}_{A_k} \ge k \mathbf{1}_A$,

$$0 \le \lambda(A) \le \frac{1}{k} \int f\, d\lambda < +\infty.$$

Letting $k \to +\infty$ shows that $\lambda(A) = 0$. In the general case, apply the result of the first paragraph to $f^+$ and $f^-$ to obtain

$$\lambda\{x : f^+(x) = +\infty\} = \lambda\{x : f^-(x) = +\infty\} = 0,$$

hence $\lambda\{x : |f(x)| = +\infty\} = 0$.

11.2.7 Proposition. Let $f : \mathbb{R}^n \to [0, +\infty]$ be Lebesgue measurable. Then $\int f\, d\lambda = 0$ iff $f = 0$ a.e.

Proof. The sufficiency follows from 11.2.5. For the necessity, suppose $\int f\, d\lambda = 0$ and let $B = \{x : f(x) > 0\}$ and $B_k = \{x : f(x) > 1/k\}$. Then $B = \bigcup_{k=1}^{\infty} B_k$ and $f \ge f \mathbf{1}_{B_k} \ge k^{-1} \mathbf{1}_{B_k}$, so

$$0 \le \lambda(B_k) \le k \int f\, d\lambda = 0.$$

Therefore, $\lambda(B_k) = 0$. By countable subadditivity, $\lambda(B) = 0$.

By 11.2.4, $f \ge 0$ implies that $\int_A f\, d\lambda \ge 0$ for all $A \in \mathcal{M}$. The following is a converse:

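11.2.8 Proposition. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be Lebesgue measurable and suppose that $\int_A f\, d\lambda$ is defined for all $A \in \mathcal{M}$.

(a) If $\int_A f\, d\lambda \ge 0$ for all $A \in \mathcal{M}$, then $f \ge 0$ a.e.

(b) If $\int_A f\, d\lambda = 0$ for all $A \in \mathcal{M}$, then $f = 0$ a.e.

Proof. Part (b) follows from part (a). To prove (a), let

$$A_k = \{x : f(x) \le -k^{-1}\} \quad \text{and} \quad A = \{x : f(x) < 0\}.$$

Then $f \mathbf{1}_{A_k} \le -k^{-1} \mathbf{1}_{A_k}$ or $\mathbf{1}_{A_k} \le -k f \mathbf{1}_{A_k}$, hence, since $\int_{A_k} f \ge 0$,

$$0 \le \lambda(A_k) \le -k \int_{A_k} f\, d\lambda \le 0.$$

Therefore, $\lambda(A_k) = 0$. Since $A = \bigcup_{k=1}^{\infty} A_k$, $\lambda(A) = 0$.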

11.2.9 Remark. The above properties of integrals on Rn also hold for integrals on E ∈ M. For example, if f is integrable on E, then f is finite a.e. on E: simply replace f in 11.2.6 by f · 1E . This observation applies to most of the results that follow. We shall usually refrain from making this explicit, but the reader is invited to formulate and verify such generalizations. ♦

Linearity of the Integral

The following lemma is a special case of the monotone convergence theorem proved in the next section.

11.2.10 Lemma (Beppo–Levi). If $\{f_k\}$ is a sequence of nonnegative Lebesgue measurable functions such that $f_k \uparrow f$ on $\mathbb{R}^n$, then

$$\int f\, d\lambda = \lim_k \int f_k\, d\lambda.$$

Proof. By 10.5.3, $f$ is Lebesgue measurable, hence $\int f\, d\lambda$ is defined. It follows from $0 \le f_k \le f_{k+1} \le f$ and 11.2.4 that

$$\int f_k\, d\lambda \le \int f_{k+1}\, d\lambda \le \int f\, d\lambda.$$

Therefore, $L := \lim_k \int f_k\, d\lambda$ exists in $\overline{\mathbb{R}}$ and $L \le \int f\, d\lambda$. For the reverse inequality, it suffices to show that $\int g\, d\lambda \le L$ for any $g \in \mathcal{S}_+(\mathcal{M})$ with $g \le f$. Let $0 < r < 1$ and set $E_k = \{x : f_k(x) \ge r g(x)\}$. Since the sequence $\{f_k\}$ is increasing, $E_k \subseteq E_{k+1}$. Since $f_k(x) \ge r g(x)$ for all large $k$, $E_k \uparrow \mathbb{R}^n$. If $g$ has standard form $\sum_{j=1}^{m} a_j \mathbf{1}_{A_j}$, then

$$f_k \ge f_k \mathbf{1}_{E_k} \ge r \sum_{j=1}^{m} a_j \mathbf{1}_{E_k \cap A_j},$$

hence

$$\int f_k\, d\lambda \ge r \sum_{j=1}^{m} a_j \lambda(E_k \cap A_j).$$

Letting $k \to +\infty$, noting that $E_k \cap A_j \uparrow_k A_j$, we then obtain

$$L \ge r \sum_{j=1}^{m} a_j \lambda(A_j) = r \int g\, d\lambda.$$

Letting $r \uparrow 1$ yields $L \ge \int g\, d\lambda$, as required.
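The lemma can be illustrated with a concrete increasing sequence. Take $f(x) = x^{-1/2}$ on $(0, 1]$ (and $0$ elsewhere) and the truncations $f_k = \min\{f, k\}$, so that $f_k \uparrow f$. A direct computation gives $\int f_k\, d\lambda = k \cdot k^{-2} + \int_{k^{-2}}^{1} x^{-1/2}\, dx = 2 - 1/k$, which increases to $\int f\, d\lambda = 2$. The Python sketch below compares this closed form with a midpoint Riemann sum; it relies on the fact that for a bounded continuous function on a closed bounded interval the Riemann and Lebesgue integrals agree. The function names and the choice of example are assumptions made for this illustration.

```python
def truncated_integral_exact(k):
    """Closed form of the integral of min(x**-0.5, k) over (0, 1], k >= 1."""
    return 2.0 - 1.0 / k


def truncated_integral_midpoint(k, cells=20000):
    """Midpoint Riemann sum approximating the same integral."""
    h = 1.0 / cells
    total = 0.0
    for i in range(cells):
        x = (i + 0.5) * h  # midpoint of the i-th cell
        total += min(x ** -0.5, k) * h
    return total


for k in range(1, 9):
    print(k, truncated_integral_midpoint(k), truncated_integral_exact(k))
```

The printed values increase with $k$ and stay below 2, in line with $\int f_k\, d\lambda \uparrow \int f\, d\lambda$.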

11.2.11 Theorem. If $f, g : \mathbb{R}^n \to [0, +\infty]$ are Lebesgue measurable, then

$$\int (\alpha f + \beta g)\, d\lambda = \alpha \int f\, d\lambda + \beta \int g\, d\lambda, \qquad \alpha, \beta \in \mathbb{R}^+.$$

In particular, if $f$ and $g$ are integrable then so is $\alpha f + \beta g$.

Proof. By 10.5.8, there exist sequences $\{f_k\}$ and $\{g_k\}$ in $\mathcal{S}_+(\mathcal{M})$ such that $f_k \uparrow f$ and $g_k \uparrow g$. Then $\alpha f_k + \beta g_k \uparrow \alpha f + \beta g$ and, by 11.2.10 and 11.2.2,

$$\int (\alpha f + \beta g)\, d\lambda = \lim_k \int (\alpha f_k + \beta g_k)\, d\lambda = \alpha \lim_k \int f_k\, d\lambda + \beta \lim_k \int g_k\, d\lambda = \alpha \int f\, d\lambda + \beta \int g\, d\lambda.$$

11.2.12 Corollary. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be Lebesgue measurable.

(a) $f$ is integrable iff $|f|$ is integrable.

(b) If $f$ is integrable and $|g| \le |f|$, then $g$ is integrable.

(c) If $f$ and $g$ are integrable, then $f + g$ is integrable.

(d) If $f$ is integrable and $E \in \mathcal{M}$, then $f$ is integrable on $E$.

Proof. (a) If $f$ is integrable then, by definition, both $f^+$ and $f^-$ are integrable, hence, by the theorem, $|f| = f^+ + f^-$ is integrable. Conversely, if $|f|$ is integrable, then the inequalities $0 \le \int f^{\pm}\, d\lambda \le \int |f|\, d\lambda$ show that both $f^+$ and $f^-$ are integrable, hence $f$ is integrable.

(b) By (a), $|f|$ is integrable. The inequality $|g| \le |f|$ then implies that $|g|$ is integrable. By (a) again, $g$ is integrable.

(c) If $f$ and $g$ are integrable, then so are $|f|$ and $|g|$. The inequality $|f + g| \le |f| + |g|$ then shows that $|f + g|$ is integrable. By (a), $f + g$ is integrable.

(d) This follows from (b) since $|f \mathbf{1}_E| \le |f|$.

The following theorem complements 11.2.11.

11.2.13 Theorem. Let $f, g : \mathbb{R}^n \to \overline{\mathbb{R}}$ be Lebesgue measurable with $g$ integrable, and let $c \in \mathbb{R}$. Then the following hold:

(a) $cg$ is integrable and $\int cg\, d\lambda = c \int g\, d\lambda$.

(b) If $f$ is integrable, then $f + g$ is integrable and

$$\int (f + g)\, d\lambda = \int f\, d\lambda + \int g\, d\lambda. \tag{11.3}$$

(c) If $\int f\, d\lambda$ is defined, then $\int (f + g)\, d\lambda$ is defined and (11.3) holds.¹

¹ To avoid undefined expressions such as $\infty - \infty$ in the integrand $f + g$ in (b) and (c), it must be assumed that $g$ is finite-valued. This is no real loss of generality since $g$ is integrable, hence finite-valued a.e. (11.2.6).

Proof. (a) If $c \ge 0$, then $(cg)^+ = cg^+$ and $(cg)^- = cg^-$, hence, by 11.2.11, the functions $(cg)^{\pm}$ are integrable and

$$\int cg\, d\lambda = \int (cg)^+\, d\lambda - \int (cg)^-\, d\lambda = c \int g^+\, d\lambda - c \int g^-\, d\lambda = c \int g\, d\lambda.$$

Next, observe that $(-g)^+ = g^-$ and $(-g)^- = g^+$, so

$$\int (-g)\, d\lambda = \int (-g)^+\, d\lambda - \int (-g)^-\, d\lambda = \int g^-\, d\lambda - \int g^+\, d\lambda = -\int g\, d\lambda.$$

Therefore, if $c < 0$,

$$\int cg\, d\lambda = \int (-c)(-g)\, d\lambda = -c \int (-g)\, d\lambda = c \int g\, d\lambda.$$

(b) By 11.2.12, $f + g$ is integrable. By 11.2.6, there exists a set $A$ of Lebesgue measure zero such that $f(x), g(x) \in \mathbb{R}$ for $x \in A^c$. Then on the set $A^c$,

$$(f + g)^+ - (f + g)^- = f + g = f^+ - f^- + g^+ - g^-,$$

hence

$$(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+.$$

By 11.2.11 and 11.2.5,

$$\int (f + g)^+\, d\lambda + \int f^-\, d\lambda + \int g^-\, d\lambda = \int (f + g)^-\, d\lambda + \int f^+\, d\lambda + \int g^+\, d\lambda.$$

Since the integrals in this equation are finite, rearranging yields

$$\int (f + g)\, d\lambda = \int (f + g)^+\, d\lambda - \int (f + g)^-\, d\lambda = \int f^+\, d\lambda - \int f^-\, d\lambda + \int g^+\, d\lambda - \int g^-\, d\lambda = \int f\, d\lambda + \int g\, d\lambda.$$

(c) The cases to be considered are

(i) $\int f^-\, d\lambda < +\infty$ and $\int f^+\, d\lambda = +\infty$;

(ii) $\int f^+\, d\lambda < +\infty$ and $\int f^-\, d\lambda = +\infty$.

Suppose that (i) holds. We may assume that both $f^-$ and $g$ are finite-valued. Since

$$\int (f + g)^-\, d\lambda \le \int (f^- + g^-)\, d\lambda < +\infty,$$

$\int (f + g)\, d\lambda$ is defined. If $\int (f + g)^+\, d\lambda < +\infty$, then $f + g$ would be integrable, hence, by part (b), so would $f^+ = (f + g) + f^- - g$, contrary to our assumption. Therefore,

$$\int (f + g)\, d\lambda = +\infty = \int f\, d\lambda + \int g\, d\lambda.$$

Case (ii) is similar (or apply Case (i) to $-f$).

11.2.14 Corollary. If $f$ is integrable, then $\left| \int f\, d\lambda \right| \le \int |f|\, d\lambda$.

Proof. Since $\pm f \le |f|$,

$$\pm \int f\, d\lambda = \int \pm f\, d\lambda \le \int |f|\, d\lambda.$$

Approximation of Integrable Functions

11.2.15 Definition. For $E \in \mathcal{M}$ and $f \in L^1(E)$ define the $L^1$ seminorm of $f$ by

$$\|f\|_1 := \int_E |f|\, d\lambda.$$

11.2.16 Theorem. $L^1(E)$ is a linear space and $\|\cdot\|_1$ has all the properties of a norm except the coincidence property.

Proof. That $L^1(E)$ is a linear space follows from 11.2.13. Coincidence may fail since $\|f\|_1 = 0$ only implies that $f = 0$ a.e. (Consider the Dirichlet function.) The other properties of a norm are easily established.

11.2.17 Theorem. Let $f \in L^1(\mathbb{R}^n)$ and $\varepsilon > 0$. Then there exists a simple function $g$ and a continuous function $h$, each vanishing outside a bounded interval, such that $\|f - g\|_1 < \varepsilon$ and $\|f - h\|_1 < \varepsilon$.

Proof. By considering positive and negative parts, we may assume that $f \ge 0$. By definition of $\int f\, d\lambda$, there exists $f_s \in \mathcal{S}_+(\mathcal{M})$ with $f_s \le f$ such that

$$\|f - f_s\|_1 = \int f\, d\lambda - \int f_s\, d\lambda < \varepsilon/4.$$

Let $f_s = \sum_{i=1}^{m} a_i \mathbf{1}_{A_i}$, where $a_i > 0$. Since

$$\sum_{i=1}^{m} a_i \lambda(A_i) = \int f_s\, d\lambda \le \int f\, d\lambda < +\infty,$$

$\lambda(A_i) < +\infty$ for each $i$. Let $M = \max_i a_i$. By 10.1.6(d), there exists a bounded interval $I$ such that

$$\lambda(A_i) - \lambda(I \cap A_i) < \varepsilon/(4Mm), \qquad i = 1, \ldots, m.$$

Set $B_i := A_i \cap I$ and $g := \sum_{i=1}^{m} a_i \mathbf{1}_{B_i}$. Then

$$\|g - f_s\|_1 = \sum_{i=1}^{m} a_i \big[\lambda(A_i) - \lambda(B_i)\big] < \varepsilon/4,$$

hence

$$\|f - g\|_1 \le \|f - f_s\|_1 + \|f_s - g\|_1 < \varepsilon/2.$$

To obtain $h$, for each $i$ choose a compact set $C_i$ and a bounded open set $U_i$ such that $C_i \subseteq B_i \subseteq U_i$ and $\lambda(U_i \setminus C_i) < \varepsilon/(4mM)$ (10.4.6). By Exercise 8.5.15, there exists a continuous function $h_i : \mathbb{R}^n \to [0, 1]$ such that $h_i = 1$ on $C_i$ and $h_i = 0$ on $U_i^c$. Since $h_i - \mathbf{1}_{B_i} = 0$ on $C_i \cup U_i^c = (U_i \setminus C_i)^c$,

$$\|\mathbf{1}_{B_i} - h_i\|_1 = \int_{U_i \setminus C_i} |\mathbf{1}_{B_i} - h_i|\, d\lambda \le 2\lambda(U_i \setminus C_i) < \varepsilon/(2mM).$$

The function $h := \sum_{i=1}^{m} a_i h_i$ is continuous and by the triangle inequality $\|g - h\|_1 < \varepsilon/2$. Therefore, $\|f - h\|_1 < \varepsilon$, completing the proof.


Translation Invariance of the Integral

11.2.18 Theorem. If $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is Lebesgue measurable and $y \in \mathbb{R}^n$, then

$$\int f(x + y)\, dx = \int f(x)\, dx \tag{11.4}$$

in the sense that if one side is defined, then so is the other and the integrals are then equal.

Proof. If $E \in \mathcal{M}$, then $E - y \in \mathcal{M}$ and $\lambda(E - y) = \lambda(E)$ (Exercise 10.3.1), hence

$$\int \mathbf{1}_E(y + x)\, dx = \int \mathbf{1}_{E - y}\, d\lambda = \lambda(E - y) = \int \mathbf{1}_E\, d\lambda.$$

Therefore, (11.4) holds for indicator functions.

For a function $h$, define $h_y(x) := h(y + x)$. Let $f \ge 0$ and let $g \in \mathcal{S}_+(\mathcal{M})$ with $g \le f$. Then $g_y \le f_y$ and, by the first paragraph and linearity, $\int g = \int g_y$, hence $\int g \le \int f_y$. Taking the supremum over $g$ yields $\int f \le \int f_y$. Replacing $y$ by $-y$ and $f$ by $f_y$ in this inequality produces the reverse inequality. Therefore, (11.4) holds for $f \ge 0$. The general case follows from this and the identities $(f^{\pm})_y = (f_y)^{\pm}$.

Exercises

1. Let $f$ and $g$ be integrable. Prove:
(a) If $\int_E f\, d\lambda \le \int_E g\, d\lambda$ for all $E \in \mathcal{M}$, then $f \le g$ a.e.
(b) If $\int_E f\, d\lambda = \int_E g\, d\lambda$ for all $E \in \mathcal{M}$, then $f = g$ a.e.

2. Let $f(x) = 1 + r\big(\lfloor x^{-1} \rfloor\big)$ for $0 < x \le 1$, where $r(k)$ is the remainder on division of the positive integer $k$ by 3. (Cf. Exercise 10.5.15.) Show that

$$\int_{(0,1]} f\, d\lambda = \frac{3}{2} + \sum_{k=1}^{\infty} \left[ \frac{1}{3k(3k+1)} + \frac{2}{(3k+1)(3k+2)} + \frac{3}{(3k+2)(3k+3)} \right].$$

3.S Define $f : [0, 1] \to \mathbb{R}$ by $f(x) = 0$ if $x$ is rational, and $f(x) = d^2$ if $x$ is irrational, where $d$ is the first nonzero digit in the decimal expansion of $x$. (See Exercise 10.5.16.) Show that $\int_{[0,1]} f\, d\lambda = 95/3$.

4. (a) Prove the following mean value theorem for integrals: Let $f$ be continuous on a compact connected set $K \subseteq \mathbb{R}^n$. Then there exists $x_K \in K$ such that

$$\int_K f\, d\lambda = f(x_K)\lambda(K).$$

(b) Let $f$ be continuous on $C_1(x_0)$. Prove that

$$\lim_{r \to 0} \frac{1}{\lambda(C_r(x_0))} \int_{C_r(x_0)} f\, d\lambda = f(x_0).$$

5.S Let $f$ be Lebesgue measurable on $\mathbb{R}$ and let $m \le f \le M$ on $E \in \mathcal{M}(\mathbb{R})$.
(a) Prove that if $g$ is integrable on $E$, then there exists $a \in [m, M]$ such that

$$\int_E f |g|\, d\lambda = a \int_E |g|\, d\lambda.$$

(b) Show that part (a) may be false if $|g|$ is replaced by $g$.
(c) Use (a) to show that at each point $x$ where $f$ is continuous,

$$\lim_{y \to x} \frac{1}{y - x} \left[ \int_{[a,y]} f\, d\lambda - \int_{[a,x]} f\, d\lambda \right] = f(x).$$

6. (Cauchy–Schwarz inequality) Let $f$ and $g$ be Lebesgue measurable on $\mathbb{R}^n$. Prove that

$$\left( \int |fg|\, d\lambda \right)^2 \le \int f^2\, d\lambda \cdot \int g^2\, d\lambda.$$

(See 5.7.19.)

7.S Prove that if $f$ is integrable on $[0, 1]$ and $\varepsilon > 0$, then there exists a polynomial $P$ on $[0, 1]$ such that $\int_0^1 |f - P|\, d\lambda < \varepsilon$.

8. (Absolute continuity of the integral) Let $f \ge 0$ be integrable on $\mathbb{R}^n$. Prove that for each $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$\int_E f\, d\lambda < \varepsilon \quad \text{for all } E \in \mathcal{M}(\mathbb{R}^n) \text{ with } \lambda(E) < \delta.$$

Conclude that if $\{E_k\}$ is a sequence in $\mathcal{M}(\mathbb{R}^n)$ with $\lambda(E_k) \to 0$, then $\int_{E_k} f\, d\lambda \to 0$. Hint. Begin with simple functions.

9.S Let $f$ be integrable. Prove that $\lim_k \int_{[k,k+1]} f\, d\lambda = 0$. (A quick proof uses the dominated convergence theorem. For now, give a proof starting with simple functions.)

10. Suppose $f : I = [0, 1] \to [-1, 1]$ is integrable. Prove that

$$\int_I f^2\, d\lambda \le \varepsilon^2 + \lambda\{x : |f(x)| > \varepsilon\} \quad \text{for every } \varepsilon > 0.$$

11.S Let $f : I = [0, 1] \to \mathbb{R}$ be integrable. Prove that

$$\int_I f^2\, d\lambda \ge \varepsilon^2 \lambda\{x \in I : |f(x)| > \varepsilon\} \quad \text{for every } \varepsilon > 0.$$

12. Prove: If $f$ is integrable and $f < 1$ a.e. on $I$, then $\int_I f < 1$.

13. Let $f$ be Lebesgue integrable on $E \in \mathcal{M}$ with $0 < \lambda(E) < +\infty$ and $\int_E f\, d\lambda \ge \lambda(E)$. Prove that $\lambda\{x \in E : f(x) \ge 1\} > 0$.

14. Let $f$ be integrable on $\mathbb{R}^n$. Show that for each $r > 0$ the function $f_r(x) := f(rx)$ is integrable and $\int f_r\, d\lambda = r^{-n} \int f\, d\lambda$.

15.S Let $f : [0, 1] \to \mathbb{R}$ be a bounded Lebesgue measurable function such that $\int_{[0,1]} x^{2k} f(x)\, d\lambda(x) = 0$ for all $k \in \mathbb{Z}^+$. Prove that $f = 0$ a.e.

16. Let $f$ be Lebesgue integrable and $g$, $g'$ bounded and continuous on $\mathbb{R}$. Carry out the following steps to show that

$$\lim_k \int f(x)\, g'(kx)\, d\lambda(x) = 0. \tag{11.5}$$

(a) Prove (11.5) for $f = \mathbf{1}_{[a,b]}$. (Use the fact that the Riemann and Lebesgue integrals of a continuous function on a closed bounded interval are equal (Section 11.4).)
(b) Use (a) to show that (11.5) holds for $f = \mathbf{1}_U$, where $U$ is bounded and open.
(c) Use (b) and 10.4.7 to show that (11.5) holds for $f = \mathbf{1}_E$, where $E \in \mathcal{M}$ is bounded.
(d) Use (c) and 11.2.17 to complete the proof.

If $g'(x) = \sin x$ or $\cos x$, then (11.5) is known as the Riemann–Lebesgue lemma.

11.3 Convergence Theorems

In this section we state and prove three pointwise convergence theorems for the Lebesgue integral. The first of these is a generalization of 11.2.10.

11.3.1 Monotone Convergence Theorem. Let $f_k : \mathbb{R}^n \to \mathbb{R}$ be Lebesgue measurable with $f_k \uparrow f$ on $\mathbb{R}^n$ and let $\int f_1^-\,d\lambda < +\infty$. Then
\[ \int f\,d\lambda = \lim_k \int f_k\,d\lambda. \tag{11.6} \]

Proof. From $0 \le f^- \le f_k^- \le f_1^-$ we have $\int f_k^-\,d\lambda < +\infty$ and $\int f^-\,d\lambda < +\infty$, hence the integrals in the assertion of the theorem are defined. If $\int f_1^+\,d\lambda = +\infty$, then from $f_1^+ \le f_k^+ \le f^+$ we see that each side of (11.6) is $+\infty$. If $\int f_1^+\,d\lambda < +\infty$, then $f_1$ is integrable and we may apply 11.2.10 to $f_k - f_1$ $(\ge 0)$ to obtain
\[ \int f_k\,d\lambda = \int (f_k - f_1)\,d\lambda + \int f_1\,d\lambda \to \int (f - f_1)\,d\lambda + \int f_1\,d\lambda = \int f\,d\lambda. \]


11.3.2 Remark. Equation (11.6) is still true if the inequalities $f_k \le f_{k+1} \le f$ and the convergence $f_k \uparrow f$ hold only almost everywhere. To see this, let $A$ denote the set on which $f_k \le f_{k+1}$ for all $k$ and $f_k \uparrow f$. Set $\tilde f_k = f_k 1_A$ and $\tilde f = f 1_A$. Then (11.6) holds for the new functions. Since $\lambda(A^c) = 0$, 11.2.5 shows that the equation holds for the original functions. Analogous remarks apply to the other convergence theorems in this section. ♦

11.3.3 Corollary. If $g_k$ is Lebesgue measurable and nonnegative for every $k$, then
\[ \int \sum_{k=1}^{\infty} g_k\,d\lambda = \sum_{k=1}^{\infty} \int g_k\,d\lambda. \]

Proof. Let $f_k = \sum_{j=1}^{k} g_j$ and $f = \sum_{j=1}^{\infty} g_j$. Then $0 \le f_k \uparrow f$ on $\mathbb{R}^n$, hence, by the theorem and linearity,
\[ \int f\,d\lambda = \lim_k \int f_k\,d\lambda = \lim_k \sum_{j=1}^{k} \int g_j\,d\lambda. \]

11.3.4 Corollary. Let $f \ge 0$ be Lebesgue measurable. Define a function $\mu$ on $\mathcal{M}(\mathbb{R}^n)$ by
\[ \mu(E) := \int_E f\,d\lambda, \quad E \in \mathcal{M}(\mathbb{R}^n). \]
Then $\mu$ is a measure on $\mathcal{M}(\mathbb{R}^n)$.

Proof. For countable additivity, let $\{E_k\}$ be pairwise disjoint sets in $\mathcal{M}(\mathbb{R}^n)$ and apply 11.3.3 to $g_k = f 1_{E_k}$.

11.3.5 Fatou’s Lemma. If $f_k$ is nonnegative and Lebesgue measurable for every $k$, then
\[ \int \liminf_k f_k\,d\lambda \le \liminf_k \int f_k\,d\lambda. \tag{11.7} \]

Proof. Let $g_k = \inf_{j \ge k} f_j$ and $g = \liminf_k f_k$. Then $g_k \le f_k$, $g_k \uparrow g$, and $g_k$ and $g$ are Lebesgue measurable (10.5.3). By the monotone convergence theorem,
\[ \int \liminf_k f_k\,d\lambda = \int g\,d\lambda = \lim_k \int g_k\,d\lambda = \liminf_k \int g_k\,d\lambda \le \liminf_k \int f_k\,d\lambda. \]

The inequality in (11.7) may be strict. For example, if $f_k = k 1_{[0,1/k]}$, then the left side is zero while the right side is one.

11.3.6 Dominated Convergence Theorem. Let $g : \mathbb{R}^n \to [0, +\infty]$ be integrable and let $\{f_k : \mathbb{R}^n \to \mathbb{R}\}$ be a sequence of Lebesgue measurable functions such that $|f_k| \le g$ for all $k$. If $f_k \to f$ on $\mathbb{R}^n$, then $f$ is integrable and $\int f_k\,d\lambda \to \int f\,d\lambda$.


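Proof. Since $|f| \le g$, $f_k$ and $f$ are integrable (11.2.12). Fatou’s lemma applied to $g \pm f_k$ $(\ge 0)$ shows that
\[ \int g\,d\lambda + \int f\,d\lambda \le \liminf_k \int (g + f_k)\,d\lambda = \int g\,d\lambda + \liminf_k \int f_k\,d\lambda \]
and
\[ \int g\,d\lambda - \int f\,d\lambda \le \liminf_k \int (g - f_k)\,d\lambda = \int g\,d\lambda - \limsup_k \int f_k\,d\lambda. \]
Subtracting $\int g\,d\lambda$ in each inequality yields
\[ \int f\,d\lambda \le \liminf_k \int f_k\,d\lambda \le \limsup_k \int f_k\,d\lambda \le \int f\,d\lambda. \]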
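The role of the dominating function can be seen on the sequence $f_k = k 1_{[0,1/k]}$ from the remark after Fatou’s lemma. The following Python sketch is only an illustration: the helper names `f`, `integral`, and `envelope_integral` are hypothetical, and the closed form $\sup_k f_k(x) = \lfloor 1/x \rfloor$ on $(0,1]$ is a short computation that is assumed here rather than taken from the text. It shows that every $\int f_k\,d\lambda$ equals 1 although $f_k(x) \to 0$ for $x > 0$, and that the smallest candidate dominating function has unbounded integral, so 11.3.6 does not apply.

```python
from fractions import Fraction

def f(k, x):
    """Value of f_k = k * 1_[0, 1/k] at the point x."""
    return k if 0 <= x <= Fraction(1, k) else 0

def integral(k):
    """Exact integral of f_k: height k times interval length 1/k."""
    return k * Fraction(1, k)

def envelope_integral(m_max):
    """Integral of sup_k f_k = floor(1/x) over (1/(m_max+1), 1].

    On (1/(m+1), 1/m] the envelope equals m, which contributes
    m * (1/m - 1/(m+1)) = 1/(m+1); the partial sums are harmonic
    and therefore unbounded.
    """
    return sum(Fraction(1, m + 1) for m in range(1, m_max + 1))

if __name__ == "__main__":
    print([integral(k) for k in (1, 10, 100)])  # each integral is 1
    print(f(1000, Fraction(1, 10)))             # pointwise value far out in the sequence
    print(float(envelope_integral(1000)))       # grows like log(m_max)
```

Since the envelope is not integrable, no integrable $g$ with $|f_k| \le g$ for all $k$ can exist, which is consistent with the limit of the integrals (one) differing from the integral of the limit (zero).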

The following example illustrates that care must be taken when applying the dominated convergence theorem.

11.3.7 Example. Let $p > 0$ and define
\[ f_k(x) := \frac{k}{1 + k^2 x^{2p}}, \quad 0 < x \le 1, \qquad\text{and}\qquad I_k := \int_{(0,1]} f_k\,d\lambda, \quad k \in \mathbb{N}. \]
Clearly, $f_k \to 0$ for all $p > 0$. We show that $\lim_k I_k = 0$ iff $0 < p < 1$. By Section 11.4, below, the integrals are Riemann, hence, making the substitution $t = kx^p$ and setting $q = p^{-1} - 1$, we obtain
\[ I_k = \int_0^1 \frac{k}{1 + k^2 x^{2p}}\,dx = \frac{1}{pk^q}\int_0^k \frac{t^q}{1 + t^2}\,dt = \int g_k\,d\lambda, \quad\text{where}\quad g_k(t) = \frac{1}{pk^q}\,\frac{t^q}{1 + t^2}\,1_{[0,k]}(t). \]
If $p = 1$, then $q = 0$ and $I_k = \arctan k \to \pi/2$. If $0 < p < 1$, then $g_k \to 0$ and $g_k(t) \le p^{-1}(1 + t^2)^{-1}$ for all $t \ge 0$ and all $k$, so $I_k \to 0$ by the dominated convergence theorem. Finally, if $p > 1$, then $-1 < q < 0$ and
\[ I_k \ge \frac{1}{pk^q}\int_0^1 \frac{1}{1 + t^2}\,dt \to +\infty. \]
♦

The following theorem gives general conditions under which one may “differentiate under the integral sign.”

11.3.8 Theorem. Let $f(x, y)$ be Lebesgue measurable on $I := (a, b) \times (c, d)$ such that for each $y$ in $(c, d)$ the function $f(\cdot, y)$ is Lebesgue integrable on $(a, b)$ and the derivative $f_y$ exists on $I$. If there exists an integrable function $g$ on $(a, b)$ such that $|f_y(x, y)| \le g(x)$ for all $(x, y) \in I$, then
\[ \frac{d}{dy}\int_{(a,b)} f(x, y)\,d\lambda(x) = \int_{(a,b)} \frac{\partial f}{\partial y}(x, y)\,d\lambda(x). \]


Proof. We prove the right-hand derivative version. Let $y \in (c, d)$ and $y_k \downarrow y$. Set
\[ G(y) = \int_{(a,b)} f(x, y)\,d\lambda(x) \quad\text{and}\quad g_k(x) = \frac{f(x, y_k) - f(x, y)}{y_k - y}. \]
By the mean value theorem, $g_k(x) = f_y(x, t_k)$ for some $t_k \in (y, y_k)$, hence $|g_k| \le g$. Since $g_k(x) \to f_y(x, y)$, the dominated convergence theorem implies that
\[ \frac{G(y_k) - G(y)}{y_k - y} = \int_{(a,b)} g_k(x)\,d\lambda(x) \to \int_{(a,b)} f_y(x, y)\,d\lambda(x). \]
Since $\{y_k\}$ was arbitrary, $G_r'(y)$ exists and equals $\int_{(a,b)} f_y(x, y)\,d\lambda(x)$.
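Theorem 11.3.8 can be checked numerically on a simple case. The sketch below is only an illustration under stated assumptions: the integrand $f(x, y) = \sin(xy)$ on $(0,1) \times (c,d)$ is chosen here (it is not an example from the text) because $|f_y(x,y)| = |x\cos(xy)| \le 1$, so $g \equiv 1$ dominates, and because $G(y) = (1 - \cos y)/y$ has an elementary derivative to compare against. The quadrature helper `midpoint` and the other function names are hypothetical.

```python
import math

def midpoint(func, a, b, n=20_000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def G(y):
    """G(y) = integral over (0, 1) of sin(x*y) dx."""
    return midpoint(lambda x: math.sin(x * y), 0.0, 1.0)

def G_prime_via_integral(y):
    """Right side of the theorem: integral of f_y(x, y) = x*cos(x*y)."""
    return midpoint(lambda x: x * math.cos(x * y), 0.0, 1.0)

def G_prime_via_difference(y, h=1e-5):
    """Left side of the theorem, approximated by a central difference."""
    return (G(y + h) - G(y - h)) / (2 * h)

def G_prime_closed_form(y):
    """Derivative of (1 - cos y)/y, valid for y != 0."""
    return (y * math.sin(y) - (1 - math.cos(y))) / (y * y)

if __name__ == "__main__":
    y = 2.0
    print(G_prime_via_integral(y), G_prime_via_difference(y), G_prime_closed_form(y))
```

All three printed values should agree to several decimal places; the difference quotient plays the role of $(G(y_k) - G(y))/(y_k - y)$ in the proof.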

Exercises 1.S Prove the following: Z k (a) lim sink x (1 − sin x) dλ = 0. k

(b) lim k

[0,π]

Z [0,+∞)

k sin x3/2 dλ(x) = 0. 1 + k 2 x2

2. Let $f : \mathbb{R}^n \to (0, +\infty)$ be integrable. Prove that
(a) $\int k \ln(1 + k^{-1} f)\,d\lambda \to \int f\,d\lambda$.
(b)S $\int k \ln(1 + k^{-2} f)\,d\lambda \to 0$.
(c)S $\int k \sin(k^{-1} f)\,d\lambda \to \int f\,d\lambda$.
(d) $\int_E f^{1/k}\,d\lambda \to \lambda(E)$, $E \in \mathcal{M}$.

3.S Let $f, g : \mathbb{R}^n \to (0, +\infty)$ be Lebesgue measurable with $g$ integrable. Prove:
\[ \int g\,(1 + k^{-1} f)^k \exp(-f)\,d\lambda \to \int g\,d\lambda. \]

4. Let $f : \mathbb{R}^n \to [1, +\infty)$ be Lebesgue measurable and $g : \mathbb{R}^n \to [0, +\infty)$ integrable. Prove that
\[ \int k^2 g \exp(-kf)\,d\lambda \to 0. \]

5.S Let $f$ be integrable on $(0, \infty)$. Show that for each $t \in \mathbb{R}$ the function $f(x)\sin(tx)/x$ is integrable on $(0, \infty)$ and prove that the integral $\int_{(0,\infty)} f(x)\,x^{-1}\sin(tx)\,d\lambda(x)$ is continuous in $t$.

6. Prove that the derivative of the gamma function (5.7.8) is
\[ \Gamma'(x) = \int_0^\infty t^{x-1} e^{-t} \ln t\,dt, \quad x > 0. \]


(Use the fact, proved in the next section, that the improper Riemann integral and the Lebesgue integral of a nonnegative continuous function are equal.)

7. Let $f : [0, +\infty) \to \mathbb{R}$ be bounded and Lebesgue measurable and suppose that $\lim_{x \to +\infty} f(x) = r$. Show that
\[ \lim_k \int_{[0,a]} f(kx)\,d\lambda(x) = ar \quad\text{for every } a > 0. \]
Hint. Use Exercise 11.2.14.

8.S Let $f : \mathbb{R}^n \to \mathbb{R}$ be Lebesgue measurable and have countable range $\{a_1, a_2, \ldots\}$. Set $A_k = \{x \in \mathbb{R}^n : f(x) = a_k\}$. Prove that $f$ is integrable iff the series $\sum_{k=1}^{\infty} a_k \lambda(A_k)$ converges absolutely, in which case the value of the series is $\int f\,d\lambda$.

9. Let $p > 1$ and $f(x) := \lfloor x^{-1} \rfloor^{-p}$, $0 < x < 1$. Find $\int_{(0,1)} f\,d\lambda$.

10. Let $f_k$, $f$ be integrable and $E_k, E \in \mathcal{M}(\mathbb{R}^n)$ such that
\[ \lim_k \|f_k - f\|_1 = 0 \quad\text{and}\quad \lim_k \lambda(E_k \,\Delta\, E) = 0 \]
(see Exercise 10.5.10). Prove that
\[ \lim_k \int_{E_k} f_k\,d\lambda = \int_E f\,d\lambda. \]

11. Let $f : \mathbb{R}^n \to \mathbb{R}$ be integrable and $\varepsilon > 0$.
(a) Prove that the set $A = \{x : |f(x)| \ge \varepsilon\}$ has finite measure.
(b) Show that there exists $B \in \mathcal{M}$ with $\lambda(B) < +\infty$ such that
\[ \Big|\int f\,d\lambda - \int_B f\,d\lambda\Big| < \varepsilon. \]

12.S Let $\{f_k\}$ be a sequence of integrable functions on $\mathbb{R}^n$ such that $\sum_{k=1}^{\infty} \|f_k\|_1 < +\infty$. Prove that $\lim_k f_k(x) = 0$ a.e.

13. Let $f$ be Lebesgue integrable on $\mathbb{R}$ and $p > 0$. Prove that the series $\sum_{k=1}^{\infty} k^{-p} f(kx)$ converges absolutely a.e. on $\mathbb{R}$.

14. Let $T : C[a, b] \to C[a, b]$ be linear and continuous in the $L^1$ norm. If $f : [a, b] \times [c, d] \to \mathbb{R}$ is continuous, prove that
\[ T\int_c^d f(\cdot, x)\,dx = \int_c^d T f(\cdot, x)\,dx, \]
where the integrals may be taken to be Riemann.


15.S Prove the following extension of Fatou’s lemma: If $f_k$, $g$ are Lebesgue integrable on $\mathbb{R}^n$ and $f_k \ge g$ for all $k$, then
\[ \int \liminf_k f_k\,d\lambda \le \liminf_k \int f_k\,d\lambda. \]

16. Let $g$ be integrable on $\mathbb{R}^n$ and let $\{f_k\}$ be a sequence of Lebesgue measurable functions on $\mathbb{R}^n$ such that $|f_k| \le g$. Show that
\[ \int \liminf_k f_k \le \liminf_k \int f_k \le \limsup_k \int f_k \le \int \limsup_k f_k. \]

17. Let $f$, $f_k$ be nonnegative Lebesgue integrable functions on $\mathbb{R}^n$ such that $f_k \to f$. Prove that
\[ \int f_k\,d\lambda \to \int f\,d\lambda \quad\text{iff}\quad \|f_k - f\|_1 \to 0. \]
Hint. For the necessity, note that $(f_k - f)^- \le f$.

18. Let $f$, $f_k$ be integrable on $\mathbb{R}^n$ with $f_k \to f$. Prove that
\[ \|f_k - f\|_1 \to 0 \quad\text{iff}\quad \int |f_k|\,d\lambda \to \int |f|\,d\lambda. \]
Hint. For the sufficiency use Fatou’s lemma.

19.S Let $f$ be Lebesgue integrable on $\mathbb{R}$ such that $\int_{[a,b]} f\,d\lambda = 0$ for all intervals $[a, b]$. Prove that $f = 0$ a.e. Hint. Use 11.3.4, 10.4.4, and Exercise 11.2.8.

20. Let $f : \mathbb{R}^2 \to \mathbb{R}$ have the property that $f(x, y)$ is Lebesgue measurable in $y$ for each $x$ and continuous in $x$ for each $y$. Suppose there exists an integrable function $g : \mathbb{R} \to \mathbb{R}$ such that $|f(x, y)| \le g(y)$ for all $x$ and $y$. Prove that the function
\[ F(x) := \int f(x, y)\,d\lambda(y) \]
is continuous.

21. Let $f \ge 0$ be integrable on $[1, +\infty)$. Prove that $\sum_{k=1}^{\infty} f(x + k)$ is integrable on $[0, 1]$. Conclude that the series converges a.e. on $[1, +\infty)$.

22. Let $f$ be Lebesgue measurable on $I = [0, 1]$ and set $A_k = \{x \in I : |f(x)| \ge k\}$. Prove:
(a) $f$ is integrable on $I$ iff $\sum_{k=0}^{\infty} \lambda(A_k)$ converges.
(b) If $f$ is integrable on $I = [0, 1]$ then $\lim_k k\lambda(A_k) = 0$.

11.4 Connections with Riemann Integration

Throughout the section, $f$ denotes an arbitrary bounded real-valued function on a closed and bounded interval $[a, b]$.

In this section we show that $f$ is Riemann integrable if and only if its set of discontinuities has Lebesgue measure zero. The first step is to show that the upper and lower integrals of $f$ may be expressed as integrals of Borel measurable functions. By 5.2.1 there exists a sequence of partitions $\{P_k\}$ of $[a, b]$ such that $P_{k+1}$ is a refinement of $P_k$, $\|P_k\| \to 0$, and
\[ \underline{\int_a^b} f = \lim_k \underline{S}(f, P_k), \qquad \overline{\int_a^b} f = \lim_k \overline{S}(f, P_k). \]
Define
\[ h_k = \sum_j m_j 1_{[x_{j-1}, x_j]} \quad\text{and}\quad g_k = \sum_j M_j 1_{[x_{j-1}, x_j]}, \]
where
\[ m_j := \inf_{x_{j-1} \le x \le x_j} f(x), \qquad M_j := \sup_{x_{j-1} \le x \le x_j} f(x), \]
and the intervals $[x_{j-1}, x_j]$ are those generated by the partition $P_k$. Then $g_k$ and $h_k$ are Borel measurable simple functions and
\[ \underline{S}(f, P_k) = \int_{[a,b]} h_k\,d\lambda, \qquad \overline{S}(f, P_k) = \int_{[a,b]} g_k\,d\lambda. \]
Moreover, $h_1 \le h_2 \le \cdots \le f \le \cdots \le g_2 \le g_1$, hence $h(x) := \lim_k h_k(x)$ and $g(x) := \lim_k g_k(x)$ exist in $\mathbb{R}$ for each $x \in [a, b]$, $h \le f \le g$, and $h$ and $g$ are Borel measurable. If $M$ is a bound for $|f|$, then $|h_k|, |g_k| \le M$ a.e., hence by the dominated convergence theorem
\[ \underline{\int_a^b} f = \lim_k \underline{S}(f, P_k) = \lim_k \int_{[a,b]} h_k\,d\lambda = \int_{[a,b]} h\,d\lambda \tag{11.8} \]
and
\[ \overline{\int_a^b} f = \lim_k \overline{S}(f, P_k) = \lim_k \int_{[a,b]} g_k\,d\lambda = \int_{[a,b]} g\,d\lambda. \tag{11.9} \]

11.4.1 Lemma. $f \in \mathcal{R}_a^b$ iff $g = h$ a.e. In this case, $f$ is Lebesgue measurable and $\int_a^b f = \int_{[a,b]} f\,d\lambda$.

Proof. From (11.8) and (11.9), $f \in \mathcal{R}_a^b$ iff $\int_{[a,b]} (g - h)\,d\lambda = 0$, which, by 11.2.7, is equivalent to $g = h$ a.e. If this holds, then $h = f = g$ a.e., so $f$ is Lebesgue measurable and $\int_a^b f = \int_{[a,b]} f\,d\lambda$ by (11.8) and (11.9).

11.4.2 Lemma. Suppose that $x \in [a, b]$ is not a member of any of the partitions $P_k$. Then $f$ is continuous at $x$ iff $h(x) = g(x)$.

Proof. Suppose $f$ is continuous at $x$. Given $\varepsilon > 0$, choose $\delta > 0$ such that $y \in [a, b]$ and $|x - y| < \delta$ implies $|f(x) - f(y)| < \varepsilon$. Choose $N$ so that $\|P_k\| < \delta$ for all $k \ge N$ and fix $k \ge N$. Since $x$ is in some subinterval $(x_{j-1}, x_j)$ of $P_k$, $f(x) - \varepsilon < f(y) < f(x) + \varepsilon$ for all $y \in [x_{j-1}, x_j]$, hence
\[ f(x) - \varepsilon \le h_k(x) = m_j \le M_j = g_k(x) \le f(x) + \varepsilon. \]
Letting $k \to +\infty$ yields $f(x) - \varepsilon \le h(x) \le g(x) \le f(x) + \varepsilon$, and since $\varepsilon$ was arbitrary, $g(x) = h(x)$.

Conversely, let $g(x) = h(x)$. Given $\varepsilon > 0$, choose $k$ such that $|g_k(x) - g(x)| < \varepsilon$ and $|h_k(x) - h(x)| < \varepsilon$. Suppose that $x$ is in the open subinterval $(x_{i-1}, x_i)$ of $P_k$. Choose $\delta > 0$ so that $(x - \delta, x + \delta) \subseteq (x_{i-1}, x_i)$. Then for all $y \in (x - \delta, x + \delta)$,
\[ h(x) - \varepsilon \le h_k(x) \le f(y) \le g_k(x) \le g(x) + \varepsilon = h(x) + \varepsilon, \]
which implies that $|f(x) - f(y)| < 2\varepsilon$. Therefore, $f$ is continuous at $x$.

Here is the main result of the section.

11.4.3 Theorem. Let $f : [a, b] \to \mathbb{R}$ be bounded. Then $f \in \mathcal{R}_a^b$ iff the set $D$ of discontinuities of $f$ has Lebesgue measure zero. In this case, $f$ is Lebesgue measurable and
\[ \int_a^b f(x)\,dx = \int_{[a,b]} f\,d\lambda. \]

Proof. Let $A$ denote the union of the partitions $P_k$ and set $B = \{x : g(x) \ne h(x)\}$. By 11.4.2,
\[ B \cap A^c \subseteq D \subseteq A \cup B. \]
Since $A$ is countable, $\lambda(A) = 0$, hence $\lambda(B \cap A^c) = \lambda(A \cup B) = \lambda(B)$. It follows that $\lambda(B) = \lambda(D)$. Thus, by 11.4.1, $f \in \mathcal{R}_a^b$ iff $\lambda(D) = 0$.
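The step functions $h_k$ and $g_k$ used in this section can be made concrete. The sketch below is a hedged illustration, not part of the text: it assumes a nondecreasing $f$ on $[0,1]$, so that $m_j = f(x_{j-1})$ and $M_j = f(x_j)$, and it uses the dyadic partitions with $2^k$ subintervals as the refining sequence $\{P_k\}$. The test function `step_square` (with a single jump at $1/2$, so $\lambda(D) = 0$) and the helper `darboux_sums` are hypothetical names.

```python
from fractions import Fraction

def darboux_sums(f, n):
    """Lower and upper sums of a nondecreasing f on [0, 1].

    Uses the uniform partition with n subintervals. Because f is assumed
    nondecreasing, the infimum on [x_{j-1}, x_j] is f(x_{j-1}) and the
    supremum is f(x_j), so both sums are exact rational numbers.
    """
    width = Fraction(1, n)
    lower = sum(f(Fraction(j - 1, n)) * width for j in range(1, n + 1))
    upper = sum(f(Fraction(j, n)) * width for j in range(1, n + 1))
    return lower, upper

def step_square(x):
    """x^2 plus a unit jump at x = 1/2; its only discontinuity is at 1/2."""
    return x * x + (1 if x >= Fraction(1, 2) else 0)

if __name__ == "__main__":
    for k in range(1, 8):
        lower, upper = darboux_sums(step_square, 2 ** k)
        print(k, float(lower), float(upper), upper - lower)
```

The gap `upper - lower` equals $(f(1) - f(0))/2^k$ and shrinks to zero, so the lower and upper integrals coincide (here with value $5/6$), as Theorem 11.4.3 predicts for a function whose discontinuity set has measure zero.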


11.4.4 Example. Let $A := (0, 1) \setminus E$, where $E$ is the Cantor ternary set (10.3.4). Since $A$ is open, the function $f(x) = 1_A(x)\sin(\pi x)$ is continuous on $A$. Since $\lambda(E) = 0$, $f$ is both Riemann and Lebesgue integrable on $[0, 1]$ and
\[ \int_0^1 f(x)\,dx = \int_0^1 \sin(\pi x)\,dx = \frac{2}{\pi}. \]
♦

11.4.5 Remark. Theorem 11.4.3 readily extends to $n$-dimensional Riemann integrals; the statement and proof are essentially the same. Note that in this case, a Riemann integrable function $f$ may be discontinuous on $m$-dimensional hyperplanes, $m < n$, as these have Lebesgue measure zero (see 11.6.9). ♦

Here is the connection between improper integrals and Lebesgue integrals.

11.4.6 Corollary. Let $g$ be locally Riemann integrable on $[a, b)$ (where $b$ could be infinite). Then $g$ is Lebesgue measurable on $[a, b)$. Moreover:
(a) If $g \ge 0$, then $g$ is improperly integrable on $[a, b)$ iff $g$ is Lebesgue integrable on $[a, b)$, in which case
\[ \int_a^b g = \int_{[a,b)} g\,d\lambda. \tag{11.10} \]
(b) If $g$ is Lebesgue integrable on $[a, b)$, then $g$ is improperly integrable on $[a, b)$ and (11.10) holds.
(c) If $g$ is improperly integrable on $[a, b)$, then $g$ need not be Lebesgue integrable on $[a, b)$.

Proof. (a) Let $b_k \uparrow b$ and let $D$ denote the set of discontinuities of $g$ on $[a, b)$. Since $g$ is Riemann integrable on $[a, b_k]$, $\lambda([a, b_k] \cap D) = 0$. By the theorem, $1_{[a,b_k]}\,g$ is Lebesgue measurable for every $k$ and
\[ \int_a^{b_k} g = \int_{[a,b_k]} g\,d\lambda = \int 1_{[a,b_k]}\,g\,d\lambda. \]
Taking limits we see that $g$ is Lebesgue measurable and, by the monotone convergence theorem, (11.10) holds.
(b) If $g$ is Lebesgue integrable on $[a, b)$, then, by (a), $g^+$ and $g^-$ are improperly integrable on $[a, b)$, hence (b) holds.
(c) The function $g(x) = x^{-1}\sin x$ is improperly integrable but not absolutely improperly integrable on $[1, +\infty)$ (5.7.18). Since a Lebesgue integrable function is absolutely integrable, $g$ cannot be Lebesgue integrable on $[1, +\infty)$.
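Part (c) can be observed numerically. The sketch below is only a rough illustration: the midpoint-rule helper `midpoint`, the step size, and the cutoffs $T = 100$ and $T = 1000$ are choices made here, and the heuristic growth rate $(2/\pi)\ln T$ for $\int_1^T |\sin x|/x\,dx$ is a standard estimate (the mean of $|\sin x|$ is $2/\pi$) assumed for comparison, not a statement from the text.

```python
import math

def midpoint(func, a, b, h=0.005):
    """Composite midpoint rule on [a, b] with step close to h."""
    n = max(1, round((b - a) / h))
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def signed_integral(T):
    """Approximates the integral of sin(x)/x over [1, T]."""
    return midpoint(lambda x: math.sin(x) / x, 1.0, T)

def absolute_integral(T):
    """Approximates the integral of |sin(x)|/x over [1, T]."""
    return midpoint(lambda x: abs(math.sin(x)) / x, 1.0, T)

if __name__ == "__main__":
    for T in (100.0, 1000.0):
        print(T, signed_integral(T), absolute_integral(T))
```

The signed integrals barely change between the two cutoffs, while the integrals of the absolute value keep growing by roughly $(2/\pi)\ln 10$ per decade, in line with $g$ being improperly integrable but not Lebesgue integrable.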

11.5 Iterated Integrals

For the remainder of the text we also use the notation $dx$ for $d\lambda(x)$ and $\int_a^b f(x)\,dx$ for $\int_{[a,b]} f(x)\,d\lambda(x)$, etc.

In this section we state and prove a result that gives general conditions under which the Lebesgue integral of a function on $\mathbb{R}^n$ may be expressed as an iterated integral, a useful tool for evaluating integrals.

11.5.1 Fubini–Tonelli Theorem. Let $f$ be Borel measurable on $\mathbb{R}^n$ and let $p, q \in \mathbb{N}$ with $p + q = n$.
(a) If $f \ge 0$, then the functions
\[ \int_{\mathbb{R}^q} f(x, z)\,dz \quad\text{and}\quad \int_{\mathbb{R}^p} f(z, y)\,dz \]
are Borel measurable in $x \in \mathbb{R}^p$ and $y \in \mathbb{R}^q$, respectively, and
\[ \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} f(x, z)\,dz\,dx = \int_{\mathbb{R}^q}\int_{\mathbb{R}^p} f(z, y)\,dz\,dy. \tag{11.11} \]
(b) If either of the iterated integrals
\[ \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} |f(x, z)|\,dz\,dx \quad\text{or}\quad \int_{\mathbb{R}^q}\int_{\mathbb{R}^p} |f(z, y)|\,dz\,dy \]
is finite, then both are finite, $f$ is integrable, and (11.11) holds.

By induction we have

11.5.2 Corollary. Let $f$ be Borel measurable on $\mathbb{R}^n$ such that
\[ \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |f(x_1, \ldots, x_n)|\,dx_{i_1} \cdots dx_{i_n} < +\infty \]
for some permutation $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$. Then $f$ is integrable and
\[ \int_{\mathbb{R}^n} f\,d\lambda = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_{j_1} \cdots dx_{j_n} \]
for every permutation $(j_1, \ldots, j_n)$ of $(1, \ldots, n)$.

11.5.3 Example. We prove the Gaussian density formula
\[ \int_{-\infty}^{\infty} \varphi(t)\,dt = 1, \quad\text{where}\quad \varphi(t) := \frac{e^{-t^2/2}}{\sqrt{2\pi}}. \tag{11.12} \]


By 11.4.6, the integral may be interpreted either as a Lebesgue integral or as an improper Riemann integral. The function $\varphi$ is called the standard normal (or Gaussian) density. It plays an important role in probability and statistics. For example, $\sigma^{-1}\int_a^b \varphi\big((x - \mu)/\sigma\big)\,dx$ is the probability that randomly chosen data from a normally distributed population with mean $\mu$ and standard deviation $\sigma$ lies between $a$ and $b$, and $\sigma^{-1}\int_{-\infty}^{\infty} \varphi\big((x - \mu)/\sigma\big)\,x\,dx$ is the average of the data.

To verify (11.12) note that because the integrand is an even function, the left side is $2\int_0^\infty \varphi(t)\,dt$. By a change of variable,
\[ 2\int_0^\infty \varphi(t)\,dt = \frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-t^2/2}\,dt = \frac{2}{\sqrt{\pi}}\int_0^\infty e^{-t^2}\,dt. \]
Thus it suffices to show that
\[ \frac{2}{\sqrt{\pi}}\int_0^\infty e^{-t^2}\,dt = 1. \]
Let $I$ denote the integral on the left. Then
\begin{align*}
I^2 &= \int_0^\infty e^{-y^2}\int_0^\infty e^{-t^2}\,dt\,dy \\
&= \int_0^\infty e^{-y^2}\int_0^\infty y\,e^{-x^2y^2}\,dx\,dy, && \text{by the substitution } t = xy \\
&= \int_0^\infty\int_0^\infty y\,e^{-y^2(1+x^2)}\,dy\,dx, && \text{by 11.5.1} \\
&= \frac{1}{2}\int_0^\infty (1 + x^2)^{-1}\int_0^\infty e^{-u}\,du\,dx, && \text{by the substitution } u = y^2(1 + x^2) \\
&= \frac{1}{2}\arctan x\,\Big|_0^\infty, && \text{because } \int_0^\infty e^{-u}\,du = 1 \\
&= \frac{\pi}{4}.
\end{align*}
♦

11.5.4 Example. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be Borel measurable and integrable. By Exercise 10.5.19, the function $F(x, y) := f(x - y)g(y)$ is Borel measurable in $(x, y)$. By the Fubini–Tonelli theorem and translation invariance of the integral,
\[ \int_{\mathbb{R}^n \times \mathbb{R}^n} |F(x, y)|\,d\lambda(x, y) = \int_{\mathbb{R}^n} |g(y)| \int_{\mathbb{R}^n} |f(x - y)|\,dx\,dy = \|g\|_1 \|f\|_1 < +\infty. \]
Therefore, $F$ is integrable, hence the function
\[ (f * g)(x) := \int_{\mathbb{R}^n} f(x - y)g(y)\,dy, \]
called the convolution of $f$ and $g$, is finite a.e. and integrable on $\mathbb{R}^n$. Convolutions are useful in calculating the probability distribution of a sum of independent random variables. ♦
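The two identities established in Example 11.5.3 can be sanity-checked numerically. The sketch below is a hedged illustration only: the helper `midpoint`, the truncation of the infinite ranges at $\pm 10$, and the number of nodes are assumptions made here (the discarded tails are far below double precision).

```python
import math

def phi(t):
    """Standard normal density from (11.12)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def midpoint(func, a, b, n):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def total_mass():
    """Approximates the integral of phi over the real line."""
    return midpoint(phi, -10.0, 10.0, 4000)

def half_line_integral():
    """Approximates I, the integral of exp(-t^2) over [0, infinity)."""
    return midpoint(lambda t: math.exp(-t * t), 0.0, 10.0, 4000)

if __name__ == "__main__":
    print(total_mass())                            # compare with 1
    print(half_line_integral() ** 2, math.pi / 4)  # compare I^2 with pi/4
```

The printed values should agree to many digits, matching $I^2 = \pi/4$ and hence $(2/\sqrt{\pi})\,I = 1$.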


11.5.5 Example. (Volume of a simplex). Let $a > 0$ and let $e_j$, $1 \le j \le n$, be the standard basis in $\mathbb{R}^n$. Define the $n$-dimensional simplex in $\mathbb{R}^n$ by
\[ S(a, n) = \Big\{x : \sum_{j=1}^{n} x_j \le a \text{ and } x_j \ge 0\Big\}. \]

[FIGURE 11.2: Three-dimensional simplex, with vertices at distance $a$ along the $x_1$, $x_2$, and $x_3$ axes.]

We use the Fubini–Tonelli theorem and induction to show that
\[ \lambda_n\big(S(a, n)\big) = \frac{a^n}{n!}. \]
The formula holds for $n = 1$ since $S(a, 1) = [0, a]$. Assume the formula holds for $n - 1$ and all $a > 0$. Then
\begin{align*}
\lambda_n\big(S(a, n)\big) &= \int 1_{S(a,n)}(x_1, \ldots, x_n)\,d(x_1, \ldots, x_n) \\
&= \int_{[0,a]} \int 1_{S(a - x_n,\, n-1)}(x_1, \ldots, x_{n-1})\,d(x_1, \ldots, x_{n-1})\,dx_n \\
&= \frac{1}{(n-1)!}\int_{[0,a]} (a - x_n)^{n-1}\,dx_n.
\end{align*}
The last integral evaluates to $a^n/n$, completing the proof. ♦
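The formula $\lambda_n(S(a, n)) = a^n/n!$ can be compared against a crude deterministic estimate. The sketch below is an illustration under stated assumptions: it covers the cube $[0, a]^n$ by a uniform grid and counts the cells whose centre lies in the simplex, which is a choice made here and not the method of the text; the function name `simplex_volume_grid` is hypothetical.

```python
from itertools import product
from math import factorial

def simplex_volume_grid(a, n, cells_per_axis):
    """Estimate the volume of S(a, n) by counting grid cells.

    The cube [0, a]^n is split into cells_per_axis**n congruent cells;
    a cell is counted when its centre satisfies x_1 + ... + x_n <= a.
    The error shrinks as the grid is refined.
    """
    h = a / cells_per_axis
    centres = [(i + 0.5) * h for i in range(cells_per_axis)]
    inside = sum(1 for point in product(centres, repeat=n) if sum(point) <= a)
    return inside * h ** n

if __name__ == "__main__":
    print(simplex_volume_grid(1.0, 2, 100), 1 / factorial(2))
    print(simplex_volume_grid(1.0, 3, 40), 1 / factorial(3))
```

The cost grows like `cells_per_axis ** n`, so this check is only practical in low dimensions, which is one reason the inductive argument via the Fubini–Tonelli theorem is preferable.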

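11.5.6 Example. Let $C_r^n(x)$ denote the closed ball in $\mathbb{R}^n$ with center $x$ and radius $r$. We show that $\lambda\big(C_r^n(x)\big) = r^n\alpha_n$, where
\[ \alpha_n = \begin{cases} \dfrac{(2\pi)^{n/2}}{n(n-2)\cdots 4 \cdot 2} & \text{if } n \text{ is even,} \\[2ex] \dfrac{2(2\pi)^{(n-1)/2}}{n(n-2)\cdots 3 \cdot 1} & \text{if } n \text{ is odd.} \end{cases} \]
For ease of notation we write $C_r^n$ for $C_r^n(0)$ and denote by $1_r$ the indicator function of $C_r^n$. By the translation and dilation properties of Lebesgue measure, $\lambda\big(C_r^n(x)\big) = r^n\lambda\big(C_1^n\big)$, hence it suffices to establish the formula for $r = 1$ and $x = 0$.

If $n = 1$, then $C_1^n = [-1, 1]$ and $\alpha_n = 2$, so the formula holds in this case. By a simple integration, $\lambda\big(C_1^2\big) = \pi$, hence the formula holds for $n = 2$ as well. Now assume that $n > 2$. From
\begin{align*}
C_1^n &= \{(x_1, \ldots, x_n) : x_1^2 + \cdots + x_n^2 \le 1\} \\
&= \{(x_1, \ldots, x_n) : x_3^2 + \cdots + x_n^2 \le 1 - x_1^2 - x_2^2,\ (x_1, x_2) \in C_1^2\}
\end{align*}
we have
\[ 1_1(x_1, \ldots, x_n) = 1_{\sqrt{1 - x_1^2 - x_2^2}}(x_3, \ldots, x_n)\,1_1(x_1, x_2), \]
hence, by the Fubini–Tonelli theorem,
\[ \lambda\big(C_1^n\big) = \int_{\mathbb{R}^2} 1_1(x_1, x_2) \int_{\mathbb{R}^{n-2}} 1_{\sqrt{1 - x_1^2 - x_2^2}}(x_3, \ldots, x_n)\,d\lambda(x_3, \ldots, x_n)\,dx_1\,dx_2. \]
The inner integral is
\[ \lambda\Big(C^{n-2}_{\sqrt{1 - x_1^2 - x_2^2}}\Big) = (1 - x_1^2 - x_2^2)^{(n-2)/2}\,\lambda\big(C_1^{n-2}\big), \]
hence, changing to polar coordinates,²
\begin{align*}
\lambda\big(C_1^n\big) &= \lambda\big(C_1^{n-2}\big) \int_{x_1^2 + x_2^2 \le 1} (1 - x_1^2 - x_2^2)^{(n-2)/2}\,dx_1\,dx_2 \\
&= \lambda\big(C_1^{n-2}\big) \int_0^{2\pi}\int_0^1 (1 - r^2)^{(n-2)/2}\,r\,dr\,d\theta \\
&= \frac{2\pi}{n}\,\lambda\big(C_1^{n-2}\big).
\end{align*}
Iterating, we obtain
\[ \lambda\big(C_1^n\big) = \frac{2\pi}{n}\,\lambda\big(C_1^{n-2}\big) = \frac{(2\pi)^2}{n(n-2)}\,\lambda\big(C_1^{n-4}\big) = \cdots = \frac{(2\pi)^{m-1}}{n(n-2)\cdots(n - 2(m-2))}\,\lambda\big(C_1^{n-2(m-1)}\big). \]
Thus
\[ \lambda\big(C_1^{2m}\big) = \frac{(2\pi)^{m-1}}{2m(2m-2)\cdots 4}\,\lambda\big(C_1^2\big) = \frac{(2\pi)^m}{2m(2m-2)\cdots 2} \]
and
\[ \lambda\big(C_1^{2m-1}\big) = \frac{(2\pi)^{m-1}}{(2m-1)(2m-3)\cdots 3}\,\lambda\big(C_1^1\big) = \frac{2(2\pi)^{m-1}}{(2m-1)(2m-3)\cdots 3}. \]
♦

² The general change of variables theorem for Lebesgue integrals is proved in the next section.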
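The recursion $\lambda(C_1^n) = (2\pi/n)\,\lambda(C_1^{n-2})$ and the closed form for $\alpha_n$ in Example 11.5.6 are easy to cross-check. In the sketch below the function names are hypothetical, and the comparison value $\pi^{n/2}/\Gamma(n/2 + 1)$ is the standard gamma-function expression for the volume of the unit ball, brought in here as an outside fact rather than taken from the text.

```python
import math

def alpha_recursive(n):
    """alpha_n from alpha_n = (2*pi/n) * alpha_{n-2}, alpha_1 = 2, alpha_2 = pi."""
    if n == 1:
        return 2.0
    if n == 2:
        return math.pi
    return 2 * math.pi / n * alpha_recursive(n - 2)

def alpha_closed_form(n):
    """alpha_n from the even/odd product formula of Example 11.5.6."""
    denominator, m = 1, n
    while m > 0:            # n(n-2)(n-4)... down to 2 or 1
        denominator *= m
        m -= 2
    if n % 2 == 0:
        return (2 * math.pi) ** (n // 2) / denominator
    return 2 * (2 * math.pi) ** ((n - 1) // 2) / denominator

def alpha_gamma(n):
    """Standard expression pi^(n/2) / Gamma(n/2 + 1) for comparison."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

if __name__ == "__main__":
    for n in range(1, 9):
        print(n, alpha_recursive(n), alpha_closed_form(n), alpha_gamma(n))
```

For instance, $n = 3$ gives $4\pi/3$ and $n = 4$ gives $\pi^2/2$ in all three columns.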


Proof of the Fubini–Tonelli theorem. We show first that part (b) of the theorem is a consequence of part (a). Indeed, if one of the iterated integrals in (b) is finite, then, by part (a) applied to $|f|$, so is the other and $f$ is integrable. Applying part (a) to $f^{\pm}$, we see that (11.11) holds.

Next, observe that if part (a) of the theorem holds for indicator functions then, by linearity of the integrals, it holds for nonnegative simple functions. By 10.5.8 and the monotone convergence theorem, (a) holds for all nonnegative Borel measurable functions. It remains then to prove (a) for indicator functions. The proof consists of several lemmas, the first of which is a special case of a theorem due to E.B. Dynkin.

11.5.7 Lemma. Let $\mathcal{F}$ denote the intersection of all collections $\mathcal{G}$ of subsets of $\mathbb{R}^n$ with the following properties:
(a) If $A, B \in \mathcal{G}$ and $A \subseteq B$, then $B \setminus A \in \mathcal{G}$.
(b) If $A_k \in \mathcal{G}$ and $A_k \uparrow A$, then $A \in \mathcal{G}$.
(c) $\mathcal{G}$ contains every bounded interval.
Then $\mathcal{F}$ is a $\sigma$-field containing $\mathcal{B}(\mathbb{R}^n)$.

Proof. It is easy to see that $\mathcal{F}$ itself has properties (a)–(c). Moreover, from (b) and (c), $\mathcal{F}$ contains every interval. In particular, $\mathbb{R}^n \in \mathcal{F}$.

We show first that $\mathcal{F}$ is closed under finite intersections. To see this, fix $A \in \mathcal{F}$ and define
\[ \mathcal{F}_A := \{B \in \mathcal{F} : A \cap B \in \mathcal{F}\}. \]
One easily checks that $\mathcal{F}_A$ has properties (a) and (b). Furthermore, if $A$ is an interval, then $\mathcal{F}_A$ has property (c), so by minimality $\mathcal{F} \subseteq \mathcal{F}_A$. This shows that if $B \in \mathcal{F}$, then $A \cap B \in \mathcal{F}$ for all intervals $A$; in other words, $\mathcal{F}_B$ contains all intervals. Thus $\mathcal{F}_B$ has properties (a)–(c). By minimality, $\mathcal{F} \subseteq \mathcal{F}_B$, that is, $A, B \in \mathcal{F} \Rightarrow A \cap B \in \mathcal{F}$. By induction, $\mathcal{F}$ is closed under finite intersections.

Now observe that property (a), together with the fact that $\mathbb{R}^n \in \mathcal{F}$, implies that $\mathcal{F}$ is closed under complements. Thus if $\{E_k\}$ is a sequence in $\mathcal{F}$, then, by the result of the preceding paragraph,
\[ A_k := \bigcup_{j=1}^{k} E_j = \Big(\bigcap_{j=1}^{k} E_j^c\Big)^c \in \mathcal{F}. \]
By (b), $\bigcup_{k=1}^{\infty} E_k = \bigcup_{k=1}^{\infty} A_k \in \mathcal{F}$. This shows that $\mathcal{F}$ is a $\sigma$-field. Since $\mathcal{F}$ contains all intervals, it must contain $\mathcal{B}(\mathbb{R}^n)$.

11.5.8 Lemma. Let $p, q \in \mathbb{N}$ with $p + q = n$. If $A \in \mathcal{B}(\mathbb{R}^p)$ and $B \in \mathcal{B}(\mathbb{R}^q)$, then $A \times B \in \mathcal{B}(\mathbb{R}^n)$ and
\[ \lambda(A \times B) = \lambda(A)\lambda(B). \tag{11.13} \]


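Proof. For fixed bounded intervals $I \subseteq \mathbb{R}^p$ and $J \subseteq \mathbb{R}^q$, define
\[ \mathcal{G}_{I,J} = \big\{B \in \mathcal{B}(\mathbb{R}^q) : I \times (B \cap J) \in \mathcal{B}(\mathbb{R}^n) \ \text{and}\ \lambda\big(I \times (B \cap J)\big) = \lambda(I)\lambda(B \cap J)\big\}. \]
We show that $\mathcal{G}_{I,J}$ has properties (a)–(c) of 11.5.7. Clearly, (c) holds. If $B \in \mathcal{G}_{I,J}$, then
\[ I \times (B^c \cap J) = (I \times J) \setminus \big(I \times (B \cap J)\big) \in \mathcal{B}(\mathbb{R}^n) \]
and
\[ \lambda\big(I \times (B^c \cap J)\big) = \lambda(I \times J) - \lambda\big(I \times (B \cap J)\big) = \lambda(I)\big[\lambda(J) - \lambda(B \cap J)\big] = \lambda(I)\lambda(J \cap B^c), \]
hence $B^c \in \mathcal{G}_{I,J}$. Therefore, $\mathcal{G}_{I,J}$ is closed under complements. Now let $B_k \in \mathcal{G}_{I,J}$ and $B_k \uparrow B$. Then
\[ I \times (J \cap B) = \bigcup_{k=1}^{\infty} I \times (J \cap B_k) \in \mathcal{B}(\mathbb{R}^n) \]
and, by 10.1.6,
\[ \lambda\big(I \times (J \cap B)\big) = \lim_k \lambda\big(I \times (J \cap B_k)\big) = \lambda(I)\lim_k \lambda(J \cap B_k) = \lambda(I)\lambda(J \cap B), \]
which shows that $B \in \mathcal{G}_{I,J}$. Therefore, $\mathcal{G}_{I,J}$ has properties (a)–(c) of 11.5.7, so $\mathcal{B}(\mathbb{R}^q) = \mathcal{G}_{I,J}$.

We have shown that for all bounded intervals $I \subseteq \mathbb{R}^p$, $J \subseteq \mathbb{R}^q$ and all $B \in \mathcal{B}(\mathbb{R}^q)$,
\[ I \times (B \cap J) \in \mathcal{B}(\mathbb{R}^n) \quad\text{and}\quad \lambda\big(I \times (B \cap J)\big) = \lambda(I)\lambda(B \cap J). \]
Taking a sequence of bounded intervals $J_k \uparrow \mathbb{R}^q$, we see that
\[ I \times B \in \mathcal{B}(\mathbb{R}^n) \quad\text{and}\quad \lambda(I \times B) = \lambda(I)\lambda(B). \tag{11.14} \]
Now fix $B \in \mathcal{B}(\mathbb{R}^q)$ and let $I \subseteq \mathbb{R}^p$ be a bounded interval. Define
\[ \mathcal{H}_{B,I} = \big\{A \in \mathcal{B}(\mathbb{R}^p) : (A \cap I) \times B \in \mathcal{B}(\mathbb{R}^n) \ \text{and}\ \lambda\big((A \cap I) \times B\big) = \lambda(A \cap I)\lambda(B)\big\}. \]
By (11.14), $\mathcal{H}_{B,I}$ contains all intervals. Arguing as above, we see that $\mathcal{H}_{B,I} = \mathcal{B}(\mathbb{R}^p)$. Thus for all $A \in \mathcal{B}(\mathbb{R}^p)$, $B \in \mathcal{B}(\mathbb{R}^q)$, and all bounded intervals $I \subseteq \mathbb{R}^p$,
\[ (A \cap I) \times B \in \mathcal{B}(\mathbb{R}^n) \quad\text{and}\quad \lambda\big((A \cap I) \times B\big) = \lambda(A \cap I)\lambda(B). \]
Taking a sequence of bounded intervals $I_k \uparrow \mathbb{R}^p$ in the last equation yields (11.13).

The following lemma asserts that part (a) of the Fubini–Tonelli theorem holds for indicator functions of Borel sets and hence completes the proof of the theorem.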


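11.5.9 Lemma. Let $p, q \in \mathbb{N}$ with $p + q = n$ and let $C \in \mathcal{B}(\mathbb{R}^n)$. Then
\[ \int_{\mathbb{R}^q} 1_C(x, z)\,dz \quad\text{and}\quad \int_{\mathbb{R}^p} 1_C(z, y)\,dz \]
are Borel measurable functions of $x \in \mathbb{R}^p$ and $y \in \mathbb{R}^q$, respectively, and
\[ \lambda(C) = \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_C(x, z)\,dz\,dx = \int_{\mathbb{R}^q}\int_{\mathbb{R}^p} 1_C(z, y)\,dz\,dy. \]

Proof. Let $\mathcal{G}$ denote the collection of all $C \in \mathcal{B}(\mathbb{R}^n)$ for which the assertions of the lemma hold. We show that $\mathcal{G} = \mathcal{B}(\mathbb{R}^n)$. The first step is to show that $\mathcal{G}$ has properties (b) and (c) of 11.5.7.

For property (b), let $C_k \in \mathcal{G}$ and $C_k \uparrow C$. Then $1_{C_k}(x, z) \uparrow 1_C(x, z)$, hence, by the monotone convergence theorem,
\[ \int_{\mathbb{R}^q} 1_{C_k}(x, z)\,dz \uparrow \int_{\mathbb{R}^q} 1_C(x, z)\,dz, \quad x \in \mathbb{R}^p. \]
Thus $\int_{\mathbb{R}^q} 1_C(x, z)\,dz$ is Borel measurable in $x$. Applying the monotone convergence theorem again, we see that
\[ \lambda(C) = \lim_k \lambda(C_k) = \lim_k \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{C_k}(x, z)\,dz\,dx = \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_C(x, z)\,dz\,dx, \]
and similarly for the other iterated integral. Therefore, $\mathcal{G}$ has property (b).

For property (c), let $A \in \mathcal{B}(\mathbb{R}^p)$, $B \in \mathcal{B}(\mathbb{R}^q)$, and $C = A \times B$. Then
\[ \int_{\mathbb{R}^q} 1_C(x, z)\,dz = \int_{\mathbb{R}^q} 1_A(x)1_B(z)\,dz = 1_A(x)\lambda(B), \]
which is Borel measurable in $x$ and, together with 11.5.8, implies that
\[ \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_C(x, z)\,dz\,dx = \lambda(A)\lambda(B) = \lambda(C). \]
Similar assertions hold for the other iterated integral. Thus, $\mathcal{G}$ contains Cartesian products of Borel sets and, in particular, all intervals.

Now let $I$ be a bounded interval in $\mathbb{R}^n$ and let $\mathcal{G}_I = \{B \in \mathcal{B}(\mathbb{R}^n) : B \cap I \in \mathcal{G}\}$. Since $\mathcal{G}$ has properties (b) and (c) of 11.5.7, so does $\mathcal{G}_I$. We claim that $\mathcal{G}_I$ also has property (a). To see this, let $C, D \in \mathcal{G}_I$ with $C \subseteq D$ and let $E = D \setminus C$. Since $1_{E \cap I} = 1_{D \cap I} - 1_{C \cap I}$,
\[ \int_{\mathbb{R}^q} 1_{E \cap I}(x, z)\,dz = \int_{\mathbb{R}^q} 1_{D \cap I}(x, z)\,dz - \int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz, \]
which, because $C \cap I$ and $D \cap I \in \mathcal{G}$, is Borel measurable in $x$ and implies that
\begin{align*}
\int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{E \cap I}(x, z)\,dz\,dx &= \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{D \cap I}(x, z)\,dz\,dx - \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz\,dx \\
&= \lambda(D \cap I) - \lambda(C \cap I) = \lambda(E \cap I).
\end{align*}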


Here we have used the fact that, because $I$ is bounded, the calculations take place in $\mathbb{R}$, hence subtraction is legitimate. The other iterated integral is treated similarly. Therefore $E \in \mathcal{G}_I$, as required.

Since $\mathcal{G}_I$ has properties (a)–(c) of 11.5.7, $\mathcal{G}_I$ contains all Borel sets. This means that for any $C \in \mathcal{B}(\mathbb{R}^n)$ and bounded interval $I \subseteq \mathbb{R}^n$, the functions
\[ \int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz \quad\text{and}\quad \int_{\mathbb{R}^p} 1_{C \cap I}(z, y)\,dz \]
are Borel measurable in $x$ and $y$, respectively, and
\[ \int_{\mathbb{R}^p}\int_{\mathbb{R}^q} 1_{C \cap I}(x, z)\,dz\,dx = \lambda(C \cap I) = \int_{\mathbb{R}^q}\int_{\mathbb{R}^p} 1_{C \cap I}(z, y)\,dz\,dy. \]
Taking an increasing sequence of bounded intervals $I$ tending to $\mathbb{R}^n$ and using the monotone convergence theorem shows that $C \in \mathcal{G}$. Therefore, $\mathcal{G} = \mathcal{B}(\mathbb{R}^n)$, as required.

Exercises

1.S Prove that $\lambda_n\{(x_1, \ldots, x_n) : x_j \in \mathbb{Q} \text{ for some } j\} = 0$.

2. Evaluate $\int_{[0,+\infty)^n} f$, where
(a)S $f(x) = x_1 \cdots x_n\,e^{-\|x\|^2}$.
(b) $f(x) = x_1 \cdots x_n\,(1 + \|x\|^2)^{-n-1}$.

3. (Cavalieri’s principle). For $E \in \mathcal{M}(\mathbb{R}^n)$ and $t \in \mathbb{R}$, define
\[ E_t := \{x = (x_1, \ldots, x_{n-1}) \in \mathbb{R}^{n-1} : (x, t) \in E\}. \]
Suppose that $E_t \in \mathcal{M}(\mathbb{R}^{n-1})$ for all $t \in [a, b]$. Prove that
\[ \lambda_n\big[E \cap \big(\mathbb{R}^{n-1} \times [a, b]\big)\big] = \int_a^b \lambda_{n-1}(E_t)\,dt. \]
Thus the “volume” of the portion of $E$ between the hyperplanes $x_n = a$ and $x_n = b$ is the integral from $a$ to $b$ of the “cross-sectional areas” $\lambda_{n-1}(E_t)$.

4. Let $f$ and $g$ be Riemann integrable on $[0, 1]$. Prove that
\[ \int_0^1\int_0^x g(x - y)f(y)\,dy\,dx = \int_0^1\int_0^{1-y} g(x)f(y)\,dx\,dy = \int_0^1\int_0^{1-x} g(x)f(y)\,dy\,dx. \]

5.S Evaluate
\[ \int_{0 \le x \le x_1 \le \cdots \le x_m \le 1} x\,d\lambda(x, x_1, \ldots, x_m). \]


6. Show that
\[ \int_0^1\int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dy\,dx = -\int_0^1\int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2}\,dx\,dy = \frac{\pi}{4}. \]
Why does this not contradict the Fubini–Tonelli theorem?

7.S Let $f$ be integrable on $(0, 1)$, $p > 0$, and define
\[ g(x) = \int_{[x^{1/p},\,1)} t^{-p} f(t)\,dt, \quad 0 < x < 1. \]
Prove that $g$ is integrable on $(0, 1)$ and that
\[ \int_{(0,1)} g\,d\lambda = \int_{(0,1)} f\,d\lambda. \]

8. Let $f$ be continuous on $[-1, 1]$. Show that
(a) $\displaystyle \int_0^{2\pi}\int_0^1 f'(r\cos\theta)\,r\cos^2\theta\,dr\,d\theta = \int_0^{2\pi} f(\cos\theta)\cos\theta\,d\theta - \int_0^{2\pi}\int_0^{\cos\theta} f(x)\,dx\,d\theta.$
(b) $\displaystyle \int_0^{2\pi}\int_0^1 f'(r\cos\theta)\,r\sin^2\theta\,dr\,d\theta = \int_0^{2\pi}\int_0^{\cos\theta} f(x)\,dx\,d\theta.$
(c) $\displaystyle \int_0^{2\pi}\int_0^1 f'(r\cos\theta)\,r\,dr\,d\theta = \int_0^{2\pi} f(\cos\theta)\cos\theta\,d\theta.$

9. Let $a, b > 0$. Use the Fubini–Tonelli theorem, the dominated convergence theorem, and the identity $1/x = \int_0^\infty e^{-xt}\,dt$, $x > 0$, to prove that
(a)S $\displaystyle \int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$
(b) $\displaystyle \int_0^\infty \frac{e^{-ax} - e^{-bx}}{x}\,dx = \ln b - \ln a.$

10. Show that $\displaystyle \varphi * \varphi(x) = \frac{1}{\sqrt{2}}\,\varphi\Big(\frac{x}{\sqrt{2}}\Big)$.

11. Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be Borel measurable and integrable. Prove:
(a)S $f * g = g * f$.
(b) If $f$ and $g$ are continuous, then
\[ \frac{d}{dz}\int_{A_z} f(x)g(y)\,dx\,dy = f * g(z), \quad\text{where } A_z = \{(x, y) : x + y \le z\}. \]

12. Let $f : [0, 1] \to (0, +\infty]$ be Lebesgue measurable. Use the Fubini–Tonelli theorem to prove that
\[ \int_{[0,1]} f\,d\lambda \int_{[0,1]} 1/f\,d\lambda \ge 1. \]
(A simpler but less interesting proof uses the Cauchy–Schwarz inequality.)


13. Let $f$ and $g$ be positive Lebesgue measurable functions on $[0, 1]$ such that $fg \ge 1$. Use the preceding exercise to prove that
\[ \int_{[0,1]} f\,d\lambda \int_{[0,1]} g\,d\lambda \ge 1. \]
(The Cauchy–Schwarz inequality may be used here as well.)

14.S Let $f$ and $g$ be Lebesgue integrable on $[a, b]$ and for $x \in [a, b]$ let
\[ F(x) = F(a) + \int_{[a,x]} f(t)\,d\lambda(t) \quad\text{and}\quad G(x) = G(a) + \int_{[a,x]} g(t)\,d\lambda(t), \]
where $F(a)$ and $G(a)$ are arbitrary. Prove that
\[ \int_{[a,b]} F(x)g(x)\,d\lambda(x) + \int_{[a,b]} G(x)f(x)\,d\lambda(x) = F(b)G(b) - F(a)G(a). \]

15. (a) Verify that the function
\[ \kappa(t, x) = \frac{1}{2\sqrt{\pi t}}\,e^{-x^2/4t} = \frac{1}{\sqrt{2t}}\,\varphi\Big(\frac{x}{\sqrt{2t}}\Big) \]
is a solution of the heat equation
\[ w_t(t, x) = w_{xx}(t, x), \quad x \in \mathbb{R},\ t > 0. \]
(b) Let $w_0(x)$ be integrable on $\mathbb{R}$ and define
\[ w(t, x) = \int_{-\infty}^{\infty} w_0(y)\,\kappa(t, x - y)\,dy, \]
the convolution of $\kappa$ with $w_0$. Show that $w(t, x)$ satisfies the heat equation.
(c) Verify that
\[ w(t, x) = \int_{-\infty}^{\infty} w_0\big(x + z\sqrt{2t}\big)\,\varphi(z)\,dz. \]
(d) Use (c) and the dominated convergence theorem to show that if $w_0$ is continuous and satisfies $|w_0(x)| \le ae^{b|x|}$ for some positive constants $a$, $b$ and for all $x$, then $\lim_{t \to 0^+} w(t, x) = w_0(x)$. Conclude that the solution $w(t, x)$ may be continuously extended to $[0, +\infty) \times \mathbb{R}$ and consequently satisfies the boundary condition $w(0, x) = w_0(x)$.

16. For a Borel measurable function $f : \mathbb{R} \to [0, +\infty)$, define
\[ A := \{(x, y) : 0 \le y \le f(x)\} \quad\text{and}\quad A_y := \{x : f(x) > y\}, \quad y \in \mathbb{R}. \]
Prove:
(a) $A \in \mathcal{B}(\mathbb{R}^2)$.
(b) The function $y \mapsto \lambda(A_y)$ is Borel measurable and
\[ \int f(x)\,d\lambda(x) = \int_{(0,+\infty)} \lambda(A_y)\,d\lambda(y) = \lambda(A). \]
(c) Part (b) holds if $A$ and $A_y$ are replaced, respectively, by
\[ B = \{(x, y) : 0 \le y < f(x)\} \quad\text{and}\quad B_y = \{x : f(x) \ge y\}. \]
(d) $\lambda\{(x, y) : f(x) = y\} = 0$. (The graph of a Borel function has measure zero.)

11.6 Change of Variables

In Chapter 5 we proved that if $\varphi : [a, b] \to \mathbb{R}$ is continuously differentiable with everywhere nonzero derivative and if $f$ is Riemann integrable on $[c, d] := \varphi([a, b])$, then
\[ \int_c^d f(y)\,dy = \int_a^b f(\varphi(x))\,|\varphi'(x)|\,dx. \]
In this section we prove the following $n$-dimensional version of this result.

11.6.1 Change of Variables Theorem. Let $U$ and $V$ be open subsets of $\mathbb{R}^n$ and let $\varphi : U \to V$ be $C^1$ on $U$ with $C^1$ inverse $\varphi^{-1} : V \to U$. If $f$ is Lebesgue measurable on $V$ and either $f \ge 0$ or $f$ is integrable, then
\[ \int_V f(y)\,dy = \int_U (f \circ \varphi)(x)\,|J_\varphi(x)|\,dx, \tag{11.15} \]
where $J_\varphi$ is the Jacobian of $\varphi$ on $U$.

11.6.2 Example. Spherical coordinates $(r, \theta_1, \theta_2, \ldots, \theta_{n-1})$ in $\mathbb{R}^n$ are defined by the transformation formulas
\begin{align*}
x_1 &= r\cos\theta_1 \\
x_2 &= r\sin\theta_1\cos\theta_2 \\
x_3 &= r\sin\theta_1\sin\theta_2\cos\theta_3 \\
&\ \ \vdots \\
x_{n-1} &= r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1} \\
x_n &= r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1},
\end{align*}


where r > 0,

0 < θj < π, j = 1, . . . , n − 2, and 0 < θn−1 < 2π. Pn Note that sin θj > 0 for j ≤ n − 2 and j=1 x2j = r2 . Let U := (0, +∞) × (0, π)n−2 × (0, 2π) and V := Rn \ Rn−2 × [0, +∞) × {0}

and define ϕ on U by ϕ r, θ1 , , . . . , θn−1 = (x1 , . . . , xn ), where the xj are as above. Clearly U and V are open and ϕ is C ∞ on U . We claim that ϕ maps U onto V and has a C ∞ inverse on U . The inclusion ϕ(U ) ⊆ V is established as follows: If (r, θ1 , . . . , θn−1 ) ∈ U and (x1 , . . . , xn ) = ϕ(r, θ1 , . . . , θn−1 ) 6∈ V , then xn−1 ≥ 0 and xn = 0. But the latter implies that θn−1 = π, which gives the contradiction xn−1 < 0. For the reverse inclusion, we show that for each (x1 , . . . , xn ) ∈ V there exists a unique solution (r, θ1 , θ2 , . . . , θn−1 ) to the above system. Clearly, r and θ1 have the unique solutions X 1/2 n 2 r= xj and θ1 = arccos(x1 /r). j=1

In particular, the system has a unique solution if $n = 2$. Now set $y_j = x_j/(r\sin\theta_1)$, $2 \le j \le n$. By induction, we may assume that the reduced system
$$
\begin{aligned}
y_2 &= \cos\theta_2 \\
y_3 &= \sin\theta_2\cos\theta_3 \\
&\;\;\vdots \\
y_{n-1} &= \sin\theta_2\cdots\sin\theta_{n-2}\cos\theta_{n-1} \\
y_n &= \sin\theta_2\cdots\sin\theta_{n-2}\sin\theta_{n-1}
\end{aligned}
$$
has a unique solution $(\theta_2, \ldots, \theta_{n-1})$. Then the original system has the unique solution $(r, \theta_1, \ldots, \theta_{n-1})$. Therefore, $\phi$ is one-to-one and $\phi(U) = V$. By standard properties of determinants and a reduction argument,
$$J_\phi(r, \theta_1, \theta_2, \ldots, \theta_{n-1}) = r^{n-1}\sin^{n-2}\theta_1\,\sin^{n-3}\theta_2\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}.$$
Since $J_\phi > 0$ on $U$, the inverse function theorem implies that $\phi$ has a global $C^\infty$ inverse on $V$. Hence, by the change of variables theorem, if $f$ is Lebesgue measurable on $\mathbb{R}^n$ and either $f \ge 0$ or $f$ is integrable, then
$$\int_V f\,d\lambda = \int_U (f\circ\phi)\,J_\phi\,d\lambda.$$
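The Jacobian formula above can be sanity-checked numerically. The following is a minimal sketch, not part of the text: it implements the spherical map of this example and compares a central-difference Jacobian determinant with $r^{n-1}\sin^{n-2}\theta_1\cdots\sin\theta_{n-2}$. The function names, the step size `h`, and the sample point are illustrative assumptions.

```python
import math

def spherical_to_cartesian(r, thetas):
    """Map (r, theta_1, ..., theta_{n-1}) to (x_1, ..., x_n) as in Example 11.6.2."""
    xs = []
    prod = r  # running product r sin(theta_1) ... sin(theta_{j-1})
    for theta in thetas:
        xs.append(prod * math.cos(theta))
        prod *= math.sin(theta)
    xs.append(prod)  # x_n = r sin(theta_1) ... sin(theta_{n-1})
    return xs

def jacobian_formula(r, thetas):
    """Closed form r^(n-1) sin^(n-2)(theta_1) ... sin(theta_{n-2})."""
    n = len(thetas) + 1
    value = r ** (n - 1)
    for j, theta in enumerate(thetas[:-1], start=1):
        value *= math.sin(theta) ** (n - 1 - j)
    return value

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for k in range(col, n):
                a[i][k] -= factor * a[col][k]
    return det

def numerical_jacobian(r, thetas, h=1e-6):
    """Central-difference approximation of the Jacobian determinant."""
    point = [r] + list(thetas)
    n = len(point)
    columns = []
    for i in range(n):
        plus, minus = point[:], point[:]
        plus[i] += h
        minus[i] -= h
        fp = spherical_to_cartesian(plus[0], plus[1:])
        fm = spherical_to_cartesian(minus[0], minus[1:])
        columns.append([(p - q) / (2.0 * h) for p, q in zip(fp, fm)])
    # columns[i][k] approximates d x_k / d u_i; transpose to rows indexed by k.
    matrix = [[columns[i][k] for i in range(n)] for k in range(n)]
    return determinant(matrix)

# Hypothetical sample point in U for n = 4.
sample = (2.0, [0.7, 1.1, 2.5])
print(numerical_jacobian(*sample), jacobian_formula(*sample))
```

The two printed values should agree up to finite-difference error; a check of this kind does not replace the reduction argument mentioned in the example.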

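Since $V$ differs from $\mathbb{R}^n$ by a set of measure zero, we may write the last equation as
$$
\begin{aligned}
&\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1\cdots dx_n \\
&\quad= \int_0^\infty\int_0^\pi\cdots\int_0^\pi\int_0^{2\pi} f\big(r\cos\theta_1,\ r\sin\theta_1\cos\theta_2,\ \ldots,\ r\sin\theta_1\cdots\sin\theta_{n-1}\big) \\
&\qquad\qquad \cdot\, r^{n-1}\sin^{n-2}\theta_1\,\sin^{n-3}\theta_2\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-1}\,d\theta_{n-2}\cdots d\theta_1\,dr.
\end{aligned}
\tag{11.16}
$$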

In particular, taking $f$ to be the indicator function of $C_1^n(0)$ and using 11.5.6, we see that the left side of (11.16) is $\lambda\big(C_1^n(0)\big) = \alpha_n$ and the right side is
$$
\begin{aligned}
&\int_0^1\int_0^\pi\cdots\int_0^\pi\int_0^{2\pi} r^{n-1}\sin^{n-2}\theta_1\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-1}\,d\theta_{n-2}\cdots d\theta_1\,dr \\
&\quad= \frac{2\pi}{n}\int_0^\pi\cdots\int_0^\pi \sin^{n-2}\theta_1\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-2}\cdots d\theta_1.
\end{aligned}
$$
In particular,
$$\int_0^\pi\cdots\int_0^\pi \sin^{n-2}\theta_1\,\sin^{n-3}\theta_2\cdots\sin^2\theta_{n-3}\,\sin\theta_{n-2}\,d\theta_{n-2}\cdots d\theta_1 = \frac{n\alpha_n}{2\pi}. \qquad ♦$$

Proof of the change of variables theorem. Before we begin the proof proper, we make some reductions. First, by considering $f^+$ and $f^-$, we need only prove the case $f \ge 0$. Second, since a Lebesgue measurable function is equal a.e. to a Borel measurable function, we may assume that $f$ is Borel measurable. Note that in this case $f\circ\phi$ is also Borel measurable. To prove (11.15) it then suffices to verify that
$$\int_V f\,d\lambda \le \int_U (f\circ\phi)\,|J_\phi|\,d\lambda \tag{11.17}$$

for all Borel measurable functions $f : V \to [0, +\infty]$. Indeed, if (11.17) holds for all $f$ and $\phi$, then, switching the roles of $U$ and $V$, it must also be the case that
$$\int_U g\,d\lambda \le \int_V (g\circ\phi^{-1})\,|J_{\phi^{-1}}|\,d\lambda$$
for all Borel measurable $g : U \to [0, +\infty]$. Taking $g = (f\circ\phi)|J_\phi|$ and recalling that $J_\phi J_{\phi^{-1}} = 1$, we obtain the reverse of inequality (11.17). Finally, by considering simple functions and using linearity, 10.5.8, and the monotone convergence theorem, it suffices to prove (11.17) for indicator functions $f = 1_B$, where $B \in \mathcal{B}(\mathbb{R}^n)$ and $B \subseteq V$. Equation (11.17) then reduces to
$$\lambda(B) \le \int_{\phi^{-1}(B)} |J_\phi|\,d\lambda, \qquad B \subseteq V,\ B \in \mathcal{B}(\mathbb{R}^n),$$
or, equivalently (taking $B = \phi(E)$),
$$\lambda\big(\phi(E)\big) \le \int_E |J_\phi|\,d\lambda, \qquad E \subseteq U,\ E \in \mathcal{B}(\mathbb{R}^n). \tag{11.18}$$

The proof of (11.18) is a sequence of lemmas, the first of which treats the case of a linear change of variable.

11.6.3 Lemma. If $T \in L(\mathbb{R}^n, \mathbb{R}^n)$ is nonsingular, then
$$\lambda\big(T(E)\big) = |\det T|\,\lambda(E), \qquad E \in \mathcal{B}(\mathbb{R}^n). \tag{11.19}$$

Proof. Since $T$ is nonsingular, $T(E) \in \mathcal{B}(\mathbb{R}^n)$, so the left side of (11.19) is defined. Furthermore, if (11.19) holds for $T_1$ and $T_2$, then it holds for $T_1T_2$:
$$\lambda\big(T_1T_2(E)\big) = |\det T_1|\,\lambda\big(T_2(E)\big) = |\det T_1|\,|\det T_2|\,\lambda(E) = |\det(T_1T_2)|\,\lambda(E).$$
Now observe that a nonsingular linear transformation $T$ may be expressed as a product of elementary linear transformations, that is, linear transformations whose matrices are obtained from the identity matrix by one of the following operations:

(a) Interchange of two rows.
(b) Multiplication of a row by a nonzero constant.
(c) Addition of one row to another.

This is simply the assertion that a matrix may be put into reduced row echelon form by a sequence of elementary row operations. (See Appendix B.) We claim that (11.19) holds for elementary linear transformations $T$ and bounded intervals $E = I_1 \times \cdots \times I_n$. In case (a), $\det T = -1$ and $T(E)$ is the interval obtained from $E$ by interchanging a pair of intervals $I_i$ and $I_j$, hence (11.19) holds in this case. In (b), $T(E)$ is the interval obtained from $E$ by multiplying one of the coordinate intervals by a nonzero constant $a$, hence $\lambda(T(E)) = |a|\lambda(E)$. Since $|\det T| = |a|$, (11.19) holds in this case as well. For case (c), assume for definiteness that the matrix of $T$ is the result of adding row two of the identity matrix to row one, so
$$T(x_1, x_2, x_3, \ldots, x_n) = (x_1 + x_2, x_2, x_3, \ldots, x_n).$$
Then $\det T = 1$ and
$$\lambda\big(T(E)\big) = \int 1_{T(E)}(x)\,dx = \int 1_E(x_1 - x_2, x_2, \ldots, x_n)\,dx.$$
By the Fubini–Tonelli theorem and translation invariance, the last integral evaluates to
$$
\begin{aligned}
&\int\!\!\int\cdots\int 1_{I_1}(x_1 - x_2)\,1_{I_2}(x_2)\cdots 1_{I_n}(x_n)\,dx_n\cdots dx_2\,dx_1 \\
&\quad= |I_n|\cdots|I_3|\int 1_{I_2}(x_2)\int 1_{I_1}(x_1 - x_2)\,dx_1\,dx_2 \\
&\quad= |I_n|\cdots|I_3|\,|I_2|\,|I_1| = \lambda(E).
\end{aligned}
$$

Therefore, (11.19) holds in case (c). It now follows that (11.19) holds for all nonsingular $T$ and all intervals $E$. To verify (11.19) for all Borel sets $E$, we use 11.5.7. For a fixed bounded interval $I$, let $\mathcal{G}_I$ denote the collection of all $E \in \mathcal{B}(\mathbb{R}^n)$ for which
$$\lambda\big(T(E\cap I)\big) = |\det T|\,\lambda(E\cap I). \tag{11.20}$$
By the first part of the proof, $\mathcal{G}_I$ contains all intervals. Let $A, B \in \mathcal{G}_I$ with $A \subseteq B$, and set $C = A\cap I$ and $D = B\cap I$. Then $(B\setminus A)\cap I = D\setminus C$ and
$$\lambda\big(T(D\setminus C)\big) = \lambda\big(T(D)\big) - \lambda\big(T(C)\big) = |\det T|\,\big(\lambda(D) - \lambda(C)\big) = |\det T|\,\lambda(D\setminus C),$$
hence $B\setminus A \in \mathcal{G}_I$. (The operation of subtraction is legitimate because $C$ and $D$ are bounded.) Now let $A_k \in \mathcal{G}_I$, $A_k \uparrow A$. Letting $k \to +\infty$ in $\lambda(T(A_k\cap I)) = |\det T|\,\lambda(A_k\cap I)$ shows that $A \in \mathcal{G}_I$. Therefore, $\mathcal{G}_I$ satisfies (a)–(c) of 11.5.7, hence (11.20) holds for every $E \in \mathcal{B}(\mathbb{R}^n)$. Taking a sequence of bounded intervals $I_k \uparrow \mathbb{R}^n$ in (11.20) yields (11.19).
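For intuition, the three elementary cases in the proof can be checked directly in the plane, where the image of a rectangle is a parallelogram whose area is given by the shoelace formula. The sketch below is illustrative only; the matrices `SWAP`, `SCALE`, `SHEAR`, the rectangle, and the helper names are assumptions chosen for this example, not notation from the text.

```python
def apply(T, p):
    """Apply the 2x2 matrix T (given as a pair of rows) to the point p."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)

def det2(T):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = T
    return a * d - b * c

def polygon_area(vertices):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def image_area(T, vertices):
    """Area of the image polygon T(E), for E given by its vertices in order."""
    return polygon_area([apply(T, p) for p in vertices])

# The three elementary types from the proof, for n = 2.
SWAP = ((0.0, 1.0), (1.0, 0.0))    # (a) interchange of two rows
SCALE = ((-2.5, 0.0), (0.0, 1.0))  # (b) multiply a row by a nonzero constant
SHEAR = ((1.0, 1.0), (0.0, 1.0))   # (c) add row two to row one

RECTANGLE = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 3.0)]  # area 6

for name, T in (("swap", SWAP), ("scale", SCALE), ("shear", SHEAR)):
    print(name, image_area(T, RECTANGLE), abs(det2(T)) * polygon_area(RECTANGLE))
```

In each row the two printed numbers should coincide, matching (11.19) for a bounded interval $E$ in $\mathbb{R}^2$.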

FIGURE 11.3: Concentric cube and ball.

For the remaining lemmas, the following terminology and notation will be useful. The cube with center $y \in \mathbb{R}^n$ and edge $r > 0$ is the semi-closed interval
$$Q = Q_r(y) := \{x \in \mathbb{R}^n : y_j - r/2 \le x_j < y_j + r/2,\ j = 1, \ldots, n\}.$$

Note that $|Q| = r^n$ and the diameter of $Q$ is $r\sqrt{n}$. Thus
$$B_{r/2}(y) \subseteq Q_r(y) \subseteq B_{r\sqrt{n}/2}(y).$$
A paving of a subset $A$ of $\mathbb{R}^n$ is a finite collection $\mathcal{Q}_r$ of pairwise disjoint cubes with edge $r$ that covers $A$. Two pavings $\mathcal{Q}_r = \{Q_r(x_j) : 1 \le j \le m\}$ and $\mathcal{Q}_s = \{Q_s(x_j) : 1 \le j \le m\}$ with the same centers are said to be concentric. Any bounded set $A$ has a paving $\mathcal{Q}_r$ with arbitrarily small $r$. Indeed, if $A \subseteq [a, b)^n$, one need only subdivide $[a, b)$ into subintervals of size $(b - a)/k$ for sufficiently large $k$ and form Cartesian products of these subintervals.

11.6.4 Lemma. Let $K \subseteq U$ be compact.
(a) For each sufficiently small $\delta > 0$, there exists a compact set $K_\delta$ with $K \subseteq K_\delta \subseteq U$.
(b) For each $r < \delta$, there exists a paving $\mathcal{Q}_r$ of $K$ contained in $K_\delta$.

Proof. For subsets $A, B \subseteq \mathbb{R}^n$, denote by $d(A, B)$ the distance between $A$ and $B$:
$$d(A, B) = \inf\{\|a - b\| : a \in A,\ b \in B\}.$$
Since $K$ is compact and $U^c$ is closed, $\delta_0 := d(U^c, K) > 0$. For $0 < \delta < \delta_0/\sqrt{n}$, let
$$K_\delta = \{x : d(x, K) \le \delta\sqrt{n}\}.$$
Then $K_\delta$ is compact and $K \subseteq K_\delta \subseteq U$. Let $Q$ be a cube with edge $r$. If $x \in Q\cap K$ and $y \in Q\cap K_\delta^c$, then
$$\delta\sqrt{n} < d(y, K) \le \|x - y\| \le r\sqrt{n}.$$
Therefore, if $r < \delta$ and $Q\cap K \ne \emptyset$, then $Q\cap K_\delta^c = \emptyset$, that is, $Q \subseteq K_\delta$. Since $K$ is bounded, there exists a paving $\mathcal{Q}_r$ of $K$. Removing those members of $\mathcal{Q}_r$ that do not meet $K$ produces a paving of $K$ contained in $K_\delta$.

FIGURE 11.4: The paving $\mathcal{Q}_r$.

11.6.5 Corollary. Let $\psi : U \to \mathbb{R}^n$ be $C^1$ on $U$ and let $E \subseteq U$ with $\lambda(E) = 0$. Then $\lambda\big(\psi(E)\big) = 0$.


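Proof. Suppose first that $E$ is bounded. Let $V \supseteq E$ be open with compact closure contained in $U$ and set
$$c := \sup_{z \in \operatorname{cl}(V)} \|\psi'(z)\|.$$
By continuity of $\psi'$ and compactness of $\operatorname{cl}(V)$, $c < +\infty$. Given $\varepsilon > 0$, let $W \supseteq E$ be open with compact closure $K = \operatorname{cl}(W) \subseteq V$ such that $\lambda(K) < \varepsilon/2$. This is possible by 10.4.4, since $\lambda(E) = 0$. Now let $K_\delta$ be as in 11.6.4. Since $K_\delta \downarrow K$ as $\delta \downarrow 0$, we may take $\delta$ sufficiently small so that $\lambda(K_\delta) < \varepsilon$. According to the lemma, we may choose a paving $\mathcal{Q}_r = \{Q_1, \ldots, Q_k\}$ of $K$ contained in $K_\delta$ with $r < \varepsilon$. It follows that
$$k r^n = \sum_{j=1}^k \lambda(Q_j) = \lambda\Big(\bigcup_{j=1}^k Q_j\Big) < \varepsilon. \tag{11.21}$$
Let $x_j$ denote the center of $Q_j$. Since $Q_j$ is convex, 9.3.6 implies that
$$\|\psi(x) - \psi(x_j)\| \le c\|x - x_j\| \le cr\sqrt{n}, \qquad x \in Q_j.$$
Therefore,
$$\psi(Q_j) \subseteq B_{cr\sqrt{n}}\big(\psi(x_j)\big) \subseteq Q_{2cr\sqrt{n}}\big(\psi(x_j)\big)$$
and so
$$\lambda\big(\psi(Q_j)\big) \le (2cr\sqrt{n})^n.$$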

Since the sets $\psi(Q_j)$ cover $\psi(K)$,
$$\lambda\big(\psi(E)\big) \le \lambda\big(\psi(K)\big) \le k(2cr\sqrt{n})^n \le (2c\sqrt{n})^n\varepsilon,$$
the last inequality by (11.21). Since $\varepsilon$ was arbitrary, $\lambda(\psi(E)) = 0$. This proves the assertion of the corollary for bounded $E$. In the unbounded case, take a sequence of bounded Borel sets $E_k \uparrow E$.

11.6.6 Lemma. Let $\psi$ be $C^1$ on $U$, $Q$ a cube contained in $U$, and let $I_n$ denote the identity transformation on $\mathbb{R}^n$. If $\|d\psi_x - I_n\| \le c$ for all $x \in Q$, then
$$\lambda\big(\psi(Q)\big) \le [(1 + c)n]^n\,\lambda(Q).$$

Proof. Let $\tilde\psi(x) = \psi(x) - x$. Then $d\tilde\psi_x = d\psi_x - I_n$. By 9.3.6,
$$\|\tilde\psi(x) - \tilde\psi(y)\| \le c\|x - y\| \quad\text{for all } x, y \in Q.$$
Thus, if $Q$ has center $x_0$ and edge $r$, then for all $x \in Q$
$$\|\psi(x) - \psi(x_0)\| \le \|\tilde\psi(x) - \tilde\psi(x_0)\| + \|x - x_0\| \le (c + 1)\|x - x_0\| \le (c + 1)\sqrt{n}\,r/2,$$
that is, $\psi(Q)$ is contained in the closed ball $C$ with center $\psi(x_0)$ and radius $\sqrt{n}(c + 1)r/2$. Since $C$ is contained in the cube with center $\psi(x_0)$ and edge $(c + 1)nr$,
$$\lambda\big(\psi(Q)\big) \le [(c + 1)nr]^n = [(c + 1)n]^n\,\lambda(Q).$$


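11.6.7 Lemma. Let $\psi : U \to \mathbb{R}^n$ be $C^1$ on $U$ and let $K \subseteq U$ be compact. Then, for each $\varepsilon > 0$, there exists $\delta > 0$, a compact set $K_\delta$ with $K \subseteq K_\delta \subseteq U$, and concentric pavings $\mathcal{Q}_r$, $\mathcal{Q}_{nr}$ of $K$ contained in $K_\delta$ with arbitrarily small $r$ such that for any $Q_r(y) \in \mathcal{Q}_r$,
$$\lambda\big(\phi(Q_r(y))\big) \le (1 + \varepsilon)^n\,|J_\phi(y)|\,\lambda\big(Q_{nr}(y)\big). \tag{11.22}$$
Moreover, $\delta$ may be chosen so that
$$\int_{K_\delta} |J_\phi(x)|\,dx < \int_K |J_\phi(x)|\,dx + \varepsilon. \tag{11.23}$$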

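Proof. Let $M = \sup\{\|(d\phi_y)^{-1}\| : y \in K_\delta\}$, where $K_\delta$ is chosen as in 11.6.4. For $x, y \in U$ define
$$\psi^y(x) = (d\phi_y)^{-1}\big(\phi(x) - \phi(y)\big) = (d\phi_y)^{-1}\big(\phi(x)\big) - (d\phi_y)^{-1}\big(\phi(y)\big).$$
Since $(d\phi_y)^{-1}$ is linear, by the chain rule
$$d(\psi^y)_x = (d\phi_y)^{-1}\circ d\phi_x.$$
Thus for all $x \in U$, $y \in K_\delta$, and $z \in \mathbb{R}^n$,
$$\|d(\psi^y)_x(z) - z\| = \big\|(d\phi_y)^{-1}\big(d\phi_x(z) - d\phi_y(z)\big)\big\| \le M\,\|d\phi_x - d\phi_y\|\,\|z\|.$$
Therefore, by definition of the operator norm,
$$\|d(\psi^y)_x - I_n\| \le M\,\|d\phi_x - d\phi_y\|. \tag{11.24}$$
Now, by the uniform continuity of $d\phi$ on $K_\delta$ there exists $0 < \delta_1 < \delta$ such that $\|d\phi_x - d\phi_y\| \le \varepsilon/M$ for all $x, y \in K_\delta$ with $\|x - y\| < \delta_1\sqrt{n}$. Let $r < \delta_1/n$ and let $\mathcal{Q}_r$, $\mathcal{Q}_{nr}$ be concentric pavings of $K$ contained in $K_\delta$. If $x \in Q := Q_r(y) \in \mathcal{Q}_r$, then $\|x - y\| < r\sqrt{n} < \delta_1\sqrt{n}$, hence, from (11.24), $\|d(\psi^y)_x - I_n\| \le \varepsilon$. By 11.6.6,
$$\lambda\big(\psi^y(Q)\big) \le [(1 + \varepsilon)n]^n\,\lambda(Q) = (1 + \varepsilon)^n\,\lambda\big(Q_{nr}(y)\big). \tag{11.25}$$
On the other hand, since $\psi^y(Q) = (d\phi_y)^{-1}\big(\phi(Q)\big) - (d\phi_y)^{-1}\big(\phi(y)\big)$, by translation invariance and 11.6.3,
$$\lambda\big(\psi^y(Q)\big) = \lambda\big((d\phi_y)^{-1}(\phi(Q))\big) = |J_\phi(y)|^{-1}\,\lambda\big(\phi(Q)\big). \tag{11.26}$$
Inequality (11.22) now follows from (11.25) and (11.26).

For (11.23), note that since $K_{1/k} \downarrow K$ and $\mu(A) := \int_A |J_\phi|\,d\lambda$ is a measure on the Borel sets (11.3.4), $\mu\big(K_{1/k}\big) \downarrow \mu(K)$. Thus there exists $k$ such that $\mu\big(K_{1/k}\big) < \mu(K) + \varepsilon$. Taking $\delta < 1/k$ completes the proof.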


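11.6.8 Lemma. If $K \subseteq U$ is compact, then
$$\lambda\big(\phi(K)\big) \le \int_K |J_\phi(y)|\,dy.$$

Proof. Let $\varepsilon > 0$ and choose $\delta > 0$ as in 11.6.7. By uniform continuity of $J_\phi(x)$ on $K_\delta$, there exists $\delta_1 < \delta$ such that $|J_\phi(x) - J_\phi(y)| < \varepsilon$ for all $x, y \in K_\delta$ with $\|x - y\| < \delta_1$. Choose pavings $\mathcal{Q}_r = \{Q_r(y)\}_y$ and $\mathcal{Q}_{nr} = \{Q_{nr}(y)\}_y$ as in 11.6.7. Then for $x \in Q_{nr}(y)$
$$|J_\phi(y)| \le |J_\phi(x) - J_\phi(y)| + |J_\phi(x)| < \varepsilon + |J_\phi(x)|,$$
hence, by (11.22),
$$(1 + \varepsilon)^{-n}\,\lambda\big(\phi(Q_r(y))\big) \le |J_\phi(y)|\,\lambda\big(Q_{nr}(y)\big) \le \int_{Q_{nr}(y)} \big(|J_\phi(x)| + \varepsilon\big)\,dx,$$
so
$$
\begin{aligned}
(1 + \varepsilon)^{-n}\,\lambda\big(\phi(K)\big) &\le (1 + \varepsilon)^{-n}\sum_y \lambda\big(\phi(Q_r(y))\big) \\
&\le \int_{K_\delta} \big(|J_\phi(x)| + \varepsilon\big)\,dx \\
&\le \int_K |J_\phi(x)|\,dx + \varepsilon\big(1 + \lambda(K_\delta)\big) \qquad\text{by (11.23)}.
\end{aligned}
$$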

Letting $\varepsilon \to 0$ verifies the lemma.

Now use 10.4.5 to obtain an increasing sequence of compact sets $K_k \subseteq E$ such that $\lambda(K_k) \uparrow \lambda(E)$. Then $\lambda\big(\phi(K_k)\big) \uparrow \lambda\big(\phi(E)\big)$ and, by 11.6.8,
$$\lambda\big(\phi(K_k)\big) \le \int_{K_k} |J_\phi(y)|\,dy \le \int_E |J_\phi(y)|\,dy.$$
Letting $k \to +\infty$ yields (11.18), completing the proof of the change of variables theorem.

11.6.9 Remark. If $V$ is a linear subspace of $\mathbb{R}^n$ of dimension $m < n$, then $\lambda_n(V) = 0$. To see this, let $v_1, \ldots, v_m, \ldots, v_n$ be an orthonormal basis for $\mathbb{R}^n$, where the first $m$ vectors form a basis for $V$.³ Define $T_V \in L(\mathbb{R}^n, \mathbb{R}^n)$ such that $T_V(v_j) = e_j$, $1 \le j \le n$. Then $T_V$ is an orthogonal transformation and $T_V(V) = \mathbb{R}^m \times \{0\}$. By 11.6.3,
$$\lambda_n(V) = |\det(T_V)|\,\lambda_n\big(\mathbb{R}^m \times \{0\}\big) = 0,$$
as claimed. This also shows that (11.19) holds for singular transformations $T$ as well, since then both sides of that equation are zero. While the $n$-dimensional volume of a subset $E$ of $V$ is zero, $E$ may still have positive $m$-dimensional measure. This is defined as
$$\lambda_V(E) := \lambda_m\big(T_V(E)\big) \quad\text{for } E \in T_V^{-1}\big(\mathcal{B}(\mathbb{R}^m)\big).$$
From a geometric point of view, this is a reasonable definition, since an orthogonal transformation is either a rotation or a rotation combined with a reflection and therefore does not change volumes or areas. To see that the definition does not depend on the particular choice of the orthonormal basis, let $w_1, \ldots, w_n$ be another orthonormal basis for $\mathbb{R}^n$ whose first $m$ members form a basis for $V$ and let $\tilde T_V \in L(\mathbb{R}^n, \mathbb{R}^n)$ satisfy $\tilde T_V(w_j) = e_j$, $1 \le j \le n$. Set $T = \tilde T_V T_V^{-1}$. Then, by (11.19),
$$\lambda_m\big(\tilde T_V(E)\big) = \lambda_m\big(T\,T_V(E)\big) = |\det T|\,\lambda_m\big(T_V(E)\big) = \lambda_m\big(T_V(E)\big),$$
the last equality because $T$ is orthogonal and hence has determinant $\pm 1$. ♦

³ This is always possible by the Gram–Schmidt process.

Exercises

1. Define the $n$-dimensional ellipsoid
$$E = \Big\{(x_1, \ldots, x_n) : \Big(\frac{x_1}{a_1}\Big)^2 + \cdots + \Big(\frac{x_n}{a_n}\Big)^2 \le 1\Big\},$$
where $a_j > 0$. Prove that $\lambda_n(E) = a_1\cdots a_n\,\lambda_n\big(C_1(0)\big)$.

2. Show that the volume of the solid with surface $\sqrt{|x|} + \sqrt{|y|} + \sqrt{|z|} = 1$ is given by
$$64\int_0^1\int_0^{1-u}\int_0^{1-u-v} uvw\,dw\,dv\,du.$$

3.S ⇓⁴ Let $h$ be Lebesgue integrable on $[0, +\infty)$. Use 11.6.2 to prove that
$$\int_{\mathbb{R}^n} h(\|x\|)\,dx = n\alpha_n\int_0^\infty h(r)\,r^{n-1}\,dr.$$

4. Use Exercise 3 to show that for $n \ge 2$
(a) $\displaystyle\int_{\mathbb{R}^n} \exp(-\|x\|)\,dx = n!\,\alpha_n$.
(b) $\displaystyle\int_{\mathbb{R}^n} \exp(-\|x\|^2)\,dx = \pi^{n/2}$.

5.S A hole of radius $R \in (0, 1)$ is drilled in the $(n + 1)$-dimensional ball $C_1^{n+1}(0)$ from the north pole $(0, 0, \ldots, 1)$ to the south pole $(0, 0, \ldots, -1)$. Use Exercise 3 to show that the amount removed from the ball is
$$n\alpha_n\Big[R\sqrt{1 - R^2} - \arcsin\sqrt{1 - R^2} + \pi/2\Big].$$

⁴ This exercise will be used in 13.2.5 and 13.4.2.


6.S (Theorem of Pappus) Let $E \in \mathcal{M}(\mathbb{R}^n)$ be bounded with positive $n$-dimensional Lebesgue measure such that $x_n > 0$ for all $x = (x_1, \ldots, x_n) \in E$. Define
$$E_r = \{(x_1, \ldots, x_{n-1}, x_n\cos\theta, x_n\sin\theta) : x \in E,\ 0 < \theta < 2\pi\}.$$
Prove that
$$\lambda_{n+1}(E_r) = 2\pi\,\bar{x}_n\,\lambda_n(E), \quad\text{where}\quad \bar{x}_n := \frac{1}{\lambda_n(E)}\int_E x_n\,d\lambda_n(x_1, \ldots, x_n),$$
the $n$th coordinate of the centroid $\bar{x}$ of $E$. Thus if $n = 2$, then $E_r$ is the rotation of $E$ about the $x_1$-axis, and the theorem of Pappus asserts that the volume of $E_r$ is equal to the area of $E$ times the distance the centroid of $E$ travels around the $x_1$-axis.

FIGURE 11.5: Theorem of Pappus.

Chapter 12
Curves and Surfaces in $\mathbb{R}^n$

12.1 Parameterized Curves

A parameterized curve in $\mathbb{R}^n$ is a continuous function $\phi : I \to \mathbb{R}^n$, where $I$ is an interval in $\mathbb{R}$. We shall usually refer to $\phi$ as simply a curve. The range $\phi(I)$ of $\phi$ is called the trace of $\phi$ and is denoted by $\operatorname{trace}(\phi)$. The curve is said to lie in a set $E \subseteq \mathbb{R}^n$ if $\operatorname{trace}(\phi) \subseteq E$. The curve is called simple if $\phi$ is one-to-one. If $I = [a, b]$, the point $\phi(a)$ is the initial point of the curve and $\phi(b)$ the terminal point. The curve $\phi$ is then said to be closed if $\phi(a) = \phi(b)$, and simple closed if it is closed and $\phi$ is one-to-one on $(a, b)$, that is, the curve intersects itself only at the initial and terminal points. For example, the curve $(\cos(2k\pi t), \sin(2k\pi t))$, $t \in [0, 1]$, $k \in \mathbb{N}$, is a simple closed curve iff $k = 1$; its trace is the circle $x^2 + y^2 = 1$.

FIGURE 12.1: Curves in $\mathbb{R}^2$. (Panels: simple curve, non-simple curve, simple closed curve.)

A curve $\phi : I \to \mathbb{R}^n$ is said to be of class $C^r$ if $\phi$ is $C^r$ on an open interval containing $I$. A $C^1$ curve $\phi$ is smooth if $\phi'(t) \ne 0$ for all $t \in I$. For example, on $[-1, 1]$ the curve $\phi(t) = (t, t^2)$ is smooth but the curve $\psi(t) = (t^3, t^6)$, which has the same trace as $\phi$, is not. A curve $\phi : [a, b] \to \mathbb{R}^n$ is said to be piecewise smooth if, for some partition $a = a_0 < a_1 < \cdots < a_m = b$, $\phi$ is smooth on each interval $[a_{j-1}, a_j]$. This implies that $\phi'$ is uniformly continuous on each interval of smoothness $(a_{j-1}, a_j)$ and has right-hand and left-hand limits at the left and right endpoints, respectively. Thus a piecewise smooth curve may be viewed as a concatenation (sum)


of smooth curves, as shown in Figure 12.2. Note that at junctions that are corners there are two tangent vectors, and at junctions that are cusps there is one. A point on a smooth portion of the curve will be called a smooth point. A piecewise smooth curve therefore consists of smooth points and finitely many corner or cusp points.

FIGURE 12.2: A piecewise smooth curve with tangent vectors.

A reparametrization of a curve $\phi : I \to \mathbb{R}^n$ is a curve $\psi = \phi\circ\alpha : J \to \mathbb{R}^n$, where $\alpha : J \to I$ is continuous, strictly increasing, and $\alpha(J) = I$ (hence $\operatorname{trace}(\psi) = \operatorname{trace}(\phi)$). If $\phi$ is smooth, then $\alpha$ is required to be smooth with positive Jacobian. If $\psi$ is a reparametrization of $\phi$, then $\phi$ and $\psi$ are said to be equivalent. For example, the smooth curve $(t, t^2, t^3)$ $(t > 0)$ is equivalent to the curve $(e^t, e^{2t}, e^{3t})$ $(t \in \mathbb{R})$.

A curve $\phi : I \to \mathbb{R}^n$ has a positive direction, namely, the direction that $\phi(t)$ moves as $t$ increases. An equivalent curve $\psi = \phi\circ\alpha$ has the same direction since $\alpha$ is strictly increasing. The curve $-\phi$, defined by $(-\phi)(t) := \phi(-t)$, $-t \in I$, has the opposite (negative) direction. If $\phi$ is piecewise smooth, then the positive direction is given by the tangent vectors $\phi'(t)$, defined at smooth points. At corners and cusps the tangent vectors are right- and left-hand limits. The set of tangent vectors to a curve is called the tangent vector field (defined more precisely later).

12.1.1 Proposition. Let $\phi_j : [a_j, b_j] \to \mathbb{R}^n$, $j = 1, \ldots, k$, be piecewise $C^1$ curves such that $\phi_j(b_j) = \phi_{j+1}(a_{j+1})$, $j = 1, \ldots, k - 1$. Then there exists a piecewise $C^1$ curve $\phi : [0, 1] \to \mathbb{R}^n$, denoted by $\phi = \phi_1 + \phi_2 + \cdots + \phi_k$ and called the sum of the curves $\phi_j$, such that $\phi|_{[(j-1)/k,\,j/k]}$ is equivalent to $\phi_j$.

Proof. Define $\alpha_j : [(j - 1)/k, j/k] \to [a_j, b_j]$ by
$$\alpha_j(t) = b_j + (b_j - a_j)(kt - j), \qquad (j - 1)/k \le t \le j/k,$$
and $\phi : [0, 1] \to \mathbb{R}^n$ by $\phi = \phi_j\circ\alpha_j$ on $[(j - 1)/k, j/k]$.
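The construction in the proof of 12.1.1 translates almost verbatim into code. The following sketch is an illustration under stated assumptions: curves are represented as Python functions returning tuples, and the helper name `sum_of_curves` and the two sample segments are hypothetical. It builds $\phi = \phi_1 + \cdots + \phi_k$ on $[0, 1]$ using the affine maps $\alpha_j(t) = b_j + (b_j - a_j)(kt - j)$.

```python
def sum_of_curves(curves):
    """Concatenate curves given as triples (phi_j, a_j, b_j) into one curve on [0, 1].

    The caller is responsible for the matching condition
    phi_j(b_j) == phi_{j+1}(a_{j+1}) assumed in Proposition 12.1.1.
    """
    k = len(curves)

    def phi(t):
        # Choose j (1-based, as in the text) with (j - 1)/k <= t <= j/k.
        j = min(int(t * k) + 1, k)
        phi_j, a_j, b_j = curves[j - 1]
        alpha = b_j + (b_j - a_j) * (k * t - j)  # alpha_j(t) from the proof
        return phi_j(alpha)

    return phi

# Two hypothetical segments: (0,0) -> (1,0) on [0, 1], then (1,0) -> (1,2) on [0, 2].
def horizontal(t):
    return (t, 0.0)

def vertical(t):
    return (1.0, t)

phi = sum_of_curves([(horizontal, 0.0, 1.0), (vertical, 0.0, 2.0)])
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, phi(t))
```

At a junction $t = j/k$ either neighbouring piece may be used, since the matching condition makes both choices agree.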


Exercises

1.S Prove that the notion of equivalent smooth curves is an equivalence relation.

2. Show that if $\phi$ is smooth and $\psi = \phi\circ\alpha$ is an equivalent curve, then
$$\frac{\phi'(t)}{\|\phi'(t)\|} = \frac{\psi'(t)}{\|\psi'(t)\|}.$$
Thus the unit tangent vector field is invariant under a reparametrization.

3.S Sketch the trace of the curve $\phi(t) = (t^2, t^3 - t)$ on the interval $[-2, 2]$. Find all points on the trace where there are two tangent vectors and express these vectors in terms of the standard basis.

4. Find the tangent vector field of the given curve $\phi$ on the interval $[0, 2\pi]$. Sketch the trace and find all points on the trace at which there are two tangent vectors. Express these vectors in terms of the standard basis.
(a)S $\phi(t) = \big(\sin t, \cos(2t)\big)$.
(b) $\phi(t) = \big(\cos t, \sin(2t)\big)$.
(c) $\phi(t) = \big(\cos t, \cos(2t)\big)$.
(d) $\phi(t) = \big(\sin t, \sin(2t)\big)$.

5. In (a)–(d) below, find a smooth simple curve or a smooth simple closed curve $\phi : I \to C$ with trace $C$.
(a) $C$ is the intersection of the elliptic cylinder $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$ and the plane $x + y + z = 1$.
(b)S $C$ is the intersection of the elliptic cylinder $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$ and the surface $z = 2xy$.
(c) $C$ is the intersection in the first octant of the paraboloid $z = x^2 + y^2$ and the plane $x + y + z = 1$.
(d) $C$ is the intersection in the first octant of the cone $z = x^2 + y^2$ and the plane $x + y + z = 1$.

6.S Let $\phi : [a, b] \to \mathbb{R}^n$ be a $C^1$ curve with the property that for some $x \in \mathbb{R}^n$, $\phi(t) = x$ for infinitely many $t \in [a, b]$. Prove that $\phi$ is not smooth.

7. Let $f$ be $C^1$ on an open set $U$ and let $\phi$ be a $C^1$ curve in $U$. Suppose that $\phi'(t) = \nabla f\big(\phi(t)\big)$ for all $t > a$ and that the limit $x := \lim_{t\to+\infty}\phi(t)$ exists in $U$. Prove that $\nabla f(x) = 0$.

Hint. Assume $\nabla f(x) \ne 0$. Let $g = f\circ\phi$ and show that $g'(t) > \|\nabla f(x)\|^2/2$ for all sufficiently large $t$.

12.2 Integration on Curves

Rectifiable Curves

Let $\phi : I \to \mathbb{R}^n$ be a parameterized curve. Assume first that $I = [a, b]$. For a partition $\mathcal{P} = \{t_0 = a < t_1 < \cdots < t_{k-1} < t_k = b\}$ of $[a, b]$ define
$$L_{\mathcal{P}}(\phi) = \sum_{j=1}^k \|\phi(t_j) - \phi(t_{j-1})\|,$$
which is the length of the inscribed polygonal line with segments joining the points $\phi(t_{j-1})$ and $\phi(t_j)$.

FIGURE 12.3: Inscribed polygonal line.

The (arc) length of $\phi$ is defined as
$$\operatorname{length}(\phi) := \sup_{\mathcal{P}} L_{\mathcal{P}}(\phi),$$

where the supremum is taken over all partitions $\mathcal{P}$ of $[a, b]$. If $\operatorname{length}(\phi) < +\infty$, then $\phi$ is said to be rectifiable. Note that if $\psi = \phi\circ\alpha$ is equivalent to $\phi$, then $\operatorname{length}(\psi) = \operatorname{length}(\phi)$, since $\alpha : [c, d] \to [a, b]$ induces a one-to-one correspondence between partitions of $[c, d]$ and $[a, b]$. If $I = [a, b)$ (where $b$ could be infinite), define
$$\operatorname{length}(\phi) := \sup_{a<t<b} \operatorname{length}\big(\phi|_{[a,t]}\big).$$
[…] 1. This follows from the inequalities
$$\sum_{j=1}^k |y(t_j) - y(t_{j-1})| \le \sum_{j=1}^k \|\phi(t_j) - \phi(t_{j-1})\| \le 2(b - a) + 2\sum_{j=1}^k |y(t_j) - y(t_{j-1})|$$
and 5.9.3. ♦

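We prove in 12.2.4 below that piecewise $C^1$ curves on $[a, b]$ are rectifiable. For this, we require two lemmas. The proof of the first is similar to that of the corresponding result for lower Darboux sums and is left as an exercise.

12.2.2 Lemma. Let $\phi : [a, b] \to \mathbb{R}^n$ be a curve and let $\mathcal{P}$ and $\mathcal{Q}$ be partitions of $[a, b]$. If $\mathcal{P}$ is a refinement of $\mathcal{Q}$, then $L_{\mathcal{Q}}(\phi) \le L_{\mathcal{P}}(\phi)$.

12.2.3 Lemma. Let $\phi : [a, b] \to \mathbb{R}^n$ be a curve and $c \in (a, b)$. Then
$$\operatorname{length}(\phi) = \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big).$$
In particular, $\phi$ is rectifiable iff $\phi|_{[a,c]}$ and $\phi|_{[c,b]}$ are rectifiable.

Proof. Let $\mathcal{P}'$ and $\mathcal{P}''$ be partitions of $[a, c]$ and $[c, b]$, respectively, and set $\mathcal{P} = \mathcal{P}'\cup\mathcal{P}''$. Then $\mathcal{P}$ is a partition of $[a, b]$ and
$$\operatorname{length}(\phi) \ge L_{\mathcal{P}}(\phi) = L_{\mathcal{P}'}\big(\phi|_{[a,c]}\big) + L_{\mathcal{P}''}\big(\phi|_{[c,b]}\big).$$
Taking suprema over $\mathcal{P}'$ and then $\mathcal{P}''$ yields
$$\operatorname{length}(\phi) \ge \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big).$$
For the reverse inequality, let $\mathcal{P} = \{t_0 = a < t_1 < \cdots < t_k = b\}$ be a partition of $[a, b]$ and suppose $c \in (t_{i-1}, t_i]$. If $\mathcal{P}' = \{t_0 = a < t_1 < \cdots < t_{i-1} < c\}$ and $\mathcal{P}'' = \{c \le t_i < \cdots < t_k = b\}$, then an application of the triangle inequality shows that
$$L_{\mathcal{P}}(\phi) \le L_{\mathcal{P}'}\big(\phi|_{[a,c]}\big) + L_{\mathcal{P}''}\big(\phi|_{[c,b]}\big) \le \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big).$$
Since $\mathcal{P}$ was arbitrary, $\operatorname{length}(\phi) \le \operatorname{length}\big(\phi|_{[a,c]}\big) + \operatorname{length}\big(\phi|_{[c,b]}\big)$.

12.2.4 Theorem. Let $\phi : [a, b] \to \mathbb{R}^n$ be piecewise $C^1$. Then $\phi$ is rectifiable and
$$\operatorname{length}(\phi) = \sum_{j=1}^m \int_{a_{j-1}}^{a_j} \|\phi'(t)\|\,dt,$$
where $\phi$ is smooth on the intervals $[a_{j-1}, a_j]$, $a = a_0 < a_1 < \cdots < a_m = b$.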


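Proof. By 12.2.3 we may assume that $\phi = (\phi_1, \ldots, \phi_n)$ is $C^1$ on $[a, b]$. Given $\varepsilon > 0$, choose $\delta > 0$ so that
$$\Big|\int_a^b \|\phi'(t)\|\,dt - \sum_{k=1}^m \|\phi'(t_k)\|\,\Delta t_k\Big| < \varepsilon, \qquad \Delta t_k := t_k - t_{k-1}, \tag{12.1}$$
for all partitions $\mathcal{P} = \{t_0 = a < t_1 < \cdots < t_{m-1} < t_m = b\}$ with $\|\mathcal{P}\| < \delta$. For such a partition $\mathcal{P}$, choose $s_{j,k} \in (t_{k-1}, t_k)$ such that
$$\phi_j(t_k) - \phi_j(t_{k-1}) = \phi_j'(s_{j,k})\,\Delta t_k, \qquad k = 1, \ldots, m,\ j = 1, \ldots, n.$$
Then
$$L_{\mathcal{P}}(\phi) = \sum_{k=1}^m \|\phi(t_k) - \phi(t_{k-1})\| = \sum_{k=1}^m \Big(\sum_{j=1}^n |\phi_j'(s_{j,k})|^2\Big)^{1/2}\Delta t_k,$$
hence
$$\Big|L_{\mathcal{P}}(\phi) - \sum_{k=1}^m \|\phi'(t_k)\|\,\Delta t_k\Big| = \Big|\sum_{k=1}^m \Big\{\Big(\sum_{j=1}^n |\phi_j'(s_{j,k})|^2\Big)^{1/2} - \Big(\sum_{j=1}^n |\phi_j'(t_k)|^2\Big)^{1/2}\Big\}\Delta t_k\Big|.$$
Taking a smaller $\delta$ if necessary, we may assume that the absolute value of the term in braces is less than $\varepsilon/(b - a)$. This is possible by the uniform continuity of $\phi'$. It follows that
$$\Big|L_{\mathcal{P}}(\phi) - \sum_{k=1}^m \|\phi'(t_k)\|\,\Delta t_k\Big| < \varepsilon. \tag{12.2}$$
From (12.1) and (12.2) we now have
$$\int_a^b \|\phi'(t)\|\,dt - 2\varepsilon < L_{\mathcal{P}}(\phi) < \int_a^b \|\phi'(t)\|\,dt + 2\varepsilon \tag{12.3}$$
for all $\mathcal{P}$ with $\|\mathcal{P}\| < \delta$. Since $L_{\mathcal{P}}(\phi) \le \operatorname{length}(\phi)$ and $\varepsilon$ was arbitrary, the first inequality in (12.3) implies that
$$\int_a^b \|\phi'(t)\|\,dt \le \operatorname{length}(\phi).$$
For the reverse inequality, let $\mathcal{Q}$ be any partition of $[a, b]$. Refine $\mathcal{Q}$ to obtain a partition $\mathcal{P}$ with $\|\mathcal{P}\| < \delta$. Then, from 12.2.2 and the second inequality in (12.3),
$$L_{\mathcal{Q}}(\phi) \le L_{\mathcal{P}}(\phi) < \int_a^b \|\phi'(t)\|\,dt + 2\varepsilon.$$
Since $\mathcal{Q}$ and $\varepsilon$ are arbitrary, $\operatorname{length}(\phi) \le \int_a^b \|\phi'(t)\|\,dt$.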
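The theorem and Lemma 12.2.2 can be illustrated numerically. The sketch below is an assumption-laden example rather than anything from the text: it takes the parabola $\phi(t) = (t, t^2)$ on $[0, 1]$, computes $L_{\mathcal{P}}(\phi)$ for uniform partitions with $k = 1, 2, 4, \ldots$ subintervals (each a refinement of the previous one), and compares with the closed form $\int_0^1 \sqrt{1 + 4t^2}\,dt = \big(2\sqrt{5} + \operatorname{arcsinh} 2\big)/4$. The names `polygonal_length`, `parabola`, and `EXACT` are hypothetical.

```python
import math

def polygonal_length(phi, a, b, k):
    """L_P(phi) for the uniform partition of [a, b] into k subintervals."""
    points = [phi(a + (b - a) * j / k) for j in range(k + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def parabola(t):
    return (t, t * t)

# Closed form of the integral of sqrt(1 + 4 t^2) over [0, 1].
EXACT = (2.0 * math.sqrt(5.0) + math.asinh(2.0)) / 4.0

# Doubling k refines the partition, so the sums should not decrease (12.2.2),
# and none of them should exceed the arc length.
lengths = [polygonal_length(parabola, 0.0, 1.0, 2 ** m) for m in range(9)]
for m, value in enumerate(lengths):
    print(2 ** m, value, EXACT - value)
```

The last column shows the gap between the inscribed polygon and the integral shrinking as the mesh decreases, which is the content of (12.3).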


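The proof of the following corollary is left to the reader.

12.2.5 Corollary. If $\phi : [a, b) \to \mathbb{R}^n$ is $C^1$, then $\operatorname{length}(\phi)$ is the improper integral $\int_a^b \|\phi'(t)\|\,dt$.

12.2.6 Example. Let $\phi(t) = \big(e^{-t}\cos t, e^{-t}\sin t\big)$, where $0 \le t < +\infty$. Then $\phi'(t) = e^{-t}\big(-\cos t - \sin t,\ \cos t - \sin t\big)$, so $\|\phi'(t)\| = \sqrt{2}\,e^{-t}$, hence $\operatorname{length}(\phi) = \int_0^\infty \sqrt{2}\,e^{-t}\,dt = \sqrt{2}$. ♦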
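As a numerical cross-check of this example: differentiating gives $\|\phi'(t)\| = \sqrt{2}\,e^{-t}$, so the length of the restriction of the spiral to $[0, T]$ is $\sqrt{2}\,(1 - e^{-T})$, which increases to $\sqrt{2}$ as $T \to +\infty$. The sketch below approximates these partial lengths by inscribed polygons; the helper names and the choice of 1000 subintervals per unit of $t$ are illustrative assumptions.

```python
import math

def spiral(t):
    """phi(t) = (e^{-t} cos t, e^{-t} sin t)."""
    return (math.exp(-t) * math.cos(t), math.exp(-t) * math.sin(t))

def polygonal_length(phi, a, b, k):
    """Length of the inscribed polygon for the uniform partition with k pieces."""
    points = [phi(a + (b - a) * j / k) for j in range(k + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def partial_length(T, pieces_per_unit=1000):
    """Approximate length of the spiral restricted to [0, T]."""
    return polygonal_length(spiral, 0.0, T, int(T * pieces_per_unit))

for T in (1.0, 5.0, 10.0, 30.0):
    print(T, partial_length(T), math.sqrt(2.0) * (1.0 - math.exp(-T)))
```

The two columns should agree closely for each $T$, and both approach $\sqrt{2}$, the value of the improper integral in 12.2.5.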

Line Integrals

Let $\phi : [a, b] \to \mathbb{R}^n$ be a $C^1$ curve with trace $C$ and let $f : C \to \mathbb{R}$ be continuous. The line integral of $f$ over $\phi$ is defined by
$$\int_\phi f\,ds = \int_C f\,ds = \int_a^b f\big(\phi(t)\big)\,\|\phi'(t)\|\,dt.$$
Note that if $\psi = \phi\circ\alpha$ is an equivalent parametrization, where $\alpha : [c, d] \to [a, b]$ is $C^1$, then, by the chain rule and the change of variables theorem,
$$\int_c^d f\big(\psi(t)\big)\,\|\psi'(t)\|\,dt = \int_c^d f\big(\phi(\alpha(t))\big)\,\|\phi'(\alpha(t))\|\,\alpha'(t)\,dt = \int_a^b f\big(\phi(u)\big)\,\|\phi'(u)\|\,du.$$

The value of a line integral is therefore independent of the choice of parametrization. If $\phi : [a, b] \to \mathbb{R}^n$ is piecewise $C^1$, then the line integral is defined as
$$\int_\phi f\,ds = \sum_j \int_{\phi_j} f\,ds,$$
where $\phi_j$ is the restriction of $\phi$ to $[a_j, a_{j+1}]$ and $\phi$ is $C^1$ on $[a_j, a_{j+1}]$. If $\phi : I \to \mathbb{R}^n$ is $C^1$, where $I$ is an arbitrary interval, then the line integral is defined as an improper integral, as in the case of arc length.

12.2.7 Remark. Theorem 12.2.4 shows that arc length is the line integral of the constant function 1. Using techniques similar to those found in the proof of that theorem, one may show that if $\phi$ is $C^1$, then $\int_\phi f$ is the limit of sums of the form
$$\sum_{j=1}^k (f\circ\phi)(t_j^*)\,\|\phi(t_j) - \phi(t_{j-1})\|, \qquad t_j^* \in (t_{j-1}, t_j),$$
as $\max_j \|\phi(t_j) - \phi(t_{j-1})\| \to 0$. This interpretation is useful in applications. For example, if $f(x)$ is the mass per unit length at the point $x$ of a wire $C$, then $(f\circ\phi)(t_j^*)\,\|\phi(t_j) - \phi(t_{j-1})\|$ is approximately the mass of a small piece of the wire. Summing and taking the limit gives the mass of the wire as the line integral $\int_C f\,ds$. ♦
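Independence of parametrization can be observed numerically. The following sketch is illustrative only: it integrates the hypothetical density $f(x, y) = xy$ over the quarter circle $\phi(t) = (\cos t, \sin t)$, $0 \le t \le \pi/2$, and over the equivalent curve $\psi = \phi\circ\alpha$ with $\alpha(u) = (\pi/4)(u + u^2)$ on $[0, 1]$, an assumed reparametrization with $\alpha' > 0$. A midpoint rule stands in for the integral; the exact value is $\int_0^{\pi/2} \cos t\sin t\,dt = 1/2$.

```python
import math

def line_integral(f, curve, velocity, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of f(curve(t)) * ||curve'(t)||."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        x, y = curve(t)
        vx, vy = velocity(t)
        total += f(x, y) * math.hypot(vx, vy) * h
    return total

def f(x, y):
    return x * y

def phi(t):
    return (math.cos(t), math.sin(t))

def dphi(t):
    return (-math.sin(t), math.cos(t))

def alpha(u):
    # Strictly increasing map of [0, 1] onto [0, pi/2] with positive derivative.
    return (math.pi / 4.0) * (u + u * u)

def dalpha(u):
    return (math.pi / 4.0) * (1.0 + 2.0 * u)

def psi(u):
    return phi(alpha(u))

def dpsi(u):
    # Chain rule: psi'(u) = phi'(alpha(u)) * alpha'(u).
    vx, vy = dphi(alpha(u))
    return (vx * dalpha(u), vy * dalpha(u))

print(line_integral(f, phi, dphi, 0.0, math.pi / 2.0))
print(line_integral(f, psi, dpsi, 0.0, 1.0))
```

Both printed values should be close to $1/2$, as the change of variables computation above predicts.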


Vector Fields

12.2.8 Definition. A vector field on a set $E \subseteq \mathbb{R}^n$ is a function $\vec{F} = (f_1, \ldots, f_n) : E \to \mathbb{R}^n$. The vector field is said to be of class $C^r$ if each $f_j$ is $C^r$. ♦

Geometrically, a vector field assigns to each point of $E$ a unique vector in $\mathbb{R}^n$, as illustrated in Figure 12.4.

FIGURE 12.4: Vector field on $E$.

If $\phi$ is a simple smooth curve and $x = \phi(t)$, then
$$\vec{v}_\phi(x) := \phi'(t) \quad\text{and}\quad \vec{T}_\phi(x) := \frac{\phi'(t)}{\|\phi'(t)\|}$$
denote, respectively, the tangent vector field and unit tangent vector field along $\phi$. If $\phi$ denotes the position of a particle at time $t$, then the tangent vector field is called the velocity vector field of the particle.

Vector fields that describe forces, such as gravitation or electromagnetism, are called force fields. Line integrals may then be used to calculate the work done by the force in moving a particle along a curve. Specifically, suppose the particle moves along a simple smooth curve $\phi : [a, b] \to \mathbb{R}^3$ under the action of a continuous force field $\vec{F} = (f_1, f_2, f_3)$ on $C := \operatorname{trace}(\phi)$. The work $\Delta_j W$ done by the force in moving the particle from a point $x_j = \phi(t_j)$ on $C$ to a nearby point $x_{j+1} = \phi(t_{j+1})$ is approximately the component of the force in the direction of the tangent to the curve at $x_j$ multiplied by the distance the particle travels:
$$\Delta_j W \approx \vec{F}(x_j)\cdot\vec{T}_\phi(x_j)\,\|x_j - x_{j+1}\|.$$
The total work $W$ done by the force is then approximately $\sum_j \Delta_j W$. Since $\vec{F}$ is continuous, the approximation gets better by taking smaller intervals. It is therefore reasonable to define the total work done by the force in moving the particle along the curve as the limit of $\sum_j \Delta_j W$ as $\max_j \|x_j - x_{j+1}\| \to 0$. By 12.2.7, we are therefore led to the definition
$$W := \int_\phi \vec{F}\cdot\vec{T}\,ds = \int_a^b \vec{F}\big(\phi(t)\big)\cdot\vec{T}_\phi\big(\phi(t)\big)\,\|\phi'(t)\|\,dt.$$

Since $\vec{T}_\phi\big(\phi(t)\big) = \|\phi'(t)\|^{-1}\phi'(t)$, we see that
$$W = \int_a^b \vec{F}\big(\phi(t)\big)\cdot\phi'(t)\,dt = \int_a^b \Big[f_1(x)\frac{dx_1}{dt} + f_2(x)\frac{dx_2}{dt} + f_3(x)\frac{dx_3}{dt}\Big]\,dt,$$
where $x = \phi(t)$. The last integral is frequently written
$$\int_\phi f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3.$$
The integrand is called a (differential) 1-form on $C$ in $\mathbb{R}^3$.

Differential 1-Forms in $\mathbb{R}^n$

Let $f_j$ be defined on a set $S \subseteq \mathbb{R}^n$. The symbol
$$\omega := f_1\,dx_1 + \cdots + f_n\,dx_n$$
is called a (differential) 1-form on $S$. The form is said to be $C^r$ on $S$ if each $f_j$ is $C^r$ on $S$, where $r \in \mathbb{N}\cup\{+\infty\}$. Given another 1-form $\eta = g_1\,dx_1 + \cdots + g_n\,dx_n$ on $S$ and $a, b \in \mathbb{R}$, the 1-form $a\omega + b\eta$ on $S$ is defined by
$$a\omega + b\eta := (af_1 + bg_1)\,dx_1 + \cdots + (af_n + bg_n)\,dx_n.$$
If $\vec{H} = (h_1, \ldots, h_n)$ is a vector field on $S$, we define the inner product $\omega\cdot\vec{H}$ of $\omega$ and $\vec{H}$ on $S$ by
$$\omega\cdot\vec{H}(x) := \sum_{j=1}^n f_j(x)\,h_j(x), \qquad x \in S.$$
The integral of a continuous (that is, $C^0$) 1-form $\omega$ over a $C^1$ curve $\phi : [a, b] \to S$ is defined as
$$\int_\phi \omega = \int_a^b \Big[f_1\big(\phi(t)\big)\phi_1'(t) + \cdots + f_n\big(\phi(t)\big)\phi_n'(t)\Big]\,dt = \int_a^b \vec{F}\big(\phi(t)\big)\cdot\phi'(t)\,dt,$$
where $\vec{F} := (f_1, \ldots, f_n)$. If $\phi$ is only piecewise $C^1$, then $\int_\phi \omega$ is defined to be the sum of the integrals over the intervals on which $\phi$ is $C^1$. The following properties of the integral are easily established:

• $\displaystyle\int_\phi (a\omega + b\eta) = a\int_\phi \omega + b\int_\phi \eta$,
• $\displaystyle\int_{-\phi} \omega = -\int_\phi \omega$, and
• $\displaystyle\int_{\phi_1 + \cdots + \phi_k} \omega = \sum_{j=1}^k \int_{\phi_j} \omega$.


A continuous 1-form $\omega = f_1\,dx_1 + \cdots + f_n\,dx_n$ on an open set $U \subseteq \mathbb{R}^n$ is said to be exact if there exists a $C^1$ function $f$ on $U$ such that $f_j = \partial_j f$ on $U$ for each $j$. We then write
$$\omega = df = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\,dx_j.$$
The following proposition shows that the integral of an exact form over a curve depends only on $f$ and the endpoints of the curve.

12.2.9 Proposition. If $\phi : [a, b] \to U$ is piecewise $C^1$, then
$$\int_\phi df = f\big(\phi(b)\big) - f\big(\phi(a)\big).$$

Proof. If $\phi$ is $C^1$, then, by the chain rule and the fundamental theorem of calculus,
$$\int_\phi df = \int_a^b \sum_{j=1}^n (\partial_j f)\big(\phi(t)\big)\,\phi_j'(t)\,dt = \int_a^b (f\circ\phi)'(t)\,dt = f\big(\phi(b)\big) - f\big(\phi(a)\big).$$

If $\phi$ is only piecewise $C^1$, subdivide the interval $[a, b]$ into intervals on which $\phi$ is smooth, apply the above result to each subinterval, and sum the results.

12.2.10 Theorem. Let $U \subseteq \mathbb{R}^n$ be open and connected and let $\omega$ be a continuous 1-form on $U$. The following statements are equivalent:
(a) $\omega$ is exact.
(b) $\int_\phi \omega = 0$ for every closed piecewise $C^1$ curve $\phi$ in $U$.
(c) $\int_\varphi \omega = \int_\psi \omega$ for every pair of piecewise $C^1$ curves $\varphi, \psi : [a, b] \to \mathbb{R}^n$ in $U$ with $\varphi(a) = \psi(a)$ and $\varphi(b) = \psi(b)$.

Proof. That (a) implies (b) follows from 12.2.9.

FIGURE 12.5: $\phi = \psi - \varphi$.

For (b) implies (c), define a closed, piecewise smooth curve $\phi : [a, b + 1] \to \mathbb{R}^n$ by
$$\phi(t) = \psi(t),\quad a \le t \le b, \qquad \phi(t) = \varphi\big(b + (b - t)(b - a)\big),\quad b \le t \le b + 1.$$

(See Figure 12.5.) Then $\phi|_{[b,b+1]}$ is equivalent to $-\varphi$, hence if (b) holds,
$$0 = \int_\phi \omega = \int_\psi \omega - \int_\varphi \omega,$$
proving (c).

FIGURE 12.6: $\phi_{x+te_j} = \phi_x + \psi_t$.

Now assume that (c) holds and let $\omega = \sum_{j=1}^n f_j\,dx_j$. To establish (a), we construct a function $f$ on $U$ such that $\partial_j f = f_j$. Choose any point $a \in U$. By Exercise 8.7.8, for each $x \in U$ there exists a piecewise $C^1$ curve $\phi_x$ in $U$ with initial point $a$ and terminal point $x$. Define $f(x) = \int_{\phi_x} \omega$. By (c), $f(x)$ is independent of the path and hence is well-defined. Fix $j$, let $t > 0$, and denote by $\psi_t$ the line segment $x + ue_j$, $0 \le u \le t$. Then $\psi_t$ lies in $U$ for sufficiently small $t > 0$, and by continuity of $f_j$,
$$\frac{1}{t}\big[f(x + te_j) - f(x)\big] = \frac{1}{t}\int_{\psi_t} \omega = \frac{1}{t}\int_0^t f_j(x + ue_j)\,du \to f_j(x)$$
as $t \to 0^+$. A similar argument works for the case $t \to 0^-$. Therefore, $\partial_j f(x) = f_j(x)$, as required.
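The equivalence of exactness and path independence can be seen in a small numerical experiment. The sketch below is illustrative and rests on assumed examples: the exact form $\omega = y\,dx + x\,dy = d(xy)$ and the non-exact form $\eta = -y\,dx + x\,dy$ are each integrated along two hypothetical paths from $(0, 0)$ to $(1, 2)$, a straight segment and a parabolic arc, using a midpoint rule for $\int_a^b \vec{F}(\phi(t))\cdot\phi'(t)\,dt$.

```python
def integrate_form(F, curve, velocity, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of F(curve(t)) . curve'(t) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += sum(fj * vj for fj, vj in zip(F(curve(t)), velocity(t))) * h
    return total

def exact_form(p):
    """Coefficients of omega = y dx + x dy, which equals d(xy)."""
    x, y = p
    return (y, x)

def non_exact_form(p):
    """Coefficients of eta = -y dx + x dy, which is not exact."""
    x, y = p
    return (-y, x)

# Two paths from (0, 0) to (1, 2), both parametrized on [0, 1].
def segment(t):
    return (t, 2.0 * t)

def d_segment(t):
    return (1.0, 2.0)

def arc(t):
    return (t, 2.0 * t * t)

def d_arc(t):
    return (1.0, 4.0 * t)

for name, form in (("omega", exact_form), ("eta", non_exact_form)):
    along_segment = integrate_form(form, segment, d_segment, 0.0, 1.0)
    along_arc = integrate_form(form, arc, d_arc, 0.0, 1.0)
    print(name, along_segment, along_arc)
```

For $\omega$ both integrals should be close to $f(1, 2) - f(0, 0) = 2$ with $f(x, y) = xy$, in line with 12.2.9. For $\eta$ the two paths give different values (approximately $0$ and $2/3$), so $\eta$ fails condition (c) and is not exact.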

Exercises

1.S Determine which of the following curves are rectifiable.
(a) $\phi(t) = (t, t^{-p})$, $0 < t \le 1$, where $p > 0$.
(b) $\phi(t) = \big(e^{-t}, e^{-t^2}, e^{-t^3}\big)$, $0 \le t < +\infty$.
(c) $\phi(t) = \big(t^{-1}, e^{-t}\big)$, $t \ge 1$.
(d) $\phi(t) = \big(t^{-1}, e^{-t}\big)$, $0 < t \le 1$.

2. Evaluate $\int_\phi f$ for
(a) $\phi(t) = (t^3/3, t^4/4)$, $1 \le t \le 2$, $f(x, y) = x/y$.
(b)S $\phi(t) = \big(t, \sin(2t), \cos(2t)\big)$, $0 \le t \le \pi/4$, $f(x, y, z) = xz$.
(c) $\phi(t) = \big(t, t^2/2, t^3/3\big)$, $0 \le t \le 1$, $f(x, y, z) = x + 6z$.
(d) $\phi(t) = \big(\sin t, \sqrt{2}\cos t, \sin t\big)$, $0 \le t \le \pi/2$, $f(x, y, z) = xyz$.


3. Set up, but do not evaluate, the integral that gives the circumference of the ellipse $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$. (Your answer should involve $\sin^2 t$.)

4. In each case below, find a smooth simple curve or a smooth simple closed curve with trace $C$. Use the parametrization to find an integral that gives the length of the curve. (Do not evaluate the integral.)
(a) $C = \{(x, y) : x^3 - 7y^2 = 1,\ 1 < x < 2,\ y > 0\}$.
(b)S $C = \{(x, y) : 9(x - 1)^2 + 4(y - 2)^2 = 36\}$.
(c) $C = \{(x, y) : x^2 - y^2 = 4,\ x > 2,\ 0 < x + y < a\}$.

5. Let $\phi(x) = (x, g(x))$, $a \le x \le b$, where $g$ is continuously differentiable, and let $f(x, y)$ be continuous on the graph of $g$. Show that
$$\int_\phi f = \int_a^b f\big(x, g(x)\big)\sqrt{1 + [g'(x)]^2}\,dx.$$
Use this to find
(a) $\int_\phi f$ if $g(x) = (2/5)x^{5/2}$ and $f(x, y) = x^2$, $0 \le x \le 1$.
(b)S the length of the graph of the equation $x^{2/3} + y^{2/3} = 1$.
(c) the length of the graph of the function $g(x) = \dfrac{x^p}{2p} + \dfrac{x^{2-p}}{2(p - 2)}$, where $0 < a \le x \le b$ and $p > 2$.

6. Prove 12.2.5.

7.S Let a smooth curve $\phi : [a, b] \to \mathbb{R}^2$ be described in polar coordinates by $\phi(t) = \big(r(t)\cos\theta(t), r(t)\sin\theta(t)\big)$, $r(t) \ge 0$. Show that
$$\operatorname{length}(\phi) = \int_a^b \sqrt{\big[r(t)\theta'(t)\big]^2 + \big[r'(t)\big]^2}\,dt.$$

8. Let $\vec{F} = (F_1, F_2, F_3)$ be a force field in $\mathbb{R}^3$ that moves a particle of mass $m$ along a smooth curve $\phi : [a, b] \to \mathbb{R}^3$. The kinetic energy of the particle at time $t$ is defined as $\tfrac{1}{2}m\|\phi'(t)\|^2$. Use Newton's second law $\vec{F} = m\phi''$ to show that the work done by the force in moving the particle from $\phi(a)$ to $\phi(b)$ is the change in kinetic energy
$$\tfrac{1}{2}m\|\phi'(b)\|^2 - \tfrac{1}{2}m\|\phi'(a)\|^2.$$


9. A force field $\vec F$ in $\mathbb{R}^3$ is said to be conservative if there exists a function $P(x, y, z)$ such that $\vec F = -\nabla P$. $P(x, y, z)$ is called the potential energy of an object at the point $(x, y, z)$.
(a)S Show that the work done by a conservative force in moving the object along a curve $\varphi$ from $\varphi(a)$ to $\varphi(b)$ is $P\bigl(\varphi(a)\bigr) - P\bigl(\varphi(b)\bigr)$.
(b) Deduce the Law of Conservation of Energy
\[
P\bigl(\varphi(b)\bigr) + \tfrac{1}{2}m\|\varphi'(b)\|^2 = P\bigl(\varphi(a)\bigr) + \tfrac{1}{2}m\|\varphi'(a)\|^2,
\]
that is, the sum of the potential and kinetic energies is constant.
(c) Find a potential function for the gravitational force field $F(x) = -mMG\|x\|^{-3}x$, where $M$ is the mass of the earth (concentrated at the origin, the center of the earth), $m$ is the mass of the particle at point $x$, and $G$ is the gravitation constant.

10. For a smooth curve $\varphi : [a, b] \to \mathbb{R}^n$, define the arc length function $s = s(t)$ by
\[
s(t) = \int_a^t \|\varphi'(\tau)\|\,d\tau, \quad a \le t \le b.
\]
Show that $s$ has a smooth inverse $t = t(s)$, $0 \le s \le \ell := \operatorname{length}(\varphi)$. The curve $\psi(s) = \varphi(t(s))$ is called a reparametrization of $\varphi$ by arc length. Show that, for a continuous vector field $\vec F$ on $\operatorname{trace}(\varphi) = \operatorname{trace}(\psi)$,
\[
\int_\varphi \vec F \cdot \vec T_\varphi = \int_0^\ell \vec F\bigl(\psi(s)\bigr) \cdot \psi'(s)\,ds.
\]

11. Let $P = \{a = t_0 < t_1 < \cdots < t_k = b\}$ be a partition of $[a, b]$. For $f : [a, b] \to \mathbb{R}$, define
\[
V_P(f) = \sum_{j=1}^{k} |f(t_j) - f(t_{j-1})|.
\]
Then $f$ is said to have bounded variation on the interval $[a, b]$ if $\sup_P V_P(f) < +\infty$. (Section 5.9.) Show that a curve $\varphi = (\varphi_1, \dots, \varphi_n) : [a, b] \to \mathbb{R}^n$ is rectifiable iff each component function $\varphi_i$ has bounded variation on $[a, b]$.


12.3 Parameterized Surfaces

12.3.1 Definition. Let $1 \le m \le n$. A smooth parameterized $m$-surface in $\mathbb{R}^n$ is a $C^1$ function $\varphi = (\varphi_1, \dots, \varphi_n) : U \to \mathbb{R}^n$, where $U \subseteq \mathbb{R}^m$ is open and the derivative $\varphi'(u)$ has rank $m$ at each point $u \in U$. A reparametrization of $\varphi$ is a smooth parameterized $m$-surface $\psi = \varphi \circ \alpha : V \to \mathbb{R}^n$, where $V \subseteq \mathbb{R}^m$ is open and $\alpha : V \to U$ is $C^1$ with $C^1$ inverse $\alpha^{-1} : U \to V$ such that $J_\alpha > 0$ on $V$. In this case, $\varphi$ and $\psi$ are said to be equivalent. ♦

We shall usually drop the qualifier "smooth" when referring to parameterized surfaces. Note that the parameter set $U$ is itself a parameterized $m$-surface in $\mathbb{R}^m$: here, we take $\varphi$ to be the identity map $\iota : U \to U$.

Tangent Spaces of a Parameterized Surface

Let $\varphi : U \to \mathbb{R}^n$ be a parameterized $m$-surface and $u \in U$. For small $|t|$ the line segment $u + te_j$ is contained in $U$ and is mapped by $\varphi$ onto a curve in $S := \varphi(U)$ with tangent vector
\[
\frac{d}{dt}\Big|_{t=0}\varphi(u + te_j) = d\varphi_u(e_j) = \Bigl(\frac{\partial\varphi_1}{\partial u_j}(u), \dots, \frac{\partial\varphi_n}{\partial u_j}(u)\Bigr) =: \partial_j\varphi(u),
\]
where $e_1, \dots, e_m$ are the standard basis vectors in $\mathbb{R}^m$. Note that $\partial_j\varphi(u)$ is just the $j$th column of $\varphi'(u)$. Since $\varphi'(u)$ has rank $m$, the vectors $d\varphi_u(e_j)$ are linearly independent and hence form a basis for an $m$-dimensional subspace $T_{\varphi(u)}$ of $\mathbb{R}^n$, called the tangent space of $\varphi$ at $u$. Thus $d\varphi_u$ is a linear isomorphism from $\mathbb{R}^m$ onto $T_{\varphi(u)}$ mapping the frame $(e_1, \dots, e_m)$ onto the frame $(\partial_1\varphi(u), \dots, \partial_m\varphi(u))$.¹ Note that $\varphi$ is not assumed to be one-to-one, and $\varphi(u) = \varphi(v)$ does not necessarily imply that $T_{\varphi(u)} = T_{\varphi(v)}$. (See Figure 12.7.)

FIGURE 12.7: Tangent spaces at $p = \varphi(u) = \varphi(v)$.

¹ A frame in a finite dimensional vector space is simply an ordered basis (see Appendix B).


Orientation of a Parameterized m-Surface

Tangent spaces may be used to assign an orientation to a parameterized $m$-surface, a notion that will be needed later to construct the integral of a differential form on a surface. First, we define orientation for the space $\mathbb{R}^m$. Two frames $(v^1, \dots, v^m)$ and $(w^1, \dots, w^m)$ in $\mathbb{R}^m$ are said to be orientation equivalent if the determinants of the matrices
\[
\bigl[\,v^1 \ \cdots \ v^m\,\bigr] \quad\text{and}\quad \bigl[\,w^1 \ \cdots \ w^m\,\bigr]
\]
(where $v^j$ and $w^j$ are written as column vectors) have the same sign. Orientation equivalence is easily seen to be an equivalence relation. The collection of frames of $\mathbb{R}^m$ is therefore partitioned into two classes, one that contains $(e_1, \dots, e_m)$ and the other containing $(-e_1, e_2, \dots, e_m)$. An orientation is assigned to $\mathbb{R}^m$ by designating one of these equivalence classes to be positive and the other negative. Any frame in the former class is then said to have positive orientation, while a frame in the latter class is said to have negative orientation. For example, if $m = 3$ and $(v^1, v^2, v^3)$ has positive orientation, then so does $(v^2, v^3, v^1)$, while $(v^2, v^1, v^3)$ has negative orientation. By convention, the standard or positive orientation of $\mathbb{R}^m$ is the orientation obtained by designating the frame $(e_1, \dots, e_m)$ to be positive. For example, in the standard orientation, the sign of the frame $(e_m, e_1, \dots, e_{m-1})$ is $(-1)^{m-1}$. We shall always assume that the spaces $\mathbb{R}^m$ have the standard orientation.

A parameterized $m$-surface $\varphi : U \to \mathbb{R}^n$ is said to be orientable if, whenever $\varphi(u) = \varphi(v)$,
• $T_{\varphi(u)} = T_{\varphi(v)}$ and
• the matrix of the linear transformation
\[
T_{uv} = (d\varphi_v)^{-1} \circ d\varphi_u : \mathbb{R}^m \to \mathbb{R}^m \tag{12.4}
\]
has positive determinant.

Frames $(\xi^1, \dots, \xi^m)$ and $(\zeta^1, \dots, \zeta^m)$ in $T_{\varphi(u)}$ are then declared to be orientation equivalent if the frames
\[
d\varphi_u^{-1}(\xi^1, \dots, \xi^m) := \bigl(d\varphi_u^{-1}(\xi^1), \dots, d\varphi_u^{-1}(\xi^m)\bigr)
\quad\text{and}\quad
d\varphi_u^{-1}(\zeta^1, \dots, \zeta^m) := \bigl(d\varphi_u^{-1}(\zeta^1), \dots, d\varphi_u^{-1}(\zeta^m)\bigr)
\]
are orientation equivalent in $\mathbb{R}^m$. Since
\[
T_{uv} \circ (d\varphi_u)^{-1}(\xi^1, \dots, \xi^m) = (d\varphi_v)^{-1}(\xi^1, \dots, \xi^m)
\]
and $\det T_{uv} > 0$, the frames $d\varphi_u^{-1}(\xi^1, \dots, \xi^m)$ and $d\varphi_v^{-1}(\xi^1, \dots, \xi^m)$ have the same sign, hence the notion of orientation equivalence in the common tangent space $T_{\varphi(u)} = T_{\varphi(v)}$ is well-defined. As with the vector space $\mathbb{R}^m$, orientation


equivalence on $T_{\varphi(u)}$ is an equivalence relation with two equivalence classes, one containing $d\varphi_u(e_1, \dots, e_m)$, the other containing $d\varphi_u(-e_1, e_2, \dots, e_m)$. The positive (negative) orientation of $\varphi$ is obtained by designating the equivalence class containing $d\varphi_u(e_1, \dots, e_m)$ to be positive (negative) for every $u \in U$. We define the sign of $\varphi$ by
\[
\operatorname{sign}(\varphi) =
\begin{cases}
+1 & \text{if } \varphi \text{ is positively oriented,} \\
-1 & \text{if } \varphi \text{ is negatively oriented.}
\end{cases}
\]
Obviously, if $\varphi$ is one-to-one, then it is orientable. For example, a simple smooth curve $\varphi : I \to \mathbb{R}^n$ is orientable, and since
\[
d\varphi_t(e_1) = \varphi'(t) = \lim_{\Delta t \to 0^+} \frac{\varphi(t + \Delta t) - \varphi(t)}{\Delta t},
\]
the positive orientation is the one for which the tangent vector $d\varphi_t(e_1)$ is in the direction of increasing $t$. By contrast, the curve in Figure 12.7 is not orientable.

12.3.2 Example. Let $a_1, \dots, a_m$ be linearly independent vectors in $\mathbb{R}^n$ and let $b \in \mathbb{R}^n$. Define the $m$-dimensional parameterized affine space $\varphi : \mathbb{R}^m \to \mathbb{R}^n$ by
\[
\varphi(u) = \varphi(u_1, \dots, u_m) = b + \sum_{i=1}^{m} u_i a_i.
\]

FIGURE 12.8: Affine space.

Since $\varphi$ is one-to-one, it is orientable. Since $\partial_i\varphi = a_i$, the tangent space at each point is the subspace of $\mathbb{R}^n$ with frame $(a_1, \dots, a_m)$. ♦

12.3.3 Example. The Cartesian product of circles
\[
\varphi(\theta_1, \dots, \theta_m) = (r_1\cos\theta_1, r_1\sin\theta_1, \dots, r_m\cos\theta_m, r_m\sin\theta_m), \quad r_i > 0,
\]
is a parameterized $m$-surface in $\mathbb{R}^{2m}$. Orientability follows from the periodicity of the sine and cosine functions. ♦
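The sign conventions for frames in $\mathbb{R}^m$ stated earlier in this section can be checked mechanically. The following is a minimal sketch, not part of the text: the helper names `det`, `standard_basis`, and `frame_sign` are illustrative, and the determinant is computed by cofactor expansion, which is adequate only for small $m$.

```python
def det(matrix):
    """Determinant by cofactor expansion along the first row (fine for small sizes)."""
    size = len(matrix)
    if size == 1:
        return matrix[0][0]
    total = 0
    for col in range(size):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * det(minor)
    return total


def standard_basis(m):
    """Return the frame (e_1, ..., e_m) as a list of coordinate lists."""
    return [[1 if i == j else 0 for i in range(m)] for j in range(m)]


def frame_sign(frame):
    """Sign of a frame of R^m in the standard orientation.

    The vectors are stored as rows here; since det(A) = det(A^t), this gives
    the same sign as the matrix whose columns are the frame vectors.
    """
    value = det([list(vector) for vector in frame])
    if value == 0:
        raise ValueError("linearly dependent vectors do not form a frame")
    return 1 if value > 0 else -1


e1, e2, e3 = standard_basis(3)
cyclic = frame_sign([e2, e3, e1])   # cyclic permutation of a positive frame
swapped = frame_sign([e2, e1, e3])  # one transposition of a positive frame
```

With the standard basis playing the role of $(v^1, v^2, v^3)$, `cyclic` is positive and `swapped` is negative, and `frame_sign([e_m, e_1, ..., e_{m-1}])` reproduces the sign $(-1)^{m-1}$.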


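Orientation of a Parameterized (n − 1)-Surface

For $m = n - 1$, the notion of orientability may be formulated more concretely in terms of a normal vector field.

12.3.4 Lemma. Let $\varphi : U \to \mathbb{R}^n$ be a parameterized $(n - 1)$-surface. Define $\partial\varphi^\perp : U \to \mathbb{R}^n$ by
\[
\partial\varphi^\perp := \sum_{i=1}^{n} (-1)^{i+n}\,\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(u_1, \dots, u_{n-1})}\,e^i,
\]
where the hat indicates that $\varphi_i$ is omitted in the calculation, and let
\[
A := \begin{bmatrix} \partial_1\varphi(u) \\ \vdots \\ \partial_{n-1}\varphi(u) \\ \partial\varphi^\perp(u) \end{bmatrix}_{n \times n}.
\]
Then $\partial\varphi^\perp(u)$ is perpendicular to the tangent space $T_{\varphi(u)}$, and
\[
|A| = \|\partial\varphi^\perp(u)\|^2 = \det\bigl(\varphi'(u)^t\varphi'(u)\bigr) > 0. \tag{12.5}
\]

Proof. Let $m = n - 1$. For each $j$, the determinant
\[
D_j(u) := \begin{vmatrix}
\partial_j\varphi_1(u) & \cdots & \partial_j\varphi_n(u) \\
\partial_1\varphi_1(u) & \cdots & \partial_1\varphi_n(u) \\
\vdots & & \vdots \\
\partial_m\varphi_1(u) & \cdots & \partial_m\varphi_n(u)
\end{vmatrix}
\]
has two identical rows and hence is zero. Expanding $D_j(u)$ along the first row and multiplying by $(-1)^m$ yields
\[
(-1)^m D_j(u) = (-1)^m \sum_{i=1}^{n} (-1)^{i+1}\,\partial_j\varphi_i(u)\,\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(u_1, \dots, u_{n-1})}(u) = \partial_j\varphi(u) \cdot \partial\varphi^\perp(u).
\]
Therefore, $\partial_j\varphi(u) \cdot \partial\varphi^\perp(u) = 0$, so $\partial\varphi^\perp(u)$ is perpendicular to $T_{\varphi(u)}$. To prove the first equality in (12.5), expand $|A|$ along the last row to obtain
\[
|A| = \sum_{i=1}^{n} \left[\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(u_1, \dots, u_m)}(u)\right]^2 = \|\partial\varphi^\perp(u)\|^2 > 0,
\]
the inequality holding because $\varphi'$ has rank $m$. For the second equality in (12.5), using what has already been established we calculate
\[
\|\partial\varphi^\perp(u)\|^4 = |A|^2 = |AA^t| =
\begin{vmatrix}
\partial_1\varphi(u) \cdot \partial_1\varphi(u) & \cdots & \partial_1\varphi(u) \cdot \partial_m\varphi(u) & 0 \\
\vdots & & \vdots & \vdots \\
\partial_m\varphi(u) \cdot \partial_1\varphi(u) & \cdots & \partial_m\varphi(u) \cdot \partial_m\varphi(u) & 0 \\
0 & \cdots & 0 & \|\partial\varphi^\perp(u)\|^2
\end{vmatrix}
= \|\partial\varphi^\perp(u)\|^2 \det\bigl(\varphi'(u)^t\varphi'(u)\bigr).
\]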

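12.3.5 Corollary. The frame $\bigl(d\varphi_u(e_1), \dots, d\varphi_u(e_{n-1}), \partial\varphi^\perp(u)\bigr)$ is positively oriented in $\mathbb{R}^n$.

12.3.6 Theorem. Let $\varphi : U \to \mathbb{R}^n$ be a parameterized $(n - 1)$-surface. The following statements are equivalent:
(a) $\varphi$ is orientable.
(b) $\varphi(u) = \varphi(v) \Rightarrow \partial\varphi^\perp(u) = c\,\partial\varphi^\perp(v)$ for some $c > 0$.
(c) There exists a function $\vec N_\varphi : \varphi(U) \to \mathbb{R}^n$ (necessarily unique) such that
\[
\vec N_\varphi\bigl(\varphi(u)\bigr) = \|\partial\varphi^\perp(u)\|^{-1}\,\partial\varphi^\perp(u)
= \frac{1}{\sqrt{\det\bigl(\varphi'(u)^t\varphi'(u)\bigr)}} \sum_{i=1}^{n} (-1)^{i+n}\,\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(u_1, \dots, u_{n-1})}\,e^i. \tag{12.6}
\]

Proof. For $u \in U$, let $T_u : \mathbb{R}^n \to \mathbb{R}^n$ denote the unique linear isomorphism such that $T_u(e_j) = d\varphi_u(e_j)$, $1 \le j \le n - 1$, and $T_u(e_n) = \partial\varphi^\perp(u)$. By Lemma 12.3.4, $\det T_u = \|\partial\varphi^\perp(u)\|^2 > 0$. Suppose that $\varphi(u) = \varphi(v)$. Since $\partial\varphi^\perp(u) \perp T_{\varphi(u)}$ and $\partial\varphi^\perp(v) \perp T_{\varphi(v)}$,
\[
T_{\varphi(v)} = T_{\varphi(u)} \quad\text{iff}\quad \partial\varphi^\perp(u) = c\,\partial\varphi^\perp(v) \text{ for some } c \ne 0.
\]
In this case, by (12.4),
\[
(T_v^{-1}T_u)(e_j) = (d\varphi_v)^{-1}\bigl(d\varphi_u(e_j)\bigr) = T_{uv}(e_j), \quad 1 \le j \le n - 1,
\quad\text{and}\quad
(T_v^{-1}T_u)(e_n) = T_v^{-1}\bigl(c\,\partial\varphi^\perp(v)\bigr) = c\,e_n.
\]
Thus the matrix of $T_v^{-1}T_u$ has columns $T_{uv}(e_1), \dots, T_{uv}(e_{n-1}), c\,e_n$. It follows that
\[
0 < \det T_u / \det T_v = \det(T_v^{-1}T_u) = c \det T_{uv}. \tag{12.7}
\]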


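With these preliminaries out of the way, assume that $\vec N_\varphi$ exists and let $\varphi(u) = \varphi(v)$. Then
\[
\|\partial\varphi^\perp(u)\|^{-1}\,\partial\varphi^\perp(u) = \vec N_\varphi\bigl(\varphi(u)\bigr) = \vec N_\varphi\bigl(\varphi(v)\bigr) = \|\partial\varphi^\perp(v)\|^{-1}\,\partial\varphi^\perp(v),
\]
hence $\partial\varphi^\perp(u) = c\,\partial\varphi^\perp(v)$ for some $c > 0$. By the first paragraph, $T_{\varphi(u)} = T_{\varphi(v)}$ and $\det T_{uv} > 0$. Therefore, $\varphi$ is orientable. Conversely, assume that $\varphi$ is orientable and let $\varphi(u) = \varphi(v)$. Then $T_{\varphi(u)} = T_{\varphi(v)}$, hence $\partial\varphi^\perp(u) = c\,\partial\varphi^\perp(v)$ for some $c \ne 0$. Since $\det T_{uv} > 0$, $c > 0$ by (12.7). Therefore,
\[
\|\partial\varphi^\perp(u)\|^{-1}\,\partial\varphi^\perp(u) = \|\partial\varphi^\perp(v)\|^{-1}\,\partial\varphi^\perp(v),
\]
so $\vec N_\varphi$ may be unambiguously defined by (12.6).

12.3.7 Special Cases.
(a) $n = 2$: Then $\partial\varphi^\perp = (-\varphi_2', \varphi_1')$, the inward normal. (Figure 12.9.)

FIGURE 12.9: The inward unit normal.

(b) $n = 3$: Then
\[
\partial\varphi^\perp = \left(
\begin{vmatrix} \partial_1\varphi_2 & \partial_1\varphi_3 \\ \partial_2\varphi_2 & \partial_2\varphi_3 \end{vmatrix},\
-\begin{vmatrix} \partial_1\varphi_1 & \partial_1\varphi_3 \\ \partial_2\varphi_1 & \partial_2\varphi_3 \end{vmatrix},\
\begin{vmatrix} \partial_1\varphi_1 & \partial_1\varphi_2 \\ \partial_2\varphi_1 & \partial_2\varphi_2 \end{vmatrix}
\right) = \partial_1\varphi \times \partial_2\varphi,
\]
the familiar cross product of $\partial_1\varphi$ and $\partial_2\varphi$. Thus the positively oriented frame $\bigl(d\varphi_u(e_1), d\varphi_u(e_2), \vec N_\varphi(p)\bigr)$ is a right-handed system, as shown in Figure 12.10.

FIGURE 12.10: Normal vector to $S$ at $p$.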
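The cofactor formula for $\partial\varphi^\perp$ and the identity (12.5) can be checked on concrete numbers. The sketch below is an illustration only: the function names and the sample tangent vectors are assumptions made here, and the input is a list of the vectors $\partial_1\varphi(u), \dots, \partial_{n-1}\varphi(u)$ rather than a surface.

```python
def det(matrix):
    """Determinant by cofactor expansion along the first row (fine for small sizes)."""
    size = len(matrix)
    if size == 1:
        return matrix[0][0]
    total = 0
    for col in range(size):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * matrix[0][col] * det(minor)
    return total


def normal_vector(partials):
    """Compute the vector called d-phi-perp in Lemma 12.3.4.

    partials[j] holds the coordinates of the (j+1)-th partial derivative vector in R^n.
    """
    n = len(partials) + 1
    components = []
    for i in range(n):
        # Jacobian of the component functions with the (i+1)-th one omitted.
        jacobian = [[partial[k] for partial in partials] for k in range(n) if k != i]
        sign = (-1) ** ((i + 1) + n)  # the text's (-1)^(i+n) with 1-based i
        components.append(sign * det(jacobian))
    return components


def gram_determinant(partials):
    """det(phi'(u)^t phi'(u)), computed from the pairwise dot products."""
    gram = [[sum(a * b for a, b in zip(p, q)) for q in partials] for p in partials]
    return det(gram)


d1, d2 = [1, 2, 3], [-1, 0, 4]      # sample tangent vectors in R^3 (assumed data)
normal = normal_vector([d1, d2])    # agrees with the cross product d1 x d2
```

For $n = 3$ the result coincides with the cross product, and for $n = 2$ the call `normal_vector([[3, 4]])` gives `[-4, 3]`, matching special case (a).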


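(c) Let $U \subseteq \mathbb{R}^{n-1}$ be open and let $g : U \to \mathbb{R}$ be $C^1$. Define
\[
\varphi(u_1, \dots, u_{n-1}) = \bigl(u_1, \dots, u_{n-1}, g(u_1, \dots, u_{n-1})\bigr).
\]
Then $\varphi(U)$ is the graph of $g$. Since $\varphi$ is one-to-one, it is orientable. Also,
\[
\partial_j\varphi = (0, \dots, 0, \underbrace{1}_{j}, 0, \dots, 0, \partial_j g) \perp (-\partial_1 g, \dots, -\partial_j g, \dots, -\partial_{n-1} g, 1)
\]
and, by elementary row operations,
\[
\begin{vmatrix}
1 & \cdots & 0 & \partial_1 g \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 1 & \partial_{n-1} g \\
-\partial_1 g & \cdots & -\partial_{n-1} g & 1
\end{vmatrix}
= (\partial_1 g)^2 + \cdots + (\partial_{n-1} g)^2 + 1.
\]
Since this is positive, by uniqueness,
\[
\vec N_\varphi \circ \varphi = \frac{(-\partial_1 g, \dots, -\partial_{n-1} g, 1)}{\sqrt{(\partial_1 g)^2 + \cdots + (\partial_{n-1} g)^2 + 1}} = \frac{(-\nabla g, 1)}{\sqrt{\|\nabla g\|^2 + 1}}. \qquad ♦
\]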

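12.3.8 Example. Let $r > 0$ and define
\[
\varphi(\theta_1, \theta_2) = (r\sin\theta_1\cos\theta_2, r\sin\theta_1\sin\theta_2, r\cos\theta_1), \quad \theta_1 \in (0, \pi),\ \theta_2 \in (0, 2\pi).
\]
The image of $\varphi$ is the sphere in $\mathbb{R}^3$ with radius $r$ and center $(0, 0, 0)$ and with the great semicircle $(r\sin\theta_1, 0, r\cos\theta_1)$, $0 \le \theta_1 \le \pi$ (that is, $\theta_2 = 0$), through the poles $(0, 0, \pm r)$ missing. Since
\[
\partial_1\varphi(\theta_1, \theta_2) = r(\cos\theta_1\cos\theta_2, \cos\theta_1\sin\theta_2, -\sin\theta_1)
\quad\text{and}\quad
\partial_2\varphi(\theta_1, \theta_2) = r(-\sin\theta_1\sin\theta_2, \sin\theta_1\cos\theta_2, 0),
\]
by 12.3.7,
\[
\partial\varphi^\perp(\theta_1, \theta_2) = \partial_1\varphi(\theta_1, \theta_2) \times \partial_2\varphi(\theta_1, \theta_2)
= r\sin\theta_1\,(r\sin\theta_1\cos\theta_2, r\sin\theta_1\sin\theta_2, r\cos\theta_1)
= (r\sin\theta_1)\,\varphi(\theta_1, \theta_2).
\]
Therefore,
\[
(\vec N_\varphi \circ \varphi)(\theta_1, \theta_2) = \frac{\varphi(\theta_1, \theta_2)}{\|\varphi(\theta_1, \theta_2)\|} = r^{-1}\varphi(\theta_1, \theta_2),
\]
that is,
\[
\vec N_\varphi(p) = \frac{p}{\|p\|}, \quad p \in S. \qquad ♦
\]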


FIGURE 12.11: Surface of revolution.

12.3.9 Example. Let $I$ be an open interval and $\psi : I \to \mathbb{R}^2$ a smooth curve with $\psi_2(t) > 0$ for $t \in I$. The parameterized surface of revolution in $\mathbb{R}^3$ is defined by
\[
\varphi(t, \theta) = \bigl(\psi_1(t), \psi_2(t)\cos\theta, \psi_2(t)\sin\theta\bigr), \quad t \in I,\ \theta \in \mathbb{R}.
\]
From 12.3.7 and the calculations
\[
\partial_1\varphi(t, \theta) = \bigl(\psi_1'(t), \psi_2'(t)\cos\theta, \psi_2'(t)\sin\theta\bigr), \quad
\partial_2\varphi(t, \theta) = \bigl(0, -\psi_2(t)\sin\theta, \psi_2(t)\cos\theta\bigr),
\]
we have
\[
\partial\varphi^\perp(t, \theta) = \psi_2(t)\bigl(\psi_2'(t), -\psi_1'(t)\cos\theta, -\psi_1'(t)\sin\theta\bigr) \tag{12.8}
\]
and
\[
\partial\psi^\perp(t) = \bigl(-\psi_2'(t), \psi_1'(t)\bigr).
\]
Now suppose that $\psi$ is orientable. We claim that $\varphi$ is then orientable. To see this, suppose that $\varphi(t_1, \theta_1) = \varphi(t_2, \theta_2)$. Then $\psi_1(t_1) = \psi_1(t_2)$, and because $\psi_2(t) > 0$, $\psi_2(t_1) = \psi_2(t_2)$ and hence $\theta_2 = \theta_1 + 2k\pi$. By orientability of $\psi$,
\[
\bigl(-\psi_2'(t_2), \psi_1'(t_2)\bigr) = \partial\psi^\perp(t_2) = c\,\partial\psi^\perp(t_1) = c\bigl(-\psi_2'(t_1), \psi_1'(t_1)\bigr)
\]
for some $c > 0$. It follows from (12.8) that
\[
\partial\varphi^\perp(t_2, \theta_2) = \psi_2(t_2)\bigl(\psi_2'(t_2), -\psi_1'(t_2)\cos\theta_2, -\psi_1'(t_2)\sin\theta_2\bigr)
= c\,\psi_2(t_1)\bigl(\psi_2'(t_1), -\psi_1'(t_1)\cos\theta_1, -\psi_1'(t_1)\sin\theta_1\bigr) = c\,\partial\varphi^\perp(t_1, \theta_1),
\]
which shows that $\varphi$ is orientable. Moreover, from (12.8),
\[
\|\partial\varphi^\perp(t, \theta)\| = \psi_2(t)\|\psi'(t)\|,
\]


hence
\[
\vec N_\varphi\bigl(\varphi(t, \theta)\bigr) = \frac{\partial\varphi^\perp(t, \theta)}{\|\partial\varphi^\perp(t, \theta)\|} = \|\psi'(t)\|^{-1}\bigl(\psi_2'(t), -\psi_1'(t)\cos\theta, -\psi_1'(t)\sin\theta\bigr),
\]
which is the rotation of the unit normal vector $-\vec N_\psi$ about the $x_1$ axis. For the special case $\psi(x) = \bigl(x, f(x)\bigr)$, $\varphi(x, \theta) = \bigl(x, f(x)\cos\theta, f(x)\sin\theta\bigr)$ and
\[
\vec N_\varphi\bigl(\varphi(x, \theta)\bigr) = \bigl([f'(x)]^2 + 1\bigr)^{-1/2}\bigl(f'(x), -\cos\theta, -\sin\theta\bigr).
\]
A point $(x, y, z)$ on the surface $S = \varphi(U)$ and not on the graph of $f$ may be written uniquely as $\bigl(x, f(x)\cos\theta(y, z), f(x)\sin\theta(y, z)\bigr)$, where $0 < \theta(y, z) < 2\pi$ is the (continuous) argument of $(y, z)$ determined by $\theta_0 = 0$ (see 9.4.6). Therefore,
\[
\vec N_\varphi(x, y, z) = \bigl([f'(x)]^2 + 1\bigr)^{-1/2}\bigl(f'(x), -\cos\theta(y, z), -\sin\theta(y, z)\bigr),
\]
which is continuous on $S$ by the periodicity of sine and cosine. ♦
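As a sanity check on the closed form for the unit normal of a surface of revolution generated by a graph, the following sketch compares it with the normalized cross product of the partial derivatives. The profile $f(x) = x^2 + 1$ used in the usage note is an assumed example (any positive $C^1$ function would do), and the function names are illustrative.

```python
import math


def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def revolution_normal(f, df, x, theta):
    """Unit normal of phi(x, theta) = (x, f(x) cos theta, f(x) sin theta) via the cross product."""
    d1 = (1.0, df(x) * math.cos(theta), df(x) * math.sin(theta))  # partial in x
    d2 = (0.0, -f(x) * math.sin(theta), f(x) * math.cos(theta))   # partial in theta
    n = cross(d1, d2)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


def closed_form_normal(df, x, theta):
    """The formula ([f'(x)]^2 + 1)^(-1/2) (f'(x), -cos theta, -sin theta)."""
    scale = 1.0 / math.sqrt(df(x) ** 2 + 1.0)
    return (scale * df(x), -scale * math.cos(theta), -scale * math.sin(theta))
```

For instance, with `f = lambda x: x * x + 1` and `df = lambda x: 2 * x`, the two functions should agree at any sample point such as $(x, \theta) = (0.5, 1.0)$ up to rounding.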

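12.3.10 Example. The parameterized Möbius strip is defined by
\[
\varphi(t, \theta) = \Bigl(\bigl(2 + t\cos\tfrac{1}{2}\theta\bigr)\cos\theta,\ \bigl(2 + t\cos\tfrac{1}{2}\theta\bigr)\sin\theta,\ t\sin\tfrac{1}{2}\theta\Bigr),
\]
where $-1 < t < 1$ and $\theta \in \mathbb{R}$. The surface may be concretely realized by taking one end of a long strip of paper, giving it a half-twist, and gluing it to the other end.

FIGURE 12.12: Möbius strip.

The Möbius strip is not orientable. Indeed, $\varphi(0, 0) = \varphi(0, 2\pi)$, but since
\[
\partial_1\varphi(0, 0) = -\partial_1\varphi(0, 2\pi) = (1, 0, 0) \quad\text{and}\quad \partial_2\varphi(0, 0) = \partial_2\varphi(0, 2\pi) = (0, 2, 0),
\]
we see that
\[
\partial_1\varphi(0, 0) \times \partial_2\varphi(0, 0) = (0, 0, 2) = -\partial_1\varphi(0, 2\pi) \times \partial_2\varphi(0, 2\pi).
\]
Therefore, $\vec N_\varphi$ cannot exist. ♦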
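The sign flip of the normal on the Möbius strip can be observed numerically. This is a rough sketch, not part of the text: it approximates the partial derivatives by central differences (the step `h` is an arbitrary choice) and the function names are illustrative.

```python
import math


def mobius(t, theta):
    """The parameterized Moebius strip of Example 12.3.10."""
    radius = 2.0 + t * math.cos(theta / 2.0)
    return (radius * math.cos(theta), radius * math.sin(theta), t * math.sin(theta / 2.0))


def partials(t, theta, h=1e-6):
    """Central-difference approximations of the two partial derivative vectors."""
    d1 = tuple((p - m) / (2 * h) for p, m in zip(mobius(t + h, theta), mobius(t - h, theta)))
    d2 = tuple((p - m) / (2 * h) for p, m in zip(mobius(t, theta + h), mobius(t, theta - h)))
    return d1, d2


def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normal_at(t, theta):
    """Approximation of the (unnormalized) normal, the cross product of the partials."""
    d1, d2 = partials(t, theta)
    return cross(d1, d2)
```

Evaluating `normal_at(0.0, 0.0)` and `normal_at(0.0, 2 * math.pi)` gives vectors that point in opposite directions along the $x_3$ axis, although both parameter values map to the same point of the strip.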


Exercises

1. Assuming that $\mathbb{R}^3$ has the standard orientation, find the sign of the frames
(a)S $(e_1 + e_2, e_2 + e_3, e_3 + e_1)$.
(b) $(-e_1 + e_2 + e_3, e_1 - e_2 + e_3, e_1 + e_2 - e_3)$.

2. Show that the frames $(e_1 + e_2 + e_3, 2e_1 + e_2 + 3e_3)$ and $(e_1 + 3e_2 - e_3, e_1 + 4e_2 - 2e_3)$ in $\mathbb{R}^3$ span the same subspace but have opposite orientations.

3. Let $\varphi : U \to \mathbb{R}^n$ be a parameterized $m$-surface and $\psi = \varphi \circ \alpha : V \to \mathbb{R}^n$ a reparametrization of $\varphi$. Show that $\varphi$ is orientable iff $\psi$ is orientable.

4. Let $\varphi : U \to \mathbb{R}^n$ be an orientable parameterized $(n - 1)$-surface and let $\psi = \varphi \circ \alpha : V \to \mathbb{R}^n$ be a reparametrization of $\varphi$. Find $\partial\psi^\perp$ in terms of $\partial\varphi^\perp$. Use the result to show that $\vec N_\psi = \vec N_\varphi$ on $S := \varphi(U) = \psi(V)$.

5.S Use 12.3.9 to find $\vec N_\varphi(x, y, z)$ for the torus
\[
\varphi(\phi, \theta) = \bigl(a\cos\phi, (b + a\sin\phi)\cos\theta, (b + a\sin\phi)\sin\theta\bigr), \quad 0 < \theta, \phi < 2\pi,
\]
where $0 < a < b$.

6. Find $\vec N_\varphi(x, y, z)$ for the following orientable 2-surfaces in $\mathbb{R}^3$:
(a)S $\varphi(t, \theta) = (t\cos\theta, t\sin\theta, t)$, $t > 0$, $\theta \in \mathbb{R}$.
(b) $\varphi(t, \theta) = (\sinh t, \cosh t\cos\theta, \cosh t\sin\theta)$, $t, \theta \in \mathbb{R}$ (hyperboloid of one sheet).
(c) $\varphi(t, \theta) = (\cosh t, \sinh t\cos\theta, \sinh t\sin\theta)$, $t, \theta \in \mathbb{R}$ (one sheet of a hyperboloid of two sheets).
(d) $\varphi(t, \theta) = (t\cos\theta, t\sin\theta, \theta)$, $t > 0$, $\theta \in \mathbb{R}$ (helicoid).
(e) $\varphi(t, \theta) = (t\cos\theta, t\sin\theta, \theta^2)$, $t > 0$, $\theta > 0$.
(f)S $\varphi(t, s) = (1 - s)(a\cos t, a\sin t, 0) + s(b\cos t, b\sin t, 1)$, $0 < s < 1$, where $0 < a < b$.

7.S Let $V \subseteq \mathbb{R}^{n-2}$ be open and let $\psi : V \to \mathbb{R}^{n-1}$ be an $(n - 2)$-parameterized surface in $\mathbb{R}^{n-1}$. Define the cylinder $\varphi$ over $\psi$ by
\[
\varphi(v, s) = \bigl(\psi(v), s\bigr), \quad v \in V,\ s \in (a, b).
\]
Show that
(a) $\varphi$ is a parameterized $(n - 1)$-surface in $\mathbb{R}^n$.
(b) $\partial\varphi^\perp(u_1, \dots, u_{n-1}) = \bigl(\partial\psi^\perp(u_1, \dots, u_{n-2}), 0\bigr)$.
(c) $\varphi$ is orientable iff $\psi$ is orientable, in which case $\vec N_\varphi(x_1, \dots, x_n) = \bigl(\vec N_\psi(x_1, \dots, x_{n-1}), 0\bigr)$.

8. Let $V \subseteq \mathbb{R}^{n-2}$ be open and let $\psi : V \to \mathbb{R}^{n-1}$ be an $(n - 2)$-parameterized surface in $\mathbb{R}^{n-1}$. Define the cone over $\psi$ by
\[
\varphi(v, s) = \bigl((1 - s)\psi(v), s\bigr), \quad v \in V,\ 0 < s < 1.
\]
Show that
(a) $\varphi$ is a parameterized $(n - 1)$-surface in $\mathbb{R}^n$.
(b) $\partial\varphi^\perp(v, s) = \bigl((1 - s)^{n-2}\,\partial\psi^\perp(v), D(v, s)\bigr)$, where
\[
D(v, s) = \begin{vmatrix}
(1 - s)a_{1,1} & \cdots & (1 - s)a_{1,n-2} & -\psi_1(v) \\
(1 - s)a_{2,1} & \cdots & (1 - s)a_{2,n-2} & -\psi_2(v) \\
\vdots & & \vdots & \vdots \\
(1 - s)a_{n-1,1} & \cdots & (1 - s)a_{n-1,n-2} & -\psi_{n-1}(v)
\end{vmatrix}
\]
and $[a_{i,j}]_{(n-1) \times (n-2)} = \psi'(v)$.

12.4 m-Dimensional Surfaces

Let $1 \le m < n$ and let $V \subseteq \mathbb{R}^n$ be open. Suppose that the function $F = (F_1, \dots, F_{n-m}) : V \to \mathbb{R}^{n-m}$ is $C^1$ on $V$ such that the $(n - m) \times n$ matrix $F'(x)$ has rank $n - m$ at each point $x \in V$. A set of the form
\[
S = \{x \in V : F(x) = c\},
\]
where $c \in \mathbb{R}^{n-m}$, is called an $m$-dimensional level surface of $F$ or simply an $m$-surface in $\mathbb{R}^n$. By replacing $F$ by $F - c$, we may (and hereafter shall) take $c = 0$.

Local Parametrization of an m-Surface

The following theorem shows that an $m$-surface may be "patched together" from a collection of one-to-one parameterized $m$-surfaces. This will be an important tool in the development of a theory of integration on $m$-surfaces.


12.4.1 Theorem. Let $S = \{x \in V : F(x) = 0\}$ be an $m$-surface in $\mathbb{R}^n$.
(a) For each $a \in S$ there exist open sets $U_a \subseteq \mathbb{R}^m$ and $V_a \subseteq \mathbb{R}^n$ with $a \in V_a$, and a one-to-one parameterized $m$-surface $\varphi_a$ from $U_a$ onto $S_a := S \cap V_a$.
(b) Each $\varphi_a^{-1}$ is the restriction to $S_a$ of a $C^1$ map on $V_a$.
(c) If $S_a \cap S_b \ne \emptyset$, then the mapping
\[
\varphi_{ab} := \varphi_b^{-1} \circ \varphi_a : \varphi_a^{-1}(S_a \cap S_b) \to \varphi_b^{-1}(S_a \cap S_b)
\]
is $C^1$ with inverse $\varphi_{ba}$.
(d) The mappings $\varphi_a$ may be chosen so that $0 \in U_a$ and $\varphi_a(0) = a$.

Proof. If (a)–(c) of the theorem hold and $a = \varphi_a(u_0)$, then (d) may be achieved by replacing $U_a$ by $U_a - u_0$ and $\varphi_a$ by $\varphi_a(u + u_0)$, $u \in U_a - u_0$. We prove (a)–(c) first for the case $m = n - 1$, that is, for $F$ real-valued, and then outline the proof for the general case. Since $F$ has rank 1, $\partial_i F(a) \ne 0$ for some index $i$ (which typically depends on $a$). Define a $C^1$ map $G_a : V \to \mathbb{R}^n$ by
\[
G_a(x_1, \dots, x_n) = \bigl(x_1, \dots, x_{i-1}, F(x_1, \dots, x_n), x_{i+1}, \dots, x_n\bigr).
\]
Thus $G_a$ simply replaces the $i$th coordinate of its argument $x$ by $F(x)$. Note that $G_a'(x)$ is the identity matrix with row $i$ replaced by $\nabla F(x)$. A standard row reduction shows that $J_{G_a}(a) = \partial_i F(a)$. Since this is nonzero, by the inverse function theorem there exist open sets $V_a \subseteq V$ and $W_a = G_a(V_a)$ in $\mathbb{R}^n$ with $a \in V_a$ such that $G_a$ is one-to-one on $V_a$ and $G_a^{-1} : W_a \to V_a$ is $C^1$. Taking smaller $W_a$ and $V_a$ if necessary, we may suppose that
\[
W_a = (\alpha_1, \beta_1) \times \cdots \times (\alpha_n, \beta_n).
\]
Note that $0 \in (\alpha_i, \beta_i)$, since
\[
(a_1, \dots, a_{i-1}, 0, a_{i+1}, \dots, a_n) = G_a(a) \in W_a.
\]
Now let $(u_1, \dots, u_n) \in W_a$ and set $(v_1, \dots, v_n) = G_a^{-1}(u_1, \dots, u_n)$. Then
\[
(u_1, \dots, u_i, \dots, u_n) = G_a(v_1, \dots, v_n) = \bigl(v_1, \dots, v_{i-1}, F(v_1, \dots, v_n), v_{i+1}, \dots, v_n\bigr)
= \bigl(u_1, \dots, u_{i-1}, (F \circ G_a^{-1})(u_1, \dots, u_n), u_{i+1}, \dots, u_n\bigr),
\]
hence
\[
(F \circ G_a^{-1})(u_1, \dots, u_n) = u_i \tag{12.9}
\]
and, in particular,
\[
(F \circ G_a^{-1})(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_n) = 0. \tag{12.10}
\]
Now set
\[
U_a := (\alpha_1, \beta_1) \times \cdots \times (\alpha_{i-1}, \beta_{i-1}) \times (\alpha_{i+1}, \beta_{i+1}) \times \cdots \times (\alpha_n, \beta_n)
\]


and define $\varphi_a : U_a \to \mathbb{R}^n$ by
\[
\varphi_a(u_1, \dots, u_{n-1}) = G_a^{-1}(u_1, \dots, u_{i-1}, 0, u_i, \dots, u_{n-1}).
\]
By (12.10), $F\bigl(\varphi_a(u_1, \dots, u_{n-1})\bigr) = 0$, hence $\varphi_a(U_a) \subseteq S_a$. Conversely, let $(v_1, \dots, v_n) \in S_a$ and set $(u_1, \dots, u_n) := G_a(v_1, \dots, v_n)$. By (12.9),
\[
u_i = (F \circ G_a^{-1})(u_1, \dots, u_n) = F(v_1, \dots, v_n) = 0,
\]
so
\[
(v_1, \dots, v_n) = G_a^{-1}(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_n) = \varphi_a(u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n).
\]
Therefore, $\varphi_a(U_a) = S_a$.

FIGURE 12.13: The mapping $G_a^{-1}$.

Now define the injection mapping $\iota_a : U_a \to W_a$ and the projection mapping $\pi_a : V_a \to \mathbb{R}^{n-1}$, respectively, by
\[
\iota_a(u_1, \dots, u_{n-1}) = (u_1, \dots, u_{i-1}, 0, u_i, \dots, u_{n-1}) \quad\text{and}\quad \pi_a(v_1, \dots, v_n) = (v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n).
\]
Then $\pi_a \circ \iota_a : U_a \to U_a$ is the identity function and $\varphi_a = G_a^{-1} \circ \iota_a$. Since $G_a^{-1}$ has rank $n$ and $\iota_a$ has rank $n - 1$, $\varphi_a$ has rank $n - 1$. Also, if $v = \varphi_a(u)$, then
\[
(\pi_a \circ G_a)(v) = (\pi_a \circ G_a \circ \varphi_a)(u) = (\pi_a \circ \iota_a)(u) = u = \varphi_a^{-1}(v),
\]
which shows that $\varphi_a^{-1} : S_a \to U_a$ is the restriction to $S_a$ of the $C^1$ function $\pi_a \circ G_a : V_a \to U_a$. Now let $b \in S$ and $S_a \cap S_b \ne \emptyset$. Then $G_b \circ G_a^{-1}$ maps the open set $G_a(V_a \cap V_b)$ onto the open set $G_b(V_a \cap V_b)$. Also, in the preceding notation, $\varphi_a = G_a^{-1} \circ \iota_a$ on $U_a$ and $\varphi_b^{-1} = \pi_b \circ G_b$ on $S_b$, hence
\[
\varphi_b^{-1} \circ \varphi_a = \pi_b \circ G_b \circ G_a^{-1} \circ \iota_a,
\]
which maps the open set $\varphi_a^{-1}(S_b \cap S_a) \subseteq U_a$ onto the open set $\varphi_b^{-1}(S_b \cap S_a) \subseteq U_b$ and is $C^1$ with $C^1$ inverse $\varphi_a^{-1} \circ \varphi_b$. This verifies the theorem for the case $m = n - 1$.
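The case $m = n - 1$ of the proof can be traced on a concrete example. The sketch below assumes the unit circle $F(x_1, x_2) = x_1^2 + x_2^2 - 1$ in $\mathbb{R}^2$, the point $a = (0, 1)$, and the index $i = 2$ (so that $\partial_2 F(a) = 2 \ne 0$). The closed form used for $G_a^{-1}$ is valid only near $a$, where $x_2 > 0$, and all function names are illustrative.

```python
import math


def F(x):
    """Assumed example: the unit circle is the level set F = 0."""
    return x[0] ** 2 + x[1] ** 2 - 1.0


def G(x):
    """G_a replaces the i-th coordinate (here i = 2) by F(x)."""
    return (x[0], F(x))


def G_inv(u):
    """Local inverse of G_a near a = (0, 1): solve u_2 = x_1^2 + x_2^2 - 1 for x_2 > 0."""
    return (u[0], math.sqrt(1.0 + u[1] - u[0] ** 2))


def iota(u1):
    """Injection: insert 0 in coordinate position i = 2."""
    return (u1, 0.0)


def pi(v):
    """Projection: delete coordinate position i = 2."""
    return v[0]


def phi(u1):
    """Local parametrization phi_a = G_a^{-1} o iota_a."""
    return G_inv(iota(u1))


def phi_inv(x):
    """phi_a^{-1} is the restriction of pi_a o G_a to the surface element."""
    return pi(G(x))
```

Here `phi(u1)` equals `(u1, sqrt(1 - u1**2))`, the upper half of the circle, and `phi(0.0)` returns the point $a$ itself, consistent with part (d) of the theorem.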


In the general case, there exist indices $i_1 < \cdots < i_k$ in $\{1, \dots, n\}$ such that
\[
\frac{\partial(F_1, \dots, F_k)}{\partial(x_{i_1}, \dots, x_{i_k})}(a) \ne 0,
\]
where $k := n - m$. Let $i_1' < i_2' < \cdots < i_m'$ denote the complementary indices. (In the above case, these were the indices $1, \dots, i - 1, i + 1, \dots, n$.) Define $G_a(x_1, \dots, x_n)$ to be the $n$-tuple $(x_1, \dots, x_n)$, with the coordinates $x_{i_1}, \dots, x_{i_k}$ replaced by $F_1(x), \dots, F_k(x)$. Then $J_{G_a}(a) \ne 0$, so the sets $V_a$ and $W_a$ may be obtained as before. Define
\[
U_a = (\alpha_{i_1'}, \beta_{i_1'}) \times \cdots \times (\alpha_{i_m'}, \beta_{i_m'}) \subseteq \mathbb{R}^m
\]
and the injection mapping $\iota_a : U_a \to W_a$ by
\[
\iota_a(u_1, u_2, \dots, u_m) = (v_1, v_2, \dots, v_n), \quad\text{where } v_{i_j} = 0,\ 1 \le j \le k, \text{ and } v_{i_j'} = u_j,\ 1 \le j \le m.
\]
Thus $\iota_a$ places zeros in the coordinate positions $i_1 < \cdots < i_k$ and fills the complementary positions by $u_1, \dots, u_m$. Finally, define the projection mapping $\pi_a : V_a \to \mathbb{R}^m$ by
\[
\pi_a(v_1, \dots, v_n) = (v_{i_1'}, \dots, v_{i_m'}).
\]
The proof then proceeds as before.

FIGURE 12.14: Transition mappings.

The functions $\varphi_a : U_a \to S_a$ in the theorem are called local parametrizations of $S$, and the $C^1$ functions $\varphi_{ab}$ are called transition mappings. The sets $S_a$ are called surface elements. A collection of local parameterizations of $S$ whose surface elements cover $S$ is called an atlas for $S$. Note that if $F$ is $C^r$ then, as an examination of the proof reveals, the local parameterizations and the transition maps are $C^r$ as well.


12.4.2 Example. Consider the $(n - 1)$-sphere $S := \{y \in \mathbb{R}^n : \|y\| = 1\}$ with north and south poles $p := (0, \dots, 0, 1)$ and $q := (0, \dots, 0, -1)$. Let the points $y = (y_1, \dots, y_n)$ and $x = (x_1, \dots, x_{n-1})$ be related as in Figure 12.15.

FIGURE 12.15: Stereographic projection from $p$.

Then for some $t$,
\[
(x_1, \dots, x_{n-1}, -1) = (x, 0) - p = t(y - p) = \bigl(ty_1, \dots, ty_{n-1}, t(y_n - 1)\bigr),
\]
hence
\[
(x_1, \dots, x_{n-1}) = \frac{1}{1 - y_n}(y_1, \dots, y_{n-1}), \quad -1 \le y_n < 1.
\]
The mapping
\[
x = \varphi^{-1}(y) = \frac{1}{1 - y_n}(y_1, \dots, y_{n-1}), \quad y_n < 1,
\]
from $S \setminus \{p\}$ onto $\mathbb{R}^{n-1}$, is called the stereographic projection from $p$ onto the equatorial hyperplane $x_n = 0$. One readily checks that the inverse of this mapping is given by
\[
y = \varphi(x) = \frac{1}{1 + \|x\|^2}\bigl(2x_1, \dots, 2x_{n-1}, \|x\|^2 - 1\bigr), \quad x \in \mathbb{R}^{n-1}.
\]
Similarly, the stereographic projection from $q$ is given by
\[
x = \tilde\varphi^{-1}(y) = \frac{1}{1 + y_n}(y_1, \dots, y_{n-1}), \quad y_n > -1,
\]
with inverse
\[
y = \tilde\varphi(x) = \frac{1}{1 + \|x\|^2}\bigl(2x_1, \dots, 2x_{n-1}, 1 - \|x\|^2\bigr).
\]
The set $\{\varphi, \tilde\varphi\}$ is an atlas for $S$. The transition mapping from $\mathbb{R}^{n-1} \setminus \{0\}$ to $\mathbb{R}^{n-1} \setminus \{0\}$ is the self-inverse mapping
\[
(\varphi^{-1} \circ \tilde\varphi)(x) = \frac{x}{\|x\|^2}. \qquad ♦
\]
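The formulas of this example lend themselves to a quick numerical check. The following sketch implements the two stereographic charts and their transition map for points given as tuples; the function names are illustrative, and the sample point in the usage note is an arbitrary choice.

```python
def phi_inv(y):
    """Stereographic projection from the north pole p (requires y_n < 1)."""
    scale = 1.0 / (1.0 - y[-1])
    return tuple(scale * c for c in y[:-1])


def phi(x):
    """Inverse of the projection from p: maps R^(n-1) onto the sphere minus p."""
    norm_sq = sum(c * c for c in x)
    scale = 1.0 / (1.0 + norm_sq)
    return tuple(2.0 * scale * c for c in x) + (scale * (norm_sq - 1.0),)


def phi_tilde_inv(y):
    """Stereographic projection from the south pole q (requires y_n > -1)."""
    scale = 1.0 / (1.0 + y[-1])
    return tuple(scale * c for c in y[:-1])


def phi_tilde(x):
    """Inverse of the projection from q: maps R^(n-1) onto the sphere minus q."""
    norm_sq = sum(c * c for c in x)
    scale = 1.0 / (1.0 + norm_sq)
    return tuple(2.0 * scale * c for c in x) + (scale * (1.0 - norm_sq),)


def transition(x):
    """The transition mapping phi^{-1} o phi_tilde on R^(n-1) minus the origin."""
    return phi_inv(phi_tilde(x))
```

For a sample point such as `x = (0.5, -2.0, 1.5)`, `phi(x)` has norm 1, `transition(x)` agrees with $x/\|x\|^2$, and applying `transition` twice returns `x` up to rounding.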


Tangent Space of an m-Surface

The local parameterizations $\varphi_a$ of an $m$-surface $S = \{x : F(x) = 0\}$ may be used to construct a tangent space at each point $a \in S$. Let $\varphi_a(u) = \varphi_b(v) \in S_a \cap S_b$. Then $v = \varphi_{ab}(u)$ and, by the chain rule applied to $\varphi_a = \varphi_b \circ \varphi_{ab}$,
\[
d(\varphi_a)_u = d(\varphi_b)_v \circ d(\varphi_{ab})_u.
\]
Since $d(\varphi_{ab})_u : \mathbb{R}^m \to \mathbb{R}^m$ is an isomorphism, the vectors $d_j := d(\varphi_{ab})_u(e_j)$ form a basis of $\mathbb{R}^m$. Therefore, we have the mapping of $\mathbb{R}^m$-frames
\[
d(\varphi_a)_u(e_1, \dots, e_m) = d(\varphi_b)_v(d_1, \dots, d_m), \tag{12.11}
\]
which shows that $T_{\varphi_b(v)} = T_{\varphi_a(u)}$ and hence makes the following definition meaningful.

12.4.3 Definition. The tangent space $T_x$ to $S$ at a point $x \in S$ is defined as $T_{\varphi_a(u)}$, where $\varphi_a$ is any local parametrization of $S$ with $\varphi_a(u) = x$. ♦

The next proposition gives an intrinsic characterization of tangent space.

12.4.4 Proposition. For $x \in S$ let $\Lambda_x$ denote the set of all vectors in $\mathbb{R}^n$ of the form $\alpha'(0)$, where $\alpha : (-r, r) \to S$ is a $C^1$ curve with $\alpha(0) = x$. Then
\[
T_x = \Lambda_x = \{z \in \mathbb{R}^n : dF_x(z) = 0\} = \bigcap_{i=1}^{n-m} \{z \in \mathbb{R}^n : \nabla F_i(x) \cdot z = 0\}.
\]

Proof. Let $\varphi$ be a local parametrization of $S$ with $\varphi(u) = x$. A member of $T_x$ is of the form $z = \sum_{i=1}^{m} a_i\,d\varphi_u(e_i)$. For small $|t|$, the curve
\[
\alpha(t) = \varphi\Bigl(u + t\sum_{i=1}^{m} a_i e_i\Bigr)
\]
lies in $S$, $\alpha(0) = x$, and, by the chain rule,
\[
\alpha'(0) = d\varphi_u\Bigl(\sum_{i=1}^{m} a_i e_i\Bigr) = \sum_{i=1}^{m} a_i\,d\varphi_u(e_i) = z.
\]
Therefore, $z \in \Lambda_x$. On the other hand, if $\alpha'(0) \in \Lambda_x$, then differentiating the identity $(F \circ \alpha)(t) = 0$ at $t = 0$ yields $dF_x\bigl(\alpha'(0)\bigr) = 0$. We have shown that
\[
T_x \subseteq \Lambda_x \subseteq \{z : dF_x(z) = 0\}.
\]
Since $dF_x$ has rank $n - m$, the solution space $\{z : dF_x(z) = 0\}$ has dimension $m$, which is also the dimension of $T_x$. Hence the three spaces must be equal.
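The characterization of the tangent space as the common kernel of the gradients can be illustrated with a 1-surface in $\mathbb{R}^3$. The sketch below assumes the example $F = (F_1, F_2)$ with $F_1(x) = x_1^2 + x_2^2 + x_3^2 - 1$ and $F_2(x) = x_3$, whose zero set is the unit circle in the plane $x_3 = 0$, together with the curve $\alpha(t) = (\cos t, \sin t, 0)$ through $x = (1, 0, 0)$. The velocity is approximated by a central difference, and the names are illustrative.

```python
import math


def alpha(t):
    """A C^1 curve in the surface with alpha(0) = (1, 0, 0) (assumed example)."""
    return (math.cos(t), math.sin(t), 0.0)


def velocity(curve, t=0.0, h=1e-6):
    """Central-difference approximation of the derivative of the curve at t."""
    return tuple((p - m) / (2 * h) for p, m in zip(curve(t + h), curve(t - h)))


def grad_F1(x):
    """Gradient of F_1(x) = x_1^2 + x_2^2 + x_3^2 - 1."""
    return (2.0 * x[0], 2.0 * x[1], 2.0 * x[2])


def grad_F2(x):
    """Gradient of F_2(x) = x_3."""
    return (0.0, 0.0, 1.0)


def dot(a, b):
    """Euclidean inner product."""
    return sum(p * q for p, q in zip(a, b))


x = alpha(0.0)
v = velocity(alpha)  # approximately (0, 1, 0), a member of Lambda_x
```

Both `dot(grad_F1(x), v)` and `dot(grad_F2(x), v)` vanish up to rounding, so `v` lies in the intersection of the two kernels, which here is the line spanned by $(0, 1, 0)$.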


12.4.5 Remark. The proposition shows that if $S_1$ is an $m_1$-surface, $S_2$ is an $m_2$-surface, and $\psi : S_1 \to S_2$ is $C^1$, then for $x \in S_1$ and $y = \psi(x)$ the function $d\psi_x$ maps $T_x$ into $T_y$. Indeed, if $v \in T_x$, then there exists a smooth curve $\alpha_1 : (-1, 1) \to S_1$ with $\alpha_1(0) = x$ and $\alpha_1'(0) = v$. Then $\alpha_2 := \psi \circ \alpha_1$ is a smooth curve in $S_2$ and $d\psi_x(v) = (\psi \circ \alpha_1)'(0) = \alpha_2'(0) \in T_y$. ♦

FIGURE 12.16: The mapping $d\psi_x : T_x \to T_y$.

Orientation of an m-Surface

Let $S$ be an $m$-surface with local parameterizations $\varphi_a : U_a \to \mathbb{R}^n$. Since $\varphi_a$ is one-to-one, it is orientable. Suppose the parameterizations have the same orientation, that is, $\operatorname{sign}(\varphi_a) = \operatorname{sign}(\varphi_b)$ for all $a$ and $b$. If $u \in U_a$, $v \in U_b$, and $\varphi_a(u) = \varphi_b(v)$, then (12.11) shows that the orientation of $T_{\varphi_b(v)}$ agrees with that of $T_{\varphi_a(u)}$ iff $J_{\varphi_{ab}}(u) > 0$. Thus if $J_{\varphi_{ab}} > 0$ whenever $S_a \cap S_b \ne \emptyset$, then $S$ may be given a well-defined orientation via the orientations of the local parameterizations. In this case, $S$ is said to be orientable. The positive orientation is obtained if each local parametrization is positively oriented.

Orientation of an (n − 1)-Surface

Orientability of an $(n - 1)$-surface may be characterized in terms of the normal vector fields $\vec N_{\varphi_a}$. For this we need the following lemma, which relates $\vec N_{\varphi_a}$ and $\vec N_{\varphi_b}$ on overlapping surface elements.

12.4.6 Lemma. Let $x := \varphi_a(u) = \varphi_b(v) \in S_a \cap S_b$, where $u \in U_a$, $v \in U_b$. Then
\[
\vec N_{\varphi_a}(x) = |J_{\varphi_{ab}}(u)|^{-1} J_{\varphi_{ab}}(u)\,\vec N_{\varphi_b}(x) = \operatorname{sign}\bigl(J_{\varphi_{ab}}(u)\bigr)\,\vec N_{\varphi_b}(x).
\]

Proof. Since $\varphi_a = \varphi_b \circ \varphi_{ab}$ and $v = \varphi_{ab}(u)$, the chain rule implies that $\varphi_a'(u) = \varphi_b'(v)\,\varphi_{ab}'(u)$ and
\[
\frac{\partial(\varphi_{a,1}, \dots, \widehat{\varphi_{a,i}}, \dots, \varphi_{a,n})}{\partial(u_1, \dots, u_{n-1})}(u) = \frac{\partial(\varphi_{b,1}, \dots, \widehat{\varphi_{b,i}}, \dots, \varphi_{b,n})}{\partial(v_1, \dots, v_{n-1})}(v)\,J_{\varphi_{ab}}(u).
\]
From the first equation,
\[
\sqrt{\det\bigl(\varphi_a'(u)^t\varphi_a'(u)\bigr)} = \sqrt{\det\bigl(\varphi_{ab}'(u)^t\varphi_b'(v)^t\varphi_b'(v)\varphi_{ab}'(u)\bigr)} = |J_{\varphi_{ab}}(u)|\sqrt{\det\bigl(\varphi_b'(v)^t\varphi_b'(v)\bigr)}.
\]
The assertion now follows by recalling from (12.6) that
\[
\vec N_{\varphi_a}(x) = \frac{1}{\sqrt{\det\bigl(\varphi_a'(u)^t\varphi_a'(u)\bigr)}} \sum_{i=1}^{n} (-1)^{i+n}\,\frac{\partial(\varphi_{a,1}, \dots, \widehat{\varphi_{a,i}}, \dots, \varphi_{a,n})}{\partial(u_1, \dots, u_{n-1})}(u)\,e^i
\]
and
\[
\vec N_{\varphi_b}(x) = \frac{1}{\sqrt{\det\bigl(\varphi_b'(v)^t\varphi_b'(v)\bigr)}} \sum_{i=1}^{n} (-1)^{i+n}\,\frac{\partial(\varphi_{b,1}, \dots, \widehat{\varphi_{b,i}}, \dots, \varphi_{b,n})}{\partial(v_1, \dots, v_{n-1})}(v)\,e^i.
\]

12.4.7 Theorem. An $(n - 1)$-surface $S$ is orientable iff there exists a continuous vector field $\vec N$ on $S$ such that
\[
\vec N\big|_{S_a} = \vec N_{\varphi_a} \quad\text{for each } a \in S. \tag{12.12}
\]

Proof. If $S$ is orientable, then $J_{\varphi_{ab}} > 0$, hence, by 12.4.6, $\vec N_{\varphi_a} = \vec N_{\varphi_b}$ on $S_a \cap S_b$. Therefore, (12.12) defines $\vec N$ unambiguously. Since $\vec N_{\varphi_a}$ is easily seen to be continuous on $S_a$ and $S_a$ is relatively open in $S$, $\vec N$ is continuous on $S$.

Conversely, assume there exists a continuous vector field $\vec N$ on $S$ that satisfies (12.12). If $x = \varphi_a(u) \in S_a \cap S_b$, then
\[
\vec N_{\varphi_a}(x) = \vec N(x) = \vec N_{\varphi_b}(x),
\]
hence, by 12.4.6, $J_{\varphi_{ab}}(u) > 0$. Therefore, $S$ is orientable.

Let $S$ be orientable with positive orientation. Then, by definition, the frame
\[
\bigl(d(\varphi_a)_u(e_1), \dots, d(\varphi_a)_u(e_{n-1})\bigr)
\]
in $T_a$ is designated as positive ($\operatorname{sign}(\varphi_a) > 0$) for each $a \in S$. Since the frame
\[
\bigl(d(\varphi_a)_u(e_1), \dots, d(\varphi_a)_u(e_{n-1}), \vec N(a)\bigr)
\]
in $\mathbb{R}^n$ is positive (12.3.5), we say in this case that $S$ is oriented by $\vec N$. The notion of orientation by $-\vec N$ is defined analogously. For example, the sphere $S = \{(x_1, \dots, x_n) : \|x\| = r\}$ is locally parameterized by the mappings $\varphi$ and $\tilde\varphi$ of 12.3.8. The positive orientation is given by the unit normal vector field $\vec N(p) = \|p\|^{-1}p$, called the outward unit normal.

12.4.8 Corollary. If $S = \{x : F(x) = 0\}$ is connected, then $S$ is orientable and
\[
\vec N = \|\nabla F\|^{-1}\nabla F \quad\text{or}\quad \vec N = -\|\nabla F\|^{-1}\nabla F.
\]


Proof. Since $\nabla F(x)$ is perpendicular to $S$ at $x$, the uniqueness of $\vec N$ implies that
\[
\vec N(x) = s(x)\,\frac{\nabla F(x)}{\|\nabla F(x)\|}, \quad x \in S,
\]
where $s(x) = \pm 1$ is constant on each surface element. Since the surface elements are open in $S$, $s(x)$ is continuous. Since $S$ is connected, $s(x)$ must be constant on $S$.

(n − 1)-Surfaces-with-Boundary

To discuss surfaces-with-boundary, we shall need the following notation:
\[
\mathbb{R}^{n-1}_+ := \{y \in \mathbb{R}^{n-1} : y_{n-1} > 0\}, \quad
\mathbb{H}^{n-1} := \{y \in \mathbb{R}^{n-1} : y_{n-1} \ge 0\}, \quad
\partial\mathbb{H}^{n-1} := \{y \in \mathbb{R}^{n-1} : y_{n-1} = 0\}.
\]

12.4.9 Definition. An $(n - 1)$-surface-with-boundary is a subset of $\mathbb{R}^n$ of the form
\[
S = \{x \in W : F(x) = 0 \text{ and } g_i(x) \ge 0,\ i = 1, \dots, k\},
\]
where $W \subseteq \mathbb{R}^n$ is open and $F : W \to \mathbb{R}$ and $g_i : W \to \mathbb{R}$ are $C^1$ and satisfy the following conditions:
(a) $\nabla F(x) \ne 0$ for all $x \in S$.
(b) The sets $B_i := \{x \in S : g_i(x) = 0\}$ are pairwise disjoint.
(c) For each $i$ and $x \in B_i$, the vectors $\nabla F(x)$ and $\nabla g_i(x)$ are linearly independent.
The set $\partial S := \bigcup_{i=1}^{k} B_i$ is called the boundary of $S$ and $S \setminus \partial S$ is the interior. ♦

FIGURE 12.17: Cylinder-with-boundary: $x_1^2 + x_2^2 = 1$, $0 \le x_3 \le 1$, with $F(x) := x_1^2 + x_2^2 - 1$, $B_1 : g_1(x) := x_3 = 0$, and $B_2 : g_2(x) := 1 - x_3 = 0$.


If $V$ denotes the open set $\{x \in W : g_i(x) > 0,\ i = 1, \dots, k\}$, then
\[
S \setminus \partial S = \{x \in V : F(x) = 0\}.
\]
Therefore, condition (a) implies that the interior of $S$ is an $(n - 1)$-surface. Conditions (b) and (c) assert that the boundary of $S$ is made up of disjoint $(n - 2)$-surfaces. Indeed, if $F_i := (F, g_i)$, then $B_i = \{x \in W : F_i(x) = 0\}$ and
\[
F_i' = \begin{bmatrix} \nabla F \\ \nabla g_i \end{bmatrix}
\]
has rank 2. Also, because the $(n - 2)$-surfaces $B_i$ are pairwise disjoint, a local parametrization of $B_i$ may be chosen to be disjoint from a local parametrization of $B_j$. The following theorem shows that, as in the case of an $(n - 1)$-surface, an $(n - 1)$-surface-with-boundary may be described by a collection of local parameterizations.

12.4.10 Theorem. Let $S$ be an $(n - 1)$-surface-with-boundary.
(a) If $a \in S \setminus \partial S$, then there exists a local parametrization $\varphi_a : U_a \to \mathbb{R}^n$ of $S \setminus \partial S$ at $a$ with $\varphi_a(0) = a$.
(b) If $a \in \partial S$, then there exists an open set $\tilde U_a \subseteq \mathbb{R}^{n-1}$ and a one-to-one parameterized $(n - 1)$-surface $\tilde\varphi_a : \tilde U_a \to \mathbb{R}^n$ with $\tilde\varphi_a(0) = a$ such that if $U_a := \tilde U_a \cap \mathbb{H}^{n-1}$ and $\varphi_a := \tilde\varphi_a\big|_{U_a}$, then
(i) $\varphi_a(U_a)$ is open in $S$,
(ii) $\varphi_a\bigl(U_a \cap \mathbb{R}^{n-1}_+\bigr)$ is open in $S \setminus \partial S$, and
(iii) $\varphi_a\bigl(U_a \cap \partial\mathbb{H}^{n-1}\bigr)$ is open in $\partial S$.

FIGURE 12.18: Surface element $S_a = \varphi_a\bigl(\tilde U_a \cap \mathbb{H}^{n-1}\bigr)$.

Proof. Part (a) follows from 12.4.1, since $S \setminus \partial S$ is an $(n - 1)$-surface without boundary. For part (b), we may assume without loss of generality that $\partial S = \{x \in S : g(x) = 0\}$. Choose a local parametrization $\psi_a : W_a \to \mathbb{R}^n$ of

442

A Course in Real Analysis

S 0 := {x ∈ W : F (x) = 0} such that ψa (0) = a. Since ψa has rank n − 1 and g has rank 1, ∂i (g ◦ ψa ) 0) 6= 0 for some i. Define Ha : Wa → Rn−1 by Ha (w1 , . . . , wn−1 ) = w1 , . . . , wi−1 , wi+1 , . . . , wn−1 , g ◦ ψa (w1 , . . . , wn−1 ) . Then Ha has rank n − 1 at 0, hence, by the inverse function theorem, there ˜ a ⊆ Wa and U ˜a = Ha (W ˜ a ) in Rn−1 with 0 ∈ W ˜ a such that exist open sets W −1 1 ˜ ˜ ˜ Ha is one-to-one on Wa and Ha : Ua → Wa is C . Set ˜a → S 0 . ϕ˜a = ψa ◦ Ha−1 : U ˜a , then g ◦ ψa (w) = g ◦ ψa ◦ Ha−1 (u) = g ◦ ϕ˜a (u), hence, by If u = Ha (w) ∈ U definition of Ha , (u1 , . . . , un−1 ) = w1 , . . . , wi−1 , wi+1 , . . . , wn−1 , g ◦ ϕ˜a (u) . Therefore, un−1 = g ◦ ϕ˜a (u), so ˜a ∩ Rn−1 and g ◦ ϕ˜a (u) = 0 iff u ∈ U ˜a ∩ ∂ Hn−1 . g ◦ ϕ˜a (u) > 0 iff u ∈ U + It follows that ˜a ∩ Rn−1 = (S \ ∂S) ∩ ψ W ˜ a and ϕ˜a U ˜a ∩ ∂Hn−1 = ∂S ∩ ψ W ˜a . ϕ˜a U + ˜ a is open in S 0 and S 0 ⊇ S, (i)–(iii) follow. Since ψ W

Oriented (n − 1)-Surfaces-with-Boundary

As in the non-boundary case, orientation of an $(n-1)$-surface-with-boundary $S$ may be defined in terms of local parameterizations. By 12.4.4, the $(n-1)$-dimensional tangent space at $a \in S$ is

$$T_aS = \{z \in \mathbb{R}^n : z \cdot \nabla F(a) = 0\}.$$

The new feature here is that if $a \in \partial S$, say $a \in B_i$, then there is also an $(n-2)$-dimensional tangent space to $\partial S$ at $a$, namely,

$$T_a\partial S = \{z \in \mathbb{R}^n : z \cdot \nabla F(a) = z \cdot \nabla g_i(a) = 0\}.$$

The connection between $T_aS$ and $T_a\partial S$ is described as follows: Let $\varphi_a$ be a local parametrization of $S$ as described in part (b) of 12.4.10, where $\varphi_a(0) = a$. Since $\tilde\varphi_a(U_a \cap \partial\mathbb{H}^{n-1}) \subseteq \partial S$ and $(e^1, \dots, e^{n-2})$ is a frame for $\partial\mathbb{H}^{n-1}$, $d(\tilde\varphi_a)_0(e^1, \dots, e^{n-2})$ is a frame for $T_a\partial S$. Since the vector $d(\tilde\varphi_a)_0(-e^{n-1})$ is not in the subspace $T_a\partial S$,

$$d(\tilde\varphi_a)_0(-e^{n-1}, e^1, \dots, e^{n-2}) \tag{12.13}$$

is a frame for $T_aS$. The induced orientation of $\partial S$ is obtained by declaring the frame $d(\tilde\varphi_a)_0(e^1, \dots, e^{n-2})$ of $T_a\partial S$ to have the sign of the frame (12.13). If $S$ is positively oriented, then this sign is $(-1)^{n-1}$.

FIGURE 12.19: Induced orientation of $T_a\partial S$ (case $n = 3$).

Figure 12.19 depicts the case $n = 3$. Here, $S$ is oriented by the normal $\vec N$ (pointing outward). Therefore, by definition, the frame $d(\varphi_a)_0(e^1, e^2)$ is positive in $T_aS$, hence so is the frame $d(\tilde\varphi_a)_0(-e^2, e^1)$. Thus, again by definition, the frame $d(\tilde\varphi_a)_0(e^1)$ of $T_a\partial S$ is positive in the induced orientation. Note that because the frame $\big(d(\varphi_a)_0(e^1), d(\varphi_a)_0(e^2), \vec N_\varphi\big)$ in $\mathbb{R}^3$ is positive (12.3.5), so is the frame $\big(d(\varphi_a)_0(-e^2), d(\varphi_a)_0(e^1), \vec N_\varphi\big)$. The latter therefore forms a right-handed system in $\mathbb{R}^3$. Thus if $d(\tilde\varphi_a)_0(-e^2)$ points upward, then $d(\tilde\varphi_a)_0(e^1)$ must point in the direction shown. Therefore, the induced orientation of $\partial S$ is the one for which the surface $S$ is on the left when $\partial S$ is traversed in the direction of the tangent vectors $d(\tilde\varphi_a)_0(e^1)$.

Exercises

1. Let $0 < a < b$. Show that the mapping
$$\varphi(\phi, \theta) = \big(a\cos\phi,\ (b + a\sin\phi)\cos\theta,\ (b + a\sin\phi)\sin\theta\big), \quad 0 < \theta, \phi < 2\pi,$$
is a local parametrization of the torus $x^2 + \big(\sqrt{y^2 + z^2} - b\big)^2 = a^2$ with two circles missing.

2. Let $U = \{x \in \mathbb{R}^{n-1} : \|x\| < 1\}$ and define a local parametrization $\psi : U \to S^{n-1} = \{y \in \mathbb{R}^n : \|y\| = 1\}$ by
$$\psi(x) = \big(x, \sqrt{1 - \|x\|^2}\big), \quad x \in U.$$
Give a geometric description of $\psi$. Referring to 12.4.2, find the transition mapping $\tilde\varphi^{-1} \circ \psi$.

3.S Consider the stereographic projection $\varphi_1^{-1}(y) = x$ from $p$ onto the hyperplane $x_n = -1$ shown in Figure 12.20, where $y = (y_1, \dots, y_n)$ and $x = (x_1, \dots, x_{n-1})$. Calculate $\varphi_1(x)$ and $\varphi_1^{-1}(y)$ and find the transition mapping $\varphi^{-1} \circ \varphi_1$, where $\varphi$ is the mapping of 12.4.2.

FIGURE 12.20: Stereographic projection $\varphi_1^{-1}(y)$ from $p$. (The figure shows the sphere $S$ in $\mathbb{R}^n$ with $p = (0, \dots, 0, 1)$ and $q = (0, \dots, 0, -1)$, a point $y \in S$, and its image $(x, -1)$.)

4. Replace the sphere in 12.4.2 by the elliptic paraboloid
$$S = \left\{(y_1, y_2, y_3) : y_3 = \left(\frac{y_1}{a_1}\right)^2 + \left(\frac{y_2}{a_2}\right)^2,\ y_3 < 1\right\}$$
(with $p = (0, 0, 1)$) and find the corresponding maps $\varphi$ and $\varphi^{-1}$.

5.S Repeat Exercise 4 using the elliptic cone
$$S = \left\{(y_1, y_2, y_3) : y_3^2 = \left(\frac{y_1}{a_1}\right)^2 + \left(\frac{y_2}{a_2}\right)^2,\ 0 < y_3 < 1\right\}.$$

6. Repeat Exercise 4 using the ellipsoid
$$S = \left\{(y_1, y_2, y_3) : \left(\frac{y_1}{a_1}\right)^2 + \left(\frac{y_2}{a_2}\right)^2 + y_3^2 = 1\right\}.$$

7. Find the equation of the tangent plane $T_a$ at $a = (1, 1, 1)$ for each of the following surfaces:
(a)S $x_1^2 + 2x_2^2 + 3x_3^2 = 6$. (b) $x_1^2 + x_2^2 - 2x_3^2 = 0$. (c) $x_1^2 - x_2^2 + x_3 = 1$.

8. An $n \times n$ matrix $A$ is said to be orthogonal if $A^tA$ is the identity matrix. Identifying a $2 \times 2$ matrix $\begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix}$ with the point $(x_1, x_2, x_3, x_4)$, show that the collection of all $2 \times 2$ orthogonal matrices is a 1-surface $S$ in $\mathbb{R}^4$. Characterize the matrices in the tangent space to $S$ at each of the following points:
$$\text{(a) } \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \quad \text{(b) } \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}. \quad \text{(c) } \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \quad \text{(d) } \begin{bmatrix} 1/\sqrt2 & -1/\sqrt2 \\ 1/\sqrt2 & 1/\sqrt2 \end{bmatrix}.$$
The matrices in the tangent space at the point in part (a) are the so-called $2 \times 2$ skew-symmetric matrices.


9. Referring to 12.4.2, let $y \in S$ and set $T := d(\varphi^{-1})_y : T_y \to \mathbb{R}^{n-1}$.
(a)S Prove that $\|T(v)\| = \dfrac{1}{1 - y_n}\,\|v\|$ for all $v \in T_y$.
(b) Use (a), the bilinearity of $v \cdot w$ and $T(v) \cdot T(w)$, and the identity $2v \cdot w = \|v + w\|^2 - \|v\|^2 - \|w\|^2$ to prove that
$$\frac{T(v) \cdot T(w)}{\|T(v)\|\,\|T(w)\|} = \frac{v \cdot w}{\|v\|\,\|w\|}, \quad v, w \in T_y.$$
Thus, by 12.4.4, the stereographic projection preserves the angle at the intersection of a pair of simple smooth curves on $S$.

10. Let each of the following 2-surfaces-with-boundary be positively oriented. Find parametrizations of the boundary curves that are compatible with the induced orientation on the boundary.
(a) $S = \{(x_1, x_2, x_3) : x_1^2 + x_2^2 = 1,\ 0 \le x_3 \le 2 - x_2\}$.
(b) $S = \{(x_1, x_2, x_3) : x_3 = x_1^2 + x_2^2,\ 0 \le x_3 \le 1 - x_1 - x_2\}$.
(c) $S = \{(x_1, x_2, x_3) : x_1^2 + x_2^2 + x_3^2 = 4,\ -2 \le x_3 \le 3 - x_1 - x_2\}$.
Hint. For (c) the boundary is a circle on the plane $x_1 + x_2 + x_3 = 3$. Translate and rotate that plane into the plane $x_3 = 0$, find a parametric equation of the rotated circle with center 0, then reverse the procedure to find the parametrization of the original circle with appropriate orientation.

11. Let $S = \{x : F(x) = 0\}$ be an oriented 2-surface in $\mathbb{R}^3$, where $F$ is $C^2$.
(a)S The tangent bundle of $S$ is the set
$$TS = \bigcup_{x \in S} \{x\} \times T_x.$$

Show that $TS = \{(x, v) \in \mathbb{R}^6 : F(x) = 0 \text{ and } v \cdot \nabla F(x) = 0\}$ and that $TS$ is a 4-surface in $\mathbb{R}^6$.
(b) The sphere bundle of $S$ is the subset $TS^1 := \{(x, v) \in TS : \|v\| = 1\}$. Show that $TS^1$ is a 3-surface in $\mathbb{R}^6$.
(c) Let $S = \{x \in \mathbb{R}^3 : \|x\|^2 = 3\}$. Show that the tangent space to the sphere bundle $TS^1$ at the point $(1, 1, 1, 1/\sqrt6, 1/\sqrt6, -2/\sqrt6)$ consists of all vectors $w \in \mathbb{R}^6$ satisfying the system
$$\begin{aligned} w_1 + w_2 + w_3 &= 0 \\ -\tfrac{3}{\sqrt6}\,w_3 + w_4 + w_5 + w_6 &= 0 \\ w_4 + w_5 - 2w_6 &= 0 \end{aligned}$$

Chapter 13 Integration on Surfaces

Throughout the chapter, $m$ and $n$ are fixed positive integers with $1 \le m \le n$. In this chapter we construct the integral of a differential $m$-form on an $m$-surface in $\mathbb{R}^n$, a generalization of the line integral of a 1-form on a curve. This will provide the necessary context for the divergence theorem and the theorems of Green and Stokes, far-reaching generalizations of the fundamental theorem of calculus.

13.1 Differential Forms

Alternating Multilinear Functionals

An $m$-multilinear functional on $\mathbb{R}^n$ is a real-valued function

$$M(a^1, \dots, a^m), \quad a^1, \dots, a^m \in \mathbb{R}^n,$$

that is linear in each variable $a^i$ separately. (See Section 9.7.) Such a function is said to be alternating if interchanging two vectors changes the sign of $M$:

$$M(a^1, \dots, a^i, \dots, a^j, \dots, a^m) = -M(a^1, \dots, a^j, \dots, a^i, \dots, a^m).$$

Thus if $a^i = a^j$ for some $i \ne j$, then $M(a^1, \dots, a^m) = 0$. Note that a linear combination of alternating $m$-multilinear functionals is an alternating $m$-multilinear functional.

A permutation of $(1, \dots, m)$ is a one-to-one function $\sigma$ mapping $\{1, \dots, m\}$ onto itself, frequently denoted by $(i_1, \dots, i_m)$, where $i_k = \sigma(k)$. The sign $(-1)^\sigma$ of $\sigma$ is positive (negative) if an even (odd) number of adjacent interchanges are required to transform $(i_1, \dots, i_m)$ back to $(1, \dots, m)$ (see Appendix B). It follows that if $M$ is an alternating $m$-multilinear functional, then

$$M(a^{\sigma(1)}, \dots, a^{\sigma(m)}) = (-1)^\sigma M(a^1, \dots, a^m).$$

An important example is the determinant of an $n \times n$ matrix, which is multilinear and alternating on its rows as well as its columns. To build on this, we introduce the following notation. Define

$$J_m = \{j := (j_1, \dots, j_m) : 1 \le j_k \le n\} \quad\text{and}\quad I_m = \{i := (i_1, \dots, i_m) : 1 \le i_1 < i_2 < \cdots < i_m \le n\}.$$

Thus $J_m$ is the set of all $m$-tuples of (possibly repeated) indices in $\{1, \dots, n\}$ and $I_m$ the set of all strictly increasing $m$-tuples in $J_m$. In particular, $I_n = \{(1, \dots, n)\}$.

Now let $A$ be an $n \times m$ matrix with columns $a^1, \dots, a^m \in \mathbb{R}^n$ and $B$ an $m \times n$ matrix with rows $b_1, \dots, b_m \in \mathbb{R}^n$. For any member $j = (j_1, \dots, j_m)$ of $J_m$ define $A_j$ to be the $m \times m$ matrix whose $r$th row is row $j_r$ of $A$ and define $B^j$ to be the $m \times m$ matrix whose $c$th column is column $j_c$ of $B$, that is,

$$A_j = \begin{bmatrix} a^1 & \cdots & a^m \end{bmatrix}_j = \begin{bmatrix} a^1_1 & a^2_1 & \cdots & a^m_1 \\ a^1_2 & a^2_2 & \cdots & a^m_2 \\ \vdots & \vdots & & \vdots \\ a^1_n & a^2_n & \cdots & a^m_n \end{bmatrix}_j = \begin{bmatrix} a^1_{j_1} & a^2_{j_1} & \cdots & a^m_{j_1} \\ a^1_{j_2} & a^2_{j_2} & \cdots & a^m_{j_2} \\ \vdots & \vdots & & \vdots \\ a^1_{j_m} & a^2_{j_m} & \cdots & a^m_{j_m} \end{bmatrix}$$

and

$$B^j = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}^j = \begin{bmatrix} b^1_1 & b^2_1 & \cdots & b^n_1 \\ b^1_2 & b^2_2 & \cdots & b^n_2 \\ \vdots & \vdots & & \vdots \\ b^1_m & b^2_m & \cdots & b^n_m \end{bmatrix}^j = \begin{bmatrix} b^{j_1}_1 & b^{j_2}_1 & \cdots & b^{j_m}_1 \\ b^{j_1}_2 & b^{j_2}_2 & \cdots & b^{j_m}_2 \\ \vdots & \vdots & & \vdots \\ b^{j_1}_m & b^{j_2}_m & \cdots & b^{j_m}_m \end{bmatrix}.$$

Thus $j$ selects rows from $A$ and columns from $B$. For example, for $n = 4$ and $m = 3$,

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix}_{(4,2,1)} = \begin{bmatrix} 10 & 11 & 12 \\ 4 & 5 & 6 \\ 1 & 2 & 3 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix}^{(4,4,1)} = \begin{bmatrix} 4 & 4 & 1 \\ 8 & 8 & 5 \\ 12 & 12 & 9 \end{bmatrix}.$$

Finally, define the alternating $m$-multilinear functional $dx_j = dx_{j_1, \dots, j_m}$ on $\mathbb{R}^n$ by

$$dx_j(a^1, \dots, a^m) = \det\,[a^1 \cdots a^m]_j.$$

Note that if $m = 1$, the definition reduces to $dx_j(a) = a_j$, as defined in Section 9.7.
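The definition of $dx_j$ can be tried out numerically. The following is a minimal sketch in plain Python, not part of the text: the helper names `perm_sign`, `det`, and `dx` are illustrative assumptions, and the determinant is computed by the permutation-sum formula, which is adequate only for small matrices.

```python
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation (tuple of distinct integers), via its inversion count."""
    inversions = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inversions % 2 else 1

def det(M):
    """Determinant as a signed sum over permutations (small matrices only)."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for r in range(n):
            term *= M[r][p[r]]
        total += term
    return total

def dx(j, *vectors):
    """Evaluate dx_j(a^1, ..., a^m); j holds 1-based indices, as in the text."""
    # Row r of the m x m matrix is row j_r of A = [a^1 ... a^m].
    rows = [[a[jr - 1] for a in vectors] for jr in j]
    return det(rows)

a1, a2 = (1, 2, 3), (4, 5, 6)
print(dx((1, 2), a1, a2))  # det [[1, 4], [2, 5]]
print(dx((2, 1), a1, a2))  # interchanging the indices changes the sign
print(dx((1, 1), a1, a2))  # a repeated index gives zero
print(dx((1, 2), a2, a1))  # interchanging the vectors changes the sign
```

The last three calls illustrate the alternating property: swapping two indices or two vectors flips the sign, and a repeated index yields zero.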


13.1.1 Lemma. If $i = (i_1, \dots, i_m)$ and $j = (j_1, \dots, j_m) \in I_m$, then

$$dx_i(e^{j_1}, \dots, e^{j_m}) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases}$$

where $e^1, \dots, e^n$ are the standard basis vectors in $\mathbb{R}^n$.

Proof. By definition,

$$dx_i(e^{j_1}, \dots, e^{j_m}) = \det \begin{bmatrix} e^{j_1}_1 & \cdots & e^{j_m}_1 \\ \vdots & & \vdots \\ e^{j_1}_n & \cdots & e^{j_m}_n \end{bmatrix}_i = \det \begin{bmatrix} e^{j_1}_{i_1} & \cdots & e^{j_m}_{i_1} \\ \vdots & & \vdots \\ e^{j_1}_{i_m} & \cdots & e^{j_m}_{i_m} \end{bmatrix},$$

where $e^j_i = 1$ if $i = j$ and 0 otherwise. If $j_1 < i_1$, then $j_1 < i_\ell$ for every $\ell$, hence the first column is zero and the determinant is zero. Similarly, if $j_1 > i_1$, then the first row is zero and, again, the determinant is zero. If $j_1 = i_1$, then the determinant reduces to

$$\det \begin{bmatrix} e^{j_2}_{i_2} & \cdots & e^{j_m}_{i_2} \\ \vdots & & \vdots \\ e^{j_2}_{i_m} & \cdots & e^{j_m}_{i_m} \end{bmatrix},$$

and an induction argument completes the proof.

13.1.2 Lemma. Let $M$ and $M'$ be alternating $m$-multilinear functionals on $\mathbb{R}^n$. If

$$M(e^{i_1}, \dots, e^{i_m}) = M'(e^{i_1}, \dots, e^{i_m}) \tag{13.1}$$

for all $(i_1, \dots, i_m) \in I_m$, then $M = M'$.

Proof. For $j = 1, \dots, m$, let $a^j = (a^j_1, \dots, a^j_n) = \sum_{i=1}^n a^j_i e^i$. By multilinearity,

$$M(a^1, \dots, a^m) = M\Big(\sum_{i=1}^n a^1_i e^i, \dots, \sum_{i=1}^n a^m_i e^i\Big) = \sum_{i_1=1}^n \cdots \sum_{i_m=1}^n a^1_{i_1} \cdots a^m_{i_m}\, M(e^{i_1}, \dots, e^{i_m}),$$

with the analogous equality holding for $M'$. It therefore suffices to show that $M(e^{i_1}, \dots, e^{i_m}) = M'(e^{i_1}, \dots, e^{i_m})$ for all $(i_1, \dots, i_m) \in J_m$. This is clear if two of the indices $i_k$ are equal, since then both sides are zero. If the indices are distinct, then, by permuting the vectors $e^{i_1}, \dots, e^{i_m}$ and attaching the appropriate signs, the indices may be brought into increasing order, and the desired equality then follows from the hypothesis.


13.1.3 Theorem. If $M$ is an alternating $m$-multilinear functional on $\mathbb{R}^n$, then

$$M = \sum_{(i_1, \dots, i_m) \in I_m} M(e^{i_1}, \dots, e^{i_m})\, dx_{i_1, \dots, i_m}.$$

Proof. Let $M'$ denote the alternating $m$-multilinear functional on the right. If $(j_1, \dots, j_m) \in I_m$, then

$$M'(e^{j_1}, \dots, e^{j_m}) = \sum_{(i_1, \dots, i_m) \in I_m} M(e^{i_1}, \dots, e^{i_m})\, dx_{i_1, \dots, i_m}(e^{j_1}, \dots, e^{j_m}) = M(e^{j_1}, \dots, e^{j_m}),$$

the second equality from 13.1.1. By 13.1.2, $M = M'$.

The following application of 13.1.3 will be needed later in connection with integration on surfaces.

13.1.4 Binet–Cauchy Product. Let $C$ be an $m \times n$ matrix and $D$ an $n \times m$ matrix. Then

$$\det(CD) = \sum_{i \in I_m} \det C^i\, \det D_i.$$

Proof. Let $c^1, \dots, c^m \in \mathbb{R}^n$ denote the rows of $C$ and $d^1, \dots, d^m \in \mathbb{R}^n$ the columns of $D$, the latter considered as variables. Define

$$M(d^1, \dots, d^m) = \det(CD) = \det\,[c^i \cdot d^j]_{m \times m}.$$

Then $M$ is an alternating $m$-multilinear form and, by 13.1.3,

$$M(d^1, \dots, d^m) = \sum_{(i_1, \dots, i_m) \in I_m} M(e^{i_1}, \dots, e^{i_m})\, dx_{i_1, \dots, i_m}(d^1, \dots, d^m).$$

Since $M(e^{i_1}, \dots, e^{i_m}) = \det C^i$ and $dx_i(d^1, \dots, d^m) = \det D_i$, the conclusion follows.

13.1.5 Corollary. If $C$ and $D$ are $n \times n$ matrices, then $\det(CD) = (\det C)(\det D)$.

13.1.6 Corollary. If $A$ is an $n \times m$ matrix, then

$$\det(A^tA) = \sum_{i \in I_m} [\det(A_i)]^2.$$

Proof. Take $C = A^t$ and $D = A$ in the theorem and note that $C^i = (A_i)^t$, so

$$\det C^i\, \det D_i = \det (A_i)^t\, \det A_i = [\det A_i]^2.$$

From 13.1.6, we have

13.1.7 Corollary. Let $A$ be an $n \times m$ matrix. Then $A$ has rank $m$ iff $\det(A^tA) \ne 0$.
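The Binet–Cauchy formula 13.1.4 is easy to check on a small example. The sketch below is an illustration only: the matrices `C` and `D` and the helper names `det`, `matmul`, and `binet_cauchy_sum` are assumptions chosen for this purpose, with matrices stored as lists of rows.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
               for c in range(len(M)))

def matmul(C, D):
    """Product of matrices given as lists of rows."""
    return [[sum(C[r][k] * D[k][c] for k in range(len(D))) for c in range(len(D[0]))]
            for r in range(len(C))]

def binet_cauchy_sum(C, D):
    """Sum over increasing index tuples i of det(C^i) * det(D_i)."""
    m, n = len(C), len(D)
    total = 0
    for idx in combinations(range(n), m):            # strictly increasing m-tuples
        C_cols = [[row[k] for k in idx] for row in C]  # C^i: selected columns of C
        D_rows = [D[k] for k in idx]                   # D_i: selected rows of D
        total += det(C_cols) * det(D_rows)
    return total

C = [[1, 2, 0], [3, -1, 4]]        # 2 x 3
D = [[2, 1], [0, 5], [-1, 3]]      # 3 x 2
lhs = det(matmul(C, D))
rhs = binet_cauchy_sum(C, D)
print(lhs, rhs)                    # the two values agree
```

Here the three index pairs contribute $-70$, $28$, and $40$, which sum to $\det(CD) = -2$.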


Definition of a Differential Form

A differential $m$-form on a set $S \subseteq \mathbb{R}^n$ is a function $\omega$ that assigns to each $x \in S$ an alternating $m$-multilinear functional $\omega_x$ on $\mathbb{R}^n$. We shall usually drop the qualifier "differential" when referring to forms. The integer $m$ is called the degree of the form. A 0-form is simply a real-valued function on $S$. By 13.1.3, if $\omega$ is an $m$-form, then for each $i \in I_m$ there exists a unique function $g_i$ on $S$ such that

$$\omega_x = \sum_{i \in I_m} g_i(x)\, dx_i, \quad x \in S.$$

Conversely, if $f_j$ is a real-valued function on $S$ for each $j \in J_m$, then

$$\omega_x := \sum_{j \in J_m} f_j(x)\, dx_j, \quad x \in S, \tag{13.2}$$

defines an $m$-form on $S$. If each $f_j$ is of class $C^r$ on $S$ (that is, on an open set containing $S$), then $\omega$ is called a differential form of class $C^r$ or simply a $C^r$ form, where $r \in \mathbb{Z}^+ \cup \{+\infty\}$.

The Algebra of Differential Forms

For $a \in \mathbb{R}$ and $m$-forms

$$\omega = \sum_{j \in J_m} f_j\, dx_j \quad\text{and}\quad \eta = \sum_{j \in J_m} g_j\, dx_j$$

on $S$, define $m$-forms $a\omega$ and $\omega + \eta$ on $S$ by

$$a\omega := \sum_{j \in J_m} a f_j\, dx_j \quad\text{and}\quad \omega + \eta := \sum_{j \in J_m} (f_j + g_j)\, dx_j.$$

The collection of $m$-forms on $S$ is easily seen to be a vector space under these operations. It is also possible to multiply forms. For this, the notation

$$dx_{j_1, \dots, j_m} = dx_{j_1} \wedge \cdots \wedge dx_{j_m} \tag{13.3}$$

will be useful. The right side may be interpreted as a product of differentials, called a wedge product and made precise below. Because $dx_{j_1, \dots, j_m}(a^1, \dots, a^m)$ is a determinant, interchanging a pair of differentials in (13.3) changes the sign of the product. Furthermore, if there are duplicate indices, then the product is zero. Thus we have the "rules"

$$dx_j \wedge dx_i = -dx_i \wedge dx_j \quad\text{and}\quad dx_i \wedge dx_i = 0. \tag{13.4}$$

Using these rules, one can reduce any $m$-form to its unique canonical representation

$$\omega = \sum_{(i_1, \dots, i_m) \in I_m} g_{i_1, \dots, i_m}\, dx_{i_1} \wedge \cdots \wedge dx_{i_m}.$$


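For example, the 3-form in $\mathbb{R}^4$

$$\omega = f\, dx_2 \wedge dx_1 \wedge dx_2 + g\, dx_3 \wedge dx_2 \wedge dx_1 + h\, dx_2 \wedge dx_4 \wedge dx_1$$

has canonical representation

$$\omega = -g\, dx_1 \wedge dx_2 \wedge dx_3 + h\, dx_1 \wedge dx_2 \wedge dx_4.$$

13.1.8 Definition. Let $1 \le p, q \le n$. The wedge product or exterior product of the forms

$$\omega = \sum_{(j_1, \dots, j_p) \in J_p} f_{j_1, \dots, j_p}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p} \quad\text{and}\quad \eta = \sum_{(k_1, \dots, k_q) \in J_q} g_{k_1, \dots, k_q}\, dx_{k_1} \wedge \cdots \wedge dx_{k_q}$$

is the form

$$\omega \wedge \eta := \sum_{\substack{(j_1, \dots, j_p) \in J_p \\ (k_1, \dots, k_q) \in J_q}} f_{j_1, \dots, j_p}\, g_{k_1, \dots, k_q}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p} \wedge dx_{k_1} \wedge \cdots \wedge dx_{k_q}. \tag{13.5}$$

If $f$ is a 0-form on $S$, then the $p$-form $f\omega = f \wedge \omega$ is defined by

$$f \wedge \omega := \sum_{(j_1, \dots, j_p) \in J_p} f\, f_{j_1, \dots, j_p}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p}.$$ ♦

Note that the right side of (13.5) may be obtained by formally multiplying the sums defining $\omega$ and $\eta$, where the product of the forms $dx_{j_1} \wedge \cdots \wedge dx_{j_p}$ and $dx_{k_1} \wedge \cdots \wedge dx_{k_q}$ is defined as $dx_{j_1} \wedge \cdots \wedge dx_{j_p} \wedge dx_{k_1} \wedge \cdots \wedge dx_{k_q}$. The rules in (13.4) may then be used to obtain the canonical representation of $\omega \wedge \eta$. The resulting form has degree $\le n$, in compliance with our definition.

13.1.9 Example. In $\mathbb{R}^4$,

(a) $(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3 + f_4\, dx_4) \wedge (g_1\, dx_1 + g_2\, dx_2)$
$= (f_1 g_2 - f_2 g_1)\, dx_1 \wedge dx_2 - f_3 g_1\, dx_1 \wedge dx_3 - f_3 g_2\, dx_2 \wedge dx_3 - f_4 g_2\, dx_2 \wedge dx_4 - f_4 g_1\, dx_1 \wedge dx_4.$

(b) $(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3 + f_4\, dx_4) \wedge (h_1\, dx_1 \wedge dx_3 + h_2\, dx_2 \wedge dx_4)$
$= f_1 h_2\, dx_1 \wedge dx_2 \wedge dx_4 - f_2 h_1\, dx_1 \wedge dx_2 \wedge dx_3 - f_3 h_2\, dx_2 \wedge dx_3 \wedge dx_4 + f_4 h_1\, dx_1 \wedge dx_3 \wedge dx_4.$

♦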
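The reduction to canonical form used in 13.1.9 can be mechanized. The following sketch is one possible encoding, not the text's: a form is stored as a dictionary mapping index tuples to coefficients, the coefficients are taken to be numbers (values of $f_i$, $g_i$, $h_i$ at a fixed point, chosen arbitrarily here), and the names `canonical` and `wedge` are illustrative.

```python
def canonical(indices):
    """Sort an index tuple by adjacent interchanges.

    Returns (sign, sorted tuple); the sign is 0 if an index repeats, by rule (13.4).
    """
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    for i in range(len(idx)):                 # bubble sort: one sign flip per swap
        for k in range(len(idx) - 1 - i):
            if idx[k] > idx[k + 1]:
                idx[k], idx[k + 1] = idx[k + 1], idx[k]
                sign = -sign
    return sign, tuple(idx)

def wedge(omega, eta):
    """Wedge product of forms stored as {index tuple: coefficient}, in canonical form."""
    out = {}
    for j, f in omega.items():
        for k, g in eta.items():
            sign, idx = canonical(j + k)      # concatenate index tuples, then reduce
            if sign:
                out[idx] = out.get(idx, 0) + sign * f * g
    return {idx: c for idx, c in out.items() if c != 0}

# Sample values for the coefficient functions at one point (arbitrary choices).
f1, f2, f3, f4 = 2, 3, 5, 7
g1, g2 = 11, 13
h1, h2 = 17, 19
omega = {(1,): f1, (2,): f2, (3,): f3, (4,): f4}

part_a = wedge(omega, {(1,): g1, (2,): g2})
part_b = wedge(omega, {(1, 3): h1, (2, 4): h2})
print(part_a)
print(part_b)
```

With these sample values the coefficient of $dx_1 \wedge dx_2$ in part (a) is $f_1 g_2 - f_2 g_1 = -7$, and the remaining entries match the signs displayed in the example.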

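It must still be shown that the definition of $\omega \wedge \eta$ in (13.5) is independent of the particular representations of $\omega$ and $\eta$. To see this, apply the rules in (13.4), first to the indices $j_1, \dots, j_p$ and then to the indices $k_1, \dots, k_q$, to reduce the right side of (13.5) to

$$\sum_{\substack{(i_1, \dots, i_p) \in I_p \\ (i'_1, \dots, i'_q) \in I_q}} \tilde f_{i_1, \dots, i_p}\, \tilde g_{i'_1, \dots, i'_q}\, dx_{i_1} \wedge \cdots \wedge dx_{i_p} \wedge dx_{i'_1} \wedge \cdots \wedge dx_{i'_q},$$

where

$$\sum_{(i_1, \dots, i_p) \in I_p} \tilde f_{i_1, \dots, i_p}\, dx_{i_1} \wedge \cdots \wedge dx_{i_p} \quad\text{and}\quad \sum_{(i'_1, \dots, i'_q) \in I_q} \tilde g_{i'_1, \dots, i'_q}\, dx_{i'_1} \wedge \cdots \wedge dx_{i'_q}$$

are the canonical representations of $\omega$ and $\eta$. Since the latter are unique, every version of $\omega \wedge \eta$ may be reduced to the same form, hence $\omega \wedge \eta$ is well-defined.

13.1.10 Proposition. Let $\omega$ be a $p$-form, $\eta$ a $q$-form, and $\nu$ an $r$-form, where $1 \le p, q, r \le n$. Then
(a) $\omega \wedge \eta$ is linear in each variable separately;
(b) $(\omega \wedge \eta) \wedge \nu = \omega \wedge (\eta \wedge \nu)$;
(c) $\eta \wedge \omega = (-1)^{pq}\, \omega \wedge \eta$.

Proof. The straightforward proofs of (a) and (b) are left to the reader. For the proof of (c), let $\omega$ and $\eta$ be as in 13.1.8. Then

$$\begin{aligned} \eta \wedge \omega &= \sum_{\substack{(k_1, \dots, k_q) \in J_q \\ (j_1, \dots, j_p) \in J_p}} g_{k_1, \dots, k_q}\, f_{j_1, \dots, j_p}\, dx_{k_1} \wedge \cdots \wedge dx_{k_q} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_p} \\ &= \sum_{\substack{(k_1, \dots, k_q) \in J_q \\ (j_1, \dots, j_p) \in J_p}} g_{k_1, \dots, k_q}\, f_{j_1, \dots, j_p}\, (-1)^{pq}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p} \wedge dx_{k_1} \wedge \cdots \wedge dx_{k_q} \\ &= (-1)^{pq}\, \omega \wedge \eta, \end{aligned}$$

the second equality because $pq$ adjacent interchanges are required.

13.1.11 Proposition. Let $a^j = \sum_{i=1}^n a^j_i e^i$, $j = 1, \dots, n$. Then

$$\Big(\sum_{i=1}^n a^1_i\, dx_i\Big) \wedge \cdots \wedge \Big(\sum_{i=1}^n a^n_i\, dx_i\Big) = \det\,[a^1 \cdots a^n]\, dx_1 \wedge \cdots \wedge dx_n.$$

Proof. By properties of the wedge product, the left side of the equation is

$$\sum_{i_1=1}^n \cdots \sum_{i_n=1}^n a^1_{i_1} \cdots a^n_{i_n}\, dx_{i_1} \wedge \cdots \wedge dx_{i_n} = \sum_{i_1, \dots, i_n \text{ distinct}} a^1_{i_1} \cdots a^n_{i_n}\, dx_{i_1} \wedge \cdots \wedge dx_{i_n}.$$

If $\sigma = (i_1, \dots, i_n)$, then $dx_{i_1} \wedge \cdots \wedge dx_{i_n} = (-1)^\sigma\, dx_1 \wedge \cdots \wedge dx_n$, and the assertion follows from the definition of determinant.

The proposition provides an alternate method for evaluating determinants.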

13.1.12 Example. Let $A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \\ 3 & -2 & 1 \end{bmatrix}$. By wedge product rules applied to the forms constructed from the columns,

$$\begin{aligned} &(1\, dx_1 + 2\, dx_2 + 3\, dx_3) \wedge (3\, dx_1 + 4\, dx_2 - 2\, dx_3) \wedge (5\, dx_1 + 6\, dx_2 + 1\, dx_3) \\ &\quad = (-2\, dx_{1,2} - 11\, dx_{1,3} - 16\, dx_{2,3}) \wedge (5\, dx_1 + 6\, dx_2 + dx_3) \\ &\quad = -2\, dx_{1,2,3} + 66\, dx_{1,2,3} - 80\, dx_{1,2,3} \\ &\quad = -16\, dx_{1,2,3}, \end{aligned}$$

hence $\det(A) = -16$.

♦
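The computation in 13.1.12 follows a fixed pattern: wedge one column form at a time and keep the result in canonical form. A minimal sketch of that pattern is given below; the function names `wedge_one_form` and `det_by_wedge` are assumptions made for illustration, and a 1-form $\sum_i a_i\, dx_i$ is represented simply by its coefficient tuple.

```python
from functools import reduce

def wedge_one_form(form, one_form):
    """Wedge a canonical form {sorted index tuple: coeff} on the right with a 1-form.

    The 1-form sum_i a_i dx_i is given by its coefficients (a_1, ..., a_n).
    """
    out = {}
    for idx, c in form.items():
        for i, a in enumerate(one_form, start=1):
            if i in idx or a == 0:
                continue                           # dx_i ^ dx_i = 0
            # dx_i sits at the far right; each adjacent interchange past a
            # differential with larger index contributes one sign change.
            swaps = sum(1 for k in idx if k > i)
            new_idx = tuple(sorted(idx + (i,)))
            out[new_idx] = out.get(new_idx, 0) + (-1) ** swaps * c * a
    return out

def det_by_wedge(columns):
    """Coefficient of dx_1 ^ ... ^ dx_n in the wedge of the column forms (13.1.11)."""
    n = len(columns)
    top = reduce(wedge_one_form, columns, {(): 1})
    return top.get(tuple(range(1, n + 1)), 0)

columns = [(1, 2, 3), (3, 4, -2), (5, 6, 1)]       # columns of A in 13.1.12
two_form = reduce(wedge_one_form, columns[:2], {(): 1})
print(two_form)                # intermediate 2-form from the example
print(det_by_wedge(columns))   # determinant of A
```

The intermediate dictionary reproduces the coefficients $-2$, $-11$, $-16$ of the second line of the example, and the final value is the determinant.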

The Differential of a Form

13.1.13 Definition. The differential of a 0-form $f$ of class $C^1$ on $S \subseteq \mathbb{R}^n$ is its differential as a $C^1$ function, namely, the 1-form

$$df = \sum_{j=1}^n (\partial_j f)\, dx_j.$$

The differential of an $m$-form

$$\omega = \sum_{(j_1, \dots, j_m) \in J_m} f_{j_1, \dots, j_m}\, dx_{j_1} \wedge \cdots \wedge dx_{j_m}$$

of class $C^1$ on $S$ is the $(m+1)$-form $d\omega$ defined by

$$\begin{aligned} d\omega &= \sum_{(j_1, \dots, j_m) \in J_m} (df_{j_1, \dots, j_m}) \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_m} \qquad (13.6) \\ &= \sum_{(j_1, \dots, j_m) \in J_m} \sum_{j=1}^n (\partial_j f_{j_1, \dots, j_m})\, dx_j \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_m}. \end{aligned}$$ ♦

Note that if $m = n$, then $d\omega = 0$, since in the last expression every $dx_j$ is a $dx_{j_i}$ for some $i$.

As in the case of wedge products, it must be verified that the definition of $d\omega$ does not depend on the particular representation of $\omega$. For this we use the rules in (13.4) to express $\omega$ canonically as

$$\omega = \sum_{(i_1, \dots, i_m) \in I_m} g_{i_1, \dots, i_m}\, dx_{i_1} \wedge \cdots \wedge dx_{i_m}.$$

Here, each $g_{i_1, \dots, i_m}$ is a linear combination of the functions $f_{j_1, \dots, j_m}$ produced by combining these functions during the reduction process. Applying the same sequence of operations to the sum on the right in (13.6) results in

$$\sum_{(i_1, \dots, i_m) \in I_m} \eta_{i_1, \dots, i_m} \wedge dx_{i_1} \wedge dx_{i_2} \wedge \cdots \wedge dx_{i_m},$$

where $\eta_{i_1, \dots, i_m}$ is precisely the same linear combination of the forms $df_{j_1, \dots, j_m}$. Since the differential is linear on 0-forms, $\eta_{i_1, \dots, i_m} = dg_{i_1, \dots, i_m}$. Therefore, all versions of $d\omega$ may be reduced to the same form and hence are equal.

For the next example, we introduce the following notation and terminology from classical vector analysis.

13.1.14 Definition. The curl of a $C^1$ vector field $\vec F = (f_1, f_2, f_3)$ on an open subset of $\mathbb{R}^3$ is the vector

$$\operatorname{curl} \vec F = (\partial_2 f_3 - \partial_3 f_2)\, e^1 + (\partial_3 f_1 - \partial_1 f_3)\, e^2 + (\partial_1 f_2 - \partial_2 f_1)\, e^3.$$

The divergence of a $C^1$ vector field $\vec F = (f_1, \dots, f_n)$ on an open subset of $\mathbb{R}^n$ is defined by

$$\operatorname{div} \vec F = \sum_{i=1}^n \partial_i f_i.$$

If $\omega = \sum_{j=1}^n f_j\, dx_j$ we define $\operatorname{div} \omega = \operatorname{div} \vec F$. ♦

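13.1.15 Example. In $\mathbb{R}^3$,

(a)
$$\begin{aligned} d(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3) &= (\partial_1 f_1\, dx_1 + \partial_2 f_1\, dx_2 + \partial_3 f_1\, dx_3) \wedge dx_1 \\ &\quad + (\partial_1 f_2\, dx_1 + \partial_2 f_2\, dx_2 + \partial_3 f_2\, dx_3) \wedge dx_2 \\ &\quad + (\partial_1 f_3\, dx_1 + \partial_2 f_3\, dx_2 + \partial_3 f_3\, dx_3) \wedge dx_3 \\ &= (\partial_2 f_3 - \partial_3 f_2)\, dx_{2,3} + (\partial_3 f_1 - \partial_1 f_3)\, dx_{3,1} + (\partial_1 f_2 - \partial_2 f_1)\, dx_{1,2} \\ &= (e^1 \cdot \operatorname{curl} \vec F)\, dx_{2,3} + (e^2 \cdot \operatorname{curl} \vec F)\, dx_{3,1} + (e^3 \cdot \operatorname{curl} \vec F)\, dx_{1,2}. \end{aligned}$$

(b)
$$\begin{aligned} d(f_3\, dx_1 \wedge dx_2 + f_1\, dx_2 \wedge dx_3 + f_2\, dx_3 \wedge dx_1) &= (\partial_1 f_3\, dx_1 + \partial_2 f_3\, dx_2 + \partial_3 f_3\, dx_3) \wedge dx_1 \wedge dx_2 \\ &\quad + (\partial_1 f_1\, dx_1 + \partial_2 f_1\, dx_2 + \partial_3 f_1\, dx_3) \wedge dx_2 \wedge dx_3 \\ &\quad + (\partial_1 f_2\, dx_1 + \partial_2 f_2\, dx_2 + \partial_3 f_2\, dx_3) \wedge dx_3 \wedge dx_1 \\ &= (\partial_1 f_1 + \partial_2 f_2 + \partial_3 f_3)\, dx_1 \wedge dx_2 \wedge dx_3 \\ &= (\operatorname{div} \vec F)\, dx_1 \wedge dx_2 \wedge dx_3. \end{aligned}$$

♦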
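Both parts of 13.1.15 can be spot-checked numerically. The sketch below is an illustration under stated assumptions: forms are stored in canonical form as dictionaries from increasing index tuples to coefficient functions, partial derivatives are estimated by central differences rather than computed exactly, and the vector field `F` and the names `partial` and `d_at` are hypothetical choices.

```python
def partial(f, x, j, h=1e-5):
    """Central-difference estimate of the j-th partial derivative (1-based) of f at x."""
    xp, xm = list(x), list(x)
    xp[j - 1] += h
    xm[j - 1] -= h
    return (f(xp) - f(xm)) / (2 * h)

def d_at(form, x):
    """Canonical coefficients of d(form) at the point x, following (13.6)."""
    out = {}
    for idx, f in form.items():
        for j in range(1, len(x) + 1):
            if j in idx:
                continue                           # dx_j ^ dx_j = 0
            # dx_j starts at the far left and moves right past smaller indices.
            swaps = sum(1 for k in idx if k < j)
            new_idx = tuple(sorted((j,) + idx))
            out[new_idx] = out.get(new_idx, 0.0) + (-1) ** swaps * partial(f, x, j)
    return out

# Sample field F = (f1, f2, f3) = (x2*x3, x1^2, x1*x2*x3), an arbitrary choice.
f1 = lambda x: x[1] * x[2]
f2 = lambda x: x[0] ** 2
f3 = lambda x: x[0] * x[1] * x[2]
point = (1.0, 2.0, 3.0)

# Part (a): the 1-form f1 dx1 + f2 dx2 + f3 dx3.
curl_form = d_at({(1,): f1, (2,): f2, (3,): f3}, point)
# Part (b): f3 dx1^dx2 + f1 dx2^dx3 + f2 dx3^dx1, where dx3^dx1 = -dx1^dx3.
div_form = d_at({(1, 2): f3, (2, 3): f1, (1, 3): lambda x: -f2(x)}, point)
print(curl_form)   # coefficients of dx_{1,2}, dx_{1,3}, dx_{2,3}
print(div_form)    # coefficient of dx_{1,2,3}
```

For this field, $\operatorname{curl} \vec F = (x_1 x_3,\ x_2 - x_2 x_3,\ 2x_1 - x_3)$ and $\operatorname{div} \vec F = x_1 x_2$, so at the point $(1, 2, 3)$ the computed coefficients should be close to $3$ on $dx_{2,3}$, $4$ on $dx_{1,3}$ (that is, $-4$ on $dx_{3,1}$), $-1$ on $dx_{1,2}$, and $2$ on $dx_{1,2,3}$.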

13.1.16 Theorem. Let $f$ be a 0-form, let $\omega$ and $\eta$ be $p$-forms, and let $\nu$ be a $q$-form, all of class $C^1$ on $S \subseteq \mathbb{R}^n$. Then
(a) $d(a\omega + b\eta) = a\, d\omega + b\, d\eta$, $a, b \in \mathbb{R}$;
(b) $d^2\omega := d(d\omega) = 0$;
(c) $d(\omega \wedge \nu) = (d\omega) \wedge \nu + (-1)^p\, \omega \wedge (d\nu)$;
(d) $d(f\nu) = (df) \wedge \nu + f\, d\nu$.

Proof. Part (a) is clear from the definition of addition and scalar multiplication of $m$-forms and the linearity of the differential operator on 0-forms.

For (b), it suffices by linearity to prove that

$$d\big[(df) \wedge dx_{j_1} \wedge dx_{j_2} \wedge \cdots \wedge dx_{j_p}\big] = 0.$$

The left side of this equation is

$$d\Big[\sum_{k=1}^n (\partial_k f)\, dx_k \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_p}\Big] = \Big[\sum_{j=1}^n \sum_{k=1}^n \partial_j \partial_k f\, dx_j \wedge dx_k\Big] \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_p}.$$

Since $dx_k \wedge dx_j = -dx_j \wedge dx_k$ and $\partial_j \partial_k f = \partial_k \partial_j f$, the terms in the square brackets on the right cancel pairwise, producing zero, as required.

To prove (c), let

$$\omega = \sum_{j \in J_p} f_j\, dx_j \quad\text{and}\quad \nu = \sum_{k \in J_q} g_k\, dx_k.$$

By the product rule for differentials of 0-forms,

$$\begin{aligned} d(\omega \wedge \nu) &= \sum_{j \in J_p,\, k \in J_q} d(f_j g_k) \wedge dx_j \wedge dx_k \\ &= \sum_{j \in J_p,\, k \in J_q} g_k\, (df_j) \wedge dx_j \wedge dx_k + \sum_{j \in J_p,\, k \in J_q} f_j\, (dg_k) \wedge dx_j \wedge dx_k \\ &= (d\omega) \wedge \nu + (-1)^p\, \omega \wedge (d\nu), \end{aligned}$$

the last equality because $p$ adjacent interchanges are needed to place the form $dg_k$ in the second sum to the immediate left of $dx_k$. Part (d) follows from (c) with $p = 0$.

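The Pullback of a Form

Throughout this subsection, $U \subseteq \mathbb{R}^m$ and $W \subseteq \mathbb{R}^n$ are open and $\varphi : U \to W$ is a $C^1$ map.

13.1.17 Definition. The pullback by $\varphi$ of a $C^1$ function (0-form) $f$ on $W$ is the 0-form $\varphi^*(f)$ on $U$ defined by

$$\varphi^*(f)(u) := f(\varphi(u)), \quad u \in U.$$

The pullback by $\varphi$ of the 1-form $dx_j$ on $W$ is the 1-form $\varphi^*(dx_j)$ on $U$ defined by

$$\varphi^*(dx_j) := \sum_{i=1}^m \frac{\partial \varphi_j}{\partial u_i}\, du_i = d\varphi_j, \quad j = 1, \dots, n.$$

The pullback by $\varphi$ of the $C^1$ $p$-form

$$\omega = \sum_{(j_1, \dots, j_p) \in J_p} f_{j_1, \dots, j_p}\, dx_{j_1} \wedge \cdots \wedge dx_{j_p}$$

on $W$ is the $C^1$ $p$-form $\varphi^*\omega$ on $U$ defined by

$$\varphi^*\omega := \sum_{(j_1, \dots, j_p) \in J_p} \varphi^*(f_{j_1, \dots, j_p})\, \varphi^*(dx_{j_1}) \wedge \cdots \wedge \varphi^*(dx_{j_p}).$$ ♦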

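Arguments similar to those used earlier show that the definition of $\varphi^*\omega$ is independent of the representation of $\omega$.

13.1.18 Example. Let $\varphi = (\varphi_1, \varphi_2, \varphi_3) : \mathbb{R}^2 \to \mathbb{R}^3$ be $C^1$. Then

(a)
$$\begin{aligned} \varphi^*(f\, dx_1 \wedge dx_2) &= \varphi^*(f)\, \varphi^*(dx_1) \wedge \varphi^*(dx_2) \\ &= (f \circ \varphi) \Big(\frac{\partial \varphi_1}{\partial u_1}\, du_1 + \frac{\partial \varphi_1}{\partial u_2}\, du_2\Big) \wedge \Big(\frac{\partial \varphi_2}{\partial u_1}\, du_1 + \frac{\partial \varphi_2}{\partial u_2}\, du_2\Big) \\ &= (f \circ \varphi) \Big(\frac{\partial \varphi_1}{\partial u_1} \frac{\partial \varphi_2}{\partial u_2} - \frac{\partial \varphi_2}{\partial u_1} \frac{\partial \varphi_1}{\partial u_2}\Big)\, du_1 \wedge du_2. \end{aligned}$$

(b)
$$\begin{aligned} \varphi^*(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3) &= \varphi^*(f_1)\, \varphi^*(dx_1) + \varphi^*(f_2)\, \varphi^*(dx_2) + \varphi^*(f_3)\, \varphi^*(dx_3) \\ &= (f_1 \circ \varphi) \Big(\frac{\partial \varphi_1}{\partial u_1}\, du_1 + \frac{\partial \varphi_1}{\partial u_2}\, du_2\Big) + (f_2 \circ \varphi) \Big(\frac{\partial \varphi_2}{\partial u_1}\, du_1 + \frac{\partial \varphi_2}{\partial u_2}\, du_2\Big) \\ &\quad + (f_3 \circ \varphi) \Big(\frac{\partial \varphi_3}{\partial u_1}\, du_1 + \frac{\partial \varphi_3}{\partial u_2}\, du_2\Big) \\ &= \Big[(f_1 \circ \varphi) \frac{\partial \varphi_1}{\partial u_1} + (f_2 \circ \varphi) \frac{\partial \varphi_2}{\partial u_1} + (f_3 \circ \varphi) \frac{\partial \varphi_3}{\partial u_1}\Big]\, du_1 \\ &\quad + \Big[(f_1 \circ \varphi) \frac{\partial \varphi_1}{\partial u_2} + (f_2 \circ \varphi) \frac{\partial \varphi_2}{\partial u_2} + (f_3 \circ \varphi) \frac{\partial \varphi_3}{\partial u_2}\Big]\, du_2. \end{aligned}$$ ♦

13.1.19 Theorem. If $\omega$ and $\eta$ are $C^1$ $p$-forms and $\nu$ is a $C^1$ $q$-form, then
(a) $\varphi^*(a\omega + b\eta) = a\varphi^*(\omega) + b\varphi^*(\eta)$, $a, b \in \mathbb{R}$;
(b) $\varphi^*(\omega \wedge \nu) = \varphi^*(\omega) \wedge \varphi^*(\nu)$;
(c) $\varphi^*(d\omega) = d\varphi^*(\omega)$;
(d) $(\varphi^*\omega)_u(a^1, \dots, a^p) = \omega_{\varphi(u)}\big(d\varphi_u(a^1), \dots, d\varphi_u(a^p)\big)$.

Proof. Part (a) follows directly from the definition of pullback. Part (b) is easily established for $\omega = f\, dx_{i_1} \wedge \cdots \wedge dx_{i_p}$ and $\nu = g\, dx_{j_1} \wedge \cdots \wedge dx_{j_q}$; bilinearity of the wedge product and linearity of $\varphi^*$ then imply that (b) holds generally. For (c) it suffices, by linearity of the differential and pullback, to verify that

$$\varphi^*\, d(f\, dx_{j_1} \wedge \cdots \wedge dx_{j_p}) = d\varphi^*(f\, dx_{j_1} \wedge \cdots \wedge dx_{j_p}),$$

that is,

$$\sum_{j=1}^n [(\partial_j f) \circ \varphi]\, \varphi^*(dx_j) \wedge \varphi^*(dx_{j_1}) \wedge \cdots \wedge \varphi^*(dx_{j_p}) = \Big[\sum_{i=1}^m \partial_i(f \circ \varphi)\, du_i\Big] \wedge \varphi^*(dx_{j_1}) \wedge \cdots \wedge \varphi^*(dx_{j_p}). \tag{13.7}$$

By the chain rule,

$$\partial_i(f \circ \varphi) = \sum_{j=1}^n [(\partial_j f) \circ \varphi]\, \frac{\partial \varphi_j}{\partial u_i},$$

hence the right side of (13.7) is

$$\Big(\sum_{j=1}^n [(\partial_j f) \circ \varphi] \sum_{i=1}^m \frac{\partial \varphi_j}{\partial u_i}\, du_i\Big) \wedge \varphi^*(dx_{j_1}) \wedge \cdots \wedge \varphi^*(dx_{j_p}).$$

Recalling the definition of $\varphi^*(dx_j)$, we see that the last expression is precisely the left side of (13.7).

To prove (d), let $\omega$ have canonical representation

$$\omega = \sum_{(i_1, \dots, i_p) \in I_p} f_{i_1, \dots, i_p}\, dx_{i_1} \wedge \cdots \wedge dx_{i_p}.$$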

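By 13.1.2, it suffices to show that

$$(\varphi^*\omega)_u(e^{\ell_1}, \dots, e^{\ell_p}) = \omega_{\varphi(u)}\big(d\varphi_u(e^{\ell_1}), \dots, d\varphi_u(e^{\ell_p})\big)$$

for any $(\ell_1, \dots, \ell_p) \in I_p$. The left side of this equation is

$$\sum_{i \in I_p} \varphi^*(f_i)(u)\, \big[\varphi^*(dx_{i_1}) \wedge \cdots \wedge \varphi^*(dx_{i_p})\big](e^{\ell_1}, \dots, e^{\ell_p})$$

and the right side is

$$\sum_{i \in I_p} f_i(\varphi(u))\, \big[dx_{i_1} \wedge \cdots \wedge dx_{i_p}\big]\big(d\varphi_u(e^{\ell_1}), \dots, d\varphi_u(e^{\ell_p})\big).$$

Hence it suffices to prove that

$$\big[\varphi^*(dx_{i_1}) \wedge \cdots \wedge \varphi^*(dx_{i_p})\big](e^{\ell_1}, \dots, e^{\ell_p}) = \big[dx_{i_1} \wedge \cdots \wedge dx_{i_p}\big]\big(d\varphi_u(e^{\ell_1}), \dots, d\varphi_u(e^{\ell_p})\big). \tag{13.8}$$

By multilinearity,

$$\varphi^*(dx_{i_1}) \wedge \cdots \wedge \varphi^*(dx_{i_p}) = \sum_{(j_1, \dots, j_p) \in J_p} \frac{\partial \varphi_{i_1}}{\partial u_{j_1}} \cdots \frac{\partial \varphi_{i_p}}{\partial u_{j_p}}\, du_{j_1} \wedge \cdots \wedge du_{j_p}.$$

Now, $du_{j_1} \wedge \cdots \wedge du_{j_p}(e^{\ell_1}, \dots, e^{\ell_p}) \ne 0$ only if the $p$-tuple $(j_1, \dots, j_p)$ is a permutation of $(\ell_1, \dots, \ell_p)$. For each such $p$-tuple define a permutation $\sigma$ of $(1, \dots, p)$ such that $\ell_k = j_{\sigma(k)}$. Then

$$du_{j_1} \wedge \cdots \wedge du_{j_p}(e^{\ell_1}, \dots, e^{\ell_p}) = (-1)^\sigma\, du_{\ell_1} \wedge \cdots \wedge du_{\ell_p}(e^{\ell_1}, \dots, e^{\ell_p}) = (-1)^\sigma$$

and

$$\frac{\partial \varphi_{i_1}}{\partial u_{j_1}} \cdots \frac{\partial \varphi_{i_p}}{\partial u_{j_p}} = \frac{\partial \varphi_{i_1}}{\partial u_{\ell_{\tau(1)}}} \cdots \frac{\partial \varphi_{i_p}}{\partial u_{\ell_{\tau(p)}}} = \frac{\partial \varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}} \cdots \frac{\partial \varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}},$$

where $\tau = \sigma^{-1}$. Thus the left side of (13.8) is

$$\big[\varphi^*(dx_{i_1}) \wedge \cdots \wedge \varphi^*(dx_{i_p})\big](e^{\ell_1}, \dots, e^{\ell_p}) = \sum_\sigma (-1)^\sigma\, \frac{\partial \varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}} \cdots \frac{\partial \varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}}, \tag{13.9}$$

where the sum is taken over all permutations $\sigma$ of $(1, \dots, p)$. On the other hand, since

$$d\varphi_u(e^{\ell_j}) = \partial_{\ell_j} \varphi(u) = \sum_{i=1}^n \frac{\partial \varphi_i(u)}{\partial u_{\ell_j}}\, e^i,$$

the right side of (13.8) is

$$\begin{aligned} &\big[dx_{i_1} \wedge \cdots \wedge dx_{i_p}\big]\Big(\sum_{j=1}^n \frac{\partial \varphi_j}{\partial u_{\ell_1}}\, e^j, \dots, \sum_{j=1}^n \frac{\partial \varphi_j}{\partial u_{\ell_p}}\, e^j\Big) \\ &\qquad = \sum_{(j_1, \dots, j_p) \in J_p} \frac{\partial \varphi_{j_1}}{\partial u_{\ell_1}} \cdots \frac{\partial \varphi_{j_p}}{\partial u_{\ell_p}}\, dx_{i_1} \wedge \cdots \wedge dx_{i_p}(e^{j_1}, \dots, e^{j_p}). \qquad (13.10) \end{aligned}$$

As above, $dx_{i_1} \wedge \cdots \wedge dx_{i_p}(e^{j_1}, \dots, e^{j_p}) \ne 0$ only if the $p$-tuple $(j_1, \dots, j_p)$ is a permutation of $(i_1, \dots, i_p)$. For each such $p$-tuple, define a permutation $\sigma$ of $(1, \dots, p)$ such that $j_k = i_{\sigma(k)}$. Then

$$dx_{i_1} \wedge \cdots \wedge dx_{i_p}(e^{j_1}, \dots, e^{j_p}) = dx_{i_1} \wedge \cdots \wedge dx_{i_p}(e^{i_{\sigma(1)}}, \dots, e^{i_{\sigma(p)}}) = (-1)^\sigma$$

and

$$\frac{\partial \varphi_{j_1}}{\partial u_{\ell_1}} \cdots \frac{\partial \varphi_{j_p}}{\partial u_{\ell_p}} = \frac{\partial \varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}} \cdots \frac{\partial \varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}},$$

so (13.10) reduces to

$$\sum_\sigma (-1)^\sigma\, \frac{\partial \varphi_{i_{\sigma(1)}}}{\partial u_{\ell_1}} \cdots \frac{\partial \varphi_{i_{\sigma(p)}}}{\partial u_{\ell_p}},$$

where the sum is taken over all permutations of $(1, \dots, p)$. As this is precisely (13.9), the proof is complete.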
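Part (d) of 13.1.19 can be checked by hand on a concrete map. The sketch below is an illustration under stated assumptions: the map $\varphi(u_1, u_2) = (u_1 u_2,\ u_1 + u_2,\ u_1^2)$, the function $f(x) = x_1 + x_3$, the test vectors, and all helper names are hypothetical choices, and the Jacobian is entered by hand rather than derived by the code.

```python
def phi(u):
    """Sample C^1 map R^2 -> R^3 (an arbitrary choice for illustration)."""
    u1, u2 = u
    return (u1 * u2, u1 + u2, u1 ** 2)

def jacobian(u):
    """Hand-computed Jacobian matrix of phi; row r holds the partials of phi_r."""
    u1, u2 = u
    return [[u2, u1], [1.0, 1.0], [2 * u1, 0.0]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def apply(J, a):
    """The vector d(phi)_u(a) = J a."""
    return [sum(J[r][c] * a[c] for c in range(len(a))) for r in range(len(J))]

def f(x):
    return x[0] + x[2]

def pullback_side(i, u, a1, a2):
    """(phi^* (f dx_{i1} ^ dx_{i2}))_u (a1, a2), computed as in Example 13.1.18(a)."""
    J = jacobian(u)
    minor = det2([J[i[0] - 1], J[i[1] - 1]])       # coefficient of du1 ^ du2
    return f(phi(u)) * minor * det2([[a1[0], a2[0]], [a1[1], a2[1]]])

def pushforward_side(i, u, a1, a2):
    """(f dx_{i1} ^ dx_{i2})_{phi(u)} (d(phi)_u a1, d(phi)_u a2), the right side of (d)."""
    J = jacobian(u)
    v1, v2 = apply(J, a1), apply(J, a2)
    return f(phi(u)) * det2([[v1[i[0] - 1], v2[i[0] - 1]], [v1[i[1] - 1], v2[i[1] - 1]]])

u, a1, a2 = (2.0, 3.0), (1.0, 2.0), (-1.0, 1.0)
for i in [(1, 2), (1, 3), (2, 3)]:
    print(i, pullback_side(i, u, a1, a2), pushforward_side(i, u, a1, a2))
```

For $i = (1, 2)$ the Jacobian minor is $u_2 - u_1 = 1$, $f(\varphi(u)) = 10$, and $\det[a^1\ a^2] = 3$, so both sides equal $30$. The agreement of the two columns for each $i$ is an instance of the product rule for determinants.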

460

A Course in Real Analysis

Exercises 1. Let Tj ∈ L(Rn , R), j = 1, . . . , m. Which of the following functions is multilinear on Rn ? Pm Qm (a) M (x1 , . . . , xm ) := i=1 Ti (xi ). (b) M (x1 , . . . , xm ) := i=1 Ti (xi ). 2. For fixed c = (c1 , c2 ), d = (d1 , d2 ) ∈ R2 define M (x, y) := (c · x)(d · y) − (c · y)(d · x), x, y ∈ R2 . (a) Show that M is an alternating multilinear functional on R2 . (b) Express M in terms of differentials, as in 13.1.3 3.S Let M (a1 , . . . , am ) be a multilinear functional on Rn with the property that M (a1 , . . . , am ) = 0 whenever two of the vectors aj are equal. Prove that M is alternating. 4. Let M be an alternating m-multilinear functional on Rn . Show that if the vectors a1 , . . . , am are linearly dependent, then M (a1 , . . . , am ) = 0. 5. Let M (a1 , . . . , am ) be an m-multilinear functional on Rn . Define Alt(M )(a1 , . . . , am ) =

1 X (−1)σ M aσ(1) , . . . , aσ(m) , m! σ

where the sum is taken over all permutations σ of (1, . . . , m). Show that Alt(M ) is an alternating m-multilinear functional on Rn and that Alt(M ) = M iff M is alternating. n 6. Prove that the vector space of m-forms on S has dimension m . 7. Find the canonical representation of the following forms in R3 : (a)S (f1 dx1 + f2 dx2 + f3 dx3 ) ∧ (g1 dx1 + g2 dx2 + g3 dx3 ). (b) (f1 dx1 + f2 dx2 + f3 dx3 ) ∧ (g1 dx1 + g2 dx2 + g3 dx3 ) ∧(h1 dx1 + h2 dx2 + h3 dx3 ). 8. Find the canonical representation of the following forms in R5 : (a) (−dx1 + dx2 + dx3 ∧ (dx1 − 2dx2 + 3dx3 ). (b)S (dx1 + dx2 ) ∧ (dx1 − dx3 ) ∧ (dx2 + 2dx3 ). (c) dx1 ∧ (dx1 ∧ dx3 + 3dx5 ∧ dx4 ). (d) dx1 ∧ dx2 + dx1 ∧ dx3 ∧ dx 4 ∧ dx3 + dx2 ∧ dx5 ∧ dx3 ∧ dx1 + dx4 ∧ dx1 . 9. Find the canonical representation of the following forms in Rn : (a)S dx2 ∧ dx4 ∧ · · · ∧ dx2k ∧ dx1 ∧ dx3 ∧ · · · ∧ dx2k−1 , 2k ≤ n. (b) dx1 ∧ dx5 ∧ · · · ∧ dx4k−3 ∧ dx3 ∧ dx7 ∧ · · · ∧ dx4k−1 ∧ dx2 ∧ dx6 ∧ · · · ∧ dx4k−2 ∧ dx4 ∧ dx8 ∧ · · · ∧ dx4k , 4k ≤ n.


10. Show that if $\omega$ is an $m$-form and $m$ is odd, then $\omega \wedge \omega = 0$. Find an example of a 2-form $\omega$ in $\mathbb{R}^4$ such that $\omega \wedge \omega \ne 0$.

11. Use the method of 13.1.12 to verify the determinants 1 −1 −1 3 1 0 2 1 1 = -4. (b)S 2 −1 0 = 9. (c) 0 (a) −1 1 0 −1 1 −1 2 1 1 2 −1 1 = -6. 1 0

12. Show directly that in $\mathbb{R}^n$, $d(f\, dx_1 \wedge \cdots \wedge dx_n) = 0$.

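13. Let $f : \mathbb{R} \to \mathbb{R}$ be $C^1$ and define $g_j(x) = f(x_j)$. Find the canonical representation of
(a)S $d\Big(\sum_{j=1}^n g_j\, dx_j\Big)$. (b) $d\Big(\sum_{j=1}^n g_{n-j+1}\, dx_j\Big)$.

14. Find $d(f\, dg)$, where $f$ is $C^1$ and $g$ is $C^2$ on $W$.

15.S A form $\eta$ on $W$ is exact if $\eta = d\omega$ for some form $\omega$ on $W$. Prove that if $\eta$ is exact and $d\nu = 0$, then $\eta \wedge \nu$ is exact.

16. Let $f$ and $\omega := \sum_{i=1}^n f_i\, dx_i$ be $C^1$ on an open set $W \subseteq \mathbb{R}^n$. Show that if $d(f\omega) = 0$, then $f\omega \wedge d\omega = (df) \wedge \omega \wedge \omega$.

17.S Let $U \subseteq \mathbb{R}^k$, $V \subseteq \mathbb{R}^\ell$, and $W \subseteq \mathbb{R}^n$ be open and let $\varphi : U \to V$ and $\psi : V \to W$ be $C^1$. If $\omega$ is an $m$-form on $W$, prove that $(\psi \circ \varphi)^*\omega = \varphi^*(\psi^*\omega)$. Hint. Use 13.1.19(d).

18. Let $U, W \subseteq \mathbb{R}^n$ be open and let $\varphi : U \to W$ and $f : W \to \mathbb{R}^n$ be $C^1$. Show that $\varphi^*(dx_1 \wedge \cdots \wedge dx_n) = \det(\varphi')\, du_1 \wedge \cdots \wedge du_n$.

19.S Let $F = (f_1, f_2, f_3)$ be $C^1$ on $\mathbb{R}^3$ and homogeneous of degree $k \in \mathbb{N}$. (See Exercise 9.3.15.) Let $\omega = f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3$. Show that if $d\omega = 0$, then $\omega = df$ where $f(x) = (k+1)^{-1} F(x) \cdot x$.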

13.2 Integrals on Parameterized Surfaces

Recall that the length of a parameterized curve $C$ in $\mathbb{R}^n$ is, by definition, a limit of lengths of inscribed polygonal lines. The proof of 12.2.4 shows that if the curve $C$ is $C^1$, then its length may also be approximated by tangent line segments. This idea may be extended to higher dimensions, using tangent parallelepipeds to approximate surface area. This leads ultimately to the definition of the integral of a function or a form on a surface.


Area of a Parallelepiped 13.2.1 Definition. The parallelepiped spanned by vectors a1 , . . . , am ∈ Rn is the set X m 1 m i P = P (a , . . . , a ) := ti a : 0 ≤ ti ≤ 1 . i=1

The volume vol(P ) of P is its n-dimensional Lebesgue measure.

♦

For m = n, there is a simple formula for the volume: 13.2.2 Lemma. vol(P ) = det a1 . . . an . Proof. Denote by T ∈ L(Rn , Rn ) the linear mapping with matrix A := 1 n a · · · a . Since T (ej ) = aj , a typical member of P := P (a1 , . . . , an ) may be expressed as X n n X ti ai = T ti ei = T (t1 , . . . , tn ) ∈ T ([0, 1]n ) . i=1

i=1

By 11.6.3 and 11.6.9, λn (P ) = λn (T ([0, 1]n )) = | det A|λn ([0, 1]n ) = | det A|. If m < n, then λn (P ) = 0 but P may still have positive m-dimensional Lebesgue measure, as defined in 11.6.9. Specifically, let V denote the linear span of the vectors a1 , . . ., am and choose an orthonormal basis v 1 , . . ., v n of Rn such that v 1 , . . ., v m is a basis for V. Define T ∈ L(V, Rm ) so that T (v j ) = ej , 1 ≤ j ≤ m. Thus T “rotates” and/or “reflects” V onto Rm × {0}. The area of P is then defined by area P (a1 , . . . , am ) = λm T P (a1 , . . . , am ) . A concrete value for this area is given in the following theorem. 13.2.3 Theorem. Let m < n, a1 , . . ., am ∈ Rn , and A = [a1 · · · am ]. Then X p 2 1/2 area P (a1 , . . . , am ) = det(At A) = det Ai . i∈Im

Proof. Set $b^j = T(a^j)$ and $B = [b^1 \cdots b^m]$. By linearity of $T$,
$$T\big(P(a^1, \dots, a^m)\big) = P(b^1, \dots, b^m) \subseteq \mathbb{R}^m,$$
hence, by 13.2.2, $\mathrm{area}\,P(a^1, \dots, a^m) = \lambda^m\big(P(b^1, \dots, b^m)\big) = |\det B|$. Now, the $(i,j)$th entry of $B^t B$ is $b^i \cdot b^j$, and because $T$ preserves inner products this is the same as $a^i \cdot a^j$. Therefore, $B^t B = A^t A$, hence
$$|\det B| = \sqrt{(\det B^t)(\det B)} = \sqrt{\det(B^t B)} = \sqrt{\det(A^t A)}.$$
This proves the first equality in the theorem. The second equality is from 13.1.6.
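The two expressions for the area in 13.2.3 can be compared numerically. The following is a minimal sketch in plain Python, not part of the text: the helper names `det`, `gram_area`, and `minor_area` and the sample vectors are illustrative assumptions. It evaluates $\sqrt{\det(A^tA)}$ and the square root of the sum of squared $m \times m$ minors for two vectors in $\mathbb{R}^3$.

```python
from itertools import combinations
from math import sqrt

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def gram_area(cols):
    """sqrt(det(A^t A)), where A has the given vectors as its columns."""
    gram = [[sum(x * y for x, y in zip(a, b)) for b in cols] for a in cols]
    return sqrt(det(gram))

def minor_area(cols):
    """sqrt of the sum of squared m x m minors, one for each increasing row tuple."""
    n, m = len(cols[0]), len(cols)
    total = 0.0
    for rows in combinations(range(n), m):
        sub = [[cols[j][i] for j in range(m)] for i in rows]
        total += det(sub) ** 2
    return sqrt(total)

# Hypothetical sample data: a^1 = (1, 0, 2), a^2 = (0, 3, 1) in R^3.
a1, a2 = [1.0, 0.0, 2.0], [0.0, 3.0, 1.0]
print(gram_area([a1, a2]), minor_area([a1, a2]))
```

For this sample, $A^tA$ has determinant $5 \cdot 10 - 2^2 = 46$ and the three $2 \times 2$ minors are $3$, $1$, and $-6$, so both functions should return $\sqrt{46}$ up to rounding.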


Area of a Parameterized Surface

Let $\varphi : U \to \mathbb{R}^n$ be a parameterized $m$-surface in $\mathbb{R}^n$ with image $S$ and let $u = (u_1, \dots, u_m) \in U$ and $a = \varphi(u) \in S$. Choose a small $m$-dimensional interval
$$Q = [u_1, u_1 + \Delta u_1] \times \cdots \times [u_m, u_m + \Delta u_m] \subseteq U, \quad \Delta u_j > 0.$$
As noted in Chapter 12, the line segments $u + t e_j$ in $U$ map onto curves in $S$ with tangent vectors
$$d\varphi_u(e_j) = \partial_j \varphi(u), \quad 1 \le j \le m,$$
at $\varphi(u)$. The matrix with columns $\partial_j \varphi(u)$ is $\varphi'(u)$, the Jacobian matrix of $\varphi$ at $u$. By 13.2.3, the parallelepiped spanned by the vectors $\Delta u_j\,\partial_j \varphi(u)$ therefore has area
$$\sqrt{\det \varphi'(u)^t \varphi'(u)}\;\Delta u_1 \Delta u_2 \cdots \Delta u_m,$$
which is taken as an approximation of the area of the surface element $\varphi(Q)$.

FIGURE 13.1: Parallelogram approximation to ϕ(Q).

Partitioning $U$ into a grid $\mathcal{Q}$ of intervals $Q$ and summing these expressions, we obtain the Riemann sums
$$\sum_{Q} \sqrt{\det \varphi'(u)^t \varphi'(u)}\;\Delta u_1 \Delta u_2 \cdots \Delta u_m.$$
It is reasonable then to define the area of $S$ as the limit of these sums as the diameters of the intervals $Q$ tend to zero, that is,
$$\mathrm{area}(\varphi) := \int_U \sqrt{\det \varphi'(u)^t \varphi'(u)}\,du. \tag{13.11}$$

Integral of a Function on a Parameterized Surface

Let $f$ be a continuous, real-valued function on $S = \varphi(U)$. Motivated by (13.11) we define the surface integral of $f$ over $\varphi$ by
$$\int_\varphi f\,dS = \int_U (f \circ \varphi)(u) \sqrt{\det \varphi'(u)^t \varphi'(u)}\,du \tag{13.12}$$


whenever the right side exists. In particular,
$$\mathrm{area}(S) = \int_\varphi 1\,dS.$$

The integral on the right in (13.12) may be interpreted as a Lebesgue integral or (if ϕ has compact support) as a Riemann integral. In the latter case, it is a limit of Riemann sums
$$\sum_{Q} (f \circ \varphi)(u) \sqrt{\det \varphi'(u)^t \varphi'(u)}\;\Delta u_1 \cdots \Delta u_m. \tag{13.13}$$

This interpretation has important physical applications. For example, if $f$ is the density in mass per unit area of a curved sheet $S$ in $\mathbb{R}^3$, then each term of (13.13) approximates the mass of the surface element
$$\varphi\Big( \Big\{ u + \sum_j t_j e_j : 0 \le t_j \le \Delta u_j \Big\} \Big),$$
hence $\int_\varphi f$ gives the mass of $S$. For another example, let $f(x)$ denote the temperature of the sheet at point $x \in S$. Then $[\mathrm{area}(S)]^{-1} \int_\varphi f\,dS$ gives the average temperature of the sheet.

To evaluate (13.12), it is useful to note that since $\varphi' = [\partial_1 \varphi \cdots \partial_m \varphi]$, by 13.1.6
$$\det \varphi'(u)^t \varphi'(u) = \sum_{(i_1, \dots, i_m) \in I_m} \Big[ \frac{\partial(\varphi_{i_1}, \dots, \varphi_{i_m})}{\partial(u_1, \dots, u_m)}(u) \Big]^2. \tag{13.14}$$

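The following instances of (13.12) are of particular interest.

13.2.4 Special Cases.

(a) $m = 1$: Then $\det \varphi'(u)^t \varphi'(u) = \|\varphi'(u)\|^2$, hence
$$\int_\varphi f\,dS = \int_U (f \circ \varphi)(u)\,\|\varphi'(u)\|\,du = \int_\varphi f\,ds,$$
which is the line integral of Section 12.2.

(b) $m = 2$: In this case
$$\det \varphi'(u)^t \varphi'(u) = \det\Big( [\partial_1 \varphi \;\; \partial_2 \varphi]^t\,[\partial_1 \varphi \;\; \partial_2 \varphi] \Big) = \det \begin{bmatrix} \partial_1 \varphi \cdot \partial_1 \varphi & \partial_1 \varphi \cdot \partial_2 \varphi \\ \partial_1 \varphi \cdot \partial_2 \varphi & \partial_2 \varphi \cdot \partial_2 \varphi \end{bmatrix},$$
hence
$$\int_\varphi f\,dS = \int_U (f \circ \varphi) \sqrt{\|\partial_1 \varphi\|^2 \|\partial_2 \varphi\|^2 - (\partial_1 \varphi \cdot \partial_2 \varphi)^2}\,du.$$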


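(c) $m = n - 1$: Here
$$\det \varphi'(u)^t \varphi'(u) = \sum_{i=1}^n \Big[ \frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(u_1, \dots, u_{n-1})}(u) \Big]^2 = \|\partial \varphi^\perp(u)\|^2,$$
hence
$$\int_\varphi f\,dS = \int_U (f \circ \varphi)(u)\,\|\partial \varphi^\perp(u)\|\,du.$$

(d) $\varphi(u_1, \dots, u_{n-1}) = \big(u_1, \dots, u_{n-1}, g(u_1, \dots, u_{n-1})\big)$ (the graph of $g$): Let $\mathbf{i} = (1, \dots, i-1, i+1, \dots, n)$. Then
$$\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(u_1, \dots, u_{n-1})} = \det \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ \partial_1 g & \partial_2 g & \cdots & \partial_{n-1} g \end{bmatrix}_{\mathbf{i}} = \begin{cases} (-1)^{n-1+i}\,\partial_i g, & i < n, \\ 1, & i = n, \end{cases} \tag{13.15}$$
hence
$$\det \varphi'(u)^t \varphi'(u) = 1 + \|\nabla g(u)\|^2$$
and
$$\int_\varphi f\,dS = \int_U (f \circ \varphi)(u) \sqrt{1 + \|\nabla g(u)\|^2}\,du.$$

♦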

13.2.5 Example. Let $S$ be the following portion of an $n$-dimensional cone:
$$S = \Big\{ (x_1, \dots, x_{n+1}) : x_{n+1}^2 = \sum_{i=1}^n x_i^2,\; 0 < x_{n+1} < 1 \Big\}.$$
Then $S$ is parameterized by $\varphi(x) = \big(x, g(x)\big)$, $g(x) := \|x\|$, $x := (x_1, \dots, x_n)$, where $\nabla g(x) = x/\|x\|$. If $f$ is of the form $f(x) = h(\|x\|)$, then, by Exercise 11.6.3,
$$\int_\varphi f\,dS = \sqrt{2}\,n\,\alpha_n \int_0^1 h(r)\,r^{n-1}\,dr.$$
In particular, taking $h = 1$, $\mathrm{area}(S) = \sqrt{2}\,\alpha_n$.

♦
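As a rough numerical companion to 13.2.5 in the case $n = 2$, the sketch below approximates the area of the cone $x_3 = \sqrt{x_1^2 + x_2^2}$, $0 < x_3 < 1$, with the area density of 13.2.4(b). It is an illustration under stated assumptions rather than the text's method: it uses the polar parametrization $(r, t) \mapsto (r\cos t, r\sin t, r)$ instead of the graph parametrization (by 13.2.6 the value should not depend on this choice), and the names `gram_density` and `cone_area` are hypothetical.

```python
from math import cos, sin, sqrt, pi

def gram_density(d1, d2):
    """sqrt(|d1|^2 |d2|^2 - (d1 . d2)^2), the area density of 13.2.4(b)."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return sqrt(max(dot(d1, d1) * dot(d2, d2) - dot(d1, d2) ** 2, 0.0))

def cone_area(steps=200):
    """Midpoint-rule approximation of the area of phi(r, t) = (r cos t, r sin t, r)."""
    hr, ht = 1.0 / steps, 2 * pi / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * hr
        for j in range(steps):
            t = (j + 0.5) * ht
            d_r = (cos(t), sin(t), 1.0)           # partial derivative in r
            d_t = (-r * sin(t), r * cos(t), 0.0)  # partial derivative in t
            total += gram_density(d_r, d_t) * hr * ht
    return total

approx = cone_area()
exact = sqrt(2) * pi  # sqrt(2) * alpha_2, where alpha_2 = pi is the area of the unit disk
print(approx, exact)
```

Here the density works out to $\sqrt{2}\,r$, which is linear in $r$, so the midpoint rule should reproduce $\sqrt{2}\,\pi$ up to rounding error.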

The following result will be needed later to construct the integral of a function on a general m-surface. It asserts that the integral over a parameterized surface ϕ is invariant under a change of parameter and hence may be viewed as a construct intrinsic to the image of ϕ.


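13.2.6 Proposition. Let $U$ and $V$ be open subsets of $\mathbb{R}^m$, $\alpha : V \to U$ a $C^1$ function with $C^1$ inverse, and $\varphi : U \to \mathbb{R}^n$ a parameterized $m$-surface. Then $\psi := \varphi \circ \alpha$ is a parameterized $m$-surface and $\int_\varphi f\,dS = \int_\psi f\,dS$.

Proof. By the chain rule, $\psi'(v) = \varphi'(u)\,\alpha'(v)$, where $u = \alpha(v)$, hence
$$\det \psi'(v)^t \psi'(v) = \det\big( \alpha'(v)^t \varphi'(u)^t \varphi'(u)\,\alpha'(v) \big) = J_\alpha(v)^2 \det\big( \varphi'(u)^t \varphi'(u) \big).$$
Therefore, by the change of variables theorem,
$$\begin{aligned}
\int_\psi f\,dS &= \int_V (f \circ \psi)(v) \sqrt{\det \psi'(v)^t \psi'(v)}\,dv \\
&= \int_V (f \circ \varphi)(\alpha(v)) \sqrt{\det \varphi'(\alpha(v))^t \varphi'(\alpha(v))}\;|J_\alpha(v)|\,dv \\
&= \int_U (f \circ \varphi)(u) \sqrt{\det \varphi'(u)^t \varphi'(u)}\,du \\
&= \int_\varphi f\,dS.
\end{aligned}$$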
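The invariance in 13.2.6 can be observed numerically in the simplest case $m = 1$. The sketch below is an illustration only; the curve, the reparametrization $\alpha(v) = v^2$, the function $f(x, y) = y$, and the helper name `surface_integral_1d` are all assumptions chosen for the example. It integrates $f$ over the upper unit semicircle with two different parametrizations.

```python
from math import cos, sin, sqrt, pi

def surface_integral_1d(f, phi, dphi, lo, hi, n=20000):
    """Midpoint rule for the m = 1 case of (13.12): integral of f(phi(u)) |phi'(u)| du."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * h
        speed = sqrt(sum(c * c for c in dphi(u)))
        total += f(*phi(u)) * speed * h
    return total

f = lambda x, y: y

# phi: upper unit semicircle, parameterized on U = (0, pi)
phi = lambda u: (cos(u), sin(u))
dphi = lambda u: (-sin(u), cos(u))

# psi = phi o alpha with alpha(v) = v^2 on V = (0, sqrt(pi)); psi' comes from the chain rule
psi = lambda v: phi(v * v)
dpsi = lambda v: tuple(2 * v * c for c in dphi(v * v))

over_phi = surface_integral_1d(f, phi, dphi, 0.0, pi)
over_psi = surface_integral_1d(f, psi, dpsi, 0.0, sqrt(pi))
print(over_phi, over_psi)
```

Both values should be close to $\int_0^\pi \sin u\,du = 2$, in line with the proposition.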

13.2.7 Remark. The material in this section holds, in particular, for a local parametrization of an $m$-surface as well as a local parametrization of an $(n-1)$-surface-with-boundary. In the latter case, the domain of the parametrization at a boundary point is an open set in $\mathbb{H}^{n-1}$. ♦

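Integration of a Form on a Parameterized m-Surface

13.2.8 Definition. Let $\varphi : U \to \mathbb{R}^n$ be a parameterized orientable $m$-surface in $\mathbb{R}^n$ and let
$$\omega = \sum_{(j_1, \dots, j_m) \in J_m} f_{j_1, \dots, j_m}\,dx_{j_1} \wedge \cdots \wedge dx_{j_m}$$
be a continuous $m$-form on $S := \varphi(U)$. The integral of $\omega$ over $\varphi$ is defined by
$$\int_\varphi \omega = \int_S \omega = \mathrm{sign}(\varphi) \int_U \omega_{\varphi(u)}\big( d\varphi_u(e_1), \dots, d\varphi_u(e_m) \big)\,du.$$

♦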

The inclusion of $\mathrm{sign}(\varphi)$ corresponds to the familiar convention
$$\int_b^a f(t)\,dt = -\int_a^b f(t)\,dt$$
for Riemann integrals, which reflects the fact that the process of Riemann integration respects the natural orientation (ordering) of the interval $[a, b]$. Recalling that $d\varphi_u(e_j) = \partial_j \varphi(u)$ and
$$(dx_{j_1} \wedge \cdots \wedge dx_{j_m})\big( \partial_1 \varphi(u), \dots, \partial_m \varphi(u) \big) = \frac{\partial(\varphi_{j_1}, \dots, \varphi_{j_m})}{\partial(u_1, \dots, u_m)}(u),$$
we obtain the formula
$$\int_\varphi \omega = \mathrm{sign}(\varphi) \int_U \sum_{(j_1, \dots, j_m) \in J_m} (f_{j_1, \dots, j_m} \circ \varphi)\,\frac{\partial(\varphi_{j_1}, \dots, \varphi_{j_m})}{\partial(u_1, \dots, u_m)}\,du. \tag{13.16}$$

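The following instances of (13.16) are of particular importance.

13.2.9 Special Cases. Let $\varphi$ be positively oriented.

(a) $m = 1$:
$$\int_\varphi \sum_{i=1}^n f_i\,dx_i = \sum_{i=1}^n \int_I f_i\big(\varphi(t)\big)\,\varphi_i'(t)\,dt,$$
which is the integral of Section 12.2.

(b) $m = n - 1$:
$$\int_\varphi \sum_{i=1}^n f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_n = \int_U \sum_{i=1}^n (f_i \circ \varphi)\,\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(u_1, \dots, u_{n-1})}\,du.$$
In particular, for the graph $\varphi(u_1, \dots, u_{n-1}) = \big(u_1, \dots, u_{n-1}, g(u_1, \dots, u_{n-1})\big)$, we have from (13.15)
$$\int_\varphi \sum_{i=1}^n f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_n = \int_U \Big[ f_n \circ \varphi + \sum_{i=1}^{n-1} (-1)^{n-1+i} (f_i \circ \varphi)\,\partial_i g \Big]\,du.$$

(c) $m = 2$, $n = 3$: Let $D_{ij}(u) := \dfrac{\partial(\varphi_i, \varphi_j)}{\partial(u_1, u_2)}$. Then
$$\int_\varphi f_1\,dx_2 \wedge dx_3 + f_2\,dx_1 \wedge dx_3 + f_3\,dx_1 \wedge dx_2 = \int_U \big[ (f_1 \circ \varphi)(u) D_{23}(u) + (f_2 \circ \varphi)(u) D_{13}(u) + (f_3 \circ \varphi)(u) D_{12}(u) \big]\,du.$$

(d) $m$-form on parameterized surface $\iota : U \to U$:
$$\int_\iota \sum_{(j_1, \dots, j_m) \in J_m} g_{j_1, \dots, j_m}\,du_{j_1} \wedge \cdots \wedge du_{j_m} = \int_U \sum_{j \in J_m} g_j(u)\,du.$$

♦

13.2.10 Notation. For the integral on the left in (d) we write $\int_U$ instead of $\int_\iota$. In particular,
$$\int_\iota g\,du_{j_1} \wedge \cdots \wedge du_{j_m} = \int_U g(u)\,du.$$

♦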


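13.2.11 Example. Let $S$ be the following portion of a paraboloid:
$$S = \big\{ (x_1, x_2, x_3) : x_1 = x_2^2 + x_3^2,\; 0 < x_1 < 1 \big\}.$$
For purposes of integration, we may consider $S$ to be the image of the parameterized 2-surface
$$\varphi(t, \theta) = (t, \sqrt{t}\cos\theta, \sqrt{t}\sin\theta), \quad 0 < t < 1,\; 0 < \theta < 2\pi,$$
since there are no contributions to an integral on the set where $\theta = 0$. By 13.2.9(c),
$$\begin{aligned}
\int_S x_2^2 x_3\,dx_1 \wedge dx_2 &= \int_0^1 \int_0^{2\pi} \big[ t^{3/2} \cos^2\theta \sin\theta \big]\,\frac{\partial(\varphi_1, \varphi_2)}{\partial(t, \theta)}\,d\theta\,dt \\
&= -\int_0^1 \int_0^{2\pi} t^2 \cos^2\theta \sin^2\theta\,d\theta\,dt \\
&= -\frac{\pi}{12}.
\end{aligned}$$

♦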
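The value in 13.2.11 can be sanity-checked with a simple midpoint rule. The sketch below is illustrative only (the function names `integrand` and `midpoint` and the grid size are assumptions); it evaluates $(f \circ \varphi)\,\partial(\varphi_1, \varphi_2)/\partial(t, \theta)$ directly from the parametrization.

```python
from math import cos, sin, sqrt, pi

def integrand(t, th):
    """(f o phi) * d(phi_1, phi_2)/d(t, theta) for f = x2^2 * x3 on the paraboloid."""
    x2, x3 = sqrt(t) * cos(th), sqrt(t) * sin(th)
    # 2x2 Jacobian determinant of (phi_1, phi_2) = (t, sqrt(t) cos(theta)):
    # rows (1, 0) and (cos(theta) / (2 sqrt(t)), -sqrt(t) sin(theta))
    jac = 1.0 * (-sqrt(t) * sin(th)) - 0.0 * (cos(th) / (2 * sqrt(t)))
    return x2 ** 2 * x3 * jac

def midpoint(n=400):
    """Midpoint rule on (0, 1) x (0, 2 pi) with an n-by-n grid."""
    ht, hth = 1.0 / n, 2 * pi / n
    return sum(integrand((i + 0.5) * ht, (j + 0.5) * hth)
               for i in range(n) for j in range(n)) * ht * hth

value = midpoint()
print(value, -pi / 12)
```

The two printed numbers should agree to several decimal places; the small discrepancy comes from the midpoint rule in the $t$ variable.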

The following proposition, the analog of 13.2.6 for differential forms, shows that the definition of integral of a form is invariant under reparametrizations.

13.2.12 Proposition. Let $U, V$ be open connected subsets of $\mathbb{R}^m$, $\alpha : V \to U$ a $C^1$ function with $C^1$ inverse and positive Jacobian, and $\varphi : U \to \mathbb{R}^n$ a parameterized orientable $m$-surface. If $\omega$ is a continuous $m$-form on $\varphi(U)$, then
$$\int_\varphi \omega = \int_{\varphi \circ \alpha} \omega.$$

Proof. Note first that $\mathrm{sign}(J_\alpha)$ is constant since $\alpha$ is $C^1$ and $V$ is connected. Let $\psi = \varphi \circ \alpha$. By the chain rule and the change of variables theorem,
$$\begin{aligned}
\int_V (f_{j_1, \dots, j_m} \circ \psi)\,\frac{\partial(\psi_{j_1}, \dots, \psi_{j_m})}{\partial(v_1, \dots, v_m)}\,dv
&= \int_V (f_{j_1, \dots, j_m} \circ \varphi \circ \alpha)(v)\,\frac{\partial(\varphi_{j_1}, \dots, \varphi_{j_m})}{\partial(u_1, \dots, u_m)}\big(\alpha(v)\big)\,J_\alpha(v)\,dv \\
&= \int_U (f_{j_1, \dots, j_m} \circ \varphi)\,\frac{\partial(\varphi_{j_1}, \dots, \varphi_{j_m})}{\partial(u_1, \dots, u_m)}\,du.
\end{aligned}$$
The conclusion now follows from (13.16) and linearity of the integral.

The final result of this section expresses $\int_\varphi \omega$ as an integral of a form on $U$. It will be needed in the proof of Stokes’s theorem.

13.2.13 Theorem. Let $U \subseteq \mathbb{R}^m$ be open and let $\varphi : U \to \mathbb{R}^n$ be an oriented parameterized surface. If $\omega$ is a $C^1$ $m$-form on $\varphi(U)$, then
$$\int_\varphi \omega = \mathrm{sign}(\varphi) \int_U \varphi^* \omega.$$

Proof. By (d) of 13.1.19, if $\iota : U \to U$ denotes the identity map then
$$\omega_{\varphi(u)}\big( d\varphi_u(e_1), \dots, d\varphi_u(e_m) \big) = (\varphi^*\omega)_u(e_1, \dots, e_m) = (\varphi^*\omega)_u\big( d\iota_u(e_1), \dots, d\iota_u(e_m) \big).$$
The result now follows directly from the definition of the integral of a form (13.2.8) and 13.2.10.

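Exercises

1. Find the area of the following 2-surfaces in $\mathbb{R}^3$.
(a) $\varphi(t, \theta) = (t\cos\theta, t\sin\theta, t)$, $t \in (0, 1)$, $\theta \in (0, 2\pi)$.
(b)S $\varphi(t, \theta) = (t\cos\theta, t\sin\theta, \theta)$, $0 < t < 1$, $0 < \theta < 2\pi$.
(c) $\varphi(\theta, s) = (1 - s)(a\cos\theta, a\sin\theta, 0) + s\,(b\cos\theta, b\sin\theta, 1)$, $0 < s < 1$, $0 < \theta < 2\pi$, $0 < a < b$.

2. Let $a^1, \dots, a^m \in \mathbb{R}^n$ be linearly independent and let $b \in \mathbb{R}^n$. Define $\varphi : \mathbb{R}^m \to \mathbb{R}^n$ by
$$\varphi(u_1, \dots, u_m) = b + \sum_{i=1}^m u_i a^i.$$
(See 12.3.2.) For a continuous function $f$ on $\mathbb{R}^n$, prove that
$$\int_\varphi f = \sqrt{\det(A^t A)} \int_{\mathbb{R}^m} (f \circ \varphi)(u)\,du,$$
where $A = [a^1 \cdots a^m]_{n \times m}$.

3. Let $\varphi$ be as in Exercise 2. Show that
$$\int_\varphi \sum_{i \in I_m} f_i\,dx_i = \sum_{i \in I_m} \det(A_i) \int_U f_i \circ \varphi\,du.$$

4.S Show that the area of the Cartesian product of circles
$$\varphi(\theta_1, \dots, \theta_m) = (r_1\cos\theta_1, r_1\sin\theta_1, \dots, r_m\cos\theta_m, r_m\sin\theta_m), \quad r_i > 0,$$
is $(2\pi r_1)(2\pi r_2) \cdots (2\pi r_m)$.

5. Let $\varphi$ be the product of two circles:
$$\varphi(\theta_1, \theta_2) = (r_1\cos\theta_1, r_1\sin\theta_1, r_2\cos\theta_2, r_2\sin\theta_2), \quad r_i > 0,$$
and let
$$\omega = f_{12}\,dx_1 \wedge dx_2 + f_{13}\,dx_1 \wedge dx_3 + f_{14}\,dx_1 \wedge dx_4 + f_{23}\,dx_2 \wedge dx_3 + f_{24}\,dx_2 \wedge dx_4 + f_{34}\,dx_3 \wedge dx_4.$$
Show that
$$\int_\varphi \omega = r_1 r_2 \int_0^{2\pi} \int_0^{2\pi} \big[ (f_{13} \circ \varphi)\sin\theta_1\sin\theta_2 - (f_{14} \circ \varphi)\sin\theta_1\cos\theta_2 - (f_{23} \circ \varphi)\cos\theta_1\sin\theta_2 + (f_{24} \circ \varphi)\cos\theta_1\cos\theta_2 \big]\,d\theta_1\,d\theta_2.$$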

6.S (Area of an $n$-dimensional simplex in $\mathbb{R}^{n+1}$). Use Example 11.5.5 to find the surface area of
$$S = \Big\{ (x_1, \dots, x_{n+1}) : \sum_{j=1}^{n+1} x_j = 1 \text{ and } x_j \ge 0 \Big\}.$$

FIGURE 13.2: Two dimensional simplex S in R3.

7.S Let $U \subseteq \mathbb{R}^{n-2}$ be open and let $\psi : U \to \mathbb{R}^{n-1}$ be a parameterized $(n-2)$-surface in $\mathbb{R}^{n-1}$. Let $\varphi : U \times [0, h] \to \mathbb{R}^n$ be the cylinder $\varphi(u, s) = \big(\psi(u), s\big)$, $u \in U$, $0 \le s \le h$. Show that $\mathrm{area}(\varphi) = h \cdot \mathrm{area}(\psi)$.

8.S Let $\varphi$ be the cylinder of Exercise 7 for $n = 3$ and $h = 1$. Show that
$$\int_\varphi f_1\,dx_2 \wedge dx_3 + f_2\,dx_1 \wedge dx_3 + f_3\,dx_1 \wedge dx_2 = \int_0^1 \big[ (f_1 \circ \varphi)\psi_1' + (f_2 \circ \varphi)\psi_2' \big]\,dt.$$

9. Let $\psi : [a, b] \to \mathbb{R}^2$ be a $C^1$ curve in $\mathbb{R}^2$ and let $\varphi : [a, b] \times (0, h) \to \mathbb{R}^3$ be the cone $\varphi(t, s) = \big((1 - s/h)\psi(t), s\big)$, $a \le t \le b$, $0 < s < h$. Show that the area of $\varphi$ is
$$\frac{h}{2} \int_a^b \sqrt{\psi_1'(t)^2 + \psi_2'(t)^2 + h^{-2}\big[ \psi_1(t)\psi_2'(t) - \psi_2(t)\psi_1'(t) \big]^2}\,dt.$$
Use this to show that the surface area of a right circular cone with radius $r$ and axis length $h$ is $\pi r \sqrt{r^2 + h^2}$.

10. Let $\varphi$ be the cone of Exercise 9 with $h = 1$. Show that
$$\int_\varphi f_1\,dx_2 \wedge dx_3 + f_2\,dx_1 \wedge dx_3 + f_3\,dx_1 \wedge dx_2 = \frac{1}{2} \int_0^1 \Big\{ (f_1 \circ \varphi)\psi_1' + (f_2 \circ \varphi)\psi_2' + \big[ \psi_1(t)\psi_2'(t) - \psi_2(t)\psi_1'(t) \big] \Big\}\,dt.$$

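11.S Let $\psi : [a, b] \to \mathbb{R}^2$ be a parameterized $C^1$ curve with $\psi_2(t) > 0$ for all $t$. Define
$$\varphi(t, \theta) = \big( \psi_1(t), \psi_2(t)\cos\theta, \psi_2(t)\sin\theta \big), \quad t \in I,\; \theta \in (0, 2\pi),$$
which is the parameterized surface of revolution of 12.3.9. Show that
$$\mathrm{area}(\varphi) = 2\pi \int_a^b \psi_2(t)\,\|\psi'(t)\|\,dt = (2\pi \bar{y})\,\mathrm{length}(\psi), \tag{13.17}$$
where $(x, y) = \psi$ and $\bar{y} := \dfrac{1}{\mathrm{length}(\psi)} \displaystyle\int_\psi y\,ds$, the $y$-coordinate of the centroid of $\psi$. Use the first part of (13.17) to find the surface area of the torus
$$\varphi(t, \theta) = \big( a\cos t, (b + a\sin t)\cos\theta, (b + a\sin t)\sin\theta \big), \quad 0 < \theta, t < 2\pi,$$
where $0 < a < b$. Show also that the area of the cone in Exercise 9 may be found from (13.17).

12. Let $\varphi$ be the parameterized surface of revolution in Exercise 11 and let $\omega := f_1\,dx_2 \wedge dx_3 + f_2\,dx_1 \wedge dx_3 + f_3\,dx_1 \wedge dx_2$. Show that
$$\int_\varphi \omega = \int_a^b \int_0^{2\pi} (f_1 \circ \varphi)\,\psi_2(t)\psi_2'(t)\,d\theta\,dt + \int_a^b \int_0^{2\pi} (f_2 \circ \varphi)\,\psi_1'(t)\psi_2(t)\cos\theta\,d\theta\,dt - \int_a^b \int_0^{2\pi} (f_3 \circ \varphi)\,\psi_1'(t)\psi_2(t)\sin\theta\,d\theta\,dt.$$
Show also that if $\psi(t) = (t, g(t))$ (the graph of $g$), then this reduces to
$$\int_a^b \int_0^{2\pi} g(t) \big[ (f_1 \circ \varphi)\,g'(t) + (f_2 \circ \varphi)\cos\theta - (f_3 \circ \varphi)\sin\theta \big]\,d\theta\,dt.$$

13. Use Exercise 12 to evaluate
(a)S $\displaystyle\int_S x_1 x_3\,dx_1 \wedge dx_2$, (b) $\displaystyle\int_S x_2 x_3\,dx_1 \wedge dx_2$, (c) $\displaystyle\int_S x_1^2 x_2^2\,dx_2 \wedge dx_3$,
where $S$ is the cone $S = \big\{ (x_1, x_2, x_3) : x_1^2 = x_2^2 + x_3^2,\; 0 < x_1 < 1 \big\}$.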

14. Repeat Exercise 13 using the portion of the hyperboloid
$$S = \big\{ (x_1, x_2, x_3) : x_1^2 - x_2^2 - x_3^2 = 1,\; 1 < x_1 < \sqrt{2} \big\}.$$

15. Let $S$ be the torus given by $x_1^2 + \big( \sqrt{x_2^2 + x_3^2} - b \big)^2 = a^2$, where $0 < a < b$. Use Exercise 12 to evaluate
(a) $\displaystyle\int_S x_2\,dx_2 \wedge dx_3$. (b)S $\displaystyle\int_S x_1\,dx_2 \wedge dx_3$. (c) $\displaystyle\int_S x_2\,dx_1 \wedge dx_3$.

13.3

Partitions of Unity

The theorem proved in this section will be used to extend the definition of the integral to functions and forms on $m$-surfaces. It will also be needed later in the proofs of Stokes’s theorem and the divergence theorem.

13.3.1 Definition. The support of a continuous function $\psi : \mathbb{R}^n \to \mathbb{R}$ is defined by
$$\mathrm{supp}(\psi) = \mathrm{cl}\,\{ x : \psi(x) \ne 0 \}.$$
♦

Thus, by definition of closure, $\mathrm{supp}(\psi)$ is the smallest closed set outside of which $\psi$ is zero.

13.3.2 Partition of Unity. Let $K$ be a compact subset of $\mathbb{R}^n$ and let $\{U_i : i \in I\}$ be an open cover of $K$. Then there exists a finite subcover $\{U_1, \dots, U_p\}$ of $K$ and $C^\infty$ functions $\chi_i : \mathbb{R}^n \to [0, +\infty)$, $i = 1, \dots, p$, such that $\mathrm{supp}(\chi_i)$ is compact and contained in $U_i$ and $\sum_{i=1}^p \chi_i = 1$ on $K$.

FIGURE 13.3: A partition of unity subordinate to U1 and U2.

The functions $\chi_i$ are said to form a partition of unity subordinate to the open sets $U_i$. They are typically used to patch together local data to form a global construct such as a surface integral, or to reduce a global problem to a local one, as in the case of the proof of Stokes’s theorem. The proof of 13.3.2 requires several lemmas which are of intrinsic interest.

13.3.3 Lemma. Let $a < b$. Then there exists a $C^\infty$ function $h : \mathbb{R} \to [0, +\infty)$ such that $h > 0$ on $(a, b)$, and $h = 0$ on $(a, b)^c$.


Proof. Define $h$ by
$$h(x) = \begin{cases} \exp\big( (x - a)^{-1} (x - b)^{-1} \big) & \text{if } a < x < b, \\ 0 & \text{otherwise.} \end{cases}$$
Clearly, $h^{(m)} = 0$ on $[a, b]^c$ for all $m \ge 0$. Moreover, if $x \in (a, b)$, then $h^{(m)}(x)$ is a sum of terms of the form
$$\frac{\pm h(x)}{(x - a)^p (x - b)^q}, \quad p, q \in \mathbb{Z}^+.$$
Since the exponent $(x - a)^{-1}(x - b)^{-1}$ is negative on $(a, b)$, by l’Hospital’s rule,
$$\lim_{x \to a^+} \frac{h(x)}{(x - a)^p (x - b)^q} = 0.$$
Therefore, $\lim_{x \to a} h^{(m)}(x) = 0$, and an induction argument then shows that $h^{(m)}(a) = 0$ for all $m$. A similar argument holds at the point $b$. Thus $h$ is $C^\infty$ on $\mathbb{R}$.

FIGURE 13.4: The functions h and g.

13.3.4 Lemma. Let $a < b$. Then there exists a $C^\infty$ function $g : \mathbb{R} \to \mathbb{R}$ such that $0 \le g \le 1$, $g = 0$ on $(-\infty, a]$, and $g = 1$ on $[b, +\infty)$.

Proof. Let $h$ be the function in 13.3.3. Then
$$g(x) := \Big[ \int_a^b h \Big]^{-1} \int_a^x h$$
has the required properties.

13.3.5 Lemma. Let $I = (a_1, b_1) \times \cdots \times (a_n, b_n)$. Then there exists a $C^\infty$ function $f : \mathbb{R}^n \to \mathbb{R}$ such that $f > 0$ on $I$ and $f = 0$ on $I^c$.

Proof. For each $j$, let $h_j : \mathbb{R} \to [0, +\infty)$ be a $C^\infty$ function such that $h_j > 0$ on $(a_j, b_j)$ and $h_j = 0$ on $(a_j, b_j)^c$. The function $f(x_1, \dots, x_n) := h_1(x_1) \cdots h_n(x_n)$ then satisfies the requirements.
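The functions of 13.3.3 and 13.3.4 are easy to tabulate. The sketch below is a numerical illustration, not part of the proofs: the names `bump` and `smooth_step`, the default interval $(0, 1)$, and the use of a midpoint rule for the integrals are assumptions made for the example.

```python
from math import exp

def bump(x, a=0.0, b=1.0):
    """h from 13.3.3: exp(1 / ((x - a)(x - b))) on (a, b), zero elsewhere."""
    if a < x < b:
        return exp(1.0 / ((x - a) * (x - b)))
    return 0.0

def smooth_step(x, a=0.0, b=1.0, n=2000):
    """g from 13.3.4: (integral of h over [a, x]) / (integral of h over [a, b])."""
    def integral(lo, hi):
        if hi <= lo:
            return 0.0
        w = (hi - lo) / n
        return sum(bump(lo + (k + 0.5) * w, a, b) for k in range(n)) * w
    # h vanishes outside (a, b), so x may be clamped to [a, b] before integrating
    return integral(a, min(max(x, a), b)) / integral(a, b)

for x in (-1.0, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
    print(x, bump(x), smooth_step(x))
```

The printed table should show $g$ rising from 0 to 1 across $(a, b)$ while $h$ is positive only inside the interval.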


For the next lemma we define the open cube with center $x \in \mathbb{R}^n$ and edge $2r$ by
$$\{ y \in \mathbb{R}^n : x_j - r < y_j < x_j + r,\; j = 1, \dots, n \}.$$

13.3.6 Lemma. Let $K \subseteq U \subseteq \mathbb{R}^n$, where $K$ is compact and $U$ is open. Then there exists a $C^\infty$ function $\psi : \mathbb{R}^n \to [0, 1]$ such that $\mathrm{supp}(\psi) \subseteq U$ and $\psi = 1$ on $K$.

Proof. For each $x \in K$, let $V_x$ be an open cube with center $x$ and edge $2r$ such that $\mathrm{cl}\,V_x \subseteq U$ and let $W_x \subseteq V_x$ denote the concentric open cube with center $x$ and edge $r$. Since $K$ is compact, there exist finitely many cubes $W_x$ whose union contains $K$. Denote these cubes by $W_1, \dots, W_m$ and denote the corresponding cubes $V_x$ by $V_1, \dots, V_m$. (See Figure 13.5.) By 13.3.5, for each $i$ there exists a $C^\infty$ function $f_i : \mathbb{R}^n \to \mathbb{R}$ such that $f_i > 0$ on $W_i$ and $f_i = 0$ on $W_i^c$. Set
$$f = \sum_{i=1}^m f_i, \quad V = \bigcup_{i=1}^m V_i, \quad \text{and} \quad W = \bigcup_{i=1}^m W_i.$$
Then $f$ is nonnegative and $C^\infty$ on $\mathbb{R}^n$, $f > 0$ on $W \supseteq K$, and $\mathrm{supp}(f) \subseteq \mathrm{cl}(V) \subseteq U$. Now let $a = \min_{x \in K} f(x)$. Since $a > 0$, there exists a $C^\infty$ function $g : \mathbb{R} \to [0, 1]$ such that $g = 0$ on $(-\infty, 0]$ and $g = 1$ on $[a, +\infty)$ (13.3.4). The function $\psi := g \circ f$ then has the required properties.

FIGURE 13.5: The cubes Wi and Vi.

Proof of the partition of unity theorem. For each x ∈ K, let i(x) be an index such that x ∈ Ui(x) . Choose a bounded open set Vx containing x such that cl Vx ⊆ Ui(x) . Since K is compact, finitely many of the sets Vx cover K. Denote these by V1 , . . . Vp and denote the corresponding sets Ui(x) by U1 , . . . , Up . Since Vi ⊆ Ki := cl(Vi ) ⊆ Ui , by 13.3.6 there exists a C ∞ function ψi : Rn → [0, 1] such that ψi = 1 on Ki and supp(ψi ) ⊆ Ui . Now set χ1 = ψ1 and χi = (1 − ψ1 )(1 − ψ2 ) · · · (1 − ψi−1 )ψi , i > 1. Then χi is C ∞ , 0 ≤ χi ≤ 1, and supp(χi ) ⊆ supp(ψi ) ⊆ Ui . Finally, let ηi = (1 − ψ1 )(1 − ψ2 ) · · · (1 − ψi ).

For $i > 1$,
$$\eta_{i-1} - \eta_i = (1 - \psi_1)(1 - \psi_2) \cdots (1 - \psi_{i-1})\big[ 1 - (1 - \psi_i) \big] = \chi_i,$$
hence
$$\sum_{i=1}^p \chi_i = \chi_1 + \sum_{i=2}^p (\eta_{i-1} - \eta_i) = \chi_1 + \eta_1 - \eta_p = 1 - \eta_p.$$
Since $K \subseteq \bigcup_i V_i \subseteq \bigcup_i K_i$ and $\psi_i = 1$ on $K_i$, $\eta_p = 0$ on $K$, hence $\sum_{i=1}^p \chi_i = 1$ on $K$, completing the proof.
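The telescoping identity $\sum_i \chi_i = 1 - \eta_p$ used in the proof holds pointwise for any values $\psi_1, \dots, \psi_p \in [0, 1]$. The sketch below checks it on sample numbers; the function name `chi_values` and the sample values are hypothetical and stand for the values of the $\psi_i$ at a single point.

```python
def chi_values(psi):
    """From psi_1..psi_p at one point, return ([chi_1..chi_p], eta_p).

    chi_i = (1 - psi_1) ... (1 - psi_{i-1}) psi_i and
    eta_p = (1 - psi_1) ... (1 - psi_p).
    """
    chis, prod = [], 1.0
    for p in psi:
        chis.append(prod * p)
        prod *= 1.0 - p
    return chis, prod

# A point where no psi_i equals 1: the chi_i sum to 1 - eta_p.
chis, eta_p = chi_values([0.3, 0.0, 0.75, 0.5])

# A point of K: some psi_i equals 1, so eta_p = 0 and the chi_i sum to 1.
on_K, eta_K = chi_values([0.2, 1.0, 0.6])
print(sum(chis), 1.0 - eta_p, sum(on_K), eta_K)
```

Note that every $\chi_i$ after the first index with $\psi_i = 1$ vanishes at that point, which is why the order of the $\psi_i$ does not affect the sum on $K$.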

13.4

Integration on Compact m-Surfaces

In this section we define the integrals of a function and a form on a compact $m$-surface
$$S = \{ x \in V : F(x) = 0 \},$$
where $V \subseteq \mathbb{R}^n$ is open, $F : V \to \mathbb{R}^{n-m}$ is $C^1$, and $F'(x)$ has rank $n - m$ for all $x \in V$. To set the stage, let $\{(U_a, \varphi_a) : a \in S\}$ be an atlas for $S$. By the partition of unity theorem, there exist finitely many charts $(U_i, \varphi_i) := (U_{a_i}, \varphi_{a_i})$ and $C^1$ functions $\chi_i : \mathbb{R}^n \to \mathbb{R}$ such that the sets $S_i := \varphi_i(U_i)$ cover $S$, $\mathrm{supp}(\chi_i) \subseteq S_i$, and $\sum_i \chi_i = 1$ on $S$.

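Integral of a Function

The (surface) integral of a continuous function $f$ on $S$ is defined by
$$\int_S f\,dS = \sum_i \int_{\varphi_i} \chi_i f = \sum_i \int_{U_i} \big( (\chi_i f) \circ \varphi_i \big)(u) \sqrt{\det \varphi_i'(u)^t \varphi_i'(u)}\,du.$$
To see that the integral is independent of the system $\{(U_i, \varphi_i, \chi_i)\}_i$ and hence is well-defined, consider another such system $\{(\tilde{U}_j, \tilde{\varphi}_j, \tilde{\chi}_j)\}_j$. Since
$$\chi_i = \sum_j \chi_i \tilde{\chi}_j \quad \text{and} \quad \tilde{\chi}_j = \sum_i \tilde{\chi}_j \chi_i \quad \text{on } S,$$
we see that
$$\sum_i \int_{\varphi_i} f \chi_i = \sum_{i,j} \int_{\varphi_i} f \chi_i \tilde{\chi}_j \quad \text{and} \quad \sum_j \int_{\tilde{\varphi}_j} f \tilde{\chi}_j = \sum_{i,j} \int_{\tilde{\varphi}_j} f \tilde{\chi}_j \chi_i.$$
Set
$$\alpha_{ij} = \tilde{\varphi}_j^{-1} \circ \varphi_i : \varphi_i^{-1}(S_i \cap \tilde{S}_j) \to \tilde{\varphi}_j^{-1}(S_i \cap \tilde{S}_j).$$
Since $f \chi_i \tilde{\chi}_j = 0$ outside $S_i \cap \tilde{S}_j$ and $\varphi_i = \tilde{\varphi}_j \circ \alpha_{ij}$ on $\varphi_i^{-1}(S_i \cap \tilde{S}_j)$, by 13.2.6 $\int_{\varphi_i} f \chi_i \tilde{\chi}_j = \int_{\tilde{\varphi}_j} f \chi_i \tilde{\chi}_j$. Therefore,
$$\sum_i \int_{\varphi_i} f \chi_i = \sum_j \int_{\tilde{\varphi}_j} f \tilde{\chi}_j,$$
as required. The definition of the integral is extended to a finite union $S$ of compact $m$-surfaces $S_1, \dots, S_p$ by defining
$$\int_S f\,dS = \sum_i \int_{S_i} f\,dS.$$

13.4.1 Definition. The area of $S$ is defined as
$$\mathrm{area}(S) = \int_S 1\,dS.$$

♦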

13.4.2 Example. In 11.5.6 we found that the volume of the closed ball $C_r^n(0) = \{ x \in \mathbb{R}^n : \|x\| \le r \}$ is $r^n \alpha_n$, where
$$\alpha_n = \begin{cases} \dfrac{2(2\pi)^{(n-1)/2}}{n(n-2) \cdots 3 \cdot 1} & \text{if } n \text{ is odd}, \\[2ex] \dfrac{(2\pi)^{n/2}}{n(n-2) \cdots 4 \cdot 2} & \text{if } n \text{ is even}. \end{cases}$$
We now show that for the sphere $S := S_r^{n-1}(0) = \{ x \in \mathbb{R}^n : \|x\| = r \}$,
$$\mathrm{area}(S) = n r^{n-1} \alpha_n = \frac{n}{r}\,\lambda^n\big( C_r^n(0) \big). \tag{13.18}$$
To this end, note that the upper hemisphere $H^u$ of $S$ is the graph of the function
$$g(x_1, \dots, x_{n-1}) = \sqrt{r^2 - (x_1^2 + \cdots + x_{n-1}^2)} = \sqrt{r^2 - \|x\|^2}, \quad \|x\| \le r.$$
Let $0 < t < 1$ and consider the part of the hemisphere $H_t^u$ for which $\|x\| < rt$. Since
$$1 + \|\nabla g(x)\|^2 = 1 + \frac{\|x\|^2}{r^2 - \|x\|^2} = \frac{r^2}{r^2 - \|x\|^2},$$
by 13.2.4(c) $\mathrm{area}(H_t^u) = r$

Z

r2 − ||x||2

||x|| 0}.

♦

Note that $S_a$ is an $(n-1)$-surface in $\mathbb{R}^n$ and hence has a local parametrization at each $x \in S_a$. Let $\vec{n}_a = \|\nabla F_a\|^{-1} \nabla F_a$, and let $x \in S_a$. For sufficiently small $|t|$, $h(t) := x + t\,\vec{n}_a(x) \in U_a$. Since
$$(F_a \circ h)'(0) = \nabla F_a(x) \cdot \vec{n}_a(x) = \|\nabla F_a(x)\| > 0,$$
$(F_a \circ h)$ is strictly increasing. Because $(F_a \circ h)(0) = 0$ we therefore have
$$F_a\big( x + t\,\vec{n}_a(x) \big) \begin{cases} < 0 & \text{if } t < 0, \\ > 0 & \text{if } t > 0. \end{cases}$$
It follows from (ii) and (iii) that the normal vector $t\,\vec{n}_a(x)$ to $S_a$ at $x$ points into $E$ if $t < 0$ and away from $E$ (that is, toward $E^c$) if $t > 0$. The exterior unit normal vector on $\mathrm{bd}(E)$ is then defined by
$$\vec{n}(x) = \vec{n}_a(x), \quad x \in S_a.$$
Uniqueness and continuity of $\vec{n}_a$ show that $\vec{n}$ is well-defined and continuous on $\mathrm{bd}(E)$. (See Figure 13.6.)

FIGURE 13.6: Regular region E.

13.5.6 Example. The $n$-dimensional annulus $E = \{ x \in \mathbb{R}^n : r_1 < \|x\| < r_2 \}$ is a regular region in $\mathbb{R}^n$. Here, $\mathrm{bd}(E)$ has the components $S_i = \{ x \in \mathbb{R}^n : \|x\| = r_i \}$, $i = 1, 2$. The conditions of regularity are met by defining
$$F_a(x) = \begin{cases} r_1 - \|x\| \text{ on } \{ x \in \mathbb{R}^n : \|x\| < (r_1 + r_2)/2 \} & \text{if } a \in S_1, \\ \|x\| - r_2 \text{ on } \{ x \in \mathbb{R}^n : \|x\| > (r_1 + r_2)/2 \} & \text{if } a \in S_2. \end{cases}$$
Figure 13.7 depicts the case $n = 2$.

♦

FIGURE 13.7: Annulus in R2 with exterior normal.

13.5.7 Divergence Theorem. If $E$ is a regular region in $\mathbb{R}^n$ and $\omega$ is a $C^1$ 1-form on $\mathrm{cl}(E)$, then
$$\int_{\mathrm{bd}(E)} \omega \cdot \vec{n}\,dS = \int_E \mathrm{div}\,\omega_x\,dx. \tag{13.25}$$


Proof. The proof uses ideas similar to those used in the proof of Stokes’s theorem. By hypothesis, $\omega$ is $C^1$ on an open set containing $\mathrm{cl}(E)$, which we may assume also contains the sets $U_a$ in 13.5.5. Since $\mathrm{cl}(E)$ is compact, by using a partition of unity as in the proof of Stokes’s theorem, we may assume that for any $a = (a_1, \dots, a_n) \in \mathrm{cl}(E)$ and the neighborhoods $W$ of $a$ constructed in the proof,
$$K := \bigcup_{i=1}^n \mathrm{supp}(f_i) \subseteq W.$$

Suppose first that $a \in E$. Choose an $n$-dimensional interval $W$ containing $a$ such that $\mathrm{cl}(W) \subseteq E$. If $K \subseteq W$, then $\omega = 0$ on $W^c \supseteq \mathrm{bd}(E)$, hence
$$\int_{\mathrm{bd}(E)} \omega \cdot \vec{n}\,dS = 0 \quad \text{and} \quad \int_E \mathrm{div}\,\omega_x\,dx = \int_W \sum_{i=1}^n \partial_i f_i(x)\,dx = 0,$$
the last equality by the Fubini–Tonelli theorem and the fundamental theorem of calculus. Therefore, (13.25) holds in this case.

FIGURE 13.8: The case a ∈ E.

Now let $a \in \mathrm{bd}(E)$ and let $U_a$ and $F_a$ be as in 13.5.5. We may assume that the components $a_i$ of $a$ and $n_i(a)$ of $\vec{n}(a)$ are positive, otherwise apply a rotation and translation; the change of variables theorem implies that (13.25) is invariant under such transformations. (See Exercise 11 below for a special case of this.) We show that for each $i = 1, \dots, n$, there exists a neighborhood $W_i$ of $a$ such that if $K \subseteq W_i$ then
$$\int_S f_i n_i\,dS = \int_E \partial_i f_i(x)\,dx. \tag{13.26}$$

For notational simplicity, we do this for the case $i = n$. Since $\partial_n F_a(a) \ne 0$, by the implicit function theorem there exists a neighborhood $V$ of $(a_1, \dots, a_{n-1})$, an open interval $I$ containing $a_n$, and a $C^1$ function $g : V \to \mathbb{R}$ such that $V \times I \subseteq U_a$, $a_n = g(a_1, \dots, a_{n-1})$, and
$$F_a\big( x_1, \dots, x_{n-1}, g(x_1, \dots, x_{n-1}) \big) = 0 \text{ on } V.$$
By continuity, we may choose $V$ and $I$ sufficiently small so that $g(x_1, \dots, x_{n-1}) > 0$ and $\partial_n F_a(x) > 0$ for all $x \in V \times I$. Now let $x = (x_1, \dots, x_n) \in (V \times I) \cap E$. Since $F_a(x)$ is a strictly increasing function of $x_n \in I$ when the other coordinates are fixed and since $F_a(x) < 0$, it must be the case that $0 < x_n < g(x_1, \dots, x_{n-1})$. Thus
$$(V \times I) \cap E = \{ x \in V \times I : 0 < x_n < g(x_1, \dots, x_{n-1}) \}$$
and
$$(V \times I) \cap S_a = \{ x \in V \times I : x_n = g(x_1, \dots, x_{n-1}) \}.$$
(See Figure 13.9.)

FIGURE 13.9: The case a ∈ bd(E).

Note that the function $\varphi$ defined by
$$\varphi(v) := (v, g(v)), \quad v = (v_1, \dots, v_{n-1}) \in V,$$
is a local parametrization of $S_a$ with unit normal $(1 + \|\nabla g\|^2)^{-1/2} (-\nabla g, 1)$. Since this points outward it coincides with $\vec{n}$. In particular, the $n$th component of $\vec{n}$ is $n_n = (1 + \|\nabla g\|^2)^{-1/2}$ on $(V \times I) \cap S_a$. Therefore, if $K \subseteq V \times I$ then, by 13.2.4(d),
$$\int_{S_a} f_n n_n\,dS = \int_{(V \times I) \cap S_a} \frac{f_n}{\sqrt{1 + \|\nabla g\|^2}}\,dS = \int_V (f_n \circ \varphi)(v)\,dv. \tag{13.27}$$
On the other hand, since $f_n = \partial_n f_n = 0$ outside $K$, by the Fubini–Tonelli theorem,
$$\begin{aligned}
\int_E \partial_n f_n(x)\,dx &= \int_{(V \times I) \cap E} \partial_n f_n(x)\,dx \\
&= \int_V \int_0^{g(v_1, \dots, v_{n-1})} (\partial_n f_n)(v_1, \dots, v_{n-1}, x_n)\,dx_n\,dv_1 \cdots dv_{n-1} \\
&= \int_V (f_n \circ \varphi)(v)\,dv,
\end{aligned} \tag{13.28}$$
the last equality by the fundamental theorem of calculus. Setting $W_n = V \times I$ and comparing (13.27) and (13.28), we see that (13.26) holds for $i = n$. A similar proof works for $i < n$. Thus if $K \subseteq W_1 \cap \cdots \cap W_n$, then (13.26) holds for all $i$. Summing from 1 to $n$ we obtain (13.25).
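A quick numerical illustration of (13.25) in the plane may be helpful. The sketch below is an example under stated assumptions, not part of the proof: the region is the unit disk, the coefficient functions are $(f_1, f_2) = (x^3, y^3)$, and the function names and grid sizes are hypothetical. On the unit circle the exterior normal is $(x, y)$, and $\mathrm{div} = 3x^2 + 3y^2$.

```python
from math import cos, sin, pi

def flux(n=2000):
    """Boundary integral of (x^3, y^3) . n over the unit circle, with n = (x, y)."""
    h = 2 * pi / n
    return sum(cos(t) ** 4 + sin(t) ** 4
               for t in ((k + 0.5) * h for k in range(n))) * h

def divergence_integral(n=400):
    """Integral of 3x^2 + 3y^2 = 3 r^2 over the unit disk, in polar coordinates."""
    hr = 1.0 / n
    # The integrand 3 r^2 * r does not depend on theta, so the angular
    # integral contributes a factor of 2 pi.
    radial = sum(3 * ((i + 0.5) * hr) ** 3 for i in range(n)) * hr
    return 2 * pi * radial

print(flux(), divergence_integral(), 1.5 * pi)
```

Both quantities should be close to $3\pi/2$, the common value of the two sides of (13.25) for this example.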


Connection with Stokes’s Theorem

Let $E$ be a regular region in $\mathbb{R}^n$ whose boundary is a finite union of compact connected $(n-1)$-surfaces of the form $S = \{ x : F(x) = 0 \}$, where $F : U \to \mathbb{R}$ is a $C^1$ function with $\nabla F \ne 0$ such that $U_a = U$ and $F_a = F$ for all $a \in S$. A ball and an annulus in $\mathbb{R}^n$ are simple examples. By 12.4.8, $S$ is oriented and, for each local parametrization $\varphi : V \to S$,
$$\vec{n}(\varphi(v)) = \frac{\pm 1}{\sqrt{\det \varphi'(v)^t \varphi'(v)}} \sum_{i=1}^n (-1)^{i-1}\,\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(v_1, \dots, v_{n-1})}(v)\,e_i,$$
where the sign is chosen to be the same for all $v$. Let each $S$ have the orientation for which the sign is $(+)$. We shall call the resulting orientation of $\mathrm{bd}(E)$ positive. In this setting we have the following consequence of the divergence theorem.

13.5.8 Theorem. Let $E$ be as described above and let
$$\omega = \sum_{i=1}^n f_i\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_n$$
be an $(n-1)$-form on $\mathrm{cl}(E)$. If $\mathrm{bd}(E)$ is positively oriented, then
$$\int_{\mathrm{bd}(E)} \omega = \int_E d\omega.$$

Proof. Recalling the additive definition of $\int_{\mathrm{bd}(E)} \omega$, we may assume that $\mathrm{bd}(E)$ consists of a single compact connected $(n-1)$-surface $S$. Let
$$\eta := \sum_{i=1}^n (-1)^{i-1} f_i\,dx_i.$$
By the above,
$$(\vec{n} \cdot \eta) \circ \varphi(v) = \frac{1}{\sqrt{\det \varphi'(v)^t \varphi'(v)}} \sum_{i=1}^n (f_i \circ \varphi)(v)\,\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(v_1, \dots, v_{n-1})}(v),$$
hence
$$\int_\varphi \vec{n} \cdot \eta\,dS = \sum_{i=1}^n \int_V (f_i \circ \varphi)\,\frac{\partial(\varphi_1, \dots, \widehat{\varphi_i}, \dots, \varphi_n)}{\partial(v_1, \dots, v_{n-1})}\,dv = \int_\varphi \omega.$$
Using a partition of unity we obtain
$$\int_S \omega = \int_S \vec{n} \cdot \eta\,dS. \tag{13.29}$$
On the other hand,
$$\begin{aligned}
d\omega &= \sum_{i=1}^n \sum_{j=1}^n (\partial_j f_i)\,dx_j \wedge dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_n \\
&= \Big( \sum_{i=1}^n (-1)^{i-1} \partial_i f_i \Big)\,dx_1 \wedge \cdots \wedge dx_n \\
&= \mathrm{div}\,\eta\;dx_1 \wedge \cdots \wedge dx_n,
\end{aligned}$$
hence, recalling 13.2.10,
$$\int_E d\omega = \int_E \mathrm{div}\,\eta_x\,dx_1 \wedge \cdots \wedge dx_n = \int_E \mathrm{div}\,\eta_x\,dx. \tag{13.30}$$

The conclusion now follows from (13.29), (13.30), and the divergence theorem.

13.5.9 Remark. The divergence theorem has an interesting application to fluid dynamics. Consider an incompressible fluid moving in space. Let $\rho(x, t)$ denote the density of the fluid in mass per unit volume at time $t$ and point $x$, and let $\vec{v}(x, t)$ denote its velocity. If $\vec{n}$ is normal to a small surface element of area $\Delta S$, then $(\rho\vec{v} \cdot \vec{n})(\Delta S)(\Delta t)$ is approximately the mass of the fluid flowing across that surface element during a small time period $\Delta t$. The rate of flow is then $(\rho\vec{v} \cdot \vec{n})\Delta S$. Adding these quantities and taking limits, we see that the rate of flow of the fluid across a surface $S$ in the direction of the normal is given by the integral
$$\int_S \rho\vec{v} \cdot \vec{n}\,dS.$$
Now let $E$ be a regular region with smooth boundary $S$. Applying the foregoing to a ball $B_\varepsilon$ in $E$ with boundary $S_\varepsilon$, center $y$, and outer normal $\vec{n}$, we see that the integral
$$\int_{S_\varepsilon} \rho\vec{v} \cdot \vec{n}\,dS$$
represents the rate of flow of the fluid out of the ball, that is, the negative of the rate of change of fluid in the ball. Since the amount of fluid in the ball at time $t$ is $\int_{B_\varepsilon} \rho(x, t)\,dx$,
$$\frac{d}{dt} \int_{B_\varepsilon} \rho(x, t)\,dx = -\int_{S_\varepsilon} \rho\vec{v} \cdot \vec{n}\,dS = -\int_{B_\varepsilon} \mathrm{div}(\rho\vec{v})\,dx,$$
the last equality by the divergence theorem. Differentiating under the integral sign and dividing by $\mathrm{vol}(B_\varepsilon)$, we obtain
$$\frac{1}{\mathrm{vol}(B_\varepsilon)} \int_{B_\varepsilon} \partial_t \rho(x, t)\,dx = -\frac{1}{\mathrm{vol}(B_\varepsilon)} \int_{B_\varepsilon} \mathrm{div}(\rho\vec{v})\,dx.$$
Letting $\varepsilon \to 0$, we obtain
$$\partial_t \rho(y, t) = -\mathrm{div}\big( \rho(y, t)\,\vec{v}(y, t) \big).$$
In particular, if $\rho$ is constant in time, then $\mathrm{div}(\rho\vec{v})$ is zero throughout $E$, hence
$$\int_S \rho\vec{v} \cdot \vec{n}\,dS = \int_E \mathrm{div}(\rho\vec{v})\,dx = 0,$$
that is, the amount of fluid flowing out of $E$ equals the amount flowing in. ♦

Green’s Theorem

Let $E$ be a regular region in $\mathbb{R}^2$ with boundary the union of finitely many smooth simple pairwise disjoint curves $C = \varphi(I)$. The boundary $\mathrm{bd}(E)$ is said to be positively oriented if, on each curve $C$, rotating the unit tangent vector $\vec{T}$, which is in the direction of $(\varphi_1', \varphi_2')$, 90 degrees clockwise produces the exterior normal $\vec{n}$ on $C$, which is then in the direction of $(\varphi_2', -\varphi_1')$. The region is then to the left as the boundary is traced in the direction of the tangent vector field on each curve $C$.

FIGURE 13.10: Regular region E in R2.

Now let $\omega = Q\,dx - P\,dy$. Then
$$(\omega \cdot \vec{n}) \circ \varphi = (Q \circ \varphi, -P \circ \varphi) \cdot (\varphi_2', -\varphi_1')\,\|\varphi'\|^{-1} = \big[ (P \circ \varphi)\varphi_1' + (Q \circ \varphi)\varphi_2' \big]\,\|\varphi'\|^{-1},$$
hence
$$\int_C \omega \cdot \vec{n}\,ds = \int_C (P\,dx + Q\,dy).$$
Summing over the curves $C$, we have
$$\int_{\mathrm{bd}(E)} \omega \cdot \vec{n}\,ds = \int_{\mathrm{bd}(E)} (P\,dx + Q\,dy).$$
Since
$$\mathrm{div}\,\omega = \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y},$$
we obtain the following important special case of the divergence theorem.


13.5.10 Green’s Theorem. Let $E$ be a region in $\mathbb{R}^2$, as described above. If $P, Q$ are $C^1$ functions on an open set containing $E$, then
$$\int_{\mathrm{bd}(E)} (P\,dx + Q\,dy) = \iint_E \Big( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Big)\,dx\,dy. \tag{13.31}$$

13.5.11 Corollary. The area of $E$ is given by
$$\mathrm{area}(E) = \frac{1}{2} \int_{\mathrm{bd}(E)} (x\,dy - y\,dx).$$

Proof. Apply Green’s theorem to $P(x, y) = -y/2$, $Q(x, y) = x/2$, noting that $Q_x - P_y = 1$.

13.5.12 Example. The ellipse $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$ has parametrization $x = a\cos t$, $y = b\sin t$, $0 \le t \le 2\pi$. Therefore, the area inside the ellipse is
$$\frac{1}{2} \int_0^{2\pi} ab(\cos^2 t + \sin^2 t)\,dt = \pi ab.$$
♦
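The area formula of 13.5.11 lends itself to a direct numerical check on the ellipse of 13.5.12. The following sketch is illustrative; the function name `green_area`, the semi-axes $a = 3$, $b = 2$, and the midpoint rule are assumptions made for the example.

```python
from math import cos, sin, pi

def green_area(x, y, dx, dy, n=1000):
    """(1/2) * integral over [0, 2 pi] of (x(t) y'(t) - y(t) x'(t)) dt, midpoint rule."""
    h = 2 * pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += x(t) * dy(t) - y(t) * dx(t)
    return 0.5 * total * h

a, b = 3.0, 2.0  # hypothetical semi-axes
area = green_area(lambda t: a * cos(t), lambda t: b * sin(t),
                  lambda t: -a * sin(t), lambda t: b * cos(t))
print(area, pi * a * b)
```

For the ellipse the integrand is the constant $ab$, so the computed value should match $\pi ab$ up to rounding. The parametrization must be counterclockwise; reversing it changes the sign of the result.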

The Piecewise Smooth Case

Both Stokes’s theorem and the divergence theorem may be extended to more general surfaces called piecewise smooth. In the case $n = 3$, these are finite unions of smooth surfaces $S_1, \dots, S_k$ that fit together so that

• no three surfaces meet in more than a single point, and
• the common boundary of two of these surfaces consists of finitely many disjoint piecewise smooth simple closed curves.

FIGURE 13.11: Piecewise smooth surfaces.

(See Figure 13.11.) If $S$ is such a surface, then the surface integral $\int_S f\,dS$ is defined as the sum $\sum_{j=1}^k \int_{S_j} f\,dS$. The integral of a form on $S$ has an analogous definition. These definitions are reasonable since, by cancelations, the common boundary of a pair of surfaces contributes nothing to the integral. We illustrate the basic idea with the simple example of a cube. Removing a face of the cube results in a surface-with-boundary $Q$, which we orient by the outward normal. If Stokes’s theorem is applied to each of the five faces and the results are added, the integrals along the boundaries cancel and one is left with Stokes’s theorem for $Q$:
$$\int_{\partial Q} \vec{F} \cdot d\vec{r} = \int_Q \mathrm{curl}\,\vec{F} \cdot \vec{N}\,dS.$$

FIGURE 13.12: Oriented cube without bottom face.

Similarly, Green’s theorem extends to regions in $\mathbb{R}^2$ whose boundaries are only piecewise smooth. This, of course, leads to extended versions of its corollaries. Here’s an application of the extended version of 13.5.11:

13.5.13 Example. Let $\partial S$ be a closed polygon consisting of $m$ line segments $L_i := [(a_i, b_i) : (a_{i+1}, b_{i+1})]$, $i = 1, 2, \dots, m$, where $(a_{m+1}, b_{m+1}) = (a_1, b_1)$ and the vertices are in counterclockwise order. (See Figure 13.13.)

FIGURE 13.13: Closed polygon.

Then $L_i$ has the parametrization $x = (1 - t)a_i + t a_{i+1}$, $y = (1 - t)b_i + t b_{i+1}$, $0 \le t \le 1$, hence
$$\begin{aligned}
\int_{L_i} (x\,dy - y\,dx) &= (b_{i+1} - b_i) \int_0^1 \big[ (1 - t)a_i + t a_{i+1} \big]\,dt - (a_{i+1} - a_i) \int_0^1 \big[ (1 - t)b_i + t b_{i+1} \big]\,dt \\
&= a_i b_{i+1} - a_{i+1} b_i.
\end{aligned}$$
Therefore,
$$\mathrm{area}(S) = \frac{1}{2} \sum_{i=1}^m (a_i b_{i+1} - a_{i+1} b_i).$$

♦
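The formula of 13.5.13 translates directly into a few lines of code. The sketch below is a plain-Python illustration; the function name `polygon_area` and the sample polygons are assumptions, and vertices are expected in counterclockwise order as in the example.

```python
def polygon_area(vertices):
    """Half the sum of a_i * b_{i+1} - a_{i+1} * b_i over consecutive vertices."""
    m = len(vertices)
    total = 0.0
    for i in range(m):
        a_i, b_i = vertices[i]
        a_next, b_next = vertices[(i + 1) % m]  # wrap around: vertex m+1 is vertex 1
        total += a_i * b_next - a_next * b_i
    return 0.5 * total

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 0), (0, 3)]
print(polygon_area(unit_square), polygon_area(triangle))
```

Listing the vertices clockwise instead negates the result, which reflects the orientation convention used in Green’s theorem.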

Exercises

1.S Verify directly the following version of Stokes’s theorem
$$\int_{\partial S} [f\,dx + g\,dy + h\,dz] = \int_S \big[ (h_y - g_z)\,dy \wedge dz + (f_z - h_x)\,dx \wedge dz + (g_x - f_y)\,dx \wedge dy \big],$$
where $S$ is the cylinder $\{ (x, y, z) : x^2 + y^2 = 1,\; 0 \le z \le 1 \}$.

2. For $(x, y) \ne (0, 0)$ define
$$P(x, y) = \frac{-y}{x^2 + y^2} \quad \text{and} \quad Q(x, y) = \frac{x}{x^2 + y^2}.$$
Show that
(a) $Q_x = P_y$.
(b) $\int_{\varphi_r} P\,dx + Q\,dy = 2\pi$, where $\varphi_r(t) = (r\cos t, r\sin t)$, $0 \le t \le 2\pi$.
(c) $\int_\psi P\,dx + Q\,dy = 2\pi$, where $\psi$ is any piecewise smooth, counterclockwise oriented, simple closed curve enclosing $(0, 0)$.
(d) $\displaystyle\int_0^{2\pi} \frac{\cos^{2m} t\,\sin^{2m} t}{a^2\cos^{4m+2} t + b^2\sin^{4m+2} t}\,dt = \frac{2\pi}{(2m+1)ab}$, $m \in \mathbb{Z}^+$, $a, b > 0$.

3. Let $0 < r < R$ and let $S = \{ (x, y) : r^2 \le x^2 + y^2 \le R^2 \}$. Verify Green’s

theorem on $S$ for
(a)S $P(x,y) = \dfrac{-y}{\sqrt{x^2+y^2}}$, $\quad Q(x,y) = \dfrac{x}{\sqrt{x^2+y^2}}$.
(b) $P(x,y) = \dfrac{y^2}{\sqrt{x^2+y^2}}$, $\quad Q(x,y) = \dfrac{x^2}{\sqrt{x^2+y^2}}$.
(c) $P(x,y) = \dfrac{x}{x^2+y^2}$, $\quad Q(x,y) = \dfrac{-y}{x^2+y^2}$.

4. Use Green’s theorem to evaluate the following integrals, where the curves $C$ have counterclockwise orientation.
(a) $\int_C \sin(x-y)\,dx + \sin(x+y)\,dy$, $\quad C = \operatorname{bd}\big([0,\pi/2]\times[0,\pi/2]\big)$.
(b) $\int_C e^{-xy}\,dx + e^{xy}\,dy$, $\quad C = \operatorname{bd}\big([0,1]\times[0,1]\big)$.
(c) $\int_C \cos(xy)\,dx + \sin(xy)\,dy$, $\quad C = \operatorname{bd}\big([0,1]\times[0,1]\big)$.
(d)S $\int_C f(x)\,dx + g(y)\,dy$, where $f$ and $g$ are $C^1$ and $C$ is simple, closed, and piecewise $C^1$.

5.S Use 13.5.11 to show that the area enclosed by the “elliptical astroid”
$$\left(\frac{x^2}{a^2}\right)^{1/(2m+1)} + \left(\frac{y^2}{b^2}\right)^{1/(2m+1)} = 1, \quad a > 0,\ b > 0,\ m \in \mathbb{Z}^+,$$

is given by Z

π/2

β

cos2m t + sin2m t) dt =

0

βπ (2m − 1)(2m − 3) · · · 5 · 3 , 2 2m(2m − 2) · · · 4 · 2

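where $\beta := 4^{-m}ab\left(m + \tfrac12\right)$. (See 5.3.4.)

6. Let $E$ be a regular region in $\mathbb{R}^n$ and let $f$ and $g$ be $C^2$ on $\operatorname{cl}(E)$. Prove Green’s formulas:
(a) $\displaystyle\int_{\operatorname{bd}(E)} f\,\nabla g\cdot\vec n\,dS = \int_E \big(\nabla f\cdot\nabla g + f\,\nabla^2 g\big)\,dx$.
(b) $\displaystyle\int_{\operatorname{bd}(E)} (f\,\nabla g - g\,\nabla f)\cdot\vec n\,dS = \int_E \big(f\,\nabla^2 g - g\,\nabla^2 f\big)\,dx$,
where $\displaystyle\nabla^2 f := \sum_{i=1}^{n}\frac{\partial^2 f}{\partial x_i^2}$, the Laplacian of $f$.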

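7. A $C^2$ function $f$ is said to be harmonic on a set $S \subseteq \mathbb{R}^n$ if $\nabla^2 f = 0$ on an open set containing $S$.
(a) Show that if $f$ is harmonic on the ball $C_r(0)$, then $\int_{S_r(0)} \nabla f\cdot\vec n\,dS = 0$.
(b) Show that if $f$ and $g$ are harmonic on the region $\operatorname{cl}(E)$ of 13.5.6 and $\vec n_t = \|x\|^{-1}x$ on $S_t := S_t(0)$, then
$$\int_{S_1} \nabla f\cdot\vec n_1\,dS = \int_{S_2} \nabla f\cdot\vec n_2\,dS$$
and
$$\int_{S_1} (g\,\nabla f - f\,\nabla g)\cdot\vec n_1\,dS = \int_{S_2} (g\,\nabla f - f\,\nabla g)\cdot\vec n_2\,dS.$$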

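8. Let $E \subseteq \mathbb{R}^n$ be a regular region and let $f$ be harmonic on $\operatorname{cl}(E)$ (Exercise 7). Show that
$$\int_E \|\nabla f\|^2\,dx = \int_{\operatorname{bd}(E)} f\,\nabla f\cdot\vec n\,dS,$$
where $\vec n$ is the outer normal. Deduce that if $f = 0$ on $\operatorname{bd}(E)$ and $E$ is connected, then $f = 0$ on $E$.

9.S Let $E \subseteq \mathbb{R}^n$ be a regular region and let $f$ and $g$ be harmonic on $\operatorname{cl}(E)$ (Exercise 7). Show that
$$\int_{\operatorname{bd}(E)} (f\,\nabla g + g\,\nabla f)\cdot\vec n\,dS = 2\int_E \nabla f\cdot\nabla g\,dx,$$
where $\vec n$ is the outer normal.

10. Let $n > 2$. For $t > 0$, let $C_t = C_t(0)$, $S_t = S_t(0)$, and $\vec n_t(x) = \|x\|^{-1}x$, the outer normal to $S_t$. Suppose $f$ is harmonic on $C_r$ (Exercise 7). Prove the average value property of harmonic functions
$$f(0) = \frac{1}{\operatorname{area}(S_r)}\int_{S_r} f\,dS$$
by verifying (a)–(f) for $0 < t \le r$. (Refer to 13.4.2.)
(a) The function $g(x) := \|x\|^{2-n}$, $x \ne 0$, is harmonic.
(b) $\displaystyle\int_{S_t} f\,\nabla g\cdot\vec n_t\,dS = \frac{2-n}{t^{n-1}}\int_{S_t} f\,dS$.
(c) $\displaystyle\int_{S_t} g\,\nabla f\cdot\vec n_t\,dS = 0$.
(d) $\displaystyle\frac{1}{t^{n-1}}\int_{S_t} f\,dS = \frac{1}{r^{n-1}}\int_{S_r} f\,dS$.
(e) $\displaystyle\frac{1}{\operatorname{area}(S_r)}\int_{S_r} f\,dS = \frac{1}{\operatorname{area}(S_t)}\int_{S_t} f\,dS$.
(f) $\displaystyle\lim_{t\to 0}\frac{1}{\operatorname{area}(S_t)}\int_{S_t} f\,dS = f(0)$.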

11. Let $E$ be a region as in the statement of Green’s theorem. For the functions $\psi$ in (a) and (b) below, prove that if the conclusion of Green’s theorem holds for $\psi(E)$, then it holds for $E$. (This is a special case of the statement in the proof of the divergence theorem that the region $E$ may be rotated and translated without loss of generality.)
(a) $\psi$ is the translation $\psi(x,y) = (x + x_0, y + y_0)$.
(b) $\psi$ is the rotation $\psi(x,y) = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta)$.

[FIGURE 13.14: Surfaces $S_1$ and $S_2$ with common boundary $C$; panels (a) and (b).]

12.S Orient the surfaces $S_1$ and $S_2$ in (a) and (b) of Figure 13.14 by their outer normals $\vec n$. Show that in
(a), $\displaystyle\int_{S_1\cup S_2} \operatorname{curl}\vec F\cdot\vec n\,dS = 0$;
(b), $\displaystyle\int_{S_1} \operatorname{curl}\vec F\cdot\vec n\,dS = \int_{S_2} \operatorname{curl}\vec F\cdot\vec n\,dS$.

13. Let $a \in \mathbb{R}^n$, $n > 2$, and define an $(n-1)$ form $\omega$ on $\mathbb{R}^{n+1}\setminus\{a\}$ by
$$\omega_x = \|x - a\|^{-n}\sum_{i=1}^{n} (-1)^{i-1}(x_i - a_i)\,dx_1\wedge\cdots\wedge\widehat{dx_i}\wedge\cdots\wedge dx_n.$$
Show that $d\omega = 0$. Conclude that if $S$ is a compact, oriented $n$-surface-with-boundary in $\mathbb{R}^{n+1}$ and $a \notin S$, then $\int_{\partial S}\omega = 0$.

14.S Use the divergence theorem and 11.5.6 to show that the area of the sphere $S_r(0)$ is $nr^{n-1}\alpha_n$, derived by another method in 13.4.2.

15. Let $E \subseteq \mathbb{R}^n$ be a regular region and $a \in E$. Define $f$ on $\mathbb{R}^n\setminus\{a\}$ by $f(x) = \|x - a\|^{2-n}$. Show that $\operatorname{div}\nabla f = 0$. Conclude that if $C_r(a) \subseteq E$, then
$$\int_{\operatorname{bd}(E)} (\nabla f)\cdot\vec n\,dS = \int_{S_r(a)} (\nabla f)\cdot\vec n\,dS = (2-n)\,n\,\alpha_n,$$
where $\vec n$ denotes the outer normals.

*13.6 Closed Forms in $\mathbb{R}^n$

13.6.1 Definition. A $C^1$ $m$-form $\omega$ on an open subset $W$ of $\mathbb{R}^n$ is said to be closed if $d\omega = 0$. The form $\omega$ is exact if there exists a $C^2$ $(m-1)$-form $\eta$ on $W$ such that $d\eta = \omega$. ♦

By 13.1.16(b), an exact form is closed. The converse is false (see Exercise 13.5.2). However, there is a general class of regions on which every closed $m$-form is exact. We consider first the case $m = 1$.

Closed 1-Forms on Simply Connected Regions

13.6.2 Definition. An open connected subset $U$ of $\mathbb{R}^n$ is said to be simply connected if for each closed $C^2$ curve $\varphi : [a,b] \to \mathbb{R}^n$ in $U$ there exists a $C^2$ function $\Phi : [a,b]\times[0,1] \to U$ such that for all $s \in [0,1]$ and $t \in [a,b]$,
$$\Phi(t,1) = \varphi(t), \quad \Phi(t,0) = \varphi(a) = \varphi(b), \quad\text{and}\quad \Phi(a,s) = \Phi(b,s).$$
♦

The function $\Phi$ is called a ($C^2$) homotopy between $\varphi$ and the point $p := \varphi(a) = \varphi(b)$.

[FIGURE 13.15: Curves contracting to $p$ must pass through $q$.]

Note that, for each $s \in [0,1]$, $\Phi(\cdot, s)$ is a closed $C^2$ curve in $U$ such that

$\Phi(\cdot, 1) = \varphi$ and $\Phi(\cdot, 0)$ is a single point $p$. Thus a simply connected region $U$ has the property that every closed curve in $U$ may be contracted smoothly to a point while remaining in $U$ (see Figure 13.15). In $\mathbb{R}^2$ this means that there are no “holes” in $U$. In higher dimensions a simply connected set may have holes. For example, $\mathbb{R}^n\setminus C_1(0)$ is simply connected if $n \ge 3$. However, the holes may not be too large: the set $\mathbb{R}^3\setminus L$, where $L$ is a line, is not simply connected.

To prove that every closed 1-form of class $C^2$ on a simply connected set is exact, we follow [5].

13.6.3 Lemma. Let $\omega$ be a closed 1-form on a simply connected subset $U$ of $\mathbb{R}^n$. Then $\int_\varphi \omega = 0$ for each closed $C^2$ curve $\varphi$ in $U$.

Proof. Let $\omega = \sum_{j=1}^{n} f_j\,dx_j$ and let $\Phi : [a,b]\times[0,1] \to U$ be a homotopy as in 13.6.2. By hypothesis,
$$0 = d\omega = \sum_{j=1}^{n}\sum_{i=1}^{n} \partial_i f_j\,dx_i\wedge dx_j = \sum_{1\le i<j\le n} (\partial_i f_j - \partial_j f_i)\,dx_i\wedge dx_j,$$
hence […]

[…] $h > 0$ on $(-1,1)$ and $h = 0$ on $(-1,1)^c$. Multiplying $h$ by a positive constant, we may assume that $\int h = 1$. Let $h_k(x) = kh(kx)$, $k = 1, 2, \dots$. Then $h_k \ge 0$, $h_k(x) = 0$ for $|x| \ge 1/k$, and $\int h_k = 1$. Define a $C^\infty$ function $g_k$ on $\mathbb{R}$ by
$$g_k(x) = \int_{-\infty}^{\infty} \varphi'(y)\,h_k(x-y)\,dy = \int_{-1/k}^{1/k} \varphi'(x+y)\,h_k(y)\,dy.$$

The sequence $\{g_k\}$ is uniformly bounded since
$$|g_k(x)| \le \int_{-\infty}^{\infty} |\varphi'(x+y)|\,h_k(y)\,dy \le M\int_{-\infty}^{\infty} h_k(y)\,dy = M.$$
By periodicity,
$$\int_0^1 \varphi'(x+y)\,dx = \int_0^1 \varphi'(x)\,dx = \varphi(1) - \varphi(0) = 0$$
(Exercise 5.3.1), hence, by Fubini’s theorem,
$$\int_0^1 g_k(x)\,dx = \int_{-\infty}^{\infty} h_k(y)\int_0^1 \varphi'(x+y)\,dx\,dy = 0.$$
Now define $\varphi_k$ on $\mathbb{R}$ by
$$\varphi_k(x) = \varphi(0) + \int_0^x g_k(y)\,dy.$$
Then (a) and (d) hold and (b) follows from
$$\varphi_k'(x) - \varphi'(x) = g_k(x) - \varphi'(x) = \int_{-1/k}^{1/k} \big[\varphi'(x+y) - \varphi'(x)\big]\,h_k(y)\,dy,$$
which tends to 0 at continuity points $x$ as $k \to +\infty$. Finally, (c) follows from (b), the inequality
$$|\varphi_k(t) - \varphi(t)| \le \int_0^t |\varphi_k'(x) - \varphi'(x)|\,dx \le \int_0^1 |\varphi_k'(x) - \varphi'(x)|\,dx,$$
and Lebesgue’s dominated convergence theorem, noting that the set of discontinuity points of $\varphi'$ is finite and hence has measure zero.
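The smoothing step in this proof can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the particular bump $h(x) = e^{-1/(1-x^2)}$, the midpoint-rule helper `integrate`, the names `h_k` and `mollify`, and the step function used for $\varphi'$ are all choices made here, not taken from the text.

```python
import math

def h(x):
    """Unnormalized bump function: positive on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi] with n subintervals."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

NORM = integrate(h, -1.0, 1.0)  # constant that makes the integral of h equal 1

def h_k(x, k):
    """h_k(x) = k h(kx), normalized; vanishes for |x| >= 1/k."""
    return k * h(k * x) / NORM

def mollify(dphi, x, k):
    """g_k(x) = integral over [-1/k, 1/k] of dphi(x + y) h_k(y) dy."""
    return integrate(lambda y: dphi(x + y) * h_k(y, k), -1.0 / k, 1.0 / k, n=2000)

def dphi(x):
    """A piecewise continuous stand-in for phi', with a jump at x = 0.5."""
    return 1.0 if x < 0.5 else -1.0

mass = integrate(lambda y: h_k(y, 5), -0.2, 0.2)  # should be close to 1
smooth_value = mollify(dphi, 0.25, 10)            # close to dphi(0.25) = 1
print(mass, smooth_value)
```

At the continuity point $x = 0.25$ the smoothed value agrees with $\varphi'(x)$ once $1/k$ is smaller than the distance to the jump, which is the behavior used in part (b).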

13.6.5 Theorem. Let $\omega$ be a closed 1-form on a simply connected subset $U$ of $\mathbb{R}^n$. Then $\omega$ is exact.

Proof. By 12.2.10 it suffices to show that $\int_\varphi \omega = 0$ for every piecewise $C^1$ closed curve $\varphi : [0,1] \to U$. Let $\{\varphi_k\}$ be as in 13.6.4. Since $\varphi_k \to \varphi$ uniformly on $[0,1]$ and $\varphi([0,1]) \subseteq U$, it follows that $\varphi_k([0,1]) \subseteq U$ for all sufficiently large $k$ (Exercise 8.5.22). For such $k$, $\int_{\varphi_k}\omega = 0$ by 13.6.3. By (b) and (c) of 13.6.4, Lebesgue’s dominated convergence theorem, and the definition of integral of a form (13.16), $\int_{\varphi_k}\omega \to \int_\varphi \omega$. Therefore, $\int_\varphi \omega = 0$, as required.

Closed m-Forms on Star-Shaped Regions

13.6.6 Definition. A subset $W$ of $\mathbb{R}^n$ is said to be star-shaped with respect to $y \in W$ if the line segment from $y$ to any point $x \in W$ lies in $W$:
$$y + t(x - y) \in W, \quad 0 \le t \le 1.$$
♦

For example, a convex set is star-shaped with respect to every one of its points. In Figure 13.17, $W$ is star-shaped with respect to $y$ but not $z$, and $V$ is not star-shaped with respect to any of its points.

[FIGURE 13.17: Star-shaped and non-star-shaped regions.]

13.6.7 Poincaré’s Lemma. Let $W \subseteq \mathbb{R}^n$ be open and star-shaped with respect to some $y \in W$. If $\omega$ is a closed $C^1$ $m$-form on $W$, where $1 \le m \le n$, then $\omega$ is exact.

Proof. Define a function $\psi : [0,1]\times W \to W$ by $\psi(t,x) = y + t(x - y)$. For an $r$-form
$$\eta = \sum_{j\in J_r} g_j\,dx_j$$
on $W$, define the $(r-1)$-form $\tilde\eta$ on $W$ by
$$\tilde\eta_x = \sum_{j\in J_r}\left(\int_0^1 t^{r-1}(g_j\circ\psi)(t,x)\,dt\right)\eta_j, \quad\text{where}$$
$$\eta_j := \sum_{i=1}^{r} (-1)^{i-1}(x_{j_i} - y_{j_i})\,dx_{j_1}\wedge\cdots\wedge\widehat{dx_{j_i}}\wedge\cdots\wedge dx_{j_r}, \quad j = (j_1, \dots, j_r).$$

A standard argument shows that the definition of $\tilde\eta$ is independent of the choice of representation of $\eta$. In particular, by putting $\eta$ in canonical form we see that $\eta = 0 \Rightarrow \tilde\eta = 0$. Furthermore,
$$d\eta_j = \sum_{i=1}^{r} (-1)^{i-1}\,d(x_{j_i} - y_{j_i})\wedge dx_{j_1}\wedge\cdots\wedge\widehat{dx_{j_i}}\wedge\cdots\wedge dx_{j_r} = r\,dx_j.$$

Now let
$$\omega = \sum_{j\in J_m} f_j\,dx_j.$$
Then
$$\tilde\omega = \sum_{j\in J_m}\left(\int_0^1 t^{m-1}(f_j\circ\psi)(t,x)\,dt\right)\omega_j,$$

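and, by 13.1.16(d) (suppressing the variables $(t,x)$ in $f_j\circ\psi(t,x)$),
$$d\tilde\omega = \sum_{j\in J_m}\left\{ d\!\left(\int_0^1 t^{m-1} f_j\circ\psi\,dt\right)\wedge\omega_j + \left(\int_0^1 t^{m-1} f_j\circ\psi\,dt\right) d\omega_j \right\}.$$
Differentiating under the integral sign, applying the chain rule, and noting that $\psi_x = tI_n$, we have
$$d\!\left(\int_0^1 t^{m-1}(f_j\circ\psi)\,dt\right) = \sum_{i=1}^{n}\left(\int_0^1 t^{m}\,(\partial_i f_j)\circ\psi\,dt\right) dx_i.$$
Therefore, using $d\omega_j = m\,dx_j$,
$$d\tilde\omega = \sum_{j\in J_m}\left\{ \sum_{i=1}^{n}\left(\int_0^1 t^{m}\,(\partial_i f_j)\circ\psi\,dt\right) dx_i\wedge\omega_j + m\left(\int_0^1 t^{m-1} f_j\circ\psi\,dt\right) dx_j \right\}. \tag{13.33}$$
On the other hand,
$$d\omega = \sum_{j\in J_m}\left(\sum_{i=1}^{n} \partial_i f_j\,dx_i\right)\wedge dx_j = \sum_{j\in J_m}\sum_{i=1}^{n} \partial_i f_j\,dx_i\wedge dx_j,$$
hence, since $d\omega = 0$,
$$\sum_{j\in J_m}\sum_{i=1}^{n}\left(\int_0^1 t^{m}\,\big((\partial_i f_j)\circ\psi\big)(t,x)\,dt\right)(d\omega)_{(i,j)} = \widetilde{d\omega} = 0.$$
By the above definition,
$$(d\omega)_{(i,j)} = \sum_{\ell=1}^{m} (-1)^{\ell}(x_{j_\ell} - y_{j_\ell})\,dx_i\wedge dx_{j_1}\wedge\cdots\wedge\widehat{dx_{j_\ell}}\wedge\cdots\wedge dx_{j_m} + (x_i - y_i)\,dx_{j_1}\wedge\cdots\wedge dx_{j_m} = -dx_i\wedge\omega_j + (x_i - y_i)\,dx_j,$$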

hence
$$\sum_{j\in J_m}\sum_{i=1}^{n}\left(\int_0^1 t^{m}\,(\partial_i f_j)\circ\psi\,dt\right)\big[-dx_i\wedge\omega_j + (x_i - y_i)\,dx_j\big] = 0. \tag{13.34}$$
Adding (13.33) and (13.34), we obtain
$$d\tilde\omega = \sum_{j\in J_m}\left\{ m\int_0^1 t^{m-1}(f_j\circ\psi)\,dt + \sum_{i=1}^{n}(x_i - y_i)\int_0^1 t^{m}\,(\partial_i f_j)\circ\psi\,dt \right\} dx_j.$$
The term in braces is simply
$$\int_0^1 \frac{d}{dt}\big[t^{m} f_j\circ\psi\big]\,dt = t^{m} f_j\circ\psi\,\Big|_0^1 = f_j.$$
Therefore, $d\tilde\omega = \omega$, which shows that $\omega$ is exact.

From Poincaré’s lemma we obtain the following results from classical vector analysis, where, in keeping with the spirit, we write $\operatorname{grad} f$ for $\nabla f$.

13.6.8 Corollary. Let $W$ be an open star-shaped subset of $\mathbb{R}^3$ and let $\vec F(x,y,z) = \big(P(x,y,z), Q(x,y,z), R(x,y,z)\big)$ be a $C^1$ vector field on $W$. Then
(a) $\operatorname{curl}\vec F = 0$ iff $\vec F = \operatorname{grad} f$ for some $C^2$ function $f : W \to \mathbb{R}$.
(b) $\operatorname{div}\vec F = 0$ iff $\vec F = \operatorname{curl}\vec G$ for some $C^2$ vector field $\vec G$ on $W$.

Proof. (a) If $\vec F = \operatorname{grad} f = (f_x, f_y, f_z)$, then
$$\operatorname{curl}\vec F = (f_{zy} - f_{yz},\ f_{xz} - f_{zx},\ f_{yx} - f_{xy}),$$
which is zero because $f$ is $C^2$. Conversely, assume that $\operatorname{curl}\vec F = 0$, that is, $R_y - Q_z = P_z - R_x = Q_x - P_y = 0$. Let $\omega = P\,dx + Q\,dy + R\,dz$. Then
$$d\omega = (P_y\,dy + P_z\,dz)\wedge dx + (Q_x\,dx + Q_z\,dz)\wedge dy + (R_x\,dx + R_y\,dy)\wedge dz = (Q_x - P_y)\,dx\wedge dy + (R_x - P_z)\,dx\wedge dz + (R_y - Q_z)\,dy\wedge dz = 0,$$
so $\omega$ is closed. By Poincaré’s lemma, there exists a 0-form $f$ of class $C^2$ on $W$ such that $df = \omega$, that is, $\operatorname{grad} f = \vec F$.

(b) If $\vec F = \operatorname{curl}\vec G$, where $\vec G = (f, g, h)$, then
$$P = h_y - g_z, \quad Q = f_z - h_x, \quad\text{and}\quad R = g_x - f_y,$$

hence, if $\vec G$ is $C^2$,
$$\operatorname{div}\vec F = P_x + Q_y + R_z = (h_{yx} - g_{zx}) + (f_{zy} - h_{xy}) + (g_{xz} - f_{yz}) = 0.$$
Conversely, assume $\operatorname{div}\vec F = 0$ and let $\omega = R\,dx\wedge dy + P\,dy\wedge dz + Q\,dz\wedge dx$. Then $d\omega = \operatorname{div}\vec F\,dx\wedge dy\wedge dz$, hence $\omega$ is closed. By Poincaré’s lemma,
$$\omega = d(f\,dx + g\,dy + h\,dz) = (g_x - f_y)\,dx\wedge dy + (h_x - f_z)\,dx\wedge dz + (h_y - g_z)\,dy\wedge dz$$
for some $C^2$ functions $f$, $g$, $h$ on $W$. Therefore,
$$P = h_y - g_z, \quad Q = f_z - h_x, \quad R = g_x - f_y,$$
that is, $\vec F = \operatorname{curl}(f, g, h)$.
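For $m = 1$ the construction in the proof of Poincaré’s lemma reduces to the line-integral formula $f(x) = \int_0^1 \vec F\big(y + t(x - y)\big)\cdot(x - y)\,dt$ for a potential of a curl-free field. The following sketch evaluates it numerically; the function name `potential`, the midpoint rule, and the sample field $\vec F = \operatorname{grad}(xyz)$ are assumptions chosen for illustration.

```python
def potential(F, x, y=(0.0, 0.0, 0.0), n=2000):
    """Approximate f(x) = integral_0^1 F(y + t(x - y)) . (x - y) dt.

    F is a curl-free vector field on a region star-shaped with respect to y;
    the integral is evaluated by the midpoint rule with n subintervals.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    step = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * step
        p = [yi + t * di for yi, di in zip(y, d)]          # psi(t, x)
        total += sum(Fi * di for Fi, di in zip(F(*p), d))  # F(psi) . (x - y)
    return total * step

def F(x, y, z):
    """grad(xyz), a curl-free field on all of R^3."""
    return (y * z, x * z, x * y)

value = potential(F, (1.0, 2.0, 3.0))
print(value)  # approximately 6.0, the value of xyz at (1, 2, 3)
```

Since the base point here is the origin, where $xyz$ vanishes, the computed potential agrees with $xyz$ itself up to quadrature error.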

Part III

Appendices

Appendix A Set Theory

In this appendix we give an overview of those aspects of elementary set theory that are used throughout the book. For details the reader may wish to consult [2, 8].

Notation for a Set

A set is simply a collection of objects, each of which is called a member or element of the set. Sets are usually denoted by capital letters, and members of a set by small letters. If $x$ is a member of the set $A$, we write $x \in A$; otherwise, we write $x \notin A$. The empty set, denoted by $\emptyset$, is the set with no members.

A concrete set may be described either by listing its elements or by set-builder notation. The latter notation is of the form $\{x : P(x)\}$, which is read “the set of all $x$ such that $P(x)$,” where $P(x)$ is a well-defined property that $x$ must possess in order to belong to the set. For example, the set $A$ of all odd positive integers may be described as
$$A = \{1, 3, 5, \dots\} = \{n : n = 2m - 1 \text{ for some positive integer } m\}.$$
A set $A$ is a subset of a set $B$, written $A \subseteq B$, if every member of $A$ is a member of $B$. If $A \subseteq B$ and $A \ne B$, then $A$ is called a proper subset of $B$. The empty set is a subset of every set and a proper subset of every nonempty set. Sets $A$ and $B$ are said to be equal, written $A = B$, if each is a subset of the other. If all sets under consideration are subsets of the set $S$, then $S$ is called a universal set (of discourse).

Set Operations

Let $S$ be a universal set. The basic set operations are

$A \cup B = \{x : x \in A \text{ or } x \in B\}$, union of $A$ and $B$;
$A \cap B = \{x : x \in A \text{ and } x \in B\}$, intersection of $A$ and $B$;
$A \times B = \{(x,y) : x \in A \text{ and } y \in B\}$, Cartesian product of $A$ and $B$;
$A^c = \{x : x \in S \text{ and } x \notin A\}$, complement of $A$ in $S$;
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$, difference of $A$ and $B$.

More generally, if $\{A_i : i \in I\}$ is an arbitrary collection of sets indexed by a set $I$, then the union and intersection of the collection are defined, respectively, by
$$\bigcup_{i\in I} A_i = \{x : x \in A_i \text{ for some } i \in I\}, \qquad \bigcap_{i\in I} A_i = \{x : x \in A_i \text{ for every } i \in I\}.$$

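If the index set is $\{1, 2, \dots, n\}$ or $\{1, 2, \dots, n, \dots\}$, we use the alternate notation
$$\bigcup_{j=1}^{n} A_j = A_1 \cup A_2 \cup \cdots \cup A_n, \qquad \bigcap_{j=1}^{n} A_j = A_1 \cap A_2 \cap \cdots \cap A_n$$
and
$$\bigcup_{j=1}^{\infty} A_j = A_1 \cup A_2 \cup \cdots, \qquad \bigcap_{j=1}^{\infty} A_j = A_1 \cap A_2 \cap \cdots.$$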

A sequence of sets $A_n$ is said to be increasing if $A_1 \subseteq A_2 \subseteq \cdots$, in which case we write $A_n \uparrow$. Similarly, the sequence is decreasing if $A_1 \supseteq A_2 \supseteq \cdots$, written $A_n \downarrow$. In the first case we also write $A_n \uparrow A$, where $A = A_1 \cup A_2 \cup \cdots$, and in the second $A_n \downarrow A$, where $A = A_1 \cap A_2 \cap \cdots$.

For finitely many sets we extend the definition of Cartesian product by
$$\prod_{j=1}^{n} A_j = A_1 \times \cdots \times A_n = \{(a_1, \dots, a_n) : a_j \in A_j,\ j = 1, \dots, n\},$$
where $(a_1, \dots, a_n)$ is an (ordered) $n$-tuple. Also, we write
$$A^n = \underbrace{A \times A \times \cdots \times A}_{n}.$$
In particular, for an interval $[a,b]$ and the set of all real numbers $\mathbb{R}$,
$$[a,b]^n = \underbrace{[a,b] \times \cdots \times [a,b]}_{n} \quad\text{and}\quad \mathbb{R}^n = \underbrace{\mathbb{R} \times \cdots \times \mathbb{R}}_{n}.$$

The following propositions summarize the basic properties of set operations that will be needed in the text. As with many set equalities, they may be established directly by showing that an arbitrary member of the left side of an equation is a member of the right side, and vice versa.

Proposition. If $\{A_i : i \in I\}$ is a collection of subsets of a set $S$, then
(a) $\displaystyle\Big(\bigcup_{i\in I} A_i\Big)^c = \bigcap_{i\in I} A_i^c$.
(b) $\displaystyle A \cup \bigcap_{i\in I} A_i = \bigcap_{i\in I} (A \cup A_i)$.
(c) $\displaystyle\Big(\bigcap_{i\in I} A_i\Big)^c = \bigcup_{i\in I} A_i^c$.
(d) $\displaystyle A \cap \bigcup_{i\in I} A_i = \bigcup_{i\in I} (A \cap A_i)$.

Parts (a) and (c) of the above proposition are known as DeMorgan’s laws. Parts (b) and (d) are called distributive laws.

Proposition. The Cartesian product of sets has the following properties:
(a) $A \times (A_1 \cup \cdots \cup A_n) = (A \times A_1) \cup \cdots \cup (A \times A_n)$.
(b) $A \times (A_1 \cap \cdots \cap A_n) = (A \times A_1) \cap \cdots \cap (A \times A_n)$.
(c) $(A_1 \cap \cdots \cap A_n) \times (B_1 \cap \cdots \cap B_n) = (A_1 \times B_1) \cap \cdots \cap (A_n \times B_n)$.
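The identities in the two propositions above can be spot-checked on small finite sets. The sketch below uses Python’s built-in `set` type; the universal set `S`, the set `A`, and the three-member `family` are arbitrary choices made for illustration, and a check on one example is of course not a proof.

```python
from itertools import product

S = set(range(10))                       # universal set (illustrative)
A = {0, 1, 2, 3}
family = [{1, 2, 3, 4}, {2, 3, 5}, {3, 6, 7}]

def complement(X):
    """Complement of X in the universal set S."""
    return S - X

def cart(X, Y):
    """Cartesian product X x Y as a set of ordered pairs."""
    return set(product(X, Y))

union_all = set().union(*family)
inter_all = set.intersection(*family)

# DeMorgan's laws
de_morgan_union = complement(union_all) == set.intersection(*[complement(X) for X in family])
de_morgan_inter = complement(inter_all) == set().union(*[complement(X) for X in family])

# Distributive laws
dist_union = (A | inter_all) == set.intersection(*[A | X for X in family])
dist_inter = (A & union_all) == set().union(*[A & X for X in family])

# Cartesian product distributes over unions and intersections
prod_union = cart(A, union_all) == set().union(*[cart(A, X) for X in family])
prod_inter = cart(A, inter_all) == set.intersection(*[cart(A, X) for X in family])

print(de_morgan_union, de_morgan_inter, dist_union, dist_inter, prod_union, prod_inter)
```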

Partitions and Equivalence Relations

A collection of sets is pairwise disjoint if $A \cap B = \emptyset$ for each pair of distinct members $A$ and $B$ in the collection. A partition of a set $S$ is a collection of nonempty pairwise disjoint sets whose union is $S$. An equivalence relation on a set $S$ is a subset $R$ of $S \times S$ with the following properties:
• (reflexivity) $xRx$ for every $x \in S$;
• (symmetry) $xRy \Rightarrow yRx$;
• (transitivity) $xRy$ and $yRz \Rightarrow xRz$.
Here, as is customary, we have written $xRy$ for $(x,y) \in R$.

There is an important duality regarding partitions and equivalence relations: If $R$ is an equivalence relation on $S$, then the collection of sets of the form $[x] := \{y \in S : xRy\}$, each called an equivalence class of the relation, is a partition of $S$. Conversely, given a partition of $S$, define $xRy$ iff $x$ and $y$ are in the same partition member. Then $R$ is an equivalence relation on $S$ whose equivalence classes are precisely the members of the partition.
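The passage from an equivalence relation to its partition can be sketched as follows. The helper name `equivalence_classes` and the example relation (congruence modulo 3 on $\{0, \dots, 9\}$) are illustrative assumptions; the routine relies on the relation really being an equivalence relation, so that comparing with one representative per class suffices.

```python
def equivalence_classes(S, related):
    """Group the elements of S into the classes [x] = {y in S : x R y}.

    `related(x, y)` must be reflexive, symmetric, and transitive on S.
    """
    classes = []
    for x in S:
        for cls in classes:
            representative = next(iter(cls))
            if related(x, representative):
                cls.add(x)
                break
        else:
            classes.append({x})  # x starts a new class
    return classes

S = range(10)
classes = equivalence_classes(S, lambda x, y: (x - y) % 3 == 0)
print(sorted(sorted(c) for c in classes))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

The classes are nonempty, pairwise disjoint, and their union is the whole set, so they form a partition in the sense defined above.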

Functions

Let $A$ and $B$ be nonempty sets. A function or mapping from $A$ to $B$ is a rule $f$ that assigns to each member $x$ of $A$ a unique member $f(x)$ of $B$. We then write $f : A \to B$. The set $A$ is called the domain of $f$. The alternate notation $x \mapsto f(x) : A \to B$ is also used. If $A_0 \subseteq A$ and $B_0 \subseteq B$, then
$$f(A_0) = \{f(x) : x \in A_0\} \quad\text{and}\quad f^{-1}(B_0) = \{x \in A : f(x) \in B_0\}$$
are called, respectively, the image of $A_0$ and the pre-image of $B_0$ under $f$. The set $f(A)$ is called the range of $f$. A function $f : A \to B$ is said to be onto $B$ if $f(A) = B$, and one-to-one if $x_1 \ne x_2$ implies $f(x_1) \ne f(x_2)$.

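Proposition. Let $f : A \to B$ be a function, $\{A_i : i \in I\}$ a collection of subsets of $A$, and $\{B_j : j \in J\}$ a collection of subsets of $B$. Then
(a) $\displaystyle f^{-1}\Big(\bigcup_{j\in J} B_j\Big) = \bigcup_{j\in J} f^{-1}(B_j)$.
(b) $\displaystyle f^{-1}\Big(\bigcap_{j\in J} B_j\Big) = \bigcap_{j\in J} f^{-1}(B_j)$.
(c) $\displaystyle f\Big(\bigcup_{i\in I} A_i\Big) = \bigcup_{i\in I} f(A_i)$.
(d) $\displaystyle f\Big(\bigcap_{i\in I} A_i\Big) \subseteq \bigcap_{i\in I} f(A_i)$, where equality holds if $f$ is one-to-one.
(e) $f^{-1}(B_j^c) = \big(f^{-1}(B_j)\big)^c$.
(f) $\big(f(A_i)\big)^c \subseteq f(A_i^c)$ if $f$ is onto $B$, where equality holds if $f$ is also one-to-one.
(g) $f\big(f^{-1}(B_j)\big) \subseteq B_j$, where equality holds if $f$ is onto $B$.
(h) $A_i \subseteq f^{-1}\big(f(A_i)\big)$, where equality holds if $f$ is one-to-one.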
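A small finite example shows why the inclusions in parts (d) and (h) can be strict when $f$ is not one-to-one. The helper names `image` and `preimage`, the domain, and the squaring map are assumptions made for this sketch.

```python
def image(f, A0):
    """f(A0) = {f(x) : x in A0}."""
    return {f(x) for x in A0}

def preimage(f, domain, B0):
    """f^{-1}(B0) = {x in domain : f(x) in B0}."""
    return {x for x in domain if f(x) in B0}

domain = {-2, -1, 0, 1, 2}

def square(x):
    return x * x          # not one-to-one on this domain

A1, A2 = {-2, -1}, {1, 2}

# (d): the image of an intersection can be strictly smaller
lhs_d = image(square, A1 & A2)                  # image of the empty set
rhs_d = image(square, A1) & image(square, A2)   # {1, 4}

# (a): pre-images always commute with unions
lhs_a = preimage(square, domain, {1} | {4})
rhs_a = preimage(square, domain, {1}) | preimage(square, domain, {4})

# (h): A1 is contained in f^{-1}(f(A1)), strictly here
back = preimage(square, domain, image(square, A1))

print(lhs_d, rhs_d, lhs_a == rhs_a, back)
```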

If $f : A \to B$ and $g : C \to D$ are functions with $B \subseteq C$, then the composition of $g$ and $f$ is the function $g \circ f : A \to D$ defined by
$$(g \circ f)(x) = g\big(f(x)\big), \quad x \in A.$$
If $D_0 \subseteq D$, then
$$(g \circ f)^{-1}(D_0) = f^{-1}\big(g^{-1}(D_0)\big).$$
If $f : A \to B$ is one-to-one and onto $B$, then the inverse $f^{-1} : B \to A$ is defined by the rule $x = f^{-1}(y)$ iff $y = f(x)$. One then has the identities
$$(f^{-1} \circ f)(x) = x \quad\text{and}\quad (f \circ f^{-1})(y) = y, \quad x \in A,\ y \in B.$$
Thus $f^{-1} \circ f$ and $f \circ f^{-1}$ are the identity functions on $A$ and $B$, respectively.

Cardinality

Two sets $A$ and $B$ are said to have the same cardinality if there exists a one-to-one function from $A$ onto $B$. A set $A$ is finite if either $A$ is the empty set or $A$ has the same cardinality as $\{1, 2, \dots, n\}$ for some positive integer $n$. In the latter case, the members of $A$ may be labeled with the numbers $1, 2, \dots, n$, so $A$ may be written $\{a_1, a_2, \dots, a_n\}$. A set $A$ is countably infinite if it has the same cardinality as the set of natural numbers. In this case the members of $A$ may be labeled with the positive integers $1, 2, 3, \dots$. A set is countable if it is either finite or countably infinite; otherwise it is said to be uncountable. The set of all integers is countably infinite, as is the set of rational numbers. The set $\mathbb{R}$ of all real numbers is uncountable, as is any (nondegenerate) interval of real numbers.

Appendix B Linear Algebra

This appendix contains a brief review of the main ideas of linear algebra that will be needed in Part II of the text. For details and proofs the reader is referred to [9].

Vector Spaces. Bases

A vector space is a set $V$ containing at least one member $0$, called the zero vector, together with two operations $u + v$ and $au$ ($u, v \in V$, $a \in \mathbb{R}$), called vector addition and scalar multiplication, respectively, such that for all $u, v, w \in V$ and $a, b \in \mathbb{R}$ the following axioms hold:
• Associativity of addition: $(u + v) + w = u + (v + w)$.
• Commutativity of addition: $u + v = v + u$.
• Additive identity: $v + 0 = v$.
• Existence of additive inverse: $u + (-u) = 0$.
• Associativity of scalar multiplication: $(ab)u = a(bu)$.
• Scalar distributivity: $a(u + v) = au + av$.
• Vector distributivity: $(a + b)u = au + bu$.
• Scalar multiplicative identity: $1u = u$.

A subset $W$ of $V$ containing the zero vector and closed under the operations of vector addition and scalar multiplication is called a subspace of $V$. The set $W$ is then a vector space under the operations it inherits from $V$.

A linear combination of vectors $v_1, \dots, v_n \in V$ is an expression of the form
$$c_1v_1 + \cdots + c_nv_n, \quad c_j \in \mathbb{R}.$$
The set of all linear combinations of $v_1, \dots, v_n$ is called the linear span of $v_1, \dots, v_n$ or the subspace spanned by $v_1, \dots, v_n$. If this subspace is all of $V$, the vectors $v_1, \dots, v_n$ are said to span $V$.

Vectors $v_1, \dots, v_n \in V$ are linearly independent if an equation of the form
$$c_1v_1 + \cdots + c_nv_n = 0$$
can hold only if $c_1 = \cdots = c_n = 0$. A basis for $V$ is a finite set of linearly independent vectors that span $V$. It follows that each member of $V$ is uniquely expressible as a linear combination of the basis vectors. A vector space that has a basis is said to be finite dimensional; otherwise it is infinite dimensional. All bases in a finite dimensional vector space $V$ have the same number of vectors. This number is called the dimension of the vector space and is denoted by $\dim V$. A frame for a finite dimensional vector space is an ordered basis. If $V$ is finite dimensional, then every set of linearly independent vectors may be extended to a basis, and every finite set of vectors that span $V$ may be reduced to a basis.

An important example of a finite dimensional vector space is Euclidean space $\mathbb{R}^n$ (Section 1.6). The standard basis in $\mathbb{R}^n$ consists of the $n$ vectors
$$e_1 = (1, 0, \dots, 0), \quad e_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e_n = (0, 0, \dots, 0, 1).$$
An example of an infinite dimensional vector space is the set of all Riemann integrable functions on $[a,b]$ with the operations of pointwise addition and scalar multiplication.

A basis $\{w_1, \dots, w_m\}$ for a subspace $W$ of $\mathbb{R}^n$ is orthonormal if
$$w_i \cdot w_j = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j, \end{cases}$$
where $(\cdot)$ is the usual inner (= dot) product on $\mathbb{R}^n$. For example, the standard basis is orthonormal. Every subspace of $\mathbb{R}^n$ has an orthonormal basis.

Linear Transformations

Let $U$ and $V$ be vector spaces. A linear transformation from $U$ to $V$ is a function $T : U \to V$ with the properties
$$T(u + v) = Tu + Tv \quad\text{and}\quad T(cu) = cTu, \quad u, v \in U,\ c \in \mathbb{R}.$$
Here, we have used the convention for linear transformations of dropping the parentheses in the notation $T(u)$ when there is no danger of ambiguity. The collection of all linear transformations from $U$ to $V$ is denoted by $L(U, V)$. It is a vector space under the operations $T_1 + T_2$ and $cT$ defined by
$$(T_1 + T_2)(u) = T_1u + T_2u, \quad (cT)u = c(Tu), \quad u \in U,\ c \in \mathbb{R}.$$
If $T \in L(U, V)$ and $S \in L(V, W)$, then $ST := S \circ T$ is a member of $L(U, W)$. Also, the subspace $N(T) := T^{-1}(\{0\})$ of $U$ is called the nullspace of $T$. The range of $T$, which is a subspace of $V$, is denoted by $R(T)$. If $U$ and $R(T)$ are finite dimensional, then
$$\dim N(T) + \dim R(T) = \dim U.$$
If $T \in L(U, V)$ is one-to-one and onto $V$, then $T^{-1} \in L(V, U)$. In this case $T$ is said to be invertible. If $U$ and $V$ are finite dimensional with $\dim U = \dim V$, then $T$ is invertible iff $N(T) = \{0\}$ iff $R(T) = V$. In this case $T$ maps a frame $(u_1, \dots, u_n)$ in $U$ onto a frame $(v_1, \dots, v_n)$ in $V$, where $v_j = Tu_j$. We indicate this by writing
$$T(u_1, \dots, u_n) = (v_1, \dots, v_n).$$

Matrices

An $m \times n$ matrix is a rectangular array of real numbers with $m$ rows and $n$ columns. It is written variously as
$$A = [a_i^j]_{m\times n} = \begin{bmatrix} a_1^1 & a_1^2 & \cdots & a_1^n \\ a_2^1 & a_2^2 & \cdots & a_2^n \\ \vdots & \vdots & \cdots & \vdots \\ a_m^1 & a_m^2 & \cdots & a_m^n \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} = \begin{bmatrix} a^1 & a^2 & \cdots & a^n \end{bmatrix},$$
where $a_i = (a_i^1, \dots, a_i^n)$ is the $i$th row of $A$ and $a^j = (a_1^j, \dots, a_m^j)$ is the $j$th column of $A$ (written, of course, as a column). The number $a_i^j$ located in row $i$ and column $j$ of the matrix is also written $a_{ij}$ and is called the $(i,j)$th entry of $A$.

For $a \in \mathbb{R}$ and matrices $A = [a_i^j]_{m\times n}$, $B = [b_i^j]_{m\times n}$, and $C = [c_i^j]_{n\times p}$, the sum $A + B$, scalar multiple $aA$, and product $AC$ are defined, respectively, by
$$A + B = [x_i^j],\ x_i^j := a_i^j + b_i^j, \qquad aA = [y_i^j],\ y_i^j := a\,a_i^j, \qquad AC = [z_i^j],\ z_i^j := \sum_{k=1}^{n} a_i^k c_k^j.$$
The product $AC$ may also be written as
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}_{m\times n} \begin{bmatrix} c^1 & c^2 & \cdots & c^p \end{bmatrix}_{n\times p} = [a_i \cdot c^j]_{m\times p}.$$
The $m \times n$ matrix $O_{m\times n}$ with all entries equal to 0 is called a zero matrix. It has the property that $A + O_{m\times n} = A$ for all $m \times n$ matrices $A$. The collection of $m \times n$ matrices is a vector space under the operations $A + B$ and $aA$ and with zero $O_{m\times n}$.

The transpose of an $m \times n$ matrix $A$ is the $n \times m$ matrix $A^t := [x_i^j]$, where $x_i^j = a_j^i$. For example
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^t = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}.$$
The transpose operation has the following properties:
$$(A + B)^t = A^t + B^t, \quad (aA)^t = aA^t, \quad (AC)^t = C^tA^t.$$
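The definitions of product and transpose translate directly into code. The following sketch represents a matrix as a list of rows; the function names `matmul` and `transpose` and the sample matrices are illustrative assumptions.

```python
def matmul(A, C):
    """Product AC, where entry (i, j) is the dot product of row i of A and column j of C."""
    n = len(C)
    assert all(len(row) == n for row in A), "columns of A must match rows of C"
    p = len(C[0])
    return [[sum(A[i][k] * C[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

def transpose(A):
    """Transpose: rows of the result are the columns of A."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
C = [[1, 0],
     [0, 1],
     [2, -1]]            # 3 x 2

print(transpose(A))                          # [[1, 4], [2, 5], [3, 6]]
print(matmul(A, C))                          # [[7, -1], [16, -1]]
print(transpose(matmul(A, C)) ==
      matmul(transpose(C), transpose(A)))    # True
```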

For each $n$, the matrix
$$I_n := \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \cdots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
is called the $n$th order identity matrix. It has the property that $AI_n = A$ and $I_nB = B$ for all $m \times n$ matrices $A$ and all $n \times p$ matrices $B$.

An $n \times n$ matrix $A$ is said to be nonsingular if there exists a matrix, denoted by $A^{-1}$ and called the inverse of $A$, such that $AA^{-1} = A^{-1}A = I_n$. The inverse operation has the property $(AB)^{-1} = B^{-1}A^{-1}$ for all nonsingular $n \times n$ matrices $A$ and $B$.

An $m \times n$ matrix $A$ is said to be in reduced row echelon form if the following conditions hold:
• Any nonzero row has its first nonzero entry equal to 1. This entry is then called the leading entry of the row.
• If rows $i$ and $k$ are nonzero and $i < k$, then the leading entry of row $i$ is to the left of the leading entry of row $k$.
• Entries above and below a leading entry are zero.
• Any zero row is below all nonzero rows.

For example, the following matrix is in reduced row echelon form:
$$\begin{bmatrix} 0 & 1 & 0 & 3 & 0 \\ 0 & 0 & 1 & 7 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
For a given $n$, $I_n$ is the only $n \times n$ matrix in reduced row echelon form without any zero rows.

An elementary row operation on an $m \times n$ matrix $A$ is one of the following:
• Interchange a pair of rows.
• Multiply a row by a nonzero scalar.
• Add to one row a scalar multiple of another.

An elementary matrix is a matrix obtained from the identity matrix by an elementary row operation. Each elementary row operation on $A$ may be achieved by multiplying $A$ on the left by a suitable elementary matrix. For example, the multiplication
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 4 & 5 & 6 \\ 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}$$
switches the first and second rows of $A$, and the multiplication
$$\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 6 & 9 & 12 \\ 7 & 8 & 9 \end{bmatrix}$$
adds twice row one to row two.

Using elementary operations, one may transform any $m \times n$ matrix $A$ into reduced row echelon form $R$. It follows that there exists a sequence of elementary matrices $E_j$ such that
$$R = E_pE_{p-1}\cdots E_1A.$$
The row rank (column rank) of a matrix $A$ is the maximum number of linearly independent rows (columns) of $A$. The row rank of a matrix is always equal to the column rank. (This is clear for the reduced row echelon form.) The rank of a matrix is its row (= column) rank.
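The reduction to reduced row echelon form by the three elementary row operations can be sketched as follows. This is one standard way to organize the elimination (Gauss-Jordan), offered as an illustration rather than as the text’s procedure; the function name `rref` is an assumption, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rref(A):
    """Return the reduced row echelon form of A using elementary row operations."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pr is None:
            continue
        M[pivot_row], M[pr] = M[pr], M[pivot_row]           # interchange two rows
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]     # scale so the leading entry is 1
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                # add a multiple of the pivot row to clear the rest of the column
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

R = rref([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rank = sum(1 for row in R if any(x != 0 for x in row))
print(R == [[1, 0, -1], [0, 1, 2], [0, 0, 0]], rank)  # True 2
```

The number of nonzero rows of the result is the rank of the original matrix.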

The Matrix of a Linear Transformation

Let $T \in L(\mathbb{R}^n, \mathbb{R}^m)$. The matrix of $T$ is defined by
$$[T] = \begin{bmatrix} Te_1 & Te_2 & \cdots & Te_n \end{bmatrix}$$
(where $Te_j$ is written as a column). If $Te_j = (a_1^j, a_2^j, \dots, a_m^j)$ and $x = (x_1, \dots, x_n) = \sum_{j=1}^{n} x_je_j$, then, by linearity of $T$,
$$T(x_1, x_2, \dots, x_n) = \sum_{j=1}^{n} x_jTe_j = \sum_{j=1}^{n} (a_1^jx_j, a_2^jx_j, \dots, a_m^jx_j) = \left(\sum_{j=1}^{n} a_1^jx_j,\ \sum_{j=1}^{n} a_2^jx_j,\ \dots,\ \sum_{j=1}^{n} a_m^jx_j\right),$$
which may be written in column matrix form as
$$[T]x^t = \begin{bmatrix} a_1^1 & a_1^2 & \cdots & a_1^n \\ a_2^1 & a_2^2 & \cdots & a_2^n \\ \vdots & \vdots & \cdots & \vdots \\ a_m^1 & a_m^2 & \cdots & a_m^n \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

Note that $a_i^j$ may be expressed as $(Te_j)\cdot e_i$.

The operations of addition, scalar multiplication, and composition of linear transformations correspond to addition, scalar multiplication, and multiplication of matrices in the following way: If $T, T' \in L(\mathbb{R}^n, \mathbb{R}^m)$ and $S \in L(\mathbb{R}^m, \mathbb{R}^p)$, then
$$[T + T'] = [T] + [T'], \quad [tT] = t[T], \quad [ST] = [S][T].$$
In particular, if $T \in L(\mathbb{R}^n, \mathbb{R}^n)$, then $T$ is invertible iff $[T]$ is nonsingular.

An $n \times n$ matrix $A$ is orthogonal if $AA^t = I_n$, that is, if $A^t = A^{-1}$. In this case $\det A = \pm 1$. (See below.) A linear transformation $T \in L(\mathbb{R}^n, \mathbb{R}^n)$ is said to be orthogonal if $[T]$ is orthogonal.

Determinants

A permutation of the $n$-tuple $(1, \dots, n)$ is a one-to-one function $\sigma$ mapping $\{1, \dots, n\}$ onto itself. It is frequently denoted by $(i_1, \dots, i_n)$, where $i_k = \sigma(k)$. The permutation is said to be even or odd according as an even or odd number of adjacent interchanges are required to transform $(i_1, \dots, i_n)$ to $(1, \dots, n)$ (or vice versa). For example, $(3, 2, 1)$ is odd and $(4, 3, 2, 1)$ is even. The sign of a permutation $\sigma$ is defined by
$$(-1)^\sigma = \begin{cases} 1 & \text{if } \sigma \text{ is even}, \\ -1 & \text{if } \sigma \text{ is odd}. \end{cases}$$
We then have
$$(-1)^{\sigma\tau} = (-1)^\sigma(-1)^\tau \quad\text{and}\quad (-1)^{\sigma^{-1}} = (-1)^\sigma,$$
where, as is customary, $\tau\sigma$ stands for $\tau \circ \sigma$.

The determinant of an $n \times n$ matrix $A = [a_i^j]$ is defined by
$$\det A = \begin{vmatrix} a_1^1 & a_1^2 & \cdots & a_1^n \\ a_2^1 & a_2^2 & \cdots & a_2^n \\ \vdots & \vdots & \cdots & \vdots \\ a_n^1 & a_n^2 & \cdots & a_n^n \end{vmatrix} := \sum_{\sigma} (-1)^\sigma a_1^{\sigma(1)}\cdots a_n^{\sigma(n)},$$
where the sum is taken over all permutations $\sigma$ of $(1, \dots, n)$. For example,
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc,$$
since $(-1)^{(1,2)} = 1$ and $(-1)^{(2,1)} = -1$. If $T \in L(\mathbb{R}^n, \mathbb{R}^n)$ we denote the determinant of the matrix of $T$ by $\det T$ rather than by the more cumbersome $\det[T]$.

The following theorem summarizes the main properties of determinants. Parts (a)–(f) follow directly from the above definition; part (g) is proved in Chapter 13.
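The permutation definition of the determinant can be evaluated directly for small matrices. In this sketch the sign of a permutation is computed from its number of inversions, which has the same parity as the number of adjacent interchanges needed to sort it; the function names `sign` and `det` are illustrative assumptions, and indices are 0-based.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple, computed from its inversion count."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant as the sum over permutations of sign * product of A[i][perm[i]]."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

print(sign((2, 1, 0)), sign((3, 2, 1, 0)))   # -1 1  (the tuples (3,2,1) and (4,3,2,1) in 1-based form)
print(det([[1, 2], [3, 4]]))                 # -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
```

The sum has $n!$ terms, so this is practical only for small $n$.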

Theorem. Let $A = [\,a_1 \cdots a_n\,]$ be an $n \times n$ matrix and $t \in \mathbb{R}$. Then
(a) $\det[\,a_1 \cdots ta_j \cdots a_n\,] = t\det[\,a_1 \cdots a_j \cdots a_n\,]$.
(b) $\det[\,a_1 \cdots a_j + b \cdots a_n\,] = \det[\,a_1 \cdots a_j \cdots a_n\,] + \det[\,a_1 \cdots b \cdots a_n\,]$.
(c) Interchanging two rows of $A$ changes the sign of the determinant.
(d) If $A$ has a pair of duplicate rows, then $\det A = 0$.
(e) Adding a multiple of one row to another does not change the value of the determinant.
(f) $\det A^t = \det A$. Thus any “row property” has a corresponding “column property.”
(g) If $B$ is an $n \times n$ matrix, then $\det(AB) = (\det A)(\det B)$.

The following theorem is frequently useful in evaluating determinants.

Theorem. Let $A = [a_{ij}]$ be an $n \times n$ matrix, and for each $(i,j)$, let $A_{ij}$ denote the matrix obtained by removing row $i$ and column $j$ from $A$. Then for each fixed $i$ and $j$,
$$\det A = \sum_{k=1}^{n} (-1)^{i+k}a_{ik}\det A_{ik} = \sum_{k=1}^{n} (-1)^{k+j}a_{kj}\det A_{kj}.$$
The first equality is called expansion along row $i$ and the second expansion along column $j$. For example, expanding along row 1,
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$
The formula $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$ may then be used to complete the evaluation. For another example, consider
$$\begin{vmatrix} I_p & C_{p\times q} \\ O_{q\times p} & D_{q\times q} \end{vmatrix} = \det D,$$
obtained by successive expansion along the first column.

The preceding theorem may be used to prove the following result.

Theorem. Let $A = [a_{ij}]$ be an $n \times n$ matrix. Then $A^{-1}$ exists iff $\det A \ne 0$. In this case the $(i,j)$ entry of $A^{-1}$ is
$$(-1)^{i+j}\,\frac{\det A_{ji}}{\det A}.$$

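The last theorem may be used to prove Cramer’s Rule: Consider a system of $n$ equations in $n$ unknowns, written in matrix form as $Ax = b$ or explicitly as
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \cdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$
If $A$ is nonsingular, then the solution to the system is
$$x_j = \frac{1}{\det A}\begin{vmatrix} a_{11} & \cdots & a_{1,j-1} & b_1 & a_{1,j+1} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2,j-1} & b_2 & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & \cdots & \vdots & \vdots & \vdots & \cdots & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & b_n & a_{n,j+1} & \cdots & a_{nn} \end{vmatrix}.$$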
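Cramer’s Rule, combined with expansion along the first row, gives a short (if inefficient) solver for small systems. The sketch below assumes integer entries so that exact fractions can be formed; the names `det` and `cramer` and the sample systems are illustrative assumptions.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        # Minor: remove the first row and column k.
        minor = [row[:k] + row[k + 1:] for row in A[1:]]
        total += (-1) ** k * A[0][k] * det(minor)
    return total

def cramer(A, b):
    """Solve Ax = b for a nonsingular integer matrix A via Cramer's Rule."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular; Cramer's Rule does not apply")
    solution = []
    for j in range(len(A)):
        # Replace column j of A by the right-hand side b.
        A_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(A_j), d))
    return solution

x = cramer([[2, 1], [1, 3]], [3, 5])
print(x)  # [Fraction(4, 5), Fraction(7, 5)]
```

Substituting back, $2\cdot\tfrac45 + \tfrac75 = 3$ and $\tfrac45 + 3\cdot\tfrac75 = 5$, as required.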

Appendix C Solutions to Selected Problems

Section 1.2

1. (b) $(ab) + (-a)b = [a + (-a)]b = 0\cdot b = 0$, so uniqueness of the additive inverse implies $-(ab) = (-a)b$. A similar argument works for the second equality.
(d) By (b), $(-1)a = 1(-a) = -a$.
(f) Using commutativity and associativity of multiplication and the distributive law and 1.2.1(i),
$$a/b + c/d = ab^{-1}(dd^{-1}) + cd^{-1}(bb^{-1}) = ad(b^{-1}d^{-1}) + bc(b^{-1}d^{-1}) = ad(bd)^{-1} + bc(bd)^{-1} = (ad + bc)/(bd).$$

3. If $s := r/x \in \mathbb{Q}$, then, by Exercise 2, $x = r/s \in \mathbb{Q}$, a contradiction. Therefore, $r/x \in \mathbb{I}$. The remaining parts have similar proofs.

5. The left side of (a) is
$$\frac{n-1}{n}\cdot\frac{n-2}{n}\cdots\frac{1}{n} = \frac{n!}{n^n}.$$
For (b),
$$(2n)! = \big[2n(2n-2)(2n-4)\cdots 4\cdot 2\big]\big[(2n-1)(2n-3)\cdots 3\cdot 1\big] = 2^n\big[n(n-1)(n-2)\cdots 2\cdot 1\big]\big[(2n-1)(2n-3)\cdots 3\cdot 1\big].$$

8. $f(k) = k^3 - (k-1)^3 = 3k^2 - 3k + 1$.

Section 1.3

1. (c) Follows from $a/b - c/d = (ad - bc)/bd$.

4. If $0 < x < y$, then multiplying the inequality by $1/(xy)$ and using (d) of 1.3.2 shows that $1/y < 1/x$. If $x < y < 0$, then $0 < -y < -x$, hence, by the first part, $1/(-x) < 1/(-y)$ so $1/x > 1/y$.

6. (a) By Exercise 1.2.4,
\[ y^n - x^n = (y - x) \sum_{j=1}^{n} y^{n-j} x^{j-1}. \]
Each term of the sum is positive and less than $y^{n-j} y^{j-1} = y^{n-1}$. Since there are $n$ terms, part (a) follows.

8. $a = ta + (1 - t)a < tb + (1 - t)b = b$.


10. If $a > b$, then $x := a - b > 0$ and $a > b + x$, contradicting the hypothesis.

13. (b) $0 \le (x - y)^2 + (y - z)^2 + (z - x)^2 = 2(x^2 + y^2 + z^2) - 2(xy + yz + xz)$.

14. Expand $(x - a)^2 \ge 0$ and divide by $x$.

18. If $a \le x \le b$, then $x \le |b|$ and $-x \le -a \le |a|$, hence $|x| \le \max\{|a|, |b|\}$.

21. Assume without loss of generality that $S_1 = S \setminus \{a_1, \ldots, a_k\}$, so $\min S_1 = a_{k+1}$. Each of the remaining sets $S_j$ contains at least one of $a_1, \ldots, a_k$, hence $\min S_j \le a_k < a_{k+1}$, verifying the assertion.

Section 1.4

2. (a) $\sup = 12$, $\inf = -12$. (b) $\sup = 1$, $\inf = -1$.

3. (c) $\sup = 10/3$, $\inf = 3$; (d) $\sup = \frac{3 + \sqrt{5}}{2}$, $\inf = -\infty$; (e) $\sup = +\infty$, $\inf = -\infty$; (h) $\sup = 3$, $\inf = 0$; (i) $\sup = \frac{1}{2} + \frac{\sqrt{2}}{4}$, $\inf = \frac{1}{2} - \frac{\sqrt{2}}{4}$; (m) $\sup = 4/3$, $\inf = -1$.

5. Let $x, y \in A$. Then $\pm(x - y) \le \sup A - \inf A$, hence $|x - y| \le \sup A - \inf A$. Since $|x| - |y| \le |x - y|$, $|x| - |y| \le \sup A - \inf A$ so $|x| \le \sup A - \inf A + |y|$. Since $x$ was arbitrary, we have $\sup |A| \le \sup A - \inf A + |y|$, hence $\sup |A| - \sup A + \inf A \le |y|$. Since $y$ was arbitrary, it follows that $\sup |A| - \sup A + \inf A \le \inf |A|$.

6. (b) Since $x > 0$, $xa \le x \sup A$ for all $a \in A$, hence $\sup(xA) \le x \sup A$. Replacing $x$ by $1/x$ proves the inequality in the other direction. The infimum case is similar.

9. Let $a < b$ and choose a rational $r$ in $(a - \sqrt{2}, b - \sqrt{2})$. Then $r + \sqrt{2}$ is irrational and in $(a, b)$.

12. (b) If $n := \lfloor x \rfloor = -\lfloor -x \rfloor$, then $x - 1 < n \le x$ and $x \le n < x + 1$. This is possible only if $x = n$. The converse is trivial.
(c) By definition $-x - 1 < \lfloor -x \rfloor \le -x$.

14. Let $x := (b^m)^{1/n}$ and $y := (b^{1/n})^m$. By definition, $x$ is the unique positive solution of $x^n = b^m$. Since $y^n = [(b^{1/n})^m]^n = [(b^{1/n})^n]^m = b^m$, $x = y$.

17. Let $\ell \le x \le u$ for all $x \in A$. By the Archimedean principle, there exist positive integers $m$ and $n$ such that $-m < \ell \le u < n$. Set $N = \max\{m, n\}$.

20. For any $a \in \mathbb{N}$, if $r := \sqrt{n + a} + \sqrt{n} \in \mathbb{Q}$, then squaring both sides of $\sqrt{n + a} = r - \sqrt{n}$ shows that $\sqrt{n} \in \mathbb{Q}$ and hence that $n = j^2$ for some $j \in \mathbb{N}$ (1.4.11). Then $\sqrt{n + a} \in \mathbb{Q}$, hence $n + a = k^2$ for some $k \in \mathbb{N}$. Therefore, $a = k^2 - j^2 = (k - j)(k + j)$. If $a = 11$, then $k - j = 1$ and $j + k = 11$ so $n = 25$. If $a = 21$, then either $k - j = 1$ and $j + k = 21$ or $k - j = 3$ and $j + k = 7$. The first choice leads to $j = 10$ and $n = 100$ and the second to $j = 2$ and $n = 4$.
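The conclusion of Problem 20 can be checked by brute force. The sketch below uses the criterion from the argument above: the sum of the two square roots is rational exactly when both $n$ and $n + a$ are perfect squares. The function names and the search bound are illustrative choices.

```python
from math import isqrt


def is_square(m):
    """True if the nonnegative integer m is a perfect square."""
    r = isqrt(m)
    return r * r == m


def rational_root_sums(a, limit):
    """All n in 1..limit for which both n and n + a are perfect squares."""
    return [n for n in range(1, limit + 1) if is_square(n) and is_square(n + a)]


print(rational_root_sums(11, 10_000))
print(rational_root_sums(21, 10_000))
```

A finite search cannot prove that no larger $n$ works; that is what the factorization $a = (k - j)(k + j)$ supplies.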

Section 1.5

3. Let $f(n)$ denote the sum on the left side of the equation and $g(n)$ the sum on the right. Then $f(1) = 1/2 = g(1)$. Now let $n \ge 1$. Then
\[ f(n+1) - f(n) = \sum_{k=1}^{2n+2} \frac{(-1)^{k+1}}{k} - \sum_{k=1}^{2n} \frac{(-1)^{k+1}}{k} = \frac{1}{2n+1} - \frac{1}{2n+2}, \]
\[ g(n+1) - g(n) = \sum_{k=n+2}^{2n+2} \frac{1}{k} - \sum_{k=n+1}^{2n} \frac{1}{k} = \frac{1}{2n+2} + \frac{1}{2n+1} - \frac{1}{n+1}. \]
Since the right sides are equal, $f(n) = g(n) \Rightarrow f(n+1) = g(n+1)$.

5. $\frac{25}{3} n^3 - \frac{15}{2} n^2 + \frac{1}{6} n$.

6. (b)
\[ \sum_{k=1}^{500} (4k^2 - 1) = 4 \cdot \frac{500 \cdot 501 \cdot 1001}{6} - 500 = 167{,}166{,}500. \]
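Two of the computations above can be confirmed with exact arithmetic. In this sketch, `f` and `g` are read off from the difference formulas in Problem 3 as the alternating sum $\sum_{k=1}^{2n} (-1)^{k+1}/k$ and the block sum $\sum_{k=n+1}^{2n} 1/k$; that reading is an assumption, since the exercise statement is not reproduced here.

```python
from fractions import Fraction


def f(n):
    """Alternating sum 1 - 1/2 + 1/3 - ... - 1/(2n) (assumed left side of Problem 3)."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, 2 * n + 1))


def g(n):
    """Block sum 1/(n+1) + ... + 1/(2n) (assumed right side of Problem 3)."""
    return sum(Fraction(1, k) for k in range(n + 1, 2 * n + 1))


def sum_direct(n):
    """Direct evaluation of the sum of 4k^2 - 1 for k = 1..n."""
    return sum(4 * k * k - 1 for k in range(1, n + 1))


def sum_closed(n):
    """Closed form 4 * n(n+1)(2n+1)/6 - n from Problem 6(b)."""
    return 4 * (n * (n + 1) * (2 * n + 1) // 6) - n


print(all(f(n) == g(n) for n in range(1, 30)))
print(sum_direct(500), sum_closed(500))
```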

7. For $n \ge 1$, let $Q(n)$ be the statement $P(n - 1 + n_0)$. Then $Q(1) = P(n_0)$ is true. Assume $Q(n) = P(n - 1 + n_0)$ is true. Then $Q(n + 1) = P(n + n_0)$ is true. By mathematical induction, $Q(n) = P(n - 1 + n_0)$ is true for all $n \ge 1$, that is, $P(n)$ is true for every $n \ge n_0$.

8. In each case, let $f(n)$ be the left side of the inequality and $g(n)$ the right side, and let $P(n) : f(n) < g(n)$. Let $n_0$ be the base value of $n$ for which $P(n)$ is true. It is straightforward to check that $f(n_0) < g(n_0)$. Assume $P(n)$ holds for some $n \ge n_0$, so that $f(n)/g(n) < 1$. Then
(a)
\[ \frac{f(n+1)}{g(n+1)} = \frac{2n+3}{2^{n+1}} = \frac{f(n)}{2g(n)} + \frac{1}{2^n} < 1. \]
(e)
\[ \frac{f(n+1)}{g(n+1)} = \frac{2^{n+1}(n+1)!}{(n+1)^{n+1}} = \frac{2}{(1 + 1/n)^n} \, \frac{f(n)}{g(n)} < 1. \]

9. Check that $6 < \ln(6!)$. For the induction step, use $(n + 1)! = (n + 1)n!$.

13. Let $g_n$ denote the expression on the right in the assertion. One checks directly that $g_0 = g_1 = 1$. Let $n \ge 2$ and assume that $f_j = g_j$ for all $2 \le j \le n$. Then
\[ g_{n+1} - f_{n+1} = g_{n+1} - f_n - f_{n-1} = g_{n+1} - g_n - g_{n-1} = \frac{1}{\sqrt{5}} \left( a^{n+2} - a^{n+1} - a^n \right) + \frac{1}{\sqrt{5}} \left( b^{n+2} - b^{n+1} - b^n \right) = \frac{a^n}{\sqrt{5}} (a^2 - a - 1) + \frac{b^n}{\sqrt{5}} (b^2 - b - 1) = 0. \]

15. The set of all nonnegative integers of the form $m - qn$, $q \in \mathbb{Z}$, is nonempty (Archimedean principle), hence has a smallest member $r = m - qn$ (well ordering principle). If $r \ge n$, then $0 \le r - n = m - (q + 1)n < r$, contradicting the minimal property of $r$. Therefore, $m = qn + r$ has the required form. If also $m = q'n + r'$, $q' \in \mathbb{Z}$, and $r' \in \{0, \ldots, n - 1\}$, then $|q - q'|n = |r - r'| < n$, hence $q' = q$ and $r' = r$.

Section 1.6

1.
\[ x = c - \frac{d \cdot e - (b \cdot c)(b \cdot d)}{1 - (a \cdot b)(b \cdot d)} \, a, \qquad y = e - \frac{b \cdot c - (a \cdot b)(d \cdot e)}{1 - (a \cdot b)(b \cdot d)} \, d. \]

2. (c) By the triangle inequality, $\|x\|_2 = \|x - y + y\|_2 \le \|x - y\|_2 + \|y\|_2$, hence $\|x\|_2 - \|y\|_2 \le \|x - y\|_2$. Similarly, $\|y\|_2 - \|x\|_2 \le \|x - y\|_2$.

3. By 1.6.3,
\[ \|x_1 + x_2 + \cdots + x_k\|_2^2 = \sum_{i,j=1}^{k} x_i \cdot x_j = \sum_{j=1}^{k} x_j \cdot x_j. \]

7. The hypotheses imply that
\[ \sum_{j=1}^{n} x_j^2 = \sum_{j=1}^{n} y_j^2 = 1 \quad \text{and} \quad \sum_{j=1}^{n} (x_j + y_j)^2 = 4. \]
It follows that $\sum_{j=1}^{n} x_j y_j = 1$ and $\sum_{j=1}^{n} (x_j - y_j)^2 = 0$. The same does not hold for $\|\cdot\|_\infty$ (take $x = (-1, 1)$ and $y = (1, 1)$) or for $\|\cdot\|_1$ (take $x = (1, 0)$ and $y = (0, 1)$).

Section 2.1

1. (a) $a_n = [a + b + (-1)^n (b - a)]/2$.

3. (b) If $n \ge 6$, $|(2n^2 - n)/(n^2 + 3) - 2| = |n + 6|/(n^2 + 3) \le 2n/n^2 = 2/n$. Therefore, choose $N \ge \max\{6, 2/\varepsilon\}$.
(e) $|(2 + 1/n)^3 - 8| = [(2 + 1/n)^2 + 2(2 + 1/n) + 4]/n \le 19/n$, so choose any integer $N > 19/\varepsilon$.
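The estimate in 3(b) can be spot-checked numerically. The sketch below takes the threshold to be at least both 6 and $2/\varepsilon$ and tests a finite stretch of indices beyond it; the function names and the tested range are illustrative choices, and a finite check is of course no substitute for the inequality itself.

```python
from math import ceil


def term(n):
    """The sequence (2n^2 - n)/(n^2 + 3) from Problem 3(b)."""
    return (2 * n * n - n) / (n * n + 3)


def threshold(eps):
    """An integer N with N >= 6 and N >= 2/eps."""
    return max(6, ceil(2 / eps))


def within_eps(eps, span=2000):
    """Check |term(n) - 2| < eps for n = N, ..., N + span - 1."""
    start = threshold(eps)
    return all(abs(term(n) - 2) < eps for n in range(start, start + span))


print([within_eps(eps) for eps in (0.5, 0.1, 0.01, 0.001)])
```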

5. Let $r = pq^{-1}$, $p, q \in \mathbb{Z}$, $q > 0$. For all $n \ge q$, $n!r \in \mathbb{Z}$, hence $\sin(n!r\pi) = 0$.

7. Let $A = \{x_1, \ldots, x_p\}$ and $A_j = \{n : a_n = x_j\}$. One of these sets, say $A_1$, must have infinitely many members. Since $|x_1 - a| \le |x_1 - a_n| + |a_n - a|$ and $a_n \to a$, letting $n \to +\infty$ through $A_1$ shows that $x_1 = a$. We may therefore choose $\varepsilon > 0$ so that $I := (a - \varepsilon, a + \varepsilon)$ contains no $x_j$ for $j \ge 2$. Let $N \in \mathbb{N}$ such that $a_n \in I$ for all $n \ge N$. For such $n$, $a_n = a$.

8. (a) $b_n = (3a_n + 2b_n - 3a_n)/2 \to (c - 3a)/2$.

9. (a) $2$. (d) $b/(2\sqrt{a})$. (g) $-ka^{k-1}$. (k) $1/2$.

11. Use $-r \le a_n - b_n \le r$ and 2.1.4.

14. (a) Suppose first that $r > 1$. Set $h_n = r^{1/n} - 1$. By the binomial theorem, $r = (1 + h_n)^n > nh_n$, hence, by the squeeze principle, $h_n \to 0$. If $r < 1$, consider $1/r$.

17. $a_n < ra_{n-1} < r^2 a_{n-2} < \cdots < r^{n-1} a_1 \to 0$. For the example, take $a_n = 2^{1/n}$.

19. Choose $N$ such that $a_n - a < \varepsilon$ for all $n \ge N$. For such $n$, $0 \le \min\{a_1, \ldots, a_n\} - a \le a_n - a < \varepsilon$. Therefore, $\min\{a_1, \ldots, a_n\} \to a$. The converse is false: consider $a_n = 1 + (-1)^n$.

22. Suppose that $c \le f(x) - x \le d$ for all $x$, so $c + jx \le f(jx) \le d + jx$. Summing and using Exercise 1.5.4,
\[ nc + xn(n+1)/2 \le \sum_{j=1}^{n} f(jx) \le nd + xn(n+1)/2, \]
hence
\[ c/n + x(1 + 1/n)/2 \le (1/n^2) \sum_{j=1}^{n} f(jx) \le d/n + x(1 + 1/n)/2. \]
Letting $n \to +\infty$, we obtain (a). Part (b) is proved similarly.

Section 2.2

1. Since
\[ \frac{a^{1/n}}{a^{1/(n+1)}} = a^{1/n(n+1)} < 1 < b^{1/n(n+1)} = \frac{b^{1/n}}{b^{1/(n+1)}}, \]
$a^{1/n}$ is increasing and $b^{1/n}$ is decreasing. Each tends to 1 by Exercise 2.1.14.

3. By results of Section 2.1, $a_n = a(1/n + nb)^{-1} \to 0$ and $na_n = a(1/n^2 + b)^{-1} \to ab^{-1}$. The condition $a_{n+1} < a_n$ is equivalent to $(n^2 + n)b > 1$, which holds eventually. The condition $(n+1)a_{n+1} > na_n$ is equivalent to the inequality $(n + 1)^2 > n^2$.

7. Let
\[ f(x) = 1 + \frac{1}{2 + (1 + x)^{-1}} = \frac{3x + 4}{2x + 3}. \]
Then $f : [1, 2] \to [1, 2]$, $f$ is increasing and $f(a_m) = a_{m+2}$. Since $a_1, a_2 \in [1, 2]$, $a_n \in [1, 2]$ for all $n$. Since $a_1 = 1$, $a_2 = 3/2$, $a_3 = 7/5$ and $a_4 = 17/12$, the inequalities $a_{2n+2} < a_{2n}$ and $a_{2n+1} > a_{2n-1}$ hold for $n = 1$. Assume they hold for $n = k$. Then
\[ a_{2k+4} = f(a_{2k+2}) < f(a_{2k}) = a_{2k+2} \quad \text{and} \quad a_{2k+3} = f(a_{2k+1}) > f(a_{2k-1}) = a_{2k+1}, \]
hence the inequalities hold for $n = k + 1$. Since the sequences $\{a_{2n}\}$ and $\{a_{2n+1}\}$ are bounded and monotone, the monotone convergence theorem implies that $a_{2n} \to a$ and $a_{2n+1} \to b$ for some $a, b \in \mathbb{R}$. Letting $n \to +\infty$ in $f(a_{2n}) = a_{2n+2}$ gives $f(a) = a$. Therefore, $a = \sqrt{2}$. Similarly, $b = \sqrt{2}$.

9. For $x > 0$, $x^2 + r \ge 2x\sqrt{r}$, hence $(x + r/x)/2 \ge \sqrt{r}$. Therefore, $a_n \ge \sqrt{r}$. For $x \ge \sqrt{r}$, $x^2 + r \le 2x^2$, hence $(x + r/x)/2 \le x$. Therefore, $a_n \ge a_{n+1}$. By the monotone convergence theorem, $a_n \to a$ for some $a \ge \sqrt{r}$. Letting $n \to +\infty$ in $a_n = (a_{n-1} + r/a_{n-1})/2$ yields $a = (a + r/a)/2$, which has positive solution $a = \sqrt{r}$.
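Both sequences in Problems 7 and 9 are easy to iterate. In the sketch below, the recursion $a_{n+1} = 1 + 1/(1 + a_n)$ with $a_1 = 1$ is an assumption inferred from the listed values $1, 3/2, 7/5, 17/12$ and from $f(a_m) = a_{m+2}$; the exercise statement itself is not reproduced here. The function names are illustrative.

```python
from fractions import Fraction
from math import sqrt


def continued_fraction_terms(count):
    """First `count` terms of a_1 = 1, a_{n+1} = 1 + 1/(1 + a_n) (assumed recursion)."""
    terms = [Fraction(1)]
    while len(terms) < count:
        terms.append(1 + 1 / (1 + terms[-1]))
    return terms


def f(x):
    """The map f(x) = (3x + 4)/(2x + 3) from Problem 7."""
    return (3 * x + 4) / (2 * x + 3)


def babylonian(r, a1, steps):
    """Iterate a_n = (a_{n-1} + r/a_{n-1})/2 from Problem 9."""
    a = a1
    for _ in range(steps):
        a = (a + r / a) / 2
    return a


terms = continued_fraction_terms(12)
print(terms[:4])
print(float(terms[-1]), babylonian(2.0, 1.0, 6), sqrt(2))
```

The even-indexed terms decrease and the odd-indexed terms increase toward $\sqrt{2}$, as the induction in Problem 7 predicts.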

Section 2.3 1. (a) 0, ±3/8. 2/k 3. (d) an = 1 +

(c) ±4, ±6, ±12, ±14. 2n+k −k 1 1 1+ → e. 2n + k 2n + k

5. If $\{a_n\}$ lies in the set $\{x_1, \ldots, x_p\}$, then one of the sets $\{n : a_n = x_j\}$ must have infinitely many members and a subsequence may be constructed from these.

8. Given $\varepsilon > 0$, choose $N$ so that $\sum_{n=N}^{\infty} |a_{n+k} - a_n| < \varepsilon$. For $m > n \ge N$,
\[ |a_{mk} - a_{nk}| \le |a_{mk} - a_{(m-1)k}| + \cdots + |a_{(n+1)k} - a_{nk}| < \varepsilon. \]
Therefore, $\{a_{nk}\}_{n=1}^{\infty}$ is Cauchy.

10. Clearly $a_n \to 0$ implies $b_n \to 0$. For the converse, suppose $a_n \not\to 0$. Choose $\varepsilon > 0$ and a subsequence such that $a_{n_k} \ge \varepsilon > 0$ for all $k$. Then
\[ \frac{1}{b_{n_k}} = \frac{1}{a_{n_k}^{q}} + \frac{1}{a_{n_k}^{q-p}} \le \frac{1}{\varepsilon^{q}} + \frac{1}{\varepsilon^{q-p}}, \]
hence $b_n \not\to 0$. If $0 < q < p$, then the sufficiency is false: Take $a_n = n$, $q = 1/2$ and $p = 1$. Then $b_n = \sqrt{n}/(n + 1) \to 0$ but $a_n \to +\infty$.

Section 2.4

1. (a) $\liminf = -5/3$, $\limsup = 5/3$. (c) $\liminf = -14$, $\limsup = 14$. (h) $\liminf = -\infty$, $\limsup = +\infty$.

3. Follows from Exercise 1.4.6.

5. Follows from $\{a_{n_k} : k \ge n\} \subseteq \{a_k : k \ge n\}$.

7. $0 < b - \varepsilon < b_n < b + \varepsilon \Rightarrow a_n(b - \varepsilon) < a_n b_n < a_n(b + \varepsilon) \Rightarrow (b - \varepsilon) \limsup_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n b_n \le (b + \varepsilon) \limsup_{n\to\infty} a_n$. Now let $\varepsilon \to 0$.

10. Choose $r$ so that $\liminf_{n\to\infty} b_n > r > 0$. Then, given $\varepsilon > 0$, there exists $N$ such that $a_n > a/2$ and $b_n > r$, and $c_n := (b_n - 3a_n)(b_n + 2a_n) = b_n^2 - a_n b_n - 6a_n^2 < \varepsilon$ for every $n > N$. Then $b_n - 3a_n = c_n/(b_n + 2a_n) < \varepsilon/(r + a)$, so $\limsup_{n\to\infty} b_n \le 3a$.

12. Suppose that $\liminf_n a_n^{1/n} < \liminf_n \dfrac{a_{n+1}}{a_n}$. Choose $r$ strictly between these numbers and then choose $N$ such that $a_n/a_{n-1} > r$ for all $n > N$. For such $n$,
\[ a_n > a_{n-1} r > a_{n-2} r^2 > \cdots > a_N r^{n-N}, \]
hence $\liminf_n a_n^{1/n} \ge \liminf_n \left( a_N^{1/n} r^{1-N/n} \right) = r$, a contradiction. To evaluate $\lim_n n/(n!)^{1/n}$ take $a_n = n^n/n!$ and calculate
\[ \frac{a_{n+1}}{a_n} = \left( \frac{n+1}{n} \right)^{n} \to e. \]
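The limit $n/(n!)^{1/n} \to e$ converges slowly, which a quick computation makes visible. This sketch evaluates the quotient through $\ln n! = $ `lgamma(n + 1)` to avoid overflow; the function name is an illustrative choice.

```python
from math import e, exp, lgamma


def root_quotient(n):
    """n / (n!)^(1/n), computed via log(n!) = lgamma(n + 1)."""
    return n / exp(lgamma(n + 1) / n)


for n in (10, 100, 10_000, 1_000_000):
    print(n, root_quotient(n), e - root_quotient(n))
```

The printed gaps shrink roughly like $\ln n / n$, so the ratio $a_{n+1}/a_n = (1 + 1/n)^n$ is a much more convenient route to the limit than direct evaluation.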

Section 3.1

1. Let $x_1 < \cdots < x_n$ denote the points of $E$ and let
\[ \delta = \frac{1}{2} \min\{x_j - x_i : 1 \le i < j \le n\}. \]
Then for each $j$, $(x_j - \delta, x_j + \delta) \cap E = \{x_j\}$.

4. Let $\varepsilon, M > 0$.
(b) The limit is 1. If $|x - 1| < 1$, then $x > 0$, hence
\[ \left| \frac{x + 3}{3x + 1} - 1 \right| = \frac{2|x - 1|}{3x + 1} < 2|x - 1|. \]
Therefore, choose $\delta = \min\{1, \varepsilon/2\}$.
(d) The limit is $+\infty$: $x < -\sqrt{M} - 1 \Rightarrow -x > \sqrt{M}$ and $-x - 1 > \sqrt{M} \Rightarrow x^2 + x = (-x)(-x - 1) > M$.

6. (a) $2/3$. (d) $+\infty$. (g) $9/25$.

7. (b) $-1/2$.
(f)
\[ \frac{\sqrt{b + x} - \sqrt{b - x}}{\sqrt{c + x} - \sqrt{c - x}} = \frac{\sqrt{c + x} + \sqrt{c - x}}{\sqrt{b + x} + \sqrt{b - x}} \to \sqrt{\frac{c}{b}}. \]
(h) $(a\sqrt{d})/(c\sqrt{b})$.

9. The limit exists at $a$ iff $\lim_{x \to a,\, x \in \mathbb{Q}} f(x) = \lim_{x \to a,\, x \in \mathbb{I}} f(x)$. By continuity of polynomials, this is equivalent to $4a^2 + 2a - 11 = 3a^2 + a - 5$. Thus $a = -3, 2$.

11. (a) $a$. (e) $(c\sqrt{a})/(2\sqrt{b})$.

13. Proof for the case $f$ increasing and $L := \lim_n f(a_n) \in \mathbb{R}$: Given $\varepsilon > 0$, choose $N$ such that $L - \varepsilon < f(a_n) < L + \varepsilon$ for all $n \ge N$. Let $x > a_N$ and let $n$ be the least integer $> N$ such that $x < a_n$. Then $a_{n-1} \le x < a_n$ so $L - \varepsilon < f(a_{n-1}) \le f(x) \le f(a_n) < L + \varepsilon$.

Section 3.2

1. (a) $-1, 1$. (c) $-2/3, 2/3$. (e) $-1, 1$. (i) $-3, 1$.

3. lim sup case: Assume $a \in \mathbb{R}$. Set $L = \limsup_{x \to a,\, x \in E} f(x)$ and $L_j = \limsup_{x \to a,\, x \in E_j} f(x)$, $j = 1, 2$. By 3.2.1, there exists a sequence $a_n \in E_1$ such that $f(a_n) \to L_1$. Since $a_n \in E$, by the same theorem, $L_1 \le L$. Similarly, $L_2 \le L$. Now let $b_n \in E$ such that $f(b_n) \to L$. Then one of the sets, say $E_1$, contains infinitely many terms of the sequence. Therefore, $L \le L_1$, hence $L = \max\{L_1, L_2\}$.

4. Let $g(x) = 1/f(x)$. Then $g(r) = 1/f(r)$.

Section 3.3

1. By continuity, $f(2) = \lim_{x \to 2^-} (mx + 3) = \lim_{x \to 2^+} (3x^2 + 7)$, that is, $2m + 3 = 19$. Therefore, $m = 8$.

4. This follows from $\lim_{x \to a,\, x \in \mathbb{Q}} d(x)g(x) = g(a)$ and $\lim_{x \to a,\, x \in \mathbb{I}} d(x)g(x) = 0$.

8. The identity implies that $f(nx) = nf(x)$, $n \in \mathbb{N}$. Also, $f(0) + f(0) = f(0)$ so $f(0) = 0$. Since $f(-x) + f(x) = f(0)$, we see that $f(-x) = -f(x)$, hence $f(nx) = nf(x)$ for all $n \in \mathbb{Z}$. Let $m, n \in \mathbb{N}$. Then $f(x) = f(nx/n) = nf(x/n)$. Replacing $x$ by $mx$ gives $mf(x) = f(mx) = nf(mx/n)$. Thus, $f(tx) = tf(x)$ for all $x \in \mathbb{R}$ and $t \in \mathbb{Q}$. Since $f$ is continuous at zero and $f(x - y) = f(x) + f(-y) = f(x) - f(y)$, $f$ is continuous on $\mathbb{R}$. Thus, $f(tx) = tf(x)$ for all $x, t \in \mathbb{R}$. Setting $x = 1$ gives the desired result.

9. (c) Let $a \in \mathbb{R}$ and $\varepsilon > 0$. Choose $N$ so that $\sum_{n > N} 2^{-n} < \varepsilon$ and then choose $\delta > 0$ so that $(a, a + \delta)$ contains none of the numbers $c_1, c_2, \ldots, c_N$. If $a < x < a + \delta$, then
\[ 0 \le f(x) - f(a) = \sum_{n:\, a < c_n \le x} 2^{-n} \le \sum_{n > N} 2^{-n} < \varepsilon. \]
Therefore, $f$ is right continuous at $a$. If $a \notin \{c_n\}$, then we may choose $\delta$ so that $(a - \delta, a]$ contains none of the numbers $c_1, c_2, \ldots, c_N$. If $a - \delta < x < a$, then, as before,
\[ 0 \le f(a) - f(x) = \sum_{n:\, x < c_n \le a} 2^{-n} < \varepsilon. \]

[…] $\varepsilon > 0$. We show that the set $D_\varepsilon := \{x \in [0, 1] : |f(x) - g(x)| \ge \varepsilon\}$ is finite. The desired conclusion will follow on observing that the set of discontinuities of $f$ is precisely $\bigcup_{n=1}^{\infty} D_{1/n}$. Suppose $D_\varepsilon$ is infinite. Then there exists a sequence of distinct terms such that $|f(x_n) - g(x_n)| \ge \varepsilon$ for all $n$. By the Bolzano–Weierstrass theorem, $\{x_n\}$ has a convergent subsequence, say $x_{n_k} \to x$. Because the terms of $\{x_n\}$ are distinct, $x_{n_k} \ne x$ for all large $k$, hence $f(x_{n_k}) \to g(x)$. Also, by continuity, $g(x_{n_k}) \to g(x)$. But this contradicts the inequality $|f(x_{n_k}) - g(x_{n_k})| \ge \varepsilon$.

Section 3.4

2. Let $x_0 > 0$ and choose $r > x_0$ such that $f(x) < f(x_0)$ for all $x$ with $|x| > r$. Then the maximum $M$ of $f$ on $[-r, r]$ is $\ge f(x_0)$, hence $M$ must be the maximum of $f$ on $\mathbb{R}$.

4. (e) Suppose $f$ is not upper semicontinuous at $x_0$. Choose $r$ such that $f(x_0) < r < \limsup_{x \to x_0} f(x)$ and then $i$ such that $f_i(x_0) < r$. For each $\delta > 0$, $r < \sup$ […] $> 0$, choose $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ for all $x, y \in E$ with $|x - y| < \delta$. Then choose $N$ such that $|a_n - a_m| < \delta$ for all $m, n \ge N$. For such $m, n$, $|f(a_n) - f(a_m)| < \varepsilon$.

9. The inequality $|x| - |y| \le |x - y|$ shows that $|x|$ is uniformly continuous. The given functions are therefore compositions of uniformly continuous functions.

13. If $0 < p \le 1$, then
\[ \lim_{x \to +\infty} \frac{\sin x}{x^p} = 0 \quad \text{and} \quad \lim_{x \to 0^+} \frac{\sin x}{x^p} = \lim_{x \to 0^+} x^{1-p} \, \frac{\sin x}{x} = 0 \text{ or } 1. \]
Therefore, $(\sin x)/x^p$ has a uniformly continuous extension to $[0, +\infty)$. If $p > 1$, $(\sin x)/x^p$ is continuous on $(0, +\infty)$ but has no continuous extension to $[0, +\infty)$.

15. Since $f$ may be extended continuously to $[a, b]$, it is bounded. The examples $f(x) = x$ on $(0, +\infty)$ and $f(x) = 1/x$ on $(0, 1)$ show that the assumptions cannot be relaxed.

18. $f(x)$ has unequal one-sided limits at 0 while those of $g(x)$ are equal. Hence 0 is a removable discontinuity of $g$ but not of $f$.

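Section 4.1

1. If $f$ denotes the given function, $\dfrac{f(x + h) - f(x)}{h} =$
(b) $\dfrac{2}{\sqrt{2x + 2h + 1} + \sqrt{2x + 1}} \to \dfrac{1}{\sqrt{2x + 1}}$.
(d) $\dfrac{-3}{\sqrt{3x + 3h + 2}\,\sqrt{3x + 2}\,\left( \sqrt{3x + 2} + \sqrt{3x + 3h + 2} \right)} \to \dfrac{-3}{2(3x + 2)^{3/2}}$.

3. (a) $\dfrac{3(5x + 7)^{1/3}}{5(3x + 2)^{4/5}} + \dfrac{5(3x + 2)^{1/5}}{3(5x + 7)^{2/3}}$. (c) $\dfrac{4x}{(x^2 + 1)^2} \cos \dfrac{x^2 - 1}{x^2 + 1}$.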

4. (b) −

1 y − y cos xy 2 2x


7. $f$ is continuous at 1 iff $2a + b = 1$. For such $a$ and $b$, $f$ is differentiable iff $a + b = 3$. Therefore, $a = -2$ and $b = 5$.

11. (b) The difference quotient is
\[ \frac{f(a + h^2) - f(a)}{h^2} \, h + \frac{f(a - h) - f(a)}{-h} \to f'(a). \]

14. For all $h \ne 0$, $[f(x + h) - f(x)]/h \ge 0$, hence $f'(x) \ge 0$.

16. Clear for $n = 2$. Suppose the assertion holds for $n \ge 2$. Then
\[ D^{n+1}(fg) = D \sum_{k=0}^{n} \binom{n}{k} (D^k f)(D^{n-k} g) = \sum_{k=0}^{n} \binom{n}{k} \left[ (D^{k+1} f)(D^{n-k} g) + (D^k f)(D^{n+1-k} g) \right] \]
\[ = \sum_{k=1}^{n} \left[ \binom{n}{k-1} + \binom{n}{k} \right] (D^k f)(D^{n+1-k} g) + gD^{n+1} f + fD^{n+1} g = \sum_{k=0}^{n+1} \binom{n+1}{k} (D^k f)(D^{n+1-k} g). \]

18. (c) $(f' \circ g)g'' + (f'' \circ g)(g')^2$.

19. (a) $\dfrac{(-1)^n n!}{x^{n+1}}$.

21. If $x \ne 0$,
\[ f'(x) = x^{m+n-1} \left[ n \cos x^n + m \, \frac{\sin x^n}{x^n} \right]. \]
Also,
\[ f'(0) = \lim_{x \to 0} x^{n+m-1} \, \frac{\sin x^n}{x^n} \quad \begin{cases} \text{does not exist} & \text{if } n + m < 1, \\ = 0 & \text{if } n + m > 1, \\ = 1 & \text{if } n + m = 1. \end{cases} \]
Therefore, $f'$ is continuous at 0 if $n + m \ge 1$.

23. For the second order determinant use the expansion $\begin{vmatrix} f_1 & f_2 \\ g_1 & g_2 \end{vmatrix} = f_1 g_2 - f_2 g_1$. For the third order determinant, expand along a row or column and use the formula for the second order case. The same idea may be applied to $n$th order determinants.


Section 4.2

1. Set $f(x) = \cos x - \sqrt{x} + 1$. Since $f(0) > 0 > f(\pi/2)$, $f$ has at least one zero in $(0, \pi/2)$, by the intermediate value theorem. Since $f' < 0$ on $(0, \pi/2)$, $f$ is strictly decreasing so the zero is unique.

3. Since $f'(x) = 4x(x - 1)(x - 2) < 0$ on $(1, 2)$, $f$ has exactly one zero in the interval $(1, 2)$ iff $f(1)\ (= 1 + c)$ and $f(2)\ (= c)$ have opposite signs, that is, iff $c < 0 < c + 1$, or $-1 < c < 0$.

7. The assertion is clear if $n = 0$. Suppose it holds for all polynomials with degree $\le n$. Let $P(x)$ have degree $n + 1$ and suppose that the equation $\sin(ax) = P(x)$ has more than $n + 2$ solutions. Then $f(x) := \sin(ax) - P(x)$ has more than $n + 2$ zeros, hence, by Rolle’s theorem, $f''(x) = -a^2 \sin(ax) - P''(x)$ has more than $n$ zeros. But this means that $\sin(ax) = -P''(x)/a^2$ has more than $n$ solutions, contradicting the induction hypothesis.

9. By the Cauchy mean value theorem, $|f(x) - f(y)|\,|g'(c)| = |g(x) - g(y)|\,|f'(c)| \le |g(x) - g(y)|\,|g'(c)|$.

11. The derivative of $x^{-1} \sin x$ is negative since $\tan x > x$, $0 < x < \pi/2$.

17. Let $c_1 < \cdots < c_m$ be the distinct zeros of $P'$. By the intermediate value theorem, $P'$ has a constant sign on $(c_j, c_{j+1})$. Therefore, $P(x)$ is strictly monotone on these intervals.

19. Let $|f'| \le c < r$. Then $g'(x) = r + f'(x) \ge r - c > 0$, so $g$ is strictly increasing, hence one-to-one. By the mean value theorem, $|f(x) - f(0)| \le c|x|$ or $f(0) - c|x| \le f(x) \le f(0) + c|x|$. Therefore, $f(0) + rx - c|x| \le g(x) \le f(0) + rx + c|x|$. Thus $x > 0 \Rightarrow g(x) \ge f(0) + (r - c)x \Rightarrow \lim_{x \to +\infty} g(x) = +\infty$, and $x < 0 \Rightarrow g(x) \le f(0) + (r - c)x \Rightarrow \lim_{x \to -\infty} g(x) = -\infty$. By the intermediate value theorem, $g(\mathbb{R}) = \mathbb{R}$.

22. $g'(0) = 0$, hence $f'(0) > 0$. Since $f(\pm 1/n\pi) = \pm 1/n\pi$ for all $n \in \mathbb{N}$, $f$ is not monotone on any neighborhood of 0.

25. Let $a, b \in I$ with $a < b$ and suppose that $f'(a) < y_0 < f'(b)$, so $g'(a) < 0 < g'(b)$. Then $[g(x) - g(a)]/(x - a) < 0$ for $x \in (a, a + \delta)$, so the minimum of $g$ cannot occur at $a$. Similarly, the minimum of $g$ cannot occur at $b$. Thus, by the local extremum theorem, $g'(x_0) = 0$, that is, $f'(x_0) = y_0$, for some $x_0 \in (a, b)$.
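The existence-and-uniqueness argument of Problem 1 can be accompanied by a numerical search. The sketch below assumes the function is $f(x) = \cos x - \sqrt{x} + 1$ (the reading under which $f(0) > 0 > f(\pi/2)$) and locates its zero by bisection; the helper `bisect_root` and the tolerance are illustrative choices.

```python
from math import cos, pi, sqrt


def f(x):
    """Assumed form of the function in Problem 1: cos x - sqrt(x) + 1."""
    return cos(x) - sqrt(x) + 1


def bisect_root(func, lo, hi, tol=1e-12):
    """Bisection on [lo, hi], assuming func(lo) > 0 > func(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) > 0:
            lo = mid  # the sign change lies to the right of mid
        else:
            hi = mid  # the sign change lies at or to the left of mid
    return (lo + hi) / 2


root = bisect_root(f, 0.0, pi / 2)
print(root, f(root))
```

Bisection relies only on the sign change guaranteed by the intermediate value theorem; monotonicity is what makes the zero it finds the only one.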


28. Set $q(x, y) = [f(x) - f(y)]/(x - y)$, $x \ne y$. If $f$ is uniformly differentiable on $I$, then $|f'(x) - f'(y)| \le |f'(x) - q(x, y)| + |f'(y) - q(x, y)|$ shows that $f'$ is uniformly continuous. Conversely, assume that $f'$ is uniformly continuous. By the mean value theorem, for each $x < y$ there exists a $z \in (x, y)$ such that $|q(x, y) - f'(y)| = |f'(z) - f'(y)|$. It follows that $f$ is uniformly differentiable.

31. If such a function exists, then
\[ \lim_{x \to y} \frac{f(x) - f(y)}{x - y} = \varphi(y, y) \]
so $f'(y) = \varphi(y, y)$, which is continuous in $y$. Conversely, assume $f$ is continuously differentiable on an open interval $I$ and define
\[ \varphi(x, y) = \begin{cases} \dfrac{f(x) - f(y)}{x - y} & \text{if } x \ne y, \\ f'(x) & \text{if } x = y. \end{cases} \]
Clearly $\varphi$ is continuous on $\{(x, y) \in I \times I : x \ne y\}$. By the mean value theorem, $\varphi(x, y) = f'(\xi_{xy})$ for $x \ne y$, where $\xi_{xy}$ is between $x$ and $y$. The continuity of $\varphi$ on $I \times I$ now follows from the continuity of $f'$.

Section 4.4

1. (b) $(2 - 3x)/(2x - 3)$, $x \ne 3/2$. (f) $\cos^{-1} \dfrac{3x - 2}{1 - x}$, $1/2 < x < 3/4$.

5. (b) Fix $y > 0$ and let $f(x) = \ln(xy) - \ln x - \ln y$. Since $f'(x) = 0$, $f(x) = f(1) = 0$ for all $x > 0$.

7. (b) $a^{x+y} = \exp((x + y) \ln a) = \exp(x \ln a) \exp(y \ln a) = a^x a^y$.

9. $x^a = \exp(a \ln x)$, hence $(x^a)' = \exp(a \ln x)(a/x) = ax^{a-1}$.

13. The derivative of the left side of (c) is
\[ \frac{2}{x^2 + 1} - \frac{4x}{(x^2 + 1)^2 \sqrt{1 - y^2}}, \qquad y := \frac{x^2 - 1}{x^2 + 1}, \]
which reduces to 0. Therefore, the left side is constant.


14. Set $c = f'(0)$. Since
\[ \frac{f(x + h) - f(x)}{h} = f(x) \, \frac{f(h) - 1}{h}, \]
$f'(x)$ exists and equals $cf(x)$. Therefore, $e^{-cx} f(x)$ has zero derivative, hence $e^{-cx} f(x) = f(0)$.

18. $(f^{-1})''(x) = -\dfrac{f''\left( f^{-1}(x) \right)}{\left[ f'\left( f^{-1}(x) \right) \right]^3}$.

Section 4.5

1. (a) $p - q$. (d) $-1$. (g) $-2$. (j) $0$. (m) $-\infty$. (p) $0$. (s) $1$ if $p > 1$, $+\infty$ if $p \le 1$. (v) $1$.

2. (c) $f(0) = \lim_{x \to 0^+} f(x) = 5/3$.

3. (a) $\ln a_n = n^{-1} \ln \sin(1/n)$ is of the form $\frac{-\infty}{+\infty}$, hence has the same limit as
\[ \frac{\cos(1/n)}{\sin(1/n)} \cdot \frac{-1}{n^2} = -\frac{1}{n} \, \frac{\cos(1/n)}{n \sin(1/n)} \to 0. \]
Therefore, $a_n \to 1$.

6. By logarithmic differentiation,
\[ f'(x) = \left( 1 + \frac{1}{x} - \frac{\ln x}{x} \right) x^{1/x}. \]
By l’Hospital’s rule, $x^{1/x} \to 1$, hence $\lim_{x \to +\infty} f'(x) = 1$. Applying the mean value theorem to $f$ on each of the intervals $[n, n + 1]$ shows that $f(n + 1) - f(n) \to 1$.

9. Let $L := \lim_{x \to +\infty} f'(x)/g'(x)$. By l’Hospital’s rule, $\lim_{x \to +\infty} g(x)/f(x)$ exists and equals $1/L$. Another application of l’Hospital’s rule yields
\[ \lim_{x \to +\infty} \frac{\ln f(x)}{\ln g(x)} = \lim_{x \to +\infty} \frac{f'(x)}{g'(x)} \, \frac{g(x)}{f(x)} = 1. \]

10. (a) By l’Hospital’s rule, the quotient has the same limit as
\[ \frac{\alpha\beta}{2} \, \frac{f'(a + \alpha h) - f'(a + \beta h)}{h} = \frac{\alpha\beta}{2} \left[ \alpha \, \frac{f'(a + \alpha h) - f'(a)}{\alpha h} - \beta \, \frac{f'(a + \beta h) - f'(a)}{\beta h} \right], \]
which is $\alpha\beta(\alpha - \beta) f''(a)/2$.


12. Apply l’Hospital’s rule $n$ times to $f(x)/x^{-n}$ to obtain
\[ \lim_{x \to 0^+} x^n f(x) = \lim_{x \to 0^+} \frac{(-1)^n f^{(n)}(x)}{n(n + 1) \cdots (2n - 1) x^{-2n}} = \lim_{x \to 0^+} a x^{2n} f^{(n)}(x), \]
where $a = \dfrac{(-1)^n (n - 1)!}{(2n - 1)!}$. Therefore, $\lim_{x \to 0^+} x^n f(x)$ exists and equals $aL$.

16. By l’Hospital’s rule,
\[ \lim_{x \to +\infty} \frac{f(g(x))}{g(x)} = \lim_{x \to +\infty} \frac{f'(g(x)) g'(x)}{g'(x)} = L. \]
For examples, take $f(x) = \sqrt{x}$, $\ln x$, or $x + 1/x$, and $g(x) = x^n$, $e^x$, or $\ln x$.

18. By l’Hospital’s rule,
\[ \lim_{x \to +\infty} f(x) = \lim_{x \to +\infty} \frac{x f(x)}{x} = \lim_{x \to +\infty} \left[ x f'(x) + f(x) \right] = \lim_{x \to +\infty} x f'(x) + \lim_{x \to +\infty} f(x). \]
For the second part, consider $f(x) = \ln x$.

Section 4.6

2. Apply Taylor’s theorem to the function between the inequalities to produce the number $c \in (0, x)$ in the remainder term:
(b) $f^{(k)}(x) = (-1)^k e^{-x}$, hence
\[ e^{-x} = \sum_{k=0}^{2n-1} \frac{(-1)^k}{k!} x^k + \frac{e^{-c}}{(2n)!} x^{2n}, \quad \text{where } e^{-c} \in (0, 1). \]

3. Let
\[ I_n := \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t) \, dt = -\frac{f^{(n)}(a)}{n!} (x - a)^n + I_{n-1}. \]
Iterating,
\[ I_n = -\sum_{k=1}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + I_0 = -T_n(x, a) + f(x). \]

5. By Taylor’s theorem,
\[ b_k = \frac{P^{(k)}(b)}{k!} = \frac{1}{k!} \sum_{j=0}^{n-k} (j + 1)(j + 2) \cdots (j + k)(b - a)^j a_{k+j}. \]
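The remainder formula in Problem 2(b) says that, for $x > 0$, $e^{-x}$ lies strictly between the Taylor polynomial of degree $2n - 1$ and that polynomial plus $x^{2n}/(2n)!$. The following sketch checks this two-sided bound at a few sample points; the function names and the sample values are illustrative choices.

```python
from math import exp, factorial


def taylor_exp_neg(x, terms):
    """Partial sum of the series for e^(-x) with the given number of terms."""
    return sum((-x) ** k / factorial(k) for k in range(terms))


def bounds_hold(x, n):
    """Check T_{2n-1}(x) < e^(-x) < T_{2n-1}(x) + x^(2n)/(2n)! for x > 0."""
    lower = taylor_exp_neg(x, 2 * n)
    upper = lower + x ** (2 * n) / factorial(2 * n)
    return lower < exp(-x) < upper


print([bounds_hold(x, n) for x in (0.5, 1.0, 2.0) for n in (1, 2, 3)])
```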

Section 4.7

1. (a) $-1.52137970$. (d) $-1.42360584$. (g) $1.220924381$.

2. (a) $0.87672621$. (c) $1.55714559$.

4. $7.937253933$.
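Answers of this kind can be reproduced with a few Newton iterations. The value listed for Problem 4 agrees with $\sqrt{63}$ to the digits shown, so the sketch below applies Newton’s method to $x^2 - 63 = 0$; treating Problem 4 as this equation is an assumption, since the exercise statement is not reproduced here, and the helper `newton` is illustrative.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is below tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


# Assumed equation: x^2 - 63 = 0, starting from x0 = 8.
root = newton(lambda x: x * x - 63, lambda x: 2 * x, 8.0)
print(f"{root:.9f}")
```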


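Section 5.1

3. Since $M_j(-f) = -m_j(f)$, $\overline{S}(-f, \mathcal{P}) = -\underline{S}(f, \mathcal{P})$, hence
\[ \overline{\int_a^b} (-f) = \inf_{\mathcal{P}} \overline{S}(-f, \mathcal{P}) = \inf_{\mathcal{P}} \left( -\underline{S}(f, \mathcal{P}) \right) = -\sup_{\mathcal{P}} \underline{S}(f, \mathcal{P}) = -\underline{\int_a^b} f. \]
Replacing $f$ by $-f$ shows that $\underline{\int_a^b} (-f) = -\overline{\int_a^b} f$.

5. Since $g$ may be obtained from $f$ by changing one point at a time, we may assume that $f = g$ except at a single point $c \in (a, b)$. Let $\varepsilon > 0$ and let $M$ be a bound for both $|f|$ and $|g|$. The point $c$ is in at most two intervals of any partition $\mathcal{P}$, and each of these has width $\le \|\mathcal{P}\|$. Since $f = g$ on the remaining intervals, $|\overline{S}(f, \mathcal{P}) - \overline{S}(g, \mathcal{P})| \le 2M\|\mathcal{P}\|$. It follows from 5.1.15 that $\overline{\int_a^b} f = \overline{\int_a^b} g$. Similarly $\underline{\int_a^b} f = \underline{\int_a^b} g$.

6. (c) Let $g = \sin f$, $\varepsilon > 0$, and let $\mathcal{P}$ be any partition of $[a, b]$ such that $\overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) < \varepsilon$. For fixed $j$, choose sequences $a_n, b_n \in [x_{j-1}, x_j]$ such that $g(a_n) \to M_j(g)$ and $g(b_n) \to m_j(g)$. Then $g(a_n) - g(b_n) \le |f(a_n) - f(b_n)| \le M_j(f) - m_j(f)$, hence $M_j(g) - m_j(g) \le M_j(f) - m_j(f)$. Therefore, $\overline{S}(g, \mathcal{P}) - \underline{S}(g, \mathcal{P}) \le \overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) < \varepsilon$.

7. (a) Let $L = \lim_{\mathcal{P}} F(\mathcal{P})$ and $M = \lim_{\mathcal{P}} G(\mathcal{P})$. Given $\varepsilon > 0$, choose $\mathcal{P}_\varepsilon'$ and $\mathcal{P}_\varepsilon''$ such that $|F(\mathcal{P}) - L| < \eta$ for all partitions $\mathcal{P}$ refining $\mathcal{P}_\varepsilon'$ and $|G(\mathcal{P}) - M| < \eta$ for all partitions $\mathcal{P}$ refining $\mathcal{P}_\varepsilon''$, where $\eta = \varepsilon/(2|\alpha| + 2|\beta| + 2)$. Let $\mathcal{P}_\varepsilon$ denote the common refinement of $\mathcal{P}_\varepsilon'$ and $\mathcal{P}_\varepsilon''$. Then both inequalities hold for any partition $\mathcal{P}$ refining $\mathcal{P}_\varepsilon$, hence
\[ |(\alpha F(\mathcal{P}) + \beta G(\mathcal{P})) - (\alpha L + \beta M)| \le |\alpha|\,|F(\mathcal{P}) - L| + |\beta|\,|G(\mathcal{P}) - M| < \varepsilon. \]
(b) Given $\varepsilon > 0$, choose $\mathcal{P}_\varepsilon$ such that $\underline{\int_a^b} f - \varepsilon < \underline{S}(f, \mathcal{P}_\varepsilon) \le \underline{\int_a^b} f$. The inequality still holds if $\mathcal{P}_\varepsilon$ is replaced by a refinement. Therefore, $\underline{\int_a^b} f = \lim_{\mathcal{P}} \underline{S}(f, \mathcal{P})$.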

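Section 5.2

1. Assume $c_n \to c \in (a, b)$. Choose $\delta > 0$ so that $a < c - \delta < c + \delta < b$ and choose $N$ so that $c_n \in (c - \delta, c + \delta)$ for all $n > N$. Since $f$ has only finitely many discontinuities on $[a, c - \delta] \cup [c + \delta, b]$, $f$ is integrable on these intervals and the integrals are zero. Thus, given $\varepsilon > 0$, there exist partitions $\mathcal{P}_1$ of $[a, c - \delta]$ and $\mathcal{P}_2$ of $[c + \delta, b]$ such that
\[ \overline{S}(f, \mathcal{P}_1) - \underline{S}(f, \mathcal{P}_1) < \varepsilon/3 \quad \text{and} \quad \overline{S}(f, \mathcal{P}_2) - \underline{S}(f, \mathcal{P}_2) < \varepsilon/3. \]
Define a partition $\mathcal{P}$ on $[a, b]$ by $\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_2$ and let $|f| \le M$ on $[a, b]$. If $\delta < \varepsilon/6M$, then
\[ \overline{S}(f, \mathcal{P}) - \underline{S}(f, \mathcal{P}) \le \overline{S}(f, \mathcal{P}_1) - \underline{S}(f, \mathcal{P}_1) + \overline{S}(f, \mathcal{P}_2) - \underline{S}(f, \mathcal{P}_2) + 2M\delta < \varepsilon. \]
Therefore, $f \in \mathcal{R}_a^b$. Moreover,
\[ \int_a^b f = \int_a^{c-\delta} f + \int_{c-\delta}^{c+\delta} f + \int_{c+\delta}^{b} f = \int_{c-\delta}^{c+\delta} f, \]
hence
\[ \left| \int_a^b f \right| \le \int_{c-\delta}^{c+\delta} |f| \le 2M\delta. \]
Since $\delta$ may be made arbitrarily small, $\int_a^b f = 0$.

5. Set $M_n = \max\{f_1, \ldots, f_n\}$. Then $M_2 = \left[ f_1 + f_2 + |f_1 - f_2| \right]/2 \in \mathcal{R}_a^b$. Since $M_n = \max\{M_{n-1}, f_n\}$, the general result follows by induction. A similar argument holds for min.

6. Choose $x_0$ such that $f(x_0) = \sup_{a \le x \le b} f(x)$. Then $\int_a^b f \le f(x_0)(b - a) < M(b - a)$.

9. Let $|f| \le M$ on $[a, b]$. Then $|F(x, y) - F(x, y_0)| \le M(y - y_0)$, hence $\lim_{y \to y_0} F(x, y) = F(x, y_0)$.

12. (a) By the approximation property, choose $x_0$ such that $|f(x_0)| > M - \varepsilon$. By continuity, we may take $x_0 \in (a, b)$ and we may choose $\delta > 0$ such that $|f(x)| > M - \varepsilon$ for all $x \in (x_0 - \delta, x_0 + \delta)$. Then
\[ M(b - a) \ge \int_a^b |f| \ge \int_{x_0 - \delta/2}^{x_0 + \delta/2} |f| \ge \delta(M - \varepsilon). \]
(b) By (a), $|f(x)|^p > (M - \varepsilon)^p$ on $(x_0 - \delta, x_0 + \delta)$, hence, as in (a),
\[ \delta(M - \varepsilon)^p \le \int_a^b |f|^p \le M^p(b - a). \]
Therefore,
\[ \delta^{1/p}(M - \varepsilon) \le \left( \int_a^b |f|^p \right)^{1/p} \le M(b - a)^{1/p}, \]
hence
\[ M - \varepsilon \le \liminf_{p \to +\infty} \left( \int_a^b |f|^p \right)^{1/p} \le \limsup_{p \to +\infty} \left( \int_a^b |f|^p \right)^{1/p} \le M. \]
Since $\varepsilon$ was arbitrary,
\[ \liminf_{p \to +\infty} \left( \int_a^b |f|^p \right)^{1/p} = \limsup_{p \to +\infty} \left( \int_a^b |f|^p \right)^{1/p} = M. \]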
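The conclusion of Problem 12(b), that $\left( \int_a^b |f|^p \right)^{1/p}$ tends to the maximum $M$ of $|f|$ as $p \to +\infty$, can be observed numerically. The sketch below uses a midpoint rule and the sample function $f(x) = 4x(1 - x)$ on $[0, 1]$, for which $M = 1$; the function names, the sample function, and the number of subintervals are all illustrative choices.

```python
def p_norm(f, a, b, p, n=20_000):
    """Midpoint-rule approximation of (integral over [a, b] of |f|^p)^(1/p)."""
    h = (b - a) / n
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n))
    return (total * h) ** (1 / p)


def sample(x):
    """Hypothetical test function with maximum value 1 at x = 1/2."""
    return 4 * x * (1 - x)


values = [p_norm(sample, 0.0, 1.0, p) for p in (1, 10, 100, 400)]
print(values)
```

The approach to $M$ is slow, in line with the factor $\delta^{1/p}$ in the lower bound of the solution.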

Section 5.3

1. By a change of variables and periodicity,
\[ \int_0^p f(x + y) \, dx = \int_y^{p+y} f(x) \, dx = \int_y^p f(x) \, dx + \int_p^{p+y} f(x) \, dx = \int_y^p f(x) \, dx + \int_p^{p+y} f(x - p) \, dx = \int_y^p f(x) \, dx + \int_0^y f(x) \, dx = \int_0^p f(x) \, dx. \]

3. (a) On $[0, 1]$, $2x/\pi \le \sin x \le x$. Since $\displaystyle \int_0^1 \frac{x}{\sqrt{x^2 + 1}} \, dx = \sqrt{2} - 1$, the inequalities follow.

5. (a) Substituting $y = x^{1/n}$ and integrating by parts $n - 1$ times yields
\[ \int_0^1 \exp\left( x^{1/n} \right) dx = n \int_0^1 y^{n-1} e^y \, dy = F(1) - F(0), \quad \text{where } F(y) = (-1)^{n+1} n! \, e^y \sum_{j=0}^{n-1} \frac{(-1)^j}{j!} y^j. \]

7. Let $I$ denote the integral. Successive integration by parts yields
\[ I = \frac{(k - 1)(k - 3) \cdots (k - 2j + 1)}{1 \cdot 3 \cdots (2j - 1)} I_j, \qquad I_j := \int_0^1 x^{k-2j} (1 - x^2)^{j - 1/2} \, dx. \]
If $k$ is odd, take $j = (k - 1)/2$ so
\[ I = \frac{(k - 1)(k - 3) \cdots 4 \cdot 2}{1 \cdot 3 \cdots (k - 2)} I_j, \qquad I_j = \int_0^1 x(1 - x^2)^{(k-2)/2} \, dx = k^{-1}. \]
If $k$ is even, take $j = k/2$ so
\[ I = \frac{(k - 1)(k - 3) \cdots 3 \cdot 1}{3 \cdot 5 \cdots (k - 1)} I_j = I_j = \int_0^1 (1 - x^2)^{(k-1)/2} \, dx. \]
By trig. substitution and Exercise 6,
\[ I_j = \int_0^{\pi/2} \cos^k \theta \, d\theta = \frac{\pi}{2} \, \frac{(k - 1)(k - 3) \cdots 3 \cdot 1}{k(k - 2) \cdots 4 \cdot 2}. \]

9. Substituting $s = f(t)$ and integrating by parts yields
\[ \int_0^y f^{-1}(s) \, ds = \int_0^{f^{-1}(y)} t f'(t) \, dt = y f^{-1}(y) - \int_0^{f^{-1}(y)} f, \]
hence
\[ \int_0^x f + \int_0^y f^{-1} = y f^{-1}(y) + \int_0^x f - \int_0^{f^{-1}(y)} f = y f^{-1}(y) + \int_{f^{-1}(y)}^{x} f. \]
If $f^{-1}(y) \le x$, then $f(t) \ge y$ for all $t \in [f^{-1}(y), x]$, hence
\[ \int_{f^{-1}(y)}^{x} f(t) \, dt + y f^{-1}(y) \ge \int_{f^{-1}(y)}^{x} y \, dt + y f^{-1}(y) = xy. \]
On the other hand, if $f^{-1}(y) \ge x$ then $f(t) \le y$ for all $t \in [x, f^{-1}(y)]$, hence
\[ \int_{f^{-1}(y)}^{x} f(t) \, dt + y f^{-1}(y) \ge -\int_x^{f^{-1}(y)} y \, dt + y f^{-1}(y) = xy. \]

10. (b) Take $f(t) = \ln(t + 1)$, $0 \le x \le 1$, and $0 \le y \le \ln 2$ in Young’s inequality to obtain
\[ (x + 1) \ln(x + 1) - x + e^y - y - 1 = \int_0^x \ln(t + 1) \, dt + \int_0^y (e^s - 1) \, ds \ge xy. \]
Replace $x + 1$ by $x$, $1 \le x \le 2$.

13. Integrate by parts to obtain
\[ \int_a^b f(x) \sin(nx) \, dx = \frac{1}{n} \left[ f(x) \cos(nx) \Big|_b^a + \int_a^b f'(x) \cos(nx) \, dx \right]. \]

17. If $F$ is a primitive of $f$, then $\displaystyle \int_{u(x)}^{v(x)} f = F(v(x)) - F(u(x))$. Now use the chain rule.

19. By l’Hospital’s rule,
\[ \lim_{x \to a} \frac{g(x)}{x - a} \int_a^x f = \lim_{x \to a} \left[ g(x) f(x) + g'(x) \int_a^x f \right] = g(a) f(a). \]

20. (a) $s_n$ is a Riemann sum for $\displaystyle \int_0^1 x^p \, dx$, hence $\lim_{n \to +\infty} s_n = 1/(p + 1)$.

21. By the mean value theorem, $|f(x) - f(x_{k-1})| \le M|x - x_{k-1}| \le M(x_k - x_{k-1}) = Mh$, $x \in [x_{k-1}, x_k]$, hence
\[ \left| \int_a^b f - \sum_{k=1}^{n} f(x_{k-1}) h \right| = \left| \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} \left[ f(x) - f(x_{k-1}) \right] dx \right| \le Mnh^2. \]
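The closed form in Problem 5(a) can be compared with a direct numerical integration of $\exp(x^{1/n})$. The sketch below uses a midpoint rule; the function names, the number of subintervals, and the tolerance implied by the printed digits are illustrative choices.

```python
from math import exp, factorial


def closed_form(n):
    """F(1) - F(0) with F(y) = (-1)^(n+1) n! e^y * sum_{j<n} (-1)^j y^j / j!."""
    def big_f(y):
        partial = sum((-1) ** j * y ** j / factorial(j) for j in range(n))
        return (-1) ** (n + 1) * factorial(n) * exp(y) * partial
    return big_f(1.0) - big_f(0.0)


def midpoint(g, a, b, m=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))


def direct(n):
    """Numerical value of the integral of exp(x^(1/n)) over [0, 1]."""
    return midpoint(lambda x: exp(x ** (1.0 / n)), 0.0, 1.0)


for n in (1, 2, 3, 4):
    print(n, closed_form(n), direct(n))
```

For $n = 2$ the closed form gives exactly 2, which matches $2 \int_0^1 y e^y \, dy$ computed by hand.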

Section 5.5

3. The substitution $t = \sin x$ yields
\[ \int_{\pi/6}^{\pi/3} f(\sin x) \, dx = \int_{1/2}^{\sqrt{3}/2} \frac{f(t)}{\sqrt{1 - t^2}} \, dt. \]
Now apply 5.5.3 with $g(t) = (1 - t^2)^{-1/2}$.

7. $G(b) \le \displaystyle \int_a^b fg \le G(a)$. Now apply the intermediate value theorem to $G$.

9. Apply 5.5.3 to obtain $c \in [0, 1]$ such that
\[ \int_0^\pi g(x) \sin x \, dx = g(0) \int_0^c \sin x \, dx + g(1) \int_c^\pi \sin x \, dx = \cos c + 1. \]

Section 5.7

1. Let $f(x)$ denote the integrand and $0 < \varepsilon < 1$. Then $\int_0^1 f = \int_0^\varepsilon f + \int_\varepsilon^1 f$. On $(0, \varepsilon]$, $2x/\pi \le \sin x \le x$ and $1 - \varepsilon \le 1 - x < 1$, hence
\[ \frac{1}{|x|^p} \le f(x) \le \frac{(\pi/2)^p}{(1 - \varepsilon)^q |x|^p}. \]
On $[\varepsilon, 1)$,
\[ \frac{1}{(1 - x)^q \sin^p 1} \le f(x) \le \frac{1}{(1 - x)^q \sin^p \varepsilon}. \]
Therefore, $\int_0^1 f$ converges iff $p, q < 1$.

5. Only (b) and (d) diverge. p sin x < 1 for 0 < x < r. Then x Z Z ε Z ε sinp x 1 ε q−p x ≤ xq−p dx. dx ≤ 2 0 xq 0 0

8. Choose r > 0 so that 1/2 <

Now apply 5.7.3(a).

9. (a) all $p$. (c) all $p$. (h) $p > -2$. (k) $p > -1$.

11. Let $g(x) = x(1 + x^2)^{-1}$, $h(x) = \sin x$ and $f := gh$. Then $\left| \int_1^x h \right|$ is bounded and $g' < 0$ so, by 5.7.17, $f$ is improperly integrable on $[1, +\infty)$. For every $n$,
\[ \int_0^\infty |f| \, dx \ge \sum_{j=2}^{n} \int_{(j-1)\pi}^{j\pi} \frac{x |\sin x|}{1 + x^2} \, dx \ge \sum_{j=2}^{n} \int_{(j-1)\pi}^{j\pi} \frac{\pi(j - 1) |\sin x|}{1 + \pi^2 j^2} \, dx = M \sum_{j=2}^{n} \frac{j - 1}{1 + \pi^2 j^2}, \]
where $M$ is a positive constant. The sums in the last equality are unbounded, hence $f$ is not improperly absolutely integrable in this case.

13. (a) Converges for all $p > 0$ if $0 < q < 1$; diverges for all $p > 0$ if $q \ge 1$.
(b) Converges for all $p > 0$ if $0 < q < 1$; diverges for all $p > 0$ if $q \ge 1$.
(c) Converges if $p > 2$ or $q > 2$ and diverges otherwise.
(d) Converges if $p < 2$ or $q < 2$ and diverges otherwise.
(e) Converges iff $q < 1$.
(f) Converges iff $pq < 1$.

15. Integrate by parts:
\[ I_n := \int_{-\infty}^{\infty} x^{2n} e^{-x^2/2} \, dx = (2n - 1) \int_{-\infty}^{\infty} x^{2n-2} e^{-x^2/2} \, dx = (2n - 1) I_{n-1}. \]

20. Both integrals converge. The root test is inconclusive.

24. By the Cauchy–Schwarz inequality,
\[ \int_1^\infty \frac{\sqrt{f(x)}}{x} \, dx \le \left( \int_1^\infty f(x) \, dx \right)^{1/2} \left( \int_1^\infty \frac{dx}{x^2} \right)^{1/2} < +\infty. \]

26. Let $F(x) = \int_a^x fg$, $a \le x < b$, and let $b_n \uparrow b$. By the weighted mean value theorem,
\[ F(b_m) - F(b_n) = f(c_{m,n}) \left[ G(b_m) - G(b_n) \right] \]
for some $c_{m,n}$ between $b_m$ and $b_n$. Since $G$ is bounded and $f(c_{m,n}) \to 0$, $\{F(b_n)\}$ is a Cauchy sequence and hence converges. Since $\{b_n\}$ was arbitrary, $\int_a^b fg$ converges.

28. Let $F(t) = \int_0^t f \, dx$. Then
\[ \int_0^t f(x + c) \, dx = \int_c^{c+t} f(x) \, dx = \int_c^t f(x) \, dx + F(t + c) - F(t), \]
hence
\[ \int_0^\infty f(x + c) \, dx = \int_c^\infty f(x) \, dx. \]
Similarly,
\[ \int_{-\infty}^0 f(x + c) \, dx = \int_{-\infty}^c f(x) \, dx. \]

Section 5.8

2. Given $\varepsilon > 0$, let $A_n$ be covered by intervals $I_{n,k}$, $k = 1, 2, \ldots$, with total length $< \varepsilon/2^n$. Then the union is covered by intervals $I_{n,k}$, $n, k = 1, 2, \ldots$, with total length $< \varepsilon$.

6. The discontinuity set is countable, hence the integral exists. Since all lower sums are zero, the integral must be zero.

Section 6.1 1. (a)

m3 (m + 1) . 2m + 1

m

(c) ln(3/2).

(e)

1 X (−1)k . m k k=1

(i) − ln(m + 1).

(n)

m X (−1)k

k

k=1

2. (a)

(g)

23 . 480

.

1 . 1 + r2

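3. (a) $193e$. (c) $(e - 1/e)/2$.

5. Let
\[ s_n = \sum_{k=1}^{n} \frac{1}{k}, \quad u_n = \sum_{k=1}^{n} \frac{1}{2k - 1}, \quad \text{and} \quad v_n = \sum_{k=1}^{n} \frac{4}{(2k - 1)(2k + 1)}. \]
(a) $s_{2n} = s_n/2 + u_n$, hence, by 6.1.9,
\[ u_n - \tfrac{1}{2} \ln n = [s_{2n} - \ln(2n)] - \tfrac{1}{2} [s_n - \ln n] + \ln 2 \to \tfrac{1}{2} \gamma + \ln 2. \]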
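The limit in Problem 5(a) can be watched numerically. The sketch below sums the odd reciprocals directly and compares $u_n - \tfrac{1}{2} \ln n$ with $\tfrac{1}{2}\gamma + \ln 2$; the constant `EULER_GAMMA` is a hard-coded decimal approximation of Euler’s constant, and the function name is an illustrative choice.

```python
from math import log

# Decimal approximation of Euler's constant (hard-coded assumption).
EULER_GAMMA = 0.5772156649015329


def odd_harmonic(n):
    """u_n = 1 + 1/3 + 1/5 + ... + 1/(2n - 1)."""
    return sum(1.0 / (2 * k - 1) for k in range(1, n + 1))


target = EULER_GAMMA / 2 + log(2)
for n in (10, 1_000, 100_000):
    print(n, odd_harmonic(n) - 0.5 * log(n), target)
```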


8. Given $\varepsilon > 0$, choose $N$ such that $L - \varepsilon < a_k/b_k < L + \varepsilon$ for all $k \ge N$. Multiplying by $b_k$ and summing,
\[ (L - \varepsilon) \sum_{k=n}^{m} b_k < \sum_{k=n}^{m} a_k < (L + \varepsilon) \sum_{k=n}^{m} b_k, \quad m > n \ge N. \]
Letting $m \to +\infty$ and dividing,
\[ L - \varepsilon \le \frac{\sum_{k=n}^{\infty} a_k}{\sum_{k=n}^{\infty} b_k} \le L + \varepsilon, \quad n \ge N. \]

12. Let $s_n$ and $t_n$ denote the $n$th partial sums of $\sum a_n$ and $\sum b_n$, respectively. Then $t_k = s_{n_k}$ so $\{t_k\}$ is a subsequence of $\{s_n\}$. Therefore, if $\sum_n a_n$ converges, so does $\sum_k b_k$. If the terms $a_n$ are nonnegative, then, for each $n$, $s_n \le t_k$ for $k \ge n$, hence if $\sum_k b_k$ converges, then so does $\sum_n a_n$. The series $\sum_{n=0}^{\infty} (-1)^n$ shows that the latter assertion fails in general.

15. By summing a geometric series, a real number $x$ with representation $b_N b_{N-1} \cdots b_0 . a_1 a_2 \cdots a_n 999 \cdots$, where $a_n \ne 9$, may be written as
\[ b_N b_{N-1} \cdots b_0 . a_1 a_2 \cdots a_n + 10^{-n} = b_N b_{N-1} \cdots b_0 . a_1 a_2 \cdots a_{n-1} a_n', \]
where $a_n' := a_n + 1$. Therefore, a real number has at least one standard representation. Suppose that
\[ b_N b_{N-1} \cdots b_0 . a_1 a_2 \cdots = c_M c_{M-1} \cdots c_0 . d_1 d_2 \cdots \]
are standard representations. Then
\[ |b_N b_{N-1} \cdots b_0 - c_M c_{M-1} \cdots c_0| = |(.d_1 d_2 \cdots) - (.a_1 a_2 \cdots)| \le \sum_{j=1}^{\infty} \frac{|d_j - a_j|}{10^j}. \]
Since the representations are standard, $|d_j - a_j|$ cannot eventually equal 9, hence the right side is $< 1$. Therefore, since the left side is an integer, it must be zero. It follows from Exercise 1.5.16 that $M = N$ and $b_j = c_j$, $0 \le j \le N$. Then $a_1 . a_2 a_3 \cdots = d_1 . d_2 d_3 \cdots$, hence $a_1 = d_1$. An induction argument shows that $a_n = d_n$ for all $n$.

Section 6.2

1. By the ratio test, (a), (b), (e), and (f) converge; (c) and (d) diverge.

2. (a) Converges by ratio test. (d) Converges by ratio test. (g) Converges by integral test iff $p > 1$. (j) Diverges by limit comparison with $\sum 1/n$. (m) Converges by limit comparison with $\sum 1/n^2$. (p) Diverges by ratio test. (s) Diverges since $2^{\ln n} = n^p$, $p = \ln 2 < 1$. (v) Converges by limit comparison with $\sum 1/2^n$.

5. For all sufficiently large $n$, $a_n < a_n n^{1/n} < 2a_n$.

6. (a) Converges iff $p > 1$. (e) Converges iff $q > p$. (g) Converges iff $q > 1 + p$.

8. (a) Since $a_n \to 0$, $a_n^2 < a_n$ for all large $n$. Therefore, $\sum b_n$ converges by comparison test.

(d) Converges by comparison test: $b_n \le a_n$. (h) Converges: For $n$ sufficiently large, say $n \ge N$, $a_n < 1$, hence $b_n = M a_N \cdots a_n < M a_n$, where $M = a_1 \cdots a_{N-1}$. (l) Converges by the Cauchy–Schwarz inequality.

11. The inequality implies that $\{a_n/b_n\}$ is a decreasing sequence and hence converges to $L < +\infty$. Now use the comparison test.

14. Since $\lim_{x\to\infty} f(g(x)) = \lim_{x\to\infty} g(x) = 0$, l'Hospital's rule implies that $\lim_{x\to\infty} f(g(x))/g(x) = \lim_{x\to\infty} f'(g(x)) = f'(0)$. Now apply the limit comparison test to $\sum_n f(g(n))$ and $\sum_n g(n)$.

15. (a) If $\sum f(1/n^p)$ converges, then $f(0) = \lim_n f(1/n^p) = 0$. Suppose $f'(0) \ne 0$. Then, by l'Hospital's rule, $\lim_{x\to 0} \dfrac{f(x^p)}{x^{2p}} = \infty$. Therefore, eventually $f(1/n^p) > 1/n^{2p}$, so the series diverges by the comparison test.

17. (a) $n!(e - s_n) = m(n-1)! - \displaystyle\sum_{k=0}^{n} \frac{n!}{k!} \in \mathbb{N}$.

(b)
\[ e - s_n = \sum_{k=1}^{\infty} \frac{1}{(n+k)!} = \frac{1}{(n+1)!}\left[1 + \frac{1}{n+2} + \frac{1}{(n+2)(n+3)} + \cdots\right] < \frac{1}{(n+1)!}\left[1 + \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots\right] = \frac{1}{(n+1)!}\,\frac{n+1}{n}, \]
hence $n!(e - s_n) < 1/n$. By (a) and (b), $n!(e - s_n)$ is a positive integer $< 1/n$, which is impossible.
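The bound in Exercise 17(b) can be confirmed with exact rational arithmetic. This is a minimal sketch: it truncates the tail series after a fixed number of terms (an arbitrary choice), so it only checks partial sums, which are lower bounds for $n!(e - s_n)$. The helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def scaled_tail(n, terms=40):
    """Exact value of n! * sum_{k=1}^{terms} 1/(n+k)!.

    This is a partial sum of the series for n!(e - s_n), so it is a
    lower bound that increases with `terms`.
    """
    return sum(Fraction(factorial(n), factorial(n + k)) for k in range(1, terms + 1))

def geometric_bound(n):
    """The bound 1/n obtained by comparison with a geometric series."""
    return Fraction(1, n)
```

For every `n` tried, `scaled_tail(n)` stays strictly between 0 and `geometric_bound(n)`, which is the inequality that rules out an integer value.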


Section 6.3

3. (a) and (c) diverge: $d_n \to 2/3$; (b) converges: $d_n \to 3/2$.

5. (a) By ratio test: series converges if $p < e$ and diverges if $p > e$. If $p = e$, series diverges by Raabe's test since then $d_n \to -1/2$.

6. Ratio test fails. Raabe: $d_n \to (1 + p)/2$, hence converges if $p > 1$ and diverges if $p < 1$. Also diverges if $p = 1$, since then $a_n = 1/(2n + 1)$.

10. (a) Diverges. (d) Converges iff $r > 1$.

13. $-\ln a_n / \ln n \to \ln b$.

16. (a) Converges iff $q > p$.

18. Let $c > 1$ and choose $r \in (1, c)$. Then, for sufficiently large $n$, $c_n > r$, hence
\[ \ln a_n^{-1} = \ln n + c_n \ln\ln n > \ln n + r\ln\ln n = \ln\bigl[n(\ln n)^r\bigr] \]
and therefore $a_n < \dfrac{1}{n(\ln n)^r}$. Since $\int_2^\infty \dfrac{dx}{x(\ln x)^r} < +\infty$, the integral and comparison tests complete the proof in this case. The case $c < 1$ is similar.

The given series diverges.

21. Take $b_n = n\ln n$ in Kummer's test. Then
\[ c_n = \left(1 + \frac{1}{n} + \frac{\beta_n}{n\ln n}\right) n\ln n - (n+1)\ln(n+1) = (n+1)\ln\frac{n}{n+1} + \beta_n. \]
Since the first term on the right side tends to $-1$, $\liminf_{n\to\infty} \beta_n > 1$ implies $\liminf_{n\to\infty} c_n > 0$, and $\limsup_{n\to\infty} \beta_n < 1$ implies $\liminf_{n\to\infty} c_n < 0$.

Section 6.4

2. Choose $r > 1$ and $N \in \mathbb{N}$ such that $|a_{n+1}|/|a_n| > r$ for all $n \ge N$. Then $|a_{N+k}| > r^k |a_N|$ for all $k$, hence $a_n \not\to 0$. Therefore, the series diverges.

4. (a) Diverges. (b) Converges conditionally. (c) Converges absolutely if $p > 1$, conditionally if $p \le 1$. (i) Converges absolutely if $p > 1/2$, conditionally if $p \le 1/2$. (m) Converges absolutely if $p > 1$, diverges if $p < 1$.

9. Let $b_n = \dfrac{n - 1/2}{n^p + (-1)^n}$. If $p \le 1$, then $b_n \sin n\theta$ need not tend to zero (see Example 8.3.10). For $p > 1$, it suffices by Dirichlet's test to show that $\sum |b_{n+1} - b_n| < +\infty$. This follows by limit comparison with $\sum 1/n^p$.


13. (a) For n ∈ N, n = qmn + rn , where rn , mn ∈ N and 0 ≤ rn ≤ q − 1. Since sn − sqmn is a sum of terms of the form aqmn +j , j = 1, . . . q − 1, each of which → 0, sn − sqmn → 0. Therefore, sn → s. (b) For n ∈ N, 1 1 1 1 1 1 1 1 1 1 1 − + + + + + − + + s6n = 1 + + 2 3 4 5 6 7 8 9 10 11 12 1 1 1 1 1 1 + ··· + + + − + + 6n − 5 6n − 4 6n − 3 6n − 2 6n − 1 6n 1 1 1 1 1 1 1 = 1− + − + − + ··· + − 4 2 5 3 6 6n − 3 6n =

6n−3 X 3 3 3 1 3 + + + ··· + =3 . 1·4 2·5 3·6 (6n − 3)6n k(k + 3) k=1

The last expression converges to (1 + 1/2 + 1/3) = 11/6 by 6.1.5 with m = 3. By part (a), s = 11/6. (c) Let tn be the nth partial sum of the series. Then 1 1 1 1 1 1 1 1 1 1 + − − + + + − − + ··· − 2 3 4 5 6 7 8 9 10 5n 1 1 1 1 1 1 1 1 1 + + − − + + + − − + ··· 3 3 3 3 8 8 8 8 8 1 1 1 + + + ··· + . 8 13 5n − 2

t5n = 1 + 1 3 1 = 3 ≥

Thus t5n → +∞, so the series diverges.

Section 6.5

3. (a), (b), (c): Double limit does not exist; only one iterated limit exists. (d), (g), (l): Iterated limits exist and are unequal. Hence double limit does not exist. (e), (h): Iterated limits exist and are equal. Double limit exists. (f), (i), (k): Iterated limits exist and are equal. Double limit does not exist. (j) If $a = b$, iterated limits exist and are equal, double limit exists. If $a \ne b$, iterated limits exist and are unequal.

9. Let $s_{m,n} = \sum_{j=1}^{m}\sum_{k=1}^{n} a_{j,k}$ and $s_n = \sum_{k=1}^{n} b_k$. Then for $m \ge n$,
\[ s_n \le s_{n,n} \le s_{m,n} \le s_{m,m} \le s_{2m-1}, \]
hence the result follows from the squeeze principle.


10. (b) Let
\[ b_n = \sum_{j=1}^{n} a_{j,n+1-j} = \sum_{j=1}^{n} \frac{1}{[j^2 + (n+1-j)^2]^{p/2}} \]
and let $s_n = \sum_{k=1}^{n} b_k$. The minimum of $x^2 + (n+1-x)^2$ on $[1, n]$ occurs at $x = (n+1)/2$ and the maximum at $x = 1$ and $x = n$, hence
\[ (n+1)^2/2 \le j^2 + (n+1-j)^2 \le n^2 + 1, \qquad 1 \le j \le n, \]
and therefore
\[ \frac{n}{(n^2+1)^{p/2}} \le b_n \le \frac{2^{p/2}\,n}{(n+1)^p}, \]
so the double series converges iff $p > 2$.

11. If $|r| \ge 1$, then $a_{m,n} \not\to 0$, hence the double series diverges. Let $|r| < 1$ and set $c_m = |r|^m/(1 - |r|^m)$. Choose $M$ such that $|r|^m < 1/2$ for $m > M$. Then
\[ \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} |r|^{mn} = \sum_{m=1}^{M} c_m + \sum_{m=M+1}^{\infty} c_m \le \sum_{m=1}^{M} c_m + 2\sum_{m=1}^{\infty} |r|^m < +\infty. \]
Therefore, the iterated series, and hence the double series, converges absolutely.

12. Let $L < 1$. Choose $r \in (L, 1)$ and then $N$ such that $a_{m,n}^{1/mn} < r$ for all $m, n \ge N$. For such $m, n$, $a_{m,n} < r^{mn}$, hence $\sum a_{m,n}$ converges by Exercises 6 and 11. If $L > 1$, choose $r \in (1, L)$ and then $N$ such that $a_{m,n}^{1/mn} > r$ for all $m, n \ge N$. For such $m, n$, $a_{m,n} > r^{mn} > 1$, hence $a_{m,n} \not\to 0$, so the series diverges.

Section 7.1 1. (b) Pointwise to 0 on (−1, 1] for all p ≥ 0, uniformly on intervals [a, 1] for a > −1 and p < 1. Uniformly on [−1, 1] if p < 0. (d) Pointwise to 0 on R, uniformly on |x| ≥ a > 0. (g) Uniformly to 0 on R. (j) Pointwise on R, uniformly on the sets |x| ≥ r > 1 and |x| ≤ s < 1. 2. (a) Pointwise but not uniformly.

(b) Uniformly.

6. For example, $f_n(x) = x + 1/n$, $f(x) = g_n(x) = g(x) = x$ on $[1, +\infty)$.

10. Given $\varepsilon > 0$, choose $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon$ for all $x, y \in \mathbb{R}$ with $|x - y| < \delta$. Then choose $N$ such that $|a_n - a| < \delta$ for all $n \ge N$. For such $n$ and for all $x$, $|f_n(x) - f(x + a)| = |f(x + a_n) - f(x + a)| < \varepsilon$.


13. If $x \in \mathbb{Q}$ has reduced form $x = k/m$, then $f_n(x) = 1$ for all $n \ge m$. Therefore, $f_n$ converges pointwise to the Dirichlet function $d(x)$. Suppose the convergence were uniform on $[0, 1]$. Then we could find $n$ such that $|f_n(x) - d(x)| < 1$ for all $x \in [0, 1]$. In particular, $|f_n(1/m) - 1| < 1$ for all $m > n$, which is impossible since $f_n(1/m) = 0$.

17. Let $M > |f_0(x)| + 1$ for all $x \in S$. Then
\[ |f_{n+1}(x) - f_n(x)| = |\sin(r f_n(x)) - \sin(r f_{n-1}(x))| \le r|f_n(x) - f_{n-1}(x)| \le \cdots \le r^n |f_1(x) - f_0(x)| \le M r^n. \]
Since $r < 1$, $\{f_n\}$ is uniformly Cauchy. Therefore, $f_n \to$ some $f$, uniformly on $S$. The generalization is proved in a similar manner, using the mean value theorem.

Section 7.2

4. Let $x > 0$. By l'Hospital's rule, $n^2 x e^{-nx}$ has the same limit as $2n e^{-nx}$, namely, 0. The convergence is not uniform on $(0, 1)$, however, as may be seen by taking $b_n = 1/n$ in 7.1.5. An integration by parts shows that
\[ \int_0^1 n^2 x e^{-nx}\,dx = 1 - e^{-n}(1 + n) \to 1. \]

5. (d) Let $L := \lim_n \int_0^1 f_n$. By the mean value theorem, $e^{-x/n} - 1 = (-x/n)e^{-\xi/n}$, hence
\[ \left|\frac{\sqrt{n}\,\bigl(e^{-x/n} - 1\bigr)}{x}\right| \le \frac{e^{-\xi/n}}{\sqrt{n}} \le \frac{1}{\sqrt{n}}, \]
so $\sqrt{n}\,\bigl(e^{-x/n} - 1\bigr)/x$ converges uniformly to zero. Therefore, $L = 0$.

6. If $x \ge r > 0$, then $f_n(x) = \sqrt{n}/(1 + n^2 x^2) \le \sqrt{n}/(1 + n^2 r^2)$, hence $f_n \to 0$ uniformly on $[r, +\infty)$. The convergence is not uniform on $(0, 1)$, as can be seen by taking $b_n = 1/n$ in 7.1.5. A substitution shows that $\int_0^1 f_n = n^{-1/2}\arctan n \to 0$.

8. (a) $n\sin f_n \to f'/f$ uniformly $\Rightarrow \int_a^b n\sin f_n \to \ln f(b) - \ln f(a)$.

9. This follows from the inequality
\[ \left|\int_a^x f_n(t)\,dt - \int_a^x f(t)\,dt\right| \le \int_a^x |f_n(t) - f(t)|\,dt \le \int_a^b |f_n - f|. \]
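Exercise 4 of Section 7.2 shows pointwise convergence to 0 while the integrals tend to 1. A small numerical sketch makes the contrast visible; the function names and the midpoint-rule step count are illustrative choices, not part of the text.

```python
import math

def f(n, x):
    """The nth function f_n(x) = n^2 * x * exp(-n x)."""
    return n * n * x * math.exp(-n * x)

def integral_closed_form(n):
    """Value 1 - exp(-n)(1 + n) of the integral of f_n over [0, 1]."""
    return 1.0 - math.exp(-n) * (1.0 + n)

def integral_midpoint(n, steps=20_000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f(n, (i + 0.5) * h) for i in range(steps))
```

At a fixed point such as `x = 0.5` the values `f(n, 0.5)` decay rapidly, yet `f(n, 1/n)` equals `n/e` and grows, which is why the convergence is not uniform and the integrals do not tend to 0.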


Section 7.3

1. (a) Pointwise on $(1, +\infty)$, uniformly on $[r, +\infty)$, $r > 1$. (d) Uniformly on $[0, +\infty)$. (g) Pointwise on $(0, +\infty)$, uniformly on $[r, +\infty)$, $r > 0$. (i) If $p > 1$, pointwise on $[0, +\infty)$, uniformly on $[0, r]$; if $p = 1$, converges only at $x = 0$.

2. (b) $s(x) = \dfrac{1}{1 + \ln x}$. Pointwise on $(1/e, e)$, uniformly on closed subintervals.

4. Both $s(x)$ and $c(x)$ converge uniformly on $\mathbb{R}$ by the $M$-test. Therefore, term by term integration is justified, so
\[ \int_x^{\pi/2} s(t)\,dt = \sum_n \frac{a_n\cos\bigl((2n+1)x\bigr)}{2n+1}, \qquad \int_0^x c(t)\,dt = \sum_n \frac{a_n}{n}\sin(nx). \]

6. (a) Let $p \le 1/2$ and $x \ne 0$. By l'Hospital's rule, $\dfrac{1 - \cos(x/n^p)}{n^{-1}}$ has the same limit as $n \to +\infty$ as
\[ \frac{-pxn^{-p-1}\sin(x/n^p)}{-1/n^2} = px^2 n^{1-2p}\,\frac{\sin(x/n^p)}{x/n^p}. \]
Since this limit is positive, (a) follows from the limit comparison test.

(b) Since cosine is an even function, to show uniform convergence on intervals $[a, b]$ we may assume $a = 0$. By the mean value theorem, for each $n \in \mathbb{N}$ and $x \in [0, b]$ there exists $x_n \in [0, b]$ such that
\[ |1 - \cos(x/n^p)| = (x/n^p)|\sin(x_n/n^p)| \le b^2/n^{2p}. \]
Therefore, uniform convergence on $[0, b]$ follows from the $M$-test. Since $1 - \cos(x/n^p)$ does not converge uniformly to 0 on any unbounded interval, $s(x)$ does not converge uniformly on $\mathbb{R}$.

9. Let $|f'| \le M$ on $I$. By the mean value theorem, for each $x \in I$ and $n \in \mathbb{N}$ there exists $\xi$ between $x/(n+1)$ and 0 such that
\[ \left|\frac{1}{n} f\!\left(\frac{x}{n+1}\right)\right| = \frac{|x f'(\xi)|}{n(n+1)} \le \frac{rM}{n(n+1)}. \]
Therefore, $s(x)$ converges uniformly on $I$ by the Weierstrass $M$-test. Since $f'$ is bounded, the derived series
\[ s'(x) = \sum_{n=1}^{\infty} \frac{1}{n(n+1)} f'\!\left(\frac{x}{n+1}\right) \]
converges uniformly on $I$ and $s'(0) = f'(0)$.


11. Since $f_n \ge 0$, the partial sums of the series increase, so the conclusion follows from Dini's theorem (7.1.12).

13. For $x \in [a, b]$, either $f_n(a) \le f_n(x) \le f_n(b)$ or $f_n(b) \le f_n(x) \le f_n(a)$, hence $|f_n(x)| \le M_n := \max\{|f_n(a)|, |f_n(b)|\} \le |f_n(a)| + |f_n(b)|$. Since $\sum M_n < +\infty$, $s = \sum_n f_n$ converges uniformly on $[a, b]$. Since each $f_n \in \mathcal{R}([a, b])$, $s \in \mathcal{R}([a, b])$ and $\int_a^b s = \sum \int_a^b f_n$.

15. By Dini's theorem, the convergence of $\{g_n\}$ is uniform. Therefore, the result follows from 7.3.9.

18. Since $g$ is continuous and $n^{-2}[g + n] \downarrow 0$, the convergence is uniform on closed bounded intervals $I$. By 7.3.9, $s(x)$ converges uniformly on $I$. The convergence is not absolute for any $x$ (compare with $\sum_n 1/n$).

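Section 7.4

1. (a) $(-1, 3)$. (d) $(-1, 1]$. (g) $(-1/4, 1/4)$. (i) $(-1, 1)$.

2. (b) $\displaystyle\sum_{n=3}^{\infty} \frac{3^{n-3}}{2^{n-2}}\,x^n$, $-2/3 < x < 2/3$.

3. (a) Replace $x$ by $x - 1$ in (7.12), where $|x - 1| < 1$, to obtain
\begin{align*}
x\ln x &= (x - 1)\ln x + \ln x \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}(x-1)^{n+1} + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}(x-1)^n \\
&= \sum_{n=2}^{\infty} \frac{(-1)^n}{n-1}(x-1)^n + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}(x-1)^n \\
&= (x - 1) + \sum_{n=2}^{\infty} \left[\frac{(-1)^n}{n-1} + \frac{(-1)^{n+1}}{n}\right](x-1)^n \\
&= (x - 1) + \sum_{n=2}^{\infty} \frac{(-1)^n}{n(n-1)}(x-1)^n.
\end{align*}

4. (a) $\displaystyle\sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{2^n + 3^n}{n}\,x^n$, $|x| < 1/3$. (g) $\displaystyle\sum_{n=0}^{\infty} \frac{(-1)^n 4^n}{(2n+1)!}\,x^{2n+1}$, $x \in \mathbb{R}$.

5. Use $\arccos x = \pi/2 - \arcsin x$ and (7.20).

9. (a) $\displaystyle\sum_{n=1}^{\infty} (-1)^n \frac{x^{2n-1}}{(2n-1)(2n+1)!}$. (e) $\displaystyle\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\,x^n$, $x > 0$.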
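The expansion of $x\ln x$ about $x = 1$ obtained in Exercise 3(a) of this section can be compared against the built-in logarithm. The sketch below sums a fixed number of terms (an arbitrary truncation); the function name is illustrative.

```python
import math

def x_log_x_series(x, terms=200):
    """Partial sum of (x-1) + sum_{n>=2} (-1)^n (x-1)^n / (n(n-1)), valid for |x-1| < 1."""
    t = x - 1.0
    total = t
    for n in range(2, terms + 1):
        total += (-1) ** n * t ** n / (n * (n - 1))
    return total
```

For points well inside the interval of convergence, for example `x = 0.5` or `x = 1.5`, the partial sum should agree with `x * math.log(x)` to many digits.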


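10. (b) $\dfrac{x(1 - x^2)}{(1 + x^2)^2}$.

11. $27/4$.

12. (a) $\displaystyle\sum_{n=1}^{\infty} (-1)^{n+1} c_n x^n$, $|x| < 1$, $c_n := \displaystyle\sum_{k=1}^{n} \frac{(-1)^k}{k}$. (d) $\displaystyle\sum_{n=0}^{\infty} c_n x^{2n+1}$, $x \in \mathbb{R}$, $c_n := \displaystyle\sum_{k=0}^{n} \frac{(-1)^k}{(2k+1)!\,(n-k)!}$.

16. For $|x| < (\sqrt{5} - 1)/2$,
\begin{align*}
(1 - x - x^2)s(x) &= \sum_{n=0}^{\infty} c_n x^n - \sum_{n=0}^{\infty} c_n x^{n+1} - \sum_{n=0}^{\infty} c_n x^{n+2} \\
&= c_0 + c_1 x - c_0 x + \sum_{n=2}^{\infty} (c_n - c_{n-1} - c_{n-2})x^n \\
&= 1.
\end{align*}

18. Replace $x$ by $-t^2$ in (7.19) to obtain
\[ \frac{1}{\sqrt{1 + t^2}} = \sum_{n=0}^{\infty} \frac{(-1)^n (2n)!}{(n!)^2 4^n}\,t^{2n}, \qquad |t| < 1. \]
Integrating from 0 to $x$ yields the desired representation.

21. (a) Choose $r$ such that $R_s^{-1} = \limsup_n |c_n|^{1/n} < r < 1$. Then
\[ |c_{n^2}|^{1/n} = \bigl(|c_{n^2}|^{1/n^2}\bigr)^n < r^n \to 0, \]
hence $R_t = +\infty$.

(b) If $c_n = (1 + a/n^p)^n$, $p > 0$, then $R_s = 1$ and
\[ R_t = \lim_n \left(1 + \frac{a}{n^{2p}}\right)^{-n} = \begin{cases} e^{-a} & \text{if } p = 1/2, \\ 0 & \text{if } p < 1/2 \text{ and } a > 0, \\ +\infty & \text{if } p < 1/2 \text{ and } a < 0, \\ 1 & \text{if } p > 1/2. \end{cases} \]

22. (a) If $0 < R_s < +\infty$, choose $N$ such that $|c_n|^{1/n} < 2R_s^{-1}$ for all $n \ge N$. For such $n$, $|c_n|^{1/n^2} < (2R_s^{-1})^{1/n} \to 1$, hence $R_t \ge 1$. Similarly, $|c_n|^{1/n^2} > (R_s^{-1}/2)^{1/n}$ for infinitely many $n$, hence $R_t \le 1$.

27. By the alternating series test, $\sum_{n=0}^{\infty} c_n x^n$ converges at $x = -1$, hence the result follows from Abel's continuity theorem.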
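The identity $(1 - x - x^2)s(x) = 1$ from Exercise 16 can be tested numerically. The computation in that solution forces $c_0 = 1$, $c_1 = c_0$, and $c_n = c_{n-1} + c_{n-2}$; the sketch below assumes exactly these coefficients and truncates the series after a fixed number of terms. The function name is illustrative.

```python
def fibonacci_series(x, terms=200):
    """Partial sum of s(x) = sum c_n x^n with c_0 = c_1 = 1 and c_n = c_{n-1} + c_{n-2}.

    Intended for |x| < (sqrt(5) - 1)/2, where the series converges.
    """
    c_prev, c_curr = 1.0, 1.0
    total = c_prev + c_curr * x
    power = x
    for _ in range(2, terms):
        c_prev, c_curr = c_curr, c_prev + c_curr  # advance the recursion
        power *= x
        total += c_curr * power
    return total
```

Multiplying the partial sum by `1 - x - x*x` should return a value very close to 1 for `x` inside the interval of convergence.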

28. (a) Let $s_n(x) = \displaystyle\sum_{k=1}^{n} kx^k$ and $s(x) = \dfrac{x}{(1 - x)^2}$, $x \in [0, 1)$. By 7.4.6 and the boundedness of $f$, $s_n(x) \to s(x)$ uniformly on $[0, r]$, $0 < r < 1$.

30. Define $h$ on $I \cup J$ by
\[ h(x) = \begin{cases} f(x) & \text{if } x \in I, \\ g(x) & \text{if } x \in J. \end{cases} \]
By 7.4.19, $f = g$ on $I \cap J$, hence $h$ is well-defined and analytic on $I \cup J$.

33. (a) By 7.4.13, if the series $g(x)$ converges for $|x - a| < r_1$, then $f(x)g(x) = \sum c_n (x - a)^n$, where $c_0 = a_0 b_0 = a_0 = 1$ and
\[ c_n = \sum_{k=0}^{n} a_k b_{n-k} = a_0 b_n - b_n = 0, \qquad n \ge 1. \]
Therefore, $f(x)g(x) = 1$ for $|x - a| < r_1$.

(b) Suppose $|a_n| \le M^n$ for all $n$. If $|b_j| \le (2M)^j$ for $1 \le j \le n - 1$, then
\[ |b_n| \le \sum_{k=1}^{n} |a_k||b_{n-k}| \le \sum_{k=1}^{n} 2^{n-k} M^k M^{n-k} < (2M)^n. \]
By induction, $|b_n| \le (2M)^n$ for all $n$.

(c) By 7.4.16, there exists a constant $M > 0$ such that $|a_n| \le M^n$ for all $n$, hence (b) holds. By 7.4.16, $g$ is analytic at $a$.

Section 8.1

1. Only (b) and (d) are not metrics.

3. Symmetry and coincidence are clear. To verify the triangle inequality $d(x, y) \le d(x, z) + d(y, z)$, simply note that if $x_j \ne y_j$ then either $x_j \ne z_j$ or $y_j \ne z_j$, so that every index $j$ contributing to $d(x, y)$ also contributes to $d(x, z) + d(y, z)$.

5. By the triangle inequality, $d(x, y) \le d(x, a) + d(a, y) \le d(x, a) + d(a, b) + d(b, y)$, hence $d(x, y) - d(a, b) \le d(x, a) + d(b, y)$. Similarly, $d(a, b) - d(x, y) \le d(x, a) + d(b, y)$.

10. Let $\{x_n\}$ be a Cauchy sequence in $E$. Some $E_j$ must contain a subsequence of $\{x_n\}$, and since $E_j$ is complete, the subsequence converges to a member of $E_j$. By Exercise 9, $\{x_n\}$ converges. The assertion is false for infinitely many sets. For example, let $\{r_1, r_2, \ldots\}$ be an enumeration of the rationals, and take $E_n = \{r_n\}$ (or $\{r_1, \ldots, r_n\}$).


13. The proof of (a) is straightforward. For the necessity in (b), let $\{(x_n, y_n)\}$ be Cauchy in $Z$. Since $d(x_n, x_m) \le \eta\bigl((x_n, y_n), (x_m, y_m)\bigr)$, $\{x_n\}$ is Cauchy in $X$. Similarly, $\{y_n\}$ is Cauchy in $Y$. The converse is clear. Part (c) is proved in a similar manner, and (d) follows from (b) and (c).

15. Part (a) is straightforward. For example, if $\rho(x, y) = 0$, then $\rho(x, y) = d(x, y)$, hence $x = y$. Parts (b) and (c) follow from the observation that $\rho(x, y) = d(x, y)$ if either term is less than $a$. Part (d) follows from (b) and (c). The metrics need not be metrically equivalent: Take $d$ to be the usual metric on $\mathbb{R}$. The function $\sigma$ does not define a metric on $X$ since $\sigma(x, x) = a > 0$.

18. (a) The triangle inequality follows from the observation that the function $t(1 + t)^{-1}$ is increasing on $[0, +\infty)$. The remaining properties of a metric are easily established. Parts (b) and (c) follow from the definition of $\rho$ and the equation
\[ d(x, y) = \frac{\rho(x, y)}{1 - \rho(x, y)}, \]
noting that $\rho < 1$. The metrics $|x - y|$ and $|x - y|/(1 + |x - y|)$ are not metrically equivalent.

20. By Exercise 18, each $\rho_k$ is a metric on $X$. It follows easily that $\rho$ is a metric on $X$. For (b), suppose $\rho(x_n, x) \to 0$. Since $\rho_k \le 2^k \rho$, $\rho_k(x_n, x) \to 0$. By Exercise 18, $d_k(x_n, x) \to 0$. Conversely, suppose $d_k(x_n, x) \to 0$, hence $\rho_k(x_n, x) \to 0$, for each $k$. Given $\varepsilon > 0$, choose $M \in \mathbb{N}$ such that $\sum_{n > M} 2^{-n} < \varepsilon/2$ and choose $N > M$ so that
\[ \rho_1(x_n, x) + \rho_2(x_n, x) + \cdots + \rho_M(x_n, x) < \varepsilon/2 \]
for all $n \ge N$. For such $n$, $\rho(x_n, x) < \varepsilon$.

23. For $x, y \in [1, b]$,
\begin{align*}
|f_n(x, y) - f(x, y)| &= \frac{\bigl|y(1 + x^n)^{1/n} - x(1 + y^n)^{1/n}\bigr|}{y(1 + y^n)^{1/n}} \\
&= \frac{\bigl|(1 + x^n)^{1/n} - x(1 + y^{-n})^{1/n}\bigr|}{(1 + y^n)^{1/n}} \\
&\le \frac{\bigl|(1 + x^n)^{1/n} - x\bigr| + \bigl|x - x(1 + y^{-n})^{1/n}\bigr|}{(1 + y^n)^{1/n}} \\
&\le x\bigl[(1 + x^{-n})^{1/n} - 1\bigr] + x\bigl[(1 + y^{-n})^{1/n} - 1\bigr] \\
&\le b\bigl[(2^{1/n} - 1) + (2^{1/n} - 1)\bigr] \to 0.
\end{align*}


Section 8.2

1. [Figure C.1: Open balls $B_1^{d_1}(0, 0)$ and $B_1^{d_\infty}(0, 0)$ for Exercise 1.]

3. $r = d(x, y)/2$.

5. If $x, y \in B_r(a)$ and $0 < t < 1$, then
\[ \|tx + (1 - t)y - a\| = \|t(x - a) + (1 - t)(y - a)\| \le \|t(x - a)\| + \|(1 - t)(y - a)\| < tr + (1 - t)r = r. \]
In general, spheres are not convex. (Consider $(\mathbb{R}^2, d_2)$.)

8. By Exercise 8.1.6, $\rho$ is a metric. Since $e^x$ is a continuous function on $\mathbb{R}$ with continuous inverse, $\rho(x_n, x) \to 0$ iff $|x_n - x| \to 0$. Therefore, $\rho$ is topologically equivalent to the usual metric of $\mathbb{R}$. $(\mathbb{R}, \rho)$ is not complete in this metric. For example, $\{-n\}_{n=1}^{\infty}$ is a Cauchy sequence in $(\mathbb{R}, \rho)$ with no limit. Therefore, $\rho$ cannot be metrically equivalent to the usual metric of $\mathbb{R}$.

12. Let $\{f_n\}$ be a sequence in $C$ converging uniformly to $f$. Then $f_n(x) = f_n(1 - x)$ for all $n$ and $x$. Taking limits yields $f(x) = f(1 - x)$ for all $x$. To see that $C$ is not closed in the metric of Exercise 8.1.22, define $f_n \in C$ by $f_n(1/2) = 1$, $f_n(x) = 0$ if $x \in [0, 1/2 - 1/n] \cup [1/2 + 1/n, 1]$, and linear on $[1/2 - 1/n, 1/2 + 1/n]$.

Section 8.3 1. (a) cl(A) ∪ cl(B) is closed and ⊇ A ∪ B, so cl(A) ∪ cl(B) ⊇ cl(A ∪ B). Similarly, cl(A ∪ B) ⊇ cl(A) and cl(A ∪ B) ⊇ cl(B). (d) int(A)∪int(B) is open and ⊆ A∪B, hence int(A)∪int(B) ⊆ int(A∪B). The example A = (0, 1], B = (1, 2) in R produces strict inclusion. (f) bd(cl(A)) = cl(cl(A)) \ int(cl(A)) ⊆ cl(A) \ int(A) = bd(A). The example A = Q in R produces strict inclusion.

3. (b) $\{(x, y, 0) : x^2 + y^2 = 1\}$. (e) $\{(1, 0), (0, 0)\}$. (f) The circle $\{(x, y) : x^2 + y^2 = 1\}$ together with the point $(0, 0)$.

6. (a) By 8.3.6, $y \in \mathrm{cl}_Y(A)$ iff for any sequence $\{a_n\}$ in $A$ with $a_n \to y$, $y \in A$. The same characterization can be given for $y \in \mathrm{cl}_X(A) \cap Y$.

8. The sequence $\{f_n\}$ has no cluster points in $\bigl(C([0, 1]), \|\cdot\|_\infty\bigr)$, hence the set $\{f_1, f_2, \ldots\}$ is closed. The identically zero function is a cluster point of the sequence in $\bigl(C([0, 1]), \|\cdot\|_1\bigr)$, hence the set is not closed in this space.

9. (a) $B$ is open and $B \subseteq C$, hence $B \subseteq \mathrm{int}(C)$. The example $B_1(x) = \{x\}$ and $C_1(x) = X$ in a nontrivial discrete space gives strict inclusion.

12. (b) By 8.3.9, for any $y \in \mathbb{R}$ there exist integers $n_k > 0$, $m_k$ such that $n_k/(2\pi) + m_k \to y - x/(2\pi)$, hence
\[ \sin(n_k + x) = \sin\bigl(2\pi\bigl[(n_k + x)/(2\pi) + m_k\bigr]\bigr) \to \sin(2\pi y). \]
Therefore, the set is dense in $[-1, 1]$.

16. Let $u \in U$ and choose $\varepsilon > 0$ such that $B_\varepsilon(u) \subseteq U$. Since $Y$ is dense in $X$, $B_\varepsilon(u) \cap U \cap Y = B_\varepsilon(u) \cap Y \ne \emptyset$. If $U$ is not open, then the assertion may not hold. For example, take $X = [0, 1]$, $Y = (0, 1]$, and $U = \{0\}$.

20. (a) Let $u, v \in I := \bigcup_{i \in \mathcal{I}} I_i$ and $t \in (0, 1)$. Then $u \in I_i$ and $v \in I_j$ for some $i, j \in \mathcal{I}$. Since $I_i \cap I_j \ne \emptyset$, $I_i \cup I_j$ is an interval. Therefore, $tu + (1 - t)v \in I_i \cup I_j \subseteq I$, hence $I$ is an interval. Since each $I_i$ is open, $I$ is open.

Section 8.4

1. (b), (k), (o), (r) Limit and iterated limits are 0. (e) Limit does not exist. One iterated limit is 0, the other is 1. (i) Limit and iterated limits exist and $= 1/2$.

2. (a) The limit is 1 since
\[ \left|\frac{x^2 - 5y^2}{x^2 + 3y^2} - 1\right| = \frac{8y^2}{x^2 + 3y^2} < \frac{8y^2}{(|y|/a)^{2/p} + 3y^2} \le 8a^{2/p}|y|^{2(1 - 1/p)} \to 0. \]
(b) The limit does not exist, as may be seen by converting to polar coordinates.


3. By the Cauchy mean value theorem, for each $x \ne y$ there exists a number $\theta = \theta(x, y)$ between $x$ and $y$ such that
\[ g(x, y) = \frac{f'(\theta)}{\cos\theta}. \]
Since $\lim_{y \to x} \theta(x, y) = x$, define $g(x, x) = f'(x)/\cos x$.

6. This follows from Exercise 8.1.5.

7. Given $\varepsilon > 0$, choose $\delta > 0$ such that $|f(x) - f(a)| < \varepsilon$ for all $x, a$ with $|x - a| < \delta$. Let $\sqrt{(x - a)^2 + (y - b)^2} < \delta/2$. Then
\begin{align*}
\Bigl|\sqrt{x^2 + y^2} - \sqrt{a^2 + b^2}\Bigr| &= \frac{|x^2 + y^2 - a^2 - b^2|}{\sqrt{x^2 + y^2} + \sqrt{a^2 + b^2}} \\
&\le \frac{|x - a|(|x| + |a|) + |y - b|(|y| + |b|)}{\sqrt{x^2 + y^2} + \sqrt{a^2 + b^2}} \\
&\le |x - a| + |y - b| \\
&\le 2\sqrt{(x - a)^2 + (y - b)^2} < \delta,
\end{align*}
hence $|g(x, y) - g(a, b)| < \varepsilon$.

8. For a proof using the sequential criterion for uniform continuity, let $x_n - a_n, y_n - b_n \to 0$. Then $\alpha x_n + \beta y_n - (\alpha a_n + \beta b_n) \to 0$, hence
\[ g(x_n, y_n) - g(a_n, b_n) = f(\alpha x_n + \beta y_n) - f(\alpha a_n + \beta b_n) \to 0. \]
The functions $xy$ and $\sin(xy)$ are not uniformly continuous on $\mathbb{R}^2$. (For the former take $x_n = y_n = n + 1/n$ and $a_n = b_n = n$. For the latter take $x_n = y_n = \sqrt{2\pi}\,[n + 1/(3n)]$ and $a_n = b_n = \sqrt{2\pi}\,n$.)

11. This follows from the inequalities
\[ |f_j(x) - f_j(a)| \le \|f(x) - f(a)\| \le \sum_{j=1}^{n} |f_j(x) - f_j(a)|. \]

12. We prove the uniform continuity part. Given $\varepsilon > 0$, choose a fixed $n$ such that $\rho(f_n(x), f(x)) < \varepsilon/3$ for all $x \in X$. Then choose $\delta > 0$ such that $\rho(f_n(x), f_n(a)) < \varepsilon/3$ for all $x, a \in X$ with $d(x, a) < \delta$. The triangle inequality then shows that $\rho(f(x), f(a)) < \varepsilon$ for all $x, a \in X$ with $d(x, a) < \delta$.
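The counterexamples in Exercise 8 of Section 8.4 (for $xy$ and $\sin(xy)$) can be illustrated numerically. This sketch reads the second pair of sequences as $x_n = y_n = \sqrt{2\pi}\,[n + 1/(3n)]$ and $a_n = b_n = \sqrt{2\pi}\,n$, which is an interpretation of the garbled original; the function names are illustrative.

```python
import math

def product_gap(n):
    """Return (x_n - a_n, x_n*y_n - a_n*b_n) for x_n = y_n = n + 1/n, a_n = b_n = n."""
    x = n + 1.0 / n
    a = float(n)
    return x - a, x * x - a * a

def sine_gap(n):
    """Return (x_n - a_n, sin(x_n*y_n) - sin(a_n*b_n)) for the sqrt(2*pi)-scaled sequences."""
    s = math.sqrt(2.0 * math.pi)
    x = s * (n + 1.0 / (3.0 * n))
    a = s * n
    return x - a, math.sin(x * x) - math.sin(a * a)
```

In both cases the first component tends to 0 while the second stays away from 0 (near 2 for the product, near $\sin(4\pi/3)$ for the sine), which is what the sequential criterion needs.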

Section 8.5

1. (a) compact. (b) closed, not bounded. (f) bounded, not closed. (h) neither bounded nor closed.


3. Compact case: Let $\{U_i : i \in I\}$ be an open cover of $C := C_1 \cup \cdots \cup C_k$, where each $C_j$ is compact. For each $j$ there exists a finite set $I_j \subseteq I$ such that $\{U_i : i \in I_j\}$ covers $C_j$. If $I_0$ is the union of the $I_j$, then $\{U_i : i \in I_0\}$ is a finite subcover of $C$.

4. Such an intersection is closed and contained in a compact set and is therefore compact.

7. If $E$ is totally bounded, then $\mathrm{cl}(E)$ is totally bounded. Since $X$ is complete, $\mathrm{cl}(E)$ is complete. Therefore, by 8.5.8, $\mathrm{cl}(E)$ is sequentially compact. In particular, every sequence in $E$ has a cluster point in $X$. Conversely, assume every sequence in $E$ has a subsequence that converges in $X$. Let $\{y_n\}$ be a sequence in $\mathrm{cl}(E)$. For each $n$, choose $x_n \in E$ such that $d(x_n, y_n) < 1/n$. By hypothesis, a subsequence $x_{n_k}$ converges to some $x \in X$, hence $y_{n_k} \to x$. Therefore, $\mathrm{cl}(E)$ is sequentially compact, hence totally bounded.

11. Suppose $\bigcap_{n=1}^{\infty} C_n = \emptyset$. Then $\bigcup_{n=1}^{\infty} C_n^c = X$, hence $\{C_n^c : n \in \mathbb{N}\}$ is an open cover of $X$ and therefore also of $C_1$. Choose $n \in \mathbb{N}$ such that $C_1 \subseteq C_1^c \cup \cdots \cup C_n^c$. Taking complements, $C_n = C_1 \cap \cdots \cap C_n \subseteq C_1^c \subseteq C_n^c$, which is impossible.

13. By the approximation property of suprema, there exist sequences $\{a_n\}$ and $\{b_n\}$ in $A$ such that $d(a_n, b_n) \to d(A)$. Since $A$ is compact, there exists a subsequence $\{a_n'\}$ of $\{a_n\}$ converging to some $a \in A$. Similarly, there exists a subsequence $\{b_n''\}$ of the corresponding subsequence $\{b_n'\}$ that converges to some $b \in A$. It follows that $d(a, b) = \lim_n d(a_n'', b_n'') = d(A)$. For the example, take $A = \{f_n\}$ in $C([0, 1])$ with the sup metric, where $f_n(x) = x^n$. Then $d(A) = 1 > d(f_n, f_m)$ for all $m, n$.

15. (a) For any $a \in A$, $d(A, x) \le d(a, x) \le d(a, y) + d(y, x)$, hence $d(A, x) - d(y, x) \le d(a, y)$. Taking the infimum over $a$ yields $d(A, x) - d(y, x) \le d(A, y)$ or $d(A, x) - d(A, y) \le d(y, x)$. Interchanging $x$ and $y$ yields (a).

(b) If $x \notin \mathrm{cl}(A)$ there exists $r > 0$ such that $B_r(x) \cap \mathrm{cl}(A) = \emptyset$. Then $d(a, x) \ge r$ for all $a \in A$, hence $d(A, x) > 0$. Conversely, assume $x \in \mathrm{cl}(A)$ and let $a_n \in A$ with $a_n \to x$. Since $d(A, x) \le d(a_n, x) \to 0$, $d(A, x) = 0$.

(c) By (b), the denominator of $F_{AB}(x)$ is positive, hence $F_{AB}$ is well-defined. Continuity follows from (a), and clearly $0 \le F_{AB} \le 1$. The last assertions follow from (b).

(d) $U = \{x \in X : F_{AB}(x) < 1/2\}$, $V = \{x \in X : F_{AB}(x) > 1/2\}$.

19. Let $x_n := f(1/n)$ and $y_n := f(2\pi - 1/n)$. Then $\lim_n x_n = \lim_n y_n = (1, 0)$ but $f^{-1}(x_n) = 1/n \to 0$ and $f^{-1}(y_n) = 2\pi - 1/n \to 2\pi$.

21. Each set is a continuous image of the compact set $A \times B$.


Section 8.6

3. Suppose that $\mathcal{F}$ is equicontinuous at $a$. Given $\varepsilon > 0$, choose $\delta > 0$ such that $\rho(f(x), f(a)) < \varepsilon$ for all $x \in X$ with $d(x, a) < \delta$ and all $f \in \mathcal{F}$. Given sequences $\{f_n\}$ in $\mathcal{F}$ and $\{x_n\}$ in $E$ with $x_n \to a$, choose $N$ such that $d(x_n, a) < \delta$ for all $n \ge N$. For such $n$, $\rho\bigl(f_n(x_n), f_n(a)\bigr) < \varepsilon$. Conversely, suppose that $\mathcal{F}$ is not equicontinuous at $a$. Then there exist $\varepsilon > 0$ and members $x_n$ of $E$ and $f_n$ of $\mathcal{F}$ such that $d(x_n, a) < 1/n$ but $\rho\bigl(f_n(x_n), f_n(a)\bigr) \ge \varepsilon$. Therefore, the sequential condition does not hold.

7. Let $x > a \ge c$. By the mean value theorem applied to the function $f(z) = z^{-p}$ on $(na, nx)$,
\[ \left|\frac{1}{(nx)^p} - \frac{1}{(na)^p}\right| = \frac{pn|x - a|}{y_n^{p+1}} \quad\text{for some } y_n \in (na, nx). \]
Since $y_n^{p+1} \ge (nc)^{p+1} \ge n c^{p+1}$, $|(nx)^{-p} - (na)^{-p}| \le p|x - a|c^{-(p+1)}$, which shows equicontinuity.

9. Take $x_n = a + \pi/n$ in Exercise 3. Then $x_n \to a$ but $\sin(nx_n) - \sin(na) = -2\sin(na)$, which has no limit if $a$ is a nonzero rational number.

11. By the mean value theorem, $|f(x) - f(y)| \le M|x - y|$.

14. Let $\|f_i\|_\infty \le M$ for all $i$. Then $|F_i(x) - F_i(y)| \le M|x - y|$, hence $\mathcal{F}$ is uniformly equicontinuous on $[a, b]$. It follows that the uniform closure $\mathcal{G}$ of $\mathcal{F}$ in $C([a, b])$ is uniformly equicontinuous on $[a, b]$ (Exercise 6). Since $\mathcal{G}$ is also closed and bounded, it is compact (Arzelà–Ascoli Theorem), hence totally bounded.

Section 8.7 1. (c) not connected.

(d) path connected, hence connected.

(e) connected iff $-1 \le a \le 1$.

5. Then $f(u)$ and $f(v)$ have opposite signs, say $f(u) < 0 < f(v)$. Since the range of $f$ is connected, it contains the interval $(f(u), f(v))$.

7. Let $f = (g, h) : X \to \mathbb{R}^2$ and $L := \{(x, x) : x \in \mathbb{R}\}$. Then $L$ separates $\mathbb{R}^2$ into two open half-planes $H_1$ and $H_2$. Choose any $x_0 \in X$ and suppose $f(x_0) \in H_1$. Then $E := f^{-1}(H_1^c) = f^{-1}(H_2)$ is both open and closed. Since $X$ is connected, $E = \emptyset$. Therefore, $f(X) \subseteq H_1$.


9. Consider the case $B := B_1(0)$. Any point in $B^c$ may be connected to the sphere $S := S_2(0)$ by a radial line segment. Since $S$ is path connected (8.7.10), $B^c$ is path connected.

12. Denote the union by $A$. Let $f : A \to \{0, 1\}$ be continuous. Since $A_n$ is connected, $f(A_n)$ is a single point. Since $A_n \cap A_{n+1} \ne \emptyset$, an induction argument shows that $f$ is constant.

16. Suppose that $f : L \to C$ is such a function. Then $f^{-1} : C \to L$ is continuous (8.5.11). Remove a point $p$ from the interior of $L$. Then $f^{-1}$ maps the connected set $C \setminus f(p)$ onto the disconnected set $L \setminus p$. The function $f(t) = (\cos t, \sin t)$ maps $[0, 2\pi]$ continuously onto the circle $x^2 + y^2 = 1$.

20. Let $x \in \mathrm{bd}(A)$ and $\varepsilon > 0$. Then there exist $u, v \in B_\varepsilon(x)$ such that $f(u) \ge c$ and $f(v) < c$. Since $B_\varepsilon(x)$ is convex, it is connected, hence $f\bigl(B_\varepsilon(x)\bigr)$ is an interval and so must contain $c$. Taking $\varepsilon = 1/n$, we may construct a sequence $x_n \to x$ with $f(x_n) = c$ for each $n$. Therefore, $f(x) = c$. This shows that $\mathrm{bd}(A) \subseteq B$. The example $f(x) = x^2$ on $\mathbb{R}$ with $c = 0$ shows that the inclusion may be strict.

22. (a) $C_x$ is connected by Exercise 13. Let $u \in C_x$ and choose $\varepsilon > 0$ such that $B_\varepsilon(u) \subseteq U$. Since $B_\varepsilon(u)$ is connected, $B_\varepsilon(u) \cup C_x$ is connected, hence $B_\varepsilon(u) \subseteq C_x$. Therefore, $C_x$ is open. If $C_x \cap C_y \ne \emptyset$, then $C_x \cup C_y$ is connected, hence $C_x = C_y$. Therefore, $U$ is a union of pairwise disjoint components.

(b) Choose a point with rational coordinates in each component in (a). Since these points form a countable set, the union is countable.

Section 8.8

3. Choose a sequence of polynomials $P_n$ converging uniformly to $f$ on $[a, b]$. By hypothesis, $\int_a^b f P_n = 0$ for all $n$, hence $\int_a^b f^2 = 0$. Since $f$ is continuous, $f = 0$. If $a \ge 0$, then the polynomials with even powers form a separating algebra, hence the result follows as before.

6. By the Stone–Weierstrass theorem, there exists a sequence of functions $g_n$ in $\mathcal{A}$ converging uniformly to $f$. Set $f_n = g_n - g_n(x_0)$. Then $f_n \in \mathcal{A}$ and $g_n(x_0) \to 0$, hence
\[ \|f_n - f\|_\infty \le \|f_n - g_n\|_\infty + \|g_n - f\|_\infty = |g_n(x_0)| + \|g_n - f\|_\infty \to 0. \]

9. By 8.8.8, there exists a sequence $\{T_n\}$ of trigonometric polynomials converging uniformly to $f$ on $[0, 2\pi]$. For any $j$, $\sin(jx)$ and $\cos(jx)$ are linear combinations of products $\sin^m x \cos^n x$, hence, by hypothesis, $\int_0^{2\pi} f(x)T_n(x)\,dx = 0$ for all $n$. Therefore, $\int_0^{2\pi} f^2 = 0$, so $f = 0$.


11. The set of all functions of the form $T(x) := b_0 + \sum_{j=1}^{m} b_j \sin(jx)$ on $[-\pi/2, \pi/2]$ is an algebra $\mathcal{A}$ containing the constant functions. Since $\sin x$ separates points, so does $\mathcal{A}$. Therefore, given $\varepsilon > 0$, $\|f - T\|_\infty < \varepsilon/2$ for some $T$. Since $f(0) = 0$, $|b_0| < \varepsilon/2$. Therefore, $\|f - (T - b_0)\|_\infty < \varepsilon$.

15. The functions $\sum_{i=1}^{n} g_i(x)h_i(y)$ form an algebra and separate points of $X \times Y$.

Section 8.9

1. Assume that $X$ has the decreasing sequence property and let $\{x_n\}$ be a Cauchy sequence in $X$. Take $C_n = \mathrm{cl}\{x_n, x_{n+1}, \ldots\}$. Then $C_n$ is closed, $C_{n+1} \subseteq C_n$ and $d(C_n) \to 0$ (because $\{x_n\}$ is Cauchy). By assumption, there exists $x \in X$ such that $x \in C_n$ for all $n$. It follows that some subsequence of $\{x_n\}$ converges to $x$. Therefore $x_n \to x$ (Exercise 8.1.9).

3. Let $\{r_1, r_2, \ldots\}$ be an enumeration of $\mathbb{Q}$. Then $U_n := \{r_n, r_{n+1}, \ldots\}$ is open and dense in $\mathbb{Q}$ but $\bigcap_n U_n = \emptyset$.

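Section 9.1

1. (a) $\dfrac{2y\,dx - 2x\,dy}{(x + y)^2}$. (e) $\cos(x^2 y)(2xy\,dx + x^2\,dy)$. (h) $e^{xy^2}(y^2\,dx + 2xy\,dy)$.

2. (b) $\begin{pmatrix} e^x \sin y & e^x \cos y \\ e^y \cos x & e^y \sin x \end{pmatrix}$. (c) $\dfrac{1}{(x^2 + y^2)^2}\begin{pmatrix} y^3 - x^2 y & x^3 - xy^2 \\ 4xy^2 & -4y^3 \end{pmatrix}$.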

3. Let $\Delta = \{(x, x) : x \in \mathbb{R}\}$. (a) Differentiable on $\mathbb{R}^2$ iff $p, q > 3$, in which case partials are continuous. (d) Differentiable on $\mathbb{R}^2$ iff $p, q > 1$. Partials are continuous iff $p > 2$.

4. (a) Differentiable and partials continuous iff $p + q > 1$. (d) Differentiable and partials continuous iff $p + q > 2s + 1$.

8. $x \cdot \nabla f(x) = a \cdot f(x)$, $x \cdot \nabla g(x) = g(x)$.

10. $e^{-f(x)}(e^{x_1}, e^{x_2}, \ldots, e^{x_n})$.

12. (a) $\dfrac{x_i}{\|x\|}$. (c) $\dfrac{\|x\|^2 - x_i^2}{\|x\|^3}$.

Section 9.2

1. Let $\alpha$ denote the right side of the inequality. Clearly, $\|T\| \le \alpha$. If $\|x\| \le 1$, then $\|Tx\| \le \|T\|\|x\| \le \|T\|$, hence $\alpha \le \|T\|$.

558

A Course in Real Analysis

3. Since $\nabla(\psi^{-1}) = -\psi^{-2}\nabla\psi$, the assertion follows from the scalar product rule.

4. (a) Let $f(x) = x$ and $\psi(x) = \|x\|$ in the product rule (9.2.6). Since $df_x$ is the identity transformation and $\nabla\psi(x) = x/\|x\|$, $dg_x(h) = \|x\|h + \|x\|^{-1}(x \cdot h)x$. Therefore, $dg_x(x) = \|x\|x + \|x\|^{-1}(x \cdot x)x = 2\|x\|x$.

6. Let $\eta(h)$, $\mu(k)$ be such that
\[ f(a + h) = f(a) + df_a(h) + \|h\|\eta(h), \qquad g(b + k) = g(b) + dg_b(k) + \|k\|\mu(k) \]
for all $h \in \mathbb{R}^p$, $k \in \mathbb{R}^q$ with $\|h\|$, $\|k\|$ sufficiently small, and
\[ \lim_{h \to 0} \eta(h) = \lim_{k \to 0} \mu(k) = 0. \]
Let $T(h, k) = \alpha\,df_a(h) + \beta\,dg_b(k)$. Then $T$ is linear in $(h, k)$ and
\[ \varepsilon(h, k) := F(a + h, b + k) - F(a, b) - T(h, k) = \alpha\|h\|\eta(h) + \beta\|k\|\mu(k). \]
Since $\|(h, k)\| = \sqrt{\|h\|^2 + \|k\|^2} \ge \|h\|, \|k\|$,
\[ \frac{\|\varepsilon(h, k)\|}{\|(h, k)\|} \le \frac{|\alpha|\|h\|\|\eta(h)\| + |\beta|\|k\|\|\mu(k)\|}{\|(h, k)\|} \le |\alpha|\|\eta(h)\| + |\beta|\|\mu(k)\|. \]

10. Part (a) follows from Exercise 8.5.14. For (b), set $g(t) = \|f(t) - v\|^2$. Then $g'(t) = 2(f(t) - v) \cdot f'(t)$, and since $g(t_0)$ is the minimum value of $g$, $g'(t_0) = 0$.

Section 9.3

1. $g'\bigl(\varphi(x)\psi(y)\bigr)\bigl(\varphi'(x)\psi(y),\ \varphi(x)\psi'(y)\bigr)$.

3. $g_x(a \cdot x, b \cdot x)\,a + g_y(a \cdot x, b \cdot x)\,b$.

7. $T f'(x)$.

10. (a) Let $g(t) = f(a + tu)$. By definition, $D_u f(a) = g'(0)$. On the other hand, by the chain rule, $g'(t) = u \cdot \nabla f(a + tu)$. Setting $t = 0$ yields (a).

(c) $\displaystyle\lim_{t\to 0}\frac{f(tu) - f(0)}{t} = \lim_{t\to 0}\frac{ab^2}{a^2 + b^4t^2}$ exists for all $u = (a, b)$. $f$ is not continuous at $(0, 0)$, since $f \to 0$ along $y = 0$ but $f = 1/2$ along $y = \sqrt{x}$, $x > 0$.

Solutions to Selected Problems

12. Let $F(x) = \int_a^b f(t, x)\,dt$. By the mean value theorem,

$$\frac{F(x+h) - F(x)}{h} = \int_a^b f_x(t, x + rh)\,dt, \quad \text{for some } r = r(t, x, h) \in (0, 1).$$

Since $f_x$ is uniformly continuous, $f_x(t, x + rh) \to f_x(t, x)$ uniformly in $t$ on $[a, b]$ as $h \to 0$. Therefore, $F'(x) = \int_a^b f_x(t, x)\,dt$.

15. Let $\varphi(t) = t^{-p} f(tx)$. By the product rule and the chain rule,

$$\varphi'(t) = \frac{-p}{t^{p+1}} f(tx) + \frac{1}{t^p}\nabla f(tx) \cdot x.$$

If $f$ is homogeneous of degree $p$, then $\varphi$ is a constant function, hence

$$\frac{p}{t^{p+1}} f(tx) = \frac{1}{t^p}\nabla f(tx) \cdot x.$$

Setting $t = 1$ produces the desired identity. On the other hand, if the identity holds, then $tx \cdot \nabla f(tx) = p f(tx)$ for all $t$ and $x$, hence $\varphi'(t) = 0$. Therefore, $\varphi(t) = \varphi(1)$, which shows that $f$ is homogeneous of degree $p$.

17. Fix $y \in C$ and define $g$ on $U$ by $g(x) = f(x) - f(y) - df_y(x)$. Then $g(x) - g(y) = f(x) - f(y) - df_y(x - y)$ and $dg_z = df_z - df_y$ (9.1.7), hence the result follows from 9.3.6 applied to $g$.
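The identity x · ∇f(x) = p f(x) from Exercise 15 of Section 9.3, together with the fact that φ(t) = t^(−p) f(tx) is constant, can be illustrated on a concrete function. This is a small sketch under stated assumptions: the function x³ + xy² (homogeneous of degree 3) and the sample point are chosen here for illustration only.

```python
def f(x, y):
    """Sample function, homogeneous of degree p = 3 (an illustrative choice)."""
    return x**3 + x * y**2

def grad_f(x, y):
    """Gradient of the sample function, computed by hand."""
    return (3 * x**2 + y**2, 2 * x * y)

p = 3
x, y = 2.0, -1.5
fx, fy = grad_f(x, y)

# Euler's identity: x . grad f(x) - p f(x) should vanish
euler_gap = x * fx + y * fy - p * f(x, y)

# phi(t) = t^{-p} f(tx) should not depend on t
phi_values = [t**(-p) * f(t * x, t * y) for t in (0.5, 1.0, 2.0, 4.0)]
```

All entries of `phi_values` agree with f(x, y), which is the constancy of φ used in the solution.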

Section 9.4

1. (a) $\{(x, y) : x \ne y\}$. (b) $\{(x, y) : x + y \ne (2n + 1)\pi/2,\ n \in \mathbb{Z}\}$. (e) $\{(x, y) : xy \ne 0\}$. (f) $\{(x, y) : x, y > 0,\ y \ne x\}$. (i) $\{(x, y) : y \ne \pm x\}$. (j) $\{(x, y, z) : xyz \ne 0\}$.

2. (i) $x = \tfrac12\bigl(u + \sqrt{u^2 - 4v}\bigr)$, $y = \tfrac12\bigl(u - \sqrt{u^2 + 4v}\bigr)$.

(v) $x = \tfrac{1}{\sqrt2}\bigl(u - \sqrt{u^2 - 4v^2}\bigr)^{1/2}$, $y = \tfrac{1}{\sqrt2}\bigl(u + \sqrt{u^2 - 4v^2}\bigr)^{1/2}$.

4. Set $u = x(x^2 + y^2)^{-1}$ and $v = y(x^2 + y^2)^{-1}$. Square and add. $J_f = -1$.

Section 9.5

1. Let $f(x, y) = x + y^2 + e^{xy} - 1$. Then $f_x(0, 0) = 1$ and $f_y(0, 0) = 0$, so the implicit function theorem guarantees a local solution $x = x(y)$ but says nothing about a solution $y = y(x)$.

5. Let $F(x, y, z) = \sin(x + z) + \ln(y + z) - \sqrt2/2$ and $G(x, y, z) = e^{xz} + \sin(\pi y + z) - 1$. Then, at $(\pi/4, 1, 0)$, $F = G = 0$ and

$$\frac{\partial(F, G)}{\partial(x, y)}\,\frac{\partial(F, G)}{\partial(y, z)}\,\frac{\partial(F, G)}{\partial(x, z)} \ne 0.$$

8. Let $F = x - y + z + u^2 - 2$, $G = -x + 2z + u^3 - 2$, $H = -y + 3z + u^4 - 3$. Then, at $(1, 1, 1, 1)$, $F = G = H = 0$ and

$$\frac{\partial(F, G, H)}{\partial(x, y, u)}\,\frac{\partial(F, G, H)}{\partial(y, z, u)}\,\frac{\partial(F, G, H)}{\partial(x, z, u)} \ne 0.$$

9. (b) Let $a := f_x(0, 0)$ and $b := f_y(0, 0)$. The condition is $b(a + 1) \ne 0$. The derivative is

$$\frac{-f_x(x, y)\,f_x\bigl(f(x, y), y\bigr)}{f_y(x, y)\,f_x\bigl(f(x, y), y\bigr) + f_y\bigl(f(x, y), y\bigr)}.$$

11. (a) The condition is $a(a^3 - ab^2 - b^3) \ne 0$, where $a := f_x(0, 0)$, $b := f_y(0, 0)$.

13. $f'(1) + g'(1) + h'(1) \ne 0$.

15. Let $y = F(x_1, \ldots, x_n)$. If $x_1$ is a function of $x_2, \ldots, x_n$, then, assuming the necessary differentiability,

$$0 = \frac{\partial y}{\partial x_n} = F_{x_1}\frac{\partial x_1}{\partial x_n} + F_{x_n}, \quad\text{hence}\quad \frac{\partial x_1}{\partial x_n} = -\frac{F_{x_n}}{F_{x_1}}.$$

In this manner we obtain

$$\frac{\partial x_2}{\partial x_1}\,\frac{\partial x_3}{\partial x_2}\cdots\frac{\partial x_n}{\partial x_{n-1}}\,\frac{\partial x_1}{\partial x_n} = (-1)^n\,\frac{F_{x_1}}{F_{x_2}}\,\frac{F_{x_2}}{F_{x_3}}\cdots\frac{F_{x_{n-1}}}{F_{x_n}}\,\frac{F_{x_n}}{F_{x_1}} = (-1)^n.$$
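The claims in Exercise 8 of Section 9.5 (that F = G = H = 0 at (1, 1, 1, 1) and that the three Jacobian determinants are nonzero there) can be verified by direct computation. The sketch below uses exact integer arithmetic; the helper names `det3`, `partials`, and `jacobian` are hypothetical and introduced only for this illustration, and the partial derivatives are entered by hand from the formulas for F, G, H.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def residuals(x, y, z, u):
    """Values of F, G, H from Exercise 8."""
    return (x - y + z + u**2 - 2,
            -x + 2 * z + u**3 - 2,
            -y + 3 * z + u**4 - 3)

def partials(x, y, z, u):
    """Rows: gradients of F, G, H; columns ordered x, y, z, u."""
    return [[1, -1, 1, 2 * u],
            [-1, 0, 2, 3 * u**2],
            [0, -1, 3, 4 * u**3]]

def jacobian(rows, cols):
    """Determinant of the 3x3 submatrix with the given column indices."""
    return det3([[row[c] for c in cols] for row in rows])

point = (1, 1, 1, 1)
rows = partials(*point)
X, Y, Z, U = 0, 1, 2, 3
dets = {
    "x,y,u": jacobian(rows, (X, Y, U)),
    "y,z,u": jacobian(rows, (Y, Z, U)),
    "x,z,u": jacobian(rows, (X, Z, U)),
}
```

Since none of the three determinants vanishes, their product is nonzero, as the solution asserts.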

Section 9.6

1. (b) $z_{rr} = t^2 z_{xx} + 2t z_{xy} + z_{yy}$, $z_{tt} = r^2 z_{xx} + 2r z_{xy} + z_{yy}$.

(e) $z_{rr} = (e^{2r}\sin^2 t)z_{xx} + (e^{2r}\cos^2 t)z_{yy} + (2e^{2r}\sin t\cos t)z_{xy} + (e^r\sin t)z_x + (e^r\cos t)z_y$,
$z_{tt} = (e^{2r}\cos^2 t)z_{xx} + (e^{2r}\sin^2 t)z_{yy} - (2e^{2r}\sin t\cos t)z_{xy} - (e^r\sin t)z_x - (e^r\cos t)z_y$.

(f) $z_r = axz_x$, $z_t = byz_y$, $z_{rr} = a^2x^2z_{xx} + a^2xz_x$, $z_{tt} = b^2y^2z_{yy} + b^2yz_y$.

4. $F_x + z_xF_z = 0$, hence

$$0 = F_{xx} + 2z_xF_{xz} + z_x^2F_{zz} + z_{xx}F_z = F_{xx} - 2\frac{F_x}{F_z}F_{xz} + \frac{F_x^2}{F_z^2}F_{zz} + z_{xx}F_z,$$

and so

$$z_{xx} = -\frac{1}{F_z}F_{xx} + 2\frac{F_x}{F_z^2}F_{xz} - \frac{F_x^2}{F_z^3}F_{zz}.$$

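5. (a) $u_t = -k^2u$, $u_{xx} = -u$. (b) By logarithmic differentiation,

$$u_t = \left(\frac{x^2}{4k^2t^2} - \frac{1}{2t}\right)u, \qquad u_{xx} = \left(\frac{x^2}{4k^4t^2} - \frac{2}{4k^2t}\right)u.$$

7. The second order partial derivatives are

$$w_{\rho\rho} = (\sin\phi\cos\theta)^2w_{xx} + (\sin\phi\sin\theta)^2w_{yy} + (\cos\phi)^2w_{zz} + (2\sin\phi)\bigl[(\sin\phi\sin\theta\cos\theta)w_{xy} + (\cos\phi\cos\theta)w_{xz} + (\cos\phi\sin\theta)w_{yz}\bigr],$$

$$w_{\theta\theta} = (\rho\sin\phi)^2\bigl[(\sin^2\theta)w_{xx} + (\cos^2\theta)w_{yy} - 2(\sin\theta\cos\theta)w_{xy}\bigr] - (\rho\sin\phi)\bigl[(\cos\theta)w_x + (\sin\theta)w_y\bigr],$$

$$w_{\phi\phi} = \rho^2\bigl[(\cos\phi\cos\theta)^2w_{xx} + (\cos\phi\sin\theta)^2w_{yy} + (\sin\phi)^2w_{zz}\bigr] + 2\rho^2\bigl[(\cos^2\phi\sin\theta\cos\theta)w_{xy} - (\cos\phi\sin\phi\cos\theta)w_{xz} - (\cos\phi\sin\phi\sin\theta)w_{yz}\bigr] - \rho\bigl[(\sin\phi\cos\theta)w_x + (\sin\phi\sin\theta)w_y + (\cos\phi)w_z\bigr].$$

9. $f_{x_i} = px_i\|x\|^{p-2}g'(\|x\|^p)$, hence

$$f_{x_ix_i} = p\bigl[\|x\|^{p-2} + (p-2)x_i^2\|x\|^{p-4}\bigr]g'(\|x\|^p) + p^2x_i^2\|x\|^{2(p-2)}g''(\|x\|^p),$$

$$f_{x_ix_j} = px_ix_j\bigl[(p-2)\|x\|^{p-4}g'(\|x\|^p) + p\|x\|^{2(p-2)}g''(\|x\|^p)\bigr] \quad (i \ne j).$$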

Section 9.7

1. (b) $\dfrac{\partial^3f}{\partial x^3}(dx)^3 + 3\dfrac{\partial^3f}{\partial x^2\partial y}(dx)^2\,dy + 3\dfrac{\partial^3f}{\partial x\,\partial y^2}\,dx\,(dy)^2 + \dfrac{\partial^3f}{\partial y^3}(dy)^3$.

2. (a) $2y^2(3x + y)(dx)^2 + 12xy(x + y)\,dx\,dy + 2x^2(x + 3y)(dy)^2$.

(b) $\dfrac{6}{x^4y}(dx)^2 + \dfrac{4}{x^3y^2}\,dx\,dy + \dfrac{2}{x^2y^3}(dy)^2$.

(c) $-y^2\sin(xy)(dx)^2 + 2\bigl[\cos(xy) - xy\sin(xy)\bigr]dx\,dy - x^2\sin(xy)(dy)^2$.

(d) $2f(x, y)\bigl[(2x^2 + 1)(dx)^2 + 4xy\,dx\,dy + (2y^2 + 1)(dy)^2\bigr]$.

(e) $\dfrac{1}{(x^2 + y)^2}\bigl[2(y - x^2)(dx)^2 - 4x\,dx\,dy - (dy)^2\bigr]$.

3. zero.

5. (a) $f + h_1\dfrac{\partial f}{\partial x_1} + h_2\dfrac{\partial f}{\partial x_2} + h_3\dfrac{\partial f}{\partial x_3}$. The terms are evaluated at $a$.

8. By induction,

$$\frac{\partial^pf(x)}{\partial x_1^{p_1}\partial x_2^{p_2}\cdots\partial x_n^{p_n}} = b_1^{p_1}b_2^{p_2}\cdots b_n^{p_n}\,\varphi^{(p)}(b \cdot x),$$

hence

$$(x \cdot \nabla)^pf(0) = \varphi^{(p)}(0)\sum\binom{p}{p_1, p_2, \ldots, p_n}(b_1x_1)^{p_1}(b_2x_2)^{p_2}\cdots(b_nx_n)^{p_n} = \varphi^{(p)}(0)(b \cdot x)^p,$$

where the second equality follows from the multinomial theorem.

11. (a) $x + y - \tfrac16(x + y)^3$. (d) $x + y - \tfrac13(x + y)^3$.
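The multinomial step in Exercise 8 of Section 9.7 can be checked on integer data. The following sketch assumes the sum runs over all nonnegative integers p₁, …, pₙ with p₁ + ⋯ + pₙ = p; the vectors `b`, `x` and the exponent `p` are arbitrary sample values chosen for illustration.

```python
from itertools import product
from math import factorial

def multinomial(ps):
    """Multinomial coefficient p! / (p1! p2! ... pn!) with p = sum(ps)."""
    coeff = factorial(sum(ps))
    for q in ps:
        coeff //= factorial(q)
    return coeff

def multinomial_expansion(b, x, p):
    """Sum over p1+...+pn = p of multinomial(ps) * prod (b_i x_i)^{p_i}."""
    n = len(b)
    total = 0
    for ps in product(range(p + 1), repeat=n):
        if sum(ps) != p:
            continue
        term = multinomial(ps)
        for bi, xi, pi in zip(b, x, ps):
            term *= (bi * xi) ** pi
        total += term
    return total

b = (2, -1, 3)
x = (1, 4, -2)
p = 5
lhs = multinomial_expansion(b, x, p)               # expanded sum
rhs = sum(bi * xi for bi, xi in zip(b, x)) ** p    # (b . x)^p
```

Because everything is an integer, `lhs` and `rhs` can be compared exactly rather than up to rounding.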

Section 9.8

2. $x^2 + 2y^2 + 3z^2 - xy - yz - xz = \tfrac12\bigl[(x - y)^2 + (y - z)^2 + (x - z)^2\bigr] + y^2 + 2z^2 \ge 0$.

3. (a) $(0, 0)$: local min; $(-4/3, 4/3)$: saddle. (d) $(1, 1)$, $(-1, -1)$: local max; $(0, 0)$: saddle. (f) $(2, -2)$: saddle. (i) $(1/3, 1/3)$: local max; $(0, 0)$, $(0, 1)$, $(1, 0)$: saddle.

4. (a) Use polar coordinates to optimize the resulting single variable function $g(\theta) = \cos\theta + \sin\theta$, $g'(\theta) = -\sin\theta + \cos\theta$, $0 \le \theta \le 2\pi$. The critical points of $g$ occur at values of $\theta$ that satisfy $\sin\theta = \cos\theta = \pm\sqrt2/2$. At these values, $g(\theta) = \pm\sqrt2$. Also, $g(0) = g(2\pi) = 1$. Therefore, the maximum and minimum values of $f$ are $\pm\sqrt2$.

6. (b) The only critical point is $(2/3, -1/3)$. On $\mathrm{bd}(D)$, $f = x^2 - x + 2$, $-1 \le x \le 1$, which has critical point $(\pm1/\sqrt2, \pm1/\sqrt2)$. Checking the values of $f$ at these points and at $(\pm1, 0)$ shows that the maximum of $f$ is $f(-1, 0) = f(-1/\sqrt2, -1/\sqrt2) = 2$ and the minimum is $f(2/3, -1/3) = -1/3$.

(d) The only critical point is $(0, 0)$. On $\mathrm{bd}(D)$, $f = \pm\sin\bigl(x\sqrt{1 - x^2}\bigr)$, $-1 \le x \le 1$, which has critical points $(\pm1/\sqrt2, \pm1/\sqrt2)$. Checking the values of $f$ at these points and at $(\pm1, 0)$ shows that the extreme values of $f$ are $\pm\sin(1/2)$.

10. Since

$$\lim_{(x,y)\to(0^+,0^+)}f(x, y) = \lim_{(x,y)\to(+\infty,+\infty)}f(x, y) = +\infty,$$

$f$ has a minimum on $(0, c) \times (0, d)$ for suitable $c, d > 0$, and the minimum must occur at a critical point. The unique critical point is $(a^{2/3}b^{-1/3}, a^{-1/3}b^{2/3})$, which gives the minimum $3(ab)^{1/3}$.

11. Let $f(m, b) = \sum_{i=1}^n(y_i - mx_i - b)^2$. Since not all $x$ coordinates are the same, $m$ must be bounded. Since the data is bounded, $b$ must be bounded. Therefore, the minimum exists and must occur at the unique critical point $(m, b)$ of $f$, which is determined by the system

$$\sum_{i=1}^n(y_i - mx_i - b)(-x_i) = \sum_{i=1}^n(y_i - mx_i - b)(-1) = 0.$$

It follows that $x \cdot y - m\|x\|^2 - nb\bar{x} = m\bar{x} - \bar{y} + b = 0$.

15. Let $f(x, y) = ax^2 + 2bxy + y^2$ and $g(x, y) = x^2 + y^2 - c^2$. The equation $\nabla f = \lambda\nabla g$ yields $ax + by = \lambda x$ and $bx + y = \lambda y$. Multiplying the first equation by $x$ and the second by $y$ and then adding yields $f(x, y) = \lambda(x^2 + y^2) = \lambda c^2$. Since the system $(a - \lambda)x + by = bx + (1 - \lambda)y = 0$ has a nontrivial solution iff the determinant of the coefficient matrix is zero, we obtain $\lambda^2 - (a + 1)\lambda + a - b^2 = 0$. Solving for $\lambda$ we see that the maximum and minimum values of $f$ on the circle are

$$\lambda c^2 = \Bigl[a + 1 \pm \sqrt{(a + 1)^2 - 4(a - b^2)}\Bigr](c^2/2).$$

17. We minimize $f(x, y, z) := (x - 1)^2 + (y - 2)^2 + (z - 3)^2$ subject to the constraint $g(x, y, z) := x^2 + y^2 - z = 0$. From $\nabla f = \lambda\nabla g$ we have $x - 1 = \lambda x$, $y - 2 = \lambda y$, $z - 3 = -\lambda/2$, from which it follows that $y = 2x$ and $z = 3 - (x - 1)/(2x)$. From $z = x^2 + y^2$ we then have $3 - (x - 1)/(2x) = 5x^2$, or $10x^3 - 5x - 1 = 0$.

19. We minimize $f(x, y, z) := (x - 1)^2 + (y - 2)^2 + (z - 3)^2$ subject to the constraint $g(x, y, z) := z^2 - x^2 - y^2 - 1 = 0$. From $\nabla f = \lambda\nabla g$ we have

$$x = \frac{1}{1 + \lambda}, \quad y = \frac{2}{1 + \lambda}, \quad z = \frac{3}{1 - \lambda},$$

hence $y = 2x$ and $z = 3x/(2x - 1)$. Substituting into $z^2 - x^2 - y^2 = 1$ yields the desired polynomial.

22. Let $f(x, y, z) = x + 2y + 3z$, $g_1(x, y, z) = x + y + z - 1$, and $g_2(x, y, z) = x^2 + y^2 + z^2 - 1$. From $\nabla f = \lambda_1\nabla g_1 + \lambda_2\nabla g_2$,

$$1 = \lambda_1 + 2\lambda_2x, \quad 2 = \lambda_1 + 2\lambda_2y, \quad 3 = \lambda_1 + 2\lambda_2z.$$

Subtracting yields $1 = 2\lambda_2(y - x) = 2\lambda_2(z - y)$, so $y - x = z - y$, or $x - 2y + z = 0$. Combining this with the constraint $x + y + z = 1$ yields $y = 1/3$ and $z = 2/3 - x$. From the constraint $x^2 + y^2 + z^2 = 1$ we obtain $x^2 - 2x/3 - 2/9 = 0$, so $x = (1 \pm \sqrt3)/3$. The maximum value of $f$ ($\approx 3.154701$) occurs when $x = (1 - \sqrt3)/3$, the minimum ($\approx 0.845299$) when $x = (1 + \sqrt3)/3$.
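The two candidate points found in Exercise 22 of Section 9.8 can be substituted back into both constraints and into f as a sanity check. This is a minimal numerical sketch; the variable names are illustrative, and the closed form 2 ± 2√3/3 for the extreme values is obtained here from f = 8/3 − 2x on the constraint set rather than quoted from the text.

```python
import math

def f(x, y, z):
    """Objective from Exercise 22."""
    return x + 2 * y + 3 * z

candidates = []
for sign in (+1, -1):
    x = (1 + sign * math.sqrt(3)) / 3          # roots of x^2 - 2x/3 - 2/9 = 0
    y = 1 / 3
    z = 2 / 3 - x
    plane_gap = x + y + z - 1                  # constraint g1 = 0
    sphere_gap = x * x + y * y + z * z - 1     # constraint g2 = 0
    candidates.append((f(x, y, z), x, plane_gap, sphere_gap))

f_max = max(c[0] for c in candidates)
f_min = min(c[0] for c in candidates)
```

With these values, `f_max` is approximately 3.1547 and `f_min` approximately 0.8453, and the larger value belongs to the root with the minus sign.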

24. We minimize $f(x) := \sum_{i=1}^n(x_i - b_i)^2$ subject to the constraint $g(x) := a \cdot x - c = 0$. From $\nabla f = \lambda\nabla g$ we have $(x_j - b_j) = \lambda a_j/2$, hence

$$(x_j - b_j)^2 = \frac{\lambda^2a_j^2}{4} = \frac{\lambda(a_jx_j - a_jb_j)}{2}, \quad 1 \le j \le n.$$

Adding and using the constraint,

$$f(x) = \frac{\lambda^2}{4}\|a\|^2 \quad\text{and}\quad f(x) = \frac{\lambda}{2}(a \cdot x - a \cdot b) = \frac{\lambda}{2}(c - a \cdot b).$$

Therefore, $\lambda = 2(c - a \cdot b)\|a\|^{-2}$, which gives the desired conclusion.

26. Let $f(x) = \|x - a\|^2$ and $g(x) = \|x\|^2 - 1$. The equation $\nabla f = \lambda\nabla g$ leads to the system $x_j - a_j = \lambda x_j$, or $x_j(1 - \lambda) = a_j$, $j = 1, \ldots, n$. Therefore, $x_j = a_j/(1 - \lambda)$, so by the constraint $\|a\|^2 = \sum_{j=1}^na_j^2 = (1 - \lambda)^2$, hence $x = \pm a/\|a\|$. The distance to the sphere is then the smaller of

$$\bigl\|\pm a\|a\|^{-1} - a\bigr\| = \bigl|1 \pm \|a\|^{-1}\bigr|\,\|a\|,$$

namely $\bigl|1 - \|a\|^{-1}\bigr|\,\|a\|$.

27. (a) Let $f(x) = a \cdot x$ and $g(x) = \sum_{i=1}^nb_i/x_i - 1$. From $\nabla f = \lambda\nabla g$ we have $a_i = -\lambda b_i/x_i^2$, hence

$$x_i = \mu\sqrt{b_i/a_i} \quad\text{and}\quad b_ix_i^{-1} = \sqrt{a_ib_i}/\mu, \qquad \mu := \sqrt{-\lambda}.$$

The constraint implies that $\mu = \sum_{i=1}^n\sqrt{a_ib_i}$. Since $a_ix_i = \mu\sqrt{a_ib_i}$, the minimum is $\Bigl(\sum_{i=1}^n\sqrt{a_ib_i}\Bigr)^2$.

That the value is indeed the minimum may be argued as follows. If $x$ is any point satisfying the constraint, then

$$f(x) = a_1x_1 + a_2x_2 + \cdots + a_{n-1}x_{n-1} + \frac{a_nb_n}{1 - \sum_{i=1}^{n-1}b_i/x_i},$$

where $x_i > b_i$. Thus $|f| \to +\infty$ as the variables $x_1, x_2, \ldots, x_{n-1}$ become large or as $\sum_{i=1}^{n-1}b_i/x_i$ nears 1. Therefore, the minimum occurs in the interior of a compact set, hence at the point obtained above.

30. Since $\mathrm{cl}(U)$ is compact, there exist points $u, v \in \mathrm{cl}(U)$ such that $f(u) \le f(x) \le f(v)$ for all $x \in \mathrm{cl}(U)$. If $f(u) = f(v)$, then $f$ is a constant function and the result follows. If $f(u) < f(v)$, then one of the points, say $u$, must lie in $U$. By 9.8.2, $f'(u) = 0$.
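The minimum value (Σ√(aᵢbᵢ))² from Exercise 27(a) of Section 9.8 can be illustrated numerically. The sketch below assumes positive coefficients aᵢ, bᵢ and positive variables; the sample data, the helper `random_feasible_point`, and the sampling ranges are illustrative choices. It evaluates f at the critical point and compares it with f at randomly generated points that satisfy the constraint.

```python
import math
import random

a = [1.0, 2.0, 3.0]   # sample positive coefficients (assumed)
b = [4.0, 1.0, 2.0]

def f(x):
    """Objective f(x) = a . x."""
    return sum(ai * xi for ai, xi in zip(a, x))

# Critical point from the solution: x_i = mu * sqrt(b_i / a_i)
mu = sum(math.sqrt(ai * bi) for ai, bi in zip(a, b))
x_star = [mu * math.sqrt(bi / ai) for ai, bi in zip(a, b)]
constraint_at_x_star = sum(bi / xi for bi, xi in zip(b, x_star))
f_star = f(x_star)

def random_feasible_point(rng):
    """Pick x1, x2 with b1/x1 + b2/x2 < 1, then solve the constraint for x3."""
    while True:
        x1 = rng.uniform(b[0], 50.0)
        x2 = rng.uniform(b[1], 50.0)
        used = b[0] / x1 + b[1] / x2
        if used < 0.99:
            return [x1, x2, b[2] / (1.0 - used)]

rng = random.Random(0)
samples = [f(random_feasible_point(rng)) for _ in range(1000)]
smallest_sample = min(samples)
```

No sampled value falls below `f_star`, which is consistent with the Cauchy–Schwarz inequality (Σaᵢxᵢ)(Σbᵢ/xᵢ) ≥ (Σ√(aᵢbᵢ))².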

Section 10.1

1. Let $\mu$ be as in 10.1.5 with $p_k = 1/k$, or let $\mu$ be as in ??, and take $A_k = \{k, k + 1, \ldots\}$.

3. By the inclusion-exclusion principle and additivity, $\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B) = \mu(A)$, and $\mu(A) = \mu(A \setminus B) + \mu(A \cap B) = \mu(A \setminus B)$.

5. Let $B = A_1 \cup \cdots \cup A_n$. By 10.1.6(c),

$$\mu(A_1 \cup \cdots \cup A_{n+1}) = \mu(B \cup A_{n+1}) = \mu(B) + \mu(A_{n+1}) - \mu(B \cap A_{n+1}).$$

By the induction hypothesis,

$$\mu(B) + \mu(A_{n+1}) = \sum_{i=1}^{n+1}\mu(A_i) - \sum_{1 \le i < j \le n}\mu(A_i \cap A_j) + \cdots + (-1)^{n-1}\mu(A_1 \cap \cdots \cap A_n)$$

[…] $> 0\} \in \mathcal{F}$. Conversely, if $E \in \mathcal{F}$ and $t \in \mathbb{R}$, then

$$\{x : 1_E(x) \le t\} = \begin{cases}\emptyset & \text{if } t < 0,\\ E^c & \text{if } 0 \le t < 1,\\ S & \text{if } t \ge 1.\end{cases}$$

In each case, $\{x : 1_E(x) \le t\} \in \mathcal{F}$, hence $1_E$ is measurable.

10. $1_{A\Delta B}(x) = 1$ iff $1_A(x) - 1_B(x) = 1$ or $1_B(x) - 1_A(x) = 1$ iff $x \in A \setminus B$ or $x \in B \setminus A$.

14. The range of $f$ is $\{1/k : k \in \mathbb{N}\}$. Since $f(x) = 1/k$ iff $1/x - 1 < k \le 1/x$ iff $1/(k + 1) < x \le 1/k$, the assertion follows from Exercise 7.

17. Let $\varepsilon > 0$ and choose $N \in \mathbb{N}$ such that $2^N > 1/\varepsilon$ and $f \le N$ on $S$. Let $k > N$, so $0 \le f < k$. Then, in the notation of the proof of 10.5.8,

$$f_k = \sum_{j=1}^{k2^k}\frac{j - 1}{2^k}\,1_{A_{k,j}}, \quad\text{where}\quad A_{k,j} = \bigl\{x \in S : (j - 1)2^{-k} \le f(x) < j2^{-k}\bigr\}, \quad j = 1, 2, \ldots, k2^k.$$

For any $x \in S$ there exists $j \in \{1, 2, \ldots, k2^k\}$ such that $x \in A_{k,j}$, hence $0 \le f(x) - f_k(x) = f(x) - (j - 1)2^{-k} \le 1/2^k < \varepsilon$.

19. (a) That $\mathcal{F}$ is a $\sigma$-field follows from properties of preimages.

(b) Since $f^{-1}(I_1 \times \cdots \times I_m) = \bigcap_{j=1}^mf_j^{-1}(I_j)$, $\mathcal{F}$ contains all intervals, hence, by minimality, $\mathcal{F} = \mathcal{B}(\mathbb{R}^m)$.

(c) If $A \in \mathcal{B}(\mathbb{R})$ and $B := F^{-1}(A)$, then $B \in \mathcal{B}(\mathbb{R}^m)$, hence, from (b), $g^{-1}(A) = f^{-1}(B) \in \mathcal{B}(\mathbb{R}^n)$.
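The dyadic approximation used in Exercise 17 can be made concrete: on the set where (j − 1)2⁻ᵏ ≤ f < j2⁻ᵏ the simple function takes the value (j − 1)2⁻ᵏ, so the error is nonnegative and smaller than 2⁻ᵏ. The sketch below is only an illustration under stated assumptions: S = [0, 1], the sample function 3x², and the helper `simple_approx` are hypothetical choices, and the check is done on a finite grid.

```python
import math

def simple_approx(f, k):
    """Return f_k, equal to (j-1)/2^k where (j-1)/2^k <= f < j/2^k; assumes 0 <= f < k."""
    def f_k(x):
        value = f(x)
        if not 0 <= value < k:
            raise ValueError("this sketch only handles 0 <= f(x) < k")
        j = math.floor(value * 2**k) + 1     # index of the set A_{k,j} containing x
        return (j - 1) / 2**k
    return f_k

def f(x):
    """Sample bounded nonnegative function on S = [0, 1] (an illustrative choice)."""
    return 3 * x * x

k = 5                                        # any k > sup f = 3 works here
f_k = simple_approx(f, k)
grid = [i / 1000 for i in range(1001)]
gaps = [f(x) - f_k(x) for x in grid]         # f - f_k at each grid point
```

Every entry of `gaps` lies in [0, 2⁻ᵏ), uniformly in x, which is the uniform convergence claimed for bounded f.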

Section 11.2

3. Since

$$f^{-1}\bigl(\{d^2\}\bigr) = \bigcup_{k=1}^\infty\bigl[d/10^k, (d + 1)/10^k\bigr) \cap I,$$

$$\int_{[0,1]}f\,d\lambda = \sum_{d=1}^9d^2\,\lambda\bigl(f^{-1}\{d^2\}\bigr) = \frac19\sum_{d=1}^9d^2.$$
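The factor 1/9 in Exercise 3 of Section 11.2 is the total length Σₖ 10⁻ᵏ of the intervals making up each preimage. The sketch below computes this with exact rational arithmetic, assuming (as the displayed formula suggests) that intersecting with I does not change the Lebesgue measure; the helper name `measure_of_preimage` is hypothetical.

```python
from fractions import Fraction

def measure_of_preimage(terms):
    """Partial sum of sum_{k>=1} 10^{-k}, the total length of the intervals."""
    return sum(Fraction(1, 10**k) for k in range(1, terms + 1))

partial = measure_of_preimage(30)
gap_to_limit = Fraction(1, 9) - partial      # remaining tail of the geometric series

# Integral of f: (1/9) * (1^2 + 2^2 + ... + 9^2)
integral = Fraction(1, 9) * sum(d * d for d in range(1, 10))
```

The tail `gap_to_limit` shrinks by a factor of 10 with each additional term, and the integral evaluates to 285/9 = 95/3.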

5. (a) If $\int_E|g|\,d\lambda = 0$, then $g = 0$ a.e., hence both integrals in (a) are zero. Suppose $\int_E|g|\,d\lambda \ne 0$. Since $m\int_E|g| \le \int_Ef|g| \le M\int_E|g|$ on $E$,

$$a := \left(\int_E|g|\,d\lambda\right)^{-1}\int_Ef|g|\,d\lambda$$

satisfies the requirement.

(b) For example, take $E = (-1, 1)$ and $f = g = 1_{(-1,0)} - 1_{(0,1)}$, so

$$\int_Efg = \int_E\bigl(1_{(-1,0)} + 1_{(0,1)}\bigr) = 2, \qquad \int_Eg = 0.$$

(c) Given $\varepsilon > 0$, choose $\delta > 0$ such that $-\varepsilon < f(t) - f(x) < \varepsilon$ for all $t \in (x - \delta, x + \delta)$. If $y \in (x, x + \delta)$, then, by (a),

$$\int_{[a,y]}f\,d\lambda - \int_{[a,x]}f\,d\lambda - f(x)(y - x) = \int[f(t) - f(x)]1_{[x,y]}(t)\,dt = a_y\int1_{[x,y]}\,d\lambda = a_y(y - x),$$

where $|a_y| \le \varepsilon$. Dividing by $y - x$ proves t