# Riley_Mathematical_Methods_for_Physics_and_Engineering_3ed_(2006).pdf

## Physics 411 with Na at University of Illinois - Chicago


## Mathematical Methods for Physics and Engineering

The third edition of this highly acclaimed undergraduate textbook is suitable for teaching all the mathematics ever likely to be needed for an undergraduate course in any of the physical sciences. As well as lucid descriptions of all the topics covered and many worked examples, it contains more than 800 exercises. A number of additional topics have been included and the text has undergone significant reorganisation in some areas. New stand-alone chapters:

- give a systematic account of the ‘special functions’ of physical science
- cover an extended range of practical applications of complex variables including WKB methods and saddle-point integration techniques
- provide an introduction to quantum operators.

Further tabulations, of relevance in statistics and numerical integration, have been added. In this edition, all 400 odd-numbered exercises are provided with complete worked solutions in a separate manual, available to both students and their teachers; these are in addition to the hints and outline answers given in the main text. The even-numbered exercises have no hints, answers or worked solutions and can be used for unaided homework; full solutions to them are available to instructors on a password-protected website.

Ken Riley read mathematics at the University of Cambridge and proceeded to a Ph.D. there in theoretical and experimental nuclear physics. He became a research associate in elementary particle physics at Brookhaven, and then, having taken up a lectureship at the Cavendish Laboratory, Cambridge, continued this research at the Rutherford Laboratory and Stanford; in particular he was involved in the experimental discovery of a number of the early baryonic resonances.
As well as having been Senior Tutor at Clare College, where he has taught physics and mathematics for over 40 years, he has served on many committees concerned with the teaching and examining of these subjects at all levels of tertiary and undergraduate education. He is also one of the authors of 200 Puzzling Physics Problems.

Michael Hobson read natural sciences at the University of Cambridge, specialising in theoretical physics, and remained at the Cavendish Laboratory to complete a Ph.D. in the physics of star-formation. As a research fellow at Trinity Hall, Cambridge and subsequently an advanced fellow of the Particle Physics and Astronomy Research Council, he developed an interest in cosmology, and in particular in the study of fluctuations in the cosmic microwave background. He was involved in the first detection of these fluctuations using a ground-based interferometer. He is currently a University Reader at the Cavendish Laboratory; his research interests include both theoretical and observational aspects of cosmology, and he is the principal author of General Relativity: An Introduction for Physicists. He is also a Director of Studies in Natural Sciences at Trinity Hall and enjoys an active role in the teaching of undergraduate physics and mathematics.

Stephen Bence obtained both his undergraduate degree in Natural Sciences and his Ph.D. in Astrophysics from the University of Cambridge. He then became a Research Associate with a special interest in star-formation processes and the structure of star-forming regions. In particular, his research concentrated on the physics of jets and outflows from young stars. He has had considerable experience of teaching mathematics and physics to undergraduate and pre-university students.

Mathematical Methods for Physics and Engineering, Third Edition

K.F. RILEY, M.P. HOBSON and S.J.
BENCE

Cambridge University Press

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 2RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org

Information on this title: www.cambridge.org/9780521861533

© K. F. Riley, M. P. Hobson and S. J. Bence 2006

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2006

- ISBN-13 978-0-511-16842-0, ISBN-10 0-511-16842-X: eBook (EBL)
- ISBN-13 978-0-521-86153-3, ISBN-10 0-521-86153-5: hardback
- ISBN-13 978-0-521-67971-8, ISBN-10 0-521-67971-0: paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

## Contents

- Preface to the third edition
- Preface to the second edition
- Preface to the first edition
- 1 Preliminary algebra
  - 1.1 Simple functions and equations: Polynomial equations; factorisation; properties of roots
  - 1.2 Trigonometric identities: Single angle; compound angles; double- and half-angle identities
  - 1.3 Coordinate geometry
  - 1.4 Partial fractions: Complications and special cases
  - 1.5 Binomial expansion
  - 1.6 Properties of binomial coefficients
  - 1.7 Some particular methods of proof: Proof by induction; proof by contradiction; necessary and sufficient conditions
  - 1.8 Exercises
  - 1.9 Hints and answers
- 2 Preliminary calculus
  - 2.1 Differentiation: Differentiation from first principles; products; the chain rule; quotients; implicit differentiation; logarithmic differentiation; Leibnitz’ theorem; special points of a function;
curvature; theorems of differentiation
  - 2.2 Integration: Integration from first principles; the inverse of differentiation; by inspection; sinusoidal functions; logarithmic integration; using partial fractions; substitution method; integration by parts; reduction formulae; infinite and improper integrals; plane polar coordinates; integral inequalities; applications of integration
  - 2.3 Exercises
  - 2.4 Hints and answers
- 3 Complex numbers and hyperbolic functions
  - 3.1 The need for complex numbers
  - 3.2 Manipulation of complex numbers: Addition and subtraction; modulus and argument; multiplication; complex conjugate; division
  - 3.3 Polar representation of complex numbers: Multiplication and division in polar form
  - 3.4 de Moivre’s theorem: trigonometric identities; finding the nth roots of unity; solving polynomial equations
  - 3.5 Complex logarithms and complex powers
  - 3.6 Applications to differentiation and integration
  - 3.7 Hyperbolic functions: Definitions; hyperbolic–trigonometric analogies; identities of hyperbolic functions; solving hyperbolic equations; inverses of hyperbolic functions; calculus of hyperbolic functions
  - 3.8 Exercises
  - 3.9 Hints and answers
- 4 Series and limits
  - 4.1 Series
  - 4.2 Summation of series: Arithmetic series; geometric series; arithmetico-geometric series; the difference method; series involving natural numbers; transformation of series
  - 4.3 Convergence of infinite series: Absolute and conditional convergence; series containing only real positive terms; alternating series test
  - 4.4 Operations with series
  - 4.5 Power series: Convergence of power series; operations with power series
  - 4.6 Taylor series: Taylor’s theorem; approximation errors; standard Maclaurin series
  - 4.7 Evaluation of limits
  - 4.8 Exercises
  - 4.9 Hints and answers
- 5 Partial differentiation
  - 5.1 Definition of the partial derivative
  - 5.2 The total differential and total derivative
  - 5.3 Exact and inexact
differentials
  - 5.4 Useful theorems of partial differentiation
  - 5.5 The chain rule
  - 5.6 Change of variables
  - 5.7 Taylor’s theorem for many-variable functions
  - 5.8 Stationary values of many-variable functions
  - 5.9 Stationary values under constraints
  - 5.10 Envelopes
  - 5.11 Thermodynamic relations
  - 5.12 Differentiation of integrals
  - 5.13 Exercises
  - 5.14 Hints and answers
- 6 Multiple integrals
  - 6.1 Double integrals
  - 6.2 Triple integrals
  - 6.3 Applications of multiple integrals: Areas and volumes; masses, centres of mass and centroids; Pappus’ theorems; moments of inertia; mean values of functions
  - 6.4 Change of variables in multiple integrals: Change of variables in double integrals; evaluation of the integral $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$; change of variables in triple integrals; general properties of Jacobians
  - 6.5 Exercises
  - 6.6 Hints and answers
- 7 Vector algebra
  - 7.1 Scalars and vectors
  - 7.2 Addition and subtraction of vectors
  - 7.3 Multiplication by a scalar
  - 7.4 Basis vectors and components
  - 7.5 Magnitude of a vector
  - 7.6 Multiplication of vectors: Scalar product; vector product; scalar triple product; vector triple product
  - 7.7 Equations of lines, planes and spheres
  - 7.8 Using vectors to find distances: Point to line; point to plane; line to line; line to plane
  - 7.9 Reciprocal vectors
  - 7.10 Exercises
  - 7.11 Hints and answers
- 8 Matrices and vector spaces
  - 8.1 Vector spaces: Basis vectors; inner product; some useful inequalities
  - 8.2 Linear operators
  - 8.3 Matrices
  - 8.4 Basic matrix algebra: Matrix addition; multiplication by a scalar; matrix multiplication
  - 8.5 Functions of matrices
  - 8.6 The transpose of a matrix
  - 8.7 The complex and Hermitian conjugates of a matrix
  - 8.8 The trace of a matrix
  - 8.9 The determinant of a matrix: Properties of determinants
  - 8.10 The inverse of a matrix
  - 8.11 The rank of a matrix
  - 8.12 Special types of
square matrix: Diagonal; triangular; symmetric and antisymmetric; orthogonal; Hermitian and anti-Hermitian; unitary; normal
  - 8.13 Eigenvectors and eigenvalues: Of a normal matrix; of Hermitian and anti-Hermitian matrices; of a unitary matrix; of a general square matrix
  - 8.14 Determination of eigenvalues and eigenvectors: Degenerate eigenvalues
  - 8.15 Change of basis and similarity transformations
  - 8.16 Diagonalisation of matrices
  - 8.17 Quadratic and Hermitian forms: Stationary properties of the eigenvectors; quadratic surfaces
  - 8.18 Simultaneous linear equations: Range; null space; N simultaneous linear equations in N unknowns; singular value decomposition
  - 8.19 Exercises
  - 8.20 Hints and answers
- 9 Normal modes
  - 9.1 Typical oscillatory systems
  - 9.2 Symmetry and normal modes
  - 9.3 Rayleigh–Ritz method
  - 9.4 Exercises
  - 9.5 Hints and answers
- 10 Vector calculus
  - 10.1 Differentiation of vectors: Composite vector expressions; differential of a vector
  - 10.2 Integration of vectors
  - 10.3 Space curves
  - 10.4 Vector functions of several arguments
  - 10.5 Surfaces
  - 10.6 Scalar and vector fields
  - 10.7 Vector operators: Gradient of a scalar field; divergence of a vector field; curl of a vector field
  - 10.8 Vector operator formulae: Vector operators acting on sums and products; combinations of grad, div and curl
  - 10.9 Cylindrical and spherical polar coordinates
  - 10.10 General curvilinear coordinates
  - 10.11 Exercises
  - 10.12 Hints and answers
- 11 Line, surface and volume integrals
  - 11.1 Line integrals: Evaluating line integrals; physical examples; line integrals with respect to a scalar
  - 11.2 Connectivity of regions
  - 11.3 Green’s theorem in a plane
  - 11.4 Conservative fields and potentials
  - 11.5 Surface integrals: Evaluating surface integrals; vector areas of surfaces; physical examples
  - 11.6 Volume integrals: Volumes of three-dimensional regions
  - 11.7 Integral forms for grad, div and
curl
  - 11.8 Divergence theorem and related theorems: Green’s theorems; other related integral theorems; physical applications
  - 11.9 Stokes’ theorem and related theorems: Related integral theorems; physical applications
  - 11.10 Exercises
  - 11.11 Hints and answers
- 12 Fourier series
  - 12.1 The Dirichlet conditions
  - 12.2 The Fourier coefficients
  - 12.3 Symmetry considerations
  - 12.4 Discontinuous functions
  - 12.5 Non-periodic functions
  - 12.6 Integration and differentiation
  - 12.7 Complex Fourier series
  - 12.8 Parseval’s theorem
  - 12.9 Exercises
  - 12.10 Hints and answers
- 13 Integral transforms
  - 13.1 Fourier transforms: The uncertainty principle; Fraunhofer diffraction; the Dirac δ-function; relation of the δ-function to Fourier transforms; properties of Fourier transforms; odd and even functions; convolution and deconvolution; correlation functions and energy spectra; Parseval’s theorem; Fourier transforms in higher dimensions
  - 13.2 Laplace transforms: Laplace transforms of derivatives and integrals; other properties of Laplace transforms
  - 13.3 Concluding remarks
  - 13.4 Exercises
  - 13.5 Hints and answers
- 14 First-order ordinary differential equations
  - 14.1 General form of solution
  - 14.2 First-degree first-order equations: Separable-variable equations; exact equations; inexact equations, integrating factors; linear equations; homogeneous equations; isobaric equations; Bernoulli’s equation; miscellaneous equations
  - 14.3 Higher-degree first-order equations: Equations soluble for p; for x; for y; Clairaut’s equation
  - 14.4 Exercises
  - 14.5 Hints and answers
- 15 Higher-order ordinary differential equations
  - 15.1 Linear equations with constant coefficients: Finding the complementary function $y_c(x)$; finding the particular integral $y_p(x)$; constructing the general solution $y_c(x) + y_p(x)$; linear recurrence relations; Laplace transform method
  - 15.2 Linear equations with variable coefficients: The
Legendre and Euler linear equations; exact equations; partially known complementary function; variation of parameters; Green’s functions; canonical form for second-order equations
  - 15.3 General ordinary differential equations: Dependent variable absent; independent variable absent; non-linear exact equations; isobaric or homogeneous equations; equations homogeneous in x or y alone; equations having $y = Ae^x$ as a solution
  - 15.4 Exercises
  - 15.5 Hints and answers
- 16 Series solutions of ordinary differential equations
  - 16.1 Second-order linear ordinary differential equations: Ordinary and singular points
  - 16.2 Series solutions about an ordinary point
  - 16.3 Series solutions about a regular singular point: Distinct roots not differing by an integer; repeated root of the indicial equation; distinct roots differing by an integer
  - 16.4 Obtaining a second solution: The Wronskian method; the derivative method; series form of the second solution
  - 16.5 Polynomial solutions
  - 16.6 Exercises
  - 16.7 Hints and answers
- 17 Eigenfunction methods for differential equations
  - 17.1 Sets of functions: Some useful inequalities
  - 17.2 Adjoint, self-adjoint and Hermitian operators
  - 17.3 Properties of Hermitian operators: Reality of the eigenvalues; orthogonality of the eigenfunctions; construction of real eigenfunctions
  - 17.4 Sturm–Liouville equations: Valid boundary conditions; putting an equation into Sturm–Liouville form
  - 17.5 Superposition of eigenfunctions: Green’s functions
  - 17.6 A useful generalisation
  - 17.7 Exercises
  - 17.8 Hints and answers
- 18 Special functions
  - 18.1 Legendre functions: General solution for integer $\ell$; properties of Legendre polynomials
  - 18.2 Associated Legendre functions
  - 18.3 Spherical harmonics
  - 18.4 Chebyshev functions
  - 18.5 Bessel functions: General solution for non-integer ν; general solution for integer ν; properties of Bessel functions
  - 18.6 Spherical Bessel functions
  - 18.7 Laguerre functions
  - 18.8 Associated Laguerre functions
  - 18.9 Hermite functions
  - 18.10 Hypergeometric functions
  - 18.11 Confluent hypergeometric functions
  - 18.12 The gamma function and related functions
  - 18.13 Exercises
  - 18.14 Hints and answers
- 19 Quantum operators
  - 19.1 Operator formalism: Commutators
  - 19.2 Physical examples of operators: Uncertainty principle; angular momentum; creation and annihilation operators
  - 19.3 Exercises
  - 19.4 Hints and answers
- 20 Partial differential equations: general and particular solutions
  - 20.1 Important partial differential equations: The wave equation; the diffusion equation; Laplace’s equation; Poisson’s equation; Schrödinger’s equation
  - 20.2 General form of solution
  - 20.3 General and particular solutions: First-order equations; inhomogeneous equations and problems; second-order equations
  - 20.4 The wave equation
  - 20.5 The diffusion equation
  - 20.6 Characteristics and the existence of solutions: First-order equations; second-order equations
  - 20.7 Uniqueness of solutions
  - 20.8 Exercises
  - 20.9 Hints and answers
- 21 Partial differential equations: separation of variables and other methods
  - 21.1 Separation of variables: the general method
  - 21.2 Superposition of separated solutions
  - 21.3 Separation of variables in polar coordinates: Laplace’s equation in polar coordinates; spherical harmonics; other equations in polar coordinates; solution by expansion; separation of variables for inhomogeneous equations
  - 21.4 Integral transform methods
  - 21.5 Inhomogeneous problems – Green’s functions: Similarities to Green’s functions for ordinary differential equations; general boundary-value problems; Dirichlet problems; Neumann problems
  - 21.6 Exercises
  - 21.7 Hints and answers
- 22 Calculus of variations
  - 22.1 The Euler–Lagrange equation
  - 22.2 Special cases: F does not contain y explicitly; F does not contain x explicitly
  - 22.3 Some extensions:
Several dependent variables; several independent variables; higher-order derivatives; variable end-points
  - 22.4 Constrained variation
  - 22.5 Physical variational principles: Fermat’s principle in optics; Hamilton’s principle in mechanics
  - 22.6 General eigenvalue problems
  - 22.7 Estimation of eigenvalues and eigenfunctions
  - 22.8 Adjustment of parameters
  - 22.9 Exercises
  - 22.10 Hints and answers
- 23 Integral equations
  - 23.1 Obtaining an integral equation from a differential equation
  - 23.2 Types of integral equation
  - 23.3 Operator notation and the existence of solutions
  - 23.4 Closed-form solutions: Separable kernels; integral transform methods; differentiation
  - 23.5 Neumann series
  - 23.6 Fredholm theory
  - 23.7 Schmidt–Hilbert theory
  - 23.8 Exercises
  - 23.9 Hints and answers
- 24 Complex variables
  - 24.1 Functions of a complex variable
  - 24.2 The Cauchy–Riemann relations
  - 24.3 Power series in a complex variable
  - 24.4 Some elementary functions
  - 24.5 Multivalued functions and branch cuts
  - 24.6 Singularities and zeros of complex functions
  - 24.7 Conformal transformations
  - 24.8 Complex integrals
  - 24.9 Cauchy’s theorem
  - 24.10 Cauchy’s integral formula
  - 24.11 Taylor and Laurent series
  - 24.12 Residue theorem
  - 24.13 Definite integrals using contour integration
  - 24.14 Exercises
  - 24.15 Hints and answers
- 25 Applications of complex variables
  - 25.1 Complex potentials
  - 25.2 Applications of conformal transformations
  - 25.3 Location of zeros
  - 25.4 Summation of series
  - 25.5 Inverse Laplace transform
  - 25.6 Stokes’ equation and Airy integrals
  - 25.7 WKB methods
  - 25.8 Approximations to integrals: Level lines and saddle points; steepest descents; stationary phase
  - 25.9 Exercises
  - 25.10 Hints and answers
- 26 Tensors
  - 26.1 Some notation
  - 26.2 Change of basis
  - 26.3 Cartesian tensors
  - 26.4 First- and zero-order Cartesian tensors
  - 26.5 Second- and
higher-order Cartesian tensors
  - 26.6 The algebra of tensors
  - 26.7 The quotient law
  - 26.8 The tensors $\delta_{ij}$ and $\epsilon_{ijk}$
  - 26.9 Isotropic tensors
  - 26.10 Improper rotations and pseudotensors
  - 26.11 Dual tensors
  - 26.12 Physical applications of tensors
  - 26.13 Integral theorems for tensors
  - 26.14 Non-Cartesian coordinates
  - 26.15 The metric tensor
  - 26.16 General coordinate transformations and tensors
  - 26.17 Relative tensors
  - 26.18 Derivatives of basis vectors and Christoffel symbols
  - 26.19 Covariant differentiation
  - 26.20 Vector operators in tensor form
  - 26.21 Absolute derivatives along curves
  - 26.22 Geodesics
  - 26.23 Exercises
  - 26.24 Hints and answers
- 27 Numerical methods
  - 27.1 Algebraic and transcendental equations: Rearrangement of the equation; linear interpolation; binary chopping; Newton–Raphson method
  - 27.2 Convergence of iteration schemes
  - 27.3 Simultaneous linear equations: Gaussian elimination; Gauss–Seidel iteration; tridiagonal matrices
  - 27.4 Numerical integration: Trapezium rule; Simpson’s rule; Gaussian integration; Monte Carlo methods
  - 27.5 Finite differences
  - 27.6 Differential equations: Difference equations; Taylor series solutions; prediction and correction; Runge–Kutta methods; isoclines
  - 27.7 Higher-order equations
  - 27.8 Partial differential equations
  - 27.9 Exercises
  - 27.10 Hints and answers
- 28 Group theory
  - 28.1 Groups: Definition of a group; examples of groups
  - 28.2 Finite groups
  - 28.3 Non-Abelian groups
  - 28.4 Permutation groups
  - 28.5 Mappings between groups
  - 28.6 Subgroups
  - 28.7 Subdividing a group: Equivalence relations and classes; congruence and cosets; conjugates and classes
  - 28.8 Exercises
  - 28.9 Hints and answers
- 29 Representation theory
  - 29.1 Dipole moments of molecules
  - 29.2 Choosing an appropriate formalism
  - 29.3 Equivalent representations
  - 29.4 Reducibility of a representation
  - 29.5 The orthogonality theorem for irreducible representations
  - 29.6 Characters: Orthogonality property of characters
  - 29.7 Counting irreps using characters: Summation rules for irreps
  - 29.8 Construction of a character table
  - 29.9 Group nomenclature
  - 29.10 Product representations
  - 29.11 Physical applications of group theory: Bonding in molecules; matrix elements in quantum mechanics; degeneracy of normal modes; breaking of degeneracies
  - 29.12 Exercises
  - 29.13 Hints and answers
- 30 Probability
  - 30.1 Venn diagrams
  - 30.2 Probability: Axioms and theorems; conditional probability; Bayes’ theorem
  - 30.3 Permutations and combinations
  - 30.4 Random variables and distributions: Discrete random variables; continuous random variables
  - 30.5 Properties of distributions: Mean; mode and median; variance and standard deviation; moments; central moments
  - 30.6 Functions of random variables
  - 30.7 Generating functions: Probability generating functions; moment generating functions; characteristic functions; cumulant generating functions
  - 30.8 Important discrete distributions: Binomial; geometric; negative binomial; hypergeometric; Poisson
  - 30.9 Important continuous distributions: Gaussian; log-normal; exponential; gamma; chi-squared; Cauchy; Breit–Wigner; uniform
  - 30.10 The central limit theorem
  - 30.11 Joint distributions: Discrete bivariate; continuous bivariate; marginal and conditional distributions
  - 30.12 Properties of joint distributions: Means; variances; covariance and correlation
  - 30.13 Generating functions for joint distributions
  - 30.14 Transformation of variables in joint distributions
  - 30.15 Important joint distributions: Multinomial; multivariate Gaussian
  - 30.16 Exercises
  - 30.17 Hints and answers
- 31 Statistics
  - 31.1 Experiments, samples and populations
  - 31.2 Sample statistics: Averages; variance and standard deviation; moments; covariance and
correlation
  - 31.3 Estimators and sampling distributions: Consistency, bias and efficiency; Fisher’s inequality; standard errors; confidence limits
  - 31.4 Some basic estimators: Mean; variance; standard deviation; moments; covariance and correlation
  - 31.5 Maximum-likelihood method: ML estimator; transformation invariance and bias; efficiency; errors and confidence limits; Bayesian interpretation; large-N behaviour; extended ML method
  - 31.6 The method of least squares: Linear least squares; non-linear least squares
  - 31.7 Hypothesis testing: Simple and composite hypotheses; statistical tests; Neyman–Pearson; generalised likelihood-ratio; Student’s t; Fisher’s F; goodness of fit
  - 31.8 Exercises
  - 31.9 Hints and answers
- Index

## I am the very Model for a Student Mathematical

I am the very model for a student mathematical;
I’ve information rational, and logical and practical.
I know the laws of algebra, and find them quite symmetrical,
And even know the meaning of ‘a variate antithetical’.
I’m extremely well acquainted, with all things mathematical.
I understand equations, both the simple and quadratical.
About binomial theorems I’m teeming with a lot o’news,
With many cheerful facts about the square of the hypotenuse.
I’m very good at integral and differential calculus,
And solving paradoxes that so often seem to rankle us.
In short in matters rational, and logical and practical,
I am the very model for a student mathematical.

I know the singularities of equations differential,
And some of these are regular, but the rest are quite essential.
I quote the results of giants; with Euler, Newton, Gauss, Laplace,
And can calculate an orbit, given a centre, force and mass.
I can reconstruct equations, both canonical and formal,
And write all kinds of matrices, orthogonal, real and normal.
I show how to tackle problems that one has never met before,
By analogy or example, or with some clever metaphor.
I seldom use equivalence to help decide upon a class,
But often find an integral, using a contour o’er a pass.
In short in matters rational, and logical and practical,
I am the very model for a student mathematical.

When you have learnt just what is meant by ‘Jacobian’ and ‘Abelian’;
When you at sight can estimate, for the modal, mean and median;
When describing normal subgroups is much more than recitation;
When you understand precisely what is ‘quantum excitation’;
When you know enough statistics that you can recognise RV;
When you have learnt all advances that have been made in SVD;
And when you can spot the transform that solves some tricky PDE,
You will feel no better student has ever sat for a degree.
Your accumulated knowledge, whilst extensive and exemplary,
Will have only been brought down to the beginning of last century,
But still in matters rational, and logical and practical,
You’ll be the very model of a student mathematical.

KFR, with apologies to W.S. Gilbert

## Preface to the third edition

As is natural, in the four years since the publication of the second edition of this book we have somewhat modified our views on what should be included and how it should be presented. In this new edition, although the range of topics covered has been extended, there has been no significant shift in the general level of difficulty or in the degree of mathematical sophistication required. Further, we have aimed to preserve the same style of presentation as seems to have been well received in the first two editions. However, a significant change has been made to the format of the chapters, specifically to the way that the exercises, together with their hints and answers, have been treated; the details of the change are explained below.

The two major chapters that are new in this third edition are those dealing with ‘special functions’ and the applications of complex variables.
The former presents a systematic account of those functions that appear to have arisen in a more or less haphazard way as a result of studying particular physical situations, and are deemed ‘special’ for that reason. The treatment presented here shows that, in fact, they are nearly all particular cases of the hypergeometric or confluent hypergeometric functions, and are special only in the sense that the parameters of the relevant function take simple or related values. The second new chapter describes how the properties of complex variables can be used to tackle problems arising from the description of physical situations or from other seemingly unrelated areas of mathematics. To topics treated in earlier editions, such as the solution of Laplace’s equation in two dimensions, the summation of series, the location of zeros of polynomials and the calculation of inverse Laplace transforms, has been added new material covering Airy integrals, saddle-point methods for contour integral evaluation, and the WKB approach to asymptotic forms.

Other new material includes a stand-alone chapter on the use of coordinate-free operators to establish valuable results in the field of quantum mechanics; amongst the physical topics covered are angular momentum and uncertainty principles. There are also significant additions to the treatment of numerical integration. In particular, Gaussian quadrature based on Legendre, Laguerre, Hermite and Chebyshev polynomials is discussed, and appropriate tables of points and weights are provided.

We now turn to the most obvious change to the format of the book, namely the way that the exercises, hints and answers are treated. The second edition of Mathematical Methods for Physics and Engineering carried more than twice as many exercises, based on its various chapters, as did the first.
In its preface we discussed the general question of how such exercises should be treated but, in the end, decided to provide hints and outline answers to all problems, as in the first edition. This decision was an uneasy one as, on the one hand, it did not allow the exercises to be set as totally unaided homework that could be used for assessment purposes but, on the other, it did not give a full explanation of how to tackle a problem when a student needed explicit guidance or a model answer.

In order to allow both of these educationally desirable goals to be achieved, we have, in this third edition, completely changed the way in which this matter is handled. A large number of exercises have been included in the penultimate subsections of the appropriate, sometimes reorganised, chapters. Hints and outline answers are given, as previously, in the final subsections, but only for the odd-numbered exercises. This leaves all even-numbered exercises free to be set as unaided homework, as described below.

For the four hundred plus odd-numbered exercises, complete solutions are available, to both students and their teachers, in the form of a separate manual, Student Solutions Manual for Mathematical Methods for Physics and Engineering (Cambridge: Cambridge University Press, 2006); the hints and outline answers given in this main text are brief summaries of the model answers given in the manual. There, each original exercise is reproduced and followed by a fully worked solution. For those original exercises that make internal reference to this text or to other (even-numbered) exercises not included in the solutions manual, the questions have been reworded, usually by including additional information, so that the questions can stand alone.

In many cases, the solution given in the manual is even fuller than one that might be expected of a good student that has understood the material. This is because we have aimed to make the solutions instructional as well as utilitarian.
To this end, we have included comments that are intended to show how the plan for the solution is formulated and have given the justifications for particular intermediate steps (something not always done, even by the best of students). We have also tried to write each individual substituted formula in the form that best indicates how it was obtained, before simplifying it at the next or a subsequent stage. Where several lines of algebraic manipulation or calculus are needed to obtain a final result, they are normally included in full; this should enable the student to determine whether an incorrect answer is due to a misunderstanding of principles or to a technical error.

The remaining four hundred or so even-numbered exercises have no hints or answers, outlined or detailed, available for general access. They can therefore be used by instructors as a basis for setting unaided homework. Full solutions to these exercises, in the same general format as those appearing in the manual (though they may contain references to the main text or to other exercises), are available without charge to accredited teachers as downloadable pdf files on the password-protected website http://www.cambridge.org/9780521679718. Teachers wishing to have access to the website should contact solutions@cambridge.org for registration details.

In all new publications, errors and typographical mistakes are virtually unavoidable, and we would be grateful to any reader who brings instances to our attention. Retrospectively, we would like to record our thanks to Reinhard Gerndt, Paul Renteln and Joe Tenn for making us aware of some errors in the second edition. Finally, we are extremely grateful to Dave Green for his considerable and continuing advice concerning LaTeX.
Ken Riley, Michael Hobson, Cambridge, 2006

Preface to the second edition

Since the publication of the first edition of this book, both through teaching the material it covers and as a result of receiving helpful comments from colleagues, we have become aware of the desirability of changes in a number of areas. The most important of these is that the mathematical preparation of current senior college and university entrants is now less thorough than it used to be. To match this, we decided to include a preliminary chapter covering areas such as polynomial equations, trigonometric identities, coordinate geometry, partial fractions, binomial expansions, necessary and sufficient conditions and proof by induction and contradiction.

Whilst the general level of what is included in this second edition has not been raised, some areas have been expanded to take in topics we now feel were not adequately covered in the first. In particular, increased attention has been given to non-square sets of simultaneous linear equations and their associated matrices. We hope that this more extended treatment, together with the inclusion of singular value matrix decomposition, will make the material of more practical use to engineering students. In the same spirit, an elementary treatment of linear recurrence relations has been included. The topic of normal modes has been given a small chapter of its own, though the links to matrices on the one hand, and to representation theory on the other, have not been lost.

Elsewhere, the presentation of probability and statistics has been reorganised to give the two aspects more nearly equal weights. The early part of the probability chapter has been rewritten in order to present a more coherent development based on Boolean algebra, the fundamental axioms of probability theory and the properties of intersections and unions.
Whilst this is somewhat more formal than previously, we think that it has not reduced the accessibility of these topics and hope that it has increased it. The scope of the chapter has been somewhat extended to include all physically important distributions and an introduction to cumulants.

Statistics now occupies a substantial chapter of its own, one that includes systematic discussions of estimators and their efficiency, sample distributions and t- and F-tests for comparing means and variances. Other new topics are applications of the chi-squared distribution, maximum-likelihood parameter estimation and least-squares fitting. In other chapters we have added material on the following topics: curvature, envelopes, curve-sketching, more refined numerical methods for differential equations and the elements of integration using Monte Carlo techniques.

Over the last four years we have received somewhat mixed feedback about the number of exercises at the ends of the various chapters. After consideration, we decided to increase the number substantially, partly to correspond to the additional topics covered in the text but mainly to give both students and their teachers a wider choice. There are now nearly 800 such exercises, many with several parts. An even more vexed question has been whether to provide hints and answers to all the exercises or just to ‘the odd-numbered’ ones, as is the normal practice for textbooks in the United States, thus making the remainder more suitable for setting as homework. In the end, we decided that hints and outline solutions should be provided for all the exercises, in order to facilitate independent study while leaving the details of the calculation as a task for the student.

In conclusion, we hope that this edition will be thought by its users to be ‘heading in the right direction’ and would like to place on record our thanks to all who have helped to bring about the changes and adjustments.
Naturally, those colleagues who have noted errors or ambiguities in the first edition and brought them to our attention figure high on the list, as do the staff at The Cambridge University Press. In particular, we are grateful to Dave Green for continued LaTeX advice, Susan Parkinson for copy-editing the second edition with her usual keen eye for detail and flair for crafting coherent prose and Alison Woollatt for once again turning our basic LaTeX into a beautifully typeset book. Our thanks go to all of them, though of course we accept full responsibility for any remaining errors or ambiguities, of which, as with any new publication, there are bound to be some.

On a more personal note, KFR again wishes to thank his wife Penny for her unwavering support, not only in his academic and tutorial work, but also in their joint efforts to convert time at the bridge table into ‘green points’ on their record. MPH is once more indebted to his wife, Becky, and his mother, Pat, for their tireless support and encouragement above and beyond the call of duty. MPH dedicates his contribution to this book to the memory of his father, Ronald Leonard Hobson, whose gentle kindness, patient understanding and unbreakable spirit made all things seem possible.

Ken Riley, Michael Hobson, Cambridge, 2002

Preface to the first edition

A knowledge of mathematical methods is important for an increasing number of university and college courses, particularly in physics, engineering and chemistry, but also in more general science. Students embarking on such courses come from diverse mathematical backgrounds, and their core knowledge varies considerably. We have therefore decided to write a textbook that assumes knowledge only of material that can be expected to be familiar to all the current generation of students starting physical science courses at university.
In the United Kingdom this corresponds to the standard of Mathematics A-level, whereas in the United States the material assumed is that which would normally be covered at junior college.

Starting from this level, the first six chapters cover a collection of topics with which the reader may already be familiar, but which are here extended and applied to typical problems encountered by first-year university students. They are aimed at providing a common base of general techniques used in the development of the remaining chapters. Students who have had additional preparation, such as Further Mathematics at A-level, will find much of this material straightforward.

Following these opening chapters, the remainder of the book is intended to cover at least that mathematical material which an undergraduate in the physical sciences might encounter up to the end of his or her course. The book is also appropriate for those beginning graduate study with a mathematical content, and naturally much of the material forms parts of courses for mathematics students. Furthermore, the text should provide a useful reference for research workers.

The general aim of the book is to present a topic in three stages. The first stage is a qualitative introduction, wherever possible from a physical point of view. The second is a more formal presentation, although we have deliberately avoided strictly mathematical questions such as the existence of limits, uniform convergence, the interchanging of integration and summation orders, etc. on the grounds that ‘this is the real world; it must behave reasonably’. Finally a worked example is presented, often drawn from familiar situations in physical science and engineering. These examples have generally been fully worked, since, in the authors’ experience, partially worked examples are unpopular with students.
Only in a few cases, where trivial algebraic manipulation is involved, or where repetition of the main text would result, has an example been left as an exercise for the reader. Nevertheless, a number of exercises also appear at the end of each chapter, and these should give the reader ample opportunity to test his or her understanding. Hints and answers to these exercises are also provided.

With regard to the presentation of the mathematics, it has to be accepted that many equations (especially partial differential equations) can be written more compactly by using subscripts, e.g. $u_{xy}$ for a second partial derivative, instead of the more familiar $\partial^2 u/\partial x\,\partial y$, and that this certainly saves typographical space. However, for many students, the labour of mentally unpacking such equations is sufficiently great that it is not possible to think of an equation’s physical interpretation at the same time. Consequently, wherever possible we have decided to write out such expressions in their more obvious but longer form.

During the writing of this book we have received much help and encouragement from various colleagues at the Cavendish Laboratory, Clare College, Trinity Hall and Peterhouse. In particular, we would like to thank Peter Scheuer, whose comments and general enthusiasm proved invaluable in the early stages. For reading sections of the manuscript, for pointing out misprints and for numerous useful comments, we thank many of our students and colleagues at the University of Cambridge. We are especially grateful to Chris Doran, John Huber, Garth Leder, Tom Körner and, not least, Mike Stobbs, who, sadly, died before the book was completed. We also extend our thanks to the University of Cambridge and the Cavendish teaching staff, whose examination questions and lecture hand-outs have collectively provided the basis for some of the examples included.
Of course, any errors and ambiguities remaining are entirely the responsibility of the authors, and we would be most grateful to have them brought to our attention.

We are indebted to Dave Green for a great deal of advice concerning typesetting in LaTeX and to Andrew Lovatt for various other computing tips. Our thanks also go to Anja Visser and Graça Rocha for enduring many hours of (sometimes heated) debate. At Cambridge University Press, we are very grateful to our editor Adam Black for his help and patience and to Alison Woollatt for her expert typesetting of such a complicated text. We also thank our copy-editor Susan Parkinson for many useful suggestions that have undoubtedly improved the style of the book.

Finally, on a personal note, KFR wishes to thank his wife Penny, not only for a long and happy marriage, but also for her support and understanding during his recent illness – and when things have not gone too well at the bridge table! MPH is indebted both to Rebecca Morris and to his parents for their tireless support and patience, and for their unending supplies of tea. SJB is grateful to Anthony Gritten for numerous relaxing discussions about J. S. Bach, to Susannah Ticciati for her patience and understanding, and to Kate Isaak for her calming late-night e-mails from the USA.

Ken Riley, Michael Hobson and Stephen Bence, Cambridge, 1997

1 Preliminary algebra

This opening chapter reviews the basic algebra of which a working knowledge is presumed in the rest of the book. Many students will be familiar with much, if not all, of it, but recent changes in what is studied during secondary education mean that it cannot be taken for granted that they will already have a mastery of all the topics presented here. The reader may assess which areas need further study or revision by attempting the exercises at the end of the chapter.
The main areas covered are polynomial equations and the related topic of partial fractions, curve sketching, coordinate geometry, trigonometric identities and the notions of proof by induction or contradiction.

1.1 Simple functions and equations

It is normal practice when starting the mathematical investigation of a physical problem to assign an algebraic symbol to the quantity whose value is sought, either numerically or as an explicit algebraic expression. For the sake of definiteness, in this chapter we will use $x$ to denote this quantity most of the time. Subsequent steps in the analysis involve applying a combination of known laws, consistency conditions and (possibly) given constraints to derive one or more equations satisfied by $x$. These equations may take many forms, ranging from a simple polynomial equation to, say, a partial differential equation with several boundary conditions. Some of the more complicated possibilities are treated in the later chapters of this book, but for the present we will be concerned with techniques for the solution of relatively straightforward algebraic equations.

1.1.1 Polynomials and polynomial equations

Firstly we consider the simplest type of equation, a polynomial equation, in which a polynomial expression in $x$, denoted by $f(x)$, is set equal to zero and thereby forms an equation which is satisfied by particular values of $x$, called the roots of the equation:

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0. \tag{1.1}$$

Here $n$ is an integer $> 0$, called the degree of both the polynomial and the equation, and the known coefficients $a_0, a_1, \ldots, a_n$ are real quantities with $a_n \neq 0$.

Equations such as (1.1) arise frequently in physical problems, the coefficients $a_i$ being determined by the physical properties of the system under study. What is needed is to find some or all of the roots of (1.1), i.e.
the $x$-values, $\alpha_k$, that satisfy $f(\alpha_k) = 0$; here $k$ is an index that, as we shall see later, can take up to $n$ different values, i.e. $k = 1, 2, \ldots, n$. The roots of the polynomial equation can equally well be described as the zeros of the polynomial. When they are real, they correspond to the points at which a graph of $f(x)$ crosses the $x$-axis. Roots that are complex (see chapter 3) do not have such a graphical interpretation.

For polynomial equations containing powers of $x$ greater than $x^4$ general methods do not exist for obtaining explicit expressions for the roots $\alpha_k$. Even for $n = 3$ and $n = 4$ the prescriptions for obtaining the roots are sufficiently complicated that it is usually preferable to obtain exact or approximate values by other methods. Only for $n = 1$ and $n = 2$ can simple closed-form solutions be given. These results will be well known to the reader, but they are given here for the sake of completeness. For $n = 1$, (1.1) reduces to the linear equation

$$a_1 x + a_0 = 0; \tag{1.2}$$

the solution (root) is $\alpha_1 = -a_0/a_1$. For $n = 2$, (1.1) reduces to the quadratic equation

$$a_2 x^2 + a_1 x + a_0 = 0; \tag{1.3}$$

the two roots $\alpha_1$ and $\alpha_2$ are given by

$$\alpha_{1,2} = \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2 a_0}}{2a_2}. \tag{1.4}$$

When discussing specifically quadratic equations, as opposed to more general polynomial equations, it is usual to write the equation in one of the two notations

$$ax^2 + bx + c = 0, \qquad ax^2 + 2bx + c = 0, \tag{1.5}$$

with respective explicit pairs of solutions

$$\alpha_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \qquad \alpha_{1,2} = \frac{-b \pm \sqrt{b^2 - ac}}{a}. \tag{1.6}$$

Of course, these two notations are entirely equivalent and the only important point is to associate each form of answer with the corresponding form of equation; most people keep to one form, to avoid any possible confusion.

If the value of the quantity appearing under the square root sign is positive then both roots are real; if it is negative then the roots form a complex conjugate pair, i.e.
they are of the form $p \pm iq$ with $p$ and $q$ real (see chapter 3); if it has zero value then the two roots are equal and special considerations usually arise.

Thus linear and quadratic equations can be dealt with in a cut-and-dried way. We now turn to methods for obtaining partial information about the roots of higher-degree polynomial equations. In some circumstances the knowledge that an equation has a root lying in a certain range, or that it has no real roots at all, is all that is actually required. For example, in the design of electronic circuits it is necessary to know whether the current in a proposed circuit will break into spontaneous oscillation. To test this, it is sufficient to establish whether a certain polynomial equation, whose coefficients are determined by the physical parameters of the circuit, has a root with a positive real part (see chapter 3); complete determination of all the roots is not needed for this purpose. If the complete set of roots of a polynomial equation is required, it can usually be obtained to any desired accuracy by numerical methods such as those described in chapter 27.

There is no explicit step-by-step approach to finding the roots of a general polynomial equation such as (1.1). In most cases analytic methods yield only information about the roots, rather than their exact values. To explain the relevant techniques we will consider a particular example, ‘thinking aloud’ on paper and expanding on special points about methods and lines of reasoning. In more routine situations such comment would be absent and the whole process briefer and more tightly focussed.

Example: the cubic case

Let us investigate the roots of the equation

$$g(x) = 4x^3 + 3x^2 - 6x - 1 = 0 \tag{1.7}$$

or, in an alternative phrasing, investigate the zeros of $g(x)$. We note first of all that this is a cubic equation. It can be seen that for $x$ large and positive $g(x)$ will be large and positive and, equally, that for $x$ large and negative $g(x)$ will be large and negative.
Therefore, intuitively (or, more formally, by continuity) $g(x)$ must cross the $x$-axis at least once and so $g(x) = 0$ must have at least one real root. Furthermore, it can be shown that if $f(x)$ is an $n$th-degree polynomial then the graph of $f(x)$ must cross the $x$-axis an even or odd number of times as $x$ varies between $-\infty$ and $+\infty$, according to whether $n$ itself is even or odd. Thus a polynomial of odd degree always has at least one real root, but one of even degree may have no real root. A small complication, discussed later in this section, occurs when repeated roots arise.

Having established that $g(x) = 0$ has at least one real root, we may ask how many real roots it could have. To answer this we need one of the fundamental theorems of algebra, mentioned above:

An $n$th-degree polynomial equation has exactly $n$ roots.

It should be noted that this does not imply that there are $n$ real roots (only that there are not more than $n$); some of the roots may be of the form $p + iq$.

To make the above theorem plausible and to see what is meant by repeated roots, let us suppose that the $n$th-degree polynomial equation $f(x) = 0$, (1.1), has $r$ roots $\alpha_1, \alpha_2, \ldots, \alpha_r$, considered distinct for the moment. That is, we suppose that $f(\alpha_k) = 0$ for $k = 1, 2, \ldots, r$, so that $f(x)$ vanishes only when $x$ is equal to one of the $r$ values $\alpha_k$. But the same can be said for the function

$$F(x) = A(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_r), \tag{1.8}$$

in which $A$ is a non-zero constant; $F(x)$ can clearly be multiplied out to form a polynomial expression.

We now call upon a second fundamental result in algebra: that if two polynomial functions $f(x)$ and $F(x)$ have equal values for all values of $x$, then their coefficients are equal on a term-by-term basis.
In other words, we can equate the coefficients of each and every power of $x$ in the two expressions (1.8) and (1.1); in particular we can equate the coefficients of the highest power of $x$. From this we have $Ax^r \equiv a_n x^n$ and thus that $r = n$ and $A = a_n$. As $r$ is both equal to $n$ and to the number of roots of $f(x) = 0$, we conclude that the $n$th-degree polynomial $f(x) = 0$ has $n$ roots. (Although this line of reasoning may make the theorem plausible, it does not constitute a proof since we have not shown that it is permissible to write $f(x)$ in the form of equation (1.8).)

We next note that the condition $f(\alpha_k) = 0$ for $k = 1, 2, \ldots, r$, could also be met if (1.8) were replaced by

$$F(x) = A(x - \alpha_1)^{m_1}(x - \alpha_2)^{m_2}\cdots(x - \alpha_r)^{m_r}, \tag{1.9}$$

with $A = a_n$. In (1.9) the $m_k$ are integers $\geq 1$ and are known as the multiplicities of the roots, $m_k$ being the multiplicity of $\alpha_k$. Expanding the right-hand side (RHS) leads to a polynomial of degree $m_1 + m_2 + \cdots + m_r$. This sum must be equal to $n$. Thus, if any of the $m_k$ is greater than unity then the number of distinct roots, $r$, is less than $n$; the total number of roots remains at $n$, but one or more of the $\alpha_k$ counts more than once. For example, the equation

$$F(x) = A(x - \alpha_1)^2(x - \alpha_2)^3(x - \alpha_3)(x - \alpha_4) = 0$$

has exactly seven roots, $\alpha_1$ being a double root and $\alpha_2$ a triple root, whilst $\alpha_3$ and $\alpha_4$ are unrepeated (simple) roots.

Figure 1.1 Two curves $\phi_1(x)$ and $\phi_2(x)$, both with zero derivatives at the same values of $x$, but with different numbers of real solutions to $\phi_i(x) = 0$.

We can now say that our particular equation (1.7) has either one or three real roots but in the latter case it may be that not all the roots are distinct. To decide how many real roots the equation has, we need to anticipate two ideas from the next chapter. The first of these is the notion of the derivative of a function, and the second is a result known as Rolle’s theorem.
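As a numerical aside on the factorised form (1.9), the statement that the degree equals the sum of the multiplicities can be checked by multiplying out the factors. The following is a minimal Python sketch rather than anything taken from the text: the helper names `poly_mul`, `expand_from_roots` and `evaluate`, the value of $A$ and the four root values are all illustrative assumptions, chosen to mirror the seven-root example above (one double root, one triple root and two simple roots).

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def expand_from_roots(A, roots_with_mult):
    """Expand A * prod_k (x - alpha_k)**m_k into a coefficient list."""
    coeffs = [A]
    for alpha, m in roots_with_mult:
        for _ in range(m):
            coeffs = poly_mul(coeffs, [-alpha, 1])  # one factor (x - alpha)
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical values: a double root at 1, a triple root at -2,
# and simple roots at 3 and 5, with overall constant A = 2.
A = 2
roots = [(1, 2), (-2, 3), (3, 1), (5, 1)]
coeffs = expand_from_roots(A, roots)
degree = len(coeffs) - 1

print("degree:", degree)                     # sum of the multiplicities
print("leading coefficient:", coeffs[-1])    # equals A, as in A = a_n
print("F at each root:", [evaluate(coeffs, alpha) for alpha, _ in roots])
```

Integer roots are used so that the arithmetic is exact; with non-integer roots the values of $F$ at the roots would only vanish to within rounding error.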
The derivative $f'(x)$ of a function $f(x)$ measures the slope of the tangent to the graph of $f(x)$ at that value of $x$ (see figure 2.1 in the next chapter). For the moment, the reader with no prior knowledge of calculus is asked to accept that the derivative of $ax^n$ is $nax^{n-1}$, so that the derivative $g'(x)$ of the curve $g(x) = 4x^3 + 3x^2 - 6x - 1$ is given by $g'(x) = 12x^2 + 6x - 6$. Similar expressions for the derivatives of other polynomials are used later in this chapter.

Rolle’s theorem states that if $f(x)$ has equal values at two different values of $x$ then at some point between these two $x$-values its derivative is equal to zero; i.e. the tangent to its graph is parallel to the $x$-axis at that point (see figure 2.2).

Having briefly mentioned the derivative of a function and Rolle’s theorem, we now use them to establish whether $g(x)$ has one or three real zeros. If $g(x) = 0$ does have three real roots $\alpha_k$, i.e. $g(\alpha_k) = 0$ for $k = 1, 2, 3$, then it follows from Rolle’s theorem that between any consecutive pair of them (say $\alpha_1$ and $\alpha_2$) there must be some real value of $x$ at which $g'(x) = 0$. Similarly, there must be a further zero of $g'(x)$ lying between $\alpha_2$ and $\alpha_3$. Thus a necessary condition for three real roots of $g(x) = 0$ is that $g'(x) = 0$ itself has two real roots.

However, this condition on the number of roots of $g'(x) = 0$, whilst necessary, is not sufficient to guarantee three real roots of $g(x) = 0$. This can be seen by inspecting the cubic curves in figure 1.1. For each of the two functions $\phi_1(x)$ and $\phi_2(x)$, the derivative is equal to zero at both $x = \beta_1$ and $x = \beta_2$. Clearly, though, $\phi_2(x) = 0$ has three real roots whilst $\phi_1(x) = 0$ has only one. It is easy to see that the crucial difference is that $\phi_1(\beta_1)$ and $\phi_1(\beta_2)$ have the same sign, whilst $\phi_2(\beta_1)$ and $\phi_2(\beta_2)$ have opposite signs.

It will be apparent that for some equations, $\phi(x) = 0$ say, $\phi'(x)$ equals zero at a value of $x$ for which $\phi(x)$ is also zero.
Then the graph of $\phi(x)$ just touches the $x$-axis. When this happens the value of $x$ so found is, in fact, a double real root of the polynomial equation (corresponding to one of the $m_k$ in (1.9) having the value 2) and must be counted twice when determining the number of real roots.

Finally, then, we are in a position to decide the number of real roots of the equation

$$g(x) = 4x^3 + 3x^2 - 6x - 1 = 0.$$

The equation $g'(x) = 0$, with $g'(x) = 12x^2 + 6x - 6$, is a quadratic equation with explicit solutions

$$\beta_{1,2} = \frac{-3 \pm \sqrt{9 + 72}}{12},$$

so that $\beta_1 = -1$ and $\beta_2 = \frac{1}{2}$. The corresponding values of $g(x)$ are $g(\beta_1) = 4$ and $g(\beta_2) = -\frac{11}{4}$, which are of opposite sign. This indicates that $4x^3 + 3x^2 - 6x - 1 = 0$ has three real roots, one lying in the range $-1$ [...]

$$\sum_{j=1}^{n}\sum_{k>j}^{n} \alpha_j \alpha_k = \frac{a_{n-2}}{a_n}. \tag{1.14}$$

In the case of a quadratic equation these root properties are used sufficiently often that they are worth stating explicitly, as follows. If the roots of the quadratic equation $ax^2 + bx + c = 0$ are $\alpha_1$ and $\alpha_2$ then

$$\alpha_1 + \alpha_2 = -\frac{b}{a}, \qquad \alpha_1 \alpha_2 = \frac{c}{a}.$$

If the alternative standard form for the quadratic is used, $b$ is replaced by $2b$ in both the equation and the first of these results.

▶ Find a cubic equation whose roots are $-4$, $3$ and $5$.

From results (1.12) – (1.14) we can compute that, arbitrarily setting $a_3 = 1$,

$$-a_2 = \sum_{k=1}^{3} \alpha_k = 4, \qquad a_1 = \sum_{j=1}^{3}\sum_{k>j}^{3} \alpha_j \alpha_k = -17, \qquad a_0 = (-1)^3 \prod_{k=1}^{3} \alpha_k = 60.$$

Thus a possible cubic equation is $x^3 + (-4)x^2 + (-17)x + (60) = 0$. Of course, any multiple of $x^3 - 4x^2 - 17x + 60 = 0$ will do just as well. ◀

1.2 Trigonometric identities

So many of the applications of mathematics to physics and engineering are concerned with periodic, and in particular sinusoidal, behaviour that a sure and ready handling of the corresponding mathematical functions is an essential skill. Even situations with no obvious periodicity are often expressed in terms of periodic functions for the purposes of analysis.
Later in this book whole chapters are devoted to developing the techniques involved, but as a necessary prerequisite we here establish (or remind the reader of) some standard identities with which he or she should be fully familiar, so that the manipulation of expressions containing sinusoids becomes automatic and reliable. So as to emphasise the angular nature of the argument of a sinusoid we will denote it in this section by $\theta$ rather than $x$.

1.2.1 Single-angle identities

We give without proof the basic identity satisfied by the sinusoidal functions $\sin\theta$ and $\cos\theta$, namely

$$\cos^2\theta + \sin^2\theta = 1. \tag{1.15}$$

If $\sin\theta$ and $\cos\theta$ have been defined geometrically in terms of the coordinates of a point on a circle, a reference to the name of Pythagoras will suffice to establish this result. If they have been defined by means of series (with $\theta$ expressed in radians) then the reader should refer to Euler’s equation (3.23) on page 93, and note that $e^{i\theta}$ has unit modulus if $\theta$ is real.

Figure 1.2 Illustration of the compound-angle identities. Refer to the main text for details.

Other standard single-angle formulae derived from (1.15) by dividing through by various powers of $\sin\theta$ and $\cos\theta$ are

$$1 + \tan^2\theta = \sec^2\theta, \tag{1.16}$$

$$\cot^2\theta + 1 = \operatorname{cosec}^2\theta. \tag{1.17}$$

1.2.2 Compound-angle identities

The basis for building expressions for the sinusoidal functions of compound angles is provided by those for the sum and difference of just two angles, since all other cases can be built up from these, in principle. Later we will see that a study of complex numbers can provide a more efficient approach in some cases.

To prove the basic formulae for the sine and cosine of a compound angle $A + B$ in terms of the sines and cosines of $A$ and $B$, we consider the construction shown in figure 1.2. It shows two sets of axes, $Oxy$ and $Ox'y'$, with a common origin but rotated with respect to each other through an angle $A$.
The point $P$ lies on the unit circle centred on the common origin $O$ and has coordinates $(\cos(A + B), \sin(A + B))$ with respect to the axes $Oxy$ and coordinates $(\cos B, \sin B)$ with respect to the axes $Ox'y'$.

Parallels to the axes $Oxy$ (dotted lines) and $Ox'y'$ (broken lines) have been drawn through $P$. Further parallels ($MR$ and $RN$) to the $Ox'y'$ axes have been drawn through $R$, the point $(0, \sin(A + B))$ in the $Oxy$ system. That all the angles marked with the symbol • are equal to $A$ follows from the simple geometry of right-angled triangles and crossing lines.

We now determine the coordinates of $P$ in terms of lengths in the figure, expressing those lengths in terms of both sets of coordinates:

(i) $\cos B = x' = TN + NP = MR + NP = OR\sin A + RP\cos A = \sin(A + B)\sin A + \cos(A + B)\cos A$;

(ii) $\sin B = y' = OM - TM = OM - NR = OR\cos A - RP\sin A = \sin(A + B)\cos A - \cos(A + B)\sin A$.

Now, if equation (i) is multiplied by $\sin A$ and added to equation (ii) multiplied by $\cos A$, the result is

$$\sin A\cos B + \cos A\sin B = \sin(A + B)(\sin^2 A + \cos^2 A) = \sin(A + B).$$

Similarly, if equation (ii) is multiplied by $\sin A$ and subtracted from equation (i) multiplied by $\cos A$, the result is

$$\cos A\cos B - \sin A\sin B = \cos(A + B)(\cos^2 A + \sin^2 A) = \cos(A + B).$$

Corresponding graphically based results can be derived for the sines and cosines of the difference of two angles; however, they are more easily obtained by setting $B$ to $-B$ in the previous results and remembering that $\sin B$ becomes $-\sin B$ whilst $\cos B$ is unchanged. The four results may be summarised by

$$\sin(A \pm B) = \sin A\cos B \pm \cos A\sin B, \tag{1.18}$$

$$\cos(A \pm B) = \cos A\cos B \mp \sin A\sin B. \tag{1.19}$$

Standard results can be deduced from these by setting one of the two angles equal to $\pi$ or to $\pi/2$:

$$\sin(\pi - \theta) = \sin\theta, \qquad \cos(\pi - \theta) = -\cos\theta, \tag{1.20}$$

$$\sin\left(\tfrac{1}{2}\pi - \theta\right) = \cos\theta, \qquad \cos\left(\tfrac{1}{2}\pi - \theta\right) = \sin\theta. \tag{1.21}$$

From these basic results many more can be derived.
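The compound-angle results (1.18) and (1.19), and the special cases (1.20) and (1.21), are easy to spot-check numerically. The following is a minimal Python sketch using only the standard `math` module; the function names `sin_sum` and `cos_sum` and the test angles are illustrative choices, not anything defined in the text, and a numerical check at a few angles is of course no substitute for the geometric proof.

```python
import math


def sin_sum(A, B, sign=+1):
    """Right-hand side of (1.18): sin A cos B +/- cos A sin B."""
    return math.sin(A) * math.cos(B) + sign * math.cos(A) * math.sin(B)


def cos_sum(A, B, sign=+1):
    """Right-hand side of (1.19): cos A cos B -/+ sin A sin B."""
    return math.cos(A) * math.cos(B) - sign * math.sin(A) * math.sin(B)


# Arbitrary test angles in radians (illustrative values only).
A, B = 0.7, 1.9
for sign in (+1, -1):
    assert math.isclose(math.sin(A + sign * B), sin_sum(A, B, sign), abs_tol=1e-12)
    assert math.isclose(math.cos(A + sign * B), cos_sum(A, B, sign), abs_tol=1e-12)

theta = 0.4
# (1.20): set A = pi and take the lower signs.
assert math.isclose(sin_sum(math.pi, theta, -1), math.sin(theta), abs_tol=1e-12)
assert math.isclose(cos_sum(math.pi, theta, -1), -math.cos(theta), abs_tol=1e-12)
# (1.21): set A = pi/2 and take the lower signs.
assert math.isclose(sin_sum(math.pi / 2, theta, -1), math.cos(theta), abs_tol=1e-12)
assert math.isclose(cos_sum(math.pi / 2, theta, -1), math.sin(theta), abs_tol=1e-12)

print("compound-angle identities hold at the chosen test angles")
```

An absolute tolerance is used because the floating-point value of `math.sin(math.pi)` is tiny but not exactly zero.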
An immediate deduction, obtained by taking the ratio of the two equations (1.18) and (1.19) and then dividing both the numerator and denominator of this ratio by $\cos A\cos B$, is

$$\tan(A \pm B) = \frac{\tan A \pm \tan B}{1 \mp \tan A\tan B}. \tag{1.22}$$

One application of this result is a test for whether two lines on a graph are orthogonal (perpendicular); more generally, it determines the angle between them. The standard notation for a straight-line graph is $y = mx + c$, in which $m$ is the slope of the graph and $c$ is its intercept on the $y$-axis. It should be noted that the slope $m$ is also the tangent of the angle the line makes with the $x$-axis. Consequently the angle $\theta_{12}$ between two such straight-line graphs is equal to the difference in the angles they individually make with the $x$-axis, and the tangent of that angle is given by (1.22):

$$\tan\theta_{12} = \frac{\tan\theta_1 - \tan\theta_2}{1 + \tan\theta_1\tan\theta_2} = \frac{m_1 - m_2}{1 + m_1 m_2}. \tag{1.23}$$

For the lines to be orthogonal we must have $\theta_{12} = \pi/2$, i.e. the final fraction on the RHS of the above equation must equal $\infty$, and so

$$m_1 m_2 = -1. \tag{1.24}$$

A kind of inversion of equations (1.18) and (1.19) enables the sum or difference of two sines or cosines to be expressed as the product of two sinusoids; the procedure is typified by the following. Adding together the expressions given by (1.18) for $\sin(A + B)$ and $\sin(A - B)$ yields

$$\sin(A + B) + \sin(A - B) = 2\sin A\cos B.$$

If we now write $A + B = C$ and $A - B = D$, this becomes

$$\sin C + \sin D = 2\sin\left(\frac{C + D}{2}\right)\cos\left(\frac{C - D}{2}\right). \tag{1.25}$$

In a similar way each of the following equations can be derived:

$$\sin C - \sin D = 2\cos\left(\frac{C + D}{2}\right)\sin\left(\frac{C - D}{2}\right), \tag{1.26}$$

$$\cos C + \cos D = 2\cos\left(\frac{C + D}{2}\right)\cos\left(\frac{C - D}{2}\right), \tag{1.27}$$

$$\cos C - \cos D = -2\sin\left(\frac{C + D}{2}\right)\sin\left(\frac{C - D}{2}\right). \tag{1.28}$$

The minus sign on the right of the last of these equations should be noted; it may help to avoid overlooking this ‘oddity’ to recall that if $C > D$ then $\cos C$ [...] $|K| = (a^2 + b^2)^{1/2}$. ◀

1.3 Coordinate geometry

We have already mentioned the standard form for a straight-line graph, namely

$$y = mx + c, \tag{1.35}$$

representing a linear relationship between the independent variable $x$ and the dependent variable $y$. The slope $m$ is equal to the tangent of the angle the line makes with the $x$-axis whilst $c$ is the intercept on the $y$-axis.

An alternative form for the equation of a straight line is

$$ax + by + k = 0, \tag{1.36}$$

to which (1.35) is clearly connected by

$$m = -\frac{a}{b} \quad \text{and} \quad c = -\frac{k}{b}.$$

This form treats $x$ and $y$ on a more symmetrical basis, the intercepts on the two axes being $-k/a$ and $-k/b$ respectively.

A power relationship between two variables, i.e. one of the form $y = Ax^n$, can also be cast into straight-line form by taking the logarithms of both sides. Whilst it is normal in mathematical work to use natural logarithms (to base $e$, written $\ln x$), for practical investigations logarithms to base 10 are often employed. In either case the form is the same, but it needs to be remembered which has been used when recovering the value of $A$ from fitted data. In the mathematical (base $e$) form, the power relationship becomes

$$\ln y = n\ln x + \ln A. \tag{1.37}$$

Now the slope gives the power $n$, whilst the intercept on the $\ln y$ axis is $\ln A$, which yields $A$, either by exponentiation or by taking antilogarithms.

The other standard coordinate forms of two-dimensional curves that students should know and recognise are those concerned with the conic sections – so called because they can all be obtained by taking suitable sections across a (double) cone.
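Returning briefly to the power relationship (1.37): recovering $n$ and $A$ from data amounts to fitting a straight line to $\ln y$ against $\ln x$. The following is a minimal Python sketch under stated assumptions. The text does not say how the line is to be fitted, so an ordinary least-squares fit is assumed here; the function name `fit_power_law` and the synthetic data (generated with $A = 3$ and $n = 1.5$) are purely illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = A * x**n by a least-squares line through (ln x, ln y).

    Returns (n, A): the slope of the line is n and the intercept is ln A,
    as in (1.37). Natural logarithms are used, so A is recovered with exp.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    count = len(lx)
    mean_x = sum(lx) / count
    mean_y = sum(ly) / count
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx                       # the power n
    intercept = mean_y - slope * mean_x     # ln A
    return slope, math.exp(intercept)


# Synthetic, noise-free data from a hypothetical law y = 3 * x**1.5.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 1.5 for x in xs]
n, A = fit_power_law(xs, ys)
print("n =", n, " A =", A)
```

Had base-10 logarithms been used instead, the slope would be unchanged but $A$ would have to be recovered as $10^{\text{intercept}}$, which is the point made above about remembering which base was used.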
Because the conic sections can take many different orientations and scalings their general form is complex,

$$Ax^2 + By^2 + Cxy + Dx + Ey + F = 0, \tag{1.38}$$

but each can be represented by one of four generic forms, an ellipse, a parabola, a hyperbola or, the degenerate form, a pair of straight lines. If they are reduced to their standard representations, in which axes of symmetry are made to coincide with the coordinate axes, the first three take the forms

$$\frac{(x - \alpha)^2}{a^2} + \frac{(y - \beta)^2}{b^2} = 1 \quad\text{(ellipse)}, \tag{1.39}$$

$$(y - \beta)^2 = 4a(x - \alpha) \quad\text{(parabola)}, \tag{1.40}$$

$$\frac{(x - \alpha)^2}{a^2} - \frac{(y - \beta)^2}{b^2} = 1 \quad\text{(hyperbola)}. \tag{1.41}$$

Here, $(\alpha, \beta)$ gives the position of the ‘centre’ of the curve, usually taken as the origin $(0, 0)$ when this does not conflict with any imposed conditions. The parabola equation given is that for a curve symmetric about a line parallel to the $x$-axis. For one symmetrical about a parallel to the $y$-axis the equation would read $(x - \alpha)^2 = 4a(y - \beta)$.

Of course, the circle is the special case of an ellipse in which $b = a$ and the equation takes the form

$$(x - \alpha)^2 + (y - \beta)^2 = a^2. \tag{1.42}$$

The distinguishing characteristic of this equation is that when it is expressed in the form (1.38) the coefficients of $x^2$ and $y^2$ are equal and that of $xy$ is zero; this property is not changed by any reorientation or scaling and so acts to identify a general conic as a circle.

Definitions of the conic sections in terms of geometrical properties are also available; for example, a parabola can be defined as the locus of a point that is always at the same distance from a given straight line (the directrix) as it is from a given point (the focus). When these properties are expressed in Cartesian coordinates the above equations are obtained. For a circle, the defining property is that all points on the curve are a distance $a$ from $(\alpha, \beta)$; (1.42) expresses this requirement very directly. In the following worked example we derive the equation for a parabola.
▶ Find the equation of a parabola that has the line $x = -a$ as its directrix and the point $(a, 0)$ as its focus.

Figure 1.3 shows the situation in Cartesian coordinates. Expressing the defining requirement that $PN$ and $PF$ are equal in length gives

$$(x + a) = [(x - a)^2 + y^2]^{1/2} \quad\Rightarrow\quad (x + a)^2 = (x - a)^2 + y^2,$$

which, on expansion of the squared terms, immediately gives $y^2 = 4ax$. This is (1.40) with $\alpha$ and $\beta$ both set equal to zero. ◀

Although the algebra is more complicated, the same method can be used to derive the equations for the ellipse and the hyperbola. In these cases the distance from the fixed point is a definite fraction, $e$, known as the eccentricity, of the distance from the fixed line. For an ellipse $0 < e < 1$, for a circle $e = 0$, and for a hyperbola $e > 1$. The parabola corresponds to the case $e = 1$.

Figure 1.3 Construction of a parabola using the point $(a, 0)$ as the focus and the line $x = -a$ as the directrix.

The values of $a$ and $b$ (with $a \geq b$) in equation (1.39) for an ellipse are related to $e$ through

$$e^2 = \frac{a^2 - b^2}{a^2}$$

and give the lengths of the semi-axes of the ellipse. If the ellipse is centred on the origin, i.e. $\alpha = \beta = 0$, then the focus is $(-ae, 0)$ and the directrix is the line $x = -a/e$.

For each conic section curve, although we have two variables, $x$ and $y$, they are not independent, since if one is given then the other can be determined. However, determining $y$ when $x$ is given, say, involves solving a quadratic equation on each occasion, and so it is convenient to have parametric representations of the curves. A parametric representation allows each point on a curve to be associated with a unique value of a single parameter $t$. The simplest parametric representations for the conic sections are as given below, though that for the hyperbola uses hyperbolic functions, not formally introduced until chapter 3.
That they do give valid parameterizations can be verified by substituting them into the standard forms (1.39)–(1.41); in each case the standard form is reduced to an algebraic or trigonometric identity.

$$x = \alpha + a\cos\phi, \quad y = \beta + b\sin\phi \quad\text{(ellipse)},$$

$$x = \alpha + at^2, \quad y = \beta + 2at \quad\text{(parabola)},$$

$$x = \alpha + a\cosh\phi, \quad y = \beta + b\sinh\phi \quad\text{(hyperbola)}.$$

As a final example illustrating several topics from this section we now prove the well-known result that the angle subtended by a diameter at any point on a circle is a right angle.

▶ Taking the diameter to be the line joining $Q = (-a, 0)$ and $R = (a, 0)$ and the point $P$ to be any point on the circle $x^2 + y^2 = a^2$, prove that angle $QPR$ is a right angle.

If $P$ is the point $(x, y)$, the slope of the line $QP$ is

$$m_1 = \frac{y - 0}{x - (-a)} = \frac{y}{x + a}.$$

That of $RP$ is

$$m_2 = \frac{y - 0}{x - (a)} = \frac{y}{x - a}.$$

Thus

$$m_1 m_2 = \frac{y^2}{x^2 - a^2}.$$

But, since $P$ is on the circle, $y^2 = a^2 - x^2$ and consequently $m_1 m_2 = -1$. From result (1.24) this implies that $QP$ and $RP$ are orthogonal and that $QPR$ is therefore a right angle. Note that this is true for any point $P$ on the circle. ◀

### 1.4 Partial fractions

In subsequent chapters, and in particular when we come to study integration in chapter 2, we will need to express a function $f(x)$ that is the ratio of two polynomials in a more manageable form. To remove some potential complexity from our discussion we will assume that all the coefficients in the polynomials are real, although this is not an essential simplification.

The behaviour of $f(x)$ is crucially determined by the location of the zeros of its denominator, i.e. if $f(x)$ is written as $f(x) = g(x)/h(x)$ where both $g(x)$ and $h(x)$ are polynomials,§ then $f(x)$ changes extremely rapidly when $x$ is close to those values $\alpha_i$ that are the roots of $h(x) = 0$. To make such behaviour explicit, we write $f(x)$ as a sum of terms such as $A/(x - \alpha)^n$, in which $A$ is a constant, $\alpha$ is one of the $\alpha_i$ that satisfy $h(\alpha_i) = 0$ and $n$ is a positive integer.
Writing a function in this way is known as expressing it in partial fractions.

Suppose, for the sake of definiteness, that we wish to express the function

$$f(x) = \frac{4x + 2}{x^2 + 3x + 2}$$

in partial fractions, i.e. to write it as

$$f(x) = \frac{g(x)}{h(x)} = \frac{4x + 2}{x^2 + 3x + 2} = \frac{A_1}{(x - \alpha_1)^{n_1}} + \frac{A_2}{(x - \alpha_2)^{n_2}} + \cdots. \tag{1.43}$$

§ It is assumed that the ratio has been reduced so that $g(x)$ and $h(x)$ do not contain any common factors, i.e. there is no value of $x$ that makes both vanish at the same time. We may also assume without any loss of generality that the coefficient of the highest power of $x$ in $h(x)$ has been made equal to unity, if necessary, by dividing both numerator and denominator by the coefficient of this highest power.

The first question that arises is that of how many terms there should be on the right-hand side (RHS). Although some complications occur when $h(x)$ has repeated roots (these are considered below) it is clear that $f(x)$ only becomes infinite at the two values of $x$, $\alpha_1$ and $\alpha_2$, that make $h(x) = 0$. Consequently the RHS can only become infinite at the same two values of $x$ and therefore contains only two partial fractions – these are the ones shown explicitly. This argument can be trivially extended (again temporarily ignoring the possibility of repeated roots of $h(x)$) to show that if $h(x)$ is a polynomial of degree $n$ then there should be $n$ terms on the RHS, each containing a different root $\alpha_i$ of the equation $h(\alpha_i) = 0$.

A second general question concerns the appropriate values of the $n_i$. This is answered by putting the RHS over a common denominator, which will clearly have to be the product $(x - \alpha_1)^{n_1}(x - \alpha_2)^{n_2}\cdots$. Comparison of the highest power of $x$ in this new RHS with the same power in $h(x)$ shows that $n_1 + n_2 + \cdots = n$. This result holds whether or not $h(x) = 0$ has repeated roots and, although we do not give a rigorous proof, strongly suggests the following correct conclusions.
- The number of terms on the RHS is equal to the number of distinct roots of $h(x) = 0$, each term having a different root $\alpha_i$ in its denominator $(x - \alpha_i)^{n_i}$.
- If $\alpha_i$ is a multiple root of $h(x) = 0$ then the value to be assigned to $n_i$ in (1.43) is that of $m_i$ when $h(x)$ is written in the product form (1.9). Further, as discussed on p. 23, $A_i$ has to be replaced by a polynomial of degree $m_i - 1$. This is also formally true for non-repeated roots, since then both $m_i$ and $n_i$ are equal to unity.

Returning to our specific example we note that the denominator $h(x)$ has zeros at $x = \alpha_1 = -1$ and $x = \alpha_2 = -2$; these $x$-values are the simple (non-repeated) roots of $h(x) = 0$. Thus the partial fraction expansion will be of the form

$$\frac{4x + 2}{x^2 + 3x + 2} = \frac{A_1}{x + 1} + \frac{A_2}{x + 2}. \tag{1.44}$$

We now list several methods available for determining the coefficients $A_1$ and $A_2$. We also remind the reader that, as with all the explicit examples and techniques described, these methods are to be considered as models for the handling of any ratio of polynomials, with or without characteristics that make it a special case.

(i) The RHS can be put over a common denominator, in this case $(x + 1)(x + 2)$, and then the coefficients of the various powers of $x$ can be equated in the numerators on both sides of the equation. This leads to

$$4x + 2 = A_1(x + 2) + A_2(x + 1),$$

$$4 = A_1 + A_2, \qquad 2 = 2A_1 + A_2.$$

Solving the simultaneous equations for $A_1$ and $A_2$ gives $A_1 = -2$ and $A_2 = 6$.

(ii) A second method is to substitute two (or more generally $n$) different values of $x$ into each side of (1.44) and so obtain two (or $n$) simultaneous equations for the two (or $n$) constants $A_i$. To justify this practical way of proceeding it is necessary, strictly speaking, to appeal to method (i) above, which establishes that there are unique values for $A_1$ and $A_2$ valid for all values of $x$. It is normally very convenient to take zero as one of the values of $x$, but of course any set will do.
Suppose in the present case that we use the values $x = 0$ and $x = 1$ and substitute in (1.44). The resulting equations are

$$\frac{2}{2} = \frac{A_1}{1} + \frac{A_2}{2}, \qquad \frac{6}{6} = \frac{A_1}{2} + \frac{A_2}{3},$$

which on solution give $A_1 = -2$ and $A_2 = 6$, as before. The reader can easily verify that any other pair of values for $x$ (except for a pair that includes $\alpha_1$ or $\alpha_2$) gives the same values for $A_1$ and $A_2$.

(iii) The very reason why method (ii) fails if $x$ is chosen as one of the roots $\alpha_i$ of $h(x) = 0$ can be made the basis for determining the values of the $A_i$ corresponding to non-multiple roots without having to solve simultaneous equations. The method is conceptually more difficult than the other methods presented here, and needs results from the theory of complex variables (chapter 24) to justify it. However, we give a practical ‘cookbook’ recipe for determining the coefficients.

(a) To determine the coefficient $A_k$, imagine the denominator $h(x)$ written as the product $(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n)$, with any $m$-fold repeated root giving rise to $m$ factors in parentheses.

(b) Now set $x$ equal to $\alpha_k$ and evaluate the expression obtained after omitting the factor that reads $\alpha_k - \alpha_k$.

(c) Divide the value so obtained into $g(\alpha_k)$; the result is the required coefficient $A_k$.

For our specific example we find in step (a) that $h(x) = (x + 1)(x + 2)$ and that in evaluating $A_1$ step (b) yields $-1 + 2$, i.e. 1. Since $g(-1) = 4(-1) + 2 = -2$, step (c) gives $A_1$ as $(-2)/(1)$, i.e. $-2$, in agreement with our other evaluations. In a similar way $A_2$ is evaluated as $(-6)/(-1) = 6$.

Thus any one of the methods listed above shows that

$$\frac{4x + 2}{x^2 + 3x + 2} = \frac{-2}{x + 1} + \frac{6}{x + 2}.$$
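Recipe (iii) is mechanical enough to sketch in a few lines of code. The following is only a minimal illustration, assuming that all the roots of $h(x) = 0$ are simple and that $h(x)$ has leading coefficient unity; the function name `cover_up` and the way the roots are passed in are hypothetical choices made for this sketch.

```python
from fractions import Fraction


def cover_up(g, roots):
    """Return the constants A_k for g(x) / prod_i (x - alpha_i).

    Assumes the roots alpha_i are all distinct (simple roots only).
    """
    coeffs = []
    for k, alpha_k in enumerate(roots):
        # Step (b): evaluate h(x) at x = alpha_k with the factor
        # (alpha_k - alpha_k) omitted.
        reduced = Fraction(1)
        for i, alpha_i in enumerate(roots):
            if i != k:
                reduced *= (alpha_k - alpha_i)
        # Step (c): divide that value into g(alpha_k).
        coeffs.append(Fraction(g(alpha_k)) / reduced)
    return coeffs


def g(x):
    # Numerator of the example in (1.44).
    return 4 * x + 2


# Roots of h(x) = x^2 + 3x + 2 are alpha_1 = -1 and alpha_2 = -2.
A = cover_up(g, [-1, -2])  # A_1 = -2, A_2 = 6


def expansion(x):
    # Right-hand side of (1.44) with the constants just found.
    return A[0] / (x + 1) + A[1] / (x + 2)
```

Exact rational arithmetic is used so that the constants can be compared directly with those from methods (i) and (ii); evaluating `expansion` at $x = 0$ and $x = 1$ reproduces the two checks made in method (ii).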
The best method to use in any particular circumstance will depend on the complexity, in terms of the degrees of the polynomials and the multiplicities of the roots of the denominator, of the function being considered and, to some extent, on the individual inclinations of the student; some prefer lengthy but straightforward solution of simultaneous equations, whilst others feel more at home carrying through shorter but more abstract calculations in their heads.

#### 1.4.1 Complications and special cases

Having established the basic method for partial fractions, we now show, through further worked examples, how some complications are dealt with by extensions to the procedure. These extensions are introduced one at a time, but of course in any practical application more than one may be involved.

*The degree of the numerator is greater than or equal to that of the denominator*

Although we have not specifically mentioned the fact, it will be apparent from trying to apply method (i) of the previous subsection to such a case, that if the degree of the numerator ($m$) is not less than that of the denominator ($n$) then the ratio of two polynomials cannot be expressed in partial fractions.

To get round this difficulty it is necessary to start by dividing the denominator $h(x)$ into the numerator $g(x)$ to obtain a further polynomial, which we will denote by $s(x)$, together with a function $t(x)$ that is a ratio of two polynomials for which the degree of the numerator is less than that of the denominator. The function $t(x)$ can therefore be expanded in partial fractions. As a formula,

$$f(x) = \frac{g(x)}{h(x)} = s(x) + t(x) \equiv s(x) + \frac{r(x)}{h(x)}. \tag{1.45}$$

It is apparent that the polynomial $r(x)$ is the remainder obtained when $g(x)$ is divided by $h(x)$, and, in general, will be a polynomial of degree $n - 1$. It is also clear that the polynomial $s(x)$ will be of degree $m - n$.
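The division in (1.45) can be sketched as a short routine. This is an illustration under stated assumptions rather than a prescribed method: coefficients are listed from the highest power down, the name `poly_divmod` is hypothetical, and the polynomials used are those of the worked example given later in this subsection, so that the output can be compared with the values obtained there.

```python
from fractions import Fraction


def poly_divmod(g, h):
    """Divide g(x) by h(x); return (s, r) with g = s*h + r, deg r < deg h.

    Polynomials are lists of coefficients, highest power first.
    """
    r = [Fraction(c) for c in g]
    h = [Fraction(c) for c in h]
    if len(r) < len(h):
        # Numerator already of lower degree: nothing to divide out.
        return [Fraction(0)], r
    s = []
    # One pass per coefficient of s(x), which has degree m - n.
    for _ in range(len(g) - len(h) + 1):
        q = r[0] / h[0]          # next coefficient of s(x)
        s.append(q)
        for i, hc in enumerate(h):
            r[i] -= q * hc       # subtract q * h(x), aligned at the top power
        r.pop(0)                 # leading term is now zero; drop it
    return s, r


# g(x) = x^3 + 3x^2 + 2x + 1 and h(x) = x^2 - x - 6.
s, r = poly_divmod([1, 3, 2, 1], [1, -1, -6])
# s represents x + 4 and r represents 12x + 25.
```

Here $m = 3$ and $n = 2$, so $s(x)$ has degree $m - n = 1$ and $r(x)$ has degree at most $n - 1 = 1$, in line with the statements above.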
Again, the actual division process can be set out as an algebraic long division sum but is probably more easily handled by writing (1.45) in the form

$$g(x) = s(x)h(x) + r(x) \tag{1.46}$$

or, more explicitly, as

$$g(x) = (s_{m-n}x^{m-n} + s_{m-n-1}x^{m-n-1} + \cdots + s_0)h(x) + (r_{n-1}x^{n-1} + r_{n-2}x^{n-2} + \cdots + r_0) \tag{1.47}$$

and then equating coefficients.

We illustrate this procedure with the following worked example.

▶ Find the partial fraction decomposition of the function

$$f(x) = \frac{x^3 + 3x^2 + 2x + 1}{x^2 - x - 6}.$$

Since the degree of the numerator is 3 and that of the denominator is 2, a preliminary long division is necessary. The polynomial $s(x)$ resulting from the division will have degree $3 - 2 = 1$ and the remainder $r(x)$ will be of degree $2 - 1 = 1$ (or less). Thus we write

$$x^3 + 3x^2 + 2x + 1 = (s_1 x + s_0)(x^2 - x - 6) + (r_1 x + r_0).$$

From equating the coefficients of the various powers of $x$ on the two sides of the equation, starting with the highest, we now obtain the simultaneous equations

$$1 = s_1,$$
$$3 = s_0 - s_1,$$
$$2 = -s_0 - 6s_1 + r_1,$$
$$1 = -6s_0 + r_0.$$

These are readily solved, in the given order, to yield $s_1 = 1$, $s_0 = 4$, $r_1 = 12$ and $r_0 = 25$. Thus $f(x)$ can be written as

$$f(x) = x + 4 + \frac{12x + 25}{x^2 - x - 6}.$$

The last term can now be decomposed into partial fractions as previously. The zeros of the denominator are at $x = 3$ and $x = -2$ and the application of any method from the previous subsection yields the respective constants as $A_1 = 12\tfrac{1}{5}$ and $A_2 = -\tfrac{1}{5}$. Thus the final partial fraction decomposition of $f(x)$ is

$$x + 4 + \frac{61}{5(x - 3)} - \frac{1}{5(x + 2)}.$$ ◀

*Factors of the form $a^2 + x^2$ in the denominator*

We have so far assumed that the roots of $h(x) = 0$, needed for the factorisation of the denominator of $f(x)$, can always be found. In principle they always can but in some cases they are not real. Consider, for example, attempting to express in partial fractions a polynomial ratio whose denominator is $h(x) = x^3 - x^2 + 2x - 2$.
Clearly $x = 1$ is a zero of $h(x)$, and so a first factorisation is $(x - 1)(x^2 + 2)$. However we cannot make any further progress because the factor $x^2 + 2$ cannot be expressed as $(x - \alpha)(x - \beta)$ for any real $\alpha$ and $\beta$.

Complex numbers are introduced later in this book (chapter 3) and, when the reader has studied them, he or she may wish to justify the procedure set out below. It can be shown to be equivalent to that already given, but the zeros of $h(x)$ are now allowed to be complex and terms that are complex conjugates of each other are combined to leave only real terms.

Since quadratic factors of the form $a^2 + x^2$ that appear in $h(x)$ cannot be reduced to the product of two linear factors, partial fraction expansions including them need to have numerators in the corresponding terms that are not simply constants $A_i$ but linear functions of $x$, i.e. of the form $B_i x + C_i$. Thus, in the expansion, linear terms (first-degree polynomials) in the denominator have constants (zero-degree polynomials) in their numerators, whilst quadratic terms (second-degree polynomials) in the denominator have linear terms (first-degree polynomials) in their numerators. As a symbolic formula, the partial fraction expansion of

$$\frac{g(x)}{(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_p)(x^2 + a_1^2)(x^2 + a_2^2)\cdots(x^2 + a_q^2)}$$

should take the form

$$\frac{A_1}{x - \alpha_1} + \frac{A_2}{x - \alpha_2} + \cdots + \frac{A_p}{x - \alpha_p} + \frac{B_1 x + C_1}{x^2 + a_1^2} + \frac{B_2 x + C_2}{x^2 + a_2^2} + \cdots + \frac{B_q x + C_q}{x^2 + a_q^2}.$$

Of course, the degree of $g(x)$ must be less than $p + 2q$; if it is not, an initial division must be carried out as demonstrated earlier.

*Repeated factors in the denominator*

Consider trying (incorrectly) to expand

$$f(x) = \frac{x - 4}{(x + 1)(x - 2)^2}$$

in partial fraction form as follows:

$$\frac{x - 4}{(x + 1)(x - 2)^2} = \frac{A_1}{x + 1} + \frac{A_2}{(x - 2)^2}.$$

Multiplying both sides of this supposed equality by $(x + 1)(x - 2)^2$ produces an equation whose LHS is linear in $x$, whilst its RHS is quadratic. This is clearly wrong and so an expansion in the above form cannot be valid.
The correction we must make is very similar to that needed in the previous subsection, namely that since $(x - 2)^2$ is a quadratic polynomial the numerator of the term containing it must be a first-degree polynomial, and not simply a constant.

The correct form for the part of the expansion containing the doubly repeated root is therefore $(Bx + C)/(x - 2)^2$. Using this form and either of methods (i) and (ii) for determining the constants gives the full partial fraction expansion as

$$\frac{x - 4}{(x + 1)(x - 2)^2} = -\frac{5}{9(x + 1)} + \frac{5x - 16}{9(x - 2)^2},$$

as the reader may verify.

Since any term of the form $(Bx + C)/(x - \alpha)^2$ can be written as

$$\frac{B(x - \alpha) + C + B\alpha}{(x - \alpha)^2} = \frac{B}{x - \alpha} + \frac{C + B\alpha}{(x - \alpha)^2},$$

and similarly for multiply repeated roots, an alternative form for the part of the partial fraction expansion containing a repeated root $\alpha$ is

$$\frac{D_1}{x - \alpha} + \frac{D_2}{(x - \alpha)^2} + \cdots + \frac{D_p}{(x - \alpha)^p}. \tag{1.48}$$

In this form, all $x$-dependence has disappeared from the numerators but at the expense of $p - 1$ additional terms; the total number of constants to be determined remains unchanged, as it must.

When describing possible methods of determining the constants in a partial fraction expansion, we noted that method (iii), p. 20, which avoids the need to solve simultaneous equations, is restricted to terms involving non-repeated roots. In fact, it can be applied in repeated-root situations, when the expansion is put in the form (1.48), but only to find the constant in the term involving the largest inverse power of $x - \alpha$, i.e. $D_p$ in (1.48).

We conclude this section with a more protracted worked example that contains all three of the complications discussed.

▶ Resolve the following expression $F(x)$ into partial fractions:

$$F(x) = \frac{x^5 - 2x^4 - x^3 + 5x^2 - 46x + 100}{(x^2 + 6)(x - 2)^2}.$$

We note that the degree of the denominator (4) is not greater than that of the numerator (5), and so we must start by dividing the latter by the former.
It follows, from the difference in degrees and the coefficients of the highest powers in each, that the result will be a linear expression $s_1 x + s_0$ with the coefficient $s_1$ equal to 1. Thus the numerator of $F(x)$ must be expressible as

$$(x + s_0)(x^4 - 4x^3 + 10x^2 - 24x + 24) + (r_3 x^3 + r_2 x^2 + r_1 x + r_0),$$

where the second factor in parentheses is the denominator of $F(x)$ written as a polynomial. Equating the coefficients of $x^4$ gives $-2 = -4 + s_0$ and fixes $s_0$ as 2. Equating the coefficients of powers less than 4 gives equations involving the coefficients $r_i$ as follows:

$$-1 = -8 + 10 + r_3,$$
$$5 = -24 + 20 + r_2,$$
$$-46 = 24 - 48 + r_1,$$
$$100 = 48 + r_0.$$

Thus the remainder polynomial $r(x)$ can be constructed and $F(x)$ written as

$$F(x) = x + 2 + \frac{-3x^3 + 9x^2 - 22x + 52}{(x^2 + 6)(x - 2)^2} \equiv x + 2 + f(x).$$

The polynomial ratio $f(x)$ can now be expressed in partial fraction form, noting that its denominator contains both a term of the form $x^2 + a^2$ and a repeated root. Thus

$$f(x) = \frac{Bx + C}{x^2 + 6} + \frac{D_1}{x - 2} + \frac{D_2}{(x - 2)^2}.$$

We could now put the RHS of this equation over the common denominator $(x^2 + 6)(x - 2)^2$ and find $B$, $C$, $D_1$ and $D_2$ by equating coefficients of powers of $x$. It is quicker, however, to use methods (iii) and (ii). Method (iii) gives $D_2$ as $(-24 + 36 - 44 + 52)/(4 + 6) = 2$. We choose to evaluate the other coefficients by method (ii), and setting $x = 0$, $x = 1$ and $x = -1$ gives respectively

$$\frac{52}{24} = \frac{C}{6} - \frac{D_1}{2} + \frac{2}{4},$$
$$\frac{36}{7} = \frac{B + C}{7} - D_1 + 2,$$
$$\frac{86}{63} = \frac{C - B}{7} - \frac{D_1}{3} + \frac{2}{9}.$$

These equations reduce to

$$4C - 12D_1 = 40,$$
$$B + C - 7D_1 = 22,$$
$$-9B + 9C - 21D_1 = 72,$$

with solution $B = 0$, $C = 1$, $D_1 = -3$.

Thus, finally, we may rewrite the original expression $F(x)$ in partial fractions as

$$F(x) = x + 2 + \frac{1}{x^2 + 6} - \frac{3}{x - 2} + \frac{2}{(x - 2)^2}.$$ ◀

### 1.5 Binomial expansion

Earlier in this chapter we were led to consider functions containing powers of the sum or difference of two terms, e.g. $(x - \alpha)^m$.
Later in this book we will find numerous occasions on which we wish to write such a product of repeated factors as a polynomial in $x$ or, more generally, as a sum of terms each of which contains powers of $x$ and $\alpha$ separately, as opposed to a power of their sum or difference.

To make the discussion general and the result applicable to a wide variety of situations, we will consider the general expansion of $f(x) = (x + y)^n$, where $x$ and $y$ may stand for constants, variables or functions and, for the time being, $n$ is a positive integer. It may not be obvious what form the general expansion takes but some idea can be obtained by carrying out the multiplication explicitly for small values of $n$. Thus we obtain successively

$$(x + y)^1 = x + y,$$
$$(x + y)^2 = (x + y)(x + y) = x^2 + 2xy + y^2,$$
$$(x + y)^3 = (x + y)(x^2 + 2xy + y^2) = x^3 + 3x^2 y + 3xy^2 + y^3,$$
$$(x + y)^4 = (x + y)(x^3 + 3x^2 y + 3xy^2 + y^3) = x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + y^4.$$

This does not establish a general formula, but the regularity of the terms in the expansions and the suggestion of a pattern in the coefficients indicate that a general formula for power $n$ will have $n + 1$ terms, that the powers of $x$ and $y$ in every term will add up to $n$ and that the coefficients of the first and last terms will be unity whilst those of the second and penultimate terms will be $n$.

In fact, the general expression, the binomial expansion for power $n$, is given by

$$(x + y)^n = \sum_{k=0}^{k=n} {}^nC_k\, x^{n-k} y^k, \tag{1.49}$$

where ${}^nC_k$ is called the binomial coefficient and is expressed in terms of factorial functions by $n!/[k!(n - k)!]$. Clearly, simply to make such a statement does not constitute proof of its validity, but, as we will see in subsection 1.5.2, (1.49) can be proved using a method called induction. Before turning to that proof, we investigate some of the elementary properties of the binomial coefficients.

#### 1.5.1 Binomial coefficients

As stated above, the binomial coefficients are defined by

$${}^nC_k \equiv \frac{n!}{k!(n - k)!} \equiv \binom{n}{k} \quad\text{for } 0 \leq k \leq n, \tag{1.50}$$

where in the second identity we give a common alternative notation for ${}^nC_k$. Obvious properties include

(i) ${}^nC_0 = {}^nC_n = 1$,

(ii) ${}^nC_1 = {}^nC_{n-1} = n$,

(iii) ${}^nC_k = {}^nC_{n-k}$.

We note that, for any given $n$, the largest coefficient in the binomial expansion is the middle one ($k = n/2$) if $n$ is even; the middle two coefficients ($k = \tfrac{1}{2}(n \pm 1)$) are equal largest if $n$ is odd. Somewhat less obvious is the result

$$\begin{aligned}
{}^nC_k + {}^nC_{k-1} &= \frac{n!}{k!(n - k)!} + \frac{n!}{(k - 1)!(n - k + 1)!} \\
&= \frac{n![(n + 1 - k) + k]}{k!(n + 1 - k)!} \\
&= \frac{(n + 1)!}{k!(n + 1 - k)!} = {}^{n+1}C_k.
\end{aligned} \tag{1.51}$$

An equivalent statement, in which $k$ has been redefined as $k + 1$, is

$${}^nC_k + {}^nC_{k+1} = {}^{n+1}C_{k+1}. \tag{1.52}$$

#### 1.5.2 Proof of the binomial expansion

We are now in a position to prove the binomial expansion (1.49). In doing so, we introduce the reader to a procedure applicable to certain types of problems and known as the method of induction. The method is discussed much more fully in subsection 1.7.1.

We start by assuming that (1.49) is true for some positive integer $n = N$. We now proceed to show that this implies that it must also be true for $n = N + 1$, as follows:

$$\begin{aligned}
(x + y)^{N+1} &= (x + y)\sum_{k=0}^{N} {}^NC_k\, x^{N-k} y^k \\
&= \sum_{k=0}^{N} {}^NC_k\, x^{N+1-k} y^k + \sum_{k=0}^{N} {}^NC_k\, x^{N-k} y^{k+1} \\
&= \sum_{k=0}^{N} {}^NC_k\, x^{N+1-k} y^k + \sum_{j=1}^{N+1} {}^NC_{j-1}\, x^{(N+1)-j} y^j,
\end{aligned}$$

where in the first line we have used the assumption and in the third line have moved the second summation index by unity, by writing $k + 1 = j$. We now separate off the first term of the first sum, ${}^NC_0\, x^{N+1}$, and write it as ${}^{N+1}C_0\, x^{N+1}$; we can do this since, as noted in (i) following (1.50), ${}^nC_0 = 1$ for every $n$. Similarly, the last term of the second summation can be replaced by ${}^{N+1}C_{N+1}\, y^{N+1}$.

The remaining terms of each of the two summations are now written together, with the summation index denoted by $k$ in both terms.
Thus

$$\begin{aligned}
(x + y)^{N+1} &= {}^{N+1}C_0\, x^{N+1} + \sum_{k=1}^{N} \left({}^NC_k + {}^NC_{k-1}\right) x^{(N+1)-k} y^k + {}^{N+1}C_{N+1}\, y^{N+1} \\
&= {}^{N+1}C_0\, x^{N+1} + \sum_{k=1}^{N} {}^{N+1}C_k\, x^{(N+1)-k} y^k + {}^{N+1}C_{N+1}\, y^{N+1} \\
&= \sum_{k=0}^{N+1} {}^{N+1}C_k\, x^{(N+1)-k} y^k.
\end{aligned}$$

In going from the first to the second line we have used result (1.51). Now we observe that the final overall equation is just the original assumed result (1.49) but with $n = N + 1$. Thus it has been shown that if the binomial expansion is assumed to be true for $n = N$, then it can be proved to be true for $n = N + 1$. But it holds trivially for $n = 1$, and therefore for $n = 2$ also. By the same token it is valid for $n = 3, 4, \ldots$, and hence is established for all positive integers $n$.

### 1.6 Properties of binomial coefficients

#### 1.6.1 Identities involving binomial coefficients

There are many identities involving the binomial coefficients that can be derived directly from their definition, and yet more that follow from their appearance in the binomial expansion. Only the most elementary ones, given earlier, are worth committing to memory but, as illustrations, we now derive two results involving sums of binomial coefficients.

The first is a further application of the method of induction. Consider the proposal that, for any $n \geq 1$ and $k \geq 0$,

$$\sum_{s=0}^{n-1} {}^{k+s}C_k = {}^{n+k}C_{k+1}. \tag{1.53}$$

Notice that here $n$, the number of terms in the sum, is the parameter that varies, $k$ is a fixed parameter, whilst $s$ is a summation index and does not appear on the RHS of the equation.

Now we suppose that the statement (1.53) about the value of the sum of the binomial coefficients ${}^kC_k, {}^{k+1}C_k, \ldots, {}^{k+n-1}C_k$ is true for $n = N$. We next write down a series with an extra term and determine the implications of the supposition for the new series:

$$\begin{aligned}
\sum_{s=0}^{N+1-1} {}^{k+s}C_k &= \sum_{s=0}^{N-1} {}^{k+s}C_k + {}^{k+N}C_k \\
&= {}^{N+k}C_{k+1} + {}^{N+k}C_k \\
&= {}^{N+k+1}C_{k+1}.
\end{aligned}$$

But this is just proposal (1.53) with $n$ now set equal to $N + 1$.
To obtain the last line, we have used (1.52), with $n$ set equal to $N + k$.

It only remains to consider the case $n = 1$, when the summation only contains one term and (1.53) reduces to

$${}^kC_k = {}^{1+k}C_{k+1}.$$

This is trivially valid for any $k$ since both sides are equal to unity, thus completing the proof of (1.53) for all positive integers $n$.

The second result, which gives a formula for combining terms from two sets of binomial coefficients in a particular way (a kind of ‘convolution’, for readers who are already familiar with this term), is derived by applying the binomial expansion directly to the identity

$$(x + y)^p (x + y)^q \equiv (x + y)^{p+q}.$$

Written in terms of binomial expansions, this reads

$$\sum_{s=0}^{p} {}^pC_s\, x^{p-s} y^s \sum_{t=0}^{q} {}^qC_t\, x^{q-t} y^t = \sum_{r=0}^{p+q} {}^{p+q}C_r\, x^{p+q-r} y^r.$$

We now equate coefficients of $x^{p+q-r} y^r$ on the two sides of the equation, noting that on the LHS all combinations of $s$ and $t$ such that $s + t = r$ contribute. This gives as an identity that

$$\sum_{t=0}^{r} {}^pC_{r-t}\, {}^qC_t = {}^{p+q}C_r = \sum_{t=0}^{r} {}^pC_t\, {}^qC_{r-t}. \tag{1.54}$$

We have specifically included the second equality to emphasise the symmetrical nature of the relationship with respect to $p$ and $q$.

Further identities involving the coefficients can be obtained by giving $x$ and $y$ special values in the defining equation (1.49) for the expansion. If both are set equal to unity then we obtain (using the alternative notation so as to produce familiarity with it)

$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n, \tag{1.55}$$

whilst setting $x = 1$ and $y = -1$ yields

$$\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n} = 0. \tag{1.56}$$

#### 1.6.2 Negative and non-integral values of n

Up till now we have restricted $n$ in the binomial expansion to be a positive integer. Negative values can be accommodated, but only at the cost of an infinite series of terms rather than the finite one represented by (1.49). For reasons that are intuitively sensible and will be discussed in more detail in chapter 4, very often we require an expansion in which, at least ultimately, successive terms in the infinite series decrease in magnitude. For this reason, if $x > y$ we consider $(x + y)^{-m}$, where $m$ itself is a positive integer, in the form

$$(x + y)^n = (x + y)^{-m} = x^{-m}\left(1 + \frac{y}{x}\right)^{-m}.$$

Since the ratio $y/x$ is less than unity, terms containing higher powers of it will be small in magnitude, whilst raising the unit term to any power will not affect its magnitude. If $y > x$ the roles of the two must be interchanged.

We can now state, but will not explicitly prove, the form of the binomial expansion appropriate to negative values of $n$ ($n$ equal to $-m$):

$$(x + y)^n = (x + y)^{-m} = x^{-m} \sum_{k=0}^{\infty} {}^{-m}C_k \left(\frac{y}{x}\right)^k, \tag{1.57}$$

where the hitherto undefined quantity ${}^{-m}C_k$, which appears to involve factorials of negative numbers, is given by

$${}^{-m}C_k = (-1)^k \frac{m(m + 1)\cdots(m + k - 1)}{k!} = (-1)^k \frac{(m + k - 1)!}{(m - 1)!\,k!} = (-1)^k\, {}^{m+k-1}C_k. \tag{1.58}$$

The binomial coefficient on the extreme right of this equation has its normal meaning and is well defined since $m + k - 1 \geq k$.

Thus we have a definition of binomial coefficients for negative integer values of $n$ in terms of those for positive $n$. The connection between the two may not be obvious, but they are both formed in the same way in terms of recurrence relations. Whatever the sign of $n$, the series of coefficients ${}^nC_k$ can be generated by starting with ${}^nC_0 = 1$ and using the recurrence relation

$${}^nC_{k+1} = \frac{n - k}{k + 1}\, {}^nC_k. \tag{1.59}$$

The difference is that for positive integer $n$ the series terminates when $k = n$, whereas for negative $n$ there is no such termination – in line with the infinite series of terms in the corresponding expansion.

Finally we note that, in fact, equation (1.59) generates the appropriate coefficients for all values of $n$, positive or negative, integer or non-integer, with the obvious exception of the case in which $x = -y$ and $n$ is negative. For non-integer $n$ the expansion does not terminate, even if $n$ is positive.

### 1.7 Some particular methods of proof

Much of the mathematics used by physicists and engineers is concerned with obtaining a particular value, formula or function from a given set of data and stated conditions. However, just as it is essential in physics to formulate the basic laws and so be able to set boundaries on what can or cannot happen, so it is important in mathematics to be able to state general propositions about the outcomes that are or are not possible. To this end one attempts to establish theorems that state in as general a way as possible mathematical results that apply to particular types of situation. We conclude this introductory chapter by describing two methods that can sometimes be used to prove particular classes of theorems.

The two general methods of proof are known as proof by induction (which has already been met in this chapter) and proof by contradiction. They share the common characteristic that at an early stage in the proof an assumption is made that a particular (unproven) statement is true; the consequences of that assumption are then explored. In an inductive proof the conclusion is reached that the assumption is self-consistent and has other equally consistent but broader implications, which are then applied to establish the general validity of the assumption.
A proof by contradiction, however, establishes an internal inconsistency and thus shows that the assumption is unsustainable; the natural consequence of this is that the negative of the assumption is established as true.

Later in this book use will be made of these methods of proof to explore new territory, e.g. to examine the properties of vector spaces, matrices and groups. However, at this stage we will draw our illustrative and test examples from earlier sections of this chapter and other topics in elementary algebra and number theory.

#### 1.7.1 Proof by induction

The proof of the binomial expansion given in subsection 1.5.2 and the identity established in subsection 1.6.1 have already shown the way in which an inductive proof is carried through. They also indicated the main limitation of the method, namely that only an initially supposed result can be proved. Thus the method of induction is of no use for deducing a previously unknown result; a putative equation or result has to be arrived at by some other means, usually by noticing patterns or by trial and error using simple values of the variables involved. It will also be clear that propositions that can be proved by induction are limited to those containing a parameter that takes a range of integer values (usually infinite).

For a proposition involving a parameter $n$, the five steps in a proof using induction are as follows.

(i) Formulate the supposed result for general $n$.

(ii) Suppose (i) to be true for $n = N$ (or more generally for all values of $n \le N$; see below), where $N$ is restricted to lie in the stated range.

(iii) Show, using only proven results and supposition (ii), that proposition (i) is true for $n = N + 1$.

(iv) Demonstrate directly, and without any assumptions, that proposition (i) is true when $n$ takes the lowest value in its range.

(v) It then follows from (iii) and (iv) that the proposition is valid for all values of $n$ in the stated range.
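Because induction can only confirm a result that has already been guessed, it can help to test a candidate formula on small cases before attempting step (iii). The Python sketch below does this for sums of the form $\sum_{r=1}^{n} t(r)$; the helper name `first_failure` and the two candidate formulae are illustrative assumptions rather than anything taken from the text, and agreement over a finite range is only supporting evidence, never a proof.

```python
from fractions import Fraction


def first_failure(term, candidate, n_max=50):
    """Return the smallest n in 1..n_max at which the running sum of
    term(r), r = 1..n, differs from candidate(n); None if they all agree."""
    running = Fraction(0)
    for n in range(1, n_max + 1):
        running += term(n)
        if running != candidate(n):
            return n
    return None


# A correct guess: 1 + 2 + ... + n = n(n + 1)/2.
good = first_failure(lambda r: r, lambda n: Fraction(n * (n + 1), 2))

# A wrong guess for the sum of squares; it happens to agree at n = 1.
bad = first_failure(lambda r: r * r, lambda n: Fraction(n * (n + 1), 2))

print(good, bad)
```

A guess that survives such a check is a reasonable input to step (i); one that fails, as the second does at a small value of $n$, can be discarded before any effort is spent on the inductive step.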
(It should be noted that, although many proofs at stage (iii) require the validity of the proposition only for $n = N$, some require it for all $n$ less than or equal to $N$; hence the form of inequality given in parentheses in the stage (ii) assumption.)

To illustrate further the method of induction, we now apply it to two worked examples; the first concerns the sum of the squares of the first $n$ natural numbers.

▶ Prove that the sum of the squares of the first $n$ natural numbers is given by

$$\sum_{r=1}^{n} r^2 = \tfrac{1}{6}\,n(n+1)(2n+1). \qquad (1.60)$$

As previously we start by assuming the result is true for $n = N$. Then it follows that

$$\begin{aligned}
\sum_{r=1}^{N+1} r^2 &= \sum_{r=1}^{N} r^2 + (N+1)^2 \\
&= \tfrac{1}{6}N(N+1)(2N+1) + (N+1)^2 \\
&= \tfrac{1}{6}(N+1)[N(2N+1) + 6N + 6] \\
&= \tfrac{1}{6}(N+1)[(2N+3)(N+2)] \\
&= \tfrac{1}{6}(N+1)[(N+1)+1][2(N+1)+1].
\end{aligned}$$

This is precisely the original assumption, but with $N$ replaced by $N + 1$. To complete the proof we only have to verify (1.60) for $n = 1$. This is trivially done and establishes the result for all positive $n$. The same and related results are obtained by a different method in subsection 4.2.5. ◀

Our second example is somewhat more complex and involves two nested proofs by induction: whilst trying to establish the main result by induction, we find that we are faced with a second proposition which itself requires an inductive proof.

▶ Show that $Q(n) = n^4 + 2n^3 + 2n^2 + n$ is divisible by 6 (without remainder) for all positive integer values of $n$.

Again we start by assuming the result is true for some particular value $N$ of $n$, whilst noting that it is trivially true for $n = 0$. We next examine $Q(N + 1)$, writing each of its terms as a binomial expansion:

$$\begin{aligned}
Q(N+1) &= (N+1)^4 + 2(N+1)^3 + 2(N+1)^2 + (N+1) \\
&= (N^4 + 4N^3 + 6N^2 + 4N + 1) + 2(N^3 + 3N^2 + 3N + 1) + 2(N^2 + 2N + 1) + (N + 1) \\
&= (N^4 + 2N^3 + 2N^2 + N) + (4N^3 + 12N^2 + 14N + 6).
\end{aligned}$$
Now, by our assumption, the group of terms within the first parentheses in the last line is divisible by 6 and clearly so are the terms $12N^2$ and 6 within the second parentheses. Thus it comes down to deciding whether $4N^3 + 14N$ is divisible by 6 or, equivalently, whether $R(N) = 2N^3 + 7N$ is divisible by 3.

To settle this latter question we try using a second inductive proof and assume that $R(N)$ is divisible by 3 for $N = M$, whilst again noting that the proposition is trivially true for $N = M = 0$. This time we examine $R(M + 1)$:

$$\begin{aligned}
R(M+1) &= 2(M+1)^3 + 7(M+1) \\
&= 2(M^3 + 3M^2 + 3M + 1) + 7(M + 1) \\
&= (2M^3 + 7M) + 3(2M^2 + 2M + 3).
\end{aligned}$$

By assumption, the first group of terms in the last line is divisible by 3 and the second group is patently so. We thus conclude that $R(N)$ is divisible by 3 for all $N \ge M$, and taking $M = 0$ shows that it is divisible by 3 for all $N$.

We can now return to the main proposition and conclude that since $R(N) = 2N^3 + 7N$ is divisible by 3, $4N^3 + 12N^2 + 14N + 6$ is divisible by 6. This in turn establishes that the divisibility of $Q(N + 1)$ by 6 follows from the assumption that $Q(N)$ divides by 6. Since $Q(0)$ clearly divides by 6, the proposition in the question is established for all values of $n$. ◀

#### 1.7.2 Proof by contradiction

The second general line of proof, but again one that is normally only useful when the result is already suspected, is proof by contradiction. The questions it can attempt to answer are only those that can be expressed in a proposition that is either true or false. Clearly, it could be argued that any mathematical result can be so expressed but, if the proposition is no more than a guess, the chances of success are negligible. Valid propositions containing even modest formulae are either the result of true inspiration or, much more normally, yet another reworking of an old chestnut!
The essence of the method is to exploit the fact that mathematics is required to be self-consistent, so that, for example, two calculations of the same quantity, starting from the same given data but proceeding by different methods, must give the same answer. Equally, it must not be possible to follow a line of reasoning and draw a conclusion that contradicts either the input data or any other conclusion based upon the same data.

It is this requirement on which the method of proof by contradiction is based. The crux of the method is to assume that the proposition to be proved is not true, and then use this incorrect assumption and ‘watertight’ reasoning to draw a conclusion that contradicts the assumption. The only way out of the self-contradiction is then to conclude that the assumption was indeed false and therefore that the proposition is true.

It must be emphasised that once a (false) contrary assumption has been made, every subsequent conclusion in the argument must follow of necessity. Proof by contradiction fails if at any stage we have to admit ‘this may or may not be the case’. That is, each step in the argument must be a necessary consequence of results that precede it (taken together with the assumption), rather than simply a possible consequence.

It should also be added that if no contradiction can be found using sound reasoning based on the assumption then no conclusion can be drawn about either the proposition or its negative and some other approach must be tried.

We illustrate the general method with an example in which the mathematical reasoning is straightforward, so that attention can be focussed on the structure of the proof.

▶ A rational number $r$ is a fraction $r = p/q$ in which $p$ and $q$ are integers with $q$ positive. Further, $r$ is expressed in its lowest terms, any integer common factor of $p$ and $q$ having been divided out.
Prove that the square root of an integer $m$ cannot be a rational number, unless the square root itself is an integer.

We begin by supposing that the stated result is not true and that we can write an equation

$$\sqrt{m} = r = \frac{p}{q} \quad \text{for integers } m, p, q \text{ with } q \neq 1.$$

It then follows that $p^2 = mq^2$. But, since $r$ is expressed in its lowest terms, $p$ and $q$, and hence $p^2$ and $q^2$, have no factors in common. However, $m$ is an integer; this is only possible if $q = 1$ and $p^2 = m$. This conclusion contradicts the requirement that $q \neq 1$ and so leads to the conclusion that it was wrong to suppose that $\sqrt{m}$ can be expressed as a non-integer rational number. This completes the proof of the statement in the question. ◀

Our second worked example, also taken from elementary number theory, involves slightly more complicated mathematical reasoning but again exhibits the structure associated with this type of proof.

▶ The prime integers $p_i$ are labelled in ascending order, thus $p_1 = 1$, $p_2 = 2$, $p_5 = 7$, etc. Show that there is no largest prime number.

Assume, on the contrary, that there is a largest prime and let it be $p_N$. Consider now the number $q$ formed by multiplying together all the primes from $p_1$ to $p_N$ and then adding one to the product, i.e.

$$q = p_1 p_2 \cdots p_N + 1.$$

By our assumption $p_N$ is the largest prime, and so no number can have a prime factor greater than this. However, for every prime $p_i$, $i = 1, 2, \ldots, N$, the quotient $q/p_i$ has the form $M_i + (1/p_i)$ with $M_i$ an integer and $1/p_i$ non-integer. This means that $q/p_i$ cannot be an integer and so $p_i$ cannot be a divisor of $q$.

Since $q$ is not divisible by any of the (assumed) finite set of primes, it must be itself a prime. As $q$ is also clearly greater than $p_N$, we have a contradiction. This shows that our assumption that there is a largest prime integer must be false, and so it follows that there is no largest prime integer.
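The construction of $q$ can be tried out numerically. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the list of primes starts from 2 rather than from the text's $p_1 = 1$, which leaves the product unchanged. It shows that $q$ leaves remainder 1 on division by every listed prime. Outside the hypothetical world of the proof, $q$ need not itself be prime, but any prime factor it has must be missing from the list, which is all the contradiction requires.

```python
from math import prod


def euclid_number(primes):
    """Return q = (product of the given primes) + 1."""
    return prod(primes) + 1


def smallest_prime_factor(q):
    """Smallest divisor of q greater than 1, found by trial division."""
    d = 2
    while d * d <= q:
        if q % d == 0:
            return d
        d += 1
    return q


primes = [2, 3, 5, 7, 11, 13]
q = euclid_number(primes)

# Every listed prime leaves remainder 1, so none of them divides q.
remainders = [q % p for p in primes]

# The smallest prime factor of q is therefore not in the list.
factor = smallest_prime_factor(q)
print(q, remainders, factor)
```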
It should be noted that the given construction for $q$ does not generate all the primes that actually exist (e.g. for $N = 3$, $q = 7$, rather than the next actual prime value of 5, is found), but this does not matter for the purposes of our proof by contradiction. ◀

#### 1.7.3 Necessary and sufficient conditions

As the final topic in this introductory chapter, we consider briefly the notion of, and distinction between, necessary and sufficient conditions in the context of proving a mathematical proposition. In ordinary English the distinction is well defined, and that distinction is maintained in mathematics. However, in the authors’ experience students tend to overlook it and assume (wrongly) that, having proved that the validity of proposition $A$ implies the truth of proposition $B$, it follows by ‘reversing the argument’ that the validity of $B$ automatically implies that of $A$.

As an example, let proposition $A$ be that an integer $N$ is divisible without remainder by 6, and proposition $B$ be that $N$ is divisible without remainder by 2. Clearly, if $A$ is true then it follows that $B$ is true, i.e. $A$ is a sufficient condition for $B$; it is not however a necessary condition, as is trivially shown by taking $N$ as 8. Conversely, the same value of $N$ shows that whilst the validity of $B$ is a necessary condition for $A$ to hold, it is not sufficient.

An alternative terminology to ‘necessary’ and ‘sufficient’ often employed by mathematicians is that of ‘if’ and ‘only if’, particularly in the combination ‘if and only if’ which is usually written as IFF or denoted by a double-headed arrow $\Longleftrightarrow$. The equivalent statements can be summarised by

| Statement | In words | Symbolically |
|---|---|---|
| $A$ if $B$ | $A$ is true if $B$ is true, or $B$ is a sufficient condition for $A$ | $B \Longrightarrow A$ |
| $A$ only if $B$ | $A$ is true only if $B$ is true, or $B$ is a necessary consequence of $A$ | $A \Longrightarrow B$ |
| $A$ IFF $B$ | $A$ is true if and only if $B$ is true, or $A$ and $B$ necessarily imply each other | $B \Longleftrightarrow A$ |
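The divisibility example can be checked over a sample of integers. The sketch below is a minimal illustration, with the helper names `implies` and `counterexamples` and the sample range chosen here as assumptions; a finite sample can refute an implication by exhibiting a counterexample, but it cannot prove one.

```python
def implies(p, q, values):
    """True if q(n) holds for every n in values for which p(n) holds."""
    return all(q(n) for n in values if p(n))


def counterexamples(p, q, values):
    """Values of n for which p(n) holds but q(n) does not."""
    return [n for n in values if p(n) and not q(n)]


def A(n):
    return n % 6 == 0  # n is divisible by 6


def B(n):
    return n % 2 == 0  # n is divisible by 2


sample = range(1, 31)
a_sufficient_for_b = implies(A, B, sample)   # A => B on the sample
b_sufficient_for_a = implies(B, A, sample)   # B => A on the sample
failures = counterexamples(B, A, sample)     # B true but A false
print(a_sufficient_for_b, b_sufficient_for_a, failures)
```

The value $N = 8$ used in the text appears among the failures of $B \Longrightarrow A$, while no failure of $A \Longrightarrow B$ is found.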
Although at this stage in the book we are able to employ for illustrative purposes only simple and fairly obvious results, the following example is given as a model of how necessary and sufficient conditions should be proved. The essential point is that for the second part of the proof (whether it be the ‘necessary’ part or the ‘sufficient’ part) one needs to start again from scratch; more often than not, the lines of the second part of the proof will not be simply those of the first written in reverse order.

▶ Prove that (A) a function $f(x)$ is a quadratic polynomial with zeros at $x = 2$ and $x = 3$ if and only if (B) the function $f(x)$ has the form $\lambda(x^2 - 5x + 6)$ with $\lambda$ a non-zero constant.

(1) Assume A, i.e. that $f(x)$ is a quadratic polynomial with zeros at $x = 2$ and $x = 3$. Let its form be $ax^2 + bx + c$ with $a \neq 0$. Then we have

$$4a + 2b + c = 0, \qquad 9a + 3b + c = 0,$$

and subtraction shows that $5a + b = 0$ and $b = -5a$. Substitution of this into the first of the above equations gives $c = -4a - 2b = -4a + 10a = 6a$. Thus, it follows that

$$f(x) = a(x^2 - 5x + 6) \quad \text{with } a \neq 0,$$

and establishes the ‘A only if B’ part of the stated result.

(2) Now assume that $f(x)$ has the form $\lambda(x^2 - 5x + 6)$ with $\lambda$ a non-zero constant. Firstly we note that $f(x)$ is a quadratic polynomial, and so it only remains to prove that its zeros occur at $x = 2$ and $x = 3$. Consider $f(x) = 0$, which, after dividing through by the non-zero constant $\lambda$, gives

$$x^2 - 5x + 6 = 0.$$

We proceed by using a technique known as completing the square, for the purposes of illustration, although the factorisation of the above equation should be clear to the reader. Thus we write

$$x^2 - 5x + \left(\tfrac{5}{2}\right)^2 - \left(\tfrac{5}{2}\right)^2 + 6 = 0, \qquad \left(x - \tfrac{5}{2}\right)^2 = \tfrac{1}{4}, \qquad x - \tfrac{5}{2} = \pm\tfrac{1}{2}.$$

The two roots of $f(x) = 0$ are therefore $x = 2$ and $x = 3$; these $x$-values give the zeros of $f(x)$. This establishes the second (‘A if B’) part of the result. Thus we have shown that the assumption of either condition implies the validity of the other and the proof is complete.
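The two halves of this argument can be mirrored in exact rational arithmetic. The following Python sketch is only an illustration: the function names and the value chosen for the constant are assumptions made here. The first function follows part (1), solving the pair of linear equations for $b$ and $c$; the second follows part (2), completing the square, and assumes that the right-hand side is the square of a rational number.

```python
from fractions import Fraction
from math import isqrt


def coefficients_from_zeros_2_and_3(a):
    """Part (1): solve 4a + 2b + c = 0 and 9a + 3b + c = 0 for b and c."""
    b = -5 * a          # subtracting the equations gives 5a + b = 0
    c = -4 * a - 2 * b  # back-substitution into the first equation
    return (a, b, c)


def zeros_by_completing_square(a, b, c):
    """Part (2): zeros of a x^2 + b x + c via (x + b/2a)^2 = (b/2a)^2 - c/a.

    Assumes the right-hand side is the square of a rational number.
    """
    half = Fraction(b) / (2 * Fraction(a))
    rhs = half * half - Fraction(c) / Fraction(a)
    root = Fraction(isqrt(rhs.numerator), isqrt(rhs.denominator))
    if root * root != rhs:
        raise ValueError("right-hand side is not a perfect rational square")
    return (-half - root, -half + root)


lam = Fraction(7, 3)  # an arbitrary non-zero constant
coeffs = coefficients_from_zeros_2_and_3(lam)
zeros = zeros_by_completing_square(*coeffs)
print(coeffs, zeros)
```

Note that the two functions share no steps, in keeping with the remark that the second part of such a proof has to start again from scratch.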
◀

It should be noted that the propositions have to be carefully and precisely formulated. If, for example, the word ‘quadratic’ were omitted from A, statement B would still be a sufficient condition for A but not a necessary one, since $f(x)$ could then be $x^3 - 4x^2 + x + 6$ and A would not require B. Omitting the constant $\lambda$ from the stated form of $f(x)$ in B has the same effect. Conversely, if A were to state that $f(x) = 3(x - 2)(x - 3)$ then B would be a necessary condition for A but not a sufficient one.

### 1.8 Exercises

Polynomial equations

1.1 Continue the investigation of equation (1.7), namely

$$g(x) = 4x^3 + 3x^2 - 6x - 1,$$

as follows.
(a) Make a table of values of $g(x)$ for integer values of $x$ between $-2$ and $2$. Use it and the information derived in the text to draw a graph and so determine the roots of $g(x) = 0$ as accurately as possible.
(b) Find one accurate root of $g(x) = 0$ by inspection and hence determine precise values for the other two roots.
(c) Show that $f(x) = 4x^3 + 3x^2 - 6x - k = 0$ has only one real root unless $-5 \le k \le \tfrac{7}{4}$.

1.2 Determine how the number of real roots of the equation

$$g(x) = 4x^3 - 17x^2 + 10x + k = 0$$

depends upon $k$. Are there any cases for which the equation has exactly two distinct real roots?

1.3 Continue the analysis of the polynomial equation

$$f(x) = x^7 + 5x^6 + x^4 - x^3 + x^2 - 2 = 0,$$

investigated in subsection 1.1.1, as follows.
(a) By writing the fifth-degree polynomial appearing in the expression for $f'(x)$ in the form $7x^5 + 30x^4 + a(x - b)^2 + c$, show that there is in fact only one positive root of $f(x) = 0$.
(b) By evaluating $f(1)$, $f(0)$ and $f(-1)$, and by inspecting the form of $f(x)$ for negative values of $x$, determine what you can about the positions of the real roots of $f(x) = 0$.

1.4 Given that $x = 2$ is one root of

$$g(x) = 2x^4 + 4x^3 - 9x^2 - 11x - 6 = 0,$$

use factorisation to determine how many real roots it has.

1.5 Construct the quadratic equations that have the following pairs of roots: (a) $-6, -3$; (b) $0, 4$; (c) $2, 2$; (d) $3 + 2i,\ 3 - 2i$, where $i^2 = -1$.
1.6 Use the results of (i) equation (1.13), (ii) equation (1.12) and (iii) equation (1.14) to prove that if the roots of $3x^3 - x^2 - 10x + 8 = 0$ are $\alpha_1$, $\alpha_2$ and $\alpha_3$ then
(a) $\alpha_1^{-1} + \alpha_2^{-1} + \alpha_3^{-1} = 5/4$,
(b) $\alpha_1^2 + \alpha_2^2 + \alpha_3^2 = 61/9$,
(c) $\alpha_1^3 + \alpha_2^3 + \alpha_3^3 = -125/27$.
(d) Convince yourself that eliminating (say) $\alpha_2$ and $\alpha_3$ from (i), (ii) and (iii) does not give a simple explicit way of finding $\alpha_1$.

Trigonometric identities

1.7 Prove that

$$\cos\frac{\pi}{12} = \frac{\sqrt{3} + 1}{2\sqrt{2}}$$

by considering (a) the sum of the sines of $\pi/3$ and $\pi/6$, (b) the sine of the sum of $\pi/3$ and $\pi/4$.

1.8 The following exercises are based on the half-angle formulae.
(a) Use the fact that $\sin(\pi/6) = 1/2$ to prove that $\tan(\pi/12) = 2 - \sqrt{3}$.
(b) Use the result of (a) to show further that $\tan(\pi/24) = q(2 - q)$ where $q^2 = 2 + \sqrt{3}$.

1.9 Find the real solutions of (a) $3\sin\theta - 4\cos\theta = 2$, (b) $4\sin\theta + 3\cos\theta = 6$, (c) $12\sin\theta - 5\cos\theta = -6$.

1.10 If $s = \sin(\pi/8)$, prove that

$$8s^4 - 8s^2 + 1 = 0,$$

and hence show that $s = [(2 - \sqrt{2})/4]^{1/2}$.

1.11 Find all the solutions of

$$\sin\theta + \sin 4\theta = \sin 2\theta + \sin 3\theta$$

that lie in the range $-\pi < \theta \le \pi$. What is the multiplicity of the solution $\theta = 0$?

Coordinate geometry

1.12 Obtain in the form (1.38) the equations that describe the following:
(a) a circle of radius 5 with its centre at $(1, -1)$;
(b) the line $2x + 3y + 4 = 0$ and the line orthogonal to it which passes through $(1, 1)$;
(c) an ellipse of eccentricity 0.6 with centre $(1, 1)$ and its major axis of length 10 parallel to the $y$-axis.

1.13 Determine the forms of the conic sections described by the following equations:
(a) $x^2 + y^2 + 6x + 8y = 0$;
(b) $9x^2 - 4y^2 - 54x - 16y + 29 = 0$;
(c) $2x^2 + 2y^2 + 5xy - 4x + y - 6 = 0$;
(d) $x^2 + y^2 + 2xy - 8x + 8y = 0$.

1.14 For the ellipse

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

with eccentricity $e$, the two points $(-ae, 0)$ and $(ae, 0)$ are known as its foci. Show that the sum of the distances from any point on the ellipse to the foci is $2a$. (The constancy of the sum of the distances from two fixed points can be used as an alternative defining property of an ellipse.)
Partial fractions

1.15 Resolve the following into partial fractions using the three methods given in section 1.4, verifying that the same decomposition is obtained by each method:

$$\text{(a)}\ \frac{2x + 1}{x^2 + 3x - 10}, \qquad \text{(b)}\ \frac{4}{x^2 - 3x}.$$

1.16 Express the following in partial fraction form:

$$\text{(a)}\ \frac{2x^3 - 5x + 1}{x^2 - 2x - 8}, \qquad \text{(b)}\ \frac{x^2 + x - 1}{x^2 + x - 2}.$$

1.17 Rearrange the following functions in partial fraction form:

$$\text{(a)}\ \frac{x - 6}{x^3 - x^2 + 4x - 4}, \qquad \text{(b)}\ \frac{x^3 + 3x^2 + x + 19}{x^4 + 10x^2 + 9}.$$

1.18 Resolve the following into partial fractions in such a way that $x$ does not appear in any numerator:

$$\text{(a)}\ \frac{2x^2 + x + 1}{(x - 1)^2(x + 3)}, \qquad \text{(b)}\ \frac{x^2 - 2}{x^3 + 8x^2 + 16x}, \qquad \text{(c)}\ \frac{x^3 - x - 1}{(x + 3)^3(x + 1)}.$$

Binomial expansion

1.19 Evaluate those of the following that are defined: (a) ${}^{5}C_3$, (b) ${}^{3}C_5$, (c) ${}^{-5}C_3$, (d) ${}^{-3}C_5$.

1.20 Use a binomial expansion to evaluate $1/\sqrt{4.2}$ to five places of decimals, and compare it with the accurate answer obtained using a calculator.

Proof by induction and contradiction

1.21 Prove by induction that

$$\sum_{r=1}^{n} r = \tfrac{1}{2}n(n + 1) \quad \text{and} \quad \sum_{r=1}^{n} r^3 = \tfrac{1}{4}n^2(n + 1)^2.$$

1.22 Prove by induction that

$$1 + r + r^2 + \cdots + r^k + \cdots + r^n = \frac{1 - r^{n+1}}{1 - r}.$$

1.23 Prove that $3^{2n} + 7$, where $n$ is a non-negative integer, is divisible by 8.

1.24 If a sequence of terms, $u_n$, satisfies the recurrence relation $u_{n+1} = (1 - x)u_n + nx$, with $u_1 = 0$, show, by induction, that, for $n \ge 1$,

$$u_n = \frac{1}{x}\,[nx - 1 + (1 - x)^n].$$

1.25 Prove by induction that

$$\sum_{r=1}^{n} \frac{1}{2^r}\tan\left(\frac{\theta}{2^r}\right) = \frac{1}{2^n}\cot\left(\frac{\theta}{2^n}\right) - \cot\theta.$$

1.26 The quantities $a_i$ in this exercise are all positive real numbers.
(a) Show that

$$a_1 a_2 \le \left(\frac{a_1 + a_2}{2}\right)^2.$$

(b) Hence prove, by induction on $m$, that

$$a_1 a_2 \cdots a_p \le \left(\frac{a_1 + a_2 + \cdots + a_p}{p}\right)^p,$$

where $p = 2^m$ with $m$ a positive integer. Note that each increase of $m$ by unity doubles the number of factors in the product.
1.27 Establish the values of $k$ for which the binomial coefficient ${}^{p}C_k$ is divisible by $p$ when $p$ is a prime number. Use your result and the method of induction to prove that $n^p - n$ is divisible by $p$ for all integers $n$ and all prime numbers $p$. Deduce that $n^5 - n$ is divisible by 30 for any integer $n$.

1.28 An arithmetic progression of integers $a_n$ is one in which $a_n = a_0 + nd$, where $a_0$ and $d$ are integers and $n$ takes successive values $0, 1, 2, \ldots$.
(a) Show that if any one term of the progression is the cube of an integer then so are infinitely many others.
(b) Show that no cube of an integer can be expressed as $7n + 5$ for some positive integer $n$.

1.29 Prove, by the method of contradiction, that the equation

$$x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0,$$

in which all the coefficients $a_i$ are integers, cannot have a rational root, unless that root is an integer. Deduce that any integral root must be a divisor of $a_0$ and hence find all rational roots of
(a) $x^4 + 6x^3 + 4x^2 + 5x + 4 = 0$,
(b) $x^4 + 5x^3 + 2x^2 - 10x + 6 = 0$.

Necessary and sufficient conditions

1.30 Prove that the equation $ax^2 + bx + c = 0$, in which $a$, $b$ and $c$ are real and $a > 0$, has two real distinct solutions IFF $b^2 > 4ac$.

1.31 For the real variable $x$, show that a sufficient, but not necessary, condition for $f(x) = x(x + 1)(2x + 1)$ to be divisible by 6 is that $x$ is an integer.

1.32 Given that at least one of $a$ and $b$, and at least one of $c$ and $d$, are non-zero, show that $ad = bc$ is both a necessary and sufficient condition for the equations

$$ax + by = 0, \qquad cx + dy = 0,$$

to have a solution in which at least one of $x$ and $y$ is non-zero.

1.33 The coefficients $a_i$ in the polynomial $Q(x) = a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x$ are all integers. Show that $Q(n)$ is divisible by 24 for all integers $n \ge 0$ if and only if all of the following conditions are satisfied:
(i) $2a_4 + a_3$ is divisible by 4;
(ii) $a_4 + a_2$ is divisible by 12;
(iii) $a_4 + a_3 + a_2 + a_1$ is divisible by 24.
### 1.9 Hints and answers

1.1 (b) The roots are $1$, $\tfrac{1}{8}(-7 + \sqrt{33}) = -0.1569$, $\tfrac{1}{8}(-7 - \sqrt{33}) = -1.593$. (c) $-5$ and $\tfrac{7}{4}$ are the values of $k$ that make $f(-1)$ and $f(\tfrac{1}{2})$ equal to zero.

1.3 (a) $a = 4$, $b = \tfrac{3}{8}$ and $c = \tfrac{23}{16}$ are all positive. Therefore $f'(x) > 0$ for all $x > 0$. (b) $f(1) = 5$, $f(0) = -2$ and $f(-1) = 5$, and so there is at least one root in each of the ranges $0$ … $4^2 + 3^2$. (c) $-0.0849$, $-2.276$.

1.11 Show that the equation is equivalent to $\sin(5\theta/2)\sin(\theta)\sin(\theta/2) = 0$. Solutions are $-4\pi/5$, $-2\pi/5$, $0$, $2\pi/5$, $4\pi/5$, $\pi$. The solution $\theta = 0$ has multiplicity 3.

1.13 (a) A circle of radius 5 centred on $(-3, -4)$. (b) A hyperbola with ‘centre’ $(3, -2)$ and ‘semi-axes’ 2 and 3. (c) The expression factorises into two lines, $x + 2y - 3 = 0$ and $2x + y + 2 = 0$. (d) Write the expression as $(x + y)^2 = 8(x - y)$ to see that it represents a parabola passing through the origin, with the line $x + y = 0$ as its axis of symmetry.

1.15 (a) $\dfrac{5}{7(x - 2)} + \dfrac{9}{7(x + 5)}$, (b) $-\dfrac{4}{3x} + \dfrac{4}{3(x - 3)}$.

1.17 (a) $\dfrac{x + 2}{x^2 + 4} - \dfrac{1}{x - 1}$, (b) $\dfrac{x + 1}{x^2 + 9} + \dfrac{2}{x^2 + 1}$.

1.19 (a) 10, (b) not defined, (c) $-35$, (d) $-21$.

1.21 Look for factors common to the $n = N$ sum and the additional $n = N + 1$ term, so as to reduce the sum for $n = N + 1$ to a single term.

1.23 Write $3^{2n}$ as $8m - 7$.

1.25 Use the half-angle formulae of equations (1.32) to (1.34) to relate functions of $\theta/2^k$ to those of $\theta/2^{k+1}$.

1.27 Divisible for $k = 1, 2, \ldots, p - 1$. Expand $(n + 1)^p$ as $n^p + \sum_{1}^{p-1} {}^{p}C_k\, n^k + 1$. Apply the stated result for $p = 5$. Note that $n^5 - n = n(n - 1)(n + 1)(n^2 + 1)$; the product of any three consecutive integers must divide by both 2 and 3.

1.29 By assuming $x = p/q$ with $q \neq 1$, show that a fraction $-p^n/q$ is equal to an integer $a_{n-1}p^{n-1} + \cdots + a_1 p q^{n-2} + a_0 q^{n-1}$. This is a contradiction, and is only resolved if $q = 1$ and the root is an integer. (a) The only possible candidates are $\pm 1, \pm 2, \pm 4$. None is a root. (b) The only possible candidates are $\pm 1, \pm 2, \pm 3, \pm 6$. Only $-3$ is a root.

1.31 $f(x)$ can be written as $x(x + 1)(x + 2) + x(x + 1)(x - 1)$.
Each term consists of the product of three consecutive integers, of which one must therefore divide by 2 and (a different) one by 3. Thus each term separately divides by 6, and so therefore does $f(x)$. Note that if $x$ is the root of $2x^3 + 3x^2 + x - 24 = 0$ that lies near the non-integer value $x = 1.826$, then $x(x + 1)(2x + 1) = 24$ and therefore divides by 6.

1.33 Note that, e.g., the condition for $6a_4 + a_3$ to be divisible by 4 is the same as the condition for $2a_4 + a_3$ to be divisible by 4. For the necessary (only if) part of the proof set $n = 1, 2, 3$ and take integer combinations of the resulting equations. For the sufficient (if) part of the proof use the stated conditions to prove the proposition by induction. Note that $n^3 - n$ is divisible by 6 and that $n^2 + n$ is even.

## 2 Preliminary calculus

This chapter is concerned with the formalism of probably the most widely used mathematical technique in the physical sciences, namely the calculus. The chapter divides into two sections. The first deals with the process of differentiation and the second with its inverse process, integration. The material covered is essential for the remainder of the book and serves as a reference. Readers who have previously studied these topics should ensure familiarity by looking at the worked examples in the main text and by attempting the exercises at the end of the chapter.

### 2.1 Differentiation

Differentiation is the process of determining how quickly or slowly a function varies, as the quantity on which it depends, its argument, is changed. More specifically it is the procedure for obtaining an expression (numerical or algebraic) for the rate of change of the function with respect to its argument. Familiar examples of rates of change include acceleration (the rate of change of velocity) and chemical reaction rate (the rate of change of chemical composition). Both acceleration and reaction rate give a measure of the change of a quantity with respect to time.
However, differentiation may also be applied to changes with respect to other quantities, for example the change in pressure with respect to a change in temperature.

Although it will not be apparent from what we have said so far, differentiation is in fact a limiting process, that is, it deals only with the infinitesimal change in one quantity resulting from an infinitesimal change in another.

#### 2.1.1 Differentiation from first principles

Let us consider a function $f(x)$ that depends on only one variable $x$, together with numerical constants, for example, $f(x) = 3x^2$ or $f(x) = \sin x$ or $f(x) = 2 + 3/x$.

Figure 2.1 The graph of a function $f(x)$ showing that the gradient or slope of the function at $P$, given by $\tan\theta$, is approximately equal to $\Delta f/\Delta x$.

Figure 2.1 shows an example of such a function. Near any particular point, $P$, the value of the function changes by an amount $\Delta f$, say, as $x$ changes by a small amount $\Delta x$. The slope of the tangent to the graph of $f(x)$ at $P$ is then approximately $\Delta f/\Delta x$, and the change in the value of the function is $\Delta f = f(x + \Delta x) - f(x)$. In order to calculate the true value of the gradient, or first derivative, of the function at $P$, we must let $\Delta x$ become infinitesimally small. We therefore define the first derivative of $f(x)$ as

$$f'(x) \equiv \frac{df(x)}{dx} \equiv \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}, \qquad (2.1)$$

provided that the limit exists. The limit will depend in almost all cases on the value of $x$. If the limit does exist at a point $x = a$ then the function is said to be differentiable at $a$; otherwise it is said to be non-differentiable at $a$. The formal concept of a limit and its existence or non-existence is discussed in chapter 4; for present purposes we will adopt an intuitive approach.

In the definition (2.1), we allow $\Delta x$ to tend to zero from either positive or negative values and require the same limit to be obtained in both cases.
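The requirement that both directions of approach give the same limit can be explored numerically. The Python sketch below evaluates the ratio in (2.1) for a shrinking step taken from the right and from the left; the function names, the step sizes and the choice of $|x|$ as a second test function are assumptions made for illustration. For $f(x) = 3x^2$ at $x = 1$ both sequences approach the same value, whereas for $|x|$ at $x = 0$ they stay apart, so that no single limit exists there.

```python
def difference_quotient(f, x, dx):
    """[f(x + dx) - f(x)] / dx, the ratio whose limit defines f'(x) in (2.1)."""
    return (f(x + dx) - f(x)) / dx


def one_sided_estimates(f, x, steps=(1e-1, 1e-3, 1e-5)):
    """Pairs (from the right, from the left) for a shrinking |dx|."""
    return [(difference_quotient(f, x, h), difference_quotient(f, x, -h))
            for h in steps]


smooth = one_sided_estimates(lambda x: 3 * x**2, 1.0)
kinked = one_sided_estimates(abs, 0.0)
for row in smooth + kinked:
    print(row)
```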
A function that is differentiable at $a$ is necessarily continuous at $a$ (there must be no jump in the value of the function at $a$), though the converse is not necessarily true. This latter assertion is illustrated in figure 2.1: the function is continuous at the ‘kink’ $A$ but the two limits of the gradient as $\Delta x$ tends to zero from positive or negative values are different and so the function is not differentiable at $A$.

It should be clear from the above discussion that near the point $P$ we may approximate the change in the value of the function, $\Delta f$, that results from a small change $\Delta x$ in $x$ by

$$\Delta f \approx \frac{df(x)}{dx}\,\Delta x. \qquad (2.2)$$

As one would expect, the approximation improves as the value of $\Delta x$ is reduced. In the limit in which the change $\Delta x$ becomes infinitesimally small, we denote it by the differential $dx$, and (2.2) reads

$$df = \frac{df(x)}{dx}\,dx. \qquad (2.3)$$

This equality relates the infinitesimal change in the function, $df$, to the infinitesimal change $dx$ that causes it.

So far we have discussed only the first derivative of a function. However, we can also define the second derivative as the gradient of the gradient of a function. Again we use the definition (2.1) but now with $f(x)$ replaced by $f'(x)$. Hence the second derivative is defined by

$$f''(x) \equiv \lim_{\Delta x \to 0} \frac{f'(x + \Delta x) - f'(x)}{\Delta x}, \qquad (2.4)$$

provided that the limit exists. A physical example of a second derivative is the second derivative of the distance travelled by a particle with respect to time. Since the first derivative of distance travelled gives the particle’s velocity, the second derivative gives its acceleration. We can continue in this manner, the $n$th derivative of the function $f(x)$ being defined by

$$f^{(n)}(x) \equiv \lim_{\Delta x \to 0} \frac{f^{(n-1)}(x + \Delta x) - f^{(n-1)}(x)}{\Delta x}. \qquad (2.5)$$

It should be noted that with this notation $f'(x) \equiv f^{(1)}(x)$, $f''(x) \equiv f^{(2)}(x)$, etc., and that formally $f^{(0)}(x) \equiv f(x)$.

All this should be familiar to the reader, though perhaps not with such formal definitions.
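The recursive structure of (2.5) translates directly into a short numerical sketch. In the Python below, the limit is replaced by a small but finite step, so the results are estimates only; the function name `nth_derivative`, the step size and the uniformly accelerated motion used as a test case are all assumptions chosen for illustration. Repeated differencing amplifies rounding error, so this approach is not suitable for large $n$ or very small steps.

```python
def nth_derivative(f, x, n, dx=1e-3):
    """Estimate f^(n)(x) by applying the difference quotient of (2.5) n times."""
    if n == 0:
        return f(x)  # formally f^(0)(x) is f(x) itself
    return (nth_derivative(f, x + dx, n - 1, dx)
            - nth_derivative(f, x, n - 1, dx)) / dx


G = 9.81  # assumed constant acceleration, in m s^-2


def distance(t):
    """Distance travelled from rest under constant acceleration G."""
    return 0.5 * G * t**2


velocity = nth_derivative(distance, 2.0, 1)      # close to G * 2.0
acceleration = nth_derivative(distance, 2.0, 2)  # close to G
print(velocity, acceleration)
```

The first estimate differs slightly from the exact velocity because the step is finite; for this quadratic test function the second difference reproduces the acceleration up to rounding error.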
The following example shows the differentiation of $f(x) = x^2$ from first principles. In practice, however, it is desirable simply to remember the derivatives of standard functions; the techniques given in the remainder of this section can be applied to find more complicated derivatives.

▶ Find from first principles the derivative with respect to $x$ of $f(x) = x^2$.

Using the definition (2.1),

$$\begin{aligned}
f'(x) &= \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{(x + \Delta x)^2 - x^2}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{2x\,\Delta x + (\Delta x)^2}{\Delta x} \\
&= \lim_{\Delta x \to 0} (2x + \Delta x).
\end{aligned}$$

As $\Delta x$ tends to zero, $2x + \Delta x$ tends towards $2x$, hence $f'(x) = 2x$. ◀

Derivatives of other functions can be obtained in the same way. The derivatives of some simple functions are listed below (note that $a$ is a constant):

$$\begin{aligned}
\frac{d}{dx}(x^n) &= nx^{n-1}, & \frac{d}{dx}(e^{ax}) &= ae^{ax}, & \frac{d}{dx}(\ln ax) &= \frac{1}{x}, \\
\frac{d}{dx}(\sin ax) &= a\cos ax, & \frac{d}{dx}(\cos ax) &= -a\sin ax, & \frac{d}{dx}(\sec ax) &= a\sec ax\,\tan ax, \\
\frac{d}{dx}(\tan ax) &= a\sec^2 ax, & \frac{d}{dx}(\operatorname{cosec} ax) &= -a\operatorname{cosec} ax\,\cot ax, & \frac{d}{dx}(\cot ax) &= -a\operatorname{cosec}^2 ax, \\
\frac{d}{dx}\left(\sin^{-1}\frac{x}{a}\right) &= \frac{1}{\sqrt{a^2 - x^2}}, & \frac{d}{dx}\left(\cos^{-1}\frac{x}{a}\right) &= \frac{-1}{\sqrt{a^2 - x^2}}, & \frac{d}{dx}\left(\tan^{-1}\frac{x}{a}\right) &= \frac{a}{a^2 + x^2}.
\end{aligned}$$

Differentiation from first principles emphasises the definition of a derivative as the gradient of a function. However, for most practical purposes, returning to the definition (2.1) is time consuming and does not aid our understanding. Instead, as mentioned above, we employ a number of techniques, which use the derivatives listed above as ‘building blocks’, to evaluate the derivatives of more complicated functions than hitherto encountered. Subsections 2.1.2–2.1.7 develop the methods required.

#### 2.1.2 Differentiation of products

As a first example of the differentiation of a more complicated function, we consider finding the derivative of a function $f(x)$ that can be written as the product of two other functions of $x$, namely $f(x) = u(x)v(x)$.
For example, if $f(x) = x^3 \sin x$ then we might take $u(x) = x^3$ and $v(x) = \sin x$. Clearly the separation is not unique. (In the given example, possible alternative break-ups would be $u(x) = x^2$, $v(x) = x\sin x$, or even $u(x) = x^4 \tan x$, $v(x) = x^{-1}\cos x$.) The purpose of the separation is to split the function into two (or more) parts, of which we know the derivatives (or at least we can evaluate these derivatives more easily than that of the whole). We would gain little, however, if we did not know the relationship between the derivative of $f$ and those of $u$ and $v$. Fortunately, they are very simply related, as we shall now show.

Since $f(x)$ is written as the product $u(x)v(x)$, it follows that

$$\begin{aligned}
f(x + \Delta x) - f(x) &= u(x + \Delta x)v(x + \Delta x) - u(x)v(x) \\
&= u(x + \Delta x)[v(x + \Delta x) - v(x)] + [u(x + \Delta x) - u(x)]v(x).
\end{aligned}$$

From the definition of a derivative (2.1),

$$\begin{aligned}
\frac{df}{dx} &= \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \\
&= \lim_{\Delta x \to 0} \left\{ u(x + \Delta x)\left[\frac{v(x + \Delta x) - v(x)}{\Delta x}\right] + \left[\frac{u(x + \Delta x) - u(x)}{\Delta x}\right] v(x) \right\}.
\end{aligned}$$

In the limit $\Delta x \to 0$, the factors in square brackets become $dv/dx$ and $du/dx$ (by the definitions of these quantities) and $u(x + \Delta x)$ simply becomes $u(x)$. Consequently we obtain

$$\frac{df}{dx} = \frac{d}{dx}[u(x)v(x)] = u(x)\frac{dv(x)}{dx} + \frac{du(x)}{dx}v(x). \qquad (2.6)$$

In primed notation and without writing the argument $x$ explicitly, (2.6) is stated concisely as

$$f' = (uv)' = uv' + u'v. \qquad (2.7)$$

This is a general result obtained without making any assumptions about the specific forms $f$, $u$ and $v$, other than that $f(x) = u(x)v(x)$. In words, the result reads as follows.

The derivative of the product of two functions is equal to the first function times the derivative of the second plus the second function times the derivative of the first.

▶ Find the derivative with respect to $x$ of $f(x) = x^3 \sin x$.

Using the product rule, (2.6),

$$\frac{d}{dx}(x^3 \sin x) = x^3 \frac{d}{dx}(\sin x) + \frac{d}{dx}(x^3)\sin x = x^3 \cos x + 3x^2 \sin x.$$
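As a sanity check on this result, the product-rule expression can be compared with a direct numerical estimate of the gradient. The Python sketch below is an illustration only: the symmetric difference used for the estimate, the function names and the evaluation point are assumptions, not part of the text.

```python
import math


def central_difference(f, x, dx=1e-6):
    """Symmetric estimate of f'(x); an assumed numerical stand-in for (2.1)."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)


def product_rule(u, du, v, dv, x):
    """u v' + u' v evaluated at x, as in (2.7)."""
    return u(x) * dv(x) + du(x) * v(x)


x0 = 1.2  # arbitrary evaluation point
exact = product_rule(lambda x: x**3, lambda x: 3 * x**2,
                     math.sin, math.cos, x0)
numeric = central_difference(lambda x: x**3 * math.sin(x), x0)
print(exact, numeric)
```

The two printed values should agree to within the accuracy of the finite-difference estimate.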
◀

The product rule may readily be extended to the product of three or more functions. Considering the function

$$
f(x) = u(x)v(x)w(x) \tag{2.8}
$$

and using (2.6), we obtain, as before omitting the argument,

$$
\frac{df}{dx} = u\frac{d}{dx}(vw) + \frac{du}{dx}vw.
$$

Using (2.6) again to expand the first term on the RHS gives the complete result

$$
\frac{d}{dx}(uvw) = uv\frac{dw}{dx} + u\frac{dv}{dx}w + \frac{du}{dx}vw \tag{2.9}
$$

or

$$
(uvw)' = uvw' + uv'w + u'vw. \tag{2.10}
$$

It is readily apparent that this can be extended to products containing any number $n$ of factors; the expression for the derivative will then consist of $n$ terms with the prime appearing in successive terms on each of the $n$ factors in turn. This is probably the easiest way to recall the product rule.

#### 2.1.3 The chain rule

Products are just one type of complicated function that we may encounter in differentiation. Another is the function of a function, e.g. $f(x) = (3 + x^2)^3 = u(x)^3$, where $u(x) = 3 + x^2$. If $\Delta f$, $\Delta u$ and $\Delta x$ are small finite quantities, it follows that

$$
\frac{\Delta f}{\Delta x} = \frac{\Delta f}{\Delta u}\frac{\Delta u}{\Delta x};
$$

as the quantities become infinitesimally small we obtain

$$
\frac{df}{dx} = \frac{df}{du}\frac{du}{dx}. \tag{2.11}
$$

This is the chain rule, which we must apply when differentiating a function of a function.

▶ Find the derivative with respect to $x$ of $f(x) = (3 + x^2)^3$.

Rewriting the function as $f(x) = u^3$, where $u(x) = 3 + x^2$, and applying (2.11) we find

$$
\frac{df}{dx} = 3u^2\frac{du}{dx} = 3u^2\frac{d}{dx}(3 + x^2) = 3u^2 \times 2x = 6x(3 + x^2)^2.
$$

◀

Similarly, the derivative with respect to $x$ of $f(x) = 1/v(x)$ may be obtained by rewriting the function as $f(x) = v^{-1}$ and applying (2.11):

$$
\frac{df}{dx} = -v^{-2}\frac{dv}{dx} = -\frac{1}{v^2}\frac{dv}{dx}. \tag{2.12}
$$

The chain rule is also useful for calculating the derivative of a function $f$ with respect to $x$ when both $x$ and $f$ are written in terms of a variable (or parameter), say $t$.

▶ Find the derivative with respect to $x$ of $f(t) = 2at$, where $x = at^2$.
We could of course substitute for $t$ and then differentiate $f$ as a function of $x$, but in this case it is quicker to use

$$
\frac{df}{dx} = \frac{df}{dt}\frac{dt}{dx} = 2a\,\frac{1}{2at} = \frac{1}{t},
$$

where we have used the fact that

$$
\frac{dt}{dx} = \left(\frac{dx}{dt}\right)^{-1}.
$$

◀

#### 2.1.4 Differentiation of quotients

Applying (2.6) for the derivative of a product to a function $f(x) = u(x)[1/v(x)]$, we may obtain the derivative of the quotient of two factors. Thus

$$
f' = \left(\frac{u}{v}\right)' = u\left(\frac{1}{v}\right)' + u'\left(\frac{1}{v}\right) = u\left(-\frac{v'}{v^2}\right) + \frac{u'}{v},
$$

where (2.12) has been used to evaluate $(1/v)'$. This can now be rearranged into the more convenient and memorisable form

$$
f' = \left(\frac{u}{v}\right)' = \frac{vu' - uv'}{v^2}. \tag{2.13}
$$

This can be expressed in words as *the derivative of a quotient is equal to the bottom times the derivative of the top minus the top times the derivative of the bottom, all over the bottom squared*.

▶ Find the derivative with respect to $x$ of $f(x) = \sin x/x$.

Using (2.13) with $u(x) = \sin x$, $v(x) = x$ and hence $u'(x) = \cos x$, $v'(x) = 1$, we find

$$
f'(x) = \frac{x\cos x - \sin x}{x^2} = \frac{\cos x}{x} - \frac{\sin x}{x^2}.
$$

◀

#### 2.1.5 Implicit differentiation

So far we have only differentiated functions written in the form $y = f(x)$. However, we may not always be presented with a relationship in this simple form. As an example consider the relation $x^3 - 3xy + y^3 = 2$. In this case it is not possible to rearrange the equation to give $y$ as a function of $x$. Nevertheless, by differentiating term by term with respect to $x$ (implicit differentiation), we can find the derivative of $y$.

▶ Find $dy/dx$ if $x^3 - 3xy + y^3 = 2$.
Differentiating each term in the equation with respect to $x$ we obtain

$$
\begin{aligned}
\frac{d}{dx}(x^3) - \frac{d}{dx}(3xy) + \frac{d}{dx}(y^3) &= \frac{d}{dx}(2), \\
\Rightarrow\quad 3x^2 - \left(3x\frac{dy}{dx} + 3y\right) + 3y^2\frac{dy}{dx} &= 0,
\end{aligned}
$$

where the derivative of $3xy$ has been found using the product rule. Hence, rearranging for $dy/dx$,

$$
\frac{dy}{dx} = \frac{y - x^2}{y^2 - x}.
$$

Note that $dy/dx$ is a function of both $x$ and $y$ and cannot be expressed as a function of $x$ only. ◀

#### 2.1.6 Logarithmic differentiation

In circumstances in which the variable with respect to which we are differentiating is an exponent, taking logarithms and then differentiating implicitly is the simplest way to find the derivative.

▶ Find the derivative with respect to $x$ of $y = a^x$.

To find the required derivative we first take logarithms and then differentiate implicitly:

$$
\ln y = \ln a^x = x\ln a \quad\Rightarrow\quad \frac{1}{y}\frac{dy}{dx} = \ln a.
$$

Now, rearranging and substituting for $y$, we find

$$
\frac{dy}{dx} = y\ln a = a^x\ln a.
$$

◀

#### 2.1.7 Leibnitz' theorem

We have discussed already how to find the derivative of a product of two or more functions. We now consider Leibnitz' theorem, which gives the corresponding results for the higher derivatives of products.

Consider again the function $f(x) = u(x)v(x)$. We know from the product rule that $f' = uv' + u'v$. Using the rule once more for each of the products, we obtain

$$
\begin{aligned}
f'' &= (uv'' + u'v') + (u'v' + u''v) \\
&= uv'' + 2u'v' + u''v.
\end{aligned}
$$

Similarly, differentiating twice more gives

$$
\begin{aligned}
f''' &= uv''' + 3u'v'' + 3u''v' + u'''v, \\
f^{(4)} &= uv^{(4)} + 4u'v''' + 6u''v'' + 4u'''v' + u^{(4)}v.
\end{aligned}
$$

The pattern emerging is clear and strongly suggests that the results generalise to

$$
f^{(n)} = \sum_{r=0}^{n} \frac{n!}{r!(n-r)!}\,u^{(r)}v^{(n-r)} = \sum_{r=0}^{n} {}^{n}C_{r}\,u^{(r)}v^{(n-r)}, \tag{2.14}
$$

where the fraction $n!/[r!(n-r)!]$
is identified with the binomial coefficient ${}^{n}C_{r}$ (see chapter 1). To prove that this is so, we use the method of induction as follows.

Assume that (2.14) is valid for $n$ equal to some integer $N$. Then

$$
\begin{aligned}
f^{(N+1)} &= \sum_{r=0}^{N} {}^{N}C_{r}\,\frac{d}{dx}\left(u^{(r)}v^{(N-r)}\right) \\
&= \sum_{r=0}^{N} {}^{N}C_{r}\,[u^{(r)}v^{(N-r+1)} + u^{(r+1)}v^{(N-r)}] \\
&= \sum_{s=0}^{N} {}^{N}C_{s}\,u^{(s)}v^{(N+1-s)} + \sum_{s=1}^{N+1} {}^{N}C_{s-1}\,u^{(s)}v^{(N+1-s)},
\end{aligned}
$$

where we have substituted summation index $s$ for $r$ in the first summation, and for $r + 1$ in the second. Now, from our earlier discussion of binomial coefficients, equation (1.51), we have

$$
{}^{N}C_{s} + {}^{N}C_{s-1} = {}^{N+1}C_{s}
$$

and so, after separating out the first term of the first summation and the last term of the second, obtain

$$
f^{(N+1)} = {}^{N}C_{0}\,u^{(0)}v^{(N+1)} + \sum_{s=1}^{N} {}^{N+1}C_{s}\,u^{(s)}v^{(N+1-s)} + {}^{N}C_{N}\,u^{(N+1)}v^{(0)}.
$$

But ${}^{N}C_{0} = 1 = {}^{N+1}C_{0}$ and ${}^{N}C_{N} = 1 = {}^{N+1}C_{N+1}$, and so we may write

$$
\begin{aligned}
f^{(N+1)} &= {}^{N+1}C_{0}\,u^{(0)}v^{(N+1)} + \sum_{s=1}^{N} {}^{N+1}C_{s}\,u^{(s)}v^{(N+1-s)} + {}^{N+1}C_{N+1}\,u^{(N+1)}v^{(0)} \\
&= \sum_{s=0}^{N+1} {}^{N+1}C_{s}\,u^{(s)}v^{(N+1-s)}.
\end{aligned}
$$

This is just (2.14) with $n$ set equal to $N + 1$. Thus, assuming the validity of (2.14) for $n = N$ implies its validity for $n = N + 1$. However, when $n = 1$ equation (2.14) is simply the product rule, and this we have already proved directly. These results taken together establish the validity of (2.14) for all $n$ and prove Leibnitz' theorem.

Figure 2.2: A graph of a function, $f(x)$, showing how differentiation corresponds to finding the gradient of the function at a particular point. Points $B$, $Q$ and $S$ are stationary points (see text).

▶ Find the third derivative of the function $f(x) = x^3\sin x$.

Using (2.14) we immediately find

$$
\begin{aligned}
f'''(x) &= 6\sin x + 3(6x)\cos x + 3(3x^2)(-\sin x) + x^3(-\cos x) \\
&= 3(2 - 3x^2)\sin x + x(18 - x^2)\cos x.
\end{aligned}
$$
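The sum in (2.14) translates directly into a few lines of code. The Python sketch below evaluates the third derivative of $x^3\sin x$ from Leibnitz' theorem and compares it with the closed form found in the example. The function name `leibnitz_nth_derivative` and the convention of passing the derivatives of $u$ and $v$ as lists of callables are assumptions made for this illustration only.

```python
import math


def leibnitz_nth_derivative(u_derivs, v_derivs, n, x):
    """Evaluate (uv)^(n) at x using the sum in (2.14).

    u_derivs[r] and v_derivs[r] are callables giving the r-th derivatives
    of u and v (index 0 is the function itself).
    """
    return sum(
        math.comb(n, r) * u_derivs[r](x) * v_derivs[n - r](x)
        for r in range(n + 1)
    )


# u = x^3 together with u', u'' and u'''
u_derivs = [lambda x: x ** 3, lambda x: 3 * x ** 2, lambda x: 6 * x, lambda x: 6.0]
# v = sin x together with v', v'' and v'''
v_derivs = [math.sin, math.cos, lambda x: -math.sin(x), lambda x: -math.cos(x)]


def closed_form(x):
    # Result quoted in the worked example
    return 3 * (2 - 3 * x ** 2) * math.sin(x) + x * (18 - x ** 2) * math.cos(x)


for x in (0.0, 0.7, 1.3):
    print(x, leibnitz_nth_derivative(u_derivs, v_derivs, 3, x), closed_form(x))
```

The binomial coefficients come from `math.comb`, which requires Python 3.8 or later.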
◀

#### 2.1.8 Special points of a function

We have interpreted the derivative of a function as the gradient of the function at the relevant point (figure 2.1). If the gradient is zero for some particular value of $x$ then the function is said to have a stationary point there. Clearly, in graphical terms, this corresponds to a horizontal tangent to the graph.

Stationary points may be divided into three categories and an example of each is shown in figure 2.2. Point $B$ is said to be a minimum since the function increases in value in both directions away from it. Point $Q$ is said to be a maximum since the function decreases in both directions away from it. Note that $B$ is not the overall minimum value of the function and $Q$ is not the overall maximum; rather, they are a local minimum and a local maximum. Maxima and minima are known collectively as turning points.

The third type of stationary point is the stationary point of inflection, $S$. In this case the function falls in the positive $x$-direction and rises in the negative $x$-direction so that $S$ is neither a maximum nor a minimum. Nevertheless, the gradient of the function is zero at $S$, i.e. the graph of the function is flat there, and this justifies our calling it a stationary point. Of course, a point at which the gradient of the function is zero but the function rises in the positive $x$-direction and falls in the negative $x$-direction is also a stationary point of inflection.

The above distinction between the three types of stationary point has been made rather descriptively. However, it is possible to define and distinguish stationary points mathematically. From their definition as points of zero gradient, all stationary points must be characterised by $df/dx = 0$. In the case of the minimum, $B$, the slope, i.e. $df/dx$, changes from negative at $A$ to positive at $C$ through zero at $B$. Thus $df/dx$ is increasing and so the second derivative $d^2f/dx^2$ must be positive.
Conversely, at the maximum, $Q$, we must have that $d^2f/dx^2$ is negative.

It is less obvious, but intuitively reasonable, that at $S$, $d^2f/dx^2$ is zero. This may be inferred from the following observations. To the left of $S$ the curve is concave upwards so that $df/dx$ is increasing with $x$ and hence $d^2f/dx^2 > 0$. To the right of $S$, however, the curve is concave downwards so that $df/dx$ is decreasing with $x$ and hence $d^2f/dx^2 < 0$.

In summary, at a stationary point $df/dx = 0$ and

(i) for a minimum, $d^2f/dx^2 > 0$,
(ii) for a maximum, $d^2f/dx^2 < 0$,
(iii) for a stationary point of inflection, $d^2f/dx^2 = 0$ and $d^2f/dx^2$ changes sign through the point.

In case (iii), a stationary point of inflection, in order that $d^2f/dx^2$ changes sign through the point we normally require $d^3f/dx^3 \neq 0$ at that point. This simple rule can fail for some functions, however, and in general if the first non-vanishing derivative of $f(x)$ at the stationary point is $f^{(n)}$ then if $n$ is even the point is a maximum or minimum and if $n$ is odd the point is a stationary point of inflection. This may be seen from the Taylor expansion (see equation (4.17)) of the function about the stationary point, but it is not proved here.

Figure 2.3: The graph of a function $f(x)$ that has a general point of inflection at the point $G$.

▶ Find the positions and natures of the stationary points of the function $f(x) = 2x^3 - 3x^2 - 36x + 2$.

The first criterion for a stationary point is that $df/dx = 0$, and hence we set

$$
\frac{df}{dx} = 6x^2 - 6x - 36 = 0,
$$

from which we obtain

$$
(x - 3)(x + 2) = 0.
$$

Hence the stationary points are at $x = 3$ and $x = -2$. To determine the nature of the stationary point we must evaluate $d^2f/dx^2$:

$$
\frac{d^2f}{dx^2} = 12x - 6.
$$

Now, we examine each stationary point in turn. For $x = 3$, $d^2f/dx^2 = 30$. Since this is positive, we conclude that $x = 3$ is a minimum. Similarly, for $x = -2$, $d^2f/dx^2 = -30$ and so $x = -2$ is a maximum.
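The procedure used in this example (solve $df/dx = 0$, then inspect the sign of $d^2f/dx^2$) can be sketched in Python for a general cubic. The function name `classify_stationary_points` and its coefficient-based interface are hypothetical choices for this illustration; the sketch handles only cubics, for which a repeated root of $df/dx$ is necessarily a stationary point of inflection because $d^3f/dx^3 = 6a \neq 0$.

```python
import math


def classify_stationary_points(a, b, c, d):
    """Stationary points of f(x) = a x^3 + b x^2 + c x + d, with a != 0.

    Returns a list of (x, nature) pairs ordered by increasing x.
    The constant d does not affect the result; it is kept for completeness.
    """
    # df/dx = A x^2 + B x + C
    A, B, C = 3 * a, 2 * b, c
    disc = B * B - 4 * A * C
    if disc < 0:
        return []  # df/dx never vanishes: no stationary points
    if disc == 0:
        # Repeated root: f'' = 0 there and f''' = 6a != 0
        return [(-B / (2 * A), "stationary point of inflection")]
    roots = sorted([(-B - math.sqrt(disc)) / (2 * A),
                    (-B + math.sqrt(disc)) / (2 * A)])
    points = []
    for x in roots:
        second = 6 * a * x + 2 * b  # d^2 f / dx^2 at the stationary point
        points.append((x, "minimum" if second > 0 else "maximum"))
    return points


# f(x) = 2x^3 - 3x^2 - 36x + 2 from the worked example
print(classify_stationary_points(2, -3, -36, 2))
# prints [(-2.0, 'maximum'), (3.0, 'minimum')]
```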
◀

So far we have concentrated on stationary points, which are defined to have $df/dx = 0$. We have found that at a stationary point of inflection $d^2f/dx^2$ is also zero and changes sign. This naturally leads us to consider points at which $d^2f/dx^2$ is zero and changes sign but at which $df/dx$ is not, in general, zero. Such points are called general points of inflection or simply points of inflection. Clearly, a stationary point of inflection is a special case for which $df/dx$ is also zero. At a general point of inflection the graph of the function changes from being concave upwards to concave downwards (or vice versa), but the tangent to the curve at this point need not be horizontal. A typical example of a general point of inflection is shown in figure 2.3.

The determination of the stationary points of a function, together with the identification of its zeros, infinities and possible asymptotes, is usually sufficient to enable a graph of the function showing most of its significant features to be sketched. Some examples for the reader to try are included in the exercises at the end of this chapter.

#### 2.1.9 Curvature of a function

In the previous section we saw that at a point of inflection of the function $f(x)$, the second derivative $d^2f/dx^2$ changes sign and passes through zero. The corresponding graph of $f$ shows an inversion of its curvature at the point of inflection. We now develop a more quantitative measure of the curvature of a function (or its graph), which is applicable at general points and not just in the neighbourhood of a point of inflection.

Figure 2.4: Two neighbouring tangents to the curve $f(x)$ whose slopes differ by $\Delta\theta$. The angular separation of the corresponding radii of the circle of curvature is also $\Delta\theta$.

As in figure 2.1, let $\theta$ be the angle made with the $x$-axis by the tangent at a point $P$ on the curve $f = f(x)$, with $\tan\theta = df/dx$ evaluated at $P$.
Now consider also the tangent at a neighbouring point $Q$ on the curve, and suppose that it makes an angle $\theta + \Delta\theta$ with the $x$-axis, as illustrated in figure 2.4.

It follows that the corresponding normals at $P$ and $Q$, which are perpendicular to the respective tangents, also intersect at an angle $\Delta\theta$. Furthermore, their point of intersection, $C$ in the figure, will be the position of the centre of a circle that approximates the arc $PQ$, at least to the extent of having the same tangents at the extremities of the arc. This circle is called the circle of curvature.

For a finite arc $PQ$, the lengths of $CP$ and $CQ$ will not, in general, be equal, as they would be if $f = f(x)$ were in fact the equation of a circle. But, as $Q$ is allowed to tend to $P$, i.e. as $\Delta\theta \to 0$, they do become equal, their common value being $\rho$, the radius of the circle, known as the radius of curvature. It follows immediately that the curve and the circle of curvature have a common tangent at $P$ and lie on the same side of it. The reciprocal of the radius of curvature, $\rho^{-1}$, defines the curvature of the function $f(x)$ at the point $P$.

The radius of curvature can be defined more mathematically as follows. The length $\Delta s$ of arc $PQ$ is approximately equal to $\rho\,\Delta\theta$ and, in the limit $\Delta\theta \to 0$, this relationship defines $\rho$ as

$$
\rho = \lim_{\Delta\theta \to 0} \frac{\Delta s}{\Delta\theta} = \frac{ds}{d\theta}. \tag{2.15}
$$

It should be noted that, as $s$ increases, $\theta$ may increase or decrease according to whether the curve is locally concave upwards (i.e. shaped as if it were near a minimum in $f(x)$) or concave downwards. This is reflected in the sign of $\rho$, which therefore also indicates the position of the curve (and of the circle of curvature) relative to the common tangent, above or below. Thus a negative value of $\rho$ indicates that the curve is locally concave downwards and that the tangent lies above the curve.

We next obtain an expression for $\rho$, not in terms of $s$ and $\theta$ but in terms of $x$ and $f(x)$.
The expression, though somewhat cumbersome, follows from the defining equation (2.15), the defining property of $\theta$ that $\tan\theta = df/dx \equiv f'$ and the fact that the rate of change of arc length with $x$ is given by

$$
\frac{ds}{dx} = \left[1 + \left(\frac{df}{dx}\right)^2\right]^{1/2}. \tag{2.16}
$$

This last result, simply quoted here, is proved more formally in subsection 2.2.13. From the chain rule (2.11) it follows that

$$
\rho = \frac{ds}{d\theta} = \frac{ds}{dx}\frac{dx}{d\theta}. \tag{2.17}
$$

Differentiating both sides of $\tan\theta = df/dx$ with respect to $x$ gives

$$
\sec^2\theta\,\frac{d\theta}{dx} = \frac{d^2f}{dx^2} \equiv f'',
$$

from which, using $\sec^2\theta = 1 + \tan^2\theta = 1 + (f')^2$, we can obtain $dx/d\theta$ as

$$
\frac{dx}{d\theta} = \frac{1 + \tan^2\theta}{f''} = \frac{1 + (f')^2}{f''}. \tag{2.18}
$$

Substituting (2.16) and (2.18) into (2.17) then yields the final expression for $\rho$,

$$
\rho = \frac{\left[1 + (f')^2\right]^{3/2}}{f''}. \tag{2.19}
$$

It should be noted that the quantity in brackets is always positive and that its three-halves root is also taken as positive. The sign of $\rho$ is thus solely determined by that of $d^2f/dx^2$, in line with our previous discussion relating the sign to whether the curve is concave or convex upwards. If, as happens at a point of inflection, $d^2f/dx^2$ is zero then $\rho$ is formally infinite and the curvature of $f(x)$ is zero. As $d^2f/dx^2$ changes sign on passing through zero, both the local tangent and the circle of curvature change from their initial positions to the opposite side of the curve.

▶ Show that the radius of curvature at the point $(x, y)$ on the ellipse

$$
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1
$$

has magnitude $(a^4y^2 + b^4x^2)^{3/2}/(a^4b^4)$ and the opposite sign to $y$. Check the special case $b = a$, for which the ellipse becomes a circle.

Differentiating the equation of the ellipse with respect to $x$ gives

$$
\frac{2x}{a^2} + \frac{2y}{b^2}\frac{dy}{dx} = 0
$$

and so

$$
\frac{dy}{dx} = -\frac{b^2x}{a^2y}.
$$
A second differentiation, using (2.13), then yields

$$
\frac{d^2y}{dx^2} = -\frac{b^2}{a^2}\left(\frac{y - xy'}{y^2}\right) = -\frac{b^4}{a^2y^3}\left(\frac{y^2}{b^2} + \frac{x^2}{a^2}\right) = -\frac{b^4}{a^2y^3},
$$

where we have used the fact that $(x, y)$ lies on the ellipse. We note that $d^2y/dx^2$, and hence $\rho$, has the opposite sign to $y^3$ and hence to $y$. Substituting in (2.19) gives for the magnitude of the radius of curvature

$$
|\rho| = \left|\frac{\left[1 + b^4x^2/(a^4y^2)\right]^{3/2}}{-b^4/(a^2y^3)}\right| = \frac{(a^4y^2 + b^4x^2)^{3/2}}{a^4b^4}.
$$

For the special case $b = a$, $|\rho|$ reduces to $a^{-2}(y^2 + x^2)^{3/2}$ and, since $x^2 + y^2 = a^2$, this in turn gives $|\rho| = a$, as expected. ◀

The discussion in this section has been confined to the behaviour of curves that lie in one plane; examples of the application of curvature to the bending of loaded beams and to particle orbits under the influence of central forces can be found in the exercises at the ends of later chapters. A more general treatment of curvature in three dimensions is given in section 10.3, where a vector approach is adopted.

#### 2.1.10 Theorems of differentiation

*Rolle's theorem*

Rolle's theorem (figure 2.5) states that if a function $f(x)$ is continuous in the range $a \le x \le c$, is differentiable in the range a1, 1− 1 x < lnx0, is not differentiable at $x = 0$. Consider the limiting process for both $\Delta x > 0$ and $\Delta x < 0$.

2.7 Find $dy/dx$ if $x = (t - 2)/(t + 2)$ and $y = 2t/(t + 1)$ for −∞ 4ac, (ii) $b^2 < 4ac$ and (iii) $b^2 = 4ac$.

2.34 Use logarithmic integration to find the indefinite integrals $J$ of the following: (a) $\sin 2x/(1 + 4\sin^2 x)$; (b) $e^x/(e^x - e^{-x})$; (c) $(1 + x\ln x)/(x\ln x)$; (d) $[x(x^n + a^n)]^{-1}$.

2.35 Find the derivative of $f(x) = (1 + \sin x)/\cos x$ and hence determine the indefinite integral $J$ of $\sec x$.
2.36 Find the indefinite integrals, $J$, of the following functions involving sinusoids: (a) $\cos^5 x - \cos^3 x$; (b) $(1 - \cos x)/(1 + \cos x)$; (c) $\cos x\sin x/(1 + \cos x)$; (d) $\sec^2 x/(1 - \tan^2 x)$.

2.37 By making the substitution $x = a\cos^2\theta + b\sin^2\theta$, evaluate the definite integrals $J$ between limits $a$ and $b$ ($> a$) of the following functions: (a) $[(x - a)(b - x)]^{-1/2}$; (b) $[(x - a)(b - x)]^{1/2}$; (c) $[(x - a)/(b - x)]^{1/2}$.

2.38 Determine whether the following integrals exist and, where they do, evaluate them:
(a) $\int_0^\infty \exp(-\lambda x)\,dx$; (b) $\int_{-\infty}^{\infty} \frac{x}{(x^2 + a^2)^2}\,dx$; (c) $\int_1^\infty \frac{1}{x + 1}\,dx$; (d) $\int_0^1 \frac{1}{x^2}\,dx$; (e) $\int_0^{\pi/2} \cot\theta\,d\theta$; (f) $\int_0^1 \frac{x}{(1 - x^2)^{1/2}}\,dx$.

2.39 Use integration by parts to evaluate the following:
(a) $\int_0^y x^2\sin x\,dx$; (b) $\int_1^y x\ln x\,dx$; (c) $\int_0^y \sin^{-1}x\,dx$; (d) $\int_1^y \ln(a^2 + x^2)/x^2\,dx$.

2.40 Show, using the following methods, that the indefinite integral of $x^3/(x + 1)^{1/2}$ is

$$
J = \tfrac{2}{35}(5x^3 - 6x^2 + 8x - 16)(x + 1)^{1/2} + c.
$$

(a) Repeated integration by parts.
(b) Setting $x + 1 = u^2$ and determining $dJ/du$ as $(dJ/dx)(dx/du)$.

2.41 The gamma function $\Gamma(n)$ is defined for all $n > -1$ by

$$
\Gamma(n + 1) = \int_0^\infty x^n e^{-x}\,dx.
$$

Find a recurrence relation connecting $\Gamma(n + 1)$ and $\Gamma(n)$.
(a) Deduce (i) the value of $\Gamma(n + 1)$ when $n$ is a non-negative integer, and (ii) the value of $\Gamma\left(\tfrac{7}{2}\right)$, given that $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$.
(b) Now, taking factorial $m$ for any $m$ to be defined by $m! = \Gamma(m + 1)$, evaluate $\left(-\tfrac{3}{2}\right)!$.

2.42 Define $J(m, n)$, for non-negative integers $m$ and $n$, by the integral

$$
J(m, n) = \int_0^{\pi/2} \cos^m\theta\,\sin^n\theta\,d\theta.
$$

(a) Evaluate $J(0,0)$, $J(0,1)$, $J(1,0)$, $J(1,1)$, $J(m,1)$, $J(1,n)$.
(b) Using integration by parts, prove that, for $m$ and $n$ both $> 1$,

$$
J(m, n) = \frac{m - 1}{m + n}J(m - 2, n) \quad\text{and}\quad J(m, n) = \frac{n - 1}{m + n}J(m, n - 2).
$$

(c) Evaluate (i) $J(5,3)$, (ii) $J(6,5)$ and (iii) $J(4,8)$.
2.43 By integrating by parts twice, prove that $I_n$ as defined in the first equality below for positive integers $n$ has the value given in the second equality:

$$
I_n = \int_0^{\pi/2} \sin n\theta\,\cos\theta\,d\theta = \frac{n - \sin(n\pi/2)}{n^2 - 1}.
$$

2.44 Evaluate the following definite integrals:
(a) $\int_0^\infty xe^{-x}\,dx$; (b) $\int_0^1 \left[(x^3 + 1)/(x^4 + 4x + 1)\right]dx$; (c) $\int_0^{\pi/2} [a + (a - 1)\cos\theta]^{-1}\,d\theta$ with $a > \tfrac{1}{2}$; (d) $\int_{-\infty}^{\infty} (x^2 + 6x + 18)^{-1}\,dx$.

2.45 If $J_r$ is the integral

$$
\int_0^\infty x^r\exp(-x^2)\,dx
$$

show that (a) $J_{2r+1} = (r!)/2$, (b) $J_{2r} = 2^{-r}(2r - 1)(2r - 3)\cdots(5)(3)(1)\,J_0$.

2.46 Find positive constants $a$, $b$ such that $ax \le \sin x \le bx$ for $0 \le x \le \pi/2$. Use this inequality to find (to two significant figures) upper and lower bounds for the integral

$$
I = \int_0^{\pi/2} (1 + \sin x)^{1/2}\,dx.
$$

Use the substitution $t = \tan(x/2)$ to evaluate $I$ exactly.

2.47 By noting that for $0 \le \eta \le 1$, $\eta^{1/2} \ge \eta^{3/4} \ge \eta$, prove that

$$
\frac{2}{3} \le \frac{1}{a^{5/2}}\int_0^a (a^2 - x^2)^{3/4}\,dx \le \frac{\pi}{4}.
$$

2.48 Show that the total length of the astroid $x^{2/3} + y^{2/3} = a^{2/3}$, which can be parameterised as $x = a\cos^3\theta$, $y = a\sin^3\theta$, is $6a$.

2.49 By noting that $\sinh x < \tfrac{1}{2}e^x < \cosh x$, and that $1 + z^2 < (1 + z)^2$ for $z > 0$, show that, for $x > 0$, the length $L$ of the curve $y = \tfrac{1}{2}e^x$ measured from the origin satisfies the inequalities sinhx0andf(0) = 0; $g'(x) = (-\cos x)(\tan x - x)/x^2$, which is never positive in the range.

2.25 The false result arises because $\tan nx$ is not differentiable at $x = \pi/(2n)$, which lies in the range 0 0, or $4ac - b^2$ as $\Delta'^2 > 0$: (i) $\Delta^{-1}\ln[(2ax + b - \Delta)/(2ax + b + \Delta)] + k$; (ii) $2\Delta'^{-1}\tan^{-1}[(2ax + b)/\Delta'] + k$; (iii) $-2(2ax + b)^{-1} + k$.

2.35 $f'(x) = (1 + \sin x)/\cos^2 x = f(x)\sec x$; $J = \ln(f(x)) + c = \ln(\sec x + \tan x) + c$.

2.37 Note that $dx = 2(b - a)\cos\theta\sin\theta\,d\theta$. (a) $\pi$; (b) $\pi(b - a)^2/8$; (c) $\pi(b - a)/2$.
2.39 (a) $(2 - y^2)\cos y + 2y\sin y - 2$; (b) $[(y^2\ln y)/2] + [(1 - y^2)/4]$; (c) $y\sin^{-1}y + (1 - y^2)^{1/2} - 1$; (d) $\ln(a^2 + 1) - (1/y)\ln(a^2 + y^2) + (2/a)[\tan^{-1}(y/a) - \tan^{-1}(1/a)]$.

2.41 $\Gamma(n + 1) = n\Gamma(n)$; (a) (i) $n!$, (ii) $15\sqrt{\pi}/8$; (b) $-2\sqrt{\pi}$.

2.43 By integrating twice, recover a multiple of $I_n$.

2.45 $J_{2r+1} = rJ_{2r-1}$ and $2J_{2r} = (2r - 1)J_{2r-2}$.

2.47 Set $\eta = 1 - (x/a)^2$ throughout, and $x = a\sin\theta$ in one of the bounds.

2.49 $L = \int_0^x \left(1 + \tfrac{1}{4}\exp 2x\right)^{1/2}dx$.

## 3 Complex numbers and hyperbolic functions

This chapter is concerned with the representation and manipulation of complex numbers. Complex numbers pervade this book, underscoring their wide application in the mathematics of the physical sciences. The application of complex numbers to the description of physical systems is left until later chapters and only the basic tools are presented here.

### 3.1 The need for complex numbers

Although complex numbers occur in many branches of mathematics, they arise most directly out of solving polynomial equations. We examine a specific quadratic equation as an example.

Consider the quadratic equation

$$
z^2 - 4z + 5 = 0. \tag{3.1}
$$

Equation (3.1) has two solutions, $z_1$ and $z_2$, such that

$$
(z - z_1)(z - z_2) = 0. \tag{3.2}
$$

Using the familiar formula for the roots of a quadratic equation, (1.4), the solutions $z_1$ and $z_2$, written in brief as $z_{1,2}$, are

$$
z_{1,2} = \frac{4 \pm \sqrt{(-4)^2 - 4(1 \times 5)}}{2} = 2 \pm \frac{\sqrt{-4}}{2}. \tag{3.3}
$$

Figure 3.1: The function $f(z) = z^2 - 4z + 5$.

Both solutions contain the square root of a negative number. However, it is not true to say that there are no solutions to the quadratic equation. The fundamental theorem of algebra states that a quadratic equation will always have two solutions and these are in fact given by (3.3). The second term on the RHS of (3.3) is called an imaginary term since it contains the square root of a negative number; the first term is called a real term.
The full solution is the sum of a real term and an imaginary term and is called a complex number. A plot of the function $f(z) = z^2 - 4z + 5$ is shown in figure 3.1. It will be seen that the plot does not intersect the $z$-axis, corresponding to the fact that the equation $f(z) = 0$ has no purely real solutions.

The choice of the symbol $z$ for the quadratic variable was not arbitrary; the conventional representation of a complex number is $z$, where $z$ is the sum of a real part $x$ and $i$ times an imaginary part $y$, i.e.

$$
z = x + iy,
$$

where $i$ is used to denote the square root of $-1$. The real part $x$ and the imaginary part $y$ are usually denoted by $\operatorname{Re} z$ and $\operatorname{Im} z$ respectively. We note at this point that some physical scientists, engineers in particular, use $j$ instead of $i$. However, for consistency, we will use $i$ throughout this book.

In our particular example, $\sqrt{-4} = 2\sqrt{-1} = 2i$, and hence the two solutions of (3.1) are

$$
z_{1,2} = 2 \pm \frac{2i}{2} = 2 \pm i.
$$

Thus, here $x = 2$ and $y = \pm 1$.

For compactness a complex number is sometimes written in the form

$$
z = (x, y),
$$

where the components of $z$ may be thought of as coordinates in an $xy$-plot. Such a plot is called an Argand diagram and is a common representation of complex numbers; an example is shown in figure 3.2.

Figure 3.2: The Argand diagram.

Our particular example of a quadratic equation may be generalised readily to polynomials whose highest power (degree) is greater than 2, e.g. cubic equations (degree 3), quartic equations (degree 4) and so on. For a general polynomial $f(z)$, of degree $n$, the fundamental theorem of algebra states that the equation $f(z) = 0$ will have exactly $n$ solutions. We will examine cases of higher-degree equations in subsection 3.4.3.
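The solution of (3.1) can also be reproduced numerically. The short Python sketch below applies the usual quadratic formula using the standard-library `cmath` module, whose `sqrt` accepts negative arguments and returns a complex result. The function name `quadratic_roots` is an illustrative assumption; note also that Python writes the imaginary unit as `j` rather than $i$.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return both roots of a z^2 + b z + c = 0 (a != 0), complex if necessary."""
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_discriminant) / (2 * a),
            (-b - root_of_discriminant) / (2 * a))


# Equation (3.1): z^2 - 4z + 5 = 0
z1, z2 = quadratic_roots(1, -4, 5)
print(z1, z2)            # prints (2+1j) (2-1j)
print(z1.real, z1.imag)  # real part x = 2.0 and imaginary part y = 1.0
```

The real and imaginary parts printed on the last line correspond to $\operatorname{Re} z$ and $\operatorname{Im} z$ for the root $2 + i$.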
The remainder of this chapter deals with: the algebra and manipulation of complex numbers; their polar representation, which has advantages in many circumstances; complex exponentials and logarithms; the use of complex numbers in finding the roots of polynomial equations; and hyperbolic functions.

### 3.2 Manipulation of complex numbers

This section considers basic complex number manipulation. Some analogy may be drawn with vector manipulation (see chapter 7) but this section stands alone as an introduction.

#### 3.2.1 Addition and subtraction

The addition of two complex numbers, $z_1$ and $z_2$, in general gives another complex number. The real components and the imaginary components are added separately and in a like manner to the familiar addition of real numbers:

$$
z_1 + z_2 = (x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2),
$$

or in component notation

$$
z_1 + z_2 = (x_1, y_1) + (x_2, y_2) = (x_1 + x_2,\, y_1 + y_2).
$$

The Argand representation of the addition of two complex numbers is shown in figure 3.3.

Figure 3.3: The addition of two complex numbers.

By straightforward application of the commutativity and associativity of the real and imaginary parts separately, we can show that the addition of complex numbers is itself commutative and associative, i.e.

$$
\begin{aligned}
z_1 + z_2 &= z_2 + z_1, \\
z_1 + (z_2 + z_3) &= (z_1 + z_2) + z_3.
\end{aligned}
$$

Thus it is immaterial in what order complex numbers are added.

▶ Sum the complex numbers $1 + 2i$, $3 - 4i$, $-2 + i$.

Summing the real terms we obtain

$$
1 + 3 - 2 = 2,
$$

and summing the imaginary terms we obtain

$$
2i - 4i + i = -i.
$$

Hence

$$
(1 + 2i) + (3 - 4i) + (-2 + i) = 2 - i.
$$

◀

The subtraction of complex numbers is very similar to their addition. As in the case of real numbers, if two identical complex numbers are subtracted then the result is zero.

Figure 3.4: The modulus and argument of a complex number.
#### 3.2.2 Modulus and argument

The modulus of the complex number $z$ is denoted by $|z|$ and is defined as

$$
|z| = \sqrt{x^2 + y^2}. \tag{3.4}
$$

Hence the modulus of the complex number is the distance of the corresponding point from the origin in the Argand diagram, as may be seen in figure 3.4.

The argument of the complex number $z$ is denoted by $\arg z$ and is defined as

$$
\arg z = \tan^{-1}\left(\frac{y}{x}\right). \tag{3.5}
$$

It can be seen that $\arg z$ is the angle that the line joining the origin to $z$ on the Argand diagram makes with the positive $x$-axis. The anticlockwise direction is taken to be positive by convention. The angle $\arg z$ is shown in figure 3.4. Account must be taken of the signs of $x$ and $y$ individually in determining in which quadrant $\arg z$ lies. Thus, for example, if $x$ and $y$ are both negative then $\arg z$ lies in the range −πa 2 . (3.61)

These may be derived from the logarithmic form of the inverse (see subsection 3.7.5).

▶ Evaluate $(d/dx)\sinh^{-1}x$ using the logarithmic form of the inverse.

From the results of section 3.7.5,

$$
\begin{aligned}
\frac{d}{dx}\left(\sinh^{-1}x\right) &= \frac{d}{dx}\left[\ln\left(x + \sqrt{x^2 + 1}\right)\right] \\
&= \frac{1}{x + \sqrt{x^2 + 1}}\left(1 + \frac{x}{\sqrt{x^2 + 1}}\right) \\
&= \frac{1}{x + \sqrt{x^2 + 1}}\left(\frac{\sqrt{x^2 + 1} + x}{\sqrt{x^2 + 1}}\right) \\
&= \frac{1}{\sqrt{x^2 + 1}}.
\end{aligned}
$$

◀

### 3.8 Exercises

3.1 Two complex numbers $z$ and $w$ are given by $z = 3 + 4i$ and $w = 2 - i$. On an Argand diagram, plot (a) $z + w$, (b) $w - z$, (c) $wz$, (d) $z/w$, (e) $z^*w + w^*z$, (f) $w^2$, (g) $\ln z$, (h) $(1 + z + w)^{1/2}$.

3.2 By considering the real and imaginary parts of the product $e^{i\theta}e^{i\phi}$ prove the standard formulae for $\cos(\theta + \phi)$ and $\sin(\theta + \phi)$.

3.3 By writing $\pi/12 = (\pi/3) - (\pi/4)$ and considering $e^{i\pi/12}$, evaluate $\cot(\pi/12)$.

3.4 Find the locus in the complex $z$-plane of points that satisfy the following equations.
(a) $z - c = \rho\left(\dfrac{1 + it}{1 - it}\right)$, where $c$ is complex, $\rho$ is real and $t$ is a real parameter that varies in the range −∞0, is a circle of radius $|2\lambda a/(1 - \lambda^2)|$ centred on the point $z = ia[(1 + \lambda^2)/(1 - \lambda^2)]$. Sketch the circles for a few typical values of $\lambda$, including $\lambda < 1$, $\lambda > 1$ and $\lambda = 1$.

3.8 The two sets of points $z = a$, $z = b$, $z = c$, and $z = A$, $z = B$, $z = C$ are the corners of two similar triangles in the Argand diagram. Express in terms of $a, b, \ldots, C$
(a) the equalities of corresponding angles, and
(b) the constant ratio of corresponding sides,
in the two triangles. By noting that any complex quantity can be expressed as

$$
z = |z|\exp(i\arg z),
$$

deduce that

$$
a(B - C) + b(C - A) + c(A - B) = 0.
$$

3.9 For the real constant $a$ find the loci of all points $z = x + iy$ in the complex plane that satisfy
(a) $\operatorname{Re}\left\{\ln\left(\dfrac{z - ia}{z + ia}\right)\right\} = c$, $c > 0$,
(b) $\operatorname{Im}\left\{\ln\left(\dfrac{z - ia}{z + ia}\right)\right\} = k$, $0 \le k \le \pi/2$.
Identify the two families of curves and verify that in case (b) all curves pass through the two points $\pm ia$.

3.10 The most general type of transformation between one Argand diagram, in the $z$-plane, and another, in the $Z$-plane, that gives one and only one value of $Z$ for each value of $z$ (and conversely) is known as the general bilinear transformation and takes the form

$$
z = \frac{aZ + b}{cZ + d}.
$$

(a) Confirm that the transformation from the $Z$-plane to the $z$-plane is also a general bilinear transformation.
(b) Recalling that the equation of a circle can be written in the form

$$
\left|\frac{z - z_1}{z - z_2}\right| = \lambda, \qquad \lambda \neq 1,
$$

show that the general bilinear transformation transforms circles into circles (or straight lines). What is the condition that $z_1$, $z_2$ and $\lambda$ must satisfy if the transformed circle is to be a straight line?
3.11 Sketch the parts of the Argand diagram in which
(a) $\operatorname{Re} z^2 < 0$, $|z^{1/2}| \le 2$;
(b) $0 \le \arg z^* \le \pi/2$;
(c) $|\exp z^3| \to 0$ as $|z| \to \infty$.
What is the area of the region in which all three sets of conditions are satisfied?

3.12 Denote the $n$th roots of unity by $1, \omega_n, \omega_n^2, \ldots, \omega_n^{n-1}$.
(a) Prove that

$$\text{(i)}\ \sum_{r=0}^{n-1}\omega_n^r = 0, \qquad \text{(ii)}\ \prod_{r=0}^{n-1}\omega_n^r = (-1)^{n+1}.$$

(b) Express $x^2 + y^2 + z^2 - yz - zx - xy$ as the product of two factors, each linear in $x$, $y$ and $z$, with coefficients dependent on the third roots of unity (and those of the $x$ terms arbitrarily taken as real).

3.13 Prove that $x^{2m+1} - a^{2m+1}$, where $m$ is an integer $\ge 1$, can be written as

$$x^{2m+1} - a^{2m+1} = (x - a)\prod_{r=1}^{m}\left[x^2 - 2ax\cos\left(\frac{2\pi r}{2m+1}\right) + a^2\right].$$

3.14 The complex position vectors of two parallel interacting equal fluid vortices moving with their axes of rotation always perpendicular to the $z$-plane are $z_1$ and $z_2$. The equations governing their motions are

$$\frac{dz_1^*}{dt} = -\frac{i}{z_1 - z_2}, \qquad \frac{dz_2^*}{dt} = -\frac{i}{z_2 - z_1}.$$

Deduce that (a) $z_1 + z_2$, (b) $|z_1 - z_2|$ and (c) $|z_1|^2 + |z_2|^2$ are all constant in time, and hence describe the motion geometrically.

3.15 Solve the equation

$$z^7 - 4z^6 + 6z^5 - 6z^4 + 6z^3 - 12z^2 + 8z + 4 = 0,$$

(a) by examining the effect of setting $z^3$ equal to 2, and then
(b) by factorising and using the binomial expansion of $(z + a)^4$.
Plot the seven roots of the equation on an Argand plot, exemplifying that complex roots of a polynomial equation always occur in conjugate pairs if the polynomial has real coefficients.

3.16 The polynomial $f(z)$ is defined by

$$f(z) = z^5 - 6z^4 + 15z^3 - 34z^2 + 36z - 48.$$

(a) Show that the equation $f(z) = 0$ has roots of the form $z = \lambda i$, where $\lambda$ is real, and hence factorize $f(z)$.
(b) Show further that the cubic factor of $f(z)$ can be written in the form $(z + a)^3 + b$, where $a$ and $b$ are real, and hence solve the equation $f(z) = 0$ completely.
3.17 The binomial expansion of $(1 + x)^n$, discussed in chapter 1, can be written for a positive integer $n$ as

$$(1 + x)^n = \sum_{r=0}^{n}{}^nC_r\,x^r,$$

where ${}^nC_r = n!/[r!(n - r)!]$.
(a) Use de Moivre's theorem to show that the sum

$$S_1(n) = {}^nC_0 - {}^nC_2 + {}^nC_4 - \cdots + (-1)^m\,{}^nC_{2m}, \qquad n - 1 \le 2m \le n,$$

has the value $2^{n/2}\cos(n\pi/4)$.
(b) Derive a similar result for the sum

$$S_2(n) = {}^nC_1 - {}^nC_3 + {}^nC_5 - \cdots + (-1)^m\,{}^nC_{2m+1}, \qquad n - 1 \le 2m + 1 \le n,$$

and verify it for the cases $n = 6$, 7 and 8.

3.18 By considering $(1 + \exp i\theta)^n$, prove that

$$\sum_{r=0}^{n}{}^nC_r\cos r\theta = 2^n\cos^n(\theta/2)\cos(n\theta/2),$$

$$\sum_{r=0}^{n}{}^nC_r\sin r\theta = 2^n\cos^n(\theta/2)\sin(n\theta/2),$$

where ${}^nC_r = n!/[r!(n - r)!]$.

3.19 Use de Moivre's theorem with $n = 4$ to prove that

$$\cos 4\theta = 8\cos^4\theta - 8\cos^2\theta + 1,$$

and deduce that

$$\cos\frac{\pi}{8} = \left(\frac{2 + \sqrt{2}}{4}\right)^{1/2}.$$

3.20 Express $\sin^4\theta$ entirely in terms of the trigonometric functions of multiple angles and deduce that its average value over a complete cycle is $\frac{3}{8}$.

3.21 Use de Moivre's theorem to prove that

$$\tan 5\theta = \frac{t^5 - 10t^3 + 5t}{5t^4 - 10t^2 + 1},$$

where $t = \tan\theta$. Deduce the values of $\tan(n\pi/10)$ for $n = 1, 2, 3, 4$.

3.22 Prove the following results involving hyperbolic functions.
(a) That

$$\cosh x - \cosh y = 2\sinh\left(\frac{x + y}{2}\right)\sinh\left(\frac{x - y}{2}\right).$$

(b) That, if $y = \sinh^{-1}x$,

$$(x^2 + 1)\frac{d^2y}{dx^2} + x\frac{dy}{dx} = 0.$$

3.23 Determine the conditions under which the equation

$$a\cosh x + b\sinh x = c, \qquad c > 0,$$

has zero, one, or two real solutions for $x$. What is the solution if $a^2 = c^2 + b^2$?

3.24 Use the definitions and properties of hyperbolic functions to do the following:
(a) Solve $\cosh x = \sinh x + 2\operatorname{sech} x$.
(b) Show that the real solution $x$ of $\tanh x = \operatorname{cosech} x$ can be written in the form $x = \ln(u + \sqrt{u})$. Find an explicit value for $u$.
(c) Evaluate $\tanh x$ when $x$ is the real solution of $\cosh 2x = 2\cosh x$.
3.25 Express $\sinh^4 x$ in terms of hyperbolic cosines of multiples of $x$, and hence find the real solutions of

$$2\cosh 4x - 8\cosh 2x + 5 = 0.$$

3.26 In the theory of special relativity, the relationship between the position and time coordinates of an event, as measured in two frames of reference that have parallel $x$-axes, can be expressed in terms of hyperbolic functions. If the coordinates are $x$ and $t$ in one frame and $x'$ and $t'$ in the other, then the relationship takes the form

$$x' = x\cosh\phi - ct\sinh\phi, \qquad ct' = -x\sinh\phi + ct\cosh\phi.$$

Express $x$ and $ct$ in terms of $x'$, $ct'$ and $\phi$ and show that

$$x^2 - (ct)^2 = (x')^2 - (ct')^2.$$

3.27 A closed barrel has as its curved surface the surface obtained by rotating about the $x$-axis the part of the curve $y = a[2 - \cosh(x/a)]$ lying in the range $-b \le x \le b$, where $b$ … 1. (b) The condition is that $\arg[(z - ia)/(z + ia)] = k$. This can be rearranged to give $a(z + z^*) = (a^2 - |z|^2)\tan k$, which becomes in $x, y$ coordinates the equation of a circle with centre $(-a\cot k, 0)$ and radius $a\operatorname{cosec} k$.

3.11 All three conditions are satisfied in $3\pi/2 \le \theta \le 7\pi/4$, $|z| \le 4$; area $= 2\pi$.

3.13 Denoting $\exp[2\pi i/(2m + 1)]$ by $\Omega$, express $x^{2m+1} - a^{2m+1}$ as a product of factors like $(x - a\Omega^r)$ and then combine those containing $\Omega^r$ and $\Omega^{2m+1-r}$. Use the fact that $\Omega^{2m+1} = 1$.

3.15 The roots are $2^{1/3}\exp(2\pi ni/3)$ for $n = 0, 1, 2$; $1 \pm 3^{1/4}$; $1 \pm 3^{1/4}i$.

3.17 Consider $(1 + i)^n$. (b) $S_2(n) = 2^{n/2}\sin(n\pi/4)$. $S_2(6) = -8$, $S_2(7) = -8$, $S_2(8) = 0$.

3.19 Use the binomial expansion of $(\cos\theta + i\sin\theta)^4$.

3.21 Show that $\cos 5\theta = 16c^5 - 20c^3 + 5c$, where $c = \cos\theta$, and correspondingly for $\sin 5\theta$. Use $\cos^{-2}\theta = 1 + \tan^2\theta$. The four required values are $[(5 - \sqrt{20})/5]^{1/2}$, $(5 - \sqrt{20})^{1/2}$, $[(5 + \sqrt{20})/5]^{1/2}$, $(5 + \sqrt{20})^{1/2}$.

3.23 Reality of the root(s) requires $c^2 + b^2 \ge a^2$ and $a + b > 0$. With these conditions, there are two roots if $a^2 > b^2$, but only one if $b^2 > a^2$. For $a^2 = c^2 + b^2$, $x = \frac{1}{2}\ln[(a - b)/(a + b)]$.

3.25 Reduce the equation to $16\sinh^4 x = 1$, yielding $x = \pm 0.481$.
3.27 Show that $ds = \cosh(x/a)\,dx$; curved surface area $= \pi a^2[8\sinh(b/a) - \sinh(2b/a)] - 2\pi ab$; flat ends area $= 2\pi a^2[4 - 4\cosh(b/a) + \cosh^2(b/a)]$.

4 Series and limits

4.1 Series

Many examples exist in the physical sciences of situations where we are presented with a sum of terms to evaluate. For example, we may wish to add the contributions from successive slits in a diffraction grating to find the total light intensity at a particular point behind the grating.

A series may have either a finite or infinite number of terms. In either case, the sum of the first $N$ terms of a series (often called a partial sum) is written

$$S_N = u_1 + u_2 + u_3 + \cdots + u_N,$$

where the terms of the series $u_n$, $n = 1, 2, 3, \ldots, N$, are numbers that may in general be complex. If the terms are complex then $S_N$ will in general be complex also, and we can write $S_N = X_N + iY_N$, where $X_N$ and $Y_N$ are the partial sums of the real and imaginary parts of each term separately and are therefore real. If a series has only $N$ terms then the partial sum $S_N$ is of course the sum of the series.

Sometimes we may encounter series where each term depends on some variable, $x$, say. In this case the partial sum of the series will depend on the value assumed by $x$. For example, consider the infinite series

$$S(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.$$

This is an example of a power series; these are discussed in more detail in section 4.5. It is in fact the Maclaurin expansion of $\exp x$ (see subsection 4.6.3). Therefore $S(x) = \exp x$ and, of course, varies according to the value of the variable $x$. A series might just as easily depend on a complex variable $z$.

A general, random sequence of numbers can be described as a series and a sum of the terms found. However, for cases of practical interest, there will usually be some sort of relationship between successive terms.
For example, if the $n$th term of a series is given by

$$u_n = \frac{1}{2^n},$$

for $n = 1, 2, 3, \ldots, N$ then the sum of the first $N$ terms will be

$$S_N = \sum_{n=1}^{N}u_n = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots + \frac{1}{2^N}. \tag{4.1}$$

It is clear that the sum of a finite number of terms is always finite, provided that each term is itself finite. It is often of practical interest, however, to consider the sum of a series with an infinite number of finite terms. The sum of an infinite number of terms is best defined by first considering the partial sum of the first $N$ terms, $S_N$. If the value of the partial sum $S_N$ tends to a finite limit, $S$, as $N$ tends to infinity, then the series is said to converge and its sum is given by the limit $S$. In other words, the sum of an infinite series is given by

$$S = \lim_{N\to\infty}S_N,$$

provided the limit exists. For complex infinite series, if $S_N$ approaches a limit $S = X + iY$ as $N \to \infty$, this means that $X_N \to X$ and $Y_N \to Y$ separately, i.e. the real and imaginary parts of the series are each convergent series with sums $X$ and $Y$ respectively.

However, not all infinite series have finite sums. As $N \to \infty$, the value of the partial sum $S_N$ may diverge: it may approach $+\infty$ or $-\infty$, or oscillate finitely or infinitely. Moreover, for a series where each term depends on some variable, its convergence can depend on the value assumed by the variable. Whether an infinite series converges, diverges or oscillates has important implications when describing physical systems. Methods for determining whether a series converges are discussed in section 4.3.

4.2 Summation of series

It is often necessary to find the sum of a finite series or a convergent infinite series. We now describe arithmetic, geometric and arithmetico-geometric series, which are particularly common and for which the sums are easily found. Other methods that can sometimes be used to sum more complicated series are discussed below.
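The definition of the sum as the limit of the partial sums can be made concrete for the series (4.1). The following Python sketch is an illustration only; the helper name `partial_sums` is hypothetical and exact rational arithmetic is used so that no rounding obscures the pattern.

```python
from fractions import Fraction

def partial_sums(term, N):
    """Return the list [S_1, ..., S_N] for the series whose nth term is term(n)."""
    sums, total = [], Fraction(0)
    for n in range(1, N + 1):
        total += term(n)
        sums.append(total)
    return sums

# Partial sums of (4.1), u_n = 1/2^n.
halves = partial_sums(lambda n: Fraction(1, 2**n), 20)

# Print S_N and the gap 1 - S_N, which halves at every step.
for N in (1, 2, 5, 10, 20):
    print(N, float(halves[N - 1]), float(1 - halves[N - 1]))
```

The printed gap $1 - S_N$ equals $1/2^N$ exactly, which is consistent with the partial sums tending to the limit $S = 1$.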
4.2.1 Arithmetic series

An arithmetic series has the characteristic that the difference between successive terms is constant. The sum of a general arithmetic series is written

$$S_N = a + (a + d) + (a + 2d) + \cdots + [a + (N - 1)d] = \sum_{n=0}^{N-1}(a + nd).$$

Rewriting the series in the opposite order and adding this term by term to the original expression for $S_N$, we find

$$S_N = \frac{N}{2}[a + a + (N - 1)d] = \frac{N}{2}(\text{first term} + \text{last term}). \tag{4.2}$$

If an infinite number of such terms are added the series will increase (or decrease) indefinitely; that is to say, it diverges.

▶ Sum the integers between 1 and 1000 inclusive.

This is an arithmetic series with $a = 1$, $d = 1$ and $N = 1000$. Therefore, using (4.2) we find

$$S_N = \frac{1000}{2}(1 + 1000) = 500500,$$

which can be checked directly only with considerable effort. ◀

4.2.2 Geometric series

Equation (4.1) is a particular example of a geometric series, which has the characteristic that the ratio of successive terms is a constant (one-half in this case). The sum of a geometric series is in general written

$$S_N = a + ar + ar^2 + \cdots + ar^{N-1} = \sum_{n=0}^{N-1}ar^n,$$

where $a$ is a constant and $r$ is the ratio of successive terms, the common ratio. The sum may be evaluated by considering $S_N$ and $rS_N$:

$$S_N = a + ar + ar^2 + ar^3 + \cdots + ar^{N-1},$$

$$rS_N = ar + ar^2 + ar^3 + ar^4 + \cdots + ar^N.$$

If we now subtract the second equation from the first we obtain

$$(1 - r)S_N = a - ar^N,$$

and hence

$$S_N = \frac{a(1 - r^N)}{1 - r}. \tag{4.3}$$

For a series with an infinite number of terms and $|r| < 1$, we have $\lim_{N\to\infty}r^N = 0$, and the sum tends to the limit

$$S = \frac{a}{1 - r}. \tag{4.4}$$

In (4.1), $r = \frac{1}{2}$, $a = \frac{1}{2}$, and so $S = 1$. For $|r| \ge 1$, however, the series either diverges or oscillates.

▶ Consider a ball that drops from a height of 27 m and on each bounce retains only a third of its kinetic energy; thus after one bounce it will return to a height of 9 m, after two bounces to 3 m, and so on.
Find the total distance travelled between the first bounce and the $M$th bounce.

The total distance travelled between the first bounce and the $M$th bounce is given by the sum of $M - 1$ terms:

$$S_{M-1} = 2(9 + 3 + 1 + \cdots) = 2\sum_{m=0}^{M-2}\frac{9}{3^m}$$

for $M > 1$, where the factor 2 is included to allow for both the upward and the downward journey. Inside the parentheses we clearly have a geometric series with first term 9 and common ratio 1/3 and hence the distance is given by (4.3), i.e.

$$S_{M-1} = 2 \times \frac{9\left[1 - \left(\frac{1}{3}\right)^{M-1}\right]}{1 - \frac{1}{3}} = 27\left[1 - \left(\tfrac{1}{3}\right)^{M-1}\right],$$

where the number of terms $N$ in (4.3) has been replaced by $M - 1$. ◀

4.2.3 Arithmetico-geometric series

An arithmetico-geometric series, as its name suggests, is a combined arithmetic and geometric series. It has the general form

$$S_N = a + (a + d)r + (a + 2d)r^2 + \cdots + [a + (N - 1)d]r^{N-1} = \sum_{n=0}^{N-1}(a + nd)r^n,$$

and can be summed, in a similar way to a pure geometric series, by multiplying by $r$ and subtracting the result from the original series to obtain

$$(1 - r)S_N = a + rd + r^2d + \cdots + r^{N-1}d - [a + (N - 1)d]r^N.$$

Using the expression for the sum of a geometric series (4.3) and rearranging, we find

$$S_N = \frac{a - [a + (N - 1)d]r^N}{1 - r} + \frac{rd(1 - r^{N-1})}{(1 - r)^2}.$$

For an infinite series with $|r| < 1$, $\lim_{N\to\infty}r^N = 0$ as in the previous subsection, and the sum tends to the limit

$$S = \frac{a}{1 - r} + \frac{rd}{(1 - r)^2}. \tag{4.5}$$

As for a geometric series, if $|r| \ge 1$ then the series either diverges or oscillates.

▶ Sum the series

$$S = 2 + \frac{5}{2} + \frac{8}{2^2} + \frac{11}{2^3} + \cdots.$$

This is an infinite arithmetico-geometric series with $a = 2$, $d = 3$ and $r = 1/2$. Therefore, from (4.5), we obtain $S = 10$. ◀

4.2.4 The difference method

The difference method is sometimes useful in summing series that are more complicated than the examples discussed above.
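Before setting out the method, the closed forms obtained in subsections 4.2.1 to 4.2.3 can be spot-checked by brute force. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, exact rational arithmetic is used, and the geometric and arithmetico-geometric formulae are only evaluated for $r \ne 1$.

```python
from fractions import Fraction

def arithmetic_sum(a, d, N):
    # Equation (4.2): N/2 * (first term + last term).
    return Fraction(N, 2) * (a + a + (N - 1) * d)

def geometric_sum(a, r, N):
    # Equation (4.3); requires r != 1.
    return a * (1 - r**N) / (1 - r)

def arith_geo_sum(a, d, r, N):
    # Finite-N arithmetico-geometric sum of subsection 4.2.3; requires r != 1.
    return ((a - (a + (N - 1) * d) * r**N) / (1 - r)
            + r * d * (1 - r**(N - 1)) / (1 - r)**2)

def brute_force(a, d, r, N):
    # Direct term-by-term sum of (a + n d) r^n for n = 0, ..., N-1.
    return sum((a + n * d) * r**n for n in range(N))

# Integers 1..1000 (worked example of 4.2.1).
print(arithmetic_sum(1, 1, 1000))

# Bouncing ball with M = 4: distance 2 * (9 + 3 + 1).
print(2 * geometric_sum(Fraction(9), Fraction(1, 3), 3))

# Worked example of 4.2.3: a = 2, d = 3, r = 1/2; large N approximates (4.5).
print(float(arith_geo_sum(Fraction(2), 3, Fraction(1, 2), 80)))
```

Setting `d = 0` in `brute_force` gives a pure geometric series and setting `r = 1` gives a pure arithmetic one, so the same helper can be used to check all three closed forms.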
Let us consider the general series

$$\sum_{n=1}^{N}u_n = u_1 + u_2 + \cdots + u_N.$$

If the terms of the series, $u_n$, can be expressed in the form

$$u_n = f(n) - f(n - 1)$$

for some function $f(n)$ then its (partial) sum is given by

$$S_N = \sum_{n=1}^{N}u_n = f(N) - f(0).$$

This can be shown as follows. The sum is given by

$$S_N = u_1 + u_2 + \cdots + u_N$$

and since $u_n = f(n) - f(n - 1)$, it may be rewritten

$$S_N = [f(1) - f(0)] + [f(2) - f(1)] + \cdots + [f(N) - f(N - 1)].$$

By cancelling terms we see that

$$S_N = f(N) - f(0).$$

▶ Evaluate the sum

$$\sum_{n=1}^{N}\frac{1}{n(n + 1)}.$$

Using partial fractions we find

$$u_n = -\left(\frac{1}{n + 1} - \frac{1}{n}\right).$$

Hence $u_n = f(n) - f(n - 1)$ with $f(n) = -1/(n + 1)$, and so the sum is given by

$$S_N = f(N) - f(0) = -\frac{1}{N + 1} + 1 = \frac{N}{N + 1}.$$ ◀

The difference method may be easily extended to evaluate sums in which each term can be expressed in the form

$$u_n = f(n) - f(n - m), \tag{4.6}$$

where $m$ is an integer. By writing out the sum to $N$ terms with each term expressed in this form, and cancelling terms in pairs as before, we find

$$S_N = \sum_{k=1}^{m}f(N - k + 1) - \sum_{k=1}^{m}f(1 - k).$$

▶ Evaluate the sum

$$\sum_{n=1}^{N}\frac{1}{n(n + 2)}.$$

Using partial fractions we find

$$u_n = -\left[\frac{1}{2(n + 2)} - \frac{1}{2n}\right].$$

Hence $u_n = f(n) - f(n - 2)$ with $f(n) = -1/[2(n + 2)]$, and so the sum is given by

$$S_N = f(N) + f(N - 1) - f(0) - f(-1) = \frac{3}{4} - \frac{1}{2}\left(\frac{1}{N + 2} + \frac{1}{N + 1}\right).$$ ◀

In fact the difference method is quite flexible and may be used to evaluate sums even when each term cannot be expressed as in (4.6). The method still relies, however, on being able to write $u_n$ in terms of a single function such that most terms in the sum cancel, leaving only a few terms at the beginning and the end. This is best illustrated by an example.

▶ Evaluate the sum

$$\sum_{n=1}^{N}\frac{1}{n(n + 1)(n + 2)}.$$

Using partial fractions we find

$$u_n = \frac{1}{2(n + 2)} - \frac{1}{n + 1} + \frac{1}{2n}.$$
Hence $u_n = f(n) - 2f(n - 1) + f(n - 2)$ with $f(n) = 1/[2(n + 2)]$. If we write out the sum, expressing each term $u_n$ in this form, we find that most terms cancel and the sum is given by

$$S_N = f(N) - f(N - 1) - f(0) + f(-1) = \frac{1}{4} + \frac{1}{2}\left(\frac{1}{N + 2} - \frac{1}{N + 1}\right).$$ ◀

4.2.5 Series involving natural numbers

Series consisting of the natural numbers 1, 2, 3, ..., or the square or cube of these numbers, occur frequently and deserve a special mention. Let us first consider the sum of the first $N$ natural numbers,

$$S_N = 1 + 2 + 3 + \cdots + N = \sum_{n=1}^{N}n.$$

This is clearly an arithmetic series with first term $a = 1$ and common difference $d = 1$. Therefore, from (4.2), $S_N = \frac{1}{2}N(N + 1)$.

Next, we consider the sum of the squares of the first $N$ natural numbers:

$$S_N = 1^2 + 2^2 + 3^2 + \cdots + N^2 = \sum_{n=1}^{N}n^2,$$

which may be evaluated using the difference method. The $n$th term in the series is $u_n = n^2$, which we need to express in the form $f(n) - f(n - 1)$ for some function $f(n)$. Consider the function

$$f(n) = n(n + 1)(2n + 1) \quad\Rightarrow\quad f(n - 1) = (n - 1)n(2n - 1).$$

For this function $f(n) - f(n - 1) = 6n^2$, and so we can write

$$u_n = \tfrac{1}{6}[f(n) - f(n - 1)].$$

Therefore, by the difference method,

$$S_N = \tfrac{1}{6}[f(N) - f(0)] = \tfrac{1}{6}N(N + 1)(2N + 1).$$

Finally, we calculate the sum of the cubes of the first $N$ natural numbers,

$$S_N = 1^3 + 2^3 + 3^3 + \cdots + N^3 = \sum_{n=1}^{N}n^3,$$

again using the difference method. Consider the function

$$f(n) = [n(n + 1)]^2 \quad\Rightarrow\quad f(n - 1) = [(n - 1)n]^2,$$

for which $f(n) - f(n - 1) = 4n^3$. Therefore we can write the general $n$th term of the series as

$$u_n = \tfrac{1}{4}[f(n) - f(n - 1)],$$

and using the difference method we find

$$S_N = \tfrac{1}{4}[f(N) - f(0)] = \tfrac{1}{4}N^2(N + 1)^2.$$

Note that this is the square of the sum of the natural numbers, i.e.

$$\sum_{n=1}^{N}n^3 = \left(\sum_{n=1}^{N}n\right)^2.$$

▶ Sum the series

$$\sum_{n=1}^{N}(n + 1)(n + 3).$$
The $n$th term in this series is

$$u_n = (n + 1)(n + 3) = n^2 + 4n + 3,$$

and therefore we can write

$$\sum_{n=1}^{N}(n + 1)(n + 3) = \sum_{n=1}^{N}(n^2 + 4n + 3) = \sum_{n=1}^{N}n^2 + 4\sum_{n=1}^{N}n + \sum_{n=1}^{N}3$$

$$= \tfrac{1}{6}N(N + 1)(2N + 1) + 4 \times \tfrac{1}{2}N(N + 1) + 3N = \tfrac{1}{6}N(2N^2 + 15N + 31).$$ ◀

4.2.6 Transformation of series

A complicated series may sometimes be summed by transforming it into a familiar series for which we already know the sum, perhaps a geometric series or the Maclaurin expansion of a simple function (see subsection 4.6.3). Various techniques are useful, and deciding which one to use in any given case is a matter of experience. We now discuss a few of the more common methods.

The differentiation or integration of a series is often useful in transforming an apparently intractable series into a more familiar one. If we wish to differentiate or integrate a series that already depends on some variable then we may do so in a straightforward manner.

▶ Sum the series

$$S(x) = \frac{x^4}{3(0!)} + \frac{x^5}{4(1!)} + \frac{x^6}{5(2!)} + \cdots.$$

Dividing both sides by $x$ we obtain

$$\frac{S(x)}{x} = \frac{x^3}{3(0!)} + \frac{x^4}{4(1!)} + \frac{x^5}{5(2!)} + \cdots,$$

which is easily differentiated to give

$$\frac{d}{dx}\left[\frac{S(x)}{x}\right] = \frac{x^2}{0!} + \frac{x^3}{1!} + \frac{x^4}{2!} + \frac{x^5}{3!} + \cdots.$$

Recalling the Maclaurin expansion of $\exp x$ given in subsection 4.6.3, we recognise that the RHS is equal to $x^2\exp x$. Having done so, we can now integrate both sides to obtain

$$S(x)/x = \int x^2\exp x\,dx.$$

Integrating the RHS by parts we find

$$S(x)/x = x^2\exp x - 2x\exp x + 2\exp x + c,$$

where the value of the constant of integration $c$ can be fixed by the requirement that $S(x)/x = 0$ at $x = 0$. Thus we find that $c = -2$ and that the sum is given by

$$S(x) = x^3\exp x - 2x^2\exp x + 2x\exp x - 2x.$$ ◀

Often, however, we require the sum of a series that does not depend on a variable.
In this case, in order that we may differentiate or integrate the series, we define a function of some variable $x$ such that the value of this function is equal to the sum of the series for some particular value of $x$ (usually at $x = 1$).

▶ Sum the series

$$S = 1 + \frac{2}{2} + \frac{3}{2^2} + \frac{4}{2^3} + \cdots.$$

Let us begin by defining the function

$$f(x) = 1 + 2x + 3x^2 + 4x^3 + \cdots,$$

so that the sum $S = f(1/2)$. Integrating this function we obtain

$$\int f(x)\,dx = x + x^2 + x^3 + \cdots,$$

which we recognise as an infinite geometric series with first term $a = x$ and common ratio $r = x$. Therefore, from (4.4), we find that the sum of this series is $x/(1 - x)$. In other words

$$\int f(x)\,dx = \frac{x}{1 - x},$$

so that $f(x)$ is given by

$$f(x) = \frac{d}{dx}\left(\frac{x}{1 - x}\right) = \frac{1}{(1 - x)^2}.$$

The sum of the original series is therefore $S = f(1/2) = 4$. ◀

Aside from differentiation and integration, an appropriate substitution can sometimes transform a series into a more familiar form. In particular, series with terms that contain trigonometric functions can often be summed by the use of complex exponentials.

▶ Sum the series

$$S(\theta) = 1 + \cos\theta + \frac{\cos 2\theta}{2!} + \frac{\cos 3\theta}{3!} + \cdots.$$

Replacing the cosine terms with a complex exponential, we obtain

$$S(\theta) = \operatorname{Re}\left\{1 + \exp i\theta + \frac{\exp 2i\theta}{2!} + \frac{\exp 3i\theta}{3!} + \cdots\right\} = \operatorname{Re}\left\{1 + \exp i\theta + \frac{(\exp i\theta)^2}{2!} + \frac{(\exp i\theta)^3}{3!} + \cdots\right\}.$$

Again using the Maclaurin expansion of $\exp x$ given in subsection 4.6.3, we notice that

$$S(\theta) = \operatorname{Re}[\exp(\exp i\theta)] = \operatorname{Re}[\exp(\cos\theta + i\sin\theta)] = \operatorname{Re}\{[\exp(\cos\theta)][\exp(i\sin\theta)]\} = [\exp(\cos\theta)]\operatorname{Re}[\exp(i\sin\theta)] = [\exp(\cos\theta)][\cos(\sin\theta)].$$ ◀

4.3 Convergence of infinite series

Although the sums of some commonly occurring infinite series may be found, the sum of a general infinite series is usually difficult to calculate. Nevertheless, it is often useful to know whether the partial sum of such a series converges to a limit, even if the limit cannot be found explicitly.
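A numerical look at the partial sums can suggest, though never prove, how a series behaves. The Python sketch below is an illustration only: the helper `partial_sum` is hypothetical, and the two series $\sum 1/n^2$ and $\sum 1/n$ are chosen here as examples of one that settles down and one that does not.

```python
import math

def partial_sum(term, N):
    """Sum term(n) for n = 1, ..., N, using fsum to limit rounding error."""
    return math.fsum(term(n) for n in range(1, N + 1))

# Tabulate S_N for both series at increasing N.
for N in (10, 100, 1000, 10000, 100000):
    s_squares = partial_sum(lambda n: 1 / n**2, N)   # changes less and less
    s_harmonic = partial_sum(lambda n: 1 / n, N)     # keeps growing
    print(N, s_squares, s_harmonic)
```

The first column stabilises (its limit is known to be $\pi^2/6$), whereas the second grows by roughly $\ln 10$ each time $N$ is multiplied by ten. Such a table is only evidence; a slowly diverging series can look stable over any range that is practical to compute, which is why the tests described in this section are needed.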
As mentioned at the end of section 4.1, if we allow $N$ to tend to infinity, the partial sum

$$S_N = \sum_{n=1}^{N}u_n$$

of a series may tend to a definite limit (i.e. the sum $S$ of the series), or increase or decrease without limit, or oscillate finitely or infinitely.

To investigate the convergence of any given series, it is useful to have available a number of tests and theorems of general applicability. We discuss them below; some we will merely state, since once they have been stated they become almost self-evident, but are no less useful for that.

4.3.1 Absolute and conditional convergence

Let us first consider some general points concerning the convergence, or otherwise, of an infinite series. In general an infinite series $\sum u_n$ can have complex terms, and in the special case of a real series the terms can be positive or negative. From any such series, however, we can always construct another series $\sum|u_n|$ in which each term is simply the modulus of the corresponding term in the original series. Then each term in the new series will be a positive real number.

If the series $\sum|u_n|$ converges then $\sum u_n$ also converges, and $\sum u_n$ is said to be absolutely convergent, i.e. the series formed by the absolute values is convergent. For an absolutely convergent series, the terms may be reordered without affecting the convergence of the series. However, if $\sum|u_n|$ diverges whilst $\sum u_n$ converges then $\sum u_n$ is said to be conditionally convergent.

For a conditionally convergent series, rearranging the order of the terms can affect the behaviour of the sum and, hence, whether the series converges or diverges. In fact, a theorem due to Riemann shows that, by a suitable rearrangement, a conditionally convergent series may be made to converge to any arbitrary limit, or to diverge, or to oscillate finitely or infinitely!
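The effect of rearrangement can be seen numerically. The Python sketch below assumes, as an example not taken from the text above, the alternating harmonic series $1 - \frac{1}{2} + \frac{1}{3} - \cdots$, which converges to $\ln 2$ although the series of its moduli diverges. The function names are hypothetical. The rearrangement takes one positive term followed by two negative ones, so that every term of the original series is still used exactly once.

```python
import math

def alternating_harmonic(N):
    """Partial sum of 1 - 1/2 + 1/3 - ... taken in the natural order."""
    return math.fsum((-1) ** (n + 1) / n for n in range(1, N + 1))

def rearranged(blocks):
    """Same terms reordered in blocks: 1/(2k-1) - 1/(4k-2) - 1/(4k)."""
    return math.fsum(
        1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
        for k in range(1, blocks + 1)
    )

print(alternating_harmonic(200000), math.log(2))
print(rearranged(100000), 0.5 * math.log(2))
```

Each block simplifies to $\frac{1}{2}\left[1/(2k - 1) - 1/(2k)\right]$, so the rearranged series converges to $\frac{1}{2}\ln 2$, half the sum obtained in the natural order.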
Of course, if the original series $\sum u_n$ consists only of positive real terms and converges then automatically it is absolutely convergent.

4.3.2 Convergence of a series containing only real positive terms

As discussed above, in order to test for the absolute convergence of a series $\sum u_n$, we first construct the corresponding series $\sum|u_n|$ that consists only of real positive terms. Therefore in this subsection we will restrict our attention to series of this type.

We discuss below some tests that may be used to investigate the convergence of such a series. Before doing so, however, we note the following crucial consideration. In all the tests for, or discussions of, the convergence of a series, it is not what happens in the first ten, or the first thousand, or the first million terms (or any other finite number of terms) that matters, but what happens ultimately.

Preliminary test

A necessary but not sufficient condition for a series of real positive terms $\sum u_n$ to be convergent is that the term $u_n$ tends to zero as $n$ tends to infinity, i.e. we require

$$\lim_{n\to\infty}u_n = 0.$$

If this condition is not satisfied then the series must diverge. Even if it is satisfied, however, the series may still diverge, and further testing is required.

Comparison test

The comparison test is the most basic test for convergence. Let us consider two series $\sum u_n$ and $\sum v_n$ and suppose that we know the latter to be convergent (by some earlier analysis, for example). Then, if each term $u_n$ in the first series is less than or equal to the corresponding term $v_n$ in the second series, for all $n$ greater than some fixed number $N$ that will vary from series to series, then the original series $\sum u_n$ is also convergent. In other words, if $\sum v_n$ is convergent and

$$u_n \le v_n \quad\text{for } n > N,$$

then $\sum u_n$ converges.
However, if $\sum v_n$ diverges and $u_n \ge v_n$ for all $n$ greater than some fixed number then $\sum u_n$ diverges.

▶ Determine whether the following series converges:

$$\sum_{n=1}^{\infty}\frac{1}{n! + 1} = \frac{1}{2} + \frac{1}{3} + \frac{1}{7} + \frac{1}{25} + \cdots. \tag{4.7}$$

Let us compare this series with the series

$$\sum_{n=0}^{\infty}\frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = 2 + \frac{1}{2!} + \frac{1}{3!} + \cdots, \tag{4.8}$$

which is merely the series obtained by setting $x = 1$ in the Maclaurin expansion of $\exp x$ (see subsection 4.6.3), i.e.

$$\exp(1) = e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots.$$

Clearly this second series is convergent, since it consists of only positive terms and has a finite sum. Thus, since each term $u_n$ in the series (4.7) is less than the corresponding term $1/n!$ in (4.8), we conclude from the comparison test that (4.7) is also convergent. ◀

D'Alembert's ratio test

The ratio test determines whether a series converges by comparing the relative magnitude of successive terms. If we consider a series $\sum u_n$ and set

$$\rho = \lim_{n\to\infty}\left(\frac{u_{n+1}}{u_n}\right), \tag{4.9}$$

then if $\rho < 1$ the series is convergent; if $\rho > 1$ the series is divergent; if $\rho = 1$ then the behaviour of the series is undetermined by this test.

To prove this we observe that if the limit (4.9) is less than unity, i.e. $\rho < 1$, then we can find a value $r$ in the range $\rho < r < 1$ and a value $N$ such that

$$\frac{u_{n+1}}{u_n} < r$$

for all $n > N$. Now the terms $u_n$ of the series that follow $u_N$ are

$$u_{N+1},\ u_{N+2},\ u_{N+3},\ \ldots,$$

and each of these is less than the corresponding term of

$$ru_N,\ r^2u_N,\ r^3u_N,\ \ldots. \tag{4.10}$$

However, the terms of (4.10) are those of a geometric series with a common ratio $r$ that is less than unity. This geometric series consequently converges and therefore, by the comparison test discussed above, so must the original series $\sum u_n$. An analogous argument may be used to prove the divergent case when $\rho > 1$.

▶ Determine whether the following series converges:

$$\sum_{n=0}^{\infty}\frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = 2 + \frac{1}{2!} + \frac{1}{3!} + \cdots.$$

As mentioned in the previous example, this series may be obtained by setting $x = 1$ in the Maclaurin expansion of $\exp x$, and hence we know already that it converges and has the sum $\exp(1) = e$. Nevertheless, we may use the ratio test to confirm that it converges. Using (4.9), we have

$$\rho = \lim_{n\to\infty}\left[\frac{n!}{(n + 1)!}\right] = \lim_{n\to\infty}\left(\frac{1}{n + 1}\right) = 0 \tag{4.11}$$

and since $\rho < 1$, the series converges, as expected. ◀

Ratio comparison test

As its name suggests, the ratio comparison test is a combination of the ratio and comparison tests. Let us consider the two series $\sum u_n$ and $\sum v_n$ and assume that we know the latter to be convergent. It may be shown that if

$$\frac{u_{n+1}}{u_n} \le \frac{v_{n+1}}{v_n}$$

for all $n$ greater than some fixed value $N$ then $\sum u_n$ is also convergent. Similarly, if

$$\frac{u_{n+1}}{u_n} \ge \frac{v_{n+1}}{v_n}$$

for all sufficiently large $n$, and $\sum v_n$ diverges then $\sum u_n$ also diverges.

▶ Determine whether the following series converges:

$$\sum_{n=1}^{\infty}\frac{1}{(n!)^2} = 1 + \frac{1}{2^2} + \frac{1}{6^2} + \cdots.$$

In this case the ratio of successive terms, as $n$ tends to infinity, is given by

$$R = \lim_{n\to\infty}\left[\frac{n!}{(n + 1)!}\right]^2 = \lim_{n\to\infty}\left(\frac{1}{n + 1}\right)^2,$$

which is less than the ratio seen in (4.11). Hence, by the ratio comparison test, the series converges. (It is clear that this series could also be found to be convergent using the ratio test.) ◀

Quotient test

The quotient test may also be considered as a combination of the ratio and comparison tests. Let us again consider the two series $\sum u_n$ and $\sum v_n$, and define $\rho$ as the limit

$$\rho = \lim_{n\to\infty}\left(\frac{u_n}{v_n}\right).$$
(4.12)

Then, it can be shown that:
(i) if $\rho \ne 0$ but is finite then $\sum u_n$ and $\sum v_n$ either both converge or both diverge;
(ii) if $\rho = 0$ and $\sum v_n$ converges then $\sum u_n$ converges;
(iii) if $\rho = \infty$ and $\sum v_n$ diverges then $\sum u_n$ diverges.

▶ Given that the series $\sum_{n=1}^{\infty}1/n$ diverges, determine whether the following series converges:

$$\sum_{n=1}^{\infty}\frac{4n^2 - n - 3}{n^3 + 2n}. \tag{4.13}$$

If we set $u_n = (4n^2 - n - 3)/(n^3 + 2n)$ and $v_n = 1/n$ then the limit (4.12) becomes

$$\rho = \lim_{n\to\infty}\left[\frac{(4n^2 - n - 3)/(n^3 + 2n)}{1/n}\right] = \lim_{n\to\infty}\left[\frac{4n^3 - n^2 - 3n}{n^3 + 2n}\right] = 4.$$

Since $\rho$ is finite but non-zero and $\sum v_n$ diverges, from (i) above $\sum u_n$ must also diverge. ◀

Integral test

The integral test is an extremely powerful means of investigating the convergence of a series $\sum u_n$. Suppose that there exists a function $f(x)$ which monotonically decreases for $x$ greater than some fixed value $x_0$ and for which $f(n) = u_n$, i.e. the value of the function at integer values of $x$ is equal to the corresponding term in the series under investigation. Then it can be shown that, if the limit of the integral

$$\lim_{N\to\infty}\int^{N}f(x)\,dx$$

exists, the series $\sum u_n$ is convergent. Otherwise the series diverges. Note that the integral defined here has no lower limit; the test is sometimes stated with a lower limit, equal to unity, for the integral, but this can lead to unnecessary difficulties.

▶ Determine whether the following series converges:

$$\sum_{n=1}^{\infty}\frac{1}{(n - 3/2)^2} = 4 + 4 + \frac{4}{9} + \frac{4}{25} + \cdots.$$

Let us consider the function $f(x) = (x - 3/2)^{-2}$. Clearly $f(n) = u_n$ and $f(x)$ monotonically decreases for $x > 3/2$. Applying the integral test, we consider

$$\lim_{N\to\infty}\int^{N}\frac{1}{(x - 3/2)^2}\,dx = \lim_{N\to\infty}\left(\frac{-1}{N - 3/2}\right) = 0.$$
Since the limit exists the series converges. Note, however, that if we had included a lower limit, equal to unity, in the integral then we would have run into problems, since the integrand diverges at $x = 3/2$. ◀

The integral test is also useful for examining the convergence of the Riemann zeta series. This is a special series that occurs regularly and is of the form

$$\sum_{n=1}^{\infty}\frac{1}{n^p}.$$

It converges for $p > 1$ and diverges if $p \le 1$. These convergence criteria may be derived as follows.

Using the integral test, we consider

$$\lim_{N\to\infty}\int^{N}\frac{1}{x^p}\,dx = \lim_{N\to\infty}\left(\frac{N^{1-p}}{1 - p}\right),$$

and it is obvious that the limit tends to zero for $p > 1$ and to $\infty$ for $p < 1$. For $p = 1$ the integral is instead $\ln N$, which also tends to $\infty$.

Cauchy's root test

Cauchy's root test may be useful in testing for convergence, especially if the $n$th term of the series contains an $n$th power. If we define the limit

$$\rho = \lim_{n\to\infty}(u_n)^{1/n},$$

then it may be proved that the series $\sum u_n$ converges if $\rho < 1$. If $\rho > 1$ then the series diverges. Its behaviour is undetermined if $\rho = 1$.

▶ Determine whether the following series converges:

$$\sum_{n=1}^{\infty}\left(\frac{1}{n}\right)^n = 1 + \frac{1}{4} + \frac{1}{27} + \cdots.$$

Using Cauchy's root test, we find

$$\rho = \lim_{n\to\infty}\left(\frac{1}{n}\right) = 0,$$

and hence the series converges. ◀

Grouping terms

We now consider the Riemann zeta series, mentioned above, with an alternative proof of its convergence that uses the method of grouping terms. In general there are better ways of determining convergence, but the grouping method may be used if it is not immediately obvious how to approach a problem by a better method.

First consider the case where $p > 1$, and group the terms in the series as follows:

$$S_N = \frac{1}{1^p} + \left(\frac{1}{2^p} + \frac{1}{3^p}\right) + \left(\frac{1}{4^p} + \cdots + \frac{1}{7^p}\right) + \cdots.$$
Now we can see that each bracket of this series is no greater than the corresponding term of the geometric series

$$S_N = \frac{1}{1^p} + \frac{2}{2^p} + \frac{4}{4^p} + \cdots.$$

This geometric series has common ratio $r = \left(\tfrac{1}{2}\right)^{p-1}$; since $p > 1$, it follows that $r < 1$ and that the geometric series converges. Then the comparison test shows that the Riemann zeta series also converges for $p > 1$.

The divergence of the Riemann zeta series for $p \le 1$ can be seen by first considering the case $p = 1$. The series is

$$S_N = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots,$$

which does not converge, as may be seen by bracketing the terms of the series in groups in the following way:

$$S_N = \sum_{n=1}^{N} u_n = 1 + \left(\frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \cdots.$$

The sum of the terms in each bracket is $\ge \tfrac{1}{2}$ and, since as many such groupings can be made as we wish, it is clear that $S_N$ increases indefinitely as $N$ is increased.

Now returning to the case of the Riemann zeta series for $p < 1$, we note that each term in the series is greater than the corresponding one in the series for which $p = 1$. In other words $1/n^p > 1/n$ for $n > 1$, $p < 1$. The comparison test then shows us that the Riemann zeta series will diverge for all $p \le 1$.

4.3.3 Alternating series test

The tests discussed in the last subsection have been concerned with determining whether the series of real positive terms $\sum |u_n|$ converges, and so whether $\sum u_n$ is absolutely convergent. Nevertheless, it is sometimes useful to consider whether a series is merely convergent rather than absolutely convergent. This is especially true for series containing an infinite number of both positive and negative terms. In particular, we will consider the convergence of series in which the positive and negative terms alternate, i.e. an alternating series.
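The distinction between absolute and mere convergence can be illustrated numerically. The following is a minimal sketch, not part of the original text: it compares partial sums of the $p = 1$ zeta series with those of the same series with alternating signs, whose sum is quoted as $\ln 2$ in exercise 4.17. The function names are hypothetical, and the lower bound $1 + k/2$ for the first $2^k$ terms is simply the grouping argument above restated.

```python
import math

def harmonic_partial_sum(n_terms):
    """Partial sum 1 + 1/2 + ... + 1/n_terms of the p = 1 zeta series."""
    return sum(1.0 / n for n in range(1, n_terms + 1))

def alternating_partial_sum(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... of the alternating version."""
    return sum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))

for k in (4, 8, 12, 16):
    n_terms = 2 ** k
    h = harmonic_partial_sum(n_terms)
    a = alternating_partial_sum(n_terms)
    # Grouping argument: the first 2**k terms sum to at least 1 + k/2.
    print(f"N = 2^{k:<2d} harmonic = {h:8.4f} (bound {1 + k / 2:4.1f}) "
          f"alternating = {a:.6f} ln 2 = {math.log(2):.6f}")
```

The harmonic sums keep growing past any bound, whereas the alternating sums settle near $\ln 2$: the alternating series is convergent but not absolutely convergent.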
An alternating series can be written as

$$\sum_{n=1}^{\infty}(-1)^{n+1}u_n = u_1 - u_2 + u_3 - u_4 + u_5 - \cdots,$$

with all $u_n \ge 0$. Such a series can be shown to converge provided (i) $u_n \to 0$ as $n \to \infty$ and (ii) $u_n < u_{n-1}$ for all $n > N$ for some finite $N$. If these conditions are not met then the series oscillates.

To prove this, suppose for definiteness that $N$ is odd and consider the series starting at $u_N$. The sum of its first $2m$ terms is

$$S_{2m} = (u_N - u_{N+1}) + (u_{N+2} - u_{N+3}) + \cdots + (u_{N+2m-2} - u_{N+2m-1}).$$

By condition (ii) above, all the parentheses are positive, and so $S_{2m}$ increases as $m$ increases. We can also write, however,

$$S_{2m} = u_N - (u_{N+1} - u_{N+2}) - \cdots - (u_{N+2m-3} - u_{N+2m-2}) - u_{N+2m-1},$$

and since each parenthesis is positive, we must have $S_{2m} < u_N$.

__1, is not greater than $p/(p-1)$.

4.17 Demonstrate that rearranging the order of its terms can make a conditionally convergent series converge to a different limit by considering the series $\sum (-1)^{n+1} n^{-1} = \ln 2 = 0.693$. Rearrange the series as

$$S = \frac{1}{1} + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \frac{1}{13} + \cdots$$

and group each set of three successive terms. Show that the series can then be written

$$\sum_{m=1}^{\infty}\frac{8m-3}{2m(4m-3)(4m-1)},$$

which is convergent (by comparison with $\sum n^{-2}$) and contains only positive terms. Evaluate the first of these and hence deduce that $S$ is not equal to $\ln 2$.

4.18 Illustrate result (iv) of section 4.4, concerning Cauchy products, by considering the double summation

$$S = \sum_{n=1}^{\infty}\sum_{r=1}^{n}\frac{1}{r^2(n+1-r)^3}.$$

By examining the points in the $nr$-plane over which the double summation is to be carried out, show that $S$ can be written as

$$S = \sum_{n=r}^{\infty}\sum_{r=1}^{\infty}\frac{1}{r^2(n+1-r)^3}.$$

Deduce that $S \le 3$.

4.19 A Fabry–Pérot interferometer consists of two parallel heavily silvered glass plates; light enters normally to the plates, and undergoes repeated reflections between them, with a small transmitted fraction emerging at each reflection. Find the intensity of the emerging wave, $|B|^2$, where

$$B = A(1-r)\sum_{n=0}^{\infty} r^n e^{in\phi},$$

with $r$ and $\phi$ real.

4.20 Identify the series

$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}x^{2n}}{(2n-1)!},$$

and then, by integration and differentiation, deduce the values $S$ of the following series:

(a) $\displaystyle\sum_{n=1}^{\infty}\frac{(-1)^{n+1}n^2}{(2n)!}$, (b) $\displaystyle\sum_{n=1}^{\infty}\frac{(-1)^{n+1}n}{(2n+1)!}$,
(c) $\displaystyle\sum_{n=1}^{\infty}\frac{(-1)^{n+1}n\pi^{2n}}{4^n(2n-1)!}$, (d) $\displaystyle\sum_{n=0}^{\infty}\frac{(-1)^{n}(n+1)}{(2n)!}$.

4.21 Starting from the Maclaurin series for $\cos x$, show that

$$(\cos x)^{-2} = 1 + x^2 + \frac{2x^4}{3} + \cdots.$$

Deduce the first three terms in the Maclaurin series for $\tan x$.
4.22 Find the Maclaurin series for:
(a) $\ln\left(\dfrac{1+x}{1-x}\right)$, (b) $(x^2+4)^{-1}$, (c) $\sin^2 x$.

4.23 Writing the $n$th derivative of $f(x) = \sinh^{-1}x$ as

$$f^{(n)}(x) = \frac{P_n(x)}{(1+x^2)^{n-1/2}},$$

where $P_n(x)$ is a polynomial (of order $n-1$), show that the $P_n(x)$ satisfy the recurrence relation

$$P_{n+1}(x) = (1+x^2)P_n'(x) - (2n-1)xP_n(x).$$

Hence generate the coefficients necessary to express $\sinh^{-1}x$ as a Maclaurin series up to terms in $x^5$.

4.24 Find the first three non-zero terms in the Maclaurin series for the following functions:
(a) $(x^2+9)^{-1/2}$, (b) $\ln[(2+x)^3]$, (c) $\exp(\sin x)$,
(d) $\ln(\cos x)$, (e) $\exp[-(x-a)^{-2}]$, (f) $\tan^{-1}x$.

4.25 By using the logarithmic series, prove that if $a$ and $b$ are positive and nearly equal then

$$\ln\frac{a}{b} \simeq \frac{2(a-b)}{a+b}.$$

Show that the error in this approximation is about $2(a-b)^3/[3(a+b)^3]$.

4.26 Determine whether the following functions $f(x)$ are (i) continuous, and (ii) differentiable at $x = 0$:
(a) $f(x) = \exp(-|x|)$;
(b) $f(x) = (1-\cos x)/x^2$ for $x \neq 0$, $f(0) = \tfrac{1}{2}$;
(c) $f(x) = x\sin(1/x)$ for $x \neq 0$, $f(0) = 0$;
(d) $f(x) = [4-x^2]$, where $[y]$ denotes the integer part of $y$.

4.27 Find the limit as $x \to 0$ of $[\sqrt{1+x^m} - \sqrt{1-x^m}]/x^n$, in which $m$ and $n$ are positive integers.

4.28 Evaluate the following limits:
(a) $\displaystyle\lim_{x\to 0}\frac{\sin 3x}{\sinh x}$, (b) $\displaystyle\lim_{x\to 0}\frac{\tan x - \tanh x}{\sinh x - x}$,
(c) $\displaystyle\lim_{x\to 0}\frac{\tan x - x}{\cos x - 1}$, (d) $\displaystyle\lim_{x\to 0}\left(\frac{\operatorname{cosec} x}{x^3} - \frac{\sinh x}{x^5}\right)$.

4.29 Find the limits of the following functions:
(a) $\dfrac{x^3+x^2-5x-2}{2x^3-7x^2+4x+4}$, as $x \to 0$, $x \to \infty$ and $x \to 2$;
(b) $\dfrac{\sin x - x\cosh x}{\sinh x - x}$, as $x \to 0$;
(c) $\displaystyle\int_x^{\pi/2}\left(\frac{y\cos y - \sin y}{y^2}\right)dy$, as $x \to 0$.

4.30 Use Taylor expansions to three terms to find approximations to (a) $\sqrt[4]{17}$, and (b) $\sqrt[3]{26}$.

4.31 Using a first-order Taylor expansion about $x = x_0$, show that a better approximation than $x_0$ to the solution of the equation

$$f(x) = \sin x + \tan x = 2$$

is given by $x = x_0 + \delta$, where

$$\delta = \frac{2 - f(x_0)}{\cos x_0 + \sec^2 x_0}.$$
(a) Use this procedure twice to find the solution of $f(x) = 2$ to six significant figures, given that it is close to $x = 0.9$.
(b) Use the result in (a) to deduce, to the same degree of accuracy, one solution of the quartic equation

$$y^4 - 4y^3 + 4y^2 + 4y - 4 = 0.$$

4.32 Evaluate

$$\lim_{x\to 0}\left[\frac{1}{x^3}\left(\operatorname{cosec} x - \frac{1}{x} - \frac{x}{6}\right)\right].$$

4.33 In quantum theory, a system of oscillators, each of fundamental frequency $\nu$ and interacting at temperature $T$, has an average energy $\bar{E}$ given by

$$\bar{E} = \frac{\sum_{n=0}^{\infty} nh\nu\, e^{-nx}}{\sum_{n=0}^{\infty} e^{-nx}},$$

where $x = h\nu/kT$, $h$ and $k$ being the Planck and Boltzmann constants, respectively. Prove that both series converge, evaluate their sums, and show that at high temperatures $\bar{E} \approx kT$, whilst at low temperatures $\bar{E} \approx h\nu\exp(-h\nu/kT)$.

4.34 In a very simple model of a crystal, point-like atomic ions are regularly spaced along an infinite one-dimensional row with spacing $R$. Alternate ions carry equal and opposite charges $\pm e$. The potential energy of the $i$th ion in the electric field due to another ion, the $j$th, is

$$\frac{q_i q_j}{4\pi\epsilon_0 r_{ij}},$$

where $q_i$, $q_j$ are the charges on the ions and $r_{ij}$ is the distance between them. Write down a series giving the total contribution $V_i$ of the $i$th ion to the overall potential energy. Show that the series converges, and, if $V_i$ is written as

$$V_i = \frac{\alpha e^2}{4\pi\epsilon_0 R},$$

find a closed-form expression for $\alpha$, the Madelung constant for this (unrealistic) lattice.

4.35 One of the factors contributing to the high relative permittivity of water to static electric fields is the permanent electric dipole moment, $p$, of the water molecule. In an external field $E$ the dipoles tend to line up with the field, but they do not do so completely because of thermal agitation corresponding to the temperature, $T$, of the water.
A classical (non-quantum) calculation using the Boltzmann distribution shows that the average polarisability per molecule, $\alpha$, is given by

$$\alpha = \frac{p}{E}(\coth x - x^{-1}),$$

where $x = pE/(kT)$ and $k$ is the Boltzmann constant. At ordinary temperatures, even with high field strengths ($10^4\ \mathrm{V\,m^{-1}}$ or more), $x \ll 1$. By making suitable series expansions of the hyperbolic functions involved, show that $\alpha = p^2/(3kT)$ to an accuracy of about one part in $15x^{-2}$.

4.36 In quantum theory, a certain method (the Born approximation) gives the (so-called) amplitude $f(\theta)$ for the scattering of a particle of mass $m$ through an angle $\theta$ by a uniform potential well of depth $V_0$ and radius $b$ (i.e. the potential energy of the particle is $-V_0$ within a sphere of radius $b$ and zero elsewhere) as

$$f(\theta) = \frac{2mV_0}{\hbar^2 K^3}(\sin Kb - Kb\cos Kb).$$

Here $\hbar$ is the Planck constant divided by $2\pi$, the energy of the particle is $\hbar^2k^2/(2m)$ and $K$ is $2k\sin(\theta/2)$. Use l'Hôpital's rule to evaluate the amplitude at low energies, i.e. when $k$ and hence $K$ tend to zero, and so determine the low-energy total cross-section.
[Note: the differential cross-section is given by $|f(\theta)|^2$ and the total cross-section by the integral of this over all solid angles, i.e. $2\pi\int_0^{\pi}|f(\theta)|^2\sin\theta\,d\theta$.]

4.9 Hints and answers

4.1 Write as $2\left(\sum_{n=1}^{1000} n - \sum_{n=1}^{499} n\right) = 751500$.

4.3 Divergent for $r \le 1$; convergent for $r \ge 2$.

4.5 (a) $\ln(N+1)$, divergent; (b) $\tfrac{1}{3}[1-(-2)^n]$, oscillates infinitely; (c) Add $\tfrac{1}{3}S_N$ to the $S_N$ series; $\tfrac{3}{16}[1-(-3)^{-N}] + \tfrac{3}{4}N(-3)^{-N-1}$, convergent to $\tfrac{3}{16}$.

4.7 Write the $n$th term as the difference between two consecutive values of a partial-fraction function of $n$. The sum equals $\tfrac{1}{2}(1-N^{-2})$.

4.9 Sum the geometric series with $r$th term $\exp[i(\theta + r\alpha)]$. Its real part is

$$\frac{\cos\theta - \cos[(n+1)\alpha+\theta] - \cos(\theta-\alpha) + \cos(\theta+n\alpha)}{4\sin^2(\alpha/2)},$$

which can be reduced to the given answer.

4.11 (a) $-1 \le x < 1$; (b) all $x$ except $x = (2n\pm 1)\pi/2$; (c) $x < -1$; (d) $x < 0$; (e) always divergent.
Clearly divergent for $x > -1$. For $-X = x < -1$, consider

$$\sum_{k=1}^{\infty}\ \sum_{n=M_{k-1}+1}^{M_k}\frac{1}{(\ln M_k)^X},$$

where $\ln M_k = k$ and note that $M_k - M_{k-1} = e^{-1}(e-1)M_k$; hence show that the series diverges.

4.13 (a) Absolutely convergent, compare with exercise 4.10(b). (b) Oscillates finitely. (c) Absolutely convergent for all $x$. (d) Absolutely convergent; use partial fractions. (e) Oscillates infinitely.

4.15 Divide the series into two series, $n$ odd and $n$ even. For $r = 2$ both are absolutely convergent, by comparison with $\sum n^{-2}$. For $r = 1$ neither series is convergent, by comparison with $\sum n^{-1}$. However, the sum of the two is convergent, by the alternating sign test or by showing that the terms cancel in pairs.

4.17 The first term has value 0.833 and all other terms are positive.

4.19 $|A|^2(1-r)^2/(1+r^2-2r\cos\phi)$.

4.21 Use the binomial expansion and collect terms up to $x^4$. Integrate both sides of the displayed equation. $\tan x = x + x^3/3 + 2x^5/15 + \cdots$.

4.23 For example, $P_5(x) = 24x^4 - 72x^2 + 9$. $\sinh^{-1}x = x - x^3/6 + 3x^5/40 - \cdots$.

4.25 Set $a = D + \delta$ and $b = D - \delta$ and use the expansion for $\ln(1 \pm \delta/D)$.

4.27 The limit is 0 for $m > n$, 1 for $m = n$, and $\infty$ for $m$__ 0; (ii) a maximum if $d^2f/dx^2 < 0$; (iii) a stationary point of inflection if $d^2f/dx^2 = 0$ and changes sign through the point.

We now consider the stationary points of functions of more than one variable; we will see that partial differential analysis is ideally suited to the determination of the position and nature of such points. It is helpful to consider first the case of a function of just two variables but, even in this case, the general situation is more complex than that for a function of one variable, as can be seen from figure 5.2.

This figure shows part of a three-dimensional model of a function $f(x,y)$. At positions $P$ and $B$ there are a peak and a bowl respectively or, more mathematically, a local maximum and a local minimum.
At position $S$ the gradient in any direction is zero but the situation is complicated, since a section parallel to the plane $x = 0$ would show a maximum, but one parallel to the plane $y = 0$ would show a minimum. A point such as $S$ is known as a saddle point. The orientation of the 'saddle' in the $xy$-plane is irrelevant; it is as shown in the figure solely for ease of discussion. For any saddle point the function increases in some directions away from the point but decreases in other directions.

Figure 5.2 Stationary points of a function of two variables. A minimum occurs at $B$, a maximum at $P$ and a saddle point at $S$.

For functions of two variables, such as the one shown, it should be clear that a necessary condition for a stationary point (maximum, minimum or saddle point) to occur is that

$$\frac{\partial f}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} = 0. \tag{5.21}$$

The vanishing of the partial derivatives in directions parallel to the axes is enough to ensure that the partial derivative in any arbitrary direction is also zero. The latter can be considered as the superposition of two contributions, one along each axis; since both contributions are zero, so is the partial derivative in the arbitrary direction. This may be made more precise by considering the total differential

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$

Using (5.21) we see that although the infinitesimal changes $dx$ and $dy$ can be chosen independently the change in the value of the infinitesimal function $df$ is always zero at a stationary point.

We now turn our attention to determining the nature of a stationary point of a function of two variables, i.e. whether it is a maximum, a minimum or a saddle point. By analogy with the one-variable case we see that $\partial^2 f/\partial x^2$ and $\partial^2 f/\partial y^2$ must both be positive for a minimum and both be negative for a maximum. However these are not sufficient conditions since they could also be obeyed at complicated saddle points.
What is important for a minimum (or maximum) is that the second partial derivative must be positive (or negative) in all directions, not just in the $x$- and $y$-directions.

To establish just what constitutes sufficient conditions we first note that, since $f$ is a function of two variables and $\partial f/\partial x = \partial f/\partial y = 0$, a Taylor expansion of the type (5.18) about the stationary point yields

$$f(x,y) - f(x_0,y_0) \approx \frac{1}{2!}\left[(\Delta x)^2 f_{xx} + 2\Delta x\,\Delta y\, f_{xy} + (\Delta y)^2 f_{yy}\right],$$

where $\Delta x = x - x_0$ and $\Delta y = y - y_0$ and where the partial derivatives have been written in more compact notation. Rearranging the contents of the bracket as the weighted sum of two squares, we find

$$f(x,y) - f(x_0,y_0) \approx \frac{1}{2}\left[f_{xx}\left(\Delta x + \frac{f_{xy}\,\Delta y}{f_{xx}}\right)^2 + (\Delta y)^2\left(f_{yy} - \frac{f_{xy}^2}{f_{xx}}\right)\right]. \tag{5.22}$$

For a minimum, we require (5.22) to be positive for all $\Delta x$ and $\Delta y$, and hence $f_{xx} > 0$ and $f_{yy} - (f_{xy}^2/f_{xx}) > 0$. Given the first constraint, the second can be written $f_{xx}f_{yy} > f_{xy}^2$. Similarly for a maximum we require (5.22) to be negative, and hence $f_{xx} < 0$ and $f_{xx}f_{yy} > f_{xy}^2$. For minima and maxima, symmetry requires that $f_{yy}$ obeys the same criteria as $f_{xx}$. When (5.22) is negative (or zero) for some values of $\Delta x$ and $\Delta y$ but positive (or zero) for others, we have a saddle point. In this case $f_{xx}f_{yy} < f_{xy}^2$. In summary, all stationary points have $f_x = f_y = 0$ and they may be classified further as

(i) minima if both $f_{xx}$ and $f_{yy}$ are positive and $f_{xy}^2 < f_{xx}f_{yy}$,
(ii) maxima if both $f_{xx}$ and $f_{yy}$ are negative and $f_{xy}^2 < f_{xx}f_{yy}$,
(iii) saddle points if $f_{xx}$ and $f_{yy}$ have opposite signs or $f_{xy}^2 > f_{xx}f_{yy}$.

Note, however, that if $f_{xy}^2 = f_{xx}f_{yy}$ then $f(x,y) - f(x_0,y_0)$ can be written in one of the four forms

$$\pm\frac{1}{2}\left(\Delta x\,|f_{xx}|^{1/2} \pm \Delta y\,|f_{yy}|^{1/2}\right)^2.$$

For some choice of the ratio $\Delta y/\Delta x$ this expression has zero value, showing that, for a displacement from the stationary point in this particular direction, $f(x_0+\Delta x, y_0+\Delta y)$ does not differ from $f(x_0,y_0)$ to second order in $\Delta x$ and $\Delta y$; in such situations further investigation is required. In particular, if $f_{xx}$, $f_{yy}$ and $f_{xy}$ are all zero then the Taylor expansion has to be taken to a higher order.
As examples, such extended investigations would show that the function $f(x,y) = x^4 + y^4$ has a minimum at the origin but that $g(x,y) = x^4 + y^3$ has a saddle point there.

▶ Show that the function $f(x,y) = x^3\exp(-x^2-y^2)$ has a maximum at the point $(\sqrt{3/2}, 0)$, a minimum at $(-\sqrt{3/2}, 0)$ and a stationary point at the origin whose nature cannot be determined by the above procedures.

Setting the first two partial derivatives to zero to locate the stationary points, we find

$$\frac{\partial f}{\partial x} = (3x^2 - 2x^4)\exp(-x^2-y^2) = 0, \tag{5.23}$$

$$\frac{\partial f}{\partial y} = -2yx^3\exp(-x^2-y^2) = 0. \tag{5.24}$$

For (5.24) to be satisfied we require $x = 0$ or $y = 0$ and for (5.23) to be satisfied we require $x = 0$ or $x = \pm\sqrt{3/2}$. Hence the stationary points are at $(0,0)$, $(\sqrt{3/2}, 0)$ and $(-\sqrt{3/2}, 0)$. We now find the second partial derivatives:

$$f_{xx} = (4x^5 - 14x^3 + 6x)\exp(-x^2-y^2),$$
$$f_{yy} = x^3(4y^2 - 2)\exp(-x^2-y^2),$$
$$f_{xy} = 2x^2y(2x^2 - 3)\exp(-x^2-y^2).$$

We then substitute the pairs of values of $x$ and $y$ for each stationary point and find that at $(0,0)$

$$f_{xx} = 0,\quad f_{yy} = 0,\quad f_{xy} = 0$$

and at $(\pm\sqrt{3/2}, 0)$

$$f_{xx} = \mp 6\sqrt{3/2}\,\exp(-3/2),\quad f_{yy} = \mp 3\sqrt{3/2}\,\exp(-3/2),\quad f_{xy} = 0.$$

Hence, applying criteria (i)–(iii) above, we find that $(0,0)$ is an undetermined stationary point, $(\sqrt{3/2}, 0)$ is a maximum and $(-\sqrt{3/2}, 0)$ is a minimum. The function is shown in figure 5.3. ◀

Determining the nature of stationary points for functions of a general number of variables is considerably more difficult and requires a knowledge of the eigenvectors and eigenvalues of matrices. Although these are not discussed until chapter 8, we present the analysis here for completeness. The remainder of this section can therefore be omitted on a first reading.

For a function of $n$ real variables, $f(x_1, x_2, \ldots, x_n)$, we require that, at all stationary points,

$$\frac{\partial f}{\partial x_i} = 0 \quad\text{for all } x_i.$$
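The stationarity condition just stated can be checked numerically for any number of variables. The sketch below is an illustration rather than part of the original text: it estimates each partial derivative by a central difference (the helper name `numerical_gradient` and the step size `h` are assumptions) and applies the check to the function $x^3\exp(-x^2-y^2)$ of the worked example above.

```python
import math

def f(point):
    """The worked-example function x^3 exp(-x^2 - y^2)."""
    x, y = point
    return x ** 3 * math.exp(-x * x - y * y)

def numerical_gradient(func, point, h=1e-6):
    """Central-difference estimate of every partial derivative at `point`."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad

root = math.sqrt(1.5)
# Three stationary points from the example, plus one ordinary point.
for p in [(0.0, 0.0), (root, 0.0), (-root, 0.0), (1.0, 1.0)]:
    print(p, numerical_gradient(f, p))
```

At the three stationary points every component is zero to within the finite-difference error, whereas at the ordinary point $(1, 1)$ it is not.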
In order to determine the nature of a stationary point, we must expand the function as a Taylor series about the point. Recalling the Taylor expansion (5.20) for a function of $n$ variables, we see that

$$\Delta f = f(\mathbf{x}) - f(\mathbf{x}_0) \approx \frac{1}{2}\sum_i\sum_j \frac{\partial^2 f}{\partial x_i\,\partial x_j}\,\Delta x_i\,\Delta x_j. \tag{5.25}$$

Figure 5.3 The function $f(x,y) = x^3\exp(-x^2-y^2)$, with its maximum and minimum marked.

If we define the matrix $\mathsf{M}$ to have elements given by

$$M_{ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j},$$

then we can rewrite (5.25) as

$$\Delta f = \tfrac{1}{2}\,\Delta\mathbf{x}^{\mathrm T}\mathsf{M}\,\Delta\mathbf{x}, \tag{5.26}$$

where $\Delta\mathbf{x}$ is the column vector with the $\Delta x_i$ as its components and $\Delta\mathbf{x}^{\mathrm T}$ is its transpose. Since $\mathsf{M}$ is real and symmetric it has $n$ real eigenvalues $\lambda_r$ and $n$ orthogonal eigenvectors $\mathbf{e}_r$, which after suitable normalisation satisfy

$$\mathsf{M}\mathbf{e}_r = \lambda_r\mathbf{e}_r, \qquad \mathbf{e}_r^{\mathrm T}\mathbf{e}_s = \delta_{rs},$$

where the Kronecker delta, written $\delta_{rs}$, equals unity for $r = s$ and equals zero otherwise. These eigenvectors form a basis set for the $n$-dimensional space and we can therefore expand $\Delta\mathbf{x}$ in terms of them, obtaining

$$\Delta\mathbf{x} = \sum_r a_r\mathbf{e}_r,$$

where the $a_r$ are coefficients dependent upon $\Delta\mathbf{x}$. Substituting this into (5.26), we find

$$\Delta f = \tfrac{1}{2}\,\Delta\mathbf{x}^{\mathrm T}\mathsf{M}\,\Delta\mathbf{x} = \tfrac{1}{2}\sum_r \lambda_r a_r^2.$$

Now, for the stationary point to be a minimum, we require $\Delta f = \tfrac{1}{2}\sum_r \lambda_r a_r^2 > 0$ for all sets of values of the $a_r$, and therefore all the eigenvalues of $\mathsf{M}$ to be greater than zero. Conversely, for a maximum we require $\Delta f = \tfrac{1}{2}\sum_r \lambda_r a_r^2 < 0$, and therefore all the eigenvalues of $\mathsf{M}$ to be less than zero. If the eigenvalues have mixed signs, then we have a saddle point. Note that the test may fail if some or all of the eigenvalues are equal to zero and all the non-zero ones have the same sign.

▶ Derive the conditions for maxima, minima and saddle points for a function of two real variables, using the above analysis.
For a two-variable function the matrix $\mathsf{M}$ is given by

$$\mathsf{M} = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}.$$

Therefore its eigenvalues satisfy the equation

$$\begin{vmatrix} f_{xx} - \lambda & f_{xy} \\ f_{xy} & f_{yy} - \lambda \end{vmatrix} = 0.$$

Hence

$$(f_{xx} - \lambda)(f_{yy} - \lambda) - f_{xy}^2 = 0$$
$$\Rightarrow\quad f_{xx}f_{yy} - (f_{xx} + f_{yy})\lambda + \lambda^2 - f_{xy}^2 = 0$$
$$\Rightarrow\quad 2\lambda = (f_{xx} + f_{yy}) \pm \sqrt{(f_{xx} + f_{yy})^2 - 4(f_{xx}f_{yy} - f_{xy}^2)},$$

which by rearrangement of the terms under the square root gives

$$2\lambda = (f_{xx} + f_{yy}) \pm \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}.$$

Now, that $\mathsf{M}$ is real and symmetric implies that its eigenvalues are real, and so for both eigenvalues to be positive (corresponding to a minimum), we require $f_{xx}$ and $f_{yy}$ positive and also

$$f_{xx} + f_{yy} > \sqrt{(f_{xx} + f_{yy})^2 - 4(f_{xx}f_{yy} - f_{xy}^2)},$$
$$\Rightarrow\quad f_{xx}f_{yy} - f_{xy}^2 > 0.$$

A similar procedure will find the criteria for maxima and saddle points. ◀

5.9 Stationary values under constraints

In the previous section we looked at the problem of finding stationary values of a function of two or more variables when all the variables may be independently varied. However, it is often the case in physical problems that not all the variables used to describe a situation are in fact independent, i.e. some relationship between the variables must be satisfied. For example, if we walk through a hilly landscape and we are constrained to walk along a path, we will never reach the highest peak on the landscape unless the path happens to take us to it. Nevertheless, we can still find the highest point that we have reached during our journey.

We first discuss the case of a function of just two variables. Let us consider finding the maximum value of the differentiable function $f(x,y)$ subject to the constraint $g(x,y) = c$, where $c$ is a constant.
In the above analogy, $f(x,y)$ might represent the height of the land above sea-level in some hilly region, whilst $g(x,y) = c$ is the equation of the path along which we walk.

We could, of course, use the constraint $g(x,y) = c$ to substitute for $x$ or $y$ in $f(x,y)$, thereby obtaining a new function of only one variable whose stationary points could be found using the methods discussed in subsection 2.1.8. However, such a procedure can involve a lot of algebra and becomes very tedious for functions of more than two variables. A more direct method for solving such problems is the method of Lagrange undetermined multipliers, which we now discuss.

To maximise $f$ we require

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy = 0.$$

If $dx$ and $dy$ were independent, we could conclude $f_x = 0 = f_y$. However, here they are not independent, but constrained because $g$ is constant:

$$dg = \frac{\partial g}{\partial x}\,dx + \frac{\partial g}{\partial y}\,dy = 0.$$

Multiplying $dg$ by an as yet unknown number $\lambda$ and adding it to $df$ we obtain

$$d(f + \lambda g) = \left(\frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x}\right)dx + \left(\frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y}\right)dy = 0,$$

where $\lambda$ is called a Lagrange undetermined multiplier. In this equation $dx$ and $dy$ are to be independent and arbitrary; we must therefore choose $\lambda$ such that

$$\frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x} = 0, \tag{5.27}$$

$$\frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y} = 0. \tag{5.28}$$

These equations, together with the constraint $g(x,y) = c$, are sufficient to find the three unknowns, i.e. $\lambda$ and the values of $x$ and $y$ at the stationary point.

▶ The temperature of a point $(x,y)$ on a unit circle is given by $T(x,y) = 1 + xy$. Find the temperature of the two hottest points on the circle.

We need to maximise $T(x,y)$ subject to the constraint $x^2 + y^2 = 1$. Applying (5.27) and (5.28), we obtain

$$y + 2\lambda x = 0, \tag{5.29}$$

$$x + 2\lambda y = 0. \tag{5.30}$$

These results, together with the original constraint $x^2 + y^2 = 1$, provide three simultaneous equations that may be solved for $\lambda$, $x$ and $y$.
From (5.29) and (5.30) we find $\lambda = \pm 1/2$, which in turn implies that $y = \mp x$. Remembering that $x^2 + y^2 = 1$, we find that

$$y = x \quad\Rightarrow\quad x = \pm\frac{1}{\sqrt{2}},\quad y = \pm\frac{1}{\sqrt{2}},$$
$$y = -x \quad\Rightarrow\quad x = \mp\frac{1}{\sqrt{2}},\quad y = \pm\frac{1}{\sqrt{2}}.$$

We have not yet determined which of these stationary points are maxima and which are minima. In this simple case, we need only substitute the four pairs of $x$- and $y$-values into $T(x,y) = 1 + xy$ to find that the maximum temperature on the unit circle is $T_{\max} = 3/2$ at the points $y = x = \pm 1/\sqrt{2}$. ◀

The method of Lagrange multipliers can be used to find the stationary points of functions of more than two variables, subject to several constraints, provided that the number of constraints is smaller than the number of variables. For example, if we wish to find the stationary points of $f(x,y,z)$ subject to the constraints $g(x,y,z) = c_1$ and $h(x,y,z) = c_2$, where $c_1$ and $c_2$ are constants, then we proceed as above, obtaining

$$\begin{aligned}
\frac{\partial}{\partial x}(f + \lambda g + \mu h) &= \frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x} + \mu\frac{\partial h}{\partial x} = 0,\\
\frac{\partial}{\partial y}(f + \lambda g + \mu h) &= \frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y} + \mu\frac{\partial h}{\partial y} = 0,\\
\frac{\partial}{\partial z}(f + \lambda g + \mu h) &= \frac{\partial f}{\partial z} + \lambda\frac{\partial g}{\partial z} + \mu\frac{\partial h}{\partial z} = 0.
\end{aligned} \tag{5.31}$$

We may now solve these three equations, together with the two constraints, to give $\lambda$, $\mu$, $x$, $y$ and $z$.

▶ Find the stationary points of $f(x,y,z) = x^3 + y^3 + z^3$ subject to the following constraints:
(i) $g(x,y,z) = x^2 + y^2 + z^2 = 1$;
(ii) $g(x,y,z) = x^2 + y^2 + z^2 = 1$ and $h(x,y,z) = x + y + z = 0$.

Case (i). Since there is only one constraint in this case, we need only introduce a single Lagrange multiplier to obtain

$$\begin{aligned}
\frac{\partial}{\partial x}(f + \lambda g) &= 3x^2 + 2\lambda x = 0,\\
\frac{\partial}{\partial y}(f + \lambda g) &= 3y^2 + 2\lambda y = 0,\\
\frac{\partial}{\partial z}(f + \lambda g) &= 3z^2 + 2\lambda z = 0.
\end{aligned} \tag{5.32}$$

These equations are highly symmetrical and clearly have the solution $x = y = z = -2\lambda/3$. Using the constraint $x^2 + y^2 + z^2 = 1$ we find $\lambda = \pm\sqrt{3}/2$ and so stationary points occur at

$$x = y = z = \pm\frac{1}{\sqrt{3}}. \tag{5.33}$$

In solving the three equations (5.32) in this way, however, we have implicitly assumed that $x$, $y$ and $z$ are non-zero.
However, it is clear from (5.32) that any of these values can equal zero, with the exception of the case $x = y = z = 0$ since this is prohibited by the constraint $x^2 + y^2 + z^2 = 1$. We must consider the other cases separately.

If $x = 0$, for example, we require

$$3y^2 + 2\lambda y = 0,\qquad 3z^2 + 2\lambda z = 0,\qquad y^2 + z^2 = 1.$$

Clearly, we require $\lambda \neq 0$, otherwise these equations are inconsistent. If neither $y$ nor $z$ is zero we find $y = -2\lambda/3 = z$ and from the third equation we require $y = z = \pm 1/\sqrt{2}$. If $y = 0$, however, then $z = \pm 1$ and, similarly, if $z = 0$ then $y = \pm 1$. Thus the stationary points having $x = 0$ are $(0, 0, \pm 1)$, $(0, \pm 1, 0)$ and $(0, \pm 1/\sqrt{2}, \pm 1/\sqrt{2})$. A similar procedure can be followed for the cases $y = 0$ and $z = 0$ respectively and, in addition to those already obtained, we find the stationary points $(\pm 1, 0, 0)$, $(\pm 1/\sqrt{2}, 0, \pm 1/\sqrt{2})$ and $(\pm 1/\sqrt{2}, \pm 1/\sqrt{2}, 0)$.

Case (ii). We now have two constraints and must therefore introduce two Lagrange multipliers to obtain (cf. (5.31))

$$\frac{\partial}{\partial x}(f + \lambda g + \mu h) = 3x^2 + 2\lambda x + \mu = 0, \tag{5.34}$$

$$\frac{\partial}{\partial y}(f + \lambda g + \mu h) = 3y^2 + 2\lambda y + \mu = 0, \tag{5.35}$$

$$\frac{\partial}{\partial z}(f + \lambda g + \mu h) = 3z^2 + 2\lambda z + \mu = 0. \tag{5.36}$$

These equations are again highly symmetrical and the simplest way to proceed is to subtract (5.35) from (5.34) to obtain

$$3(x^2 - y^2) + 2\lambda(x - y) = 0 \quad\Rightarrow\quad 3(x + y)(x - y) + 2\lambda(x - y) = 0. \tag{5.37}$$

This equation is clearly satisfied if $x = y$; then, from the second constraint, $x + y + z = 0$, we find $z = -2x$. Substituting these values into the first constraint, $x^2 + y^2 + z^2 = 1$, we obtain

$$x = \pm\frac{1}{\sqrt{6}},\quad y = \pm\frac{1}{\sqrt{6}},\quad z = \mp\frac{2}{\sqrt{6}}. \tag{5.38}$$

Because of the high degree of symmetry amongst the equations (5.34)–(5.36), we may obtain by inspection two further relations analogous to (5.37), one containing the variables $y$, $z$ and the other the variables $x$, $z$. Assuming $y = z$ in the first relation and $x = z$ in the second, we find the stationary points

$$x = \pm\frac{1}{\sqrt{6}},\quad y = \mp\frac{2}{\sqrt{6}},\quad z = \pm\frac{1}{\sqrt{6}} \tag{5.39}$$

and

$$x = \mp\frac{2}{\sqrt{6}},\quad y = \pm\frac{1}{\sqrt{6}},\quad z = \pm\frac{1}{\sqrt{6}}. \tag{5.40}$$

We note that in finding the stationary points (5.38)–(5.40) we did not need to evaluate the Lagrange multipliers $\lambda$ and $\mu$ explicitly. This is not always the case, however, and in some problems it may be simpler to begin by finding the values of these multipliers.

Returning to (5.37) we must now consider the case where $x \neq y$; then we find

$$3(x + y) + 2\lambda = 0. \tag{5.41}$$

However, in obtaining the stationary points (5.39), (5.40), we did not assume $x = y$ but only required $y = z$ and $x = z$ respectively. It is clear that $x \neq y$ at these stationary points, and it can be shown that they do indeed satisfy (5.41). Similarly, several stationary points for which $x \neq z$ or $y \neq z$ have already been found.

Thus we need to consider further only two cases, $x = y = z$, and $x$, $y$ and $z$ are all different. The first is clearly prohibited by the constraint $x + y + z = 0$. For the second case, (5.41) must be satisfied, together with the analogous equations containing $y$, $z$ and $x$, $z$ respectively, i.e.

$$3(x + y) + 2\lambda = 0,\qquad 3(y + z) + 2\lambda = 0,\qquad 3(x + z) + 2\lambda = 0.$$

Adding these three equations together and using the constraint $x + y + z = 0$ we find $\lambda = 0$. However, for $\lambda = 0$ the equations are inconsistent for non-zero $x$, $y$ and $z$. Therefore all the stationary points have already been found and are given by (5.38)–(5.40). ◀

The method may be extended to functions of any number $n$ of variables subject to any smaller number $m$ of constraints. This means that effectively there are $n - m$ independent variables and, as mentioned above, we could solve by substitution and then by the methods of the previous section. However, for large $n$ this becomes cumbersome and the use of Lagrange undetermined multipliers is a useful simplification.

▶ A system contains a very large number $N$ of particles, each of which can be in any of $R$ energy levels with a corresponding energy $E_i$, $i = 1, 2, \ldots, R$.
The number of particles in the $i$th level is $n_i$ and the total energy of the system is a constant, $E$. Find the distribution of particles amongst the energy levels that maximises the expression

$$P = \frac{N!}{n_1!\,n_2!\cdots n_R!},$$

subject to the constraints that both the number of particles and the total energy remain constant, i.e.

$$g = N - \sum_{i=1}^{R} n_i = 0 \quad\text{and}\quad h = E - \sum_{i=1}^{R} n_iE_i = 0.$$

The way in which we proceed is as follows. In order to maximise $P$, we must minimise its denominator (since the numerator is fixed). Minimising the denominator is the same as minimising the logarithm of the denominator, i.e.

$$f = \ln(n_1!\,n_2!\cdots n_R!) = \ln(n_1!) + \ln(n_2!) + \cdots + \ln(n_R!).$$

Using Stirling's approximation, $\ln(n!) \approx n\ln n - n$, we find that

$$f = n_1\ln n_1 + n_2\ln n_2 + \cdots + n_R\ln n_R - (n_1 + n_2 + \cdots + n_R) = \left(\sum_{i=1}^{R} n_i\ln n_i\right) - N.$$

It has been assumed here that, for the desired distribution, all the $n_i$ are large. Thus, we now have a function $f$ subject to two constraints, $g = 0$ and $h = 0$, and we can apply the Lagrange method, obtaining (cf. (5.31))

$$\begin{aligned}
\frac{\partial f}{\partial n_1} + \lambda\frac{\partial g}{\partial n_1} + \mu\frac{\partial h}{\partial n_1} &= 0,\\
\frac{\partial f}{\partial n_2} + \lambda\frac{\partial g}{\partial n_2} + \mu\frac{\partial h}{\partial n_2} &= 0,\\
&\;\;\vdots\\
\frac{\partial f}{\partial n_R} + \lambda\frac{\partial g}{\partial n_R} + \mu\frac{\partial h}{\partial n_R} &= 0.
\end{aligned}$$

Since all these equations are alike, we consider the general case

$$\frac{\partial f}{\partial n_k} + \lambda\frac{\partial g}{\partial n_k} + \mu\frac{\partial h}{\partial n_k} = 0,$$

for $k = 1, 2, \ldots, R$. Substituting the functions $f$, $g$ and $h$ into this relation we find

$$\frac{n_k}{n_k} + \ln n_k + \lambda(-1) + \mu(-E_k) = 0,$$

which can be rearranged to give

$$\ln n_k = \mu E_k + \lambda - 1,$$

and hence

$$n_k = C\exp\mu E_k.$$

We now have the general form for the distribution of particles amongst energy levels, but in order to determine the two constants $\mu$, $C$ we recall that

$$\sum_{k=1}^{R} C\exp\mu E_k = N \quad\text{and}\quad \sum_{k=1}^{R} CE_k\exp\mu E_k = E.$$

This is known as the Boltzmann distribution and is a well-known result from statistical mechanics.
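The two conditions that fix $\mu$ and $C$ generally have to be solved numerically. The following is a minimal sketch under stated assumptions, not a method given in the text: it uses bisection on $\mu$, relying on the fact that the mean energy $\sum E_k \exp\mu E_k / \sum \exp\mu E_k$ increases with $\mu$, and then obtains $C$ from the particle-number condition. The energy levels, the totals and the bracketing interval for $\mu$ are all hypothetical illustration values.

```python
import math

def mean_energy(mu, energies):
    """Mean energy per particle for occupations proportional to exp(mu * E_k)."""
    weights = [math.exp(mu * e) for e in energies]
    return sum(e * w for e, w in zip(energies, weights)) / sum(weights)

def boltzmann_occupations(energies, n_total, e_total, lo=-50.0, hi=50.0):
    """Solve sum(n_k) = N and sum(n_k E_k) = E for n_k = C exp(mu E_k)."""
    target = e_total / n_total
    for _ in range(200):              # bisection on mu within the assumed bracket
        mid = 0.5 * (lo + hi)
        if mean_energy(mid, energies) < target:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    c = n_total / sum(math.exp(mu * e) for e in energies)
    return mu, c, [c * math.exp(mu * e) for e in energies]

levels = [0.0, 1.0, 2.0, 3.0]         # hypothetical energy levels E_k
mu, c, occupations = boltzmann_occupations(levels, n_total=1000.0, e_total=1000.0)
print("mu =", mu, "C =", c)
print("occupations:", occupations)
```

With these values the required mean energy lies below the unweighted average of the levels, so the computed $\mu$ is negative and the occupations fall off with increasing energy.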
◀

5.10 Envelopes

As noted at the start of this chapter, many of the functions with which physicists, chemists and engineers have to deal contain, in addition to constants and one or more variables, quantities that are normally considered as parameters of the system under study. Such parameters may, for example, represent the capacitance of a capacitor, the length of a rod, or the mass of a particle – quantities that are normally taken as fixed for any particular physical set-up. The corresponding variables may well be time, currents, charges, positions and velocities. However, the parameters could be varied and in this section we study the effects of doing so; in particular we study how the form of dependence of one variable on another, typically $y = y(x)$, is affected when the value of a parameter is changed in a smooth and continuous way. In effect, we are making the parameter into an additional variable.

As a particular parameter, which we denote by $\alpha$, is varied over its permitted range, the shape of the plot of $y$ against $x$ will change, usually, but not always, in a smooth and continuous way. For example, if the muzzle speed $v$ of a shell fired from a gun is increased through a range of values then its height–distance trajectories will be a series of curves with a common starting point that are essentially just magnified copies of the original; furthermore the curves do not cross each other. However, if the muzzle speed is kept constant but $\theta$, the angle of elevation of the gun, is increased through a series of values, the corresponding trajectories do not vary in a monotonic way. When $\theta$ has been increased beyond $45^\circ$ the trajectories then do cross some of the trajectories corresponding to $\theta < 45^\circ$. The trajectories for $\theta > 45^\circ$ all lie within a curve that touches each individual trajectory at one point. Such a curve is called the envelope to the set of trajectory solutions; it is to the study of such envelopes that this section is devoted.
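The gun example can be made concrete with a short numerical check. This is a sketch under assumptions not stated in the text: air resistance is neglected, the trajectory is taken as $y = x\tan\theta - gx^2(1+\tan^2\theta)/(2v^2)$, and the envelope is taken to be the standard bounding parabola $y = v^2/(2g) - gx^2/(2v^2)$, which is quoted here rather than derived. The values of `G` and `V` and the function names are purely illustrative.

```python
import math

G = 9.81   # gravitational acceleration in m/s^2 (illustrative value)
V = 20.0   # muzzle speed in m/s (hypothetical value)

def trajectory(x, theta):
    """Height of the shell at horizontal distance x for elevation angle theta."""
    t = math.tan(theta)
    return x * t - G * x * x * (1.0 + t * t) / (2.0 * V * V)

def envelope(x):
    """Assumed envelope curve y = V^2/(2G) - G x^2/(2 V^2)."""
    return V * V / (2.0 * G) - G * x * x / (2.0 * V * V)

def touching_point(theta):
    """Horizontal distance at which a trajectory meets the assumed envelope."""
    return V * V / (G * math.tan(theta))

xs = [0.1 * i for i in range(1, 500)]
for degrees in (30, 50, 60, 75):
    theta = math.radians(degrees)
    # No trajectory should rise above the envelope anywhere on the grid.
    largest_excess = max(trajectory(x, theta) - envelope(x) for x in xs)
    x_t = touching_point(theta)
    print(f"theta = {degrees:2d} deg: max(trajectory - envelope) = {largest_excess:.2e}, "
          f"touches at x = {x_t:.2f}, height = {envelope(x_t):.2f}")
```

For elevations above $45^\circ$ the touching point lies above ground level, in line with the description above; for smaller elevations the computed contact height is negative, i.e. it falls on the continuation of the curves below the ground.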
For our general discussion of envelopes we will consider an equation of the form $f = f(x, y, \alpha) = 0$. A function of three Cartesian variables, $f = f(x, y, \alpha)$, is defined at all points in $xy\alpha$-space, whereas $f = f(x, y, \alpha) = 0$ is a surface in this space. A plane of constant $\alpha$, which is parallel to the $xy$-plane, cuts such a surface in a curve. Thus different values of the parameter $\alpha$ correspond to different curves, which can be plotted in the $xy$-plane. We now investigate how the envelope equation for such a family of curves is obtained.

Figure 5.4 Two neighbouring curves in the $xy$-plane of the family $f(x, y, \alpha) = 0$ intersecting at $P$. For fixed $\alpha_1$, the point $P_1$ is the limiting position of $P$ as $h \to 0$. As $\alpha_1$ is varied, $P_1$ delineates the envelope of the family (broken line).

5.10.1 Envelope equations

Suppose $f(x, y, \alpha_1) = 0$ and $f(x, y, \alpha_1 + h) = 0$ are two neighbouring curves of a family for which the parameter $\alpha$ differs by a small amount $h$. Let them intersect at the point $P$ with coordinates $x, y$, as shown in figure 5.4. Then the envelope, indicated by the broken line in the figure, touches $f(x, y, \alpha_1) = 0$ at the point $P_1$, which is defined as the limiting position of $P$ when $\alpha_1$ is fixed but $h \to 0$. The full envelope is the curve traced out by $P_1$ as $\alpha_1$ changes to generate successive members of the family of curves. Of course, for any finite $h$, $f(x, y, \alpha_1 + h) = 0$ is one of these curves and the envelope touches it at the point $P_2$.

We are now going to apply Rolle's theorem, see subsection 2.1.10, with the parameter $\alpha$ as the independent variable and $x$ and $y$ fixed as constants. In this context, the two curves in figure 5.4 can be thought of as the projections onto the $xy$-plane of the planar curves in which the surface $f = f(x, y, \alpha) = 0$ meets the planes $\alpha = \alpha_1$ and $\alpha = \alpha_1 + h$.
Along the normal to the page that passes through $P$, as $\alpha$ changes from $\alpha_1$ to $\alpha_1 + h$ the value of $f = f(x, y, \alpha)$ will depart from zero, because the normal meets the surface $f = f(x, y, \alpha) = 0$ only at $\alpha = \alpha_1$ and at $\alpha = \alpha_1 + h$. However, at these end points the values of $f = f(x, y, \alpha)$ will both be zero, and therefore equal. This allows us to apply Rolle's theorem and so to conclude that for some $\theta$ in the range $0 \le \theta \le 1$ the partial derivative $\partial f(x, y, \alpha_1 + \theta h)/\partial\alpha$ is zero. When $h$ is made arbitrarily small, so that $P \to P_1$, the three defining equations reduce to two, which define the envelope point $P_1$:
$$f(x, y, \alpha_1) = 0 \quad\text{and}\quad \frac{\partial f(x, y, \alpha_1)}{\partial\alpha} = 0. \tag{5.42}$$
In (5.42) both the function and the gradient are evaluated at $\alpha = \alpha_1$. The equation of the envelope $g(x, y) = 0$ is found by eliminating $\alpha_1$ between the two equations.

As a simple example we will now solve the problem which when posed mathematically reads 'calculate the envelope appropriate to the family of straight lines in the $xy$-plane whose points of intersection with the coordinate axes are a fixed distance apart'. In more ordinary language, the problem is about a ladder leaning against a wall.

▶ A ladder of length $L$ stands on level ground and can be leaned at any angle against a vertical wall. Find the equation of the curve bounding the vertical area below the ladder.

We take the ground and the wall as the $x$- and $y$-axes respectively. If the foot of the ladder is $a$ from the foot of the wall and the top is $b$ above the ground then the straight-line equation of the ladder is
$$\frac{x}{a} + \frac{y}{b} = 1,$$
where $a$ and $b$ are connected by $a^2 + b^2 = L^2$. Expressed in standard form with only one independent parameter, $a$, the equation becomes
$$f(x, y, a) = \frac{x}{a} + \frac{y}{(L^2 - a^2)^{1/2}} - 1 = 0. \tag{5.43}$$
Now, differentiating (5.43) with respect to $a$ and setting the derivative $\partial f/\partial a$ equal to zero gives
$$-\frac{x}{a^2} + \frac{ay}{(L^2 - a^2)^{3/2}} = 0,$$
from which it follows that
$$a = \frac{L x^{1/3}}{(x^{2/3} + y^{2/3})^{1/2}} \quad\text{and}\quad (L^2 - a^2)^{1/2} = \frac{L y^{1/3}}{(x^{2/3} + y^{2/3})^{1/2}}.$$
Eliminating $a$ by substituting these values into (5.43) gives, for the equation of the envelope of all possible positions of the ladder,
$$x^{2/3} + y^{2/3} = L^{2/3}.$$
This is the equation of an astroid (mentioned in exercise 2.19), and, together with the wall and the ground, marks the boundary of the vertical area below the ladder. ◀

Other examples, drawn from both geometry and the physical sciences, are considered in the exercises at the end of this chapter. The shell trajectory problem discussed earlier in this section is solved there, but in the guise of a question about the water bell of an ornamental fountain.

5.11 Thermodynamic relations

Thermodynamic relations provide a useful set of physical examples of partial differentiation. The relations we will derive are called Maxwell's thermodynamic relations. They express relationships between four thermodynamic quantities describing a unit mass of a substance. The quantities are the pressure $P$, the volume $V$, the thermodynamic temperature $T$ and the entropy $S$ of the substance. These four quantities are not independent; any two of them can be varied independently, but the other two are then determined.

The first law of thermodynamics may be expressed as
$$dU = T\,dS - P\,dV, \tag{5.44}$$
where $U$ is the internal energy of the substance. Essentially this is a conservation of energy equation, but we shall concern ourselves, not with the physics, but rather with the use of partial differentials to relate the four basic quantities discussed above.
The method involves writing a total differential, $dU$ say, in terms of the differentials of two variables, say $X$ and $Y$, thus
$$dU = \left(\frac{\partial U}{\partial X}\right)_Y dX + \left(\frac{\partial U}{\partial Y}\right)_X dY, \tag{5.45}$$
and then using the relationship
$$\frac{\partial^2 U}{\partial X\,\partial Y} = \frac{\partial^2 U}{\partial Y\,\partial X}$$
to obtain the required Maxwell relation. The variables $X$ and $Y$ are to be chosen from $P$, $V$, $T$ and $S$.

▶ Show that $(\partial T/\partial V)_S = -(\partial P/\partial S)_V$.

Here the two variables that have to be held constant, in turn, happen to be those whose differentials appear on the RHS of (5.44). And so, taking $X$ as $S$ and $Y$ as $V$ in (5.45), we have
$$T\,dS - P\,dV = dU = \left(\frac{\partial U}{\partial S}\right)_V dS + \left(\frac{\partial U}{\partial V}\right)_S dV,$$
and find directly that
$$\left(\frac{\partial U}{\partial S}\right)_V = T \quad\text{and}\quad \left(\frac{\partial U}{\partial V}\right)_S = -P.$$
Differentiating the first expression with respect to $V$ and the second with respect to $S$, and using
$$\frac{\partial^2 U}{\partial V\,\partial S} = \frac{\partial^2 U}{\partial S\,\partial V},$$
we find the Maxwell relation
$$\left(\frac{\partial T}{\partial V}\right)_S = -\left(\frac{\partial P}{\partial S}\right)_V.$$
◀

▶ Show that $(\partial S/\partial V)_T = (\partial P/\partial T)_V$.

Applying (5.45) to $dS$, with independent variables $V$ and $T$, we find
$$dU = T\,dS - P\,dV = T\left[\left(\frac{\partial S}{\partial V}\right)_T dV + \left(\frac{\partial S}{\partial T}\right)_V dT\right] - P\,dV.$$
Similarly applying (5.45) to $dU$, we find
$$dU = \left(\frac{\partial U}{\partial V}\right)_T dV + \left(\frac{\partial U}{\partial T}\right)_V dT.$$
Thus, equating partial derivatives,
$$\left(\frac{\partial U}{\partial V}\right)_T = T\left(\frac{\partial S}{\partial V}\right)_T - P \quad\text{and}\quad \left(\frac{\partial U}{\partial T}\right)_V = T\left(\frac{\partial S}{\partial T}\right)_V.$$
But, since
$$\frac{\partial^2 U}{\partial T\,\partial V} = \frac{\partial^2 U}{\partial V\,\partial T}, \quad\text{i.e.}\quad \frac{\partial}{\partial T}\left(\frac{\partial U}{\partial V}\right)_T = \frac{\partial}{\partial V}\left(\frac{\partial U}{\partial T}\right)_V,$$
it follows that
$$\left(\frac{\partial S}{\partial V}\right)_T + T\frac{\partial^2 S}{\partial T\,\partial V} - \left(\frac{\partial P}{\partial T}\right)_V = \frac{\partial}{\partial V}\left[T\left(\frac{\partial S}{\partial T}\right)_V\right]_T = T\frac{\partial^2 S}{\partial V\,\partial T}.$$
Thus finally we get the Maxwell relation
$$\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial P}{\partial T}\right)_V.$$
◀

The above derivation is rather cumbersome, however, and a useful trick that can simplify the working is to define a new function, called a potential. The internal energy $U$ discussed above is one example of a potential but three others are commonly defined and they are described below.

▶ Show that $(\partial S/\partial V)_T = (\partial P/\partial T)_V$ by considering the potential $U - ST$.

We first consider the differential $d(U - ST)$. From (5.5), we obtain
$$d(U - ST) = dU - S\,dT - T\,dS = -S\,dT - P\,dV$$
when use is made of (5.44). We rewrite $U - ST$ as $F$ for convenience of notation; $F$ is called the Helmholtz potential. Thus
$$dF = -S\,dT - P\,dV,$$
and it follows that
$$\left(\frac{\partial F}{\partial T}\right)_V = -S \quad\text{and}\quad \left(\frac{\partial F}{\partial V}\right)_T = -P.$$
Using these results together with
$$\frac{\partial^2 F}{\partial T\,\partial V} = \frac{\partial^2 F}{\partial V\,\partial T},$$
we can see immediately that
$$\left(\frac{\partial S}{\partial V}\right)_T = \left(\frac{\partial P}{\partial T}\right)_V,$$
which is the same Maxwell relation as before. ◀

Although the Helmholtz potential has other uses, in this context it has simply provided a means for a quick derivation of the Maxwell relation. The other Maxwell relations can be derived similarly by using two other potentials, the enthalpy, $H = U + PV$, and the Gibbs free energy, $G = U + PV - ST$ (see exercise 5.25).

5.12 Differentiation of integrals

We conclude this chapter with a discussion of the differentiation of integrals. Let us consider the indefinite integral (cf. equation (2.30))
$$F(x, t) = \int f(x, t)\,dt,$$
from which it follows immediately that
$$\frac{\partial F(x, t)}{\partial t} = f(x, t).$$
Assuming that the second partial derivatives of $F(x, t)$ are continuous, we have
$$\frac{\partial^2 F(x, t)}{\partial t\,\partial x} = \frac{\partial^2 F(x, t)}{\partial x\,\partial t},$$
and so we can write
$$\frac{\partial}{\partial t}\left[\frac{\partial F(x, t)}{\partial x}\right] = \frac{\partial}{\partial x}\left[\frac{\partial F(x, t)}{\partial t}\right] = \frac{\partial f(x, t)}{\partial x}.$$
Integrating this equation with respect to $t$ then gives
$$\frac{\partial F(x, t)}{\partial x} = \int \frac{\partial f(x, t)}{\partial x}\,dt. \tag{5.46}$$
Now consider the definite integral
$$I(x) = \int_{t=u}^{t=v} f(x, t)\,dt = F(x, v) - F(x, u),$$
where $u$ and $v$ are constants. Differentiating this integral with respect to $x$, and using (5.46), we see that
$$\frac{dI(x)}{dx} = \frac{\partial F(x, v)}{\partial x} - \frac{\partial F(x, u)}{\partial x} = \int^{v} \frac{\partial f(x, t)}{\partial x}\,dt - \int^{u} \frac{\partial f(x, t)}{\partial x}\,dt = \int_{u}^{v} \frac{\partial f(x, t)}{\partial x}\,dt.$$
This is Leibnitz' rule for differentiating integrals, and basically it states that for constant limits of integration the order of integration and differentiation can be reversed.

In the more general case where the limits of the integral are themselves functions of $x$, it follows immediately that
$$I(x) = \int_{t=u(x)}^{t=v(x)} f(x, t)\,dt = F(x, v(x)) - F(x, u(x)),$$
which yields the partial derivatives
$$\frac{\partial I}{\partial v} = f(x, v(x)), \qquad \frac{\partial I}{\partial u} = -f(x, u(x)).$$
Consequently
$$\begin{aligned}
\frac{dI}{dx} &= \left(\frac{\partial I}{\partial v}\right)\frac{dv}{dx} + \left(\frac{\partial I}{\partial u}\right)\frac{du}{dx} + \frac{\partial I}{\partial x}\\
&= f(x, v(x))\frac{dv}{dx} - f(x, u(x))\frac{du}{dx} + \frac{\partial}{\partial x}\int_{u(x)}^{v(x)} f(x, t)\,dt\\
&= f(x, v(x))\frac{dv}{dx} - f(x, u(x))\frac{du}{dx} + \int_{u(x)}^{v(x)} \frac{\partial f(x, t)}{\partial x}\,dt,
\end{aligned} \tag{5.47}$$
where the partial derivative with respect to $x$ in the last term has been taken inside the integral sign using (5.46). This procedure is valid because $u(x)$ and $v(x)$ are being held constant in this term.

▶ Find the derivative with respect to $x$ of the integral
$$I(x) = \int_{x}^{x^2} \frac{\sin xt}{t}\,dt.$$

Applying (5.47), we see that
$$\begin{aligned}
\frac{dI}{dx} &= \frac{\sin x^3}{x^2}(2x) - \frac{\sin x^2}{x}(1) + \int_{x}^{x^2} \frac{t\cos xt}{t}\,dt\\
&= \frac{2\sin x^3}{x} - \frac{\sin x^2}{x} + \left[\frac{\sin xt}{x}\right]_{x}^{x^2}\\
&= 3\,\frac{\sin x^3}{x} - 2\,\frac{\sin x^2}{x} = \frac{1}{x}\left(3\sin x^3 - 2\sin x^2\right).
\end{aligned}$$
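The result of this example can be checked numerically. The sketch below is only an illustration under stated assumptions: it evaluates $I(x)$ with a composite Simpson rule, differentiates it by a central finite difference, and compares the outcome with the closed form $(3\sin x^3 - 2\sin x^2)/x$. The helper names, the test point $x_0 = 1.5$ and the step sizes are choices made for the example and are not taken from the text.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Odd interior points carry weight 4, even ones weight 2.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def integral_i(x):
    """I(x) = integral from t = x to t = x**2 of sin(x*t)/t dt."""
    return simpson(lambda t: math.sin(x * t) / t, x, x * x)

def leibniz_derivative(x):
    """Closed form obtained in the text: (3 sin x^3 - 2 sin x^2) / x."""
    return (3.0 * math.sin(x ** 3) - 2.0 * math.sin(x ** 2)) / x

# Assumed test point and finite-difference step.
x0, step = 1.5, 1e-4
finite_difference = (integral_i(x0 + step) - integral_i(x0 - step)) / (2 * step)
print(finite_difference, leibniz_derivative(x0))
```

The two printed values should agree to within the truncation error of the central difference, which is of order `step**2`.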
◀

5.13 Exercises

5.1 Using the appropriate properties of ordinary derivatives, perform the following.
(a) Find all the first partial derivatives of the following functions $f(x, y)$: (i) $x^2 y$, (ii) $x^2 + y^2 + 4$, (iii) $\sin(x/y)$, (iv) $\tan^{-1}(y/x)$, (v) $r(x, y, z) = (x^2 + y^2 + z^2)^{1/2}$.
(b) For (i), (ii) and (v), find $\partial^2 f/\partial x^2$, $\partial^2 f/\partial y^2$ and $\partial^2 f/\partial x\,\partial y$.
(c) For (iv) verify that $\partial^2 f/\partial x\,\partial y = \partial^2 f/\partial y\,\partial x$.

5.2 Determine which of the following are exact differentials:
(a) $(3x + 2)y\,dx + x(x + 1)\,dy$;
(b) $y\tan x\,dx + x\tan y\,dy$;
(c) $y^2(\ln x + 1)\,dx + 2xy\ln x\,dy$;
(d) $y^2(\ln x + 1)\,dy + 2xy\ln x\,dx$;
(e) $[x/(x^2 + y^2)]\,dy - [y/(x^2 + y^2)]\,dx$.

5.3 Show that the differential $df = x^2\,dy - (y^2 + xy)\,dx$ is not exact, but that $dg = (xy^2)^{-1}\,df$ is exact.

5.4 Show that $df = y(1 + x - x^2)\,dx + x(x + 1)\,dy$ is not an exact differential. Find the differential equation that a function $g(x)$ must satisfy if $d\phi = g(x)\,df$ is to be an exact differential. Verify that $g(x) = e^{-x}$ is a solution of this equation and deduce the form of $\phi(x, y)$.

5.5 The equation $3y = z^3 + 3xz$ defines $z$ implicitly as a function of $x$ and $y$. Evaluate all three second partial derivatives of $z$ with respect to $x$ and/or $y$. Verify that $z$ is a solution of
$$x\frac{\partial^2 z}{\partial y^2} + \frac{\partial^2 z}{\partial x^2} = 0.$$

5.6 A possible equation of state for a gas takes the form
$$PV = RT\exp\left(-\frac{\alpha}{VRT}\right),$$
in which $\alpha$ and $R$ are constants. Calculate expressions for
$$\left(\frac{\partial P}{\partial V}\right)_T, \qquad \left(\frac{\partial V}{\partial T}\right)_P, \qquad \left(\frac{\partial T}{\partial P}\right)_V,$$
and show that their product is $-1$, as stated in section 5.4.

5.7 The function $G(t)$ is defined by
$$G(t) = F(x, y) = x^2 + y^2 + 3xy,$$
where $x(t) = at^2$ and $y(t) = 2at$. Use the chain rule to find the values of $(x, y)$ at which $G(t)$ has stationary values as a function of $t$. Do any of them correspond to the stationary points of $F(x, y)$ as a function of $x$ and $y$?

5.8 In the $xy$-plane, new coordinates $s$ and $t$ are defined by
$$s = \tfrac{1}{2}(x + y), \qquad t = \tfrac{1}{2}(x - y).$$
Transform the equation
$$\frac{\partial^2\phi}{\partial x^2} - \frac{\partial^2\phi}{\partial y^2} = 0$$
into the new coordinates and deduce that its general solution can be written
$$\phi(x, y) = f(x + y) + g(x - y),$$
where $f(u)$ and $g(v)$ are arbitrary functions of $u$ and $v$, respectively.

5.9 The function $f(x, y)$ satisfies the differential equation
$$y\frac{\partial f}{\partial x} + x\frac{\partial f}{\partial y} = 0.$$
By changing to new variables $u = x^2 - y^2$ and $v = 2xy$, show that $f$ is, in fact, a function of $x^2 - y^2$ only.

5.10 If $x = e^u\cos\theta$ and $y = e^u\sin\theta$, show that
$$\frac{\partial^2\phi}{\partial u^2} + \frac{\partial^2\phi}{\partial\theta^2} = (x^2 + y^2)\left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\right),$$
where $f(x, y) = \phi(u, \theta)$.

5.11 Find and evaluate the maxima, minima and saddle points of the function
$$f(x, y) = xy(x^2 + y^2 - 1).$$

5.12 Show that
$$f(x, y) = x^3 - 12xy + 48x + by^2, \qquad b \neq 0,$$
has two, one, or zero stationary points, according to whether $|b|$ is less than, equal to, or greater than 3.

5.13 Locate the stationary points of the function
$$f(x, y) = (x^2 - 2y^2)\exp[-(x^2 + y^2)/a^2],$$
where $a$ is a non-zero constant. Sketch the function along the $x$- and $y$-axes and hence identify the nature and values of the stationary points.

5.14 Find the stationary points of the function
$$f(x, y) = x^3 + xy^2 - 12x - y^2$$
and identify their natures.

5.15 Find the stationary values of
$$f(x, y) = 4x^2 + 4y^2 + x^4 - 6x^2y^2 + y^4$$
and classify them as maxima, minima or saddle points. Make a rough sketch of the contours of $f$ in the quarter plane $x, y \ge 0$.

5.16 The temperature of a point $(x, y, z)$ on the unit sphere is given by
$$T(x, y, z) = 1 + xy + yz.$$
By using the method of Lagrange multipliers, find the temperature of the hottest point on the sphere.

5.17 A rectangular parallelepiped has all eight vertices on the ellipsoid
$$x^2 + 3y^2 + 3z^2 = 1.$$
Using the symmetry of the parallelepiped about each of the planes $x = 0$, $y = 0$, $z = 0$, write down the surface area of the parallelepiped in terms of the coordinates of the vertex that lies in the octant $x, y, z \ge 0$. Hence find the maximum value of the surface area of such a parallelepiped.
5.18 Two horizontal corridors, $0 \le x \le a$ with $y \ge 0$, and $0 \le y \le b$ with $x \ge 0$, meet at right angles. Find the length $L$ of the longest ladder (considered as a stick) that may be carried horizontally around the corner.

5.19 A barn is to be constructed with a uniform cross-sectional area $A$ throughout its length. The cross-section is to be a rectangle of wall height $h$ (fixed) and width $w$, surmounted by an isosceles triangular roof that makes an angle $\theta$ with the horizontal. The cost of construction is $\alpha$ per unit height of wall and $\beta$ per unit (slope) length of roof. Show that, irrespective of the values of $\alpha$ and $\beta$, to minimise costs $w$ should be chosen to satisfy the equation
$$w^4 = 16A(A - wh),$$
and $\theta$ made such that $2\tan 2\theta = w/h$.

5.20 Show that the envelope of all concentric ellipses that have their axes along the $x$- and $y$-coordinate axes, and that have the sum of their semi-axes equal to a constant $L$, is the same curve (an astroid) as that found in the worked example in section 5.10.

5.21 Find the area of the region covered by points on the lines
$$\frac{x}{a} + \frac{y}{b} = 1,$$
where the sum of any line's intercepts on the coordinate axes is fixed and equal to $c$.

5.22 Prove that the envelope of the circles whose diameters are those chords of a given circle that pass through a fixed point on its circumference, is the cardioid
$$r = a(1 + \cos\theta).$$
Here $a$ is the radius of the given circle and $(r, \theta)$ are the polar coordinates of the envelope. Take as the system parameter the angle $\phi$ between a chord and the polar axis from which $\theta$ is measured.

5.23 A water feature contains a spray head at water level at the centre of a round basin. The head is in the form of a small hemisphere perforated by many evenly distributed small holes, through which water spurts out at the same speed, $v_0$, in all directions.
(a) What is the shape of the 'water bell' so formed?
(b) What must be the minimum diameter of the bowl if no water is to be lost?
5.24 In order to make a focussing mirror that concentrates parallel axial rays to one spot (or conversely forms a parallel beam from a point source), a parabolic shape should be adopted. If a mirror that is part of a circular cylinder or sphere were used, the light would be spread out along a curve. This curve is known as a caustic and is the envelope of the rays reflected from the mirror. Denoting by $\theta$ the angle which a typical incident axial ray makes with the normal to the mirror at the place where it is reflected, the geometry of reflection (the angle of incidence equals the angle of reflection) is shown in figure 5.5. Show that a parametric specification of the caustic is
$$x = R\cos\theta\left(\tfrac{1}{2} + \sin^2\theta\right), \qquad y = R\sin^3\theta,$$
where $R$ is the radius of curvature of the mirror. The curve is, in fact, part of an epicycloid.

Figure 5.5 The reflecting mirror discussed in exercise 5.24.

5.25 By considering the differential
$$dG = d(U + PV - ST),$$
where $G$ is the Gibbs free energy, $P$ the pressure, $V$ the volume, $S$ the entropy and $T$ the temperature of a system, and given further that the internal energy $U$ satisfies
$$dU = T\,dS - P\,dV,$$
derive a Maxwell relation connecting $(\partial V/\partial T)_P$ and $(\partial S/\partial P)_T$.

5.26 Functions $P(V, T)$, $U(V, T)$ and $S(V, T)$ are related by
$$T\,dS = dU + P\,dV,$$
where the symbols have the same meaning as in the previous question. The pressure $P$ is known from experiment to have the form
$$P = \frac{T^4}{3} + \frac{T}{V},$$
in appropriate units. If
$$U = \alpha V T^4 + \beta T,$$
where $\alpha$, $\beta$ are constants (or, at least, do not depend on $T$ or $V$), deduce that $\alpha$ must have a specific value, but that $\beta$ may have any value. Find the corresponding form of $S$.

5.27 As in the previous two exercises on the thermodynamics of a simple gas, the quantity $dS = T^{-1}(dU + P\,dV)$ is an exact differential. Use this to prove that
$$\left(\frac{\partial U}{\partial V}\right)_T = T\left(\frac{\partial P}{\partial T}\right)_V - P.$$
In the van der Waals model of a gas, $P$ obeys the equation
$$P = \frac{RT}{V - b} - \frac{a}{V^2},$$
where $R$, $a$ and $b$ are constants. Further, in the limit $V \to \infty$, the form of $U$ becomes $U = cT$, where $c$ is another constant. Find the complete expression for $U(V, T)$.

5.28 The entropy $S(H, T)$, the magnetisation $M(H, T)$ and the internal energy $U(H, T)$ of a magnetic salt placed in a magnetic field of strength $H$, at temperature $T$, are connected by the equation
$$T\,dS = dU - H\,dM.$$
By considering $d(U - TS - HM)$ prove that
$$\left(\frac{\partial M}{\partial T}\right)_H = \left(\frac{\partial S}{\partial H}\right)_T.$$
For a particular salt,
$$M(H, T) = M_0[1 - \exp(-\alpha H/T)].$$
Show that if, at a fixed temperature, the applied field is increased from zero to a strength such that the magnetisation of the salt is $\tfrac{3}{4}M_0$, then the salt's entropy decreases by an amount
$$\frac{M_0}{4\alpha}(3 - \ln 4).$$

5.29 Using the results of section 5.12, evaluate the integral
$$I(y) = \int_0^\infty \frac{e^{-xy}\sin x}{x}\,dx.$$
Hence show that
$$J = \int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$

5.30 The integral
$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx$$
has the value $(\pi/\alpha)^{1/2}$. Use this result to evaluate
$$J(n) = \int_{-\infty}^{\infty} x^{2n} e^{-x^2}\,dx,$$
where $n$ is a positive integer. Express your answer in terms of factorials.

5.31 The function $f(x)$ is differentiable and $f(0) = 0$. A second function $g(y)$ is defined by
$$g(y) = \int_0^y \frac{f(x)\,dx}{\sqrt{y - x}}.$$
Prove that
$$\frac{dg}{dy} = \int_0^y \frac{df}{dx}\,\frac{dx}{\sqrt{y - x}}.$$
For the case $f(x) = x^n$, prove that
$$\frac{d^n g}{dy^n} = 2(n!)\sqrt{y}.$$

5.32 The functions $f(x, t)$ and $F(x)$ are defined by
$$f(x, t) = e^{-xt}, \qquad F(x) = \int_0^x f(x, t)\,dt.$$
Verify, by explicit calculation, that
$$\frac{dF}{dx} = f(x, x) + \int_0^x \frac{\partial f(x, t)}{\partial x}\,dt.$$

5.33 If
$$I(\alpha) = \int_0^1 \frac{x^\alpha - 1}{\ln x}\,dx, \qquad \alpha > -1,$$
what is the value of $I(0)$? Show that
$$\frac{d}{d\alpha}x^\alpha = x^\alpha\ln x,$$
and deduce that
$$\frac{d}{d\alpha}I(\alpha) = \frac{1}{\alpha + 1}.$$
Hence prove that $I(\alpha) = \ln(1 + \alpha)$.

5.34 Find the derivative, with respect to $x$, of the integral
$$I(x) = \int_x^{3x} \exp xt\,dt.$$
5.35 The function $G(t, \xi)$ is defined for $0 \le t \le \pi$ by
$$G(t, \xi) = \begin{cases} -\cos t\,\sin\xi & \text{for } \xi \le t,\\ -\sin t\,\cos\xi & \text{for } \xi > t. \end{cases}$$
Show that the function $x(t)$ defined by
$$x(t) = \int_0^\pi G(t, \xi) f(\xi)\,d\xi$$
satisfies the equation
$$\frac{d^2 x}{dt^2} + x = f(t),$$
where $f(t)$ can be any arbitrary (continuous) function. Show further that $x(0) = [dx/dt]_{t=\pi} = 0$, again for any $f(t)$, but that the value of $x(\pi)$ does depend upon the form of $f(t)$.
[The function $G(t, \xi)$ is an example of a Green's function, an important concept in the solution of differential equations and one studied extensively in later chapters.]

5.14 Hints and answers

5.1 (a) (i) $2xy$, $x^2$; (ii) $2x$, $2y$; (iii) $y^{-1}\cos(x/y)$, $(-x/y^2)\cos(x/y)$; (iv) $-y/(x^2 + y^2)$, $x/(x^2 + y^2)$; (v) $x/r$, $y/r$, $z/r$.
(b) (i) $2y$, $0$, $2x$; (ii) $2$, $2$, $0$; (v) $(y^2 + z^2)r^{-3}$, $(x^2 + z^2)r^{-3}$, $-xyr^{-3}$.
(c) Both second derivatives are equal to $(y^2 - x^2)(x^2 + y^2)^{-2}$.

5.3 $2x \neq -2y - x$. For $g$, both sides of equation (5.9) equal $y^{-2}$.

5.5 $\partial^2 z/\partial x^2 = 2xz(z^2 + x)^{-3}$, $\partial^2 z/\partial x\,\partial y = (z^2 - x)(z^2 + x)^{-3}$, $\partial^2 z/\partial y^2 = -2z(z^2 + x)^{-3}$.

5.7 $(0, 0)$, $(a/4, -a)$ and $(16a, -8a)$. Only the saddle point at $(0, 0)$.

5.9 The transformed equation is $2(x^2 + y^2)\,\partial f/\partial v = 0$; hence $f$ does not depend on $v$.

5.11 Maxima, equal to $1/8$, at $\pm(1/2, -1/2)$, minima, equal to $-1/8$, at $\pm(1/2, 1/2)$, saddle points, equalling $0$, at $(0, 0)$, $(0, \pm 1)$, $(\pm 1, 0)$.

5.13 Maxima equal to $a^2 e^{-1}$ at $(\pm a, 0)$, minima equal to $-2a^2 e^{-1}$ at $(0, \pm a)$, saddle point equalling $0$ at $(0, 0)$.

5.15 Minimum at $(0, 0)$; saddle points at $(\pm 1, \pm 1)$. To help with sketching the contours, determine the behaviour of $g(x) = f(x, x)$.

5.17 The Lagrange multiplier method gives $z = y = x/2$, for a maximal area of 4.

5.19 The cost always includes $2\alpha h$, which can therefore be ignored in the optimisation. With Lagrange multiplier $\lambda$, $\sin\theta = \lambda w/(4\beta)$ and $\beta\sec\theta - \tfrac{1}{2}\lambda w\tan\theta = \lambda h$, leading to the stated results.

5.21 The envelope of the lines $x/a + y/(c - a) - 1 = 0$, as $a$ is varied, is $\sqrt{x} + \sqrt{y} = \sqrt{c}$. Area $= c^2/6$.
5.23 (a) Using $\alpha = \cot\theta$, where $\theta$ is the initial angle a jet makes with the vertical, the equation is $f(z, \rho, \alpha) = z - \rho\alpha + [g\rho^2(1 + \alpha^2)/(2v_0^2)]$, and setting $\partial f/\partial\alpha = 0$ gives $\alpha = v_0^2/(g\rho)$. The water bell has a parabolic profile $z = v_0^2/(2g) - g\rho^2/(2v_0^2)$.
(b) Setting $z = 0$ gives the minimum diameter as $2v_0^2/g$.

5.25 Show that $(\partial G/\partial P)_T = V$ and $(\partial G/\partial T)_P = -S$. From each result, obtain an expression for $\partial^2 G/\partial T\,\partial P$ and equate these, giving $(\partial V/\partial T)_P = -(\partial S/\partial P)_T$.

5.27 Find expressions for $(\partial S/\partial V)_T$ and $(\partial S/\partial T)_V$, and equate $\partial^2 S/\partial V\,\partial T$ with $\partial^2 S/\partial T\,\partial V$. $U(V, T) = cT - aV^{-1}$.

5.29 $dI/dy = -\mathrm{Im}\left[\int_0^\infty \exp(-xy + ix)\,dx\right] = -1/(1 + y^2)$. Integrate $dI/dy$ from $0$ to $\infty$. $I(\infty) = 0$ and $I(0) = J$.

5.31 Integrate the RHS of the equation by parts, before differentiating with respect to $y$. Repeated application of the method establishes the result for all orders of derivative.

5.33 $I(0) = 0$; use Leibnitz' rule.

5.35 Write $x(t) = -\cos t\int_0^t \sin\xi\,f(\xi)\,d\xi - \sin t\int_t^\pi \cos\xi\,f(\xi)\,d\xi$ and differentiate each term as a product to obtain $dx/dt$. Obtain $d^2x/dt^2$ in a similar way. Note that integrals that have equal lower and upper limits have value zero. The value of $x(\pi)$ is $\int_0^\pi \sin\xi\,f(\xi)\,d\xi$.

6 Multiple integrals

For functions of several variables, just as we may consider derivatives with respect to two or more of them, so may the integral of the function with respect to more than one variable be formed. The formal definitions of such multiple integrals are extensions of that for a single variable, discussed in chapter 2. We first discuss double and triple integrals and illustrate some of their applications. We then consider changing the variables in multiple integrals and discuss some general properties of Jacobians.

6.1 Double integrals

For an integral involving two variables – a double integral – we have a function, $f(x, y)$ say, to be integrated with respect to $x$ and $y$ between certain limits.
These limits can usually be represented by a closed curve $C$ bounding a region $R$ in the $xy$-plane. Following the discussion of single integrals given in chapter 2, let us divide the region $R$ into $N$ subregions $\Delta R_p$ of area $\Delta A_p$, $p = 1, 2, \ldots, N$, and let $(x_p, y_p)$ be any point in subregion $\Delta R_p$. Now consider the sum
$$S = \sum_{p=1}^{N} f(x_p, y_p)\,\Delta A_p,$$
and let $N \to \infty$ as each of the areas $\Delta A_p \to 0$. If the sum $S$ tends to a unique limit, $I$, then this is called the double integral of $f(x, y)$ over the region $R$ and is written
$$I = \int_R f(x, y)\,dA, \tag{6.1}$$
where $dA$ stands for the element of area in the $xy$-plane. By choosing the subregions to be small rectangles each of area $\Delta A = \Delta x\,\Delta y$, and letting both $\Delta x$ and $\Delta y \to 0$, we can also write the integral as
$$I = \iint_R f(x, y)\,dx\,dy, \tag{6.2}$$
where we have written out the element of area explicitly as the product of the two coordinate differentials (see figure 6.1).

Figure 6.1 A simple curve $C$ in the $xy$-plane, enclosing a region $R$.

Some authors use a single integration symbol whatever the dimension of the integral; others use as many symbols as the dimension. In different circumstances both have their advantages. We will adopt the convention used in (6.1) and (6.2), that as many integration symbols will be used as differentials explicitly written.

The form (6.2) gives us a clue as to how we may proceed in the evaluation of a double integral. Referring to figure 6.1, the limits on the integration may be written as an equation $c(x, y) = 0$ giving the boundary curve $C$. However, an explicit statement of the limits can be written in two distinct ways.

One way of evaluating the integral is first to sum up the contributions from the small rectangular elemental areas in a horizontal strip of width $dy$ (as shown in the figure) and then to combine the contributions of these horizontal strips to cover the region $R$.
In this case, we write
$$I = \int_{y=c}^{y=d}\left\{\int_{x=x_1(y)}^{x=x_2(y)} f(x, y)\,dx\right\}dy, \tag{6.3}$$
where $x = x_1(y)$ and $x = x_2(y)$ are the equations of the curves $TSV$ and $TUV$ respectively. This expression indicates that first $f(x, y)$ is to be integrated with respect to $x$ (treating $y$ as a constant) between the values $x = x_1(y)$ and $x = x_2(y)$ and then the result, considered as a function of $y$, is to be integrated between the limits $y = c$ and $y = d$. Thus the double integral is evaluated by expressing it in terms of two single integrals called iterated (or repeated) integrals.

An alternative way of evaluating the integral, however, is first to sum up the contributions from the elemental rectangles arranged into vertical strips and then to combine these vertical strips to cover the region $R$. We then write
$$I = \int_{x=a}^{x=b}\left\{\int_{y=y_1(x)}^{y=y_2(x)} f(x, y)\,dy\right\}dx, \tag{6.4}$$
where $y = y_1(x)$ and $y = y_2(x)$ are the equations of the curves $STU$ and $SVU$ respectively. In going to (6.4) from (6.3), we have essentially interchanged the order of integration.

In the discussion above we assumed that the curve $C$ was such that any line parallel to either the $x$- or $y$-axis intersected $C$ at most twice. In general, provided $f(x, y)$ is continuous everywhere in $R$ and the boundary curve $C$ has this simple shape, the same result is obtained irrespective of the order of integration. In cases where the region $R$ has a more complicated shape, it can usually be subdivided into smaller simpler regions $R_1$, $R_2$ etc. that satisfy this criterion. The double integral over $R$ is then merely the sum of the double integrals over the subregions.

▶ Evaluate the double integral
$$I = \iint_R x^2 y\,dx\,dy,$$
where $R$ is the triangular area bounded by the lines $x = 0$, $y = 0$ and $x + y = 1$. Reverse the order of integration and demonstrate that the same result is obtained.
The area of integration is shown in figure 6.2. Suppose we choose to carry out the integration with respect to $y$ first. With $x$ fixed, the range of $y$ is $0$ to $1 - x$. We can therefore write
$$I = \int_{x=0}^{x=1}\left\{\int_{y=0}^{y=1-x} x^2 y\,dy\right\}dx = \int_{x=0}^{x=1}\left[\frac{x^2 y^2}{2}\right]_{y=0}^{y=1-x} dx = \int_0^1 \frac{x^2(1 - x)^2}{2}\,dx = \frac{1}{60}.$$
Alternatively, we may choose to perform the integration with respect to $x$ first. With $y$ fixed, the range of $x$ is $0$ to $1 - y$, so we have
$$I = \int_{y=0}^{y=1}\left\{\int_{x=0}^{x=1-y} x^2 y\,dx\right\}dy = \int_{y=0}^{y=1}\left[\frac{x^3 y}{3}\right]_{x=0}^{x=1-y} dy = \int_0^1 \frac{(1 - y)^3 y}{3}\,dy = \frac{1}{60}.$$
As expected, we obtain the same result irrespective of the order of integration. ◀

Figure 6.2 The triangular region whose sides are the axes $x = 0$, $y = 0$ and the line $x + y = 1$.

We may avoid the use of braces in expressions such as (6.3) and (6.4) by writing (6.4), for example, as
$$I = \int_a^b dx\int_{y_1(x)}^{y_2(x)} dy\,f(x, y),$$
where it is understood that each integral symbol acts on everything to its right, and that the order of integration is from right to left. So, in this example, the integrand $f(x, y)$ is first to be integrated with respect to $y$ and then with respect to $x$. With the double integral expressed in this way, we will no longer write the independent variables explicitly in the limits of integration, since the differential of the variable with respect to which we are integrating is always adjacent to the relevant integral sign.

Using the order of integration in (6.3), we could also write the double integral as
$$I = \int_c^d dy\int_{x_1(y)}^{x_2(y)} dx\,f(x, y).$$
Occasionally, however, interchange of the order of integration in a double integral is not permissible, as it yields a different result.
For example, difficulties might arise if the region $R$ were unbounded with some of the limits infinite, though in many cases involving infinite limits the same result is obtained whichever order of integration is used. Difficulties can also occur if the integrand $f(x, y)$ has any discontinuities in the region $R$ or on its boundary $C$.

6.2 Triple integrals

The above discussion for double integrals can easily be extended to triple integrals. Consider the function $f(x, y, z)$ defined in a closed three-dimensional region $R$. Proceeding as we did for double integrals, let us divide the region $R$ into $N$ subregions $\Delta R_p$ of volume $\Delta V_p$, $p = 1, 2, \ldots, N$, and let $(x_p, y_p, z_p)$ be any point in the subregion $\Delta R_p$. Now we form the sum
$$S = \sum_{p=1}^{N} f(x_p, y_p, z_p)\,\Delta V_p,$$
and let $N \to \infty$ as each of the volumes $\Delta V_p \to 0$. If the sum $S$ tends to a unique limit, $I$, then this is called the triple integral of $f(x, y, z)$ over the region $R$ and is written
$$I = \int_R f(x, y, z)\,dV, \tag{6.5}$$
where $dV$ stands for the element of volume. By choosing the subregions to be small cuboids, each of volume $\Delta V = \Delta x\,\Delta y\,\Delta z$, and proceeding to the limit, we can also write the integral as
$$I = \iiint_R f(x, y, z)\,dx\,dy\,dz, \tag{6.6}$$
where we have written out the element of volume explicitly as the product of the three coordinate differentials. Extending the notation used for double integrals, we may write triple integrals as three iterated integrals, for example,
$$I = \int_{x_1}^{x_2} dx\int_{y_1(x)}^{y_2(x)} dy\int_{z_1(x, y)}^{z_2(x, y)} dz\,f(x, y, z),$$
where the limits on each of the integrals describe the values that $x$, $y$ and $z$ take on the boundary of the region $R$. As for double integrals, in most cases the order of integration does not affect the value of the integral. We can extend these ideas to define multiple integrals of higher dimensionality in a similar way.
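The iterated-integral recipe of section 6.1 lends itself to a direct numerical check. The following is a minimal sketch, not a recommended quadrature scheme: it approximates $\iint_R x^2 y\,dx\,dy$ over the triangle bounded by $x = 0$, $y = 0$ and $x + y = 1$ by nested midpoint sums, once with $y$ as the inner variable and once with $x$, and both should approach the value $1/60$ found in the worked example. The function name `iterated_midpoint` and the grid size `n` are assumptions made for the illustration.

```python
def iterated_midpoint(f, outer_lo, outer_hi, inner_lo, inner_hi, n=400):
    """Approximate an iterated integral of f(outer, inner) by midpoint sums.

    inner_lo and inner_hi are functions of the outer variable, mirroring
    the limits y1(x), y2(x) (or x1(y), x2(y)) of an iterated integral.
    """
    h_out = (outer_hi - outer_lo) / n
    total = 0.0
    for i in range(n):
        u = outer_lo + (i + 0.5) * h_out          # midpoint of the outer strip
        lo, hi = inner_lo(u), inner_hi(u)
        h_in = (hi - lo) / n
        inner = sum(f(u, lo + (j + 0.5) * h_in) for j in range(n)) * h_in
        total += inner * h_out
    return total

# Integrate over y first (inner variable), then over x.
y_first = iterated_midpoint(lambda x, y: x * x * y, 0.0, 1.0,
                            lambda x: 0.0, lambda x: 1.0 - x)
# Integrate over x first (inner variable), then over y.
x_first = iterated_midpoint(lambda y, x: x * x * y, 0.0, 1.0,
                            lambda y: 0.0, lambda y: 1.0 - y)
print(y_first, x_first, 1.0 / 60.0)
```

A triple integral could be treated in the same spirit by adding a third nested sum whose limits depend on the two outer variables.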
6.3 Applications of multiple integrals

Multiple integrals have many uses in the physical sciences, since there are numerous physical quantities which can be written in terms of them. We now discuss a few of the more common examples.

6.3.1 Areas and volumes

Multiple integrals are often used in finding areas and volumes. For example, the integral
$$A = \int_R dA = \iint_R dx\,dy$$
is simply equal to the area of the region $R$. Similarly, if we consider the surface $z = f(x,y)$ in three-dimensional Cartesian coordinates then the volume under this surface that stands vertically above the region $R$ is given by the integral
$$V = \int_R z\,dA = \iint_R f(x,y)\,dx\,dy,$$
where volumes above the $xy$-plane are counted as positive, and those below as negative.

Figure 6.3 The tetrahedron bounded by the coordinate surfaces and the plane $x/a + y/b + z/c = 1$ is divided up into vertical slabs, the slabs into columns and the columns into small boxes.

▶ Find the volume of the tetrahedron bounded by the three coordinate surfaces $x = 0$, $y = 0$ and $z = 0$ and the plane $x/a + y/b + z/c = 1$.

Referring to figure 6.3, the elemental volume of the shaded region is given by $dV = z\,dx\,dy$, and we must integrate over the triangular region $R$ in the $xy$-plane whose sides are $x = 0$, $y = 0$ and $y = b - bx/a$. The total volume of the tetrahedron is therefore given by
$$V = \iint_R z\,dx\,dy = \int_0^a dx \int_0^{b-bx/a} dy\; c\left(1 - \frac{y}{b} - \frac{x}{a}\right) = c\int_0^a dx \left[y - \frac{y^2}{2b} - \frac{xy}{a}\right]_{y=0}^{y=b-bx/a} = c\int_0^a dx \left(\frac{bx^2}{2a^2} - \frac{bx}{a} + \frac{b}{2}\right) = \frac{abc}{6}.$$
◀

Alternatively, we can write the volume of a three-dimensional region $R$ as
$$V = \int_R dV = \iiint_R dx\,dy\,dz, \qquad (6.7)$$
where the only difficulty occurs in setting the correct limits on each of the integrals. For the above example, writing the volume in this way corresponds to dividing the tetrahedron into elemental boxes of volume $dx\,dy\,dz$ (as shown in figure 6.3); integration over $z$ then adds up the boxes to form the shaded column in the figure. The limits of integration are $z = 0$ to $z = c\left(1 - y/b - x/a\right)$, and the total volume of the tetrahedron is given by
$$V = \int_0^a dx \int_0^{b-bx/a} dy \int_0^{c(1-y/b-x/a)} dz, \qquad (6.8)$$
which clearly gives the same result as above. This method is illustrated further in the following example.

▶ Find the volume of the region bounded by the paraboloid $z = x^2 + y^2$ and the plane $z = 2y$.

The required region is shown in figure 6.4. In order to write the volume of the region in the form (6.7), we must deduce the limits on each of the integrals. Since the integrations can be performed in any order, let us first divide the region into vertical slabs of thickness $dy$ perpendicular to the $y$-axis, and then as shown in the figure we cut each slab into horizontal strips of height $dz$, and each strip into elemental boxes of volume $dV = dx\,dy\,dz$. Integrating first with respect to $x$ (adding up the elemental boxes to get a horizontal strip), the limits on $x$ are $x = -\sqrt{z - y^2}$ to $x = \sqrt{z - y^2}$. Now integrating with respect to $z$ (adding up the strips to form a vertical slab) the limits on $z$ are $z = y^2$ to $z = 2y$. Finally, integrating with respect to $y$ (adding up the slabs to obtain the required region), the limits on $y$ are $y = 0$ and $y = 2$, the solutions of the simultaneous equations $z = 0^2 + y^2$ and $z = 2y$.
So the volume of the region is
$$V = \int_0^2 dy \int_{y^2}^{2y} dz \int_{-\sqrt{z-y^2}}^{\sqrt{z-y^2}} dx = \int_0^2 dy \int_{y^2}^{2y} dz\; 2\sqrt{z - y^2} = \int_0^2 dy \left[\tfrac{4}{3}(z - y^2)^{3/2}\right]_{z=y^2}^{z=2y} = \int_0^2 dy\; \tfrac{4}{3}(2y - y^2)^{3/2}.$$
The integral over $y$ may be evaluated straightforwardly by making the substitution $y = 1 + \sin u$, and gives $V = \pi/2$. ◀

Figure 6.4 The region bounded by the paraboloid $z = x^2 + y^2$ and the plane $z = 2y$ is divided into vertical slabs, the slabs into horizontal strips and the strips into boxes.

In general, when calculating the volume (area) of a region, the volume (area) elements need not be small boxes as in the previous example, but may be of any convenient shape. The latter is usually chosen to make evaluation of the integral as simple as possible.

6.3.2 Masses, centres of mass and centroids

It is sometimes necessary to calculate the mass of a given object having a non-uniform density. Symbolically, this mass is given simply by
$$M = \int dM,$$
where $dM$ is the element of mass and the integral is taken over the extent of the object. For a solid three-dimensional body the element of mass is just $dM = \rho\,dV$, where $dV$ is an element of volume and $\rho$ is the variable density. For a laminar body (i.e. a uniform sheet of material) the element of mass is $dM = \sigma\,dA$, where $\sigma$ is the mass per unit area of the body and $dA$ is an area element. Finally, for a body in the form of a thin wire we have $dM = \lambda\,ds$, where $\lambda$ is the mass per unit length and $ds$ is an element of arc length along the wire. When evaluating the required integral, we are free to divide up the body into mass elements in the most convenient way, provided that over each mass element the density is approximately constant.
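The two volume calculations of subsection 6.3.1 can be checked numerically. The sketch below, under the assumption that a simple midpoint rule is accurate enough for the purpose, evaluates the tetrahedron volume as a double integral of the column height and the paraboloid volume from the final single integral of the worked example. The function names and the sample values of $a$, $b$, $c$ are illustrative and do not come from the text.

```python
import math


def midpoint(f, a, b, n=200):
    """Composite midpoint rule for the integral of f over [a, b] (illustrative helper)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def tetrahedron_volume(a, b, c, n=200):
    # V = int_0^a dx int_0^{b - b x / a} dy  c (1 - y/b - x/a):
    # the column height z integrated over the triangular base.
    return midpoint(
        lambda x: midpoint(lambda y: c * (1 - y / b - x / a), 0.0, b - b * x / a, n),
        0.0, a, n)


def paraboloid_volume(n=2000):
    # V = int_0^2 (4/3) (2y - y^2)^(3/2) dy, with 2y - y^2 written as y (2 - y).
    return midpoint(lambda y: (4.0 / 3.0) * (y * (2.0 - y)) ** 1.5, 0.0, 2.0, n)


if __name__ == "__main__":
    print(tetrahedron_volume(2.0, 3.0, 5.0), 2.0 * 3.0 * 5.0 / 6.0)
    print(paraboloid_volume(), math.pi / 2)
```

The printed pairs should agree closely with $abc/6$ and $\pi/2$ respectively.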
▶ Find the mass of the tetrahedron bounded by the three coordinate surfaces and the plane $x/a + y/b + z/c = 1$, if its density is given by $\rho(x,y,z) = \rho_0(1 + x/a)$.

From (6.8), we can immediately write down the mass of the tetrahedron as
$$M = \int_R \rho_0\left(1 + \frac{x}{a}\right) dV = \int_0^a dx\, \rho_0\left(1 + \frac{x}{a}\right) \int_0^{b-bx/a} dy \int_0^{c(1-y/b-x/a)} dz,$$
where we have taken the density outside the integrations with respect to $z$ and $y$ since it depends only on $x$. Therefore the integrations with respect to $z$ and $y$ proceed exactly as they did when finding the volume of the tetrahedron, and we have
$$M = c\rho_0 \int_0^a dx \left(1 + \frac{x}{a}\right)\left(\frac{bx^2}{2a^2} - \frac{bx}{a} + \frac{b}{2}\right). \qquad (6.9)$$
We could have arrived at (6.9) more directly by dividing the tetrahedron into triangular slabs of thickness $dx$ perpendicular to the $x$-axis (see figure 6.3), each of which is of constant density, since $\rho$ depends on $x$ alone. A slab at a position $x$ has volume $dV = \tfrac{1}{2}c(1 - x/a)(b - bx/a)\,dx$ and mass $dM = \rho\,dV = \rho_0(1 + x/a)\,dV$. Integrating over $x$ we again obtain (6.9). This integral is easily evaluated and gives $M = \tfrac{5}{24}abc\rho_0$. ◀

The coordinates of the centre of mass of a solid or laminar body may also be written as multiple integrals. The centre of mass of a body has coordinates $\bar{x}$, $\bar{y}$, $\bar{z}$ given by the three equations
$$\bar{x}\int dM = \int x\,dM, \qquad \bar{y}\int dM = \int y\,dM, \qquad \bar{z}\int dM = \int z\,dM,$$
where again $dM$ is an element of mass as described above, $x$, $y$, $z$ are the coordinates of the centre of mass of the element $dM$ and the integrals are taken over the entire body. Obviously, for any body that lies entirely in, or is symmetrical about, the $xy$-plane (say), we immediately have $\bar{z} = 0$.
For completeness, we note that the three equations above can be written as the single vector equation (see chapter 7)
$$\bar{\mathbf{r}} = \frac{1}{M}\int \mathbf{r}\,dM,$$
where $\bar{\mathbf{r}}$ is the position vector of the body's centre of mass with respect to the origin, $\mathbf{r}$ is the position vector of the centre of mass of the element $dM$ and $M = \int dM$ is the total mass of the body. As previously, we may divide the body into the most convenient mass elements for evaluating the necessary integrals, provided each mass element is of constant density. We further note that the coordinates of the centroid of a body are defined as those of its centre of mass if the body had uniform density.

▶ Find the centre of mass of the solid hemisphere bounded by the surfaces $x^2 + y^2 + z^2 = a^2$ and the $xy$-plane, assuming that it has a uniform density $\rho$.

Referring to figure 6.5, we know from symmetry that the centre of mass must lie on the $z$-axis. Let us divide the hemisphere into volume elements that are circular slabs of thickness $dz$ parallel to the $xy$-plane. For a slab at a height $z$, the mass of the element is $dM = \rho\,dV = \rho\pi(a^2 - z^2)\,dz$. Integrating over $z$, we find that the $z$-coordinate of the centre of mass of the hemisphere is given by
$$\bar{z}\int_0^a \rho\pi(a^2 - z^2)\,dz = \int_0^a z\rho\pi(a^2 - z^2)\,dz.$$
The integrals are easily evaluated and give $\bar{z} = 3a/8$. Since the hemisphere is of uniform density, this is also the position of its centroid. ◀

6.3.3 Pappus' theorems

The theorems of Pappus (which are about seventeen centuries old) relate centroids to volumes of revolution and areas of surfaces, discussed in chapter 2, and may be useful for finding one quantity given another that can be calculated more easily.

Figure 6.5 The solid hemisphere bounded by the surfaces $x^2 + y^2 + z^2 = a^2$ and the $xy$-plane.

Figure 6.6 An area $A$ in the $xy$-plane, which may be rotated about the $x$-axis to form a volume of revolution.
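The mass and centre-of-mass examples of subsection 6.3.2 lend themselves to a quick numerical check. The following sketch assumes a midpoint rule and uses illustrative function names and sample dimensions that are not part of the text: `tetrahedron_mass` evaluates the single integral (6.9), and `hemisphere_zbar` forms the ratio of the two slab integrals for the hemisphere.

```python
import math


def midpoint(f, a, b, n=1000):
    """Composite midpoint rule for the integral of f over [a, b] (illustrative helper)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def tetrahedron_mass(a, b, c, rho0):
    # Equation (6.9): M = c rho0 int_0^a (1 + x/a) (b x^2 / (2 a^2) - b x / a + b / 2) dx.
    return c * rho0 * midpoint(
        lambda x: (1 + x / a) * (b * x * x / (2 * a * a) - b * x / a + b / 2), 0.0, a)


def hemisphere_zbar(a, rho=1.0):
    # Circular slabs of thickness dz: dM = rho * pi * (a^2 - z^2) dz.
    mass = midpoint(lambda z: rho * math.pi * (a * a - z * z), 0.0, a)
    moment = midpoint(lambda z: z * rho * math.pi * (a * a - z * z), 0.0, a)
    return moment / mass


if __name__ == "__main__":
    print(tetrahedron_mass(2.0, 3.0, 5.0, 1.5), 5.0 / 24.0 * 2.0 * 3.0 * 5.0 * 1.5)
    print(hemisphere_zbar(2.0), 3.0 * 2.0 / 8.0)
```

The uniform density cancels in the ratio for $\bar{z}$, which is why the result coincides with the centroid.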
If a plane area is rotated about an axis that does not intersect it then the solid so generated is called a volume of revolution. Pappus' first theorem states that the volume of such a solid is given by the plane area $A$ multiplied by the distance moved by its centroid (see figure 6.6). This may be proved by considering the definition of the centroid of the plane area as the position of the centre of mass if the density is uniform, so that
$$\bar{y} = \frac{1}{A}\int y\,dA.$$
Now the volume generated by rotating the plane area about the $x$-axis is given by
$$V = \int 2\pi y\,dA = 2\pi\bar{y}A,$$
which is the area multiplied by the distance moved by the centroid.

Figure 6.7 A curve in the $xy$-plane, which may be rotated about the $x$-axis to form a surface of revolution.

Pappus' second theorem states that if a plane curve is rotated about a coplanar axis that does not intersect it then the area of the surface of revolution so generated is given by the length of the curve $L$ multiplied by the distance moved by its centroid (see figure 6.7). This may be proved in a similar manner to the first theorem by considering the definition of the centroid of a plane curve,
$$\bar{y} = \frac{1}{L}\int y\,ds,$$
and noting that the surface area generated is given by
$$S = \int 2\pi y\,ds = 2\pi\bar{y}L,$$
which is equal to the length of the curve multiplied by the distance moved by its centroid.

▶ A semicircular uniform lamina is freely suspended from one of its corners. Show that its straight edge makes an angle of $23.0^\circ$ with the vertical.

Referring to figure 6.8, the suspended lamina will have its centre of gravity $C$ vertically below the suspension point and its straight edge will make an angle $\theta = \tan^{-1}(d/a)$ with the vertical, where $2a$ is the diameter of the semicircle and $d$ is the distance of its centre of mass from the diameter.
Since rotating the lamina about the diameter generates a sphere of volume $\tfrac{4}{3}\pi a^3$, Pappus' first theorem requires that
$$\tfrac{4}{3}\pi a^3 = 2\pi d \times \tfrac{1}{2}\pi a^2.$$
Hence $d = \dfrac{4a}{3\pi}$ and $\theta = \tan^{-1}\dfrac{4}{3\pi} = 23.0^\circ$. ◀

Figure 6.8 Suspending a semicircular lamina from one of its corners.

6.3.4 Moments of inertia

For problems in rotational mechanics it is often necessary to calculate the moment of inertia of a body about a given axis. This is defined by the multiple integral
$$I = \int l^2\,dM,$$
where $l$ is the distance of a mass element $dM$ from the axis. We may again choose mass elements convenient for evaluating the integral. In this case, however, in addition to elements of constant density we require all parts of each element to be at approximately the same distance from the axis about which the moment of inertia is required.

▶ Find the moment of inertia of a uniform rectangular lamina of mass $M$ with sides $a$ and $b$ about one of the sides of length $b$.

Referring to figure 6.9, we wish to calculate the moment of inertia about the $y$-axis. We therefore divide the rectangular lamina into elemental strips parallel to the $y$-axis of width $dx$. The mass of such a strip is $dM = \sigma b\,dx$, where $\sigma$ is the mass per unit area of the lamina. The moment of inertia of a strip at a distance $x$ from the $y$-axis is simply $dI = x^2\,dM = \sigma b x^2\,dx$. The total moment of inertia of the lamina about the $y$-axis is therefore
$$I = \int_0^a \sigma b x^2\,dx = \frac{\sigma b a^3}{3}.$$
Since the total mass of the lamina is $M = \sigma ab$, we can write $I = \tfrac{1}{3}Ma^2$. ◀

Figure 6.9 A uniform rectangular lamina of mass $M$ with sides $a$ and $b$ can be divided into vertical strips.

6.3.5 Mean values of functions

In chapter 2 we discussed average values for functions of a single variable. This is easily extended to functions of several variables.
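Looking back at the worked examples of subsections 6.3.3 and 6.3.4, the short sketch below recomputes the suspension angle of the semicircular lamina from Pappus' first theorem and sums strip contributions for the rectangular lamina. It is only an illustration: the function names, the strip count and the sample masses and lengths are assumptions made here.

```python
import math


def semicircle_centroid_distance(a):
    # Pappus' first theorem: sphere volume = (area of semicircle) * (2 * pi * d).
    sphere_volume = 4.0 / 3.0 * math.pi * a**3
    lamina_area = 0.5 * math.pi * a**2
    return sphere_volume / (2.0 * math.pi * lamina_area)


def suspension_angle_degrees(a=1.0):
    # theta = arctan(d / a), the angle between the straight edge and the vertical.
    return math.degrees(math.atan(semicircle_centroid_distance(a) / a))


def lamina_moment_of_inertia(mass, a, b, n=1000):
    # Sum dI = x^2 dM over strips of width h parallel to the side of length b.
    sigma = mass / (a * b)  # mass per unit area
    h = a / n
    return sum(((i + 0.5) * h) ** 2 * sigma * b * h for i in range(n))


if __name__ == "__main__":
    print(suspension_angle_degrees())
    print(lamina_moment_of_inertia(3.0, 2.0, 5.0), 3.0 * 2.0**2 / 3.0)
```

The angle is independent of $a$, as expected from $\theta = \tan^{-1}(4/3\pi)$, and the strip sum approaches $\tfrac{1}{3}Ma^2$ as the strips are made thinner.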
Let us consider, for example, a function $f(x,y)$ defined in some region $R$ of the $xy$-plane. Then the average value $\bar{f}$ of the function is given by
$$\bar{f}\int_R dA = \int_R f(x,y)\,dA. \qquad (6.10)$$
This definition is easily extended to three (and higher) dimensions; if a function $f(x,y,z)$ is defined in some three-dimensional region of space $R$ then the average value $\bar{f}$ of the function is given by
$$\bar{f}\int_R dV = \int_R f(x,y,z)\,dV. \qquad (6.11)$$

▶ A tetrahedron is bounded by the three coordinate surfaces and the plane $x/a + y/b + z/c = 1$ and has density $\rho(x,y,z) = \rho_0(1 + x/a)$. Find the average value of the density.

From (6.11), the average value of the density is given by
$$\bar{\rho}\int_R dV = \int_R \rho(x,y,z)\,dV.$$
Now the integral on the LHS is just the volume of the tetrahedron, which we found in subsection 6.3.1 to be $V = \tfrac{1}{6}abc$, and the integral on the RHS is its mass $M = \tfrac{5}{24}abc\rho_0$, calculated in subsection 6.3.2. Therefore $\bar{\rho} = M/V = \tfrac{5}{4}\rho_0$. ◀

6.4 Change of variables in multiple integrals

It often happens that, either because of the form of the integrand involved or because of the boundary shape of the region of integration, it is desirable to express a multiple integral in terms of a new set of variables. We now consider how to do this.

Figure 6.10 A region of integration $R$ overlaid with a grid formed by the family of curves $u = \text{constant}$ and $v = \text{constant}$. The parallelogram $KLMN$ defines the area element $dA_{uv}$.

6.4.1 Change of variables in double integrals

Let us begin by examining the change of variables in a double integral. Suppose that we require to change an integral
$$I = \iint_R f(x,y)\,dx\,dy,$$
in terms of coordinates $x$ and $y$, into one expressed in new coordinates $u$ and $v$, given in terms of $x$ and $y$ by differentiable equations $u = u(x,y)$ and $v = v(x,y)$ with inverses $x = x(u,v)$ and $y = y(u,v)$.
The region $R$ in the $xy$-plane and the curve $C$ that bounds it will become a new region $R'$ and a new boundary $C'$ in the $uv$-plane, and so we must change the limits of integration accordingly. Also, the function $f(x,y)$ becomes a new function $g(u,v)$ of the new coordinates.

Now the part of the integral that requires most consideration is the area element. In the $xy$-plane the element is the rectangular area $dA_{xy} = dx\,dy$ generated by constructing a grid of straight lines parallel to the $x$- and $y$-axes respectively. Our task is to determine the corresponding area element in the $uv$-coordinates. In general the corresponding element $dA_{uv}$ will not be the same shape as $dA_{xy}$, but this does not matter since all elements are infinitesimally small and the value of the integrand is considered constant over them. Since the sides of the area element are infinitesimal, $dA_{uv}$ will in general have the shape of a parallelogram. We can find the connection between $dA_{xy}$ and $dA_{uv}$ by considering the grid formed by the family of curves $u = \text{constant}$ and $v = \text{constant}$, as shown in figure 6.10. Since $v$ is constant along the line element $KL$, the latter has components $(\partial x/\partial u)\,du$ and $(\partial y/\partial u)\,du$ in the directions of the $x$- and $y$-axes respectively. Similarly, since $u$ is constant along the line element $KN$, the latter has corresponding components $(\partial x/\partial v)\,dv$ and $(\partial y/\partial v)\,dv$. Using the result for the area of a parallelogram given in chapter 7, we find that the area of the parallelogram $KLMN$ is given by
$$dA_{uv} = \left|\frac{\partial x}{\partial u}du\,\frac{\partial y}{\partial v}dv - \frac{\partial x}{\partial v}dv\,\frac{\partial y}{\partial u}du\right| = \left|\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\right| du\,dv.$$
Defining the Jacobian of $x$, $y$ with respect to $u$, $v$ as
$$J = \frac{\partial(x,y)}{\partial(u,v)} \equiv \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u},$$
we have
$$dA_{uv} = \left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv.$$
The reader acquainted with determinants will notice that the Jacobian can also be written as the $2\times 2$ determinant
$$J = \frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2ex] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{vmatrix}.$$
Such determinants can be evaluated using the methods of chapter 8.

So, in summary, the relationship between the size of the area element generated by $dx$, $dy$ and the size of the corresponding area element generated by $du$, $dv$ is
$$dx\,dy = \left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv.$$
This equality should be taken as meaning that when transforming from coordinates $x,y$ to coordinates $u,v$, the area element $dx\,dy$ should be replaced by the expression on the RHS of the above equality. Of course, the Jacobian can, and in general will, vary over the region of integration. We may express the double integral in either coordinate system as
$$I = \iint_R f(x,y)\,dx\,dy = \iint_{R'} g(u,v)\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv. \qquad (6.12)$$
When evaluating the integral in the new coordinate system, it is usually advisable to sketch the region of integration $R'$ in the $uv$-plane.

▶ Evaluate the double integral
$$I = \iint_R \left(a + \sqrt{x^2 + y^2}\right) dx\,dy,$$
where $R$ is the region bounded by the circle $x^2 + y^2 = a^2$.
In Cartesian coordinates, the integral may be written
$$I = \int_{-a}^{a} dx \int_{-\sqrt{a^2-x^2}}^{\sqrt{a^2-x^2}} dy \left(a + \sqrt{x^2 + y^2}\right),$$
and can be calculated directly. However, because of the circular boundary of the integration region, a change of variables to plane polar coordinates $\rho$, $\phi$ is indicated. The relationship between Cartesian and plane polar coordinates is given by $x = \rho\cos\phi$ and $y = \rho\sin\phi$. Using (6.12) we can therefore write
$$I = \iint_{R'} (a + \rho)\left|\frac{\partial(x,y)}{\partial(\rho,\phi)}\right| d\rho\,d\phi,$$
where $R'$ is the rectangular region in the $\rho\phi$-plane whose sides are $\rho = 0$, $\rho = a$, $\phi = 0$ and $\phi = 2\pi$. The Jacobian is easily calculated, and we obtain
$$J = \frac{\partial(x,y)}{\partial(\rho,\phi)} = \begin{vmatrix} \cos\phi & \sin\phi \\ -\rho\sin\phi & \rho\cos\phi \end{vmatrix} = \rho(\cos^2\phi + \sin^2\phi) = \rho.$$
So the relationship between the area elements in Cartesian and in plane polar coordinates is
$$dx\,dy = \rho\,d\rho\,d\phi.$$
Therefore, when expressed in plane polar coordinates, the integral is given by
$$I = \iint_{R'} (a + \rho)\rho\,d\rho\,d\phi = \int_0^{2\pi} d\phi \int_0^a d\rho\,(a + \rho)\rho = 2\pi\left[\frac{a\rho^2}{2} + \frac{\rho^3}{3}\right]_0^a = \frac{5\pi a^3}{3}.$$
◀

6.4.2 Evaluation of the integral $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$

By making a judicious change of variables, it is sometimes possible to evaluate an integral that would be intractable otherwise. An important example of this method is provided by the evaluation of the integral
$$I = \int_{-\infty}^{\infty} e^{-x^2}\,dx.$$
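Before proceeding, the polar-coordinate example of subsection 6.4.1 can be confirmed numerically. The sketch below is an illustration under stated assumptions: it estimates the Jacobian $\partial(x,y)/\partial(\rho,\phi)$ by central differences (the step `eps` is an arbitrary choice) and evaluates the transformed integral with a midpoint rule. Neither function name comes from the text.

```python
import math


def polar_jacobian(rho, phi, eps=1e-6):
    """Central-difference estimate of d(x, y)/d(rho, phi) for x = rho cos(phi), y = rho sin(phi)."""
    def x(r, p):
        return r * math.cos(p)

    def y(r, p):
        return r * math.sin(p)

    dx_drho = (x(rho + eps, phi) - x(rho - eps, phi)) / (2 * eps)
    dy_drho = (y(rho + eps, phi) - y(rho - eps, phi)) / (2 * eps)
    dx_dphi = (x(rho, phi + eps) - x(rho, phi - eps)) / (2 * eps)
    dy_dphi = (y(rho, phi + eps) - y(rho, phi - eps)) / (2 * eps)
    return dx_drho * dy_dphi - dx_dphi * dy_drho


def disc_integral(a, n=400):
    # I = int_0^{2 pi} dphi int_0^a drho (a + rho) rho; the integrand has no phi dependence.
    h = a / n
    radial = sum((a + (i + 0.5) * h) * (i + 0.5) * h * h for i in range(n))
    return 2.0 * math.pi * radial


if __name__ == "__main__":
    print(polar_jacobian(1.7, 0.6))                        # should be close to rho = 1.7
    print(disc_integral(2.0), 5.0 * math.pi * 2.0**3 / 3.0)
```

The first value should be close to $\rho$, in line with $dx\,dy = \rho\,d\rho\,d\phi$, and the second pair should agree with $5\pi a^3/3$.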
Its value may be found by first constructing $I^2$, as follows:
$$I^2 = \int_{-\infty}^{\infty} e^{-x^2}\,dx \int_{-\infty}^{\infty} e^{-y^2}\,dy = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{-(x^2+y^2)} = \iint_R e^{-(x^2+y^2)}\,dx\,dy,$$
where the region $R$ is the whole $xy$-plane. Then, transforming to plane polar coordinates, we find
$$I^2 = \iint_{R'} e^{-\rho^2}\rho\,d\rho\,d\phi = \int_0^{2\pi} d\phi \int_0^{\infty} d\rho\,\rho e^{-\rho^2} = 2\pi\left[-\tfrac{1}{2}e^{-\rho^2}\right]_0^{\infty} = \pi.$$
Therefore the original integral is given by $I = \sqrt{\pi}$. Because the integrand is an even function of $x$, it follows that the value of the integral from $0$ to $\infty$ is simply $\sqrt{\pi}/2$.

Figure 6.11 The regions used to illustrate the convergence properties of the integral $I(a) = \int_{-a}^{a} e^{-x^2}\,dx$ as $a \to \infty$.

We note, however, that unlike in all the previous examples, the regions of integration $R$ and $R'$ are both infinite in extent (i.e. unbounded). It is therefore prudent to derive this result more rigorously; this we do by considering the integral
$$I(a) = \int_{-a}^{a} e^{-x^2}\,dx.$$
We then have
$$I^2(a) = \iint_R e^{-(x^2+y^2)}\,dx\,dy,$$
where $R$ is the square of side $2a$ centred on the origin. Referring to figure 6.11, since the integrand is always positive the value of the integral taken over the square lies between the value of the integral taken over the region bounded by the inner circle of radius $a$ and the value of the integral taken over the outer circle of radius $\sqrt{2}a$. Transforming to plane polar coordinates as above, we may evaluate the integrals over the inner and outer circles respectively, and we find $\pi\left(1 - e^{-a^2}\right)$ *a).

Figure 6.12 A three-dimensional region of integration $R$, showing an element of volume in $u,v,w$ coordinates formed by the coordinate surfaces $u = \text{constant}$, $v = \text{constant}$, $w = \text{constant}$.

(a) Find the volume $V$ and surface area $A$ of the torus, and show that they can be written as
$$V = \frac{\pi^2}{4}(r_o^2 - r_i^2)(r_o - r_i), \qquad A = \pi^2(r_o^2 - r_i^2),$$
where $r_o$ and $r_i$ are, respectively, the outer and inner radii of the torus.
(b) Show that a vertical circular cylinder of radius $c$, coaxial with the torus, divides $A$ in the ratio $\pi c + 2a : \pi c - 2a$.

6.10 A thin uniform circular disc has mass $M$ and radius $a$.
(a) Prove that its moment of inertia about an axis perpendicular to its plane and passing through its centre is $\tfrac{1}{2}Ma^2$.
(b) Prove that the moment of inertia of the same disc about a diameter is $\tfrac{1}{4}Ma^2$.
This is an example of the general result for planar bodies that the moment of inertia of the body about an axis perpendicular to the plane is equal to the sum of the moments of inertia about two perpendicular axes lying in the plane; in an obvious notation
$$I_z = \int r^2\,dm = \int (x^2 + y^2)\,dm = \int x^2\,dm + \int y^2\,dm = I_y + I_x.$$

6.11 In some applications in mechanics the moment of inertia of a body about a single point (as opposed to about an axis) is needed. The moment of inertia, $I$, about the origin of a uniform solid body of density $\rho$ is given by the volume integral
$$I = \int_V (x^2 + y^2 + z^2)\rho\,dV.$$
Show that the moment of inertia of a right circular cylinder of radius $a$, length $2b$ and mass $M$ about its centre is
$$M\left(\frac{a^2}{2} + \frac{b^2}{3}\right).$$

6.12 The shape of an axially symmetric hard-boiled egg, of uniform density $\rho_0$, is given in spherical polar coordinates by $r = a(2 - \cos\theta)$, where $\theta$ is measured from the axis of symmetry.
(a) Prove that the mass $M$ of the egg is $M = \tfrac{40}{3}\pi\rho_0 a^3$.
(b) Prove that the egg's moment of inertia about its axis of symmetry is $\tfrac{342}{175}Ma^2$.
6.13 In spherical polar coordinates $r$, $\theta$, $\phi$ the element of volume for a body that is symmetrical about the polar axis is $dV = 2\pi r^2\sin\theta\,dr\,d\theta$, whilst its element of surface area is $2\pi r\sin\theta\,[(dr)^2 + r^2(d\theta)^2]^{1/2}$. A particular surface is defined by $r = 2a\cos\theta$, where $a$ is a constant and $0 \le \theta \le \pi/2$. Find its total surface area and the volume it encloses, and hence identify the surface.

6.14 By expressing both the integrand and the surface element in spherical polar coordinates, show that the surface integral
$$\int \frac{x^2}{x^2 + y^2}\,dS$$
over the surface $x^2 + y^2 = z^2$, $0 \le z \le 1$, has the value $\pi/\sqrt{2}$.

6.15 By transforming to cylindrical polar coordinates, evaluate the integral
$$I = \iiint \ln(x^2 + y^2)\,dx\,dy\,dz$$
over the interior of the conical region $x^2 + y^2 \le z^2$, $0 \le z \le 1$.

6.16 Sketch the two families of curves
$$y^2 = 4u(u - x), \qquad y^2 = 4v(v + x),$$
where $u$ and $v$ are parameters. By transforming to the $uv$-plane, evaluate the integral of $y/(x^2 + y^2)^{1/2}$ over the part of the quadrant $x > 0$, $y > 0$ that is bounded by the lines $x = 0$, $y = 0$ and the curve $y^2 = 4a(a - x)$.

6.17 By making two successive simple changes of variables, evaluate
$$I = \iiint x^2\,dx\,dy\,dz$$
over the ellipsoidal region
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} \le 1.$$

6.18 Sketch the domain of integration for the integral
$$I = \int_0^1 \int_{x=y}^{1/y} \frac{y^3}{x}\exp[y^2(x^2 + x^{-2})]\,dx\,dy$$
and characterise its boundaries in terms of new variables $u = xy$ and $v = y/x$. Show that the Jacobian for the change from $(x,y)$ to $(u,v)$ is equal to $(2v)^{-1}$, and hence evaluate $I$.

6.19 Sketch the part of the region $0 \le x$, $0 \le y \le \pi/2$ that is bounded by the curves $x = 0$, $y = 0$, $\sinh x\cos y = 1$ and $\cosh x\sin y = 1$. By making a suitable change of variables, evaluate the integral
$$I = \iint (\sinh^2 x + \cos^2 y)\sinh 2x\,\sin 2y\,dx\,dy$$
over the bounded subregion.
6.20 Define a coordinate system $u,v$ whose origin coincides with that of the usual $x,y$ system and whose $u$-axis coincides with the $x$-axis, whilst the $v$-axis makes an angle $\alpha$ with it. By considering the integral $I = \int \exp(-r^2)\,dA$, where $r$ is the radial distance from the origin, over the area defined by $0 \le u < \infty$, $0 \le v < \infty$, prove that
$$\int_0^{\infty}\int_0^{\infty} \exp(-u^2 - v^2 - 2uv\cos\alpha)\,du\,dv = \frac{\alpha}{2\sin\alpha}.$$

6.21 As stated in section 5.11, the first law of thermodynamics can be expressed as
$$dU = T\,dS - P\,dV.$$
By calculating and equating $\partial^2 U/\partial Y\partial X$ and $\partial^2 U/\partial X\partial Y$, where $X$ and $Y$ are an unspecified pair of variables (drawn from $P$, $V$, $T$ and $S$), prove that
$$\frac{\partial(S,T)}{\partial(X,Y)} = \frac{\partial(V,P)}{\partial(X,Y)}.$$
Using the properties of Jacobians, deduce that
$$\frac{\partial(S,T)}{\partial(V,P)} = 1.$$

6.22 The distances of the variable point $P$, which has coordinates $x,y,z$, from the fixed points $(0,0,1)$ and $(0,0,-1)$ are denoted by $u$ and $v$ respectively. New variables $\xi,\eta,\phi$ are defined by
$$\xi = \tfrac{1}{2}(u + v), \qquad \eta = \tfrac{1}{2}(u - v),$$
and $\phi$ is the angle between the plane $y = 0$ and the plane containing the three points. Prove that the Jacobian $\partial(\xi,\eta,\phi)/\partial(x,y,z)$ has the value $(\xi^2 - \eta^2)^{-1}$ and that
$$\iiint_{\text{all space}} \frac{(u - v)^2}{uv}\exp\left(-\frac{u + v}{2}\right) dx\,dy\,dz = \frac{16\pi}{3e}.$$

6.23 This is a more difficult question about 'volumes' in an increasing number of dimensions.
(a) Let $R$ be a real positive number and define $K_m$ by
$$K_m = \int_{-R}^{R} \left(R^2 - x^2\right)^m dx.$$
Show, using integration by parts, that $K_m$ satisfies the recurrence relation
$$(2m + 1)K_m = 2mR^2 K_{m-1}.$$
(b) For integer $n$, define $I_n = K_n$ and $J_n = K_{n+1/2}$. Evaluate $I_0$ and $J_0$ directly and hence prove that
$$I_n = \frac{2^{2n+1}(n!)^2 R^{2n+1}}{(2n+1)!} \quad\text{and}\quad J_n = \frac{\pi(2n+1)!\,R^{2n+2}}{2^{2n+1}\,n!\,(n+1)!}.$$
(c) A sequence of functions $V_n(R)$ is defined by
$$V_0(R) = 1, \qquad V_n(R) = \int_{-R}^{R} V_{n-1}\left(\sqrt{R^2 - x^2}\right) dx, \quad n \ge 1.$$
Prove by induction that
$$V_{2n}(R) = \frac{\pi^n R^{2n}}{n!}, \qquad V_{2n+1}(R) = \frac{\pi^n 2^{2n+1}\,n!\,R^{2n+1}}{(2n+1)!}.$$
(d) For interest, (i) show that $V_{2n+2}(1)$ *1).

Figure 7.5 An illustration of the ratio theorem. The point $P$ divides the line segment $AB$ in the ratio $\lambda : \mu$.

Having defined the operations of addition, subtraction and multiplication by a scalar, we can now use vectors to solve simple problems in geometry.

▶ A point $P$ divides a line segment $AB$ in the ratio $\lambda : \mu$ (see figure 7.5). If the position vectors of the points $A$ and $B$ are $\mathbf{a}$ and $\mathbf{b}$, respectively, find the position vector of the point $P$.

As is conventional for vector geometry problems, we denote the vector from the point $A$ to the point $B$ by $\overrightarrow{AB}$. If the position vectors of the points $A$ and $B$, relative to some origin $O$, are $\mathbf{a}$ and $\mathbf{b}$, it should be clear that $\overrightarrow{AB} = \mathbf{b} - \mathbf{a}$. Now, from figure 7.5 we see that one possible way of reaching the point $P$ from $O$ is first to go from $O$ to $A$ and then to go along the line $AB$ for a distance equal to the fraction $\lambda/(\lambda + \mu)$ of its total length. We may express this in terms of vectors as
$$\overrightarrow{OP} = \mathbf{p} = \mathbf{a} + \frac{\lambda}{\lambda + \mu}\overrightarrow{AB} = \mathbf{a} + \frac{\lambda}{\lambda + \mu}(\mathbf{b} - \mathbf{a}) = \left(1 - \frac{\lambda}{\lambda + \mu}\right)\mathbf{a} + \frac{\lambda}{\lambda + \mu}\mathbf{b} = \frac{\mu}{\lambda + \mu}\mathbf{a} + \frac{\lambda}{\lambda + \mu}\mathbf{b}, \qquad (7.6)$$
which expresses the position vector of the point $P$ in terms of those of $A$ and $B$. We would, of course, obtain the same result by considering the path from $O$ to $B$ and then to $P$. ◀

Figure 7.6 The centroid of a triangle. The triangle is defined by the points $A$, $B$ and $C$ that have position vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. The broken lines $CD$, $BE$, $AF$ connect the vertices of the triangle to the mid-points of the opposite sides; these lines intersect at the centroid $G$ of the triangle.

Result (7.6) is a version of the ratio theorem and we may use it in solving more complicated problems.

▶ The vertices of triangle $ABC$ have position vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ relative to some origin $O$ (see figure 7.6). Find the position vector of the centroid $G$ of the triangle.
From figure 7.6, the points $D$ and $E$ bisect the lines $AB$ and $AC$ respectively. Thus from the ratio theorem (7.6), with $\lambda = \mu = 1/2$, the position vectors of $D$ and $E$ relative to the origin are
$$\mathbf{d} = \tfrac{1}{2}\mathbf{a} + \tfrac{1}{2}\mathbf{b}, \qquad \mathbf{e} = \tfrac{1}{2}\mathbf{a} + \tfrac{1}{2}\mathbf{c}.$$
Using the ratio theorem again, we may write the position vector of a general point on the line $CD$ that divides the line in the ratio $\lambda : (1 - \lambda)$ as
$$\mathbf{r} = (1 - \lambda)\mathbf{c} + \lambda\mathbf{d} = (1 - \lambda)\mathbf{c} + \tfrac{1}{2}\lambda(\mathbf{a} + \mathbf{b}), \qquad (7.7)$$
where we have expressed $\mathbf{d}$ in terms of $\mathbf{a}$ and $\mathbf{b}$. Similarly, the position vector of a general point on the line $BE$ can be expressed as
$$\mathbf{r} = (1 - \mu)\mathbf{b} + \mu\mathbf{e} = (1 - \mu)\mathbf{b} + \tfrac{1}{2}\mu(\mathbf{a} + \mathbf{c}). \qquad (7.8)$$
Thus, at the intersection of the lines $CD$ and $BE$ we require, from (7.7), (7.8),
$$(1 - \lambda)\mathbf{c} + \tfrac{1}{2}\lambda(\mathbf{a} + \mathbf{b}) = (1 - \mu)\mathbf{b} + \tfrac{1}{2}\mu(\mathbf{a} + \mathbf{c}).$$
By equating the coefficients of the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ we find
$$\lambda = \mu, \qquad \tfrac{1}{2}\lambda = 1 - \mu, \qquad 1 - \lambda = \tfrac{1}{2}\mu.$$
These equations are consistent and have the solution $\lambda = \mu = 2/3$. Substituting these values into either (7.7) or (7.8) we find that the position vector of the centroid $G$ is given by
$$\mathbf{g} = \tfrac{1}{3}(\mathbf{a} + \mathbf{b} + \mathbf{c}).$$
◀

7.4 Basis vectors and components

Given any three different vectors $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$, which do not all lie in a plane, it is possible, in a three-dimensional space, to write any other vector in terms of scalar multiples of them:
$$\mathbf{a} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3. \qquad (7.9)$$
The three vectors $\mathbf{e}_1$, $\mathbf{e}_2$ and $\mathbf{e}_3$ are said to form a basis (for the three-dimensional space); the scalars $a_1$, $a_2$ and $a_3$, which may be positive, negative or zero, are called the components of the vector $\mathbf{a}$ with respect to this basis. We say that the vector has been resolved into components.

Most often we shall use basis vectors that are mutually perpendicular, for ease of manipulation, though this is not necessary.
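Returning briefly to the ratio theorem (7.6) and the centroid example of section 7.3, the following sketch checks the result $\mathbf{g} = \tfrac{1}{3}(\mathbf{a} + \mathbf{b} + \mathbf{c})$ for one sample triangle using exact rational arithmetic. The functions `ratio_point` and `centroid`, and the sample vertices, are hypothetical names and values chosen for illustration.

```python
from fractions import Fraction


def ratio_point(a, b, lam, mu):
    """Point dividing AB in the ratio lam : mu, from (7.6): p = (mu a + lam b) / (lam + mu)."""
    lam, mu = Fraction(lam), Fraction(mu)
    return tuple((mu * ai + lam * bi) / (lam + mu) for ai, bi in zip(a, b))


def centroid(a, b, c):
    d = ratio_point(a, b, 1, 1)     # D is the mid-point of AB
    return ratio_point(c, d, 2, 1)  # G lies two-thirds of the way from C to D


if __name__ == "__main__":
    a, b, c = (0, 0, 0), (3, 0, 0), (0, 6, 3)
    e = ratio_point(a, c, 1, 1)     # E is the mid-point of AC
    print(centroid(a, b, c))        # via the line CD
    print(ratio_point(b, e, 2, 1))  # via the line BE, which should give the same point
```

Both routes should give the same point, matching the solution $\lambda = \mu = 2/3$ found by equating coefficients.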
In general, a basis set must (i) have as many basis vectors as the number of dimensions (in more formal language, the basis vectors must span the space) and (ii) be such that no basis vector may be described as a sum of the others, or, more formally, the basis vectors must be linearly independent. Putting this mathematically, in $N$ dimensions, we require

$$c_1\mathbf{e}_1 + c_2\mathbf{e}_2 + \cdots + c_N\mathbf{e}_N \neq \mathbf{0},$$

for any set of coefficients $c_1, c_2, \ldots, c_N$ except $c_1 = c_2 = \cdots = c_N = 0$.

In this chapter we will only consider vectors in three dimensions; higher dimensionality can be achieved by simple extension.

If we wish to label points in space using a Cartesian coordinate system $(x, y, z)$, we may introduce the unit vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$, which point along the positive $x$-, $y$- and $z$-axes respectively. A vector $\mathbf{a}$ may then be written as a sum of three vectors, each parallel to a different coordinate axis:

$$\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}. \tag{7.10}$$

A vector in three-dimensional space thus requires three components to describe fully both its direction and its magnitude. A displacement in space may be thought of as the sum of displacements along the $x$-, $y$- and $z$-directions (see figure 7.7). For brevity, the components of a vector $\mathbf{a}$ with respect to a particular coordinate system are sometimes written in the form $(a_x, a_y, a_z)$. Note that the basis vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$ may themselves be represented by $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$ respectively.

Figure 7.7 A Cartesian basis set. The vector $\mathbf{a}$ is the sum of $a_x\mathbf{i}$, $a_y\mathbf{j}$ and $a_z\mathbf{k}$.

We can consider the addition and subtraction of vectors in terms of their components. The sum of two vectors $\mathbf{a}$ and $\mathbf{b}$ is found by simply adding their components, i.e.

$$\begin{aligned}
\mathbf{a} + \mathbf{b} &= a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k} + b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k} \\
&= (a_x+b_x)\mathbf{i} + (a_y+b_y)\mathbf{j} + (a_z+b_z)\mathbf{k},
\end{aligned} \tag{7.11}$$

and their difference by subtracting them,

$$\begin{aligned}
\mathbf{a} - \mathbf{b} &= a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k} - (b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}) \\
&= (a_x-b_x)\mathbf{i} + (a_y-b_y)\mathbf{j} + (a_z-b_z)\mathbf{k}.
\end{aligned} \tag{7.12}$$

▶ Two particles have velocities $\mathbf{v}_1 = \mathbf{i} + 3\mathbf{j} + 6\mathbf{k}$ and $\mathbf{v}_2 = \mathbf{i} - 2\mathbf{k}$, respectively. Find the velocity $\mathbf{u}$ of the second particle relative to the first.

The required relative velocity is given by

$$\mathbf{u} = \mathbf{v}_2 - \mathbf{v}_1 = (1-1)\mathbf{i} + (0-3)\mathbf{j} + (-2-6)\mathbf{k} = -3\mathbf{j} - 8\mathbf{k}.$$ ◀

7.5 Magnitude of a vector

The magnitude of the vector $\mathbf{a}$ is denoted by $|\mathbf{a}|$ or $a$. In terms of its components in three-dimensional Cartesian coordinates, the magnitude of $\mathbf{a}$ is given by

$$a \equiv |\mathbf{a}| = \sqrt{a_x^2 + a_y^2 + a_z^2}. \tag{7.13}$$

Hence, the magnitude of a vector is a measure of its length. Such an analogy is useful for displacement vectors but magnitude is better described, for example, by 'strength' for vectors such as force or by 'speed' for velocity vectors. For instance, in the previous example, the speed of the second particle relative to the first is given by

$$u = |\mathbf{u}| = \sqrt{(-3)^2 + (-8)^2} = \sqrt{73}.$$

Figure 7.8 The projection of $\mathbf{b}$ onto the direction of $\mathbf{a}$ is $b\cos\theta$. The scalar product of $\mathbf{a}$ and $\mathbf{b}$ is $ab\cos\theta$.

A vector whose magnitude equals unity is called a unit vector. The unit vector in the direction $\mathbf{a}$ is usually notated $\hat{\mathbf{a}}$ and may be evaluated as

$$\hat{\mathbf{a}} = \frac{\mathbf{a}}{|\mathbf{a}|}. \tag{7.14}$$

The unit vector is a useful concept because a vector written as $\lambda\hat{\mathbf{a}}$ then has magnitude $\lambda$ and direction $\hat{\mathbf{a}}$. Thus magnitude and direction are explicitly separated.

7.6 Multiplication of vectors

We have already considered multiplying a vector by a scalar. Now we consider the concept of multiplying one vector by another vector. It is not immediately obvious what the product of two vectors represents and in fact two products are commonly defined, the scalar product and the vector product. As their names imply, the scalar product of two vectors is just a number, whereas the vector product is itself a vector.
Although neither the scalar nor the vector product is what we might normally think of as a product, their use is widespread and numerous examples will be described elsewhere in this book.

7.6.1 Scalar product

The scalar product (or dot product) of two vectors $\mathbf{a}$ and $\mathbf{b}$ is denoted by $\mathbf{a}\cdot\mathbf{b}$ and is given by

$$\mathbf{a}\cdot\mathbf{b} \equiv |\mathbf{a}||\mathbf{b}|\cos\theta, \qquad 0 \le \theta \le \pi, \tag{7.15}$$

where $\theta$ is the angle between the two vectors, placed 'tail to tail' or 'head to head'. Thus, the value of the scalar product $\mathbf{a}\cdot\mathbf{b}$ equals the magnitude of $\mathbf{a}$ multiplied by the projection of $\mathbf{b}$ onto $\mathbf{a}$ (see figure 7.8).

From (7.15) we see that the scalar product has the particularly useful property that

$$\mathbf{a}\cdot\mathbf{b} = 0 \tag{7.16}$$

is a necessary and sufficient condition for $\mathbf{a}$ to be perpendicular to $\mathbf{b}$ (unless either of them is zero). It should be noted in particular that the Cartesian basis vectors $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$, being mutually orthogonal unit vectors, satisfy the equations

$$\mathbf{i}\cdot\mathbf{i} = \mathbf{j}\cdot\mathbf{j} = \mathbf{k}\cdot\mathbf{k} = 1, \tag{7.17}$$

$$\mathbf{i}\cdot\mathbf{j} = \mathbf{j}\cdot\mathbf{k} = \mathbf{k}\cdot\mathbf{i} = 0. \tag{7.18}$$

Examples of scalar products arise naturally throughout physics and in particular in connection with energy. Perhaps the simplest is the work done $\mathbf{F}\cdot\mathbf{r}$ in moving the point of application of a constant force $\mathbf{F}$ through a displacement $\mathbf{r}$; notice that, as expected, if the displacement is perpendicular to the direction of the force then $\mathbf{F}\cdot\mathbf{r} = 0$ and no work is done. A second simple example is afforded by the potential energy $-\mathbf{m}\cdot\mathbf{B}$ of a magnetic dipole, represented in strength and orientation by a vector $\mathbf{m}$, placed in an external magnetic field $\mathbf{B}$.

As the name implies, the scalar product has a magnitude but no direction. The scalar product is commutative and distributive over addition:

$$\mathbf{a}\cdot\mathbf{b} = \mathbf{b}\cdot\mathbf{a}, \tag{7.19}$$

$$\mathbf{a}\cdot(\mathbf{b}+\mathbf{c}) = \mathbf{a}\cdot\mathbf{b} + \mathbf{a}\cdot\mathbf{c}. \tag{7.20}$$

▶ Four non-coplanar points $A$, $B$, $C$, $D$ are positioned such that the line $AD$ is perpendicular to $BC$ and $BD$ is perpendicular to $AC$. Show that $CD$ is perpendicular to $AB$.

Denote the four position vectors by $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$.
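As none of the three pairs of lines actually intersect, it is difficult to indicate their orthogonality in the diagram we would normally draw. However, the orthogonality can be expressed in vector form and we start by noting that, since $AD \perp BC$, it follows from (7.16) that

$$(\mathbf{d}-\mathbf{a})\cdot(\mathbf{c}-\mathbf{b}) = 0.$$

Similarly, since $BD \perp AC$,

$$(\mathbf{d}-\mathbf{b})\cdot(\mathbf{c}-\mathbf{a}) = 0.$$

Combining these two equations we find

$$(\mathbf{d}-\mathbf{a})\cdot(\mathbf{c}-\mathbf{b}) = (\mathbf{d}-\mathbf{b})\cdot(\mathbf{c}-\mathbf{a}),$$

which, on multiplying out the parentheses, gives

$$\mathbf{d}\cdot\mathbf{c} - \mathbf{a}\cdot\mathbf{c} - \mathbf{d}\cdot\mathbf{b} + \mathbf{a}\cdot\mathbf{b} = \mathbf{d}\cdot\mathbf{c} - \mathbf{b}\cdot\mathbf{c} - \mathbf{d}\cdot\mathbf{a} + \mathbf{b}\cdot\mathbf{a}.$$

Cancelling terms that appear on both sides and rearranging yields

$$\mathbf{d}\cdot\mathbf{b} - \mathbf{d}\cdot\mathbf{a} - \mathbf{c}\cdot\mathbf{b} + \mathbf{c}\cdot\mathbf{a} = 0,$$

which simplifies to give

$$(\mathbf{d}-\mathbf{c})\cdot(\mathbf{b}-\mathbf{a}) = 0.$$

From (7.16), we see that this implies that $CD$ is perpendicular to $AB$. ◀

If we introduce a set of basis vectors that are mutually orthogonal, such as $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, we can write the components of a vector $\mathbf{a}$, with respect to that basis, in terms of the scalar product of $\mathbf{a}$ with each of the basis vectors, i.e. $a_x = \mathbf{a}\cdot\mathbf{i}$, $a_y = \mathbf{a}\cdot\mathbf{j}$ and $a_z = \mathbf{a}\cdot\mathbf{k}$. In terms of the components $a_x$, $a_y$ and $a_z$ the scalar product is given by

$$\mathbf{a}\cdot\mathbf{b} = (a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k})\cdot(b_x\mathbf{i} + b_y\mathbf{j} + b_z\mathbf{k}) = a_xb_x + a_yb_y + a_zb_z, \tag{7.21}$$

where the cross terms such as $a_x\mathbf{i}\cdot b_y\mathbf{j}$ are zero because the basis vectors are mutually perpendicular; see equation (7.18). It should be clear from (7.15) that the value of $\mathbf{a}\cdot\mathbf{b}$ has a geometrical definition and that this value is independent of the actual basis vectors used.

▶ Find the angle between the vectors $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$ and $\mathbf{b} = 2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k}$.

From (7.15) the cosine of the angle $\theta$ between $\mathbf{a}$ and $\mathbf{b}$ is given by

$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}||\mathbf{b}|}.$$

From (7.21) the scalar product $\mathbf{a}\cdot\mathbf{b}$ has the value

$$\mathbf{a}\cdot\mathbf{b} = 1\times 2 + 2\times 3 + 3\times 4 = 20,$$

and from (7.13) the lengths of the vectors are

$$|\mathbf{a}| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14} \qquad\text{and}\qquad |\mathbf{b}| = \sqrt{2^2 + 3^2 + 4^2} = \sqrt{29}.$$

Thus,

$$\cos\theta = \frac{20}{\sqrt{14}\sqrt{29}} \approx 0.9926 \quad\Rightarrow\quad \theta = 0.12\ \text{rad}.$$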
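The arithmetic of this example is easily checked by machine. The following is a minimal sketch, assuming vectors are held as plain tuples of Cartesian components; the function names `dot`, `norm` and `angle` are our own and purely illustrative.

```python
import math


def dot(a, b):
    """Scalar product in Cartesian components, as in equation (7.21)."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Magnitude of a vector, |a| = sqrt(a . a)."""
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle between a and b in radians, from equation (7.15)."""
    cos_theta = dot(a, b) / (norm(a) * norm(b))
    # Guard against rounding pushing the cosine just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


a = (1, 2, 3)
b = (2, 3, 4)
theta = angle(a, b)
print(dot(a, b), round(theta, 2))  # prints: 20 0.12
```

The clamping step is not needed for these particular vectors; it is included only because, for nearly parallel vectors, floating-point rounding can otherwise produce a cosine marginally greater than unity.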
◀

We can see from the expressions (7.15) and (7.21) for the scalar product that if $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$ then

$$\cos\theta = \frac{a_x}{a}\frac{b_x}{b} + \frac{a_y}{a}\frac{b_y}{b} + \frac{a_z}{a}\frac{b_z}{b},$$

where $a_x/a$, $a_y/a$ and $a_z/a$ are called the direction cosines of $\mathbf{a}$, since they give the cosine of the angle made by $\mathbf{a}$ with each of the basis vectors. Similarly $b_x/b$, $b_y/b$ and $b_z/b$ are the direction cosines of $\mathbf{b}$.

If we take the scalar product of any vector $\mathbf{a}$ with itself then clearly $\theta = 0$ and from (7.15) we have

$$\mathbf{a}\cdot\mathbf{a} = |\mathbf{a}|^2.$$

Thus the magnitude of $\mathbf{a}$ can be written in a coordinate-independent form as $|\mathbf{a}| = \sqrt{\mathbf{a}\cdot\mathbf{a}}$.

Finally, we note that the scalar product may be extended to vectors with complex components if it is redefined as

$$\mathbf{a}\cdot\mathbf{b} = a_x^* b_x + a_y^* b_y + a_z^* b_z,$$

where the asterisk represents the operation of complex conjugation. To accommodate this extension the commutation property (7.19) must be modified to read

$$\mathbf{a}\cdot\mathbf{b} = (\mathbf{b}\cdot\mathbf{a})^*. \tag{7.22}$$

In particular it should be noted that $(\lambda\mathbf{a})\cdot\mathbf{b} = \lambda^*\,\mathbf{a}\cdot\mathbf{b}$, whereas $\mathbf{a}\cdot(\lambda\mathbf{b}) = \lambda\,\mathbf{a}\cdot\mathbf{b}$. However, the magnitude of a complex vector is still given by $|\mathbf{a}| = \sqrt{\mathbf{a}\cdot\mathbf{a}}$, since $\mathbf{a}\cdot\mathbf{a}$ is always real.

7.6.2 Vector product

The vector product (or cross product) of two vectors $\mathbf{a}$ and $\mathbf{b}$ is denoted by $\mathbf{a}\times\mathbf{b}$ and is defined to be a vector of magnitude $|\mathbf{a}||\mathbf{b}|\sin\theta$ in a direction perpendicular to both $\mathbf{a}$ and $\mathbf{b}$;

$$|\mathbf{a}\times\mathbf{b}| = |\mathbf{a}||\mathbf{b}|\sin\theta.$$

The direction is found by 'rotating' $\mathbf{a}$ into $\mathbf{b}$ through the smallest possible angle. The sense of rotation is that of a right-handed screw that moves forward in the direction $\mathbf{a}\times\mathbf{b}$ (see figure 7.9). Again, $\theta$ is the angle between the two vectors placed 'tail to tail' or 'head to head'. With this definition $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{a}\times\mathbf{b}$ form a right-handed set.

Figure 7.9 The vector product. The vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{a}\times\mathbf{b}$ form a right-handed set.
A more directly usable description of the relative directions in a vector product is provided by a right hand whose first two fingers and thumb are held to be as nearly mutually perpendicular as possible. If the first finger is pointed in the direction of the first vector and the second finger in the direction of the second vector, then the thumb gives the direction of the vector product.

The vector product is distributive over addition, but anticommutative and non-associative:

$$(\mathbf{a}+\mathbf{b})\times\mathbf{c} = (\mathbf{a}\times\mathbf{c}) + (\mathbf{b}\times\mathbf{c}), \tag{7.23}$$

$$\mathbf{b}\times\mathbf{a} = -(\mathbf{a}\times\mathbf{b}), \tag{7.24}$$

$$(\mathbf{a}\times\mathbf{b})\times\mathbf{c} \neq \mathbf{a}\times(\mathbf{b}\times\mathbf{c}). \tag{7.25}$$

Figure 7.10 The moment of the force $\mathbf{F}$ about $O$ is $\mathbf{r}\times\mathbf{F}$. The cross represents the direction of $\mathbf{r}\times\mathbf{F}$, which is perpendicularly into the plane of the paper.

From its definition, we see that the vector product has the very useful property that if $\mathbf{a}\times\mathbf{b} = \mathbf{0}$ then $\mathbf{a}$ is parallel or antiparallel to $\mathbf{b}$ (unless either of them is zero). We also note that

$$\mathbf{a}\times\mathbf{a} = \mathbf{0}. \tag{7.26}$$

▶ Show that if $\mathbf{a} = \mathbf{b} + \lambda\mathbf{c}$, for some scalar $\lambda$, then $\mathbf{a}\times\mathbf{c} = \mathbf{b}\times\mathbf{c}$.

From (7.23) we have

$$\mathbf{a}\times\mathbf{c} = (\mathbf{b}+\lambda\mathbf{c})\times\mathbf{c} = \mathbf{b}\times\mathbf{c} + \lambda\,\mathbf{c}\times\mathbf{c}.$$

However, from (7.26), $\mathbf{c}\times\mathbf{c} = \mathbf{0}$ and so

$$\mathbf{a}\times\mathbf{c} = \mathbf{b}\times\mathbf{c}. \tag{7.27}$$

We note in passing that the fact that (7.27) is satisfied does not imply that $\mathbf{a} = \mathbf{b}$. ◀

An example of the use of the vector product is that of finding the area, $A$, of a parallelogram with sides $\mathbf{a}$ and $\mathbf{b}$, using the formula

$$A = |\mathbf{a}\times\mathbf{b}|. \tag{7.28}$$

Another example is afforded by considering a force $\mathbf{F}$ acting through a point $R$, whose vector position relative to the origin $O$ is $\mathbf{r}$ (see figure 7.10). Its moment or torque about $O$ is the strength of the force times the perpendicular distance $OP$, which numerically is just $Fr\sin\theta$, i.e. the magnitude of $\mathbf{r}\times\mathbf{F}$. Furthermore, the sense of the moment is clockwise about an axis through $O$ that points perpendicularly into the plane of the paper (the axis is represented by a cross in the figure). Thus the moment is completely represented by the vector $\mathbf{r}\times\mathbf{F}$, in both magnitude and spatial sense.
It should be noted that the same vector product is obtained wherever the point $R$ is chosen, so long as it lies on the line of action of $\mathbf{F}$.

Similarly, if a solid body is rotating about some axis that passes through the origin, with an angular velocity $\omega$, then we can describe this rotation by a vector $\boldsymbol{\omega}$ that has magnitude $\omega$ and points along the axis of rotation. The direction of $\boldsymbol{\omega}$ is the forward direction of a right-handed screw rotating in the same sense as the body. The velocity of any point in the body with position vector $\mathbf{r}$ is then given by $\mathbf{v} = \boldsymbol{\omega}\times\mathbf{r}$.

Since the basis vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ are mutually perpendicular unit vectors, forming a right-handed set, their vector products are easily seen to be

$$\mathbf{i}\times\mathbf{i} = \mathbf{j}\times\mathbf{j} = \mathbf{k}\times\mathbf{k} = \mathbf{0}, \tag{7.29}$$

$$\mathbf{i}\times\mathbf{j} = -\mathbf{j}\times\mathbf{i} = \mathbf{k}, \tag{7.30}$$

$$\mathbf{j}\times\mathbf{k} = -\mathbf{k}\times\mathbf{j} = \mathbf{i}, \tag{7.31}$$

$$\mathbf{k}\times\mathbf{i} = -\mathbf{i}\times\mathbf{k} = \mathbf{j}. \tag{7.32}$$

Using these relations, it is straightforward to show that the vector product of two general vectors $\mathbf{a}$ and $\mathbf{b}$ is given in terms of their components with respect to the basis set $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, by

$$\mathbf{a}\times\mathbf{b} = (a_yb_z - a_zb_y)\mathbf{i} + (a_zb_x - a_xb_z)\mathbf{j} + (a_xb_y - a_yb_x)\mathbf{k}. \tag{7.33}$$

For the reader who is familiar with determinants (see chapter 8), we record that this can also be written as

$$\mathbf{a}\times\mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix}.$$

That the cross product $\mathbf{a}\times\mathbf{b}$ is perpendicular to both $\mathbf{a}$ and $\mathbf{b}$ can be verified in component form by forming its dot products with each of the two vectors and showing that it is zero in both cases.

▶ Find the area $A$ of the parallelogram with sides $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$ and $\mathbf{b} = 4\mathbf{i} + 5\mathbf{j} + 6\mathbf{k}$.

The vector product $\mathbf{a}\times\mathbf{b}$ is given in component form by

$$\mathbf{a}\times\mathbf{b} = (2\times 6 - 3\times 5)\mathbf{i} + (3\times 4 - 1\times 6)\mathbf{j} + (1\times 5 - 2\times 4)\mathbf{k} = -3\mathbf{i} + 6\mathbf{j} - 3\mathbf{k}.$$

Thus the area of the parallelogram is

$$A = |\mathbf{a}\times\mathbf{b}| = \sqrt{(-3)^2 + 6^2 + (-3)^2} = \sqrt{54}.$$
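The component formula (7.33) translates directly into code. The sketch below, in which the function names `cross` and `dot` are our own illustrative choices, reproduces the numbers of the parallelogram example and confirms the perpendicularity and anticommutation properties for these particular vectors; it is a numerical check, not a proof.

```python
import math


def dot(a, b):
    """Scalar product in Cartesian components."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product in Cartesian components, following equation (7.33)."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)


a = (1, 2, 3)
b = (4, 5, 6)

n = cross(a, b)
print(n)  # prints: (-3, 6, -3)

# a x b is perpendicular to both a and b.
perp_a, perp_b = dot(n, a), dot(n, b)

# Area of the parallelogram with sides a and b, equation (7.28).
area = math.sqrt(dot(n, n))
```

Reversing the order of the arguments, `cross(b, a)`, returns `(3, -6, 3)`, in agreement with (7.24).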
◀

7.6.3 Scalar triple product

Now that we have defined the scalar and vector products, we can extend our discussion to define products of three vectors. Again, there are two possibilities, the scalar triple product and the vector triple product.

Figure 7.11 The scalar triple product gives the volume of a parallelepiped.

The scalar triple product is denoted by

$$[\mathbf{a},\mathbf{b},\mathbf{c}] \equiv \mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})$$

and, as its name suggests, it is just a number. It is most simply interpreted as the volume of a parallelepiped whose edges are given by $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ (see figure 7.11). The vector $\mathbf{v} = \mathbf{a}\times\mathbf{b}$ is perpendicular to the base of the solid and has magnitude $v = ab\sin\theta$, i.e. the area of the base. Further, $\mathbf{v}\cdot\mathbf{c} = vc\cos\phi$. Thus, since $c\cos\phi = OP$ is the vertical height of the parallelepiped, it is clear that $(\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}$ = area of the base × perpendicular height = volume. It follows that, if the vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are coplanar, $\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = 0$.

Expressed in terms of the components of each vector with respect to the Cartesian basis set $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ the scalar triple product is

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = a_x(b_yc_z - b_zc_y) + a_y(b_zc_x - b_xc_z) + a_z(b_xc_y - b_yc_x), \tag{7.34}$$

which can also be written as a determinant:

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = \begin{vmatrix} a_x & a_y & a_z \\ b_x & b_y & b_z \\ c_x & c_y & c_z \end{vmatrix}.$$

By writing the vectors in component form, it can be shown that

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\times\mathbf{b})\cdot\mathbf{c},$$

so that the dot and cross symbols can be interchanged without changing the result. More generally, the scalar triple product is unchanged under cyclic permutation of the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$. Other permutations simply give the negative of the original scalar triple product. These results can be summarised by

$$[\mathbf{a},\mathbf{b},\mathbf{c}] = [\mathbf{b},\mathbf{c},\mathbf{a}] = [\mathbf{c},\mathbf{a},\mathbf{b}] = -[\mathbf{a},\mathbf{c},\mathbf{b}] = -[\mathbf{b},\mathbf{a},\mathbf{c}] = -[\mathbf{c},\mathbf{b},\mathbf{a}]. \tag{7.35}$$

▶ Find the volume $V$ of the parallelepiped with sides $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$, $\mathbf{b} = 4\mathbf{i} + 5\mathbf{j} + 6\mathbf{k}$ and $\mathbf{c} = 7\mathbf{i} + 8\mathbf{j} + 10\mathbf{k}$.

We have already found that $\mathbf{a}\times\mathbf{b} = -3\mathbf{i} + 6\mathbf{j} - 3\mathbf{k}$, in subsection 7.6.2. Hence the volume of the parallelepiped is given by

$$\begin{aligned}
V = |\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})| &= |(\mathbf{a}\times\mathbf{b})\cdot\mathbf{c}| \\
&= |(-3\mathbf{i} + 6\mathbf{j} - 3\mathbf{k})\cdot(7\mathbf{i} + 8\mathbf{j} + 10\mathbf{k})| \\
&= |(-3)(7) + (6)(8) + (-3)(10)| = 3.
\end{aligned}$$ ◀

Another useful formula involving both the scalar and vector products is Lagrange's identity (see exercise 7.9), i.e.

$$(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) \equiv (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{d}) - (\mathbf{a}\cdot\mathbf{d})(\mathbf{b}\cdot\mathbf{c}). \tag{7.36}$$

7.6.4 Vector triple product

By the vector triple product of three vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ we mean the vector $\mathbf{a}\times(\mathbf{b}\times\mathbf{c})$. Clearly, $\mathbf{a}\times(\mathbf{b}\times\mathbf{c})$ is perpendicular to $\mathbf{a}$ and lies in the plane of $\mathbf{b}$ and $\mathbf{c}$ and so can be expressed in terms of them (see (7.37) below). We note, from (7.25), that the vector triple product is not associative, i.e. $\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) \neq (\mathbf{a}\times\mathbf{b})\times\mathbf{c}$.

Two useful formulae involving the vector triple product are

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\mathbf{c}, \tag{7.37}$$

$$(\mathbf{a}\times\mathbf{b})\times\mathbf{c} = (\mathbf{a}\cdot\mathbf{c})\mathbf{b} - (\mathbf{b}\cdot\mathbf{c})\mathbf{a}, \tag{7.38}$$

which may be derived by writing each vector in component form (see exercise 7.8). It can also be shown that for any three vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$,

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) + \mathbf{b}\times(\mathbf{c}\times\mathbf{a}) + \mathbf{c}\times(\mathbf{a}\times\mathbf{b}) = \mathbf{0}.$$

7.7 Equations of lines, planes and spheres

Now that we have described the basic algebra of vectors, we can apply the results to a variety of problems, the first of which is to find the equation of a line in vector form.

7.7.1 Equation of a line

Figure 7.12 The equation of a line. The vector $\mathbf{b}$ is in the direction $AR$ and $\lambda\mathbf{b}$ is the vector from $A$ to $R$.

Consider the line passing through the fixed point $A$ with position vector $\mathbf{a}$ and having a direction $\mathbf{b}$ (see figure 7.12). It is clear that the position vector $\mathbf{r}$ of a general point $R$ on the line can be written as

$$\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}, \tag{7.39}$$

since $R$ can be reached by starting from $O$, going along the translation vector $\mathbf{a}$ to the point $A$ on the line and then adding some multiple $\lambda\mathbf{b}$ of the vector $\mathbf{b}$.
Different values of $\lambda$ give different points $R$ on the line.

Taking the components of (7.39), we see that the equation of the line can also be written in the form

$$\frac{x - a_x}{b_x} = \frac{y - a_y}{b_y} = \frac{z - a_z}{b_z} = \text{constant}. \tag{7.40}$$

Taking the vector product of (7.39) with $\mathbf{b}$ and remembering that $\mathbf{b}\times\mathbf{b} = \mathbf{0}$ gives an alternative equation for the line

$$(\mathbf{r} - \mathbf{a})\times\mathbf{b} = \mathbf{0}.$$

We may also find the equation of the line that passes through two fixed points $A$ and $C$ with position vectors $\mathbf{a}$ and $\mathbf{c}$. Since $\mathbf{AC}$ is given by $\mathbf{c} - \mathbf{a}$, the position vector of a general point on the line is

$$\mathbf{r} = \mathbf{a} + \lambda(\mathbf{c} - \mathbf{a}).$$

7.7.2 Equation of a plane

The equation of a plane through a point $A$ with position vector $\mathbf{a}$ and perpendicular to a unit position vector $\hat{\mathbf{n}}$ (see figure 7.13) is

$$(\mathbf{r} - \mathbf{a})\cdot\hat{\mathbf{n}} = 0. \tag{7.41}$$

This follows since the vector joining $A$ to a general point $R$ with position vector $\mathbf{r}$ is $\mathbf{r} - \mathbf{a}$; $\mathbf{r}$ will lie in the plane if this vector is perpendicular to the normal to the plane. Rewriting (7.41) as $\mathbf{r}\cdot\hat{\mathbf{n}} = \mathbf{a}\cdot\hat{\mathbf{n}}$, we see that the equation of the plane may also be expressed in the form $\mathbf{r}\cdot\hat{\mathbf{n}} = d$, or in component form as

$$lx + my + nz = d, \tag{7.42}$$

where the unit normal to the plane is $\hat{\mathbf{n}} = l\mathbf{i} + m\mathbf{j} + n\mathbf{k}$ and $d = \mathbf{a}\cdot\hat{\mathbf{n}}$ is the perpendicular distance of the plane from the origin.

Figure 7.13 The equation of the plane is $(\mathbf{r} - \mathbf{a})\cdot\hat{\mathbf{n}} = 0$.

The equation of a plane containing points $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ is

$$\mathbf{r} = \mathbf{a} + \lambda(\mathbf{b} - \mathbf{a}) + \mu(\mathbf{c} - \mathbf{a}).$$

This is apparent because starting from the point $\mathbf{a}$ in the plane, all other points may be reached by moving a distance along each of two (non-parallel) directions in the plane. Two such directions are given by $\mathbf{b} - \mathbf{a}$ and $\mathbf{c} - \mathbf{a}$. It can be shown that the equation of this plane may also be written in the more symmetrical form

$$\mathbf{r} = \alpha\mathbf{a} + \beta\mathbf{b} + \gamma\mathbf{c},$$

where $\alpha + \beta + \gamma = 1$.

▶ Find the direction of the line of intersection of the two planes $x + 3y - z = 5$ and $2x - 2y + 4z = 3$.
The two planes have normal vectors $\mathbf{n}_1 = \mathbf{i} + 3\mathbf{j} - \mathbf{k}$ and $\mathbf{n}_2 = 2\mathbf{i} - 2\mathbf{j} + 4\mathbf{k}$. It is clear that these are not parallel vectors and so the planes must intersect along some line. The direction $\mathbf{p}$ of this line must be parallel to both planes and hence perpendicular to both normals. Therefore

$$\begin{aligned}
\mathbf{p} = \mathbf{n}_1\times\mathbf{n}_2 &= [(3)(4) - (-2)(-1)]\mathbf{i} + [(-1)(2) - (1)(4)]\mathbf{j} + [(1)(-2) - (3)(2)]\mathbf{k} \\
&= 10\mathbf{i} - 6\mathbf{j} - 8\mathbf{k}.
\end{aligned}$$ ◀

7.7.3 Equation of a sphere

Clearly, the defining property of a sphere is that all points on it are equidistant from a fixed point in space and that the common distance is equal to the radius of the sphere. This is easily expressed in vector notation as

$$|\mathbf{r} - \mathbf{c}|^2 = (\mathbf{r} - \mathbf{c})\cdot(\mathbf{r} - \mathbf{c}) = a^2, \tag{7.43}$$

where $\mathbf{c}$ is the position vector of the centre of the sphere and $a$ is its radius.

▶ Find the radius $\rho$ of the circle that is the intersection of the plane $\hat{\mathbf{n}}\cdot\mathbf{r} = p$ and the sphere of radius $a$ centred on the point with position vector $\mathbf{c}$.

The equation of the sphere is

$$|\mathbf{r} - \mathbf{c}|^2 = a^2, \tag{7.44}$$

and that of the circle of intersection is

$$|\mathbf{r} - \mathbf{b}|^2 = \rho^2, \tag{7.45}$$

where $\mathbf{r}$ is restricted to lie in the plane and $\mathbf{b}$ is the position of the circle's centre.

As $\mathbf{b}$ lies on the plane whose normal is $\hat{\mathbf{n}}$, the vector $\mathbf{b} - \mathbf{c}$ must be parallel to $\hat{\mathbf{n}}$, i.e. $\mathbf{b} - \mathbf{c} = \lambda\hat{\mathbf{n}}$ for some $\lambda$. Further, by Pythagoras, we must have $\rho^2 + |\mathbf{b} - \mathbf{c}|^2 = a^2$. Thus $\lambda^2 = a^2 - \rho^2$.

Writing $\mathbf{b} = \mathbf{c} + \sqrt{a^2 - \rho^2}\,\hat{\mathbf{n}}$ and substituting in (7.45) gives

$$r^2 - 2\mathbf{r}\cdot\left(\mathbf{c} + \sqrt{a^2 - \rho^2}\,\hat{\mathbf{n}}\right) + c^2 + 2(\mathbf{c}\cdot\hat{\mathbf{n}})\sqrt{a^2 - \rho^2} + a^2 - \rho^2 = \rho^2,$$

whilst, on expansion, (7.44) becomes

$$r^2 - 2\mathbf{r}\cdot\mathbf{c} + c^2 = a^2.$$

Subtracting these last two equations, using $\hat{\mathbf{n}}\cdot\mathbf{r} = p$ and simplifying yields

$$p - \mathbf{c}\cdot\hat{\mathbf{n}} = \sqrt{a^2 - \rho^2}.$$

On rearrangement, this gives $\rho$ as $\sqrt{a^2 - (p - \mathbf{c}\cdot\hat{\mathbf{n}})^2}$, which places obvious geometrical constraints on the values $a$, $\mathbf{c}$, $\hat{\mathbf{n}}$ and $p$ can take if a real intersection between the sphere and the plane is to occur.
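The final formula, and the geometrical constraint that accompanies it, can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function name `circle_radius`, its argument order and the sample sphere and planes are all our own inventions, and the normal passed in is assumed already to be a unit vector.

```python
import math


def dot(a, b):
    """Scalar product in Cartesian components."""
    return sum(x * y for x, y in zip(a, b))


def circle_radius(n_hat, p, c, a):
    """Radius of the circle in which the plane n_hat . r = p cuts the sphere
    of radius a centred on c, i.e. sqrt(a^2 - (p - c . n_hat)^2).

    Returns None when the plane misses the sphere. n_hat must be a unit vector.
    """
    h = p - dot(c, n_hat)  # signed distance of the plane from the sphere's centre
    if abs(h) > a:
        return None        # no real intersection
    return math.sqrt(a * a - h * h)


# Hypothetical data: a sphere of radius 5 centred on the origin,
# cut by the planes z = 3 and z = 6.
rho = circle_radius((0.0, 0.0, 1.0), 3.0, (0.0, 0.0, 0.0), 5.0)
miss = circle_radius((0.0, 0.0, 1.0), 6.0, (0.0, 0.0, 0.0), 5.0)
```

Here the plane $z = 3$ cuts the sphere in a circle of radius 4, whereas the plane $z = 6$ lies wholly outside it; the condition for a real intersection is simply $|p - \mathbf{c}\cdot\hat{\mathbf{n}}| \le a$.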
◀

7.8 Using vectors to find distances

This section deals with the practical application of vectors to finding distances. Some of these problems are extremely cumbersome in component form, but they all reduce to neat solutions when general vectors, with no explicit basis set, are used. These examples show the power of vectors in simplifying geometrical problems.

7.8.1 Distance from a point to a line

Figure 7.14 shows a line having direction $\mathbf{b}$ that passes through a point $A$ whose position vector is $\mathbf{a}$. To find the minimum distance $d$ of the line from a point $P$ whose position vector is $\mathbf{p}$, we must solve the right-angled triangle shown. We see that $d = |\mathbf{p} - \mathbf{a}|\sin\theta$; so, from the definition of the vector product, it follows that

$$d = |(\mathbf{p} - \mathbf{a})\times\hat{\mathbf{b}}|.$$

Figure 7.14 The minimum distance from a point to a line.

▶ Find the minimum distance from the point $P$ with coordinates $(1, 2, 1)$ to the line $\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}$, where $\mathbf{a} = \mathbf{i} + \mathbf{j} + \mathbf{k}$ and $\mathbf{b} = 2\mathbf{i} - \mathbf{j} + 3\mathbf{k}$.

Comparison with (7.39) shows that the line passes through the point $(1, 1, 1)$ and has direction $2\mathbf{i} - \mathbf{j} + 3\mathbf{k}$. The unit vector in this direction is

$$\hat{\mathbf{b}} = \frac{1}{\sqrt{14}}(2\mathbf{i} - \mathbf{j} + 3\mathbf{k}).$$

The position vector of $P$ is $\mathbf{p} = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$ and we find

$$(\mathbf{p} - \mathbf{a})\times\hat{\mathbf{b}} = \frac{1}{\sqrt{14}}\,[\,\mathbf{j}\times(2\mathbf{i} - \mathbf{j} + 3\mathbf{k})] = \frac{1}{\sqrt{14}}(3\mathbf{i} - 2\mathbf{k}).$$

Thus the minimum distance from the line to the point $P$ is $d = \sqrt{13/14}$. ◀

7.8.2 Distance from a point to a plane

The minimum distance $d$ from a point $P$ whose position vector is $\mathbf{p}$ to the plane defined by $(\mathbf{r} - \mathbf{a})\cdot\hat{\mathbf{n}} = 0$ may be deduced by finding any vector from $P$ to the plane and then determining its component in the normal direction. This is shown in figure 7.15. Consider the vector $\mathbf{a} - \mathbf{p}$, which is a particular vector from $P$ to the plane. Its component normal to the plane, and hence its distance from the plane, is given by

$$d = (\mathbf{a} - \mathbf{p})\cdot\hat{\mathbf{n}}, \tag{7.46}$$

where the sign of $d$ depends on which side of the plane $P$ is situated.
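Both distance formulae of this section so far reduce to a line or two of code. In the sketch below the helper names are our own, the normal and direction vectors are normalised inside the functions rather than assumed to be unit vectors, and the data for the plane are hypothetical; the data for the line are those of the worked example in subsection 7.8.1.

```python
import math


def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Scalar product in Cartesian components."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product in Cartesian components, equation (7.33)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    """Magnitude of a vector."""
    return math.sqrt(dot(a, a))


def point_line_distance(p, a, b):
    """Minimum distance from the point p to the line r = a + lambda*b."""
    return norm(cross(sub(p, a), b)) / norm(b)


def point_plane_distance(p, a, n):
    """Signed distance (a - p) . n_hat of equation (7.46); n need not be a unit vector."""
    return dot(sub(a, p), n) / norm(n)


# The worked example of subsection 7.8.1.
d_line = point_line_distance((1, 2, 1), (1, 1, 1), (2, -1, 3))

# Hypothetical plane z = 1, specified by the point (0, 0, 1) and the normal 2k.
d_plane = point_plane_distance((3, 4, 5), (0, 0, 1), (0, 0, 2))
```

The first call reproduces $d = \sqrt{13/14} \approx 0.964$; the second returns $-4$, the negative sign indicating that the point lies on the side of the plane towards which the chosen normal points.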
Figure 7.15 The minimum distance $d$ from a point to a plane.

▶ Find the distance from the point $P$ with coordinates $(1, 2, 3)$ to the plane that contains the points $A$, $B$ and $C$ having coordinates $(0, 1, 0)$, $(2, 3, 1)$ and $(5, 7, 2)$.

Let us denote the position vectors of the points $A$, $B$, $C$ by $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$. Two vectors in the plane are

$$\mathbf{b} - \mathbf{a} = 2\mathbf{i} + 2\mathbf{j} + \mathbf{k} \qquad\text{and}\qquad \mathbf{c} - \mathbf{a} = 5\mathbf{i} + 6\mathbf{j} + 2\mathbf{k},$$

and hence a vector normal to the plane is

$$\mathbf{n} = (2\mathbf{i} + 2\mathbf{j} + \mathbf{k})\times(5\mathbf{i} + 6\mathbf{j} + 2\mathbf{k}) = -2\mathbf{i} + \mathbf{j} + 2\mathbf{k},$$

and its unit normal is

$$\hat{\mathbf{n}} = \frac{\mathbf{n}}{|\mathbf{n}|} = \tfrac{1}{3}(-2\mathbf{i} + \mathbf{j} + 2\mathbf{k}).$$

Denoting the position vector of $P$ by $\mathbf{p}$, the minimum distance from the plane to $P$ is given by

$$\begin{aligned}
d = (\mathbf{a} - \mathbf{p})\cdot\hat{\mathbf{n}} &= (-\mathbf{i} - \mathbf{j} - 3\mathbf{k})\cdot\tfrac{1}{3}(-2\mathbf{i} + \mathbf{j} + 2\mathbf{k}) \\
&= \tfrac{2}{3} - \tfrac{1}{3} - 2 = -\tfrac{5}{3}.
\end{aligned}$$

If we take $P$ to be the origin $O$, then we find $d = \tfrac{1}{3}$, i.e. a positive quantity. It follows from this that the original point $P$ with coordinates $(1, 2, 3)$, for which $d$ was negative, is on the opposite side of the plane from the origin. ◀

7.8.3 Distance from a line to a line

Consider two lines in the directions $\mathbf{a}$ and $\mathbf{b}$, as shown in figure 7.16. Since $\mathbf{a}\times\mathbf{b}$ is by definition perpendicular to both $\mathbf{a}$ and $\mathbf{b}$, the unit vector normal to both these lines is

$$\hat{\mathbf{n}} = \frac{\mathbf{a}\times\mathbf{b}}{|\mathbf{a}\times\mathbf{b}|}.$$

Figure 7.16 The minimum distance from one line to another.

If $\mathbf{p}$ and $\mathbf{q}$ are the position vectors of any two points $P$ and $Q$ on different lines then the vector connecting them is $\mathbf{p} - \mathbf{q}$. Thus, the minimum distance $d$ between the lines is this vector's component along the unit normal, i.e.

$$d = |(\mathbf{p} - \mathbf{q})\cdot\hat{\mathbf{n}}|.$$

▶ A line is inclined at equal angles to the $x$-, $y$- and $z$-axes and passes through the origin. Another line passes through the points $(1, 2, 4)$ and $(0, 0, 1)$. Find the minimum distance between the two lines.

The first line is given by

$$\mathbf{r}_1 = \lambda(\mathbf{i} + \mathbf{j} + \mathbf{k}),$$

and the second by

$$\mathbf{r}_2 = \mathbf{k} + \mu(\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}).$$

Hence a vector normal to both lines is

$$\mathbf{n} = (\mathbf{i} + \mathbf{j} + \mathbf{k})\times(\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}) = \mathbf{i} - 2\mathbf{j} + \mathbf{k},$$

and the unit normal is

$$\hat{\mathbf{n}} = \frac{1}{\sqrt{6}}(\mathbf{i} - 2\mathbf{j} + \mathbf{k}).$$
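A vector between the two lines is, for example, the one connecting the points $(0, 0, 0)$ and $(0, 0, 1)$, which is simply $\mathbf{k}$. Thus it follows that the minimum distance between the two lines is

$$d = \frac{1}{\sqrt{6}}\,|\mathbf{k}\cdot(\mathbf{i} - 2\mathbf{j} + \mathbf{k})| = \frac{1}{\sqrt{6}}.$$ ◀

7.8.4 Distance from a line to a plane

Let us consider the line $\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}$. This line will intersect any plane to which it is not parallel. Thus, if a plane has a normal $\hat{\mathbf{n}}$ then the minimum distance from the line to the plane is zero unless

$$\mathbf{b}\cdot\hat{\mathbf{n}} = 0,$$

in which case the distance, $d$, will be

$$d = |(\mathbf{a} - \mathbf{r})\cdot\hat{\mathbf{n}}|,$$

where $\mathbf{r}$ is any point in the plane.

▶ A line is given by $\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}$, where $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$ and $\mathbf{b} = 4\mathbf{i} + 5\mathbf{j} + 6\mathbf{k}$. Find the coordinates of the point $P$ at which the line intersects the plane $x + 2y + 3z = 6$.

A vector normal to the plane is

$$\mathbf{n} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k},$$

from which we find that $\mathbf{b}\cdot\mathbf{n} \neq 0$. Thus the line does indeed intersect the plane. To find the point of intersection we merely substitute the $x$-, $y$- and $z$-values of a general point on the line into the equation of the plane, obtaining

$$1 + 4\lambda + 2(2 + 5\lambda) + 3(3 + 6\lambda) = 6 \quad\Rightarrow\quad 14 + 32\lambda = 6.$$

This gives $\lambda = -\tfrac{1}{4}$, which we may substitute into the equation for the line to obtain $x = 1 - \tfrac{1}{4}(4) = 0$, $y = 2 - \tfrac{1}{4}(5) = \tfrac{3}{4}$ and $z = 3 - \tfrac{1}{4}(6) = \tfrac{3}{2}$. Thus the point of intersection is $(0, \tfrac{3}{4}, \tfrac{3}{2})$. ◀

7.9 Reciprocal vectors

The final section of this chapter introduces the concept of reciprocal vectors, which have particular uses in crystallography.

The two sets of vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{a}'$, $\mathbf{b}'$, $\mathbf{c}'$ are called reciprocal sets if

$$\mathbf{a}\cdot\mathbf{a}' = \mathbf{b}\cdot\mathbf{b}' = \mathbf{c}\cdot\mathbf{c}' = 1 \tag{7.47}$$

and

$$\mathbf{a}'\cdot\mathbf{b} = \mathbf{a}'\cdot\mathbf{c} = \mathbf{b}'\cdot\mathbf{a} = \mathbf{b}'\cdot\mathbf{c} = \mathbf{c}'\cdot\mathbf{a} = \mathbf{c}'\cdot\mathbf{b} = 0. \tag{7.48}$$

It can be verified (see exercise 7.19) that the reciprocal vectors of $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are given by

$$\mathbf{a}' = \frac{\mathbf{b}\times\mathbf{c}}{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})}, \tag{7.49}$$

$$\mathbf{b}' = \frac{\mathbf{c}\times\mathbf{a}}{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})}, \tag{7.50}$$

$$\mathbf{c}' = \frac{\mathbf{a}\times\mathbf{b}}{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})}, \tag{7.51}$$

where $\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) \neq 0$.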
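Expressions (7.49)–(7.51) may be checked against the defining properties (7.47) and (7.48) for any particular non-coplanar set. The following is a minimal sketch in which the function name `reciprocal`, the use of exact fractions (which assumes integer components) and the sample vectors are all our own illustrative choices.

```python
from fractions import Fraction


def dot(a, b):
    """Scalar product in Cartesian components."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product in Cartesian components, equation (7.33)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def reciprocal(a, b, c):
    """Reciprocal vectors of a, b, c from (7.49)-(7.51); integer components assumed."""
    triple = dot(a, cross(b, c))
    if triple == 0:
        raise ValueError("a, b and c are coplanar, so the scalar triple product vanishes")

    def scale(v):
        # Divide each component by the scalar triple product, exactly.
        return tuple(Fraction(x, triple) for x in v)

    return scale(cross(b, c)), scale(cross(c, a)), scale(cross(a, b))


# A hypothetical non-orthogonal, non-coplanar set.
a, b, c = (1, 0, 0), (1, 1, 0), (1, 1, 1)
ra, rb, rc = reciprocal(a, b, c)

# Rows: a', b', c'; columns: a, b, c. Properties (7.47) and (7.48)
# together say that this table is the 3x3 unit matrix.
table = [[dot(r, v) for v in (a, b, c)] for r in (ra, rb, rc)]
```

For this set the function returns $\mathbf{a}' = \mathbf{i} - \mathbf{j}$, $\mathbf{b}' = \mathbf{j} - \mathbf{k}$ and $\mathbf{c}' = \mathbf{k}$, and the table of scalar products is the unit matrix. The explicit error raised for a vanishing scalar triple product reflects the restriction stated with (7.49)–(7.51).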
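In other words, reciprocal vectors only exist if $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are not coplanar. Moreover, if $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are mutually orthogonal unit vectors then $\mathbf{a}' = \mathbf{a}$, $\mathbf{b}' = \mathbf{b}$ and $\mathbf{c}' = \mathbf{c}$, so that the two systems of vectors are identical.

▶ Construct the reciprocal vectors of $\mathbf{a} = 2\mathbf{i}$, $\mathbf{b} = \mathbf{j} + \mathbf{k}$, $\mathbf{c} = \mathbf{i} + \mathbf{k}$.

First we evaluate the triple scalar product:

$$\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c}) = 2\mathbf{i}\cdot[(\mathbf{j} + \mathbf{k})\times(\mathbf{i} + \mathbf{k})] = 2\mathbf{i}\cdot(\mathbf{i} + \mathbf{j} - \mathbf{k}) = 2.$$

Now we find the reciprocal vectors:

$$\mathbf{a}' = \tfrac{1}{2}(\mathbf{j} + \mathbf{k})\times(\mathbf{i} + \mathbf{k}) = \tfrac{1}{2}(\mathbf{i} + \mathbf{j} - \mathbf{k}),$$

$$\mathbf{b}' = \tfrac{1}{2}(\mathbf{i} + \mathbf{k})\times 2\mathbf{i} = \mathbf{j},$$

$$\mathbf{c}' = \tfrac{1}{2}(2\mathbf{i})\times(\mathbf{j} + \mathbf{k}) = -\mathbf{j} + \mathbf{k}.$$

It is easily verified that these reciprocal vectors satisfy their defining properties (7.47), (7.48). ◀

We may also use the concept of reciprocal vectors to define the components of a vector $\mathbf{a}$ with respect to basis vectors $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ that are not mutually orthogonal. If the basis vectors are of unit length and mutually orthogonal, such as the Cartesian basis vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, then (see the text preceding (7.21)) the vector $\mathbf{a}$ can be written in the form

$$\mathbf{a} = (\mathbf{a}\cdot\mathbf{i})\mathbf{i} + (\mathbf{a}\cdot\mathbf{j})\mathbf{j} + (\mathbf{a}\cdot\mathbf{k})\mathbf{k}.$$

If the basis is not orthonormal, however, then this is no longer true. Nevertheless, we may write the components of $\mathbf{a}$ with respect to a non-orthonormal basis $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ in terms of its reciprocal basis vectors $\mathbf{e}_1'$, $\mathbf{e}_2'$, $\mathbf{e}_3'$, which are defined as in (7.49)–(7.51). If we let

$$\mathbf{a} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3,$$

then the scalar product $\mathbf{a}\cdot\mathbf{e}_1'$ is given by

$$\mathbf{a}\cdot\mathbf{e}_1' = a_1\,\mathbf{e}_1\cdot\mathbf{e}_1' + a_2\,\mathbf{e}_2\cdot\mathbf{e}_1' + a_3\,\mathbf{e}_3\cdot\mathbf{e}_1' = a_1,$$

where we have used the relations (7.47) and (7.48). Similarly, $a_2 = \mathbf{a}\cdot\mathbf{e}_2'$ and $a_3 = \mathbf{a}\cdot\mathbf{e}_3'$; so now

$$\mathbf{a} = (\mathbf{a}\cdot\mathbf{e}_1')\mathbf{e}_1 + (\mathbf{a}\cdot\mathbf{e}_2')\mathbf{e}_2 + (\mathbf{a}\cdot\mathbf{e}_3')\mathbf{e}_3. \tag{7.52}$$

7.10 Exercises

7.1 Which of the following statements about general vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are true?
(a) $\mathbf{c}\cdot(\mathbf{a}\times\mathbf{b}) = (\mathbf{b}\times\mathbf{a})\cdot\mathbf{c}$.
(b) $\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\times\mathbf{b})\times\mathbf{c}$.
(c) $\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\mathbf{c}$.
(d) $\mathbf{d} = \lambda\mathbf{a} + \mu\mathbf{b}$ implies $(\mathbf{a}\times\mathbf{b})\cdot\mathbf{d} = 0$.
(e) $\mathbf{a}\times\mathbf{c} = \mathbf{b}\times\mathbf{c}$ implies $\mathbf{c}\cdot\mathbf{a} - \mathbf{c}\cdot\mathbf{b} = c\,|\mathbf{a} - \mathbf{b}|$.
(f) $(\mathbf{a}\times\mathbf{b})\times(\mathbf{c}\times\mathbf{b}) = \mathbf{b}\,[\mathbf{b}\cdot(\mathbf{c}\times\mathbf{a})]$.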
7.2 A unit cell of diamond is a cube of side $A$, with carbon atoms at each corner, at the centre of each face and, in addition, at positions displaced by $\tfrac{1}{4}A(\mathbf{i} + \mathbf{j} + \mathbf{k})$ from each of those already mentioned; $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ are unit vectors along the cube axes. One corner of the cube is taken as the origin of coordinates. What are the vectors joining the atom at $\tfrac{1}{4}A(\mathbf{i} + \mathbf{j} + \mathbf{k})$ to its four nearest neighbours? Determine the angle between the carbon bonds in diamond.

7.3 Identify the following surfaces:
(a) $|\mathbf{r}| = k$;
(b) $\mathbf{r}\cdot\mathbf{u} = l$;
(c) $\mathbf{r}\cdot\mathbf{u} = m|\mathbf{r}|$ for $-1 \le m \le +1$;
(d) $|\mathbf{r} - (\mathbf{r}\cdot\mathbf{u})\mathbf{u}| = n$.
Here $k$, $l$, $m$ and $n$ are fixed scalars and $\mathbf{u}$ is a fixed unit vector.

7.4 Find the angle between the position vectors to the points $(3, -4, 0)$ and $(-2, 1, 0)$ and find the direction cosines of a vector perpendicular to both.

7.5 $A$, $B$, $C$ and $D$ are the four corners, in order, of one face of a cube of side 2 units. The opposite face has corners $E$, $F$, $G$ and $H$, with $AE$, $BF$, $CG$ and $DH$ as parallel edges of the cube. The centre $O$ of the cube is taken as the origin and the $x$-, $y$- and $z$-axes are parallel to $AD$, $AE$ and $AB$, respectively. Find the following:
(a) the angle between the face diagonal $AF$ and the body diagonal $AG$;
(b) the equation of the plane through $B$ that is parallel to the plane $CGE$;
(c) the perpendicular distance from the centre $J$ of the face $BCGF$ to the plane $OCG$;
(d) the volume of the tetrahedron $JOCG$.

7.6 Use vector methods to prove that the lines joining the mid-points of the opposite edges of a tetrahedron $OABC$ meet at a point and that this point bisects each of the lines.

7.7 The edges $OP$, $OQ$ and $OR$ of a tetrahedron $OPQR$ are vectors $\mathbf{p}$, $\mathbf{q}$ and $\mathbf{r}$, respectively, where $\mathbf{p} = 2\mathbf{i} + 4\mathbf{j}$, $\mathbf{q} = 2\mathbf{i} - \mathbf{j} + 3\mathbf{k}$ and $\mathbf{r} = 4\mathbf{i} - 2\mathbf{j} + 5\mathbf{k}$. Show that $OP$ is perpendicular to the plane containing $OQR$. Express the volume of the tetrahedron in terms of $\mathbf{p}$, $\mathbf{q}$ and $\mathbf{r}$ and hence calculate the volume.
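7.8 Prove, by writing it out in component form, that

$$(\mathbf{a}\times\mathbf{b})\times\mathbf{c} = (\mathbf{a}\cdot\mathbf{c})\mathbf{b} - (\mathbf{b}\cdot\mathbf{c})\mathbf{a},$$

and deduce the result, stated in equation (7.25), that the operation of forming the vector product is non-associative.

7.9 Prove Lagrange's identity, i.e.

$$(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) = (\mathbf{a}\cdot\mathbf{c})(\mathbf{b}\cdot\mathbf{d}) - (\mathbf{a}\cdot\mathbf{d})(\mathbf{b}\cdot\mathbf{c}).$$

7.10 For four arbitrary vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{d}$, evaluate $(\mathbf{a}\times\mathbf{b})\times(\mathbf{c}\times\mathbf{d})$ in two different ways and so prove that

$$\mathbf{a}[\mathbf{b},\mathbf{c},\mathbf{d}] - \mathbf{b}[\mathbf{c},\mathbf{d},\mathbf{a}] + \mathbf{c}[\mathbf{d},\mathbf{a},\mathbf{b}] - \mathbf{d}[\mathbf{a},\mathbf{b},\mathbf{c}] = \mathbf{0}.$$

Show that this reduces to the normal Cartesian representation of the vector $\mathbf{d}$, i.e. $d_x\mathbf{i} + d_y\mathbf{j} + d_z\mathbf{k}$, if $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are taken as $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$, the Cartesian base vectors.

7.11 Show that the points $(1, 0, 1)$, $(1, 1, 0)$ and $(1, -3, 4)$ lie on a straight line. Give the equation of the line in the form $\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}$.

7.12 The plane $P_1$ contains the points $A$, $B$ and $C$, which have position vectors $\mathbf{a} = -3\mathbf{i} + 2\mathbf{j}$, $\mathbf{b} = 7\mathbf{i} + 2\mathbf{j}$ and $\mathbf{c} = 2\mathbf{i} + 3\mathbf{j} + 2\mathbf{k}$, respectively. Plane $P_2$ passes through $A$ and is orthogonal to the line $BC$, whilst plane $P_3$ passes through $B$ and is orthogonal to the line $AC$. Find the coordinates of $\mathbf{r}$, the point of intersection of the three planes.

7.13 Two planes have non-parallel unit normals $\hat{\mathbf{n}}$ and $\hat{\mathbf{m}}$ and their closest distances from the origin are $\lambda$ and $\mu$, respectively. Find the vector equation of their line of intersection in the form $\mathbf{r} = \nu\mathbf{p} + \mathbf{a}$.

7.14 Two fixed points, $A$ and $B$, in three-dimensional space have position vectors $\mathbf{a}$ and $\mathbf{b}$. Identify the plane $P$ given by

$$(\mathbf{a} - \mathbf{b})\cdot\mathbf{r} = \tfrac{1}{2}(a^2 - b^2),$$

where $a$ and $b$ are the magnitudes of $\mathbf{a}$ and $\mathbf{b}$. Show also that the equation

$$(\mathbf{a} - \mathbf{r})\cdot(\mathbf{b} - \mathbf{r}) = 0$$

describes a sphere $S$ of radius $|\mathbf{a} - \mathbf{b}|/2$. Deduce that the intersection of $P$ and $S$ is also the intersection of two spheres, centred on $A$ and $B$, and each of radius $|\mathbf{a} - \mathbf{b}|/\sqrt{2}$.

7.15 Let $O$, $A$, $B$ and $C$ be four points with position vectors $\mathbf{0}$, $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$, and denote by $\mathbf{g} = \lambda\mathbf{a} + \mu\mathbf{b} + \nu\mathbf{c}$ the position of the centre of the sphere on which they all lie.
(a) Prove that $\lambda$, $\mu$ and $\nu$ simultaneously satisfy

$$(\mathbf{a}\cdot\mathbf{a})\lambda + (\mathbf{a}\cdot\mathbf{b})\mu + (\mathbf{a}\cdot\mathbf{c})\nu = \tfrac{1}{2}a^2$$

and two other similar equations.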
(b) By making a change of origin, find the centre and radius of the sphere on which the points $\mathbf{p} = 3\mathbf{i} + \mathbf{j} - 2\mathbf{k}$, $\mathbf{q} = 4\mathbf{i} + 3\mathbf{j} - 3\mathbf{k}$, $\mathbf{r} = 7\mathbf{i} - 3\mathbf{k}$ and $\mathbf{s} = 6\mathbf{i} + \mathbf{j} - \mathbf{k}$ all lie.

7.16 The vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are coplanar and related by $\lambda\mathbf{a} + \mu\mathbf{b} + \nu\mathbf{c} = 0$, where $\lambda$, $\mu$, $\nu$ are not all zero. Show that the condition for the points with position vectors $\alpha\mathbf{a}$, $\beta\mathbf{b}$ and $\gamma\mathbf{c}$ to be collinear is
$$\frac{\lambda}{\alpha} + \frac{\mu}{\beta} + \frac{\nu}{\gamma} = 0.$$

7.17 Using vector methods: (a) Show that the line of intersection of the planes $x + 2y + 3z = 0$ and $3x + 2y + z = 0$ is equally inclined to the $x$- and $z$-axes and makes an angle $\cos^{-1}(-2/\sqrt{6})$ with the $y$-axis. (b) Find the perpendicular distance between one corner of a unit cube and the major diagonal not passing through it.

7.18 Four points $X_i$, $i = 1,2,3,4$, taken for simplicity as all lying within the octant $x,y,z \ge 0$, have position vectors $\mathbf{x}_i$. Convince yourself that the direction of vector $\mathbf{x}_n$ lies within the sector of space defined by the directions of the other three vectors if
$$\min_{\text{over } j}\left[\frac{\mathbf{x}_i\cdot\mathbf{x}_j}{|\mathbf{x}_i||\mathbf{x}_j|}\right],$$
considered for $i = 1,2,3,4$ in turn, takes its maximum value for $i = n$, i.e. $n$ equals that value of $i$ for which the largest of the set of angles which $\mathbf{x}_i$ makes with the other vectors is found to be the lowest. Determine whether any of the four points with coordinates $X_1 = (3,2,2)$, $X_2 = (2,3,1)$, $X_3 = (2,1,3)$, $X_4 = (3,0,3)$ lies within the tetrahedron defined by the origin and the other three points.

[Figure 7.17 A face-centred cubic crystal. Image not reproduced; labels recoverable from the extracted text: $a$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$.]

7.19 The vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are not coplanar. The vectors $\mathbf{a}'$, $\mathbf{b}'$ and $\mathbf{c}'$ are the associated reciprocal vectors. Verify that the expressions (7.49)–(7.51) define a set of reciprocal vectors $\mathbf{a}'$, $\mathbf{b}'$ and $\mathbf{c}'$ with the following properties: (a) $\mathbf{a}'\cdot\mathbf{a} = \mathbf{b}'\cdot\mathbf{b} = \mathbf{c}'\cdot\mathbf{c} = 1$; (b) $\mathbf{a}'\cdot\mathbf{b} = \mathbf{a}'\cdot\mathbf{c} = \mathbf{b}'\cdot\mathbf{a}$ etc $= 0$; (c) $[\mathbf{a}',\mathbf{b}',\mathbf{c}'] = 1/[\mathbf{a},\mathbf{b},\mathbf{c}]$; (d) $\mathbf{a} = (\mathbf{b}'\times\mathbf{c}')/[\mathbf{a}',\mathbf{b}',\mathbf{c}']$.
7.20 Three non-coplanar vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$, have as their respective reciprocal vectors the set $\mathbf{a}'$, $\mathbf{b}'$ and $\mathbf{c}'$. Show that the normal to the plane containing the points $k^{-1}\mathbf{a}$, $l^{-1}\mathbf{b}$ and $m^{-1}\mathbf{c}$ is in the direction of the vector $k\mathbf{a}' + l\mathbf{b}' + m\mathbf{c}'$.

7.21 In a crystal with a face-centred cubic structure, the basic cell can be taken as a cube of edge $a$ with its centre at the origin of coordinates and its edges parallel to the Cartesian coordinate axes; atoms are sited at the eight corners and at the centre of each face. However, other basic cells are possible. One is the rhomboid shown in figure 7.17, which has the three vectors $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{d}$ as edges. (a) Show that the volume of the rhomboid is one-quarter that of the cube. (b) Show that the angles between pairs of edges of the rhomboid are $60^\circ$ and that the corresponding angles between pairs of edges of the rhomboid defined by the reciprocal vectors to $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$ are each $109.5^\circ$. (This rhomboid can be used as the basic cell of a body-centred cubic structure, more easily visualised as a cube with an atom at each corner and one at its centre.) (c) In order to use the Bragg formula, $2d\sin\theta = n\lambda$, for the scattering of X-rays by a crystal, it is necessary to know the perpendicular distance $d$ between successive planes of atoms; for a given crystal structure, $d$ has a particular value for each set of planes considered. For the face-centred cubic structure find the distance between successive planes with normals in the $\mathbf{k}$, $\mathbf{i}+\mathbf{j}$ and $\mathbf{i}+\mathbf{j}+\mathbf{k}$ directions.

7.22 In subsection 7.6.2 we showed how the moment or torque of a force about an axis could be represented by a vector in the direction of the axis. The magnitude of the vector gives the size of the moment and the sign of the vector gives the sense. Similar representations can be used for angular velocities and angular momenta.
(a) The magnitude of the angular momentum about the origin of a particle of mass $m$ moving with velocity $\mathbf{v}$ on a path that is a perpendicular distance $d$ from the origin is given by $m|\mathbf{v}|d$. Show that if $\mathbf{r}$ is the position of the particle then the vector $\mathbf{J} = \mathbf{r}\times m\mathbf{v}$ represents the angular momentum. (b) Now consider a rigid collection of particles (or a solid body) rotating about an axis through the origin, the angular velocity of the collection being represented by $\boldsymbol{\omega}$. (i) Show that the velocity of the $i$th particle is $\mathbf{v}_i = \boldsymbol{\omega}\times\mathbf{r}_i$ and that the total angular momentum $\mathbf{J}$ is
$$\mathbf{J} = \sum_i m_i\left[r_i^2\boldsymbol{\omega} - (\mathbf{r}_i\cdot\boldsymbol{\omega})\mathbf{r}_i\right].$$
(ii) Show further that the component of $\mathbf{J}$ along the axis of rotation can be written as $I\omega$, where $I$, the moment of inertia of the collection about the axis of rotation, is given by
$$I = \sum_i m_i\rho_i^2.$$
Interpret $\rho_i$ geometrically. (iii) Prove that the total kinetic energy of the particles is $\tfrac{1}{2}I\omega^2$.

7.23 By proceeding as indicated below, prove the parallel axis theorem, which states that, for a body of mass $M$, the moment of inertia $I$ about any axis is related to the corresponding moment of inertia $I_0$ about a parallel axis that passes through the centre of mass of the body by
$$I = I_0 + Ma_\perp^2,$$
where $a_\perp$ is the perpendicular distance between the two axes. Note that $I_0$ can be written as
$$\int (\hat{\mathbf{n}}\times\mathbf{r})\cdot(\hat{\mathbf{n}}\times\mathbf{r})\,dm,$$
where $\mathbf{r}$ is the vector position, relative to the centre of mass, of the infinitesimal mass $dm$ and $\hat{\mathbf{n}}$ is a unit vector in the direction of the axis of rotation. Write a similar expression for $I$ in which $\mathbf{r}$ is replaced by $\mathbf{r}' = \mathbf{r} - \mathbf{a}$, where $\mathbf{a}$ is the vector position of any point on the axis to which $I$ refers. Use Lagrange's identity and the fact that $\int \mathbf{r}\,dm = \mathbf{0}$ (by the definition of the centre of mass) to establish the result.
7.24 Without carrying out any further integration, use the results of the previous exercise, the worked example in subsection 6.3.4 and exercise 6.10 to prove that the moment of inertia of a uniform rectangular lamina, of mass $M$ and sides $a$ and $b$, about an axis perpendicular to its plane and passing through the point $(\alpha a/2, \beta b/2)$, with $-1 \le \alpha,\beta \le 1$, is
$$\frac{M}{12}\left[a^2(1 + 3\alpha^2) + b^2(1 + 3\beta^2)\right].$$

[Figure 7.18 An oscillatory electric circuit. The power supply has angular frequency $\omega = 2\pi f = 400\pi\ \mathrm{s}^{-1}$. Image not reproduced; labels recoverable from the extracted text: supply $V_0\cos\omega t$; potential differences $V_1$, $V_2$, $V_3$, $V_4$; $R_1 = 50\,\Omega$, $R_2$; currents $I_1$, $I_2$, $I_3$; inductance $L$; $C = 10\,\mu\mathrm{F}$.]

7.25 Define a set of (non-orthogonal) base vectors $\mathbf{a} = \mathbf{j} + \mathbf{k}$, $\mathbf{b} = \mathbf{i} + \mathbf{k}$ and $\mathbf{c} = \mathbf{i} + \mathbf{j}$. (a) Establish their reciprocal vectors and hence express the vectors $\mathbf{p} = 3\mathbf{i} - 2\mathbf{j} + \mathbf{k}$, $\mathbf{q} = \mathbf{i} + 4\mathbf{j}$ and $\mathbf{r} = -2\mathbf{i} + \mathbf{j} + \mathbf{k}$ in terms of the base vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. (b) Verify that the scalar product $\mathbf{p}\cdot\mathbf{q}$ has the same value, $-5$, when evaluated using either set of components.

7.26 Systems that can be modelled as damped harmonic oscillators are widespread; pendulum clocks, car shock absorbers, tuning circuits in television sets and radios, and collective electron motions in plasmas and metals are just a few examples. In all these cases, one or more variables describing the system obey(s) an equation of the form
$$\ddot{x} + 2\gamma\dot{x} + \omega_0^2 x = P\cos\omega t,$$
where $\dot{x} = dx/dt$, etc. and the inclusion of the factor 2 is conventional. In the steady state (i.e. after the effects of any initial displacement or velocity have been damped out) the solution of the equation takes the form
$$x(t) = A\cos(\omega t + \phi).$$
By expressing each term in the form $B\cos(\omega t + \epsilon)$, and representing it by a vector of magnitude $B$ making an angle $\epsilon$ with the $x$-axis, draw a closed vector diagram, at $t = 0$, say, that is equivalent to the equation. (a) Convince yourself that whatever the value of $\omega$ $(> 0)$ $\phi$ must be negative $(-\pi < \phi \le 0)$ and that
$$\phi = \tan^{-1}\left(\frac{-2\gamma\omega}{\omega_0^2 - \omega^2}\right).$$
(b) Obtain an expression for $A$ in terms of $P$, $\omega_0$ and $\omega$.
7.27 According to alternating current theory, the currents and potential differences in the components of the circuit shown in figure 7.18 are determined by Kirchhoff's laws and the relationships
$$I_1 = \frac{V_1}{R_1},\quad I_2 = \frac{V_2}{R_2},\quad I_3 = i\omega C V_3,\quad V_4 = i\omega L I_2.$$
The factor $i = \sqrt{-1}$ in the expression for $I_3$ indicates that the phase of $I_3$ is $90^\circ$ ahead of $V_3$. Similarly the phase of $V_4$ is $90^\circ$ ahead of $I_2$. Measurement shows that $V_3$ has an amplitude of $0.661V_0$ and a phase of $+13.4^\circ$ relative to that of the power supply. Taking $V_0 = 1$ V, and using a series of vector plots for potential differences and currents (they could all be on the same plot if suitable scales were chosen), determine all unknown currents and potential differences and find values for the inductance of $L$ and the resistance of $R_2$. [Scales of 1 cm = 0.1 V for potential differences and 1 cm = 1 mA for currents are convenient.]

7.11 Hints and answers

7.1 (c), (d) and (e).

7.3 (a) A sphere of radius $k$ centred on the origin; (b) a plane with its normal in the direction of $\mathbf{u}$ and at a distance $l$ from the origin; (c) a cone with its axis parallel to $\mathbf{u}$ and of semiangle $\cos^{-1} m$; (d) a circular cylinder of radius $n$ with its axis parallel to $\mathbf{u}$.

7.5 (a) $\cos^{-1}\sqrt{2/3}$; (b) $z - x = 2$; (c) $1/\sqrt{2}$; (d) $\tfrac{1}{3}\,\tfrac{1}{2}(\mathbf{c}\times\mathbf{g})\cdot\mathbf{j} = \tfrac{1}{3}$.

7.7 Show that $\mathbf{q}\times\mathbf{r}$ is parallel to $\mathbf{p}$; volume $= \tfrac{1}{3}\left[\tfrac{1}{2}(\mathbf{q}\times\mathbf{r})\cdot\mathbf{p}\right] = \tfrac{5}{3}$.

7.9 Note that $(\mathbf{a}\times\mathbf{b})\cdot(\mathbf{c}\times\mathbf{d}) = \mathbf{d}\cdot[(\mathbf{a}\times\mathbf{b})\times\mathbf{c}]$ and use the result for a triple vector product to expand the expression in square brackets.

7.11 Show that the position vectors of the points are linearly dependent; $\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}$ where $\mathbf{a} = \mathbf{i} + \mathbf{k}$ and $\mathbf{b} = -\mathbf{j} + \mathbf{k}$.

7.13 Show that $\mathbf{p}$ must have the direction $\hat{\mathbf{n}}\times\hat{\mathbf{m}}$ and write $\mathbf{a}$ as $x\hat{\mathbf{n}} + y\hat{\mathbf{m}}$. By obtaining a pair of simultaneous equations for $x$ and $y$, prove that $x = (\lambda - \mu\,\hat{\mathbf{n}}\cdot\hat{\mathbf{m}})/[1 - (\hat{\mathbf{n}}\cdot\hat{\mathbf{m}})^2]$ and that $y = (\mu - \lambda\,\hat{\mathbf{n}}\cdot\hat{\mathbf{m}})/[1 - (\hat{\mathbf{n}}\cdot\hat{\mathbf{m}})^2]$.

7.15 (a) Note that $|\mathbf{a} - \mathbf{g}|^2 = R^2 = |\mathbf{0} - \mathbf{g}|^2$, leading to $\mathbf{a}\cdot\mathbf{a} = 2\mathbf{a}\cdot\mathbf{g}$.
(b) Make $\mathbf{p}$ the new origin and solve the three simultaneous linear equations to obtain $\lambda = 5/18$, $\mu = 10/18$, $\nu = -3/18$, giving $\mathbf{g} = 2\mathbf{i} - \mathbf{k}$ and a sphere of radius $\sqrt{5}$ centred on $(5,1,-3)$.

7.17 (a) Find two points on both planes, say $(0,0,0)$ and $(1,-2,1)$, and hence determine the direction cosines of the line of intersection; (b) $\left(\tfrac{2}{3}\right)^{1/2}$.

7.19 For (c) and (d), treat $(\mathbf{c}\times\mathbf{a})\times(\mathbf{a}\times\mathbf{b})$ as a triple vector product with $\mathbf{c}\times\mathbf{a}$ as one of the three vectors.

7.21 (b) $\mathbf{b}' = a^{-1}(-\mathbf{i}+\mathbf{j}+\mathbf{k})$, $\mathbf{c}' = a^{-1}(\mathbf{i}-\mathbf{j}+\mathbf{k})$, $\mathbf{d}' = a^{-1}(\mathbf{i}+\mathbf{j}-\mathbf{k})$; (c) $a/2$ for direction $\mathbf{k}$; successive planes through $(0,0,0)$ and $(a/2,0,a/2)$ give a spacing of $a/\sqrt{8}$ for direction $\mathbf{i}+\mathbf{j}$; successive planes through $(-a/2,0,0)$ and $(a/2,0,0)$ give a spacing of $a/\sqrt{3}$ for direction $\mathbf{i}+\mathbf{j}+\mathbf{k}$.

7.23 Note that $a^2 - (\hat{\mathbf{n}}\cdot\mathbf{a})^2 = a_\perp^2$.

7.25 $\mathbf{p} = -2\mathbf{a} + 3\mathbf{b}$, $\mathbf{q} = \tfrac{3}{2}\mathbf{a} - \tfrac{3}{2}\mathbf{b} + \tfrac{5}{2}\mathbf{c}$ and $\mathbf{r} = 2\mathbf{a} - \mathbf{b} - \mathbf{c}$. Remember that $\mathbf{a}\cdot\mathbf{a} = \mathbf{b}\cdot\mathbf{b} = \mathbf{c}\cdot\mathbf{c} = 2$ and $\mathbf{a}\cdot\mathbf{b} = \mathbf{a}\cdot\mathbf{c} = \mathbf{b}\cdot\mathbf{c} = 1$.

7.27 With currents in mA and potential differences in volts: $I_1 = (7.76, -23.2^\circ)$, $I_2 = (14.36, -50.8^\circ)$, $I_3 = (8.30, 103.4^\circ)$; $V_1 = (0.388, -23.2^\circ)$, $V_2 = (0.287, -50.8^\circ)$, $V_4 = (0.596, 39.2^\circ)$; $L = 33$ mH, $R_2 = 20\,\Omega$.

8 Matrices and vector spaces

In the previous chapter we defined a vector as a geometrical object which has both a magnitude and a direction and which may be thought of as an arrow fixed in our familiar three-dimensional space, a space which, if we need to, we define by reference to, say, the fixed stars. This geometrical definition of a vector is both useful and important since it is independent of any coordinate system with which we choose to label points in space.

In most specific applications, however, it is necessary at some stage to choose a coordinate system and to break down a vector into its component vectors in the directions of increasing coordinate values. Thus for a particular Cartesian coordinate system (for example) the component vectors of a vector $\mathbf{a}$ will be $a_x\mathbf{i}$, $a_y\mathbf{j}$ and $a_z\mathbf{k}$ and the complete vector will be
$$\mathbf{a} = a_x\mathbf{i} + a_y\mathbf{j} + a_z\mathbf{k}. \tag{8.1}$$

Although we have so far considered only real three-dimensional space, we may extend our notion of a vector to more abstract spaces, which in general can have an arbitrary number of dimensions $N$. We may still think of such a vector as an 'arrow' in this abstract space, so that it is again independent of any ($N$-dimensional) coordinate system with which we choose to label the space. As an example of such a space, which, though abstract, has very practical applications, we may consider the description of a mechanical or electrical system. If the state of a system is uniquely specified by assigning values to a set of $N$ variables, which could be angles or currents, for example, then that state can be represented by a vector in an $N$-dimensional space, the vector having those values as its components.

In this chapter we first discuss general vector spaces and their properties. We then go on to discuss the transformation of one vector into another by a linear operator. This leads naturally to the concept of a matrix, a two-dimensional array of numbers. The properties of matrices are then discussed and we conclude with a discussion of how to use these properties to solve systems of linear equations. The application of matrices to the study of oscillations in physical systems is taken up in chapter 9.

8.1 Vector spaces

A set of objects (vectors) $\mathbf{a}, \mathbf{b}, \mathbf{c}, \ldots$ is said to form a linear vector space $V$ if:

(i) the set is closed under commutative and associative addition, so that
$$\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}, \tag{8.2}$$
$$(\mathbf{a} + \mathbf{b}) + \mathbf{c} = \mathbf{a} + (\mathbf{b} + \mathbf{c}); \tag{8.3}$$

(ii) the set is closed under multiplication by a scalar (any complex number) to form a new vector $\lambda\mathbf{a}$, the operation being both distributive and associative so that
$$\lambda(\mathbf{a} + \mathbf{b}) = \lambda\mathbf{a} + \lambda\mathbf{b}, \tag{8.4}$$
$$(\lambda + \mu)\mathbf{a} = \lambda\mathbf{a} + \mu\mathbf{a}, \tag{8.5}$$
$$\lambda(\mu\mathbf{a}) = (\lambda\mu)\mathbf{a}, \tag{8.6}$$
where $\lambda$ and $\mu$ are arbitrary scalars;

(iii) there exists a null vector $\mathbf{0}$ such that $\mathbf{a} + \mathbf{0} = \mathbf{a}$ for all $\mathbf{a}$;

(iv) multiplication by unity leaves any vector unchanged, i.e.
$1\times\mathbf{a} = \mathbf{a}$;

(v) all vectors have a corresponding negative vector $-\mathbf{a}$ such that $\mathbf{a} + (-\mathbf{a}) = \mathbf{0}$.

It follows from (8.5) with $\lambda = 1$ and $\mu = -1$ that $-\mathbf{a}$ is the same vector as $(-1)\times\mathbf{a}$.

We note that if we restrict all scalars to be real then we obtain a real vector space (an example of which is our familiar three-dimensional space); otherwise, in general, we obtain a complex vector space. We note that it is common to use the terms 'vector space' and 'space', instead of the more formal 'linear vector space'.

The span of a set of vectors $\mathbf{a}, \mathbf{b}, \ldots, \mathbf{s}$ is defined as the set of all vectors that may be written as a linear sum of the original set, i.e. all vectors
$$\mathbf{x} = \alpha\mathbf{a} + \beta\mathbf{b} + \cdots + \sigma\mathbf{s} \tag{8.7}$$
that result from the infinite number of possible values of the (in general complex) scalars $\alpha, \beta, \ldots, \sigma$. If $\mathbf{x}$ in (8.7) is equal to $\mathbf{0}$ for some choice of $\alpha, \beta, \ldots, \sigma$ (not all zero), i.e. if
$$\alpha\mathbf{a} + \beta\mathbf{b} + \cdots + \sigma\mathbf{s} = \mathbf{0}, \tag{8.8}$$
then the set of vectors $\mathbf{a}, \mathbf{b}, \ldots, \mathbf{s}$ is said to be linearly dependent. In such a set at least one vector is redundant, since it can be expressed as a linear sum of the others. If, however, (8.8) is not satisfied by any set of coefficients (other than the trivial case in which all the coefficients are zero) then the vectors are linearly independent, and no vector in the set can be expressed as a linear sum of the others.

If, in a given vector space, there exist sets of $N$ linearly independent vectors, but no set of $N + 1$ linearly independent vectors, then the vector space is said to be $N$-dimensional. (In this chapter we will limit our discussion to vector spaces of finite dimensionality; spaces of infinite dimensionality are discussed in chapter 17.)
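The test for linear dependence in (8.8) can be tried out numerically. The following is a minimal sketch, not part of the original text: it assumes vectors with real, exactly representable components, and the helper names `rank` and `linearly_independent` are hypothetical. It counts pivots by Gaussian elimination; a set of vectors is linearly independent exactly when the number of pivots equals the number of vectors.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors, found by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]  # exact arithmetic
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Subtract multiples of the pivot row to clear the column below it.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True if (8.8) has only the trivial solution for these vectors."""
    return rank(vectors) == len(vectors)

# Three independent vectors in three dimensions (the base vectors of exercise 7.25).
independent = linearly_independent([(0, 1, 1), (1, 0, 1), (1, 1, 0)])
# The third vector is the sum of the first two, so the set is dependent.
dependent = linearly_independent([(1, 0, 0), (0, 1, 0), (1, 1, 0)])
# Any four vectors in a three-dimensional space must be dependent.
four_in_3d = linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 3, 5)])
```

The last case illustrates the definition of dimensionality: in a three-dimensional space no set of four vectors can be linearly independent.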
8.1.1 Basis vectors

If $V$ is an $N$-dimensional vector space then any set of $N$ linearly independent vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_N$ forms a basis for $V$. If $\mathbf{x}$ is an arbitrary vector lying in $V$ then the set of $N + 1$ vectors $\mathbf{x}, \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_N$ must be linearly dependent and therefore such that
$$\alpha\mathbf{e}_1 + \beta\mathbf{e}_2 + \cdots + \sigma\mathbf{e}_N + \chi\mathbf{x} = \mathbf{0}, \tag{8.9}$$
where the coefficients $\alpha, \beta, \ldots, \chi$ are not all equal to 0, and in particular $\chi \neq 0$. Rearranging (8.9) we may write $\mathbf{x}$ as a linear sum of the vectors $\mathbf{e}_i$ as follows:
$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_N\mathbf{e}_N = \sum_{i=1}^{N} x_i\mathbf{e}_i, \tag{8.10}$$
for some set of coefficients $x_i$ that are simply related to the original coefficients, e.g. $x_1 = -\alpha/\chi$, $x_2 = -\beta/\chi$, etc. Since any $\mathbf{x}$ lying in the span of $V$ can be expressed in terms of the basis or base vectors $\mathbf{e}_i$, the latter are said to form a complete set. The coefficients $x_i$ are the components of $\mathbf{x}$ with respect to the $\mathbf{e}_i$-basis. These components are unique, since if both
$$\mathbf{x} = \sum_{i=1}^{N} x_i\mathbf{e}_i \quad\text{and}\quad \mathbf{x} = \sum_{i=1}^{N} y_i\mathbf{e}_i,$$
then
$$\sum_{i=1}^{N} (x_i - y_i)\mathbf{e}_i = \mathbf{0}, \tag{8.11}$$
which, since the $\mathbf{e}_i$ are linearly independent, has only the solution $x_i = y_i$ for all $i = 1, 2, \ldots, N$.

From the above discussion we see that any set of $N$ linearly independent vectors can form a basis for an $N$-dimensional space. If we choose a different set $\mathbf{e}'_i$, $i = 1, \ldots, N$ then we can write $\mathbf{x}$ as
$$\mathbf{x} = x'_1\mathbf{e}'_1 + x'_2\mathbf{e}'_2 + \cdots + x'_N\mathbf{e}'_N = \sum_{i=1}^{N} x'_i\mathbf{e}'_i. \tag{8.12}$$
We reiterate that the vector $\mathbf{x}$ (a geometrical entity) is independent of the basis – it is only the components of $\mathbf{x}$ that depend on the basis. We note, however, that given a set of vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_M$, where $M \neq N$, in an $N$-dimensional vector space, then either there exists a vector that cannot be expressed as a linear combination of the $\mathbf{u}_i$ or, for some vector that can be so expressed, the components are not unique.
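As an illustration of (8.10) for $N = 3$, the components of a vector in a non-orthogonal basis can be computed from scalar triple products, since $[\mathbf{x}, \mathbf{e}_2, \mathbf{e}_3] = x_1[\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]$ and similarly for $x_2$ and $x_3$. The sketch below is illustrative only: it assumes real three-dimensional vectors with integer Cartesian components, and the function names are hypothetical. It uses the base vectors of exercise 7.25 and reproduces the components quoted in the hints for that exercise.

```python
from fractions import Fraction

def cross(u, v):
    """Vector product u x v of two 3-component vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Scalar product u . v."""
    return sum(x * y for x, y in zip(u, v))

def triple(u, v, w):
    """Scalar triple product [u, v, w] = u . (v x w)."""
    return dot(u, cross(v, w))

def components(x, e1, e2, e3):
    """Components (x1, x2, x3) of x in the basis e1, e2, e3 (integer inputs assumed)."""
    volume = triple(e1, e2, e3)
    if volume == 0:
        raise ValueError("e1, e2, e3 are linearly dependent and do not form a basis")
    return (Fraction(triple(x, e2, e3), volume),
            Fraction(triple(e1, x, e3), volume),
            Fraction(triple(e1, e2, x), volume))

# Base vectors a = j + k, b = i + k, c = i + j from exercise 7.25.
a, b, c = (0, 1, 1), (1, 0, 1), (1, 1, 0)
p_components = components((3, -2, 1), a, b, c)  # p = 3i - 2j + k
q_components = components((1, 4, 0), a, b, c)   # q = i + 4j
```

Because the three base vectors are linearly independent, the triple product in the denominator is non-zero and each vector has exactly one set of components, in line with (8.11).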
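8.1.2 The inner product

We may usefully add to the description of vectors in a vector space by defining the inner product of two vectors, denoted in general by $\langle\mathbf{a}|\mathbf{b}\rangle$, which is a scalar function of $\mathbf{a}$ and $\mathbf{b}$. The scalar or dot product, $\mathbf{a}\cdot\mathbf{b} \equiv |\mathbf{a}||\mathbf{b}|\cos\theta$, of vectors in real three-dimensional space (where $\theta$ is the angle between the vectors), was introduced in the last chapter and is an example of an inner product. In effect the notion of an inner product $\langle\mathbf{a}|\mathbf{b}\rangle$ is a generalisation of the dot product to more abstract vector spaces. Alternative notations for $\langle\mathbf{a}|\mathbf{b}\rangle$ are $(\mathbf{a},\mathbf{b})$, or simply $\mathbf{a}\cdot\mathbf{b}$.

The inner product has the following properties:

(i) $\langle\mathbf{a}|\mathbf{b}\rangle = \langle\mathbf{b}|\mathbf{a}\rangle^*$,

(ii) $\langle\mathbf{a}|\lambda\mathbf{b} + \mu\mathbf{c}\rangle = \lambda\langle\mathbf{a}|\mathbf{b}\rangle + \mu\langle\mathbf{a}|\mathbf{c}\rangle$.

We note that in general, for a complex vector space, (i) and (ii) imply that
$$\langle\lambda\mathbf{a} + \mu\mathbf{b}|\mathbf{c}\rangle = \lambda^*\langle\mathbf{a}|\mathbf{c}\rangle + \mu^*\langle\mathbf{b}|\mathbf{c}\rangle, \tag{8.13}$$
$$\langle\lambda\mathbf{a}|\mu\mathbf{b}\rangle = \lambda^*\mu\langle\mathbf{a}|\mathbf{b}\rangle. \tag{8.14}$$

Following the analogy with the dot product in three-dimensional real space, two vectors in a general vector space are defined to be orthogonal if $\langle\mathbf{a}|\mathbf{b}\rangle = 0$. Similarly, the norm of a vector $\mathbf{a}$ is given by $\|\mathbf{a}\| = \langle\mathbf{a}|\mathbf{a}\rangle^{1/2}$ and is clearly a generalisation of the length or modulus $|\mathbf{a}|$ of a vector $\mathbf{a}$ in three-dimensional space. In a general vector space $\langle\mathbf{a}|\mathbf{a}\rangle$ can be positive or negative; however, we shall be primarily concerned with spaces in which $\langle\mathbf{a}|\mathbf{a}\rangle \ge 0$ and which are thus said to have a positive semi-definite norm. In such a space $\langle\mathbf{a}|\mathbf{a}\rangle = 0$ implies $\mathbf{a} = \mathbf{0}$.

Let us now introduce into our $N$-dimensional vector space a basis $\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \ldots, \hat{\mathbf{e}}_N$ that has the desirable property of being orthonormal (the basis vectors are mutually orthogonal and each has unit norm), i.e. a basis that has the property
$$\langle\hat{\mathbf{e}}_i|\hat{\mathbf{e}}_j\rangle = \delta_{ij}. \tag{8.15}$$
Here $\delta_{ij}$ is the Kronecker delta symbol (of which we say more in chapter 26) and has the properties
$$\delta_{ij} = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \neq j. \end{cases}$$
In the above basis we may express any two vectors $\mathbf{a}$ and $\mathbf{b}$ as
$$\mathbf{a} = \sum_{i=1}^{N} a_i\hat{\mathbf{e}}_i \quad\text{and}\quad \mathbf{b} = \sum_{i=1}^{N} b_i\hat{\mathbf{e}}_i.$$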
Furthermore, in such an orthonormal basis we have, for any $\mathbf{a}$,
$$\langle\hat{\mathbf{e}}_j|\mathbf{a}\rangle = \sum_{i=1}^{N} \langle\hat{\mathbf{e}}_j|a_i\hat{\mathbf{e}}_i\rangle = \sum_{i=1}^{N} a_i\langle\hat{\mathbf{e}}_j|\hat{\mathbf{e}}_i\rangle = a_j. \tag{8.16}$$
Thus the components of $\mathbf{a}$ are given by $a_i = \langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle$. Note that this is not true unless the basis is orthonormal. We can write the inner product of $\mathbf{a}$ and $\mathbf{b}$ in terms of their components in an orthonormal basis as
$$\begin{aligned}
\langle\mathbf{a}|\mathbf{b}\rangle &= \langle a_1\hat{\mathbf{e}}_1 + a_2\hat{\mathbf{e}}_2 + \cdots + a_N\hat{\mathbf{e}}_N \,|\, b_1\hat{\mathbf{e}}_1 + b_2\hat{\mathbf{e}}_2 + \cdots + b_N\hat{\mathbf{e}}_N\rangle \\
&= \sum_{i=1}^{N} a_i^* b_i\langle\hat{\mathbf{e}}_i|\hat{\mathbf{e}}_i\rangle + \sum_{i=1}^{N}\sum_{j\neq i}^{N} a_i^* b_j\langle\hat{\mathbf{e}}_i|\hat{\mathbf{e}}_j\rangle \\
&= \sum_{i=1}^{N} a_i^* b_i,
\end{aligned}$$
where the second equality follows from (8.14) and the third from (8.15). This is clearly a generalisation of the expression (7.21) for the dot product of vectors in three-dimensional space.

We may generalise the above to the case where the base vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_N$ are not orthonormal (or orthogonal). In general we can define the $N^2$ numbers
$$G_{ij} = \langle\mathbf{e}_i|\mathbf{e}_j\rangle. \tag{8.17}$$
Then, if $\mathbf{a} = \sum_{i=1}^{N} a_i\mathbf{e}_i$ and $\mathbf{b} = \sum_{i=1}^{N} b_i\mathbf{e}_i$, the inner product of $\mathbf{a}$ and $\mathbf{b}$ is given by
$$\begin{aligned}
\langle\mathbf{a}|\mathbf{b}\rangle &= \left\langle \sum_{i=1}^{N} a_i\mathbf{e}_i \,\middle|\, \sum_{j=1}^{N} b_j\mathbf{e}_j \right\rangle \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} a_i^* b_j\langle\mathbf{e}_i|\mathbf{e}_j\rangle \\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} a_i^* G_{ij} b_j.
\end{aligned} \tag{8.18}$$
We further note that from (8.17) and the properties of the inner product we require $G_{ij} = G_{ji}^*$. This in turn ensures that $\|\mathbf{a}\|^2 = \langle\mathbf{a}|\mathbf{a}\rangle$ is real, since then
$$\langle\mathbf{a}|\mathbf{a}\rangle^* = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i G_{ij}^* a_j^* = \sum_{j=1}^{N}\sum_{i=1}^{N} a_j^* G_{ji} a_i = \langle\mathbf{a}|\mathbf{a}\rangle.$$

8.1.3 Some useful inequalities

For a set of objects (vectors) forming a linear vector space in which $\langle\mathbf{a}|\mathbf{a}\rangle \ge 0$ for all $\mathbf{a}$, the following inequalities are often useful.
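(i) Schwarz's inequality is the most basic result and states that
$$|\langle\mathbf{a}|\mathbf{b}\rangle| \le \|\mathbf{a}\|\,\|\mathbf{b}\|, \tag{8.19}$$
where the equality holds when $\mathbf{a}$ is a scalar multiple of $\mathbf{b}$, i.e. when $\mathbf{a} = \lambda\mathbf{b}$. It is important here to distinguish between the absolute value of a scalar, $|\lambda|$, and the norm of a vector, $\|\mathbf{a}\|$. Schwarz's inequality may be proved by considering
$$\begin{aligned}
\|\mathbf{a} + \lambda\mathbf{b}\|^2 &= \langle\mathbf{a} + \lambda\mathbf{b}|\mathbf{a} + \lambda\mathbf{b}\rangle \\
&= \langle\mathbf{a}|\mathbf{a}\rangle + \lambda\langle\mathbf{a}|\mathbf{b}\rangle + \lambda^*\langle\mathbf{b}|\mathbf{a}\rangle + \lambda\lambda^*\langle\mathbf{b}|\mathbf{b}\rangle.
\end{aligned}$$
If we write $\langle\mathbf{a}|\mathbf{b}\rangle$ as $|\langle\mathbf{a}|\mathbf{b}\rangle|e^{i\alpha}$ then
$$\|\mathbf{a} + \lambda\mathbf{b}\|^2 = \|\mathbf{a}\|^2 + |\lambda|^2\|\mathbf{b}\|^2 + \lambda|\langle\mathbf{a}|\mathbf{b}\rangle|e^{i\alpha} + \lambda^*|\langle\mathbf{a}|\mathbf{b}\rangle|e^{-i\alpha}.$$
However, $\|\mathbf{a} + \lambda\mathbf{b}\|^2 \ge 0$ for all $\lambda$, so we may choose $\lambda = re^{-i\alpha}$ and require that, for all $r$,
$$0 \le \|\mathbf{a} + \lambda\mathbf{b}\|^2 = \|\mathbf{a}\|^2 + r^2\|\mathbf{b}\|^2 + 2r|\langle\mathbf{a}|\mathbf{b}\rangle|.$$
This means that the quadratic equation in $r$ formed by setting the RHS equal to zero cannot have two distinct real roots, i.e. its discriminant cannot be positive. This, in turn, implies that
$$4|\langle\mathbf{a}|\mathbf{b}\rangle|^2 \le 4\|\mathbf{a}\|^2\|\mathbf{b}\|^2,$$
which, on taking the square root (all factors are necessarily positive) of both sides, gives Schwarz's inequality.

(ii) The triangle inequality states that
$$\|\mathbf{a} + \mathbf{b}\| \le \|\mathbf{a}\| + \|\mathbf{b}\| \tag{8.20}$$
and may be derived from the properties of the inner product and Schwarz's inequality as follows. Let us first consider
$$\|\mathbf{a} + \mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + 2\,\mathrm{Re}\,\langle\mathbf{a}|\mathbf{b}\rangle \le \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + 2|\langle\mathbf{a}|\mathbf{b}\rangle|.$$
Using Schwarz's inequality we then have
$$\|\mathbf{a} + \mathbf{b}\|^2 \le \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 + 2\|\mathbf{a}\|\,\|\mathbf{b}\| = (\|\mathbf{a}\| + \|\mathbf{b}\|)^2,$$
which, on taking the square root, gives the triangle inequality (8.20).

(iii) Bessel's inequality requires the introduction of an orthonormal basis $\hat{\mathbf{e}}_i$, $i = 1, 2, \ldots, N$ into the $N$-dimensional vector space; it states that
$$\|\mathbf{a}\|^2 \ge \sum_i |\langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle|^2, \tag{8.21}$$
where the equality holds if the sum includes all $N$ basis vectors.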
If not all the basis vectors are included in the sum then the inequality results (though of course the equality remains if those basis vectors omitted all have $a_i = 0$). Bessel's inequality can also be written
$$\langle\mathbf{a}|\mathbf{a}\rangle \ge \sum_i |a_i|^2,$$
where the $a_i$ are the components of $\mathbf{a}$ in the orthonormal basis. From (8.16) these are given by $a_i = \langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle$. The above may be proved by considering
$$\Bigl\|\mathbf{a} - \sum_i \langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle\hat{\mathbf{e}}_i\Bigr\|^2 = \Bigl\langle \mathbf{a} - \sum_i \langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle\hat{\mathbf{e}}_i \,\Bigm|\, \mathbf{a} - \sum_j \langle\hat{\mathbf{e}}_j|\mathbf{a}\rangle\hat{\mathbf{e}}_j \Bigr\rangle.$$
Expanding out the inner product and using $\langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle^* = \langle\mathbf{a}|\hat{\mathbf{e}}_i\rangle$, we obtain
$$\Bigl\|\mathbf{a} - \sum_i \langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle\hat{\mathbf{e}}_i\Bigr\|^2 = \langle\mathbf{a}|\mathbf{a}\rangle - 2\sum_i \langle\mathbf{a}|\hat{\mathbf{e}}_i\rangle\langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle + \sum_i\sum_j \langle\mathbf{a}|\hat{\mathbf{e}}_i\rangle\langle\hat{\mathbf{e}}_j|\mathbf{a}\rangle\langle\hat{\mathbf{e}}_i|\hat{\mathbf{e}}_j\rangle.$$
Now $\langle\hat{\mathbf{e}}_i|\hat{\mathbf{e}}_j\rangle = \delta_{ij}$, since the basis is orthonormal, and so we find
$$0 \le \Bigl\|\mathbf{a} - \sum_i \langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle\hat{\mathbf{e}}_i\Bigr\|^2 = \|\mathbf{a}\|^2 - \sum_i |\langle\hat{\mathbf{e}}_i|\mathbf{a}\rangle|^2,$$
which is Bessel's inequality.

We take this opportunity to mention also

(iv) the parallelogram equality
$$\|\mathbf{a} + \mathbf{b}\|^2 + \|\mathbf{a} - \mathbf{b}\|^2 = 2\left(\|\mathbf{a}\|^2 + \|\mathbf{b}\|^2\right), \tag{8.22}$$
which may be proved straightforwardly from the properties of the inner product.
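The four results (8.19)–(8.22) can be checked numerically for a particular pair of vectors. The sketch below is illustrative only: it assumes a three-dimensional complex space whose vectors are stored as lists of components in an orthonormal basis, so that $\langle\mathbf{a}|\mathbf{b}\rangle = \sum_i a_i^* b_i$, and the names `inner` and `norm` and the sample vectors are hypothetical choices.

```python
import math

def inner(a, b):
    """Inner product <a|b> = sum_i conj(a_i) * b_i in an orthonormal basis."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def norm(a):
    """Norm ||a|| = <a|a>^(1/2); <a|a> is real and non-negative here."""
    return math.sqrt(inner(a, a).real)

a = [1 + 2j, -1j, 3 + 0j]
b = [2 - 1j, 1 + 1j, -1 + 0j]
a_plus_b = [x + y for x, y in zip(a, b)]
a_minus_b = [x - y for x, y in zip(a, b)]

# (8.19) Schwarz's inequality.
schwarz_holds = abs(inner(a, b)) <= norm(a) * norm(b)

# (8.20) Triangle inequality.
triangle_holds = norm(a_plus_b) <= norm(a) + norm(b)

# (8.21) Bessel's inequality, keeping only the first two basis vectors in the sum;
# in an orthonormal basis the components are a_i = <e_i|a>.
bessel_holds = norm(a) ** 2 >= sum(abs(a_i) ** 2 for a_i in a[:2])

# (8.22) Parallelogram equality: the two sides should agree up to rounding error.
parallelogram_gap = (norm(a_plus_b) ** 2 + norm(a_minus_b) ** 2
                     - 2 * (norm(a) ** 2 + norm(b) ** 2))
```

A single numerical case proves nothing in general, but it is a quick way to catch a sign or conjugation slip when implementing an inner product.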
8.2 Linear operators

We now discuss the action of linear operators on vectors in a vector space. A linear operator $\mathcal{A}$ associates with every vector $\mathbf{x}$ another vector
$$\mathbf{y} = \mathcal{A}\mathbf{x},$$
in such a way that, for two vectors $\mathbf{a}$ and $\mathbf{b}$,
$$\mathcal{A}(\lambda\mathbf{a} + \mu\mathbf{b}) = \lambda\mathcal{A}\mathbf{a} + \mu\mathcal{A}\mathbf{b},$$
where $\lambda$, $\mu$ are scalars. We say that $\mathcal{A}$ 'operates' on $\mathbf{x}$ to give the vector $\mathbf{y}$. We note that the action of $\mathcal{A}$ is independent of any basis or coordinate system and may be thought of as 'transforming' one geometrical entity (i.e. a vector) into another.

If we now introduce a basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, into our vector space then the action of $\mathcal{A}$ on each of the basis vectors is to produce a linear combination of the latter; this may be written as
$$\mathcal{A}\mathbf{e}_j = \sum_{i=1}^{N} A_{ij}\mathbf{e}_i, \tag{8.23}$$
where $A_{ij}$ is the $i$th component of the vector $\mathcal{A}\mathbf{e}_j$ in this basis; collectively the numbers $A_{ij}$ are called the components of the linear operator in the $\mathbf{e}_i$-basis. In this basis we can express the relation $\mathbf{y} = \mathcal{A}\mathbf{x}$ in component form as
$$\mathbf{y} = \sum_{i=1}^{N} y_i\mathbf{e}_i = \mathcal{A}\left(\sum_{j=1}^{N} x_j\mathbf{e}_j\right) = \sum_{j=1}^{N} x_j\sum_{i=1}^{N} A_{ij}\mathbf{e}_i,$$
and hence, in purely component form, in this basis we have
$$y_i = \sum_{j=1}^{N} A_{ij}x_j. \tag{8.24}$$
If we had chosen a different basis $\mathbf{e}'_i$, in which the components of $\mathbf{x}$, $\mathbf{y}$ and $\mathcal{A}$ are $x'_i$, $y'_i$ and $A'_{ij}$ respectively then the geometrical relationship $\mathbf{y} = \mathcal{A}\mathbf{x}$ would be represented in this new basis by
$$y'_i = \sum_{j=1}^{N} A'_{ij}x'_j.$$

We have so far assumed that the vector $\mathbf{y}$ is in the same vector space as $\mathbf{x}$. If, however, $\mathbf{y}$ belongs to a different vector space, which may in general be $M$-dimensional ($M \neq N$) then the above analysis needs a slight modification. By introducing a basis set $\mathbf{f}_i$, $i = 1, 2, \ldots, M$, into the vector space to which $\mathbf{y}$ belongs we may generalise (8.23) as
$$\mathcal{A}\mathbf{e}_j = \sum_{i=1}^{M} A_{ij}\mathbf{f}_i,$$
where the components $A_{ij}$ of the linear operator $\mathcal{A}$ relate to both of the bases $\mathbf{e}_j$ and $\mathbf{f}_i$.
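The component relation (8.24) translates directly into a short routine. The sketch below is an illustration under stated assumptions rather than anything from the original text: the operator is represented by a nested list of its components $A_{ij}$ (here with $M = 2$ and $N = 3$), the function name `apply_operator` and the sample numbers are hypothetical, and the check simply confirms the defining linearity property for one choice of vectors and scalars.

```python
def apply_operator(A, x):
    """Return the components y_i = sum_j A[i][j] * x[j], as in (8.24)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Components of an operator taking a 3-dimensional space to a 2-dimensional one.
A = [[1, 2, 0],
     [0, 1, -1]]

a = [1, 0, 2]
b = [3, -1, 1]
lam, mu = 2, -3

# Left-hand side: operate on the combination lam*a + mu*b.
combined = [lam * a_j + mu * b_j for a_j, b_j in zip(a, b)]
lhs = apply_operator(A, combined)

# Right-hand side: combine the separately transformed vectors.
rhs = [lam * ya + mu * yb
       for ya, yb in zip(apply_operator(A, a), apply_operator(A, b))]
```

Here `lhs` and `rhs` hold the components of $\mathcal{A}(\lambda\mathbf{a} + \mu\mathbf{b})$ and $\lambda\mathcal{A}\mathbf{a} + \mu\mathcal{A}\mathbf{b}$ respectively, and they agree for any operator defined through (8.24).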
8.2.1 Properties of linear operators

If $\mathbf{x}$ is a vector and $\mathcal{A}$ and $\mathcal{B}$ are two linear operators then it follows that
$$\begin{aligned}
(\mathcal{A} + \mathcal{B})\mathbf{x} &= \mathcal{A}\mathbf{x} + \mathcal{B}\mathbf{x}, \\
(\lambda\mathcal{A})\mathbf{x} &= \lambda(\mathcal{A}\mathbf{x}), \\
(\mathcal{A}\mathcal{B})\mathbf{x} &= \mathcal{A}(\mathcal{B}\mathbf{x}),
\end{aligned}$$
where in the last equality we see that the action of two linear operators in succession is associative. The product of two linear operators is not in general commutative, however, so that in general $\mathcal{A}\mathcal{B}\mathbf{x} \neq \mathcal{B}\mathcal{A}\mathbf{x}$. In an obvious way we define the null (or zero) and identity operators by
$$\mathcal{O}\mathbf{x} = \mathbf{0} \quad\text{and}\quad \mathcal{I}\mathbf{x} = \mathbf{x},$$
for any vector $\mathbf{x}$ in our vector space. Two operators $\mathcal{A}$ and $\mathcal{B}$ are equal if $\mathcal{A}\mathbf{x} = \mathcal{B}\mathbf{x}$ for all vectors $\mathbf{x}$. Finally, if there exists an operator $\mathcal{A}^{-1}$ such that
$$\mathcal{A}\mathcal{A}^{-1} = \mathcal{A}^{-1}\mathcal{A} = \mathcal{I}$$
then $\mathcal{A}^{-1}$ is the inverse of $\mathcal{A}$. Some linear operators do not possess an inverse and are called singular, whilst those operators that do have an inverse are termed non-singular.

8.3 Matrices

We have seen that in a particular basis $\mathbf{e}_i$ both vectors and linear operators can be described in terms of their components with respect to the basis. These components may be displayed as an array of numbers called a matrix. In general, if a linear operator $\mathcal{A}$ transforms vectors from an $N$-dimensional vector space, for which we choose a basis $\mathbf{e}_j$, $j = 1, 2, \ldots, N$, into vectors belonging to an $M$-dimensional vector space, with basis $\mathbf{f}_i$, $i = 1, 2, \ldots, M$, then we may represent the operator $\mathcal{A}$ by the matrix
$$\mathsf{A} = \begin{pmatrix}
A_{11} & A_{12} & \ldots & A_{1N} \\
A_{21} & A_{22} & \ldots & A_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
A_{M1} & A_{M2} & \ldots & A_{MN}
\end{pmatrix}. \tag{8.25}$$
The matrix elements $A_{ij}$ are the components of the linear operator with respect to the bases $\mathbf{e}_j$ and $\mathbf{f}_i$; the component $A_{ij}$ of the linear operator appears in the $i$th row and $j$th column of the matrix. The array has $M$ rows and $N$ columns and is thus called an $M \times N$ matrix. If the dimensions of the two vector spaces are the same, i.e. $M = N$ (for example, if they are the same vector space) then we may represent $\mathcal{A}$ by an $N \times N$ or square matrix of order $N$. The component $A_{ij}$, which in general may be complex, is also denoted by $(\mathsf{A})_{ij}$.
In a similar way we may denote a vector $\mathbf{x}$ in terms of its components $x_i$ in a basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, by the array
$$\mathsf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix},$$
which is a special case of (8.25) and is called a column matrix (or conventionally, and slightly confusingly, a column vector or even just a vector – strictly speaking the term 'vector' refers to the geometrical entity $\mathbf{x}$). The column matrix $\mathsf{x}$ can also be written as
$$\mathsf{x} = (x_1 \quad x_2 \quad \cdots \quad x_N)^{\mathrm{T}},$$
which is the transpose of a row matrix (see section 8.6).

We note that in a different basis $\mathbf{e}'_i$ the vector $\mathbf{x}$ would be represented by a different column matrix containing the components $x'_i$ in the new basis, i.e.
$$\mathsf{x}' = \begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_N \end{pmatrix}.$$
Thus, we use $\mathsf{x}$ and $\mathsf{x}'$ to denote different column matrices which, in different bases $\mathbf{e}_i$ and $\mathbf{e}'_i$, represent the same vector $\mathbf{x}$. In many texts, however, this distinction is not made and $\mathbf{x}$ (rather than $\mathsf{x}$) is equated to the corresponding column matrix; if we regard $\mathbf{x}$ as the geometrical entity, however, this can be misleading and so we explicitly make the distinction. A similar argument follows for linear operators; the same linear operator $\mathcal{A}$ is described in different bases by different matrices $\mathsf{A}$ and $\mathsf{A}'$, containing different matrix elements.

8.4 Basic matrix algebra

The basic algebra of matrices may be deduced from the properties of the linear operators that they represent. In a given basis the action of two linear operators $\mathcal{A}$ and $\mathcal{B}$ on an arbitrary vector $\mathbf{x}$ (see the beginning of subsection 8.2.1), when written in terms of components using (8.24), is given by
$$\begin{aligned}
\sum_j (\mathsf{A} + \mathsf{B})_{ij}x_j &= \sum_j A_{ij}x_j + \sum_j B_{ij}x_j, \\
\sum_j (\lambda\mathsf{A})_{ij}x_j &= \lambda\sum_j A_{ij}x_j, \\
\sum_j (\mathsf{A}\mathsf{B})_{ij}x_j &= \sum_k A_{ik}(\mathsf{B}\mathsf{x})_k = \sum_j\sum_k A_{ik}B_{kj}x_j.
\end{aligned}$$
Now, since $\mathbf{x}$ is arbitrary, we can immediately deduce the way in which matrices are added or multiplied, i.e.
$$(\mathsf{A} + \mathsf{B})_{ij} = A_{ij} + B_{ij}, \tag{8.26}$$
$$(\lambda\mathsf{A})_{ij} = \lambda A_{ij}, \tag{8.27}$$
$$(\mathsf{A}\mathsf{B})_{ij} = \sum_k A_{ik}B_{kj}. \tag{8.28}$$
We note that a matrix element may, in general, be complex. We now discuss matrix addition and multiplication in more detail.

8.4.1 Matrix addition and multiplication by a scalar

From (8.26) we see that the sum of two matrices, $\mathsf{S} = \mathsf{A} + \mathsf{B}$, is the matrix whose elements are given by
$$S_{ij} = A_{ij} + B_{ij}$$
for every pair of subscripts $i, j$, with $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$. For example, if $\mathsf{A}$ and $\mathsf{B}$ are $2 \times 3$ matrices then $\mathsf{S} = \mathsf{A} + \mathsf{B}$ is given by
$$\begin{aligned}
\begin{pmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \end{pmatrix}
&= \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix}
+ \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \end{pmatrix} \\
&= \begin{pmatrix} A_{11} + B_{11} & A_{12} + B_{12} & A_{13} + B_{13} \\ A_{21} + B_{21} & A_{22} + B_{22} & A_{23} + B_{23} \end{pmatrix}.
\end{aligned} \tag{8.29}$$
Clearly, for the sum of two matrices to have any meaning, the matrices must have the same dimensions, i.e. both be $M \times N$ matrices.

From definition (8.29) it follows that $\mathsf{A} + \mathsf{B} = \mathsf{B} + \mathsf{A}$ and that the sum of a number of matrices can be written unambiguously without bracketing, i.e. matrix addition is commutative and associative.

The difference of two matrices is defined by direct analogy with addition. The matrix $\mathsf{D} = \mathsf{A} - \mathsf{B}$ has elements
$$D_{ij} = A_{ij} - B_{ij}, \quad\text{for } i = 1, 2, \ldots, M,\ j = 1, 2, \ldots, N. \tag{8.30}$$

From (8.27) the product of a matrix $\mathsf{A}$ with a scalar $\lambda$ is the matrix with elements $\lambda A_{ij}$, for example
$$\lambda\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix}
= \begin{pmatrix} \lambda A_{11} & \lambda A_{12} & \lambda A_{13} \\ \lambda A_{21} & \lambda A_{22} & \lambda A_{23} \end{pmatrix}. \tag{8.31}$$
Multiplication by a scalar is distributive and associative.

▶ The matrices $\mathsf{A}$, $\mathsf{B}$ and $\mathsf{C}$ are given by
$$\mathsf{A} = \begin{pmatrix} 2 & -1 \\ 3 & 1 \end{pmatrix}, \quad
\mathsf{B} = \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix}, \quad
\mathsf{C} = \begin{pmatrix} -2 & 1 \\ -1 & 1 \end{pmatrix}.$$
Find the matrix $\mathsf{D} = \mathsf{A} + 2\mathsf{B} - \mathsf{C}$.

$$\mathsf{D} = \begin{pmatrix} 2 & -1 \\ 3 & 1 \end{pmatrix} + 2\begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix} - \begin{pmatrix} -2 & 1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 2 + 2 \times 1 - (-2) & -1 + 2 \times 0 - 1 \\ 3 + 2 \times 0 - (-1) & 1 + 2 \times (-2) - 1 \end{pmatrix} = \begin{pmatrix} 6 & -2 \\ 4 & -4 \end{pmatrix}.$$

◀

From the above considerations we see that the set of all, in general complex, $M \times N$ matrices (with fixed $M$ and $N$) forms a linear vector space of dimension $MN$. One basis for the space is the set of $M \times N$ matrices $\mathsf{E}^{(p,q)}$ with the property that $E^{(p,q)}_{ij} = 1$ if $i = p$ and $j = q$ whilst $E^{(p,q)}_{ij} = 0$ for all other values of $i$ and $j$, i.e. each matrix has only one non-zero entry, which equals unity. Here the pair $(p, q)$ is simply a label that picks out a particular one of the matrices $\mathsf{E}^{(p,q)}$, the total number of which is $MN$.

#### 8.4.2 Multiplication of matrices

Let us consider again the 'transformation' of one vector into another, $\mathbf{y} = \mathcal{A}\mathbf{x}$, which, from (8.24), may be described in terms of components with respect to a particular basis as

$$y_i = \sum_{j=1}^{N} A_{ij} x_j \quad \text{for } i = 1, 2, \ldots, M. \tag{8.32}$$

Writing this in matrix form as $\mathsf{y} = \mathsf{A}\mathsf{x}$ we have

$$\begin{pmatrix} y_1 \\ \boxed{y_2} \\ \vdots \\ y_M \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ \boxed{A_{21}} & \boxed{A_{22}} & \cdots & \boxed{A_{2N}} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} \end{pmatrix} \begin{pmatrix} \boxed{x_1} \\ \boxed{x_2} \\ \vdots \\ \boxed{x_N} \end{pmatrix} \tag{8.33}$$

where we have highlighted with boxes the components used to calculate the element $y_2$: using (8.32) for $i = 2$,

$$y_2 = A_{21} x_1 + A_{22} x_2 + \cdots + A_{2N} x_N.$$

All the other components $y_i$ are calculated similarly.

If instead we operate with $\mathcal{A}$ on a basis vector $\mathbf{e}_j$ having all components zero except for the $j$th, which equals unity, then we find

$$\mathsf{A}\mathsf{e}_j = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M1} & A_{M2} & \cdots & A_{MN} \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} A_{1j} \\ A_{2j} \\ \vdots \\ A_{Mj} \end{pmatrix},$$

and so confirm our identification of the matrix element $A_{ij}$ as the $i$th component of $\mathcal{A}\mathbf{e}_j$ in this basis.
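The component formula (8.32) and the action on a basis vector can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper names `matvec` and `basis_vector` and the $2 \times 3$ matrix are assumptions made for the example, and indices are 0-based, unlike the 1-based subscripts of the text.

```python
def matvec(A, x):
    """Return y with y_i = sum_j A_ij x_j, as in (8.32)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def basis_vector(j, n):
    """Column matrix of e_j: all zeros except a 1 in position j (0-based)."""
    return [1 if k == j else 0 for k in range(n)]


# Hypothetical 2x3 example (M = 2, N = 3).
A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]

y = matvec(A, x)  # [1*1 + 2*0 + 3*(-1), 4*1 + 5*0 + 6*(-1)] = [-2, -2]

# Acting on each basis vector e_j picks out the jth column of A.
columns = [matvec(A, basis_vector(j, 3)) for j in range(3)]
```

Here `columns` holds the three columns of `A` in turn, which is the statement that $A_{ij}$ is the $i$th component of $\mathcal{A}\mathbf{e}_j$.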
From (8.28) we can extend our discussion to the product of two matrices $\mathsf{P} = \mathsf{AB}$, where $\mathsf{P}$ is the matrix of the quantities formed by the operation of the rows of $\mathsf{A}$ on the columns of $\mathsf{B}$, treating each column of $\mathsf{B}$ in turn as the vector $\mathbf{x}$ represented in component form in (8.32). It is clear that, for this to be a meaningful definition, the number of columns in $\mathsf{A}$ must equal the number of rows in $\mathsf{B}$. Thus the product $\mathsf{AB}$ of an $M \times N$ matrix $\mathsf{A}$ with an $N \times R$ matrix $\mathsf{B}$ is itself an $M \times R$ matrix $\mathsf{P}$, where

$$P_{ij} = \sum_{k=1}^{N} A_{ik} B_{kj} \quad \text{for } i = 1, 2, \ldots, M,\ j = 1, 2, \ldots, R.$$

For example, $\mathsf{P} = \mathsf{AB}$ may be written in matrix form

$$\begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \\ B_{31} & B_{32} \end{pmatrix}$$

where

$$P_{11} = A_{11}B_{11} + A_{12}B_{21} + A_{13}B_{31},$$
$$P_{21} = A_{21}B_{11} + A_{22}B_{21} + A_{23}B_{31},$$
$$P_{12} = A_{11}B_{12} + A_{12}B_{22} + A_{13}B_{32},$$
$$P_{22} = A_{21}B_{12} + A_{22}B_{22} + A_{23}B_{32}.$$

Multiplication of more than two matrices follows naturally and is associative. So, for example,

$$\mathsf{A}(\mathsf{BC}) \equiv (\mathsf{AB})\mathsf{C}, \tag{8.34}$$

provided, of course, that all the products are defined.

As mentioned above, if $\mathsf{A}$ is an $M \times N$ matrix and $\mathsf{B}$ is an $N \times M$ matrix then two product matrices are possible, i.e.

$$\mathsf{P} = \mathsf{AB} \quad \text{and} \quad \mathsf{Q} = \mathsf{BA}.$$

These are clearly not the same, since $\mathsf{P}$ is an $M \times M$ matrix whilst $\mathsf{Q}$ is an $N \times N$ matrix. Thus, particular care must be taken to write matrix products in the intended order; $\mathsf{P} = \mathsf{AB}$ but $\mathsf{Q} = \mathsf{BA}$. We note in passing that $\mathsf{A}^2$ means $\mathsf{AA}$, $\mathsf{A}^3$ means $\mathsf{A}(\mathsf{AA}) = (\mathsf{AA})\mathsf{A}$ etc. Even if both $\mathsf{A}$ and $\mathsf{B}$ are square, in general

$$\mathsf{AB} \neq \mathsf{BA}, \tag{8.35}$$

i.e. the multiplication of matrices is not, in general, commutative.

▶ Evaluate $\mathsf{P} = \mathsf{AB}$ and $\mathsf{Q} = \mathsf{BA}$ where

$$\mathsf{A} = \begin{pmatrix} 3 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & -3 & 4 \end{pmatrix}, \quad \mathsf{B} = \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix}.$$

As we saw for the $2 \times 2$ case above, the element $P_{ij}$ of the matrix $\mathsf{P} = \mathsf{AB}$ is found by mentally taking the 'scalar product' of the $i$th row of $\mathsf{A}$ with the $j$th column of $\mathsf{B}$. For example, $P_{11} = 3 \times 2 + 2 \times 1 + (-1) \times 3 = 5$, $P_{12} = 3 \times (-2) + 2 \times 1 + (-1) \times 2 = -6$, etc.
Thus

$$\mathsf{P} = \mathsf{AB} = \begin{pmatrix} 3 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & -3 & 4 \end{pmatrix} \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 5 & -6 & 8 \\ 9 & 7 & 2 \\ 11 & 3 & 7 \end{pmatrix},$$

and, similarly,

$$\mathsf{Q} = \mathsf{BA} = \begin{pmatrix} 2 & -2 & 3 \\ 1 & 1 & 0 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 3 & 2 & -1 \\ 0 & 3 & 2 \\ 1 & -3 & 4 \end{pmatrix} = \begin{pmatrix} 9 & -11 & 6 \\ 3 & 5 & 1 \\ 10 & 9 & 5 \end{pmatrix}.$$

These results illustrate that, in general, two matrices do not commute. ◀

The property that matrix multiplication is distributive over addition, i.e. that

$$(\mathsf{A} + \mathsf{B})\mathsf{C} = \mathsf{AC} + \mathsf{BC} \tag{8.36}$$

and

$$\mathsf{C}(\mathsf{A} + \mathsf{B}) = \mathsf{CA} + \mathsf{CB}, \tag{8.37}$$

follows directly from its definition.

#### 8.4.3 The null and identity matrices

Both the null matrix and the identity matrix are frequently encountered, and we take this opportunity to introduce them briefly, leaving their uses until later. The null or zero matrix $\mathsf{0}$ has all elements equal to zero, and so its properties are

$$\mathsf{A}\mathsf{0} = \mathsf{0} = \mathsf{0}\mathsf{A},$$
$$\mathsf{A} + \mathsf{0} = \mathsf{0} + \mathsf{A} = \mathsf{A}.$$

The identity matrix $\mathsf{I}$ has the property

$$\mathsf{AI} = \mathsf{IA} = \mathsf{A}.$$

It is clear that, in order for the above products to be defined, the identity matrix must be square. The $N \times N$ identity matrix (often denoted by $\mathsf{I}_N$) has the form

$$\mathsf{I}_N = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}.$$

### 8.5 Functions of matrices

If a matrix $\mathsf{A}$ is square then, as mentioned above, one can define powers of $\mathsf{A}$ in a straightforward way. For example $\mathsf{A}^2 = \mathsf{AA}$, $\mathsf{A}^3 = \mathsf{AAA}$, or in the general case

$$\mathsf{A}^n = \mathsf{AA} \cdots \mathsf{A} \quad (n \text{ times}),$$

where $n$ is a positive integer. Having defined powers of a square matrix $\mathsf{A}$, we may construct functions of $\mathsf{A}$ of the form

$$\mathsf{S} = \sum_n a_n \mathsf{A}^n,$$

where the $a_n$ are simple scalars and the number of terms in the summation may be finite or infinite. In the case where the sum has an infinite number of terms, the sum has meaning only if it converges. A common example of such a function is the exponential of a matrix, which is defined by

$$\exp \mathsf{A} = \sum_{n=0}^{\infty} \frac{\mathsf{A}^n}{n!}. \tag{8.38}$$

This definition can, in turn, be used to define other functions such as $\sin \mathsf{A}$ and $\cos \mathsf{A}$.

### 8.6 The transpose of a matrix

We have seen that the components of a linear operator in a given coordinate system can be written in the form of a matrix $\mathsf{A}$.
We will also find it useful, however, to consider the different (but clearly related) matrix formed by interchanging the rows and columns of $\mathsf{A}$. The matrix is called the transpose of $\mathsf{A}$ and is denoted by $\mathsf{A}^{\mathrm{T}}$.

▶ Find the transpose of the matrix

$$\mathsf{A} = \begin{pmatrix} 3 & 1 & 2 \\ 0 & 4 & 1 \end{pmatrix}.$$

By interchanging the rows and columns of $\mathsf{A}$ we immediately obtain

$$\mathsf{A}^{\mathrm{T}} = \begin{pmatrix} 3 & 0 \\ 1 & 4 \\ 2 & 1 \end{pmatrix}.$$

◀

It is obvious that if $\mathsf{A}$ is an $M \times N$ matrix then its transpose $\mathsf{A}^{\mathrm{T}}$ is an $N \times M$ matrix. As mentioned in section 8.3, the transpose of a column matrix is a row matrix and vice versa. An important use of column and row matrices is in the representation of the inner product of two real vectors in terms of their components in a given basis. This notion is discussed fully in the next section, where it is extended to complex vectors.

The transpose of the product of two matrices, $(\mathsf{AB})^{\mathrm{T}}$, is given by the product of their transposes taken in the reverse order, i.e.

$$(\mathsf{AB})^{\mathrm{T}} = \mathsf{B}^{\mathrm{T}} \mathsf{A}^{\mathrm{T}}. \tag{8.39}$$

This is proved as follows:

$$(\mathsf{AB})^{\mathrm{T}}_{ij} = (\mathsf{AB})_{ji} = \sum_k A_{jk} B_{ki} = \sum_k (\mathsf{A}^{\mathrm{T}})_{kj} (\mathsf{B}^{\mathrm{T}})_{ik} = \sum_k (\mathsf{B}^{\mathrm{T}})_{ik} (\mathsf{A}^{\mathrm{T}})_{kj} = (\mathsf{B}^{\mathrm{T}} \mathsf{A}^{\mathrm{T}})_{ij},$$

and the proof can be extended to the product of several matrices to give

$$(\mathsf{ABC} \cdots \mathsf{G})^{\mathrm{T}} = \mathsf{G}^{\mathrm{T}} \cdots \mathsf{C}^{\mathrm{T}} \mathsf{B}^{\mathrm{T}} \mathsf{A}^{\mathrm{T}}.$$

### 8.7 The complex and Hermitian conjugates of a matrix

Two further matrices that can be derived from a given general $M \times N$ matrix are the complex conjugate, denoted by $\mathsf{A}^*$, and the Hermitian conjugate, denoted by $\mathsf{A}^\dagger$.

The complex conjugate of a matrix $\mathsf{A}$ is the matrix obtained by taking the complex conjugate of each of the elements of $\mathsf{A}$, i.e.

$$(\mathsf{A}^*)_{ij} = (A_{ij})^*.$$

Obviously if a matrix is real (i.e. it contains only real elements) then $\mathsf{A}^* = \mathsf{A}$.

▶ Find the complex conjugate of the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & 2 & 3i \\ 1 + i & 1 & 0 \end{pmatrix}.$$
By taking the complex conjugate of each element we obtain immediately

$$\mathsf{A}^* = \begin{pmatrix} 1 & 2 & -3i \\ 1 - i & 1 & 0 \end{pmatrix}.$$

◀

The Hermitian conjugate, or adjoint, of a matrix $\mathsf{A}$ is the transpose of its complex conjugate, or equivalently, the complex conjugate of its transpose, i.e.

$$\mathsf{A}^\dagger = (\mathsf{A}^*)^{\mathrm{T}} = (\mathsf{A}^{\mathrm{T}})^*.$$

We note that if $\mathsf{A}$ is real (and so $\mathsf{A}^* = \mathsf{A}$) then $\mathsf{A}^\dagger = \mathsf{A}^{\mathrm{T}}$, and taking the Hermitian conjugate is equivalent to taking the transpose. Following the previous line of argument for the transpose of the product of several matrices, the Hermitian conjugate of such a product can be shown to be given by

$$(\mathsf{AB} \cdots \mathsf{G})^\dagger = \mathsf{G}^\dagger \cdots \mathsf{B}^\dagger \mathsf{A}^\dagger. \tag{8.40}$$

▶ Find the Hermitian conjugate of the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & 2 & 3i \\ 1 + i & 1 & 0 \end{pmatrix}.$$

Taking the complex conjugate of $\mathsf{A}$ and then forming the transpose we find

$$\mathsf{A}^\dagger = \begin{pmatrix} 1 & 1 - i \\ 2 & 1 \\ -3i & 0 \end{pmatrix}.$$

We obtain the same result, of course, if we first take the transpose of $\mathsf{A}$ and then take the complex conjugate. ◀

An important use of the Hermitian conjugate (or transpose in the real case) is in connection with the inner product of two vectors. Suppose that in a given orthonormal basis the vectors $\mathbf{a}$ and $\mathbf{b}$ may be represented by the column matrices

$$\mathsf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix} \quad \text{and} \quad \mathsf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix}. \tag{8.41}$$

Taking the Hermitian conjugate of $\mathsf{a}$, to give a row matrix, and multiplying (on the right) by $\mathsf{b}$ we obtain

$$\mathsf{a}^\dagger \mathsf{b} = (a_1^* \quad a_2^* \quad \cdots \quad a_N^*) \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix} = \sum_{i=1}^{N} a_i^* b_i, \tag{8.42}$$

which is the expression for the inner product $\langle \mathbf{a} | \mathbf{b} \rangle$ in that basis. We note that for real vectors (8.42) reduces to $\mathsf{a}^{\mathrm{T}} \mathsf{b} = \sum_{i=1}^{N} a_i b_i$.

If the basis $\mathbf{e}_i$ is not orthonormal, so that, in general,

$$\langle \mathbf{e}_i | \mathbf{e}_j \rangle = G_{ij} \neq \delta_{ij},$$

then, from (8.18), the scalar product of $\mathbf{a}$ and $\mathbf{b}$ in terms of their components with respect to this basis is given by

$$\langle \mathbf{a} | \mathbf{b} \rangle = \sum_{i=1}^{N} \sum_{j=1}^{N} a_i^* G_{ij} b_j = \mathsf{a}^\dagger \mathsf{G} \mathsf{b},$$

where $\mathsf{G}$ is the $N \times N$ matrix with elements $G_{ij}$.
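The Hermitian conjugate and the two forms of the inner product, $\mathsf{a}^\dagger \mathsf{b}$ and $\mathsf{a}^\dagger \mathsf{G} \mathsf{b}$, can be illustrated with a short sketch in plain Python. The function names `dagger` and `inner`, and the vectors `a` and `b`, are assumptions introduced for this example; the matrix `A` is the one from the worked examples of this section.

```python
def dagger(A):
    """Hermitian conjugate: interchange rows and columns, conjugate each element."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def inner(a, b, G=None):
    """<a|b> = sum_i a_i^* b_i as in (8.42), or a^dagger G b if a matrix G is given."""
    n = len(a)
    if G is None:
        return sum(complex(a[i]).conjugate() * b[i] for i in range(n))
    return sum(complex(a[i]).conjugate() * G[i][j] * b[j]
               for i in range(n) for j in range(n))


# The matrix from the worked examples above.
A = [[1, 2, 3j],
     [1 + 1j, 1, 0]]
A_dag = dagger(A)     # rows: (1, 1 - i), (2, 1), (-3i, 0)

# Hypothetical two-component complex vectors.
a = [1j, 2]
b = [1, 1 - 1j]
ip = inner(a, b)      # (-i)(1) + 2(1 - i) = 2 - 3i
```

With `G` set to the identity matrix the second form reduces to the first, and exchanging the arguments gives the complex conjugate, `inner(b, a) == ip.conjugate()`.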
### 8.8 The trace of a matrix

For a given matrix $\mathsf{A}$, in the previous two sections we have considered various other matrices that can be derived from it. However, sometimes one wishes to derive a single number from a matrix. The simplest example is the trace (or spur) of a square matrix, which is denoted by $\operatorname{Tr} \mathsf{A}$. This quantity is defined as the sum of the diagonal elements of the matrix,

$$\operatorname{Tr} \mathsf{A} = A_{11} + A_{22} + \cdots + A_{NN} = \sum_{i=1}^{N} A_{ii}. \tag{8.43}$$

It is clear that taking the trace is a linear operation so that, for example,

$$\operatorname{Tr}(\mathsf{A} \pm \mathsf{B}) = \operatorname{Tr} \mathsf{A} \pm \operatorname{Tr} \mathsf{B}.$$

A very useful property of traces is that the trace of the product of two matrices is independent of the order of their multiplication; this result holds whether or not the matrices commute and is proved as follows:

$$\operatorname{Tr} \mathsf{AB} = \sum_{i=1}^{N} (\mathsf{AB})_{ii} = \sum_{i=1}^{N} \sum_{j=1}^{N} A_{ij} B_{ji} = \sum_{i=1}^{N} \sum_{j=1}^{N} B_{ji} A_{ij} = \sum_{j=1}^{N} (\mathsf{BA})_{jj} = \operatorname{Tr} \mathsf{BA}. \tag{8.44}$$

The result can be extended to the product of several matrices. For example, from (8.44), we immediately find

$$\operatorname{Tr} \mathsf{ABC} = \operatorname{Tr} \mathsf{BCA} = \operatorname{Tr} \mathsf{CAB},$$

which shows that the trace of a multiple product is invariant under cyclic permutations of the matrices in the product. Other easily derived properties of the trace are, for example, $\operatorname{Tr} \mathsf{A}^{\mathrm{T}} = \operatorname{Tr} \mathsf{A}$ and $\operatorname{Tr} \mathsf{A}^\dagger = (\operatorname{Tr} \mathsf{A})^*$.

### 8.9 The determinant of a matrix

For a given matrix $\mathsf{A}$, the determinant $\det \mathsf{A}$ (like the trace) is a single number (or algebraic expression) that depends upon the elements of $\mathsf{A}$. Also like the trace, the determinant is defined only for square matrices. If, for example, $\mathsf{A}$ is a $3 \times 3$ matrix then its determinant, of order 3, is denoted by

$$\det \mathsf{A} = |\mathsf{A}| = \begin{vmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{vmatrix}. \tag{8.45}$$

In order to calculate the value of a determinant, we first need to introduce the notions of the minor and the cofactor of an element of a matrix. (We shall see that we can use the cofactors to write an order-3 determinant as the weighted sum of three order-2 determinants, thereby simplifying its evaluation.)

The minor $M_{ij}$ of the element $A_{ij}$ of an $N \times N$ matrix $\mathsf{A}$ is the determinant of the $(N - 1) \times (N - 1)$ matrix obtained by removing all the elements of the $i$th row and $j$th column of $\mathsf{A}$; the associated cofactor, $C_{ij}$, is found by multiplying the minor by $(-1)^{i+j}$.

▶ Find the cofactor of the element $A_{23}$ of the matrix

$$\mathsf{A} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}.$$

Removing all the elements of the second row and third column of $\mathsf{A}$ and forming the determinant of the remaining terms gives the minor

$$M_{23} = \begin{vmatrix} A_{11} & A_{12} \\ A_{31} & A_{32} \end{vmatrix}.$$

Multiplying the minor by $(-1)^{2+3} = (-1)^5 = -1$ gives

$$C_{23} = -\begin{vmatrix} A_{11} & A_{12} \\ A_{31} & A_{32} \end{vmatrix}.$$

◀

We now define a determinant as the sum of the products of the elements of any row or column and their corresponding cofactors, e.g. $A_{21}C_{21} + A_{22}C_{22} + A_{23}C_{23}$ or $A_{13}C_{13} + A_{23}C_{23} + A_{33}C_{33}$. Such a sum is called a Laplace expansion.
For example, in the first of these expansions, using the elements of the second row of the determinant defined by (8.45) and their corresponding cofactors, we write $|\mathsf{A}|$ as the Laplace expansion

$$|\mathsf{A}| = A_{21}(-1)^{(2+1)} M_{21} + A_{22}(-1)^{(2+2)} M_{22} + A_{23}(-1)^{(2+3)} M_{23}$$

$$= -A_{21} \begin{vmatrix} A_{12} & A_{13} \\ A_{32} & A_{33} \end{vmatrix} + A_{22} \begin{vmatrix} A_{11} & A_{13} \\ A_{31} & A_{33} \end{vmatrix} - A_{23} \begin{vmatrix} A_{11} & A_{12} \\ A_{31} & A_{32} \end{vmatrix}.$$

We will see later that the value of the determinant is independent of the row or column chosen. Of course, we have not yet determined the value of $|\mathsf{A}|$ but, rather, written it as the weighted sum of three determinants of order 2. However, applying again the definition of a determinant, we can evaluate each of the order-2 determinants.

▶ Evaluate the determinant

$$\begin{vmatrix} A_{12} & A_{13} \\ A_{32} & A_{33} \end{vmatrix}.$$

By considering the products of the elements of the first row in the determinant, and their corresponding cofactors, we find

$$\begin{vmatrix} A_{12} & A_{13} \\ A_{32} & A_{33} \end{vmatrix} = A_{12}(-1)^{(1+1)} |A_{33}| + A_{13}(-1)^{(1+2)} |A_{32}| = A_{12}A_{33} - A_{13}A_{32},$$

where the values of the order-1 determinants $|A_{33}|$ and $|A_{32}|$ are defined to be $A_{33}$ and $A_{32}$ respectively. It must be remembered that the determinant is not the same as the modulus, e.g. $\det(-2) = |-2| = -2$, not 2.
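The recursive character of the Laplace expansion, reducing the order step by step until only order-1 determinants remain, can be sketched in plain Python as follows. The function names and the $3 \times 3$ test matrix are assumptions made for illustration; indices are 0-based, which leaves the sign factor $(-1)^{i+j}$ unchanged.

```python
def minor_matrix(A, i, j):
    """Matrix left after deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A, row=0):
    """Determinant by Laplace expansion along the chosen row of A.

    The sub-determinants are always expanded along their own first row.
    """
    n = len(A)
    if n == 1:
        return A[0][0]  # order-1 determinant: the element itself, sign included
    return sum(A[row][j] * (-1) ** (row + j) * det(minor_matrix(A, row, j))
               for j in range(n))


# Hypothetical 3x3 matrix; its determinant is 1.
A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]

values = [det(A, row=r) for r in range(3)]  # expansion along each row in turn
```

All three entries of `values` agree, in line with the statement that the result does not depend on the row chosen, and `det([[-2]])` returns $-2$ rather than the modulus 2. The cost of this direct expansion grows factorially with the order, so it is meant only as a check on small matrices.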
◀

We can now combine all the above results to show that the value of the determinant (8.45) is given by

$$|\mathsf{A}| = -A_{21}(A_{12}A_{33} - A_{13}A_{32}) + A_{22}(A_{11}A_{33} - A_{13}A_{31}) - A_{23}(A_{11}A_{32} - A_{12}A_{31}) \tag{8.46}$$

$$= A_{11}(A_{22}A_{33} - A_{23}A_{32}) + A_{12}(A_{23}A_{31} - A_{21}A_{33}) + A_{13}(A_{21}A_{32} - A_{22}A_{31}), \tag{8.47}$$

where the final expression gives the form in which the determinant is usually remembered and is the form that is obtained immediately by considering the Laplace expansion using the first row of the determinant. The last equality, which essentially rearranges a Laplace expansion using the second row into one using the first row, supports our assertion that the value of the determinant is unaffected by which row or column is chosen for the expansion.

▶ Suppose the rows of a real $3 \times 3$ matrix $\mathsf{A}$ are interpreted as the components in a given basis of three (three-component) vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. Show that one can write the determinant of $\mathsf{A}$ as

$$|\mathsf{A}| = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}).$$

If one writes the rows of $\mathsf{A}$ as the components in a given basis of three vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$, we have from (8.47) that

$$|\mathsf{A}| = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = a_1(b_2c_3 - b_3c_2) + a_2(b_3c_1 - b_1c_3) + a_3(b_1c_2 - b_2c_1).$$

From expression (7.34) for the scalar triple product given in subsection 7.6.3, it follows that we may write the determinant as

$$|\mathsf{A}| = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}). \tag{8.48}$$

In other words, $|\mathsf{A}|$ is (up to sign) the volume of the parallelepiped defined by the vectors $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$. (One could equally well interpret the columns of the matrix $\mathsf{A}$ as the components of three vectors, and result (8.48) would still hold.) This result provides a more memorable (and more meaningful) expression than (8.47) for the value of a $3 \times 3$ determinant.
Indeed, using this geometrical interpretation, we see immediately that, if the vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ are not linearly independent then the value of the determinant vanishes: $|\mathsf{A}| = 0$. ◀

The evaluation of determinants of order greater than 3 follows the same general method as that presented above, in that it relies on successively reducing the order of the determinant by writing it as a Laplace expansion. Thus, a determinant of order 4 is first written as a sum of four determinants of order 3, which are then evaluated using the above method. For higher-order determinants, one cannot write down directly a simple geometrical expression for $|\mathsf{A}|$ analogous to that given in (8.48). Nevertheless, it is still true that if the rows or columns of the $N \times N$ matrix $\mathsf{A}$ are interpreted as the components in a given basis of $N$ ($N$-component) vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N$, then the determinant $|\mathsf{A}|$ vanishes if these vectors are not all linearly independent.

#### 8.9.1 Properties of determinants

A number of properties of determinants follow straightforwardly from the definition of $\det \mathsf{A}$; their use will often reduce the labour of evaluating a determinant. We present them here without specific proofs, though they all follow readily from the alternative form for a determinant, given in equation (26.29) on page 942, and expressed in terms of the Levi-Civita symbol $\epsilon_{ijk}$ (see exercise 26.9).

(i) Determinant of the transpose. The transpose matrix $\mathsf{A}^{\mathrm{T}}$ (which, we recall, is obtained by interchanging the rows and columns of $\mathsf{A}$) has the same determinant as $\mathsf{A}$ itself, i.e.

$$|\mathsf{A}^{\mathrm{T}}| = |\mathsf{A}|. \tag{8.49}$$

It follows that any theorem established for the rows of $\mathsf{A}$ will apply to the columns as well, and vice versa.

(ii) Determinant of the complex and Hermitian conjugate. It is clear that the matrix $\mathsf{A}^*$ obtained by taking the complex conjugate of each element of $\mathsf{A}$ has the determinant $|\mathsf{A}^*| = |\mathsf{A}|^*$.
Combining this result with (8.49), we find that

$$|\mathsf{A}^\dagger| = |(\mathsf{A}^*)^{\mathrm{T}}| = |\mathsf{A}^*| = |\mathsf{A}|^*. \tag{8.50}$$

(iii) Interchanging two rows or two columns. If two rows (columns) of $\mathsf{A}$ are interchanged, its determinant changes sign but is unaltered in magnitude.

(iv) Removing factors. If all the elements of a single row (column) of $\mathsf{A}$ have a common factor, $\lambda$, then this factor may be removed; the value of the determinant is given by the product of the remaining determinant and $\lambda$. Clearly this implies that if all the elements of any row (column) are zero then $|\mathsf{A}| = 0$. It also follows that if every element of the $N \times N$ matrix $\mathsf{A}$ is multiplied by a constant factor $\lambda$ then

$$|\lambda \mathsf{A}| = \lambda^N |\mathsf{A}|. \tag{8.51}$$

(v) Identical rows or columns. If any two rows (columns) of $\mathsf{A}$ are identical or are multiples of one another, then it can be shown that $|\mathsf{A}| = 0$.

(vi) Adding a constant multiple of one row (column) to another. The determinant of a matrix is unchanged in value by adding to the elements of one row (column) any fixed multiple of the elements of another row (column).

(vii) Determinant of a product. If $\mathsf{A}$ and $\mathsf{B}$ are square matrices of the same order then

$$|\mathsf{AB}| = |\mathsf{A}||\mathsf{B}| = |\mathsf{BA}|. \tag{8.52}$$

A simple extension of this property gives, for example,

$$|\mathsf{AB} \cdots \mathsf{G}| = |\mathsf{A}||\mathsf{B}| \cdots |\mathsf{G}| = |\mathsf{A}||\mathsf{G}| \cdots |\mathsf{B}| = |\mathsf{A} \cdots \mathsf{GB}|,$$

which shows that the determinant is invariant under permutation of the matrices in a multiple product.

There is no explicit procedure for using the above results in the evaluation of any given determinant, and judging the quickest route to an answer is a matter of experience. A general guide is to try to reduce all terms but one in a row or column to zero and hence in effect to obtain a determinant of smaller size. The steps taken in evaluating the determinant in the example below are certainly not the fastest, but they have been chosen in order to illustrate the use of most of the properties listed above.
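Properties (i) and (iii) to (vii) can be spot-checked numerically before they are used by hand. The sketch below, in plain Python, evaluates determinants by Laplace expansion along the first row; the helper names and the two integer matrices are assumptions chosen for illustration, and a check on one example is of course not a proof.

```python
def det(A):
    """Determinant by Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical 3x3 integer matrices with |A| = 1 and |B| = 9.
A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
B = [[2, 0, 1], [1, 3, -1], [0, 1, 1]]

swapped = [A[1], A[0], A[2]]                                      # (iii) rows 1 and 2 exchanged
scaled = [[3 * a for a in row] for row in A]                      # (iv) every element times 3
repeated = [A[0], A[0], A[2]]                                     # (v) two identical rows
sheared = [A[0], [a + 5 * b for a, b in zip(A[1], A[0])], A[2]]   # (vi) row 2 + 5 * row 1

checks = {
    "(i)": det(transpose(A)) == det(A),
    "(iii)": det(swapped) == -det(A),
    "(iv)": det(scaled) == 3 ** 3 * det(A),
    "(v)": det(repeated) == 0,
    "(vi)": det(sheared) == det(A),
    "(vii)": det(matmul(A, B)) == det(A) * det(B) == det(matmul(B, A)),
}
```

Every value in `checks` is `True` for these matrices; integer arithmetic keeps the comparison exact.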
▶ Evaluate the determinant

$$|\mathsf{A}| = \begin{vmatrix} 1 & 0 & 2 & 3 \\ 0 & 1 & -2 & 1 \\ 3 & -3 & 4 & -2 \\ -2 & 1 & -2 & -1 \end{vmatrix}.$$

Taking a factor 2 out of the third column and then adding the second column to the third gives

$$|\mathsf{A}| = 2 \begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & 1 \\ 3 & -3 & 2 & -2 \\ -2 & 1 & -1 & -1 \end{vmatrix} = 2 \begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 0 & 1 \\ 3 & -3 & -1 & -2 \\ -2 & 1 & 0 & -1 \end{vmatrix}.$$

Subtracting the second column from the fourth gives

$$|\mathsf{A}| = 2 \begin{vmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & 0 & 0 \\ 3 & -3 & -1 & 1 \\ -2 & 1 & 0 & -2 \end{vmatrix}.$$

We now note that the second row has only one non-zero element and so the determinant may conveniently be written as a Laplace expansion, i.e.

$$|\mathsf{A}| = 2 \times 1 \times (-1)^{2+2} \begin{vmatrix} 1 & 1 & 3 \\ 3 & -1 & 1 \\ -2 & 0 & -2 \end{vmatrix} = 2 \begin{vmatrix} 4 & 0 & 4 \\ 3 & -1 & 1 \\ -2 & 0 & -2 \end{vmatrix},$$

where the last equality follows by adding the second row to the first. It can now be seen that the first row is minus twice the third, and so the value of the determinant is zero, by property (v) above. ◀

### 8.10 The inverse of a matrix

Our first use of determinants will be in defining the inverse of a matrix.
If we were dealing with ordinary numbers we would consider the relation $P = AB$ as equivalent to $B = P/A$, provided that $A \neq 0$. However, if $\mathsf{A}$, $\mathsf{B}$ and $\mathsf{P}$ are matrices then this notation does not have an obvious meaning. What we really want to know is whether an explicit formula for $\mathsf{B}$ can be obtained in terms of $\mathsf{A}$ and $\mathsf{P}$. It will be shown that this is possible for those cases in which $|\mathsf{A}| \neq 0$. A square matrix whose determinant is zero is called a singular matrix; otherwise it is non-singular. We will show that if $\mathsf{A}$ is non-singular we can define a matrix, denoted by $\mathsf{A}^{-1}$ and called the inverse of $\mathsf{A}$, which has the property that if $\mathsf{AB} = \mathsf{P}$ then $\mathsf{B} = \mathsf{A}^{-1}\mathsf{P}$. In words, $\mathsf{B}$ can be obtained by multiplying $\mathsf{P}$ from the left by $\mathsf{A}^{-1}$. Analogously, if $\mathsf{B}$ is non-singular then, by multiplication from the right, $\mathsf{A} = \mathsf{P}\mathsf{B}^{-1}$.

It is clear that

$$\mathsf{AI} = \mathsf{A} \quad \Rightarrow \quad \mathsf{I} = \mathsf{A}^{-1}\mathsf{A}, \tag{8.53}$$

where $\mathsf{I}$ is the unit matrix, and so $\mathsf{A}^{-1}\mathsf{A} = \mathsf{I} = \mathsf{A}\mathsf{A}^{-1}$. These statements are equivalent to saying that if we first multiply a matrix, $\mathsf{B}$ say, by $\mathsf{A}$ and then multiply by the inverse $\mathsf{A}^{-1}$, we end up with the matrix we started with, i.e.

$$\mathsf{A}^{-1}\mathsf{AB} = \mathsf{B}. \tag{8.54}$$

This justifies our use of the term inverse. It is also clear that the inverse is only defined for square matrices.

So far we have only defined what we mean by the inverse of a matrix. Actually finding the inverse of a matrix $\mathsf{A}$ may be carried out in a number of ways. We will show that one method is to construct first the matrix $\mathsf{C}$ containing the cofactors of the elements of $\mathsf{A}$, as discussed in the last subsection. Then the required inverse $\mathsf{A}^{-1}$ can be found by forming the transpose of $\mathsf{C}$ and dividing by the determinant of $\mathsf{A}$. Thus the elements of the inverse $\mathsf{A}^{-1}$ are given by

$$(\mathsf{A}^{-1})_{ik} = \frac{(\mathsf{C})^{\mathrm{T}}_{ik}}{|\mathsf{A}|} = \frac{C_{ki}}{|\mathsf{A}|}. \tag{8.55}$$

That this procedure does indeed result in the inverse may be seen by considering the components of $\mathsf{A}^{-1}\mathsf{A}$, i.e.

$$(\mathsf{A}^{-1}\mathsf{A})_{ij} = \sum_k (\mathsf{A}^{-1})_{ik} (\mathsf{A})_{kj} = \sum_k \frac{C_{ki}}{|\mathsf{A}|} A_{kj} = \frac{|\mathsf{A}|}{|\mathsf{A}|} \delta_{ij}. \tag{8.56}$$

The last equality in (8.56) relies on the property

$$\sum_k C_{ki} A_{kj} = |\mathsf{A}| \delta_{ij}; \tag{8.57}$$

this can be proved by considering the matrix $\mathsf{A}'$ obtained from the original matrix $\mathsf{A}$ when the $i$th column of $\mathsf{A}$ is replaced by one of the other columns, say the $j$th. Thus $\mathsf{A}'$ is a matrix with two identical columns and so has zero determinant. However, replacing the $i$th column by another does not change the cofactors $C_{ki}$ of the elements in the $i$th column, which are therefore the same in $\mathsf{A}$ and $\mathsf{A}'$. Recalling the Laplace expansion of a determinant, i.e.

$$|\mathsf{A}| = \sum_k A_{ki} C_{ki},$$

we obtain

$$0 = |\mathsf{A}'| = \sum_k A'_{ki} C'_{ki} = \sum_k A_{kj} C_{ki}, \quad i \neq j,$$

which together with the Laplace expansion itself may be summarised by (8.57).

It is immediately obvious from (8.55) that the inverse of a matrix is not defined if the matrix is singular (i.e. if $|\mathsf{A}| = 0$).

▶ Find the inverse of the matrix

$$\mathsf{A} = \begin{pmatrix} 2 & 4 & 3 \\ 1 & -2 & -2 \\ -3 & 3 & 2 \end{pmatrix}.$$

We first determine $|\mathsf{A}|$:

$$|\mathsf{A}| = 2[-2(2) - (-2)3] + 4[(-2)(-3) - (1)(2)] + 3[(1)(3) - (-2)(-3)] = 11. \tag{8.58}$$

This is non-zero and so an inverse matrix can be constructed. To do this we need the matrix of the cofactors, $\mathsf{C}$, and hence $\mathsf{C}^{\mathrm{T}}$. We find

$$\mathsf{C} = \begin{pmatrix} 2 & 4 & -3 \\ 1 & 13 & -18 \\ -2 & 7 & -8 \end{pmatrix} \quad \text{and} \quad \mathsf{C}^{\mathrm{T}} = \begin{pmatrix} 2 & 1 & -2 \\ 4 & 13 & 7 \\ -3 & -18 & -8 \end{pmatrix},$$

and hence

$$\mathsf{A}^{-1} = \frac{\mathsf{C}^{\mathrm{T}}}{|\mathsf{A}|} = \frac{1}{11} \begin{pmatrix} 2 & 1 & -2 \\ 4 & 13 & 7 \\ -3 & -18 & -8 \end{pmatrix}. \tag{8.59}$$

◀

For a $2 \times 2$ matrix, the inverse has a particularly simple form. If the matrix is

$$\mathsf{A} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

then its determinant $|\mathsf{A}|$ is given by $|\mathsf{A}| = A_{11}A_{22} - A_{12}A_{21}$, and the matrix of cofactors is

$$\mathsf{C} = \begin{pmatrix} A_{22} & -A_{21} \\ -A_{12} & A_{11} \end{pmatrix}.$$

Thus the inverse of $\mathsf{A}$ is given by

$$\mathsf{A}^{-1} = \frac{\mathsf{C}^{\mathrm{T}}}{|\mathsf{A}|} = \frac{1}{A_{11}A_{22} - A_{12}A_{21}} \begin{pmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{pmatrix}. \tag{8.60}$$

It can be seen that the transposed matrix of cofactors for a $2 \times 2$ matrix is the same as the matrix formed by swapping the elements on the leading diagonal ($A_{11}$ and $A_{22}$) and changing the signs of the other two elements ($A_{12}$ and $A_{21}$). This is completely general for a $2 \times 2$ matrix and is easy to remember.

The following are some further useful properties related to the inverse matrix and may be straightforwardly derived.

(i) $(\mathsf{A}^{-1})^{-1} = \mathsf{A}$.
(ii) $(\mathsf{A}^{\mathrm{T}})^{-1} = (\mathsf{A}^{-1})^{\mathrm{T}}$.
(iii) $(\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^{-1})^\dagger$.
(iv) $(\mathsf{AB})^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1}$.
(v) $(\mathsf{AB} \cdots \mathsf{G})^{-1} = \mathsf{G}^{-1} \cdots \mathsf{B}^{-1}\mathsf{A}^{-1}$.

▶ Prove the properties (i) to (v) stated above.

We begin by writing down the fundamental expression defining the inverse of a non-singular square matrix $\mathsf{A}$:

$$\mathsf{A}\mathsf{A}^{-1} = \mathsf{I} = \mathsf{A}^{-1}\mathsf{A}. \tag{8.61}$$

Property (i). This follows immediately from the expression (8.61).

Property (ii). Taking the transpose of each expression in (8.61) gives

$$(\mathsf{A}\mathsf{A}^{-1})^{\mathrm{T}} = \mathsf{I}^{\mathrm{T}} = (\mathsf{A}^{-1}\mathsf{A})^{\mathrm{T}}.$$

Using the result (8.39) for the transpose of a product of matrices and noting that $\mathsf{I}^{\mathrm{T}} = \mathsf{I}$, we find

$$(\mathsf{A}^{-1})^{\mathrm{T}} \mathsf{A}^{\mathrm{T}} = \mathsf{I} = \mathsf{A}^{\mathrm{T}} (\mathsf{A}^{-1})^{\mathrm{T}}.$$

However, from (8.61), this implies $(\mathsf{A}^{-1})^{\mathrm{T}} = (\mathsf{A}^{\mathrm{T}})^{-1}$ and hence proves result (ii) above.

Property (iii). This may be proved in an analogous way to property (ii), by replacing the transposes in (ii) by Hermitian conjugates and using the result (8.40) for the Hermitian conjugate of a product of matrices.

Property (iv). Using (8.61), we may write

$$(\mathsf{AB})(\mathsf{AB})^{-1} = \mathsf{I} = (\mathsf{AB})^{-1}(\mathsf{AB}).$$

From the left-hand equality it follows, by multiplying on the left by $\mathsf{A}^{-1}$, that

$$\mathsf{A}^{-1}\mathsf{AB}(\mathsf{AB})^{-1} = \mathsf{A}^{-1}\mathsf{I} \quad \text{and hence} \quad \mathsf{B}(\mathsf{AB})^{-1} = \mathsf{A}^{-1}.$$

Now multiplying on the left by $\mathsf{B}^{-1}$ gives

$$\mathsf{B}^{-1}\mathsf{B}(\mathsf{AB})^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1},$$

and hence the stated result.

Property (v). Finally, result (iv) may be extended to case (v) in a straightforward manner. For example, using result (iv) twice we find

$$(\mathsf{ABC})^{-1} = (\mathsf{BC})^{-1}\mathsf{A}^{-1} = \mathsf{C}^{-1}\mathsf{B}^{-1}\mathsf{A}^{-1}.$$
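The cofactor construction (8.55) and properties (ii) and (iv) can be checked on the matrix of the worked example with exact rational arithmetic. The following is a sketch in plain Python using the standard `fractions` module; the function names and the second matrix `B` are assumptions introduced for illustration, and integer entries are assumed throughout.

```python
from fractions import Fraction


def det(A):
    """Determinant by Laplace expansion along the first row."""
    if not A:
        return 1  # convention for the empty (0x0) matrix, so 1x1 cofactors equal 1
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cofactor_matrix(A):
    """Matrix C with C_ij = (-1)^(i+j) M_ij, where M_ij is the minor of A_ij."""
    n = len(A)
    return [[(-1) ** (i + j)
             * det([row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i])
             for j in range(n)]
            for i in range(n)]


def inverse(A):
    """Inverse from (8.55): (A^-1)_ik = C_ki / |A|. Assumes integer entries."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular: |A| = 0")
    C = cofactor_matrix(A)
    n = len(A)
    return [[Fraction(C[k][i], d) for k in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[2, 4, 3], [1, -2, -2], [-3, 3, 2]]   # matrix of the worked example, |A| = 11
B = [[1, 0, 2], [0, 1, 1], [1, 0, 3]]      # hypothetical second matrix, |B| = 1
A_inv = inverse(A)
```

Here `A_inv` reproduces (8.59) entry by entry, `matmul(A_inv, A)` is the identity, and `inverse(matmul(A, B))` equals `matmul(inverse(B), inverse(A))`. For large matrices elimination methods are far cheaper than cofactors; the construction is used here only because it mirrors the text.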
◀

We conclude this section by noting that the determinant $|\mathsf{A}^{-1}|$ of the inverse matrix can be expressed very simply in terms of the determinant $|\mathsf{A}|$ of the matrix itself. Again we start with the fundamental expression (8.61). Then, using the property (8.52) for the determinant of a product, we find

$$|\mathsf{A}\mathsf{A}^{-1}| = |\mathsf{A}||\mathsf{A}^{-1}| = |\mathsf{I}|.$$

It is straightforward to show by Laplace expansion that $|\mathsf{I}| = 1$, and so we arrive at the useful result

$$|\mathsf{A}^{-1}| = \frac{1}{|\mathsf{A}|}. \tag{8.62}$$

### 8.11 The rank of a matrix

The rank of a general $M \times N$ matrix is an important concept, particularly in the solution of sets of simultaneous linear equations, to be discussed in the next section, and we now discuss it in some detail. Like the trace and determinant, the rank of a matrix $\mathsf{A}$ is a single number (a non-negative integer) that depends on the elements of $\mathsf{A}$. Unlike the trace and determinant, however, the rank of a matrix can be defined even when $\mathsf{A}$ is not square. As we shall see, there are two equivalent definitions of the rank of a general matrix.

Firstly, the rank of a matrix may be defined in terms of the linear independence of vectors. Suppose that the columns of an $M \times N$ matrix are interpreted as the components in a given basis of $N$ ($M$-component) vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$, as follows:

$$\mathsf{A} = \begin{pmatrix} \uparrow & \uparrow & & \uparrow \\ \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_N \\ \downarrow & \downarrow & & \downarrow \end{pmatrix}.$$

Then the rank of $\mathsf{A}$, denoted by $\operatorname{rank} \mathsf{A}$ or by $R(\mathsf{A})$, is defined as the number of linearly independent vectors in the set $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$, and equals the dimension of the vector space spanned by those vectors. Alternatively, we may consider the rows of $\mathsf{A}$ to contain the components in a given basis of the $M$ ($N$-component) vectors $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M$ as follows:

$$\mathsf{A} = \begin{pmatrix} \leftarrow & \mathbf{w}_1 & \rightarrow \\ \leftarrow & \mathbf{w}_2 & \rightarrow \\ & \vdots & \\ \leftarrow & \mathbf{w}_M & \rightarrow \end{pmatrix}.$$

It may then be shown§ that the rank of $\mathsf{A}$ is also equal to the number of linearly independent vectors in the set $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_M$.
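The agreement between the column-based and row-based counts can be checked numerically. The sketch below counts independent rows by Gaussian elimination, a standard technique that is swapped in here for convenience and is not the method of the text at this point; the function name `rank`, the use of exact `Fraction` arithmetic and the illustrative $3 \times 4$ matrix are all assumptions of the example. Applying the same routine to the transpose counts independent columns.

```python
from fractions import Fraction


def rank(A):
    """Number of linearly independent rows of A, found by Gaussian elimination.

    Assumes real (integer or rational) entries so that Fraction arithmetic is exact.
    """
    rows = [[Fraction(x) for x in row] for row in A]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of independent rows found so far
    for col in range(n_cols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Subtract multiples of the pivot row to clear the column beneath it.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def transpose(A):
    return [list(col) for col in zip(*A)]


# Illustrative 3x4 matrix: the third row equals 1.5 * (row 2) + (row 1).
A = [[1, 1, 0, -2],
     [2, 0, 2, 2],
     [4, 1, 3, 1]]

row_rank = rank(A)                 # independent rows
column_rank = rank(transpose(A))   # independent columns
```

Both counts come out as 2 for this matrix, which is smaller than either dimension because one row is a linear combination of the other two.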
From this definition it should be clear that the rank of $\mathsf{A}$ is unaffected by the exchange of two rows (or two columns) or by the multiplication of a row (or column) by a non-zero constant. Furthermore, suppose that a constant multiple of one row (column) is added to another row (column): for example, we might replace the row $\mathbf{w}_i$ by $\mathbf{w}_i + c\mathbf{w}_j$. This also has no effect on the number of linearly independent rows and so leaves the rank of $\mathsf{A}$ unchanged. We may use these properties to evaluate the rank of a given matrix.

§ For a fuller discussion, see, for example, C. D. Cantrell, Modern Mathematical Methods for Physicists and Engineers (Cambridge: Cambridge University Press, 2000), chapter 6.

A second (equivalent) definition of the rank of a matrix may be given and uses the concept of submatrices. A submatrix of $\mathsf{A}$ is any matrix that can be formed from the elements of $\mathsf{A}$ by ignoring one, or more than one, row or column. It may be shown that the rank of a general $M \times N$ matrix is equal to the size of the largest square submatrix of $\mathsf{A}$ whose determinant is non-zero. Therefore, if a matrix $\mathsf{A}$ has an $r \times r$ submatrix $\mathsf{S}$ with $|\mathsf{S}| \neq 0$, but no $(r + 1) \times (r + 1)$ submatrix with non-zero determinant then the rank of the matrix is $r$. From either definition it is clear that the rank of $\mathsf{A}$ is less than or equal to the smaller of $M$ and $N$.

▶ Determine the rank of the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & 1 & 0 & -2 \\ 2 & 0 & 2 & 2 \\ 4 & 1 & 3 & 1 \end{pmatrix}.$$

The largest possible square submatrices of $\mathsf{A}$ must be of dimension $3 \times 3$.
Clearly, $\mathsf{A}$ possesses four such submatrices, the determinants of which are given by

$$\begin{vmatrix} 1 & 1 & 0 \\ 2 & 0 & 2 \\ 4 & 1 & 3 \end{vmatrix} = 0, \quad \begin{vmatrix} 1 & 1 & -2 \\ 2 & 0 & 2 \\ 4 & 1 & 1 \end{vmatrix} = 0,$$

$$\begin{vmatrix} 1 & 0 & -2 \\ 2 & 2 & 2 \\ 4 & 3 & 1 \end{vmatrix} = 0, \quad \begin{vmatrix} 1 & 0 & -2 \\ 0 & 2 & 2 \\ 1 & 3 & 1 \end{vmatrix} = 0.$$

(In each case the determinant may be evaluated as described in subsection 8.9.1.)

The next largest square submatrices of $\mathsf{A}$ are of dimension $2 \times 2$. Consider, for example, the $2 \times 2$ submatrix formed by ignoring the third row and the third and fourth columns of $\mathsf{A}$; this has determinant

$$\begin{vmatrix} 1 & 1 \\ 2 & 0 \end{vmatrix} = 1 \times 0 - 2 \times 1 = -2.$$

Since its determinant is non-zero, $\mathsf{A}$ is of rank 2 and we need not consider any other $2 \times 2$ submatrix. ◀

In the special case in which the matrix $\mathsf{A}$ is a square $N \times N$ matrix, by comparing either of the above definitions of rank with our discussion of determinants in section 8.9, we see that $|\mathsf{A}| = 0$ unless the rank of $\mathsf{A}$ is $N$. In other words, $\mathsf{A}$ is singular unless $R(\mathsf{A}) = N$.

### 8.12 Special types of square matrix

Matrices that are square, i.e. $N \times N$, are very common in physical applications. We now consider some special forms of square matrix that are of particular importance.

#### 8.12.1 Diagonal matrices

The unit matrix, which we have already encountered, is an example of a diagonal matrix.
Such matrices are characterised by having non-zero elements only on the leading diagonal, i.e. only elements $A_{ij}$ with $i = j$ may be non-zero. For example,

$$\mathsf{A} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -3 \end{pmatrix}$$

is a $3\times 3$ diagonal matrix. Such a matrix is often denoted by $\mathsf{A} = \mathrm{diag}\,(1, 2, -3)$. By performing a Laplace expansion, it is easily shown that the determinant of an $N\times N$ diagonal matrix is equal to the product of the diagonal elements. Thus, if the matrix has the form $\mathsf{A} = \mathrm{diag}(A_{11}, A_{22}, \ldots, A_{NN})$ then

$$|\mathsf{A}| = A_{11}A_{22}\cdots A_{NN}. \tag{8.63}$$

Moreover, it is also straightforward to show that, provided none of the diagonal elements is zero, the inverse of $\mathsf{A}$ is also a diagonal matrix, given by

$$\mathsf{A}^{-1} = \mathrm{diag}\left(\frac{1}{A_{11}}, \frac{1}{A_{22}}, \ldots, \frac{1}{A_{NN}}\right).$$

Finally, we note that, if two matrices $\mathsf{A}$ and $\mathsf{B}$ are both diagonal then they have the useful property that their product is commutative:

$$\mathsf{A}\mathsf{B} = \mathsf{B}\mathsf{A}.$$

This is not true for matrices in general.

8.12.2 Lower and upper triangular matrices

A square matrix $\mathsf{A}$ is called lower triangular if all the elements above the principal diagonal are zero. For example, the general form for a $3\times 3$ lower triangular matrix is

$$\mathsf{A} = \begin{pmatrix} A_{11} & 0 & 0 \\ A_{21} & A_{22} & 0 \\ A_{31} & A_{32} & A_{33} \end{pmatrix},$$

where the elements $A_{ij}$ may be zero or non-zero. Similarly an upper triangular square matrix is one for which all the elements below the principal diagonal are zero. The general $3\times 3$ form is thus

$$\mathsf{A} = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ 0 & A_{22} & A_{23} \\ 0 & 0 & A_{33} \end{pmatrix}.$$

By performing a Laplace expansion, it is straightforward to show that, in the general $N\times N$ case, the determinant of an upper or lower triangular matrix is equal to the product of its diagonal elements,

$$|\mathsf{A}| = A_{11}A_{22}\cdots A_{NN}. \tag{8.64}$$

Clearly result (8.63) for diagonal matrices is a special case of this result. Moreover, it may be shown that the inverse of a non-singular lower (upper) triangular matrix is also lower (upper) triangular.

8.12.3 Symmetric and antisymmetric matrices

A square matrix $\mathsf{A}$ of order $N$ with the property $\mathsf{A} = \mathsf{A}^{\mathrm{T}}$ is said to be symmetric.
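Similarly a matrix for which $\mathsf{A} = -\mathsf{A}^{\mathrm{T}}$ is said to be anti- or skew-symmetric and its diagonal elements $a_{11}, a_{22}, \ldots, a_{NN}$ are necessarily zero. Moreover, if $\mathsf{A}$ is (anti-)symmetric then so too is its inverse $\mathsf{A}^{-1}$. This is easily proved by noting that if $\mathsf{A} = \pm\mathsf{A}^{\mathrm{T}}$ then

$$(\mathsf{A}^{-1})^{\mathrm{T}} = (\mathsf{A}^{\mathrm{T}})^{-1} = \pm\mathsf{A}^{-1}.$$

Any $N\times N$ matrix $\mathsf{A}$ can be written as the sum of a symmetric and an antisymmetric matrix, since we may write

$$\mathsf{A} = \tfrac{1}{2}(\mathsf{A} + \mathsf{A}^{\mathrm{T}}) + \tfrac{1}{2}(\mathsf{A} - \mathsf{A}^{\mathrm{T}}) = \mathsf{B} + \mathsf{C},$$

where clearly $\mathsf{B} = \mathsf{B}^{\mathrm{T}}$ and $\mathsf{C} = -\mathsf{C}^{\mathrm{T}}$. The matrix $\mathsf{B}$ is therefore called the symmetric part of $\mathsf{A}$, and $\mathsf{C}$ is the antisymmetric part.

▶ If $\mathsf{A}$ is an $N\times N$ antisymmetric matrix, show that $|\mathsf{A}| = 0$ if $N$ is odd.

If $\mathsf{A}$ is antisymmetric then $\mathsf{A}^{\mathrm{T}} = -\mathsf{A}$. Using the properties of determinants (8.49) and (8.51), we have

$$|\mathsf{A}| = |\mathsf{A}^{\mathrm{T}}| = |-\mathsf{A}| = (-1)^N|\mathsf{A}|.$$

Thus, if $N$ is odd then $|\mathsf{A}| = -|\mathsf{A}|$, which implies that $|\mathsf{A}| = 0$. ◀

8.12.4 Orthogonal matrices

A non-singular matrix with the property that its transpose is also its inverse,

$$\mathsf{A}^{\mathrm{T}} = \mathsf{A}^{-1}, \tag{8.65}$$

is called an orthogonal matrix. It follows immediately that the inverse of an orthogonal matrix is also orthogonal, since

$$(\mathsf{A}^{-1})^{\mathrm{T}} = (\mathsf{A}^{\mathrm{T}})^{-1} = (\mathsf{A}^{-1})^{-1}.$$

Moreover, since for an orthogonal matrix $\mathsf{A}^{\mathrm{T}}\mathsf{A} = \mathsf{I}$, we have

$$|\mathsf{A}^{\mathrm{T}}\mathsf{A}| = |\mathsf{A}^{\mathrm{T}}||\mathsf{A}| = |\mathsf{A}|^2 = |\mathsf{I}| = 1.$$

Thus the determinant of an orthogonal matrix must be $|\mathsf{A}| = \pm 1$.

An orthogonal matrix represents, in a particular basis, a linear operator that leaves the norms (lengths) of real vectors unchanged, as we will now show. Suppose that $\mathbf{y} = \mathcal{A}\mathbf{x}$ is represented in some coordinate system by the matrix equation $\mathsf{y} = \mathsf{A}\mathsf{x}$; then $\langle\mathbf{y}|\mathbf{y}\rangle$ is given in this coordinate system by

$$\mathsf{y}^{\mathrm{T}}\mathsf{y} = \mathsf{x}^{\mathrm{T}}\mathsf{A}^{\mathrm{T}}\mathsf{A}\mathsf{x} = \mathsf{x}^{\mathrm{T}}\mathsf{x}.$$

Hence $\langle\mathbf{y}|\mathbf{y}\rangle = \langle\mathbf{x}|\mathbf{x}\rangle$, showing that the action of a linear operator represented by an orthogonal matrix does not change the norm of a real vector.

8.12.5 Hermitian and anti-Hermitian matrices

An Hermitian matrix is one that satisfies $\mathsf{A} = \mathsf{A}^\dagger$, where $\mathsf{A}^\dagger$ is the Hermitian conjugate discussed in section 8.7.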
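Looking back at subsection 8.12.3, the split of a matrix into symmetric and antisymmetric parts, and the vanishing determinant of an odd-order antisymmetric matrix, can be illustrated with exact rational arithmetic. The sketch below is only an illustration under stated assumptions: the matrix `A` is an arbitrary choice, and `split_symmetric` and `det3` are hypothetical helper names.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def split_symmetric(a):
    """Return (B, C) with B = (A + A^T)/2 and C = (A - A^T)/2."""
    at = transpose(a)
    n = len(a)
    half = Fraction(1, 2)
    b = [[half * (a[i][j] + at[i][j]) for j in range(n)] for i in range(n)]
    c = [[half * (a[i][j] - at[i][j]) for j in range(n)] for i in range(n)]
    return b, c


A = [[1, 2, 0],
     [4, 3, -1],
     [6, 5, 2]]  # arbitrary illustrative matrix, not taken from the text

B, C = split_symmetric(A)

# B is symmetric, C is antisymmetric, and together they rebuild A.
print(B == transpose(B))
print(C == [[-x for x in row] for row in transpose(C)])
print([[B[i][j] + C[i][j] for j in range(3)] for i in range(3)] == A)

# N = 3 is odd, so the antisymmetric part must have zero determinant.
print(det3(C))
```

Using `Fraction` keeps the factors of one half exact, so the equality checks are genuine identities and not floating-point approximations.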
Similarly if $\mathsf{A}^\dagger = -\mathsf{A}$, then $\mathsf{A}$ is called anti-Hermitian. A real (anti-)symmetric matrix is a special case of an (anti-)Hermitian matrix, in which all the elements of the matrix are real. Also, if $\mathsf{A}$ is an (anti-)Hermitian matrix then so too is its inverse $\mathsf{A}^{-1}$, since

$$(\mathsf{A}^{-1})^\dagger = (\mathsf{A}^\dagger)^{-1} = \pm\mathsf{A}^{-1}.$$

Any $N\times N$ matrix $\mathsf{A}$ can be written as the sum of an Hermitian matrix and an anti-Hermitian matrix, since

$$\mathsf{A} = \tfrac{1}{2}(\mathsf{A} + \mathsf{A}^\dagger) + \tfrac{1}{2}(\mathsf{A} - \mathsf{A}^\dagger) = \mathsf{B} + \mathsf{C},$$

where clearly $\mathsf{B} = \mathsf{B}^\dagger$ and $\mathsf{C} = -\mathsf{C}^\dagger$. The matrix $\mathsf{B}$ is called the Hermitian part of $\mathsf{A}$, and $\mathsf{C}$ is called the anti-Hermitian part.

8.12.6 Unitary matrices

A unitary matrix $\mathsf{A}$ is defined as one for which

$$\mathsf{A}^\dagger = \mathsf{A}^{-1}. \tag{8.66}$$

Clearly, if $\mathsf{A}$ is real then $\mathsf{A}^\dagger = \mathsf{A}^{\mathrm{T}}$, showing that a real orthogonal matrix is a special case of a unitary matrix, one in which all the elements are real. We note that the inverse $\mathsf{A}^{-1}$ of a unitary matrix is also unitary, since

$$(\mathsf{A}^{-1})^\dagger = (\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^{-1})^{-1}.$$

Moreover, since for a unitary matrix $\mathsf{A}^\dagger\mathsf{A} = \mathsf{I}$, we have

$$|\mathsf{A}^\dagger\mathsf{A}| = |\mathsf{A}^\dagger||\mathsf{A}| = |\mathsf{A}|^*|\mathsf{A}| = |\mathsf{I}| = 1.$$

Thus the determinant of a unitary matrix has unit modulus.

A unitary matrix represents, in a particular basis, a linear operator that leaves the norms (lengths) of complex vectors unchanged. If $\mathbf{y} = \mathcal{A}\mathbf{x}$ is represented in some coordinate system by the matrix equation $\mathsf{y} = \mathsf{A}\mathsf{x}$ then $\langle\mathbf{y}|\mathbf{y}\rangle$ is given in this coordinate system by

$$\mathsf{y}^\dagger\mathsf{y} = \mathsf{x}^\dagger\mathsf{A}^\dagger\mathsf{A}\mathsf{x} = \mathsf{x}^\dagger\mathsf{x}.$$

Hence $\langle\mathbf{y}|\mathbf{y}\rangle = \langle\mathbf{x}|\mathbf{x}\rangle$, showing that the action of the linear operator represented by a unitary matrix does not change the norm of a complex vector. The action of a unitary matrix on a complex column matrix thus parallels that of an orthogonal matrix acting on a real column matrix.

8.12.7 Normal matrices

A final important set of special matrices consists of the normal matrices, for which

$$\mathsf{A}\mathsf{A}^\dagger = \mathsf{A}^\dagger\mathsf{A},$$

i.e. a normal matrix is one that commutes with its Hermitian conjugate.
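The defining relation for a normal matrix, and the norm-preserving property of a unitary matrix from subsection 8.12.6, can be tried out on small complex matrices. The following is a minimal sketch under stated assumptions: the matrices `U`, `H` and `N` and the vector `x` are illustrative choices not taken from the text, the helper names are hypothetical, and entries are kept to small Gaussian integers so that Python's complex arithmetic is exact.

```python
def dagger(m):
    """Hermitian conjugate: transpose followed by complex conjugation."""
    return [[complex(x).conjugate() for x in col] for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_normal(m):
    """True if M M^dagger equals M^dagger M."""
    return matmul(m, dagger(m)) == matmul(dagger(m), m)


def norm_sq(x):
    """Squared norm x^dagger x of a column of complex numbers."""
    return sum((complex(c).conjugate() * c).real for c in x)


U = [[0, 1], [1j, 0]]            # unitary: U^dagger U = I
H = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian: H = H^dagger
N = [[1, 1], [0, 1]]             # neither Hermitian nor unitary

x = [1 + 2j, 3 - 1j]
y = [sum(U[i][k] * x[k] for k in range(2)) for i in range(2)]  # y = U x

print(matmul(dagger(U), U) == [[1, 0], [0, 1]])  # U is unitary
print(is_normal(U), is_normal(H), is_normal(N))  # True True False
print(norm_sq(x), norm_sq(y))                    # both 15.0
```

The matrix `N` shows that normality is a genuine restriction: a generic square matrix does not commute with its Hermitian conjugate.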
We can easily show that Hermitian matrices and unitary matrices (or symmetric matrices and orthogonal matrices in the real case) are examples of normal matrices. For an Hermitian matrix, $\mathsf{A} = \mathsf{A}^\dagger$ and so

$$\mathsf{A}\mathsf{A}^\dagger = \mathsf{A}\mathsf{A} = \mathsf{A}^\dagger\mathsf{A}.$$

Similarly, for a unitary matrix, $\mathsf{A}^{-1} = \mathsf{A}^\dagger$ and so

$$\mathsf{A}\mathsf{A}^\dagger = \mathsf{A}\mathsf{A}^{-1} = \mathsf{A}^{-1}\mathsf{A} = \mathsf{A}^\dagger\mathsf{A}.$$

Finally, we note that, if $\mathsf{A}$ is normal then so too is its inverse $\mathsf{A}^{-1}$, since

$$\mathsf{A}^{-1}(\mathsf{A}^{-1})^\dagger = \mathsf{A}^{-1}(\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^\dagger\mathsf{A})^{-1} = (\mathsf{A}\mathsf{A}^\dagger)^{-1} = (\mathsf{A}^\dagger)^{-1}\mathsf{A}^{-1} = (\mathsf{A}^{-1})^\dagger\mathsf{A}^{-1}.$$

This broad class of matrices is important in the discussion of eigenvectors and eigenvalues in the next section.

8.13 Eigenvectors and eigenvalues

Suppose that a linear operator $\mathcal{A}$ transforms vectors $\mathbf{x}$ in an $N$-dimensional vector space into other vectors $\mathcal{A}\mathbf{x}$ in the same space. The possibility then arises that there exist vectors $\mathbf{x}$ each of which is transformed by $\mathcal{A}$ into a multiple of itself. Such vectors would have to satisfy

$$\mathcal{A}\mathbf{x} = \lambda\mathbf{x}. \tag{8.67}$$

Any non-zero vector $\mathbf{x}$ that satisfies (8.67) for some value of $\lambda$ is called an eigenvector of the linear operator $\mathcal{A}$, and $\lambda$ is called the corresponding eigenvalue. As will be discussed below, in general the operator $\mathcal{A}$ has $N$ independent eigenvectors $\mathbf{x}^i$, with eigenvalues $\lambda_i$. The $\lambda_i$ are not necessarily all distinct.

If we choose a particular basis in the vector space, we can write (8.67) in terms of the components of $\mathcal{A}$ and $\mathbf{x}$ with respect to this basis as the matrix equation

$$\mathsf{A}\mathsf{x} = \lambda\mathsf{x}, \tag{8.68}$$

where $\mathsf{A}$ is an $N\times N$ matrix. The column matrices $\mathsf{x}$ that satisfy (8.68) obviously represent the eigenvectors $\mathbf{x}$ of $\mathcal{A}$ in our chosen coordinate system. Conventionally, these column matrices are also referred to as the eigenvectors of the matrix $\mathsf{A}$.§ Clearly, if $\mathsf{x}$ is an eigenvector of $\mathsf{A}$ (with some eigenvalue $\lambda$) then any non-zero scalar multiple $\mu\mathsf{x}$ is also an eigenvector with the same eigenvalue. We therefore often use normalised eigenvectors, for which

$$\mathsf{x}^\dagger\mathsf{x} = 1$$

(note that $\mathsf{x}^\dagger\mathsf{x}$ corresponds to the inner product $\langle\mathbf{x}|\mathbf{x}\rangle$ in our basis).
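Any eigenvector $\mathsf{x}$ can be normalised by dividing all its components by the scalar $(\mathsf{x}^\dagger\mathsf{x})^{1/2}$.

As will be seen, the problem of finding the eigenvalues and corresponding eigenvectors of a square matrix $\mathsf{A}$ plays an important role in many physical investigations. Throughout this chapter we denote the $i$th eigenvector of a square matrix $\mathsf{A}$ by $\mathsf{x}^i$ and the corresponding eigenvalue by $\lambda_i$. This superscript notation for eigenvectors is used to avoid any confusion with components.

▶ A non-singular matrix $\mathsf{A}$ has eigenvalues $\lambda_i$ and eigenvectors $\mathsf{x}^i$. Find the eigenvalues and eigenvectors of the inverse matrix $\mathsf{A}^{-1}$.

The eigenvalues and eigenvectors of $\mathsf{A}$ satisfy

$$\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i.$$

Left-multiplying both sides of this equation by $\mathsf{A}^{-1}$, we find

$$\mathsf{A}^{-1}\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{A}^{-1}\mathsf{x}^i.$$

Since $\mathsf{A}^{-1}\mathsf{A} = \mathsf{I}$, on rearranging we obtain

$$\mathsf{A}^{-1}\mathsf{x}^i = \frac{1}{\lambda_i}\mathsf{x}^i.$$

Thus, we see that $\mathsf{A}^{-1}$ has the same eigenvectors $\mathsf{x}^i$ as does $\mathsf{A}$, but the corresponding eigenvalues are $1/\lambda_i$. ◀

In the remainder of this section we will discuss some useful results concerning the eigenvectors and eigenvalues of certain special (though commonly occurring) square matrices. The results will be established for matrices whose elements may be complex; the corresponding properties for real matrices may be obtained as special cases.

8.13.1 Eigenvectors and eigenvalues of a normal matrix

In subsection 8.12.7 we defined a normal matrix $\mathsf{A}$ as one that commutes with its Hermitian conjugate, so that

$$\mathsf{A}^\dagger\mathsf{A} = \mathsf{A}\mathsf{A}^\dagger.$$

§ In this context, when referring to linear combinations of eigenvectors $\mathsf{x}$ we will normally use the term 'vector'.

We also showed that both Hermitian and unitary matrices (or symmetric and orthogonal matrices in the real case) are examples of normal matrices. We now discuss the properties of the eigenvectors and eigenvalues of a normal matrix.

If $\mathsf{x}$ is an eigenvector of a normal matrix $\mathsf{A}$ with corresponding eigenvalue $\lambda$ then $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$, or equivalently,

$$(\mathsf{A} - \lambda\mathsf{I})\mathsf{x} = \mathsf{0}. \tag{8.69}$$

Denoting $\mathsf{B} = \mathsf{A} - \lambda\mathsf{I}$, (8.69) becomes $\mathsf{B}\mathsf{x} = \mathsf{0}$ and, taking the Hermitian conjugate, we also have

$$(\mathsf{B}\mathsf{x})^\dagger = \mathsf{x}^\dagger\mathsf{B}^\dagger = \mathsf{0}. \tag{8.70}$$

From (8.69) and (8.70) we then have

$$\mathsf{x}^\dagger\mathsf{B}^\dagger\mathsf{B}\mathsf{x} = 0. \tag{8.71}$$

However, the product $\mathsf{B}^\dagger\mathsf{B}$ is given by

$$\mathsf{B}^\dagger\mathsf{B} = (\mathsf{A} - \lambda\mathsf{I})^\dagger(\mathsf{A} - \lambda\mathsf{I}) = (\mathsf{A}^\dagger - \lambda^*\mathsf{I})(\mathsf{A} - \lambda\mathsf{I}) = \mathsf{A}^\dagger\mathsf{A} - \lambda^*\mathsf{A} - \lambda\mathsf{A}^\dagger + \lambda\lambda^*\mathsf{I}.$$

Now since $\mathsf{A}$ is normal, $\mathsf{A}\mathsf{A}^\dagger = \mathsf{A}^\dagger\mathsf{A}$ and so

$$\mathsf{B}^\dagger\mathsf{B} = \mathsf{A}\mathsf{A}^\dagger - \lambda^*\mathsf{A} - \lambda\mathsf{A}^\dagger + \lambda\lambda^*\mathsf{I} = (\mathsf{A} - \lambda\mathsf{I})(\mathsf{A} - \lambda\mathsf{I})^\dagger = \mathsf{B}\mathsf{B}^\dagger,$$

and hence $\mathsf{B}$ is also normal. From (8.71) we then find

$$\mathsf{x}^\dagger\mathsf{B}^\dagger\mathsf{B}\mathsf{x} = \mathsf{x}^\dagger\mathsf{B}\mathsf{B}^\dagger\mathsf{x} = (\mathsf{B}^\dagger\mathsf{x})^\dagger\mathsf{B}^\dagger\mathsf{x} = 0,$$

from which we obtain

$$\mathsf{B}^\dagger\mathsf{x} = (\mathsf{A}^\dagger - \lambda^*\mathsf{I})\mathsf{x} = \mathsf{0}.$$

Therefore, for a normal matrix $\mathsf{A}$, the eigenvalues of $\mathsf{A}^\dagger$ are the complex conjugates of the eigenvalues of $\mathsf{A}$.

Let us now consider two eigenvectors $\mathsf{x}^i$ and $\mathsf{x}^j$ of a normal matrix $\mathsf{A}$ corresponding to two different eigenvalues $\lambda_i$ and $\lambda_j$. We then have

$$\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i, \tag{8.72}$$

$$\mathsf{A}\mathsf{x}^j = \lambda_j\mathsf{x}^j. \tag{8.73}$$

Multiplying (8.73) on the left by $(\mathsf{x}^i)^\dagger$ we obtain

$$(\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^j = \lambda_j(\mathsf{x}^i)^\dagger\mathsf{x}^j. \tag{8.74}$$

However, on the LHS of (8.74) we have

$$(\mathsf{x}^i)^\dagger\mathsf{A} = (\mathsf{A}^\dagger\mathsf{x}^i)^\dagger = (\lambda_i^*\mathsf{x}^i)^\dagger = \lambda_i(\mathsf{x}^i)^\dagger, \tag{8.75}$$

where we have used (8.40) and the property just proved for a normal matrix to write $\mathsf{A}^\dagger\mathsf{x}^i = \lambda_i^*\mathsf{x}^i$. From (8.74) and (8.75) we have

$$(\lambda_i - \lambda_j)(\mathsf{x}^i)^\dagger\mathsf{x}^j = 0. \tag{8.76}$$

Thus, if $\lambda_i \neq \lambda_j$ the eigenvectors $\mathsf{x}^i$ and $\mathsf{x}^j$ must be orthogonal, i.e. $(\mathsf{x}^i)^\dagger\mathsf{x}^j = 0$.

It follows immediately from (8.76) that if all $N$ eigenvalues of a normal matrix $\mathsf{A}$ are distinct then all $N$ eigenvectors of $\mathsf{A}$ are mutually orthogonal. If, however, two or more eigenvalues are the same then further consideration is required. An eigenvalue corresponding to two or more different eigenvectors (i.e. they are not simply multiples of one another) is said to be degenerate. Suppose that $\lambda_1$ is $k$-fold degenerate, i.e.

$$\mathsf{A}\mathsf{x}^i = \lambda_1\mathsf{x}^i \quad \text{for } i = 1, 2, \ldots, k, \tag{8.77}$$

but that it is different from any of $\lambda_{k+1}$, $\lambda_{k+2}$, etc.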
Then any linear combination of these $\mathsf{x}^i$ is also an eigenvector with eigenvalue $\lambda_1$, since, for $\mathsf{z} = \sum_{i=1}^{k} c_i\mathsf{x}^i$,

$$\mathsf{A}\mathsf{z} \equiv \mathsf{A}\sum_{i=1}^{k} c_i\mathsf{x}^i = \sum_{i=1}^{k} c_i\mathsf{A}\mathsf{x}^i = \sum_{i=1}^{k} c_i\lambda_1\mathsf{x}^i = \lambda_1\mathsf{z}. \tag{8.78}$$

If the $\mathsf{x}^i$ defined in (8.77) are not already mutually orthogonal then we can construct new eigenvectors $\mathsf{z}^i$ that are orthogonal by the following procedure:

$$\begin{aligned}
\mathsf{z}^1 &= \mathsf{x}^1, \\
\mathsf{z}^2 &= \mathsf{x}^2 - \left[(\hat{\mathsf{z}}^1)^\dagger\mathsf{x}^2\right]\hat{\mathsf{z}}^1, \\
\mathsf{z}^3 &= \mathsf{x}^3 - \left[(\hat{\mathsf{z}}^2)^\dagger\mathsf{x}^3\right]\hat{\mathsf{z}}^2 - \left[(\hat{\mathsf{z}}^1)^\dagger\mathsf{x}^3\right]\hat{\mathsf{z}}^1, \\
&\;\;\vdots \\
\mathsf{z}^k &= \mathsf{x}^k - \left[(\hat{\mathsf{z}}^{k-1})^\dagger\mathsf{x}^k\right]\hat{\mathsf{z}}^{k-1} - \cdots - \left[(\hat{\mathsf{z}}^1)^\dagger\mathsf{x}^k\right]\hat{\mathsf{z}}^1.
\end{aligned}$$

In this procedure, known as Gram–Schmidt orthogonalisation, each new eigenvector $\mathsf{z}^i$ is normalised to give the unit vector $\hat{\mathsf{z}}^i$ before proceeding to the construction of the next one (the normalisation is carried out by dividing each element of the vector $\mathsf{z}^i$ by $[(\mathsf{z}^i)^\dagger\mathsf{z}^i]^{1/2}$). Note that each factor in brackets $(\hat{\mathsf{z}}^m)^\dagger\mathsf{x}^n$ is a scalar product and thus only a number. It follows that, as shown in (8.78), each vector $\mathsf{z}^i$ so constructed is an eigenvector of $\mathsf{A}$ with eigenvalue $\lambda_1$ and will remain so on normalisation. It is straightforward to check that, provided the previous new eigenvectors have been normalised as prescribed, each $\mathsf{z}^i$ is orthogonal to all its predecessors. (In practice, however, the method is laborious and the example in subsection 8.14.1 gives a less rigorous but considerably quicker way.)

Therefore, even if $\mathsf{A}$ has some degenerate eigenvalues we can by construction obtain a set of $N$ mutually orthogonal eigenvectors. Moreover, it may be shown (although the proof is beyond the scope of this book) that these eigenvectors are complete in that they form a basis for the $N$-dimensional vector space.
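The Gram–Schmidt procedure described above translates almost line by line into code. The sketch below is a minimal illustration, assuming the input vectors are linearly independent (otherwise a zero norm would cause a division error); the function names `inner` and `gram_schmidt` and the sample vectors are hypothetical choices not taken from the text.

```python
import math


def inner(u, v):
    """Inner product u^dagger v for lists of (possibly complex) numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors in the order given."""
    basis = []
    for x in vectors:
        z = list(x)
        for zhat in basis:
            coeff = inner(zhat, x)  # the scalar (zhat^m)^dagger x^n
            z = [zc - coeff * hc for zc, hc in zip(z, zhat)]
        norm = math.sqrt(inner(z, z).real)
        basis.append([zc / norm for zc in z])  # normalise before moving on
    return basis


# Three linearly independent but non-orthogonal vectors (illustrative only).
xs = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
zs = gram_schmidt(xs)

for i in range(3):
    # Each row should be close to the corresponding row of the unit matrix.
    print([round(abs(inner(zs[i], zs[j])), 12) for j in range(3)])
```

Each coefficient is computed from the original vector, exactly as in the formulae for $\mathsf{z}^2, \mathsf{z}^3, \ldots$ above. When the inputs are eigenvectors sharing one degenerate eigenvalue, the outputs are linear combinations of them and so, by (8.78), remain eigenvectors with that eigenvalue.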
As a result any arbitrary vector $\mathsf{y}$ can be expressed as a linear combination of the eigenvectors $\mathsf{x}^i$:

$$\mathsf{y} = \sum_{i=1}^{N} a_i\mathsf{x}^i, \tag{8.79}$$

where $a_i = (\mathsf{x}^i)^\dagger\mathsf{y}$ (taking the eigenvectors to be normalised). Thus, the eigenvectors form an orthogonal basis for the vector space. By normalising the eigenvectors so that $(\mathsf{x}^i)^\dagger\mathsf{x}^i = 1$ this basis is made orthonormal.

▶ Show that a normal matrix $\mathsf{A}$ can be written in terms of its eigenvalues $\lambda_i$ and orthonormal eigenvectors $\mathsf{x}^i$ as

$$\mathsf{A} = \sum_{i=1}^{N} \lambda_i\mathsf{x}^i(\mathsf{x}^i)^\dagger. \tag{8.80}$$

The key to proving the validity of (8.80) is to show that both sides of the expression give the same result when acting on an arbitrary vector $\mathsf{y}$. Since $\mathsf{A}$ is normal, we may expand $\mathsf{y}$ in terms of the eigenvectors $\mathsf{x}^i$, as shown in (8.79). Thus, we have

$$\mathsf{A}\mathsf{y} = \mathsf{A}\sum_{i=1}^{N} a_i\mathsf{x}^i = \sum_{i=1}^{N} a_i\lambda_i\mathsf{x}^i.$$

Alternatively, the action of the RHS of (8.80) on $\mathsf{y}$ is given by

$$\sum_{i=1}^{N} \lambda_i\mathsf{x}^i(\mathsf{x}^i)^\dagger\mathsf{y} = \sum_{i=1}^{N} a_i\lambda_i\mathsf{x}^i,$$

since $a_i = (\mathsf{x}^i)^\dagger\mathsf{y}$. We see that the two expressions for the action of each side of (8.80) on $\mathsf{y}$ are identical, which implies that this relationship is indeed correct. ◀

8.13.2 Eigenvectors and eigenvalues of Hermitian and anti-Hermitian matrices

For a normal matrix we showed that if $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$ then $\mathsf{A}^\dagger\mathsf{x} = \lambda^*\mathsf{x}$. However, if $\mathsf{A}$ is also Hermitian, $\mathsf{A} = \mathsf{A}^\dagger$, it follows necessarily that $\lambda = \lambda^*$. Thus, the eigenvalues of an Hermitian matrix are real, a result which may be proved directly.

▶ Prove that the eigenvalues of an Hermitian matrix are real.

For any particular eigenvector $\mathsf{x}^i$, we take the Hermitian conjugate of $\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i$ to give

$$(\mathsf{x}^i)^\dagger\mathsf{A}^\dagger = \lambda_i^*(\mathsf{x}^i)^\dagger. \tag{8.81}$$

Using $\mathsf{A}^\dagger = \mathsf{A}$, since $\mathsf{A}$ is Hermitian, and multiplying on the right by $\mathsf{x}^i$, we obtain

$$(\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^i = \lambda_i^*(\mathsf{x}^i)^\dagger\mathsf{x}^i. \tag{8.82}$$

But multiplying $\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i$ through on the left by $(\mathsf{x}^i)^\dagger$ gives

$$(\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^i = \lambda_i(\mathsf{x}^i)^\dagger\mathsf{x}^i.$$
Subtracting this from (8.82) yields

$$0 = (\lambda_i^* - \lambda_i)(\mathsf{x}^i)^\dagger\mathsf{x}^i.$$

But $(\mathsf{x}^i)^\dagger\mathsf{x}^i$ is the modulus squared of the non-zero vector $\mathsf{x}^i$ and is thus non-zero. Hence $\lambda_i^*$ must equal $\lambda_i$ and thus be real. The same argument can be used to show that the eigenvalues of a real symmetric matrix are themselves real. ◀

The importance of the above result will be apparent to any student of quantum mechanics. In quantum mechanics the eigenvalues of operators correspond to measured values of observable quantities, e.g. energy, angular momentum, parity and so on, and these clearly must be real. If we use Hermitian operators to formulate the theories of quantum mechanics, the above property guarantees physically meaningful results.

Since an Hermitian matrix is also a normal matrix, its eigenvectors are orthogonal (or can be made so using the Gram–Schmidt orthogonalisation procedure). Alternatively we can prove the orthogonality of the eigenvectors directly.

▶ Prove that the eigenvectors corresponding to different eigenvalues of an Hermitian matrix are orthogonal.

Consider two unequal eigenvalues $\lambda_i$ and $\lambda_j$ and their corresponding eigenvectors satisfying

$$\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i, \tag{8.83}$$

$$\mathsf{A}\mathsf{x}^j = \lambda_j\mathsf{x}^j. \tag{8.84}$$

Taking the Hermitian conjugate of (8.83) we find $(\mathsf{x}^i)^\dagger\mathsf{A}^\dagger = \lambda_i^*(\mathsf{x}^i)^\dagger$. Multiplying this on the right by $\mathsf{x}^j$ we obtain

$$(\mathsf{x}^i)^\dagger\mathsf{A}^\dagger\mathsf{x}^j = \lambda_i^*(\mathsf{x}^i)^\dagger\mathsf{x}^j,$$

and similarly multiplying (8.84) through on the left by $(\mathsf{x}^i)^\dagger$ we find

$$(\mathsf{x}^i)^\dagger\mathsf{A}\mathsf{x}^j = \lambda_j(\mathsf{x}^i)^\dagger\mathsf{x}^j.$$

Then, since $\mathsf{A}^\dagger = \mathsf{A}$, the two left-hand sides are equal and, because the $\lambda_i$ are real, on subtraction we obtain

$$0 = (\lambda_i - \lambda_j)(\mathsf{x}^i)^\dagger\mathsf{x}^j.$$

Finally we note that $\lambda_i \neq \lambda_j$ and so $(\mathsf{x}^i)^\dagger\mathsf{x}^j = 0$, i.e. the eigenvectors $\mathsf{x}^i$ and $\mathsf{x}^j$ are orthogonal. ◀

In the case where some of the eigenvalues are equal, further justification of the orthogonality of the eigenvectors is needed.
The Gram–Schmidt orthogonalisation procedure discussed above provides a proof of, and a means of achieving, orthogonality. The general method has already been described and we will not repeat it here.

We may also consider the properties of the eigenvalues and eigenvectors of an anti-Hermitian matrix, for which $\mathsf{A}^\dagger = -\mathsf{A}$ and thus

$$\mathsf{A}\mathsf{A}^\dagger = \mathsf{A}(-\mathsf{A}) = (-\mathsf{A})\mathsf{A} = \mathsf{A}^\dagger\mathsf{A}.$$

Therefore matrices that are anti-Hermitian are also normal and so have mutually orthogonal eigenvectors. The properties of the eigenvalues are also simply deduced, since if $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$ then

$$\lambda^*\mathsf{x} = \mathsf{A}^\dagger\mathsf{x} = -\mathsf{A}\mathsf{x} = -\lambda\mathsf{x}.$$

Hence $\lambda^* = -\lambda$ and so $\lambda$ must be pure imaginary (or zero). In a similar manner to that used for Hermitian matrices, these properties may be proved directly.

8.13.3 Eigenvectors and eigenvalues of a unitary matrix

A unitary matrix satisfies $\mathsf{A}^\dagger = \mathsf{A}^{-1}$ and is also a normal matrix, with mutually orthogonal eigenvectors. To investigate the eigenvalues of a unitary matrix, we note that if $\mathsf{A}\mathsf{x} = \lambda\mathsf{x}$ then

$$\mathsf{x}^\dagger\mathsf{x} = \mathsf{x}^\dagger\mathsf{A}^\dagger\mathsf{A}\mathsf{x} = \lambda^*\lambda\mathsf{x}^\dagger\mathsf{x},$$

and we deduce that $\lambda\lambda^* = |\lambda|^2 = 1$. Thus, the eigenvalues of a unitary matrix have unit modulus.

8.13.4 Eigenvectors and eigenvalues of a general square matrix

When an $N\times N$ matrix is not normal there are no general properties of its eigenvalues and eigenvectors; in general it is not possible to find any orthogonal set of $N$ eigenvectors or even to find pairs of orthogonal eigenvectors (except by chance in some cases). While the $N$ non-orthogonal eigenvectors are usually linearly independent and hence form a basis for the $N$-dimensional vector space, this is not necessarily so. It may be shown (although we will not prove it) that any $N\times N$ matrix with distinct eigenvalues has $N$ linearly independent eigenvectors, which therefore form a basis for the $N$-dimensional vector space. If a general square matrix has degenerate eigenvalues, however, then it may or may not have $N$ linearly independent eigenvectors.
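The eigenvalue properties derived in subsections 8.13.2 and 8.13.3 (real for Hermitian matrices, pure imaginary for anti-Hermitian ones, unit modulus for unitary ones) can be illustrated on $2\times 2$ matrices, for which the eigenvalues follow from a quadratic equation. This is a small sketch under stated assumptions: the matrices `J` and `H` are illustrative choices not taken from the text, and `eigenvalues_2x2` is a hypothetical helper that solves $\lambda^2 - (\mathrm{Tr}\,\mathsf{M})\lambda + |\mathsf{M}| = 0$.

```python
import cmath


def eigenvalues_2x2(m):
    """Roots of lambda^2 - (Tr M) lambda + |M| = 0 for a 2x2 matrix M."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


# J is real antisymmetric (hence anti-Hermitian) and also orthogonal
# (hence unitary), so its eigenvalues should be pure imaginary AND of
# unit modulus.
J = [[0, 1], [-1, 0]]
j1, j2 = eigenvalues_2x2(J)
print(j1, j2, abs(j1), abs(j2))

# H is Hermitian, so its eigenvalues should be real.
H = [[2, 1 - 1j], [1 + 1j, 3]]
h1, h2 = eigenvalues_2x2(H)
print(h1, h2)
```

The single matrix `J` satisfies both special conditions at once, which is why its eigenvalues $\pm i$ lie where the imaginary axis meets the unit circle.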
A matrix whose eigenvectors are not linearly independent is said to be defective.

8.13.5 Simultaneous eigenvectors

We may now ask under what conditions two different normal matrices can have a common set of eigenvectors. The result, that they do so if, and only if, they commute, has profound significance for the foundations of quantum mechanics.

To prove this important result let $\mathsf{A}$ and $\mathsf{B}$ be two $N\times N$ normal matrices and $\mathsf{x}^i$ be the $i$th eigenvector of $\mathsf{A}$ corresponding to eigenvalue $\lambda_i$, i.e.

$$\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i \quad \text{for } i = 1, 2, \ldots, N.$$

For the present we assume that the eigenvalues are all different.

(i) First suppose that $\mathsf{A}$ and $\mathsf{B}$ commute. Now consider

$$\mathsf{A}\mathsf{B}\mathsf{x}^i = \mathsf{B}\mathsf{A}\mathsf{x}^i = \mathsf{B}\lambda_i\mathsf{x}^i = \lambda_i\mathsf{B}\mathsf{x}^i,$$

where we have used the commutativity for the first equality and the eigenvector property for the second. It follows that $\mathsf{A}(\mathsf{B}\mathsf{x}^i) = \lambda_i(\mathsf{B}\mathsf{x}^i)$ and thus that $\mathsf{B}\mathsf{x}^i$ is an eigenvector of $\mathsf{A}$ corresponding to eigenvalue $\lambda_i$. But the eigenvector solutions of $(\mathsf{A} - \lambda_i\mathsf{I})\mathsf{x}^i = \mathsf{0}$ are unique to within a scale factor, and we therefore conclude that

$$\mathsf{B}\mathsf{x}^i = \mu_i\mathsf{x}^i$$

for some scale factor $\mu_i$. However, this is just an eigenvector equation for $\mathsf{B}$ and shows that $\mathsf{x}^i$ is an eigenvector of $\mathsf{B}$, in addition to being an eigenvector of $\mathsf{A}$. By reversing the roles of $\mathsf{A}$ and $\mathsf{B}$, it also follows that every eigenvector of $\mathsf{B}$ is an eigenvector of $\mathsf{A}$. Thus the two sets of eigenvectors are identical.

(ii) Now suppose that $\mathsf{A}$ and $\mathsf{B}$ have all their eigenvectors in common, a typical one $\mathsf{x}^i$ satisfying both

$$\mathsf{A}\mathsf{x}^i = \lambda_i\mathsf{x}^i \quad \text{and} \quad \mathsf{B}\mathsf{x}^i = \mu_i\mathsf{x}^i.$$

As the eigenvectors span the $N$-dimensional vector space, any arbitrary vector $\mathsf{x}$ in the space can be written as a linear combination of the eigenvectors,

$$\mathsf{x} = \sum_{i=1}^{N} c_i\mathsf{x}^i.$$

Now consider both

$$\mathsf{A}\mathsf{B}\mathsf{x} = \mathsf{A}\mathsf{B}\sum_{i=1}^{N} c_i\mathsf{x}^i = \mathsf{A}\sum_{i=1}^{N} c_i\mu_i\mathsf{x}^i = \sum_{i=1}^{N} c_i\lambda_i\mu_i\mathsf{x}^i,$$

and

$$\mathsf{B}\mathsf{A}\mathsf{x} = \mathsf{B}\mathsf{A}\sum_{i=1}^{N} c_i\mathsf{x}^i = \mathsf{B}\sum_{i=1}^{N} c_i\lambda_i\mathsf{x}^i = \sum_{i=1}^{N} c_i\mu_i\lambda_i\mathsf{x}^i.$$
It follows that $\mathsf{A}\mathsf{B}\mathsf{x}$ and $\mathsf{B}\mathsf{A}\mathsf{x}$ are the same for any arbitrary $\mathsf{x}$ and hence that

$$(\mathsf{A}\mathsf{B} - \mathsf{B}\mathsf{A})\mathsf{x} = \mathsf{0}$$

for all $\mathsf{x}$. That is, $\mathsf{A}$ and $\mathsf{B}$ commute.

This completes the proof that a necessary and sufficient condition for two normal matrices to have a set of eigenvectors in common is that they commute. It should be noted that if an eigenvalue of $\mathsf{A}$, say, is degenerate then not all of its possible sets of eigenvectors will also constitute a set of eigenvectors of $\mathsf{B}$. However, provided that by taking linear combinations one set of joint eigenvectors can be found, the proof is still valid and the result still holds.

When extended to the case of Hermitian operators and continuous eigenfunctions (sections 17.2 and 17.3) the connection between commuting matrices and a set of common eigenvectors plays a fundamental role in the postulatory basis of quantum mechanics. It draws the distinction between commuting and non-commuting observables and sets limits on how much information about a system can be known, even in principle, at any one time.

8.14 Determination of eigenvalues and eigenvectors

The next step is to show how the eigenvalues and eigenvectors of a given $N\times N$ matrix $\mathsf{A}$ are found. To do this we refer to (8.68) and as in (8.69) rewrite it as

$$\mathsf{A}\mathsf{x} - \lambda\mathsf{I}\mathsf{x} = (\mathsf{A} - \lambda\mathsf{I})\mathsf{x} = \mathsf{0}. \tag{8.85}$$

The slight rearrangement used here is to write $\mathsf{x}$ as $\mathsf{I}\mathsf{x}$, where $\mathsf{I}$ is the unit matrix of order $N$. The point of doing this is immediate since (8.85) now has the form of a homogeneous set of simultaneous equations, the theory of which will be developed in section 8.18. What will be proved there is that the equation $\mathsf{B}\mathsf{x} = \mathsf{0}$ only has a non-trivial solution $\mathsf{x}$ if $|\mathsf{B}| = 0$. Correspondingly, therefore, we must have in the present case that

$$|\mathsf{A} - \lambda\mathsf{I}| = 0, \tag{8.86}$$

if there are to be non-zero solutions $\mathsf{x}$ to (8.85).

Equation (8.86) is known as the characteristic equation for $\mathsf{A}$ and its LHS as the characteristic or secular determinant of $\mathsf{A}$.
The equation is a polynomial of degree $N$ in the quantity $\lambda$. The $N$ roots of this equation $\lambda_i$, $i = 1, 2, \ldots, N$, give the eigenvalues of $\mathsf{A}$. Corresponding to each $\lambda_i$ there will be a column vector $\mathsf{x}^i$, which is the $i$th eigenvector of $\mathsf{A}$ and can be found by using (8.68).

It will be observed that when (8.86) is written out as a polynomial equation in $\lambda$, the coefficient of $-\lambda^{N-1}$ in the equation will be simply $A_{11} + A_{22} + \cdots + A_{NN}$ relative to the coefficient of $\lambda^N$. As discussed in section 8.8, the quantity $\sum_{i=1}^{N} A_{ii}$ is the trace of $\mathsf{A}$ and, from the ordinary theory of polynomial equations, will be equal to the sum of the roots of (8.86):

$$\sum_{i=1}^{N} \lambda_i = \mathrm{Tr}\,\mathsf{A}. \tag{8.87}$$

This can be used as one check that a computation of the eigenvalues $\lambda_i$ has been done correctly. Unless equation (8.87) is satisfied by a computed set of eigenvalues, they have not been calculated correctly. However, that equation (8.87) is satisfied is a necessary, but not sufficient, condition for a correct computation. An alternative proof of (8.87) is given in section 8.16.

▶ Find the eigenvalues and normalised eigenvectors of the real symmetric matrix

$$\mathsf{A} = \begin{pmatrix} 1 & 1 & 3 \\ 1 & 1 & -3 \\ 3 & -3 & -3 \end{pmatrix}.$$

Using (8.86),

$$\begin{vmatrix} 1-\lambda & 1 & 3 \\ 1 & 1-\lambda & -3 \\ 3 & -3 & -3-\lambda \end{vmatrix} = 0.$$

Expanding out this determinant gives

$$(1-\lambda)\left[(1-\lambda)(-3-\lambda) - (-3)(-3)\right] + 1\left[(-3)(3) - 1(-3-\lambda)\right] + 3\left[1(-3) - (1-\lambda)(3)\right] = 0,$$

which simplifies to give

$$(1-\lambda)(\lambda^2 + 2\lambda - 12) + (\lambda - 6) + 3(3\lambda - 6) = 0 \quad\Rightarrow\quad (\lambda - 2)(\lambda - 3)(\lambda + 6) = 0.$$

Hence the roots of the characteristic equation, which are the eigenvalues of $\mathsf{A}$, are $\lambda_1 = 2$, $\lambda_2 = 3$, $\lambda_3 = -6$. We note that, as expected,

$$\lambda_1 + \lambda_2 + \lambda_3 = -1 = 1 + 1 - 3 = A_{11} + A_{22} + A_{33} = \mathrm{Tr}\,\mathsf{A}.$$
For the first root, $\lambda_1 = 2$, a suitable eigenvector $\mathsf{x}^1$, with elements $x_1$, $x_2$, $x_3$, must satisfy $\mathsf{A}\mathsf{x}^1 = 2\mathsf{x}^1$ or, equivalently,

$$\begin{aligned}
x_1 + x_2 + 3x_3 &= 2x_1, \\
x_1 + x_2 - 3x_3 &= 2x_2, \\
3x_1 - 3x_2 - 3x_3 &= 2x_3.
\end{aligned} \tag{8.88}$$

These three equations are consistent (to ensure this was the purpose in finding the particular values of $\lambda$) and yield $x_3 = 0$, $x_1 = x_2 = k$, where $k$ is any non-zero number. A suitable eigenvector would thus be

$$\mathsf{x}^1 = (k \quad k \quad 0)^{\mathrm{T}}.$$

If we apply the normalisation condition, we require $k^2 + k^2 + 0^2 = 1$ or $k = 1/\sqrt{2}$. Hence

$$\mathsf{x}^1 = \left(\frac{1}{\sqrt{2}} \quad \frac{1}{\sqrt{2}} \quad 0\right)^{\mathrm{T}} = \frac{1}{\sqrt{2}}(1 \quad 1 \quad 0)^{\mathrm{T}}.$$

Repeating the last paragraph, but with the factor 2 on the RHS of (8.88) replaced successively by $\lambda_2 = 3$ and $\lambda_3 = -6$, gives two further normalised eigenvectors

$$\mathsf{x}^2 = \frac{1}{\sqrt{3}}(1 \quad -1 \quad 1)^{\mathrm{T}}, \qquad \mathsf{x}^3 = \frac{1}{\sqrt{6}}(1 \quad -1 \quad -2)^{\mathrm{T}}. \; ◀$$

In the above example, the three values of $\lambda$ are all different and $\mathsf{A}$ is a real symmetric matrix. Thus we expect, and it is easily checked, that the three eigenvectors are mutually orthogonal, i.e.

$$(\mathsf{x}^1)^{\mathrm{T}}\mathsf{x}^2 = (\mathsf{x}^1)^{\mathrm{T}}\mathsf{x}^3 = (\mathsf{x}^2)^{\mathrm{T}}\mathsf{x}^3 = 0.$$

It will be apparent also that, as expected, the normalisation of the eigenvectors has no effect on their orthogonality.

8.14.1 Degenerate eigenvalues

We return now to the case of degenerate eigenvalues, i.e. those that have two or more associated eigenvectors. We have shown already that it is always possible to construct an orthogonal set of eigenvectors for a normal matrix, see subsection 8.13.1, and the following example illustrates one method for constructing such a set.

▶ Construct an orthonormal set of eigenvectors for the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}.$$
We first determine the eigenvalues using $|\mathsf{A} - \lambda\mathsf{I}| = 0$:

$$0 = \begin{vmatrix} 1-\lambda & 0 & 3 \\ 0 & -2-\lambda & 0 \\ 3 & 0 & 1-\lambda \end{vmatrix} = -(1-\lambda)^2(2+\lambda) + 3(3)(2+\lambda) = (4-\lambda)(\lambda+2)^2.$$

Thus $\lambda_1 = 4$, $\lambda_2 = -2 = \lambda_3$. The eigenvector $\mathsf{x}^1 = (x_1 \quad x_2 \quad x_3)^{\mathrm{T}}$ is found from

$$\begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 4\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \quad\Rightarrow\quad \mathsf{x}^1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.$$

A general column vector that is orthogonal to $\mathsf{x}^1$ is

$$\mathsf{x} = (a \quad b \quad -a)^{\mathrm{T}}, \tag{8.89}$$

and it is easily shown that

$$\mathsf{A}\mathsf{x} = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \\ -a \end{pmatrix} = -2\begin{pmatrix} a \\ b \\ -a \end{pmatrix} = -2\mathsf{x}.$$

Thus $\mathsf{x}$ is an eigenvector of $\mathsf{A}$ with associated eigenvalue $-2$. It is clear, however, that there is an infinite set of eigenvectors $\mathsf{x}$ all possessing the required property; the geometrical analogue is that there are an infinite number of corresponding vectors $\mathbf{x}$ lying in the plane that has $\mathbf{x}^1$ as its normal. We do require that the two remaining eigenvectors are orthogonal to one another, but this still leaves an infinite number of possibilities. For $\mathsf{x}^2$, therefore, let us choose a simple form of (8.89), suitably normalised, say,

$$\mathsf{x}^2 = (0 \quad 1 \quad 0)^{\mathrm{T}}.$$

The third eigenvector is then specified (to within an arbitrary multiplicative constant) by the requirement that it must be orthogonal to $\mathsf{x}^1$ and $\mathsf{x}^2$; thus $\mathsf{x}^3$ may be found by evaluating the vector product of $\mathsf{x}^1$ and $\mathsf{x}^2$ and normalising the result. This gives

$$\mathsf{x}^3 = \frac{1}{\sqrt{2}}(-1 \quad 0 \quad 1)^{\mathrm{T}},$$

to complete the construction of an orthonormal set of eigenvectors. ◀

8.15 Change of basis and similarity transformations

Throughout this chapter we have considered the vector $\mathbf{x}$ as a geometrical quantity that is independent of any basis (or coordinate system). If we introduce a basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, into our $N$-dimensional vector space then we may write

$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_N\mathbf{e}_N,$$

and represent $\mathbf{x}$ in this basis by the column matrix

$$\mathsf{x} = (x_1 \quad x_2 \quad \cdots \quad x_N)^{\mathrm{T}},$$

having components $x_i$.
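The two worked examples of section 8.14 lend themselves to a quick numerical check. The sketch below uses unnormalised integer eigenvectors, so that all arithmetic is exact; this loses nothing, since normalisation affects neither the eigenvalue equation nor orthogonality. The helper names `matvec`, `dot` and `check_eigensystem` are hypothetical, and the check covers only real vectors, for which the inner product is the ordinary dot product.

```python
def matvec(a, x):
    """Product of a square matrix (list of rows) with a column vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def check_eigensystem(a, pairs):
    """Check A x = lam x, mutual orthogonality and sum(lam) = Tr A."""
    ok = all(matvec(a, x) == [lam * xi for xi in x] for lam, x in pairs)
    vecs = [x for _, x in pairs]
    ok = ok and all(dot(vecs[i], vecs[j]) == 0
                    for i in range(len(vecs))
                    for j in range(i + 1, len(vecs)))
    trace = sum(a[i][i] for i in range(len(a)))
    return ok and sum(lam for lam, _ in pairs) == trace


# Example with distinct eigenvalues 2, 3, -6.
A1 = [[1, 1, 3], [1, 1, -3], [3, -3, -3]]
pairs1 = [(2, [1, 1, 0]), (3, [1, -1, 1]), (-6, [1, -1, -2])]

# Example with the degenerate eigenvalue -2 (subsection 8.14.1).
A2 = [[1, 0, 3], [0, -2, 0], [3, 0, 1]]
pairs2 = [(4, [1, 0, 1]), (-2, [0, 1, 0]), (-2, [-1, 0, 1])]

print(check_eigensystem(A1, pairs1))  # True
print(check_eigensystem(A2, pairs2))  # True
```

As the text notes for (8.87), agreement of the eigenvalue sum with the trace is only a necessary condition; the function therefore also tests the eigenvalue equation for each pair directly.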
We now consider how these components change as a result of a prescribed change of basis. Let us introduce a new basis $\mathbf{e}'_i$, $i = 1, 2, \ldots, N$, which is related to the old basis by

$$\mathbf{e}'_j = \sum_{i=1}^{N} S_{ij}\mathbf{e}_i, \tag{8.90}$$

the coefficient $S_{ij}$ being the $i$th component of $\mathbf{e}'_j$ with respect to the old (unprimed) basis. For an arbitrary vector $\mathbf{x}$ it follows that

$$\mathbf{x} = \sum_{i=1}^{N} x_i\mathbf{e}_i = \sum_{j=1}^{N} x'_j\mathbf{e}'_j = \sum_{j=1}^{N} x'_j\sum_{i=1}^{N} S_{ij}\mathbf{e}_i.$$

From this we derive the relationship between the components of $\mathbf{x}$ in the two coordinate systems as

$$x_i = \sum_{j=1}^{N} S_{ij}x'_j,$$

which we can write in matrix form as

$$\mathsf{x} = \mathsf{S}\mathsf{x}' \tag{8.91}$$

where $\mathsf{S}$ is the transformation matrix associated with the change of basis.

Furthermore, since the vectors $\mathbf{e}'_j$ are linearly independent, the matrix $\mathsf{S}$ is non-singular and so possesses an inverse $\mathsf{S}^{-1}$. Multiplying (8.91) on the left by $\mathsf{S}^{-1}$ we find

$$\mathsf{x}' = \mathsf{S}^{-1}\mathsf{x}, \tag{8.92}$$

which relates the components of $\mathbf{x}$ in the new basis to those in the old basis. Comparing (8.92) and (8.90) we note that the components of $\mathbf{x}$ transform inversely to the way in which the basis vectors $\mathbf{e}_i$ themselves transform. This has to be so, as the vector $\mathbf{x}$ itself must remain unchanged.

We may also find the transformation law for the components of a linear operator under the same change of basis. Now, the operator equation $\mathbf{y} = \mathcal{A}\mathbf{x}$ (which is basis independent) can be written as a matrix equation in each of the two bases as

$$\mathsf{y} = \mathsf{A}\mathsf{x}, \qquad \mathsf{y}' = \mathsf{A}'\mathsf{x}'. \tag{8.93}$$

But, using (8.91), we may rewrite the first equation as

$$\mathsf{S}\mathsf{y}' = \mathsf{A}\mathsf{S}\mathsf{x}' \quad\Rightarrow\quad \mathsf{y}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}\mathsf{x}'.$$

Comparing this with the second equation in (8.93) we find that the components of the linear operator $\mathcal{A}$ transform as

$$\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}. \tag{8.94}$$

Equation (8.94) is an example of a similarity transformation, a transformation that can be particularly useful in converting matrices into convenient forms for computation.

Given a square matrix $\mathsf{A}$, we may interpret it as representing a linear operator $\mathcal{A}$ in a given basis $\mathbf{e}_i$. From (8.94), however, we may also consider the matrix $\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}$, for any non-singular matrix $\mathsf{S}$, as representing the same linear operator $\mathcal{A}$ but in a new basis $\mathbf{e}'_j$, related to the old basis by

$$\mathbf{e}'_j = \sum_{i} S_{ij}\mathbf{e}_i.$$

Therefore we would expect that any property of the matrix $\mathsf{A}$ that represents some (basis-independent) property of the linear operator $\mathcal{A}$ will also be shared by the matrix $\mathsf{A}'$. We list these properties below.

(i) If $\mathsf{A} = \mathsf{I}$ then $\mathsf{A}' = \mathsf{I}$, since, from (8.94),

$$\mathsf{A}' = \mathsf{S}^{-1}\mathsf{I}\mathsf{S} = \mathsf{S}^{-1}\mathsf{S} = \mathsf{I}. \tag{8.95}$$

(ii) The value of the determinant is unchanged:

$$|\mathsf{A}'| = |\mathsf{S}^{-1}\mathsf{A}\mathsf{S}| = |\mathsf{S}^{-1}||\mathsf{A}||\mathsf{S}| = |\mathsf{A}||\mathsf{S}^{-1}||\mathsf{S}| = |\mathsf{A}||\mathsf{S}^{-1}\mathsf{S}| = |\mathsf{A}|. \tag{8.96}$$

(iii) The characteristic determinant and hence the eigenvalues of $\mathsf{A}'$ are the same as those of $\mathsf{A}$: from (8.86),

$$|\mathsf{A}' - \lambda\mathsf{I}| = |\mathsf{S}^{-1}\mathsf{A}\mathsf{S} - \lambda\mathsf{I}| = |\mathsf{S}^{-1}(\mathsf{A} - \lambda\mathsf{I})\mathsf{S}| = |\mathsf{S}^{-1}||\mathsf{S}||\mathsf{A} - \lambda\mathsf{I}| = |\mathsf{A} - \lambda\mathsf{I}|. \tag{8.97}$$

(iv) The value of the trace is unchanged: from (8.87),

$$\mathrm{Tr}\,\mathsf{A}' = \sum_i A'_{ii} = \sum_i\sum_j\sum_k (\mathsf{S}^{-1})_{ij}A_{jk}S_{ki} = \sum_i\sum_j\sum_k S_{ki}(\mathsf{S}^{-1})_{ij}A_{jk} = \sum_j\sum_k \delta_{kj}A_{jk} = \sum_j A_{jj} = \mathrm{Tr}\,\mathsf{A}. \tag{8.98}$$

An important class of similarity transformations is that for which $\mathsf{S}$ is a unitary matrix; in this case $\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S} = \mathsf{S}^\dagger\mathsf{A}\mathsf{S}$. Unitary transformation matrices are particularly important, for the following reason. If the original basis $\mathbf{e}_i$ is
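If the original basis $\mathbf{e}_i$ is orthonormal and the transformation matrix $\mathsf{S}$ is unitary then

$$\begin{aligned}
\langle \mathbf{e}'_i | \mathbf{e}'_j \rangle &= \Big\langle \sum_k S_{ki}\,\mathbf{e}_k \,\Big|\, \sum_r S_{rj}\,\mathbf{e}_r \Big\rangle = \sum_k S^*_{ki} \sum_r S_{rj} \langle \mathbf{e}_k | \mathbf{e}_r \rangle \\
&= \sum_k S^*_{ki} \sum_r S_{rj}\,\delta_{kr} = \sum_k S^*_{ki} S_{kj} = (\mathsf{S}^{\dagger}\mathsf{S})_{ij} = \delta_{ij},
\end{aligned}$$

showing that the new basis is also orthonormal. Furthermore, in addition to the properties of general similarity transformations, for unitary transformations the following hold.

(i) If $\mathsf{A}$ is Hermitian (anti-Hermitian) then $\mathsf{A}'$ is Hermitian (anti-Hermitian), i.e. if $\mathsf{A}^{\dagger} = \pm\mathsf{A}$ then

$$(\mathsf{A}')^{\dagger} = (\mathsf{S}^{\dagger}\mathsf{A}\mathsf{S})^{\dagger} = \mathsf{S}^{\dagger}\mathsf{A}^{\dagger}\mathsf{S} = \pm\mathsf{S}^{\dagger}\mathsf{A}\mathsf{S} = \pm\mathsf{A}'. \tag{8.99}$$

(ii) If $\mathsf{A}$ is unitary (so that $\mathsf{A}^{\dagger} = \mathsf{A}^{-1}$) then $\mathsf{A}'$ is unitary, since

$$(\mathsf{A}')^{\dagger}\mathsf{A}' = (\mathsf{S}^{\dagger}\mathsf{A}\mathsf{S})^{\dagger}(\mathsf{S}^{\dagger}\mathsf{A}\mathsf{S}) = \mathsf{S}^{\dagger}\mathsf{A}^{\dagger}\mathsf{S}\mathsf{S}^{\dagger}\mathsf{A}\mathsf{S} = \mathsf{S}^{\dagger}\mathsf{A}^{\dagger}\mathsf{A}\mathsf{S} = \mathsf{S}^{\dagger}\mathsf{I}\mathsf{S} = \mathsf{I}. \tag{8.100}$$

8.16 Diagonalisation of matrices

Suppose that a linear operator $\mathcal{A}$ is represented in some basis $\mathbf{e}_i$, $i = 1, 2, \ldots, N$, by the matrix $\mathsf{A}$. Consider a new basis $\mathbf{x}^j$ given by

$$\mathbf{x}^j = \sum_{i=1}^{N} S_{ij}\,\mathbf{e}_i,$$

where the $\mathbf{x}^j$ are chosen to be the eigenvectors of the linear operator $\mathcal{A}$, i.e.

$$\mathcal{A}\mathbf{x}^j = \lambda_j \mathbf{x}^j. \tag{8.101}$$

In the new basis, $\mathcal{A}$ is represented by the matrix $\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}$, which has a particularly simple form, as we shall see shortly. The element $S_{ij}$ of $\mathsf{S}$ is the $i$th component, in the old (unprimed) basis, of the $j$th eigenvector $\mathbf{x}^j$ of $\mathcal{A}$, i.e. the columns of $\mathsf{S}$ are the eigenvectors of the matrix $\mathsf{A}$:

$$\mathsf{S} = \begin{pmatrix} \uparrow & \uparrow & & \uparrow \\ \mathsf{x}^1 & \mathsf{x}^2 & \cdots & \mathsf{x}^N \\ \downarrow & \downarrow & & \downarrow \end{pmatrix},$$

that is, $S_{ij} = (\mathsf{x}^j)_i$. Therefore $\mathsf{A}'$ is given by

$$\begin{aligned}
(\mathsf{S}^{-1}\mathsf{A}\mathsf{S})_{ij} &= \sum_k\sum_l (\mathsf{S}^{-1})_{ik} A_{kl} S_{lj} = \sum_k\sum_l (\mathsf{S}^{-1})_{ik} A_{kl} (\mathsf{x}^j)_l \\
&= \sum_k (\mathsf{S}^{-1})_{ik}\,\lambda_j (\mathsf{x}^j)_k = \sum_k \lambda_j (\mathsf{S}^{-1})_{ik} S_{kj} = \lambda_j \delta_{ij}.
\end{aligned}$$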
So the matrix $\mathsf{A}'$ is diagonal with the eigenvalues of $\mathsf{A}$ as the diagonal elements, i.e.

$$\mathsf{A}' = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_N \end{pmatrix}.$$

Therefore, given a matrix $\mathsf{A}$, if we construct the matrix $\mathsf{S}$ that has the eigenvectors of $\mathsf{A}$ as its columns then the matrix $\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}$ is diagonal and has the eigenvalues of $\mathsf{A}$ as its diagonal elements. Since we require $\mathsf{S}$ to be non-singular ($|\mathsf{S}| \neq 0$), the $N$ eigenvectors of $\mathsf{A}$ must be linearly independent and form a basis for the $N$-dimensional vector space. It may be shown that any matrix with distinct eigenvalues can be diagonalised by this procedure. If, however, a general square matrix has degenerate eigenvalues then it may, or may not, have $N$ linearly independent eigenvectors. If it does not then it cannot be diagonalised.

For normal matrices (which include Hermitian, anti-Hermitian and unitary matrices) the $N$ eigenvectors are indeed linearly independent. Moreover, when normalised, these eigenvectors form an orthonormal set (or can be made to do so). Therefore the matrix $\mathsf{S}$ with these normalised eigenvectors as columns, i.e. whose elements are $S_{ij} = (\mathsf{x}^j)_i$, has the property

$$(\mathsf{S}^{\dagger}\mathsf{S})_{ij} = \sum_k (\mathsf{S}^{\dagger})_{ik} (\mathsf{S})_{kj} = \sum_k S^*_{ki} S_{kj} = \sum_k (\mathsf{x}^i)^*_k (\mathsf{x}^j)_k = (\mathsf{x}^i)^{\dagger}\mathsf{x}^j = \delta_{ij}.$$

Hence $\mathsf{S}$ is unitary ($\mathsf{S}^{-1} = \mathsf{S}^{\dagger}$) and the original matrix $\mathsf{A}$ can be diagonalised by

$$\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S} = \mathsf{S}^{\dagger}\mathsf{A}\mathsf{S}.$$

Therefore, any normal matrix $\mathsf{A}$ can be diagonalised by a similarity transformation using a unitary transformation matrix $\mathsf{S}$.

▶ Diagonalise the matrix

$$\mathsf{A} = \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix}.$$

The matrix $\mathsf{A}$ is symmetric and so may be diagonalised by a transformation of the form $\mathsf{A}' = \mathsf{S}^{\dagger}\mathsf{A}\mathsf{S}$, where $\mathsf{S}$ has the normalised eigenvectors of $\mathsf{A}$ as its columns. We have already found these eigenvectors in subsection 8.14.1, and so we can write straightaway

$$\mathsf{S} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & -1 \\ 0 & \sqrt{2} & 0 \\ 1 & 0 & 1 \end{pmatrix}.$$
We note that although the eigenvalues of $\mathsf{A}$ are degenerate, its three eigenvectors are linearly independent and so $\mathsf{A}$ can still be diagonalised. Thus, calculating $\mathsf{S}^{\dagger}\mathsf{A}\mathsf{S}$ we obtain

$$\mathsf{S}^{\dagger}\mathsf{A}\mathsf{S} = \frac{1}{2} \begin{pmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 3 \\ 0 & -2 & 0 \\ 3 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & -1 \\ 0 & \sqrt{2} & 0 \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix},$$

which is diagonal, as required, and has as its diagonal elements the eigenvalues of $\mathsf{A}$. ◀

If a matrix $\mathsf{A}$ is diagonalised by the similarity transformation $\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}$, so that $\mathsf{A}' = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, then we have immediately

$$\operatorname{Tr}\mathsf{A}' = \operatorname{Tr}\mathsf{A} = \sum_{i=1}^{N} \lambda_i, \tag{8.102}$$

$$|\mathsf{A}'| = |\mathsf{A}| = \prod_{i=1}^{N} \lambda_i, \tag{8.103}$$

since the eigenvalues of the matrix are unchanged by the transformation. Moreover, these results may be used to prove the rather useful trace formula

$$|\exp \mathsf{A}| = \exp(\operatorname{Tr}\mathsf{A}), \tag{8.104}$$

where the exponential of a matrix is as defined in (8.38).

▶ Prove the trace formula (8.104).

At the outset, we note that for the similarity transformation $\mathsf{A}' = \mathsf{S}^{-1}\mathsf{A}\mathsf{S}$, we have

$$(\mathsf{A}')^n = (\mathsf{S}^{-1}\mathsf{A}\mathsf{S})(\mathsf{S}^{-1}\mathsf{A}\mathsf{S})\cdots(\mathsf{S}^{-1}\mathsf{A}\mathsf{S}) = \mathsf{S}^{-1}\mathsf{A}^n\mathsf{S}.$$

Thus, from (8.38), we obtain $\exp \mathsf{A}' = \mathsf{S}^{-1}(\exp \mathsf{A})\mathsf{S}$, from which it follows that $|\exp \mathsf{A}'| = |\exp \mathsf{A}|$. Moreover, by choosing the similarity transformation so that it diagonalises $\mathsf{A}$, we have $\mathsf{A}' = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$, and so

$$|\exp \mathsf{A}| = |\exp \mathsf{A}'| = |\exp[\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)]| = |\operatorname{diag}(\exp\lambda_1, \exp\lambda_2, \ldots, \exp\lambda_N)| = \prod_{i=1}^{N} \exp\lambda_i.$$

Rewriting the final product of exponentials of the eigenvalues as the exponential of the sum of the eigenvalues, we find

$$|\exp \mathsf{A}| = \prod_{i=1}^{N} \exp\lambda_i = \exp\left(\sum_{i=1}^{N} \lambda_i\right) = \exp(\operatorname{Tr}\mathsf{A}),$$

which gives the trace formula (8.104). ◀

8.17 Quadratic and Hermitian forms

Let us now introduce the concept of quadratic forms (and their complex analogues, Hermitian forms). A quadratic form $Q$ is a scalar function of a real vector $\mathbf{x}$ given by

$$Q(\mathbf{x}) = \langle \mathbf{x} | \mathcal{A}\mathbf{x} \rangle, \tag{8.105}$$

for some real linear operator $\mathcal{A}$.
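In any given basis (coordinate system) we can write (8.105) in matrix form as

$$Q(\mathsf{x}) = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}, \tag{8.106}$$

where $\mathsf{A}$ is a real matrix. In fact, as will be explained below, we need only consider the case where $\mathsf{A}$ is symmetric, i.e. $\mathsf{A} = \mathsf{A}^{\mathrm T}$. As an example in a three-dimensional space,

$$Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} 1 & 1 & 3 \\ 1 & 1 & -3 \\ 3 & -3 & -3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1^2 + x_2^2 - 3x_3^2 + 2x_1x_2 + 6x_1x_3 - 6x_2x_3. \tag{8.107}$$

It is reasonable to ask whether a quadratic form $Q = \mathsf{x}^{\mathrm T}\mathsf{M}\mathsf{x}$, where $\mathsf{M}$ is any (possibly non-symmetric) real square matrix, is a more general definition. That this is not the case may be seen by expressing $\mathsf{M}$ in terms of a symmetric matrix $\mathsf{A} = \tfrac{1}{2}(\mathsf{M} + \mathsf{M}^{\mathrm T})$ and an antisymmetric matrix $\mathsf{B} = \tfrac{1}{2}(\mathsf{M} - \mathsf{M}^{\mathrm T})$ such that $\mathsf{M} = \mathsf{A} + \mathsf{B}$. We then have

$$Q = \mathsf{x}^{\mathrm T}\mathsf{M}\mathsf{x} = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} + \mathsf{x}^{\mathrm T}\mathsf{B}\mathsf{x}. \tag{8.108}$$

However, $Q$ is a scalar quantity and so

$$Q = Q^{\mathrm T} = (\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x})^{\mathrm T} + (\mathsf{x}^{\mathrm T}\mathsf{B}\mathsf{x})^{\mathrm T} = \mathsf{x}^{\mathrm T}\mathsf{A}^{\mathrm T}\mathsf{x} + \mathsf{x}^{\mathrm T}\mathsf{B}^{\mathrm T}\mathsf{x} = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} - \mathsf{x}^{\mathrm T}\mathsf{B}\mathsf{x}. \tag{8.109}$$

Comparing (8.108) and (8.109) shows that $\mathsf{x}^{\mathrm T}\mathsf{B}\mathsf{x} = 0$, and hence $\mathsf{x}^{\mathrm T}\mathsf{M}\mathsf{x} = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}$, i.e. $Q$ is unchanged by considering only the symmetric part of $\mathsf{M}$. Hence, with no loss of generality, we may assume $\mathsf{A} = \mathsf{A}^{\mathrm T}$ in (8.106).

From its definition (8.105), $Q$ is clearly a basis- (i.e. coordinate-) independent quantity. Let us therefore consider a new basis related to the old one by an orthogonal transformation matrix $\mathsf{S}$, the components in the two bases of any vector $\mathbf{x}$ being related (as in (8.91)) by $\mathsf{x} = \mathsf{S}\mathsf{x}'$ or, equivalently, by $\mathsf{x}' = \mathsf{S}^{-1}\mathsf{x} = \mathsf{S}^{\mathrm T}\mathsf{x}$. We then have

$$Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} = (\mathsf{x}')^{\mathrm T}\mathsf{S}^{\mathrm T}\mathsf{A}\mathsf{S}\mathsf{x}' = (\mathsf{x}')^{\mathrm T}\mathsf{A}'\mathsf{x}',$$

where (as expected) the matrix describing the linear operator $\mathcal{A}$ in the new basis is given by $\mathsf{A}' = \mathsf{S}^{\mathrm T}\mathsf{A}\mathsf{S}$ (since $\mathsf{S}^{\mathrm T} = \mathsf{S}^{-1}$). But, from the last section, if we choose as $\mathsf{S}$ the matrix whose columns are the normalised eigenvectors of $\mathsf{A}$ then $\mathsf{A}' = \mathsf{S}^{\mathrm T}\mathsf{A}\mathsf{S}$ is diagonal with the eigenvalues of $\mathsf{A}$ as the diagonal elements.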
(Since $\mathsf{A}$ is symmetric, its normalised eigenvectors are orthogonal, or can be made so, and hence $\mathsf{S}$ is orthogonal with $\mathsf{S}^{-1} = \mathsf{S}^{\mathrm T}$.)

In the new basis

$$Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} = (\mathsf{x}')^{\mathrm T}\mathsf{\Lambda}\mathsf{x}' = \lambda_1 {x'_1}^2 + \lambda_2 {x'_2}^2 + \cdots + \lambda_N {x'_N}^2, \tag{8.110}$$

where $\mathsf{\Lambda} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$ and the $\lambda_i$ are the eigenvalues of $\mathsf{A}$. It should be noted that $Q$ contains no cross-terms of the form $x'_1 x'_2$.

▶ Find an orthogonal transformation that takes the quadratic form (8.107) into the form $\lambda_1 {x'_1}^2 + \lambda_2 {x'_2}^2 + \lambda_3 {x'_3}^2$.

The required transformation matrix $\mathsf{S}$ has the normalised eigenvectors of $\mathsf{A}$ as its columns. We have already found these in section 8.14, and so we can write immediately

$$\mathsf{S} = \frac{1}{\sqrt{6}} \begin{pmatrix} \sqrt{3} & \sqrt{2} & 1 \\ \sqrt{3} & -\sqrt{2} & -1 \\ 0 & \sqrt{2} & -2 \end{pmatrix},$$

which is easily verified as being orthogonal. Since the eigenvalues of $\mathsf{A}$ are $\lambda = 2$, $3$ and $-6$, the general result already proved shows that the transformation $\mathsf{x} = \mathsf{S}\mathsf{x}'$ will carry (8.107) into the form $2{x'_1}^2 + 3{x'_2}^2 - 6{x'_3}^2$. This may be verified most easily by writing out the inverse transformation $\mathsf{x}' = \mathsf{S}^{-1}\mathsf{x} = \mathsf{S}^{\mathrm T}\mathsf{x}$ and substituting. The inverse equations are

$$\begin{aligned}
x'_1 &= (x_1 + x_2)/\sqrt{2}, \\
x'_2 &= (x_1 - x_2 + x_3)/\sqrt{3}, \\
x'_3 &= (x_1 - x_2 - 2x_3)/\sqrt{6}.
\end{aligned} \tag{8.111}$$

If these are substituted into the form $Q = 2{x'_1}^2 + 3{x'_2}^2 - 6{x'_3}^2$ then the original expression (8.107) is recovered. ◀

In the definition of $Q$ it was assumed that the components $x_1$, $x_2$, $x_3$ and the matrix $\mathsf{A}$ were real. It is clear that in this case the quadratic form $Q \equiv \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}$ is real also. Another, rather more general, expression that is also real is the Hermitian form

$$H(\mathsf{x}) \equiv \mathsf{x}^{\dagger}\mathsf{A}\mathsf{x}, \tag{8.112}$$

where $\mathsf{A}$ is Hermitian (i.e. $\mathsf{A}^{\dagger} = \mathsf{A}$) and the components of $\mathsf{x}$ may now be complex. It is straightforward to show that $H$ is real, since

$$H^* = (H^{\mathrm T})^* = \mathsf{x}^{\dagger}\mathsf{A}^{\dagger}\mathsf{x} = \mathsf{x}^{\dagger}\mathsf{A}\mathsf{x} = H.$$
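Returning to the worked example on the quadratic form (8.107), the claim that $\mathsf{S}^{\mathrm T}\mathsf{A}\mathsf{S} = \operatorname{diag}(2, 3, -6)$ can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul`, `transpose` and `quad_form` and the test vector `x` are illustrative choices, not part of the text.

```python
import math

# Symmetric matrix of the quadratic form (8.107).
A = [[1, 1, 3],
     [1, 1, -3],
     [3, -3, -3]]

# Columns of S are the normalised eigenvectors quoted in the worked example:
# (1, 1, 0)/sqrt(2), (1, -1, 1)/sqrt(3), (1, -1, -2)/sqrt(6).
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
S = [[1 / r2, 1 / r3, 1 / r6],
     [1 / r2, -1 / r3, -1 / r6],
     [0.0, 1 / r3, -2 / r6]]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]


def quad_form(M, v):
    """Evaluate v^T M v for a vector v given as a flat list."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


# A' = S^T A S should be diagonal with entries 2, 3, -6.
A_prime = matmul(transpose(S), matmul(A, S))

# Compare Q in the two coordinate systems for an arbitrary test vector.
x = [0.3, -1.2, 2.0]
x_prime = [sum(S[k][i] * x[k] for k in range(3)) for i in range(3)]  # x' = S^T x
Q_old = quad_form(A, x)
Q_new = 2 * x_prime[0] ** 2 + 3 * x_prime[1] ** 2 - 6 * x_prime[2] ** 2
```

Up to rounding error, `A_prime` has no off-diagonal entries and `Q_old` agrees with `Q_new`, which is the statement that the cross-terms have been removed without changing the value of $Q$.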
With suitable generalisation, the properties of quadratic forms apply also to Hermitian forms, but to keep the presentation simple we will restrict our discussion to quadratic forms.

A special case of a quadratic (Hermitian) form is one for which $Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}$ is greater than zero for all non-zero column matrices $\mathsf{x}$. By choosing as the basis the eigenvectors of $\mathsf{A}$ we have $Q$ in the form

$$Q = \lambda_1 x_1^2 + \lambda_2 x_2^2 + \lambda_3 x_3^2.$$

The requirement that $Q > 0$ for all non-zero $\mathsf{x}$ means that all the eigenvalues $\lambda_i$ of $\mathsf{A}$ must be positive. A symmetric (Hermitian) matrix $\mathsf{A}$ with this property is called positive definite. If, instead, $Q \geq 0$ for all $\mathsf{x}$ then it is possible that some of the eigenvalues are zero, and $\mathsf{A}$ is called positive semi-definite.

8.17.1 The stationary properties of the eigenvectors

Consider a quadratic form, such as $Q(\mathbf{x}) = \langle \mathbf{x} | \mathcal{A}\mathbf{x} \rangle$, equation (8.105), in a fixed basis. As the vector $\mathbf{x}$ is varied, through changes in its three components $x_1$, $x_2$ and $x_3$, the value of the quantity $Q$ also varies. Because of the homogeneous form of $Q$ we may restrict any investigation of these variations to vectors of unit length (since multiplying any vector $\mathbf{x}$ by any scalar $k$ simply multiplies the value of $Q$ by a factor $k^2$).

Of particular interest are any vectors $\mathbf{x}$ that make the value of the quadratic form a maximum or minimum. A necessary, but not sufficient, condition for this is that $Q$ is stationary with respect to small variations $\Delta\mathbf{x}$ in $\mathbf{x}$, whilst $\langle \mathbf{x} | \mathbf{x} \rangle$ is maintained at a constant value (unity).

In the chosen basis the quadratic form is given by $Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}$ and, using Lagrange undetermined multipliers to incorporate the variational constraints, we are led to seek solutions of

$$\Delta[\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} - \lambda(\mathsf{x}^{\mathrm T}\mathsf{x} - 1)] = 0. \tag{8.113}$$

This may be used directly, together with the fact that $(\Delta\mathsf{x}^{\mathrm T})\mathsf{A}\mathsf{x} = \mathsf{x}^{\mathrm T}\mathsf{A}\,\Delta\mathsf{x}$, since $\mathsf{A}$ is symmetric, to obtain

$$\mathsf{A}\mathsf{x} = \lambda\mathsf{x} \tag{8.114}$$

as the necessary condition that $\mathsf{x}$ must satisfy.
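If (8.114) is satisfied for some eigenvector $\mathsf{x}$ then the value of $Q(\mathsf{x})$ is given by

$$Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} = \mathsf{x}^{\mathrm T}\lambda\mathsf{x} = \lambda. \tag{8.115}$$

However, if $\mathsf{x}$ and $\mathsf{y}$ are eigenvectors corresponding to different eigenvalues then they are (or can be chosen to be) orthogonal. Consequently the expression $\mathsf{y}^{\mathrm T}\mathsf{A}\mathsf{x}$ is necessarily zero, since

$$\mathsf{y}^{\mathrm T}\mathsf{A}\mathsf{x} = \mathsf{y}^{\mathrm T}\lambda\mathsf{x} = \lambda\mathsf{y}^{\mathrm T}\mathsf{x} = 0. \tag{8.116}$$

Summarising, those column matrices $\mathsf{x}$ of unit magnitude that make the quadratic form $Q$ stationary are eigenvectors of the matrix $\mathsf{A}$, and the stationary value of $Q$ is then equal to the corresponding eigenvalue. It is straightforward to see from the proof of (8.114) that, conversely, any eigenvector of $\mathsf{A}$ makes $Q$ stationary.

Instead of maximising or minimising $Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}$ subject to the constraint $\mathsf{x}^{\mathrm T}\mathsf{x} = 1$, an equivalent procedure is to extremise the function

$$\lambda(\mathsf{x}) = \frac{\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}}{\mathsf{x}^{\mathrm T}\mathsf{x}}.$$

▶ Show that if $\lambda(\mathsf{x})$ is stationary then $\mathsf{x}$ is an eigenvector of $\mathsf{A}$ and $\lambda(\mathsf{x})$ is equal to the corresponding eigenvalue.

We require $\Delta\lambda(\mathsf{x}) = 0$ with respect to small variations in $\mathsf{x}$. Now

$$\begin{aligned}
\Delta\lambda &= \frac{1}{(\mathsf{x}^{\mathrm T}\mathsf{x})^2} \left[ (\mathsf{x}^{\mathrm T}\mathsf{x}) \left( \Delta\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} + \mathsf{x}^{\mathrm T}\mathsf{A}\,\Delta\mathsf{x} \right) - \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} \left( \Delta\mathsf{x}^{\mathrm T}\mathsf{x} + \mathsf{x}^{\mathrm T}\Delta\mathsf{x} \right) \right] \\
&= \frac{2\,\Delta\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}}{\mathsf{x}^{\mathrm T}\mathsf{x}} - 2 \left( \frac{\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}}{\mathsf{x}^{\mathrm T}\mathsf{x}} \right) \frac{\Delta\mathsf{x}^{\mathrm T}\mathsf{x}}{\mathsf{x}^{\mathrm T}\mathsf{x}},
\end{aligned}$$

since $\mathsf{x}^{\mathrm T}\mathsf{A}\,\Delta\mathsf{x} = (\Delta\mathsf{x}^{\mathrm T})\mathsf{A}\mathsf{x}$ and $\mathsf{x}^{\mathrm T}\Delta\mathsf{x} = (\Delta\mathsf{x}^{\mathrm T})\mathsf{x}$. Thus

$$\Delta\lambda = \frac{2}{\mathsf{x}^{\mathrm T}\mathsf{x}}\,\Delta\mathsf{x}^{\mathrm T}[\mathsf{A}\mathsf{x} - \lambda(\mathsf{x})\mathsf{x}].$$

Hence, if $\Delta\lambda = 0$ then $\mathsf{A}\mathsf{x} = \lambda(\mathsf{x})\mathsf{x}$, i.e. $\mathsf{x}$ is an eigenvector of $\mathsf{A}$ with eigenvalue $\lambda(\mathsf{x})$. ◀

Thus the eigenvalues of a symmetric matrix $\mathsf{A}$ are the values of the function

$$\lambda(\mathsf{x}) = \frac{\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}}{\mathsf{x}^{\mathrm T}\mathsf{x}}$$

at its stationary points. The eigenvectors of $\mathsf{A}$ lie along those directions in space for which the quadratic form $Q = \mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}$ has stationary values, given a fixed magnitude for the vector $\mathsf{x}$. Similar results hold for Hermitian matrices.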
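The stationary property can be illustrated with the matrix of (8.107). The sketch below evaluates $\lambda(\mathsf{x})$ and the vector $(2/\mathsf{x}^{\mathrm T}\mathsf{x})[\mathsf{A}\mathsf{x} - \lambda(\mathsf{x})\mathsf{x}]$ that multiplies $\Delta\mathsf{x}^{\mathrm T}$ in the expression for $\Delta\lambda$, first at an eigenvector and then at a vector that is not one. The function names and the choice of the comparison vector `generic` are assumptions made for illustration.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def dot(u, v):
    """Ordinary scalar product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def rayleigh(M, x):
    """lambda(x) = x^T M x / x^T x."""
    return dot(x, matvec(M, x)) / dot(x, x)


def rayleigh_gradient(M, x):
    """Coefficient of dx^T in d(lambda): (2 / x^T x) [M x - lambda(x) x].

    Valid for symmetric M, as assumed in the derivation.
    """
    lam = rayleigh(M, x)
    Mx = matvec(M, x)
    norm2 = dot(x, x)
    return [2 * (Mx[i] - lam * x[i]) / norm2 for i in range(len(x))]


A = [[1, 1, 3],
     [1, 1, -3],
     [3, -3, -3]]

eigvec = [1, -1, 1]    # eigenvector of A with eigenvalue 3
generic = [1, 0, 0]    # not an eigenvector of A

lam_eig = rayleigh(A, eigvec)
grad_eig = rayleigh_gradient(A, eigvec)
grad_generic = rayleigh_gradient(A, generic)
```

At the eigenvector the quotient equals the eigenvalue and the gradient vanishes, whereas at `generic` the gradient is non-zero, so $\lambda(\mathsf{x})$ is not stationary there.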
8.17.2 Quadratic surfaces

The results of the previous subsection may be turned round to state that the surface given by

$$\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} = \text{constant} = 1 \text{ (say)} \tag{8.117}$$

and called a quadratic surface, has stationary values of its radius (i.e. origin-surface distance) in those directions that are along the eigenvectors of $\mathsf{A}$. More specifically, in three dimensions the quadratic surface $\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x} = 1$ has its principal axes along the three mutually perpendicular eigenvectors of $\mathsf{A}$, and the squares of the corresponding principal radii are given by $\lambda_i^{-1}$, $i = 1, 2, 3$. As well as having this stationary property of the radius, a principal axis is characterised by the fact that any section of the surface perpendicular to it has some degree of symmetry about it. If the eigenvalues corresponding to any two principal axes are degenerate then the quadratic surface has rotational symmetry about the third principal axis and the choice of a pair of axes perpendicular to that axis is not uniquely defined.

▶ Find the shape of the quadratic surface

$$x_1^2 + x_2^2 - 3x_3^2 + 2x_1x_2 + 6x_1x_3 - 6x_2x_3 = 1.$$

If, instead of expressing the quadratic surface in terms of $x_1$, $x_2$, $x_3$, as in (8.107), we were to use the new variables $x'_1$, $x'_2$, $x'_3$ defined in (8.111), for which the coordinate axes are along the three mutually perpendicular eigenvector directions $(1, 1, 0)$, $(1, -1, 1)$ and $(1, -1, -2)$, then the equation of the surface would take the form (see (8.110))

$$\frac{{x'_1}^2}{(1/\sqrt{2})^2} + \frac{{x'_2}^2}{(1/\sqrt{3})^2} - \frac{{x'_3}^2}{(1/\sqrt{6})^2} = 1.$$

Thus, for example, a section of the quadratic surface in the plane $x'_3 = 0$, i.e. $x_1 - x_2 - 2x_3 = 0$, is an ellipse, with semi-axes $1/\sqrt{2}$ and $1/\sqrt{3}$. Similarly a section in the plane $x'_1 = x_1 + x_2 = 0$ is a hyperbola. ◀

Clearly the simplest three-dimensional situation to visualise is that in which all the eigenvalues are positive, since then the quadratic surface is an ellipsoid.
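The two sections described in the worked example can be checked by generating points on them and substituting into $\mathsf{x}^{\mathrm T}\mathsf{A}\mathsf{x}$. This is a rough sketch under the assumption that the sections are parametrised by $(\cos t, \sin t)$ for the ellipse and $(\cosh u, \sinh u)$ for one branch of the hyperbola; the variable names and sample parameter values are illustrative.

```python
import math

A = [[1, 1, 3],
     [1, 1, -3],
     [3, -3, -3]]

# Eigenvector directions quoted in the text, for eigenvalues 2, 3 and -6.
directions = [(1, 1, 0), (1, -1, 1), (1, -1, -2)]


def unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def quad_form(M, x):
    """Evaluate x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(3) for j in range(3))


e1, e2, e3 = (unit(d) for d in directions)

# Ellipse in the plane x3' = 0 with semi-axes 1/sqrt(2) and 1/sqrt(3).
a, b = 1 / math.sqrt(2), 1 / math.sqrt(3)
ellipse_values = []
for k in range(12):
    t = 2 * math.pi * k / 12
    x = [a * math.cos(t) * e1[i] + b * math.sin(t) * e2[i] for i in range(3)]
    ellipse_values.append(quad_form(A, x))

# One branch of the hyperbola in the plane x1' = 0.
c, d = 1 / math.sqrt(3), 1 / math.sqrt(6)
hyperbola_values = []
for u in (-1.0, -0.5, 0.0, 0.5, 1.0):
    x = [c * math.cosh(u) * e2[i] + d * math.sinh(u) * e3[i] for i in range(3)]
    hyperbola_values.append(quad_form(A, x))
```

Every entry of `ellipse_values` and `hyperbola_values` should equal 1 to within rounding, confirming that the sampled points lie on the surface.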
8.18 Simultaneous linear equations

In physical applications we often encounter sets of simultaneous linear equations. In general we may have $M$ equations in $N$ unknowns $x_1, x_2, \ldots, x_N$ of the form

$$\begin{aligned}
A_{11}x_1 + A_{12}x_2 + \cdots + A_{1N}x_N &= b_1, \\
A_{21}x_1 + A_{22}x_2 + \cdots + A_{2N}x_N &= b_2, \\
&\;\;\vdots \\
A_{M1}x_1 + A_{M2}x_2 + \cdots + A_{MN}x_N &= b_M,
\end{aligned} \tag{8.118}$$

where the $A_{ij}$ and $b_i$ have known values. If all the $b_i$ are zero then the system of equations is called homogeneous, otherwise it is inhomogeneous. Depending on the given values, this set of equations for the $N$ unknowns $x_1, x_2, \ldots, x_N$ may have either a unique solution, no solution or infinitely many solutions. Matrix analysis may be used to distinguish between the possibilities. The set of equations may be expressed as a single matrix equation $\mathsf{A}\mathsf{x} = \mathsf{b}$, or, written out in full, as

$$\begin{pmatrix} A_{11} & A_{12} & \ldots & A_{1N} \\ A_{21} & A_{22} & \ldots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M1} & A_{M2} & \ldots & A_{MN} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_M \end{pmatrix}.$$

8.18.1 The range and null space of a matrix

As we discussed in section 8.2, we may interpret the matrix equation $\mathsf{A}\mathsf{x} = \mathsf{b}$ as representing, in some basis, the linear transformation $\mathcal{A}\mathbf{x} = \mathbf{b}$ of a vector $\mathbf{x}$ in an $N$-dimensional vector space $V$ into a vector $\mathbf{b}$ in some other (in general different) $M$-dimensional vector space $W$.

In general the operator $\mathcal{A}$ will map any vector in $V$ into some particular subspace of $W$, which may be the entire space. This subspace is called the range of $\mathcal{A}$ (or $\mathsf{A}$) and its dimension is equal to the rank of $\mathsf{A}$. Moreover, if $\mathcal{A}$ (and hence $\mathsf{A}$) is singular then there exists some subspace of $V$ that is mapped onto the zero vector $\mathbf{0}$ in $W$; that is, any vector $\mathbf{y}$ that lies in the subspace satisfies $\mathcal{A}\mathbf{y} = \mathbf{0}$. This subspace is called the null space of $\mathsf{A}$ and the dimension of this null space is called the nullity of $\mathsf{A}$. We note that the matrix $\mathsf{A}$ must be singular if $M \neq N$ and may be singular even if $M = N$.
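The range and null space just defined can be found explicitly for a small matrix by row reduction. The sketch below uses exact rational arithmetic; the function names `rref` and `null_space` and the $3 \times 4$ example matrix (whose third row is the sum of the first two) are assumptions chosen for illustration and do not come from the text.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, list of pivot columns)."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        # Look for a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        pivot = M[r][c]
        M[r] = [v / pivot for v in M[r]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivots


def null_space(rows):
    """Return (basis of {y : A y = 0}, rank of A)."""
    R, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        y = [Fraction(0)] * n_cols
        y[free] = Fraction(1)
        # Each pivot variable is fixed by the corresponding reduced row.
        for row_index, pc in enumerate(pivots):
            y[pc] = -R[row_index][free]
        basis.append(y)
    return basis, len(pivots)


# Hypothetical example with M = 3 equations and N = 4 unknowns.
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]

basis, rank = null_space(A)
images = [[sum(row[j] * y[j] for j in range(4)) for row in A] for y in basis]
```

For this matrix the rank is 2 and the null space is spanned by two vectors; `images` holds $\mathsf{A}\mathsf{y}$ for each basis vector and every entry is zero.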
The dimensions of the range and the null space of a matrix are related through the fundamental relationship

$$\operatorname{rank}\mathsf{A} + \operatorname{nullity}\mathsf{A} = N, \tag{8.119}$$

where $N$ is the number of original unknowns $x_1, x_2, \ldots, x_N$.

▶ Prove the relationship (8.119).

As discussed in section 8.11, if the columns of an $M \times N$ matrix $\mathsf{A}$ are interpreted as the components, in a given basis, of $N$ ($M$-component) vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ then $\operatorname{rank}\mathsf{A}$ is equal to the number of linearly independent vectors in this set (this number is also equal to the dimension of the vector space spanned by these vectors). Writing (8.118) in terms of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$, we have

$$x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_N\mathbf{v}_N = \mathbf{b}. \tag{8.120}$$

From this expression, we immediately deduce that the range of $\mathsf{A}$ is merely the span of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ and hence has dimension $r = \operatorname{rank}\mathsf{A}$.

If a vector $\mathbf{y}$ lies in the null space of $\mathsf{A}$ then $\mathsf{A}\mathsf{y} = \mathsf{0}$, which we may write as

$$y_1\mathbf{v}_1 + y_2\mathbf{v}_2 + \cdots + y_N\mathbf{v}_N = \mathbf{0}. \tag{8.121}$$

As just shown above, however, only $r$ ($\leq N$) of these vectors are linearly independent. By renumbering, if necessary, we may assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ form a linearly independent set; the remaining vectors, $\mathbf{v}_{r+1}, \mathbf{v}_{r+2}, \ldots, \mathbf{v}_N$, can then be written as a linear superposition of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$. We are therefore free to choose the $N - r$ coefficients $y_{r+1}, y_{r+2}, \ldots, y_N$ arbitrarily and (8.121) will still be satisfied for some set of $r$ coefficients $y_1, y_2, \ldots, y_r$ (which are not all zero). The dimension of the null space is therefore $N - r$, and this completes the proof of (8.119). ◀

Equation (8.119) has far-reaching consequences for the existence of solutions to sets of simultaneous linear equations such as (8.118). As mentioned previously, these equations may have no solution, a unique solution or infinitely many solutions. We now discuss these three cases in turn.
No solution

The system of equations possesses no solution unless $\mathbf{b}$ lies in the range of $\mathsf{A}$; in this case (8.120) will be satisfied for some $x_1, x_2, \ldots, x_N$. This in turn requires the set of vectors $\mathbf{b}, \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ to have the same span (see (8.8)) as $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$. In terms of matrices, this is equivalent to the requirement that the matrix $\mathsf{A}$ and the augmented matrix

$$\mathsf{M} = \begin{pmatrix} A_{11} & A_{12} & \ldots & A_{1N} & b_1 \\ A_{21} & A_{22} & \ldots & A_{2N} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ A_{M1} & A_{M2} & \ldots & A_{MN} & b_M \end{pmatrix}$$

have the same rank $r$. If this condition is satisfied then $\mathbf{b}$ does lie in the range of $\mathsf{A}$, and the set of equations (8.118) will have either a unique solution or infinitely many solutions. If, however, $\mathsf{A}$ and $\mathsf{M}$ have different ranks then there will be no solution.

A unique solution

If $\mathbf{b}$ lies in the range of $\mathsf{A}$ and if $r = N$ then all the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ in (8.120) are linearly independent and the equation has a unique solution $x_1, x_2, \ldots, x_N$.

Infinitely many solutions

If $\mathbf{b}$ lies in the range of $\mathsf{A}$ and if $r$ […] $1$.

• Subtract a suitable multiple of the second row (or the uppermost row that does not start with $M$ zero values) from each of the other lower rows so as to make $B_{i2} = 0$ for $i > 2$.

• Continue in this way until all remaining rows have zeros in the first $M$ places. The number of such rows is equal to the nullity of $\mathsf{A}$, and the $N$ rightmost entries of these rows are the components of vectors that span the null space. They can be made orthogonal if they are not so already.

Use this method to show that the nullity of

$$\mathsf{A} = \begin{pmatrix} -1 & 3 & 2 & 7 \\ 3 & 10 & -6 & 17 \\ -1 & -2 & 2 & -3 \\ 2 & 3 & -4 & 4 \\ 4 & 0 & -8 & -4 \end{pmatrix}$$

is 2 and that an orthogonal base for the null space of $\mathsf{A}$ is provided by any two column matrices of the form $(2 + \alpha_i \;\; -2\alpha_i \;\; 1 \;\; \alpha_i)^{\mathrm T}$, for which the $\alpha_i$ ($i = 1, 2$) are real and satisfy $6\alpha_1\alpha_2 + 2(\alpha_1 + \alpha_2) + 5 = 0$.

8.32 Do the following sets of equations have non-zero solutions? If so, find them.
(a) $3x + 2y + z = 0$, $\quad x - 3y + 2z = 0$, $\quad 2x + y + 3z = 0$.
(b) $2x = b(y + z)$, $\quad x = 2a(y - z)$, $\quad x = (6a - b)y - (6a + b)z$.
8.33 Solve the simultaneous equations

$$2x + 3y + z = 11, \qquad x + y + z = 6, \qquad 5x - y + 10z = 34.$$

8.34 Solve the following simultaneous equations for $x_1$, $x_2$ and $x_3$, using matrix methods:

$$x_1 + 2x_2 + 3x_3 = 1, \qquad 3x_1 + 4x_2 + 5x_3 = 2, \qquad x_1 + 3x_2 + 4x_3 = 3.$$

8.35 Show that the following equations have solutions only if $\eta = 1$ or $2$, and find them in these cases:

$$x + y + z = 1, \qquad x + 2y + 4z = \eta, \qquad x + 4y + 10z = \eta^2.$$

8.36 Find the condition(s) on $\alpha$ such that the simultaneous equations

$$x_1 + \alpha x_2 = 1, \qquad x_1 - x_2 + 3x_3 = -1, \qquad 2x_1 - 2x_2 + \alpha x_3 = -2$$

have (a) exactly one solution, (b) no solutions, or (c) an infinite number of solutions; give all solutions where they exist.

8.37 Make an LU decomposition of the matrix

$$\mathsf{A} = \begin{pmatrix} 3 & 6 & 9 \\ 1 & 0 & 5 \\ 2 & -2 & 16 \end{pmatrix}$$

and hence solve $\mathsf{A}\mathsf{x} = \mathsf{b}$, where (i) $\mathsf{b} = (21 \;\; 9 \;\; 28)^{\mathrm T}$, (ii) $\mathsf{b} = (21 \;\; 7 \;\; 22)^{\mathrm T}$.

8.38 Make an LU decomposition of the matrix

$$\mathsf{A} = \begin{pmatrix} 2 & -3 & 1 & 3 \\ 1 & 4 & -3 & -3 \\ 5 & 3 & -1 & -1 \\ 3 & -6 & -3 & 1 \end{pmatrix}.$$

Hence solve $\mathsf{A}\mathsf{x} = \mathsf{b}$ for (i) $\mathsf{b} = (-4 \;\; 1 \;\; 8 \;\; -5)^{\mathrm T}$, (ii) $\mathsf{b} = (-10 \;\; 0 \;\; -3 \;\; -24)^{\mathrm T}$. Deduce that $\det \mathsf{A} = -160$ and confirm this by direct calculation.

8.39 Use the Cholesky separation method to determine whether the following matrices are positive definite. For each that is, determine the corresponding lower diagonal matrix $\mathsf{L}$:

$$\mathsf{A} = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 3 & -1 \\ 3 & -1 & 1 \end{pmatrix}, \qquad \mathsf{B} = \begin{pmatrix} 5 & 0 & \sqrt{3} \\ 0 & 3 & 0 \\ \sqrt{3} & 0 & 3 \end{pmatrix}.$$

8.40 Find the equation satisfied by the squares of the singular values of the matrix associated with the following over-determined set of equations:

$$\begin{aligned}
2x + 3y + z &= 0, \\
x - y - z &= 1, \\
2x + y &= 0, \\
2y + z &= -2.
\end{aligned}$$

Show that one of the singular values is close to zero. Determine the two larger singular values by an appropriate iteration process and the smallest one by indirect calculation.

8.41 Find the SVD of

$$\mathsf{A} = \begin{pmatrix} 0 & -1 \\ 1 & 1 \\ -1 & 0 \end{pmatrix},$$

showing that the singular values are $\sqrt{3}$ and $1$.

8.42 Find the SVD form of the matrix

$$\mathsf{A} = \begin{pmatrix} 22 & 28 & -22 \\ 1 & -2 & -19 \\ 19 & -2 & -1 \\ -6 & 12 & 6 \end{pmatrix}.$$
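Use it to determine the best solution $\mathsf{x}$ of the equation $\mathsf{A}\mathsf{x} = \mathsf{b}$ when (i) $\mathsf{b} = (6 \;\; -39 \;\; 15 \;\; 18)^{\mathrm T}$, (ii) $\mathsf{b} = (9 \;\; -42 \;\; 15 \;\; 15)^{\mathrm T}$, showing that (i) has an exact solution, but that the best solution to (ii) has a residual of $\sqrt{18}$.

8.43 Four experimental measurements of particular combinations of three physical variables, $x$, $y$ and $z$, gave the following inconsistent results:

$$\begin{aligned}
13x + 22y - 13z &= 4, \\
10x - 8y - 10z &= 44, \\
10x - 8y - 10z &= 47, \\
9x - 18y - 9z &= 72.
\end{aligned}$$

Find the SVD best values for $x$, $y$ and $z$. Identify the null space of $\mathsf{A}$ and hence obtain the general SVD solution.

8.20 Hints and answers

8.1 (a) False. $\mathsf{O}_N$, the $N \times N$ null matrix, is not non-singular.
(b) False. Consider the sum of $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.
(c) True.
(d) True.
(e) False. Consider $b_n = a_n + a_n$ for which $\sum_{n=0}^{N} |b_n|^2 = 4 \neq 1$, or note that there is no zero vector with unit norm.
(f) True.
(g) False. Consider the two series defined by $a_0 = \tfrac{1}{2}$, $a_n = 2(-\tfrac{1}{2})^n$ for $n \geq 1$; $b_n = -(-\tfrac{1}{2})^n$ for $n \geq 0$. The series that is the sum of $\{a_n\}$ and $\{b_n\}$ does not have alternating signs and so closure does not hold.

8.3 (a) $x = a$, $b$ or $c$; (b) $x = -1$; the equation is linear in $x$.

8.5 Use the property of the determinant of a matrix product.

8.7 (d) $\mathsf{S} = \begin{pmatrix} 0 & -\tan(\theta/2) \\ \tan(\theta/2) & 0 \end{pmatrix}$.
(e) Note that $(\mathsf{I} + \mathsf{K})(\mathsf{I} - \mathsf{K}) = \mathsf{I} - \mathsf{K}^2 = (\mathsf{I} - \mathsf{K})(\mathsf{I} + \mathsf{K})$.

8.9 (b) $32i\mathsf{A}$.

8.11 $a = b\cos\gamma + c\cos\beta$, and cyclic permutations; $a^2 = b^2 + c^2 - 2bc\cos\alpha$, and cyclic permutations.

8.13 (a) $2^{-1/2}(0 \;\; 0 \;\; 1 \;\; 1)^{\mathrm T}$, $6^{-1/2}(2 \;\; 0 \;\; -1 \;\; 1)^{\mathrm T}$, $39^{-1/2}(-1 \;\; 6 \;\; -1 \;\; 1)^{\mathrm T}$, $13^{-1/2}(2 \;\; 1 \;\; 2 \;\; -2)^{\mathrm T}$.
(b) $5^{-1/2}(1 \;\; 2 \;\; 0 \;\; 0)^{\mathrm T}$, $(345)^{-1/2}(14 \;\; -7 \;\; 10 \;\; 0)^{\mathrm T}$, $(18\,285)^{-1/2}(-56 \;\; 28 \;\; 98 \;\; 69)^{\mathrm T}$.

8.15 $\mathsf{C}$ does not commute with the others; $\mathsf{A}$, $\mathsf{B}$ and $\mathsf{D}$ have $(1 \;\; -2)^{\mathrm T}$ and $(2 \;\; 1)^{\mathrm T}$ as common eigenvectors.

8.17 For $\mathsf{A}$: $(1 \;\; 0 \;\; -1)^{\mathrm T}$, $(1 \;\; \alpha_1 \;\; 1)^{\mathrm T}$, $(1 \;\; \alpha_2 \;\; 1)^{\mathrm T}$.
For $\mathsf{B}$: $(1 \;\; 1 \;\; 1)^{\mathrm T}$, $(\beta_1 \;\; \gamma_1 \;\; -\beta_1 - \gamma_1)^{\mathrm T}$, $(\beta_2 \;\; \gamma_2 \;\; -\beta_2 - \gamma_2)^{\mathrm T}$.
The $\alpha_i$, $\beta_i$ and $\gamma_i$ are arbitrary.
Simultaneous and orthogonal: $(1 \;\; 0 \;\; -1)^{\mathrm T}$, $(1 \;\; 1 \;\; 1)^{\mathrm T}$, $(1 \;\; -2 \;\; 1)^{\mathrm T}$.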
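8.19 $\alpha_j = (\mathbf{v} \cdot \mathbf{e}^{j*})/(\lambda_j - \mu)$, where $\lambda_j$ is the eigenvalue corresponding to $\mathbf{e}^j$.
(a) $\mathsf{x} = (2 \;\; 1 \;\; 3)^{\mathrm T}$.
(b) Since $\mu$ is equal to one of $\mathsf{A}$'s eigenvalues $\lambda_j$, the equation only has a solution if $\mathbf{v} \cdot \mathbf{e}^{j*} = 0$; (i) no solution; (ii) $\mathsf{x} = (1 \;\; 1 \;\; 3/2)^{\mathrm T}$.

8.21 $\mathsf{U} = (10)^{-1/2}(1, 3i;\; 3i, 1)$, $\mathsf{\Lambda} = (1, 0;\; 0, 11)$.

8.23 $J = (2y^2 - 4y + 4)/(y^2 + 2)$, with stationary values at $y = \pm\sqrt{2}$ and corresponding eigenvalues $2 \mp \sqrt{2}$. From the trace property of $\mathsf{A}$, the third eigenvalue equals 2.

8.25 Ellipse; $\theta = \pi/4$, $a = \sqrt{22}$; $\theta = 3\pi/4$, $b = \sqrt{10}$.

8.27 The direction of the eigenvector having the unrepeated eigenvalue is $(1, 1, -1)/\sqrt{3}$.

8.29 (a) $\mathsf{A} = \mathsf{S}\mathsf{A}'\mathsf{S}^{\dagger}$, where $\mathsf{S}$ is the matrix whose columns are the eigenvectors of the matrix $\mathsf{A}$ to be constructed, and $\mathsf{A}' = \operatorname{diag}(\lambda, \mu, \nu)$.
(b) $\mathsf{A} = (\lambda + 2\mu + 3\nu,\; 2\lambda - 2\mu,\; \lambda + 2\mu - 3\nu;\;\; 2\lambda - 2\mu,\; 4\lambda + 2\mu,\; 2\lambda - 2\mu;\;\; \lambda + 2\mu - 3\nu,\; 2\lambda - 2\mu,\; \lambda + 2\mu + 3\nu)$.
(c) $\tfrac{1}{3}(1, 5, -2;\; 5, 4, 5;\; -2, 5, 1)$.

8.31 The null space is spanned by $(2 \;\; 0 \;\; 1 \;\; 0)^{\mathrm T}$ and $(1 \;\; -2 \;\; 0 \;\; 1)^{\mathrm T}$.

8.33 $x = 3$, $y = 1$, $z = 2$.

8.35 First show that $\mathsf{A}$ is singular. $\eta = 1$: $x = 1 + 2z$, $y = -3z$; $\eta = 2$: $x = 2z$, $y = 1 - 3z$.

8.37 $\mathsf{L} = (1, 0, 0;\; \tfrac{1}{3}, 1, 0;\; \tfrac{2}{3}, 3, 1)$, $\mathsf{U} = (3, 6, 9;\; 0, -2, 2;\; 0, 0, 4)$.
(i) $\mathsf{x} = (-1 \;\; 1 \;\; 2)^{\mathrm T}$. (ii) $\mathsf{x} = (-3 \;\; 2 \;\; 2)^{\mathrm T}$.

8.39 $\mathsf{A}$ is not positive definite, as $L_{33}$ is calculated to be $\sqrt{-6}$. $\mathsf{B} = \mathsf{L}\mathsf{L}^{\mathrm T}$, where the non-zero elements of $\mathsf{L}$ are $L_{11} = \sqrt{5}$, $L_{31} = \sqrt{3/5}$, $L_{22} = \sqrt{3}$, $L_{33} = \sqrt{12/5}$.

8.41 $\mathsf{A}^{\dagger}\mathsf{A} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, $\quad \mathsf{U} = \dfrac{1}{\sqrt{6}} \begin{pmatrix} -1 & \sqrt{3} & \sqrt{2} \\ 2 & 0 & \sqrt{2} \\ -1 & -\sqrt{3} & \sqrt{2} \end{pmatrix}$, $\quad \mathsf{V} = \dfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$.

8.43 The singular values are $12\sqrt{6}$, $0$, $18\sqrt{3}$ and the calculated best solution is $x = 1.71$, $y = -1.94$, $z = -1.71$. The null space is the line $x = z$, $y = 0$ and the general SVD solution is $x = 1.71 + \lambda$, $y = -1.94$, $z = -1.71 + \lambda$.

9 Normal modes

Any student of the physical sciences will encounter the subject of oscillations on many occasions and in a wide variety of circumstances, for example the voltage and current oscillations in an electric circuit, the vibrations of a mechanical structure and the internal motions of molecules.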
The matrices studied in the previous chapter provide a particularly simple way to approach what may appear, at first glance, to be difficult physical problems.

We will consider only systems for which a position-dependent potential exists, i.e., the potential energy of the system in any particular configuration depends upon the coordinates of the configuration, which need not be lengths, however; the potential must not depend upon the time derivatives (generalised velocities) of these coordinates. So, for example, the potential $-q\mathbf{v} \cdot \mathbf{A}$ used in the Lagrangian description of a charged particle in an electromagnetic field is excluded. A further restriction that we place is that the potential has a local minimum at the equilibrium point; physically, this is a necessary and sufficient condition for stable equilibrium. By suitably defining the origin of the potential, we may take its value at the equilibrium point as zero.

We denote the coordinates chosen to describe a configuration of the system by $q_i$, $i = 1, 2, \ldots, N$. The $q_i$ need not be distances; some could be angles, for example. For convenience we can define the $q_i$ so that they are all zero at the equilibrium point. The instantaneous velocities of various parts of the system will depend upon the time derivatives of the $q_i$, denoted by $\dot{q}_i$. For small oscillations the velocities will be linear in the $\dot{q}_i$ and consequently the total kinetic energy $T$ will be quadratic in them, and will include cross terms of the form $\dot{q}_i\dot{q}_j$ with $i \neq j$. The general expression for $T$ can be written as the quadratic form

$$T = \sum_i \sum_j a_{ij}\,\dot{q}_i\dot{q}_j = \dot{\mathsf{q}}^{\mathrm T}\mathsf{A}\dot{\mathsf{q}}, \tag{9.1}$$

where $\dot{\mathsf{q}}$ is the column vector $(\dot{q}_1 \;\; \dot{q}_2 \;\; \cdots \;\; \dot{q}_N)^{\mathrm T}$ and the $N \times N$ matrix $\mathsf{A}$ is real and may be chosen to be symmetric.
Furthermore, $\mathsf{A}$, like any matrix corresponding to a kinetic energy, is positive definite; that is, whatever non-zero real values the $\dot{q}_i$ take, the quadratic form (9.1) has a value $> 0$.

Turning now to the potential energy, we may write its value for a configuration $\mathsf{q}$ by means of a Taylor expansion about the origin $\mathsf{q} = \mathsf{0}$,

$$V(\mathsf{q}) = V(\mathsf{0}) + \sum_i \frac{\partial V(\mathsf{0})}{\partial q_i}\,q_i + \frac{1}{2} \sum_i \sum_j \frac{\partial^2 V(\mathsf{0})}{\partial q_i\,\partial q_j}\,q_i q_j + \cdots.$$

However, we have chosen $V(\mathsf{0}) = 0$ and, since the origin is an equilibrium point, there is no force there and $\partial V(\mathsf{0})/\partial q_i = 0$. Consequently, to second order in the $q_i$ we also have a quadratic form, but in the coordinates rather than in their time derivatives:

$$V = \sum_i \sum_j b_{ij}\,q_i q_j = \mathsf{q}^{\mathrm T}\mathsf{B}\mathsf{q}, \tag{9.2}$$

where $\mathsf{B}$ is, or can be made, symmetric. In this case, and in general, the requirement that the potential is a minimum means that the potential matrix $\mathsf{B}$, like the kinetic energy matrix $\mathsf{A}$, is real and positive definite.

9.1 Typical oscillatory systems

We now introduce particular examples, although the results of this section are general, given the above restrictions, and the reader will find it easy to apply the results to many other instances.

Consider first a uniform rod of mass $M$ and length $l$, attached by a light string also of length $l$ to a fixed point $P$ and executing small oscillations in a vertical plane. We choose as coordinates the angles $\theta_1$ and $\theta_2$ shown, with exaggerated magnitude, in figure 9.1. In terms of these coordinates the centre of gravity of the rod has, to first order in the $\theta_i$, a velocity component in the $x$-direction equal to $l\dot{\theta}_1 + \tfrac{1}{2}l\dot{\theta}_2$ and in the $y$-direction equal to zero.
Adding in the rotational kinetic energy of the rod about its centre of gravity we obtain, to second order in the $\dot{\theta}_i$,

$$T \approx \tfrac{1}{2}Ml^2\left(\dot{\theta}_1^2 + \tfrac{1}{4}\dot{\theta}_2^2 + \dot{\theta}_1\dot{\theta}_2\right) + \tfrac{1}{24}Ml^2\dot{\theta}_2^2 = \tfrac{1}{6}Ml^2\left(3\dot{\theta}_1^2 + 3\dot{\theta}_1\dot{\theta}_2 + \dot{\theta}_2^2\right) = \tfrac{1}{12}Ml^2\,\dot{\mathsf{q}}^{\mathrm T} \begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix} \dot{\mathsf{q}}, \tag{9.3}$$

where $\dot{\mathsf{q}}^{\mathrm T} = (\dot{\theta}_1 \;\; \dot{\theta}_2)$. The potential energy is given by

$$V = Mlg\left[(1 - \cos\theta_1) + \tfrac{1}{2}(1 - \cos\theta_2)\right] \tag{9.4}$$

so that

$$V \approx \tfrac{1}{4}Mlg\,(2\theta_1^2 + \theta_2^2) = \tfrac{1}{12}Mlg\,\mathsf{q}^{\mathrm T} \begin{pmatrix} 6 & 0 \\ 0 & 3 \end{pmatrix} \mathsf{q}, \tag{9.5}$$

where $g$ is the acceleration due to gravity and $\mathsf{q} = (\theta_1 \;\; \theta_2)^{\mathrm T}$; (9.5) is valid to second order in the $\theta_i$.

Figure 9.1 A uniform rod of length $l$ attached to the fixed point $P$ by a light string of the same length: (a) the general coordinate system; (b) approximation to the normal mode with lower frequency; (c) approximation to the mode with higher frequency.

With these expressions for $T$ and $V$ we now apply the conservation of energy,

$$\frac{d}{dt}(T + V) = 0, \tag{9.6}$$

assuming that there are no external forces other than gravity. In matrix form (9.6) becomes

$$\frac{d}{dt}\left(\dot{\mathsf{q}}^{\mathrm T}\mathsf{A}\dot{\mathsf{q}} + \mathsf{q}^{\mathrm T}\mathsf{B}\mathsf{q}\right) = \ddot{\mathsf{q}}^{\mathrm T}\mathsf{A}\dot{\mathsf{q}} + \dot{\mathsf{q}}^{\mathrm T}\mathsf{A}\ddot{\mathsf{q}} + \dot{\mathsf{q}}^{\mathrm T}\mathsf{B}\mathsf{q} + \mathsf{q}^{\mathrm T}\mathsf{B}\dot{\mathsf{q}} = 0,$$

which, using $\mathsf{A} = \mathsf{A}^{\mathrm T}$ and $\mathsf{B} = \mathsf{B}^{\mathrm T}$, gives

$$2\dot{\mathsf{q}}^{\mathrm T}(\mathsf{A}\ddot{\mathsf{q}} + \mathsf{B}\mathsf{q}) = 0.$$

We will assume, although it is not clear that this gives the only possible solution, that the above equation implies that the coefficient of each $\dot{q}_i$ is separately zero. Hence

$$\mathsf{A}\ddot{\mathsf{q}} + \mathsf{B}\mathsf{q} = \mathsf{0}. \tag{9.7}$$

For a rigorous derivation Lagrange's equations should be used, as in chapter 22.

Now we search for sets of coordinates $\mathsf{q}$ that all oscillate with the same period, i.e. the total motion repeats itself exactly after a finite interval. Solutions of this form will satisfy

$$\mathsf{q} = \mathsf{x}\cos\omega t; \tag{9.8}$$

the relative values of the elements of $\mathsf{x}$ in such a solution will indicate how each coordinate is involved in this special motion.
In general there will be $N$ values of $\omega$ if the matrices $\mathbf{A}$ and $\mathbf{B}$ are $N \times N$ and these values are known as normal frequencies or eigenfrequencies. Putting (9.8) into (9.7) yields

$$-\omega^2\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{x} = (\mathbf{B} - \omega^2\mathbf{A})\mathbf{x} = \mathbf{0}. \tag{9.9}$$

Our work in section 8.18 showed that this can have non-trivial solutions only if

$$|\mathbf{B} - \omega^2\mathbf{A}| = 0. \tag{9.10}$$

This is a form of characteristic equation for $\mathbf{B}$, except that the unit matrix $\mathbf{I}$ has been replaced by $\mathbf{A}$. It has the more familiar form if a choice of coordinates is made in which the kinetic energy $T$ is a simple sum of squared terms, i.e. it has been diagonalised, and the scale of the new coordinates is then chosen to make each diagonal element unity.

However, even in the present case, (9.10) can be solved to yield $\omega_k^2$ for $k = 1, 2, \ldots, N$, where $N$ is the order of $\mathbf{A}$ and $\mathbf{B}$. The values of $\omega_k$ can be used with (9.9) to find the corresponding column vector $\mathbf{x}^k$ and the initial (stationary) physical configuration that, on release, will execute motion with period $2\pi/\omega_k$.

In equation (8.76) we showed that the eigenvectors of a real symmetric matrix were, except in the case of degeneracy of the eigenvalues, mutually orthogonal. In the present situation an analogous, but not identical, result holds. It is shown in section 9.3 that if $\mathbf{x}^1$ and $\mathbf{x}^2$ are two eigenvectors satisfying (9.9) for different values of $\omega^2$ then they are orthogonal in the sense that

$$(\mathbf{x}^2)^{\mathrm T}\mathbf{A}\mathbf{x}^1 = 0 \quad\text{and}\quad (\mathbf{x}^2)^{\mathrm T}\mathbf{B}\mathbf{x}^1 = 0.$$

The direct 'scalar product' $(\mathbf{x}^2)^{\mathrm T}\mathbf{x}^1$, formally equal to $(\mathbf{x}^2)^{\mathrm T}\mathbf{I}\mathbf{x}^1$, is not, in general, equal to zero.

Returning to the suspended rod, we find from (9.10)

$$\left|\, \frac{Mlg}{12}\begin{pmatrix} 6 & 0 \\ 0 & 3 \end{pmatrix} - \frac{\omega^2 M l^2}{12}\begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix} \right| = 0.$$

Writing $\omega^2 l/g = \lambda$, this becomes

$$\begin{vmatrix} 6 - 6\lambda & -3\lambda \\ -3\lambda & 3 - 2\lambda \end{vmatrix} = 0 \quad\Rightarrow\quad \lambda^2 - 10\lambda + 6 = 0,$$

which has roots $\lambda = 5 \pm \sqrt{19}$.
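The characteristic equation (9.10) for the rod can be checked numerically. The following is a minimal sketch in plain Python, assuming the common factors $Ml^2/12$ and $Mlg/12$ have been divided out so that the eigenvalue is $\lambda = \omega^2 l/g$; the function name `generalized_eigenvalues_2x2` is an illustrative choice, not something taken from the text.

```python
import math


def generalized_eigenvalues_2x2(A, B):
    """Return the two roots lam of det(B - lam*A) = 0 for symmetric 2x2 A, B."""
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    # Expanding the determinant gives p*lam**2 - q*lam + r = 0.
    p = a11 * a22 - a12 ** 2
    q = a11 * b22 + a22 * b11 - 2.0 * a12 * b12
    r = b11 * b22 - b12 ** 2
    disc = math.sqrt(q * q - 4.0 * p * r)
    return sorted(((q - disc) / (2.0 * p), (q + disc) / (2.0 * p)))


# Rod-and-string matrices from (9.3) and (9.5), common factors removed.
A_ROD = [[6.0, 3.0], [3.0, 2.0]]
B_ROD = [[6.0, 0.0], [0.0, 3.0]]

lam_low, lam_high = generalized_eigenvalues_2x2(A_ROD, B_ROD)
print(lam_low, lam_high)  # compare with 5 - sqrt(19) and 5 + sqrt(19)
```

The quadratic is solved in closed form here only because the system is 2 × 2; for larger systems a library routine for the generalised symmetric eigenproblem would normally be used instead.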
Thus we find that the two normal frequencies are given by $\omega_1 = (0.641\,g/l)^{1/2}$ and $\omega_2 = (9.359\,g/l)^{1/2}$. Putting the lower of the two values for $\omega^2$, namely $(5 - \sqrt{19})\,g/l$, into (9.9) shows that for this mode

$$x_1 : x_2 = 3(5 - \sqrt{19}) : 6(\sqrt{19} - 4) = 1.923 : 2.153.$$

This corresponds to the case where the rod and string are almost straight out, i.e. they almost form a simple pendulum. Similarly it may be shown that the higher frequency corresponds to a solution where the string and rod are moving with opposite phase and $x_1 : x_2 = 9.359 : -16.718$. The two situations are shown in figure 9.1.

In connection with quadratic forms it was shown in section 8.17 how to make a change of coordinates such that the matrix for a particular form becomes diagonal. In exercise 9.6 a method is developed for diagonalising simultaneously two quadratic forms (though the transformation matrix may not be orthogonal). If this process is carried out for $\mathbf{A}$ and $\mathbf{B}$ in a general system undergoing stable oscillations, the kinetic and potential energies in the new variables $\eta_i$ take the forms

$$T = \sum_i \mu_i \dot\eta_i^2 = \dot{\boldsymbol\eta}^{\mathrm T}\mathbf{M}\dot{\boldsymbol\eta}, \qquad \mathbf{M} = \operatorname{diag}(\mu_1, \mu_2, \ldots, \mu_N), \tag{9.11}$$

$$V = \sum_i \nu_i \eta_i^2 = \boldsymbol\eta^{\mathrm T}\mathbf{N}\boldsymbol\eta, \qquad \mathbf{N} = \operatorname{diag}(\nu_1, \nu_2, \ldots, \nu_N), \tag{9.12}$$

and the equations of motion are the uncoupled equations

$$\mu_i \ddot\eta_i + \nu_i \eta_i = 0, \qquad i = 1, 2, \ldots, N. \tag{9.13}$$

Clearly a simple renormalisation of the $\eta_i$ can be made that reduces all the $\mu_i$ in (9.11) to unity. When this is done the variables so formed are called normal coordinates and equations (9.13) the normal equations.

When a system is executing one of these simple harmonic motions it is said to be in a normal mode, and once started in such a mode it will repeat its motion exactly after each interval of $2\pi/\omega_i$.
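The mode ratios and the orthogonality relations quoted for the rod can be checked with a short calculation. This is a sketch only, assuming the rod matrices with their common factors removed; the helper names `mode_vector` and `bilinear` are hypothetical. It also evaluates the diagonal coefficients $\mu_i$ and $\nu_i$ of (9.11) and (9.12) for the unnormalised mode vectors, whose ratio should reproduce $\lambda_i$.

```python
import math

A = [[6.0, 3.0], [3.0, 2.0]]  # kinetic matrix, factor M l^2 / 12 omitted
B = [[6.0, 0.0], [0.0, 3.0]]  # potential matrix, factor M l g / 12 omitted


def mode_vector(lam):
    """Solve the first row of (B - lam*A) x = 0 with a convenient scale."""
    # (6 - 6 lam) x1 - 3 lam x2 = 0  =>  x1 : x2 = 3 lam : (6 - 6 lam)
    return [3.0 * lam, 6.0 - 6.0 * lam]


def bilinear(u, m, v):
    """Return u^T m v for a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


lam1 = 5.0 - math.sqrt(19.0)
lam2 = 5.0 + math.sqrt(19.0)
x1, x2 = mode_vector(lam1), mode_vector(lam2)

a_cross = bilinear(x2, A, x1)                    # vanishes up to rounding
b_cross = bilinear(x2, B, x1)                    # vanishes up to rounding
plain_dot = x2[0] * x1[0] + x2[1] * x1[1]        # does not vanish

# Diagonal coefficients for mode 1; their ratio nu_1 / mu_1 equals lam1.
mu_1 = bilinear(x1, A, x1)
nu_1 = bilinear(x1, B, x1)
print(x1, x2, a_cross, b_cross, plain_dot, nu_1 / mu_1)
```

Because $\mathbf{A}$ is not a multiple of the unit matrix, the ordinary dot product of the two mode vectors is non-zero even though both cross terms with $\mathbf{A}$ and $\mathbf{B}$ vanish.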
Any arbitrary motion of the system may be written as a superposition of the normal modes, and each component mode will execute harmonic motion with the corresponding eigenfrequency; however, unless by chance the eigenfrequencies are in integer relationship, the system will never return to its initial configuration after any finite time interval.

As a second example we will consider a number of masses coupled together by springs. For this type of situation the potential and kinetic energies are automatically quadratic functions of the coordinates and their derivatives, provided the elastic limits of the springs are not exceeded, and the oscillations do not have to be vanishingly small for the analysis to be valid.

▶ Find the normal frequencies and modes of oscillation of three particles of masses $m$, $\mu m$, $m$ connected in that order in a straight line by two equal light springs of force constant $k$. This arrangement could serve as a model for some linear molecules, e.g. CO$_2$.

The situation is shown in figure 9.2; the coordinates of the particles, $x_1$, $x_2$, $x_3$, are measured from their equilibrium positions, at which the springs are neither extended nor compressed.

Figure 9.2 Three masses $m$, $\mu m$ and $m$ connected by two equal light springs of force constant $k$.

Figure 9.3 The normal modes of the masses and springs of a linear molecule such as CO$_2$. (a) $\omega^2 = 0$; (b) $\omega^2 = k/m$; (c) $\omega^2 = [(\mu + 2)/\mu](k/m)$.

The kinetic energy of the system is simply

$$T = \tfrac{1}{2} m\left(\dot x_1^2 + \mu\dot x_2^2 + \dot x_3^2\right),$$

whilst the potential energy stored in the springs is

$$V = \tfrac{1}{2} k\left[(x_2 - x_1)^2 + (x_3 - x_2)^2\right].$$

The kinetic- and potential-energy symmetric matrices are thus

$$\mathbf{A} = \frac{m}{2}\begin{pmatrix} 1 & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \mathbf{B} = \frac{k}{2}\begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}.$$

From (9.10), to find the normal frequencies we have to solve $|\mathbf{B} - \omega^2\mathbf{A}| = 0$.
Thus, writing $m\omega^2/k = \lambda$, we have

$$\begin{vmatrix} 1 - \lambda & -1 & 0 \\ -1 & 2 - \mu\lambda & -1 \\ 0 & -1 & 1 - \lambda \end{vmatrix} = 0,$$

which leads to $\lambda = 0$, $1$ or $1 + 2/\mu$. The corresponding eigenvectors are respectively

$$\mathbf{x}^1 = \frac{1}{\sqrt{3}}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \mathbf{x}^2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \qquad \mathbf{x}^3 = \frac{1}{\sqrt{2 + (4/\mu^2)}}\begin{pmatrix} 1 \\ -2/\mu \\ 1 \end{pmatrix}.$$

The physical motions associated with these normal modes are illustrated in figure 9.3. The first, with $\lambda = \omega = 0$ and all the $x_i$ equal, merely describes bodily translation of the whole system, with no (i.e. zero-frequency) internal oscillations.

In the second solution the central particle remains stationary, $x_2 = 0$, whilst the other two oscillate with equal amplitudes in antiphase with each other. This motion, which has frequency $\omega = (k/m)^{1/2}$, is illustrated in figure 9.3(b).

The final and most complicated of the three normal modes has angular frequency $\omega = \{[(\mu + 2)/\mu](k/m)\}^{1/2}$, and involves a motion of the central particle which is in antiphase with that of the two outer ones and which has an amplitude $2/\mu$ times as great. In this motion (see figure 9.3(c)) the two springs are compressed and extended in turn. We also note that in the second and third normal modes the centre of mass of the molecule remains stationary. ◀

9.2 Symmetry and normal modes

It will have been noticed that the system in the above example has an obvious symmetry under the interchange of coordinates 1 and 3: the matrices $\mathbf{A}$ and $\mathbf{B}$, the equations of motion and the normal modes illustrated in figure 9.3 are all unaltered by the interchange of $x_1$ and $-x_3$. This reflects the more general result that for each physical symmetry possessed by a system, there is at least one normal mode with the same symmetry.
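The eigenvalues and eigenvectors found for the three-mass chain, and their behaviour under the reflection that exchanges $x_1$ and $-x_3$, can be verified directly. The sketch below assumes the factors $k/2$ and $m/2$ have been divided out, uses an illustrative mass ratio `MU = 0.75` (an assumption, not a value from the text), and takes the reflection to send $x_2$ to $-x_2$ as well; the function names are hypothetical.

```python
def residual(lam, x, mu):
    """Return (B - lam*A) x for the m, mu*m, m chain (factors k/2, m/2 removed)."""
    b = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
    a = [[1.0, 0.0, 0.0], [0.0, mu, 0.0], [0.0, 0.0, 1.0]]
    return [sum((b[i][j] - lam * a[i][j]) * x[j] for j in range(3))
            for i in range(3)]


def reflect(x):
    """Mirror the chain end for end: x1 -> -x3, x2 -> -x2, x3 -> -x1."""
    return [-x[2], -x[1], -x[0]]


def parity(x, tol=1e-12):
    """Return +1 if x is unchanged by the reflection, -1 if it changes sign."""
    r = reflect(x)
    if all(abs(ri - xi) < tol for ri, xi in zip(r, x)):
        return 1
    if all(abs(ri + xi) < tol for ri, xi in zip(r, x)):
        return -1
    return 0


MU = 0.75  # illustrative mass ratio (assumption)
modes = [
    (0.0, [1.0, 1.0, 1.0]),              # bodily translation
    (1.0, [1.0, 0.0, -1.0]),             # outer masses in antiphase
    (1.0 + 2.0 / MU, [1.0, -2.0 / MU, 1.0]),  # centre against outer masses
]

max_residual = max(abs(r) for lam, x in modes for r in residual(lam, x, MU))
parities = [parity(x) for _, x in modes]
print(max_residual, parities)
```

Each mode is carried into plus or minus itself by the reflection, which is the sense in which the modes share the symmetry of the system; normalisation factors are omitted since they do not affect either check.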
The general question of the relationship between the symmetries possessed by a physical system and those of its normal modes will be taken up more formally in chapter 29 where the representation theory of groups is considered. However, we can show here how an appreciation of a system's symmetry properties will sometimes allow its normal modes to be guessed (and then verified), something that is particularly helpful if the number of coordinates involved is greater than two and the corresponding eigenvalue equation (9.10) is a cubic or higher-degree polynomial equation.

Consider the problem of determining the normal modes of a system consisting of four equal masses $M$ at the corners of a square of side $2L$, each pair of masses being connected by a light spring of modulus $k$ that is unstretched in the equilibrium situation. As shown in figure 9.4, we introduce Cartesian coordinates $x_n, y_n$, with $n = 1, 2, 3, 4$, for the positions of the masses and denote their displacements from their equilibrium positions $\mathbf{R}_n$ by $\mathbf{q}_n = x_n\mathbf{i} + y_n\mathbf{j}$. Thus

$$\mathbf{r}_n = \mathbf{R}_n + \mathbf{q}_n \quad\text{with}\quad \mathbf{R}_n = \pm L\mathbf{i} \pm L\mathbf{j}.$$

Figure 9.4 The arrangement of four equal masses and six equal springs discussed in the text. The coordinate systems $x_n, y_n$ for $n = 1, 2, 3, 4$ measure the displacements of the masses from their equilibrium positions.

The coordinates for the system are thus $x_1, y_1, x_2, \ldots, y_4$ and the kinetic energy matrix $\mathbf{A}$ is given trivially by $M\mathbf{I}_8$, where $\mathbf{I}_8$ is the $8 \times 8$ identity matrix. The potential energy matrix $\mathbf{B}$ is much more difficult to calculate and involves, for each pair of values $m, n$, evaluating the quadratic approximation to the expression

$$b_{mn} = \tfrac{1}{2} k\left(|\mathbf{r}_m - \mathbf{r}_n| - |\mathbf{R}_m - \mathbf{R}_n|\right)^2.$$

Expressing each $\mathbf{r}_i$ in terms of $\mathbf{q}_i$ and $\mathbf{R}_i$ and making the normal assumption that
$|\mathbf{R}_m - \mathbf{R}_n| \gg |\mathbf{q}_m - \mathbf{q}_n|$, we obtain $b_{mn}$ ($= b_{nm}$):

$$\begin{aligned}
b_{mn} &= \tfrac{1}{2} k\left[\,|(\mathbf{R}_m - \mathbf{R}_n) + (\mathbf{q}_m - \mathbf{q}_n)| - |\mathbf{R}_m - \mathbf{R}_n|\,\right]^2 \\
&= \tfrac{1}{2} k\left\{\left[\,|\mathbf{R}_m - \mathbf{R}_n|^2 + 2(\mathbf{q}_m - \mathbf{q}_n)\cdot(\mathbf{R}_m - \mathbf{R}_n) + |\mathbf{q}_m - \mathbf{q}_n|^2\right]^{1/2} - |\mathbf{R}_m - \mathbf{R}_n|\right\}^2 \\
&= \tfrac{1}{2} k\,|\mathbf{R}_m - \mathbf{R}_n|^2\left\{\left[1 + \frac{2(\mathbf{q}_m - \mathbf{q}_n)\cdot(\mathbf{R}_m - \mathbf{R}_n)}{|\mathbf{R}_m - \mathbf{R}_n|^2} + \cdots\right]^{1/2} - 1\right\}^2 \\
&\approx \tfrac{1}{2} k\left\{\frac{(\mathbf{q}_m - \mathbf{q}_n)\cdot(\mathbf{R}_m - \mathbf{R}_n)}{|\mathbf{R}_m - \mathbf{R}_n|}\right\}^2.
\end{aligned}$$

This final expression is readily interpretable as the potential energy stored in the spring when it is extended by an amount equal to the component, along the equilibrium direction of the spring, of the relative displacement of its two ends.

Applying this result to each spring in turn gives the following expressions for the elements of the potential matrix.

$$\begin{array}{ccl}
m & n & 2b_{mn}/k \\ \hline
1 & 2 & (x_1 - x_2)^2 \\
1 & 3 & (y_1 - y_3)^2 \\
1 & 4 & \tfrac{1}{2}(-x_1 + x_4 + y_1 - y_4)^2 \\
2 & 3 & \tfrac{1}{2}(x_2 - x_3 + y_2 - y_3)^2 \\
2 & 4 & (y_2 - y_4)^2 \\
3 & 4 & (x_3 - x_4)^2
\end{array}$$

The potential matrix is thus constructed as

$$\mathbf{B} = \frac{k}{4}\begin{pmatrix}
3 & -1 & -2 & 0 & 0 & 0 & -1 & 1 \\
-1 & 3 & 0 & 0 & 0 & -2 & 1 & -1 \\
-2 & 0 & 3 & 1 & -1 & -1 & 0 & 0 \\
0 & 0 & 1 & 3 & -1 & -1 & 0 & -2 \\
0 & 0 & -1 & -1 & 3 & 1 & -2 & 0 \\
0 & -2 & -1 & -1 & 1 & 3 & 0 & 0 \\
-1 & 1 & 0 & 0 & -2 & 0 & 3 & -1 \\
1 & -1 & 0 & -2 & 0 & 0 & -1 & 3
\end{pmatrix}.$$

To solve the eigenvalue equation $|\mathbf{B} - \lambda\mathbf{A}| = 0$ directly would mean solving an eighth-degree polynomial equation. Fortunately, we can exploit intuition and the symmetries of the system to obtain the eigenvectors and corresponding eigenvalues without such labour.

Firstly, we know that bodily translation of the whole system, without any internal vibration, must be possible and that there will be two independent solutions of this form, corresponding to translations in the $x$- and $y$-directions. The eigenvector for the first of these (written in row form to save space) is

$$\mathbf{x}^{(1)} = (1 \;\; 0 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 1 \;\; 0)^{\mathrm T}.$$

Evaluation of $\mathbf{B}\mathbf{x}^{(1)}$ gives

$$\mathbf{B}\mathbf{x}^{(1)} = (0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0 \;\; 0)^{\mathrm T},$$

showing that $\mathbf{x}^{(1)}$ is a solution of $(\mathbf{B} - \omega^2\mathbf{A})\mathbf{x} = \mathbf{0}$ corresponding to the eigenvalue $\omega^2 = 0$, whatever form $\mathbf{A}\mathbf{x}$ may take.
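The $8 \times 8$ potential matrix quoted above can be assembled spring by spring from the approximation $b_{mn} \approx \tfrac{1}{2}k[(\mathbf{q}_m - \mathbf{q}_n)\cdot\hat{\mathbf{n}}]^2$, where $\hat{\mathbf{n}}$ is the unit vector along the unstretched spring. The sketch below is illustrative only: the corner assignment in `POSITIONS` (mass 1 top-left, 2 top-right, 3 bottom-left, 4 bottom-right, in units of $L$) is an assumption inferred from the signs in the spring table, and all names are hypothetical.

```python
import math

# Assumed equilibrium positions in units of L (inferred from the spring table).
POSITIONS = {1: (-1.0, 1.0), 2: (1.0, 1.0), 3: (-1.0, -1.0), 4: (1.0, -1.0)}
SPRINGS = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# 4B/k as printed in the text, coordinate order (x1, y1, x2, y2, x3, y3, x4, y4).
EXPECTED_4B_OVER_K = [
    [3, -1, -2, 0, 0, 0, -1, 1],
    [-1, 3, 0, 0, 0, -2, 1, -1],
    [-2, 0, 3, 1, -1, -1, 0, 0],
    [0, 0, 1, 3, -1, -1, 0, -2],
    [0, 0, -1, -1, 3, 1, -2, 0],
    [0, -2, -1, -1, 1, 3, 0, 0],
    [-1, 1, 0, 0, -2, 0, 3, -1],
    [1, -1, 0, -2, 0, 0, -1, 3],
]


def potential_matrix_over_k():
    """Return B/k such that V = q^T B q, summing the six spring energies."""
    b = [[0.0] * 8 for _ in range(8)]
    for m, n in SPRINGS:
        dx = POSITIONS[m][0] - POSITIONS[n][0]
        dy = POSITIONS[m][1] - POSITIONS[n][1]
        length = math.hypot(dx, dy)
        # c . q is the component of (q_m - q_n) along the spring direction.
        c = [0.0] * 8
        c[2 * (m - 1)], c[2 * (m - 1) + 1] = dx / length, dy / length
        c[2 * (n - 1)], c[2 * (n - 1) + 1] = -dx / length, -dy / length
        for i in range(8):
            for j in range(8):
                b[i][j] += 0.5 * c[i] * c[j]  # b_mn / k = (1/2)(c . q)^2
    return b


B_OVER_K = potential_matrix_over_k()
max_deviation = max(
    abs(4.0 * B_OVER_K[i][j] - EXPECTED_4B_OVER_K[i][j])
    for i in range(8) for j in range(8)
)
print(max_deviation)
```

Building the matrix as a sum of outer products, one per spring, makes its symmetry automatic and avoids expanding the six squared brackets by hand.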
Similarly,

$$\mathbf{x}^{(2)} = (0 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 1)^{\mathrm T}$$

is a second eigenvector corresponding to the eigenvalue $\omega^2 = 0$.

The next intuitive solution, again involving no internal vibrations, and, therefore, expected to correspond to $\omega^2 = 0$, is pure rotation of the whole system about its centre. In this mode each mass moves perpendicularly to the line joining its position to the centre, and so the relevant eigenvector is

$$\mathbf{x}^{(3)} = \frac{1}{\sqrt{2}}(1 \;\; 1 \;\; 1 \;\; {-1} \;\; {-1} \;\; 1 \;\; {-1} \;\; {-1})^{\mathrm T}.$$

It is easily verified that $\mathbf{B}\mathbf{x}^{(3)} = \mathbf{0}$, thus confirming both the eigenvector and the corresponding eigenvalue. The three non-oscillatory normal modes are illustrated in diagrams (a)–(c) of figure 9.5.

Figure 9.5 The displacements and frequencies of the eight normal modes of the system shown in figure 9.4: (a) $\omega^2 = 0$; (b) $\omega^2 = 0$; (c) $\omega^2 = 0$; (d) $\omega^2 = 2k/M$; (e) $\omega^2 = k/M$; (f) $\omega^2 = k/M$; (g) $\omega^2 = k/M$; (h) $\omega^2 = k/M$. Modes (a), (b) and (c) are not true oscillations: (a) and (b) are purely translational whilst (c) is a mode of bodily rotation. Mode (d), the 'breathing mode', has the highest frequency and the remaining four, (e)–(h), of lower frequency, are degenerate.

We now come to solutions that do involve real internal oscillations, and, because of the four-fold symmetry of the system, we expect one of them to be a mode in which all the masses move along radial lines – the so-called 'breathing mode'. Expressing this motion in coordinate form gives as the fourth eigenvector

$$\mathbf{x}^{(4)} = \frac{1}{\sqrt{2}}({-1} \;\; 1 \;\; 1 \;\; 1 \;\; {-1} \;\; {-1} \;\; 1 \;\; {-1})^{\mathrm T}.$$

Evaluation of $\mathbf{B}\mathbf{x}^{(4)}$ yields

$$\mathbf{B}\mathbf{x}^{(4)} = \frac{k}{4\sqrt{2}}({-8} \;\; 8 \;\; 8 \;\; 8 \;\; {-8} \;\; {-8} \;\; 8 \;\; {-8})^{\mathrm T} = 2k\,\mathbf{x}^{(4)},$$

i.e. a multiple of $\mathbf{x}^{(4)}$, confirming that it is indeed an eigenvector. Further, since $\mathbf{A}\mathbf{x}^{(4)} = M\mathbf{x}^{(4)}$, it follows from $(\mathbf{B} - \omega^2\mathbf{A})\mathbf{x} = \mathbf{0}$ that $\omega^2 = 2k/M$ for this normal mode. Diagram (d) of the figure illustrates the corresponding motions of the four masses.
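The four eigenvectors found so far can be checked against the printed matrix with exact integer arithmetic. This is a minimal sketch, assuming $\mathbf{A} = M\mathbf{I}_8$ as stated in the text so that $\omega^2 M/k$ is simply the eigenvalue of $\mathbf{B}/k$; the normalisation factors $1/\sqrt{2}$ are dropped and the names are hypothetical.

```python
from fractions import Fraction

# 4B/k as printed in the text, coordinate order (x1, y1, x2, y2, x3, y3, x4, y4).
FOUR_B_OVER_K = [
    [3, -1, -2, 0, 0, 0, -1, 1],
    [-1, 3, 0, 0, 0, -2, 1, -1],
    [-2, 0, 3, 1, -1, -1, 0, 0],
    [0, 0, 1, 3, -1, -1, 0, -2],
    [0, 0, -1, -1, 3, 1, -2, 0],
    [0, -2, -1, -1, 1, 3, 0, 0],
    [-1, 1, 0, 0, -2, 0, 3, -1],
    [1, -1, 0, -2, 0, 0, -1, 3],
]

MODES = {
    "x-translation": [1, 0, 1, 0, 1, 0, 1, 0],
    "y-translation": [0, 1, 0, 1, 0, 1, 0, 1],
    "rotation": [1, 1, 1, -1, -1, 1, -1, -1],
    "breathing": [-1, 1, 1, 1, -1, -1, 1, -1],
}


def eigenvalue_over_k(x):
    """Return s with B x = s k x, or None if x is not an eigenvector of B."""
    bx4 = [sum(row[j] * x[j] for j in range(8)) for row in FOUR_B_OVER_K]
    # Rayleigh-type ratio gives the candidate eigenvalue of 4B/k.
    s4 = Fraction(sum(u * v for u, v in zip(x, bx4)), sum(v * v for v in x))
    if all(u == s4 * v for u, v in zip(bx4, x)):
        return s4 / 4
    return None


# With A = M I_8 (as in the text), omega^2 = s k / M for each mode.
results = {name: eigenvalue_over_k(x) for name, x in MODES.items()}
print(results)
```

Using `Fraction` keeps the comparison exact, so a trial vector is accepted only if $\mathbf{B}\mathbf{x}$ is precisely proportional to $\mathbf{x}$.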
As the next step in exploiting the symmetry properties of the system we note that, because of its reflection symmetry in the $x$-axis, the system is invariant under the double interchange of $y_1$ with $-y_3$ and $y_2$ with $-y_4$. This leads us to try an eigenvector of the form

$$\mathbf{x}^{(5)} = (0 \;\; \alpha \;\; 0 \;\; \beta \;\; 0 \;\; {-\alpha} \;\; 0 \;\; {-\beta})^{\mathrm T}.$$

Substituting this trial vector into $(\mathbf{B} - \omega^2\mathbf{A})\mathbf{x} = \mathbf{0}$ gives, of course, eight simultaneous equations for $\alpha$ and $\beta$, but they are all equivalent to just two, namely

$$\alpha + \beta = 0, \qquad 5\alpha + \beta = \frac{4M\omega^2}{k}\,\alpha;$$

these have the solution $\alpha = -\beta$ and $\omega^2 = k/M$. The latter thus gives the frequency of the mode with eigenvector

$$\mathbf{x}^{(5)} = (0 \;\; 1 \;\; 0 \;\; {-1} \;\; 0 \;\; {-1} \;\; 0 \;\; 1)^{\mathrm T}.$$

Note that, in this mode, when the spring joining masses 1 and 3 is most stretched, the one joining masses 2 and 4 is at its most compressed. Similarly, based on reflection symmetry in the $y$-axis,

$$\mathbf{x}^{(6)} = (1 \;\; 0 \;\; {-1} \;\; 0 \;\; {-1} \;\; 0 \;\; 1 \;\; 0)^{\mathrm T}$$

can be shown to be an eigenvector corresponding to the same frequency. These two modes are shown in diagrams (e) and (f) of figure 9.5.

This accounts for six of the expected eight modes, and the other two could be found by considering motions that are symmetric about both diagonals of the square or are invariant under successive reflections in the $x$- and $y$-axes. However, since $\mathbf{A}$ is a multiple of the unit matrix, and since we know that $(\mathbf{x}^{(j)})^{\mathrm T}\mathbf{A}\mathbf{x}^{(i)} = 0$ if $i \neq j$, we can find the two remaining eigenvectors more easily by requiring them to be orthogonal to each of those found so far.

Let us take the next (seventh) eigenvector, $\mathbf{x}^{(7)}$, to be given by

$$\mathbf{x}^{(7)} = (a \;\; b \;\; c \;\; d \;\; e \;\; f \;\; g \;\; h)^{\mathrm T}.$$

Then orthogonality with each of the $\mathbf{x}^{(n)}$ for $n = 1, 2, \ldots, 6$ yields six equations satisfied by the unknowns $a, b, \ldots, h$. As the reader may verify, they can be reduced to the six simple equations

$$a + g = 0, \quad d + f = 0, \quad a + f = d + g,$$
$$b + h = 0, \quad c + e = 0, \quad b + c = e + h.$$

With six homogeneous equations for eight unknowns, effectively separated into two groups of four, we may pick one in each group arbitrarily.
Taking $a = b = 1$ gives $d = e = 1$ and $c = f = g = h = -1$ as a solution. Substitution of

$$\mathbf{x}^{(7)} = (1 \;\; 1 \;\; {-1} \;\; 1 \;\; 1 \;\; {-1} \;\; {-1} \;\; {-1})^{\mathrm T}$$

into the eigenvalue equation checks that it is an eigenvector and shows that the corresponding eigenfrequency is given by $\omega^2 = k/M$.

We now have the eigenvectors for seven of the eight normal modes and the eighth can be found by making it simultaneously orthogonal to each of the other seven. It is left to the reader to show (or verify) that the final solution is

$$\mathbf{x}^{(8)} = (1 \;\; {-1} \;\; 1 \;\; 1 \;\; {-1} \;\; {-1} \;\; {-1} \;\; 1)^{\mathrm T}$$

and that this mode has the same frequency as three of the other modes. The general topic of the degeneracy of normal modes is discussed in chapter 29. The movements associated with the final two modes are shown in diagrams (g) and (h) of figure 9.5; this figure summarises all eight normal modes and frequencies.

Although this example has been lengthy to write out, we have seen that the actual calculations are quite simple and provide the full solution to what is formally a matrix eigenvalue equation involving $8 \times 8$ matrices. It should be noted that our exploitation of the intrinsic symmetries of the system played a crucial part in finding the correct eigenvectors for the various normal modes.

9.3 Rayleigh–Ritz method

We conclude this chapter with a discussion of the Rayleigh–Ritz method for estimating the eigenfrequencies of an oscillating system. We recall from the introduction to the chapter that for a system undergoing small oscillations the potential and kinetic energy are given by

$$V = \mathbf{q}^{\mathrm T}\mathbf{B}\mathbf{q} \quad\text{and}\quad T = \dot{\mathbf{q}}^{\mathrm T}\mathbf{A}\dot{\mathbf{q}},$$

where the components of $\mathbf{q}$ are the coordinates chosen to represent the configuration of the system and $\mathbf{A}$ and $\mathbf{B}$ are symmetric matrices (or may be chosen to be such). We also recall from (9.9) that the normal modes $\mathbf{x}^i$ and the eigenfrequencies $\omega_i$ are given by

$$(\mathbf{B} - \omega_i^2\mathbf{A})\mathbf{x}^i = \mathbf{0}. \tag{9.14}$$

It may be shown that the eigenvectors $\mathbf{x}^i$ corresponding to different normal modes are linearly independent and so form a complete set.
Thus, any coordinate vector $\mathbf{q}$ can be written $\mathbf{q} = \sum_j c_j\mathbf{x}^j$. We now consider the value of the generalised quadratic form

$$\lambda(\mathbf{x}) = \frac{\mathbf{x}^{\mathrm T}\mathbf{B}\mathbf{x}}{\mathbf{x}^{\mathrm T}\mathbf{A}\mathbf{x}} = \frac{\sum_m (\mathbf{x}^m)^{\mathrm T} c_m^{*}\,\mathbf{B}\sum_i c_i\mathbf{x}^i}{\sum_j (\mathbf{x}^j)^{\mathrm T} c_j^{*}\,\mathbf{A}\sum_k c_k\mathbf{x}^k},$$

which, since both numerator and denominator are positive definite, is itself non-negative. Equation (9.14) can be used to replace $\mathbf{B}\mathbf{x}^i$, with the result that

$$\lambda(\mathbf{x}) = \frac{\sum_m (\mathbf{x}^m)^{\mathrm T} c_m^{*}\,\mathbf{A}\sum_i \omega_i^2 c_i\mathbf{x}^i}{\sum_j (\mathbf{x}^j)^{\mathrm T} c_j^{*}\,\mathbf{A}\sum_k c_k\mathbf{x}^k} = \frac{\sum_m (\mathbf{x}^m)^{\mathrm T} c_m^{*}\sum_i \omega_i^2 c_i\,\mathbf{A}\mathbf{x}^i}{\sum_j (\mathbf{x}^j)^{\mathrm T} c_j^{*}\,\mathbf{A}\sum_k c_k\mathbf{x}^k}. \tag{9.15}$$

Now the eigenvectors $\mathbf{x}^i$ obtained by solving $(\mathbf{B} - \omega^2\mathbf{A})\mathbf{x} = \mathbf{0}$ are not mutually orthogonal unless either $\mathbf{A}$ or $\mathbf{B}$ is a multiple of the unit matrix. However, it may be shown that they do possess the desirable properties

$$(\mathbf{x}^j)^{\mathrm T}\mathbf{A}\mathbf{x}^i = 0 \quad\text{and}\quad (\mathbf{x}^j)^{\mathrm T}\mathbf{B}\mathbf{x}^i = 0 \quad\text{if } i \neq j. \tag{9.16}$$

This result is proved as follows. From (9.14) it is clear that, for general $i$ and $j$,

$$(\mathbf{x}^j)^{\mathrm T}(\mathbf{B} - \omega_i^2\mathbf{A})\mathbf{x}^i = 0. \tag{9.17}$$

But, by taking the transpose of (9.14) with $i$ replaced by $j$ and recalling that $\mathbf{A}$ and $\mathbf{B}$ are real and symmetric, we obtain

$$(\mathbf{x}^j)^{\mathrm T}(\mathbf{B} - \omega_j^2\mathbf{A}) = \mathbf{0}.$$

Forming the scalar product of this with $\mathbf{x}^i$ and subtracting the result from (9.17) gives

$$(\omega_j^2 - \omega_i^2)(\mathbf{x}^j)^{\mathrm T}\mathbf{A}\mathbf{x}^i = 0.$$

Thus, for $i \neq j$ and non-degenerate eigenvalues $\omega_i^2$ and $\omega_j^2$, we have that $(\mathbf{x}^j)^{\mathrm T}\mathbf{A}\mathbf{x}^i = 0$, and substituting this into (9.17) immediately establishes the corresponding result for $(\mathbf{x}^j)^{\mathrm T}\mathbf{B}\mathbf{x}^i$. Clearly, if either $\mathbf{A}$ or $\mathbf{B}$ is a multiple of the unit matrix then the eigenvectors are mutually orthogonal in the normal sense. The orthogonality relations (9.16) are derived again, and extended, in exercise 9.6.

Using the first of the relationships (9.16) to simplify (9.15), we find that

$$\lambda(\mathbf{x}) = \frac{\sum_i |c_i|^2\,\omega_i^2\,(\mathbf{x}^i)^{\mathrm T}\mathbf{A}\mathbf{x}^i}{\sum_k |c_k|^2\,(\mathbf{x}^k)^{\mathrm T}\mathbf{A}\mathbf{x}^k}. \tag{9.18}$$

Now, if $\omega_0^2$ is the lowest eigenfrequency then $\omega_i^2 \geq \omega_0^2$ for all $i$ and, further, since $(\mathbf{x}^i)^{\mathrm T}\mathbf{A}\mathbf{x}^i \geq 0$ for all $i$ the numerator of (9.18) is $\geq \omega_0^2\sum_i |c_i|^2 (\mathbf{x}^i)^{\mathrm T}\mathbf{A}\mathbf{x}^i$. Hence

$$\lambda(\mathbf{x}) \equiv \frac{\mathbf{x}^{\mathrm T}\mathbf{B}\mathbf{x}}{\mathbf{x}^{\mathrm T}\mathbf{A}\mathbf{x}} \geq \omega_0^2, \tag{9.19}$$

for any $\mathbf{x}$ whatsoever (whether $\mathbf{x}$ is an eigenvector or not). Thus we are able to estimate the lowest eigenfrequency of the system by evaluating $\lambda$ for a variety of vectors $\mathbf{x}$, the components of which, it will be recalled, give the ratios of the coordinate amplitudes. This is sometimes a useful approach if many coordinates are involved and direct solution for the eigenvalues is not possible.

An additional result is that the maximum eigenfrequency $\omega_m^2$ may also be estimated. It is obvious that if we replace the statement '$\omega_i^2 \geq \omega_0^2$ for all $i$' by '$\omega_i^2 \leq \omega_m^2$ for all $i$', then $\lambda(\mathbf{x}) \leq \omega_m^2$ for any $\mathbf{x}$. Thus $\lambda(\mathbf{x})$ always lies between the lowest and highest eigenfrequencies of the system. Furthermore, $\lambda(\mathbf{x})$ has a stationary value, equal to $\omega_k^2$, when $\mathbf{x}$ is the $k$th eigenvector (see subsection 8.17.1).

▶ Estimate the eigenfrequencies of the oscillating rod of section 9.1.

Firstly we recall that

$$\mathbf{A} = \frac{Ml^2}{12}\begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix} \quad\text{and}\quad \mathbf{B} = \frac{Mlg}{12}\begin{pmatrix} 6 & 0 \\ 0 & 3 \end{pmatrix}.$$

Physical intuition suggests that the slower mode will have a configuration approximating that of a simple pendulum (figure 9.1), in which $\theta_1 = \theta_2$, and so we use this as a trial vector. Taking $\mathbf{x} = (\theta \;\; \theta)^{\mathrm T}$,

$$\lambda(\mathbf{x}) = \frac{\mathbf{x}^{\mathrm T}\mathbf{B}\mathbf{x}}{\mathbf{x}^{\mathrm T}\mathbf{A}\mathbf{x}} = \frac{3Mlg\,\theta^2/4}{7Ml^2\theta^2/6} = \frac{9g}{14l} = 0.643\,\frac{g}{l},$$

and we conclude from (9.19) that the lower (angular) frequency is $\leq (0.643\,g/l)^{1/2}$. We have already seen in section 9.1 that the true answer is $(0.641\,g/l)^{1/2}$ and so we have come very close to it.

Next we turn to the higher frequency. Here, a typical pattern of oscillation is not so obvious but, rather preempting the answer, we try $\theta_2 = -2\theta_1$; we then obtain $\lambda = 9g/l$ and so conclude that the higher eigenfrequency $\geq (9g/l)^{1/2}$.
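The two trial-vector estimates for the rod can be reproduced with a few lines of code. This is a sketch under the same assumption as before, namely that the factors $Ml^2/12$ and $Mlg/12$ are divided out so that $\lambda$ is measured in units of $g/l$; `rayleigh_quotient` is an illustrative name.

```python
import math


def rayleigh_quotient(x, a, b):
    """Return (x^T b x) / (x^T a x) for square matrices a and b."""
    n = len(x)
    num = sum(x[i] * b[i][j] * x[j] for i in range(n) for j in range(n))
    den = sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))
    return num / den


A = [[6.0, 3.0], [3.0, 2.0]]  # kinetic matrix, factor M l^2 / 12 omitted
B = [[6.0, 0.0], [0.0, 3.0]]  # potential matrix, factor M l g / 12 omitted

# Trial vectors from the worked example: theta_1 = theta_2, theta_2 = -2 theta_1.
lower_estimate = rayleigh_quotient([1.0, 1.0], A, B)
upper_estimate = rayleigh_quotient([1.0, -2.0], A, B)

# Exact eigenvalues of the rod in the same units, for comparison.
exact_low = 5.0 - math.sqrt(19.0)
exact_high = 5.0 + math.sqrt(19.0)
print(lower_estimate, upper_estimate, exact_low, exact_high)
```

Any other trial vector may be substituted; by (9.19) and its counterpart for the largest eigenvalue, the quotient should always fall between the two exact values.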
We have already seen that the exact answer is $(9.359\,g/l)^{1/2}$ and so again we have come close to it. ◀

A simplified version of the Rayleigh–Ritz method may be used to estimate the eigenvalues of a symmetric (or in general Hermitian) matrix $\mathbf{B}$, the eigenvectors of which will be mutually orthogonal. By repeating the calculations leading to (9.18), $\mathbf{A}$ being replaced by the unit matrix $\mathbf{I}$, it is easily verified that if

$$\lambda(\mathbf{x}) = \frac{\mathbf{x}^{\mathrm T}\mathbf{B}\mathbf{x}}{\mathbf{x}^{\mathrm T}\mathbf{x}}$$

is evaluated for any vector $\mathbf{x}$ then

$$\lambda_1 \leq \lambda(\mathbf{x}) \leq \lambda_m,$$

where $\lambda_1, \lambda_2, \ldots, \lambda_m$ are the eigenvalues of $\mathbf{B}$ in order of increasing size. A similar result holds for Hermitian matrices.

9.4 Exercises

9.1 Three coupled pendulums swing perpendicularly to the horizontal line containing their points of suspension, and the following equations of motion are satisfied:

$$-m\ddot x_1 = cmx_1 + d(x_1 - x_2),$$
$$-M\ddot x_2 = cMx_2 + d(x_2 - x_1) + d(x_2 - x_3),$$
$$-m\ddot x_3 = cmx_3 + d(x_3 - x_2),$$

where $x_1$, $x_2$ and $x_3$ are measured from the equilibrium points; $m$, $M$ and $m$ are the masses of the pendulum bobs; and $c$ and $d$ are positive constants. Find the normal frequencies of the system and sketch the corresponding patterns of oscillation. What happens as $d \to 0$ or $d \to \infty$?

9.2 A double pendulum, smoothly pivoted at $A$, consists of two light rigid rods, $AB$ and $BC$, each of length $l$, which are smoothly jointed at $B$ and carry masses $m$ and $\alpha m$ at $B$ and $C$ respectively. The pendulum makes small oscillations in one plane under gravity. At time $t$, $AB$ and $BC$ make angles $\theta(t)$ and $\phi(t)$, respectively, with the downward vertical. Find quadratic expressions for the kinetic and potential energies of the system and hence show that the normal modes have angular frequencies given by

$$\omega^2 = \frac{g}{l}\left[1 + \alpha \pm \sqrt{\alpha(1 + \alpha)}\right].$$

For $\alpha = 1/3$, show that in one of the normal modes the mid-point of $BC$ does not move during the motion.

9.3 Continue the worked example, modelling a linear molecule, discussed at the end of section 9.1, for the case in which $\mu = 2$.
(a) Show that the eigenvectors derived there have the expected orthogonality properties with respect to both $\mathbf{A}$ and $\mathbf{B}$.
(b) For the situation in which the atoms are released from rest with initial displacements $x_1 = 2\epsilon$, $x_2 = -\epsilon$ and $x_3 = 0$, determine their subsequent motions and maximum displacements.

9.4 Consider the circuit consisting of three equal capacitors and two different inductors shown in the figure. For charges $Q_i$ on the capacitors and currents $I_i$ through the components, write down Kirchhoff's law for the total voltage change around each of two complete circuit loops. Note that, to within an unimportant constant, the conservation of current implies that $Q_3 = Q_1 - Q_2$. Express the loop equations in the form given in (9.7), namely

$$\mathbf{A}\ddot{\mathbf{Q}} + \mathbf{B}\mathbf{Q} = \mathbf{0}.$$

[Figure: circuit with inductors $L_1$, $L_2$, three capacitors $C$, currents $I_1$, $I_2$ and charges $Q_1$, $Q_2$, $Q_3$.]

Use this to show that the normal frequencies of the circuit are given by

$$\omega^2 = \frac{1}{CL_1L_2}\left[L_1 + L_2 \pm (L_1^2 + L_2^2 - L_1L_2)^{1/2}\right].$$

Obtain the same matrices and result by finding the total energy stored in the various capacitors (typically $Q^2/(2C)$) and in the inductors (typically $LI^2/2$).

For the special case $L_1 = L_2 = L$ determine the relevant eigenvectors and so describe the patterns of current flow in the circuit.

9.5 It is shown in physics and engineering textbooks that circuits containing capacitors and inductors can be analysed by replacing a capacitor of capacitance $C$ by a 'complex impedance' $1/(i\omega C)$ and an inductor of inductance $L$ by an impedance $i\omega L$, where $\omega$ is the angular frequency of the currents flowing and $i^2 = -1$.

Use this approach and Kirchhoff's circuit laws to analyse the circuit shown in the figure and obtain three linear equations governing the currents $I_1$, $I_2$ and $I_3$. Show that the only possible frequencies of self-sustaining currents satisfy either (a) $\omega^2 LC = 1$ or (b) $3\omega^2 LC = 1$.

[Figure: circuit with two inductors $L$, three capacitors $C$, currents $I_1$, $I_2$, $I_3$ and labelled points $P$, $Q$, $R$, $S$, $T$, $U$.]
Find the corresponding current patterns and, in each case, by identifying parts of the circuit in which no current flows, draw an equivalent circuit that contains only one capacitor and one inductor.

9.6 The simultaneous reduction to diagonal form of two real symmetric quadratic forms.

Consider the two real symmetric quadratic forms $\mathbf{u}^{\mathrm T}\mathbf{A}\mathbf{u}$ and $\mathbf{u}^{\mathrm T}\mathbf{B}\mathbf{u}$, where $\mathbf{u}^{\mathrm T}$ stands for the row matrix $(x \;\; y \;\; z)$, and denote by $\mathbf{u}^n$ those column matrices that satisfy

$$\mathbf{B}\mathbf{u}^n = \lambda_n\mathbf{A}\mathbf{u}^n, \tag{E9.1}$$

in which $n$ is a label and the $\lambda_n$ are real, non-zero and all different.

(a) By multiplying (E9.1) on the left by $(\mathbf{u}^m)^{\mathrm T}$, and the transpose of the corresponding equation for $\mathbf{u}^m$ on the right by $\mathbf{u}^n$, show that $(\mathbf{u}^m)^{\mathrm T}\mathbf{A}\mathbf{u}^n = 0$ for $n \neq m$.
(b) By noting that $\mathbf{A}\mathbf{u}^n = (\lambda_n)^{-1}\mathbf{B}\mathbf{u}^n$, deduce that $(\mathbf{u}^m)^{\mathrm T}\mathbf{B}\mathbf{u}^n = 0$ for $m \neq n$.
(c) It can be shown that the $\mathbf{u}^n$ are linearly independent; the next step is to construct a matrix $\mathbf{P}$ whose columns are the vectors $\mathbf{u}^n$.
(d) Make a change of variables $\mathbf{u} = \mathbf{P}\mathbf{v}$ such that $\mathbf{u}^{\mathrm T}\mathbf{A}\mathbf{u}$ becomes $\mathbf{v}^{\mathrm T}\mathbf{C}\mathbf{v}$, and $\mathbf{u}^{\mathrm T}\mathbf{B}\mathbf{u}$ becomes $\mathbf{v}^{\mathrm T}\mathbf{D}\mathbf{v}$. Show that $\mathbf{C}$ and $\mathbf{D}$ are diagonal by showing that $c_{ij} = 0$ if $i \neq j$, and similarly for $d_{ij}$.

Thus $\mathbf{u} = \mathbf{P}\mathbf{v}$ or $\mathbf{v} = \mathbf{P}^{-1}\mathbf{u}$ reduces both quadratics to diagonal form.

To summarise, the method is as follows:

(a) find the $\lambda_n$ that allow (E9.1) a non-zero solution, by solving $|\mathbf{B} - \lambda\mathbf{A}| = 0$;
(b) for each $\lambda_n$ construct $\mathbf{u}^n$;
(c) construct the non-singular matrix $\mathbf{P}$ whose columns are the vectors $\mathbf{u}^n$;
(d) make the change of variable $\mathbf{u} = \mathbf{P}\mathbf{v}$.

9.7 (It is recommended that the reader does not attempt this question until exercise 9.6 has been studied.)

If, in the pendulum system studied in section 9.1, the string is replaced by a second rod identical to the first then the expressions for the kinetic energy $T$ and the potential energy $V$ become (to second order in the $\theta_i$)

$$T \approx Ml^2\left(\tfrac{8}{3}\dot\theta_1^2 + 2\dot\theta_1\dot\theta_2 + \tfrac{2}{3}\dot\theta_2^2\right),$$
$$V \approx Mgl\left(\tfrac{3}{2}\theta_1^2 + \tfrac{1}{2}\theta_2^2\right).$$
Determine the normal frequencies of the system and find new variables $\xi$ and $\eta$ that will reduce these two expressions to diagonal form, i.e. to

$$a_1\dot\xi^2 + a_2\dot\eta^2 \quad\text{and}\quad b_1\xi^2 + b_2\eta^2.$$

9.8 (It is recommended that the reader does not attempt this question until exercise 9.6 has been studied.)

Find a real linear transformation that simultaneously reduces the quadratic forms

$$3x^2 + 5y^2 + 5z^2 + 2yz + 6zx - 2xy,$$
$$5x^2 + 12y^2 + 8yz + 4zx$$

to diagonal form.

9.9 Three particles of mass $m$ are attached to a light horizontal string having fixed ends, the string being thus divided into four equal portions each of length $a$ and under a tension $T$. Show that for small transverse vibrations the amplitudes $x_i$ of the normal modes satisfy $\mathbf{B}\mathbf{x} = (ma\omega^2/T)\mathbf{x}$, where $\mathbf{B}$ is the matrix

$$\begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}.$$

Estimate the lowest and highest eigenfrequencies using trial vectors $(3 \;\; 4 \;\; 3)^{\mathrm T}$ and $(3 \;\; {-4} \;\; 3)^{\mathrm T}$. Use also the exact vectors $\left(1 \;\; \sqrt{2} \;\; 1\right)^{\mathrm T}$ and $\left(1 \;\; {-\sqrt{2}} \;\; 1\right)^{\mathrm T}$ and compare the results.

9.10 Use the Rayleigh–Ritz method to estimate the lowest oscillation frequency of a heavy chain of $N$ links, each of length $a$ ($= L/N$), which hangs freely from one end. (Try simple calculable configurations such as all links but one vertical, or all links collinear, etc.)

9.5 Hints and answers

9.1 See figure 9.6.

9.3 (b) $x_1 = \epsilon(\cos\omega t + \cos\sqrt{2}\,\omega t)$, $x_2 = -\epsilon\cos\sqrt{2}\,\omega t$, $x_3 = \epsilon(-\cos\omega t + \cos\sqrt{2}\,\omega t)$. At various times the three displacements will reach $2\epsilon$, $\epsilon$, $2\epsilon$ respectively. For example, $x_1$ can be written as $2\epsilon\cos[(\sqrt{2} - 1)\omega t/2]\cos[(\sqrt{2} + 1)\omega t/2]$, i.e. an oscillation of angular frequency $(\sqrt{2} + 1)\omega/2$ and modulated amplitude $2\epsilon\cos[(\sqrt{2} - 1)\omega t/2]$; the amplitude will reach $2\epsilon$ after a time $\approx 4\pi/[\omega(\sqrt{2} - 1)]$.

9.5 As the circuit loops contain no voltage sources, the equations are homogeneous, and so for a non-trivial solution the determinant of coefficients must vanish.
(a) $I_1 = 0$, $I_2 = -I_3$; no current in $PQ$; equivalent to two separate circuits of capacitance $C$ and inductance $L$.
(b) $I_1 = -2I_2 = -2I_3$; no current in $TU$; capacitance $3C/2$ and inductance $2L$.

9.7 $\omega = (2.634\,g/l)^{1/2}$ or $(0.3661\,g/l)^{1/2}$; $\theta_1 = \xi + \eta$, $\theta_2 = 1.431\xi - 2.097\eta$.

9.9 Estimated, 10/17 aand B = B(ρ)forρa. Like $\mathbf{B}$, the vector potential is continuous at $\rho = a$.

(c) The gas pressure $p(\rho)$ satisfies the hydrostatic equation $\nabla p = \mathbf{J} \times \mathbf{B}$ and vanishes at the outer wall of the cylinder. Find a general expression for $p$.

10.18 Evaluate the Laplacian of a vector field using two different coordinate systems as follows.

(a) For cylindrical polar coordinates $\rho, \phi, z$, evaluate the derivatives of the three unit vectors with respect to each of the coordinates, showing that only $\partial\hat{\mathbf{e}}_\rho/\partial\phi$ and $\partial\hat{\mathbf{e}}_\phi/\partial\phi$ are non-zero.
 (i) Hence evaluate $\nabla^2\mathbf{a}$ when $\mathbf{a}$ is the vector $\hat{\mathbf{e}}_\rho$, i.e. a vector of unit magnitude everywhere directed radially outwards and expressed by $a_\rho = 1$, $a_\phi = a_z = 0$.
 (ii) Note that it is trivially obvious that $\nabla \times \mathbf{a} = \mathbf{0}$ and hence that equation (10.41) requires that $\nabla(\nabla\cdot\mathbf{a}) = \nabla^2\mathbf{a}$.
 (iii) Evaluate $\nabla(\nabla\cdot\mathbf{a})$ and show that the latter equation holds, but that $[\nabla(\nabla\cdot\mathbf{a})]_\rho \neq \nabla^2 a_\rho$.
(b) Rework the same problem in Cartesian coordinates (where, as it happens, the algebra is more complicated).

10.19 Maxwell's equations for electromagnetism in free space (i.e. in the absence of charges, currents and dielectric or magnetic media) can be written

(i) $\nabla\cdot\mathbf{B} = 0$, (ii) $\nabla\cdot\mathbf{E} = 0$,
(iii) $\nabla\times\mathbf{E} + \dfrac{\partial\mathbf{B}}{\partial t} = \mathbf{0}$, (iv) $\nabla\times\mathbf{B} - \dfrac{1}{c^2}\dfrac{\partial\mathbf{E}}{\partial t} = \mathbf{0}$.

A vector $\mathbf{A}$ is defined by $\mathbf{B} = \nabla\times\mathbf{A}$, and a scalar $\phi$ by $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$. Show that if the condition

(v) $\nabla\cdot\mathbf{A} + \dfrac{1}{c^2}\dfrac{\partial\phi}{\partial t} = 0$

is imposed (this is known as choosing the Lorentz gauge), then $\mathbf{A}$ and $\phi$ satisfy wave equations as follows:

(vi) $\nabla^2\phi - \dfrac{1}{c^2}\dfrac{\partial^2\phi}{\partial t^2} = 0$,
(vii) $\nabla^2\mathbf{A} - \dfrac{1}{c^2}\dfrac{\partial^2\mathbf{A}}{\partial t^2} = \mathbf{0}$.

The reader is invited to proceed as follows.

(a) Verify that the expressions for $\mathbf{B}$ and $\mathbf{E}$ in terms of $\mathbf{A}$ and $\phi$ are consistent with (i) and (iii).
(b) Substitute for $\mathbf{E}$ in (ii) and use the derivative with respect to time of (v) to eliminate $\mathbf{A}$ from the resulting expression. Hence obtain (vi).
(c) Substitute for $\mathbf{B}$ and $\mathbf{E}$ in (iv) in terms of $\mathbf{A}$ and $\phi$. Then use the gradient of (v) to simplify the resulting equation and so obtain (vii).

10.20 In a description of the flow of a very viscous fluid that uses spherical polar coordinates with axial symmetry, the components of the velocity field $\mathbf{u}$ are given in terms of the stream function $\psi$ by

$$u_r = \frac{1}{r^2\sin\theta}\frac{\partial\psi}{\partial\theta}, \qquad u_\theta = \frac{-1}{r\sin\theta}\frac{\partial\psi}{\partial r}.$$

Find an explicit expression for the differential operator $E$ defined by

$$E\psi = -(r\sin\theta)(\nabla\times\mathbf{u})_\phi.$$

The stream function satisfies the equation of motion $E^2\psi = 0$ and, for the flow of a fluid past a sphere, takes the form $\psi(r, \theta) = f(r)\sin^2\theta$. Show that $f(r)$ satisfies the (ordinary) differential equation

$$r^4 f^{(4)} - 4r^2 f'' + 8r f' - 8f = 0.$$

10.21 Paraboloidal coordinates $u, v, \phi$ are defined in terms of Cartesian coordinates by

$$x = uv\cos\phi, \qquad y = uv\sin\phi, \qquad z = \tfrac{1}{2}(u^2 - v^2).$$

Identify the coordinate surfaces in the $u, v, \phi$ system. Verify that each coordinate surface ($u$ = constant, say) intersects every coordinate surface on which one of the other two coordinates ($v$, say) is constant. Show further that the system of coordinates is an orthogonal one and determine its scale factors. Prove that the $u$-component of $\nabla\times\mathbf{a}$ is given by

$$\frac{1}{(u^2 + v^2)^{1/2}}\left(\frac{a_\phi}{v} + \frac{\partial a_\phi}{\partial v}\right) - \frac{1}{uv}\frac{\partial a_v}{\partial\phi}.$$

10.22 Non-orthogonal curvilinear coordinates are difficult to work with and should be avoided if at all possible, but the following example is provided to illustrate the content of section 10.10.

In a new coordinate system for the region of space in which the Cartesian coordinate $z$ satisfies $z \geq 0$, the position of a point $\mathbf{r}$ is given by $(\alpha_1, \alpha_2, R)$, where $\alpha_1$ and $\alpha_2$ are respectively the cosines of the angles made by $\mathbf{r}$ with the $x$- and $y$-coordinate axes of a Cartesian system and $R = |\mathbf{r}|$. The ranges are $-1 \leq \alpha_i \leq 1$, $0 \leq R < \infty$.
(a) Express $\mathbf{r}$ in terms of $\alpha_1, \alpha_2, R$ and the unit Cartesian vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}$.
(b) Obtain expressions for the vectors $\mathbf{e}_i$ ($= \partial\mathbf{r}/\partial\alpha_1, \ldots$) and hence show that the scale factors $h_i$ are given by
\[ h_1 = \frac{R(1-\alpha_2^2)^{1/2}}{(1-\alpha_1^2-\alpha_2^2)^{1/2}}, \qquad h_2 = \frac{R(1-\alpha_1^2)^{1/2}}{(1-\alpha_1^2-\alpha_2^2)^{1/2}}, \qquad h_3 = 1. \]
(c) Verify formally that the system is not an orthogonal one.
(d) Show that the volume element of the coordinate system is
\[ dV = \frac{R^2\,d\alpha_1\,d\alpha_2\,dR}{(1-\alpha_1^2-\alpha_2^2)^{1/2}}, \]
and demonstrate that this is always less than or equal to the corresponding expression for an orthogonal curvilinear system.
(e) Calculate the expression for $(ds)^2$ for the system, and show that it differs from that for the corresponding orthogonal system by
\[ \frac{2\alpha_1\alpha_2 R^2}{1-\alpha_1^2-\alpha_2^2}\,d\alpha_1\,d\alpha_2. \]
10.23 Hyperbolic coordinates $u, v, \phi$ are defined in terms of Cartesian coordinates by
\[ x = \cosh u\cos v\cos\phi, \qquad y = \cosh u\cos v\sin\phi, \qquad z = \sinh u\sin v. \]
Sketch the coordinate curves in the $\phi = 0$ plane, showing that far from the origin they become concentric circles and radial lines. In particular, identify the curves $u = 0$, $v = 0$, $v = \pi/2$ and $v = \pi$. Calculate the tangent vectors at a general point, show that they are mutually orthogonal and deduce that the appropriate scale factors are
\[ h_u = h_v = (\cosh^2 u - \cos^2 v)^{1/2}, \qquad h_\phi = \cosh u\cos v. \]
Find the most general function $\psi(u)$ of $u$ only that satisfies Laplace's equation $\nabla^2\psi = 0$.
10.24 In a Cartesian system, $A$ and $B$ are the points $(0, 0, -1)$ and $(0, 0, 1)$ respectively. In a new coordinate system a general point $P$ is given by $(u_1, u_2, u_3)$ with $u_1 = \tfrac{1}{2}(r_1 + r_2)$, $u_2 = \tfrac{1}{2}(r_1 - r_2)$, $u_3 = \phi$; here $r_1$ and $r_2$ are the distances $AP$ and $BP$ and $\phi$ is the angle between the plane $ABP$ and $y = 0$.
(a) Express $z$ and the perpendicular distance $\rho$ from $P$ to the $z$-axis in terms of $u_1, u_2, u_3$.
(b) Evaluate $\partial x/\partial u_i$, $\partial y/\partial u_i$, $\partial z/\partial u_i$, for $i = 1, 2, 3$.
(c) Find the Cartesian components of $\hat{\mathbf{u}}_j$ and hence show that the new coordinates are mutually orthogonal.
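Evaluate the scale factors and the infinitesimal volume element in the new coordinate system.
(d) Determine and sketch the forms of the surfaces $u_i$ = constant.
(e) Find the most general function $f$ of $u_1$ only that satisfies $\nabla^2 f = 0$.

10.12 Hints and answers

10.1 Group the terms so that they form the total derivatives of compound vector expressions. The integral has the value $\mathbf{a}\times(\mathbf{a}\times\mathbf{b}) + \mathbf{h}$.
10.3 For crossed uniform fields, $\ddot{x} + (Bq/m)^2 x = q(E - Bv_0)/m$, $\ddot{y} = 0$, $m\dot{z} = qBx + mv_0$; (b) $\xi = Bqt/m$; the path is a cycloid in the plane $y = 0$; $ds = [(dx/dt)^2 + (dz/dt)^2]^{1/2}\,dt$.
10.5 $\mathbf{g} = \ddot{\mathbf{r}}' - \boldsymbol{\omega}\times(\boldsymbol{\omega}\times\mathbf{r})$, where $\ddot{\mathbf{r}}'$ is the shell's acceleration measured by an observer fixed in space. To first order in $\omega$, the direction of $\mathbf{g}$ is radial, i.e. parallel to $\ddot{\mathbf{r}}'$.
(a) Note that $\mathbf{s}$ is orthogonal to $\mathbf{g}$.
(b) If the actual time of flight is $T$, use $(\mathbf{s} + \boldsymbol{\Delta})\cdot\mathbf{g} = 0$ to show that $T \approx \tau(1 + 2g^{-2}(\mathbf{g}\times\boldsymbol{\omega})\cdot\mathbf{v} + \cdots)$. In the Coriolis terms, it is sufficient to put $T \approx \tau$.
(c) For this situation $(\mathbf{g}\times\boldsymbol{\omega})\cdot\mathbf{v} = 0$ and $\boldsymbol{\omega}\times\mathbf{v} = \mathbf{0}$; $\tau \approx 43$ s and $\Delta$ = 10–15 m to the East.
10.7 (a) Evaluate $(d\mathbf{r}/du)\cdot(d\mathbf{r}/du)$.
(b) Integrate the previous result between $u = 0$ and $u = 1$.
(c) $\hat{\mathbf{t}} = [\sqrt{2}(1+u^2)]^{-1}[(1-u^2)\mathbf{i} + 2u\mathbf{j} + (1+u^2)\mathbf{k}]$. Use $d\hat{\mathbf{t}}/ds = (d\hat{\mathbf{t}}/du)/(ds/du)$; $\rho^{-1} = |d\hat{\mathbf{t}}/ds|$.
(d) $\hat{\mathbf{n}} = (1+u^2)^{-1}[-2u\mathbf{i} + (1-u^2)\mathbf{j}]$. $\hat{\mathbf{b}} = [\sqrt{2}(1+u^2)]^{-1}[(u^2-1)\mathbf{i} - 2u\mathbf{j} + (1+u^2)\mathbf{k}]$. Use $d\hat{\mathbf{b}}/ds = (d\hat{\mathbf{b}}/du)/(ds/du)$ and show that this equals $-[3a(1+u^2)^2]^{-1}\hat{\mathbf{n}}$.
(e) Show that $d\hat{\mathbf{n}}/ds = \tau(\hat{\mathbf{b}} - \hat{\mathbf{t}}) = -2[3\sqrt{2}\,a(1+u^2)^3]^{-1}[(1-u^2)\mathbf{i} + 2u\mathbf{j}]$.
10.9 Note that $d\mathbf{B} = (d\mathbf{r}\cdot\nabla)\mathbf{B}$ and that $\mathbf{B} = B\hat{\mathbf{t}}$, with $\hat{\mathbf{t}} = d\mathbf{r}/ds$. Obtain $(\mathbf{B}\cdot\nabla)\mathbf{B}/B = \hat{\mathbf{t}}(dB/ds) + \hat{\mathbf{n}}(B/\rho)$ and then take the vector product of $\hat{\mathbf{t}}$ with this equation.
10.11 To integrate $\sec^2\phi\,(\sec^2\phi + \tan^2\phi)^{1/2}\,d\phi$ put $\tan\phi = 2^{-1/2}\sinh\psi$.
10.13 Work in Cartesian coordinates, regrouping the terms obtained by evaluating the divergence on the LHS.
10.15 (a) $2z(x^2+y^2+z^2)^{-3}[(y^2+z^2)(y^2+z^2-3x^2) - 4x^4]$; (b) $2r^{-1}\cos\theta\,(1 - 5\sin^2\theta\cos^2\phi)$; both are equal to $2zr^{-4}(r^2 - 5x^2)$.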
10.17 Use the formulae given in table 10.2. (a) $C = -B_0/(\mu_0 a)$; $B(\rho) = B_0\rho/a$. (b) B 0 ρ 2 /(3a)forρa. (c) $[B_0^2/(2\mu_0)][1 - (\rho/a)^2]$.
10.19 Recall that $\nabla\times\nabla\phi = \mathbf{0}$ for any scalar $\phi$ and that $\partial/\partial t$ and $\nabla$ act on different variables.
10.21 Two sets of paraboloids of revolution about the $z$-axis and the sheaf of planes containing the $z$-axis. For constant u,−∞ 0, 0 for $r \neq p$, (12.2)
\[ \int_{x_0}^{x_0+L} \sin\left(\frac{2\pi r x}{L}\right)\sin\left(\frac{2\pi p x}{L}\right)dx = \begin{cases} 0 & \text{for } r = p = 0, \\ \tfrac{1}{2}L & \text{for } r = p > 0, \\ 0 & \text{for } r \neq p, \end{cases} \tag{12.3} \]
where $r$ and $p$ are integers greater than or equal to zero; these formulae are easily derived. A full discussion of why it is possible to expand a function as a sum of mutually orthogonal functions is given in chapter 17.
The Fourier series expansion of the function $f(x)$ is conventionally written
\[ f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty}\left[a_r\cos\left(\frac{2\pi r x}{L}\right) + b_r\sin\left(\frac{2\pi r x}{L}\right)\right], \tag{12.4} \]
where $a_0, a_r, b_r$ are constants called the Fourier coefficients. These coefficients are analogous to those in a power series expansion and the determination of their numerical values is the essential step in writing a function as a Fourier series.
This chapter continues with a discussion of how to find the Fourier coefficients for particular functions. We then discuss simplifications to the general Fourier series that may save considerable effort in calculations. This is followed by the alternative representation of a function as a complex Fourier series, and we conclude with a discussion of Parseval's theorem.

12.2 The Fourier coefficients

We have indicated that a function that satisfies the Dirichlet conditions may be written in the form (12.4). We now consider how to find the Fourier coefficients for any particular function.
For a periodic function $f(x)$ of period $L$ we will find that the Fourier coefficients are given by
\[ a_r = \frac{2}{L}\int_{x_0}^{x_0+L} f(x)\cos\left(\frac{2\pi r x}{L}\right)dx, \tag{12.5} \]
\[ b_r = \frac{2}{L}\int_{x_0}^{x_0+L} f(x)\sin\left(\frac{2\pi r x}{L}\right)dx, \tag{12.6} \]
where $x_0$ is arbitrary but is often taken as $0$ or $-L/2$. The apparently arbitrary factor $\tfrac{1}{2}$ which appears in the $a_0$ term in (12.4) is included so that (12.5) may apply for $r = 0$ as well as $r > 0$.
The relations (12.5) and (12.6) may be derived as follows. Suppose the Fourier series expansion of $f(x)$ can be written as in (12.4),
\[ f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty}\left[a_r\cos\left(\frac{2\pi r x}{L}\right) + b_r\sin\left(\frac{2\pi r x}{L}\right)\right]. \]
Then, multiplying by $\cos(2\pi p x/L)$, integrating over one full period in $x$ and changing the order of the summation and integration, we get
\[ \int_{x_0}^{x_0+L} f(x)\cos\left(\frac{2\pi p x}{L}\right)dx = \frac{a_0}{2}\int_{x_0}^{x_0+L}\cos\left(\frac{2\pi p x}{L}\right)dx + \sum_{r=1}^{\infty} a_r\int_{x_0}^{x_0+L}\cos\left(\frac{2\pi r x}{L}\right)\cos\left(\frac{2\pi p x}{L}\right)dx + \sum_{r=1}^{\infty} b_r\int_{x_0}^{x_0+L}\sin\left(\frac{2\pi r x}{L}\right)\cos\left(\frac{2\pi p x}{L}\right)dx. \tag{12.7} \]
We can now find the Fourier coefficients by considering (12.7) as $p$ takes different values. Using the orthogonality conditions (12.1)–(12.3) of the previous section, we find that when $p = 0$ (12.7) becomes
\[ \int_{x_0}^{x_0+L} f(x)\,dx = \frac{a_0}{2}L. \]
When $p \neq 0$ the only non-vanishing term on the RHS of (12.7) occurs when $r = p$, and so, relabelling $p$ as $r$,
\[ \int_{x_0}^{x_0+L} f(x)\cos\left(\frac{2\pi r x}{L}\right)dx = \frac{a_r}{2}L. \]
The other Fourier coefficients $b_r$ may be found by repeating the above process but multiplying by $\sin(2\pi p x/L)$ instead of $\cos(2\pi p x/L)$ (see exercise 12.2).
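The formulae (12.5) and (12.6) can also be evaluated numerically. The following is a minimal sketch, not part of the original text: it assumes a simple midpoint rule, and the names `fourier_coefficients`, `trial` and `PERIOD` are illustrative. The trial function is chosen so that its coefficients are known in advance ($a_0/2 = 1$, $a_1 = 2$, $b_2 = 3$, all others zero).

```python
import math

def fourier_coefficients(f, L, r_max, x0=0.0, n=2000):
    """Approximate a_r and b_r of (12.5)-(12.6) for r = 0..r_max.

    The integrals over one period [x0, x0 + L] are evaluated with the
    midpoint rule on n equal sub-intervals (an illustrative choice).
    """
    h = L / n
    xs = [x0 + (k + 0.5) * h for k in range(n)]
    fs = [f(x) for x in xs]
    a, b = [], []
    for r in range(r_max + 1):
        w = 2.0 * math.pi * r / L
        a.append(2.0 / L * h * sum(fx * math.cos(w * x) for fx, x in zip(fs, xs)))
        b.append(2.0 / L * h * sum(fx * math.sin(w * x) for fx, x in zip(fs, xs)))
    return a, b

PERIOD = 2.0

def trial(x):
    # Trial function with known coefficients: a_0/2 = 1, a_1 = 2, b_2 = 3.
    return (1.0 + 2.0 * math.cos(2.0 * math.pi * x / PERIOD)
            + 3.0 * math.sin(4.0 * math.pi * x / PERIOD))

a, b = fourier_coefficients(trial, PERIOD, r_max=3)
print([round(v, 6) for v in a])  # a_0, a_1, a_2, a_3
print([round(v, 6) for v in b])  # b_0 (always zero), b_1, b_2, b_3
```

Note that `a[0]` estimates $a_0$, so the constant term of the series is `a[0] / 2`, in line with the factor $\tfrac{1}{2}$ in (12.4).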
▶Express the square-wave function illustrated in figure 12.2 as a Fourier series.
Physically this might represent the input to an electrical circuit that switches between a high and a low state with time period $T$. The square wave may be represented by
\[ f(t) = \begin{cases} -1 & \text{for } -\tfrac{1}{2}T \leq t < 0, \\ +1 & \text{for } 0 \leq t < \tfrac{1}{2}T. \end{cases} \]
In deriving the Fourier coefficients, we note firstly that the function is an odd function and so the series will contain only sine terms (this simplification is discussed further in the following section). To evaluate the coefficients in the sine series we use (12.6). Hence
\[ b_r = \frac{2}{T}\int_{-T/2}^{T/2} f(t)\sin\left(\frac{2\pi r t}{T}\right)dt = \frac{4}{T}\int_{0}^{T/2}\sin\left(\frac{2\pi r t}{T}\right)dt = \frac{2}{\pi r}\left[1 - (-1)^r\right]. \]
Thus the sine coefficients are zero if $r$ is even and equal to $4/(\pi r)$ if $r$ is odd. Hence the Fourier series for the square-wave function may be written as
\[ f(t) = \frac{4}{\pi}\left(\sin\omega t + \frac{\sin 3\omega t}{3} + \frac{\sin 5\omega t}{5} + \cdots\right), \tag{12.8} \]
where $\omega = 2\pi/T$ is called the angular frequency. ◀

Figure 12.2 A square-wave function.

12.3 Symmetry considerations

The example in the previous section employed the useful property that since the function to be represented was odd, all the cosine terms of the Fourier series were absent. It is often the case that the function we wish to express as a Fourier series has a particular symmetry, which we can exploit to reduce the calculational labour of evaluating Fourier coefficients. Functions that are symmetric or antisymmetric about the origin (i.e. even and odd functions respectively) admit particularly useful simplifications. Functions that are odd in $x$ have no cosine terms (see section 12.1) and all the $a$-coefficients are equal to zero. Similarly, functions that are even in $x$ have no sine terms and all the $b$-coefficients are zero.
Since the Fourier series of odd or even functions contain only half the coefficients required for a general periodic function, there is a considerable reduction in the algebra needed to find a Fourier series.
The consequences of symmetry or antisymmetry of the function about the quarter period (i.e. about $L/4$) are a little less obvious. Furthermore, the results are not used as often as those above and the remainder of this section can be omitted on a first reading without loss of continuity. The following argument gives the required results.
Suppose that $f(x)$ has even or odd symmetry about $L/4$, i.e. $f(L/4 - x) = \pm f(x - L/4)$. For convenience, we make the substitution $s = x - L/4$ and hence $f(-s) = \pm f(s)$. We can now see that
\[ b_r = \frac{2}{L}\int_{x_0}^{x_0+L} f(s)\sin\left(\frac{2\pi r s}{L} + \frac{\pi r}{2}\right)ds, \]
where the limits of integration have been left unaltered since $f$ is, of course, periodic in $s$ as well as in $x$. If we use the expansion
\[ \sin\left(\frac{2\pi r s}{L} + \frac{\pi r}{2}\right) = \sin\left(\frac{2\pi r s}{L}\right)\cos\left(\frac{\pi r}{2}\right) + \cos\left(\frac{2\pi r s}{L}\right)\sin\left(\frac{\pi r}{2}\right), \]
we can immediately see that the trigonometric part of the integrand is an odd function of $s$ if $r$ is even and an even function of $s$ if $r$ is odd. Hence if $f(s)$ is even and $r$ is even then the integral is zero, and if $f(s)$ is odd and $r$ is odd then the integral is zero. Similar results can be derived for the Fourier $a$-coefficients and we conclude that
(i) if $f(x)$ is even about $L/4$ then $a_{2r+1} = 0$ and $b_{2r} = 0$,
(ii) if $f(x)$ is odd about $L/4$ then $a_{2r} = 0$ and $b_{2r+1} = 0$.
All the above results follow automatically when the Fourier coefficients are evaluated in any particular case, but prior knowledge of them will often enable some coefficients to be set equal to zero on inspection and so substantially reduce the computational labour.
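These symmetry rules can be checked numerically on the square wave of section 12.2, which is odd about $t = 0$ and even about $t = T/4$. The sketch below is illustrative only: it assumes $T = 1$ and a midpoint rule, and the helper names `square_wave` and `coefficient` are not from the text. It compares the computed $b_r$ with the closed form $2[1 - (-1)^r]/(\pi r)$ found earlier.

```python
import math

T = 1.0  # assumed period, chosen for illustration

def square_wave(t):
    # The square wave of section 12.2 on one period [-T/2, T/2).
    return -1.0 if t < 0.0 else 1.0

def coefficient(kind, r, n=4000):
    """Midpoint-rule estimate of a_r (kind='a') or b_r (kind='b') over [-T/2, T/2].

    n is even so that the discontinuity at t = 0 falls on a cell boundary.
    """
    trig = math.cos if kind == "a" else math.sin
    h = T / n
    total = 0.0
    for k in range(n):
        t = -T / 2.0 + (k + 0.5) * h
        total += square_wave(t) * trig(2.0 * math.pi * r * t / T)
    return 2.0 / T * h * total

a_coeffs = [coefficient("a", r) for r in range(0, 7)]   # a_0 .. a_6
b_coeffs = [coefficient("b", r) for r in range(1, 7)]   # b_1 .. b_6
expected_b = [2.0 / (math.pi * r) * (1 - (-1) ** r) for r in range(1, 7)]

for r, (num, exact) in enumerate(zip(b_coeffs, expected_b), start=1):
    print(r, round(num, 6), round(exact, 6))
```

All the `a_coeffs` and the even-indexed `b_coeffs` come out at rounding-error level, so only the odd sine harmonics need to be computed.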
As an example, the square-wave function shown in figure 12.2 is (i) an odd function of $t$, so that all $a_r = 0$, and (ii) even about the point $t = T/4$, so that $b_{2r} = 0$. Thus we can say immediately that only sine terms of odd harmonics will be present and therefore will need to be calculated; this is confirmed in the expansion (12.8).

12.4 Discontinuous functions

The Fourier series expansion usually works well for functions that are discontinuous in the required range. However, the series itself does not produce a discontinuous function and we state without proof that the value of the expanded $f(x)$ at a discontinuity will be half-way between the upper and lower values. Expressing this more mathematically, at a point of finite discontinuity, $x_d$, the Fourier series converges to
\[ \tfrac{1}{2}\lim_{\epsilon\to 0}\left[f(x_d + \epsilon) + f(x_d - \epsilon)\right]. \]
At a discontinuity, the Fourier series representation of the function will overshoot its value. Although as more terms are included the overshoot moves in position arbitrarily close to the discontinuity, it never disappears even in the limit of an infinite number of terms. This behaviour is known as Gibbs' phenomenon. A full discussion is not pursued here but suffice it to say that the size of the overshoot is proportional to the magnitude of the discontinuity.

Figure 12.3 The convergence of a Fourier series expansion of a square-wave function, including (a) one term, (b) two terms, (c) three terms and (d) 20 terms. The overshoot $\delta$ is shown in (d).

▶Find the value to which the Fourier series of the square-wave function discussed in section 12.2 converges at $t = 0$.
It can be seen that the function is discontinuous at $t = 0$ and, by the above rule, we expect the series to converge to a value half-way between the upper and lower values, in other words to converge to zero in this case.
Considering the Fourier series of this function, (12.8), we see that all the terms vanish at $t = 0$ and hence the Fourier series converges to zero as expected. The Gibbs phenomenon for the square-wave function is shown in figure 12.3. ◀

12.5 Non-periodic functions

We have already mentioned that a Fourier representation may sometimes be used for non-periodic functions. If we wish to find the Fourier series of a non-periodic function only within a fixed range then we may continue the function outside the range so as to make it periodic. The Fourier series of this periodic function would then correctly represent the non-periodic function in the desired range. Since we are often at liberty to extend the function in a number of ways, we can sometimes make it odd or even and so reduce the calculation required.
Figure 12.4(b) shows the simplest extension to the function shown in figure 12.4(a). However, this extension has no particular symmetry. Figures 12.4(c), (d) show extensions as odd and even functions respectively with the benefit that only sine or cosine terms appear in the resulting Fourier series. We note that these last two extensions give a function of period $2L$.

Figure 12.4 Possible periodic extensions of a function.

In view of the result of section 12.4, it must be added that the continuation must not be discontinuous at the end-points of the interval of interest; if it is, the series will not converge to the required value there. This requirement that the series converges appropriately may reduce the choice of continuations. This is discussed further at the end of the following example.
▶Find the Fourier series of $f(x) = x^2$ for 0 90% of the total.
12.21 $c_n = [(-1)^n\sinh\pi]/[\pi(1 + n^2)]$. Having set $x = 0$, separate out the $n = 0$ term and note that $(-1)^n = (-1)^{-n}$.
12.23 $(\pi^2 - 8)/16$.
12.25 (b) All $a_n$ and $\alpha_n$ are zero; $b_n = 2(-1)^{n+1}/(n\pi)$ and $\beta_n = 4/(n\pi)$.
You will need the result quoted in exercise 12.19.

13 Integral transforms

In the previous chapter we encountered the Fourier series representation of a periodic function in a fixed interval as a superposition of sinusoidal functions. It is often desirable, however, to obtain such a representation even for functions defined over an infinite interval and with no particular periodicity. Such a representation is called a Fourier transform and is one of a class of representations called integral transforms.
We begin by considering Fourier transforms as a generalisation of Fourier series. We then go on to discuss the properties of the Fourier transform and its applications. In the second part of the chapter we present an analogous discussion of the closely related Laplace transform.

13.1 Fourier transforms

The Fourier transform provides a representation of functions defined over an infinite interval and having no particular periodicity, in terms of a superposition of sinusoidal functions. It may thus be considered as a generalisation of the Fourier series representation of periodic functions. Since Fourier transforms are often used to represent time-varying functions, we shall present much of our discussion in terms of $f(t)$, rather than $f(x)$, although in some spatial examples $f(x)$ will be the more natural notation and we shall use it as appropriate. Our only requirement on $f(t)$ will be that $\int_{-\infty}^{\infty}|f(t)|\,dt$ is finite.
In order to develop the transition from Fourier series to Fourier transforms, we first recall that a function of period $T$ may be represented as a complex Fourier series, cf. (12.9),
\[ f(t) = \sum_{r=-\infty}^{\infty} c_r e^{2\pi i r t/T} = \sum_{r=-\infty}^{\infty} c_r e^{i\omega_r t}, \tag{13.1} \]
where $\omega_r = 2\pi r/T$.
As the period $T$ tends to infinity, the 'frequency quantum' $\Delta\omega = 2\pi/T$ becomes vanishingly small and the spectrum of allowed frequencies $\omega_r$ becomes a continuum. Thus, the infinite sum of terms in the Fourier series becomes an integral, and the coefficients $c_r$ become functions of the continuous variable $\omega$, as follows.

Figure 13.1 The relationship between the Fourier terms for a function of period $T$ and the Fourier integral (the area below the solid line) of the function.

We recall, cf. (12.10), that the coefficients $c_r$ in (13.1) are given by
\[ c_r = \frac{1}{T}\int_{-T/2}^{T/2} f(t)e^{-2\pi i r t/T}\,dt = \frac{\Delta\omega}{2\pi}\int_{-T/2}^{T/2} f(t)e^{-i\omega_r t}\,dt, \tag{13.2} \]
where we have written the integral in two alternative forms and, for convenience, made one period run from $-T/2$ to $+T/2$ rather than from $0$ to $T$. Substituting from (13.2) into (13.1) gives
\[ f(t) = \sum_{r=-\infty}^{\infty}\frac{\Delta\omega}{2\pi}\int_{-T/2}^{T/2} f(u)e^{-i\omega_r u}\,du\; e^{i\omega_r t}. \tag{13.3} \]
At this stage $\omega_r$ is still a discrete function of $r$ equal to $2\pi r/T$.
The solid points in figure 13.1 are a plot of (say, the real part of) $c_r e^{i\omega_r t}$ as a function of $r$ (or equivalently of $\omega_r$) and it is clear that $(2\pi/T)c_r e^{i\omega_r t}$ gives the area of the $r$th broken-line rectangle. If $T$ tends to $\infty$ then $\Delta\omega$ ($= 2\pi/T$) becomes infinitesimal, the width of the rectangles tends to zero and, from the mathematical definition of an integral,
\[ \sum_{r=-\infty}^{\infty}\frac{\Delta\omega}{2\pi}\,g(\omega_r)e^{i\omega_r t} \to \frac{1}{2\pi}\int_{-\infty}^{\infty} g(\omega)e^{i\omega t}\,d\omega. \]
In this particular case
\[ g(\omega_r) = \int_{-T/2}^{T/2} f(u)e^{-i\omega_r u}\,du, \]
and (13.3) becomes
\[ f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\, e^{i\omega t}\int_{-\infty}^{\infty} du\, f(u)e^{-i\omega u}. \tag{13.4} \]
This result is known as Fourier's inversion theorem.
From it we may define the Fourier transform of $f(t)$ by
\[ \tilde{f}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt, \tag{13.5} \]
and its inverse by
\[ f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{f}(\omega)e^{i\omega t}\,d\omega. \]
(13.6)
Including the constant $1/\sqrt{2\pi}$ in the definition of $\tilde{f}(\omega)$ (whose mathematical existence as $T \to \infty$ is assumed here without proof) is clearly arbitrary, the only requirement being that the product of the constants in (13.5) and (13.6) should equal $1/(2\pi)$. Our definition is chosen to be as symmetric as possible.
▶Find the Fourier transform of the exponential decay function $f(t) = 0$ for $t < 0$ and $f(t) = Ae^{-\lambda t}$ for $t \geq 0$ ($\lambda > 0$).
Using the definition (13.5) and separating the integral into two parts,
\[ \tilde{f}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0}(0)e^{-i\omega t}\,dt + \frac{A}{\sqrt{2\pi}}\int_{0}^{\infty} e^{-\lambda t}e^{-i\omega t}\,dt = 0 + \frac{A}{\sqrt{2\pi}}\left[-\frac{e^{-(\lambda + i\omega)t}}{\lambda + i\omega}\right]_{0}^{\infty} = \frac{A}{\sqrt{2\pi}(\lambda + i\omega)}, \]
which is the required transform. It is clear that the multiplicative constant $A$ does not affect the form of the transform, merely its amplitude. This transform may be verified by resubstitution of the above result into (13.6) to recover $f(t)$, but evaluation of the integral requires the use of complex-variable contour integration (chapter 24). ◀

13.1.1 The uncertainty principle

An important function that appears in many areas of physical science, either precisely or as an approximation to a physical situation, is the Gaussian or normal distribution. Its Fourier transform is of importance both in itself and also because, when interpreted statistically, it readily illustrates a form of uncertainty principle.
▶Find the Fourier transform of the normalised Gaussian distribution $f(t) = \dfrac{1}{\tau\sqrt{2\pi}}\exp\left(-\dfrac{t^2}{2\tau^2}\right)$, −∞ 0, will be the superposition of all the (Huyghens') wavelets originating from the various parts of the screen. For large $r_0$ ($= |\mathbf{r}_0|$), these can be treated as plane waves to give§
\[ A(\mathbf{r}_0) = \int_{-Y}^{Y}\frac{f(y)\exp[i\mathbf{k}'\cdot(\mathbf{r}_0 - y\mathbf{j})]}{|\mathbf{r}_0 - y\mathbf{j}|}\,dy. \tag{13.8} \]
§ This is the approach first used by Fresnel.
For simplicity we have omitted from the integral a multiplicative inclination factor that depends on angle $\theta$ and decreases as $\theta$ increases.

Figure 13.2 Diffraction grating of width $2Y$ with light of wavelength $2\pi/k$ being diffracted through an angle $\theta$.

The factor $\exp[i\mathbf{k}'\cdot(\mathbf{r}_0 - y\mathbf{j})]$ represents the phase change undergone by the light in travelling from the point $y\mathbf{j}$ on the screen to the point $\mathbf{r}_0$, and the denominator represents the reduction in amplitude with distance. (Recall that the system is infinite in the $z$-direction and so the 'spreading' is effectively in two dimensions only.)
If the medium is the same on both sides of the screen then $\mathbf{k}' = k\cos\theta\,\mathbf{i} + k\sin\theta\,\mathbf{j}$, and if $r_0 \gg Y$ then expression (13.8) can be approximated by
\[ A(\mathbf{r}_0) = \frac{\exp(i\mathbf{k}'\cdot\mathbf{r}_0)}{r_0}\int_{-\infty}^{\infty} f(y)\exp(-iky\sin\theta)\,dy. \tag{13.9} \]
We have used that $f(y) = 0$ for $|y| > Y$ to extend the integral to infinite limits. The intensity in the direction $\theta$ is then given by
\[ I(\theta) = |A|^2 = \frac{2\pi}{r_0^{\,2}}\,|\tilde{f}(q)|^2, \tag{13.10} \]
where $q = k\sin\theta$.
▶Evaluate $I(\theta)$ for an aperture consisting of two long slits each of width $2b$ whose centres are separated by a distance $2a$, $a > b$; the slits are illuminated by light of wavelength $\lambda$.
The aperture function is plotted in figure 13.3. We first need to find $\tilde{f}(q)$:
\[ \tilde{f}(q) = \frac{1}{\sqrt{2\pi}}\int_{-a-b}^{-a+b} e^{-iqx}\,dx + \frac{1}{\sqrt{2\pi}}\int_{a-b}^{a+b} e^{-iqx}\,dx = \frac{1}{\sqrt{2\pi}}\left[-\frac{e^{-iqx}}{iq}\right]_{-a-b}^{-a+b} + \frac{1}{\sqrt{2\pi}}\left[-\frac{e^{-iqx}}{iq}\right]_{a-b}^{a+b} = \frac{-1}{iq\sqrt{2\pi}}\left[e^{-iq(-a+b)} - e^{-iq(-a-b)} + e^{-iq(a+b)} - e^{-iq(a-b)}\right]. \]

Figure 13.3 The aperture function $f(y)$ for two wide slits.

After some manipulation we obtain
\[ \tilde{f}(q) = \frac{4\cos qa\,\sin qb}{q\sqrt{2\pi}}. \]
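The manipulation referred to can be sketched as follows. This is one possible route, reconstructed here rather than taken from the original text: pair the exponentials into sines and then use $\sin(A - B) - \sin(A + B) = -2\cos A\sin B$ with $A = qa$ and $B = qb$.

```latex
\begin{aligned}
e^{-iq(-a+b)} - e^{-iq(-a-b)} + e^{-iq(a+b)} - e^{-iq(a-b)}
  &= \left[e^{iq(a-b)} - e^{-iq(a-b)}\right] - \left[e^{iq(a+b)} - e^{-iq(a+b)}\right] \\
  &= 2i\sin q(a-b) - 2i\sin q(a+b) \\
  &= -4i\cos qa\,\sin qb, \\[4pt]
\tilde{f}(q) &= \frac{-1}{iq\sqrt{2\pi}}\left(-4i\cos qa\,\sin qb\right)
  = \frac{4\cos qa\,\sin qb}{q\sqrt{2\pi}}.
\end{aligned}
```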
Now applying (13.10), and remembering that $q = (2\pi\sin\theta)/\lambda$, we find
\[ I(\theta) = \frac{16\cos^2 qa\,\sin^2 qb}{q^2 r_0^{\,2}}, \]
where $r_0$ is the distance from the centre of the aperture. ◀

13.1.3 The Dirac δ-function

Before going on to consider further properties of Fourier transforms we make a digression to discuss the Dirac δ-function and its relation to Fourier transforms. The δ-function is different from most functions encountered in the physical sciences but we will see that a rigorous mathematical definition exists; the utility of the δ-function will be demonstrated throughout the remainder of this chapter. It can be visualised as a very sharp narrow pulse (in space, time, density, etc.) which produces an integrated effect having a definite magnitude. The formal properties of the δ-function may be summarised as follows.
The Dirac δ-function has the property that
\[ \delta(t) = 0 \quad \text{for } t \neq 0, \tag{13.11} \]
but its fundamental defining property is
\[ \int f(t)\delta(t - a)\,dt = f(a), \tag{13.12} \]
provided the range of integration includes the point $t = a$; otherwise the integral equals zero. This leads immediately to two further useful results:
\[ \int_{-a}^{b}\delta(t)\,dt = 1 \quad \text{for all } a, b > 0 \tag{13.13} \]
and
\[ \int\delta(t - a)\,dt = 1, \tag{13.14} \]
provided the range of integration includes $t = a$.
Equation (13.12) can be used to derive further useful properties of the Dirac δ-function:
\[ \delta(t) = \delta(-t), \tag{13.15} \]
\[ \delta(at) = \frac{1}{|a|}\delta(t), \tag{13.16} \]
\[ t\,\delta(t) = 0. \tag{13.17} \]
▶Prove that $\delta(bt) = \delta(t)/|b|$.
Let us first consider the case where $b > 0$. It follows that
\[ \int_{-\infty}^{\infty} f(t)\delta(bt)\,dt = \int_{-\infty}^{\infty} f\left(\frac{t'}{b}\right)\delta(t')\frac{dt'}{b} = \frac{1}{b}f(0) = \frac{1}{b}\int_{-\infty}^{\infty} f(t)\delta(t)\,dt, \]
where we have made the substitution $t' = bt$. But $f(t)$ is arbitrary and so we immediately see that $\delta(bt) = \delta(t)/b = \delta(t)/|b|$ for $b > 0$.
Now consider the case where $b = -c < 0$.
It follows that
\[ \int_{-\infty}^{\infty} f(t)\delta(bt)\,dt = \int_{\infty}^{-\infty} f\left(\frac{t'}{-c}\right)\delta(t')\left(\frac{dt'}{-c}\right) = \int_{-\infty}^{\infty}\frac{1}{c}f\left(\frac{t'}{-c}\right)\delta(t')\,dt' = \frac{1}{c}f(0) = \frac{1}{|b|}f(0) = \frac{1}{|b|}\int_{-\infty}^{\infty} f(t)\delta(t)\,dt, \]
where we have made the substitution $t' = bt = -ct$. But $f(t)$ is arbitrary and so
\[ \delta(bt) = \frac{1}{|b|}\delta(t), \]
for all $b$, which establishes the result. ◀
Furthermore, by considering an integral of the form
\[ \int f(t)\delta(h(t))\,dt, \]
and making a change of variables to $z = h(t)$, we may show that
\[ \delta(h(t)) = \sum_i\frac{\delta(t - t_i)}{|h'(t_i)|}, \tag{13.18} \]
where the $t_i$ are those values of $t$ for which $h(t) = 0$ and $h'(t)$ stands for $dh/dt$.
The derivative of the delta function, $\delta'(t)$, is defined by
\[ \int_{-\infty}^{\infty} f(t)\delta'(t)\,dt = \Big[f(t)\delta(t)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f'(t)\delta(t)\,dt = -f'(0), \tag{13.19} \]
and similarly for higher derivatives.
For many practical purposes, effects that are not strictly described by a δ-function may be analysed as such, if they take place in an interval much shorter than the response interval of the system on which they act. For example, the idealised notion of an impulse of magnitude $J$ applied at time $t_0$ can be represented by
\[ j(t) = J\delta(t - t_0). \tag{13.20} \]
Many physical situations are described by a δ-function in space rather than in time. Moreover, we often require the δ-function to be defined in more than one dimension. For example, the charge density of a point charge $q$ at a point $\mathbf{r}_0$ may be expressed as a three-dimensional δ-function
\[ \rho(\mathbf{r}) = q\delta(\mathbf{r} - \mathbf{r}_0) = q\delta(x - x_0)\delta(y - y_0)\delta(z - z_0), \tag{13.21} \]
so that a discrete 'quantum' is expressed as if it were a continuous distribution.
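The composition rule (13.18) can be made plausible numerically by replacing the δ-function with a narrow unit-area Gaussian. The following is a sketch under stated assumptions, not a proof: the width `eps`, the test functions `f` and `h`, and the helper names are all illustrative choices. Here $h(t) = t^2 - 1$ has zeros at $t = \pm 1$ with $|h'(\pm 1)| = 2$, so (13.18) predicts $\int f(t)\delta(h(t))\,dt = [f(-1) + f(1)]/2$.

```python
import math

def narrow_gaussian(z, eps):
    # Unit-area Gaussian of width eps; stands in for delta(z) as eps -> 0.
    return math.exp(-0.5 * (z / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def f(t):
    return t + 2.0              # arbitrary smooth test function

def h(t):
    return t * t - 1.0          # zeros at t = -1 and t = +1, |h'| = 2 at both

def integrate(func, lo, hi, n):
    # Midpoint rule on n equal sub-intervals.
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))

eps = 0.01
lhs = integrate(lambda t: f(t) * narrow_gaussian(h(t), eps), -3.0, 3.0, 60000)
rhs = f(-1.0) / 2.0 + f(1.0) / 2.0   # sum over zeros of f(t_i) / |h'(t_i)|
print(lhs, rhs)
```

The two numbers agree up to a small correction that shrinks as `eps` is reduced, provided the integration grid remains fine compared with `eps`.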
From (13.21) we see that (as expected) the total charge enclosed in a volume $V$ is given by
\[ \int_V\rho(\mathbf{r})\,dV = \int_V q\delta(\mathbf{r} - \mathbf{r}_0)\,dV = \begin{cases} q & \text{if } \mathbf{r}_0 \text{ lies in } V, \\ 0 & \text{otherwise.} \end{cases} \]
Closely related to the Dirac δ-function is the Heaviside or unit step function $H(t)$, for which
\[ H(t) = \begin{cases} 1 & \text{for } t > 0, \\ 0 & \text{for } t < 0. \end{cases} \tag{13.22} \]
This function is clearly discontinuous at $t = 0$ and it is usual to take $H(0) = 1/2$. The Heaviside function is related to the delta function by
\[ H'(t) = \delta(t). \tag{13.23} \]
▶Prove relation (13.23).
Considering the integral
\[ \int_{-\infty}^{\infty} f(t)H'(t)\,dt = \Big[f(t)H(t)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f'(t)H(t)\,dt = f(\infty) - \int_{0}^{\infty} f'(t)\,dt = f(\infty) - \Big[f(t)\Big]_{0}^{\infty} = f(0), \]
and comparing it with (13.12) when $a = 0$ immediately shows that $H'(t) = \delta(t)$. ◀

13.1.4 Relation of the δ-function to Fourier transforms

In the previous section we introduced the Dirac δ-function as a way of representing very sharp narrow pulses, but in no way related it to Fourier transforms. We now show that the δ-function can equally well be defined in a way that more naturally relates it to the Fourier transform.
Referring back to the Fourier inversion theorem (13.4), we have
\[ f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\, e^{i\omega t}\int_{-\infty}^{\infty} du\, f(u)e^{-i\omega u} = \int_{-\infty}^{\infty} du\, f(u)\left\{\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t-u)}\,d\omega\right\}. \]
Comparison of this with (13.12) shows that we may write the δ-function as
\[ \delta(t - u) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t-u)}\,d\omega. \tag{13.24} \]
Considered as a Fourier transform, this representation shows that a very narrow time peak at $t = u$ results from the superposition of a complete spectrum of harmonic waves, all frequencies having the same amplitude and all waves being in phase at $t = u$.
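One way to make (13.24) plausible numerically is to truncate the frequency integral at $\pm\Omega$, which gives the kernel $\sin\Omega(t-u)/[\pi(t-u)]$, and to check that integrating a smooth $f(u)$ against this kernel approaches $f(t)$ as $\Omega$ grows. The sketch below is illustrative only: the Gaussian test function, the cut-off values and the names `kernel` and `sifted` are assumptions, not part of the text.

```python
import math

def kernel(x, omega_max):
    # (1/2pi) * integral of exp(i*w*x) over |w| < omega_max
    #   = sin(omega_max * x) / (pi * x), with limit omega_max / pi at x = 0.
    if abs(x) < 1e-12:
        return omega_max / math.pi
    return math.sin(omega_max * x) / (math.pi * x)

def f(u):
    return math.exp(-u * u)     # smooth, rapidly decaying test function

def sifted(t, omega_max, lo=-8.0, hi=8.0, n=16000):
    """Midpoint-rule estimate of the integral of f(u) * kernel(t - u) over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * step
        total += f(u) * kernel(t - u, omega_max)
    return step * total

t0 = 0.3
for omega_max in (2.0, 10.0, 40.0):
    print(omega_max, sifted(t0, omega_max), f(t0))
```

For the smallest cut-off the result differs visibly from $f(t_0)$; for the larger ones the truncated kernel already acts on this test function very nearly as $\delta(t - u)$.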
This suggests that the δ-function may also be represented as the limit of the transform of a uniform distribution of unit height as the width of this distribution becomes infinite.
Consider the rectangular distribution of frequencies shown in figure 13.4(a). From (13.6), taking the inverse Fourier transform,
\[ f_\Omega(t) = \frac{1}{\sqrt{2\pi}}\int_{-\Omega}^{\Omega} 1\times e^{i\omega t}\,d\omega = \frac{2\Omega}{\sqrt{2\pi}}\,\frac{\sin\Omega t}{\Omega t}. \tag{13.25} \]
This function is illustrated in figure 13.4(b) and it is apparent that, for large $\Omega$, it becomes very large at $t = 0$ and also very narrow about $t = 0$, as we qualitatively expect and require. We also note that, in the limit $\Omega \to \infty$, $f_\Omega(t)$, as defined by the inverse Fourier transform, tends to $(2\pi)^{1/2}\delta(t)$ by virtue of (13.24). Hence we may conclude that the δ-function can also be represented by
\[ \delta(t) = \lim_{\Omega\to\infty}\left(\frac{\sin\Omega t}{\pi t}\right). \tag{13.26} \]

Figure 13.4 (a) A Fourier transform showing a rectangular distribution of frequencies between $\pm\Omega$; (b) the function of which it is the transform, which is proportional to $t^{-1}\sin\Omega t$.

Several other function representations are equally valid, e.g. the limiting cases of rectangular, triangular or Gaussian distributions; the only essential requirements are a knowledge of the area under such a curve and that undefined operations such as dividing by zero are not inadvertently carried out on the δ-function whilst some non-explicit representation is being employed.
We also note that the Fourier transform definition of the delta function, (13.24), shows that the latter is real since
\[ \delta^*(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega t}\,d\omega = \delta(-t) = \delta(t). \]
Finally, the Fourier transform of a δ-function is simply
\[ \tilde{\delta}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\delta(t)e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}. \tag{13.27} \]

13.1.5 Properties of Fourier transforms

Having considered the Dirac δ-function, we now return to our discussion of the properties of Fourier transforms.
As we would expect, Fourier transforms have many properties analogous to those of Fourier series in respect of the connection between the transforms of related functions. Here we list these properties without proof; they can be verified by working from the definition of the transform. As previously, we denote the Fourier transform of $f(t)$ by $\tilde{f}(\omega)$ or $\mathcal{F}[f(t)]$.
(i) Differentiation:
\[ \mathcal{F}\left[f'(t)\right] = i\omega\tilde{f}(\omega). \tag{13.28} \]
This may be extended to higher derivatives, so that
\[ \mathcal{F}\left[f''(t)\right] = i\omega\,\mathcal{F}\left[f'(t)\right] = -\omega^2\tilde{f}(\omega), \]
and so on.
(ii) Integration:
\[ \mathcal{F}\left[\int^{t} f(s)\,ds\right] = \frac{1}{i\omega}\tilde{f}(\omega) + 2\pi c\,\delta(\omega), \tag{13.29} \]
where the term $2\pi c\,\delta(\omega)$ represents the Fourier transform of the constant of integration associated with the indefinite integral.
(iii) Scaling:
\[ \mathcal{F}[f(at)] = \frac{1}{a}\tilde{f}\left(\frac{\omega}{a}\right). \tag{13.30} \]
(iv) Translation:
\[ \mathcal{F}[f(t + a)] = e^{ia\omega}\tilde{f}(\omega). \tag{13.31} \]
(v) Exponential multiplication:
\[ \mathcal{F}\left[e^{\alpha t}f(t)\right] = \tilde{f}(\omega + i\alpha), \tag{13.32} \]
where $\alpha$ may be real, imaginary or complex.
▶Prove relation (13.28).
Calculating the Fourier transform of $f'(t)$ directly, we obtain
\[ \mathcal{F}\left[f'(t)\right] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f'(t)e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}\Big[e^{-i\omega t}f(t)\Big]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} i\omega e^{-i\omega t}f(t)\,dt = i\omega\tilde{f}(\omega), \]
if $f(t) \to 0$ at $t = \pm\infty$, as it must since $\int_{-\infty}^{\infty}|f(t)|\,dt$ is finite. ◀
To illustrate a use and also a proof of (13.32), let us consider an amplitude-modulated radio wave. Suppose a message to be broadcast is represented by $f(t)$. The message can be added electronically to a constant signal $a$ of magnitude such that $a + f(t)$ is never negative, and then the sum can be used to modulate the amplitude of a carrier signal of frequency $\omega_c$.
Using a complex exponential notation, the transmitted amplitude is now

$$g(t) = A\,[a + f(t)]\,e^{i\omega_c t}. \qquad (13.33)$$

Ignoring in the present context the effect of the term $Aa\exp(i\omega_c t)$, which gives a contribution to the transmitted spectrum only at $\omega = \omega_c$, we obtain for the new spectrum

$$\tilde g(\omega) = \frac{1}{\sqrt{2\pi}}A\int_{-\infty}^{\infty} f(t)\,e^{i\omega_c t}e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}A\int_{-\infty}^{\infty} f(t)\,e^{-i(\omega-\omega_c)t}\,dt = A\tilde f(\omega-\omega_c), \qquad (13.34)$$

which is simply a shift of the whole spectrum by the carrier frequency. The use of different carrier frequencies enables signals to be separated.

#### 13.1.6 Odd and even functions

If $f(t)$ is odd or even then we may derive alternative forms of Fourier's inversion theorem, which lead to the definition of different transform pairs. Let us first consider an odd function $f(t) = -f(-t)$, whose Fourier transform is given by

$$\tilde f(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)(\cos\omega t - i\sin\omega t)\,dt = \frac{-2i}{\sqrt{2\pi}}\int_0^{\infty} f(t)\sin\omega t\,dt,$$

where in the last line we use the fact that $f(t)$ and $\sin\omega t$ are odd, whereas $\cos\omega t$ is even. We note that $\tilde f(-\omega) = -\tilde f(\omega)$, i.e. $\tilde f(\omega)$ is an odd function of $\omega$. Hence

$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde f(\omega)\,e^{i\omega t}\,d\omega = \frac{2i}{\sqrt{2\pi}}\int_0^{\infty}\tilde f(\omega)\sin\omega t\,d\omega = \frac{2}{\pi}\int_0^{\infty} d\omega\,\sin\omega t\left\{\int_0^{\infty} f(u)\sin\omega u\,du\right\}.$$

Thus we may define the Fourier sine transform pair for odd functions:

$$\tilde f_s(\omega) = \sqrt{\frac{2}{\pi}}\int_0^{\infty} f(t)\sin\omega t\,dt, \qquad (13.35)$$

$$f(t) = \sqrt{\frac{2}{\pi}}\int_0^{\infty}\tilde f_s(\omega)\sin\omega t\,d\omega. \qquad (13.36)$$

Note that although the Fourier sine transform pair was derived by considering an odd function $f(t)$ defined over all $t$, the definitions (13.35) and (13.36) only require $f(t)$ and $\tilde f_s(\omega)$ to be defined for positive $t$ and $\omega$ respectively.
For an even function, i.e. one for which $f(t) = f(-t)$, we can define the Fourier cosine transform pair in a similar way, but with $\sin\omega t$ replaced by $\cos\omega t$.

#### 13.1.7 Convolution and deconvolution

It is apparent that any attempt to measure the value of a physical quantity is limited, to some extent, by the finite resolution of the measuring apparatus used. On the one hand, the physical quantity we wish to measure will be in general a function of an independent variable, $x$ say, i.e. the true function to be measured takes the form $f(x)$. On the other hand, the apparatus we are using does not give the true output value of the function; a resolution function $g(y)$ is involved. By this we mean that the probability that an output value $y = 0$ will be recorded instead as being between $y$ and $y + dy$ is given by $g(y)\,dy$. Some possible resolution functions of this sort are shown in figure 13.5. To obtain good results we wish the resolution function to be as close to a δ-function as possible (case (a)). A typical piece of apparatus has a resolution function of finite width, although if it is accurate the mean is centred on the true value (case (b)). However, some apparatus may show a bias that tends to shift observations to higher or lower values than the true ones (cases (c) and (d)), thereby exhibiting systematic error.

Figure 13.5 Resolution functions: (a) ideal δ-function; (b) typical unbiased resolution; (c) and (d) biases tending to shift observations to higher values than the true one.

Figure 13.6 The convolution of two functions $f(x)$ and $g(y)$.

Given that the true distribution is $f(x)$ and the resolution function of our measuring apparatus is $g(y)$, we wish to calculate what the observed distribution $h(z)$ will be. The symbols $x$, $y$ and $z$ all refer to the same physical variable (e.g.
length or angle), but are denoted differently because the variable appears in the analysis in three different roles.

The probability that a true reading lying between $x$ and $x + dx$, and so having probability $f(x)\,dx$ of being selected by the experiment, will be moved by the instrumental resolution by an amount $z - x$ into a small interval of width $dz$ is $g(z-x)\,dz$. Hence the combined probability that the interval $dx$ will give rise to an observation appearing in the interval $dz$ is $f(x)\,dx\,g(z-x)\,dz$. Adding together the contributions from all values of $x$ that can lead to an observation in the range $z$ to $z + dz$, we find that the observed distribution is given by

$$h(z) = \int_{-\infty}^{\infty} f(x)\,g(z-x)\,dx. \qquad (13.37)$$

The integral in (13.37) is called the convolution of the functions $f$ and $g$ and is often written $f*g$. The convolution defined above is commutative ($f*g = g*f$), associative and distributive. The observed distribution is thus the convolution of the true distribution and the experimental resolution function. The result will be that the observed distribution is broader and smoother than the true one and, if $g(y)$ has a bias, the maxima will normally be displaced from their true positions. It is also obvious from (13.37) that if the resolution is the ideal δ-function, $g(y) = \delta(y)$, then $h(z) = f(z)$ and the observed distribution is the true one.

It is interesting to note, and a very important property, that the convolution of any function $g(y)$ with a number of delta functions leaves a copy of $g(y)$ at the position of each of the delta functions.

▶ Find the convolution of the function $f(x) = \delta(x+a) + \delta(x-a)$ with the function $g(y)$ plotted in figure 13.6.

Using the convolution integral (13.37),

$$h(z) = \int_{-\infty}^{\infty} f(x)\,g(z-x)\,dx = \int_{-\infty}^{\infty}\left[\delta(x+a) + \delta(x-a)\right]g(z-x)\,dx = g(z+a) + g(z-a).$$

This convolution $h(z)$ is plotted in figure 13.6.
◀

Let us now consider the Fourier transform of the convolution (13.37); this is given by

$$\tilde h(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dz\,e^{-ikz}\left\{\int_{-\infty}^{\infty} f(x)\,g(z-x)\,dx\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\,f(x)\left\{\int_{-\infty}^{\infty} g(z-x)\,e^{-ikz}\,dz\right\}.$$

If we let $u = z - x$ in the second integral we have

$$\tilde h(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\,f(x)\left\{\int_{-\infty}^{\infty} g(u)\,e^{-ik(u+x)}\,du\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-ikx}\,dx\int_{-\infty}^{\infty} g(u)\,e^{-iku}\,du$$

$$= \frac{1}{\sqrt{2\pi}}\times\sqrt{2\pi}\,\tilde f(k)\times\sqrt{2\pi}\,\tilde g(k) = \sqrt{2\pi}\,\tilde f(k)\,\tilde g(k). \qquad (13.38)$$

Hence the Fourier transform of a convolution $f*g$ is equal to the product of the separate Fourier transforms multiplied by $\sqrt{2\pi}$; this result is called the convolution theorem.

It may be proved similarly that the converse is also true, namely that the Fourier transform of the product $f(x)g(x)$ is given by

$$\mathcal F[f(x)g(x)] = \frac{1}{\sqrt{2\pi}}\,\tilde f(k)*\tilde g(k). \qquad (13.39)$$

▶ Find the Fourier transform of the function in figure 13.3 representing two wide slits by considering the Fourier transforms of (i) two δ-functions, at $x = \pm a$, (ii) a rectangular function of height 1 and width $2b$ centred on $x = 0$.

(i) The Fourier transform of the two δ-functions is given by

$$\tilde f(q) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\delta(x-a)\,e^{-iqx}\,dx + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\delta(x+a)\,e^{-iqx}\,dx = \frac{1}{\sqrt{2\pi}}\left(e^{-iqa} + e^{iqa}\right) = \frac{2\cos qa}{\sqrt{2\pi}}.$$

(ii) The Fourier transform of the broad slit is

$$\tilde g(q) = \frac{1}{\sqrt{2\pi}}\int_{-b}^{b} e^{-iqx}\,dx = \frac{1}{\sqrt{2\pi}}\left[\frac{e^{-iqx}}{-iq}\right]_{-b}^{b} = \frac{-1}{iq\sqrt{2\pi}}\left(e^{-iqb} - e^{iqb}\right) = \frac{2\sin qb}{q\sqrt{2\pi}}.$$

We have already seen that the convolution of these functions is the required function representing two wide slits (see figure 13.6).
So, using the convolution theorem, the Fourier transform of the convolution is $\sqrt{2\pi}$ times the product of the individual transforms, i.e. $4\cos qa\,\sin qb/(q\sqrt{2\pi})$. This is, of course, the same result as that obtained in the example in subsection 13.1.2. ◀

The inverse of convolution, called deconvolution, allows us to find a true distribution $f(x)$ given an observed distribution $h(z)$ and a resolution function $g(y)$.

▶ An experimental quantity $f(x)$ is measured using apparatus with a known resolution function $g(y)$ to give an observed distribution $h(z)$. How may $f(x)$ be extracted from the measured distribution?

From the convolution theorem (13.38), the Fourier transform of the measured distribution is

$$\tilde h(k) = \sqrt{2\pi}\,\tilde f(k)\,\tilde g(k),$$

from which we obtain

$$\tilde f(k) = \frac{1}{\sqrt{2\pi}}\,\frac{\tilde h(k)}{\tilde g(k)}.$$

Then on inverse Fourier transforming we find

$$f(x) = \frac{1}{\sqrt{2\pi}}\,\mathcal F^{-1}\left[\frac{\tilde h(k)}{\tilde g(k)}\right].$$

In words, to extract the true distribution, we divide the Fourier transform of the observed distribution by that of the resolution function for each value of $k$ and then take the inverse Fourier transform of the function so generated. ◀

This explicit method of extracting true distributions is straightforward for exact functions but, in practice, because of experimental and statistical uncertainties in the experimental data or because data over only a limited range are available, it is often not very precise, involving as it does three (numerical) transforms each requiring in principle an integral over an infinite range.

#### 13.1.8 Correlation functions and energy spectra

The cross-correlation of two functions $f$ and $g$ is defined by

$$C(z) = \int_{-\infty}^{\infty} f^*(x)\,g(x+z)\,dx.$$
(13.40)

Despite the formal similarity between (13.40) and the definition of the convolution in (13.37), the use and interpretation of the cross-correlation and of the convolution are very different; the cross-correlation provides a quantitative measure of the similarity of two functions $f$ and $g$ as one is displaced through a distance $z$ relative to the other. The cross-correlation is often notated as $C = f\otimes g$, and, like convolution, it is both associative and distributive. Unlike convolution, however, it is not commutative; in fact

$$[f\otimes g](z) = [g\otimes f]^*(-z). \qquad (13.41)$$

▶ Prove the Wiener–Kinchin theorem,

$$\tilde C(k) = \sqrt{2\pi}\,[\tilde f(k)]^*\,\tilde g(k). \qquad (13.42)$$

Following a method similar to that for the convolution of $f$ and $g$, let us consider the Fourier transform of (13.40):

$$\tilde C(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dz\,e^{-ikz}\left\{\int_{-\infty}^{\infty} f^*(x)\,g(x+z)\,dx\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\,f^*(x)\left\{\int_{-\infty}^{\infty} g(x+z)\,e^{-ikz}\,dz\right\}.$$

Making the substitution $u = x + z$ in the second integral we obtain

$$\tilde C(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\,f^*(x)\left\{\int_{-\infty}^{\infty} g(u)\,e^{-ik(u-x)}\,du\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f^*(x)\,e^{ikx}\,dx\int_{-\infty}^{\infty} g(u)\,e^{-iku}\,du$$

$$= \frac{1}{\sqrt{2\pi}}\times\sqrt{2\pi}\,[\tilde f(k)]^*\times\sqrt{2\pi}\,\tilde g(k) = \sqrt{2\pi}\,[\tilde f(k)]^*\,\tilde g(k).$$

◀

Thus the Fourier transform of the cross-correlation of $f$ and $g$ is equal to the product of $[\tilde f(k)]^*$ and $\tilde g(k)$ multiplied by $\sqrt{2\pi}$. This is a statement of the Wiener–Kinchin theorem. Similarly we can derive the converse theorem

$$\mathcal F\left[f^*(x)\,g(x)\right] = \frac{1}{\sqrt{2\pi}}\,\tilde f\otimes\tilde g.$$

If we now consider the special case where $g$ is taken to be equal to $f$ in (13.40) then, writing the LHS as $a(z)$, we have

$$a(z) = \int_{-\infty}^{\infty} f^*(x)\,f(x+z)\,dx; \qquad (13.43)$$

this is called the auto-correlation function of $f(x)$.
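The Wiener–Kinchin theorem (13.42) can be checked numerically. The sketch below is only an illustration: the two test functions, the finite integration range and the midpoint rule are assumptions made here, chosen because rapidly decaying integrands make the quadrature very accurate. It computes the cross-correlation (13.40) on a grid, transforms it with the symmetric $1/\sqrt{2\pi}$ convention of the text, and compares the result with $\sqrt{2\pi}\,[\tilde f(k)]^*\,\tilde g(k)$ at a single value of $k$.

```python
import cmath
import math

def integrate(func, lo=-12.0, hi=12.0, n=400):
    """Midpoint rule on [lo, hi]; adequate because the integrands decay rapidly."""
    h = (hi - lo) / n
    return h * sum(func(lo + (j + 0.5) * h) for j in range(n))

def fourier(func, k):
    """Fourier transform with the 1/sqrt(2 pi) prefactor used in the text."""
    return integrate(lambda x: func(x) * cmath.exp(-1j * k * x)) / math.sqrt(2.0 * math.pi)

def f(x):
    """Illustrative test function: a Gaussian displaced from the origin."""
    return math.exp(-(x - 1.0) ** 2)

def g(x):
    """Illustrative test function: an odd function, so its transform is imaginary."""
    return x * math.exp(-0.5 * x * x)

def cross_correlation(z):
    """C(z) as defined in (13.40), evaluated by quadrature."""
    return integrate(lambda x: complex(f(x)).conjugate() * g(x + z))

k = 0.7
lhs = fourier(cross_correlation, k)                      # transform of C(z)
rhs = math.sqrt(2.0 * math.pi) * fourier(f, k).conjugate() * fourier(g, k)
print(lhs, rhs)
```

The two printed complex numbers should agree to many decimal places. Replacing `g` by `f` in `cross_correlation` gives the auto-correlation function (13.43) in the same way.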
Using the Wiener–Kinchin theorem (13.42) we see that

$$a(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde a(k)\,e^{ikz}\,dk = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\sqrt{2\pi}\,[\tilde f(k)]^*\,\tilde f(k)\,e^{ikz}\,dk,$$

so that $a(z)$ is the inverse Fourier transform of $\sqrt{2\pi}\,|\tilde f(k)|^2$, which is in turn called the energy spectrum of $f$.

#### 13.1.9 Parseval's theorem

Using the results of the previous section we can immediately obtain Parseval's theorem. The most general form of this (also called the multiplication theorem) is obtained simply by noting from (13.42) that the cross-correlation (13.40) of two functions $f$ and $g$ can be written as

$$C(z) = \int_{-\infty}^{\infty} f^*(x)\,g(x+z)\,dx = \int_{-\infty}^{\infty}[\tilde f(k)]^*\,\tilde g(k)\,e^{ikz}\,dk. \qquad (13.44)$$

Then, setting $z = 0$ gives the multiplication theorem

$$\int_{-\infty}^{\infty} f^*(x)\,g(x)\,dx = \int_{-\infty}^{\infty}[\tilde f(k)]^*\,\tilde g(k)\,dk. \qquad (13.45)$$

Specialising further, by letting $g = f$, we derive the most common form of Parseval's theorem,

$$\int_{-\infty}^{\infty}|f(x)|^2\,dx = \int_{-\infty}^{\infty}|\tilde f(k)|^2\,dk. \qquad (13.46)$$

When $f$ is a physical amplitude these integrals relate to the total intensity involved in some physical process. We have already met a form of Parseval's theorem for Fourier series in chapter 12; it is in fact a special case of (13.46).

▶ The displacement of a damped harmonic oscillator as a function of time is given by

$$f(t) = \begin{cases} 0 & \text{for } t < 0,\\ e^{-t/\tau}\sin\omega_0 t & \text{for } t \ge 0.\end{cases}$$

Find the Fourier transform of this function and so give a physical interpretation of Parseval's theorem.

Using the usual definition for the Fourier transform we find

$$\tilde f(\omega) = \int_{-\infty}^{0} 0\times e^{-i\omega t}\,dt + \int_0^{\infty} e^{-t/\tau}\sin\omega_0 t\;e^{-i\omega t}\,dt.$$
Writing $\sin\omega_0 t$ as $(e^{i\omega_0 t} - e^{-i\omega_0 t})/2i$ we obtain

$$\tilde f(\omega) = 0 + \frac{1}{2i}\int_0^{\infty}\left[e^{-it(\omega-\omega_0-i/\tau)} - e^{-it(\omega+\omega_0-i/\tau)}\right]dt = \frac{1}{2}\left[\frac{1}{\omega+\omega_0-i/\tau} - \frac{1}{\omega-\omega_0-i/\tau}\right],$$

which is the required Fourier transform. The physical interpretation of $|\tilde f(\omega)|^2$ is the energy content per unit frequency interval (i.e. the energy spectrum) whilst $|f(t)|^2$ is proportional to the sum of the kinetic and potential energies of the oscillator. Hence (to within a constant) Parseval's theorem shows the equivalence of these two alternative specifications for the total energy. ◀

#### 13.1.10 Fourier transforms in higher dimensions

The concept of the Fourier transform can be extended naturally to more than one dimension. For instance we may wish to find the spatial Fourier transform of two- or three-dimensional functions of position. For example, in three dimensions we can define the Fourier transform of $f(x,y,z)$ as

$$\tilde f(k_x,k_y,k_z) = \frac{1}{(2\pi)^{3/2}}\iiint f(x,y,z)\,e^{-ik_x x}e^{-ik_y y}e^{-ik_z z}\,dx\,dy\,dz, \qquad (13.47)$$

and its inverse as

$$f(x,y,z) = \frac{1}{(2\pi)^{3/2}}\iiint\tilde f(k_x,k_y,k_z)\,e^{ik_x x}e^{ik_y y}e^{ik_z z}\,dk_x\,dk_y\,dk_z. \qquad (13.48)$$

Denoting the vector with components $k_x, k_y, k_z$ by $\mathbf k$ and that with components $x, y, z$ by $\mathbf r$, we can write the Fourier transform pair (13.47), (13.48) as

$$\tilde f(\mathbf k) = \frac{1}{(2\pi)^{3/2}}\int f(\mathbf r)\,e^{-i\mathbf k\cdot\mathbf r}\,d^3\mathbf r, \qquad (13.49)$$

$$f(\mathbf r) = \frac{1}{(2\pi)^{3/2}}\int\tilde f(\mathbf k)\,e^{i\mathbf k\cdot\mathbf r}\,d^3\mathbf k. \qquad (13.50)$$

From these relations we may deduce that the three-dimensional Dirac δ-function can be written as

$$\delta(\mathbf r) = \frac{1}{(2\pi)^3}\int e^{i\mathbf k\cdot\mathbf r}\,d^3\mathbf k. \qquad (13.51)$$

Similar relations to (13.49), (13.50) and (13.51) exist for spaces of other dimensionalities.

▶ In three-dimensional space a function $f(\mathbf r)$ possesses spherical symmetry, so that $f(\mathbf r) = f(r)$.
Find the Fourier transform of $f(\mathbf r)$ as a one-dimensional integral.

Let us choose spherical polar coordinates in which the vector $\mathbf k$ of the Fourier transform lies along the polar axis ($\theta = 0$). This we can do since $f(\mathbf r)$ is spherically symmetric. We then have

$$d^3\mathbf r = r^2\sin\theta\,dr\,d\theta\,d\phi \quad\text{and}\quad \mathbf k\cdot\mathbf r = kr\cos\theta,$$

where $k = |\mathbf k|$. The Fourier transform is then given by

$$\tilde f(\mathbf k) = \frac{1}{(2\pi)^{3/2}}\int f(\mathbf r)\,e^{-i\mathbf k\cdot\mathbf r}\,d^3\mathbf r = \frac{1}{(2\pi)^{3/2}}\int_0^{\infty} dr\int_0^{\pi} d\theta\int_0^{2\pi} d\phi\,f(r)\,r^2\sin\theta\,e^{-ikr\cos\theta} = \frac{1}{(2\pi)^{3/2}}\int_0^{\infty} dr\,2\pi f(r)\,r^2\int_0^{\pi} d\theta\,\sin\theta\,e^{-ikr\cos\theta}.$$

The integral over $\theta$ may be straightforwardly evaluated by noting that

$$\frac{d}{d\theta}\left(e^{-ikr\cos\theta}\right) = ikr\sin\theta\,e^{-ikr\cos\theta}.$$

Therefore

$$\tilde f(\mathbf k) = \frac{1}{(2\pi)^{3/2}}\int_0^{\infty} dr\,2\pi f(r)\,r^2\left[\frac{e^{-ikr\cos\theta}}{ikr}\right]_{\theta=0}^{\theta=\pi} = \frac{1}{(2\pi)^{3/2}}\int_0^{\infty} 4\pi r^2 f(r)\left(\frac{\sin kr}{kr}\right)dr.$$

◀

A similar result may be obtained for two-dimensional Fourier transforms in which $f(\mathbf r) = f(\rho)$, i.e. $f(\mathbf r)$ is independent of azimuthal angle $\phi$. In this case, using the integral representation of the Bessel function $J_0(x)$ given at the very end of subsection 18.5.3, we find

$$\tilde f(\mathbf k) = \frac{1}{2\pi}\int_0^{\infty} 2\pi\rho f(\rho)\,J_0(k\rho)\,d\rho. \qquad (13.52)$$

### 13.2 Laplace transforms

Often we are interested in functions $f(t)$ for which the Fourier transform does not exist because $f\not\to 0$ as $t\to\infty$, and so the integral defining $\tilde f$ does not converge. This would be the case for the function $f(t) = t$, which does not possess a Fourier transform. Furthermore, we might be interested in a given function only for $t > 0$, for example when we are given the value at $t = 0$ in an initial-value problem. This leads us to consider the Laplace transform, $\bar f(s)$ or $\mathcal L[f(t)]$, of $f(t)$, which is defined by

$$\bar f(s) \equiv \int_0^{\infty} f(t)\,e^{-st}\,dt, \qquad (13.53)$$

provided that the integral exists.
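To make definition (13.53) concrete, here is a minimal numerical sketch. It assumes a midpoint rule and a finite upper limit `t_max` in place of infinity, both choices made here for illustration only. It evaluates the transform of $f(t) = t$, the function just noted to have no Fourier transform, and of a growing exponential; the comparison values $1/s^2$ and $1/(s-a)$ follow from integrating (13.53) directly.

```python
import math

def laplace(func, s, t_max=60.0, n=60000):
    """Midpoint-rule estimate of the Laplace integral (13.53),
    with the infinite upper limit truncated at t_max."""
    h = t_max / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        total += func(t) * math.exp(-s * t)
    return total * h

s = 2.0
ramp = laplace(lambda t: t, s)                      # compare with 1 / s**2
a = 0.5
growing = laplace(lambda t: math.exp(a * t), s)     # compare with 1 / (s - a), valid for s > a

print(ramp, 1.0 / s**2)
print(growing, 1.0 / (s - a))
```

The factor $e^{-st}$ is what makes the integral converge even though neither test function tends to zero at large $t$; for the exponential, convergence requires $s > a$, so the truncated sum would not settle down if `s` were chosen below `a`.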
We assume here that $s$ is real, but complex values would have to be considered in a more detailed study. In practice, for a given function $f(t)$ there will be some real number $s_0$ such that the integral in (13.53) exists for $s > s_0$ but diverges for $s \le s_0$.

Through (13.53) we define a linear transformation $\mathcal L$ that converts functions of the variable $t$ to functions of a new variable $s$:

$$\mathcal L[af_1(t) + bf_2(t)] = a\mathcal L[f_1(t)] + b\mathcal L[f_2(t)] = a\bar f_1(s) + b\bar f_2(s). \qquad (13.54)$$

▶ Find the Laplace transforms of the functions (i) $f(t) = 1$, (ii) $f(t) = e^{at}$, (iii) $f(t) = t^n$, for $n = 0, 1, 2, \ldots$.

(i) By direct application of the definition of a Laplace transform (13.53), we find

$$\mathcal L[1] = \int_0^{\infty} e^{-st}\,dt = \left[\frac{-1}{s}e^{-st}\right]_0^{\infty} = \frac{1}{s}, \quad\text{if } s > 0,$$

where the restriction $s > 0$ is required for the integral to exist.

(ii) Again using (13.53) directly, we find

$$\bar f(s) = \int_0^{\infty} e^{at}e^{-st}\,dt = \int_0^{\infty} e^{(a-s)t}\,dt = \left[\frac{e^{(a-s)t}}{a-s}\right]_0^{\infty} = \frac{1}{s-a} \quad\text{if } s > a.$$

(iii) Once again using the definition (13.53) we have

$$\bar f_n(s) = \int_0^{\infty} t^n e^{-st}\,dt.$$

Integrating by parts we find

$$\bar f_n(s) = \left[\frac{-t^n e^{-st}}{s}\right]_0^{\infty} + \frac{n}{s}\int_0^{\infty} t^{n-1}e^{-st}\,dt = 0 + \frac{n}{s}\bar f_{n-1}(s), \quad\text{if } s > 0.$$

We now have a recursion relation between successive transforms and by calculating $\bar f_0$ we can infer $\bar f_1$, $\bar f_2$, etc. Since $t^0 = 1$, (i) above gives

$$\bar f_0 = \frac{1}{s}, \quad\text{if } s > 0, \qquad (13.55)$$

and

$$\bar f_1(s) = \frac{1}{s^2}, \quad \bar f_2(s) = \frac{2!}{s^3}, \quad\ldots,\quad \bar f_n(s) = \frac{n!}{s^{n+1}} \quad\text{if } s > 0.$$

Thus, in each case (i)–(iii), direct application of the definition of the Laplace transform (13.53) yields the required result. ◀

Unlike that for the Fourier transform, the inversion of the Laplace transform is not an easy operation to perform, since an explicit formula for $f(t)$, given $\bar f(s)$, is not straightforwardly obtained from (13.53).
The general method for obtaining an inverse Laplace transform makes use of complex variable theory and is not discussed until chapter 25. However, progress can be made without having to find an explicit inverse, since we can prepare from (13.53) a 'dictionary' of the Laplace transforms of common functions and, when faced with an inversion to carry out, hope to find the given transform (together with its parent function) in the listing. Such a list is given in table 13.1.

| $f(t)$ | $\bar f(s)$ | $s_0$ |
|---|---|---|
| $c$ | $c/s$ | $0$ |
| $ct^n$ | $cn!/s^{n+1}$ | $0$ |
| $\sin bt$ | $b/(s^2+b^2)$ | $0$ |
| $\cos bt$ | $s/(s^2+b^2)$ | $0$ |
| $e^{at}$ | $1/(s-a)$ | $a$ |
| $t^n e^{at}$ | $n!/(s-a)^{n+1}$ | $a$ |
| $\sinh at$ | $a/(s^2-a^2)$ | $\lvert a\rvert$ |
| $\cosh at$ | $s/(s^2-a^2)$ | $\lvert a\rvert$ |
| $e^{at}\sin bt$ | $b/[(s-a)^2+b^2]$ | $a$ |
| $e^{at}\cos bt$ | $(s-a)/[(s-a)^2+b^2]$ | $a$ |
| $t^{1/2}$ | $\tfrac12(\pi/s^3)^{1/2}$ | $0$ |
| $t^{-1/2}$ | $(\pi/s)^{1/2}$ | $0$ |
| $\delta(t-t_0)$ | $e^{-st_0}$ | $0$ |
| $H(t-t_0)$, equal to $1$ for $t \ge t_0$ and $0$ for $t < t_0$ | $e^{-st_0}/s$ | $0$ |

Table 13.1 Standard Laplace transforms, valid for $s > s_0$.

When finding inverse Laplace transforms using table 13.1, it is useful to note that for all practical purposes the inverse Laplace transform is unique§ and linear so that

$$\mathcal L^{-1}\left[a\bar f_1(s) + b\bar f_2(s)\right] = af_1(t) + bf_2(t). \qquad (13.56)$$

§ This is not strictly true, since two functions can differ from one another at a finite number of isolated points but have the same Laplace transform.

In many practical problems the method of partial fractions can be useful in producing an expression from which the inverse Laplace transform can be found.

▶ Using table 13.1 find $f(t)$ if

$$\bar f(s) = \frac{s+3}{s(s+1)}.$$

Using partial fractions $\bar f(s)$ may be written

$$\bar f(s) = \frac{3}{s} - \frac{2}{s+1}.$$

Comparing this with the standard Laplace transforms in table 13.1, we find that the inverse transform of $3/s$ is $3$ for $s > 0$ and the inverse transform of $2/(s+1)$ is $2e^{-t}$ for $s > -1$, and so

$$f(t) = 3 - 2e^{-t}, \quad\text{if } s > 0.$$

◀

#### 13.2.1 Laplace transforms of derivatives and integrals

One of the main uses of Laplace transforms is in solving differential equations.
Differential equations are the subject of the next six chapters and we will return to the application of Laplace transforms to their solution in chapter 15. In the meantime we will derive the required results, i.e. the Laplace transforms of derivatives.

The Laplace transform of the first derivative of $f(t)$ is given by

$$\mathcal L\left[\frac{df}{dt}\right] = \int_0^{\infty}\frac{df}{dt}e^{-st}\,dt = \Big[f(t)\,e^{-st}\Big]_0^{\infty} + s\int_0^{\infty} f(t)\,e^{-st}\,dt = -f(0) + s\bar f(s), \quad\text{for } s > 0. \qquad (13.57)$$

The evaluation relies on integration by parts and higher-order derivatives may be found in a similar manner.

▶ Find the Laplace transform of $d^2f/dt^2$.

Using the definition of the Laplace transform and integrating by parts we obtain

$$\mathcal L\left[\frac{d^2f}{dt^2}\right] = \int_0^{\infty}\frac{d^2f}{dt^2}e^{-st}\,dt = \left[\frac{df}{dt}e^{-st}\right]_0^{\infty} + s\int_0^{\infty}\frac{df}{dt}e^{-st}\,dt = -\frac{df}{dt}(0) + s\left[s\bar f(s) - f(0)\right], \quad\text{for } s > 0,$$

where (13.57) has been substituted for the integral. This can be written more neatly as

$$\mathcal L\left[\frac{d^2f}{dt^2}\right] = s^2\bar f(s) - sf(0) - \frac{df}{dt}(0), \quad\text{for } s > 0.$$

◀

In general the Laplace transform of the $n$th derivative is given by

$$\mathcal L\left[\frac{d^nf}{dt^n}\right] = s^n\bar f - s^{n-1}f(0) - s^{n-2}\frac{df}{dt}(0) - \cdots - \frac{d^{n-1}f}{dt^{n-1}}(0), \quad\text{for } s > 0. \qquad (13.58)$$

We now turn to integration, which is much more straightforward. From the definition (13.53),

$$\mathcal L\left[\int_0^t f(u)\,du\right] = \int_0^{\infty} dt\,e^{-st}\int_0^t f(u)\,du = \left[-\frac{1}{s}e^{-st}\int_0^t f(u)\,du\right]_0^{\infty} + \int_0^{\infty}\frac{1}{s}e^{-st}f(t)\,dt.$$

The first term on the RHS vanishes at both limits, and so

$$\mathcal L\left[\int_0^t f(u)\,du\right] = \frac{1}{s}\mathcal L[f].$$
(13.59)

#### 13.2.2 Other properties of Laplace transforms

From table 13.1 it will be apparent that multiplying a function $f(t)$ by $e^{at}$ has the effect on its transform that $s$ is replaced by $s - a$. This is easily proved generally:

$$\mathcal L\left[e^{at}f(t)\right] = \int_0^{\infty} f(t)\,e^{at}e^{-st}\,dt = \int_0^{\infty} f(t)\,e^{-(s-a)t}\,dt = \bar f(s-a). \qquad (13.60)$$

As it were, multiplying $f(t)$ by $e^{at}$ moves the origin of $s$ by an amount $a$.

We may now consider the effect of multiplying the Laplace transform $\bar f(s)$ by $e^{-bs}$ ($b > 0$). From the definition (13.53),

$$e^{-bs}\bar f(s) = \int_0^{\infty} e^{-s(t+b)}f(t)\,dt = \int_b^{\infty} e^{-sz}f(z-b)\,dz,$$

on putting $t + b = z$. Thus $e^{-bs}\bar f(s)$ is the Laplace transform of a function $g(t)$ defined by

$$g(t) = \begin{cases} 0 & \text{for } 0 < t \le b,\\ f(t-b) & \text{for } t > b.\end{cases}$$

In other words, the function $f$ has been translated to 'later' $t$ (larger values of $t$) by an amount $b$.

Further properties of Laplace transforms can be proved in similar ways and are listed below.

(i) $$\mathcal L[f(at)] = \frac{1}{a}\bar f\left(\frac{s}{a}\right), \qquad (13.61)$$

(ii) $$\mathcal L[t^n f(t)] = (-1)^n\frac{d^n\bar f(s)}{ds^n}, \quad\text{for } n = 1, 2, 3, \ldots, \qquad (13.62)$$

(iii) $$\mathcal L\left[\frac{f(t)}{t}\right] = \int_s^{\infty}\bar f(u)\,du, \qquad (13.63)$$

provided $\lim_{t\to 0}[f(t)/t]$ exists.

Related results may be easily proved.

▶ Find an expression for the Laplace transform of $t\,d^2f/dt^2$.

From the definition of the Laplace transform we have

$$\mathcal L\left[t\frac{d^2f}{dt^2}\right] = \int_0^{\infty} e^{-st}\,t\,\frac{d^2f}{dt^2}\,dt = -\frac{d}{ds}\int_0^{\infty} e^{-st}\frac{d^2f}{dt^2}\,dt = -\frac{d}{ds}\left[s^2\bar f(s) - sf(0) - f'(0)\right] = -s^2\frac{d\bar f}{ds} - 2s\bar f + f(0).$$

◀

Finally we mention the convolution theorem for Laplace transforms (which is analogous to that for Fourier transforms discussed in subsection 13.1.7).
If the functions $f$ and $g$ have Laplace transforms $\bar f(s)$ and $\bar g(s)$ then

$$\mathcal L\left[\int_0^t f(u)\,g(t-u)\,du\right] = \bar f(s)\,\bar g(s), \qquad (13.64)$$

where the integral in the brackets on the LHS is the convolution of $f$ and $g$, denoted by $f*g$. As in the case of Fourier transforms, the convolution defined above is commutative, i.e. $f*g = g*f$, and is associative and distributive. From (13.64) we also see that

$$\mathcal L^{-1}\left[\bar f(s)\,\bar g(s)\right] = \int_0^t f(u)\,g(t-u)\,du = f*g.$$

Figure 13.7 Two representations of the Laplace transform convolution (see text).

▶ Prove the convolution theorem (13.64) for Laplace transforms.

From the definition (13.53),

$$\bar f(s)\,\bar g(s) = \int_0^{\infty} e^{-su}f(u)\,du\int_0^{\infty} e^{-sv}g(v)\,dv = \int_0^{\infty} du\int_0^{\infty} dv\,e^{-s(u+v)}f(u)\,g(v).$$

Now letting $u + v = t$ changes the limits on the integrals, with the result that

$$\bar f(s)\,\bar g(s) = \int_0^{\infty} du\,f(u)\int_u^{\infty} dt\,g(t-u)\,e^{-st}.$$

As shown in figure 13.7(a) the shaded area of integration may be considered as the sum of vertical strips. However, we may instead integrate over this area by summing over horizontal strips as shown in figure 13.7(b). Then the integral can be written as

$$\bar f(s)\,\bar g(s) = \int_0^t du\,f(u)\int_0^{\infty} dt\,g(t-u)\,e^{-st} = \int_0^{\infty} dt\,e^{-st}\left\{\int_0^t f(u)\,g(t-u)\,du\right\} = \mathcal L\left[\int_0^t f(u)\,g(t-u)\,du\right].$$

◀

The properties of the Laplace transform derived in this section can sometimes be useful in finding the Laplace transforms of particular functions.

▶ Find the Laplace transform of $f(t) = t\sin bt$.

Although we could calculate the Laplace transform directly, we can use (13.62) to give

$$\bar f(s) = (-1)\frac{d}{ds}\mathcal L[\sin bt] = -\frac{d}{ds}\left(\frac{b}{s^2+b^2}\right) = \frac{2bs}{(s^2+b^2)^2}, \quad\text{for } s > 0.$$
◀

### 13.3 Concluding remarks

In this chapter we have discussed Fourier and Laplace transforms in some detail. Both are examples of integral transforms, which can be considered in a more general context.

A general integral transform of a function $f(t)$ takes the form

$$F(\alpha) = \int_a^b K(\alpha,t)\,f(t)\,dt, \qquad (13.65)$$

where $F(\alpha)$ is the transform of $f(t)$ with respect to the kernel $K(\alpha,t)$, and $\alpha$ is the transform variable. For example, in the Laplace transform case $K(s,t) = e^{-st}$, $a = 0$, $b = \infty$.

Very often the inverse transform can also be written straightforwardly and we obtain a transform pair similar to that encountered in Fourier transforms. Examples of such pairs are

(i) the Hankel transform

$$F(k) = \int_0^{\infty} f(x)\,J_n(kx)\,x\,dx, \qquad f(x) = \int_0^{\infty} F(k)\,J_n(kx)\,k\,dk,$$

where the $J_n$ are Bessel functions of order $n$, and

(ii) the Mellin transform

$$F(z) = \int_0^{\infty} t^{z-1}f(t)\,dt, \qquad f(t) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} t^{-z}F(z)\,dz.$$

Although we do not have the space to discuss their general properties, the reader should at least be aware of this wider class of integral transforms.

### 13.4 Exercises

13.1 Find the Fourier transform of the function $f(t) = \exp(-|t|)$.

(a) By applying Fourier's inversion theorem prove that

$$\frac{\pi}{2}\exp(-|t|) = \int_0^{\infty}\frac{\cos\omega t}{1+\omega^2}\,d\omega.$$

(b) By making the substitution $\omega = \tan\theta$, demonstrate the validity of Parseval's theorem for this function.

13.2 Use the general definition and properties of Fourier transforms to show the following.

(a) If $f(x)$ is periodic with period $a$ then $\tilde f(k) = 0$, unless $ka = 2\pi n$ for integer $n$.

(b) The Fourier transform of $tf(t)$ is $i\,d\tilde f(\omega)/d\omega$.

(c) The Fourier transform of $f(mt+c)$ is

$$\frac{e^{i\omega c/m}}{m}\tilde f\left(\frac{\omega}{m}\right).$$

13.3 Find the Fourier transform of $H(x-a)e^{-bx}$, where $H(x)$ is the Heaviside function.
13.4 Prove that the Fourier transform of the function $f(t)$ defined in the $tf$-plane by straight-line segments joining $(-T,0)$ to $(0,1)$ to $(T,0)$, with $f(t) = 0$ outside |t| 0, 0 t<0, where $\gamma$ $(> 0)$ and $p$ are constant parameters.

(b) The current $I(t)$ flowing through a certain system is related to the applied voltage $V(t)$ by the equation

$$I(t) = \int_{-\infty}^{\infty} K(t-u)\,V(u)\,du,$$

where

$$K(\tau) = a_1 f(\gamma_1,p_1,\tau) + a_2 f(\gamma_2,p_2,\tau).$$

The function $f(\gamma,p,t)$ is as given in (a) and all the $a_i$, $\gamma_i$ $(> 0)$ and $p_i$ are fixed parameters. By considering the Fourier transform of $I(t)$, find the relationship that must hold between $a_1$ and $a_2$ if the total net charge $Q$ passed through the system (over a very long time) is to be zero for an arbitrary applied voltage.

13.14 Prove the equality

$$\int_0^{\infty} e^{-2at}\sin^2 at\,dt = \frac{1}{\pi}\int_0^{\infty}\frac{a^2}{4a^4+\omega^4}\,d\omega.$$

13.15 A linear amplifier produces an output that is the convolution of its input and its response function. The Fourier transform of the response function for a particular amplifier is

$$\tilde K(\omega) = \frac{i\omega}{\sqrt{2\pi}\,(\alpha+i\omega)^2}.$$

Determine the time variation of its output $g(t)$ when its input is the Heaviside step function. (Consider the Fourier transform of a decaying exponential function and the result of exercise 13.2(b).)

13.16 In quantum mechanics, two equal-mass particles having momenta $\mathbf p_j = \hbar\mathbf k_j$ and energies $E_j = \hbar\omega_j$ and represented by plane wavefunctions $\phi_j = \exp[i(\mathbf k_j\cdot\mathbf r_j - \omega_j t)]$, $j = 1, 2$, interact through a potential $V = V(|\mathbf r_1 - \mathbf r_2|)$. In first-order perturbation theory the probability of scattering to a state with momenta and energies $\mathbf p'_j$, $E'_j$ is determined by the modulus squared of the quantity

$$M = \iiint\psi_f^*\,V\,\psi_i\,d\mathbf r_1\,d\mathbf r_2\,dt.$$

The initial state, $\psi_i$, is $\phi_1\phi_2$ and the final state, $\psi_f$, is $\phi'_1\phi'_2$.
(a) By writing $\mathbf r_1 + \mathbf r_2 = 2\mathbf R$ and $\mathbf r_1 - \mathbf r_2 = \mathbf r$ and assuming that $d\mathbf r_1\,d\mathbf r_2 = d\mathbf R\,d\mathbf r$, show that $M$ can be written as the product of three one-dimensional integrals.

(b) From two of the integrals deduce energy and momentum conservation in the form of δ-functions.

(c) Show that $M$ is proportional to the Fourier transform of $V$, i.e. to $\tilde V(\mathbf k)$ where $2\hbar\mathbf k = (\mathbf p_2 - \mathbf p_1) - (\mathbf p'_2 - \mathbf p'_1)$ or, alternatively, $\hbar\mathbf k = \mathbf p'_1 - \mathbf p_1$.

13.17 For some ion–atom scattering processes, the potential $V$ of the previous exercise may be approximated by $V = |\mathbf r_1 - \mathbf r_2|^{-1}\exp(-\mu|\mathbf r_1 - \mathbf r_2|)$. Show, using the result of the worked example in subsection 13.1.10, that the probability that the ion will scatter from, say, $\mathbf p_1$ to $\mathbf p'_1$ is proportional to $(\mu^2 + k^2)^{-2}$, where $k = |\mathbf k|$ and $\mathbf k$ is as given in part (c) of that exercise.

13.18 The equivalent duration and bandwidth, $T_e$ and $B_e$, of a signal $x(t)$ are defined in terms of the latter and its Fourier transform $\tilde x(\omega)$ by

$$T_e = \frac{1}{x(0)}\int_{-\infty}^{\infty} x(t)\,dt, \qquad B_e = \frac{1}{\tilde x(0)}\int_{-\infty}^{\infty}\tilde x(\omega)\,d\omega,$$

where neither $x(0)$ nor $\tilde x(0)$ is zero. Show that the product $T_eB_e = 2\pi$ (this is a form of uncertainty principle), and find the equivalent bandwidth of the signal $x(t) = \exp(-|t|/T)$. For this signal, determine the fraction of the total energy that lies in the frequency range |ω|

**|a|;

(c) $\mathcal L[\sinh at\cos bt] = a(s^2 - a^2 - b^2)\,[(s-a)^2 + b^2]^{-1}[(s+a)^2 + b^2]^{-1}$.

13.24 Find the solution (the so-called impulse response or Green's function) of the equation

$$T\frac{dx}{dt} + x = \delta(t)$$

by proceeding as follows.

(a) Show by substitution that $x(t) = A(1 - e^{-t/T})H(t)$ is a solution, for which $x(0) = 0$, of

$$T\frac{dx}{dt} + x = AH(t), \qquad (*)$$

where $H(t)$ is the Heaviside step function.

(b) Construct the solution when the RHS of $(*)$ is replaced by $AH(t-\tau)$, with $dx/dt = x = 0$ for $t < \tau$, and hence find the solution when the RHS is a rectangular pulse of duration $\tau$.

(c) By setting $A = 1/\tau$ and taking the limit as $\tau\to 0$, show that the impulse response is $x(t) = T^{-1}e^{-t/T}$.

(d) Obtain the same result much more directly by taking the Laplace transform of each term in the original equation, solving the resulting algebraic equation and then using the entries in table 13.1.

13.25 This exercise is concerned with the limiting behaviour of Laplace transforms.

(a) If $f(t) = A + g(t)$, where $A$ is a constant and the indefinite integral of $g(t)$ is bounded as its upper limit tends to $\infty$, show that

$$\lim_{s\to 0} s\bar f(s) = A.$$

(b) For $t > 0$, the function $y(t)$ obeys the differential equation

$$\frac{d^2y}{dt^2} + a\frac{dy}{dt} + by = c\cos^2\omega t,$$

where $a$, $b$ and $c$ are positive constants. Find $\bar y(s)$ and show that $s\bar y(s)\to c/2b$ as $s\to 0$. Interpret the result in the $t$-domain.

13.26 By writing $f(x)$ as an integral involving the δ-function $\delta(\xi - x)$ and taking the Laplace transforms of both sides, show that the transform of the solution of the equation

$$\frac{d^4y}{dx^4} - y = f(x)$$

for which $y$ and its first three derivatives vanish at $x = 0$ can be written as

$$\bar y(s) = \int_0^{\infty} f(\xi)\,\frac{e^{-s\xi}}{s^4 - 1}\,d\xi.$$

Use the properties of Laplace transforms and the entries in table 13.1 to show that

$$y(x) = \frac{1}{2}\int_0^x f(\xi)\left[\sinh(x-\xi) - \sin(x-\xi)\right]d\xi.$$

13.27 The function $f_a(x)$ is defined as unity for 0**0, and |z|, for z<0. Auto-correlation $a(z) = [1/(2\lambda^3)]\exp(-\lambda|z|)$.
13.21 Prove the result for $t^{1/2}$ by integrating that for $t^{-1/2}$ by parts.

13.23 (a) Use (13.62) with $n = 2$ on $\mathcal{L}\left[\sqrt{t}\right]$; (b) use (13.63); (c) consider $\mathcal{L}[\exp(\pm at)\cos bt]$ and use the translation property, subsection 13.2.2.

13.25 (a) Note that $\left|\lim\int g(t)e^{-st}\,dt\right| \le \left|\lim\int g(t)\,dt\right|$. (b) $(s^2 + as + b)\bar{y}(s) = \{c(s^2 + 2\omega^2)/[s(s^2 + 4\omega^2)]\} + (a + s)y(0) + y'(0)$. For this damped system, at large $t$ (corresponding to $s \to 0$) rates of change are negligible and the equation reduces to $by = c\cos^2\omega t$. The average value of $\cos^2\omega t$ is $\tfrac{1}{2}$.

13.27 $s^{-1}[1 - \exp(-sa)]$; $g_a(x) = x$ for 0

0 and ellipses for $c < 0$. Singular solution $y = \pm(x \pm 1)$.

14.23 (a) Integrating factor is $(a^2 + x^2)^{1/2}$, $y = (a^2 + x^2)/3 + A(a^2 + x^2)^{-1/2}$; (b) separable, $y = x(x^2 + Ax + 4)^{-1}$.

14.25 Use Laplace transforms; $\bar{x}s(s^2 + 4) = s + s^2 - 2e^{-3s}$; $x(t) = \tfrac{1}{2}\sin 2t + \cos 2t - \tfrac{1}{2}H(t-3) + \tfrac{1}{2}\cos(2t-6)H(t-3)$.

14.27 This is Clairaut's equation with $F(p) = A/p$. General solution $y = cx + A/c$; singular solution, $y = 2\sqrt{Ax}$.

14.29 Either Bernoulli's equation with $n = 2$ or an isobaric equation with $m = 3/2$; $y(x) = 5x^{3/2}/(2 + 3x^{5/2})$.

14.31 Show that $p = (Ce^x - 1)^{-1}$, where $p = dy/dx$; $y = \ln[(C - e^{-x})/(C - 1)]$ or $\ln[D - (D-1)e^{-x}]$ or $\ln(e^{-K} + 1 - e^{-x}) + K$.

15 Higher-order ordinary differential equations

Following on from the discussion of first-order ordinary differential equations (ODEs) given in the previous chapter, we now examine equations of second and higher order. Since a brief outline of the general properties of ODEs and their solutions was given at the beginning of the previous chapter, we will not repeat it here. Instead, we will begin with a discussion of various types of higher-order equation. This chapter is divided into three main parts. We first discuss linear equations with constant coefficients and then investigate linear equations with variable coefficients.
Finally, we discuss a few methods that may be of use in solving general linear or non-linear ODEs.

Let us start by considering some general points relating to all linear ODEs. Linear equations are of paramount importance in the description of physical processes. Moreover, it is an empirical fact that, when put into mathematical form, many natural processes appear as higher-order linear ODEs, most often as second-order equations. Although we could restrict our attention to these second-order equations, the generalisation to $n$th-order equations requires little extra work, and so we will consider this more general case.

A linear ODE of general order $n$ has the form

$$a_n(x)\frac{d^n y}{dx^n} + a_{n-1}(x)\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = f(x). \tag{15.1}$$

If $f(x) = 0$ then the equation is called homogeneous; otherwise it is inhomogeneous. The first-order linear equation studied in subsection 14.2.4 is a special case of (15.1). As discussed at the beginning of the previous chapter, the general solution to (15.1) will contain $n$ arbitrary constants, which may be determined if $n$ boundary conditions are also provided.

In order to solve any equation of the form (15.1), we must first find the general solution of the complementary equation, i.e. the equation formed by setting $f(x) = 0$:

$$a_n(x)\frac{d^n y}{dx^n} + a_{n-1}(x)\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)y = 0. \tag{15.2}$$

To determine the general solution of (15.2), we must find $n$ linearly independent functions that satisfy it. Once we have found these solutions, the general solution is given by a linear superposition of these $n$ functions. In other words, if the $n$ solutions of (15.2) are $y_1(x), y_2(x), \ldots, y_n(x)$, then the general solution is given by the linear superposition

$$y_c(x) = c_1 y_1(x) + c_2 y_2(x) + \cdots + c_n y_n(x), \tag{15.3}$$

where the $c_m$ are arbitrary constants that may be determined if $n$ boundary conditions are provided.
The linear combination $y_c(x)$ is called the complementary function of (15.1).

The question naturally arises of how we establish that any $n$ individual solutions to (15.2) are indeed linearly independent. For $n$ functions to be linearly independent over an interval, there must not exist any set of constants $c_1, c_2, \ldots, c_n$ such that

$$c_1 y_1(x) + c_2 y_2(x) + \cdots + c_n y_n(x) = 0 \tag{15.4}$$

over the interval in question, except for the trivial case $c_1 = c_2 = \cdots = c_n = 0$.

A statement equivalent to (15.4), which is perhaps more useful for the practical determination of linear independence, can be found by repeatedly differentiating (15.4), $n-1$ times in all, to obtain $n$ simultaneous equations for $c_1, c_2, \ldots, c_n$:

$$\begin{aligned}
c_1 y_1(x) + c_2 y_2(x) + \cdots + c_n y_n(x) &= 0\\
c_1 y_1'(x) + c_2 y_2'(x) + \cdots + c_n y_n'(x) &= 0\\
&\;\;\vdots\\
c_1 y_1^{(n-1)}(x) + c_2 y_2^{(n-1)}(x) + \cdots + c_n y_n^{(n-1)}(x) &= 0,
\end{aligned} \tag{15.5}$$

where the primes denote differentiation with respect to $x$. Referring to the discussion of simultaneous linear equations given in chapter 8, if the determinant of the coefficients of $c_1, c_2, \ldots, c_n$ is non-zero then the only solution to equations (15.5) is the trivial solution $c_1 = c_2 = \cdots = c_n = 0$. In other words, the $n$ functions $y_1(x), y_2(x), \ldots, y_n(x)$ are linearly independent over an interval if

$$W(y_1, y_2, \ldots, y_n) = \begin{vmatrix}
y_1 & y_2 & \cdots & y_n\\
y_1' & y_2' & & \vdots\\
\vdots & & \ddots & \vdots\\
y_1^{(n-1)} & \cdots & \cdots & y_n^{(n-1)}
\end{vmatrix} \neq 0 \tag{15.6}$$

over that interval; $W(y_1, y_2, \ldots, y_n)$ is called the Wronskian of the set of functions. It should be noted, however, that the vanishing of the Wronskian does not guarantee that the functions are linearly dependent.
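The Wronskian test (15.6), and the caveat that a vanishing Wronskian does not by itself imply linear dependence, can be illustrated with a minimal Python sketch for $n = 2$. The two pairs of functions used here, $e^x, e^{2x}$ and $x^2, x|x|$, are illustrative choices and are not taken from the text; the function names are likewise hypothetical, and the derivatives are supplied by hand rather than computed.

```python
import math


def wronskian2(y1, dy1, y2, dy2, x):
    """Return the 2x2 Wronskian W = y1*y2' - y2*y1' evaluated at x."""
    return y1(x) * dy2(x) - y2(x) * dy1(x)


def w_exponentials(x):
    """Wronskian of e^x and e^(2x); analytically this equals e^(3x)."""
    return wronskian2(
        math.exp, math.exp,
        lambda t: math.exp(2 * t), lambda t: 2 * math.exp(2 * t),
        x,
    )


def w_counterexample(x):
    """Wronskian of x^2 and x|x|; analytically this is zero for every x."""
    return wronskian2(
        lambda t: t * t, lambda t: 2 * t,
        lambda t: t * abs(t), lambda t: 2 * abs(t),
        x,
    )


def only_trivial_combination():
    """Check that c1*x^2 + c2*x|x| = 0 on [-1, 1] forces c1 = c2 = 0.

    Evaluating the combination at x = 1 and x = -1 gives
        c1 + c2 = 0,   c1 - c2 = 0,
    a 2x2 system whose determinant is non-zero, so only c1 = c2 = 0 works.
    """
    f1 = lambda t: t * t
    f2 = lambda t: t * abs(t)
    det = f1(1.0) * f2(-1.0) - f2(1.0) * f1(-1.0)
    return det != 0.0
```

For the exponentials the Wronskian is non-zero everywhere, so (15.6) establishes their independence. For $x^2$ and $x|x|$ the Wronskian vanishes at every point, yet the two functions are linearly independent on any interval containing the origin; on an interval that excludes $x = 0$ they are proportional and hence dependent.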
If the original equation (15.1) has $f(x) = 0$ (i.e. it is homogeneous) then of course the complementary function $y_c(x)$ in (15.3) is already the general solution. If, however, the equation has $f(x) \neq 0$ (i.e. it is inhomogeneous) then $y_c(x)$ is only one part of the solution. The general solution of (15.1) is then given by

$$y(x) = y_c(x) + y_p(x), \tag{15.7}$$

where $y_p(x)$ is the particular integral, which can be any function that satisfies (15.1) directly, provided it is linearly independent of $y_c(x)$. It should be emphasised for practical purposes that any such function, no matter how simple (or complicated), is equally valid in forming the general solution (15.7).

It is important to realise that the above method for finding the general solution to an ODE by superposing particular solutions assumes crucially that the ODE is linear. For non-linear equations, discussed in section 15.3, this method cannot be used, and indeed it is often impossible to find closed-form solutions to such equations.

15.1 Linear equations with constant coefficients

If the $a_m$ in (15.1) are constants rather than functions of $x$ then we have

$$a_n\frac{d^n y}{dx^n} + a_{n-1}\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1\frac{dy}{dx} + a_0 y = f(x). \tag{15.8}$$

Equations of this sort are very common throughout the physical sciences and engineering, and the method for their solution falls into two parts as discussed in the previous section, i.e. finding the complementary function $y_c(x)$ and finding the particular integral $y_p(x)$. If $f(x) = 0$ in (15.8) then we do not have to find a particular integral, and the complementary function is by itself the general solution.

15.1.1 Finding the complementary function $y_c(x)$

The complementary function must satisfy

$$a_n\frac{d^n y}{dx^n} + a_{n-1}\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1\frac{dy}{dx} + a_0 y = 0 \tag{15.9}$$

and contain $n$ arbitrary constants (see equation (15.3)).
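As a concrete illustration of these requirements, the following Python sketch takes the second-order case $y'' - 3y' + 2y = 0$ and checks numerically that $c_1 e^{x} + c_2 e^{2x}$ satisfies it for several arbitrary choices of $c_1$ and $c_2$. It then adds the function $x + \tfrac{3}{2}$ and checks that the sum satisfies the inhomogeneous equation with $f(x) = 2x$, in the manner of (15.7). The equation, the right-hand side and the function names are assumptions chosen for illustration only and do not come from the text.

```python
import math


def lhs(c1, c2, x, add_particular=False):
    """Evaluate y'' - 3y' + 2y for y = c1*e^x + c2*e^(2x) [+ x + 3/2]."""
    e1, e2 = math.exp(x), math.exp(2 * x)
    y = c1 * e1 + c2 * e2
    dy = c1 * e1 + 2 * c2 * e2
    d2y = c1 * e1 + 4 * c2 * e2
    if add_particular:
        # The trial particular integral y_p = x + 3/2 has y_p' = 1, y_p'' = 0.
        y += x + 1.5
        dy += 1.0
    return d2y - 3 * dy + 2 * y


def max_residual(f, add_particular, constants, points):
    """Largest |LHS - f(x)| over the given constants (c1, c2) and points x."""
    return max(
        abs(lhs(c1, c2, x, add_particular) - f(x))
        for (c1, c2) in constants
        for x in points
    )


# Arbitrary illustrative constants and sample points.
CONSTANTS = [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]
POINTS = [-1.0, 0.0, 0.5, 1.0]

homogeneous_error = max_residual(lambda x: 0.0, False, CONSTANTS, POINTS)
inhomogeneous_error = max_residual(lambda x: 2 * x, True, CONSTANTS, POINTS)
```

Both residuals vanish to within rounding error whatever values are chosen for $c_1$ and $c_2$, which is what it means for the two constants to be arbitrary.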
The standard method for finding $y_c(x)$ is to try a solution of the form $y = Ae^{\lambda x}$, substituting this into (15.9). After dividing the resulting equation through by $Ae^{\lambda x}$, we are left with a polynomial equation in $\lambda$ of order $n$; this is the auxiliary equation and reads

$$a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = 0. \tag{15.10}$$

In general the auxiliary equation has $n$ roots, say $\lambda_1, \lambda_2, \ldots, \lambda_n$. In certain cases, some of these roots may be repeated and some may be complex. The three main cases are as follows.

(i) All roots real and distinct. In this case the $n$ solutions to (15.9) are $\exp\lambda_m x$ for $m = 1$ to $n$. It is easily shown by calculating the Wronskian (15.6) of these functions that if all the $\lambda_m$ are distinct then these solutions are linearly independent. We can therefore linearly superpose them, as in (15.3), to form the complementary function

$$y_c(x) = c_1 e^{\lambda_1 x} + c_2 e^{\lambda_2 x} + \cdots + c_n e^{\lambda_n x}. \tag{15.11}$$

(ii) Some roots complex. For the special (but usual) case that all the coefficients $a_m$ in (15.9) are real, if one of the roots of the auxiliary equation (15.10) is complex, say $\alpha + i\beta$, then its complex conjugate $\alpha - i\beta$ is also a root. In this case we can write

$$c_1 e^{(\alpha+i\beta)x} + c_2 e^{(\alpha-i\beta)x} = e^{\alpha x}(d_1\cos\beta x + d_2\sin\beta x) = Ae^{\alpha x}\begin{Bmatrix}\sin\\ \cos\end{Bmatrix}(\beta x + \phi), \tag{15.12}$$

where $A$ and $\phi$ are arbitrary constants.

(iii) Some roots repeated. If, for example, $\lambda_1$ occurs $k$ times ($k > 1$) as a root of the auxiliary equation, then we have not found $n$ linearly independent solutions of (15.9); formally the Wronskian (15.6) o