
The Gram–Schmidt process

In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt process is a method for orthonormalising a set of vectors in an inner product space, most commonly the Euclidean space Rn. The Gram–Schmidt process takes a finite, linearly independent set S = {v1, …, vk} for k ≤ n and generates an orthogonal set S′ = {u1, …, uk} that spans the same k-dimensional subspace of Rn as S.

The method is named after Jørgen Pedersen Gram and Erhard Schmidt, but it appeared earlier in the work of Laplace and Cauchy. In the theory of Lie group decompositions it is generalized by the Iwasawa decomposition.[1]

The application of the Gram–Schmidt process to the column vectors of a full column rank matrix yields the QR decomposition (it is decomposed into an orthogonal and a triangular matrix).

In linear algebra, a QR decomposition (also called a QR factorization) of a matrix A is a decomposition into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares problem, and is the basis for a particular eigenvalue algorithm, the QR algorithm.

If A has n linearly independent columns, then the first n columns of Q form an orthonormal basis for the column space of A. More specifically, the first k columns of Q form an orthonormal basis for the span of the first k columns of A for any 1 ≤ k ≤ n.[1] The fact that any column k of A only depends on the first k columns of Q is responsible for the triangular form of R.
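To make the link between Gram–Schmidt and QR concrete, here is a minimal sketch in plain Python. The function name `qr_gram_schmidt` and the 3×2 test matrix are illustrative assumptions, not part of the original text. It builds the "thin" factorization, in which Q has as many columns as A, and it assumes the columns of A are linearly independent.

```python
import math

def qr_gram_schmidt(A):
    """Thin QR of A (a list of rows) via classical Gram-Schmidt.

    Assumes the columns of A are linearly independent.
    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular).
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]  # columns of A
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k, a in enumerate(cols):
        u = list(a)
        for j, q in enumerate(q_cols):
            # R[j][k] = <q_j, a_k>: how much of column k lies along q_j
            R[j][k] = sum(qi * ai for qi, ai in zip(q, a))
            u = [ui - R[j][k] * qi for ui, qi in zip(u, q)]
        R[k][k] = math.sqrt(sum(ui * ui for ui in u))
        q_cols.append([ui / R[k][k] for ui in u])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R

A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
Q, R = qr_gram_schmidt(A)
```

Column k of A is rebuilt from the first k columns of Q only, which is why every entry of R below the diagonal stays zero.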


Deriving the Dot Product

Some books actually use |u||v|\cos(\theta) as the definition of the dot product. Another definition is u \cdot v = x_u x_v + y_u y_v.

The idea is to find the angle between two vectors. One way to do this is to look at the angles each vector makes with the x-axis, which I’ll call \theta_u and \theta_v; the angle between the vectors is the difference between them. Similarly I’ll let the vector u have two components (x_u, y_u) and v be (x_v, y_v).

I want to find \cos(\theta_u - \theta_v), which is:

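\displaystyle \cos(\theta_u -\theta_v)  = \cos \theta_u \cos \theta_v + \sin \theta_u \sin \theta_v  = (\frac{x_u }{ |u|}) (\frac{x_v }{ |v|}) + (\frac{y_u} {|u|}) (\frac{y_v }{ |v|}) = \frac{x_u x_v + y_u y_v}{|u||v|}

Multiplying both sides by |u||v| gives |u||v|\cos(\theta_u -\theta_v) = x_u x_v + y_u y_v, so the two definitions agree.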

Very simple, it turns out, when you look at it the right way.
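If you want to convince yourself numerically, the following sketch computes the cosine both ways for one pair of vectors. The helper names and the sample vectors are my own choices for illustration, and `math.atan2` is used to get each vector's angle with the x-axis.

```python
import math

def cos_from_angles(u, v):
    """cos of the difference of the angles each vector makes with the x-axis."""
    theta_u = math.atan2(u[1], u[0])
    theta_v = math.atan2(v[1], v[0])
    return math.cos(theta_u - theta_v)

def cos_from_components(u, v):
    """(x_u x_v + y_u y_v) / (|u| |v|)."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))

u, v = (3.0, 4.0), (-1.0, 2.0)
print(cos_from_angles(u, v), cos_from_components(u, v))
```

Both numbers should agree up to floating-point rounding; here the exact value is 1/\sqrt{5}.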

Make sense? Now, why don’t you try to derive the same result for
3-dimensional vectors? If you’re slick, you can actually use the
2-dimensional result (hint: there’s a plane that contains the two
vectors and the origin).





Cauchy–Schwarz inequality

From Wikipedia, the free encyclopedia

In mathematics, the Cauchy–Schwarz inequality (also known as the Bunyakovsky inequality, the Schwarz inequality, or the Cauchy–Bunyakovsky–Schwarz inequality) is a useful inequality encountered in many different settings, such as linear algebra, analysis, probability theory, and other areas. It is a specific case of Hölder’s inequality.

The inequality for sums was published by Augustin-Louis Cauchy (1821), while the corresponding inequality for integrals was first stated by Viktor Bunyakovsky (1859) and rediscovered by Hermann Amandus Schwarz (1888) (often misspelled “Schwartz”).

Statement of the inequality

The Cauchy–Schwarz inequality states that for all vectors x and y of an inner product space,

| \langle x,y\rangle|^2 \leq \langle x,x\rangle \cdot \langle y,y\rangle,

where \langle\cdot,\cdot\rangle is the inner product. Equivalently, by taking the square root of both sides, and referring to the norms of the vectors, the inequality is written as

 |\langle x,y\rangle| \leq \|x\| \cdot \|y\|.\,

Moreover, the two sides are equal if and only if x and y are linearly dependent (or, in a geometrical sense, they are parallel or one of the vectors is equal to zero).

If x_1,\ldots, x_n\in\mathbb C and y_1,\ldots, y_n\in\mathbb C are any complex numbers and the inner product is the standard inner product then the inequality may be restated in a more explicit way as follows:

|x_1 \bar{y_1} + \cdots + x_n \bar{y_n}|^2 \leq (|x_1|^2 + \cdots + |x_n|^2) (|y_1|^2 + \cdots + |y_n|^2).

When viewed in this way the numbers x1, …, xn, and y1, …, yn are the components of x and y with respect to an orthonormal basis of V.

Even more compactly written:

\left|\sum_{i=1}^n x_i \bar{y_i}\right|^2 \leq \sum_{j=1}^n |x_j|^2 \sum_{k=1}^n |y_k|^2 .

Equality holds if and only if x and y are linearly dependent, that is, one is a scalar multiple of the other (which includes the case when one or both are zero).

The finite-dimensional case of this inequality for real vectors was proved by Cauchy in 1821, and in 1859 Cauchy’s student Bunyakovsky noted that by taking limits one can obtain an integral form of Cauchy’s inequality. The general result for an inner product space was obtained by Schwarz in 1885.


Let u, v be arbitrary vectors in a vector space V over F with an inner product, where F is the field of real or complex numbers. We prove the inequality

 \big| \langle u,v \rangle \big| \leq \left\|u\right\| \left\|v\right\|. \,

This inequality is trivial in the case v = 0, so we assume that 〈v, v〉 is nonzero. Let δ be any number in the field F. Then,

 0 \leq \left\| u-\delta v \right\|^2 = \langle u-\delta v,u-\delta v \rangle = \langle u,u \rangle - \bar{\delta} \langle u,v \rangle - \delta \langle v,u \rangle + |\delta|^2 \langle v,v\rangle. \,

Choose the value of δ that minimizes this quadratic form, namely

 \delta = \langle u,v \rangle \cdot \langle v,v \rangle^{-1}. \,

(A quick way to remember this value of δ is to imagine F to be the reals, so that the quadratic form is a quadratic polynomial in the real variable δ, and the polynomial can easily be minimized by setting its derivative equal to zero.)

We obtain

 0 \leq \langle u,u \rangle - |\langle u,v \rangle|^2 \cdot \langle v,v \rangle^{-1} \,

which is true if and only if

 |\langle u,v \rangle|^2 \leq \langle u,u \rangle \cdot \langle v,v \rangle, \,

or equivalently:

 \big| \langle u,v \rangle \big| \leq \left\|u\right\| \left\|v\right\|, \,

which completes the proof.
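The key step of the proof can be checked numerically. The sketch below uses the standard inner product on C^3 (linear in the first argument, conjugate-linear in the second) and two sample vectors chosen purely for illustration. It picks δ as in the proof and compares ‖u − δv‖² with the closed form obtained after substitution.

```python
def inner(x, y):
    """Standard inner product: sum of x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

# Sample vectors (an assumption for this illustration only).
u = [1 + 2j, 3 - 1j, 0.5j]
v = [2 - 1j, 1j, 1 + 1j]

# delta = <u, v> / <v, v>, the minimizing value from the proof
delta = inner(u, v) / inner(v, v)
residual = [a - delta * b for a, b in zip(u, v)]

lhs = inner(residual, residual).real  # ||u - delta v||^2
rhs = inner(u, u).real - abs(inner(u, v)) ** 2 / inner(v, v).real

print(lhs, rhs)
```

The two values agree up to rounding, and because the left one is a squared norm it cannot be negative, which is exactly the inequality.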

Notable special cases


In Euclidean space Rn with the standard inner product, the Cauchy–Schwarz inequality is

\left(\sum_{i=1}^n x_i y_i\right)^2\leq \left(\sum_{i=1}^n x_i^2\right) \left(\sum_{i=1}^n y_i^2\right).

To prove this form of the inequality, consider the following quadratic polynomial in z.

(x_1 z + y_1)^2 + \cdots + (x_n z + y_n)^2.

Since it is nonnegative it has at most one real root in z, whence its discriminant is less than or equal to zero, that is,

\left(\sum ( x_i \cdot y_i ) \right)^2 - \sum {x_i^2} \cdot \sum {y_i^2} \le 0,

which yields the Cauchy–Schwarz inequality.

An equivalent proof for Rn starts with the double sum of the squares \left( x_i y_j - x_j y_i \right)^2 over all pairs of indices i, j.

Expanding the brackets we have:

 \sum_{i=1}^n \sum_{j=1}^n \left( x_i y_j - x_j y_i \right)^2   = \sum_{i=1}^n x_i^2 \sum_{j=1}^n y_j^2 + \sum_{j=1}^n x_j^2 \sum_{i=1}^n y_i^2  - 2 \sum_{i=1}^n x_i y_i \sum_{j=1}^n x_j y_j ,

collecting together identical terms (albeit with different summation indices) we find:

 \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \left( x_i y_j - x_j y_i \right)^2   = \sum_{i=1}^n x_i^2 \sum_{i=1}^n y_i^2 - \left( \sum_{i=1}^n x_i y_i \right)^2 .

Because the left-hand side of the equation is a sum of the squares of real numbers it is greater than or equal to zero, thus:

 \sum_{i=1}^n x_i^2 \sum_{i=1}^n y_i^2 - \left( \sum_{i=1}^n x_i y_i \right)^2 \geq 0.

This form is usually used when solving school math problems.
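The identity behind this proof is easy to test with exact integer arithmetic. In the sketch below the function name `lagrange_sides` and the two sample vectors are assumptions made for illustration.

```python
def lagrange_sides(x, y):
    """Return (double sum of (x_i y_j - x_j y_i)^2, sum x^2 * sum y^2 - (sum xy)^2)."""
    n = len(x)
    double_sum = sum(
        (x[i] * y[j] - x[j] * y[i]) ** 2 for i in range(n) for j in range(n)
    )
    gap = sum(a * a for a in x) * sum(b * b for b in y) - sum(
        a * b for a, b in zip(x, y)
    ) ** 2
    return double_sum, gap

x, y = [1, 2, 3, 4], [2, -1, 0, 5]
double_sum, gap = lagrange_sides(x, y)
print(double_sum, gap)  # the double sum is exactly twice the gap
```

Since the double sum is a sum of squares, the gap between the two sides of Cauchy–Schwarz is never negative.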

Yet another approach when n ≥ 2 (n = 1 is trivial) is to consider the plane containing x and y. More precisely, recoordinatize Rn with any orthonormal basis whose first two vectors span a subspace containing x and y. In this basis only x_1,~x_2,~y_1 and y_2~ are nonzero, and the inequality reduces to the algebra of dot product in the plane, which is related to the angle between two vectors, from which we obtain the inequality:

|x \cdot y| = \|x\| \|y\| | \cos \theta | \le \|x\| \|y\|.

When n = 3 the Cauchy–Schwarz inequality can also be deduced from Lagrange’s identity, which takes the form

\langle x,x\rangle \cdot \langle y,y\rangle = |\langle x,y\rangle|^2 + |x \times y|^2

from which readily follows the Cauchy–Schwarz inequality.


For the inner product space of square-integrable complex-valued functions, one has

\left|\int f(x) g(x)\,dx\right|^2\leq\int \left|f(x)\right|^2\,dx \cdot \int\left|g(x)\right|^2\,dx.

A generalization of this is the Hölder inequality.


The triangle inequality for the inner product is often shown as a consequence of the Cauchy–Schwarz inequality, as follows: given vectors x and y:

 \begin{align} \|x + y\|^2 & = \langle x + y, x + y \rangle \\ & = \|x\|^2 + \langle x, y \rangle + \langle y, x \rangle + \|y\|^2 \\ & \le \|x\|^2 + 2|\langle x, y \rangle| + \|y\|^2 \\ & \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 \\ & = \left(\|x\| + \|y\|\right)^2. \end{align}

Taking square roots gives the triangle inequality.

The Cauchy–Schwarz inequality allows one to extend the notion of “angle between two vectors” to any real inner product space, by defining:

 \cos\theta_{xy}=\frac{\langle x,y\rangle}{\|x\| \|y\|}.

The Cauchy–Schwarz inequality proves that this definition is sensible, by showing that the right hand side lies in the interval [−1, 1], and justifies the notion that (real) Hilbert spaces are simply generalizations of the Euclidean space.

It can also be used to define an angle in complex inner product spaces, by taking the absolute value of the right hand side, as is done when extracting a metric from quantum fidelity.

The Cauchy–Schwarz inequality is used to prove that the inner product is a continuous function with respect to the topology induced by the inner product itself.

The Cauchy–Schwarz inequality is usually used to show Bessel’s inequality.

Other proofs

If either \left|X\right> or \left|Y\right> is the zero vector, the statement holds trivially, so assume that both are nonzero.

For any nonzero vector \left|V\right>, \left<V|V\right> > 0, by positive-definiteness of the inner product.

Suppose the inner product is symmetric, and let \alpha be a real scalar. Then

\displaystyle \left< \alpha X + Y| \alpha X + Y \right> \geq 0

which expands to

\displaystyle \alpha^2\left< X |X\right>+\alpha(\left< X |Y\right>+\left< Y |X\right>)+ \left<Y|Y \right> \geq 0

The last expression is a quadratic polynomial that is non-negative for any \alpha. The quadratic has either two complex roots, or a single real root. Intuitively, the polynomial is either ‘floating above’ the horizontal axis, if it has two complex roots, or tangent to it if it has one real root, since it can’t have two real roots because the graph of the function would have to ‘pass under’ the horizontal axis and take some negative values.

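The roots are given by the quadratic formula

\displaystyle \alpha = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \qquad a = \left< X |X\right>, \quad b = \left< X |Y\right>+\left< Y |X\right>, \quad c = \left<Y|Y \right>.

In particular, the term b^2 - 4ac must either be negative, yielding two complex roots, or zero, yielding a single real root. Thus

\displaystyle b^2 \leq 4ac.

Substituting the values of a, b and c into this inequality, it can be seen that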

\displaystyle (\left< X |Y\right>+\left< Y |X\right>)^2 \leq 4\left< X |X\right> \left<Y|Y \right>

If the inner product is symmetric, this proves the inequality.

An alternative proof follows from the inequality

\displaystyle  \frac{{(a+b)}^2}{x+y} \leq \frac{a^2}{x}+\frac{b^2}{y},

valid for a and b real and x > 0 and y > 0. This inequality is a restatement of (a y - b x)^2 \geq  0. From this one can get a general n-term inequality

\displaystyle  \frac{{(a_1+a_2+\cdots+a_n)}^2}{x_1+x_2+\cdots + x_n} \leq \frac{a_1^2}{x_1}+\frac{a_2^2}{x_2}+\cdots +\frac{a_n^2}{x_n}

To get the Cauchy–Schwarz inequality set a_k=\alpha_k \beta_k and x_k=\beta_k^2.
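Spelled out, the substitution works as follows. This is a sketch that assumes every \beta_k is nonzero, so that each x_k = \beta_k^2 is positive, and that the n-term relation is read as an inequality with the fraction of sums on the smaller side.

```latex
\begin{align}
\frac{(\alpha_1\beta_1+\cdots+\alpha_n\beta_n)^2}{\beta_1^2+\cdots+\beta_n^2}
  &\leq \frac{\alpha_1^2\beta_1^2}{\beta_1^2}+\cdots+\frac{\alpha_n^2\beta_n^2}{\beta_n^2}
   = \alpha_1^2+\cdots+\alpha_n^2, \\
\left(\sum_{k=1}^n \alpha_k\beta_k\right)^2
  &\leq \left(\sum_{k=1}^n \alpha_k^2\right)\left(\sum_{k=1}^n \beta_k^2\right).
\end{align}
```

The second line is the first one multiplied through by the positive denominator \beta_1^2+\cdots+\beta_n^2.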

If the inner product is conjugate-symmetric (as in a complex inner product space), take

\displaystyle \alpha = -\frac{\left<X|Y\right>}{\left<X|X\right>}

\displaystyle    ( -\frac{\left< Y |X\right>}{\left< X |X\right>} \left<X\right| + \left<Y\right|)( -\frac{\left< X |Y\right>}{\left< X |X\right>} \left|X\right> + \left|Y\right> ) \geq 0

\displaystyle    \frac{\left< Y |X\right>\left< X |Y\right>}{\left< X |X\right>} -\frac{\left< Y |X\right>\left< X |Y\right>}{\left< X |X\right>} -\frac{\left< Y |X\right>\left< X |Y\right>}{\left< X |X\right>} + \left<Y|Y\right> \geq 0

\displaystyle    \left< X |X\right>\left<Y|Y\right> \geq \left< Y |X\right>\left< X |Y\right>

The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities

Cauchy-Schwarz Inequality: Yet Another Proof



dot product

Vector formulation

The law of cosines is equivalent to the formula
\vec b\cdot \vec c = \Vert \vec b\Vert\Vert\vec c\Vert\cos \theta
in the theory of vectors, which expresses the dot product of two vectors in terms of their respective lengths and the angle they enclose.

Fig. 10 — Vector triangle

Proof of equivalence. Referring to Figure 10, note that
\vec a=\vec b-\vec c\,,
and so we may calculate:
 \begin{align} \Vert\vec a\Vert^2 & = \Vert\vec b - \vec c\Vert^2 \\ & = (\vec b - \vec c)\cdot(\vec b - \vec c) \\ & = \Vert\vec b \Vert^2 + \Vert\vec c \Vert^2 - 2 \vec b\cdot\vec c. \end{align}
The law of cosines formulated in this notation states:
\Vert\vec a\Vert^2 = \Vert\vec b \Vert^2 + \Vert\vec c \Vert^2 - 2 \Vert \vec b\Vert\Vert\vec c\Vert\cos(\theta), \,
which is equivalent to the above formula from the theory of vectors.

  1. A · B = |A||B|cos(Theta)   (by definition of dot product)

    If you think of the lengths of the 3 vectors |A|, |B| and |B-A| as the lengths of the sides of a triangle, you can apply the law of cosines here too. (To visualize this, draw the 2 vectors A and B on a graph; the vector from A to B is then given by B-A. The law of cosines is applied to the triangle formed by these 3 vectors.)

    In this case, we substitute: |B-A| for c, |A| for a, |B| for b
    and we obtain:

  2. |B-A|^2 = |A|^2 + |B|^2 - 2|A||B|cos(Theta)   (by law of cosines)

Remember now that Theta is the angle between the 2 vectors A, B.
Notice the common term |A||B|cos(Theta) in both equations. We now equate equation (1) and (2), and obtain

    |B-A|^2 = |A|^2 + |B|^2 - 2(A · B)

and hence

    A · B = (1/2) * sum_i [A_i^2 + B_i^2 - (B_i - A_i)^2]

(by Pythagorean length of a vector) and thus

    A · B = sum_i A_i B_i

Unit Vectors in Curvilinear Coordinates

Swapnil Sunil Jain

Let (u,v,w) be any non-Cartesian coordinate system such that x=x(u,v,w), y=y(u,v,w), z=z(u,v,w).

We can combine the above three equations into a single vector equation that gives the position vector {\overrightarrow{\bf r}} of any point P(x,y,z) in space as a function of the coordinates u,v,w:

\displaystyle {\overrightarrow{\bf r}}=x(u,v,w)\hat{\bf i}+y(u,v,w)\hat{\bf j}+z(u,v,w)\hat{\bf k}

If we hold $u$ fixed such that $u=u_0$, then the position vector becomes the parametric equation of the surface (called the coordinate surface) $u=u_0$, where $v,w$ play the role of parameters. Furthermore, if we hold both $u$ and $v$ fixed such that $u=u_0$ and $v=v_0$, then the position vector becomes the parametric equation of the curve (called the coordinate curve) formed by the intersection of the surfaces $u=u_0$ and $v=v_0$, in which $w$ acts as a parameter along the curve.

Now, how do we find the tangent vectors? Well, what is the meaning of a tangent vector? A tangent vector is a vector which is tangent to a coordinate curve formed by the intersection of two coordinate surfaces. In other words, it is a vector which indicates the direction in which one of the coordinates, say $u$, increases while the other two coordinates (i.e. $v$ and $w$) are held fixed. Sound familiar? Yes, of course, partial derivatives! A partial derivative with respect to $u$ would take the derivative of the position vector $\vec{r}$ along the coordinate curve formed by the intersection of the surfaces $v=v_0$ and $w=w_0$ and hence return a tangent vector along that curve. Hence, by taking the partial derivative of $\vec{r}$ one by one with respect to all three coordinates, we would get all three tangent vectors, each tangent to its respective coordinate curve. Thus, we arrive at the following three tangent vectors:

\displaystyle \overrightarrow{\bf v_\alpha}=\frac{\partial \overrightarrow{\bf r}}{\partial \alpha}, \alpha = u, v, w .

However, these are not normalized vectors. Most often we are interested in unit tangent vectors. So we divide them by their respective lengths. Therefore,

\displaystyle  \overrightarrow{\bf e_\alpha}=\frac{\frac{\partial \overrightarrow{\bf r}}{\partial \alpha}}{\left|\frac{\partial \overrightarrow{\bf r}}{\partial \alpha}\right|}, \alpha = u, v, w  .

or, equivalently,

\displaystyle \overrightarrow{\bf e_\alpha}=\frac{\frac{\partial \overrightarrow{\bf r}}{\partial \alpha}}{h_\alpha}, \alpha = u, v, w .

where the quantities

\displaystyle h_\alpha=\sqrt{\left(\frac{\partial x}{\partial \alpha}\right)^2+\left(\frac{\partial y}{\partial \alpha}\right)^2+\left(\frac{\partial z}{\partial \alpha}\right)^2}

are known as scale or metric factors (or coefficients).
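As a quick check of these formulas, here is a small numerical sketch. It assumes spherical coordinates (r, theta, phi) with theta the polar angle, which the text above does not single out, and it estimates the partial derivatives by central differences rather than symbolically. All function names are illustrative.

```python
import math

def position(r, theta, phi):
    """Cartesian position vector for spherical coordinates (assumed example)."""
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )

def tangent(coords, index, step=1e-6):
    """Central-difference estimate of the partial derivative of r w.r.t. coords[index]."""
    hi, lo = list(coords), list(coords)
    hi[index] += step
    lo[index] -= step
    return [(a - b) / (2 * step) for a, b in zip(position(*hi), position(*lo))]

def scale_factor_and_unit_vector(coords, index):
    """Return (h_alpha, e_alpha): length of the tangent vector and its normalization."""
    t = tangent(coords, index)
    h = math.sqrt(sum(c * c for c in t))
    return h, [c / h for c in t]

point = (2.0, 0.7, 1.1)
factors = [scale_factor_and_unit_vector(point, i)[0] for i in range(3)]
print(factors)  # compare with the known values 1, r, r*sin(theta)
```

For spherical coordinates the scale factors are known in closed form (1, r and r sin theta), so the printed numbers give a direct sanity check on the definition of h_\alpha.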

Gram–Schmidt process


The Gram–Schmidt process

We define the projection operator by
\mathrm{proj}_{\mathbf{u}}\,(\mathbf{v}) = {\langle \mathbf{v}, \mathbf{u}\rangle\over\langle \mathbf{u}, \mathbf{u}\rangle}\mathbf{u} ,
where 〈u, v〉 denotes the inner product of the vectors u and v. This operator projects the vector v orthogonally onto the vector u.
The Gram–Schmidt process then works as follows:
 \begin{align} \mathbf{u}_1 & = \mathbf{v}_1, & \mathbf{e}_1 & = {\mathbf{u}_1 \over \|\mathbf{u}_1\|} \\ \mathbf{u}_2 & = \mathbf{v}_2-\mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_2), & \mathbf{e}_2 & = {\mathbf{u}_2 \over \|\mathbf{u}_2\|} \\ \mathbf{u}_3 & = \mathbf{v}_3-\mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_3)-\mathrm{proj}_{\mathbf{u}_2}\,(\mathbf{v}_3), & \mathbf{e}_3 & = {\mathbf{u}_3 \over \|\mathbf{u}_3\|} \\ \mathbf{u}_4 & = \mathbf{v}_4-\mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_4)-\mathrm{proj}_{\mathbf{u}_2}\,(\mathbf{v}_4)-\mathrm{proj}_{\mathbf{u}_3}\,(\mathbf{v}_4), & \mathbf{e}_4 & = {\mathbf{u}_4 \over \|\mathbf{u}_4\|} \\ & \vdots & & \vdots \\ \mathbf{u}_k & = \mathbf{v}_k-\sum_{j=1}^{k-1}\mathrm{proj}_{\mathbf{u}_j}\,(\mathbf{v}_k), & \mathbf{e}_k & = {\mathbf{u}_k\over \|\mathbf{u}_k \|}. \end{align}

The first two steps of the Gram–Schmidt process
The sequence u1, …, uk is the required system of orthogonal vectors, and the normalized vectors e1, …, ek form an orthonormal set. The calculation of the sequence u1, …, uk is known as Gram–Schmidt orthogonalization, while the calculation of the sequence e1, …, ek is known as Gram–Schmidt orthonormalization as the vectors are normalized.
To check that these formulas yield an orthogonal sequence, first compute 〈u1, u2〉 by substituting the above formula for u2: we get zero. Then use this to compute 〈u1, u3〉, again by substituting the formula for u3: we get zero. The general proof proceeds by mathematical induction.
Geometrically, this method proceeds as follows: to compute ui, it projects vi orthogonally onto the subspace U generated by u1, …, ui−1, which is the same as the subspace generated by v1, …, vi−1. The vector ui is then defined to be the difference between vi and this projection, guaranteed to be orthogonal to all of the vectors in the subspace U.
The Gram–Schmidt process also applies to a linearly independent infinite sequence {vi}i. The result is an orthogonal (or orthonormal) sequence {ui}i such that for every natural number n, the algebraic span of v1, …, vn is the same as that of u1, …, un.
If the Gram–Schmidt process is applied to a linearly dependent sequence, it outputs the 0 vector on the ith step, assuming that vi is a linear combination of v1, …, vi−1. If an orthonormal basis is to be produced, then the algorithm should test for zero vectors in the output and discard them because no multiple of a zero vector can have a length of 1. The number of vectors output by the algorithm will then be the dimension of the space spanned by the original inputs.
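A small worked example may help fix the formulas. The two vectors below are chosen only for illustration, with the standard dot product on R^2 as the inner product.

```latex
\begin{align}
\mathbf{v}_1 &= (3, 1), \qquad \mathbf{v}_2 = (2, 2), \\
\mathbf{u}_1 &= \mathbf{v}_1 = (3, 1), \\
\mathrm{proj}_{\mathbf{u}_1}(\mathbf{v}_2)
  &= \frac{\langle \mathbf{v}_2, \mathbf{u}_1\rangle}{\langle \mathbf{u}_1, \mathbf{u}_1\rangle}\,\mathbf{u}_1
   = \frac{8}{10}\,(3, 1) = \left(\tfrac{12}{5}, \tfrac{4}{5}\right), \\
\mathbf{u}_2 &= \mathbf{v}_2 - \mathrm{proj}_{\mathbf{u}_1}(\mathbf{v}_2)
   = \left(-\tfrac{2}{5}, \tfrac{6}{5}\right), \\
\langle \mathbf{u}_1, \mathbf{u}_2 \rangle &= -\tfrac{6}{5} + \tfrac{6}{5} = 0, \\
\mathbf{e}_1 &= \tfrac{1}{\sqrt{10}}\,(3, 1), \qquad
\mathbf{e}_2 = \tfrac{1}{\sqrt{10}}\,(-1, 3).
\end{align}
```

Here \|\mathbf{u}_2\| = \sqrt{40/25} = 2\sqrt{10}/5, which is what turns \mathbf{u}_2 into \mathbf{e}_2.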

Numerical stability

When this process is implemented on a computer, the vectors uk are often not quite orthogonal, due to rounding errors. For the Gram–Schmidt process as described above (sometimes referred to as “classical Gram–Schmidt”) this loss of orthogonality is particularly bad; therefore, it is said that the (classical) Gram–Schmidt process is numerically unstable.
The Gram–Schmidt process can be stabilized by a small modification. Instead of computing the vector uk as
 \mathbf{u}_k = \mathbf{v}_k - \mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_k) - \mathrm{proj}_{\mathbf{u}_2}\,(\mathbf{v}_k) - \cdots - \mathrm{proj}_{\mathbf{u}_{k-1}}\,(\mathbf{v}_k),
it is computed as
 \begin{align} \mathbf{u}_k^{(1)} &= \mathbf{v}_k - \mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_k), \\ \mathbf{u}_k^{(2)} &= \mathbf{u}_k^{(1)} - \mathrm{proj}_{\mathbf{u}_2} \, (\mathbf{u}_k^{(1)}), \\ & \,\,\, \vdots \\ \mathbf{u}_k^{(k-2)} &= \mathbf{u}_k^{(k-3)} - \mathrm{proj}_{\mathbf{u}_{k-2}} \, (\mathbf{u}_k^{(k-3)}), \\ \mathbf{u}_k^{(k-1)} &= \mathbf{u}_k^{(k-2)} - \mathrm{proj}_{\mathbf{u}_{k-1}} \, (\mathbf{u}_k^{(k-2)}).  \end{align}
Each step finds a vector \mathbf{u}_k^{(i)} orthogonal to \mathbf{u}_i. Because the projection is taken from the running remainder \mathbf{u}_k^{(i-1)} rather than from the original \mathbf{v}_k, \mathbf{u}_k^{(i)} is also orthogonalized against any errors introduced in computation of \mathbf{u}_k^{(i-1)}. This approach (sometimes referred to as “modified Gram–Schmidt”) gives the same result as the original formula in exact arithmetic and introduces smaller errors in finite-precision arithmetic.
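The difference between the two variants shows up clearly on nearly parallel inputs. The sketch below is an illustration under my own assumptions: the function names are hypothetical, and the three test vectors (each with a 1 in the first slot and a tiny `eps` elsewhere) are a standard-style stress case not taken from the text.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def classical_gs(vectors):
    """Classical variant: every coefficient is computed from the original v."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = dot(v, e)  # projection coefficient of the ORIGINAL vector
            u = [ui - c * ei for ui, ei in zip(u, e)]
        basis.append(normalize(u))
    return basis

def modified_gs(vectors):
    """Modified variant: every coefficient is computed from the running remainder."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = dot(u, e)  # projection coefficient of the CURRENT remainder
            u = [ui - c * ei for ui, ei in zip(u, e)]
        basis.append(normalize(u))
    return basis

eps = 1e-8  # small enough that 1 + eps**2 rounds to 1 in double precision
vectors = [[1, eps, 0, 0], [1, 0, eps, 0], [1, 0, 0, eps]]
cgs = classical_gs(vectors)
mgs = modified_gs(vectors)

print(abs(dot(cgs[1], cgs[2])))  # roughly 0.5: far from orthogonal
print(abs(dot(mgs[1], mgs[2])))  # many orders of magnitude smaller
```

The only change between the two functions is which vector the coefficient `c` is taken from, which matches the "small modification" described above.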


The following algorithm implements the stabilized Gram–Schmidt orthonormalization. The vectors v1, …, vk are replaced by orthonormal vectors which span the same subspace.
for j from 1 to k do
    for i from 1 to j − 1 do
        \mathbf{v}_j \leftarrow \mathbf{v}_j - \mathrm{proj}_{\mathbf{v}_{i}} \, (\mathbf{v}_j)  (remove component in direction vi)
    next i
    \mathbf{v}_j \leftarrow \frac{\mathbf{v}_j}{\|\mathbf{v}_j\|}  (normalize)
next j
The cost of this algorithm is asymptotically 2nk^2 floating point operations, where n is the dimensionality of the vectors (Golub & Van Loan 1996, §5.2.8).
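The loop above translates almost line for line into Python. This is a minimal sketch rather than a reference implementation: the name `orthonormalize` and the tolerance `tol` are assumptions of mine, and the tolerance is used to discard vectors that reduce to (numerically) zero, as suggested earlier for linearly dependent input.

```python
import math

def orthonormalize(vectors, tol=1e-12):
    """Stabilized (modified) Gram-Schmidt on a list of equal-length vectors.

    Returns an orthonormal list spanning the same subspace. Vectors whose
    remainder has norm <= tol (a hypothetical threshold) are dropped.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for e in basis:
            # e already has unit length, so proj_e(w) = <w, e> e
            c = sum(wi * ei for wi, ei in zip(w, e))
            w = [wi - c * ei for wi, ei in zip(w, e)]  # remove component along e
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])  # normalize
    return basis

# The third vector is the sum of the first two, so it should be discarded.
basis = orthonormalize([[3, 1], [2, 2], [5, 3]])
print(len(basis), basis)
```

Returning a new list instead of overwriting the inputs is a design choice made here for clarity; the pseudocode itself works in place.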