From world space -> to camera space (view transformation):
In the camera space the camera is at the origin and all the objects coordinates are relative to its position and orientation. So the camera object doesn't actually exists. The view transformation transform the objects from world space as if the camera was there.
We use world coordinates as a reference system to locate objects in 3D space.
In actual applications, we are interested in generating a 2D view for a plane arbitrarily positioned in the space. This is achieved by adding a set of additional transformations before the projection.
We will focus on perspective, but the same reasoning applies also to parallel projection.
The Camera Matrix, denoted as $M_c$, moves and aligns the camera object to its desired position and orientation. The View Matrix, denoted as $M_v$, is equal to the inverse of $M_c$ and is commonly referred as the view-projection matrix.
Combining $M_v$ with the projection matrix $M_{Prj}$ we can define the **view-projection matrix** which allows us to obtain the normalized screen coordinates of space points as seen by the specified camera.
$
M_{V P}=M_{P r j} \cdot M_V
$
Several techniques exist to create a view matrix in a user-friendly way. The two most popular are:
- The look-in-direction technique
- The look-at technique
## Look-in model
Rotations must be performed in a specific order: roll must be performed first, then the elevation, and finally the direction. Translation is performed after the rotations. The Camera Matrix is then composed in this way.
$
M_V=\left(M_c\right)^{-1}=R_z(-\rho) \cdot R_x(-\beta) \cdot R_y(-\alpha) \cdot T\left(-c_x,-c_y,-c_z\right)
$
GLM does not provide any special support to build a Look-in-direction matrix. However, due to its simplicity, it can be easily implemented.
```cpp
glm::mat4 Mv = glm::rotate(glm::mat4(1.0), -rho, glm::vec3(0,0,1)) * glm::rotate(glm::mat4(1.0), -beta, glm::vec3(1,0,0)) * glm::rotate(glm::mat4(1.0), -alpha, glm::vec3(0,1,0)) * glm::translate(glm::mat4(1.0), glm::vec3(-cx, -cy ,-cz));`
```
## Look-at model
The look-at model is instead generally employed in third person applications. In this case, the camera tracks a point (or an object) aiming at it.

The View Matrix is computed by first determining the direction of its axis in World coordinates, and then using the corresponding information to build the Camera Matrix.

$v_z = \frac{c-a}{|c-a|}$
Then the x-axis has to be perpendicular to $v_z$ and parallel to the ground. To do this we use cross product.
$v_x=\frac{u \times v_z}{\left|u \times v_z\right|}$
GLM can compute the cross product of two vectors using the cross() function: `glm::vec3 uXvz = glm::cross(u, vz);` .
A problem can arise when the camera is perfectly vertical over the target. The idea is that basically we can tilt the camera arbitrarily. In math, this problem occours when the results of the cross product of vectors $u$ and $v_z$ is zero (they are aligned). This makes it impossible to determine an unique vector $v_x$ and calculate a proper camera matrix.
New y-axis should be perpendicular to both the new z-axis and the new x-axis. This could be computed via the cross product of the two vectors just obtained.
$v_y=v_z \times v_x$
Since both the new z-axis and the new x-axis are by construction perpendicular and unit vectors, normalization is not required.
# World view projection matrices
### 5 steps to get pixel position on screen from 3D model coordinates
{width=50%}
To get pixel position of a 3D model, following steps should be performed in order:
1. **World Transform**: from Local Space to Global Space, by multiplying them with the World Matrix $M_W$.
2. **View Transform**: The coordinates are converted from Global Space to Camera Space with the View Matrix $M_V$, typically generated using look-in-direction or look-at methods.
3. **Projection**: A process that transforms $3D$ coordinates into $2D$ coordinates. Perspective (projection matrix $M_{P-Persp}$) or parallel (projection matrix $M_{P-Ort}$) projection.
4. **Normalization**: A process that normalizes the homogenous coordinates that describe the points in the clipping space and maps them to a range of $-1$ to $+1$ on the $x$, $y$ and $z$ axes. Every coordinate is divided by the fourth component of the homogenous coordinate vector. This step is not necessary in parallel projections, since in this case matrix $M_{P-Ort}$ already provides normalized screen coordinates
5. **Screen Transform**: A process that maps the normalized clipping coordinates first to normalized screen coordinates (if necessary), and then to pixel coordinates(to ensure a proper visualization on different devices with different resolutions). This step is done by the driver of the video card.
Each step performs a coordinate transformation from one space to another
In most of the cases the World(1), View(2) and Projection (3) matrices are factorized in a single matrix