100 Days of Data Engineering - Day 1

Embedded devices are a crucial part of the Internet of Things, and having spent the greater part of the past four years building these devices, I have decided to concentrate more on the cloud side of things, with particular emphasis on **Data Engineering**. IoT devices generate a lot of data, and that data remains useless unless in-depth analysis can be performed on it; after all, the main reason for collecting it in the first place is to gain insights that drive key business and personal decisions.

Large volumes of data are collected from IoT devices, and the first step in data engineering is data ingestion. Understanding the principles behind building a robust pipeline through which this data flows is therefore quite important, and it is my main reason for wanting to focus on data engineering. The knowledge I gain along the way is also transferable to other domains.

I will be documenting my journey over the next 100 days, both to hold myself accountable and to provide a reference for others who might want to embark on the same journey.

Day 1: Today, I started out with Docker, as it is an important tool for every data engineer. Docker isolates applications so that they are easily portable across platforms. Once you package an application along with its dependencies and configuration as a Docker image and it works on your local machine, it will work in production or on any other machine without hassle. Docker images can be hosted in a public or private repository and pulled whenever a user requires them.
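As a rough sketch of that workflow (the image name `myuser/myapp` is just a placeholder, and I am assuming a Dockerfile describing the app already exists in the current directory and that you are logged in to the registry):

```bash
docker build -t myuser/myapp:1.0 .   # package the app and its dependencies into an image
docker push myuser/myapp:1.0         # publish the image to a registry such as Docker Hub

# ...then, on any other machine:
docker pull myuser/myapp:1.0         # fetch the exact same image
docker run myuser/myapp:1.0          # run it, dependencies and all
```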

A lot of people use the terms image and container interchangeably, but they are two separate things. An image is the actual package containing the application, its dependencies, and the necessary configuration; it can easily be pushed to a repository and moved between machines. A container is a running environment for an image, usually with its own file system. When an image is pulled to, say, a local machine and the application starts running, that running instance is a container.
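A quick way to see the distinction (using the public `nginx` image purely as an example): the image is downloaded once, but every `docker run` creates a fresh container from it.

```bash
docker pull nginx                # one image, stored locally
docker run -d --name web1 nginx  # first container from that image
docker run -d --name web2 nginx  # second container from the same image
docker images                    # shows the single nginx image
docker ps                        # shows two running containers, web1 and web2
```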

I also went through the basic Docker commands used to pull an image, run a container, list all containers (both running and stopped), and start and stop containers, as shown below.
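Roughly, those commands look like this (using `redis` as an example image; the container IDs will differ on your machine and come from the output of `docker ps -a`):

```bash
docker pull redis            # download an image from the registry
docker run -d redis          # create and start a container in the background
docker ps                    # list running containers
docker ps -a                 # list all containers, including stopped ones
docker stop <container-id>   # stop a running container
docker start <container-id>  # start a stopped container again
```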

The last concept I worked on was that of container and host ports. From networking, we know that a computer has various ports on which it listens for incoming messages, and once a host port is bound to one process, it cannot be bound to another at the same time. Container ports behave differently: because each container gets its own isolated network, the same container port can be used by several versions of an application, but each container must be mapped to a different host port. If two containers are mapped to the same host port, Docker will throw an error.
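For example (using two Redis versions purely as placeholders), the container port 6379 is identical in both cases, while the host ports differ:

```bash
docker run -d -p 6379:6379 redis:7    # host port 6379 -> container port 6379
docker run -d -p 6380:6379 redis:6.2  # host port 6380 -> same container port

# Mapping a third container to host port 6379 would fail with a
# "port is already allocated" error.
```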