Data science work requires a properly set up ML workspace to run Python code and to work with data efficiently. This is where Azure Machine Learning Studio comes into the picture: it combines several Azure-managed resources within a single workspace.
To get started with Azure ML Studio, you must understand its purpose. In the traditional data science lifecycle, a data scientist has to work with data engineering, DevOps, and security teams just to get the initial system and data setup in place, and the entire process can take weeks to months before the system is up and running. To avoid such a tedious process, organizations are quickly moving to cloud vendors such as Microsoft Azure, Amazon AWS, and Google GCP.
Azure Machine Learning Studio Architecture:
The Azure Machine Learning (ML) Studio architecture consists of multiple Azure services: compute resources where data scientists can run notebooks, an Azure storage account to store structured or unstructured data, Azure Key Vault to add a security layer for a secured workspace, and more.
Let’s discuss each component in detail, but first an example to illustrate. Suppose your task is to predict whether a patient has diabetes. You have records of thousands of patients with their age, gender, eating habits, lifestyle, medication, etc. in a CSV file.
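Before wiring anything into Azure, the prediction task itself can be sketched locally. Below is a minimal, hypothetical example using scikit-learn on synthetic patient records; the column names, values, and label rule are all made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical patient records standing in for the CSV described above.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(20, 80, n),
    "bmi": rng.normal(27, 4, n),
    "glucose": rng.normal(110, 25, n),
})
# Synthetic label: higher glucose loosely implies diabetes (illustration only).
df["diabetic"] = (df["glucose"] + rng.normal(0, 10, n) > 120).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "bmi", "glucose"]], df["diabetic"], random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```

Once this kind of experiment outgrows a laptop, each step maps onto one of the Azure components discussed next.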
- Azure Storage Account: As mentioned above, our problem statement involves a dataset of patients with diabetes. Such data might currently exist on your local system. An Azure storage account can store any kind of data as blob storage, file storage, or table storage. It is also used to store Jupyter Notebooks.
- Azure Container Registry: Training a model and deploying it across different environments can be cumbersome. The container registry stores the Docker images used for those training and deployment environments.
- Azure Key Vault: Securely stores sensitive information such as keys, secrets, and connection strings used by the workspace.
- Azure Application Insights: Monitors deployed models and their performance.
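As a sketch of the storage step, the snippet below writes a tiny made-up sample of the patient records to a CSV and then shows how the azureml-core SDK (v1) would push it to the workspace's default datastore. The upload requires an authenticated workspace, so it is gated behind a flag; the file name, columns, and target path are assumptions for illustration:

```python
import pandas as pd

# A tiny, made-up sample of the patient records described above.
df = pd.DataFrame({
    "age": [54, 61, 38],
    "gender": ["F", "M", "F"],
    "glucose": [148, 110, 183],
})
df.to_csv("patients.csv", index=False)

# Sketch of pushing the file to the workspace's default datastore
# (azureml-core SDK v1); needs an authenticated workspace, so gated here.
UPLOAD_TO_AZURE = False
if UPLOAD_TO_AZURE:
    from azureml.core import Workspace
    ws = Workspace.from_config()          # reads the workspace config.json
    datastore = ws.get_default_datastore()
    datastore.upload_files(files=["patients.csv"],
                           target_path="diabetes/",
                           overwrite=True)
```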
We have discussed how to store data in an Azure storage account; the next step is computing power so we can start running experiments. The compute must be chosen based on the size of the dataset we want to explore and the model we want to train.
- Compute instances: Single machines you configure based on requirements: fewer resources for smaller datasets, a high-end configuration (e.g. with a GPU) for high-dimensional data, millions of rows, or images.
- Compute clusters: Multi-node compute for Spark-related (Hadoop) workloads, or when the data resides in data lakes.
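As a rough illustration of that sizing decision, here is a hypothetical helper that maps dataset characteristics to Azure VM sizes. The thresholds are invented for this sketch; the VM size names are standard Azure SKUs:

```python
def pick_vm_size(n_rows: int, has_images: bool = False) -> str:
    """Map dataset shape to an Azure VM size (illustrative thresholds only)."""
    if has_images or n_rows > 1_000_000:
        return "STANDARD_NC6"      # GPU-backed size for images / very large data
    if n_rows > 100_000:
        return "STANDARD_DS12_V2"  # larger general-purpose CPU size
    return "STANDARD_DS3_V2"       # small general-purpose size

print(pick_vm_size(1_000))                      # small tabular dataset
print(pick_vm_size(5_000_000))                  # millions of rows
print(pick_vm_size(10_000, has_images=True))    # image data
```

In practice the choice also depends on budget and training-time targets, not dataset size alone.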
So far we can store datasets in the Azure storage account and run our Jupyter notebooks on a compute instance.
The main components in the Azure Machine Learning Studio workspace:
- Environments: An environment acts as a container in which we deploy machine/deep learning models; it configures environment variables and specifies the required Python packages.
- Experiments: Experiments store the metadata of runs as artifacts. We can see how many times our model ran and its metrics over time.
- Pipelines: Manage workflows across the machine learning phases, such as data preparation, training, and inference/scoring.
- Datasets: Bring data in from data storage so it can be consumed by our models.
- Models: After finalizing the algorithm with the best metrics, the model can be registered for deployment.
- Endpoints: Create API endpoints, consumable by other applications, through which the model serves predictions in real time.
Sequence: Environment Selection -> Datasets -> Experiments -> Pipelines -> Models -> Endpoints
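That sequence can be sketched in code. The snippet below trains a trivial placeholder model locally and saves it, then shows (gated behind a flag, since it needs an authenticated workspace) how the azureml-core SDK v1 logs an experiment run and registers the model. The features, labels, and names like "diabetes-experiment" are all made up for illustration:

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Train a trivial placeholder model on made-up (age, risk-flag) features.
X = [[25, 0], [60, 1], [40, 0], [70, 1]]
y = [0, 1, 0, 1]
model = LogisticRegression().fit(X, y)
joblib.dump(model, "diabetes_model.pkl")

# Sketch of logging the run and registering the model (azureml-core SDK v1);
# requires an authenticated workspace, so it is gated here.
REGISTER = False
if REGISTER:
    from azureml.core import Experiment, Model, Workspace
    ws = Workspace.from_config()
    run = Experiment(ws, "diabetes-experiment").start_logging()
    run.log("train_accuracy", model.score(X, y))
    run.complete()
    Model.register(workspace=ws,
                   model_path="diabetes_model.pkl",
                   model_name="diabetes-model")
```

A registered model can then be deployed behind an endpoint, completing the sequence above.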
If you are interested in learning more about Azure, please follow the link below.
Resources: https://docs.microsoft.com/en-us/azure/machine-learning/