It’s important to understand the project’s specifications, requirements, priorities and budget. You will assess whether you have the resources needed to support the project in terms of people, technology, time and data. In this phase, you also need to frame the business problem and formulate Initial Hypotheses (IH) to test.
Phase 2 requires an analytical sandbox in which you can perform analytics for the entire duration of the project. You will need to explore, pre-process and condition the data prior to modelling. To move data into the sandbox, you will perform ETLT (extract, transform, load and transform), which allows data to be transformed both before and after it is loaded.
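As an illustration of ETLT, the following minimal Python sketch uses an in-memory SQLite database as a stand-in for the analytical sandbox. The sample data, the table names and the `extract`, `light_transform`, `load` and `in_sandbox_transform` functions are hypothetical; a real project would extract from its own source systems and load into whatever sandbox platform it has provisioned.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in practice this would come from a source system.
RAW_CSV = """customer_id,region,monthly_spend
1,north,120.5
2,south,
3, North ,98.0
4,east,abc
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def light_transform(rows):
    """First T: minimal cleanup before loading (trim/lowercase text, parse numbers)."""
    cleaned = []
    for row in rows:
        try:
            spend = float(row["monthly_spend"])
        except ValueError:
            spend = None  # keep the row, but store unparseable or empty values as NULL
        cleaned.append((int(row["customer_id"]), row["region"].strip().lower(), spend))
    return cleaned

def load(conn, rows):
    """Load: write the lightly cleaned rows into a sandbox table."""
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER, region TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

def in_sandbox_transform(conn):
    """Second T: condition the data inside the sandbox, here by aggregating per region."""
    conn.execute(
        """
        CREATE TABLE region_spend AS
        SELECT region, COUNT(*) AS n, AVG(monthly_spend) AS avg_spend
        FROM customers
        GROUP BY region
        """
    )

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the analytical sandbox
load(conn, light_transform(extract(RAW_CSV)))
in_sandbox_transform(conn)
print(conn.execute("SELECT * FROM region_spend ORDER BY region").fetchall())
# [('east', 1, None), ('north', 2, 109.25), ('south', 1, None)]
```

The first transform only tidies values enough to load them. Unparseable spend values are stored as NULL rather than dropped, so the second, in-sandbox transform and later exploration can still see how much data is missing.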
In this step, you determine the methods and techniques for identifying relationships between variables. These relationships form the basis for the algorithms that will be implemented in the next phase. Exploratory Data Analysis (EDA) is carried out using statistical measures and visualisation tools.
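A minimal sketch of the statistical side of EDA in Python, using only the standard library: descriptive statistics for each variable, plus a Pearson correlation coefficient to gauge the linear relationship between two variables. The `ad_spend` and `sales` values and the helper names are hypothetical; visualisations such as scatter plots would normally accompany these numbers and are omitted here.

```python
import math
import statistics

# Hypothetical sample: two numeric variables observed for the same six units.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
sales = [100.0, 110.0, 128.0, 140.0, 155.0, 180.0]

def summarize(name, values):
    """Basic descriptive statistics, a typical first step in EDA."""
    return {
        "variable": name,
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

for name, values in (("ad_spend", ad_spend), ("sales", sales)):
    print(summarize(name, values))
print(f"Pearson r = {pearson_r(ad_spend, sales):.3f}")
```

A coefficient close to 1 or -1 suggests a strong linear relationship worth carrying into model planning. A value near 0 only rules out a linear relationship, which is why plots remain useful for spotting non-linear patterns.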
In this phase, you will develop datasets for training and testing purposes. Here you need to consider whether your existing tools will suffice for running the models or whether you need a more robust environment (for example, one that supports fast and parallel processing). You will evaluate various learning techniques, such as classification, association and clustering, to build the model.
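The following sketch shows one simple way to hold out a test set: shuffle the records with a fixed seed and slice off a fraction. The `train_test_split` function here is a hypothetical standard-library helper written for illustration; libraries such as scikit-learn provide a more complete version (`sklearn.model_selection.train_test_split`), including stratified splitting.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle a copy of the records and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled records.
records = [{"id": i, "label": i % 2} for i in range(20)]
train, test = train_test_split(records)
print(len(train), len(test))  # 15 5
```

Fixing the seed means different learning techniques can be compared on exactly the same test records.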
In this phase, you deliver the final reports, briefings, code and technical documentation. You can also run a pilot project in a real-time production environment to get a clear picture of performance and other constraints before the full system is deployed.
It’s imperative to check whether you have achieved the objective you planned in the first phase. The final phase should identify the key findings, communicate the results to stakeholders and determine whether the project succeeded or failed based on the criteria developed in Phase 1.