Why Counting Is the Hardest Thing in Data Science
In Data Science, training should be based on the tasks the specialist will actually face. However, a Data Scientist's tasks differ depending on the company's field of activity. Here are some examples:
- anomaly detection – for example, non-standard actions with a bank card, fraud;
- analysis and forecasting – performance indicators, the quality of advertising campaigns;
- scoring and grading systems – processing large amounts of data to support decisions, for example on granting a loan;
- basic customer interaction – automatic replies in chats, voice assistants, sorting emails into folders.
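To make the first task on the list concrete, here is a minimal sketch of anomaly detection using a z-score rule. The transaction amounts and the threshold are invented for illustration; production systems use far more sophisticated models.

```python
import statistics

# Hypothetical card-transaction amounts; one of them is clearly unusual.
amounts = [12.5, 9.9, 11.3, 10.7, 950.0, 13.1, 8.8]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

THRESHOLD = 2.0  # assumed cut-off; a real system would tune this
anomalies = [a for a in amounts if abs(a - mean) / stdev > THRESHOLD]
print(anomalies)  # the 950.0 transaction stands out
```

The same idea, flagging points that deviate too far from the typical pattern, underlies far more elaborate fraud-detection pipelines.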
But for any of the above tasks, you always follow roughly the same steps:
- Data collection – finding sources and methods of obtaining information, as well as the collection process itself.
- Checking – validation, removal of anomalies.
- Analysis – studying the data, forming hypotheses, drawing conclusions.
- Visualization – bringing the data into a human-readable form (graphs and diagrams).
- Decision-making – acting on the analyzed data, for example changing the marketing strategy or increasing the budget for one of the company's activities.
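The steps above can be sketched end to end in a few lines. Everything here is illustrative: the "collected" data is hard-coded, validation is a simple range check, and the decision rule is a toy threshold.

```python
# 1. Collection: daily conversion rates, hard-coded in place of a real source.
raw = [0.031, 0.029, 0.034, -1.0, 0.030, 0.033]  # -1.0 is a corrupt record

# 2. Checking: validation, removal of anomalous records.
clean = [x for x in raw if 0.0 <= x <= 1.0]

# 3. Analysis: a simple summary statistic.
avg = sum(clean) / len(clean)

# 4. Visualization: a crude text "bar chart", one row per day.
for x in clean:
    print("#" * round(x * 1000))

# 5. Decision: a toy rule based on the analysis.
decision = "increase budget" if avg > 0.03 else "keep budget"
print(round(avg, 4), decision)
```

Real pipelines replace each step with proper tooling (databases, validation frameworks, plotting libraries), but the shape of the process stays the same.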
What do you mean by “science-intensive methods”?
In practical applications, machine learning methods are widely used, and there are many applied data-processing algorithms. When working with data, problems sometimes arise that can be called scientific: they can be formalized, assumptions can be stated for them, and in that form the problem can be solved explicitly. For example, you can prove that a certain algorithm solves the problem optimally.
Science-intensive methods are those backed by some non-trivial proof that also work. Some of them are reproducible in practice and have practical use, while others turn out to be purely model constructions: in theory the method works, but in practice it is not applicable, simply because the assumptions under which the mathematical or computational result was proved do not hold in reality.
You could say that at one extreme there are purely theoretical problems that are never used in practice. At the other, there are data-processing tasks with no special science behind them: you simply take the data, group it, compute the average, sort it, and draw an analytical conclusion. Somewhere at the junction lie the methods that are called “science-intensive”.
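The "no special science" end of that spectrum really is this simple. Here is the group-average-sort workflow on hypothetical sales records (region names and revenues are invented):

```python
from collections import defaultdict

# Hypothetical sales records: (region, revenue).
records = [
    ("north", 120.0), ("south", 80.0), ("north", 100.0),
    ("east", 150.0), ("south", 95.0), ("east", 130.0),
]

# Group revenues by region.
groups = defaultdict(list)
for region, revenue in records:
    groups[region].append(revenue)

# Compute the average per group, then sort descending.
averages = {r: sum(v) / len(v) for r, v in groups.items()}
ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # highest-average region first
```

The analytical conclusion ("the east region earns the most per sale") falls straight out of the sorted result, with no proofs involved.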
Can you tell us with specific examples what analysts do?
An analyst is someone who can look at data more broadly and can build and test hypotheses. The task of analysts is to deliver practical value and seek new knowledge in the data. People were analyzing data even before the advent of computers.
For example, the Japanese engineer and statistician Genichi Taguchi developed the concept of “quality engineering” back in the 1940s. Within the framework of this idea, he statistically analyzed production data, ran experiments, and significantly reduced costs while improving product quality. His methods were later adopted by the Ford Motor Company to optimize production.
I will give one example of an analyst’s task from my practice. Yandex ran an experiment that added new elements to search results: pictures in snippets. In some cases users began to click less often and to solve their problems less successfully, and the metrics worsened. A dozen hypotheses could be put forward as to why this happened.
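Before debating hypotheses, an analyst would first check whether the drop in clicks is statistically real. A standard tool for that is a two-proportion z-test; the click and view counts below are invented for illustration, not the actual experiment data.

```python
import math

# Hypothetical click-through counts (invented numbers).
clicks_a, views_a = 5200, 10000   # control: plain snippets
clicks_b, views_b = 4900, 10000   # experiment: snippets with pictures

p_a = clicks_a / views_a
p_b = clicks_b / views_b

# Pooled proportion and standard error under the null hypothesis
# that both groups share the same click-through rate.
p_pool = (clicks_a + clicks_b) / (views_a + views_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))

z = (p_a - p_b) / se
print(round(z, 2))  # |z| > 1.96 means significant at the 5% level
```

Only once the effect is confirmed as real does it make sense to start testing the dozen hypotheses about its cause.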