What is a good data model?

A good data model is intuitive, its data can be easily consumed, changes to its data scale well, and it can evolve to support new business cases. More generally, a good data model effectively represents the underlying data in a structured and meaningful way, facilitating efficient storage, retrieval, …

What is the difference between true positive rate and recall?

There is no difference: they are the same metric, computed as TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. In data analytics and machine learning, “true positive rate” (TPR) and “recall” are used interchangeably because they name the same quantity. The true positive rate is also known as …
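To make the formula concrete, here is a minimal sketch that counts true positives and false negatives from two label lists and divides them. The function name `true_positive_rate` and the choice of `1` as the positive label are illustrative assumptions; binding `recall` to the same function simply reflects that the two names describe one metric.

```python
def true_positive_rate(y_true, y_pred, positive=1):
    """TPR (recall) = TP / (TP + FN), counted over paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Recall is the same computation under a different name.
recall = true_positive_rate

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(true_positive_rate(y_true, y_pred))  # 0.75 (TP = 3, FN = 1)
```

False positives (index 4 above) do not enter the formula at all; they affect precision, not recall.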

What is K-means algorithm?

The K-means algorithm partitions a data set into k clusters so that each cluster is homogeneous, with its points close to each other and to the cluster's centroid, while the clusters themselves stay well separated. Because K-means is unsupervised, the resulting clusters have no labels. For a data analytics interview question asking about the …
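The usual way to compute K-means is Lloyd's iteration: assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. The sketch below assumes points are equal-length numeric tuples, initialises centroids by sampling k data points, and caps the number of iterations; the function name and parameters are illustrative, not taken from any particular library.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The cluster indices carry no meaning of their own, which mirrors the point above that K-means produces unlabeled clusters. Results depend on the initial centroids, so practical implementations typically run several initialisations and keep the best one.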

What do you do with suspicious or missing data?

When data is suspicious or missing: prepare a validation report that describes the suspect data; have experienced personnel review it to determine whether it is acceptable; update invalid data with a validation code; and use the best analysis strategy to work on …
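The first and third steps (a validation report plus validation codes) can be sketched as follows. This is a minimal illustration under assumptions: records are dicts, a single numeric field is checked against a plausible range, and the codes `"M"` (missing), `"R"` (out of range) and `"V"` (valid) are a hypothetical scheme, not a standard. Suspect values are flagged rather than deleted so that a reviewer can still judge their acceptability.

```python
# Hypothetical validation codes; a real project would define its own scheme.
MISSING = "M"
OUT_OF_RANGE = "R"
VALID = "V"


def validate(records, field, low, high):
    """Tag each record's `field` with a validation code and return a report.

    Records are updated in place with a `<field>_code` key; the report lists
    (row index, field, value, code) for every record that is not VALID.
    """
    report = []
    for i, rec in enumerate(records):
        value = rec.get(field)
        if value is None:
            code = MISSING
        elif not (low <= value <= high):
            code = OUT_OF_RANGE
        else:
            code = VALID
        rec[f"{field}_code"] = code  # flag, do not delete, suspect data
        if code != VALID:
            report.append((i, field, value, code))
    return report


rows = [{"age": 34}, {"age": None}, {"age": 212}, {}]
for entry in validate(rows, "age", 0, 120):
    print(entry)
```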

Why is KNN used to determine missing values?

KNN (K-Nearest Neighbors) is used to fill in missing values under the assumption that a point's missing value can be approximated from the values of the points closest to it, with closeness measured on the other variables. KNN itself is primarily used for classification …
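A minimal sketch of KNN imputation, assuming rows are equal-length lists of numbers, only one column (`target`) has gaps marked as `None`, closeness is Euclidean distance on the remaining columns, and the estimate is the plain mean of the k neighbours' values. The function and helper names are illustrative; for real work, scikit-learn ships a `KNNImputer` class that handles gaps in any column.

```python
import math


def _features(row, target):
    """All columns except the one being imputed."""
    return [v for j, v in enumerate(row) if j != target]


def knn_impute(rows, target, k=2):
    """Replace None in column `target` with the mean of its k nearest complete rows."""
    complete = [r for r in rows if r[target] is not None]
    if not complete:
        raise ValueError("need at least one row with a known target value")
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))  # known values are left untouched
            continue
        feats = _features(r, target)
        # Rank complete rows by distance measured on the other variables.
        neighbours = sorted(
            complete, key=lambda other: math.dist(feats, _features(other, target))
        )[:k]
        estimate = sum(n[target] for n in neighbours) / len(neighbours)
        filled.append([estimate if j == target else v for j, v in enumerate(r)])
    return filled


rows = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [5.0, 5.0, 50.0],
    [1.05, 2.05, None],  # missing value to impute
]
print(knn_impute(rows, target=2, k=2)[3])  # [1.05, 2.05, 11.0]
```

The last row sits next to the first two rows on the other variables, so its gap is filled with the mean of their values, (10 + 12) / 2, while the distant third row is ignored.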