First of all, thanks for reading this post.
I'm new to machine learning, and I'm trying to use ML to classify some data. So far I have done some basic reading on supervised and unsupervised learning algorithms such as decision trees, clustering, neural networks, etc.
What I'm trying to understand is the correct general procedure for preparing datasets for an ML problem.
How do I prepare a dataset for ML so that I can measure the accuracy of the algorithms?
My understanding is that, to evaluate accuracy, the algorithm must be given pre-labeled results (for a significant subset of the dataset?) so that the expected result can be compared with the algorithm's output.
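To make this concrete, here is a minimal sketch of what I think that procedure looks like, using only the Python standard library. The toy data, the helper names (`holdout_split`, `predict_1nn`, `accuracy`) and the choice of a 1-nearest-neighbour classifier are all my own assumptions for illustration, not a recommended method:

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle labeled rows and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda row: sq_dist(row[0], features))
    return nearest[1]


def accuracy(train, test):
    """Fraction of held-out rows whose predicted label matches the known label."""
    correct = sum(predict_1nn(train, feats) == label for feats, label in test)
    return correct / len(test)


# Hypothetical toy data: two well-separated clusters of (features, label) pairs.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(100)]
data += [((rng.gauss(10, 1), rng.gauss(10, 1)), "b") for _ in range(100)]

train, test = holdout_split(data)
score = accuracy(train, test)
print(f"held-out accuracy: {score:.2f}")
```

The point of the sketch is only that the labels of the held-out rows are known in advance but hidden from the classifier, so its predictions can be compared against them.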
If this is correct, how do I pre-label a large dataset? My dataset is quite large, and manual labeling is not feasible.
In addition, any tips on getting started with machine learning in Python would be greatly appreciated!
Thank you so much for your help in advance!
Yours faithfully,
Mike
python statistics machine-learning data-analysis