Machine Learning Basics
2026-04-16
What this lesson solves
The previous lesson explained what AI is. This lesson explains how a model learns. Once you understand inputs, labels, training, validation, and generalization, later discussions about large models, agent evaluation, or onchain risk systems become much easier to reason about.
A minimal workflow
Most machine learning workflows can be reduced to five steps:
- Define the task: what exactly should the system predict?
- Prepare the data: collect inputs and add labels when needed.
- Train the model: adjust parameters to reduce error.
- Validate the model: check performance on unseen data.
- Deploy and iterate: connect the model to real usage and monitor it.
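The five steps above can be sketched end to end on a toy problem. This is a minimal illustration, not a realistic pipeline: the task ("is this transfer amount large?"), the data values, and the one-parameter threshold model are all assumptions made up for the example.

```python
# Step 1: define the task. Hypothetical task: predict whether a transfer
# amount counts as "large" (label 1) or not (label 0).

# Step 2: prepare the data as (amount, label) pairs. Values are made up.
train = [(5, 0), (12, 0), (20, 0), (80, 1), (95, 1), (150, 1)]
unseen = [(8, 0), (30, 0), (90, 1), (120, 1)]

def predict(amount, threshold):
    """Classify an amount as large (1) when it is at or above the threshold."""
    return 1 if amount >= threshold else 0

def error_count(data, threshold):
    """Number of examples the threshold rule gets wrong."""
    return sum(predict(x, threshold) != y for x, y in data)

# Step 3: train. The only parameter is the threshold; try each observed
# amount as a candidate and keep the one with the fewest training errors.
candidates = sorted(x for x, _ in train)
threshold = min(candidates, key=lambda t: error_count(train, t))

# Step 4: validate on data the model did not see during training.
validation_accuracy = 1 - error_count(unseen, threshold) / len(unseen)

# Step 5: deploy and iterate. Here "deployment" is just calling predict()
# on a new input; in practice you would also log and monitor predictions.
print(threshold, validation_accuracy, predict(60, threshold))
```

Real models have many more parameters and use smarter search than trying every candidate, but the shape of the loop is the same.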
Supervised vs. unsupervised learning
Supervised learning
Supervised learning uses labeled data.
Examples:
- an email labeled as spam or not spam
- a transaction labeled as fraudulent or normal
The model learns the relationship between the input and the desired answer.
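As a rough sketch of what "learning from labeled data" can look like, the snippet below counts which words appear under each label and predicts the label with the larger word overlap. The emails and the scoring rule are hypothetical simplifications chosen for readability, not a production spam filter.

```python
from collections import Counter

# Hypothetical labeled examples: (email text, label).
labeled_emails = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting notes for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the label whose training emails share more words with the text.

    Ties go to the first label in the dictionary ("spam").
    """
    scores = {
        label: sum(word_counts[word] for word in text.split())
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)

model = train(labeled_emails)
print(predict(model, "claim your free prize"))       # spam
print(predict(model, "notes for the team meeting"))  # not spam
```

The labels are what make this supervised: without the "spam" / "not spam" column, the word counts could not be tied to a desired answer.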
Unsupervised learning
Unsupervised learning does not rely on explicit labels. It tries to discover structure inside the data.
Examples:
- grouping users into similar clusters
- identifying unusual transaction patterns
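To make the clustering example concrete, here is a minimal one-dimensional k-means with two clusters. The input (weekly transaction counts per user) and the starting centers are assumptions for illustration; note that no labels are passed in at any point.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a minimal k-means (k=2)."""
    centers = [min(values), max(values)]  # simple deterministic start
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical weekly transaction counts for seven users.
weekly_tx = [1, 2, 3, 2, 40, 45, 50]
centers, clusters = two_means(weekly_tx)
print(centers)   # [2.0, 45.0]
print(clusters)  # [[1, 2, 3, 2], [40, 45, 50]]
```

The algorithm finds a "low activity" and a "high activity" group on its own; deciding what those groups mean is still up to the analyst.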
Training set, validation set, and test set
These datasets serve different roles and should not be mixed: an example used in one set should not reappear in another.
Training set
The training set is used to learn model parameters. It contains the examples the model directly sees during learning.
Validation set
The validation set helps compare approaches and tune settings during development.
Test set
The test set should be saved for the end. It is not a development aid: once test results influence tuning decisions, it no longer gives an unbiased estimate. It is the final check on whether the model generalizes well.
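One common way to keep the three sets separate is to shuffle the data once and cut it into disjoint slices. The sketch below assumes a 70/15/15 split and a fixed seed; both are illustrative choices, and `split_dataset` is a hypothetical helper name.

```python
import random

def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train/validation/test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15

# No example appears in more than one set.
assert not set(train) & set(val)
assert not set(train) & set(test)
assert not set(val) & set(test)
```

A random split is only a starting point. For time-dependent data such as transactions, splitting by date is often safer, because a random split can leak future information into training.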
What overfitting means
Overfitting means the model has learned the training data too specifically and performs worse on new data.
Typical signs include:
- very high training accuracy
- clearly worse validation or test performance
That usually means the model learned noise in addition to useful patterns.
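The gap between training and validation performance can be shown with two toy models. In this sketch the data is invented and one training label is deliberately wrong: the "memorizer" copies the label of the closest training example, while the simple rule ignores individual points.

```python
# Hypothetical data: (wallet age in days, label). The underlying pattern is
# "age below 30 -> 1", but one training label is noisy on purpose.
train = [(3, 1), (10, 1), (25, 0), (40, 0), (90, 0), (200, 0)]  # (25, 0) is noise
validation = [(5, 1), (24, 1), (26, 1), (60, 0), (150, 0)]

def accuracy(predict, data):
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def memorizer(age):
    """Model A: answer with the label of the closest training example."""
    return min(train, key=lambda ex: abs(ex[0] - age))[1]

def simple_rule(age):
    """Model B: a single threshold that does not bend around the noisy point."""
    return 1 if age < 30 else 0

for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, accuracy(model, train), accuracy(model, validation))
```

The memorizer scores perfectly on the training set but only 0.6 on validation, because it faithfully reproduces the noisy label. The simple rule misses that one training point and gets every validation example right.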
Features, labels, and objectives
Features
Features are the inputs available to the model, for example:
- wallet age
- transaction frequency
- asset balance changes
- keywords in text
Labels
Labels are the answers the model is expected to predict.
Objective function
The objective function (often called the loss function) measures how wrong the model currently is. Training is the process of adjusting parameters to reduce that error.
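A small worked example may help here. The sketch below assumes a one-parameter model `y = w * x`, mean squared error as the objective, and plain gradient descent as the training procedure; the data and learning rate are made up.

```python
# Hypothetical data that follows y = 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mse(w):
    """Objective: mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0               # start with a deliberately bad parameter
learning_rate = 0.05
history = [mse(w)]    # error before any training step
for _ in range(50):
    w -= learning_rate * mse_gradient(w)  # step against the gradient
    history.append(mse(w))

print(round(w, 4), history[0], history[-1])
```

The error starts at 30.0 and shrinks toward zero as `w` approaches 2, which is what "training reduces the objective" means in practice.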
Why data quality matters more than model branding
Many projects do not fail because the model is outdated. They fail because the data pipeline is weak:
- labels are inconsistent
- the dataset does not match real-world usage
- sampling is biased
- the training data is stale
This becomes even more visible in Web3. Onchain data is public, but it is not automatically ready for learning. The task still has to be defined, the noise has to be cleaned, and useful features have to be constructed.
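Some of these problems can be checked mechanically before any training starts. As one illustration, the sketch below flags inputs that appear with more than one label. The function name, the placeholder addresses, and the label strings are all hypothetical.

```python
from collections import defaultdict

def find_conflicting_labels(examples):
    """Return inputs that appear with more than one distinct label."""
    labels_by_input = defaultdict(set)
    for key, label in examples:
        labels_by_input[key].add(label)
    return {k: labels for k, labels in labels_by_input.items() if len(labels) > 1}

# Hypothetical (address, label) pairs from two different analysts.
examples = [
    ("0xaaa", "hunter"),
    ("0xbbb", "normal"),
    ("0xaaa", "normal"),  # disagrees with the first row
    ("0xccc", "hunter"),
]
conflicts = find_conflicting_labels(examples)
print(conflicts)
```

A check like this does not fix the labels, but it shows where the labeling guidelines need to be tightened before the data is used.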
A practical AI × Web3 example
Suppose you want to identify whether an address behaves like an airdrop hunter. A simple supervised setup might look like this:
- input features: number of protocols used, interaction frequency, bridge count, gas behavior, wallet age
- label: whether analysts previously classified the address as an airdrop hunter
- output: the probability that the address belongs to that class
The hard part is rarely the model choice. The hard part is:
- defining what “airdrop hunter” means
- constructing trustworthy labels
- avoiding false positives on normal users
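For completeness, the supervised setup above could be sketched as a small logistic regression that outputs a probability. Everything concrete here is an assumption: the rows are hand-made rather than real onchain data, only three of the listed features are used, and the scaling constants and training settings are arbitrary.

```python
import math

# Hypothetical rows: (protocols_used, bridge_count, wallet_age_days) with an
# analyst label (1 = classified as airdrop hunter). Not real data.
rows = [
    ((25, 12, 20), 1),
    ((30, 15, 35), 1),
    ((22, 9, 15), 1),
    ((3, 1, 400), 0),
    ((5, 0, 700), 0),
    ((2, 1, 250), 0),
]

# Rough scaling so every feature lands in a similar range (assumed maxima).
SCALE = (30.0, 15.0, 700.0)

def scale(features):
    return [f / s for f, s in zip(features, SCALE)]

def predict_proba(weights, bias, features):
    """Logistic regression: sigmoid of a weighted sum of scaled features."""
    z = bias + sum(w * x for w, x in zip(weights, scale(features)))
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, epochs=2000, lr=0.5):
    """Fit weights with per-example gradient steps on the log loss."""
    weights, bias = [0.0] * len(SCALE), 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = predict_proba(weights, bias, features) - label
            x = scale(features)
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

weights, bias = train(rows)
p_hunter = predict_proba(weights, bias, (28, 10, 25))  # many protocols, young wallet
p_normal = predict_proba(weights, bias, (4, 1, 500))   # few protocols, old wallet
print(round(p_hunter, 3), round(p_normal, 3))
```

The model itself takes a few dozen lines. Whether its output can be trusted depends on the label definition and label quality, which this sketch simply takes as given.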
Minimum takeaway
After this lesson, you should be able to explain:
- the difference between supervised and unsupervised learning
- why training, validation, and test sets should not be mixed
- what overfitting means
- why data quality often matters more than the model name
What comes next
The next lesson introduces deep learning. The focus is not mathematical derivation, but understanding why neural networks became so powerful and how modern large models fit into that story.