Training Neural Networks

Reference: CS231n Lecture 7, Training Neural Networks 2

1. Data Preprocessing

1) Normalization

Before normalization : classification loss is very sensitive to weight matrix changes; hard to optimize

After normalization : classification loss is less sensitive to weight matrix changes; easier to optimize
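A minimal NumPy sketch of this kind of preprocessing, assuming the data is an array of shape (N, D). The names `fit_normalizer` and `normalize`, the random data, and the `eps` value are made up for illustration. The statistics are computed on the training set only and would be reused unchanged on validation and test data.

```python
import numpy as np

def fit_normalizer(X_train, eps=1e-8):
    """Compute per-feature mean and standard deviation on the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0) + eps  # eps avoids division by zero for constant features
    return mean, std

def normalize(X, mean, std):
    """Zero-center and scale each feature using precomputed statistics."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # hypothetical data
mean, std = fit_normalizer(X_train)
X_norm = normalize(X_train, mean, std)  # each column now has ~zero mean, ~unit std
```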

2) Batch Normalization

Normalizes the output of a layer's computation. The normalization parameters (scale and shift) can also be learned.
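A rough sketch of the training-time forward pass, assuming a fully connected layer whose output has shape (N, D). The function name `batchnorm_forward` and the input data are hypothetical, and the running averages needed at test time are left out to keep it short.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for x of shape (N, D)."""
    mu = x.mean(axis=0)                     # per-feature mean over the mini-batch
    var = x.var(axis=0)                     # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # roughly zero mean, unit variance
    return gamma * x_hat + beta             # learnable scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 5))  # hypothetical layer output
gamma = np.ones(5)   # initial scale
beta = np.zeros(5)   # initial shift
out = batchnorm_forward(x, gamma, beta)
```

With `gamma = 1` and `beta = 0` the output is simply the normalized activation; during training both are updated by gradient descent like any other weight.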

3) Babysitting the Learning Process

4) Hyperparameter Search

Coarse to fine search : first vary the hyperparameters over a wide range to see the overall trend, then tune them finely within the promising region
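One way to sketch the idea for a single hyperparameter (the learning rate). Everything here is an assumption for illustration: `validation_score` is a toy stand-in for "train briefly and measure validation accuracy", and sampling in log space with 20 trials per stage is just one reasonable choice.

```python
import numpy as np

def validation_score(lr):
    """Hypothetical stand-in for a short training run; this toy peaks at lr = 1e-3."""
    return -(np.log10(lr) + 3.0) ** 2

def random_search(low_exp, high_exp, n_trials, rng):
    """Sample learning rates uniformly in log space and return the best one."""
    lrs = 10.0 ** rng.uniform(low_exp, high_exp, size=n_trials)
    scores = np.array([validation_score(lr) for lr in lrs])
    return lrs[np.argmax(scores)]

rng = np.random.default_rng(0)

# Coarse stage: a wide range, six orders of magnitude.
coarse_best = random_search(-6.0, 0.0, n_trials=20, rng=rng)

# Fine stage: a narrow range (half a decade each way) around the coarse winner.
center = np.log10(coarse_best)
fine_best = random_search(center - 0.5, center + 0.5, n_trials=20, rng=rng)
```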

2. Optimization

# Vanilla Gradient Descent

while True:
	weights_grad = evaluate_gradient(loss_fun, data, weights)
	weights += - step_size * weights_grad # perform parameter update
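The loop above is pseudocode. A runnable version, assuming a toy quadratic loss in place of a real network (the loss, starting point, and step size are made up for illustration), could look like this:

```python
import numpy as np

def evaluate_gradient(weights):
    """Gradient of the toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2)."""
    return np.array([1.0, 10.0]) * weights

weights = np.array([5.0, 5.0])  # hypothetical starting point
step_size = 0.05

for _ in range(500):  # fixed step count instead of `while True`
    weights_grad = evaluate_gradient(weights)
    weights -= step_size * weights_grad  # step against the gradient
```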

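Problems with Stochastic Gradient Descent

1) Gradient Noise : the path to the minimum can zigzag instead of heading straight there. The extra steps are wasted work, so convergence takes longer, and the higher the dimensionality, the more time the zigzagging costs.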

2) Local minima, saddle point : zero gradient, gradient descent gets stuck

Solution

Add Momentum term

1) SGD + Momentum

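# Add Momentum term
vx = 0  # velocity: a decayed running sum of past gradients
while True:
	dx = compute_gradient(x)
	vx = rho * vx + dx  # rho is the momentum coefficient, typically 0.9 or 0.99
	x -= learning_rate * vx  # step against the velocity (minus sign, as in gradient descent)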
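To see the effect of the momentum term, here is a small runnable comparison on a toy loss that is 50 times steeper in one direction than the other. The loss, the `run` helper, and all constants are assumptions chosen for illustration; setting `rho = 0` turns the update into plain gradient descent.

```python
import numpy as np

CURVATURE = np.array([1.0, 50.0])  # hypothetical: loss is 50x steeper along x[1]

def loss(x):
    return 0.5 * np.sum(CURVATURE * x ** 2)

def compute_gradient(x):
    return CURVATURE * x

def run(rho, learning_rate=0.01, num_steps=100):
    """Run the momentum update; rho = 0 reduces to plain gradient descent."""
    x = np.array([5.0, 5.0])
    vx = np.zeros_like(x)
    for _ in range(num_steps):
        dx = compute_gradient(x)
        vx = rho * vx + dx
        x -= learning_rate * vx
    return loss(x)

loss_plain = run(rho=0.0)
loss_momentum = run(rho=0.9)
```

In this toy setup the momentum run ends at a lower loss than the plain run after the same 100 steps, because the velocity keeps building up along the shallow direction where each individual gradient is small.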

2) Nesterov Momentum
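Nesterov momentum evaluates the gradient at a "look-ahead" point instead of the current one. A sketch of one commonly used rearranged form follows, on a made-up toy loss. Note the sign convention differs from the momentum code above: here the velocity `v` already includes the learning rate and the minus sign.

```python
import numpy as np

def compute_gradient(x):
    """Gradient of the toy loss f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return np.array([1.0, 10.0]) * x

x = np.array([5.0, 5.0])  # hypothetical starting point
v = np.zeros_like(x)
rho, learning_rate = 0.9, 0.01

for _ in range(500):
    dx = compute_gradient(x)
    old_v = v
    v = rho * v - learning_rate * dx     # velocity update
    x += -rho * old_v + (1 + rho) * v    # look-ahead correction folded into the step
```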

3) AdaGrad
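AdaGrad divides each parameter's step by the square root of the sum of its past squared gradients, so directions with large gradients take smaller steps. A minimal sketch on a made-up toy loss (the constants are assumptions for illustration):

```python
import numpy as np

def compute_gradient(x):
    """Gradient of the toy loss f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return np.array([1.0, 10.0]) * x

x = np.array([5.0, 5.0])  # hypothetical starting point
grad_squared = np.zeros_like(x)
learning_rate = 1.0

for _ in range(500):
    dx = compute_gradient(x)
    grad_squared += dx * dx  # running sum of squared gradients, never decays
    x -= learning_rate * dx / (np.sqrt(grad_squared) + 1e-7)  # per-parameter scaling
```

Because `grad_squared` only grows, the effective step size keeps shrinking over the course of training.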

4) RMSProp
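RMSProp replaces AdaGrad's ever-growing sum with a decaying average of squared gradients, so the step size does not shrink toward zero. A minimal sketch on a made-up toy loss (`decay_rate = 0.9` and the other constants are assumptions for illustration):

```python
import numpy as np

def compute_gradient(x):
    """Gradient of the toy loss f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return np.array([1.0, 10.0]) * x

x = np.array([1.0, 1.0])  # hypothetical starting point
grad_squared = np.zeros_like(x)
learning_rate, decay_rate = 0.01, 0.9

for _ in range(1000):
    dx = compute_gradient(x)
    # Leaky average of squared gradients instead of a plain sum.
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx
    x -= learning_rate * dx / (np.sqrt(grad_squared) + 1e-7)
```

With a fixed learning rate the iterate ends up within a few step sizes of the minimum rather than settling exactly on it.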

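5) Adam = Momentum + RMSProp, with bias correction (RMSProp is itself a variant of AdaGrad; Adam is usually a good default choice)

All of the methods above have the learning rate as a hyperparameter.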
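A sketch of the Adam update on a made-up toy loss. `beta1 = 0.9` and `beta2 = 0.999` are the commonly used defaults; the learning rate, starting point, and step count are assumptions for illustration.

```python
import numpy as np

def compute_gradient(x):
    """Gradient of the toy loss f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return np.array([1.0, 10.0]) * x

x = np.array([1.0, 1.0])  # hypothetical starting point
first_moment = np.zeros_like(x)
second_moment = np.zeros_like(x)
beta1, beta2, learning_rate = 0.9, 0.999, 0.01

for t in range(1, 2001):  # t starts at 1 so the bias correction is well defined
    dx = compute_gradient(x)
    first_moment = beta1 * first_moment + (1 - beta1) * dx          # momentum part
    second_moment = beta2 * second_moment + (1 - beta2) * dx * dx   # RMSProp part
    first_unbias = first_moment / (1 - beta1 ** t)    # correct the zero initialization
    second_unbias = second_moment / (1 - beta2 ** t)
    x -= learning_rate * first_unbias / (np.sqrt(second_unbias) + 1e-7)
```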
