He described Newton's method wrong. Newton's method is a lot like gradient descent, except that it assumes the slope continues indefinitely at the same angle and jumps straight to where it calculates the function will hit zero. If anything, gradient descent is an adaptation of Newton's method that takes it slower, because Newton's method isn't completely stable for multi-variable or high-degree (multi-layer) systems, and it can show oscillatory behavior. Technically gradient descent can too, but if you tune the learning rate it mostly doesn't.
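Here's a minimal sketch of the difference I mean, in Python for a 1-D function (the function, derivative, tolerances, and starting point are just made-up examples):

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=100):
    # Newton's method: follow the tangent line at x, assuming the
    # slope stays the same, and jump straight to where it hits zero.
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - fx / fprime(x)  # full jump, no learning rate
    return x

def gradient_descent(fprime, x0, lr=0.1, steps=500):
    # Gradient descent on the same slope information, but each step
    # is scaled down by a learning rate instead of a full jump.
    x = x0
    for _ in range(steps):
        x = x - lr * fprime(x)
    return x

# Example: finding sqrt(2) as the root of f(x) = x^2 - 2.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
```

Note the only structural difference: Newton divides by the slope to jump to the tangent's zero crossing, while gradient descent multiplies the slope by a small fixed factor, which is why it's slower but less likely to overshoot and oscillate.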
If you have something like Newton's method that uses a higher degree than one, it's called Halley's method or Muller's method at second degree, depending on how you set it up. There are names for the even higher-order methods too, but I can't remember them.
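For the second-degree case, Halley's method looks like this (a rough sketch; the test function and starting point are arbitrary examples):

```python
def halley(f, fp, fpp, x0, tol=1e-10, max_iter=100):
    # Halley's method: like Newton's method but it also uses the
    # second derivative (curvature), fitting a second-degree
    # approximation instead of a straight tangent line.
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        fpx, fppx = fp(x), fpp(x)
        x = x - (2 * fx * fpx) / (2 * fpx ** 2 - fx * fppx)
    return x

# Same sqrt(2) example: f(x) = x^2 - 2, f'(x) = 2x, f''(x) = 2.
root = halley(lambda x: x * x - 2, lambda x: 2 * x, lambda x: 2.0, 1.0)
```

Muller's method gets the second-degree information differently: instead of an analytic second derivative, it fits a parabola through the last three iterates, which is why I say it depends on how you set it up.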