Conversations on machine learning

🌱 conversations (any format) + prompts for conversation/contemplation 🌱

Researchers

Geoffrey Hinton

2024.11.29 Geoff Hinton - Will Digital Intelligence Replace Biological Intelligence? | Vector's Remarkable 2024 - YouTube

2024.05.20 Geoffrey Hinton | On working with Ilya, choosing problems, and the power of intuition - YouTube

  • multiple timescales: temporary, input-dependent changes to the weights (fast weights) conflict with parallelization (multiple inputs processed concurrently for efficient training); see the sketch after this block
    • graph call
    • sequential / online learning would be needed; may be solved when conductances are used for weights
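
A minimal sketch of the fast-weights idea above, assuming a Hebbian outer-product update rule (the specific rule is my assumption, not something stated in the talk). The fast-weight matrix is modified by the particular sequence it processes, which is why it sits awkwardly with training schemes that push many sequences through shared weights in parallel:

```python
import numpy as np

# Hypothetical fast-weights update (Hebbian outer-product form; the specific
# rule is an assumption, not taken from the talk). The fast-weight matrix
# A_fast is changed by every input it sees, so it is specific to one sequence,
# which is what conflicts with processing many sequences in parallel on
# shared weights.
rng = np.random.default_rng(0)
d = 8
W_slow = rng.normal(scale=0.1, size=(d, d))  # shared slow weights (learned, copyable)
A_fast = np.zeros((d, d))                    # fast weights: temporary, input-dependent

decay, lr_fast = 0.95, 0.5
for t in range(10):                          # one sequence, step by step (inherently sequential)
    x = rng.normal(size=d)
    h = np.tanh(W_slow @ x + A_fast @ x)     # activity uses slow + fast weights
    A_fast = decay * A_fast + lr_fast * np.outer(h, h)  # rapid, temporary Hebbian change

print("fast-weight norm after one sequence:", np.linalg.norm(A_fast).round(3))
```
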

2023.06.05 Geoffrey Hinton - Two Paths to Intelligence - YouTube

2023.01.16 Geoff Hinton | Mortal Computers - YouTube

  • upload a mind to a computer? Hinton intends to explain why this won’t be possible
  • standard computing: permanent memory, precise computation
    • ML: weights same for every copy of the model {because the underlying infrastructure is the same}
    • aka immortal computing
    • built on transistors, which consume a lot of power
    • expensive for two reasons: power consumption, fabrication precision
      • manufacturing plant costs O($xB)
      • {total power consumption = [power per transistor] × [transistors per chip] × [chips per zPU] × [zPUs per compute unit] × [compute units]; toy calculation below}
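
Treating the bracketed product above as literal arithmetic, a toy calculation with purely hypothetical numbers (none of these figures come from the talk):

```python
# Toy version of the note above; every number here is hypothetical.
power_per_transistor = 5e-9     # W, assumed average switching power
transistors_per_chip = 5e10
chips_per_zpu = 1               # "zPU" kept as the note's generic processor placeholder
zpus_per_compute_unit = 8
compute_units = 1000

total_power = (power_per_transistor * transistors_per_chip
               * chips_per_zpu * zpus_per_compute_unit * compute_units)
print(f"total power: {total_power / 1e6:.1f} MW")  # 2.0 MW with these assumed numbers
```
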
  • low power analog hardware alternative {inputs/activity defined by voltages, model weights defined by conductances (G = 1/R), output is charge as the integral of current, I = G·V; sketch after this block}
    • {is this low power simply because no transistors are required?}
  • {relaxed fabrication specs: imprecise fabrication is okay; the responsibility for reliable computation shifts from precise hardware to a learning algorithm that adapts to the particular substrate}
    • {inherently memory limited because the system isn’t designed to be replicable and, by its nature, isn’t}
    • {‘growing’ vs manufacturing: GH doesn’t get into this, but my take is that the key difference is the process’ spurious variations; maybe it’s not noise, but some other distribution}
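
A minimal numerical sketch of the analog scheme above (toy values, my own example): activities enter as voltages, weights are conductances, and the output is the charge accumulated from the summed currents over a fixed read window:

```python
import numpy as np

# Analog multiply-accumulate sketch: activities are voltages (V), weights are
# conductances (G = 1/R), each wire contributes a current I = G * V, the
# currents add on the output wire, and integrating over a read window dt
# gives a charge proportional to the weighted sum of the inputs.
rng = np.random.default_rng(0)

v = rng.uniform(0.0, 1.0, size=16)    # input activities as voltages [V]
g = rng.uniform(1e-6, 1e-5, size=16)  # weights as conductances [S]
dt = 1e-6                             # read window [s]

i_total = np.sum(g * v)               # Kirchhoff's current law: currents sum on the output wire
q = i_total * dt                      # accumulated charge [C], proportional to the dot product g·v

print(f"output charge: {q:.3e} C")
```
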
  • parallelism at the level of the weights: compute doesn’t need to be fast
    • {this implies that the same computational operation is applied to every compute element/neuron}
      • {the complexity is shifted away from the model structure to ? (neuron processing operation/transfer function, or ???)}
    • {standard DL leverages layer-level parallelism}
  • mortal computing
  • unlike standard artificial neural networks, mortal computers can't use backpropagation: to apply the chain rule you need a full specification of the forward computation
    • {this is an inverse problem situation: the processing system (model) doesn’t have a known analytical representation}
    • weight perturbation is slow, noisy, high variance: perturb the weights -> measure the effect on the objective -> scale the perturbation by that effect -> update the weights (see the sketch after this list)
      • also, not really parallelizable: applying weight perturbations across batches increases the variance even more
    • activity perturbation: perturb each neuron's total input (its activity) with random noise and use the measured effect on the objective to estimate the local gradient
      • the noise is added to the neurons rather than the weights, and there are thousands of times fewer neurons than weights
      • lower variance
      • works okay for, e.g., MNIST
      • scaling to large nets is an issue
        • {why?}
        • one approach is to develop objective functions that work well with fewer parameters
        • eg: one objective function for a smaller subgroup of neurons (local objective functions)
        • {? isn’t this like having a bunch of narrower AIs side by side?}
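
A toy sketch contrasting the two perturbation-based estimators above, using a single linear layer with a quadratic loss (my own construction; the layer, loss, and sample counts are assumptions, not code from the talk). Both estimate the gradient from the change in the objective under random noise; activity perturbation injects noise into the far fewer neuron inputs, so its estimate has lower variance:

```python
import numpy as np

# Compare weight perturbation and activity perturbation on a toy linear layer.
# Both estimate gradients from the change in the objective under random noise;
# activity perturbation perturbs the n_out neuron inputs instead of the
# n_out * n_in weights, which gives a lower-variance estimate.
rng = np.random.default_rng(0)

n_in, n_out = 20, 5
W = rng.normal(size=(n_out, n_in))
x = rng.normal(size=n_in)
y_target = rng.normal(size=n_out)

def loss(W):
    return 0.5 * np.sum((W @ x - y_target) ** 2)

def weight_perturbation_grad(W, sigma=1e-3, n_samples=2000):
    """Estimate dL/dW by perturbing all the weights at once."""
    base, g = loss(W), np.zeros_like(W)
    for _ in range(n_samples):
        dW = rng.normal(scale=sigma, size=W.shape)
        g += (loss(W + dW) - base) / sigma**2 * dW
    return g / n_samples

def activity_perturbation_grad(W, sigma=1e-3, n_samples=2000):
    """Estimate dL/da for the neuron inputs a = W @ x, then form dL/dW locally."""
    a = W @ x
    base, g_a = 0.5 * np.sum((a - y_target) ** 2), np.zeros_like(a)
    for _ in range(n_samples):
        da = rng.normal(scale=sigma, size=a.shape)
        g_a += (0.5 * np.sum((a + da - y_target) ** 2) - base) / sigma**2 * da
    return np.outer(g_a / n_samples, x)   # local outer-product weight gradient

true_grad = np.outer(W @ x - y_target, x)
for name, est in [("weight perturbation", weight_perturbation_grad(W)),
                  ("activity perturbation", activity_perturbation_grad(W))]:
    rel_err = np.linalg.norm(est - true_grad) / np.linalg.norm(true_grad)
    print(f"{name}: relative error of the gradient estimate = {rel_err:.2f}")
```
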
  • memory-limited + low power consumption
  • knowledge transfer {(how?) or start from scratch?}
    • {if the latter: 1. how long to catch up? 2. timescale for keeping up? (i.e., knowledge latency: pragmatic limit vs target spec?)}
  • start by considering knowledge transfer within the system, between local patches
    • in computer vision, share knowledge across patches via convolution {(the same kernel weights are applied to every patch; sketch after this list)} or transformers {(? why mentioned in the context of CV, specifically? link to attention)}
      • {works because all patches are built on the same substrate}
    • for imperfect and dissimilar substrates, use knowledge distillation
      • align local patches’ feature vectors (extract similarities, or transformations that make different feature vectors agree on a prediction)
      • this is useful as local patches' (modules') receptive fields (inputs?) can have a different number of sensors {(distinct measurements)} with different spacing {(resolution)} (no regular grid required)
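
A minimal illustration of the convolution weight-sharing point above (toy NumPy, my own example, not code from the talk): the same kernel is applied to every patch, which presumes every patch is computed on identical hardware:

```python
import numpy as np

# Weight sharing across patches via convolution: the *same* 3x3 kernel is
# applied to every patch of the image, which only makes sense because every
# patch is processed by identical (digital) hardware.
rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))   # one shared set of weights

out = np.zeros((6, 6))
for i in range(6):                 # slide the shared kernel over all patches
    for j in range(6):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print("output feature map shape:", out.shape)
```
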
  • knowledge transfer between mortal computers (distillation)
    • classification task: the source / teacher provides the answer plus the relative probabilities of the wrong answers (soft targets; sketch after this block)
    • compute task: consensus among the mortal computers on which additional outputs will {sufficiently represent the current knowledge-holder's transfer function, transferring not just specific knowledge but how the origin (mortal) computer thinks}
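
A minimal sketch of the soft-target distillation idea above (the logits and temperature are hypothetical, not from the talk): the teacher's temperature-softened probabilities carry the relative probabilities of the wrong answers, and the student would be trained to match that distribution:

```python
import numpy as np

# Distillation with soft targets: the teacher provides not just the answer
# but the relative probabilities of the wrong answers; a student would be
# trained to minimize the KL divergence to these softened targets.
def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([6.0, 2.5, 1.0, -1.0])  # hypothetical: teacher confident in class 0
student_logits = np.array([2.0, 2.2, 0.5, -0.5])  # hypothetical: student not yet aligned

T = 4.0                                           # distillation temperature (assumed)
p_teacher = softmax(teacher_logits, T)            # soft targets: answer + relative prob. of wrong answers
p_student = softmax(student_logits, T)

kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))  # distillation loss the student minimizes
print("teacher soft targets:", np.round(p_teacher, 3))
print("student soft probs:  ", np.round(p_student, 3))
print(f"distillation loss (KL): {kl:.4f}")
```
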
  • black boxes: he's embracing black-box transfer functions for mortal computers
    • this is in contrast to explainable models {which are deconstructed into components and the model transfer function analysed in terms of components’ individual transfer functions}
  • computers that are more brain-like: neuromorphic hardware
    • don’t have this yet because a suitable general-purpose learning algorithm hasn’t yet been devised
    • the brain must have such a procedure, but it hasn’t been discovered yet

2014.11.07 Hinton: 2014 Reddit AMA

Yann LeCun

2024.10.18 Lecture Series in AI: “How Could Machines Reach Human-Level Intelligence?” by Yann LeCun - YouTube

Francois Chollet

2024.06.11 Francois Chollet - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution - YouTube

Ilya Sutskever

2023.03.27 Ilya Sutskever - Building AGI, Alignment, Spies, Microsoft, & Enlightenment - YouTube

Andrej Karpathy

2024.09.05 No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla - YouTube

AI safety

2024.06.04 Leopold Aschenbrenner - 2027 AGI, China/US Super-Intelligence Race, & The Return of History - YouTube

AI: commercial development

2024.05.15 John Schulman - Reasoning, RLHF, & Plan for 2027 AGI - YouTube