JoshuaRodriguez

Reading: Deep Learning - John D. Kelleher

There are two ways to represent data: a localist representation, or a distributed representation. A localist representation maps each concept (word type, colour, country, etc) to a specific neuron in the network. e.g. a French, yellow cat may be represented as 110, where the 1st bit represents the country, the 2nd bit represents the colour, and the 3rd bit represents the animal. Whilst this means it's easy to understand what a neuron activation means, because each neuron corresponds directly to a concept, it's also wasteful - we could pack more information into the same neurons. Imagine we had 3 different colours we needed to encode - we'd need to use 2 neurons, which between them can show 4 different activation patterns. So, we're losing 1 neuron activation pattern!

A distributed representation doesn't allocate each neuron (or set of neurons) a specific concept - you just PACK THEM ALL IN. e.g. yellow+cat may be 000, and blue+dog may be 001. None of the neurons here specifically stand for anything, but that doesn't matter for a neural network - as long as the network is able to decipher its own representation at the end.
https://scrapbook-into-the-redwoods.s3.amazonaws.com/ea189bf2-b079-4816-aec7-3f9a23baf942-img_20240615_065245.jpg
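To make the contrast concrete, here's a minimal Python sketch (my own, not from the book) - a localist encoding where each neuron is reserved for one concept, versus a distributed one where a concept combination is just an arbitrary pattern over a few neurons. The concept list and the particular patterns are made up for illustration.

```python
import numpy as np

# Localist: one neuron per concept, so "French + yellow + cat" simply
# switches on the neurons reserved for those concepts.
concepts = ["french", "german", "yellow", "blue", "cat", "dog"]

def localist(*active):
    vec = np.zeros(len(concepts))
    for c in active:
        vec[concepts.index(c)] = 1.0
    return vec

print(localist("french", "yellow", "cat"))  # [1. 0. 1. 0. 1. 0.]

# Distributed: a concept combination is just some pattern over a few
# neurons; no single neuron stands for anything on its own. With 3
# neurons we can distinguish up to 2**3 = 8 combinations.
patterns = {
    ("yellow", "cat"): np.array([0.0, 0.0, 0.0]),
    ("blue", "dog"):   np.array([0.0, 0.0, 1.0]),
}
print(patterns[("yellow", "cat")])  # [0. 0. 0.]
```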
Currently Reading: Deep Learning - John D. Kelleher

Transfer learning is the idea that neural networks don't need to be trained from scratch every single time we have a new problem to solve - instead, we can create a general network that already handles the kind of problem we're trying to solve pretty well, and then train it further on a small dataset specific to our problem.

One way that BERT (a new type of language-processing neural network) utilises transfer learning is through self-supervised learning. The model takes a dataset of unlabelled sentences (so we haven't quality-checked it, or written our own questions to train the model on, or anything like that), hides part of each sentence, and tries to guess what should fill it in - then updates the network accordingly. Afterwards, the network is trained very specifically on a small dataset of sentences relating to medical diagnosis, natural speech, jokes, etc.

The interesting thing about BERT's use of transfer learning is that it's not just about decreasing the energy required to train future language-processing networks - it also means we don't need a large labelled dataset. We just need a whole lot of unfiltered human garbage words, and then a small set of filtered, task-specific words. This makes data collection for training a neural network so much quicker!
https://scrapbook-into-the-redwoods.s3.amazonaws.com/e04eea67-2ec5-4afc-ae02-375699ef0793-img_20240613_192750.jpg
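Here's a rough sketch of both halves of that idea, using the Hugging Face transformers library (my own example, not from the book - the masked sentence, the two-sentence "dataset", and the labels are invented purely for illustration):

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# 1. The self-supervised bit: BERT was pre-trained by hiding words in
#    unlabelled sentences and guessing what fills the gap.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The doctor examined the [MASK] before making a diagnosis.")[0]["token_str"])

# 2. The transfer-learning bit: reuse those pre-trained weights and
#    fine-tune them on a tiny, task-specific labelled dataset.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["I feel dizzy and my chest hurts.", "What a lovely day outside!"]  # made up
labels = torch.tensor([1, 0])  # 1 = medical complaint, 0 = not

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the tiny dataset
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```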