MNIST number classification from Scratch (2021-01-28) — https://build3r.github.io/fastai//fastai/2021/01/28/MNIST<p>These notes are from Course-v4 of fastai, which differs a lot from course-v3.
In this course they have published a book called fastbook (available for free as Jupyter notebooks).</p>
<p>In this chapter we first try to differentiate between 3s and 7s, using various techniques:</p>
<ol>
  <li>Using pixel similarity (gives around 85% accuracy)</li>
  <li>Train a linear model from scratch (95% acc)
    <ol>
      <li>Optimize this using built-in fastai and PyTorch classes and functions</li>
    </ol>
  </li>
  <li>Create a simple neural net (non-linearity = ReLU) with 3 layers (97% acc)</li>
  <li>Use cnn_learner with resnet18 as the base model (99% acc)</li>
</ol>
<h3 id="opening-and-viewing-a-image-as-tensor">Opening and viewing an image as a tensor</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>im3_path = threes[0]
im3 = Image.open(im3_path)
im3_t = tensor(im3) #converts the image to a tensor (array of pixels)
df = pd.DataFrame(im3_t[4:15,4:22])
df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys') #plots the image as a table of greys
</code></pre></div></div>
<ul>
  <li>Learn to envision a simple baseline model which you think will perform reasonably well, and then compare your model with it.</li>
  <li>Another way is to search around for similar problems solved by other people and apply those solutions to our dataset.</li>
</ul>
<p>A list comprehension looks like this:<br />
<code class="language-plaintext highlighter-rouge">new_list = [f(o) for o in a_list if o>0]</code><br />
This will return every element of <code class="language-plaintext highlighter-rouge">a_list</code> that is greater than 0, after passing it to the function f.</p>
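<p>A minimal runnable sketch of the comprehension above (the list and the function <code class="language-plaintext highlighter-rouge">f</code> here are made up for illustration):</p>

```python
a_list = [3, -1, 0, 4]

def f(o): return o * 10   # any per-element function

new_list = [f(o) for o in a_list if o > 0]
print(new_list)  # [30, 40]
```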
<p><code class="language-plaintext highlighter-rouge">stack</code> : stacks up individual tensors in a collection into a single tensor.</p>
<p><strong>rank</strong> = the number of axes or dimensions of a tensor<br />
<strong>shape</strong> = the size of each axis of a tensor</p>
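<p>For example, with NumPy (PyTorch tensors behave the same way, and <code class="language-plaintext highlighter-rouge">stack</code>, <code class="language-plaintext highlighter-rouge">ndim</code> and <code class="language-plaintext highlighter-rouge">shape</code> exist under the same names in both):</p>

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.stack([a, b])   # two rank-1 tensors -> one rank-2 tensor
print(stacked.ndim)          # rank: number of axes -> 2
print(stacked.shape)         # size of each axis -> (2, 3)
```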
<p>PyTorch already provides mean absolute difference (L1 norm) and root mean squared error (RMSE, L2 norm) as loss functions. You’ll find these inside torch.nn.functional, which the PyTorch team recommends importing as F (and is available by default under that name in fastai):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F.l1_loss(a_3.float(),mean7)
F.mse_loss(a_3,mean7).sqrt()
</code></pre></div></div>
<p>NumPy is the most widely used library for scientific and numeric programming in Python. It provides very similar functionality and a very similar API to that provided by PyTorch; however, it does <strong>not support</strong> using the <strong>GPU or calculating gradients</strong>, which are both critical for deep learning.<br />
NumPy arrays can be jagged: arrays of arrays, with the innermost arrays potentially being different sizes.</p>
<p>A PyTorch tensor's restriction is that it cannot use just any old type—it has to use a single basic numeric type for all components. For example, a PyTorch tensor <strong>cannot be jagged</strong>.</p>
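<p>A quick NumPy illustration of the difference (PyTorch's <code class="language-plaintext highlighter-rouge">tensor()</code> would reject the jagged case):</p>

```python
import numpy as np

# Jagged: an object array whose inner arrays have different lengths.
jagged = np.empty(2, dtype=object)
jagged[0] = np.array([1, 2, 3])
jagged[1] = np.array([4, 5])
print(len(jagged[0]), len(jagged[1]))  # 3 2

# A regular ndarray (like a PyTorch tensor) must be rectangular:
rect = np.array([[1, 2, 3], [4, 5, 6]])
print(rect.shape)  # (2, 3)
```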
<h3 id="crash-course-on-using-tensors">Crash Course on Using Tensors</h3>
<ol>
<li>
<p>Create a Tensor: pass a list (or list of lists, or list of lists of lists, etc.)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data = [[1,2,3],[4,5,6]]
arr = array(data)  #numpy array
tns = tensor(data) #pytorch tensor
</code></pre></div> </div>
</li>
<li>
<p>Access: tensors are 0-indexed<br />
Use indexes<br />
tns[1] is [4,5,6] #access a row<br />
tns[:,1] is [2,5] #access a column<br />
tns[1,1:3] is [5,6] #slice access, 3 is excluded</p>
</li>
<li>
<p>You can use the standard operators such as +, -, *, /:<br />
tns+1 = [[2,3,4],[5,6,7]]</p>
</li>
<li>
<p>Tensors have a type, and will automatically change type when needed, e.g. when multiplied with a float<br />
tns.type() = ‘torch.LongTensor’</p>
</li>
</ol>
<p><strong>Broadcasting</strong>: PyTorch will expand the tensor with the smaller rank to have the same size as the one with the larger rank. Broadcasting is an important capability that makes tensor code much easier to write.</p>
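<p>A small NumPy sketch of the same rule (PyTorch broadcasts identically):</p>

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
v = np.array([10, 20, 30])  # shape (3,) -- smaller rank

out = m + v                 # v is broadcast across both rows
print(out)                  # [[11 22 33], [14 25 36]]
```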
<h3 id="stochastic-gradient-descent-sgd">Stochastic Gradient Descent (SGD)</h3>
<p>To minimize a function we have to move towards the negative slope = negative gradient.<br />
In other words, the gradients will tell us how much we have to change each weight to make our model better.<br />
To plot a fn use (fastai)<br />
<code class="language-plaintext highlighter-rouge">plot_function(fn, x_axis, y_axis)</code><br />
to mark a point<br />
<code class="language-plaintext highlighter-rouge">plt.scatter(-1.5, f(-1.5), color='black');</code></p>
<p><strong>Calculating gradient aka derivative value</strong><br />
<code class="language-plaintext highlighter-rouge">xt = tensor(3.).requires_grad_()</code><br />
‘_’ at the end of a fn name indicates in-place changes (requires_grad_ tells PyTorch that we want to calculate gradients with respect to that variable at that value. It essentially tags the variable, so PyTorch will remember to keep track of how to compute gradients of the other, direct calculations on it that you will ask for.)</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def f(x): return x**2
xt = tensor(3.).requires_grad_()
yt = f(xt)
yt # = tensor(9., grad_fn=<PowBackward0>)
yt.backward() #backward propagation = calculate_gradient
xt.grad # = tensor(6.)
</code></pre></div></div>
<p>The derivative of <code class="language-plaintext highlighter-rouge">x**2</code> is <code class="language-plaintext highlighter-rouge">2*x</code>, and we have <code class="language-plaintext highlighter-rouge">x=3</code>, so the gradients should be <code class="language-plaintext highlighter-rouge">2*3=6</code>, which is what PyTorch calculated for us!</p>
<p>We don’t use the derivative directly to adjust our weights; instead we need to move slowly. The rate of moving is given by the <strong>learning rate</strong>; generally it’s between 0.001 and 0.1. Picking it is more art than math.<br />
The weight adjustment is given by</p>
<blockquote>
<p>w -= gradient(w) * lr</p>
</blockquote>
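<p>For a concrete (made-up) example, one update step on f(w) = w**2, whose gradient is 2*w:</p>

```python
def gradient(w): return 2 * w   # derivative of w**2

w = 3.0
lr = 0.1
w -= gradient(w) * lr           # 3.0 - 6.0*0.1 -> ~2.4
print(w)
```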
<p><img src="https://build3r.github.io/fastai//images/a757c7226737448f88df2b84fadb4139.png" alt="small_learning_rate" /><br />
Picking a low learning rate is safer, because if you pick one too high the loss can get worse<br />
<img src="https://build3r.github.io/fastai//images/ead0ffd6cc9b4fe3a647c725bdea34cd.png" alt="large__learning_rate" /></p>
<p>params.grad.data will get the gradient but won’t recalculate it</p>
<h3 id="example-to-calculate-gradient-descent">Example to calculate Gradient descent</h3>
<p>An example is a roller coaster going <strong>over a hill</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>time = torch.arange(0,20).float()
speed = torch.randn(20)*3 + 0.75*(time-9.5)**2 + 1 #some random quadratic function
plt.scatter(time,speed);
</code></pre></div></div>
<p><img src="https://build3r.github.io/fastai//images/a0b8231d16a44efe8a2e1e378228697e.png" alt="5cf6e8f18638d903f8e1543c751675aa.png" /></p>
<p>Steps of a gradient descent:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def f(t, params):
    a,b,c = params
    return a*(t**2) + (b*t) + c #some quadratic fn, our guess

params = torch.randn(3).requires_grad_() #random weights
preds = f(time, params)
</code></pre></div></div>
<p>Initial : loss = 152.6150, random weights (got lucky)<br />
<img src="https://build3r.github.io/fastai//images/5de3380c54364280bd55c1ef8470cd9a.png" alt="initial prediction" /></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loss.backward()
lr = 1e-5
params.data -= lr * params.grad.data
params.grad = None
preds = f(time,params)
mse(preds, speed)
</code></pre></div></div>
<p>1st iteration: loss = 152.3409<br />
<img src="https://build3r.github.io/fastai//images/4bfce942a156430da9dea45466b560af.png" alt="after 1st iteration.png" /></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>params = torch.randn(3).requires_grad_() #random weights
lr = 1e-5

def apply_step(params, prn=True):
    preds = f(time, params)
    loss = mse(preds, speed)
    loss.backward()
    params.data -= lr * params.grad.data
    params.grad = None
    if prn: print(loss.item())
    return preds
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">for i in range(100): apply_step(params)</code></p>
<p>After applying the step for 100 iterations: loss = 125.3798828125</p>
<p><img src="https://build3r.github.io/fastai//images/42a3098589994bb499bacd81204940d3.png" alt="after 100 iterations" /></p>
<p>On every step the guessed function changes, since we are updating the params.</p>
<h3 id="summarizing-gradient-descent">Summarizing Gradient descent</h3>
<p><img src="https://build3r.github.io/fastai//images/2cb9308593f74007854e375869fd3d6b.png" alt="gradient descent summary" /></p>
<h2 id="mnist-example">MNIST Example</h2>
<ol>
  <li>
    <p>Create the training data set</p>
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>train_x = torch.cat([stacked_threes, stacked_sevens]).view(-1, 28*28) #converting each matrix to a vector
#train_x.shape = torch.Size([12396, 784]) = 6131+6265, 28*28
train_y = tensor([1]*len(threes) + [0]*len(sevens)).unsqueeze(1)
#train_x.shape,train_y.shape = (torch.Size([12396, 784]), torch.Size([12396, 1]))
dset = list(zip(train_x,train_y))
x,y = dset[0]
#x.shape,y = (torch.Size([784]), tensor([1]))
</code></pre></div></div>
    <p><code class="language-plaintext highlighter-rouge">view(-1,28*28)</code> changes a list of matrices (a rank-3 tensor) to a list of vectors (a rank-2 tensor). -1 is a special parameter to view that means “make this axis as big as necessary to fit all the data”.<br />
<code class="language-plaintext highlighter-rouge">unsqueeze</code> returns a new tensor with a dimension of size one inserted at the specified position.</p>
  </li>
  <li>
    <p>Create the validation data set</p>
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)
valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)
valid_dset = list(zip(valid_x,valid_y))
</code></pre></div></div>
  </li>
  <li>
    <p>Initialize the weights</p>
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def init_params(size, std=1.0):
    return (torch.randn(size)*std).requires_grad_() #random weights with gradient required

weights = init_params((28*28,1)) #one weight per pixel of 28*28
</code></pre></div></div>
    <p>Note:<br />
<strong>Bias</strong>: the function weights*pixels won't be flexible enough—it is always equal to 0 when the pixels are equal to 0 (i.e., its intercept is 0). You might remember from high school math that the formula for a line is y = w*x + b; we still need the b. We'll initialize it to a random number too:</p>
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bias = init_params(1)
#test one data point
(train_x[0]*weights.T).sum() + bias #.T = transpose, because here we do elementwise multiplication for testing, not matrix mul
# = tensor([-6.2330], grad_fn=<AddBackward0>)
</code></pre></div></div>
  </li>
  <li>
    <p>Do the prediction (<code class="language-plaintext highlighter-rouge">@</code> = matrix multiply)</p>
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def linear1(xb): return xb@weights + bias
preds = linear1(train_x)
""" preds = tensor([[ -6.2330],
        [-10.6388],
        [-20.8865],
        [-15.9176],
        [ -1.6866],
        [-11.3568]], grad_fn=<AddBackward0>)"""
</code></pre></div></div>
  </li>
</ol>
<h3 id="why-we-dont-use-accuracy-as-loss-function">Why we don't use accuracy as a loss function</h3>
<p>TL;DR: a small change in x won't change the prediction, leading to a 0 gradient.</p>
<p>We have a significant technical problem here. The gradient of a function is its slope, or its steepness, which can be defined as rise over run—that is, how much the value of the function goes up or down, divided by how much we changed the input. We can write this mathematically as: (y_new - y_old) / (x_new - x_old). This gives us a good approximation of the gradient when x_new is very similar to x_old, meaning that their difference is very small. But accuracy only changes at all when a prediction changes from a 3 to a 7, or vice versa. The problem is that a small change in weights from x_old to x_new isn't likely to cause any prediction to change, so (y_new - y_old) will almost always be 0. In other words, the gradient is 0 almost everywhere.</p>
<blockquote>
  <p>S: In mathematical terms, accuracy is a function that is constant almost everywhere (except at the threshold, 0.5), so its derivative is nil almost everywhere (and infinity at the threshold). This then gives gradients that are 0 or infinite, which are useless for updating the model.</p>
</blockquote>
<p><code class="language-plaintext highlighter-rouge">torch.where(a,b,c)</code> == <code class="language-plaintext highlighter-rouge">[b[i] if a[i] else c[i] for i in range(len(a))]</code></p>
<h3 id="sigmoid-fn">Sigmoid fn</h3>
<p>mnist_loss as defined below assumes that predictions are always between 0 and 1. We need to ensure, then, that this is actually the case! The sigmoid squashes any number into that range:<br />
sigmoid(x) = 1/(1+e^-x)</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def sigmoid(x): return 1/(1+torch.exp(-x))
</code></pre></div></div>
<p>Metric = what we care about<br />
Loss function = similar to the metric, but behaves properly with gradients</p>
<ol start="5">
  <li>
    <p>Define the loss function</p>
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()
</code></pre></div></div>
  </li>
</ol>
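<p>A pure-Python sketch of what mnist_loss computes, using <code class="language-plaintext highlighter-rouge">math.exp</code> instead of torch for illustration (the input values here are made up):</p>

```python
import math

def sigmoid(x): return 1 / (1 + math.exp(-x))

def mnist_loss(predictions, targets):
    preds = [sigmoid(p) for p in predictions]
    # torch.where(targets==1, 1-preds, preds), then the mean
    per_item = [1 - p if t == 1 else p for p, t in zip(preds, targets)]
    return sum(per_item) / len(per_item)

loss_good = mnist_loss([4.0, -4.0], [1, 0])  # confident and right -> small loss
loss_bad = mnist_loss([-4.0, 4.0], [1, 0])   # confident and wrong -> large loss
print(loss_good, loss_bad)
```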
<h3 id="mini-batches">Mini Batches</h3>
<p>We can do the optimization for a single data item, or for the whole dataset, taking the average at every step. But for the whole dataset it takes a lot of time, and a single item wouldn’t give much information.</p>
<p>So we compromise between the two and calculate the average loss for a few data items at a time, which is called a mini-batch. The larger the batch size, the more accurate the gradient estimate, but the longer each step takes to process. Mini-batches also help with parallelizing the work on the GPU.</p>
<p>We get better generalization if we can vary things during training. One simple and effective thing we can vary is what data items we put in each mini-batch. Rather than simply enumerating our dataset in order for every epoch, instead what we normally do is randomly shuffle it on every epoch, before we create mini-batches. PyTorch and fastai provide a class that will do the shuffling and mini-batch collation for you, called DataLoader.</p>
<p>When we pass a Dataset to a DataLoader we will get back many batches which are themselves tuples of tensors representing batches of independent and dependent variables</p>
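<p>A minimal pure-Python sketch of what such a shuffling DataLoader does (the real PyTorch/fastai class has many more features; the toy dataset here is made up):</p>

```python
import random

def data_loader(dataset, batch_size, seed=42):
    idxs = list(range(len(dataset)))
    random.Random(seed).shuffle(idxs)          # reshuffle the data each epoch
    for i in range(0, len(idxs), batch_size):
        batch = [dataset[j] for j in idxs[i:i+batch_size]]
        xs, ys = zip(*batch)                   # independents and dependents, batched
        yield list(xs), list(ys)

dset = [(x, x % 2) for x in range(10)]         # toy (input, label) pairs
batches = list(data_loader(dset, batch_size=4))
print([len(xs) for xs, ys in batches])         # [4, 4, 2]
```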
<ol>
<li>Gradient</li>
</ol>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for x,y in dl:
    pred = model(x)
    loss = loss_func(pred, y)
    loss.backward() #auto gradient calculation
    parameters -= parameters.grad * lr
</code></pre></div></div>
<ol>
  <li>
    <p>Step<br />
Putting it all together:</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>weights = init_params((28*28,1))
bias = init_params(1)
#create training data loader
dl = DataLoader(dset, batch_size=256)
xb,yb = first(dl)
#xb.shape,yb.shape = (torch.Size([256, 784]), torch.Size([256, 1]))
#create validation data loader
valid_dl = DataLoader(valid_dset, batch_size=256)

def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()

batch = train_x[:4]
calc_grad(batch, train_y[:4], linear1)
#weights.grad.mean(),bias.grad = (tensor(-0.0415), tensor([-0.2826]))
weights.grad.zero_()
bias.grad.zero_();
</code></pre></div></div>
  </li>
</ol>
<p>loss.backward actually <em>adds</em> the gradients of the loss to any gradients that are currently stored, so we have to set the current gradients to 0 first.<br />
Train:<br />
<code class="language-plaintext highlighter-rouge">.data</code>: we have to tell PyTorch not to take the gradient of this step too—otherwise things will get very confusing when we try to compute the derivative at the next batch! If we assign to the data attribute of a tensor then PyTorch will not take the gradient of that step.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def train_epoch(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad*lr
            p.grad.zero_()
</code></pre></div></div>
<p>Batch Accuracy</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()
</code></pre></div></div>
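<p>A NumPy sketch of the same thresholding logic, on made-up activations:</p>

```python
import numpy as np

def batch_accuracy(xb, yb):
    preds = 1 / (1 + np.exp(-xb))        # sigmoid
    correct = (preds > 0.5) == yb        # threshold at 0.5, compare to labels
    return correct.mean()

xb = np.array([2.0, -1.0, 0.5, -3.0])   # raw model outputs
yb = np.array([1, 0, 0, 0])             # true labels
print(batch_accuracy(xb, yb))           # 3 of 4 correct -> 0.75
```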
<p>Validate Epoch</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>validate_epoch(linear1) # 0.7175
lr = 1.
params = weights,bias
train_epoch(linear1, lr, params)
validate_epoch(linear1) #0.7313

for i in range(20):
    train_epoch(linear1, lr, params)
    print(validate_epoch(linear1), end=' ')
#0.891 0.9354 0.9491 0.9569 0.9608 0.9627 0.9637 0.9651 0.9661 0.9661 0.9671 0.9681 0.9695 0.9705 0.971 0.972 0.9735 0.9735 0.9735 0.9744
</code></pre></div></div>
<h3 id="creating-optimizer">Creating Optimizer</h3>
<p>We now do what we did manually above with PyTorch’s built-in functionality:
the function <code class="language-plaintext highlighter-rouge">linear1 = x@w + b</code> is given by PyTorch as nn.Linear</p>
<p><code class="language-plaintext highlighter-rouge">nn.Linear</code> does the same thing as our <code class="language-plaintext highlighter-rouge">init_params</code> and <code class="language-plaintext highlighter-rouge">linear</code> together. It contains both the weights and biases in a single class.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>linear_model = nn.Linear(28*28,1)
</code></pre></div></div>
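<p>A NumPy sketch of what nn.Linear(28*28, 1) computes. PyTorch stores the weight with shape (out_features, in_features), here (1, 784); the random data below is made up:</p>

```python
import numpy as np

in_features, out_features = 784, 1
rng = np.random.default_rng(0)
W = rng.standard_normal((out_features, in_features))  # like linear_model.weight
b = rng.standard_normal(out_features)                 # like linear_model.bias

x = rng.standard_normal((256, in_features))           # a batch of flattened images
y = x @ W.T + b                                       # what nn.Linear's forward does
print(y.shape)                                        # (256, 1)
```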
<p>Data setup</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import fastbook
fastbook.setup_book()
from fastai.vision.all import *
from fastbook import *
matplotlib.rc('image', cmap='Greys')
lr = 1e-5
path = Path("../data/mnist_sample") #untar_data(URLs.MNIST_SAMPLE)
Path.BASE_PATH = path
threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()
seven_tensors = [tensor(Image.open(o)) for o in sevens]
three_tensors = [tensor(Image.open(o)) for o in threes]
stacked_sevens = torch.stack(seven_tensors).float()/255
stacked_threes = torch.stack(three_tensors).float()/255
#validation set
valid_3_tens = torch.stack([tensor(Image.open(o))
                            for o in (path/'valid'/'3').ls()])
valid_3_tens = valid_3_tens.float()/255
valid_7_tens = torch.stack([tensor(Image.open(o))
                            for o in (path/'valid'/'7').ls()])
valid_7_tens = valid_7_tens.float()/255
#create training data
train_x = torch.cat([stacked_threes, stacked_sevens]).view(-1, 28*28) #converting each matrix to a vector
train_y = tensor([1]*len(threes) + [0]*len(sevens)).unsqueeze(1) #we need a label for each image: 1 for 3s, 0 for 7s
dset = list(zip(train_x,train_y))
dl = DataLoader(dset, batch_size=256)
#create validation data set
valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)
valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)
valid_dset = list(zip(valid_x,valid_y))
valid_dl = DataLoader(valid_dset, batch_size=256)
#loss function with sigmoid
def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()
#gradient calculation
def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()
#calculate batch accuracy
def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()
#validating one epoch
def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
</code></pre></div></div>
<p>Basic Optimizer</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class BasicOptim:
    def __init__(self,params,lr): self.params,self.lr = list(params),lr

    def step(self, *args, **kwargs):
        for p in self.params: p.data -= p.grad.data * self.lr

    def zero_grad(self, *args, **kwargs):
        for p in self.params: p.grad = None

lr = 1e-5
opt = BasicOptim(linear_model.parameters(), lr)
</code></pre></div></div>
<p>Train Epoch and model</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def train_epoch(model):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        opt.step()
        opt.zero_grad()

def train_model(model, epochs):
    for i in range(epochs):
        train_epoch(model)
        print(validate_epoch(model), end=' ')

train_model(linear_model, 20)
</code></pre></div></div>
<p>fastai provides the <code class="language-plaintext highlighter-rouge">SGD</code> class which, by default, does the same thing as our <code class="language-plaintext highlighter-rouge">BasicOptim</code>:</p>
<p>With inbuilt optimizer SGD</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">linear_model</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">28</span><span class="o">*</span><span class="mi">28</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">opt</span> <span class="o">=</span> <span class="n">SGD</span><span class="p">(</span><span class="n">linear_model</span><span class="p">.</span><span class="n">parameters</span><span class="p">(),</span> <span class="n">lr</span><span class="p">)</span>
<span class="n">train_model</span><span class="p">(</span><span class="n">linear_model</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span>
</code></pre></div></div>
<h4 id="using-learner">Using Learner:</h4>
<p>fastai also provides <code class="language-plaintext highlighter-rouge">Learner.fit</code>, which we can use instead of <code class="language-plaintext highlighter-rouge">train_model</code>. To create a Learner we first need to create a DataLoaders, by passing in our training and validation DataLoaders:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dls = DataLoaders(dl, valid_dl)
learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,
loss_func=mnist_loss, metrics=batch_accuracy)
learn.fit(10, lr=lr)
</code></pre></div></div>
<h3 id="adding-a-nonlinearity">Adding a Nonlinearity</h3>
<p>So we made a simple linear classifier. A linear classifier is very constrained in terms of what it can do.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def simple_net(xb):
    res = xb@w1 + b1
    res = res.max(tensor(0.0)) #ReLU
    res = res@w2 + b2
    return res

w1 = init_params((28*28,30))
b1 = init_params(30)
w2 = init_params((30,1))
b2 = init_params(1)
</code></pre></div></div>
<p>w1 and w2 are weight tensors, and b1 and b2 are bias tensors.</p>
<p><code class="language-plaintext highlighter-rouge">plot_function(F.relu)</code></p>
<p><img src="https://build3r.github.io/fastai//images/3680e4826c064d2b9b93e11ff1a3e4a5.png" alt="ReLU" /></p>
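<p>ReLU is just “replace negatives with zero”; a quick NumPy check, where <code class="language-plaintext highlighter-rouge">np.maximum</code> plays the role of <code class="language-plaintext highlighter-rouge">res.max(tensor(0.0))</code>:</p>

```python
import numpy as np

def relu(x): return np.maximum(0.0, x)  # zero out the negatives, keep the rest

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0., 0., 0., 1.5, 3.]
```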
<p>Using Inbuilt fns</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>simple_net = nn.Sequential(
    nn.Linear(28*28,30),
    nn.ReLU(),
    nn.Linear(30,1)
)
#weights and biases are auto-initialised
</code></pre></div></div>
<p>Using Simple net</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>learn = Learner(dls, simple_net, opt_func=SGD,
loss_func=mnist_loss, metrics=batch_accuracy)
learn.fit(40, 0.1)
</code></pre></div></div>
<table>
  <thead>
    <tr>
      <th>epoch</th>
      <th>train_loss</th>
      <th>valid_loss</th>
      <th>batch_accuracy</th>
      <th>time</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>0.313009</td>
      <td>0.408768</td>
      <td>0.507360</td>
      <td>00:00</td>
    </tr>
    <tr>
      <td>1</td>
      <td>0.146016</td>
      <td>0.232771</td>
      <td>0.798822</td>
      <td>00:00</td>
    </tr>
  </tbody>
</table>
<p>Final Accuracy = <strong>0.982826292514801 after 40 epochs</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>learn.recorder.values
[(#3) [0.31300875544548035,0.4087675213813782,0.5073601603507996],
(#3) [0.14601589739322662,0.23277133703231812,0.7988224029541016],
</code></pre></div></div>
<p>The training process is recorded in learn.recorder, with the table of output stored in the values attribute.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>plt.plot(L(learn.recorder.values).itemgot(2)); #get batch accuracy
</code></pre></div></div>
<p>Visualize the weights</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>w,b = learn.model[0].parameters()
show_image(w[0].view(28,28))
</code></pre></div></div>
<p><img src="https://build3r.github.io/fastai//images/16e3997547a6430bb022c1aeebf2f287.png" alt="weight 1 of 30" /></p>
<h2 id="using-full-fastai-tool-kit">Using full FastAi tool kit</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dls = ImageDataLoaders.from_folder(path)
learn = cnn_learner(dls, resnet18, pretrained=False,
loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.1)
----------------------------------------------------------
epoch train_loss valid_loss accuracy time
0 0.182440 0.030915 0.996075 00:06
</code></pre></div></div>
<p><strong>In just 1 epoch we got an accuracy of 0.996075</strong>, with the power of the inbuilt learners and the resnet18 architecture.</p>
<h3 id="deep-learning-vocabulary">Deep learning vocabulary</h3>
<table>
<thead>
<tr>
<th>Term</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>ReLU</td>
<td>Function that returns 0 for negative numbers and doesn’t change positive numbers.</td>
</tr>
<tr>
<td>Mini-batch</td>
<td>A small group of inputs and labels gathered together in two arrays. A gradient descent step is updated on this batch (rather than a whole epoch).</td>
</tr>
<tr>
<td>Forward pass</td>
<td>Applying the model to some input and computing the predictions.</td>
</tr>
<tr>
<td>Loss</td>
<td>A value that represents how well (or badly) our model is doing.</td>
</tr>
<tr>
<td>Gradient</td>
<td>The derivative of the loss with respect to some parameter of the model.</td>
</tr>
<tr>
<td>Backward pass</td>
<td>Computing the gradients of the loss with respect to all model parameters.</td>
</tr>
<tr>
<td>Gradient descent</td>
<td>Taking a step in the directions opposite to the gradients to make the model parameters a little bit better.</td>
</tr>
<tr>
<td>Learning rate</td>
<td>The size of the step we take when applying SGD to update the parameters of the model.</td>
</tr>
</tbody>
</table>
<p><strong>Activations</strong>: numbers that are calculated (by both linear and nonlinear layers)<br />
<strong>Parameters</strong>: numbers that are randomly initialized and optimized (that is, the numbers that define the model)</p>Lesson 3: Data blocks and Activation Functions (2020-07-20) — https://build3r.github.io/fastai//fastai/2020/07/20/lesson3<p>Try out Andrew Ng’s bottom-up style machine learning course on Coursera <a href="https://www.coursera.org/learn/machine-learning">link</a></p>
<p>Kaggle has a Python based tool to download its data <code class="language-plaintext highlighter-rouge">pip install kaggle --upgrade</code></p>
<p><strong>Pandas</strong></p>
<p>It is the standard for accessing tabular data.</p>
<p>Ex : This shows the header and first few rows of a csv</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">path</span><span class="o">/</span><span class="s">'train_v2.csv'</span><span class="p">)</span>
<span class="n">df</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<h3 id="data-block-api">Data Block API</h3>
<p><a href="https://docs.fast.ai/data_block.html">https://docs.fast.ai/data_block.html</a></p>
<p>A Dataset just defines <code class="language-plaintext highlighter-rouge">__getitem__</code> and <code class="language-plaintext highlighter-rouge">__len__</code>, which have to be implemented by all types of datasets.</p>
<p>Creating a dataset, ex (here src is a Dataset):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">src</span> <span class="o">=</span> <span class="p">(</span><span class="n">ImageList</span><span class="p">.</span><span class="n">from_csv</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s">'train_v2.csv'</span><span class="p">,</span> <span class="n">folder</span><span class="o">=</span><span class="s">'train-jpg'</span><span class="p">,</span> <span class="n">suffix</span><span class="o">=</span><span class="s">'.jpg'</span><span class="p">)</span>
<span class="p">.</span><span class="n">split_by_rand_pct</span><span class="p">(</span><span class="mf">0.2</span><span class="p">)</span>
<span class="p">.</span><span class="n">label_from_df</span><span class="p">(</span><span class="n">label_delim</span><span class="o">=</span><span class="s">' '</span><span class="p">))</span>
</code></pre></div></div>
<p>DataLoader is used to create minibatches; you specify the batch size and the dataset.</p>
<p>DataBunch binds together a training_data_loader and a valid_data_loader with optionally a test_data_loader.</p>
<p>Creating a dataloader followed by a data bunch in one go, ex:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span> <span class="o">=</span> <span class="p">(</span><span class="n">src</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">tfms</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span>
<span class="p">.</span><span class="n">databunch</span><span class="p">().</span><span class="n">normalize</span><span class="p">(</span><span class="n">imagenet_stats</span><span class="p">))</span>
</code></pre></div></div>
<p><em>COCO</em> is a famous dataset for object detection.</p>
<p>By default, transforms flip the image horizontally at random to augment the data, but for some datasets you might need to flip them vertically too.</p>
<p>Ex <code class="language-plaintext highlighter-rouge">tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)</code></p>
<p>f_score: a single number used to measure false positives, false negatives, etc. There are different types like F1 and F2; Kaggle uses the F2 score to judge.</p>
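<p>The general F-beta formula combines precision and recall into one number (F1 weights them equally; F2 weights recall more heavily). A small sketch, with made-up precision/recall values:</p>

```python
def fbeta(precision, recall, beta=1.0):
    # F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = fbeta(0.8, 0.5)          # balances precision and recall
f2 = fbeta(0.8, 0.5, beta=2)  # recall counts more; recall is the weaker score here, so f2 comes out lower
```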
<p>Arg_max = the index of the class with the highest activation/probability.</p>
<p>Accuracy_thresh is used when the classification can lead to more than one class; we pass a minimum threshold to it.</p>
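<p>The thresholding idea itself is simple: every class whose activation clears the threshold is predicted, so one image can carry several labels. An illustrative sketch (class names and scores are invented):</p>

```python
def predict_labels(scores, classes, thresh=0.5):
    # keep every class whose score clears the threshold
    return [c for c, s in zip(classes, scores) if s >= thresh]

classes = ["agriculture", "clear", "road", "water"]
scores = [0.91, 0.84, 0.12, 0.55]
labels = predict_labels(scores, classes)  # ['agriculture', 'clear', 'water']
```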
<p>Partial function: the same function with a certain parameter fixed.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>

<span class="k">def</span> <span class="nf">somefn</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">):</span> <span class="p">...</span>
<span class="c1"># calling somefn_x2_5(x1) is the same as calling somefn(x1, x2=5)
</span><span class="n">somefn_x2_5</span> <span class="o">=</span> <span class="n">partial</span><span class="p">(</span><span class="n">somefn</span><span class="p">,</span> <span class="n">x2</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
</code></pre></div></div>
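<p>A concrete, runnable use of <code class="language-plaintext highlighter-rouge">functools.partial</code> (function names are mine):</p>

```python
from functools import partial

def power(base, exp):
    return base ** exp

square = partial(power, exp=2)  # fix exp=2, leave base free
cube = partial(power, exp=3)

result = square(4), cube(2)  # (16, 8)
```

<p>fastai uses this pattern to pre-configure metric functions, e.g. fixing the threshold of a metric before passing it to a learner.</p>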
<p>Transfer learning: in the planet example we initially used 128x128 images; now we can create a new data bunch with 256x256 images on the same learner and keep training, which keeps what was already learned and adds more to it.</p>
<p>Choosing a learning rate:<br />
Most initial lr plots look the same. After the first round of fitting, we choose an lr that is 10x back from the point where the loss starts shooting up, and run with a slice from that value up to the initial lr.</p>
<p>Collection of data sets: <a href="https://course.fast.ai/datasets">https://course.fast.ai/datasets</a></p>
<h3 id="segmentation">Segmentation</h3>
<p>We use <code class="language-plaintext highlighter-rouge">Learner.create_unet</code>, which is a better model for segmentation than a plain CNN.</p>
<p><code class="language-plaintext highlighter-rouge">Learner.create_unet(****).to_fp16()</code> enables mixed floating-point precision. By default everything uses 32-bit floating point, but the fastai library supports mixed precision, which can lead to faster training times and sometimes better results, given that you have a recent graphics card that supports it and the latest drivers.</p>
<h3 id="face-center-example">Face Center example</h3>
<p>Regression = when the output is some continuous number.</p>
<h3 id="nlp-classification">NLP Classification</h3>
<p>We can’t directly process text since it is not numbers, so we need to convert it. This is done in two different steps: <strong>tokenization</strong> and <strong>numericalization</strong>.</p>
<h5 id="tokenization">Tokenization</h5>
<p>The first step of processing we make the texts go through is to split the raw sentences into words, or more exactly tokens. The easiest way to do this would be to split the string on spaces, but we can be smarter:</p>
<ul>
<li>we need to take care of punctuation</li>
<li>some words are contractions of two different words, like isn’t or don’t</li>
<li>we may need to clean some parts of our texts, if there’s HTML code for instance</li>
</ul>
<p>In text classification we do two steps</p>
<ol>
<li>
<p>Language model</p>
</li>
<li>
<p>Classification model</p>
</li>
</ol>
<h3 id="what-is-deep-learning">What is Deep Learning</h3>
<p>Basically</p>
<p>(Input Matrix X Weight Matrix -> Activation Fn) -> (Input Matrix X Weight Matrix -> Activation Fn) -> (Input Matrix X Weight Matrix -> Activation Fn) … then see how close the output is to the target (minimize the loss fn).</p>
<p>Update the weight matrix using gradient descent, then <strong>repeat</strong>. That’s it.</p>
<p>Use this learn more <a href="http://neuralnetworksanddeeplearning.com">http://neuralnetworksanddeeplearning.com</a></p>
<h4 id="activation-fn">Activation fn</h4>
<p>It is the function applied to the output after the matrix multiplication.</p>
<p>Initially we used to use Binary step, logistic, TanH, ArcTan etc</p>
<p>But nowadays everyone uses the <strong>Rectified Linear Unit (ReLU)</strong>, which is nothing but max(x, 0), i.e. zeroing out all negatives.</p>
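<p>ReLU and the matrix-multiply → activation pattern above can be sketched in a few lines of plain Python; the weights and biases here are arbitrary numbers chosen for illustration:</p>

```python
def relu(x):
    # Rectified Linear Unit: zero out negatives
    return max(x, 0.0)

def layer(inputs, weights, bias):
    # one neuron: weighted sum of inputs, then the activation fn
    return relu(sum(i * w for i, w in zip(inputs, weights)) + bias)

x = [1.0, -2.0, 3.0]
h = layer(x, [0.5, 0.25, 0.1], bias=0.2)   # first-layer activation
y = layer([h], [2.0], bias=-0.5)           # second layer stacked on top
```

<p>Stacking such layers, with a nonlinearity between each matrix multiply, is all a deep network is.</p>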
<h2 id="from-course-v4">From Course v4</h2>
<p>Script to train a model and predict, end to end:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">fastbook</span> <span class="kn">import</span> <span class="o">*</span>
<span class="kn">from</span> <span class="nn">fastai.vision.widgets</span> <span class="kn">import</span> <span class="o">*</span>
<span class="c1">#Assuming Data is downloaded
</span><span class="n">bear_types</span> <span class="o">=</span> <span class="s">'grizzly'</span><span class="p">,</span><span class="s">'black'</span><span class="p">,</span><span class="s">'teddys'</span>
<span class="n">path</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s">'../data/bears'</span><span class="p">)</span>
<span class="c1">#setup a data block valid_pct=0.2 means use 20 % data as validation
</span><span class="n">bears</span> <span class="o">=</span> <span class="n">DataBlock</span><span class="p">(</span>
<span class="n">blocks</span><span class="o">=</span><span class="p">(</span><span class="n">ImageBlock</span><span class="p">,</span> <span class="n">CategoryBlock</span><span class="p">),</span>
<span class="n">get_items</span><span class="o">=</span><span class="n">get_image_files</span><span class="p">,</span>
<span class="n">splitter</span><span class="o">=</span><span class="n">RandomSplitter</span><span class="p">(</span><span class="n">valid_pct</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">seed</span><span class="o">=</span><span class="mi">42</span><span class="p">),</span>
<span class="n">get_y</span><span class="o">=</span><span class="n">parent_label</span><span class="p">,</span>
<span class="n">item_tfms</span><span class="o">=</span><span class="n">Resize</span><span class="p">(</span><span class="mi">128</span><span class="p">))</span>
<span class="c1">#modify the data block to use RandomResizedCrop and augmentations
</span><span class="n">bears</span> <span class="o">=</span> <span class="n">bears</span><span class="p">.</span><span class="n">new</span><span class="p">(</span><span class="n">item_tfms</span><span class="o">=</span><span class="n">RandomResizedCrop</span><span class="p">(</span><span class="mi">128</span><span class="p">,</span> <span class="n">min_scale</span><span class="o">=</span><span class="mf">0.5</span><span class="p">),</span> <span class="n">batch_tfms</span><span class="o">=</span><span class="n">aug_transforms</span><span class="p">())</span>
<span class="c1">#load data
</span><span class="n">dls</span> <span class="o">=</span> <span class="n">bears</span><span class="p">.</span><span class="n">dataloaders</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
<span class="n">dls</span><span class="p">.</span><span class="n">train</span><span class="p">.</span><span class="n">show_batch</span><span class="p">(</span><span class="n">max_n</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">nrows</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">unique</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="c1">#create convolution neural network with resnet18 as the base model
</span><span class="n">learn</span> <span class="o">=</span> <span class="n">cnn_learner</span><span class="p">(</span><span class="n">dls</span><span class="p">,</span> <span class="n">resnet18</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="n">error_rate</span><span class="p">)</span>
<span class="c1">#fine tune/use transfer learning for 4 epochs
</span><span class="n">learn</span><span class="p">.</span><span class="n">fine_tune</span><span class="p">(</span><span class="mi">4</span><span class="p">)</span>
<span class="c1">#generate a interpretation
</span><span class="n">interp</span> <span class="o">=</span> <span class="n">ClassificationInterpretation</span><span class="p">.</span><span class="n">from_learner</span><span class="p">(</span><span class="n">learn</span><span class="p">)</span>
<span class="n">interp</span><span class="p">.</span><span class="n">plot_confusion_matrix</span><span class="p">()</span>
<span class="c1">#plot top 5 losses or high confident wrong answers
</span><span class="n">interp</span><span class="p">.</span><span class="n">plot_top_losses</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="n">nrows</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="c1">#export the trained model as a pkl file
</span><span class="n">learn</span><span class="p">.</span><span class="n">export</span><span class="p">(</span><span class="s">'../data/bears/new_bear.pkl'</span><span class="p">)</span>
<span class="c1">#load saved model for inference
</span><span class="n">learn_inf</span> <span class="o">=</span> <span class="n">load_learner</span><span class="p">(</span><span class="n">path</span><span class="o">/</span><span class="s">'new_bear.pkl'</span><span class="p">)</span>
<span class="c1">#predict a given image
</span><span class="n">learn_inf</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="s">'../data/bears/grizzly/00000068.jpg'</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="ethics">Ethics</h2>
<p>Recommendation systems are predicting what content people will like, but they also have a lot of power in determining what content people even see.</p>
<p>Try out Andrew Ng’s bottom-up style machine learning course on Coursera link</p>
<p><em>Lesson 2: Running your first classifier (2020-07-16)</em></p>
<h2 id="old-models-wont-work-as-fastai-introduced-a-breaking-change-pin-lib-in-diffrent-env">Old models won’t work as fastai introduced a breaking change; pin the libs in a different env</h2>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fastai == 2.0.3
fastcore == 1.0.4
pytorch == 1.4.0
</code></pre></div></div>
<ol>
<li>Downloading data from Google Images: use this script to get all the URLs; run it in the browser console.</li>
</ol>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> urls=Array.from(document.querySelectorAll('.rg_i')).map(el=> el.hasAttribute('data-src')?el.getAttribute('data-src'):el.getAttribute('data-iurl'));
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
</code></pre></div></div>
<ol>
<li>Creating the validation set automatically from the data.<br />
The new API is DataBlock, a <em>template</em> for creating a <code class="language-plaintext highlighter-rouge">DataLoaders</code>:
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))
dls = bears.dataloaders(path)
</code></pre></div></div>
The old API used ImageDataBunch:
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
    ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
</code></pre></div></div>
</li>
</ol>
<p><img src="https://build3r.github.io/fastai//images/logo.png" alt="dfbc43add93317ed13ce59b6da31f673.png" width="924" height="354" class="jop-noMdConv" /></p>
<p><code class="language-plaintext highlighter-rouge">np.random.seed(42)</code> is always set to the same number here so that the validation set stays the same, letting us improve our model independent of changes to the data split.</p>
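<p>The effect of a fixed seed can be shown with Python’s own random module: the same seed produces the same “random” validation split on every run (file names below are made up):</p>

```python
import random

files = [f"img_{i}.jpg" for i in range(10)]

def valid_split(seed, valid_pct=0.2):
    # shuffle deterministically, then take the first valid_pct as validation
    rng = random.Random(seed)
    shuffled = rng.sample(files, len(files))
    n_valid = int(len(files) * valid_pct)
    return shuffled[:n_valid]

same_a = valid_split(42)
same_b = valid_split(42)  # identical seed -> identical validation set
```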
<p>We also use RandomResizedCrop to take a different random crop in every epoch.</p>
<ol>
<li>Tips for choosing a learning rate: look for the range where the error rate is improving steadily (not very steep).<br />
For example, here the error rate decrease is steady between $10^{-5}$ and $10^{-4}$, so we use<br />
<code class="language-plaintext highlighter-rouge">learn.fit_one_cycle(2, max_lr=slice(3e-5,3e-4))</code></li>
</ol>
<p><img src="https://build3r.github.io/fastai//images/learning_rate.png" alt="learning_rate" /></p>
<ol>
<li>All data sets have x and y<br />
x = file name<br />
y = labels</li>
<li>You can create application with UI inside Jupyter notebooks.</li>
<li><code class="language-plaintext highlighter-rouge">open_image(path)</code> fn to open any image</li>
<li><code class="language-plaintext highlighter-rouge">learn.export()</code> exports the trained model to export.pkl; it’s ~99MB</li>
<li><a href="https://www.starlette.io/">Starlette</a> is a web app toolkit similar to Flask, with async/await style</li>
<li>Training loss &gt; validation loss means the model is underfitting, i.e. the data is not properly fitted yet</li>
<li>Too many epochs lead to overfitting. It happens when the error rate seems to be going down, then in subsequent epochs starts going up.</li>
</ol>
<h2 id="lesson-2-sgd">Lesson 2: SGD</h2>
<p>Stochastic Gradient Descent (SGD)</p>
<p>A tensor in deep learning is nothing but an array (of 1 or more dimensions) with a regular shape (all rows are of equal size).<br />
Rank: number of dimensions</p>
<p>x = matrix/ vector = tensor<br />
a = matrix/ vector = tensor<br />
<code class="language-plaintext highlighter-rouge">y = x@a</code> = matrix product</p>
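<p>The matrix product <code class="language-plaintext highlighter-rouge">x@a</code> spelled out by hand, with tiny illustrative matrices:</p>

```python
def matmul(x, a):
    # each output cell is a row of x dotted with a column of a
    rows, inner, cols = len(x), len(a), len(a[0])
    return [[sum(x[i][k] * a[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

x = [[1, 2], [3, 4]]
a = [[5, 6], [7, 8]]
y = matmul(x, a)  # [[19, 22], [43, 50]]
```

<p>In practice <code class="language-plaintext highlighter-rouge">x@a</code> does exactly this, just vectorized and on the GPU.</p>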
<p>Mean squared Error MSE is the common error fn for regression<br />
<code class="language-plaintext highlighter-rouge">def mse(y_hat, y): return ((y_hat-y)**2).mean()</code><br />
y_hat or $\hat{y}$ is predicted value and y is the actual</p>
<p><code class="language-plaintext highlighter-rouge">tensor(-1.,1)</code> is equivalent to <code class="language-plaintext highlighter-rouge">tensor(-1.0,1.0)</code>; we do this because we want everything to be float.</p>
<p>Here we are trying to fit a line of the form<br />
y = mx + c, which we rewrite as<br />
$y_i = a_1x_{1i} + a_2x_{2i}$ where $x_{2i} = 1$.<br />
We first guess $a_1, a_2$ and calculate the MSE, then try to minimise the MSE.<br />
We can do that either by guessing: nudging $a_1, a_2$ up and down, recalculating the MSE, and seeing where it shrinks; or by using maths, where a derivative calculates the same thing.<br />
The gradient/derivative tells us whether moving $a_2$ up or down, or tilting $a_1$ up or down, makes things better, i.e. changes the MSE.</p>
<blockquote>
<p>Gradient descent is an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function. This iterative minimization is achieved by taking steps in the negative direction of the function gradient.</p>
</blockquote>
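<p>That loop can be sketched in plain Python, fitting $y = a_1x + a_2$ to points generated from a known line with slope 2 and intercept 1; the learning rate and step count are my choices for this toy example:</p>

```python
# data from the "true" line y = 2x + 1
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

a1, a2 = 0.0, 0.0   # initial guess for slope and intercept
lr = 0.01           # learning rate (step size)
n = len(xs)

for epoch in range(2000):
    preds = [a1 * x + a2 for x in xs]
    # analytic gradients of MSE with respect to a1 and a2
    grad_a1 = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_a2 = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
    # step in the negative gradient direction
    a1 -= lr * grad_a1
    a2 -= lr * grad_a2

# a1 ends up close to 2.0 and a2 close to 1.0
```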
<h4 id="mini-batches">mini-batches</h4>
<p>In this example we calculated the loss over all the points; in practice that will not be possible, as there will be too many points when the rank/complexity increases.<br />
So we use mini-batches: for every epoch/iteration we calculate the loss from a random subset of points.</p>
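<p>Mini-batching can be sketched in plain Python: shuffle the indices once per epoch, then walk through them in fixed-size chunks (the sizes are arbitrary here):</p>

```python
import random

def minibatches(n_items, batch_size, seed=0):
    # yield random index batches covering the whole dataset once (one epoch)
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    for start in range(0, n_items, batch_size):
        yield idxs[start:start + batch_size]

batches = list(minibatches(n_items=10, batch_size=4))
# 3 batches of sizes 4, 4, 2; together they cover all 10 indices exactly once
```

<p>Each batch's loss gives a noisy but cheap estimate of the full-dataset loss, which is the "stochastic" part of SGD.</p>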
<p>The model or architecture is nothing but the mathematical fn; here it is <code class="language-plaintext highlighter-rouge">y = mx + c</code> or $y = a_1x_1 + a_2x_2$.</p>
<p>Parameters are the coefficients or weights, the numbers we are trying to find; here they are $a_1$ and $a_2$.</p>
<p>How (and why) to create a good validation set? <a href="https://www.fast.ai/2017/11/13/validation-sets/">link</a></p>
<p><em>Lesson 1: Using Jupyter Notebook (2020-07-15)</em></p>
<p>Basic Jupyter notebook commands and fast.ai library functions.</p>
<h2 id="lesson-1">Lesson 1</h2>
<p>
To auto reload
</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>%reload_ext autoreload
%autoreload 2
%matplotlib inline
</code></pre></div></div>
<p>
<code class="language-plaintext highlighter-rouge">?function-name</code>: Shows the definition and docstring for that function<br />
<code class="language-plaintext highlighter-rouge">??function-name</code>: Shows the source code for that function
<code class="language-plaintext highlighter-rouge">help(fn_name)</code>: helper fn, like man for Python functions<br />
<code class="language-plaintext highlighter-rouge">doc(fn_name)</code>: <a href="http://fast.ai">fast.ai</a> helper that gives detailed information and links to the docs<br />
Line magics start with <code class="language-plaintext highlighter-rouge">%</code>. <code class="language-plaintext highlighter-rouge">%timeit [i+1 for i in range(1000)]</code>: runs a line many times and displays the average time it took to run.<br />
<code class="language-plaintext highlighter-rouge">%debug</code>: lets you inspect a function that raised an error using the Python debugger.
<code class="language-plaintext highlighter-rouge">path.ls()</code>: helper to do ‘ls’ on a Path (Python data type) object<br />
You can append to a Path object like path/’images’
</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>data = ImageDataBunch.from_name_re(path_img, fnames, pat,
ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
</code></pre></div></div>
<p>
size=224: a standard image size, chosen because of GPU memory limitations<br />
Returns a data bunch object, which will be used in most of the <a href="http://fast.ai">fast.ai</a> course
<code class="language-plaintext highlighter-rouge">get_transforms</code><br />
Transforms the image to a square, does centre cropping, and a lot more
<code class="language-plaintext highlighter-rouge">normalize(imagenet_stats)</code><br />
Normalises the RGB channels so the model can train well; if the model is not working well, try normalising.
<code class="language-plaintext highlighter-rouge">data.classes</code> = lists all the classes (ex: different breeds) in the data<br />
<code class="language-plaintext highlighter-rouge">data.c</code> = the number of classes = len(data.classes)
<code class="language-plaintext highlighter-rouge">learn = cnn_learner(data, models.resnet34, metrics=error_rate)</code><br />
cnn_learner = a type of convolutional neural network learner<br />
ResNet-34 is a type of model/architecture that works almost every time; the two major ones are resnet34 and resnet50, where the number denotes the number of layers
metrics = things to print; can be error_rate or accuracy
<kbd>Tab</kbd> to auto complete
</p>
<p>machine_learning</p>
<p><em>Lesson 0: Setting up the fastai course and gotchas (2020-07-14)</em></p>
<p>Setting up conda, gotchas of Jupyter notebook, and the basics of markdown.</p>
<h2 id="updating">Updating</h2>
<p>We use Conda to manage the Jupyter notebook and Fastai libraries</p>
<ul>
<li>Update conda <code class="language-plaintext highlighter-rouge">conda update conda</code></li>
<li>List all the libs installed <code class="language-plaintext highlighter-rouge">conda list | grep "fastai"</code></li>
<li>Update fastai <code class="language-plaintext highlighter-rouge">conda install -c fastai fastai</code></li>
</ul>
<h2 id="lesson-0-jupyter-notebooks">Lesson 0 Jupyter Notebooks</h2>
<p>Command mode blue :blue_book:<br />
Edit mode green :green_book:</p>
<p><kbd>B</kbd> to create new cell<br />
<kbd>S</kbd> to save<br />
<kbd>m</kbd> convert cell to mark down in command mode<br />
<kbd>y</kbd> convert cell to code mode<br />
<kbd>A</kbd> to insert above<br />
<kbd>b</kbd> to insert below<br />
<kbd>c</kbd> copy<br />
<kbd>X</kbd> cut<br />
<kbd>Z</kbd> undo</p>
<p><kbd>Crtl + shift + -</kbd> spilt cell at cursor<br />
<kbd>Shift + enter</kbd> to run<br />
There is an inbuilt terminal which you can launch from the main directory (laggy with tunnelling)</p>
<ul>
<li>Markdown
<ul>
<li>‘*’ italics</li>
<li>‘**’ bold</li>
<li>‘`’ inline</li>
<li>‘>’ block quote</li>
<li>[text](link) hyperlink</li>
</ul>
</li>
</ul>
<p>Wrap the link text with ‘[]’ and place the link adjacent to it, surrounded with ‘()’.</p>