<hr>
# [Collective Knowledge](http://cknowledge.org): Community-driven benchmarking and optimization of computing systems - from classical to quantum
<hr>

### [Artificial Intelligence and Machine Learning](http://cknowledge.org/ai)
  - [ACM Reproducible Quality-Efficient Systems Tournaments](http://cknowledge.org/request) ([ReQuEST initiative](http://cknowledge.org/request.html#organizers))
  - [AI artifacts](http://cknowledge.org/ai-artifacts) ([cTuning foundation](http://ctuning.org))
  - [Android app](https://play.google.com/store/apps/details?id=openscience.crowdsource.video.experiments) (dividiti)
  - [Desktop app](https://github.com/dividiti/ck-crowdsource-dnn-optimization) (dividiti)
  - [CK-Caffe](https://github.com/dividiti/ck-caffe) (Berkeley)
  - [CK-Caffe2](https://github.com/ctuning/ck-caffe2) (Facebook)
  - [CK-CNTK](https://github.com/ctuning/ck-cntk) (Microsoft)
  - [CK-KaNN](https://github.com/ctuning/ck-kann) (Kalray)
  - [CK-MVNC](https://github.com/ctuning/ck-mvnc) (Movidius / Intel)
  - [CK-MXNet](https://github.com/ctuning/ck-mxnet) (Apache)
  - [CK-NNTest](https://github.com/ctuning/ck-nntest) (cTuning foundation)
  - [CK-PyTorch](https://github.com/ctuning/ck-pytorch) (Facebook)
  - [CK-TensorFlow](https://github.com/ctuning/ck-tensorflow) (Google)
  - [CK-TensorRT](https://github.com/ctuning/ck-tensorrt) (NVIDIA)
  - [CK-TVM](https://github.com/ctuning/ck-tvm) ([University of Washington](https://tvm.ai/about))
  - etc.


<hr>
# CK-NNTest: Collaboratively validating, benchmarking and optimizing neural net operators across platforms, frameworks and datasets
<hr>

## Table of Contents

1. [Overview](#overview)
1. [Platforms](#platforms)
  1. [Firefly RK3399](#platforms_firefly)
1. [Experimental data](#data) [for developers]
1. [Data wrangling code](#code) [for developers]
1. [MobileNets-v1-1.0-224 ("baseline")](#analysis_mobilenets_baseline)
  1. [Experimental setup](#analysis_mobilenets_baseline_setup)
  1. [All experiments](#analysis_mobilenets_baseline_experiments_all)
  1. [Failed experiments](#analysis_mobilenets_baseline_experiments_failed)
  1. [Plot by platform (microseconds)](#analysis_mobilenets_baseline_plot_platform_us)
  1. [Plot by platform (GFLOPS)](#analysis_mobilenets_baseline_plot_platform_gflops)
  1. [Plot speedup](#analysis_mobilenets_baseline_plot_speedup)
1. [MobileNets-v1-1.0-224 ("baseline"): profiler](#analysis_mobilenets_baseline_profiler) **TODO**
1. [MobileNets-v1-0.75-160 ("reduced")](#analysis_mobilenets_reduced) **TODO**

<a id="overview"></a>
## Overview

This Jupyter Notebook studies the performance (execution time) of neural network (NN) operators on the following platform:
- [Firefly RK3399](http://en.t-firefly.com/en/firenow/Firefly_RK3399) (`firefly`).

<a id="platforms"></a>
## Platforms

<a id="platforms_firefly"></a>
### T-Firefly RK3399

  - Chip:
    - [Rockchip RK3399](http://rockchip.wikidot.com/rk3399)
  - CPU ("big"):
    - ARM&reg; Cortex&reg;-A72 architecture;
    - Max clock 1800 MHz;
    - 2 cores;
  - CPU ("LITTLE"):
    - ARM&reg; Cortex&reg;-A53 architecture;
    - Max clock 1416 MHz;
    - 4 cores;
  - GPU:
    - ARM&reg; Mali&trade;-T860 architecture;
    - Max clock 800 MHz;
    - 4 cores;
    - OpenCL driver:
```
$ ck run program:tool-print-opencl-devices | grep "OpenCL C version:"
OpenCL C version: OpenCL C 1.2 v1.r13p0-00rel0-git(a4271c9).31ba04af2d3c01618138bef3aed66c2c
```

  - RAM:
    - Samsung dual-channel DDR3;
    - 4 GB;
  - BSP:    
    - [Firefly-rk3399_xubuntu1604_201711301130.7z](https://drive.google.com/drive/u/0/folders/1lbaR7XVyHT4SnXkJ2ybj5YXAzAjDBWfT)
    
```
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
$ uname -a
Linux firefly 4.4.77 #554 SMP Thu Nov 30 11:30:11 HKT 2017 aarch64 aarch64 aarch64 GNU/Linux
```

In [None]:
firefly_model = 'Rockchip RK3399 Firefly Board (Linux Opensource)\x00'
firefly_name  = 'Firefly RK3399'
firefly_id    = 'firefly'
firefly_gpu   = 'Mali-T860 MP4'
firefly_gpu_mhz = 800

### Platform mappings

In [None]:
model_to_id = {
    firefly_model : firefly_id,
}
id_to_name = {
    firefly_id : firefly_name,
}
id_to_gpu = {
    firefly_id : firefly_gpu,
}
id_to_gpu_mhz = {
    firefly_id : firefly_gpu_mhz,
}

<a id="data"></a>
## Get the experimental data

The experiments use packages, programs and scripts from the public [CK-NNTest](https://github.com/ctuning/ck-nntest) repository:
```
$ ck pull repo:ck-nntest
```

The Arm Compute Library variants of interest (`v18.05`) were installed on the experimental platforms as follows:
```
$ ck install ck-math:package:lib-armcl-opencl-18.05 \
--env.USE_GRAPH=ON --env.USE_NEON=ON --env.USE_EMBEDDED_KERNELS=ON
```

The experimental data was collected and archived on each platform as follows:
```
$ ck zip local:experiment:nntest*mobilenets-v1-1.0-224-<platform>* \
  --archive_name=ck-nntest-mobilenets-v1-1.0-224-<platform>.zip
```
(and then merged on a desktop machine into a single repository).

The data can be downloaded and registered with CK as follows:
```
$ wget https://www.dropbox.com/s/9lbp38v52y6mv85/ck-nntest-mobilenets-v1-1.0-224-firefly.zip
$ ck add repo --zip=ck-nntest-mobilenets-v1-1.0-224-firefly.zip --quiet
```

In [None]:
repo_uoa = 'ck-nntest-mobilenets-v1-1.0-224-firefly'

<a id="code"></a>
## Data wrangling code

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook.

### Includes

#### Standard

In [None]:
import os
import sys
import json
import re

#### Scientific

If some of the scientific packages are missing, please install them using:
```
# pip install jupyter pandas numpy matplotlib seaborn
```

In [None]:
import IPython as ip
import pandas as pd
import numpy as np
import matplotlib as mp
import seaborn as sb

In [None]:
print ('IPython version: %s' % ip.__version__)
print ('Pandas version: %s' % pd.__version__)
print ('NumPy version: %s' % np.__version__)
print ('Matplotlib version: %s' % mp.__version__)
print ('Seaborn version: %s' % sb.__version__)

In [None]:
from IPython.display import Image, display
def display_in_full(df):
    pd.options.display.max_columns = len(df.columns)
    pd.options.display.max_rows = len(df.index)
    display(df)

In [None]:
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline

In [None]:
default_colormap = cm.autumn
default_figwidth = 24
default_figdpi = 200
default_fontsize = 16

In [None]:
if mp.__version__[0]=='2': mp.style.use('classic')
mp.rcParams['figure.dpi'] = default_figdpi
mp.rcParams['font.size'] = default_fontsize
mp.rcParams['legend.fontsize'] = 'medium'
mp.rcParams['figure.max_open_warning'] = 200

#### Collective Knowledge

If CK is not installed, please install it using:
```
# pip install ck
```

In [None]:
import ck.kernel as ck
print ('CK version: %s' % ck.__version__)

### Access the experimental data

In [None]:
def get_experimental_results(repo_uoa='local', tags='nntest', profiling=False, skip_synthetic_dataset=True):
    module_uoa = 'experiment'
    r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':tags})
    if r['return']>0:
        print('Error: %s' % r['error'])
        exit(1)
    experiments = r['lst']

    dfs = []
    for experiment in experiments:
        data_uoa = experiment['data_uoa']
        r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})
        if r['return']>0:
            print('Error: %s' % r['error'])
            exit(1)
        # Skip experiments if the tags are not in the expected format.
        skip = False
        library = None
        species = None
        # Library tags.
        library_prefix = 'arm-compute-library-'
        library_tags = [ tag[len(library_prefix):] for tag in r['dict']['tags'] if tag.startswith(library_prefix) ]
        if len(library_tags)==1:
            library = library_tags[0]
        else:
            skip = True
        # Species tags.
        species_tags = [ tag for tag in r['dict']['tags'] if tag in ['conv', 'fullyconnected', 'avgpool', 'softmax'] ]
        if len(species_tags)==1:
            species = species_tags[0]
        else:
            skip = True
        # Check if the experiment should be skipped.
        if skip:
            print('[Warning] Skipping experiment with tags:')
            print(r['dict']['tags'])
            continue
        for point in r['points']:
            point_file_path = os.path.join(r['path'], 'ckp-%s.0001.json' % point)
            with open(point_file_path) as point_file:
                point_data_raw = json.load(point_file)
            characteristics_list = point_data_raw['characteristics_list']
            num_repetitions = len(characteristics_list)
            platform = model_to_id[point_data_raw['features']['platform']['platform']['model']]
            # Shorten the Git hash to 7 symbols to unify across platforms.
            if platform=='hikey': # hikey_id
                if library_tags[0]=='request-d8f69c13':
                    library = 'opencl-18.03-d8f69c1-request'
                elif library_tags[0]=='opencl-18.05-0acd60ed-request':
                    library = 'opencl-18.05-0acd60e-request'
                else:
                    library = library_tags[0][:-1]
            batch_size = np.int64(point_data_raw['choices']['env'].get('CK_IN_SHAPE_N',-1))
            in_shape_n = np.int64(point_data_raw['choices']['env'].get('CK_IN_SHAPE_N',-1))
            in_shape_c = np.int64(point_data_raw['choices']['env'].get('CK_IN_SHAPE_C',-1))
            in_shape_h = np.int64(point_data_raw['choices']['env'].get('CK_IN_SHAPE_H',-1))
            in_shape_w = np.int64(point_data_raw['choices']['env'].get('CK_IN_SHAPE_W',-1))
            tuner = point_data_raw['choices']['env'].get('CK_LWS_TUNER_TYPE','NONE')
            program = point_data_raw['choices']['data_uoa']
            operator = program[:-len('-armcl-opencl')]
            dataset_uoa = point_data_raw['choices']['dataset_uoa']
            if skip_synthetic_dataset and dataset_uoa.find('synthetic')!=-1: continue
            dataset = point_data_raw['choices']['dataset_file']
            tensor = dataset[len('shape-'):]
            # Obtain column data.
            if profiling: # Obtain kernel time from profiling experiments.
                index = [
                    'platform', 'library', 'operator', 'tensor', 'batch_size', 'kernel', 'repetition_id'
                ]
                data = []
                if point_data_raw['choices'].get('dvdt_prof','')!='':
                    data = [
                        {
                            # features
                            'platform': platform,
                            'library': library,
                            'species': species,
                            # choices
                            'operator' : operator,
                            'tensor' : tensor,
                            'batch_size': batch_size,
                            'in_shape_n': in_shape_n,
                            'in_shape_c': in_shape_c,
                            'in_shape_h': in_shape_h,
                            'in_shape_w': in_shape_w,
                            # statistical repetition
                            'repetition_id': repetition_id,
                            # runtime characteristics
                            'kernel': kernel,
                            'time_us': time_us,
                            'dvdt_prof': characteristics['run'].get('dvdt_prof', {}),
                            'success?': characteristics['run'].get('run_success', 'n/a')
                        }
                        for (repetition_id, characteristics) in zip(range(num_repetitions), characteristics_list)
                        for kernel, time_us in characteristics['run'].get('execution_time_opencl_us',{}).items()
                    ]
                elif point_data_raw['choices'].get('mali_hwc','')!='':
                    data = [
                        {
                            # features
                            'platform': platform,
                            'library': library,
                            'species': species,                            
                            # choices
                            'operator' : operator,
                            'tensor' : tensor,
                            'batch_size': batch_size,
                            'in_shape_n': in_shape_n,
                            'in_shape_c': in_shape_c,
                            'in_shape_h': in_shape_h,
                            'in_shape_w': in_shape_w,
                            # statistical repetition
                            'repetition_id': repetition_id,
                            # runtime characteristics
                            'kernel': 'n/a',
                            'time_us': 0.0,
                            'mali_hwc': characteristics['run'].get('mali_hwc', {}),
                            'success?': characteristics['run'].get('run_success', 'n/a')
                        }
                        for (repetition_id, characteristics) in zip(range(num_repetitions), characteristics_list)
                    ]
                else: # Skip non-profiling experiments.
                    continue
                # Deal with missing data (resulting from failed runs).
                if data==[]:
                    print('[Warning] Missing data for: '+
                          'platform=%s, dataset=%s, library=%s, batch_size=%d' %
                          (platform, dataset, library, batch_size))
                    print(point_file_path)
                    print('')
                    data = [
                        {
                            # features
                            'platform': platform,
                            'library': library,
                            'species': species,                            
                            # choices
                            'operator' : operator,
                            'tensor' : tensor,
                            'batch_size': batch_size,
                            'in_shape_n': in_shape_n,
                            'in_shape_c': in_shape_c,
                            'in_shape_h': in_shape_h,
                            'in_shape_w': in_shape_w,
                            # statistical repetition
                            'repetition_id': 0,
                            # runtime characteristics
                            'kernel': 'n/a',
                            'time_us': 0.0,
                            'success?': 'n/a'
                        }
                    ]
            else: # Obtain wallclock time from validation experiments.
                if point_data_raw['choices'].get('dvdt_prof','')=='yes':
                    continue # Skip profiling experiments.
                data = [
                    {
                        # features
                        'platform': platform,
                        'library': library,
                        'species': species,                        
                        # choices
                        'tuner' : tuner,                        
                        'operator' : operator,
                        'tensor' : tensor,
                        'batch_size': batch_size,
                        'in_shape_n': in_shape_n,
                        'in_shape_c': in_shape_c,
                        'in_shape_h': in_shape_h,
                        'in_shape_w': in_shape_w,
                        # statistical repetition
                        'repetition_id': repetition_id,
                        # runtime characteristics
                        'time_us': 1e6*characteristics['run'].get('execution_time_kernel_1',0.0),
                        'success?': characteristics['run'].get('run_success', 'n/a')
                    }
                    for (repetition_id, characteristics) in zip(range(num_repetitions), characteristics_list)
                ]
                index = [
                    'platform', 'library', 'operator', 'tensor', 'batch_size', 'tuner', 'repetition_id'
                ]
            # Construct a DataFrame.
            df = pd.DataFrame(data)
            # Calculate GFLOPS for conv and fullyconnected species. NB: 2 operations per element (multiply and accumulate).
            if species=='conv':
                # Tensor format: in_C-H-W-K-out_C-stride-pad.
                def conv_macs(tensor):
                    in_C, H, W, K, out_C, stride, pad = [ np.float64(x) for x in tensor.split('-') ]
                    return in_C*out_C*(W/stride)*(H/stride)*K*K
                flops = 2 * df['tensor'].apply(conv_macs).values
            elif species=='fullyconnected':
                # Tensor format: in_C-in_H-in_W-out_C-out_H-out_W; treated as a (1 x K) by (K x N) product.
                def fullyconnected_macs(tensor):
                    in_C, in_H, in_W, out_C, out_H, out_W = [ np.float64(x) for x in tensor.split('-') ]
                    (M, K, N) = (1, in_C*in_H*in_W, out_C*out_H*out_W)
                    return M*K*N
                flops = 2 * df['tensor'].apply(fullyconnected_macs).values
            else:
                flops = 0
            Gflops = 1e-9 * flops          # 1 Gflops == 1e+9 flops.
            time_s = 1e-6 * df['time_us']  # 1 second == 1e+6 microseconds.
            df['GFLOPS'] = Gflops / time_s # GFLOPS == Gflops per second.
            # Set index.
            df = df.set_index(index)
            # Append to the list of similarly constructed DataFrames.
            dfs.append(df) 
    if dfs:
        # Concatenate all thus constructed DataFrames (i.e. stack on top of each other).
        result = pd.concat(dfs)
        result = result.sort_index(level=result.index.names)
    else:
        # Construct a dummy DataFrame whose success status can be safely checked.
        result = pd.DataFrame(columns=['success?', 'time_us', 'GFLOPS'])
    return result
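
The GFLOPS arithmetic in `get_experimental_results` can be checked by hand on a single tensor. The sketch below is standalone and does not touch CK: `conv_flops`, `fullyconnected_flops` and `gflops` are hypothetical helper names that mirror the formulas above, and both the fully connected shape and the 1000-microsecond execution time are made-up values for illustration only.

```python
def conv_flops(tensor):
    """Flops for a conv tensor 'in_C-H-W-K-out_C-stride-pad' (2 per multiply-accumulate)."""
    in_c, h, w, k, out_c, stride, pad = [float(x) for x in tensor.split('-')]
    return 2 * in_c * out_c * (w / stride) * (h / stride) * k * k

def fullyconnected_flops(tensor):
    """Flops for a fully connected tensor 'in_C-in_H-in_W-out_C-out_H-out_W'."""
    in_c, in_h, in_w, out_c, out_h, out_w = [float(x) for x in tensor.split('-')]
    m, k, n = 1, in_c * in_h * in_w, out_c * out_h * out_w
    return 2 * m * k * n

def gflops(flops, time_us):
    """Convert a flop count and a time in microseconds into GFLOPS."""
    return (1e-9 * flops) / (1e-6 * time_us)

conv = conv_flops('512-14-14-1-512-1-0')         # shape taken from this notebook
fc = fullyconnected_flops('1024-1-1-1001-1-1')   # hypothetical shape
print(conv, fc)
print(gflops(conv, time_us=1000.0))              # hypothetical time
```

For the `512-14-14-1-512-1-0` layer this gives 2 x 512 x 512 x 14 x 14 = 102,760,448 flops, i.e. roughly 0.1 Gflops per invocation.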

### Plot the experimental data

#### Plot violin

In [None]:
# groupby_level: 'platform', 'operator', 'library' or 'kernel' (with dvdt-prof).
# Typically, when creating df_raw, we consider one of the following scenarios:
# - different platforms (e.g. hikey and mate), same library (e.g. v18.05), same operator (e.g. fullyconnected)
# - different operators (e.g. conv, directconv, winogradconv), same library (e.g. v18.05), same platform (e.g. mate)
# - different libraries (e.g. v18.03, v18.05), same operator (e.g. directconv), same platform (e.g. hikey)
def plot_violin(df_raw, groupby_level='operator', species=None, metric='time_us',
                title=None, figsize=None, fontscale=1.75, legend_loc='upper right',
                platform_id=firefly_id, gpu=firefly_gpu, gpu_mhz=firefly_gpu_mhz,
                xmin=None, xmax=None, xstep=None):
    # Get species.
    if not species:
        if(df_raw['species']=='fullyconnected').all():
            species = 'fullyconnected'
        elif(df_raw['species']=='conv').all():
            species = 'conv'
        elif(df_raw['species']=='avgpool').all():
            species = 'avgpool'
        elif(df_raw['species']=='softmax').all():
            species = 'softmax'
        else:
            print('Warning: unknown or mixed species')
    # Set depending on groupby_level. Drop batch_size (batch_size==1). TODO: support batch_size!=1.
    if groupby_level=='platform':
        tuples = [ (p,'%s;%s;%s'%(o,l,u),t,r) for (p,l,o,t,b,u,r) in df_raw.index.values ]
        names = [ 'platform', 'operator;library;tuner', 'tensor', 'repetition_id' ]
        hue_levels = 'operator;library;tuner'
        palette = 'summer'
    elif groupby_level=='operator':
        tuples = [ (o,'%s;%s;%s'%(p,l,u),t,r) for (p,l,o,t,b,u,r) in df_raw.index.values ]
        names = [ 'operator', 'platform;library;tuner', 'tensor', 'repetition_id' ]
        hue_levels = 'platform;library;tuner'
        palette = 'autumn'
    elif groupby_level=='library':
        tuples = [ (l,'%s;%s;%s'%(o,p,u),t,r) for (p,l,o,t,b,u,r) in df_raw.index.values ]
        names = [ 'library', 'operator;platform;tuner', 'tensor', 'repetition_id' ]
        hue_levels = 'operator;platform;tuner'
        palette = 'spring'
    elif groupby_level=='kernel': # no tuner
        tuples = [ (k,'%s;%s;%s'%(o,p,l),t,r) for (p,l,o,t,b,k,r) in df_raw.index.values ]
        names = [ 'kernel', 'operator;platform;library', 'tensor', 'repetition_id' ]
        hue_levels = 'operator;platform;library'
        palette = 'winter'
    else:
        print('Error: unsupported groupby_level=%s' % groupby_level)
        exit(1)
    # Create a new DataFrame using the index settings above.
    df_violin = pd.DataFrame(data=df_raw[metric].values, columns=[metric],
                             index=pd.MultiIndex.from_tuples(tuples=tuples, names=names)).sort_index()
    # Set style.
    sb.set_style('whitegrid')
    sb.set_palette(palette)
    fontsize = default_fontsize*fontscale
    if metric=='time_us':
        xlabel = 'Operator execution time (microseconds)'
    elif metric=='GFLOPS':
        xlabel = 'Operator GFLOPS (billion single-precision floating-point operations per second)'
    ylabel = 'Tensor shape'
    if species=='fullyconnected':
        ylabel += ' (in_C, in_H, in_W, out_C, out_H, out_W)'
    elif species=='conv':
        ylabel += ' (in_C, H, W, K, out_C, stride, pad)'
    elif species=='avgpool':
        ylabel += ' (C, H, W, K, stride)'
    elif species=='softmax':
        ylabel += ' (C, H, W)'
    if not figsize:
        num_groupby_values = len(df_raw.index.get_level_values(level=groupby_level).unique())
        # TODO: num_kernels
        num_platforms = len(df_raw.index.get_level_values(level='platform').unique())
        num_libraries = len(df_raw.index.get_level_values(level='library').unique())
        num_operators = len(df_raw.index.get_level_values(level='operator').unique())
        num_tensors = len(df_raw.index.get_level_values(level='tensor').unique())
        num_tuners = len(df_raw.index.get_level_values(level='tuner').unique())
        figheight = num_platforms * num_libraries * num_operators * num_tensors * num_tuners / num_groupby_values
        figsize = (default_figwidth, figheight)
    # For each unique groupby value.
    groupby_values = df_violin.index.get_level_values(level=groupby_level).unique()
    for groupby_value in groupby_values:
        df_violin_loc = df_violin.loc[groupby_value].reset_index()
        fig = plt.figure(figsize=figsize, dpi=default_figdpi)
        ax = fig.gca()
        sb.violinplot(ax=ax, data=df_violin_loc, x=metric, y='tensor', inner='point', split=False, saturation=0.8,
                      hue=hue_levels, fontscale=fontscale)
        # Title.
        if title:
            groupby_title = '%s: %s=%s' % (title, groupby_level, groupby_value)
        else:
            if groupby_level=='platform':
                groupby_title = '%s (GPU: %s @ %d MHz)' % (id_to_name[platform_id], gpu, gpu_mhz)
            else:
                groupby_title = '%s=%s' % (groupby_level, groupby_value)
        ax.set_title(groupby_title, fontsize=fontsize)
        # X axis.
        if not xstep: xstep = 1000
        if not xmin: xmin = np.int64(df_violin_loc[metric].min())
        if not xmax: xmax = np.int64(df_violin_loc[metric].max()) // xstep * xstep + xstep
        ax.set_xlim([xmin, xmax])
        ax.set_xticks(range(0, xmax+1, xstep))
        ax.set_xlabel(xlabel, fontsize=fontsize)
        # Y axis.
        ax.set_ylabel(ylabel, fontsize=fontsize)
        ax.tick_params(axis='y', labelsize=fontsize)
        # Horizontal lines between groups of violins.
        for y in ax.get_yticks():
            ax.hlines(y=y+0.5, xmin=0, xmax=xmax, linestyles='dotted', colors='black')
        # Legend location.
        ax.legend(loc=legend_loc);
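
To make the index reshuffling in `plot_violin` easier to follow, here is a minimal standalone sketch of how one 7-level index tuple is folded into the 4-level index used for plotting. The helper name `regroup` is hypothetical; the level order `(platform, library, operator, tensor, batch_size, tuner, repetition_id)` is the one set by `get_experimental_results` for validation experiments.

```python
def regroup(index_tuple, groupby_level):
    """Fold a 7-level index tuple into (group, hue, tensor, repetition_id).

    batch_size is dropped, as in plot_violin (which assumes batch_size==1).
    """
    p, l, o, t, b, u, r = index_tuple
    if groupby_level == 'platform':
        return (p, '%s;%s;%s' % (o, l, u), t, r)
    if groupby_level == 'operator':
        return (o, '%s;%s;%s' % (p, l, u), t, r)
    if groupby_level == 'library':
        return (l, '%s;%s;%s' % (o, p, u), t, r)
    raise ValueError('unsupported groupby_level=%s' % groupby_level)

row = ('firefly', 'opencl-18.05-b3a371b', 'conv', '512-14-14-1-512-1-0', 1, 'DEFAULT', 0)
by_platform = regroup(row, 'platform')
by_operator = regroup(row, 'operator')
print(by_platform)
print(by_operator)
```

The first element becomes the value that selects a figure, and the second (a `;`-joined string) becomes the hue that distinguishes violins within a figure.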

#### Plot speedup

In [None]:
def plot_speedup(df, baseline_levels='platform', baseline_values=firefly_id,
                 title=None, xmax=None, figsize=None, fontsize=default_fontsize,
                 legend_loc='lower right'):
    if not title: title = '%s=%s' % (str(baseline_levels), str(baseline_values))
    if not figsize:
        num_rows = len(df.index)
        num_tensors = len(df.index.get_level_values(level='tensor').unique())
        fig_height = 2 * num_rows / num_tensors
        figsize=[default_figwidth, fig_height]
    df_ = df.unstack(baseline_levels)
    df_speedup = 1. / df_.divide(df_[baseline_values], axis=0).stack(baseline_levels)
    # Plot the speedups.
    axes = pd.DataFrame(data=df_speedup, columns=['speedup (x)']) \
        .reorder_levels(['tensor', 'platform', 'library', 'operator', 'batch_size', 'tuner']) \
        .groupby(level=['tensor']) \
        .plot(kind='barh', title=title, stacked=False, width=0.8, grid=True, legend=True,
              xlim=[0,xmax], figsize=figsize, fontsize=fontsize, colormap=cm.autumn)
    # Annotate the bars with the speedups rounded to one digit after the decimal point.
    for ax in axes:
        ax.set_title(title, fontsize=fontsize)
        ax.tick_params(axis='x', labelsize=fontsize)
        ax.legend(loc=legend_loc)
        for patch in ax.patches:
            text = '{0:.1f}x'.format(patch.get_width())
            ax.annotate(text, (patch.get_width()*1.01, patch.get_y()*1.01), fontsize=64)
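
The speedup computed by `plot_speedup` is the baseline time divided by the time of each configuration (hence the `1. /` applied to the ratio). The same arithmetic can be sketched in plain Python, without pandas; the helper name `speedups` and all timings below are hypothetical.

```python
def speedups(times_us, baseline):
    """Return baseline_time / time for every configuration."""
    base = times_us[baseline]
    return {config: base / t for config, t in times_us.items()}

# Hypothetical execution times (microseconds) for one tensor.
times_us = {
    ('directconv', 'NONE'): 400.0,
    ('directconv', 'DEFAULT'): 200.0,
    ('conv', 'NONE'): 160.0,
    ('conv', 'DEFAULT'): 100.0,
}
result = speedups(times_us, baseline=('directconv', 'NONE'))
for config in sorted(result):
    print(config, '%.1fx' % result[config])
```

By construction the baseline configuration has a speedup of exactly 1.0, and values above 1.0 mean "faster than the baseline".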

<a id="analysis_mobilenets_baseline"></a>
## `MobileNets-v1-1.0-224` ("baseline")

<a id="analysis_mobilenets_baseline_setup"></a>
### Experimental setup

#### No tuner

```
$ ck run nntest:conv-armcl-opencl \
--dataset_uoa=tensor-conv-mobilenets-v1-1.0-224 \
--timestamp=mobilenets-v1-1.0-224-firefly-tuner-none \
--repetitions=10 --iterations=1 \
--env.CK_LWS_TUNER_TYPE=NONE

$ ck run nntest:directconv-armcl-opencl \
--dataset_uoa=tensor-conv-mobilenets-v1-1.0-224 \
--timestamp=mobilenets-v1-1.0-224-firefly-tuner-none \
--repetitions=10 --iterations=1 \
--env.CK_LWS_TUNER_TYPE=NONE
```

In [None]:
!ck list $repo_uoa:experiment:nntest*mobilenets-v1-1.0-224-firefly-tuner-none | sort

#### Default tuner

```
$ ck run nntest:conv-armcl-opencl \
--dataset_uoa=tensor-conv-mobilenets-v1-1.0-224 \
--timestamp=mobilenets-v1-1.0-224-firefly-tuner-default \
--repetitions=10 --iterations=1 \
--env.CK_LWS_TUNER_TYPE=DEFAULT

$ ck run nntest:directconv-armcl-opencl \
--dataset_uoa=tensor-conv-mobilenets-v1-1.0-224 \
--timestamp=mobilenets-v1-1.0-224-firefly-tuner-default \
--repetitions=10 --iterations=1 \
--env.CK_LWS_TUNER_TYPE=DEFAULT
```

In [None]:
!ck list $repo_uoa:experiment:nntest*mobilenets-v1-1.0-224-firefly-tuner-default | sort

<a id="analysis_mobilenets_baseline_experiments_all"></a>
### All experiments

In [None]:
df = get_experimental_results(repo_uoa=repo_uoa, tags='conv', profiling=False)
display_in_full(df[['time_us','GFLOPS','success?']])

<a id="analysis_mobilenets_baseline_experiments_failed"></a>
### Failed experiments

In [None]:
df_failed = df[df['success?']!='yes']
df = df[df['success?']=='yes']
display_in_full(df_failed)

<a id="analysis_mobilenets_baseline_plot_platform_us"></a>
### Plot by platform (microseconds)

In [None]:
plot_violin(df_raw=df, groupby_level='platform', xstep=1000, xmax=20000)

<a id="analysis_mobilenets_baseline_plot_platform_gflops"></a>
### Plot by platform (GFLOPS)

In [None]:
plot_violin(df_raw=df, groupby_level='platform', metric='GFLOPS', xstep=1, xmax=22)

<a id="analysis_mobilenets_baseline_plot_speedup"></a>
### Plot speedup over untuned direct convolution

#### Use median (50th percentile)

In [None]:
median = df['time_us'].groupby(level=df.index.names[:-1]).describe()['50%']
plot_speedup(median, xmax=8.0, legend_loc='upper right',
             baseline_levels=('platform', 'library', 'operator', 'tuner'),
             baseline_values=('firefly', 'opencl-18.05-b3a371b', 'directconv', 'NONE'))

#### Use minimum

In [None]:
minimum = df['time_us'].groupby(level=df.index.names[:-1]).min()
plot_speedup(minimum, xmax=8.0, legend_loc='upper right',
             baseline_levels=('platform', 'library', 'operator', 'tuner'),
             baseline_values=('firefly', 'opencl-18.05-b3a371b', 'directconv', 'NONE'))
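
The two aggregations answer slightly different questions. As a minimal sketch with made-up repetition timings (not measured data): the median is robust to an occasional slow run, while the minimum reports the best time observed.

```python
import statistics

# Hypothetical timings (microseconds) of 5 repetitions; one run is an outlier.
repetitions_us = [105.0, 98.0, 101.0, 250.0, 99.0]

median_us = statistics.median(repetitions_us)   # middle value of the sorted timings
minimum_us = min(repetitions_us)                # best observed timing
print(median_us, minimum_us)
```

Here the outlier of 250 microseconds shifts neither statistic, whereas a mean would be pulled up noticeably.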

In [None]:
# Check that DEFAULT is always better than NONE.
best_tuner = minimum.groupby(level=minimum.index.names[:-1]).idxmin()
minimum[best_tuner]
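
The cell above only displays the best time per configuration; whether `DEFAULT` really beats `NONE` still has to be read off the table. One way to make that check explicit is sketched below in plain Python. The helper name `tuner_regressions`, the simplified `(operator, tensor, tuner)` key and all timings are assumptions for illustration.

```python
def tuner_regressions(minimum_us):
    """List (operator, tensor) pairs where DEFAULT is slower than NONE."""
    regressions = []
    for (operator, tensor, tuner), t_default in sorted(minimum_us.items()):
        if tuner != 'DEFAULT':
            continue
        t_none = minimum_us.get((operator, tensor, 'NONE'))
        if t_none is not None and t_default > t_none:
            regressions.append((operator, tensor))
    return regressions

# Hypothetical minimum times (microseconds).
minimum_us = {
    ('conv', '512-14-14-1-512-1-0', 'NONE'): 160.0,
    ('conv', '512-14-14-1-512-1-0', 'DEFAULT'): 100.0,
    ('directconv', '512-14-14-1-512-1-0', 'NONE'): 400.0,
    ('directconv', '512-14-14-1-512-1-0', 'DEFAULT'): 450.0,
}
regressions = tuner_regressions(minimum_us)
print(regressions)
```

An empty list would confirm the claim for the data at hand; any entry points to a configuration worth inspecting.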

### Obtain optimal schedule

In [None]:
df_min = pd.DataFrame(minimum)
display_in_full(df_min)

In [None]:
df_best_tuner = df_min.loc[best_tuner]
display_in_full(df_best_tuner)

In [None]:
df_best_operator = df_best_tuner.swaplevel('tuner', 'operator')
display_in_full(df_best_operator)

In [None]:
# TODO: simplify.
best_operator = df_best_operator['time_us'].groupby(level=df_best_operator.index.names[:-1]).idxmin()
df_best_operator = df_best_operator \
    .loc[best_operator] \
    .swaplevel('operator', 'tuner')
display_in_full(df_best_operator)
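
The chain of `idxmin` and `swaplevel` calls above amounts to picking, for each tensor, the fastest (operator, tuner) combination. A minimal plain-Python sketch of that selection follows; the helper name `best_schedule`, the simplified key, the second tensor shape and all timings are hypothetical.

```python
def best_schedule(minimum_us):
    """Map each tensor to its fastest (operator, tuner, time_us)."""
    best = {}
    for (operator, tensor, tuner), t in minimum_us.items():
        if tensor not in best or t < best[tensor][2]:
            best[tensor] = (operator, tuner, t)
    return best

# Hypothetical minimum times (microseconds).
minimum_us = {
    ('conv', '512-14-14-1-512-1-0', 'NONE'): 160.0,
    ('conv', '512-14-14-1-512-1-0', 'DEFAULT'): 100.0,
    ('directconv', '512-14-14-1-512-1-0', 'DEFAULT'): 200.0,
    ('conv', '1024-7-7-1-1024-1-0', 'DEFAULT'): 300.0,
    ('directconv', '1024-7-7-1-1024-1-0', 'DEFAULT'): 250.0,
}
schedule = best_schedule(minimum_us)
for tensor in sorted(schedule):
    print(tensor, schedule[tensor])
```

Note that the winner can differ from layer to layer, which is why the selection is made per tensor rather than once for the whole network.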

### Estimate execution time

In [None]:
# Tensor '512-14-14-1-512-1-0' is repeated 5 times.
# TODO: just index using this tensor.
index_of_repeated_layer = (firefly_id, 'opencl-18.05-b3a371b', 'conv', '512-14-14-1-512-1-0', 1, 'DEFAULT')
df_best_operator.sum() + 4 * df_best_operator.loc[index_of_repeated_layer]
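
The `+ 4 *` term accounts for a layer that occurs 5 times in the network but, presumably, only once in the dataset: summing every layer once and adding 4 extra copies of the repeated layer is the same as weighting each layer by its number of occurrences. A minimal sketch with made-up timings (the helper name `estimate_total_us` and the second tensor shape are hypothetical):

```python
def estimate_total_us(best_us, repeats):
    """Sum per-layer best times, weighting each layer by its occurrence count."""
    return sum(t * repeats.get(tensor, 1) for tensor, t in best_us.items())

# Hypothetical best times (microseconds) per unique tensor shape.
best_us = {
    '512-14-14-1-512-1-0': 300.0,
    '1024-7-7-1-1024-1-0': 500.0,
}
repeats = {'512-14-14-1-512-1-0': 5}   # this layer occurs 5 times in the network

total_us = estimate_total_us(best_us, repeats)
print(total_us)
```

With these numbers the estimate is 5 x 300 + 500 = 2000 microseconds, identical to `sum + 4 * 300`.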

<a id="analysis_mobilenets_baseline_profiler"></a>
## `MobileNets-v1-1.0-224` ("baseline"): profiler

### Experimental setup

#### No tuner

```
$ ck run nntest:conv-armcl-opencl --dvdt_prof \
--dataset_uoa=tensor-conv-mobilenets-v1-1.0-224 \
--timestamp=mobilenets-v1-1.0-224-firefly-profiler \
--repetitions=10 --iterations=1 \
--env.CK_LWS_TUNER_TYPE=NONE

$ ck run nntest:directconv-armcl-opencl --dvdt_prof \
--dataset_uoa=tensor-conv-mobilenets-v1-1.0-224 \
--timestamp=mobilenets-v1-1.0-224-firefly-profiler \
--repetitions=10 --iterations=1 \
--env.CK_LWS_TUNER_TYPE=NONE
```

In [None]:
!ck list $repo_uoa:experiment:nntest*mobilenets-v1-1.0-224-firefly-profiler | sort

<a id="analysis_mobilenets_reduced"></a>
## `MobileNets-v1-0.75-160` ("reduced")