.. _kernel-dev-flow:

Kernel Development Flow
=======================

Developing a kernel
-------------------

Before proceeding, make sure you read the :ref:`computing-with-cerebras` section.

.. admonition:: Scope of this section

   This section does not explain how to write kernel code in CSL. Instead, it gives a high-level overview of the steps involved in developing a kernel with CSL. Also see :ref:`cslang-guides`.

CSL code and runtime script
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Developing a kernel consists of the following steps:

    - Developing a CSL program, such as ``<filename>.csl``, that defines the operations a PE or a set of PEs must perform. Often you will create multiple ``.csl`` files. For example, see :ref:`02-multiple-source-files`.
    - Compiling the top-level ``.csl`` program with the ``cslc`` compiler.
    - Running the program, either with the simulator or on the Cerebras Wafer Scale Engine (WSE), using a runtime configuration script, usually written in Python, such as ``code.csl.run.py``. In this script you provide the input tensors to the program.
    - When the run is complete, reading the program output and comparing it against a reference to validate the result.
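
The validation in the last step can be sketched in plain Python. This is a minimal illustration and not part of the Cerebras SDK: the helper name ``outputs_match`` and the tolerance values are assumptions, and the sample lists stand in for data that a real runtime script would read back from the simulator or the WSE.

```python
import math


def outputs_match(result, reference, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if every element of result is close to the reference.

    Hypothetical helper for illustration; not an SDK function.
    """
    if len(result) != len(reference):
        return False
    return all(
        math.isclose(r, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, e in zip(result, reference)
    )


# Stand-in data: a real script would read `result` back from the device.
reference = [2.0 * x for x in [1.0, 2.0, 3.0]]
result = [2.0, 4.0, 6.0000001]

print(outputs_match(result, reference))
```

A tolerance-based comparison is used here because floating-point results computed on different hardware rarely match a host reference bit for bit.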


Kernel development steps
~~~~~~~~~~~~~~~~~~~~~~~~

The following diagrams show the sequence of steps for developing a kernel.

**Step 1**

.. _cslang-kernel-dev-flow1:

.. figure:: images/cslang-kernel-dev-flow1.png
    :align: center
    :width: 750px

In your CSL code you must explicitly:

- Define a layout by using the ``@set_rectangle()`` function. This defines a rectangular region of contiguous processing elements (PEs).
- For each PE, use ``@set_tile_code()`` to define the code the PE will run.
- Configure the routes and colors with ``@set_color_config()``.

**Steps 2 and 3**

.. _cslang-kernel-dev-flow2:

.. figure:: images/cslang-kernel-dev-flow2.png
    :align: center
    :width: 750px

- Next, you compile the top-level ``code.csl`` with the compiler tool ``cslc``. This will generate a binary ELF file for each PE.
- Finally, use the runtime Python script ``code.csl.run.py`` to run the code either on the simulator or directly on the Cerebras WSE.
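
As a rough sketch of steps 2 and 3, the following Python snippet assembles the two command lines without executing them. The fabric dimensions and offsets, the output directory ``out``, the ``cs_python`` launcher, and the ``--name`` argument of the run script are assumptions chosen for illustration; check the ``cslc`` reference for the options your SDK version supports.

```python
import shlex


def build_compile_cmd(top_level, fabric_dims, fabric_offsets, out_dir):
    """Build an argument list that compiles the top-level CSL file."""
    width, height = fabric_dims
    off_x, off_y = fabric_offsets
    return [
        "cslc", top_level,
        f"--fabric-dims={width},{height}",
        f"--fabric-offsets={off_x},{off_y}",
        "-o", out_dir,
    ]


def build_run_cmd(run_script, out_dir):
    """Build an argument list that starts the runtime script.

    Assumes the script accepts --name <compile output directory>.
    """
    return ["cs_python", run_script, "--name", out_dir]


# Example values only; none of them come from a real project.
compile_cmd = build_compile_cmd("code.csl", (8, 3), (4, 1), "out")
run_cmd = build_run_cmd("code.csl.run.py", "out")

print(shlex.join(compile_cmd))
print(shlex.join(run_cmd))
```

Passing either list to ``subprocess.run(cmd, check=True)`` would execute it on a machine where the SDK tools are installed.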

.. note::

   The flow above is the same when you target the network-attached Cerebras WSE accelerator, except that the Run step targets the accelerator instead of the local fabric simulator.

Example walkthrough
-------------------

See :ref:`example-intro` and :ref:`working-with-code-samples`.