==================================================================

Tentative List of Coauthors:
Trifunovic, P., Trifunovic, N., Milutinovic, V., Hurson, A.,

Tentative Title:
On the Coupling of ControlFlow and DataFlow
for Effective Hybrid SuperComputing

Tentative Abstract:

DataFlow machines along the lines of the Maxeler Paradigm have been introduced
with the intention to overcome the disadvantages
of abundant low-level communications
at the microarchitecture level in the von Neumann Paradigm,
which is used in MultiCore approaches (e.g., Intel)
and ManyCore approaches (e.g., NVidia).
The Maxeler Dataflow Paradigm execution model is based on the Execution Graph
and follows the wisdom of the Nobel Laureate Richard Feynman,
insisting on minimal data movement at the microarchitecture level,
for minimal power dissipation and maximal speedup.
This goal is achieved more effectively if temporal and spatial data are separated.
This means that the Maxeler Paradigm also follows the wisdom of the Nobel Laureate Ilya Prigogine,
because separating the spatial and temporal data minimizes the entropy of the computing system,
thus creating conditions for the most effective compiler optimization.
This separation implies that the processing of predominantly temporal data
be done on the host (ControlFlow)
and the processing of predominantly spatial data
be done on an accelerator (DataFlow),
which means that the movement of data from host to accelerator
and the movement of results from accelerator to host
become the major computing bottleneck.
This bottleneck is the central topic of this research,
which studies the impact of High-Level Languages (HLLs) on it
and ranks the HLLs by their suitability for the Maxeler DataFlow Paradigm.
A set of the 32 most popular modern HLLs was analyzed, and the BigData sizes were determined
for which the impact of the relatively slow host-to-accelerator interface
could be neglected in stream processing.
The Maxeler DataFlow Paradigm could also benefit
from the wisdom of the Nobel Laureate Daniel Kahneman
(approximate computing to release resources for better speedup)
and the Nobel Laureate XY
(relative computing to trade latency for better precision),
so the effects of these two issues are studied as well.

1. Introduction

General

Specific

Hypothesis (mission, vision, angle, goal)

2. Problem Statement

To determine:
(a) Complexities of skins for various HLLs.
(b) Latencies introduced by skins of various HLLs.
(c) Data bandwidths of various skins.
(d) Data sizes at which the effects of skins fade away.
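Item (d) can be framed with a simple overhead model: if a skin adds a fixed invocation latency on top of a bandwidth-limited transfer, the data size at which the skin's share of the total time drops below a threshold follows directly. The sketch below illustrates this; the latency and bandwidth figures are illustrative assumptions, not measured values.

```python
# Sketch of the break-even estimate for (d): the payload size at which
# a skin's fixed latency falls below a given share of total transfer time.
# All parameter values below are illustrative assumptions.

def breakeven_bytes(skin_latency_s: float, bandwidth_Bps: float,
                    threshold: float = 1e-3) -> float:
    """Smallest payload n (bytes) with skin_latency / total_time < threshold.

    Model: total_time = skin_latency + n / bandwidth.
    Solving skin_latency / (skin_latency + n / bandwidth) < threshold
    gives n > (1/threshold - 1) * skin_latency * bandwidth.
    """
    return (1.0 / threshold - 1.0) * skin_latency_s * bandwidth_Bps

# Example: an assumed 10 us skin over an assumed 8 GB/s PCIe-class link.
n = breakeven_bytes(10e-6, 8e9)
print(f"{n / 1e6:.1f} MB")  # payload above which the skin overhead fades
```

With these assumed numbers the one-per-mille point lands near 80 MB, which hints why the BigData sizes in the abstract matter.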

The problem is important because ...

The importance of the problem grows over time ...

3. Existing Solutions

DSM

MPI

RPC


4. Proposed Solutions

Modify RPC for the most effective implementation of DataFlow (DF) skins.
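The RPC-style skin can be pictured as three layers: marshal on the host, invoke the accelerator, unmarshal the result. The sketch below shows this shape; both class names (`DataflowSkin`, `FakeAccelerator`) are hypothetical stand-ins, not part of any real Maxeler API, and the accelerator is stubbed so the example runs on a plain host.

```python
# Minimal sketch of an RPC-style DataFlow skin. All names are
# illustrative assumptions; the "accelerator" is a host-side stub.
import struct

class FakeAccelerator:
    """Stand-in for the DataFlow engine: doubles each value (demo only)."""
    def run(self, payload: bytes) -> bytes:
        n = len(payload) // 8
        values = struct.unpack(f"{n}d", payload)
        return struct.pack(f"{n}d", *(2.0 * v for v in values))

class DataflowSkin:
    """RPC-style skin: marshal -> invoke -> unmarshal, the three layers
    whose complexity, latency, and bandwidth the proposal ranks per HLL."""
    def __init__(self, engine):
        self.engine = engine

    def __call__(self, values):
        payload = struct.pack(f"{len(values)}d", *values)      # host -> accel
        result = self.engine.run(payload)                      # stream kernel
        return list(struct.unpack(f"{len(values)}d", result))  # accel -> host

skin = DataflowSkin(FakeAccelerator())
print(skin([1.0, 2.0, 3.0]))  # [2.0, 4.0, 6.0]
```

The per-HLL comparison then reduces to how much of the marshal/unmarshal layer each language hides, and at what latency cost.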

Skins for 32 languages,
each one evaluated by:
(a) code complexity,
(b) latency for data transfer from host to accelerator,
(c) data sizes at which the impact of the skin fades away
    (less than one per mille),
(d) some -ability ???
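Criteria (b) and (c) suggest a per-skin measurement harness: time a skin call over growing payloads and report the fixed overhead's shrinking share. The harness below is a sketch; the transfer is simulated with assumed latency and bandwidth constants, whereas a real run would wrap each HLL's actual binding.

```python
# Sketch of a measurement harness for criteria (b) and (c). The transfer
# is simulated; SKIN_LATENCY and BANDWIDTH are assumed constants.
import time

SKIN_LATENCY = 50e-6   # assumed fixed skin latency, seconds
BANDWIDTH = 8e9        # assumed link bandwidth, bytes/second

def transfer(data: bytes) -> None:
    """Simulated host-to-accelerator call: fixed cost + size-linear cost."""
    time.sleep(SKIN_LATENCY + len(data) / BANDWIDTH)

def measure(sizes):
    """For each payload size, return (size, elapsed, skin overhead share)."""
    rows = []
    for n in sizes:
        t0 = time.perf_counter()
        transfer(bytes(n))
        elapsed = time.perf_counter() - t0
        rows.append((n, elapsed, SKIN_LATENCY / elapsed))
    return rows

for n, t, share in measure([2**k for k in range(10, 26, 5)]):
    print(f"{n:>10} B  {t * 1e3:8.3f} ms  overhead {share:6.2%}")
```

Repeating this per skin, per HLL, yields exactly the plots the proposal calls for.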

Changes of the above are also analyzed,
if Kahneman is employed,
and lower-precision Floating-Point (FLOP) data are transferred
(32, 16, 8),
or Fixed-Point data are transferred
(X, Y, Z),
or Integers,
or Bits (only the sign bit).

In other words,
the above four issues (a,b,c,d) will be re-checked
for a number of FLOP precisions (64, 32, 24, 16),
a number of FIXP precisions (X, Y, Z),
plus integers and single bits
(all this presented as 32 plots).
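The payload shrinkage driving these re-checks can be made concrete: the same vector costs progressively fewer bytes as FLOP64/32/16, as 8-bit integers, and as bare sign bits. The sketch below shows the sizes; the sample data are arbitrary, and the fixed-point widths (X, Y, Z in the text) are left open, as in the proposal.

```python
# Sketch of payload sizes at reduced precision: one 1024-sample vector
# sent as FLOP64/32/16, 8-bit integers, and sign bits. Sample values
# are arbitrary; fixed-point widths (X, Y, Z) remain unspecified.
import struct

values = [0.75, -1.5, 3.25, -0.125] * 256   # 1024 samples

flop64 = struct.pack(f"{len(values)}d", *values)   # 8 bytes/sample
flop32 = struct.pack(f"{len(values)}f", *values)   # 4 bytes/sample
flop16 = struct.pack(f"{len(values)}e", *values)   # 2 bytes/sample (IEEE half)
int8   = bytes(int(v) & 0xFF for v in values)      # 1 byte/sample
# Sign bits only: pack 8 samples per byte.
bits = bytearray((len(values) + 7) // 8)
for i, v in enumerate(values):
    if v < 0:
        bits[i // 8] |= 1 << (7 - i % 8)

for name, blob in [("FLOP64", flop64), ("FLOP32", flop32),
                   ("FLOP16", flop16), ("INT8", int8), ("SIGN", bits)]:
    print(f"{name:6} {len(blob):5d} bytes")
```

Each precision level shifts the break-even data size of issue (c), which is what the 32 plots would visualize per HLL.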

Changes of the above are also analyzed,
if latency is traded for better precision,
in apps that could tolerate higher latencies
(plots show what HLLs to avoid for given latency-bounds).

THIS NEEDS MORE BRAINSTORMING OF VM and PT!!!

5. Conditions and Assumptions

The list has to be formed as the work progresses.

6. Analytics

Some math has to be included - we will brainstorm!

7. Simulation

Some programs will be run - effects will be noted - be creative!
Themes: Math, DeepLearning,

8. Implementation

This can go into an appendix, with GitHub pointers.

9. Conclusions

Business as usual - see HOW TO CREATE ...

10. Annotated References

IEEE format is mandatory!
While working, keep mnemonics [Milutinovic1989], not numbers [1, 2].

==================================================================

The importance of supporting HLLs could be studied from
[Milutinovic1988]:

(1) Milutinovic, V., Editor,
    High-Level Language Computer Architecture,
    Computer Science Press,
    NY, NY, USA, 1988
    (ISBN: 0-88175-132-4)

The variety of simulation methods in computing could be studied from
[Tomasevic1996]:

(2) Tomasevic, M., Milutinovic, V.,
    "A Simulation Study of Hardware-Oriented DSM Approaches,"
    IEEE Parallel and Distributed Technology,
    Vol. 4., Issue 1, 1996
    (DOI: 10.1109/88.481689).

An interesting application to Internet search could be found in
[Knezevic2000]:

(3) Knezevic, P., Radunovic, B., Nikolic, N., Jovanovic, T.,
    Milanov, D., Nikolic, M., Milutinovic, V.,
    Casselman, S., Schewel, J.,
    "The Architecture of the Obelix:
    An Improved Internet Search Engine,"
    Proceedings of the 33rd Annual IEEE Hawaii Int'l Conference
    on System Sciences, Maui, January 4-7, 2000
    (DOI: 10.1109/hicss.2000.926873).

Details about microprocessor design can be found in
[Milutinovic1996]:

(4) Milutinovic, V.,
    Surviving the Design of a 200MHz RISC Microprocessor,
    IEEE Computer Society Press,
    Washington DC, USA, 1996
    (ISBN: 0-8186-7343-5).