---------------------------------------------------
The GI course includes 2 homework assignments,
each one consisting of 4 parts.
The first homework assignment builds deep knowledge
of the algorithmic and implementation side of GI.
The second homework assignment builds deep knowledge
of the data manipulation and analysis side of GI.
The text below defines the work
and the deliverables:
====================================================
HW#1:
Each student selects a different GI algorithm
and, using AWS (Amazon Web Services),
implements it in four different technologies
(the algorithm is the student's free choice; the technologies are fixed):
a. CPU
b. GPU
c. DFE
d. WSN
Detailed explanations will be given in the classroom
(in the first class of the semester).
This assignment has six deliverables
(seven for those taking the one-course-grade-up option):
a. PowerPoint with the essence of the algorithm,
where it could be used,
and the best uses
of each particular implementation technology.
b. PowerPoints with the platform details, plus
flowcharts, execution graphs, code, and sample runs,
for each one of the four implementational technologies.
This means four different PowerPoint presentations.
c. PowerPoint which compares the four implementations
for one BigData problem set, in the following domains:
measured speed, estimated power,
calculated complexity, and precision analysis.
d. Optional, for one higher course grade,
for those who would like to turn this homework into a paper
for a conference or a journal: one more one-slide PowerPoint
comparing the four implementations for several BigData sets.
The students are asked to use the best possible infrastructure
that AWS offers
for the budget available
(this course aims to use the same infrastructure as is used at MIT).
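
As an illustration of the comparison asked for in deliverable c,
the sketch below times several implementations on the same data
and reports their deviation from a reference result.
It is only a minimal sketch: the functions cpu_squares and
reduced_precision_squares are hypothetical stand-ins that run on the host,
and estimated power and calculated complexity are not covered,
since they depend on the chosen platform and algorithm.

```python
import time
from statistics import median


def measure(fn, data, reference, repeats=5):
    """Time fn(data) several times and compare its result with a reference."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(data)
        timings.append(time.perf_counter() - start)
    # Largest absolute deviation from the reference, used as a precision measure.
    max_abs_error = max(abs(r - e) for r, e in zip(result, reference))
    return {"median_seconds": median(timings), "max_abs_error": max_abs_error}


def compare(implementations, data, reference):
    """Return one row of measurements per technology label."""
    return {name: measure(fn, data, reference)
            for name, fn in implementations.items()}


# Hypothetical stand-ins: a real study would call the CPU, GPU, DFE,
# and WSN implementations of the selected algorithm here.
def cpu_squares(values):
    return [v * v for v in values]


def reduced_precision_squares(values):
    # Mimics an implementation that keeps only 3 decimal places.
    return [round(v * v, 3) for v in values]


data = [0.1 * i for i in range(1000)]
reference = [v * v for v in data]
report = compare(
    {"CPU": cpu_squares, "GPU (placeholder)": reduced_precision_squares},
    data,
    reference,
)
for name, row in report.items():
    print(f"{name:20s} {row['median_seconds']:.6f} s  "
          f"max error {row['max_abs_error']:.2e}")
```

The median of several runs is used instead of a single timing
to reduce the influence of warm-up effects and background load.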
===============================================================
HW#2:
a. Do a short survey of sequencers available on the market,
presented in the form of a PowerPoint presentation.
b. Using a sequencer-generated data set obtained from 7BG,
do the preprocessing of the sequencer stream,
using the BWTA tool,
and explain the work done, using a PowerPoint.
c. Using the GATK tool,
find all the mutations in the given set of 20000 genes
where between 40000 and 100000 mutations are expected
(in the whole genome of a person, about 4M mutations are expected).
Each mutation should be labeled in one of the following ways:
"known effects," "no effects," or "unknown effects."
Present the entire process described in this item,
using a PowerPoint presentation.
d. Study the published results of the EU FP7 ARTreat Project,
for which the Belgrade effort was led by Professor Milutinovic,
and on which the major contributions were generated
by Dr. Rakocevic and DrCandidate Mihajlovic.
Develop the process that uses the 4 major algorithms from PSZ
(the datamining course of Professor Milutinovic),
in the effort to find the effects of the mutations
now labeled as "unknown effects,"
for a statistically significant number of persons
(in the same way in which Dr. Rakocevic did it,
for NNs, DecisionTrees, RuleInduction, and CaseBasedReasoning).
If and where necessary, do imputations,
as was done by DrCandidate Mihajlovic in the ARTreat project.
The developed process should be explained using a PowerPoint.
For another one-course-grade-up and another paper,
run the developed process on a vector of data sets,
covering a statistically sufficient number of persons.
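
The labeling step of item c, which feeds item d,
can be sketched as a simple lookup.
This is only an illustration under stated assumptions:
the annotation table, the variant identifiers,
and the "benign" convention are hypothetical,
and no GATK output is parsed here.

```python
from collections import Counter

# The three labels required by item c.
KNOWN = "known effects"
NONE = "no effects"
UNKNOWN = "unknown effects"


def label_mutation(variant_id, annotations):
    """Map one called variant to one of the three labels."""
    effect = annotations.get(variant_id)
    if effect is None:
        # Variant is absent from the annotation table.
        return UNKNOWN
    if effect == "benign":
        # Hypothetical convention: "benign" marks a variant with no effect.
        return NONE
    return KNOWN


# Hypothetical annotation table and list of called variants.
annotations = {"var1": "pathogenic", "var2": "benign"}
called = ["var1", "var2", "var3", "var4"]

labels = {v: label_mutation(v, annotations) for v in called}
counts = Counter(labels.values())

# Variants passed on to the datamining process of item d.
unknown = [v for v, label in labels.items() if label == UNKNOWN]

print(counts)
print(unknown)
```

Only the variants collected in the list named unknown
would be passed on to the process developed in item d.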
===================================================================
Notes:
Each PowerPoint could be as short as 10 pages,
or as long as needed.
Each PowerPoint should follow the USP recommendations,
meaning visual effectiveness, semantic breaks, and n/N.
Each student could be asked to run the described code
of HW#1 in front of the entire classroom.
Each student could be asked to run the used tools
of HW#2 in front of the entire classroom.
===========================================================
a. Help for Homework #1
b. Help for Homework #2
vm@etf.rs