Prerequisites

Objectives

  • Revisit namespaces, references, and pointers.
  • Work with multiple files.
  • Work with multiple applications.
  • Realize the dependencies between files.
  • Implement basic array operations:

    • print (traverse)
    • mean
    • variance
    • max
    • min
    • counters
  • Work with biological data in array structures.

    • Analyze DNA
    • Return the complementary sequence of DNA
    • Count a specific base pair in DNA
    • Analyze ECG: mean, variance, max, and min.
  • Compilation of multiple files.

Deadline

Thursday 26/3/2020 11:59pm PST.

  • Go to the assignment page and git clone your own repository.
  • If you would like to work through the whole assignment individually for practice, you can also register through the individual assignment page. This will not be graded, but it can be reviewed if you ask.

Overview

You will learn how to work with multiple files. We will write our interesting functions in header files, and then use these functions in our interesting applications.

The header files (library)

We will work and implement our logic inside these files:

  • mathematics.hpp to contain some mathematical functions like calculation.
  • arrays.hpp to contain our array functions.
  • ecg.hpp to contain analysis functions on ECG.
  • dna.hpp to contain analysis functions on DNA.

The source files that we are going to compile into useful applications:

  • calculator.cpp implements the Calculator application and depends on mathematics.hpp.
  • heron.cpp implements a simple application for Heron's formula and also depends on mathematics.hpp.
  • analyzeECG.cpp implements a very useful application for ECG analysis and depends on ecg.hpp.
  • analyzeDNA.cpp implements a very useful application for DNA analysis and depends on dna.hpp.

You will also find a helper header file, helpers.hpp; there is no need to understand its contents. We will only use two functions from helpers.hpp to load our DNA and ECG files from the disk.

Dependency Graph

[Figure: dependency graph of the source and header files]

By the way, keep in mind that every application source file that is compiled into an executable file must:

  • include a main function, which is simply the program's entry point.
  • define that main function in a .cpp file.

You will also notice that our header files begin and end with:

#ifndef MATHEMATICS_HPP
#define MATHEMATICS_HPP

// includes of external headers here

// Your functions here

#endif // MATHEMATICS_HPP

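No worries, these are called header guards. They make sure the contents of a header are compiled only once per source file, even if the header ends up being included more than once.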
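To see what header guards buy us, the sketch below simulates a header being included twice in the same source file (for example, once directly and once through another header). The names SQUARE_HPP and square are hypothetical and only serve the illustration.

```cpp
// First "inclusion": SQUARE_HPP is not defined yet, so the body is compiled.
#ifndef SQUARE_HPP
#define SQUARE_HPP

// Hypothetical function, used only for this illustration.
double square( double x )
{
    return x * x;
}

#endif // SQUARE_HPP

// Second "inclusion": SQUARE_HPP is already defined, so the preprocessor
// skips everything up to #endif and square is not defined a second time.
#ifndef SQUARE_HPP
#define SQUARE_HPP

double square( double x )
{
    return x * x;
}

#endif // SQUARE_HPP
```

Without the guards, the second copy would be a redefinition of square and the compiler would reject the file.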

Requirement 1: mathematics.hpp file

  • R1.1 Implement our calculation function using either if, else if, else or switch-case.
  • R1.2 Implement Heron's formula, which computes the area of a triangle given its three sides, hmmmm very interesting.

[Figure: triangle with sides a, b, and c]

\[A = \sqrt{s(s-a)(s-b)(s-c)},\]

where s is the semiperimeter of the triangle; that is,

\[s=\frac{a+b+c}{2}.\]

Use this function declaration:

double heron( double a , double b , double c )
{
    // Logic here
}

You also need to #include <cmath> as an external header in order to use the std::sqrt function, which computes the square root.

  • R1.3 It is now your job to implement the main function in the heron.cpp file. The Heron Formula application must receive the three sides of the triangle through the terminal, so you will retrieve them from argv in the main function. Remember that you will need the std::atof function, which requires #include <cstdlib>. To use heron, add #include "mathematics.hpp". Hint: you may cheat from the calculator.cpp source file (but receiving three doubles in this case).
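A minimal sketch of how R1.2 and R1.3 could fit together. It assumes valid triangle sides and does no validation beyond counting the arguments. The helper runHeron is hypothetical: it holds the logic that main in heron.cpp could contain, and in the assignment heron itself would live in mathematics.hpp rather than next to it. Note that std::atof is declared in <cstdlib>.

```cpp
#include <cmath>    // std::sqrt
#include <cstdlib>  // std::atof
#include <iostream>

// Same declaration as in R1.2.
double heron( double a , double b , double c )
{
    double s = ( a + b + c ) / 2.0;  // semiperimeter
    return std::sqrt( s * ( s - a ) * ( s - b ) * ( s - c ) );
}

// Hypothetical helper: the logic main( int argc , char **argv ) could hold.
// Returns 0 on success and 1 when the number of arguments is wrong.
int runHeron( int argc , char **argv )
{
    // argv[0] is the program name, so three sides mean argc == 4.
    if( argc != 4 )
    {
        std::cerr << "Usage: ./Heron a b c" << std::endl;
        return 1;
    }

    double a = std::atof( argv[1] );
    double b = std::atof( argv[2] );
    double c = std::atof( argv[3] );

    std::cout << heron( a , b , c ) << std::endl;
    return 0;
}
```

With this sketch, main reduces to a single line, return runHeron( argc , argv );. For the sides 3, 4, and 5 the semiperimeter is 6 and the area is sqrt(6 · 3 · 2 · 1) = 6.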

Requirement 2: arrays.hpp file

  • R2.1 Implement a function that prints all array elements on terminal, using the following declaration:
void printAll( double *base , int arraySize )
{
    // Logic here
}
  • R2.2+R2.3 Implement a function that returns the maximum element and another one for minimum element, using the following declarations:
double maxArray( double *base, int arraySize )
{
    // Logic here
}

double minArray( double *base, int arraySize )
{
    // Logic here
}
  • R2.5+R2.6 Implement a function that returns the mean (average) of the array elements and another one that returns the variance, using the following declarations:
double meanArray( double *base , int arraySize )
{
    // Logic here
}


double varianceArray( double *base, int arraySize )
{
    // Logic here

    // Hint: use meanArray ;)
    // Do you need a square function?
    // Maybe you can implement one in mathematics.hpp
    // then include "mathematics.hpp" to use mathematics::square here
}

If you don’t know variance, it is the average squared deviation from the mean:

\[var = \frac{1}{N} \sum_{i=1}^{N} ( \text{mean} - x_i )^2\]
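As a worked example of the formula, here is one possible sketch of meanArray and varianceArray. It assumes arraySize is greater than zero, and it defines square locally, whereas the hint above suggests placing it in mathematics.hpp.

```cpp
// Assumption: defined locally here; the hint suggests mathematics.hpp instead.
double square( double x )
{
    return x * x;
}

// Mean: sum of all elements divided by their count (assumes arraySize > 0).
double meanArray( double *base , int arraySize )
{
    double sum = 0.0;
    for( int i = 0 ; i < arraySize ; ++i )
        sum += base[i];
    return sum / arraySize;
}

// Variance: average of the squared deviations from the mean.
double varianceArray( double *base , int arraySize )
{
    double mean = meanArray( base , arraySize );
    double sum = 0.0;
    for( int i = 0 ; i < arraySize ; ++i )
        sum += square( mean - base[i] );
    return sum / arraySize;
}
```

For the array {2, 4, 4, 4, 5, 5, 7, 9} the mean is 40 / 8 = 5. The squared deviations are 9, 1, 1, 1, 0, 0, 4, and 16, which sum to 32, so the variance is 32 / 8 = 4.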

Requirement 3: ecg.hpp file

  • R3.1 Make a function that computes the average, variance, max, and min of an ECG signal. A function cannot return four separate values, so we will use a struct type Statistics to return them in a single object.
struct Statistics
{
    double average;
    double variance;
    double max;
    double min;
};
Statistics analyzeECG( double *base , int arraySize)
{
    // Logic here (4 lines)
}

Use the four functions we already implemented in arrays.hpp. Don’t forget to #include "arrays.hpp" in the current header file ecg.hpp.
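The idea of returning several results in one struct can be sketched as follows. To keep the example self-contained, it carries compact stand-ins for the four arrays.hpp functions (all assuming arraySize is greater than zero). Placing Statistics and analyzeECG inside a namespace ecg is an assumption based on the ecg::analyzeECG call used later in this assignment.

```cpp
// Compact stand-ins for the arrays.hpp functions (assume arraySize > 0).
double maxArray( double *base , int arraySize )
{
    double best = base[0];
    for( int i = 1 ; i < arraySize ; ++i )
        if( base[i] > best ) best = base[i];
    return best;
}

double minArray( double *base , int arraySize )
{
    double best = base[0];
    for( int i = 1 ; i < arraySize ; ++i )
        if( base[i] < best ) best = base[i];
    return best;
}

double meanArray( double *base , int arraySize )
{
    double sum = 0.0;
    for( int i = 0 ; i < arraySize ; ++i ) sum += base[i];
    return sum / arraySize;
}

double varianceArray( double *base , int arraySize )
{
    double mean = meanArray( base , arraySize );
    double sum = 0.0;
    for( int i = 0 ; i < arraySize ; ++i )
        sum += ( mean - base[i] ) * ( mean - base[i] );
    return sum / arraySize;
}

namespace ecg  // assumption: the namespace wrapping the contents of ecg.hpp
{
struct Statistics
{
    double average;
    double variance;
    double max;
    double min;
};

// Fill one Statistics object and return it by value.
Statistics analyzeECG( double *base , int arraySize )
{
    Statistics stats;
    stats.average  = meanArray( base , arraySize );
    stats.variance = varianceArray( base , arraySize );
    stats.max      = maxArray( base , arraySize );
    stats.min      = minArray( base , arraySize );
    return stats;
}
}
```

The caller then reads the individual results as members, for example stats.average or stats.max.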

Requirement 4: dna.hpp file and revisiting arrays.hpp file

Revisit arrays.hpp file

  • R4.1 Make a function that counts a given character in an array of characters, using the following declaration:
int countCharacter( char *array , int size , char query )
{
    // Logic here
} 

Now dna.hpp file

  • R4.1 Make a namespace dna that will contain our functions.
  • R4.2 Implement the complementaryBase function you wrote in the first week using either if, else if, else or switch-case, with the following declaration:
char complementaryBase( char base )
{
    // Logic here
}
  • R4.3 Implement complementarySequence function that returns the complementary DNA sequence.

Please beware that the double strands of our DNA are directional, and they have opposite directions.

For example, the sequence ACG has a complementary sequence CGT, not TGC.

So in your for loop, you may read the original sequence from the beginning and write the complementary sequence starting from the end of the complementary sequence array.

By the way, you have to allocate the complementary sequence on the heap (dynamic array) at the beginning of the function (using the given size).

Use the following declaration:

char * complementarySequence( char *base, int size )
{
    // Your logic here
}
  • R4.4 Implement the analyzeDNA function, which counts the 4 bases in a sequence and returns the complementary sequence. Again, you have four counters and a complementary sequence, so you return only the complementary sequence, and the counters are saved back through reference integers. Remember to use arrays::countCharacter and to #include "arrays.hpp" in the current file. Use the following declaration:
char *analyzeDNA( char *base, int size, int &countA, int &countC, int &countG, int &countT )
{
    // Your logic here (5 lines).
}
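The pieces of Requirement 4 can be sketched together as below. Several choices are assumptions made for the illustration: countCharacter is placed in a namespace arrays, complementaryBase returns any unknown character unchanged, and the returned array holds exactly size characters with no terminating '\0', so the caller must keep track of the size and release the memory with delete[].

```cpp
namespace arrays  // assumption: namespace wrapping the arrays.hpp functions
{
// Count how many times query appears in the first size characters.
int countCharacter( char *array , int size , char query )
{
    int count = 0;
    for( int i = 0 ; i < size ; ++i )
        if( array[i] == query ) ++count;
    return count;
}
}

namespace dna
{
char complementaryBase( char base )
{
    switch( base )
    {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default : return base;  // assumption: leave unknown characters as-is
    }
}

char *complementarySequence( char *base , int size )
{
    // Heap allocation; the caller owns the result and must delete[] it.
    char *complement = new char[ size ];

    // Read forward, write backward: the two strands run in opposite directions.
    for( int i = 0 ; i < size ; ++i )
        complement[ size - 1 - i ] = complementaryBase( base[i] );

    return complement;
}

char *analyzeDNA( char *base , int size ,
                  int &countA , int &countC , int &countG , int &countT )
{
    // The counters are references, so these writes are visible to the caller.
    countA = arrays::countCharacter( base , size , 'A' );
    countC = arrays::countCharacter( base , size , 'C' );
    countG = arrays::countCharacter( base , size , 'G' );
    countT = arrays::countCharacter( base , size , 'T' );
    return complementarySequence( base , size );
}
}
```

For the sequence ACG, the loop writes T into the last position, G into the middle, and C into the first, giving CGT as in the example above, with one A, one C, one G, and no T counted.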

Generating Executables and Testing Output

Compiling and Testing calculator.cpp

$ g++ calculator.cpp -o Calculator
$ ./Calculator 24 / 7
3.42857
$ ./Calculator 24 x 7
168

Compiling and Testing heron.cpp

$ g++ heron.cpp -o Heron
$ ./Heron 3 4 5
6

Compiling and Testing analyzeECG.cpp

We compile and test using an ECG dataset stored in datasets/ecg_data.txt.

In its main function, our application loads the data from the file and then uses the ecg::analyzeECG function implemented in ecg.hpp.

$ g++ analyzeECG.cpp -o AnalyzeECG
$ ./AnalyzeECG datasets/ecg_data.txt
ECG average : 0.82964
ECG variance: 0.00865574
ECG range   : (0.592,1.408)

Compiling and Testing analyzeDNA.cpp

We compile and test the program twice using:

  1. a DNA dataset stored in datasets/covid19.fasta.
  2. a DNA dataset stored in datasets/hepatitis_c_virus_genome.txt

In its main function, our application loads the data from the file and then uses the dna::analyzeDNA function implemented in dna.hpp.

$ g++ analyzeDNA.cpp -o AnalyzeDNA
$ ./AnalyzeDNA datasets/hepatitis_c_virus_genome.txt
Adenine (A) content:??
Guanine (G) content:??
Cytocine(C) content:??
Thymine (T) content:??

Complementary Sequence:
??
$ ./AnalyzeDNA datasets/covid19.fasta
Adenine (A) content:??
Guanine (G) content:??
Cytocine(C) content:??
Thymine (T) content:??

Complementary Sequence:
??