Part 2: Problem definition

The available information consists of the food groups (with their current intake and their nutrient and ghge content per unit of intake) and the constraints.

foods <- read.csv('data/foods.csv', sep = ',')
data.table::setDT(foods) # use data.table format
head(foods)  # print the first 6 rows
           food intake energy protein   fat carbs sugar alcohol  ghge
         <char>  <num>  <num>   <num> <num> <num> <num>   <num> <num>
1:        Bread  175.4 10.696   0.091 0.030 0.441 0.002       0 0.001
2: Other grains   45.0 14.022   0.100 0.042 0.607 0.011       0 0.002
3:        Cakes   35.6 14.185   0.067 0.152 0.424 0.185       0 0.002
4:     Potatoes   67.8  3.791   0.021 0.007 0.178 0.000       0 0.000
5:   Vegetables  154.6  1.565   0.015 0.008 0.050 0.005       0 0.001
6:      Legumes    3.5  8.571   0.143 0.029 0.286 0.000       0 0.001

The constraints give a lower and an upper limit for each nutrient and for ghge:

  constraint energy protein  fat carbs sugar alcohol ghge
1      lower   9000    55.0 61.8 250.0   0.0       0  0.0
2      upper  10000   111.5 98.8 334.6  54.8      10  4.7

Formulation

Aim: find a diet that satisfies the nutritional and environmental constraints while staying as close as possible to the current diet.

Notation

We use the following notation:

  • \(x_1, x_2, ..., x_{k}\) are the target food intakes (in grams, or other units) for the \(k\) food groups.
  • \(X_1, X_2, ..., X_{k}\) are the current food intakes (in grams, or other units).

For the constraints,

  • \(e_1, ..., e_k\): energy per unit of intake for each of the food groups
    • \(E = x_1 e_1 + ... + x_k e_k\) is the total energy over all foods, which must lie between \(E_{lower}\) and \(E_{upper}\)
    • For example, with the data we have, this range is 9000 to 10000.
  • \(p_1, ..., p_k\): protein
  • \(f_1, ..., f_k\): fat
  • \(c_1, ..., c_k\): carbs
  • \(s_1, ..., s_k\): sugar
  • \(a_1, ..., a_k\): alcohol
  • \(g_1, ..., g_k\): ghge
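
To make the notation concrete, here is a minimal sketch that computes nutrient totals for the current diet. It uses only the six food groups printed by head(foods) above, not the full table, so the totals are partial; the object names foods6, coef_mat and totals are chosen here for illustration.

```r
# First six food groups from the head(foods) output (NOT the full table)
foods6 <- data.frame(
  food    = c("Bread", "Other grains", "Cakes", "Potatoes", "Vegetables", "Legumes"),
  intake  = c(175.4, 45.0, 35.6, 67.8, 154.6, 3.5),          # X_i, current intake
  energy  = c(10.696, 14.022, 14.185, 3.791, 1.565, 8.571),  # e_i
  protein = c(0.091, 0.100, 0.067, 0.021, 0.015, 0.143)      # p_i
)

# Total energy of the current diet: X_1 e_1 + ... + X_k e_k
total_energy <- sum(foods6$intake * foods6$energy)

# All nutrient totals at once: t(coefficient matrix) %*% intake
coef_mat <- as.matrix(foods6[, c("energy", "protein")])
totals <- drop(crossprod(coef_mat, foods6$intake))
print(totals)
```

The same expression with a target diet in place of foods6$intake gives the left-hand side of each constraint.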

Optimization

Find a set of \(x_1, ..., x_k\) that

minimises the sum of squared differences between the current diet and the target diet:

\((x_1 - X_1)^2 + (x_2 - X_2)^2 + ... + (x_k - X_k)^2\)

and satisfy the following constraints:

\(x_1, ..., x_k \geq 0\) (a realistic diet intake cannot be negative)

\(x_1 e_1 + x_2 e_2 + ... + x_k e_k \geq E_{lower}\), total energy above the lower limit

\(x_1 e_1 + x_2 e_2 + ... + x_k e_k \leq E_{upper}\), total energy below the upper limit

\(x_1 p_1 + x_2 p_2 + ... + x_k p_k \geq P_{lower}\), total protein above the lower limit

\(x_1 p_1 + x_2 p_2 + ... + x_k p_k \leq P_{upper}\), total protein below the upper limit

And so on for fat, carbs, sugar, alcohol and ghge.
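
Before optimising, it can help to check whether a given diet already satisfies the constraints. The sketch below does this for energy and protein, using the six food groups shown earlier and the limits from the constraints table. The helper check_constraints is hypothetical (written for this illustration), and because only six food groups are included, the totals are partial and fall below the lower limits.

```r
# Hypothetical helper: total of each nutrient for diet x, compared with its limits.
#   coef:         k x m matrix, one row per food group, one column per nutrient
#   lower, upper: named numeric vectors, names matching colnames(coef)
check_constraints <- function(x, coef, lower, upper) {
  nutrients <- colnames(coef)
  totals <- drop(crossprod(coef, x))  # x_1 c_1 + ... + x_k c_k for each nutrient
  data.frame(
    nutrient = nutrients,
    total    = totals,
    lower    = lower[nutrients],
    upper    = upper[nutrients],
    ok       = totals >= lower[nutrients] & totals <= upper[nutrients],
    row.names = NULL
  )
}

# First six food groups only (from the head(foods) output)
coef <- cbind(
  energy  = c(10.696, 14.022, 14.185, 3.791, 1.565, 8.571),
  protein = c(0.091, 0.100, 0.067, 0.021, 0.015, 0.143)
)
current <- c(175.4, 45.0, 35.6, 67.8, 154.6, 3.5)

# Limits taken from the constraints table
lower <- c(energy = 9000,  protein = 55.0)
upper <- c(energy = 10000, protein = 111.5)

res <- check_constraints(current, coef, lower, upper)
print(res)
```

With the full foods table, the same call covers all food groups and all seven constraint columns.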

Solve the optimization problem

This setting is a quadratic program (QP): an optimization problem with a quadratic objective and linear inequality constraints. There are no equality constraints in this setting.
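
To see why, here is a sketch in vector notation. The symbols \(x\), \(X\), \(D\), \(d\), \(A\) and \(b\) are introduced here for illustration: \(x = (x_1, ..., x_k)^\top\), \(X = (X_1, ..., X_k)^\top\), and \(e = (e_1, ..., e_k)^\top\). Expanding the objective gives

\[
(x - X)^\top (x - X) = x^\top x - 2 X^\top x + X^\top X .
\]

The last term does not depend on \(x\), so minimising the objective is equivalent to solving

\[
\min_x \; \tfrac{1}{2} x^\top D x - d^\top x
\quad \text{subject to} \quad A^\top x \geq b,
\qquad D = 2I, \; d = 2X .
\]

Each constraint becomes one column of \(A\) and one entry of \(b\). The lower energy limit uses the column \(e\) with entry \(E_{lower}\). The upper limit is rewritten as \(-e^\top x \geq -E_{upper}\), so it uses the column \(-e\) with entry \(-E_{upper}\). Non-negativity contributes the identity matrix with entries \(0\). The other nutrients and ghge follow the same pattern.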

In R, several functions can be used to find a solution:

  • nloptr in the nloptr package (non-linear optimization),
  • constrOptim in the stats package, which relies on the optim function,
  • solve.QP in the quadprog package,

among others.
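
As a minimal sketch of the solve.QP route, the example below uses the six food groups shown earlier and an energy constraint only. The limits 3600 and 4000 are hypothetical, chosen so that this small subset has a feasible solution close to the current intake; they are not the limits from the constraints table. solve.QP minimises \(\tfrac{1}{2} x^\top D x - d^\top x\) subject to \(A^\top x \geq b\), so the objective is passed as Dmat = 2I and dvec = 2X, and upper limits are passed with their sign flipped.

```r
library(quadprog)

# Current intake X_i and energy e_i for the first six food groups
X <- c(175.4, 45.0, 35.6, 67.8, 154.6, 3.5)
e <- c(10.696, 14.022, 14.185, 3.791, 1.565, 8.571)
k <- length(X)

# HYPOTHETICAL energy limits for this six-food subset (not from the data)
E_lower <- 3600
E_upper <- 4000

# Objective: sum((x - X)^2) = x'x - 2 X'x + constant
Dmat <- 2 * diag(k)
dvec <- 2 * X

# Constraints, one per column of Amat, in the form t(Amat) %*% x >= bvec:
#   e'x >= E_lower,  -e'x >= -E_upper,  x_i >= 0
Amat <- cbind(e, -e, diag(k))
bvec <- c(E_lower, -E_upper, rep(0, k))

sol <- solve.QP(Dmat, dvec, Amat, bvec, meq = 0)  # meq = 0: no equality constraints
x_new <- sol$solution

print(round(x_new, 2))   # target intakes
print(sum(x_new * e))    # total energy of the target diet
print(sum((x_new - X)^2)) # sum of squared differences from the current diet
```

For the full problem, Amat would gain two columns per constrained nutrient (one for each limit), built in the same way from the corresponding column of the foods table.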