Note on the input data

This note describes the input that the current implementation of the optimization algorithm requires.

What does the algorithm require

The current aim of the optimization is to find a set of values (‘diet’) that is similar to the current diet, yet satisfies some constraints on nutrition and environmental impact.

For the objective function,

  • a vector of the current diet (in grams), diet0. This is used to compute the deviation (sum of squares) between the new (target) diet and the current diet.
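
The objective described above can be sketched as a small R function. This is a minimal illustration, not the package's own code: the function name objective_ssq and the candidate diet are hypothetical, while the three intake values come from the demo data shown at the end of this note.

```r
# Sum of squared deviations between a candidate diet and the current diet.
# Both vectors are in grams and must list the foods in the same order.
objective_ssq <- function(diet, diet0) {
  stopifnot(length(diet) == length(diet0))
  sum((diet - diet0)^2)
}

# Current intake of bread, vegetables and red meat from the demo data (grams)
diet0 <- c(bread = 188.31866, vegetables = 72.79364, red_meat = 165.98669)

# Hypothetical candidate: 20 g more vegetables, 20 g less red meat
diet_new <- diet0 + c(0, 20, -20)

objective_ssq(diet0, diet0)     # 0: identical to the current diet
objective_ssq(diet_new, diet0)  # about 800 (20^2 + 20^2)
```

A smaller value means the candidate stays closer to the current diet.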

For the inequality constraints (standardized or original),

  • a list of constraint values, ordered by tag_outcome (e.g. energy, ghge)
  • inside each tag_outcome such as energy,
    • unit_contrib for each food: a vector of size n
    • lwr, upr: lower and upper constraint bounds. These are the values after any reduction (e.g. on ghge) has been applied.

Constraint bounds

The values of the constraint bounds need to be pre-computed before they enter the algorithm. This means that any reduction on ghge has already been applied to the bounds.
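
One possible layout for such a constraint list is sketched below. The object name constraint_list and the exact nesting are assumptions made for illustration; the coefficients and bounds are the raw energy and ghge values from the demo data at the end of this note. In those demo values the ghge bounds equal 0.9 and 1.0 times the current total, so no reduction appears to be applied there.

```r
# Hypothetical layout: one element per tag_outcome, each holding the
# per-food coefficients (length n) and the pre-computed bounds.
constraint_list <- list(
  energy = list(
    unit_contrib = c(10.695553, 3.790560, 1.565330, 2.728863, 8.341837,
                     6.086331, 6.178862, 1.979745, 13.502304),
    lwr = 5402.512347,
    upr = 6002.791497
  ),
  ghge = list(
    unit_contrib = c(0.00107, 0.00037, 0.00103, 0.00072, 0.01294,
                     0.00311, 0.00215, 0.00143, 0.01030),
    lwr = 3.064101,
    upr = 3.404557
  )
)

# Basic sanity check: every outcome carries n coefficients and lwr <= upr.
n_foods <- 9
ok <- vapply(constraint_list, function(x) {
  length(x$unit_contrib) == n_foods && x$lwr <= x$upr
}, logical(1))
ok
```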

How are constraints computed

The total contribution of a diet to a particular nutrition or environmental impact outcome (tag_outcome) is a weighted sum: each food intake (diet, in grams) is multiplied by that food's contribution per gram to the outcome, and the products are summed over all foods. For instance, the current diet, which includes about 188 g of bread and 166 g of red meat, contributes in total X1 units of energy and X2 units of ghge.
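
The weighted sum can be reproduced from the demo data printed at the end of this note. The helper name total_contrib is hypothetical; the numbers are copied from demo_input, so the results match the total_contrib_raw column only up to the rounding of the printed values.

```r
# Mean intake (grams) and per-gram contributions for the nine demo foods
intake <- c(188.31866, 72.79364, 165.98669, 184.13142, 126.26154,
            74.61885, 26.41185, 328.64505, 46.59652)
energy <- c(10.695553, 3.790560, 1.565330, 2.728863, 8.341837,
            6.086331, 6.178862, 1.979745, 13.502304)
ghge   <- c(0.00107, 0.00037, 0.00103, 0.00072, 0.01294,
            0.00311, 0.00215, 0.00143, 0.01030)

# Total contribution of a diet to one outcome: sum over foods of
# intake times contribution per gram.
total_contrib <- function(diet, unit_contrib) sum(diet * unit_contrib)

total_contrib(intake, energy)  # about 6002.79
total_contrib(intake, ghge)    # about 3.40
```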

The current diet is the average intake of each food group among all subjects from whom we collected data. The lower and upper intake bounds (lwr, upr) of the current diet are used to limit the search region for the new diet. In the current implementation, they are the 5% and 95% quantiles across all subjects.
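
As a rough sketch of how these intake bounds restrict the search region, the check below tests whether a candidate diet lies inside the box defined by intake_lwr and intake_upr. The function name within_bounds and the candidate diets are hypothetical; the bounds are the demo values for bread, vegetables and red meat.

```r
# Intake bounds (grams) for three demo foods
intake_lwr <- c(bread = 18.831866, vegetables = 7.279364, red_meat = 16.598669)
intake_upr <- c(bread = 343.8, vegetables = 230.7, red_meat = 419.7)

# TRUE only if every food intake lies within its own bounds
within_bounds <- function(diet, lwr, upr) {
  all(diet >= lwr & diet <= upr)
}

within_bounds(c(150, 100, 120), intake_lwr, intake_upr)  # TRUE
within_bounds(c(150, 100, 450), intake_lwr, intake_upr)  # FALSE: red meat above upr
```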

Each inequality constraint (e.g. energy) requires two values: constr_min and constr_max. The computed total contribution of the new diet needs to lie between these two. In the current implementation:

  • the minimum (lower bound) is 0.9 times the total contribution of the current diet
  • the maximum (upper bound) is equal to the total contribution of the current diet
  • if we want to reduce ghge, both values above are multiplied by a reduction factor.
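
The rules above can be sketched as follows. The function make_bounds and its arguments are hypothetical names, and the 10% ghge reduction in the last call is only an example value; the totals are the total_contrib_raw values from the demo constraints.

```r
# Bounds for one outcome, derived from the total contribution of the
# current diet. `reduction` scales both bounds (1 means no reduction).
make_bounds <- function(total_contrib, min_ratio = 0.9, reduction = 1) {
  c(constr_min = min_ratio * total_contrib * reduction,
    constr_max = total_contrib * reduction)
}

make_bounds(6002.791497)                # energy: about 5402.51 and 6002.79
make_bounds(3.404557)                   # ghge, no reduction: about 3.0641 and 3.4046
make_bounds(3.404557, reduction = 0.9)  # ghge with an example 10% reduction
```

The first two calls agree with constr_min_raw and constr_max_raw in the demo constraints.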

In addition to the raw values, we also implement a standardized version for each tag_outcome.

Rationale for standardization

We wish to have roughly the same scale for the different tag_outcomes. The current implementation takes the standard deviation of the per-food contributions for a specific tag (e.g. energy), then divides the contributions by this value. This is only ONE of many ways to standardize for numerical stability.
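
The standardization can be checked against the demo data. The sketch below assumes that std_coef in the demo constraints is the reciprocal of the sample standard deviation (R's sd()) of the per-food coefficients; this matches the printed energy value, but it is an inference from the numbers rather than something taken from the package code.

```r
# Per-gram energy contributions of the nine demo foods
energy <- c(10.695553, 3.790560, 1.565330, 2.728863, 8.341837,
            6.086331, 6.178862, 1.979745, 13.502304)

# Dividing by the standard deviation is the same as multiplying by 1 / sd
std_coef <- 1 / sd(energy)
std_coef                      # about 0.2433

energy_std <- energy * std_coef
sd(energy_std)                # 1 after standardization

# The raw total contribution and its bounds scale by the same coefficient
total_raw <- 6002.791497
total_raw * std_coef          # about 1460.28
```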

Alternatively, it is also possible to multiply ghge by a fixed constant such as 1000, which may be easier to interpret. As long as the original diet vector is left intact (meaning that the ratios between the original intakes of food 1, food 2, and so on are unchanged), the coefficients can be rescaled freely.

However, it is important to stay consistent: whatever scaling is applied to the coefficients must also be applied to the bounds used in the inequality function!
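
The point about consistency can be illustrated with a small check. The function satisfies, the factor 1000 and the candidate diet (95% of the current intake) are hypothetical choices; the ghge coefficients and raw bounds come from the demo data.

```r
intake <- c(188.31866, 72.79364, 165.98669, 184.13142, 126.26154,
            74.61885, 26.41185, 328.64505, 46.59652)
ghge   <- c(0.00107, 0.00037, 0.00103, 0.00072, 0.01294,
            0.00311, 0.00215, 0.00143, 0.01030)
lwr <- 3.064101
upr <- 3.404557

# Inequality check for one outcome: is the total within [lwr, upr]?
satisfies <- function(diet, unit_contrib, lwr, upr) {
  total <- sum(diet * unit_contrib)
  total >= lwr && total <= upr
}

diet_new <- 0.95 * intake  # hypothetical candidate diet
k <- 1000                  # arbitrary scaling constant

satisfies(diet_new, ghge, lwr, upr)              # TRUE on the raw scale
satisfies(diet_new, ghge * k, lwr * k, upr * k)  # TRUE: same answer when both are scaled
satisfies(diet_new, ghge * k, lwr, upr)          # FALSE: scaled coefficients, raw bounds
```

Scaling the coefficients without scaling the bounds changes the answer, which is the inconsistency to avoid.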

demo_input <- readRDS('data/demo_9foods_input.rda')
demo_input$current_diet
       food_name intake_mean intake_lwr intake_upr
1          Bread   188.31866  18.831866      343.8
2     Vegetables    72.79364   7.279364      230.7
3       Red meat   165.98669  16.598669      419.7
4  Milk, yoghurt   184.13142  18.413142      552.7
5           Fish   126.26154  12.626154      299.6
6         Cheese    74.61885   7.461885      302.9
7           Eggs    26.41185   2.641185      111.6
8 Fruit, berries   328.64505  32.864505      900.6
9       Potatoes    46.59652   4.659652      121.6
demo_input$unit_contrib
       food_name    energy     protein       carbs         fat    vitaminc
1          Bread 10.695553 0.091220068 0.441277081 0.030216648 0.005701254
2     Vegetables  3.790560 0.020648968 0.178466077 0.007374631 0.132743363
3       Red meat  1.565330 0.014877102 0.049805951 0.008408797 0.206985770
4  Milk, yoghurt  2.728863 0.007580175 0.134110787 0.004081633 0.198250729
5           Fish  8.341837 0.172619048 0.013605442 0.139455782 0.042517007
6         Cheese  6.086331 0.169784173 0.024460432 0.074820144 0.000000000
7           Eggs  6.178862 0.130081301 0.004065041 0.105691057 0.000000000
8 Fruit, berries  1.979745 0.035935969 0.055864097 0.011107481 0.000000000
9       Potatoes 13.502304 0.216589862 0.048387097 0.241935484 0.000000000
     calcium    ghge
1 0.33637400 0.00107
2 0.08849558 0.00037
3 0.25873221 0.00103
4 0.15160350 0.00072
5 0.11054422 0.01294
6 0.24460432 0.00311
7 0.52845529 0.00215
8 1.28062725 0.00143
9 6.58986175 0.01030
demo_constraints <- readRDS('data/demo_9foods_constraints.rda')
demo_constraints
  tag_outcome total_contrib_raw total_contrib_std    std_coef constr_min_std
1      energy       6002.791497         1460.2828   0.2432673      1314.2545
2     protein         82.349152         1034.1940  12.5586475       930.7746
3       carbs        153.317587         1108.5790   7.2306054       997.7211
4         fat         49.280653          606.6639  12.3103864       545.9975
5    vitaminc         86.965858          976.6934  11.2307682       879.0241
6     calcium        914.751986          434.7877   0.4753067       391.3090
7        ghge          3.404557          736.7069 216.3885105       663.0362
  cosntr_max_std constr_min_raw constr_max_raw
1      1460.2828    5402.512347    6002.791497
2      1034.1940      74.114237      82.349152
3      1108.5790     137.985828     153.317587
4       606.6639      44.352588      49.280653
5       976.6934      78.269272      86.965858
6       434.7877     823.276788     914.751986
7       736.7069       3.064101       3.404557