# Solved – Weighted entropy as a measure of diversity

Suppose that you are a company manager and you are looking for a statistical measure that defines the international reputation of your company. So, you collect data on your clients and the countries they come from.

You want this measure to have the following properties:

1. The larger the number of countries your clients come from, the more international your company is.

2. If all of your clients are from a single country, i.e., the number of countries is 1, your international reputation should be the lowest value (let's say 0 for convenience).

3. Unfortunately, you have to discriminate between the countries on your client list. For business reasons, you prefer to have more clients from high-income countries and fewer clients from low-income countries. There are also countries that you can't deal with at all because of sanctions, embargoes or similar reasons.

Let's say you have the following data:

1. \$n_i\$: The number of clients from the i-th country.

2. \$p_i\$: The fraction of your clients that come from the i-th country.

3. \$w_i\$: The desirability of dealing with clients from the i-th country.

4. \$N\$: The number of countries on your client list. (The total number of clients is \$\sum_i n_i\$, so \$p_i = n_i / \sum_j n_j\$.)

I am currently reading about Rényi entropy as a potential method. Rényi entropy of order \$q\$ is given by the formula

\$\$^qH = \frac{1}{1-q} \ln\left(\sum_{i=1}^{N}p_i^q\right)\$\$
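To make the formula concrete, here is a minimal Python sketch of it. The function name `renyi_entropy` and the example shares are made up for illustration; the \$q \to 1\$ case is handled separately because the formula divides by \$1-q\$, and in that limit it becomes the Shannon entropy.

```python
import math

def renyi_entropy(p, q):
    """Rényi entropy of order q (natural log) of a probability vector p."""
    p = [x for x in p if x > 0]  # countries with no clients contribute nothing
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: Shannon entropy
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

# Hypothetical client shares, for illustration only
single_country = [1.0]
four_equal = [0.25, 0.25, 0.25, 0.25]

h_single = renyi_entropy(single_country, q=2)  # zero: no diversity
h_four = renyi_entropy(four_equal, q=2)        # ln(4): maximal for 4 countries
print(h_single, h_four)
```

Note that the unweighted entropy already satisfies properties 1 and 2: it is zero for a single country and equals \$\ln N\$ when clients are spread evenly over \$N\$ countries.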

And I'm thinking of modifying it to a new formula taking "desirability coefficients" \$w_i\$ into account. My suggestion is this:

\$\$^qH = \frac{1}{1-q} \ln\left(\sum_{i=1}^{N}w_ip_i^q\right)\$\$

Do you think this is a valid method? Are there any better methods or suggestions available?
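One quick way to probe the proposal is to test it against property 2. With a single country, \$p_1 = 1\$, so the sum reduces to \$w_1\$ and the formula gives \$\ln(w_1)/(1-q)\$, which is zero only when \$w_1 = 1\$. The sketch below checks this numerically; the function name `weighted_renyi` and the weights are hypothetical, and only \$q \neq 1\$ is covered.

```python
import math

def weighted_renyi(p, w, q):
    """Proposed weighted variant: ln(sum_i w_i * p_i**q) / (1 - q), for q != 1."""
    if abs(q - 1.0) < 1e-12:
        raise ValueError("q = 1 is not covered by this sketch")
    total = sum(wi * pi ** q for pi, wi in zip(p, w) if pi > 0)
    return math.log(total) / (1.0 - q)

# All clients from one country, with a hypothetical desirability weight of 2
h_single = weighted_renyi([1.0], [2.0], q=2)

# With all weights equal to 1, the ordinary Rényi entropy is recovered
h_plain = weighted_renyi([0.5, 0.5], [1.0, 1.0], q=2)

print(h_single)  # equals -ln(2), so property 2 is violated
print(h_plain)   # equals ln(2)
```

So, as written, the weights would have to be constrained or normalized somehow for the single-country case to come out as 0.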

UPDATE:

One way to take the desirability of trade into account is to update the probabilities in this way:

\$\$\tilde p_i = \frac{w_i\cdot n_i}{\sum_{i=1}^N(w_i \cdot n_i)}\$\$

But I'm not sure if this actually works. How can I know that it's a good choice?
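A partial sanity check: as long as the weights are non-negative and at least one country has both positive weight and clients, the \$\tilde p_i\$ are non-negative and sum to 1, so they form a proper probability distribution and any entropy computed from them still equals 0 when only one country carries weight. A minimal sketch, with made-up counts and weights and a hypothetical helper name `rescale_by_counts`:

```python
def rescale_by_counts(n, w):
    """Return p~_i = w_i * n_i / sum_j (w_j * n_j)."""
    weighted = [wi * ni for ni, wi in zip(n, w)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("need at least one country with positive weight and clients")
    return [x / total for x in weighted]

# Hypothetical data: client counts and desirability weights for three countries
n = [50, 30, 20]
w = [2.0, 1.0, 0.0]  # weight 0 models a country you cannot deal with

p_tilde = rescale_by_counts(n, w)
print(p_tilde)  # shares proportional to 100 : 30 : 0
```

A country with weight 0 drops out entirely, which matches the sanctions case in property 3.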


You are looking for an effective (ad hoc) formula, so you have a lot of freedom in how you define it.

For rescaling I would do something slightly different:

\$\$ \tilde{p}_i = \frac{w_i p_i}{\sum_i w_i p_i} \$\$

where \$\tilde{p}_i\$ is the estimated fraction of money coming from country \$i\$, so that if, e.g., Germans pay twice as much as Poles, you weight them twice as much.

Then you can use \$H_q(\{\tilde{p}_i\})\$.
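Put together, the two steps might look like the sketch below. The helper names and the example numbers are assumptions for illustration, not part of the original answer. Since \$p_i = n_i / \sum_j n_j\$, the common factor cancels and the rescaled shares come out the same whether you start from counts \$n_i\$ or fractions \$p_i\$.

```python
import math

def rescale(p, w):
    """Return p~_i = w_i * p_i / sum_j (w_j * p_j)."""
    weighted = [wi * pi for pi, wi in zip(p, w)]
    total = sum(weighted)
    return [x / total for x in weighted]

def renyi_entropy(p, q):
    """Rényi entropy of order q (natural log); q = 1 gives Shannon entropy."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** q for x in p)) / (1.0 - q)

# Hypothetical client shares and per-client payment weights
p = [0.5, 0.3, 0.2]
w = [2.0, 1.0, 1.0]  # clients from the first country pay twice as much

p_tilde = rescale(p, w)
h_clients = renyi_entropy(p, q=2)       # diversity of the client base
h_money = renyi_entropy(p_tilde, q=2)   # diversity of the money
print(p_tilde, h_clients, h_money)
```

In this example the largest country is also the one that pays most, so the money is more concentrated than the client base and the weighted entropy is lower than the unweighted one.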

BTW: For communicating results I would rather use some diversity index (see https://stats.stackexchange.com/a/135153/6552 or https://stats.stackexchange.com/a/144235/6552). Then you can say something like "we have money coming from effectively 3.4 countries".
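Assuming the diversity index meant here is the Hill number (the "effective number" of categories), it is just the exponential of the Rényi entropy: \$\exp(H_q) = \left(\sum_i p_i^q\right)^{1/(1-q)}\$. A minimal sketch with a hypothetical function name and made-up shares:

```python
import math

def effective_number(p, q):
    """Hill number of order q: the exponential of the Rényi entropy of order q."""
    p = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of the Shannon entropy
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

# Hypothetical (possibly rescaled) shares, for illustration only
even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.2, 0.1]

n_even = effective_number(even, q=2)      # four equally represented countries
n_skewed = effective_number(skewed, q=2)  # dominated by one country
print(n_even, n_skewed)
```

An even split over four countries counts as 4 effective countries, while the skewed example counts as fewer than 2 even though three countries are present.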
