Assignment 4

In this assignment, you'll combine the assignment 3 data set with nutrition data from the USDA Food Composition Databases. The CSV file fresh.csv contains the fresh fruits and vegetables data you extracted in assignment 3.

The USDA Food Composition Databases have a documented web API that returns data in JSON format. You need a key in order to use the API. Only 1000 requests are allowed per hour, so it would be a good idea to use caching.
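As a rough illustration of why caching helps with the hourly limit, the sketch below memoizes responses by URL and parameters, so a repeated request never reaches the server. The names CachedFetcher and fake_fetch are hypothetical, and fake_fetch is only a stand-in for a real HTTP call; a caching library such as requests_cache does the equivalent work transparently.

```python
class CachedFetcher(object):
    """Memoize responses by (url, params). Hypothetical helper for illustration."""

    def __init__(self, fetch):
        self.fetch = fetch   # function(url, params) -> response data
        self.cache = {}
        self.misses = 0      # number of calls that actually reached fetch

    def get(self, url, params):
        # dicts are not hashable, so build the cache key from the sorted items
        cache_key = (url, tuple(sorted(params.items())))
        if cache_key not in self.cache:
            self.misses += 1
            self.cache[cache_key] = self.fetch(url, params)
        return self.cache[cache_key]


def fake_fetch(url, params):
    # stand-in for requests.get(url, params=params).json()
    return {"url": url, "q": params["q"]}


fetcher = CachedFetcher(fake_fetch)
for term in ["kale raw", "okra raw", "kale raw"]:
    fetcher.get("https://api.nal.usda.gov/ndb/search", {"q": term, "format": "json"})
print(fetcher.misses)  # 2: the repeated "kale raw" search is served from the cache
```

Only two of the three searches count against the request limit here, which is the effect the cache is meant to have across repeated notebook runs.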

Sign up for an API key here. The key will work with any Data.gov API. You may need the key again later in the quarter, so make sure you save it.

These modules may be useful:

Exercise 1.1. Read the search request documentation, then write a function called ndb_search() that makes a search request. The function should accept the search term as an argument. The function should return the search result items as a list (for 0 items, return an empty list).

Note that the search url is: https://api.nal.usda.gov/ndb/search

As an example, a search for "quail eggs" should return this list:

[{u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'CHAOKOH, QUAIL EGG IN BRINE, UPC: 044738074186',
  u'ndbno': u'45094707',
  u'offset': 0},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'L&W, QUAIL EGGS, UPC: 024072000256',
  u'ndbno': u'45094890',
  u'offset': 1},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'BUDDHA, QUAIL EGGS IN BRINE, UPC: 761934535098',
  u'ndbno': u'45099560',
  u'offset': 2},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'GRAN SABANA, QUAIL EGGS, UPC: 819140010103',
  u'ndbno': u'45169279',
  u'offset': 3},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u"D'ARTAGNAN, QUAIL EGGS, UPC: 736622102630",
  u'ndbno': u'45178254',
  u'offset': 4},
 {u'ds': u'SR',
  u'group': u'Dairy and Egg Products',
  u'name': u'Egg, quail, whole, fresh, raw',
  u'ndbno': u'01140',
  u'offset': 5}]

As usual, make sure you document and test your function.

import requests
import requests_cache
from urlparse import urlunparse, urlparse
import pandas as pd
# Set up a cache for requests.
requests_cache.install_cache("cache")
key = "HwyOBVU5rcviLL3vPvvL9EHaQWEdMMoh3Tw6yU7Z"

def ndb_search(item):
    """
    Search the USDA NDB for a food.

    input: search term, e.g. 'quail eggs'
    output: list of search result items (empty list if there are no results)
    """
    url = "https://api.nal.usda.gov/ndb/search"
    response = requests.get(url, params = {
        "format": "json",
        "q": item,
        "sort": "n",    # sort the results by name
        "max": 25,
        "offset": 0,
        "api_key": key
    })
    data = response.json()
    # a search with zero results returns only an 'errors' entry
    if "errors" in data:
        return []
    return data[u'list'][u'item']

ndb_search("quail eggs")

[{u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'BUDDHA, QUAIL EGGS IN BRINE, UPC: 761934535098',
  u'ndbno': u'45099560',
  u'offset': 0},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'CHAOKOH, QUAIL EGG IN BRINE, UPC: 044738074186',
  u'ndbno': u'45094707',
  u'offset': 1},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u"D'ARTAGNAN, QUAIL EGGS, UPC: 736622102630",
  u'ndbno': u'45178254',
  u'offset': 2},
 {u'ds': u'SR',
  u'group': u'Dairy and Egg Products',
  u'name': u'Egg, quail, whole, fresh, raw',
  u'ndbno': u'01140',
  u'offset': 3},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'GRAN SABANA, QUAIL EGGS, UPC: 819140010103',
  u'ndbno': u'45169279',
  u'offset': 4},
 {u'ds': u'BL',
  u'group': u'Branded Food Products Database',
  u'name': u'L&W, QUAIL EGGS, UPC: 024072000256',
  u'ndbno': u'45094890',
  u'offset': 5}]
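One way to test the response handling without using up API requests is to separate the parsing step from the request. The sketch below assumes the response layout used in the code above (a 'list' entry holding 'item', or an 'errors' entry); the function name extract_items and both sample payloads are hypothetical and hand-written.

```python
def extract_items(payload):
    """Return the search result items from a decoded JSON payload.

    Returns an empty list when the payload reports errors (for example,
    zero results) instead of containing a 'list' entry.
    """
    if "errors" in payload:
        return []
    return payload.get("list", {}).get("item", [])


# Hand-written payloads shaped like the responses handled above.
found = {"list": {"item": [{"ndbno": "01140",
                            "name": "Egg, quail, whole, fresh, raw",
                            "group": "Dairy and Egg Products"}]}}
empty = {"errors": {"error": [{"message": "Your search resulted in zero results."}]}}

assert [item["ndbno"] for item in extract_items(found)] == ["01140"]
assert extract_items(empty) == []
```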

Exercise 1.2. Use your search function to get NDB numbers for the foods in the fresh.csv file. It's okay if you don't get an NDB number for every food, but try to come up with a strategy that gets most of them. Discuss your strategy in a short paragraph.

Hints:

The foods are all raw and unbranded.
You can test search terms with the online search page.
You can convert the output of ndb_search() to a data frame with pd.DataFrame().
The string methods for Python and Pandas are useful here. It's okay if you use simple regular expressions in the Pandas methods, although this exercise can be solved without them.
You can merge data frames that have a column in common with pd.merge().

(1) The foods are all raw and unbranded.
 1. Do I need to add "raw" to the search term?
    Yes. Without "raw", a search for cabbage returns results such as "Cabbage, kimchi", and kimchi is not a raw food.
    Some names contain both "raw" and "juice". Juice is not a fresh food, so those rows should be dropped.
 2. How can unbranded items be identified?
    2.1 For an unbranded fruit or vegetable, the group column has the value "Fruits and Fruit Juices" or "Vegetables and Vegetable Products".
    2.2 There are exceptions, such as "American Indian/Alaska Native Foods".
    2.3 For a branded item, the group column has the value "Branded Food Products Database".
 3. If searching for the food name plus "raw" returns nothing, try other search terms.
 4. Sometimes a search returns multiple NDB numbers. Try to pick the one that best matches the food.
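The rules above can be sketched on plain search result items before involving a data frame. This is only one possible reading of the strategy: the function name pick_raw_unbranded, the sample items, and their NDB numbers are made-up placeholders, and the whole-word check for "raw" is an added assumption rather than part of the original notes.

```python
import re


def pick_raw_unbranded(items):
    """Return the NDB number of the first item that looks raw and unbranded, or None."""
    wanted_groups = ("Fruit", "Vegetable")
    for item in items:
        group = item["group"]
        name = item["name"].lower()
        if "Branded" in group:
            continue    # rule 2.3: skip branded products
        if not any(word in group for word in wanted_groups):
            continue    # rule 2.1: keep the fruit and vegetable groups
        if "juice" in name or not re.search(r"\braw\b", name):
            continue    # rule 1: drop juices and foods that are not raw
        return item["ndbno"]
    return None


# Placeholder items; the NDB numbers are invented for this example.
sample = [
    {"group": "Branded Food Products Database", "name": "SOME BRAND, CABBAGE", "ndbno": "00001"},
    {"group": "Vegetables and Vegetable Products", "name": "Cabbage, kimchi", "ndbno": "00002"},
    {"group": "Vegetables and Vegetable Products", "name": "Cabbage, raw", "ndbno": "00003"},
]
print(pick_raw_unbranded(sample))  # 00003
```

Matching "raw" as a whole word matters because a plain substring test would also accept names such as "Strawberries, frozen", where "raw" appears inside "strawberries".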

# change ndb_search() so that it returns a data frame
def ndb_search(item):
    """
    input: search term, e.g. 'quail eggs'
    output: data frame of search results (one row per item, including the
            "ndbno" column), or the API's error message as a string if the
            search fails or has zero results
    """
    url = "https://api.nal.usda.gov/ndb/search"
    response = requests.get(url, params = {
        "format": "json", "q": item, "sort": "n", "max": 25, "offset": 0,
        "api_key": key
    })
    data = response.json()
    # a failed search (for example, zero results) returns only an 'errors' entry
    if "errors" in data:
        # return the error message
        return data[u'errors'][u'error'][0][u'message']
    # convert the list of result items to a data frame
    return pd.DataFrame(data[u'list'][u'item'])

fresh = pd.read_csv("fresh.csv")
print(fresh.shape)
print(len(fresh.food.unique()))
# The output below shows 47 unique food names but 49 rows in the fresh data set.
# Looking at the data frame, "cucumbers" and "cabbage" are the two foods that are repeated.
# Keep one row for each: drop row 26 (cucumbers) and row 36 (cabbage).
print(fresh[fresh["food"]=="cucumbers"].index.tolist())
fresh = fresh.drop(26)
print(fresh[fresh["food"]=="cabbage"].index.tolist())
fresh = fresh.drop(36)

(49, 7)
47
[25, 26]
[36, 37]
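The repeated foods can also be found programmatically rather than by reading through the data frame. The following is a minimal sketch on a hand-written list of names; duplicated_names is a hypothetical helper, and with the real data the list could come from fresh["food"].tolist().

```python
from collections import Counter


def duplicated_names(names):
    """Return the names that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hand-written stand-in for the food column.
foods = ["kale", "cucumbers", "cucumbers", "cabbage", "okra", "cabbage"]
print(duplicated_names(foods))  # ['cabbage', 'cucumbers']
```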

def get_food(item):
    """
    input: food name from fresh.csv, e.g. 'kale'
    output: one-row data frame with columns "food" and "ndbno", or None if no
            raw, unbranded fruit or vegetable is found
    """
    tb = ndb_search(item + " raw")
    # ndb_search() returns an error message (a string) instead of a data frame
    # when the search has zero results
    if not isinstance(tb, pd.DataFrame):
        return None
    # keep unbranded fruits and vegetables, and drop juices
    unbranded = ~tb["group"].str.contains("Branded")
    produce = tb["group"].str.contains("Fruit") | tb["group"].str.contains("Vegetable")
    not_juice = ~tb["name"].str.contains("juice")
    tb_n = tb[unbranded & produce & not_juice]
    if tb_n.empty:
        return None
    # take the first match; keep ndbno as a string so that leading zeros
    # ("09326") survive, which pd.to_numeric() would strip
    # the values are wrapped in lists because a data frame built from scalars needs an index
    return pd.DataFrame({"ndbno": [tb_n["ndbno"].iloc[0]], "food": [item]})

# pd.concat() silently skips the None entries for foods without a match
dbn_fresh = pd.concat([get_food(x) for x in fresh["food"]])
print(dbn_fresh)

               food  ndbno
0        watermelon  09326
0        cantaloupe  09181
0        tangerines  09218
0      strawberries  09140
0             plums  09061
0           oranges  09216
0       raspberries  09302
0       pomegranate  09286
0         pineapple  09266
0          apricots  09021
0          honeydew  09184
0            papaya  09226
0          cherries  09001
0           bananas  09040
0            apples  09504
0           peaches  09236
0        nectarines  09191
0             pears  09427
0        grapefruit  09112
0      blackberries  09042
0            grapes  11974
0       blueberries  09050
0           mangoes  09176
0         asparagus  11011
0         cucumbers  11206
0   lettuce_iceberg  11252
0            onions  11282
0     turnip_greens  11568
0    mustard_greens  11270
0          potatoes  11352
0           cabbage  11116
0    sweet_potatoes  11505
0            radish  11430
0          avocados  09037
0              kale  11233
0         artichoke  11007
0              okra  11278
0  brussels_sprouts  11098
0        corn_sweet  11900

fresh

# merge the two tables; how="right" keeps only the foods that have an NDB number
fresh_n = pd.merge(fresh,dbn_fresh,how="right")
fresh_n.head()

Exercise 1.3. Read the food reports V2 documentation, then write a function called ndb_report() that requests a basic food report. The function should accept the NDB number as an argument and return the list of nutrients for the food.

Note that the report url is: https://api.nal.usda.gov/ndb/V2/reports

For example, for "09279" (raw plums) the first element of the returned list should be:

{u'group': u'Proximates',
 u'measures': [{u'eqv': 165.0,
   u'eunit': u'g',
   u'label': u'cup, sliced',
   u'qty': 1.0,
   u'value': u'143.93'},
  {u'eqv': 66.0,
   u'eunit': u'g',
   u'label': u'fruit (2-1/8" dia)',
   u'qty': 1.0,
   u'value': u'57.57'},
  {u'eqv': 151.0,
   u'eunit': u'g',
   u'label': u'NLEA serving',
   u'qty': 1.0,
   u'value': u'131.72'}],
 u'name': u'Water',
 u'nutrient_id': u'255',
 u'unit': u'g',
 u'value': u'87.23'}

Be sure to document and test your function.

def ndb_report(ndb):
    """
    input: NDB number of the food, e.g. '09279'
    output: list of nutrients for the food, or the API's error message as a
            string if the request fails
    """
    url = "https://api.nal.usda.gov/ndb/V2/reports"
    response = requests.get(url, params = {
        "api_key": key,
        "ndbno": ndb,
        "type": "b",        # basic report
        "format": "json"
    })
    data = response.json()
    # a failed request returns only an 'errors' entry
    if "errors" in data:
        # return the error message
        return data[u'errors'][u'error'][0][u'message']
    # the nutrients are nested inside the first (and only) requested food
    return data[u'foods'][0][u'food'][u'nutrients']

pd.DataFrame(ndb_report("11098")).head()
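Because the report stores each value as a string, it can help to look up a single nutrient and convert it to a number in one step. The sketch below assumes the nutrient layout shown in the example for this exercise ('name', 'unit', and 'value' entries); nutrient_value is a hypothetical helper, and the sample entries are trimmed copies of the first rows of the report for "11098".

```python
def nutrient_value(nutrients, name, unit=None):
    """Return the value of the first nutrient whose name starts with `name`, as a float.

    If `unit` is given, only nutrients reported in that unit are considered.
    Returns None when nothing matches.
    """
    for nutrient in nutrients:
        if not nutrient["name"].startswith(name):
            continue
        if unit is not None and nutrient["unit"] != unit:
            continue
        return float(nutrient["value"])
    return None


# Trimmed entries from the report for "11098" (only the keys used here).
sample = [
    {"name": "Water", "unit": "g", "value": "86.00"},
    {"name": "Energy", "unit": "kcal", "value": "43"},
    {"name": "Protein", "unit": "g", "value": "3.38"},
    {"name": "Carbohydrate, by difference", "unit": "g", "value": "8.95"},
]
print(nutrient_value(sample, "Energy", unit="kcal"))  # 43.0
print(nutrient_value(sample, "Carbohydrate"))         # 8.95
```

Passing the unit explicitly guards against mixing units if a report lists the same nutrient more than once.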

Exercise 1.4. Which foods provide the best combination of price, yield, and nutrition? You can use kilocalories as a measure of "nutrition" here, but a more detailed analysis is better. Use plots to support your analysis.

1. Add a column named "nutrition"?

No, there are many kinds of nutrients. I will choose those normally shown on food packaging: protein, carbohydrate, sodium, cholesterol, and energy.

2. Do I need to convert units here?

Yes: compare like with like, using the same unit.

3. What kinds of plots should I use? Bar plots? Scatter plots (fruits and vegetables)?

3.1 Bar plots: (protein, carbohydrate, sodium, cholesterol, energy), price, and yield for each fruit and vegetable.

3.2 Scatter plots: pairwise comparisons (e.g. energy vs. price) for different groups (e.g. fruits vs. vegetables).

3.3 Plot the distribution of one quantity for different groups. The groups are based on earlier results (e.g. we already know that berries cost more than other fruits, so we can divide fruits into berries and non-berries).

# the data frame with food name, price, and yield is called "fresh_n"
# build one data frame per nutrient (Protein, Carbohydrate, Energy), keeping
# the NDB number so that each one can be merged with fresh_n

# request each food report once and tag its rows with the NDB number
reports = []
for x in fresh_n["ndbno"]:
    df = pd.DataFrame(ndb_report(x))
    df["ndbno"] = x
    reports.append(df)
reports = pd.concat(reports)
# the API returns the nutrient values as strings; convert them to numbers for plotting
reports["value"] = pd.to_numeric(reports["value"])

def nutrient_table(y):
    # rows whose nutrient name contains y, merged with the price and yield data
    return pd.merge(reports.loc[reports["name"].str.contains(y)], fresh_n)

nutr_Protein = nutrient_table("Protein")
nutr_Carbohydrate = nutrient_table("Carbohydrate")
nutr_Energy = nutrient_table("Energy")

nutr_Protein.head()

# plot 1: bar plot
# Matplotlib
import matplotlib.pyplot as plt
import numpy as np

# for fruits
fresh_f = fresh_n.loc[fresh_n.type.str.contains("fruit")]
def barplt(dataframe,x):
    """Draw a horizontal bar chart of column x, with one bar per food."""
    fig, ax = plt.subplots(1, 1)
    Int = np.arange(dataframe.shape[0])
    bar_width = 0.8
    # center each bar on its tick so that the food labels line up with the bars
    plt.barh(Int, dataframe[x], height=bar_width, align="center")
    plt.yticks(Int, dataframe["food"])
barplt(fresh_f,"price_per_lb")
plt.xlabel('Price per lb')
plt.ylabel('Fruits')
plt.title('Bar chart of Price for each fruit')
plt.show()

Plot 1

# for vegetables
fresh_v = fresh_n.loc[fresh_n.type.str.contains("vegetable")]

barplt(fresh_v,"price_per_lb")
plt.xlabel("Price per lb")
plt.ylabel('Vegetable')
plt.title('Bar chart of Price for each vegetable')
plt.show()

Plot 2

Plots 1 and 2 show that most of the fruits and vegetables cost less than $3 per lb. Among the vegetables, only okra and asparagus cost more than $3. The most expensive fruits are mostly berries, such as blueberries and raspberries, along with cherries.
The cheapest fruits are common ones such as bananas, grapefruit, cantaloupe, watermelon and pineapple. Among the vegetables, potatoes and sweet potatoes have the lowest prices.

# yield
# for fruits
barplt(fresh_f,"yield")
plt.xlabel('Yield')
plt.ylabel('Fruits')
plt.title('Bar chart of Yield for each fruit')
plt.show()

Plot 3

barplt(fresh_v,"yield")
plt.ylabel('Vegetables')
plt.xlabel('Yield')
plt.title('Bar chart of Yield for each vegetable')
plt.show()

Plot 4

Plot 3 shows that the cheaper fruits, such as bananas, pineapple and grapefruit, also have lower yields, whereas the berries have much higher yields.
Plot 4 shows that, among the vegetables, artichoke has the lowest yield, and Brussels sprouts and kale have the highest. Kale and artichoke are also relatively expensive.

# nutrition content
# (1) Protein
pro_f = nutr_Protein.loc[nutr_Protein.type.str.contains("fruit")]
barplt(pro_f,"value")
plt.ylabel('Fruit')
plt.xlabel('Protein')
plt.title('Bar chart of Protein for each fruit')
plt.show()

Plot 5

pro_v = nutr_Protein.loc[nutr_Protein.type.str.contains("vegetable")]
barplt(pro_v,"value")
plt.ylabel('vegetable')
plt.xlabel('Protein')
plt.title('Bar chart of Protein for each vegetable')
plt.show()

Plot 6

Plot 5 shows that most fruits have less than 2 g of protein. Grapes are the exception, with the largest value at more than 5 g.
Plot 6 shows that several vegetables have more than 2 g of protein. Kale has the largest value among the vegetables, 4.3 g. Sweet corn, Brussels sprouts and artichoke are also relatively high, at more than 3 g each.

#(2) Carbohydrate
ca_f = nutr_Carbohydrate.loc[nutr_Carbohydrate.type.str.contains("fruit")]
barplt(ca_f,"value")
plt.ylabel('Fruit')
plt.xlabel('Carbohydrate')
plt.title('Bar chart of Carbohydrate for each fruit')
plt.show()

Plot 7

ca_v = nutr_Carbohydrate.loc[nutr_Carbohydrate.type.str.contains("vegetables")]
barplt(ca_v,"value")
plt.ylabel('Vegetables')
plt.xlabel('Carbohydrate')
plt.title('Bar chart of Carbohydrate for each vegetable')
plt.show()

Plot 8

Plot 7 shows that the carbohydrate content is about the same for all the fruits. Bananas and oranges have the most carbohydrate, and cherries have the least.
Plot 8 shows that the vegetables have comparatively less carbohydrate than the fruits. Potatoes and sweet corn have the most, and cabbage and cucumbers have the least.

# (3)Energy
en_f = nutr_Energy.loc[nutr_Energy.type.str.contains("fruit")]
barplt(en_f,"value")
plt.ylabel('Fruit')
plt.xlabel('Energy')
plt.title('Bar chart of Energy for each fruit')
plt.show()

Plot 9

en_v = nutr_Energy.loc[nutr_Energy.type.str.contains("vegetables")]
# plot the energy table (en_v), not the carbohydrate table (ca_v)
barplt(en_v,"value")
plt.ylabel('Vegetables')
plt.xlabel('Energy')
plt.title('Bar chart of Energy for each vegetable')
plt.show()

Plot 10

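Comparing the energy values in plots 9 and 10 suggests that fruits have much more energy than vegetables. Among the vegetables, sweet corn and potatoes have the largest values (about 19 kcal). Among the fruits, oranges, grapes and bananas have the largest (about 95 kcal). Note, however, that the food report for Brussels sprouts (11098) lists 43 kcal, so the vegetable values read from plot 10 are too low and should be rechecked against nutr_Energy.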

Now, let's summarize the conclusions so far:
(1) Price
Most of the fruits and vegetables cost less than $3 per lb. Okra and asparagus are the only vegetables above $3, and the most expensive fruits are blueberries, cherries and raspberries. The cheapest foods are bananas, grapefruit, cantaloupe, watermelon and pineapple among the fruits, and potatoes and sweet potatoes among the vegetables.
(2) Yield
The cheaper fruits, such as bananas, pineapple and grapefruit, have lower yields, whereas the berries have much higher yields. Among the vegetables, artichoke has the lowest yield, and Brussels sprouts and kale have the highest. Kale and artichoke are also relatively expensive.
(3) Protein
Most fruits have less than 2 g of protein; grapes are the exception at more than 5 g. Kale has the most protein among the vegetables (4.3 g), and sweet corn, Brussels sprouts and artichoke each have more than 3 g.
(4) Carbohydrate
The fruits all have similar carbohydrate content, with bananas and oranges highest and cherries lowest. The vegetables have less carbohydrate than the fruits; potatoes and sweet corn have the most, and cabbage and cucumbers have the least.
(5) Energy
Plots 9 and 10 suggest that fruits have much more energy than vegetables. Oranges, grapes and bananas have the largest values among the fruits (about 95 kcal), and sweet corn and potatoes have the largest among the vegetables (about 19 kcal).

To find the best combination of price, yield and nutrition, we want a food with the lowest price, the highest yield and the best nutrition. For nutrition, we want the highest protein, the lowest carbohydrate and the highest energy. Considering all of these aspects, my candidates for the best food are Brussels sprouts, sweet corn, potatoes, bananas and grapes.
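One way to fold price, yield and energy into a single number is kilocalories per dollar of edible food. The sketch below rests on two assumptions that are not stated in the notebook: that the report values are per 100 g of edible food, and that yield is the edible fraction of the purchased weight. The function name kcal_per_dollar is hypothetical; the inputs are the Brussels sprouts figures from fresh.csv and from the report for "11098".

```python
GRAMS_PER_LB = 453.592


def kcal_per_dollar(kcal_per_100g, price_per_lb, yield_fraction):
    """Kilocalories obtained per dollar spent, counting only the edible share.

    Assumes kcal_per_100g refers to 100 g of edible food and yield_fraction
    is the edible share of the purchased weight.
    """
    edible_grams_per_dollar = GRAMS_PER_LB * yield_fraction / price_per_lb
    return kcal_per_100g * edible_grams_per_dollar / 100.0


# Brussels sprouts: 43 kcal, $2.763553 per lb, yield 1.06
print(round(kcal_per_dollar(43, 2.763553, 1.06), 1))  # 74.8
```

Computing this for every row of the merged table would give one column to sort or plot, instead of comparing separate bar charts by eye.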

Output of fresh:
	form	price_per_lb	yield	lb_per_cup	price_per_cup	food	type
0	Fresh1	0.333412	0.520000	0.330693	0.212033	watermelon	fruit
1	Fresh1	0.535874	0.510000	0.374786	0.393800	cantaloupe	fruit
2	Fresh1	1.377962	0.740000	0.407855	0.759471	tangerines	fruit
3	Fresh1	2.358808	0.940000	0.319670	0.802171	strawberries	fruit
4	Fresh1	1.827416	0.940000	0.363763	0.707176	plums	fruit
5	Fresh1	1.035173	0.730000	0.407855	0.578357	oranges	fruit
6	Fresh1	6.975811	0.960000	0.319670	2.322874	raspberries	fruit
7	Fresh1	2.173590	0.560000	0.341717	1.326342	pomegranate	fruit
8	Fresh1	0.627662	0.510000	0.363763	0.447686	pineapple	fruit
9	Fresh1	3.040072	0.930000	0.363763	1.189102	apricots	fruit
10	Fresh1	0.796656	0.460000	0.374786	0.649077	honeydew	fruit
11	Fresh1	1.298012	0.620000	0.308647	0.646174	papaya	fruit
12	Fresh1	2.044683	0.760000	0.385809	1.037970	kiwi	fruit
13	Fresh1	3.592990	0.920000	0.341717	1.334548	cherries	fruit
14	Fresh1	0.566983	0.640000	0.330693	0.292965	bananas	fruit
15	Fresh1	1.567515	0.900000	0.242508	0.422373	apples	fruit
16	Fresh1	1.591187	0.960000	0.341717	0.566390	peaches	fruit
17	Fresh1	1.761148	0.910000	0.319670	0.618667	nectarines	fruit
18	Fresh1	1.461575	0.900000	0.363763	0.590740	pears	fruit
19	Fresh1	0.897802	0.490000	0.462971	0.848278	grapefruit	fruit
20	Fresh1	5.774708	0.960000	0.319670	1.922919	blackberries	fruit
21	Fresh1	2.093827	0.960000	0.330693	0.721266	grapes	fruit
22	Fresh1	4.734622	0.950000	0.319670	1.593177	blueberries	fruit
23	Fresh1	1.377563	0.710000	0.363763	0.705783	mangoes	fruit
24	Fresh1	3.213494	0.493835	0.396832	2.582272	asparagus	vegetables
25	Fresh, consumed with peel1	1.295931	0.970000	0.264555	0.353448	cucumbers	vegetables
27	Fresh1	1.213039	0.950000	0.242508	0.309655	lettuce_iceberg	vegetables
28	Fresh1	1.038107	0.900000	0.352740	0.406868	onions	vegetables
29	Fresh1	2.471749	0.750000	0.319670	1.053526	turnip_greens	vegetables
30	Fresh1	2.569235	0.840000	0.308647	0.944032	mustard_greens	vegetables
31	Fresh1	0.564320	0.811301	0.264555	0.184017	potatoes	vegetables
32	Fresh1	2.630838	1.160000	0.286601	0.650001	collard_greens	vegetables
33	Fresh1	2.139972	0.846575	0.275578	0.696606	green_beans	vegetables
34	Fresh1	1.172248	0.458554	0.451948	1.155360	acorn_squash	vegetables
35	Fresh1	2.277940	0.820000	0.264555	0.734926	red_peppers	vegetables
37	Fresh red cabbage1	1.056450	0.779107	0.330693	0.448412	cabbage	vegetables
38	Fresh1	0.918897	0.811301	0.440925	0.499400	sweet_potatoes	vegetables
39	Fresh1	1.639477	0.769500	0.396832	0.845480	summer_squash	vegetables
40	Fresh1	1.311629	0.900000	0.275578	0.401618	radish	vegetables
41	Fresh1	1.244737	0.714000	0.451948	0.787893	butternut_squash	vegetables
42	Fresh1	2.235874	0.740753	0.319670	0.964886	avocados	vegetables
43	Fresh1	2.807302	1.050000	0.286601	0.766262	kale	vegetables
44	Fresh1	2.213050	0.375309	0.385809	2.274967	artichoke	vegetables
45	Fresh1	3.213552	0.769474	0.352740	1.473146	okra	vegetables
46	Fresh1	1.410363	0.820000	0.264555	0.455022	green_peppers	vegetables
47	Fresh1	2.763553	1.060000	0.341717	0.890898	brussels_sprouts	vegetables
48	Fresh1	2.690623	0.540000	0.363763	1.812497	corn_sweet	vegetables

Output of fresh_n.head():
	form	price_per_lb	yield	lb_per_cup	price_per_cup	food	type	ndbno
0	Fresh1	0.333412	0.52	0.330693	0.212033	watermelon	fruit	09326
1	Fresh1	0.535874	0.51	0.374786	0.393800	cantaloupe	fruit	09181
2	Fresh1	1.377962	0.74	0.407855	0.759471	tangerines	fruit	09218
3	Fresh1	2.358808	0.94	0.319670	0.802171	strawberries	fruit	09140
4	Fresh1	1.827416	0.94	0.363763	0.707176	plums	fruit	09061

Output of nutr_Protein.head():
	group	measures	name	nutrient_id	unit	value	ndbno	form	price_per_lb	yield	lb_per_cup	price_per_cup	food	type
0	Proximates	[{u'eunit': u'g', u'eqv': 154.0, u'qty': 1.0, ...	Protein	203	g	0.61	09326	Fresh1	0.333412	0.52	0.330693	0.212033	watermelon	fruit
1	Proximates	[{u'eunit': u'g', u'eqv': 177.0, u'qty': 1.0, ...	Protein	203	g	0.84	09181	Fresh1	0.535874	0.51	0.374786	0.393800	cantaloupe	fruit
2	Proximates	[{u'eunit': u'g', u'eqv': 195.0, u'qty': 1.0, ...	Protein	203	g	0.81	09218	Fresh1	1.377962	0.74	0.407855	0.759471	tangerines	fruit
3	Proximates	[{u'eunit': u'g', u'eqv': 244.0, u'qty': 1.0, ...	Protein	203	g	0.58	09140	Fresh1	2.358808	0.94	0.319670	0.802171	strawberries	fruit
4	Proximates	[{u'eunit': u'g', u'eqv': 150.0, u'qty': 1.0, ...	Protein	203	g	0.50	09061	Fresh1	1.827416	0.94	0.363763	0.707176	plums	fruit

Output of pd.DataFrame(ndb_report("11098")).head():
	group	measures	name	nutrient_id	unit	value
0	Proximates	[{u'eunit': u'g', u'eqv': 88.0, u'qty': 1.0, u...	Water	255	g	86.00
1	Proximates	[{u'eunit': u'g', u'eqv': 88.0, u'qty': 1.0, u...	Energy	208	kcal	43
2	Proximates	[{u'eunit': u'g', u'eqv': 88.0, u'qty': 1.0, u...	Protein	203	g	3.38
3	Proximates	[{u'eunit': u'g', u'eqv': 88.0, u'qty': 1.0, u...	Total lipid (fat)	204	g	0.30
4	Proximates	[{u'eunit': u'g', u'eqv': 88.0, u'qty': 1.0, u...	Carbohydrate, by difference	205	g	8.95