Coffee Production Chain

Reading the dataset.
Data structure
Total coffee production nationwide, 2007-2015.
Which 10 municipalities produced the most coffee, and how much did they produce? (2007-2015).
How much of the planted area is actually harvested?
What is the average yield per hectare in Colombia's coffee-producing departments?
Changes in production levels from 2010 to 2015 in the departments of the Coffee Axis (Caldas, Risaralda, Quindío, Valle del Cauca, and Tolima).

Reading the dataset.

datos <- read.csv(file = "CadenaCafe.csv", encoding = "UTF-8")

These data describe the Coffee Production Chain (area, production, and yield) for the years 2007 to 2015. They are available at https://www.datos.gov.co/Agricultura-y-Desarrollo-Rural/Cadena-Productiva-Caf-Area-Producci-n-Y-Rendimient/mc73-h8xp. Accessed December 5, 2017.
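The `str()` output in this report lists text columns as factors, which is the `read.csv` default only in R versions before 4.0.0. The sketch below shows how the same behaviour can be requested explicitly, and how `read.csv` rewrites headers containing spaces and parentheses into dotted names. The temporary file and its ASCII-only column names are hypothetical stand-ins, not the real CadenaCafe.csv.

```r
# Write a tiny CSV that mimics the shape of the real file (illustrative values).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "DEPARTAMENTO,PERIODO,Area Sembrada(ha),Produccion(t)",
  "ANTIOQUIA,2007,4067,2329",
  "ANTIOQUIA,2007,135,81"
), tmp)

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default;
# check.names = TRUE (the default) turns spaces and parentheses into dots.
toy <- read.csv(tmp, stringsAsFactors = TRUE)

names(toy)
class(toy$DEPARTAMENTO)

unlink(tmp)
```

Setting `stringsAsFactors` explicitly keeps the later steps (which rely on factor columns) reproducible across R versions.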

Data structure

str(datos)

## 'data.frame':    5464 obs. of  16 variables:
##  $ CÓD..DEP.                                    : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ DEPARTAMENTO                                 : Factor w/ 24 levels "ANTIOQUIA","ARAUCA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ CÓD..MUN.                                    : int  5002 5004 5021 5030 5031 5034 5036 5038 5040 5044 ...
##  $ MUNICIPIO                                    : Factor w/ 614 levels "ABEJORRAL","ABREGO",..: 1 3 17 24 25 29 30 31 33 36 ...
##  $ GRUPO.DE.CULTIVO                             : Factor w/ 1 level "OTROS PERMANENTES": 1 1 1 1 1 1 1 1 1 1 ...
##  $ SUBGRUPO.DE.CULTIVO                          : Factor w/ 1 level "CAFE": 1 1 1 1 1 1 1 1 1 1 ...
##  $ CULTIVO                                      : Factor w/ 1 level "CAFE": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DESAGREGACIÓN.REGIONAL.Y.O.SISTEMA.PRODUCTIVO: Factor w/ 1 level "CAFE": 1 1 1 1 1 1 1 1 1 1 ...
##  $ CÓDIGO.CULTIVO                               : num  1.12e+11 1.12e+11 1.12e+11 1.12e+11 1.12e+11 ...
##  $ NOMBRE.CIENTIFICO                            : Factor w/ 1 level "COFFEA ARABICA ": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PERIODO                                      : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
##  $ Área.Sembrada.ha.                            : int  4067 135 739 832 1552 8696 600 1240 214 1200 ...
##  $ Área.Cosechada.ha.                           : int  3819 135 733 785 1432 8484 600 1180 194 1140 ...
##  $ Producción.t.                                : int  2329 81 733 570 1439 13574 480 1180 217 1311 ...
##  $ Rendimiento.t.ha.                            : num  0.6 0.6 1 0.7 1 1.6 0.8 1 1.1 1.2 ...
##  $ ESTADO.FISICO.PRODUCCION                     : Factor w/ 1 level "PERGAMINO SECO": 1 1 1 1 1 1 1 1 1 1 ...

summary(datos[ ,c(12:15)])

##  Área.Sembrada.ha. Área.Cosechada.ha. Producción.t.   Rendimiento.t.ha.
##  Min.   :    0     Min.   :    0.0    Min.   :    0   Min.   : 0.1000  
##  1st Qu.:  201     1st Qu.:  161.8    1st Qu.:  113   1st Qu.: 0.6000  
##  Median :  758     Median :  639.0    Median :  506   Median : 0.8900  
##  Mean   : 1506     Mean   : 1247.3    Mean   : 1216   Mean   : 0.8722  
##  3rd Qu.: 1818     3rd Qu.: 1543.2    3rd Qu.: 1375   3rd Qu.: 1.0700  
##  Max.   :20465     Max.   :17125.0    Max.   :18751   Max.   :10.7000  
##                                                       NA's   :36

Records with a missing yield per hectare were removed, as were records where an area or the production equals zero (0). In addition, the variable “PERIODO” was converted to a factor.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

datos2 <- datos %>%
  # keep records with non-zero areas and production, and a non-missing yield
  filter(Área.Sembrada.ha. != 0 & Área.Cosechada.ha. != 0
         & Producción.t. != 0 & !is.na(Rendimiento.t.ha.))

summary(datos2[ ,c(12:15)])

##  Área.Sembrada.ha. Área.Cosechada.ha. Producción.t.   Rendimiento.t.ha.
##  Min.   :    1     Min.   :    1      Min.   :    1   Min.   : 0.1000  
##  1st Qu.:  210     1st Qu.:  170      1st Qu.:  119   1st Qu.: 0.6000  
##  Median :  771     Median :  651      Median :  511   Median : 0.8900  
##  Mean   : 1518     Mean   : 1258      Mean   : 1226   Mean   : 0.8728  
##  3rd Qu.: 1840     3rd Qu.: 1557      3rd Qu.: 1385   3rd Qu.: 1.0700  
##  Max.   :20465     Max.   :17125      Max.   :18751   Max.   :10.7000

datos2[ ,11] <- factor(datos2[ ,11])

str(datos2[ ,11])

##  Factor w/ 9 levels "2007","2008",..: 1 1 1 1 1 1 1 1 1 1 ...
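The cleaning rule (drop zero areas, zero production, and missing yields, then treat the year as a factor) can be checked on a toy data frame. This is a minimal base R sketch; the column names and values are hypothetical and much smaller than the real dataset.

```r
# Toy records: rows 2 and 4 should be dropped by the cleaning rule.
toy <- data.frame(
  periodo     = c(2007, 2007, 2008, 2008),
  sembrada    = c(100, 0, 50, 80),
  cosechada   = c(90, 0, 40, 70),
  produccion  = c(60, 0, 30, 0),
  rendimiento = c(0.7, NA, 0.8, NA)
)

# A row is kept only if every quantity is non-zero and the yield is present.
keep <- toy$sembrada != 0 & toy$cosechada != 0 &
  toy$produccion != 0 & !is.na(toy$rendimiento)
clean <- toy[keep, ]

# Convert the year to a factor, as done for PERIODO.
clean$periodo <- factor(clean$periodo)

# Going back to numbers requires as.character() first;
# as.numeric() alone would return the internal level codes.
years <- as.numeric(as.character(clean$periodo))
```

Testing missing values with `is.na()` states the intent directly, whereas comparing a numeric column with an empty string only removes `NA` rows as a side effect of how `filter()` treats `NA` conditions.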

The dataset was reshaped from wide format to long format.

library(tidyr)

datos3 <- datos2 %>%
  gather(key = Variable,
         value = Valor,
         -c(1:4,11))

## Warning in if (!is.finite(x)) return(FALSE): the condition has length > 1
## and only the first element will be used

## Warning: attributes are not identical across measure variables;
## they will be dropped
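The warnings above are consistent with factor and numeric columns being stacked into a single `Valor` column, which forces every value to text. One possible alternative, sketched here on hypothetical toy data, is to reshape only the numeric measure columns with `tidyr::pivot_longer()` (available from tidyr 1.0.0), so that `Valor` stays numeric.

```r
library(tidyr)

# Toy wide table with ASCII-only column names (illustrative values).
wide <- data.frame(
  DEPARTAMENTO      = c("ANTIOQUIA", "CALDAS"),
  PERIODO           = factor(c(2007, 2007)),
  Area.Sembrada.ha. = c(4067, 1200),
  Produccion.t.     = c(2329, 1311),
  Rendimiento.t.ha. = c(0.6, 1.2)
)

# Only the numeric measures go into the long column, so no coercion to text occurs.
long <- pivot_longer(
  wide,
  cols      = c(Area.Sembrada.ha., Produccion.t., Rendimiento.t.ha.),
  names_to  = "Variable",
  values_to = "Valor"
)

str(long$Valor)
```

With this layout the later `as.integer()` conversions would be unnecessary, because `Valor` never becomes a character column.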

Total coffee production nationwide, 2007-2015.

library(ggplot2)
library(ggthemes)

datosP <- datos3 %>%
  filter(Variable == "Producción.t.") %>%
  mutate(Valor = as.integer(Valor))

datosP2 <- datosP %>%
  group_by(PERIODO, Variable) %>%
  summarise(Total = sum(Valor)) %>%
  select(Variable, PERIODO, Total)

ggplot(data = datosP2, aes(x = PERIODO, y = Total)) +
  geom_bar(stat = "identity", color = "green3", fill = "gray20") +
  geom_text(aes(label = Total),
            color = "darkred",
            size = 4.7,
            position = position_identity(), vjust = -0.2) +
  labs(x = "Year",
       y = "Tonnes of coffee",
       title = "Coffee production in Colombia, 2007 to 2015") +
  theme_light()
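The yearly totals behind the bar chart can be reproduced, and the bar labels made easier to read, with a few lines of base R. This is a sketch on hypothetical values; the `Etiqueta` column is an illustrative name, not part of the original analysis.

```r
# Toy production records in tonnes (illustrative values).
toy <- data.frame(
  PERIODO = factor(c(2007, 2007, 2008, 2008)),
  Valor   = c(2329, 81, 1500, 700)
)

# Sum production within each year.
totals <- aggregate(Valor ~ PERIODO, data = toy, FUN = sum)

# Add a thousands separator so large totals are readable as text labels.
totals$Etiqueta <- format(totals$Valor, big.mark = ",", trim = TRUE)

totals
```

A column such as `Etiqueta` could then be mapped to `label` in `geom_text()` instead of the raw total.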

Distribution of coffee production in Colombia.

# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(data = datos2, aes(x = Producción.t.)) +
  geom_histogram(aes(y = after_stat(density)), bins = 100, color = "blue", fill = "gray20") +
  geom_density(alpha = 0.3, fill = "red", colour = "red") +
  labs(x = "Production (tonnes)",
       y = "Density",
       title = "Distribution of coffee production") +
  theme_light()

Which 10 municipalities produced the most coffee, and how much did they produce? (2007-2015).

# Grouped by municipality and department, because some municipalities share a name across departments.

datosM <- datosP %>%
  group_by(MUNICIPIO, DEPARTAMENTO) %>%
  summarise(Total = sum(Valor)) %>%
  arrange(desc(Total))

ggplot(data = datosM[1:10, ], aes(x = MUNICIPIO, y = Total)) +
  geom_bar(stat = "identity", fill = "gray20", color = "yellow") +
  geom_text(aes(label = Total),
            position = position_identity(), vjust = -0.1,
            color = "darkred",
            size = 4) +
  labs(x = "Municipality",
       y = "Production (tonnes)",
       title = "Top 10 coffee-producing municipalities, 2007-2015") +
  theme_light() +
  theme(axis.text.x = element_text(size = 6.5, angle = 90))
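Because the totals are grouped by municipality and department but the chart's x axis uses only the municipality name, two same-named municipalities in the top 10 would share one bar position. A possible refinement, sketched in base R on hypothetical totals, builds a combined label and orders it from highest to lowest total.

```r
# Toy totals; "ALBANIA" appears in two departments (values are illustrative).
toy <- data.frame(
  MUNICIPIO    = c("ALBANIA", "ALBANIA", "PITALITO", "ANDES"),
  DEPARTAMENTO = c("CAQUETA", "SANTANDER", "HUILA", "ANTIOQUIA"),
  Total        = c(120, 300, 900, 700)
)

# A label that is unique per municipality-department pair.
toy$Etiqueta <- paste(toy$MUNICIPIO, toy$DEPARTAMENTO, sep = ", ")

# Keep the top 3 rows by total (top 10 in the real analysis).
top <- head(toy[order(-toy$Total), ], 3)

# Order the factor levels by descending total so bars plot from largest to smallest.
top$Etiqueta <- reorder(top$Etiqueta, -top$Total)

levels(top$Etiqueta)
```

Mapping `x = Etiqueta` in `aes()` would then keep same-named municipalities apart and sort the bars by production.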

How much of the planted area is actually harvested?

datosA <- datos3 %>%
  filter(Variable == "Área.Sembrada.ha.") %>%
  group_by(Variable) %>%
  mutate(Valor = as.integer(Valor)) %>%
  summarise(Total = sum(Valor))

datosA1 <- datos3 %>%
  filter(Variable == "Área.Cosechada.ha.") %>%
  group_by(Variable) %>%
  mutate(Valor = as.integer(Valor)) %>%
  summarise(Total = sum(Valor))

# Combine the two totals computed above (the original hard-coded them as 8225624 and 6815097)
datosA2 <- data.frame(AreaSembrada = datosA$Total, AreaCosechada = datosA1$Total)

datosA3 <- datosA2 %>%
  mutate(Porcentaje = round(AreaCosechada / AreaSembrada * 100, digits = 2))

names(datosA3) <- c("Area planted (ha)", "Area harvested (ha)", "% harvested")

library(pander)

pander(datosA3)

Area planted (ha)   Area harvested (ha)   % harvested
8225624             6815097               82.85
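The percentage in the table follows directly from the two totals. As a worked check, using only the figures reported above (the variable names are illustrative):

```r
# Totals for 2007-2015 as reported in the table above, in hectares.
area_sembrada  <- 8225624
area_cosechada <- 6815097

# Share of the planted area that was harvested, in percent.
porcentaje <- round(area_cosechada / area_sembrada * 100, digits = 2)

# Planted area that was not harvested, in hectares.
no_cosechada <- area_sembrada - area_cosechada

porcentaje    # 82.85
no_cosechada  # 1410527
```

In other words, roughly 1.41 million of the planted hectare-years recorded over the period were not harvested.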

What is the average yield per hectare in Colombia's coffee-producing departments?

datosR <- datos3 %>%
  filter(Variable == "Rendimiento.t.ha.") %>%
  group_by(DEPARTAMENTO) %>%
  # as.numeric, not as.integer: yields are decimals and would be truncated to whole numbers
  mutate(Valor = as.numeric(Valor)) %>%
  summarise(Rendimiento = round(mean(Valor), digits = 3)) %>%
  arrange(desc(Rendimiento))

names(datosR) <- c("Department", "Yield (t/ha)")

library(DT)

datatable(datosR) %>%
  formatStyle(columns = c(1,2), 'text-align' = "center")

Note: The department of Arauca had only one (1) observation; it was nevertheless kept in the calculation.
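Since the long-format `Valor` column holds text, the type used when converting it back to numbers affects the average yield. A short check on the first few yield values shown in the `str()` output (the variable names are illustrative):

```r
# Yield values as they appear after reshaping: stored as text.
valor <- c("0.6", "1", "1.6", "0.8")

# Integer conversion truncates toward zero, discarding the decimals.
como_entero <- as.integer(valor)   # 0 1 1 0

# Numeric conversion keeps the decimals.
como_numero <- as.numeric(valor)   # 0.6 1.0 1.6 0.8

mean(como_entero)  # 0.5
mean(como_numero)  # 1
```

With yields mostly between 0 and 2 t/ha, integer conversion would pull the departmental averages sharply downward, so a numeric conversion is the safer choice for this variable.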

Changes in production levels from 2010 to 2015 in the departments of the Coffee Axis (Caldas, Risaralda, Quindío, Valle del Cauca, and Tolima).

datosD <- datos3 %>%
  filter(DEPARTAMENTO %in% c("CALDAS", "RISARALDA", "QUINDIO",
                             "VALLE DEL CAUCA", "TOLIMA"))

datosD1 <- datosD %>%
  filter(Variable == "Producción.t.") %>%
  mutate(Valor = as.integer(Valor)) %>%
  group_by(DEPARTAMENTO, PERIODO) %>%
  summarise(Total = sum(Valor))

datosD2 <- datosD1 %>%
  # PERIODO is a factor, so compare against the years as text
  filter(PERIODO %in% as.character(2010:2015))

ggplot(data = datosD2, aes(x = DEPARTAMENTO, y = Total)) +
  geom_bar(aes(fill = PERIODO), stat = "identity", position = "dodge", color = 1) +
  scale_fill_brewer(palette = "Set1") +
  labs(x = "Department",
       y = "Total production (tonnes)",
       title = "Coffee production in the Coffee Axis departments, 2010-2015") +
  theme_light()
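The grouped bars show levels; the change between consecutive years can also be expressed as a percentage. The base R sketch below uses hypothetical totals for two departments and three years, and the `Cambio` column is an illustrative name rather than part of the original analysis.

```r
# Toy yearly totals per department (illustrative values, in tonnes).
toy <- data.frame(
  DEPARTAMENTO = rep(c("CALDAS", "TOLIMA"), each = 3),
  PERIODO      = rep(2010:2012, times = 2),
  Total        = c(100, 110, 99, 200, 180, 225)
)

# Sort so that each department's years are consecutive and ascending.
toy <- toy[order(toy$DEPARTAMENTO, toy$PERIODO), ]

# Year-over-year change in percent within each department;
# the first year of each department has no previous value, hence NA.
toy$Cambio <- ave(
  toy$Total, toy$DEPARTAMENTO,
  FUN = function(x) c(NA, round(diff(x) / head(x, -1) * 100, 1))
)

toy
```

Here CALDAS changes by 10 and then -10 percent, and TOLIMA by -10 and then 25 percent, which makes rises and falls comparable across departments of different size.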