Solved – Panel-data exploratory data analysis

I have a dataset of around 40k firms over fiscal years 1950-2011, with about 430k firm-years. If I'm not mistaken, this is panel data. In addition, the firms are nested within 9 industries.

I created a unique identifier ticn for each firm. Years are indicated by fyear. For now my variables of interest are yearly sales sale, yearly advertising xad, and yearly R&D expenses xrd. I have industry dummies indicated by sicagg. I am interested in the relationship between yearly sales and advertising/R&D expenditures, including some control variables.

Currently I am in the exploratory phase of my research.

My objective is to get a feel for the data, report descriptive statistics, and maybe make a few plots.

First I computed between and within descriptive statistics (mean, stdev, min and max). I also made scatter plots between sales, R&D and advertising. In addition, I plotted the time-series of the yearly average advertising expenses for each industry in a nice graph.
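For readers who want to reproduce between and within statistics of this kind, here is a minimal base-R sketch. It runs on a small simulated panel, not the real data: the column names ticn, fyear and xad follow the question, but the values are made up. The within series is recentred on the overall mean, which is the convention used by Stata's xtsum; that choice is an assumption here, not something stated in the question.

```r
# Simulated stand-in for the real panel: 5 firms, 4 fiscal years each.
set.seed(1)
panel <- data.frame(
  ticn  = rep(1:5, each = 4),
  fyear = rep(2008:2011, times = 5),
  xad   = round(runif(20, min = 1, max = 100), 1)
)

# Firm mean repeated on every firm-year row.
firm_mean <- ave(panel$xad, panel$ticn, FUN = mean)

# Between component: one mean per firm.
between <- as.numeric(tapply(panel$xad, panel$ticn, mean))

# Within component: deviation from the firm mean, recentred on the overall mean.
within <- panel$xad - firm_mean + mean(panel$xad)

describe <- function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))

xad_summary <- rbind(
  overall = describe(panel$xad),
  between = describe(between),
  within  = describe(within)
)
print(round(xad_summary, 2))
```

A large between standard deviation relative to the within one would suggest that most of the variation is across firms and not over time, which matters later when choosing between pooled, fixed-effects and random-effects models.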

Can you give me ideas for additional analyses? Thanks in advance!

I always start by doing a PCA (Principal Component Analysis) in R because it takes almost no writing. Say you have the numeric variables in a data.frame called data (prcomp needs numeric columns with no missing values).

pca <- prcomp(data)
# Scree plot.
plot(pca)
# Biplot.
biplot(pca)
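On a firm-level panel the call above will usually need some preparation first, since identifier columns are not numeric and expense variables often have gaps. The following is only a sketch under assumptions: the data are simulated, the 20 missing xrd values are invented for illustration, and the log1p transform and scale. = TRUE are my own choices for heavily skewed accounting variables, not part of the original suggestion.

```r
# Simulated stand-in for the real data (values are made up).
set.seed(42)
n   <- 200
xad <- rlnorm(n, meanlog = 2)
xrd <- rlnorm(n, meanlog = 2)
data <- data.frame(
  ticn = sprintf("F%03d", seq_len(n)),   # character identifier, not usable in a PCA
  sale = 3 * xad + 2 * xrd + rlnorm(n),
  xad  = xad,
  xrd  = xrd
)
data$xrd[sample(n, 20)] <- NA            # pretend R&D is unreported for 20 rows

# Keep only the numeric variables of interest and drop incomplete rows.
vars     <- c("sale", "xad", "xrd")
complete <- na.omit(data[, vars])

# log1p tames the right skew; scale. = TRUE puts variables on a common scale.
pca <- prcomp(log1p(complete), scale. = TRUE)

print(summary(pca))        # proportion of variance per component
print(pca$rotation)        # loadings of each variable
```

The plot(pca) and biplot(pca) calls work unchanged on this object. Note that na.omit drops whole firm-years, so it is worth checking how many rows survive before reading much into the components.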

For R users, there is also the ggplot2 library. I know that it can do wonders for data representation, but I don't know how to use it. Maybe someone will suggest something with it?
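As one possible starting point with ggplot2, the sketch below plots the yearly mean advertising expense per industry, similar to the time-series plot described in the question. It assumes the ggplot2 package is installed. The panel is simulated with three made-up industries, and the log scale on the y axis is my own choice for skewed expense data.

```r
library(ggplot2)

# Simulated stand-in: 30 firms, fiscal years 2000-2011, 3 hypothetical industries.
set.seed(7)
panel <- expand.grid(ticn = 1:30, fyear = 2000:2011)
panel$sicagg <- factor(panel$ticn %% 3 + 1, labels = c("ind1", "ind2", "ind3"))
panel$xad <- rlnorm(nrow(panel), meanlog = 1 + 0.05 * (panel$fyear - 2000))

# One row per industry and year, holding the mean advertising expense.
industry_year <- aggregate(xad ~ sicagg + fyear, data = panel, FUN = mean)

# One line per industry over time.
p <- ggplot(industry_year, aes(x = fyear, y = xad, colour = sicagg)) +
  geom_line() +
  scale_y_log10() +
  labs(x = "Fiscal year",
       y = "Mean advertising expense (log scale)",
       colour = "Industry")
```

Typing p or print(p) at the console draws the plot. Replacing colour = sicagg with + facet_wrap(~ sicagg) gives one small panel per industry, which can be easier to read with 9 industries.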

