I have a dataset of around 40k firms over fiscal years 1950-2011, about 430k firm-years in total. If I'm not mistaken, this is panel data. In addition, the firms are nested within 9 industries.
I created a unique identifier, ticn, for each firm. Years are indicated by fyear. For now my variables of interest are yearly sales (sale), yearly advertising (xad), and yearly R&D expenses (xrd). I have industry dummies indicated by sicagg. I am interested in the relationship between yearly sales and advertising/R&D expenditures, including some control variables.
Currently I am in the exploratory phase of my research.
My objective is to get a feel for the data, report descriptive statistics, and maybe make a few plots.
First I computed between and within descriptive statistics (mean, stdev, min and max). I also made scatter plots between sales, R&D and advertising. In addition, I plotted the time-series of the yearly average advertising expenses for each industry in a nice graph.
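For readers who want to reproduce the between/within split in R, here is a minimal sketch using base R only. It assumes the usual decomposition: "between" summarizes the firm means, and "within" summarizes each observation's deviation from its own firm mean, re-centred on the grand mean. The toy panel and the helper name summ are made up for illustration; only ticn, fyear and sale come from the description above.

```r
# Toy balanced panel standing in for the real data (values are made up)
set.seed(1)
panel <- data.frame(
  ticn  = rep(1:5, each = 4),
  fyear = rep(2008:2011, times = 5),
  sale  = round(runif(20, 10, 100), 1)
)

# Firm mean repeated on every row of that firm (ave() defaults to mean)
firm_mean <- ave(panel$sale, panel$ticn)

# Between: one value per firm, the firm's average sales
between_vals <- tapply(panel$sale, panel$ticn, mean)

# Within: deviation from the firm's own mean, re-centred on the grand mean
within_vals <- panel$sale - firm_mean + mean(panel$sale)

# Hypothetical helper: the four statistics mentioned in the question
summ <- function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))

desc <- rbind(
  overall = summ(panel$sale),
  between = summ(between_vals),
  within  = summ(within_vals)
)
print(round(desc, 2))
```

On the real data, missing values in sale would need handling first (for example by dropping those rows), since mean() and sd() return NA otherwise.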
Can you give me ideas for additional analyses? Thanks in advance!
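I always start with a PCA (Principal Component Analysis) in R because it takes almost no writing. Say you have all this in a data.frame that we call data:

    # Keep the numeric variables of interest and drop rows with missing values
    vars <- na.omit(data[, c("sale", "xad", "xrd")])
    # Standardize first; otherwise the largest-variance variable dominates
    pca <- prcomp(vars, scale. = TRUE)
    # Scree plot
    plot(pca)
    # Biplot
    biplot(pca)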
For R users, there is also the ggplot2 package. I know that it can do wonders for data representation, but I don't know how to use it. Maybe someone will suggest something with it?
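To pick up the ggplot2 idea, here is a minimal sketch of one plot that might suit this kind of data: the yearly average advertising expense per industry, one line per industry. It assumes the ggplot2 package is installed. The toy panel (30 firms, 3 industries, made-up xad values) is purely illustrative; only the column names ticn, fyear, xad and sicagg come from the question.

```r
library(ggplot2)

# Toy firm-year panel standing in for the real data (values are made up)
set.seed(42)
panel <- expand.grid(ticn = 1:30, fyear = 2000:2011)
panel$sicagg <- (panel$ticn - 1) %% 3 + 1        # three hypothetical industries
panel$xad <- rlnorm(nrow(panel), meanlog = panel$sicagg)

# Average advertising per industry and fiscal year
avg <- aggregate(xad ~ sicagg + fyear, data = panel, FUN = mean)

# One line per industry; a log scale helps when spending is heavily skewed
p <- ggplot(avg, aes(x = fyear, y = xad, colour = factor(sicagg))) +
  geom_line() +
  scale_y_log10() +
  labs(x = "Fiscal year", y = "Mean advertising (log scale)", colour = "Industry")
print(p)
```

Swapping geom_line() for facet_wrap(~ sicagg) plus geom_line() would give one small panel per industry instead, which can be easier to read with 9 industries.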