I have a dataset of around 40k firms over fiscal years 1950-2011, with about 430k firm-years. If I'm not mistaken, this is panel data. In addition, the firms are nested within 9 industries.

I created a unique identifier `ticn` for each firm. Years are indicated by `fyear`. For now my variables of interest are yearly sales `sale`, yearly advertising `xad`, and yearly R&D expenses `xrd`. I have industry dummies indicated by `sicagg`. I am interested in the relationship between yearly sales and advertising/R&D expenditures, including some control variables.

Currently I am in the exploratory phase of my research.

*So my objective is to get a feel for the data, report descriptives, and maybe make a few plots.*

First I computed between and within descriptive statistics (mean, stdev, min and max). I also made scatter plots between sales, R&D and advertising. In addition, I plotted the time-series of the yearly average advertising expenses for each industry in a nice graph.
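To make the between/within split concrete, here is a minimal sketch in R. It assumes the column names `ticn`, `fyear` and `sale` from above; the toy values and the helper `summarise_stats` are made up purely for illustration, and the within series is re-centred on the overall mean, which is one common convention rather than the only one.

```r
# Toy panel standing in for the real data (values are made up for illustration)
data <- data.frame(
  ticn  = rep(c("A", "B", "C"), each = 4),
  fyear = rep(2008:2011, times = 3),
  sale  = c(10, 12, 11, 13, 50, 55, 53, 58, 5, 6, 4, 7)
)

# Between variation: one mean per firm
firm_mean <- tapply(data$sale, data$ticn, mean, na.rm = TRUE)

# Within variation: deviation from the firm's own mean,
# re-centred on the overall mean so the scale stays comparable
overall_mean <- mean(data$sale, na.rm = TRUE)
firm_mean_by_row <- ave(data$sale, data$ticn,
                        FUN = function(x) mean(x, na.rm = TRUE))
within_sale <- data$sale - firm_mean_by_row + overall_mean

# Hypothetical helper: mean, sd, min and max of a numeric vector
summarise_stats <- function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
}

descriptives <- rbind(
  overall = summarise_stats(data$sale),
  between = summarise_stats(firm_mean),
  within  = summarise_stats(within_sale)
)
print(round(descriptives, 2))
```

The same steps can be repeated for `xad` and `xrd`; comparing the between and within standard deviations shows whether most of the variation is across firms or over time within firms.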

Can you give me ideas for additional analyses? Thanks in advance!


#### Best Answer

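I always start by doing a PCA (Principal Component Analysis) in R because it takes almost no code. Say you have all this in a `data.frame` called `data`. Note that `prcomp` needs numeric columns without missing values, so in practice you would pass only the numeric variables of interest.

    pca <- prcomp(data)
    # Scree plot
    plot(pca)
    # Biplot
    biplot(pca)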
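For the variables in the question, a slightly fuller sketch might look like the following. It is only an illustration under stated assumptions: the data are simulated, the column names `sale`, `xad` and `xrd` are taken from the question, and the `log1p` transform and the standardization are my own choices for skewed, non-negative accounting figures, not something the question prescribes.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the real firm-year data (made-up values)
data <- data.frame(
  ticn  = sample(sprintf("F%03d", 1:40), n, replace = TRUE),
  fyear = sample(1950:2011, n, replace = TRUE),
  xad   = rlnorm(n, meanlog = 2),
  xrd   = rlnorm(n, meanlog = 3)
)
data$sale <- 5 * data$xad + 2 * data$xrd + rlnorm(n, meanlog = 3)
data$xrd[sample(n, 20)] <- NA  # mimic firm-years with unreported R&D

# Keep only the numeric variables of interest and drop incomplete rows
vars <- c("sale", "xad", "xrd")
x <- na.omit(data[, vars])

# Assumption: log1p tames the right skew; scale. = TRUE standardizes each variable
pca <- prcomp(log1p(x), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
print(summary(pca))

# Loadings: how each variable contributes to each component
print(pca$rotation)
```

From here, `plot(pca)` and `biplot(pca)` work exactly as in the snippet above. Keep in mind that dropping incomplete rows can discard many firm-years if R&D or advertising is often unreported.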

For R users, there is also the `ggplot2` library. I know that it can do wonders for data representation, but I don't know how to use it. Maybe someone will suggest something with it?