Working with a colleague on some household income data, where work is irregular. As usual, I start by writing a synthetic data simulation to talk through with my colleague. Helps to ensure I understand the problem right. Also brings up fun (for me) issues like sources of measurement error.
In this case, income data are reports and almost certainly suffer rounding and heaping. It's the little things like this that make even simple exercises not so simple.
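A minimal sketch of what a synthetic simulation with heaping could look like, not the actual simulation from this project. Everything here is an assumption made for illustration: the function name `simulate_reports`, the Bernoulli work days, the log-normal daily earnings, and the rule that some respondents round to the nearest 100 while the rest round to the nearest 10.

```python
import numpy as np

def simulate_reports(n_households=50, n_days=30, p_work=0.4,
                     heap_prob=0.6, seed=1):
    """Simulate true and reported income under irregular work (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    # Irregular work: each household works on a random subset of days.
    worked = rng.random((n_households, n_days)) < p_work
    # Assumed log-normal earnings on days worked, zero otherwise.
    earnings = rng.lognormal(mean=3.0, sigma=0.5, size=worked.shape)
    daily = np.where(worked, earnings, 0.0)
    true_income = daily.sum(axis=1)
    # Heaping: some respondents report to the nearest 100, the rest to the nearest 10.
    heaps = rng.random(n_households) < heap_prob
    unit = np.where(heaps, 100.0, 10.0)
    reported = np.round(true_income / unit) * unit
    return true_income, reported, unit

true_income, reported, unit = simulate_reports()
# The reporting error is bounded by half the rounding unit for each household.
max_abs_error = np.max(np.abs(reported - true_income))
```

Comparing `reported` against `true_income` shows how much information the coarse reports throw away, which is the part an inference model would have to put back.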
Also beginning to worry I am a weird sort of economist now, since half of my recent projects are household income data and I've started using the word "elasticity" in casual conversation.
At least I don't use Stata.
A burst of work this morning, before turning myself over to continuous meetings until 10pm (!) tonight, and I have a seemingly working prototype inference model. Still pending: rounding error in reports. That'll be next.
On the horizon, according to the person who gathered the data, is modeling seasonality. So going to build some smoother on top of all of this, I imagine.
Break between meetings, I've added partial pooling, because I'm just some matrix algebra banshee who wails Cholesky factors into everything now.
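For anyone wondering what Cholesky factors have to do with partial pooling, here is a rough sketch of the usual non-centered trick, assuming correlated varying effects per household. This is not the model from this project; the function name `correlated_effects`, the two effects, and the numbers are all made up for illustration.

```python
import numpy as np

def correlated_effects(z, sigma, L_corr):
    """Turn independent standard normals into correlated varying effects.

    z:      (n_groups, k) independent standard normal draws
    sigma:  (k,) standard deviation of each effect
    L_corr: (k, k) lower Cholesky factor of the correlation matrix
    """
    # Each row becomes diag(sigma) @ L_corr @ z_i, so the covariance is
    # diag(sigma) @ R @ diag(sigma), where R = L_corr @ L_corr.T.
    return z @ (np.diag(sigma) @ L_corr).T

# Hypothetical example: two effects per household with correlation 0.5.
R = np.array([[1.0, 0.5], [0.5, 1.0]])
L_corr = np.linalg.cholesky(R)      # lower triangular factor
sigma = np.array([1.0, 2.0])
rng = np.random.default_rng(7)
z = rng.standard_normal((200_000, 2))
effects = correlated_effects(z, sigma, L_corr)
```

The appeal is that the sampler only ever sees the uncorrelated `z`, while the scales and correlations live in `sigma` and `L_corr`.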
In place of a Dark Simulation you would have Facts About Random Variables! Not dark but beautiful and terrible as the Dawn!
Ahem, computing some summaries of the inferred income distributions.
After getting distracted by commission work the last two weeks, I grab some time today to chip away at this project. Now making a clear doc with the mathematical version of the generative and statistical models. Good chance to find corner cases I missed earlier.
OMG just realized the Pr(d=D) expression needs to account for observed gaps. i.e. implied sum only up to [length of reporting interval] - [days until most recent income report].
This is going to be some hypergeometric nonsense, so might just brute force it in a loop.
I hope you people appreciate my posting my L's here