Here's a bunch of @Stata tips for handling large datasets (millions of observations).
I wish I knew them when I was starting with admin data analysis...

1) Memory is often an issue, so store your data efficiently!
- use the 'compress' command to recast your variables into the smallest data types that hold them without losing information
- declare the correct data types when generating variables
('gen byte var1 = 1')
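A minimal sketch of both points on simulated data. The variable names and the number of observations are made up for illustration; 'describe' before and after 'compress' shows which storage types changed.

```stata
* Illustrative only: simulated data, hypothetical variable names
clear
set obs 1000000
generate long id = _n                          // integer id, stored as long
generate double age = floor(18 + 60*runiform()) // whole numbers stored wastefully as double
generate str20 region = "north"                // string wider than its contents
generate byte female = runiform() < 0.5        // 0/1 indicator declared as byte up front

describe      // check the "storage type" column
compress      // demotes variables only where no information is lost
describe      // compare storage types with the first listing
```

Declaring the type in 'generate' matters most for indicators and small integers, since the default storage type is float (4 bytes per observation).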

2) Merging does not need to take forever!
- be aware that the 'merge' command sorts the 'master' and 'using' data on the matching variables
- you can save a lot of time by running the merge command on datasets that are already sorted!
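One way this could look in a do-file, as a sketch on simulated data. The key 'person_id', the other variable names, and the tempfile are hypothetical; the idea is to pay the sorting cost once, when each file is built, so that later merges find the data already in key order.

```stata
* Illustrative only: simulated person-level and spell-level data
clear
set obs 100000
generate long person_id = _n
generate byte region = 1 + mod(_n, 5)
sort person_id                       // sort the lookup file once, then save it sorted
tempfile persons
save `persons'

clear
set obs 300000
generate long person_id = 1 + mod(_n, 100000)
generate int spell_length = ceil(365*runiform())
sort person_id                       // master is sorted on the key as well

* both datasets are already in key order when merge runs
merge m:1 person_id using `persons'
tabulate _merge
```

'merge' also has a 'sorted' option that tells Stata to skip its own sorting. Use it only when you are sure both datasets really are sorted on the key.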

3) The 'joinby' command does what 'merge m:m' should have been doing all along
- that is, it forms all pairwise combinations between 'master' and 'using' within each group of matching key values
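A tiny toy example makes the difference concrete. The household id 'hh' and the names are invented; household 1 has two parents and two children, household 2 has one of each.

```stata
* Illustrative only: toy data with a hypothetical household id
clear
input byte hh str6 child
1 "Ann"
1 "Ben"
2 "Cat"
end
tempfile children
save `children'

clear
input byte hh str6 parent
1 "Dan"
1 "Eve"
2 "Fay"
end

* every parent is paired with every child in the same household
joinby hh using `children'
list, sepby(hh)
assert _N == 5      // 2 x 2 pairs in household 1, plus 1 pair in household 2
```

'merge m:m hh' would instead pair rows by their order within each household, which is rarely what you want.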

4) If you're working with spell-level data, learn to use Stata's native date & time functions
- these will allow you to store the time data efficiently and avoid transformation/approximation errors
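A short sketch of what this can look like for spell start and end dates. The variable names and the date strings are hypothetical, and the "YMD" mask assumes the raw strings look like 2019-01-15.

```stata
* Illustrative only: two made-up spells stored as strings
clear
input str10 start_str str10 end_str
"2019-01-15" "2019-03-01"
"2020-02-27" "2020-03-02"
end

* convert to Stata daily dates (days since 01jan1960)
generate long start_date = date(start_str, "YMD")
generate long end_date   = date(end_str, "YMD")
format start_date end_date %td

* exact spell length in days, leap years handled for you
generate int duration = end_date - start_date

* date-time values are milliseconds since 1960 and need double storage
generate double admitted = clock("2019-01-15 08:30:00", "YMDhms")
format admitted %tc
list
```

Storing a '%tc' date-time in a float silently rounds it, which is one source of the approximation errors mentioned above.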
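
5) Factor variables can help you overcome memory constraints in regressions
- writing 'i.var' creates the indicators on the fly, so you don't need to store a dummy variable for every category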
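As a sketch, using the 'auto' example dataset that ships with Stata rather than real admin data:

```stata
* Illustrative only: Stata's built-in example dataset
sysuse auto, clear

* Instead of creating dummies first, e.g.
*   tabulate rep78, generate(rep_)   // adds one new variable per category
* let factor-variable notation build the indicators at estimation time
regress price weight i.rep78

* interactions work the same way, with no extra variables stored
regress price c.weight##i.foreign
```

With millions of observations and a categorical variable with many levels, skipping the stored dummies can be the difference between fitting in memory and not.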

6) When estimating regression models with 'if' conditions, it is often faster to drop all the irrelevant data first
* keep only the estimation sample, then bring the full data back
preserve
keep if var1 == 1
reg y x, robust
restore
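If you want to check whether this pays off on your own data, a rough timing comparison could look like the sketch below. The data are simulated and the variable names are just the placeholders from the tip; 'preserve' has its own cost, so the result can go either way depending on the dataset and the model.

```stata
* Illustrative only: simulated data, timings will differ on real data
clear
set obs 2000000
generate byte var1 = runiform() < 0.1
generate x = rnormal()
generate y = 2*x + rnormal()

timer clear
timer on 1
regress y x if var1 == 1, robust     // 'if' condition on the full data
timer off 1

timer on 2
preserve
keep if var1 == 1                    // drop the irrelevant rows first
regress y x, robust
restore
timer off 2

timer list                           // compare timers 1 and 2
```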

I'll add more when I think of something relevant.
Feel free to chime in too 🙏