I was handed 5,300 PDFs of medical manuals and now I'm going to put them into the Archive: A thread.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        First, I'm keeping the original 29GB .tar.bzip2 they gave it to me in, because I know there are folks who want just one big pile and don't want my clever little uploads. Keep the originals around, if you can - let someone else have the chance you did to start.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        Next, the metadata is partially in the directory tree. I am writing a custom script that turns the directory structure into keywords.
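Here is a minimal sketch of that idea, not the actual script. It assumes every folder between the top of the unpacked dump and the PDF is worth keeping as a keyword, and that underscores in folder names stand in for spaces. The function name and the example path are made up for illustration.

```python
from pathlib import PurePosixPath


def keywords_from_path(pdf_path, root):
    """Return the directory names between `root` and the file as keywords.

    Assumption: every intermediate folder name is a meaningful keyword.
    """
    relative = PurePosixPath(pdf_path).relative_to(PurePosixPath(root))
    keywords = []
    # relative.parts ends with the filename; everything before it is a folder.
    for part in relative.parts[:-1]:
        cleaned = part.replace("_", " ").strip()
        if cleaned and cleaned not in keywords:  # skip empties and repeats
            keywords.append(cleaned)
    return keywords


# Hypothetical layout, purely for illustration:
print(keywords_from_path("dump/Welch Allyn/Imaging_Systems/manual.pdf", "dump"))
```

Keeping the keywords in tree order (outermost folder first) means the broad category comes before the specific one.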
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        I can make collections because I'm an admin. The collection will be named "manuals_medicaldevices" and will sit inside the "manuals" collection.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        I'm now rewriting my longtime uploading script to do a little extra work since the metadata is there in the file directory. I'll then test with a single item.
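A rough sketch of the "little extra work", under stated assumptions: the identifier rule (a "manual_" prefix, with every run of non-alphanumeric characters collapsed to one underscore) is inferred from the item URL that shows up later in this thread, and the metadata field names (title, mediatype, collection, subject) are the usual archive.org ones rather than anything confirmed here. Both function names are hypothetical, and the upload call itself is left out.

```python
import re


def make_identifier(title, prefix="manual_"):
    """Build an item identifier from a title.

    Assumption: the rule is inferred from one example URL, not documented.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return prefix + slug


def build_metadata(title, keywords, collection="manuals_medicaldevices"):
    """Assemble the metadata that would be sent along with the PDF."""
    return {
        "title": title,
        "mediatype": "texts",       # assumed: PDFs go in as text items
        "collection": collection,
        "subject": list(keywords),  # keywords taken from the directory tree
    }


title = "Welch Allyn LCI 100 & 200 Imaging System Service Manual"
print(make_identifier(title))
print(build_metadata(title, ["Welch Allyn"]))
```

Building the metadata as a plain dictionary first makes it easy to print and eyeball for one item before anything gets uploaded.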
                    
                                    
                        The collection is now waiting for me: https://archive.org/details/manuals_medicaldevices
                        
                        
                        These things almost never go right the first time, so I'm running just one iteration of my script, on a single item: a Welch Allyn LCI 100 & 200 Imaging System Service Manual. I see it got uploaded, possibly with useful metadata.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        Now we're going to run into an interesting situation - the archive has a massive queue system running, with hundreds of thousands of "jobs" a day. My manual upload will fall into place, over the course of a few minutes, and then generate a readable version. It's not instant.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        You can now see the item here: https://archive.org/details/manual_Welch_Allyn_LCI_100_200_Imaging_System_Service_Manual
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        Note that how it looks depends on when you see it. If you're following this thread at this exact moment, then it's going to be very incomplete. But then, over time, it will pull in a thumbnail, generate an online readable version, and it'll add OCR to the search function.
                        
                        
                        
                        
                                                
                    
                    
                                    
                    
                        
                        
                        Looking over this item, I can already see there's an interesting situation: I happened to choose an item where the creators of this collection put two identical copies into the directories!
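One way to spot copies like this ahead of time is to hash every PDF and group the files whose hashes match. This is only a sketch: it assumes "identical" means byte-for-byte identical and that the files end in a lowercase ".pdf". The function names are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large PDFs stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*.pdf")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests seen more than once are duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each group could then be uploaded once, with the keywords from every folder it appeared in merged together, so no metadata is lost by skipping the extra copy.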
                        
                        
                        
                        
                                                
                        
                                                
                    
                    
                                    
                    
                        
                        
                        Devices.
                        
                        
                        
                        
                                                
                    
                    
                
                 
                             
                                     
                                    