The Data Import utility allows a user to load data from various source file types into a database table. QLR Manager's importing process supports the filtering of the source data, so only the desired data from the source file is loaded to the table. For more advanced users, the data itself can be manipulated during the import process with QLR Manager's "evaluated formula" technique when adjustments to the raw data are desired.
Data can be imported into an existing table, or a new table can be created at the time of the import. For an existing table, there are 4 ways that the imported data can be handled. These options are in addition to the filtering that can be applied to the source data.
|•||Load the rows of data into the specified table. No examination of the existing table data takes place.|
|•||Load the rows of data only when new information is found. The User specifies how to determine when data is to be considered a new entry using the data mapping checkboxes.|
|•||Update desired fields in the existing rows of data in a table with information found in the imported data. The User specifies how to match the incoming data to the existing table data using the data mapping checkboxes.|
|•||Update desired fields in existing rows of data in a table with information found in the imported data, but do so only when there is a unique 1 to 1 match with the incoming data. The User specifies how to match the incoming data to the existing table data using the data mapping checkboxes.|
For Users with authority, this utility is accessible by selecting "User ID Admin and Tools" from the Connect panel, then clicking the Data Import header tab. Importing data is a two set process. Step one is to define the Source File and the Target Table or a New Table associated with the data. Step two is to "map" the data found in the Source File to the columns found in the Target Table. Optionally, a third step can be used, which is the definition of load filters and formulas to alter the raw data.
Once the data has been acquired and mapped to a table, clicking the Load Data button found in the top left corner of the data preview will execute the import.
A local source file, such as one found on your workstation, is selected by using the browser's HTML file upload control to navigate to the desired file using the Browse... button. After a file is selected, clicking the Get File button will load the contents of the file to the server, allowing QLR Manager to work with the data. The Get File button need only be used once to initially load the data, but the process can be repeated to load a different source file. There is no harm in acquiring the same data again, but with large source files, the upload process may be time consuming.
Note: The size of file that can be imported is limited by 2 different php settings on your server. These are upload_max_filesize and post_max_size. Generally, these are set to be 2 megabytes. QLR Manager will display a message in the Source File section of the Import panel containing the maximum size that can be uploaded.
A server source file is a file located on the server where QLR Manager resides. These are selected by using the Browse... button. After a file is selected, clicking the Get File button will load the contents of the file allowing QLR Manager to work with the data. The Get File button need only be used once to initially load the data, but the process can be repeated to load a different source file. There is no harm in acquiring the same data again, but with large source files, the upload process may be time consuming.
The Browse... button is actually a browser pop up window. It is designed to list files with certain file type extensions within a restricted directory structure. This is for security reasons, as it has the ability to navigate to any file on a server. There are two things that limit what QLR Manager will access. The first is the top level directory (the default value is to stay within the directory where QLR Manager is installed). The second is to only display files with certain file extensions (the default values being 'csv, dat, data, tsv, txt, text, xls').
Since these files are already located on your server, there is no upload file size restriction.
These values can be changed by placing entries into the qlr_info table. Here are some examples:
|•||insert into qlr_info values ('server browser top dir', '/home/public_html/data/');|
|•||insert into qlr_info values ('server browser top dir', 'C:\\Program Files\\Apache Software Foundation\\Apache2.2\\htdocs');
Notice how the backslashes have been "escaped". This is necessary for certain database engines, such as MySQL.
|•||insert into qlr_info values ('server browser file types', 'csv, txt');
This entry would limit file access to .csv and .txt files.
|•||insert into qlr_info values ('server browser file types', '*');
The asterisk allows for all file types to be displayed.
The data delimiter determines which character QLR Manager will use to parse the source data. The types of delimited data that can be imported into QLR Manager are "Comma", "Semi-colon", "Space", "Tab" and "Fixed width". The delimiter can be chosen prior to acquiring the source file data. However, if the delimiter must be changed after selecting and loading a source file, the data does not need to be acquired again. Simply change the delimiter setting. QLR Manager will automatically parse the data using the newly selected delimiter.
Fixed width data does not use a character for parsing. Instead, the data is separated by defining the starting points of each column or field of data. The ending point for that field is one character (space) less than the next starting point value. When Fixed width is the delimiter, and the Source File and Target File have been defined, the User will be able to specify the starting points in the preview data table that will be displayed. The starting points are entered into the Col start positions text box, with each starting point separated by a space. QLR Manager will pre-populate this field with its "best guess" for the field starting points. As the starting points are entered, use the Refresh button to see how the new values impact the parsing of the data.
The Target table is chosen using the controls provided for selecting a database/schema and then the desired table. Once the source file data has been acquired, and target table has been selected, the "Data Filtering & Mapping" section will appear.
When this option is checked the first row of data in the table is treated as a header row. If there is a match between the value in this row and the name of a column in the Target table, QLR Manager will automatically map the data to that table column by preselecting that column name from the column selection list. The selection can be overridden if desired.
The preview type provides a means to analyze the data before importing. When the table containing the sample (50 rows maximum) of the data to be imported is initially displayed, all the rows will be displayed. This includes the rows that will be loaded along with the rows that will be filtered displayed with a red background. To see a sample of just the rows that will be filtered, choose the "Rows to be filtered" selection and click the Refresh button. Conversely, choosing the "Rows to be loaded" selection will show a sample of only those rows that will be loaded during the import.
The header rows refer to leading rows that may be present at the start of the source file, such as column headings or descriptions. This option provides a means to bypass the loading of these header rows by choosing the number of rows in the beginning of the source file to be ignored when the data is loaded. These rows will be displayed in a dark red background after clicking the Refresh button.
Data load processing will pause when the error limit number is exceeded. This allows the User to evaluate if they want to continue with the data load. The default value is 500. In order to continue processing, press the Refresh button, followed by the Load Data button.
The batch size allows you to specify how many rows of data to processing before pausing. This can be very useful when your systems PHP maximum execution time does not allow for all data to be processed at once. Note that this does not effect the amount of data being uploaded to your PHP server environment, as the complete file is still uploaded. This only applies to how many of the rows are processed in a batch. The default value is 50000.
Once the batch size is reached, QLR Manager will automatically refresh your browser session to process the next set of records. This prevents PHP processing, and your session with the server, from exceeding the maximum execution time.
The data filtering controls allow the User to specify the records to load when the data in a column meets the given criteria. Data rows that do not meet the filter criteria will be shaded with a light red background after clicking the Refresh button. Rows that meet the import criteria will have a white background. There are two sets of filters that can be used in conjunction with one another, in either an "And" or "Or" relationship. The column number found in the select list in the first field of the filter represents the column number as found in the data preview area. In order for a filter to be applied, a column number must be selected. If the column number selection is left blank, the filter is ignored.
The second field of the data filter allows the User to choose a comparison operator. The comparison operator will be applied to the column selected in the first field. The third field of the filter is used to enter the comparison values. An example of a basic filter is Col # 7 equals Montana. There is no need to surround the comparison values with quotes. Multiple criteria values can be entered separated by commas such as Montana,Idaho,Utah.
When the "evaluated formula" is chosen as the comparison operator, the User can create their own PHP based formula in the comparison field. An example might be the 3rd column of data contains the last name and only imported data desired is where the last name is Jay, but the data it is a mixture of entries such as Jay, JAY and jay, along with other last names. The User could set the column reference to Col # 3, the compare operator to "evaluated formula" and the comparison text field could contain the evaluated formula of strtoupper($value)=='JAY'. This combination will find all occurrences of Jay, regardless of the case of the letters. The evaluated formulas used in the data filters do not change the data being loaded. The evaluated formula found in the data preview area can be used to manipulate the data during load.
As with other places where evaluated formulas are supported in QLR Manager, data found in other columns can be referenced by using the $val[ ] array format. For example, if the customer number is found in column 1, and the state is found in column 6, an evaluated formula can be written to import all customers, whose customer number is greater than 2000 and are in NY: $val > 2000 && $val=='NY'. Please note that when using the $val[ ] technique to reference column data, the column reference field (such as Col # 1) is not used. However, a column number must be selected to apply the filter (any Col #). QLR Manager will only apply a filter if the column reference selection is not blank.
This optional text is used to limit the table rows being impacted by the Import event. This applies when using the "Insert when new", "Update existing data" or the "Update when unique" options.
This field allows the User to add addition SQL to enhance the data selection criteria from the existing Table. For example, suppose it is desired to update only a subset of data such as only the persons located in Denmark. The text country_code='dk' could be entered into this field. When the update query is generated, it may look something like this:
update customer set language='danish' where last_name='hansen' and country_code='dk'
The ' and country_code='dk' was appended to the update query string matching.
Once a source file and target table have been selected, the Import Data Preview will appear showing the first 50 rows of source data. It is in this preview where the source data is mapped to the columns in the chosen table.
When the data is displayed in the preview, the number of characters for any given column of source file information is limited. When delimiters other than "Fixed width" are used, the data is limited to displaying the first 25 characters, followed by "...", which indicates that the data has been limited for display purposes. When the data is fixed width, the middle section of the data is removed and replaced with "...". The first 25 and last 25 characters are displayed.
The source file column number is used as a reference for each of the data columns that are the result of the data parsing. These column references are used with the data filtering option.
This is the most important of the import preview options. The column mapping is where the columns in the source file are mapped to the table columns when loaded. At least one column in the source data must be mapped to a table column. A source data column can only be mapped to one table column. QLR Manager will check for both of these conditions when the Load Data button is clicked. If either condition is found, the User is returned to the Data Preview to correct the situation.
The Map table column option is not shown when creating a New Table.
This option is displayed when creating a New Table with the imported data.
The Column name field provides a means of defining the column name for the data being imported. Only alpha-numeric characters and underbars should be used. If a field is left empty, then no table column will be created and data will not be imported for that column.
These checkboxes are used when it is desired to insert data into a table, or update existing data in a table. When checked, it indicates to QLR Manager to match the data being imported to the data that already exists in that table column. This option is not displayed when importing data to a New Table.
Suppose a table called "contacts" exists that contains potential new customers. The table contains columns for first_name, last_name, postal_code, email and status. You also have a file that contains more potential customers. However,this file may contain names and emails of persons that already exist in the "contacts" table.
Inserting new data:
If it is desired to import new contacts only when the email is not already found in the "contacts" table, you would first choose the Import type of Insert when new. Then check the proper Unique/New criteria checkboxes. In this example, you would check the checkbox in the column which maps the email table field to the email data found in the input file. When QLR Manager loads the data, it will only load the file data into the table when the email is not already found in the table. If the file data contained multiple entries for the same email address, it would only load the first record with that email.
Update existing data:
There may be times when you want to update existing table data based on other data that you acquire. Suppose that you have a file of postal codes and the date that a mailer was sent to all persons living in that postal code. What you are looking to do, is to update the status field in your "contacts" table with that date. This is accomplished by first choosing the Import type of Update existing data. The second step is to check the checkbox which maps the postal code in the data file to the postal_code field in the table. The last step is to map the data fields to be updated. This is accomplished by mapping the date found in the data file to the status column found in the table. Using the Map table columns selection list, select the status column to be mapped to the date data.
This option will allow for multiple rows of data in your table to be updated by one row of data in your source file. In this case, you have a postal code entry in your data file that may have updated the status field for 200 or 4000 records in the "contacts" table.
Update when unique:
This works the same way as the Update existing data option, except that it will only update the data in the table when a single row is found that meets the mapping criteria.
Suppose you have a file of first and last names, with an email address. What you are looking to do, is to update the email field in your "contacts" table with the new email address. However, you only want to update the email address when you find a single occurrence of that first and last name. You will handle those with multiple matches manually. This is accomplished by first choosing the Import type of Update when unique. The second step is to check the checkboxes which map the first name and last name data to the first_name and last_name columns in the table. The last step is to map the data fields to be updated. This is done by mapping the email data found in the file to the email column found in the table. Using the Map table columns selection list, select the email column to be mapped to the email data.
When unique combinations of first_name and last_name are found in the table, the email address will be updated.
As the mapped columns are selected, the column data type is updated to display the "type" of data that is associated to that table column. Typical entries are varchar, int, datetime, etc. This can help in determining if the source data is being mapped to the correct table column.
When creating a New Table, various data type options are displayed. The options are based upon the type of data that is found in the data being imported. The User can choose to override the suggested data type that QLR Manager has calculated.
When creating a New Table, certain data type fields have associated size values. For example, a variable character field (varchar) has a size value that specifies the maximum allowable number of characters that can be input into that field. Each database engine has its own maximum value that can be specified. MySQL is 255, and Oracle is 4000 characters. QLR Manager calculates the varchar size based on the data being imported. It will be displayed in the selection list, such as varchar(24). This size can be overridden by entering a new value into the Column size field.
For numeric fields, such as the decimal data type, both the maximum size of the number, and the number of decimal places can be specified. For example, if a data type of decimal is chosen, the Column size can be entered as 7. This means that the largest number entered can have 7 positions. If two decimal places are desired for that field, the column size can be entered as 7,2. Please be aware that the number of decimal places is part of the overall size of the number, so if you add decimal places, you should increase the total size as well. In this example, 9,2 would accomodate numbers as large as 7.
Loading date information can be challenging because a date, in the source data, can be in so many different formats. QLR Manager provides the User with options for parsing data that is to be loaded as a date or datetime information. These various parsing formats are values for the Date format option.
The Date format option is only applicable when a Column Type of "date" or "datetime" is selected. When "date" is chosen, QLR Manager will try and convert data to the standard yyyy-mm-dd SQL date format for inserting the data into your database. When the "datetime" format is chosen, QLR converts the date portion to the yyyy-mm-dd format. The remaining portion is then appended to the date, so it is the User's responsibility to provide a valid time format. If the input cannot be converted to a valid date, it will be given a date of '1969-12-31'. When QLR Manager first obtains the source data file, it looks for patterns in the data that may signify that the information for a column is a date. If a recognized pattern is found, the Column type is preset to "date", and the Date format is preset to the pattern that is found.
|(blank)||When the blank date format is chosen, QLR Manager uses the PHP string to time function, strtotime(), to try and convert a string to a date value. This will convert values that are in a random pattern to a date: July 14th, 2004 will be converted to "2004-07-14". In order for this function to work properly, the input string must contain a four digit year, and enough information to distinguish the month from the day.|
|ddmmyyyy||This set of formats can be used to parse data where there is no separator between the date components. The input must be totally numeric and the data parsing is positional dependent. The User must choose the proper format for parsing, as QLR Manager does not attempt to guess at the correct format.|
|dd-mm-yyyy||This option is based upon QLR Manager finding a consistent separator between the date components. When a two digit year is availabe, QLR Manager will set the century to 1900 when the year is greater than 20. 24 will be set as 1924. 18 will be set as 2018. Valid separators are a blank space,'.','-','/' or '_'. The "mm" represents a numeric representation of the month. The month can be 1 or 2 digits. QLR Manager will guess at the format option to use, but the User can change the selected option.|
|dd-mmm-yyyy||This option is also based upon QLR Manager finding a valid date separator character. When a two digit year is available, QLR Manager will set the century to 1900 when the year is greater than 20. 24 will be set as 1924. 18 will be set as 2018. The "mmm" represents a character based month of 3 or more characters. The month can be determined as long as the first 3 characters of the input data are the first 3 characters of a valid month name. QLR Manager will guess at the format option to use, but the User can change the selected option.|
When importing Date data, it is strongly suggested that the User Preview the data to be loaded to determine if the desired results will be achieved.
The evaluated formula provides the User the ability to manipulate their data to be loaded using PHP. This gives the User unlimited flexibility in performing clean-up when importing their data. The data in the column is referenced by $value. Data in other columns is referenced using the $val[ ] array format. Data from the prior row can be accessed using $pval[ ]. Some evaluated formula examples are:
|•||$value = strtolower($value)|
|•||$value = substr($value,1,4)|
|•||$value = $val.substr($val,2,8)|
|•||$value = date('Y-m-d')|
|•||$value = $pval + 1 (This increments the current row value by 1)|
The impact of an evaluated formula can be seen by clicking the Refresh button.
This option is only displayed when "Fixed width" data is being imported. It allows the User to specify the break points to parse the source data into separate columns of information. The numeric values entered into this field are separated by spaces, such as 1 15 32 56. When a source file is first loaded, and the Data delimiter is "Fixed width", QLR Manager will try and determine what the starting points might be by looking for repeating patterns in the source data. It also does this when the starting point input field is blank, and a Refresh button is clicked. The User can edit the values in this field and click the Refresh button to view the impact of new values.
This option is only displayed when "Fixed width" data is being imported. The gauge allows the User to determine the proper starting points to be defined when working with fixed width data.