codemonth.dk

One project every month - making stuff better ...

Chinese support added to parts of the plsql random data generator

For quite a while I have been wanting to add Chinese support to some of the dbms_random functions, and this last weekend I finally got some time to work a little bit on it. So now the library support Chinese output in functions in the core_random package, text_random package and location_random package.

Version your Oracle ORDS REST API using Edition Based Redefinition

Updated, May 4 2017: New approach One of the last entries in my blog described how I could quickly publish all random_ninja functions as rest endpoints. While I am still waiting for the ORDS team at Oracle to come out with automatic publish of functions and procedures including input parameters, I wanted to add another layer of functionality to my REST endpoints. Versioning; And for that we can use the cool Oracle feature called Edition Based Redefinition.

Take a random REST

I needed to test ORDS for some webservice related projects at work. So I decided to build a rest interface for all the different random data functions in the RANDOM_NINJA package to test out, just how easy it is.

First order is to setup ORDS. Like always Tim over at oracle-base have a couple of really good articles on installing and setting up ORDS itself. So no need for me to duplicate that.

After setting up ORDS, and enabling my schema for rest services like in this oracle-base article, it is time to enable the functions in the different data domain packages. I decided to keep the URL scheme simple, so that the path would be /[package_name]/[function_name]/ for any of the functions that are part of the RANDOM_NINJA package.

Updated random_ninja and testdata_ninja packages

A couple of people have requested that I explain how to install the entire random_ninja and testdata_ninja packages manually. I have created a gist here with the full and complete order of the files: Full install list. So download random_ninja zipfile and testdata_ninja zipfile and follow the order of the gist to install all the code.

Once you have installed the packages you have the full random library available, which you can read in full detail about here: Post #1, Post #2 and Post #3. For testdata package you can see a quick demo here and here.

Release 3 of random_ninja is coming up and this is the list of new functionality that will be fully supported (HINT: The code is already in the github source):

  • company_random
    • r_companyname - generate a random company name.
    • r_industry - generate a random industry type.
    • r_companyid - generate a random country tax id.
    • r_employees - generate a random employee count.
    • r_revenue - generate a random revenue number.
  • phone_random
    • r_brand - generate a random brand name.
  • computer_random
    • r_error - generate a random error string.
  • util_random
    • ru_permute - Permute a string: Morten -> Metron.
    • ru_scramble - Scramble a string: Morten -> Diwnah.
    • ru_obfuscate - Obfuscate a string: Morten -> FasNituha.
  • transport_random
    • r_vehicle_registration - generate random country specific vehicle registration plate.
    • r_icao - generate a random country specific aircraft registration code.
    • r_imo - generate a random vessel registration code.

Also there are 3 new locale supported for names, addresses and more. These countries are Denmark, China and Dhubai. Performance improvements for Markov text generation are coming as well as new SWIFT and FIX financial random generation.

Random fun

It has been some fun evenings lately. I have added a lot of new functionality and new data domains to the RANDOM_NINJA package. One of the bigger updates is that I have created a Markov Chain function for text generation, so the random text is a lot more real looking:

SQL> select text_data.markov_sentence from dual;

MARKOV_SENTENCE
--------------------------------------------------------------------------------------------------------------
In but thruout the entire world various kinds of traps and snares as.

SQL> select text_data.markov_sentence from dual; 

MARKOV_SENTENCE
--------------------------------------------------------------------------------------------------------------
Safe however that not one-half of the best trappers use them. It requires.

SQL> select text_data.markov_sentence from dual;

MARKOV_SENTENCE
--------------------------------------------------------------------------------------------------------------
The trappers use them. It is a humane way of killing.

SQL> 

This really adds a lot of functionality in terms of testing text functionality and text indexing, as the generated text is much closer to how real text would look.

The person_random has received a lot of job related functionality, so we can now generate job titles, salaries and more:

SQL> select person_random.r_jobsector from dual;

R_JOBSECTOR
--------------------------------------------------------------------------------------------------------------
Educational Services

SQL> select person_random.r_jobtitle from dual;

R_JOBTITLE
--------------------------------------------------------------------------------------------------------------
Consultant

SQL> select person_random.r_jobtitle(r_jobsector => 'Educational Services') from dual;

PERSON_RANDOM.R_JOBTITLE(R_JOBSECTOR=>'EDUCATIONALSERVICES')
--------------------------------------------------------------------------------------------------------------
Teacher

SQL> select person_random.r_salary(r_jobsector => 'Educational Services') from dual;

PERSON_RANDOM.R_SALARY(R_JOBSECTOR=>'EDUCATIONALSERVICES')
----------------------------------------------------------
						     94260

SQL> r
  1* select person_random.r_salary(r_jobsector => 'Educational Services') from dual

PERSON_RANDOM.R_SALARY(R_JOBSECTOR=>'EDUCATIONALSERVICES')
----------------------------------------------------------
						     50400

SQL> select person_random.r_jobsector from dual;

R_JOBSECTOR
--------------------------------------------------------------------------------------------------------------
Management

SQL> 

The finance_random package has been updated with features related to accounts and account data generation. So we can generate random IBAN and BBAN valid bank accounts for a lot of countries, and we can generate random transaction types and account balances:

SQL> select finance_random.r_accountnumber from dual;

R_ACCOUNTNUMBER
--------------------------------------------------------------------------------------------------------------
FO1361785488259274

SQL> select finance_random.r_accountnumber('DK') from dual;

FINANCE_RANDOM.R_ACCOUNTNUMBER('DK')
--------------------------------------------------------------------------------------------------------------
DK1848327576613652

SQL> select finance_random.r_accounttype from dual;

R_ACCOUNTTYPE
--------------------------------------------------------------------------------------------------------------
Investment Account

SQL> r
  1* select finance_random.r_accounttype from dual

R_ACCOUNTTYPE
--------------------------------------------------------------------------------------------------------------
Investment Account

SQL> r
  1* select finance_random.r_accounttype from dual

R_ACCOUNTTYPE
--------------------------------------------------------------------------------------------------------------
Savings Account

SQL> select finance_random.r_accounttransaction('Savings Account') from dual;

FINANCE_RANDOM.R_ACCOUNTTRANSACTION('SAVINGSACCOUNT')
--------------------------------------------------------------------------------------------------------------
Deposit

SQL> r
  1* select finance_random.r_accounttransaction('Savings Account') from dual

FINANCE_RANDOM.R_ACCOUNTTRANSACTION('SAVINGSACCOUNT')
--------------------------------------------------------------------------------------------------------------
Deposit

SQL> r
  1* select finance_random.r_accounttransaction('Savings Account') from dual

FINANCE_RANDOM.R_ACCOUNTTRANSACTION('SAVINGSACCOUNT')
--------------------------------------------------------------------------------------------------------------
Withdrawal

SQL> select finance_random.r_accounttransaction('Investment Account') from dual;

FINANCE_RANDOM.R_ACCOUNTTRANSACTION('INVESTMENTACCOUNT')
--------------------------------------------------------------------------------------------------------------
Dividend

SQL> r
  1* select finance_random.r_accounttransaction('Investment Account') from dual

FINANCE_RANDOM.R_ACCOUNTTRANSACTION('INVESTMENTACCOUNT')
--------------------------------------------------------------------------------------------------------------
Cap. Gains Short

SQL> select finance_random.r_accountbalance from dual;

R_ACCOUNTBALANCE
----------------
       287555.25

SQL> select finance_random.r_accountbalance('Savings Account') from dual;

FINANCE_RANDOM.R_ACCOUNTBALANCE('SAVINGSACCOUNT')
-------------------------------------------------
					 430614.3

SQL> r
  1* select finance_random.r_accountbalance('Savings Account') from dual

FINANCE_RANDOM.R_ACCOUNTBALANCE('SAVINGSACCOUNT')
-------------------------------------------------
					  5411.24

SQL> 

Lots of random data

After a couple of long weekends, I have finally reached what I think is a good first attempt of a random data generation package. The number of different domains that you can generate data for, should be enough to cover most general use cases.

The package, or packages, are divided into a couple of subcategories to make it a bit more easy to get an overview of the different capabilities. You can see the below list of what you can do with this:

  • core_random - The basic random functionalities required.
    • r_bool - Random true/false values. Chance of true can be specified.
    • r_natural - Random natural numbers. Range can be specified.
    • r_character - Random character. Casing, Choice pool and Symbols can be specified.
    • r_float - Random float number. Range and precision can be specified.
    • r_integer - Random integer number. Range can be specified.
    • r_string - Random string. Length and character pool can be defined.
    • r_hex - Random hex string. Length can be defined.
  • text_random - Generate random pieces of text, to simulate comments, descriptions, articles and more.
    • r_syllable - Random syllables.
    • r_word - Random word. Number of syllables, length and capitalizon can be specified.
    • r_sentence - Random sentence. The number of words in the sentence can be specified.
    • r_paragraph - Random paragraph. Number of sentences can be specified.
  • time_random - Generate random time and date components.
    • r_second - Random second.
    • r_minute - Random minute.
    • r_hour - Random hour. am/pm format can be specified.
    • r_ampm - Random am or pm.
    • r_millisecond - Random millisecond.
    • r_year - Random year. Min and max can be specified, as well as two digits result if needed.
    • r_month - Random month. Season can be specified.
    • r_day - Random day. Month can be specified.
    • r_date - Random date. Any component of the date format can be fixed if defined.
    • r_epoch - Random epoch date.
    • r_timestamp - Random timestamp. Any component of the timestamp can be fixed if defined.
  • location_random - A wide variety of random location related data.
    • r_country - Random country. Either full name, or short ISO form can be specified.
    • r_height - Random height. Result in feet and livable height can be specified.
    • r_depth - Random depth. Result in feet, and depth on land and not ocean can be specified.
    • r_latitude - Random latitude angle. Precision can be specified.
    • r_longtitude - Random longtitude. Precision can be specified.
    • r_coordinate - Random coordinates as a pair of latitude and longtitude points.
    • r_altitude - Random altitude, from 0 to upper atmosphere. Result in feet can be specified.
    • r_state - Random state name. Shortform option available. Country can also be specified.
    • r_zipcode - Random zip code. Country can specified for country specfic formats.
    • r_street - Random street name. Country and shortform can be specified.
    • r_address - Random address. Country and max numbering of street location can be specified.
    • r_city - Random city name. Country and state can be specified.
    • r_address2 - Random address 2 line, such as apartment or name of building. Country can be specified.
    • r_level - Random building level.
  • person_random - Random personal related data.
    • r_age - Random age. Type (child, teen, adult, senior) can be specified.
    • r_birthday - Random birthday date. As with age, type can be specified.
    • r_firstname - Random first name. Both country and gender can be specified.
    • r_middlename - Random middle name. Country and gender can be specified.
    • r_lastname - Random last name. Country and gender can be specified
    • r_name - Random name. Country, Gender, Include middlename, Middle initial and prefix can be specified.
    • r_gender - Random gender.
    • r_prefix - Random prefix. Gender and country can be specified.
    • r_identification - Random valid identification number, based on the country specified.
    • r_suffix - Random suffix. Country and shortform can be specified.
  • phone_random - Generate random data related to phones.
    • r_imei - Random valid IMEI number. IMEISV version can be specified.
    • r_meid - Random valid MEID number.
    • r_country_calling_code - Random valid international code. + or 00 can be specified.
    • r_phonenumber - Random valid phone number. Generates valid landline or mobile numbers. Country and country code can be specified.
    • r_mcc - Random valid MCC code. Country can be specified.
    • r_mnc - Random valid MNC code. Country can be specified.
    • r_imsi - Random valid IMSI number. Country can be specified.
    • r_call_type - Random call type, for CDR data.
    • r_call_type_service - Random call type service, for CDR data.
    • r_operator_code - Random and valid operator code (pmnl) for CDR records. Country can be specified.
  • finance_random - Generate random financial data.
    • r_currency - Random valid ISO currency. ISO shortform can be specified.
    • r_currencypair - Random base and quote currencies. Both can be specified.
    • r_creditcard - Random valid credit card name. Shortform can be specified.
    • r_creditcardnum - Random valid creditcard number. Creditcard type can be specified.
    • r_expirydate - Random credit card expiry date. Future data can be specified.
    • r_amount - Random amount. Range and precision can be specified.
  • web_random - Generate random web/internet related data.
    • r_tld - Random top level domain. Generic and country inclusion can be specified.
    • r_domain - Random domain name. TLD can be specified.
    • r_email - Random email address. Specify if you want to use real name.
    • r_ipv4 - Random valid ipv4 ip address.
    • r_url - Random URL. Specify if you want to use https and if you want to add query parameter.
    • r_ipv6 - Random valid ipv6 ip address.
  • consumer_random - Generate random consumer related data.
    • r_category - Random consumer category. You can choose between all, food ro non-food categories.
    • r_food_item - Random food item. Category can be specified.
    • r_nonfood_item - Random non-food item. Category can be specified.
    • r_service_item - Random service related item. Category can be specified.

Phew... no wonder I felt like this was taking a lot of time to create. I really hope that people can use this for something. I know that there are several times where I have to create random test data, and I always end up using excel, or even worse, just one row with 123,123,123,asdf,asdf like data.

Now we have a better option

Generating random data

We all need to create random data every now and again. Most of the times I have resolved to simply using dbms_random to generate random strings and numbers. The problem with approach though, is that data does not look like "real" data, and indexing will not be realistic. The Alexandria library contains a package that has a few functions random_util_pkg, but it is limited in the type of data you can create, and I felt that there were a couple of more parameters where I needed a bit more control on the output.

So I have created RANDOM_NINJA which for now has the basic functions that I needed.

Just see the quick demo I have created below: