We access data on conflict events in Uganda since 1997. After cleaning, we end up with a list of events describing the actors involved, the type of conflict and its location.

Here, we are cleaning data on individual conflict events from the ACLED API.

%run /Users/thomasadler/Desktop/futuristic-platipus/capstone/notebooks/ta_01_packages_functions.py
%run /Users/thomasadler/Desktop/futuristic-platipus/keys.py
conflict_api_endpoint = "https://api.acleddata.com/acled/read"

We want to get every single conflict event that happened in Uganda since 1997 (the start of the dataset).

uganda_iso = 800
conflict_r = requests.get(
    f'{conflict_api_endpoint}?key={conflict_api_key}&email={conflict_api_email}&limit=0&iso={uganda_iso}'
)

#parse the JSON response
data = conflict_r.json()

#extract events information
events = data['data']

#save to a dataframe
uganda_conflict = pd.DataFrame(events)
uganda_conflict_df = uganda_conflict.copy()
uganda_conflict_df.tail()
     | data_id | iso | event_id_cnty | event_id_no_cnty | event_date | year | time_precision | event_type | sub_event_type | actor1 | ... | location | latitude | longitude | geo_precision | source | source_scale | notes | fatalities | timestamp | iso3
7849 | 6876098 | 800 | UGA5 | 5 | 1997-01-11 | 1997 | 1 | Violence against civilians | Abduction/forced disappearance | LRA: Lords Resistance Army | ... | Acholi-Bur | 3.1258 | 32.9197 | 1 | New York Times | International | LRA abduct an unknown number of people taking ... | 0 | 1618581922 | UGA
7850 | 6876117 | 800 | UGA4 | 4 | 1997-01-08 | 1997 | 1 | Battles | Armed clash | Military Forces of Uganda (1986-) | ... | Kasese | 0.1833 | 30.0833 | 3 | Local Source | Subnational | Battle between Ugandan army and ADF rebels - 2... | 2 | 1618581759 | UGA
7851 | 6876122 | 800 | UGA3 | 3 | 1997-01-07 | 1997 | 1 | Battles | Armed clash | Military Forces of Uganda (1986-) | ... | Nyabani | 0.1358 | 30.3636 | 1 | Local Source | Subnational | 5 ADF rebels were killed when the Ugandan army... | 5 | 1618581598 | UGA
7852 | 6876154 | 800 | UGA1 | 1 | 1997-01-01 | 1997 | 3 | Battles | Armed clash | Military Forces of Uganda (1986-) | ... | Gulu | 2.7667 | 32.3056 | 3 | Africa Research Bulletin | Other | Ugandan army battled with LRA rebels - 4 rebel... | 4 | 1618581296 | UGA
7853 | 6876155 | 800 | UGA2 | 2 | 1997-01-01 | 1997 | 3 | Battles | Armed clash | Military Forces of Uganda (1986-) | ... | Mityana | 0.4015 | 32.0452 | 3 | Africa Research Bulletin | Other | Over 20 rebel groups believed to belong to Dun... | 5 | 1618581439 | UGA

5 rows × 31 columns
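
The request interpolates the API key and email straight into the URL. As a minimal sketch, the same query string could be assembled with the standard library's `urlencode`, which percent-encodes characters such as the `@` in an email address. The function name `build_acled_url` and the placeholder credentials are hypothetical; the parameter names (`key`, `email`, `limit`, `iso`) are the ones used in the request itself.

```python
from urllib.parse import urlencode


def build_acled_url(endpoint, key, email, iso, limit=0):
    """Return the full query URL for a read request (hypothetical helper)."""
    params = {"key": key, "email": email, "limit": limit, "iso": iso}
    # urlencode escapes reserved characters, e.g. '@' becomes '%40'
    return f"{endpoint}?{urlencode(params)}"


# Placeholder credentials, for illustration only
url = build_acled_url(
    "https://api.acleddata.com/acled/read", "MY_KEY", "me@example.org", 800
)
print(url)
```

The resulting string could be passed to `requests.get` in place of the f-string; whether that is preferable here is a matter of taste.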

uganda_conflict_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7854 entries, 0 to 7853
Data columns (total 31 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   data_id           7854 non-null   object
 1   iso               7854 non-null   object
 2   event_id_cnty     7854 non-null   object
 3   event_id_no_cnty  7854 non-null   object
 4   event_date        7854 non-null   object
 5   year              7854 non-null   object
 6   time_precision    7854 non-null   object
 7   event_type        7854 non-null   object
 8   sub_event_type    7854 non-null   object
 9   actor1            7854 non-null   object
 10  assoc_actor_1     7854 non-null   object
 11  inter1            7854 non-null   object
 12  actor2            7854 non-null   object
 13  assoc_actor_2     7854 non-null   object
 14  inter2            7854 non-null   object
 15  interaction       7854 non-null   object
 16  region            7854 non-null   object
 17  country           7854 non-null   object
 18  admin1            7854 non-null   object
 19  admin2            7854 non-null   object
 20  admin3            7854 non-null   object
 21  location          7854 non-null   object
 22  latitude          7854 non-null   object
 23  longitude         7854 non-null   object
 24  geo_precision     7854 non-null   object
 25  source            7854 non-null   object
 26  source_scale      7854 non-null   object
 27  notes             7854 non-null   object
 28  fatalities        7854 non-null   object
 29  timestamp         7854 non-null   object
 30  iso3              7854 non-null   object
dtypes: object(31)
memory usage: 1.9+ MB
num_columns = [
    'latitude',
    'longitude',
    'fatalities',
]

for col in num_columns:
    float_converter(uganda_conflict_df, col)

#check
uganda_conflict_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7854 entries, 0 to 7853
Data columns (total 31 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   data_id           7854 non-null   object 
 1   iso               7854 non-null   object 
 2   event_id_cnty     7854 non-null   object 
 3   event_id_no_cnty  7854 non-null   object 
 4   event_date        7854 non-null   object 
 5   year              7854 non-null   object 
 6   time_precision    7854 non-null   object 
 7   event_type        7854 non-null   object 
 8   sub_event_type    7854 non-null   object 
 9   actor1            7854 non-null   object 
 10  assoc_actor_1     7854 non-null   object 
 11  inter1            7854 non-null   object 
 12  actor2            7854 non-null   object 
 13  assoc_actor_2     7854 non-null   object 
 14  inter2            7854 non-null   object 
 15  interaction       7854 non-null   object 
 16  region            7854 non-null   object 
 17  country           7854 non-null   object 
 18  admin1            7854 non-null   object 
 19  admin2            7854 non-null   object 
 20  admin3            7854 non-null   object 
 21  location          7854 non-null   object 
 22  latitude          7854 non-null   float32
 23  longitude         7854 non-null   float32
 24  geo_precision     7854 non-null   object 
 25  source            7854 non-null   object 
 26  source_scale      7854 non-null   object 
 27  notes             7854 non-null   object 
 28  fatalities        7854 non-null   float32
 29  timestamp         7854 non-null   object 
 30  iso3              7854 non-null   object 
dtypes: float32(3), object(28)
memory usage: 1.8+ MB
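
`float_converter` is defined in `ta_01_packages_functions.py` and is not shown in this notebook. Judging only from the `float32` dtypes in the output, a roughly equivalent helper might look like the sketch below. The name `to_float32` and the choice to coerce unparseable values to `NaN` are assumptions, not the author's implementation.

```python
import pandas as pd


def to_float32(df, column):
    """Convert a string column to float32 in place (hypothetical helper).

    Values that cannot be parsed as numbers become NaN.
    """
    df[column] = pd.to_numeric(df[column], errors="coerce").astype("float32")


# Toy data shaped like the API output, where every value arrives as a string
sample = pd.DataFrame({"latitude": ["3.1258", "0.1833"], "fatalities": ["0", "2"]})
for col in ["latitude", "fatalities"]:
    to_float32(sample, col)

print(sample.dtypes)
```

Since `fatalities` is a count, an integer dtype would also be a reasonable choice for that column.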
date_converter(uganda_conflict_df, 'event_date')

#check
uganda_conflict_df['event_date']
0      2022-07-28
1      2022-07-28
2      2022-07-27
3      2022-07-27
4      2022-07-27
          ...    
7849   1997-01-11
7850   1997-01-08
7851   1997-01-07
7852   1997-01-01
7853   1997-01-01
Name: event_date, Length: 7854, dtype: datetime64[ns]
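
`date_converter` also comes from the shared functions file and is not shown here. A minimal sketch of the same idea with `pd.to_datetime` follows; the helper name `to_datetime_column` and the explicit `%Y-%m-%d` format are assumptions based on how the dates appear in the output. It also illustrates that the year can be derived from the parsed date, so a separate `year` column carries no extra information.

```python
import pandas as pd


def to_datetime_column(df, column, fmt="%Y-%m-%d"):
    """Parse a string column into datetimes in place (hypothetical helper)."""
    df[column] = pd.to_datetime(df[column], format=fmt)


# Toy data using two dates that appear in the dataset
sample = pd.DataFrame({"event_date": ["2022-07-28", "1997-01-01"]})
to_datetime_column(sample, "event_date")

# The year is recoverable from the parsed date
years = sample["event_date"].dt.year.tolist()
print(years)
```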
#drop columns that are not needed for the analysis
uganda_conflict_df = uganda_conflict_df.drop(columns=[
    'time_precision', 'event_id_cnty', 'event_id_no_cnty',
    'geo_precision', 'timestamp', 'year',
    'iso', 'iso3', 'region', 'country'
])

#check current columns
uganda_conflict_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7854 entries, 0 to 7853
Data columns (total 21 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   data_id         7854 non-null   object        
 1   event_date      7854 non-null   datetime64[ns]
 2   event_type      7854 non-null   object        
 3   sub_event_type  7854 non-null   object        
 4   actor1          7854 non-null   object        
 5   assoc_actor_1   7854 non-null   object        
 6   inter1          7854 non-null   object        
 7   actor2          7854 non-null   object        
 8   assoc_actor_2   7854 non-null   object        
 9   inter2          7854 non-null   object        
 10  interaction     7854 non-null   object        
 11  admin1          7854 non-null   object        
 12  admin2          7854 non-null   object        
 13  admin3          7854 non-null   object        
 14  location        7854 non-null   object        
 15  latitude        7854 non-null   float32       
 16  longitude       7854 non-null   float32       
 17  source          7854 non-null   object        
 18  source_scale    7854 non-null   object        
 19  notes           7854 non-null   object        
 20  fatalities      7854 non-null   float32       
dtypes: datetime64[ns](1), float32(3), object(17)
memory usage: 1.2+ MB
#show the first value of each administrative column next to its water dataset equivalent
admin_to_water = {
    'admin1': 'clean_adm1',
    'admin2': 'clean_adm2',
    'admin3': 'clean_adm3',
    'location': 'clean_adm4',
}
for conflict_col, water_col in admin_to_water.items():
    print(
        f'{conflict_col} in the conflict dataset should be {water_col} in the water dataset, check with:',
        uganda_conflict_df[conflict_col].iloc[0])
admin1 in the conflict dataset should be clean_adm1 in the water dataset, check with: Northern
admin2 in the conflict dataset should be clean_adm2 in the water dataset, check with: Napak
admin3 in the conflict dataset should be clean_adm3 in the water dataset, check with: Bokora
location in the conflict dataset should be clean_adm4 in the water dataset, check with: Kalokengel
uganda_conflict_df.rename(columns={
    'admin1': 'clean_adm1',
    'admin2': 'clean_adm2',
    'admin3': 'clean_adm3',
    'location': 'clean_adm4'
}, inplace=True)
#check for missing values, duplicated rows and duplicated columns
print(uganda_conflict_df.isna().sum().sum() > 0,
      uganda_conflict_df.duplicated().sum() > 0,
      uganda_conflict_df.T.duplicated().sum() > 0)
False False False
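
One caveat on the missing-value check: every column arrived from the API as a string, so a missing entry may be stored as an empty string rather than `NaN`, and `isna()` would not count it. The sketch below counts blank strings per column. The helper name `count_blank_strings` is hypothetical, and whether the ACLED data actually contains blanks (for example in `actor2`) is an assumption that would need checking against the real dataframe.

```python
import pandas as pd


def count_blank_strings(df):
    """Count cells per column that are empty or whitespace-only (hypothetical helper)."""
    return df.apply(lambda s: s.astype(str).str.strip().eq("").sum())


# Toy data: the second column holds only blank strings
sample = pd.DataFrame({
    "actor1": ["Military Forces of Uganda (1986-)", "LRA: Lords Resistance Army"],
    "actor2": ["", "  "],
})

nan_total = int(sample.isna().sum().sum())     # blanks are not NaN
blank_counts = count_blank_strings(sample).to_dict()
print(nan_total, blank_counts)
```

On the real data this would be called as `count_blank_strings(uganda_conflict_df)`.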
uganda_conflict_df.to_csv(data_filepath + 'ta_3_conflict_df_clean.csv')
Image(dictionary_filepath+"3A-Conflict-Dictionary.png")
uganda_conflict_df_upper = uganda_conflict_df.copy()
for col in ['clean_adm1', 'clean_adm2', 'clean_adm3', 'clean_adm4']:
    uganda_conflict_df_upper[col] = uganda_conflict_df_upper[col].str.upper()

#export the upper-cased cleaned dataset to csv
uganda_conflict_df_upper.to_csv(data_filepath + 'ta_3_conflict_df_clean_upper.csv')
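
The upper-cased copy is presumably meant to make the administrative names match those in the water dataset. As a rough sketch of how such a match could work, the example below normalises a key column and then joins two toy frames on it. The helper `normalise_keys`, the extra whitespace stripping, the `water_points` column and all values are made up for illustration; only the `clean_adm2` column name comes from this notebook.

```python
import pandas as pd


def normalise_keys(df, columns):
    """Return a copy with the given text columns stripped and upper-cased."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].str.strip().str.upper()
    return out


# Toy frames standing in for the conflict and water datasets
conflict = pd.DataFrame({"clean_adm2": ["Napak ", "Gulu"], "fatalities": [1.0, 4.0]})
water = pd.DataFrame({"clean_adm2": ["NAPAK", "GULU"], "water_points": [12, 30]})

# Left join keeps every conflict row, matched on the normalised key
merged = normalise_keys(conflict, ["clean_adm2"]).merge(
    water, on="clean_adm2", how="left"
)
print(merged)
```

Without the normalisation step, neither `"Napak "` nor `"Gulu"` would match its upper-case counterpart and the join would return missing values.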